GOMAXPROCS and Kubernetes: The Problem Everyone Had and Nobody Knew About
Hey everyone!
You’ve probably been there: you look at a Kubernetes dashboard, CPU usage looks fine, p99 latency keeps spiking, and you have no idea why.
This is one of those problems that exists in almost every Go application running on Kubernetes but rarely shows up in runbooks. The pod is not using too much CPU. It’s not running out of memory. It’s being throttled by the kernel, and the reason is that the Go runtime is running goroutines on far more threads in parallel than the container’s CPU limit allows for.
Before Go 1.25, released in August 2025, this was the default behavior.
How GOMAXPROCS works
GOMAXPROCS is the setting that limits how many OS threads can execute Go code at the same time. Threads blocked in system calls don’t count against it. By default, the runtime sets this value based on the number of logical CPUs available.
You can read and set this at runtime:
package main
import (
	"fmt"
	"runtime"
)
func main() {
	// Returns the current GOMAXPROCS value
	current := runtime.GOMAXPROCS(0)
	fmt.Printf("Current GOMAXPROCS: %d\n", current)
	// Set manually to 4
	runtime.GOMAXPROCS(4)
	fmt.Printf("GOMAXPROCS after setting: %d\n", runtime.GOMAXPROCS(0))
}
The value 0 passed to GOMAXPROCS is a convention: it returns the current value without changing anything.
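Before Go 1.25, when the process started, the runtime sized GOMAXPROCS from runtime.NumCPU(), which asks the operating system how many logical CPUs the process is allowed to run on (on Linux, this comes from the CPU affinity mask, not from /proc/cpuinfo). The problem is that a Kubernetes CPU limit is enforced as a CFS quota, not as an affinity mask, so in a typical pod this returns the CPUs of the node, not the CPUs allocated to the container.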
What happens in a pod with a CPU limit
Imagine a common scenario: you have a node with 64 cores and a pod with the following limit:
resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "2"
When your Go application starts in this pod, the runtime sees 64 CPUs available (the node’s) and sets GOMAXPROCS = 64. Result: up to 64 OS threads trying to execute goroutines in parallel.
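Linux uses CFS (the Completely Fair Scheduler) bandwidth control to enforce the CPU limit per container. The limit becomes a quota of CPU time per scheduling period. When the container’s threads have used up the quota for the current period, CFS throttles them: they are not scheduled again until the next period starts.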
With 64 threads trying to run simultaneously and only 2 cores of quota, the container gets throttled very frequently, even when average CPU usage is low.
CPU throttling vs CPU saturation
This is the point that confuses a lot of people.
CPU saturation happens when the application wants more CPU than it has available. Usage stays high, close to 100%.
CPU throttling in Kubernetes is different. The container can be throttled even with low CPU usage. What matters is not the average usage, but short bursts of parallel activity.
When 64 threads wake up at the same time to process requests, together they can burn through the 2-core quota in a small fraction of the period. The kernel then freezes the container until the next CFS period begins (the period is 100ms by default). Requests that arrive during that window are left waiting.
The result is classic: 20% average CPU, normal p50 latency, p99 exploding.
Latency by percentile:
p50: 12ms (most requests go through fine)
p90: 45ms (some hit a bad window)
p99: 280ms (those that arrive during throttle wait 100ms+)
p99.9: 800ms
The average user notices nothing. The user who falls in the bad percentile thinks your system is slow. And the CPU dashboard shows green.
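To put numbers on the burst described above, here is a minimal Go sketch of the arithmetic. It assumes the worst case: every thread is CPU-bound for the whole burst and the node has enough idle cores to run them all at once. The function names timeToExhaustQuota and worstCaseStall are made up for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// timeToExhaustQuota returns how long a burst lasts before the CFS quota
// for one period is used up, assuming `threads` threads each run on their
// own core at the same time.
func timeToExhaustQuota(quota time.Duration, threads int) time.Duration {
	return quota / time.Duration(threads)
}

// worstCaseStall returns how long the container can stay throttled in one
// period: whatever is left of the period once the quota is gone.
func worstCaseStall(period, quota time.Duration, threads int) time.Duration {
	burn := timeToExhaustQuota(quota, threads)
	if burn >= period {
		return 0 // the quota outlasts the period, so nothing is throttled
	}
	return period - burn
}

func main() {
	period := 100 * time.Millisecond // default CFS period
	quota := 200 * time.Millisecond  // cpu limit "2" = 2 x period

	for _, threads := range []int{2, 8, 64} {
		fmt.Printf("threads=%-2d quota gone after %-8v stall up to %v\n",
			threads,
			timeToExhaustQuota(quota, threads),
			worstCaseStall(period, quota, threads))
	}
}
```

Under these assumptions, 64 threads use up the 200ms quota after 3.125ms and the container can sit frozen for the remaining 96.875ms of the period, while 2 threads never stall at all. Real workloads rarely keep every thread busy, so treat this as an upper bound, not a prediction.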
How to diagnose
Prometheus
If you use Prometheus with cAdvisor (default in most managed clusters), this metric shows throttling:
# Throttling rate per pod
rate(container_cpu_cfs_throttled_seconds_total{
  container!="",
  pod=~"my-app-.*"
}[5m])
A more direct way to see the throttle percentage:
# Percentage of throttled periods relative to total
sum(rate(container_cpu_cfs_throttled_periods_total{
  container!="",
  pod=~"my-app-.*"
}[5m]))
/
sum(rate(container_cpu_cfs_periods_total{
  container!="",
  pod=~"my-app-.*"
}[5m]))
The query returns a ratio between 0 and 1. If it goes above 0.25 (25% of periods throttled), you have a throttling problem worth investigating.
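If you want the same ratio without Prometheus, the kernel exposes the underlying counters. The sketch below assumes cgroups v2, where the cpu.stat file in the container’s cgroup directory includes nr_periods and nr_throttled lines. throttleRatio is a hypothetical helper, and the sample text stands in for the real file. These counters are cumulative since the cgroup was created, so unlike the rate() query this gives a lifetime ratio, not a 5-minute one.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// throttleRatio parses cpu.stat-style "key value" lines and returns
// nr_throttled / nr_periods. Lines with other keys are ignored.
func throttleRatio(cpuStat string) (float64, error) {
	var periods, throttled uint64

	sc := bufio.NewScanner(strings.NewReader(cpuStat))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		var target *uint64
		switch fields[0] {
		case "nr_periods":
			target = &periods
		case "nr_throttled":
			target = &throttled
		default:
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %s: %w", fields[0], err)
		}
		*target = v
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	if periods == 0 {
		return 0, nil // no CFS periods recorded, so nothing was throttled
	}
	return float64(throttled) / float64(periods), nil
}

func main() {
	// Sample content for illustration only; a real program would read the file.
	sample := "usage_usec 9000000\nnr_periods 2000\nnr_throttled 750\nthrottled_usec 4200000\n"

	ratio, err := throttleRatio(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("throttled in %.1f%% of periods\n", ratio*100)
}
```

To approximate the PromQL rate, you could read the file twice, a few minutes apart, and divide the difference in nr_throttled by the difference in nr_periods.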
kubectl
For a quick view:
kubectl top pod -n my-namespace --sort-by=cpu
But remember: kubectl top shows average usage, not throttling. A pod with low CPU in top can still be heavily throttled.
To inspect the configured limits:
kubectl get pod my-pod -o jsonpath='{.spec.containers[*].resources}'
Log at application startup
The most direct way to confirm which GOMAXPROCS the runtime chose is to log it at startup:
package main
import (
	"fmt"
	"runtime"
)
func main() {
	fmt.Printf("GOMAXPROCS=%d, NumCPU=%d\n",
		runtime.GOMAXPROCS(0),
		runtime.NumCPU(),
	)
	// rest of application
}
If you see GOMAXPROCS=64, NumCPU=64 in a pod with cpu limit: 2, the problem is confirmed.
The solution before Go 1.25
Uber released the go.uber.org/automaxprocs library precisely to solve this. It reads the container’s cgroups information (v1 or v2) and adjusts GOMAXPROCS to reflect the configured CPU limit.
Usage is simple. Just import with a blank identifier:
package main
import (
	"fmt"
	"runtime"
	_ "go.uber.org/automaxprocs"
)
func main() {
	// automaxprocs already ran in init()
	// GOMAXPROCS now reflects the container's CPU limit
	fmt.Printf("GOMAXPROCS=%d\n", runtime.GOMAXPROCS(0))
}
To add it to your project:
go get go.uber.org/automaxprocs
The library works like this:
- In init(), reads /sys/fs/cgroup/cpu/cpu.cfs_quota_us and cpu.cfs_period_us (cgroups v1)
- Or reads /sys/fs/cgroup/cpu.max (cgroups v2)
- Calculates how many CPUs the container is entitled to (quota / period)
- Calls runtime.GOMAXPROCS with that value
cgroups v1 vs v2
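Kubernetes has treated cgroups v2 support as stable since 1.25, and most current node images use v2 by default, though the version in use depends on the node’s operating system. The practical difference for automaxprocs is where it reads the files: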
cgroups v1:
/sys/fs/cgroup/cpu/cpu.cfs_quota_us (e.g.: 200000)
/sys/fs/cgroup/cpu/cpu.cfs_period_us (e.g.: 100000)
Available CPU = 200000 / 100000 = 2 cores
cgroups v2:
/sys/fs/cgroup/cpu.max (e.g.: "200000 100000")
Available CPU = 200000 / 100000 = 2 cores
automaxprocs handles both formats automatically.
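The calculation is small enough to sketch by hand. The following is an illustration of the cgroups v2 case, not the library’s actual code: cpusFromMax and procsFor are hypothetical names, the "max" case covers a container with no CPU limit, and rounding down with a minimum of 1 is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// cpusFromMax parses the contents of a cgroups v2 cpu.max file, which is
// either "<quota> <period>" or "max <period>". It returns the limit in
// cores; limited is false when the quota is "max" (no CPU limit).
func cpusFromMax(content string) (cpus float64, limited bool, err error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max content %q", content)
	}
	if fields[0] == "max" {
		return 0, false, nil
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("invalid period %v", period)
	}
	return quota / period, true, nil
}

// procsFor turns a fractional CPU limit into a GOMAXPROCS candidate.
// Assumption of this sketch: round down, but never go below 1.
func procsFor(cpus float64) int {
	return int(math.Max(1, math.Floor(cpus)))
}

func main() {
	for _, content := range []string{"200000 100000", "50000 100000", "max 100000"} {
		cpus, limited, err := cpusFromMax(content)
		switch {
		case err != nil:
			fmt.Printf("%q: error: %v\n", content, err)
		case !limited:
			fmt.Printf("%q: no CPU limit\n", content)
		default:
			fmt.Printf("%q: %.2f cores, GOMAXPROCS candidate %d\n", content, cpus, procsFor(cpus))
		}
	}
}
```

A real program would read the file from disk and pass the result to runtime.GOMAXPROCS; the sketch uses literal strings so it runs anywhere.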
What changed in Go 1.25
Go 1.25, released in August 2025, brought the native fix. The runtime now reads cgroups limits automatically at startup, without needing any external library.
The new default behavior is:
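- On Linux, the runtime looks up the CPU bandwidth limit of the cgroup the process is running in
- If there’s a CPU limit and it is lower than the number of logical CPUs available, the runtime uses that limit as the default GOMAXPROCS
- If there’s no limit (container without CPU limit), it keeps the previous behavior (host CPU count)
- While the program runs, the runtime periodically re-checks and updates GOMAXPROCS if the limit or the set of available CPUs changes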
To verify that Go 1.25 is doing the right thing, use the same startup log:
package main
import (
	"fmt"
	"runtime"
)
func main() {
	procs := runtime.GOMAXPROCS(0)
	cpus := runtime.NumCPU()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", procs, cpus)
	// In Go 1.25+ in a container with cpu limit: 2
	// you should see: GOMAXPROCS=2 NumCPU=64
}
How to disable the new behavior
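If for some reason you need the previous behavior, you can disable the cgroup-based default with a GODEBUG setting. You don’t need this if you already set GOMAXPROCS yourself: an explicit value switches the automatic behavior off on its own.
GODEBUG=containermaxprocs=0 ./my-application
The periodic updates have their own switch, GODEBUG=updatemaxprocs=0.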
Or in code, with a //go:debug directive above the package clause of a file in your main package (this needs a Go 1.25+ toolchain, which knows the setting):
//go:debug containermaxprocs=0
package main
import (
	"fmt"
	"os"
	"runtime"
)
func main() {
	// The directive above disables the Go 1.25 cgroup-based default,
	// so GOMAXPROCS falls back to the logical CPU count unless the
	// GOMAXPROCS environment variable is set.
	fmt.Printf("GOMAXPROCS=%d (env GOMAXPROCS=%q)\n",
		runtime.GOMAXPROCS(0), os.Getenv("GOMAXPROCS"))
	// An explicit call overrides the default, whatever its source
	runtime.GOMAXPROCS(4) // manual configuration
}
In practice, the GOMAXPROCS environment variable continues to be respected and takes precedence over automatic detection.
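The precedence described in this section can be written down as a small decision function. This is a reading of the documented behavior, not the runtime’s implementation: defaultProcs is a hypothetical name, a cgroupLimit of 0 stands for "no limit", and rounding a fractional limit up is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// defaultProcs illustrates the order of precedence for the startup value:
//  1. a valid GOMAXPROCS environment variable wins outright;
//  2. otherwise a cgroup CPU limit applies if it is below the CPU count;
//  3. otherwise the logical CPU count is used.
//
// cgroupLimit is in cores; 0 means the container has no CPU limit.
func defaultProcs(envValue string, numCPU int, cgroupLimit float64) int {
	if n, err := strconv.Atoi(envValue); err == nil && n > 0 {
		return n
	}
	if cgroupLimit > 0 {
		// Assumption of this sketch: fractional limits are rounded up.
		if limit := int(math.Ceil(cgroupLimit)); limit < numCPU {
			return limit
		}
	}
	return numCPU
}

func main() {
	fmt.Println(defaultProcs("", 64, 2))  // CPU limit of 2 on a 64-CPU node
	fmt.Println(defaultProcs("8", 64, 2)) // GOMAXPROCS=8 set in the environment
	fmt.Println(defaultProcs("", 64, 0))  // no CPU limit configured
}
```

The three calls correspond to the three cases discussed in this post: a limited container, an explicit override, and a container without a CPU limit.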
Complete example with automaxprocs and logging
package main
import (
	"fmt"
	"net/http"
	"runtime"
	_ "go.uber.org/automaxprocs"
)
func main() {
	fmt.Printf("Starting server: GOMAXPROCS=%d NumCPU=%d\n",
		runtime.GOMAXPROCS(0),
		runtime.NumCPU(),
	)
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	fmt.Println("Server running on port 8080")
	if err := http.ListenAndServe(":8080", nil); err != nil {
		fmt.Printf("Error: %v\n", err)
	}
}
CPU request vs CPU limit: the difference matters
Before wrapping up, it’s worth clarifying the distinction between request and limit in Kubernetes:
resources:
  requests:
    cpu: "500m" # 0.5 CPU - minimum guarantee for scheduling
  limits:
    cpu: "2" # 2 CPUs - maximum the container can use
CPU request: used by the Kubernetes scheduler to decide which node to place the pod on. Does not limit actual CPU usage. A pod with a 500m request can use more if the node has available resources.
CPU limit: this is the value the CFS uses for throttling. If the container tries to use more than this limit, the kernel throttles it. And this is the value that automaxprocs and Go 1.25 use to calculate the correct GOMAXPROCS.
Applications without a configured CPU limit don’t get the protection from automaxprocs or Go 1.25, because there’s no limit to respect. In that case, GOMAXPROCS continues to be the number of host CPUs.
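As a rough illustration of how the limit turns into the numbers CFS works with, the sketch below converts a Kubernetes CPU quantity into millicores and then into a quota for a 100ms period. milliCPU and quotaMicros are hypothetical helpers that only understand the two spellings used in this post ("500m" and plain core counts such as "2"); real Kubernetes quantities accept more forms. Only the limit becomes a quota; the request is shown purely for the unit conversion.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// milliCPU converts a CPU quantity such as "500m" or "2" into millicores.
func milliCPU(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	cores, err := strconv.ParseFloat(q, 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(cores * 1000)), nil
}

// quotaMicros returns the CFS quota, in microseconds per period, that
// corresponds to a limit given in millicores.
func quotaMicros(milli, periodMicros int64) int64 {
	return milli * periodMicros / 1000
}

func main() {
	const periodMicros = 100000 // 100ms, the default CFS period

	limit, err := milliCPU("2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	request, err := milliCPU("500m")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Printf("limit:   %dm -> quota %d us per %d us period\n",
		limit, quotaMicros(limit, periodMicros), periodMicros)
	fmt.Printf("request: %dm (used for scheduling, not turned into a quota)\n", request)
}
```

For the limit of "2" this yields 200000 microseconds per 100000-microsecond period, the same pair of numbers shown in the cgroups examples earlier.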
