Why Our HPA Was Scaling Too Slowly (and How We Fixed It)
The symptom
Traffic spikes would push p99 latency up for 3-4 minutes before new pods came online. By the time they did, the spike was over and we were left over-provisioned.
The root cause
Our HPA was using the default metrics pipeline:
# Before
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
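Default behavior: the HPA controller evaluates every 15 seconds (--horizontal-pod-autoscaler-sync-period), but the legacy --horizontal-pod-autoscaler-upscale-delay flag defaults to 3 minutes on the older Kubernetes releases that still support it. Newer releases drop that flag and pace scale-ups through the behavior block instead. Plus, Metrics Server scrapes every 60 seconds (its --metric-resolution setting), and the HPA only sees what Metrics Server last collected. Worst case, that is 3 minutes of delay plus up to 60 seconds of stale metrics: roughly a 4-minute lag.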
The fix
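# After: faster reaction
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30   # short window; note scaleUp defaults to 0, the 300 default applies to scaleDown
      policies:
      - type: Percent
        value: 200                     # may add up to 200% of current pods per period (up to 3x)
        periodSeconds: 30
      selectPolicy: Max                # valid values are Max, Min, Disabled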
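For reference, here is a sketch of how the pieces might fit together in one complete manifest. The name web and the Deployment target are hypothetical placeholders, not our real workload; the replica bounds, CPU target, and scale-up values are the ones quoted in this post. Note that selectPolicy only accepts Max, Min, or Disabled, so Max is used here to pick the most aggressive policy.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                  # hypothetical name
spec:
  scaleTargetRef:            # hypothetical target; point this at your own workload
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      selectPolicy: Max      # use whichever policy allows the largest change
      policies:
      - type: Percent
        value: 200           # add at most 200% of current replicas per period
        periodSeconds: 30
```

With these numbers, a 2-replica deployment can grow to at most 6 replicas in the first 30-second period and is then capped by maxReplicas: 10. No scaleDown block is set, so scale-down keeps its default 300-second stabilization window.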
We also lowered the Metrics Server scrape interval to 15s (--metric-resolution=15s).
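A minimal sketch of that change, assuming Metrics Server runs as the usual metrics-server Deployment in kube-system with a container named metrics-server (both are assumptions about your install, so check your own manifest). Only the relevant fragment is shown; any other args already present should stay as they are.

```yaml
# Excerpt of the metrics-server Deployment (assumed layout)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        - --metric-resolution=15s   # scrape interval; keep your existing args alongside this one
```

A shorter interval means more frequent kubelet scrapes, so it is worth watching Metrics Server's own CPU and memory after the change.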
What I learned
Kubernetes defaults are tuned for stability, not responsiveness. If your traffic is bursty, you need to explicitly tune the HPA behavior block. The stabilization window is the biggest lever.