Why Our HPA Was Scaling Too Slowly (and How We Fixed It)

December 2, 2025

The symptom

Traffic spikes would cause p99 latency to climb for 3-4 minutes before new pods came online. By the time they did, the spike was over and we were left over-provisioned.

The root cause

Our HPA was using the default metrics pipeline:

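# Before
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                  # hypothetical name, not from our real manifest
spec:
  scaleTargetRef:            # hypothetical target; point this at your own workload
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # target 70% of requested CPU, averaged across pods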

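Default behavior: the HPA controller evaluates every 15 seconds (--horizontal-pod-autoscaler-sync-period), but it can only act on what Metrics Server last collected, and ours scraped every 60 seconds. In the worst case a spike went unseen for about 75 seconds (60 + 15) before the HPA could react at all. Note that the often-cited 3-minute --horizontal-pod-autoscaler-upscale-delay no longer applies: that flag was removed from kube-controller-manager years ago (around Kubernetes 1.12), and with autoscaling/v2 the scale-up stabilization window defaults to 0 seconds. The 300-second default applies only to scale-down. The rest of the 3-4 minutes therefore has to come from elsewhere, most plausibly the time new pods need to schedule, start, and pass readiness checks.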
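
For reference, here is a sketch of what an autoscaling/v2 HPA behaves like when the behavior block is omitted, written out explicitly. The values follow the defaults documented for current Kubernetes releases as I understand them; treat them as an assumption and check the docs for your cluster version.

```yaml
# Implicit defaults when spec.behavior is omitted (per the upstream HPA docs;
# verify against your Kubernetes version).
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # no delay on scale-up
    selectPolicy: Max               # use whichever policy allows the larger change
    policies:
    - type: Percent
      value: 100                    # add up to 100% of current pods (2x)...
      periodSeconds: 15             # ...per 15-second period
    - type: Pods
      value: 4                      # or add up to 4 pods per 15-second period
      periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300 # wait 5 minutes before scaling down
    selectPolicy: Max
    policies:
    - type: Percent
      value: 100                    # may remove up to 100% of current pods per period
      periodSeconds: 15
```

Setting your own policies list under scaleUp replaces the default policies rather than adding to them, so it is worth comparing any custom policy against these numbers.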

The fix

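# After: faster reaction
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # act on the first high reading (scaleUp default is 0; 300 is the scaleDown default)
      policies:
      - type: Percent
        value: 200                     # add up to 200% of current pods (3x) per period
        periodSeconds: 30
      selectPolicy: Max                # valid values are Max, Min, Disabled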

We also switched the Metrics Server scrape interval from 60s to 15s (its --metric-resolution flag).
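
A minimal sketch of the Metrics Server change, assuming it runs as a Deployment named metrics-server in kube-system as in the upstream manifests. That layout is an assumption: managed clusters often install it differently or do not let you edit it. Only the relevant argument is shown; keep whatever other args your install already has.

```yaml
# Fragment of the metrics-server Deployment (name and namespace are assumed
# from the upstream manifests; adjust for your cluster).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        - --metric-resolution=15s   # scrape kubelets every 15s instead of 60s
        # ...existing args unchanged...
```

A shorter interval means more frequent kubelet scrapes, so watch Metrics Server's own CPU and memory after the change.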

What I learned

Kubernetes defaults are tuned for stability, not responsiveness. If your traffic is bursty, you need to explicitly tune the HPA behavior block. The stabilization window is the biggest lever, but check its actual defaults first: 0 seconds for scale-up and 300 seconds for scale-down.