Designing API Rate Limits: Token Bucket vs Sliding Window

October 20, 2025

The two approaches

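Token bucket: A bucket holds up to a fixed number of tokens and refills at a fixed rate. Each request consumes one token, and requests are rejected while the bucket is empty. It is simple and efficient, but it allows bursts up to the bucket size.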

Sliding window: Count requests in the last N seconds and reject once the count reaches the limit. Enforcement is more precise, but it requires storing request timestamps per user.

Token bucket implementation

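import (
    "sync"
    "time"
)

// TokenBucket admits requests while tokens remain and refills at a fixed rate.
type TokenBucket struct {
    mu         sync.Mutex // Allow is typically called from concurrent handlers
    capacity   int        // maximum tokens, i.e. the largest burst
    tokens     float64    // fractional, so partial refills accumulate
    rate       float64    // tokens added per second
    lastRefill time.Time
}

// Allow consumes one token and reports whether the request may proceed.
func (tb *TokenBucket) Allow() bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    now := time.Now()
    elapsed := now.Sub(tb.lastRefill).Seconds()
    // Truncating the refill to int would discard fractions on every call and
    // starve callers that arrive faster than one token interval.
    // The min builtin needs Go 1.21 or later.
    tb.tokens = min(float64(tb.capacity), tb.tokens+elapsed*tb.rate)
    tb.lastRefill = now

    if tb.tokens >= 1 {
        tb.tokens--
        return true
    }
    return false
}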

When each makes sense

Token bucket for per-endpoint rate limits where bursts are acceptable
Sliding window for per-user limits where fairness matters

We use token bucket for anonymous API keys and sliding window for paid tiers.
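I won't reproduce our actual wiring here, but one way to pick a strategy per key is a small registry that lazily creates a limiter for each API key. Everything in this sketch (Limiter, LimiterFunc, Registry, the paid flag) is a hypothetical name, and eviction of idle keys is left out.

```go
package main

import "sync"

// Limiter is the one method both strategies need to expose.
type Limiter interface {
	Allow() bool
}

// LimiterFunc adapts a plain function to Limiter, in the style of
// http.HandlerFunc. It is handy for tests and small wrappers.
type LimiterFunc func() bool

// Allow calls the underlying function.
func (f LimiterFunc) Allow() bool { return f() }

// Registry holds one limiter per API key and creates it on first use.
type Registry struct {
	mu       sync.Mutex
	limiters map[string]Limiter
	newFor   func(paid bool) Limiter // chooses the strategy for a tier
}

// NewRegistry takes a factory that returns, for example, a token bucket
// for anonymous keys and a sliding window for paid ones.
func NewRegistry(newFor func(paid bool) Limiter) *Registry {
	return &Registry{limiters: make(map[string]Limiter), newFor: newFor}
}

// Allow looks up (or creates) the limiter for key and consults it.
// The map lock is released before calling the limiter, so the limiter
// itself must be safe for concurrent use.
func (r *Registry) Allow(key string, paid bool) bool {
	r.mu.Lock()
	l, ok := r.limiters[key]
	if !ok {
		l = r.newFor(paid)
		r.limiters[key] = l
	}
	r.mu.Unlock()
	return l.Allow()
}
```

The map grows with the number of distinct keys, so a real deployment would need some way to expire idle entries.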

What I learned

Start with token bucket. It’s O(1) in time and memory per request and easy to reason about. Only add sliding window when a user complains about unfair rate limiting.