Rate limiting is a foundational control in API security, abuse prevention, and distributed systems reliability. It determines how systems allocate finite capacity across users, services, and regions. Every rate limiting algorithm encodes a different model of time, fairness, and burst tolerance. Choosing the wrong one can create boundary exploits, uneven load, or distributed consistency failures. This guide explains how the major rate limiting algorithms work, where they fail in practice, and which one is the strongest default for modern APIs.
| Algorithm | Fairness | Burst Tolerance | Memory Cost | Best For |
|---|---|---|---|---|
| Fixed Window | Low | High at window boundaries | Very Low | Simple internal limits |
| Sliding Window Log | High | Low | High | Strict fairness enforcement |
| Sliding Window Counter | Medium to High | Low to Medium | Moderate | Scalable public APIs |
| Token Bucket | Medium to High | Controlled bursts | Low | Developer-facing APIs |
| Leaky Bucket | High output smoothing | None | Low | Traffic shaping |
Key conclusions:

- Token bucket is the strongest default for most public, developer-facing APIs.
- Sliding window counter is the most practical compromise for large-scale distributed systems.
- Fixed window is rarely appropriate for public endpoints because boundary amplification can be exploited.
- Leaky bucket is better suited to traffic shaping than to user-facing fairness.

A useful mental model: token bucket models capacity accumulation, sliding window models recent history, and fixed window models discrete accounting periods.
Rate limiting constrains how frequently an identity can perform an action within a defined time interval. The identity may be an IP address, API key, user ID, tenant ID, or service.
At a systems level, every rate limiter defines:

- an identity to track (IP address, API key, user ID, tenant ID, or service)
- a limit and the time interval over which it applies
- an enforcement action when the limit is exceeded

Rate limiting is not just about blocking traffic; it encodes a policy for fairness, abuse resistance, and infrastructure protection.
The fixed window algorithm divides time into discrete intervals such as 60 seconds. For each identity, a counter tracks how many requests occur within the current interval. When the counter exceeds the configured limit, additional requests are rejected until the next window begins.
A limit of 100 requests per minute means:

- requests 1 through 100 within the current window are accepted
- request 101 and beyond are rejected
- the counter resets when the next window begins

This is computationally efficient: it requires only a counter and a window identifier. However, it introduces boundary amplification. A client can send 100 requests at 12:00:59 and another 100 at 12:01:00; the system technically enforces 100 per minute, yet 200 requests arrive within two seconds.
Fixed window is rarely appropriate for public-facing APIs unless burst amplification is acceptable.
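The fixed window mechanics described above can be sketched in a few lines. This is a minimal in-process illustration (class and method names are my own, not from any particular library); the final lines reproduce the boundary amplification scenario:

```python
import time

class FixedWindowLimiter:
    """Minimal sketch: one counter per identity per discrete window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (identity, window_id) -> request count

    def allow(self, identity, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)  # discrete accounting period
        key = (identity, window_id)
        count = self.counters.get(key, 0)
        if count >= self.limit:
            return False
        self.counters[key] = count + 1
        return True

# Boundary amplification: 100 requests at t=59s and 100 more at t=60s
# land in different windows, so all 200 are accepted within one second.
limiter = FixedWindowLimiter(limit=100)
burst_1 = sum(limiter.allow("client-a", now=59.0) for _ in range(100))
burst_2 = sum(limiter.allow("client-a", now=60.0) for _ in range(100))
print(burst_1 + burst_2)  # 200
```

A production version would also need to expire old counter keys; that bookkeeping is omitted here.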
The sliding window algorithm evaluates requests against a rolling time interval instead of discrete blocks. There are two primary implementations: sliding window log and sliding window counter.
Sliding Window Log: The system stores a timestamp for every request. When a new request arrives, timestamps older than the window duration are removed. The remaining entries are counted. This provides strong fairness. At any moment, the system evaluates exactly the last N seconds of traffic. However, memory usage grows with request volume. During traffic spikes, log-based implementations can create memory pressure and increased CPU overhead for pruning. Failure example: a high-throughput endpoint using sliding logs can experience memory amplification during an attack spike.
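A sliding window log can be sketched with a timestamp deque per identity (illustrative names, not a specific library). Note how the log stores one entry per accepted request, which is exactly where the memory amplification risk comes from:

```python
import time
from collections import deque

class SlidingWindowLogLimiter:
    """Sketch: store a timestamp per request, prune entries older than the window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.logs = {}  # identity -> deque of request timestamps

    def allow(self, identity, now=None):
        now = time.time() if now is None else now
        log = self.logs.setdefault(identity, deque())
        # Prune timestamps that have fallen out of the rolling window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)  # memory grows with accepted request volume
        return True
```

At any moment, `len(log)` is exactly the number of accepted requests in the last N seconds, which is what gives this approach its fairness precision.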
Sliding Window Counter: Instead of storing every timestamp, this approach blends counters from the current and previous window based on elapsed time. It approximates sliding behavior with far lower memory cost. Accuracy is slightly reduced, but boundary amplification is dramatically minimized.
For large-scale APIs, sliding counter is often the scalable alternative to sliding log.
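One common way to implement the sliding window counter is a weighted blend: the previous window's count is scaled by how much of it still overlaps the rolling window. The sketch below assumes that interpolation scheme (a frequent choice, though implementations vary):

```python
import time

class SlidingWindowCounterLimiter:
    """Sketch: approximate a rolling window by blending two fixed counters."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # identity -> (window_id, current_count, previous_count)

    def allow(self, identity, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)
        stored_id, current, previous = self.counters.get(identity, (window_id, 0, 0))
        if window_id != stored_id:
            # Roll forward; a gap of more than one window means the count is stale.
            previous = current if window_id == stored_id + 1 else 0
            current = 0
        # Weight the previous window by how much of it the rolling window still covers.
        elapsed_fraction = (now % self.window) / self.window
        estimated = previous * (1 - elapsed_fraction) + current
        if estimated >= self.limit:
            self.counters[identity] = (window_id, current, previous)
            return False
        self.counters[identity] = (window_id, current + 1, previous)
        return True
```

Only two counters per identity are stored regardless of traffic volume, which is the memory advantage over the log approach; the cost is that the estimate assumes requests were evenly distributed across the previous window.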
The token bucket algorithm models capacity as tokens accumulating over time.
Each identity has:

- a bucket with a maximum token capacity
- a refill rate, the pace at which tokens are added over time

Requests consume tokens; if no tokens remain, the request is rejected. If a client is idle, tokens accumulate up to the bucket's maximum capacity, which allows controlled bursts without violating long-term average limits. For most developer-facing APIs, token bucket is the strongest default: it enforces a steady average rate while allowing short bursts, which matches real-world API usage patterns.
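The refill-and-consume cycle can be sketched with continuous refill computed lazily at request time (an illustrative implementation; real libraries differ in refill granularity):

```python
import time

class TokenBucket:
    """Sketch: tokens refill continuously; each request consumes one token."""

    def __init__(self, capacity, refill_rate_per_second):
        self.capacity = capacity
        self.refill_rate = refill_rate_per_second
        self.tokens = float(capacity)  # start full: permits an initial burst
        self.last_refill = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        self.last_refill = now
        # Accumulate tokens for elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `capacity=10` and `refill_rate_per_second=1`, a client can burst 10 requests at once, then proceeds at a steady 1 request per second; idle time rebuilds burst capacity up to the cap.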
Leaky bucket enforces a constant output rate. Incoming requests enter a queue that drains at a steady pace. If the queue is full, new requests are rejected. Unlike token bucket, leaky bucket does not accumulate burst capacity; it smooths traffic strictly.
This is useful for:

- traffic shaping and steady output control
- protecting downstream systems that have fixed throughput
- smoothing bursty input into a predictable processing rate
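The queue-that-drains model can be sketched as follows (illustrative names; the drain is computed lazily from elapsed time rather than by a background worker):

```python
import time
from collections import deque

class LeakyBucket:
    """Sketch: a bounded queue drained at a fixed rate; a full queue rejects."""

    def __init__(self, capacity, leak_rate_per_second):
        self.capacity = capacity
        self.leak_rate = leak_rate_per_second
        self.queue = deque()
        self.last_leak = 0.0

    def _leak(self, now):
        # Drain however many whole requests the fixed output rate
        # has had time to serve since the last drain.
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()
            # Advance by the drained amount to preserve fractional time.
            self.last_leak += drained / self.leak_rate

    def submit(self, request, now=None):
        now = time.monotonic() if now is None else now
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False  # bucket full: no burst credit exists to absorb this
        self.queue.append(request)
        return True
```

Note the contrast with the token bucket: idle time here only empties the queue back to zero; it never banks extra capacity for a future burst.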
Distributed rate limiting introduces coordination complexity. If multiple instances serve traffic for the same identity, they must share state. Without coordination, each node may independently allow traffic, multiplying the effective limit.
Strong consistency ensures accurate limits but increases latency and reduces availability. Eventual consistency improves resilience but allows temporary overages. In multi-region deployments, global rate limiting across regions requires explicit tradeoffs. During a regional partition, systems may double-allow traffic if coordination fails.
Distributed rate limiting is fundamentally a consistency problem. Algorithm choice matters, but state coordination dominates complexity at scale.
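The limit-multiplication failure mode is easy to demonstrate. The sketch below uses an in-process dict as a stand-in for a shared store (in a real deployment this role is typically played by something like Redis, using an atomic increment such as INCR plus key expiry); everything here is illustrative:

```python
class SharedStore:
    """Stand-in for a shared coordination store; a plain dict here."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        # Stand-in for an atomic increment on a shared store.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

def allow(store, identity, window_id, limit):
    # Increment-then-check against shared state for a fixed window.
    return store.incr((identity, window_id)) <= limit

# Uncoordinated: each of 3 nodes keeps its own local store,
# so the effective limit is multiplied by the node count.
local_stores = [SharedStore() for _ in range(3)]
allowed_local = sum(
    allow(local_stores[i % 3], "client-a", 0, limit=100) for i in range(300)
)
print(allowed_local)  # 300

# Coordinated: all nodes consult one shared store, so the global limit holds.
shared = SharedStore()
allowed_shared = sum(allow(shared, "client-a", 0, limit=100) for i in range(300))
print(allowed_shared)  # 100
```

The in-process dict hides exactly the hard parts the text describes: in a real system the increment must be atomic across the network, and partitions or replication lag are what allow temporary overages.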
In microservices architectures, rate limiting must be placed deliberately. Limits at the edge or API gateway protect external exposure and enforce per-identity fairness. Internal limits serve a different purpose: they protect downstream dependencies and reduce the risk of cascading failures.
Problems arise when retries are not coordinated with enforcement: if a downstream service begins failing and upstream services retry aggressively, total traffic can exceed the original request volume. In fan-out systems, where one request triggers multiple internal calls, this effect multiplies quickly.
Effective rate limiting in microservices requires aligning enforcement with retry behavior and service topology. Otherwise, it can amplify instability instead of containing it.
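A rough first-order estimate makes the amplification concrete (this ignores retries of retries, so it understates the worst case; the function and its parameters are my own illustration):

```python
def retry_amplification(client_requests, fanout, retries_per_failure, failure_rate):
    """Estimate total downstream calls when upstream retries on failure.

    Each client request fans out to `fanout` internal calls; each failed
    call is retried `retries_per_failure` times. First-order only:
    retries that themselves fail and get retried are not compounded.
    """
    base_calls = client_requests * fanout
    retry_calls = base_calls * failure_rate * retries_per_failure
    return base_calls + retry_calls

# 1,000 client requests, fan-out of 5, 3 retries per failure,
# 50% downstream failure rate:
print(retry_amplification(1000, 5, 3, 0.5))  # 12500.0
```

Here 1,000 client requests become 12,500 downstream calls, a 12.5x amplification, which is why enforcement must account for retry budgets and fan-out topology, not just the original request rate.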
For most public APIs, the token bucket algorithm is the strongest default. It enforces a predictable long-term rate while allowing controlled bursts that reflect real usage patterns, which makes it well suited for developer-facing platforms.
If strict fairness is the primary requirement, particularly in security-sensitive environments, a sliding window log provides the highest accuracy because it evaluates exact recent request history. However, that precision comes with higher memory and operational cost.
For large-scale distributed systems where memory usage and coordination overhead matter, a sliding window counter is often the most practical compromise. It reduces boundary burst effects while remaining efficient enough to operate at high throughput.
Fixed window algorithms are rarely appropriate for public endpoints because boundary amplification can be exploited. Leaky bucket is better suited for traffic shaping and steady output control than for user-facing API fairness.
Ultimately, the correct choice depends on whether the system prioritizes fairness, scalability, burst tolerance, or downstream protection. In modern API architectures, the token bucket is typically the right starting point.