# 10,000 Requests Per Second
Imagine your API gets 10,000 requests in one second from the same IP address. Maybe it’s a bot scraping your data. Maybe it’s a misconfigured client stuck in a retry loop. Maybe someone’s genuinely trying to bring your service down. Whatever the reason, your database doesn’t care about intent — it just sees 10,000 queries it wasn’t designed to handle simultaneously.
Rate limiting is the bouncer. It counts how many times you’ve knocked, and after a certain point, the door stays closed.
## The Four Algorithms
There are four main ways to implement rate limiting, and they each handle the “how do we count requests” question differently.
Fixed Window is the simplest. Divide time into 1-minute chunks. Count requests per chunk. Reset at the start of each minute. The problem? If someone sends 100 requests at 11:59:59 and another 100 at 12:00:00, they’ve effectively sent 200 in 2 seconds. The window boundary is exploitable.
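A fixed-window counter fits in a few lines. This is a minimal in-memory sketch (names and the 100-per-minute limit are illustrative, not from any particular library):

```javascript
// Fixed-window counter: divide time into 1-minute chunks, count per chunk.
const WINDOW_MS = 60_000;
const LIMIT = 100;
const counters = new Map(); // "key:windowStart" -> request count

function fixedWindowAllow(key, now = Date.now()) {
  const windowStart = Math.floor(now / WINDOW_MS); // which 1-minute chunk we're in
  const mapKey = `${key}:${windowStart}`;
  const count = (counters.get(mapKey) ?? 0) + 1;
  counters.set(mapKey, count);
  // the counter implicitly "resets" when the chunk number changes
  return count <= LIMIT;
}
```

Note that nothing ties the end of one chunk to the start of the next, which is exactly the boundary exploit described above.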
Sliding Window fixes this by looking at a moving time frame instead of fixed chunks. It considers a weighted average of the current and previous window, smoothing out the boundary problem. Slightly more complex, significantly more fair.
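One common way to implement the weighted-average variant looks like this (a sketch under the assumption of per-key in-memory state; a production version would live in Redis):

```javascript
// Sliding-window counter: estimate the rolling count as a weighted average
// of the previous fixed window and the current one.
const WIN_MS = 60_000;
const SLIDING_LIMIT = 100;
const windows = new Map(); // key -> { window, current, previous }

function slidingWindowAllow(key, now = Date.now()) {
  const w = Math.floor(now / WIN_MS);
  let s = windows.get(key) ?? { window: w, current: 0, previous: 0 };
  if (w > s.window) {
    // roll forward: last window's count becomes "previous" (or 0 if stale)
    s = { window: w, current: 0, previous: w === s.window + 1 ? s.current : 0 };
  }
  // weight the previous window by how much of it still overlaps the sliding frame
  const elapsedFrac = (now % WIN_MS) / WIN_MS;
  const estimate = s.previous * (1 - elapsedFrac) + s.current + 1;
  windows.set(key, s);
  if (estimate > SLIDING_LIMIT) return false;
  s.current += 1;
  return true;
}
```

The boundary trick from the fixed window no longer works: right after a window rolls over, the previous window's count still carries almost full weight.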
Token Bucket is the most popular for APIs. Imagine a bucket that holds 10 tokens, refilling at 1 token per second. Each request takes a token. When the bucket is empty, requests get rejected. This allows short bursts (you can spend all 10 tokens at once) while enforcing a sustained rate.
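The bucket described above can be sketched directly (capacity 10, 1 token/second, as in the paragraph; the class name and `now` parameter are illustrative):

```javascript
// Token bucket: a bucket of 10 tokens refilling at 1 token/second.
// Each request spends one token; an empty bucket means rejection.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 1, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full, so short bursts are allowed
    this.last = now;
  }
  allow(now = Date.now()) {
    const elapsedSec = (now - this.last) / 1000;
    this.last = now;
    // continuous refill, capped at the bucket's capacity
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    if (this.tokens < 1) return false;
    this.tokens -= 1; // spend one token for this request
    return true;
  }
}
```

This is why token buckets suit APIs: a client can spend all 10 tokens in one burst, but sustained traffic is capped at the refill rate.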
Leaky Bucket processes requests at a constant rate, like water dripping from a hole. Excess requests queue up. If the queue is full, they’re dropped. Perfect for scenarios where you need a steady, predictable processing rate — but it doesn’t handle bursts well.
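A leaky bucket is essentially a bounded queue drained at a fixed rate. A sketch (queue size and drain rate here are illustrative assumptions):

```javascript
// Leaky bucket: requests join a bounded queue; a worker drains the queue
// at a constant rate. A full queue means the request is dropped.
class LeakyBucket {
  constructor(queueSize = 10, drainPerSec = 5) {
    this.queueSize = queueSize;
    this.drainIntervalMs = 1000 / drainPerSec;
    this.queue = [];
  }
  offer(job) {
    if (this.queue.length >= this.queueSize) return false; // queue full: drop
    this.queue.push(job);
    return true;
  }
  start() {
    // process exactly one job per interval, regardless of arrival bursts
    this.timer = setInterval(() => {
      const job = this.queue.shift();
      if (job) job();
    }, this.drainIntervalMs);
  }
  stop() { clearInterval(this.timer); }
}
```

Call `start()` to begin draining; the constant interval is what gives downstream systems a steady, predictable load, and the bounded queue is why bursts beyond it are simply lost.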
| Algorithm | Bursts OK? | Accuracy | Best For |
|---|---|---|---|
| Fixed Window | Exploitable at boundaries | Low | Internal/simple APIs |
| Sliding Window | Controlled | High | Public APIs |
| Token Bucket | Yes (controlled) | High | Most API rate limiting |
| Leaky Bucket | No | Perfect rate | Constant-rate processing |
If you’re picking one: token bucket for most APIs. It balances steady rate control with accommodating legitimate bursts.
## The Implementation Is Simpler Than You Think
With Redis, a basic rate limiter is a few lines:
```javascript
const key = `ratelimit:${userId}:${currentMinute}`;
const count = await redis.incr(key); // atomic increment
if (count === 1) await redis.expire(key, 60); // set TTL on first request

if (count > 100) {
  return res.status(429).json({
    error: "Rate limited",
    retryAfter: 30,
  });
}
```
The 429 Too Many Requests status code exists specifically for this. Good APIs also return headers telling the client what’s going on:
```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1635724800
Retry-After: 30
```
This lets well-behaved clients back off and retry at the right time instead of hammering the endpoint.
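What "well-behaved" looks like from the client side, sketched with an injectable fetch function for testability (the function name, retry count, and the seconds-only `Retry-After` parsing are assumptions; the header can also carry an HTTP date, which this sketch ignores):

```javascript
// On 429, sleep for Retry-After seconds before trying again,
// instead of hammering the endpoint in a tight loop.
async function fetchWithBackoff(url, fetchFn = fetch, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetchFn(url);
    if (res.status !== 429) return res;
    // Retry-After here is assumed to be in seconds
    const waitSec = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, waitSec * 1000));
  }
  throw new Error(`Still rate limited after ${attempts} attempts`);
}
```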
## Layers, Not a Single Wall
The thing I learned setting up my homelab infrastructure is that rate limiting shouldn’t happen at just one layer. I run CrowdSec alongside Traefik — it monitors traffic patterns, maintains IP reputation lists, and can ban IPs automatically. But that’s just the outer layer.
```text
Cloudflare (edge):       1000 req/min per IP   → blocks DDoS before it hits you
 → Traefik + CrowdSec:    500 req/min per IP   → catches what Cloudflare misses
 → Application:           100 req/min per user → business logic limits (free vs paid tiers)
 → Database:              connection pooling   → protects DB from app overload
```
Each layer catches different things. Cloudflare stops volumetric attacks at the edge — the traffic never reaches your infrastructure. CrowdSec catches behavioral patterns (credential stuffing, slow-and-low attacks). Application-level limiting enforces business rules (free tier gets 100 requests/minute, paid gets 1000). And connection pooling protects the database from the application itself.
No single layer is enough. A DDoS attack bypasses your application rate limiter because your app is already overwhelmed before it can count requests. A slow credential-stuffing attack flies under Cloudflare’s radar because each individual request looks normal. You need depth.
## The Tricky Part: What to Limit By
Rate limiting by IP seems obvious, but it breaks in practice:
- Corporate offices share one IP — one heavy user limits everyone
- VPNs and proxies funnel many users through one IP
- Mobile users change IPs frequently — the limit resets
Better options:
| Limit By | Pros | Cons |
|---|---|---|
| IP | No auth needed, catches bots | Shared IPs, VPN issues |
| API Key | Per-client limits, tier-based | Requires auth on every route |
| User ID | Most accurate | Only works for authenticated endpoints |
| IP + User | Best of both | More complex to implement |
For public APIs (no auth), IP is your only option. For authenticated APIs, limit by user or API key. In practice, combine them: IP-based limits on unauthenticated endpoints (login, register) and user-based limits on everything else.
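The combined strategy reduces to one key-selection function. A sketch following Express conventions for the request object (the route list and key prefixes are illustrative assumptions):

```javascript
// Pick the rate-limit key per request: IP where there's no identity yet,
// user ID once the request is authenticated.
const UNAUTHENTICATED = new Set(["/login", "/register"]);

function rateLimitKey(req) {
  if (UNAUTHENTICATED.has(req.path) || !req.user) {
    return `ip:${req.ip}`; // no identity yet: fall back to the client IP
  }
  return `user:${req.user.id}`; // authenticated: per-user limit
}
```

Whatever counter you use (fixed window, token bucket, Redis `INCR`) then operates on this key instead of the raw IP.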
## When Rate Limiting Saves You (That Isn’t DDoS)
The most common case isn’t attacks. It’s protecting yourself from your own consumers. A frontend dev writes a useEffect that fires on every keystroke, each one calling your search API. A mobile client has a retry bug that hammers an endpoint on failure. A partner’s integration test script runs against production by accident.
Rate limiting catches all of these. Not because they’re malicious, but because systems misbehave, and the API needs to protect itself regardless of intent.
