# 10,000 Requests Per Second
Imagine your API gets 10,000 requests in one second from the same IP address. Maybe it’s a bot scraping your data. Maybe it’s a misconfigured client stuck in a retry loop. Maybe someone’s genuinely trying to bring your service down. Whatever the reason, your database doesn’t care about intent — it just sees 10,000 queries it wasn’t designed to handle simultaneously.
Rate limiting is the bouncer. It counts how many times you’ve knocked, and after a certain point, the door stays closed.
## The Four Algorithms
There are four main ways to implement rate limiting, and they each handle the “how do we count requests” question differently.
Fixed Window is the simplest. Divide time into 1-minute chunks. Count requests per chunk. Reset at the start of each minute. The problem? If someone sends 100 requests at 11:59:59 and another 100 at 12:00:00, they’ve effectively sent 200 in 2 seconds. The window boundary is exploitable.
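A fixed-window counter fits in a few lines. This is a minimal in-memory sketch (names and the 100-per-minute limit are illustrative, not from any particular library):

```javascript
// Fixed-window counter: divide time into 1-minute chunks, count per chunk.
const WINDOW_MS = 60_000;
const LIMIT = 100;
const counters = new Map(); // "key:windowStart" -> request count

function fixedWindowAllow(key, now = Date.now()) {
  const windowStart = Math.floor(now / WINDOW_MS); // which 1-minute chunk we're in
  const mapKey = `${key}:${windowStart}`;
  const count = (counters.get(mapKey) ?? 0) + 1;
  counters.set(mapKey, count);
  // the counter implicitly "resets" when the chunk number changes
  return count <= LIMIT;
}
```

Note that nothing ties the end of one chunk to the start of the next, which is exactly the boundary exploit described above.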
Sliding Window fixes this by looking at a moving time frame instead of fixed chunks. It considers a weighted average of the current and previous window, smoothing out the boundary problem. Slightly more complex, significantly more fair.
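One common way to implement the weighted-average variant looks like this (a sketch under the assumption of per-key in-memory state; a production version would live in Redis):

```javascript
// Sliding-window counter: estimate the rolling count as a weighted average
// of the previous fixed window and the current one.
const WIN_MS = 60_000;
const SLIDING_LIMIT = 100;
const windows = new Map(); // key -> { window, current, previous }

function slidingWindowAllow(key, now = Date.now()) {
  const w = Math.floor(now / WIN_MS);
  let s = windows.get(key) ?? { window: w, current: 0, previous: 0 };
  if (w > s.window) {
    // roll forward: last window's count becomes "previous" (or 0 if stale)
    s = { window: w, current: 0, previous: w === s.window + 1 ? s.current : 0 };
  }
  // weight the previous window by how much of it still overlaps the sliding frame
  const elapsedFrac = (now % WIN_MS) / WIN_MS;
  const estimate = s.previous * (1 - elapsedFrac) + s.current + 1;
  windows.set(key, s);
  if (estimate > SLIDING_LIMIT) return false;
  s.current += 1;
  return true;
}
```

The boundary trick from the fixed window no longer works: right after a window rolls over, the previous window's count still carries almost full weight.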
Token Bucket is the most popular for APIs. Imagine a bucket that holds 10 tokens, refilling at 1 token per second. Each request takes a token. When the bucket is empty, requests get rejected. This allows short bursts (you can spend all 10 tokens at once) while enforcing a sustained rate.
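The bucket described above can be sketched directly (capacity 10, 1 token/second, as in the paragraph; the class name and `now` parameter are illustrative):

```javascript
// Token bucket: a bucket of 10 tokens refilling at 1 token/second.
// Each request spends one token; an empty bucket means rejection.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 1, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full, so short bursts are allowed
    this.last = now;
  }
  allow(now = Date.now()) {
    const elapsedSec = (now - this.last) / 1000;
    this.last = now;
    // continuous refill, capped at the bucket's capacity
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    if (this.tokens < 1) return false;
    this.tokens -= 1; // spend one token for this request
    return true;
  }
}
```

This is why token buckets suit APIs: a client can spend all 10 tokens in one burst, but sustained traffic is capped at the refill rate.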
Leaky Bucket processes requests at a constant rate, like water dripping from a hole. Excess requests queue up. If the queue is full, they’re dropped. Perfect for scenarios where you need a steady, predictable processing rate — but it doesn’t handle bursts well.
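A leaky bucket is essentially a bounded queue drained at a fixed rate. A sketch (queue size and drain rate here are illustrative assumptions):

```javascript
// Leaky bucket: requests join a bounded queue; a worker drains the queue
// at a constant rate. A full queue means the request is dropped.
class LeakyBucket {
  constructor(queueSize = 10, drainPerSec = 5) {
    this.queueSize = queueSize;
    this.drainIntervalMs = 1000 / drainPerSec;
    this.queue = [];
  }
  offer(job) {
    if (this.queue.length >= this.queueSize) return false; // queue full: drop
    this.queue.push(job);
    return true;
  }
  start() {
    // process exactly one job per interval, regardless of arrival bursts
    this.timer = setInterval(() => {
      const job = this.queue.shift();
      if (job) job();
    }, this.drainIntervalMs);
  }
  stop() { clearInterval(this.timer); }
}
```

Call `start()` to begin draining; the constant interval is what gives downstream systems a steady, predictable load, and the bounded queue is why bursts beyond it are simply lost.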
| Algorithm | Bursts OK? | Accuracy | Best For |
|---|---|---|---|
| Fixed Window | Exploitable at boundaries | Low | Internal/simple APIs |
| Sliding Window | Controlled | High | Public APIs |
| Token Bucket | Yes (controlled) | High | Most API rate limiting |
| Leaky Bucket | No | Perfect rate | Constant-rate processing |
If you’re picking one: token bucket for most APIs. It balances steady rate control with accommodating legitimate bursts.
## The Implementation Is Simpler Than You Think
With Redis, a basic rate limiter is a few lines:
```javascript
const key = `ratelimit:${userId}:${currentMinute}`;
const count = await redis.incr(key); // atomic increment
if (count === 1) await redis.expire(key, 60); // set TTL on first request

if (count > 100) {
  return res.status(429).json({
    error: "Rate limited",
    retryAfter: 30,
  });
}
```
The 429 Too Many Requests status code exists specifically for this. Good APIs also return headers telling the client what’s going on:
```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1635724800
Retry-After: 30
```
This lets well-behaved clients back off and retry at the right time instead of hammering the endpoint.
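What "well-behaved" looks like from the client side, sketched with an injectable fetch function for testability (the function name, retry count, and the seconds-only `Retry-After` parsing are assumptions; the header can also carry an HTTP date, which this sketch ignores):

```javascript
// On 429, sleep for Retry-After seconds before trying again,
// instead of hammering the endpoint in a tight loop.
async function fetchWithBackoff(url, fetchFn = fetch, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetchFn(url);
    if (res.status !== 429) return res;
    // Retry-After here is assumed to be in seconds
    const waitSec = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, waitSec * 1000));
  }
  throw new Error(`Still rate limited after ${attempts} attempts`);
}
```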
## Layers, Not a Single Wall
The thing I learned setting up my homelab infrastructure is that rate limiting shouldn’t happen at just one layer. I run CrowdSec alongside Traefik — it monitors traffic patterns, maintains IP reputation lists, and can ban IPs automatically. But that’s just the outer layer.
```text
Cloudflare (edge):       1000 req/min per IP   → blocks DDoS before it hits you
 → Traefik + CrowdSec:    500 req/min per IP   → catches what Cloudflare misses
 → Application:           100 req/min per user → business logic limits (free vs paid tiers)
 → Database:              connection pooling   → protects DB from app overload
```
Each layer catches different things. Cloudflare stops volumetric attacks at the edge — the traffic never reaches your infrastructure. CrowdSec catches behavioral patterns (credential stuffing, slow-and-low attacks). Application-level limiting enforces business rules (free tier gets 100 requests/minute, paid gets 1000). And connection pooling protects the database from the application itself.
No single layer is enough. A DDoS attack bypasses your application rate limiter because your app is already overwhelmed before it can count requests. A slow credential-stuffing attack flies under Cloudflare’s radar because each individual request looks normal. You need depth.
## The Tricky Part: What to Limit By
Rate limiting by IP seems obvious, but it breaks in practice:
- Corporate offices share one IP — one heavy user limits everyone
- VPNs and proxies funnel many users through one IP
- Mobile users change IPs frequently — the limit resets
Better options:
| Limit By | Pros | Cons |
|---|---|---|
| IP | No auth needed, catches bots | Shared IPs, VPN issues |
| API Key | Per-client limits, tier-based | Requires auth on every route |
| User ID | Most accurate | Only works for authenticated endpoints |
| IP + User | Best of both | More complex to implement |
For public APIs (no auth), IP is your only option. For authenticated APIs, limit by user or API key. In practice, combine them: IP-based limits on unauthenticated endpoints (login, register) and user-based limits on everything else.
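The combined strategy reduces to one key-selection function. A sketch following Express conventions for the request object (the route list and key prefixes are illustrative assumptions):

```javascript
// Pick the rate-limit key per request: IP where there's no identity yet,
// user ID once the request is authenticated.
const UNAUTHENTICATED = new Set(["/login", "/register"]);

function rateLimitKey(req) {
  if (UNAUTHENTICATED.has(req.path) || !req.user) {
    return `ip:${req.ip}`; // no identity yet: fall back to the client IP
  }
  return `user:${req.user.id}`; // authenticated: per-user limit
}
```

Whatever counter you use (fixed window, token bucket, Redis `INCR`) then operates on this key instead of the raw IP.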
## When Rate Limiting Saves You (That Isn’t DDoS)
The most common case isn’t attacks. It’s protecting yourself from your own consumers. A frontend dev writes a useEffect that fires on every keystroke, each one calling your search API. A mobile client has a retry bug that hammers an endpoint on failure. A partner’s integration test script runs against production by accident.
Rate limiting catches all of these. Not because they’re malicious, but because systems misbehave, and the API needs to protect itself regardless of intent.
