
The Bouncer at the Door

·854 words·5 mins
Photograph By Enrico Bet
Blog Software Engineering System Design

10,000 Requests Per Second

Imagine your API gets 10,000 requests in one second from the same IP address. Maybe it’s a bot scraping your data. Maybe it’s a misconfigured client stuck in a retry loop. Maybe someone’s genuinely trying to bring your service down. Whatever the reason, your database doesn’t care about intent — it just sees 10,000 queries it wasn’t designed to handle simultaneously.

Rate limiting is the bouncer. It counts how many times you’ve knocked, and after a certain point, the door stays closed.

The Four Algorithms

There are four main ways to implement rate limiting, and they each handle the “how do we count requests” question differently.

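Fixed Window is the simplest. Divide time into 1-minute chunks. Count requests per chunk. Reset at the start of each minute. The problem? With a limit of 100 requests per minute, someone can send 100 requests at 11:59:59 and another 100 at 12:00:00. Both bursts are allowed because they land in different windows, so the client gets 200 requests through in about a second, twice the per-minute limit. The window boundary is exploitable.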
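To make the boundary problem concrete, here is a minimal in-memory fixed-window counter in JavaScript. It is a sketch for a single process, not a production limiter: FixedWindowLimiter and its allow(key, nowMs) method are hypothetical names, and the timestamp is passed in explicitly so the example is deterministic.

```javascript
// Minimal in-memory fixed-window limiter (hypothetical class, single process only)
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { window, count }
  }

  allow(key, nowMs = Date.now()) {
    const window = Math.floor(nowMs / this.windowMs); // index of the current chunk
    let entry = this.counters.get(key);
    if (!entry || entry.window !== window) {
      // A new chunk has started: the count resets to zero
      entry = { window, count: 0 };
      this.counters.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// 100 requests at 11:59:59 and 100 more at 12:00:00 (times as ms since midnight)
const limiter = new FixedWindowLimiter(100, 60_000);
const t1 = (11 * 3600 + 59 * 60 + 59) * 1000;
const t2 = 12 * 3600 * 1000;
let allowed = 0;
for (let i = 0; i < 100; i++) if (limiter.allow("client", t1)) allowed++;
for (let i = 0; i < 100; i++) if (limiter.allow("client", t2)) allowed++;
console.log(allowed); // 200: both bursts fit because they fall in different windows
```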

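Sliding Window fixes this by looking at a moving time frame instead of fixed chunks. A common variant, the sliding window counter, keeps the counts for the current and previous fixed windows and weights the previous count by how much of that window still overlaps the moving frame. Thirty seconds into a new minute, for example, the estimate is the current count plus half of the previous minute's count. This smooths out the boundary problem. Slightly more complex, significantly more fair.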
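A minimal in-memory sketch of the sliding window counter, again with hypothetical names (SlidingWindowLimiter, allow) and an explicit timestamp. Replaying the same 11:59:59 / 12:00:00 burst, the second burst is now rejected, because at the very start of the new minute the previous minute's 100 requests still count in full.

```javascript
// Sliding window counter: approximates a moving window using two fixed-window counts
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.state = new Map(); // key -> { window, current, previous }
  }

  allow(key, nowMs = Date.now()) {
    const window = Math.floor(nowMs / this.windowMs);
    let s = this.state.get(key);
    if (!s) {
      s = { window, current: 0, previous: 0 };
      this.state.set(key, s);
    } else if (s.window !== window) {
      // Roll forward; the old count only matters if it was the directly preceding window
      s.previous = s.window === window - 1 ? s.current : 0;
      s.current = 0;
      s.window = window;
    }
    // Weight the previous window by the fraction that still overlaps the moving frame
    const elapsed = (nowMs % this.windowMs) / this.windowMs;
    const estimate = s.current + s.previous * (1 - elapsed);
    if (estimate >= this.limit) return false;
    s.current += 1;
    return true;
  }
}

const limiter = new SlidingWindowLimiter(100, 60_000);
const t1 = (11 * 3600 + 59 * 60 + 59) * 1000; // 11:59:59 as ms since midnight
const t2 = 12 * 3600 * 1000; // 12:00:00
let allowed = 0;
for (let i = 0; i < 100; i++) if (limiter.allow("client", t1)) allowed++;
for (let i = 0; i < 100; i++) if (limiter.allow("client", t2)) allowed++;
console.log(allowed); // 100: the second burst is rejected
```

The trade-off is that the estimate assumes the previous window's requests were spread evenly; keeping a log of every timestamp would be exact but costs memory per request.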

Token Bucket is the most popular for APIs. Imagine a bucket that holds 10 tokens, refilling at 1 token per second. Each request takes a token. When the bucket is empty, requests get rejected. This allows short bursts (you can spend all 10 tokens at once) while enforcing a sustained rate.
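The bucket described above, with a capacity of 10 and a refill rate of 1 token per second, could be sketched like this. TokenBucket and tryConsume are hypothetical names. Tokens are refilled lazily from the elapsed time on each call rather than by a background timer, which is a common design choice but not the only one.

```javascript
// Token bucket: holds up to `capacity` tokens, refilled continuously at a fixed rate
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = nowMs;
  }

  tryConsume(nowMs = Date.now()) {
    // Add tokens for the time elapsed since the last call, capped at capacity
    const elapsedMs = Math.max(0, nowMs - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs * this.refillPerSecond) / 1000);
    this.lastRefill = nowMs;
    if (this.tokens < 1) return false; // bucket empty: reject
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(10, 1, 0);
let burst = 0;
for (let i = 0; i < 15; i++) if (bucket.tryConsume(0)) burst++;
console.log(burst); // 10: the full bucket absorbs a burst, then requests are rejected
console.log(bucket.tryConsume(1000)); // true: one token has refilled after a second
```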

Leaky Bucket processes requests at a constant rate, like water dripping from a hole. Excess requests queue up. If the queue is full, they’re dropped. Perfect for scenarios where you need a steady, predictable processing rate — but it doesn’t handle bursts well.
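A queue-based leaky bucket might look like the following sketch. LeakyBucket, offer and drain are hypothetical names; a real implementation would call drain from a timer or worker loop, while here the caller passes the time in so the behavior is easy to follow.

```javascript
// Leaky bucket: a bounded queue released at a constant rate
class LeakyBucket {
  constructor(capacity, leakPerSecond) {
    this.capacity = capacity;
    this.intervalMs = 1000 / leakPerSecond; // fixed gap between processed requests
    this.queue = [];
    this.nextLeakAt = 0; // earliest time the next queued request may be released
  }

  // Enqueue a request; drop it if the queue is already full
  offer(request, nowMs = Date.now()) {
    if (this.queue.length >= this.capacity) return false;
    // An idle bucket must not bank time for a later burst: restart the schedule from now
    if (this.queue.length === 0) this.nextLeakAt = Math.max(this.nextLeakAt, nowMs);
    this.queue.push(request);
    return true;
  }

  // Release the requests that are due by nowMs, at most one per interval
  drain(nowMs = Date.now()) {
    const released = [];
    while (this.queue.length > 0 && this.nextLeakAt <= nowMs) {
      released.push(this.queue.shift());
      this.nextLeakAt += this.intervalMs;
    }
    return released;
  }
}

const leaky = new LeakyBucket(5, 2); // queue up to 5, process 2 per second
let dropped = 0;
for (let i = 0; i < 8; i++) if (!leaky.offer(`req${i}`, 0)) dropped++;
console.log(dropped); // 3: the queue only holds 5
console.log(leaky.drain(0)); // [ 'req0' ]
console.log(leaky.drain(1000)); // [ 'req1', 'req2' ]
```

However fast requests arrive, they leave at 2 per second; a burst only fills the queue, which is why this shape suits steady processing better than bursty API traffic.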

| Algorithm | Bursts OK? | Accuracy | Best For |
|---|---|---|---|
| Fixed Window | Exploitable at boundaries | Low | Internal/simple APIs |
| Sliding Window | Controlled | High | Public APIs |
| Token Bucket | Yes (controlled) | High | Most API rate limiting |
| Leaky Bucket | No | Perfect rate | Constant-rate processing |

If you’re picking one: token bucket for most APIs. It balances steady rate control with accommodating legitimate bursts.

The Implementation Is Simpler Than You Think

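With Redis, a basic fixed-window rate limiter is a few lines. Here it is as Express middleware, assuming redis is an already-connected client and your auth middleware has set req.user:

// Express middleware; assumes `redis` is a connected node-redis v4 or ioredis client
const WINDOW_SECONDS = 60;
const LIMIT = 100;

async function rateLimit(req, res, next) {
  const now = Math.floor(Date.now() / 1000);
  const window = Math.floor(now / WINDOW_SECONDS); // current fixed-window index
  const key = `ratelimit:${req.user.id}:${window}`;
  const count = await redis.incr(key); // atomic increment
  if (count === 1) await redis.expire(key, WINDOW_SECONDS); // set TTL on first request
  if (count > LIMIT) {
    const retryAfter = WINDOW_SECONDS - (now % WINDOW_SECONDS); // seconds until reset
    res.set("Retry-After", String(retryAfter));
    return res.status(429).json({ error: "Rate limited", retryAfter });
  }
  next();
}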

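The 429 Too Many Requests status code exists specifically for this. Good APIs also return headers telling the client its limit, how many requests it has left, and when the window resets. The X-RateLimit-* names are a widespread convention rather than a formal standard (Retry-After is the standard HTTP header), and X-RateLimit-Reset is shown here as a Unix timestamp: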

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1635724800
Retry-After: 30

This lets well-behaved clients back off and retry at the right time instead of hammering the endpoint.
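On the client side, backing off can be as simple as reading Retry-After on a 429 and sleeping before the next attempt. A minimal sketch, assuming Node 18+ (global fetch) and a Retry-After value given in seconds; the header may also be an HTTP date, which this sketch does not parse and instead falls back to exponential backoff. fetchWithBackoff and its options are hypothetical names.

```javascript
// Retry on 429, waiting as long as the server's Retry-After header asks
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithBackoff(url, { maxRetries = 3, fetchImpl = fetch, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url);
    // Success, a non-rate-limit error, or out of retries: hand the response back
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Retry-After in seconds; fall back to exponential backoff if missing or unparsable
    const seconds = Number(res.headers.get("Retry-After"));
    const delayMs = Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : 2 ** attempt * 1000;
    await wait(delayMs);
  }
}
```

The fetchImpl and wait options exist only so the function can be exercised without a network or real delays; in application code you would call fetchWithBackoff(url) and let the defaults apply.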

Layers, Not a Single Wall

The thing I learned setting up my homelab infrastructure is that rate limiting shouldn’t happen at just one layer. I run CrowdSec alongside Traefik — it monitors traffic patterns, maintains IP reputation lists, and can ban IPs automatically. But that’s just the outer layer.

Cloudflare (edge): 1000 req/min per IP → blocks DDoS before it hits you
  → Traefik + CrowdSec (proxy): 500 req/min per IP → catches what Cloudflare misses
    → Application: 100 req/min per user → business logic limits (free vs paid tiers)
      → Database: Connection pooling → protects DB from app overload

Each layer catches different things. Cloudflare stops volumetric attacks at the edge — the traffic never reaches your infrastructure. CrowdSec catches behavioral patterns (credential stuffing, low-and-slow attacks). Application-level limiting enforces business rules (free tier gets 100 requests/minute, paid gets 1000). And connection pooling protects the database from the application itself.

No single layer is enough. A DDoS attack bypasses your application rate limiter because your app is already overwhelmed before it can count requests. A slow credential-stuffing attack flies under Cloudflare’s radar because each individual request looks normal. You need depth.

The Tricky Part: What to Limit By

Rate limiting by IP seems obvious, but it breaks in practice:

  • Corporate offices share one IP — one heavy user limits everyone
  • VPNs and proxies funnel many users through one IP
  • Mobile users change IPs frequently — the limit resets

Better options:

| Limit By | Pros | Cons |
|---|---|---|
| IP | No auth needed, catches bots | Shared IPs, VPN issues |
| API Key | Per-client limits, tier-based | Requires auth on every route |
| User ID | Most accurate | Only works for authenticated endpoints |
| IP + User | Best of both | More complex to implement |

For public APIs with no authentication, IP is realistically your only option. For authenticated APIs, limit by user or API key. To combine the two, use IP limiting on unauthenticated endpoints (login, register) and user-based limiting on everything else.
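The combined approach comes down to choosing the counter key and limit per request. A sketch, assuming an Express-style request where auth middleware sets req.user (with hypothetical id and tier fields) and req.ip holds the client address. pickRateLimit and the per-IP limit of 20 are hypothetical; the 100 and 1000 per-minute tier limits are the free and paid figures from the layering section.

```javascript
// Choose the rate-limit key and per-minute limit for a request (hypothetical policy)
const TIER_LIMITS = { free: 100, paid: 1000 }; // requests per minute per user
const ANON_IP_LIMIT = 20; // assumed per-IP limit for unauthenticated endpoints

function pickRateLimit(req) {
  if (req.user) {
    // Authenticated: count per user, so shared office IPs and VPNs don't matter
    const limit = TIER_LIMITS[req.user.tier] ?? TIER_LIMITS.free;
    return { key: `user:${req.user.id}`, limit };
  }
  // Unauthenticated (login, register, public endpoints): the IP is all there is
  return { key: `ip:${req.ip}`, limit: ANON_IP_LIMIT };
}

console.log(pickRateLimit({ ip: "203.0.113.7" }));
// { key: 'ip:203.0.113.7', limit: 20 }
console.log(pickRateLimit({ ip: "203.0.113.7", user: { id: 42, tier: "paid" } }));
// { key: 'user:42', limit: 1000 }
```

The returned key can then feed whichever counter you use, such as the Redis INCR approach, in place of a hard-coded user ID.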

When Rate Limiting Saves You (That Isn’t DDoS)

The most common case isn’t attacks — it’s protecting yourself from your own consumers. A frontend dev writes a useEffect that fires on every keystroke, each one calling your search API. A mobile client has a retry bug that hammers an endpoint on failure. A partner integration test script runs against production by accident.

Rate limiting catches all of these. Not because they’re malicious, but because systems misbehave, and the API needs to protect itself regardless of intent.

Aaron Yong
Author
Building things for the web. Writing about development, Linux, cloud, and everything in between.
