# Traffic Cops


## Why One Server Isn’t Enough

You build a service. It runs on one server. Traffic grows. The server starts sweating. You could make it bigger (vertical scaling), but eventually you need a second server. And a third. And now you have a new problem: how does traffic know which server to go to?

That’s a load balancer. It sits in front of your servers, receives every request, and decides which backend handles it. Simple concept, surprisingly deep once you start asking “how does it decide?”

## L4 vs L7 (The Two Speeds)

Load balancers operate at different layers of the network stack, and which layer you choose matters more than you’d think.

Layer 4 (Transport Layer) looks at IP addresses and ports. That’s it. It doesn’t open the request, doesn’t read the URL, doesn’t know whether it’s carrying HTTP or a database connection. It just forwards TCP (or UDP) packets. This makes it blazing fast (50-100 microseconds) but completely blind to content.

Layer 7 (Application Layer) reads the full HTTP request — URL path, headers, cookies, everything. It can route /api/* to one pool of servers and /images/* to another. It can do SSL termination, add headers, and make content-aware decisions. But it’s slower (0.5-3ms) because it has to actually parse the request.

|          | Layer 4                  | Layer 7                              |
|----------|--------------------------|--------------------------------------|
| Sees     | IP, port, protocol       | URL, headers, cookies, body          |
| Speed    | 50-100μs                 | 0.5-3ms                              |
| Routing  | By IP/port only          | By path, host, header, cookie        |
| SSL      | Passthrough              | Terminates                           |
| Best for | Non-HTTP, max throughput | HTTP, microservices, content routing |
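The content-aware routing an L7 balancer does can be sketched in a few lines. This is a toy, not any real proxy’s logic, and the pool names and path prefixes are made up for illustration:

```python
# Minimal sketch of L7 routing: pick a backend pool by inspecting
# the request path. An L4 balancer never sees this information.
def route_by_path(path: str) -> str:
    pools = {
        "/api/": "api-pool",        # application servers
        "/images/": "static-pool",  # static-asset servers
    }
    for prefix, pool in pools.items():
        if path.startswith(prefix):
            return pool
    return "default-pool"           # everything else

print(route_by_path("/api/users"))     # api-pool
print(route_by_path("/images/a.png"))  # static-pool
```

A real L7 proxy does the same thing with a compiled rule set, and can also match on host, headers, or cookies.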

I use both. Traefik in my homelab is L7 — it reads the Host header to route grafana.example.com to Grafana and app.example.com to my app. The GCE Ingress in production is also L7 but implemented completely differently — it’s an actual Google Cloud Load Balancer, not a pod in my cluster. Kubernetes Services use L4 internally for pod-to-pod traffic where content-aware routing isn’t needed.

Most production setups use both: L4 at the edge for raw throughput, L7 behind it for intelligent routing.

## The Algorithms

Once the load balancer receives a request, it needs to pick a server. There are several ways to make that choice.

Round Robin is the simplest — requests go to servers in order: 1, 2, 3, 1, 2, 3. Fair if all servers are identical and all requests cost the same. Unfair in every other scenario.
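Round robin is simple enough to fit in three lines; a sketch with hypothetical server names:

```python
from itertools import cycle

# Round robin: hand out servers in a fixed rotation, ignoring load.
servers = ["server-1", "server-2", "server-3"]
rotation = cycle(servers)

picks = [next(rotation) for _ in range(6)]
# 1, 2, 3, 1, 2, 3 — every server gets the same number of requests,
# whether or not it can handle them.
```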

Least Connections routes to whichever server has the fewest active connections. This adapts to reality — if Server 2 is handling a slow request, it has more active connections, so new requests go to Server 1 or 3. Much better for variable workloads.
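Least connections is barely more code, assuming you track in-flight requests per server (counts here are invented):

```python
# Least connections: route to the server with the fewest active requests.
active = {"server-1": 2, "server-2": 7, "server-3": 4}

def pick_least_connections(active: dict[str, int]) -> str:
    return min(active, key=active.get)

target = pick_least_connections(active)
active[target] += 1  # the chosen server takes on the new connection
```

The hard part in practice isn’t the `min` — it’s keeping those counts accurate across a fleet of balancer instances.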

Consistent Hashing hashes something about the request (client IP, a header value) to determine the server. The same client always hits the same server. The clever part: when you add or remove a server, only a fraction of requests remap — not all of them. This is critical for caches, where rehashing everything invalidates your entire cache.
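A toy hash ring makes the “only a fraction remaps” property concrete. This sketch skips virtual nodes, which a real implementation adds to balance load across the ring:

```python
import bisect
import hashlib

# Consistent hashing: servers and keys hash onto a ring; a key goes to
# the first server at or after its hash, wrapping around at the end.
def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        self.points = sorted((h(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        hashes = [p for p, _ in self.points]
        i = bisect.bisect(hashes, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["server-1", "server-2", "server-3"])
keys = ["alice", "bob", "carol", "dave"]
before = {k: ring.lookup(k) for k in keys}

ring = Ring(["server-1", "server-2"])  # server-3 leaves
after = {k: ring.lookup(k) for k in keys}

# Only keys that lived on server-3 move; everyone else stays put.
moved = [k for k in keys if before[k] != after[k]]
```

Removing a point from the ring only affects the keys whose successor was that point — which is exactly why caches love this algorithm.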

| Algorithm          | Smart?   | Session Sticky? | Best For                            |
|--------------------|----------|-----------------|-------------------------------------|
| Round Robin        | No       | No              | Identical servers, similar requests |
| Least Connections  | Yes      | No              | Variable request duration           |
| IP Hash            | No       | Yes (by IP)     | Simple session affinity             |
| Consistent Hashing | Moderate | Yes (by key)    | Caches, distributed stores          |

## The Traefik vs GCE Experience

Running Traefik at home and GCE Ingress at work taught me that the same concept — “put something in front of your servers” — can look completely different in practice.

Traefik watches the Docker socket. A new container with the right labels? Traefik picks it up and starts routing to it. No config reload, no deployment step. It lives inside my infrastructure.

GCE Ingress is the opposite. When I create a Kubernetes Ingress resource, GKE provisions a real Google Cloud Load Balancer — an external piece of infrastructure with a static IP, managed by Google. It takes minutes to provision because actual cloud resources are being created. But it handles global anycast routing, Google-edge SSL termination, and scales to levels my Traefik instance never will.

Same concept, wildly different implementations. Understanding which one to use where is the difference between an over-engineered homelab and a production system that handles real traffic.

Aaron Yong
Building things for the web. Writing about development, Linux, cloud, and everything in between.
