## Why One Server Isn’t Enough
You build a service. It runs on one server. Traffic grows. The server starts sweating. You could make it bigger (vertical scaling), but eventually you need a second server. And a third. And now you have a new problem: how does traffic know which server to go to?
That’s a load balancer. It sits in front of your servers, receives every request, and decides which backend handles it. Simple concept, surprisingly deep once you start asking “how does it decide?”
## L4 vs L7 (The Two Speeds)
Load balancers operate at different layers of the network stack, and which layer yours operates at matters more than you’d think.
Layer 4 (Transport Layer) looks at IP addresses and ports. That’s it. It doesn’t open the request, doesn’t read the URL, doesn’t know if it’s HTTP or a database connection. It just forwards TCP packets. This makes it blazing fast (50-100 microseconds) but completely blind to content.
Layer 7 (Application Layer) reads the full HTTP request — URL path, headers, cookies, everything. It can route /api/* to one pool of servers and /images/* to another. It can do SSL termination, add headers, and make content-aware decisions. But it’s slower (0.5-3ms) because it has to actually parse the request.
| | Layer 4 | Layer 7 |
|---|---|---|
| Sees | IP, port, protocol | URL, headers, cookies, body |
| Speed | 50-100μs | 0.5-3ms |
| Routing | By IP/port only | By path, host, header, cookie |
| SSL | Passthrough | Terminates |
| Best for | Non-HTTP, max throughput | HTTP, microservices, content routing |
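The content-aware routing that makes L7 useful fits in a few lines. A minimal sketch in Python — the pool names and path prefixes here are invented for illustration, not taken from any real config:

```python
# Hypothetical backend pools, keyed by URL path prefix (illustrative only).
BACKEND_POOLS = {
    "/api/": ["api-1:8080", "api-2:8080"],
    "/images/": ["static-1:8080"],
}


def route(path: str) -> list[str]:
    """L7-style decision: pick a backend pool by longest matching path prefix.

    An L4 balancer could never do this -- it only sees the destination IP
    and port of the TCP connection, never the URL inside it.
    """
    for prefix in sorted(BACKEND_POOLS, key=len, reverse=True):
        if path.startswith(prefix):
            return BACKEND_POOLS[prefix]
    return ["web-1:8080"]  # default pool for everything else
```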
I use both. Traefik in my homelab is L7 — it reads the Host header to route grafana.example.com to Grafana and app.example.com to my app. The GCE Ingress in production is also L7 but implemented completely differently — it’s an actual Google Cloud Load Balancer, not a pod in my cluster. Kubernetes Services use L4 internally for pod-to-pod traffic where content-aware routing isn’t needed.
Most production setups use both: L4 at the edge for raw throughput, L7 behind it for intelligent routing.
## The Algorithms
Once the load balancer receives a request, it needs to pick a server. There are several ways to make that choice.
Round Robin is the simplest — requests go to servers in order: 1, 2, 3, 1, 2, 3. Fair if all servers are identical and all requests cost the same. Unfair in every other scenario.
Least Connections routes to whichever server has the fewest active connections. This adapts to reality — if Server 2 is handling a slow request, it has more active connections, so new requests go to Server 1 or 3. Much better for variable workloads.
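Both algorithms are simple enough to sketch. A minimal Python illustration — server names are placeholders, and a real balancer does the same bookkeeping per backend, just with health checks and concurrency on top:

```python
import itertools


class RoundRobin:
    """Cycle through servers in fixed order: 1, 2, 3, 1, 2, 3, ..."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Route each new connection to the server with the fewest active ones."""

    def __init__(self, servers):
        self._active = {s: 0 for s in servers}

    def acquire(self):
        server = min(self._active, key=self._active.get)
        self._active[server] += 1  # connection opened
        return server

    def release(self, server):
        self._active[server] -= 1  # connection closed
```

The difference shows up when one server gets stuck on a slow request: round robin keeps sending it traffic anyway, while least connections routes around it until the connection is released.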
Consistent Hashing hashes something about the request (client IP, a header value) to determine the server. The same client always hits the same server. The clever part: when you add or remove a server, only a fraction of requests remap — not all of them. This is critical for caches, where rehashing everything invalidates your entire cache.
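The remapping property is easiest to see in code. A minimal hash ring in Python, using virtual nodes so keys spread evenly across servers — a sketch of the technique, not a production implementation:

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        # Each server appears at `vnodes` points on the ring, so its share
        # of the keyspace is spread out rather than one contiguous arc.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, key):
        # Walk clockwise from the key's hash to the next server point.
        i = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[i][1]
```

Removing a server only remaps the keys that landed on its points of the ring; every other key still walks to the same point it did before. That’s the property caches depend on.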
| Algorithm | Smart? | Session Sticky? | Best For |
|---|---|---|---|
| Round Robin | No | No | Identical servers, similar requests |
| Least Connections | Yes | No | Variable request duration |
| IP Hash | No | Yes (by IP) | Simple session affinity |
| Consistent Hashing | Moderate | Yes (by key) | Caches, distributed stores |
## The Traefik vs GCE Experience
Running Traefik at home and GCE Ingress at work taught me that the same concept — “put something in front of your servers” — can look completely different in practice.
Traefik watches the Docker socket. A new container with the right labels? Traefik picks it up and starts routing to it. No config reload, no deployment step. It lives inside my infrastructure.
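The label-driven setup looks something like this — a docker-compose sketch using Traefik v2 label syntax, with image, router, and hostnames as made-up examples:

```yaml
# Sketch only: service name, image, host, and port are placeholders.
services:
  app:
    image: myorg/app:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`app.example.com`)"
      - "traefik.http.services.app.loadbalancer.server.port=8080"
```

Start the container and Traefik begins routing app.example.com to it; stop it and the route disappears. No balancer config file is ever touched.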
GCE Ingress is the opposite. When I create a Kubernetes Ingress resource, GKE provisions a real Google Cloud Load Balancer — an external piece of infrastructure with a static IP, managed by Google. It takes minutes to provision because actual cloud resources are being created. But it handles global anycast routing, Google-edge SSL termination, and scales to levels my Traefik instance never will.
Same concept, wildly different implementations. Understanding which one to use where is the difference between an over-engineered homelab and a production system that handles real traffic.
