Growing Pains#
The Docker setup worked great. CI/CD was humming along. And then, at work, the requirements changed. Multiple services needed to scale independently. Some needed to run on a schedule. Others needed to survive node failures without manual intervention. Docker Compose on a single machine wasn’t going to cut it anymore.
Enter Kubernetes. Or as I like to call it, “Docker Compose’s older sibling who went to business school.”
The Translation Guide#
If you already understand Docker, Kubernetes concepts map pretty directly. The names just get fancier.
| Docker Compose | Kubernetes | What Changed |
|---|---|---|
| Container | Pod | A pod can have multiple containers (sidecars) |
| `service` entry in Compose | Deployment | Manages replicas, rolling updates, rollbacks |
| Port mapping | Service (ClusterIP) | DNS-based discovery instead of port numbers |
| `docker-compose up --scale <service>=3` | HPA (Horizontal Pod Autoscaler) | Auto-scales based on CPU/memory metrics |
| Docker network | Namespace | Logical isolation + RBAC boundaries |
| `restart: always` | Built-in self-healing | K8s restarts crashed pods automatically |
| `.env` file | Secrets / ConfigMaps | Managed by the cluster, not files on disk |
The biggest mental shift: you stop thinking about where things run and start thinking about what should be running. You tell Kubernetes “I want 3 replicas of this service” and it figures out which nodes to put them on.
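As a minimal sketch of that declarative model (the name, labels, and image below are placeholders, not a real deployment), a Deployment that asks for three replicas looks like this; you declare the desired state and Kubernetes decides placement:

```yaml
# Hypothetical Deployment: declare "3 replicas", let K8s pick the nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3                # desired state, not instructions for where to run
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: registry.example.com/my-service:v1.2.3
```

If a node dies and takes a replica with it, Kubernetes notices the gap between desired state (3) and actual state (2) and starts a replacement somewhere else.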
One Image, Many Roles#
One pattern I use at work is deploying the same Docker image with different startup commands. Instead of building separate images for the API server, the scheduler, and a background worker, it’s one image — three deployments:
```yaml
# The API server — handles HTTP requests
containers:
  - name: app
    image: registry.example.com/my-service:v1.2.3
    command: ["npm", "run", "start:server"]   # serves the API
    resources:
      requests:
        cpu: 100m        # minimum CPU guaranteed
        memory: 384Mi    # minimum memory guaranteed
      limits:
        cpu: 500m        # maximum CPU allowed
        memory: 512Mi    # maximum memory allowed
```

```yaml
# The scheduler — runs cron jobs (only 1 replica to avoid duplicates)
containers:
  - name: app
    image: registry.example.com/my-service:v1.2.3
    command: ["npm", "run", "start:scheduler"]   # processes scheduled tasks
```
Same codebase, same image, different entry points. The scheduler runs as a single replica (you don’t want two instances both trying to send the same scheduled email). The API server scales to handle traffic.
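One caveat worth sketching: even with a single replica, the default rolling-update strategy briefly runs the old and new scheduler pods side by side during a deploy. Setting the strategy to `Recreate` tells Kubernetes to kill the old pod before starting the new one (the deployment name below is hypothetical; the fields are standard):

```yaml
# Hypothetical scheduler Deployment that never runs two copies at once
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service-scheduler
spec:
  replicas: 1
  strategy:
    type: Recreate   # terminate the old pod before creating the replacement
  selector:
    matchLabels:
      app: my-service-scheduler
  template:
    metadata:
      labels:
        app: my-service-scheduler
    spec:
      containers:
        - name: app
          image: registry.example.com/my-service:v1.2.3
          command: ["npm", "run", "start:scheduler"]
```

The trade-off is a short gap with no scheduler running during deploys, which is usually fine for cron-style work.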
Scaling on Autopilot#
Horizontal Pod Autoscaler (HPA) adjusts replicas based on real metrics. No more guessing how many instances you need:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:            # the Deployment this HPA scales (name assumed)
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 3             # never go below 3
  maxReplicas: 10            # never go above 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # scale up when memory > 80%
```
During off-peak hours, you run 3 pods. Traffic spikes? Kubernetes spins up more, up to 10. Traffic drops? It scales back down. You pay for what you use (mostly).
Are You Alive? Are You Ready?#
Kubernetes has two types of health checks, and the distinction matters:
Liveness probe: “Is this container still alive?” If it fails, Kubernetes kills the container and restarts it. This catches containers that are technically running but stuck (deadlocked, out of memory, infinite loop).
Readiness probe: “Can this container handle traffic right now?” If it fails, Kubernetes removes the pod from the Service’s load balancer. The container stays running — it just doesn’t receive new requests until it’s ready again.
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30   # give the app 30s to start
  periodSeconds: 10         # check every 10s
  failureThreshold: 3       # 3 failures = restart
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10   # shorter — we want to serve traffic ASAP
  periodSeconds: 5          # check more frequently
  failureThreshold: 3
```
Both hit the same /health endpoint, but the readiness probe starts earlier and checks more often. You want to start serving traffic as soon as possible, but you want to be sure the container is actually functional before you do.
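For slow-starting apps, a fixed `initialDelaySeconds` is a blunt instrument: too short and healthy containers get killed mid-startup, too long and crashes go unnoticed. Kubernetes also offers a `startupProbe`, which suspends the other two probes until the app first comes up. A sketch, reusing the same `/health` endpoint:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # allow up to 30 × 5s = 150s to start before restarting
```

While the startup probe hasn't yet succeeded, liveness and readiness checks don't run, so you can drop their `initialDelaySeconds` entirely and keep their failure thresholds tight.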
Resource Requests: The Scheduler’s Cheat Sheet#
Every container declares what it needs (requests) and its maximum (limits):
```yaml
resources:
  requests:
    cpu: 100m       # "I need at least 0.1 CPU cores"
    memory: 256Mi   # "I need at least 256MB RAM"
  limits:
    cpu: 500m       # "Don't let me use more than 0.5 cores"
    memory: 512Mi   # "Kill me if I exceed 512MB"
```
Requests affect scheduling — Kubernetes places pods on nodes that have enough capacity. Limits enforce boundaries — exceed your memory limit and the container gets OOM-killed. Set requests too low and your pods get scheduled on crowded nodes. Set them too high and you waste resources.
Getting these numbers right is more art than science. Start generous, monitor actual usage (we’ll cover that in a later post), and adjust.
The K9s Connection#
Remember k9s from the terminal post? This is where it shines. Instead of typing `kubectl get pods -n my-namespace`, you type `:pods` in k9s and navigate visually. View logs, shell into containers, port-forward — all with keyboard shortcuts. Once you’re managing a cluster with dozens of pods across multiple namespaces, a TUI makes a real difference.
What’s Under the Cluster#
So far I’ve talked about what runs on Kubernetes, but not what runs Kubernetes. The cluster itself needs to live somewhere — and managing that infrastructure is its own challenge. We run ours on Google Kubernetes Engine (GKE), which handles the control plane so we don’t have to. That’s the next post.
