Growing Pains#
The Docker setup worked great. CI/CD was humming along. And then, at work, the requirements changed. Multiple services needed to scale independently. Some needed to run on a schedule. Others needed to survive node failures without manual intervention. Docker Compose on a single machine wasn’t going to cut it anymore.
Enter Kubernetes. Or as I like to call it, “Docker Compose’s older sibling who went to business school.”
The Translation Guide#
If you already understand Docker, Kubernetes concepts map pretty directly. The names just get fancier.
| Docker Compose | Kubernetes | What Changed |
|---|---|---|
| Container | Pod | A pod can have multiple containers (sidecars) |
| `service` entry in Compose | Deployment | Manages replicas, rolling updates, rollbacks |
| Port mapping | Service (ClusterIP) | DNS-based discovery instead of port numbers |
| `docker-compose up --scale <service>=3` | HPA (Horizontal Pod Autoscaler) | Auto-scales based on CPU/memory metrics |
| Docker network | Namespace | Logical isolation + RBAC boundaries |
| `restart: always` | Built-in self-healing | K8s restarts crashed pods automatically |
| `.env` file | Secrets / ConfigMaps | Managed by the cluster, not files on disk |
The biggest mental shift: you stop thinking about where things run and start thinking about what should be running. You tell Kubernetes “I want 3 replicas of this service” and it figures out which nodes to put them on.
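As a minimal sketch of that declarative model (the name, labels, and image below are placeholders, not a real deployment), a Deployment that asks for three replicas looks like this; you declare the desired state and Kubernetes decides placement:

```yaml
# Hypothetical Deployment: declare "3 replicas", let K8s pick the nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3                # desired state, not instructions for where to run
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: registry.example.com/my-service:v1.2.3
```

If a node dies and takes a replica with it, Kubernetes notices the gap between desired state (3) and actual state (2) and starts a replacement somewhere else.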
One Image, Many Roles#
One pattern I use at work is deploying the same Docker image with different startup commands. Instead of building separate images for the API server, the scheduler, and a background worker, it’s one image — three deployments:
```yaml
# The API server — handles HTTP requests
containers:
  - name: app
    image: registry.example.com/my-service:v1.2.3
    command: ["npm", "run", "start:server"]   # serves the API
    resources:
      requests:
        cpu: 100m        # minimum CPU guaranteed
        memory: 384Mi    # minimum memory guaranteed
      limits:
        cpu: 500m        # maximum CPU allowed
        memory: 512Mi    # maximum memory allowed
```

```yaml
# The scheduler — runs cron jobs (only 1 replica to avoid duplicates)
containers:
  - name: app
    image: registry.example.com/my-service:v1.2.3
    command: ["npm", "run", "start:scheduler"]   # processes scheduled tasks
```
Same codebase, same image, different entry points. The scheduler runs as a single replica (you don’t want two instances both trying to send the same scheduled email). The API server scales to handle traffic.
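One caveat worth sketching: even with a single replica, the default rolling-update strategy briefly runs the old and new scheduler pods side by side during a deploy. Setting the strategy to `Recreate` tells Kubernetes to kill the old pod before starting the new one (the deployment name below is hypothetical; the fields are standard):

```yaml
# Hypothetical scheduler Deployment that never runs two copies at once
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service-scheduler
spec:
  replicas: 1
  strategy:
    type: Recreate   # terminate the old pod before creating the replacement
  selector:
    matchLabels:
      app: my-service-scheduler
  template:
    metadata:
      labels:
        app: my-service-scheduler
    spec:
      containers:
        - name: app
          image: registry.example.com/my-service:v1.2.3
          command: ["npm", "run", "start:scheduler"]
```

The trade-off is a short gap with no scheduler running during deploys, which is usually fine for cron-style work.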
Scaling on Autopilot#
Horizontal Pod Autoscaler (HPA) adjusts replicas based on real metrics. No more guessing how many instances you need:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:            # the Deployment this HPA scales (name assumed)
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 3             # never go below 3
  maxReplicas: 10            # never go above 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # scale up when memory > 80%
```
During off-peak hours, you run 3 pods. Traffic spikes? Kubernetes spins up more, up to 10. Traffic drops? It scales back down. You pay for what you use (mostly).
Are You Alive? Are You Ready?#
Kubernetes has two types of health checks, and the distinction matters:
Liveness probe: “Is this container still alive?” If it fails, Kubernetes kills the container and restarts it. This catches containers that are technically running but stuck (deadlocked, out of memory, infinite loop).
Readiness probe: “Can this container handle traffic right now?” If it fails, Kubernetes removes the pod from the Service’s load balancer. The container stays running — it just doesn’t receive new requests until it’s ready again.
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30   # give the app 30s to start
  periodSeconds: 10         # check every 10s
  failureThreshold: 3       # 3 failures = restart
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10   # shorter — we want to serve traffic ASAP
  periodSeconds: 5          # check more frequently
  failureThreshold: 3
```
Both hit the same /health endpoint, but the readiness probe starts earlier and checks more often. You want to start serving traffic as soon as possible, but you want to be sure the container is actually functional before you do.
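For slow-starting apps, a fixed `initialDelaySeconds` is a blunt instrument: too short and healthy containers get killed mid-startup, too long and crashes go unnoticed. Kubernetes also offers a `startupProbe`, which suspends the other two probes until the app first comes up. A sketch, reusing the same `/health` endpoint:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # allow up to 30 × 5s = 150s to start before restarting
```

While the startup probe hasn't yet succeeded, liveness and readiness checks don't run, so you can drop their `initialDelaySeconds` entirely and keep their failure thresholds tight.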
Resource Requests: The Scheduler’s Cheat Sheet#
Every container declares what it needs (requests) and its maximum (limits):
```yaml
resources:
  requests:
    cpu: 100m       # "I need at least 0.1 CPU cores"
    memory: 256Mi   # "I need at least 256MB RAM"
  limits:
    cpu: 500m       # "Don't let me use more than 0.5 cores"
    memory: 512Mi   # "Kill me if I exceed 512MB"
```
Requests affect scheduling — Kubernetes places pods on nodes that have enough capacity. Limits enforce boundaries — exceed your memory limit and the container gets OOM-killed. Set requests too low and your pods get scheduled on crowded nodes. Set them too high and you waste resources.
Getting these numbers right is more art than science. Start generous, monitor actual usage (we’ll cover that in a later post), and adjust.
The K9s Connection#
Remember k9s from the terminal post? This is where it shines. Instead of typing `kubectl get pods -n my-namespace`, you type `:pods` in k9s and navigate visually. View logs, shell into containers, port-forward — all with keyboard shortcuts. Once you’re managing a cluster with dozens of pods across multiple namespaces, a TUI makes a real difference.
What’s Under the Cluster#
So far I’ve talked about what runs on Kubernetes, but not what runs Kubernetes. The cluster itself needs to live somewhere — and managing that infrastructure is its own challenge. We run ours on Google Kubernetes Engine (GKE), which handles the control plane so we don’t have to. That’s the next post.
