The Meeting
You’re in a technical discussion. Someone says “we should use a message queue for that.” A few heads nod. Someone else says “Kafka?” More nodding. The architect draws some boxes and arrows on a whiteboard. You nod too, because the boxes and arrows make sense, but you’re not entirely sure why the arrows need to go through a box labeled “Kafka” instead of just… calling the other service directly.
I’ve been that person. And after finally sitting down and figuring out what all of this actually means, I’m convinced that most of the time, the answer to “should we use Kafka?” is “probably not.”
Why Queues Exist
The simplest version: a queue lets two services talk to each other without both needing to be available at the same time.
Without a queue:
Service A calls Service B → B is down → A fails
Service A calls Service B → B is slow → A is slow
Service A calls Service B → A sends faster than B can handle → B crashes
With a queue:
Service A puts message on queue → A moves on immediately
→ B picks it up when it's ready
→ If B is down, message waits
→ If B is slow, messages buffer
That’s it. That’s the core value proposition. Decoupling. Service A doesn’t need to know or care about Service B’s availability, speed, or even existence. The queue is the buffer between them.
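The buffering behavior fits in a few lines. Here's a minimal sketch using Python's in-process queue.Queue as a stand-in for a real broker (an assumption for illustration only: a real queue survives restarts and crosses the network; this one doesn't):

```python
import queue

# In-process stand-in for a message broker. Service A enqueues and
# moves on; Service B drains whenever it comes back online.
buffer = queue.Queue()

def service_a_send(message):
    # A returns immediately; it never waits on B.
    buffer.put(message)

def service_b_drain():
    # B processes whatever accumulated while it was down or slow.
    processed = []
    while not buffer.empty():
        processed.append(buffer.get())
    return processed

# A sends three messages while B is "down"...
for i in range(3):
    service_a_send(f"order-{i}")

# ...and B catches up later, in order.
print(service_b_drain())  # ['order-0', 'order-1', 'order-2']
```

Everything interesting about real queues (durability, delivery guarantees, backpressure) is about making this same handoff survive failures.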
Three Flavors of Messaging
This is where it gets confusing, because “message queue” is used loosely to describe three different patterns that solve different problems.
Point-to-Point (Task Queue)
One message, one consumer. You’re distributing work.
[Send Email Queue] → Worker 1 picks up email A
→ Worker 2 picks up email B
→ Worker 3 picks up email C
Each email gets sent exactly once. Add more workers to process faster. This is what most people actually need when they say “message queue.” It’s what BullMQ, Sidekiq, and Solid Queue do for background jobs.
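Here's a toy version of the pattern (the worker and email names are made up): three workers drain one shared queue, and each message is claimed by exactly one of them.

```python
import queue
import threading

# Six "emails" to send, one shared work queue.
tasks = queue.Queue()
for email_id in range(6):
    tasks.put(f"email-{email_id}")

handled = {}  # message -> the worker that claimed it
lock = threading.Lock()

def worker(name):
    while True:
        try:
            msg = tasks.get_nowait()  # thread-safe: each msg handed out once
        except queue.Empty:
            return
        with lock:
            handled[msg] = name

threads = [
    threading.Thread(target=worker, args=(f"worker-{n}",))
    for n in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(handled))  # 6 -- every email processed exactly once, no duplicates
```

Adding throughput means adding workers; nothing about the producer changes.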
Publish-Subscribe (Fan-Out)
One message, many consumers. You’re broadcasting events.
[User Signed Up] → Email service sends welcome email
→ Analytics service tracks the event
→ Notification service sends push notification
All three services get the same event. This is useful when multiple parts of your system need to react to the same thing, and you don’t want the publisher to know about all of them.
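A toy broker makes the fan-out semantics concrete (the topic name and handlers are illustrative): every subscriber registered on a topic gets its own copy of each published event.

```python
from collections import defaultdict

class Broker:
    """Minimal in-process pub/sub: topic -> list of handlers."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Fan-out: the SAME event is delivered to every subscriber.
        for handler in self.subscribers[topic]:
            handler(event)

broker = Broker()
received = []
broker.subscribe("user.signed_up", lambda e: received.append(("email", e)))
broker.subscribe("user.signed_up", lambda e: received.append(("analytics", e)))
broker.subscribe("user.signed_up", lambda e: received.append(("push", e)))

broker.publish("user.signed_up", {"user_id": 42})
print(len(received))  # 3 -- one copy per subscriber
```

Note the publisher's side: it calls publish once and has no idea how many subscribers exist, which is exactly the decoupling the pattern buys you.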
Event Streaming (The Log)
Like pub/sub, but messages are stored and replayable. Consumers track their own position.
[Order Events Log] → Consumer A: caught up to event #5000
→ Consumer B: replaying from event #3000
→ Consumer C: real-time at event #5000
This is Kafka’s world. Messages don’t disappear after consumption — they sit in an append-only log until the retention period expires. Any consumer can go back and reprocess from any point. Useful for event sourcing, audit trails, and data pipelines.
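The log model also fits in a few lines (a deliberately naive sketch, nothing like Kafka's actual implementation): events are appended and never deleted on read, each consumer owns its offset, and replay is just moving that offset backward.

```python
class EventLog:
    """Append-only log where each consumer tracks its own position."""

    def __init__(self):
        self.events = []   # append-only; reads never delete
        self.offsets = {}  # consumer name -> next position to read

    def append(self, event):
        self.events.append(event)

    def poll(self, consumer, max_events=10):
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        # Replay: rewind a consumer to any earlier position.
        self.offsets[consumer] = offset

log = EventLog()
for i in range(5):
    log.append(f"order-event-{i}")

log.poll("consumer-a", 10)         # A reads everything...
log.seek("consumer-a", 2)          # ...then rewinds to offset 2
print(log.poll("consumer-a", 10))  # ['order-event-2', 'order-event-3', 'order-event-4']
```

Two consumers polling the same log don't interfere with each other, which is the key difference from a task queue.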
The Contenders
RabbitMQ — The Swiss Army Knife
RabbitMQ is a message broker. It receives messages, routes them based on rules, delivers them to consumers, and deletes them once acknowledged. Think of it as a smart post office.
The routing is where RabbitMQ shines. Messages go to an exchange, which routes them to queues based on rules:
| Exchange Type | What It Does | Example |
|---|---|---|
| Direct | Exact match on routing key | “Send to the payment queue” |
| Fanout | Broadcast to all bound queues | “Notify all subscribers” |
| Topic | Pattern matching | order.* matches order.created, order.cancelled |
I’ve touched RabbitMQ, and the mental model clicks pretty quickly: messages go in, routing rules decide where they end up, and consumers pull from their queues. Once a message is acknowledged, it’s gone.
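The topic-exchange matching rule from the table is easy to sketch in isolation (this is my own toy implementation of the rule, not RabbitMQ code): a star matches exactly one dot-separated word, a hash matches zero or more.

```python
def topic_matches(pattern, routing_key):
    """RabbitMQ-style topic matching: '*' = one word, '#' = zero or more."""
    pw, kw = pattern.split("."), routing_key.split(".")

    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match iff key is too
        if p[0] == "#":
            # '#' consumes zero words, or one word and stays active.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pw, kw)

print(topic_matches("order.*", "order.created"))     # True
print(topic_matches("order.*", "order.item.added"))  # False ('*' is one word)
print(topic_matches("order.#", "order.item.added"))  # True
```

In the real broker this rule decides which bound queues receive a copy of each published message.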
Throughput: roughly 10K-50K messages/second, depending on message size and durability settings. Plenty for most applications.
Apache Kafka — The Distributed Log
Kafka is not a message broker. It’s a distributed, append-only log that happens to support messaging patterns. That distinction matters.
In Kafka, messages are written to partitions within topics and stay there until the retention period expires (default: 7 days). Consumers don’t “receive” messages — they pull from the log and track their own position (offset).
The parallelism model is built around partitions:
- A topic with 10 partitions can have up to 10 consumers reading in parallel
- Each partition is read by exactly one consumer in a group
- Different consumer groups read independently (each gets all messages)
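A rough sketch of the partitioning idea (I'm using crc32 for brevity; Kafka's default partitioner actually uses murmur2, but the principle is identical): a message key always hashes to the same partition, which is how per-key ordering is preserved.

```python
import zlib

NUM_PARTITIONS = 10

def partition_for(key: str) -> int:
    # Same key -> same hash -> same partition, every time.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

p1 = partition_for("customer-42")
p2 = partition_for("customer-42")
print(p1 == p2)  # True -- all of customer-42's events stay in order

# Within one consumer group, each partition is owned by exactly one
# consumer (a naive round-robin assignment for illustration):
consumers = ["c0", "c1", "c2"]
assignment = {p: consumers[p % len(consumers)] for p in range(NUM_PARTITIONS)}
print(assignment[0], assignment[1], assignment[2])  # c0 c1 c2
```

This is also why partition count is the parallelism ceiling: an eleventh consumer in the group above would sit idle.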
Throughput: millions of messages/second per broker. Built for massive scale.
Google Cloud Pub/Sub — The Managed Option
If you’re on GCP and want pub/sub without managing infrastructure, this is it. No brokers, no clusters, no partitions to configure. Create a topic, publish messages, create subscriptions, consume. Google handles scaling.
Throughput: millions of messages/second, auto-scaled. You pay per message (~$0.04 per million).
The Comparison That Actually Matters
| Question | RabbitMQ | Kafka | Cloud Pub/Sub |
|---|---|---|---|
| Can I replay messages? | No (deleted after ACK) | Yes (offset-based) | Yes (timestamp-based) |
| Complex routing? | Yes (exchanges, bindings) | No (topics only) | No (topics only) |
| Throughput | 10K-50K msg/s | Millions msg/s | Millions msg/s |
| Operations burden | Medium | High | Zero |
| Message ordering | Per queue (FIFO) | Per partition | Per ordering key |
| Consumer model | Push (broker delivers) | Pull (consumer fetches) | Both |
So When Do You Actually Need Kafka?
Here’s the thing. Kafka was built at LinkedIn to handle trillions of events per day across a distributed system. If that sounds like your startup with 500 users… it’s not.
You need Kafka when:
- You’re processing millions of messages per second (not thousands — millions)
- Multiple independent systems need to replay the same event stream
- You’re building real-time data pipelines (ETL, analytics, stream processing)
- You need event sourcing — a complete, replayable history of everything that happened
- You have a team that can operate a Kafka cluster (or you’re paying for Confluent Cloud)
You don’t need Kafka when:
- You’re distributing background tasks → use a task queue (Redis-based or database-backed)
- You need simple pub/sub with a few subscribers → RabbitMQ or Cloud Pub/Sub
- You’re processing < 50K messages/second → RabbitMQ handles this fine
- You don’t have a team to manage Kafka infrastructure → Cloud Pub/Sub
- “Someone said we should use Kafka” → that’s not a requirement
The Scaling Ladder
"I need to send emails in the background"
→ Redis queue (BullMQ, Sidekiq) or database-backed (Solid Queue)
"I need services to communicate asynchronously"
→ RabbitMQ (self-hosted) or Cloud Pub/Sub (managed)
"I need complex routing — different messages to different consumers based on patterns"
→ RabbitMQ (this is its specialty)
"I need millions of messages per second with replay capability"
→ Kafka (self-managed) or Confluent Cloud (managed)
"I need all of the above and I'm on GCP"
→ Cloud Pub/Sub for most things, Kafka only for the streaming use case
The Pattern That Trips People Up
The Saga pattern — distributed transactions across microservices using events:
Order Service → [OrderCreated] → Payment Service
Payment Service → [PaymentProcessed] → Inventory Service
Inventory Service → [ItemReserved] → Shipping Service
Something fails?
→ Compensating events flow backward to undo each step
This is where queues genuinely shine. But you can implement Saga with RabbitMQ just as well as with Kafka. The pattern is about the architecture, not the specific queue technology. Don’t let someone convince you that you need Kafka for event-driven architecture — you need a message broker. Which one depends on your actual throughput and replay requirements.
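A minimal orchestrated Saga fits in a page (all step names here are invented, and a real implementation also needs persistence and retries): run each step in order, and on failure run the completed steps' compensations in reverse.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps; unwind completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Something failed: compensating actions flow backward.
            for _, comp in reversed(completed):
                comp()
            return False
        completed.append((name, compensate))
    return True

log = []

def reserve_inventory():
    raise RuntimeError("out of stock")  # simulated failure in step 3

steps = [
    ("create_order", lambda: log.append("order created"),
                     lambda: log.append("order cancelled")),
    ("charge_card",  lambda: log.append("payment taken"),
                     lambda: log.append("payment refunded")),
    ("reserve_item", reserve_inventory, lambda: None),
]

print(run_saga(steps))  # False
print(log)  # ['order created', 'payment taken', 'payment refunded', 'order cancelled']
```

Notice the unwind order: the refund happens before the cancellation, mirroring the forward steps in reverse. In a real system each action and compensation would be an event on a queue rather than a local function call.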
The Honest Take
I don’t have deep production experience with any of these tools. I’ve touched RabbitMQ enough to understand the mental model. I know what Kafka is and when it makes sense. But the most useful thing I’ve learned from researching all of this is the decision framework — knowing which layer of the stack solves which problem, and not reaching for the most complex tool just because it’s the one everyone talks about.
Most applications will never outgrow a Redis-backed task queue. Many that need pub/sub will do fine with RabbitMQ or a managed service. Kafka is for a specific class of problems at a specific scale. If you’re not sure whether you need it, you probably don’t.
