All Green, All Wrong
You’ve been here. CI is green. Coverage is above 80%. You deploy with confidence. Fifteen minutes later, Slack lights up. Something is broken in production — something your 247 passing tests said was fine.
You dig in. The endpoint is returning a 500 because the external API changed a field name from userName to username. Your test didn’t catch it because your mock returns exactly what you told it to: the old shape, forever frozen in time, slowly drifting from reality while your test suite smiles and gives you a thumbs up.
Your tests weren’t wrong. They were lying by omission.
A Rose by Any Other Name (Would Still Break Your Build)
Before I get into the actual problem, a quick taxonomy that took me embarrassingly long to internalize. Gerard Meszaros defined five types of test doubles, and Martin Fowler popularized the naming. Most people say “mock” for all of them, which is like calling every dog a golden retriever — technically sometimes correct, practically confusing.
| Double | What It Does | Example |
|---|---|---|
| Dummy | Passed around but never used | A placeholder argument to satisfy a constructor |
| Stub | Returns canned answers | getUser() always returns { id: 1, name: "Test" } |
| Spy | Records calls for later assertion | “Was sendEmail called twice?” |
| Mock | Pre-programmed expectations that verify behavior | “Expect save() to be called with these exact args” |
| Fake | Working implementation, simplified | An in-memory database instead of Postgres |
The distinction matters because each one hides different things from your tests, and the things they hide are usually the things that break.
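To make the table concrete, here is a minimal hand-rolled sketch of all five doubles around one collaborator. The names (`Mailer`, `Logger`, `signUp`) are invented for illustration; in a real suite, Jest or Vitest would usually generate the spy and mock for you.

```typescript
// Hypothetical collaborators, invented for this sketch.
interface Logger { log(msg: string): void; }
interface Mailer { send(to: string, body: string): boolean; }

// Code under test: sends a welcome email and reports success.
function signUp(email: string, mailer: Mailer, _logger: Logger): boolean {
  return mailer.send(email, "Welcome!");
}

// Dummy: satisfies the signature, never meaningfully used.
const dummyLogger: Logger = { log: () => {} };

// Stub: canned answer, no recording, no verification.
const stubMailer: Mailer = { send: () => true };

// Spy: records calls so the test can assert on them afterwards.
class SpyMailer implements Mailer {
  calls: Array<{ to: string; body: string }> = [];
  send(to: string, body: string): boolean {
    this.calls.push({ to, body });
    return true;
  }
}

// Mock: expectations are set up front; verify() fails if they were not met.
class MockMailer implements Mailer {
  private called = false;
  private expectedTo: string;
  constructor(expectedTo: string) {
    this.expectedTo = expectedTo;
  }
  send(to: string, _body: string): boolean {
    if (to !== this.expectedTo) throw new Error(`unexpected recipient: ${to}`);
    this.called = true;
    return true;
  }
  verify(): void {
    if (!this.called) throw new Error("expected send() to be called");
  }
}

// Fake: a working but simplified implementation (an in-memory outbox).
class FakeMailer implements Mailer {
  outbox = new Map<string, string[]>();
  send(to: string, body: string): boolean {
    if (!to.includes("@")) return false; // real validation logic, simplified
    this.outbox.set(to, [...(this.outbox.get(to) ?? []), body]);
    return true;
  }
}
```

Notice what each one hides: the stub hides whether `send` was called at all, the spy and mock hide what a real mailer would do with a bad address, and only the fake has behavior of its own that can disagree with your assumptions.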
The Line in the Sand
Here’s the rule I wish someone had given me earlier: mock at the boundaries you don’t own; don’t mock the things you do own.
If you’re calling Stripe’s API, mock that. You don’t control Stripe. But if you’re testing your PaymentService and you mock out your own UserRepository — you’ve just told the test to trust that your code talks to your other code correctly. And you haven’t verified it.
The problem isn’t mocking itself. The problem is where you draw the line. Every mock is a tiny declaration of faith: “I believe the real thing behaves exactly like this.” The further that mock sits from the actual boundary, the more faith you’re exercising.
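Here is a rough sketch of where that line falls. Every name in it (`PaymentGateway`, `UserRepository`, `PaymentService`, the `cus_123` id) is hypothetical, and the gateway shape is not Stripe's actual SDK. The repository is in-memory only to keep the example self-contained; in a real codebase it would be your actual repository running against a real database.

```typescript
// Boundary you do NOT own: a third-party payment API (hypothetical shape).
interface PaymentGateway {
  charge(customerId: string, cents: number): Promise<{ ok: boolean }>;
}

// Code you DO own: the test uses the real class, not a mock of it.
class UserRepository {
  private users = new Map<number, { customerId: string }>();
  save(id: number, customerId: string): void {
    this.users.set(id, { customerId });
  }
  find(id: number): { customerId: string } | undefined {
    return this.users.get(id);
  }
}

class PaymentService {
  private repo: UserRepository;
  private gateway: PaymentGateway;
  constructor(repo: UserRepository, gateway: PaymentGateway) {
    this.repo = repo;
    this.gateway = gateway;
  }
  async pay(userId: number, cents: number): Promise<boolean> {
    const user = this.repo.find(userId);
    if (!user) return false; // unknown user: never reaches the gateway
    const result = await this.gateway.charge(user.customerId, cents);
    return result.ok;
  }
}

// Test setup: stub only the external boundary and record what it received.
const charged: string[] = [];
const stubGateway: PaymentGateway = {
  charge: async (customerId) => {
    charged.push(customerId);
    return { ok: true };
  },
};

const repo = new UserRepository();
repo.save(1, "cus_123");
const service = new PaymentService(repo, stubGateway);
```

A test that calls `service.pay(1, 500)` now verifies that `PaymentService` and `UserRepository` really agree with each other, and the only thing taken on faith is the gateway's response shape.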
Three Ecosystems, Three Ways to Get Burned
I’ve seen this play out differently depending on the stack, and each one has its own signature footgun.
Java: The H2 Trap
Spring Boot’s test slices are genuinely great. @WebMvcTest loads only the web layer. @DataJpaTest loads only the JPA layer. Fast, focused, good.
The trap is @DataJpaTest with H2. H2 is an in-memory database that speaks SQL, so your JPA tests pass — but H2 is not Postgres. It doesn’t enforce the same constraints. It handles jsonb columns differently. It doesn’t have the same query planner behavior. I’ve watched tests pass on H2 and fail on Postgres because a native query used a Postgres-specific function that H2 silently ignored.
The fix: Testcontainers. Spin up a real Postgres in Docker for your integration tests. Yes, it’s slower. Yes, it’s worth it.
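@Testcontainers
@DataJpaTest
// Without this, @DataJpaTest swaps in an embedded database and ignores the container
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
class UserRepositoryTest {
    @Container
    @ServiceConnection // Assumes Spring Boot 3.1+; on older versions, wire the URL via @DynamicPropertySource
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16");

    // Now your JPA tests run against real Postgres
    // No more "works on H2, explodes in prod"
}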
TypeScript: The Phantom Mock
Jest and Vitest both have a fun quirk: neither clears mocks between tests by default. If test A sets up a mock for fetch and test B assumes a clean environment, test B is running against test A’s leftovers. Your test passes or fails depending on execution order — a heisenbug that makes you question your career choices.
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    mockReset: true, // Reset all mocks between tests. You want this. Trust me.
  },
});
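If the order-dependence is hard to picture, here is a toy reproduction. `MockFn` is a hand-rolled stand-in for a mock function, not Jest's or Vitest's implementation, and `testA`, `testB`, and `runSuite` are made up for the sketch.

```typescript
// A tiny mock function that keeps its state until someone resets it.
class MockFn<T> {
  calls = 0;
  private value: T;
  private readonly initial: T;
  constructor(initial: T) {
    this.initial = initial;
    this.value = initial;
  }
  mockReturnValue(value: T): void {
    this.value = value;
  }
  call(): T {
    this.calls += 1;
    return this.value;
  }
  reset(): void {
    this.calls = 0;
    this.value = this.initial;
  }
}

// Shared across "tests", like a module-level mock of fetch.
const fetchStatus = new MockFn<number>(200);

// Test A reprograms the mock to simulate a server error.
function testA(): boolean {
  fetchStatus.mockReturnValue(500);
  return fetchStatus.call() === 500;
}

// Test B assumes a clean mock that still returns 200.
function testB(): boolean {
  return fetchStatus.call() === 200;
}

// Runs A then B, optionally resetting the mock between them.
function runSuite(resetBetween: boolean): boolean[] {
  fetchStatus.reset(); // start every suite run from a known state
  const results: boolean[] = [];
  for (const test of [testA, testB]) {
    results.push(test());
    if (resetBetween) fetchStatus.reset();
  }
  return results;
}

console.log(runSuite(false)); // [ true, false ]: B inherits A's leftovers
console.log(runSuite(true));  // [ true, true ]: reset between tests isolates B
```

Swap the order of A and B in the first run and both pass, which is exactly why this kind of failure feels random.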
The other TypeScript trap is vi.mock type safety — or lack thereof. You can mock a module to return anything, and TypeScript won’t complain if the shape drifts from the real implementation. Your mock says the function returns { data: string } but the real function now returns { data: string, meta: object }, and nothing catches it until runtime.
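One way to claw some of that safety back is to type the mock against the real function, so drift becomes a compile error instead of a production incident. A minimal sketch, with hypothetical names (`fetchReport`, `ApiResponse`, `summarize`) and plain dependency injection rather than `vi.mock`:

```typescript
// The real function's current contract (hypothetical). Note the `meta` field.
interface ApiResponse {
  data: string;
  meta: { requestId: string };
}

async function fetchReport(id: number): Promise<ApiResponse> {
  // Placeholder for a real HTTP call; not implemented in this sketch.
  throw new Error(`fetchReport(${id}) needs a network`);
}

// Code under test takes the fetcher as a parameter so a test can swap it.
async function summarize(fetcher: typeof fetchReport, id: number): Promise<string> {
  const res = await fetcher(id);
  return `${res.data} (${res.meta.requestId})`;
}

// `typeof fetchReport` ties the mock to the real signature.
// Delete `meta` below and tsc rejects the assignment.
const mockFetchReport: typeof fetchReport = async (id) => ({
  data: `report-${id}`,
  meta: { requestId: "test" },
});
```

This only protects you when the real function's types are themselves accurate, and only if your test files are actually type-checked. If you stay with module mocks, Vitest's `vi.mocked` helper gives the mocked function the real function's types, which gets you a similar check.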
Ruby/Rails: The Stale Cassette
VCR records real HTTP responses as “cassettes” and replays them in future test runs. Brilliant in theory — you test against real data once, then replay forever.
“Forever” is the problem. That cassette was recorded six months ago. The API has since added required headers, changed response shapes, deprecated endpoints. Your tests pass because VCR faithfully replays the ancient recording. Production fails because the real API has moved on.
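The fix is a cassette expiry policy: either delete and re-record cassettes yourself on a schedule, or set VCR.configure { |c| c.default_cassette_options = { re_record_interval: 30.days } } so VCR re-records any cassette older than the interval. (30.days comes from ActiveSupport; outside Rails, pass the interval in seconds.) Or better yet, for critical integrations, run periodic tests against the real API in a staging environment.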
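The same expiry idea works for any recorded fixture, not just VCR cassettes. Here is a sketch of the check itself, written in TypeScript to match the other examples; `Recording` and `staleRecordings` are hypothetical names, and this is not how VCR implements `re_record_interval`.

```typescript
// A recorded fixture and the moment it was captured.
interface Recording {
  name: string;
  recordedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the recordings older than maxAgeDays: the candidates for re-recording.
// `now` is injectable so the check itself can be tested deterministically.
function staleRecordings(
  recordings: Recording[],
  maxAgeDays: number,
  now: Date = new Date(),
): Recording[] {
  const cutoffMs = maxAgeDays * DAY_MS;
  return recordings.filter((r) => now.getTime() - r.recordedAt.getTime() > cutoffMs);
}
```

Run something like this in CI and fail the build when the list is non-empty, and "forever" turns into "at most 30 days".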
The Ice Cream Cone
You know the testing pyramid: many unit tests, fewer integration tests, even fewer E2E tests. Most codebases I’ve worked in have the inverse — an ice cream cone. A thin layer of unit tests (often just the happy path), a handful of integration tests someone wrote once, and a massive E2E suite that takes 40 minutes and fails randomly because Selenium couldn’t find a button.
The pyramid isn’t just about cost. It’s about feedback speed and signal quality. A failing unit test tells you exactly what broke. A failing E2E test tells you something is broken somewhere, maybe, unless it’s a flaky test, which it probably is. (I covered the pyramid in more detail in The Full Test Suite if you want the Rails-specific version.)
The irony is that most teams have their mocks in the wrong place and their test distribution inverted. They mock the things they own (hiding integration bugs) and write E2E tests to compensate (slow, flaky, expensive). Flip both: don’t mock what you own, and invest in integration tests that run against real dependencies.
The Missing Middle
There’s a tool that addresses the core problem — the contract between services drifting without anyone noticing — and it’s criminally underused. Pact is a contract testing framework. Instead of mocking an API’s response, you define a contract: “I expect this endpoint to return this shape.” The provider independently verifies that it honors the contract.
If the provider changes a field name from userName to username, the contract test fails on the provider’s side before anything deploys. No stale mocks. No cassettes from 2024. No 2 AM Slack messages.
It’s not a replacement for integration tests — it’s the verification layer that makes your mocks honest. You can mock the API in your consumer tests and know that the mock reflects reality, because the contract is tested on both sides.
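To show the mechanism without the tooling, here is a deliberately tiny hand-rolled contract check. It is not Pact's API (Pact generates contract files from consumer tests and replays them against the provider); `Contract`, `verifyContract`, and the payloads are assumptions made up for this sketch.

```typescript
// A contract: field name mapped to the expected `typeof` result.
type Contract = Record<string, "string" | "number" | "boolean">;

// What the consumer relies on from the user endpoint.
const userContract: Contract = { id: "number", userName: "string" };

// Returns a list of violations; an empty list means the payload honors the contract.
function verifyContract(payload: Record<string, unknown>, contract: Contract): string[] {
  const problems: string[] = [];
  for (const [field, expected] of Object.entries(contract)) {
    if (!(field in payload)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof payload[field] !== expected) {
      problems.push(`wrong type for ${field}: expected ${expected}`);
    }
  }
  return problems;
}

// Consumer side: the mock used in consumer tests must satisfy the contract.
const consumerMock = { id: 1, userName: "Test" };

// Provider side: the provider's real response is checked against the same contract.
// Here the provider has renamed the field, as in the opening story.
const renamedProviderResponse = { id: 1, username: "Test" };

console.log(verifyContract(consumerMock, userContract)); // []
console.log(verifyContract(renamedProviderResponse, userContract)); // [ 'missing field: userName' ]
```

The important part is that the same contract runs in two places: once against your mock, once against the provider's real output. Either side drifting breaks a build instead of production.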
The Honest Mock
I’m not anti-mock. Mocks are essential for fast, focused tests. But every mock is a lie you’re choosing to believe, and the question isn’t whether to mock — it’s whether your lies are up to date.
Mock what you don’t own. Test what you do own against the real thing (or as close as you can get). Expire your recordings. Check your cleanup. And if you can, add contract tests for the boundaries that matter most.
Your tests are only as honest as your mocks are current. Right now, somewhere in your test suite, a mock is returning a response shape that hasn’t existed in production for three months. It’s passing. It’s green. And it’s lying to your face.
