
Automating Yourself Out of the Loop


“Proceed”

In January, I wrote about how I use AI coding tools. The thesis was simple: you drive, AI assists. Human handles judgment, AI handles volume. Review every line. Commit with intent. It was honest, and it worked.

Four months later, I tracked every interaction across a full PR lifecycle — three issues, one pull request, plan to merge — and found that 55% of my inputs were some variation of “proceed,” “yes,” or “do it.” Not decisions. Not design calls. Just me pressing the gas pedal on a car that already knew where it was going.

Only 8% of my inputs were genuine design decisions. The rest was coordination theater.

So I built an orchestrator to automate myself out of the loop.

The Ladder Nobody Talks About

Everyone talks about AI coding tools. Copilot, Cursor, Claude Code, Devin — pick your favorite, argue on Twitter. What nobody talks about is the progression of how you use them. Not which tool, but which mode.

There’s a ladder, and most developers are stuck on the first two rungs:

Rung 1: Autocomplete. You type, AI suggests the next line. Tab to accept. This is where most developers live. It’s comfortable. You’re still writing code — just faster.

Rung 2: Inline editing. Select a block, describe what you want, get a diff. Cursor’s Cmd+K, ChatGPT in the IDE. You’ve stopped typing code and started describing code. Bigger shift than it sounds.

Rung 3: Autonomous agents. Give the AI a goal, let it plan and execute. Claude Code reads your codebase, generates files, runs tests, fixes its own mistakes. You review results, not process.

Rung 4: Skill-driven workflows. Encode repeatable patterns as single commands. /build, /fix, /review-until-clean. The agent runs a multi-step pipeline — you just trigger it (sketched below).

Rung 5: Orchestration. A meta-agent coordinates the entire lifecycle. You make design decisions at strategic checkpoints. Everything else runs without you.
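To make rung 4 concrete, here is a minimal sketch of a skill: a named command bound to a fixed pipeline of steps. The shape is the point; the format and names are hypothetical, not any particular tool's syntax.

```typescript
// A rung-4 skill: one command bound to one fixed multi-step pipeline.
// Hypothetical format; real tools define skills in their own syntax.

type Step = { agent: string; instruction: string };

const skills: Record<string, Step[]> = {
  "/build": [
    { agent: "scout", instruction: "Map the codebase and summarize the relevant files." },
    { agent: "planner", instruction: "Write an implementation plan from the scout summary." },
    { agent: "worker", instruction: "Execute the plan step by step." },
    { agent: "reviewer", instruction: "Run lint, types, and tests; report findings." },
  ],
  // "/fix" and "/review-until-clean" follow the same shape.
};

// The human's entire involvement at this rung is the trigger:
console.log(skills["/build"].map((s) => s.agent).join(" -> "));
// prints: scout -> planner -> worker -> reviewer
```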

Each rung changes what you spend your time on. At the bottom, you’re a typist. At the top, you’re an architect. The tools are the same — it’s the workflow that evolves.

January: Pair Programming With an Agent

My workflow in January looked like this:

1. Understand the problem         (human)
2. Plan the approach              (human + AI planner)
3. Generate the code              (AI)
4. Review every line              (human — non-negotiable)
5. Test it                        (AI writes, human verifies)
6. Commit                         (human decides what ships)

Steps 1, 2, 4, and 6 were mine. Steps 3 and 5 were the AI’s. I was present for everything. I reviewed every line. I triggered every tool manually — planner, code reviewer, TDD guide, build resolver. Each one was a separate command, a separate conversation.

The rule was: “If you can’t do it yourself, you probably shouldn’t be using AI to do it.”

That rule hasn’t changed. But the scope of what “doing it yourself” means has.

April: Agent Pipelines

In two days at the end of April, I built an agent pipeline system — five specialized agents chained into workflows:

Agent      Model    Job
Scout      Haiku    Fast codebase recon, compresses context to ~2000 tokens
Planner    Sonnet   Reads scout output, generates implementation plan
Worker     Sonnet   Executes plan with guards (30 tool call limit, 3 retries)
Reviewer   Sonnet   Quality gate: lint, types, tests, secret scan
Tester     Sonnet   Writes failing tests first, validates with mutation testing

These chain into workflows. /build runs scout, then planner, then worker, then reviewer. /fix skips the planner (it’s a bug, not a feature). /tdd routes through the tester before the worker.

Two things changed from January:

First, agent-to-agent handoff. The scout reads the codebase and compresses what it finds into a context blob. The planner reads that blob, not the full codebase. The worker reads the plan, not the codebase or the scout output. Each agent gets exactly the context it needs — no more, no less. This is what Anthropic calls “context engineering,” and they’re not wrong to call it the load-bearing skill of 2026.

Second, model routing. Haiku for the scout (fast, cheap, read-only). Sonnet for workers and reviewers (good enough for execution, fast enough to iterate). This isn’t just cost optimization — it’s about matching reasoning depth to the task. A scout doesn’t need to think hard. A worker executing a detailed plan doesn’t need to think hard either. You save the deep reasoning for where it matters.
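Both ideas fit in a few lines. A minimal sketch, assuming a hypothetical runAgent helper that wraps the underlying model call: each stage receives only the previous stage's output, and each stage is pinned to the cheapest model that can do its job.

```typescript
// Sketch of agent-to-agent handoff plus model routing. `runAgent` is a
// hypothetical stand-in for a real model call; the wiring is the point.

type Agent = (input: string) => Promise<string>;

const runAgent =
  (name: string, model: "haiku" | "sonnet"): Agent =>
  async (input) => `[${name}/${model} output for: ${input.slice(0, 40)}]`;

const scout = runAgent("scout", "haiku");      // fast, cheap, read-only
const planner = runAgent("planner", "sonnet"); // plans from the scout blob
const worker = runAgent("worker", "sonnet");   // executes a detailed plan
const reviewer = runAgent("reviewer", "sonnet");

// /build: scout -> planner -> worker -> reviewer
async function build(task: string): Promise<string> {
  const blob = await scout(task);   // ~2000-token compressed context
  const plan = await planner(blob); // reads the blob, not the codebase
  const diff = await worker(plan);  // reads the plan, nothing else
  return reviewer(diff);            // quality gate on the result
}

// /fix skips the planner: it's a bug, not a feature.
async function fix(bug: string): Promise<string> {
  const blob = await scout(bug);
  return reviewer(await worker(blob));
}
```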

But I was still the coordinator. I picked which task to run. I chose which workflow to trigger. I decided when to create the PR. The agents handled the work, but I handled the flow.

May: The 55% Discovery

First week of May, I started building a full orchestrator — the thing that would handle the flow too. But before writing code, I wanted data. So I tracked every single interaction across a complete PR lifecycle.

38 interactions. Three issues implemented, reviewed, and committed. PR created, reviewed at the PR level, fixed, and merged.

Here’s what I found:

What I Typed                          Count   %
“Proceed,” “yes,” “do it”             8       21%
“Review until clean,” “commit”        8       21%
“Feature complete?” “What’s next?”    5       13%
Directives with actual judgment       5       13%
Design decisions                      3       8%
Nothing (agent ran autonomously)      8       21%

Twenty-one percent of the time, I said “proceed.” Another twenty-one percent, I said “review until clean” — same instruction, every time, for every issue. Thirteen percent was me asking the agent what it already knew.

Three out of thirty-eight interactions were design decisions. Three. That’s the only part where I was irreplaceable.

The review loops told the same story:

Issue               Review Rounds   Trend
#6 (first module)   6 rounds        Agent learning the codebase
#7                  3 rounds        Patterns established
#8 (last module)    2 rounds        Agent knows the conventions

The agent got better within a single session. Issue #6 needed six review rounds because it was the first module — the agent was still learning the project’s patterns. By issue #8, it needed two. The bottleneck wasn’t the agent. It was me, saying “proceed” and waiting.

The Orchestrator

So I built the thing. TypeScript CLI with a TUI dashboard (think Lazygit, but for agent coordination). 12,200 lines. 515 tests. It automates the entire lifecycle:

Parse the PR plan. Create worktrees. Spawn workers. Implement. Verify (lint, typecheck, build, test — fail fast). Self-review. Fix findings. Create PR. Review the PR. Fix PR comments. Detect merge. Unlock the next PR group. Resume if interrupted.
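That lifecycle is a fixed sequence, which is what makes it automatable. A sketch of how the phases and the resume behavior might be modeled — the phase names mirror the steps above; the persistence details are mine:

```typescript
// The lifecycle as an ordered, resumable phase list. Phase names mirror
// the post; how state is persisted is a hypothetical detail.

const PHASES = [
  "parse-plan", "create-worktrees", "spawn-workers", "implement",
  "verify",        // lint, typecheck, build, test: first failure stops the run
  "self-review", "fix-findings", "create-pr",
  "review-pr", "fix-pr-comments", "await-merge", "unlock-next-group",
] as const;

type Phase = (typeof PHASES)[number];
type State = { completed: Phase[] };

// Resume-if-interrupted falls out of the design: skip phases already
// checkpointed, run the rest, checkpoint after each success.
async function run(state: State, exec: (p: Phase) => Promise<void>) {
  for (const phase of PHASES) {
    if (state.completed.includes(phase)) continue;
    await exec(phase);            // may throw; completed phases survive
    state.completed.push(phase);  // checkpoint
  }
}
```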

The model routing carries over from the pipeline system but gets more intentional: Opus for planning and review (where you need judgment), Sonnet for implementation and fixing (where you need speed from a well-defined spec).

The key design decision — the one that makes everything else work — is documentation as interface. The orchestrator doesn’t communicate through terminal output. It writes plans, reports, and review findings to files. When it needs a human decision, it pauses, writes the context to a document, and sends a notification. I open the file in my editor, review it, make the call, and the orchestrator resumes.

I’m reading docs in NeoVim, not parsing agent logs in a terminal. The human’s interface is their editor. The terminal is the agent’s domain.
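A minimal sketch of that pause, assuming a decisions directory and a notify hook (both hypothetical): write the question to a markdown file, then poll until the human edits it.

```typescript
// "Documentation as interface": ask by writing a file, answer by editing
// it. Paths and the notification hook are hypothetical.
import { promises as fs } from "node:fs";

const PLACEHOLDER = "(edit this section and save)";

async function askHuman(slug: string, context: string): Promise<string> {
  const path = `.orchestrator/decisions/${slug}.md`;
  await fs.mkdir(".orchestrator/decisions", { recursive: true });
  await fs.writeFile(
    path,
    `# Decision needed\n\n${context}\n\n## Your call\n\n${PLACEHOLDER}\n`,
  );
  // notify(path); // hypothetical: desktop notification, Slack ping, etc.
  for (;;) {
    const doc = await fs.readFile(path, "utf8");
    const answer = doc.split("## Your call")[1]?.trim() ?? "";
    if (answer && answer !== PLACEHOLDER) return answer; // human decided
    await new Promise((r) => setTimeout(r, 5_000));      // re-check in 5s
  }
}
```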

What Still Needs a Human

Martin Fowler wrote about this — the shift from “human in the loop” to “human on the loop.” In the loop means you’re reviewing every line, gatekeeping every step. On the loop means you’re designing the constraints, the tests, the feedback mechanisms that guide agent behavior. You intervene when it matters, not on every tick.

My orchestrator still pauses for:

  1. Design decisions — when the implementation diverges from what was expected (should killWorker take an ID or a PID?)
  2. Scope calls — fix this gap now or defer it?
  3. Priority shifts — reorder the PR queue because something urgent came in
  4. Complex plans — large, unfamiliar work should be reviewable before execution
  5. Merge timing — when to actually ship

Everything else runs. And “everything else” turns out to be 92% of the work.
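Those five checkpoints form a closed set. Modeled as a type (the names are mine, mirroring the list above), anything outside the set never pauses the run:

```typescript
// The only reasons the orchestrator stops for a human. Names are mine;
// everything else runs unattended.
type Checkpoint =
  | "design-decision" // implementation diverges from what was expected
  | "scope-call"      // fix this gap now, or defer it
  | "priority-shift"  // reorder the PR queue
  | "complex-plan"    // large, unfamiliar work: review before execution
  | "merge-timing";   // when to actually ship
```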

The Progression Is the Point

Here’s the thing nobody tells you about AI coding tools: the tools don’t change. You change. Copilot in 2023 and Claude Code in 2026 are wildly different products, but the bigger shift is in how I use them. I went from reviewing every line to designing the system that reviews for me.

Jan 2026:  Human drives every command
Apr 2026:  Human triggers pipelines, agents handle steps
May 2026:  Human makes design decisions, orchestrator handles the rest

Each jump required the same thing: running the workflow manually enough times to see which parts were judgment and which parts were mechanical. Then encoding the mechanical parts. Then building trust through verification loops — tests that must pass, reviews that must converge, acceptance criteria that must be met.
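The verification loop itself is mechanical, which is why it was encodable. A sketch, with review and applyFixes as hypothetical stand-ins for the agent calls:

```typescript
// Review-until-clean as a convergence loop: review, fix, re-review,
// until there are no findings or a round limit is hit. `review` and
// `applyFixes` are hypothetical agent-call stand-ins.

type Finding = { file: string; note: string };

async function reviewUntilClean(
  review: () => Promise<Finding[]>,
  applyFixes: (findings: Finding[]) => Promise<void>,
  maxRounds = 6, // issue #6 took 6 rounds; by issue #8 it took 2
): Promise<boolean> {
  for (let round = 1; round <= maxRounds; round++) {
    const findings = await review();
    if (findings.length === 0) return true; // converged: clean review
    await applyFixes(findings);             // fix, then go around again
  }
  return false; // no convergence: escalate to the human
}
```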

You can’t skip steps. You can’t orchestrate a workflow you haven’t done manually a dozen times. You can’t trust agents you haven’t verified. But once you’ve done the work — once you know where the judgment lives — you can automate everything else.

And then you stop saying “proceed.”

Aaron Yong
Building things for the web. Writing about development, Linux, cloud, and everything in between.
