
Automating Yourself Out of the Loop


“Proceed”

In January, I wrote about how I use AI coding tools. The thesis was simple: you drive, AI assists. Human handles judgment, AI handles volume. Review every line. Commit with intent. It was honest, and it worked.

Four months later, I tracked every interaction across a full PR lifecycle — three issues, one pull request, plan to merge — and found that 55% of my inputs were some variation of “proceed,” “yes,” or “do it.” Not decisions. Not design calls. Just me pressing the gas pedal on a car that already knew where it was going.

Only 8% of my inputs were genuine design decisions. The rest was coordination theater.

So I built an orchestrator to automate myself out of the loop.

The Ladder Nobody Talks About

Everyone talks about AI coding tools. Copilot, Cursor, Claude Code, Devin — pick your favorite, argue on Twitter. What nobody talks about is the progression of how you use them. Not which tool, but which mode.

There’s a ladder, and most developers are stuck on the first two rungs:

Rung 1: Autocomplete. You type, AI suggests the next line. Tab to accept. This is where most developers live. It’s comfortable. You’re still writing code — just faster.

Rung 2: Inline editing. Select a block, describe what you want, get a diff. Cursor’s Cmd+K, ChatGPT in the IDE. You’ve stopped typing code and started describing code. Bigger shift than it sounds.

Rung 3: Autonomous agents. Give the AI a goal, let it plan and execute. Claude Code reads your codebase, generates files, runs tests, fixes its own mistakes. You review results, not process.

Rung 4: Skill-driven workflows. Encode repeatable patterns as single commands. /build, /fix, /review-until-clean. The agent runs a multi-step pipeline — you just trigger it (sketched below).

Rung 5: Orchestration. A meta-agent coordinates the entire lifecycle. You make design decisions at strategic checkpoints. Everything else runs without you.
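To make rung 4 concrete, here is a minimal sketch of a skill: a named command bound to a fixed pipeline of steps. The shape is the point; the format and names are hypothetical, not any particular tool's syntax.

```typescript
// A rung-4 skill: one command bound to one fixed multi-step pipeline.
// Hypothetical format; real tools define skills in their own syntax.

type Step = { agent: string; instruction: string };

const skills: Record<string, Step[]> = {
  "/build": [
    { agent: "scout", instruction: "Map the codebase and summarize the relevant files." },
    { agent: "planner", instruction: "Write an implementation plan from the scout summary." },
    { agent: "worker", instruction: "Execute the plan step by step." },
    { agent: "reviewer", instruction: "Run lint, types, and tests; report findings." },
  ],
  // "/fix" and "/review-until-clean" follow the same shape.
};

// The human's entire involvement at this rung is the trigger:
console.log(skills["/build"].map((s) => s.agent).join(" -> "));
// prints: scout -> planner -> worker -> reviewer
```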

Each rung changes what you spend your time on. At the bottom, you’re a typist. At the top, you’re an architect. The tools are the same — it’s the workflow that evolves.

January: Pair Programming With an Agent

My workflow in January looked like this:

1. Understand the problem         (human)
2. Plan the approach              (human + AI planner)
3. Generate the code              (AI)
4. Review every line              (human — non-negotiable)
5. Test it                        (AI writes, human verifies)
6. Commit                         (human decides what ships)

Steps 1, 2, 4, and 6 were mine. Steps 3 and 5 were the AI’s. I was present for everything. I reviewed every line. I triggered every tool manually — planner, code reviewer, TDD guide, build resolver. Each one was a separate command, a separate conversation.

The rule was: “If you can’t do it yourself, you probably shouldn’t be using AI to do it.”

That rule hasn’t changed. But the scope of what “doing it yourself” means has.

April: Agent Pipelines

In two days at the end of April, I built an agent pipeline system — five specialized agents chained into workflows:

Agent      Model    Job
Scout      Haiku    Fast codebase recon, compresses context to ~2000 tokens
Planner    Sonnet   Reads scout output, generates implementation plan
Worker     Sonnet   Executes plan with guards (30 tool call limit, 3 retries)
Reviewer   Sonnet   Quality gate: lint, types, tests, secret scan
Tester     Sonnet   Writes failing tests first, validates with mutation testing

These chain into workflows. /build runs scout, then planner, then worker, then reviewer. /fix skips the planner (it’s a bug, not a feature). /tdd routes through the tester before the worker.

Two things changed from January:

First, agent-to-agent handoff. The scout reads the codebase and compresses what it finds into a context blob. The planner reads that blob, not the full codebase. The worker reads the plan, not the codebase or the scout output. Each agent gets exactly the context it needs — no more, no less. This is what Anthropic calls “context engineering,” and they’re not wrong to call it the load-bearing skill of 2026.

Second, model routing. Haiku for the scout (fast, cheap, read-only). Sonnet for workers and reviewers (good enough for execution, fast enough to iterate). This isn’t just cost optimization — it’s about matching reasoning depth to the task. A scout doesn’t need to think hard. A worker executing a detailed plan doesn’t need to think hard either. You save the deep reasoning for where it matters.
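Both ideas fit in a few lines. A minimal sketch, assuming a hypothetical runAgent helper that wraps the underlying model call: each stage receives only the previous stage's output, and each stage is pinned to the cheapest model that can do its job.

```typescript
// Sketch of agent-to-agent handoff plus model routing. `runAgent` is a
// hypothetical stand-in for a real model call; the wiring is the point.

type Agent = (input: string) => Promise<string>;

const runAgent =
  (name: string, model: "haiku" | "sonnet"): Agent =>
  async (input) => `[${name}/${model} output for: ${input.slice(0, 40)}]`;

const scout = runAgent("scout", "haiku");      // fast, cheap, read-only
const planner = runAgent("planner", "sonnet"); // plans from the scout blob
const worker = runAgent("worker", "sonnet");   // executes a detailed plan
const reviewer = runAgent("reviewer", "sonnet");

// /build: scout -> planner -> worker -> reviewer
async function build(task: string): Promise<string> {
  const blob = await scout(task);   // ~2000-token compressed context
  const plan = await planner(blob); // reads the blob, not the codebase
  const diff = await worker(plan);  // reads the plan, nothing else
  return reviewer(diff);            // quality gate on the result
}

// /fix skips the planner: it's a bug, not a feature.
async function fix(bug: string): Promise<string> {
  const blob = await scout(bug);
  return reviewer(await worker(blob));
}
```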

But I was still the coordinator. I picked which task to run. I chose which workflow to trigger. I decided when to create the PR. The agents handled the work, but I handled the flow.

May: The 55% Discovery

First week of May, I started building a full orchestrator — the thing that would handle the flow too. But before writing code, I wanted data. So I tracked every single interaction across a complete PR lifecycle.

38 interactions. Three issues implemented, reviewed, and committed. PR created, reviewed at the PR level, fixed, and merged.

Here’s what I found:

What I Typed                          Count   %
“Proceed,” “yes,” “do it”             8       21%
“Review until clean,” “commit”        8       21%
“Feature complete?” “What’s next?”    5       13%
Directives with actual judgment       5       13%
Design decisions                      3       8%
Nothing (agent ran autonomously)      8       21%

Twenty-one percent of the time, I said “proceed.” Another twenty-one percent, I said “review until clean” — same instruction, every time, for every issue. Thirteen percent was me asking the agent what it already knew.

Three out of thirty-eight interactions were design decisions. Three. That’s the only part where I was irreplaceable.

The review loops told the same story:

Issue               Review Rounds   Trend
#6 (first module)   6 rounds        Agent learning the codebase
#7                  3 rounds        Patterns established
#8 (last module)    2 rounds        Agent knows the conventions

The agent got better within a single session. Issue #6 needed six review rounds because it was the first module — the agent was still learning the project’s patterns. By issue #8, it needed two. The bottleneck wasn’t the agent. It was me, saying “proceed” and waiting.

The Orchestrator

So I built the thing. TypeScript CLI with a TUI dashboard (think Lazygit, but for agent coordination). 12,200 lines. 515 tests. It automates the entire lifecycle:

Parse the PR plan. Create worktrees. Spawn workers. Implement. Verify (lint, typecheck, build, test — fail fast). Self-review. Fix findings. Create PR. Review the PR. Fix PR comments. Detect merge. Unlock the next PR group. Resume if interrupted.
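That lifecycle is a fixed sequence, which is what makes it automatable. A sketch of how the phases and the resume behavior might be modeled — the phase names mirror the steps above; the persistence details are mine:

```typescript
// The lifecycle as an ordered, resumable phase list. Phase names mirror
// the post; how state is persisted is a hypothetical detail.

const PHASES = [
  "parse-plan", "create-worktrees", "spawn-workers", "implement",
  "verify",        // lint, typecheck, build, test: first failure stops the run
  "self-review", "fix-findings", "create-pr",
  "review-pr", "fix-pr-comments", "await-merge", "unlock-next-group",
] as const;

type Phase = (typeof PHASES)[number];
type State = { completed: Phase[] };

// Resume-if-interrupted falls out of the design: skip phases already
// checkpointed, run the rest, checkpoint after each success.
async function run(state: State, exec: (p: Phase) => Promise<void>) {
  for (const phase of PHASES) {
    if (state.completed.includes(phase)) continue;
    await exec(phase);            // may throw; completed phases survive
    state.completed.push(phase);  // checkpoint
  }
}
```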

The model routing carries over from the pipeline system but gets more intentional: Opus for planning and review (where you need judgment), Sonnet for implementation and fixing (where you need speed from a well-defined spec).

The key design decision — the one that makes everything else work — is documentation as interface. The orchestrator doesn’t communicate through terminal output. It writes plans, reports, and review findings to files. When it needs a human decision, it pauses, writes the context to a document, and sends a notification. I open the file in my editor, review it, make the call, and the orchestrator resumes.

I’m reading docs in NeoVim, not parsing agent logs in a terminal. The human’s interface is their editor. The terminal is the agent’s domain.
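A minimal sketch of that pause, assuming a decisions directory and a notify hook (both hypothetical): write the question to a markdown file, then poll until the human edits it.

```typescript
// "Documentation as interface": ask by writing a file, answer by editing
// it. Paths and the notification hook are hypothetical.
import { promises as fs } from "node:fs";

const PLACEHOLDER = "(edit this section and save)";

async function askHuman(slug: string, context: string): Promise<string> {
  const path = `.orchestrator/decisions/${slug}.md`;
  await fs.mkdir(".orchestrator/decisions", { recursive: true });
  await fs.writeFile(
    path,
    `# Decision needed\n\n${context}\n\n## Your call\n\n${PLACEHOLDER}\n`,
  );
  // notify(path); // hypothetical: desktop notification, Slack ping, etc.
  for (;;) {
    const doc = await fs.readFile(path, "utf8");
    const answer = doc.split("## Your call")[1]?.trim() ?? "";
    if (answer && answer !== PLACEHOLDER) return answer; // human decided
    await new Promise((r) => setTimeout(r, 5_000));      // re-check in 5s
  }
}
```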

What Still Needs a Human

Martin Fowler wrote about this — the shift from “human in the loop” to “human on the loop.” In the loop means you’re reviewing every line, gatekeeping every step. On the loop means you’re designing the constraints, the tests, the feedback mechanisms that guide agent behavior. You intervene when it matters, not on every tick.

My orchestrator still pauses for:

  1. Design decisions — when the implementation diverges from what was expected (should killWorker take an ID or a PID?)
  2. Scope calls — fix this gap now or defer it?
  3. Priority shifts — reorder the PR queue because something urgent came in
  4. Complex plans — large, unfamiliar work should be reviewable before execution
  5. Merge timing — when to actually ship

Everything else runs. And “everything else” turns out to be 92% of the work.
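Those five checkpoints form a closed set. Modeled as a type (the names are mine, mirroring the list above), anything outside the set never pauses the run:

```typescript
// The only reasons the orchestrator stops for a human. Names are mine;
// everything else runs unattended.
type Checkpoint =
  | "design-decision" // implementation diverges from what was expected
  | "scope-call"      // fix this gap now, or defer it
  | "priority-shift"  // reorder the PR queue
  | "complex-plan"    // large, unfamiliar work: review before execution
  | "merge-timing";   // when to actually ship
```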

The Progression Is the Point

Here’s the thing nobody tells you about AI coding tools: the tools don’t change. You change. Copilot in 2023 and Claude Code in 2026 are wildly different products, but the bigger shift is in how I use them. I went from reviewing every line to designing the system that reviews for me.

Jan 2026:  Human drives every command
Apr 2026:  Human triggers pipelines, agents handle steps
May 2026:  Human makes design decisions, orchestrator handles the rest

Each jump required the same thing: running the workflow manually enough times to see which parts were judgment and which parts were mechanical. Then encoding the mechanical parts. Then building trust through verification loops — tests that must pass, reviews that must converge, acceptance criteria that must be met.
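The verification loop itself is mechanical, which is why it was encodable. A sketch, with review and applyFixes as hypothetical stand-ins for the agent calls:

```typescript
// Review-until-clean as a convergence loop: review, fix, re-review,
// until there are no findings or a round limit is hit. `review` and
// `applyFixes` are hypothetical agent-call stand-ins.

type Finding = { file: string; note: string };

async function reviewUntilClean(
  review: () => Promise<Finding[]>,
  applyFixes: (findings: Finding[]) => Promise<void>,
  maxRounds = 6, // issue #6 took 6 rounds; by issue #8 it took 2
): Promise<boolean> {
  for (let round = 1; round <= maxRounds; round++) {
    const findings = await review();
    if (findings.length === 0) return true; // converged: clean review
    await applyFixes(findings);             // fix, then go around again
  }
  return false; // no convergence: escalate to the human
}
```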

You can’t skip steps. You can’t orchestrate a workflow you haven’t done manually a dozen times. You can’t trust agents you haven’t verified. But once you’ve done the work — once you know where the judgment lives — you can automate everything else.

And then you stop saying “proceed.”

Aaron Yong
Building things for the web. Writing about development, Linux, cloud, and everything in between.
