AutonomousHQ

The Human Cost Nobody Counts: Supervision, Correction, and the Real Price of Running AI Agents

Everyone talks about token costs. Nobody accounts for the time a human spends redirecting, correcting, and re-briefing AI agents. That cost is often larger - and it scales badly.

analysis · economics · ai agents · autonomous companies · costs

Felix, the most-cited example of an autonomous business in 2026, generates revenue with no employees. What it also has is Nat Eliason - who built the operating manual that Felix runs on, who refined the agent instructions over months, and who still makes judgment calls when Felix hits something it hasn't been briefed on.

The Felix story is accurate. The framing most people put on it is not.

The conversation around autonomous companies has become obsessed with token costs - and rightly so, because they are real and they compound fast. But the cost that rarely gets measured is the human time that wraps every autonomous operation: setting up, briefing, redirecting, correcting, and debugging agents that have gone sideways. That cost doesn't appear on an API invoice. It doesn't show up in the dashboards that get screenshotted for demos. And right now, for most operations, it is larger than the token bill.

The invisible labour

Consider what "running" an AI agent actually involves.

A well-functioning agent needs a clear brief that anticipates edge cases. It needs access to the right tools. It needs someone to check whether the output is correct - not just whether it completed. It needs redirection when it misinterprets the goal. And it needs the human to notice when something has gone wrong rather than waiting to be told.

None of that is passive.

At AutonomousHQ, we're tracking this from the inside. The honest version of the numbers so far: for every hour of autonomous output the agents produce, there is a corresponding period of human review, correction, and re-briefing. Sometimes that overhead is small - a quick redirect and the agent gets back on track. Sometimes an agent builds the wrong thing entirely and the work has to be discarded.

The Supabase authentication system is the clearest example. An engineering agent was asked to build a Discord invite flow. Instead, it built a complete email/password account system with database tables, migrations, and email confirmation logic. Everything worked. None of it was the right thing. Tearing it out and rebuilding from the correct brief took longer than starting from scratch would have.

That is not a bad agent. That is an accurate picture of where the tooling sits.

Why the overhead exists

The root cause is the gap between specification and execution. Writing a brief that leaves no room for misinterpretation is genuinely hard - and it gets harder as the task gets more complex. Most humans are not good at writing complete specifications because they've spent their careers working with other humans who fill gaps using shared context and judgment. Agents don't have that shared context.

The result is that the human ends up doing the specification work that the brief was supposed to handle, only they do it reactively, after the agent has already gone in the wrong direction.

This is a temporary problem in the sense that models are getting better at inference and at asking clarifying questions before they start. But it is the operational reality in early 2026, and the operators who are building sustainably are the ones who have accepted it rather than hidden from it.

What this means for the economics

The fully-loaded cost of an autonomous operation is:

  • Token costs (API usage)
  • Infrastructure (hosting, storage, tooling subscriptions)
  • Human supervision time, valued at the operator's effective hourly rate

The third item is almost never in the public accounting.

If a founder earning £100 an hour spends three hours a day redirecting and correcting agents, that is £300 a day in implicit supervision cost - on top of whatever the API is billing. For an operation generating £1,000 MRR, the supervision cost alone can exceed the revenue. The agents are not replacing the human. They are changing what the human does, and in the early stages, they are doing it without reducing the total hours.
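The arithmetic above can be sketched as a small calculator. The token and infrastructure figures below are illustrative placeholders, not AutonomousHQ's actual numbers; only the £100/hour rate and three supervision hours come from the example in the text.

```python
def fully_loaded_daily_cost(token_cost, infra_cost,
                            supervision_hours, hourly_rate):
    """Daily cost including the implicit supervision labour that
    never appears on the API invoice. All amounts in GBP."""
    supervision = supervision_hours * hourly_rate
    return token_cost + infra_cost + supervision

# Founder at GBP 100/hour spending three hours/day correcting agents,
# with hypothetical token and infrastructure costs:
total = fully_loaded_daily_cost(token_cost=40, infra_cost=10,
                                supervision_hours=3, hourly_rate=100)
print(total)  # 350 - of which only 40 would show up on the API bill
```

The point of writing it out: the supervision term dominates the visible costs, and it is the one term most dashboards omit.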

This is not a reason not to build with agents. It is a reason to be precise about what you are building - and to track the actual cost rather than just the API bill.

Where the ratio improves

The operators who have got the supervision overhead down have done it in one of three ways.

Better prompts. Felix's instructions are not a paragraph. They are an operating manual - detailed enough that the agent can interpret an ambiguous situation without asking. Building prompts to that standard takes significant up-front time, but it pays back quickly in reduced correction cycles.

Narrower scope. Agents that do one specific thing reliably are dramatically cheaper to supervise than agents with broad mandates. A content agent that writes articles in a defined format with defined sources needs far less redirection than one asked to "manage the content operation." Scope creep is expensive.

Structured handoffs. Some operations have reduced correction overhead by building explicit check-in points into the workflow. The agent completes a discrete unit of work, a human reviews it at a defined checkpoint, and work only proceeds if it passes. This trades latency for accuracy - and in most cases the trade is worth it.
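The checkpoint pattern can be sketched in a few lines. The function names here are hypothetical illustrations of the pattern, not any real agent framework's API: work proceeds one discrete unit at a time, and a human review gate must pass before the next unit starts.

```python
def run_with_checkpoints(units, do_unit, review, max_retries=3):
    """Process work units one at a time, pausing at a human review
    gate after each. A unit is redone until it passes review (or the
    retry budget runs out), and only then does work proceed."""
    accepted = []
    for unit in units:
        result = do_unit(unit)
        attempts = 1
        while not review(result) and attempts < max_retries:
            result = do_unit(unit)  # redo after a failed checkpoint
            attempts += 1
        accepted.append(result)
    return accepted

# Toy example: the "agent" upper-cases a draft, the "reviewer"
# checks the output is non-empty.
out = run_with_checkpoints(["draft intro", "draft body"],
                           do_unit=str.upper,
                           review=lambda r: len(r) > 0)
print(out)  # ['DRAFT INTRO', 'DRAFT BODY']
```

The latency cost is explicit in the structure: nothing after a checkpoint runs until the review returns. That is the trade the article describes.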

None of these solutions are clever. They are what any experienced manager does when setting up a new team member. The difference is that with an agent, there is a strong temptation to skip straight to "autonomous" because the demos look so capable.

The compounding problem

Here is why this matters at scale.

If you run one agent with £50 a day in supervision overhead, that is manageable - roughly the cost of a few hours of part-time help per week. If you run ten agents with the same per-agent overhead, that is £500 a day, or £182,500 a year in implicit labour cost, not counting API fees.
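Because per-agent overhead scales linearly, the annual figure follows directly - a one-line check of the numbers above:

```python
# Supervision overhead scales linearly with agent count.
per_agent_daily = 50            # GBP of supervision per agent per day
agents = 10

daily = per_agent_daily * agents
annual = daily * 365            # implicit labour cost, excluding API fees
print(daily, annual)  # 500 182500
```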

The bet that autonomous companies are making is that supervision overhead decreases as a function of better models, better prompts, and better tooling - and that token costs also fall as the API market matures. Both trends are real. Neither is happening as fast as the demos suggest.

The operations building for the long term are the ones treating supervision cost as a first-class metric: measuring it explicitly, setting targets for reducing it, and publishing the numbers rather than just the token bill.

What to actually track

If you are running AI agents for real work, these are the numbers worth recording:

  • Corrections per task - how often does a task require human intervention before it's complete?
  • Rework rate - how often does a completed task need to be discarded or rebuilt?
  • Briefing time - how long does it take to write a brief that produces a correct first attempt?
  • Supervision hours per week - total human time spent monitoring and redirecting agents

These numbers will be bad at the start. They should be getting better over time. If they are not, the problem is in the prompts or the process - not the models.
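The four metrics above can be derived from a simple per-task log. The record fields and function below are an illustrative sketch, not AutonomousHQ's actual tracker schema:

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    corrections: int            # human interventions before completion
    discarded: bool             # did finished work have to be rebuilt?
    briefing_minutes: float     # time spent writing the brief
    supervision_minutes: float  # time spent monitoring/redirecting

def summarise(records):
    """Roll a task log up into the four supervision metrics."""
    n = len(records)
    return {
        "corrections_per_task": sum(r.corrections for r in records) / n,
        "rework_rate": sum(r.discarded for r in records) / n,
        "avg_briefing_minutes": sum(r.briefing_minutes for r in records) / n,
        "supervision_hours": sum(r.supervision_minutes for r in records) / 60,
    }

# Hypothetical week of three tasks, one of which was thrown away:
log = [TaskRecord(2, False, 20, 45),
       TaskRecord(0, False, 35, 10),
       TaskRecord(4, True, 15, 90)]
print(summarise(log))
```

Tracked weekly, the trend in these four numbers is the signal: flat or rising values point at the prompts or the process, as the article argues.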

We're publishing all of AutonomousHQ's own supervision metrics on the public tracker on this site. If you want an honest baseline for what a single-human-operated, six-agent operation looks like in early 2026, that's where to find it.


Follow along. Tim is running this experiment live on YouTube - every correction, every redirect, every wrong implementation on camera. Sign up to the AutonomousHQ newsletter for weekly updates on the real economics of running AI agents.