Human-in-the-Loop Is a Temporary Placebo
Sixty percent of enterprises require human oversight before AI agents access sensitive data. It feels responsible. It's a placebo with a 24-month shelf life.
According to KPMG’s Q4 AI Pulse Survey, sixty percent of enterprises now require human oversight before AI agents can access sensitive data. Security teams celebrate this as responsible deployment. Compliance officers sleep better. The board is reassured that humans remain in control.
It’s a placebo. And like most placebos, it works until you realize what’s actually happening.
Human-in-the-loop isn’t a safety architecture. It’s a confession that you don’t trust your own deployment, and a temporary tax you’re paying while you figure out how to remove it. The organizations treating HITL as a permanent design choice are building systems that will be uncompetitive within 24 months.
The Math Doesn’t Work
Consider what human oversight actually costs. Every time an agent pauses for approval, you’re paying for the human’s time, the delay in processing, and the cognitive load of context-switching. A finance team approving 500 agent decisions per day isn’t supervising AI; it’s a bottleneck dressed up as governance.
The entire value proposition of AI agents is that they operate faster and cheaper than humans at scale. Insert a human checkpoint into every sensitive workflow and you’ve negated the scale advantage while keeping the implementation cost. You’ve built an expensive system that performs like a cheap one.
This isn’t hypothetical. I’ve watched organizations deploy invoice processing agents that require human approval for any transaction over $10,000. The agent handles the matching, the validation, the GL coding, then waits. Sometimes for hours. Meanwhile, early payment discounts expire and vendor relationships strain. The agent did 95% of the work. The human added 80% of the delay.
The Trust Paradox
Here’s what HITL advocates won’t say out loud: if you don’t trust the agent to make the decision, why do you trust it to frame the decision?
The human reviewer sees what the agent shows them. They approve based on the context the agent provides. In most workflows, the human isn’t independently verifying; they’re rubber-stamping the agent’s recommendation with a thin veneer of oversight. The agent has already made the decision. The human just signs off.
This creates the worst of both worlds. You get the delay of human involvement without the rigor of human judgment. The approval step feels like control, but it’s theater. Real control would mean the human doing the work themselves, which defeats the purpose of deploying agents.
The honest version of HITL is this: we’re not ready to let the agent act autonomously, so we’re adding friction to slow it down until we build confidence. That’s a valid transition strategy. It’s not a valid end state.
What Replaces the Human?
The organizations moving past HITL aren’t removing oversight; they’re relocating it. Instead of humans approving individual decisions, they’re building systems where humans define boundaries and agents operate freely within them.
Think of it as the difference between approving every expense report versus setting an expense policy. One scales. The other doesn’t.
In practice, this means shifting human effort from transaction-level review to three higher-value activities:
Boundary definition. Humans decide what agents can and cannot do. Which accounts are sensitive? What dollar thresholds trigger escalation? What vendor categories require additional scrutiny? These decisions happen once and apply to thousands of transactions.
Exception handling. Agents operate autonomously within boundaries, but flag genuine anomalies for human review. The human isn’t approving routine decisions; they’re resolving edge cases the agent wasn’t designed to handle. This is where human judgment actually adds value.
Continuous calibration. Humans review agent performance in aggregate, adjusting boundaries and retraining models based on patterns. Instead of approving transaction #4,847, they’re asking why the agent’s accuracy dropped 2% last month and what to do about it.
This model treats humans as architects and auditors, not gatekeepers. It scales because human effort compounds rather than accumulates.
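One way to picture the shift is policy-as-code: humans encode the boundaries once, and the agent applies them to every transaction, escalating only the edge cases. The sketch below is illustrative, not a real system; the field names, account labels, and thresholds are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical boundary policy. Humans set these values once; the agent
# applies them to thousands of transactions. All names are illustrative.
@dataclass
class Boundary:
    max_autonomous_amount: float   # dollar ceiling for unassisted action
    sensitive_accounts: set        # accounts that always escalate
    flagged_vendor_categories: set # categories needing extra scrutiny

def route(txn: dict, policy: Boundary) -> str:
    """Return 'autonomous' for in-bounds work, 'escalate' for edge cases."""
    if txn["account"] in policy.sensitive_accounts:
        return "escalate"
    if txn["vendor_category"] in policy.flagged_vendor_categories:
        return "escalate"
    if txn["amount"] > policy.max_autonomous_amount:
        return "escalate"
    return "autonomous"

policy = Boundary(
    max_autonomous_amount=10_000,
    sensitive_accounts={"payroll", "treasury"},
    flagged_vendor_categories={"new_vendor"},
)

# Routine work flows through without a human touching it;
# only genuine anomalies reach a reviewer.
print(route({"account": "ap", "vendor_category": "office", "amount": 4_200}, policy))   # autonomous
print(route({"account": "treasury", "vendor_category": "office", "amount": 50}, policy)) # escalate
```

The human effort here is in writing and calibrating the `Boundary` values, not in reviewing each call to `route`. That is the compounding-versus-accumulating distinction in miniature.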
The 24-Month Window
Why 24 months? Because the economics will force the transition.
Right now, organizations can afford HITL because AI deployment is still early enough that competitive pressure is limited. Everyone’s moving slowly, so moving slowly doesn’t hurt. But adoption is accelerating. The 600% year-over-year growth in agentic AI usage means the gap between HITL-constrained organizations and autonomous-agent organizations will become visible fast.
When your competitor processes invoices in minutes while you process them in hours, the board stops celebrating your careful governance. When their working capital efficiency is 15% better because agents optimize payment timing without waiting for approval, your CFO starts asking hard questions.
The organizations that treat HITL as permanent architecture will face a painful choice: rip out the oversight layer under competitive pressure or watch margins erode while faster competitors take share. Neither option is pleasant. Both are avoidable if you plan the transition now.
The Transition Playbook
Moving from HITL to autonomous operation isn’t reckless; it’s methodical. Start with low-stakes workflows where errors are easily reversed. Build confidence through measurable performance. Gradually raise the ceiling on what agents can do without approval.
The key metric isn’t accuracy. It’s recovery time. An agent that makes occasional mistakes but corrects them in minutes is more valuable than an agent that waits for human approval to avoid mistakes. Speed of error correction determines whether autonomy is safe, not perfection of initial output.
Track your human approval data. What percentage of agent recommendations get approved without modification? If it’s above 95%, your humans aren’t adding judgment; they’re adding latency. That’s your signal to remove the checkpoint.
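Computing that signal from an approval log is a few lines. This is a hypothetical sketch; the log schema (a `decision` field and a `modified` flag per entry) is an assumption about what an approval system might record.

```python
# Hypothetical approval-log analysis. If nearly every agent recommendation
# is approved unmodified, the checkpoint is adding latency, not judgment.

def unmodified_approval_rate(log: list[dict]) -> float:
    """Fraction of all decisions approved without any human modification."""
    if not log:
        return 0.0
    unmodified = [e for e in log
                  if e["decision"] == "approved" and not e["modified"]]
    return len(unmodified) / len(log)

log = [
    {"decision": "approved", "modified": False},
    {"decision": "approved", "modified": False},
    {"decision": "approved", "modified": True},
    {"decision": "rejected", "modified": False},
]

rate = unmodified_approval_rate(log)
print(f"{rate:.0%} approved without modification")  # 50% approved without modification
if rate > 0.95:
    print("checkpoint is a candidate for removal")
```

On real volumes you would segment this by workflow and dollar band before removing anything; a 99% rate on small invoices and a 70% rate on large ones argue for different ceilings, not one switch.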
The Real Question
Human-in-the-loop feels responsible because it preserves the appearance of control. But appearance isn’t the same as reality. The organizations that understand this will spend the next 24 months systematically eliminating human checkpoints, replacing them with better boundaries, smarter exception handling, and continuous calibration.
The question isn’t whether to remove the human from the loop. It’s whether you’ll do it deliberately, on your timeline, with proper architecture, or whether competitive pressure will force you to do it hastily, under duress, with whatever shortcuts you can manage.
The placebo is wearing off. The only question is what you’ll replace it with.