Review is the New Production Line

Jun 05, 2026

AI is already speeding up the development of AI. Not in the bootstrapped-successor sense the recursive self-improvement debate keeps gesturing at — in the boring, measurable sense that eighty percent of merged code at Anthropic last quarter was written by Claude, and the engineers who didn’t write it spent their day reading it.

Marina Favaro and Jack Clark, of the Anthropic Institute, published the numbers in a piece most people will read as a forecast. The numbers aren’t a forecast. They’re a description of what happens to an organization when the cost of producing work falls to nearly zero while the cost of judging it doesn’t move. And they’re a leading indicator for every white-collar field structured the same way.

The frame everyone’s using is wrong

The conventional story about AI and white-collar work goes like this. Models will keep getting better at execution. Humans will keep owning judgment. The roles that survive will be the ones where taste and review live. Hold the line at the review layer and you’re fine.

Anthropic’s data breaks this frame in a specific way.

When the model writes eighty percent of merged code, the human review queue is the assembly line. Amdahl’s law, dropped on an org chart: the part you didn’t speed up sets the pace for everything else. At Anthropic the speed limit moved twice in a year. First to code review. Then to the gap between generating ideas and pursuing them. Review didn’t stay a safety margin. It became the constraint.
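To put rough numbers on the Amdahl’s law point, here is a minimal sketch. It assumes, purely for illustration, that writing code used to take 80% of the time needed to ship a change and that review took the other 20%. That time split and the function names are assumptions of this example, not figures from the Institute’s piece.

```python
import math


def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the work gets faster (Amdahl's law).

    accelerated_fraction: share of total time spent in the stage that sped up.
    factor: how many times faster that stage became.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - accelerated_fraction  # e.g. human review
    return 1.0 / (untouched + accelerated_fraction / factor)


def speedup_ceiling(accelerated_fraction: float) -> float:
    """Best possible overall speedup if the accelerated stage became free."""
    untouched = 1.0 - accelerated_fraction
    return math.inf if untouched == 0 else 1.0 / untouched


if __name__ == "__main__":
    # Hypothetical split: 80% of time writing, 20% reviewing.
    for factor in (2, 10, 100):
        print(factor, round(amdahl_speedup(0.8, factor), 2))
    print("ceiling", round(speedup_ceiling(0.8), 2))
```

Under those assumed numbers, making the writing stage ten times faster yields roughly a 3.6x overall gain, and no writing speedup, however large, gets past 5x. The untouched 20% is the ceiling, which is the sense in which review sets the pace.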

The implications stack quickly. If review is the production line, then every choice you make about how reviews happen — who does them, what tooling they get, how they queue, how much context they carry — is now a choice about throughput, not safety. A senior engineer who spends two hours a day on PRs used to be a wise overhead. Now they’re the floor. The org doesn’t ship faster than they read.

What “taste” actually was

There’s a secondary result in the Institute’s piece that should bother anyone betting their career on judgment as a moat. In a study of Claude Code sessions where a human had taken a wrong turn, the November 2025 model picked a better next step than the human fifty-one percent of the time. By April 2026, a preview model was at sixty-four percent.

You’ve seen this movie. Models were bad at theory of mind, then they weren’t. Bad at multi-step planning, then they weren’t. Bad at long-horizon coding tasks, then the doubling time dropped from seven months to four. The pattern is consistent enough that pretending taste sits outside it requires a story about what makes taste categorically different from every cognitive skill that fell. There isn’t one. Taste is pattern recognition over a small number of examples. Models get more examples every quarter.

The honest read: taste buys you a year or two as a defensible skill, not a decade.

This is uncomfortable in a particular way. Taste is what professionals tell themselves they’re really paid for. The associate writes the memo, the partner has the taste. The analyst pulls the data, the PM has the taste. The artist generates options, the director has the taste. When taste itself starts losing the close calls, the org chart underneath that story starts to wobble.

The one skill that compounds

So what’s left? The Institute names a single capability that compounds with human ownership: spotting the bottleneck the previous quarter created, and dissolving it before it ossifies. This is the only skill in their data that gets more valuable as everything around it speeds up.

Notice what this is. It’s not coding. It’s not direction-setting. It’s not even strategy in the usual sense. It’s a kind of operational meta-attention: the ability to look at a system that just got eight times faster and ask, with no ego attached, where the new floor is.

Most people can’t do this on their own work. They built the system. They love the parts that just got fast. They don’t want to admit that the new constraint is them.

The skill is closer to a coroner’s discipline than a manager’s. You walk in, declare what’s dead, and reroute. Whoever’s good at this in 2027 will be running things in 2030.

This isn’t an AI story

Here’s where the Institute’s piece points outside itself.

Every field with the structure “expert generates artifact, expert reviews artifact” is on the same curve, lagged by the gap between Anthropic’s coding loop and your industry’s tooling. Law. Consulting. Design. Investing. Diligence. Editorial. Medicine. Each of these has been telling itself a version of “the doing will get cheaper but the judgment is ours.” Each will, in turn, discover that judgment was load-bearing in a different way than it claimed.

When the artifact arrives in seconds, the bottleneck moves to whoever has to bless it. Blessing was always the work. We just didn’t price it that way, because doing took long enough that blessing looked like a coda.

A law firm thinks it’s in the business of expert judgment. It’s actually in the business of expert review of associate-drafted artifacts. Replace the associates with a model and the bottleneck doesn’t disappear, it surfaces. The partner who could supervise four associates can supervise — what, twelve agents? Twenty? The answer turns out to depend on tooling nobody’s built yet. Same logic in every field where senior people review what junior people produce.

The white-collar shape of the next decade isn’t fewer humans doing the old jobs. It’s the same humans, doing only the review half of the old jobs, with the throughput question reframed as: how fast can you bless?

What changes Monday

If you ship software, you should treat your review capacity as the main constraint for the coming year. You need to rebuild for it now, including tooling, staffing, queue design, and everything else. The teams that will win the next cycle are not the ones with the best agents. They are the ones whose humans can process agent output at the same rate agents produce it.
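A back-of-the-envelope way to see why matching rates matters: a minimal sketch of a review queue, assuming a constant number of agent-produced changes per day and a constant human review capacity. The function names and the example rates are hypothetical, chosen only to illustrate the point.

```python
def shipped_per_day(produced_per_day: float, reviewed_per_day: float) -> float:
    """In steady state, shipping is capped by the slower of the two stages."""
    return min(produced_per_day, reviewed_per_day)


def review_backlog(produced_per_day: float, reviewed_per_day: float, days: int) -> list[float]:
    """Unreviewed changes waiting at the end of each day, starting from empty."""
    backlog = 0.0
    history = []
    for _ in range(days):
        # New work arrives, reviewers clear what they can; backlog never goes negative.
        backlog = max(0.0, backlog + produced_per_day - reviewed_per_day)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical team: agents open 30 changes a day, humans can review 20.
    print(shipped_per_day(30, 20))
    print(review_backlog(30, 20, days=5))
```

With those made-up rates the team ships 20 changes a day no matter how many the agents open, and the queue grows by 10 every day. Better agents raise the first number; only review capacity raises the second.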

If you ship anything else, like research, analysis, content, or decisions, assume the same logic will apply to your field on a slightly slower timeline. Find the part of your work that involves generating artifacts. Imagine that part becomes free. Look at what is left. That will be your job in three years.

If you’re hiring, the role description you’ve been writing is already wrong. You’re not hiring for the doing. You’re hiring for the readiness to look at last quarter’s output, find the slowest joint, and break it. Call it whatever you want. Most of the candidates you talk to will not be able to do it because they spent twenty years being praised for the opposite — for owning the doing, for being the deep expert in the part that just stopped being scarce.

The loop hasn’t closed. The Anthropic piece is careful to say so, and the caveat matters. What’s closed is the era when “humans judge, machines execute” was a stable description of the work. We’re in the next era now, and the surprising thing about it is how much of the future is just learning to review fast enough to keep up with ourselves.

The Intent Layer
