Why Every Lab Is Building a Coding Agent (And None of It Is About Code)

Frontier labs are betting on code as a proxy for general computer use, aiming to automate all knowledge work, not just programming.

Jun 29, 2026

Every frontier lab has shipped a coding agent in the last year: OpenAI, Anthropic, Google, and the open-weight teams behind Kimi and DeepSeek. If you only watched the launches, you’d think software was the only work left to automate and these companies were racing to serve the same few million programmers. That’s not what’s happening. The coding agent is the most general bet any of these labs has placed, and the fact that it writes code is almost incidental.

Here’s what the launches don’t say out loud: coding is a proxy for computer use. A command line, a handful of scripts, and an API on nearly every service get you most of the way to automating any knowledge work, not just software. When a model can write and run code, it can rename ten thousand files, reconcile two spreadsheets, pull a report from an internal tool, reformat it, and email the result. None of that is “programming” in the sense we picture when we imagine an engineer at a standup. It’s office work. The coding agent just happens to be the body that can do it.

Watch a capable one work and the disguise gets obvious. It writes a three-line script to do what a person would have done with forty clicks across three windows, runs it, reads the output, and decides what to do next. The code is scaffolding it throws away the moment the task is done. What’s left is the result, which is the only part anyone wanted.
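Here is the forty-clicks comparison made concrete: a minimal sketch of the kind of throwaway script described above, reconciling two spreadsheets exported as CSV. The column layout (an `id` key and an `amount` field) and the `reconcile` helper are hypothetical choices for illustration. They are not output from any particular agent.

```python
import csv
import io
from decimal import Decimal


def load(f, key="id"):
    # Index each CSV row by its key column; assumes keys are unique within a file.
    return {row[key]: row for row in csv.DictReader(f)}


def reconcile(ledger, bank, key="id", field="amount"):
    """Report rows missing from either export and rows whose amounts differ.

    Hypothetical layout: both files share a key column and an amount column.
    Decimal is used so "100" and "100.00" compare as equal.
    """
    a, b = load(ledger, key), load(bank, key)
    return {
        "only_in_ledger": sorted(a.keys() - b.keys()),
        "only_in_bank": sorted(b.keys() - a.keys()),
        "mismatched": sorted(
            k for k in a.keys() & b.keys()
            if Decimal(a[k][field]) != Decimal(b[k][field])
        ),
    }


if __name__ == "__main__":
    # Inline sample data stands in for two exported spreadsheets.
    ledger = io.StringIO("id,amount\n1,100.00\n2,250.00\n3,75.50\n")
    bank = io.StringIO("id,amount\n1,100.00\n2,240.00\n4,12.00\n")
    print(reconcile(ledger, bank))
    # {'only_in_ledger': ['3'], 'only_in_bank': ['4'], 'mismatched': ['2']}
```

The script is disposable in exactly the sense the paragraph means: what matters is the printed report, which the agent reads before deciding what to do next.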

The computer was always programmable

We forgot that the computer was always programmable. For its first few decades, that’s all it was. You typed instructions and it followed them. The graphical interface was an act of translation built for people: it took the machine’s native programmability and hid it behind buttons and windows so someone who never wanted to learn a shell could still get work done. That translation was a gift to us and a tax on the machine. Every menu, every modal, and every drag target exists because a human needs to see it and point at it.

An agent needs none of that. Give a model a shell and it talks to the computer in the computer’s own language, the one we papered over forty years ago because it was too hard for most people. So the labs aren’t teaching agents to code because code is the goal. They’re teaching agents to code because code is the widest door into the machine. Once you’re through that door, the question stops being “what can this model write” and becomes “what can this model reach.”

The file system is the working primitive

That’s why the file system matters so much right now. It’s the working primitive of this era, and it earns the role by asking nothing new of anyone. It’s already on every machine, and every command line already speaks it. An agent can move through it, read from it, and act on it without a single special interface built first. A folder of PDFs is a database the agent already knows how to query. A directory of notes is memory it can update between tasks.
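To show how little setup "a directory of notes as memory" needs, here is a minimal sketch using only the standard library: the agent appends a line to one Markdown file per topic and reads the file back at the start of its next task. The `notes/` layout and the `remember`/`recall` helpers are hypothetical, not any lab's implementation.

```python
import tempfile
from pathlib import Path


def remember(root: Path, topic: str, text: str) -> Path:
    # Hypothetical layout: one Markdown file per topic under the notes directory.
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{topic}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {text}\n")
    return path


def recall(root: Path, topic: str) -> list[str]:
    # A missing file simply means nothing has been stored on this topic yet.
    path = root / f"{topic}.md"
    if not path.exists():
        return []
    return [
        line[2:]
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.startswith("- ")
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes = Path(tmp) / "notes"
        remember(notes, "monthly-report", "source data is in the finance export folder")
        remember(notes, "monthly-report", "send the summary to the team lead")
        print(recall(notes, "monthly-report"))
```

Plain files are the point of the design: there is no schema or service to provision, and the agent (or a person) can inspect, search, or edit the same notes with ordinary shell tools.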

Andrej Karpathy’s idea of an agent keeping its own wiki on disk works precisely because the file system demands no setup. You don’t provision it. You don’t define a schema or stand up a service. The simplest abstraction wins, and today the simplest abstraction that lets an agent touch real work is a folder it can open. That simplicity is the whole point, and it’s also the catch. The file system works because it’s text, and text is what these models were born fluent in. But most of the world’s work doesn’t live in text you can read off a page. It lives behind screens.

Vision breaks the text ceiling

Vision is the next layer, and it’s the one that breaks the ceiling. A coding agent can do anything you can express as a script. It still can’t do the thing a temp does on their first morning: look at an unfamiliar screen, find the button that says “Approve,” and click it. Plenty of software offers no API, no command line, and no clean way in except the pixels a person stares at. When an agent can read a screen the way it reads a file, the reachable surface stops being “everything programmable” and becomes “everything visible.” That’s a far larger number. It includes the legacy desktop app the finance team won’t give up, the internal tool nobody documented, and the vendor portal that ships no integration on purpose.

Vision doesn’t replace the coding layer. It extends that layer to the parts of the computer we never bothered to make programmable, because the only user was ever a human with eyes. You can read the order of operations in how the labs are moving. Coding first, because it’s the cheapest and most reliable way to reach the programmable majority of tasks. Vision second, because it’s harder and the models aren’t quite there, but it’s the only path to the rest. The endpoint isn’t a smarter autocomplete. It’s an agent that operates a computer the way you do, through whatever interface happens to be in front of it.

What this means for what you build

This should change what you build. The instinct, when a new capability appears, is to wrap it. Put a clean interface on the coding agent, add a few buttons, and ship a product. But look at what the agents are converging toward. They’re learning to use the computer directly, the same surfaces a person uses. The wrapper you build to sit between the agent and the work is the exact thing the next model release makes unnecessary. We used to add value by being the friendly layer over a hard system. The friendly layer is now the thing getting automated.

What survives is the opposite of a wrapper. It’s whatever makes the underlying surface richer for an agent that’s already standing on it. If your software is the system of record, the durable move is making it something an agent can read and act on cleanly, not bolting an AI button onto the screens you already have. If your value is locked inside documents, the play isn’t a chatbot stretched over them; it’s turning those documents into context an agent can trust. The labs are handing every builder a worker who already uses a command line and will soon use a screen. The work left for the rest of us isn’t building the worker. It’s making sure that when the worker shows up, the thing it needs to touch is ready to be touched.

This is why the coding agent is a bigger deal than it looks, and why reading it as a tool for programmers misses the point. Code was never the territory. It’s the road: paved, already running almost everywhere, and the few places it can’t reach are the places vision is being built to cover. If you’re deciding where to spend the next two years, don’t spend them smoothing the on-ramp to that road. Build the destination worth driving to, and make sure an agent can get in the door.

The Intent Layer
