A Thousand PRs a Week
The most important part of a software engineer's job was never writing code. It was deciding what should exist, and AI has now made that plain.
Stripe merges over a thousand pull requests a week through a system no human reviews line by line. Ramp built an agent that now writes most of the company’s PRs. These aren’t demos. They’re the daily output of real engineering teams, and they describe a job that didn’t exist eighteen months ago.
None of this came out of nowhere. The work those agents replaced had been shrinking for decades, and you can trace the whole decline from the beginning.
The first programmers wrote in machine code, then assembly, moving values between registers by hand. Almost all of the translation from human intent to running behavior happened in the programmer’s head: you held the whole machine in your mind and spoke its language directly. It was slow, it was unforgiving, and a single wrong address could take down the program with no explanation. The scarce skill was the ability to think like the hardware.
Then came the abstractions, each one a real change in what the job demanded. Fortran and COBOL let you write something closer to math and English, and a compiler handled the descent into machine code. You gave up some control and got back enormous leverage. This is the pattern that repeats through the whole history: every advance in how code got built was a trade, ceding low-level control to a tool in exchange for working at a higher level of intent. C let you write systems without managing every instruction. Garbage collection let you stop tracking every byte of memory. Each step moved the programmer further from the metal and closer to the idea.
The tools around the code followed the same logic. Version control meant you could change things without fear of losing what worked. The compiler caught your type errors before the machine did. The debugger let you watch execution instead of guessing at it. Test suites let you state what correct behavior looked like and check it automatically. Frameworks packaged decisions other people had already made so you didn’t remake them. None of this changed the fundamental act. You were still translating intent into code. The translation just got faster, safer, and built on more that you didn’t have to write yourself.
By the 2010s, the job had a settled shape. A working engineer spent their day stitching together libraries, reading documentation, copying a pattern from one part of the codebase to another, and debugging the gap between what they meant and what they typed. Most code was not invented; it was assembled from known parts and adjusted to fit. Stack Overflow became infrastructure precisely because so much of the work was looking up how someone else had already solved your problem. The craft was real, but a large share of it was recall and recombination, the kind of work that rewards experience and pattern-matching more than raw invention.
That’s the world the first coding models walked into, and at first they fit it neatly. Autocomplete that finished your line. Then your function. You’d describe what you wanted in a comment and watch a plausible block appear, and your job was to read it, fix what was wrong, and move on. This felt like a big deal, and it was, but it didn’t change the structure of the work. You were still in the loop on every step, still the one holding intent in your head and checking each piece against it. The model was a faster way to do the recall-and-recombine part. You were still the translator; you’d just hired a very quick assistant who occasionally made things up.
Then, over the last year, the structure broke.
The models got good enough to stop being assistants. Good enough to take a whole feature, not a line, and produce something that mostly works. Good enough to run for an hour or more without a human checking each move, holding a plan across dozens of files. Good enough that you could split a task across several agents working in parallel, each owning a piece, reconciling their changes into one system. The change wasn’t that autocomplete got better. The change was that the human stopped being in the loop on every step, and started being in the loop on the ones that matter.
This isn’t a forecast, which is why we started with the receipts. Ramp’s agent generates most of the company’s PRs on the real codebase, not a side repo. Stripe’s system merges its thousand a week. WorkOS runs what they call a code factory that listens to Slack messages, Linear tickets, and error reports and opens pull requests on its own before anyone asks. Read those numbers as what they are. A thousand merged PRs a week is not a person typing faster. It’s a person who has stopped typing PRs and started doing something else.
So what’s the something else? Here’s where the whole history pays off. The scarce skill was always translation, and we treated code as the valuable thing because code was the artifact we shipped and got paid for. But code was never what we actually wanted. We wanted the behavior it produced, and we wanted that behavior to match what we meant. Code was just the only place meaning could live where a machine could read it. When the translation costs nothing, the thing we treated as the product turns out to have been a stand-in for something we’d never had to name.
Watch what the new systems actually need, and you can see what that something is. A code factory doesn’t open useful pull requests because it writes clean functions. It opens them because someone defined what an error should trigger, what counts as a fix, and what a fix is forbidden to touch. The bottleneck moved. It used to sit between your idea and the code. Now it sits between your idea and how precisely you’ve stated it. The scarce skill is specification: saying what you want clearly enough that a fast, literal, tireless collaborator can act on it without you in the room.
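To make that concrete, here is a minimal sketch of what such a specification might look like once it is written down as data rather than carried in someone's head. Everything in it is an assumption made for illustration: the `FixSpec` name, its three fields, the example paths, and the `violations` helper are hypothetical and do not describe how Stripe, Ramp, or WorkOS actually build their systems. It shows only the three things named above: what an error should trigger, what counts as a fix, and what a fix is forbidden to touch.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FixSpec:
    """Hypothetical spec an agent could act on; all field names are illustrative."""
    trigger: str                       # which kind of error should start work
    done_when: str                     # what counts as a fix, stated as a checkable condition
    forbidden_paths: tuple[str, ...]   # glob patterns a fix must never touch


def violations(spec: FixSpec, changed_files: list[str]) -> list[str]:
    """Return the changed files that match any forbidden pattern, in input order."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in spec.forbidden_paths)
    ]


# An example spec; the service name and paths are invented for this sketch.
spec = FixSpec(
    trigger="unhandled exception in the checkout service",
    done_when="the failing request replays without error and existing tests pass",
    forbidden_paths=("migrations/*", "billing/*", "*.lock"),
)

# A proposed change set, as an agent might report it before opening a PR.
proposed = ["checkout/cart.py", "billing/invoice.py", "poetry.lock"]
print(violations(spec, proposed))  # ['billing/invoice.py', 'poetry.lock']
```

The forbidden list is the only part a machine can check here. The `trigger` and `done_when` fields are still prose, and that is where the precision of the person writing the spec does the real work.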
That’s a harder skill than it sounds, and most of us are worse at it than we think, because code used to carry it for us implicitly. Code failed loudly. A bad spec fails quietly and at scale: a thousand PRs a week built on a misunderstanding nobody caught. A junior engineer handed a fuzzy ticket asks four questions before lunch, and those questions force the intent into focus. An agent doesn’t ask. It guesses, runs for three hours on the guess, and hands you a confident answer to the wrong question.
So the history of how code got built turns out to have a direction, and we can finally see where it points. Every abstraction moved us further from the machine and closer to the intent. The compiler, the framework, the agent are all the same move at different scales: hand the translation to a tool so the human can work at the level of meaning. Last year we reached the end of that line. The translation is handled. What’s left is the part that was always the actual job, the part the code was only ever standing in for: deciding what should exist and saying it clearly enough to be true.


