The Hidden Costs of Agentic Engineering
The price of AI-generated code fell to near zero; the cost relocated to the failure paths nobody specified.
The price of a working line of code has been falling for many years, and this year it landed near zero. Each step took a slice of the work out of human hands: the compiler, the high-level language, the package registry, the answer already posted on Stack Overflow, and the autocomplete that finished your line. Agents took what was left. You describe a feature and the code comes back, compiling, passing the tests for the cases you named, clean when you run the demo. Translating intent into code, the skill that used to define the job, has stopped being scarce.
But the cost didn’t disappear. It moved somewhere harder to see.
The bug is in what nobody wrote
Picture a money transfer: debit one account, credit another. Routine. An agent writes it in seconds, and the code is right for every case you’d think to check. A customer moves $200 from checking to savings. Four months later that customer notices the $200 left checking and never arrived in savings. The server had restarted in the quarter-second between the two writes. The debit committed; the credit never ran. Nothing was logged, because from the program’s point of view nothing went wrong. It did exactly what it was told, right up until the moment it stopped existing.
That bug isn’t in the diff. You can read every line the agent produced and never find it because the defect is a line that was never written: the one that makes the debit and the credit succeed or fail together. The agent wrote what the prompt described. The prompt described a transfer, not a crash halfway through one.
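To make the missing line concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names, and the SimulatedCrash exception are all assumptions made for illustration; a raised exception only approximates a server restart. The first function is the version that reads correctly in review. The second binds the two writes into one transaction.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Hypothetical stand-in for the process dying between two writes."""


def make_db():
    # In-memory database with two accounts; the schema is illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("checking", 500), ("savings", 0)]
    )
    conn.commit()
    return conn


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer_naive(conn, src, dst, amount, crash_between=False):
    # Each write commits on its own, so the debit can survive without the credit.
    conn.execute(
        "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
    )
    conn.commit()
    if crash_between:
        raise SimulatedCrash()
    conn.execute(
        "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
    )
    conn.commit()


def transfer_atomic(conn, src, dst, amount, crash_between=False):
    # "with conn" commits if the block finishes and rolls back if it raises,
    # so the debit and the credit succeed or fail together.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if crash_between:
            raise SimulatedCrash()
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
```

With crash_between=True, transfer_naive leaves checking at 300 and savings at 0, which is the customer's missing $200. transfer_atomic under the same crash leaves both balances where they started. The two functions differ by a single construct, and nothing in the naive version looks wrong on the screen.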
This is the shape of nearly every expensive bug now. Not a wrong instruction sitting in the code where a reviewer can catch it, but a missing one. The retry that fires twice and charges twice. Two buyers both told yes for the last unit in stock. An event that arrives before the event it depends on. Engineers who’ve been burned carry these cases as instinct; they flinch at a write that isn’t idempotent the way you flinch at a hot stove. An agent has no scar tissue. It has a prompt.
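The retry that charges twice is a useful case to see in code, because the fix is again one small thing that is easy to leave out. The sketch below assumes the caller attaches an idempotency key to each logical charge and that a uniqueness constraint enforces it. The charges table and the function names are hypothetical, not any payment provider's API.

```python
import sqlite3


def make_ledger():
    # The PRIMARY KEY on idempotency_key is what rejects a replayed request.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE charges ("
        "idempotency_key TEXT PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "amount_cents INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def charge_once(conn, idempotency_key, customer, amount_cents):
    """Record the charge; return False if this key was already processed."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO charges VALUES (?, ?, ?)",
                (idempotency_key, customer, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # Same key seen before: the retry is a no-op rather than a second charge.
        return False


def total_charged(conn, customer):
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM charges WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]
```

Calling charge_once twice with the same key records one charge and returns False the second time. Drop the key, or the constraint behind it, and the same two calls bill the customer twice while every test written for the single-call case still passes.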
Absence is the one thing our tools can’t catch
The blindness is structural. A demo exercises the path you built it to show. A test suite checks the cases someone thought to write down. Code review reads what’s on the screen. All three are good at judging what’s present and useless against what’s absent, and the bugs that reach production have always lived in the absence. We tolerated that for decades because the human author was also a slow author, and slowness worked as a filter. You couldn’t ship a thousand unconsidered edge cases in a week; you didn’t have the hours to write them.
That filter is gone. Code generation happens at machine speed and the omissions ride along with it, while the work of catching them, reading carefully and imagining what breaks, runs at the same speed it always did. So a senior engineer starts reading every change line by line because the green checkmarks stopped meaning what they used to. Team throughput collapses to whatever that one careful person can vet in a day. The bill that never showed up as a bug shows up as a bottleneck instead, and the rest of it arrives later still: a support ticket, a day spent reproducing a fault that only fires when the network stalls at the wrong instant, a refund, a customer who quietly takes their next order somewhere else. None of it lands on the line item marked “engineering.”
The oldest move in computing
The instinct is to make the agent more careful. Sharper prompts, more tests, a second agent reviewing the first one’s output. That treats an omission as a mistake, and it isn’t one. You can’t reliably prompt your way to remembering every failure mode of every distributed operation any more than you could train every programmer to never once leak a pointer.
We’ve beaten problems with exactly this shape before, and never by trying harder. We moved them. Scheduling lived inside each application until the operating system took it over and handed out time slices. Memory isolation was a discipline you kept by hand until virtual memory gave every process its own address space and made one program physically unable to corrupt another’s. Crash recovery meant a custom repair script bolted onto every accounting system until the database swallowed transactions whole and gave back one promise: the write happened completely, or it didn’t happen at all. Each of these began as careful, error-prone handiwork. Each ended as a layer beneath the application, where the failure it guarded against could no longer occur.
The hard problem still sitting up in the application, unmoved, is distributed correctness: the work of retrying without duplicating, surviving a crash mid-write, ordering events that have to arrive in sequence, and letting only one of two racing requests win. Every program that touches more than one machine solves this for itself, unevenly, and now that code is generated at agent speed. Libraries help, but a library sits beside your code, not beneath it; you can call it wrong or forget to call it at all.
Make the failure impossible by construction
So put the model one layer down. The agent writes business logic against a runtime that owns the dangerous paths, and a verifier holds that logic to the rules you declared. Every write is idempotent by default, so the transfer that runs twice does nothing the second time. A process that dies mid-operation resumes where it stopped, the debit and the credit still bound to each other. When two requests reach for the last unit at once, the runtime serializes them and exactly one hears yes. The agent never has to remember any of this because the runtime doesn’t give it room to get it wrong.
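What "the runtime serializes them" could look like is easiest to see at toy scale. The sketch below is an in-process illustration only, assuming a hypothetical Inventory class that stands in for the runtime; a real runtime would have to provide the same guarantee across machines and across crashes. The point is where the lock lives: inside the layer, not in the caller's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Inventory:
    """Hypothetical stand-in for a runtime that owns the contended write."""

    def __init__(self, units):
        self._units = units
        self._lock = threading.Lock()

    def reserve(self):
        # Check and decrement happen under one lock, so racing callers are
        # handled one at a time and cannot both see the last unit.
        with self._lock:
            if self._units > 0:
                self._units -= 1
                return True
            return False

    def units_left(self):
        with self._lock:
            return self._units


def race(inventory, buyers):
    # Business logic as the caller writes it: ask, and act on the answer.
    with ThreadPoolExecutor(max_workers=buyers) as pool:
        return list(pool.map(lambda _: inventory.reserve(), range(buyers)))
```

Run race(Inventory(units=1), buyers=2) and exactly one caller gets True, whichever thread arrives first. The caller never takes a lock and has no way to skip one, which is the difference between a guarantee that sits beneath the code and a library call that sits beside it.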


