Uber: Leading engineering through an agentic shift - The Pragmatic Summit

How Uber is restructuring engineering for an AI-first future.

Jun 04, 2026

Uber: Leading engineering through an agentic shift - The Pragmatic Summit — The Pragmatic Engineer · 37:39

This video is part of a hand-curated collection — each video is picked one at a time, not pulled from a recommendation feed. The talks here are the ones worth a full watch if you’re trying to get serious about agentic engineering. Each written summary was drafted by an AI pass over the transcript and then edited by a human. The source video is linked at the top of every post, so if something reads off, go to the tape.

Uber’s Agentic Shift: When Toil Becomes the Asset

At The Pragmatic Engineer Summit, Anshu Bansal (Uber, Developer Platform org) and Ty Smith (Uber, principal engineer) walked through what two years of building agent infrastructure at Uber scale actually looks like. The headline isn’t velocity. It’s where the velocity went. Once Uber’s developers had real background agents, 70% of the workloads they pushed in were toil. This included upgrades, migrations, dead code cleanup, and bug fixes, not greenfield work. The thing developers hate most became the thing agents are best at, and the thing developers wanted to do, which was build product, got more of their time. That’s the inversion: the AI win at Uber isn’t that engineers got faster. It’s that the boring work got cheap.

From pair programming to peer programming

The 2022–2023 Copilot era gave Uber a 10–15% velocity bump through synchronous tab completion. That number is fine, but it’s also not the future Anshu described. The 2025 paradigm is async. A developer hands a workload to an agent, walks away, and comes back to course-correct. Uber’s working model now treats every developer as the tech lead of a small fleet of agents, kicking off three, four, or five jobs in parallel, then triaging the PRs that come back. Vibe coding stopped being a joke. It became a serious operating model the company built for.

Platforms first, agents second

Ty’s contribution to the talk is a reminder that the dramatic agent demos rest on quiet platform work. Uber’s Michelangelo ML platform already proxied frontier models through a gateway. On top of that, a tiger team built a central MCP gateway with auth, telemetry, and logging, plus a sandbox and registry so engineers can discover and test MCPs safely. A CLI called AIFX provisions and configures the agent clients themselves (Claude Code, Codex, Cursor) with sensible defaults so new users don’t start cold. The pattern is the one every infra team eventually arrives at. Don’t let a thousand teams each integrate Anthropic and OpenAI separately. Build the gateway, build the registry, build the defaults, then let people experiment on top.

Minion and the bottleneck that moved

Minion is Uber’s background agent platform, and the design call worth noting is that it runs on Uber’s own CI infrastructure rather than a vendor’s cloud. Monorepos sit pre-checked-out. Network access into internal services is already wired. MCPs route through AIFX. The agent shows up in Slack, GitHub PRs, the web UI, and the CLI, wherever the engineer is. The cleverest small detail is a red icon that flags low-quality prompts before the agent runs, paired with a built-in prompt improver that suggests fixes. Defaults beat user input.

But the moment background agents work, the bottleneck moves. More code means more review. Code review is no one’s favorite work, and now developers are drowning in it. Uber’s response was three products: Code Inbox (smart assignment by ownership, timezone, calendar availability, and SLO tracking), U Review (a plugin architecture with a grading layer that kills low-confidence nitpicks before they reach the engineer), and Autocover (a custom unit test agent on top of their LangFX SDK that ships ~5,000 merged tests per month at roughly 3x the quality of a generic agent, with a critic engine that’s been spun out as a standalone test validator). Each one exists because shipping code got cheap and reviewing it didn’t.
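To make the grading-layer idea concrete, here is a minimal Python sketch of a filter that drops low-confidence review comments before an engineer sees them. Nothing here comes from Uber's code: the `ReviewComment` type, the category names, and the threshold values are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReviewComment:
    path: str
    line: int
    text: str
    category: str      # hypothetical labels such as "bug" or "style"
    confidence: float  # 0.0 to 1.0, assumed to come from an upstream grader


# Hypothetical thresholds: nitpick-prone categories need more confidence to surface.
MIN_CONFIDENCE: Dict[str, float] = {"bug": 0.6, "security": 0.5, "style": 0.9}
DEFAULT_MIN_CONFIDENCE = 0.75


def grade(comments: List[ReviewComment]) -> List[ReviewComment]:
    """Keep comments that clear their category threshold, most confident first."""
    kept = [
        c for c in comments
        if c.confidence >= MIN_CONFIDENCE.get(c.category, DEFAULT_MIN_CONFIDENCE)
    ]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


if __name__ == "__main__":
    sample = [
        ReviewComment("a.py", 10, "possible None dereference", "bug", 0.8),
        ReviewComment("a.py", 12, "prefer shorter name", "style", 0.7),
    ]
    for comment in grade(sample):
        print(f"{comment.path}:{comment.line} {comment.text}")
```

The per-category threshold is the design choice worth noticing: a style nitpick has to be far more certain than a suspected bug before it costs a reviewer any attention.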

Shepherd: the missing backbone

Looking at how Meta and Google get to the point of saying “X% of our code is now AI-generated,” Ty’s team realized Uber didn’t have the scaffolding underneath. Specifically, there was no scalable way to run large-scale changes across the monorepo. Shepherd fills that gap. A migration author writes a YAML file pointing at either a deterministic transformer like OpenRewrite or an agent prompt. Shepherd generates the PRs, finds the right code owners, refreshes branches on a cadence, integrates with Code Inbox, and tracks the campaign end to end. The Java 21 migration ran through it. The lesson is that AI-generated code doesn’t scale without the campaign management layer that gets it reviewed, refreshed, and landed.
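The campaign shape described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Shepherd's actual schema: the `Campaign` fields, the `plan_prs` helper, the recipe name, and the owner lookup are all hypothetical. The two things it mirrors from the talk are that a campaign points at either a deterministic transformer or an agent prompt, and that changes get split into PRs by code owner.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Campaign:
    name: str
    recipe: Optional[str] = None  # hypothetical: name of a deterministic transformer
    prompt: Optional[str] = None  # hypothetical: instructions for an agent
    refresh_days: int = 7         # hypothetical branch-refresh cadence

    def __post_init__(self) -> None:
        # A campaign uses one transformation strategy, never both and never neither.
        if (self.recipe is None) == (self.prompt is None):
            raise ValueError("set exactly one of recipe or prompt")


def plan_prs(
    campaign: Campaign,
    files: List[str],
    owner_of: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Group changed files by owner so each owner reviews one PR."""
    by_owner: Dict[str, List[str]] = defaultdict(list)
    for path in files:
        by_owner[owner_of(path)].append(path)
    return [
        {"campaign": campaign.name, "owner": owner, "files": sorted(paths)}
        for owner, paths in sorted(by_owner.items())
    ]


if __name__ == "__main__":
    campaign = Campaign(name="java21", recipe="example.UpgradeRecipe")
    owner = lambda p: "team-a" if p.startswith("svc/a/") else "team-b"
    for pr in plan_prs(campaign, ["svc/b/X.java", "svc/a/Y.java"], owner):
        print(pr)
```

Splitting by owner keeps each PR small enough for one team to review, which is the part of a large-scale change that tends to stall first.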

The people and the bill

Anshu closed with the parts nobody puts in a keynote. Adoption is slow despite the magic. He once watched four VPs land code in 24 minutes for the first time in years. They loved it. They still didn’t change their habits. Top-down mandates moved the metric only weakly. Sharing peer wins, engineer to engineer, moved it more, because engineers trust other engineers, not directors. The other unglamorous truth: cost is up 6x since 2024, the CFO wants revenue impact rather than diff counts, and Uber now routes planning through expensive models and execution through cheap ones so developers don’t have to think about it. Every new tool (JetBrains AI, Warp) drags in its own cost model to absorb.
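The plan-expensive, execute-cheap split is easy to picture as a small router. The Python sketch below is an assumption-heavy illustration: the tier names and per-token prices are made up, and the talk did not describe how Uber's routing is implemented. It only shows why the split matters for the bill when execution dominates token volume.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float


# Hypothetical tiers and prices, for illustration only.
PLANNER = ModelTier("large-planner", 0.015)
EXECUTOR = ModelTier("small-executor", 0.001)


def route(phase: str) -> ModelTier:
    """Pick a model tier by phase so the developer never has to choose."""
    if phase == "plan":
        return PLANNER
    if phase == "execute":
        return EXECUTOR
    raise ValueError(f"unknown phase: {phase}")


def estimate_cost(steps: List[Tuple[str, int]]) -> float:
    """Sum the cost of (phase, token_count) steps under the routing above."""
    return sum(route(phase).usd_per_1k_tokens * tokens / 1000 for phase, tokens in steps)


if __name__ == "__main__":
    job = [("plan", 2000), ("execute", 10000)]
    print(f"routed cost: ${estimate_cost(job):.4f}")
    all_expensive = sum(tokens for _, tokens in job) * PLANNER.usd_per_1k_tokens / 1000
    print(f"all-planner cost: ${all_expensive:.4f}")
```

With these made-up prices, routing execution to the cheap tier cuts the job's cost to a fraction of what it would be if every token went through the planner.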

The closing posture worth stealing: don’t get married to the stack you’re building. If Cursor ships something next month that obsoletes Autocover, that’s fine. Ship the outcome, not the platform.

The Intent Layer
