Harnesses in AI: A Deep Dive — Tejas Kumar, IBM

Tejas Kumar argues that the true value of an AI agent lies not in the unpredictable model, but in the deterministic code, or "harness," that you wrap around it.

Jun 06, 2026

Harnesses in AI: A Deep Dive — Tejas Kumar, IBM — AI Engineer · 20:26

Most agent content treats the model as the product and the wrapper code as plumbing. Tejas Kumar wants you to see it the other way around. The model is the rented, expensive part you don’t own and can’t fully predict. The harness is the cheap, deterministic part you write. If 2025 was the year of agents, 2026 is the year of the thing that makes agents work.

This video is part of a hand-curated collection — each video is picked one at a time, not pulled from a recommendation feed. The talks here are the ones worth a full watch if you’re trying to get serious about agentic engineering. Each written summary was drafted by an AI pass over the transcript and then edited by a human. The source video is linked at the top of every post, so if something reads off, go to the tape.

The Year of Harnesses

Tejas Kumar (IBM) opens his AI Engineer talk with a show of hands: who can confidently explain what an AI harness is? Almost nobody raises their hand. The word is everywhere now. In ML, it means a glorified test suite. In the AI engineering world, it means something else entirely, and most of the people using it can’t say what. So Tejas spends eighteen minutes building one on stage, in JavaScript, from scratch.

Why you pay rent for tokens you can’t trust

The opening framing is economic. Every inference call is rent paid to someone else’s data center, for output you cannot fully predict. The move is obvious once stated: use a cheap model, wrap it in a great harness, and you can go very far. Qwen, GPT-OSS, or any other small, free model becomes useful again, not because the model got better but because the scaffolding around it carries the weight. Tejas is an AI Developer Advocate at IBM, where the team trains frontier models and builds harnesses around them. He came to talk about the second job.

The agent loop, naked

He writes a fifty-line agent that’s supposed to log into Hacker News and upvote the first story. The agent loop is the simple thing every framework hides: read history, call the model, run the tool the model picked, push the result into history, repeat. A max_steps counter keeps it from running forever. That’s the entire pattern. No magic.
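
The loop described above fits in a few lines. What follows is a minimal sketch, not the code from the talk: `runAgent`, `scriptedModel`, the tool names, the selector, and the message shapes are all hypothetical, and the model is a scripted stand-in so the example runs without an API key.

```javascript
// Hypothetical sketch of a bare agent loop. Names and message shapes are
// assumptions for illustration, not the code shown on stage.

// Stand-in for a real model call: replays a fixed script of decisions.
function scriptedModel(script) {
  let turn = 0;
  return function callModel(history) {
    // A real implementation would send `history` to an LLM here.
    const decision = script[Math.min(turn, script.length - 1)];
    turn += 1;
    return decision;
  };
}

function runAgent({ callModel, tools, goal, maxSteps = 10 }) {
  const history = [{ role: "user", content: goal }];

  for (let step = 0; step < maxSteps; step++) {
    // 1. Read history and call the model.
    const decision = callModel(history);

    // 2. The model may declare itself finished.
    if (decision.type === "final") {
      history.push({ role: "assistant", content: decision.content });
      return { done: true, answer: decision.content, history };
    }

    // 3. Otherwise run the tool the model picked.
    const tool = tools[decision.tool];
    const result = tool
      ? tool(decision.args)
      : `error: unknown tool "${decision.tool}"`;

    // 4. Push both the call and its result into history, then repeat.
    history.push({
      role: "assistant",
      toolCall: { name: decision.tool, args: decision.args },
    });
    history.push({ role: "tool", name: decision.tool, content: result });
  }

  // maxSteps keeps the loop from running forever.
  return { done: false, answer: null, history };
}

// Usage with fake tools and a scripted model.
const tools = {
  navigate: ({ url }) => `navigated to ${url}`,
  click: ({ selector }) => `clicked ${selector}`,
};

const model = scriptedModel([
  { type: "tool", tool: "navigate", args: { url: "https://news.ycombinator.com" } },
  { type: "tool", tool: "click", args: { selector: ".votearrow" } },
  { type: "final", content: "Upvoted the first story." },
]);

const outcome = runAgent({
  callModel: model,
  tools,
  goal: "Upvote the first story on Hacker News",
});
console.log(outcome.done, outcome.answer);
```

Nothing in this loop checks whether the final answer is true. The loop returns whatever the model says, which is the gap the rest of the talk is about.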

He runs it. The agent hits the login page, can’t authenticate, gets stuck, and then confidently reports success. The model has no way to tell the difference between “I clicked the button” and “I clicked the button and something happened.” It will tell you the job is done because the next-token distribution says that’s how these stories end. The agent isn’t broken. It’s working exactly as a probabilistic system is supposed to work, which is the problem.

The harness as anti-lie machine

Now the harness goes in, and the first thing it does is catch the lie. Tejas writes a deterministic function called verifySuccessfulUpvote: pure code, no prompting, walking the actual tool-call history to check whether a real browser click on the upvote element occurred. If the agent claims success but the trace says otherwise, the harness overrides the agent and returns failure. He adds a second deterministic check: if the agent never invoked the login tool and yet the current URL is the login page, fail immediately.
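
The two checks above could look something like the sketch below. Only the name `verifySuccessfulUpvote` comes from the talk. The trace shape (`name`, `args`, `ok`), the helpers `stuckAtLogin` and `judgeRun`, the selector, and the login path are assumptions made for illustration.

```javascript
// Hypothetical sketch of the two deterministic checks. The trace shape,
// tool names, selector, and login path are assumptions, not the talk's code.

const UPVOTE_SELECTOR = ".votearrow"; // assumed selector for the upvote element
const LOGIN_PATH = "/login";          // assumed path of the login page

// Check 1: does the trace contain a successful click on the upvote element?
function verifySuccessfulUpvote(toolCalls) {
  return toolCalls.some(
    (call) =>
      call.name === "click" &&
      call.args?.selector === UPVOTE_SELECTOR &&
      call.ok === true
  );
}

// Check 2: the browser is on the login page but the login tool was never used.
function stuckAtLogin(toolCalls, currentUrl) {
  const usedLogin = toolCalls.some((call) => call.name === "login");
  const onLoginPage = new URL(currentUrl).pathname.startsWith(LOGIN_PATH);
  return onLoginPage && !usedLogin;
}

// The harness has the last word: the agent's claim is only one input.
function judgeRun({ agentClaimsSuccess, toolCalls, currentUrl }) {
  if (stuckAtLogin(toolCalls, currentUrl)) {
    return { success: false, reason: "on login page without calling login" };
  }
  if (agentClaimsSuccess && !verifySuccessfulUpvote(toolCalls)) {
    return { success: false, reason: "claimed success but no upvote click in trace" };
  }
  return {
    success: agentClaimsSuccess,
    reason: agentClaimsSuccess ? "verified" : "agent reported failure",
  };
}

// Usage: the agent claims success while parked on the login page.
const verdict = judgeRun({
  agentClaimsSuccess: true,
  toolCalls: [
    { name: "navigate", args: { url: "https://news.ycombinator.com" }, ok: true },
  ],
  currentUrl: "https://news.ycombinator.com/login?goto=news",
});
console.log(verdict.success, verdict.reason);
```

The design choice worth noting is that neither check asks the model anything. Both read only the recorded trace and the browser state, so the same run always gets the same verdict.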

Re-run. The agent still fails, but now it fails honestly. This is the move worth sitting with. The harness doesn’t make the model smarter; it makes the model auditable. Step one to solving a problem is admitting you have one.

The harness as privileged co-process

The second addition is more interesting. Tejas writes a loginHandler that runs before every loop iteration, inspects the browser’s current URL, and, if the agent is staring at a login form, programmatically fills in credentials and submits. The credentials live in the harness’s environment, not the model’s context. Then it pushes a message back into the agent’s history saying, “I’m the harness, I logged you in, carry on.”
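
A pre-iteration hook of this kind might be sketched as follows. Only the name `loginHandler` comes from the talk. The `browser` interface (`currentUrl`, `fill`, `click`), the form selectors, the environment variable names, and the in-memory fake browser are assumptions so the example can run on its own.

```javascript
// Hypothetical sketch of a pre-iteration login hook. The browser interface,
// selectors, and env var names are assumptions, not the code from the talk.

function loginHandler({ browser, history, credentials }) {
  const onLoginPage = new URL(browser.currentUrl()).pathname.startsWith("/login");
  if (!onLoginPage) return false; // nothing to do this iteration

  // Credentials come from the harness, never from the model's context.
  browser.fill('input[name="acct"]', credentials.username);
  browser.fill('input[name="pw"]', credentials.password);
  browser.click('input[type="submit"]');

  // Tell the agent what happened without revealing any secrets.
  history.push({
    role: "user",
    content: "Harness: I logged you in. Carry on with the task.",
  });
  return true;
}

// In-memory stand-in for a real browser driver, so the sketch runs as-is.
function createFakeBrowser(startUrl) {
  let url = startUrl;
  const fields = {};
  return {
    currentUrl: () => url,
    fill: (selector, value) => {
      fields[selector] = value;
    },
    click: () => {
      // Pretend a submit with both fields filled lands on the front page.
      if (fields['input[name="acct"]'] && fields['input[name="pw"]']) {
        url = "https://news.ycombinator.com/news";
      }
    },
  };
}

const fakeBrowser = createFakeBrowser("https://news.ycombinator.com/login?goto=news");
const agentHistory = [
  { role: "user", content: "Upvote the first story on Hacker News" },
];
const harnessCredentials = {
  // A real harness would require these to be set in its environment.
  username: process.env.HN_USERNAME || "demo-user",
  password: process.env.HN_PASSWORD || "demo-password",
};

// Called once here. In a full loop it would run before every iteration.
const intervened = loginHandler({
  browser: fakeBrowser,
  history: agentHistory,
  credentials: harnessCredentials,
});
console.log(intervened, fakeBrowser.currentUrl());
```

The message pushed into history mentions the login but not the credentials, so the model learns the state of the world without ever seeing a secret.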

Re-run. The agent logs in, upvotes, finishes in six iterations. Nothing about the prompt changed. Nothing about the model changed. The entire delta is deterministic code wrapped around a non-deterministic core, doing the work the model shouldn’t be trusted to do.

Where this goes

Tejas closes with the IBM application: OpenRAG, an open-source project that wraps enterprise RAG in a harness strong enough to ship into regulated environments with siloed data. The model isn’t doing the security; the harness is. That’s the pattern at scale.

Then comes the forecast, and it is worth taking seriously. 2025 was agents. 2026 is harnesses. 2027, he hopes, is dynamic harnesses: agents that, given a goal, generate their own harness before executing, much like entering plan mode, but with real guardrails compiled out of the plan. You ask for a flight ticket; the agent writes the verification scaffold, runs against it, and hands you back something it can prove worked.

The lesson is small and uncomfortable: most of the value in your “agent” is the part you haven’t written yet.

The Intent Layer
