Opinion
Mar 18, 2026 · By Kostas Karolemeas
agentic systems · AI governance · software architecture · enterprise AI · judgment

When Machines Build for Machines, Judgment Becomes the Bottleneck

If agentic systems make execution abundant and machine-native outputs normal, the real enterprise constraint shifts to judgment: what to optimize, what to trust, and how to govern systems humans can no longer fully inspect line by line.

Eric Schmidt and Dror Berman's essay makes a strong claim that I think many people will underestimate on first reading.

The shift is not only that AI can help people build faster. The shift is that technology is beginning to be built by agents and increasingly consumed by agents, which removes two constraints that shaped almost every system we have known:

  • human coordination on the production side,
  • and human perception on the consumption side.

That argument is directionally right. It also has a consequence that deserves even more attention than the phrase "machines build for machines" itself.

When execution becomes abundant, judgment becomes the bottleneck.

Not judgment in the vague sense of "humans still matter." Judgment in the operational sense:

  • what should be built,
  • which signals matter,
  • what counts as acceptable evidence,
  • when a system should act,
  • and who remains accountable when the system is no longer fully understandable in human-native form.

That is where the next serious competitive gap will open.

The Scarcity Shift Is Real

For most of software history, the main bottlenecks were familiar. Engineering capacity was scarce. Coordination was expensive. Release cycles were slow. Most teams spent more time negotiating what they had the bandwidth to build than imagining what might actually be useful.

Agentic systems change that equation.

If a small team can generate, test, refactor, and reconfigure far more software than a similarly sized team could a year ago, then the classic production constraints weaken fast. If systems can also generate outputs intended primarily for other systems rather than for direct human interpretation, then design constraints loosen on the consumption side too.

This is not just a productivity story. It is an architecture story.

More things can be built. More variants can be built. More context-specific systems can be built. More of them can be rebuilt continuously instead of shipped as fixed artifacts.

That changes the economics of execution. It also changes the economics of mistakes.

Technical Debt Does Not Disappear. It Changes Shape

One of the most provocative ideas in the Schmidt-Berman essay is that what we call technical debt may really be human-comprehension debt, and that if code is increasingly written for machines and consumed by machines, that debt starts to dissolve.

There is something important in that claim.

A large share of software structure really is a concession to human limitations. Naming, modularity, documentation, and architectural conventions are partly there so future humans can parse what earlier humans did. If an agent can regenerate or refactor entire systems more easily than a person can read them, some of the old assumptions about maintainability do indeed start to weaken.

But the debt does not vanish. It mutates.

It becomes evaluation debt.

The central question is no longer, "Can a human understand every part of this codebase?" The central question becomes, "Can the organization evaluate whether this system is behaving correctly, safely, and accountably under real conditions?"

That is a harder problem in many enterprises.

You may not need every line to be readable. You do need:

  • inspectable boundaries,
  • reliable tests,
  • observable runtime traces,
  • rollback mechanisms,
  • policy constraints,
  • and evidence strong enough for humans to make consequential decisions.

A system nobody reads may be workable. A system nobody can evaluate is not.

Machine-Native Systems Still Live Inside Human Institutions

This is the point many futurist essays glide past.

Even if the production and consumption of technology become more machine-native, most important systems will still operate inside human institutions for a long time:

  • companies,
  • hospitals,
  • regulators,
  • supply chains,
  • courts,
  • defense organizations,
  • and financial systems.

Those institutions do not only care whether the system works. They care whether the system can be trusted, audited, explained at the right level, constrained, insured, and governed.

That means the future is unlikely to be "machine-native all the way down." It is more likely to be machine-native execution with human-accountable control layers.

This matters for software especially.

A generated internal component used for a narrow task may not need stable human-readable structure. A decision system involved in pricing, clinical prioritization, compliance review, hiring, underwriting, or operational triage is different. It may contain machine-optimized internals, but it still needs human-legible control points around intent, permissions, evidence, evaluation, and escalation.

The right question is not whether humans will keep reading every artifact. They will not.

The right question is where human legibility still needs to exist because authority, liability, and strategic judgment still live there.

The New Product Is Not Just Software. It Is Judgment Architecture

The Schmidt-Berman essay is especially sharp when it argues that the concept of a static product may start to dissolve. If software can materialize on demand for a user, a moment, or a task, then version numbers and fixed releases start to look like artifacts of an earlier production regime.

Again, that is directionally right. But here too, enterprises will not simply replace products with fluid generation. They will replace static software products with dynamic systems plus control architecture.

In practice, that means the valuable thing will no longer be just the generated output. It will be the surrounding judgment architecture:

  • the rules that shape what may be generated,
  • the policies that govern what may be acted on,
  • the evaluation system that decides whether performance is good enough,
  • the telemetry that makes behavior inspectable,
  • and the release discipline that decides when change is acceptable.
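In spirit, those five layers could travel with each generated system as an explicit manifest. The fragment below is a hypothetical illustration, not an existing schema; every key and value is an assumption chosen to mirror the list above.

```python
# Illustrative only: a judgment-architecture manifest attached to one
# generated system. Keys mirror the five layers; names are invented.
judgment_manifest = {
    "generation_rules": ["no external network calls", "approved model list only"],
    "action_policy":    {"max_spend_usd": 500, "requires_human_above": 100},
    "evaluation":       {"metric": "task_success_rate", "release_threshold": 0.95},
    "telemetry":        {"trace_every_action": True, "retention_days": 90},
    "release":          {"canary_fraction": 0.05, "rollback_on_regression": True},
}
```

The generated output may change continuously; a manifest like this is the part that stays stable, reviewable, and owned by accountable humans.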

This is why abundant code will not make operational discipline less important. It will make it more important.

When generation is cheap, the risk is not only underbuilding. It is overproduction without epistemic control.

You get more software, more workflows, more agents, more automated decisions, more local optimizations, and more hidden fragility.

The organizations that win will not be those that generate the most. They will be those that know how to govern a much higher rate of generation without losing coherence.

What Leaders Should Actually Do

If you accept that execution is becoming abundant, then leadership work has to move up a layer.

The job is less about assembling raw production capacity and more about designing the conditions under which machine execution can be trusted.

That requires at least five moves.

1. Treat evaluation as a first-class capability

If agents are generating code, workflows, analyses, and operational actions at scale, then evaluation is no longer a QA afterthought. It becomes core infrastructure.

Every important system needs defined success criteria, regression checks, failure review loops, and clear thresholds for escalation.
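A minimal sketch of what "clear thresholds for escalation" might mean in code, assuming a single quality metric compared against a recorded baseline (the tolerance values and the three-way outcome are illustrative choices, not a standard):

```python
def regression_check(baseline: float, current: float,
                     tolerance: float = 0.02,
                     escalation: float = 0.10) -> str:
    """Classify a relative metric drop against explicit thresholds.

    drop <= tolerance  -> 'pass'     (within normal variation)
    drop <= escalation -> 'fail'     (block the change automatically)
    drop >  escalation -> 'escalate' (trigger the human failure-review loop)
    """
    drop = (baseline - current) / baseline
    if drop <= tolerance:
        return "pass"
    if drop <= escalation:
        return "fail"
    return "escalate"
```

The design point is that the thresholds are written down in advance by humans, so the system never has to improvise about when a person must be pulled in.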

2. Preserve human-legible control points

Do not aim for human readability everywhere. Aim for human accountability where consequences live.

Intent, permissions, policy, evidence, and rollback all need representations that responsible humans can inspect and act on.

3. Separate machine-native optimization from institutional trust

A system can be maximally efficient for machine consumption and still be unusable for an organization if nobody can certify, govern, or defend it.

Optimization and trust are related, but they are not the same design problem.

4. Shift metrics away from output volume

In an abundant-execution environment, volume is a vanity metric.

Better questions are:

  • How quickly do we detect bad behavior?
  • How reliably do we constrain side effects?
  • How fast can we recover?
  • How much evidence do we have when the system acts?
  • How much valuable work can we automate without losing control?
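The first and third questions reduce to two numbers per incident. A minimal sketch, assuming incidents are logged with start, detection, and recovery timestamps (the function name and return keys are invented for this example):

```python
from datetime import datetime

def incident_metrics(started: datetime, detected: datetime,
                     recovered: datetime) -> dict[str, float]:
    """Time-to-detect and time-to-recover, in minutes, for one incident.

    These answer 'how quickly do we detect bad behavior?' and
    'how fast can we recover?' more honestly than output volume.
    """
    return {
        "minutes_to_detect": (detected - started).total_seconds() / 60,
        "minutes_to_recover": (recovered - detected).total_seconds() / 60,
    }
```

Tracking the distribution of these two values over time is a judgment metric: it measures control over generated systems rather than the quantity of generation.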

5. Train people in judgment, not only prompting

The next capability gap will not just be "who can use AI tools." It will be "who can define constraints, evaluate outputs, decide what matters, and redesign workflows around abundant machine labor."

That is a different management and operating skill than prompt fluency.

The Real Strategic Divide

The deepest implication of this transition is not that machines will replace human builders. It is that many organizations will keep thinking as if execution were scarce long after it stops being scarce.

They will optimize for throughput in the layer that is already becoming cheap. They will underinvest in the layer that is becoming valuable: judgment.

That is where the strategic divide will emerge.

One class of organization will use agents mainly to increase output inside an old management model. Another will redesign itself around machine abundance, stronger evaluation, tighter policy encoding, clearer accountability, and faster learning loops.

The second group will build fewer illusions. It will build better systems.

Not because its models are magically smarter. Because its judgment architecture is.

A Practical Takeaway

If you are leading an AI initiative, ask these questions now:

  1. Which parts of our system still assume that human coordination is the limiting factor?
  2. Where are we about to generate more software or automation than we can properly evaluate?
  3. Which decisions require human-legible evidence even if the underlying system becomes machine-native?
  4. Do we have explicit rollback, audit, and escalation paths for high-consequence agent behavior?
  5. Are we training teams to produce more output, or to exercise better judgment over abundant output?

The organizations that answer these well will be much better positioned for the world Schmidt and Berman describe.

Because once machines can build at scale for other machines, the question stops being whether we can generate more.

The question becomes whether we can still govern what we generate.

Where Gaia Fits

Gaia is relevant when a team wants to turn this argument into operating infrastructure rather than leave it at the level of commentary. The platform is useful where abundant generation meets enterprise accountability: designing workflows, constraining runtime behavior, capturing evidence, evaluating outcomes, and improving systems without losing traceability.

For readers who want to connect this idea to implementation, the most relevant Gaia resources are the , the , the , and the essay on .

About the author

Kostas Karolemeas

Product and Technology Lead of Gaia, two-time founder, and software product executive with more than three decades of experience building and scaling products across healthcare, architectural and mechanical engineering software, logistics and supply chain, financial services and banking, enterprise resource planning (ERP), and visual effects (VFX) for television.