In January 2019, yields at TSMC's Fab 14B started dropping. Something was wrong with the 12nm and 16nm process lines. Engineers worked through the possibilities for weeks. They eventually traced the problem to a batch of photoresist from a supplier: a chemical impurity too subtle to show up in standard quality checks. By the time they found it, 30,000 wafers were ruined. The damage was $550 million in lost revenue.
TSMC is the most sophisticated semiconductor manufacturer on earth, building what Chris Miller, author of Chip War, calls the most complex devices in human history. A leading-edge chip passes through thousands of process steps with tolerances measured in atoms, and TSMC documents every step. The problem was not carelessness. It was that no single person held enough of the picture to trace a falling yield back to its cause.
This is not a new problem. In his 1972 Turing Award lecture, Edsger Dijkstra observed that “the competent programmer is fully aware of the strictly limited size of his own skull”. The skull hasn’t grown, but the systems have. In 2019, a team of industrial engineers studying automated manufacturing named the resulting gap: epistemic debt. Not a failure of documentation or diligence, but the inevitable result of systems too complex for any one person to hold in their head.
Ward Cunningham arrived at the same insight from a different direction in 1992. He used a financial metaphor: code written to your current understanding carries a hidden debt, because no developer holds the full picture either. Within a decade, technical debt had become shorthand for sloppy code. But Cunningham was explicit: “I’m never in favor of writing code poorly, but I am in favor of writing code to reflect your current understanding of a problem even if that understanding is partial.” The word “technical” pointed everyone at the code, but the debt was always in the understanding.
The erratics
Windows 95 shipped with code that detected SimCity and silently altered the memory allocator, because the game read freed memory that older versions of Windows had tolerated by accident. Raymond Chen has catalogued dozens of similar cases across two decades at Microsoft: programs that parsed error message strings, applications that relied on undocumented behaviour including outright bugs. Rather than break them, the Windows team shipped compatibility shims to reproduce the old, broken behaviour. The bugs had become load-bearing: fix them, and customers blamed Windows, not the application.
Hyrum’s Law generalises the pattern: with enough users of an API, it does not matter what you promise in the contract. All observable behaviours of your system will be depended on by somebody.
Every system exhibits three kinds of behaviour. The first is what was intended. The second is bugs: race conditions, edge cases nobody anticipated. The third is the dangerous one: a bug that has run long enough that other systems now depend on it. An accidentally intentional behaviour is both a bug and a feature, and no amount of reading the code will tell you which. You need to know what the system was for, not just what it does.
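The pattern is easy to state in code. A minimal sketch in Python, with an invented `active_users` API; the store, the names and the contract are all hypothetical, not drawn from any of the systems above:

```python
# A hypothetical API. The contract promises the active user names,
# in no particular order. The implementation happens to return them
# sorted, purely as a side effect of iterating a sorted store.
def active_users(store):
    """Contract: returns the names of active users. Order unspecified."""
    return [name for name in sorted(store) if store[name]["active"]]

store = {
    "carol": {"active": True},
    "alice": {"active": True},
    "bob": {"active": False},
}

# A downstream caller quietly depends on the unspecified order.
users = active_users(store)
assert users == ["alice", "carol"]  # passes today, by accident

# Replace sorted(store) with plain iteration and the contract is still
# honoured, but this caller breaks: to them, the order was the API.
```

Nothing in the contract changed, but to this caller the iteration order is part of the interface. That is the accidentally intentional behaviour in miniature: the only way to know whether the sorting is a bug or a feature is to know what the function was for.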
The distance between what a system does and what anyone understands about it is never zero, and in any system with enough users, someone is probably already depending on whatever lives in that space.
Ground truth
In the 1850s, the Great Trigonometrical Survey of India measured the same distance by two independent methods: triangulation across the land, and astronomical observation using a plumb bob. The results should have agreed. They differed by 5.236 arc seconds, a sliver of a degree, about 160 metres on the ground, but well outside the Survey’s margin of error.
Everyone expected the gravitational pull of the Himalayas to deflect the plumb bob sideways, so John Henry Pratt, a mathematician who took up the problem on his own, calculated how much pull the mountains should exert based on their visible mass. His result deepened the puzzle: they were pulling only about a third as much as his calculation suggested they should.
George Airy, the Astronomer Royal, proposed an explanation. Mountains have deep roots of lighter rock extending into the denser mantle below, like icebergs floating in the sea. This lighter rock displaces heavier material beneath, so the mountains exert less pull than their bulk suggests. The theory that emerged, isostasy, reshaped geology. Airy didn’t add a third measurement. He proposed a structural model that explained why the first two diverged.
In software, code and tests are two measurements of the same intent. Code says what the system does; tests say what it should do. But both emerge from the same understanding, and their agreement can confirm a shared assumption rather than the underlying intent. The Survey’s two methods were independent, and they still needed a deeper theory to explain what both had missed.
A behavioural specification is not a third measurement. It is a model of what the system is for: its rules and constraints, stated independently of how they’re implemented. Code, tests and specification each describe the same intent from a different angle, the way the Survey’s methods each measured the same distance. Where any two disagree, something is wrong.
Where the spec describes a simple rule but the code implementing it is complex, a question surfaces: is the spec naive, or has the code grown beyond what the problem requires? Where tests pass but verify different behaviour from what the spec describes, another: are the tests checking the right thing, or has the spec not kept pace? Each divergence is a question you wouldn’t know to ask without the third angle. Checking all three against each other systematically has a name: semantic triangulation.
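The mechanics can be sketched in a few lines of Python. The discount rule, its numbers and the drift between spec and code are all invented for illustration:

```python
# Three descriptions of one hypothetical pricing rule.

# 1. Spec: the rule, stated independently of the implementation.
#    "Orders over 100 get a 10% discount, capped at 50."
def spec(total):
    return min(total * 0.10, 50.0) if total > 100 else 0.0

# 2. Code: the implementation as shipped. It has drifted: it applies
#    the discount at exactly 100 as well (>= instead of >).
def code(total):
    if total >= 100:
        return min(total * 0.10, 50.0)
    return 0.0

# 3. Tests: they pass, but only probe totals well away from the boundary.
assert code(200) == 20.0
assert code(50) == 0.0

# Triangulation: sweep the inputs and check code against spec.
# A disagreement is a question to ask, not yet a bug to fix.
divergences = [t for t in range(0, 701, 25) if code(t) != spec(t)]
print(divergences)  # → [100]
```

The point is the third angle, not the loop. The tests agree with the code, the code mostly agrees with the spec, and the one input where they diverge, the boundary the tests never probed, is a question neither pair of artifacts would have raised on its own.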
Terracing
Legacy code is code where the gap between what it does and what anyone understands about it has grown too wide to safely change. By that measure, code shipped in an afternoon with an AI pair programmer, where the code is the only description of what the system should do, is legacy the moment it’s merged. The system runs. The tests pass. But the reasoning behind each decision was made by a model with no memory of making it.
Sonar’s 2026 State of Code survey found that AI already accounts for 42% of committed code, and that 96% of developers don’t fully trust it. The 2025 Stack Overflow Developer Survey reported trust in AI accuracy falling to 29%, with two-thirds of developers reporting that fixing “almost right” AI code takes longer than writing it themselves would have. The bottleneck in software delivery has moved from writing code to verifying it.
The industry’s response has been to front-load intent. GitHub open-sourced Spec Kit in 2025 and it crossed 71,000 stars within months. ThoughtWorks placed spec-driven development on their Technology Radar. AWS built Kiro, an IDE that generates specifications before code. We built Allium for the same purpose: a specification language with enough formal structure to surface contradictions that prose specifications would leave hidden. The 2025 Octoverse found TypeScript overtaking Python as the most-used language on GitHub, with 66% year-over-year growth. GitHub’s explanation: “Strongly typed languages give AI much clearer constraints.” When code is cheap to produce, clarity of intent becomes the scarce resource.
The popular narrative is that AI will flood codebases with unreviewed code, but the fear may be proving self-correcting. Because so many organisations take the risk seriously, the same economic pressure is pushing them to do what most never managed when humans wrote every line: specify what the system is for, then check the code against it.
AI didn’t create the need for semantic triangulation, but it may be the reason it finally happens.
The deeper map
TSMC’s contaminated photoresist passed incoming inspection: one check from one angle. The expertise to triangulate across the full process existed across the organisation, but it was distributed across teams, and no single person could connect the steps. It took weeks of engineers cross-referencing from different angles before the fault surfaced. Half a billion dollars lost to a gap in understanding.
Every codebase carries this debt. For three decades we called it technical debt and tried to pay it down by rewriting code. Epistemic debt is the more honest name. Semantic triangulation is the practice that makes it visible.
Where code, tests and specification diverge, you’ve found the debt. Cunningham knew where to look: the debt accrues in the understanding, and that is where it has to be repaid.
This is how we approach AI-assisted engineering at JUXT. If you’d like to explore what spec-driven development looks like for your systems, we’d welcome a conversation.