Engineering teams face an increasing tension. A developer with an LLM can write ten thousand lines of code in a day. That same developer can review maybe a few thousand lines with full attention.
This creates a severe bottleneck. These burgeoning tools that increase output velocity also make traditional code review effectively impossible. You cannot merge what you cannot verify. You cannot verify what you cannot read.
Most projects respond by slowing down or by understanding less. They limit LLM usage, or they require lengthy manual review of each change. Or, in the worst case, they simply merge poorly understood code into production and let technical debt accrue.
This creates the perception that LLMs are error prone, that developing with them is slow, and that their outputs cannot be trusted. But the truth is that our traditional processes have failed. The root cause is what we choose to verify. Traditional code review asks “is this implementation well-crafted?” The better question is “does this satisfy requirements?”
We found inspiration in the field of civil and architectural engineering. An architect doesn’t weld the strut or carve the statue; they decide what element is needed where and document it. An electrician doesn’t worry about the aesthetics of a fixture; they focus on the safety and correctness of the electrical system.
Near the end of the process, an inspector checks reality against the agreed-upon specs and the local building codes. They know exactly what must be true to pass, and they verify it: structural elements meet requirements, fire exits exist in the right locations, electrical systems are properly grounded. Put simply: they verify the building doesn’t break any specific rule or deviate from the spec.
A building stands or falls based on the requirements it does or doesn’t satisfy. Not on who built it or how things “look”. The load-bearing element either handles the load or it doesn’t.
Trust isn’t placed in a bricklayer or a painter or even an architect. Trust is earned step by step through process and inspection.
This works because verification targets objective requirements, not the builder’s craft or reputation. Software engineering can work the same way.
The core of our methodology is a five-step process where every stage has a clear input, output, and verification method. This structure ensures that as our velocity increases, our rigor remains.
Step 1: The Design Document (Decision Capture)
This is a “thinking document,” not a specification. It captures the choices made under constraints, the true craft of engineering.
• Context & Scope: Sets boundaries by defining the problem, background, goals, and critical Non-Goals to prevent scope creep.
• Proposed Design: Explains the approach through architecture diagrams, component interactions, and data flow examples to communicate the engineer’s mental model.
• Alternatives Considered: Documents rejected approaches and the reasons for those rejections to prevent future relitigation.
• Implementation Dependencies: Lists required components, invariants, API contracts, or features that must exist first to make sequencing clear.
• Cross-Cutting Concerns: Explicitly addresses system-wide impacts like security, privacy, performance, and monitoring.
• Decisions: Records resolved choices, including the options considered and the trade-offs made.
The Rule: Every change discovered during programming must be retroactively documented here, explaining what was learned and why the direction shifted.
Step 2: The Specification (The Definition of Success)
While the design doc explains the thinking, the spec defines what success looks like. If a requirement cannot be tested unambiguously (pass/fail), it does not belong in the spec. The spec must be deeply informed by established best practices, both in the chosen repository and in the intended environment.
• User Stories: Includes priority justifications and acceptance scenarios in a strict Given/When/Then format.
• Exhaustive Edge Cases: Covers missing data, concurrent operations, legacy formats, and partial failures.
• Functional Requirements: Atomic, testable “MUST/MUST NOT” statements numbered for traceability (e.g., FR-001).
• Entity Definitions: Precise domain object descriptions, including table schemas, types, and constraints.
• Technical Details: Maps affected files, tools, frameworks, and “quality gates” like linters and required tests.
• Success Criteria: Concrete, measurable performance targets and user experience goals.
• Clarifications: A date-stamped log of Q&A and decisions made during development.
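To make this concrete, here is a minimal sketch of how a Given/When/Then acceptance scenario maps onto an executable test. The FR number and the registration scenario are hypothetical illustrations, not requirements from any real spec:

```python
# Hypothetical requirement for illustration:
# FR-001: The system MUST reject registration when the email field is empty.

def register(email: str) -> dict:
    """Minimal stand-in implementation so the example is runnable."""
    if not email.strip():
        return {"ok": False, "error": "email required"}
    return {"ok": True}

def test_fr_001_rejects_empty_email():
    # Given a registration request with an empty email
    request_email = "   "
    # When the system processes the registration
    result = register(request_email)
    # Then the registration is rejected with an explanatory error
    assert result == {"ok": False, "error": "email required"}
```

Because the requirement is atomic and numbered, the test name traces straight back to FR-001, and the test itself is the unambiguous pass/fail check the spec demands.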
Step 3: Programming (Proof-Oriented Development)
The engineer uses an LLM in a fresh context. It receives only the spec and best practices.
• Forced Clarity: If the spec is ambiguous, programming fails, forcing the engineer back to Step 2 for clarification.
• Verification over Debate: We don’t debate code aesthetics; we prove requirements are met through tests that either pass or fail.
• Test-Driven Proof: The LLM pursues correctness through TDD or minimal demonstration apps; the goal is satisfying the spec with minimal waste.
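As a sketch of the test-driven proof loop, the tests below are written first, directly from hypothetical spec requirements (the FR numbers and the slug example are assumptions for illustration); only then is the smallest implementation that satisfies them written:

```python
import re

# Tests written first, from the spec (hypothetical requirements):
# FR-002: Slugs MUST be lowercase with words joined by single hyphens.
# FR-003: Slugs MUST NOT contain repeated or trailing separators.

def test_fr_002_slug_is_lowercase_hyphenated():
    assert slugify("Hello, World!") == "hello-world"

def test_fr_003_slug_collapses_repeated_separators():
    assert slugify("a  --  b") == "a-b"

# Minimal implementation, written only after the tests exist:
def slugify(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)  # runs of non-alphanumerics become one hyphen
    return text.strip("-")
```

The point is not the slug logic; it is that “done” is defined by the tests derived from the spec, so the LLM has an objective target and nothing more to chase.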
Step 4: Adversarial Review (The Inspection)
A reviewer paired with a fresh LLM (ideally a different model) examines the code, spec, and design to perform a correctness audit, not a quality review.
• Audit Questions: Does every entry/exit parameter have validation? Does the code align with the design? Which edge cases remain untested?
• The Gap List: Any missing functionality, missing validation, undocumented design deviations, or standard violations are flagged for iteration.
Step 5: Iteration and Approval
The developer addresses the gap list and reruns validation. The final review consists of three parts:
• Design Review: Does the solution still make sense? Are all changes from the initial design explained?
• Specification Review: Are requirements testable? Are edge cases exhaustive? Does the spec match the design?
• Validation Review: Do all tests pass? Does the demonstration app show the feature works end-to-end?
The Critical Distinction: In this methodology, the reviewer does not examine the code. Once the spec is satisfied, the “how” is irrelevant. Engineering is about finding an effective solution with minimal waste, not the “perfect” solution.
If the code passes linting, satisfies the spec, aligns with the design, and includes full validation, it merges as a complete package where the documentation is the primary artifact and the code is simply implementation detail.
This methodology works because it directly harnesses the greatest advantage of LLMs while constraining their core weakness: they are incredibly fast, but prone to drift. By shifting our focus, we solve three specific challenges.
1. Focus through Constraints
LLMs are extraordinarily good at finding solutions within boundaries. Without clearly reinforced boundaries, they tend to drift, adding unnecessary features or writing code that looks right but misses the point.
• The Spec as Guardrails: The specification provides the necessary constraints by defining exactly what goes in and what must come out.
• Focusing Capabilities: This doesn’t limit the AI; it focuses its power on satisfying every documented requirement.
• Adversarial Verification: A second layer of review from a fresh context catches any details the first pass might have missed.
2. Documentation as the Primary Artifact
In traditional workflows, documentation is often an afterthought. Here, the design doc and spec are the primary engineering artifacts, the actual “product” of the engineer.
• Defining vs. Describing: We stop using documentation to describe what already exists and start using it to define what should exist.
• Capturing Intent: Every decision and change is documented, creating a permanent record that any human or LLM can use six months later to understand the “why” behind the code.
• Code as a Byproduct: The code is simply the functional downstream implementation of our design and specification. If the spec needs to change in response to bugs or new requirements, the code can be rapidly iterated.
3. Moving from Subjective Review to Objective Proof
Traditional code review asks, “Is this code well-written?”, which is subjective and requires a line-by-line crawl through thousands of functions. This methodology asks a much faster, objective question: “Does this code satisfy the spec?”
• Objective Verification: Tests either pass or they don’t; the spec is either satisfied or it isn’t.
• Reviewing Outcomes, Not Implementation: We’ve stopped debating implementation “aesthetics” because there are dozens of ways to satisfy a requirement. If the code passes validation and stays in scope, it is correct.
• Scalable Speed: A 7,000-line PR no longer takes hours to review. Because the spec is concise and the validation is rigorous, review time drops to minutes while confidence in the result stays high.
The Result: A Higher Standard
This approach isn’t about lowering standards to move faster; it’s about raising them by making “good” an objective measurement. We’ve shifted our energy away from the mechanical act of typing and toward the rigor of designing and specifying. Instead of trusting that an implementation “looks right,” we now have the proof that it works.
Getting started is simpler than it looks: pick one new feature, write a design doc and spec for it, and run the process to see what works for your team. The goal isn’t immediate perfection, but introducing a level of rigor that you can refine as you go.
• Don’t “boil the ocean”: You don’t need to retrofit your entire legacy codebase; focus on new features and only write design docs for old code when you are actively modifying it.
• Maintain a paper trail: Every pivot matters. If a spec changes or a design decision is reversed during implementation, document the “why” so that learning is captured.
• Demand a reason for every line: If a piece of code doesn’t satisfy a specific requirement, it shouldn’t be there. Either it or the spec must change.
Logistics and Ownership
Your documentation should not live in a silo. Store your design docs, specs, and validation files in the repository alongside the code they describe. This ensures they are versioned, merged, and updated together as a single primary artifact.
When it’s time for review, bring in a neutral third party rather than the person who wrote the code. Their job is to verify that the design makes sense and that the validation proves the spec is satisfied, not to nitpick the implementation.
Costs
While LLM API costs are real, they are now an expected part of the development budget. Because this methodology uses tight constraints to prevent “false starts”, token usage will likely stay equal to, or even lower than, that of traditional approaches. The massive gains in speed and quality are well worth the investment.
LLMs have flipped the script on what’s actually expensive in engineering. Writing code used to be the hard part, but now it’s nearly free. The real cost has shifted to reading that code, understanding why the code is the way it is, and verifying that it actually does what it’s supposed to do.
Our old playbook was designed for a world where humans did all the typing. We were taught to obsess over the details and structure of our code. This simply doesn’t work when an AI can dump 10,000 lines on your desk in an afternoon.
This shows that value in development isn’t code anymore; it’s clarity. If you can’t define exactly what you need or prove that you have it, you cannot move forward. This methodology resolves that by using rigor to kill ambiguity and validation to demonstrate correctness.
In this new paradigm, we don’t need to trust the code or coder; we can trust the process.