Phase 3 / Ep 12: TDD—The Steel Chains to Tame the Beast

3 MIN READ | UPDATED: 2026-05-15

Many developers dislike writing unit tests, finding them slow and counter-intuitive. In the era of manual coding, you might have been able to maintain a project of tens of thousands of lines by relying on your own memory, even without tests.

But in the era of Agent-driven development, without tests, your entire codebase will collapse within three minutes!

1. The Price of Speed

Large Language Models (like Gemini, Claude, and GPT) can write business logic at a speed of hundreds of lines per second. However, due to their inherent flaws like token forgetting and attention drift, once you reach the refactoring stage, a single change made by the AI can trigger a cascading butterfly effect elsewhere. It won't stop due to fatigue; it will rationalize its changes over and over until it has completely dismantled your existing code structure.

At this point, the only thing that can pull the AI back from the brink is the cold, unforgiving power of assertions.

2. What is TDD for Large Models?

Traditional TDD (Test-Driven Development) follows the cycle: Red (write a failing test) -> Green (write code to pass the test) -> Refactor. TDD for an Agent involves first defining a cage with tests at the beginning of each task in task_plan.md.

The Golden Rule of Agent-TDD:

"You are absolutely forbidden from touching the src/ business logic until you have a failing red test for it in Vitest/Jest."

3. Downgrading "Subjective Dialogue" to "Machine Verdicts"

If you don't write unit tests, how do you test your code? You click around in your browser, and when something breaks, you have to tell the AI: "When I clicked the add button, the time block didn't appear." This is a highly subjective and inefficient way to communicate.

With a TDD environment, as soon as the business logic is wrong, the Agent can run npx vitest run block_allocator.spec.ts in the terminal itself. It can read the resulting console stack trace: "Expected [2 hours] but received [undefined]".

Once it enters this state, the human is completely removed from the painful debugging process. The Agent enters a highly focused and efficient "self-healing loop" by interacting with the terminal. It will only stop and return control to you once the terminal shows a green 'PASS'.

In the next episode, we will forge these chains into reality by writing a dedicated test-driven-development Skill for our system.