AI Translated Document
- This document was translated using an AI translation tool.
- Due to the nature of AI translation, some sentences or terms may be interpreted differently from the original intent.
- There may be mistranslations or inaccuracies, so please refer to the original text or consider additional review if accuracy is critical.
- Your suggestion for a better translation would be highly appreciated.
Original Document in Korean
TDD - Unit Tests Did Not Provide Value¶
HJ, Ph.D. / Software Architect (js.seth.h@gmail.com) Draft: April 2021 / Revised: December 2025
[Author’s Intent] This article does not deny testing techniques themselves, but aims to reveal the problems that arise when tests are applied without considering “what is actually being protected.”
Executive Summary¶
- Personally, there was a period when TDD failed for me.
- I knew about TDD since 2005, but couldn't use it practically.
- Around 2010, I tried to optimize design itself for testing & TDD, but saw little result.
- Around 2015, I redefined the relationship between Testee and Tester, and completely redefined the criteria for TDD's effectiveness.
- Since 2020, I don't write traditional TDD or unit tests, but apply test-driven development from a completely different perspective.
- There are some principles for this.
- Testee is a genius, Tester is a fool. So an approach like "zero-knowledge proof" is needed.
- TDD is a user champion participating in development.
- The ultimate goal of TDD is "confidence in normal service."
- Absolute control over the environment is required.
- So, development/testing must be possible locally. Specifically, don't share servers between developers.
- Isolated tests do not provide confidence. Prefer non-isolated tests.
- Write dependent, sequential tests and operate in blocks.
- TDD is expensive, but provides irreplaceable "evidence" of normal service.
Expectations and Reality of TDD¶
TDD (Test Driven Development) has long been considered an ideal methodology in software development. Many developers believe TDD can improve code quality and system reliability.
But I never really felt TDD was effective for me—at least not before 2015.
Honestly, there was no clear standard for writing tests. There are proposals like BDD and GWT, but none of them felt like a solution. Writing test files by the usual guides isn't hard: just run the code and check the values. But no matter how many I wrote, I rarely felt the project was progressing.
```javascript
// add.test.js
import { add } from './add';

describe('add()', () => {
  describe('adding two integers', () => {
    it('should return the sum', () => {
      // Given
      const a = 2;
      const b = 3;
      // When
      const result = add(a, b);
      // Then
      expect(result).toBe(5);
    });
  });

  describe('when negatives are included', () => {
    it('should sum correctly', () => {
      expect(add(-2, 3)).toBe(1);
      expect(add(-2, -3)).toBe(-5);
    });
  });

  describe('when 0 is included', () => {
    it('should return the other value', () => {
      expect(add(0, 5)).toBe(5);
      expect(add(5, 0)).toBe(5);
    });
  });
});
```
Some say: if there's a lot of green and no red, the project is going well...
No matter how many tests pass, problems keep coming up. If TDD were effective, a project should go much more smoothly than without it: errors should be caught in advance, and a stable system should be deliverable at any point along the way. But that wasn't the case.

At first I assumed I simply wasn't running tests properly, so I turned to the concept of Testable Architecture: the idea that the architecture itself wasn't designed for testing, which kept the effectiveness low. In the end, that approach failed too. It was inevitable, because I hadn't yet faced the essence of Tester and Testee, or the real point of writing test cases as extra code.
The Essence of Tester and Testee¶
As far as I know, no existing material clearly distinguishes and defines the essence of Tester and Testee. But you need to know their unchanging essence and its limits to decide how far testing should go.
- Testee: The thing being tested
- Tester: The one who tests. The one who judges normality? No. First, the Tester uses the Testee; judgment comes only after.
A commonly overlooked point in TDD is the essential difference between Tester and Testee. The Testee is the actual working code, the thing we want to verify. The Tester is the code that uses the Testee, giving input and checking results. Most TDD guides just say "test code verifies production code" or "tests clarify requirements." But this is not just a simple distinction.
The Testee is essentially the product. It must handle many real-world problems, so it becomes complex and sophisticated. The Testee is the best solution the developer can offer for the problem at hand—the "general solution" in mathematical terms. But if the Tester is more functional than the Testee, then the Tester should be the product. If the Tester can calculate the correct result for every input it gives, then the Tester is the product. It's not product vs. quality check, but product vs. a superior product.
Simply put: Testee is a genius, Tester is a fool. In test-driven development, always remember: the Tester should not try to understand or surpass the Testee. This is an essential, forced limitation.
Thus, testing must have the dynamic of "zero-knowledge proof." The Testee contains all logic and complexity, but the Tester knows nothing inside. The Tester just knows a few cases and, if the Testee gives the same result, says "you're better than me." So, having the Tester check every state of the Testee is against the essence.
So don't say things like "judge only by clear expected values." The Tester is just a stand-in for the user. When a user logs in, they don't inspect the session value; they just see "success" on the screen. Before demanding clear expected values, reduce the criteria to the user's level. The Tester should be dumb. Otherwise, you won't know whether you're building a test or a second product.
Suppose you make a monitor and do quality control. If it turns on and shows the adjustment screen, that's enough. That's how users judge it.
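The "dumb Tester" idea can be sketched as follows. This is a hypothetical example, with `login()` and its return shape invented for illustration: the Testee hides all its internal machinery, and the Tester checks only what a user would see on the screen.

```javascript
// Hypothetical example: login() stands in for the Testee. A real one would
// do hashing, session creation, lockout counting; the Tester never looks
// at any of that internal state.
function login(username, password) {
  const ok = username === 'alice' && password === 'secret'; // stand-in logic
  return ok
    ? { screen: 'dashboard' }                            // what the user sees on success
    : { screen: 'login', error: 'Invalid credentials' }; // what the user sees on rejection
}

// The Tester is a fool: it knows a few cases and only what a user would see.
// It never reads the session store or the lockout counter.
console.assert(login('alice', 'secret').screen === 'dashboard');
console.assert(login('alice', 'wrong').screen === 'login');
```

If the Tester started asserting on session contents or hash formats, it would be re-implementing the Testee's knowledge, which is exactly what the zero-knowledge stance forbids.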
Tester = User Champion Participating in Development¶
A "user champion" is a passionate, skilled user of a product or technology, often in a company or software context.
We need to redefine the role of the Tester.
So far, discussion has focused on technical layers from unit to E2E tests, based on the Testee's scope. But this doesn't explain the essential role or purpose of testing. That's why most developers fail to use TDD well. What really matters is not the structural test layers, but what perspective and responsibility the Tester should have.
Have you ever considered SRP (Single Responsibility Principle) for the Tester, which is also code?
The Tester's role is to represent the perspective of the actual user—the "user champion." User champions understand the product's value and spot real-world problems. They don't just check if features work, but if the product is fit for purpose and trustworthy. This never happens at the unit test level. No user champion would judge a login by whether five items appear in a list.
A classic example is the Excel team at Microsoft, where accountants and finance experts worked closely with developers, giving real-time feedback on prototypes. Features like pivot tables and advanced formulas were improved based on user champion input. This kind of collaboration makes software much more valuable in the real world.
In practice, it's hard to have user champions on the team. Even in common fields, they're rare, and for new services, there may be no one with relevant experience. And user champions can't keep up with developer speed. Verifying a set of features is a long process, and humans can't keep up with every code change. Debugging is a string of failures, which is hard for people to endure.
So, code-based Testers can't fully replace human user champions, but should imitate them as much as possible to help during development and debugging. Testers should be written from the perspective of real users.
For example, don't just check API responses with Given-When-Then, but check the whole process—situation, action, action, ... action, result—at the human user level. For a shopping mall, you need to check scenarios like "login - search - select product - choose option - pay - confirm payment - track delivery - confirm receipt" and "login - search - add to cart - search - add to cart - checkout - choose option - pay" to see if the service is satisfactory at the user level.
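A minimal sketch of such a scenario-level Tester, with all step names invented for illustration (this is not a real shop API): each step acts on one shared session, and the scenario is judged only by user-visible outcomes.

```javascript
// Hypothetical sketch: a scenario is a sequence of user actions against one
// shared session, judged only by user-visible outcomes.
function runScenario(steps, session = {}) {
  for (const step of steps) {
    const outcome = step(session); // each step reads and mutates the session
    if (!outcome.ok) return { ok: false, failedAt: step.name };
  }
  return { ok: true };
}

// Steps imitating the user champion's path: login, search, cart, pay.
const login     = (s) => { s.user = 'alice'; return { ok: true }; };
const search    = (s) => { s.found = ['mug']; return { ok: !!s.user }; };
const addToCart = (s) => { s.cart = s.found; return { ok: s.found.length > 0 }; };
const pay       = (s) => { s.paid = s.cart.length > 0; return { ok: s.paid }; };

console.log(runScenario([login, search, addToCart, pay])); // { ok: true }
```

A scenario that skips login fails at the first step that depends on it, which is how a user champion would report it: "search doesn't work if I'm not logged in."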
Every developer can't have their own user champion, but test code can partially fill that role. The more the Tester internalizes the user champion's perspective, the more trustworthy the product is. Most importantly, if TDD is successful, the developer can be sure that at least a few minimum user paths are guaranteed.
The Ultimate Goal: Confidence in Normal Service¶
Suppose you inspect a monitor. Even if every circuit path checks out individually, does that guarantee the user sees a picture? What is the test actually for?
What matters in quality assurance is not just that some code works, but that the service can be reliably provided to many users. Unit tests or GWT (Given-When-Then) are too narrow to discuss value at the service level. And if you consider the effort to make service-level Testers that imitate user champions, there's only one justifiable ROI.
Writing tests takes a lot of time and skill. You must cover every normal and failure path a real user might encounter: all intended features, and for each branch, all of its cases. You must check not just success, but rejection and its follow-up, as well as outright failure. Rejection means planned denials, like a wrong password or a locked account, and you must verify that the service keeps working after the rejection is handled. You must also check that unexpected errors (hardware faults, unhandled cases) are handled gracefully.
In a deterministic system, routine results fall into three categories: intended success, intended rejection, and failure. This success/rejection/failure taxonomy is a topic for another document.
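The three categories can be sketched with a hypothetical `attemptLogin()` (invented for illustration; the special user `'down'` simulates an infrastructure failure): success is the intended happy path, rejection is a planned denial, and failure is an unexpected error that must still be handled rather than crash.

```javascript
// Hypothetical sketch of the three routine outcomes: success / rejection / failure.
function attemptLogin(user, password) {
  try {
    if (user === 'down') throw new Error('db unreachable'); // simulated infrastructure failure
    if (password !== 'secret') return { kind: 'rejection', reason: 'bad-password' }; // planned denial
    return { kind: 'success' }; // intended happy path
  } catch (e) {
    return { kind: 'failure', reason: e.message }; // handled gracefully, not a crash
  }
}

// After a rejection is handled, the service must still work.
console.assert(attemptLogin('alice', 'wrong').kind === 'rejection');
console.assert(attemptLogin('alice', 'secret').kind === 'success');
console.assert(attemptLogin('down', 'secret').kind === 'failure');
```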
So the whole service is repeatedly checked with 2–5 or more inputs and situations, and the test code is much larger than the product code—often 10x more lines. Code lines don't exactly equal time, but they're not unrelated. If you want real value from TDD, expect to invest much more than a beginner would guess.
For me, time spent on test and product code is often close to 1:1. Nearly half of total development time goes to writing and maintaining tests. With such huge investment, the only justifiable ROI is this: TDD must give you confidence that the service actually works and users can use it. The purpose of tests is not code perfection, but to have minimum evidence that the system works as expected in the real world.
To be clear: it's about "evidence." Not the abstract feeling of "trust," but the concrete fact that the service works if used this way.
Modern software is so complex that users can use it in many ways. It's almost impossible to block all unintended paths in advance. But if you can guarantee a few normal paths, check known rejection paths, and handle unexpected errors, you have much stronger evidence than just saying QA/QC checked it last year.
Conditions for Confidence-Providing Tests¶
- Absolute control over the environment
- For tests to be trusted, the execution environment must be fully controlled.
- The developer must control all dependencies—external systems, network, DB, etc.—and guarantee the same conditions for every test run.
- For example, the DB must be droppable and recreatable. I've even made a TCP/IP emulator for PLC integration.
- Every developer must use an independent environment. All tests must be runnable on a local machine. Don't share test infra—results get polluted.
- Prefer non-isolated tests
- Isolated tests for single modules/functions don't reflect real service interactions.
- Non-isolated tests and scenario tests with time order between components verify real service behavior and increase trust.
- Honestly, I never write isolated tests. I use mocks only when absolutely necessary—fewer than three in all tests.
We're not doing lab demos—we're making commercial products.
- Block operation of dependent tests
- Real services have many features working in sequence, each affecting the next. Tests should run multiple features in sequence.
- But test runners and per-feature timeouts are useful tools. So, write unit tests for each feature, but run tests in blocks.
- For example, for signup → login → purchase → payment → delivery tracking, write each as a test, group them as a block, and check the whole service works.
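A minimal sketch of block operation, with hypothetical step names: each step depends on the state left by the previous one, and the block reports the first step that breaks instead of continuing with state that later steps cannot trust.

```javascript
// Sketch of dependent tests run as one block (step names are invented).
const state = {};

const block = [
  ['signup',   () => { state.account = 'alice'; return !!state.account; }],
  ['login',    () => { state.session = state.account; return !!state.session; }],
  ['purchase', () => { state.order = 'order-1'; return !!state.session && !!state.order; }],
  ['payment',  () => { state.paid = true; return state.paid; }],
];

// Run the steps in order; stop at the first failing step.
function runBlock(steps) {
  for (const [name, step] of steps) {
    if (!step()) return { passed: false, failedStep: name };
  }
  return { passed: true };
}

console.log(runBlock(block)); // { passed: true }
```

In a real runner this maps naturally onto one `describe` block whose tests run in declaration order, with a per-step timeout still available for each feature.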
Test Pyramid
(1) End-To-End Testing (UI Testing) - 10%
(2) Integration Testing - 20%
(3) Unit Testing - 70%
If you study TDD, you'll see this pyramid and recommended ratios.
It's often said that "the more granular the test, the easier it is to find errors," but that's wrong: the call stack already tells you where an error occurred.
Moreover, the higher-level the test, the greater the value it protects and the cost of its failure; and the more complex the path, the more you need automated tests. Even by simple economics, honestly, the inverted pyramid is the better shape.
If the Product (=Testee) is a library or framework, the usual pyramid applies.
I think the reason this ratio is so common is that TDD is still mostly used for libraries/frameworks, not applications.
Conclusion¶
Don't treat TDD as a purely technical matter. Unit, integration, and end-to-end tests are technical divisions; you can optimize each locally and still get a bad overall result, and TDD is no exception. Testing is not a technical ritual for absolution, but a tool for ensuring software reliability and service availability. You must give the Tester a clear role, and if that role is the user champion, the details in this document follow logically.
TDD takes a lot of resources. Test and product effort is close to 1:1. TDD is not a separate task, but a method for managing the whole development process. So I suggest All or Nothing for TDD.
What TDD ultimately provides is not the abstract feeling of "trust," but the physical evidence that the service works. The developer can be sure of at least the minimum normal, rejection, and failure paths. TDD's success is not about the amount of tests, but about guaranteeing real service quality and user experience.
If there are system boundaries
Sometimes people misunderstand "user" as always the general public. User is a broader concept, defined by the system boundary.
If the system is an API server, the API user is the user.
If you develop DB stored procedures, backend, and frontend separately, the backend developer is the user for the DB, the frontend developer is the user for the backend, and the website user is the user for the frontend. Likewise, if you develop an SDK like DirectX, the SDK user is the user.
See Also¶
- Software Architecture Is Decision-Making, Not Imitation - This document is the result of thinking about how to use TDD, not just imitating it.
- Software Architecture Is Determined by Invariants, Not Extensibility - "The essence of Tester and Testee" was an example of realizing an invariant for me.
Author¶
HJ, Ph.D. / Software Architect
(js.seth.h@gmail.com)
https://js-seth-h.github.io/website/Biography/
Over 20 years in software development, focusing on design reasoning and system structure - as foundations for long-term productivity and structural clarity.
Researched semantic web and meta-browser architecture in graduate studies,
with an emphasis on structural separation between data and presentation.
Ph.D. in Software, Korea University
M.S. in Computer Science Education, Korea University
B.S. in Computer Science Education, Korea University