Testing

Why We Test

Tests exist to give you confidence — confidence that your code works, confidence that you can refactor without breaking things, confidence that new features don't regress existing behavior.

Tests are not about hitting coverage numbers. They're about enabling you to move fast without breaking things.

"Write tests. Not too many. Mostly integration." — Guillermo Rauch


The Testing Trophy

We follow the Testing Trophy model (Kent C. Dodds), which prioritizes tests by return on investment:

                    ╭───────────╮
                    │    E2E    │  ← Few, critical user journeys
                 ╭──┴───────────┴──╮
                 │   Integration   │  ← Primary focus (best ROI)
              ╭──┴─────────────────┴──╮
              │        Unit           │  ← Pure functions, utilities
           ╭──┴───────────────────────┴──╮
           │      Static Analysis        │  ← TypeScript, ESLint, Ruff, MyPy
           ╰─────────────────────────────╯

Why Integration Tests Win

  • Unit tests are fast but can pass while the system is broken
  • E2E tests catch real bugs but are slow and flaky
  • Integration tests hit the sweet spot: they test real behavior, run reasonably fast, and don't break when you refactor internals

"The more your tests resemble the way your software is used, the more confidence they can give you." — Kent C. Dodds


FIRST Principles

Every test should be:

• Fast: tests should run in milliseconds, not seconds. Slow tests don't get run.
• Independent: tests should not depend on each other. Run in any order, run in parallel.
• Repeatable: same input → same output. No flakiness. No dependence on external state.
• Self-validating: pass or fail. No manual inspection of logs or output.
• Timely: write tests before or alongside code, not as an afterthought.
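A minimal sketch of Independent and Repeatable in practice: each test builds its own state, so tests pass in any order. The names here (make_cart, the cart itself) are illustrative, not from this guide.

```python
def make_cart():
    # Fresh state for every test: nothing shared, nothing leaked.
    return []

def test_add_item_puts_it_in_the_cart():
    cart = make_cart()
    cart.append("apple")
    assert cart == ["apple"]

def test_cart_starts_empty():
    # Passes in any order, even in parallel, because no test
    # mutates state that another test reads.
    cart = make_cart()
    assert cart == []
```

With pytest, fixtures serve the same purpose with less boilerplate; the point is that state is created per test, never shared.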

Arrange-Act-Assert

Structure every test in three phases:

def test_user_can_change_password():
    # Arrange — set up the preconditions
    user = create_user(password="old_password")

    # Act — perform the action being tested
    result = user.change_password("old_password", "new_password")

    # Assert — verify the expected outcome
    assert result.success is True
    assert user.verify_password("new_password") is True

The same three phases in a UI test:

test('user can submit the form', async () => {
  // Arrange
  const user = userEvent.setup();
  render(<ContactForm />);

  // Act
  await user.type(screen.getByLabelText('Email'), 'test@example.com');
  await user.click(screen.getByRole('button', { name: 'Submit' }));

  // Assert
  expect(screen.getByText('Message sent')).toBeInTheDocument();
});

Keep the three phases visually distinct. A reader should immediately see what's being set up, what's being tested, and what's expected.


What to Test

Test these:

• Business logic: the core value of your application
• Error handling: users will hit errors; handle them gracefully
• Edge cases: empty inputs, nulls, boundaries, concurrent access
• API contracts: endpoints should return the expected shapes
• Happy path: the primary use case must work

Skip these:

• Third-party wrappers: trust the library; mock it at the boundary
• Private methods: test through the public API; internals can change
• Generated code: OpenAPI clients, Prisma, etc. Test the generator, not the output
• UI layout details (rarely worth testing): CSS positioning changes; test behavior, not pixels
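The edge-case row above in practice: a table-driven test sweeping empty input, whitespace, and boundaries in one place. word_count is a hypothetical helper, defined here only so the sketch runs.

```python
def word_count(text: str) -> int:
    # Hypothetical helper, not from this guide.
    return len(text.split())

# Edge cases: empty input, whitespace-only, single word, two-word boundary.
CASES = [
    ("", 0),
    ("   ", 0),
    ("one", 1),
    ("two words", 2),
]

def test_word_count_edge_cases():
    for text, expected in CASES:
        assert word_count(text) == expected, f"word_count({text!r})"
```

With pytest, `@pytest.mark.parametrize` expresses the same table and reports each case as a separate test.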

What NOT to Test

Implementation Details

If your test breaks when you refactor internals (without changing behavior), you're testing implementation details.

Bad — coupled to implementation:

def test_user_service_calls_repository():
    mock_repo = Mock()
    service = UserService(repo=mock_repo)
    service.get_user(123)
    mock_repo.find_by_id.assert_called_once_with(123)  # Testing HOW, not WHAT

Good — tests behavior:

def test_user_service_returns_user():
    service = UserService(repo=real_or_fake_repo)  # an in-memory fake or a test database
    user = service.get_user(123)
    assert user.id == 123
    assert user.email == "expected@example.com"

The Mockery Anti-Pattern

If you're mocking so much that you're testing mocks instead of code, step back. Integration tests with real (or realistic) dependencies give more confidence.
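One way to get a realistic dependency without a database: an in-memory fake that honors the same contract as the real repository. All names here are illustrative sketches, not from this guide.

```python
class User:
    def __init__(self, id, email):
        self.id = id
        self.email = email

class InMemoryUserRepo:
    # Fake that implements the same interface as the real repository.
    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)

class UserService:
    def __init__(self, repo):
        self.repo = repo

    def get_user(self, user_id):
        return self.repo.find_by_id(user_id)

def test_get_user_returns_saved_user():
    repo = InMemoryUserRepo()
    repo.save(User(id=123, email="expected@example.com"))
    service = UserService(repo=repo)

    user = service.get_user(123)

    # Asserts on behavior (what came back), not on which repo methods
    # were called or how often.
    assert user.id == 123
    assert user.email == "expected@example.com"
```

The test survives refactors such as adding caching inside UserService, because nothing in it depends on call counts or internals.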


Coverage Philosophy

Target 80%+ coverage on business logic. Don't chase 100%.

Coverage tells you what code ran, not whether it works correctly. A test that executes code without meaningful assertions is worse than no test — it gives false confidence.

What to cover:

• Services, use cases, business rules
• API endpoints (request → response)
• Data transformations
• Error paths

What to exclude from coverage:

• Generated code (API clients, ORM models)
• Configuration files
• Framework boilerplate
• Development-only code
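With coverage.py, these exclusions and the 80% target might look like the following .coveragerc; the paths are illustrative and will differ per project.

```ini
[run]
branch = True
omit =
    */generated/*
    */migrations/*
    */settings/*

[report]
fail_under = 80
exclude_lines =
    pragma: no cover
    if TYPE_CHECKING:
```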


Test Naming

Use descriptive names that explain the scenario:

# Pattern: test_<unit>_<scenario>_<expected>

def test_login_with_invalid_password_returns_401():
    ...

def test_create_order_with_empty_cart_raises_validation_error():
    ...

def test_password_reset_token_expires_after_24_hours():
    ...

The JavaScript convention:

// Pattern: "should <expected behavior> when <scenario>"

test('should display error message when login fails', async () => { ... });

test('should redirect to dashboard when authenticated', async () => { ... });

test('should disable submit button while form is submitting', async () => { ... });

A test name should tell you what broke without reading the code.


Anti-Patterns

The Inspector

Tests that reach into private state or implementation details. They break on every refactor and don't test real behavior.

The Mockery

So many mocks that you're not testing real code. If everything is mocked, what are you actually verifying?

The Flaky Test

Tests that sometimes pass and sometimes fail. These erode trust in the entire test suite. Fix or delete them.
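A common source of flakiness is depending on the real clock. A sketch of one fix: pass "now" in explicitly so the test controls time. Token and is_expired are illustrative names, not from this guide.

```python
from datetime import datetime, timedelta

class Token:
    def __init__(self, created_at, ttl_hours=24):
        self.created_at = created_at
        self.ttl = timedelta(hours=ttl_hours)

    def is_expired(self, now):
        # The caller supplies the current time, so tests never race
        # the real clock and results are fully repeatable.
        return now - self.created_at >= self.ttl

def test_token_expires_after_24_hours():
    created = datetime(2024, 1, 1, 12, 0)
    token = Token(created_at=created)

    assert token.is_expired(created + timedelta(hours=23)) is False
    assert token.is_expired(created + timedelta(hours=25)) is True
```

When you can't change the signature, libraries such as freezegun can patch the clock instead.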

The Slow Test

A test that takes seconds (or minutes) discourages running tests frequently. Optimize or move to E2E.

The Giant Test

A single test that sets up a complex scenario and makes 50 assertions. Break it into focused tests.

The Coverage Chaser

Adding tests just to hit a coverage number, without meaningful assertions. Coverage is a metric, not a goal.


DAMP over DRY

In production code, DRY (Don't Repeat Yourself) reduces bugs.

In tests, DAMP (Descriptive And Meaningful Phrases) improves readability. A test should be understandable in isolation — you shouldn't need to trace through helpers and factories to understand what's being tested.

Too DRY:

def test_user_creation():
    user = make_user()  # What kind of user? What properties?
    assert validate(user)  # What does validate check?

DAMP:

def test_user_with_valid_email_passes_validation():
    user = User(email="valid@example.com", name="Test User")
    assert user.is_valid() is True

Duplication in tests is acceptable if it improves clarity.
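A middle ground between a too-DRY factory and full duplication: a builder whose defaults are valid and whose overrides stay visible at the call site. make_user and is_valid are illustrative, defined here only so the sketch runs.

```python
def make_user(email="valid@example.com", name="Test User"):
    # Defaults are valid; each test overrides only what it cares about.
    return {"email": email, "name": name}

def is_valid(user):
    # Toy validation, standing in for real rules.
    return "@" in user["email"]

def test_user_with_blank_email_fails_validation():
    user = make_user(email="")  # the relevant detail is visible here
    assert is_valid(user) is False

def test_default_user_passes_validation():
    assert is_valid(make_user()) is True
```

The reader never has to open the builder to learn why the first test fails validation: the blank email is right there in the test body.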


Stack-Specific Guides