Testing

Why We Test

Tests exist to give you confidence — confidence that your code works, confidence that you can refactor without breaking things, confidence that new features don't regress existing behavior.

Tests are not about hitting coverage numbers. They're about enabling you to move fast without breaking things.

"Write tests. Not too many. Mostly integration." — Guillermo Rauch


The Testing Trophy

We follow the Testing Trophy model (Kent C. Dodds), which prioritizes tests by return on investment:

                    ╭───────────╮
                    │    E2E    │  ← Few, critical user journeys
                 ╭──┴───────────┴──╮
                 │   Integration   │  ← Primary focus (best ROI)
              ╭──┴─────────────────┴──╮
              │        Unit           │  ← Pure functions, utilities
           ╭──┴───────────────────────┴──╮
           │      Static Analysis        │  ← TypeScript, ESLint, Ruff, MyPy
           ╰─────────────────────────────╯

Why Integration Tests Win

  • Unit tests are fast but can pass while the system is broken
  • E2E tests catch real bugs but are slow and flaky
  • Integration tests hit the sweet spot: they test real behavior, run reasonably fast, and don't break when you refactor internals

"The more your tests resemble the way your software is used, the more confidence they can give you." — Kent C. Dodds


FIRST Principles

Every test should be:

• Fast: tests should run in milliseconds, not seconds. Slow tests don't get run.
• Independent: tests should not depend on each other. Run in any order, run in parallel.
• Repeatable: same input → same output. No flakiness. No dependence on external state.
• Self-validating: pass or fail. No manual inspection of logs or output.
• Timely: write tests before or alongside code, not as an afterthought.
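A minimal sketch of Independent and Repeatable in practice: each test builds its own state, so tests pass in any order. The names here (make_cart, the cart itself) are illustrative, not from this guide.

```python
def make_cart():
    # Fresh state for every test: nothing shared, nothing leaked.
    return []

def test_add_item_puts_it_in_the_cart():
    cart = make_cart()
    cart.append("apple")
    assert cart == ["apple"]

def test_cart_starts_empty():
    # Passes in any order, even in parallel, because no test
    # mutates state that another test reads.
    cart = make_cart()
    assert cart == []
```

With pytest, fixtures serve the same purpose with less boilerplate; the point is that state is created per test, never shared.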

Arrange-Act-Assert

Structure every test in three phases:

def test_user_can_change_password():
    # Arrange — set up the preconditions
    user = create_user(password="old_password")

    # Act — perform the action being tested
    result = user.change_password("old_password", "new_password")

    # Assert — verify the expected outcome
    assert result.success is True
    assert user.verify_password("new_password") is True

The same three phases in a UI test:

test('user can submit the form', async () => {
  // Arrange
  const user = userEvent.setup();
  render(<ContactForm />);

  // Act
  await user.type(screen.getByLabelText('Email'), 'test@example.com');
  await user.click(screen.getByRole('button', { name: 'Submit' }));

  // Assert
  expect(screen.getByText('Message sent')).toBeInTheDocument();
});

Keep the three phases visually distinct. A reader should immediately see what's being set up, what's being tested, and what's expected.


What to Test

Test these:

• Business logic: the core value of your application
• Error handling: users will hit errors; handle them gracefully
• Edge cases: empty inputs, nulls, boundaries, concurrent access
• API contracts: endpoints should return the expected shapes
• Happy path: the primary use case must work

Skip these:

• Third-party wrappers: trust the library; mock it at the boundary
• Private methods: test through the public API; internals can change
• Generated code: OpenAPI clients, Prisma, etc. Test the generator, not the output
• UI layout details (rarely worth testing): CSS positioning changes; test behavior, not pixels
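The edge-case row above in practice: a table-driven test sweeping empty input, whitespace, and boundaries in one place. word_count is a hypothetical helper, defined here only so the sketch runs.

```python
def word_count(text: str) -> int:
    # Hypothetical helper, not from this guide.
    return len(text.split())

# Edge cases: empty input, whitespace-only, single word, two-word boundary.
CASES = [
    ("", 0),
    ("   ", 0),
    ("one", 1),
    ("two words", 2),
]

def test_word_count_edge_cases():
    for text, expected in CASES:
        assert word_count(text) == expected, f"word_count({text!r})"
```

With pytest, `@pytest.mark.parametrize` expresses the same table and reports each case as a separate test.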

What NOT to Test

Implementation Details

If your test breaks when you refactor internals (without changing behavior), you're testing implementation details.

Bad — coupled to implementation:

def test_user_service_calls_repository():
    mock_repo = Mock()
    service = UserService(repo=mock_repo)
    service.get_user(123)
    mock_repo.find_by_id.assert_called_once_with(123)  # Testing HOW, not WHAT

Good — tests behavior:

def test_user_service_returns_user():
    service = UserService(repo=real_or_fake_repo)  # an in-memory fake or a test database
    user = service.get_user(123)
    assert user.id == 123
    assert user.email == "expected@example.com"

The Mockery Anti-Pattern

If you're mocking so much that you're testing mocks instead of code, step back. Integration tests with real (or realistic) dependencies give more confidence.
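One way to get a realistic dependency without a database: an in-memory fake that honors the same contract as the real repository. All names here are illustrative sketches, not from this guide.

```python
class User:
    def __init__(self, id, email):
        self.id = id
        self.email = email

class InMemoryUserRepo:
    # Fake that implements the same interface as the real repository.
    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)

class UserService:
    def __init__(self, repo):
        self.repo = repo

    def get_user(self, user_id):
        return self.repo.find_by_id(user_id)

def test_get_user_returns_saved_user():
    repo = InMemoryUserRepo()
    repo.save(User(id=123, email="expected@example.com"))
    service = UserService(repo=repo)

    user = service.get_user(123)

    # Asserts on behavior (what came back), not on which repo methods
    # were called or how often.
    assert user.id == 123
    assert user.email == "expected@example.com"
```

The test survives refactors such as adding caching inside UserService, because nothing in it depends on call counts or internals.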


Coverage Philosophy

Target 80%+ coverage on business logic. Don't chase 100%.

Coverage tells you what code ran, not whether it works correctly. A test that executes code without meaningful assertions is worse than no test — it gives false confidence.

What to cover:

• Services, use cases, business rules
• API endpoints (request → response)
• Data transformations
• Error paths

What to exclude from coverage:

• Generated code (API clients, ORM models)
• Configuration files
• Framework boilerplate
• Development-only code
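With coverage.py, these exclusions and the 80% target might look like the following .coveragerc; the paths are illustrative and will differ per project.

```ini
[run]
branch = True
omit =
    */generated/*
    */migrations/*
    */settings/*

[report]
fail_under = 80
exclude_lines =
    pragma: no cover
    if TYPE_CHECKING:
```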


Test Naming

Use descriptive names that explain the scenario:

# Pattern: test_<unit>_<scenario>_<expected>

def test_login_with_invalid_password_returns_401():
    ...

def test_create_order_with_empty_cart_raises_validation_error():
    ...

def test_password_reset_token_expires_after_24_hours():
    ...

The JavaScript convention:

// Pattern: "should <expected behavior> when <scenario>"

test('should display error message when login fails', async () => { ... });

test('should redirect to dashboard when authenticated', async () => { ... });

test('should disable submit button while form is submitting', async () => { ... });

A test name should tell you what broke without reading the code.


Anti-Patterns

The Inspector

Tests that reach into private state or implementation details. They break on every refactor and don't test real behavior.

The Mockery

So many mocks that you're not testing real code. If everything is mocked, what are you actually verifying?

The Flaky Test

Tests that sometimes pass and sometimes fail. These erode trust in the entire test suite. Fix or delete them.
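A common source of flakiness is depending on the real clock. A sketch of one fix: pass "now" in explicitly so the test controls time. Token and is_expired are illustrative names, not from this guide.

```python
from datetime import datetime, timedelta

class Token:
    def __init__(self, created_at, ttl_hours=24):
        self.created_at = created_at
        self.ttl = timedelta(hours=ttl_hours)

    def is_expired(self, now):
        # The caller supplies the current time, so tests never race
        # the real clock and results are fully repeatable.
        return now - self.created_at >= self.ttl

def test_token_expires_after_24_hours():
    created = datetime(2024, 1, 1, 12, 0)
    token = Token(created_at=created)

    assert token.is_expired(created + timedelta(hours=23)) is False
    assert token.is_expired(created + timedelta(hours=25)) is True
```

When you can't change the signature, libraries such as freezegun can patch the clock instead.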

The Slow Test

A test that takes seconds (or minutes) discourages running tests frequently. Optimize or move to E2E.

The Giant Test

A single test that sets up a complex scenario and makes 50 assertions. Break it into focused tests.

The Coverage Chaser

Adding tests just to hit a coverage number, without meaningful assertions. Coverage is a metric, not a goal.


DAMP over DRY

In production code, DRY (Don't Repeat Yourself) reduces bugs.

In tests, DAMP (Descriptive And Meaningful Phrases) improves readability. A test should be understandable in isolation — you shouldn't need to trace through helpers and factories to understand what's being tested.

Too DRY:

def test_user_creation():
    user = make_user()  # What kind of user? What properties?
    assert validate(user)  # What does validate check?

DAMP:

def test_user_with_valid_email_passes_validation():
    user = User(email="valid@example.com", name="Test User")
    assert user.is_valid() is True

Duplication in tests is acceptable if it improves clarity.
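A middle ground between a too-DRY factory and full duplication: a builder whose defaults are valid and whose overrides stay visible at the call site. make_user and is_valid are illustrative, defined here only so the sketch runs.

```python
def make_user(email="valid@example.com", name="Test User"):
    # Defaults are valid; each test overrides only what it cares about.
    return {"email": email, "name": name}

def is_valid(user):
    # Toy validation, standing in for real rules.
    return "@" in user["email"]

def test_user_with_blank_email_fails_validation():
    user = make_user(email="")  # the relevant detail is visible here
    assert is_valid(user) is False

def test_default_user_passes_validation():
    assert is_valid(make_user()) is True
```

The reader never has to open the builder to learn why the first test fails validation: the blank email is right there in the test body.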


Stack-Specific Guides