Testing¶
Why We Test¶
Tests exist to give you confidence — confidence that your code works, confidence that you can refactor without breaking things, confidence that new features don't regress existing behavior.
Tests are not about hitting coverage numbers. They're about enabling you to move fast without breaking things.
"Write tests. Not too many. Mostly integration." — Guillermo Rauch
The Testing Trophy¶
We follow the Testing Trophy model (Kent C. Dodds), which prioritizes tests by return on investment:
╭───────────╮
│ E2E │ ← Few, critical user journeys
╭──┴───────────┴──╮
│ Integration │ ← Primary focus (best ROI)
╭──┴─────────────────┴──╮
│ Unit │ ← Pure functions, utilities
╭──┴───────────────────────┴──╮
│ Static Analysis │ ← TypeScript, ESLint, Ruff, MyPy
╰─────────────────────────────╯
Why Integration Tests Win¶
- Unit tests are fast but can pass while the system is broken
- E2E tests catch real bugs but are slow and flaky
- Integration tests hit the sweet spot: they test real behavior, run reasonably fast, and don't break when you refactor internals
"The more your tests resemble the way your software is used, the more confidence they can give you." — Kent C. Dodds
FIRST Principles¶
Every test should be:
| Principle | Meaning |
|---|---|
| Fast | Tests should run in milliseconds, not seconds. Slow tests don't get run. |
| Independent | Tests should not depend on each other. Run in any order, run in parallel. |
| Repeatable | Same input → same output. No flakiness. No dependence on external state. |
| Self-validating | Pass or fail. No manual inspection of logs or output. |
| Timely | Write tests before or alongside code, not as an afterthought. |
Arrange-Act-Assert¶
Structure every test in three phases:
def test_user_can_change_password():
# Arrange — set up the preconditions
user = create_user(password="old_password")
# Act — perform the action being tested
result = user.change_password("old_password", "new_password")
# Assert — verify the expected outcome
assert result.success is True
assert user.verify_password("new_password") is True
test('user can submit the form', async () => {
// Arrange
const user = userEvent.setup();
render(<ContactForm />);
// Act
await user.type(screen.getByLabelText('Email'), 'test@example.com');
await user.click(screen.getByRole('button', { name: 'Submit' }));
// Assert
expect(screen.getByText('Message sent')).toBeInTheDocument();
});
Keep the three phases visually distinct. A reader should immediately see what's being set up, what's being tested, and what's expected.
What to Test¶
| Category | Test? | Why |
|---|---|---|
| Business logic | Yes | Core value of your application |
| Error handling | Yes | Users will hit errors; handle them gracefully |
| Edge cases | Yes | Empty inputs, nulls, boundaries, concurrent access |
| API contracts | Yes | Endpoints should return expected shapes |
| Happy path | Yes | The primary use case must work |
| Third-party wrappers | No | Trust the library; mock it at the boundary |
| Private methods | No | Test through public API; internals can change |
| Generated code | No | OpenAPI clients, Prisma, etc. — test the generator, not the output |
| UI layout details | Rarely | CSS positioning changes; test behavior, not pixels |
What NOT to Test¶
Implementation Details¶
If your test breaks when you refactor internals (without changing behavior), you're testing implementation details.
Bad — coupled to implementation:
def test_user_service_calls_repository():
mock_repo = Mock()
service = UserService(repo=mock_repo)
service.get_user(123)
mock_repo.find_by_id.assert_called_once_with(123) # Testing HOW, not WHAT
Good — tests behavior:
def test_user_service_returns_user():
service = UserService(repo=real_or_fake_repo)
user = service.get_user(123)
assert user.id == 123
assert user.email == "expected@example.com"
The Mockery Anti-Pattern¶
If you're mocking so much that you're testing mocks instead of code, step back. Integration tests with real (or realistic) dependencies give more confidence.
Coverage Philosophy¶
Target 80%+ coverage on business logic. Don't chase 100%.
Coverage tells you what code ran, not whether it works correctly. A test that executes code without meaningful assertions is worse than no test — it gives false confidence.
What to cover: - Services, use cases, business rules - API endpoints (request → response) - Data transformations - Error paths
What to exclude from coverage: - Generated code (API clients, ORM models) - Configuration files - Framework boilerplate - Development-only code
Test Naming¶
Use descriptive names that explain the scenario:
# Pattern: test_<unit>_<scenario>_<expected>
def test_login_with_invalid_password_returns_401():
...
def test_create_order_with_empty_cart_raises_validation_error():
...
def test_password_reset_token_expires_after_24_hours():
...
// Pattern: "should <expected behavior> when <scenario>"
test('should display error message when login fails', async () => { ... });
test('should redirect to dashboard when authenticated', async () => { ... });
test('should disable submit button while form is submitting', async () => { ... });
A test name should tell you what broke without reading the code.
Anti-Patterns¶
The Inspector¶
Tests that reach into private state or implementation details. They break on every refactor and don't test real behavior.
The Mockery¶
So many mocks that you're not testing real code. If everything is mocked, what are you actually verifying?
The Flaky Test¶
Tests that sometimes pass and sometimes fail. These erode trust in the entire test suite. Fix or delete them.
The Slow Test¶
A test that takes seconds (or minutes) discourages running tests frequently. Optimize or move to E2E.
The Giant Test¶
A single test that sets up a complex scenario and makes 50 assertions. Break it into focused tests.
The Coverage Chaser¶
Adding tests just to hit a coverage number, without meaningful assertions. Coverage is a metric, not a goal.
DAMP over DRY¶
In production code, DRY (Don't Repeat Yourself) reduces bugs.
In tests, DAMP (Descriptive And Meaningful Phrases) improves readability. A test should be understandable in isolation — you shouldn't need to trace through helpers and factories to understand what's being tested.
Too DRY:
def test_user_creation():
user = make_user() # What kind of user? What properties?
assert validate(user) # What does validate check?
DAMP:
def test_user_with_valid_email_passes_validation():
user = User(email="valid@example.com", name="Test User")
assert user.is_valid() is True
Duplication in tests is acceptable if it improves clarity.
Stack-Specific Guides¶
- Backend Testing (Python/FastAPI) — pytest, async testing, database fixtures, API testing
- Frontend Testing (React/TypeScript) — Jest, Testing Library, Playwright, component testing