
Pragmatic Testing for Small Teams: What to Skip, What to Keep

Most testing advice is written for teams with full-time QA engineers. For small teams, the question is different: what tests are actually worth the effort to write and maintain?

EvolRed Team · 7 min read

Most testing advice is written as if the reader has unlimited time. Test everything. Test at every layer. Aim for 100% coverage. Follow the pyramid.

For small teams, this advice is unhelpful. You do not have unlimited time. You have a handful of engineers who are already behind on features, and a test suite that is either non-existent or a drag on velocity. The real question is: given that you can only write a limited number of tests, which ones are worth writing?

We have lived on both sides of this. Here is what we have settled on for small teams.

The Starting Premise

Tests are not free. Every test is code that has to be written, read, maintained, and updated when requirements change. A test that catches a bug once and is then maintained forever has a clear ROI. A test that catches nothing but regularly breaks during refactors is a liability — another form of the technical debt nobody budgets for.

This means the question is not "should we test this?" but "is this test going to earn its keep?". Most testing dogma ignores this trade-off, which is why it ends up with test suites that teams quietly stop running.

The Tests That Almost Always Earn Their Keep

End-to-end tests of the critical user paths. If your app is an e-commerce site, this is "user can find a product, add it to basket, and check out". If it is a SaaS tool, this is "user can sign up, log in, and complete the core workflow". These tests are expensive per test, but they catch the most catastrophic failures — the ones where the whole product stops working — and they are usually few in number. Five to ten of these, running reliably, are worth more than a thousand unit tests.

Tests of money-handling code. Anything involving payments, refunds, invoicing, or accounting. These are high-consequence areas where bugs are expensive and often invisible until something gets audited. Test them thoroughly, including the error paths.
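As a sketch of what "including the error paths" means in practice, here is a hypothetical refund calculator (the function name and rules are invented for illustration) where the rejection cases get the same attention as the happy path:

```python
# Hypothetical money-handling code: refunds are capped at the amount paid,
# and zero or negative refunds are rejected outright.
from decimal import Decimal

def calculate_refund(amount_paid: Decimal, amount_to_refund: Decimal) -> Decimal:
    if amount_to_refund <= 0:
        raise ValueError("refund must be positive")
    if amount_to_refund > amount_paid:
        raise ValueError("cannot refund more than was paid")
    return amount_to_refund

def test_full_refund():
    assert calculate_refund(Decimal("19.99"), Decimal("19.99")) == Decimal("19.99")

def test_over_refund_rejected():
    # The error path is the part that gets audited -- test it explicitly.
    try:
        calculate_refund(Decimal("19.99"), Decimal("20.00"))
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Note the use of `Decimal` rather than floats: for money, the representation is part of the correctness you are testing.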

Tests of permission and authorisation logic. The single most common serious bug in the codebases we inherit is broken access control. Tests of who can see what, who can modify what, and what happens when an unauthorised user tries, are almost always a good investment.

Tests of business rules with non-obvious edge cases. Anything where the correct behaviour depends on a combination of inputs that is easy to get wrong. Date/time handling with timezones. Discount logic with stacking rules. Any calculation where you would have to sit down with a calculator to verify by hand. These tests protect against regressions during refactors.
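For instance, here is a hypothetical stacking rule of the "sit down with a calculator" variety: percentage discounts compound rather than add, and the total discount is capped. Both behaviours are easy to get wrong by hand, which is exactly why the test earns its keep.

```python
# Invented business rule for illustration: discounts compound
# multiplicatively, and the total discount never exceeds 50%.
def apply_discounts(price: float, discounts: list[float]) -> float:
    discounted = price
    for d in discounts:
        discounted *= (1 - d)
    floor = price * 0.5  # cap: never more than half off
    return round(max(discounted, floor), 2)

# Two 20% discounts compound to 36% off, not 40% off:
assert apply_discounts(100.0, [0.2, 0.2]) == 64.0
# Two 40% discounts would be 64% off -- the cap kicks in instead:
assert apply_discounts(100.0, [0.4, 0.4]) == 50.0
assert apply_discounts(100.0, []) == 100.0
```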

Tests of code that has broken before. If you have had a bug in production, write a test that would have caught it before you fix it. This is the single highest-signal test in any codebase, because it encodes a specific thing you already know can go wrong.
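The pattern, sketched with an invented example (a quantity parser that once crashed on empty input): the test is written to fail against the buggy behaviour first, then the fix makes it pass, and it stays in the suite permanently.

```python
# Hypothetical regression fix: the original version raised ValueError on
# empty input, which took down checkout. The test encodes that incident.
def parse_quantity(raw: str) -> int:
    if not raw.strip():
        return 0  # the fix: empty input now defaults to zero
    return int(raw)

def test_empty_quantity_regression():
    # This is the test that would have caught the production bug.
    assert parse_quantity("") == 0
    assert parse_quantity("  ") == 0
    assert parse_quantity("3") == 3
```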

The Tests That Usually Do Not Earn Their Keep

Unit tests of trivial code. Testing that a getter returns what was set. Testing that a function with one line of logic behaves as expected. These tests catch nothing, exist mostly to make coverage numbers look better, and break constantly during refactors.

Mocks of your own code. A test that mocks your database, your API, your authentication service, and then asserts that the thing you mocked was called in the way you expected to call it, is mostly testing that the code you wrote is the code you wrote. Tests that mock heavily are tests that pass when the code is correct and also when it is not.
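A sketch of the failure mode (the repository and its `find` method are invented for illustration): the function below contains a bug, fetching all users instead of active ones, and the mock-based test passes anyway, because it only checks that we called the thing we mocked.

```python
# Anti-example: a heavily mocked test that passes when the code is correct
# and also when it is not.
from unittest.mock import MagicMock

def get_active_users(repo):
    # Bug: should be status="active". The test below will not notice.
    return repo.find(status="all")

def test_get_active_users():
    repo = MagicMock()
    get_active_users(repo)
    repo.find.assert_called_once()  # true regardless of what we passed
```

The test is green, the bug ships. An integration test against a real (or in-memory) database would have caught it.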

Tests of framework behaviour. You do not need to test that useState works, that your ORM persists records, or that your web framework routes requests correctly. The framework authors have those tests. Writing your own is duplicated work.

UI tests of cosmetic details. Snapshot tests of rendered components, pixel-perfect visual regression tests for every screen. These generate enormous amounts of breakage during legitimate design changes, and the signal-to-noise ratio is usually bad.

Over-mocked integration tests. A test that hits a mocked database, a mocked queue, a mocked external API — at that point you are testing that the mocks agree with each other, not that the system works.

The Testing Pyramid Is Not Wrong, But It Is Not the Only Answer

The classic testing pyramid — lots of fast unit tests, fewer integration tests, very few end-to-end tests — is reasonable advice if your team is big enough to maintain all three layers well.

If your team is small, the pyramid inverts. The highest-leverage tests are usually end-to-end tests of the critical paths, because each one covers a large amount of code and a specific user-visible failure mode. Unit tests, in small quantities focused on the areas above, are useful supporting infrastructure.

Kent C. Dodds has argued for a "testing trophy" shape — fewer unit tests, more integration tests — and for JavaScript applications specifically this tends to be better advice than the pyramid. The principle is the same: test at the level where the tests give you the most confidence for the least maintenance overhead.

What We Actually Do

On a new project, we start with two things:

  1. A smoke test that verifies the app starts, responds to a health check, and handles a representative request end-to-end. This runs on every deploy and catches the "we broke the build" class of issue.

  2. End-to-end tests for the one or two most important user paths. For most apps this is "user can sign up, log in, and complete the primary action". These run on CI on every PR and catch the "we broke the whole product" class of issue.
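The smoke test in step 1 can be sketched as follows. This is a self-contained stand-in using Python's stdlib HTTP server as the "app"; in a real project the request would hit your deployed service, and the `/health` path is an assumption about your routes.

```python
# Minimal smoke-test sketch: start a stand-in app, then verify it answers
# a health check end-to-end over real HTTP.
import http.server
import threading
import urllib.request

class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet

def smoke_check(url: str) -> int:
    with urllib.request.urlopen(url) as resp:
        return resp.status

server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status = smoke_check(f"http://127.0.0.1:{server.server_port}/health")
server.shutdown()
```

Wired into a deploy pipeline, the equivalent is a single request against the freshly deployed instance, failing the deploy on anything other than a 200.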

As the project matures, we add:

  • Unit tests of any business logic that is non-trivial enough to justify them, written as we write the logic.
  • Integration tests of any critical external boundary — our API, our webhook handlers, our payment integration.
  • A regression test every time we fix a production bug.

What we do not do: pursue coverage metrics. Coverage is a lagging indicator of confidence, not a leading one. A codebase with 80% coverage and no end-to-end tests is less well-tested than one with 30% coverage and five critical-path tests that actually run.

The Test You Will Thank Yourself For

If you do one thing: write an end-to-end test of your most important user flow, and run it on CI. Not a thousand unit tests. One test, that actually exercises your app the way a user does, and runs on every change.

This single test will catch an enormous fraction of the things that would otherwise get to production. It will feel like overkill right until the first time it fails for a reason you did not expect, at which point you will wish you had written it sooner.

The Test You Should Actually Delete

Look through your existing test suite for tests that have broken twice in the last six months without a single failure that reflected a real change in your system's behaviour. Tests that break during refactors, tests that are sensitive to implementation details, tests that nobody remembers why they exist.

These are not earning their keep. They are training your team to ignore test failures, which is actively worse than not having tests at all.

Delete them. The suite gets smaller. Velocity goes up. The tests that matter are easier to notice when they fail.

The Uncomfortable Admission

This blog post is a good example of the trade-off. The project this site is built on has zero tests. We have looked at what the tests would catch, and it is a small amount of value for a real amount of maintenance. The deployments are fast, the feedback loop is short, and a broken build is visible immediately.

We are not arguing this is the right answer for every project. We are arguing it is the right answer for this one, and being honest about that is better than pretending otherwise.

The testing question is not "are you testing enough?". It is "are you testing the things that matter, and skipping the things that do not?". For small teams, the answer is usually: less than the textbooks recommend, and more than you are doing right now in the specific areas that matter.


Have a test suite that feels like a drag but you are not sure what to cut? Get in touch — we have strong opinions about which tests pay off, informed in part by our honest view of LLMs in day-to-day engineering.