Small Models, Big Savings: When You Don't Need the Frontier

For a while the smart default was simple: use the biggest model you can afford and accept the bill. That default is ageing badly. Small models, including open ones you can run yourself, have got good enough that for a meaningful slice of real work the frontier is no longer the obvious choice. On several recent projects the better engineering decision was the smaller model, and not only because it cost less.

It is worth being clear about where that holds and where it does not, because the failure mode in both directions is expensive.

The Frontier Model Is a Default, Not a Requirement

The frontier is where you start when you do not yet know how hard your problem is, which is reasonable for a prototype. The mistake is leaving it there once you do know. A great deal of production work turns out to be narrow and well-defined, and narrow well-defined tasks are exactly what smaller models handle competently. The capability gap that matters on a hard reasoning problem often vanishes on the routine jobs that make up most of a product.

What Small Models Are Genuinely Good At

The sweet spot is high-volume, well-scoped work: classification into a known set of categories, extracting structured fields from messy text, short rewrites and tone adjustments, routing a request to the right place, and simple, bounded question answering over a small context. For tasks like these, a small model produces output that users cannot distinguish from a frontier model's, runs faster, and costs a fraction as much. When the work is repetitive and the rules are clear, size buys you very little.

Cost, Latency, and the Data Question

The savings are the headline, and they are real, often an order of magnitude on the right task. But two other benefits matter as much to certain clients. Latency is the first: smaller models respond faster, which is the difference between a feature that feels instant and one that makes users wait. Data residency is the second: an open model you host yourself means sensitive data never leaves your environment, which for clients in regulated sectors is not a nice-to-have but the thing that decides whether the feature is allowed to exist at all. For some engagements that single property outweighs every other consideration.

Where They Fall Down

Small models are not a free win, and pretending otherwise leads to the opposite mistake. They struggle with genuine multi-step reasoning, with tasks that need broad world knowledge, and with the long, messy inputs where a model has to hold a lot together and stay coherent. They are easier to knock off course with an awkwardly phrased request. Push a small model past its range to save money and you will pay it back in wrong answers, retries, and the human time spent cleaning up, which usually costs more than the frontier model would have. Hosting an open model yourself also brings real operational work that the savings have to justify.
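The trade-off between a cheaper call and more cleanup can be put into rough numbers. The sketch below is a back-of-envelope model, not a pricing tool: it assumes each wrong answer costs a fixed amount of human time to fix, and every figure in it (call prices, error rates, cleanup cost) is a made-up placeholder you would replace with your own measurements.

```python
def expected_cost_per_task(call_cost: float, error_rate: float, cleanup_cost: float) -> float:
    """Expected cost of one task: the model call plus the expected human cleanup.

    Assumes every wrong answer costs the same fixed amount to fix (a simplification).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return call_cost + error_rate * cleanup_cost


def break_even_error_rate(small_call: float, frontier_call: float,
                          frontier_error: float, cleanup_cost: float) -> float:
    """Error rate below which the small model is the cheaper option overall."""
    if cleanup_cost <= 0:
        raise ValueError("cleanup_cost must be positive")
    return frontier_error + (frontier_call - small_call) / cleanup_cost


if __name__ == "__main__":
    # All numbers are hypothetical, chosen only to illustrate the arithmetic.
    CLEANUP = 0.50  # cost of a human fixing one wrong answer
    small = expected_cost_per_task(call_cost=0.001, error_rate=0.08, cleanup_cost=CLEANUP)
    frontier = expected_cost_per_task(call_cost=0.010, error_rate=0.01, cleanup_cost=CLEANUP)
    limit = break_even_error_rate(0.001, 0.010, 0.01, CLEANUP)
    print(f"small model:    {small:.3f} per task")
    print(f"frontier model: {frontier:.3f} per task")
    print(f"small model wins only below a {limit:.1%} error rate")
```

With these placeholder figures the small model is ten times cheaper per call but more expensive per task, because its error rate sits well above the break-even point. On a task where it matched the frontier's error rate, the same arithmetic would favour it comfortably.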

How We Choose

We decide per task, with evidence, not per project by reputation. Take the specific job, run a small model against real examples, and look at where it breaks. If it holds up, the savings, the speed, and sometimes the data control make it an easy call. If it breaks on cases that matter, we move up without hesitation, because a cheaper model that gets it wrong is not cheaper. The frontier and the small model are tools, and the skill is matching each one to the work it suits rather than committing to either as a policy.
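As a minimal sketch of what "run it against real examples and look at where it breaks" might look like in practice, here is one way to structure the check. Everything named here is an assumption made for illustration: the `Example` and `Report` types, the `critical` flag for cases that matter, the 95% accuracy threshold, and the `keyword_stub` function, which stands in for a real call to whichever small model you are testing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Example:
    text: str
    expected: str
    critical: bool = False  # True when a wrong answer here is unacceptable


@dataclass
class Report:
    accuracy: float
    failures: list
    critical_failures: list


def evaluate(model: Callable[[str], str], examples: list) -> Report:
    """Run the model over labelled examples and collect every miss."""
    if not examples:
        raise ValueError("need at least one example")
    failures = [ex for ex in examples
                if model(ex.text).strip().lower() != ex.expected.lower()]
    accuracy = 1 - len(failures) / len(examples)
    return Report(accuracy, failures, [ex for ex in failures if ex.critical])


def choose(report: Report, min_accuracy: float = 0.95) -> str:
    """Move up if any critical case fails or accuracy is below the threshold."""
    if report.critical_failures or report.accuracy < min_accuracy:
        return "move up"
    return "use small model"


def keyword_stub(text: str) -> str:
    """Hypothetical stand-in for a small model: a naive keyword classifier."""
    return "billing" if "invoice" in text.lower() else "other"


# Hypothetical examples; in practice these come from real traffic.
EXAMPLES = [
    Example("Please resend my invoice", "billing"),
    Example("I was charged twice this month", "billing", critical=True),
    Example("How do I reset my password?", "other"),
    Example("The app crashes on launch", "other"),
]

if __name__ == "__main__":
    report = evaluate(keyword_stub, EXAMPLES)
    print(f"accuracy: {report.accuracy:.0%}")
    for ex in report.failures:
        print(f"missed: {ex.text!r} (critical={ex.critical})")
    print("decision:", choose(report))
```

The point of the `critical` flag is that a headline accuracy figure is not enough on its own. One failure on a case that matters is a reason to move up even when the average looks fine, and the list of failures tells you more than the percentage does.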

The interesting shift is how much work now sits comfortably in small-model territory, and how fast that share is growing. A lot of features built on the frontier today could run on something far smaller, and the gap that justified the bigger model is closing on exactly the routine work most products are made of.

Curious whether part of your AI feature could run on a smaller or self-hosted model? We can test it and tell you what you would gain and what you would give up.
