How We Pick a Model: Frontier, Mid, or Cheap
The instinct is to reach for the most capable model and stop thinking. That instinct quietly wastes money and adds latency. Here is the decision process we actually run for every AI feature we build.
There is a reflex on every AI project to reach for the best model available and leave it there. It feels safe. Nobody was ever blamed for picking the most capable option. The trouble is that the most capable option is also the slowest and the most expensive, and a surprising amount of the work in a real product does not need it.
Choosing a model well is one of the highest-leverage decisions in an AI feature, and it is usually made by default rather than on purpose.
The Reflex to Reach for the Best Model
Frontier models are extraordinary, and they are priced accordingly. When you route every task through one, you pay frontier rates for jobs a far cheaper model would have nailed, and your users wait longer than they needed to. We have inherited features where the entire cost problem was a single decision nobody revisited after the prototype: the demo used the top model, so production did too.
What the Expensive Model Is Actually For
The frontier earns its price on genuinely hard work. Multi-step reasoning where one wrong turn ruins the answer. Tasks that need broad world knowledge and careful judgement at the same time. Long, messy inputs where the model has to hold a lot in its head and stay coherent. Code that has to be correct, not merely plausible. When the cost of a wrong answer is high and the problem is genuinely difficult, the best model is not an indulgence; it is the cheapest path to a result you can ship.
Where the Cheap Model Is Indistinguishable
Most products are full of small, well-defined jobs. Classifying a message into one of five buckets. Extracting a date and an amount from an email. Rewriting a sentence to be more polite. Deciding whether a chunk of text is relevant. For work like this, a cheap, fast model produces output a user could not tell apart from the frontier model's, at a tiny fraction of the cost and with a fraction of the wait. Paying frontier rates here buys you nothing a customer will ever notice.
Routing: The Boring Answer That Works
The strongest pattern is not one model but several, each doing what it is suited to. A cheap model handles the high-volume, low-stakes steps. The frontier handles the one step that genuinely needs it. Sometimes a small model triages each request and escalates only the hard cases, so you pay the premium on the minority of calls that warrant it rather than on all of them.
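A minimal sketch of the triage-and-escalate pattern, written in Python because the article has no code of its own. Everything in it is hypothetical: `Router`, `looks_hard` and the two model functions are stand-ins for real provider calls, and the crude length-and-keyword heuristic in `looks_hard` takes the place of what would normally be a call to a small model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A model is anything that turns a request into an answer. In production these
# would wrap your provider's API; here they are plain functions (hypothetical).
ModelFn = Callable[[str], str]


@dataclass
class Router:
    triage: Callable[[str], bool]  # True if the request should be escalated
    cheap: ModelFn                 # handles the high-volume, low-stakes majority
    frontier: ModelFn              # reserved for the hard minority

    def handle(self, request: str) -> Tuple[str, str]:
        """Return (tier_used, answer) for a single request."""
        if self.triage(request):
            return "frontier", self.frontier(request)
        return "cheap", self.cheap(request)


def looks_hard(request: str) -> bool:
    # Stand-in for a small-model triage call: a crude heuristic that flags
    # long requests or ones that ask for multi-step reasoning.
    text = request.lower()
    return len(request) > 500 or "step by step" in text or "prove" in text


router = Router(
    triage=looks_hard,
    cheap=lambda r: f"cheap answer to: {r}",
    frontier=lambda r: f"frontier answer to: {r}",
)

print(router.handle("Which bucket does this ticket belong in: 'refund please'?"))
print(router.handle("Prove that the retry loop always terminates."))
```

The triage step runs on every request, so it has to be cheap; the saving comes from how rarely it decides to escalate.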
This adds a little engineering complexity, and it is almost always worth it. A feature we reworked this way kept its quality, cut its average cost per request by roughly two thirds, and got noticeably faster for the common case, because most requests stopped waiting on the slowest model.
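The arithmetic behind a saving like that is a weighted average. A rough sketch, assuming every request pays for one triage call and a fraction `p_escalate` is then sent to the frontier model while the rest go to the cheap one. The function name and every number in the example are invented for illustration; they are not the measurements from the feature described above.

```python
def expected_per_request(triage: float, cheap: float, frontier: float,
                         p_escalate: float) -> float:
    """Expected cost per request under triage-then-route.

    Every request pays for triage; a fraction p_escalate then goes to the
    frontier model and the remainder to the cheap model. The same formula
    gives expected latency if triage and the answering call run one after
    the other.
    """
    if not 0.0 <= p_escalate <= 1.0:
        raise ValueError("p_escalate must be between 0 and 1")
    return triage + (1 - p_escalate) * cheap + p_escalate * frontier


# Invented prices, normalised so one frontier call costs 1.0.
routed = expected_per_request(triage=0.02, cheap=0.05, frontier=1.0, p_escalate=0.25)
frontier_only = 1.0
print(f"routed: {routed:.4f} per request, "
      f"{1 - routed / frontier_only:.0%} cheaper than frontier-only")
```

With these made-up inputs the routed cost comes out at about 0.31 per request. The escalation rate dominates: raise `p_escalate` to 0.6 and the routed cost rises to 0.64, so the saving shrinks to about a third.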
How We Decide
The test we run for each step of a feature is short: how bad is a wrong answer here, and how hard is the task, really? If a mistake is cheap to catch and the job is narrow, start with the cheapest model that passes your evaluations and move up only if it fails. If a mistake is expensive and the task needs real reasoning, start at the top and move down only once you have proof that a cheaper model holds up.
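The two-branch test above can be written down as a small function. This is a sketch under stated assumptions: `choose_model`, the tier names and the `passes` callback are hypothetical, `passes(model)` stands in for running your evaluation set against that model, and since the rule does not say what to do when only one of the two conditions holds, the sketch assumes the cautious top-down path for those mixed cases.

```python
from typing import Callable, Sequence


def choose_model(tiers: Sequence[str], passes: Callable[[str], bool],
                 mistake_is_cheap: bool, task_is_narrow: bool) -> str:
    """Pick a model tier for one step of a feature.

    tiers is ordered from cheapest to most capable; passes(model) stands in
    for running your evaluation set against that model.
    """
    if not tiers:
        raise ValueError("need at least one tier")
    if mistake_is_cheap and task_is_narrow:
        # Cheap mistakes, narrow job: start at the bottom, move up on failure.
        for model in tiers:
            if passes(model):
                return model
        return tiers[-1]  # nothing passed; the task is harder than it looked
    # Expensive mistakes or real reasoning (and, by assumption, mixed cases):
    # start at the top and step down only while the cheaper tier holds up.
    chosen = tiers[-1]
    for model in reversed(tiers[:-1]):
        if not passes(model):
            break
        chosen = model
    return chosen


tiers = ["small", "mid", "frontier"]  # hypothetical tier names
passing = {"small", "frontier"}       # pretend evaluation outcomes
print(choose_model(tiers, lambda m: m in passing, True, True))    # low stakes
print(choose_model(tiers, lambda m: m in passing, False, False))  # high stakes
```

The two calls return `small` and `frontier`. Given the same evaluation results, the low-stakes path accepts the first cheap tier that passes, while the high-stakes path refuses to step down past a tier (`mid`) that failed.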
The point is to make the choice deliberately, per step, with evidence. Run the candidates against a set of real examples, look at where the cheap one breaks, and let that decide. "Use the best model everywhere" is not a strategy. It is the absence of one, and clients pay for it every month until someone looks.
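Letting the examples decide needs very little tooling. A minimal harness, assuming exact-match scoring (which suits classification-style steps, less so free text) and using two keyword-based stand-ins in place of real model calls; `evaluate`, `Report` and both stand-ins are hypothetical. The output that matters most is the list of failures, because it shows where the cheap model breaks rather than just how often.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


@dataclass
class Report:
    pass_rate: float
    failures: List[Tuple[str, str, str]]  # (input, expected, got)


def evaluate(models: Dict[str, Callable[[str], str]],
             examples: Sequence[Example]) -> Dict[str, Report]:
    """Run every candidate model over the same examples and keep the misses."""
    if not examples:
        raise ValueError("need at least one example")
    reports: Dict[str, Report] = {}
    for name, model in models.items():
        failures = []
        for text, expected in examples:
            got = model(text)
            if got != expected:  # exact match: an assumption that suits labels
                failures.append((text, expected, got))
        reports[name] = Report(1 - len(failures) / len(examples), failures)
    return reports


# Stand-ins for a cheap and a frontier model on a two-label triage task.
def cheap(text: str) -> str:
    return "refund" if "refund" in text.lower() else "other"


def frontier(text: str) -> str:
    lowered = text.lower()
    return "refund" if "refund" in lowered or "money back" in lowered else "other"


examples = [
    ("Please refund my order", "refund"),
    ("I want my money back", "refund"),
    ("Where is my parcel?", "other"),
]

for name, report in evaluate({"cheap": cheap, "frontier": frontier}, examples).items():
    print(f"{name}: {report.pass_rate:.0%} passed")
    for text, expected, got in report.failures:
        print(f"  expected {expected!r}, got {got!r} for {text!r}")
```

In practice the examples should be real inputs from the feature, and reading the failures tells you whether the cheap model's misses are rare edge cases or a pattern that justifies the more expensive tier.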
Want a second opinion on whether your AI feature is over-specified and overpaying? Get in touch. We do this analysis often, and the savings usually surprise people.