How We Pick a Model: Frontier, Mid, or Cheap

The instinct is to reach for the most capable model and stop thinking. That instinct quietly wastes money and adds latency. Here is the decision we actually run for every AI feature we build.

EvolRed Team · 6 min read

There is a reflex on every AI project to reach for the best model available and leave it there. It feels safe. Nobody was ever blamed for picking the most capable option. The trouble is that the most capable option is also the slowest and the most expensive, and a surprising amount of the work in a real product does not need it.

Choosing a model well is one of the highest-leverage decisions in an AI feature, and it is usually made by default rather than on purpose.

The Reflex to Reach for the Best Model

Frontier models are extraordinary, and they are priced accordingly. When you route every task through one, you pay frontier rates for jobs a far cheaper model would have nailed, and your users wait longer than they needed to. We have inherited features where the entire cost problem was a single decision nobody revisited after the prototype: the demo used the top model, so production did too.

What the Expensive Model Is Actually For

The frontier earns its price on genuinely hard work. Multi-step reasoning where one wrong turn ruins the answer. Tasks that need broad world knowledge and careful judgement at the same time. Long, messy inputs where the model has to hold a lot in its head and stay coherent. Code that has to be correct, not merely plausible. When the cost of a wrong answer is high and the problem is difficult, the best model is not an indulgence; it is the cheapest path to a result you can ship.

Where the Cheap Model Is Indistinguishable

Most products are full of small, well-defined jobs. Classifying a message into one of five buckets. Extracting a date and an amount from an email. Rewriting a sentence to be more polite. Deciding whether a chunk of text is relevant. For work like this, a cheap, fast model produces output a user could not tell apart from the frontier, at a tiny fraction of the cost and a fraction of the wait. Paying frontier rates here buys you nothing a customer will ever notice.

Routing: The Boring Answer That Works

The strongest pattern is not one model but several, each doing what it is suited to. A cheap model handles the high-volume, low-stakes steps. The frontier handles the one step that genuinely needs it. Sometimes a small model triages the request and escalates only the hard cases, so you pay the premium on the minority of calls that warrant it rather than on all of them.
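
As a rough illustration of the triage-and-escalate idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Router` class, the `looks_hard` heuristic, and the two stand-in model functions are hypothetical, and a real system would replace the lambdas with calls to actual models and would probably use a small model rather than a word count for triage.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A "model" here is just a function from request text to answer text.
# In practice each would wrap a real API call; these are placeholders.
Model = Callable[[str], str]


@dataclass
class Router:
    triage: Callable[[str], bool]  # returns True when a request looks hard
    cheap: Model
    frontier: Model

    def handle(self, request: str) -> Tuple[str, str]:
        """Return (tier_used, answer) so usage can be tracked per tier."""
        if self.triage(request):
            return "frontier", self.frontier(request)
        return "cheap", self.cheap(request)


def looks_hard(request: str) -> bool:
    # Hypothetical heuristic: long or multi-question requests escalate.
    return len(request.split()) > 50 or request.count("?") > 1


router = Router(
    triage=looks_hard,
    cheap=lambda r: f"[cheap] {r}",
    frontier=lambda r: f"[frontier] {r}",
)

easy_tier, _ = router.handle("Is this message spam?")
hard_tier, _ = router.handle("Why did this fail? And how do we fix it?")
```

Returning the tier alongside the answer is a deliberate choice in this sketch: it makes it easy to log how often requests escalate, which is the number that determines whether routing pays off.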

This adds a little engineering complexity, and it is almost always worth it. A feature we reworked this way kept its quality, cut its average cost per request by roughly two thirds, and got noticeably faster for the common case, because most requests stopped waiting on the slowest model.
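
To see why routing can move the average so much, it helps to write out the arithmetic. The sketch below assumes every request pays for a triage call and is then sent to either the cheap or the frontier model. The prices and the escalation rate are made-up values chosen only so the numbers are easy to follow; they are not the figures from the feature described above.

```python
def blended_cost(triage: float, cheap: float, frontier: float,
                 escalation_rate: float) -> float:
    """Average cost per request when every request is triaged, then routed."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return triage + (1.0 - escalation_rate) * cheap + escalation_rate * frontier


# Hypothetical prices, normalised so that one frontier call costs 1.0.
FRONTIER = 1.0
CHEAP = 0.05
TRIAGE = 0.05

# Assume a quarter of requests are hard enough to escalate.
routed = blended_cost(TRIAGE, CHEAP, FRONTIER, escalation_rate=0.25)
saving = 1.0 - routed / FRONTIER

print(f"routed cost per request: {routed:.4f}")
print(f"saving versus frontier-only: {saving:.0%}")
```

With these assumed inputs the saving works out to roughly two thirds. The same function also shows the failure mode: if nearly every request escalates, the triage call is pure overhead and routing costs slightly more than using the frontier model alone.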

How We Decide

The test we run for each step of a feature is short. How bad is a wrong answer here, and how hard is the task really? If a mistake is cheap to catch and the job is narrow, start with the cheapest model that passes your evaluations and only move up if it fails. If a mistake is expensive and the task needs real reasoning, start at the top and only move down once you have proof a cheaper model holds up.
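
The two directions of that search can be written down as a short sketch. The function names, the tier names, and the `passes` callback are all hypothetical; `passes` stands in for whatever evaluation you run, and the tiers are assumed to be listed from cheapest to most capable.

```python
from typing import Callable, Optional, Sequence


def pick_from_bottom(tiers: Sequence[str],
                     passes: Callable[[str], bool]) -> Optional[str]:
    """Low stakes, narrow job: take the cheapest tier that passes."""
    for name in tiers:  # tiers are ordered cheapest first
        if passes(name):
            return name
    return None  # nothing passed the evaluation


def pick_from_top(tiers: Sequence[str],
                  passes: Callable[[str], bool]) -> Optional[str]:
    """High stakes, hard task: start at the top, step down only on proof."""
    ordered = list(reversed(tiers))
    if not ordered or not passes(ordered[0]):
        return None  # even the most capable tier fails the evaluation
    chosen = ordered[0]
    for name in ordered[1:]:
        if not passes(name):
            break  # stop at the first cheaper tier that does not hold up
        chosen = name
    return chosen


TIERS = ["cheap", "mid", "frontier"]


def example_passes(name: str) -> bool:
    # Stand-in result: pretend only the mid and frontier tiers pass.
    return name in {"mid", "frontier"}


low_stakes_choice = pick_from_bottom(TIERS, example_passes)
high_stakes_choice = pick_from_top(TIERS, example_passes)
```

In this toy case both searches land on the same tier. The practical difference is which model is serving users while the evidence is still being gathered: the cheap one when mistakes are cheap, the frontier one when they are not.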

The point is to make the choice deliberately, per step, with evidence. Run the candidates against a set of real examples, look at where the cheap one breaks, and let that decide. "Use the best model everywhere" is not a strategy. It is the absence of one, and clients pay for it every month until someone looks.
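
A comparison like that does not need much tooling. The sketch below is one possible shape, under stated assumptions: scoring is a case-insensitive exact match, which only suits narrow tasks such as classification, and the two "models" are keyword stubs invented for the example rather than real model calls.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]
Example = Tuple[str, str]          # (input text, expected label)
Failure = Tuple[str, str, str]     # (input text, expected label, actual output)


def find_failures(model: Model, examples: List[Example]) -> List[Failure]:
    """Return every example the model got wrong, with what it said instead."""
    failed = []
    for text, expected in examples:
        got = model(text)
        if got.strip().lower() != expected.strip().lower():
            failed.append((text, expected, got))
    return failed


def compare(candidates: Dict[str, Model],
            examples: List[Example]) -> Dict[str, List[Failure]]:
    """Run each candidate over the same examples and collect its failures."""
    return {name: find_failures(model, examples)
            for name, model in candidates.items()}


# Hypothetical examples and stub models, for illustration only.
EXAMPLES = [
    ("Refund my order please", "billing"),
    ("App crashes on login", "bug"),
    ("I was charged twice and the app froze", "billing"),
]


def cheap_stub(text: str) -> str:
    return "billing" if "refund" in text.lower() else "bug"


def frontier_stub(text: str) -> str:
    lowered = text.lower()
    return "billing" if ("refund" in lowered or "charged" in lowered) else "bug"


report = compare({"cheap": cheap_stub, "frontier": frontier_stub}, EXAMPLES)
for name, failed in report.items():
    print(f"{name}: {len(failed)} failure(s)")
    for text, expected, got in failed:
        print(f"  {text!r}: expected {expected}, got {got}")
```

Keeping the failing inputs, not just a pass rate, is the useful part. A score says whether the cheap model is good enough; the list of failures says what kind of request would need to be escalated if it is not.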


Want a second opinion on whether your AI feature is over-specified and overpaying? Get in touch. We do this analysis often, and the savings usually surprise people.