Token Prices Keep Falling. Here's What That Changes.
The cost of a given level of model intelligence has dropped sharply and keeps dropping. That changes which features are worth building, and it quietly punishes anyone who designs as if today's price is permanent.
If you priced an AI feature eighteen months ago and shelved it as too expensive, it is worth pulling that estimate back out. The number has almost certainly changed, and not by a little. The cost of a given level of capability has fallen faster than almost anything else in software, and the trend has not finished.
That single fact reshapes how we plan AI work.
The Cost of Intelligence Is Dropping Fast
Two things are happening at once. Models at a given capability keep getting cheaper to run, and cheaper models keep getting more capable. The result is that the price to accomplish a specific task, say summarising a document to a standard a user would accept, has dropped repeatedly in the time most teams take to plan a roadmap. Work that justified a frontier model last year is increasingly handled by a mid-tier one this year at a fraction of the cost.
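To make "price to accomplish a specific task" concrete, here is a minimal sketch of the per-task arithmetic. The token counts and per-million-token prices are placeholders chosen for illustration, not real quotes from any provider.

```python
def cost_per_task_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one task, given token counts and per-million-token prices."""
    total = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    )
    return total / 1_000_000


# Hypothetical task: summarise a 4,000-token document into 300 tokens.
# Both price pairs below are made-up placeholders, not vendor pricing.
frontier_cost = cost_per_task_usd(4_000, 300, 10.0, 30.0)
mid_tier_cost = cost_per_task_usd(4_000, 300, 1.0, 3.0)

print(f"frontier: ${frontier_cost:.4f} per task")
print(f"mid-tier: ${mid_tier_cost:.4f} per task")
```

The useful number is the cost per task, not the price per token. If a cheaper tier does the task to an acceptable standard, the per-task cost falls by the full price ratio even if nothing else changes.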
Features That Were Absurd Last Year
This turns some "no" decisions into "yes" ones without anyone writing new code. Summarising every record in a large dataset rather than a sample. Running an AI check on every item in a queue instead of the flagged ones. Letting users ask open-ended questions of their own data, freely, without metering every interaction. These were straightforward to design and simply too expensive to operate at scale. As the per-task cost falls, the line between "demo only" and "ship it to everyone" moves, and features sitting just on the wrong side of that line cross over.
Designing for a Price That Keeps Moving
The practical lesson is to separate the design of a feature from the economics of running it at scale. We increasingly build the capability first, prove it works, and gate the volume behind a cost switch we can open as the price drops. A feature can launch limited to a subset of users or records, with the architecture ready to widen the moment the maths turns favourable. You are not rebuilding anything. You are turning a dial that the market keeps making cheaper to turn.
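One way to picture the cost switch is as a gate that turns a daily budget and a measured per-task cost into a rollout fraction. This is a rough sketch, not a prescribed design: `CostGate`, `in_rollout`, the budget and the volumes are all hypothetical names and numbers.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CostGate:
    cost_per_task_usd: float  # measured cost of one AI call (assumed input)
    daily_budget_usd: float   # hypothetical daily spend ceiling

    def rollout_fraction(self, eligible_tasks_per_day: int) -> float:
        """Share of eligible volume the budget covers, capped at 1.0."""
        full_cost = self.cost_per_task_usd * eligible_tasks_per_day
        if full_cost <= 0:
            return 1.0  # nothing to pay for, so nothing to gate
        return min(1.0, self.daily_budget_usd / full_cost)


def in_rollout(user_id: str, fraction: float) -> bool:
    """Stable per-user decision: the same user always lands in the same bucket,
    so raising the fraction only ever adds users and never removes them."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big")  # integer in 0 .. 2**32 - 1
    return bucket < fraction * 2**32


# Same budget, same volume; only the per-task cost changes.
today = CostGate(cost_per_task_usd=0.02, daily_budget_usd=100.0)
later = CostGate(cost_per_task_usd=0.005, daily_budget_usd=100.0)

print(today.rollout_fraction(10_000))  # budget covers part of the volume
print(later.rollout_fraction(10_000))  # budget covers all of it
```

The feature code never changes between the two cases. The only input that moved is the measured cost per task, which is the dial described above.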
The Trap of Building for Today's Price
The mistake cuts both ways. Build assuming prices stay high and you will reject features that become viable within a quarter, leaving easy value on the table while a competitor ships it. Build assuming prices keep collapsing and you might launch something that genuinely is not affordable yet and bleed money waiting for the curve to catch up. Neither extreme is a plan. The discipline is to know roughly where a feature sits relative to the trend, ship the ones already over the line, and keep the near-misses on a short list you revisit rather than abandon.
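"Knowing roughly where a feature sits relative to the trend" can be reduced to a small calculation. The sketch below assumes a constant annual price decline, which is a simplification and a guess rather than a forecast. The function names, the 50% decline and the twelve-month horizon are illustrative assumptions.

```python
import math


def months_until_affordable(
    cost_now: float, affordable_cost: float, annual_decline: float
) -> float:
    """Months until cost_now falls to affordable_cost, assuming the price
    shrinks by a constant fraction (annual_decline) each year."""
    if cost_now <= affordable_cost:
        return 0.0
    if affordable_cost <= 0 or not 0.0 < annual_decline < 1.0:
        return math.inf  # no usable trend to extrapolate from
    years = math.log(affordable_cost / cost_now) / math.log(1.0 - annual_decline)
    return 12.0 * years


def classify(
    cost_now: float,
    affordable_cost: float,
    annual_decline: float,
    revisit_horizon_months: float = 12.0,  # hypothetical short-list window
) -> str:
    months = months_until_affordable(cost_now, affordable_cost, annual_decline)
    if months == 0.0:
        return "ship"       # already over the line
    if months <= revisit_horizon_months:
        return "shortlist"  # near-miss: revisit, do not abandon
    return "not_yet"        # too far out to plan around


# Per-task costs in USD; all figures are placeholders.
print(classify(0.01, 0.02, annual_decline=0.5))
print(classify(0.03, 0.02, annual_decline=0.5))
print(classify(0.50, 0.02, annual_decline=0.5))
```

The output is only as good as the assumed decline rate, so it is better used to sort a backlog than to commit spend. A "shortlist" result means re-measure later. It does not mean launch now and wait for the curve.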
What We Tell Clients
We tell clients two things that sound contradictory and are not. First, do not let a cost estimate from a year ago decide a feature today, because the estimate has probably moved. Second, do not bet the business on a price that has not arrived yet. The healthy position is to design for capability, instrument for cost, and stay close enough to the trend that you can act when a feature tips from too expensive to obvious.
Falling prices are a tailwind, and tailwinds only help the people who are already moving. The teams that benefit are the ones with the feature designed and waiting, ready to widen the moment the numbers say go.
Have a feature you parked as too expensive to run? Send it over. It might be worth a fresh look, and the answer takes us very little time to work out.