The Token Bill Nobody Budgeted For
Token pricing looks trivial on the pricing page and arrives as a real number on the invoice. Here is how AI features actually accumulate cost, and how we budget for it before a line of code is written.
Every model pricing page makes the same quiet promise. A fraction of a penny per thousand tokens, a number so small it reads as free. Teams see it, do a rough sum in their heads, and move on. Then the first real invoice lands, and the conversation with the client changes tone.
We have shipped enough AI features to know that the surprise is almost never the headline price. It is everything the headline price hides.
A Fraction of a Penny, A Thousand Times Over
The unit cost is genuinely tiny. The problem is that production multiplies it in ways a napkin sketch does not capture. A single user action that feels like "one AI call" is often three or four under the hood: a classification step, a retrieval step, the main generation, and a formatting pass. Each one bills separately.
Now add real traffic. A feature used by two hundred people a day, several times each, with a handful of calls per interaction, quietly turns that fraction of a penny into a number with commas in it. Nothing went wrong. The maths simply compounded the way maths does.
The Multipliers Nobody Mentions
Four things drive the bill far more than the per-token rate, and none of them appear in the demo.
Retries are the first. When a response comes back malformed or fails a validation check, your code calls again, and the failed attempt is still billed. A 10 percent retry rate is roughly a 10 percent surcharge on the whole feature, paid silently.
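To put a number on that surcharge, here is a minimal sketch. It assumes each attempt fails independently at the same rate, that failed attempts are billed in full, and that the code gives up after a fixed number of attempts. The function name and the figures are illustrative, not taken from a real project.

```python
def expected_billed_calls(failure_rate: float, max_attempts: int) -> float:
    """Expected number of billed calls per request.

    Assumes every attempt fails independently with probability
    `failure_rate` and that we stop after `max_attempts` tries.
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Attempt k+1 only happens if the first k attempts all failed.
    return sum(failure_rate ** k for k in range(max_attempts))


# One retry allowed, 10 percent of responses fail validation (made-up figure).
surcharge = expected_billed_calls(0.10, max_attempts=2) - 1
print(f"Retry surcharge: {surcharge:.1%}")
```

With a single retry the surcharge matches the failure rate. Allowing more attempts pushes it slightly higher, because retries can fail too.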
System prompts are the second. That carefully crafted instruction block gets resent on every single call. If it is two thousand tokens and it never changes, you are paying to transmit the same paragraph thousands of times a day.
Conversation history is the third. Chat features resend the entire thread on each turn. By message twenty, you are paying for nineteen previous messages again, every time someone types.
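The growth is easier to see in numbers. The sketch below assumes every message, user or assistant, averages the same length, and that each turn resends the full thread plus an optional system prompt. The function and its parameters are hypothetical, chosen only for illustration.

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int,
                            system_tokens: int = 0) -> int:
    """Total input tokens billed across a whole conversation.

    On turn t the request carries t user messages and t - 1 earlier
    assistant replies, plus the system prompt.
    """
    total = 0
    for turn in range(1, turns + 1):
        messages_sent = 2 * turn - 1
        total += system_tokens + messages_sent * tokens_per_message
    return total


# Ten turns of 100-token messages (made-up sizes).
with_history = cumulative_input_tokens(10, 100)
without_history = 10 * 100  # what it would cost if only the new message were sent
print(with_history, without_history)
```

Under these assumptions the input total grows with the square of the number of turns, so doubling the length of a conversation roughly quadruples what it costs to carry.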
Agent loops are the fourth and the most dangerous. An agent that reasons across several steps can fire a dozen calls to answer one question. Useful, sometimes essential, and roughly an order of magnitude more expensive than a single completion.
Input Is Cheap Until Your Context Is Not
Input tokens cost less than output tokens, which lulls teams into stuffing everything into context "just in case". Whole documents, generous histories, long instructions. Individually cheap. Resent on every call and multiplied by traffic, the input side often becomes the larger share of the bill. We have audited features where trimming context by half cut the invoice by more than a third with no measurable drop in quality.
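A quick way to check which side dominates is to split one call into its input and output cost. The prices and token counts below are invented for illustration, with output priced at four times input. They are not the figures from any audit or any provider's price list.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one call, with input and output billed at separate rates."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Hypothetical rates: output costs four times as much per token as input.
INPUT_PRICE, OUTPUT_PRICE = 0.001, 0.004

before = call_cost(8000, 400, INPUT_PRICE, OUTPUT_PRICE)  # bloated context
after = call_cost(4000, 400, INPUT_PRICE, OUTPUT_PRICE)   # context halved

input_share = (8000 / 1000 * INPUT_PRICE) / before
saving = 1 - after / before
print(f"Input share before trimming: {input_share:.0%}")
print(f"Saving from halving context: {saving:.0%}")
```

Even with the cheaper rate, the input side carries most of the cost in this example, which is why trimming context moves the total so much.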
Budgeting Before You Build
We model the cost during scoping, not after launch. The exercise is simple and it changes decisions. Estimate calls per user action, tokens per call including the resent context, retry rate, and expected daily active users. Multiply it out, then multiply again for the worst week you can imagine. If that number frightens the client, better to learn it now than from accounting.
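The estimate above fits in a few lines of code. This is a minimal sketch rather than our actual tooling: the class name, field names, prices, and traffic figures are all assumptions, and it treats the retry rate as a flat uplift on billed calls.

```python
from dataclasses import dataclass


@dataclass
class FeatureCostModel:
    daily_active_users: int
    actions_per_user: float
    calls_per_action: float
    input_tokens_per_call: int    # includes resent system prompt and history
    output_tokens_per_call: int
    retry_rate: float             # fraction of calls repeated once
    input_price_per_1k: float     # hypothetical rate
    output_price_per_1k: float    # hypothetical rate

    def cost_per_call(self) -> float:
        return (self.input_tokens_per_call / 1000 * self.input_price_per_1k
                + self.output_tokens_per_call / 1000 * self.output_price_per_1k)

    def daily_cost(self) -> float:
        billed_calls = (self.daily_active_users * self.actions_per_user
                        * self.calls_per_action * (1 + self.retry_rate))
        return billed_calls * self.cost_per_call()

    def worst_week_cost(self, traffic_multiplier: float) -> float:
        """Seven days at a chosen multiple of normal traffic."""
        return self.daily_cost() * 7 * traffic_multiplier


model = FeatureCostModel(
    daily_active_users=200, actions_per_user=5, calls_per_action=4,
    input_tokens_per_call=3000, output_tokens_per_call=300,
    retry_rate=0.10, input_price_per_1k=0.001, output_price_per_1k=0.004,
)
print(f"Typical day: {model.daily_cost():.2f}")
print(f"Worst week at 3x traffic: {model.worst_week_cost(3):.2f}")
```

The value is less in the output than in the inputs. Writing them down forces a conversation about each multiplier before anyone is committed to it.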
This is also where caching, smaller models for the cheap steps, and moving work off the live path earn their place. You cannot make those decisions sensibly without a cost model in front of you.
The Number That Actually Matters
Cost per call is the wrong unit. Cost per successful outcome is the right one. A feature that costs a penny per call but needs four calls and a human review to land a usable result is not a penny feature. When you price the outcome, you start optimising the thing the client actually pays for, which is the answer in their hands, not the request leaving your server.
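As a rough illustration of pricing the outcome rather than the call, the sketch below divides the cost of one attempt by the share of attempts that produce a usable result. The review cost and success rate are invented figures, and the model assumes every attempt gets a human review whether or not it succeeds.

```python
def cost_per_successful_outcome(cost_per_call: float, calls_per_attempt: int,
                                success_rate: float,
                                review_cost: float = 0.0) -> float:
    """Average spend needed to land one usable result."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = cost_per_call * calls_per_attempt + review_cost
    # Failed attempts still cost money, so spread them over the successes.
    return cost_per_attempt / success_rate


# A "penny" feature: four calls at 0.01 each, a 0.50 human review,
# and 80 percent of attempts usable (all hypothetical figures).
outcome_cost = cost_per_successful_outcome(0.01, 4, success_rate=0.8,
                                           review_cost=0.50)
print(f"Cost per usable result: {outcome_cost:.3f}")
```

In this made-up case the model calls are a small part of the outcome cost, which is exactly the kind of thing a per-call figure hides.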
The token bill is not a reason to avoid building with AI. It is a design input, the same as latency or accuracy. Treated that way from day one, it stops being the unpleasant surprise and becomes one more thing you simply accounted for.
Planning an AI feature and want a realistic cost model before you commit to it? Talk to us. We would rather show you the number early than explain it late.