The Token Bill Nobody Budgeted For

Every model pricing page makes the same quiet promise. A fraction of a penny per thousand tokens, a number so small it reads as free. Teams see it, do a rough sum in their head, and move on. Then the first real invoice lands, and the conversation with the client changes tone.

We have shipped enough AI features to know that the surprise is almost never the headline price. It is everything the headline price hides.

A Fraction of a Penny, A Thousand Times Over

The unit cost is genuinely tiny. The problem is that production multiplies it in ways a napkin sketch does not. A single user action that feels like "one AI call" is often three or four under the hood: a classification step, a retrieval step, the main generation, and a formatting pass. Each one bills separately.

Now add real traffic. A feature used by two hundred people a day, several times each, with a handful of calls per interaction, quietly turns that fraction of a penny into a number with commas in it. Nothing went wrong. The maths simply compounded the way maths does.

The Multipliers Nobody Mentions

Four things drive the bill far more than the per-token rate, and none of them appear in the demo.

Retries are the first. When a response comes back malformed or fails a validation check, your code calls again. A 10 percent retry rate is a 10 percent surcharge on the whole feature, paid silently.

System prompts are the second. That carefully crafted instruction block gets resent on every single call. If it is two thousand tokens and it never changes, you are paying to transmit the same paragraph thousands of times a day.

Conversation history is the third. Chat features resend the entire thread on each turn. By message twenty, you are paying for nineteen previous messages again, every time someone types.

Agent loops are the fourth and the most dangerous. An agent that reasons across several steps can fire a dozen calls to answer one question. Useful, sometimes essential, and roughly an order of magnitude more expensive than a single completion.

Input Is Cheap Until Your Context Is Not

Input tokens cost less than output tokens, which lulls teams into stuffing everything into context "just in case". Whole documents, generous histories, long instructions. Individually cheap. Resent on every call and multiplied by traffic, the input side often becomes the larger half of the bill. We have audited features where trimming context by half cut the invoice by more than a third with no measurable drop in quality.

Budgeting Before You Build

We model the cost during scoping, not after launch. The exercise is simple and it changes decisions. Estimate calls per user action, tokens per call including the resent context, retry rate, and expected daily active users. Multiply it out, then multiply again for the worst week you can imagine. If that number frightens the client, better to learn it now than from accounting.

This is also where caching, smaller models for the cheap steps, and moving work off the live path earn their place. You cannot make those calls sensibly without a cost model in front of you.

The Number That Actually Matters

Cost per call is the wrong unit. Cost per successful outcome is the right one. A feature that costs a penny per call but needs four calls and a human review to land a usable result is not a penny feature. When you price the outcome, you start optimising the thing the client actually pays for, which is the answer in their hands, not the request leaving your server.

The token bill is not a reason to avoid building with AI. It is a design input, the same as latency or accuracy. Treated that way from day one, it stops being the unpleasant surprise and becomes one more thing you simply accounted for.

Planning an AI feature and want a realistic cost model before you commit to it? Talk to us. We would rather show you the number early than explain it late.

The Token Bill Nobody Budgeted For

A Fraction of a Penny, A Thousand Times Over

The Multipliers Nobody Mentions

Input Is Cheap Until Your Context Is Not

Budgeting Before You Build

The Number That Actually Matters

Related articles

Charging Clients for AI When the Cost Is a Moving Target

Small Models, Big Savings: When You Don't Need the Frontier

When a New Model Drops, What Changes for Clients