Charging Clients for AI When the Cost Is a Moving Target

Fixed-price delivery and variable per-token cost do not naturally fit together. Here is how we structure AI work commercially so the client gets a clear number and we do not absorb an open-ended bill.

EvolRed Team · 6 min read

Most agency work has a comfortable shape. You scope it, you price it, you build it, and the cost of delivering it does not change after you ship. AI work breaks that shape. The feature keeps spending money every time someone uses it, and how much it spends depends on usage you cannot fully predict. That tension sits underneath every AI engagement, and pretending it is not there is how agencies quietly lose money.

We have had to get deliberate about this. Here is how we actually structure the commercials.

Fixed Price Meets Variable Cost

A client wants a number they can approve. We want to deliver without signing up for an unbounded liability. The instinct to quote a single fixed figure for "the AI feature" hides the real problem, because the build is a one-off cost and the running is a recurring one that scales with success. The more popular the feature becomes, the more it costs to operate. That is the opposite of how clients expect software to behave, and it needs saying out loud early.

Who Carries the Usage Risk

Every pricing model is really an answer to one question: who absorbs the cost when usage spikes. If the agency carries it, a viral month can wipe out the margin on the whole project. If the client carries it, they need visibility and controls so the bill cannot run away while nobody is looking. Neither party should be surprised. The job of the commercial structure is to put that risk somewhere on purpose, with both sides agreeing where.
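
To make the spike risk concrete, here is a minimal arithmetic sketch of what happens when the agency absorbs usage out of a fixed fee. Every figure and the function name `agency_margin` are hypothetical, chosen only for illustration.

```python
def agency_margin(fixed_fee: float, delivery_cost: float, usage_cost: float) -> float:
    """Margin left over when the agency pays the running cost out of a fixed fee."""
    return fixed_fee - delivery_cost - usage_cost


# Hypothetical project: all numbers are illustrative assumptions.
fee = 20_000.0    # fixed price quoted to the client
build = 14_000.0  # cost of design and engineering

expected_margin = agency_margin(fee, build, usage_cost=1_000.0)  # usage as forecast
viral_margin = agency_margin(fee, build, usage_cost=7_500.0)     # usage after a spike

print(expected_margin, viral_margin)
```

With these assumed figures, the forecast case leaves a positive margin and the spike case turns it negative, which is the risk the commercial structure has to place on purpose.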

Three Ways We Price It

We use one of three structures, chosen to fit the client and the feature.

The first is fixed build, client-owned running cost. We charge for design and engineering as normal, the client connects their own model provider account, and they pay the usage directly. Clean separation, no markup games, and the client sees exactly what the feature costs to run. This suits clients who are comfortable owning infrastructure.

The second is usage pass-through with a management margin. We run the infrastructure, and the client pays actual usage plus an agreed percentage for us carrying it. Transparent, and it scales with them rather than against us.

The third is a capped retainer. A flat monthly fee covers operation up to a defined usage ceiling, with a clear rate for anything beyond it. Predictable for the client, safe for us, as long as the cap is set with a real cost model behind it rather than a guess.
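
The three structures can be compared with a small sketch like the one below. It is an assumption-laden illustration, not our actual rate card: the function names, the usage unit (say, thousands of requests), and all rates are hypothetical. Each function returns what the client pays for running the feature and what the agency keeps after usage cost.

```python
def client_owned(usage_cost: float) -> tuple[float, float]:
    """Client pays the provider directly; the agency bills nothing for running."""
    return round(usage_cost, 2), 0.0


def pass_through(usage_cost: float, margin_rate: float) -> tuple[float, float]:
    """Client pays actual usage plus an agreed percentage."""
    fee = round(usage_cost * margin_rate, 2)
    return round(usage_cost + fee, 2), fee


def capped_retainer(
    usage_units: float,
    unit_cost: float,     # what one unit costs the agency (assumed)
    retainer: float,      # flat monthly fee
    cap_units: float,     # usage ceiling covered by the retainer
    overage_rate: float,  # price per unit beyond the ceiling
) -> tuple[float, float]:
    """Flat fee up to a usage ceiling, with a clear rate beyond it."""
    overage_units = max(0.0, usage_units - cap_units)
    client_total = retainer + overage_units * overage_rate
    agency_margin = client_total - usage_units * unit_cost
    return round(client_total, 2), round(agency_margin, 2)


# Hypothetical month: 400 units of usage at 2.00 per unit.
print(client_owned(800.0))
print(pass_through(800.0, margin_rate=0.15))
print(capped_retainer(400, unit_cost=2.0, retainer=1500.0, cap_units=500, overage_rate=3.5))
# The same retainer in a heavy month that goes 200 units over the cap.
print(capped_retainer(700, unit_cost=2.0, retainer=1500.0, cap_units=500, overage_rate=3.5))
```

In the retainer case, the overage rate has to sit above the unit cost. Otherwise every unit past the cap erodes the margin, which is why the cap needs a real cost model behind it.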

Caps, Alerts, and the Awkward Conversation

Whatever the structure, hard limits go in from the start. Spend caps that throttle or pause the feature before the bill becomes a problem. Alerts at sensible thresholds so a runaway loop or an abuse pattern gets caught in hours, not at month end. We would far rather have the awkward conversation about a cap being hit than the much worse one about an invoice nobody saw coming. Clients respect the first and resent the second.
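
A spend cap with threshold alerts can be as simple as the sketch below. It assumes an in-memory counter and a hypothetical `SpendGuard` class. A real deployment would persist spend, reset it each billing period, and send alerts to a real channel instead of returning strings.

```python
from dataclasses import dataclass, field


@dataclass
class SpendGuard:
    """Tracks spend against a monthly cap and reports crossed alert thresholds."""

    monthly_cap: float
    alert_fractions: tuple = (0.5, 0.8)  # assumed thresholds, as fractions of the cap
    spent: float = 0.0
    fired: set = field(default_factory=set)

    def allow(self, estimated_cost: float) -> bool:
        """Return False when the next call would push spend past the cap."""
        return self.spent + estimated_cost <= self.monthly_cap

    def record(self, cost: float) -> list:
        """Add an incurred cost and return alerts for thresholds crossed just now."""
        self.spent += cost
        alerts = []
        for fraction in self.alert_fractions:
            if fraction not in self.fired and self.spent >= fraction * self.monthly_cap:
                self.fired.add(fraction)  # each threshold fires only once
                alerts.append(f"spend at {fraction:.0%} of cap")
        return alerts


guard = SpendGuard(monthly_cap=100.0)
first = guard.record(40.0)   # below every threshold
second = guard.record(15.0)  # crosses 50%
third = guard.record(30.0)   # crosses 80%
print(first, second, third, guard.allow(10.0), guard.allow(20.0))
```

Checking `allow` before each model call is what turns the cap into a throttle or pause rather than a report after the fact.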

Price the Outcome, Not the Tokens

The healthiest engagements stop talking about tokens at all once the model is built. Tokens are our unit, not the client's. They care about resolved tickets, qualified leads, drafted documents, hours saved. When we can tie the cost to an outcome the client already values, the pricing conversation gets easier, because we are no longer asking them to underwrite a technical line item. We are charging for a result, and the token cost becomes our problem to manage well behind the scenes.
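
As a rough illustration of tying cost to an outcome, the sketch below turns a month of token spend into a cost per resolved ticket and a margin at an agreed price per ticket. The function names and all figures are hypothetical.

```python
def cost_per_outcome(token_spend: float, outcomes: int) -> float:
    """Average running cost behind each delivered outcome."""
    if outcomes <= 0:
        raise ValueError("need at least one outcome to compute a unit cost")
    return token_spend / outcomes


def outcome_margin(price_per_outcome: float, token_spend: float, outcomes: int) -> float:
    """What the agency keeps when it charges per outcome and absorbs token cost."""
    return price_per_outcome * outcomes - token_spend


# Hypothetical month: 600.00 of token spend behind 1,200 resolved tickets,
# billed at an assumed 2.00 per resolved ticket.
unit_cost = cost_per_outcome(600.0, 1200)
margin = outcome_margin(2.0, 600.0, 1200)
print(unit_cost, margin)
```

The client only ever sees the price per ticket. Keeping the unit cost well under that price is the part we manage behind the scenes.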

AI work is not harder to price than other software. It is just differently shaped, and that shape rewards being honest about the running cost from the first conversation rather than discovering it together later.


Trying to work out how to price an AI feature for a client, or how to have one priced for you? We are happy to compare notes.