Inference and tokens
Prompt and completion token volume, context size, and model pricing create the most visible direct cost.
- Long prompts compound spend
- Frontier models change economics fast
- Verbose completions add cost at scale
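A minimal sketch of the direct inference cost, assuming hypothetical per-million-token prices (`PRICE_PER_M_INPUT` and `PRICE_PER_M_OUTPUT` are placeholders, not any provider's published rates). `conversation_cost` also assumes a chat that resends its full history as input on every turn, which is one common way long prompts compound spend.

```python
# Hypothetical prices in USD per million tokens; substitute your provider's actual rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def call_cost(input_tokens: int, output_tokens: int,
              price_in: float = PRICE_PER_M_INPUT,
              price_out: float = PRICE_PER_M_OUTPUT) -> float:
    """Direct inference cost of one model call, in USD."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def conversation_cost(turns: int, system_tokens: int,
                      user_tokens: int, output_tokens: int) -> float:
    """Total cost of a chat that resends the full history on every turn.

    Assumption: each turn appends one user message and one completion
    to the history, and the whole history is sent again as input.
    """
    total = 0.0
    history = 0
    for _ in range(turns):
        prompt_tokens = system_tokens + history + user_tokens
        total += call_cost(prompt_tokens, output_tokens)
        history += user_tokens + output_tokens
    return total


if __name__ == "__main__":
    print(f"one call:      ${call_cost(2_000, 500):.4f}")
    print(f"10-turn chat:  ${conversation_cost(10, 1_000, 200, 400):.4f}")
    print(f"10 flat calls: ${10 * call_cost(1_200, 400):.4f}")
```

With these placeholder prices, a 10-turn chat that resends its history costs roughly 1.8 times as much as ten independent calls of the same size (about $0.177 versus $0.096), and the gap widens as turns accumulate.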
Running an AI feature can cost anywhere from a fraction of a cent to several dollars per interaction, depending on tokens, model choice, retrieval, orchestration, and traffic scale. The hard part is not seeing the total bill; it is understanding the cost of each feature, workflow, and customer interaction inside that bill.
A practical breakdown of how one AI feature should be measured inside a real product.
Most teams know their monthly OpenAI, Anthropic, Bedrock, or cloud bill. Far fewer can answer a simpler and more useful question: what does it cost to run one AI-powered feature inside the product?
That number determines pricing power, margin, and adoption strategy, and it decides whether the feature gets more investment or is quietly throttled.
A modern AI feature often includes prompt assembly, retrieval, reranking, multiple model invocations, policy checks, observability, retries, and orchestration.
A support copilot, content tool, and research agent may share a provider but have completely different economics.
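To illustrate how one provider can yield very different economics, here is a sketch with three hypothetical usage profiles priced at the same hypothetical rates. The call counts and token sizes are invented for illustration; the point is that calls per interaction and the input/output mix drive the spread, not the provider.

```python
from dataclasses import dataclass

# Hypothetical shared provider prices, USD per million tokens.
PRICE_IN = 3.00
PRICE_OUT = 15.00


@dataclass
class Profile:
    """Hypothetical usage profile for one AI feature."""
    name: str
    calls_per_interaction: int
    input_tokens_per_call: int
    output_tokens_per_call: int

    def cost_per_interaction(self) -> float:
        per_call = (self.input_tokens_per_call * PRICE_IN
                    + self.output_tokens_per_call * PRICE_OUT) / 1_000_000
        return self.calls_per_interaction * per_call


# Invented numbers chosen only to show how the shape of usage drives cost.
PROFILES = [
    Profile("support copilot", calls_per_interaction=1,
            input_tokens_per_call=3_000, output_tokens_per_call=300),
    Profile("content tool", calls_per_interaction=1,
            input_tokens_per_call=800, output_tokens_per_call=1_500),
    Profile("research agent", calls_per_interaction=12,
            input_tokens_per_call=6_000, output_tokens_per_call=800),
]

if __name__ == "__main__":
    for p in PROFILES:
        print(f"{p.name:16s} ${p.cost_per_interaction():.4f} per interaction")
```

Under these assumptions the research agent costs about $0.36 per interaction against roughly $0.0135 for the support copilot, a spread of more than 25x on identical per-token prices.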
The most useful metrics are cost per request, cost per workflow, cost per active user, and feature-level margin.
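One way to compute those metrics from usage data, assuming a hypothetical `CallRecord` log in which each billed model or tool call is tagged with the request, workflow, and user it served, and assuming revenue attributable to the feature over the same period is known. Feature-level margin is taken here as (revenue − cost) / revenue.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CallRecord:
    """One billed model or tool call (hypothetical logging schema)."""
    request_id: str   # one user-facing request may trigger several calls
    workflow_id: str  # one workflow may span several requests
    user_id: str
    cost_usd: float


def unit_metrics(records: list[CallRecord], revenue_usd: float) -> dict[str, float]:
    """Cost per request, workflow, and active user, plus feature-level margin."""
    if not records:
        raise ValueError("no usage records")
    if revenue_usd <= 0:
        raise ValueError("revenue must be positive to compute margin")
    total = sum(r.cost_usd for r in records)
    return {
        "total_cost": total,
        "cost_per_request": total / len({r.request_id for r in records}),
        "cost_per_workflow": total / len({r.workflow_id for r in records}),
        "cost_per_active_user": total / len({r.user_id for r in records}),
        "feature_margin": (revenue_usd - total) / revenue_usd,
    }


# Hypothetical sample: request r1 made two billed calls (generation + validator).
SAMPLE_RECORDS = [
    CallRecord("r1", "w1", "u1", 0.010),
    CallRecord("r1", "w1", "u1", 0.002),
    CallRecord("r2", "w1", "u1", 0.012),
    CallRecord("r3", "w2", "u2", 0.016),
]

if __name__ == "__main__":
    print(unit_metrics(SAMPLE_RECORDS, revenue_usd=0.10))
```

Counting distinct IDs rather than rows matters: the sample has four billed calls but only three requests, two workflows, and two active users, so each metric divides the same total by a different denominator.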
Even when each component looks inexpensive on its own, the full interaction can become materially expensive at scale.
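The arithmetic behind that claim is plain multiplication, sketched here with hypothetical component costs and a hypothetical traffic level: no single line item exceeds two cents, yet the monthly total is large.

```python
from __future__ import annotations

# Hypothetical per-interaction costs in USD; each one looks negligible alone.
COMPONENTS = {
    "query_embedding": 0.00002,
    "vector_search": 0.00010,
    "rerank": 0.00100,
    "generation": 0.01350,
    "policy_check": 0.00060,
}


def projected_monthly_cost(components: dict[str, float],
                           interactions_per_day: int, days: int = 30) -> float:
    """Per-interaction cost (sum of components) multiplied by traffic volume."""
    per_interaction = sum(components.values())
    return per_interaction * interactions_per_day * days


if __name__ == "__main__":
    print(f"per interaction: ${sum(COMPONENTS.values()):.5f}")
    print(f"per month at 50k/day: ${projected_monthly_cost(COMPONENTS, 50_000):,.0f}")
```

At 50,000 interactions per day, about $0.0152 per interaction becomes roughly $22,830 over a 30-day month.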
Embeddings, vector database queries, reranking, tool calls, and validators each add incremental cost.
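A sketch of an incremental-cost ledger, assuming each component can be metered as quantity × unit price per interaction. `LineItem`, the quantities, and the unit prices are hypothetical; real billing units differ by vendor (tokens, queries, documents reranked, calls).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LineItem:
    """One metered component of a single interaction (hypothetical units and prices)."""
    component: str
    quantity: float    # units consumed per interaction
    unit_price: float  # USD per unit

    @property
    def cost(self) -> float:
        return self.quantity * self.unit_price


def breakdown(items: list[LineItem]) -> tuple[float, dict[str, float]]:
    """Return the total cost of the items and each component's share of it."""
    total = sum(item.cost for item in items)
    if total == 0:
        return 0.0, {item.component: 0.0 for item in items}
    return total, {item.component: item.cost / total for item in items}


ITEMS = [
    LineItem("embeddings", quantity=50, unit_price=0.02 / 1_000_000),  # per token
    LineItem("vector_query", quantity=1, unit_price=0.0001),           # per query
    LineItem("rerank", quantity=20, unit_price=0.00005),               # per document
    LineItem("tool_calls", quantity=2, unit_price=0.0005),             # per call
    LineItem("validator", quantity=1, unit_price=0.0004),              # per check
]

if __name__ == "__main__":
    total, shares = breakdown(ITEMS)
    print(f"total: ${total:.6f}")
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"  {name:12s} {share:6.1%}")
```

Reporting shares rather than only the total makes the dominant line items obvious: in this made-up ledger, reranking and tool calls together account for about 80% of the roughly $0.0025 in non-generation cost.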
Retries, fallbacks, latency-driven duplication, and traffic volume turn small inefficiencies into real operating problems.
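A minimal expected-cost model for retries, fallbacks, and latency-driven duplication. It rests on simplifying assumptions that are ours, not the article's: every attempt is billed in full even when it fails, attempts fail independently at a fixed rate, exhausted retries route once to a fallback model, and a fixed fraction of requests sends one duplicate ("hedged") primary call.

```python
def expected_request_cost(primary_cost: float, failure_rate: float,
                          max_retries: int, fallback_cost: float,
                          hedge_rate: float = 0.0) -> float:
    """Expected billed cost of one logical request under the stated assumptions."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    if not 0.0 <= hedge_rate <= 1.0:
        raise ValueError("hedge_rate must be in [0, 1]")
    attempts = max_retries + 1
    # Attempt i (0-based) happens only if the previous i attempts all failed.
    primary = sum(primary_cost * failure_rate**i for i in range(attempts))
    # The fallback runs only when every primary attempt failed.
    fallback = fallback_cost * failure_rate**attempts
    # Hedged duplicates add one extra primary call for a fraction of requests.
    hedged = primary_cost * hedge_rate
    return primary + fallback + hedged


if __name__ == "__main__":
    nominal = 0.01
    actual = expected_request_cost(nominal, failure_rate=0.1, max_retries=2,
                                   fallback_cost=0.05, hedge_rate=0.05)
    print(f"nominal ${nominal:.5f}, expected ${actual:.5f}, overhead {actual / nominal - 1:.1%}")
```

With a 10% failure rate, two retries, a fallback five times the primary price, and 5% hedged duplicates, the expected cost is about $0.01165 per logical request, roughly 16.5% above the nominal $0.01.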
Below is a simplified example of a customer-facing AI assistant that retrieves documentation, generates an answer, and logs the interaction.
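A sketch of how such an assistant's per-interaction cost could be assembled from its three stages (retrieve, generate, log). Every price, token count, and name here is a hypothetical placeholder; the structure, not the numbers, is the point.

```python
from __future__ import annotations

from dataclasses import dataclass

# All prices below are hypothetical placeholders, not any provider's actual rates.
EMBED_PRICE_PER_M = 0.02      # USD per million embedding tokens
VECTOR_QUERY_PRICE = 0.0001   # USD per vector search
LLM_IN_PER_M = 3.00           # USD per million input tokens
LLM_OUT_PER_M = 15.00         # USD per million output tokens
LOG_PRICE_PER_KB = 0.000001   # USD per KB of stored interaction logs


@dataclass
class Interaction:
    question_tokens: int
    retrieved_chunks: int
    tokens_per_chunk: int
    system_tokens: int
    answer_tokens: int
    log_kb: float


def interaction_cost(x: Interaction) -> dict[str, float]:
    """Per-stage and total cost of one retrieve-generate-log interaction."""
    # Retrieved documentation is pasted into the prompt, so it is billed as input.
    prompt_tokens = x.system_tokens + x.question_tokens + x.retrieved_chunks * x.tokens_per_chunk
    costs = {
        "embed_query": x.question_tokens * EMBED_PRICE_PER_M / 1_000_000,
        "vector_search": VECTOR_QUERY_PRICE,
        "generate": (prompt_tokens * LLM_IN_PER_M + x.answer_tokens * LLM_OUT_PER_M) / 1_000_000,
        "log": x.log_kb * LOG_PRICE_PER_KB,
    }
    costs["total"] = sum(costs.values())  # summed before "total" is added
    return costs


EXAMPLE = Interaction(question_tokens=50, retrieved_chunks=5, tokens_per_chunk=400,
                      system_tokens=600, answer_tokens=350, log_kb=12)

if __name__ == "__main__":
    for stage, usd in interaction_cost(EXAMPLE).items():
        print(f"{stage:14s} ${usd:.6f}")
```

With five retrieved chunks of 400 tokens, generation accounts for about 99% of the roughly $0.0133 total, and the 2,000 tokens of retrieved context alone make up about 45% of it.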
- Start with the cost of one user interaction.
- Capture multi-step chains, tools, and retries.
- See which surfaces and customer groups consume spend.
- Measure margin against pricing, retention, or expansion.
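A sketch of the last two steps: grouping spend by product surface and customer tier, then comparing it with revenue. The event tuples, surface and tier names, and revenue figures are hypothetical; in practice, attributing revenue to a tier or feature is its own modeling decision.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical sampled usage events: (surface, customer_tier, user_id, cost_usd).
EVENTS = [
    ("support_copilot", "enterprise", "u1", 0.020),
    ("support_copilot", "enterprise", "u2", 0.030),
    ("search_answers", "self_serve", "u3", 0.004),
    ("search_answers", "self_serve", "u3", 0.006),
]

# Hypothetical revenue attributed to each tier for the same sample, in USD.
REVENUE = {"enterprise": 0.20, "self_serve": 0.01}


def rollup(events):
    """Total cost and cost per active user for each (surface, tier) pair."""
    cost: dict[tuple[str, str], float] = defaultdict(float)
    users: dict[tuple[str, str], set[str]] = defaultdict(set)
    for surface, tier, user_id, usd in events:
        key = (surface, tier)
        cost[key] += usd
        users[key].add(user_id)
    return {key: {"cost": c, "cost_per_user": c / len(users[key])}
            for key, c in cost.items()}


def margin_by_tier(events, revenue):
    """Margin = (revenue - cost) / revenue for each customer tier."""
    cost: dict[str, float] = defaultdict(float)
    for _surface, tier, _user, usd in events:
        cost[tier] += usd
    return {tier: (revenue[tier] - c) / revenue[tier] for tier, c in cost.items()}


if __name__ == "__main__":
    print(rollup(EVENTS))
    print(margin_by_tier(EVENTS, REVENUE))
```

In this toy sample the enterprise tier keeps a 75% margin while the self-serve tier spends what it brings in, a difference that would not show up in the total bill.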