What is Inference Cost?

The per-request cost of running a prompt through an LLM, calculated from input and output token counts.

Inference cost is the dollar amount an LLM provider charges for processing a single API request. Since prices are quoted per million tokens, it is calculated as: (input_tokens / 1,000,000 × input_price_per_1M) + (output_tokens / 1,000,000 × output_price_per_1M).
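The formula can be sketched as a small helper. The prices in the example call are illustrative placeholders, not a quote from any provider:

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_1m \
         + (output_tokens / 1_000_000) * output_price_per_1m

# Example: 1,200 input tokens and 350 output tokens
# at illustrative prices of $2.50 / $10.00 per 1M tokens.
cost = inference_cost(1200, 350, 2.50, 10.00)
print(f"${cost:.4f}")  # → $0.0065
```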

Inference costs vary dramatically across models. GPT-4o costs $2.50 per 1M input tokens while GPT-4o mini costs $0.15, a difference of more than 16x. Choosing the right model for each task is one of the most impactful cost levers available.
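A quick check of that ratio, using only the input prices quoted above (output prices, which also differ per model, are omitted here):

```python
# $ per 1M input tokens, as quoted in the text above
PRICES = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}

tokens = 10_000  # input-side cost of the same 10,000-token prompt
for model, price in PRICES.items():
    print(f"{model}: ${tokens / 1_000_000 * price:.4f}")

ratio = PRICES["gpt-4o"] / PRICES["gpt-4o-mini"]
print(f"ratio: {ratio:.1f}x")  # → ratio: 16.7x
```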

GateCtr reports the exact inference cost for every request in the response metadata, enabling precise cost attribution per project, user, and model. This data feeds the analytics dashboard and budget enforcement system.

How GateCtr Handles Inference Cost

GateCtr handles inference cost automatically on every API call, with no configuration required. Results are visible in real time in the GateCtr dashboard, with per-request breakdowns of tokens, cost, and savings.

Related models

See GateCtr in action for free

No credit card required. Up and running in 5 minutes.

Start free