LLM Cost Glossary
Clear definitions of essential concepts in AI cost infrastructure, token optimization, and LLM routing.
Spend limit: A hard limit on token or dollar spend that blocks LLM requests once the threshold is reached.
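A minimal sketch of how such a cap might be enforced: track cumulative spend and refuse any request that would push it past the limit. The `BudgetGuard` class and its numbers are illustrative, not a real library API.

```python
# Spend-limit sketch: accumulate per-request cost and block any request
# that would exceed a hard dollar cap. All names here are hypothetical.
class BudgetGuard:
    def __init__(self, limit_usd: float):
        self.limit = limit_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a request's cost, or raise if it would break the cap."""
        if self.spent + cost_usd > self.limit:
            raise RuntimeError("budget exceeded: request blocked")
        self.spent += cost_usd

guard = BudgetGuard(limit_usd=1.00)
guard.charge(0.75)      # within budget
try:
    guard.charge(0.50)  # would push total spend past the cap
except RuntimeError as exc:
    print(exc)
```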
Context window: The maximum number of tokens an LLM can process in a single request, including both input and output.
Inference cost: The per-request cost of running a prompt through an LLM, calculated from input and output token counts.
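The calculation is typically input tokens times the input price plus output tokens times the output price. A sketch, with placeholder model names and prices (per million tokens) that do not reflect any real provider's rates:

```python
# Estimate the dollar cost of one request from its token counts.
# Model names and prices are illustrative placeholders, per 1M tokens.
PRICE_PER_1M = {
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(round(request_cost("small-model", 1000, 500), 6))  # 0.00045
```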
Cost-latency tradeoff: The balance between response speed and API cost when selecting an LLM for a given task.
LLM cost optimization: Strategies and tools that reduce total spend on LLM API calls without degrading application quality.
LLM gateway: A proxy layer between your application and LLM providers that adds routing, caching, and cost controls.
LLM observability: The ability to monitor, trace, and analyze LLM API calls, including tokens, costs, latency, and errors.
Model routing: Automatically selecting the most cost-effective LLM for each request based on complexity and requirements.
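One common approach routes cheap, simple requests to a small model and everything else to a stronger one. A sketch, assuming hypothetical model names and a crude length-plus-flag heuristic for "complexity":

```python
# Cost-based routing sketch: pick a cheap model for short, simple
# prompts, a stronger model otherwise. Names and the 200-character
# threshold are illustrative assumptions.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"

def route(prompt: str, needs_reasoning: bool = False) -> str:
    """Return the model name to use for this request."""
    if needs_reasoning or len(prompt) > 200:
        return STRONG_MODEL
    return CHEAP_MODEL

print(route("Translate 'hello' to French"))                # cheap model
print(route("Review this contract...", needs_reasoning=True))  # strong model
```

Production routers usually replace the heuristic with a classifier or learned policy, but the shape (request in, model name out) stays the same.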
Model fallback: Automatically switching to an alternative LLM when the primary model is unavailable or over budget.
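The core pattern is trying each model in priority order until one succeeds. In this sketch, `call_model` is a stand-in for a real provider SDK call, with the primary model's failure simulated:

```python
# Fallback sketch: try models in order, returning the first success.
# call_model is a hypothetical stand-in for a provider SDK call.
def call_model(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise RuntimeError("provider unavailable")  # simulated outage
    return f"{model}: ok"

def complete_with_fallback(prompt: str, models: list[str]) -> str:
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # unavailable, rate-limited, over budget...
            last_error = exc
    raise RuntimeError("all models failed") from last_error

print(complete_with_fallback("hi", ["primary-model", "backup-model"]))
# backup-model: ok
```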
Response caching: Storing LLM responses for reuse when identical or similar prompts are submitted again.
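The exact-match case can be sketched as a dictionary keyed on a hash of the model and prompt, so a repeated request never triggers a second paid call. All names here are illustrative:

```python
# Exact-match response cache keyed on a hash of (model, prompt).
import hashlib

_cache: dict[str, str] = {}

def cache_key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

def cached_complete(model: str, prompt: str, call) -> str:
    """Return a cached response, calling the provider only on a miss."""
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = call(model, prompt)  # pay only for misses
    return _cache[key]

calls = []
def fake_call(model, prompt):  # stand-in for a real provider call
    calls.append(prompt)
    return prompt.upper()

print(cached_complete("m", "hello", fake_call))  # HELLO (miss, one call)
print(cached_complete("m", "hello", fake_call))  # HELLO (hit, no call)
print(len(calls))                                # 1
```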
Prompt compression: A technique that shortens prompts by removing redundant tokens while preserving semantic meaning.
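A deliberately naive sketch of the idea: collapse whitespace and drop filler phrases that carry no instruction. Real compressors use learned token pruning and are far more careful about preserving meaning; the filler list below is an assumption for illustration only.

```python
# Naive prompt compression: collapse whitespace and strip filler
# phrases. Illustrates the idea only; real systems prune tokens using
# learned importance scores.
import re

FILLER = ["please", "kindly", "i would like you to"]

def compress(prompt: str) -> str:
    text = re.sub(r"\s+", " ", prompt).strip()
    for phrase in FILLER:
        text = re.sub(re.escape(phrase), "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(compress("Please   kindly summarize\n\nthis  report."))
# summarize this report.
```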
Rate limiting: Controlling the frequency of LLM API calls to prevent abuse, manage costs, and stay within provider limits.
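A common implementation is the token bucket: each request consumes one token, and tokens refill at a fixed rate up to a burst capacity. A sketch with illustrative rate and capacity values:

```python
# Token-bucket rate limiter sketch: each request spends one token;
# tokens refill at `rate_per_sec` up to `capacity`.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if throttled."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=0.5, capacity=2)
print([bucket.allow() for _ in range(3)])  # burst of 2 allowed, third throttled
```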
Semantic caching: A caching strategy that reuses LLM responses for prompts that are semantically similar, not just identical.
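The lookup returns a cached response when a new prompt is "close enough" to a stored one. A real system compares embedding vectors from a model; in this sketch a bag-of-words cosine similarity stands in for the embedding, and the 0.8 threshold is an assumption:

```python
# Semantic-cache sketch: bag-of-words cosine similarity stands in for
# a real embedding model; the 0.8 threshold is illustrative.
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []  # (prompt, response)

    def get(self, prompt: str):
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response
        return None  # miss: caller pays for a fresh completion

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))

cache = SemanticCache()
cache.put("what is the capital of France", "Paris")
print(cache.get("What is the capital of France?"))  # near-duplicate hit
```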
Token counting: The process of measuring how many tokens a text string will consume when sent to an LLM.
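For billing-accurate counts you need the model's own tokenizer (for example, the tiktoken library for OpenAI models); for quick estimates, a rule of thumb is that English text averages roughly four characters per token under common BPE tokenizers. A sketch of the heuristic only:

```python
# Rough token estimate using the ~4-characters-per-token rule of thumb
# for English text. Use the provider's tokenizer for exact counts.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

prompt = "Summarize the following report in three bullet points."
print(estimate_tokens(prompt))
```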
Token optimization: The process of reducing the number of tokens sent to an LLM without degrading output quality.