LLM Cost Glossary
Clear definitions of essential concepts in AI cost infrastructure, token optimization, and LLM routing.
Spend limit: A hard limit on token or dollar spend that blocks LLM requests once the threshold is reached.
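A minimal sketch of how such a cap might be enforced: track cumulative spend and refuse any request that would push it past the limit. The `BudgetGuard` class and its numbers are illustrative, not a real library API.

```python
# Spend-limit sketch: accumulate per-request cost and block any request
# that would exceed a hard dollar cap. All names here are hypothetical.
class BudgetGuard:
    def __init__(self, limit_usd: float):
        self.limit = limit_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a request's cost, or raise if it would break the cap."""
        if self.spent + cost_usd > self.limit:
            raise RuntimeError("budget exceeded: request blocked")
        self.spent += cost_usd

guard = BudgetGuard(limit_usd=1.00)
guard.charge(0.75)      # within budget
try:
    guard.charge(0.50)  # would push total spend past the cap
except RuntimeError as exc:
    print(exc)
```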
Context window: The maximum number of tokens an LLM can process in a single request, including both input and output.
Inference cost: The per-request cost of running a prompt through an LLM, calculated from input and output token counts.
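The calculation is typically input tokens times the input price plus output tokens times the output price. A sketch, with placeholder model names and prices (per million tokens) that do not reflect any real provider's rates:

```python
# Estimate the dollar cost of one request from its token counts.
# Model names and prices are illustrative placeholders, per 1M tokens.
PRICE_PER_1M = {
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(round(request_cost("small-model", 1000, 500), 6))  # 0.00045
```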
Cost-latency tradeoff: The balance between response speed and API cost when selecting an LLM for a given task.
LLM cost optimization: Strategies and tools that reduce total spend on LLM API calls without degrading application quality.
LLM gateway: A proxy layer between your application and LLM providers that adds routing, caching, and cost controls.
LLM observability: The ability to monitor, trace, and analyze LLM API calls, including tokens, costs, latency, and errors.
Model routing: Automatically selecting the most cost-effective LLM for each request based on complexity and requirements.
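One common approach routes cheap, simple requests to a small model and everything else to a stronger one. A sketch, assuming hypothetical model names and a crude length-plus-flag heuristic for "complexity":

```python
# Cost-based routing sketch: pick a cheap model for short, simple
# prompts, a stronger model otherwise. Names and the 200-character
# threshold are illustrative assumptions.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"

def route(prompt: str, needs_reasoning: bool = False) -> str:
    """Return the model name to use for this request."""
    if needs_reasoning or len(prompt) > 200:
        return STRONG_MODEL
    return CHEAP_MODEL

print(route("Translate 'hello' to French"))                # cheap model
print(route("Review this contract...", needs_reasoning=True))  # strong model
```

Production routers usually replace the heuristic with a classifier or learned policy, but the shape (request in, model name out) stays the same.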
Model fallback: Automatically switching to an alternative LLM when the primary model is unavailable or over budget.
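The core pattern is trying each model in priority order until one succeeds. In this sketch, `call_model` is a stand-in for a real provider SDK call, with the primary model's failure simulated:

```python
# Fallback sketch: try models in order, returning the first success.
# call_model is a hypothetical stand-in for a provider SDK call.
def call_model(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise RuntimeError("provider unavailable")  # simulated outage
    return f"{model}: ok"

def complete_with_fallback(prompt: str, models: list[str]) -> str:
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # unavailable, rate-limited, over budget...
            last_error = exc
    raise RuntimeError("all models failed") from last_error

print(complete_with_fallback("hi", ["primary-model", "backup-model"]))
# backup-model: ok
```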
Response caching: Storing LLM responses for reuse when identical or similar prompts are submitted again.
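The exact-match case can be sketched as a dictionary keyed on a hash of the model and prompt, so a repeated request never triggers a second paid call. All names here are illustrative:

```python
# Exact-match response cache keyed on a hash of (model, prompt).
import hashlib

_cache: dict[str, str] = {}

def cache_key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

def cached_complete(model: str, prompt: str, call) -> str:
    """Return a cached response, calling the provider only on a miss."""
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = call(model, prompt)  # pay only for misses
    return _cache[key]

calls = []
def fake_call(model, prompt):  # stand-in for a real provider call
    calls.append(prompt)
    return prompt.upper()

print(cached_complete("m", "hello", fake_call))  # HELLO (miss, one call)
print(cached_complete("m", "hello", fake_call))  # HELLO (hit, no call)
print(len(calls))                                # 1
```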
Prompt compression: A technique that shortens prompts by removing redundant tokens while preserving semantic meaning.
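A deliberately naive sketch of the idea: collapse whitespace and drop filler phrases that carry no instruction. Real compressors use learned token pruning and are far more careful about preserving meaning; the filler list below is an assumption for illustration only.

```python
# Naive prompt compression: collapse whitespace and strip filler
# phrases. Illustrates the idea only; real systems prune tokens using
# learned importance scores.
import re

FILLER = ["please", "kindly", "i would like you to"]

def compress(prompt: str) -> str:
    text = re.sub(r"\s+", " ", prompt).strip()
    for phrase in FILLER:
        text = re.sub(re.escape(phrase), "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(compress("Please   kindly summarize\n\nthis  report."))
# summarize this report.
```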
Rate limiting: Controlling the frequency of LLM API calls to prevent abuse, manage costs, and stay within provider limits.
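A common implementation is the token bucket: each request consumes one token, and tokens refill at a fixed rate up to a burst capacity. A sketch with illustrative rate and capacity values:

```python
# Token-bucket rate limiter sketch: each request spends one token;
# tokens refill at `rate_per_sec` up to `capacity`.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if throttled."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=0.5, capacity=2)
print([bucket.allow() for _ in range(3)])  # burst of 2 allowed, third throttled
```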
Semantic caching: A caching strategy that reuses LLM responses for prompts that are semantically similar, not just identical.
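The lookup returns a cached response when a new prompt is "close enough" to a stored one. A real system compares embedding vectors from a model; in this sketch a bag-of-words cosine similarity stands in for the embedding, and the 0.8 threshold is an assumption:

```python
# Semantic-cache sketch: bag-of-words cosine similarity stands in for
# a real embedding model; the 0.8 threshold is illustrative.
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []  # (prompt, response)

    def get(self, prompt: str):
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response
        return None  # miss: caller pays for a fresh completion

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))

cache = SemanticCache()
cache.put("what is the capital of France", "Paris")
print(cache.get("What is the capital of France?"))  # near-duplicate hit
```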
Token counting: The process of measuring how many tokens a text string will consume when sent to an LLM.
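For billing-accurate counts you need the model's own tokenizer (for example, the tiktoken library for OpenAI models); for quick estimates, a rule of thumb is that English text averages roughly four characters per token under common BPE tokenizers. A sketch of the heuristic only:

```python
# Rough token estimate using the ~4-characters-per-token rule of thumb
# for English text. Use the provider's tokenizer for exact counts.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

prompt = "Summarize the following report in three bullet points."
print(estimate_tokens(prompt))
```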
Token optimization: The process of reducing the number of tokens sent to an LLM without degrading output quality.