LLM Glossary

Clear definitions of the essential concepts in AI cost infrastructure, token optimization, and LLM routing.

Budget Cap

A hard limit on token or dollar spend that blocks LLM requests once the threshold is reached.
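
A minimal sketch of a hard cap, using a hypothetical `BudgetCap` class that tracks cumulative spend; the dollar figures are illustrative only:

```python
class BudgetCapError(Exception):
    """Raised when a request would exceed the configured spend cap."""

class BudgetCap:
    """Hard dollar cap: blocks requests once cumulative spend hits the limit."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        # Reject the request *before* spending past the threshold.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetCapError(f"cap of ${self.limit_usd:.2f} reached")
        self.spent_usd += cost_usd

cap = BudgetCap(limit_usd=1.00)
cap.charge(0.75)          # allowed: running total is now $0.75
try:
    cap.charge(0.50)      # would push spend to $1.25 — blocked
    blocked = False
except BudgetCapError:
    blocked = True
```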

Context Window

The maximum number of tokens an LLM can process in a single request, including both input and output.
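
Because the window covers input plus output, a pre-flight check can verify that a request fits; the 8,192-token window below is a hypothetical figure:

```python
def fits_context_window(prompt_tokens, max_output_tokens, context_window):
    """A context window bounds input *plus* output tokens for one request."""
    return prompt_tokens + max_output_tokens <= context_window

# With a hypothetical 8,192-token window, 7,000 input tokens leave room
# for at most 1,192 output tokens.
fits_context_window(7000, 1000, 8192)   # fits
fits_context_window(7000, 2000, 8192)   # does not fit
```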

Inference Cost

The per-request cost of running a prompt through an LLM, calculated from input and output token counts.
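
The calculation can be written out directly; the per-1K-token prices below are placeholders, not any provider's actual rates:

```python
def inference_cost_usd(input_tokens, output_tokens,
                       input_price_per_1k, output_price_per_1k):
    """Per-request cost: token counts times their per-1K-token rates."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# 1,200 input tokens and 300 output tokens at placeholder rates:
cost = inference_cost_usd(1200, 300,
                          input_price_per_1k=0.0005,
                          output_price_per_1k=0.0015)
# 1.2 * 0.0005 + 0.3 * 0.0015 = 0.00105
```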

Latency vs. Cost Tradeoff

The balance between response speed and API cost when selecting an LLM for a given task.
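
One way to operationalize the tradeoff is to pick the cheapest model that still meets a latency target; the model catalog below is entirely hypothetical:

```python
# Hypothetical catalog: cost per 1K tokens (USD) and p95 latency (seconds).
MODELS = {
    "small-fast": {"cost_per_1k": 0.0002, "p95_latency_s": 0.4},
    "medium":     {"cost_per_1k": 0.0010, "p95_latency_s": 1.2},
    "large-slow": {"cost_per_1k": 0.0060, "p95_latency_s": 3.5},
}

def cheapest_within_sla(max_latency_s):
    """Among models meeting the latency SLA, return the cheapest (or None)."""
    candidates = [(m["cost_per_1k"], name)
                  for name, m in MODELS.items()
                  if m["p95_latency_s"] <= max_latency_s]
    return min(candidates)[1] if candidates else None

cheapest_within_sla(2.0)   # "small-fast" — fast enough and cheapest
cheapest_within_sla(0.1)   # None — no model meets the SLA
```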

LLM Cost Reduction

Strategies and tools that reduce the total spend on LLM API calls without degrading application quality.

LLM Gateway

A proxy layer between your application and LLM providers that adds routing, caching, and cost controls.

LLM Observability

The ability to monitor, trace, and analyze LLM API calls including tokens, costs, latency, and errors.
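
A minimal sketch of call-level instrumentation, using a hypothetical decorator around a stubbed client; real observability stacks also add distributed tracing and dashboards:

```python
import time

call_log = []

def observe(fn):
    """Record latency, token counts, and cost for each call.
    `fn` is assumed to return (text, input_tokens, output_tokens, cost_usd)."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        text, in_tok, out_tok, cost = fn(*args, **kwargs)
        call_log.append({
            "latency_s": time.perf_counter() - start,
            "input_tokens": in_tok,
            "output_tokens": out_tok,
            "cost_usd": cost,
        })
        return text
    return wrapper

@observe
def fake_llm_call(prompt):
    # Stand-in for a real provider call.
    return "ok", len(prompt.split()), 1, 0.0001

fake_llm_call("what is a token")
```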

LLM Routing

Automatically selecting the most cost-effective LLM for each request based on complexity and requirements.
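
A toy router, assuming complexity can be guessed from prompt length and keywords (production routers typically use trained classifiers); the model names and marker list are invented for illustration:

```python
def route(prompt):
    """Send long or reasoning-heavy prompts to a stronger (pricier) model;
    everything else goes to the cheap one."""
    complex_markers = ("analyze", "prove", "step by step", "refactor")
    lowered = prompt.lower()
    if len(prompt.split()) > 200 or any(m in lowered for m in complex_markers):
        return "large-model"
    return "small-model"

route("Translate 'hello' to French")             # simple -> small-model
route("Analyze this contract clause by clause")  # complex -> large-model
```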

Model Fallback

Automatically switching to an alternative LLM when the primary model is unavailable or over budget.
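
A minimal fallback chain, with stub callables standing in for real provider clients:

```python
def call_with_fallback(prompt, providers):
    """Try each provider in order; move to the next when one raises."""
    last_err = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:
            last_err = err
    raise RuntimeError("all providers failed") from last_err

def primary(prompt):
    # Simulates an outage on the primary model.
    raise TimeoutError("primary unavailable")

def secondary(prompt):
    return f"secondary answered: {prompt}"

result = call_with_fallback("ping", [primary, secondary])
```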

Prompt Caching

Storing LLM responses for reuse when identical or similar prompts are submitted again.
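
An exact-match cache can be sketched with a dictionary keyed by a prompt hash; the "response" here is simulated rather than fetched from a real API:

```python
import hashlib

cache = {}
calls = 0

def cached_completion(prompt):
    """Exact-match cache: identical prompts skip the (simulated) API call."""
    global calls
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in cache:
        calls += 1                   # stand-in for a paid API call
        cache[key] = prompt.upper()  # fake "response"
    return cache[key]

cached_completion("hello")   # miss: one simulated call
cached_completion("hello")   # hit: served from cache, no new call
```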

Prompt Compression

A technique that shortens prompts by removing redundant tokens while preserving semantic meaning.
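
A deliberately naive sketch that collapses whitespace and drops filler words; real compressors (including learned, model-based ones) are far more sophisticated, and the filler list here is invented for illustration:

```python
import re

FILLERS = {"please", "kindly", "basically", "actually"}

def compress_prompt(prompt):
    """Collapse runs of whitespace and drop low-information filler words."""
    words = re.sub(r"\s+", " ", prompt).strip().split(" ")
    kept = [w for w in words if w.lower().strip(",.") not in FILLERS]
    return " ".join(kept)

compress_prompt("Please  summarize   this, basically in one line")
# -> "summarize this, in one line"
```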

Rate Limiting (LLM)

Controlling the frequency of LLM API calls to prevent abuse, manage costs, and stay within provider limits.
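
A sliding-window limiter is one common implementation; timestamps are passed in explicitly here to keep the example deterministic:

```python
import collections

class SlidingWindowLimiter:
    """Allow at most `max_calls` within the last `window_s` seconds."""

    def __init__(self, max_calls, window_s):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = collections.deque()

    def allow(self, now):
        # Evict timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(max_calls=2, window_s=1.0)
results = [limiter.allow(0.0), limiter.allow(0.1),
           limiter.allow(0.2), limiter.allow(1.5)]
# Third call is rejected; by t=1.5 the window has cleared.
```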

Semantic Caching

A caching strategy that reuses LLM responses for prompts that are semantically similar, not just identical.
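
A sketch using bag-of-words cosine similarity as a stand-in for the embedding models real semantic caches use; the 0.8 threshold is arbitrary:

```python
import math
from collections import Counter

def _vec(text):
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Reuse a stored response when a new prompt is 'close enough'."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []   # list of (vector, response) pairs

    def get(self, prompt):
        vec = _vec(prompt)
        for stored_vec, response in self.entries:
            if _cosine(vec, stored_vec) >= self.threshold:
                return response
        return None

    def put(self, prompt, response):
        self.entries.append((_vec(prompt), response))

cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital of france ?")   # similar, not identical
```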

Token Counting

The process of measuring how many tokens a text string will consume when sent to an LLM.
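
Exact counts require the model's own tokenizer (for OpenAI models, the tiktoken library); for quick budgeting, a rough characters-per-token heuristic is often good enough:

```python
def estimate_tokens(text):
    """Rule of thumb for English text: roughly 4 characters per token.
    Use the model's real tokenizer when exact counts matter."""
    return max(1, len(text) // 4)

estimate_tokens("The quick brown fox jumps over the lazy dog.")
# 44 characters -> about 11 tokens
```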

Token Optimization

The process of reducing the number of tokens sent to an LLM without degrading output quality.
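
One simple optimization is trimming old conversation turns to fit a token budget (summarizing them is a common alternative); the 4-characters-per-token estimate below is a rough heuristic:

```python
def trim_history(messages, budget_tokens):
    """Keep the most recent messages that fit within a token budget,
    walking backwards so newer turns are preferred over older ones."""
    kept = []
    used = 0
    for msg in reversed(messages):
        cost = max(1, len(msg) // 4)   # ~4 chars per token heuristic
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["a" * 400, "b" * 40, "c" * 40]   # ~100, ~10, ~10 tokens
trim_history(history, budget_tokens=25)     # keeps only the two recent turns
```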
