What is LLM Routing?

Automatically selecting the most cost-effective LLM for each request based on complexity and requirements.

LLM routing is the practice of dynamically selecting which language model handles a given request. Rather than sending all requests to a single model, a router evaluates each request and assigns it to the most appropriate model based on criteria like complexity, required quality, latency constraints, and cost.

A simple Q&A query might be routed to a fast, inexpensive model like GPT-4o mini, while a complex reasoning task is sent to GPT-4o or Claude 3.5 Sonnet. This approach can reduce average cost per request by 30–60% while preserving quality on the tasks that demand a stronger model.
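The routing idea above can be sketched as a small rule-based router. This is an illustrative toy, not GateCtr's actual algorithm: the model names, keyword list, and threshold are all assumptions made for the example.

```python
# Hypothetical sketch of a rule-based LLM router: score a prompt's
# complexity with a crude heuristic, then pick a model tier.
# Thresholds and marker words are illustrative only.

CHEAP_MODEL = "gpt-4o-mini"
STRONG_MODEL = "gpt-4o"

# Keywords loosely associated with multi-step reasoning (assumption).
COMPLEX_MARKERS = ("prove", "analyze", "derive", "compare", "step by step")

def complexity_score(prompt: str) -> float:
    """Crude proxy for complexity: prompt length plus reasoning keywords."""
    length_score = min(len(prompt.split()) / 200, 1.0)  # longer prompts score higher
    marker_score = 0.25 * sum(m in prompt.lower() for m in COMPLEX_MARKERS)
    return min(length_score + marker_score, 1.0)

def route(prompt: str, threshold: float = 0.4) -> str:
    """Low-complexity prompts go to the cheap model; the rest to the strong one."""
    return STRONG_MODEL if complexity_score(prompt) >= threshold else CHEAP_MODEL

print(route("What is the capital of France?"))                      # cheap tier
print(route("Analyze and compare these two proofs step by step."))  # strong tier
```

A production router would replace the keyword heuristic with a learned classifier or embedding-based scoring, but the cheap-by-default, escalate-on-complexity structure is the same.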

GateCtr's Model Router uses semantic complexity scoring to make routing decisions automatically. Pass model: "auto" and GateCtr handles the rest.
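A request using model: "auto" might look like the following. This assumes an OpenAI-compatible chat-completions payload shape; apart from the model: "auto" value the page itself documents, the field names and endpoint are assumptions for illustration.

```python
# Hypothetical request payload for a router that accepts model="auto".
# Only model="auto" comes from the page; the rest of the shape is assumed
# to follow the common OpenAI-style chat-completions format.
import json

payload = {
    "model": "auto",  # let the router choose the model per request
    "messages": [
        {"role": "user", "content": "Summarize this paragraph in one sentence."}
    ],
}

print(json.dumps(payload, indent=2))
```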

How GateCtr handles LLM routing

GateCtr handles LLM routing automatically on every API call, with no configuration required. The results are visible in real time in the GateCtr dashboard, with per-request breakdowns of tokens, cost, and savings.

Related models

See GateCtr in action, for free

No credit card required. Up and running in 5 minutes.

Start free