What is LLM Routing?
Automatically selecting the most cost-effective LLM for each request based on complexity and requirements.
LLM routing is the practice of dynamically selecting which language model handles a given request. Rather than sending all requests to a single model, a router evaluates each request and assigns it to the most appropriate model based on criteria like complexity, required quality, latency constraints, and cost.
A simple Q&A query might be routed to a fast, cheap model like GPT-4o mini, while a complex reasoning task is sent to GPT-4o or Claude 3.5 Sonnet. This approach can reduce average cost per request by 30–60% without sacrificing quality on tasks that require it.
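The routing decision described above can be sketched as a toy heuristic. This is a minimal illustration, not GateCtr's actual scoring logic — a production router would score complexity semantically (e.g. with an embedding or classifier), and the marker list, length threshold, and model names here are all assumptions for the sketch:

```python
def route_request(prompt: str) -> str:
    """Toy rule-based router: send long or reasoning-heavy prompts
    to a stronger model, everything else to a cheap, fast one."""
    # Crude proxies for complexity; real routers use semantic scoring.
    reasoning_markers = ("prove", "analyze", "step by step", "compare", "derive")
    is_complex = len(prompt.split()) > 100 or any(
        marker in prompt.lower() for marker in reasoning_markers
    )
    return "gpt-4o" if is_complex else "gpt-4o-mini"

print(route_request("What's the capital of France?"))          # gpt-4o-mini
print(route_request("Analyze the trade-offs of each design."))  # gpt-4o
```

Even a heuristic this simple captures the core idea: the router only needs to be right often enough that the cost savings on easy requests outweigh the occasional misroute.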
GateCtr's Model Router uses semantic complexity scoring to make routing decisions automatically. Pass model: "auto" and GateCtr handles the rest.
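As a sketch of what such a request might look like, the snippet below builds an OpenAI-style chat completions payload with model set to "auto". The payload shape is an assumption based on the OpenAI-compatible convention; consult GateCtr's API reference for the exact endpoint and fields:

```python
import json

# Assumed OpenAI-compatible payload shape; "auto" defers model
# selection to GateCtr's Model Router (per the docs above).
payload = {
    "model": "auto",
    "messages": [
        {"role": "user", "content": "Summarize this paragraph in one sentence."}
    ],
}

print(json.dumps(payload, indent=2))
```

The client code stays identical whether the request lands on a cheap model or a frontier one; only the "model" field changes from a concrete name to "auto".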
GateCtr applies LLM routing automatically on every API call, with no extra configuration required. Results appear in real time in the GateCtr dashboard, with per-request breakdowns of tokens, cost, and savings.