What is Prompt Caching?
Storing LLM responses for reuse when identical or similar prompts are submitted again.
Prompt caching stores the output of an LLM call and returns the cached response when the same (or semantically similar) prompt is submitted again. For applications with repetitive queries — FAQ bots, document Q&A, code assistants — caching can eliminate 30–70% of API calls entirely.
There are two types: exact caching (same prompt → same response) and semantic caching (similar prompts → reuse response if similarity exceeds a threshold). Semantic caching requires embedding the prompt and comparing against a vector store.
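The two strategies can be sketched together in a few lines of Python. This is an illustrative sketch, not a production implementation: `call_llm` is a placeholder for a real API call, and `embed` is a toy bag-of-words embedding standing in for a real embedding model; a production semantic cache would use a proper embedding API and a vector store instead of a linear scan.

```python
import hashlib
import math
from collections import Counter


def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    return f"response to: {prompt}"


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PromptCache:
    def __init__(self, threshold: float = 0.9):
        self.exact = {}        # SHA-256(prompt) -> cached response
        self.semantic = []     # list of (embedding, cached response)
        self.threshold = threshold

    def get(self, prompt: str) -> str:
        # 1. Exact cache: identical prompt returns the stored response.
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:
            return self.exact[key]
        # 2. Semantic cache: reuse a response if similarity exceeds the threshold.
        vec = embed(prompt)
        for emb, resp in self.semantic:
            if cosine(vec, emb) >= self.threshold:
                return resp
        # 3. Cache miss: call the model and store the result in both caches.
        resp = call_llm(prompt)
        self.exact[key] = resp
        self.semantic.append((vec, resp))
        return resp
```

With this sketch, a repeated prompt hits the exact cache, and a lightly reworded prompt can hit the semantic cache without a new API call. The threshold trades precision for hit rate: too low and unrelated prompts get a stale answer, too high and near-duplicates miss.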
GateCtr's LLM Cache Layer (coming Q1 2027) will implement semantic caching transparently. Until then, GateCtr's token compression reduces the cost of cache misses; it is applied automatically on every API call with no configuration required, and the results are visible in real time in the GateCtr dashboard, with per-request breakdowns of tokens, cost, and savings.