What is Prompt Caching?
Storing LLM responses for reuse when identical or similar prompts are submitted again.
Prompt caching stores the output of an LLM call and returns the cached response when the same (or semantically similar) prompt is submitted again. For applications with repetitive queries — FAQ bots, document Q&A, code assistants — caching can eliminate 30–70% of API calls entirely.
There are two types: exact caching (same prompt → same response) and semantic caching (similar prompts → reuse response if similarity exceeds a threshold). Semantic caching requires embedding the prompt and comparing against a vector store.
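The two strategies can be sketched together in a few lines of Python. This is an illustrative sketch, not a production implementation: `call_llm` is a placeholder for a real API call, and `embed` is a toy bag-of-words embedding standing in for a real embedding model; a production semantic cache would use a proper embedding API and a vector store instead of a linear scan.

```python
import hashlib
import math
from collections import Counter


def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    return f"response to: {prompt}"


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PromptCache:
    def __init__(self, threshold: float = 0.9):
        self.exact = {}        # SHA-256(prompt) -> cached response
        self.semantic = []     # list of (embedding, cached response)
        self.threshold = threshold

    def get(self, prompt: str) -> str:
        # 1. Exact cache: identical prompt returns the stored response.
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:
            return self.exact[key]
        # 2. Semantic cache: reuse a response if similarity exceeds the threshold.
        vec = embed(prompt)
        for emb, resp in self.semantic:
            if cosine(vec, emb) >= self.threshold:
                return resp
        # 3. Cache miss: call the model and store the result in both caches.
        resp = call_llm(prompt)
        self.exact[key] = resp
        self.semantic.append((vec, resp))
        return resp
```

With this sketch, a repeated prompt hits the exact cache, and a lightly reworded prompt can hit the semantic cache without a new API call. The threshold trades precision for hit rate: too low and unrelated prompts get a stale answer, too high and near-duplicates miss.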
GateCtr's LLM Cache Layer (coming Q1 2027) will implement semantic caching transparently. Until then, GateCtr's token compression reduces the cost of cache misses; it is applied automatically on every API call with no configuration required, and the results are visible in real time in the GateCtr dashboard, with per-request breakdowns of tokens, cost, and savings.