What is Rate Limiting (LLM)?

Controlling the frequency of LLM API calls to prevent abuse, manage costs, and stay within provider limits.

Rate limiting in the context of LLMs means restricting how many API calls can be made within a given time window. This serves two purposes: staying within provider-imposed rate limits (requests per minute, tokens per minute) and enforcing application-level policies (per-user limits, per-project caps).

Provider rate limits are hard constraints — exceeding them results in HTTP 429 errors. Application-level rate limits are policy decisions — you might limit a free-tier user to 100 requests/day to control costs.

GateCtr enforces both types of rate limits. Budget caps act as token-based rate limits, while the Budget Firewall blocks requests that would exceed defined thresholds. This prevents both provider errors and unexpected cost spikes.

How GateCtr Handles Rate Limiting (LLM)

GateCtr addresses rate limiting for LLMs automatically on every API call — no configuration required. The results are visible in real time in the GateCtr dashboard, with per-request breakdowns of tokens, cost, and savings.

See GateCtr in action — free

No credit card required. Up and running in 5 minutes.

Start free