
Evaluation Overview

This page consolidates evaluation requirements and maps them to corresponding LiteLLM capabilities, documentation links, and examples.

Core Platform

Caching

| Title | Description | Documentation |
| --- | --- | --- |
| Prompt Caching | Cache repeated prompts to reduce cost and latency across providers. | View Docs |
| Response Caching | Cache model responses with configurable TTL and cache-bypass options. | View Docs |
| Per-Route Cache TTLs | Define different TTLs per route or model for prompt/response cache entries. | View Docs |
| Cache Bypass Controls | Allow clients or rules to skip cache reads/writes for sensitive calls. | View Docs |
| Semantic/Content-Aware Caching | Reduce re-computation by caching semantically-similar requests. | View Docs |
| Cache Invalidation Controls | Clear stale cache entries during rollouts or policy changes. | View Docs |
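
To make the caching requirements above concrete, here is a minimal, illustrative sketch of a response cache with per-route TTLs, bypass flags, and invalidation. It uses only the Python standard library and is not LiteLLM's implementation or API: the class `TTLResponseCache` and the parameters `no_cache`, `no_store`, and `route_ttls` are hypothetical names chosen for this example.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLResponseCache:
    """Toy in-memory response cache (hypothetical, for illustration only)."""

    def __init__(
        self,
        default_ttl: float = 60.0,
        route_ttls: Optional[Dict[str, float]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.default_ttl = default_ttl
        self.route_ttls = route_ttls or {}  # per-route TTL overrides, in seconds
        self.clock = clock                  # injectable so expiry can be tested
        # key -> (expires_at, route, response)
        self._store: Dict[str, Tuple[float, str, Any]] = {}

    @staticmethod
    def make_key(route: str, payload: dict) -> str:
        # Canonical JSON so logically equal requests hash to the same key.
        blob = json.dumps({"route": route, "payload": payload}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get(self, route: str, payload: dict, no_cache: bool = False) -> Optional[Any]:
        if no_cache:  # bypass: skip the cache read entirely
            return None
        key = self.make_key(route, payload)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, _, response = entry
        if self.clock() >= expires_at:  # entry has outlived its TTL
            del self._store[key]
            return None
        return response

    def set(self, route: str, payload: dict, response: Any, no_store: bool = False) -> None:
        if no_store:  # bypass: never persist sensitive responses
            return
        ttl = self.route_ttls.get(route, self.default_ttl)
        key = self.make_key(route, payload)
        self._store[key] = (self.clock() + ttl, route, response)

    def invalidate(self, route: Optional[str] = None) -> int:
        """Drop all entries, or only those for one route. Returns the count removed."""
        doomed = [k for k, (_, r, _) in self._store.items() if route is None or r == route]
        for key in doomed:
            del self._store[key]
        return len(doomed)
```

A caller would check `get()` before contacting a provider and call `set()` with the provider's response afterwards. Semantic caching would replace the exact-match hash key with a similarity lookup over embeddings, which this sketch does not attempt.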

Routing

| Title | Description | Documentation |
| --- | --- | --- |
| Unified API Gateway for Multiple LLM Providers | Single endpoint to access local-hosted and multi-cloud LLMs across providers. | View Docs |
| Supported Endpoints Catalog | Core: /chat/completions • /completions • /v1/messages; Audio: /audio/transcriptions • /audio/speech; Images: /images/generations • /images/edits; Embeddings & Search: /embeddings • /rerank • /vector_stores • /search; Assistants & Batch: /assistants • /threads • /batches • /responses; Other: /fine_tuning/jobs • /moderations • /ocr • /mcp/tools • /realtime • /files | View Entire List |
| Advanced Routing Strategies | Route based on budget, use-case, availability, rate limits, and lowest cost. | View Docs |
| Reliable Completions | Provider retries and fallbacks for resilient completions with exponential backoff and jitter. | View Docs |
| Cost-Based Routing | Automatically select lowest-cost viable provider/model. | View Docs |
| Rate-Limit-Aware Routing | Choose providers based on available request/token headroom, with fallback to alternate models when nearing RPM/TPM caps. | View Docs |
| Availability-Based Routing | Reroute during provider outages to sustain uptime. | View Docs |
| Budget-Aware Routing | Select route based on remaining budget headroom, with fallback based on per-team/key remaining budget. | View Docs |
| Latency-Aware Routing | Prefer providers with lower observed latency. | View Docs |
| Error-Rate-Aware Routing | Avoid providers showing elevated error rates. | View Docs |
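
As a rough illustration of how several of the routing requirements above can combine, the sketch below picks the lowest-cost deployment that is healthy and still has RPM headroom, retries it with exponential backoff and jitter, and then falls back to the next candidate. It is a standard-library sketch under stated assumptions, not LiteLLM's router: `Deployment`, `ProviderError`, `pick_deployment`, `backoff_delay`, and `complete_with_fallback` are hypothetical names, and the "full jitter" backoff variant is one common choice among several.

```python
import random
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


@dataclass
class Deployment:
    name: str
    cost_per_1k_tokens: float
    rpm_limit: int
    rpm_used: int = 0
    healthy: bool = True

    def has_headroom(self) -> bool:
        # Viable only if the provider is up and below its requests-per-minute cap.
        return self.healthy and self.rpm_used < self.rpm_limit


def pick_deployment(deployments: List[Deployment]) -> Optional[Deployment]:
    """Cost-based routing restricted to deployments that are currently viable."""
    viable = [d for d in deployments if d.has_headroom()]
    return min(viable, key=lambda d: d.cost_per_1k_tokens, default=None)


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 8.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def complete_with_fallback(
    deployments: List[Deployment],
    call: Callable[[Deployment], Any],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    tried = set()
    while True:
        remaining = [d for d in deployments if d.name not in tried]
        target = pick_deployment(remaining)
        if target is None:
            raise RuntimeError("no viable deployment left")
        for attempt in range(max_retries + 1):
            if not target.has_headroom():
                break  # hit the RPM cap mid-retry; move on to a fallback
            target.rpm_used += 1
            try:
                return call(target)
            except ProviderError:
                if attempt < max_retries:
                    sleep(backoff_delay(attempt))
        tried.add(target.name)  # retries exhausted; fall back to the next cheapest
```

Latency-aware or error-rate-aware routing would follow the same shape, with the `min` key swapped for an observed latency or error-rate metric instead of cost.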