
Evaluation Overview

This page consolidates evaluation requirements and maps them to corresponding LiteLLM capabilities, documentation links, and examples.

Core Platform

Caching

Title	Description	Documentation
Prompt Caching	Cache repeated prompts to reduce cost and latency across providers.	View Docs
Response Caching	Cache model responses with configurable TTL and cache-bypass options.	View Docs
Per-Route Cache TTLs	Define different TTLs per route or model for prompt/response cache entries.	View Docs
Cache Bypass Controls	Allow clients or rules to skip cache reads/writes for sensitive calls.	View Docs
Semantic/Content-Aware Caching	Reduce recomputation by serving cached responses for semantically similar requests.	View Docs
Cache Invalidation Controls	Clear stale cache entries during rollouts or policy changes.	View Docs
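
The TTL, bypass, and invalidation behaviors listed above can be sketched with a small in-memory cache. This is a generic illustration under stated assumptions, not LiteLLM's implementation: `TTLCache`, the `no_cache` and `no_store` flags, and the injectable `clock` are hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy response cache with per-entry TTL (hypothetical, not a LiteLLM API)."""

    def __init__(self, default_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.default_ttl = default_ttl
        self.clock = clock  # injectable so expiry can be checked without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, no_cache: bool = False) -> Optional[Any]:
        """Return a fresh cached value, or None on miss, expiry, or bypass."""
        if no_cache:  # bypass: the caller asked to skip cache reads
            return None
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # entry outlived its TTL
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: Optional[float] = None,
            no_store: bool = False) -> None:
        """Store a value with a per-call TTL, unless writes are bypassed."""
        if no_store:  # bypass: do not persist sensitive responses
            return
        lifetime = self.default_ttl if ttl is None else ttl
        self._store[key] = (self.clock() + lifetime, value)

    def invalidate(self, prefix: str = "") -> int:
        """Drop every entry whose key starts with prefix; return the count removed."""
        stale = [k for k in self._store if k.startswith(prefix)]
        for k in stale:
            del self._store[k]
        return len(stale)
```

Passing a different `ttl` per call is one way to model per-route or per-model TTLs; prefix-based invalidation assumes cache keys begin with a route or model name, which is a convention of this sketch only.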

Routing

Title	Description	Documentation
Unified API Gateway for Multiple LLM Providers	Single endpoint to access local-hosted and multi-cloud LLMs across providers.	View Docs
Supported Endpoints Catalog	Core: `/chat/completions` • `/completions` • `/v1/messages`; Audio: `/audio/transcriptions` • `/audio/speech`; Images: `/images/generations` • `/images/edits`; Embeddings & Search: `/embeddings` • `/rerank` • `/vector_stores` • `/search`; Assistants & Batch: `/assistants` • `/threads` • `/batches` • `/responses`; Other: `/fine_tuning/jobs` • `/moderations` • `/ocr` • `/mcp/tools` • `/realtime` • `/files`	View Entire List
Advanced Routing Strategies	Route based on budget, use-case, availability, rate limits, and lowest cost.	View Docs
Reliable Completions	Provider retries and fallbacks for resilient completions with exponential backoff and jitter.	View Docs
Cost-Based Routing	Automatically select lowest-cost viable provider/model.	View Docs
Rate-Limit-Aware Routing	Choose providers based on available request/token headroom, with fallback to alternate models when nearing RPM/TPM caps.	View Docs
Availability-Based Routing	Reroute during provider outages to sustain uptime.	View Docs
Budget-Aware Routing	Select a route based on the remaining budget headroom per team or key, falling back to an alternate route when that headroom runs low.	View Docs
Latency-Aware Routing	Prefer providers with lower observed latency.	View Docs
Error-Rate-Aware Routing	Avoid providers showing elevated error rates.	View Docs
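
The retry-and-fallback pattern named under Reliable Completions (exponential backoff with jitter, then moving on to the next provider) can be sketched as below. This is a generic illustration, not LiteLLM's router: `backoff_delay`, `call_with_fallbacks`, and their parameters are hypothetical names, and "full jitter" is one common jitter variant assumed here.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_fallbacks(
    providers: Sequence[Callable[[], T]],
    retries_per_provider: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, retrying with backoff before falling back."""
    last_error: Optional[Exception] = None
    for call in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return call()
            except Exception as exc:  # a real gateway would only retry transient errors
                last_error = exc
                if attempt < retries_per_provider:
                    sleep(backoff_delay(attempt))
    raise RuntimeError("all providers failed") from last_error
```

Each element of `providers` is a zero-argument callable wrapping one deployment. Ordering that list by cost, observed latency, or error rate before the call is one way to combine this sketch with the other routing strategies in the table.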

Cost & Efficiency Optimization

Budgets

Title	Description	Documentation
Usage & Cost Tracking	Track spend and tokens per model, key, user, team, and environment.	View Docs
Budget Enforcement Policies	Set and enforce budgets for teams, users, and API keys.	View Docs
Budget Refresh Schedules	Support monthly/daily automatic budget refresh windows with configurable duration (seconds, minutes, hours, days).	View Docs
Per-Key Budgets	Budget caps for individual API keys.	View Docs
Per-User Budgets	Limit spend at the user account level.	View Docs
Per-Model Budgets	Assign budgets by model family or provider.	View Docs
Team Budgets	Assign budgets and quotas scoped to a team.	View Docs
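
Budget enforcement with an automatic refresh window can be sketched as follows. This is a minimal illustration, not LiteLLM's implementation: the `Budget` class, `parse_duration`, and the duration format (an integer followed by `s`, `m`, `h`, or `d`) are assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Seconds per supported unit (seconds, minutes, hours, days).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a string such as '30s', '15m', '12h', or '30d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


@dataclass
class Budget:
    """Spend cap that resets once its window has elapsed (hypothetical helper)."""

    max_budget: float          # cap per window, in currency units
    duration: str              # refresh window, e.g. "30d"
    window_start: float = 0.0  # timestamp (seconds) when the current window began
    spend: float = 0.0         # spend recorded in the current window

    def charge(self, cost: float, now: float) -> bool:
        """Record cost if it fits in the current window; return False if over budget."""
        if now - self.window_start >= parse_duration(self.duration):
            # Window elapsed: start a fresh one at the time of this request.
            self.window_start = now
            self.spend = 0.0
        if self.spend + cost > self.max_budget:
            return False
        self.spend += cost
        return True
```

One `Budget` instance per key, user, team, or model would model the scopes listed in the table. The sketch restarts the window at the first request after expiry; aligning windows to calendar boundaries is a separate design choice.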

Enterprise

Alerting

Title	Description	Documentation
LLM Performance Alerts	Detect model/provider outages, slow API calls (exceeding `alerting_threshold`), hanging requests, failed API calls, and sudden error spikes.	View Docs
Budget & Spend Alerts	Daily/weekly spend summaries per team or tag, soft budget threshold notifications at X% consumption, and budget limit alerts.	View Docs
Daily Health Reports	Automated daily status summaries including top 5 slowest deployments, top 5 deployments with most failed requests, and system health metrics.	View Docs

Deployment

Title	Description	Documentation
Deployment Options	Deploy via Docker, Kubernetes, Helm, Terraform, AWS CloudFormation, Google Cloud Run, Render, Railway, or Docker Compose with support for database, Redis, and production-ready configurations.	View Docs
Control Plane & Data Plane	Separate planes for global management and regional execution with multi-region/multi-cloud failover for high availability.	View Docs
Timeout Configuration	Global and per-model/provider timeouts to avoid hung requests.	View Docs
Concurrent Usage Testing	Simulate load to validate throughput targets.	View Docs

Monitoring, Logging & Observability

Integrations

Title	Description	Documentation
Datadog Integration	Publish metrics and traces to Datadog for dashboards and alerts, including pre-built panels for latency, errors, and usage.	View Docs
Prometheus Metrics	Expose proxy metrics for scrape and alert rules.	View Docs
SIEM & Tooling Integrations	Forward logs and events to external observability stacks.	View Docs

Logging

Title	Description	Documentation
Request/Response Logging	Enable or disable logging of payloads, identifiers, and outcomes for auditing, with structured logging fields (user_id, call_id, model, tokens, latency).	View Docs
Logging Payload Specification	Reference documentation for all available fields and data captured in LiteLLM logging payloads.	View Docs
Custom Callbacks	Integrate custom logging hooks to process and forward structured logs to external systems (SIEM, observability platforms, databases) with real-time token usage, cost tracking, and event handling.	View Docs
PII-Safe Logging Practices	Use Presidio guardrails to mask or block PII, PHI, and sensitive data before logging.	View Docs

Metrics & Dashboards

Title	Description	Documentation
Latency, Error Rate, Token Usage	Track p50/p95 latency, error counts, and token consumption with dashboards for latency percentiles over time.	View Docs
Request Throughput Metrics	Dashboard visualizations showing the number of requests processed over time, broken down by API route, provider, or model.	View Docs
Error Rate Panels	Track HTTP status codes and failures.	View Docs
Budget & Spend Metrics	Monitor spend and budget usage per team/key, and visualize budget burn-down over time.	View Docs
Daily Summary Reports	Automated daily summaries of usage and health.	View Docs
Cache Metrics	Export cache hit/miss metrics for dashboards.	View Docs
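
The latency percentiles and error rate that these dashboards report can be computed from raw samples as sketched below. This is a generic illustration using the nearest-rank percentile definition, which is an assumption; monitoring backends often use interpolated or histogram-based estimates instead. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], error_count: int) -> Dict[str, float]:
    """Summarize one reporting window: p50/p95 latency and error rate."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "error_rate": error_count / len(latencies_ms),
    }
```

Here `latencies_ms` is assumed to hold one latency per request in the window, so `error_count / len(latencies_ms)` is the fraction of requests that failed.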

Performance

Reliability

Title	Description	Documentation
Production Best Practices	Production deployment recommendations including configuration, machine specifications, Redis optimization, worker management, and database connection pooling.	View Docs
Gateway Overhead P50/P90/P99	P50/P90/P99 latency overhead added by the LiteLLM proxy relative to direct provider API calls.	View Docs
Provider Latency Comparison	Compare observed latencies across providers.	View Docs
Load Test Toolkit	Use mock requests and scenarios to validate SLOs.	View Docs

Security & Compliance

Identity

Title	Description	Documentation
RBAC & Team Segmentation	Enforce permissions by roles; segment teams and models.	View Docs
User/Team Rate Limits	Set RPM/TPM per user/team/model/key.	View Docs
SSO & OAuth	Integrate identity providers via SSO/OAuth.	View Docs
MCP Permission Management	Constrain access to MCP (Model Context Protocol) servers and tools by user/team.	View Docs
Virtual Keys & Rotation	Create, rotate, and revoke virtual keys at scale with configurable rotation strategy (schedule/events).	View Docs
Team-Scoped Keys	Create keys scoped to specific teams for isolation.	View Docs
TLS Encryption Policy	Require TLS 1.2+ on all inbound connections between clients and the gateway.	View Docs
Self-Hosted Data Policy	Ensure no persistent storage of prompts/responses when self-hosted.	View Docs
IP Allow/Deny Lists	Enforce network-level access using IP-based policies and prevent lateral movement between teams and models.	View Docs
AWS Secrets Manager	Store and rotate provider secrets via AWS Secrets Manager with automation.	View Docs

Guardrails

Title	Description	Documentation
Guardrails Suite	Configure content filtering, prompt injection detection, PII masking, and security guardrails with support for multiple providers (Presidio, Lakera, Aporia, Bedrock, Pangea, and more).	View Docs
PII/PHI Masking	Mask or block personally identifiable information and protected health information using Presidio with configurable entity types and actions.	View Docs
Prompt Injection Detection	Detect and block prompt injection attacks and jailbreak attempts using similarity checks, LLM API calls, or third-party services.	View Docs
Secret Detection	Detect and mask secrets, API keys, and sensitive credentials in prompts and responses.	View Docs
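
Secret detection and masking can be illustrated with a small regex-based filter. This is a sketch under stated assumptions, not the detector LiteLLM ships: the two patterns (an AWS access key ID shape and an `sk-` prefixed API key shape) are illustrative only, and `mask_secrets` is a hypothetical helper. Production detectors typically cover many more credential formats.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real detectors cover far more credential types.
_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sk_style_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
}


def mask_secrets(text: str) -> Tuple[str, List[str]]:
    """Replace matches with a redaction marker; return the text and the kinds found."""
    found: List[str] = []
    for name, pattern in _PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            found.append(name)
    return text, found
```

A guardrail built this way could run on the prompt before it is sent to the provider and again on the response before it is logged. Returning the list of detected kinds lets the caller choose between masking and blocking.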
