
Caching
/ˈkæʃ.ɪŋ/ (noun)
The process of storing and reusing LLM responses for identical inputs, reducing cost and latency. Caching is especially impactful for repeated prompts and common workflows.
“Caching cut our average latency in half for repeat questions.”
Customer example
Graphite used Braintrust prompt caching to manage costs while benchmarking models for its AI code reviewer.