
Caching
/ˈkæʃ.ɪŋ/ (noun)
The process of storing and reusing LLM responses for identical inputs, reducing cost and latency. Caching is especially impactful for repeated prompts and common workflows.
“Caching cut our average latency in half for repeat questions.”
Customer example
Graphite used Braintrust prompt caching to manage costs while benchmarking models for its AI code reviewer.