Offline evaluation

/'aw.fleyen ih.va.lyoo'ay.shuhn/ (noun)

An eval performed during development against curated datasets, before deploying changes to production. Offline evals reduce risk by catching regressions early.

We did an offline eval to validate the new retrieval settings before shipping.
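
As a rough illustration of the definition above, the sketch below scores a baseline and a candidate against the same small curated dataset and flags a regression before anything ships. Everything in it is an assumption made for illustration: the dataset, the `baseline` and `candidate` stand-in functions, the exact-match scorer, and the `has_regression` helper are hypothetical and do not come from any particular eval library.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical curated dataset: (input, expected output) pairs.
DATASET: List[Tuple[str, str]] = [
    ("capital of France", "Paris"),
    ("capital of Japan", "Tokyo"),
    ("capital of Kenya", "Nairobi"),
]


def exact_match_score(system: Callable[[str], str],
                      dataset: List[Tuple[str, str]]) -> float:
    """Return the fraction of examples where the output equals the expected answer."""
    hits = sum(1 for query, expected in dataset if system(query) == expected)
    return hits / len(dataset)


def baseline(query: str) -> str:
    # Stand-in for the behavior currently in production (assumed for this sketch).
    answers: Dict[str, str] = {
        "capital of France": "Paris",
        "capital of Japan": "Tokyo",
        "capital of Kenya": "Nairobi",
    }
    return answers.get(query, "")


def candidate(query: str) -> str:
    # Stand-in for the change under test; it deliberately gets one answer wrong.
    answers: Dict[str, str] = {
        "capital of France": "Paris",
        "capital of Japan": "Kyoto",
        "capital of Kenya": "Nairobi",
    }
    return answers.get(query, "")


def has_regression(baseline_score: float, candidate_score: float,
                   tolerance: float = 0.0) -> bool:
    """Flag a regression when the candidate scores below the baseline by more than tolerance."""
    return candidate_score < baseline_score - tolerance


baseline_score = exact_match_score(baseline, DATASET)
candidate_score = exact_match_score(candidate, DATASET)
regressed = has_regression(baseline_score, candidate_score)

print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f} regressed={regressed}")
```

Scoring both versions on the same fixed dataset is what makes the comparison meaningful: a drop in the candidate's score points at the change rather than at different inputs. In practice the scorer and the tolerance would depend on the task.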
