

Eval leakage

/ee.val 'lee.kuhj/ (noun)

When eval data unintentionally influences the system being evaluated (e.g., test cases appear in training data or in prompts). Leakage can inflate scores without any corresponding improvement in real-world performance.

We rotated the test set to reduce eval leakage.
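One common way to check for the "test cases appear in training data or prompts" form of leakage is to look for near-verbatim overlap between eval inputs and the text the system was trained or prompted on. The sketch below is a minimal illustration under stated assumptions, not a Braintrust feature: it splits text into lowercase word tokens, builds word n-grams, and flags an eval case when at least a chosen fraction of its n-grams also occur in the reference text. The helper names `ngrams` and `flag_leaked_cases`, and the defaults `n=8` and `threshold=0.5`, are hypothetical choices made for this example.

```python
from __future__ import annotations

import re
from typing import Iterable


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercase the text, extract word tokens, and return its set of word n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def flag_leaked_cases(
    eval_cases: Iterable[str],
    reference_texts: Iterable[str],
    n: int = 8,  # hypothetical default n-gram length
    threshold: float = 0.5,  # hypothetical default overlap fraction
) -> list[int]:
    """Return indices of eval cases whose n-grams overlap the reference texts.

    Hypothetical helper: a case is flagged when at least `threshold` of its
    n-grams also appear somewhere in `reference_texts` (e.g. training data
    or prompt templates).
    """
    # Collect every n-gram that the system could have seen.
    seen: set[tuple[str, ...]] = set()
    for text in reference_texts:
        seen |= ngrams(text, n)

    flagged = []
    for i, case in enumerate(eval_cases):
        grams = ngrams(case, n)
        if not grams:
            continue  # case has fewer than n tokens; this check cannot score it
        overlap = len(grams & seen) / len(grams)
        if overlap >= threshold:
            flagged.append(i)
    return flagged


# Usage: one eval case copies a training example, the other does not.
training_docs = ["Q: What is the capital of France? A: Paris is the capital of France."]
eval_set = [
    "What is the capital of France? Paris",
    "How many legs does a spider have?",
]
print(flag_leaked_cases(eval_set, training_docs, n=4))  # → [0]
```

Exact n-gram matching only catches near-verbatim copies; a paraphrased test case can still leak without being flagged, so a clean result from a check like this does not prove the absence of leakage.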


