Encyclopedia Evalica / Datasets / Dataset

Dataset
/'day.tuh.seht/A versioned collection of test cases used to run evals and track improvements over time. Versioning matters because it keeps comparisons reproducible. (noun)
“We added 50 new examples to the dataset and re-ran the eval.”
Related Datasets terms
From the docs
Get started with Evals
Braintrust is the AI observability and eval platform for production AI. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.
Start building