Reference-based scoring

/'reh.fer.uhns bayst 'skaw.rihng/ (noun)

Scoring that compares an output to a ground-truth or expected output (the reference). It works best when "correctness" is well-defined.

We use reference-based scoring for extraction tasks with known answers.
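As an illustration of the definition, here is a minimal sketch of a reference-based scorer for an extraction task. It assumes exact match after light normalization is an acceptable notion of correctness; the function names (`normalize`, `exact_match`, `score_dataset`) and the `output`/`expected` keys are hypothetical and do not come from any particular library.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(output: str, expected: str) -> float:
    """Return 1.0 if the normalized output equals the normalized reference, else 0.0."""
    return 1.0 if normalize(output) == normalize(expected) else 0.0


def score_dataset(examples: list[dict]) -> float:
    """Average exact-match score over examples with 'output' and 'expected' keys."""
    if not examples:
        return 0.0
    total = sum(exact_match(e["output"], e["expected"]) for e in examples)
    return total / len(examples)


# Hypothetical extraction results paired with their known answers.
examples = [
    {"output": "  Acme Corp ", "expected": "acme corp"},   # matches after normalization
    {"output": "2024-01-15", "expected": "2024-01-15"},    # exact match
    {"output": "Jane Doe", "expected": "John Doe"},        # mismatch
]

print(score_dataset(examples))
```

Two of the three outputs match their references, so the average reflects that. Exact match is only one choice of comparison; when several phrasings count as correct, a looser comparison such as edit distance or token overlap may fit better.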
