
Model comparison

/'mah.duhl kuhm'pah.ruh.suhn/ (noun)

Evaluating multiple models against the same dataset and scorers to determine which performs best for a given use case. Comparisons are most useful when cost and latency are included, not just quality.

Model comparison showed the cheaper model was fine on easy queries but failed on edge cases.
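
A minimal sketch of such a comparison, under stated assumptions: the two "models" are hypothetical lookup functions standing in for real model calls, `exact_match` is an illustrative scorer, and `cost_per_call` is an assumed flat price rather than real pricing. It runs every model over the same dataset with the same scorer and reports quality alongside cost and latency.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelSpec:
    name: str
    predict: Callable[[str], str]  # hypothetical stand-in for a real model call
    cost_per_call: float           # assumed flat price per call, in dollars


@dataclass
class Result:
    name: str
    quality: float         # mean scorer output, between 0.0 and 1.0
    total_cost: float      # cost of running the whole dataset
    mean_latency_s: float  # mean wall-clock seconds per call


def exact_match(output: str, expected: str) -> float:
    """Illustrative scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def compare(models, dataset, scorer=exact_match):
    """Run every model on the same dataset with the same scorer."""
    results = []
    for model in models:
        scores, latencies = [], []
        for example in dataset:
            start = time.perf_counter()
            output = model.predict(example["input"])
            latencies.append(time.perf_counter() - start)
            scores.append(scorer(output, example["expected"]))
        results.append(Result(
            name=model.name,
            quality=sum(scores) / len(scores),
            total_cost=model.cost_per_call * len(dataset),
            mean_latency_s=sum(latencies) / len(latencies),
        ))
    # Rank by quality first, then prefer the cheaper model on ties.
    return sorted(results, key=lambda r: (-r.quality, r.total_cost))


# Toy dataset: two easy queries and one edge case.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "3 * 3", "expected": "9"},
    {"input": "0 / 0", "expected": "undefined"},
]


def cheap_model(query: str) -> str:
    # Handles the easy queries only; falls back to "0" on anything else.
    return {"2 + 2": "4", "3 * 3": "9"}.get(query, "0")


def strong_model(query: str) -> str:
    # Handles the edge case as well.
    return {"2 + 2": "4", "3 * 3": "9", "0 / 0": "undefined"}.get(query, "")


models = [
    ModelSpec("cheap", cheap_model, cost_per_call=0.001),
    ModelSpec("strong", strong_model, cost_per_call=0.01),
]

results = compare(models, dataset)
for r in results:
    print(f"{r.name}: quality={r.quality:.2f} "
          f"cost=${r.total_cost:.3f} latency={r.mean_latency_s:.6f}s")
```

Ranking by quality and breaking ties on cost is only one possible policy; a team with a strict latency budget might filter on latency first and rank the remaining models afterwards.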

