
RAG evaluation

/rag ih.va.lyoo'ay.shuhn/ (noun)

The practice of measuring end-to-end RAG quality and its components (retrieval and generation). Common dimensions include retrieval relevance and recall, groundedness and citation accuracy, answer correctness, and whether the system abstains when the context is insufficient.

Our RAG eval showed retrieval was fine, but the model still wasn't grounding answers in the sources.
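The dimensions above can be made concrete with simple scorers. The sketch below is illustrative only (the function names and the word-overlap heuristic are assumptions, not a specific library's API): it computes retrieval recall against a set of known-relevant documents and a naive groundedness proxy; production evals typically use an LLM judge for groundedness.

```python
def retrieval_recall(retrieved_ids, relevant_ids):
    """Fraction of known-relevant documents that the retriever returned."""
    if not relevant_ids:
        return 1.0  # nothing to find; treat as perfect recall
    hits = len(set(retrieved_ids) & set(relevant_ids))
    return hits / len(relevant_ids)

def groundedness(answer_sentences, context):
    """Fraction of answer sentences sharing at least one word with the
    retrieved context -- a crude lexical proxy for grounding."""
    if not answer_sentences:
        return 0.0
    context_words = set(context.lower().split())
    supported = sum(
        1 for s in answer_sentences
        if set(s.lower().split()) & context_words
    )
    return supported / len(answer_sentences)

# The retriever found doc1 but missed doc2: recall is 0.5.
print(retrieval_recall(["doc1", "doc3"], ["doc1", "doc2"]))  # 0.5
```

A result like the example sentence above (good retrieval, poor grounding) would show up here as high `retrieval_recall` but low `groundedness`, which is why the two are scored separately.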

