
Dashboard

/ˈdæʃ.bɔːrd/ (noun)

A configurable view that tracks key metrics over time using charts and aggregations, covering latency, cost, error rates, and quality scores. Dashboards make it easy to spot trends and regressions without querying raw traces.

Why it matters

Infrastructure dashboards track request rates, error codes, and latency. AI dashboards need to track all of that plus quality metrics specific to AI systems: average scorer results, hallucination rates, token usage, cost per interaction, and topic distributions. These quality signals tell you whether your system is actually working well, not just whether it is running.

A dashboard that shows your groundedness score trending down over the past week is a fundamentally different signal than one that shows your p95 latency increasing. Both matter, but the quality signal is often more actionable because it tells you something changed about how your system behaves, not just how it performs.

Effective AI dashboards let you filter by model, prompt version, user segment, or time range so you can isolate regressions quickly. They also serve as the starting point for deeper investigation: when you see a metric drop, you need to be able to click through to the underlying traces and understand what happened.
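As a minimal sketch of the aggregation behind a dashboard panel, the snippet below rolls raw trace records up into the kinds of metrics described above (average quality score, p95 latency, cost per interaction) with an optional model filter. The trace schema and field names here are hypothetical, purely for illustration; they are not a real Braintrust API.

```python
import math
from statistics import mean

# Hypothetical trace records; field names are illustrative, not a real schema.
traces = [
    {"model": "gpt-4o",      "latency_ms": 820,  "cost_usd": 0.012, "groundedness": 0.91},
    {"model": "gpt-4o",      "latency_ms": 1450, "cost_usd": 0.015, "groundedness": 0.88},
    {"model": "gpt-4o-mini", "latency_ms": 310,  "cost_usd": 0.002, "groundedness": 0.79},
    {"model": "gpt-4o-mini", "latency_ms": 290,  "cost_usd": 0.002, "groundedness": 0.83},
]

def dashboard_metrics(traces, model=None):
    """Aggregate raw traces into the metrics a dashboard panel would chart."""
    rows = [t for t in traces if model is None or t["model"] == model]
    latencies = sorted(t["latency_ms"] for t in rows)
    # Nearest-rank p95: index ceil(0.95 * n) - 1 into the sorted latencies.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "count": len(rows),
        "avg_groundedness": mean(t["groundedness"] for t in rows),
        "p95_latency_ms": latencies[p95_index],
        "avg_cost_usd": mean(t["cost_usd"] for t in rows),
    }
```

Filtering before aggregating (here, by `model`) is what lets you isolate a regression: compare `dashboard_metrics(traces, model="gpt-4o")` against the mini variant and the divergence shows up immediately.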

The dashboard made it obvious that accuracy dropped right after the rollout.

Customer example

Retool's on-call rotation runs off Braintrust dashboards that track tool-call success rates, context window overflow, model and provider errors, and latency and token usage.
