Monitor how dataset rows perform across experiments.
View experiment runs
See all experiments that used a dataset:
- Go to Datasets.
- Open your dataset.
- In the right panel, select Runs.
- Review performance metrics across experiments.
Filter experiment runs
To narrow down the list of experiment runs, you can filter by time range, by tag, or with SQL.
Filter by time range: Click and drag across any region of the chart to select a time range. The table below updates to show only experiments in that range. To clear the filter, click clear. This helps you focus on specific periods, like recent experiments or historical baselines.
Filter by tag: Click any tag chip on an experiment row to instantly filter the list to runs with that tag. You can also add a Tags column via Display > Columns to see tags for each run at a glance. To filter by tag in a query, use BTQL's INCLUDES operator.
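As a rough illustration of the tag filter, the Python sketch below builds a filter expression string around the INCLUDES operator. The exact clause shape (`tags INCLUDES '<tag>'`), the helper name `tag_filter`, and the tag value `baseline` are assumptions made for illustration, so check the BTQL reference for the confirmed syntax.

```python
def tag_filter(tag: str) -> str:
    """Build a hypothetical filter clause matching runs that carry `tag`.

    The clause shape (tags INCLUDES '<tag>') is an assumption, not
    confirmed BTQL syntax.
    """
    # Reject quotes rather than guess at the query language's escaping rules.
    if "'" in tag or '"' in tag:
        raise ValueError("tag must not contain quote characters")
    return f"tags INCLUDES '{tag}'"


# Hypothetical tag name, used only for illustration.
clause = tag_filter("baseline")
print(clause)
```

The helper only assembles a string. How you submit the clause (the SQL filter in the UI or an API call) is outside this sketch.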
Filter states are persisted in the URL, allowing you to bookmark or share specific filtered views of experiment runs.
Analyze per-row performance
See how individual rows perform:
- Select a row in the dataset table.
- In the right panel, select Runs.
- Review the row’s metrics across experiments.
This view only shows experiments that set the origin field in eval traces.
When reviewing a row's metrics, look for these patterns:
- Consistently low scores suggest ambiguous expectations.
- Failures across experiments indicate edge cases.
- High variance suggests instability.
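The patterns above can be sketched as a small offline check. This minimal Python example assumes you have already collected each row's scores across experiments. The row IDs, score values, and both thresholds are hypothetical, and the labels are a simplification of the guidance above, not a built-in Braintrust feature.

```python
from statistics import mean, pstdev

# Hypothetical per-row scores, one value per experiment run.
row_scores = {
    "row-1": [0.90, 0.92, 0.88],
    "row-2": [0.20, 0.25, 0.15],
    "row-3": [0.10, 0.95, 0.50],
}

LOW_MEAN = 0.5     # assumed cutoff for "consistently low"
HIGH_SPREAD = 0.2  # assumed cutoff for "high variance"


def classify(scores: list[float]) -> str:
    """Label a row by the mean and spread of its scores across experiments."""
    avg = mean(scores)
    spread = pstdev(scores)  # population standard deviation
    if spread > HIGH_SPREAD:
        return "unstable"          # scores swing widely between experiments
    if avg < LOW_MEAN:
        return "consistently low"  # stable but poor: review the expectation
    return "ok"


labels = {row: classify(scores) for row, scores in row_scores.items()}
for row, label in labels.items():
    print(row, label)
```

Spread is checked before the mean so that a row with widely swinging scores is flagged as unstable even when its average happens to fall below the low-score cutoff.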
Next steps
- Run more evaluations to expand the dataset’s coverage.
- Edit records that surface as problematic.