To view the results of an evaluation, go to Experiments in your project and select an experiment from the list.

View summaries

The summary pane displays:
  • Comparisons to other experiments
  • Scorers used in the evaluation
  • Datasets tested
  • Metadata like model and parameters
Copy the experiment ID from the bottom of the summary pane for referencing in code or sharing with teammates.
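For example, you can use the copied ID to pull an experiment's details programmatically. This is a rough sketch, assuming the REST endpoint `GET /v1/experiment/{experiment_id}` and a `BRAINTRUST_API_KEY` environment variable; check the API reference for the exact route and response shape.

```python
import os
import requests

experiment_id = "00000000-0000-0000-0000-000000000000"  # paste the copied ID here

# Assumes the v1 REST endpoint for reading an experiment by ID (verify in the API docs).
resp = requests.get(
    f"https://api.braintrust.dev/v1/experiment/{experiment_id}",
    headers={"Authorization": f"Bearer {os.environ['BRAINTRUST_API_KEY']}"},
)
resp.raise_for_status()
print(resp.json().get("name"))
```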

Understand metrics

Braintrust tracks these metrics automatically:
  • Duration: Time to complete the task span
  • Offset: Time elapsed since trace start
  • Prompt tokens: Tokens in the input
  • Completion tokens: Tokens in the output
  • Total tokens: Combined token count
  • LLM duration: Time spent in LLM calls
  • Estimated cost: Approximate cost based on pricing
Metrics are computed on the task subspan, excluding LLM-as-a-judge scorer calls.
To compute LLM metrics, wrap your LLM calls with Braintrust provider wrappers.
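For example, with the Python SDK you can wrap an OpenAI client so token counts, LLM duration, and estimated cost are captured on the LLM spans. This is a minimal sketch using `wrap_openai`; the model name is illustrative.

```python
from braintrust import wrap_openai
from openai import OpenAI

# Wrapping the client lets Braintrust attach prompt/completion token counts,
# LLM duration, and estimated cost to the LLM spans it logs.
client = wrap_openai(OpenAI())

# Use the wrapped client inside your task function; calls made while an
# experiment or trace is active are recorded as LLM spans.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # model name is a placeholder
    messages=[{"role": "user", "content": "Hello"}],
)
```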

Change the display

Switch the view

Each project includes locked default views that cannot be modified, including:
  • Non-errors: Shows only records without errors
  • Errors: Shows only records with errors
  • Scorer errors: Shows only records with scorer errors
  • Unreviewed: Hides items that have been human-reviewed
  • Assigned to me: Shows only records assigned to the current user for human review
Use the View menu to switch the view.
  • To set the current view as default, select Manage view > Set as your default view.
  • To discard unsaved changes and return to the default view, select Reset.

Create a custom view

Custom views save your table configurations including filters, sorts, column order, column visibility, and display settings. This lets you quickly switch between different ways of analyzing your experiments. To create a custom view:
  1. Apply the filters, sorts, columns, and display settings you want.
  2. Select Save as in the toolbar.
  3. Enter a view name.
Views are accessible and configurable by any member of the organization. When you create a custom view and set it as your default, it becomes your personal default view. The system default views remain available to all team members.

Show and hide columns

Select Display > Columns and then:
  • Show or hide columns to focus on relevant data
  • Reorder columns by dragging them
  • Pin important columns to the left
All column settings are automatically saved when you save a view.

Create custom columns

Extract specific values from traces using custom columns:
  1. Select Display > Columns > + Add custom column.
  2. Name your column.
  3. Choose from inferred fields or write a SQL expression.
Once created, filter and sort using your custom columns.

Group results

Select Display > Group by to group the table by metadata fields and spot patterns. By default, group rows show one experiment’s summary data. To view summary data for all experiments, select Include comparisons in group.

Order by regressions

Score and metric columns show summary statistics in their headers. To order columns by regressions, select Display > Columns > Order by regressions. Within grouped tables, this sorts rows by regressions of a specific score relative to a comparison experiment.

Filter results

Select Filter to open the filter menu. Use the Basic tab for point-and-click filtering, or switch to SQL to write precise queries. The SQL editor includes a Generate button that creates queries from natural language descriptions.

Adjust table layout

To change the table density to see more or less detail per row, select Display > Row height > Compact or Tall. To switch between different layouts, select Display > Layout and one of the following:
  • List: Default table view.
  • Grid: Compare outputs side-by-side.
  • Summary: Large-type summary of scores and metrics across all experiments.
Layouts respect view filters and are automatically saved when you save a view.

Examine individual traces

Select any row to open the trace view and see complete details:
  • Input, output, and expected values
  • Metadata and parameters
  • All spans in the trace hierarchy
  • Scores and their explanations
  • Timing and token usage
Ask yourself: Do good scores correspond to good outputs? If not, update your scorers or test cases.

Use aggregate scores

Aggregate scores combine multiple scores into a single metric. They are useful when you track many scores but need a single metric to represent overall experiment quality. See Create aggregate scores for more details.

Score retrospectively

Apply scorers to existing experiments:
  • Multiple cases: Select rows and use Score to apply chosen scorers
  • Single case: Open a trace and use Score in the trace view
Scores appear as additional spans within the trace.

View raw trace data

When viewing a trace, select a span and then select the button in the span’s header to view the complete JSON representation. The raw data view shows all fields including metadata, inputs, outputs, and internal properties that may not be visible in other views. The raw data view has two tabs:
  • This span: Shows the complete JSON for the selected span only
  • Full trace: Shows the complete JSON for the entire trace
Use the search bar at the top of the dialog to find specific content within the data. Raw span data is useful when you need to:
  • Inspect the complete span structure for debugging
  • Find specific fields in large or deeply nested spans
  • Verify exact values and data types
  • Export or copy the full span for reproduction

Analyze across experiments

Compare performance across multiple experiments using visualizations.

Bar chart

On the Experiments page, view scores as a bar chart by selecting Score comparison from the X axis selector. Group by metadata fields to create comparative bar charts.

Scatter plot

Select a metric on the x-axis to construct scatter plots. For example, compare the relationship between accuracy and duration.

Export experiments

To export an experiment’s results, open the menu next to the experiment name. You can export as CSV or JSON, and choose whether to download all fields.
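If you prefer to export programmatically rather than from the UI, one option is to read the experiment's records over the REST API and write them to a file. This sketch assumes a `GET /v1/experiment/{experiment_id}/fetch` route that returns the experiment's events and Bearer authentication; consult the API docs for the exact response shape and pagination parameters.

```python
import json
import os
import requests

experiment_id = "00000000-0000-0000-0000-000000000000"  # the experiment to export

# Assumes the v1 fetch endpoint, which returns the experiment's events/records.
resp = requests.get(
    f"https://api.braintrust.dev/v1/experiment/{experiment_id}/fetch",
    headers={"Authorization": f"Bearer {os.environ['BRAINTRUST_API_KEY']}"},
)
resp.raise_for_status()

# Write the records to a local JSON file.
with open("experiment.json", "w") as f:
    json.dump(resp.json().get("events", []), f, indent=2)
```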

Next steps