- Customer transparency: Show evaluation results to customers or partners
- Team alignment: Give cross-functional teams visibility into quality metrics
- Public benchmarks: Publish performance comparisons for open-source projects
- CI/CD dashboards: Display test results in a clean, accessible format
Create a status page
- Go to Experiments in your project.
- Click Publish eval status page in the experiments page header.
- In the publish dialog, configure how your evaluation results will be displayed:
  - Page title (required): The heading shown at the top of your status page.
  - Description: Optional markdown-formatted text to provide context.
  - Logo URL: Optional custom logo image (displays at the top of the page).
  - Grouping field (required): Metadata field to group experiments by (typically metadata.model).
  - Filter: Optional filtering to include only specific experiments.
  - Score columns: Select which scores to display from your evaluations.
  - Metric columns: Select which metrics to show (duration, tokens, error rate, etc.).
  - Sort by: Optional sort by a specific score or metric.
  - Theme: Choose between light or dark mode.
- Click Publish to make your status page available at a public URL. On first publish, Braintrust automatically creates a service account with read-only access to experiments in this project. This account is used to fetch data for the public page without exposing your API credentials.
Configure display options
Grouping field
The grouping field determines how experiments are organized on your status page. Experiments are grouped by the value of this metadata field, with each group displayed as a column. Common grouping fields:
- metadata.model: Compare results across different models
- metadata.version: Track performance across application versions
- metadata.prompt: Compare different prompt variations
- metadata.dataset: Show results for different test scenarios
For example, if you group by metadata.model and have experiments with gpt-5-mini, gpt-5-nano, and claude-sonnet-4, your status page will show three columns, one for each model.
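For grouping to work, each experiment needs that metadata field populated when the eval runs. The following is a minimal sketch using the Braintrust Python SDK; the project name, dataset, and task are placeholders, and exact Eval parameters may vary by SDK version.

```python
from braintrust import Eval
from autoevals import Levenshtein

def call_model(input):
    # Stand-in for a real model call.
    return "hello " + input

Eval(
    "My Project",  # placeholder project name
    data=lambda: [{"input": "world", "expected": "hello world"}],
    task=call_model,
    scores=[Levenshtein],
    # The status page grouping field reads experiment metadata; grouping by
    # metadata.model puts this experiment in the "gpt-5-mini" column.
    metadata={"model": "gpt-5-mini"},
)
```

Running the same eval again with a different metadata.model value produces a second group, which appears as a second column on the published page.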
Filters
Click + Filter and use the Basic tab for point-and-click filtering, or switch to SQL to write precise queries. For example:
- Focusing on recent evaluations: created > '2026-01-01'
- Including specific datasets: dataset_name = 'production-sample'
- Filtering by metadata: metadata.environment = 'staging'
Score columns
Select which evaluation scores to display. Scores appear as rows with color-coded progress bars:
- Green (0.7-1.0): High scores
- Yellow (0.4-0.7): Medium scores
- Red (0.0-0.4): Low scores
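Score columns come from the scorers attached to your evals. As an illustration (assuming, as in the Python SDK, that a scorer can be a plain function whose name becomes the score's name), a hypothetical custom scorer might look like this:

```python
# Hypothetical custom scorer: a function of the eval's input, output, and
# expected values that returns a score between 0 and 1. Passed to an eval via
# scores=[exact_match], its name appears as a selectable score column,
# rendered with the color-coded bars described above.
def exact_match(input, output, expected):
    return 1.0 if output == expected else 0.0

# Quick sanity check of the scorer itself.
print(exact_match("2+2", "4", "4"))  # 1.0
print(exact_match("2+2", "5", "4"))  # 0.0
```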
Metric columns
Display built-in or custom metrics from your experiments:
- Built-in metrics:
  - Duration: Average end-to-end execution time (seconds)
  - LLM duration: Average LLM call duration (seconds)
  - Prompt tokens: Average prompt tokens per example
  - Completion tokens: Average completion tokens per example
  - Total tokens: Total tokens (prompt + completion)
  - Examples: Number of examples evaluated
  - Error rate: Percentage of failed examples
- Custom metrics defined in your experiments are also available for selection.
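Custom metrics must be recorded on the experiment before they can be selected. Here is a minimal sketch, assuming the Braintrust Python SDK's span logging accepts a metrics dict; the metric name (compression_ratio), project name, and task are placeholders.

```python
from braintrust import Eval, current_span

def nonempty(input, output, expected=None):
    # Trivial scorer so the experiment has at least one score column.
    return 1.0 if output else 0.0

def summarize(input):
    output = "summary of " + input  # stand-in for a real model call
    # Assumption: span.log(metrics=...) records custom numeric metrics;
    # metrics logged this way become selectable metric columns.
    current_span().log(metrics={"compression_ratio": len(output) / max(len(input), 1)})
    return output

Eval(
    "My Project",  # placeholder project name
    data=lambda: [{"input": "a long document that needs summarizing"}],
    task=summarize,
    scores=[nonempty],
    metadata={"model": "gpt-5-mini"},
)
```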
Share a status page
Once published, a status page is publicly accessible. Share the URL with anyone:
- Public: Accessible to anyone with the URL (no authentication required)
- Read-only: Viewers cannot modify experiments or access sensitive data
- Aggregate-only: Shows averaged scores and metrics, not individual test cases
- Secure: Uses a service account with minimal permissions (experiment read-only)
Update a status page
To modify a status page configuration:
- Go to Experiments in your project.
- Click Update eval status page in the experiments page header.
- Make changes to any configuration options.
- Preview your updates in real time.
- Click Update to publish changes.
Unpublish a status page
To remove your status page:
- Go to Experiments in your project.
- Click Update eval status page in the experiments page header.
- Click Unpublish.
- Confirm the removal.
Next steps
- Run evaluations to generate results for your status page
- Interpret results before publishing to ensure accuracy
- Compare experiments to understand performance across models
- Manage projects to configure evaluation settings