- Customer transparency: Show evaluation results to customers or partners
- Team alignment: Give cross-functional teams visibility into quality metrics
- Public benchmarks: Publish performance comparisons for open-source projects
- CI/CD dashboards: Display test results in a clean, accessible format
Create a status page
- Go to Experiments in your project.
- Click Publish eval status page in the experiments page header.
- In the publish dialog, configure how your evaluation results will be displayed:
  - Page title (required): The heading shown at the top of your status page.
  - Description: Optional markdown-formatted text to provide context.
  - Logo URL: Optional custom logo image (displays at the top of the page).
  - Grouping field (required): Metadata field to group experiments by (typically metadata.model).
  - Filter: Optional filtering to include only specific experiments.
  - Score columns: Select which scores to display from your evaluations.
  - Metric columns: Select which metrics to show (duration, tokens, error rate, etc.).
  - Sort by: Optional sort by a specific score or metric.
  - Theme: Choose between light or dark mode.
- Click Publish to make your status page available at a public URL. On first publish, Braintrust automatically creates a service account with read-only access to experiments in this project. This account is used to fetch data for the public page without exposing your API credentials.
Configure display options
Grouping field
The grouping field determines how experiments are organized on your status page. Experiments are grouped by the value of this metadata field, with each group displayed as a column. Common grouping fields:
- metadata.model: Compare results across different models
- metadata.version: Track performance across application versions
- metadata.prompt: Compare different prompt variations
- metadata.dataset: Show results for different test scenarios
For example, if you group by metadata.model and have experiments with gpt-5-mini, gpt-5-nano, and claude-sonnet-4, your status page will show three columns, one for each model.
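For grouping to work, each experiment needs that metadata field populated when the eval runs. The following is a minimal sketch using the Braintrust Python SDK; the project name, dataset, and task are placeholders, and exact Eval parameters may vary by SDK version.

```python
from braintrust import Eval
from autoevals import Levenshtein

def call_model(input):
    # Stand-in for a real model call.
    return "hello " + input

Eval(
    "My Project",  # placeholder project name
    data=lambda: [{"input": "world", "expected": "hello world"}],
    task=call_model,
    scores=[Levenshtein],
    # The status page grouping field reads experiment metadata; grouping by
    # metadata.model puts this experiment in the "gpt-5-mini" column.
    metadata={"model": "gpt-5-mini"},
)
```

Running the same eval again with a different metadata.model value produces a second group, which appears as a second column on the published page.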
Filters
Click + Filter and use the Basic tab for point-and-click filtering, or switch to SQL to write precise queries. For example:
- Focusing on recent evaluations: created > '2026-01-01'
- Including specific datasets: dataset_name = 'production-sample'
- Filtering by metadata: metadata.environment = 'staging'
Score columns
Select which evaluation scores to display. Scores appear as rows with color-coded progress bars:
- Green (0.7-1.0): High scores
- Yellow (0.4-0.7): Medium scores
- Red (0.0-0.4): Low scores
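Score columns come from the scorers attached to your evals. As an illustration (assuming, as in the Python SDK, that a scorer can be a plain function whose name becomes the score's name), a hypothetical custom scorer might look like this:

```python
# Hypothetical custom scorer: a function of the eval's input, output, and
# expected values that returns a score between 0 and 1. Passed to an eval via
# scores=[exact_match], its name appears as a selectable score column,
# rendered with the color-coded bars described above.
def exact_match(input, output, expected):
    return 1.0 if output == expected else 0.0

# Quick sanity check of the scorer itself.
print(exact_match("2+2", "4", "4"))  # 1.0
print(exact_match("2+2", "5", "4"))  # 0.0
```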
Metric columns
Display built-in or custom metrics from your experiments:
- Built-in metrics:
  - Duration: Average end-to-end execution time (seconds)
  - LLM duration: Average LLM call duration (seconds)
  - Prompt tokens: Average prompt tokens per example
  - Completion tokens: Average completion tokens per example
  - Total tokens: Total tokens (prompt + completion)
  - Examples: Number of examples evaluated
  - Error rate: Percentage of failed examples
- Custom metrics defined in your experiments are also available for selection.
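Custom metrics must be recorded on the experiment before they can be selected. Here is a minimal sketch, assuming the Braintrust Python SDK's span logging accepts a metrics dict; the metric name (compression_ratio), project name, and task are placeholders.

```python
from braintrust import Eval, current_span

def nonempty(input, output, expected=None):
    # Trivial scorer so the experiment has at least one score column.
    return 1.0 if output else 0.0

def summarize(input):
    output = "summary of " + input  # stand-in for a real model call
    # Assumption: span.log(metrics=...) records custom numeric metrics;
    # metrics logged this way become selectable metric columns.
    current_span().log(metrics={"compression_ratio": len(output) / max(len(input), 1)})
    return output

Eval(
    "My Project",  # placeholder project name
    data=lambda: [{"input": "a long document that needs summarizing"}],
    task=summarize,
    scores=[nonempty],
    metadata={"model": "gpt-5-mini"},
)
```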
Share a status page
Once published, a status page is publicly accessible. Share the URL with anyone:
- Public: Accessible to anyone with the URL (no authentication required)
- Read-only: Viewers cannot modify experiments or access sensitive data
- Aggregate-only: Shows averaged scores and metrics, not individual test cases
- Secure: Uses a service account with minimal permissions (experiment read-only)
Update a status page
To modify a status page configuration:
- Go to Experiments in your project.
- Click Update eval status page in the experiments page header.
- Make changes to any configuration options.
- Preview your updates in real time.
- Click Update to publish changes.
Unpublish a status page
To remove your status page:
- Go to Experiments in your project.
- Click Update eval status page in the experiments page header.
- Click Unpublish.
- Confirm the removal.
Next steps
- Run evaluations to generate results for your status page
- Interpret results before publishing to ensure accuracy
- Compare experiments to understand performance across models
- Manage projects to configure evaluation settings