Why use datasets
Datasets in Braintrust have key advantages:
- Integrated: Use directly in evaluations, explore in playgrounds, and populate from production.
- Versioned: Every change is tracked, so experiments can pin to specific versions.
- Scalable: Stored in a modern data warehouse without storage or performance limits.
- Secure: Self-hosted deployments keep data in your warehouse.
Create datasets
- UI
- SDK
Upload CSV/JSON
The fastest way to create a dataset is to upload a CSV or JSON file:
- Go to Datasets.
- If there are existing datasets, click + Dataset. Otherwise, click Upload CSV/JSON.
- Drag and drop your file in the Upload dataset dialog.
- Columns automatically map to the input field. Drag and drop them into different categories as needed:
- Input: Fields used as inputs for your task.
- Expected: Ground truth or ideal outputs for scoring.
- Metadata: Additional context for filtering and grouping.
- Tags: Labels for organizing and filtering. When you categorize columns as tags, they’re automatically added to your project’s tag configuration.
- Do not import: Exclude columns from the dataset.
- Click Import.
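The mapping step and the id-based deduplication (the last occurrence of each id wins) can be sketched in plain Python; the column names and categories below are hypothetical:

```python
import csv
import io

# Hypothetical CSV export. The last row for id 2 should win after
# deduplication, mirroring the import behavior.
raw = """id,question,answer,category
1,What is 2+2?,4,math
2,Capital of France?,Lyon,geography
2,Capital of France?,Paris,geography
"""

# How each column was categorized in the upload dialog (hypothetical);
# the id column itself is treated like "Do not import" here.
mapping = {"question": "input", "answer": "expected", "category": "metadata"}

records = {}
for row in csv.DictReader(io.StringIO(raw)):
    record = {"input": {}, "expected": {}, "metadata": {}}
    for column, field in mapping.items():
        record[field][column] = row[column]
    # Later occurrences of the same id overwrite earlier ones.
    records[row["id"]] = record

dataset = list(records.values())
```

Columns left unmapped are simply dropped from the resulting records, as with the Do not import option.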
If your data includes an id field, duplicate rows will be deduplicated, with only the last occurrence of each ID kept.
Generate with Loop
Ask Loop to create a dataset based on your logs or specific criteria:
- “Generate a dataset from the highest-scoring examples in this experiment”
- “Create a dataset with the most common inputs in the logs”
Add records manually
Once you’ve created a dataset, add or edit records directly in the UI:
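Records can also be added programmatically. A minimal sketch assuming the braintrust Python SDK and an API key in the environment (the project and dataset names are examples):

```python
import braintrust

# Assumes BRAINTRUST_API_KEY is set; project and dataset names are examples.
dataset = braintrust.init_dataset(project="My Project", name="Support questions")

dataset.insert(
    input={"question": "How do I reset my password?"},
    expected={"answer": "Use the Forgot password link on the sign-in page."},
    metadata={"source": "manual"},
)

dataset.flush()  # make sure the record is uploaded before the script exits
```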
From user feedback
User feedback from production provides valuable test cases that reflect real user interactions. Use feedback to create datasets from highly-rated examples or problematic cases. See Capture user feedback for implementation details on logging feedback programmatically.

To build datasets from feedback:
- Filter logs by feedback scores using the Filter menu:
  - scores.user_rating > 0.8 (SQL) or filter: scores.user_rating > 0.8 (BTQL) for highly-rated examples
  - metadata.thumbs_up = false for negative feedback
  - comment IS NOT NULL and scores.correctness < 0.5 for low-scoring feedback with comments
- Select the traces you want to include.
- Select Add to dataset.
- Choose an existing dataset or create a new one.
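Applied locally, the filter semantics look like this (a sketch over hypothetical log records shaped like the fields above):

```python
# Hypothetical log rows shaped like the filter examples above.
logs = [
    {"scores": {"user_rating": 0.9}, "comment": None},
    {"scores": {"user_rating": 0.3}, "comment": "Wrong answer"},
    {"scores": {"user_rating": 0.95}, "comment": "Great!"},
]

# Filter: scores.user_rating > 0.8 (highly-rated examples)
highly_rated = [r for r in logs if r["scores"]["user_rating"] > 0.8]

# Filter: comment IS NOT NULL and scores.user_rating < 0.5
# (low-rated rows a user bothered to comment on; adapted from above)
needs_review = [
    r
    for r in logs
    if r["comment"] is not None and r["scores"]["user_rating"] < 0.5
]
```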
Dataset structure
Each record has three top-level fields:
- input: Data to recreate the example in your application (required).
- expected: Ideal output or ground truth (optional but recommended for evaluation).
- metadata: Key-value pairs for filtering and grouping (optional).
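Put together, a record is a plain JSON object with those three fields (the values here are illustrative):

```python
record = {
    # Required: everything needed to recreate the example in your app
    "input": {"question": "What is the capital of France?"},
    # Optional but recommended: ground truth used for scoring
    "expected": {"answer": "Paris"},
    # Optional: key-value pairs for filtering and grouping
    "metadata": {"category": "geography", "difficulty": "easy"},
}
```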
View and edit datasets

- Filter and search records
- Create custom columns to extract nested values
- Edit records inline
- Copy records between datasets
- Delete individual records or entire datasets
Create custom columns
Extract values from records using custom columns. Use SQL expressions to surface important fields directly in the table.

Label datasets
Configure categorical scores to allow reviewers to rapidly label records. See Configure review scores for details.
Define schemas
Dataset schemas let you define JSON schemas for input, expected, and metadata fields. Schemas enable:
- Validation: Ensure records conform to your structure.
- Form-based editing: Edit records with intuitive forms instead of raw JSON.
Infer from data
Automatically generate schemas from existing data:
- Open the schema editor for a field.
- Click Infer schema.
- The schema is generated from the first 100 records.
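The idea can be sketched as deriving a flat JSON Schema type map from sample records. This is a simplified illustration, not Braintrust's actual inference algorithm:

```python
def infer_schema(records, limit=100):
    """Derive a flat JSON Schema from the first `limit` records."""
    type_names = {str: "string", bool: "boolean", int: "integer", float: "number"}
    properties = {}
    for record in records[:limit]:
        for key, value in record.items():
            properties[key] = {"type": type_names.get(type(value), "object")}
    return {"type": "object", "properties": properties}

schema = infer_schema(
    [
        {"question": "What is 2+2?", "difficulty": 1},
        {"question": "Capital of France?", "difficulty": 2},
    ]
)
```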
Enable enforcement
Toggle Enforce in the schema editor to validate all records. When enabled:
- New records must conform or show validation errors.
- Existing non-conforming records display warnings.
- Form editing validates input automatically.
Enforcement is UI-only and doesn’t affect SDK inserts or updates.
Read and filter datasets
- UI
- SDK
Use the filter menu to narrow dataset views, or write SQL queries for complex filtering. See Filter and search for details.
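On the SDK side, records can be read by iterating over an initialized dataset and filtered client-side. A sketch assuming the braintrust Python package (the project, dataset, and metadata field names are examples):

```python
import braintrust

dataset = braintrust.init_dataset(project="My Project", name="Support questions")

# Iterating yields plain records with input/expected/metadata fields.
for record in dataset:
    if record.get("metadata", {}).get("category") == "geography":
        print(record["input"])
```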
Track performance
Monitor how dataset rows perform across experiments:
View experiment runs
See all experiments that used a dataset:
- Go to your dataset page.
- In the right panel, select Runs.
- Review performance metrics across experiments.

Filter experiment runs
To narrow down the list of experiment runs, filter by time range or with SQL.

Filter by time range: Click and drag across any region of the chart to select a time range. The table below updates to show only experiments in that range. To clear the filter, click clear. This helps you focus on specific periods, like recent experiments or historical baselines.

Filter with SQL: Select Filter and use the Basic tab for common filters, or switch to SQL to write more precise queries based on criteria like score thresholds, time ranges, or experiment names.

Filter states are persisted in the URL, allowing you to bookmark or share specific filtered views of experiment runs.
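A few illustrative filters in the style the filter menu accepts (the score and column names here are hypothetical; see the SQL reference for exact syntax):

```sql
-- Experiments above a score threshold
scores.accuracy > 0.8

-- Experiments whose name matches a pattern
name LIKE '%baseline%'
```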
Analyze per-row performance
See how individual rows perform:
- Select a row in the dataset table.
- In the right panel, select Runs.
- Review the row’s metrics across experiments.
This view only shows experiments that set the origin field in eval traces.

Patterns to look for:
- Consistently low scores suggest ambiguous expectations.
- Failures across experiments indicate edge cases.
- High variance suggests instability.
Multimodal datasets
You can store and process images and other file types in your datasets. There are several ways to use files in Braintrust:
- Image URLs (most performant) - Keep datasets lightweight with external image references.
- Base64 (least performant) - Encode images directly in records.
- Attachments (easiest to manage) - Store files directly in Braintrust.
- External attachments - Reference files in your own object stores.
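For the base64 option, an image can be inlined as a data URL inside a record. A sketch with placeholder bytes standing in for real image data:

```python
import base64

# Placeholder bytes standing in for real image data.
image_bytes = b"\x89PNG\r\n\x1a\nnot-a-real-image"

# Encode the image as a base64 data URL inlined in the record.
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")

# Hypothetical multimodal record with the image inlined.
record = {
    "input": {"image": data_url, "question": "What does this chart show?"},
    "expected": {"answer": "Monthly revenue"},
}
```

Because base64 inflates payload size, prefer image URLs or attachments for large files, as noted above.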
Use in evaluations
Pass datasets directly to Eval():
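A sketch assuming the braintrust Python SDK and the autoevals scorer package (the project and dataset names are examples, and my_app stands in for your application function):

```python
from autoevals import Levenshtein
from braintrust import Eval, init_dataset

def my_app(input):
    # Placeholder for your application logic.
    return "..."

Eval(
    "My Project",  # project name (example)
    data=init_dataset(project="My Project", name="Support questions"),
    task=my_app,
    scores=[Levenshtein],
)
```

Each record's input is passed to the task, and expected is used by the scorers.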
Next steps
- Add human feedback to label datasets.
- Run evaluations using your datasets.
- Use the Loop to generate and optimize datasets.
- Read the SQL reference for advanced filtering.