- Store evaluation test cases for your eval script instead of managing large JSONL or CSV files
- Log all production generations to assess quality manually or using model graded evals
- Store user-reviewed generations to find new test cases
- Integrated. Datasets are integrated with the rest of the Braintrust platform, so you can use them in evaluations, explore them in the playground, and log to them from your staging/production environments.
- Versioned. Every insert, update, and delete is versioned, so you can pin evaluations to a specific version of the dataset via the SDK.
- Scalable. Datasets are stored in a modern cloud data warehouse, so you can collect as much data as you want without worrying about storage or performance limits.
- Secure. If you run Braintrust in your cloud environment, datasets are stored in your warehouse and never touch our infrastructure.
Manage datasets with the SDK
Records in a dataset are stored as JSON objects, and each record has three top-level fields:
- input is a set of inputs that you could use to recreate the example in your application. For example, if you're logging examples from a question answering model, the input might be the question.
- expected (optional) is the output of your model. For example, if you're logging examples from a question answering model, this might be the answer. You can access expected when running evaluations as the expected field; however, expected does not need to be the ground truth.
- metadata (optional) is a set of key-value pairs that you can use to filter and group your data. For example, if you're logging examples from a question answering model, the metadata might include the knowledge source that the question came from.
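As a rough illustration of the record shape described above, the following sketch builds a record as a plain Python dictionary. The make_record helper and the example values are hypothetical and are not part of the Braintrust SDK; the sketch only shows which fields are required and which are optional.

```python
from typing import Any, Dict, Optional


def make_record(
    input: Any,
    expected: Optional[Any] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a dataset record, leaving out optional fields that were not supplied."""
    record: Dict[str, Any] = {"input": input}
    if expected is not None:
        record["expected"] = expected
    if metadata is not None:
        record["metadata"] = metadata
    return record


# A question answering example with all three top-level fields.
record = make_record(
    input={"question": "What is the capital of France?"},
    expected={"answer": "Paris"},
    metadata={"source": "geography-faq"},
)
```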
Create a dataset
Datasets are created automatically when you initialize them in the SDK.
Read a dataset
To read a dataset, use the same method as above for creating a dataset, but pass the name of the dataset you want to retrieve.
Filter, sort, and limit datasets
Use the _internal_btql parameter to filter, sort, and limit dataset records. This parameter accepts BTQL query clauses to control which records are returned.
The _internal_btql parameter uses the BTQL AST (Abstract Syntax Tree) format, not the string-based BTQL syntax shown in the UI. See examples below for the correct structure.
Filter records
The filter parameter is an object with a single btql field that contains the BTQL filter expression as a string.
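A minimal sketch of the filter shape described above, assuming the Python SDK's init_dataset accepts the _internal_btql parameter as a dictionary with a top-level filter key. The helper names, the project and dataset names, and the filter expression are hypothetical; check the exact structure against the SDK version you use.

```python
from typing import Any, Dict


def btql_filter(expression: str) -> Dict[str, Any]:
    """Wrap a BTQL filter string in an object with a single btql field."""
    return {"filter": {"btql": expression}}


def open_filtered_dataset(project: str, name: str, expression: str):
    """Open a dataset so that only records matching the filter are returned."""
    # Imported here so that btql_filter can be used without the SDK installed.
    import braintrust

    return braintrust.init_dataset(
        project=project,
        name=name,
        _internal_btql=btql_filter(expression),
    )


# Hypothetical filter on a metadata field.
clause = btql_filter("metadata.source = 'geography-faq'")
```

Calling open_filtered_dataset("My App", "My Dataset", "metadata.source = 'geography-faq'") requires a configured API key and network access.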
Sort records
The sort parameter is an array of sort expressions, each with a sort direction. The options are "asc" for ascending and "desc" for descending.
Combine filters, sorts, and limits
You can use both filter and sort parameters with multiple BTQL clauses to create complex queries.
Insert records
You can use the SDK to insert into a dataset:
Update records
In the above example, each insert() statement returns an id. You can use this id to update the record using update():
The update() method applies a merge strategy: only the fields you provide will be updated, and all other existing fields in the record will remain unchanged.
Delete records
You can delete records via code by id:
Flush records
In both TypeScript and Python, the Braintrust SDK flushes records as fast as possible and installs an exit handler that tries to flush records, but these hooks are not always respected (e.g. by certain runtimes, or if you exit a process yourself). If you need to ensure that records are flushed, you can call flush() on the dataset.
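The insert, update, delete, and flush calls described in the sections above can be combined roughly as follows. This is a sketch, not the SDK's reference usage: the helper names and field values are hypothetical, and the dataset argument is assumed to be an object returned by braintrust.init_dataset().

```python
from typing import Any


def add_reviewed_example(dataset: Any, question: str, answer: str) -> str:
    """Insert a record, mark it as reviewed, and make sure it has been sent."""
    # insert() returns the id of the new record.
    record_id = dataset.insert(
        input={"question": question},
        expected={"answer": answer},
        metadata={"reviewed": False},
    )
    # update() merges: only metadata is provided, so input and expected are kept.
    dataset.update(id=record_id, metadata={"reviewed": True})
    # Explicit flush, in case the exit handler is not respected by the runtime.
    dataset.flush()
    return record_id


def remove_example(dataset: Any, record_id: str) -> None:
    """Delete a record by id and flush the change."""
    dataset.delete(record_id)
    dataset.flush()
```

With the SDK installed and an API key configured, a dataset could be obtained with braintrust.init_dataset(project="My App", name="My Dataset"), where both names are placeholders.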
Multimodal datasets
You may want to store or process images in your datasets. There are currently four ways to use images in Braintrust:
- Image URLs (most performant)
- Base64 (least performant)
- Attachments (easiest to manage, stored in Braintrust)
- External attachments (access files in your own object stores)
Manage datasets in the UI
In addition to managing datasets through the API, you can also manage them in the Braintrust UI.
View a dataset
You can view a dataset in the Braintrust UI by navigating to the project and then clicking on the dataset.
Create custom columns
When viewing a dataset, create custom columns to extract values from the root span.
Create a dataset
The easiest way to create a dataset is to upload a CSV file.
Update records
Once you’ve uploaded a dataset, you can update records or add new ones directly in the UI.
Label records
In addition to updating datasets through the API, you can edit and label them in the UI. Like experiments and logs, you can configure categorical fields to allow human reviewers to rapidly label records. This requires you to first configure human review in the Configuration tab of your project.

Delete records
To delete a record, navigate to Datasets and select the dataset. Select the check box next to the individual record you'd like to delete, and then select the Trash icon. You can follow the same steps to delete an entire dataset from the Datasets page.
Use a dataset in an evaluation
You can use a dataset in an evaluation by passing it directly to the Eval() function.
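A minimal sketch of passing a dataset to Eval() with the Python SDK. The task, the scorer, and the project and dataset names are hypothetical placeholders; running run_eval() assumes the braintrust package is installed and an API key is configured.

```python
from typing import Any


def answer_question(input: Any) -> str:
    """Placeholder task; a real task would call your model or application."""
    return "Paris" if "France" in str(input) else "unknown"


def exact_match(input: Any, output: Any, expected: Any) -> float:
    """Custom scorer: 1.0 when the output equals the expected value, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval(project: str = "My App", dataset_name: str = "My Dataset"):
    """Run an evaluation that reads its test cases from a Braintrust dataset."""
    # Imported here so the task and scorer can be tested without the SDK.
    from braintrust import Eval, init_dataset

    return Eval(
        project,
        data=init_dataset(project=project, name=dataset_name),
        task=answer_question,
        scores=[exact_match],
    )
```

Keeping the task and scorer as plain functions makes them easy to unit test separately from the evaluation run.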
Braintrust uses the record ids to link each dataset record to the corresponding result.
You can also reuse an experiment as a dataset with the asDataset()/as_dataset() function, which converts the experiment into dataset format (input, expected, and metadata).
Log from your application
To log to a dataset from your application, use the SDK and call insert(). Braintrust logs are queued and sent asynchronously, so logging does not add latency to your critical path. Because the SDK uses API keys, log from a privileged environment (e.g. a backend server) rather than directly from client applications.
This example walks through how to track user ratings from feedback:
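The original example is not available here, so the following is a sketch under stated assumptions rather than the documented implementation. It assumes a backend handler receives a rating with each piece of feedback; the log_feedback helper, the "up"/"down" values, and the user_rating metadata key are all hypothetical. The dataset argument is assumed to come from braintrust.init_dataset().

```python
from typing import Any


def log_feedback(dataset: Any, question: str, answer: str, user_rating: str) -> str:
    """Store a generation and the user's rating as a dataset record."""
    if user_rating not in ("up", "down"):
        raise ValueError("user_rating must be 'up' or 'down'")
    # expected holds the model's output; it does not need to be the ground truth.
    return dataset.insert(
        input={"question": question},
        expected={"answer": answer},
        metadata={"user_rating": user_rating},
    )
```

Because insert() queues records and sends them asynchronously, a helper like this can be called from a request handler on the backend server.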