Datasets allow you to collect data from production, staging, evaluations, and even manually, and then use that data to run evaluations and track improvements over time. For example, you can use Datasets to:

- Store evaluation test cases for your eval script instead of managing large JSONL or CSV files
- Log all production generations to assess quality manually or using model-graded evals
- Store user-reviewed (👍/👎) generations to find new test cases
In Braintrust, datasets have a few key properties:
- Integrated. Datasets are integrated with the rest of the Braintrust platform, so you can use them in evaluations, explore them in the playground, and log to them from your staging/production environments.
- Versioned. Every insert, update, and delete is versioned, so you can pin evaluations to a specific version of the dataset via the SDK.
- Scalable. Datasets are stored in a modern cloud data warehouse, so you can collect as much data as you want without worrying about storage or performance limits.
- Secure. If you run Braintrust in your cloud environment, datasets are stored in your warehouse and never touch our infrastructure.
Records in a dataset are stored as JSON objects, and each record has three top-level fields:

- input is a set of inputs that you could use to recreate the example in your application. For example, if you’re logging examples from a question answering model, the input might be the question.
- expected (optional) is the output of your model. For example, if you’re logging examples from a question answering model, this might be the answer. You can access expected when running evaluations as the expected field; however, expected does not need to be the ground truth.
- metadata (optional) is a set of key-value pairs that you can use to filter and group your data. For example, if you’re logging examples from a question answering model, the metadata might include the knowledge source that the question came from.
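For example, a question-answering record might look like this (the values are illustrative, not a required schema):

```typescript
// An illustrative dataset record for a question-answering app.
const record = {
  input: { question: "What is the capital of France?" },
  expected: { answer: "Paris" },
  metadata: { source: "knowledge-base-v2" },
};
```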
Datasets are created automatically when you initialize them in the SDK:

```typescript
import { initDataset } from "braintrust";

async function main() {
  const dataset = initDataset("My App", { dataset: "My New Dataset" });
  console.log("Dataset created:", dataset);
}

main();
```
To read a dataset, use the same method as above for creating a dataset, but pass the name of the dataset you want to retrieve.
```typescript
const dataset = initDataset("My App", { dataset: "My Existing Dataset" });

// This will load the dataset in batches so large datasets aren't loaded
// entirely into memory.
for await (const row of dataset) {
  console.log(row);
}
```
Use the _internal_btql parameter to filter, sort, and limit dataset records. This parameter accepts BTQL query clauses to control which records are returned.
The _internal_btql parameter uses the BTQL AST (Abstract Syntax Tree) format, not the string-based BTQL syntax shown in the UI. See examples below for the correct structure.
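As a rough sketch, a filtered read might look like the following. The AST node shapes shown here are illustrative assumptions, not the authoritative format; consult the BTQL reference before relying on them.

```typescript
import { initDataset } from "braintrust";

const dataset = initDataset("My App", {
  dataset: "My Existing Dataset",
  // Assumed AST shape for "metadata.foo = 1, limit 100". Verify the exact
  // node format against the BTQL reference.
  _internal_btql: {
    filter: {
      op: "eq",
      left: { op: "field", name: ["metadata", "foo"] },
      right: { op: "literal", value: 1 },
    },
    limit: 100,
  },
});
```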
To insert records, call insert():

```typescript
for (let i = 0; i < 10; i++) {
  const id = dataset.insert({
    input: i,
    expected: { result: i + 1, error: null },
    metadata: { foo: i % 2 },
  });
  console.log("Inserted record with id", id);
}
```
In the above example, each insert() statement returns an id. You can use this id to update the record using update():
```typescript
dataset.update({
  id,
  input: i,
  expected: { result: i + 1, error: "Timeout" },
});
```
The update() method applies a merge strategy: only the fields you provide will be updated, and all other existing fields in the record will remain unchanged.
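For example, an update that only supplies metadata leaves input and expected untouched:

```typescript
// Merge semantics: only `metadata` is replaced; `input` and `expected`
// keep their previously stored values.
dataset.update({ id, metadata: { foo: 42 } });
```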
In both TypeScript and Python, the Braintrust SDK flushes records as fast as possible and installs an exit handler that tries
to flush records, but these hooks are not always respected (e.g. by certain runtimes, or if you exit a process yourself). If
you need to ensure that records are flushed, you can call flush() on the dataset.
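For example, at the end of a short-lived script:

```typescript
// Block until all queued inserts/updates have been delivered.
await dataset.flush();
```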
You may want to store or process images in your datasets. There are currently four ways to use images in Braintrust:

- Image URLs (most performant)
- Base64 (least performant)
- Attachments (easiest to manage, stored in Braintrust)
- External attachments (access files in your own object stores)
If you’re building a dataset of large images in Braintrust, we recommend using image URLs. This keeps your dataset lightweight and allows you to preview or process them without storing heavy binary data directly.

If you prefer to keep all data within Braintrust, create a dataset of attachments instead. In addition to images, you can create datasets of attachments that have any arbitrary data type, including audio and PDFs. You can then use these datasets in evaluations.
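As a sketch of the image-URL approach (the field names inside input are illustrative; input can be any JSON-serializable shape):

```typescript
// Illustrative record that references an image by URL instead of
// storing the binary data in the dataset.
dataset.insert({
  input: {
    question: "What is shown in this image?",
    image_url: "https://example.com/images/photo-123.jpg",
  },
  metadata: { source: "production" },
});
```

If you keep files in Braintrust instead, the following example creates a dataset of PDF attachments: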
```typescript
import { Attachment, initDataset } from "braintrust";
import path from "node:path";

async function createPdfDataset(): Promise<void> {
  const dataset = initDataset({
    project: "Project with PDFs",
    dataset: "My PDF Dataset",
  });
  for (const filename of ["example.pdf"]) {
    dataset.insert({
      input: {
        file: new Attachment({
          filename,
          contentType: "application/pdf",
          data: path.join("files", filename),
        }),
      },
    });
  }
  await dataset.flush();
}

// Create a dataset with attachments.
createPdfDataset();
```
You can view a dataset in the Braintrust UI by navigating to the project and then clicking on the dataset. From the UI, you can filter records, create new ones, edit values, and delete records. You can also copy records
between datasets and from experiments into datasets. This feature is commonly used to collect interesting or
anomalous examples into a golden dataset.
In addition to updating datasets through the API, you can edit and label them in the UI. Like experiments and logs, you can
configure categorical fields to allow human reviewers
to rapidly label records.
This requires you to first configure human review in the Configuration tab of your project.
To delete a record, navigate to Datasets and select the dataset. Select the check box next to the individual record you’d like to delete, and then select the Trash icon.

You can follow the same steps to delete an entire dataset from the Datasets page.
When a schema is defined for a field, the “Form” display type will be available in the field’s data editor. Form-based editing makes it easier to maintain consistent data structures and reduces errors when manually editing records.
Schemas are stored in the dataset’s metadata and are versioned along with your dataset. This ensures that evaluations pinned to specific dataset versions use the correct schema definitions. The Form display type is only available on the dataset page.
You can use a dataset in an evaluation by passing it directly to the Eval() function.
```typescript
import { initDataset, Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval(
  "Say Hi Bot", // Replace with your project name
  {
    data: initDataset("My App", { dataset: "My Dataset" }),
    task: async (input) => {
      return "Hi " + input; // Replace with your LLM call
    },
    scores: [Levenshtein],
  },
);
```
You can also manually iterate through a dataset’s records and run your tasks,
then log the results to an experiment. Log the ids to link each dataset record
to the corresponding result.
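A sketch of that pattern follows. It assumes init() opens an experiment and that experiment.log() accepts a datasetRecordId field to link each row; double-check both names against the SDK reference. callModel() stands in for your task.

```typescript
import { init, initDataset } from "braintrust";

async function main() {
  const dataset = initDataset("My App", { dataset: "My Dataset" });
  const experiment = init("My App", { experiment: "My Experiment" });

  for await (const row of dataset) {
    const output = await callModel(row.input); // Replace with your LLM call

    experiment.log({
      input: row.input,
      output,
      expected: row.expected,
      // Link this result back to the dataset record it came from.
      datasetRecordId: row.id,
    });
  }

  await experiment.flush();
}

main();
```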
You can also use the results of an experiment as baseline data for future experiments by calling the asDataset()/as_dataset() function, which converts the experiment into dataset format (input, expected, and metadata).
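As a sketch, reusing a baseline experiment's results as eval data might look like this (assuming init() with open: true opens an existing experiment read-only):

```typescript
import { init, Eval } from "braintrust";
import { Levenshtein } from "autoevals";

// Open a previous experiment read-only and reuse its results as data.
const baseline = init("Say Hi Bot", { experiment: "baseline", open: true });

Eval("Say Hi Bot", {
  data: baseline.asDataset(),
  task: async (input) => {
    return "Hi " + input; // Replace with your LLM call
  },
  scores: [Levenshtein],
});
```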
To log to a dataset from your application, you can use the SDK and call insert(). Braintrust logs are queued and sent asynchronously, so you don’t need to worry about critical path performance. Since the SDK uses API keys, it’s recommended that you log from a privileged environment (e.g. a backend server) instead of directly from client applications.

This example walks through how to track 👍/👎 feedback:
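A minimal sketch, assuming a hypothetical feedback handler in your backend (the handler and field names are illustrative):

```typescript
import { initDataset } from "braintrust";

const dataset = initDataset("My App", { dataset: "User Feedback" });

// Hypothetical handler: store a generation whenever a user rates it.
async function onUserFeedback(
  question: string,
  answer: string,
  thumbsUp: boolean,
) {
  dataset.insert({
    input: question,
    expected: answer,
    metadata: { rating: thumbsUp ? "👍" : "👎" },
  });
  // Optional: force delivery in short-lived handlers (e.g. serverless).
  await dataset.flush();
}
```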
See which experiments use your dataset and how each row performs. This helps you identify problematic test cases and understand your evaluation data quality.