Datasets are versioned collections of test cases that you use to run evaluations and track improvements over time. Build datasets from production logs, user feedback, or manual curation, or generate them with Loop. After reviewing traces with scores and labels, compile them into structured datasets for evaluation.

Why use datasets

Datasets in Braintrust have key advantages:
  • Integrated: Use directly in evaluations, explore in playgrounds, and populate from production.
  • Versioned: Every change is tracked, so experiments can pin to specific versions.
  • Scalable: Stored in a modern data warehouse without storage or performance limits.
  • Secure: Self-hosted deployments keep data in your warehouse.

Create datasets

Upload CSV/JSON

The fastest way to create a dataset is to upload a CSV or JSON file:
  1. Go to Datasets.
  2. If there are existing datasets, click + Dataset. Otherwise, click Upload CSV/JSON.
  3. Drag and drop your file in the Upload dataset dialog.
  4. Columns automatically map to the input field. Drag and drop them into different categories as needed:
    • Input: Fields used as inputs for your task.
    • Expected: Ground truth or ideal outputs for scoring.
    • Metadata: Additional context for filtering and grouping.
    • Tags: Labels for organizing and filtering. When you categorize columns as tags, they’re automatically added to your project’s tag configuration.
    • Do not import: Exclude columns from the dataset.
    The preview table updates in real time as you move columns between categories, showing exactly how your dataset will be structured.
  5. Click Import.
If your data includes an id field, duplicate rows will be deduplicated, with only the last occurrence of each ID kept.
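For example, a minimal CSV might look like this (the columns are hypothetical; during import you could map question to Input, answer to Expected, and category to Metadata):
question,answer,category
What is 2+2?,4,math
What is the capital of France?,Paris,geography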

Generate with Loop

Ask Loop to create a dataset based on your logs or specific criteria. Example queries:
  • “Generate a dataset from the highest-scoring examples in this experiment”
  • “Create a dataset with the most common inputs in the logs”

Add records manually

Once you’ve created a dataset, add or edit records directly in the UI.

From user feedback

User feedback from production provides valuable test cases that reflect real user interactions. Use feedback to create datasets from highly-rated examples or problematic cases. See Capture user feedback for implementation details on logging feedback programmatically. To build datasets from feedback:
  1. Filter logs by feedback scores using the Filter menu:
    • scores.user_rating > 0.8 (SQL) or filter: scores.user_rating > 0.8 (BTQL) for highly-rated examples
    • metadata.thumbs_up = false for negative feedback
    • comment IS NOT NULL and scores.correctness < 0.5 for low-scoring feedback with comments
  2. Select the traces you want to include.
  3. Select Add to dataset.
  4. Choose an existing dataset or create a new one.
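For reference, the feedback scores used in the filters above are typically recorded with the SDK's logFeedback method. A minimal sketch, assuming you captured the span ID when the original request was logged (see Capture user feedback for the full pattern):
import { initLogger } from "braintrust";

const logger = initLogger({ projectName: "My App" });

// Hypothetical helper: `spanId` identifies the logged request
// that the feedback should be attached to.
function recordThumbsUp(spanId: string): void {
  logger.logFeedback({
    id: spanId,
    scores: { user_rating: 1 },
    comment: "Great answer",
  });
}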
You can also ask Loop to create datasets based on feedback patterns, such as “Create a dataset from logs with positive feedback” or “Build a dataset from cases where users clicked thumbs down.”

Dataset structure

Each record has three top-level fields:
  • input: Data to recreate the example in your application (required).
  • expected: Ideal output or ground truth (optional but recommended for evaluation).
  • metadata: Key-value pairs for filtering and grouping (optional).
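For example, inserting a record with all three fields using the TypeScript SDK (a minimal sketch; the field contents are hypothetical, and the insert API follows the attachment example later on this page):
import { initDataset } from "braintrust";

async function addRecord(): Promise<void> {
  const dataset = initDataset({ project: "My App", dataset: "My Dataset" });
  dataset.insert({
    input: { question: "What is the capital of France?" }, // required
    expected: "Paris", // optional, but recommended for scoring
    metadata: { category: "geography" }, // optional, for filtering and grouping
  });
  await dataset.flush();
}

addRecord();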

View and edit datasets

From the dataset page, you can:
  • Filter and search records
  • Create custom columns to extract nested values
  • Edit records inline
  • Copy records between datasets
  • Delete individual records or entire datasets

Create custom columns

Extract values from records using custom columns. Use SQL expressions to surface important fields directly in the table. For example, an expression like metadata.user_id (assuming your records include that field) pulls a nested value into its own column.

Label datasets

Configure categorical scores to allow reviewers to rapidly label records. See Configure review scores for details.

Define schemas

Dataset schemas let you define JSON schemas for input, expected, and metadata fields. Schemas enable:
  • Validation: Ensure records conform to your structure.
  • Form-based editing: Edit records with intuitive forms instead of raw JSON.
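For example, a minimal JSON schema for an input field that expects a single question string might look like this (a sketch; the property name is hypothetical):
{
  "type": "object",
  "properties": {
    "question": { "type": "string" }
  },
  "required": ["question"]
}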

Infer from data

Automatically generate schemas from existing data:
  1. Open the schema editor for a field.
  2. Click Infer schema.
  3. A schema is generated from the first 100 records.

Enable enforcement

Toggle Enforce in the schema editor to validate all records. When enabled:
  • New records must conform or show validation errors.
  • Existing non-conforming records display warnings.
  • Form editing validates input automatically.
Enforcement is UI-only and doesn’t affect SDK inserts or updates.

Read and filter datasets

Use the filter menu to narrow dataset views, or write SQL queries for complex filtering. See Filter and search for details.

Track performance

Monitor how dataset rows perform across experiments:

View experiment runs

See all experiments that used a dataset:
  1. Go to your dataset page.
  2. In the right panel, select Runs.
  3. Review performance metrics across experiments.
Runs display as charts that show score trends over time. The time axis flows from oldest (left) to newest (right), making it easy to track performance evolution.

Filter experiment runs

To narrow down the list of experiment runs, filter by time range or with SQL.
Filter by time range: Click and drag across any region of the chart to select a time range. The table below updates to show only experiments in that range. To clear the filter, click clear. This helps you focus on specific periods, like recent experiments or historical baselines.
Filter with SQL: Select Filter and use the Basic tab for common filters, or switch to SQL to write more precise queries based on criteria like score thresholds, time ranges, or experiment names. Common filtering examples:
-- Filter by time range
WHERE created > '2024-01-01'

-- Filter by score threshold
WHERE scores.Accuracy > 0.8

-- Filter by experiment name pattern
WHERE name LIKE '%baseline%'

-- Combine multiple conditions
WHERE created > now() - interval 7 day
  AND scores.Factuality > 0.7
Filter states are persisted in the URL, allowing you to bookmark or share specific filtered views of experiment runs.

Analyze per-row performance

See how individual rows perform:
  1. Select a row in the dataset table.
  2. In the right panel, select Runs.
  3. Review the row’s metrics across experiments.
This view only shows experiments that set the origin field in eval traces.
Look for patterns:
  • Consistently low scores suggest ambiguous expectations.
  • Failures across experiments indicate edge cases.
  • High variance suggests instability.

Multimodal datasets

You can store and process images and other file types in your datasets. There are several ways to use files in Braintrust:
  • Image URLs (most performant) - Keep datasets lightweight with external image references.
  • Base64 (least performant) - Encode images directly in records.
  • Attachments (easiest to manage) - Store files directly in Braintrust.
  • External attachments - Reference files in your own object stores.
For large images, use image URLs to keep datasets lightweight. To keep all data within Braintrust, use attachments. Attachments support any file type including images, audio, and PDFs.
import { Attachment, initDataset } from "braintrust";
import path from "node:path";

async function createPdfDataset(): Promise<void> {
  const dataset = initDataset({
    project: "Project with PDFs",
    dataset: "My PDF Dataset",
  });
  for (const filename of ["example.pdf"]) {
    dataset.insert({
      input: {
        // `data` accepts a local file path; the file is uploaded
        // when the dataset is flushed.
        file: new Attachment({
          filename,
          contentType: "application/pdf",
          data: path.join("files", filename),
        }),
      },
    });
  }
  // Send all pending inserts (and upload their attachments).
  await dataset.flush();
}

createPdfDataset();
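By contrast, referencing an image by URL keeps the record itself lightweight. A minimal sketch (the input shape is hypothetical; your task decides how the URL is fetched or passed to a model):
import { initDataset } from "braintrust";

async function createImageUrlDataset(): Promise<void> {
  const dataset = initDataset({
    project: "Project with images",
    dataset: "My image dataset",
  });
  dataset.insert({
    input: {
      image_url: "https://example.com/cat.png", // external reference; nothing is uploaded
      question: "What animal is in this image?",
    },
    expected: "A cat",
  });
  await dataset.flush();
}

createImageUrlDataset();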

Use in evaluations

Pass datasets directly to Eval():
import { initDataset, Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("Say Hi Bot", {
  // Each record's `input` is passed to `task`; `expected` is used by scorers.
  data: initDataset("My App", { dataset: "My Dataset" }),
  task: async (input) => {
    return "Hi " + input;
  },
  // Levenshtein compares the task output to the record's `expected` value.
  scores: [Levenshtein],
});
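Because datasets are versioned, an experiment can also pin to a specific dataset version so reruns stay reproducible. A sketch, assuming initDataset accepts a version option (check the SDK reference for the exact parameter name):
import { initDataset, Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("Say Hi Bot", {
  // Assumption: `version` pins the dataset to a specific version ID.
  data: initDataset("My App", { dataset: "My Dataset", version: "<version-id>" }),
  task: async (input) => "Hi " + input,
  scores: [Levenshtein],
});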

Next steps