Create datasets - Braintrust

Get test cases into a dataset from whichever source fits your workflow. Use file uploads for existing data, the SDK to populate programmatically, production logs and user feedback to capture real interactions, traces to promote specific examples, or Loop to generate from patterns.

Upload CSV/JSON

The fastest way to create a dataset is uploading a CSV or JSON file:

Go to Datasets.
If there are existing datasets, click + Dataset. Otherwise, click Upload CSV/JSON.
Drag and drop your file in the Upload dataset dialog.
Columns automatically map to the input field. Drag and drop them into different categories as needed:
- Input: Fields used as inputs for your task.
- Expected: Ground truth or ideal outputs for scoring.
- Metadata: Additional context for filtering and grouping.
- Tags: Labels for organizing and filtering individual records. When you categorize columns as tags, they’re automatically added to your project’s tag configuration. These are per-record tags, distinct from dataset-level tags that organize datasets in the list.
- Do not import: Exclude columns from the dataset.
The preview table updates in real time as you move columns between categories, showing exactly how your dataset will be structured.
Click Import.
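
To make the category mapping concrete, here is a minimal sketch of how one flat CSV row could be reshaped into a dataset record. The `ColumnMapping` type and `rowToRecord` helper are hypothetical and not part of the Braintrust SDK; the upload dialog performs the equivalent step for you. The sketch assumes that each column keeps its name as a key inside its category and that the cell value of a tag column becomes one tag.

```typescript
// Hypothetical illustration; not part of the Braintrust SDK.
type Category = "input" | "expected" | "metadata" | "tags" | "skip";
type ColumnMapping = { [column: string]: Category | undefined };

interface DatasetRecord {
  input: Record<string, string>;
  expected: Record<string, string>;
  metadata: Record<string, string>;
  tags: string[];
}

function rowToRecord(
  row: Record<string, string>,
  mapping: ColumnMapping,
): DatasetRecord {
  const record: DatasetRecord = { input: {}, expected: {}, metadata: {}, tags: [] };
  for (const column of Object.keys(row)) {
    const value = row[column];
    // Columns without an explicit category default to input.
    const category: Category = mapping[column] ?? "input";
    if (category === "skip") continue; // "Do not import"
    if (category === "tags") {
      record.tags.push(value); // assumption: one tag per cell value
    } else {
      record[category][column] = value;
    }
  }
  return record;
}
```

Calling `rowToRecord(row, { answer: "expected", notes: "skip" })` would place the `answer` column under `expected`, drop `notes`, and leave every other column under `input`.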

If your data includes an id field, rows that share an ID are deduplicated, and only the last occurrence of each ID is kept.
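
The "last occurrence wins" rule can be sketched as follows if you want to preview which rows will survive an import. `dedupeKeepLast` is a hypothetical helper, not a Braintrust API, and the order of the surviving rows is an assumption of this sketch.

```typescript
// Hypothetical helper illustrating "last occurrence wins" deduplication.
function dedupeKeepLast<T extends { id?: string }>(rows: T[]): T[] {
  // Record the index of the final row seen for each id.
  const lastIndexById = new Map<string, number>();
  rows.forEach((row, index) => {
    if (row.id !== undefined) lastIndexById.set(row.id, index);
  });
  // Keep rows without an id, and only the final row for each id.
  return rows.filter(
    (row, index) => row.id === undefined || lastIndexById.get(row.id) === index,
  );
}

const deduped = dedupeKeepLast([
  { id: "a", answer: "first" },
  { id: "b", answer: "only" },
  { id: "a", answer: "second" },
]);
// deduped holds the "b" row followed by the second "a" row.
```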

Create via SDK

Create datasets programmatically and populate them with records. The approach varies by language:

TypeScript/Python: Use the high-level initDataset() / init_dataset() function, which creates the dataset if it doesn't exist and returns an object with a simple insert() method.
Go/Ruby: Use lower-level API methods that require initializing an API client and explicitly managing dataset creation and record insertion.

import { initDataset } from "braintrust";

async function main() {
  // Initialize dataset (creates it if it doesn't exist)
  const dataset = initDataset("My App", { dataset: "Customer Support" });

  // Insert records with input, expected output, and metadata
  dataset.insert({
    input: { question: "How do I reset my password?" },
    expected: { answer: "Click 'Forgot Password' on the login page." },
    metadata: { category: "authentication", difficulty: "easy" },
  });

  dataset.insert({
    input: { question: "What's your refund policy?" },
    expected: { answer: "Full refunds within 30 days of purchase." },
    metadata: { category: "billing", difficulty: "easy" },
  });

  dataset.insert({
    input: { question: "How do I integrate your API with NextJS?" },
    expected: { answer: "Install the SDK and use our React hooks." },
    metadata: { category: "technical", difficulty: "medium" },
  });

  // Flush to ensure all records are saved
  await dataset.flush();
  console.log("Dataset created with 3 records");
}

// Surface any error instead of leaving the promise rejection unhandled
main().catch(console.error);
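
When the records already live in an array, the insert-then-flush pattern above can be wrapped in a small helper. This is a sketch under stated assumptions: `DatasetLike` is a hypothetical structural type that covers only the insert() and flush() methods used in the example, and `insertAll` is not part of the SDK.

```typescript
// Hypothetical structural type: only the two methods used in the example above.
interface DatasetLike<R> {
  insert(record: R): unknown;
  flush(): Promise<unknown>;
}

// Hypothetical helper: insert every record, then flush once at the end.
async function insertAll<R>(
  dataset: DatasetLike<R>,
  records: R[],
): Promise<number> {
  for (const record of records) {
    dataset.insert(record);
  }
  // A single flush after the loop sends any buffered records.
  await dataset.flush();
  return records.length;
}
```

Flushing once after the loop, rather than after every insert, keeps the number of round trips low; whether that matters depends on how many records you load.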

Create via CLI

Use the bt datasets create command to create datasets directly from the terminal. It accepts records from a JSONL file, from stdin, or as inline JSON rows.

# Create an empty dataset
bt datasets create my-dataset

# Seed from a JSONL file
bt datasets create my-dataset --file records.jsonl

# Seed from stdin
cat records.jsonl | bt datasets create my-dataset

# Seed with inline JSON rows
bt datasets create my-dataset --rows '[{"id":"case-1","input":{"text":"hi"},"expected":"hello"}]'

Rows that omit an id field get auto-generated stable IDs. Accepted top-level record fields are id, input, expected, metadata, tags, and origin.
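
If you generate the JSONL input yourself, a small pre-check against the accepted field list can catch typos before you run the command. The helpers below are hypothetical, and rejecting unknown fields is a choice made by this sketch; it does not describe how the CLI itself treats them.

```typescript
// Top-level fields accepted for a record, per the list above.
const ACCEPTED_FIELDS: readonly string[] = [
  "id",
  "input",
  "expected",
  "metadata",
  "tags",
  "origin",
];

// Hypothetical helper: names of top-level keys outside the accepted set.
function unknownFields(record: Record<string, unknown>): string[] {
  return Object.keys(record).filter((key) => ACCEPTED_FIELDS.indexOf(key) === -1);
}

// Hypothetical helper: serialize records as JSONL, one JSON object per line.
function toJsonl(records: Record<string, unknown>[]): string {
  for (const record of records) {
    const extra = unknownFields(record);
    if (extra.length > 0) {
      throw new Error(`Unsupported top-level fields: ${extra.join(", ")}`);
    }
  }
  return records.map((record) => JSON.stringify(record)).join("\n") + "\n";
}
```

The string returned by `toJsonl` can be written to a file such as records.jsonl and passed with --file, or piped to the command on stdin.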

Promote traces from logs

You can add a trace to a dataset by mapping fields from a production log span into dataset row format. The span’s input maps to the dataset row’s input, and the span’s output typically becomes the row’s expected value. This is useful when you see a notably good or bad response in production and want to capture it as a test case. You can add traces to datasets with the Braintrust UI or programmatically with the Braintrust API.

To promote logs in bulk, define the mapping once and re-run it as a dataset pipeline.

Add traces to a dataset using the Braintrust UI:

Go to Logs.
Select the traces you want to add.
Select + Dataset and then the dataset you want to add to.

To add traces programmatically, use the BTQL endpoint to fetch an existing span from your production logs, then insert it into a dataset with the dataset insert API. The origin field links the dataset row back to the source span, which adds a Log button in the Origin column.

import { initDataset } from "braintrust";

const projectId = "<your-project-id>";
const spanId = "<span-id-from-logs>";

// Fetch the span from project logs
const btqlResponse = await fetch("https://api.braintrust.dev/btql", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    query: `SELECT id, input, output FROM project_logs('${projectId}') WHERE span_id = '${spanId}' LIMIT 1`,
  }),
});
const { data } = await btqlResponse.json();
const span = data[0];

// Insert into the dataset, mapping span fields to dataset row format
const dataset = initDataset("My App", { dataset: "Customer Support" });
const datasetId = await dataset.id;

await fetch(`https://api.braintrust.dev/v1/dataset/${datasetId}/insert`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    events: [
      {
        input: span.input,
        // span.output is the raw output from your app — extract the relevant
        // value for your use case (e.g. span.output[0].message.content for
        // OpenAI chat completions)
        expected: span.output,
        origin: {
          object_type: "project_logs",
          object_id: projectId,
          // span.id is the row UUID from the SELECT above — what the Log button expects.
          id: span.id,
        },
      },
    ],
  }),
});
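
For bulk promotion, the span-to-row mapping used in the request body above can be factored into a pure function and applied to many spans. This is a minimal sketch: `LogSpan`, `DatasetRow`, and `spanToRow` are hypothetical names, and the optional `pickExpected` callback stands in for whatever extraction your output format needs.

```typescript
// Hypothetical types mirroring the fields selected by the query above.
interface LogSpan {
  id: string;
  input: unknown;
  output: unknown;
}

interface DatasetRow {
  input: unknown;
  expected: unknown;
  origin: { object_type: "project_logs"; object_id: string; id: string };
}

// Hypothetical helper: map one log span to one dataset row.
// pickExpected defaults to using the raw output unchanged.
function spanToRow(
  span: LogSpan,
  projectId: string,
  pickExpected: (output: unknown) => unknown = (output) => output,
): DatasetRow {
  return {
    input: span.input,
    expected: pickExpected(span.output),
    origin: { object_type: "project_logs", object_id: projectId, id: span.id },
  };
}

// Map a batch of spans into the events array for a single insert request.
function spansToEvents(spans: LogSpan[], projectId: string): DatasetRow[] {
  return spans.map((span) => spanToRow(span, projectId));
}
```

Keeping the mapping free of network calls makes it easy to unit test and to reuse each time the pipeline runs.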

Curate from topics

Topic classifications turn logs into structured signals you can filter by, such as task type, sentiment, or error category. Filter logs by classification, then promote the matching traces to a dataset for targeted evaluation. See Build datasets from topics for the full workflow.

Curate from user feedback

User feedback from production provides valuable test cases that reflect real user interactions. Use feedback to create datasets from highly-rated examples or problematic cases. See Capture user feedback for implementation details on logging feedback programmatically. To build datasets from feedback:

Filter logs by feedback scores using the Filter menu:
- scores.user_rating > 0.8 (SQL) or filter: scores.user_rating > 0.8 (BTQL) for highly-rated examples
- metadata.thumbs_up = false for negative feedback
- comment IS NOT NULL AND scores.correctness < 0.5 for low-scoring feedback with comments
Select the traces you want to include.
Select Add to dataset.
Choose an existing dataset or create a new one.
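
If you have already fetched log rows in code, the same three criteria can be written as predicates. This is only an illustration: the `FeedbackRow` shape is hypothetical and simply mirrors the field names used in the filters above, which depend on how your application logs feedback.

```typescript
// Hypothetical row shape; field names mirror the example filters above.
interface FeedbackRow {
  scores?: { user_rating?: number; correctness?: number };
  metadata?: { thumbs_up?: boolean };
  comment?: string | null;
}

// scores.user_rating > 0.8
function isHighlyRated(row: FeedbackRow): boolean {
  const rating = row.scores?.user_rating;
  return typeof rating === "number" && rating > 0.8;
}

// metadata.thumbs_up = false
function isThumbsDown(row: FeedbackRow): boolean {
  return row.metadata?.thumbs_up === false;
}

// comment IS NOT NULL AND scores.correctness < 0.5
function isLowScoringWithComment(row: FeedbackRow): boolean {
  const correctness = row.scores?.correctness;
  return row.comment != null && typeof correctness === "number" && correctness < 0.5;
}
```

Rows with a missing score match none of the score-based predicates, which is the same outcome a comparison against NULL gives in SQL.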

You can also ask Loop to create datasets based on feedback patterns, such as “Create a dataset from logs with positive feedback” or “Build a dataset from cases where users clicked thumbs down.”

Generate with Loop

Ask Loop to create a dataset based on your logs or specific criteria. Example queries:

“Generate a dataset from the highest-scoring examples in this experiment”
“Create a dataset with the most common inputs in the logs”

Log from production

Insert records into a dataset directly from your application, for example to capture user feedback as it arrives:

import { initDataset, Dataset } from "braintrust";

class MyApplication {
  private dataset: Dataset | undefined = undefined;

  async initApp() {
    this.dataset = await initDataset("My App", { dataset: "logs" });
  }

  async logUserExample(
    input: any,
    expected: any,
    userId: string,
    thumbsUp: boolean,
  ) {
    if (this.dataset) {
      this.dataset.insert({
        input,
        expected,
        metadata: { userId, thumbsUp },
      });
    }
  }
}

Multimodal datasets

You can store and process images and other file types in your datasets. There are several ways to use files in Braintrust:

- Image URLs: Keep datasets lightweight by referencing external images. Best for large images and fastest to sync.
- Base64: Encode images directly in records. Self-contained but inflates dataset size.
- Attachments (easiest to manage): Store files directly in Braintrust.
- External attachments: Reference files in your own object stores.

For large images, use image URLs to keep datasets lightweight. To keep all data within Braintrust, use attachments, which support any file type, including images, audio, and PDFs.

import { Attachment, initDataset } from "braintrust";
import path from "node:path";

async function createPdfDataset(): Promise<void> {
  const dataset = initDataset({
    project: "Project with PDFs",
    dataset: "My PDF Dataset",
  });
  for (const filename of ["example.pdf"]) {
    dataset.insert({
      input: {
        file: new Attachment({
          filename,
          contentType: "application/pdf",
          data: path.join("files", filename),
        }),
      },
    });
  }
  await dataset.flush();
}

// Surface any error instead of leaving the promise rejection unhandled
createPdfDataset().catch(console.error);
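
To put a number on how much the base64 option inflates a record, the arithmetic can be worked out directly: padded base64 emits 4 characters for every 3 input bytes, rounded up to a whole group. The helper name below is hypothetical.

```typescript
// Length in characters of the padded base64 encoding of byteLength bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const imageBytes = 3000000; // a 3 MB image
const encodedChars = base64Length(imageBytes); // 4000000
const overhead = encodedChars / imageBytes - 1; // about 0.33, roughly one third larger
```

A record that embeds the image therefore carries about a third more data than the file itself, before any JSON escaping or data-URL prefix is added.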

Next steps

Manage datasets — tag, snapshot, validate, and edit records.
Use in evaluations with Eval().
Track performance across experiments.