Get test cases into a dataset from whichever source fits your workflow. Use file uploads for existing data, the SDK to populate programmatically, production logs and user feedback to capture real interactions, traces to promote specific examples, or Loop to generate from patterns.

Upload CSV/JSON

The fastest way to create a dataset is to upload a CSV or JSON file:
  1. Go to Datasets.
  2. If there are existing datasets, click + Dataset. Otherwise, click Upload CSV/JSON.
  3. Drag and drop your file in the Upload dataset dialog.
  4. By default, every column maps to the Input category. Drag and drop columns into other categories as needed:
    • Input: Fields used as inputs for your task.
    • Expected: Ground truth or ideal outputs for scoring.
    • Metadata: Additional context for filtering and grouping.
    • Tags: Labels for organizing and filtering individual records. When you categorize columns as tags, they’re automatically added to your project’s tag configuration. These are per-record tags, distinct from dataset-level tags that organize datasets in the list.
    • Do not import: Exclude columns from the dataset.
    The preview table updates in real time as you move columns between categories, showing exactly how your dataset will be structured.
  5. Click Import.
If your data includes an id field, rows that share an ID are deduplicated: only the last occurrence of each ID is kept.
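The last-occurrence rule can be sketched locally, for example to preview which rows would survive before you upload a file. This is only an illustration of the stated behavior, not Braintrust's implementation: the `Row` type and the `dedupeById` helper are hypothetical names, and leaving rows without an id untouched is an assumption of this sketch.

```typescript
// Hypothetical row shape for illustration; only `id` matters here.
type Row = { id?: string; [column: string]: unknown };

// Keep only the last occurrence of each id.
// Assumption: rows without an id are never merged with each other.
function dedupeById(rows: Row[]): Row[] {
  // Record the index of the final row seen for every id.
  const lastIndex = new Map<string, number>();
  rows.forEach((row, i) => {
    if (row.id !== undefined) lastIndex.set(row.id, i);
  });
  // A row survives if it has no id or if it is the final row for its id.
  return rows.filter(
    (row, i) => row.id === undefined || lastIndex.get(row.id) === i,
  );
}

const exampleRows: Row[] = [
  { id: "a", question: "first version" },
  { id: "b", question: "only version" },
  { id: "a", question: "second version" },
];

// The first "a" row is dropped; "b" and the second "a" row remain.
console.log(dedupeById(exampleRows));
```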

Create via SDK

Create datasets programmatically and populate them with records. The approach varies by language:
  • TypeScript/Python: Use the high-level initDataset() / init_dataset() method, which automatically creates the dataset if needed and provides simple insert() operations.
  • Go/Ruby: Use lower-level API methods that require initializing an API client and explicitly managing dataset creation and record insertion.
import { initDataset } from "braintrust";

async function main() {
  // Initialize dataset (creates it if it doesn't exist)
  const dataset = initDataset("My App", { dataset: "Customer Support" });

  // Insert records with input, expected output, and metadata
  dataset.insert({
    input: { question: "How do I reset my password?" },
    expected: { answer: "Click 'Forgot Password' on the login page." },
    metadata: { category: "authentication", difficulty: "easy" },
  });

  dataset.insert({
    input: { question: "What's your refund policy?" },
    expected: { answer: "Full refunds within 30 days of purchase." },
    metadata: { category: "billing", difficulty: "easy" },
  });

  dataset.insert({
    input: { question: "How do I integrate your API with NextJS?" },
    expected: { answer: "Install the SDK and use our React hooks." },
    metadata: { category: "technical", difficulty: "medium" },
  });

  // Flush to ensure all records are saved
  await dataset.flush();
  console.log("Dataset created with 3 records");
}

main();
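When the records already exist as an array (for example, parsed from a file), the insert-then-flush pattern above could be wrapped in a small helper. The following is a minimal sketch under stated assumptions: `DatasetLike` is a hypothetical interface that mirrors only the `insert()` and `flush()` calls used above, and `insertAll` and `InMemoryDataset` are illustrative names that are not part of the SDK.

```typescript
// Record shape matching the fields used in the example above.
type DatasetRecord = {
  input: unknown;
  expected?: unknown;
  metadata?: Record<string, unknown>;
};

// Hypothetical interface: only the two calls this helper relies on.
interface DatasetLike {
  insert(record: DatasetRecord): unknown;
  flush(): Promise<unknown>;
}

// Insert every record, then flush once so all of them are saved.
async function insertAll(
  dataset: DatasetLike,
  records: DatasetRecord[],
): Promise<number> {
  for (const record of records) {
    dataset.insert(record);
  }
  await dataset.flush();
  return records.length;
}

// In-memory stand-in so the sketch runs without network access.
class InMemoryDataset implements DatasetLike {
  records: DatasetRecord[] = [];
  flushed = false;

  insert(record: DatasetRecord): string {
    this.records.push(record);
    return String(this.records.length);
  }

  async flush(): Promise<void> {
    this.flushed = true;
  }
}

insertAll(new InMemoryDataset(), [
  { input: { question: "How do I reset my password?" } },
  { input: { question: "What's your refund policy?" } },
]).then((count) => console.log(`Inserted ${count} records`));
```

Flushing once after the loop, rather than after every insert, matches the single flush() call at the end of the example above.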

Promote traces from logs

You can add a trace to a dataset by mapping fields from a production log span into dataset row format. The span’s input maps to the dataset row’s input, and the span’s output typically becomes the row’s expected value. This is useful when you see a notably good or bad response in production and want to capture it as a test case. You can add traces to datasets with the Braintrust UI or programmatically with the Braintrust API.
Add traces to a dataset using the Braintrust UI:
  1. Go to Logs.
  2. Select the traces you want to add.
  3. Select + Dataset and then the dataset you want to add to.
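The field mapping described above (span input to row input, span output to row expected) can be sketched as a plain function. The `Span` and `DatasetRow` types are simplified, hypothetical shapes rather than the API's actual schema, and the `sourceSpanId` metadata key is an invented name used here only to keep a pointer back to the original span.

```typescript
// Simplified, hypothetical shapes for illustration.
type Span = {
  id: string;
  input: unknown;
  output: unknown;
  metadata?: Record<string, unknown>;
};

type DatasetRow = {
  input: unknown;
  expected: unknown;
  metadata: Record<string, unknown>;
};

// Map a logged span into dataset row format.
// `overrides` lets the caller replace any field, such as a corrected expected value.
function spanToDatasetRow(
  span: Span,
  overrides: Partial<DatasetRow> = {},
): DatasetRow {
  const row: DatasetRow = {
    input: span.input,
    expected: span.output, // the span's output typically becomes expected
    metadata: { ...span.metadata, sourceSpanId: span.id },
  };
  return { ...row, ...overrides };
}

// A good response is captured as-is.
console.log(
  spanToDatasetRow({ id: "span-1", input: "2 + 2?", output: "4" }),
);

// A bad response is captured with a corrected expected value.
console.log(
  spanToDatasetRow(
    { id: "span-2", input: "Capital of France?", output: "Lyon" },
    { expected: "Paris" },
  ),
);
```

The override matters for the "notably bad response" case: copying a wrong output into expected would record the mistake as ground truth, so you would likely want to correct it before scoring against the row.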

Curate from topics

Topic classifications turn logs into structured signals you can filter by, such as task type, sentiment, or error category. Filter logs by classification, then promote the matching traces to a dataset for targeted evaluation. See Build datasets from topics for the full workflow.

Curate from user feedback

User feedback from production provides valuable test cases that reflect real user interactions. Use feedback to create datasets from highly-rated examples or problematic cases. See Capture user feedback for implementation details on logging feedback programmatically. To build datasets from feedback:
  1. Filter logs by feedback scores using the Filter menu:
    • scores.user_rating > 0.8 (SQL) or filter: scores.user_rating > 0.8 (BTQL) for highly-rated examples
    • metadata.thumbs_up = false for negative feedback
    • comment IS NOT NULL and scores.correctness < 0.5 for low-scoring feedback with comments
  2. Select the traces you want to include.
  3. Select Add to dataset.
  4. Choose an existing dataset or create a new one.
You can also ask Loop to create datasets based on feedback patterns, such as “Create a dataset from logs with positive feedback” or “Build a dataset from cases where users clicked thumbs down.”
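If you work with exported log rows outside the UI, the three filters listed above could be expressed as plain predicates. This is a sketch for illustration only: the `LogRow` type is a hypothetical, simplified shape that borrows the field names from the filters, and treating a missing score as non-matching is an assumption of this sketch.

```typescript
// Hypothetical, simplified log row using the field names from the filters.
type LogRow = {
  scores?: { user_rating?: number; correctness?: number };
  metadata?: { thumbs_up?: boolean };
  comment?: string | null;
};

// scores.user_rating > 0.8
const isHighlyRated = (row: LogRow): boolean =>
  (row.scores?.user_rating ?? -Infinity) > 0.8;

// metadata.thumbs_up = false (a missing value does not count as negative)
const isThumbsDown = (row: LogRow): boolean =>
  row.metadata?.thumbs_up === false;

// comment IS NOT NULL and scores.correctness < 0.5
const isLowScoringWithComment = (row: LogRow): boolean =>
  row.comment != null && (row.scores?.correctness ?? Infinity) < 0.5;

const exportedLogs: LogRow[] = [
  { scores: { user_rating: 0.95 }, metadata: { thumbs_up: true } },
  { scores: { correctness: 0.2 }, comment: "Wrong answer" },
  { metadata: { thumbs_up: false } },
];

console.log(exportedLogs.filter(isHighlyRated).length); // 1
console.log(exportedLogs.filter(isThumbsDown).length); // 1
console.log(exportedLogs.filter(isLowScoringWithComment).length); // 1
```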

Generate with Loop

Ask Loop to create a dataset based on your logs or specific criteria. Example queries:
  • “Generate a dataset from the highest-scoring examples in this experiment”
  • “Create a dataset with the most common inputs in the logs”

Log from production

Insert examples, along with the user's feedback, into a dataset directly from your application:
import { initDataset, Dataset } from "braintrust";

class MyApplication {
  private dataset: Dataset | undefined = undefined;

  async initApp() {
    this.dataset = await initDataset("My App", { dataset: "logs" });
  }

  async logUserExample(
    input: any,
    expected: any,
    userId: string,
    thumbsUp: boolean,
  ) {
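    // Do nothing until initApp() has created the dataset handle
    if (this.dataset) {
      // Store the feedback signal in metadata so it can be filtered on later.
      // As in the earlier example, call this.dataset.flush() before shutdown
      // to ensure all records are saved.
      this.dataset.insert({
        input,
        expected,
        metadata: { userId, thumbsUp },
      });
    }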
  }
}

Multimodal datasets

You can store and process images and other file types in your datasets. There are several ways to use files in Braintrust:
  • Image URLs - Keep datasets lightweight by referencing external images. Best for large images and fastest to sync.
  • Base64 - Encode images directly in records. Self-contained but inflates dataset size.
  • Attachments (easiest to manage) - Store files directly in Braintrust.
  • External attachments - Reference files in your own object stores.
For large images, use image URLs to keep datasets lightweight. To keep all data within Braintrust, use attachments. Attachments support any file type including images, audio, and PDFs.
import { Attachment, initDataset } from "braintrust";
import path from "node:path";

async function createPdfDataset(): Promise<void> {
  const dataset = initDataset({
    project: "Project with PDFs",
    dataset: "My PDF Dataset",
  });
  for (const filename of ["example.pdf"]) {
    dataset.insert({
      input: {
        file: new Attachment({
          filename,
          contentType: "application/pdf",
          data: path.join("files", filename),
        }),
      },
    });
  }
  await dataset.flush();
}

createPdfDataset();
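To put a number on how much the Base64 option inflates a record: standard Base64 encodes every 3 bytes as 4 characters, with the final group padded, so the encoded size is roughly 4/3 of the original. The `base64Length` helper below is a hypothetical name for this arithmetic and does not call any Braintrust API.

```typescript
// Length in characters of the padded Base64 encoding of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// A 3,000,000-byte image becomes 4,000,000 characters of Base64.
const imageBytes = 3000000;
const encodedChars = base64Length(imageBytes);
console.log(encodedChars); // 4000000
console.log((encodedChars / imageBytes).toFixed(2)); // "1.33"
```

That roughly one-third overhead applies to every record carrying an inline image, which is why image URLs or attachments tend to suit larger files better.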

Next steps