This quickstart shows you how to set up and run evals in a Braintrust experiment to measure your AI application’s effectiveness and iterate continuously using production data. You can create evals with the Braintrust SDK or directly in the Braintrust UI.
Set up your environment and create an eval with the Braintrust SDK. Wrappers are available for TypeScript, Python, and other languages.

1. Install Braintrust libraries

Install the Braintrust SDK and autoevals library for your language:
# npm
npm install braintrust autoevals
# pnpm
pnpm add braintrust autoevals
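If you're working in Python instead, the same two libraries can be installed with pip (the package names are assumed to match the npm packages):
# pip (Python)
pip install braintrust autoevals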

2. Configure an API key

You need a Braintrust API key to authenticate your evaluation. Create an API key in the Braintrust UI and then add the key to your environment:
export BRAINTRUST_API_KEY="YOUR_API_KEY"

3. Run an evaluation

A Braintrust evaluation is a simple function composed of a dataset of user inputs, a task, and a set of scorers.
In addition to adding each data point inline when you call the Eval() function, you can also pass an existing or new dataset directly.
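For example, here is a minimal sketch of passing a dataset instead of inline rows. It assumes a dataset named "my-dataset" already exists in your project and uses initDataset to load it (check the SDK reference for the exact signature):
import { Eval, initDataset } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("Say Hi Bot", {
  // Load rows from an existing dataset instead of defining them inline.
  data: initDataset("Say Hi Bot", { dataset: "my-dataset" }),
  task: async (input) => {
    return "Hi " + input; // Replace with your LLM call
  },
  scores: [Levenshtein],
});
The rest of this quickstart uses inline data so the script stays self-contained.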
Create an evaluation script:
tutorial.eval.ts
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval(
  "Say Hi Bot", // Replace with your project name
  {
    data: () => {
      return [
        {
          input: "Foo",
          expected: "Hi Foo",
        },
        {
          input: "Bar",
          expected: "Hello Bar",
        },
      ]; // Replace with your eval dataset
    },
    task: async (input) => {
      return "Hi " + input; // Replace with your LLM call
    },
    scores: [Levenshtein],
  },
);
Run your evaluation:
npx braintrust eval tutorial.eval.ts
This will create an experiment in Braintrust. Once the command runs, you’ll see a link to your experiment.
To test your evaluation locally without sending results to Braintrust, add the --no-send-logs flag.
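For example, appending the flag to the command from the previous step:
npx braintrust eval tutorial.eval.ts --no-send-logs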

4. View your results

Congrats, you just ran an eval! You should see a dashboard like this when you load your experiment. This view is called the experiment view, and as you use Braintrust, we hope it becomes your trusty companion each time you change your code and want to run an eval. The experiment view lets you look at high-level performance metrics, dig into individual examples, and compare your LLM app’s performance over time.

[Screenshot: first eval in the experiment view]

5. Run another experiment

After running your first evaluation, you’ll see that we achieved a 77.8% score. Can you adjust the evaluation to improve this score? Make your changes and re-run the evaluation to track your progress.

[Screenshot: second eval in the experiment view]
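One common change is to swap the hard-coded task for a real LLM call. Here is a minimal sketch using the OpenAI client wrapped with Braintrust's wrapOpenAI so each call is traced in the experiment; it assumes the openai package is installed and OPENAI_API_KEY is set, and the model and prompt shown are placeholders rather than part of this quickstart:
import { Eval, wrapOpenAI } from "braintrust";
import { Levenshtein } from "autoevals";
import OpenAI from "openai";

// Wrapping the client records each LLM call as part of the experiment trace.
const client = wrapOpenAI(new OpenAI());

Eval("Say Hi Bot", {
  data: () => [
    { input: "Foo", expected: "Hi Foo" },
    { input: "Bar", expected: "Hello Bar" },
  ],
  task: async (input) => {
    const completion = await client.chat.completions.create({
      model: "gpt-4o-mini", // placeholder; use any model you have access to
      messages: [
        {
          role: "user",
          content: `Greet the user named ${input} in two words.`,
        },
      ],
    });
    return completion.choices[0].message.content ?? "";
  },
  scores: [Levenshtein],
});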

Next steps