- Data - A dataset of test cases with inputs and expected outputs
- Task - An AI function you want to test
- Scores - Scoring functions that measure output quality
Set up your environment and run evals with the Braintrust SDK.
1. Sign up
If you’re new to Braintrust, sign up free at braintrust.dev.

2. Get API keys

Create API keys for:

- Braintrust
- Your AI provider or framework (OpenAI, Anthropic, Gemini, etc.)
3. Install SDKs
Install the Braintrust SDK and required libraries.

4. Run an eval
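For example, assuming the TypeScript SDK with npm and OpenAI as the provider, the installs might look like this:

```shell
# Braintrust SDK and the autoevals scorer library
npm install braintrust autoevals

# Your AI provider's SDK (OpenAI shown as an example)
npm install openai
```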
Build an evaluation that identifies movies from plot descriptions. You’ll define a dataset with movie plot descriptions as inputs and expected titles as outputs, write a task function with a prompt to identify movies, and use a scorer to measure accuracy.

Write your evaluation
Create an evaluation that defines your dataset, task, and scorer (built-in
ExactMatch scorer for Python and TypeScript, or an equivalent code-based scorer for other languages):

movie-matcher.eval.ts
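The eval file itself was not captured here. As a rough sketch (assuming the TypeScript SDK, with an illustrative plot description and a canned answer standing in for the real model call), the three pieces look like:

```typescript
// Sketch of the three pieces movie-matcher.eval.ts wires together.
// The example plot and function bodies are illustrative placeholders.

// Data: test cases with inputs and expected outputs.
const data = [
  {
    input:
      "Two detectives hunt a serial killer who stages murders around the seven deadly sins.",
    expected: "Se7en",
  },
];

// Task: the AI function under test. A canned answer stands in for the model call.
async function task(input: string): Promise<string> {
  // A real task would prompt your LLM provider to name the movie.
  return "Se7en";
}

// Scorer: a code-based exact-match scorer, equivalent to the built-in ExactMatch.
function exactMatch(args: { output: string; expected?: string }): number {
  return args.output === args.expected ? 1 : 0;
}
```

In the real file, these three values are passed as the data, task, and scores fields of Braintrust's Eval() call.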
Run the evaluation
Run your evaluation from the command line.

This creates an experiment, a permanent record of how your task performed on the dataset. Each experiment captures inputs, outputs, scores, and metadata, making it easy to compare different versions of your prompts or models.
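With the TypeScript SDK, the command is typically the braintrust CLI, invoked here via npx:

```shell
npx braintrust eval movie-matcher.eval.ts
```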
5. Iterate
You might notice that some scores are 0%. This is because the scorer requires outputs to exactly match the expected value. For example, if the AI returns “The movie is Se7en” instead of “Se7en”, or uses the UK title “Harry Potter and the Philosopher’s Stone” instead of the expected US title “Harry Potter and the Sorcerer’s Stone”, the score will be 0% for that case.

Let’s improve the prompt to return only US-based movie titles and create a second experiment.

View results
Click the link to your new experiment in the terminal output.

The improved prompt should have higher scores because it returns just the movie title. In the Braintrust UI, you can compare this experiment with your first one to see the improvement.
Troubleshoot
Dataset not found error?
Verify your dataset name matches exactly what you see in the Braintrust UI: go to Datasets in your Braintrust project and confirm the dataset name.
Import errors or missing modules?
Install all required packages for your SDK and provider (see step 3, Install SDKs).
API key errors?
Check your environment variables. Both the Braintrust key and your AI provider key should return values; if either is empty, set it and re-run the eval. Get your Braintrust API key from Settings > API Keys.
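For example, on macOS or Linux with OpenAI as the provider (the variable names are the conventional ones; the key values are placeholders):

```shell
# Both of these should print a value
echo $BRAINTRUST_API_KEY
echo $OPENAI_API_KEY

# If either is empty, export it (placeholder values shown)
export BRAINTRUST_API_KEY="<your-braintrust-api-key>"
export OPENAI_API_KEY="<your-openai-api-key>"
```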
Not seeing experiments in UI?
Check your terminal output for the experiment link after running
braintrust eval. Click it to navigate directly to the experiment.

If you don’t see a link:

- Check for error messages in terminal output
- Verify network connectivity
- Ensure you’re viewing the correct project (“Evaluation quickstart”)
Need help?
- Join our Discord
- Email us at [email protected]
- Use the Loop feature in the Braintrust UI
Next steps
- Explore the full Braintrust workflow
- Go deeper with evaluation:
  - Write custom scorers - Measure what matters for your use case
  - Compare experiments - Systematically test different approaches
  - Build datasets - Create representative test cases from production data
  - Run evaluations in CI/CD - Catch regressions automatically