Score tool calls, reasoning steps, and outputs across every agent run
Every tool call and reasoning step is logged as a span. Score each one, compare experiments, and find exactly which step caused a regression.
Free to start · No credit card · 5-min setup
Evaluating for a team? Talk to us →
Example trace: 8 results for query "Q3 earnings" · 2,841 tokens extracted · gpt-4o, 3,102 tokens, $0.009 · scores 0.96 (pass) and 0.91 (pass) · timing: search 0.61s, read 0.38s, llm 1.94s, overhead 0.28s
Built around your eval workflow
Run agent evals from code or the UI. Iterate in the playground without touching code.

Run evaluations with Eval(), CLI, or UI
Define your agent task and test cases in code, run from the terminal, or build evals in the UI. Every span logs automatically.
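As a rough illustration of the "task plus test cases plus scorers" shape described above, here is a minimal standard-library sketch. It does not call the Braintrust SDK: `Case`, `run_eval`, `exact_match`, and `toy_agent` are hypothetical names invented for this example, and the real `Eval()` signature may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """One test case: an input and the answer we expect (hypothetical type)."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when output and expected match, ignoring case and whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    cases: List[Case],
    task: Callable[[str], str],
    scorers: Dict[str, Callable[[str, str], float]],
) -> List[dict]:
    """Run the task on every case and apply every scorer to its output."""
    rows = []
    for case in cases:
        output = task(case.input)
        scores = {name: fn(output, case.expected) for name, fn in scorers.items()}
        rows.append({"input": case.input, "output": output, "scores": scores})
    return rows


def toy_agent(question: str) -> str:
    """Stand-in for a real agent: answers from a tiny lookup table."""
    answers = {"capital of france?": "Paris"}
    return answers.get(question.lower(), "unknown")


cases = [Case("Capital of France?", "paris"), Case("Capital of Peru?", "Lima")]
results = run_eval(cases, toy_agent, {"exact_match": exact_match})
for row in results:
    print(row["input"], row["scores"])
```

Keeping the task, the data, and the scorers as separate arguments means any one of them can be swapped without touching the other two.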

Use playgrounds for rapid iteration
Adjust prompts, swap models, and replay your test cases without touching code. Iterations stay linked to your experiments.
Sign up free and we'll give you instructions to get set up with your stack in minutes.
From teams using Braintrust
<24 hrs to deploy a new frontier model
<10 min eval turnaround
50% → 90%+ accuracy improvement
45x more feedback
Notion, Dropbox, Zapier, and Coursera use Braintrust to ship better AI.
Built for agentic evaluation
Every step as a span
Tool calls and retrieval steps nest as child spans. Open any run and see the full decision path across services and agent frameworks (LangGraph, CrewAI, AutoGen, and more).
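To make the idea of child spans concrete, here is a small standard-library sketch of a tracer that nests spans with a context manager. `Span`, `Tracer`, and `render` are hypothetical names for illustration only, not part of any Braintrust API; the step names mirror a search, read, llm pipeline.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """A timed step; child steps are stored in `children` (hypothetical type)."""
    name: str
    children: List["Span"] = field(default_factory=list)
    duration_s: float = 0.0


class Tracer:
    """Tracks the currently open spans on a stack so new spans nest correctly."""

    def __init__(self) -> None:
        self.root: Optional[Span] = None
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str):
        node = Span(name)
        if self._stack:
            # A span opened inside another span becomes its child.
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)
        start = time.perf_counter()
        try:
            yield node
        finally:
            node.duration_s = time.perf_counter() - start
            self._stack.pop()


def render(span: Span, depth: int = 0) -> List[str]:
    """Return the span tree as indented lines, one line per span."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


tracer = Tracer()
with tracer.span("agent_run"):
    with tracer.span("search"):
        pass  # a tool call would go here
    with tracer.span("read"):
        pass  # a retrieval step would go here
    with tracer.span("llm"):
        pass  # a model call would go here

tree = render(tracer.root)
print("\n".join(tree))
```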
Trace-level scoring and experiment diffs
Score at the trace level on factuality, task completion, tool-use accuracy, and groundedness. Compare experiments to see exactly which step caused a regression.
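One simple way to picture an experiment diff is to compare per-step scores between two runs and report the step with the largest drop. The sketch below assumes scores are plain floats keyed by step name; `find_regression` is a hypothetical helper and the numbers are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def find_regression(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> Optional[Tuple[str, float]]:
    """Return (step, delta) for the step whose score dropped the most, or None."""
    # Only compare steps that appear in both experiments.
    deltas = {
        step: candidate[step] - baseline[step]
        for step in baseline
        if step in candidate
    }
    if not deltas:
        return None
    worst = min(deltas, key=deltas.get)
    return (worst, deltas[worst]) if deltas[worst] < 0 else None


# Illustrative scores only, not real measurements.
baseline = {"search": 0.92, "read": 0.95, "llm": 0.96}
candidate = {"search": 0.61, "read": 0.95, "llm": 0.97}

step, delta = find_regression(baseline, candidate)
print(f"largest drop: {step} ({delta:+.2f})")
```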
Production traces become eval datasets
Tag a failing trace and it goes straight into a dataset. The format is the same in production and in evals. The traces you debug today are the tests you ship tomorrow.
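The trace-to-dataset flow can be sketched as a filter plus a reshape. This example assumes each trace is a dict with `id`, `input`, `output`, and `tags` keys, plus an optional `corrected_output`; those field names, the `"failing"` tag, and both helper functions are assumptions made for illustration, not a documented schema.

```python
from typing import Dict, List


def trace_to_case(trace: Dict) -> Dict:
    """Reshape one logged trace into a dataset row (hypothetical schema)."""
    return {
        "input": trace["input"],
        # Use a human-corrected answer when one exists; otherwise leave it unset.
        "expected": trace.get("corrected_output"),
        "metadata": {"trace_id": trace["id"], "tags": list(trace["tags"])},
    }


def collect_failures(traces: List[Dict], tag: str = "failing") -> List[Dict]:
    """Keep only traces carrying the given tag and convert them to dataset rows."""
    return [trace_to_case(t) for t in traces if tag in t["tags"]]


# Illustrative traces only.
traces = [
    {"id": "t1", "input": "Summarize Q3 earnings", "output": "ok", "tags": []},
    {
        "id": "t2",
        "input": "List Q3 risks",
        "output": "wrong quarter",
        "tags": ["failing"],
        "corrected_output": "Risks cited in the Q3 report",
    },
]

dataset = collect_failures(traces)
print(dataset)
```

Because the row keeps the original trace id in its metadata, a failing test case can always be traced back to the production run it came from.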
Customer spotlight
“Braintrust helps us ship AI agents customers actually trust.”
Mohsen Sardari, VP Engineering at Bill
Talk to us about agent evaluation →
Stop shipping agents on vibes
Set up your first agent eval in minutes
Free to start · No credit card required
Get started free