Claude Code meets Braintrust
Claude Code is a fast way to build AI agents. You stay in the terminal, iterate quickly, and work with an AI that understands your codebase. But the moment something breaks, the workflow falls apart: you leave Claude Code, open Braintrust in a browser, search for the right traces, and piece together what happened after the fact. Checking experiment results or logging new data adds more tab switching and more lost context.
That gap between where you build and where you debug slows agent development. We closed that gap by making Claude Code and Braintrust work as a two-way system instead of a one-way export.
A two-way integration
There are two plugins. The first, trace-claude-code, automatically captures every Claude Code session as structured, hierarchical traces in Braintrust. Conversations, tool calls, and intermediate steps are logged by default with no extra work.
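"Structured, hierarchical traces" means a session is stored as a tree rather than a flat log. The sketch below is a minimal model of that shape. It is not Braintrust's actual trace schema or the plugin's code. The `Span` class, the `render` and `count_kind` helpers, and the `kind` labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One node in a hypothetical session trace (not Braintrust's real schema)."""
    name: str
    kind: str  # illustrative labels: "session", "turn", "tool", "llm"
    children: list["Span"] = field(default_factory=list)

    def add(self, child: "Span") -> "Span":
        # Attach a child span and return it so deeper steps can nest under it.
        self.children.append(child)
        return child


def render(span: Span, depth: int = 0) -> list[str]:
    # Depth-first walk that indents each span under its parent.
    lines = [f"{'  ' * depth}{span.kind}: {span.name}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def count_kind(span: Span, kind: str) -> int:
    # Count every span of a given kind anywhere in the tree.
    return int(span.kind == kind) + sum(count_kind(c, kind) for c in span.children)


# A hypothetical session: one user turn containing two tool calls and a model reply.
session = Span("fix auth bug", "session")
turn = session.add(Span("user asks to fix login", "turn"))
turn.add(Span("Read src/auth.py", "tool"))
turn.add(Span("Edit src/auth.py", "tool"))
turn.add(Span("assistant reply", "llm"))

print("\n".join(render(session)))
```

Because tool calls and intermediate steps nest under the turn that produced them, a question like "which tools ran during this request?" becomes a subtree walk, not a search through unrelated log lines.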

The second plugin, braintrust, brings Braintrust data back into Claude Code so developers can query logs, fetch experiment results, and log new data directly from the terminal using natural language.

The bidirectional flow is the important part. Most observability integrations only send data out, but agent development requires moving in both directions. You need to see what just happened, pull context from past runs, and compare behavior across experiments while you are still writing code.
In practice, this means you can ask Claude Code to find sessions from last week related to authentication issues, pull the failing cases from a specific experiment, or log a new example to an eval dataset without leaving the terminal. Claude Code queries Braintrust and returns the results inline, keeping the full development context intact. This matters because, as agents grow more complex, failures become harder to reason about after the fact.
Getting started
Once the plugins are installed and a Braintrust API key is configured, trace capture starts automatically. When you need production data or experiment results, Claude Code fetches them on demand. The same Braintrust infrastructure that teams use to run production AI now operates directly inside the development loop.