- Instrument → Capture traces from your application
- Observe → Find patterns and issues in your data
- Annotate → Review and improve with human feedback
- Evaluate → Test and validate improvements
- Deploy → Ship changes and monitor impact
Instrument
Capture detailed traces from your AI application by integrating Braintrust logging into your code. Traces record inputs, outputs, model parameters, latency, token usage, and other metadata for every request (a minimal code sketch follows this list). What you’ll do:

- Wrap your AI provider clients (OpenAI, Anthropic, Gemini, etc.)
- Integrate with frameworks like LangChain or OpenTelemetry
- Configure structured tracing for complex workflows
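If you use the Python SDK, a minimal instrumentation sketch looks roughly like the following. The project name, model, and prompt are placeholders, and it assumes `BRAINTRUST_API_KEY` and `OPENAI_API_KEY` are set in your environment.

```python
import os

from braintrust import init_logger, wrap_openai
from openai import OpenAI

# Create a logger for the project that will receive traces
# ("my-project" is a placeholder name).
logger = init_logger(project="my-project")

# Wrapping the client makes every completion call emit a trace that records
# the request, response, token usage, and latency automatically.
client = wrap_openai(OpenAI(api_key=os.environ["OPENAI_API_KEY"]))

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Summarize our return policy in one sentence."}],
)
print(response.choices[0].message.content)
```

The same pattern applies to other providers and frameworks: wrap the client (or install the integration) once, and traces flow to Braintrust without changing your call sites.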
Observe
Analyze your application’s behavior by exploring logs, identifying patterns, and discovering issues. Use filtering, search, and custom dashboards to understand what’s happening in production (see the example after this list). What you’ll do:

- View and filter logs to spot errors, latency issues, and unexpected outputs
- Use deep search to find similar traces semantically
- Create custom dashboards to track key metrics over time
- Use Loop to ask questions and explore patterns in your logs
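Filtering and dashboards are only as useful as the structure in your traces, so it helps to attach metadata at logging time. A rough sketch with the Python SDK, where `answer_question`, the project name, and the metadata fields are all hypothetical:

```python
from braintrust import current_span, init_logger, traced

logger = init_logger(project="my-project")  # placeholder project name

# Hypothetical application function; @traced creates a span per call so it
# appears as a trace in the project's logs.
@traced
def answer_question(question: str, user_tier: str) -> str:
    answer = f"(answer to: {question})"  # stand-in for a real model call
    # Attach structured metadata so you can later filter logs or chart
    # metrics by it (e.g. error rate or latency per user tier).
    current_span().log(
        input=question,
        output=answer,
        metadata={"user_tier": user_tier, "feature": "support-bot"},
    )
    return answer

answer_question("How do I reset my password?", user_tier="pro")
```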
Annotate
Improve your data quality by adding human feedback, creating datasets, and labeling important examples. Annotation transforms raw logs into high-quality evaluation data (a dataset-building sketch follows this list). What you’ll do:

- Review traces and add human review scores, feedback, comments, and expected outputs
- Use labels to flag interesting examples for closer examination
- Build datasets from annotated traces
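Once traces have been reviewed, you can promote the good examples into a dataset for evaluation. A minimal sketch with the Python SDK; the project, dataset name, and record contents are placeholders:

```python
from braintrust import init_dataset

# Placeholder project and dataset names.
dataset = init_dataset(project="my-project", name="support-bot-golden")

# Insert a reviewed example: the original input plus the expected output a
# human annotator signed off on. Metadata can carry labels from review.
dataset.insert(
    input={"question": "How do I reset my password?"},
    expected="Go to Settings > Security and choose 'Reset password'.",
    metadata={"label": "password-reset", "reviewed": True},
)

print(dataset.summarize())
```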
Evaluate
Test changes systematically by iterating in playgrounds and running experiments on your datasets. Start with rapid prototyping in playgrounds, then create immutable experiment snapshots to track improvements over time (an example eval follows this list). What you’ll do:

- Use playgrounds for rapid prototyping and iteration
- Write scorers to quantify quality improvements
- Run experiments to snapshot results and track progress
- Compare experiment results to identify improvements and regressions
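An experiment ties a dataset, a task, and scorers together into a reproducible snapshot. A sketch using the Python SDK and the `autoevals` library; `answer_question` and the names are placeholders, and `Levenshtein` stands in for whatever scorers fit your task:

```python
from autoevals import Levenshtein
from braintrust import Eval, init_dataset

# Placeholder for the function under test.
def answer_question(question: str) -> str:
    return "Go to Settings > Security and choose 'Reset password'."

# Running this creates an experiment (an immutable snapshot of results)
# that you can compare against later runs.
Eval(
    "my-project",  # placeholder project name
    data=init_dataset(project="my-project", name="support-bot-golden"),
    task=lambda input: answer_question(input["question"]),
    scores=[Levenshtein],  # simple string-similarity scorer from autoevals
)
```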
Deploy
Ship validated changes to production and monitor their impact. Deployment includes updating prompts, switching models, and running online evaluations to catch issues in real time (a proxy example follows this list). What you’ll do:

- Deploy prompts and functions to production
- Use the AI Proxy to call any AI provider through a unified interface
- Monitor production with online scoring and dashboards
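The AI Proxy is OpenAI-compatible, so switching providers can be as small as changing the base URL and the model name. A hedged sketch; the model name is an example, and it assumes `BRAINTRUST_API_KEY` is set and your provider keys are configured in Braintrust:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the AI Proxy and authenticate with a
# Braintrust API key; the proxy forwards requests to the configured provider.
client = OpenAI(
    base_url="https://api.braintrust.dev/v1/proxy",
    api_key=os.environ["BRAINTRUST_API_KEY"],
)

response = client.chat.completions.create(
    model="claude-3-5-sonnet-latest",  # example: an Anthropic model via the proxy
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```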
The cycle repeats as you deploy changes. New production logs feed back into the Observe stage, creating a continuous improvement loop.