Building reliable AI applications requires a different approach than traditional software development. With AI, small changes to prompts, models, or data can have unpredictable effects on quality. Braintrust provides a structured workflow that helps you measure, understand, and improve your AI applications systematically. Effective AI development follows a continuous improvement cycle with five key stages:
  1. Instrument → Capture traces from your application
  2. Observe → Find patterns and issues in your data
  3. Annotate → Review and improve with human feedback
  4. Evaluate → Test and validate improvements
  5. Deploy → Ship changes and monitor impact
Each stage builds on the previous one, creating a feedback loop that enables continuous improvement.

Instrument

Capture detailed traces from your AI application by integrating Braintrust logging into your code. Traces record inputs, outputs, model parameters, latency, token usage, and other metadata for every request.
Outcome: Your application automatically sends trace data to Braintrust, giving you visibility into every request.
Get started with instrumentation
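As an illustration, here is a minimal instrumentation sketch using the Braintrust Python SDK; the project name and model are placeholders, and a BRAINTRUST_API_KEY environment variable is assumed to be set.

```python
import os
from openai import OpenAI
from braintrust import init_logger, traced, wrap_openai

# Initialize a logger once per process; traces are sent to the named project.
logger = init_logger(project="my-ai-app")  # placeholder project name

# Wrapping the OpenAI client records inputs, outputs, token usage, and
# latency for every completion call automatically.
client = wrap_openai(OpenAI(api_key=os.environ["OPENAI_API_KEY"]))

@traced  # records this function call as a span in the trace
def answer_question(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer_question("What does Braintrust capture in a trace?"))
```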

Observe

Analyze your application’s behavior by exploring logs, identifying patterns, and discovering issues. Use filtering, search, and custom dashboards to understand what’s happening in production.
Outcome: You understand where your application succeeds and where it struggles, with concrete examples of both.
Get started with observability
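One way to make production logs easier to slice here is to attach structured metadata to each trace at request time. The sketch below assumes the same Python SDK setup as the instrumentation example; the field names (user_tier, feature) are chosen purely for illustration.

```python
from braintrust import current_span, init_logger, traced

logger = init_logger(project="my-ai-app")  # same placeholder project as above

@traced
def handle_request(question: str, user_tier: str) -> str:
    # Attach metadata to the current span so logs can be filtered and
    # grouped in the Braintrust UI (e.g. by user tier or feature area).
    current_span().log(metadata={"user_tier": user_tier, "feature": "qa"})
    # ... call your model here; the answer_question function from the
    # previous sketch would slot in at this point.
    return "stub answer"
```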

Annotate

Improve your data quality by adding human feedback, creating datasets, and labeling important examples. Annotation transforms raw logs into high-quality evaluation data.
Outcome: You have curated datasets that represent real user interactions, annotated with expert feedback.
Get started with annotation
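A sketch of the dataset side of annotation with the Braintrust Python SDK; the project and dataset names, record contents, and reviewer field are placeholders.

```python
from braintrust import init_dataset

# Create (or open) a dataset in the project and add reviewed examples to it.
dataset = init_dataset(project="my-ai-app", name="qa-golden-set")  # placeholder names

# Records typically come from production logs you reviewed, with the
# expected output corrected by a human annotator.
dataset.insert(
    input={"question": "What does Braintrust capture in a trace?"},
    expected="Inputs, outputs, model parameters, latency, and token usage.",
    metadata={"source": "production-log", "reviewer": "alice"},  # illustrative fields
)

# Human feedback on a logged span can also be recorded programmatically,
# given a span id exported at request time, e.g.:
# logger.log_feedback(id=span_id, scores={"correctness": 0}, comment="Wrong answer")
```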

Evaluate

Test changes systematically by iterating in playgrounds and running experiments on your datasets. Start with rapid prototyping in playgrounds, then create immutable experiment snapshots to track improvements over time.
Outcome: You know which changes improve your application and which cause regressions, backed by quantitative data.
Get started with evaluation
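A minimal offline evaluation sketch using the Braintrust Eval entry point with an autoevals scorer; the project name, inline data, and task function are placeholders for your own dataset and application code.

```python
from braintrust import Eval
from autoevals import Levenshtein

def answer_question(question: str) -> str:
    # Placeholder task: call your instrumented application code here.
    return "Inputs, outputs, model parameters, latency, and token usage."

Eval(
    "my-ai-app",  # placeholder project name
    data=lambda: [
        {
            "input": "What does Braintrust capture in a trace?",
            "expected": "Inputs, outputs, model parameters, latency, and token usage.",
        }
    ],
    task=answer_question,
    scores=[Levenshtein],  # string-similarity scorer from autoevals
)
```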

Deploy

Ship validated changes to production and monitor their impact. Deployment includes updating prompts, switching models, and running online evaluations to catch issues in real time.
Outcome: Your improvements run in production with monitoring in place to catch issues early.
Get started with deployment
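For the prompt-update side of deployment, a sketch of loading a prompt managed in Braintrust at request time, so prompt or model changes ship without a code release; the project name and prompt slug are placeholders, and wrap_openai keeps the production call traced.

```python
from openai import OpenAI
from braintrust import init_logger, load_prompt, wrap_openai

logger = init_logger(project="my-ai-app")  # placeholder project
client = wrap_openai(OpenAI())             # keeps production calls traced

def answer_question(question: str) -> str:
    # Fetch the currently deployed version of the prompt from Braintrust;
    # updating the prompt (or its model) there changes behavior here
    # without redeploying this code.
    prompt = load_prompt("my-ai-app", "qa-prompt")  # placeholder slug
    response = client.chat.completions.create(**prompt.build(question=question))
    return response.choices[0].message.content
```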
The cycle repeats as you deploy changes. New production logs feed back into the Observe stage, creating a continuous improvement loop.