Best practices
Evals are a team sport: How we built Loop
How we debugged Loop's prompt optimization workflow by combining manual review, Loop analysis, and cross-functional collaboration.
25 November 2025 · 8 min
The three pillars of AI observability
Why traces, evals, and annotation redefine observability for AI systems.
18 November 2025 · 8 min
Braintrust Java SDK: AI observability and evals for the JVM
AI observability and evaluation tools for Java applications, built on OpenTelemetry.
23 October 2025 · 4 min
Measuring what matters: An intro to AI evals
Learn how to build effective evals for your AI products with datasets, tasks, and scores.
10 October 2025 · 9 min
Claude Sonnet 4.5 analysis
Learn how aspirational evals can help you figure out when new AI models unlock new product opportunities.
29 September 2025 · 5 min
A/B testing can't keep up with AI
How evals can replace traditional A/B testing in product development.
3 September 2025 · 4 min
The rise of async programming
The workflow that is changing how software gets built.
19 August 2025 · 5 min
Five hard-learned lessons about AI evals
What our customers have taught us about running evals at scale.
17 July 2025 · 5 min
Webinar recap: Eval best practices
A recap of our technical Q&A session hosted by CEO Ankur Goyal.
22 April 2025 · 4 min
What to do when a new AI model comes out
How to decide whether to adopt a new model in production.
4 December 2024 · 3 min
Building a RAG app with MongoDB Atlas
How to iterate on AI applications without redeploying code.
18 November 2024 · 7 min
I ran an eval. Now what?
A guide to the next steps after your first eval, with best practices for your workflows.
17 October 2024 · 6 min
How to improve your evaluations
Learn how to improve your evals by identifying new evaluators, iterating on existing scorers, and adding new test cases.
20 June 2024 · 6 min
AI development loops
Key activities that enable fast feedback and clear signal when developing AI features.
6 May 2024 · 5 min
Getting started with automated evaluations
Three actionable approaches for engineering teams to get started with automated evaluations.
24 April 2024 · 5 min
Eval feedback loops
Learn how to build robust eval feedback loops for AI products by connecting real-world log data to your evals, including best practices for structuring evals, flowing production logs into eval datasets, and using Braintrust to streamline the process.
17 April 2024 · 6 min
The AI product development journey
Building reliable AI apps is hard. It’s easy to build a cool demo but hard to build an AI app that works in production for real users. Traditional software development has a set of best practices, like setting up CI/CD and writing tests, that make software robust and easy to build on. But with LLM apps, it’s not obvious how to create those tests or processes.
13 November 2023 · 5 min