Latest Braintrust news - Braintrust

GPT-5 vs. Claude Opus 4.1

Which one you should ship with, and how to know for sure.

The canonical agent architecture: A while loop with tools

Why the best AI agents are just loops that call functions.

Five hard-learned lessons about AI evals

What our customers have taught us about running evals at scale.

Braintrust is not an eval framework

Why we built infrastructure for AI products, not just another evaluation tool.

Building with Grok 4

xAI recently announced Grok 4. We put it to the ultimate test.

Experiments UI: Now 10x faster

Brainstore speeds up experiments, datasets, and logs.

Eval playgrounds for faster, focused iteration

Run full evals directly in a powerful editor UI.

How Coursera builds next-generation learning tools

Key learnings from the Coursera AI engineering team.

Webinar recap: Eval best practices

A recap of our technical Q&A hosted by CEO Ankur Goyal.

Resilient observability by design

How we built Braintrust so that downtime on our side has no impact on your application.

Brainstore is now on by default

Brainstore is now the default in both our UI and API. Learn what's changing and coming next.

Brainstore: the database designed for the AI engineering era

LLM observability, now 80x faster.

Bedrock, Vertex AI, and universal structured outputs

Full support for Bedrock, Vertex AI, and structured outputs in the AI proxy and playground.

14 February 2025

How Fintool generates millions of financial insights

Learn to build trusted and scalable LLM apps from the team at Fintool.

31 January 2025

How Loom auto-generates video titles

Learn scoring best practices from the software engineering team at Loom.

27 January 2025

Evaluating agents

Learn best practices for scoring agentic systems.

22 January 2025

Our approach to hybrid deployment

The easiest way to self-host Braintrust.

The top 10 most loved features of 2024

Our year in review.

31 December 2024

New monitor page for easy analytics

More visibility into performance across logs and experiments.

18 December 2024

What to do when a new AI model comes out

How to decide if you should use a new model in production.

4 December 2024

Building a RAG app with MongoDB Atlas

How to iterate on AI applications without redeploying code.

18 November 2024

Evaluating Gemini models for vision

Faster, more efficient, and highly accurate for real-world applications.

14 November 2024

Python tool functions: powered by uv

How we used the uv library to build Python tools.

13 November 2024

Building serverless apps with the OpenAI Realtime API

No server setup or configuration necessary.

4 November 2024

Logging with attachments

Observability for advanced AI applications.

24 October 2024

I ran an eval. Now what?

A guide to next steps after your first eval and best practices for your workflows.

17 October 2024

How Notion develops world-class AI features

Learn how Notion refined their development workflow with Braintrust.

Announcing our $36M Series A

We’re thrilled to announce that we've raised $36 million to advance the future of AI software engineering, bringing our total funding to $45 million.

Functions: flexible AI engineering primitives

Introducing functions, a general-purpose primitive for building, evaluating, and observing AI products.

Custom scoring functions in the Braintrust Playground

Create custom scorers and access them via the Braintrust UI and API.

16 September 2024

Braintrust achieves SOC 2 Type II compliance

We are excited to announce that Braintrust has achieved SOC 2 Type II compliance.

How to improve your evaluations

Learn how to improve your evals by identifying new evaluators, iterating on existing scorers, and adding new test cases.

How Zapier builds production-ready AI products

Zapier was one of the earliest adopters of GenAI. In this post, we share insights from Mike Knoop, Co-founder & Head of AI at Zapier.

AI development loops

Key activities that enable fast feedback and clear signal when developing AI features.

Getting started with automated evaluations

Three actionable approaches for engineering teams to get started with automated evaluations.

Eval feedback loops

Learn how to build robust eval feedback loops for AI products by connecting real-world log data to your evals. Discover best practices for structuring evals, flowing production logs into eval datasets, and using Braintrust to streamline the process.

Braintrust selected to be in the Enterprise Tech 30

The Enterprise Tech 30 by Wing Venture Capital names the highest potential private companies in enterprise technology.

How Hostinger evaluates AI applications with Braintrust

Liucija, Senior Data Scientist on the AI team at Hostinger, provides an overview of how she uses Braintrust to accelerate Hostinger's AI development process and automate over 40% of customer support chat conversations.

27 February 2024

2023, a year in review

Check out your Braintrust 2023 year in review to see how you did this year!

21 December 2023

Braintrust's seed round: $5m to build infrastructure for AI products

Announcing Braintrust's seed round led by Greylock. The round builds on our early traction with customers like Zapier, Coda, Airtable, and Instacart and allows us to accelerate our vision of building world-class infrastructure for AI products. We are hiring for a number of roles, so please check out our careers page if you are interested in joining us.

13 December 2023

Open sourcing the AI proxy

The Braintrust AI Proxy is now open source! We also added support for Azure OpenAI and provider load balancing.

27 November 2023

AI proxy: fostering a more open ecosystem

Introducing Braintrust's latest feature: an AI proxy that lets you use open source models like Llama 2 and Mistral, as well as all of OpenAI's and Anthropic's models, behind a single interface with caching, security, and API key management built in.

20 November 2023

State of AI development 2023

Retool recently surveyed over 1,500 workers about how their companies are adopting AI for its State of AI 2023 report. Here's what they are struggling with and how Braintrust can help.

15 November 2023

The AI product development journey

Building reliable AI apps is hard. It’s easy to build a cool demo but hard to build an AI app that works in production for real users. In traditional software development, there’s a set of best practices, like setting up CI/CD and writing tests, that make your software robust and easy to build on. But with LLM apps, it’s not obvious how to create these tests or processes.

13 November 2023

Weekly update 11/13/23

Function calling and tool support, new blog posts, and project UI improvements.

13 November 2023

Weekly update 11/06/23

Perplexity models support, new OpenAI models, reworked diff selector in experiment view.

06 November 2023

Weekly update 10/30/23

Resizable sidebar, new help tooltips, performance optimizations, Replit.

30 October 2023

Weekly update 10/23/23

Auto input variables in the playground, duration metrics, performance optimizations, partner releases.

23 October 2023

Weekly update 10/16/23

Tracing, experiment dashboard customization, text-block prompts, bigger tables, new eval docs.

16 October 2023

Weekly update 10/09/23

Performance improvements, fine-tuning tutorial, Alpaca Evals, autocomplete in the playground.

09 October 2023

It's time to build reliable AI

Introducing Braintrust: the enterprise-grade stack for building AI products. From evaluations, to prompt playground, to data management, we take uncertainty and tedium out of incorporating AI into your business.

12 September 2023