# Braintrust

> The comprehensive AI evaluation and testing platform. Braintrust helps developers evaluate, iterate, and ship reliable AI applications with powerful logging, evaluation, and debugging tools.

## Platform

- [Braintrust Platform](https://www.braintrust.dev/): Complete AI evaluation and testing platform for modern AI development
- [AI Proxy](https://www.braintrust.dev/docs/guides/proxy): Unified API to access leading AI models from OpenAI, Anthropic, Meta, Mistral, and more
- [Playground](https://www.braintrust.dev/docs/guides/playground): Interactive environment for testing and iterating on AI prompts and models
- [Self-Hosting](https://www.braintrust.dev/docs/guides/self-hosting): Deploy Braintrust in your own infrastructure with full control and security

## Features

- [Experiments](https://www.braintrust.dev/docs/guides/experiments): Run systematic evaluations and compare AI model performance across different configurations
- [Datasets](https://www.braintrust.dev/docs/guides/datasets): Manage and version your evaluation datasets with powerful querying and filtering
- [Functions](https://www.braintrust.dev/docs/guides/functions): Create and manage prompts, scorers, and tools as reusable, versionable functions
- [Logging](https://www.braintrust.dev/docs/guides/logs): Comprehensive logging and observability for AI applications with traces and spans
- [Human Review](https://www.braintrust.dev/docs/guides/human-review): Collect human feedback and annotations to improve your AI systems

## SDKs & Libraries

- [JavaScript/TypeScript SDK](https://www.braintrust.dev/docs/reference/libs/nodejs): Full-featured SDK for logging, evaluation, and experimentation
- [Python SDK](https://www.braintrust.dev/docs/reference/libs/python): Comprehensive Python library for AI evaluation and logging
- [Autoevals](https://www.braintrust.dev/docs/reference/autoevals): Automatic evaluation library with LLM-as-judge, statistical, and heuristic methods

## Getting Started

- [Quick Start](https://www.braintrust.dev/docs/start): Get started with your first AI evaluation in minutes
- [Evaluation Guide](https://www.braintrust.dev/docs/start/eval-sdk): Learn how to use the evaluation SDK
- [UI Guide](https://www.braintrust.dev/docs/start/eval-ui): Explore the web interface for managing evaluations

## Documentation

- [Documentation-specific llms.txt](https://www.braintrust.dev/docs/llms.txt): Focused documentation directory for technical references and guides
- [Documentation Home](https://www.braintrust.dev/docs): Complete documentation portal with guides, references, and tutorials
- [Guides](https://www.braintrust.dev/docs/guides): Comprehensive guides covering all platform features
- [API Reference](https://www.braintrust.dev/docs/reference/api): Complete REST API documentation
- [Best Practices](https://www.braintrust.dev/docs/best-practices): Expert guidance for AI evaluation and testing
- [Cookbook](https://www.braintrust.dev/docs/cookbook): Practical recipes and examples for common AI evaluation tasks

## Open Source

- [Braintrust SDK](https://github.com/braintrustdata/braintrust): Open source evaluation and logging SDKs
- [AI Proxy](https://github.com/braintrustdata/braintrust/tree/main/proxy): Open source unified AI proxy
- [Autoevals](https://github.com/braintrustdata/braintrust/tree/main/autoevals): Open source automatic evaluation library

## Blog & Resources

### Blog Home
- [Blog Home](https://www.braintrust.dev/blog): Latest updates, tutorials, and insights on AI evaluation

### Funding Announcements
- [Announcing our $36M Series A](https://www.braintrust.dev/blog/announcing-series-a): We're thrilled to announce that we've raised $36 million to advance the future of AI software engineering, bringing our total funding to $45 million.
- [Braintrust's seed round: $5m to build infrastructure for AI products](https://www.braintrust.dev/blog/seed-round): Announcing Braintrust's seed round led by Greylock. The round builds on our early traction with customers like Zapier, Coda, Airtable, and Instacart and allows us to accelerate our vision of building world-class infrastructure for AI products.
- [It's time to build reliable AI](https://www.braintrust.dev/blog/reliable-ai): Introducing Braintrust: the enterprise-grade stack for building AI products. From evaluations, to prompt playground, to data management, we take uncertainty and tedium out of incorporating AI into your business.

### Platform & Infrastructure
- [Braintrust is not an eval framework](https://www.braintrust.dev/blog/braintrust-not-eval-framework): Why we built infrastructure for AI products, not just another evaluation tool.
- [Brainstore: the database designed for the AI engineering era](https://www.braintrust.dev/blog/brainstore): LLM observability, now 80x faster.
- [Brainstore is now on by default](https://www.braintrust.dev/blog/brainstore-default): Brainstore is now the default in both our UI and API. Learn what's changing and coming next.
- [AI proxy: fostering a more open ecosystem](https://www.braintrust.dev/blog/ai-proxy): Introducing Braintrust's latest feature: an AI proxy that lets you use open source models like Llama 2 and Mistral, as well as all of OpenAI's and Anthropic's models, behind a single interface with caching, security, and API key management built in.
- [Open sourcing the AI proxy](https://www.braintrust.dev/blog/open-sourcing-proxy): The Braintrust AI Proxy is now open source! We also added support for Azure OpenAI and provider load balancing.
- [Our approach to hybrid deployment](https://www.braintrust.dev/blog/hybrid-deployment): The easiest way to self-host Braintrust.
- [Resilient observability by design](https://www.braintrust.dev/blog/resilient-design): How we built Braintrust so that downtime has no impact on your application.

### Evaluations & Best Practices
- [Getting started with automated evaluations](https://www.braintrust.dev/blog/getting-started-evals): Three actionable approaches for engineering teams to get started with automated evaluations.
- [I ran an eval. Now what?](https://www.braintrust.dev/blog/after-evals): A guide to next steps after your first eval and best practices for your workflows.
- [Eval feedback loops](https://www.braintrust.dev/blog/eval-feedback-loops): Learn how to build robust eval feedback loops for AI products by connecting real-world log data to your evals. Discover best practices for structuring evals, flowing production logs into eval datasets, and using Braintrust to streamline the process.
- [How to improve your evaluations](https://www.braintrust.dev/blog/improve-evals): Learn how to improve your evals by identifying new evaluators, iterating on existing scorers, and adding new test cases.
- [Five hard-learned lessons about AI evals](https://www.braintrust.dev/blog/five-lessons-evals): What our customers have taught us about running evals at scale.
- [Evaluating agents](https://www.braintrust.dev/blog/evaluating-agents): Learn best practices for scoring agentic systems.
- [Webinar recap: Eval best practices](https://www.braintrust.dev/blog/best-practices): A recap of our technical Q&A hosted by CEO Ankur Goyal.

### AI Development & Programming
- [The canonical agent architecture: A while loop with tools](https://www.braintrust.dev/blog/agent-while-loop): Why the best AI agents are just loops that call functions.
- [AI development loops](https://www.braintrust.dev/blog/ai-development-loops): Key activities that enable fast feedback and clear signal when developing AI features.
- [The rise of async programming](https://www.braintrust.dev/blog/async-programming): The workflow that's changing how software gets built.
- [The AI product development journey](https://www.braintrust.dev/blog/journey): Building reliable AI apps is hard. It's easy to build a cool demo but hard to build an AI app that works in production for real users. Traditional software development has established best practices, like setting up CI/CD and writing tests, that make software robust and easy to build on. With LLM apps, it's not obvious how to create these tests or processes.

### Features & Tools
- [Functions: flexible AI engineering primitives](https://www.braintrust.dev/blog/functions): Introducing functions, a general-purpose primitive for building, evaluating, and observing AI products.
- [Logging with attachments](https://www.braintrust.dev/blog/attachments): Observability for advanced AI applications.
- [Custom scoring functions in the Braintrust Playground](https://www.braintrust.dev/blog/custom-scorers): Create custom scorers and access them via the Braintrust UI and API.
- [Eval playgrounds for faster, focused iteration](https://www.braintrust.dev/blog/eval-playgrounds): Run full evals directly in a powerful editor UI.
- [Experiments UI: Now 10x faster](https://www.braintrust.dev/blog/faster-experiments): Brainstore speeds up experiments, datasets, and logs.
- [New monitor page for easy analytics](https://www.braintrust.dev/blog/monitor): More visibility into performance across logs and experiments.

### Model Comparisons & Updates
- [GPT-5 vs. Claude Opus 4.1](https://www.braintrust.dev/blog/gpt-5-vs-claude-opus): Which one you should ship with, and how to know for sure.
- [Building with Grok 4](https://www.braintrust.dev/blog/grok-4): xAI recently announced Grok 4. We put it to the ultimate test.
- [Evaluating Gemini models for vision](https://www.braintrust.dev/blog/gemini): Faster, more efficient, and highly accurate for real-world applications.
- [What to do when a new AI model comes out](https://www.braintrust.dev/blog/new-model): How to decide if you should use a new model in production.
- [Bedrock, Vertex AI, and universal structured outputs](https://www.braintrust.dev/blog/model-updates): Full support for Bedrock, Vertex AI, and structured outputs in the AI proxy and playground.

### Customer Stories & Case Studies
- [How Zapier builds production-ready AI products](https://www.braintrust.dev/blog/zapier-ai): Zapier was one of the earliest adopters of GenAI. In this post, we share insights from Mike Knoop, Co-founder & Head of AI at Zapier.
- [How Notion develops world-class AI features](https://www.braintrust.dev/blog/notion): Learn how Notion refined their development workflow with Braintrust.
- [How Coursera builds next-generation learning tools](https://www.braintrust.dev/blog/coursera): Key learnings from the Coursera AI engineering team.
- [How Loom auto-generates video titles](https://www.braintrust.dev/blog/loom): Learn scoring best practices from the software engineering team at Loom.
- [How Fintool generates millions of financial insights](https://www.braintrust.dev/blog/fintool): Learn to build trusted and scalable LLM apps from the team at Fintool.
- [How Hostinger evaluates AI applications with Braintrust](https://www.braintrust.dev/blog/hostinger-evals): Liucija, Senior Data Scientist on the AI team at Hostinger, gives an overview of how she uses Braintrust to accelerate Hostinger's AI development process and automate over 40% of customer support chat conversations.

### Technical Tutorials & Guides
- [Building serverless apps with the OpenAI Realtime API](https://www.braintrust.dev/blog/realtime-api): No server setup or configuration necessary.
- [Building a RAG app with MongoDB Atlas](https://www.braintrust.dev/blog/rag-mongodb): How to iterate on AI applications without redeploying code.
- [Python tool functions: powered by uv](https://www.braintrust.dev/blog/python-tools-uv): How we used uv to build Python tools.

### Company Updates & News
- [The top 10 most loved features of 2024](https://www.braintrust.dev/blog/2024): Our year in review.
- [2023, a year in review](https://www.braintrust.dev/blog/2023-summary): Check out your Braintrust 2023 year in review to see how you did this year!
- [Braintrust achieves SOC 2 Type II compliance](https://www.braintrust.dev/blog/soc2): We are excited to announce that Braintrust has achieved SOC 2 Type II compliance.
- [Braintrust selected to be in the Enterprise Tech 30](https://www.braintrust.dev/blog/wing-30): The Enterprise Tech 30 by Wing Venture Capital names the highest-potential private companies in enterprise technology.
- [State of AI development 2023](https://www.braintrust.dev/blog/state-of-ai): For its State of AI 2023 report, Retool surveyed over 1,500 workers about how their companies are adopting AI. Here's what respondents are struggling with and how Braintrust can help.

## Community

- [GitHub](https://github.com/braintrustdata/braintrust): Source code, issues, and contributions
- [Discord](https://discord.gg/6G8s47F44X): Join our community for support and discussions
- [Twitter](https://twitter.com/braintrustdata): Follow us for announcements and tips