Braintrust offers a self-hosted deployment option that separates data storage from platform management. You deploy and control the infrastructure that stores your sensitive AI data, while Braintrust provides the managed UI, authentication, and platform updates. This gives you full control over your data without the operational overhead of running the entire platform.

Use cases

Self-hosting is designed for organizations with specific requirements:
  • Data residency and compliance: Meet regulatory or contractual obligations by keeping all customer data (experiment logs, traces, datasets, and prompts) within your own cloud account and region.
  • Security posture and isolation: Deploy the data plane behind your firewall or VPN, using your own IAM policies, KMS encryption keys, and audit trails. This ensures sensitive data never traverses external networks.
  • Access to private resources: Connect to internal LLM models, proprietary tools, or private APIs that are not accessible from the public internet. The data plane runs within your network and can access resources in your VPC or private network.

How it works

Braintrust’s architecture has two main components:
  • The data plane stores all sensitive data, including experiment records, logs, traces, spans, datasets, and prompt completions. It consists of the Braintrust API, a PostgreSQL database, Redis cache, object storage, and Brainstore (a high-performance query engine for real-time trace ingestion).
  • The control plane provides the web UI, authentication, user management, and metadata storage (project names, experiment names, organization settings). The control plane does not store or process your sensitive data.
| Data | Location |
| --- | --- |
| Experiment records (input, output, expected, scores, metadata, traces, spans) | Data plane |
| Log records (input, output, expected, scores, metadata, traces, spans) | Data plane |
| Dataset records (input, output, metadata) | Data plane |
| Prompt playground prompts | Data plane |
| Prompt playground completions | Data plane |
| Human review scores | Data plane |
| Project-level LLM provider secrets (encrypted) | Data plane |
| Org-level LLM provider secrets (encrypted) | Control plane |
| API keys (hashed) | Control plane |
| Experiment and dataset names | Control plane |
| Project names | Control plane |
| Project settings | Control plane |
| Git metadata about experiments | Control plane |
| Organization info (name, settings) | Control plane |
| Login info (name, email, avatar URL) | Control plane |
| Auth credentials | Clerk |
When you self-host Braintrust, you deploy the data plane in your own infrastructure using Terraform. On AWS, this uses Lambda functions and EC2 instances; on GCP and Azure, it uses Kubernetes containers. Braintrust continues to host the control plane.

When you use Braintrust’s SDKs, they send data directly to your data plane. When you use the web UI, your browser communicates directly with your data plane via CORS. The control plane and data plane communicate only for authentication and metadata synchronization. Braintrust’s servers and employees do not require access to your data plane for it to operate.

Deployment options

Braintrust provides official Terraform modules for self-hosting on AWS, Google Cloud Platform (GCP), and Azure:
  • AWS: Terraform with Lambda and EC2
  • GCP: Terraform with Kubernetes and Helm
  • Azure: Terraform with Kubernetes and Helm
Braintrust strongly recommends using these Terraform modules because they are kept up-to-date with best practices, mirror the fully hosted offering (proven at scale), minimize configuration issues, and ensure Braintrust can efficiently troubleshoot performance and operational issues. If the module conflicts with your organization’s infrastructure standards, you can deploy Braintrust in a dedicated cloud account or project to address these concerns. If this approach does not work for your situation, contact Braintrust to discuss possible modifications to the modules.
Legacy customers: If you previously deployed using AWS CloudFormation, the CloudFormation guide remains available. This deployment method is not supported for new customers.

Shared responsibility

When you self-host, uptime becomes a shared responsibility between your team and Braintrust:
  • Braintrust is responsible for responding quickly when you have issues, collaboratively resolving them with you, and fixing bugs to improve quality.
  • Your team is responsible for following the documentation, dedicating infrastructure resources to the deployment, and ensuring that in the event of an incident you have staff who are familiar with Braintrust and can work with the Braintrust team to share context and resolve issues.

Monitoring

By default, your self-hosted data plane automatically sends the following telemetry back to the Braintrust-managed control plane:
  • Health check information
  • System metrics (CPU/memory) and Braintrust-specific metrics like indexing lag
  • Billing usage telemetry for aggregate usage metrics
This allows Braintrust to monitor key health indicators and quickly identify issues before they cause downtime. In some cases, Braintrust may ask you to enable additional telemetry to help with troubleshooting, including logs and traces. For more details, see Enable or disable telemetry.

Upgrades

Braintrust releases new versions of the data plane around once per week, often with incremental changes that improve Brainstore performance, add support for new features, and improve logging. You can find the details of each release in the Data plane changelog. Braintrust recommends that you update monthly, and you must update at least once per quarter. If you require support, either to diagnose an issue or improve a feature, Braintrust may ask you to upgrade to the latest version as a first step. Braintrust does not currently offer long-term support releases, and the team is best equipped to support the latest version. For platform-specific upgrade instructions, see the upgrade guides for AWS, GCP, and Azure.

Remote access

There are occasionally issues that require ad-hoc debugging or running manual commands against containers, the Postgres database, or storage buckets to repair the state of the system. Customers who provide Braintrust with remote access (as needed) have experienced much faster resolutions when such issues occur, because the Braintrust team can connect directly and resolve issues. If this is not possible, factor this into your uptime calculations. If uptime of Braintrust is a key metric for you, strongly consider making remote access available to the Braintrust team as needed. If you cannot set up remote access, ensure that you can swiftly access:
  • Containers directly (to update them, view logs, restart them, and view host metrics like CPU, network, memory, and disk utilization)
  • Postgres to run SQL queries
  • Redis to run commands
  • Storage buckets to run read, write, and list commands
Your on-call staff should have basic familiarity with Braintrust and the ability to perform all of these operations.

Hardware requirements

When deploying Braintrust in production, consider these hardware requirements for reliable performance and uptime. These requirements assume typical production usage patterns. For high-utilization deployments, you may need to scale these resources up significantly. Monitor your resource utilization and adjust accordingly.

API service

This section applies to GCP and Azure with Kubernetes. AWS deployments use Lambda functions, which are managed automatically and do not require manual resource configuration.
| Resource | Testing/Staging | Production |
| --- | --- | --- |
| CPU | 1 vCPU | 2+ vCPUs per instance |
| Memory | 2GB RAM | 8GB+ RAM |
| Instance count | 1 | 4+ |
Environment variables:
  • NODE_MEMORY_PERCENT: Set to 80-90 if the API is running on a dedicated instance or container orchestrator with cgroup memory limits (e.g. Kubernetes, ECS).

Database (PostgreSQL)

| Resource | Testing/Staging | Production |
| --- | --- | --- |
| CPU | 2 vCPUs | 8+ vCPUs |
| Memory | 8GB RAM | 64GB+ RAM |
| Storage size | 100GB | 1000GB+ (monitor for growth) |
| Storage IOPS | 3,000 | 15,000+ |
| Version | 15+ | 17+ |

Redis cache

| Resource | Testing/Staging | Production |
| --- | --- | --- |
| CPU | 1 vCPU | 2 vCPUs |
| Memory | 1GB RAM | 4GB+ RAM |
| Version | 7+ | 7+ |
Important for AWS: Avoid using burstable Redis instances (t-family instances like cache.t4g.micro) in production. These instances use CPU credits that can be exhausted during high-load periods, leading to performance throttling. Instead, use non-burstable instances like cache.r7g.large, cache.r6g.medium, or cache.r5.large for predictable performance. Even if these instances seem oversized initially, they provide consistent performance without the risk of CPU credit exhaustion.

Brainstore

| Resource | Testing/Staging | Production |
| --- | --- | --- |
| CPU | 4 vCPUs | 16+ vCPUs (ARM recommended) |
| Memory | 8GB RAM | 32GB+ RAM |
| Storage size | 128GB | 1024GB+ |
| Storage type | SSD | NVMe (ephemeral) |
| Storage IOPS | 150,000+ read/write | 150,000+ read/write |
| Node types | Combined reader/writer | Separate readers and writers |
| Instance count | 1 | 2+ readers, 1+ writers |
Important
  • Brainstore requires separate reader and writer nodes for reliability and performance. Plan for a minimum of 2 reader nodes to ensure high availability. A single writer node is sufficient since writers can tolerate brief downtimes and do not service interactive user requests.
  • Brainstore requires high-performance storage with at least 150,000 IOPS for both reads and writes. Use NVMe-based ephemeral storage (the storage does not need to be persistent). Do not use EBS volumes or other slower storage options like Azure’s standard local disks, as these will significantly degrade performance.
  • For Kubernetes deployments (GCP and Azure), each Brainstore pod must run on its own dedicated node to ensure optimal performance and resource isolation.