The Braintrust AI proxy provides unified access to models from OpenAI, Anthropic, Google, AWS, Mistral, and third-party providers through a single API. Point your OpenAI SDK to the proxy URL and immediately get automatic caching, observability, and multi-provider support. The proxy is free to use, even without a Braintrust account.
Not for production. The AI proxy is intended for development and testing. It has no production SLAs and may experience service interruptions, rate limiting, and timeouts.

Quickstart

You can use the proxy without a Braintrust account by providing your API key from any supported provider. If you have a Braintrust account, you can use a single Braintrust API key to access all AI providers through one interface. The proxy is fully compatible with the OpenAI SDK. Set the base URL to https://api.braintrust.dev/v1/proxy. Run the following script twice to see caching in action:
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

async function main() {
  const start = performance.now();
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // Can use claude-3-5-sonnet-latest, gemini-2.5-flash, etc.
    messages: [{ role: "user", content: "What is a proxy?" }],
    seed: 1, // A seed activates the cache
  });
  console.log(response.choices[0].message.content);
  console.log(`Took ${(performance.now() - start) / 1000}s`);
}

main();
The second run will be significantly faster because the proxy serves your request from its cache, rather than calling the AI provider’s model. The proxy runs on Cloudflare Workers and caches requests with end-to-end encryption. The proxy supports over 100 models including GPT-5, Claude 4, Gemini 2.5, and Llama models through providers like Together AI and AWS Bedrock. New models are added regularly.

Configure API keys

Add provider API keys in your organization settings under AI providers, configure them at the project level to override organization defaults, or set them up inline when running playgrounds or prompts. Then use your Braintrust API key to access all providers through the proxy.

Organization-level providers are available across all projects. Project-level providers take precedence over organization-level keys when making proxy requests in a project context, letting you isolate API usage, manage separate billing, or use different credentials per project. Without a Braintrust account, you can still use the proxy with individual provider API keys and get automatic caching.

The proxy response returns the x-bt-used-endpoint header, which specifies which of your configured providers was used to complete the request.
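For a quick check, read that header with the OpenAI SDK's withResponse() helper. This is a minimal sketch; it assumes you have at least one provider configured:
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

async function main() {
  // withResponse() exposes the raw HTTP response alongside the parsed body
  const { data, response } = await client.chat.completions
    .create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "What is a proxy?" }],
    })
    .withResponse();

  console.log(data.choices[0].message.content);
  console.log("Endpoint used:", response.headers.get("x-bt-used-endpoint"));
}

main();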

Supported providers

Standard providers include:
  • OpenAI (GPT-4o, GPT-4o-mini, o4-mini, etc.)
  • Anthropic (Claude 4 Sonnet, Claude 3.5 Sonnet, etc.)
  • Google (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)
  • AWS Bedrock (Claude, Llama, Mistral models)
  • Azure OpenAI Service
  • Third-party providers (Together AI, Fireworks, Groq, Replicate, etc.)
If you need a model that isn’t supported, let us know.

Enable caching

The proxy automatically caches results and reuses them when possible. Because the proxy runs on the edge, cached requests return in under 100ms. This is especially useful when developing and frequently re-running or evaluating the same prompts.

Cache modes

There are three caching modes: auto (default), always, and never:
  • In auto mode, requests are cached if they have temperature=0 or the seed parameter set and they are one of the supported paths.
  • In always mode, requests are cached as long as they are one of the supported paths.
  • In never mode, the cache is never read or written to.
The supported paths are:
  • /auto
  • /embeddings
  • /chat/completions
  • /completions
  • /moderations
Set the cache mode by passing the x-bt-use-cache header:
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  defaultHeaders: {
    "x-bt-use-cache": "always",
  },
  apiKey: process.env.BRAINTRUST_API_KEY,
});
The response includes x-bt-cached: HIT or MISS to indicate cache status.
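To verify caching behavior, read the header with withResponse(). Here's a minimal sketch using the client configured above; run it twice to see a MISS followed by a HIT:
async function checkCache() {
  const { data, response } = await client.chat.completions
    .create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "What is a proxy?" }],
    })
    .withResponse();

  console.log(data.choices[0].message.content);
  console.log("x-bt-cached:", response.headers.get("x-bt-cached")); // HIT or MISS
}

checkCache();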

Cache TTL

By default, cached results expire after 1 week. Set the TTL for individual requests by passing the x-bt-cache-ttl header. The TTL is specified in seconds and must be between 1 and 604800 (7 days).
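For example, to expire cached results after one hour instead (a sketch; the TTL is applied per request):
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  defaultHeaders: {
    "x-bt-use-cache": "always",
    "x-bt-cache-ttl": "3600", // expire cached results after 1 hour
  },
  apiKey: process.env.BRAINTRUST_API_KEY,
});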

Cache control

The proxy supports a limited set of Cache-Control directives:
  • To bypass the cache, set the Cache-Control header to no-cache, no-store. This is semantically equivalent to setting the x-bt-use-cache header to never.
  • To force a fresh request, set the Cache-Control header to no-cache. Without the no-store directive, the response will be cached for subsequent requests.
  • To request a cached response with a maximum age, set the Cache-Control header to max-age=<seconds>. If the cached data is older than the specified age, the cache will be bypassed and a new response will be generated. Combine this with no-store to bypass the cache for a request without overwriting the current cached response.
When cache control directives conflict with the x-bt-use-cache header, the cache control directives take precedence.

The proxy returns the x-bt-cached header in the response with HIT or MISS to indicate whether the response was served from the cache, the Age header to indicate the age of the cached response, and the Cache-Control header with the max-age directive to return the TTL of the cached response.

For example, to set the cache mode to always with a TTL of 2 days:
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  defaultHeaders: {
    "x-bt-use-cache": "always",
    "Cache-Control": "max-age=172800",
  },
  apiKey: process.env.BRAINTRUST_API_KEY,
});

async function main() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "What is a proxy?" }],
  });
  console.log(response.choices[0].message.content);
}

main();
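You can also send cache control directives on a single request instead of every request. Here's a sketch using the OpenAI SDK's per-request options to bypass the cache once without touching the stored entry:
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

async function main() {
  // The second argument to create() accepts per-request options,
  // including extra headers for just this call.
  const response = await client.chat.completions.create(
    {
      model: "gpt-4o",
      messages: [{ role: "user", content: "What is a proxy?" }],
    },
    {
      headers: { "Cache-Control": "no-cache, no-store" },
    },
  );
  console.log(response.choices[0].message.content);
}

main();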

Cache encryption

The proxy uses AES-GCM to encrypt the cache, using a key derived from your API key. Results are cached for 1 week unless otherwise specified in request headers. This design ensures that the cache is only accessible to you. Braintrust cannot see your data and does not store or log API keys.
Because the cache’s encryption key is your API key, cached results are scoped to an individual user. Braintrust customers can opt into sharing cached results across users within their organization.

Enable logging

To log requests that you make through the proxy, specify an x-bt-parent header with the project or experiment you'd like to log to. When tracing, you must authenticate with a BRAINTRUST_API_KEY rather than a provider's key; the proxy resolves your provider's key from your configured secrets and uses the BRAINTRUST_API_KEY to write the trace.
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  defaultHeaders: {
    "x-bt-parent": "project_id:YOUR_PROJECT_ID", // Replace with your project ID
  },
  apiKey: process.env.BRAINTRUST_API_KEY,
});

async function main() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "What is a proxy?" }],
  });
  console.log(response.choices[0].message.content);
}

main();
The x-bt-parent header sets the trace's parent project or experiment. You can use a prefix like project_id:, project_name:, or experiment_id:, or pass in a span slug (span.export()) to nest the trace under a span within the parent object.
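For example, to nest proxy calls under a span created with the Braintrust SDK (a sketch; it assumes the braintrust package is installed and a project named "My Project"):
import { OpenAI } from "openai";
import { initLogger } from "braintrust";

const logger = initLogger({ projectName: "My Project" });

async function main() {
  // Create a parent span, then export its slug for the x-bt-parent header
  const span = logger.startSpan({ name: "llm-step" });
  const parentSlug = await span.export();

  const client = new OpenAI({
    baseURL: "https://api.braintrust.dev/v1/proxy",
    defaultHeaders: {
      "x-bt-parent": parentSlug, // nest the proxy trace under this span
    },
    apiKey: process.env.BRAINTRUST_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "What is a proxy?" }],
  });
  console.log(response.choices[0].message.content);

  span.end();
}

main();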

Load balance across providers

If you have multiple API keys for a given model type (e.g., OpenAI and Azure for gpt-4o), the proxy automatically load balances across them. This is useful for working around per-account rate limits and providing resiliency if one provider is down. To set up load balancing:
  1. Add your primary provider key (e.g., OpenAI) in your organization settings.
  2. Add Azure OpenAI as a custom provider for the same models.
  3. The proxy automatically distributes requests across both.
Load balancing provides:
  • Resilience if one provider is down
  • Higher effective rate limits
  • Geographic distribution
Configure endpoints on the secrets page in your Braintrust account.

Use reasoning models

For hybrid deployments, reasoning support requires v0.0.74 or later.
The proxy lets you write one chat completion call that works across multiple providers by standardizing support for reasoning-specific features.
  • Supported providers: OpenAI, Anthropic, and Google
  • Unified parameters: Consistent parameters related to reasoning:
    • reasoning_effort: Specify the desired level of reasoning complexity
    • reasoning_enabled: Explicit flag to enable or disable reasoning output (has no effect for OpenAI models)
    • reasoning_budget: Specify a budget for the reasoning process (requires either reasoning_effort or reasoning_enabled)
  • Structured reasoning output: Responses include a list of reasoning objects as part of the assistant’s message. Each object contains the content of the reasoning step and a unique id. Include these reasoning objects from previous turns in subsequent requests to maintain context in multi-turn conversations.
  • Streaming support: A reasoning_delta is available when streaming, allowing you to process reasoning output as it is generated.
  • Type safety: Type augmentations are available for better developer experience. For JavaScript/TypeScript, use the @braintrust/proxy/types module to extend OpenAI’s types. For Python, the braintrust-proxy package provides casting utilities for input parameters and output objects.

Non-streaming request

Here’s a non-streaming chat completion request using a Google model with reasoning enabled:
import { OpenAI } from "openai";
import "@braintrust/proxy/types";

async function main() {
  const openai = new OpenAI({
    baseURL: `${process.env.BRAINTRUST_API_URL || "https://api.braintrust.dev"}/v1/proxy`,
    apiKey: process.env.BRAINTRUST_API_KEY,
  });

  try {
    const response = await openai.chat.completions.create({
      model: "gemini-2.5-flash",
      reasoning_enabled: true,
      reasoning_budget: 1024,
      stream: false,
      messages: [
        {
          role: "user",
          content: "How many rs in 'ferrocarril'",
        },
        {
          role: "assistant",
          content: "There are 4 letter 'r's in the word \"ferrocarril\".",
          reasoning: [
            {
              id: "",
              content:
                "To count the number of 'r's in the word 'ferrocarril', I'll just go through the word letter by letter.\n\n'ferrocarril' has the following letters:\nf-e-r-r-o-c-a-r-r-i-l\n\nLooking at each letter:\n- 'f': not an 'r'\n- 'e': not an 'r'\n- 'r': This is an 'r', so that's 1.\n- 'r': This is an 'r', so that's 2.\n- 'o': not an 'r'\n- 'c': not an 'r'\n- 'a': not an 'r'\n- 'r': This is an 'r', so that's 3.\n- 'r': This is an 'r', so that's 4.\n- 'i': not an 'r'\n- 'l': not an 'r'\n\nSo there are 4 'r's in the word 'ferrocarril'.",
            },
          ],
        },
        {
          role: "user",
          content: "How many e in what you said?",
        },
      ],
    });

    console.log({
      message: response.choices[0].message,
      reasoning: response.choices[0].reasoning,
    });
  } catch (error) {
    console.error("Error during non-streaming request:", error);
  }
}

main().catch(console.error);

Streaming request

This example shows how to handle the reasoning_delta when streaming chat completion responses:
import { OpenAI } from "openai";
import "@braintrust/proxy/types";

async function main() {
  const openai = new OpenAI({
    baseURL: `${process.env.BRAINTRUST_API_URL || "https://api.braintrust.dev"}/v1/proxy`,
    apiKey: process.env.BRAINTRUST_API_KEY,
  });

  try {
    console.log("Streaming Request:");
    const stream = await openai.chat.completions.create({
      model: "claude-sonnet-4",
      messages: [
        {
          role: "user",
          content: "Tell me a short story.",
        },
      ],
      reasoning_effort: "high",
      stream: true,
    });

    for await (const event of stream) {
      if (event.choices && event.choices[0].delta) {
        const delta = event.choices[0].delta;
        if (delta.content) {
          process.stdout.write(`Content: ${delta.content}`);
        }
        if (delta.reasoning) {
          console.log("\nReasoning delta:", delta.reasoning);
        }
      }
    }
    console.log("\nStreaming Finished.");
  } catch (error) {
    console.error("Error during streaming request:", error);
  }
}

main().catch(console.error);

Use alternative protocols

The proxy translates OpenAI requests into various provider APIs automatically. You can also use native Anthropic and Gemini API schemas.

Anthropic API

curl -X POST https://api.braintrust.dev/v1/proxy/anthropic/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20240620",
    "messages": [{"role": "user", "content": "What is a proxy?"}]
  }'
The anthropic-version and x-api-key headers are not required.

Gemini API

curl -X POST https://api.braintrust.dev/v1/proxy/google/models/gemini-2.5-flash:generateContent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "What is a proxy?"}]
      }
    ]
  }'

Add custom providers

Add custom models or endpoints to use with the proxy. Custom providers support self-hosted models, fine-tuned models, and proprietary AI services. See Custom providers for setup instructions and configuration options.

Use realtime models

The proxy supports the OpenAI Realtime API at the /realtime endpoint using WebSockets. Use the official OpenAI SDK (v6.0+) to connect to the proxy’s realtime endpoint.
Use https://braintrustproxy.com/v1, not https://api.braintrust.dev/v1/proxy, for WebSocket-based proxying.

Node.js with ws library

In Node.js environments, use OpenAIRealtimeWS from the openai/realtime/ws module:
import { OpenAIRealtimeWS } from "openai/realtime/ws";

const rt = new OpenAIRealtimeWS(
  {
    model: "gpt-realtime",
  },
  {
    apiKey: process.env.BRAINTRUST_API_KEY,
    baseURL: "https://braintrustproxy.com/v1",
  },
);

rt.socket.addEventListener("open", () => {
  console.log("Connection opened!");

  rt.send({
    type: "session.update",
    session: {
      output_modalities: ["text"], // or ["audio"]
      model: "gpt-realtime",
      type: "realtime",
    },
  });

  rt.send({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: "Say a couple paragraphs!" }],
    },
  });

  rt.send({ type: "response.create" });
});

rt.on("error", (err) => {
  console.error("Error:", err);
});

rt.on("response.output_text.delta", (event) => {
  process.stdout.write(event.delta);
});

rt.on("response.done", () => rt.close());

rt.socket.addEventListener("close", () => {
  console.log("\nConnection closed!");
});

Log realtime sessions

To log realtime sessions to Braintrust, pass the x-bt-parent header when creating the connection:
import { OpenAIRealtimeWS } from "openai/realtime/ws";
import { initLogger } from "braintrust";

async function main() {
  const logger = initLogger({ projectName: "My Realtime Project" });

  const rt = new OpenAIRealtimeWS(
    {
      model: "gpt-realtime",
      options: {
        headers: {
          "x-bt-parent": `project_id:${(await logger.project).id}`,
        },
      },
    },
    {
      apiKey: process.env.BRAINTRUST_API_KEY,
      baseURL: "https://braintrustproxy.com/v1",
    },
  );

  rt.socket.addEventListener("open", () => {
    console.log("Connection opened!");

    rt.send({
      type: "session.update",
      session: {
        output_modalities: ["text"], // or ["audio"]
        model: "gpt-realtime",
        type: "realtime",
      },
    });

    rt.send({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text: "Say hello!" }],
      },
    });

    rt.send({ type: "response.create" });
  });

  rt.on("error", (err) => {
    console.error("Error:", err);
  });

  rt.on("response.output_text.delta", (event) =>
    process.stdout.write(event.delta),
  );

  rt.on("response.done", () => rt.close());

  rt.socket.addEventListener("close", () => {
    console.log("\nConnection closed!");
  });
}

main();
The proxy automatically logs audio, transcripts, and metadata to the specified project. Pass an experiment ID or span slug to log to a specific location. The OpenAI Realtime API uses different event names for output depending on the modality:
  • Text output: response.output_text.delta and response.output_text.done
  • Audio output: response.output_audio.delta and response.output_audio.done
  • Audio transcripts: response.output_audio_transcript.delta and response.output_audio_transcript.done
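For example, when requesting audio output, you can handle the transcript events with the same pattern as the text handler above (a sketch):
// Stream the transcript of the audio output as it is generated
rt.on("response.output_audio_transcript.delta", (event) => {
  process.stdout.write(event.delta);
});

rt.on("response.output_audio_transcript.done", () => {
  console.log("\nTranscript complete.");
});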

Compress audio

To reduce storage costs, enable audio compression by setting the x-bt-compress-audio header to true or 1:
import { OpenAIRealtimeWS } from "openai/realtime/ws";

async function main() {
  const projectId = "your-project-id"; // Replace with your project ID

  const rt = new OpenAIRealtimeWS(
    {
      model: "gpt-realtime",
      options: {
        headers: {
          "x-bt-parent": `project_id:${projectId}`,
          "x-bt-compress-audio": "true",
        },
      },
    },
    {
      apiKey: process.env.BRAINTRUST_API_KEY,
      baseURL: "https://braintrustproxy.com/v1",
    },
  );
}

main();
When enabled, the proxy compresses audio using MP3 encoding before logging it to Braintrust to significantly reduce storage requirements.

Browser or Cloudflare workers

For browser and Cloudflare Workers environments, use OpenAIRealtimeWebSocket:
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const rt = new OpenAIRealtimeWebSocket(
  {
    model: "gpt-realtime",
  },
  {
    apiKey: process.env.BRAINTRUST_API_KEY,
    baseURL: "https://braintrustproxy.com/v1",
  },
);

rt.socket.addEventListener("open", () => {
  console.log("Connection opened!");

  rt.send({
    type: "session.update",
    session: {
      output_modalities: ["text"], // or ["audio"]
      model: "gpt-realtime",
      type: "realtime",
    },
  });

  rt.send({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: "Say a couple paragraphs!" }],
    },
  });

  rt.send({ type: "response.create" });
});

rt.on("error", (err) => {
  console.error("Error:", err);
});

rt.on("response.output_text.delta", (event) => {
  console.log(event.delta);
});

rt.on("response.done", () => rt.close());

rt.socket.addEventListener("close", () => {
  console.log("\nConnection closed!");
});

Temporary credentials for realtime

For frontend or mobile applications, use temporary credentials to avoid exposing your API key. Pass the temporary credential as the apiKey:
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

async function main() {
  const tempCredential = await fetchTempCredentialFromBackend(); // Replace with your backend call

  const rt = new OpenAIRealtimeWebSocket(
    {
      model: "gpt-realtime",
    },
    {
      apiKey: tempCredential,
      baseURL: "https://braintrustproxy.com/v1",
    },
  );

  rt.socket.addEventListener("open", () => {
    console.log("Connection opened!");

    rt.send({
      type: "session.update",
      session: {
        output_modalities: ["text"], // or ["audio"]
        model: "gpt-realtime",
        type: "realtime",
      },
    });

    rt.send({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text: "Say hello!" }],
      },
    });

    rt.send({ type: "response.create" });
  });

  rt.on("error", (err) => {
    console.error("Error:", err);
  });

  rt.on("response.output_text.delta", (event) => {
    console.log(event.delta);
  });

  rt.on("response.done", () => rt.close());

  rt.socket.addEventListener("close", () => {
    console.log("\nConnection closed!");
  });
}

declare function fetchTempCredentialFromBackend(): Promise<string>;

main();

Create temporary credentials

A temporary credential converts your Braintrust API key (or model provider API key) to a time-limited credential that can be safely shared with end users.
  • Temporary credentials can carry additional information to limit access to a particular model and enable logging to Braintrust.
  • They can be used in the Authorization header anywhere you’d use a Braintrust API key or a model provider API key.
Use temporary credentials if you’d like your frontend or mobile app to send AI requests to the proxy directly, minimizing latency without exposing your API keys to end users.

Issue temporary credentials

Call the /credentials endpoint from a privileged location, such as your app's backend, to issue temporary credentials. The temporary credential will be allowed to make requests on behalf of the Braintrust API key (or model provider API key) provided in the Authorization header. The body should specify the restrictions to be applied to the temporary credentials as a JSON object. If the logging key is present, the proxy will log to Braintrust any requests made with this temporary credential.

The following example grants access to gpt-4o-realtime-preview-2024-10-01 for 10 minutes, logging the requests to the project named "My project":
const PROXY_URL =
  process.env.BRAINTRUST_PROXY_URL || "https://braintrustproxy.com/v1";
const BRAINTRUST_API_KEY = process.env.BRAINTRUST_API_KEY;

async function main() {
  const response = await fetch(`${PROXY_URL}/credentials`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${BRAINTRUST_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2024-10-01", // Leave undefined to allow all models
      ttl_seconds: 60 * 10, // 10 minutes
      logging: {
        project_name: "My project", // Replace with your project name
      },
    }),
    cache: "no-store",
  });

  if (!response.ok) {
    const error = await response.text();
    throw new Error(`Failed to request temporary credentials: ${error}`);
  }

  const { key: tempCredential } = await response.json();
  console.log(`Authorization: Bearer ${tempCredential}`);
}

main();
Generate temporary credentials using the web form for quick testing.

Inspect temporary credentials

Temporary credentials are formatted as JSON Web Tokens (JWT). Inspect the JWT’s payload using a library such as jsonwebtoken or a web-based tool like JWT.io to determine the expiration time and granted models:
import { decode as jwtDecode } from "jsonwebtoken";

const tempCredential = "<your temporary credential>";
const payload = jwtDecode(tempCredential, { complete: false, json: true });
// Example output:
// {
//   "aud": "braintrust_proxy",
//   "bt": {
//     "model": "gpt-4o",
//     "secret": "nCCxgkBoyy/zyOJlikuHILBMoK78bHFosEzy03SjJF0=",
//     "logging": {
//       "project_name": "My project"
//     }
//   },
//   "exp": 1729928077,
//   "iat": 1729927977,
//   "iss": "braintrust_proxy",
//   "jti": "bt_tmp:331278af-937c-4f97-9d42-42c83631001a"
// }
console.log(JSON.stringify(payload, null, 2));
Do not modify the JWT payload. This will invalidate the signature. Instead, issue a new temporary credential using the /credentials endpoint.

Use PDF input

The proxy extends the OpenAI API to support PDF input. Pass PDF URLs or base64-encoded PDFs with MIME type application/pdf:
curl https://api.braintrust.dev/v1/proxy/auto \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "Extract the text from this PDF."},
        {"type": "image_url", "image_url": {"url": "https://example.com/document.pdf"}}
      ]}
    ]
  }'
For base64-encoded PDFs, use data:application/pdf;base64,<BASE64_DATA> as the URL.
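For example, here's a sketch that reads a local file and sends it as a data URL (it assumes a document.pdf in the working directory):
import fs from "node:fs";
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

async function main() {
  // Encode the PDF as a base64 data URL
  const base64 = fs.readFileSync("document.pdf").toString("base64");

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Extract the text from this PDF." },
          {
            type: "image_url",
            image_url: { url: `data:application/pdf;base64,${base64}` },
          },
        ],
      },
    ],
  });
  console.log(response.choices[0].message.content);
}

main();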

Specify an organization

If you’re part of multiple organizations, specify which to use with the x-bt-org-name header:
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  defaultHeaders: {
    "x-bt-org-name": "Acme Inc", // Replace with your organization name
  },
  apiKey: process.env.BRAINTRUST_API_KEY,
});

Advanced configuration

Configure proxy behavior with these headers:
  • x-bt-use-cache: auto | always | never - Control caching behavior
  • x-bt-cache-ttl: Seconds (max 604800) - Set cache TTL
  • x-bt-use-creds-cache: auto | always | never - Control credentials caching (useful when rapidly updating credentials)
  • x-bt-org-name: Organization name - Specify organization for multi-org users
  • x-bt-endpoint-name: Endpoint name - Use a specific configured endpoint
  • x-bt-parent: Project/experiment/span - Enable logging to Braintrust
  • x-bt-compress-audio: true | false - Enable audio compression for realtime sessions
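These headers can be combined on a single client. A sketch with illustrative values:
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  defaultHeaders: {
    "x-bt-use-cache": "always",
    "x-bt-cache-ttl": "86400", // cache for 1 day
    "x-bt-org-name": "Acme Inc", // Replace with your organization name
    "x-bt-parent": "project_id:YOUR_PROJECT_ID", // Replace with your project ID
  },
  apiKey: process.env.BRAINTRUST_API_KEY,
});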

Monitor proxy usage

Track proxy usage across your organization:
  1. Create a project for proxy logs.
  2. Enable logging by setting the x-bt-parent header when calling the proxy (see Enable logging).
  3. View logs in the Logs page.
  4. Create dashboards to track usage, costs, and errors.
The proxy response includes the x-bt-used-endpoint header, which specifies which of your configured providers was used to complete the request.

Self-hosting

Self-hosted Braintrust deployments include a built-in proxy that runs in your environment. To configure your proxy URLs, see Configure API URLs in organization settings. For complete deployment instructions, see Self-hosting.
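Once configured, point the SDK at your deployment's proxy the same way (a sketch; replace the URL with your deployment's configured API URL):
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.your-domain.com/v1/proxy", // Replace with your API URL
  apiKey: process.env.BRAINTRUST_API_KEY,
});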

Integration with Braintrust

Several features in Braintrust are powered by the proxy. For example, when you create a playground, the proxy handles running the LLM calls. Similarly, if you create a prompt, when you preview the prompt’s results, the proxy is used to run the LLM. However, the proxy is not required when you:
  • Run evaluations in your code.
  • Load prompts to run in your code.
  • Log traces to Braintrust.
If you’d like to use it in your code to help with caching, secrets management, and other features, follow the instructions above to set it as the base URL in your OpenAI client.

Open source

The AI proxy is open source. View the code on GitHub.
