Anthropic

Anthropic provides access to Claude models, including Claude Sonnet 4, Claude Opus 4.1, and other frontier language models. Braintrust integrates with Anthropic through direct API access, the wrapAnthropic wrapper function for automatic tracing, and the Braintrust AI Proxy.

Setup

To use Anthropic with Braintrust, you'll need an Anthropic API key.

  1. Visit Anthropic's Console and create a new API key
  2. Add the Anthropic API key to your organization's AI providers
  3. Set the Anthropic API key and your Braintrust API key as environment variables
.env
ANTHROPIC_API_KEY=<your-anthropic-api-key>
BRAINTRUST_API_KEY=<your-braintrust-api-key>
 
# If you are self-hosting Braintrust, set the URL of your hosted dataplane
# BRAINTRUST_API_URL=<your-braintrust-api-url>

API keys are encrypted using 256-bit AES-GCM encryption and are not stored or logged by Braintrust.

Install the braintrust and @anthropic-ai/sdk packages.

pnpm add braintrust @anthropic-ai/sdk

Trace with Anthropic

Trace your Anthropic LLM calls for observability and monitoring.

Trace automatically with wrapAnthropic

Braintrust provides wrapAnthropic (TypeScript) and wrap_anthropic (Python) functions that automatically log Anthropic API calls. Braintrust handles streaming, metric collection (including cached tokens), and other details.

Initialize the logger and pass the Anthropic client to the wrapAnthropic function.

wrapAnthropic is a convenience function that wraps the Anthropic client with the Braintrust logger. For more control, learn how to customize traces.

trace_anthropic.ts
import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic, initLogger } from "braintrust";
 
// Initialize the Braintrust logger
const logger = initLogger({
  projectName: "My Project", // Your project name
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
// Wrap the Anthropic client with the Braintrust logger
const client = wrapAnthropic(
  new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }),
);
 
// All API calls are automatically logged
const result = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is machine learning?" }],
});
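The response's `content` field is an array of typed blocks rather than a plain string, so extracting the text takes a small amount of narrowing in TypeScript. The following is a standalone sketch using a mock response shape (`extractText` and `mockContent` are illustrative helpers, not part of the Braintrust or Anthropic SDKs):

```typescript
// Sketch: the Messages API returns `content` as an array of typed blocks;
// only blocks with type "text" carry a `text` payload.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

function extractText(content: ContentBlock[]): string {
  return content
    .filter((b): b is Extract<ContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
}

// Mock content, standing in for `result.content` above:
const mockContent: ContentBlock[] = [
  { type: "text", text: "Machine learning is " },
  { type: "text", text: "a subfield of AI." },
];
console.log(extractText(mockContent)); // "Machine learning is a subfield of AI."
```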

Evaluate with Anthropic

Evaluations distill the non-deterministic outputs of Anthropic models into an effective feedback loop that enables you to ship more reliable, higher quality products. The Braintrust Eval function is composed of a dataset of user inputs, a task, and a set of scorers. To learn more about evaluations, see the Experiments guide.

Basic Anthropic eval setup

Evaluate the outputs of Anthropic models with Braintrust.

eval_anthropic.ts
import { Eval } from "braintrust";
import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
 
Eval("Anthropic Evaluation", {
  // An array of user inputs and expected outputs
  data: () => [
    { input: "What is 2+2?", expected: "4" },
    { input: "What is the capital of France?", expected: "Paris" },
  ],
  task: async (input) => {
    // Your Anthropic LLM call
    const response = await client.messages.create({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024,
      messages: [{ role: "user", content: input }],
    });
    const block = response.content[0];
    return block.type === "text" ? block.text : "";
  },
  scores: [
    // A simple scorer: 1 if the output matches the expected answer, 0 otherwise
    ({ output, expected }) => ({
      name: "accuracy",
      score: output === expected ? 1 : 0,
    }),
  ],
});
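Exact string comparison is brittle for LLM outputs, which can differ in case, whitespace, or trailing punctuation while still being correct. As one option, a scorer could normalize both sides before comparing; `normalizedMatch` below is a hypothetical helper, not part of the Braintrust SDK:

```typescript
// Hypothetical helper: normalize case, surrounding whitespace, and trailing
// sentence punctuation before comparing output to the expected answer.
function normalizedMatch(output: string, expected: string): number {
  const norm = (s: string) => s.trim().toLowerCase().replace(/[.!?]+$/, "");
  return norm(output) === norm(expected) ? 1 : 0;
}

console.log(normalizedMatch(" Paris. ", "paris")); // 1
console.log(normalizedMatch("London", "Paris")); // 0
```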

Learn more about eval data and scorers.

Use Anthropic as an LLM judge

You can use Anthropic models to score the outputs of other AI systems. This example uses the LLMClassifierFromSpec scorer to score the relevance of the outputs of an AI system.

Install the autoevals package to use the LLMClassifierFromSpec scorer.

pnpm add autoevals

Create a scorer that uses the LLMClassifierFromSpec scorer to score the relevance of the output. You can then include relevanceScorer as a scorer in your Eval function (see above).

anthropic_llm_judge.ts
import { LLMClassifierFromSpec } from "autoevals";
 
const relevanceScorer = LLMClassifierFromSpec("Relevance", {
  // The spec's prompt template defines the judging instructions
  prompt:
    "Is the submitted answer relevant to the question?\n\n" +
    "Question: {{input}}\nAnswer: {{output}}",
  choice_scores: { Relevant: 1, Irrelevant: 0 },
  model: "claude-3-5-sonnet-20241022",
  use_cot: true,
});

Additional features

Tool use

Anthropic's tool use (function calling) is fully supported:

anthropic_tool_use.ts
const tools = [
  {
    name: "get_weather",
    description: "Get current weather for a location",
    input_schema: {
      type: "object",
      properties: {
        location: { type: "string", description: "City name" },
      },
      required: ["location"],
    },
  },
];
 
const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
  tools, 
});
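When the model decides to call a tool, the response contains `tool_use` content blocks; you execute the tool yourself and reply with matching `tool_result` blocks in the next user message. The following standalone sketch uses mock data, and `runTool` is a hypothetical dispatcher into your own tool implementations:

```typescript
// Standalone sketch with mock data; `runTool` is a hypothetical dispatcher,
// not an SDK function.
type ToolUseBlock = {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
};
type Block = ToolUseBlock | { type: "text"; text: string };

// Build the `tool_result` blocks to send back in the next user message.
function toolResults(
  content: Block[],
  runTool: (name: string, input: Record<string, unknown>) => string,
) {
  return content
    .filter((b): b is ToolUseBlock => b.type === "tool_use")
    .map((b) => ({
      type: "tool_result" as const,
      tool_use_id: b.id,
      content: runTool(b.name, b.input),
    }));
}

// Mock content, standing in for `response.content` above:
const responseContent: Block[] = [
  { type: "tool_use", id: "tu_1", name: "get_weather", input: { location: "San Francisco" } },
];
const results = toolResults(responseContent, (name, input) =>
  name === "get_weather" ? `Sunny in ${String(input.location)}` : "unknown tool",
);
```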

System prompts

Anthropic models support system prompts for better instruction following.

anthropic_system_prompt.ts
const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: "You are a helpful assistant that responds in JSON format.", 
  messages: [{ role: "user", content: "What is the capital of France?" }],
});
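When prompting for JSON output, it helps to parse defensively, since models sometimes wrap the object in prose or code fences. `parseModelJson` below is a hypothetical helper (not part of any SDK) that extracts the outermost object:

```typescript
// Hypothetical helper: pull the outermost {...} span out of model text that
// may include surrounding prose or fences, then parse it.
function parseModelJson(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no JSON object found");
  return JSON.parse(text.slice(start, end + 1));
}

const parsed = parseModelJson('Sure! {"capital": "Paris"}') as { capital: string };
console.log(parsed.capital); // "Paris"
```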

Cached tokens

Anthropic supports prompt caching to reduce costs and latency for repeated content.

anthropic_cached_tokens.ts
const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an AI assistant analyzing the following document...",
      cache_control: { type: "ephemeral" }, 
    },
  ],
  messages: [{ role: "user", content: "Summarize the key points." }],
});
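Cache activity is reported in the response's `usage` block. The sketch below computes the share of prompt tokens served from the cache; the field names follow the Anthropic Messages API, the values are mock data, and it assumes `input_tokens` excludes cached tokens as reported by the API:

```typescript
// Field names follow the Anthropic Messages API usage block; values here
// are mock data. Assumes `input_tokens` excludes cached tokens.
type Usage = {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
};

// Fraction of prompt tokens that were served from the cache.
function cacheReadShare(usage: Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}

const mockUsage: Usage = { input_tokens: 10, output_tokens: 120, cache_read_input_tokens: 90 };
console.log(cacheReadShare(mockUsage)); // 0.9
```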

Multimodal content, attachments, errors, and masking sensitive data

To learn more about these topics, check out the customize traces guide.

Use Anthropic with Braintrust AI proxy

You can also access Anthropic models through the Braintrust AI Proxy, which provides a unified, OpenAI-compatible interface for multiple providers.

anthropic_proxy.ts
import { OpenAI } from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
const response = await client.chat.completions.create({
  model: "claude-3-5-sonnet-20241022",
  messages: [{ role: "user", content: "What is a proxy?" }],
  seed: 1, // A seed activates the proxy's cache
});

Models and capabilities

| Model | Multimodal | Reasoning | Max input | Max output | Input $/1M | Output $/1M |
| --- | --- | --- | --- | --- | --- | --- |
| claude-sonnet-4-20250514 | | | 200,000 | 64,000 | $3.00 | $15.00 |
| claude-4-sonnet-20250514 | | | 200,000 | 64,000 | $3.00 | $15.00 |
| claude-3-7-sonnet-latest | | | 200,000 | 128,000 | $3.00 | $15.00 |
| claude-3-7-sonnet-20250219 | | | 200,000 | 128,000 | $3.00 | $15.00 |
| claude-3-5-haiku-latest | | | 200,000 | 8,192 | $1.00 | $5.00 |
| claude-3-5-haiku-20241022 | | | 200,000 | 8,192 | $0.80 | $4.00 |
| claude-3-5-sonnet-latest | | | 200,000 | 8,192 | $3.00 | $15.00 |
| claude-3-5-sonnet-20241022 | | | 200,000 | 8,192 | $3.00 | $15.00 |
| claude-3-5-sonnet-20240620 | | | 200,000 | 8,192 | $3.00 | $15.00 |
| claude-opus-4-1-20250805 | | | 200,000 | 32,000 | $15.00 | $75.00 |
| claude-opus-4-20250514 | | | 200,000 | 32,000 | $15.00 | $75.00 |
| claude-4-opus-20250514 | | | 200,000 | 32,000 | $15.00 | $75.00 |
| claude-3-opus-latest | | | 200,000 | 4,096 | $15.00 | $75.00 |
| claude-3-opus-20240229 | | | 200,000 | 4,096 | $15.00 | $75.00 |
| claude-3-sonnet-20240229 | | | 200,000 | 4,096 | $3.00 | $15.00 |
| claude-3-haiku-20240307 | | | 200,000 | 4,096 | $0.25 | $1.25 |
| claude-instant-1.2 | | | 100,000 | 8,191 | $0.163 | $0.551 |
| claude-instant-1 | | | 100,000 | 8,191 | $1.63 | $5.51 |
| claude-2.1 | | | 200,000 | 8,191 | $8.00 | $24.00 |
| claude-2.0 | | | | | $8.00 | $24.00 |
| claude-2 | | | 100,000 | 8,191 | $8.00 | $24.00 |
| anthropic.claude-opus-4-1-20250805-v1:0 | | | 200,000 | 32,000 | $15.00 | $75.00 |