Reasoning models like OpenAI's o4-mini, Anthropic's Claude Sonnet 4.5, and Google's Gemini 2.5 Flash generate intermediate thinking steps before producing a final response. Braintrust provides unified support for these models across providers.
Hybrid deployments require v0.0.74 or later for reasoning support.
Three parameters control reasoning behavior (a sketch of the reasoning_budget form follows this list):

- reasoning_effort: Intensity of reasoning (low, medium, or high). Compatible with OpenAI's parameter of the same name.
- reasoning_enabled: Boolean that explicitly enables or disables reasoning output. Has no effect on OpenAI models, which always reason at a default "medium" effort.
- reasoning_budget: Token budget for reasoning. Use either reasoning_effort or reasoning_budget, not both.
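The examples in this guide all use reasoning_effort. As a point of comparison, here is a minimal sketch of the reasoning_budget form, assuming the same proxy setup shown in Basic usage below; the model name and budget value are illustrative:

```typescript
import { OpenAI } from "openai";
import "@braintrust/proxy/types"; // adds the reasoning parameters to the SDK types

const openai = new OpenAI({
  baseURL: `${process.env.BRAINTRUST_API_URL || "https://api.braintrust.dev"}/v1/proxy`,
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// reasoning_budget replaces reasoning_effort; don't pass both.
const response = await openai.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  reasoning_budget: 2048, // illustrative token budget for the thinking phase
  messages: [{ role: "user", content: "What's 15% of 240?" }],
});
```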
Use in code
Braintrust provides type augmentation for reasoning parameters:
- TypeScript: @braintrust/proxy/types extends the OpenAI SDK types
- Python: braintrust-proxy provides casting utilities and type-safe helpers
Basic usage
```typescript
import { OpenAI } from "openai";
import "@braintrust/proxy/types";

const openai = new OpenAI({
  baseURL: `${process.env.BRAINTRUST_API_URL || "https://api.braintrust.dev"}/v1/proxy`,
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  reasoning_effort: "medium",
  messages: [
    {
      role: "user",
      content: "What's 15% of 240?",
    },
  ],
});

// Access final response
console.log(response.choices[0].message.content);
// Output: "15% of 240 is 36."

// Access reasoning steps
console.log(response.choices[0].reasoning);
// Output: array of reasoning objects with the step-by-step calculation
```
Reasoning structure
Reasoning steps include unique IDs and content:
```json
[
  {
    "id": "reasoning_step_1",
    "content": "I need to calculate 15% of 240..."
  },
  {
    "id": "reasoning_step_2",
    "content": "240 × 0.15 = 36..."
  }
]
```
The id field contains provider-specific signatures that must be preserved in multi-turn conversations. Always use exact IDs returned by the provider.
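As a rough TypeScript sketch of that shape (the field names are inferred from the example above; the authoritative definitions ship with @braintrust/proxy/types):

```typescript
// Sketch of a single reasoning step, based on the example above.
// The real types come from "@braintrust/proxy/types".
interface ReasoningStep {
  id: string; // provider-specific signature; echo it back verbatim in later turns
  content: string; // the model's intermediate thinking for this step
}
```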
Stream reasoning
Reasoning streams through delta.reasoning in streaming responses:
```typescript
import { OpenAI } from "openai";
import "@braintrust/proxy/types";

const openai = new OpenAI({
  baseURL: `${process.env.BRAINTRUST_API_URL || "https://api.braintrust.dev"}/v1/proxy`,
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const stream = await openai.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  reasoning_effort: "high",
  stream: true,
  messages: [
    {
      role: "user",
      content: "Explain quantum entanglement in simple terms.",
    },
  ],
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;

  // Handle regular content
  if (delta?.content) {
    process.stdout.write(delta.content);
  }

  // Handle reasoning deltas
  if (delta?.reasoning) {
    console.log("\nReasoning step:", delta.reasoning);
  }
}
```
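If you want to reuse streamed reasoning in a later turn (see Multi-turn conversations below), one approach is to collect the reasoning deltas as they arrive. This is an alternative version of the loop above, sketched under the assumption that delta.reasoning chunks can simply be accumulated in arrival order:

```typescript
// Accumulate the streamed reasoning alongside the visible content.
// Assumes chunks arrive in order; the collected array is a sketch of what
// you might pass back as the assistant's reasoning in a follow-up request.
const reasoningChunks: unknown[] = [];
let finalContent = "";

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (delta?.content) {
    finalContent += delta.content;
  }
  if (delta?.reasoning) {
    reasoningChunks.push(delta.reasoning);
  }
}
```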
Multi-turn conversations
Include reasoning from previous turns to let models build on earlier thinking:
```typescript
import { OpenAI } from "openai";
import "@braintrust/proxy/types";

const openai = new OpenAI({
  baseURL: `${process.env.BRAINTRUST_API_URL || "https://api.braintrust.dev"}/v1/proxy`,
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const firstResponse = await openai.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  reasoning_effort: "medium",
  messages: [
    {
      role: "user",
      content: "What's the best approach to solve a complex math problem?",
    },
  ],
});

// Include previous reasoning in the next turn
const secondResponse = await openai.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  reasoning_effort: "medium",
  messages: [
    {
      role: "user",
      content: "What's the best approach to solve a complex math problem?",
    },
    {
      role: "assistant",
      content: firstResponse.choices[0].message.content,
      reasoning: firstResponse.choices[0].reasoning,
    },
    {
      role: "user",
      content: "Now apply that approach to solve: 2x² + 5x - 3 = 0",
    },
  ],
});
```
Test in playgrounds
Use playgrounds to test reasoning models interactively:
- Select a reasoning-capable model
- Set reasoning_effort in parameters
- Run evaluations
- View reasoning steps in trace view
Reasoning steps appear as separate spans in the trace, making it easy to understand the model’s thinking process.
Evaluate reasoning quality
Create scorers that evaluate both final outputs and reasoning steps:
```typescript
import * as braintrust from "braintrust";

// A project handle is needed before calling scorers.create;
// the project name here is illustrative.
const project = braintrust.projects.create({ name: "reasoning-demo" });

project.scorers.create({
  name: "Reasoning quality",
  slug: "reasoning-quality",
  messages: [
    {
      role: "user",
      content:
        'Evaluate the reasoning steps: {{reasoning}}\n\nAre they logical and complete? Return "A" for excellent, "B" for adequate, "C" for poor.',
    },
  ],
  model: "gpt-4o",
  choiceScores: {
    A: 1,
    B: 0.5,
    C: 0,
  },
});
```
This helps you understand whether models are using sound reasoning paths to reach conclusions.