This is an experimental feature. The API may change based on user feedback.
LangSmith is LangChain’s platform for tracing, evaluation, and monitoring of LLM applications. Braintrust provides an experimental wrapper to integrate LangSmith with Braintrust. The wrapper can either send tracing and evaluation calls to both LangSmith and Braintrust in parallel, or route them solely to Braintrust, with minimal code changes.
The wrapper supports two modes:
- Parallel (default): Send traces and evaluations to both LangSmith and Braintrust simultaneously. Use this to compare services, maintain existing workflows, or run both long-term.
- Standalone: Send traces and evaluations only to Braintrust. Use this when you want to use Braintrust exclusively.
Setup
Install LangSmith alongside the Braintrust SDK (requires Braintrust Python SDK v0.4.3 or later):
# uv
uv add braintrust langsmith
# pip
pip install braintrust langsmith
Set your Braintrust API key as an environment variable:
export BRAINTRUST_API_KEY=your-braintrust-api-key
Make sure you have LangSmith environment variables set as well:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_PROJECT=your-project-name
export LANGSMITH_API_KEY=your-langsmith-api-key
Tracing
The wrapper automatically redirects:
- Functions decorated with LangSmith’s @traceable to Braintrust’s @traced (see the sketch after this list)
- Nested span hierarchies with inputs and outputs
- Complete execution traces with metadata
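For reference, here is roughly what the equivalent code looks like when written directly against the Braintrust SDK with init_logger and @traced. This is an illustrative sketch of the mapping, not the wrapper's implementation:
import os

from braintrust import init_logger, traced

# Direct Braintrust SDK usage (no wrapper): initialize a logger for the project
init_logger(project="langsmith-integration", api_key=os.environ.get("BRAINTRUST_API_KEY"))


@traced(name="chat_completion")
def chat_completion(question: str) -> str:
    # Your model call would go here; a placeholder keeps the sketch self-contained
    return f"(model answer to: {question})"


if __name__ == "__main__":
    print(chat_completion("What is machine learning?"))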
Parallel tracing
By default, traces are sent to both LangSmith and Braintrust simultaneously.
To use the wrapper, call setup_langsmith() before importing from LangSmith modules:
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import traceable
from openai import OpenAI

client = OpenAI()


@traceable(name="chat_completion")
def chat_completion() -> str:
    """Single traced call."""
    result = client.responses.create(
        model="gpt-5-mini",
        input=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"},
        ],
    )
    return result.output_text


if __name__ == "__main__":
    print(chat_completion())
The wrapper automatically reads the project name from the LANGCHAIN_PROJECT environment variable. You can override this by passing project_name to setup_langsmith().
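For example (a minimal sketch; it assumes project_name may be omitted when LANGCHAIN_PROJECT is set):
import os

from braintrust.wrappers.langsmith_wrapper import setup_langsmith

# Project name comes from LANGCHAIN_PROJECT (e.g. "your-project-name")
setup_langsmith(api_key=os.environ.get("BRAINTRUST_API_KEY"))

# Passing project_name explicitly overrides the environment variable:
# setup_langsmith(
#     project_name="langsmith-integration",
#     api_key=os.environ.get("BRAINTRUST_API_KEY"),
# )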
Standalone tracing
With standalone mode, traces are sent only to Braintrust.
To enable standalone tracing, set standalone=True:
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
    standalone=True,  # Only Braintrust will receive traces
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import traceable
from openai import OpenAI

client = OpenAI()


@traceable(name="chat_completion")
def chat_completion() -> str:
    """Single traced call."""
    result = client.responses.create(
        model="gpt-5-mini",
        input=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"},
        ],
    )
    return result.output_text


if __name__ == "__main__":
    print(chat_completion())
You can also enable standalone mode via environment variable:
export BRAINTRUST_STANDALONE=1
Evaluations
The wrapper automatically redirects:
- evaluate() calls to Braintrust’s Eval() function
- aevaluate() calls to Braintrust’s EvalAsync() function
Just like for tracing, you can send evaluation calls to both LangSmith and Braintrust in parallel, or only to Braintrust.
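The async path is not shown in the full examples below, so here is a minimal sketch of it. It assumes your LangSmith version exports aevaluate at the package root and that the multiply-dataset-example dataset created in the parallel example below already exists:
import asyncio
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

from langsmith import aevaluate


async def multiply(inputs: dict, **kwargs) -> int:
    """Async target function."""
    return inputs["x"] * inputs["y"]


def exact_match(inputs: dict, outputs: dict, reference_outputs: dict) -> bool:
    # Returning a plain bool is also a valid LangSmith evaluator result
    return outputs["output"] == reference_outputs["output"]


async def main():
    await aevaluate(
        multiply,
        data="multiply-dataset-example",  # assumes this dataset already exists
        evaluators=[exact_match],
        experiment_prefix="multiply-test-async",
    )


if __name__ == "__main__":
    asyncio.run(main())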
Parallel evals
Evaluators follow the LangSmith signature: (inputs, outputs, reference_outputs) -> bool | dict. The wrapper automatically converts these to Braintrust scorers.
When LangSmith evals are instrumented with @traceable, scores show up both in experiments and in logs.
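For instance, the target itself can carry the decorator. This sketch is an assumed variant of the multiply target used below; setup_langsmith() must already have run before the langsmith import, as in the full examples:
from langsmith import traceable


@traceable(name="multiply")
def multiply(inputs: dict, **kwargs) -> int:
    # Patched by the wrapper: this produces a Braintrust span for every row
    # that evaluate() runs, in addition to the evaluator scores
    return inputs["x"] * inputs["y"]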
eval_langsmith_parallel.py
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import Client, traceable


# Define a target function (the function being evaluated)
# LangSmith requires the parameter to be named 'inputs' (or 'attachments'/'metadata')
def multiply(inputs: dict, **kwargs) -> int:
    """Multiply two numbers.

    Args:
        inputs: Dictionary with 'x' and 'y' keys
        **kwargs: Additional arguments (e.g., langsmith_extra from LangSmith)
    """
    return inputs["x"] * inputs["y"]


# Define LangSmith-style evaluators
# LangSmith evaluators use signature: (inputs, outputs, reference_outputs) -> bool | dict
# When target returns a plain value, LangSmith wraps it as {"output": value}
def exact_match_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks for exact match."""
    expected = reference_outputs["output"]
    actual = outputs["output"]
    return {
        "key": "exact_match",
        "score": 1.0 if actual == expected else 0.0,
    }


def range_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks if result is in expected range."""
    actual = outputs["output"]
    expected = reference_outputs["output"]
    # Check if within 10% of expected
    if expected == 0:
        score = 1.0 if actual == 0 else 0.0
    else:
        diff = abs(actual - expected) / abs(expected)
        score = 1.0 if diff <= 0.1 else 0.0
    return {
        "key": "within_range",
        "score": score,
        "metadata": {"actual": actual, "expected": expected},
    }


def main():
    print("LangSmith to Braintrust Evaluation Example")
    print("=" * 50)
    print()

    # Create a LangSmith client (patched to use Braintrust)
    client = Client()

    # Create a dataset in LangSmith (proper LangSmith API usage)
    dataset_name = "multiply-dataset-example"

    # Try to get or create the dataset
    try:
        dataset = client.read_dataset(dataset_name=dataset_name)
        print(f"Using existing dataset: {dataset_name}")
    except Exception:
        # Create new dataset if it doesn't exist
        dataset = client.create_dataset(dataset_name=dataset_name, description="Multiplication test dataset")
        print(f"Created new dataset: {dataset_name}")

    # Create examples in the dataset (proper LangSmith API)
    client.create_examples(
        dataset_id=dataset.id,
        examples=[
            {"inputs": {"x": 2, "y": 3}, "outputs": {"output": 6}},
            {"inputs": {"x": 5, "y": 5}, "outputs": {"output": 25}},
            {"inputs": {"x": 10, "y": 0}, "outputs": {"output": 0}},
            {"inputs": {"x": 7, "y": 8}, "outputs": {"output": 56}},
        ],
    )
    print("Created 4 examples in dataset")
    print()
    print("Running evaluation...")
    print()

    # Run evaluation using LangSmith's API (redirects to Braintrust)
    # Pass the dataset name - this is valid LangSmith API usage
    client.evaluate(
        multiply,  # Target function
        data=dataset_name,  # Dataset name (valid LangSmith API)
        evaluators=[exact_match_evaluator, range_evaluator],
        experiment_prefix="multiply-test",
        description="Testing multiplication function",
        metadata={"version": "1.0", "migrated_from": "langsmith"},
    )

    print()
    print("=" * 50)
    print("✓ Evaluation completed!")
    print("Check Braintrust to see the experiment results.")


if __name__ == "__main__":
    main()
Standalone evals
With standalone mode, evaluation results are sent only to Braintrust. Set standalone=True in setup_langsmith():
eval_langsmith_standalone.py
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
    standalone=True,  # Only Braintrust will receive evals
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import Client, traceable


# Define a target function (the function being evaluated)
# LangSmith requires the parameter to be named 'inputs' (or 'attachments'/'metadata')
def multiply(inputs: dict, **kwargs) -> int:
    """Multiply two numbers.

    Args:
        inputs: Dictionary with 'x' and 'y' keys
        **kwargs: Additional arguments (e.g., langsmith_extra from LangSmith)
    """
    return inputs["x"] * inputs["y"]


# Define LangSmith-style evaluators
# LangSmith evaluators use signature: (inputs, outputs, reference_outputs) -> bool | dict
# When target returns a plain value, LangSmith wraps it as {"output": value}
def exact_match_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks for exact match."""
    expected = reference_outputs["output"]
    actual = outputs["output"]
    return {
        "key": "exact_match",
        "score": 1.0 if actual == expected else 0.0,
    }


def range_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks if result is in expected range."""
    actual = outputs["output"]
    expected = reference_outputs["output"]
    # Check if within 10% of expected
    if expected == 0:
        score = 1.0 if actual == 0 else 0.0
    else:
        diff = abs(actual - expected) / abs(expected)
        score = 1.0 if diff <= 0.1 else 0.0
    return {
        "key": "within_range",
        "score": score,
        "metadata": {"actual": actual, "expected": expected},
    }


def main():
    print("LangSmith to Braintrust Evaluation Example")
    print("=" * 50)
    print()

    # Create a LangSmith client (patched to use Braintrust)
    client = Client()

    # Create a dataset in LangSmith (proper LangSmith API usage)
    dataset_name = "multiply-dataset-example"

    # Try to get or create the dataset
    try:
        dataset = client.read_dataset(dataset_name=dataset_name)
        print(f"Using existing dataset: {dataset_name}")
    except Exception:
        # Create new dataset if it doesn't exist
        dataset = client.create_dataset(dataset_name=dataset_name, description="Multiplication test dataset")
        print(f"Created new dataset: {dataset_name}")

    # Create examples in the dataset (proper LangSmith API)
    client.create_examples(
        dataset_id=dataset.id,
        examples=[
            {"inputs": {"x": 2, "y": 3}, "outputs": {"output": 6}},
            {"inputs": {"x": 5, "y": 5}, "outputs": {"output": 25}},
            {"inputs": {"x": 10, "y": 0}, "outputs": {"output": 0}},
            {"inputs": {"x": 7, "y": 8}, "outputs": {"output": 56}},
        ],
    )
    print("Created 4 examples in dataset")
    print()
    print("Running evaluation...")
    print()

    # Run evaluation using LangSmith's API (redirects to Braintrust)
    # Pass the dataset name - this is valid LangSmith API usage
    client.evaluate(
        multiply,  # Target function
        data=dataset_name,  # Dataset name (valid LangSmith API)
        evaluators=[exact_match_evaluator, range_evaluator],
        experiment_prefix="multiply-test",
        description="Testing multiplication function",
        metadata={"version": "1.0", "migrated_from": "langsmith"},
    )

    print()
    print("=" * 50)
    print("✓ Evaluation completed!")
    print("Check Braintrust to see the experiment results.")


if __name__ == "__main__":
    main()
Resources