This is an experimental feature. The API may change based on user feedback.
LangSmith is LangChain’s platform for tracing, evaluation, and monitoring of LLM applications. Braintrust provides an experimental wrapper to integrate LangSmith with Braintrust. The wrapper can either send tracing and evaluation calls to both LangSmith and Braintrust in parallel, or route them solely to Braintrust, with minimal code changes.
The wrapper supports two modes:
- Parallel (default): Send traces and evaluations to both LangSmith and Braintrust simultaneously. Use this to compare services, maintain existing workflows, or run both long-term.
- Standalone: Send traces and evaluations only to Braintrust. Use this when you want to use Braintrust exclusively.
Setup
Install LangSmith alongside the Braintrust SDK (requires Braintrust Python SDK v0.4.3 or later):
# uv
uv add braintrust langsmith
# pip
pip install braintrust langsmith
Set your Braintrust API key as an environment variable:
export BRAINTRUST_API_KEY=your-braintrust-api-key
Make sure you have LangSmith environment variables set as well:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_PROJECT=your-project-name
export LANGSMITH_API_KEY=your-langsmith-api-key
Tracing
The wrapper automatically redirects:
- Functions decorated with LangSmith’s @traceable to Braintrust’s @traced (see the sketch after this list)
- Nested span hierarchies with inputs and outputs
- Complete execution traces with metadata
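For reference, here is roughly what the equivalent code looks like when written directly against the Braintrust SDK with init_logger and @traced. This is an illustrative sketch of the mapping, not the wrapper's implementation:
import os

from braintrust import init_logger, traced

# Direct Braintrust SDK usage (no wrapper): initialize a logger for the project
init_logger(project="langsmith-integration", api_key=os.environ.get("BRAINTRUST_API_KEY"))


@traced(name="chat_completion")
def chat_completion(question: str) -> str:
    # Your model call would go here; a placeholder keeps the sketch self-contained
    return f"(model answer to: {question})"


if __name__ == "__main__":
    print(chat_completion("What is machine learning?"))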
Parallel tracing
By default, traces are sent to both LangSmith and Braintrust simultaneously.
To use the wrapper, call setup_langsmith() before importing from LangSmith modules:
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import traceable
from openai import OpenAI

client = OpenAI()


@traceable(name="chat_completion")
def chat_completion() -> str:
    """Single traced call."""
    result = client.responses.create(
        model="gpt-5-mini",
        input=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"},
        ],
    )
    return result.output_text


if __name__ == "__main__":
    print(chat_completion())
The wrapper automatically reads the project name from the LANGCHAIN_PROJECT environment variable. You can override this by passing project_name to setup_langsmith().
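For example (a minimal sketch; it assumes project_name may be omitted when LANGCHAIN_PROJECT is set):
import os

from braintrust.wrappers.langsmith_wrapper import setup_langsmith

# Project name comes from LANGCHAIN_PROJECT (e.g. "your-project-name")
setup_langsmith(api_key=os.environ.get("BRAINTRUST_API_KEY"))

# Passing project_name explicitly overrides the environment variable:
# setup_langsmith(
#     project_name="langsmith-integration",
#     api_key=os.environ.get("BRAINTRUST_API_KEY"),
# )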
Standalone tracing
With standalone mode, traces are sent only to Braintrust.
To enable standalone tracing, set standalone=True:
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
    standalone=True,  # Only Braintrust will receive traces
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import traceable
from openai import OpenAI

client = OpenAI()


@traceable(name="chat_completion")
def chat_completion() -> str:
    """Single traced call."""
    result = client.responses.create(
        model="gpt-5-mini",
        input=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"},
        ],
    )
    return result.output_text


if __name__ == "__main__":
    print(chat_completion())
You can also enable standalone mode via environment variable:
export BRAINTRUST_STANDALONE=1
Evaluations
The wrapper automatically redirects:
- evaluate() calls to Braintrust’s Eval() function
- aevaluate() calls to Braintrust’s EvalAsync() function
Just like for tracing, you can send evaluation calls to both LangSmith and Braintrust in parallel, or only to Braintrust.
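The async path is not shown in the full examples below, so here is a minimal sketch of it. It assumes your LangSmith version exports aevaluate at the package root and that the multiply-dataset-example dataset created in the parallel example below already exists:
import asyncio
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

from langsmith import aevaluate


async def multiply(inputs: dict, **kwargs) -> int:
    """Async target function."""
    return inputs["x"] * inputs["y"]


def exact_match(inputs: dict, outputs: dict, reference_outputs: dict) -> bool:
    # Returning a plain bool is also a valid LangSmith evaluator result
    return outputs["output"] == reference_outputs["output"]


async def main():
    await aevaluate(
        multiply,
        data="multiply-dataset-example",  # assumes this dataset already exists
        evaluators=[exact_match],
        experiment_prefix="multiply-test-async",
    )


if __name__ == "__main__":
    asyncio.run(main())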
Parallel evals
Evaluators follow the LangSmith signature: (inputs, outputs, reference_outputs) -> bool | dict. The wrapper automatically converts these to Braintrust scorers.
When LangSmith evals are instrumented with @traceable, scores show up both in experiments and in logs.
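For instance, the target itself can carry the decorator. This sketch is an assumed variant of the multiply target used below; setup_langsmith() must already have run before the langsmith import, as in the full examples:
from langsmith import traceable


@traceable(name="multiply")
def multiply(inputs: dict, **kwargs) -> int:
    # Patched by the wrapper: this produces a Braintrust span for every row
    # that evaluate() runs, in addition to the evaluator scores
    return inputs["x"] * inputs["y"]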
eval_langsmith_parallel.py
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import Client, traceable


# Define a target function (the function being evaluated)
# LangSmith requires the parameter to be named 'inputs' (or 'attachments'/'metadata')
def multiply(inputs: dict, **kwargs) -> int:
    """Multiply two numbers.

    Args:
        inputs: Dictionary with 'x' and 'y' keys
        **kwargs: Additional arguments (e.g., langsmith_extra from LangSmith)
    """
    return inputs["x"] * inputs["y"]


# Define LangSmith-style evaluators
# LangSmith evaluators use signature: (inputs, outputs, reference_outputs) -> bool | dict
# When target returns a plain value, LangSmith wraps it as {"output": value}
def exact_match_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks for exact match."""
    expected = reference_outputs["output"]
    actual = outputs["output"]
    return {
        "key": "exact_match",
        "score": 1.0 if actual == expected else 0.0,
    }


def range_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks if result is in expected range."""
    actual = outputs["output"]
    expected = reference_outputs["output"]
    # Check if within 10% of expected
    if expected == 0:
        score = 1.0 if actual == 0 else 0.0
    else:
        diff = abs(actual - expected) / abs(expected)
        score = 1.0 if diff <= 0.1 else 0.0
    return {
        "key": "within_range",
        "score": score,
        "metadata": {"actual": actual, "expected": expected},
    }


def main():
    print("LangSmith to Braintrust Evaluation Example")
    print("=" * 50)
    print()

    # Create a LangSmith client (patched to use Braintrust)
    client = Client()

    # Create a dataset in LangSmith (proper LangSmith API usage)
    dataset_name = "multiply-dataset-example"

    # Try to get or create the dataset
    try:
        dataset = client.read_dataset(dataset_name=dataset_name)
        print(f"Using existing dataset: {dataset_name}")
    except Exception:
        # Create new dataset if it doesn't exist
        dataset = client.create_dataset(dataset_name=dataset_name, description="Multiplication test dataset")
        print(f"Created new dataset: {dataset_name}")

    # Create examples in the dataset (proper LangSmith API)
    client.create_examples(
        dataset_id=dataset.id,
        examples=[
            {"inputs": {"x": 2, "y": 3}, "outputs": {"output": 6}},
            {"inputs": {"x": 5, "y": 5}, "outputs": {"output": 25}},
            {"inputs": {"x": 10, "y": 0}, "outputs": {"output": 0}},
            {"inputs": {"x": 7, "y": 8}, "outputs": {"output": 56}},
        ],
    )
    print("Created 4 examples in dataset")
    print()
    print("Running evaluation...")
    print()

    # Run evaluation using LangSmith's API (redirects to Braintrust)
    # Pass the dataset name - this is valid LangSmith API usage
    client.evaluate(
        multiply,  # Target function
        data=dataset_name,  # Dataset name (valid LangSmith API)
        evaluators=[exact_match_evaluator, range_evaluator],
        experiment_prefix="multiply-test",
        description="Testing multiplication function",
        metadata={"version": "1.0", "migrated_from": "langsmith"},
    )

    print()
    print("=" * 50)
    print("✓ Evaluation completed!")
    print("Check Braintrust to see the experiment results.")


if __name__ == "__main__":
    main()
Standalone evals
With standalone mode, evaluation results are sent only to Braintrust. Set standalone=True in setup_langsmith():
eval_langsmith_standalone.py
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
    standalone=True,  # Only Braintrust will receive evals
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import Client, traceable


# Define a target function (the function being evaluated)
# LangSmith requires the parameter to be named 'inputs' (or 'attachments'/'metadata')
def multiply(inputs: dict, **kwargs) -> int:
    """Multiply two numbers.

    Args:
        inputs: Dictionary with 'x' and 'y' keys
        **kwargs: Additional arguments (e.g., langsmith_extra from LangSmith)
    """
    return inputs["x"] * inputs["y"]


# Define LangSmith-style evaluators
# LangSmith evaluators use signature: (inputs, outputs, reference_outputs) -> bool | dict
# When target returns a plain value, LangSmith wraps it as {"output": value}
def exact_match_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks for exact match."""
    expected = reference_outputs["output"]
    actual = outputs["output"]
    return {
        "key": "exact_match",
        "score": 1.0 if actual == expected else 0.0,
    }


def range_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """LangSmith-style evaluator that checks if result is in expected range."""
    actual = outputs["output"]
    expected = reference_outputs["output"]
    # Check if within 10% of expected
    if expected == 0:
        score = 1.0 if actual == 0 else 0.0
    else:
        diff = abs(actual - expected) / abs(expected)
        score = 1.0 if diff <= 0.1 else 0.0
    return {
        "key": "within_range",
        "score": score,
        "metadata": {"actual": actual, "expected": expected},
    }


def main():
    print("LangSmith to Braintrust Evaluation Example")
    print("=" * 50)
    print()

    # Create a LangSmith client (patched to use Braintrust)
    client = Client()

    # Create a dataset in LangSmith (proper LangSmith API usage)
    dataset_name = "multiply-dataset-example"

    # Try to get or create the dataset
    try:
        dataset = client.read_dataset(dataset_name=dataset_name)
        print(f"Using existing dataset: {dataset_name}")
    except Exception:
        # Create new dataset if it doesn't exist
        dataset = client.create_dataset(dataset_name=dataset_name, description="Multiplication test dataset")
        print(f"Created new dataset: {dataset_name}")

    # Create examples in the dataset (proper LangSmith API)
    client.create_examples(
        dataset_id=dataset.id,
        examples=[
            {"inputs": {"x": 2, "y": 3}, "outputs": {"output": 6}},
            {"inputs": {"x": 5, "y": 5}, "outputs": {"output": 25}},
            {"inputs": {"x": 10, "y": 0}, "outputs": {"output": 0}},
            {"inputs": {"x": 7, "y": 8}, "outputs": {"output": 56}},
        ],
    )
    print("Created 4 examples in dataset")
    print()
    print("Running evaluation...")
    print()

    # Run evaluation using LangSmith's API (redirects to Braintrust)
    # Pass the dataset name - this is valid LangSmith API usage
    client.evaluate(
        multiply,  # Target function
        data=dataset_name,  # Dataset name (valid LangSmith API)
        evaluators=[exact_match_evaluator, range_evaluator],
        experiment_prefix="multiply-test",
        description="Testing multiplication function",
        metadata={"version": "1.0", "migrated_from": "langsmith"},
    )

    print()
    print("=" * 50)
    print("✓ Evaluation completed!")
    print("Check Braintrust to see the experiment results.")


if __name__ == "__main__":
    main()
Resources