API reference - Braintrust

This page covers the key APIs in the Braintrust Python SDK. For setup, see the Quickstart. For the complete reference, see the full Python SDK reference.

Tracing

Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation: call init_logger() then auto_instrument(), and supported libraries are traced with no further code changes (see Install and instrument). The APIs below set up logging, trace your own code, and flush and link to your traces.

`init_logger()`

Creates a project logger for production traces and makes it the current logger by default. Call it once on startup.

import braintrust

logger = braintrust.init_logger(project="Support bot")

logger.log(
    input={"question": "How do I reset my password?"},
    output={"answer": "Use the account recovery flow."},
    metadata={"route": "/support"},
)

logger.flush()

Returns: Logger. Arguments (all optional):

project (str): project name for logs. If omitted, logs go to the global project.
project_id (str): project ID. Takes precedence over project.
api_key (str): API key. Defaults to BRAINTRUST_API_KEY.
app_url (str): Braintrust app URL. Defaults to https://www.braintrust.dev.
org_name (str): organization name, useful when credentials can access multiple orgs.
async_flush (bool): defaults to True. Set to False when you need synchronous flush behavior.
set_current (bool): defaults to True. Controls whether current_logger() returns this logger.

`auto_instrument()`

Patches supported AI and ML libraries so their calls are traced to Braintrust automatically. This is the recommended way to capture AI calls.

import braintrust

logger = braintrust.init_logger(project="Support bot")
braintrust.auto_instrument()

from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-5-mini",
    input=[{"role": "user", "content": "What is Braintrust?"}],
)

logger.flush()

Call auto_instrument() after init_logger() and before creating provider or framework clients. If your app imports provider classes directly, such as from openai import OpenAI, call auto_instrument() before those imports when possible so the SDK can patch the imported symbols.

Returns: dict[str, bool], mapping each integration name to whether it was successfully instrumented. Integrations whose optional dependencies are missing are skipped.

Arguments (all optional): each supported integration has a boolean flag that defaults to True. Set a flag to False to skip that integration. For the full list of integration flags, see Disabling specific integrations.

For example, disable OpenAI instrumentation while keeping the other integrations enabled:

braintrust.auto_instrument(openai=False)

`traced()`

Decorates a function so each call creates a span, logs any raised exception, and ends the span automatically.

import braintrust

@braintrust.traced
def classify_text(text: str) -> str:
    return "positive"

classify_text("Great result")

You can also call it with span arguments:

@braintrust.traced(name="Classify text", type="task")
def classify_text(text: str) -> str:
    return "positive"
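
To make that behavior concrete, here is a pure-Python sketch of what a tracing decorator of this kind does. It is an illustration only, not Braintrust's implementation: traced_sketch and the in-memory SPANS list are hypothetical names, and a real span carries far more data. The sketch supports both the bare form and the form with arguments shown above.

```python
import functools
import time

# Hypothetical in-memory sink standing in for Braintrust.
SPANS = []


def traced_sketch(func=None, *, name=None):
    """Usable bare (@traced_sketch) or with arguments (@traced_sketch(name=...))."""

    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            span = {"name": name or f.__name__, "start": time.time(), "error": None}
            try:
                return f(*args, **kwargs)
            except Exception as exc:
                # Record the failure on the span, then let it propagate.
                span["error"] = repr(exc)
                raise
            finally:
                # The span is always ended, whether the call succeeded or failed.
                span["end"] = time.time()
                SPANS.append(span)

        return wrapper

    # Bare use passes the function directly; the call form passes only keywords.
    return decorate(func) if func is not None else decorate


@traced_sketch
def classify_text(text):
    return "positive"


@traced_sketch(name="Parse number")
def parse_number(text):
    return int(text)


classify_text("Great result")
try:
    parse_number("not a number")
except ValueError:
    pass  # the error was still recorded on the span
```

After this runs, SPANS holds one entry per call, and the second entry carries the recorded error.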

`start_span()`

Starts a span manually when you need more control than @traced gives you, such as when the work doesn’t fit inside a single function.

import braintrust

with braintrust.start_span(name="Retrieve documents") as span:
    docs = retrieve_documents()
    span.log(output={"count": len(docs)})

Returns: Span. Arguments (all optional):

name (str): span name shown in Braintrust.
type (SpanTypeAttribute): span type, such as a task or LLM span.
span_attributes (Mapping[str, Any]): additional span attributes.
set_current (bool): whether to make this span the current span.
parent (str): explicit parent span or object ID.
event (Any): initial event data to log on the span.

`current_logger()` and `current_span()`

Return the currently active logger or span, so you can add data without holding a direct reference.

import braintrust

logger = braintrust.current_logger()
span = braintrust.current_span()

span.log(metadata={"cache_hit": True})

current_span() returns a no-op span object when no span is active, so it is safe to call from helper functions.

`flush()`

Flushes pending rows to Braintrust.

braintrust.flush()

For short-lived scripts, call logger.flush(), span.flush(), or braintrust.flush() before the process exits.

`permalink()`

Builds a Braintrust app URL for an exported span slug, so you can link straight to a trace from your own logs or app.

url = braintrust.permalink(span.export())

Returns: str.

`set_masking_function()`

Installs a global masking function that runs over logged data before it leaves your process, so you can redact sensitive values before they reach Braintrust.

import braintrust

def mask_secrets(value):
    if isinstance(value, dict) and "api_key" in value:
        return {**value, "api_key": "***"}
    return value

braintrust.set_masking_function(mask_secrets)

Set the masking function to None to disable masking.
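
The example above only inspects the top level of a dict. If sensitive values can be nested inside other dicts or lists, a recursive variant may be more useful. The following is a minimal sketch, assuming the masking function can receive arbitrary JSON-like values; mask_nested and SENSITIVE_KEYS are hypothetical names, not part of the SDK.

```python
from typing import Any

# Hypothetical set of key names to redact; adjust it for your own data.
SENSITIVE_KEYS = {"api_key", "password", "authorization"}


def mask_nested(value: Any) -> Any:
    """Return a copy of value with sensitive dict entries replaced by '***'."""
    if isinstance(value, dict):
        return {
            key: "***" if str(key).lower() in SENSITIVE_KEYS else mask_nested(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Rebuild the same sequence type with each element masked.
        return type(value)(mask_nested(item) for item in value)
    return value


event = {
    "question": "How do I reset my password?",
    "headers": {"Authorization": "Bearer abc123"},
    "calls": [{"api_key": "sk-test", "model": "gpt-5-mini"}],
}
masked = mask_nested(event)

# To install it globally (requires the braintrust package):
# braintrust.set_masking_function(mask_nested)
```

The function builds new containers instead of mutating its input, so the original event is left untouched for the rest of your application.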

Evaluations

An evaluation runs your task over a set of cases, scores each output, and logs the results to an experiment, which is how you measure quality and catch regressions as you change prompts or models. Eval() is the main entry point. The other APIs here run async evaluations and customize reporting.

`Eval()`

Runs an evaluation from your data, a task, and scorers: it runs the task over every case, scores the outputs, logs each row to an experiment, and returns a summary you can compare across runs.

from braintrust import Eval

Eval(
    "Support bot",
    data=lambda: [
        {
            "input": "How do I reset my password?",
            "expected": "Use the account recovery flow.",
        }
    ],
    task=lambda input: answer_question(input),
    scores=[exact_match],
)

Returns: EvalResultWithSummary. Arguments:

name (str, required): project name in Braintrust. Passed as the first positional argument.
data (EvalData, required): iterator over evaluation cases. Each case should include an input and can include expected, metadata, and tags.
task (EvalTask, required): function under test. Receives one input and returns the output to score.
scores (Sequence[EvalScorer], required): scorers that evaluate the task output.
experiment_name (str): experiment name. If omitted, one is generated automatically.
trial_count (int): number of times to run each input, useful for non-deterministic applications.
metadata (Metadata): extra experiment metadata for filtering and analysis.
tags (Sequence[str]): tags to associate with the experiment.
timeout (float): per-evaluation timeout in seconds.
max_concurrency (int): maximum concurrent tasks and scorers.
project_id (str): project ID. Takes precedence over name.
base_experiment_name (str): experiment to compare against.
base_experiment_id (str): experiment ID to compare against. Takes precedence over base_experiment_name.
no_send_logs (bool): run locally without sending logs to Braintrust.
parameters (EvalParameters | RemoteEvalParameters): parameters to pass to the evaluator.
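
The Eval() example above refers to exact_match and answer_question without defining them. As a rough sketch, a scorer can be a plain function that compares the output with the expected value. This assumes scorers are called with input, output, and expected and return a number between 0 and 1; answer_question here is a hypothetical stand-in for your application.

```python
def exact_match(input, output, expected):
    """Score 1.0 when the task output equals the expected value, else 0.0."""
    return 1.0 if output == expected else 0.0


def answer_question(question):
    # Hypothetical stand-in for the application under test.
    canned = {"How do I reset my password?": "Use the account recovery flow."}
    return canned.get(question, "I don't know.")


cases = [
    {
        "input": "How do I reset my password?",
        "expected": "Use the account recovery flow.",
    },
    {
        "input": "How do I change my email?",
        "expected": "Open account settings.",
    },
]

# Run the task and the scorer by hand, the way an evaluation would per case.
scores = [
    exact_match(case["input"], answer_question(case["input"]), case["expected"])
    for case in cases
]
```

Running the scorer by hand like this is a quick way to check it before passing it to Eval() in the scores list.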

`EvalAsync()`

Asynchronous version of Eval(). Use it when your task or scorers perform async I/O.

from braintrust import EvalAsync

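# Top-level await works in notebooks and async REPLs. In a script, wrap this
# call in an async function and run it with asyncio.run().
await EvalAsync(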
    "Support bot",
    data=load_cases,
    task=answer_question_async,
    scores=[factuality_score],
)

Returns: an awaitable EvalResultWithSummary. Accepts the same arguments as Eval(), with async tasks and scorers. For data, use a synchronous callable that returns a list or an async generator. An async function that returns a list is not supported.
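
The two supported shapes for data can be sketched in plain Python as follows. CASES, load_cases, stream_cases, and collect are hypothetical names for illustration; the asyncio.sleep(0) call stands in for real async I/O such as a database read.

```python
import asyncio

CASES = [
    {"input": "How do I reset my password?", "expected": "Use the account recovery flow."},
    {"input": "How do I change my email?", "expected": "Open account settings."},
]


def load_cases():
    """Synchronous callable that returns a list of cases."""
    return list(CASES)


async def stream_cases():
    """Async generator that yields cases one at a time."""
    for case in CASES:
        await asyncio.sleep(0)  # stand-in for async I/O
        yield case


async def collect(agen):
    """Drain an async generator into a list, to show what it yields."""
    return [case async for case in agen]


streamed = asyncio.run(collect(stream_cases()))
```

By contrast, an async def function that builds and returns the whole list is the unsupported shape described above.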

`Reporter()`

Creates a reporter for custom evaluation reporting, such as emitting results to CI.

from braintrust import Reporter

reporter = Reporter(
    name="CI reporter",
    report_eval=report_eval,
    report_run=report_run,
)

Arguments:

name (str, required): reporter name.
report_eval (Callable, required): called with each evaluator and its result.
report_run (Callable, required): called with all evaluator reports and returns whether the run succeeded.
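
The two callbacks might look like the sketch below. The exact callback signatures and the shape of the result object are assumptions here: the code accepts extra arguments defensively and reads hypothetical results and error attributes, so check the full Python SDK reference before relying on them. SimpleNamespace objects stand in for real evaluation results.

```python
from types import SimpleNamespace


def report_eval(evaluator, result, *_args, **_kwargs):
    """Summarize one evaluator's result as a small dict."""
    # Assumption: the result exposes per-row entries under `results`,
    # each with an `error` attribute that is None on success.
    rows = getattr(result, "results", [])
    errors = [row for row in rows if getattr(row, "error", None) is not None]
    return {"rows": len(rows), "errors": len(errors)}


def report_run(reports, *_args, **_kwargs):
    """Return True only when no evaluator reported an error."""
    return all(report["errors"] == 0 for report in reports)


# Stand-in results for illustration only.
ok_result = SimpleNamespace(results=[SimpleNamespace(error=None)])
bad_result = SimpleNamespace(
    results=[SimpleNamespace(error=None), SimpleNamespace(error=ValueError("boom"))]
)

reports = [report_eval(None, ok_result), report_eval(None, bad_result)]
run_passed = report_run(reports)
```

In CI, the boolean returned by report_run is the natural value to turn into the process exit status.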

Experiments

An experiment is a single evaluation run logged to a project. Use these APIs when you want to create an experiment and log rows yourself, instead of letting Eval() manage one for you.

`init()` / `init_experiment()`

Creates or opens an experiment in a project for manual experiment logging.

import braintrust

experiment = braintrust.init(project="Support bot", experiment="retrieval-v2")

experiment.log(
    input={"question": "How do I reset my password?"},
    output={"answer": "Use the account recovery flow."},
    scores={"exact_match": 1},
)

experiment.summarize()

Returns: Experiment. Arguments:

project (str): project name. Provide project or project_id.
project_id (str): project ID. Takes precedence over project.
experiment (str): experiment name. If omitted, one is generated automatically.
dataset (Dataset | DatasetRef): dataset associated with the experiment.
base_experiment (str): experiment name to compare against.
base_experiment_id (str): experiment ID to compare against.
metadata (Metadata): extra metadata for filtering and analysis.
tags (Sequence[str]): tags to associate with the experiment.
set_current (bool): defaults to True. Sets the current experiment for log().
update (bool): continue logging to an existing experiment if it exists.

Datasets

A dataset is a versioned collection of cases you manage in Braintrust and reuse across experiments and evals. Use init_dataset() to create a dataset or open an existing one.

`init_dataset()`

Creates or opens a dataset in a project.

import braintrust

dataset = braintrust.init_dataset(project="Support bot", name="Golden questions")

dataset.insert(
    input={"question": "How do I reset my password?"},
    expected={"answer": "Use the account recovery flow."},
    metadata={"source": "docs"},
)

dataset.flush()

Returns: Dataset. Arguments:

project (str): project name. Provide project or project_id.
project_id (str): project ID. Takes precedence over project.
name (str): dataset name. If omitted, one is generated automatically.
description (str): dataset description.
version (str | int): dataset version to read. Defaults to latest.
metadata (Metadata): extra dataset metadata.

Prompts and functions

In Braintrust, functions are units of logic you define and version in the UI, then load or invoke from your code. A prompt is a function whose job is to call a model with a templated set of messages. Other functions include scorers, tools, and code you deploy. Load and render a saved prompt with load_prompt(), or invoke a deployed function with invoke().

`load_prompt()`

Loads a saved prompt from a Braintrust project. Use the returned prompt’s build() to render request parameters with runtime variables.

import braintrust
from openai import OpenAI

prompt = braintrust.load_prompt(project="Support bot", slug="answer-question")
built = prompt.build(question="How do I reset my password?")

# build() renders Chat Completions parameters. Make the call on a Braintrust-
# instrumented client (for example after auto_instrument()) so it is traced and
# build()'s span_info is stripped before reaching OpenAI.
client = OpenAI()
response = client.chat.completions.create(**built)

Returns: Prompt. Arguments:

project (str): project name. Provide project, project_id, or id.
project_id (str): project ID. Takes precedence over project.
slug (str): prompt slug.
id (str): prompt ID. Takes precedence over project, slug, and version.
version (str | int): prompt version. Defaults to latest.
environment (str): environment to load from. version takes precedence when both are provided.
defaults (Mapping[str, Any]): default variables used when rendering the prompt.
no_trace (bool): if True, built prompt metadata is not included in traces.

`invoke()`

Invokes a Braintrust function and returns either a plain Python object or a BraintrustStream.

import braintrust

result = braintrust.invoke(
    project_name="Support bot",
    slug="answer-question",
    input={"question": "How do I reset my password?"},
)

Returns: the function’s output as a Python object, or a BraintrustStream when stream=True. Arguments:

input (Any, required): input passed to the function.
function_id (str): function ID to invoke.
project_name (str): project containing the function.
project_id (str): project ID containing the function.
slug (str): function slug.
version (str): function version.
stream (bool): return a stream when supported.

`init_function()`

Creates a Python callable for a Braintrust function, usable as an eval task or scorer.

import braintrust

answer_question = braintrust.init_function(
    project_name="Support bot",
    slug="answer-question",
)

output = answer_question({"question": "How do I reset my password?"})

Returns: a callable that invokes the function. Arguments:

project_name (str, required): project containing the function.
slug (str, required): function slug.
version (str): function version. Defaults to latest.

Attachments

Attachments let you log files or large payloads without storing the full bytes inline in the span. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Reach for them when you’re attaching binary content to a span yourself.

`Attachment`

Wraps file data so you can attach it to logged data. The uploaded value is replaced with an attachment reference in Braintrust logs.

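import braintrust
from braintrust import Attachment

logger = braintrust.init_logger(project="Support bot")

# Read the bytes in a context manager so the file handle is closed promptly.
with open("screenshot.png", "rb") as f:
    image_bytes = f.read()

logger.log(
    input="screenshot",
    metadata={
        "image": Attachment(
            data=image_bytes,
            filename="screenshot.png",
            content_type="image/png",
        )
    },
)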

`ReadonlyAttachment`

Reads an already-uploaded attachment.

attachment = row["metadata"]["image"]
contents = attachment.data
metadata = attachment.metadata()

Properties and methods:

data: lazily downloads the attachment contents as bytes.
metadata(): returns attachment metadata.
status(): returns upload or availability status.

Configuration

Configure the SDK with environment variables, or pass the equivalent options to init_logger() and login().

`set_http_adapter()`

Sets a custom requests HTTP adapter for Braintrust network requests. Use it for custom retry policies and timeouts.

import braintrust
from requests.adapters import HTTPAdapter

braintrust.set_http_adapter(HTTPAdapter(max_retries=3))

Environment variables

BRAINTRUST_API_KEY (required): Braintrust API key.
BRAINTRUST_API_URL: Braintrust API URL. Defaults to https://api.braintrust.dev.
BRAINTRUST_APP_URL: Braintrust app URL. Defaults to https://www.braintrust.dev.
BRAINTRUST_ORG_NAME: organization name, useful when credentials can access multiple orgs.
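
If you want to fail fast when configuration is missing, a small startup check along these lines may help. It is a sketch only: read_braintrust_env is a hypothetical helper, not part of the SDK, and the SDK reads these variables on its own. The defaults mirror the values listed above.

```python
import os

DEFAULTS = {
    "BRAINTRUST_API_URL": "https://api.braintrust.dev",
    "BRAINTRUST_APP_URL": "https://www.braintrust.dev",
}


def read_braintrust_env(environ=None):
    """Collect Braintrust settings from the environment, applying defaults."""
    env = os.environ if environ is None else environ
    api_key = env.get("BRAINTRUST_API_KEY")
    if not api_key:
        # The API key is the only required variable.
        raise RuntimeError("BRAINTRUST_API_KEY is not set")
    return {
        "api_key": api_key,
        "api_url": env.get("BRAINTRUST_API_URL", DEFAULTS["BRAINTRUST_API_URL"]),
        "app_url": env.get("BRAINTRUST_APP_URL", DEFAULTS["BRAINTRUST_APP_URL"]),
        "org_name": env.get("BRAINTRUST_ORG_NAME"),  # optional
    }


# Pass a plain dict to try it without touching the real environment.
settings = read_braintrust_env({"BRAINTRUST_API_KEY": "example-key"})
```

Calling read_braintrust_env() with no argument reads os.environ instead.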
