Braintrust Python SDK

This page provides a complete, auto-generated reference for the Braintrust Python SDK. For usage guidance on important APIs, see the API Reference. Also see the Braintrust Python SDK on GitHub.

Requirements

Python 3.10 or higher

Installation

pip install braintrust

Functions

Eval

A function you can use to define an evaluator. This is a convenience wrapper around the Evaluator class.

name

str

The name of the evaluator. This corresponds to a project name in Braintrust.

data

EvalData[Input, Expected]

Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase.

task

EvalTask[Input, Output, Expected]

Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.

scores

Sequence[EvalScorer[Input, Output, Expected]] | None

A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes an EvalScorerArgs object and returns a Score object.

classifiers

Sequence[EvalClassifier[Input, Output, Expected]] | None

experiment_name

str | None

(Optional) Experiment name. If not specified, a name will be generated automatically.

trial_count

int

The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.

metadata

Metadata | None

(Optional) A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.

tags

Sequence[str] | None

(Optional) A list of tags to associate with the experiment.

is_public

bool

(Optional) Whether the experiment should be public. Defaults to False.

update

bool

reporter

ReporterDef[Input, Output, Expected, EvalReport] | None

(Optional) A reporter that takes an evaluator and its result and returns a report.

reporter.name

str

required

reporter.report_eval

Callable[[Evaluator[Input, Output, Expected], EvalResultWithSummary[Input, Output, Expected], bool, bool], EvalReport | Awaitable[EvalReport]]

required

reporter.report_run

Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]

required

timeout

float | None

(Optional) The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.

max_concurrency

int | None

project_id

str | None

(Optional) If specified, uses the given project ID instead of the evaluator’s name to identify the project.

base_experiment_name

str | None

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

base_experiment_id

str | None

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.

git_metadata_settings

GitMetadataSettings | None

Optional settings for collecting git metadata. By default, defers to org-level settings returned by the control plane.

git_metadata_settings.collect

Literal['all', 'some', 'none']

required

git_metadata_settings.fields

list[str] | None

required

repo_info

RepoInfo | None

Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.

repo_info.commit

str | None

required

repo_info.branch

str | None

required

repo_info.tag

str | None

required

repo_info.dirty

bool | None

required

repo_info.author_name

str | None

required

repo_info.author_email

str | None

required

repo_info.commit_message

str | None

required

repo_info.commit_time

str | None

required

repo_info.git_diff

str | None

required

error_score_handler

ErrorScoreHandler[Input, Expected] | None

Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.

description

str | None

An optional description for the experiment.

summarize_scores

bool

Whether to summarize the scores of the experiment after it has run.

no_send_logs

bool

Do not send logs to Braintrust. When True, the evaluation runs locally and builds a local summary instead of creating an experiment. Defaults to False.

parameters

EvalParameters | RemoteEvalParameters | None

A set of parameters that will be passed to the evaluator.

on_start

Callable[[ExperimentSummary], None] | None

An optional callback that will be called when the evaluation starts. It receives the ExperimentSummary object, which can be used to display metadata about the experiment.

stream

Callable[[SSEProgressEvent], None] | None

A function that will be called with progress events, which can be used to display intermediate progress.

parent

str | None

If specified, instead of creating a new experiment object, the Eval() will populate the object or span specified by this parent.

state

BraintrustState | None

Optional BraintrustState to use for the evaluation. If not specified, the global login state will be used.

enable_cache

bool

Whether to enable the span cache for this evaluation. Defaults to True. The span cache stores span data on disk to minimize memory usage and allow scorers to read spans without server round-trips.
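
The parameters above can be combined as in the following minimal sketch. The project name, the sample data, and the helper names (load_cases, greet, exact_match) are hypothetical, and the sketch assumes that a scorer written as a plain function may return a bare number. The call to Eval is wrapped in a function because it needs the braintrust package and valid credentials.

```python
def load_cases():
    """Return the evaluation dataset as input/expected pairs."""
    return [
        {"input": "Ada", "expected": "Hello, Ada!"},
        {"input": "Grace", "expected": "Hello, Grace!"},
    ]


def greet(input):
    """The task under evaluation: turn a name into a greeting."""
    return f"Hello, {input}!"


def exact_match(input, output, expected):
    """Score 1.0 when the output equals the expected value, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval():
    """Run the evaluation (requires braintrust and an API key)."""
    from braintrust import Eval

    return Eval(
        "greeting-bot",  # hypothetical project name
        data=load_cases,
        task=greet,
        scores=[exact_match],
        trial_count=3,  # run each input three times to see variance
        metadata={"model": "template-v1"},  # hypothetical metadata
    )
```

Keeping the task and scorer as ordinary functions makes them easy to unit test without contacting Braintrust.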

EvalAsync

A function you can use to define an evaluator. This is a convenience wrapper around the Evaluator class.

name

str

The name of the evaluator. This corresponds to a project name in Braintrust.

data

EvalData[Input, Expected]

Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase.

task

EvalTask[Input, Output, Expected]

Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.

scores

Sequence[EvalScorer[Input, Output, Expected]] | None

A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes an EvalScorerArgs object and returns a Score object.

classifiers

Sequence[EvalClassifier[Input, Output, Expected]] | None

experiment_name

str | None

(Optional) Experiment name. If not specified, a name will be generated automatically.

trial_count

int

metadata

Metadata | None

tags

Sequence[str] | None

(Optional) A list of tags to associate with the experiment.

is_public

bool

(Optional) Whether the experiment should be public. Defaults to False.

update

bool

reporter

ReporterDef[Input, Output, Expected, EvalReport] | None

(Optional) A reporter that takes an evaluator and its result and returns a report.

reporter.name

str

required

reporter.report_eval

Callable[[Evaluator[Input, Output, Expected], EvalResultWithSummary[Input, Output, Expected], bool, bool], EvalReport | Awaitable[EvalReport]]

required

reporter.report_run

Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]

required

timeout

float | None

(Optional) The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.

max_concurrency

int | None

project_id

str | None

(Optional) If specified, uses the given project ID instead of the evaluator’s name to identify the project.

base_experiment_name

str | None

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

base_experiment_id

str | None

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.

git_metadata_settings

GitMetadataSettings | None

Optional settings for collecting git metadata. By default, defers to org-level settings returned by the control plane.

git_metadata_settings.collect

Literal['all', 'some', 'none']

required

git_metadata_settings.fields

list[str] | None

required

repo_info

RepoInfo | None

Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.

repo_info.commit

str | None

required

repo_info.branch

str | None

required

repo_info.tag

str | None

required

repo_info.dirty

bool | None

required

repo_info.author_name

str | None

required

repo_info.author_email

str | None

required

repo_info.commit_message

str | None

required

repo_info.commit_time

str | None

required

repo_info.git_diff

str | None

required

error_score_handler

ErrorScoreHandler[Input, Expected] | None

Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.

description

str | None

An optional description for the experiment.

summarize_scores

bool

Whether to summarize the scores of the experiment after it has run.

no_send_logs

bool

Do not send logs to Braintrust. When True, the evaluation runs locally and builds a local summary instead of creating an experiment. Defaults to False.

parameters

EvalParameters | RemoteEvalParameters | None

A set of parameters that will be passed to the evaluator.

on_start

Callable[[ExperimentSummary], None] | None

An optional callback that will be called when the evaluation starts. It receives the ExperimentSummary object, which can be used to display metadata about the experiment.

stream

Callable[[SSEProgressEvent], None] | None

A function that will be called with progress events, which can be used to display intermediate progress.

parent

str | None

If specified, instead of creating a new experiment object, the Eval() will populate the object or span specified by this parent.

state

BraintrustState | None

Optional BraintrustState to use for the evaluation. If not specified, the global login state will be used.

enable_cache

bool

Whether to enable the span cache for this evaluation. Defaults to True. The span cache stores span data on disk to minimize memory usage and allow scorers to read spans without server round-trips.
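
A minimal sketch of calling EvalAsync from code that already runs inside an event loop. It assumes EvalAsync is awaitable and accepts an async task; the project name and helper names are hypothetical.

```python
import asyncio


async def shout(input):
    """Async task: a stand-in for an awaitable model call."""
    await asyncio.sleep(0)
    return input.upper()


def matches(input, output, expected):
    """Score 1.0 on an exact match, else 0.0."""
    return 1.0 if output == expected else 0.0


async def main():
    """Run the evaluation (requires braintrust and an API key)."""
    from braintrust import EvalAsync

    await EvalAsync(
        "shout-bot",  # hypothetical project name
        data=lambda: [{"input": "hi", "expected": "HI"}],
        task=shout,
        scores=[matches],
    )
```

To run it as a script, call asyncio.run(main()).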

Reporter

A function you can use to define a reporter. This is a convenience wrapper around the ReporterDef class.

name

str

The name of the reporter.

report_eval

Callable[[Evaluator[Input, Output, Expected], EvalResultWithSummary[Input, Output, Expected], bool, bool], EvalReport | Awaitable[EvalReport]]

A function that takes an evaluator and its result and returns a report.

report_run

Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]

A function that takes all evaluator results and returns a boolean indicating whether the run was successful.
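
A minimal sketch of a custom reporter that fails the run when any case errored. The argument names (evaluator, result, results, verbose, jsonl) and the result.results / error attributes are assumptions; the reference above only gives the positional types.

```python
def report_eval(evaluator, result, verbose, jsonl):
    """Reduce one evaluator's result to a small report dict."""
    # Assumes each entry in result.results has an `error` attribute.
    failures = [r for r in result.results if r.error is not None]
    return {"total": len(result.results), "failed": len(failures)}


def report_run(results, verbose, jsonl):
    """The run succeeds only if no report recorded a failure."""
    return all(report["failed"] == 0 for report in results)


def build_reporter():
    """Create the reporter (requires the braintrust package)."""
    from braintrust import Reporter

    return Reporter(
        "failure-count",  # hypothetical reporter name
        report_eval=report_eval,
        report_run=report_run,
    )
```

The object returned by build_reporter() can then be passed as the reporter argument of Eval.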

auto_instrument

Auto-instrument supported AI/ML libraries for Braintrust tracing.

openai

bool

Enable OpenAI instrumentation (default: True)

anthropic

bool

Enable Anthropic instrumentation (default: True)

litellm

bool

Enable LiteLLM instrumentation (default: True)

pydantic_ai

bool

Enable Pydantic AI instrumentation (default: True)

google_genai

bool

Enable Google GenAI instrumentation (default: True)

instructor

bool

Enable Instructor (structured-output) instrumentation (default: True)

openrouter

bool

Enable OpenRouter instrumentation (default: True)

mistral

bool

Enable Mistral instrumentation (default: True)

huggingface_hub

bool

Enable HuggingFace Hub instrumentation (default: True)

agno

bool

Enable Agno instrumentation (default: True)

agentscope

bool

Enable AgentScope instrumentation (default: True)

claude_agent_sdk

bool

Enable Claude Agent SDK instrumentation (default: True)

dspy

bool

Enable DSPy instrumentation (default: True)

adk

bool

Enable Google ADK instrumentation (default: True)

langchain

bool

Enable LangChain instrumentation (default: True)

llamaindex

bool

Enable LlamaIndex instrumentation (default: True)

openai_agents

bool

Enable OpenAI Agents SDK instrumentation (default: True)

cohere

bool

Enable Cohere instrumentation (default: True)

autogen

bool

Enable AutoGen instrumentation (default: True)

bedrock

bool

Enable boto3 Bedrock Runtime instrumentation (default: True)

crewai

bool

Enable CrewAI instrumentation (default: True)

strands

bool

Enable Strands Agents instrumentation (default: True)

temporal

bool

Enable Temporal instrumentation (default: True)

livekit_agents

bool

Enable LiveKit Agents instrumentation (default: True)
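
Because every flag defaults to True, opting out is a matter of passing False for the integrations you do not want. A minimal sketch, with a hypothetical choice of integrations to disable:

```python
# Integrations to turn off; all others keep their default of True.
DISABLED = {"langchain": False, "temporal": False}


def enable_tracing():
    """Instrument supported libraries (requires the braintrust package)."""
    import braintrust

    braintrust.auto_instrument(**DISABLED)
```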

current_experiment

Returns the currently-active experiment (set by braintrust.init(...)). Returns None if no current experiment has been set.

current_logger

Returns the currently-active logger (set by braintrust.init_logger(...)). Returns None if no current logger has been set.

current_span

Return the currently-active span for logging (set by running a span under a context manager). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.

extract_trace_context

Extract an opaque W3C trace-context from inbound request headers.

headers

dict

Inbound request headers (e.g. an HTTP framework’s headers).

flush

Flush any pending rows to the server.

get_prompt_versions

Get the versions for a specific prompt.

project_id

str

The ID of the project to query

prompt_id

str

The ID of the prompt to get versions for

get_span_parent_object

Mainly for internal use. Return the parent object for starting a span in a global context. Applies precedence: current span > propagated parent > experiment > logger.

parent

str | dict | None

state

BraintrustState | None

init

project

str | None

The name of the project to create the experiment in. Must specify at least one of project or project_id.

experiment

str | None

The name of the experiment to create. If not specified, a name will be generated automatically.

description

str | None

(Optional) A description of the experiment.

dataset

Dataset | None | DatasetRef

(Optional) A dataset to associate with the experiment. The dataset must be initialized with braintrust.init_dataset before passing it into the experiment.

parameters

RemoteEvalParameters | ParametersRef | None

(Optional) Saved parameters to associate with the experiment. Pass either a RemoteEvalParameters object or a dictionary containing an id and optional version.

parameters.id

str | None

required

parameters.project_id

str | None

required

parameters.name

str

required

parameters.slug

str

required

parameters.version

str | None

required

parameters.schema

ParametersSchema

required

parameters.data

dict[str, Any]

required

open

bool

If the experiment already exists, open it in read-only mode. Raises an error if the experiment does not already exist.

base_experiment

str | None

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment. Otherwise, it will pick an experiment by finding the closest ancestor on the default (e.g. main) branch.

is_public

bool

An optional parameter to control whether the experiment is publicly visible to anybody with the link or privately visible to only members of the organization. Defaults to private.

app_url

str | None

The URL of the Braintrust App. Defaults to https://www.braintrust.dev.

api_key

str | None

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.

org_name

str | None

(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple organizations.

metadata

Metadata | None

(Optional) A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.

tags

Sequence[str] | None

(Optional) A list of strings to tag the experiment with. Tags can be used to filter and organize experiments.

git_metadata_settings

GitMetadataSettings | None

(Optional) Settings for collecting git metadata. By default, will collect git metadata fields allowed in org-level settings, excluding diff content unless the org opts in.

git_metadata_settings.collect

Literal['all', 'some', 'none']

required

git_metadata_settings.fields

list[str] | None

required

set_current

bool

If true (the default), set the global current-experiment to the newly-created one.

update

bool | None

If the experiment already exists, continue logging to it. If it does not exist, creates the experiment with the specified arguments.

project_id

str | None

The id of the project to create the experiment in. This takes precedence over project if specified.

base_experiment_id

str | None

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this. This takes precedence over base_experiment if specified.

repo_info

RepoInfo | None

(Optional) Explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.

repo_info.commit

str | None

required

repo_info.branch

str | None

required

repo_info.tag

str | None

required

repo_info.dirty

bool | None

required

repo_info.author_name

str | None

required

repo_info.author_email

str | None

required

repo_info.commit_message

str | None

required

repo_info.commit_time

str | None

required

repo_info.git_diff

str | None

required

state

BraintrustState | None

(Optional) A BraintrustState object to use. If not specified, will use the global state. This is for advanced use only.
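
A minimal sketch of logging to an experiment by hand instead of going through Eval. It assumes the object returned by init exposes log and summarize methods that take the same event fields used elsewhere in this reference; the project and experiment names are hypothetical.

```python
def score_answer(output, expected):
    """Case-insensitive exact match, returned as 1.0 or 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_experiment(cases, answer_fn):
    """Log one event per case (requires braintrust and an API key)."""
    import braintrust

    experiment = braintrust.init(project="my-project", experiment="baseline")
    for case in cases:
        output = answer_fn(case["input"])
        experiment.log(
            input=case["input"],
            output=output,
            expected=case["expected"],
            scores={"match": score_answer(output, case["expected"])},
        )
    return experiment.summarize()
```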

init_dataset

Create a new dataset in a specified project. If the project does not exist, it will be created.

project

str | None

name

str | None

The name of the dataset to create. If not specified, a name will be generated automatically.

description

str | None

An optional description of the dataset.

version

str | int | None

An optional version of the dataset (to read). If not specified, the latest version will be used.

app_url

str | None

The URL of the Braintrust App. Defaults to https://www.braintrust.dev.

api_key

str | None

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.

org_name

str | None

(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple organizations.

project_id

str | None

The id of the project to create the dataset in. This takes precedence over project if specified.

metadata

Metadata | None

(Optional) A dictionary, or an object that serializes to a dictionary (such as a Pydantic model), with additional data about the dataset. The values in metadata can be any JSON-serializable type, but its keys must be strings.

use_output

bool

(Deprecated) If True, records will be fetched from this dataset in the legacy format, with the “expected” field renamed to “output”. This option will be removed in a future version of Braintrust.

_internal_btql

dict[str, Any] | None

(Internal) If specified, the dataset will be created with the given BTQL filters.

state

BraintrustState | None

(Internal) The Braintrust state to use. If not specified, will use the global state. For advanced use only.
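
A minimal sketch of seeding a dataset. It assumes the returned dataset object provides insert and flush methods; the project name, dataset name, and records are hypothetical.

```python
RECORDS = [
    {"input": "France", "expected": "Paris"},
    {"input": "Japan", "expected": "Tokyo"},
]


def seed_dataset():
    """Insert the records (requires braintrust and an API key)."""
    import braintrust

    dataset = braintrust.init_dataset(project="my-project", name="capitals")
    for record in RECORDS:
        dataset.insert(input=record["input"], expected=record["expected"])
    dataset.flush()  # make sure pending rows are uploaded
    return dataset
```

The returned dataset can be passed to init through its dataset parameter to associate it with an experiment.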

init_experiment

Alias for init

args

Any

kwargs

Any

init_function

Creates a function that can be used as either a task or scorer in the Eval framework. When used as a task, it will invoke the specified Braintrust function with the input. When used as a scorer, it will invoke the function with the scorer arguments.

project_name

str

The name of the project containing the function.

slug

str

The slug of the function to invoke.

version

str | None

Optional version of the function to use. Defaults to latest.

init_logger

Create a new logger in a specified project. If the project does not exist, it will be created.

project

str | None

The name of the project to log into. If unspecified, will default to the Global project.

project_id

str | None

The id of the project to log into. This takes precedence over project if specified.

async_flush

bool

If true (the default), log events will be batched and sent asynchronously in a background thread. If false, log events will be sent synchronously. Set to false in serverless environments.

app_url

str | None

The URL of the Braintrust App. Defaults to https://www.braintrust.dev.

api_key

str | None

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.

org_name

str | None

(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple organizations.

set_current

bool

If true (the default), set the global current logger to the newly created one.

state

BraintrustState | None

environment

SpanOriginEnvironment | None
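
A minimal sketch of a serverless-style handler that follows the async_flush guidance above by sending log events synchronously. The project name, the handler shape, and the answer helper are hypothetical, and the sketch assumes the logger exposes a log method.

```python
def answer(question):
    """Placeholder for the real model call."""
    return question.strip().rstrip("?") + "."


def handler(event):
    """Handle one request (requires braintrust and an API key)."""
    import braintrust

    # async_flush=False sends events synchronously, so nothing is
    # left in a background thread when the function instance freezes.
    logger = braintrust.init_logger(project="support-bot", async_flush=False)
    output = answer(event["question"])
    logger.log(input=event["question"], output=output)
    return {"answer": output}
```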

inject_trace_context

Inject W3C trace-context headers for the current (or given) span into a carrier.

carrier

dict | None

Optional carrier dict (e.g. outbound HTTP headers) to mutate.

span

Span | None

Optional span to inject. Defaults to the current span.

invoke

Invoke a Braintrust function, returning a BraintrustStream or the value as a plain Python object.

function_id

str | None

The ID of the function to invoke.

version

str | None

The version of the function to invoke.

prompt_session_id

str | None

The ID of the prompt session to invoke the function from.

prompt_session_function_id

str | None

The ID of the function in the prompt session to invoke.

project_name

str | None

The name of the project containing the function to invoke.

project_id

str | None

The ID of the project to use for execution context (API keys, project defaults, etc.). This is not the project the function belongs to, but the project context for the invocation.

slug

str | None

The slug of the function to invoke.

global_function

str | None

The name of the global function to invoke.

function_type

FunctionTypeEnum | None

The type of the global function to invoke. If unspecified, defaults to ‘scorer’ for backward compatibility.

input

Any

The input to the function. This will be logged as the input field in the span.

messages

Sequence[Any] | None

Additional OpenAI-style messages to add to the prompt (only works for llm functions).

metadata

Metadata | None

Additional metadata to add to the span. This will be logged as the metadata field in the span. It will also be available as the {{metadata}} field in the prompt and as the metadata argument to the function.

tags

Sequence[str] | None

Tags to add to the span. This will be logged as the tags field in the span.

parent

Exportable | str | None

The parent of the function. This can be an existing span, logger, or experiment, or the output of .export() if you are distributed tracing. If unspecified, will use the same semantics as traced() to determine the parent and no-op if not in a tracing context.

stream

bool

Whether to stream the function’s output. If True, the function will return a BraintrustStream, otherwise it will return the output of the function as a JSON object.

mode

ModeType | None

The response shape of the function if returning tool calls. If “auto”, will return a string if the function returns a string, and a JSON object otherwise. If “parallel”, will return an array of JSON objects with one object per tool call.

strict

bool | None

Whether to use strict mode for the function. If true, the function will throw an error if the variable names in the prompt do not match the input keys.

overrides

dict[str, Any] | None

Per-call deep-merge into the resolved function data server-side. Useful for facet, code, global, and remote_eval functions (for example, overriding a facet’s model or a global function’s config). Has no effect on prompt functions, whose parameters live on a separate field that the override path does not touch.

org_name

str | None

The name of the Braintrust organization to use.

api_key

str | None

The API key to use for authentication.

app_url

str | None

The URL of the Braintrust application.

force_login

bool

Whether to force a new login even if already logged in.
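
A minimal sketch of invoking a function by project name and slug, with and without streaming. The project name, slug, and input shape are hypothetical.

```python
def build_invoke_args(text, stream=False):
    """Collect the keyword arguments for invoke."""
    return {
        "project_name": "my-project",  # hypothetical project
        "slug": "summarizer",  # hypothetical function slug
        "input": {"text": text},
        "stream": stream,
    }


def summarize_text(text, stream=False):
    """Call the function (requires braintrust and an API key)."""
    from braintrust import invoke, parse_stream

    result = invoke(**build_invoke_args(text, stream=stream))
    # With stream=True the result is a BraintrustStream; parse_stream
    # reduces it to its final value.
    return parse_stream(result) if stream else result
```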

load_parameters

Load saved parameters from Braintrust.

project

str | None

The name of the project to load the parameters from. Must specify at least one of project or project_id.

slug

str | None

The slug of the parameters to load.

version

str | int | None

An optional version of the parameters to read. If not specified, the latest version will be used.

project_id

str | None

The ID of the project to load the parameters from. This takes precedence over project.

id

str | None

The ID of a specific parameters object to load. If specified, this takes precedence over project and slug.

environment

str | None

The environment to load the parameters from. If both version and environment are provided, version takes precedence.

app_url

str | None

The URL of the Braintrust App. Defaults to https://www.braintrust.dev.

api_key

str | None

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

org_name

str | None

The name of a specific organization to connect to.

load_prompt

Loads a prompt from the specified project.

project

str | None

The name of the project to load the prompt from. Must specify at least one of project or project_id.

slug

str | None

The slug of the prompt to load.

version

str | int | None

An optional version of the prompt (to read). If not specified, the latest version will be used.

project_id

str | None

The id of the project to load the prompt from. This takes precedence over project if specified.

id

str | None

The id of a specific prompt to load. If specified, this takes precedence over all other parameters (project, slug, version).

defaults

Mapping[str, Any] | None

(Optional) A dictionary of default values to use when rendering the prompt. Prompt values will override these defaults.

no_trace

bool

If true, do not include logging metadata for this prompt when build() is called.

environment

str | None

The environment to load the prompt from. If both version and environment are provided, version takes precedence.

app_url

str | None

The URL of the Braintrust App. Defaults to https://www.braintrust.dev.

api_key

str | None

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.

org_name

str | None

(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple organizations.
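
A minimal sketch of loading a prompt and rendering it. It assumes the returned prompt object has the build() method mentioned under no_trace and that build() accepts template variables as keyword arguments; the project name, slug, variable name, and default values are hypothetical.

```python
# Used only when the saved prompt does not set these values itself.
DEFAULTS = {"temperature": 0}


def build_request(question):
    """Render the prompt (requires braintrust and an API key)."""
    from braintrust import load_prompt

    prompt = load_prompt(
        project="my-project",
        slug="qa-prompt",
        defaults=DEFAULTS,
    )
    return prompt.build(question=question)
```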

load_prompt_async

Asynchronously loads a prompt from the specified project.

project

str | None

slug

str | None

version

str | int | None

project_id

str | None

defaults

Mapping[str, Any] | None

no_trace

bool

environment

str | None

app_url

str | None

api_key

str | None

org_name

str | None

log

Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.

event

Any

login

Log in to Braintrust. This will prompt you for your API token, which you can find at https://www.braintrust.dev/app/token. This method is called automatically by init().

app_url

str | None

The URL of the Braintrust App. Defaults to https://www.braintrust.dev.

api_key

str | None

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.

org_name

str | None

(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple organizations.

force_login

bool

Log in again, even if you have already logged in. By default, this function returns quickly if you are already logged in.

parent_context

Context manager to temporarily set the parent context for spans.

parent

str | dict | None

The parent to set during the context. May be an exported slug string or an opaque W3C trace-context dict (from extract_trace_context).

state

BraintrustState | None

Optional BraintrustState to use. If not provided, uses the global state.
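
A minimal sketch of how inject_trace_context, extract_trace_context, and parent_context might fit together across two services. The send and work callables are hypothetical, and the sketch assumes inject_trace_context mutates the carrier it is given, as the carrier description suggests.

```python
def call_downstream(send):
    """Caller side: attach trace headers to an outbound request."""
    import braintrust

    headers = {"content-type": "application/json"}
    braintrust.inject_trace_context(headers)  # adds W3C trace headers
    return send(headers)


def handle_request(headers, work):
    """Callee side: run `work` under the caller's span."""
    import braintrust

    parent = braintrust.extract_trace_context(headers)
    with braintrust.parent_context(parent):
        with braintrust.start_span(name="handle_request") as span:
            result = work()
            span.log(output=result)
            return result
```

Both functions need the braintrust package and an active logger or experiment to record anything.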

parse_stream

Parse a BraintrustStream into its final value.

stream

BraintrustStream

The BraintrustStream to parse.

patch_openai

Patch OpenAI globally for Braintrust tracing.

permalink

Format a permalink to the Braintrust application for viewing the span represented by the provided slug.

slug

str

The identifier generated from Span.export.

org_name

str | None

The org name to use. If not provided, the org name will be inferred from the global login state.

app_url

str | None

The app URL to use. If not provided, the app URL will be inferred from the global login state.

register_otel_flush

Register a callback to flush OTEL spans. This is called by the OTEL integration when it initializes a span processor/exporter.

callback

Any

The async callback function to flush OTEL spans.

run_evaluator

Wrapper on _run_evaluator_internal that times out execution after evaluator.timeout.

experiment

Experiment | None

evaluator

Evaluator[Input, Output, Expected]

evaluator.project_name

str

required

evaluator.eval_name

str

required

evaluator.data

EvalData[Input, Expected]

required

evaluator.task

EvalTask[Input, Output, Expected]

required

evaluator.scores

Sequence[EvalScorer[Input, Output, Expected]]

required

evaluator.experiment_name

str | None

required

evaluator.metadata

Metadata | None

required

evaluator.tags

Sequence[str] | None

required

evaluator.trial_count

int

required

evaluator.is_public

bool

required

evaluator.update

bool

required

evaluator.timeout

float | None

required

evaluator.max_concurrency

int | None

required

evaluator.project_id

str | None

required

evaluator.base_experiment_name

str | None

required

evaluator.base_experiment_id

str | None

required

evaluator.git_metadata_settings

GitMetadataSettings | None

required

evaluator.repo_info

RepoInfo | None

required

evaluator.error_score_handler

ErrorScoreHandler[Input, Expected] | None

required

evaluator.description

str | None

required

evaluator.summarize_scores

bool

required

evaluator.parameters

EvalParameters | RemoteEvalParameters | None

required

evaluator.classifiers

list[EvalClassifier[Input, Output, Expected]] | None

required

evaluator.parameter_values

dict[str, Any] | None

required

position

int | None

filters

list[Filter]

stream

Callable[[SSEProgressEvent], None] | None

state

BraintrustState | None

enable_cache

bool

set_http_adapter

Specify a custom HTTP adapter to use for all network requests. This is useful for setting custom retry policies, timeouts, etc. Braintrust uses the requests library, so the adapter should be an instance of requests.adapters.HTTPAdapter. Alternatively, consider sub-classing our RetryRequestExceptionsAdapter to get automatic retries on network-related exceptions.

adapter

HTTPAdapter

The adapter to use.
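
A minimal sketch of installing an adapter with a custom retry policy, using the standard requests and urllib3 classes. The retry values are arbitrary examples rather than recommended settings.

```python
RETRY_SETTINGS = {
    "total": 5,  # at most five retries per request
    "backoff_factor": 0.5,  # exponential backoff between attempts
    "status_forcelist": [502, 503, 504],
}


def install_retrying_adapter():
    """Install the adapter (requires braintrust and requests)."""
    import braintrust
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    adapter = HTTPAdapter(max_retries=Retry(**RETRY_SETTINGS))
    braintrust.set_http_adapter(adapter)
```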

set_masking_function

Set a global masking function that will be applied to all logged data before sending to Braintrust. The masking function will be applied after records are merged but before they are sent to the backend.

masking_function

Callable[[Any], Any] | None

A function that takes a JSON-serializable object and returns a masked version. Set to None to disable masking.
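
A minimal sketch of a masking function that redacts email addresses anywhere in a logged value. The regular expression is a deliberately simple illustration, not a complete email matcher.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(value):
    """Return a copy of `value` with email addresses replaced."""
    if isinstance(value, str):
        return EMAIL.sub("[email]", value)
    if isinstance(value, dict):
        return {key: mask_emails(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_emails(item) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


def install_masking():
    """Register the function globally (requires the braintrust package)."""
    import braintrust

    braintrust.set_masking_function(mask_emails)
```

Because the function returns new containers instead of mutating its argument, the caller's original objects are left untouched.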

set_thread_pool_max_workers

Set the maximum number of threads to use for running evaluators. By default, this is the number of CPUs on the machine.

max_workers

Any

span_components_to_object_id

Utility function to resolve the object ID of a SpanComponentsV4 object. This function may trigger a login to Braintrust if the object ID is encoded lazily.

components

SpanComponentsV4

components.object_type

SpanObjectTypeV3

required

components.object_id

str | None

required

components.compute_object_metadata_args

dict | None

required

components.row_id

str | None

required

components.span_id

str | None

required

components.root_span_id

str | None

required

components.propagated_event

dict | None

required

start_span

Lower-level alternative to @traced for starting a span at the top level. It creates a span under the first active object (using the same precedence order as @traced) or, if parent is specified, under the specified parent row. If neither is available, it returns a no-op span object.

name

str | None

type

SpanTypeAttribute | None

span_attributes

SpanAttributes | Mapping[str, Any] | None

start_time

float | None

set_current

bool | None

parent

str | dict | None

propagated_event

dict[str, Any] | None

state

BraintrustState | None

internal

SpanInternalOptions | None

event

Any

summarize

Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.

summarize_scores

bool

Whether to summarize the scores. If False, only the metadata will be returned.

comparison_experiment_id

str | None

The experiment to compare against. If None, the most recent experiment on the comparison_commit will be used.

traced

Decorator to trace the wrapped function. Can either be applied bare (@traced) or by providing arguments (@traced(*span_args, **span_kwargs)), which will be forwarded to the created span. See Span.start_span for full details on the span arguments.

span_args

Any

span_kwargs

Any
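
The two forms of the decorator can be sketched as follows. The function names and the span name "count-words" are made up for illustration, and name is assumed to be forwarded as a span argument as described above. The fallback decorator is not part of the SDK; it only lets the sketch run where braintrust is not installed.

```python
try:
    from braintrust import traced
except ImportError:
    def traced(*span_args, **span_kwargs):
        """Stand-in that leaves functions untraced (not part of the SDK)."""
        if len(span_args) == 1 and callable(span_args[0]) and not span_kwargs:
            return span_args[0]  # bare @traced
        return lambda fn: fn  # @traced(...) with arguments


@traced
def normalize(text):
    """Collapse whitespace and lowercase the text."""
    return " ".join(text.split()).lower()


@traced(name="count-words")
def count_words(text):
    """Count words after normalization; calls the traced normalize() helper."""
    return len(normalize(text).split())
```

With an active logger or experiment, the call to normalize() inside count_words() would be expected to appear as a child span. Without one, the functions simply run as usual.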

update_span

Update a span using the output of span.export(). Only resume updating a span once the original span has been fully written and flushed; otherwise, the updates may conflict with the original span.

exported

str

The output of span.export().

event

Any

Classes

AsyncScorerLike

Protocol for asynchronous scorers that implement the eval_async interface. The framework will prefer this interface if available. Methods eval_async()

Attachment

Represents an attachment to be uploaded and the associated metadata. Properties

reference

AttachmentReference

The object that replaces this Attachment at upload time.

data

bytes

The attachment contents. This is a lazy value that will read the attachment contents from disk or memory on first access.

Methods __init__(), upload(), debug_info()

BaseExperiment

Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps). Properties

name

str | None

The name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps).

Methods __init__()

BraintrustConsoleChunk

A console chunk from a Braintrust stream. Properties

message

str

stream

Literal['stderr', 'stdout']

type

Literal['console']

Methods __init__()

BraintrustErrorChunk

An error chunk from a Braintrust stream. Properties

data

str

type

Literal['error']

Methods __init__()

BraintrustInvokeError

An error that occurs during a Braintrust stream.

BraintrustJsonChunk

A chunk of JSON data from a Braintrust stream. Properties

data

str

type

Literal['json_delta']

Methods __init__()

BraintrustProgressChunk

A progress chunk from a Braintrust stream. Properties

data

str

object_type

str

format

str

output_type

str

name

str

event

Literal['json_delta', 'text_delta', 'reasoning_delta']

type

Literal['progress']

Methods __init__()

BraintrustStream

A Braintrust stream. This is a wrapper around a generator of BraintrustStreamChunk, with utility methods to make them easy to log and convert into various formats. Properties

stream

Iterable[BraintrustStreamChunk]

Methods __init__(), copy(), final_value()

BraintrustTextChunk

A chunk of text data from a Braintrust stream. Properties

data

str

type

Literal['text_delta']

Methods __init__()
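
The chunk classes above are distinguished by their type field, so a consumer can dispatch on it. The following is a minimal sketch, assuming each chunk exposes the type and data attributes listed above; collect_text is a hypothetical helper, not an SDK function.

```python
def collect_text(chunks):
    """Concatenate the text deltas from an iterable of stream chunks.

    Error chunks raise; console, progress, and JSON chunks are skipped.
    """
    parts = []
    for chunk in chunks:
        if chunk.type == "text_delta":
            parts.append(chunk.data)
        elif chunk.type == "error":
            raise RuntimeError(f"stream failed: {chunk.data}")
    return "".join(parts)
```

The chunks argument could be, for example, the stream property of a BraintrustStream. BraintrustStream also lists a final_value() method, which may be the more direct route when only the finished result is needed.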

ClassifierBuilder

Builder to create a classifier in Braintrust. Properties

project

Any

Methods __init__(), create()

CodeFunction

A generic callable, with metadata. Properties

project

Project

handler

Callable[..., Any]

name

str

slug

str

type_

str

description

str | None

parameters

Any

returns

Any

if_exists

IfExists | None

metadata

Metadata | None

tags

Sequence[str] | None

Methods __init__()

CodePrompt

A prompt defined in code, with metadata. Properties

project

Project

name

str

slug

str

prompt

PromptData

tool_functions

list[CodeFunction | SavedFunctionId]

description

str | None

function_type

str | None

if_exists

IfExists | None

metadata

Metadata | None

tags

Sequence[str] | None

Methods to_function_definition(), __init__()

DataSummary

Summary of a dataset’s data. Properties

new_records

int

New or updated records added in this session.

total_records

int

Total records in the dataset.

Methods __init__()

Dataset

A dataset is a collection of records, such as model inputs and outputs, which represent data you can use to evaluate and fine-tune models. You can log production data to datasets, curate them with interesting examples, edit/delete records, and run evaluations against them. Properties

new_records

Any

state

Any

id

str

name

str

data

Any

project

Any

logging_state

BraintrustState

Methods __init__(), insert(), update(), delete(), summarize(), close(), flush()

DatasetRef

Reference to a dataset by ID and optional version. Properties

id

str

version

str

DatasetSummary

Summary of a dataset’s scores and metadata. Properties

project_name

str

Name of the project that the dataset belongs to.

dataset_name

str

Name of the dataset.

project_url

str

URL to the project’s page in the Braintrust app.

dataset_url

str

URL to the dataset’s page in the Braintrust app.

data_summary

DataSummary | None

Summary of the dataset’s data.

Methods __init__()

EvalCase

An evaluation case. This is a single input to the evaluation task, along with an optional expected output, metadata, and tags. Properties

input

Input

expected

Expected | None

metadata

Metadata | None

tags

Sequence[str] | None

trial_count

int | None

id

str | None

created

str | None

Methods __init__()

EvalHooks

An object that can be used to add metadata to an evaluation. This is passed to the task function. Properties

metadata

Metadata | None

The metadata object for the current evaluation. You can mutate this object to add or remove metadata.

expected

Expected | None

The expected output for the current evaluation.

span

Span

The span under which the task is run. Also accessible via braintrust.current_span().

trial_index

int

The index of the current trial (0-based). This is useful when trial_count > 1.

tags

Sequence[str]

The tags for the current evaluation. You can mutate this object to add or remove tags.

parameters

ValidatedParameters | None

The parameters for the current evaluation. These are the validated parameter values that were passed to the evaluator.

Methods report_progress(), meta()
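
As an illustration of the properties above, a task might record per-trial details on the hooks object. This is a sketch only: answer_question and the metadata keys are hypothetical, and the uppercase transform stands in for a real model call.

```python
def answer_question(input, hooks):
    """Task that records per-trial details on the hooks object."""
    output = input.strip().upper()  # placeholder for a real model call
    # metadata is typed Metadata | None, so guard before mutating it.
    if hooks.metadata is not None:
        hooks.metadata["trial_index"] = hooks.trial_index
        hooks.metadata["output_length"] = len(output)
    return output
```

Recording trial_index this way makes it possible to tell repeated trials of the same input apart when trial_count is greater than 1.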

EvalResult

The result of an evaluation. This includes the input, expected output, actual output, and metadata. Properties

input

Input

output

Output

scores

Mapping[str, float | None]

classifications

dict[str, list[ClassificationItem]] | None

expected

Expected | None

metadata

Metadata | None

tags

list[str] | None

error

Exception | None

exc_info

str | None

Methods __init__()

EvalScorerArgs

Arguments passed to an evaluator scorer. This includes the input, expected output, actual output, and metadata. Properties

input

Input

output

Output

expected

Expected | None

metadata

Metadata | None

Evaluator

An evaluator is an abstraction that defines an evaluation dataset, a task to run on the dataset, and a set of scorers to evaluate the results of the task. Each method attribute can be synchronous or asynchronous (for optimal performance, it is recommended to provide asynchronous implementations). Properties

project_name

str

The name of the project the eval falls under.

eval_name

str

A name that describes the experiment. You do not need to change it each time the experiment runs.

data

EvalData[Input, Expected]

Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase or a dict with the same fields as an EvalCase (input, expected, metadata).

task

EvalTask[Input, Output, Expected]

Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.

scores

Sequence[EvalScorer[Input, Output, Expected]]

A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes input, output, and expected arguments and returns a Score object. The function can be async.

experiment_name

str | None

Optional experiment name. If not specified, a name will be generated automatically.

metadata

Metadata | None

A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.

tags

Sequence[str] | None

Optional list of tags for the experiment.

trial_count

int

is_public

bool

Whether the experiment should be public. Defaults to false.

update

bool

Whether to update an existing experiment with experiment_name if one exists. Defaults to false.

timeout

float | None

The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.

max_concurrency

int | None

The maximum number of tasks/scorers that will be run concurrently. Defaults to None, in which case there is no max concurrency.

project_id

str | None

If specified, uses the given project ID instead of the evaluator’s name to identify the project.

base_experiment_name

str | None

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

base_experiment_id

str | None

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.

git_metadata_settings

GitMetadataSettings | None

Optional settings for collecting git metadata. By default, defers entirely to org-level settings returned by the control plane; no git metadata is collected if the org has not configured any.

repo_info

RepoInfo | None

Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.

error_score_handler

ErrorScoreHandler[Input, Expected] | None

Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored. A default implementation is exported as default_error_score_handler which will log a 0 score to the root span for any scorer that was not run.

description

str | None

An optional description for the experiment.

summarize_scores

bool

Whether to summarize the scores of the experiment after it has run.

parameters

EvalParameters | RemoteEvalParameters | None

A set of parameters that will be passed to the evaluator. Can be used to define prompts or other configurable values.

classifiers

list[EvalClassifier[Input, Output, Expected]] | None

Optional list of classifiers to evaluate the task output. Classifier results are recorded under the classifications field instead of scores.

parameter_values

dict[str, Any] | None

Methods __init__()
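
The data, task, and scores attributes described above fit together roughly as in the sketch below, which wires them up through the Eval convenience wrapper. The project name "greeting-demo" and all function names are hypothetical. The scorer returns a plain number, which this sketch assumes the framework accepts in place of a Score object; wrap it in a Score if your SDK version requires one.

```python
def load_cases():
    """Yield evaluation cases as dicts with EvalCase fields."""
    yield {"input": "Ada", "expected": "Hello, Ada!", "metadata": {"lang": "en"}}
    yield {"input": "Grace", "expected": "Hello, Grace!", "metadata": {"lang": "en"}}


def greet(input):
    """The task under evaluation: build a greeting for one input."""
    return f"Hello, {input}!"


def exact_match(input, output, expected):
    """Score 1.0 when the output equals the expected value, else 0.0.

    Returning a bare float is an assumption of this sketch.
    """
    return 1.0 if output == expected else 0.0


def run_eval():
    """Run the evaluation; needs the braintrust package and credentials."""
    from braintrust import Eval

    return Eval(
        "greeting-demo",  # hypothetical project name
        data=load_cases,
        task=greet,
        scores=[exact_match],
        trial_count=1,
    )
```

Keeping the data, task, and scorer as plain functions means each can be unit-tested without contacting Braintrust; only run_eval() touches the SDK.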

Experiment

An experiment is a collection of logged events, such as model inputs and outputs, which represent a snapshot of your application at a particular point in time. An experiment is meant to capture more than just the model you use, and includes the data you use to test, pre- and post- processing code, comparison metrics (scores), and any other metadata you want to include. Properties

dataset

Any

last_start_time

Any

state

Any

id

str

name

str

data

Mapping[str, Any]

project

ObjectMetadata

logging_state

BraintrustState

Methods __init__(), log(), log_feedback(), start_span(), update_span(), fetch_base_experiment(), summarize(), export(), close(), flush()

ExperimentSummary

Summary of an experiment’s scores and metadata. Properties

project_name

str

Name of the project that the experiment belongs to.

project_id

str | None

ID of the project. May be None if the eval was run locally.

experiment_id

str | None

ID of the experiment. May be None if the eval was run locally.

experiment_name

str

Name of the experiment.

project_url

str | None

URL to the project’s page in the Braintrust app.

experiment_url

str | None

URL to the experiment’s page in the Braintrust app.

comparison_experiment_name

str | None

The name of the experiment that scores are baselined against.

scores

dict[str, ScoreSummary]

Summary of the experiment’s scores.

metrics

dict[str, MetricSummary]

Summary of the experiment’s metrics.

Methods __init__()
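
One way to consume a summary is to list the scores that got worse relative to the comparison experiment. The sketch below assumes that each entry in scores carries the diff field documented under ScoreSummary, and that a negative diff means the current experiment scored lower than the reference; regressed_scores is a hypothetical helper.

```python
def regressed_scores(summary):
    """Return the names of scores whose diff against the comparison is negative.

    Scores with no diff (for example, when there is no comparison
    experiment) are ignored.
    """
    return sorted(
        name
        for name, score in summary.scores.items()
        if score.diff is not None and score.diff < 0
    )
```

A check like this could gate a CI job, failing the build when the returned list is not empty.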

ExternalAttachment

Represents an attachment that resides in an external object store and the associated metadata. Properties

reference

AttachmentReference

The object that replaces this Attachment at upload time.

data

bytes

The attachment contents. This is a lazy value that will read the attachment contents from the external object store on first access.

Methods __init__(), upload(), debug_info()

JSONAttachment

A convenience class for creating attachments from JSON-serializable objects. Methods __init__()

MetricSummary

Summary of a metric’s performance. Properties

name

str

Name of the metric.

metric

float | int

Average metric across all examples.

unit

str

Unit label for the metric.

improvements

int | None

Number of improvements in the metric.

regressions

int | None

Number of regressions in the metric.

diff

float | None

Difference in metric between the current and reference experiment.

Methods __init__()

ParametersBuilder

Builder to create saved parameters in Braintrust. Properties

project

Any

Methods __init__(), create()

ParametersRef

Reference to saved parameters by ID and optional version. Properties

id

str

version

str

Project

A handle to a Braintrust project. Properties

name

Any

tools

Any

prompts

Any

parameters

Any

scorers

Any

classifiers

Any

Methods __init__(), add_code_function(), add_prompt(), add_parameters(), publish()

ProjectBuilder

Creates handles to Braintrust projects. Methods create()

Prompt

A prompt object consists of prompt text, a model, and model parameters (such as temperature), which can be used to generate completions or chat messages. The prompt object supports calling .build() which uses mustache templating to build the prompt with the given formatting options and returns a plain dictionary that includes the built prompt and arguments. The dictionary can be passed as kwargs to the OpenAI client or modified as you see fit. Properties

defaults

Any

no_trace

Any

id

str

name

str

slug

str

prompt

PromptBlockData | None

version

str

options

PromptOptions

Methods __init__(), from_prompt_data(), build()

PromptBuilder

Builder to create a prompt in Braintrust. Properties

project

Any

Methods __init__(), create()

ReadonlyAttachment

A readonly alternative to Attachment, which can be used for fetching already-uploaded Attachments. Properties

reference

Any

data

bytes

The attachment contents. This is a lazy value that will read the attachment contents from the object store on first access.

Methods __init__(), metadata(), status()

ReadonlyExperiment

A read-only view of an experiment, initialized by passing open=True to init(). Properties

state

Any

id

str

logging_state

BraintrustState

Methods __init__(), as_dataset()

RepoInfo

Information about the current HEAD of the repo. Properties

commit

str | None

branch

str | None

tag

str | None

dirty

bool | None

author_name

str | None

author_email

str | None

commit_message

str | None

commit_time

str | None

git_diff

str | None

Methods __init__()

ReporterDef

A reporter takes an evaluator and its result and returns a report. Properties

name

str

The name of the reporter.

report_eval

Callable[[Evaluator[Input, Output, Expected], EvalResultWithSummary[Input, Output, Expected], bool, bool], EvalReport | Awaitable[EvalReport]]

A function that takes an evaluator and its result and returns a report.

report_run

Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]

A function that takes all evaluator results and returns a boolean indicating whether the run was successful. If you return false, the braintrust eval command will exit with a non-zero status code.

Methods __init__()
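
A pair of reporter callbacks might look like the sketch below. Several details are assumptions: the two bool arguments are given the hypothetical names verbose and jsonl, the result object is assumed to expose a results list of EvalResult objects, and the report returned for each evaluator is simply a bool.

```python
def report_eval(evaluator, result, verbose, jsonl):
    """Return True when no evaluation case recorded an error.

    `result.results` is assumed to be a list of EvalResult objects.
    """
    failures = [r for r in result.results if r.error is not None]
    if verbose:
        for failure in failures:
            print(f"{evaluator.eval_name}: {failure.error!r}")
    return not failures


def report_run(reports, verbose, jsonl):
    """Treat the run as successful only if every evaluator report passed."""
    return all(reports)
```

These two functions could then be supplied as the report_eval and report_run fields of a ReporterDef. Returning False from report_run is what causes the braintrust eval command to exit with a non-zero status code.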

RetryRequestExceptionsAdapter

An HTTP adapter that automatically retries requests on connection exceptions. Properties

base_num_retries

Any

backoff_factor

Any

default_timeout_secs

Any

Methods __init__(), send()

SSEProgressEvent

A progress event that can be reported during task execution, specifically for SSE (Server-Sent Events) streams. This is a subclass of TaskProgressEvent with additional fields for SSE-specific metadata. Properties

id

str

object_type

str

origin

ObjectReference

name

str

ScoreSummary

Summary of a score’s performance. Properties

name

str

Name of the score.

score

float

Average score across all examples.

improvements

int | None

Number of improvements in the score.

regressions

int | None

Number of regressions in the score.

diff

float | None

Difference in score between the current and reference experiment.

Methods __init__()

ScorerBuilder

Builder to create a scorer in Braintrust. Properties

project

Any

Methods __init__(), create()

Span

A Span encapsulates logged data and metrics for a unit of work. This interface is shared by all span implementations. Properties

id

str

Row ID of the span.

name

str

Name of the span, for display purposes only.

Methods log(), log_feedback(), start_span(), export(), inject(), link(), permalink(), end(), flush(), close(), set_attributes(), set_current(), unset_current()

SpanIds

The three IDs that define a span’s position in the trace tree. Properties

span_id

str

root_span_id

str

span_parents

list[str] | None

Methods __init__()

SpanImpl

Primary implementation of the Span interface. See the Span interface for full details on each method. Properties

can_set_current

bool

state

Any

parent_object_type

Any

parent_object_id

Any

parent_compute_object_metadata_args

Any

propagated_event

Any

propagated_state

Any

span_id

Any

root_span_id

Any

span_parents

Any

id

str

name

str

Methods __init__(), set_attributes(), log(), log_internal(), log_feedback(), start_span(), end(), export(), inject(), link(), permalink(), close(), flush(), set_current(), unset_current()

SpanInternalOptions

Options reserved for Braintrust SDK internals. Properties

instrumentation

str

Identifier for the instrumentation code creating the span. Stamped into context.span_origin.instrumentation.name. Set by SDK integrations (openai-auto, anthropic-auto, etc.).

SpanScope

Scope for operating on a single span. Properties

type

Literal['span']

id

str

root_span_id

str

SyncScorerLike

Protocol for synchronous scorers that implement the callable interface. This is the most common interface and is used when no async version is available. Methods __call__()

TaskProgressEvent

Progress event that can be reported during task execution. Properties

format

FunctionFormat

output_type

FunctionOutputType

event

Literal['reasoning_delta', 'text_delta', 'json_delta', 'error', 'console', 'start', 'done', 'progress']

data

str

ToolBuilder

Builder to create a tool in Braintrust. Properties

project

Any

Methods __init__(), create()

TraceScope

Scope for operating on an entire trace. Properties

type

Literal['trace']

root_span_id

str

Integration modules

braintrust.integrations.adk

Braintrust integration for Google ADK. Import from braintrust.integrations.adk.

adk.ADKIntegration

Braintrust instrumentation for Google ADK (Agent Development Kit). Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

adk.setup_adk

Set up the Braintrust integration with Google ADK. This automatically patches Google ADK agents, runners, flows, and MCP tools for tracing.

api_key

str | None

Braintrust API key.

project_id

str | None

Braintrust project ID.

project_name

str | None

Braintrust project name.

SpanProcessor

type | None

Deprecated parameter.

adk.setup_braintrust

args

Any

kwargs

Any

adk.wrap_agent

Manually patch an agent class for tracing.

Agent

Any

adk.wrap_runner

Manually patch a runner class for tracing.

Runner

Any

adk.wrap_flow

Manually patch a flow class for tracing.

Flow

Any

adk.wrap_mcp_tool

Manually patch an MCP tool class for tracing.

McpTool

Any

braintrust.integrations.agentscope

Braintrust integration for AgentScope. Import from braintrust.integrations.agentscope.

agentscope.AgentScopeIntegration

Braintrust instrumentation for AgentScope. Requires AgentScope v1.0.0 or higher. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

agentscope.setup_agentscope

Set up the Braintrust integration with AgentScope.

api_key

str | None

project_id

str | None

project_name

str | None

braintrust.integrations.agno

Braintrust integration for Agno. Import from braintrust.integrations.agno.

agno.AgnoIntegration

Braintrust instrumentation for Agno. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

agno.setup_agno

Set up the Braintrust integration with Agno. This automatically patches Agno agents, models, and function calls for tracing.

api_key

str | None

Braintrust API key. Optional; the BRAINTRUST_API_KEY environment variable can be used instead.

project_id

str | None

Braintrust project ID. Optional.

project_name

str | None

Braintrust project name. Optional; the BRAINTRUST_PROJECT environment variable can be used instead.

agno.wrap_agent

Manually patch an Agent class for tracing.

Agent

Any

agno.wrap_function_call

Manually patch a FunctionCall class for tracing.

FunctionCall

Any

agno.wrap_model

Manually patch a Model class for tracing.

Model

Any

agno.wrap_team

Manually patch a Team class for tracing.

Team

Any

agno.wrap_workflow

Manually patch a Workflow class for tracing.

Workflow

Any

braintrust.integrations.anthropic

Import from braintrust.integrations.anthropic.

anthropic.AnthropicIntegration

Braintrust instrumentation for the Anthropic Python SDK on anthropic>=0.48.0. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

anthropic.wrap_anthropic_client

client

Any

braintrust.integrations.autogen

Braintrust AutoGen integration. Import from braintrust.integrations.autogen.

autogen.AutoGenIntegration

Braintrust instrumentation for Microsoft AutoGen AgentChat. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

autogen.setup_autogen

Set up the Braintrust integration with AutoGen.

api_key

str | None

project_id

str | None

project_name

str | None

braintrust.integrations.bedrock_runtime

Public entry points for the boto3 Bedrock Runtime integration. Import from braintrust.integrations.bedrock_runtime.

bedrock_runtime.BedrockRuntimeIntegration

Braintrust instrumentation for boto3 Bedrock Runtime clients. Properties

name

Any

import_names

Any

distribution_names

Any

min_version

Any

patchers

Any

bedrock_runtime.setup_bedrock

Patch botocore client creation to auto-wrap Bedrock Runtime clients.

bedrock_runtime.wrap_bedrock

Instrument a boto3 Bedrock Runtime client instance in place.

client

Any

braintrust.integrations.claude_agent_sdk

Braintrust integration for Claude Agent SDK with automatic tracing. Import from braintrust.integrations.claude_agent_sdk.

claude_agent_sdk.ClaudeAgentSDKIntegration

Braintrust instrumentation for the Claude Agent SDK. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

claude_agent_sdk.setup_claude_agent_sdk

Set up the Braintrust integration with the Claude Agent SDK. This automatically patches the SDK for tracing.

api_key

str | None

Braintrust API key.

project_id

str | None

Braintrust project ID.

project

str | None

Braintrust project name.

braintrust.integrations.cohere

Braintrust integration for the Cohere Python SDK. Import from braintrust.integrations.cohere.

cohere.CohereIntegration

Braintrust instrumentation for the Cohere Python SDK. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

cohere.wrap_cohere

Wrap a Cohere client instance for Braintrust tracing.

client

Any

braintrust.integrations.crewai

Braintrust integration for CrewAI. Import from braintrust.integrations.crewai.

crewai.BraintrustCrewAIListener

CrewAI event-bus listener that maps events into Braintrust spans.

crewai.CrewAIIntegration

Braintrust instrumentation for CrewAI. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

crewai.patch_crewai

crewai.setup_crewai

Set up Braintrust tracing for CrewAI.

api_key

str | None

Braintrust API key (optional, BRAINTRUST_API_KEY env var works too).

project_id

str | None

Braintrust project id (optional).

project_name

str | None

Braintrust project name (optional, BRAINTRUST_PROJECT env var works too).

braintrust.integrations.dspy

Braintrust integration for DSPy. Import from braintrust.integrations.dspy.

dspy.BraintrustDSpyCallback

Callback handler that logs DSPy execution traces to Braintrust. Methods __init__(), on_lm_start(), on_lm_end(), on_module_start(), on_module_end(), on_adapter_format_start(), on_adapter_format_end(), on_adapter_parse_start(), on_adapter_parse_end(), on_tool_start(), on_tool_end(), on_evaluate_start(), on_evaluate_end()

dspy.DSPyIntegration

Braintrust instrumentation for DSPy. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

dspy.patch_dspy

Patch DSPy to automatically add Braintrust tracing callback.

braintrust.integrations.google_genai

Braintrust integration for Google GenAI. Import from braintrust.integrations.google_genai.

google_genai.GoogleGenAIIntegration

Braintrust instrumentation for the Google GenAI Python SDK. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

google_genai.setup_genai

Set up the Braintrust integration with Google GenAI.

api_key

str | None

Braintrust API key.

project_id

str | None

Braintrust project ID.

project_name

str | None

Braintrust project name.

braintrust.integrations.huggingface_hub

Braintrust integration for the HuggingFace Hub Python SDK. Import from braintrust.integrations.huggingface_hub.

huggingface_hub.HuggingFaceHubIntegration

Braintrust instrumentation for the HuggingFace Hub Python SDK. Properties

name

Any

import_names

Any

distribution_names

Any

min_version

Any

patchers

Any

huggingface_hub.wrap_huggingface_hub

Wrap an InferenceClient or AsyncInferenceClient for Braintrust tracing.

client

Any

braintrust.integrations.instructor

Braintrust integration for the Instructor structured-output library. Import from braintrust.integrations.instructor.

instructor.InstructorIntegration

Braintrust instrumentation for the Instructor structured-output library. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

instructor.wrap_instructor

Instrument an instructor.Instructor / AsyncInstructor client in place.

client

Any

braintrust.integrations.langchain

Braintrust integration for LangChain. Import from braintrust.integrations.langchain.

langchain.BraintrustCallbackHandler

Methods __init__()

langchain.LangChainIntegration

Braintrust instrumentation for LangChain via a global callback handler. Properties

name

Any

import_names

Any

patchers

Any

langchain.set_global_handler

handler

Any

langchain.setup_langchain

Set up the Braintrust integration with LangChain.

api_key

str | None

project_id

str | None

project_name

str | None

braintrust.integrations.litellm

Braintrust LiteLLM integration. Import from braintrust.integrations.litellm.

litellm.LiteLLMIntegration

Braintrust instrumentation for the LiteLLM Python SDK. Properties

name

Any

import_names

Any

patchers

Any

litellm.patch_litellm

Patch LiteLLM top-level callables to emit Braintrust spans.

litellm.wrap_litellm

Wrap a LiteLLM module-like object in-place with Braintrust tracing.

litellm

Any

braintrust.integrations.livekit_agents

Import from braintrust.integrations.livekit_agents.

livekit_agents.LiveKitAgentsIntegration

Properties

name

Any

import_names

Any

distribution_names

Any

min_version

Any

patchers

Any

livekit_agents.setup_livekit_agents

livekit_agents.wrap_livekit_agents

target

Any

braintrust.integrations.llamaindex

Braintrust integration for LlamaIndex. Import from braintrust.integrations.llamaindex.

llamaindex.BraintrustSpanHandler

Methods __init__()

llamaindex.LlamaIndexIntegration

Braintrust instrumentation for LlamaIndex via dispatcher handlers. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

llamaindex.setup_llamaindex

api_key

str | None

project_id

str | None

project_name

str | None

braintrust.integrations.mistral

Braintrust integration for the Mistral Python SDK. Import from braintrust.integrations.mistral.

mistral.MistralIntegration

Braintrust instrumentation for the Mistral Python SDK. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

mistral.wrap_mistral

Wrap a single Mistral client instance for tracing.

client

Any

braintrust.integrations.openai

Braintrust integration for the OpenAI Python SDK and OpenAI-compatible gateways. Import from braintrust.integrations.openai.

openai.OpenAIIntegration

Braintrust instrumentation for the OpenAI Python SDK. Properties

name

Any

import_names

Any

patchers

Any

openai.setup_openai

Set up the Braintrust integration with OpenAI.

api_key

str | None

Braintrust API key. Optional; the BRAINTRUST_API_KEY environment variable can be used instead.

project_id

str | None

Braintrust project ID. Optional.

project_name

str | None

Braintrust project name. Optional; the BRAINTRUST_PROJECT environment variable can be used instead.

openai.wrap_openai

Manually wrap an OpenAI client instance for tracing.

client

Any

braintrust.integrations.openai_agents

Braintrust integration for the OpenAI Agents SDK. Import from braintrust.integrations.openai_agents.

openai_agents.BraintrustTracingProcessor

Methods __init__()

openai_agents.OpenAIAgentsIntegration

Braintrust instrumentation for the OpenAI Agents SDK. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

openai_agents.setup_openai_agents

Set up Braintrust tracing for the OpenAI Agents SDK.

api_key

str | None

project_id

str | None

project

str | None

project_name

str | None

braintrust.integrations.openrouter

Braintrust integration for the OpenRouter Python SDK. Import from braintrust.integrations.openrouter.

openrouter.OpenRouterIntegration

Braintrust instrumentation for the OpenRouter Python SDK. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

openrouter.wrap_openrouter

Wrap a single OpenRouter client instance for tracing.

client

Any

braintrust.integrations.pydantic_ai

Braintrust integration for Pydantic AI. Import from braintrust.integrations.pydantic_ai.

pydantic_ai.PydanticAIIntegration

Braintrust instrumentation for Pydantic AI. Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

pydantic_ai.setup_pydantic_ai

Set up the Braintrust integration with Pydantic AI. This automatically patches Pydantic AI agents and direct API functions for tracing.

api_key

str | None

Braintrust API key.

project_id

str | None

Braintrust project ID.

project_name

str | None

Braintrust project name.

pydantic_ai.wrap_agent

Agent

Any

pydantic_ai.wrap_model_classes

Deprecated compatibility shim for scanning currently loaded model subclasses.

pydantic_ai.wrap_model_request

original_func

Any

pydantic_ai.wrap_model_request_sync

original_func

Any

pydantic_ai.wrap_model_request_stream

original_func

Any

pydantic_ai.wrap_model_request_stream_sync

original_func

Any

braintrust.integrations.strands

Braintrust integration for Strands Agents. Import from braintrust.integrations.strands.

strands.StrandsIntegration

Braintrust instrumentation for Strands Agents.

Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

strands.setup_strands

Set up Braintrust tracing for Strands Agents.

api_key

str | None

project_id

str | None

project_name

str | None

strands.wrap_strands_tracer

Manually patch a Strands Tracer class for Braintrust tracing.

Tracer

Any

braintrust.integrations.temporal

Braintrust integration for Temporal workflows and activities. Import from braintrust.integrations.temporal.

temporal.BraintrustInterceptor

Braintrust interceptor for tracing Temporal workflows and activities.

Properties

payload_converter

Any

Methods __init__(), intercept_client(), intercept_activity(), workflow_interceptor_class()

temporal.BraintrustPlugin

Braintrust plugin for Temporal that automatically configures tracing.

Methods __init__()

temporal.TemporalIntegration

Braintrust instrumentation for Temporal workflows and activities.

Properties

name

Any

import_names

Any

min_version

Any

patchers

Any

temporal.setup_temporal

Set up Braintrust auto-instrumentation for Temporal.

api_key

str | None

project_id

str | None

project_name

str | None
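The setup functions in these integration modules follow a common naming pattern (setup_openai, setup_openai_agents, setup_pydantic_ai, setup_strands, setup_temporal), so an application that enables integrations from configuration could resolve them by name. The following is a rough sketch under that assumption: SETUP_FUNCTIONS, setup_target, and setup_integration are hypothetical helpers, the table covers only the integrations listed in this part of the reference, and openrouter is omitted because only wrap_openrouter is documented for it.

```python
import importlib
from typing import Any

# Integrations in this part of the reference that list a setup_* function.
SETUP_FUNCTIONS = {
    "openai": "setup_openai",
    "openai_agents": "setup_openai_agents",
    "pydantic_ai": "setup_pydantic_ai",
    "strands": "setup_strands",
    "temporal": "setup_temporal",
}


def setup_target(integration: str) -> tuple:
    """Return (module path, function name) for a documented setup function."""
    try:
        function_name = SETUP_FUNCTIONS[integration]
    except KeyError:
        raise ValueError(f"no setup function documented for {integration!r}") from None
    return f"braintrust.integrations.{integration}", function_name


def setup_integration(integration: str, **kwargs: Any) -> Any:
    """Import the integration module and call its setup function."""
    module_path, function_name = setup_target(integration)
    # Requires braintrust (and the integrated library) to be installed.
    module = importlib.import_module(module_path)
    return getattr(module, function_name)(**kwargs)


print(setup_target("temporal"))
```

For example, setup_integration("strands", project_name="my-project") would import braintrust.integrations.strands and call setup_strands(project_name="my-project").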
