For complete Python documentation, examples, and API reference, please see the Braintrust SDK on GitHub.

Requirements

  • Python 3.10 or higher

Installation

pip install braintrust

Functions

Eval

A function you can use to define an evaluator. This is a convenience wrapper around the Evaluator class.
name
str
The name of the evaluator. This corresponds to a project name in Braintrust.
data
EvalData[Input, Output]
Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase.
task
EvalTask[Input, Output]
Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.
scores
Sequence[EvalScorer[Input, Output]]
A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes an EvalScorerArgs object and returns a Score object.
experiment_name
str | None
(Optional) Experiment name. If not specified, a name will be generated automatically.
trial_count
int
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
metadata
Metadata | None
(Optional) A dictionary with additional data about the test example, model outputs, or anything else relevant that you can use to help find and analyze examples later. For example, you could log the prompt, the example's id, or anything else that would be useful to slice and dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
is_public
bool
(Optional) Whether the experiment should be public. Defaults to false.
update
bool
(Optional) Whether to update an existing experiment with experiment_name if one exists. Defaults to false.
reporter
ReporterDef[Input, Output, EvalReport] | None
(Optional) A reporter that takes an evaluator and its result and returns a report.
reporter.name
str
required
reporter.report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
required
reporter.report_run
Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]
required
timeout
float | None
(Optional) The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.
max_concurrency
int | None
(Optional) The maximum number of tasks/scorers that will be run concurrently. Defaults to None, in which case there is no max concurrency.
project_id
str | None
(Optional) If specified, uses the given project ID instead of the evaluator’s name to identify the project.
base_experiment_name
str | None
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
base_experiment_id
str | None
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.
git_metadata_settings
GitMetadataSettings | None
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
git_metadata_settings.collect
Literal['all', 'some', 'none']
required
git_metadata_settings.fields
list[str] | None
required
repo_info
RepoInfo | None
Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
repo_info.commit
str | None
required
repo_info.branch
str | None
required
repo_info.tag
str | None
required
repo_info.dirty
bool | None
required
repo_info.author_name
str | None
required
repo_info.author_email
str | None
required
repo_info.commit_message
str | None
required
repo_info.commit_time
str | None
required
repo_info.git_diff
str | None
required
error_score_handler
ErrorScoreHandler | None
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
description
str | None
An optional description for the experiment.
summarize_scores
bool
Whether to summarize the scores of the experiment after it has run.
no_send_logs
bool
Do not send logs to Braintrust. When True, the evaluation runs locally and builds a local summary instead of creating an experiment. Defaults to False.
parameters
EvalParameters | None
A set of parameters that will be passed to the evaluator.
on_start
Callable[[ExperimentSummary], None] | None
An optional callback that will be called when the evaluation starts. It receives the ExperimentSummary object, which can be used to display metadata about the experiment.
stream
Callable[[SSEProgressEvent], None] | None
A function that will be called with progress events, which can be used to display intermediate progress.
parent
str | None
If specified, instead of creating a new experiment object, the Eval() will populate the object or span specified by this parent.
state
BraintrustState | None
Optional BraintrustState to use for the evaluation. If not specified, the global login state will be used.
enable_cache
bool
Whether to enable the span cache for this evaluation. Defaults to True. The span cache stores span data on disk to minimize memory usage and allow scorers to read spans without server round-trips.
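
A minimal sketch of a complete evaluator. The project name, data, and task are illustrative, and the Levenshtein scorer comes from the separately installed autoevals package:

from braintrust import Eval
from autoevals import Levenshtein  # scorer library, installed separately

Eval(
    "Say Hi Bot",  # project name in Braintrust
    data=lambda: [
        {"input": "Foo", "expected": "Hi Foo"},
        {"input": "Bar", "expected": "Hello Bar"},
    ],
    task=lambda input: "Hi " + input,  # the function under evaluation
    scores=[Levenshtein],
)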

EvalAsync

The asynchronous version of Eval: a function you can use to define an evaluator from async code. This is a convenience wrapper around the Evaluator class.
name
str
The name of the evaluator. This corresponds to a project name in Braintrust.
data
EvalData[Input, Output]
Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase.
task
EvalTask[Input, Output]
Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.
scores
Sequence[EvalScorer[Input, Output]]
A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes an EvalScorerArgs object and returns a Score object.
experiment_name
str | None
(Optional) Experiment name. If not specified, a name will be generated automatically.
trial_count
int
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
metadata
Metadata | None
(Optional) A dictionary with additional data about the test example, model outputs, or anything else relevant that you can use to help find and analyze examples later. For example, you could log the prompt, the example's id, or anything else that would be useful to slice and dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
is_public
bool
(Optional) Whether the experiment should be public. Defaults to false.
update
bool
(Optional) Whether to update an existing experiment with experiment_name if one exists. Defaults to false.
reporter
ReporterDef[Input, Output, EvalReport] | None
(Optional) A reporter that takes an evaluator and its result and returns a report.
reporter.name
str
required
reporter.report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
required
reporter.report_run
Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]
required
timeout
float | None
(Optional) The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.
max_concurrency
int | None
(Optional) The maximum number of tasks/scorers that will be run concurrently. Defaults to None, in which case there is no max concurrency.
project_id
str | None
(Optional) If specified, uses the given project ID instead of the evaluator’s name to identify the project.
base_experiment_name
str | None
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
base_experiment_id
str | None
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.
git_metadata_settings
GitMetadataSettings | None
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
git_metadata_settings.collect
Literal['all', 'some', 'none']
required
git_metadata_settings.fields
list[str] | None
required
repo_info
RepoInfo | None
Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
repo_info.commit
str | None
required
repo_info.branch
str | None
required
repo_info.tag
str | None
required
repo_info.dirty
bool | None
required
repo_info.author_name
str | None
required
repo_info.author_email
str | None
required
repo_info.commit_message
str | None
required
repo_info.commit_time
str | None
required
repo_info.git_diff
str | None
required
error_score_handler
ErrorScoreHandler | None
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
description
str | None
An optional description for the experiment.
summarize_scores
bool
Whether to summarize the scores of the experiment after it has run.
no_send_logs
bool
Do not send logs to Braintrust. When True, the evaluation runs locally and builds a local summary instead of creating an experiment. Defaults to False.
parameters
EvalParameters | None
A set of parameters that will be passed to the evaluator.
on_start
Callable[[ExperimentSummary], None] | None
An optional callback that will be called when the evaluation starts. It receives the ExperimentSummary object, which can be used to display metadata about the experiment.
stream
Callable[[SSEProgressEvent], None] | None
A function that will be called with progress events, which can be used to display intermediate progress.
parent
str | None
If specified, instead of creating a new experiment object, the Eval() will populate the object or span specified by this parent.
state
BraintrustState | None
Optional BraintrustState to use for the evaluation. If not specified, the global login state will be used.
enable_cache
bool
Whether to enable the span cache for this evaluation. Defaults to True. The span cache stores span data on disk to minimize memory usage and allow scorers to read spans without server round-trips.
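
A minimal sketch of the async variant, suitable for calling from an existing event loop; the project name and task are illustrative, and Levenshtein again comes from the separately installed autoevals package:

import asyncio
from braintrust import EvalAsync
from autoevals import Levenshtein

async def say_hi(input):
    return "Hi " + input

async def main():
    await EvalAsync(
        "Say Hi Bot",
        data=lambda: [{"input": "Foo", "expected": "Hi Foo"}],
        task=say_hi,
        scores=[Levenshtein],
    )

asyncio.run(main())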

Reporter

A function you can use to define a reporter. This is a convenience wrapper around the ReporterDef class.
name
str
The name of the reporter.
report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
A function that takes an evaluator and its result and returns a report.
report_run
Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]
A function that takes all evaluator results and returns a boolean indicating whether the run was successful.
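
A sketch of a custom reporter, assuming EvalResultWithSummary exposes a results list of EvalResult objects; the report shape returned by report_eval is entirely up to you:

from braintrust import Reporter

def report_eval(evaluator, result, verbose, jsonl):
    errors = [r.error for r in result.results if r.error is not None]
    return {"eval_name": evaluator.eval_name, "error_count": len(errors)}

def report_run(eval_reports, verbose, jsonl):
    # Returning False makes the `braintrust eval` command exit with a non-zero status code.
    return all(report["error_count"] == 0 for report in eval_reports)

Reporter(
    "error-reporter",
    report_eval=report_eval,
    report_run=report_run,
)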

current_experiment

Returns the currently-active experiment (set by braintrust.init(...)). Returns None if no current experiment has been set.

current_logger

Returns the currently-active logger (set by braintrust.init_logger(...)). Returns None if no current logger has been set.

current_span

Return the currently-active span for logging (set by running a span under a context manager). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.

flush

Flush any pending rows to the server.

get_prompt_versions

Get the versions for a specific prompt.
project_id
str
The ID of the project to query
prompt_id
str
The ID of the prompt to get versions for

get_span_parent_object

Mainly for internal use. Return the parent object for starting a span in a global context. Applies precedence: current span > propagated parent string > experiment > logger.
parent
str | None
state
BraintrustState | None

init

Log in, and then initialize a new experiment in a specified project. If the project does not exist, it will be created.
project
str | None
The name of the project to create the experiment in. Must specify at least one of project or project_id.
experiment
str | None
The name of the experiment to create. If not specified, a name will be generated automatically.
description
str | None
(Optional) An optional description of the experiment.
dataset
Optional[Dataset] | DatasetRef
(Optional) A dataset to associate with the experiment. The dataset must be initialized with braintrust.init_dataset before passing it into the experiment.
open
bool
If the experiment already exists, open it in read-only mode. Throws an error if the experiment does not already exist.
base_experiment
str | None
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment. Otherwise, it will pick an experiment by finding the closest ancestor on the default (e.g. main) branch.
is_public
bool
An optional parameter to control whether the experiment is publicly visible to anybody with the link or privately visible to only members of the organization. Defaults to private.
app_url
str | None
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
api_key
str | None
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.
org_name
str | None
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
metadata
Metadata | None
(Optional) A dictionary with additional data about the test example, model outputs, or anything else relevant that you can use to help find and analyze examples later. For example, you could log the prompt, the example's id, or anything else that would be useful to slice and dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
git_metadata_settings
GitMetadataSettings | None
(Optional) Settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
git_metadata_settings.collect
Literal['all', 'some', 'none']
required
git_metadata_settings.fields
list[str] | None
required
set_current
bool
If true (the default), set the global current-experiment to the newly-created one.
update
bool | None
If the experiment already exists, continue logging to it. If it does not exist, the experiment will be created with the specified arguments.
project_id
str | None
The id of the project to create the experiment in. This takes precedence over project if specified.
base_experiment_id
str | None
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this. This takes precedence over base_experiment if specified.
repo_info
RepoInfo | None
(Optional) Explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
repo_info.commit
str | None
required
repo_info.branch
str | None
required
repo_info.tag
str | None
required
repo_info.dirty
bool | None
required
repo_info.author_name
str | None
required
repo_info.author_email
str | None
required
repo_info.commit_message
str | None
required
repo_info.commit_time
str | None
required
repo_info.git_diff
str | None
required
state
BraintrustState | None
(Optional) A BraintrustState object to use. If not specified, will use the global state. This is for advanced use only.
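
A short sketch of creating an experiment and logging a single record to it; the project name, fields, and score names are illustrative:

import braintrust

experiment = braintrust.init(project="My Project", experiment="baseline")
experiment.log(
    input={"question": "What is 1 + 1?"},
    output="2",
    expected="2",
    scores={"exact_match": 1},
    metadata={"model": "gpt-4o-mini"},
)
print(experiment.summarize())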

init_dataset

Create a new dataset in a specified project. If the project does not exist, it will be created.
project
str | None
name
str | None
The name of the dataset to create. If not specified, a name will be generated automatically.
description
str | None
An optional description of the dataset.
version
str | int | None
An optional version of the dataset (to read). If not specified, the latest version will be used.
app_url
str | None
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
api_key
str | None
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.
org_name
str | None
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
project_id
str | None
The id of the project to create the dataset in. This takes precedence over project if specified.
metadata
Metadata | None
(Optional) A dictionary with additional data about the dataset. The values in metadata can be any JSON-serializable type, but its keys must be strings.
use_output
bool
(Deprecated) If True, records will be fetched from this dataset in the legacy format, with the “expected” field renamed to “output”. This option will be removed in a future version of Braintrust.
_internal_btql
dict[str, Any] | None
(Internal) If specified, the dataset will be created with the given BTQL filters.
state
BraintrustState | None
(Internal) The Braintrust state to use. If not specified, will use the global state. For advanced use only.
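
A short sketch of inserting into and reading from a dataset; the project and dataset names are illustrative:

import braintrust

dataset = braintrust.init_dataset(project="My Project", name="qa-pairs")
dataset.insert(
    input={"question": "What is 1 + 1?"},
    expected="2",
    metadata={"split": "train"},
)
print(dataset.summarize())

# Datasets are iterable, so one can also be passed directly as `data=` to Eval().
for record in dataset:
    print(record["input"], record["expected"])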

init_experiment

Alias for init().
args
Any
kwargs
Any

init_function

Creates a function that can be used as either a task or scorer in the Eval framework. When used as a task, it will invoke the specified Braintrust function with the input. When used as a scorer, it will invoke the function with the scorer arguments.
project_name
str
The name of the project containing the function.
slug
str
The slug of the function to invoke.
version
str | None
Optional version of the function to use. Defaults to latest.
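
A sketch of using Braintrust-hosted functions inside an Eval; the project name and the "answer-bot" and "accuracy-scorer" slugs are hypothetical:

from braintrust import Eval, init_function

Eval(
    "My Project",
    data=lambda: [{"input": "What is 1 + 1?", "expected": "2"}],
    # Use a prompt defined in Braintrust as the task...
    task=init_function(project_name="My Project", slug="answer-bot"),
    # ...and a Braintrust-hosted scorer to grade the output.
    scores=[init_function(project_name="My Project", slug="accuracy-scorer")],
)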

init_logger

Create a new logger in a specified project. If the project does not exist, it will be created.
project
str | None
The name of the project to log into. If unspecified, will default to the Global project.
project_id
str | None
The id of the project to log into. This takes precedence over project if specified.
async_flush
bool
If true (the default), log events will be batched and sent asynchronously in a background thread. If false, log events will be sent synchronously. Set to false in serverless environments.
app_url
str | None
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
api_key
str | None
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.
org_name
str | None
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
force_login
bool
Log in again, even if you have already logged in. By default, the logger will not log in if you are already logged in.
set_current
bool
If true (the default), set the global current-logger to the newly-created one.
state
BraintrustState | None
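
A minimal logging sketch; the project name and logged fields are illustrative:

import braintrust

logger = braintrust.init_logger(project="Chat Service")
logger.log(
    input={"message": "Hello"},
    output={"reply": "Hi there!"},
    metadata={"user_id": "user-123"},
)
logger.flush()  # ensure pending rows are sent before the process exits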

invoke

Invoke a Braintrust function, returning a BraintrustStream or the value as a plain Python object.
function_id
str | None
The ID of the function to invoke.
version
str | None
The version of the function to invoke.
prompt_session_id
str | None
The ID of the prompt session to invoke the function from.
prompt_session_function_id
str | None
The ID of the function in the prompt session to invoke.
project_name
str | None
The name of the project containing the function to invoke.
project_id
str | None
The ID of the project to use for execution context (API keys, project defaults, etc.). This is not the project the function belongs to, but the project context for the invocation.
slug
str | None
The slug of the function to invoke.
global_function
str | None
The name of the global function to invoke.
function_type
FunctionTypeEnum | None
The type of the global function to invoke. If unspecified, defaults to ‘scorer’ for backward compatibility.
input
Any
The input to the function. This will be logged as the input field in the span.
messages
list[Any] | None
Additional OpenAI-style messages to add to the prompt (only works for llm functions).
metadata
dict[str, Any] | None
Additional metadata to add to the span. This will be logged as the metadata field in the span. It will also be available as the {{metadata}} field in the prompt and as the metadata argument to the function.
tags
list[str] | None
Tags to add to the span. This will be logged as the tags field in the span.
parent
Exportable | str | None
The parent of the function. This can be an existing span, logger, or experiment, or the output of .export() if you are distributed tracing. If unspecified, will use the same semantics as traced() to determine the parent and no-op if not in a tracing context.
stream
bool
Whether to stream the function’s output. If True, the function will return a BraintrustStream, otherwise it will return the output of the function as a JSON object.
mode
ModeType | None
The response shape of the function if returning tool calls. If “auto”, will return a string if the function returns a string, and a JSON object otherwise. If “parallel”, will return an array of JSON objects with one object per tool call.
strict
bool | None
Whether to use strict mode for the function. If true, the function will throw an error if the variable names in the prompt do not match the input keys.
org_name
str | None
The name of the Braintrust organization to use.
api_key
str | None
The API key to use for authentication.
app_url
str | None
The URL of the Braintrust application.
force_login
bool
Whether to force a new login even if already logged in.
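
A minimal invocation sketch, assuming a hypothetical "summarizer" function exists in the project:

from braintrust import invoke

result = invoke(
    project_name="My Project",
    slug="summarizer",
    input={"text": "Braintrust is a platform for evaluating AI applications."},
)
print(result)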

load_prompt

Loads a prompt from the specified project.
project
str | None
The name of the project to load the prompt from. Must specify at least one of project or project_id.
slug
str | None
The slug of the prompt to load.
version
str | int | None
An optional version of the prompt (to read). If not specified, the latest version will be used.
project_id
str | None
The id of the project to load the prompt from. This takes precedence over project if specified.
id
str | None
The id of a specific prompt to load. If specified, this takes precedence over all other parameters (project, slug, version).
defaults
Mapping[str, Any] | None
(Optional) A dictionary of default values to use when rendering the prompt. Prompt values will override these defaults.
no_trace
bool
If true, do not include logging metadata for this prompt when build() is called.
environment
str | None
The environment to load the prompt from. Cannot be used together with version.
app_url
str | None
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
api_key
str | None
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.
org_name
str | None
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
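
A sketch of loading a prompt and passing its built arguments to OpenAI, assuming a hypothetical "summarizer" prompt whose template uses a {{text}} variable:

import braintrust
from openai import OpenAI

client = braintrust.wrap_openai(OpenAI())
prompt = braintrust.load_prompt(project="My Project", slug="summarizer")

# build() renders the mustache template and returns kwargs (model, messages, etc.)
# that can be passed directly to the OpenAI client.
completion = client.chat.completions.create(
    **prompt.build(text="Braintrust is a platform for evaluating AI applications.")
)
print(completion.choices[0].message.content)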

log

Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.
event
Any

login

Log into Braintrust. This will prompt you for your API token, which you can find at https://www.braintrust.dev/app/token. This method is called automatically by init().
app_url
str | None
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
api_key
str | None
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login.
org_name
str | None
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
force_login
bool
Log in again, even if you have already logged in. By default, this function exits quickly if you are already logged in.

parent_context

Context manager to temporarily set the parent context for spans.
parent
str | None
The parent string to set during the context
state
BraintrustState | None
Optional BraintrustState to use. If not provided, uses the global state.

parse_stream

Parse a BraintrustStream into its final value.
stream
BraintrustStream
The BraintrustStream to parse.
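
A sketch combining a streaming invoke call with parse_stream; the project and slug are the same hypothetical ones as above:

from braintrust import invoke, parse_stream

stream = invoke(
    project_name="My Project",
    slug="summarizer",
    input={"text": "Braintrust is a platform for evaluating AI applications."},
    stream=True,  # returns a BraintrustStream instead of the final value
)
print(parse_stream(stream))  # blocks until the stream finishes and returns its final value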

patch_anthropic

Patch Anthropic to add Braintrust tracing globally.

patch_litellm

Patch LiteLLM to add Braintrust tracing.

patch_openai

Patch OpenAI to add Braintrust tracing globally.

permalink

Format a permalink to the Braintrust application for viewing the span represented by the provided slug.
slug
str
The identifier generated from Span.export.
org_name
str | None
The org name to use. If not provided, the org name will be inferred from the global login state.
app_url
str | None
The app URL to use. If not provided, the app URL will be inferred from the global login state.

prettify_params

Clean up parameters by filtering out NOT_GIVEN values and serializing response_format.
params
dict[str, Any]

register_otel_flush

Register a callback to flush OTEL spans. This is called by the OTEL integration when it initializes a span processor/exporter.
callback
Any
The async callback function to flush OTEL spans.

run_evaluator

Wrapper around _run_evaluator_internal that times out execution after evaluator.timeout.
experiment
Experiment | None
evaluator
Evaluator[Input, Output]
evaluator.project_name
str
required
evaluator.eval_name
str
required
evaluator.data
EvalData[Input, Output]
required
evaluator.task
EvalTask[Input, Output]
required
evaluator.scores
list[EvalScorer[Input, Output]]
required
evaluator.experiment_name
str | None
required
evaluator.metadata
Metadata | None
required
evaluator.trial_count
int
required
evaluator.is_public
bool
required
evaluator.update
bool
required
evaluator.timeout
float | None
required
evaluator.max_concurrency
int | None
required
evaluator.project_id
str | None
required
evaluator.base_experiment_name
str | None
required
evaluator.base_experiment_id
str | None
required
evaluator.git_metadata_settings
GitMetadataSettings | None
required
evaluator.repo_info
RepoInfo | None
required
evaluator.error_score_handler
ErrorScoreHandler | None
required
evaluator.description
str | None
required
evaluator.summarize_scores
bool
required
evaluator.parameters
EvalParameters | None
required
position
int | None
filters
list[Filter]
stream
Callable[[SSEProgressEvent], None] | None
state
BraintrustState | None
enable_cache
bool

serialize_response_format

Serialize response format for logging.
response_format
Any

set_http_adapter

Specify a custom HTTP adapter to use for all network requests. This is useful for setting custom retry policies, timeouts, etc. Braintrust uses the requests library, so the adapter should be an instance of requests.adapters.HTTPAdapter. Alternatively, consider sub-classing our RetryRequestExceptionsAdapter to get automatic retries on network-related exceptions.
adapter
HTTPAdapter
The adapter to use.
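
A sketch using a standard requests adapter with a retry policy; the retry settings are illustrative:

import braintrust
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# A plain requests adapter with retries; RetryRequestExceptionsAdapter is an alternative.
adapter = HTTPAdapter(max_retries=Retry(total=5, backoff_factor=0.5))
braintrust.set_http_adapter(adapter)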

set_masking_function

Set a global masking function that will be applied to all logged data before sending to Braintrust. The masking function will be applied after records are merged but before they are sent to the backend.
masking_function
Callable[[Any], Any] | None
A function that takes a JSON-serializable object and returns a masked version. Set to None to disable masking.
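
A sketch of a hypothetical masking function that redacts a single field before records are sent:

import braintrust

def mask_pii(data):
    # Hypothetical masker: redact a top-level "email" field if present.
    if isinstance(data, dict) and "email" in data:
        return {**data, "email": "<redacted>"}
    return data

braintrust.set_masking_function(mask_pii)
# Later, braintrust.set_masking_function(None) disables masking again.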

set_thread_pool_max_workers

Set the maximum number of threads to use for running evaluators. By default, this is the number of CPUs on the machine.
max_workers
Any

span_components_to_object_id

Utility function to resolve the object ID of a SpanComponentsV4 object. This function may trigger a login to braintrust if the object ID is encoded lazily.
components
SpanComponentsV4
components.object_type
SpanObjectTypeV3
required
components.object_id
str | None
required
components.compute_object_metadata_args
dict | None
required
components.row_id
str | None
required
components.span_id
str | None
required
components.root_span_id
str | None
required
components.propagated_event
dict | None
required

start_span

Lower-level alternative to @traced for starting a span at the top level. It creates a span under the first active object (using the same precedence order as @traced) or, if parent is specified, under the specified parent row; otherwise it returns a no-op span object.
name
str | None
type
SpanTypeAttribute | None
span_attributes
SpanAttributes | Mapping[str, Any] | None
start_time
float | None
set_current
bool | None
parent
str | None
propagated_event
dict[str, Any] | None
state
BraintrustState | None
event
Any
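
A sketch of starting a span manually under the current logger; the span name and logged fields are illustrative:

import braintrust

logger = braintrust.init_logger(project="Chat Service")

with braintrust.start_span(name="handle_request") as span:
    span.log(input={"route": "/chat", "message": "Hello"})
    reply = "Hi there!"
    span.log(output={"reply": reply})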

summarize

Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.
summarize_scores
bool
Whether to summarize the scores. If False, only the metadata will be returned.
comparison_experiment_id
str | None
The experiment to compare against. If None, the most recent experiment on the comparison_commit will be used.

traced

Decorator to trace the wrapped function. Can either be applied bare (@traced) or by providing arguments (@traced(*span_args, **span_kwargs)), which will be forwarded to the created span. See Span.start_span for full details on the span arguments.
span_args
Any
span_kwargs
Any
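
A sketch of both decorator forms; the function names and logged metadata are illustrative:

from braintrust import traced, current_span

@traced
def retrieve(query: str) -> list[str]:
    current_span().log(metadata={"query_length": len(query)})
    return ["doc-1", "doc-2"]

@traced(name="answer_question")  # arguments are forwarded to the created span
def answer(question: str) -> str:
    docs = retrieve(question)  # logged as a child span
    return f"Answered using {len(docs)} documents"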

update_span

Update a span using the output of span.export(). It is important that you only resume updating a span once the original span has been fully written and flushed; otherwise updates may conflict with the original span.
exported
str
The output of span.export().
event
Any
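
A sketch of exporting a span and updating it later, assuming update_span forwards extra keyword fields to the span event:

import braintrust

logger = braintrust.init_logger(project="Chat Service")

with braintrust.start_span(name="handle_request") as span:
    span.log(input={"message": "Hello"}, output={"reply": "Hi there!"})
    exported = span.export()

logger.flush()  # make sure the original span is fully written before updating it

# Later, e.g. after collecting user feedback, attach more data to the same span.
braintrust.update_span(exported=exported, scores={"user_rating": 1})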

wrap_anthropic

Wrap an Anthropic object (or AsyncAnthropic) to add tracing. If Braintrust is not configured, this is a no-op. If this is not an Anthropic object, this function is a no-op.
client
Any

wrap_litellm

Wrap the litellm module to add tracing. If Braintrust is not configured, nothing will be traced.
litellm_module
Any
The litellm module

wrap_openai

Wrap the openai module (pre v1) or OpenAI instance (post v1) to add tracing. If Braintrust is not configured, nothing will be traced. If this is not an OpenAI object, this function is a no-op.
openai
Any
The openai module or OpenAI object
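
A sketch of wrapping an OpenAI client so its calls are logged as LLM spans; the project name and prompt are illustrative:

import braintrust
from openai import OpenAI

braintrust.init_logger(project="Chat Service")
client = braintrust.wrap_openai(OpenAI())  # no-op if Braintrust is not configured

# Calls made through the wrapped client are traced automatically.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hi"}],
)
print(completion.choices[0].message.content)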

Classes

AsyncResponseWrapper

Wrapper that properly preserves async context manager behavior for OpenAI responses. Methods __init__()

AsyncScorerLike

Protocol for asynchronous scorers that implement the eval_async interface. The framework will prefer this interface if available. Methods eval_async()

Attachment

Represents an attachment to be uploaded and the associated metadata. Properties
reference
AttachmentReference
The object that replaces this Attachment at upload time.
data
bytes
The attachment contents. This is a lazy value that will read the attachment contents from disk or memory on first access.
Methods __init__(), upload(), debug_info()
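
A sketch of logging an attachment, assuming the constructor accepts data (bytes or a file path), filename, and content_type; the file path and field names are hypothetical:

import braintrust
from braintrust import Attachment

logger = braintrust.init_logger(project="Support Inbox")
logger.log(
    input={
        "question": "What is in this image?",
        # The file is read lazily and uploaded when the record is flushed.
        "image": Attachment(
            data="path/to/photo.png",  # bytes or a file path
            filename="photo.png",
            content_type="image/png",
        ),
    },
    output="A cat sitting on a keyboard.",
)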

BaseExperiment

Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps). Properties
name
str | None
The name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps).
Methods __init__()
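
A sketch of reusing a previous experiment's data as the dataset for a new Eval; the project and base experiment names are hypothetical, and Levenshtein comes from the separately installed autoevals package:

from braintrust import Eval, BaseExperiment
from autoevals import Levenshtein

Eval(
    "Say Hi Bot",
    # Reuse the rows from a previous experiment instead of passing data directly.
    data=BaseExperiment(name="baseline"),
    task=lambda input: "Hi " + input,
    scores=[Levenshtein],
)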

BraintrustConsoleChunk

A console chunk from a Braintrust stream. Properties
message
str
stream
Literal['stderr', 'stdout']
type
Literal['console']
Methods __init__()

BraintrustErrorChunk

An error chunk from a Braintrust stream. Properties
data
str
type
Literal['error']
Methods __init__()

BraintrustInvokeError

An error that occurs during a Braintrust stream.

BraintrustJsonChunk

A chunk of JSON data from a Braintrust stream. Properties
data
str
type
Literal['json_delta']
Methods __init__()

BraintrustProgressChunk

A progress chunk from a Braintrust stream. Properties
data
str
id
str
object_type
str
format
str
output_type
str
name
str
event
Literal['json_delta', 'text_delta', 'reasoning_delta']
type
Literal['progress']
Methods __init__()

BraintrustStream

A Braintrust stream. This is a wrapper around a generator of BraintrustStreamChunk, with utility methods to make them easy to log and convert into various formats. Properties
stream
Any
Methods __init__(), copy(), final_value()

BraintrustTextChunk

A chunk of text data from a Braintrust stream. Properties
data
str
type
Literal['text_delta']
Methods __init__()

CodeFunction

A generic callable, with metadata. Properties
project
Project
handler
Callable[..., Any]
name
str
slug
str
type_
str
description
str | None
parameters
Any
returns
Any
if_exists
IfExists | None
metadata
dict[str, Any] | None
Methods __init__()

CodePrompt

A prompt defined in code, with metadata. Properties
project
Project
name
str
slug
str
prompt
PromptData
tool_functions
list[CodeFunction | SavedFunctionId]
description
str | None
function_type
str | None
id
str | None
if_exists
IfExists | None
metadata
dict[str, Any] | None
Methods to_function_definition(), __init__()

CompletionWrapper

Wrapper for LiteLLM completion functions with tracing support. Properties
completion_fn
Any
acompletion_fn
Any
Methods __init__(), completion(), acompletion()

DataSummary

Summary of a dataset’s data. Properties
new_records
int
New or updated records added in this session.
total_records
int
Total records in the dataset.
Methods __init__()

Dataset

A dataset is a collection of records, such as model inputs and outputs, which represent data you can use to evaluate and fine-tune models. You can log production data to datasets, curate them with interesting examples, edit/delete records, and run evaluations against them. Properties
new_records
Any
state
Any
id
str
name
str
data
Any
project
Any
logging_state
BraintrustState
Methods __init__(), insert(), update(), delete(), summarize(), close(), flush()

DatasetRef

Reference to a dataset by ID and optional version. Properties
id
str
version
str

DatasetSummary

Summary of a dataset’s scores and metadata. Properties
project_name
str
Name of the project that the dataset belongs to.
dataset_name
str
Name of the dataset.
project_url
str
URL to the project’s page in the Braintrust app.
dataset_url
str
URL to the dataset’s page in the Braintrust app.
data_summary
DataSummary | None
Summary of the dataset’s data.
Methods __init__()

EmbeddingWrapper

Wrapper for LiteLLM embedding functions. Properties
embedding_fn
Any
Methods __init__(), embedding()

EvalCase

An evaluation case. This is a single input to the evaluation task, along with an optional expected output, metadata, and tags. Properties
input
Input
expected
Output | None
metadata
Metadata | None
tags
Sequence[str] | None
id
str | None
created
str | None
Methods __init__()

EvalHooks

An object that can be used to add metadata to an evaluation. This is passed to the task function. Properties
metadata
Metadata
The metadata object for the current evaluation. You can mutate this object to add or remove metadata.
expected
Output | None
The expected output for the current evaluation.
span
Span
Access the span under which the task is run. Also accessible via braintrust.current_span().
trial_index
int
The index of the current trial (0-based). This is useful when trial_count > 1.
tags
Sequence[str]
The tags for the current evaluation. You can mutate this object to add or remove tags.
parameters
dict[str, Any] | None
The parameters for the current evaluation. These are the validated parameter values that were passed to the evaluator.
Methods report_progress(), meta()
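
A sketch of a task that accepts the hooks object as its second argument; the metadata values are illustrative, and Levenshtein comes from the separately installed autoevals package:

from braintrust import Eval
from autoevals import Levenshtein

def task(input, hooks):
    # A task that accepts a second argument receives the EvalHooks object.
    hooks.metadata["model"] = "gpt-4o-mini"
    hooks.span.log(metadata={"trial_index": hooks.trial_index})
    return "Hi " + input

Eval(
    "Say Hi Bot",
    data=lambda: [{"input": "Foo", "expected": "Hi Foo"}],
    task=task,
    scores=[Levenshtein],
)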

EvalResult

The result of an evaluation. This includes the input, expected output, actual output, and metadata. Properties
input
Input
output
Output
scores
dict[str, float | None]
expected
Output | None
metadata
Metadata | None
tags
list[str] | None
error
Exception | None
exc_info
str | None
Methods __init__()

EvalScorerArgs

Arguments passed to an evaluator scorer. This includes the input, expected output, actual output, and metadata. Properties
input
Input
output
Output
expected
Output | None
metadata
Metadata | None

Evaluator

An evaluator is an abstraction that defines an evaluation dataset, a task to run on the dataset, and a set of scorers to evaluate the results of the task. Each method attribute can be synchronous or asynchronous (for optimal performance, it is recommended to provide asynchronous implementations). Properties
project_name
str
The name of the project the eval falls under.
eval_name
str
A name that describes the experiment. You do not need to change it each time the experiment runs.
data
EvalData[Input, Output]
Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase or a dict with the same fields as an EvalCase (input, expected, metadata).
task
EvalTask[Input, Output]
Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.
scores
list[EvalScorer[Input, Output]]
A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes input, output, and expected arguments and returns a Score object. The function can be async.
experiment_name
str | None
Optional experiment name. If not specified, a name will be generated automatically.
metadata
Metadata | None
A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
trial_count
int
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
is_public
bool
Whether the experiment should be public. Defaults to false.
update
bool
Whether to update an existing experiment with experiment_name if one exists. Defaults to false.
timeout
float | None
The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.
max_concurrency
int | None
The maximum number of tasks/scorers that will be run concurrently. Defaults to None, in which case there is no max concurrency.
project_id
str | None
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
base_experiment_name
str | None
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
base_experiment_id
str | None
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.
git_metadata_settings
GitMetadataSettings | None
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
repo_info
RepoInfo | None
Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
error_score_handler
ErrorScoreHandler | None
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored. A default implementation is exported as default_error_score_handler which will log a 0 score to the root span for any scorer that was not run.
description
str | None
An optional description for the experiment.
summarize_scores
bool
Whether to summarize the scores of the experiment after it has run.
parameters
EvalParameters | None
A set of parameters that will be passed to the evaluator. Can be used to define prompts or other configurable values.
Methods __init__()

Experiment

An experiment is a collection of logged events, such as model inputs and outputs, which represent a snapshot of your application at a particular point in time. An experiment is meant to capture more than just the model you use, and includes the data you use to test, pre- and post-processing code, comparison metrics (scores), and any other metadata you want to include. Properties
dataset
Any
last_start_time
Any
state
Any
id
str
name
str
data
Mapping[str, Any]
project
ObjectMetadata
logging_state
BraintrustState
Methods __init__(), log(), log_feedback(), start_span(), update_span(), fetch_base_experiment(), summarize(), export(), close(), flush()

ExperimentSummary

Summary of an experiment’s scores and metadata. Properties
project_name
str
Name of the project that the experiment belongs to.
project_id
str | None
ID of the project. May be None if the eval was run locally.
experiment_id
str | None
ID of the experiment. May be None if the eval was run locally.
experiment_name
str
Name of the experiment.
project_url
str | None
URL to the project’s page in the Braintrust app.
experiment_url
str | None
URL to the experiment’s page in the Braintrust app.
comparison_experiment_name
str | None
The experiment scores are baselined against.
scores
dict[str, ScoreSummary]
Summary of the experiment’s scores.
metrics
dict[str, MetricSummary]
Summary of the experiment’s metrics.
Methods __init__()

ExternalAttachment

Represents an attachment that resides in an external object store and the associated metadata. Properties
reference
AttachmentReference
The object that replaces this Attachment at upload time.
data
bytes
The attachment contents. This is a lazy value that will read the attachment contents from the external object store on first access.
Methods __init__(), upload(), debug_info()

JSONAttachment

A convenience class for creating attachments from JSON-serializable objects. Methods __init__()

LiteLLMWrapper

Main wrapper for the LiteLLM module. Methods __init__(), completion(), acompletion(), responses(), aresponses(), embedding(), moderation()

MetricSummary

Summary of a metric’s performance. Properties
name
str
Name of the metric.
metric
float | int
Average metric across all examples.
unit
str
Unit label for the metric.
improvements
int | None
Number of improvements in the metric.
regressions
int | None
Number of regressions in the metric.
diff
float | None
Difference in metric between the current and reference experiment.
Methods __init__()

ModerationWrapper

Wrapper for LiteLLM moderation functions. Properties
moderation_fn
Any
Methods __init__(), moderation()

NamedWrapper

Wrapper that preserves access to the original wrapped object’s attributes. Methods __init__()

Project

A handle to a Braintrust project. Properties
name
Any
tools
Any
prompts
Any
scorers
Any
Methods __init__(), add_code_function(), add_prompt(), publish()

ProjectBuilder

Creates handles to Braintrust projects. Methods create()

Prompt

A prompt object consists of prompt text, a model, and model parameters (such as temperature), which can be used to generate completions or chat messages. The prompt object supports calling .build() which uses mustache templating to build the prompt with the given formatting options and returns a plain dictionary that includes the built prompt and arguments. The dictionary can be passed as kwargs to the OpenAI client or modified as you see fit. Properties
defaults
Any
no_trace
Any
id
str
name
str
slug
str
prompt
PromptBlockData | None
version
str
options
PromptOptions
Methods __init__(), from_prompt_data(), build()

PromptBuilder

Builder to create a prompt in Braintrust. Properties
project
Any
Methods __init__(), create()

ReadonlyAttachment

A readonly alternative to Attachment, which can be used for fetching already-uploaded Attachments. Properties
reference
Any
data
bytes
The attachment contents. This is a lazy value that will read the attachment contents from the object store on first access.
Methods __init__(), metadata(), status()

ReadonlyExperiment

A read-only view of an experiment, initialized by passing open=True to init(). Properties
state
Any
id
str
logging_state
BraintrustState
Methods __init__(), as_dataset()

RepoInfo

Information about the current HEAD of the repo. Properties
commit
str | None
branch
str | None
tag
str | None
dirty
bool | None
author_name
str | None
author_email
str | None
commit_message
str | None
commit_time
str | None
git_diff
str | None
Methods __init__()

ReporterDef

A reporter takes an evaluator and its result and returns a report. Properties
name
str
The name of the reporter.
report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
A function that takes an evaluator and its result and returns a report.
report_run
Callable[[list[EvalReport], bool, bool], bool | Awaitable[bool]]
A function that takes all evaluator results and returns a boolean indicating whether the run was successful. If you return false, the braintrust eval command will exit with a non-zero status code.
Methods __init__()

ResponsesWrapper

Wrapper for LiteLLM responses functions with tracing support. Properties
responses_fn
Any
aresponses_fn
Any
Methods __init__(), responses(), aresponses()

RetryRequestExceptionsAdapter

An HTTP adapter that automatically retries requests on connection exceptions. Properties
base_num_retries
Any
backoff_factor
Any
default_timeout_secs
Any
Methods __init__(), send()

SSEProgressEvent

A progress event that can be reported during task execution, specifically for SSE (Server-Sent Events) streams. This is a subclass of TaskProgressEvent with additional fields for SSE-specific metadata. Properties
id
str
object_type
str
origin
ObjectReference
name
str

ScoreSummary

Summary of a score’s performance. Properties
name
str
Name of the score.
score
float
Average score across all examples.
improvements
int | None
Number of improvements in the score.
regressions
int | None
Number of regressions in the score.
diff
float | None
Difference in score between the current and reference experiment.
Methods __init__()

ScorerBuilder

Builder to create a scorer in Braintrust. Properties
project
Any
Methods __init__(), create()

Span

A Span encapsulates logged data and metrics for a unit of work. This interface is shared by all span implementations. Properties
id
str
Row ID of the span.
Methods log(), log_feedback(), start_span(), export(), link(), permalink(), end(), flush(), close(), set_attributes(), set_current(), unset_current()

SpanIds

The three IDs that define a span’s position in the trace tree. Properties
span_id
str
root_span_id
str
span_parents
list[str] | None
Methods __init__()

SpanImpl

Primary implementation of the Span interface. See the Span interface for full details on each method. Properties
can_set_current
bool
state
Any
parent_object_type
Any
parent_object_id
Any
parent_compute_object_metadata_args
Any
propagated_event
Any
span_id
Any
root_span_id
Any
span_parents
Any
id
str
Methods __init__(), set_attributes(), log(), log_internal(), log_feedback(), start_span(), end(), export(), link(), permalink(), close(), flush(), set_current(), unset_current()

SpanScope

Scope for operating on a single span. Properties
type
Literal['span']
id
str
root_span_id
str

SyncScorerLike

Protocol for synchronous scorers that implement the callable interface. This is the most common interface and is used when no async version is available. Methods __call__()

TaskProgressEvent

Progress event that can be reported during task execution. Properties
format
FunctionFormat
output_type
FunctionOutputType
event
Literal['reasoning_delta', 'text_delta', 'json_delta', 'error', 'console', 'start', 'done', 'progress']
data
str
Methods __init__()

ToolBuilder

Builder to create a tool in Braintrust. Properties
project
Any
Methods __init__(), create()

TraceScope

Scope for operating on an entire trace. Properties
type
Literal['trace']
root_span_id
str

TracedMessageStream

TracedMessageStream wraps both sync and async message streams; only one applies at a time. Methods __init__()