Requirements
- Python 3.10 or higher
Installation
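The package is distributed on PyPI. A minimal install sketch, assuming you also want the optional autoevals package used for prebuilt scorers in the examples below:

```bash
pip install braintrust
# optional: prebuilt scorers (Levenshtein, Factuality, etc.)
pip install autoevals
```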
Functions
Eval
A function you can use to define an evaluator. This is a convenience wrapper around the Evaluator class.
The name of the evaluator. This corresponds to a project name in Braintrust.
Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase.
Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.
A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes an EvalScorerArgs object and returns a Score object.
(Optional) Experiment name. If not specified, a name will be generated automatically.
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
(Optional) A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
(Optional) Whether the experiment should be public. Defaults to false.
(Optional) A reporter that takes an evaluator and its result and returns a report.
reporter.report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
required
(Optional) The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.
(Optional) If specified, uses the given project ID instead of the evaluator’s name to identify the project.
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
An optional description for the experiment.
Whether to summarize the scores of the experiment after it has run.
Do not send logs to Braintrust. When True, the evaluation runs locally and builds a local summary instead of creating an experiment. Defaults to False.
A set of parameters that will be passed to the evaluator.
An optional callback that will be called when the evaluation starts. It receives the ExperimentSummary object, which can be used to display metadata about the experiment.
A function that will be called with progress events, which can be used to display intermediate progress.
If specified, instead of creating a new experiment object, the Eval() will populate the object or span specified by this parent.
Optional BraintrustState to use for the evaluation. If not specified, the global login state will be used.
Whether to enable the span cache for this evaluation. Defaults to True. The span cache stores span data on disk to minimize memory usage and allow scorers to read spans without server round-trips.
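A minimal sketch of calling Eval, assuming the optional autoevals package for the Levenshtein scorer; the project name, data, and task are illustrative:

```python
from braintrust import Eval
from autoevals import Levenshtein  # assumes the optional autoevals package

Eval(
    "Say Hi Bot",  # hypothetical project name
    data=lambda: [
        {"input": "Foo", "expected": "Hi Foo"},
        {"input": "Bar", "expected": "Hello Bar"},
    ],
    task=lambda input: "Hi " + input,  # the task under evaluation
    scores=[Levenshtein],              # compares output to expected
    trial_count=1,
)
```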
EvalAsync
A function you can use to define an evaluator. This is a convenience wrapper around the Evaluator class.
The name of the evaluator. This corresponds to a project name in Braintrust.
Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase.
Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.
A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes an EvalScorerArgs object and returns a Score object.
(Optional) Experiment name. If not specified, a name will be generated automatically.
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
(Optional) A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
(Optional) Whether the experiment should be public. Defaults to false.
(Optional) A reporter that takes an evaluator and its result and returns a report.
reporter.report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
required
(Optional) The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.
(Optional) If specified, uses the given project ID instead of the evaluator’s name to identify the project.
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
An optional description for the experiment.
Whether to summarize the scores of the experiment after it has run.
Do not send logs to Braintrust. When True, the evaluation runs locally and builds a local summary instead of creating an experiment. Defaults to False.
A set of parameters that will be passed to the evaluator.
An optional callback that will be called when the evaluation starts. It receives the ExperimentSummary object, which can be used to display metadata about the experiment.
A function that will be called with progress events, which can be used to display intermediate progress.
If specified, instead of creating a new experiment object, the Eval() will populate the object or span specified by this parent.
Optional BraintrustState to use for the evaluation. If not specified, the global login state will be used.
Whether to enable the span cache for this evaluation. Defaults to True. The span cache stores span data on disk to minimize memory usage and allow scorers to read spans without server round-trips.
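EvalAsync takes the same arguments but is awaited, which is useful inside an existing event loop (for example, a notebook or async service). A sketch under the same assumptions as the Eval example above:

```python
import asyncio
from braintrust import EvalAsync
from autoevals import Levenshtein

async def main():
    await EvalAsync(
        "Say Hi Bot",  # hypothetical project name
        data=lambda: [{"input": "Foo", "expected": "Hi Foo"}],
        task=lambda input: "Hi " + input,
        scores=[Levenshtein],
    )

asyncio.run(main())
```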
Reporter
A function you can use to define a reporter. This is a convenience wrapper around the ReporterDef class.
The name of the reporter.
report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
A function that takes an evaluator and its result and returns a report.
A function that takes all evaluator results and returns a boolean indicating whether the run was successful.
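A rough sketch of defining a custom reporter with the fields described above; the exact callback signatures (in particular the extra verbose/jsonl flags) are assumptions to verify against the SDK:

```python
from braintrust import Reporter

def report_eval(evaluator, result, verbose, jsonl):
    # Print each evaluator's summary; whatever is returned here is collected
    # and passed to report_run below.
    print(result.summary)
    return result.summary

def report_run(eval_reports, verbose, jsonl):
    # Returning False makes `braintrust eval` exit with a non-zero status code.
    return len(eval_reports) > 0

Reporter("my-reporter", report_eval=report_eval, report_run=report_run)
```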
current_experiment
Returns the currently-active experiment (set by braintrust.init(...)). Returns None if no current experiment has been set.
current_logger
Returns the currently-active logger (set by braintrust.init_logger(...)). Returns None if no current logger has been set.
current_span
Return the currently-active span for logging (set by running a span under a context manager). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.
flush
Flush any pending rows to the server.
get_prompt_versions
Get the versions for a specific prompt.
The ID of the project to query
The ID of the prompt to get versions for
get_span_parent_object
Mainly for internal use. Return the parent object for starting a span in a global context. Applies precedence: current span > propagated parent string > experiment > logger.
init
Log in, and then initialize a new experiment in a specified project. If the project does not exist, it will be created.
The name of the project to create the experiment in. Must specify at least one of project or project_id.
The name of the experiment to create. If not specified, a name will be generated automatically.
(Optional) An optional description of the experiment.
(Optional) A dataset to associate with the experiment. The dataset must be initialized with braintrust.init_dataset before passing it into the experiment.
If the experiment already exists, open it in read-only mode. Throws an error if the experiment does not already exist.
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment. Otherwise, it will pick an experiment by finding the closest ancestor on the default (e.g. main) branch.
An optional parameter to control whether the experiment is publicly visible to anybody with the link or privately visible to only members of the organization. Defaults to private.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to log in.
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
(Optional) A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
(Optional) Settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
If true (the default), set the global current-experiment to the newly-created one.
If the experiment already exists, continue logging to it. If it does not exist, creates the experiment with the specified arguments.
The id of the project to create the experiment in. This takes precedence over project if specified.
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this. This takes precedence over base_experiment if specified.
(Optional) Explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
(Optional) A BraintrustState object to use. If not specified, will use the global state. This is for advanced use only.
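A minimal sketch of initializing an experiment and logging to it directly; the project name and logged fields are illustrative:

```python
import braintrust

experiment = braintrust.init(project="My Project")  # hypothetical project name

experiment.log(
    input={"question": "What is 1+1?"},
    output="2",
    expected="2",
    scores={"accuracy": 1.0},
    metadata={"model": "gpt-4o"},  # illustrative metadata
)

print(experiment.summarize())
```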
init_dataset
Create a new dataset in a specified project. If the project does not exist, it will be created.
The name of the dataset to create. If not specified, a name will be generated automatically.
An optional description of the dataset.
An optional version of the dataset (to read). If not specified, the latest version will be used.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to log in.
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
The id of the project to create the dataset in. This takes precedence over project if specified.
(Optional) A dictionary with additional data about the dataset. The values in metadata can be any JSON-serializable type, but its keys must be strings.
(Deprecated) If True, records will be fetched from this dataset in the legacy format, with the “expected” field renamed to “output”. This option will be removed in a future version of Braintrust.
(Internal) If specified, the dataset will be created with the given BTQL filters.
(Internal) The Braintrust state to use. If not specified, will use the global state. For advanced use only.
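A sketch of creating and reading a dataset; the project, dataset name, and record fields are illustrative:

```python
import braintrust

dataset = braintrust.init_dataset(project="My Project", name="My Dataset")

# Insert a record; insert() returns the record's id.
record_id = dataset.insert(
    input={"question": "What is 1+1?"},
    expected="2",
    metadata={"source": "docs-example"},
)

# Iterate over the dataset's records.
for record in dataset:
    print(record["input"], record["expected"])

print(dataset.summarize())
```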
init_experiment
Alias for init.
init_function
Creates a function that can be used as either a task or scorer in the Eval framework. When used as a task, it will invoke the specified Braintrust function with the input. When used as a scorer, it will invoke the function with the scorer arguments.
The name of the project containing the function.
The slug of the function to invoke.
Optional version of the function to use. Defaults to latest.
init_logger
Create a new logger in a specified project. If the project does not exist, it will be created.
The name of the project to log into. If unspecified, will default to the Global project.
The id of the project to log into. This takes precedence over project if specified.
If true (the default), log events will be batched and sent asynchronously in a background thread. If false, log events will be sent synchronously. Set to false in serverless environments.
The URL of the Braintrust API. Defaults to https://www.braintrust.dev.
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to log in.
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
Log in again, even if you have already logged in (by default, the logger will not log in if you are already logged in).
If true (the default), set the global current-logger to the newly-created one.
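A sketch of production logging with init_logger; once a logger is the global current logger, spans created via @traced or start_span are sent to it. The project name and function are illustrative:

```python
from braintrust import init_logger, traced

logger = init_logger(project="My Project")  # hypothetical project name

@traced
def answer_question(question: str) -> str:
    # Anything logged on the current span is attached to this trace.
    return "2" if question == "What is 1+1?" else "unknown"

answer_question("What is 1+1?")
logger.flush()  # optional: force pending rows to be sent (useful in short-lived processes)
```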
invoke
Invoke a Braintrust function, returning a BraintrustStream or the value as a plain Python object.
The ID of the function to invoke.
The version of the function to invoke.
The ID of the prompt session to invoke the function from.
The ID of the function in the prompt session to invoke.
The name of the project containing the function to invoke.
The ID of the project to use for execution context (API keys, project defaults, etc.).
This is not the project the function belongs to, but the project context for the invocation.
The slug of the function to invoke.
The name of the global function to invoke.
The type of the global function to invoke. If unspecified, defaults to ‘scorer’
for backward compatibility.
The input to the function. This will be logged as the input field in the span.
Additional OpenAI-style messages to add to the prompt (only works for llm functions).
Additional metadata to add to the span. This will be logged as the metadata field in the span. It will also be available as the {{metadata}} field in the prompt and as the metadata argument to the function.
Tags to add to the span. This will be logged as the tags field in the span.
The parent of the function. This can be an existing span, logger, or experiment, or the output of .export() if you are distributed tracing. If unspecified, will use the same semantics as traced() to determine the parent and no-op if not in a tracing context.
Whether to stream the function’s output. If True, the function will return a BraintrustStream, otherwise it will return the output of the function as a JSON object.
The response shape of the function if returning tool calls. If “auto”, will return a string if the function returns a string, and a JSON object otherwise. If “parallel”, will return an array of JSON objects with one object per tool call.
Whether to use strict mode for the function. If true, the function will throw an
error if the variable names in the prompt do not match the input keys.
The name of the Braintrust organization to use.
The API key to use for authentication.
The URL of the Braintrust application.
Whether to force a new login even if already logged in.
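A sketch of invoking a hosted function by project name and slug; the keyword names shown (project_name, slug, input, stream) follow the parameters described above but should be checked against the SDK, and the project and slug are hypothetical:

```python
from braintrust import invoke

# Non-streaming: returns the function's output as a plain Python object.
result = invoke(
    project_name="My Project",   # hypothetical project
    slug="summarizer",           # hypothetical function slug
    input={"text": "Braintrust is an eval platform."},
)
print(result)

# Streaming: returns a BraintrustStream, which can be resolved with final_value().
stream = invoke(project_name="My Project", slug="summarizer",
                input={"text": "..."}, stream=True)
print(stream.final_value())
```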
load_prompt
Loads a prompt from the specified project.
The name of the project to load the prompt from. Must specify at least one of project or project_id.
The slug of the prompt to load.
An optional version of the prompt (to read). If not specified, the latest version will be used.
The id of the project to load the prompt from. This takes precedence over project if specified.
The id of a specific prompt to load. If specified, this takes precedence over all other parameters (project, slug, version).
(Optional) A dictionary of default values to use when rendering the prompt. Prompt values will override these defaults.
If true, do not include logging metadata for this prompt when build() is called.
The environment to load the prompt from. Cannot be used together with version.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to log in.
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
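A sketch of loading a prompt and using build() to call an LLM client; the project name, slug, and template variable are illustrative, and wrapping the OpenAI client is optional:

```python
import braintrust
from braintrust import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())  # optional: adds tracing to the calls

prompt = braintrust.load_prompt("My Project", "answer-question")  # hypothetical slug

# build() renders the mustache template with the given variables and returns
# a dict of OpenAI-style kwargs (model, messages, params, ...).
completion = client.chat.completions.create(**prompt.build(question="What is 1+1?"))
print(completion.choices[0].message.content)
```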
log
Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.
login
Log into Braintrust. This will prompt you for your API token, which you can find at https://www.braintrust.dev/app/token. This method is called automatically by init().
The URL of the Braintrust App. Defaults to https://www.braintrust.dev.
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to log in.
(Optional) The name of a specific organization to connect to. This is useful if you belong to multiple.
Log in again, even if you have already logged in (by default, this function will exit quickly if you have already logged in).
parent_context
Context manager to temporarily set the parent context for spans.
The parent string to set during the context.
Optional BraintrustState to use. If not provided, uses the global state.
parse_stream
Parse a BraintrustStream into its final value.
The BraintrustStream to parse.
patch_anthropic
Patch Anthropic to add Braintrust tracing globally.
patch_litellm
Patch LiteLLM to add Braintrust tracing.
patch_openai
Patch OpenAI to add Braintrust tracing globally.
permalink
Format a permalink to the Braintrust application for viewing the span represented by the provided slug.
The identifier generated from Span.export.
The org name to use. If not provided, the org name will be inferred from the global login state.
The app URL to use. If not provided, the app URL will be inferred from the global login state.
prettify_params
Clean up parameters by filtering out NOT_GIVEN values and serializing response_format.
register_otel_flush
Register a callback to flush OTEL spans. This is called by the OTEL integration when it initializes a span processor/exporter.
The async callback function to flush OTEL spans.
run_evaluator
Wrapper on _run_evaluator_internal that times out execution after evaluator.timeout.
serialize_response_format
Serialize response format for logging.
set_http_adapter
Specify a custom HTTP adapter to use for all network requests. This is useful for setting custom retry policies, timeouts, etc. Braintrust uses the requests library, so the adapter should be an instance of requests.adapters.HTTPAdapter. Alternatively, consider sub-classing our RetryRequestExceptionsAdapter to get automatic retries on network-related exceptions.
The adapter to use.
set_masking_function
Set a global masking function that will be applied to all logged data before sending to Braintrust. The masking function will be applied after records are merged but before they are sent to the backend.
A function that takes a JSON-serializable object and returns a masked version. Set to None to disable masking.
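A sketch of a masking function; the redaction logic is purely illustrative:

```python
import braintrust

def mask(event):
    # Hypothetical redaction: hide a top-level "api_key" field if one is present.
    if isinstance(event, dict) and "api_key" in event:
        return {**event, "api_key": "<redacted>"}
    return event

braintrust.set_masking_function(mask)
# ... later, to turn masking off:
braintrust.set_masking_function(None)
```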
set_thread_pool_max_workers
Set the maximum number of threads to use for running evaluators. By default, this is the number of CPUs on the machine.
span_components_to_object_id
Utility function to resolve the object ID of a SpanComponentsV4 object. This function may trigger a login to Braintrust if the object ID is encoded lazily.
start_span
Lower-level alternative to @traced for starting a span at the top level. It creates a span under the first active object (using the same precedence order as @traced), or if parent is specified, under the specified parent row, or returns a no-op span object.
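A sketch of starting a span manually instead of using the decorator; the project name and logged fields are illustrative:

```python
import braintrust

braintrust.init_logger(project="My Project")  # hypothetical project

with braintrust.start_span(name="retrieve-docs") as span:
    docs = ["doc-1", "doc-2"]  # placeholder work
    span.log(input={"query": "1+1"}, output=docs, metadata={"k": 2})
```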
summarize
Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.
Whether to summarize the scores. If False, only the metadata will be returned.
The experiment to compare against. If None, the most recent experiment on the comparison_commit will be used.
traced
Decorator to trace the wrapped function. Can either be applied bare (@traced) or by providing arguments (@traced(*span_args, **span_kwargs)), which will be forwarded to the created span. See Span.start_span for full details on the span arguments.
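A sketch of both decorator forms; current_span() is used to attach extra fields to the active span:

```python
from braintrust import traced, current_span

@traced  # bare form: the span name defaults to the function name
def add(a, b):
    return a + b

@traced(name="multiply-step")  # arguments are forwarded to the created span
def multiply(a, b):
    current_span().log(metadata={"operands": [a, b]})
    return a * b
```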
update_span
Update a span using the output of span.export(). It is important that you only resume updating a span once the original span has been fully written and flushed, since otherwise updates to the span may conflict with the original span.
The output of span.export().
wrap_anthropic
Wrap an Anthropic object (or AsyncAnthropic) to add tracing. If Braintrust is not configured, this is a no-op. If this is not an Anthropic object, this function is a no-op.
wrap_litellm
Wrap the litellm module to add tracing. If Braintrust is not configured, nothing will be traced.
The litellm module.
wrap_openai
Wrap the openai module (pre v1) or OpenAI instance (post v1) to add tracing. If Braintrust is not configured, nothing will be traced. If this is not an OpenAI object, this function is a no-op.
The openai module or OpenAI object
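A sketch of wrapping an OpenAI client so chat completions are traced automatically; it requires the openai package and assumes a logger or experiment is active, and the project and model names are illustrative:

```python
from braintrust import init_logger, wrap_openai
from openai import OpenAI

init_logger(project="My Project")  # hypothetical project; spans need an active logger
client = wrap_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is 1+1?"}],
)
print(response.choices[0].message.content)
```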
Classes
AsyncResponseWrapper
Wrapper that properly preserves async context manager behavior for OpenAI responses.
Methods
__init__()
AsyncScorerLike
Protocol for asynchronous scorers that implement the eval_async interface. The framework will prefer this interface if available.
Methods
eval_async()
Attachment
Represents an attachment to be uploaded and the associated metadata.
Properties
The object that replaces this Attachment at upload time.
The attachment contents. This is a lazy value that will read the attachment contents from disk or memory on first access.
__init__(), upload(), debug_info()
BaseExperiment
Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps).
Properties
The name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps).
__init__()
BraintrustConsoleChunk
A console chunk from a Braintrust stream.
Properties
__init__()
BraintrustErrorChunk
An error chunk from a Braintrust stream.
Properties
__init__()
BraintrustInvokeError
An error that occurs during a Braintrust stream.
BraintrustJsonChunk
A chunk of JSON data from a Braintrust stream.
Properties
__init__()
BraintrustProgressChunk
A progress chunk from a Braintrust stream.
Properties
__init__()
BraintrustStream
A Braintrust stream. This is a wrapper around a generator of BraintrustStreamChunk, with utility methods to make them easy to log and convert into various formats.
Properties
__init__(), copy(), final_value()
BraintrustTextChunk
A chunk of text data from a Braintrust stream.
Properties
__init__()
CodeFunction
A generic callable, with metadata.
Properties
__init__()
CodePrompt
A prompt defined in code, with metadata.
Properties
to_function_definition(), __init__()
CompletionWrapper
Wrapper for LiteLLM completion functions with tracing support.
Properties
__init__(), completion(), acompletion()
DataSummary
Summary of a dataset’s data.
Properties
New or updated records added in this session.
Total records in the dataset.
__init__()
Dataset
A dataset is a collection of records, such as model inputs and outputs, which represent data you can use to evaluate and fine-tune models. You can log production data to datasets, curate them with interesting examples, edit/delete records, and run evaluations against them.
Properties
__init__(), insert(), update(), delete(), summarize(), close(), flush()
DatasetRef
Reference to a dataset by ID and optional version.
Properties
DatasetSummary
Summary of a dataset’s scores and metadata.
Properties
Name of the project that the dataset belongs to.
Name of the dataset.
URL to the project’s page in the Braintrust app.
URL to the dataset’s page in the Braintrust app.
Summary of the dataset’s data.
__init__()
EmbeddingWrapper
Wrapper for LiteLLM embedding functions.
Properties
__init__(), embedding()
EvalCase
An evaluation case. This is a single input to the evaluation task, along with an optional expected output, metadata, and tags.
Properties
__init__()
EvalHooks
An object that can be used to add metadata to an evaluation. This is passed to the task function.
Properties
The metadata object for the current evaluation. You can mutate this object to add or remove metadata.
The expected output for the current evaluation.
Access the span under which the task is run. Also accessible via braintrust.current_span()
The index of the current trial (0-based). This is useful when trial_count > 1.
The tags for the current evaluation. You can mutate this object to add or remove tags.
The parameters for the current evaluation. These are the validated parameter values
that were passed to the evaluator.
report_progress(), meta()
EvalResult
The result of an evaluation. This includes the input, expected output, actual output, and metadata.
Properties
__init__()
EvalScorerArgs
Arguments passed to an evaluator scorer. This includes the input, expected output, actual output, and metadata.
Properties
Evaluator
An evaluator is an abstraction that defines an evaluation dataset, a task to run on the dataset, and a set of scorers to evaluate the results of the task. Each method attribute can be synchronous or asynchronous (for optimal performance, it is recommended to provide asynchronous implementations).
Properties
The name of the project the eval falls under.
A name that describes the experiment. You do not need to change it each time the experiment runs.
Returns an iterator over the evaluation dataset. Each element of the iterator should be an EvalCase or a dict with the same fields as an EvalCase (input, expected, metadata).
Runs the evaluation task on a single input. The hooks object can be used to add metadata to the evaluation.
A list of scorers to evaluate the results of the task. Each scorer can be a Scorer object or a function that takes input, output, and expected arguments and returns a Score object. The function can be async.
Optional experiment name. If not specified, a name will be generated automatically.
A dictionary with additional data about the test example, model outputs, or just about anything else that’s relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, example’s id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings.
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
Whether the experiment should be public. Defaults to false.
Whether to update an existing experiment with experiment_name if one exists. Defaults to false.
The duration, in seconds, after which to time out the evaluation. Defaults to None, in which case there is no timeout.
The maximum number of tasks/scorers that will be run concurrently. Defaults to None, in which case there is no max concurrency.
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over base_experiment_name if specified.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored. A default implementation is exported as default_error_score_handler, which will log a 0 score to the root span for any scorer that was not run.
An optional description for the experiment.
Whether to summarize the scores of the experiment after it has run.
A set of parameters that will be passed to the evaluator. Can be used to define prompts or other configurable values.
__init__()
Experiment
An experiment is a collection of logged events, such as model inputs and outputs, which represent a snapshot of your application at a particular point in time. An experiment is meant to capture more than just the model you use, and includes the data you use to test, pre- and post- processing code, comparison metrics (scores), and any other metadata you want to include.
Properties
__init__(), log(), log_feedback(), start_span(), update_span(), fetch_base_experiment(), summarize(), export(), close(), flush()
ExperimentSummary
Summary of an experiment’s scores and metadata.
Properties
Name of the project that the experiment belongs to.
ID of the project. May be None if the eval was run locally.
ID of the experiment. May be None if the eval was run locally.
Name of the experiment.
URL to the project’s page in the Braintrust app.
URL to the experiment’s page in the Braintrust app.
The experiment that scores are baselined against.
Summary of the experiment’s scores.
Summary of the experiment’s metrics.
__init__()
ExternalAttachment
Represents an attachment that resides in an external object store and the associated metadata.
Properties
The object that replaces this Attachment at upload time.
The attachment contents. This is a lazy value that will read the attachment contents from the external object store on first access.
__init__(), upload(), debug_info()
JSONAttachment
A convenience class for creating attachments from JSON-serializable objects.
Methods
__init__()
LiteLLMWrapper
Main wrapper for the LiteLLM module.
Methods
__init__(), completion(), acompletion(), responses(), aresponses(), embedding(), moderation()
MetricSummary
Summary of a metric’s performance.
Properties
Name of the metric.
Average metric across all examples.
Unit label for the metric.
Number of improvements in the metric.
Number of regressions in the metric.
Difference in metric between the current and reference experiment.
__init__()
ModerationWrapper
Wrapper for LiteLLM moderation functions.
Properties
__init__(), moderation()
NamedWrapper
Wrapper that preserves access to the original wrapped object’s attributes.
Methods
__init__()
Project
A handle to a Braintrust project.
Properties
__init__(), add_code_function(), add_prompt(), publish()
ProjectBuilder
Creates handles to Braintrust projects.
Methods
create()
Prompt
A prompt object consists of prompt text, a model, and model parameters (such as temperature), which can be used to generate completions or chat messages. The prompt object supports calling .build(), which uses mustache templating to build the prompt with the given formatting options and returns a plain dictionary that includes the built prompt and arguments. The dictionary can be passed as kwargs to the OpenAI client or modified as you see fit.
Properties
__init__(), from_prompt_data(), build()
PromptBuilder
Builder to create a prompt in Braintrust.
Properties
__init__(), create()
ReadonlyAttachment
A readonly alternative to Attachment, which can be used for fetching already-uploaded Attachments.
Properties
The attachment contents. This is a lazy value that will read the attachment contents from the object store on first access.
__init__(), metadata(), status()
ReadonlyExperiment
A read-only view of an experiment, initialized by passing open=True to init().
Properties
__init__(), as_dataset()
RepoInfo
Information about the current HEAD of the repo.
Properties
__init__()
ReporterDef
A reporter takes an evaluator and its result and returns a report.
Properties
The name of the reporter.
report_eval
Callable[[Evaluator[Input, Output], EvalResultWithSummary[Input, Output], bool, bool], EvalReport | Awaitable[EvalReport]]
A function that takes an evaluator and its result and returns a report.
A function that takes all evaluator results and returns a boolean indicating whether the run was successful. If you return false, the braintrust eval command will exit with a non-zero status code.
__init__()
ResponsesWrapper
Wrapper for LiteLLM responses functions with tracing support.
Properties
__init__(), responses(), aresponses()
RetryRequestExceptionsAdapter
An HTTP adapter that automatically retries requests on connection exceptions.
Properties
__init__(), send()
SSEProgressEvent
A progress event that can be reported during task execution, specifically for SSE (Server-Sent Events) streams. This is a subclass of TaskProgressEvent with additional fields for SSE-specific metadata.
Properties
ScoreSummary
Summary of a score’s performance.
Properties
Name of the score.
Average score across all examples.
Number of improvements in the score.
Number of regressions in the score.
Difference in score between the current and reference experiment.
__init__()
ScorerBuilder
Builder to create a scorer in Braintrust.
Properties
__init__(), create()
Span
A Span encapsulates logged data and metrics for a unit of work. This interface is shared by all span implementations.
Properties
Row ID of the span.
log(), log_feedback(), start_span(), export(), link(), permalink(), end(), flush(), close(), set_attributes(), set_current(), unset_current()
SpanIds
The three IDs that define a span’s position in the trace tree.
Properties
__init__()
SpanImpl
Primary implementation of the Span interface. See the Span interface for full details on each method.
Properties
__init__(), set_attributes(), log(), log_internal(), log_feedback(), start_span(), end(), export(), link(), permalink(), close(), flush(), set_current(), unset_current()
SpanScope
Scope for operating on a single span.
Properties
SyncScorerLike
Protocol for synchronous scorers that implement the callable interface. This is the most common interface and is used when no async version is available.
Methods
__call__()
TaskProgressEvent
Progress event that can be reported during task execution.
Properties
event
Literal['reasoning_delta', 'text_delta', 'json_delta', 'error', 'console', 'start', 'done', 'progress']
__init__()
ToolBuilder
Builder to create a tool in Braintrust.
Properties
__init__(), create()
TraceScope
Scope for operating on an entire trace.
Properties
TracedMessageStream
TracedMessageStream wraps both sync and async message streams; only one applies at a time.
Methods
__init__()