Installation
Starting with v2.0.0, if you’re using the Vercel AI SDK integration or other features that require schema validation, you must install zod as a peer dependency: npm install zod
Functions
BaseExperiment
Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps).
The name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps).
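For instance, a minimal hill-climbing sketch that reuses the data from the best prior experiment (the project name and scorer are illustrative, not from the source):

```typescript
import { Eval, BaseExperiment } from "braintrust";

Eval("my-project", {
  // Use the best prior (base) experiment's data as the dataset.
  data: BaseExperiment(),
  task: async (input: string) => `answer: ${input}`,
  scores: [
    // Hypothetical exact-match scorer.
    ({ output, expected }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
});
```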
BraintrustMiddleware
Creates a Braintrust middleware for AI SDK v2 that automatically traces generateText and streamText calls with comprehensive metadata and metrics.
Configuration options for the middleware.
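A minimal sketch, assuming the AI SDK's wrapLanguageModel helper and an OpenAI provider (the model and project names are illustrative):

```typescript
import { openai } from "@ai-sdk/openai";
import { generateText, wrapLanguageModel } from "ai";
import { BraintrustMiddleware, initLogger } from "braintrust";

// Spans need an active logger to be recorded.
initLogger({ projectName: "my-project" });

const model = wrapLanguageModel({
  model: openai("gpt-4o-mini"),
  middleware: BraintrustMiddleware({ debug: false }),
});

// This call is traced automatically by the middleware.
const { text } = await generateText({ model, prompt: "Say hello." });
```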
buildLocalSummary
buildLocalSummary function
An optional experiment id to use as a base. If specified, the new experiment will be summarized
and compared to this experiment. This takes precedence over
baseExperimentName if specified.
An optional experiment name to use as a base. If specified, the new experiment will be summarized
and compared to this experiment.
A function that returns a list of inputs, expected outputs, and metadata.
An optional description for the experiment.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
A default implementation is exported as
defaultErrorScoreHandler which will log a 0 score to the root span for any scorer that was not run.
An optional name for the experiment.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Whether the experiment should be public. Defaults to false.
The maximum number of tasks/scorers that will be run concurrently.
Defaults to undefined, in which case there is no max concurrency.
Optional additional metadata for the experiment.
A set of parameters that will be passed to the evaluator.
Can contain array values that will be converted to single values in the task.
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over
gitMetadataSettings if specified.
A set of functions that take an input, output, and expected value and return a score.
An abort signal that can be used to stop the evaluation.
If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back
to the global state (initialized using your API key).
Whether to summarize the scores of the experiment after it has run.
Defaults to true.
A function that takes an input and returns an output.
The duration, in milliseconds, after which to time out the evaluation.
Defaults to undefined, in which case there is no timeout.
The number of times to run the evaluator per input. This is useful for evaluating applications that
have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the
variance in the results.
Whether to update an existing experiment with
experiment_name if one exists. Defaults to false.
createFinalValuePassThroughStream
Create a stream that passes through the final value of the stream. This is used to implement BraintrustStream.finalValue().
A function to call with the final value of the stream.
currentExperiment
Returns the currently-active experiment (set by init). Returns undefined if no current experiment has been set.
currentLogger
Returns the currently-active logger (set by initLogger). Returns undefined if no current logger has been set.
currentSpan
Return the currently-active span for logging (set by one of the traced methods). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.
See Span for full details.
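For example (a sketch; the span name and metadata fields are illustrative):

```typescript
import { currentSpan, traced } from "braintrust";

await traced(
  async () => {
    // Inside a traced callback, currentSpan() returns the active span;
    // outside of one, it returns a no-op span and this call does nothing.
    currentSpan().log({ metadata: { step: "retrieval" } });
  },
  { name: "lookup" },
);
```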
defaultErrorScoreHandler
defaultErrorScoreHandler function
deserializePlainStringAsJSON
deserializePlainStringAsJSON function
devNullWritableStream
devNullWritableStream function
Eval
Eval function
An optional experiment id to use as a base. If specified, the new experiment will be summarized
and compared to this experiment. This takes precedence over
baseExperimentName if specified.
An optional experiment name to use as a base. If specified, the new experiment will be summarized
and compared to this experiment.
A function that returns a list of inputs, expected outputs, and metadata.
An optional description for the experiment.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
A default implementation is exported as
defaultErrorScoreHandler which will log a 0 score to the root span for any scorer that was not run.
An optional name for the experiment.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Whether the experiment should be public. Defaults to false.
The maximum number of tasks/scorers that will be run concurrently.
Defaults to undefined, in which case there is no max concurrency.
Optional additional metadata for the experiment.
A set of parameters that will be passed to the evaluator.
Can contain array values that will be converted to single values in the task.
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over
gitMetadataSettings if specified.
A set of functions that take an input, output, and expected value and return a score.
An abort signal that can be used to stop the evaluation.
If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back
to the global state (initialized using your API key).
Whether to summarize the scores of the experiment after it has run.
Defaults to true.
A function that takes an input and returns an output.
The duration, in milliseconds, after which to time out the evaluation.
Defaults to undefined, in which case there is no timeout.
The number of times to run the evaluator per input. This is useful for evaluating applications that
have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the
variance in the results.
Whether to update an existing experiment with
experiment_name if one exists. Defaults to false.
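Putting the main options together, a minimal sketch (the project name, data, and scorer are illustrative):

```typescript
import { Eval } from "braintrust";

Eval("my-project", {
  data: () => [
    { input: "2+2", expected: "4" },
    { input: "3+5", expected: "8" },
  ],
  task: async (input) => {
    // Replace with a real model call; here we just return a canned answer.
    return input === "2+2" ? "4" : "8";
  },
  scores: [
    ({ output, expected }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
  trialCount: 3, // run each input 3 times to see variance
  maxConcurrency: 5,
});
```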
flush
Flush any pending rows to the server.
getContextManager
getContextManager function
getIdGenerator
Factory function that creates a new ID generator instance each time. This eliminates global state and makes tests parallelizable. Each caller gets their own generator instance.
getPromptVersions
Get the versions for a prompt.
The ID of the project to query
The ID of the prompt to get versions for
getSpanParentObject
Mainly for internal use. Return the parent object for starting a span in a global context. Applies precedence: current span > propagated parent string > experiment > logger.init
Log in, and then initialize a new experiment in a specified project. If the project does not exist, it will be created.
Options for configuring init().
The API key to use. If the parameter is not specified, will try to use the
BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
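A minimal sketch of init (the project and experiment names are illustrative):

```typescript
import { init } from "braintrust";

const experiment = init("my-project", { experiment: "nightly-run" });
experiment.log({
  input: { question: "What is 2+2?" },
  output: "4",
  scores: { accuracy: 1 },
});
// Print the summary, compared against the closest reference experiment.
console.log(await experiment.summarize());
```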
initDataset
Create a new dataset in a specified project. If the project does not exist, it will be created.
Options for configuring initDataset().
The API key to use. If the parameter is not specified, will try to use the
BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
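A minimal sketch of initDataset (the project and dataset names are illustrative):

```typescript
import { initDataset } from "braintrust";

const dataset = initDataset("my-project", { dataset: "golden-questions" });
const id = dataset.insert({
  input: { question: "What is the capital of France?" },
  expected: "Paris",
});
await dataset.flush(); // ensure the record reaches the server
```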
initExperiment
Alias for init(options).
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
initFunction
Creates a function that can be used as a task or scorer in the Braintrust evaluation framework. The returned function wraps a Braintrust function and can be passed directly to Eval().
Options for the function.
The project name containing the function.
The slug of the function to invoke.
Optional Braintrust state to use.
Optional version of the function to use. Defaults to latest.
initLogger
Create a new logger in a specified project. If the project does not exist, it will be created.
Additional options for configuring init().
The API key to use. If the parameter is not specified, will try to use the
BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
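A minimal sketch of initLogger (the project name and logged fields are illustrative):

```typescript
import { initLogger } from "braintrust";

const logger = initLogger({ projectName: "my-project" });
logger.log({
  input: "What is Braintrust?",
  output: "An eval and observability platform.",
  metadata: { userId: "u-123" },
});
await logger.flush(); // optional; writes are also flushed on exit by default
```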
invoke
Invoke a Braintrust function, returning a BraintrustStream or the value as a plain JavaScript object.
The arguments for the function (see
InvokeFunctionArgs for more details).
The type of the global function to invoke. If unspecified, defaults to ‘scorer’ for backward compatibility.
The ID of the function to invoke.
The name of the global function to invoke.
The input to the function. This will be logged as the
input field in the span.
Additional OpenAI-style messages to add to the prompt (only works for llm functions).
Additional metadata to add to the span. This will be logged as the
metadata field in the span.
It will also be available as the {{metadata}} field in the prompt and as the metadata argument
to the function.
The mode of the function. If “auto”, will return a string if the function returns a string,
and a JSON object otherwise. If “parallel”, will return an array of JSON objects with one
object per tool call.
The parent of the function. This can be an existing span, logger, or experiment, or
the output of
.export() if you are using distributed tracing. If unspecified, will use
the same semantics as traced() to determine the parent and no-op if not in a tracing
context.
The ID of the project to use for execution context (API keys, project defaults, etc.).
This is not the project the function belongs to, but the project context for the invocation.
The name of the project containing the function to invoke.
The ID of the function in the prompt session to invoke.
The ID of the prompt session to invoke the function from.
A Zod schema to validate the output of the function and return a typed value. This
is only used if
stream is false.
The slug of the function to invoke.
(Advanced) This parameter allows you to pass in a custom login state. This is useful
for multi-tenant environments where you are running functions from different Braintrust
organizations.
Whether to stream the function’s output. If true, the function will return a
BraintrustStream, otherwise it will return the output of the function as a JSON
object.
Whether to use strict mode for the function. If true, the function will throw an error
if the variable names in the prompt do not match the input keys.
Tags to add to the span. This will be logged as the
tags field in the span.
The version of the function to invoke.
The API key to use. If the parameter is not specified, will try to use the
BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
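A minimal sketch of invoke, including zod output validation (the project name, slug, and schema are illustrative):

```typescript
import { invoke } from "braintrust";
import { z } from "zod";

const result = await invoke({
  projectName: "my-project",
  slug: "summarizer", // slug of the function to invoke
  input: { text: "A long document..." },
  // Optional: validate the (non-streaming) output and get a typed value.
  schema: z.object({ summary: z.string() }),
});
console.log(result.summary);
```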
isTemplateFormat
isTemplateFormat function
loadPrompt
Load a prompt from the specified project.
Options for configuring loadPrompt().
The API key to use. If the parameter is not specified, will try to use the
BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
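A minimal sketch of loadPrompt (the project name and prompt slug are illustrative):

```typescript
import { loadPrompt } from "braintrust";
import OpenAI from "openai";

const prompt = await loadPrompt({
  projectName: "my-project",
  slug: "qa-prompt", // hypothetical prompt slug
});

// prompt.build() renders the prompt into OpenAI-compatible completion params.
const client = new OpenAI();
const completion = await client.chat.completions.create(
  prompt.build({ question: "What is Braintrust?" }),
);
```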
log
Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.
The event to log. See Experiment.log for full details.
logError
logError function
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.
login
Log into Braintrust. This will prompt you for your API token, which you can find at https://www.braintrust.dev/app/token. This method is called automatically by init().
Options for configuring login().
The API key to use. If the parameter is not specified, will try to use the
BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
Login again, even if you have already logged in (by default, this function will exit quickly if you have already logged in)
loginToState
loginToState function
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
newId
newId function
parseCachedHeader
parseCachedHeader function
parseTemplateFormat
parseTemplateFormat function
permalink
Format a permalink to the Braintrust application for viewing the span represented by the provided slug.
Links can be generated at any time, but they will only become viewable after
the span and its root have been flushed to the server and ingested.
If you have a Span object, use Span.link instead.
The identifier generated from
Span.export.
Optional arguments.
The app URL to use. If not provided, the app URL will be inferred from the state.
The org name to use. If not provided, the org name will be inferred from the state.
The login state to use. If not provided, the global state will be used.
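A minimal sketch (assumes a global login state from which the org can be inferred; the span name is illustrative):

```typescript
import { flush, permalink, startSpan } from "braintrust";

const span = startSpan({ name: "my-span" });
span.log({ input: "hello", output: "hi" });
const slug = await span.export();
span.end();
await flush(); // the link is only viewable after the span is ingested

const url = await permalink(slug);
console.log(url);
```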
promptDefinitionToPromptData
promptDefinitionToPromptData function
promptDefinition.params
objectOutputType | objectOutputType | objectOutputType | objectOutputType | objectOutputType
registerOtelFlush
Register a callback to flush OTEL spans. This is called by @braintrust/otel when it initializes a BraintrustSpanProcessor/Exporter. When ensureSpansFlushed is called (e.g., before a BTQL query in scorers), this callback will be invoked to ensure OTEL spans are flushed to the server. Also disables the span cache, since OTEL spans aren’t in the local cache and we need BTQL to see the complete span tree (both native + OTEL spans).
renderMessage
renderMessage function
renderPromptParams
renderPromptParams function
params
undefined | objectOutputType | objectOutputType | objectOutputType | objectOutputType | objectOutputType
renderTemplateContent
renderTemplateContent function
Reporter
Reporter function
reportFailures
reportFailures function
An optional experiment id to use as a base. If specified, the new experiment will be summarized
and compared to this experiment. This takes precedence over
baseExperimentName if specified.
An optional experiment name to use as a base. If specified, the new experiment will be summarized
and compared to this experiment.
A function that returns a list of inputs, expected outputs, and metadata.
An optional description for the experiment.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
A default implementation is exported as
defaultErrorScoreHandler which will log a 0 score to the root span for any scorer that was not run.
An optional name for the experiment.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Whether the experiment should be public. Defaults to false.
The maximum number of tasks/scorers that will be run concurrently.
Defaults to undefined, in which case there is no max concurrency.
Optional additional metadata for the experiment.
A set of parameters that will be passed to the evaluator.
Can contain array values that will be converted to single values in the task.
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over
gitMetadataSettings if specified.
A set of functions that take an input, output, and expected value and return a score.
An abort signal that can be used to stop the evaluation.
If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back
to the global state (initialized using your API key).
Whether to summarize the scores of the experiment after it has run.
Defaults to true.
A function that takes an input and returns an output.
The duration, in milliseconds, after which to time out the evaluation.
Defaults to undefined, in which case there is no timeout.
The number of times to run the evaluator per input. This is useful for evaluating applications that
have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the
variance in the results.
Whether to update an existing experiment with
experiment_name if one exists. Defaults to false.
runEvaluator
runEvaluator function
An optional experiment id to use as a base. If specified, the new experiment will be summarized
and compared to this experiment. This takes precedence over
baseExperimentName if specified.
An optional experiment name to use as a base. If specified, the new experiment will be summarized
and compared to this experiment.
A function that returns a list of inputs, expected outputs, and metadata.
An optional description for the experiment.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
A default implementation is exported as
defaultErrorScoreHandler which will log a 0 score to the root span for any scorer that was not run.
An optional name for the experiment.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Whether the experiment should be public. Defaults to false.
The maximum number of tasks/scorers that will be run concurrently.
Defaults to undefined, in which case there is no max concurrency.
Optional additional metadata for the experiment.
A set of parameters that will be passed to the evaluator.
Can contain array values that will be converted to single values in the task.
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over
gitMetadataSettings if specified.
A set of functions that take an input, output, and expected value and return a score.
An abort signal that can be used to stop the evaluation.
If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back
to the global state (initialized using your API key).
Whether to summarize the scores of the experiment after it has run.
Defaults to true.
A function that takes an input and returns an output.
The duration, in milliseconds, after which to time out the evaluation.
Defaults to undefined, in which case there is no timeout.
The number of times to run the evaluator per input. This is useful for evaluating applications that
have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the
variance in the results.
Whether to update an existing experiment with
experiment_name if one exists. Defaults to false.
setFetch
Set the fetch implementation to use for requests. You can specify it here, or when you call login.
The fetch implementation to use.
setMaskingFunction
Set a global masking function that will be applied to all logged data before sending to Braintrust. The masking function will be applied after records are merged but before they are sent to the backend.
A function that takes a JSON-serializable object and returns a masked version.
Set to null to disable masking.
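For example, a sketch that redacts SSN-shaped strings anywhere in a logged record (the redaction pattern is illustrative):

```typescript
import { setMaskingFunction } from "braintrust";

setMaskingFunction((event) => {
  // Serialize, redact, and re-parse; works on any JSON-serializable record.
  const masked = JSON.stringify(event).replace(
    /\b\d{3}-\d{2}-\d{4}\b/g,
    "[REDACTED]",
  );
  return JSON.parse(masked);
});
```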
spanComponentsToObjectId
spanComponentsToObjectId function
startSpan
Lower-level alternative to traced. This allows you to start a span yourself, and can be useful in situations
where you cannot use callbacks. However, spans started with startSpan will not be marked as the “current span”,
so currentSpan() and traced() will be no-ops. If you want to mark a span as current, use traced instead.
See traced for full details.
summarize
Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.
Options for summarizing the experiment.
The experiment to compare against. If unspecified, the most recent experiment on the origin’s main branch will be used.
Whether to summarize the scores. If false, only the metadata will be returned.
traceable
A synonym for wrapTraced. If you’re porting from systems that use traceable, you can use this to
make your codebase more consistent.
traced
Toplevel function for starting a span. It checks the following (in precedence order):
- Currently-active span
- Currently-active experiment
- Currently-active logger
If parent is specified, it creates a span under the specified parent row. If none of these are active, it returns a no-op span object.
See Span.traced for full details.
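A minimal sketch (the span name and logged fields are illustrative):

```typescript
import { traced } from "braintrust";

const answer = await traced(
  async (span) => {
    span.log({ input: "What is 2+2?" });
    const output = "4"; // replace with a real model call
    span.log({ output });
    return output;
  },
  { name: "answer-question" },
);
```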
updateSpan
Update a span using the output of span.export(). It is important that you only resume updating a span once the original span has been fully written and flushed, since otherwise updates to the span may conflict with the original span.
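A sketch, under the assumption that updateSpan accepts the exported identifier in an exported field alongside the event fields to update:

```typescript
import { flush, traced, updateSpan } from "braintrust";

let exported: string | undefined;
await traced(async (span) => {
  span.log({ input: "hello", output: "hi" });
  exported = await span.export();
});
await flush(); // make sure the original span is fully written first

// Later, possibly in another process (field names are illustrative):
updateSpan({ exported: exported!, scores: { user_feedback: 1 } });
```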
withCurrent
Runs the provided callback with the span as the current span.
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.
withDataset
withDataset function
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
withExperiment
withExperiment function
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
withLogger
withLogger function
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
withParent
withParent function
wrapAISDK
Wraps Vercel AI SDK methods with Braintrust tracing. Returns wrapped versions of generateText, streamText, generateObject, streamObject, Agent, experimental_Agent, and ToolLoopAgent that automatically create spans and log inputs, outputs, and metrics.
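A minimal sketch (the provider, model, and project names are illustrative):

```typescript
import * as ai from "ai";
import { openai } from "@ai-sdk/openai";
import { initLogger, wrapAISDK } from "braintrust";

initLogger({ projectName: "my-project" });
const { generateText } = wrapAISDK(ai);

// Calls are traced automatically, including inputs, outputs, and metrics.
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "What is Braintrust?",
});
```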
wrapAISDKModel
Wrap an ai-sdk model (created with .chat(), .completion(), etc.) to add tracing. If Braintrust is not configured, this is a no-op.
wrapAnthropic
Wrap an Anthropic object (created with new Anthropic(...)) to add tracing. If Braintrust is
not configured, nothing will be traced. If this is not an Anthropic object, this function is
a no-op.
Currently, this only supports the v4 API.
wrapClaudeAgentSDK
Wraps the Claude Agent SDK with Braintrust tracing. This returns wrapped versions of query and tool that automatically trace all agent interactions.
The Claude Agent SDK module
wrapGoogleGenAI
Wrap a Google GenAI module (imported with import * as googleGenAI from '@google/genai') to add tracing.
If Braintrust is not configured, nothing will be traced.
The Google GenAI module
wrapMastraAgent
wrapMastraAgent function
wrapOpenAI
Wrap an OpenAI object (created with new OpenAI(...)) to add tracing. If Braintrust is
not configured, nothing will be traced. If this is not an OpenAI object, this function is
a no-op.
Currently, this supports both the v4 and v5 API.
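A minimal sketch (the model and project names are illustrative):

```typescript
import OpenAI from "openai";
import { initLogger, wrapOpenAI } from "braintrust";

initLogger({ projectName: "my-project" });
const client = wrapOpenAI(new OpenAI());

// This call is traced like any other chat completion.
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
```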
wrapOpenAIv4
wrapOpenAIv4 function
wrapTraced
Wrap a function with traced, using the arguments as input and the return value as output.
Any functions wrapped this way will automatically be traced, similar to the @traced decorator
in Python. If you want to correctly propagate the function’s name and define it in one go, then
you can do so like this:
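A sketch of the pattern the text refers to (the function body is a stand-in for a real model call):

```typescript
import { wrapTraced } from "braintrust";

const myFunc = wrapTraced(async function myFunc(input: string) {
  return input.toUpperCase(); // stand-in for a real model call
});

await myFunc("hello");
```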
myFunc will be traced, and the input and output will be logged automatically.
If tracing is inactive, i.e. there is no active logger or experiment, it’s just a no-op.
The function to wrap.
Span-level arguments (e.g. a custom name or type) to pass to
traced.
Classes
Attachment
Represents an attachment to be uploaded and the associated metadata.
Attachment objects can be inserted anywhere in an event, allowing you to
log arbitrary file data. The SDK will asynchronously upload the file to
object storage and replace the Attachment object with an
AttachmentReference.
Properties
The object that replaces this
Attachment at upload time.
data(), debugInfo(), upload()
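A sketch of logging an attachment (the file, project name, and fields are illustrative; in Node, data may be a filesystem path, and a Blob or ArrayBuffer also works):

```typescript
import { Attachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "my-project" });
logger.log({
  input: {
    question: "What is in this image?",
    image: new Attachment({
      filename: "photo.png",
      contentType: "image/png",
      data: "./photo.png", // uploaded asynchronously to object storage
    }),
  },
  output: "A cat.",
});
```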
BaseAttachment
BaseAttachment class
Properties
data(), debugInfo(), upload()
BraintrustState
BraintrustState class
Properties
apiConn(), appConn(), bgLogger(), copyLoginInfo(), disable(), enforceQueueSizeLimit(), flushOtel(), httpLogger(), login(), loginReplaceApiConn(), proxyConn(), registerOtelFlush(), resetIdGenState(), resetLoginInfo(), serialize(), setFetch(), setMaskingFunction(), setOverrideBgLogger(), toJSON(), toString(), deserialize()
BraintrustStream
A Braintrust stream. This is a wrapper around a ReadableStream of BraintrustStreamChunk,
with some utility methods to make them easy to log and convert into various formats.
Methods
[asyncIterator](), copy(), finalValue(), toReadableStream(), parseRawEvent(), serializeRawEvent()
CachedSpanFetcher
Cached span fetcher that handles fetching and caching spans by type. Caching strategy:
- Cache spans by span type (Map<spanType, SpanData[]>)
- Track if all spans have been fetched (allFetched flag)
- When filtering by spanType, only fetch types not already in cache
getSpans()
CodeFunction
CodeFunction class
Properties
key()
CodePrompt
CodePrompt class
Properties
toFunctionDefinition()
ContextManager
ContextManager class
Methods
getCurrentSpan(), getParentSpanIds(), runInContext()
Dataset
A dataset is a collection of records, such as model inputs and expected outputs, which represent data you can use to evaluate and fine-tune models. You can log production data to datasets, curate them with interesting examples, edit/delete records, and run evaluations against them. You should not create Dataset objects directly. Instead, use the braintrust.initDataset() method.
Properties
[asyncIterator](), clearCache(), close(), delete(), fetch(), fetchedData(), flush(), getState(), insert(), summarize(), update(), version(), isDataset()
EvalResultWithSummary
EvalResultWithSummary class
Properties
toJSON(), toString()
Experiment
An experiment is a collection of logged events, such as model inputs and outputs, which represent a snapshot of your application at a particular point in time. An experiment is meant to capture more than just the model you use, and includes the data you use to test, pre- and post-processing code, comparison metrics (scores), and any other metadata you want to include. Experiments are associated with a project, and two experiments are meant to be easily comparable via their inputs. You can change the attributes of the experiments in a project (e.g. scoring functions)
over time, simply by changing what you log.
You should not create Experiment objects directly. Instead, use the braintrust.init() method.
Properties
[asyncIterator](), clearCache(), close(), export(), fetch(), fetchBaseExperiment(), fetchedData(), flush(), getState(), log(), logFeedback(), startSpan(), summarize(), traced(), updateSpan(), version()
ExternalAttachment
Represents an attachment that resides in an external object store and the associated metadata.
ExternalAttachment objects can be inserted anywhere in an event, similar to
Attachment objects, but they reference files that already exist in an external
object store rather than requiring upload. The SDK will replace the ExternalAttachment
object with an AttachmentReference during logging.
Properties
The object that replaces this
ExternalAttachment at upload time.
data(), debugInfo(), upload()
FailedHTTPResponse
FailedHTTPResponse class
Properties
IDGenerator
Abstract base class for ID generators.
Methods
getSpanId(), getTraceId(), shareRootSpanId()
JSONAttachment
Represents a JSON object that should be stored as an attachment.
JSONAttachment is a convenience function that creates an Attachment
from JSON data. It’s particularly useful for large JSON objects that
would otherwise bloat the trace size.
The JSON data is automatically serialized and stored as an attachment
with content type “application/json”.
Properties
The object that replaces this
Attachment at upload time.
data(), debugInfo(), upload()
LazyValue
LazyValue class
Properties
get(), getSync()
Logger
Logger class
Properties
export(), flush(), log(), logFeedback(), startSpan(), traced(), updateSpan()
LoginInvalidOrgError
LoginInvalidOrgError class
Properties
NoopSpan
A fake implementation of the Span API which does nothing. This can be used as the default span.
Properties
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.
close(), end(), export(), flush(), getParentInfo(), link(), log(), logFeedback(), permalink(), setAttributes(), startSpan(), startSpanWithParents(), state(), toString(), traced()
ObjectFetcher
ObjectFetcher class
Properties
[asyncIterator](), clearCache(), fetch(), fetchedData(), getState(), version()
Project
Project class
Properties
addCodeFunction(), addPrompt(), publish()
ProjectNameIdMap
ProjectNameIdMap class
Methods
getId(), getName(), resolve()
Prompt
Prompt class
Properties
build(), buildWithAttachments(), fromPromptData(), isPrompt(), renderPrompt()
PromptBuilder
PromptBuilder class
Methods
create()
ReadonlyAttachment
A readonly alternative to Attachment, which can be used for fetching
already-uploaded Attachments.
Properties
Attachment metadata.
asBase64Url(), data(), metadata(), status()
ReadonlyExperiment
A read-only view of an experiment, initialized by passing open: true to init().
Properties
[asyncIterator](), asDataset(), clearCache(), fetch(), fetchedData(), getState(), version()
ScorerBuilder
ScorerBuilder class
Methods
create()
SpanFetcher
Fetcher for spans by root_span_id, using the ObjectFetcher pattern. Handles pagination automatically via cursor-based iteration.
Properties
[asyncIterator](), clearCache(), fetch(), fetchedData(), getState(), version()
SpanImpl
Primary implementation of the Span interface. See Span for full details on each method.
We suggest using one of the various traced methods, instead of creating Spans directly. See Span.startSpan for full details.
Properties
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.
close(), end(), export(), flush(), getParentInfo(), link(), log(), logFeedback(), permalink(), setAttributes(), setSpanParents(), startSpan(), startSpanWithParents(), state(), toString(), traced()
TestBackgroundLogger
TestBackgroundLogger class
Methods
drain(), flush(), log(), setMaskingFunction()
ToolBuilder
ToolBuilder class
Methods
create()
UUIDGenerator
ID generator that uses UUID4 for both span and trace IDs.
Methods
getSpanId(), getTraceId(), shareRootSpanId()
Interfaces
AttachmentParams
AttachmentParams interface
Properties
BackgroundLoggerOpts
BackgroundLoggerOpts interface
Properties
ContextParentSpanIds
ContextParentSpanIds interface
Properties
DatasetSummary
Summary of a dataset’s scores and metadata.
Properties
Summary of the dataset’s data.
Name of the dataset.
URL to the dataset’s page in the Braintrust app.
Name of the project that the dataset belongs to.
URL to the project’s page in the Braintrust app.
DataSummary
Summary of a dataset’s data.
Properties
New or updated records added in this session.
Total records in the dataset.
EvalHooks
EvalHooks interface
Properties
The expected output for the current evaluation.
The metadata object for the current evaluation. You can mutate this object to add or remove metadata.
The current parameters being used for this specific task execution.
Array parameters are converted to single values.
Report progress that will show up in the playground.
The task’s span.
The tags for the current evaluation.
The index of the current trial (0-based). This is useful when trialCount > 1.
Evaluator
Evaluator interface
Properties
An optional experiment id to use as a base. If specified, the new experiment will be summarized
and compared to this experiment. This takes precedence over
baseExperimentName if specified.
An optional experiment name to use as a base. If specified, the new experiment will be summarized
and compared to this experiment.
A function that returns a list of inputs, expected outputs, and metadata.
An optional description for the experiment.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
A default implementation is exported as
defaultErrorScoreHandler which will log a 0 score to the root span for any scorer that was not run.
An optional name for the experiment.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Whether the experiment should be public. Defaults to false.
The maximum number of tasks/scorers that will be run concurrently.
Defaults to undefined, in which case there is no max concurrency.
Optional additional metadata for the experiment.
A set of parameters that will be passed to the evaluator.
Can contain array values that will be converted to single values in the task.
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over
gitMetadataSettings if specified.
A set of functions that take an input, output, and expected value and return a score.
An abort signal that can be used to stop the evaluation.
If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back
to the global state (initialized using your API key).
Whether to summarize the scores of the experiment after it has run.
Defaults to true.
A function that takes an input and returns an output.
The duration, in milliseconds, after which to time out the evaluation.
Defaults to undefined, in which case there is no timeout.
The number of times to run the evaluator per input. This is useful for evaluating applications that
have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the
variance in the results.
Whether to update an existing experiment with
experiment_name if one exists. Defaults to false.
ExperimentSummary
Summary of an experiment’s scores and metadata.
Properties
The experiment that scores are baselined against.
ID of the experiment. May be
undefined if the eval was run locally.
Name of the experiment.
URL to the experiment’s page in the Braintrust app.
Name of the project that the experiment belongs to.
URL to the project’s page in the Braintrust app.
Summary of the experiment’s scores.
Exportable
Exportable interface
ExternalAttachmentParams
ExternalAttachmentParams interface
Properties
FunctionEvent
FunctionEvent interface
Properties
InvokeFunctionArgs
Arguments for the invoke function.
Properties
The type of the global function to invoke. If unspecified, defaults to ‘scorer’ for backward compatibility.
The ID of the function to invoke.
The name of the global function to invoke.
The input to the function. This will be logged as the
input field in the span.
Additional OpenAI-style messages to add to the prompt (only works for llm functions).
Additional metadata to add to the span. This will be logged as the
metadata field in the span.
It will also be available as the {{metadata}} field in the prompt and as the metadata argument
to the function.
The mode of the function. If “auto”, will return a string if the function returns a string,
and a JSON object otherwise. If “parallel”, will return an array of JSON objects with one
object per tool call.
The parent of the function. This can be an existing span, logger, or experiment, or
the output of
.export() if you are using distributed tracing. If unspecified, will use
the same semantics as traced() to determine the parent and no-op if not in a tracing
context.
The ID of the project to use for execution context (API keys, project defaults, etc.).
This is not the project the function belongs to, but the project context for the invocation.
The name of the project containing the function to invoke.
The ID of the function in the prompt session to invoke.
The ID of the prompt session to invoke the function from.
A Zod schema to validate the output of the function and return a typed value. This
is only used if
stream is false.
The slug of the function to invoke.
(Advanced) This parameter allows you to pass in a custom login state. This is useful
for multi-tenant environments where you are running functions from different Braintrust
organizations.
Whether to stream the function’s output. If true, the function will return a
BraintrustStream, otherwise it will return the output of the function as a JSON
object.
Whether to use strict mode for the function. If true, the function will throw an error
if the variable names in the prompt do not match the input keys.
Tags to add to the span. This will be logged as the
tags field in the span.
The version of the function to invoke.
LoginOptions
Options for logging in to Braintrust.
Properties
The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
If true, disables the local span cache used to optimize scorer access
to trace data. When disabled, scorers will always fetch spans from the
server. Defaults to false.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
LogOptions
LogOptions interface
Properties
MetricSummary
Summary of a metric’s performance.
Properties
Difference in metric between the current and reference experiment.
Number of improvements in the metric.
Average metric across all examples.
Name of the metric.
Number of regressions in the metric.
Unit label for the metric.
ObjectMetadata
ObjectMetadata interface
Properties
ParentExperimentIds
ParentExperimentIds interface
Properties
ParentProjectLogIds
ParentProjectLogIds interface
Properties
ReporterBody
ReporterBody interface
ScoreSummary
Summary of a score’s performance.
Properties
Difference in score between the current and reference experiment.
Number of improvements in the score.
Name of the score.
Number of regressions in the score.
Average score across all examples.
Span
A Span encapsulates logged data and metrics for a unit of work. This interface is shared by all span implementations. We suggest using one of the various traced methods, instead of creating Spans directly. See Span.traced for full details.
Properties
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.