Installation
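The SDK is published on npm as braintrust; a typical install looks like:

```shell
npm install braintrust
```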
Functions
BaseExperiment
Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps).
BraintrustMiddleware
Creates a Braintrust middleware for AI SDK v2 that automatically traces generateText and streamText calls with comprehensive metadata and metrics.
Configuration options for the middleware.
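A minimal sketch of wiring the middleware into an AI SDK model. This assumes the ai and @ai-sdk/openai packages; the project name is hypothetical:

```typescript
import { wrapLanguageModel } from "ai";
import { openai } from "@ai-sdk/openai";
import { BraintrustMiddleware, initLogger } from "braintrust";

// Traces are sent to the current logger, so initialize one first.
initLogger({ projectName: "my-project" }); // hypothetical project name

const model = wrapLanguageModel({
  model: openai("gpt-4o-mini"),
  middleware: BraintrustMiddleware({ debug: false }),
});
// generateText({ model, ... }) and streamText({ model, ... }) calls are now traced.
```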
buildLocalSummary
buildLocalSummary function
createFinalValuePassThroughStream
Create a stream that passes through the final value of the stream. This is used to implement BraintrustStream.finalValue().
A function to call with the final value of the stream.
currentExperiment
Returns the currently-active experiment (set by init). Returns undefined if no current experiment has been set.
currentLogger
Returns the currently-active logger (set by initLogger). Returns undefined if no current logger has been set.
currentSpan
Return the currently-active span for logging (set by one of the traced methods). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.
See Span for full details.
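For example, a helper deep inside a traced call can attach metadata to whatever span is active; a sketch (outside any traced context, the call is a harmless no-op):

```typescript
import { currentSpan, traced } from "braintrust";

function recordCacheHit(hit: boolean) {
  // Logs to the active span, or to a no-op span if tracing is inactive.
  currentSpan().log({ metadata: { cacheHit: hit } });
}

await traced(async () => {
  recordCacheHit(true);
}, { name: "lookup" });
```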
deepCopyEvent
Creates a deep copy of the given event. Replaces references to user objects with placeholder strings to ensure serializability, except for Attachment and ExternalAttachment objects, which are preserved and not deep-copied.
defaultErrorScoreHandler
defaultErrorScoreHandler function
deserializePlainStringAsJSON
deserializePlainStringAsJSON function
devNullWritableStream
devNullWritableStream function
Eval
Eval function
flush
Flush any pending rows to the server.
getContextManager
getContextManager function
getIdGenerator
Factory function that creates a new ID generator instance each time. This eliminates global state and makes tests parallelizable. Each caller gets their own generator instance.
getPromptVersions
Get the versions for a prompt.
The ID of the project to query
The ID of the prompt to get versions for
getSpanParentObject
Mainly for internal use. Return the parent object for starting a span in a global context. Applies precedence: current span > propagated parent string > experiment > logger.
init
Log in, and then initialize a new experiment in a specified project. If the project does not exist, it will be created.
Options for configuring init().
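A sketch of a typical flow (hypothetical project and experiment names; assumes BRAINTRUST_API_KEY is set):

```typescript
import { init } from "braintrust";

// Creates the project if it does not exist.
const experiment = init("my-project", { experiment: "my-experiment" });

experiment.log({
  input: "What is 1 + 1?",
  output: "2",
  scores: { accuracy: 1 },
});
console.log(await experiment.summarize());
```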
initDataset
Create a new dataset in a specified project. If the project does not exist, it will be created.
Options for configuring initDataset().
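A sketch of inserting a record (hypothetical project and dataset names):

```typescript
import { initDataset } from "braintrust";

const dataset = initDataset("my-project", { dataset: "golden-questions" });

dataset.insert({
  input: { question: "What is 1 + 1?" },
  expected: { answer: "2" },
});
await dataset.flush();
```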
initExperiment
Alias for init(options).
initFunction
Creates a function that can be used as a task or scorer in the Braintrust evaluation framework. The returned function wraps a Braintrust function and can be passed directly to Eval().
Options for the function.
initLogger
Create a new logger in a specified project. If the project does not exist, it will be created.
Additional options for configuring init().
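A sketch of logging production traffic (hypothetical project name and payload):

```typescript
import { initLogger } from "braintrust";

const logger = initLogger({
  projectName: "my-project", // hypothetical project name
  apiKey: process.env.BRAINTRUST_API_KEY,
});

logger.log({
  input: "What is 1 + 1?",
  output: "2",
  metadata: { userId: "u-123" },
});
await logger.flush();
```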
invoke
Invoke a Braintrust function, returning a BraintrustStream or the value as a plain
JavaScript object.
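A sketch (hypothetical project name and slug; see InvokeFunctionArgs for all options):

```typescript
import { invoke } from "braintrust";

const result = await invoke({
  projectName: "my-project",
  slug: "summarize-text",
  input: { text: "Braintrust is an eval platform." },
});
// With stream: true, invoke returns a BraintrustStream instead.
```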
The arguments for the function (see
InvokeFunctionArgs for more details).
loadPrompt
Load a prompt from the specified project.
Options for configuring loadPrompt().
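A sketch (hypothetical project name and slug):

```typescript
import { loadPrompt } from "braintrust";

const prompt = await loadPrompt({
  projectName: "my-project",
  slug: "summarize-text",
});

// build() renders the prompt with the given variables into
// OpenAI-style completion parameters.
const params = prompt.build({ text: "Braintrust is an eval platform." });
```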
log
Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.
The event to log. See Experiment.log for full details.
logError
logError function
login
Log into Braintrust. This will prompt you for your API token, which you can find at https://www.braintrust.dev/app/token. This method is called automatically by init().
Options for configuring login().
loginToState
loginToState function
newId
newId function
parseCachedHeader
parseCachedHeader function
permalink
Format a permalink to the Braintrust application for viewing the span represented by the provided slug.
Links can be generated at any time, but they will only become viewable after
the span and its root have been flushed to the server and ingested.
If you have a Span object, use Span.link instead.
The identifier generated from
Span.export.
Optional arguments.
promptDefinitionToPromptData
promptDefinitionToPromptData function
renderMessage
renderMessage function
renderPromptParams
renderPromptParams function
params
undefined | objectOutputType | objectOutputType | objectOutputType | objectOutputType | objectOutputType
Reporter
Reporter function
reportFailures
reportFailures function
runEvaluator
runEvaluator function
setFetch
Set the fetch implementation to use for requests. You can specify it here, or when you call login.
The fetch implementation to use.
setMaskingFunction
Set a global masking function that will be applied to all logged data before sending to Braintrust. The masking function will be applied after records are merged but before they are sent to the backend.
A function that takes a JSON-serializable object and returns a masked version.
Set to null to disable masking.
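A sketch of a masking function that redacts email addresses anywhere in a logged record; adapt the pattern to your own PII rules:

```typescript
import { setMaskingFunction } from "braintrust";

setMaskingFunction((data) => {
  // Serialize, redact, and re-parse; assumes records are JSON-serializable.
  const text = JSON.stringify(data);
  return JSON.parse(text.replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "<redacted>"));
});

// setMaskingFunction(null) disables masking again.
```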
spanComponentsToObjectId
spanComponentsToObjectId function
startSpan
Lower-level alternative to traced. This allows you to start a span yourself, and can be useful in situations
where you cannot use callbacks. However, spans started with startSpan will not be marked as the “current span”,
so currentSpan() and traced() will be no-ops. If you want to mark a span as current, use traced instead.
See traced for full details.
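A sketch for when the work cannot be wrapped in a callback (payload is hypothetical):

```typescript
import { startSpan } from "braintrust";

const span = startSpan({ name: "fetch-user" });
try {
  span.log({ input: { userId: "u-123" } });
  // ... do the actual work here ...
  span.log({ output: { found: true } });
} finally {
  span.end(); // manually started spans must be ended explicitly
}
```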
summarize
Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.
Options for summarizing the experiment.
traceable
A synonym for wrapTraced. If you’re porting from systems that use traceable, you can use this to
make your codebase more consistent.
traced
Toplevel function for starting a span. It checks the following (in precedence order):
- Currently-active span
- Currently-active experiment
- Currently-active logger
If parent is specified, it creates a span under the specified parent row. If none of these are active, it returns a no-op span object.
See Span.traced for full details.
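A minimal sketch (the body is a stand-in for your own model call):

```typescript
import { traced } from "braintrust";

const output = await traced(
  async (span) => {
    const answer = "42"; // ... call your model here ...
    span.log({ input: "What is 6 * 7?", output: answer });
    return answer;
  },
  { name: "answer-question" },
);
```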
updateSpan
Update a span using the output of span.export(). It is important that you only resume updating
a span once the original span has been fully written and flushed, since otherwise updates to
the span may conflict with the original span.
withCurrent
Runs the provided callback with the span as the current span.
withDataset
withDataset function
withExperiment
withExperiment function
withLogger
withLogger function
withParent
withParent function
wrapAISDK
Wraps Vercel AI SDK methods with Braintrust tracing. Returns wrapped versions of generateText, streamText, generateObject, and streamObject that automatically create spans and log inputs, outputs, and metrics.
The AI SDK namespace (e.g., import * as ai from "ai")
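A sketch (assumes the ai and @ai-sdk/openai packages):

```typescript
import * as ai from "ai";
import { openai } from "@ai-sdk/openai";
import { wrapAISDK } from "braintrust";

const { generateText, streamText, generateObject, streamObject } = wrapAISDK(ai);

// Calls are traced automatically when a logger or experiment is active.
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "What is Braintrust?",
});
```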
wrapAISDKModel
Wrap an ai-sdk model (created with .chat(), .completion(), etc.) to add tracing. If Braintrust is
not configured, this is a no-op.
wrapAnthropic
Wrap an Anthropic object (created with new Anthropic(...)) to add tracing. If Braintrust is
not configured, nothing will be traced. If this is not an Anthropic object, this function is
a no-op.
Currently, this only supports the v4 API.
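A sketch (the model id is a placeholder; use one your account supports):

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic } from "braintrust";

const client = wrapAnthropic(new Anthropic());

// messages.create calls are now traced (no-op if Braintrust is unconfigured).
const message = await client.messages.create({
  model: "claude-sonnet-4-5", // placeholder model id
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello!" }],
});
```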
wrapClaudeAgentSDK
Wraps the Claude Agent SDK with Braintrust tracing. This returns wrapped versions of query and tool that automatically trace all agent interactions.
The Claude Agent SDK module
wrapMastraAgent
Wraps a Mastra agent with Braintrust tracing. This function wraps the agent’s underlying language model with BraintrustMiddleware and traces all agent method calls. Important: This wrapper only supports AI SDK v5 methods such as generate and stream.
The Mastra agent to wrap
Optional configuration for the wrapper
wrapOpenAI
Wrap an OpenAI object (created with new OpenAI(...)) to add tracing. If Braintrust is
not configured, nothing will be traced. If this is not an OpenAI object, this function is
a no-op.
Currently, this supports both the v4 and v5 API.
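A sketch (the model id is a placeholder):

```typescript
import OpenAI from "openai";
import { wrapOpenAI } from "braintrust";

const client = wrapOpenAI(new OpenAI());

// Chat completion calls are now traced (no-op if Braintrust is unconfigured).
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
```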
wrapOpenAIv4
wrapOpenAIv4 function
wrapTraced
Wrap a function with traced, using the arguments as input and return value as output.
Any functions wrapped this way will automatically be traced, similar to the @traced decorator
in Python. If you want to correctly propagate the function’s name and define it in one go, then
you can do so like this:
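A minimal sketch (the body is a stand-in for your own logic):

```typescript
import { wrapTraced } from "braintrust";

const myFunc = wrapTraced(async function myFunc(question: string) {
  // ... your model call or other logic here ...
  return `You asked: ${question}`;
});
```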
myFunc will be traced, and the input and output will be logged automatically.
If tracing is inactive, i.e. there is no active logger or experiment, it’s just a no-op.
The function to wrap.
Span-level arguments (e.g. a custom name or type) to pass to
traced.
Classes
AISpanProcessor
A span processor that filters spans to only export filtered telemetry. Only filtered spans and root spans will be forwarded to the inner processor. This dramatically reduces telemetry volume while preserving important observability.
Methods
forceFlush(), onEnd(), onStart(), shutdown()
Attachment
Represents an attachment to be uploaded and the associated metadata.
Attachment objects can be inserted anywhere in an event, allowing you to
log arbitrary file data. The SDK will asynchronously upload the file to
object storage and replace the Attachment object with an
AttachmentReference.
Properties
The object that replaces this
Attachment at upload time.
data(), debugInfo(), upload()
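A sketch of logging a local file as an attachment (hypothetical project name and file path):

```typescript
import { readFileSync } from "node:fs";
import { Attachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "my-project" }); // hypothetical

logger.log({
  input: "Describe this image",
  output: "A cat sitting on a windowsill.",
  metadata: {
    image: new Attachment({
      data: readFileSync("cat.png"), // hypothetical local file
      filename: "cat.png",
      contentType: "image/png",
    }),
  },
});
```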
BaseAttachment
BaseAttachment class
Properties
data(), debugInfo(), upload()
BraintrustExporter
A trace exporter that sends OpenTelemetry spans to Braintrust. This exporter wraps the standard OTLP trace exporter and can be used with any OpenTelemetry setup, including @vercel/otel’s registerOTel function, NodeSDK, or custom tracer providers. It can optionally filter spans to only send AI-related telemetry.
Environment Variables:
- BRAINTRUST_API_KEY: Your Braintrust API key
- BRAINTRUST_PARENT: Parent identifier (e.g., "project_name:test")
- BRAINTRUST_API_URL: Base URL for Braintrust API (defaults to https://api.braintrust.dev)
export(), forceFlush(), shutdown()
BraintrustSpanProcessor
A span processor that sends OpenTelemetry spans to Braintrust. This processor uses a BatchSpanProcessor and an OTLP exporter configured to send data to Braintrust’s telemetry endpoint. Span filtering is disabled by default but can be enabled with the filterAISpans option.
Environment Variables:
- BRAINTRUST_API_KEY: Your Braintrust API key
- BRAINTRUST_PARENT: Parent identifier (e.g., "project_name:test")
- BRAINTRUST_API_URL: Base URL for Braintrust API (defaults to https://api.braintrust.dev)
forceFlush(), onEnd(), onStart(), shutdown()
BraintrustState
BraintrustState class
Properties
apiConn(), appConn(), bgLogger(), copyLoginInfo(), disable(), enforceQueueSizeLimit(), httpLogger(), login(), loginReplaceApiConn(), proxyConn(), resetIdGenState(), resetLoginInfo(), serialize(), setFetch(), setMaskingFunction(), setOverrideBgLogger(), toJSON(), toString(), deserialize()
BraintrustStream
A Braintrust stream. This is a wrapper around a ReadableStream of BraintrustStreamChunk,
with some utility methods to make them easy to log and convert into various formats.
Methods
[asyncIterator](), copy(), finalValue(), toReadableStream(), parseRawEvent(), serializeRawEvent()
CodeFunction
CodeFunction class
Properties
key()
CodePrompt
CodePrompt class
Properties
toFunctionDefinition()
ContextManager
ContextManager class
Methods
getCurrentSpan(), getParentSpanIds(), runInContext()
Dataset
A dataset is a collection of records, such as model inputs and expected outputs, which represent data you can use to evaluate and fine-tune models. You can log production data to datasets, curate them with interesting examples, edit/delete records, and run evaluations against them. You should not create Dataset objects directly. Instead, use the braintrust.initDataset() method.
Properties
[asyncIterator](), clearCache(), close(), delete(), fetch(), fetchedData(), flush(), getState(), insert(), summarize(), update(), version(), isDataset()
EvalResultWithSummary
EvalResultWithSummary class
Properties
toJSON(), toString()
Experiment
An experiment is a collection of logged events, such as model inputs and outputs, which represent a snapshot of your application at a particular point in time. An experiment is meant to capture more than just the model you use, and includes the data you use to test, pre- and post-processing code, comparison metrics (scores), and any other metadata you want to include. Experiments are associated with a project, and two experiments are meant to be easily comparable via their inputs. You can change the attributes of the experiments in a project (e.g. scoring functions)
over time, simply by changing what you log.
You should not create Experiment objects directly. Instead, use the braintrust.init() method.
Properties
[asyncIterator](), clearCache(), close(), export(), fetch(), fetchBaseExperiment(), fetchedData(), flush(), getState(), log(), logFeedback(), startSpan(), summarize(), traced(), updateSpan(), version()
ExternalAttachment
Represents an attachment that resides in an external object store and the associated metadata.
ExternalAttachment objects can be inserted anywhere in an event, similar to
Attachment objects, but they reference files that already exist in an external
object store rather than requiring upload. The SDK will replace the ExternalAttachment
object with an AttachmentReference during logging.
Properties
The object that replaces this
ExternalAttachment at upload time.
data(), debugInfo(), upload()
FailedHTTPResponse
FailedHTTPResponse class
Properties
IDGenerator
Abstract base class for ID generators.
Methods
getSpanId(), getTraceId(), shareRootSpanId()
JSONAttachment
Represents a JSON object that should be stored as an attachment.
JSONAttachment is a convenience function that creates an Attachment
from JSON data. It’s particularly useful for large JSON objects that
would otherwise bloat the trace size.
The JSON data is automatically serialized and stored as an attachment
with content type “application/json”.
Properties
The object that replaces this
Attachment at upload time.
data(), debugInfo(), upload()
LazyValue
LazyValue class
Properties
get(), getSync()
Logger
Logger class
Properties
export(), flush(), log(), logFeedback(), startSpan(), traced(), updateSpan()
NoopSpan
A fake implementation of the Span API which does nothing. This can be used as the default span.
Properties
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.
close(), end(), export(), flush(), link(), log(), logFeedback(), permalink(), setAttributes(), startSpan(), startSpanWithParents(), state(), toString(), traced()
OTELIDGenerator
ID generator that generates OpenTelemetry-compatible IDs. Uses hex strings for compatibility with OpenTelemetry systems.
Methods
getSpanId(), getTraceId(), shareRootSpanId()
Project
Project class
Properties
addCodeFunction(), addPrompt(), publish()
ProjectNameIdMap
ProjectNameIdMap class
Methods
getId(), getName(), resolve()
Prompt
Prompt class
Properties
build(), buildWithAttachments(), fromPromptData(), isPrompt(), renderPrompt()
PromptBuilder
PromptBuilder class
Methods
create()
ReadonlyAttachment
A readonly alternative to Attachment, which can be used for fetching
already-uploaded Attachments.
Properties
Attachment metadata.
asBase64Url(), data(), metadata(), status()
ReadonlyExperiment
A read-only view of an experiment, initialized by passing open: true to init().
Properties
[asyncIterator](), asDataset(), clearCache(), fetch(), fetchedData(), getState(), version()
ScorerBuilder
ScorerBuilder class
Methods
create()
SpanImpl
Primary implementation of the Span interface. See Span for full details on each method.
We suggest using one of the various traced methods, instead of creating Spans directly. See Span.startSpan for full details.
Properties
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.
close(), end(), export(), flush(), link(), log(), logFeedback(), permalink(), setAttributes(), setSpanParents(), startSpan(), startSpanWithParents(), state(), toString(), traced()
TestBackgroundLogger
TestBackgroundLogger class
Methods
drain(), flush(), log(), setMaskingFunction()
ToolBuilder
ToolBuilder class
Methods
create()
UUIDGenerator
ID generator that uses UUID4 for both span and trace IDs.
Methods
getSpanId(), getTraceId(), shareRootSpanId()
Interfaces
AttachmentParams
AttachmentParams interface
Properties
BackgroundLoggerOpts
BackgroundLoggerOpts interface
Properties
ContextParentSpanIds
ContextParentSpanIds interface
Properties
DatasetSummary
Summary of a dataset’s scores and metadata.
Properties
Summary of the dataset’s data.
Name of the dataset.
URL to the dataset’s page in the Braintrust app.
Name of the project that the dataset belongs to.
URL to the project’s page in the Braintrust app.
DataSummary
Summary of a dataset’s data.
Properties
New or updated records added in this session.
Total records in the dataset.
EvalHooks
EvalHooks interface
Properties
The expected output for the current evaluation.
The metadata object for the current evaluation. You can mutate this object to add or remove metadata.
The current parameters being used for this specific task execution.
Array parameters are converted to single values.
Report progress that will show up in the playground.
The task’s span.
The tags for the current evaluation.
The index of the current trial (0-based). This is useful when trialCount > 1.
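A task receives these hooks as its second argument; a sketch of an Eval that mutates metadata per evaluation (hypothetical project name, trivial task and scorer):

```typescript
import { Eval } from "braintrust";

Eval("my-project", {
  data: () => [{ input: 1, expected: 2 }],
  task: async (input, hooks) => {
    hooks.metadata.variant = "add-one"; // recorded with this evaluation
    return input + 1;
  },
  scores: [
    ({ output, expected }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
});
```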
Evaluator
Evaluator interface
Properties
An optional experiment id to use as a base. If specified, the new experiment will be summarized
and compared to this experiment. This takes precedence over
baseExperimentName if specified.
An optional experiment name to use as a base. If specified, the new experiment will be summarized
and compared to this experiment.
A function that returns a list of inputs, expected outputs, and metadata.
An optional description for the experiment.
Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored.
A default implementation is exported as
defaultErrorScoreHandler which will log a 0 score to the root span for any scorer that was not run.
An optional name for the experiment.
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
Whether the experiment should be public. Defaults to false.
The maximum number of tasks/scorers that will be run concurrently.
Defaults to undefined, in which case there is no max concurrency.
Optional additional metadata for the experiment.
A set of parameters that will be passed to the evaluator.
Can contain array values that will be converted to single values in the task.
If specified, uses the given project ID instead of the evaluator’s name to identify the project.
Optionally explicitly specify the git metadata for this experiment. This takes precedence over
gitMetadataSettings if specified.
A set of functions that take an input, output, and expected value and return a score.
An abort signal that can be used to stop the evaluation.
If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back
to the global state (initialized using your API key).
Whether to summarize the scores of the experiment after it has run.
Defaults to true.
A function that takes an input and returns an output.
The duration, in milliseconds, after which to time out the evaluation.
Defaults to undefined, in which case there is no timeout.
The number of times to run the evaluator per input. This is useful for evaluating applications that
have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the
variance in the results.
Whether to update an existing experiment with
experiment_name if one exists. Defaults to false.
ExperimentSummary
Summary of an experiment’s scores and metadata.
Properties
The experiment scores are baselined against.
ID of the experiment. May be
undefined if the eval was run locally.
Name of the experiment.
URL to the experiment’s page in the Braintrust app.
Name of the project that the experiment belongs to.
URL to the project’s page in the Braintrust app.
Summary of the experiment’s scores.
Exportable
Exportable interface
ExternalAttachmentParams
ExternalAttachmentParams interface
Properties
FunctionEvent
FunctionEvent interface
Properties
InvokeFunctionArgs
Arguments for the invoke function.
Properties
The ID of the function to invoke.
The name of the global function to invoke.
The input to the function. This will be logged as the
input field in the span.
Additional OpenAI-style messages to add to the prompt (only works for llm functions).
Additional metadata to add to the span. This will be logged as the
metadata field in the span.
It will also be available as the {{metadata}} field in the prompt and as the metadata argument
to the function.
The mode of the function. If “auto”, will return a string if the function returns a string,
and a JSON object otherwise. If “parallel”, will return an array of JSON objects with one
object per tool call.
The parent of the function. This can be an existing span, logger, or experiment, or
the output of
.export() if you are doing distributed tracing. If unspecified, will use
the same semantics as traced() to determine the parent and no-op if not in a tracing
context.
The name of the project containing the function to invoke.
The ID of the function in the prompt session to invoke.
The ID of the prompt session to invoke the function from.
A Zod schema to validate the output of the function and return a typed value. This
is only used if
stream is false.
The slug of the function to invoke.
(Advanced) This parameter allows you to pass in a custom login state. This is useful
for multi-tenant environments where you are running functions from different Braintrust
organizations.
Whether to stream the function’s output. If true, the function will return a
BraintrustStream, otherwise it will return the output of the function as a JSON
object.
Whether to use strict mode for the function. If true, the function will throw an error
if the variable names in the prompt do not match the input keys.
Tags to add to the span. This will be logged as the
tags field in the span.
The version of the function to invoke.
LoginOptions
Options for logging in to Braintrust.
Properties
The API key to use. If the parameter is not specified, will try to use the
BRAINTRUST_API_KEY environment variable.
The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need
to change this unless you are doing the “Full” deployment.
A custom fetch implementation to use.
By default, the SDK installs an event handler that flushes pending writes on the
beforeExit event.
If true, this event handler will not be installed.
Calls this function if there’s an error in the background flusher.
The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually
unnecessary unless you are logging in with a JWT.
LogOptions
LogOptions interface
Properties
MetricSummary
Summary of a metric’s performance.
Properties
Difference in metric between the current and reference experiment.
Number of improvements in the metric.
Average metric across all examples.
Name of the metric.
Number of regressions in the metric.
Unit label for the metric.
ObjectMetadata
ObjectMetadata interface
Properties
ParentExperimentIds
ParentExperimentIds interface
Properties
ParentProjectLogIds
ParentProjectLogIds interface
Properties
ReporterBody
ReporterBody interface
ScoreSummary
Summary of a score’s performance.
Properties
Difference in score between the current and reference experiment.
Number of improvements in the score.
Name of the score.
Number of regressions in the score.
Average score across all examples.
Span
A Span encapsulates logged data and metrics for a unit of work. This interface is shared by all span implementations. We suggest using one of the various traced methods, instead of creating Spans directly. See Span.traced for full details.
Properties
Row ID of the span.
Root span ID of the span.
Span ID of the span.
Parent span IDs of the span.