Braintrust TypeScript SDK

Installation

npm install braintrust

Starting with v2.0.0, if you’re using the Vercel AI SDK integration or other features that require schema validation, you must install zod as a peer dependency: npm install zod

Functions

addAzureBlobHeaders

addAzureBlobHeaders function

headers

Record

url

string

BaseExperiment

Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps).

options

Object

options.name

string

The name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps).

BraintrustMiddleware

Creates a Braintrust middleware for AI SDK v2 that automatically traces generateText and streamText calls with comprehensive metadata and metrics.

config

MiddlewareConfig

Configuration options for the middleware

buildLocalSummary

buildLocalSummary function

evaluator

EvaluatorDef

evaluator.evalName

string

required

evaluator.projectName

string

required

evaluator.baseExperimentId

string

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over baseExperimentName if specified.

evaluator.baseExperimentName

string

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
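The precedence between the two base-experiment options can be sketched as a small resolver. This is only an illustration of the documented rule (the ID wins over the name); the types and the `resolveBaseExperiment` helper are hypothetical and not part of the SDK, and the `"unspecified"` case is a placeholder because the behavior when neither option is set is not described here.

```typescript
// Hypothetical stand-in for the two base-experiment options; not an SDK type.
interface BaseExperimentOptions {
  baseExperimentId?: string;
  baseExperimentName?: string;
}

type BaseExperimentRef =
  | { kind: "id"; value: string }
  | { kind: "name"; value: string }
  | { kind: "unspecified" };

// Resolve which base to compare against: the ID first, then the name.
function resolveBaseExperiment(opts: BaseExperimentOptions): BaseExperimentRef {
  if (opts.baseExperimentId !== undefined) {
    return { kind: "id", value: opts.baseExperimentId };
  }
  if (opts.baseExperimentName !== undefined) {
    return { kind: "name", value: opts.baseExperimentName };
  }
  // Placeholder: neither option was given.
  return { kind: "unspecified" };
}
```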

evaluator.classifiers

EvalClassifier[]

A set of functions that take an input, output, and expected value and return a Classification. Results are recorded under the classifications column. At least one of scores or classifiers must be provided.

evaluator.data

EvalData

required

A function that returns a list of inputs, expected outputs, and metadata.

evaluator.description

string

An optional description for the experiment.

evaluator.errorScoreHandler

ErrorScoreHandler

Optionally supply a custom function to specifically handle score values when tasks or scoring functions have errored. A default implementation is exported as defaultErrorScoreHandler which will log a 0 score to the root span for any scorer that was not run.

evaluator.experimentName

string

An optional name for the experiment.

evaluator.flushBeforeScoring

boolean

If true, flushes spans before calling scoring functions.

evaluator.gitMetadataSettings

Object

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

evaluator.isPublic

boolean

Whether the experiment should be public. Defaults to false.

evaluator.maxConcurrency

number

The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.

evaluator.metadata

Record

Optional additional metadata for the experiment.

evaluator.parameters

Parameters | RemoteEvalParameters | Promise

A set of parameters that will be passed to the evaluator. Can be:

A raw EvalParameters schema (Zod schemas)
A Parameters instance from loadParameters()
A Promise<Parameters> from loadParameters()

evaluator.projectId

string

If specified, uses the given project ID instead of the evaluator’s name to identify the project.

evaluator.repoInfo

null | Object

Optionally explicitly specify the git metadata for this experiment. This takes precedence over gitMetadataSettings if specified.

evaluator.scores

EvalScorer[]

A set of functions that take an input, output, and expected value and return a Score. At least one of scores or classifiers must be provided.
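As a rough illustration of what one entry in scores might look like, the sketch below defines an exact-match scorer. The `ScorerArgs` and `Score` types here are local stand-ins written for this example; the SDK exports its own types, and their exact shape should be checked against the SDK before relying on this.

```typescript
// Local stand-in types for illustration only; assumed, not copied from the SDK.
interface ScorerArgs<Input, Output> {
  input: Input;
  output: Output;
  expected?: Output;
}

interface Score {
  name: string;
  score: number | null;
}

// Returns 1 when the output equals the expected value, 0 when it differs,
// and null when there is no expected value to compare against.
function exactMatch(args: ScorerArgs<string, string>): Score {
  if (args.expected === undefined) {
    return { name: "exact_match", score: null };
  }
  return { name: "exact_match", score: args.output === args.expected ? 1 : 0 };
}
```

A scorer of this shape would sit in the scores array alongside any others, each receiving the same input, output, and expected value.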

evaluator.signal

AbortSignal

An abort signal that can be used to stop the evaluation.

evaluator.state

BraintrustState

If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back to the global state (initialized using your API key).

evaluator.summarizeScores

boolean

Whether to summarize the scores of the experiment after it has run. Defaults to true.

evaluator.tags

string[]

Optional tags for the experiment.

evaluator.task

EvalTask

required

A function that takes an input and returns an output.

evaluator.timeout

number

The duration, in milliseconds, after which to time out the evaluation. Defaults to undefined, in which case there is no timeout.

evaluator.trialCount

number

The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
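To make the "aggregate measure" and "variance" concrete, here is a minimal sketch that summarizes the scores from repeated trials of one input. The `summarizeTrials` helper is hypothetical and is not how Braintrust computes its summaries; it only shows the arithmetic, using the sample variance.

```typescript
interface TrialSummary {
  mean: number;
  variance: number;
}

// Summarize repeated trial scores for a single input.
function summarizeTrials(scores: number[]): TrialSummary {
  if (scores.length === 0) {
    throw new Error("at least one trial score is required");
  }
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  // Sample variance (n - 1 denominator); defined as 0 for a single trial.
  const variance =
    scores.length > 1
      ? scores.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / (scores.length - 1)
      : 0;
  return { mean, variance };
}
```

With a single trial the variance carries no information, which is one reason to raise trialCount for non-deterministic tasks.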

evaluator.update

boolean

Whether to update an existing experiment with experiment_name if one exists. Defaults to false.

results

EvalResult[]

precomputedScores

ScoreAccumulator

configureInstrumentation

Configure auto-instrumentation. This must be called before importing any AI SDKs to take effect.

config

InstrumentationConfig

config.integrations

Object

Configuration for individual SDK integrations. Set to false to disable instrumentation for that SDK.

constructLogs3OverflowRequest

constructLogs3OverflowRequest function

key

string

createFinalValuePassThroughStream

Create a stream that passes through the final value of the stream. This is used to implement BraintrustStream.finalValue().

onFinal

Object

A function to call with the final value of the stream.

onError

Object

currentExperiment

Returns the currently-active experiment (set by init). Returns undefined if no current experiment has been set.

options

OptionalStateArg

currentLogger

Returns the currently-active logger (set by initLogger). Returns undefined if no current logger has been set.

options

unknown

currentSpan

Return the currently-active span for logging (set by one of the traced methods). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging. See Span for full details.

options

OptionalStateArg

defaultErrorScoreHandler

defaultErrorScoreHandler function

args

Object

args.data

EvalCase

required

args.rootSpan

Span

required

args.unhandledScores

string[]

required
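A custom error score handler with the same three arguments could look roughly like the sketch below, which imitates the documented default behavior of logging a 0 score to the root span for every scorer that did not run. `SpanLike` and `ErrorScoreHandlerArgs` are simplified stand-ins defined for this example, and the return value is an assumption; consult the SDK's ErrorScoreHandler type for the real signature.

```typescript
// Minimal stand-ins for the argument types; the real Span and EvalCase are richer.
interface SpanLike {
  log(event: { scores: Record<string, number> }): void;
}

interface ErrorScoreHandlerArgs {
  rootSpan: SpanLike;
  data: unknown;
  unhandledScores: string[];
}

// Assign 0 to every scorer that did not run and record the result on the root span.
function zeroScoreHandler(args: ErrorScoreHandlerArgs): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const name of args.unhandledScores) {
    scores[name] = 0;
  }
  args.rootSpan.log({ scores });
  // Returning the scores is an assumption made for this sketch.
  return scores;
}
```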

deserializePlainStringAsJSON

deserializePlainStringAsJSON function

string

devNullWritableStream

devNullWritableStream function

Eval

Eval function

name

string

evaluator

Evaluator

evaluator.baseExperimentId

string

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over baseExperimentName if specified.

evaluator.baseExperimentName

string

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

evaluator.classifiers

EvalClassifier[]

evaluator.data

EvalData

required

A function that returns a list of inputs, expected outputs, and metadata.

evaluator.description

string

An optional description for the experiment.

evaluator.errorScoreHandler

ErrorScoreHandler

evaluator.experimentName

string

An optional name for the experiment.

evaluator.flushBeforeScoring

boolean

If true, flushes spans before calling scoring functions.

evaluator.gitMetadataSettings

Object

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

evaluator.isPublic

boolean

Whether the experiment should be public. Defaults to false.

evaluator.maxConcurrency

number

The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.

evaluator.metadata

Record

Optional additional metadata for the experiment.

evaluator.parameters

Parameters | RemoteEvalParameters | Promise

A set of parameters that will be passed to the evaluator. Can be:

A raw EvalParameters schema (Zod schemas)
A Parameters instance from loadParameters()
A Promise<Parameters> from loadParameters()

evaluator.projectId

string

If specified, uses the given project ID instead of the evaluator’s name to identify the project.

evaluator.repoInfo

null | Object

Optionally explicitly specify the git metadata for this experiment. This takes precedence over gitMetadataSettings if specified.

evaluator.scores

EvalScorer[]

A set of functions that take an input, output, and expected value and return a Score. At least one of scores or classifiers must be provided.

evaluator.signal

AbortSignal

An abort signal that can be used to stop the evaluation.

evaluator.state

BraintrustState

If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back to the global state (initialized using your API key).

evaluator.summarizeScores

boolean

Whether to summarize the scores of the experiment after it has run. Defaults to true.

evaluator.tags

string[]

Optional tags for the experiment.

evaluator.task

EvalTask

required

A function that takes an input and returns an output.

evaluator.timeout

number

The duration, in milliseconds, after which to time out the evaluation. Defaults to undefined, in which case there is no timeout.

evaluator.trialCount

number

evaluator.update

boolean

Whether to update an existing experiment with experiment_name if one exists. Defaults to false.

reporterOrOpts

string | ReporterDef | EvalOptions

flush

Flush any pending rows to the server.

options

OptionalStateArg

getContextManager

getContextManager function

getIdGenerator

Factory function that creates a new ID generator instance each time. This eliminates global state and makes tests parallelizable. Each caller gets their own generator instance.
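The factory pattern described here can be sketched in a few lines. This is not the SDK's generator: the `IdGenerator` interface, the `createIdGenerator` name, and the counter-based IDs are all assumptions chosen to show that each caller gets independent state.

```typescript
// Hypothetical generator interface; the SDK's actual generator type may differ.
interface IdGenerator {
  next(): string;
}

// Each call builds a fresh generator with its own counter, so no state is shared
// between callers and nothing lives at module (global) scope.
function createIdGenerator(prefix: string): IdGenerator {
  let counter = 0;
  return {
    next(): string {
      counter += 1;
      return prefix + "-" + counter;
    },
  };
}
```

Because two generators never share a counter, tests that each create their own instance can run in parallel without interfering with one another.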

getPromptVersions

Get the versions for a prompt.

projectId

string

The ID of the project to query

promptId

string

The ID of the prompt to get versions for

getSpanParentObject

Mainly for internal use. Return the parent object for starting a span in a global context. Applies precedence: current span > propagated parent string > experiment > logger.

options

unknown

options.parent

string
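The precedence order stated for getSpanParentObject (current span, then propagated parent string, then experiment, then logger) can be illustrated with a small resolver. The `ParentContext` shape and `pickParentSource` function are hypothetical and exist only for this sketch; the SDK works with its own internal objects.

```typescript
// Hypothetical context: each field holds whatever is currently active, if anything.
interface ParentContext {
  currentSpan?: string;
  propagatedParent?: string;
  experiment?: string;
  logger?: string;
}

type ParentSource = "span" | "propagated" | "experiment" | "logger" | "none";

// Walk the candidates in precedence order and report the first one that is set.
function pickParentSource(ctx: ParentContext): ParentSource {
  if (ctx.currentSpan !== undefined) return "span";
  if (ctx.propagatedParent !== undefined) return "propagated";
  if (ctx.experiment !== undefined) return "experiment";
  if (ctx.logger !== undefined) return "logger";
  return "none";
}
```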

getTemplateRenderer

Gets an active template renderer by name. Returns undefined if the renderer is not active.

name

string

Name of the renderer to retrieve

init

options

Readonly

Options for configuring init().

options.project

string

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

Controls internal Braintrust SDK troubleshooting output. Use "error", "warn", "info", or "debug" to control how much internal SDK troubleshooting output is emitted. Use false to explicitly disable this output. When omitted, the SDK remains silent unless BRAINTRUST_DEBUG_LOG_LEVEL is set to "error", "warn", "info", or "debug". This option only affects local console output; it does not change what data is logged to Braintrust.
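One way to read the interaction between this option and the environment variable is the resolver sketched below. It is an interpretation of the description above, not SDK code: `resolveDebugLogLevel` is a hypothetical helper, `envValue` stands in for the value of BRAINTRUST_DEBUG_LOG_LEVEL, and the assumption that an explicit option overrides the environment variable should be verified against the SDK.

```typescript
type DebugLogLevel = "error" | "warn" | "info" | "debug";

const LEVELS: DebugLogLevel[] = ["error", "warn", "info", "debug"];

function isDebugLogLevel(value: string | undefined): value is DebugLogLevel {
  return value !== undefined && LEVELS.indexOf(value as DebugLogLevel) !== -1;
}

// Returns the level to log at, or undefined when the SDK should stay silent.
function resolveDebugLogLevel(
  option: false | DebugLogLevel | undefined,
  envValue: string | undefined
): DebugLogLevel | undefined {
  if (option === false) return undefined; // explicitly disabled
  if (option !== undefined) return option; // explicit option (assumed to win)
  // Option omitted: fall back to the environment value if it is a valid level.
  return isDebugLogLevel(envValue) ? envValue : undefined;
}
```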

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.baseExperiment

string

options.baseExperimentId

string

options.dataset

AnyDataset | DatasetRef

options.description

string

options.experiment

string

options.gitMetadataSettings

GitMetadataSettings

options.isPublic

boolean

options.metadata

Record

options.parameters

ParametersRef | RemoteEvalParameters

options.projectId

string

options.repoInfo

RepoInfo

options.setCurrent

boolean

options.state

BraintrustState

options.tags

string[]

options.update

boolean

initDataset

Create a new dataset in a specified project. If the project does not exist, it will be created.

options

Readonly

Options for configuring initDataset().

options.project

string

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.dataset

string

options.description

string

options.metadata

Record

options.projectId

string

options.state

BraintrustState

options.version

string

initExperiment

Alias for init(options).

options

Readonly

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.baseExperiment

string

options.baseExperimentId

string

options.dataset

AnyDataset | DatasetRef

options.description

string

options.experiment

string

options.gitMetadataSettings

GitMetadataSettings

options.isPublic

boolean

options.metadata

Record

options.parameters

ParametersRef | RemoteEvalParameters

options.projectId

string

options.repoInfo

RepoInfo

options.setCurrent

boolean

options.state

BraintrustState

options.tags

string[]

options.update

boolean

initFunction

Creates a function that can be used as a task or scorer in the Braintrust evaluation framework. The returned function wraps a Braintrust function and can be passed directly to Eval(). When used as a task:

const myFunction = initFunction({projectName: "myproject", slug: "myfunction"});
await Eval("test", {
  task: myFunction,
  data: testData,
  scores: [...]
});

When used as a scorer:

const myScorer = initFunction({projectName: "myproject", slug: "myscorer"});
await Eval("test", {
  task: someTask,
  data: testData,
  scores: [myScorer]
});

options

Object

Options for the function.

options.projectName

string

required

The project name containing the function.

options.slug

string

required

The slug of the function to invoke.

options.state

BraintrustState

Optional Braintrust state to use.

options.version

string

Optional version of the function to use. Defaults to latest.

initLogger

Create a new logger in a specified project. If the project does not exist, it will be created.

options

Readonly

Options for configuring initLogger().

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.orgProjectMetadata

OrgProjectMetadata

options.projectId

string

options.projectName

string

options.setCurrent

boolean

options.state

BraintrustState

initNodeTestSuite

Creates a new Node.js test suite with Braintrust experiment tracking.

config

NodeTestSuiteConfig

invoke

Invoke a Braintrust function, returning a BraintrustStream or the value as a plain JavaScript object.

args

unknown

The arguments for the function (see InvokeFunctionArgs for more details).

args.functionType

The type of the global function to invoke. If unspecified, defaults to ‘scorer’ for backward compatibility.

args.function_id

string

The ID of the function to invoke.

args.globalFunction

string

The name of the global function to invoke.

args.input

Input

required

The input to the function. This will be logged as the input field in the span.

args.messages

Additional OpenAI-style messages to add to the prompt (only works for llm functions).

args.metadata

Record

Additional metadata to add to the span. This will be logged as the metadata field in the span. It will also be available as the {{metadata}} field in the prompt and as the metadata argument to the function.

args.mode

null | "text" | "auto" | "parallel" | "json"

The mode of the function. If “auto”, will return a string if the function returns a string, and a JSON object otherwise. If “parallel”, will return an array of JSON objects with one object per tool call.

args.parent

string | Exportable

The parent of the function. This can be an existing span, logger, or experiment, or the output of .export() if you are distributed tracing. If unspecified, will use the same semantics as traced() to determine the parent and no-op if not in a tracing context.

args.projectId

string

The ID of the project to use for execution context (API keys, project defaults, etc.). This is not the project the function belongs to, but the project context for the invocation.

args.projectName

string

The name of the project containing the function to invoke.

args.promptSessionFunctionId

string

The ID of the function in the prompt session to invoke.

args.promptSessionId

string

The ID of the prompt session to invoke the function from.

args.schema

unknown

A Zod schema to validate the output of the function and return a typed value. This is only used if stream is false.

args.slug

string

The slug of the function to invoke.

args.state

BraintrustState

(Advanced) This parameter allows you to pass in a custom login state. This is useful for multi-tenant environments where you are running functions from different Braintrust organizations.

args.stream

Stream

Whether to stream the function’s output. If true, the function will return a BraintrustStream; otherwise, it will return the output of the function as a JSON object.

args.strict

boolean

Whether to use strict mode for the function. If true, the function will throw an error if the variable names in the prompt do not match the input keys.
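The kind of check strict mode implies can be sketched as follows. This is a simplified illustration, not the SDK's renderer: `templateVariables` and `checkStrict` are hypothetical helpers, only simple mustache-style `{{name}}` variables are recognized, and the sketch assumes a mismatch means a template variable with no corresponding input key.

```typescript
// Collect distinct {{variable}} names from a mustache-style template (simple names only).
function templateVariables(template: string): string[] {
  const names: string[] = [];
  const pattern = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  let match: RegExpExecArray | null = pattern.exec(template);
  while (match !== null) {
    const name = match[1];
    if (name !== undefined && names.indexOf(name) === -1) names.push(name);
    match = pattern.exec(template);
  }
  return names;
}

// In strict mode, throw when a template variable has no matching input key.
function checkStrict(template: string, input: Record<string, unknown>, strict: boolean): void {
  if (!strict) return;
  const missing = templateVariables(template).filter((name) => !(name in input));
  if (missing.length > 0) {
    throw new Error("Missing input keys for template variables: " + missing.join(", "));
  }
}
```

Without strict mode the same template would be accepted, leaving the unmatched variable for the renderer to handle.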

args.tags

string[]

Tags to add to the span. This will be logged as the tags field in the span.

args.version

string

The version of the function to invoke.

args.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

args.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

args.debugLogLevel

false | DebugLogLevel

args.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

args.fetch

Object

A custom fetch implementation to use.

args.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

args.onFlushError

Object

Calls this function if there’s an error in the background flusher.

args.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

isTemplateFormat

isTemplateFormat function

unknown

loadParameters

Load parameters from the specified project.

options

LoadParametersByProjectNameOptions

Options for configuring loadParameters().

loadPrompt

Load a prompt from the specified project.

options

LoadPromptOptions

Options for configuring loadPrompt().

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.defaults

DefaultPromptArgs

options.environment

string

options.id

string

options.noTrace

boolean

options.projectId

string

options.projectName

string

options.slug

string

options.state

BraintrustState

options.version

string

log

Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.

event

ExperimentLogFullArgs

The event to log. See Experiment.log for full details.

event.input

unknown

required

event.id

string

required

logError

logError function

span

Span

span.id

string

required

Row ID of the span.

span.kind

"span"

required

span.rootSpanId

string

required

Root span ID of the span.

span.spanId

string

required

Span ID of the span.

span.spanParents

string[]

required

Parent span IDs of the span.

error

unknown

login

Log in to Braintrust. This will prompt you for your API token, which you can find at https://www.braintrust.dev/app/token. This method is called automatically by init().

options

unknown

Options for configuring login().

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

Log in again, even if you have already logged in. By default, this function exits quickly if you are already logged in.

loginToState

loginToState function

options

LoginOptions

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

newId

newId function

parseCachedHeader

parseCachedHeader function

value

undefined | null | string

parseTemplateFormat

parseTemplateFormat function

value

unknown

defaultFormat

TemplateFormat

permalink

Format a permalink to the Braintrust application for viewing the span represented by the provided slug. Links can be generated at any time, but they will only become viewable after the span and its root have been flushed to the server and ingested. If you have a Span object, use Span.link instead.

slug

string

The identifier generated from Span.export.

opts

Object

Optional arguments.

opts.appUrl

string

The app URL to use. If not provided, the app URL will be inferred from the state.

opts.orgName

string

The org name to use. If not provided, the org name will be inferred from the state.

opts.state

BraintrustState

The login state to use. If not provided, the global state will be used.

pickLogs3OverflowObjectIds

pickLogs3OverflowObjectIds function

row

Record

promptDefinitionToPromptData

promptDefinitionToPromptData function

promptDefinition

unknown

promptDefinition.environments

string[]

promptDefinition.model

string

required

promptDefinition.params

objectOutputType | objectOutputType | objectOutputType | objectOutputType | objectOutputType

promptDefinition.templateFormat

"none" | "mustache" | "nunjucks"

rawTools

Object[]

registerOtelFlush

Register a callback to flush OTEL spans. This is called by @braintrust/otel when it initializes a BraintrustSpanProcessor/Exporter. When ensureSpansFlushed is called (e.g., before a BTQL query in scorers), this callback will be invoked to ensure OTEL spans are flushed to the server. Also disables the span cache, since OTEL spans aren’t in the local cache and we need BTQL to see the complete span tree (both native + OTEL spans).

callback

Object

registerTemplatePlugin

Register a template plugin and optionally activate it. If options is provided, it will be used to create the active renderer. If options is omitted but the plugin defines defaultOptions, the registry will activate the renderer using those defaults.

plugin

TemplateRendererPlugin

plugin.createRenderer

Object

required

Factory function that creates a renderer instance.

plugin.defaultOptions

unknown

Default configuration options for this plugin.

plugin.name

string

required

Unique identifier for this plugin. Must match the format string used in templateFormat option.
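The activation rules described for registerTemplatePlugin can be modeled with a toy registry. Everything below is an assumption made for illustration: the `TemplateRegistry` class, the `Renderer` signature, and the generic `Options` parameter are not the SDK's types, and only the name, createRenderer, and defaultOptions fields come from the reference above.

```typescript
type Renderer = (template: string, variables: Record<string, string>) => string;

interface TemplateRendererPlugin<Options> {
  name: string;
  createRenderer: (options: Options) => Renderer;
  defaultOptions?: Options;
}

// Illustrative registry: tracks registered plugins and which renderers are active.
class TemplateRegistry {
  private plugins: Record<string, true> = Object.create(null);
  private active: Record<string, Renderer> = Object.create(null);

  register<Options>(plugin: TemplateRendererPlugin<Options>, options?: Options): void {
    this.plugins[plugin.name] = true;
    const effective = options !== undefined ? options : plugin.defaultOptions;
    // Activate only when explicit options or plugin defaults are available.
    if (effective !== undefined) {
      this.active[plugin.name] = plugin.createRenderer(effective);
    }
  }

  isRegistered(name: string): boolean {
    return this.plugins[name] === true;
  }

  // Mirrors the documented lookup: undefined when the renderer is not active.
  getRenderer(name: string): Renderer | undefined {
    return this.active[name];
  }
}
```

In this model a plugin registered without options or defaults is known to the registry but has no active renderer, which matches getTemplateRenderer returning undefined for inactive renderers.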

renderMessage

renderMessage function

render

Object

message

renderPromptParams

renderPromptParams function

params

args

Record

options

Object

options.strict

boolean

options.templateFormat

TemplateFormat

renderTemplateContent

renderTemplateContent function

template

string

variables

Record

escape

Object

options

Object

options.strict

boolean

options.templateFormat

TemplateFormat

Reporter

Reporter function

name

string

reporter

ReporterBody

reportFailures

reportFailures function

evaluator

EvaluatorDef

evaluator.evalName

string

required

evaluator.projectName

string

required

evaluator.baseExperimentId

string

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over baseExperimentName if specified.

evaluator.baseExperimentName

string

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

evaluator.classifiers

EvalClassifier[]

evaluator.data

EvalData

required

A function that returns a list of inputs, expected outputs, and metadata.

evaluator.description

string

An optional description for the experiment.

evaluator.errorScoreHandler

ErrorScoreHandler

evaluator.experimentName

string

An optional name for the experiment.

evaluator.flushBeforeScoring

boolean

If true, flushes spans before calling scoring functions.

evaluator.gitMetadataSettings

Object

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

evaluator.isPublic

boolean

Whether the experiment should be public. Defaults to false.

evaluator.maxConcurrency

number

The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.

evaluator.metadata

Record

Optional additional metadata for the experiment.

evaluator.parameters

Parameters | RemoteEvalParameters | Promise

A set of parameters that will be passed to the evaluator. Can be:

A raw EvalParameters schema (Zod schemas)
A Parameters instance from loadParameters()
A Promise<Parameters> from loadParameters()

evaluator.projectId

string

If specified, uses the given project ID instead of the evaluator’s name to identify the project.

evaluator.repoInfo

null | Object

Optionally explicitly specify the git metadata for this experiment. This takes precedence over gitMetadataSettings if specified.

evaluator.scores

EvalScorer[]

A set of functions that take an input, output, and expected value and return a Score. At least one of scores or classifiers must be provided.

evaluator.signal

AbortSignal

An abort signal that can be used to stop the evaluation.

evaluator.state

BraintrustState

If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back to the global state (initialized using your API key).

evaluator.summarizeScores

boolean

Whether to summarize the scores of the experiment after it has run. Defaults to true.

evaluator.tags

string[]

Optional tags for the experiment.

evaluator.task

EvalTask

required

A function that takes an input and returns an output.

evaluator.timeout

number

The duration, in milliseconds, after which to time out the evaluation. Defaults to undefined, in which case there is no timeout.

evaluator.trialCount

number

evaluator.update

boolean

Whether to update an existing experiment with the same experimentName if one exists. Defaults to false.

failingResults

EvalResult[]

__namedParameters

ReporterOpts

runEvaluator

runEvaluator function

experiment

null | Experiment

evaluator

EvaluatorDef

evaluator.evalName

string

required

evaluator.projectName

string

required

evaluator.baseExperimentId

string

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over baseExperimentName if specified.

evaluator.baseExperimentName

string

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

evaluator.classifiers

EvalClassifier[]

evaluator.data

EvalData

required

A function that returns a list of inputs, expected outputs, and metadata.

evaluator.description

string

An optional description for the experiment.

evaluator.errorScoreHandler

ErrorScoreHandler

evaluator.experimentName

string

An optional name for the experiment.

evaluator.flushBeforeScoring

boolean

Flushes spans before calling scoring functions.

evaluator.gitMetadataSettings

Object

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

evaluator.isPublic

boolean

Whether the experiment should be public. Defaults to false.

evaluator.maxConcurrency

number

The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.

evaluator.metadata

Record

Optional additional metadata for the experiment.

evaluator.parameters

Parameters | RemoteEvalParameters | Promise

A set of parameters that will be passed to the evaluator. Can be:

A raw EvalParameters schema (Zod schemas)
A Parameters instance from loadParameters()
A Promise<Parameters> from loadParameters()

evaluator.projectId

string

If specified, uses the given project ID instead of the evaluator’s name to identify the project.

evaluator.repoInfo

null | Object

Optionally explicitly specify the git metadata for this experiment. This takes precedence over gitMetadataSettings if specified.

evaluator.scores

EvalScorer[]

A set of functions that take an input, output, and expected value and return a Score. At least one of scores or classifiers must be provided.

evaluator.signal

AbortSignal

An abort signal that can be used to stop the evaluation.

evaluator.state

BraintrustState

If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back to the global state (initialized using your API key).

evaluator.summarizeScores

boolean

Whether to summarize the scores of the experiment after it has run. Defaults to true.

evaluator.tags

string[]

Optional tags for the experiment.

evaluator.task

EvalTask

required

A function that takes an input and returns an output.

evaluator.timeout

number

The duration, in milliseconds, after which to time out the evaluation. Defaults to undefined, in which case there is no timeout.

evaluator.trialCount

number

evaluator.update

boolean

Whether to update an existing experiment with the same experimentName if one exists. Defaults to false.

progressReporter

ProgressReporter

filters

Filter[]

stream

undefined | Object

parameters

InferParameters

collectResults

boolean

enableCache

boolean

setFetch

Set the fetch implementation to use for requests. You can specify it here, or when you call login.

fetch

Object

The fetch implementation to use.
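One reason to supply a custom fetch is to decorate every outgoing request. The sketch below wraps a fetch-like function so each call carries an extra header. `FetchLike` and `withExtraHeader` are hypothetical names, and `FetchLike` is a deliberately simplified signature rather than the SDK's or the platform's fetch type.

```typescript
// Simplified fetch-compatible signature used for this sketch (an assumption, not the SDK's type).
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<unknown>;

// Wraps a fetch implementation so that every request carries one extra header.
// Existing headers on the request are preserved.
function withExtraHeader(baseFetch: FetchLike, name: string, value: string): FetchLike {
  return (url, init = {}) =>
    baseFetch(url, {
      ...init,
      headers: { ...(init.headers ?? {}), [name]: value },
    });
}
```

In real code the wrapper would be typed against the platform's `fetch` so that it can be handed to `setFetch`.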

setMaskingFunction

Set a global masking function that will be applied to all logged data before sending to Braintrust. The masking function will be applied after records are merged but before they are sent to the backend.

maskingFunction

null | Object

A function that takes a JSON-serializable object and returns a masked version. Set to null to disable masking.
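A minimal sketch of a masking function of the kind described above. `SENSITIVE_KEYS` and `maskSensitive` are hypothetical names, and the key list is only an example. The function copies the value rather than mutating it.

```typescript
// Keys treated as sensitive in this sketch (hypothetical list; adjust for your data).
const SENSITIVE_KEYS = new Set(["password", "apiKey", "ssn"]);

// Recursively copies a JSON-serializable value, replacing the values of
// sensitive keys with a fixed placeholder. The input is not mutated.
function maskSensitive(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => maskSensitive(item));
  }
  if (value !== null && typeof value === "object") {
    const masked: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      masked[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : maskSensitive(inner);
    }
    return masked;
  }
  return value;
}
```

A function like this could be registered with `setMaskingFunction(maskSensitive)` and removed again with `setMaskingFunction(null)`.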

spanComponentsToObjectId

spanComponentsToObjectId function

__namedParameters

Object

__namedParameters.components

SpanComponentsV3

required

__namedParameters.state

BraintrustState

startSpan

Lower-level alternative to traced. This allows you to start a span yourself, and can be useful in situations where you cannot use callbacks. However, spans started with startSpan will not be marked as the “current span”, so currentSpan() and traced() will be no-ops. If you want to mark a span as current, use traced instead. See traced for full details.

args

unknown

args.event

StartSpanEventArgs

args.name

string

args.parent

string

args.parentSpanIds

ParentSpanIds | MultiParentSpanIds

args.propagatedEvent

StartSpanEventArgs

args.spanAttributes

Record

args.spanId

string

args.startTime

number

args.type

SpanType

summarize

Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.

options

Object

Options for summarizing the experiment.

options.comparisonExperimentId

string

The experiment to compare against. If unspecified, the most recent experiment on the origin’s main branch will be used.

options.summarizeScores

boolean

Whether to summarize the scores. If false, only the metadata will be returned.

traceable

A synonym for wrapTraced. If you’re porting from systems that use traceable, you can use this to make your codebase more consistent.

args

unknown

args.event

StartSpanEventArgs

args.name

string

args.parent

string

args.parentSpanIds

ParentSpanIds | MultiParentSpanIds

args.propagatedEvent

StartSpanEventArgs

args.spanAttributes

Record

args.spanId

string

args.startTime

number

args.type

SpanType

args.setCurrent

boolean

traced

Top-level function for starting a span. It checks the following (in precedence order):

Currently-active span
Currently-active experiment
Currently-active logger

and creates a span under the first one that is active. Alternatively, if parent is specified, it creates a span under the specified parent row. If none of these are active, it returns a no-op span object. See Span.traced for full details.
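The lookup order above can be sketched as a small pure function. This is an illustration of the documented precedence only, not the SDK's implementation: `ActiveContext`, `SpanTarget`, and `resolveSpanTarget` are hypothetical, and the sketch assumes an explicitly supplied parent takes priority over anything that is currently active.

```typescript
// Hypothetical stand-ins for the things a span can be created under.
interface ActiveContext {
  span?: string;
  experiment?: string;
  logger?: string;
}

type SpanTarget =
  | { kind: "parent" | "span" | "experiment" | "logger"; id: string }
  | { kind: "noop" };

// Assumed order: an explicit parent first, then the active span, the active
// experiment, and the active logger. With nothing active, the result is a no-op.
function resolveSpanTarget(ctx: ActiveContext, parent?: string): SpanTarget {
  if (parent !== undefined) return { kind: "parent", id: parent };
  if (ctx.span !== undefined) return { kind: "span", id: ctx.span };
  if (ctx.experiment !== undefined) return { kind: "experiment", id: ctx.experiment };
  if (ctx.logger !== undefined) return { kind: "logger", id: ctx.logger };
  return { kind: "noop" };
}
```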

callback

Object

args

unknown

args.event

StartSpanEventArgs

args.name

string

args.parent

string

args.parentSpanIds

ParentSpanIds | MultiParentSpanIds

args.propagatedEvent

StartSpanEventArgs

args.spanAttributes

Record

args.spanId

string

args.startTime

number

args.type

SpanType

args.setCurrent

boolean

updateSpan

Update a span using the output of span.export(). It is important that you only resume updating a span once the original span has been fully written and flushed, since otherwise updates to the span may conflict with the original span.

__namedParameters

unknown

__namedParameters.exported

string

required

uploadLogs3OverflowPayload

Upload a logs3 overflow payload to the signed URL. This is a standalone function that can be used by both SDK and app code.

upload

Object

The overflow upload metadata from the API

upload.fields

Record

upload.headers

Record

upload.key

string

required

upload.method

"POST" | "PUT"

required

upload.signedUrl

string

required

payload

string

The JSON payload string to upload

fetchFn

Object

Optional custom fetch function (defaults to global fetch)

utf8ByteLength

utf8ByteLength function

value

string
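The entry above has no description, so the following is only a sketch of what a UTF-8 byte-length helper typically computes, not the SDK's implementation. `countUtf8Bytes` is a hypothetical name. The point it shows is that byte length differs from `string.length` for non-ASCII text.

```typescript
// One common way to count UTF-8 bytes; not necessarily how the SDK implements it.
function countUtf8Bytes(value: string): number {
  return new TextEncoder().encode(value).length;
}

// "abc" is 3 bytes, "é" is 2 bytes, "€" is 3 bytes, and "😀" is 4 bytes,
// even though "😀".length is 2 (two UTF-16 code units).
```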

withCurrent

Runs the provided callback with the span as the current span.

span

Span

span.id

string

required

Row ID of the span.

span.kind

"span"

required

span.rootSpanId

string

required

Root span ID of the span.

span.spanId

string

required

Span ID of the span.

span.spanParents

string[]

required

Parent span IDs of the span.

callback

Object

state

undefined | BraintrustState

withDataset

withDataset function

project

string

callback

Object

options

Readonly

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.dataset

string

options.description

string

options.metadata

Record

options.projectId

string

options.state

BraintrustState

options.version

string

withExperiment

withExperiment function

project

string

callback

Object

options

Readonly

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.baseExperiment

string

options.baseExperimentId

string

options.dataset

DatasetRef | AnyDataset

options.description

string

options.experiment

string

options.gitMetadataSettings

Object

options.isPublic

boolean

options.metadata

Record

options.parameters

RemoteEvalParameters | ParametersRef

options.projectId

string

options.repoInfo

null | Object

options.setCurrent

boolean

options.state

BraintrustState

options.tags

string[]

options.update

boolean

withLogger

withLogger function

callback

Object

options

Readonly

options.apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

options.appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

options.debugLogLevel

false | DebugLogLevel

options.disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

options.fetch

Object

A custom fetch implementation to use.

options.noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

options.onFlushError

Object

Calls this function if there’s an error in the background flusher.

options.orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.

options.orgProjectMetadata

OrgProjectMetadata

options.projectId

string

options.projectName

string

options.setCurrent

boolean

options.state

BraintrustState

withParent

withParent function

parent

string

callback

Object

state

undefined | BraintrustState

wrapAgentClass

wrapAgentClass function

AgentClass

any

options

WrapAISDKOptions

wrapAISDK

Wraps Vercel AI SDK methods with Braintrust tracing.

aiSDK

options

WrapAISDKOptions

wrapAISDKModel

Wrap an ai-sdk model (created with .chat(), .completion(), etc.) to add tracing. If Braintrust is not configured, this is a no-op.

model

wrapAnthropic

Wrap an Anthropic object (created with new Anthropic(...)) so calls emit tracing-channel events that Braintrust plugins can consume. Currently, this only supports the v4 API.

anthropic

wrapClaudeAgentSDK

Wraps the Claude Agent SDK with Braintrust tracing. Query calls only publish tracing-channel events; the Claude Agent SDK plugin owns all span lifecycle work, including root/task spans, LLM spans, tool spans, and sub-agent spans.

sdk

The Claude Agent SDK module

wrapCohere

Wrap a Cohere client so method calls emit diagnostics-channel events that Braintrust plugins can consume.

cohere

wrapGoogleADK

Wrap a Google ADK module (imported with import * as adk from '@google/adk') to add tracing. If Braintrust is not configured, nothing will be traced. This wraps:

Runner.runAsync / InMemoryRunner.runAsync — top-level agent execution
BaseAgent.runAsync (and all subclasses) — individual agent invocations
BaseTool.runAsync / FunctionTool.runAsync — tool execution

LLM calls are already traced via the existing @google/genai instrumentation, since ADK uses GenAI internally.

adkModule

The Google ADK module

wrapGoogleGenAI

Wrap a Google GenAI module (imported with import * as googleGenAI from '@google/genai') to add tracing. If Braintrust is not configured, nothing will be traced.

googleGenAI

The Google GenAI module

wrapHuggingFace

Wrap a HuggingFace Inference SDK module or client with Braintrust tracing. Supports the LLM and embeddings APIs we intentionally instrument:

chatCompletion
chatCompletionStream
textGeneration
textGenerationStream
featureExtraction

huggingFace

HuggingFaceModule

wrapMastraAgent

wrapMastraAgent function

agent

_options

Object

_options.name

string

_options.span_name

string

wrapMistral

Wrap a Mistral client (created with new Mistral(...)) with Braintrust tracing.

mistral

wrapOpenAI

Wrap an OpenAI object (created with new OpenAI(...)) to add tracing. If Braintrust is not configured, nothing will be traced. If this is not an OpenAI object, this function is a no-op. Currently, this supports the v4, v5, and v6 API.

openai

wrapOpenAIv4

wrapOpenAIv4 function

openai

wrapOpenRouter

Wrap an OpenRouter client (created with new OpenRouter(...)) so calls emit diagnostics-channel events that Braintrust plugins can consume.

openrouter

wrapOpenRouterAgent

Wrap an @openrouter/agent OpenRouter client so callModel() emits diagnostics-channel events consumed by the OpenRouter Agent plugin.

agent

wrapTraced

Wrap a function with traced, using the arguments as input and return value as output. Any functions wrapped this way will automatically be traced, similar to the @traced decorator in Python. If you want to correctly propagate the function’s name and define it in one go, then you can do so like this:

import { wrapTraced } from "braintrust";
import OpenAI from "openai";

// Assumes OPENAI_API_KEY is set in the environment.
const client = new OpenAI();

const myFunc = wrapTraced(
  async function myFunc(input: string) {
    const result = await client.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: input }],
    });
    return result.choices[0].message.content ?? "unknown";
  },
  // Optional: if you're using a framework like Next.js that minifies your code,
  // specify the function name and it will be used for the span name.
  { name: "myFunc" },
);

Now, any calls to myFunc will be traced, and the input and output will be logged automatically. If tracing is inactive, i.e. there is no active logger or experiment, it’s just a no-op.

The function to wrap.

args

unknown

Span-level arguments (e.g. a custom name or type) to pass to traced.

args.event

StartSpanEventArgs

args.name

string

args.parent

string

args.parentSpanIds

ParentSpanIds | MultiParentSpanIds

args.propagatedEvent

StartSpanEventArgs

args.spanAttributes

Record

args.spanId

string

args.startTime

number

args.type

SpanType

args.setCurrent

boolean

wrapVitest

Wraps Vitest methods with Braintrust experiment tracking. This automatically creates datasets and experiments from your Vitest tests, tracking pass/fail rates and evaluation metrics. Experiments are automatically flushed after all tests complete.

vitestMethods

VitestMethods

Object containing Vitest methods (test, describe, expect, etc.)

config

WrapperConfig

Optional configuration object

Classes

Attachment

Represents an attachment to be uploaded and the associated metadata. Attachment objects can be inserted anywhere in an event, allowing you to log arbitrary file data. The SDK will asynchronously upload the file to object storage and replace the Attachment object with an AttachmentReference.

Properties

reference

Object

The object that replaces this Attachment at upload time.

Methods data(), debugInfo(), upload()

BaseAttachment

BaseAttachment class

Properties

reference

Object | Object

Methods data(), debugInfo(), upload()

BraintrustState

BraintrustState class

Properties

apiUrl

null | string

appPublicUrl

null | string

appUrl

null | string

currentExperiment

undefined | Experiment

currentLogger

undefined | Logger

currentParent

IsoAsyncLocalStorage

currentSpan

IsoAsyncLocalStorage

debugLogLevel

DebugLogLevel

fetch

Object

gitMetadataSettings

Object

string

loggedIn

boolean

orgId

null | string

orgName

null | string

parametersCache

ParametersCache

promptCache

PromptCache

proxyUrl

null | string

spanCache

SpanCache

contextManager

unknown

idGenerator

unknown

Methods [RESET_CONTEXT_MANAGER_STATE](), apiConn(), appConn(), bgLogger(), copyLoginInfo(), disable(), enforceQueueSizeLimit(), flushOtel(), getDebugLogLevel(), hasDebugLogLevelOverride(), httpLogger(), login(), loginReplaceApiConn(), proxyConn(), registerOtelFlush(), resetIdGenState(), resetLoginInfo(), serialize(), setDebugLogLevel(), setFetch(), setMaskingFunction(), setOverrideBgLogger(), toJSON(), toString(), deserialize()

BraintrustStream

A Braintrust stream. This is a wrapper around a ReadableStream of BraintrustStreamChunk, with some utility methods to make them easy to log and convert into various formats.

Methods [asyncIterator](), copy(), finalValue(), toReadableStream(), parseRawEvent(), serializeRawEvent()

CachedSpanFetcher

Cached span fetcher that handles fetching and caching spans by type. Caching strategy:

Cache spans by span type (Map<spanType, SpanData[]>)
Track if all spans have been fetched (allFetched flag)
When filtering by spanType, only fetch types not already in cache

Methods getSpans()
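The caching strategy listed above can be sketched as follows. This is a simplified, synchronous illustration rather than the SDK's class: `SpanTypeCache`, `SpanData`, and `FetchSpans` are hypothetical, and the real fetcher is presumably asynchronous.

```typescript
// Hypothetical span record; real span data carries many more fields.
interface SpanData {
  id: string;
  type: string;
}

// Hypothetical fetch callback: returns spans of the given types, or all spans when omitted.
type FetchSpans = (types?: string[]) => SpanData[];

class SpanTypeCache {
  private byType = new Map<string, SpanData[]>();
  private allFetched = false;
  private fetchSpans: FetchSpans;

  constructor(fetchSpans: FetchSpans) {
    this.fetchSpans = fetchSpans;
  }

  getSpans(types?: string[]): SpanData[] {
    if (types === undefined) {
      if (!this.allFetched) {
        // Fetch everything once and rebuild the per-type cache from the full result.
        this.byType.clear();
        for (const span of this.fetchSpans()) this.store(span);
        this.allFetched = true;
      }
      const everything: SpanData[] = [];
      this.byType.forEach((spans) => everything.push(...spans));
      return everything;
    }
    if (!this.allFetched) {
      // Only request the types that are not cached yet.
      const missing = types.filter((t) => !this.byType.has(t));
      if (missing.length > 0) {
        for (const t of missing) this.byType.set(t, []);
        for (const span of this.fetchSpans(missing)) this.store(span);
      }
    }
    const selected: SpanData[] = [];
    for (const t of types) selected.push(...(this.byType.get(t) ?? []));
    return selected;
  }

  private store(span: SpanData): void {
    const bucket = this.byType.get(span.type) ?? [];
    bucket.push(span);
    this.byType.set(span.type, bucket);
  }
}
```

Once everything has been fetched, filtered requests are served from the cache without another fetch, including requests for types that have no spans.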

CodeFunction

CodeFunction class

Properties

description

string

handler

ifExists

"error" | "ignore" | "replace"

metadata

Record

name

string

parameters

ZodType

project

Project

returns

ZodType

slug

string

tags

string[]

type

Methods key()

CodePrompt

CodePrompt class

Properties

description

string

environmentSlugs

string[]

functionType

string

ifExists

"error" | "ignore" | "replace"

metadata

Record

name

string

project

Project

prompt

Object

slug

string

tags

string[]

toolFunctions

Object | Object | GenericCodeFunction[]

Methods toFunctionDefinition()

ContextManager

ContextManager class

Methods getCurrentSpan(), getParentSpanIds(), runInContext(), wrapSpanForStore()

Dataset

A dataset is a collection of records, such as model inputs and expected outputs, which represent data you can use to evaluate and fine-tune models. You can log production data to datasets, curate them with interesting examples, edit/delete records, and run evaluations against them. You should not create Dataset objects directly. Instead, use the braintrust.initDataset() method.

Properties

unknown

loggingState

unknown

name

unknown

project

unknown

Methods [asyncIterator](), clearCache(), close(), delete(), fetch(), fetchedData(), flush(), getState(), insert(), summarize(), update(), version(), isDataset()

EvalResultWithSummary

EvalResultWithSummary class

Properties

results

EvalResult[]

summary

ExperimentSummary

Methods toJSON(), toString()

Experiment

An experiment is a collection of logged events, such as model inputs and outputs, which represent a snapshot of your application at a particular point in time. An experiment is meant to capture more than just the model you use, and includes the data you use to test, pre- and post-processing code, comparison metrics (scores), and any other metadata you want to include. Experiments are associated with a project, and two experiments are meant to be easily comparable via their inputs. You can change the attributes of the experiments in a project (e.g. scoring functions) over time, simply by changing what you log. You should not create Experiment objects directly. Instead, use the braintrust.init() method.

Properties

dataset

AnyDataset

kind

"experiment"

unknown

loggingState

unknown

name

unknown

project

unknown

Methods [asyncIterator](), clearCache(), close(), export(), fetch(), fetchBaseExperiment(), fetchedData(), flush(), getState(), log(), logFeedback(), startSpan(), summarize(), traced(), updateSpan(), version()

ExternalAttachment

Represents an attachment that resides in an external object store and the associated metadata. ExternalAttachment objects can be inserted anywhere in an event, similar to Attachment objects, but they reference files that already exist in an external object store rather than requiring upload. The SDK will replace the ExternalAttachment object with an AttachmentReference during logging.

Properties

reference

Object

The object that replaces this ExternalAttachment at upload time.

Methods data(), debugInfo(), upload()

FailedHTTPResponse

FailedHTTPResponse class

Properties

data

string

status

number

text

string

IDGenerator

Abstract base class for ID generators.

Methods getSpanId(), getTraceId(), shareRootSpanId()

JSONAttachment

Represents a JSON object that should be stored as an attachment. JSONAttachment is a convenience function that creates an Attachment from JSON data. It’s particularly useful for large JSON objects that would otherwise bloat the trace size. The JSON data is automatically serialized and stored as an attachment with content type “application/json”.

Properties

reference

Object

The object that replaces this Attachment at upload time.

Methods data(), debugInfo(), upload()

LazyValue

LazyValue class

Properties

hasSucceeded

unknown

Methods get(), getSync()

Logger

Logger class

Properties

kind

"logger"

asyncFlush

unknown

loggingState

unknown

org_id

unknown

project

unknown

Methods export(), flush(), log(), logFeedback(), startSpan(), traced(), updateSpan()

LoginInvalidOrgError

LoginInvalidOrgError class

Properties

message

string

NoopSpan

A fake implementation of the Span API which does nothing. This can be used as the default span.

Properties

id

string

Row ID of the span.

kind

"span"

rootSpanId

string

Root span ID of the span.

spanId

string

Span ID of the span.

spanParents

string[]

Parent span IDs of the span.

Methods close(), end(), export(), flush(), getParentInfo(), link(), log(), logFeedback(), permalink(), setAttributes(), startSpan(), startSpanWithParents(), state(), toString(), traced()

ObjectFetcher

ObjectFetcher class

Properties

unknown

Methods [asyncIterator](), clearCache(), fetch(), fetchedData(), getState(), version()

Project

Project class

Properties

string

name

string

parameters

ParametersBuilder

prompts

PromptBuilder

scorers

ScorerBuilder

tools

ToolBuilder

Methods addCodeFunction(), addParameters(), addPrompt(), publish()

ProjectNameIdMap

ProjectNameIdMap class

Methods getId(), getName(), resolve()

Prompt

Prompt class

Properties

unknown

name

unknown

options

unknown

projectId

unknown

prompt

unknown

promptData

unknown

slug

unknown

templateFormat

unknown

version

unknown

Methods build(), buildWithAttachments(), fromPromptData(), isPrompt(), renderPrompt()

PromptBuilder

PromptBuilder class

Methods create()

ReadonlyAttachment

A readonly alternative to Attachment, which can be used for fetching already-uploaded Attachments.

Properties

reference

Object | Object

Attachment metadata.

Methods asBase64Url(), data(), metadata(), status()

ReadonlyExperiment

A read-only view of an experiment, initialized by passing open: true to init().

Properties

unknown

loggingState

unknown

name

unknown

Methods [asyncIterator](), asDataset(), clearCache(), fetch(), fetchedData(), getState(), version()

ScorerBuilder

ScorerBuilder class

Methods create()

SpanFetcher

Fetcher for spans by root_span_id, using the ObjectFetcher pattern. Handles pagination automatically via cursor-based iteration.

Properties

unknown

Methods [asyncIterator](), clearCache(), fetch(), fetchedData(), getState(), version()

SpanImpl

Primary implementation of the Span interface. See Span for full details on each method. We suggest using one of the various traced methods, instead of creating Spans directly. See Span.startSpan for full details.

Properties

kind

"span"

id

unknown

Row ID of the span.

rootSpanId

unknown

Root span ID of the span.

spanId

unknown

Span ID of the span.

spanParents

unknown

Parent span IDs of the span.

Methods close(), end(), export(), flush(), getParentInfo(), link(), log(), logFeedback(), permalink(), setAttributes(), setSpanParents(), startSpan(), startSpanWithParents(), state(), toString(), traced()

TestBackgroundLogger

TestBackgroundLogger class

Methods drain(), flush(), flushBackpressureBytes(), log(), pendingFlushBytes(), setMaskingFunction()

ToolBuilder

ToolBuilder class

Methods create()

UUIDGenerator

ID generator that uses UUID4 for both span and trace IDs.

Methods getSpanId(), getTraceId(), shareRootSpanId()

Interfaces

AttachmentParams

AttachmentParams interface

Properties

contentType

string

data

string | ArrayBuffer | Blob

filename

string

state

BraintrustState

BackgroundLoggerOpts

BackgroundLoggerOpts interface

Properties

noExitFlush

boolean

onFlushError

Object

ContextParentSpanIds

ContextParentSpanIds interface

Properties

rootSpanId

string

spanParents

string[]

DatasetSummary

Summary of a dataset’s scores and metadata.

Properties

dataSummary

undefined | DataSummary

Summary of the dataset’s data.

datasetName

string

Name of the dataset.

datasetUrl

string

URL to the dataset’s page in the Braintrust app.

projectName

string

Name of the project that the dataset belongs to.

projectUrl

string

URL to the project’s page in the Braintrust app.

DataSummary

Summary of a dataset’s data.

Properties

newRecords

number

New or updated records added in this session.

totalRecords

number

Total records in the dataset.

EvalHooks

EvalHooks interface

Properties

expected

Expected

The expected output for the current evaluation.

tags

undefined | string[]

The tags for the current evaluation.

trialIndex

number

The index of the current trial (0-based). This is useful when trialCount > 1.
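As a sketch of how a task might use these fields, the example below derives a per-trial seed from `trialIndex`. `HooksLike` is a hypothetical subset limited to the properties listed above, and the example assumes the task receives the hooks object as its second argument.

```typescript
// Hypothetical subset of the hooks object, limited to the fields listed above.
interface HooksLike<Expected> {
  expected: Expected;
  tags?: string[];
  trialIndex: number;
}

// A task that varies its behavior per trial, for example to derive a distinct
// seed for each repetition of the same input.
function seededTask(input: string, hooks: HooksLike<string>): string {
  const seed = hooks.trialIndex + 1; // trialIndex is 0-based
  return `${input} (seed ${seed})`;
}
```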

Evaluator

Defines an evaluator. At least one of scores or classifiers must be provided.

Properties

baseExperimentId

string

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment. This takes precedence over baseExperimentName if specified.

baseExperimentName

string

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

classifiers

EvalClassifier[]

data

EvalData

A function that returns a list of inputs, expected outputs, and metadata.

description

string

An optional description for the experiment.

errorScoreHandler

ErrorScoreHandler

experimentName

string

An optional name for the experiment.

flushBeforeScoring

boolean

Flushes spans before calling scoring functions.

gitMetadataSettings

Object

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

isPublic

boolean

Whether the experiment should be public. Defaults to false.

maxConcurrency

number

The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.

metadata

Record

Optional additional metadata for the experiment.

parameters

Parameters | RemoteEvalParameters | Promise

A set of parameters that will be passed to the evaluator. Can be:

A raw EvalParameters schema (Zod schemas)
A Parameters instance from loadParameters()
A Promise<Parameters> from loadParameters()

projectId

string

If specified, uses the given project ID instead of the evaluator’s name to identify the project.

repoInfo

null | Object

Optionally explicitly specify the git metadata for this experiment. This takes precedence over gitMetadataSettings if specified.

scores

EvalScorer[]

A set of functions that take an input, output, and expected value and return a Score. At least one of scores or classifiers must be provided.

signal

AbortSignal

An abort signal that can be used to stop the evaluation.

state

BraintrustState

If specified, uses the logger state to initialize Braintrust objects. If unspecified, falls back to the global state (initialized using your API key).

summarizeScores

boolean

Whether to summarize the scores of the experiment after it has run. Defaults to true.

tags

string[]

Optional tags for the experiment.

task

EvalTask

A function that takes an input and returns an output.

timeout

number

The duration, in milliseconds, after which to time out the evaluation. Defaults to undefined, in which case there is no timeout.

trialCount

number

update

boolean

Whether to update an existing experiment with the same experimentName if one exists. Defaults to false.

ExperimentSummary

Summary of an experiment’s scores and metadata.

Properties

comparisonExperimentName

string

The experiment that scores are baselined against.

experimentId

string

ID of the experiment. May be undefined if the eval was run locally.

experimentName

string

Name of the experiment.

experimentUrl

string

URL to the experiment’s page in the Braintrust app.

metrics

Record

projectId

string

projectName

string

Name of the project that the experiment belongs to.

projectUrl

string

URL to the project’s page in the Braintrust app.

scores

Record

Summary of the experiment’s scores.

Exportable

Exportable interface

ExternalAttachmentParams

ExternalAttachmentParams interface

Properties

contentType

string

filename

string

state

BraintrustState

url

string

FunctionEvent

FunctionEvent interface

Properties

description

string

environments

Object[]

function_data

function_type

if_exists

"error" | "ignore" | "replace"

metadata

Record

name

string

project_id

string

prompt_data

Object

slug

string

tags

string[]

GetThreadOptions

Options for getThread().

Properties

preprocessor

string

The preprocessor to use for extracting the thread. If not specified, uses the project default preprocessor, falling back to the global “thread” preprocessor.

InstrumentationConfig

InstrumentationConfig interface

Properties

integrations

Object

Configuration for individual SDK integrations. Set to false to disable instrumentation for that SDK.

InvokeFunctionArgs

Arguments for the invoke function.

Properties

functionType

The type of the global function to invoke. If unspecified, defaults to ‘scorer’ for backward compatibility.

function_id

string

The ID of the function to invoke.

globalFunction

string

The name of the global function to invoke.

input

Input

The input to the function. This will be logged as the input field in the span.

messages

Additional OpenAI-style messages to add to the prompt (only works for llm functions).

metadata

Record

mode

null | "text" | "auto" | "parallel" | "json"

parent

string | Exportable

projectId

string

The ID of the project to use for execution context (API keys, project defaults, etc.). This is not the project the function belongs to, but the project context for the invocation.

projectName

string

The name of the project containing the function to invoke.

promptSessionFunctionId

string

The ID of the function in the prompt session to invoke.

promptSessionId

string

The ID of the prompt session to invoke the function from.

schema

unknown

A Zod schema to validate the output of the function and return a typed value. This is only used if stream is false.

slug

string

The slug of the function to invoke.

state

BraintrustState

(Advanced) This parameter allows you to pass in a custom login state. This is useful for multi-tenant environments where you are running functions from different Braintrust organizations.

stream

Stream

Whether to stream the function’s output. If true, the function will return a BraintrustStream, otherwise it will return the output of the function as a JSON object.

strict

boolean

Whether to use strict mode for the function. If true, the function will throw an error if the variable names in the prompt do not match the input keys.

tags

string[]

Tags to add to the span. This will be logged as the tags field in the span.

version

string

The version of the function to invoke.
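The strict option above describes a check between the variable names in a prompt and the keys of the input. The sketch below shows one way such a check could work. It is not the SDK's implementation: it assumes simple mustache-style `{{name}}` placeholders, and both `templateVariables` and `checkStrict` are hypothetical helpers.

```typescript
// Hypothetical helper: collect distinct {{name}} placeholders from a template.
// Assumes simple mustache-style variables with no sections or nested paths.
function templateVariables(template: string): string[] {
  const names: string[] = [];
  const pattern = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    if (names.indexOf(match[1]) === -1) {
      names.push(match[1]);
    }
  }
  return names;
}

// Hypothetical strict check: throw if a prompt variable has no matching input key.
function checkStrict(template: string, input: Record<string, unknown>): void {
  const missing = templateVariables(template).filter((name) => !(name in input));
  if (missing.length > 0) {
    throw new Error(`Missing input keys for prompt variables: ${missing.join(", ")}`);
  }
}
```

Failing early like this surfaces a misspelled input key as an error instead of silently rendering an empty value into the prompt.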

LoginOptions

Options for logging in to Braintrust.

Properties

apiKey

string

The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable.

appUrl

string

The URL of the Braintrust App. Defaults to https://www.braintrust.dev. You should not need to change this unless you are doing the “Full” deployment.

debugLogLevel

false | DebugLogLevel

disableSpanCache

boolean

If true, disables the local span cache used to optimize scorer access to trace data. When disabled, scorers will always fetch spans from the server. Defaults to false.

fetch

Object

A custom fetch implementation to use.

noExitFlush

boolean

By default, the SDK installs an event handler that flushes pending writes on the beforeExit event. If true, this event handler will not be installed.

onFlushError

Object

A function that is called if there’s an error in the background flusher.

orgName

string

The name of a specific organization to connect to. Since API keys are scoped to organizations, this parameter is usually unnecessary unless you are logging in with a JWT.
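The fallback rules described for apiKey and appUrl can be sketched as below. This only illustrates the documented defaults and is not how the SDK resolves them internally. `LoginOptionsSketch` and `resolveLogin` are hypothetical names, and the environment is passed in as a plain object so the snippet does not depend on any runtime globals.

```typescript
// Local mirror of two documented options (hypothetical name, not an SDK type).
interface LoginOptionsSketch {
  apiKey?: string;
  appUrl?: string;
}

// Hypothetical helper applying the documented fallbacks:
// apiKey falls back to BRAINTRUST_API_KEY, appUrl to https://www.braintrust.dev.
function resolveLogin(
  options: LoginOptionsSketch,
  env: Record<string, string | undefined>,
): { apiKey: string; appUrl: string } {
  const apiKey = options.apiKey ?? env["BRAINTRUST_API_KEY"];
  if (!apiKey) {
    throw new Error("No API key: pass apiKey or set BRAINTRUST_API_KEY");
  }
  return {
    apiKey,
    appUrl: options.appUrl ?? "https://www.braintrust.dev",
  };
}
```

In a Node.js program the second argument would typically be `process.env`.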

LogOptions

LogOptions interface

Properties

asyncFlush

IsAsyncFlush

computeMetadataArgs

Record

linkArgs

LinkArgs

MetricSummary

Summary of a metric’s performance.

Properties

diff

number

Difference in metric between the current and reference experiment.

improvements

number

Number of improvements in the metric.

metric

number

Average metric across all examples.

name

string

Name of the metric.

regressions

number

Number of regressions in the metric.

unit

string

Unit label for the metric.

ObjectMetadata

ObjectMetadata interface

Properties

fullInfo

Record

id

string

name

string

ParentExperimentIds

ParentExperimentIds interface

Properties

experiment_id

string

ParentProjectLogIds

ParentProjectLogIds interface

Properties

log_id

"g"

project_id

string

ReporterBody

ReporterBody interface

ScoreSummary

Summary of a score’s performance.

Properties

diff

number

Difference in score between the current and reference experiment.

improvements

number

Number of improvements in the score.

name

string

Name of the score.

regressions

number

Number of regressions in the score.

score

number

Average score across all examples.

Span

A Span encapsulates logged data and metrics for a unit of work. This interface is shared by all span implementations. We suggest using one of the various traced methods instead of creating Spans directly. See Span.traced for full details.

Properties

id

string

Row ID of the span.

kind

"span"

rootSpanId

string

Root span ID of the span.

spanId

string

Span ID of the span.

spanParents

string[]

Parent span IDs of the span.

SpanData

Span data returned by getSpans().

Properties

input

unknown

metadata

Record

output

unknown

span_attributes

Object

span_id

string

span_parents

string[]
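Because each entry carries span_id and span_parents, a flat list of span data can be turned back into a parent/child index. The sketch below assumes that a span with no span_parents is a root. `SpanDataSketch` is a local mirror of the fields listed above and `indexSpans` is a hypothetical helper, not an SDK function.

```typescript
// Local mirror of the documented fields (assumption: span_parents may be absent).
interface SpanDataSketch {
  span_id: string;
  span_parents?: string[];
  input?: unknown;
  output?: unknown;
}

// Hypothetical helper: list root span IDs and group child span IDs by parent ID.
function indexSpans(spans: SpanDataSketch[]): {
  roots: string[];
  children: Record<string, string[]>;
} {
  const roots: string[] = [];
  const children: Record<string, string[]> = {};

  for (const span of spans) {
    const parents = span.span_parents ?? [];
    if (parents.length === 0) {
      // Assumption: no parents means this span starts a trace.
      roots.push(span.span_id);
      continue;
    }
    for (const parent of parents) {
      if (!children[parent]) {
        children[parent] = [];
      }
      children[parent].push(span.span_id);
    }
  }

  return { roots, children };
}
```

A scorer could walk this index from each root to inspect, for example, every model call made beneath a single task span.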

TemplateRenderer

TemplateRenderer interface

Properties

lint

Object

render

Object

TemplateRendererPlugin

A template renderer plugin that can be registered with Braintrust. Plugins provide support for different template engines (e.g., Nunjucks). They use a factory pattern where the plugin is registered once, then activated with specific configuration options when needed.

Properties

createRenderer

Object

Factory function that creates a renderer instance.

defaultOptions

unknown

Default configuration options for this plugin.

name

string

Unique identifier for this plugin. Must match the format string used in the templateFormat option.
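The register-once, activate-later factory pattern described above might look roughly like the following. This is a standalone sketch, not the SDK's registry: the signatures of `createRenderer` and `render` are not given on this page, so the ones used here are assumptions, and `PluginSketch`, `registerPlugin`, `activate`, and the toy "angle" format are all hypothetical.

```typescript
// Assumed shapes; the real createRenderer and render signatures may differ.
interface RendererSketch {
  render: (template: string, variables: Record<string, unknown>) => string;
}

interface PluginSketch {
  name: string;
  defaultOptions?: unknown;
  createRenderer: (options?: unknown) => RendererSketch;
}

// Hypothetical registry keyed by plugin name.
const plugins: Record<string, PluginSketch> = {};

// Register a plugin once, under its unique name.
function registerPlugin(plugin: PluginSketch): void {
  plugins[plugin.name] = plugin;
}

// Activate a plugin later, falling back to its default options.
function activate(name: string, options?: unknown): RendererSketch {
  const plugin = plugins[name];
  if (!plugin) {
    throw new Error(`No template plugin registered for format "${name}"`);
  }
  return plugin.createRenderer(options ?? plugin.defaultOptions);
}

// Toy plugin: replaces <key> placeholders, optionally upper-casing the values.
registerPlugin({
  name: "angle",
  defaultOptions: { upper: false },
  createRenderer: (options) => {
    const upper = (options as { upper?: boolean } | undefined)?.upper === true;
    return {
      render: (template, variables) =>
        template.replace(/<(\w+)>/g, (_match: string, key: string) => {
          const value = String(variables[key] ?? "");
          return upper ? value.toUpperCase() : value;
        }),
    };
  },
});
```

Keeping construction behind `createRenderer` means the same registered plugin can hand out differently configured renderers without being registered again.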

Trace

Interface for trace objects that can be used by scorers. Both the SDK’s LocalTrace class and the API wrapper’s WrapperTrace implement this.
