Tracing
Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation, which traces supported provider gems with no code changes (see Install and instrument). The APIs below initialize the SDK, control instrumentation, and let you flush and link to your traces.
Braintrust.init
Initializes SDK state, configures OpenTelemetry tracing, and optionally auto-instruments supported provider gems. Call it once on startup.
Arguments:
- api_key (String): Braintrust API key. Defaults to ENV["BRAINTRUST_API_KEY"].
- org_name (String): organization name to use during login. Defaults to ENV["BRAINTRUST_ORG_NAME"].
- default_project (String): project used for traced spans that do not set an explicit parent. Defaults to ENV["BRAINTRUST_DEFAULT_PROJECT"].
- app_url (String): Braintrust app URL. Defaults to ENV["BRAINTRUST_APP_URL"] or https://www.braintrust.dev.
- api_url (String): Braintrust API URL. Defaults to ENV["BRAINTRUST_API_URL"] or https://api.braintrust.dev.
- set_global (Boolean): sets the created state as the global SDK state. Defaults to true.
- blocking_login (Boolean): logs in synchronously instead of using the background login thread. Defaults to false.
- enable_tracing (Boolean): enables OpenTelemetry tracing setup. Defaults to true.
- tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider to use instead of creating or reusing the global provider.
- filter_ai_spans (Boolean): sends only AI-related spans when enabled. Defaults to ENV["BRAINTRUST_OTEL_FILTER_AI_SPANS"] == "true".
- span_filter_funcs (Array<Proc>): custom span filters.
- exporter (Object): optional OpenTelemetry exporter override.
- auto_instrument (Boolean or Hash): controls provider auto-instrumentation. Use false, true, {only: [...]}, or {except: [...]}. Defaults to the setting in ENV["BRAINTRUST_AUTO_INSTRUMENT"], and is enabled when that variable is unset.
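A typical startup call might look like the following minimal sketch. It assumes the gem is loaded with require "braintrust" and that BRAINTRUST_API_KEY is set in the environment; "my-app" is a placeholder project name.

```ruby
require "braintrust"

# Call once at startup, for example at the top of a script or in a Rails initializer.
# api_key: is omitted, so the key is read from ENV["BRAINTRUST_API_KEY"].
Braintrust.init(
  default_project: "my-app",                 # placeholder project name
  auto_instrument: { only: [:openai, :anthropic] } # instrument only these integrations
)
```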
Braintrust.auto_instrument!
Discovers loaded provider gems and instruments the ones Braintrust supports. Braintrust.init runs this for you when auto-instrumentation is enabled, so you rarely call it directly. Reach for it to instrument explicitly, such as after initializing with auto_instrument: false, or to re-run discovery once more provider gems have loaded.
Returns Array<Symbol> (the instrumented integrations), or nil when auto-instrumentation is disabled.
Arguments:
- config (nil, Boolean, or Hash): nil reads environment settings, false disables, true enables, and a Hash accepts :only or :except.
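To make the accepted config forms concrete, here is a pure-Ruby sketch of how each value could narrow the set of loaded integrations. The resolve_integrations helper and its environment handling are hypothetical illustrations of the documented semantics, not the SDK's implementation; the integration names are the ones listed under Braintrust.instrument!.

```ruby
# Hypothetical illustration of the config forms accepted by auto_instrument!.
# Not the SDK's code: it only shows how each form could select integrations
# from the ones whose provider gems are loaded.
def resolve_integrations(config, available, env = ENV)
  if config.nil?
    # nil: fall back to environment variables.
    return nil if env["BRAINTRUST_AUTO_INSTRUMENT"] == "false"

    parse = ->(list) { list.split(",").map { |name| name.strip.to_sym } }
    config = true
    if env["BRAINTRUST_INSTRUMENT_ONLY"]
      config = { only: parse.call(env["BRAINTRUST_INSTRUMENT_ONLY"]) }
    elsif env["BRAINTRUST_INSTRUMENT_EXCEPT"]
      config = { except: parse.call(env["BRAINTRUST_INSTRUMENT_EXCEPT"]) }
    end
  end

  case config
  when false then nil      # disabled
  when true then available # everything that was discovered
  when Hash
    if config[:only]
      available & config[:only]   # keep only the listed integrations
    elsif config[:except]
      available - config[:except] # drop the listed integrations
    else
      available
    end
  else
    raise ArgumentError, "unsupported config: #{config.inspect}"
  end
end

loaded = [:openai, :anthropic, :ruby_llm]
p resolve_integrations({ except: [:ruby_llm] }, loaded)
# prints [:openai, :anthropic]
p resolve_integrations(nil, loaded, { "BRAINTRUST_INSTRUMENT_ONLY" => "openai" })
# prints [:openai]
```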
Braintrust.instrument!
Instruments a specific integration by name. Use it to wrap a single provider, or to instrument one specific client instance.
Arguments:
- name (Symbol, required): integration name: :anthropic, :openai, :ruby_openai, or :ruby_llm.
- target (Object): provider client instance to instrument.
- tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider.
Braintrust::Trace.permalink
Builds a Braintrust UI URL for an OpenTelemetry span, so you can link straight to a trace from your own logs or app.
Returns a String (empty when the span lacks the Braintrust attributes needed to build a permalink).
Braintrust::Trace.flush_spans
Forces the global tracer provider to flush buffered spans. Call it before a short-lived process exits if you’ve disabled the automatic flush on exit.
Returns a Boolean.
Evaluations
An evaluation runs your task over a set of cases, scores each output, and logs the results to an experiment, which is how you measure quality and catch regressions as you change prompts or models. Braintrust::Eval.run is the main entry point. The other APIs here define the tasks, scorers, classifiers, and reusable evaluators it uses.
Braintrust::Eval.run
Runs an evaluation, logs each case to an experiment, and returns a result summary. Give it the cases to run, a task, and at least one scorer or classifier.
Returns a Braintrust::Eval::Result.
Arguments:
- task (#call, required): callable that receives input: and returns the output to score.
- cases (Array or Enumerable): inline evaluation cases. Mutually exclusive with dataset.
- dataset (String, Hash, Braintrust::Dataset, or Braintrust::Dataset::ID): dataset to fetch evaluation cases from. Mutually exclusive with cases.
- scorers (Array<String, Braintrust::Scorer, #call>): scorers to run. At least one scorer or classifier is required.
- classifiers (Array<Braintrust::Classifier, #call>): classifiers to run. At least one scorer or classifier is required.
- project (String): project name. Enables full API mode and creates or resolves the project.
- project_id (String): project UUID. Skips project lookup when provided.
- experiment (String): experiment name.
- on_progress (#call): callback invoked after each case. Receives the output and scores, or an error payload.
- parallelism (Integer): number of worker threads. Tasks and scorers must be thread-safe when greater than 1. Defaults to 1.
- tags (Array<String>): experiment tags.
- metadata (Hash): experiment metadata.
- update (Boolean): reuses an existing experiment when possible. Defaults to false.
- quiet (Boolean): suppresses result output. Defaults to false.
- tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider. Defaults to the global provider.
- parent (Hash): parent span context for remote evals.
- parameters (Hash): runtime parameters passed to tasks, scorers, and classifiers that declare parameters:.
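Because task: and scorers: accept any object that responds to #call, plain lambdas are enough for a first evaluation. This sketch defines them in pure Ruby so they can be smoke-tested before any network call. The case shape ({input:, expected:}) is an assumption to check against the SDK, and the ** in each signature absorbs any extra keywords the runner passes.

```ruby
# Assumed case shape: a Hash with :input and :expected keys.
cases = [
  { input: "hello", expected: "HELLO" },
  { input: "Braintrust", expected: "BRAINTRUST" }
]

# Task: receives input: and returns the output to score.
# ** swallows any other keywords the runner might pass.
upcase_task = ->(input:, **) { input.upcase }

# Scorer: compares output with expected and returns a number from 0 to 1.
exact_match = ->(expected:, output:, **) { output == expected ? 1.0 : 0.0 }

# Local smoke check before handing the callables to Braintrust::Eval.run.
cases.each do |c|
  output = upcase_task.call(input: c[:input])
  score = exact_match.call(expected: c[:expected], output: output)
  puts "#{c[:input]} -> #{output} (score #{score})"
end
```

With the gem loaded, the same objects would go to Braintrust::Eval.run(project: "my-project", experiment: "upcase-baseline", cases: cases, task: upcase_task, scorers: [exact_match]), where the project and experiment names are placeholders.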
Braintrust::Task.new
A task is the function under evaluation: it takes a case’s input and returns the output to score. Eval.run accepts a plain lambda for task:, but Task.new wraps one in a named, reusable object, which is useful when you want the task name to show up in Braintrust or you plan to reuse it across runs. The block declares the keyword arguments it needs, and extra keywords are filtered out automatically.
Arguments:
- name (String): task name. Defaults to "task".
- block (Proc, required): the task body; declare only the keyword arguments you need.
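The keyword filtering described above can be pictured with Ruby's Proc#parameters: inspect which keywords a block declares and pass only those. The call_with_declared_keywords helper is a hypothetical sketch of that idea, not Braintrust's implementation.

```ruby
# Hypothetical sketch: call a block with only the keywords it declares.
def call_with_declared_keywords(callable, **available)
  params = callable.parameters
  # A block that takes **opts accepts every keyword, so pass them all.
  return callable.call(**available) if params.any? { |type, _| type == :keyrest }

  # :keyreq = required keyword, :key = optional keyword.
  declared = params.select { |type, _| type == :keyreq || type == :key }.map(&:last)
  callable.call(**available.slice(*declared))
end

task_body = proc { |input:| "processed #{input}" }

# expected: and metadata: are dropped before the block is called.
puts call_with_declared_keywords(task_body, input: "hi", expected: "x", metadata: {})
# prints "processed hi"
```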
Braintrust::Scorer.new
A scorer measures how good your task’s output is, producing a score for each case in an evaluation. Scorer.new is the standard way to define a custom scorer inline: give it a name and a block that returns a number (typically 0 to 1), a score hash, or an array of score hashes.
The block can declare any of the keyword arguments input:, expected:, output:, metadata:, trace:, and parameters:. Return an array of score hashes to emit multiple named scores from one scorer.
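As a sketch of the multi-score form, the lambda below returns an array of score hashes from one pass over the output. The :name and :score keys are an assumption about the score hash shape, and the same body could be given as the block to Scorer.new.

```ruby
# Emits two named scores from a single scorer call.
# Assumed score hash shape: { name: String, score: Numeric }.
length_and_match = lambda do |output:, expected:, **_rest|
  [
    { name: "exact_match", score: output == expected ? 1.0 : 0.0 },
    # Penalize outputs longer than the expected answer, clamped to at most 1.0.
    { name: "concise", score: [expected.length.to_f / [output.length, 1].max, 1.0].min }
  ]
end

scores = length_and_match.call(output: "Paris, France", expected: "Paris", input: "Capital of France?")
scores.each { |s| puts "#{s[:name]}: #{s[:score].round(2)}" }
```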
Braintrust::Classifier.new
A classifier categorizes your task’s output instead of scoring it numerically, which is useful for labeling outputs by topic, intent, or failure type. Classifier.new defines one inline: give it a name and a block that returns a classification.
Classifier blocks can declare any of the keyword arguments input:, expected:, output:, metadata:, trace:, and parameters:. They return a classification hash, an array of classification hashes, or nil.
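A classifier body in the same spirit might label outputs by failure type and return nil when no label applies. This is a sketch: the :name and :label keys of the classification hash are assumptions, and the labels themselves are made up for illustration.

```ruby
# Labels an output by failure type; returns nil when nothing applies.
# Assumed classification hash shape: { name: String, label: String }.
failure_type = lambda do |output:, **_rest|
  text = output.to_s
  if text.strip.empty?
    { name: "failure_type", label: "empty" }
  elsif text.match?(/\bI (?:can't|cannot)\b/i)
    { name: "failure_type", label: "refusal" }
  end
end

p failure_type.call(output: "")
p failure_type.call(output: "Sorry, I can't help with that.")
p failure_type.call(output: "The capital of France is Paris.") # prints nil
```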
Braintrust::Eval::Evaluator
Bundles a task, scorers, and a parameter schema into one reusable object. Serve it from the Ruby dev server so Braintrust can run it as a remote eval from the Playground, or call #run to run it directly.
Arguments:
- task (#call): task callable.
- scorers (Array): scorers attached to the evaluator. Defaults to [].
- classifiers (Array): classifiers attached to the evaluator. Defaults to [].
- parameters (Hash): parameter schema used by remote evals and the Playground UI. Defaults to {}.
Call #run on the evaluator to delegate to Braintrust::Eval.run.
Braintrust::Eval::Result
Represents the outcome of an evaluation run, returned by Braintrust::Eval.run.
Attributes and methods:
- experiment_id, experiment_name: experiment identifiers.
- project_id, project_name: project identifiers.
- permalink: Braintrust UI link for the experiment.
- errors: errors collected during the run.
- duration: evaluation duration in seconds.
- scores: raw score data keyed by scorer name.
- classifications: classification results keyed by classifier name.
- success?: returns true when no errors occurred.
- failed?: returns true when errors occurred.
- summary: lazily computed experiment summary.
- scorer_stats: score statistics keyed by scorer name.
- to_pretty: human-readable CLI summary.
Datasets
A dataset is a versioned collection of cases you manage in Braintrust and reuse across evaluations. Reference one by name or ID, iterate its records, or pass it to Braintrust::Eval.run.
Braintrust::Dataset
References a dataset you manage in Braintrust. It fetches records lazily as you iterate, implementing Enumerable, so you can stream a large dataset without loading every record into memory at once. Use fetch_all when you do want them all in an array, or pass the dataset straight to Braintrust::Eval.run.
Arguments:
- name (String): dataset name. Required unless id is provided.
- id (String): dataset UUID. Required unless name is provided.
- project (String): project name. Required when using name.
- version (String): dataset version to pin to.
Methods:
- id → String: resolves and returns the dataset UUID.
- metadata → Hash: fetches and returns dataset metadata.
- fetch_all(limit: nil) → Array: fetches records eagerly into an array.
- each: lazily iterates records and is the basis for the Enumerable methods.
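The lazy, page-at-a-time behavior can be sketched in plain Ruby: each pulls one page from a fetcher and yields records as they arrive, and including Enumerable on top of each provides first, map, and the rest. The PagedRecords class and its fetcher lambda are hypothetical stand-ins for the HTTP calls the SDK makes; the limit/cursor contract is an assumption loosely modeled on Braintrust::API::Datasets#fetch.

```ruby
# Hypothetical stand-in for a dataset that fetches records page by page.
class PagedRecords
  include Enumerable

  # fetcher: ->(limit:, cursor:) { [records, next_cursor_or_nil] }
  def initialize(fetcher, page_size: 2)
    @fetcher = fetcher
    @page_size = page_size
  end

  # Yields records one page at a time, so callers that stop early
  # (first, take, find) never request the remaining pages.
  def each
    return enum_for(:each) unless block_given?

    cursor = nil
    loop do
      records, cursor = @fetcher.call(limit: @page_size, cursor: cursor)
      records.each { |r| yield r }
      break if cursor.nil?
    end
  end

  # Eager variant: collect up to limit records into an Array.
  def fetch_all(limit: nil)
    limit ? first(limit) : to_a
  end
end

# Fake backend with five records; counts how many pages were requested.
ROWS = (1..5).map { |i| { input: "q#{i}", expected: "a#{i}" } }
calls = 0
fetcher = lambda do |limit:, cursor:|
  calls += 1
  start = cursor || 0
  next_cursor = start + limit < ROWS.size ? start + limit : nil
  [ROWS[start, limit], next_cursor]
end

records = PagedRecords.new(fetcher)
records.first(2) # needs only the first page
puts calls       # prints 1
```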
Braintrust::Dataset::ID
Wraps a dataset UUID so Braintrust::Eval.run can distinguish dataset-by-ID from dataset-by-name.
Prompts and functions
In Braintrust, functions are units of logic you define and version in the UI, then load or invoke from your code. A prompt is a function whose job is to call a model with a templated set of messages. Other functions include scorers, tools, and code you deploy. Load and render a saved prompt with Braintrust::Prompt.load, or turn a deployed function into an eval task or scorer with Braintrust::Functions.
Braintrust::Prompt.load
Loads a saved prompt from Braintrust by project and slug. Use the returned prompt’s #build to render provider parameters with runtime variables.
Returns a Braintrust::Prompt.
Arguments:
- slug (String, required): prompt slug.
- project (String): project name. Required unless project_id is provided.
- project_id (String): project UUID. Required unless project is provided.
- version (String): prompt version. Defaults to the latest version.
- defaults (Hash): default template variables for #build. Defaults to {}.
- api (Braintrust::API): API client override.
Braintrust::Prompt#build
Renders prompt variables and returns provider parameters ready to send to a model.
Returns a Hash containing :model, :messages, optional :tools, and the prompt's model parameters.
Arguments:
- variables (Hash): template variables.
- strict (Boolean): raises when a template variable is missing. Defaults to false.
- **kwargs (Hash): additional template variables.
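To see what strict: changes, here is a plain-Ruby sketch of mustache-style {{variable}} substitution that, in this sketch, renders missing variables as empty strings by default and raises when strict. The render_template helper and MissingVariableError are hypothetical and only approximate what Prompt#build does with message templates; Braintrust's actual template engine handles more syntax than this.

```ruby
# Hypothetical sketch of strict vs. non-strict {{variable}} substitution.
class MissingVariableError < StandardError; end

def render_template(template, variables, strict: false)
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) do
    key = Regexp.last_match(1).to_sym
    if variables.key?(key)
      variables[key].to_s
    elsif strict
      raise MissingVariableError, "missing template variable: #{key}"
    else
      "" # non-strict: a missing variable renders as an empty string
    end
  end
end

template = "Translate {{ text }} into {{language}}."
puts render_template(template, { text: "bonjour", language: "English" })
# prints "Translate bonjour into English."
puts render_template(template, { text: "bonjour" })
# prints "Translate bonjour into ."
```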
Braintrust::Functions.task
Creates a task that invokes a remote Braintrust function, for use as an eval task.
Returns a Braintrust::Task.
Arguments:
- project (String, required): project name.
- slug (String, required): function slug.
- tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider. Defaults to the global provider.
Braintrust::Functions.scorer
Creates a scorer that invokes a remote Braintrust function, for use as an eval scorer.
Returns a Braintrust::Scorer.
Arguments:
- project (String): project name. Use with slug.
- slug (String): function slug. Use with project.
- id (String): function UUID. Alternative to project and slug.
- version (String): function version when using id.
- tracer_provider (OpenTelemetry::SDK::Trace::TracerProvider): tracer provider. Defaults to the global provider.
Attachments
When your traces involve binary content like images or PDFs, log it as an attachment so it appears in Braintrust instead of as an opaque blob.
Braintrust::Trace::Attachment
Wraps binary data so you can attach it to a traced message. Create one from bytes, a file, or a URL, then add its hash to a message’s content.
Class methods:
- from_bytes(content_type, data) → Attachment: creates an attachment from raw bytes.
- from_file(content_type, path) → Attachment: reads a file and creates an attachment.
- from_url(url) → Attachment: fetches a URL and creates an attachment using the response content type.
Instance methods:
- to_data_url → String: returns a base64 data URL.
- to_message → Hash: returns the base64_attachment message hash.
- to_h: alias for to_message.
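A base64 data URL, the format to_data_url returns, is the content type and the base64-encoded bytes behind a fixed prefix. The to_data_url helper below is a hypothetical plain-Ruby sketch of that string shape, not the SDK's code; it uses the core Array#pack("m0") directive for strict (newline-free) base64.

```ruby
# Hypothetical sketch of the data URL format: data:<content type>;base64,<payload>
def to_data_url(content_type, bytes)
  payload = [bytes].pack("m0") # strict base64, no line breaks
  "data:#{content_type};base64,#{payload}"
end

puts to_data_url("text/plain", "hello")
# prints "data:text/plain;base64,aGVsbG8="
```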
API client
For direct access to the Braintrust REST API, use Braintrust::API and its namespaces. Reach for these when you need to manage datasets or functions programmatically, beyond what the higher-level APIs above cover.
Braintrust::API
Creates a REST API client using SDK state.
Methods:
- datasets → Braintrust::API::Datasets: the datasets API namespace.
- functions → Braintrust::API::Functions: the functions API namespace.
- login → Braintrust::API: logs in through SDK state and returns the API client.
- object_permalink(object_type:, object_id:) → String: builds a Braintrust UI object permalink.
Braintrust::API::Datasets
Provides dataset management APIs.
Methods:
- list(project_name: nil, dataset_name: nil, project_id: nil, limit: nil): lists datasets with optional filters.
- get(project_name:, name:): fetches one dataset by project and name.
- get_by_id(id:): fetches dataset metadata by UUID.
- create(name:, project_name: nil, project_id: nil, description: nil, metadata: nil): creates or registers a dataset.
- insert(id:, events:): inserts events into a dataset.
- delete(id:): deletes a dataset.
- permalink(id:): builds a Braintrust UI dataset permalink.
- fetch(id:, limit: 1000, cursor: nil, version: nil): fetches dataset records with pagination.
Braintrust::API::Functions
Provides function and prompt management APIs.
Methods:
- list(project_name: nil, project_id: nil, function_name: nil, slug: nil, limit: nil): lists functions with optional filters.
- create(project_name:, slug:, function_data:, prompt_data: nil, name: nil, description: nil, function_type: nil, function_schema: nil): creates or registers a function.
- invoke(id:, input:): invokes a function by UUID.
- get(id:, version: nil): fetches a function by UUID.
- delete(id:): deletes a function.
- create_tool(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates a tool function.
- create_scorer(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates a scorer function.
- create_task(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates a task function.
- create_llm(project_name:, slug:, prompt_data:, name: nil, description: nil, function_schema: nil): creates an LLM function.
Dev server
Serve your evaluators over HTTP so Braintrust can run them remotely, from the Playground or as remote evals. Use the Rack app for any Ruby app, or the Rails engine to mount it in an existing Rails application.
Braintrust::Server::Rack.app
Creates a Rack app that exposes evaluators for remote evals from Braintrust.
Arguments:
- evaluators (Hash<String, Braintrust::Eval::Evaluator>): evaluators served by slug. Defaults to {}.
- auth (:clerk_token, :none, or an object): auth strategy. Use :none to disable incoming request auth for local development. Defaults to :clerk_token.
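A minimal eval_server.ru might look like the following sketch. It assumes the options listed for Braintrust::Eval::Evaluator and Rack.app are keyword arguments, as their parameter lists suggest; the "upcase" slug, task, and scorer are placeholders, and auth: :none is only appropriate for local development.

```ruby
# eval_server.ru (start with: rackup eval_server.ru)
require "braintrust"

# Initialize SDK state; the API key is read from ENV["BRAINTRUST_API_KEY"].
Braintrust.init

# Placeholder evaluator built from plain callables.
upcase = Braintrust::Eval::Evaluator.new(
  task: ->(input:, **) { input.to_s.upcase },
  scorers: [->(expected:, output:, **) { output == expected ? 1.0 : 0.0 }]
)

run Braintrust::Server::Rack.app(
  evaluators: { "upcase" => upcase }, # served under the slug "upcase"
  auth: :none                         # local development only
)
```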
Braintrust::Contrib::Rails::Server::Engine
A Rails Engine you can mount in a Rails application to serve evaluators alongside your app.
Run the generator to create an initializer that wires up your evaluators, and mount the engine in config/routes.rb. The initializer is created at config/initializers/braintrust_server.rb, where you can review or customize the evaluator mapping and auth strategy.
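Mounting uses the standard Rails routing call, sketched below for config/routes.rb. The "/braintrust" path is a placeholder; use whatever path your deployment expects.

```ruby
# config/routes.rb
Rails.application.routes.draw do
  # Serves the evaluators configured in config/initializers/braintrust_server.rb.
  # "/braintrust" is a placeholder mount path.
  mount Braintrust::Contrib::Rails::Server::Engine, at: "/braintrust"
end
```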
Configuration
Configure the SDK with environment variables, or pass the equivalent options to Braintrust.init.
Environment variables
- BRAINTRUST_API_KEY (required): Braintrust API key.
- BRAINTRUST_API_URL: Braintrust API URL. Defaults to https://api.braintrust.dev.
- BRAINTRUST_APP_URL: Braintrust app URL. Defaults to https://www.braintrust.dev.
- BRAINTRUST_AUTO_INSTRUMENT: set to false to disable auto-instrumentation.
- BRAINTRUST_DEBUG: set to true to enable debug logging.
- BRAINTRUST_DEFAULT_PROJECT: default project for traced spans.
- BRAINTRUST_FLUSH_ON_EXIT: set to false to disable automatic span flushing on process exit.
- BRAINTRUST_INSTRUMENT_EXCEPT: comma-separated list of integrations to skip.
- BRAINTRUST_INSTRUMENT_ONLY: comma-separated list of integrations to enable, such as openai,anthropic.
- BRAINTRUST_ORG_NAME: organization name.
- BRAINTRUST_OTEL_FILTER_AI_SPANS: set to true to export only AI-related spans.