# API reference

> Functions, classes, and configuration options found in the Braintrust Java SDK.

This page covers the key APIs in the Braintrust Java SDK. For setup, see the [Quickstart](/sdks/java/quickstart). For the complete reference, see [javadoc.io](https://javadoc.io/doc/dev.braintrust/braintrust-sdk-java/latest).

## Tracing

Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation: attach the Braintrust Java Agent at JVM startup and supported clients are traced with no code changes (see [Install and instrument](/sdks/java/install-and-instrument)). The APIs below configure tracing programmatically and let you trace your own application code. To instrument a specific provider client, see [Wrappers](#wrappers).

### `Braintrust.get`

Gets the global `Braintrust` instance, creating it on the first call and returning that same instance on every later call. This is your entry point to the SDK: use it to set up tracing, run evaluations, load prompts, and fetch datasets.

Call `get()` with no arguments to configure the SDK from environment variables, which is the most common setup. To configure it in code instead, build a `BraintrustConfig` and pass it to `get(config)` (see [Configuration](#configuration)).

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import dev.braintrust.Braintrust;

var braintrust = Braintrust.get();
```

Returns: `Braintrust`

### `braintrust.openTelemetryCreate`

Creates an `OpenTelemetry` instance pointed at Braintrust and registers it as the global instance, so the spans you record and the AI calls you trace are exported to Braintrust. Braintrust tracing is built on OpenTelemetry, and this sets up that pipeline. Call it once at startup.

Pass `openTelemetryCreate(false)` to create the instance without registering it globally, or use `openTelemetryEnable(tracerProviderBuilder, loggerProviderBuilder, meterProviderBuilder)` to add Braintrust's exporters to OpenTelemetry SDK builders you already manage.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
var openTelemetry = braintrust.openTelemetryCreate();
```

Returns: `io.opentelemetry.api.OpenTelemetry`

Use the returned instance to wrap your own application code in spans. Braintrust traces your AI calls automatically, but not the code around them, so wrapping that work shows its structure in the trace and nests any traced AI calls underneath.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
var tracer = openTelemetry.getTracer("my-app");
var span = tracer.spanBuilder("process-request").startSpan();
try (var scope = span.makeCurrent()) {
    // traced AI calls here nest under "process-request"
} finally {
    span.end();
}
```

### `braintrust.projectUri`

The Braintrust UI URL for the configured organization and project. Use it to link from your own app or logs straight to the project in Braintrust.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
var url = braintrust.projectUri();
```

Returns: `URI`

## Evaluations

An evaluation runs your task over a set of cases, scores each output, and reports the results, which is how you measure quality and catch regressions as you change prompts or models. These APIs build and run evaluations from your Java code.
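To make that loop concrete, here is a minimal plain-Java sketch of the idea: run a task over each case, score the output, and average the scores. It does not use the SDK and is not how Braintrust implements evaluations; `EvalLoopSketch`, `Case`, and `meanScore` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class EvalLoopSketch {
    // Hypothetical stand-in for one evaluation case: an input and the output you expect.
    record Case(String input, String expected) {}

    // Runs the task on every case, scores each output, and returns the mean score.
    static double meanScore(
            List<Case> cases,
            Function<String, String> task,
            BiFunction<String, String, Double> scorer) {
        double total = 0.0;
        for (Case c : cases) {
            String output = task.apply(c.input());
            total += scorer.apply(c.expected(), output);
        }
        return cases.isEmpty() ? 0.0 : total / cases.size();
    }

    public static void main(String[] args) {
        List<Case> cases = List.of(
                new Case("strawberry", "fruit"),
                new Case("asparagus", "vegetable"));
        // Toy task that always answers "fruit", so only one of the two cases matches.
        Function<String, String> task = food -> "fruit";
        BiFunction<String, String, Double> exactMatch =
                (expected, result) -> expected.equals(result) ? 1.0 : 0.0;
        System.out.println(meanScore(cases, task, exactMatch)); // prints 0.5
    }
}
```

The SDK builder described next takes the same three ingredients (cases, a task function, and scorers) and additionally reports the results to Braintrust.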

### `braintrust.evalBuilder`

Starts the definition of an evaluation for input type `INPUT` and output type `OUTPUT`. Give the builder cases (or a dataset), a task function, and at least one scorer or classifier, then call `build()` to get an `Eval`. Calling `run()` on the `Eval` returns an `EvalResult`; call `createReportString()` on that result for a human-readable summary.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import dev.braintrust.eval.DatasetCase;
import dev.braintrust.eval.Scorer;

var eval = braintrust.<String, String>evalBuilder()
    .name("food-classifier")
    .cases(
        DatasetCase.of("strawberry", "fruit"),
        DatasetCase.of("asparagus", "vegetable"))
    // getFoodType is your own Function<String, String>, defined elsewhere
    .taskFunction(getFoodType)
    .scorers(Scorer.of("exact_match", (expected, result) -> expected.equals(result) ? 1.0 : 0.0))
    .build();

var result = eval.run();
System.out.println(result.createReportString());
```

Returns: `Eval.Builder`

Builder methods (each returns the `Eval.Builder` for chaining, except `build()`):

* **`cases(DatasetCase...)`** → `Eval.Builder`: inline evaluation cases. Provide this or `dataset()`.
* **`dataset(Dataset)`** → `Eval.Builder`: run against a dataset instead of inline `cases`.
* **`taskFunction(Function<INPUT, OUTPUT>)`** → `Eval.Builder` (required): the task that produces the output to score.
* **`scorers(Scorer...)`** → `Eval.Builder`: scorers to apply to each case. Required unless `classifiers` is set.
* **`classifiers(Classifier...)`** → `Eval.Builder`: classifiers to apply to each case. Required unless `scorers` is set.
* **`name(String)`** → `Eval.Builder`: experiment name (optional).
* **`build()`** → `Eval`: builds the `Eval`.

### `Scorer`

A scorer measures how good your task's output is, producing a score between 0 and 1 for each case in an evaluation. Scorers turn raw outputs into the metrics you compare across runs, whether that's checking an answer against the expected value or asking an LLM to judge quality.

Create one inline with `Scorer.of`, implement the `Scorer<INPUT, OUTPUT>` interface when you need full control, or fetch a scorer you've defined in Braintrust with `braintrust.fetchScorer(slug)`. `Scorer.of` comes in two forms: one scores from `(expected, actual)`, the other from the full `TaskResult` when you need more than the output.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
// (expected, actual) -> score
Scorer.of("exact_match", (expected, result) -> expected.equals(result) ? 1.0 : 0.0);
```

* **`Scorer.of(String name, BiFunction<OUTPUT, OUTPUT, Double>)`** → `Scorer`: score from `(expected, actual)`.
* **`Scorer.of(String name, Function<TaskResult<INPUT, OUTPUT>, Double>)`** → `Scorer`: score from the full task result.
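As an illustration of the first form, the sketch below defines two scoring functions with the `(expected, actual) -> score` shape, taking `OUTPUT` to be `String`. It uses only the JDK; the class and constant names are hypothetical, and the word-overlap metric is just one example of a partial-credit score that stays between 0 and 1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiFunction;

public class ScoreFunctions {
    // Full credit for a match after trimming and ignoring case, otherwise zero.
    static final BiFunction<String, String, Double> NORMALIZED_MATCH =
            (expected, result) ->
                    expected.trim().equalsIgnoreCase(result.trim()) ? 1.0 : 0.0;

    // Partial credit: the fraction of expected words that also appear in the result.
    static final BiFunction<String, String, Double> WORD_OVERLAP = (expected, result) -> {
        Set<String> resultWords =
                new HashSet<>(Arrays.asList(result.toLowerCase().trim().split("\\s+")));
        String[] expectedWords = expected.toLowerCase().trim().split("\\s+");
        long hits = Arrays.stream(expectedWords).filter(resultWords::contains).count();
        return (double) hits / expectedWords.length;
    };

    public static void main(String[] args) {
        System.out.println(NORMALIZED_MATCH.apply("Fruit", " fruit ")); // prints 1.0
        // Two of the three expected words ("red", "fruit") appear in the result.
        System.out.println(WORD_OVERLAP.apply("red ripe fruit", "a red fruit"));
    }
}
```

Either function should be usable as the second argument of the two-argument `Scorer.of` form, for example `Scorer.of("normalized_match", ScoreFunctions.NORMALIZED_MATCH)`.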

### `Classifier`

When you want to categorize a task's output instead of scoring it numerically, use a classifier. It returns zero or more `Classification` results for each case, which is useful for labeling outputs by topic, intent, or failure type rather than rating them between 0 and 1.

Create one with `Classifier.of(String name, Function<TaskResult<INPUT, OUTPUT>, List<Classification>> fn)`. A `Classification` is a record `(name, id, label, metadata)` where `id` is required, so for the common case you can use `Classification.of(id)`.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
Classifier.of("topic", taskResult -> List.of(Classification.of("billing")));
```

## Datasets

A dataset is the set of cases an evaluation runs against. Define cases inline in code, or load a dataset you manage in Braintrust.

### `DatasetCase.of`

A dataset case is a single example: an input paired with the output you expect. Use `of(input, expected)` for the common case, or `of(input, expected, tags, metadata)` to attach case-level tags and metadata.

Returns: `DatasetCase`

Parameters:

* **`input`** (`INPUT`, required): the value passed to the task.
* **`expected`** (`OUTPUT`, required): the expected output to score against.
* **`tags`** (`List<String>`): case-level tags.
* **`metadata`** (`Map<String, Object>`): case-level metadata.

### `Dataset.of`

Groups several cases into an in-memory dataset you can hand to an evaluation, as an alternative to passing cases inline. Build one with `Dataset.of(DatasetCase...)`.

Returns: `Dataset`

### `braintrust.fetchDataset`

Loads a dataset that you manage in Braintrust, identified by name, so you can evaluate against shared, versioned test data instead of defining cases in code.

Returns: `Dataset`

Parameters:

* **`name`** (`String`, required): the dataset name.
* **`version`** (`String`): a specific version to pin. Omit to load the latest.

## Prompts

Manage prompts in Braintrust and load them at runtime, so you can edit and version them without redeploying your code. These APIs load a prompt and turn it into the messages or request parameters you send to a model.

### `braintrust.promptLoader`

Provides the loader for prompts you manage in Braintrust. Load a prompt by its slug, then render it with variables to produce the messages you send to a model.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
var prompt = braintrust.promptLoader().load("my-prompt-slug");
```

Returns: `BraintrustPromptLoader`. Its `load(slug)` method returns the `BraintrustPrompt` used in the example above.

### `prompt.renderMessages`

Fills in a loaded prompt's template variables to produce the final messages, ready to send to a model. Pass the variable values as a `Map<String, Object>`: `renderMessages(parameters)`.

Returns: `List<Map<String, Object>>`
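The sketch below shows one way to walk a `List<Map<String, Object>>` like the one `renderMessages` returns. It uses a hand-built list rather than a real prompt, and it assumes each message map carries `role` and `content` keys in the usual chat-message style; that key layout is an assumption, not something this page confirms, so check the actual maps before relying on it. `RenderedMessagesSketch` and `describe` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

public class RenderedMessagesSketch {
    // Formats each message as "role: content"; both key names are assumptions.
    static String describe(List<Map<String, Object>> messages) {
        StringBuilder out = new StringBuilder();
        for (Map<String, Object> message : messages) {
            Object role = message.getOrDefault("role", "unknown");
            Object content = message.getOrDefault("content", "");
            out.append(role).append(": ").append(content).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built stand-in for what renderMessages(...) might return.
        List<Map<String, Object>> messages = List.of(
                Map.of("role", "system", "content", "You classify foods."),
                Map.of("role", "user", "content", "strawberry"));
        System.out.print(describe(messages));
    }
}
```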

### `BraintrustOpenAI.buildChatCompletionsPrompt`

Turns a loaded prompt and its variables directly into OpenAI chat completion parameters, so you can send a Braintrust-managed prompt to OpenAI without assembling the request yourself.

Returns: `ChatCompletionCreateParams`

Parameters:

* **`prompt`** (`BraintrustPrompt`, required): a prompt loaded via `promptLoader`.
* **`parameters`** (`Map<String, Object>`, required): the variable values to render into the prompt.

## Attachments

When your traces involve binary content like images or PDFs, log it as an attachment so it appears in Braintrust as an attachment rather than as an opaque blob. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Reach for them when you're attaching binary content to a span yourself.

### `Base64Attachment`

Wrap binary content in a `Base64Attachment` to log it on a span. Create one from a base64 data URI with `Base64Attachment.of(String dataUri)`, or from a file with `Base64Attachment.ofFile(ContentType contentType, String path)`. The nested `Base64Attachment.ContentType` provides constants such as `IMAGE_PNG` and `APPLICATION_PDF`.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
var attachment = Base64Attachment.ofFile(Base64Attachment.ContentType.IMAGE_PNG, "chart.png");
```
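If you already hold the bytes in memory, you can build the data URI yourself before calling `Base64Attachment.of(String dataUri)`. The JDK-only sketch below assumes the method expects the standard `data:<mime type>;base64,<payload>` form; `DataUriSketch` and `toDataUri` are hypothetical names, not part of the SDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriSketch {
    // Builds a base64 data URI of the form data:<mime type>;base64,<payload>.
    static String toDataUri(String mimeType, byte[] content) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        // prints data:text/plain;base64,aGVsbG8=
        System.out.println(toDataUri("text/plain", bytes));
    }
}
```

The resulting string is what you would pass to `Base64Attachment.of(...)`; for an image you would use a MIME type such as `image/png` and the image's raw bytes.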

## Wrappers

If you'd rather instrument a specific client in code than use the Java Agent, wrap it. Each wrapper takes the `OpenTelemetry` from `openTelemetryCreate()` and returns an instrumented client that traces its calls to Braintrust. For setup and the full list of supported integrations, see [Java SDK integrations](/sdks/java/sdk-integrations).

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
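import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import dev.braintrust.instrumentation.openai.BraintrustOpenAI;

// openTelemetry is the instance returned by braintrust.openTelemetryCreate()
OpenAIClient client = BraintrustOpenAI.wrapOpenAI(openTelemetry, OpenAIOkHttpClient.fromEnv());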
```

* **OpenAI**: `BraintrustOpenAI.wrapOpenAI(openTelemetry, client)` → `OpenAIClient`.
* **Anthropic**: `BraintrustAnthropic.wrap(openTelemetry, client)` → `AnthropicClient`.
* **Google GenAI (Gemini)**: `BraintrustGenAI.wrap(openTelemetry, clientBuilder)` → `Client`.
* **LangChain4j**: `BraintrustLangchain.wrap(openTelemetry, aiServices)` → the wrapped `AiServices<T>`.
* **AWS Bedrock**: `BraintrustAWSBedrock.wrap(openTelemetry, clientBuilder)` → `BedrockRuntimeClientBuilder`.
* **Spring AI**: `BraintrustSpringAI.wrap(openTelemetry, chatModelBuilder)` → the wrapped chat model builder.

## Dev server

Serve your evaluators over HTTP so Braintrust can run them as [remote evals](/evaluate/remote-evals) from the Playground. Define each evaluator as a `RemoteEval`, register it with a `Devserver`, and start the server.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import dev.braintrust.Braintrust;
import dev.braintrust.devserver.Devserver;
import dev.braintrust.devserver.RemoteEval;
import dev.braintrust.eval.Scorer;
import java.util.List;

var braintrust = Braintrust.get();

RemoteEval<String, String> foodTypeEval = RemoteEval.<String, String>builder()
    .name("food-type-classifier")
    // classifyFood is your own method, defined elsewhere
    .taskFunction(food -> classifyFood(food))
    .scorers(List.of(
        Scorer.of("exact_match", (expected, result) -> expected.equals(result) ? 1.0 : 0.0)))
    .build();

var devserver = Devserver.builder()
    .config(braintrust.config())
    .registerEval(foodTypeEval)
    .port(8301)
    .build();

devserver.start(); // throws IOException: catch it or declare it on the calling method
```

### `RemoteEval`

Defines an evaluator the dev server can run on demand. Bundle a task and its scorers under a name, with an optional parameter schema for values supplied from the Playground.

Build one with `RemoteEval.<INPUT, OUTPUT>builder()`. Builder methods (each returns the `RemoteEval.Builder` for chaining, except `build()`):

* **`name(String)`** → `RemoteEval.Builder` (required): evaluator name, used as its identifier.
* **`taskFunction(Function<INPUT, OUTPUT>)`** → `RemoteEval.Builder` (required): the task that produces the output to score.
* **`scorers(List<Scorer>)`** → `RemoteEval.Builder`: scorers to apply to each case. Add one at a time with `scorer(...)`.
* **`parameters(List<ParameterDef>)`** → `RemoteEval.Builder`: runtime parameter schema exposed in the Playground. Add one at a time with `parameter(...)`.
* **`build()`** → `RemoteEval`: builds the evaluator.

### `Devserver`

Runs an HTTP server that exposes your registered evaluators for Braintrust to call.

Build one with `Devserver.builder()`, then call `start()`. Builder methods (each returns the `Devserver.Builder` for chaining, except `build()`):

* **`config(BraintrustConfig)`** → `Devserver.Builder` (required): SDK configuration, typically `braintrust.config()`.
* **`registerEval(RemoteEval)`** → `Devserver.Builder` (required): registers an evaluator to serve. Call once per evaluator; at least one must be registered.
* **`host(String)`** → `Devserver.Builder`: bind address. Defaults to `localhost`. Set to `0.0.0.0` to bind all interfaces.
* **`port(int)`** → `Devserver.Builder`: port to listen on. Defaults to `8300`.
* **`build()`** → `Devserver`: builds the server.

Methods:

* **`start()`**: starts the server and begins serving the registered evaluators. Throws `IOException`.
* **`stop()`**: stops the server.

## Configuration

Configure the SDK with environment variables, or programmatically with `BraintrustConfig.builder()`.

```java #skip-compile theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import dev.braintrust.config.BraintrustConfig;

var config = BraintrustConfig.builder()
    .apiKey(System.getenv("BRAINTRUST_API_KEY"))
    .defaultProjectName("My project")
    .build();

var braintrust = Braintrust.get(config);
```

### Environment variables

* **`BRAINTRUST_API_KEY`** (required): Braintrust API key.
* **`BRAINTRUST_DEFAULT_PROJECT_NAME`**: project that traced spans are routed to. Defaults to `default-java-project`.
* **`BRAINTRUST_DEFAULT_PROJECT_ID`**: project UUID. Takes precedence over the project name.
* **`BRAINTRUST_API_URL`**: Braintrust API URL. Defaults to `https://api.braintrust.dev`.
* **`BRAINTRUST_APP_URL`**: Braintrust app URL, used for permalinks. Defaults to `https://www.braintrust.dev`.
* **`BRAINTRUST_DEBUG`**: enable debug logging. Defaults to `false`.
* **`BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG`**: print spans to the console. Defaults to `false`.
* **`BRAINTRUST_REQUEST_TIMEOUT`**: request timeout in seconds. Defaults to `30`.
* **`BRAINTRUST_FILTER_AI_SPANS`**: export only AI-related spans. Defaults to `false`.
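The defaults and the ID-over-name precedence listed above can be mirrored in a few lines of plain Java, which is handy for logging the effective settings at startup. This sketch does not call the SDK and is not its resolution logic; `EnvDefaultsSketch`, `valueOrDefault`, and `projectSelector` are hypothetical helpers, and treating only an unset variable (not an empty one) as missing is an assumption.

```java
import java.util.Map;

public class EnvDefaultsSketch {
    // Returns the variable's value, or the documented default when it is unset.
    static String valueOrDefault(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        return value == null ? fallback : value;
    }

    // Describes the project spans would route to: the ID wins over the name when both are set.
    static String projectSelector(Map<String, String> env) {
        String id = env.get("BRAINTRUST_DEFAULT_PROJECT_ID");
        if (id != null) {
            return "id:" + id;
        }
        return "name:"
                + valueOrDefault(env, "BRAINTRUST_DEFAULT_PROJECT_NAME", "default-java-project");
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println(valueOrDefault(env, "BRAINTRUST_API_URL", "https://api.braintrust.dev"));
        System.out.println(valueOrDefault(env, "BRAINTRUST_REQUEST_TIMEOUT", "30"));
        System.out.println(projectSelector(env));
    }
}
```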
