Remote evals let you run evaluations on your own infrastructure while using Braintrust’s playground for iteration, comparison, and analysis. Your evaluation code runs on your servers or local machine, and the Braintrust playground sends parameters and receives results through a simple HTTP interface. Use remote evals when your evaluation requires:
  • Agentic workflows: Multi-step agent flows or complex task logic that goes beyond a single prompt.
  • Custom infrastructure: Access to internal APIs, databases, or services that can’t run in the cloud.
  • Specific runtime environments: Custom dependencies, system libraries, or environment configurations.
  • Security or compliance requirements: Data that must remain on your infrastructure.
  • Long-running evaluations: Complex processing that exceeds typical execution timeouts.
If your evaluation can run in the Braintrust playground, you don’t need remote evals.

How it works

  1. Write an Eval() with parameters that define runtime configuration options.
  2. Run your eval locally with the --dev flag to expose an HTTP endpoint.
  3. Configure the endpoint URL in your Braintrust project settings.
  4. Use the remote eval in the playground. Parameters appear as UI controls.
  5. When you run the eval, Braintrust sends parameters to your endpoint and displays results.
The playground handles dataset management, scoring, comparison, and visualization while your code handles the task execution.

Set up a remote eval

A remote eval looks like a standard Eval() call with a parameters field that defines configurable options. These parameters become UI controls in the playground. See Remote eval parameters for details on parameter types and syntax.
remote.eval.ts
import { Levenshtein } from "autoevals";
import { Eval, initDataset, wrapOpenAI } from "braintrust";
import OpenAI from "openai";
import { z } from "zod";

const client = wrapOpenAI(new OpenAI());

Eval("Simple eval", {
  data: initDataset("local dev", { dataset: "sanity" }),
  task: async (input, { parameters }) => {
    const promptInput = parameters.prefix
      ? `${parameters.prefix}: ${input}`
      : input;

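    // Build the chat completion request from the configured prompt parameter; {{input}} is filled with promptInput.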
    const completion = await client.chat.completions.create(
      parameters.main.build({
        input: promptInput,
      }),
    );
    return completion.choices[0].message.content ?? "";
  },
  scores: [Levenshtein],
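  // Each parameter below appears as a UI control in the playground.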
  parameters: {
    main: {
      type: "prompt",
      name: "Main prompt",
      description: "This is the main prompt",
      default: {
        messages: [
          {
            role: "user",
            content: "{{input}}",
          },
        ],
        model: "gpt-4o",
      },
    },
    prefix: z
      .string()
      .describe("Optional prefix to prepend to input")
      .default(""),
  },
});

Remote eval parameters

Parameters define runtime configuration that users can modify in the playground without changing code; they appear as form controls in the UI. The parameter system works the same way in TypeScript and Python, but the syntax differs:
  • Parameter types: TypeScript uses type: "prompt" for LLM prompts, plus z.string(), z.boolean(), z.number(), z.array(), and z.object() with .describe(); Python uses type: "prompt" for LLM prompts, plus dictionaries with type: "string", "boolean", "number", "array", or "object".
  • Type definition: TypeScript uses Zod schemas with chained methods; Python uses dictionaries with type, description, and default fields.
  • Parameter access: TypeScript uses direct property access (parameters.prefix); Python uses dictionary access (parameters["prefix"] or parameters.get("prefix")).
  • Prompt parameters: TypeScript puts the messages array directly in default for type: "prompt"; Python nests prompt.messages and options objects under type: "prompt".
  • Prompt usage: TypeScript calls parameters.main.build({ input: value }); Python calls parameters["main"].build(input=value).
  • Async handling: both use async/await, with promises in TypeScript and coroutines in Python.
When your remote eval runs, Braintrust sends the configured parameter values through the parameters object in your task function.
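For instance, here is a minimal TypeScript sketch that mixes several Zod-typed parameters; the file name and the parameter names verbose, maxLength, and tags are illustrative, not part of the SDK:
params.eval.ts
import { Levenshtein } from "autoevals";
import { Eval, initDataset } from "braintrust";
import { z } from "zod";

Eval("Parameter types demo", {
  data: initDataset("local dev", { dataset: "sanity" }),
  task: async (input, { parameters }) => {
    // Parameters are accessed as plain properties, typed by the Zod schemas below.
    const label = parameters.tags.length > 0 ? ` [${parameters.tags.join(", ")}]` : "";
    const suffix = parameters.verbose ? " (verbose)" : "";
    return `${input}${suffix}${label}`.slice(0, parameters.maxLength);
  },
  scores: [Levenshtein],
  parameters: {
    verbose: z.boolean().describe("Append a verbose marker to the output").default(false),
    maxLength: z.number().describe("Truncate the output to this many characters").default(100),
    tags: z.array(z.string()).describe("Free-form labels appended to the output").default([]),
  },
});
In the playground, verbose renders as a toggle, maxLength as a numeric field, and tags as an editable list.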

Expose a remote eval

To make your eval accessible to Braintrust, run it with the --dev flag:
Running npx braintrust eval path/to/eval.ts --dev starts a dev server at http://localhost:8300.
You can configure the host and port:
  • --dev-host DEV_HOST: The host to bind the dev server to. Defaults to localhost. Set to 0.0.0.0 to bind to all interfaces (be cautious about security when exposing beyond localhost).
  • --dev-port DEV_PORT: The port to bind the dev server to. Defaults to 8300.
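For example, to expose the dev server beyond your machine on a non-default port, you might combine these flags with --dev (the port number here is arbitrary):
npx braintrust eval path/to/eval.ts --dev --dev-host 0.0.0.0 --dev-port 8400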
Once running, your eval exposes an HTTP endpoint that Braintrust can connect to. Keep this process running while using the remote eval in the playground.

Configure remote eval sources

To add remote eval endpoints beyond localhost, configure them at the project level:
  1. In your project, go to Configuration > Remote evals.
  2. Select Remote eval source.
  3. Enter the name and URL of your remote eval server.
  4. Select Create remote eval source.
All team members with access to the project can now use this remote eval in their playgrounds.

Run a remote eval from a playground

After exposing your eval and configuring it in your project, you can use it in any playground:
  1. In a playground, select Task.
  2. Select Remote eval from the task type list.
  3. Choose your eval from the available sources (localhost or configured remote URLs).
  4. Configure parameters using the UI controls that were defined in your parameters object.
  5. Run the evaluation.
Braintrust sends your parameters to the remote endpoint and displays results. You can run multiple instances of the same remote eval side-by-side with different parameters to compare results.

Demo

This video walks through exposing a remote eval to Braintrust and using it in a playground.

Limitations

  • The dataset defined in your remote eval is ignored. Datasets are managed through the playground.
  • Scorers defined in remote evals are combined with any scorers configured in the playground.

Next steps