Remote evals

If you have existing infrastructure for running evaluations that isn't easily adaptable to the Braintrust Playground, you can use remote evals to expose that infrastructure through a remote endpoint. This lets you run the evaluations directly in the playground, iterate quickly across datasets, run scorers, and compare results with other tasks. You can also run multiple instances of your remote eval side by side with different parameters and compare the results. Parameters defined in the remote eval are exposed in the playground UI.

Remote evals are in beta. If you are on a hybrid deployment, remote evals are available starting with v0.0.66.

Expose a remote eval

To expose an Eval running on your local machine or at a remote URL, pass the --dev flag to the braintrust eval command. For example, given the file shown below, run npx braintrust eval parameters.eval.ts --dev to start the dev server and expose it at http://localhost:8300. The dev host and port can also be configured:

  • --dev-host DEV_HOST: The host to bind the dev server to. Defaults to localhost. Set to 0.0.0.0 to bind to all interfaces.
  • --dev-port DEV_PORT: The port to bind the dev server to. Defaults to 8300.
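To bind the dev server to all interfaces on a non-default port, you can combine these flags (the port value here is only illustrative):

npx braintrust eval parameters.eval.ts --dev --dev-host 0.0.0.0 --dev-port 8400

The example file looks like this: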
import { Levenshtein } from "autoevals";
import { Eval, initDataset, wrapOpenAI } from "braintrust";
import OpenAI from "openai";
import { z } from "zod";
 
const client = wrapOpenAI(new OpenAI());
 
Eval("Simple eval", {
  data: initDataset("local dev", { dataset: "sanity" }), // Datasets are currently ignored
  task: async (input, { parameters }) => {
    const completion = await client.chat.completions.create(
      parameters.main.build({
        input: `${parameters.prefix}:${input}`,
      }),
    );
    return completion.choices[0].message.content ?? "";
  },
  // These scores will be used along with any that you configure in the UI
  scores: [Levenshtein],
  parameters: {
    main: {
      type: "prompt",
      name: "Main prompt",
      description: "This is the main prompt",
      default: {
        messages: [
          {
            role: "user",
            content: "{{input}}",
          },
        ],
        model: "gpt-4o",
      },
    },
    another: {
      type: "prompt",
      name: "Another prompt",
      description: "This is another prompt",
      default: {
        messages: [
          {
            role: "user",
            content: "{{input}}",
          },
        ],
        model: "gpt-4o",
      },
    },
    include_prefix: z
      .boolean()
      .default(false)
      .describe("Include a contextual prefix"),
    prefix: z
      .string()
      .describe("The prefix to include")
      .default("this is a math problem"),
    array_of_objects: z
      .array(
        z.object({
          name: z.string(),
          age: z.number(),
        }),
      )
      .default([
        { name: "John", age: 30 },
        { name: "Jane", age: 25 },
      ]),
  },
});
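
Note that include_prefix above is surfaced in the playground UI but never consumed by the task. As a minimal sketch (not part of the example itself), a boolean parameter like this could gate the prefix inside the task:

task: async (input, { parameters }) => {
  // Only prepend the prefix when the boolean parameter is enabled in the UI.
  const question = parameters.include_prefix
    ? `${parameters.prefix}: ${input}`
    : `${input}`;
  const completion = await client.chat.completions.create(
    parameters.main.build({ input: question }),
  );
  return completion.choices[0].message.content ?? "";
},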

Running a remote eval from a playground

To run a remote eval from a playground, select + Remote from the Task pane and choose from the evals exposed on localhost or from your configured remote sources.

Remote eval in playground

Configure remote eval sources

To configure remote eval source URLs for a project, navigate to Configuration > Remote evals, then select + Remote eval source to add a new source for your project.

Configure remote eval

Limitations

  • The dataset defined in your remote eval is ignored.
  • Scorers defined in your remote eval run alongside any scorers you configure in the playground.
  • Remote evals are limited to TypeScript only. Python support is coming soon.
