- Agentic workflows: Multi-step agent flows or complex task logic that goes beyond a single prompt.
- Custom infrastructure: Access to internal APIs, databases, or services that can’t run in the cloud.
- Specific runtime environments: Custom dependencies, system libraries, or environment configurations.
- Security or compliance requirements: Data that must remain on your infrastructure.
- Long-running evaluations: Complex processing that exceeds typical execution timeouts.
If your evaluation can run in the Braintrust playground, you don’t need remote evals.
How it works
- Write an `Eval()` with parameters that define runtime configuration options.
- Run your eval locally with the `--dev` flag to expose an HTTP endpoint.
- Configure the endpoint URL in your Braintrust project settings.
- Use the remote eval in the playground. Parameters appear as UI controls.
- When you run the eval, Braintrust sends parameters to your endpoint and displays results.
Set up a remote eval
A remote eval looks like a standard `Eval()` call with a `parameters` field that defines configurable options. These parameters become UI controls in the playground. See Remote eval parameters for details on parameter types and syntax.
remote.eval.ts
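The sketch below illustrates this structure in TypeScript. Treat it as a minimal example rather than the definitive API: it assumes the `braintrust`, `zod`, and `autoevals` packages, an illustrative project name, and a stubbed-out model call.

```typescript
// remote.eval.ts -- minimal sketch of a remote eval with parameters.
// Project name, parameter names, and the inline dataset are illustrative.
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";
import { z } from "zod";

Eval("My project", {
  // Ignored when run remotely: the playground supplies the dataset.
  data: () => [{ input: "hello", expected: "hello world" }],
  parameters: {
    // A prompt parameter renders as a prompt editor in the playground UI.
    main: {
      type: "prompt",
      description: "The prompt used to answer each input",
      default: {
        messages: [{ role: "user", content: "Answer concisely: {{input}}" }],
      },
    },
    // Plain parameters use Zod schemas; .describe() supplies the UI help text.
    prefix: z.string().describe("Prefix added to every output").default(""),
  },
  task: async (input, { parameters }) => {
    // See the parameter-access sketch in the next section for using parameters.main.
    return `${parameters.prefix}${input}`;
  },
  scores: [Levenshtein],
});
```

When this file is run with the `--dev` flag (described below), the `parameters` block is what the playground turns into UI controls.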
Remote eval parameters
Parameters define runtime configuration that users can modify in the playground without changing code. They appear as form controls in the UI. The parameter system works the same way in both languages but uses different syntax:

| Feature | TypeScript | Python |
|---|---|---|
| Parameter types | `type: "prompt"` for LLM prompts; `z.string()`, `z.boolean()`, `z.number()`, `z.array()`, `z.object()` with `.describe()` | `type: "prompt"` for LLM prompts; dictionary with `type: "string"`, `"boolean"`, `"number"`, `"array"`, `"object"` |
| Type definition | Zod schemas with chained methods | Dictionary with `type`, `description`, `default` fields |
| Parameter access | Direct property access: `parameters.prefix` | Dictionary access: `parameters["prefix"]` or `parameters.get("prefix")` |
| Prompt parameters | `type: "prompt"` with `messages` array directly in `default` | `type: "prompt"` with nested `prompt.messages` and `options` objects |
| Prompt usage | `parameters.main.build({ input: value })` | `parameters["main"].build(input=value)` |
| Async handling | `async`/`await` with promises | `async`/`await` with coroutines |
In both languages, you access parameter values through the `parameters` object in your task function.
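As a TypeScript-flavored sketch of those access patterns, a task body might use both kinds of parameters as shown below. The OpenAI client, the loose `any` typing, and the assumption that `build()` returns a payload you can pass directly to the client are illustrative, not guaranteed by the API.

```typescript
// Sketch of a task function that reads both plain and prompt parameters.
import { OpenAI } from "openai";

const client = new OpenAI();

const task = async (input: string, { parameters }: { parameters: any }) => {
  // Plain parameters: direct property access in TypeScript.
  const prefix: string = parameters.prefix;

  // Prompt parameters: build() substitutes the runtime input into the template;
  // here we assume the result can be passed straight to the chat completions call.
  const completion = await client.chat.completions.create(
    parameters.main.build({ input })
  );

  return `${prefix}${completion.choices[0].message.content ?? ""}`;
};
```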
Expose a remote eval
To make your eval accessible to Braintrust, run it with the `--dev` flag to start a local server:
Run `npx braintrust eval path/to/eval.ts --dev` to start the dev server at `http://localhost:8300`. Two flags control where the server listens:

- `--dev-host DEV_HOST`: The host to bind the dev server to. Defaults to `localhost`. Set to `0.0.0.0` to bind to all interfaces (be cautious about security when exposing beyond localhost).
- `--dev-port DEV_PORT`: The port to bind the dev server to. Defaults to `8300`.
Configure remote eval sources
To add remote eval endpoints beyond localhost, configure them at the project level:

- In your project, go to Configuration > Remote evals.
- Select Remote eval source.
- Enter the name and URL of your remote eval server.
- Select Create remote eval source.
Run a remote eval from a playground
After exposing your eval and configuring it in your project, you can use it in any playground:

- In a playground, select Task.
- Select Remote eval from the task type list.
- Choose your eval from the available sources (localhost or configured remote URLs).
- Configure parameters using the UI controls that were defined in your `parameters` object.
- Run the evaluation.
Demo
This video walks through exposing a remote eval to Braintrust and using it in a playground.

Limitations
- The dataset defined in your remote eval is ignored. Datasets are managed through the playground.
- Scorers defined in remote evals are combined with any scorers configured in the playground; both sets run.
Next steps
- Use playgrounds to compare and analyze results.
- Write scorers to evaluate outputs.
- Run evaluations programmatically.