Remote evals
If you have existing infrastructure for running evaluations that isn't easily adaptable to the Braintrust Playground, you can use remote evals to expose a remote endpoint. This lets you run evaluations directly in the playground, iterate quickly across datasets, run scorers, and compare results with other tasks. You can also run multiple instances of your remote eval side-by-side with different parameters and compare results. Parameters defined in the remote eval will be exposed in the playground UI.
Expose a remote eval
To expose an Eval running at a remote URL or on your local machine, pass the `--dev` flag. Run `npx braintrust eval parameters.eval.ts --dev` to start the dev server and expose it at `http://localhost:8300`.
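For context, a file like `parameters.eval.ts` is an ordinary `Eval()` definition that additionally declares a `parameters` block. The following is a minimal sketch, assuming parameter values are passed to the task via its second argument; the eval name, dataset, parameter, and scorer here are illustrative.

```typescript
// parameters.eval.ts -- illustrative sketch only; names, dataset, and defaults are made up.
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";
import { z } from "zod";

Eval("Remote eval demo", {
  // Placeholder dataset: when the eval runs from a playground, this is ignored
  // in favor of the playground's dataset (see Limitations below).
  data: () => [{ input: "World", expected: "Hello World" }],
  // Parameters declared here are exposed in the playground UI and validated with Zod.
  parameters: {
    prefix: z.string().default("Hello ").describe("Text prepended to the input"),
  },
  task: async (input, { parameters }) => `${parameters.prefix}${input}`,
  scores: [Levenshtein],
});
```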
The dev host and port can also be configured:
- `--dev-host DEV_HOST`: The host to bind the dev server to. Defaults to `localhost`. Set to `0.0.0.0` to bind to all interfaces.
- `--dev-port DEV_PORT`: The port to bind the dev server to. Defaults to `8300`.
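For example, to bind the dev server to all interfaces on a non-default port (the port number here is arbitrary):

```bash
npx braintrust eval parameters.eval.ts --dev --dev-host 0.0.0.0 --dev-port 8400
```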
Running a remote eval from a playground
To run a remote eval from a playground, select + Remote from the Task pane and choose from the evals exposed on localhost or from configured remote sources.
Configure remote eval sources
To configure remote eval source URLs for a project, navigate to Configuration > Remote evals. Then, select + Remote eval source to configure a new remote eval source for your project.
Language considerations
When implementing remote evals, be aware of these language-specific patterns:
| Feature | TypeScript | Python |
| --- | --- | --- |
| Parameter validation | Zod schemas (e.g., `z.string()`, `z.boolean()`) | Optional: Pydantic models with a single `value` field |
| Parameter access | Direct access (e.g., `parameters.prefix`) | Dictionary access (e.g., `parameters["prefix"]` or `parameters.get("prefix")`) |
| Validation approach | Automatic via Zod | Optional via Pydantic validators, or manual validation in the task |
| Prompt format | `messages` array | Nested `prompt` and `options` objects |
| Async handling | `async`/`await` with promises | `async`/`await` with coroutines |
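To make the TypeScript column concrete, here is a small self-contained sketch pairing a Zod parameter schema with an async task that uses direct parameter access; the parameter names, defaults, and task logic are hypothetical.

```typescript
import { z } from "zod";

// Hypothetical parameter schema: Zod validates values supplied by the playground.
const parameterSchema = {
  prefix: z.string().default("Answer: "),
  uppercase: z.boolean().default(false),
};

// Hypothetical async task using direct access (parameters.prefix), in contrast to
// the dictionary-style access (parameters["prefix"]) used in Python.
async function task(
  input: string,
  parameters: { prefix: string; uppercase: boolean },
): Promise<string> {
  const output = `${parameters.prefix}${input}`;
  return parameters.uppercase ? output.toUpperCase() : output;
}

// Usage: validate raw values with the schema, then run the task.
const parsed = {
  prefix: parameterSchema.prefix.parse("Re: "),
  uppercase: parameterSchema.uppercase.parse(true),
};
task("hello", parsed).then(console.log); // "RE: HELLO"
```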
Limitations
- The dataset defined in your remote eval will be ignored; the dataset configured in the playground is used instead.
- Scorers defined in remote evals will be concatenated with playground scorers.
- Remote evals are currently supported only in TypeScript and Python.