Remote evals
If you have existing infrastructure for running evaluations that isn't easily adaptable to the Braintrust Playground, you can use remote evals to expose a remote endpoint. This lets you run evaluations directly in the playground, iterate quickly across datasets, run scorers, and compare results with other tasks. You can also run multiple instances of your remote eval side-by-side with different parameters and compare results. Parameters defined in the remote eval will be exposed in the playground UI.
Remote evals are in beta. If you are on a hybrid deployment, remote evals are available starting with v0.0.66.
Expose a remote eval
To expose an Eval running at a remote URL or on your local machine, pass the --dev flag. For example, given the following file, run npx braintrust eval parameters.eval.ts --dev to start the dev server and expose http://localhost:8300. The dev host and port can also be configured:
- --dev-host DEV_HOST: The host to bind the dev server to. Defaults to localhost. Set to 0.0.0.0 to bind to all interfaces.
- --dev-port DEV_PORT: The port to bind the dev server to. Defaults to 8300.
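As a rough sketch of what such an eval file might contain, the following parameters.eval.ts defines a task, a scorer, and a parameter that would surface in the playground UI. The project name, dataset rows, parameter schema shape, and scorer are illustrative assumptions, not prescribed by this page; consult the SDK reference for the exact parameter definition format.

```typescript
// parameters.eval.ts -- a hypothetical example; the project name, dataset,
// parameter schema, and scorer below are illustrative assumptions.
import { Eval } from "braintrust";

Eval("Remote eval demo", {
  // Note: when run from the playground, this dataset is ignored and the
  // playground's dataset is used instead (see Limitations below).
  data: () => [{ input: "hello", expected: "HELLO" }],

  // Parameters defined in the eval are exposed in the playground UI,
  // so each playground instance can run with different values.
  parameters: {
    uppercase: {
      type: "boolean",
      default: true,
      description: "Whether to uppercase the input",
    },
  },

  // The task receives the input plus the current parameter values.
  task: async (input: string, { parameters }: { parameters: { uppercase: boolean } }) =>
    parameters.uppercase ? input.toUpperCase() : input,

  // Scorers defined here are combined with any playground scorers.
  scores: [
    ({ output, expected }: { output: string; expected: string }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
});
```

Running npx braintrust eval parameters.eval.ts --dev on this file would then expose it at http://localhost:8300, where the playground can discover it as a remote task.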
Running a remote eval from a playground
To run a remote eval from a playground, select + Remote from the Task pane and choose from the evals exposed in localhost or remote sources.
Configure remote eval sources
To configure remote eval source URLs for a project, navigate to Configuration > Remote evals. Then, select + Remote eval source to configure a new remote eval source for your project.
Limitations
- The dataset defined in your remote eval will be ignored; the playground's dataset is used instead.
- Scorers defined in remote evals will be concatenated with playground scorers.
- Remote evals are limited to TypeScript only. Python support is coming soon.