Run evaluations programmatically with the Eval() function, use the braintrust eval CLI command to run multiple evaluations from files, or create experiments in the Braintrust UI for no-code workflows. Integrate with CI/CD to catch regressions automatically.
Run with Eval()
The Eval() function runs an evaluation and creates an experiment:
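A minimal sketch, using the Levenshtein scorer from autoevals (the project name, data, and task here are placeholders):

```typescript
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("Say Hi Bot", {
  // Test cases: each has an input and the expected output
  data: () => [
    { input: "World", expected: "Hi World" },
    { input: "Braintrust", expected: "Hi Braintrust" },
  ],
  // The task under evaluation; in practice this calls your model or agent
  task: async (input) => `Hi ${input}`,
  // Scorers compare the task's output to the expected value
  scores: [Levenshtein],
});
```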
Eval() automatically:
- Creates an experiment in Braintrust
- Displays a summary in your terminal
- Populates the UI with results
- Returns summary metrics
Run with CLI
Use the braintrust eval command to run evaluations from files. It works with both the TypeScript and Python SDKs:
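For example (the ./evals path is a placeholder; point the command at your own eval files):

```bash
# TypeScript projects: run eval files via npx
npx braintrust eval ./evals

# Python projects: run eval files with the installed CLI
braintrust eval ./evals
```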
The CLI loads environment variables from the following files, in order: .env.development.local, .env.local, .env.development, .env.
Watch mode
Re-run evaluations automatically when files change:
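A sketch, assuming the CLI's --watch flag and an ./evals directory (adjust to your setup):

```bash
# Re-run evals in ./evals whenever their source files change
npx braintrust eval ./evals --watch
```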
Local testing mode
Run evaluations without sending logs to Braintrust for quick iteration:
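One way to do this is the CLI's --no-send-logs flag (see also the section on running local evals below); the path here is a placeholder:

```bash
# Run evals locally; results print to the terminal but no experiment is created
npx braintrust eval ./evals --no-send-logs
```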
Run in UI
Create and run experiments directly in the Braintrust UI without writing code:
- Navigate to Evaluations > Experiments.
- Click + Experiment or use the empty state form.
- Select one or more prompts, agents, or scorers to evaluate.
- Choose or create a dataset:
- Select existing dataset: Pick from datasets in your organization
- Upload CSV/JSON: Import test cases from a file
- Empty dataset: Create a blank dataset to populate manually later
- Add scorers to measure output quality.
- Click Create to execute the experiment.
Use playgrounds for rapid iteration
For iterative experimentation, use playgrounds to test prompts and models interactively, compare results side-by-side, and save winning configurations as experiments.
UI experiments time out after 15 minutes. For longer-running evaluations, use the SDK or CLI approach.
Run in CI/CD
Integrate evaluations into your CI/CD pipeline to catch regressions automatically.
GitHub Actions
Use the braintrustdata/eval-action to run evaluations on every pull request:
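A sketch of a workflow, assuming the action's api_key and runtime inputs (check the action's README for the exact input names and current version):

```yaml
name: Run evals
on: pull_request

jobs:
  evals:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write # allow the action to comment results on the PR
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - name: Run evals
        uses: braintrustdata/eval-action@v1
        with:
          api_key: ${{ secrets.BRAINTRUST_API_KEY }}
          runtime: node
```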

Other CI systems
For other CI systems, run evaluations as a standard command. Make sure the BRAINTRUST_API_KEY environment variable is set:
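A generic CI step might look like this (the secret variable name and ./evals path are examples):

```bash
# BRAINTRUST_API_KEY should come from your CI provider's secret store
export BRAINTRUST_API_KEY="$CI_SECRET_BRAINTRUST_API_KEY"
npm ci
npx braintrust eval ./evals
```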
Run remotely
Expose evaluations running on remote servers or local machines using dev mode:
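A sketch using the braintrust dev command; treat the exact arguments as an assumption and check braintrust dev --help for your SDK version:

```bash
# Start dev mode so the Braintrust UI can trigger evals hosted on this machine
npx braintrust dev ./evals
```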
Configure experiments
Customize experiment behavior with options:
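For example, in the TypeScript SDK, options such as experimentName, metadata, and maxConcurrency sit alongside data, task, and scores (option names here follow the TypeScript SDK; check the SDK reference for the full list):

```typescript
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("Say Hi Bot", {
  data: () => [{ input: "World", expected: "Hi World" }],
  task: async (input) => `Hi ${input}`,
  scores: [Levenshtein],
  experimentName: "greeting-v2", // name shown in the Braintrust UI
  metadata: { model: "gpt-4o" }, // experiment-level tags for filtering and comparison
  maxConcurrency: 5, // cap the number of test cases run in parallel
});
```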
Run trials
Run each input multiple times to measure variance and get more robust scores. Braintrust intelligently aggregates results by bucketing test cases with the same input value:
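A sketch using the TypeScript SDK's trialCount option (the Python SDK exposes an equivalent option):

```typescript
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("Say Hi Bot", {
  data: () => [{ input: "World", expected: "Hi World" }],
  task: async (input) => `Hi ${input}`,
  scores: [Levenshtein],
  trialCount: 3, // run every input 3 times; results are bucketed by input
});
```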
Run local evals without sending logs
Run evaluations locally without creating experiments or sending data to Braintrust by passing the --no-send-logs flag to the CLI command (see the local testing mode example above).
Next steps
- Interpret results from your experiments
- Compare experiments to measure improvements
- Write scorers to measure quality
- Use playgrounds for no-code experimentation