Autoevals
The open-source autoevals library provides several pre-built scorers that implement standard evaluation methods you can start using immediately. They are a strong starting point for a wide range of evaluation tasks. Some autoeval scorers require configuration before they work well; for example, you might need to define expected outputs or set task-specific parameters. To edit an autoeval scorer, you must copy it first. While autoevals are a great way to get started, you may eventually need your own custom scorers for more advanced use cases.
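For example, here is a minimal sketch of calling one of the pre-built scorers from the autoevals Python package (Levenshtein is shown, but the same call pattern applies to the other scorers):

```python
from autoevals import Levenshtein

# Pre-built scorers are callables: pass the model's output and the
# expected value, and get back a result with a score between 0 and 1.
scorer = Levenshtein()
result = scorer(output="Paris, France", expected="Paris")

print(result.score)  # closer to 1 means a closer string match
```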
Create a custom scorer
For more specialized evals, you can create custom scorers in TypeScript, Python, or as an LLM-as-a-judge. Code-based scorers (TypeScript/Python) are highly customizable and can return scores based on your exact requirements, while LLM-as-a-judge scorers use prompts to evaluate outputs. You can create custom scorers either in the Braintrust UI or via the command line using braintrust push. These scorers are then available to use as functions throughout your project.
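For the CLI route, a minimal sketch of a file you could upload with braintrust push is below. It assumes the Python SDK's projects.create / scorers.create push helpers; check the SDK reference if the helper names or parameters differ in your version.

```python
# my_scorer.py -- register a code-based scorer, then run:
#   braintrust push my_scorer.py
import braintrust

# Assumption: the SDK exposes project builders for pushing functions.
project = braintrust.projects.create(name="my-project")


def exact_match(output, expected):
    """Return 1.0 when the output matches the expected value exactly."""
    return 1.0 if output == expected else 0.0


project.scorers.create(
    name="Exact match",
    slug="exact-match",
    description="Checks whether the output equals the expected value.",
    handler=exact_match,
)
```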
Navigate to Scorers > + Scorer to create custom scorers in the UI.


The Playground lets you iterate quickly on prompts while running evaluations, making it a great tool for testing and refining your models and prompts.
TypeScript and Python scorers
Add your custom code in the TypeScript or Python tab. Your scorer runs in a sandboxed environment. Scorers created via the UI run with these available packages:
anthropic, asyncio, autoevals, braintrust, json, math, openai, re, requests, typing
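For instance, a UI-defined Python scorer restricted to that package list might look like the following sketch, which uses only json and re. The function name and signature here are assumptions; match them to the template the scorer editor provides.

```python
import json
import re


def handler(output, expected):
    """Score 1.0 if the output is valid JSON whose "answer" field matches
    the expected answer (case- and whitespace-insensitive), else 0.0."""

    def normalize(text):
        return re.sub(r"\s+", " ", str(text)).strip().lower()

    try:
        parsed = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return 0.0

    return 1.0 if normalize(parsed.get("answer", "")) == normalize(expected) else 0.0
```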

LLM-as-a-judge scorers
In addition to code-based scorers, you can also create LLM-as-a-judge scorers through the UI. Define a prompt that evaluates the AI’s output and maps its choices to specific scores. You can also configure whether to use techniques like chain-of-thought (CoT) reasoning for more complex evaluations.
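The same idea is available in code; here is a sketch built on autoevals' LLMClassifier, where the prompt constrains the judge to named choices, choice_scores maps each choice to a number, and use_cot toggles chain-of-thought. Parameter names follow the autoevals Python API; verify them against your installed version (an API key for the judge model is also required).

```python
from autoevals import LLMClassifier

relevance = LLMClassifier(
    name="Relevance",
    prompt_template=(
        "Question: {{input}}\n"
        "Answer: {{output}}\n\n"
        "Is the answer relevant to the question? Respond with A (relevant), "
        "B (partially relevant), or C (not relevant)."
    ),
    choice_scores={"A": 1.0, "B": 0.5, "C": 0.0},  # map judge choices to scores
    use_cot=True,  # ask the judge to reason before choosing
)

result = relevance(
    "Paris is the capital of France.",       # output being judged
    "Paris",                                 # expected (available as {{expected}})
    input="What is the capital of France?",  # extra template variable
)
print(result.score)
```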

Pass thresholds
Pass thresholds let you define a minimum score (between 0 and 1) that a scorer must achieve for a result to be considered passing, which helps you quickly identify which evaluations meet your quality standards. The pass threshold is optional; you can set or adjust it when creating or editing a scorer using a slider in the scorer configuration form that ranges from 0 to 1. When creating scorers via the CLI, you can set a pass threshold with the __pass_threshold metadata field, as shown in the sketch after the list below.
When a scorer has a pass threshold configured:
- Scores that meet or exceed the threshold are marked as passing and displayed with green highlighting and a checkmark
- Scores below the threshold are marked as failing and displayed with red highlighting
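For scorers defined in code, here is a hedged sketch building on the earlier braintrust push example. It assumes the threshold is supplied as scorer metadata at creation time; confirm the exact placement of __pass_threshold against the SDK reference.

```python
import braintrust

project = braintrust.projects.create(name="my-project")


def exact_match(output, expected):
    return 1.0 if output == expected else 0.0


project.scorers.create(
    name="Exact match",
    slug="exact-match",
    handler=exact_match,
    # Assumption: the pass threshold is set via scorer metadata at creation.
    metadata={"__pass_threshold": 0.8},
)
```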
Use a scorer in the UI
You can use both autoevals and custom scorers in a Braintrust playground. In your playground, navigate to Scorers and select from the list of available scorers. You can also create a new custom scorer from this menu.