POST /v1/eval
Launch an eval
curl --request POST \
  --url https://api.braintrust.dev/v1/eval \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "project_id": "<string>",
  "data": {
    "dataset_id": "<string>",
    "_internal_btql": {}
  },
  "task": {
    "function_id": "<string>",
    "version": "<string>"
  },
  "scores": [
    {
      "function_id": "<string>",
      "version": "<string>"
    }
  ],
  "experiment_name": "<string>",
  "metadata": {},
  "parent": {
    "object_type": "project_logs",
    "object_id": "<string>",
    "row_ids": {
      "id": "<string>",
      "span_id": "<string>",
      "root_span_id": "<string>"
    },
    "propagated_event": {}
  },
  "stream": true,
  "trial_count": 123,
  "is_public": true,
  "timeout": 123,
  "max_concurrency": 10,
  "base_experiment_name": "<string>",
  "base_experiment_id": "<string>",
  "git_metadata_settings": {
    "collect": "all",
    "fields": [
      "commit"
    ]
  },
  "repo_info": {
    "commit": "<string>",
    "branch": "<string>",
    "tag": "<string>",
    "dirty": true,
    "author_name": "<string>",
    "author_email": "<string>",
    "commit_message": "<string>",
    "commit_time": "<string>",
    "git_diff": "<string>"
  },
  "strict": true,
  "stop_token": "<string>",
  "extra_messages": "<string>",
  "tags": [
    "<string>"
  ],
  "mcp_auth": {}
}
'
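
The same request can be made from any HTTP client. Here is a minimal Python sketch using the requests library; the placeholder IDs are assumptions you must replace with your own values:

import os

import requests

# Sketch of launching an eval; substitute real IDs for the placeholders.
resp = requests.post(
    "https://api.braintrust.dev/v1/eval",
    headers={
        "Authorization": f"Bearer {os.environ['BRAINTRUST_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "project_id": "<project-id>",
        "data": {"dataset_id": "<dataset-id>"},
        "task": {"function_id": "<task-function-id>"},
        "scores": [{"function_id": "<scorer-function-id>"}],
    },
)
resp.raise_for_status()
summary = resp.json()  # the experiment summary, shaped as shown below

When the eval completes without streaming, the endpoint returns an experiment summary shaped like this: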
{
  "project_name": "<string>",
  "experiment_name": "<string>",
  "project_url": "<string>",
  "experiment_url": "<string>",
  "comparison_experiment_name": "<string>",
  "scores": {},
  "metrics": {}
}

Authorizations

Authorization
string
header
required

Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header on your HTTP request. You can create an API key on the Braintrust organization settings page.

Body

application/json

Eval launch parameters

project_id
string
required

Unique identifier for the project to run the eval in

data
dataset_id · object
required

The dataset to use

task
function_id · object
required

The function to evaluate

scores
(function_id · object | project_slug · object | global_function · object | prompt_session_id · object | inline_code · object | inline_function · object | inline_prompt · object)[]
required

The functions to score the eval on

Options for identifying a function

experiment_name
string

An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.

metadata
object

Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.

parent

Options for tracing the evaluation

stream
boolean

Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
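
A hedged sketch of consuming the streamed form with Python's requests library; it assumes the two events arrive as lines in the response body, since the exact wire format is not specified here, so treat it as illustrative:

import os

import requests

payload = {
    "project_id": "<project-id>",
    "data": {"dataset_id": "<dataset-id>"},
    "task": {"function_id": "<task-function-id>"},
    "scores": [{"function_id": "<scorer-function-id>"}],
    "stream": True,
}

# Illustrative only: print whatever event lines the streamed response emits.
with requests.post(
    "https://api.braintrust.dev/v1/eval",
    headers={"Authorization": f"Bearer {os.environ['BRAINTRUST_API_KEY']}"},
    json=payload,
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode())  # expect a start event, then a completion event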

trial_count
number | null

The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.

is_public
boolean | null

Whether the experiment should be public. Defaults to false.

timeout
number | null

The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.

max_concurrency
number | null
default:10

The maximum number of tasks/scorers that will be run concurrently. Defaults to 10. If null is provided, no max concurrency will be used.
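
For example, to run with no concurrency cap at all, pass null explicitly rather than omitting the field (omitting it applies the default of 10):

{
  "max_concurrency": null
}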

base_experiment_name
string | null

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

base_experiment_id
string | null

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

git_metadata_settings
object

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

repo_info
object

Optionally, explicitly specify the git metadata for this experiment. This takes precedence over git_metadata_settings if specified.
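
If you want to populate repo_info yourself (for example, in a CI job), a sketch like the following can gather the fields from a local checkout. The git helper is a hypothetical convenience, and which fields you send is up to you:

import subprocess

def git(*args: str) -> str:
    # Run a git command and return its trimmed stdout.
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout.strip()

# Sketch: collect repo_info fields from the current checkout.
repo_info = {
    "commit": git("rev-parse", "HEAD"),
    "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
    "dirty": bool(git("status", "--porcelain")),
    "author_name": git("log", "-1", "--format=%an"),
    "author_email": git("log", "-1", "--format=%ae"),
    "commit_message": git("log", "-1", "--format=%s"),
    "commit_time": git("log", "-1", "--format=%cI"),
}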

strict
boolean | null

If true, throws an error if one of the variables in the prompt is not present in the input.

stop_token
string | null

The token to stop the run

extra_messages
string

A template path of extra messages to append to the conversation. These messages will be appended to the end of the conversation, after the last message.

tags
string[]

Optional tags that will be added to the experiment.

mcp_auth
object

Response

200 - application/json

Eval launch response

Summary of an experiment

project_name
string
required

Name of the project that the experiment belongs to

experiment_name
string
required

Name of the experiment

project_url
string<uri>
required

URL to the project's page in the Braintrust app

experiment_url
string<uri>
required

URL to the experiment's page in the Braintrust app

comparison_experiment_name
string | null

The experiment against which scores are baselined

scores
object

Summary of the experiment's scores

metrics
object

Summary of the experiment's metrics
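
Because the response is a flat summary, it is straightforward to gate on in CI. A hedged sketch: the scores field is documented above only as an object, so the per-score "score" key and the 0.8 threshold below are assumptions to verify against a real response:

def check_summary(summary: dict, threshold: float = 0.8) -> None:
    # Print where the experiment lives, then fail if any per-score summary
    # reports a numeric "score" below the chosen bar.
    print(f"Experiment: {summary['experiment_name']} -> {summary['experiment_url']}")
    failing = [
        name
        for name, s in summary.get("scores", {}).items()
        if isinstance(s, dict)
        and isinstance(s.get("score"), (int, float))
        and s["score"] < threshold
    ]
    if failing:
        raise SystemExit(f"Scores below {threshold}: {failing}")

Called as check_summary(resp.json()) after a non-streaming launch like the one sketched at the top of this page.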