POST /v1/eval
Launch an eval
curl --request POST \
  --url https://api.braintrust.dev/v1/eval \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "project_id": "<string>",
  "data": {
    "dataset_id": "<string>",
    "_internal_btql": {}
  },
  "task": {
    "function_id": "<string>",
    "version": "<string>"
  },
  "scores": [
    {
      "function_id": "<string>",
      "version": "<string>"
    }
  ],
  "experiment_name": "<string>",
  "metadata": {},
  "parent": {
    "object_type": "project_logs",
    "object_id": "<string>",
    "row_ids": {
      "id": "<string>",
      "span_id": "<string>",
      "root_span_id": "<string>"
    },
    "propagated_event": {}
  },
  "stream": true,
  "trial_count": 123,
  "is_public": true,
  "timeout": 123,
  "max_concurrency": 10,
  "base_experiment_name": "<string>",
  "base_experiment_id": "<string>",
  "git_metadata_settings": {
    "collect": "all",
    "fields": [
      "commit"
    ]
  },
  "repo_info": {
    "commit": "<string>",
    "branch": "<string>",
    "tag": "<string>",
    "dirty": true,
    "author_name": "<string>",
    "author_email": "<string>",
    "commit_message": "<string>",
    "commit_time": "<string>",
    "git_diff": "<string>"
  },
  "strict": true,
  "stop_token": "<string>",
  "extra_messages": "<string>",
  "tags": [
    "<string>"
  ]
}'
{
  "project_name": "<string>",
  "experiment_name": "<string>",
  "project_url": "<string>",
  "experiment_url": "<string>",
  "comparison_experiment_name": "<string>",
  "scores": {},
  "metrics": {}
}

Authorizations

Authorization
string
header
required

Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header on your HTTP request. You can create an API key in the Braintrust organization settings page.
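
For example, a minimal TypeScript sketch of calling this endpoint with the bearer header (the BRAINTRUST_API_KEY environment variable name is just a convention, not part of the API):

// Sketch: send an authenticated request to the eval endpoint.
// BRAINTRUST_API_KEY is an assumed environment variable name.
const apiKey = process.env.BRAINTRUST_API_KEY;

const response = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ /* eval launch parameters, see Body below */ }),
});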

Body

application/json

Eval launch parameters

project_id
string
required

Unique identifier for the project to run the eval in

data
object
required

The dataset to use, specified as one of the following variants:

  • dataset_id
  • project_dataset_name
  • dataset_rows
task
object
required

The function to evaluate, specified as one of the following variants:

  • function_id
  • project_slug
  • global_function
  • prompt_session_id
  • inline_code
  • inline_function
  • inline_prompt
scores
(function_id · object | project_slug · object | global_function · object | prompt_session_id · object | inline_code · object | inline_function · object | inline_prompt · object)[]
required

The functions to score the eval on. Each entry is one of the following variants; see the request-body sketch after this list:

  • function_id
  • project_slug
  • global_function
  • prompt_session_id
  • inline_code
  • inline_function
  • inline_prompt
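
Putting the three required fields together, a minimal request body might look like the following sketch in TypeScript. It uses only the dataset_id and function_id variants shown in the curl example above; all IDs and the version values are placeholders.

// Sketch of the required fields, mirroring the shapes in the curl example.
const body = {
  project_id: "<project id>",
  data: { dataset_id: "<dataset id>" },
  task: { function_id: "<task function id>", version: "<version>" },
  scores: [{ function_id: "<scorer function id>", version: "<version>" }],
};
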
experiment_name
string

An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.

metadata
object

Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.

parent

Options for tracing the evaluation (span parent properties).

stream
boolean

Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
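
When stream is true, the response body arrives incrementally. A rough TypeScript sketch of consuming it, reusing the body object from the sketch above (the event framing is not specified on this page, so this just reads and logs raw chunks; verify the actual format against a real response):

// Sketch: launch an eval with stream: true and read the raw response chunks.
const res = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ ...body, stream: true }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}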

trial_count
number | null

The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.

is_public
boolean | null

Whether the experiment should be public. Defaults to false.

timeout
number | null

The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.

max_concurrency
number | null
default:10

The maximum number of tasks/scorers that will be run concurrently. Defaults to 10. If null is provided, no max concurrency will be used.

base_experiment_name
string | null

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

base_experiment_id
string | null

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

git_metadata_settings
object | null

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

repo_info
object | null

Optionally specify the git metadata for this experiment explicitly, i.e. the state of the repo when the experiment was created. This takes precedence over git_metadata_settings if specified.

strict
boolean | null

If true, throw an error if one of the variables in the prompt is not present in the input

stop_token
string | null

The token to stop the run

extra_messages
string

A template path of extra messages to append to the conversation. These messages will be appended to the end of the conversation, after the last message.

tags
string[]

Optional tags that will be added to the experiment.

Response

200 - application/json

Eval launch response

Summary of an experiment

project_name
string
required

Name of the project that the experiment belongs to

experiment_name
string
required

Name of the experiment

project_url
string<uri>
required

URL to the project's page in the Braintrust app

experiment_url
string<uri>
required

URL to the experiment's page in the Braintrust app

comparison_experiment_name
string | null

The experiment that scores are baselined against

scores
object | null

Summary of the experiment's scores

metrics
object | null

Summary of the experiment's metrics
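
For reference, the documented response fields map to roughly this TypeScript shape (a sketch; the contents of scores and metrics are not detailed on this page, so they are left as loose records):

// Sketch of the 200 response shape, derived from the fields documented above.
interface EvalLaunchResponse {
  project_name: string;
  experiment_name: string;
  project_url: string; // URL to the project's page in the Braintrust app
  experiment_url: string; // URL to the experiment's page in the Braintrust app
  comparison_experiment_name?: string | null;
  scores?: Record<string, unknown> | null;
  metrics?: Record<string, unknown> | null;
}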