Authorizations
Most Braintrust endpoints are authenticated by passing your API key in the `Authorization` header of your HTTP request, as `Authorization: Bearer [api_key]`. You can create an API key on the Braintrust organization settings page.
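For example, an authenticated launch request with curl might look like the sketch below. The endpoint path and the shape of the JSON body are illustrative assumptions; the actual parameters are described in the Body section that follows.

```bash
# Sketch: launch an eval, authenticating with the API key as a Bearer token.
# The request body (elided here) is described in the Body section below.
curl -X POST https://api.braintrust.dev/v1/eval \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ ... }'
```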
Body
Eval launch parameters
Unique identifier for the project to run the eval in
The dataset to use, specified as one of the following variants:
- dataset_id
- project_dataset_name
- dataset_rows
The function to evaluate, specified as one of the following variants:
- function_id
- project_slug
- global_function
- prompt_session_id
- inline_code
- inline_function
- inline_prompt
The functions to score the eval on, each specified as one of the following variants (see the request sketch after this list):
- function_id
- project_slug
- global_function
- prompt_session_id
- inline_code
- inline_function
- inline_prompt
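As a rough sketch, a request body combining these unions might look like the following. The variant keys (`dataset_id`, `function_id`, `global_function`) come from the variant lists above; the top-level field names (`project_id`, `data`, `task`, `scores`) are assumptions, since this page does not show the body's key names.

```json
{
  "project_id": "<project-uuid>",
  "data": { "dataset_id": "<dataset-uuid>" },
  "task": { "function_id": "<task-function-uuid>" },
  "scores": [
    { "function_id": "<scorer-function-uuid>" },
    { "global_function": "Factuality" }
  ]
}
```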
An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.
Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.
Options for tracing the evaluation (span parent properties)
Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
Whether the experiment should be public. Defaults to false.
The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.
The maximum number of tasks/scorers that will be run concurrently. Defaults to 10. If null is provided, concurrency is unlimited.
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
Optional settings for collecting git metadata. By default, all git metadata fields allowed in the org-level settings are collected.
Metadata about the state of the repo when the experiment was created
Optionally explicitly specify the git metadata for this experiment. This takes precedence over gitMetadataSettings if specified.
If true, an error is thrown when a variable in the prompt is not present in the input
A token that can be used to stop the run
A template path of extra messages to append to the conversation. These messages are appended after the last message in the conversation.
Optional tags that will be added to the experiment.
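Putting the optional parameters together, a fuller launch request might look like the sketch below. The snake_case key names for the optional parameters (`experiment_name`, `metadata`, `stream`, `trial_count`, `is_public`, `timeout`, `max_concurrency`, `base_experiment_name`, `tags`) are plausible guesses inferred from the descriptions above, not names confirmed by this page.

```json
{
  "project_id": "<project-uuid>",
  "data": { "dataset_id": "<dataset-uuid>" },
  "task": { "function_id": "<task-function-uuid>" },
  "scores": [{ "global_function": "Factuality" }],
  "experiment_name": "nightly-regression",
  "metadata": { "model": "gpt-4o", "git_branch": "main" },
  "stream": false,
  "trial_count": 3,
  "is_public": false,
  "timeout": 600000,
  "max_concurrency": 10,
  "base_experiment_name": "nightly-baseline",
  "tags": ["nightly", "regression"]
}
```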
Response
Eval launch response
Summary of an experiment
Name of the project that the experiment belongs to
Name of the experiment
URL to the project's page in the Braintrust app
URL to the experiment's page in the Braintrust app
The experiment against which scores are baselined
Summary of the experiment's scores
Summary of the experiment's metrics
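For orientation, a non-streaming response summary might look roughly like this. All key names and the score/metric entries are illustrative assumptions based on the field descriptions above, not a schema confirmed by this page.

```json
{
  "project_name": "my-project",
  "experiment_name": "nightly-regression",
  "project_url": "https://www.braintrust.dev/app/my-org/p/my-project",
  "experiment_url": "https://www.braintrust.dev/app/my-org/p/my-project/experiments/nightly-regression",
  "base_experiment_name": "nightly-baseline",
  "scores": {
    "Factuality": { "score": 0.82, "diff": 0.04, "improvements": 7, "regressions": 2 }
  },
  "metrics": {
    "duration": { "metric": 1.42, "unit": "s" }
  }
}
```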