Agent: A type of task that can be used in playgrounds. Consists of a chained sequence of prompts that automate complex workflows, where one LLM call's output feeds into the next. See the Workflows guide.
Automation: A configured workflow that lets you trigger actions based on specific events in Braintrust, for example sending an alert or batch-exporting data. See the Automations guide.
SQL: Queries for filtering and analyzing eval results, logs, and metrics. Braintrust also supports BTQL, an alternative pipe-delimited syntax. See the SQL reference.
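To make the idea concrete, here is a minimal sketch of filtering eval results with SQL. The table and column names (`eval_results`, `model`, `score`) are hypothetical, not Braintrust's actual schema, and the query runs against an in-memory SQLite database purely for illustration.

```python
import sqlite3

# Hypothetical eval-results table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eval_results (id INTEGER, model TEXT, score REAL)")
conn.executemany(
    "INSERT INTO eval_results VALUES (?, ?, ?)",
    [(1, "gpt-4o", 0.92), (2, "gpt-4o", 0.55), (3, "claude", 0.88)],
)

# Keep only results that meet a score threshold, highest first,
# as you might when narrowing down an eval run.
rows = conn.execute(
    "SELECT id, model, score FROM eval_results "
    "WHERE score >= 0.8 ORDER BY score DESC"
).fetchall()
print(rows)  # → [(1, 'gpt-4o', 0.92), (3, 'claude', 0.88)]
```

The same kind of filter could be expressed in BTQL's pipe-delimited form; consult the SQL reference for the exact syntax.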
Log: An instance of a live production or test interaction. Logs can include inputs, outputs, expected values, metadata, errors, scores, and tags. Scorers can also be applied to live logs to conduct online evaluations. See the Logs guide.
Playground: An interactive space where you can prototype, iterate on, and compare multiple prompts and models against a dataset in real time. A playground can be saved as an experiment. See the Playgrounds guide.
Remote eval: An evaluation executed on external or third-party systems or services, allowing you to evaluate tasks in environments outside Braintrust. See the Remote evals guide.
Task: A single unit of work, typically composed of an input, output, expected result, and evaluation. Tasks often appear within dataset or eval detail screens.
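The shape described above can be sketched as a plain record plus a scorer. The field names and the `exact_match` helper are hypothetical, chosen only to illustrate how an input, output, and expected result combine into an evaluated unit.

```python
# Hypothetical shape of a single task record; field names are illustrative.
task = {
    "input": "What is 2 + 2?",
    "output": "4",       # what the model produced
    "expected": "4",     # the reference answer
}

def exact_match(output: str, expected: str) -> float:
    """A minimal scorer: 1.0 if the output matches the expected value."""
    return 1.0 if output == expected else 0.0

task["score"] = exact_match(task["output"], task["expected"])
print(task["score"])  # → 1.0
```

In practice a scorer can be any function (or model-graded rubric) that maps output and expected values to a numeric score.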
Thread view: A visualization mode for traces that displays the interaction as a conversation thread, showing messages, tool calls, and scores in chronological order. Thread view is particularly useful for debugging LLM conversations and multi-turn interactions. See the Traces guide.
Timeline view: A visualization mode for traces that displays spans as a Gantt chart, where horizontal bars represent the duration of each operation. Timeline view is useful for identifying performance bottlenecks and understanding execution flow. See the Traces guide.
Trace: An individual recorded session detailing each step of an interaction, including model calls, tool invocations, and intermediate outputs. Traces aid debugging and root-cause analysis. See the Traces guide.