Loop is Braintrust’s AI assistant that helps teams query, analyze, and improve AI in production. Use Loop to search logs semantically, generate filters from natural language, bootstrap scorers, optimize experiments, generate datasets, and more.

What you can do with Loop

Loop operates on data sources from across your project, using natural language to summarize, generate, modify, and optimize your observability and evaluation tools based on real application data.

Loop chat is available in Playgrounds, Logs, Datasets, Experiments, Scorers, Prompts, and the BTQL sandbox. Select the Loop button in the bottom right corner of a page to open a chat window, or search for “Loop” in product search.

Loop keeps track of your queries in a queue, so you can ask multiple follow-ups while it’s running. Press Enter to interrupt the current operation and execute the next query in the queue. Loop also keeps a history of your conversations, so you can edit and re-run earlier chat messages and make inline model and tool changes.

Setup

Select a model

Loop uses the AI models available in your Braintrust account via the Braintrust AI Proxy. We currently support the following models:
  • claude-4.5-sonnet (recommended)
  • claude-4.5-haiku
  • claude-4-sonnet
  • claude-4.1-opus
  • gpt-5
  • gpt-4.1
  • o3
  • o4-mini
Change the model in the dropdown at the bottom of the Loop chat window. Administrators can control which models are available to the organization: on your organization’s Settings page, select Loop and choose the models you want to allow.

Toggle auto-accept

By default, Loop asks you for confirmation before executing certain tool calls, like running an eval or editing a prompt. To turn on auto-accept, select the settings button in your Loop chat window and select Auto-accept edits.

Select data sources

Loop can access different parts of your project, which lets you generate prompts based on datasets, optimize scorers based on eval results, and run other multidimensional operations. In a chat, Loop prompts you to select a data source when you make a request that references one. For example, if you tell Loop to “use a different dataset” from a playground, Loop asks you to select a dataset from a dropdown menu. You can also give Loop access to data sources directly in the chat window: select the add context icon and search for the data sources you want Loop to query.

Generate and optimize prompts

Use Loop to generate, optimize, and edit your prompts. Loop can work with prompts from a Prompt or Playground page.

Generate prompts

Loop can generate prompts from scratch. On the Prompts page, select + Prompt to add a new, blank prompt. On a Playground page, add an empty Task. Then tell Loop to generate a prompt based on your request and it populates the prompt editor with the generated prompt. Example queries:
  • “Generate a prompt for a chatbot that can answer questions about the product”
  • “Write a good prompt based on recent logs”

Edit and optimize prompts

Loop can optimize existing prompts from a Prompt or Playground page. Ask Loop to optimize the prompt based on your request and it will suggest improvements. In a playground, select the Loop icon in the top right corner of a task to automatically select the task as a data source in the Loop chat window or quickly optimize the prompt. Example queries:
  • “Add few-shot examples based on project logs”
  • “Optimize the prompts in this playground”
  • “Improve this prompt to make it friendlier and more engaging”

Generate and optimize scorers

Use Loop to generate, optimize, and edit your scorers. Loop can work with scorers from Scorer, Prompt, Experiment, Dataset, or Playground pages. You can also generate scorers from the Logs page.

Generate scorers

Loop can generate both code-based and LLM-as-a-judge scorers from scratch. On the Scorers page, select + Scorer to add a new, blank scorer, then tell Loop to generate a scorer based on your request and it populates the scorer editor with the generated scorer. If you don’t specify the type of scorer, Loop generates an LLM-as-a-judge scorer.

On other pages, tell Loop to generate a new scorer and Loop saves it to your project, gathering context from the resources on the page to build the scorer.

Loop can currently only generate code-based scorers for one language at a time, so specify the language you want when you generate a code-based scorer (see the sketch after the examples below). Example queries:
  • “Write a good LLM-as-a-judge scorer for a chatbot that can answer questions about the product”
  • “Generate a code-based scorer based on project logs”
  • “Generate a code-based scorer based on this dataset”
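A code-based scorer is just a function that takes eval fields and returns a score between 0 and 1. As a minimal sketch of that shape, assuming TypeScript and a hypothetical exact-match metric (not Loop’s literal output):

```typescript
// Hypothetical code-based scorer: exact string match.
// Receives the eval row's output and expected fields and
// returns a named score between 0 and 1.
function exactMatch({
  output,
  expected,
}: {
  output: string;
  expected: string;
}) {
  return {
    name: "ExactMatch",
    score: output.trim() === expected.trim() ? 1 : 0,
  };
}
```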

Edit and optimize scorers

Loop can optimize existing scorers from a Scorer or Playground page. Ask Loop to optimize the scorer based on your request and it suggests improvements. If you ask Loop to optimize a built-in scorer from a Playground page, it suggests improvements and creates a new scorer with the changes.

Loop can also use manually labeled target classifications from evaluations in the playground to adjust a scorer’s classification behavior. Select the rows where the scorer did not perform as expected, then select Tune scorer. Choose the desired classification, provide optional additional instructions, and submit to Loop. Loop adjusts the scorer based on the provided context. Example queries:
  • “Optimize the Helpfulness scorer”
  • “Improve the Accuracy scorer based on the first prompt”
  • “Adjust the scorer to be more lenient”

Generate, optimize, and analyze datasets

Use Loop to generate, optimize, and analyze your datasets. Loop can analyze a dataset from a Dataset or Playground page and generate and modify datasets from any other project page.

Generate datasets

Loop can generate datasets from scratch based on parameters you provide, or it can create a dataset tailored to a specific context in your project. To tailor a dataset, generate it from the relevant page in your project. Example queries:
  • “Generate a dataset from the highest-scoring examples in this experiment”
  • “Create a dataset with the most common inputs in the logs”
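Rows that Loop generates follow the standard Braintrust dataset shape of input, expected, and metadata fields. As a rough sketch of adding equivalent rows yourself, assuming the TypeScript SDK and hypothetical project and dataset names:

```typescript
import { initDataset } from "braintrust";

// Hypothetical project and dataset names for illustration.
const dataset = initDataset("My project", { dataset: "Common inputs" });

// Each row uses the standard input/expected/metadata shape.
dataset.insert({
  input: "What is the return policy?",
  expected: "Returns are accepted within 30 days.",
  metadata: { source: "logs" },
});

// Ensure queued rows are written before the process exits.
await dataset.flush();
```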

Analyze and optimize datasets

On a Dataset page or Playground page, you can ask Loop to analyze the dataset and generate a report. This gives you a high-level overview of the dataset, including its content, characteristics, strengths, and recommendations for improvement. You can then ask Loop to optimize the dataset based on the report, or modify it based on your requests. Example queries:
  • “Summarize this dataset”
  • “Add five more rows”
  • “What edge cases are missing from this dataset?”

Summarize and improve experiments

Use Loop to summarize the results of your experiments, drill down into specific eval rows, and get suggestions for changes and improvements. On your Experiments page, select a single experiment or multiple experiments to compare. On the Experiment page that opens, ask Loop to summarize the results of the experiments and provide insights. Use these insights to generate or update your datasets, prompts, and scorers. You can also ask Loop to provide sample code for an improved experiment that you can add to your application and run to test the changes (see the sketch after the examples below).

Loop can also analyze specific eval rows and provide insights or suggest improvements. For example, Loop can identify eval rows where a scorer performed poorly and generate a new dataset with those rows. It then gives you suggestions for how to use the dataset to improve your application. Example queries:
  • “What improved from the last experiment?”
  • “Categorize the errors in this experiment”
  • “Pick the best scorers for this task”
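Experiments like these run through the Braintrust Eval entry point. As a minimal sketch of what the sample code might look like, assuming the TypeScript SDK with hypothetical project, data, and scorer choices (Loop’s actual output will differ):

```typescript
import { Eval } from "braintrust";
import { Factuality } from "autoevals";

// Hypothetical project name and inline data for illustration.
Eval("My project", {
  data: () => [
    { input: "What is the capital of France?", expected: "Paris" },
  ],
  // Replace with a call into your application logic.
  task: async (input) => {
    return `You asked: ${input}`;
  },
  scores: [Factuality],
});
```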

Analyze and filter project logs

Use Loop to analyze and filter your project’s logs. Loop understands the shape of your logs data and makes arbitrary queries to answer questions and provide insights. You can then use these insights to generate datasets, prompts, scorers, and more.

Analyze logs

On the Logs page, ask Loop to analyze the logs and give you insights. If you don’t specify a focus for the analysis, Loop gives you a comprehensive overview with general insights about health, activity trends, top errors, performance, and recommendations for ways to improve your project. Example queries:
  • “What are the most common errors?”
  • “What user retention trends do you see?”
  • “Find common failure modes”

Filter logs

Use Loop to generate BTQL queries to filter logs. Select the Filter button to open the filter editor and select BTQL to switch to BTQL mode. Select Generate and type in a natural language description of the filter you want to apply. Loop generates a BTQL query based on your description. Example queries:
  • “Only LLM spans”
  • “From user John Smith”
  • “Logs from the last 5 days where factuality score is less than 0.5”
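For example, the last description above might come back as a filter expression along these lines (a hypothetical sketch: the scores.Factuality and created field names are assumptions based on typical Braintrust log schemas, with the date standing in for a relative five-day window):

```
scores.Factuality < 0.5 and created > '2025-06-01'
```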

Generate and troubleshoot BTQL queries in the BTQL sandbox

Use Loop to generate and troubleshoot BTQL queries. BTQL queries can return and filter project data, including logs, dataset rows, experiment traces, project prompts, and project scorers.

Generate and run BTQL queries

Loop can generate BTQL queries from natural language descriptions. For example, you can ask Loop to generate a BTQL query to find the most recent errors from the last 24 hours in your project logs. In the BTQL sandbox, Loop automatically populates the sandbox with the generated BTQL query and runs it. It also gives you a text summary of the results and suggests additional queries you can run to get more insights. Example queries:
  • “Find the most common errors in logs over the last week”
  • “What are the highest scoring rows in my experiment”
Once you have a query in the sandbox, use Loop to update and optimize it.
  • “Update the query to show me error distribution over time”
  • “Add a filter to only show errors from specific models”
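As a rough sketch, the second example above might yield a query shaped like the following (hypothetical: the experiment source, scores field, and clause layout follow common BTQL conventions, but your schema and score names may differ):

```
select: *
from: experiment('<your experiment id>')
sort: scores.Factuality desc
limit: 10
```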

Troubleshoot BTQL queries

Loop can also help you resolve errors in your BTQL queries. Errors can occur when the query is syntactically incorrect, when the query is not valid against the data schema, or when the query is not valid against the data source. Select the Fix with Loop button next to the error in the sandbox. Loop analyzes the specific error type and context to provide targeted fixes, whether that means correcting syntax, suggesting the right field names, or helping optimize query performance.
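As a hypothetical illustration of a syntax fix, a query with a misspelled clause name fails to parse:

```
select: *
from: project_logs('<your project id>')
fliter: scores.Factuality < 0.5
```

Fix with Loop corrects the clause name to filter: and the query runs:

```
select: *
from: project_logs('<your project id>')
filter: scores.Factuality < 0.5
```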

Generate custom charts on the Monitor page

Use Loop to create a new chart on the Monitor page with a natural language description. On the Monitor page, select the Chart button in the top right corner to open the chart editor. Use the text input at the top of the editor to describe the chart you want to create. Loop then selects the best chart type and configuration based on the description. Example queries:
  • “List the top 5 models by error rate over the last 7 days”
  • “Show error rate over time for claude models”

Search the documentation

Use Loop to search the Braintrust documentation for relevant information and guidance. Ask Loop to search the documentation from any page where Loop is available. Example queries:
  • “How do I use the Braintrust SDK?”
  • “What is the difference between a prompt and a scorer?”
  • “How do I use the Braintrust API?”

Next steps

Try out Loop using these examples:
  • From the Logs page: “find queries that took longer than 60 seconds” or “create a dataset from logs with errors”
  • From a Prompt page: “optimize this prompt to be friendlier but also more concise” or “add few-shot examples based on project logs”
  • From a Dataset page: “add 20 rows with more complex inputs” or “update this dataset to be more helpful when evaluating my most recent prompt”
  • From a Playground: “choose the best scorer for this eval” or “generate 10 more dataset rows”
  • In the BTQL sandbox: “write a query to return a list of org-level prompts” or “find the highest scoring rows in an experiment”
Check out the Loop cookbook for more examples and use cases.