Loop
Loop is an AI assistant in Braintrust playgrounds. It helps you optimize and generate prompts, datasets, and evals.
Loop is in public beta and is off by default. To turn it on, enable the feature flag in your settings. If you are on a hybrid deployment, Loop is available starting with v0.0.74.
Selecting a model
Loop uses the AI models available in your Braintrust account via the proxy. It defaults to Claude 4 Sonnet, but supports any model you have configured in your AI providers, including custom models.
To choose a model, navigate to the gear icon in the Loop chat window and select from the list of available models.
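Because the proxy speaks the OpenAI chat-completions protocol, the models Loop can use are the same ones you can call through the proxy directly. A minimal sketch of building such a request with only the standard library; the model slug `claude-sonnet-4` and the message content are illustrative placeholders, not fixed values:

```python
import json

# Braintrust AI proxy endpoint (OpenAI-compatible chat completions path).
PROXY_URL = "https://api.braintrust.dev/v1/proxy/chat/completions"

def build_chat_request(model: str, user_message: str) -> str:
    """Return the JSON body the proxy expects, in OpenAI chat format.

    `model` is any model slug configured under your AI providers,
    including custom models.
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })

# Hypothetical example request body:
body = build_chat_request("claude-sonnet-4", "Summarize this playground")
```

The resulting body would be POSTed to `PROXY_URL` with your Braintrust API key as a bearer token; any OpenAI-compatible client pointed at the proxy base URL works the same way.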
Available tools
Loop currently offers the following tools:
- Summarize playground: generate a summary of current playground contents
- Get eval results: retrieve evaluation results directly within Loop
- Edit prompt: generate and modify prompts
- Run eval: execute evaluations directly within Loop
- Edit data: generate and modify datasets
- Continue execution: resume interrupted or paused tasks
Before suggesting any optimizations, the agent runs and/or summarizes your playground to determine what improvements to recommend. You can remove any of these tools from your Loop workflow by selecting the gear icon and deselecting the tool from the available list.
Coming soon
- Edit scorers: design and select custom scorers
- Fetch logs: access and review logs directly within Loop
- Create prompt: create a new prompt
- More UI integration: the ability to access Loop outside of playgrounds
Generating and optimizing prompts
Loop can help you generate a prompt from scratch. Open an empty task, then ask Loop to generate a prompt for it.
If you have existing prompts, you can optimize them using Loop.
To optimize a prompt, ask Loop in the chat window, or select the Loop icon in the top bar of any existing task. From there, you can add the prompt to your chat, or quick optimize.
After Loop provides a suggested optimization, you can review and accept the suggestion or keep iterating.
Generating and optimizing datasets
If no dataset exists, Loop can create one automatically. A task must be defined so that Loop can tailor the generated dataset to the evaluation.
You can review the dataset and further refine it as needed.
After you run your playground, you can also ask Loop to optimize your dataset. The agent will suggest areas for improvement based on an analysis of your current dataset.
Run and assess evals
After your tasks, dataset, and scorers are set up, Loop can run an evaluation for you, analyze it, and suggest further improvements.
Mode
By default, Loop will ask you for confirmation before executing certain tool calls, like running an evaluation. If you'd like Loop to run evaluations without confirmation, you can turn off this setting in the agent mode menu.
Continuous agent
In continuous agent mode, Loop executes tools and makes edit suggestions one after another.