Playgrounds vs experiments

Playgrounds are ephemeral scratchpads for exploring prompts. Experiments are permanent snapshots you can compare and revisit. This lesson covers when to use each.

Two tools, different purposes

Over the last two modules, you've used both playgrounds and experiments in Braintrust. They serve different purposes, and understanding the distinction will help you use each one effectively.

Playgrounds

The playground is where you first pasted in the empathetic personality prompt and ran it against the customer complaints dataset. Think of the playground as a scratchpad. It's designed for rapid iteration: you can swap models, tweak prompts, adjust parameters, and see results immediately.

Playground runs are ephemeral. When you close the playground or start a new session, the previous results are gone (unless you explicitly save them as an experiment). This makes the playground ideal for exploring and prototyping, but not for tracking progress over time.

Use the playground when you want to:

  • Try out a new prompt idea quickly
  • Test how different models respond to the same input
  • Debug a specific output before formalizing your eval
  • Experiment with system prompts, temperature, or other parameters

Experiments

Experiments are permanent snapshots. When you saved your playground runs as experiments in module 3, Braintrust captured everything: the inputs, outputs, scores, model parameters, token usage, and cost. That data persists and can be compared against future experiments.

This permanence matters for several reasons:

  • No re-runs needed. You can revisit experiment results weeks or months later without running the eval again. This saves time and money, especially with large datasets.
  • Comparison across time. When you make changes to your prompt, model, or scoring criteria, you can compare the new experiment against previous ones to see whether things improved or regressed.
  • Team collaboration. Experiments create a shared record that your team can reference. Instead of saying "I tried this prompt and it seemed better," you can point to the experiment data.
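The idea behind "comparison across time" can be sketched in plain Python. This is an illustrative stand-in, not the Braintrust API: the `ExperimentSnapshot` class and `compare` helper are hypothetical names, and the scores and costs are made-up values.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ExperimentSnapshot:
    """Illustrative stand-in for a saved experiment (not the Braintrust API)."""
    name: str
    model: str
    avg_score: float       # mean scorer output across the dataset
    total_cost_usd: float  # summed API cost for the run
    params: dict = field(default_factory=dict)

def compare(baseline: ExperimentSnapshot, candidate: ExperimentSnapshot) -> str:
    """Report whether the candidate improved or regressed against the baseline."""
    delta = candidate.avg_score - baseline.avg_score
    if delta > 0:
        return f"improved by {delta:.2f}"
    if delta < 0:
        return f"regressed by {-delta:.2f}"
    return "unchanged"

# Because snapshots persist, this comparison needs no re-run of either eval.
baseline = ExperimentSnapshot("v1-empathetic", "gpt-4o-mini", 0.72, 0.14)
candidate = ExperimentSnapshot("v2-empathetic", "gpt-4o-mini", 0.81, 0.15)
print(compare(baseline, candidate))  # improved by 0.09
```

The key property the sketch captures is that both snapshots are frozen records: deciding between them is a lookup and a subtraction, not another eval run against the dataset.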

Use experiments when you want to:

  • Record a baseline for future comparison
  • Share results with your team
  • Track how your system improves over time
  • Make a decision about what to ship

The typical workflow

In practice, most developers move between playgrounds and experiments fluidly. Start in the playground to iterate quickly. Once you have a configuration worth recording, save it as an experiment. Compare experiments to make decisions. Then go back to the playground to iterate on the next idea.

What's next

In the next lesson, you'll rebuild the customer support eval in Python using the Braintrust SDK. Moving to code gives you version control for your prompts and scorers, lets you run experiments programmatically, and makes it possible to wire evals into your CI/CD pipeline.

Further reading

Trace everything