Learn everything you need to know about evals by building and monitoring a customer support chatbot from scratch.
Playgrounds are ephemeral scratchpads for exploring prompts. Experiments are permanent snapshots you can compare and revisit. Here's when to use each.
Over the last two modules, you've used both playgrounds and experiments in Braintrust. They serve different purposes, and understanding the distinction will help you use each one effectively.
The playground is where you first pasted in the empathetic personality prompt and ran it against the customer complaints dataset. Think of the playground as a scratchpad. It's designed for rapid iteration: you can swap models, tweak prompts, adjust parameters, and see results immediately.
Playground runs are ephemeral. When you close the playground or start a new session, the previous results are gone (unless you explicitly save them as an experiment). This makes the playground ideal for exploring and prototyping, but not for tracking progress over time.
Use the playground when you want to:

- Try out a new prompt idea and see results immediately
- Swap models or adjust parameters and compare side by side
- Explore and prototype without cluttering your project's history
Experiments are permanent snapshots. When you saved your playground runs as experiments in module 3, Braintrust captured everything: the inputs, outputs, scores, model parameters, token usage, and cost. That data persists and can be compared against future experiments.
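To make that concrete, here is a sketch in plain Python of the kind of record an experiment snapshot preserves. The field names are illustrative, chosen to mirror the list above, and are not Braintrust's actual storage schema:

```python
# Illustrative only: field names approximate what an experiment snapshot
# captures (inputs, outputs, scores, model parameters, token usage, cost);
# this is not Braintrust's actual schema.
experiment_record = {
    "input": "My order arrived damaged and support hasn't replied.",
    "output": "I'm so sorry to hear that. Let's make this right...",
    "scores": {"empathy": 0.9, "accuracy": 1.0},
    "model_params": {"model": "gpt-4o", "temperature": 0.7},
    "token_usage": {"prompt_tokens": 412, "completion_tokens": 96},
    "cost_usd": 0.0031,
}

def score_delta(old, new, metric):
    """Difference in one score between two saved experiment records."""
    return new["scores"][metric] - old["scores"][metric]
```

Because records like this persist, a later experiment can be compared field by field against an earlier one, which is exactly what the comparison views in the UI do for you.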
This permanence matters for several reasons:

- Results survive after you close your session, so nothing is lost between work sessions
- You can compare any saved run against future experiments
- You can track how scores, token usage, and cost change as your prompts evolve
Use experiments when you want to:

- Record a configuration that's worth keeping
- Compare results across prompt, model, or parameter changes
- Track progress over time and make decisions based on the data
In practice, most developers move between playgrounds and experiments fluidly. Start in the playground to iterate quickly. Once you have a configuration worth recording, save it as an experiment. Compare experiments to make decisions. Then go back to the playground to iterate on the next idea.
In the next lesson, you'll rebuild the customer support eval in Python using the Braintrust SDK. Code gives you version control for your prompts and scorers, lets you run experiments programmatically, and makes it possible to integrate evals into your CI/CD pipeline.
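As a preview, the core of any code-based eval is just a loop over a dataset, a task function, and one or more scorers. Here's a conceptual sketch in plain Python; the function names and structure are illustrative, not the Braintrust SDK's actual API, which the next lesson covers:

```python
# Conceptual sketch of an eval harness; names are illustrative,
# not the Braintrust SDK API.
def run_eval(dataset, task, scorers):
    """Run `task` on each example's input and score each output."""
    results = []
    for example in dataset:
        output = task(example["input"])
        scores = {
            name: score_fn(output, example.get("expected"))
            for name, score_fn in scorers.items()
        }
        results.append({
            "input": example["input"],
            "output": output,
            "scores": scores,
        })
    return results

# Toy usage: a stub support bot checked by an "apology" scorer.
def contains_apology(output, expected=None):
    return 1.0 if "sorry" in output.lower() else 0.0

dataset = [{"input": "My package never arrived.", "expected": None}]
stub_task = lambda text: "I'm sorry to hear that. Let me check on your order."
results = run_eval(dataset, stub_task, {"apology": contains_apology})
```

The SDK formalizes each of these pieces (data, task, scores) and, unlike this sketch, also records every run as a permanent experiment you can compare in the UI.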