Build datasets and gather feedback to improve your application
After observing your application in production, annotate and curate that data to build evaluation datasets. This process transforms raw production logs into high-quality test cases that help you systematically improve your application.
Annotation creates the ground truth data needed for evaluation. By collecting feedback, adding labels, and curating examples from production, you build datasets that:
- Represent real user interactions and edge cases
- Include expected outputs and quality assessments (see the sketch after this list)
- Enable systematic testing and comparison
- Support automated and human evaluation
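As a rough illustration of what a curated record can look like, the sketch below adds one example to a dataset with the Braintrust Python SDK. The project name, dataset name, and field values are assumptions made for illustration; check the SDK reference for exact signatures.

```python
import braintrust

# Illustrative project and dataset names; substitute your own.
dataset = braintrust.init_dataset(project="My App", name="Curated production cases")

# Each record pairs a real production input with an expected output plus
# labels that support both automated scoring and human review.
dataset.insert(
    input={"question": "How do I reset my password?"},
    expected="Go to Settings > Account > Reset password and follow the emailed link.",
    metadata={"source": "production", "label": "edge_case", "reviewed_by": "human"},
)

# Flush buffered records before the process exits.
dataset.flush()
```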
Braintrust integrates annotation seamlessly with logs and experiments, making it easy to capture feedback and build datasets without context switching.
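On the feedback side, a minimal sketch of logging a production interaction and attaching a human score to it with the Python SDK might look like the following. The project name, score key, and comment are illustrative, and method names such as `start_span` and `log_feedback` should be verified against the current SDK reference.

```python
import braintrust

# Illustrative project name; in practice, log to the project your app already uses.
logger = braintrust.init_logger(project="My App")

# Log a production interaction and keep the span id so feedback can reference it.
with logger.start_span(name="answer_question") as span:
    span.log(
        input={"question": "How do I reset my password?"},
        output="Go to Settings > Account > Reset password.",
    )
    span_id = span.id

# Attach human feedback (a score and a comment) to the logged span.
logger.log_feedback(
    id=span_id,
    scores={"correctness": 1},
    comment="Verified against the current settings UI.",
)
```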