Why annotate
Annotation creates the ground truth data needed for evaluation. By collecting feedback, adding labels, and curating examples from production, you build datasets that:
- Represent real user interactions and edge cases
- Include expected outputs and quality assessments
- Enable systematic testing and comparison
- Support automated and human evaluation
Gather human feedback
Human review provides qualitative assessments that complement automated scoring. Configure review scores in your project to collect:
- Continuous scores: Numeric ratings with slider controls (0-100%)
- Categorical scores: Predefined options with assigned values
- Expected values: Corrections showing what the output should be
- Comments: Free-form feedback and context
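The score types above could be set up programmatically. The sketch below is a hypothetical illustration only: the `loop-sdk` package, `LoopClient` class, `createReviewScoreConfig` method, and all field names are assumptions, not a documented Loop API.

```typescript
// Hypothetical sketch only: "loop-sdk", LoopClient, and createReviewScoreConfig
// are assumed names for illustration, not a documented Loop API.
import { LoopClient } from "loop-sdk";

const loop = new LoopClient({ apiKey: process.env.LOOP_API_KEY });

// Continuous score: a numeric rating collected with a 0-100% slider.
await loop.createReviewScoreConfig({
  name: "answer_quality",
  type: "continuous",
  min: 0,
  max: 100,
});

// Categorical score: predefined options, each mapped to a value.
await loop.createReviewScoreConfig({
  name: "tone",
  type: "categorical",
  options: [
    { label: "professional", value: 1 },
    { label: "neutral", value: 0.5 },
    { label: "inappropriate", value: 0 },
  ],
});
```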
Create custom trace views
Custom trace views transform complex traces into interfaces anyone on your team can use. Describe what you want in natural language and Loop generates an interactive React component you can customize or embed anywhere. Build custom views to:
- Create annotation interfaces for large-scale human review tasks
- Replace JSON with intuitive UI components for non-technical reviewers
- Display data in domain-specific formats (carousels, conversation threads, dashboards)
- Aggregate information across multiple spans in a trace
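A generated view is ordinary React code you can edit. The component below is a hypothetical example of the kind of output a conversation-thread view might produce; the `Span` type is an assumed shape for chat-style trace data, not Loop's actual schema.

```tsx
// Hypothetical example of a generated view. The Span type is an assumed
// shape for chat-style trace data, not Loop's actual schema.
type Span = { role: "user" | "assistant"; content: string };

export function ConversationThread({ spans }: { spans: Span[] }) {
  return (
    <div className="thread">
      {spans.map((span, i) => (
        // Render each span as a chat bubble instead of raw JSON.
        <div key={i} className={`bubble ${span.role}`}>
          <strong>{span.role}</strong>
          <p>{span.content}</p>
        </div>
      ))}
    </div>
  );
}
```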
Add labels and corrections
Beyond scores, you can annotate spans with:
- Tags: Categorize traces for organization and filtering
- Comments: Provide context or explain issues
- Expected values: Specify correct outputs
- Metadata: Add custom fields for analysis
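As an illustration, annotating a span through a client SDK might look like the sketch below; `LoopClient`, `annotateSpan`, and the field names are assumptions for this example, not a confirmed Loop API.

```typescript
// Hypothetical sketch only: LoopClient and annotateSpan are assumed names.
import { LoopClient } from "loop-sdk";

const loop = new LoopClient({ apiKey: process.env.LOOP_API_KEY });

// Attach tags, a comment, a corrected output, and custom metadata to one span.
await loop.annotateSpan("span_abc123", {
  tags: ["hallucination", "needs-review"],
  comment: "The model cited a source that does not exist.",
  expected: "I could not find a supporting source for that claim.",
  metadata: { reviewer: "sme-team", severity: "high" },
});
```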
Build datasets
Datasets are versioned collections of test cases that you use to run evaluations. Each record contains:
- Input: The data sent to your application
- Expected: The ideal output (optional but recommended)
- Metadata: Tags, user IDs, or other contextual information
Common sources for dataset records include:
- Production logs with interesting patterns
- User feedback (thumbs up/down, corrections)
- Manual curation by subject matter experts
- Generated examples from Loop
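To make the record shape concrete, the sketch below adds one curated example to a dataset through an assumed `addDatasetRecord` call; the client, method, field names, and the "support-bot-regressions" dataset are all illustrative assumptions.

```typescript
// Hypothetical sketch only: LoopClient and addDatasetRecord are assumed names,
// and "support-bot-regressions" is an illustrative dataset.
import { LoopClient } from "loop-sdk";

const loop = new LoopClient({ apiKey: process.env.LOOP_API_KEY });

await loop.addDatasetRecord("support-bot-regressions", {
  // Input: what your application receives.
  input: { question: "How do I reset my password?" },
  // Expected: the ideal output (optional but recommended).
  expected: "Go to Settings > Security and choose 'Reset password'.",
  // Metadata: tags, user IDs, or other context for filtering and analysis.
  metadata: { source: "production-log", userId: "u_42", topic: "auth" },
});
```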
Export data
Extract annotated data for use in:
- External evaluation frameworks
- Custom analysis pipelines
- Reporting and documentation
- Training data for fine-tuning
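For example, exported records could be written to JSONL for a fine-tuning pipeline. The sketch below assumes a hypothetical `exportDataset` call and record shape; both are illustrative, not a documented Loop API.

```typescript
// Hypothetical sketch only: exportDataset and the record shape are assumed.
import { writeFileSync } from "node:fs";
import { LoopClient } from "loop-sdk";

const loop = new LoopClient({ apiKey: process.env.LOOP_API_KEY });

// Pull annotated records and write prompt/completion pairs as JSONL.
const records = await loop.exportDataset("support-bot-regressions");
const jsonl = records
  .map((r: { input: { question: string }; expected: string }) =>
    JSON.stringify({ prompt: r.input.question, completion: r.expected })
  )
  .join("\n");
writeFileSync("finetune.jsonl", jsonl);
```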
Next steps
- Add human feedback for your project
- Create custom trace views for tailored review workflows
- Add labels and corrections to traces
- Build datasets from production logs
- Capture user feedback in production