
Configure review scores
Define the scores you want to collect in your project’s Configuration tab:
- Continuous scores: Numeric values between 0% and 100% with a slider input control. Use for subjective quality assessments like helpfulness or tone.
- Categorical scores: Predefined options with assigned scores. Each option gets a unique percentage value between 0% and 100% (stored as 0 to 1). Use for classification tasks like sentiment or correctness categories.
- Free-form text: String values written to the metadata field at a specified path. Use for explanations, corrections, or structured feedback.
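To make the three types concrete, here is a minimal sketch of how a reviewed span's data might look once scores are saved. The score names ("helpfulness", "sentiment") and the metadata path ("review.notes") are hypothetical examples, and the exact storage layout may differ.

```python
# Illustrative sketch only: roughly how a reviewed span's fields might look once
# human review scores are saved. Score names ("helpfulness", "sentiment") and the
# metadata path ("review.notes") are hypothetical, not required names.
reviewed_span = {
    "scores": {
        "helpfulness": 0.75,  # continuous score: a 75% slider value stored as 0.75
        "sentiment": 1.0,     # categorical score: the option assigned 100%, stored as 1.0
    },
    "metadata": {
        # free-form text written to the configured path, e.g. "review.notes"
        "review": {"notes": "Correct answer, but the tone is overly formal."},
    },
}
```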

Write to expected fields
Configure categorical scores to write to the expected field instead of creating a score. This is useful for labeling ground truth data.
To enable:
- Check Write to expected field instead of score.
- Optionally enable Allow multiple choice for multi-label classification.
Numeric scores are not assigned when writing to expected fields. If an object already exists in the expected field, the categorical value is appended to it.
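As a concrete illustration, the sketch below shows possible shapes of the expected field after labeling. The label values are hypothetical, and the exact shape produced when a value is appended to an existing expected object may differ in practice.

```python
# Hedged sketch: possible shapes of the expected field after labeling ground truth.
# The label values are hypothetical, and the shape produced when a value is appended
# to an existing expected object may differ in practice.

# Single categorical label written to expected (no numeric score is assigned):
expected_single = "refund_request"

# With "Allow multiple choice" enabled, several labels can be collected:
expected_multi = ["refund_request", "duplicate_charge"]
```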

You can also edit the expected field of any span directly through the UI.
Review logs and experiments
Select any row to open the trace view and edit the configured human review scores. Scores save automatically and update summary metrics in real time. The process works identically for logs and experiments.

Leave comments
Add comments to spans alongside scores and expected values. Updates are tracked to form an audit trail of edits. Copy links to comments to share with teammates. Comments are searchable using the Filter menu.

Use focused review mode
For reviewing large batches, use Review mode, which is optimized for rapid evaluation. Enter review mode by pressing “r” or selecting the expand icon next to the Human review header. Review mode features:
- Set scores, comments, and expected values
- Keyboard navigation for speed
- Shareable links that open directly in review mode
Review filtered data
Filter logs or experiments using natural language or SQL, then enter review mode to evaluate the matching items; an example filter is sketched after the list below. Use tags to mark items for “Triage”, then review them all at once. Save filters, sorts, and column configurations as views for standardized review workflows. Views combine with review mode for an efficient workflow:
- Designed for optimal productivity: Intuitive filters, reusable configurations, and keyboard navigation enable fast and efficient evaluation.
- Dynamic and flexible views: Views dynamically update with new rows matching saved criteria, without requiring complex automation rules.
- Easy collaboration: Share review mode links for team collaboration without intricate permissions or setup overhead.
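To make the filtering step concrete, here is an illustrative sketch of a SQL-style filter you might save as a view before entering review mode, alongside the same check written as plain Python over rows you have already fetched. The filter grammar and field names ("tags", "scores.helpfulness") are assumptions for illustration, not documented query syntax.

```python
# Illustrative only: a SQL-style filter of the kind you might type into the filter
# bar and save as a view. The grammar and field names are assumptions, not the
# documented query syntax.
triage_filter = "tags includes 'Triage' and scores.helpfulness < 0.5"

# The same idea expressed as a plain Python check over rows you have already fetched:
def needs_review(row: dict) -> bool:
    tags = row.get("tags", [])
    helpfulness = row.get("scores", {}).get("helpfulness", 1.0)
    return "Triage" in tags and helpfulness < 0.5
```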
Create review queues
The Review list is a centralized queue showing all spans marked for review across your project. It complements focused review by giving you a curated queue of items that need attention, regardless of where they appear. To mark spans for review:
- Select Flag for review in the span header.
- Bulk select rows and flag them together.
- Optionally assign to specific users.
Review in context
When you open a span in the list, you see it in the context of its full trace. This lets you understand the span’s role within the larger request and review parent and child spans for additional context. Mark spans as Complete when finished, or navigate to the next item in the queue.

Filter by scores
Find logs with specific scores using the filter menu or the API.
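Because the exact API call is not shown in this section, the sketch below stays on the client side: it narrows rows you have already fetched or exported down to those whose reviewed score falls below a threshold. The score name ("helpfulness") and the row shape are hypothetical.

```python
# Hypothetical sketch: select rows whose reviewed score is below a threshold.
# The score name ("helpfulness") and the row shape are assumptions; adapt them to
# however you fetch or export your logs.
def low_scoring(rows: list[dict], score: str = "helpfulness", threshold: float = 0.5) -> list[dict]:
    selected = []
    for row in rows:
        value = row.get("scores", {}).get(score)
        if value is not None and value < threshold:
            selected.append(row)
    return selected
```

Next steps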
- Capture user feedback from production
- Build datasets from reviewed logs
- Add labels and corrections to traces
- Run evaluations with human-reviewed datasets