Why capture feedback
Production feedback helps you:
- Build datasets from actual user interactions
- Identify edge cases and failure modes
- Understand user preferences and expectations
- Validate that improvements work for real users
Types of feedback
Braintrust supports multiple feedback types that you can combine:
- Scores: Thumbs up/down, star ratings, or custom numeric values
- Expected values: User corrections showing what the output should be
- Comments: Free-form explanations or context
- Metadata: Structured data like user ID, session ID, or feature flags
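All of these can be attached to an existing trace from your application code. The snippet below is a minimal sketch, assuming the Braintrust Python SDK's `log_feedback` method; the `user_rating` score name, the `span_id` plumbing, and the metadata keys are illustrative assumptions, not fixed names.

```python
from typing import Optional

import braintrust

# Logger for the project that receives production traces.
logger = braintrust.init_logger(project="my-app")


def record_user_feedback(
    span_id: str,
    thumbs_up: bool,
    correction: Optional[str] = None,
    note: Optional[str] = None,
) -> None:
    """Attach user feedback to a span that was logged when the request was served."""
    logger.log_feedback(
        id=span_id,                                     # ID of the logged span to annotate
        scores={"user_rating": 1 if thumbs_up else 0},  # score: thumbs up/down as 1/0
        expected=correction,                            # expected value: the user's correction, if any
        comment=note,                                   # comment: free-form explanation
        metadata={"feedback_source": "in-app"},         # metadata: structured context
        source="external",                              # marks this as end-user feedback
    )
```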
Build datasets from feedback
Once you’ve captured feedback, use it to create evaluation datasets.
Filter by feedback scores
Use the filter menu to find traces with specific feedback.
Copy to datasets
After filtering:
- Select the traces you want to include.
- Select Add to dataset.
- Choose an existing dataset or create a new one.
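You can also script this step. The rough sketch below assumes the Braintrust Python SDK's `init_dataset` and `insert` methods; the `positive_feedback_traces` list is a hypothetical placeholder for traces you exported or fetched after filtering, and the project and dataset names are illustrative.

```python
import braintrust

# Hypothetical placeholder for traces you filtered by positive feedback
# (for example, exported from the UI or fetched via the API).
positive_feedback_traces = [
    {"input": "How do I reset my password?", "output": "Go to Settings > Security ..."},
]

# Open (or create) the destination dataset.
dataset = braintrust.init_dataset(project="my-app", name="positive-feedback-examples")

for trace in positive_feedback_traces:
    dataset.insert(
        input=trace["input"],
        expected=trace["output"],              # treat the well-rated output as the expected value
        metadata={"source": "user-feedback"},  # record where the example came from
    )

print(dataset.summarize())
```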
Generate with Loop
Ask Loop to create datasets based on feedback patterns. Example queries:
- “Create a dataset from logs with positive feedback”
- “Generate a dataset from user corrections”
- “Build a dataset from cases where users clicked thumbs down”
Use feedback for human review
Production feedback complements internal review.
Configure review scores
Set up review scores that match your production feedback. For example, if you capture thumbs up/down in production, configure a matching categorical score for internal review. This consistency lets you compare user feedback with expert assessments.
Review low-scoring traces
Filter for traces with poor user feedback and enter review mode:
- Apply a filter: `WHERE scores.user_rating < 0.3` (SQL) or `filter: scores.user_rating < 0.3` (BTQL).
- Enter Review mode.
- Add internal scores and comments.
- Update expected values.
Track feedback patterns
Use dashboards to monitor feedback trends:
- User satisfaction over time
- Feedback distribution by feature or user segment
- Correlation between automated scores and user feedback
- Common feedback themes (via comments)
Iterate on improvements
Close the feedback loop:
- Capture feedback from production users.
- Annotate traces with expected values and labels.
- Build datasets from annotated examples.
- Evaluate changes using those datasets (see the sketch after this list).
- Deploy improvements and monitor feedback again.
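The evaluation step could look like the sketch below, assuming the feedback-derived dataset created earlier, a hypothetical `generate_answer` function standing in for the change you want to validate, and the `Levenshtein` scorer from the autoevals package; substitute whatever task and scorers fit your application.

```python
from braintrust import Eval, init_dataset
from autoevals import Levenshtein


def generate_answer(input):
    # Placeholder for the model or pipeline change you want to validate.
    return "..."


Eval(
    "my-app",  # project name
    data=init_dataset(project="my-app", name="positive-feedback-examples"),
    task=generate_answer,  # runs against each dataset input
    scores=[Levenshtein],  # compares output to the expected value derived from user feedback
)
```

Comparing these evaluation results with live feedback trends after you deploy closes the loop described above.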
Next steps
- Build datasets from user feedback
- Configure human review to add internal assessments
- Add labels to categorize feedback patterns
- Run evaluations using feedback-derived datasets