Human review is essential for evaluating AI applications. Braintrust integrates feedback from end users, subject matter experts, and product teams, letting you evaluate experiments, assess automated scoring, and curate evaluation datasets.

Configure review scores

Define the scores you want to collect in your project’s Configuration tab. Select Add human review score to configure a new score. Choose from three types:
  • Continuous scores: Numeric values between 0% and 100% with a slider input control. Use for subjective quality assessments like helpfulness or tone.
  • Categorical scores: Predefined options with assigned scores. Each option gets a unique percentage value between 0% and 100% (stored as 0 to 1). Use for classification tasks like sentiment or correctness categories.
  • Free-form text: String values written to the metadata field at a specified path. Use for explanations, corrections, or structured feedback.
Created scores appear in the Human review section of every experiment and log trace in your project.
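As an illustrative sketch (not actual output), this is roughly how the three score types surface on a reviewed span; the score names and metadata path are hypothetical:
// Hypothetical example of reviewed values; "helpfulness", "sentiment", and "review.notes" are made-up names.
const reviewedSpan = {
  scores: {
    helpfulness: 0.8, // continuous: 0–100% slider, stored as 0 to 1
    sentiment: 1, // categorical: the selected option's assigned value
  },
  metadata: {
    review: { notes: "Friendly tone, but misses the refund policy." }, // free-form text at its configured path
  },
};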

Write to expected fields

Configure categorical scores to write to the expected field instead of creating a score. This is useful for labeling ground truth data. To enable:
  1. Check Write to expected field instead of score.
  2. Optionally enable Allow multiple choice for multi-label classification.
Numeric scores are not assigned when writing to expected fields. If an object already exists in the expected field, the categorical value is appended to it.
Categorical scores configured to “write to expected” and free-form scores also appear on dataset rows for labeling. You can always directly edit the structured output for the expected field of any span through the UI.

Review logs and experiments

Select any row to open the trace view and edit the configured human review scores. Scores save automatically and update summary metrics in real time. The process works identically for logs and experiments.

Leave comments

Add comments to spans alongside scores and expected values. Updates are tracked to form an audit trail of edits. Copy links to comments to share with teammates. Comments are searchable using the Filter menu.
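Scores, expected values, and comments can also be recorded programmatically, for example to capture end-user feedback. Here is a minimal sketch using the SDK's logFeedback method, assuming a logger initialized for your project and a span ID captured when the original request was logged:
import { initLogger } from "braintrust";

const logger = initLogger({ projectName: "My project" });

// Attach a score and a comment to an existing span.
// `spanId` is assumed to have been captured at logging time.
logger.logFeedback({
  id: spanId,
  scores: { helpfulness: 0.25 },
  comment: "Answer ignored the user's follow-up question.",
});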

Use focused review mode

For reviewing large batches, use Review mode, which is optimized for rapid evaluation. Enter review mode by pressing “r” or selecting the expand icon next to the Human review header. Review mode features:
  • Set scores, comments, and expected values
  • Keyboard navigation for speed
  • Shareable links that open directly in review mode

Review filtered data

Filter logs or experiments using natural language or SQL, then enter review mode to evaluate the matching items. Use tags to mark items for “Triage”, then review them all at once. Save filters, sorts, and column configurations as views for standardized review workflows. Combining views with review mode offers:
  • Fast evaluation: Intuitive filters, reusable configurations, and keyboard navigation enable fast and efficient evaluation.
  • Dynamic and flexible views: Views update automatically as new rows match their saved criteria, without requiring complex automation rules.
  • Easy collaboration: Share review mode links for team collaboration without intricate permissions or setup overhead.

Create review queues

The Review list is a centralized queue showing all spans marked for review across your project. This complements focused reviews by giving you a curated queue of items that need attention, regardless of where they appear in your project. To mark spans for review:
  1. Select Flag for review in the span header.
  2. Bulk select rows and flag them together.
  3. Optionally assign to specific users.
Navigate to Review in the sidebar to see all flagged spans.

Review in context

When you open a span in the list, you’ll see it in the context of its full trace. This allows you to understand the span’s role within the larger request and review parent and child spans for additional context. Mark spans as Complete when finished or navigate to the next item in the queue.

Filter by scores

Find logs with specific scores using the filter menu or API:
// Fetch log events whose Preference score is above 0.75.
// `braintrust` is an initialized Braintrust API client and `projectId` is the target project's ID.
const results = await braintrust.projects.logs.fetch(projectId, {
  query: "scores.Preference > 0.75"
});
Use this to add highly-rated examples to datasets or investigate low-scoring patterns.
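For example, here is a rough sketch of promoting those high-scoring rows into a dataset with initDataset. The events property on the fetch response and the field names below are assumptions; adjust them to your data:
import { initDataset } from "braintrust";

// Sketch: turn highly rated logs into dataset rows for future evals.
// Assumes `results` is the fetch response above and exposes an `events` array.
const dataset = initDataset("My project", { dataset: "Preferred answers" });

for (const event of results.events) {
  dataset.insert({
    input: event.input,
    expected: event.output, // treat the well-scored output as ground truth
    metadata: { source: "human-review" },
  });
}

// Make sure pending inserts are written before the process exits.
await dataset.flush();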

Next steps