Missing classification values from automated scorer

Summary

Issue: Classification scorer results are missing or empty in logs when using an incompatible model. The scoring pipeline fails silently with ValidationException: This model doesn't support tool use in streaming mode.

Cause: Classification scorers require a model that supports both streaming and tool use; Bedrock LLaMA models do not meet this requirement.

Resolution: Switch the scorer to a compatible model such as Anthropic Claude or OpenAI GPT-4o.

Resolution steps

Step 1: Check for scoring errors

Open an individual log trace and look for a scoring child span. If you see:

ValidationException: This model doesn't support tool use in streaming mode.

the scorer model is incompatible.
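If you have exported trace spans and want to check many of them at once, the lookup above can be sketched as a simple string match. This is a minimal sketch only: the error text is quoted from this article, but the span layout (a dict with `name` and `error` keys) and the function name are hypothetical and will need adapting to your actual export format.

```python
# Error text quoted from this article; matched as a substring.
INCOMPATIBLE_MODEL_ERROR = "This model doesn't support tool use in streaming mode"


def find_incompatible_scorer_spans(spans):
    """Return the names of spans whose error text contains the streaming tool-use error.

    `spans` is assumed (hypothetically) to be a list of dicts with
    optional "name" and "error" keys.
    """
    return [
        span.get("name", "<unnamed>")
        for span in spans
        if INCOMPATIBLE_MODEL_ERROR in (span.get("error") or "")
    ]


# Hypothetical example data, not real log output.
example_spans = [
    {"name": "llm_call", "error": None},
    {
        "name": "topic_classifier",
        "error": "ValidationException: This model doesn't support tool use in streaming mode.",
    },
]
flagged_spans = find_incompatible_scorer_spans(example_spans)
```

Any span name returned here points to a scorer whose model should be replaced in Step 2.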

Step 2: Switch to a compatible model

In your scorer configuration, replace the current model with one of the compatible models listed below.

Compatible models:

claude-3-5-haiku
claude-3-7-sonnet
gpt-4o
gpt-4o-mini

Incompatible models:

Bedrock LLaMA models (do not support streaming tool use)
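If scorer models are set from code or configuration files, a small guard can catch a model that is not on the list above before it is deployed. The sketch below only encodes the four model names listed in this article; the function name is hypothetical, and a `False` result means "not listed here", not necessarily "incompatible", since the list may not be exhaustive.

```python
# Model names copied from the compatible list in this article.
COMPATIBLE_MODELS = {
    "claude-3-5-haiku",
    "claude-3-7-sonnet",
    "gpt-4o",
    "gpt-4o-mini",
}


def is_listed_compatible(model_name):
    """Return True if the model name appears in the article's compatible list."""
    return model_name.strip().lower() in COMPATIBLE_MODELS
```

For example, `is_listed_compatible("gpt-4o-mini")` returns `True`, while any name not in the set returns `False` and should be checked for streaming tool-use support before use.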

Step 3: Verify scoring is running

After switching models, trigger scoring on recent logs using "Score past 3 days logs", then confirm that results populate correctly.
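One way to confirm that results are populating is to export the rescored logs and list the rows that still have no classification value. This is a rough sketch under stated assumptions: the row layout (a dict with an `id` and a nested `classifications` dict keyed by scorer name) is hypothetical and modeled on the `classifications.<scorer_name>` column described in this article.

```python
def logs_missing_classification(rows, scorer_name):
    """Return the ids of rows with no value for the given classification scorer.

    `rows` is assumed (hypothetically) to be a list of dicts shaped like
    {"id": ..., "classifications": {scorer_name: value}}.
    """
    missing = []
    for row in rows:
        value = (row.get("classifications") or {}).get(scorer_name)
        # Treat absent, None, and empty values as "missing".
        if value in (None, "", [], {}):
            missing.append(row["id"])
    return missing


# Hypothetical example rows, not real log data.
example_rows = [
    {"id": "log-1", "classifications": {"topic": "billing"}},
    {"id": "log-2", "classifications": {}},
    {"id": "log-3", "classifications": None},
]
still_missing = logs_missing_classification(example_rows, "topic")
```

An empty result suggests the scorer is now writing values; any remaining ids are worth opening individually to check for a scoring error as in Step 1.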

Step 4: Adjust prompt if needed

After switching models, classification output may differ. Refine the scorer prompt to ensure the model returns the expected category labels.
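To see whether the new model is returning labels outside your expected set, you can compare a sample of outputs against the category list. The sketch below is illustrative only: the function name, the label values, and the case-insensitive matching are all assumptions, not behavior of the scoring pipeline.

```python
def unexpected_labels(labels, expected):
    """Return the sorted set of labels that are not in the expected categories.

    Matching ignores case and surrounding whitespace (an assumption of this
    sketch; use exact matching if your labels are case-sensitive).
    """
    allowed = {category.strip().lower() for category in expected}
    return sorted({label for label in labels if label.strip().lower() not in allowed})


# Hypothetical categories and model outputs.
expected_categories = ["billing", "technical", "other"]
sample_outputs = ["Billing", "technical", "Tech Support", "other", "Tech Support"]
off_list = unexpected_labels(sample_outputs, expected_categories)
```

Labels that appear in the result are candidates for a prompt adjustment, for example by listing the allowed categories explicitly in the scorer prompt.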

Where to find results

Scorer type	Results column
Classification scorer (`function_type = "classifier"`)	`classifications.<scorer_name>`
LLM-as-judge or custom code scorer	`scores.<scorer_name>`
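The mapping in the table can be expressed as a small helper for building column names in queries or exports. This is a sketch, not part of any SDK: the function name is hypothetical, and it assumes every scorer whose `function_type` is not `"classifier"` writes to `scores`, which matches the two rows above but may not cover every scorer type.

```python
def results_column(scorer_name, function_type):
    """Return the results column for a scorer, following the table above.

    Assumption: any function_type other than "classifier" is treated as an
    LLM-as-judge or custom code scorer and mapped to the scores column.
    """
    prefix = "classifications" if function_type == "classifier" else "scores"
    return f"{prefix}.{scorer_name}"
```

For example, `results_column("topic", "classifier")` yields `classifications.topic`, so a classification scorer that appears to be missing from `scores.topic` may simply be stored in the other column.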
