Applies to:
- Plan -
- Deployment -
Summary
Issue: Classification scorer results are missing or empty in logs when using an incompatible model. The scoring pipeline fails silently with the error ValidationException: This model doesn't support tool use in streaming mode.
Cause: Classification scorers require a model that supports both streaming and tool use; Bedrock LLaMA models do not meet this requirement.
Resolution: Switch the scorer to a compatible model such as Anthropic Claude or OpenAI GPT-4o.
Resolution steps
Step 1: Check for scoring errors
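Open an individual log trace and look for a scoring child span. If you see the ValidationException quoted in the Summary ("This model doesn't support tool use in streaming mode"), the scorer is configured with an incompatible model.
Step 2: Switch to a compatible model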
In your scorer configuration, replace the current model with one of the following.

Compatible models:

- claude-3-5-haiku
- claude-3-7-sonnet
- gpt-4o
- gpt-4o-mini

Incompatible models:

- Bedrock LLaMA models (do not support streaming tool use)
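
The model guidance above can be encoded as a small pre-flight check before you save a scorer configuration. This is only a minimal sketch: `check_scorer_model` is a hypothetical helper, not part of any SDK, and the substring test for LLaMA is an assumption about how those models are named. The compatible names are the ones given above, which are examples rather than an exhaustive list, so anything else is reported as "unknown".

```python
# Models named above as compatible with classification scorers.
# The article presents these as examples, so the set is not exhaustive.
COMPATIBLE_MODELS = {
    "claude-3-5-haiku",
    "claude-3-7-sonnet",
    "gpt-4o",
    "gpt-4o-mini",
}


def check_scorer_model(model: str) -> str:
    """Return "compatible", "incompatible", or "unknown" for a model name.

    Hypothetical helper for illustration only. Treating any name that
    contains "llama" as incompatible is an assumption, not a documented rule.
    """
    name = model.strip().lower()
    if name in COMPATIBLE_MODELS:
        return "compatible"
    if "llama" in name:
        return "incompatible"
    return "unknown"
```

A result of "unknown" means the model is not covered here, so confirm that it supports both streaming and tool use before using it in a classification scorer.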
Step 3: Verify scoring is running
After switching models, trigger scoring on recent logs using "Score past 3 days logs" to confirm that results populate correctly.
Step 4: Adjust prompt if needed
After switching models, classification output may differ. Refine the scorer prompt so that the model returns the expected category labels.
Where to find results
| Scorer type | Results column |
|---|---|
| Classification scorer (function_type = "classifier") | classifications.<scorer_name> |
| LLM-as-judge or custom code scorer | scores.<scorer_name> |
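
If you build queries or exports against these columns, the mapping in the table can be written as a one-line lookup. This is a sketch under stated assumptions: `results_column` is a hypothetical helper, and it treats every `function_type` other than "classifier" as a score-producing scorer, which matches the two rows above but may not cover other scorer types.

```python
from typing import Optional


def results_column(scorer_name: str, function_type: Optional[str] = None) -> str:
    """Return the results column for a scorer, following the table above.

    Hypothetical helper: only "classifier" maps to the classifications
    prefix; every other function_type is assumed to write to scores.
    """
    prefix = "classifications" if function_type == "classifier" else "scores"
    return f"{prefix}.{scorer_name}"
```

For a classification scorer named `sentiment` (an illustrative name), `results_column("sentiment", "classifier")` gives `classifications.sentiment`; the same name without a classifier type gives `scores.sentiment`.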