
Summary

Issue: Classification scorer results are missing or empty in logs when using an incompatible model. The scoring pipeline fails silently with ValidationException: This model doesn't support tool use in streaming mode.
Cause: Classification scorers require a model that supports both streaming and tool use; Bedrock LLaMA models do not meet this requirement.
Resolution: Switch the scorer to a compatible model such as Anthropic Claude or OpenAI GPT-4o.

Resolution steps

Step 1: Check for scoring errors

Open an individual log trace and look for a scoring child span. If you see:
ValidationException: This model doesn't support tool use in streaming mode.
the scorer model is incompatible.
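If you have many traces to check, the same inspection can be scripted. The sketch below assumes a trace has been exported as nested dictionaries with hypothetical keys `name`, `error`, and `children`; these names are illustrative and not a documented export format. It walks the span tree and collects every span whose error text contains the message above.

```python
from typing import Iterator

# Error text quoted in this article.
STREAMING_TOOL_USE_ERROR = "This model doesn't support tool use in streaming mode"


def find_incompatible_scorer_spans(span: dict) -> Iterator[dict]:
    """Yield every span in the tree whose error contains the message."""
    error = span.get("error") or ""
    if STREAMING_TOOL_USE_ERROR in error:
        yield span
    for child in span.get("children", []):
        yield from find_incompatible_scorer_spans(child)


# Hypothetical exported trace; the structure is an assumption for illustration.
trace = {
    "name": "chat-request",
    "error": None,
    "children": [
        {"name": "llm-call", "error": None, "children": []},
        {
            "name": "scorer: topic",
            "error": "ValidationException: This model doesn't support "
                     "tool use in streaming mode.",
            "children": [],
        },
    ],
}

failed = [s["name"] for s in find_incompatible_scorer_spans(trace)]
print(failed)
```

Any span name this returns points to a scorer whose model needs to be changed in Step 2.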

Step 2: Switch to a compatible model

In your scorer configuration, replace the current model with one of the compatible models below.
Compatible models:
  • claude-3-5-haiku
  • claude-3-7-sonnet
  • gpt-4o
  • gpt-4o-mini
Incompatible models:
  • Bedrock LLaMA models (do not support streaming tool use)
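If scorer settings are kept in code or in a config file, a small guard can catch an unlisted model before scoring runs. This is a minimal sketch: the `scorer_config` dictionary, its keys, and the model name `example-bedrock-llama` are hypothetical placeholders, and a model that is absent from the list is only "not confirmed by this article", not necessarily incompatible.

```python
# Model names copied from the compatible list in this article.
COMPATIBLE_MODELS = frozenset(
    {"claude-3-5-haiku", "claude-3-7-sonnet", "gpt-4o", "gpt-4o-mini"}
)


def is_listed_compatible(model: str) -> bool:
    """Return True only if the model appears in the list above."""
    return model.strip().lower() in COMPATIBLE_MODELS


def with_compatible_model(config: dict, fallback: str = "claude-3-5-haiku") -> dict:
    """Return a copy of config, swapping in fallback if the model is unlisted."""
    if fallback not in COMPATIBLE_MODELS:
        raise ValueError(f"fallback {fallback!r} is not a listed compatible model")
    current = str(config.get("model") or "")
    if is_listed_compatible(current):
        return dict(config)
    return {**config, "model": fallback}


# Hypothetical scorer configuration; keys and model name are placeholders.
scorer_config = {
    "name": "topic",
    "function_type": "classifier",
    "model": "example-bedrock-llama",
}

updated = with_compatible_model(scorer_config)
print(updated["model"])
```

The function returns a new dictionary rather than mutating the input, so the original configuration stays available for comparison.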

Step 3: Verify scoring is running

After switching models, trigger scoring on recent logs using the "Score past 3 days logs" action, then confirm that results populate correctly.

Step 4: Adjust prompt if needed

After switching models, classification output may differ. Refine the scorer prompt to ensure the model returns the expected category labels.
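One way to decide whether the prompt needs refining is to compare the labels the new model returns against the categories you expect. The sketch below assumes you have collected the scorer's output labels into a list by whatever means you prefer; the sample labels and the `unexpected_labels` helper are hypothetical.

```python
from typing import Iterable, List


def unexpected_labels(outputs: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the distinct output labels that match no expected category.

    Matching ignores case and surrounding whitespace.
    """
    expected_norm = {label.strip().lower() for label in expected}
    return sorted(
        {label for label in outputs if label.strip().lower() not in expected_norm}
    )


# Hypothetical categories and scorer outputs, for illustration only.
expected_categories = ["billing", "refund", "other"]
observed = ["Billing", "billing ", "Refund request", "other"]

print(unexpected_labels(observed, expected_categories))
```

A non-empty result suggests the prompt should state the allowed labels more explicitly.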

Where to find results

Scorer type | Results column
Classification scorer (function_type = "classifier") | classifications.<scorer_name>
LLM-as-judge or custom code scorer | scores.<scorer_name>
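The mapping above can be expressed as a small helper when building queries or reports. This is a sketch of the rule as stated in this article only; the `results_column` function is hypothetical and not part of any SDK.

```python
from typing import Optional


def results_column(scorer_name: str, function_type: Optional[str] = None) -> str:
    """Return the results column for a scorer.

    Classification scorers (function_type == "classifier") write to
    classifications.<scorer_name>; other scorers write to scores.<scorer_name>.
    """
    prefix = "classifications" if function_type == "classifier" else "scores"
    return f"{prefix}.{scorer_name}"


print(results_column("topic", "classifier"))
print(results_column("helpfulness"))
```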