Summary

Issue: When using setup_pydantic_ai(), Braintrust marks internal wrapper spans as type 'llm', so a single API call appears as four separate LLM calls in metrics and dashboards. Cause: PydanticAI creates nested spans for streaming, agent execution, fallback handling, and the actual API call, and all of them are assigned type 'llm'; wrapper spans are not distinguished from the actual API call. Resolution: Filter queries to count only actual LLM API calls by matching model-name patterns, excluding the internal wrapper spans.

Resolution Steps

Step 1: Filter queries for accurate LLM counts

When querying LLM metrics in dashboards or reports, filter to count only actual API calls by matching model-name patterns:
span_attributes.type = 'llm' AND (
  span_attributes.name LIKE 'chat gpt%' OR
  span_attributes.name LIKE 'chat gemini%' OR
  span_attributes.name LIKE 'chat claude%'
)
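The same filtering logic can be sketched in Python. The span records and wrapper names below are hypothetical examples of what a PydanticAI trace might produce; only the model-name prefixes come from the filter above.

```python
# Hypothetical span records from a single PydanticAI call: the wrapper
# spans and the actual API call are all typed "llm".
spans = [
    {"type": "llm", "name": "agent run"},           # wrapper: agent execution
    {"type": "llm", "name": "fallback handler"},    # wrapper: fallback handling
    {"type": "llm", "name": "streaming response"},  # wrapper: streaming
    {"type": "llm", "name": "chat gpt-4o"},         # the actual API call
]

# Mirrors the SQL LIKE patterns: keep only spans whose name starts
# with a real model-name prefix.
MODEL_PREFIXES = ("chat gpt", "chat gemini", "chat claude")

def actual_llm_calls(spans):
    return [
        s for s in spans
        if s["type"] == "llm" and s["name"].startswith(MODEL_PREFIXES)
    ]

print(len(actual_llm_calls(spans)))  # 1 actual API call, not 4
```

Without the name filter, a naive count of type = 'llm' spans would report four calls for this one request.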

Step 2: Apply filters to cost calculations

Use the same filtering pattern when calculating costs or token usage to ensure accurate metrics based on actual API calls rather than wrapper spans.

Step 3: Update dashboard queries

Modify existing dashboard queries and alerts to use the filtered approach to prevent inflated LLM call counts in monitoring and reporting.
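For example, a dashboard count query can embed the filter from Step 1 directly. The table name and SELECT shape below are a hypothetical sketch; adapt them to your dashboard's actual query source.

```sql
-- Count actual LLM API calls, excluding PydanticAI wrapper spans
-- (assumes a "spans" table/view exposing span_attributes).
SELECT COUNT(*) AS llm_calls
FROM spans
WHERE span_attributes.type = 'llm'
  AND (span_attributes.name LIKE 'chat gpt%'
    OR span_attributes.name LIKE 'chat gemini%'
    OR span_attributes.name LIKE 'chat claude%')
```

Apply the same WHERE clause to any alert thresholds that trigger on LLM call volume, so alerts and dashboards agree on what counts as a call.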