Summary

Goal: Access and manually log token metrics on @traced spans when using LangChain with direct-to-provider calls. Features: current_span().log(), metrics fields, SQL sandbox queries, BraintrustCallbackHandler.

Configuration steps

Step 1: Understand where token metrics are captured

Token metrics (prompt_tokens, completion_tokens, tokens) are only automatically captured on LLM-type spans — the actual model call spans created by BraintrustCallbackHandler. @traced decorator spans and chain spans do not automatically capture or roll up token usage from child spans.

Step 2: Query token metrics on LLM-type spans

Use the SQL sandbox, custom columns, or API to access span-level metrics:
SELECT
  span_id,
  metadata.model,
  metrics.prompt_tokens,
  metrics.completion_tokens,
  metrics.tokens,
  metrics.estimated_cost
FROM project_logs('your-project-id')
WHERE span_attributes.type = 'llm'
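If you fetch these rows through the API rather than the SQL sandbox, you can aggregate them client-side. A minimal sketch in plain Python, assuming each row is a dict mirroring the columns in the query above (the row shape here is illustrative, not a guaranteed API response format):

```python
from collections import defaultdict

def total_tokens_by_model(rows):
    """Sum prompt and completion tokens per model across span rows.

    Each row is assumed to carry "metadata" and "metrics" sub-dicts,
    matching the columns selected in the SQL query above.
    """
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for row in rows:
        model = (row.get("metadata") or {}).get("model", "unknown")
        metrics = row.get("metrics") or {}
        totals[model]["prompt_tokens"] += metrics.get("prompt_tokens") or 0
        totals[model]["completion_tokens"] += metrics.get("completion_tokens") or 0
    return dict(totals)

rows = [
    {"metadata": {"model": "gpt-4o"}, "metrics": {"prompt_tokens": 120, "completion_tokens": 30}},
    {"metadata": {"model": "gpt-4o"}, "metrics": {"prompt_tokens": 80, "completion_tokens": 20}},
]
print(total_tokens_by_model(rows))
# {'gpt-4o': {'prompt_tokens': 200, 'completion_tokens': 50}}
```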

Step 3: Manually log metrics to a @traced span

Use current_span().log() inside the decorated function. If LangChain returns usage data on the response object, extract and log it directly:
from typing import Any

import braintrust
from braintrust import traced

@traced(name="LLMChainMixin.ainvoke")
async def ainvoke(self, input: Any, config: Any = None, **kwargs: Any) -> Any:
    result = await self.chain.ainvoke(input, config=config, **kwargs)

    # LangChain chat results may expose usage_metadata; fall back to {} if absent.
    usage = getattr(result, "usage_metadata", {}) or {}
    braintrust.current_span().log(metrics={
        "prompt_tokens": usage.get("input_tokens"),
        "completion_tokens": usage.get("output_tokens"),
    })

    return result
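Providers and LangChain versions name usage fields differently (for example input_tokens/output_tokens vs. prompt_tokens/completion_tokens). A small defensive helper can normalize whichever keys are present before logging; this is a sketch, and the helper name is hypothetical:

```python
def normalize_usage(usage):
    """Map provider-specific usage keys to Braintrust metric names.

    Handles LangChain's usage_metadata style (input_tokens/output_tokens)
    and the OpenAI style (prompt_tokens/completion_tokens). Keys that are
    missing are omitted entirely, so None never overwrites a real value.
    """
    usage = usage or {}
    metrics = {}
    prompt = usage.get("input_tokens", usage.get("prompt_tokens"))
    completion = usage.get("output_tokens", usage.get("completion_tokens"))
    if prompt is not None:
        metrics["prompt_tokens"] = prompt
    if completion is not None:
        metrics["completion_tokens"] = completion
    return metrics

print(normalize_usage({"input_tokens": 12, "output_tokens": 3}))
# {'prompt_tokens': 12, 'completion_tokens': 3}
```

Inside the decorated function you would then call braintrust.current_span().log(metrics=normalize_usage(usage)).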

Step 4: Aggregate token counts across multiple LLM calls

If a single @traced span wraps multiple LLM calls, accumulate counts locally and log the total at the end:
@traced(name="multi-call-span")
async def run_multiple(self, inputs: list) -> list:
    results = []
    total_prompt_tokens = 0
    total_completion_tokens = 0

    for item in inputs:
        result = await self.chain.ainvoke(item)
        results.append(result)
        usage = getattr(result, "usage_metadata", {}) or {}
        total_prompt_tokens += usage.get("input_tokens", 0)
        total_completion_tokens += usage.get("output_tokens", 0)

    # Log the aggregated totals once, on the enclosing @traced span.
    braintrust.current_span().log(metrics={
        "prompt_tokens": total_prompt_tokens,
        "completion_tokens": total_completion_tokens,
    })

    return results

Step 5: Check model name for estimated_cost

estimated_cost is computed at query time using metadata.model matched against Braintrust’s pricing registry.
  • Standard names like gpt-4o resolve correctly.
  • Azure OpenAI custom deployment names (e.g., my-gpt4-deployment) will not match, and estimated_cost returns null.
If your deployment name doesn’t match, tally cost manually using your token counts and per-model pricing.
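The manual tally is simple arithmetic over your logged token counts. A sketch, using made-up per-million-token prices (substitute the actual rates for your model and deployment):

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_m, output_price_per_m):
    """Compute dollar cost from token counts and per-million-token prices."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# Hypothetical prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
cost = estimate_cost(120_000, 30_000, 2.50, 10.00)
print(f"${cost:.2f}")  # $0.60
```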