Summary

Goal: Access and manually log token metrics on @traced spans when using LangChain with direct-to-provider calls. Features: current_span().log(), metrics fields, SQL sandbox queries, BraintrustCallbackHandler.

Configuration steps

Step 1: Understand where token metrics are captured

Token metrics (prompt_tokens, completion_tokens, tokens) are only automatically captured on LLM-type spans — the actual model call spans created by BraintrustCallbackHandler. @traced decorator spans and chain spans do not automatically capture or roll up token usage from child spans.

Step 2: Query token metrics on LLM-type spans

Use the SQL sandbox, custom columns, or API to access span-level metrics:
SELECT
  span_id,
  metadata.model,
  metrics.prompt_tokens,
  metrics.completion_tokens,
  metrics.tokens,
  metrics.estimated_cost
FROM project_logs('your-project-id')
WHERE span_attributes.type = 'llm'
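If you fetch these rows through the API rather than the SQL sandbox, you can aggregate them client-side. A minimal sketch in plain Python, assuming each row is a dict mirroring the columns in the query above (the row shape here is illustrative, not a guaranteed API response format):

```python
from collections import defaultdict

def total_tokens_by_model(rows):
    """Sum prompt and completion tokens per model across span rows.

    Each row is assumed to carry "metadata" and "metrics" sub-dicts,
    matching the columns selected in the SQL query above.
    """
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for row in rows:
        model = (row.get("metadata") or {}).get("model", "unknown")
        metrics = row.get("metrics") or {}
        totals[model]["prompt_tokens"] += metrics.get("prompt_tokens") or 0
        totals[model]["completion_tokens"] += metrics.get("completion_tokens") or 0
    return dict(totals)

rows = [
    {"metadata": {"model": "gpt-4o"}, "metrics": {"prompt_tokens": 120, "completion_tokens": 30}},
    {"metadata": {"model": "gpt-4o"}, "metrics": {"prompt_tokens": 80, "completion_tokens": 20}},
]
print(total_tokens_by_model(rows))
# {'gpt-4o': {'prompt_tokens': 200, 'completion_tokens': 50}}
```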

Step 3: Manually log metrics to a @traced span

Use current_span().log() inside the decorated function. If LangChain returns usage data on the response object, extract and log it directly:
from typing import Any

import braintrust
from braintrust import traced

@traced(name="LLMChainMixin.ainvoke")
async def ainvoke(self, input: Any, config: Any = None, **kwargs: Any) -> Any:
    result = await self.chain.ainvoke(input, config=config, **kwargs)

    # LangChain chat results may expose usage_metadata; fall back to {} if absent.
    usage = getattr(result, "usage_metadata", {}) or {}
    braintrust.current_span().log(metrics={
        "prompt_tokens": usage.get("input_tokens"),
        "completion_tokens": usage.get("output_tokens"),
    })

    return result
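Providers and LangChain versions name usage fields differently (for example input_tokens/output_tokens vs. prompt_tokens/completion_tokens). A small defensive helper can normalize whichever keys are present before logging; this is a sketch, and the helper name is hypothetical:

```python
def normalize_usage(usage):
    """Map provider-specific usage keys to Braintrust metric names.

    Handles LangChain's usage_metadata style (input_tokens/output_tokens)
    and the OpenAI style (prompt_tokens/completion_tokens). Keys that are
    missing are omitted entirely, so None never overwrites a real value.
    """
    usage = usage or {}
    metrics = {}
    prompt = usage.get("input_tokens", usage.get("prompt_tokens"))
    completion = usage.get("output_tokens", usage.get("completion_tokens"))
    if prompt is not None:
        metrics["prompt_tokens"] = prompt
    if completion is not None:
        metrics["completion_tokens"] = completion
    return metrics

print(normalize_usage({"input_tokens": 12, "output_tokens": 3}))
# {'prompt_tokens': 12, 'completion_tokens': 3}
```

Inside the decorated function you would then call braintrust.current_span().log(metrics=normalize_usage(usage)).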

Step 4: Aggregate token counts across multiple LLM calls

If a single @traced span wraps multiple LLM calls, accumulate counts locally and log the total at the end:
@traced(name="multi-call-span")
async def run_multiple(self, inputs: list) -> list:
    results = []
    total_prompt_tokens = 0
    total_completion_tokens = 0

    for item in inputs:
        result = await self.chain.ainvoke(item)
        results.append(result)
        usage = getattr(result, "usage_metadata", {}) or {}
        total_prompt_tokens += usage.get("input_tokens", 0)
        total_completion_tokens += usage.get("output_tokens", 0)

    # Log the aggregated totals once, on the enclosing @traced span.
    braintrust.current_span().log(metrics={
        "prompt_tokens": total_prompt_tokens,
        "completion_tokens": total_completion_tokens,
    })

    return results

Step 5: Check model name for estimated_cost

estimated_cost is computed at query time using metadata.model matched against Braintrust’s pricing registry.
  • Standard names like gpt-4o resolve correctly.
  • Azure OpenAI custom deployment names (e.g., my-gpt4-deployment) will not match, and estimated_cost returns null.
If your deployment name doesn’t match, tally cost manually using your token counts and per-model pricing.
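The manual tally is simple arithmetic over your logged token counts. A sketch, using made-up per-million-token prices (substitute the actual rates for your model and deployment):

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_m, output_price_per_m):
    """Compute dollar cost from token counts and per-million-token prices."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# Hypothetical prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
cost = estimate_cost(120_000, 30_000, 2.50, 10.00)
print(f"${cost:.2f}")  # $0.60
```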