> ## Documentation Index
> Fetch the complete documentation index at: https://braintrust.dev/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Query token distribution by user or prompt with SQL

export const plans_0 = "Any"

export const deployments_0 = "Any"

export const data_plane_version_0 = undefined

export const use_case_0 = "Use case - Querying token consumption distribution per user or prompt when token metrics and user metadata live on different span types"

<Note>
  **Applies to:**

  * Plan - {plans_0}
  * Deployment - {deployments_0}
  * {data_plane_version_0}
  * {use_case_0}
</Note>

## Summary

**Goal:** Query per-user and per-prompt token distribution across traces using the two-level GROUP BY pattern.

**Features:** `project_logs()`, `any_value()`, `estimated_cost()`, `shape => 'traces'`, timestamp filters.

## Configuration steps

### Step 1: Understand span data layout

Token metrics (`metrics.prompt_tokens`, `metrics.completion_tokens`) live on LLM-type spans. User and prompt metadata (e.g., `metadata.user_id`, `metadata.model`) may live on the root span. A single span row cannot see both. The two-level query below bridges them.

### Step 2: Write the inner query (group by trace)

Group by `root_span_id` to collapse all spans in a trace. Use `any_value()` to surface root-span metadata fields within that group.

```sql theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
SELECT
  any_value(metadata.user_id)   AS user,
  any_value(metadata.model)     AS model,
  SUM(metrics.prompt_tokens)    AS prompt_tokens,
  SUM(metrics.completion_tokens) AS completion_tokens,
  SUM(estimated_cost())         AS total_cost
FROM project_logs('<project-id>', shape => 'traces')
WHERE created >= NOW() - INTERVAL 7 DAY
GROUP BY root_span_id
```

### Step 3: Wrap with outer query (group by user or model)

```sql theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
SELECT
  user,
  model,
  SUM(prompt_tokens)    AS total_prompt_tokens,
  SUM(completion_tokens) AS total_completion_tokens,
  SUM(total_cost)       AS total_cost
FROM (
  SELECT
    any_value(metadata.user_id)    AS user,
    any_value(metadata.model)      AS model,
    SUM(metrics.prompt_tokens)     AS prompt_tokens,
    SUM(metrics.completion_tokens) AS completion_tokens,
    SUM(estimated_cost())          AS total_cost
  FROM project_logs('<project-id>', shape => 'traces')
  WHERE created >= NOW() - INTERVAL 7 DAY
  GROUP BY root_span_id
) subq
GROUP BY user, model
ORDER BY total_cost DESC;
```

### Step 4: Use the correct `shape` parameter

| Shape                | Use case                                                                                    |
| -------------------- | ------------------------------------------------------------------------------------------- |
| *(default)*          | One row per span                                                                            |
| `shape => 'traces'`  | All spans returned when any span matches a filter; enables inner GROUP BY on `root_span_id` |
| `shape => 'summary'` | Pre-aggregated trace-level metrics; useful for rollups without a subquery                   |

### Step 5: Use `estimated_cost()` not `metrics.estimated_cost`

`estimated_cost()` is a function that computes cost from token counts and the model registry. Use it in span-level queries. Aggregating the pre-computed `metrics.estimated_cost` field at a second level of grouping is a known bug.

```sql theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
SUM(estimated_cost()) AS total_cost  -- correct
SUM(metrics.estimated_cost)          -- may produce incorrect results in outer GROUP BY
```

### Step 6: Add a timestamp filter to avoid timeouts

Queries without a `created` filter scan the full table and will time out on large projects.

```sql theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
WHERE created >= NOW() - INTERVAL 1 DAY  -- start narrow, expand as needed
```

Start with 1 day and widen the window once the query shape is confirmed.

### Step 7: Work around cross-span JOIN limitations

`JOIN`, `UNION`, and subqueries across different `project_logs()` calls are unsupported. Use these alternatives instead:

* **`shape => 'traces'`** — returns all spans in a trace when any span matches; use `any_value()` to pull fields from other span types
* **`shape => 'summary'`** — returns one pre-aggregated row per trace with total token counts and cost

### Step 8: Handle Loop query generation limitations

Loop does not know about `any_value()` or the two-level GROUP BY pattern. For nuanced schemas:

1. Run Loop with a 1-hour window to generate a starting query.
2. Paste the output into the SQL sandbox.
3. Edit by hand — add `any_value()`, the inner/outer GROUP BY, and a tight `created` filter.

Loop output is a starting point and should be verified against known data.
