Score multi-turn agent conversations across traces

Applies to:

Plan -
Deployment -

Summary

Goal: Query sibling traces by shared metadata fields to reconstruct multi-turn agent conversations inside a custom scorer. Features: SQL queries via the /btql API endpoint, metadata.thread_id filtering, subfield indexing, trace-scope scoring helpers.

Configuration steps

Step 1: Choose a tracing model

Two patterns are supported. Single-trace is preferred when possible. Option A — Single trace per conversation (recommended) Model each conversation as one Braintrust trace with multiple spans (one per turn). Use span.export() to continue the same trace across turns. Trace-scope helpers then work without any cross-trace queries.

# Turn 1
with braintrust.start_span(name="turn_1") as span:
    parent_context = span.export()  # pass to next turn

# Turn 2
with braintrust.start_span(name="turn_2", parent=parent_context) as span:
    ...

Option B — Separate trace per turn If each HTTP request must produce its own trace, correlate traces via metadata.thread_id and query them with SQL.

Step 2: Set `metadata.thread_id` on every trace

with braintrust.start_span(name="turn") as span:
    span.log(metadata={"thread_id": "conv-abc123", "turn": 2})

Step 3: Query sibling traces via SQL inside a scorer

Custom Python scorers receive BRAINTRUST_API_KEY automatically. Use it to call /btql. Always bound the query with a time window and LIMIT to control latency and cost.

import os
import requests

def sql_string(value):
    return "'" + str(value).replace("'", "''") + "'"

def scorer(input, output, metadata):
    api_key = os.environ["BRAINTRUST_API_KEY"]
    api_url = os.environ.get("BRAINTRUST_API_URL", "https://api.braintrust.dev")
    thread_id = metadata.get("thread_id")

    if not thread_id:
        return None

    query = f"""
        SELECT id, root_span_id, span_id, input, output, created
        FROM project_logs('<PROJECT_ID>', shape => 'traces')
        WHERE metadata.thread_id = {sql_string(thread_id)}
          AND created > now() - interval 1 hour
        ORDER BY created ASC
        LIMIT 20
    """

    response = requests.post(
        f"{api_url}/btql",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"query": query},
    )
    response.raise_for_status()

    turns = response.json()["data"]
    # assemble full conversation from turns...

For EU organizations, set BRAINTRUST_API_URL to https://api-eu.braintrust.dev. For self-hosted deployments, set it to your Braintrust data-plane URL.

Step 4: Enable subfield indexing on `metadata.thread_id`

If metadata.thread_id is high-cardinality and queried frequently, enable subfield indexing in your project settings. This reduces lookup latency for that specific filter. Still pair with a time range and LIMIT — indexing speeds up lookups but doesn’t eliminate duplicated scorer work.

Step 5: Scope scoring to the final turn only

Avoid running cross-trace BTQL queries from every span scorer invocation. Run the full reconstruction only on the final turn, or use a batch/offline eval.

def scorer(input, output, metadata):
    # Only score when this is the last turn in the conversation
    if not metadata.get("is_final_turn"):
        return None
    # proceed with BTQL query...

Version requirements

Feature	Requirement
`/btql` queries (self-hosted)	Data plane v1.1.29+
`trace.get_thread()`, `trace.get_spans()`	Data plane v2.0+
Trace-scope scoring helpers — Python SDK	v0.5.6+
Trace-scope scoring helpers — TypeScript SDK	v2.2.1+
Subfield indexing	Data plane v2.0+

​Summary

​Configuration steps

​Step 1: Choose a tracing model

​Step 2: Set metadata.thread_id on every trace

​Step 3: Query sibling traces via SQL inside a scorer

​Step 4: Enable subfield indexing on metadata.thread_id

​Step 5: Scope scoring to the final turn only

​Version requirements

Summary

Configuration steps

Step 1: Choose a tracing model

Step 2: Set `metadata.thread_id` on every trace

Step 3: Query sibling traces via SQL inside a scorer

Step 4: Enable subfield indexing on `metadata.thread_id`

Step 5: Scope scoring to the final turn only

Version requirements