
LLM-as-a-judge scorer prompts support mustache templating. The variables available depend on whether the scorer is scoped to a Span or a Trace.

Span-level variables

Available in any scorer with Scope: Span. Each matching span is scored independently.
Variable        Description
{{input}}       Input passed to the span
{{output}}      Output produced by the span
{{expected}}    Expected output, if provided (optional)
{{metadata}}    Custom metadata attached to the span
Example prompt:
Rate the helpfulness of this response.

Input: {{input}}
Output: {{output}}
{{#expected}}
Expected: {{expected}}
{{/expected}}

Return "A" for very helpful, "B" for somewhat helpful, "C" for not helpful.

Trace-level variables

Available in scorers with Scope: Trace. The scorer runs once per trace and has access to the full conversation thread. The four span-level variables (input, output, expected, metadata) are also available here and are populated from the root span of the trace.
Variable                  Type      Description
{{input}}                 any       Input from the root span
{{output}}                any       Output from the root span
{{expected}}              any       Expected output from the root span (optional)
{{metadata}}              object    Metadata from the root span
{{thread}}                text      Full conversation rendered as human-readable text
{{thread_count}}          number    Total number of messages in the thread
{{first_message}}         object    First message in the thread
{{last_message}}          object    Last message in the thread
{{user_messages}}         array     All user/human messages only
{{assistant_messages}}    array     All assistant messages only
{{human_ai_pairs}}        array     Turn pairs; each item has {human, assistant}
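For a concrete feel of what these variables hold, here is a sketch for a short two-turn conversation. The exact message shape is an assumption for illustration (the examples below access a content field, and a role field is assumed here), so treat it as a sketch rather than the precise format:
thread_messages = [
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Open Settings > Security and click Reset."},
    {"role": "user", "content": "I don't see that option."},
    {"role": "assistant", "content": "It may be hidden on mobile; try the desktop site."},
]

# Derived variables, following the table above:
thread_count = len(thread_messages)                                   # 4
first_message = thread_messages[0]                                    # first user message
last_message = thread_messages[-1]                                    # final assistant reply
user_messages = [m for m in thread_messages if m["role"] == "user"]   # 2 items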

{{thread}}

{{thread}} renders the entire conversation as formatted text, ready to pass directly to a judge model. It’s the simplest way to give the scorer full conversation context.
Example prompt:
Evaluate whether the assistant's responses across this conversation are helpful and on-topic.

Conversation:
{{thread}}

Return "A" if the assistant performed well, "B" if adequate, "C" if poor.

{{human_ai_pairs}}

For Nunjucks prompts, {{human_ai_pairs}} lets you iterate over matched turn pairs:
{% for pair in human_ai_pairs %}
Turn {{ loop.index }}:
  User: {{ pair.human.content }}
  Assistant: {{ pair.assistant.content }}
{% endfor %}

Were the assistant's responses appropriate throughout?
Pairs are matched by index (first user message with first assistant message, etc.). If the counts are unequal, only the matched pairs are included.
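The matching behavior can be pictured as a simple index-based zip. This is an illustrative sketch of the semantics described above, not Braintrust's actual implementation (the role field on messages is an assumption):
thread = [
    {"role": "user", "content": "Hi, I need help with billing."},
    {"role": "assistant", "content": "Sure, what seems to be the problem?"},
    {"role": "user", "content": "I was charged twice this month."},
]

user_messages = [m for m in thread if m["role"] == "user"]            # 2 messages
assistant_messages = [m for m in thread if m["role"] == "assistant"]  # 1 message

# zip() stops at the shorter list, so the unanswered final user message
# does not appear in the pairs.
human_ai_pairs = [
    {"human": u, "assistant": a}
    for u, a in zip(user_messages, assistant_messages)
]
# -> one pair: the first user message matched with the first assistant reply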

{{user_messages}} and {{assistant_messages}}

These filter the thread to a single role. Useful if you only need one side of the conversation:
Rate the clarity of the user's questions in this support conversation.

User messages:
{{#user_messages}}
- {{content}}
{{/user_messages}}

SDK requirements for trace-level scoring

Trace-level scorers require:
  • TypeScript SDK v2.2.1+
  • Python SDK v0.5.6+
  • Ruby SDK v0.2.1+

Setting up multi-turn conversation scoring

If your application creates a new trace per turn (common for chatbots), the easiest way to make {{thread}} work is to route all turns under a single root span using span.export():
Python:
import braintrust

# First turn — create the session root and export it
with braintrust.start_span(name="chat.session") as session_span:
    session_id = session_span.export()
    # persist session_id (e.g. session cookie, Redis, DB)

# Every subsequent turn — attach as a child
with braintrust.start_span(name="chat.turn", parent=session_id) as span:
    span.log(input={"messages": messages}, output=response)
TypeScript:
import { traced } from "braintrust";

// First turn — create and export the session root
let sessionId!: string; // assigned inside the traced callback below
await traced(async (span) => {
  sessionId = await span.export();
}, { name: "chat.session" });

// Every turn — pass the same parent
await traced(async (span) => {
  span.log({ input: { messages }, output: response });
}, { name: "chat.turn", parent: sessionId });
Once all turns share a root trace, a Trace-scoped LLM-as-a-judge scorer with {{thread}} in the prompt will receive the full conversation.
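Putting the pieces together, a per-chat routing sketch might look like the following. The in-memory dictionary and the handle_turn and call_model helpers are hypothetical; in production you would persist the exported id in a session store (cookie, Redis, database), as noted in the comments above:
import braintrust

# Hypothetical in-memory store: chat session id -> exported root span id.
session_roots: dict[str, str] = {}

def handle_turn(chat_id: str, messages: list, call_model):
    # Create the session root the first time this chat is seen, otherwise reuse it.
    if chat_id not in session_roots:
        with braintrust.start_span(name="chat.session") as session_span:
            session_roots[chat_id] = session_span.export()

    # Attach this turn as a child of the shared session root.
    with braintrust.start_span(name="chat.turn", parent=session_roots[chat_id]) as span:
        response = call_model(messages)
        span.log(input={"messages": messages}, output=response)
    return response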