LLM-as-a-judge scorer prompts support mustache templating. The variables available depend on whether the scorer is scoped to a Span or a Trace.
## Span-level variables
Available in any scorer with Scope: Span. Each matching span is scored independently.

| Variable | Description |
|---|---|
| {{input}} | Input passed to the span |
| {{output}} | Output produced by the span |
| {{expected}} | Expected output, if provided (optional) |
| {{metadata}} | Custom metadata attached to the span |
## Trace-level variables
Available in scorers with Scope: Trace. The scorer runs once per trace and has access to the full conversation thread. The four span-level variables (input, output, expected, metadata) are also available here and are populated from the root span of the trace.
| Variable | Type | Description |
|---|---|---|
| {{input}} | any | Input from the root span |
| {{output}} | any | Output from the root span |
| {{expected}} | any | Expected output from the root span (optional) |
| {{metadata}} | object | Metadata from the root span |
| {{thread}} | text | Full conversation rendered as human-readable text |
| {{thread_count}} | number | Total number of messages in the thread |
| {{first_message}} | object | First message in the thread |
| {{last_message}} | object | Last message in the thread |
| {{user_messages}} | array | All user/human messages only |
| {{assistant_messages}} | array | All assistant messages only |
| {{human_ai_pairs}} | array | Turn pairs — each item has {human, assistant} |
### {{thread}}
{{thread}} renders the entire conversation as formatted text, ready to pass directly to a judge model. It’s the simplest way to give the scorer full conversation context.
Example prompt:
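A minimal illustrative prompt that hands the whole conversation to the judge (the rubric wording here is an example, not a required format):

```
You are evaluating a multi-turn conversation between a user and an assistant.

Conversation:
{{thread}}

Rate how well the assistant stayed helpful and on-topic across all turns,
from 1 (poor) to 5 (excellent). Respond with only the number.
```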
### {{human_ai_pairs}}
For Nunjucks prompts, {{human_ai_pairs}} lets you iterate over matched turn pairs:
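For example, a sketch that renders each turn pair (using the {human, assistant} fields described in the table above):

```
{% for pair in human_ai_pairs %}
User: {{ pair.human }}
Assistant: {{ pair.assistant }}
{% endfor %}
```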
### {{user_messages}} and {{assistant_messages}}
These filter the thread to a single role. Useful if you only need one side of the conversation:
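For instance, a Nunjucks sketch that renders only the user side of the thread (this assumes each message object exposes a `content` field; the exact message shape may differ in your setup):

```
{% for message in user_messages %}
- {{ message.content }}
{% endfor %}
```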
## SDK requirements for trace-level scoring
Trace-level scorers require:

- TypeScript SDK v2.2.1+
- Python SDK v0.5.6+
- Ruby SDK v0.2.1+
## Setting up multi-turn conversation scoring
If your application creates a new trace per turn (common for chatbots), the easiest way to make {{thread}} work is to route all turns under a single root span using span.export():
Python:
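A sketch of the pattern, assuming the Braintrust Python SDK's `init_logger`, `start_span`, and `span.export()` calls; the variables `first_user_message`, `first_reply`, `next_user_message`, and `next_reply` are placeholders for your application's data:

```python
import braintrust

logger = braintrust.init_logger(project="my-chatbot")

# First turn: create the root span and keep its exported identifier.
with logger.start_span(name="conversation") as root:
    root.log(input=first_user_message, output=first_reply)
    # Persist this string alongside the chat session (e.g. in your DB).
    exported_root = root.export()

# Later turns: attach each new span to the same root via the exported ID,
# so every turn lands in one trace and {{thread}} sees the whole conversation.
with logger.start_span(name="turn", parent=exported_root) as span:
    span.log(input=next_user_message, output=next_reply)
```

Persisting the exported root ID between requests is the key step: without it, each turn starts a fresh trace and the trace-level scorer only ever sees one exchange.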
{{thread}} in the prompt will receive the full conversation.