Prompt diff feature has 4096 character limit

Summary

The diff feature in Braintrust has a 4096-character limit per prompt or output. When you compare prompts or outputs that exceed this limit in the Braintrust UI, the diff visualization is truncated or unavailable, and the UI displays a message like “This diff output has been truncated for performance reasons (to 4096 characters).” The limit exists to keep the UI responsive when comparing large text fields. Users working with long context windows, detailed system prompts, or extensive outputs may run into it when visualizing changes between experiment runs.

Symptoms

Diff view shows a truncation message: “This diff output has been truncated for performance reasons (to 4096 characters)”
Cannot see the full comparison between two prompts or outputs in the UI
The diff feature appears incomplete or cut off when viewing experiment comparisons
Only the first 4096 characters of each prompt/output are compared in the visual diff
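To check ahead of time which fields will hit the limit, a short script can scan a record's input or output before you open the diff view. This is a minimal sketch: find_long_fields is a hypothetical helper, not part of the Braintrust SDK, and it assumes the limit counts characters the way Python's len() does, which may differ slightly from how the UI counts some non-ASCII text.

```python
from typing import Any, List, Tuple

# Limit described in this article (assumed to count characters like Python's len())
DIFF_CHAR_LIMIT = 4096


def find_long_fields(value: Any, limit: int = DIFF_CHAR_LIMIT, path: str = "") -> List[Tuple[str, int]]:
    """Return (path, length) for every string in value that is longer than limit."""
    found: List[Tuple[str, int]] = []
    if isinstance(value, str):
        if len(value) > limit:
            found.append((path or "<root>", len(value)))
    elif isinstance(value, dict):
        # Recurse into each key, building a dotted path such as "context.background"
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            found.extend(find_long_fields(child, limit, child_path))
    elif isinstance(value, (list, tuple)):
        # Recurse into each element, building an indexed path such as "messages[0]"
        for i, child in enumerate(value):
            found.extend(find_long_fields(child, limit, f"{path}[{i}]"))
    return found


# Hypothetical record input: one oversized system prompt and one short message
record_input = {
    "system_prompt": "x" * 5000,
    "messages": [{"role": "user", "content": "short question"}],
}
print(find_long_fields(record_input))
```

Any path it reports is a field whose diff will likely be truncated in the UI, which makes it a candidate for the workarounds below.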

Workarounds

Option 1: Export and Compare Externally (Recommended)

Export your experiment data and use external diff tools to compare long prompts. This approach gives you full control over the comparison and supports prompts of any length.

Python SDK

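# Export experiment data for external comparison
import json

import braintrust

# open=True opens an existing experiment read-only so its records can be fetched
experiment = braintrust.init(
    project="your-project", experiment="your-experiment", open=True
)

# Write each record's full prompt and output to files for external diffing
for record in experiment.fetch():
    record_input = record.get("input") or {}
    prompt = str(record_input.get("prompt", "")) if isinstance(record_input, dict) else str(record_input)
    output = record.get("output")
    with open(f"prompt_{record['id']}.txt", "w", encoding="utf-8") as f:
        f.write(prompt)
    with open(f"output_{record['id']}.txt", "w", encoding="utf-8") as f:
        # Non-string outputs are serialized as JSON so they can be diffed as text
        f.write(output if isinstance(output, str) else json.dumps(output, indent=2))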

TypeScript SDK

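// Export experiment data for external comparison
import { init } from "braintrust";
import { writeFile } from "node:fs/promises";

// open: true opens an existing experiment read-only so its records can be fetched
const experiment = init({
  project: "your-project",
  experiment: "your-experiment",
  open: true,
});

// Fetch records and write each full prompt and output to files
for await (const record of experiment.fetch()) {
  const input = record.input as { prompt?: unknown } | undefined;
  const prompt = typeof input?.prompt === "string" ? input.prompt : "";
  const output = record.output;
  await writeFile(`prompt_${record.id}.txt`, prompt);
  await writeFile(
    `output_${record.id}.txt`,
    typeof output === "string" ? output : JSON.stringify(output ?? null, null, 2)
  );
}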
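Once the prompts are exported, any diff tool works. As one option, Python's standard difflib module can produce a unified diff of two full-length texts with no character limit. This is a minimal sketch: diff_texts and diff_files are hypothetical helper names, and the file names are assumed to follow the prompt_<id>.txt pattern used in the export scripts above.

```python
import difflib
from pathlib import Path


def diff_texts(old: str, new: str, old_name: str = "old", new_name: str = "new") -> str:
    """Return a unified diff of two texts of any length (empty string if identical)."""
    diff_lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # lines were split without newlines, so don't add them back here
    )
    return "\n".join(diff_lines)


def diff_files(old_path: str, new_path: str) -> str:
    """Diff two exported files, e.g. prompt_<id>.txt from two experiment runs."""
    old = Path(old_path).read_text(encoding="utf-8")
    new = Path(new_path).read_text(encoding="utf-8")
    return diff_texts(old, new, old_path, new_path)


# Example with inline strings standing in for two exported prompts
before = "You are a helpful assistant.\nAnswer briefly."
after = "You are a helpful assistant.\nAnswer in detail."
print(diff_texts(before, after, "run_a", "run_b"))
```

Because difflib compares line by line, a prompt stored as one long paragraph shows up as a single removed line and a single added line; splitting on sentences first, or using a word-level diff tool, can make such changes easier to read.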

Option 2: View Full Prompts Separately

Instead of using the diff view, open each experiment run separately to view the complete prompts side-by-side in different browser tabs or windows. This allows you to manually review the full content without truncation.

Option 3: Break Down Long Prompts

If possible, restructure your prompts into smaller, logical segments stored in separate fields. This allows you to use the diff feature on individual components while staying under the 4096 character limit per field.

Python SDK

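# Structure prompts as separate components
from braintrust import init_logger

logger = init_logger(project="your-project")
response = "..."  # placeholder: the model output you want to log

logger.log(
    input={
        "system_prompt": "Your system instructions...",
        "context": "Background context...",
        "user_query": "The actual user question..."
    },
    output=response
)
# Each field can now be diffed separately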

TypeScript SDK

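// Structure prompts as separate components
import { initLogger } from "braintrust";

const logger = initLogger({ projectName: "your-project" });
const response = "..."; // placeholder: the model output you want to log

logger.log({
  input: {
    system_prompt: "Your system instructions...",
    context: "Background context...",
    user_query: "The actual user question..."
  },
  output: response
});
// Each field can now be diffed separately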
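When a long prompt has no natural named sections, one way to approximate this is to split it on blank lines into numbered fields that each stay under the limit. This is a minimal sketch: split_into_fields is a hypothetical helper, not a Braintrust function, and it assumes paragraphs are separated by a blank line ("\n\n").

```python
from typing import Dict, List

DIFF_CHAR_LIMIT = 4096  # per-field limit described in this article


def split_into_fields(text: str, limit: int = DIFF_CHAR_LIMIT, prefix: str = "part") -> Dict[str, str]:
    """Pack blank-line-separated paragraphs into fields of at most limit characters.

    Joining the values with "\n\n" restores the original text, unless a single
    paragraph was longer than limit and had to be hard-split.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph that is itself over the limit
        pieces = [para[i:i + limit] for i in range(0, len(para), limit)] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate  # still fits: keep growing this field
            else:
                chunks.append(current)  # field is full: start a new one
                current = piece
    if current:
        chunks.append(current)
    return {f"{prefix}_{i:02d}": chunk for i, chunk in enumerate(chunks)}


# Hypothetical long prompt made of five ~2,800-character sections
long_prompt = "\n\n".join(f"Section {n}: " + "detail " * 400 for n in range(5))
fields = split_into_fields(long_prompt)
print({name: len(chunk) for name, chunk in fields.items()})
```

The resulting dictionary can be passed as the input to logger.log. Splitting on paragraph boundaries keeps most edits local to one field, but inserting a large paragraph early in the prompt can shift later boundaries and make several fields appear changed, so named logical sections like the ones in the examples above remain the better choice when your prompt has them.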

Option 4: Log Objects for Efficient Diffing

The full truncation message also recommends that, if your prompt is structured data rather than plain text, you log it as a JavaScript or Python object (for example, a list of chat messages plus a parameters object). Braintrust diffs nested objects more efficiently than long strings, which may give better results for complex prompts.

Python SDK

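# Log prompts as structured objects
from braintrust import init_logger

logger = init_logger(project="your-project")
response = "..."  # placeholder: the model output you want to log

logger.log(
    input={
        "messages": [
            {"role": "system", "content": "System instructions..."},
            {"role": "user", "content": "User query..."}
        ],
        "parameters": {"temperature": 0.7, "max_tokens": 2000}
    },
    output=response
)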

TypeScript SDK

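// Log prompts as structured objects
import { initLogger } from "braintrust";

const logger = initLogger({ projectName: "your-project" });
const response = "..."; // placeholder: the model output you want to log

logger.log({
  input: {
    messages: [
      { role: "system", content: "System instructions..." },
      { role: "user", content: "User query..." }
    ],
    parameters: { temperature: 0.7, max_tokens: 2000 }
  },
  output: response
});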
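To illustrate why structured input helps, the sketch below compares two nested objects path by path, so a change to one message or one parameter is reported on its own instead of as an edit somewhere inside one long string. This is not Braintrust's diff algorithm; diff_objects is a hypothetical helper showing the general idea. It compares lists by position and treats a missing key the same as a None value.

```python
import copy
from typing import Any, List, Tuple

# (path, old value, new value) for each leaf that differs
Change = Tuple[str, Any, Any]


def diff_objects(old: Any, new: Any, path: str = "") -> List[Change]:
    """Recursively compare two JSON-like values and return the paths that differ."""
    changes: List[Change] = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Visit every key present on either side, in a stable order
        for key in sorted(set(old) | set(new), key=str):
            child_path = f"{path}.{key}" if path else str(key)
            changes.extend(diff_objects(old.get(key), new.get(key), child_path))
    elif isinstance(old, list) and isinstance(new, list):
        # Compare lists position by position; missing items count as None
        for i in range(max(len(old), len(new))):
            old_item = old[i] if i < len(old) else None
            new_item = new[i] if i < len(new) else None
            changes.extend(diff_objects(old_item, new_item, f"{path}[{i}]"))
    elif old != new:
        changes.append((path or "<root>", old, new))
    return changes


old = {
    "messages": [
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "User query..."},
    ],
    "parameters": {"temperature": 0.7, "max_tokens": 2000},
}
new = copy.deepcopy(old)
new["messages"][0]["content"] = "Be concise and cite sources."
new["parameters"]["temperature"] = 0.2

for change in diff_objects(old, new):
    print(change)
```

Each reported change touches only one small leaf value, which is why structured logging tends to stay well under a per-field limit even when the prompt as a whole is long.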

Notes

The 4096 character limit applies to each individual field being compared, not the total size of all fields combined
This is the first widely reported case of this limitation affecting users since the feature was introduced in 2023
A feature request (BRA-3817) has been filed to make the character limit configurable or increase it for users with long prompts
The limit exists primarily for UI performance reasons when rendering large diffs in the browser
Structured object logging may provide better diff performance than long string fields, as Braintrust can diff nested structures more efficiently
For very long prompts (>10,000 characters), consider external diff tools such as diff, git diff --no-index (which compares two files outside a Git repository), or dedicated text comparison software
