Summary
The diff feature in Braintrust has a 4096-character limit per prompt or output. When you compare prompts or outputs that exceed this limit in the Braintrust UI, the diff visualization is truncated or unavailable, and it displays a message like “This diff output has been truncated for performance reasons (to 4096 characters).” The limit exists to keep the UI responsive when comparing large text fields. Users working with long context windows, detailed system prompts, or extensive outputs may run into it when visualizing changes between experiment runs.
Symptoms
- Diff view shows a truncation message: “This diff output has been truncated for performance reasons (to 4096 characters)”
- Cannot see the full comparison between two prompts or outputs in the UI
- The diff feature appears incomplete or cut off when viewing experiment comparisons
- Only the first 4096 characters of each prompt/output are compared in the visual diff
Workarounds
Option 1: Export and Compare Externally (Recommended)
Export your experiment data and use external diff tools to compare long prompts. This approach gives you full control over the comparison and supports prompts of any length.
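Here is a minimal sketch of the external comparison step. It uses only Python's standard library (difflib). It assumes you have already exported both experiments, for example from the UI or with the SDK, to JSON files. Each file is assumed to hold a list of row objects with a unique `id` key. That file layout, the `id` key, and the helper names (`load_rows`, `as_text`, `diff_text`, `diff_experiments`) are assumptions for illustration, not Braintrust's documented export format.

```python
import difflib
import json
from pathlib import Path
from typing import Any


def load_rows(path: str) -> dict[str, dict]:
    """Load an exported experiment (assumed: a JSON list of row objects), keyed by row id."""
    rows = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: every row has a unique "id" key.
    return {str(row["id"]): row for row in rows}


def as_text(value: Any) -> str:
    """Strings are compared as-is; other values are rendered as stable, indented JSON."""
    if isinstance(value, str):
        return value
    return json.dumps(value, indent=2, sort_keys=True, ensure_ascii=False)


def diff_text(old: str, new: str, old_label: str, new_label: str) -> str:
    """Return a complete unified diff of two strings, with no length cap."""
    return "\n".join(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # lines are joined with "\n" below
        )
    )


def diff_experiments(old_path: str, new_path: str, field: str = "output") -> dict[str, str]:
    """Diff one field for every row id present in both exports, skipping identical rows."""
    old_rows, new_rows = load_rows(old_path), load_rows(new_path)
    diffs: dict[str, str] = {}
    for row_id in sorted(old_rows.keys() & new_rows.keys()):
        old_val = as_text(old_rows[row_id].get(field, ""))
        new_val = as_text(new_rows[row_id].get(field, ""))
        if old_val != new_val:
            diffs[row_id] = diff_text(old_val, new_val, f"{row_id} (old)", f"{row_id} (new)")
    return diffs
```

For example, `diff_experiments("baseline.json", "candidate.json", field="input")` returns one full unified diff per changed row. You can print these or write them to `.diff` files for any viewer. Non-string values are serialized as sorted, indented JSON, so nested objects also diff line by line.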
Option 2: View Full Prompts Separately
Instead of using the diff view, open each experiment run separately to view the complete prompts side by side in different browser tabs or windows. This allows you to review the full content manually, without truncation.
Option 3: Break Down Long Prompts
If possible, restructure your prompts into smaller, logical segments stored in separate fields. This allows you to use the diff feature on individual components while keeping each field under the 4096-character limit.
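One way to do this is sketched below. It assumes your prompt already has natural boundaries (here, blank lines between paragraphs). The long string is split into named segments that each stay under the limit, and you log the resulting dict instead of the single string. The `split_prompt` helper and the `section_N` key names are hypothetical and chosen for illustration.

```python
DIFF_CHAR_LIMIT = 4096  # per-field diff limit described above


def split_prompt(prompt: str, limit: int = DIFF_CHAR_LIMIT) -> dict[str, str]:
    """Split a prompt on blank lines into named segments of at most `limit` characters.

    Paragraphs are packed greedily into a segment until the next one would push it
    over `limit`. A single paragraph longer than `limit` is hard-wrapped into chunks.
    """
    segments: list[str] = []
    current = ""
    for para in prompt.split("\n\n"):
        # Hard-wrap any paragraph that is itself longer than the limit.
        pieces = [para[i : i + limit] for i in range(0, len(para), limit)] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                segments.append(current)
                current = piece
    if current:
        segments.append(current)
    return {f"section_{i + 1}": seg for i, seg in enumerate(segments)}
```

Greedy packing keeps the number of fields small. However, an edit near the top can shift later paragraphs into different sections, which makes the per-section diffs noisier. If your prompt has its own semantic parts (rules, examples, context), splitting on those usually gives more stable comparisons. Check the SDK reference for the exact logging call that accepts the dict.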
Option 4: Log Objects for Efficient Diffing
As mentioned in the truncation message, if your prompt is structured data rather than plain text, log it as a JavaScript/Python object. Braintrust performs more efficient diffing on nested objects, which may produce better results for complex prompts.
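The sketch below shows why the object form helps. It represents a prompt as a nested Python dict and lists only the leaf paths that changed between two versions. This is a toy comparison written for illustration, not Braintrust's diff algorithm. The field names (`system`, `instructions`, `examples`) and the `changed_paths` helper are hypothetical. In practice you would log the dict itself rather than a flattened string.

```python
from typing import Any


def changed_paths(old: Any, new: Any, path: str = "") -> list[str]:
    """Return the paths of leaves that differ between two nested dict/list structures."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths: list[str] = []
        for key in sorted(old.keys() | new.keys(), key=str):
            child = f"{path}.{key}" if path else str(key)
            if key not in old or key not in new:
                paths.append(child)  # key was added or removed
            else:
                paths.extend(changed_paths(old[key], new[key], child))
        return paths
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        paths = []
        for i, (a, b) in enumerate(zip(old, new)):
            paths.extend(changed_paths(a, b, f"{path}[{i}]"))
        return paths
    # Leaves, type changes, or lists of different length are compared as a whole.
    return [] if old == new else [path or "<root>"]


prompt_v1 = {
    "system": "You are a support assistant for Acme.",
    "instructions": ["Be concise.", "Cite the knowledge base."],
    "examples": [{"q": "Reset password?", "a": "Use the reset link."}],
}
prompt_v2 = {
    "system": "You are a support assistant for Acme.",
    "instructions": ["Be concise.", "Cite the knowledge base by article ID."],
    "examples": [{"q": "Reset password?", "a": "Use the reset link."}],
}
print(changed_paths(prompt_v1, prompt_v2))  # ['instructions[1]']
```

If the same prompt were stored as one long string, this edit would appear as a change somewhere inside a single large field. In the object form, the change is confined to one small field, and each leaf can stay well under the 4096-character limit.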
Notes
- The 4096 character limit applies to each individual field being compared, not the total size of all fields combined
- Although the limit has existed since the diff feature was introduced in 2023, this is currently the first widely reported case of it affecting users
- A feature request (BRA-3817) has been filed to make the character limit configurable or increase it for users with long prompts
- The limit exists primarily for UI performance reasons when rendering large diffs in the browser
- Structured object logging may provide better diff performance than long string fields, as Braintrust can diff nested structures more efficiently
- For very long prompts (>10,000 characters), consider using external diff tools such as diff, git diff, or specialized text comparison software
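Because the limit applies per field, a quick check before logging can show which fields of a record will have truncated diffs. The sketch below assumes a record is a dict of the values you log (for example input, output, metadata). It measures non-string values by their JSON serialization. That measurement is an approximation rather than a documented rule, and it is conservative for nested objects, which Braintrust diffs differently. The `fields_over_limit` helper is hypothetical.

```python
import json
from typing import Any

DIFF_CHAR_LIMIT = 4096  # per-field diff limit described above


def fields_over_limit(record: dict[str, Any], limit: int = DIFF_CHAR_LIMIT) -> dict[str, int]:
    """Return {field_name: length} for every top-level field longer than `limit` characters."""
    over: dict[str, int] = {}
    for name, value in record.items():
        # Strings are measured directly; other values by their serialized JSON length.
        text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        if len(text) > limit:
            over[name] = len(text)
    return over


record = {
    "input": "Summarize the following document:\n" + "lorem ipsum " * 500,
    "output": "A short summary.",
    "metadata": {"model": "example-model"},
}
print(fields_over_limit(record))  # {'input': 6034}
```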