Applies to:
- Plan -
- Deployment -
Summary
Issue: Iterating over a Braintrust dataset with for row in dataset fails with a 500 Internal Server Error from the /btql endpoint, raising braintrust.util.AugmentedHTTPError.
Cause: The default batch size of 1000 rows per paginated BTQL request can hit a transient backend timeout (e.g., S3 connectivity issue on a storage node), and the Python SDK does not automatically retry on 5xx responses.
Resolution: Retry the evaluation; transient errors typically resolve on their own. If the error recurs, use dataset.fetch(batch_size=500) to reduce the size of each paginated request.
Resolution steps
If the error occurred once
Step 1: Retry the evaluation
Re-run the eval without changes. A transient backend timeout is not caused by your code or dataset size and should not persist.
If the error recurs intermittently
Step 1: Replace dataset iteration with fetch(batch_size=...)
Replace for row in dataset with an explicit fetch() call using a smaller batch size. Start with 500; reduce to 100–200 if errors continue.
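A minimal sketch of the change, assuming a dataset opened with braintrust.init_dataset and that fetch() returns an iterable of rows (as the replacement for for row in dataset implies). The project and dataset names are placeholders:

```python
import braintrust

# Placeholder project/dataset names -- substitute your own.
dataset = braintrust.init_dataset(project="my-project", name="my-dataset")

# Before: implicit iteration uses the default batch size of 1000 rows
# per paginated BTQL request.
# for row in dataset:
#     ...

# After: fetch pages explicitly with a smaller batch size.
for row in dataset.fetch(batch_size=500):
    ...  # your eval logic here
```

If errors persist at 500, lower batch_size to 100-200 before investigating further.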
Step 2: Add retry logic for 5xx errors
The SDK does not retry 5xx responses automatically. Wrap the fetch call in a retry loop, as in the sketch below, to handle transient failures without aborting the eval.
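A sketch of a simple retry wrapper, assuming the 500 surfaces as braintrust.util.AugmentedHTTPError (per the traceback above). For brevity it retries on any AugmentedHTTPError rather than inspecting the status code, and the backoff parameters are illustrative:

```python
import time

import braintrust
from braintrust.util import AugmentedHTTPError

# Placeholder project/dataset names -- substitute your own.
dataset = braintrust.init_dataset(project="my-project", name="my-dataset")

def fetch_rows_with_retry(dataset, batch_size=500, max_attempts=3):
    """Materialize all rows, retrying the whole fetch on transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return list(dataset.fetch(batch_size=batch_size))
        except AugmentedHTTPError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # simple exponential backoff

for row in fetch_rows_with_retry(dataset):
    ...  # your eval logic here
```

Materializing the rows with list() keeps the retry boundary at the fetch, so a failure mid-pagination restarts the fetch cleanly rather than resuming from a partially consumed iterator.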