

Summary

Issue: High-volume operations such as backfill jobs produce spikes of 401 authentication errors (invalid x-api-key) when using https://braintrustproxy.com/v1. The errors last 1-2 minutes and then resolve on their own.

Cause: The current proxy gateway infrastructure has capacity constraints that trigger temporary rate limiting under high load. This rate limiting surfaces as 401 errors even though the API key is valid.

Resolution: Throttle concurrent requests during backfills, or schedule backfills during off-peak hours, until the new gateway is deployed.

Resolution Steps

If running backfill jobs

Step 1: Throttle request rate

Reduce concurrent requests to stay below gateway capacity limits.
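```python
# Add rate limiting to your backfill
import time


def throttled_backfill(data_batches, send_batch, delay_s=0.1):
    """Send each batch with send_batch (placeholder for your proxy or logging call)."""
    for batch in data_batches:
        send_batch(batch)
        time.sleep(delay_s)  # Pause between batches to throttle the request rate
```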
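A fixed sleep throttles a sequential loop. If your backfill sends requests concurrently, capping the number of requests in flight is a more direct way to stay under the gateway's limits. The sketch below uses `asyncio.Semaphore` for that cap. `send` stands in for whatever async call your backfill makes to the proxy (hypothetical), and the default of 4 is an arbitrary starting point, not a documented gateway threshold.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def run_bounded(
    items: Iterable[Any],
    send: Callable[[Any], Awaitable[Any]],
    max_in_flight: int = 4,  # illustrative cap, not a documented gateway limit
) -> List[Any]:
    """Call `send` on every item with at most `max_in_flight` calls running at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(item: Any) -> Any:
        async with semaphore:  # waits here while max_in_flight calls are active
            return await send(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(guarded(item) for item in items))
```

Usage would look like `asyncio.run(run_bounded(rows, send_to_proxy, max_in_flight=4))`, where `send_to_proxy` is your own (hypothetical) async function. Lower `max_in_flight` if 401 spikes continue.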

Step 2: Schedule during off-peak hours

Run backfills when customer traffic is lowest to reduce contention.
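If you want a backfill script to wait for a quiet window before it starts, one option is to compute the delay until a chosen UTC hour. This is a sketch only: the 02:00 UTC default is a placeholder, because the right window depends on when your own traffic is lowest.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_hour(start_hour_utc: int, now: Optional[datetime] = None) -> float:
    """Seconds from `now` (timezone-aware) until the next start_hour_utc:00 UTC."""
    now = now or datetime.now(timezone.utc)
    target = now.replace(hour=start_hour_utc, minute=0, second=0, microsecond=0)
    if target < now:
        target += timedelta(days=1)  # that hour has passed today; use tomorrow's
    return (target - now).total_seconds()


def wait_for_off_peak(start_hour_utc: int = 2) -> None:
    """Block until the (placeholder) off-peak hour, then return so the backfill can start."""
    time.sleep(seconds_until_hour(start_hour_utc))
```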

Step 3: Implement retry logic

Handle transient 401 errors with exponential backoff.
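```python
import os
from openai import AuthenticationError, OpenAI
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
# Assumes your backfill calls the proxy through the OpenAI Python SDK
client = OpenAI(base_url="https://braintrustproxy.com/v1", api_key=os.environ["BRAINTRUST_API_KEY"])
# Retry only 401s: up to 3 attempts, exponential waits clamped to 2-10 s, then re-raise
@retry(retry=retry_if_exception_type(AuthenticationError), stop=stop_after_attempt(3),
       wait=wait_exponential(multiplier=1, min=2, max=10), reraise=True)
def complete_with_retry(**kwargs):
    return client.chat.completions.create(**kwargs)
```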
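With `stop_after_attempt(3)` and waits clamped between 2 and 10 seconds, the decorator above gives up after only a few seconds of total waiting. That is much shorter than the 1-2 minute error window described in this article. The sketch below uses only the standard library and bounds retries by elapsed time instead of attempt count, using "full jitter" exponential backoff so parallel workers do not retry in lockstep. The 150-second budget and the 2 s / 30 s base and cap are assumptions chosen to outlast a 2-minute spike, not Braintrust recommendations.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_within_budget(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    budget_s: float = 150.0,  # assumption: slightly longer than the 1-2 minute window
    base_s: float = 2.0,
    cap_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fn(), retrying on `retry_on` with full-jitter backoff until the time budget runs out."""
    deadline = clock() + budget_s
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            attempt += 1
            # Full jitter: random delay in [0, min(cap, base * 2**(attempt - 1))]
            delay = random.uniform(0, min(cap_s, base_s * 2 ** (attempt - 1)))
            if clock() + delay >= deadline:
                raise  # budget exhausted: surface the last error to the caller
            sleep(delay)
```

If you call the proxy through the OpenAI Python SDK, `retry_on` could be `(openai.AuthenticationError,)`. Because a truly invalid key also returns 401, the time budget bounds how long a real misconfiguration keeps retrying.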

Otherwise, if you see errors during production traffic

Step 1: Monitor error duration

These errors should clear on their own within 1-2 minutes as load decreases. Note when they start and stop so you can tell whether they outlast that window.
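One way to track error duration is to record when the current run of consecutive 401s began and compare it with the 2-minute threshold used in this article. The `ErrorWindow` class below is a hypothetical helper, not part of any Braintrust SDK. It treats any non-401 response as the end of a run.

```python
import time
from typing import Callable, Optional


class ErrorWindow:
    """Track the current run of consecutive 401 responses and how long it has lasted."""

    def __init__(self, alert_after_s: float = 120.0, clock: Callable[[], float] = time.monotonic):
        self.alert_after_s = alert_after_s  # 2 minutes, matching the expected recovery window
        self.clock = clock
        self.first_error_at: Optional[float] = None

    def record(self, status_code: int) -> None:
        if status_code == 401:
            if self.first_error_at is None:
                self.first_error_at = self.clock()  # start of a new error run
        else:
            self.first_error_at = None  # any non-401 response ends the run

    def duration_s(self) -> float:
        return 0.0 if self.first_error_at is None else self.clock() - self.first_error_at

    def exceeded(self) -> bool:
        """True once the current run has lasted longer than alert_after_s."""
        return self.duration_s() > self.alert_after_s
```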

Step 2: Implement client-side retry logic

Use exponential backoff to handle temporary failures without data loss.
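Backoff alone can still drop a request if the errors outlast your retries. A minimal sketch for avoiding that: if the send still fails, park the payload in a local JSON Lines file and replay it once the errors clear. `send` is assumed to already include its own backoff (for example, a tenacity-decorated function like the one in the backfill section). The file-based queue is an illustrative choice, not a Braintrust feature.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Tuple, Type


def send_or_park(
    payload: Dict[str, Any],
    send: Callable[[Dict[str, Any]], Any],
    retry_on: Tuple[Type[BaseException], ...],
    park_file: Path,
) -> bool:
    """Try send(payload); on a retryable failure, append it to park_file. True if sent."""
    try:
        send(payload)
        return True
    except retry_on:
        with park_file.open("a", encoding="utf-8") as f:
            f.write(json.dumps(payload) + "\n")  # one JSON object per line
        return False


def replay_parked(park_file: Path, send: Callable[[Dict[str, Any]], Any]) -> int:
    """Resend every parked payload; returns how many were resent.

    At-least-once: if a resend raises partway through, the file is kept,
    so earlier payloads will be sent again on the next replay.
    """
    if not park_file.exists():
        return 0
    lines = park_file.read_text(encoding="utf-8").splitlines()
    for line in lines:
        send(json.loads(line))
    park_file.unlink()  # only reached if every resend succeeded
    return len(lines)
```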

Step 3: Contact support

If errors persist beyond 2 minutes or affect customer requests, contact Braintrust support with timestamps and error details.
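To have timestamps and error details ready for a support request, you can record each failure with a UTC timestamp as it happens. The helpers below are a hypothetical sketch. The field names are illustrative, not a format Braintrust support requires. Leave API keys out of the `message` field.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def error_record(status_code: int, url: str, message: str) -> Dict[str, Any]:
    """Build one error entry with a UTC timestamp."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status_code": status_code,
        "url": url,
        "message": message,
    }


def summarize(records: List[Dict[str, Any]]) -> str:
    """Return a pasteable summary: count, first and last error time, then one JSON line per entry."""
    if not records:
        return "No errors recorded."
    first, last = records[0]["timestamp"], records[-1]["timestamp"]
    body = "\n".join(json.dumps(r) for r in records)
    return f"{len(records)} errors between {first} and {last} (UTC)\n{body}"
```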