Troubleshooting Traces Stuck in Progress

Summary

Traces can become stuck in an “in progress” state when a trace or span is never ended. The most common cause is a missing end call, which prevents Braintrust from marking the trace as complete. Other causes include network interruptions, application crashes before the trace is finalized, and error handling that bypasses the trace completion logic. Every trace needs a corresponding end call to move from “in progress” to “complete” status.

Symptoms

Trace shows as “in progress” for an extended period (hours or days)
Trace never transitions to a “complete” or “finished” state in the UI
Missing end timestamp or duration information for the trace
Child spans are complete but parent trace remains open
Trace metrics and aggregations may be incomplete or inaccurate

Workarounds

Option 1: Ensure Proper Trace Lifecycle Management (Recommended)

Use context managers (Python), the traced() wrapper (TypeScript), or try-finally blocks in either language so that traces are closed even when exceptions occur. This is the most reliable way to prevent traces from getting stuck.

Python SDK

# Using context manager (recommended)
import braintrust

with braintrust.start_span(name="my-operation") as span:
    # Your code here
    span.log(input="example input", output="example output")
    # Span automatically closes when exiting the context

# Alternative: Manual management with try-finally
span = braintrust.start_span("my-operation")
try:
    # Your code here
    span.log(input="example input", output="example output")
finally:
    span.end()  # Ensure this is always called
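To see why the context manager form is safer, the following minimal sketch compares the two styles when the traced work raises. It does not call the Braintrust SDK: RecordingSpan and failing_step are hypothetical stand-ins that only record whether end() ran.

```python
class RecordingSpan:
    """Hypothetical stand-in for an SDK span; it only records whether end() ran."""

    def __init__(self, name):
        self.name = name
        self.ended = False
        self.events = []

    def log(self, **fields):
        self.events.append(fields)

    def end(self):
        self.ended = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when the body raises.
        self.end()
        return False  # do not swallow the exception


def failing_step():
    """Hypothetical unit of work that always fails."""
    raise RuntimeError("simulated failure")


def run_with_context_manager():
    span = RecordingSpan("my-operation")
    try:
        with span:
            span.log(input="example input")
            failing_step()
    except RuntimeError:
        pass
    return span


def run_without_cleanup():
    span = RecordingSpan("my-operation")
    try:
        span.log(input="example input")
        failing_step()
        span.end()  # skipped because failing_step() raised first
    except RuntimeError:
        pass
    return span
```

In this sketch the span from run_with_context_manager() ends despite the exception, while the span from run_without_cleanup() is left open, which is the situation that would show up as a trace stuck in progress.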

TypeScript SDK

// Using the traced() wrapper (recommended)
import { traced } from "braintrust";

await traced(async (span) => {
  // Your code here
  span.log({ input: "example input", output: "example output" });
  // Span automatically ends when the callback's promise settles (resolves or rejects)
}, { name: "my-operation" });

// Alternative: Manual management with try-finally
import { startSpan } from "braintrust";

const span = startSpan({ name: "my-operation" });
try {
  // Your code here
  span.log({ input: "example input", output: "example output" });
} finally {
  span.end();  // Ensure this is always called
}

Option 2: Add Explicit Error Handling

If you’re manually managing trace lifecycle, ensure that error handling paths also properly close traces. This prevents traces from remaining open when exceptions occur.

Python SDK

import braintrust

span = braintrust.start_span("my-operation")
try:
    # Your code here
    result = perform_operation()
    span.log(input="data", output=result)
except Exception as e:
    # Log the error to the span
    span.log(input="data", error=str(e))
    raise
finally:
    # Always close the span
    span.end()
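If the same try/except/finally pattern appears in many places, it can be factored into a small helper. The sketch below is one possible shape, not part of the Braintrust SDK: always_ended is a hypothetical helper that works with any object exposing log() and end(), and StubSpan is a stand-in used only to make the example runnable.

```python
from contextlib import contextmanager


@contextmanager
def always_ended(span):
    """Hypothetical helper: record any error on the span, then always end it."""
    try:
        yield span
    except Exception as exc:
        # Record the failure before re-raising so the caller still sees it.
        span.log(error=str(exc))
        raise
    finally:
        span.end()


class StubSpan:
    """Stand-in for an SDK span; it records log calls and whether end() ran."""

    def __init__(self):
        self.events = []
        self.ended = False

    def log(self, **fields):
        self.events.append(fields)

    def end(self):
        self.ended = True


def demo_failure():
    span = StubSpan()
    try:
        with always_ended(span):
            span.log(input="data")
            raise ValueError("simulated failure")
    except ValueError:
        pass  # the exception still propagates out of the helper
    return span
```

With a real span, the call site would pass the object returned by braintrust.start_span(...) instead of StubSpan().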

TypeScript SDK

import { startSpan } from "braintrust";

const span = startSpan({ name: "my-operation" });
try {
  // Your code here
  const result = await performOperation();
  span.log({ input: "data", output: result });
} catch (e) {
  // Log the error to the span
  span.log({ input: "data", error: String(e) });
  throw e;
} finally {
  // Always close the span
  span.end();
}

Option 3: Check for Network and Flushing Issues

Ensure that your application properly flushes pending traces before shutting down, especially in short-lived processes like serverless functions or batch jobs.

Python SDK

import braintrust

# At application shutdown or end of execution
braintrust.flush()  # Ensure all pending traces are sent
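For short-lived handlers, one way to make sure the flush is not skipped on an error path is to wrap the handler so the flush runs in a finally block. The sketch below assumes nothing about the runtime: flush_after is a hypothetical decorator, and fake_flush stands in for braintrust.flush so the example runs without network access.

```python
import functools


def flush_after(flush_fn):
    """Hypothetical decorator: call flush_fn after the handler returns or raises."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                return handler(*args, **kwargs)
            finally:
                # Runs on success and on failure, before control leaves the handler.
                flush_fn()
        return wrapper
    return decorator


flush_calls = []


def fake_flush():
    """Stand-in for braintrust.flush; it only records that it was called."""
    flush_calls.append("flushed")


@flush_after(fake_flush)
def handler(event):
    if event.get("fail"):
        raise ValueError("simulated handler failure")
    return {"status": "ok"}
```

In an application, the decorator would be applied as @flush_after(braintrust.flush) so that pending traces are sent before the process is frozen or terminated.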

TypeScript SDK

import { flush } from "braintrust";

// At application shutdown or end of execution
await flush();  // Ensure all pending traces are sent

Notes

Traces do not have an automatic timeout; they remain “in progress” indefinitely until explicitly closed.
Context managers and wrapper functions (like traced()) are the safest way to ensure proper trace lifecycle.
If a trace is stuck in progress and cannot be closed programmatically, contact Braintrust support for manual resolution.
Network interruptions during trace finalization can cause traces to appear stuck, so ensure proper error handling and retry logic.
In serverless environments, always call flush() before the function terminates to avoid incomplete traces.
Child spans can be complete while the parent trace remains open, so make sure spans are closed in the correct order (child before parent).
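The child-before-parent ordering mentioned in the last note falls out naturally from nested with blocks. The following minimal sketch illustrates this with OrderedSpan, a hypothetical stand-in (not the Braintrust span class) that records the order in which spans end.

```python
class OrderedSpan:
    """Hypothetical stand-in span that appends its name to a shared list on end()."""

    def __init__(self, name, closed_log):
        self.name = name
        self.closed_log = closed_log

    def end(self):
        self.closed_log.append(self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.end()
        return False  # let any exception propagate


def run_nested():
    closed = []
    with OrderedSpan("parent", closed):
        with OrderedSpan("child-1", closed):
            pass  # first unit of work
        with OrderedSpan("child-2", closed):
            pass  # second unit of work
    return closed
```

Here run_nested() returns the names in the order the spans ended, with both children ending before the parent. Manual end() calls need to reproduce the same ordering by hand, which is one reason the nested form is less error-prone.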
