> ## Documentation Index
> Fetch the complete documentation index at: https://braintrust.dev/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Retry-safe eval workers with stable row IDs

export const plans_0 = "Any"

export const deployments_0 = "Any"

export const data_plane_version_0 = undefined

export const use_case_0 = "Use case - Running interrupted or retried eval workers that should overwrite prior rows rather than creating duplicates"

<Note>
  **Applies to:**

  * Plan - {plans_0}
  * Deployment - {deployments_0}
  * {data_plane_version_0}
  * {use_case_0}
</Note>

## Summary

**Goal:** Make interrupted eval workers resume onto the same experiment rows instead of appending duplicates.

**Features:** `upsert_id` (TypeScript Eval), `experiment.log()` with stable `id`, `_object_delete` insert flag, `attemptId` metadata tagging.

## Configuration steps

### Step 1: Set a stable row ID in TypeScript Eval

Use `upsert_id` on each `EvalCase`, derived from a value that is stable across retries (e.g., sample ID or input hash).

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
data: examples.map((example) => ({
  input: example.input,
  expected: example.expected,
  metadata: { sample_id: example.id },
  upsert_id: `eval-row-${example.id}`,
}))
```

Retries that supply the same `upsert_id` land on the same logical row in the UI.

### Step 2: Set a stable row ID when logging manually

If you are using lower-level experiment logging instead of the Eval API, pass the stable key as `id`.

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
experiment.log({
  id: `eval-row-${example.id}`,
  input: example.input,
  output: result,
  scores: { ... },
})
```

This applies to both TypeScript and Python when calling `experiment.log()` directly.

### Step 3: Understand Python Eval limitations

`Eval()` in Python does not expose an equivalent `upsert_id` field on `EvalCase`. Options:

* Use lower-level `experiment.log()` with a stable `id` (see Step 2).
* Implement resume logic in your worker to skip already-completed cases before starting the eval.

### Step 4: Tag every span with a unique `attemptId`

Stable row IDs do not automatically remove child spans from a prior interrupted attempt. Tag **every span** (root and children) with an attempt-scoped value so stale spans can be identified later.

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
const attemptId = workerRunId; // unique per worker restart

// Root span
rootSpan.log({
  metadata: { evalId, iteration, attemptId },
})

// Each child span must also include attemptId
const llmSpan = rootSpan.startSpan({ name: "llm_call" });
llmSpan.log({
  input: prompt,
  output: response,
  metadata: { attemptId, model: "gpt-4" },
})

const toolSpan = rootSpan.startSpan({ name: "tool_execution" });
toolSpan.log({
  input: toolArgs,
  output: toolResult,
  metadata: { attemptId, tool: "search" },
})
```

### Step 5: Delete stale child spans from a prior attempt

Query for spans with the old `attemptId`, then delete them via the insert endpoint.

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
POST /v1/experiment/{experiment_id}/insert
```

```json theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
{
  "events": [
    { "id": "stale-child-span-id", "_object_delete": true },
    { "id": "another-stale-span-id", "_object_delete": true }
  ]
}
```

Repeat for each stale child span from the interrupted attempt.

### Step 6: Alternative — use a fresh experiment per attempt

If managing stale span cleanup is too complex, create a new experiment for each retry attempt and designate the final successful experiment as canonical. This avoids orphan span cleanup entirely.

## Behavior notes

* Experiment inserts are **append-only**. Reusing a stable `id` does not rewrite history; it creates a new version. The UI displays the latest version per row.
* Reusing a stable row `id` on retry does **not** delete child spans from the previous attempt. Orphan spans must be removed explicitly using `_object_delete`.
* Do not manually set `span_id` or `root_span_id` as the primary deduplication strategy. The high-level Eval runner manages these internally, and overriding them can produce rows with multiple traces.
