wrapAISDK interleaved traces with concurrent calls

Summary

Issue: When multiple generateText calls share a single wrapAISDK-instrumented model instance and run concurrently, doGenerate child spans show inputs and outputs from the wrong parent generateText span.

Cause: wrapAISDK patches the model object’s doGenerate method in place, closing over the first call’s parent span; the AUTO_PATCHED_MODEL guard prevents re-patching on subsequent concurrent calls, so all parallel doGenerate invocations log into the first call’s span.

Resolution: Construct a fresh model instance inside each task function instead of sharing a single module-scoped instance.
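The cause described above can be illustrated with a small, dependency-free simulation. This is a minimal sketch under stated assumptions, not the Braintrust implementation: `Span`, `Model`, `makeModel`, `patchInPlace`, `tracedGenerate`, and `demo` are hypothetical names invented here, and the `patched` flag merely stands in for the AUTO_PATCHED_MODEL guard. The sketch covers only the window in which two calls overlap.

```typescript
// Hypothetical stand-ins; none of these names come from the Braintrust SDK.
type Span = { name: string; children: string[] };
type Model = {
  doGenerate: (prompt: string) => Promise<string>;
  patched?: boolean; // stands in for the AUTO_PATCHED_MODEL guard
};

function makeModel(): Model {
  return {
    doGenerate: async (prompt) => {
      await new Promise((resolve) => setTimeout(resolve, 10)); // fake latency
      return `echo:${prompt}`;
    },
  };
}

// Patches doGenerate in place, closing over `parent`. Once the guard is set,
// later callers silently reuse the first caller's closure.
function patchInPlace(model: Model, parent: Span): void {
  if (model.patched) return;
  const original = model.doGenerate;
  model.doGenerate = async (prompt) => {
    parent.children.push(prompt); // always logs under the captured parent
    return original(prompt);
  };
  model.patched = true;
}

async function tracedGenerate(model: Model, prompt: string): Promise<Span> {
  const span: Span = { name: prompt, children: [] };
  patchInPlace(model, span);
  await model.doGenerate(prompt);
  return span;
}

// Runs two overlapping calls, either on one shared model or on fresh models.
async function demo(shared: boolean): Promise<Span[]> {
  const sharedModel = makeModel();
  const pick = () => (shared ? sharedModel : makeModel());
  return Promise.all(["A", "B"].map((p) => tracedGenerate(pick(), p)));
}
```

With `demo(true)`, both prompts end up in the first span's `children` and the second span stays empty; with `demo(false)`, each span records only its own prompt, which mirrors the per-call construction recommended in the resolution.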

Resolution steps

If using a shared model instance across parallel tasks

Step 1: Move model construction inside the task function

Replace any module-scoped model singleton with a per-call factory so each generateText call patches its own instance.

Before (triggers bug):

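import { wrapAISDK } from "braintrust";
import * as ai from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const { generateText } = wrapAISDK(ai); // instrumented generateText

const model = anthropic("claude-haiku-4-5"); // shared instance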

async function runTask(input: string) {
  return generateText({ model, messages: [{ role: "user", content: input }] });
}

await Promise.all(inputs.map(runTask));

After (workaround):

async function runTask(input: string) {
  const model = anthropic("claude-haiku-4-5"); // fresh instance per call
  return generateText({ model, messages: [{ role: "user", content: input }] });
}

await Promise.all(inputs.map(runTask));

Step 2: Verify span parenting in the Braintrust UI

For each top-level generateText span, expand its doGenerate child and confirm that input.messages contains the same content as the parent span’s input. Each doGenerate span should be parented under its own generateText span.
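If the spans have been copied out of the UI into a script, the same check can be automated. The following is only a sketch: the `SpanRecord` shape and the `findCrossWired` helper are hypothetical and do not correspond to any Braintrust export format, and the default pattern assumes the MARKER_A-style prefixes used by the reproduction script in this article.

```typescript
// Hypothetical span shape for illustration; adapt it to however you export spans.
type SpanRecord = {
  id: string;
  parentId: string | null;
  name: string;
  inputText: string; // flattened input, e.g. concatenated message contents
};

// Returns the ids of doGenerate spans whose input lacks the marker found in
// their parent span's input.
function findCrossWired(
  spans: SpanRecord[],
  markerPattern: RegExp = /MARKER_[A-Z]/,
): string[] {
  const byId = new Map(spans.map((s) => [s.id, s] as const));
  const crossWired: string[] = [];
  for (const child of spans) {
    if (child.name !== "doGenerate" || child.parentId === null) continue;
    const parent = byId.get(child.parentId);
    const marker = parent?.inputText.match(markerPattern)?.[0];
    if (marker !== undefined && !child.inputText.includes(marker)) {
      crossWired.push(child.id);
    }
  }
  return crossWired;
}
```

An empty result means every doGenerate span carries its parent's marker; any returned id points at a child that was logged under the wrong generateText span.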

Otherwise, if using `Eval` with parallel task execution

Step 1: Construct the model inside the task function

Eval("my-project", {
  data: () => dataset,
  task: async (input) => {
    const model = anthropic("claude-haiku-4-5"); // fresh per task
    const result = await generateText({
      model,
      messages: [{ role: "user", content: input.query }],
    });
    return result.text;
  },
  scores: [/* ... */],
});

Reproduction script

The following script can be used to reproduce the problem. Set USE_FACTORY_WORKAROUND=true to enable the workaround:

/**
 * Repro for: wrapAISDK interleaves traces under concurrency.
 *
 * Summary: two concurrent `generateText` calls that share the same
 * `LanguageModelV2` instance end up with cross-wired `doGenerate` children --
 * wrapAISDK's per-call model patching silently skips re-patching on the
 * second call, so both `doGenerate` invocations run inside the first call's
 * patched closure, which captured the first call's parent span.
 *
 * Observed with:
 *   braintrust@3.8.0
 *   ai@6.0.164
 *   @ai-sdk/anthropic@3.0.69
 *   Node 22
 *
 * Setup (in some scratch directory):
 *   pnpm init -y
 *   pnpm add braintrust@3.8.0 ai@6.0.164 @ai-sdk/anthropic@3.0.69
 *   pnpm add -D tsx typescript
 *
 *   export BRAINTRUST_API_KEY=...           # required
 *   export BRAINTRUST_PROJECT_NAME=...      # required; created on first ingest
 *   export ANTHROPIC_API_KEY=...            # required
 *   # optional:
 *   #   BRAINTRUST_APP_URL   (default https://www.braintrust.dev)
 *   #   BRAINTRUST_API_URL   (default https://api.braintrust.dev)
 *   #   MODEL                (default claude-haiku-4-5)
 *   #   CONCURRENCY          (default 2)
 *
 *   npx tsx repro-wrapaisdk-concurrent-trace-interleave.ts
 *
 * What to observe in the Braintrust UI:
 *   1. Open the project Logs view.
 *   2. This script prints a unique marker per call (e.g. "MARKER_A", "MARKER_B")
 *      and the top-level span id for each. Open each top-level generateText
 *      span in turn.
 *   3. Expand its `doGenerate` child. The child's `input.messages` should
 *      contain the SAME marker as the parent's input. Under the bug, one of
 *      the top-level spans will have a `doGenerate` child whose input belongs
 *      to the OTHER call.
 *
 * Reliable workaround (documented in the original report): replace shared
 * model singletons with per-call factories, e.g. `() => anthropic("...")`,
 * so every call patches its own fresh instance. A toggle for that is provided
 * below via the `USE_FACTORY_WORKAROUND` env var.
 */

import { initLogger, wrapAISDK } from "braintrust";
import * as ai from "ai";
import { anthropic } from "@ai-sdk/anthropic";

function requireEnv(name: string): string {
  const v = process.env[name];
  if (!v) {
    console.error(`Missing required env var: ${name}`);
    process.exit(1);
  }
  return v;
}

const BRAINTRUST_API_KEY = requireEnv("BRAINTRUST_API_KEY");
const BRAINTRUST_PROJECT_NAME = requireEnv("BRAINTRUST_PROJECT_NAME");
requireEnv("ANTHROPIC_API_KEY");

const BRAINTRUST_APP_URL = process.env.BRAINTRUST_APP_URL;
const MODEL_NAME = process.env.MODEL ?? "claude-haiku-4-5";
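// Fall back to 2 when CONCURRENCY is unset or not a number.
const requestedConcurrency = Number.parseInt(process.env.CONCURRENCY ?? "2", 10);
const CONCURRENCY = Math.max(
  2,
  Number.isNaN(requestedConcurrency) ? 2 : requestedConcurrency,
);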
const USE_FACTORY_WORKAROUND =
  (process.env.USE_FACTORY_WORKAROUND ?? "").toLowerCase() === "true";

const logger = initLogger({
  projectName: BRAINTRUST_PROJECT_NAME,
  apiKey: BRAINTRUST_API_KEY,
  asyncFlush: true,
  ...(BRAINTRUST_APP_URL ? { appUrl: BRAINTRUST_APP_URL } : {}),
});

const { generateText } = wrapAISDK(ai);

// Shared vs. per-call model instance. The shared variant is what triggers the
// bug; the factory variant is the documented workaround.
const SHARED_MODEL = anthropic(MODEL_NAME);
const pickModel = () =>
  USE_FACTORY_WORKAROUND ? anthropic(MODEL_NAME) : SHARED_MODEL;

// Unique markers per concurrent call so cross-wiring is visually obvious when
// inspecting the resulting doGenerate span inputs in the Braintrust UI.
const markers = Array.from({ length: CONCURRENCY }, (_, i) =>
  `MARKER_${String.fromCharCode(65 + i)}`,
);
const topics = ["cats", "dogs", "birds", "fish", "lizards", "rabbits"];

async function runOne(index: number) {
  const marker = markers[index];
  const topic = topics[index % topics.length];
  const prompt = `${marker}: write one short sentence about ${topic}.`;

  const result = await generateText({
    model: pickModel(),
    messages: [{ role: "user", content: prompt }],
  });

  return { marker, prompt, text: result.text };
}

async function main() {
  console.log(
    `Launching ${CONCURRENCY} concurrent generateText calls ` +
      `(USE_FACTORY_WORKAROUND=${USE_FACTORY_WORKAROUND}, model=${MODEL_NAME}).`,
  );

  const results = await Promise.all(
    Array.from({ length: CONCURRENCY }, (_, i) => runOne(i)),
  );

  console.log("\n=== Per-call results (client side) ===");
  for (const r of results) {
    console.log(`  ${r.marker}`);
    console.log(`    prompt: ${r.prompt}`);
    console.log(`    output: ${r.text.replace(/\s+/g, " ").slice(0, 120)}`);
  }
}

main()
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  })
  .finally(async () => {
    await logger.flush();
    const appUrl = BRAINTRUST_APP_URL ?? "https://www.braintrust.dev";
    console.log(
      `\nSpans flushed. Open ${appUrl} -> project ` +
        `'${BRAINTRUST_PROJECT_NAME}' -> Logs.\n` +
        "For each top-level generateText span, expand its `doGenerate` child\n" +
        "and check that `input.messages` contains the SAME marker as the\n" +
        "parent's `input`. Under the bug, one of the children will contain\n" +
        "the OTHER call's marker (cross-wired input).\n\n" +
        "To confirm the workaround attributes children correctly, re-run with:\n" +
        "  USE_FACTORY_WORKAROUND=true npx tsx " +
        "repro-wrapaisdk-concurrent-trace-interleave.ts",
    );
  });

Note: This is a known bug in wrapAISDK. A permanent fix (resolving parent spans via async-context tracking instead of a closed-over span) has not yet been released. The per-task model instance pattern is the recommended workaround until a patched SDK version is available.
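To make the idea of async-context tracking concrete, here is a minimal sketch using Node's `AsyncLocalStorage`. It is an illustration of the general technique only and makes no claim about how a future SDK release will implement the fix; `Span`, `Model`, `makeModel`, `patchOnce`, `tracedGenerate`, and `currentSpan` are hypothetical names.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-ins; none of these names come from the Braintrust SDK.
type Span = { name: string; children: string[] };
type Model = { doGenerate: (prompt: string) => Promise<string> };

// Holds the "current parent span" for each asynchronous call chain.
const currentSpan = new AsyncLocalStorage<Span>();

function makeModel(): Model {
  return {
    doGenerate: async (prompt) => {
      await new Promise((resolve) => setTimeout(resolve, 10)); // fake latency
      return `echo:${prompt}`;
    },
  };
}

// Patch the model once. The wrapper captures no span; it looks the parent up
// from the async context when it runs.
function patchOnce(model: Model): Model {
  const original = model.doGenerate;
  model.doGenerate = async (prompt) => {
    const output = await original(prompt);
    // The store is still the caller's span, even after the await above.
    currentSpan.getStore()?.children.push(prompt);
    return output;
  };
  return model;
}

async function tracedGenerate(model: Model, prompt: string): Promise<Span> {
  const span: Span = { name: prompt, children: [] };
  // Everything started inside run() sees `span` as its current parent.
  await currentSpan.run(span, () => model.doGenerate(prompt));
  return span;
}
```

Because the parent is resolved per call chain rather than captured at patch time, two overlapping `tracedGenerate` calls on the same patched model each record only their own prompt, so a shared model instance no longer cross-wires children in this simulation.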
