Node.js test runner is the built-in test framework in Node.js.
Braintrust integrates with node:test, so you can run evals with the built-in Node.js test runner.
Setup
Install Braintrust in your Node.js project:

npm install braintrust

Set your API key as an environment variable:
export BRAINTRUST_API_KEY=<your-api-key>
Run your first eval
Create a suite with initNodeTestSuite(), then pass suite.eval() directly to test().
import assert from "node:assert/strict";
import { after, describe, test } from "node:test";
import { initNodeTestSuite } from "braintrust";

async function translate(text: string) {
  if (text === "hello") {
    return "hola";
  }
  return text;
}

describe("Translation evals", () => {
  const suite = initNodeTestSuite({
    projectName: "support-bot",
    after,
  });

  test(
    "translates hello",
    suite.eval(
      {
        input: { text: "hello" },
        expected: "hola",
        tags: ["smoke", "translation"],
      },
      async ({ input }) => {
        if (typeof input.text !== "string") {
          throw new Error("Expected input.text to be a string");
        }
        const result = await translate(input.text);
        assert.equal(result, "hola");
        return result;
      },
    ),
  );
});
Run the test:
node --test translation.eval.test.ts
Braintrust creates an experiment for the suite, records each tracked test as a span, and prints a summary when the suite flushes.
Separate evals from unit tests
Keep eval files separate from regular unit tests with a naming convention such as *.eval.test.ts or a dedicated evals/ directory.
# Unit tests
node --test tests/unit/**/*.test.ts
# Evals
node --test tests/evals/**/*.eval.test.ts
This keeps slower model-backed tests separate while letting untracked tests continue to use the native runner with no Braintrust involvement.
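One convenient way to wire this split into your workflow is with package.json scripts; a minimal sketch, assuming the directory layout above (the script names here are illustrative, not prescribed by Braintrust):

```json
{
  "scripts": {
    "test": "node --test tests/unit/**/*.test.ts",
    "test:evals": "node --test tests/evals/**/*.eval.test.ts"
  }
}
```

With this in place, `npm test` stays fast for CI on every commit, while `npm run test:evals` can run on a slower cadence.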
How it works
- initNodeTestSuite() creates one Braintrust experiment for the suite.
- suite.eval() returns a normal node:test callback, so you can mix tracked evals and regular unit tests in the same file.
- The callback return value becomes the logged output and is passed to scorers.
- Passing after from node:test registers an automatic flush hook at the end of the suite.
- When you do not use suite.eval(), tests run normally and are not logged to Braintrust.
Add scorers
Scorers receive { output, expected, input, metadata } and return a score object.
test(
  "translation quality",
  suite.eval(
    {
      input: { text: "good morning" },
      expected: "buenos dias",
      scorers: [
        ({ output, expected }) => ({
          name: "exact_match",
          score: output === expected ? 1 : 0,
        }),
      ],
    },
    async ({ input }) => {
      if (typeof input.text !== "string") {
        throw new Error("Expected input.text to be a string");
      }
      return await translate(input.text);
    },
  ),
);
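Because a scorer is just a function from { output, expected, input, metadata } to a score object, you can sanity-check one in isolation before wiring it into a suite. A minimal sketch — the `Score` type and `exactMatch` name here are illustrative, not part of the Braintrust API:

```typescript
// Shape a scorer returns: a name plus a score in [0, 1].
type Score = { name: string; score: number };

// Illustrative scorer: exact string match scores 1, anything else scores 0.
function exactMatch(args: { output: string; expected: string }): Score {
  return {
    name: "exact_match",
    score: args.output === args.expected ? 1 : 0,
  };
}

const hit = exactMatch({ output: "hola", expected: "hola" });
const miss = exactMatch({ output: "bonjour", expected: "hola" });
```

The same function can be dropped into suite.eval()'s scorers array unchanged.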
You can also use scorers from autoevals:
import { Levenshtein } from "autoevals";

test(
  "translation similarity",
  suite.eval(
    {
      input: { text: "goodbye" },
      expected: "adios",
      scorers: [Levenshtein],
    },
    async ({ input }) => {
      if (typeof input.text !== "string") {
        throw new Error("Expected input.text to be a string");
      }
      return await translate(input.text);
    },
  ),
);
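Levenshtein rewards near-misses instead of demanding an exact match. To make that concrete, here is a self-contained sketch of the underlying idea — edit distance normalized by string length — which illustrates the behavior but is not the autoevals implementation:

```typescript
// Classic dynamic-programming Levenshtein edit distance (single-row variant).
function editDistance(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dp[j - 1] from the previous row
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1, // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution (free on match)
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Normalize to [0, 1]: identical strings score 1, very different strings near 0.
function levenshteinScore(output: string, expected: string): number {
  const maxLen = Math.max(output.length, expected.length);
  return maxLen === 0 ? 1 : 1 - editDistance(output, expected) / maxLen;
}
```

Under this scheme, "adio" against "adios" still scores well, while an unrelated string scores close to 0 — useful when a model's output is judged on closeness rather than equality.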
Resources