Node.js test runner is the built-in test framework in Node.js.
Braintrust integrates with node:test, so you can run evals with the built-in Node.js test runner.
Setup
Install Braintrust in your Node.js project:

npm install braintrust

Set your API key as an environment variable:
export BRAINTRUST_API_KEY=<your-api-key>
Run your first eval
Create a suite with initNodeTestSuite(), then pass suite.eval() directly to test().
import assert from "node:assert/strict";
import { after, describe, test } from "node:test";
import { initNodeTestSuite } from "braintrust";

async function translate(text: string) {
  if (text === "hello") {
    return "hola";
  }
  return text;
}

describe("Translation evals", () => {
  const suite = initNodeTestSuite({
    projectName: "support-bot",
    after,
  });

  test(
    "translates hello",
    suite.eval(
      {
        input: { text: "hello" },
        expected: "hola",
        tags: ["smoke", "translation"],
      },
      async ({ input }) => {
        if (typeof input.text !== "string") {
          throw new Error("Expected input.text to be a string");
        }
        const result = await translate(input.text);
        assert.equal(result, "hola");
        return result;
      },
    ),
  );
});
Run the test:
node --test translation.eval.test.ts
Braintrust creates an experiment for the suite, records each tracked test as a span, and prints a summary when the suite flushes.
Separate evals from unit tests
Keep eval files separate from regular unit tests with a naming convention such as *.eval.test.ts or a dedicated evals/ directory.
# Unit tests
node --test tests/unit/**/*.test.ts
# Evals
node --test tests/evals/**/*.eval.test.ts
This keeps slower model-backed tests separate while letting untracked tests continue to use the native runner with no Braintrust involvement.
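One convenient way to wire this split into your workflow is with package.json scripts; a minimal sketch, assuming the directory layout above (the script names here are illustrative, not prescribed by Braintrust):

```json
{
  "scripts": {
    "test": "node --test tests/unit/**/*.test.ts",
    "test:evals": "node --test tests/evals/**/*.eval.test.ts"
  }
}
```

With this in place, `npm test` stays fast for CI on every commit, while `npm run test:evals` can run on a slower cadence.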
How it works
- initNodeTestSuite() creates one Braintrust experiment for the suite.
- suite.eval() returns a normal node:test callback, so you can mix tracked evals and regular unit tests in the same file.
- The callback return value becomes the logged output and is passed to scorers.
- Passing after from node:test registers an automatic flush hook at the end of the suite.
- When you do not use suite.eval(), tests run normally and are not logged to Braintrust.
Add scorers
Scorers receive { output, expected, input, metadata } and return a score object.
test(
  "translation quality",
  suite.eval(
    {
      input: { text: "good morning" },
      expected: "buenos dias",
      scorers: [
        ({ output, expected }) => ({
          name: "exact_match",
          score: output === expected ? 1 : 0,
        }),
      ],
    },
    async ({ input }) => {
      if (typeof input.text !== "string") {
        throw new Error("Expected input.text to be a string");
      }
      return await translate(input.text);
    },
  ),
);
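Because a scorer is just a function from { output, expected, input, metadata } to a score object, you can sanity-check one in isolation before wiring it into a suite. A minimal sketch — the `Score` type and `exactMatch` name here are illustrative, not part of the Braintrust API:

```typescript
// Shape a scorer returns: a name plus a score in [0, 1].
type Score = { name: string; score: number };

// Illustrative scorer: exact string match scores 1, anything else scores 0.
function exactMatch(args: { output: string; expected: string }): Score {
  return {
    name: "exact_match",
    score: args.output === args.expected ? 1 : 0,
  };
}

const hit = exactMatch({ output: "hola", expected: "hola" });
const miss = exactMatch({ output: "bonjour", expected: "hola" });
```

The same function can be dropped into suite.eval()'s scorers array unchanged.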
You can also use scorers from autoevals:
import { Levenshtein } from "autoevals";

test(
  "translation similarity",
  suite.eval(
    {
      input: { text: "goodbye" },
      expected: "adios",
      scorers: [Levenshtein],
    },
    async ({ input }) => {
      if (typeof input.text !== "string") {
        throw new Error("Expected input.text to be a string");
      }
      return await translate(input.text);
    },
  ),
);
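Levenshtein rewards near-misses instead of demanding an exact match. To make that concrete, here is a self-contained sketch of the underlying idea — edit distance normalized by string length — which illustrates the behavior but is not the autoevals implementation:

```typescript
// Classic dynamic-programming Levenshtein edit distance (single-row variant).
function editDistance(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dp[j - 1] from the previous row
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1, // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution (free on match)
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Normalize to [0, 1]: identical strings score 1, very different strings near 0.
function levenshteinScore(output: string, expected: string): number {
  const maxLen = Math.max(output.length, expected.length);
  return maxLen === 0 ? 1 : 1 - editDistance(output, expected) / maxLen;
}
```

Under this scheme, "adio" against "adios" still scores well, while an unrelated string scores close to 0 — useful when a model's output is judged on closeness rather than equality.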
Resources