Write logs

Logs are more than a debugging tool— they are a key part of the feedback loop that drives continuous improvement in your AI application. There are several ways to log things in Braintrust, ranging from higher level for simple use cases, to more complex and customized spans for more control.

Log LLM calls

Logs are most commonly used for LLM calls. Braintrust includes native SDK wrappers for several AI providers that automatically log your requests. See the AI providers documentation for detailed setup instructions for each provider.

import { initLogger, wrapOpenAI, wrapTraced } from "braintrust";
import OpenAI from "openai";

// Initialize the logger and OpenAI client
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
const client = wrapOpenAI(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));

// Function to classify text as a question or statement
const classifyText = wrapTraced(async (input: string) => {
  const response = await client.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Classify the following text as a question or a statement.",
      },
      { role: "user", content: input },
    ],
    model: "gpt-4o",
  });

  // Extract the classification from the response
  const classification = response?.choices?.[0]?.message?.content?.trim();
  return classification || "Unable to classify the input.";
}, logger);

// Main function to call and log the result
async function main() {
  const input = "Is this a question?";
  try {
    const result = await classifyText(input);
    console.log("Classification:", result);
  } catch (error) {
    console.error("Error:", error);
  }
}

main().catch(console.error);

Braintrust automatically captures and logs information behind the scenes:

You can use other AI model providers through the AI proxy or use native SDK wrappers for various AI providers. You can also pick from a number of integrations (OpenTelemetry, Vercel AI SDK, and others) or create a custom LLM client wrapper in less than 10 lines of code.

Log with `invoke`

For more information about logging when using invoke to execute a prompt directly, check out the prompt guide.

Log user feedback

Braintrust supports logging user feedback, which can take multiple forms:

A score for a specific span, e.g. the output of a request could be 👍 (corresponding to 1) or 👎 (corresponding to 0), or a document retrieved in a vector search might be marked as relevant or irrelevant on a scale of 0->1.
An expected value, which gets saved in the expected field of a span, alongside input and output. This is a great place to store corrections.
A comment, which is a free-form text field that can be used to provide additional context.
Additional metadata fields, which allow you to track information about the feedback, like the user_id or session_id.

Each time you submit feedback, you can specify one or more of these fields using the logFeedback() / log_feedback() method. Specify the id (accessible via span.id) corresponding to the span you want to log feedback for and the feedback fields you want to update. As you log user feedback, the fields will update in real time. The following example shows how to log feedback within a simple API endpoint.

import { initLogger, wrapOpenAI, wrapTraced } from "braintrust";
import OpenAI from "openai";

const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const client = wrapOpenAI(
  new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  }),
);

const someLLMFunction = wrapTraced(async function someLLMFunction(
  input: string,
) {
  return client.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Classify the following text as a question or a statement.",
      },
      {
        role: "user",
        content: input,
      },
    ],
    model: "gpt-4o",
  });
});

export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const text = await req.text();
    const result = await someLLMFunction(text);
    span.log({ input: text, output: result });
    return {
      result,
      requestId: span.id,
    };
  });
}

// Assumes that the request is a JSON object with the requestId generated
// by the previous POST request, along with additional parameters like
// score (should be 1 for thumbs up and 0 for thumbs down), comment, and userId.
export async function POSTFeedback(req: Request) {
  const body = await req.json();
  logger.logFeedback({
    id: body.requestId,
    scores: {
      correctness: body.score,
    },
    comment: body.comment,
    metadata: {
      user_id: body.userId,
    },
  });
}

Collect multiple scores

Often, you want to collect multiple scores for a single span. For example, multiple users might provide independent feedback on a single document. Although each score and expected value is logged separately, each update overwrites the previous value. Instead, to capture multiple scores, you should create a new span for each submission, and log the score in the scores field. When you view and use the trace, Braintrust will automatically average the scores for you in the parent span(s).

import { initLogger, wrapOpenAI, wrapTraced } from "braintrust";
import OpenAI from "openai";

const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const client = wrapOpenAI(
  new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  }),
);

const someLLMFunction = wrapTraced(async function someLLMFunction(
  input: string,
) {
  return client.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Classify the following text as a question or a statement.",
      },
      {
        role: "user",
        content: input,
      },
    ],
    model: "gpt-4o",
  });
});

export async function POST(input: string) {
  return logger.traced(async (span) => {
    const result = await someLLMFunction(input);
    span.log({ input, output: result });
    return {
      result,
      requestId: await span.export(),
    };
  });
}

export async function POSTFeedback(body: {
  requestId: string;
  comment: string;
  score: number;
  userId: string;
}) {
  logger.traced(
    async (span) => {
      logger.logFeedback({
        id: span.id, // Use the newly created span's id, instead of the original request's id
        comment: body.comment,
        scores: {
          correctness: body.score,
        },
        metadata: {
          user_id: body.userId,
        },
      });
    },
    {
      parent: body.requestId,
      name: "feedback",
    },
  );
}

Implementation considerations

Data model

Each log entry is associated with an organization and a project. If you do not specify a project name or id in initLogger()/init_logger(), the SDK will create and use a project named “Global”.
Although logs are associated with a single project, you can still use them in evaluations or datasets that belong to any project.
Like evaluation experiments, log entries contain optional input, output, expected, scores, metadata, and metrics fields. These fields are optional, but we encourage you to use them to provide context to your logs.
Logs are indexed automatically to enable efficient search. When you load logs, Braintrust automatically returns the most recently updated log entries first. You can also search by arbitrary subfields, e.g. metadata.user_id = '1234'. Currently, inequality filters, e.g. scores.accuracy > 0.5 do not use an index.

Production vs. staging

There are a few ways to handle production vs. staging data. The most common pattern we see is to split them into different projects, so that they are separated and code changes to staging cannot affect production. Separating projects also allows you to enforce access controls at the project level. Alternatively, if it’s easier to keep things in one project (e.g. to have a single spot to triage them), you can use tags to separate them. If you need to physically isolate production and staging, you can create separate organizations, each mapping to a different deployment. Experiments, prompts, and playgrounds can all use data across projects. For example, if you want to reference a prompt from your production project in your staging logs, or evaluate using a dataset from staging in a different project, you can do so.

Initializing

The initLogger()/init_logger() method initializes the logger. Unlike the experiment init() method, the logger lazily initializes itself, so that you can call initLogger()/init_logger() at the top of your file (in module scope). The first time you log() or start a span, the logger will log into Braintrust and retrieve/initialize project details.

Flushing

The SDK can operate in two modes: either it sends log statements to the server after each request, or it buffers them in memory and sends them over in batches. Batching reduces the number of network requests and makes the log() command as fast as possible. Each SDK flushes logs to the server as fast as possible, and attempts to flush any outstanding logs when the program terminates. Background batching is controlled by setting the asyncFlush / async_flush flag in initLogger()/init_logger(). This flag is true by default in both the Python and TypeScript SDKs. It is the safer default, since async flushes mean that clients will not be blocked if Braintrust is down. When async flush mode is on, you can use the .flush() method to manually flush any outstanding logs to the server.

import { initLogger } from "braintrust";

const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// ... Your application logic ...

// Some function that is called while cleaning up resources
async function cleanup() {
  await logger.flush();
}

Serverless environments

The async flag controls whether or not logs are flushed when a trace completes. This flag is set to true by default, but extra care should be taken in serverless environments where the process may halt as soon as the request completes. If the serverless environment does not have waitUntil, asyncFlush: false should be set. Note that both Vercel and Cloudflare have waitUntil.

import { initLogger } from "braintrust";

const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
  asyncFlush: false,
});

Vercel

Braintrust automatically utilizes Vercel’s waitUntil functionality if it’s available, so you can set asyncFlush: true in Vercel and your requests will not need to block on logging.

Advanced logging

For more advanced logging topics, see the advanced logging guide.

Start

Integrations

Core

Context

Best practices

Reference

Log LLM calls

Log with `invoke`

Log user feedback

Collect multiple scores

Implementation considerations

Data model

Production vs. staging

Initializing

Flushing

Serverless environments

Vercel

Advanced logging

Start

Integrations

Core

Context

Best practices

Reference

​Log LLM calls

​Log with invoke

​Log user feedback

​Collect multiple scores

​Implementation considerations

​Data model

​Production vs. staging

​Initializing

​Flushing

​Serverless environments

​Vercel

​Advanced logging

Log LLM calls

Log with `invoke`

Log user feedback

Collect multiple scores

Implementation considerations

Data model

Production vs. staging

Initializing

Flushing

Serverless environments

Vercel

Advanced logging