Prompt engineering is a core activity in AI engineering. Braintrust allows you to create prompts, test them out in the playground, use them in your code, update them, and track their performance over time.

Create a prompt

To create a prompt in the UI:
  1. Go to Prompts and click Prompt.
  2. Specify the following:
    • Name: A descriptive display name for the prompt.
    • Slug: A unique, stable identifier used to reference the prompt in code. The slug remains constant even when you update the prompt’s content or name.
    • Model and parameters: The model to use, along with model-specific parameters to configure, such as temperature to control randomness and max tokens to limit response length.
    • Messages: The messages to send to the model to generate a response. Each message has a role (system, user, assistant, or tool) to help the model understand who is speaking and how to respond. Messages can contain text or include images (for vision-capable models). Messages also support mustache templating syntax to make them dynamic and reusable. Use {{variable}} to reference variables that will be substituted when the prompt is used, for example (a matching data shape is sketched after these steps):
      {{#input.few_shots}}
      input: {{input}}
      output: {{output}}
      {{/input.few_shots}}
      
    • Response format: By default, prompts return freeform text, but you can also return a JSON object or define a specific JSON schema for structured outputs (OpenAI models only). Structured outputs correspond to the response_format.json_schema argument in the OpenAI API.
    • Description (optional): Context about what the prompt does and when to use it.
    • Metadata (optional): Additional information to attach to the prompt.
  3. Click Save as custom prompt.
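For reference, the few-shot template above iterates over an array with a mustache section. A row shaped roughly like the following would fill it in; the values are hypothetical, and where variables live (under input or at the top level) depends on whether the prompt runs over dataset rows or is called with invoke:

// A hypothetical dataset row for the few-shot template above. Each item in
// few_shots is rendered once by the {{#input.few_shots}} section.
const row = {
  input: {
    few_shots: [
      { input: "2+2", output: "4" },
      { input: "3+3", output: "6" },
    ],
  },
};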

Add tools

Tools extend your prompt’s capabilities by allowing the LLM to call functions during execution. This enables prompts to:
  • Query external APIs or databases
  • Perform calculations or data transformations
  • Retrieve information from vector stores or search engines
  • Execute custom business logic
When you add a tool to a prompt, the LLM can decide when to call it based on the user’s input, making your prompts more dynamic and capable.
To add tools to a prompt in the UI:
  1. When creating or editing a prompt, click Tools.
  2. Select tool functions from your library or add raw tools as JSON (see the sketch below). Raw tools correspond to the tools argument in the OpenAI API.
  3. Click Save tools.
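As a rough sketch, a raw tool entry follows the OpenAI tools shape; the get_weather function and its parameters here are hypothetical:

// A hypothetical raw tool definition in the OpenAI tools format.
const rawTools = [
  {
    type: "function",
    function: {
      name: "get_weather", // hypothetical tool name
      description: "Look up the current weather for a city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name, for example Paris" },
        },
        required: ["city"],
      },
    },
  },
];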

Add MCP servers

You can use public MCP (Model Context Protocol) servers to give your prompts access to external tools and data. This is useful for:
  • Evaluating complex tool calling workflows
  • Experimenting with external APIs and services
  • Tuning public MCP servers
MCP servers must be public and support OAuth authentication.
MCP servers are a UI-only feature. They work in playgrounds and experiments but not when invoked via SDK.

Add to a prompt

To add an MCP server to a prompt:
  1. When creating or editing a prompt (directly, or in a playground or experiment), click MCP.
  2. Enable any available project-wide servers.
  3. To add a prompt-specific MCP server, click MCP server:
    • Provide a name, the public URL of the server, and an optional description.
    • Click Add server.
    • Authenticate the MCP server in your browser.
For each MCP server, you’ll see a list of available tools. Tools are enabled by default, but you can click individual tools to disable them or click Disable all to disable every tool for that server. After testing a prompt-specific MCP server, you can promote it to a project-wide server by selecting Save to project MCP servers.

Add to a project

Project-wide MCP servers are available to all prompts in the project. To add a project-wide MCP server:
  1. Go to Configuration > MCP.
  2. Click MCP server and provide a name, the public URL of the server, and an optional description.
  3. Click Authenticate to authenticate the MCP server in your browser. After authenticating, you’ll see a list of tools that will be available to prompts using the MCP server.
  4. Click Save.

Version a prompt

Every time you save changes to a prompt, Braintrust creates a new version with a unique identifier (e.g., 5878bd218351fb8e). This versioning system allows you to:
  • Track the evolution of your prompts over time
  • Pin specific versions in production code
  • Roll back to previous versions if needed
  • Compare performance across different versions
You can manage different versions of prompts across your development lifecycle by assigning them to environments. For more information about environments and deployment strategies, see the Environments guide.

Test in a playground

Playgrounds provide an interactive environment for testing and refining prompts before deploying them. You can:
  • Test prompts with real-world inputs
  • Adjust parameters like temperature and max tokens in real-time
  • Compare outputs from different models
  • Save new versions once you’re satisfied with the results
To open a prompt in a playground, click the playground icon in the prompt editor or select a prompt from the prompts list. Playgrounds also support structured outputs with visual schema builders, making it easy to configure and validate JSON schemas. For more information about playgrounds, see the Playground guide.

Use in an experiment

Experiments allow you to systematically evaluate prompt performance across multiple test cases. When using prompts in experiments, you can:
  • Test prompts against datasets of inputs and expected outputs
  • Compare multiple prompt versions or configurations side-by-side
  • Measure performance using built-in or custom scoring functions
  • Identify regressions or improvements as you iterate
Experiments provide rigorous, data-driven insights into prompt quality and help you make informed decisions about which versions to deploy. For more information about experiments, see the Experiments guide.
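As a minimal sketch, you can evaluate a prompt by calling invoke from an Eval task. The inline dataset and the Levenshtein scorer from autoevals are placeholders; substitute your own dataset and scoring functions:

import { Eval, invoke } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("your project name", {
  // Placeholder dataset; in practice, point this at a Braintrust dataset.
  data: () => [{ input: "1+1", expected: "2" }],
  // Each row's input is passed to the prompt's template variables.
  task: async (input) =>
    invoke({
      projectName: "your project name",
      slug: "your prompt slug",
      input: { question: input },
    }),
  scores: [Levenshtein],
});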

Use in code

Invoke directly

In Braintrust, a prompt is a simple function that can be invoked directly through the SDK and REST API. When invoked, prompt functions leverage the proxy to access a wide range of providers and models with managed secrets, and are automatically traced and logged to your Braintrust project.
Functions are a broad concept that encompass prompts, code snippets, HTTP endpoints, and more. When using the functions API, you can use a prompt’s slug or ID as the function’s slug or ID, respectively. To learn more about functions, see the functions reference.
import { invoke } from "braintrust";

async function main() {
  const result = await invoke({
    projectName: "your project name",
    slug: "your prompt slug",
    input: {
      // These variables map to the template parameters in your prompt.
      question: "1+1",
    },
  });
  console.log(result);
}

main();
The return value, result, is a string unless the model makes tool calls, in which case invoke returns the arguments of the first tool call. In TypeScript, you can enforce the return type with the schema argument, which validates that the result matches a particular Zod schema.
import { invoke } from "braintrust";
import { z } from "zod";

async function main() {
  const result = await invoke({
    projectName: "your project name",
    slug: "your prompt slug",
    input: {
      question: "1+1",
    },
    schema: z.string(),
  });
  console.log(result);
}

main();

Load a prompt

The loadPrompt()/load_prompt() function loads a prompt into a simple format that you can pass along to the OpenAI client. loadPrompt also caches prompts with a two-layered cache and attempts to use this cache if the prompt cannot be fetched from the Braintrust server:
  1. A memory cache, which stores up to BRAINTRUST_PROMPT_CACHE_MEMORY_MAX prompts in memory. This defaults to 1024.
  2. A disk cache, which stores up to BRAINTRUST_PROMPT_CACHE_DISK_MAX prompts on disk. This defaults to 1048576.
You can also configure the directory used by the disk cache by setting the BRAINTRUST_PROMPT_CACHE_DIR environment variable.
import { OpenAI } from "openai";
import { initLogger, loadPrompt, wrapOpenAI } from "braintrust";

const logger = initLogger({ projectName: "your project name" });

// wrapOpenAI will make sure the client tracks usage of the prompt.
const client = wrapOpenAI(
  new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  }),
);

async function runPrompt() {
  // Replace with your project name and slug
  const prompt = await loadPrompt({
    projectName: "your project name",
    slug: "your prompt slug",
    defaults: {
      // Parameters to use if not specified
      model: "gpt-3.5-turbo",
      temperature: 0.5,
    },
  });

  // Render with parameters
  return client.chat.completions.create(
    prompt.build({
      question: "1+1",
    }),
  );
}
To use another model provider, use the Braintrust proxy to access a wide range of models using the OpenAI format. You can also grab the messages and other parameters directly from the returned object to use a model library of your choice.
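For example, here is a sketch that points the OpenAI client at the Braintrust AI proxy. The proxy URL shown is the commonly documented one; confirm it in the proxy docs:

import { OpenAI } from "openai";
import { loadPrompt, wrapOpenAI } from "braintrust";

// Assumes the commonly documented proxy endpoint; verify it in the proxy docs.
const client = wrapOpenAI(
  new OpenAI({
    baseURL: "https://api.braintrust.dev/v1/proxy",
    apiKey: process.env.BRAINTRUST_API_KEY, // the proxy manages provider secrets
  }),
);

async function runViaProxy() {
  const prompt = await loadPrompt({
    projectName: "your project name",
    slug: "your prompt slug",
  });
  // Any proxy-supported model named in the prompt works with the OpenAI format.
  return client.chat.completions.create(prompt.build({ question: "1+1" }));
}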

Pin a specific version

When loading a prompt, you can reference a specific version:
const prompt = await loadPrompt({
  projectName: "your project name",
  slug: "your prompt slug",
  version: "5878bd218351fb8e",
});

Get all versions

To retrieve a list of all available versions, use the getPromptVersions() function:
import { getPromptVersions } from "braintrust";

const versions = await getPromptVersions("<project-id>", "<prompt-id>");
// Returns: ["5878bd218351fb8e", "a1b2c3d4e5f6789", ...]

Add extra messages

If you’re building a chat app, it’s often useful to send additional messages of context as you gather them. You can provide OpenAI-style messages to the invoke function through the messages argument; they are appended after the prompt’s built-in messages.
import { invoke } from "braintrust";
import { z } from "zod";

async function reflection(question: string) {
  const result = await invoke({
    projectName: "your project name",
    slug: "your prompt slug",
    input: {
      question,
    },
    schema: z.string(),
  });
  console.log(result);

  const reflectionResult = await invoke({
    projectName: "your project name",
    slug: "your prompt slug",
    input: {
      question,
    },
    messages: [
      { role: "assistant", content: result },
      { role: "user", content: "Are you sure about that?" },
    ],
  });
  console.log(reflectionResult);
}

reflection("What is larger, the Moon or the Earth?");

Stream results

You can also stream results in an easy-to-parse format.
import { invoke } from "braintrust";

async function main() {
  const result = await invoke({
    projectName: "your project name",
    slug: "your prompt slug",
    input: {
      question: "1+1",
    },
    stream: true,
  });

  for await (const chunk of result) {
    console.log(chunk);
    // { type: "text_delta", data: "The answer "}
    // { type: "text_delta", data: "is 2"}
  }
}

main();
If you’re using Next.js and the Vercel AI SDK, you can use the Braintrust adapter by installing the @braintrust/vercel-ai-sdk package and converting the stream to Vercel’s format.
vercel-braintrust-adapter.ts
import { invoke } from "braintrust";
import { BraintrustAdapter } from "@braintrust/vercel-ai-sdk";

export async function POST(req: Request) {
  const stream = await invoke({
    projectName: "your project name",
    slug: "your prompt slug",
    input: await req.json(),
    stream: true,
  });

  return BraintrustAdapter.toDataStreamResponse(stream);
}
You can also use streamText to leverage the Vercel AI SDK directly. Configure the OpenTelemetry environment variables to log these requests to Braintrust.
vercel-braintrust-streamtext.ts
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const result = await streamText({
    model: openai("gpt-4o-mini"),
    prompt,
    experimental_telemetry: { isEnabled: true },
  });

  return result.toDataStreamResponse();
}
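A typical configuration sets the standard OpenTelemetry exporter variables. The endpoint and header format below are assumptions based on the commonly documented setup; confirm the exact values in the tracing integration docs:

# Assumed values; confirm them in the Braintrust OpenTelemetry docs.
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.braintrust.dev/otel
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <your API key>, x-bt-parent=project_id:<your project id>"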

Log spans

invoke uses the active logging state of your application, just like any function decorated with @traced or wrapTraced. This means that if you initialize a logger while calling invoke, it will automatically log spans to Braintrust. By default, invoke requests will log to a root span, but you can customize the name of a span using the name argument.
import { invoke, initLogger, traced } from "braintrust";

initLogger({
  projectName: "My project",
});

async function main() {
  const result = await traced(
    async (span) => {
      span.log({
        tags: ["foo", "bar"],
      });
      const res = await invoke({
        projectName: "Joker",
        slug: "joker-3c10",
        input: {
          theme: "silicon valley",
        },
      });
      return res;
    },
    {
      name: "My name",
      type: "function",
    },
  );
  console.log(result);
}

main().catch(console.error);
You can also pass in the parent argument, a string that you can derive from span.export() while doing distributed tracing.
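As a sketch of distributed tracing, one service can export a span and another can pass the exported string as parent. The function names here are hypothetical:

import { invoke, initLogger, traced } from "braintrust";

initLogger({ projectName: "My project" });

// Service A (hypothetical): export the current span as a string to hand off.
async function exportParent(): Promise<string> {
  return traced(async (span) => span.export(), { name: "service-a-handler" });
}

// Service B (hypothetical): attach the invoke call to the same trace via `parent`.
async function continueTrace(parent: string) {
  return invoke({
    projectName: "your project name",
    slug: "your prompt slug",
    input: { question: "1+1" },
    parent,
  });
}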

Set chat/completion format

In Python, prompt.build() returns a dictionary with chat or completion parameters, depending on the prompt type. In TypeScript, however, prompt.build() accepts an additional parameter (flavor) to specify the format. This allows prompt.build to be used in a more type-safe manner. When you specify a flavor, the SDK also validates that the parameters are correct for that format.
typescript-chat-completion.ts
const chatParams = prompt.build(
  {
    question: "1+1",
  },
  {
    // This is the default
    flavor: "chat",
  },
);

const completionParams = prompt.build(
  {
    question: "1+1",
  },
  {
    // Pass "completion" to get completion-shaped parameters
    flavor: "completion",
  },
);

Download a prompt

Download prompts to your local filesystem to keep them under version control and ensure you’re using a specific version. Use the pull command to:
  • Download prompts to public projects so others can use them
  • Pin your production environment to a specific version without fetching prompts from Braintrust on the request path
  • Review changes to prompts in pull requests
$ npx braintrust pull --help
usage: cli.js pull [-h] [--output-dir OUTPUT_DIR] [--project-name PROJECT_NAME] [--project-id PROJECT_ID] [--id ID] [--slug SLUG] [--version VERSION] [--force]

optional arguments:
  -h, --help            show this help message and exit
  --output-dir OUTPUT_DIR
                        The directory to output the pulled resources to. If not specified, the current directory is used.
  --project-name PROJECT_NAME
                        The name of the project to pull from. If not specified, all projects are pulled.
  --project-id PROJECT_ID
                        The id of the project to pull from. If not specified, all projects are pulled.
  --id ID               The id of a specific function to pull.
  --slug SLUG           The slug of a specific function to pull.
  --version VERSION     The version to pull. Will pull the latest version of each prompt that is at or before this version.
  --force               Overwrite local files if they have uncommitted changes.
Currently, braintrust pull only supports TypeScript.
When you run braintrust pull, you can specify a project name, prompt slug, or version to pull. If you don’t specify any of these, all prompts across projects will be pulled into a separate file per project. For example, using this command to retrieve a project named Summary will generate the following file:
$ npx braintrust pull --project-name "Summary"
summary.ts
// This file was automatically generated by braintrust pull. You can
// generate it again by running:
//  $ braintrust pull --project-name "Summary"
// Feel free to edit this file manually, but once you do, you should make sure to
// sync your changes with Braintrust by running:
//  $ braintrust push "braintrust/summary.ts"

import braintrust from "braintrust";

const project = braintrust.projects.create({
  name: "Summary",
});

export const summaryBot = project.prompts.create({
  name: "Summary bot",
  slug: "summary-bot",
  model: "gpt-4o",
  messages: [
    { content: "Summarize the following passage.", role: "system" },
    { content: "{{content}}", role: "user" },
  ],
});
To pin your production environment to a specific version, run braintrust pull with the --version flag.
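For example, using the version identifier format shown earlier:
$ npx braintrust pull --project-name "Summary" --version 5878bd218351fb8e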

Open from traces

When you use a prompt in your code, Braintrust automatically links spans to the prompt used to generate them. This allows you to select a span to open it in the playground, and see the prompt that generated it alongside the input variables. You can even test and save a new version of the prompt directly from the playground. This workflow is very powerful: it effectively allows you to debug, iterate, and publish changes to your prompts directly within Braintrust. And because Braintrust lets you flexibly load the latest prompt, a specific version, or even a version-controlled artifact, you have a lot of control over how these updates propagate into your production systems.

Using the API

The full lifecycle of prompts (creating, retrieving, modifying, and more) can be managed through the REST API. See the API docs for more details.
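As a hedged sketch of fetching a prompt over the REST API (the endpoint path, query parameters, and response shape below are assumptions; confirm them in the API docs):

async function fetchPrompt() {
  // Endpoint and query parameters are assumptions; check the API reference.
  const response = await fetch(
    "https://api.braintrust.dev/v1/prompt?project_name=your%20project%20name&slug=your-prompt-slug",
    { headers: { Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}` } },
  );
  const body = await response.json();
  console.log(body);
}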