Summary

Goal: Run evaluations tied to specific git commits so you can track experiments across multiple SHAs. Features: git metadata tracking via repo_info.commit, or automatic collection with git_metadata_settings.

Tracking Options

Option 1: Set Commit SHA Explicitly (API)

Pass repo_info.commit when launching an eval to tie it to a specific SHA.
curl https://api.braintrust.dev/v1/eval \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "your-project-id",
    "data": {
       "dataset_id": "your-dataset-id"
    },
    "task": {
       "function_id": "id-of-function-to-eval"
    },
    "scores": [
       {
          "function_id": "id-of-function-to-score-eval-on"
       }
    ],
    "repo_info": {
      "commit": "abc1234567890def"
    }
  }'

Option 2: Set Commit SHA Explicitly (SDK)

Specify repo_info when calling Eval() in Python or TypeScript.
from braintrust import Eval, init_dataset
from autoevals import Factuality

Eval(
    "My Project",
    data=init_dataset(project="My Project", name="My Dataset"),
    task=lambda input: call_model(input),  # Your LLM call here
    scores=[Factuality],
    metadata={
        "model": "gpt-4o",
        "temperature": 0.7,
    },
    repo_info={
        "commit": "abc1234567890def"
    }
)
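In CI you typically won't hardcode the SHA. The sketch below shows a hypothetical helper (not part of the Braintrust SDK) that resolves the current commit from common CI environment variables, falling back to the local checkout:

```python
import os
import subprocess

def current_commit() -> str:
    # Prefer SHAs provided by CI: GitHub Actions sets GITHUB_SHA,
    # GitLab CI sets CI_COMMIT_SHA.
    for var in ("GITHUB_SHA", "CI_COMMIT_SHA"):
        sha = os.environ.get(var)
        if sha:
            return sha
    # Fall back to the local working tree
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
```

You could then pass repo_info={"commit": current_commit()} when calling Eval().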

Option 3: Enable Automatic Git Metadata Collection

Configure git_metadata_settings to automatically capture commit information from your environment. By default, all git metadata fields allowed by your org-level settings are collected.
import braintrust

braintrust.init(
    project="my-project",
    git_metadata_settings={
        "collect": "some",
        "fields": ["commit", "branch", "dirty"],
    },
)

Filtering

Run Evals Across Multiple SHAs

To run batch evals for a specific SHA, filtering must be handled in your CI/CD pipeline or scripts so that only the desired records are targeted. Your dataset records also need to store the SHA for comparison. The following example assumes each dataset record stores the SHA in its metadata, following the same repo_info pattern used for eval tracking.
from braintrust import Eval, init_dataset
from autoevals import Factuality

target_sha = "abc123"

# Fetch the dataset
dataset = init_dataset(project="My Project", name="My Dataset")

# Filter by repo_info.commit
filtered_data = [
    record for record in dataset
    if record.get("metadata", {}).get("repo_info", {}).get("commit") == target_sha
]

import openai

def call_model(input):
    # Replace with your own LLM call (assumes the openai>=1.0 client API)
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": input}]
    )
    return response.choices[0].message.content

Eval(
    "My Project",
    data=filtered_data,
    task=lambda input: call_model(input),
    scores=[Factuality],
    metadata={
        "model": "gpt-4o",
        "temperature": 0.7,
    },
    experiment_name=f"eval-{target_sha[:7]}",  # Include SHA in experiment name
    repo_info={
        "commit": target_sha,
        "branch": "main",
    },
)
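To actually run one experiment per SHA, the filtering step above can be factored into a helper and looped. A minimal sketch (the inline records stand in for init_dataset(...), and the Eval() call is shown as a comment):

```python
def filter_by_commit(records, sha):
    """Keep records whose metadata.repo_info.commit matches the given SHA."""
    return [
        r for r in records
        if r.get("metadata", {}).get("repo_info", {}).get("commit") == sha
    ]

# In practice `records` comes from init_dataset(...); tiny inline sample here:
records = [
    {"input": "q1", "metadata": {"repo_info": {"commit": "abc1234567890def"}}},
    {"input": "q2", "metadata": {"repo_info": {"commit": "def4567890abc123"}}},
]

for sha in ["abc1234567890def", "def4567890abc123"]:
    subset = filter_by_commit(records, sha)
    if not subset:
        continue  # nothing recorded for this commit
    # Eval("My Project", data=subset, task=..., scores=[Factuality],
    #      experiment_name=f"eval-{sha[:7]}", repo_info={"commit": sha})
```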

Filtering Evals by Commit SHA

Use the full commit SHA or a prefix of at least 7 characters to filter experiments in the UI or via AI Search.
  • Navigate to your project’s experiments view
  • Use the search/filter bar with commit:abc1234 (minimum 7 chars for exact match)
  • Compare results across different SHAs
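The same prefix rule can be applied client-side. A hypothetical helper (not part of the SDK) sketching exact-prefix matching with the 7-character minimum:

```python
def matches_commit(full_sha: str, query: str) -> bool:
    # Require at least 7 characters so short prefixes don't match ambiguously
    return len(query) >= 7 and full_sha.startswith(query)
```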

Best Practices

  • Use full SHA or minimum 7-character prefix for reliable filtering
  • Set commit SHA in CI/CD workflows using environment variables ($GITHUB_SHA, $CI_COMMIT_SHA)
  • Enable automatic collection for consistent metadata without manual specification
  • Tag evals with branch name alongside commit for easier navigation
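In a CI workflow, the commit can be taken straight from the environment. A minimal shell sketch (GITHUB_SHA and CI_COMMIT_SHA are the variables GitHub Actions and GitLab CI set, respectively; the curl line mirrors the API example above):

```shell
# Resolve the commit: prefer CI-provided SHAs, fall back to the local checkout
COMMIT="${GITHUB_SHA:-${CI_COMMIT_SHA:-$(git rev-parse HEAD 2>/dev/null || echo unknown)}}"
echo "Evaluating commit $COMMIT"
# Then pass it when launching the eval, e.g.:
# curl https://api.braintrust.dev/v1/eval ... \
#   -d "{\"repo_info\": {\"commit\": \"$COMMIT\"}}"
```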