Summary

Goal: Run evaluations tied to specific git commits so you can track experiments across multiple SHAs. Features: git metadata tracking via repo_info.commit, or automatic collection with git_metadata_settings.

Tracking Options

Option 1: Set Commit SHA Explicitly (API)

Pass repo_info.commit when launching an eval to tie it to a specific SHA.
curl https://api.braintrust.dev/v1/eval \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "your-project-id",
    "data": {
       "dataset_id": "your-dataset-id"
    },
    "task": {
       "function_id": "id-of-function-to-eval"
    },
    "scores": [
       {
          "function_id": "id-of-function-to-score-eval-on"
       }
    ],
    "repo_info": {
      "commit": "abc1234567890def"
    }
  }'

Option 2: Set Commit SHA Explicitly (SDK)

Specify repo_info when calling Eval() in Python or TypeScript.
from braintrust import Eval, init_dataset
from autoevals import Factuality

Eval(
    "My Project",
    data=init_dataset(project="My Project", name="My Dataset"),
    task=lambda input: call_model(input),  # Your LLM call here
    scores=[Factuality],
    metadata={
        "model": "gpt-4o",
        "temperature": 0.7,
    },
    repo_info={
        "commit": "abc1234567890def"
    }
)
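In CI you typically won't hardcode the SHA. The sketch below shows a hypothetical helper (not part of the Braintrust SDK) that resolves the current commit from common CI environment variables, falling back to the local checkout:

```python
import os
import subprocess

def current_commit() -> str:
    # Prefer SHAs provided by CI: GitHub Actions sets GITHUB_SHA,
    # GitLab CI sets CI_COMMIT_SHA.
    for var in ("GITHUB_SHA", "CI_COMMIT_SHA"):
        sha = os.environ.get(var)
        if sha:
            return sha
    # Fall back to the local working tree
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
```

You could then pass repo_info={"commit": current_commit()} when calling Eval().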

Option 3: Enable Automatic Git Metadata Collection

Configure git_metadata_settings to automatically capture commit information from your environment. By default, all git metadata fields allowed by your org-level settings are collected.
import braintrust

braintrust.init(
    project="my-project",
    git_metadata_settings={
        "collect": "some",
        "fields": ["commit", "branch", "dirty"],
    },
)

Filtering

Run Evals Across Multiple SHAs

To run batch evals for a specific SHA, filtering must be handled in your CI/CD pipeline or scripts so that only the desired records are targeted. Your dataset records also need to store the SHA for comparison. The following example assumes each dataset record stores the SHA in its metadata, following the same repo_info pattern used for eval tracking.
from braintrust import Eval, init_dataset
from autoevals import Factuality

target_sha = "abc123"

# Fetch the dataset
dataset = init_dataset(project="My Project", name="My Dataset")

# Filter by repo_info.commit
filtered_data = [
    record for record in dataset
    if record.get("metadata", {}).get("repo_info", {}).get("commit") == target_sha
]

import openai

def call_model(input):
    # Replace with your own LLM call (assumes the openai>=1.0 client API)
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": input}]
    )
    return response.choices[0].message.content

Eval(
    "My Project",
    data=filtered_data,
    task=lambda input: call_model(input),
    scores=[Factuality],
    metadata={
        "model": "gpt-4o",
        "temperature": 0.7,
    },
    experiment_name=f"eval-{target_sha[:7]}",  # Include SHA in experiment name
    repo_info={
        "commit": target_sha,
        "branch": "main",
    },
)
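To actually run one experiment per SHA, the filtering step above can be factored into a helper and looped. A minimal sketch (the inline records stand in for init_dataset(...), and the Eval() call is shown as a comment):

```python
def filter_by_commit(records, sha):
    """Keep records whose metadata.repo_info.commit matches the given SHA."""
    return [
        r for r in records
        if r.get("metadata", {}).get("repo_info", {}).get("commit") == sha
    ]

# In practice `records` comes from init_dataset(...); tiny inline sample here:
records = [
    {"input": "q1", "metadata": {"repo_info": {"commit": "abc1234567890def"}}},
    {"input": "q2", "metadata": {"repo_info": {"commit": "def4567890abc123"}}},
]

for sha in ["abc1234567890def", "def4567890abc123"]:
    subset = filter_by_commit(records, sha)
    if not subset:
        continue  # nothing recorded for this commit
    # Eval("My Project", data=subset, task=..., scores=[Factuality],
    #      experiment_name=f"eval-{sha[:7]}", repo_info={"commit": sha})
```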

Filtering Evals by Commit SHA

Use the full commit SHA or a prefix of at least 7 characters to filter experiments in the UI or via AI Search.
  • Navigate to your project’s experiments view
  • Use the search/filter bar with commit:abc1234 (minimum 7 chars for exact match)
  • Compare results across different SHAs
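The same prefix rule can be applied client-side. A hypothetical helper (not part of the SDK) sketching exact-prefix matching with the 7-character minimum:

```python
def matches_commit(full_sha: str, query: str) -> bool:
    # Require at least 7 characters so short prefixes don't match ambiguously
    return len(query) >= 7 and full_sha.startswith(query)
```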

Best Practices

  • Use full SHA or minimum 7-character prefix for reliable filtering
  • Set commit SHA in CI/CD workflows using environment variables ($GITHUB_SHA, $CI_COMMIT_SHA)
  • Enable automatic collection for consistent metadata without manual specification
  • Tag evals with branch name alongside commit for easier navigation
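In a CI workflow, the commit can be taken straight from the environment. A minimal shell sketch (GITHUB_SHA and CI_COMMIT_SHA are the variables GitHub Actions and GitLab CI set, respectively; the curl line mirrors the API example above):

```shell
# Resolve the commit: prefer CI-provided SHAs, fall back to the local checkout
COMMIT="${GITHUB_SHA:-${CI_COMMIT_SHA:-$(git rev-parse HEAD 2>/dev/null || echo unknown)}}"
echo "Evaluating commit $COMMIT"
# Then pass it when launching the eval, e.g.:
# curl https://api.braintrust.dev/v1/eval ... \
#   -d "{\"repo_info\": {\"commit\": \"$COMMIT\"}}"
```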