Skip to main content
Applies to:
  • Plan -
  • Deployment -

Summary

created is the original row timestamp and is not changed when a span/trace is updated. Exports can contain multiple files with duplicate ids. Use the stable id plus _xact_id to dedupe and detect the newest exported version. Do not treat _xact_id as a first-class updated_at timestamp.

What is happening

Exported rows keep their original created timestamp. When a trace or span is rewritten the export produces a new row rather than replacing the old file. That produces duplicate rows in downstream tables. Each exported row includes a stable record id (id or span_id) and an internal transaction id (_xact_id). _xact_id increases on each write, so higher _xact_id means a newer exported version of the same record. Exports do not include an updated_at column today. Rely on id + _xact_id to find the latest version and maintain a downstream high-water mark.

Fix or suggestion

Option 1: dedupe on ingest (most common)

  1. Read the exported parquet files into a staging table.
  2. Keep one row per id by choosing the row with the highest _xact_id.
  3. Write the deduped result to your final table or overwrite the id partition.
Minimal SQL (Databricks / Spark SQL):
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY _xact_id DESC) rn
  FROM production.braintrust.traces
) t
WHERE rn = 1;
  • Use id for trace-level exports and span_id (or id, depending on shape) for span exports.
  • Keep _xact_id in the final table for future comparisons.

Option 2: incremental processing by _xact_id

  1. After each export run, store the highest _xact_id you processed.
  2. On the next ingest run, process only rows where _xact_id > last_exported_xact_id.
  3. Process only these rows (then dedupe by id if needed) and update last_exported_xact_id.
Example incremental filter:
SELECT * FROM production.braintrust.traces
WHERE date >= '2026-06-01'
  AND _xact_id > 123456789;
Pattern benefits:
  • Avoids reprocessing stable rows.
  • Ensures you capture writes in monotonic order.

Datadog / metric timestamp guidance

  • Use created as the metric timestamp when you want the metric to reflect the original event time (historical backfill).
  • Use _xact_id ordering to decide which row is the latest when a trace is updated.
  • If you must send a metric that reflects when your pipeline observed an update, choose one of:
    • Use the export run time (processing timestamp) for updated rows detected by _xact_id.
    • Or send only rows where _xact_id > last_exported_xact_id and set metric timestamp = the export run time.
  • Datadog APIs accept historical timestamps; supply the desired timestamp when submitting metrics.

How to confirm it worked

  • Count duplicates before and after dedupe: confirm no more than one row per id.
  • Verify the kept row per id has the highest _xact_id.
  • For incremental runs, confirm last_exported_xact_id increases and only new _xact_id values are processed.

Notes

  • Exports are partitioned by date, based on the row’s transaction date from _xact_id, not the original created timestamp or the export run time. Updated rows can appear in a later date= partition than their original created date, so incremental pipelines should scan new date= partitions and use _xact_id as the high-water mark.
  • There is no exported updated_at field currently. If you require a per-row update timestamp, either derive/update it in your ingest pipeline or request a product feature to add an updated_at export column.