Applies to:
- Plan -
- Deployment -
Summary
created is the original row timestamp and is not changed when a span/trace is updated. Exports can contain multiple files with duplicate ids. Use the stable id plus _xact_id to dedupe and detect the newest exported version. Do not treat _xact_id as a first-class updated_at timestamp.
What is happening
Exported rows keep their original created timestamp. When a trace or span is rewritten the export produces a new row rather than replacing the old file. That produces duplicate rows in downstream tables. Each exported row includes a stable record id (id or span_id) and an internal transaction id (_xact_id). _xact_id increases on each write, so higher _xact_id means a newer exported version of the same record.
Exports do not include an updated_at column today. Rely on id + _xact_id to find the latest version and maintain a downstream high-water mark.
Fix or suggestion
Option 1: dedupe on ingest (most common)
- Read the exported parquet files into a staging table.
- Keep one row per id by choosing the row with the highest _xact_id.
- Write the deduped result to your final table or overwrite the id partition.
- Use id for trace-level exports and span_id (or id, depending on shape) for span exports.
- Keep _xact_id in the final table for future comparisons.
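The dedupe steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes the Parquet files have already been loaded into a list of dicts, and that _xact_id is an integer or a numeric string. The names xact_key, dedupe_latest, and the sample rows are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def xact_key(row: Dict[str, Any]) -> int:
    # Assumption: _xact_id is an integer or a numeric string, so int()
    # yields a value that orders versions of a record correctly.
    return int(row["_xact_id"])


def dedupe_latest(
    rows: Iterable[Dict[str, Any]], id_field: str = "id"
) -> List[Dict[str, Any]]:
    """Keep one row per id, choosing the row with the highest _xact_id."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        key = row[id_field]
        current = latest.get(key)
        if current is None or xact_key(row) > xact_key(current):
            latest[key] = row
    return list(latest.values())


# Hypothetical staging rows: two exported versions of trace "a", one of "b".
staging = [
    {"id": "a", "_xact_id": "1000", "created": "2024-01-01T00:00:00Z", "score": 0.1},
    {"id": "b", "_xact_id": "1001", "created": "2024-01-01T00:05:00Z", "score": 0.5},
    {"id": "a", "_xact_id": "1002", "created": "2024-01-01T00:00:00Z", "score": 0.9},
]
final_rows = dedupe_latest(staging)
```

For span exports, pass id_field="span_id" if that is the stable key in your export shape. The kept rows retain _xact_id, so later runs can compare against them.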
Option 2: incremental processing by _xact_id
- After each export run, store the highest _xact_id you processed.
- On the next ingest run, process only rows where _xact_id > last_exported_xact_id.
- Dedupe those rows by id if needed, then update last_exported_xact_id.
- This avoids reprocessing stable rows.
- It also ensures you capture writes in monotonic order.
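A minimal sketch of the high-water-mark approach, under the same assumptions as before: rows are already loaded as dicts and _xact_id converts cleanly with int(). The function name process_incremental and the sample data are hypothetical, and persisting last_exported_xact_id between runs is left to your pipeline.

```python
from typing import Any, Dict, Iterable, List, Tuple


def process_incremental(
    rows: Iterable[Dict[str, Any]],
    last_exported_xact_id: int,
    id_field: str = "id",
) -> Tuple[List[Dict[str, Any]], int]:
    """Return (new rows deduped by id, updated high-water mark)."""
    # Keep only rows written after the stored high-water mark.
    fresh = [r for r in rows if int(r["_xact_id"]) > last_exported_xact_id]

    # Dedupe within the batch: iterating in ascending _xact_id order means
    # the newest version of each id overwrites older ones.
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in sorted(fresh, key=lambda r: int(r["_xact_id"])):
        latest[row[id_field]] = row

    # Advance the mark only if something new was seen.
    new_mark = max(
        (int(r["_xact_id"]) for r in fresh), default=last_exported_xact_id
    )
    return list(latest.values()), new_mark


# Hypothetical export contents; the previous run stopped at _xact_id 1001.
exported = [
    {"id": "a", "_xact_id": 1000},
    {"id": "b", "_xact_id": 1001},
    {"id": "a", "_xact_id": 1002},
    {"id": "c", "_xact_id": 1003},
]
batch, mark = process_incremental(exported, last_exported_xact_id=1001)
```

Store the returned mark only after the batch has been written successfully, so a failed run is retried from the previous mark.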
Datadog / metric timestamp guidance
- Use created as the metric timestamp when you want the metric to reflect the original event time (historical backfill).
- Use _xact_id ordering to decide which row is the latest when a trace is updated.
- If you must send a metric that reflects when your pipeline observed an update, choose one of:
  - Use the export run time (processing timestamp) for updated rows detected by _xact_id.
  - Or send only rows where _xact_id > last_exported_xact_id and set metric timestamp = the export run time.
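- Datadog's metric submission APIs accept a caller-supplied timestamp, so supply the desired timestamp when submitting metrics. Check Datadog's documented limits on how old a timestamp may be before backfilling older rows.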
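The timestamp choice above can be sketched as a small helper. It only computes the POSIX timestamp to attach to a metric point and deliberately makes no Datadog API call. It assumes created is an ISO 8601 string; the function name metric_timestamp and the sample row are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def metric_timestamp(
    row: Dict[str, Any], run_time: datetime, use_event_time: bool
) -> int:
    """Return a POSIX timestamp in seconds for a metric built from row."""
    if use_event_time:
        # Assumption: created is an ISO 8601 string. A trailing "Z" is
        # normalized so datetime.fromisoformat parses it on older Pythons.
        created = datetime.fromisoformat(row["created"].replace("Z", "+00:00"))
        return int(created.timestamp())
    # Otherwise report when the pipeline observed the update.
    return int(run_time.timestamp())


# Hypothetical row and export run time.
run_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
row = {"id": "a", "_xact_id": 1002, "created": "2024-01-01T00:00:00Z"}

event_ts = metric_timestamp(row, run_time, use_event_time=True)
observed_ts = metric_timestamp(row, run_time, use_event_time=False)
```

Use event_ts for historical backfill and observed_ts when the metric should reflect when the update was seen.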
How to confirm it worked
- Count duplicates before and after dedupe: confirm no more than one row per id.
- Verify the kept row per id has the highest _xact_id.
- For incremental runs, confirm last_exported_xact_id increases and only new _xact_id values are processed.
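These checks can be automated with a short script. The sketch below assumes the staging and final tables are available as lists of dicts and that _xact_id converts with int(). The helper names duplicate_ids and kept_rows_are_newest are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def duplicate_ids(rows: Iterable[Dict[str, Any]], id_field: str = "id") -> List[Any]:
    """Return the ids that appear more than once."""
    counts = Counter(r[id_field] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def kept_rows_are_newest(
    staging: List[Dict[str, Any]],
    final: List[Dict[str, Any]],
    id_field: str = "id",
) -> bool:
    """Check that every final row carries the highest _xact_id for its id."""
    highest: Dict[Any, int] = {}
    for r in staging:
        xact = int(r["_xact_id"])
        highest[r[id_field]] = max(highest.get(r[id_field], xact), xact)
    return all(int(r["_xact_id"]) == highest[r[id_field]] for r in final)


# Hypothetical staging rows and the deduped result to verify.
staging_rows = [
    {"id": "a", "_xact_id": 1000},
    {"id": "b", "_xact_id": 1001},
    {"id": "a", "_xact_id": 1002},
]
final_table = [
    {"id": "a", "_xact_id": 1002},
    {"id": "b", "_xact_id": 1001},
]

dupes_before = duplicate_ids(staging_rows)
dupes_after = duplicate_ids(final_table)
newest_ok = kept_rows_are_newest(staging_rows, final_table)
```

An empty dupes_after together with newest_ok being true indicates the dedupe behaved as intended on that batch.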
Notes
- Exports are partitioned by date, based on the row’s transaction date from _xact_id, not the original created timestamp or the export run time. Updated rows can appear in a later date= partition than their original created date, so incremental pipelines should scan new date= partitions and use _xact_id as the high-water mark.
- There is no exported updated_at field currently. If you require a per-row update timestamp, either derive/update it in your ingest pipeline or request a product feature to add an updated_at export column.
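As a rough illustration of the partition note, the sketch below picks which date= partitions an incremental run should scan. It assumes partition names look like date=YYYY-MM-DD, which this article does not confirm, and the function name partitions_to_scan is hypothetical. The last scanned partition is included again because it may have received more rows since the previous run; the _xact_id high-water mark then filters out rows already processed.

```python
from datetime import date
from typing import Iterable, List


def partitions_to_scan(partition_names: Iterable[str], last_scanned: date) -> List[str]:
    """Return date= partitions on or after the last scanned date, in order."""
    selected = []
    for name in partition_names:
        if not name.startswith("date="):
            continue  # Skip anything that is not a date partition.
        # Assumption: the partition value is an ISO date such as 2024-01-02.
        if date.fromisoformat(name[len("date="):]) >= last_scanned:
            selected.append(name)
    # ISO dates sort chronologically as plain strings.
    return sorted(selected)


# Hypothetical directory listing from the export location.
names = ["date=2024-01-01", "date=2024-01-03", "date=2024-01-02", "_metadata"]
to_scan = partitions_to_scan(names, last_scanned=date(2024, 1, 2))
```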