
Summary

Issue: The SQL sandbox shows a warning indicator next to the Run button, flagging a query as likely slow before or after execution.

Cause: The linter detected a query pattern or execution plan shape associated with poor performance. All warnings are advisory — the query will still run and return correct results.

Resolution: Varies by warning type. Each warning points to a different structural issue in the query or execution plan.
Warnings never block execution. A query with warnings may complete fine on small datasets but slow down or time out at scale.

Warning 1: Missing timestamp filter (fires before execution)

“This query may scan too much data and run slowly. Add a range filter on created, _xact_id, or _pagination_key, or scope to a specific root_span_id or id.”
This is the only warning that fires before the query runs, based solely on the query text. It applies only to project_logs() queries.

Why it fires

Without a range bound, a project_logs() query scans your entire project history regardless of any other filters. Other predicates (like metadata.session_id = 'x') don’t help because they’re applied after the scan, not before.

What triggers it

-- ❌ No timestamp filter — fires warning
SELECT id, input, output
FROM project_logs('proj-id')
WHERE metadata.user_id = 'u_42';

-- ❌ Equality on created is not a range filter — still fires
SELECT id FROM project_logs('proj-id')
WHERE created = '2024-01-01';

-- ❌ IS NOT NULL is not a range filter — still fires
SELECT id FROM project_logs('proj-id')
WHERE created IS NOT NULL;

-- ❌ `timestamp` field does not count — only the literal `created` column
SELECT id FROM project_logs('proj-id')
WHERE timestamp >= now() - INTERVAL 1 DAY;

What satisfies it

-- ✅ Range on created (most common)
SELECT id, scores.accuracy
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 7 DAY;

-- ✅ Range on _xact_id
SELECT id FROM project_logs('proj-id')
WHERE _xact_id >= 12345;

-- ✅ Range on _pagination_key
SELECT id FROM project_logs('proj-id')
WHERE _pagination_key >= '2025-01-01T00:00:00Z';

-- ✅ Equality filter on id or root_span_id
SELECT id, input, output
FROM project_logs('proj-id')
WHERE root_span_id = 'trace_abc';

SELECT id FROM project_logs('proj-id')
WHERE id IN ('span_a', 'span_b', 'span_c');
BETWEEN may or may not satisfy the rule depending on how the parser represents it. Use explicit >= / <= to be safe.
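
For example, a BETWEEN filter can be rewritten as two explicit bounds (dates here are illustrative):

-- ⚠️ May or may not be recognized as a range filter
SELECT id FROM project_logs('proj-id')
WHERE created BETWEEN '2025-01-01' AND '2025-01-08';

-- ✅ Equivalent explicit bounds, reliably recognized
SELECT id FROM project_logs('proj-id')
WHERE created >= '2025-01-01'
  AND created <= '2025-01-08';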

What this warning does NOT mean

  • It does not apply to experiment(), dataset(), view(), or other sources — only project_logs().
  • Nested id/root_span_id equality inside AND/OR still counts — the linter walks the full filter tree (see the example below).
  • If the query fails to parse, zero warnings are shown — a clean result here does not mean the query is valid.
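
For example, an id/root_span_id equality nested inside a boolean expression still satisfies the rule (IDs here are illustrative):

-- ✅ No range filter, but the linter finds the nested root_span_id equality
SELECT id FROM project_logs('proj-id')
WHERE metadata.user_id = 'u_42'
  AND (root_span_id = 'trace_abc' OR root_span_id = 'trace_def');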

Warning 2: Query retrieves full span documents (fires after execution)

“Query retrieves full span documents, which can be slow.”

Why it fires

Brainstore stores certain columns in fast columnar storage (Tantivy fast fields) and larger fields (input, output, full metadata blobs) in document storage. When a query projects any field that isn’t available as a fast field, the engine must fetch the full raw JSON document for every matching row from object storage. This bypasses vectorized column-level reads.

What typically triggers it

  • SELECT * — forces a full-document fetch for every row
  • Selecting input, output, or error — these are typically large blobs stored outside the columnar index
  • Selecting metadata as a whole object
  • Selecting metadata.* keys that aren’t indexed as fast fields (arbitrary user-defined metadata keys)

-- ❌ SELECT * fetches full documents
SELECT *
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY;

-- ❌ Explicit blob fields also trigger it
SELECT id, input, output
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY;

Workaround

Project only the scalar fields you actually need:
-- ✅ Scalar fast fields only
SELECT id, created, scores.accuracy, metrics.tokens
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
LIMIT 100;
If you need input/output/metadata, narrow the result set first so you’re fetching full documents for a small number of rows:
-- ✅ First pass: find matching span IDs (cheap, columnar)
SELECT id
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
  AND scores.accuracy < 0.5
LIMIT 50;

-- Then fetch full documents for just those IDs
SELECT id, input, output
FROM project_logs('proj-id')
WHERE id IN ('span_a', 'span_b', ...);
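
If the sandbox dialect supports IN subqueries (not confirmed here, so treat this as a sketch — the linter may also not recognize the subquery as an id filter), the two passes can be combined:

-- ✅ One round trip, assuming IN (SELECT ...) is supported
SELECT id, input, output
FROM project_logs('proj-id')
WHERE id IN (
  SELECT id
  FROM project_logs('proj-id')
  WHERE created >= now() - INTERVAL 1 DAY
    AND scores.accuracy < 0.5
  LIMIT 50
);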

Warning 3: Filter pushdown may have failed (fires after execution)

“Query plan includes a standalone Filter above Index search, so filter pushdown may have failed.”

Why it fires

Normally, filters run inside the index scan itself so only matching rows are returned. When pushdown fails, the engine retrieves a broader set of rows from the index and then applies the filter separately. This means more data moves through memory before being discarded. The linter detects this by looking for a "Filter" plan node as the direct parent of an "Index search" node in the execution plan.
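
Schematically, the plan shape the linter matches looks like this (an illustration of the node nesting, not literal plan output):

Filter (metadata.tenant_id = 'acme')    -- applied after rows leave the index
└─ Index search (created >= ...)        -- returns a broader row set than needed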

Common causes

  • Filtering on a field that isn’t indexed (e.g., arbitrary user-defined metadata.* keys)
  • Wrapping an indexed column in a function: WHERE lower(name) = 'foo' prevents the index from being used directly
  • Arithmetic on a timestamp column: WHERE created + INTERVAL 1 HOUR > now()

Workaround

-- ❌ Function on indexed column blocks pushdown
WHERE lower(name) = 'my-span'

-- ✅ Compare the raw column; normalize case separately if needed
WHERE name = 'my-span'

-- ❌ Arithmetic on the indexed side
WHERE created + INTERVAL 1 HOUR > now()

-- ✅ Put arithmetic on the constant side
WHERE created > now() - INTERVAL 1 HOUR
If a filter on user metadata (e.g., metadata.tenant_id) is what’s blocking pushdown, keep the indexable predicates first to reduce the scan, and treat the metadata filter as a secondary refinement:
-- ✅ Index-pushdownable predicates narrow the scan first
SELECT id, metadata.tenant_id
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY   -- indexed → pushed down
  AND status = 'error'                     -- indexed → pushed down
  AND metadata.tenant_id = 'acme';         -- may not push down, applied after

Warning 4: Sort pushdown may have failed (fires after execution)

“Query plan includes a standalone Project above Index search, so sort pushdown may have failed.”

Why it fires

When sort pushdown works, rows come out of the index already in the requested order, so no separate sort step is needed and LIMIT can short-circuit the scan. When it fails, the engine fetches rows, re-projects, and sorts them in memory. The linter detects this by looking for the plan pattern "Expand traces" → "Project" → "Index search" (nested by depth), combined with a sort clause (ORDER BY) being present in the query.
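
Schematically (again an illustration of the node nesting, not literal plan output):

Expand traces
└─ Project                  -- standalone re-projection step
   └─ Index search          -- rows not emitted in the requested order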

Common causes

  • Sorting on a computed expression: ORDER BY lower(name), ORDER BY abs(score - 0.5)
  • Sorting on a field that requires a projection step (i.e., not directly in the index’s sort order)
  • Projecting fields that can’t come from the index, forcing a re-project node between the trace expansion and the index search

Workaround

-- ❌ Sort on computed expression
ORDER BY lower(name)

-- ✅ Sort on a raw indexed column
ORDER BY created DESC

-- ✅ For project_logs pagination — natural index-ordered keys
SELECT id, created, scores.accuracy
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
ORDER BY _xact_id DESC
LIMIT 100;
Always combine a sort with a LIMIT — sort pushdown failure is most costly when there is no limit, because the engine must sort the entire result set.

Warning 5: GroupBy node — may be expensive at high cardinality (fires after execution)

“Query plan includes a GroupBy node, which can be expensive at high cardinality.”

Why it fires

A GroupBy plan node appears whenever a GROUP BY / dimensions: clause is present. The warning fires on the mere presence of the node, not on an estimated group count — even a GROUP BY model with 5 distinct values will trigger it. The actual cost scales with:
  • Number of distinct grouping-key values (cardinality). Grouping by user_id, trace_id, or raw text is expensive. Grouping by model or a score bucket is cheap.
  • Number of input rows being aggregated
  • Complexity of aggregate expressions (percentile(), count_distinct() cost more than count())

Workaround

-- ❌ High-cardinality group-by on unbounded input
SELECT user_id, COUNT(*)
FROM project_logs('proj-id')
GROUP BY user_id;

-- ✅ Narrow time range first, then group
SELECT user_id, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY user_id;

-- ✅ Prefer low-cardinality grouping keys
SELECT model, status, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY model, status;
For top-level aggregates (totals) without a per-group breakdown, use the summary shape, which is pre-aggregated:
SELECT COUNT(*), avg(scores.accuracy)
FROM project_logs('proj-id', shape => 'summary')
WHERE created >= now() - INTERVAL 1 DAY;

What this warning does NOT mean

  • It does not distinguish fast columnar group-by from slow dynamic group-by. See Warning 6 for the more specific signal about the execution strategy.
  • A groupByNode warning without a dynamicGroupByCollector warning means the engine used the columnar path — still worth reviewing cardinality, but the execution strategy is optimal.

Warning 6: Dynamic group-by collector — slower than columnar (fires after execution)

“Query plan uses Dynamic segment collector (dynamic group-by), which is typically slower than columnar group-by.”

Why it fires

Brainstore has two strategies for aggregation:
  • Columnar group-by — reads values directly from indexed fast fields in a vectorized loop. Fast.
  • Dynamic segment collector — walks each document, extracts grouping keys dynamically. Slower, especially at high cardinality.

The engine falls back to the dynamic collector when it can’t serve the grouping expression from indexed columns.

Common causes

  • Grouping on a computed expression: GROUP BY lower(name), GROUP BY model || '-' || status
  • Grouping on a field that isn’t a fast field (e.g., arbitrary user-defined metadata.* keys)
  • Multi-key group-bys where at least one key is not fast-field-eligible (see the example below)
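
For example, assuming (as above) that metadata.tenant_id is not indexed as a fast field:

-- ❌ One non-fast-field key forces the dynamic collector for the whole group-by
SELECT model, metadata.tenant_id, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY model, metadata.tenant_id;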

Workaround

-- ❌ Grouping on a computed expression forces dynamic collector
SELECT lower(name) AS grp, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY lower(name);

-- ✅ Group on the raw column, normalize in application code
SELECT name, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY name;

-- ❌ Grouping on arbitrary user metadata (likely not a fast field)
SELECT metadata.tenant_id, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY metadata.tenant_id;

-- ✅ Use a metadata filter to narrow rows, group on a fast indexed field
SELECT status, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
  AND metadata.tenant_id = 'acme'
GROUP BY status;
Seeing both groupByNode and dynamicGroupByCollector together is the strongest signal to narrow the input with indexable predicates (time range, status, model, name) before the group-by runs.

Summary

Warning                  | Fires when     | Scope                  | Primary fix
-------------------------|----------------|------------------------|--------------------------------------------------------
Missing timestamp filter | Pre-execution  | project_logs() only    | Add WHERE created >= now() - INTERVAL N DAY
Full span documents      | Post-execution | Any query              | Remove SELECT *; project only scalar fields
Filter pushdown          | Post-execution | Any query              | Avoid functions/arithmetic on indexed columns in WHERE
Sort pushdown            | Post-execution | Queries with ORDER BY  | Sort on raw indexed columns; add LIMIT
GroupBy node             | Post-execution | Queries with GROUP BY  | Narrow time range; use low-cardinality grouping keys
Dynamic group-by         | Post-execution | Queries with GROUP BY  | Group on raw indexed columns, not computed expressions
