
Summary

Issue: The SQL sandbox shows a warning indicator next to the Run button, flagging a query as likely slow before or after execution.

Cause: The linter detected a query pattern or execution plan shape associated with poor performance. All warnings are advisory — the query will still run and return correct results.

Resolution: Varies by warning type. Each warning points to a different structural issue in the query or execution plan.
Warnings never block execution. A query with warnings may complete fine on small datasets but slow down or time out at scale.

Warning 1: Missing timestamp filter (fires before execution)

“This query may scan too much data and run slowly. Add a range filter on created, _xact_id, or _pagination_key, or scope to a specific root_span_id or id.”
This is the only warning that fires before the query runs, based solely on the query text. It applies only to project_logs() queries.

Why it fires

Without a range bound, a project_logs() query scans your entire project history regardless of any other filters. Other predicates (like metadata.session_id = 'x') don’t help because they’re applied after the scan, not before.

What triggers it

-- ❌ No timestamp filter — fires warning
SELECT id, input, output
FROM project_logs('proj-id')
WHERE metadata.user_id = 'u_42';

-- ❌ Equality on created is not a range filter — still fires
SELECT id FROM project_logs('proj-id')
WHERE created = '2024-01-01';

-- ❌ IS NOT NULL is not a range filter — still fires
SELECT id FROM project_logs('proj-id')
WHERE created IS NOT NULL;

-- ❌ `timestamp` field does not count — only the literal `created` column
SELECT id FROM project_logs('proj-id')
WHERE timestamp >= now() - INTERVAL 1 DAY;

What satisfies it

-- ✅ Range on created (most common)
SELECT id, scores.accuracy
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 7 DAY;

-- ✅ Range on _xact_id
SELECT id FROM project_logs('proj-id')
WHERE _xact_id >= 12345;

-- ✅ Range on _pagination_key
SELECT id FROM project_logs('proj-id')
WHERE _pagination_key >= '2025-01-01T00:00:00Z';

-- ✅ Equality filter on id or root_span_id
SELECT id, input, output
FROM project_logs('proj-id')
WHERE root_span_id = 'trace_abc';

SELECT id FROM project_logs('proj-id')
WHERE id IN ('span_a', 'span_b', 'span_c');
BETWEEN may or may not satisfy the rule depending on how the parser represents it. Use explicit >= / <= to be safe.
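
For example, a BETWEEN filter can be rewritten as two explicit bounds (dates here are illustrative):

-- ⚠️ May or may not be recognized as a range filter
SELECT id FROM project_logs('proj-id')
WHERE created BETWEEN '2025-01-01' AND '2025-01-08';

-- ✅ Equivalent explicit bounds, reliably recognized
SELECT id FROM project_logs('proj-id')
WHERE created >= '2025-01-01'
  AND created <= '2025-01-08';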

What this warning does NOT mean

  • It does not apply to experiment(), dataset(), view(), or other sources — only project_logs().
  • Nested id/root_span_id equality inside AND/OR still counts — the linter walks the full filter tree (see the example below).
  • If the query fails to parse, zero warnings are shown — a clean result here does not mean the query is valid.
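
For example, an id/root_span_id equality nested inside a boolean expression still satisfies the rule (IDs here are illustrative):

-- ✅ No range filter, but the linter finds the nested root_span_id equality
SELECT id FROM project_logs('proj-id')
WHERE metadata.user_id = 'u_42'
  AND (root_span_id = 'trace_abc' OR root_span_id = 'trace_def');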

Warning 2: Query retrieves full span documents (fires after execution)

“Query retrieves full span documents, which can be slow.”

Why it fires

Brainstore stores certain columns in fast columnar storage (Tantivy fast fields) and larger fields (input, output, full metadata blobs) in document storage. When a query projects any field that isn’t available as a fast field, the engine must fetch the full raw JSON document for every matching row from object storage. This bypasses vectorized column-level reads.

What typically triggers it

  • SELECT * — forces a full-document fetch for every row
  • Selecting input, output, or error — these are typically large blobs stored outside the columnar index
  • Selecting metadata as a whole object
  • Selecting metadata.* keys that aren’t indexed as fast fields (arbitrary user-defined metadata keys)

-- ❌ SELECT * fetches full documents
SELECT *
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY;

-- ❌ Explicit blob fields also trigger it
SELECT id, input, output
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY;

Workaround

Project only the scalar fields you actually need:
-- ✅ Scalar fast fields only
SELECT id, created, scores.accuracy, metrics.tokens
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
LIMIT 100;
If you need input/output/metadata, narrow the result set first so you’re fetching full documents for a small number of rows:
-- ✅ First pass: find matching span IDs (cheap, columnar)
SELECT id
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
  AND scores.accuracy < 0.5
LIMIT 50;

-- Then fetch full documents for just those IDs
SELECT id, input, output
FROM project_logs('proj-id')
WHERE id IN ('span_a', 'span_b', ...);
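
If the sandbox dialect supports IN subqueries (not confirmed here, so treat this as a sketch — the linter may also not recognize the subquery as an id filter), the two passes can be combined:

-- ✅ One round trip, assuming IN (SELECT ...) is supported
SELECT id, input, output
FROM project_logs('proj-id')
WHERE id IN (
  SELECT id
  FROM project_logs('proj-id')
  WHERE created >= now() - INTERVAL 1 DAY
    AND scores.accuracy < 0.5
  LIMIT 50
);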

Warning 3: Filter pushdown may have failed (fires after execution)

“Query plan includes a standalone Filter above Index search, so filter pushdown may have failed.”

Why it fires

Normally, filters run inside the index scan itself so only matching rows are returned. When pushdown fails, the engine retrieves a broader set of rows from the index and then applies the filter separately. This means more data moves through memory before being discarded. The linter detects this by looking for a "Filter" plan node as the direct parent of an "Index search" node in the execution plan.
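
Schematically, the plan shape the linter matches looks like this (an illustration of the node nesting, not literal plan output):

Filter (metadata.tenant_id = 'acme')    -- applied after rows leave the index
└─ Index search (created >= ...)        -- returns a broader row set than needed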

Common causes

  • Filtering on a field that isn’t indexed (e.g., arbitrary user-defined metadata.* keys)
  • Wrapping an indexed column in a function: WHERE lower(name) = 'foo' prevents the index from being used directly
  • Arithmetic on a timestamp column: WHERE created + INTERVAL 1 HOUR > now()

Workaround

-- ❌ Function on indexed column blocks pushdown
WHERE lower(name) = 'my-span'

-- ✅ Compare the raw column; normalize case separately if needed
WHERE name = 'my-span'

-- ❌ Arithmetic on the indexed side
WHERE created + INTERVAL 1 HOUR > now()

-- ✅ Put arithmetic on the constant side
WHERE created > now() - INTERVAL 1 HOUR
If a filter on user metadata (e.g., metadata.tenant_id) is what’s blocking pushdown, keep the indexable predicates first to reduce the scan, and treat the metadata filter as a secondary refinement:
-- ✅ Index-pushdownable predicates narrow the scan first
SELECT id, metadata.tenant_id
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY   -- indexed → pushed down
  AND status = 'error'                     -- indexed → pushed down
  AND metadata.tenant_id = 'acme';         -- may not push down, applied after

Warning 4: Sort pushdown may have failed (fires after execution)

“Query plan includes a standalone Project above Index search, so sort pushdown may have failed.”

Why it fires

When sort pushdown works, rows come out of the index already in the requested order, so no separate sort step is needed and LIMIT can short-circuit the scan. When it fails, the engine fetches rows, re-projects, and sorts them in memory. The linter detects this by looking for the plan pattern "Expand traces" → "Project" → "Index search" (nested by depth), combined with a sort clause (ORDER BY) being present in the query.
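
Schematically (again an illustration of the node nesting, not literal plan output):

Expand traces
└─ Project                  -- standalone re-projection step
   └─ Index search          -- rows not emitted in the requested order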

Common causes

  • Sorting on a computed expression: ORDER BY lower(name), ORDER BY abs(score - 0.5)
  • Sorting on a field that requires a projection step (i.e., not directly in the index’s sort order)
  • Projecting fields that can’t come from the index, forcing a re-project node between the trace expansion and the index search

Workaround

-- ❌ Sort on computed expression
ORDER BY lower(name)

-- ✅ Sort on a raw indexed column
ORDER BY created DESC

-- ✅ For project_logs pagination — natural index-ordered keys
SELECT id, created, scores.accuracy
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
ORDER BY _xact_id DESC
LIMIT 100;
Always combine a sort with a LIMIT — sort pushdown failure is most costly when there is no limit, because the engine must sort the entire result set.

Warning 5: GroupBy node — may be expensive at high cardinality (fires after execution)

“Query plan includes a GroupBy node, which can be expensive at high cardinality.”

Why it fires

A GroupBy plan node appears whenever a GROUP BY / dimensions: clause is present. The warning fires on the mere presence of the node, not on an estimated group count — even a GROUP BY model with 5 distinct values will trigger it. The actual cost scales with:
  • Number of distinct grouping-key values (cardinality). Grouping by user_id, trace_id, or raw text is expensive. Grouping by model or a score bucket is cheap.
  • Number of input rows being aggregated
  • Complexity of aggregate expressions (percentile(), count_distinct() cost more than count())

Workaround

-- ❌ High-cardinality group-by on unbounded input
SELECT user_id, COUNT(*)
FROM project_logs('proj-id')
GROUP BY user_id;

-- ✅ Narrow time range first, then group
SELECT user_id, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY user_id;

-- ✅ Prefer low-cardinality grouping keys
SELECT model, status, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY model, status;
For top-level aggregates (totals) without a per-group breakdown, use the summary shape, which is pre-aggregated:
SELECT COUNT(*), avg(scores.accuracy)
FROM project_logs('proj-id', shape => 'summary')
WHERE created >= now() - INTERVAL 1 DAY;

What this warning does NOT mean

  • It does not distinguish fast columnar group-by from slow dynamic group-by. See Warning 6 for the more specific signal about the execution strategy.
  • A groupByNode warning without a dynamicGroupByCollector warning means the engine used the columnar path — still worth reviewing cardinality, but the execution strategy is optimal.

Warning 6: Dynamic group-by collector — slower than columnar (fires after execution)

“Query plan uses Dynamic segment collector (dynamic group-by), which is typically slower than columnar group-by.”

Why it fires

Brainstore has two strategies for aggregation:
  • Columnar group-by — reads values directly from indexed fast fields in a vectorized loop. Fast.
  • Dynamic segment collector — walks each document, extracts grouping keys dynamically. Slower, especially at high cardinality.

The engine falls back to the dynamic collector when it can’t serve the grouping expression from indexed columns.

Common causes

  • Grouping on a computed expression: GROUP BY lower(name), GROUP BY model || '-' || status
  • Grouping on a field that isn’t a fast field (e.g., arbitrary user-defined metadata.* keys)
  • Multi-key group-bys where at least one key is not fast-field-eligible (see the example below)
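
For example, assuming (as above) that metadata.tenant_id is not indexed as a fast field:

-- ❌ One non-fast-field key forces the dynamic collector for the whole group-by
SELECT model, metadata.tenant_id, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY model, metadata.tenant_id;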

Workaround

-- ❌ Grouping on a computed expression forces dynamic collector
SELECT lower(name) AS grp, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY lower(name);

-- ✅ Group on the raw column, normalize in application code
SELECT name, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY name;

-- ❌ Grouping on arbitrary user metadata (likely not a fast field)
SELECT metadata.tenant_id, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
GROUP BY metadata.tenant_id;

-- ✅ Use a metadata filter to narrow rows, group on a fast indexed field
SELECT status, COUNT(*)
FROM project_logs('proj-id')
WHERE created >= now() - INTERVAL 1 DAY
  AND metadata.tenant_id = 'acme'
GROUP BY status;
Seeing both groupByNode and dynamicGroupByCollector together is the strongest signal to narrow the input with indexable predicates (time range, status, model, name) before the group-by runs.

Summary

Warning                  | Fires when     | Scope                  | Primary fix
-------------------------|----------------|------------------------|--------------------------------------------------------
Missing timestamp filter | Pre-execution  | project_logs() only    | Add WHERE created >= now() - INTERVAL N DAY
Full span documents      | Post-execution | Any query              | Remove SELECT *; project only scalar fields
Filter pushdown          | Post-execution | Any query              | Avoid functions/arithmetic on indexed columns in WHERE
Sort pushdown            | Post-execution | Queries with ORDER BY  | Sort on raw indexed columns; add LIMIT
GroupBy node             | Post-execution | Queries with GROUP BY  | Narrow time range; use low-cardinality grouping keys
Dynamic group-by         | Post-execution | Queries with GROUP BY  | Group on raw indexed columns, not computed expressions
