-
Notifications
You must be signed in to change notification settings - Fork 2k
Unparser emits aggregate twice when Projection reaches through Limit/Sort to claim Aggregate #21374
Description
Describe the bug
When the SQL unparser encounters a Projection → Limit → Aggregate (or Projection → Sort → Aggregate) plan shape where the Aggregate's aliases are inlined (no intermediate Projection between
Limit and Aggregate), it emits the aggregate expressions twice — once in the outer SELECT and once inside a spurious derived subquery.
This happens because:
- reconstruct_select_statement reaches through the Limit/Sort via
find_agg_node_within_selectto claim the Aggregate, setting the SELECT items and GROUP BY - When recursion then hits the Limit/Sort arm, it sees
already_projected == trueand wraps everything below in a derived subquery - The Aggregate ends up emitted in both the outer SELECT and inside the derived subquery
PR with fix coming momentarily
To Reproduce
- Parser-roundtrip query (works correctly because the parser inserts an intermediate Projection)
SELECT __agg_0 AS "min(j1_id)", __agg_1 AS "max(j1_id)"
FROM (
SELECT min(j1_rename) AS __agg_0, max(j1_rename) AS __agg_1
FROM (SELECT j1_id AS j1_rename FROM j1) AS bla
LIMIT 20
)
This roundtrips correctly because the parser produces a plan with a Projection between the Limit and
Aggregate:
Projection: __agg_0 AS min(j1_id), __agg_1 AS max(j1_id)
Limit: skip=0, fetch=20
Projection: min(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS __agg_1 ← parser inserts this
Aggregate: groupBy=[[]], aggr=[[min(bla.j1_rename), max(bla.j1_rename)]]
SubqueryAlias: bla
- Manually constructed plan (exposes the bug)
When the aliases are inlined into the Aggregate (no intermediate Projection), as optimizers or plan builders may produce:
Projection: __agg_0 AS max1(j1_id), __agg_1 AS max2(j1_id)
Limit: skip=0, fetch=20
Aggregate: groupBy=[[]], aggr=[[max(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS __agg_1]]
SubqueryAlias: bla
Projection: j1.j1_id AS j1_rename
TableScan: j1
Actual output:
SELECT max(bla.j1_rename) AS "max1(j1_id)", max(bla.j1_rename) AS "max2(j1_id)"
FROM (SELECT max(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS __agg_1
FROM (SELECT j1.j1_id AS j1_rename FROM j1) AS bla LIMIT 20)
The aggregate is emitted twice and the outer SELECT references bla.j1_rename which is out of scope.
- Same bug with Sort instead of Limit
Projection: __agg_0 AS max1(j1_id)
Sort: __agg_0 ASC
Aggregate: groupBy=[[]], aggr=[[max(bla.j1_rename) AS __agg_0]]
SubqueryAlias: bla
Produces the same spurious derived subquery pattern.
Expected behavior
The unparser should emit a single SELECT statement:
SELECT max(bla.j1_rename) AS "max1(j1_id)", max(bla.j1_rename) AS "max2(j1_id)"
FROM (SELECT j1.j1_id AS j1_rename FROM j1) AS bla
LIMIT 20
When the Projection claims an Aggregate by reaching through a Limit or Sort, those intermediate nodes should be folded into the same SELECT statement rather than triggering a derived
subquery.
Additional context
The fix is to detect in the Projection arm when reconstruct_select_statement found an Aggregate and the Projection's direct child is a Limit or Sort. In that case, fold the Limit/Sort
clauses (LIMIT/OFFSET/ORDER BY) into the current query and recurse into the Limit/Sort's child, skipping the node so it never reaches the already_projected guard.