Skip to content

Unparser emits aggregate twice when Projection reaches through Limit/Sort to claim Aggregate #21374

@yonatan-sevenai

Description

@yonatan-sevenai

Describe the bug

When the SQL unparser encounters a Projection → Limit → Aggregate (or Projection → Sort → Aggregate) plan shape where the Aggregate's aliases are inlined (no intermediate Projection between
Limit and Aggregate), it emits the aggregate expressions twice — once in the outer SELECT and once inside a spurious derived subquery.

This happens because:

  1. reconstruct_select_statement reaches through the Limit/Sort via find_agg_node_within_select to claim the Aggregate, setting the SELECT items and GROUP BY
  2. When recursion then hits the Limit/Sort arm, it sees already_projected == true and wraps everything below in a derived subquery
  3. The Aggregate ends up emitted in both the outer SELECT and inside the derived subquery

PR with fix coming momentarily

To Reproduce

  1. Parser-roundtrip query (works correctly because the parser inserts an intermediate Projection)
  SELECT __agg_0 AS "min(j1_id)", __agg_1 AS "max(j1_id)"
  FROM (
    SELECT min(j1_rename) AS __agg_0, max(j1_rename) AS __agg_1
    FROM (SELECT j1_id AS j1_rename FROM j1) AS bla
    LIMIT 20
  )

This roundtrips correctly because the parser produces a plan with a Projection between the Limit and
Aggregate:

  Projection: __agg_0 AS min(j1_id), __agg_1 AS max(j1_id)
    Limit: skip=0, fetch=20
      Projection: min(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS __agg_1  ← parser inserts this
        Aggregate: groupBy=[[]], aggr=[[min(bla.j1_rename), max(bla.j1_rename)]]
          SubqueryAlias: bla
  1. Manually constructed plan (exposes the bug)

When the aliases are inlined into the Aggregate (no intermediate Projection), as optimizers or plan builders may produce:

  Projection: __agg_0 AS max1(j1_id), __agg_1 AS max2(j1_id)
    Limit: skip=0, fetch=20
      Aggregate: groupBy=[[]], aggr=[[max(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS __agg_1]]
        SubqueryAlias: bla
          Projection: j1.j1_id AS j1_rename
            TableScan: j1

Actual output:

SELECT max(bla.j1_rename) AS "max1(j1_id)", max(bla.j1_rename) AS "max2(j1_id)"
 FROM (SELECT max(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS __agg_1
       FROM (SELECT j1.j1_id AS j1_rename FROM j1) AS bla LIMIT 20)

The aggregate is emitted twice and the outer SELECT references bla.j1_rename which is out of scope.

  1. Same bug with Sort instead of Limit
  Projection: __agg_0 AS max1(j1_id)
    Sort: __agg_0 ASC
      Aggregate: groupBy=[[]], aggr=[[max(bla.j1_rename) AS __agg_0]]
        SubqueryAlias: bla

Produces the same spurious derived subquery pattern.

Expected behavior

The unparser should emit a single SELECT statement:

SELECT max(bla.j1_rename) AS "max1(j1_id)", max(bla.j1_rename) AS "max2(j1_id)"
 FROM (SELECT j1.j1_id AS j1_rename FROM j1) AS bla
 LIMIT 20

When the Projection claims an Aggregate by reaching through a Limit or Sort, those intermediate nodes should be folded into the same SELECT statement rather than triggering a derived
subquery.

Additional context

The fix is to detect in the Projection arm when reconstruct_select_statement found an Aggregate and the Projection's direct child is a Limit or Sort. In that case, fold the Limit/Sort
clauses (LIMIT/OFFSET/ORDER BY) into the current query and recurse into the Limit/Sort's child, skipping the node so it never reaches the already_projected guard.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions