Remove datafusion.optimizer.physical_uncorrelated_scalar_subquery config option #22566

@LiaCastaneda

Description

Is your feature request related to a problem or challenge?

We had to add the session property datafusion.optimizer.physical_uncorrelated_scalar_subquery in #22530 because that PR introduces ScalarSubqueryExec, a node that executes scalar subqueries and whose structure holds shared state.

More context can be found in the following comment: datafusion-contrib/datafusion-distributed#460 (comment)

If ScalarSubqueryExec and the node holding the ScalarSubqueryExpr have a network boundary between them (i.e. they are on different machines), the query fails during deserialization with the following error: Internal("ScalarSubqueryExpr can only be deserialized as part of a surrounding ScalarSubqueryExec"). Even if we edited the decoder to create a fresh ScalarSubqueryResults on the worker, I don’t think the result would be correct: the worker’s slots are backed by a different Arc from the coordinator’s, so the writer would never fill them.
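To illustrate the shared-state problem, here is a minimal sketch using only the standard library. The types `SubqueryResults`, `Writer`, and `Reader` are hypothetical stand-ins for the roles described above (they are not DataFusion's actual types or API), and the single `OnceLock` slot is an assumption made for brevity.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for the shared result slots (not DataFusion's real type).
#[derive(Default)]
struct SubqueryResults {
    slot: OnceLock<i64>,
}

/// Hypothetical writer, playing the role of the node that executes the subquery.
struct Writer {
    results: Arc<SubqueryResults>,
}

impl Writer {
    fn run(&self, value: i64) {
        // Fill the slot once; a second write is ignored.
        let _ = self.results.slot.set(value);
    }
}

/// Hypothetical reader, playing the role of the expression that consumes the result.
struct Reader {
    results: Arc<SubqueryResults>,
}

impl Reader {
    fn read(&self) -> Option<i64> {
        self.results.slot.get().copied()
    }
}

/// Single process: writer and reader point at the same allocation.
fn same_process(value: i64) -> Option<i64> {
    let shared = Arc::new(SubqueryResults::default());
    let writer = Writer { results: Arc::clone(&shared) };
    let reader = Reader { results: shared };
    writer.run(value);
    reader.read()
}

/// Network boundary: the decoder builds a fresh, unrelated allocation for the reader.
fn across_boundary(value: i64) -> Option<i64> {
    let writer = Writer { results: Arc::new(SubqueryResults::default()) };
    let reader = Reader { results: Arc::new(SubqueryResults::default()) };
    writer.run(value);
    // The writer filled its own slot, so the reader's slot is still empty.
    reader.read()
}

fn main() {
    println!("{:?}", same_process(42));
    println!("{:?}", across_boundary(42));
}
```

In the second case the reader never observes the value, which is the behaviour I would expect if the decoder simply created a fresh ScalarSubqueryResults on the worker.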

This affects both datafusion-distributed and Ballista (and, we suspect, any distributed query engine that ends up in the situation described above). Because the new node is a breaking change for them, we decided to gate its planning (in place of the previous LeftJoin) behind a config option.

Ideally, we want to keep this flag for a few more releases and then deprecate it, removing support for LeftJoin execution of scalar subqueries.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response
