Skip to content

Conversation

@peter-toth
Copy link
Contributor

What changes were proposed in this pull request?

This PR reverts the previous fix #38052 and adds subquery reference tracking to MergeScalarSubqueries to restore previous functionality of merging independent nested subqueries.

Why are the changes needed?

Restore previous functionality but fix the bug discovered in https://issues.apache.org/jira/browse/SPARK-40618.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing and new UTs.

@github-actions github-actions bot added the SQL label Oct 4, 2022
@peter-toth peter-toth changed the title [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries with reference tracking [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking Oct 4, 2022
@peter-toth
Copy link
Contributor Author

cc @dtenedor, @gengliangwang, @sigmod

Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix, looks good to me. Thanks for fixing it!

@peter-toth
Copy link
Contributor Author

@gengliangwang, @cloud-fan could you please take a look at this fix?

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9287fc9 Oct 13, 2022
@peter-toth
Copy link
Contributor Author

Thanks @cloud-fan!

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…subqueries using reference tracking

### What changes were proposed in this pull request?
This PR reverts the previous fix apache#38052 and adds subquery reference tracking to `MergeScalarSubqueries` to restore previous functionality of merging independent nested subqueries.

### Why are the changes needed?
Restore previous functionality but fix the bug discovered in https://issues.apache.org/jira/browse/SPARK-40618.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing and new UTs.

Closes apache#38093 from peter-toth/SPARK-40618-fix-mergescalarsubqueries.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
peter-toth added a commit that referenced this pull request Nov 7, 2025
…ries` to `PlanMerger`

### What changes were proposed in this pull request?

This PR extracts the plan merging logic from `MergeScalarSubqueries` to `PlanMerger` so as to other rules can reuse it.

While the plan merging logic is extracted without modification to `PlanMerger`, `MergeScalarSubqueries` required a significant adjustment. This is because [SPARK-40618](https://issues.apache.org/jira/browse/SPARK-40618) / #38093 added subquery reference tracking so as to avoid trying to merge a subquery to any of its nested subqueries. This kind of reference trancking doesn't work well with a general `PlanMerger` so this PR modifies `MergeScalarSubqueries` to use a separate `PlanMerger`s by each subquery level.

### Why are the changes needed?

To be able to reuse plan merging logic.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

Yes, Claude gave me suggestions to improve documentation.

Closes #52835 from peter-toth/SPARK-54136-extract-plan-merging-logic.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
a0x8o added a commit to a0x8o/spark that referenced this pull request Nov 7, 2025
…ries` to `PlanMerger`

### What changes were proposed in this pull request?

This PR extracts the plan merging logic from `MergeScalarSubqueries` to `PlanMerger` so as to other rules can reuse it.

While the plan merging logic is extracted without modification to `PlanMerger`, `MergeScalarSubqueries` required a significant adjustment. This is because [SPARK-40618](https://issues.apache.org/jira/browse/SPARK-40618) / apache/spark#38093 added subquery reference tracking so as to avoid trying to merge a subquery to any of its nested subqueries. This kind of reference trancking doesn't work well with a general `PlanMerger` so this PR modifies `MergeScalarSubqueries` to use a separate `PlanMerger`s by each subquery level.

### Why are the changes needed?

To be able to reuse plan merging logic.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

Yes, Claude gave me suggestions to improve documentation.

Closes #52835 from peter-toth/SPARK-54136-extract-plan-merging-logic.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants