[SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking #38093
Closed
peter-toth wants to merge 5 commits into apache:master from peter-toth:SPARK-40618-fix-mergescalarsubqueries
… nested subqueries" This reverts commit 9ac9cd5.
dtenedor (Contributor) approved these changes on Oct 4, 2022 and left a comment:
Nice fix, looks good to me. Thanks for fixing it!
Contributor
Author
@gengliangwang, @cloud-fan could you please take a look at this fix?
Contributor
thanks, merging to master!
Contributor
Author
Thanks @cloud-fan!
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request on Dec 12, 2022

…subqueries using reference tracking

### What changes were proposed in this pull request?
This PR reverts the previous fix apache#38052 and adds subquery reference tracking to `MergeScalarSubqueries` to restore previous functionality of merging independent nested subqueries.

### Why are the changes needed?
Restore previous functionality but fix the bug discovered in https://issues.apache.org/jira/browse/SPARK-40618.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing and new UTs.

Closes apache#38093 from peter-toth/SPARK-40618-fix-mergescalarsubqueries.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
peter-toth added a commit that referenced this pull request on Nov 7, 2025

…ries` to `PlanMerger`

### What changes were proposed in this pull request?
This PR extracts the plan merging logic from `MergeScalarSubqueries` to `PlanMerger` so that other rules can reuse it.

While the plan merging logic is extracted to `PlanMerger` without modification, `MergeScalarSubqueries` required a significant adjustment. This is because [SPARK-40618](https://issues.apache.org/jira/browse/SPARK-40618) / #38093 added subquery reference tracking to avoid trying to merge a subquery into any of its nested subqueries. This kind of reference tracking doesn't work well with a general `PlanMerger`, so this PR modifies `MergeScalarSubqueries` to use a separate `PlanMerger` for each subquery level.

### Why are the changes needed?
To be able to reuse plan merging logic.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?
Yes, Claude gave me suggestions to improve documentation.

Closes #52835 from peter-toth/SPARK-54136-extract-plan-merging-logic.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
a0x8o added a commit to a0x8o/spark that referenced this pull request on Nov 7, 2025

…ries` to `PlanMerger`

### What changes were proposed in this pull request?
This PR extracts the plan merging logic from `MergeScalarSubqueries` to `PlanMerger` so that other rules can reuse it.

While the plan merging logic is extracted to `PlanMerger` without modification, `MergeScalarSubqueries` required a significant adjustment. This is because [SPARK-40618](https://issues.apache.org/jira/browse/SPARK-40618) / apache/spark#38093 added subquery reference tracking to avoid trying to merge a subquery into any of its nested subqueries. This kind of reference tracking doesn't work well with a general `PlanMerger`, so this PR modifies `MergeScalarSubqueries` to use a separate `PlanMerger` for each subquery level.

### Why are the changes needed?
To be able to reuse plan merging logic.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?
Yes, Claude gave me suggestions to improve documentation.

Closes #52835 from peter-toth/SPARK-54136-extract-plan-merging-logic.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
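
For readers unfamiliar with the "separate `PlanMerger` for each subquery level" idea from the commit message above, here is a toy Python sketch. It is not Spark code. `ToyPlanMerger`, `level_of`, and `merge_by_level` are hypothetical names, "same table" stands in for Spark's real plan-compatibility check, and the definition of a level (0 for a subquery with no nested subqueries, otherwise one more than its deepest nested subquery) is an assumption made only for this illustration.

```python
from collections import defaultdict


class ToyPlanMerger:
    """Hypothetical stand-in for a plan merger: merges plans scanning the same table."""

    def __init__(self):
        self.plans = []  # list of (table, [output expressions])

    def merge(self, table, output):
        """Return (merged plan index, output index) for the given toy plan."""
        for idx, (cached_table, outputs) in enumerate(self.plans):
            if cached_table == table:
                if output not in outputs:
                    outputs.append(output)
                return idx, outputs.index(output)
        self.plans.append((table, [output]))
        return len(self.plans) - 1, 0


def level_of(name, nested):
    """Assumed level: 0 without nested subqueries, else 1 + the deepest nested level."""
    children = nested.get(name, [])
    return 0 if not children else 1 + max(level_of(c, nested) for c in children)


def merge_by_level(subqueries, nested):
    """subqueries: list of (name, table, output); nested: name -> nested subquery names."""
    mergers = defaultdict(ToyPlanMerger)  # one independent merger per level
    placement = {}
    for name, table, output in subqueries:
        level = level_of(name, nested)
        placement[name] = (level,) + mergers[level].merge(table, output)
    return placement


# s3 contains s1 as a nested subquery, so it lives on a different level.
placement = merge_by_level(
    [("s1", "t", "sum(a)"), ("s2", "t", "avg(a)"), ("s3", "t", "max(a)")],
    nested={"s3": ["s1"]},
)
```

In this sketch a subquery and its nested subqueries always fall on different levels, so they are never offered to the same merger, while independent subqueries on the same level (`s1` and `s2`) can still share one merged plan.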
### What changes were proposed in this pull request?
This PR reverts the previous fix #38052 and adds subquery reference tracking to `MergeScalarSubqueries` to restore previous functionality of merging independent nested subqueries.

### Why are the changes needed?
Restore previous functionality but fix the bug discovered in https://issues.apache.org/jira/browse/SPARK-40618.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing and new UTs.
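
To illustrate what subquery reference tracking means here, the following is a minimal Python sketch, not the Scala code in this PR. `CachedPlan`, `transitive_refs`, and `cache_subquery` are hypothetical names, and comparing table names is an assumed stand-in for the rule's real plan comparison. The sketch only shows the idea stated in the description: a subquery is never merged into a cached plan that it references, directly or through other nested subqueries, while independent subqueries can still be merged.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CachedPlan:
    table: str                 # toy mergeability key (assumption, not Spark's check)
    outputs: List[str]         # expressions computed by the merged plan
    references: Set[int] = field(default_factory=set)  # cache indices used directly


def transitive_refs(cache, direct):
    """All cache indices reachable from the directly referenced ones."""
    seen, stack = set(), list(direct)
    while stack:
        idx = stack.pop()
        if idx not in seen:
            seen.add(idx)
            stack.extend(cache[idx].references)
    return seen


def cache_subquery(cache, table, output, nested=()):
    """Merge a toy subquery into the cache; return (plan index, output index)."""
    refs = transitive_refs(cache, nested)
    for idx, cached in enumerate(cache):
        if idx in refs:
            continue  # never merge a subquery into a plan it depends on
        if cached.table == table:
            if output not in cached.outputs:
                cached.outputs.append(output)
            cached.references |= set(nested)
            return idx, cached.outputs.index(output)
    cache.append(CachedPlan(table, [output], set(nested)))
    return len(cache) - 1, 0


cache = []
first = cache_subquery(cache, "t", "sum(a)")               # new plan 0
second = cache_subquery(cache, "t", "avg(a)")              # independent: merged into plan 0
third = cache_subquery(cache, "t", "max(a)", nested=[0])   # references plan 0: kept separate
```

Here `third` scans the same table as plan 0 but contains a reference to it, so the sketch gives it its own cache entry instead of merging.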