[SPARK-38918][SQL][3.2] Nested column pruning should filter out attributes that do not belong to the current relation #36386
@allisonwang-db Can you fix the test failure?

@viirya I tried a few times but can't reproduce this TPCDSV1_4_PlanStabilitySuite test failure locally...

@allisonwang-db Maybe you can just re-trigger the CI?

Hmm, from the logs, seems there are unmatched plans:

@allisonwang-db I can reproduce it locally by running
Seems cosmetic change only in the explain string?
Yes and there is no plan change
viirya left a comment
lgtm if CI can pass.
CI still doesn't pass:

Hmm let me try again

This test failure is strange. When I ran it individually (for q4 and q5), it failed but when I ran the entire suite together
dongjoon-hyun left a comment
Hi, @allisonwang-db. Could you rebase this PR once more, please?
Force-pushed: 15c9ccc to 9652212
@dongjoon-hyun Looks like it still has the same issue:
# Test passed
build/sbt "sql/testOnly *PlanStabilitySuite"
# Test failed for q4 and q5
build/sbt "sql/testOnly *PlanStabilitySuite -- -z (tpcds-v1.4/q5)"

Thank you for rechecking.

I am facing the same issues here: #36753

So the only change in the plan I can see that makes the test fail is that the last plan node has a source filename in it now, for example

Can #36828 help on stabilizing

+1 for @viirya's comment. @cloud-fan found the root cause and is fixing it in that PR.

We can revisit this PR after merging that PR.

@allisonwang-db #36828 is merged, can you rebase to trigger CI? Thanks.
… that do not belong to the current relation

### What changes were proposed in this pull request?
This PR updates `ProjectionOverSchema` to use the outputs of the data source relation to filter the attributes in the nested schema pruning. This is needed because the attributes in the schema do not necessarily belong to the current data source relation. For example, if a filter contains a correlated subquery, then the subquery's children can contain attributes from both the inner query and the outer query. Since the `RewriteSubquery` batch happens after early scan pushdown rules, nested schema pruning can wrongly use the inner query's attributes to prune the outer query data schema, thus causing wrong results and unexpected exceptions.

### Why are the changes needed?
To fix a bug in `SchemaPruning`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes apache#36216 from allisonwang-db/spark-38918-nested-column-pruning.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 150434b)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 793ba60)
Signed-off-by: allisonwang-db <[email protected]>
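
For readers unfamiliar with the bug, the idea in the commit message above can be illustrated with a toy model. This is only a sketch in plain Python, not Spark's `ProjectionOverSchema` or `SchemaPruning` code: the `Attribute` and `FieldAccess` classes, the `expr_id` field, and the `requested_fields` helper are all hypothetical names invented for illustration. It assumes that attributes are identified by a unique ID, so two columns with the same name from the outer and inner query can be told apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a column reference with a unique ID."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class FieldAccess:
    """Hypothetical stand-in for a nested field access such as person.name."""
    root: Attribute
    field: str


def requested_fields(accesses, relation_output):
    """Collect nested fields to keep, ignoring accesses whose root
    attribute is not produced by this relation."""
    output_ids = {attr.expr_id for attr in relation_output}
    pruned = {}
    for access in accesses:
        if access.root.expr_id not in output_ids:
            # Attribute belongs to another relation (e.g. the other side
            # of a correlated subquery), so it must not drive pruning here.
            continue
        pruned.setdefault(access.root.name, set()).add(access.field)
    return pruned


# Same column name on both sides of a correlated subquery, different IDs.
outer_person = Attribute("person", expr_id=1)
inner_person = Attribute("person", expr_id=2)
accesses = [
    FieldAccess(outer_person, "name"),
    FieldAccess(inner_person, "address"),
]

outer_fields = requested_fields(accesses, relation_output=[outer_person])
inner_fields = requested_fields(accesses, relation_output=[inner_person])
```

Without the ID check, matching by name alone would merge `name` and `address` into one pruned schema for both relations, which is the kind of cross-relation mix-up the commit message describes.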
Force-pushed: 9652212 to 83dba72
dongjoon-hyun left a comment
+1, LGTM.
viirya left a comment
Pending CI
…butes that do not belong to the current relation

### What changes were proposed in this pull request?
Backport #36216 to branch-3.2

### Why are the changes needed?
To fix a bug in `SchemaPruning`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #36386 from allisonwang-db/spark-38918-branch-3.2.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

Merged to branch-3.2. Thank you so much, @allisonwang-db, @viirya.
What changes were proposed in this pull request?
Backport #36216 to branch-3.2
Why are the changes needed?
To fix a bug in `SchemaPruning`.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test