-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-38132][SQL] Remove NotPropagation rule
#35428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
For unblocking #35395 |
|
cc @sunchao too since he was on the original PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, @kazuyukitanimura .
- Instead of a pure removal, we need a new test case to prevent future regressions like this.
- If possible, make a new JIRA ID for this PR because the original one is 3 months ago already (although it's not released yet)
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay to remove it completely. I agree with @dongjoon-hyun that we might still need the test case to guard the regression.
|
Thank you @dongjoon-hyun @viirya for the feedback. Added NotInSubqueryEndToEndSuite.scala |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updates, @kazuyukitanimura .
sunchao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable. Maybe we can come back to revisit this when we find more practical use cases for this optimization.
|
|
||
| val t = "test_table" | ||
|
|
||
| test("SPARK-38132: Avoid Optimizing Not(InSubquery)") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Avoid Optimizing Not(InSubquery)" -> "Avoid optimizing Not IN subquery"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
|
|
||
| import org.apache.spark.sql.test.SharedSparkSession | ||
|
|
||
| class NotInSubqueryEndToEndSuite extends QueryTest with SharedSparkSession { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think we already have test suite e.g. SubquerySuite. Can we just put the test into existing test suite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
| assert(!nonDeterministicQueryPlan.deterministic) | ||
| } | ||
|
|
||
| test("SPARK-38132: Avoid optimizing Not IN subquery") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we give a more meaningful name for this test case?
For me, the test case looks like checking the correctness of the queries which is irrelevant to avoid optimizing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, renamed
NotPropagation rule
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @kazuyukitanimura , @viirya , @sunchao
Merged to master.
|
Thanks for pinging me. Just checking for my own understanding - so is this technically a revert of SPARK-36665? |
|
Late LGTM. It seems to me that this rule won't make a big deal to the overall performance, so it's better to make it very simple, otherwise it's not worthwhile. |
@HyukjinKwon It is actually not a full revert. SPARK-36665 introduced two classes(objects), and reverted only one of them |
What changes were proposed in this pull request?
This is a follow-up PR to mitigate the bug introduced by SPARK-36665. This PR removes
NotPropagationoptimization for now until we find a better approach.Why are the changes needed?
NotPropagationoptimization previously brokeRewritePredicateSubqueryso that it does not properly rewrite the predicate to a NULL-aware left anti join anymore.Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests