@kazuyukitanimura
Contributor

What changes were proposed in this pull request?

This is a follow-up PR to mitigate the bug introduced by SPARK-36665. It removes the NotPropagation optimization for now, until we find a better approach.

Why are the changes needed?

The NotPropagation optimization broke RewritePredicateSubquery: with it in place, the predicate is no longer properly rewritten to a NULL-aware left anti join.
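
For context, here is a minimal sketch of why a NOT IN subquery needs a NULL-aware anti join rather than a plain one. It models SQL three-valued logic in plain Scala and does not use Spark; `Option[Boolean]` stands in for a SQL boolean with `None` as NULL. The names `NotInSemantics`, `sqlEq`, `sqlIn`, `sqlNot`, `notInFilter`, and `plainAntiJoin` are illustrative assumptions and are not Spark APIs.

```scala
// Illustrative model of SQL NULL semantics; none of these names exist in Spark.
object NotInSemantics {
  // A SQL boolean: Some(true), Some(false), or None for NULL.
  type SqlBool = Option[Boolean]

  // x = y under SQL semantics: NULL if either side is NULL.
  def sqlEq(x: Option[Int], y: Option[Int]): SqlBool =
    for (a <- x; b <- y) yield a == b

  // x IN (values): TRUE if any comparison is TRUE,
  // otherwise NULL if any comparison is NULL, otherwise FALSE.
  def sqlIn(x: Option[Int], values: Seq[Option[Int]]): SqlBool = {
    val results = values.map(sqlEq(x, _))
    if (results.contains(Some(true))) Some(true)
    else if (results.contains(None)) None
    else Some(false)
  }

  // NOT under SQL semantics: NOT NULL is still NULL.
  def sqlNot(b: SqlBool): SqlBool = b.map(!_)

  // WHERE x NOT IN (sub): a row survives only if the predicate is TRUE.
  def notInFilter(rows: Seq[Option[Int]], sub: Seq[Option[Int]]): Seq[Option[Int]] =
    rows.filter(r => sqlNot(sqlIn(r, sub)).contains(true))

  // A plain (not NULL-aware) anti join: drop a row only if some equality is TRUE.
  def plainAntiJoin(rows: Seq[Option[Int]], sub: Seq[Option[Int]]): Seq[Option[Int]] =
    rows.filterNot(r => sub.exists(s => sqlEq(r, s).contains(true)))
}
```

With rows `1, 2, NULL` and a subquery result of `1, NULL`, `notInFilter` returns no rows, while `plainAntiJoin` keeps `2` and `NULL`. That gap is the kind of wrong result a regression test for this PR would need to catch.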

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

@kazuyukitanimura
Contributor Author

For unblocking #35395

@dongjoon-hyun
Member

cc @sunchao too since he was on the original PR.

Member

@dongjoon-hyun left a comment

BTW, @kazuyukitanimura .

  • Instead of a pure removal, we need a new test case to prevent future regressions like this.
  • If possible, create a new JIRA ID for this PR, because the original one is already 3 months old (although it has not been released yet).

Member

@viirya left a comment

It's okay to remove it completely. I agree with @dongjoon-hyun that we might still need the test case to guard against the regression.

@kazuyukitanimura changed the title from "[SPARK-36665][SQL][FOLLOWUP] Remove NotPropagation" to "[SPARK-38132][SPARK-36665][SQL] Remove NotPropagation" on Feb 7, 2022
@kazuyukitanimura
Contributor Author

Thank you @dongjoon-hyun @viirya for the feedback. Added NotInSubqueryEndToEndSuite.scala

@dongjoon-hyun changed the title from "[SPARK-38132][SPARK-36665][SQL] Remove NotPropagation" to "[SPARK-38132][SQL] Remove NotPropagation" on Feb 7, 2022
Member

@dongjoon-hyun left a comment

Thank you for the updates, @kazuyukitanimura.

Member

@sunchao left a comment

Looks reasonable. Maybe we can come back to revisit this when we find more practical use cases for this optimization.


val t = "test_table"

test("SPARK-38132: Avoid Optimizing Not(InSubquery)") {
Member

nit: "Avoid Optimizing Not(InSubquery)" -> "Avoid optimizing Not IN subquery"?

Contributor Author

updated


import org.apache.spark.sql.test.SharedSparkSession

class NotInSubqueryEndToEndSuite extends QueryTest with SharedSparkSession {
Member

Hmm, I think we already have a test suite for this, e.g. SubquerySuite. Can we just put the test into the existing test suite?

Contributor Author

moved

assert(!nonDeterministicQueryPlan.deterministic)
}

test("SPARK-38132: Avoid optimizing Not IN subquery") {
Member

@dongjoon-hyun Feb 7, 2022

Shall we give this test case a more meaningful name?
To me, the test case looks like it checks the correctness of the queries, which is unrelated to avoiding an optimization.

Contributor Author

Thanks, renamed

@dongjoon-hyun changed the title from "[SPARK-38132][SQL] Remove NotPropagation" to "[SPARK-38132][SQL] Remove NotPropagation rule" on Feb 8, 2022
Member

@dongjoon-hyun left a comment

+1, LGTM. Thank you, @kazuyukitanimura , @viirya , @sunchao
Merged to master.

@HyukjinKwon
Member

Thanks for pinging me. Just checking for my own understanding - so is this technically a revert of SPARK-36665?

@cloud-fan
Contributor

Late LGTM. It seems to me that this rule won't make a big difference to overall performance, so it's better to keep it very simple; otherwise it's not worthwhile.

@kazuyukitanimura
Contributor Author

> Thanks for pinging me. Just checking for my own understanding - so is this technically a revert of SPARK-36665?

@HyukjinKwon It is actually not a full revert. SPARK-36665 introduced two classes (objects), and this PR reverted only one of them.
