Skip to content

Conversation

@c21
Copy link
Contributor

@c21 c21 commented Feb 25, 2021

What changes were proposed in this pull request?

I discovered from review discussion - #31630 (comment) , that we can eliminate LEFT ANTI join (with no join condition) to empty relation, if the right side is known to be non-empty. So with AQE, this is doable similar to #29484 .

Why are the changes needed?

This can help eliminate the join operator during logical plan optimization.
Before this PR, left side physical plan execute() will be called, so if left side is complicated (e.g. contain broadcast exchange operator), then some computation would happen. However after this PR, the join operator will be removed during logical plan, and nothing will be computed from left side. Potentially it can save resource for these kinds of query.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests for positive and negative queries in AdaptiveQueryExecSuite.scala.

@github-actions github-actions bot added the SQL label Feb 25, 2021
@c21
Copy link
Contributor Author

c21 commented Feb 25, 2021

cc @cloud-fan and @maropu to take a look if you have time, thanks.

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Test build #135454 has finished for PR 31641 at commit ead65c8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Test build #135456 has finished for PR 31641 at commit e9e7b16.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40036/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40034/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40036/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40034/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40037/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40037/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Test build #135457 has finished for PR 31641 at commit 255ee07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


case j @ Join(_, _, LeftAnti, None, _) =>
val isNonEmptyBroadcastedRightSide = j.right match {
case LogicalQueryStage(_, stage: BroadcastQueryStageExec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

case LogicalQueryStage(_, stage: QueryStageExec) =>
  stage.getRuntimeStatistics.rowCount.exists(_ > 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - thanks, use row count is better, updated.

* at the first place.
*
* 3. Join is left anti join without condition, and broadcasted join right side is not empty.
* This applies to broadcast nested loop join only.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we care about the physical plan? I think it's a logical optimization that left anti join without condition can be turned into empty relation if right side is non-empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - I agree. Updated to remove this restriction. I was not thinking towards to use row count stats at the first place, so was checking the size of Array[InternalRow] for BroadcastNestedLoopJoinExec, so was having this restriction.

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40059/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40059/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Test build #135479 has finished for PR 31641 at commit 8e5f35d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@c21
Copy link
Contributor Author

c21 commented Feb 25, 2021

retest this please

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40061/

@SparkQA
Copy link

SparkQA commented Feb 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40061/

("SELECT /*+ broadcast(testData) */ * FROM testData LEFT ANTI JOIN testData3", false)
).foreach { case (query, isEliminated) =>
val (plan, adaptivePlan) = runAdaptiveAndVerifyResult(query)
val bnlj = findTopLevelBroadcastNestedLoopJoin(plan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to create findTopLevelBroadcastNestedLoopJoin? We just want to prove that, the non-AQE plan has join and the AQE one has no join. findTopLevelBaseJoin should be good enough here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - agree, updated.

@SparkQA
Copy link

SparkQA commented Feb 26, 2021

Test build #135481 has finished for PR 31641 at commit 8e5f35d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40070/

@SparkQA
Copy link

SparkQA commented Feb 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40070/

@SparkQA
Copy link

SparkQA commented Feb 26, 2021

Test build #135489 has finished for PR 31641 at commit 934306e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 7d5021f Feb 26, 2021
@maropu
Copy link
Member

maropu commented Feb 26, 2021

Thanks for the update, @c21 !

@c21
Copy link
Contributor Author

c21 commented Feb 26, 2021

Thank you @cloud-fan and @maropu for review!

@c21 c21 deleted the left-anti-aqe branch February 26, 2021 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants