-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-34533][SQL] Eliminate LEFT ANTI join to empty relation in AQE #31641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @cloud-fan and @maropu to take a look if you have time, thanks. |
|
Test build #135454 has finished for PR 31641 at commit
|
|
Test build #135456 has finished for PR 31641 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #135457 has finished for PR 31641 at commit
|
|
|
||
| case j @ Join(_, _, LeftAnti, None, _) => | ||
| val isNonEmptyBroadcastedRightSide = j.right match { | ||
| case LogicalQueryStage(_, stage: BroadcastQueryStageExec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
case LogicalQueryStage(_, stage: QueryStageExec) =>
stage.getRuntimeStatistics.rowCount.exists(_ > 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - thanks, use row count is better, updated.
| * at the first place. | ||
| * | ||
| * 3. Join is left anti join without condition, and broadcasted join right side is not empty. | ||
| * This applies to broadcast nested loop join only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we care about the physical plan? I think it's a logical optimization that left anti join without condition can be turned into empty relation if right side is non-empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - I agree. Updated to remove this restriction. I was not thinking towards to use row count stats at the first place, so was checking the size of Array[InternalRow] for BroadcastNestedLoopJoinExec, so was having this restriction.
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #135479 has finished for PR 31641 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
| ("SELECT /*+ broadcast(testData) */ * FROM testData LEFT ANTI JOIN testData3", false) | ||
| ).foreach { case (query, isEliminated) => | ||
| val (plan, adaptivePlan) = runAdaptiveAndVerifyResult(query) | ||
| val bnlj = findTopLevelBroadcastNestedLoopJoin(plan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to create findTopLevelBroadcastNestedLoopJoin? We just want to prove that, the non-AQE plan has join and the AQE one has no join. findTopLevelBaseJoin should be good enough here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - agree, updated.
|
Test build #135481 has finished for PR 31641 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #135489 has finished for PR 31641 at commit
|
|
thanks, merging to master! |
|
Thanks for the update, @c21 ! |
|
Thank you @cloud-fan and @maropu for review! |
What changes were proposed in this pull request?
I discovered from review discussion - #31630 (comment) , that we can eliminate LEFT ANTI join (with no join condition) to empty relation, if the right side is known to be non-empty. So with AQE, this is doable similar to #29484 .
Why are the changes needed?
This can help eliminate the join operator during logical plan optimization.
Before this PR, left side physical plan
execute()will be called, so if left side is complicated (e.g. contain broadcast exchange operator), then some computation would happen. However after this PR, the join operator will be removed during logical plan, and nothing will be computed from left side. Potentially it can save resource for these kinds of query.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit tests for positive and negative queries in
AdaptiveQueryExecSuite.scala.