-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20636] Add new optimization rule to transpose adjacent Window expressions. #17899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
|
Test build #76591 has finished for PR 17899 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition might not be enough. w1 might depend on the outputs of w2, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are also changing the order of the columns. You will need to add a projection on top to be sure.
|
Test build #76678 has finished for PR 17899 at commit
|
|
Test build #76679 has finished for PR 17899 at commit
|
|
Test build #76680 has finished for PR 17899 at commit
|
f6a4e47 to
1ab81ca
Compare
|
Test build #76690 has finished for PR 17899 at commit
|
|
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably warrants a follow-up that tries to move projections that are wedged in between two window clauses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to check that the windows are independent, e.g.: w1.references.intersect(w2.windowOutputSet).isEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to get a little more milage out of the rule by using semanticEquals for comparing the partition expressions, e.g.:
def sliceSemanticEquals(ps1: Seq[Expression], ps2: Seq[Expression]): Boolean = ps1.zip(ps2).forall {
case (l, r) => l.semanticEquals(r)
}
...
sliceSemanticEquals(ps1, ps2)You could even get more leverage if you do not consider the order of the partition spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this test? It does not really add anything new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comparePlans(optimized, analyzed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure if we need this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'll remove it.
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptkool this looks pretty good. One thing need to be addressed though: we need to factor in that a parent window can depend on its child window.
|
Test build #76970 has finished for PR 17899 at commit
|
1ab81ca to
f472bfe
Compare
|
Test build #77126 has finished for PR 17899 at commit
|
|
@hvanhovell @gatorsmile Can you have another look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No test case covers the condition w1.references.intersect(w2.windowOutputSet).isEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expressions in both w1.expressions and w2.expressions must be deterministic. If not, we should not flip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? This seems overly restrictive to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to ensure the results are still the same with and without the rule.
f472bfe to
e2f24c2
Compare
|
Test build #78973 has finished for PR 17899 at commit
|
|
@ptkool Can you address the conflicts? We will review it. |
|
Test build #83189 has finished for PR 17899 at commit
|
|
Test build #83191 has finished for PR 17899 at commit
|
82d7390 to
f840c69
Compare
|
Test build #83193 has finished for PR 17899 at commit
|
|
Test build #83194 has finished for PR 17899 at commit
|
|
Test build #86355 has finished for PR 17899 at commit
|
|
Test build #95753 has finished for PR 17899 at commit
|
| object CollapseWindow extends Rule[LogicalPlan] { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { | ||
| case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild)) | ||
| if ps1 == ps2 && os1 == os2 && w1.references.intersect(w2.windowOutputSet).isEmpty && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't include style changes
|
Test build #95802 has finished for PR 17899 at commit
|
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks! Merged to master.
| } | ||
|
|
||
| /** | ||
| * Transpose Adjacent Window Expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this rule useful?
## What changes were proposed in this pull request? This PR is a follow-up of the PR apache#17899. It is to add the rule TransposeWindow the optimizer batch. ## How was this patch tested? The existing tests. Closes apache#23222 from gatorsmile/followupSPARK-20636. Authored-by: gatorsmile <[email protected]> Signed-off-by: gatorsmile <[email protected]>
## What changes were proposed in this pull request? This PR is a follow-up of the PR apache#17899. It is to add the rule TransposeWindow the optimizer batch. ## How was this patch tested? The existing tests. Closes apache#23222 from gatorsmile/followupSPARK-20636. Authored-by: gatorsmile <[email protected]> Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
Add new optimization rule to eliminate unnecessary shuffling by flipping adjacent Window expressions.
How was this patch tested?
Tested with unit tests, integration tests, and manual tests.