-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-12978][SQL] Merge unnecessary partial aggregates #15945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #68901 has finished for PR 15945 at commit
|
|
@hvanhovell @cloud-fan I think this target might be 2.2.0, so could you check this after 2.1 is cut. Thanks! |
|
@hvanhovell @cloud-fan Could you check this? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we add a PhysicalOptimizer to do these things? then we can simply write lazy val executedPlan: SparkPlan = physicalOptimizer.execute(sparkPlan)) instead of lazy val executedPlan: SparkPlan = prepareForExecution(sparkPlan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, good idea, so I try to do so.
|
Test build #71262 has finished for PR 15945 at commit
|
|
Test build #71264 has finished for PR 15945 at commit
|
0a12a4f to
30e7258
Compare
|
Test build #71265 has finished for PR 15945 at commit
|
|
Test build #71267 has finished for PR 15945 at commit
|
|
@cloud-fan How about this fix? |
|
@cloud-fan ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's think about a better name, it does more than only optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yea. So, how about PhysicalPlanRewriter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we put it in SessionState like analyzer and optimizer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, SGTM. I'll try to fix
|
Test build #71724 has finished for PR 15945 at commit
|
|
Test build #71727 has finished for PR 15945 at commit
|
|
Test build #71917 has started for PR 15945 at commit |
|
Jenkins, retest this please. |
|
Test build #71926 has finished for PR 15945 at commit
|
|
Test build #72351 has finished for PR 15945 at commit
|
|
Test build #72355 has finished for PR 15945 at commit
|
|
Test build #72357 has finished for PR 15945 at commit
|
|
@cloud-fan ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we return an anonymous RuleExecutor[SparkPlan] here? then we don't need to bother the name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I'll try to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, if we use anonymous classes here, it seems we cannot avoid duplicate rule entries in IncrementalExecution: 4f1240d#diff-13a3f1b22cd7c812e433f771d39eec97R103.
I keep looking for other approaches to avoid this though, I would appreciate your more suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it needed? UnaryExecNode already extends SparkPlan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, you're right and this is meaningless. I'll remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not outer.getClass == inner.getClass?
|
Test build #72738 has finished for PR 15945 at commit
|
|
Test build #72739 has finished for PR 15945 at commit
|
8e5d522 to
ea586cf
Compare
|
Test build #74571 has finished for PR 15945 at commit
|
|
Test build #74573 has finished for PR 15945 at commit
|
|
Test build #74577 has finished for PR 15945 at commit
|
|
Test build #74580 has finished for PR 15945 at commit
|
|
I'll close for now. |
What changes were proposed in this pull request?
This pr is to merge unnecessary partial aggregates if the inputs of aggregates satisfy the distribution requirement of these partial aggregates. This pr is rework based on the @cloud-fan 's suggestion in #14909.
How was this patch tested?
Add tests in
PlannerSuiteto check if these partial aggregates are removed by catalyst.