[SPARK-12978][SQL] Merge unnecessary partial aggregates #15945

maropu · 2016-11-20T15:43:40Z

What changes were proposed in this pull request?

This PR merges unnecessary partial aggregates when the inputs of the aggregates already satisfy the distribution requirement of these partial aggregates. It is a rework based on @cloud-fan's suggestion in #14909.
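
To illustrate the idea, here is a minimal sketch on a toy plan model, not Spark's actual classes. `Scan`, `Aggregate`, `satisfiesClustering`, and the use of a `Complete` mode for the merged node are all assumptions made for illustration; the rule in this PR operates on SparkPlan nodes and their required distributions.

```scala
// Toy model only: none of these types are Spark's classes.
sealed trait Mode
case object Partial extends Mode
case object Final extends Mode
case object Complete extends Mode

sealed trait Plan
// A leaf whose rows may already be hash-partitioned on some columns.
case class Scan(partitionedOn: Option[Seq[String]]) extends Plan
case class Aggregate(mode: Mode, keys: Seq[String], child: Plan) extends Plan

object MergePartialAggregateSketch {
  // Assumption: a plan partitioned on a non-empty subset of the grouping
  // keys already keeps every group inside a single partition.
  def satisfiesClustering(plan: Plan, keys: Seq[String]): Boolean = plan match {
    case Scan(Some(cols)) => cols.nonEmpty && cols.forall(keys.contains)
    case _ => false
  }

  // Collapse Final(Partial(child)) into one aggregate when no shuffle
  // is needed between the two steps; otherwise leave the pair alone.
  def merge(plan: Plan): Plan = plan match {
    case Aggregate(Final, k1, Aggregate(Partial, k2, child))
        if k1 == k2 && satisfiesClustering(child, k1) =>
      Aggregate(Complete, k1, merge(child))
    case Aggregate(m, k, child) => Aggregate(m, k, merge(child))
    case other => other
  }
}
```

With grouping key `a`, a Final-over-Partial pair on top of `Scan(Some(Seq("a")))` collapses into a single aggregate, while the same pair on top of `Scan(None)` is returned unchanged.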

How was this patch tested?

Added tests in PlannerSuite to check that these partial aggregates are removed by Catalyst.

SparkQA · 2016-11-20T17:50:16Z

Test build #68901 has finished for PR 15945 at commit 2633ced.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan)
- abstract class AggregateExec extends UnaryExecNode

maropu · 2016-11-21T00:36:34Z

@hvanhovell @cloud-fan I think this should target 2.2.0, so could you check this after 2.1 is cut? Thanks!

maropu · 2017-01-12T12:13:52Z

@hvanhovell @cloud-fan Could you check this? Thanks!

cloud-fan · 2017-01-12T12:40:51Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

How about we add a PhysicalOptimizer to do these things? Then we can simply write lazy val executedPlan: SparkPlan = physicalOptimizer.execute(sparkPlan) instead of lazy val executedPlan: SparkPlan = prepareForExecution(sparkPlan).

Aha, good idea. I'll try that.
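
The suggestion above amounts to running a fixed list of plan-to-plan rules through one executor object. A minimal sketch of that shape follows, using toy stand-ins rather than Spark's RuleExecutor or SparkPlan; `SimpleRuleExecutor`, `ToyPlan`, `DropNoOps`, `DedupAdjacent`, and `ToyPhysicalOptimizer` are hypothetical names invented for this example.

```scala
// Toy stand-ins, not Spark's RuleExecutor or SparkPlan.
trait Rule[T] {
  def name: String
  def apply(plan: T): T
}

abstract class SimpleRuleExecutor[T] {
  protected def rules: Seq[Rule[T]]
  // Apply each rule exactly once, in declaration order.
  def execute(plan: T): T = rules.foldLeft(plan)((p, r) => r(p))
}

// Hypothetical plan type: operator names listed from top to bottom.
final case class ToyPlan(ops: List[String])

object DropNoOps extends Rule[ToyPlan] {
  val name = "DropNoOps"
  def apply(plan: ToyPlan): ToyPlan = ToyPlan(plan.ops.filterNot(_ == "NoOp"))
}

object DedupAdjacent extends Rule[ToyPlan] {
  val name = "DedupAdjacent"
  // Keep only the first operator of each run of identical neighbours.
  def apply(plan: ToyPlan): ToyPlan = ToyPlan(
    plan.ops.foldRight(List.empty[String]) { (op, acc) =>
      if (acc.headOption.contains(op)) acc else op :: acc
    })
}

class ToyPhysicalOptimizer extends SimpleRuleExecutor[ToyPlan] {
  protected val rules: Seq[Rule[ToyPlan]] = Seq(DropNoOps, DedupAdjacent)
}
```

A caller then writes `new ToyPhysicalOptimizer().execute(plan)` and does not need to know which rules run or in what order.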

SparkQA · 2017-01-12T13:20:03Z

Test build #71262 has finished for PR 15945 at commit da25be6.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class PhysicalOptimizer(sparkSession: SparkSession) extends RuleExecutor[SparkPlan]
class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan)

SparkQA · 2017-01-12T13:45:22Z

Test build #71264 has finished for PR 15945 at commit 6bd225f.

This patch fails Scala style tests.
This patch does not merge cleanly.
This patch adds the following public classes (experimental):
class PhysicalOptimizer(sparkSession: SparkSession) extends RuleExecutor[SparkPlan]
class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan)

SparkQA · 2017-01-12T13:59:58Z

Test build #71265 has finished for PR 15945 at commit 30e7258.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class PhysicalOptimizer(sparkSession: SparkSession) extends RuleExecutor[SparkPlan]
class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan)

SparkQA · 2017-01-12T16:57:35Z

Test build #71267 has finished for PR 15945 at commit 96d0723.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class PhysicalOptimizer(sparkSession: SparkSession) extends RuleExecutor[SparkPlan]
class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan)

maropu · 2017-01-13T00:09:30Z

@cloud-fan How about this fix?

maropu · 2017-01-15T01:18:50Z

@cloud-fan ping

cloud-fan · 2017-01-20T10:16:11Z

sql/core/src/main/scala/org/apache/spark/sql/execution/PhysicalOptimizer.scala

Let's think about a better name; it does more than just optimization.

Oh, yeah. So, how about PhysicalPlanRewriter?

cloud-fan · 2017-01-20T10:16:37Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

Shall we put it in SessionState, like analyzer and optimizer?

Yeah, SGTM. I'll try to fix it.

SparkQA · 2017-01-20T15:50:08Z

Test build #71724 has finished for PR 15945 at commit 636d022.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-20T18:28:13Z

Test build #71727 has finished for PR 15945 at commit bea519f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-24T06:37:56Z

Test build #71917 has started for PR 15945 at commit bea519f.

maropu · 2017-01-24T08:45:50Z

Jenkins, retest this please.

SparkQA · 2017-01-24T11:15:40Z

Test build #71926 has finished for PR 15945 at commit bea519f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-04T03:15:20Z

Test build #72351 has finished for PR 15945 at commit c886d26.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-04T03:58:44Z

Test build #72355 has finished for PR 15945 at commit 8c6ab3e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-04T06:56:08Z

Test build #72357 has finished for PR 15945 at commit d333ca3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-02-04T07:22:13Z

@cloud-fan ping

cloud-fan · 2017-02-10T21:55:25Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

How about we return an anonymous RuleExecutor[SparkPlan] here? Then we don't need to bother with the name.

Okay, I'll try to fix it.

I think that if we use an anonymous class here, we cannot avoid duplicate rule entries in IncrementalExecution: 4f1240d#diff-13a3f1b22cd7c812e433f771d39eec97R103.
I'll keep looking for other approaches to avoid this, but I would appreciate any further suggestions.

cloud-fan · 2017-02-10T21:56:18Z

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregateExec.scala

Is it needed? UnaryExecNode already extends SparkPlan.

Oh, you're right, this is meaningless. I'll remove it.

cloud-fan · 2017-02-10T22:04:12Z

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/MergePartialAggregate.scala

why not outer.getClass == inner.getClass?
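
The code under review is not shown in this thread, so the following only sketches what the suggested comparison does: it checks that two nodes have the same runtime class, regardless of their fields. `AggNode`, `HashAgg`, `SortAgg`, and `sameOperator` are hypothetical names used for illustration.

```scala
// Toy node types, not Spark's aggregate operators.
sealed trait AggNode
final case class HashAgg(keys: Seq[String]) extends AggNode
final case class SortAgg(keys: Seq[String]) extends AggNode

object ClassCheckSketch {
  // True when both nodes are instances of the same concrete class;
  // the contents of the nodes are not compared.
  def sameOperator(outer: AggNode, inner: AggNode): Boolean =
    outer.getClass == inner.getClass
}
```

Here `sameOperator(HashAgg(Seq("a")), HashAgg(Seq("b")))` is true because only the class is compared, while a `HashAgg` paired with a `SortAgg` gives false.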

SparkQA · 2017-02-11T12:34:29Z

Test build #72738 has finished for PR 15945 at commit 149f277.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-11T15:15:27Z

Test build #72739 has finished for PR 15945 at commit 4f1240d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-15T00:49:02Z

Test build #74571 has finished for PR 15945 at commit 8e5d522.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-15T00:59:18Z

Test build #74573 has finished for PR 15945 at commit ea586cf.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-15T03:42:58Z

Test build #74577 has finished for PR 15945 at commit 870222e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-15T07:02:11Z

Test build #74580 has finished for PR 15945 at commit 11d2757.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2018-07-18T02:59:56Z

I'll close for now.

cloud-fan reviewed Jan 12, 2017

maropu force-pushed the SPARK-12978 branch from f3f072b to da25be6 Compare January 12, 2017 13:10

maropu force-pushed the SPARK-12978 branch from da25be6 to 6bd225f Compare January 12, 2017 13:41

maropu force-pushed the SPARK-12978 branch 2 times, most recently from 0a12a4f to 30e7258 Compare January 12, 2017 13:55

maropu force-pushed the SPARK-12978 branch from 30e7258 to 96d0723 Compare January 12, 2017 14:17

cloud-fan reviewed Jan 20, 2017

maropu force-pushed the SPARK-12978 branch from 636d022 to bea519f Compare January 20, 2017 15:55

maropu force-pushed the SPARK-12978 branch from bea519f to c886d26 Compare February 4, 2017 03:10

maropu force-pushed the SPARK-12978 branch from c886d26 to 8c6ab3e Compare February 4, 2017 03:48

maropu force-pushed the SPARK-12978 branch from 8c6ab3e to d333ca3 Compare February 4, 2017 04:21

cloud-fan reviewed Feb 10, 2017

maropu force-pushed the SPARK-12978 branch from 149f277 to 4f1240d Compare February 11, 2017 12:38

maropu and others added 5 commits March 15, 2017 09:16

Merge unnecessary partial aggregates

c226b2c

Add PhysicalOptimizer to apply rule sets for physical plans

216d1ba

Move QueryExecution#optimizer into SessionState

e14d00d

Apply review comments

676dde9

Rename a class

4149d6f

maropu force-pushed the SPARK-12978 branch 2 times, most recently from 8e5d522 to ea586cf Compare March 15, 2017 00:47

Define RuleExecutor as anonymous one

870222e

maropu force-pushed the SPARK-12978 branch from ea586cf to 870222e Compare March 15, 2017 01:19

Revert some code

11d2757

maropu force-pushed the SPARK-12978 branch from 8011a28 to 11d2757 Compare March 15, 2017 04:51

maropu closed this Jul 18, 2018

maropu mentioned this pull request Nov 19, 2020

[SPARK-33486][SQL] Collapse Partial and Final physical aggregation nodes together whenever possible #30426

Closed
