[SPARK-33122][SQL][FOLLOWUP] Extend RemoveRedundantAggregates optimizer rule to apply to more cases #31914

tanelk · 2021-03-21T12:36:42Z

What changes were proposed in this pull request?

Addressed @dongjoon-hyun's comments on the previous PR #30018.
Extended the RemoveRedundantAggregates rule to remove redundant aggregations in even more queries. For example, in

dataset
   .dropDuplicates()
   .groupBy('a)
   .agg(max('b))

the dropDuplicates is not needed, because the result of max does not depend on duplicate values.
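To make the claim concrete, here is a minimal sketch using plain Scala collections rather than Spark. The sample rows, the object name and the `aggregateBy` helper are all hypothetical and exist only for illustration. It shows that deduplicating before a max leaves the result unchanged, while the same is not true for a sum.

```scala
object DuplicateAgnosticSketch {
  // Hypothetical sample rows (a, b); the second row is an exact duplicate of the first.
  val rows: Seq[(String, Int)] = Seq(("x", 1), ("x", 1), ("x", 3), ("y", 2))

  // Group by the first column and apply an aggregate to the values of the second.
  def aggregateBy(data: Seq[(String, Int)])(agg: Seq[Int] => Int): Map[String, Int] =
    data.groupBy(_._1).map { case (k, vs) => k -> agg(vs.map(_._2)) }

  // max ignores duplicates, so removing them first changes nothing.
  val maxWithDedup: Map[String, Int] = aggregateBy(rows.distinct)(_.max)
  val maxWithoutDedup: Map[String, Int] = aggregateBy(rows)(_.max)

  // sum counts every row, so removing duplicates first changes the result.
  val sumWithDedup: Map[String, Int] = aggregateBy(rows.distinct)(_.sum)
  val sumWithoutDedup: Map[String, Int] = aggregateBy(rows)(_.sum)
}
```

This is why the rule can only drop the lower aggregation when every aggregate function above it is insensitive to duplicates.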

Why are the changes needed?

Improve performance: the optimized plan no longer computes the redundant aggregation.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

SparkQA · 2021-03-21T13:52:40Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40890/

SparkQA · 2021-03-21T13:57:55Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40890/

SparkQA · 2021-03-21T17:40:54Z

Test build #136308 has finished for PR 31914 at commit 19748dc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

Hi, @tanelk.
Since the commit title is a precious resource, please don't repeat the original JIRA title. The JIRA ID is enough for that purpose. It would be great if you could give the PR a more meaningful and specific title.

tanelk · 2021-05-24T15:25:27Z

@maropu, this is a follow-up to a PR you reviewed a while back, but it has gone unnoticed.

tanelk · 2021-05-24T15:26:26Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala

+    val upperHasNoDuplicateSensitiveAgg = upper
+      .aggregateExpressions
+      .forall(expr => expr.find {
+        case ae: AggregateExpression => !EliminateDistinct.isDuplicateAgnostic(ae.aggregateFunction)
+        case e => AggregateExpression.isAggregate(e)
+      }.isEmpty)

This is the only behaviour change.
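The shape of that check can be sketched on a toy expression tree. Everything below is a hypothetical simplification: the `Expr`, `Attr`, `Add` and `Agg` types, the `duplicateAgnostic` set and the `find` helper are stand-ins for illustration, not Spark's actual classes or the body of `EliminateDistinct.isDuplicateAgnostic`.

```scala
object UpperAggCheckSketch {
  // Hypothetical, simplified expression tree; Spark's real classes are richer.
  sealed trait Expr { def children: Seq[Expr] }
  case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
  case class Add(left: Expr, right: Expr) extends Expr { def children: Seq[Expr] = Seq(left, right) }
  case class Agg(function: String, child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }

  // Assumed set of aggregate functions whose result ignores duplicate inputs.
  val duplicateAgnostic: Set[String] = Set("max", "min")

  // Pre-order search for the first node matching the predicate.
  def find(e: Expr)(p: Expr => Boolean): Option[Expr] =
    if (p(e)) Some(e)
    else e.children.iterator.map(c => find(c)(p)).collectFirst { case Some(x) => x }

  // True when no expression contains a duplicate-sensitive aggregate anywhere in its tree.
  def hasNoDuplicateSensitiveAgg(exprs: Seq[Expr]): Boolean =
    exprs.forall(expr => find(expr) {
      case Agg(fn, _) => !duplicateAgnostic.contains(fn)
      case _ => false
    }.isEmpty)
}
```

In this sketch, `Seq(Attr("a"), Agg("max", Attr("b")))` passes the check, while an expression that nests `Agg("sum", ...)` inside an `Add` fails it, because the search descends into sub-expressions.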

SparkQA · 2021-05-24T16:09:41Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43395/

SparkQA · 2021-05-24T16:42:43Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43395/

SparkQA · 2021-05-24T19:58:12Z

Test build #138873 has finished for PR 31914 at commit 3fe6985.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2021-05-25T01:03:43Z

I've checked that the GA tests passed, so I will merge this. Thank you, @tanelk.

maropu · 2021-05-25T01:04:50Z

Merged to master.

tanelk added 2 commits March 21, 2021 11:03

Address comments

fba05c7

Allow duplicate angostic aggs

19748dc

github-actions bot added the SQL label Mar 21, 2021

dongjoon-hyun reviewed Mar 21, 2021

tanelk changed the title ~~[SPARK-33122][SQL][FOLLOWUP] Remove redundant aggregates in the Optimzier~~ [SPARK-33122][SQL][FOLLOWUP] Extend RemoveRedundantAggregates optimizer rule to apply to more cases Mar 22, 2021

tanelk added 2 commits May 24, 2021 17:24

Merge remote-tracking branch 'upstream/master' into SPARK-33122_redundant_aggs_followup

1a9f082

Merge master

3fe6985

tanelk commented May 24, 2021

maropu approved these changes May 25, 2021

maropu closed this in 548e37b May 25, 2021

tanelk deleted the SPARK-33122_redundant_aggs_followup branch June 15, 2021 13:03
