Skip to content

Conversation

@tanelk
Copy link
Contributor

@tanelk tanelk commented Mar 21, 2021

What changes were proposed in this pull request?

Addressed the @dongjoon-hyun comments on the previous PR #30018.
Extended the RemoveRedundantAggregates rule to remove redundant aggregations in even more queries. For example in

dataset
   .dropDuplicates()
   .groupBy('a)
   .agg(max('b))

the dropDuplicates is not needed, because the result on max does not depend on duplicate values.

Why are the changes needed?

Improve performance.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@github-actions github-actions bot added the SQL label Mar 21, 2021
@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40890/

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40890/

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Test build #136308 has finished for PR 31914 at commit 19748dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @tanelk .
Since the commit title is precious resource, please don't repeat the original JIRA title. The JIRA ID is enough for that purpose. It would be great if you give a more meaningful and specific PR title.

@tanelk tanelk changed the title [SPARK-33122][SQL][FOLLOWUP] Remove redundant aggregates in the Optimzier [SPARK-33122][SQL][FOLLOWUP] Extend RemoveRedundantAggregates optimizer rule to apply to more cases Mar 22, 2021
@tanelk
Copy link
Contributor Author

tanelk commented May 24, 2021

@maropu , this is a followup to a PR you reviewed a while back, but it has gone unnoticed.

Comment on lines +53 to +58
val upperHasNoDuplicateSensitiveAgg = upper
.aggregateExpressions
.forall(expr => expr.find {
case ae: AggregateExpression => !EliminateDistinct.isDuplicateAgnostic(ae.aggregateFunction)
case e => AggregateExpression.isAggregate(e)
}.isEmpty)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only behaviour change

@SparkQA
Copy link

SparkQA commented May 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43395/

@SparkQA
Copy link

SparkQA commented May 24, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43395/

@SparkQA
Copy link

SparkQA commented May 24, 2021

Test build #138873 has finished for PR 31914 at commit 3fe6985.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented May 25, 2021

I've checked the GA tests passed, so I will merge this. Thank you, @tanelk

@maropu maropu closed this in 548e37b May 25, 2021
@maropu
Copy link
Member

maropu commented May 25, 2021

Merged to master.

@tanelk tanelk deleted the SPARK-33122_redundant_aggs_followup branch June 15, 2021 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants