Skip to content

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented May 12, 2017

What changes were proposed in this pull request?

For aggregate function with PartialMerge or Final mode, the input is aggregate buffers instead of the actual children expressions. So the actual children expressions won't affect the result, we should normalize the expr id for them.

How was this patch tested?

a new regression test

@cloud-fan
Copy link
Contributor Author

cc @hvanhovell

@SparkQA
Copy link

SparkQA commented May 12, 2017

Test build #76871 has finished for PR 17964 at commit a2ebfda.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so this works because of the way we plan aggregates and I am totally fine with this.

I am slightly worried about non-complete aggregate expression that cannot be resolved and wreak havok further down the line because sameResult falsely evaluated to true. Can we special case non-complete aggregate expressions?

From an architectural point of view it might be better to add this as a normalize function to Expression.

@SparkQA
Copy link

SparkQA commented May 13, 2017

Test build #76891 has started for PR 17964 at commit 557298e.

@hvanhovell
Copy link
Contributor

LGTM


test("SPARK-20725: partial aggregate should behave correctly for sameResult") {
val df1 = spark.range(10).agg(sum($"id"))
val df2 = spark.range(10).agg(sum($"id"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    val df1 = spark.range(10).agg(sumDistinct($"id"))
    val df2 = spark.range(10).agg(sumDistinct($"id"))

They will not match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! The reason is, HashAggregateExec.requiredChildDistributionExpressions is a Option[Seq[Expression]], which is not treated as expressions of HashAggregateExec, and thus not touched by QueryPlan.mapExpressions.

I have fixed it in QueryPlan

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented May 13, 2017

Test build #76893 has finished for PR 17964 at commit 557298e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 13, 2017

Test build #76901 has finished for PR 17964 at commit 49da955.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

hvanhovell commented May 13, 2017

LGTM - merging to master/2.2.

asfgit pushed a commit that referenced this pull request May 13, 2017
…Result

## What changes were proposed in this pull request?

For aggregate function with `PartialMerge` or `Final` mode, the input is aggregate buffers instead of the actual children expressions. So the actual children expressions won't affect the result, we should normalize the expr id for them.

## How was this patch tested?

a new regression test

Author: Wenchen Fan <[email protected]>

Closes #17964 from cloud-fan/tmp.

(cherry picked from commit 1283c3d)
Signed-off-by: Herman van Hovell <[email protected]>
@hvanhovell
Copy link
Contributor

@cloud-fan can you backport this to 2.1?

@asfgit asfgit closed this in 1283c3d May 13, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
…Result

## What changes were proposed in this pull request?

For aggregate function with `PartialMerge` or `Final` mode, the input is aggregate buffers instead of the actual children expressions. So the actual children expressions won't affect the result, we should normalize the expr id for them.

## How was this patch tested?

a new regression test

Author: Wenchen Fan <[email protected]>

Closes apache#17964 from cloud-fan/tmp.
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
…Result

## What changes were proposed in this pull request?

For aggregate function with `PartialMerge` or `Final` mode, the input is aggregate buffers instead of the actual children expressions. So the actual children expressions won't affect the result, we should normalize the expr id for them.

## How was this patch tested?

a new regression test

Author: Wenchen Fan <[email protected]>

Closes apache#17964 from cloud-fan/tmp.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants