@hvanhovell
Contributor

In #9409 we enabled multi-column counting. The approach taken in that PR adds some overhead: it first creates a row, only to check whether all of its columns are non-null.

This PR fixes that technical debt: Count now takes multiple columns as its input. To make this work, I have also added support for multiple columns in the single distinct code path.
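
For readers unfamiliar with the semantics involved, here is a minimal, Spark-free sketch of what counting over multiple columns means: a row contributes to the count only if every counted column is non-null, and the distinct variant counts unique non-null combinations. The object `MultiColumnCount`, the `Row` alias, and both helper methods are hypothetical names chosen for illustration; this is not the Catalyst `Count` expression from this patch, only a model of the behaviour under those assumptions.

```scala
// Hypothetical model of COUNT(a, b, ...) semantics; not Spark's implementation.
// A row is a plain Seq[Any], and null marks a missing value.
object MultiColumnCount {
  type Row = Seq[Any]

  // Counts rows in which every selected column is non-null.
  // Each column is checked directly; no intermediate row is built.
  def countAll(rows: Seq[Row], columns: Seq[Int]): Long =
    rows.count(row => columns.forall(i => row(i) != null)).toLong

  // Counts distinct combinations of the selected columns,
  // skipping rows where any selected column is null.
  def countDistinct(rows: Seq[Row], columns: Seq[Int]): Long =
    rows
      .map(row => columns.map(i => row(i)))
      .filter(values => !values.contains(null))
      .distinct
      .size
      .toLong
}
```

Checking each column directly in `countAll`, rather than packing the columns into a temporary row first, mirrors the kind of overhead this PR describes removing, though the real change operates on Catalyst expressions rather than Scala collections.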

cc @yhuai

@SparkQA

SparkQA commented Nov 27, 2015

Test build #46814 has finished for PR 10015 at commit 3fe9cf3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class Count(children: Seq[Expression]) extends DeclarativeAggregate

@yhuai
Contributor

yhuai commented Nov 29, 2015

LGTM. Merging to master and 1.6.

asfgit pushed a commit that referenced this pull request Nov 29, 2015
Author: Herman van Hovell <[email protected]>

Closes #10015 from hvanhovell/SPARK-12024.

(cherry picked from commit 3d28081)
Signed-off-by: Yin Huai <[email protected]>
@asfgit asfgit closed this in 3d28081 Nov 29, 2015