[SQL] Minor: Introduce SchemaRDD#aggregate() for simple aggregations #874
rdd.aggregate(Sum('val)) is just shorthand for
rdd.groupBy()(Sum('val)), but seems to be more natural than
doing a groupBy with no grouping expressions when you
really just want an aggregation over all rows.
Did not add a JavaSchemaRDD or Python API, as these seem to
be lacking several other methods like groupBy() already --
leaving that cleanup for future patches.
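To illustrate the delegation described above, here is a minimal sketch in plain Scala. It is a toy model, not the patch itself: `Row`, `Agg`, `Sum`, and `ToyTable` are hypothetical stand-ins for the Spark SQL classes, and columns are named with strings rather than the `'val` symbol syntax. It only shows that an aggregation over all rows can be expressed as a groupBy with no grouping expressions.

```scala
// Toy model only: Row, Agg, Sum, and ToyTable are hypothetical stand-ins,
// not Spark's SchemaRDD or Catalyst expression classes.
object AggregateSketch {
  type Row = Map[String, Int]

  // An aggregate expression reduces a group of rows to a single value.
  trait Agg { def eval(rows: Seq[Row]): Int }

  case class Sum(column: String) extends Agg {
    def eval(rows: Seq[Row]): Int = rows.map(r => r(column)).sum
  }

  case class ToyTable(rows: Seq[Row]) {
    // Group rows by the given key columns, then evaluate every aggregate per group.
    // With no key columns, all rows share the empty key and form a single group.
    def groupBy(keys: String*)(aggs: Agg*): List[List[Int]] =
      rows.groupBy(row => keys.map(k => row(k)).toList)
        .values
        .map(group => aggs.map(a => a.eval(group)).toList)
        .toList

    /** Aggregates over all rows. Equivalent to calling groupBy() with no keys. */
    def aggregate(aggs: Agg*): List[List[Int]] = groupBy()(aggs: _*)
  }

  def main(args: Array[String]): Unit = {
    val table = ToyTable(Seq(Map("val" -> 1), Map("val" -> 2), Map("val" -> 3)))
    // Both calls go through the same code path in this sketch.
    assert(table.aggregate(Sum("val")) == table.groupBy()(Sum("val")))
    println(table.aggregate(Sum("val")))
  }
}
```

In this sketch `aggregate` adds no logic of its own; it only gives the no-grouping case a more direct name, which is the point made in the description.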
This example doesn't compile with the \ in there.
Merged build triggered.
Merged build started.
Can you say in the scaladoc that this is equivalent to groupBy()(...) ?
LGTM other than the small addition to scaladoc.
Merged build finished. All automated tests passed.
All automated tests passed.
Added comment!
Merged build triggered.
Merged build started.
LGTM
Merged build finished. All automated tests passed.
All automated tests passed.
I've merged this into master & branch-1.0.
```scala
rdd.aggregate(Sum('val))
```
is just shorthand for
```scala
rdd.groupBy()(Sum('val))
```
but seems to be more natural than doing a groupBy with no grouping expressions when you really just want an aggregation over all rows.
Did not add a JavaSchemaRDD or Python API, as these seem to be lacking several other methods like groupBy() already -- leaving that cleanup for future patches.
Author: Aaron Davidson <[email protected]>
Closes #874 from aarondav/schemardd and squashes the following commits:
e9e68ee [Aaron Davidson] Add comment
db6afe2 [Aaron Davidson] Introduce SchemaRDD#aggregate() for simple aggregations
(cherry picked from commit c3576ff)
Signed-off-by: Reynold Xin <[email protected]>