@aarondav
Contributor

```scala
rdd.aggregate(Sum('val))
```

is just shorthand for

```scala
rdd.groupBy()(Sum('val))
```

but seems to be more natural than calling groupBy() with no grouping expressions when you really just want an aggregation over all rows.

Did not add a JavaSchemaRDD or Python API, as these seem to be lacking several other methods like groupBy() already -- leaving that cleanup for future patches.
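A minimal, self-contained sketch of why the two calls agree. It does not use Spark. `Table`, `Agg`, `Sum`, `Row` and the string column names are hypothetical stand-ins for `SchemaRDD`, Catalyst aggregate expressions and `'val`, chosen only so the snippet runs on its own. The point it illustrates: with no grouping keys, every row maps to the same empty key, so all rows fall into a single group, which is exactly an aggregation over all rows.

```scala
// Hypothetical in-memory stand-in for SchemaRDD; NOT Spark's API.
// A row is a map from column name to an Int value.
type Row = Map[String, Int]

// Hypothetical aggregate "expression": output column name plus a reduction over one group of rows.
final case class Agg(name: String, f: Seq[Row] => Int)

// Plays the role of Sum('val): sums one column within a group.
def Sum(col: String): Agg = Agg(s"SUM($col)", rows => rows.map(_(col)).sum)

final case class Table(rows: Seq[Row]) {
  /** Groups rows by the given key columns, then evaluates each aggregate per group.
    * With no key columns every row gets the same empty key, so all rows form one group.
    */
  def groupBy(keys: String*)(aggs: Agg*): Seq[Row] =
    rows
      .groupBy(r => keys.map(k => k -> r(k)).toMap) // key = values of the grouping columns
      .toSeq
      .map { case (key, group) => key ++ aggs.map(a => a.name -> a.f(group)) }

  /** Aggregates over all rows. Equivalent to `groupBy()(aggs: _*)`. */
  def aggregate(aggs: Agg*): Seq[Row] = groupBy()(aggs: _*)
}

val t = Table(Seq(
  Map("key" -> 1, "val" -> 10),
  Map("key" -> 1, "val" -> 20),
  Map("key" -> 2, "val" -> 5)))

val viaAggregate = t.aggregate(Sum("val"))   // a single row with SUM(val) -> 35
val viaGroupBy   = t.groupBy()(Sum("val"))   // same single group of all rows
val perKey       = t.groupBy("key")(Sum("val"))
println(viaAggregate)
```

The shorthand simply forwards to the keyless groupBy, so both paths share one implementation. This toy does not attempt to model how a real engine treats an empty input table.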

@aarondav aarondav changed the title from "Introduce SchemaRDD#aggregate() for simple aggregations" to "[SQL] Minor: Introduce SchemaRDD#aggregate() for simple aggregations" May 25, 2014
Contributor Author

This example doesn't compile with the \ in there.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

Contributor

Can you say in the scaladoc that this is equivalent to groupBy()(...)?

@rxin
Contributor

rxin commented May 25, 2014

LGTM other than the small addition to scaladoc.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15188/

@aarondav
Contributor Author

Added comment!

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@rxin
Contributor

rxin commented May 25, 2014

LGTM

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15195/

@rxin
Contributor

rxin commented May 26, 2014

I've merged this into master & branch-1.0.

@asfgit asfgit closed this in c3576ff May 26, 2014
asfgit pushed a commit that referenced this pull request May 26, 2014
```scala
rdd.aggregate(Sum('val))
```
is just shorthand for

```scala
rdd.groupBy()(Sum('val))
```

but seems to be more natural than doing a groupBy with no grouping expressions when you really just want an aggregation over all rows.

Did not add a JavaSchemaRDD or Python API, as these seem to be lacking several other methods like groupBy() already -- leaving that cleanup for future patches.

Author: Aaron Davidson <[email protected]>

Closes #874 from aarondav/schemardd and squashes the following commits:

e9e68ee [Aaron Davidson] Add comment
db6afe2 [Aaron Davidson] Introduce SchemaRDD#aggregate() for simple aggregations

(cherry picked from commit c3576ff)
Signed-off-by: Reynold Xin <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022

- make "mapr.spark.user.secret" config optional
- review all mapr-specific volumes and make them optional
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025

3 participants