
Conversation

@liancheng
Contributor

In cases like `Limit` and `TakeOrdered`, `executeCollect()` makes optimizations that `execute().collect()` will not.
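A simplified sketch of the idea, using hypothetical names (`LimitSketch`, `collectThenLimit` are illustrations, not Spark's actual operators): a Limit-style `executeCollect()` can stop pulling rows as soon as it has `limit` of them, whereas draining the full `execute()` iterator and truncating afterwards scans every row first.

```scala
// Hypothetical sketch, not Spark's implementation: shows why a
// short-circuiting executeCollect() touches fewer rows than
// execute().collect() followed by a truncation.
object LimitSketch {
  // Counts how many rows each strategy actually pulls from the scan.
  var rowsScanned = 0

  private def scan(partitions: Seq[Seq[Int]]): Iterator[Int] =
    partitions.iterator.flatten.map { r => rowsScanned += 1; r }

  // execute().collect() analogue: drain every partition, then truncate.
  def collectThenLimit(partitions: Seq[Seq[Int]], limit: Int): List[Int] =
    scan(partitions).toList.take(limit)

  // executeCollect() analogue: stop pulling once `limit` rows are in hand.
  def executeCollect(partitions: Seq[Seq[Int]], limit: Int): List[Int] =
    scan(partitions).take(limit).toList
}
```

With two partitions of five total rows and a limit of 2, the drain-then-truncate path scans all five rows while the short-circuiting path scans only two.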

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@rxin
Contributor

rxin commented Jun 2, 2014

LGTM.

Contributor

yay

@rxin
Contributor

rxin commented Jun 2, 2014

Actually - please define the return type explicitly for public methods.
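A minimal Scala illustration of the point (the `QueryResult` class is hypothetical): annotating the return type of a public method pins the API contract, while an inferred type silently tracks whatever the body happens to produce.

```scala
// Hypothetical example: why public methods should declare return types.
class QueryResult(rows: List[String]) {
  // Explicit return type: the public contract stays Seq[String] even if
  // the internal representation later changes from List to Vector.
  def collect(): Seq[String] = rows

  // Inferred type (fine for private members): this is List[String] today,
  // but a refactor of `rows` would silently change it.
  private def collectInternal = rows

  def size: Int = collectInternal.size
}
```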

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15341/

@rxin
Contributor

rxin commented Jun 2, 2014

Actually a lot of tests are failing ...

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@liancheng
Contributor Author

@rxin Shame... I underestimated this issue and didn't run the full test suite locally :( I think the problem is that `executeCollect()` should copy row objects to keep the data immutable. Also, users shouldn't now call `collect()` on SchemaRDDs returned by a SQL/HiveQL command, since `executeCollect()` calls `execute()` and causes the command to be executed twice.
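The row-reuse problem described above can be sketched like this (`MutableRow` and `RowCopySketch` are hypothetical stand-ins, not Spark's internal `Row` types): an operator that rewrites a single shared buffer per partition makes every collected "row" alias the same object, so `executeCollect()` must copy each row before handing it to the user.

```scala
// Hypothetical sketch of the mutable-row-reuse bug and its fix.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object RowCopySketch {
  // Simulates an operator that reuses one buffer for every row it emits.
  private def rowIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val buffer = new MutableRow(0)
    values.iterator.map { v => buffer.value = v; buffer }
  }

  // Buggy collect: every element ends up pointing at the same buffer,
  // so all "rows" show the last value written.
  def collectNoCopy(values: Seq[Int]): List[MutableRow] =
    rowIterator(values).toList

  // Fixed collect: copy each row so callers see immutable snapshots.
  def collectWithCopy(values: Seq[Int]): List[MutableRow] =
    rowIterator(values).map(_.copy()).toList
}
```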

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15349/

@rxin
Contributor

rxin commented Jun 2, 2014

Thanks. I've merged this in master & branch-1.0.

@asfgit asfgit closed this in d000ca9 Jun 2, 2014
asfgit pushed a commit that referenced this pull request Jun 2, 2014
…lect() on the underlying query plan.

In cases like `Limit` and `TakeOrdered`, `executeCollect()` makes optimizations that `execute().collect()` will not.

Author: Cheng Lian <[email protected]>

Closes #939 from liancheng/spark-1958 and squashes the following commits:

bdc4a14 [Cheng Lian] Copy rows to present immutable data to users
8250976 [Cheng Lian] Added return type explicitly for public API
192a25c [Cheng Lian] [SPARK-1958] Calling .collect() on a SchemaRDD should call executeCollect() on the underlying query plan.

(cherry picked from commit d000ca9)
Signed-off-by: Reynold Xin <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@liancheng liancheng deleted the spark-1958 branch September 24, 2014 00:13
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…on (apache#939)

### What changes were proposed in this pull request?

This PR adds support for defining distribution and ordering during table creation.

### Why are the changes needed?

This change is needed for feature parity with ADT in Spark 2.

### Does this PR introduce any user-facing change?

This PR adds new optional table creation clauses but it should only affect Iceberg users.

### How was this patch tested?

This PR comes with new tests. More tests are in Iceberg.
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
* Add Iceberg as a dep

* rdar://70004937 Rewrite row-level operations for Iceberg (apache#922)

* rdar://72811621 Control distribution and ordering during table creation (apache#939)

This PR adds support for defining distribution and ordering during table creation.

This change is needed for feature parity with ADT in Spark 2.

This PR adds new optional table creation clauses but it should only affect Iceberg users.

This PR comes with new tests. More tests are in Iceberg.