
Conversation

@liancheng
Contributor

In cases like `Limit` and `TakeOrdered`, `executeCollect()` makes optimizations that `execute().collect()` will not.
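A simplified sketch of the idea, using hypothetical names (`LimitSketch`, `collectThenLimit` are illustrations, not Spark's actual operators): a Limit-style `executeCollect()` can stop pulling rows as soon as it has `limit` of them, whereas draining the full `execute()` iterator and truncating afterwards scans every row first.

```scala
// Hypothetical sketch, not Spark's implementation: shows why a
// short-circuiting executeCollect() touches fewer rows than
// execute().collect() followed by a truncation.
object LimitSketch {
  // Counts how many rows each strategy actually pulls from the scan.
  var rowsScanned = 0

  private def scan(partitions: Seq[Seq[Int]]): Iterator[Int] =
    partitions.iterator.flatten.map { r => rowsScanned += 1; r }

  // execute().collect() analogue: drain every partition, then truncate.
  def collectThenLimit(partitions: Seq[Seq[Int]], limit: Int): List[Int] =
    scan(partitions).toList.take(limit)

  // executeCollect() analogue: stop pulling once `limit` rows are in hand.
  def executeCollect(partitions: Seq[Seq[Int]], limit: Int): List[Int] =
    scan(partitions).take(limit).toList
}
```

With two partitions of five total rows and a limit of 2, the drain-then-truncate path scans all five rows while the short-circuiting path scans only two.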

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@rxin
Contributor

rxin commented Jun 2, 2014

LGTM.

Contributor

yay

@rxin
Contributor

rxin commented Jun 2, 2014

Actually - please define the return type explicitly for public methods.
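A minimal Scala illustration of the point (the `QueryResult` class is hypothetical): annotating the return type of a public method pins the API contract, while an inferred type silently tracks whatever the body happens to produce.

```scala
// Hypothetical example: why public methods should declare return types.
class QueryResult(rows: List[String]) {
  // Explicit return type: the public contract stays Seq[String] even if
  // the internal representation later changes from List to Vector.
  def collect(): Seq[String] = rows

  // Inferred type (fine for private members): this is List[String] today,
  // but a refactor of `rows` would silently change it.
  private def collectInternal = rows

  def size: Int = collectInternal.size
}
```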

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15341/

@rxin
Contributor

rxin commented Jun 2, 2014

Actually a lot of tests are failing ...

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@liancheng
Contributor Author

@rxin Shame... I underestimated this issue and didn't run the full test suite locally :( I think the problem is that `executeCollect()` should copy row objects to keep the data immutable. Also, users shouldn't now call `collect()` on SchemaRDDs returned by a SQL/HiveQL command, since `executeCollect()` calls `execute()` and causes the command to be executed twice.
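The row-reuse problem described above can be sketched like this (`MutableRow` and `RowCopySketch` are hypothetical stand-ins, not Spark's internal `Row` types): an operator that rewrites a single shared buffer per partition makes every collected "row" alias the same object, so `executeCollect()` must copy each row before handing it to the user.

```scala
// Hypothetical sketch of the mutable-row-reuse bug and its fix.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object RowCopySketch {
  // Simulates an operator that reuses one buffer for every row it emits.
  private def rowIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val buffer = new MutableRow(0)
    values.iterator.map { v => buffer.value = v; buffer }
  }

  // Buggy collect: every element ends up pointing at the same buffer,
  // so all "rows" show the last value written.
  def collectNoCopy(values: Seq[Int]): List[MutableRow] =
    rowIterator(values).toList

  // Fixed collect: copy each row so callers see immutable snapshots.
  def collectWithCopy(values: Seq[Int]): List[MutableRow] =
    rowIterator(values).map(_.copy()).toList
}
```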

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15349/

@rxin
Contributor

rxin commented Jun 2, 2014

Thanks. I've merged this in master & branch-1.0.

@asfgit asfgit closed this in d000ca9 Jun 2, 2014
asfgit pushed a commit that referenced this pull request Jun 2, 2014
…lect() on the underlying query plan.

In cases like `Limit` and `TakeOrdered`, `executeCollect()` makes optimizations that `execute().collect()` will not.

Author: Cheng Lian <[email protected]>

Closes #939 from liancheng/spark-1958 and squashes the following commits:

bdc4a14 [Cheng Lian] Copy rows to present immutable data to users
8250976 [Cheng Lian] Added return type explicitly for public API
192a25c [Cheng Lian] [SPARK-1958] Calling .collect() on a SchemaRDD should call executeCollect() on the underlying query plan.

(cherry picked from commit d000ca9)
Signed-off-by: Reynold Xin <[email protected]>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@liancheng liancheng deleted the spark-1958 branch September 24, 2014 00:13
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…on (apache#939)

### What changes were proposed in this pull request?

This PR adds support for defining distribution and ordering during table creation.

### Why are the changes needed?

This change is needed for feature parity with ADT in Spark 2.

### Does this PR introduce any user-facing change?

This PR adds new optional table creation clauses but it should only affect Iceberg users.

### How was this patch tested?

This PR comes with new tests. More tests are in Iceberg.
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
* Add Iceberg as a dep

* rdar://70004937 Rewrite row-level operations for Iceberg (apache#922)

* rdar://72811621 Control distribution and ordering during table creation (apache#939)

This PR adds support for defining distribution and ordering during table creation.

This change is needed for feature parity with ADT in Spark 2.

This PR adds new optional table creation clauses but it should only affect Iceberg users.

This PR comes with new tests. More tests are in Iceberg.