@jerryshao
Contributor

This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations.

I leave CoGroupedRDD and SubtractedRDD unchanged because they have their own implementations of aggregation. I'm not sure whether it is suitable to change these two RDDs.

Also, I did not move the sort-related code of OrderedRDDFunctions into the shuffle; this will be addressed in another sub-task.
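
For readers unfamiliar with the idea, the following is a minimal, language-neutral sketch of what "moving aggregation into the shuffle" could look like. It is not Spark's code: the `Aggregator`, `shuffle_write`, and `shuffle_read` names and signatures are hypothetical and chosen only for illustration. It assumes the writer pre-aggregates records when map-side combine is enabled, and the reader either merges those partial combiners or aggregates raw values itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[Any, Any]


@dataclass
class Aggregator:
    """Hypothetical stand-in for an aggregator: three user-supplied functions."""
    create_combiner: Callable[[Any], Any]
    merge_value: Callable[[Any, Any], Any]
    merge_combiners: Callable[[Any, Any], Any]

    def combine_values_by_key(self, records: Iterable[Record]) -> Iterator[Record]:
        # Fold raw values into one combiner per key.
        combined: Dict[Any, Any] = {}
        for key, value in records:
            if key in combined:
                combined[key] = self.merge_value(combined[key], value)
            else:
                combined[key] = self.create_combiner(value)
        return iter(combined.items())

    def combine_combiners_by_key(self, records: Iterable[Record]) -> Iterator[Record]:
        # Merge combiners that were already partially aggregated elsewhere.
        combined: Dict[Any, Any] = {}
        for key, combiner in records:
            if key in combined:
                combined[key] = self.merge_combiners(combined[key], combiner)
            else:
                combined[key] = combiner
        return iter(combined.items())


def shuffle_write(records: Iterable[Record], agg: Aggregator,
                  map_side_combine: bool) -> List[Record]:
    """Writer side (assumed): pre-aggregate only when map-side combine is on."""
    if map_side_combine:
        return list(agg.combine_values_by_key(records))
    return list(records)


def shuffle_read(map_outputs: Iterable[List[Record]], agg: Aggregator,
                 map_side_combine: bool) -> Dict[Any, Any]:
    """Reader side (assumed): merge combiners, or aggregate raw values."""
    fetched = (record for output in map_outputs for record in output)
    if map_side_combine:
        return dict(agg.combine_combiners_by_key(fetched))
    return dict(agg.combine_values_by_key(fetched))
```

With a summing aggregator, both settings of `map_side_combine` should give the same final result; the flag only changes where the work happens and how many records cross the shuffle boundary.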

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15705/

@jerryshao
Contributor Author

Hi @mateiz, would you mind taking a look at this PR?

Contributor

Shouldn't this say "for map-side combine"?

@mateiz
Contributor

mateiz commented Jun 19, 2014

Hey, sorry, been a bit busy lately but I will take a look soon. At a quick glance it looks pretty good.

Contributor

So the one problem I see is that the InterruptibleIterator around these calls was lost when you moved them here. This is not great because it means tasks running these won't be cancelable. Can you add it back? You already have a TaskContext as a field of ShuffleReader.
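
To illustrate the concern, here is a minimal sketch of the wrapping pattern being asked for. It is not Spark's implementation: the `TaskContext`, `TaskKilledError`, and `InterruptibleIterator` classes below are simplified, hypothetical stand-ins. The assumption is that the wrapper checks a cancellation flag on the task context before yielding each element, so a long-running read can stop promptly once the task is killed.

```python
from typing import Any, Iterable, Iterator


class TaskKilledError(Exception):
    """Raised when iteration continues after the task was cancelled (hypothetical)."""


class TaskContext:
    """Hypothetical task context carrying only a cancellation flag."""

    def __init__(self) -> None:
        self.interrupted = False


class InterruptibleIterator:
    """Wraps another iterator and checks for cancellation on every step."""

    def __init__(self, context: TaskContext, delegate: Iterable[Any]) -> None:
        self._context = context
        self._delegate: Iterator[Any] = iter(delegate)

    def __iter__(self) -> "InterruptibleIterator":
        return self

    def __next__(self) -> Any:
        # Without this check, a cancelled task would keep consuming records
        # until the underlying iterator was exhausted.
        if self._context.interrupted:
            raise TaskKilledError("task was cancelled")
        return next(self._delegate)


def read_records(context: TaskContext, fetched: Iterable[Any]) -> Iterator[Any]:
    """Reader-side sketch: hand back the fetched records wrapped for cancellation."""
    return InterruptibleIterator(context, fetched)
```

In this sketch the reader already holds the context, so wrapping the result is a one-line change at the point where the iterator is returned.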

@mateiz
Contributor

mateiz commented Jun 21, 2014

Hey Saisai, I noticed one thing that got lost in the move, which is the use of InterruptibleIterator. We need to bring that back to allow cancellation of reduce tasks. Other than that it looks good to me.

@jerryshao
Contributor Author

Hi Matei, thanks for your review, I will update the code soon.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@jerryshao
Contributor Author

Hi Matei, I just updated the code according to your comments. For OrderedRDDFunctions, I only set the key ordering on the shuffle and did not move the code path. What is your plan for the sort-related sub-task (SPARK-2125)?

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16021/

@mateiz
Contributor

mateiz commented Jun 24, 2014

Thanks for the update! I've merged this in.

@asfgit closed this in 56eb8af on Jun 24, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations.

I leave `CoGroupedRDD` and `SubtractedRDD` unchanged because they have their own implementations of aggregation. I'm not sure whether it is suitable to change these two RDDs.

Also, I did not move the sort-related code of `OrderedRDDFunctions` into the shuffle; this will be addressed in another sub-task.

Author: jerryshao <[email protected]>

Closes apache#1064 from jerryshao/SPARK-2124 and squashes the following commits:

4a05a40 [jerryshao] Modify according to comments
1f7dcc8 [jerryshao] Style changes
50a2fd6 [jerryshao] Fix test suite issue after moving aggregator to Shuffle reader and writer
1a96190 [jerryshao] Code modification related to the ShuffledRDD
308f635 [jerryshao] initial works of move combiner to ShuffleManager's reader and writer
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
wangyum pushed a commit that referenced this pull request May 26, 2023
…1064)

* [CARMEL-6174][FOLLOWUP] Change prefer shuffled hash join condition
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025