-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-2124] Move aggregation into shuffle implementations #1064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
Hi @mateiz, would you mind taking a look at this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this say "for map-side combine"?
|
Hey, sorry, been a bit busy lately but I will take a look soon. At a quick glance it looks pretty good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the one problem I see is that the InterruptibleIterator around these calls was lost when you moved them here. This is not great because it means tasks running these won't be cancelable. Can you add it back? You already have a TaskContext as a field of ShuffleReader.
|
Hey Saisai, I noticed one thing that got lost in the move, which is the use of InterruptibleIterator. We need to bring that back to allow cancellation of reduce tasks. Other than that it looks good to me. |
|
Hi Matei, thanks for your review, I will update the code soon. |
|
Merged build triggered. |
|
Merged build started. |
|
Hi Matei, I just updated the code according to your comments. For OrderedRDDFunctions, I only set |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
Thanks for the update! I've merged this in. |
This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations. I leave `CoGoupedRDD` and `SubtractedRDD` unchanged because they have their implementations of aggregation. I'm not sure is it suitable to change these two RDDs. Also I do not move sort related code of `OrderedRDDFunctions` into shuffle, this will be solved in another sub-task. Author: jerryshao <[email protected]> Closes apache#1064 from jerryshao/SPARK-2124 and squashes the following commits: 4a05a40 [jerryshao] Modify according to comments 1f7dcc8 [jerryshao] Style changes 50a2fd6 [jerryshao] Fix test suite issue after moving aggregator to Shuffle reader and writer 1a96190 [jerryshao] Code modification related to the ShuffledRDD 308f635 [jerryshao] initial works of move combiner to ShuffleManager's reader and writer
This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations. I leave `CoGoupedRDD` and `SubtractedRDD` unchanged because they have their implementations of aggregation. I'm not sure is it suitable to change these two RDDs. Also I do not move sort related code of `OrderedRDDFunctions` into shuffle, this will be solved in another sub-task. Author: jerryshao <[email protected]> Closes apache#1064 from jerryshao/SPARK-2124 and squashes the following commits: 4a05a40 [jerryshao] Modify according to comments 1f7dcc8 [jerryshao] Style changes 50a2fd6 [jerryshao] Fix test suite issue after moving aggregator to Shuffle reader and writer 1a96190 [jerryshao] Code modification related to the ShuffledRDD 308f635 [jerryshao] initial works of move combiner to ShuffleManager's reader and writer
…1064) * [CARMEL-6174][FOLLOWUP] Change prefer shuffled hash join condition * [CARMEL-6174][FOLLOWUP] Change prefer shuffled hash join condition * [CARMEL-6174][FOLLOWUP] Change prefer shuffled hash join condition
Co-authored-by: Egor Krivokon <>
Co-authored-by: Egor Krivokon <>
This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations.
I leave
CoGoupedRDDandSubtractedRDDunchanged because they have their implementations of aggregation. I'm not sure is it suitable to change these two RDDs.Also I do not move sort related code of
OrderedRDDFunctionsinto shuffle, this will be solved in another sub-task.