-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-10708] Consolidate sort shuffle implementations #8829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-10708] Consolidate sort shuffle implementations #8829
Conversation
|
Test build #42693 has finished for PR 8829 at commit
|
803f62f to
26ecf5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: update these comments.
|
Test build #42782 has finished for PR 8829 at commit
|
|
Test build #42784 has finished for PR 8829 at commit
|
…uffle-implementations
|
Test build #42866 has finished for PR 8829 at commit
|
|
Test build #42868 has finished for PR 8829 at commit
|
…uffle-implementations
|
Test build #42931 has finished for PR 8829 at commit
|
|
Test build #42978 has finished for PR 8829 at commit
|
|
Test build #42990 has finished for PR 8829 at commit
|
|
This patch should now be ready for a preliminary round of review. There are still some comment updates to make, plus an update to preserve certain aspects of the old bypass-merge-sort shuffle fallback path, but the basics are good to go. /cc @rxin and @sryza for feedback. The key idea here is to consolidate on a single implementation of serialized buffering in sort-based shuffle in order to easy maintainability. |
…uffle-implementations
|
Jenkins, retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comment explaining why this only works when aggregator & mapsidecombine is off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed via a larger refactoring / cleanup of this logic.
|
I took a pass. Looks pretty good. |
|
Test build #44011 has finished for PR 8829 at commit
|
|
Jenkins, retest this please. |
|
Test build #44027 has started for PR 8829 at commit |
|
LGTM |
|
Jenkins, retest this please. |
|
Test build #44090 has finished for PR 8829 at commit
|
|
Jenkins, retest this please. |
|
Test build #1934 has finished for PR 8829 at commit
|
|
Test build #44101 has finished for PR 8829 at commit
|
|
Test build #1937 has finished for PR 8829 at commit
|
|
Test build #44135 has finished for PR 8829 at commit
|
|
Woohoo, this passed tests and looks good in benchmarking, so I'm going to merge it to master (1.6). |
There's a lot of duplication between SortShuffleManager and UnsafeShuffleManager. Given that these now provide the same set of functionality, now that UnsafeShuffleManager supports large records, I think that we should replace SortShuffleManager's serialized shuffle implementation with UnsafeShuffleManager's and should merge the two managers together.