
Conversation

@peterpc0701

What changes were proposed in this pull request?

Based on the map output sizes and locations tracked by MapOutputTracker, we can obtain better load balancing.

This patch proposes a strategy for setting the preferred locations of each reduce task so that, first, every executor processes almost the same amount of intermediate data and, second, network data transmission is minimized. This helps in the following cases:

1. REDUCER_PREF_LOCS_FRACTION tries to place reduce tasks close to the largest map outputs. If the map outputs are skewed, the executors holding the largest outputs can become hotspots. Our method avoids this while still minimizing network data transmission.

2. When a job has a large number of reduce tasks, it helps each executor process almost the same amount of data, keeping the load balanced.

The specific steps are as follows (a code sketch appears after the list).
Step 1: For each reduce task, calculate the amount of data it will read and how that data is distributed across nodes.

Step 2: Divide the tasks into n groups according to the number of nodes and data size, ensuring that the data size for each group is nearly equal.

Step 3: Determine the amount of local data that would result if each group's tasks were executed on each node. This yields an n × n matrix.

Step 4: Choose the largest value in the matrix to decide which group is allocated to which node. Mark the row and column of the selected entry so that neither the group nor the node is chosen again. Repeat Step 4 until no group remains.
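
The sketch below illustrates Steps 2–4 in Scala. It is not the patch's actual code: the names (ReduceLocalityAssignment, assign, bytesByTaskAndNode) are hypothetical, and the grouping heuristic used for Step 2 (sort tasks by size, always add to the lightest group) is an assumption, since the description above does not fix a particular grouping method. The input bytesByTaskAndNode(t)(j) stands for the bytes that reduce task t would read locally if it ran on node j, as derived from MapOutputTracker statistics.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of the proposed group-to-node assignment, not the patch code.
object ReduceLocalityAssignment {

  // Returns a map from group index to the node index the group is assigned to.
  def assign(bytesByTaskAndNode: Array[Array[Long]], numNodes: Int): Map[Int, Int] = {
    val numTasks = bytesByTaskAndNode.length
    val taskTotals = bytesByTaskAndNode.map(_.sum)

    // Step 2 (assumed heuristic): sort tasks by total size and add each one to
    // the currently lightest group, so group totals stay roughly equal.
    val groups = Array.fill(numNodes)(ArrayBuffer.empty[Int])
    val groupSizes = Array.fill(numNodes)(0L)
    for (t <- (0 until numTasks).sortBy(t => -taskTotals(t))) {
      val g = groupSizes.zipWithIndex.minBy(_._1)._2
      groups(g) += t
      groupSizes(g) += taskTotals(t)
    }

    // Step 3: n x n matrix of local bytes if group g were scheduled on node j.
    val localBytes = Array.tabulate(numNodes, numNodes) { (g, j) =>
      groups(g).map(t => bytesByTaskAndNode(t)(j)).sum
    }

    // Step 4: greedily pick the largest remaining cell, assign that group to
    // that node, then exclude its row and column from further selection.
    val usedGroup = Array.fill(numNodes)(false)
    val usedNode = Array.fill(numNodes)(false)
    val assignment = scala.collection.mutable.Map.empty[Int, Int]
    for (_ <- 0 until numNodes) {
      var best = (-1, -1, -1L)
      for (g <- 0 until numNodes if !usedGroup(g); j <- 0 until numNodes if !usedNode(j)) {
        if (localBytes(g)(j) > best._3) best = (g, j, localBytes(g)(j))
      }
      usedGroup(best._1) = true
      usedNode(best._2) = true
      assignment(best._1) = best._2
    }
    assignment.toMap
  }
}
```

With the assignment computed this way, each group's tasks would be given the chosen node as their preferred location, which is what keeps per-executor data volumes nearly equal while favoring local reads.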

How was this patch tested?

Unit test suite
Author: Cheng Pei [email protected]

@holdenk
Contributor

holdenk commented Apr 28, 2016

This is interesting. There are some minor formatting things that might make sense to fix while waiting for review, but since this is designed to improve performance it probably also makes sense to do some performance testing so that we can all see what the benefit would look like. Maybe @rxin would like to take a look?

@peterpc0701
Author

@holdenk, thank you for your comments. I have fixed the minor formatting issues. Some performance test results will follow in the next few days.

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

Hi @peterpc0701, how is the perf test going?

@asfgit asfgit closed this in 5d2750a May 18, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
## What changes were proposed in this pull request?

This PR proposes to close PRs ...

  - inactive on the review comments for more than a month
  - WIP and inactive for more than a month
  - with a Jenkins build failure and inactive for more than a month
  - suggested to be closed, with no comment against that
  - obviously looking inappropriate (e.g., Branch 0.5)

To make sure, I left a comment on each PR about a week ago and did not receive a response back from the author of the PRs below:

Closes apache#11129
Closes apache#12085
Closes apache#12162
Closes apache#12419
Closes apache#12420
Closes apache#12491
Closes apache#13762
Closes apache#13837
Closes apache#13851
Closes apache#13881
Closes apache#13891
Closes apache#13959
Closes apache#14091
Closes apache#14481
Closes apache#14547
Closes apache#14557
Closes apache#14686
Closes apache#15594
Closes apache#15652
Closes apache#15850
Closes apache#15914
Closes apache#15918
Closes apache#16285
Closes apache#16389
Closes apache#16652
Closes apache#16743
Closes apache#16893
Closes apache#16975
Closes apache#17001
Closes apache#17088
Closes apache#17119
Closes apache#17272
Closes apache#17971

Added:
Closes apache#17778
Closes apache#17303
Closes apache#17872

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes apache#18017 from HyukjinKwon/close-inactive-prs.