[SPARK-15354] [CORE] Topology aware block replication strategies #13932
Conversation
Based on feedback from @rxin, added a Basic Strategy that replicates HDFS behavior as a simpler alternative to the constraint solver. I also ran some performance tests on the constraint solver: the times show the average, min, and max of 50 runs of the optimizer for 50, 100, ..., 100000 peers placed in an appropriate number of racks. When blocks are being replicated, the majority of time is expected to be spent in the actual data movement across the network, so the performance hit from the constraint solver can be expected to be minimal.
Force-pushed from f94084f to 2d1cecb
Force-pushed from 2d1cecb to ccc6ae0
Force-pushed from 44817f4 to cff0966
Rebased to master to resolve merge conflict.
Force-pushed from cff0966 to 93eb511
jenkins ok to test

Test build #72188 has started for PR 13932 at commit

No test errors. Looks like the test process was killed midway. Tests added as part of this PR took less than 7s, so they couldn't have caused the delay.

test this please

Test build #73311 has finished for PR 13932 at commit

Test build #73435 has finished for PR 13932 at commit
Force-pushed from a35f673 to ec601bd
Rebased to resolve merge conflicts.

test this please

Test build #73534 has started for PR 13932 at commit
shall we use LinkedHashSet so that we don't need this extra shuffle?
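For context, a minimal Scala sketch (the peer strings are placeholders, not code from this PR) of the property behind this suggestion: a mutable.LinkedHashSet deduplicates while preserving insertion order, so a result built from an already-randomized sequence would not need to be shuffled again.

```scala
import scala.collection.mutable
import scala.util.Random

object LinkedHashSetOrderSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder peer names standing in for BlockManagerIds, pre-shuffled.
    val peers = Random.shuffle(Seq("exec-1", "exec-2", "exec-3", "exec-4"))

    // A plain HashSet forgets insertion order, so callers that need the
    // original randomized order must shuffle again after converting back.
    val unordered = mutable.HashSet(peers: _*)

    // A LinkedHashSet keeps insertion order while still deduplicating.
    val ordered = mutable.LinkedHashSet(peers: _*)

    println(s"input order:         $peers")
    println(s"HashSet iteration:   ${unordered.toSeq}")
    println(s"LinkedHashSet order: ${ordered.toSeq}")   // matches input order
  }
}
```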
can we explain the replicating logic for any replication factor?
retest this please

Test build #75020 has finished for PR 13932 at commit

Test build #75102 has finished for PR 13932 at commit
shall we add a .filter(_.host != blockManagerId.host)?
Master ensures the list of peers sent to a block manager doesn't include the requesting block manager. Was that the intention here?
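As a side note, a minimal sketch of the difference between the two behaviours discussed here (the PeerId type is a made-up stand-in for BlockManagerId, not the real class): excluding only the requesting block manager versus dropping every peer on the same host.

```scala
// Made-up stand-in for BlockManagerId; the real class carries more fields.
case class PeerId(executorId: String, host: String, port: Int)

object PeerFilteringSketch {
  def main(args: Array[String]): Unit = {
    val self = PeerId("exec-0", "host-a", 7001)
    val peers = Seq(
      PeerId("exec-1", "host-a", 7002),   // different executor, same host
      PeerId("exec-2", "host-b", 7001),
      PeerId("exec-3", "host-c", 7001))

    // What the master already guarantees: the requesting block manager itself
    // is never in the peer list it returns.
    val withoutSelf = peers.filterNot(_ == self)

    // What the suggested filter would additionally do: drop any peer that is
    // co-located on the same host, so no replica lands on host-a at all.
    val withoutSameHost = peers.filter(_.host != self.host)

    println(s"excluding self only:       $withoutSelf")
    println(s"excluding same-host peers: $withoutSameHost")
  }
}
```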
The previous indentation was right.
shall we test more explicitly that the first candidate is within rack and the second candidate is outside rack?
The intended behavior is to ensure one is within rack and one outside, not necessarily the first or the second.
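To illustrate that point, a hedged sketch of how the assertion could be phrased (the Peer type, rack strings, and values are invented for the example, not the suite's actual fixtures): check membership of the chosen set rather than fixed positions.

```scala
// Invented stand-in for the example; the suite itself works with BlockManagerId.
case class Peer(executorId: String, host: String, rack: Option[String])

object RackSpreadAssertionSketch {
  def main(args: Array[String]): Unit = {
    val candidateRack = Some("/rack-1")
    // Pretend output of the replication policy for a candidate in /rack-1.
    val chosen = Seq(
      Peer("exec-5", "host-e", Some("/rack-2")),
      Peer("exec-2", "host-b", Some("/rack-1")))

    // Assert the set spans racks: at least one replica inside the candidate's
    // rack and at least one outside it, without pinning which position is which.
    assert(chosen.exists(_.rack == candidateRack), "expected an in-rack replica")
    assert(chosen.exists(p => p.rack.isDefined && p.rack != candidateRack),
      "expected an out-of-rack replica")
    println("rack-spread assertion passed")
  }
}
```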
LGTM except a few minor comments
Given that you're already shuffling the sample here anyway, just out of curiosity: is there any advantage to using Robert Floyd's algorithm over (say) Fisher-Yates? Also, more generally, is space complexity really a concern here? Can't we just use r.shuffle(totalSize).take(sampleSize) for readability?
EDIT: Please ignore my first concern. I misread the code.
I completely agree with you here, except I was told earlier that iterating through a list the size of the number of executors was a concern, so this was to address time complexity.
But isn't the time complexity the same for both cases? It seems like they only differ in terms of space complexity.
Note that this logic is the same as before; see https://github.com/apache/spark/pull/13932/files#diff-85cf7285f83b73c253480dc010b0013bL105
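For readers following this exchange, here is a small, self-contained sketch of the two sampling approaches being compared; the function names are illustrative and not the PR's actual helpers. Floyd's method draws m distinct indices without materializing the full range, while the shuffle-and-take variant builds and shuffles all n indices first.

```scala
import scala.collection.mutable
import scala.util.Random

object SamplingSketch {
  // Robert Floyd's sampling: m distinct values from 1..n using only O(m)
  // extra storage, without building or shuffling the full range.
  def floydSample(n: Int, m: Int, r: Random): Set[Int] = {
    require(m <= n, "cannot sample more values than the range contains")
    val sampled = mutable.LinkedHashSet.empty[Int]
    for (j <- (n - m + 1) to n) {
      val t = r.nextInt(j) + 1                      // uniform in 1..j
      sampled += (if (sampled.contains(t)) j else t)
    }
    sampled.toSet
  }

  // The simpler alternative raised above: materialize 1..n, shuffle, take m.
  def shuffleSample(n: Int, m: Int, r: Random): Set[Int] =
    r.shuffle((1 to n).toList).take(m).toSet

  def main(args: Array[String]): Unit = {
    val r = new Random(42)
    println(floydSample(n = 100000, m = 5, r))
    println(shuffleSample(n = 100000, m = 5, r))
  }
}
```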
Test build #75264 has finished for PR 13932 at commit
…y to get peers making sure objectives previously satisfied are not violated.
…. 2. Fixing style issues
…nagerReplicationSuite, to also run the same set of tests when using the basic strategy. Added a couple of specific test cases to verify prioritization.
…, along with test cases.
…sed replication strategy and constraint solver associate with it.
Force-pushed from e70c2a1 to c465aaf
Rebased to master

Test build #75315 has finished for PR 13932 at commit

retest this please

I just merged a PR about the block manager; retesting this PR to make sure there is no conflict.

Test build #75356 has finished for PR 13932 at commit

thanks, merging to master!
test("Peers in 2 racks") {
@shubhamchopra this test seems to be failing occasionally: https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.storage.TopologyAwareBlockReplicationPolicyBehavior&test_name=Peers+in+2+racks. Can you please take a look? Thanks!
fixed by #17624

What changes were proposed in this pull request?
This PR adds strategies for resilient block replication for different resource managers, mirroring the 3-replica strategy used by HDFS: the first replica is placed on an executor, the second within the same rack as that executor, and the third on a different rack.
The implementation provides two pluggable classes: one runs in the driver and supplies topology information for every host at cluster start, and the other prioritizes a list of peer BlockManagerIds for replication.
The prioritization itself can be thought of as an optimization problem: find a minimal set of peers that satisfies certain objectives and replicate to those peers first. The objectives can express richer constraints over and above the HDFS-like 3-replica strategy.
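As a rough illustration of the placement logic described above (a hedged sketch, not the PR's actual BlockReplicationPolicy code; the Peer type, rack strings, and prioritize helper are made up for the example), a prioritization could pick one in-rack peer and one out-of-rack peer first and fill any remaining slots randomly:

```scala
import scala.util.Random

// Simplified peer descriptor; the real code works with BlockManagerId, whose
// topology information is assigned by the driver-side topology mapper.
case class Peer(executorId: String, host: String, rack: Option[String])

object HdfsLikePrioritizationSketch {
  /**
   * Orders up to `numReplicas` peers so that, when possible, the first two
   * choices put one replica inside the candidate's rack and one outside it,
   * mirroring the HDFS-like layout. Remaining slots are filled randomly.
   */
  def prioritize(
      candidateRack: Option[String],
      peers: Seq[Peer],
      numReplicas: Int,
      r: Random): Seq[Peer] = {
    val (inRack, outOfRack) = peers.partition(_.rack == candidateRack)
    val preferred = r.shuffle(inRack).take(1) ++ r.shuffle(outOfRack).take(1)
    val remaining = r.shuffle(peers.filterNot(preferred.contains))
      .take(numReplicas - preferred.size)
    (preferred ++ remaining).take(numReplicas)
  }

  def main(args: Array[String]): Unit = {
    val peers = Seq(
      Peer("exec-1", "host-a", Some("/rack-1")),
      Peer("exec-2", "host-b", Some("/rack-1")),
      Peer("exec-3", "host-c", Some("/rack-2")),
      Peer("exec-4", "host-d", Some("/rack-2")))
    // With 2 replicas beyond the candidate, expect one /rack-1 and one /rack-2 peer.
    println(prioritize(Some("/rack-1"), peers, numReplicas = 2, new Random(7)))
  }
}
```

Checking set membership (one in-rack, one out-of-rack) rather than fixed positions is also what the earlier test discussion in this thread settles on.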
How was this patch tested?
This patch was tested with unit tests for storage, along with new unit tests to verify prioritization behaviour.