Conversation

@shubhamchopra
Contributor

What changes were proposed in this pull request?

This PR adds implementations of resilient block replication strategies for different resource managers, mirroring the 3-replica strategy used by HDFS: the first replica is placed on an executor, the second on a peer within the same rack as that executor, and the third on a peer in a different rack.
The implementation provides two pluggable classes: one runs in the driver and supplies topology information for every host at cluster start; the other prioritizes a list of peer BlockManagerIds.

The prioritization itself can be thought of as an optimization problem: find a minimal set of peers that satisfies certain objectives and replicate to those peers first. The objectives can express richer constraints over and above the HDFS-like 3-replica strategy.
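
To make this concrete, the sketch below models the two pluggable pieces in Scala. Every name in it (PeerId, TopologyResolver, prioritizePeers) is an illustrative stand-in, not the interface actually introduced by this PR.

```scala
import scala.util.Random

// Illustrative model only; the names below are not the PR's actual classes.
case class PeerId(host: String, port: Int, rack: Option[String])

// Pluggable piece 1: runs in the driver and resolves a host to its topology
// (e.g. rack) when the cluster starts.
trait TopologyResolver {
  def topologyForHost(host: String): Option[String]
}

// Pluggable piece 2: prioritizes candidate peers so that an HDFS-like
// placement (one replica within the rack, one outside it) is attempted first.
def prioritizePeers(self: PeerId, peers: Seq[PeerId], rnd: Random): Seq[PeerId] = {
  val (inRack, outOfRack) = peers.partition(p => p.rack.isDefined && p.rack == self.rack)
  val preferred = rnd.shuffle(inRack).take(1) ++ rnd.shuffle(outOfRack).take(1)
  // Any remaining peers follow in random order, covering replication factors > 3.
  preferred ++ rnd.shuffle(peers.filterNot(p => preferred.contains(p)))
}
```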

How was this patch tested?

This patch was tested with unit tests for storage, along with new unit tests to verify prioritization behaviour.

@shubhamchopra
Contributor Author

Based on feedback from @rxin, added a Basic Strategy that replicates HDFS behavior as a simpler alternative to the constraint solver. I also ran some performance tests on the constraint solver and saw these numbers:
[image: constraint solver benchmark results (average/min/max optimizer run times)]

The times show the average, min, and max of 50 runs of the optimizer for 50, 100, ..., 100000 peers placed in an appropriate number of racks. When blocks are being replicated, the majority of the time is expected to be spent in the actual data movement across the network, so these numbers suggest the performance hit from the constraint solver should be minimal.

@shubhamchopra
Contributor Author

Rebased to master to resolve a merge conflict.

@shubhamchopra shubhamchopra changed the title [SPARK-15354] [CORE] [WIP] Topology aware block replication strategies [SPARK-15354] [CORE] Topology aware block replication strategies Jan 23, 2017
@sameeragarwal
Member

jenkins ok to test

@SparkQA

SparkQA commented Jan 31, 2017

Test build #72188 has started for PR 13932 at commit 93eb511.

@shubhamchopra
Contributor Author

No test errors. It looks like the test process was killed midway. The tests added as part of this PR took less than 7 seconds, so they couldn't have caused the delay.

@sameeragarwal
Member

test this please

@SparkQA

SparkQA commented Feb 23, 2017

Test build #73311 has finished for PR 13932 at commit 93eb511.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 24, 2017

Test build #73435 has finished for PR 13932 at commit a35f673.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shubhamchopra
Contributor Author

Rebased to resolve merge conflicts.

@shubhamchopra
Contributor Author

test this please

@SparkQA

SparkQA commented Feb 27, 2017

Test build #73534 has started for PR 13932 at commit ec601bd.

Contributor

shall we use LinkedHashSet so that we don't need this extra shuffle?
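
For context on this suggestion: a LinkedHashSet deduplicates like a set but iterates in insertion order, so a sequence that is already prioritized comes back out unchanged. A tiny illustration, unrelated to the PR's actual code:

```scala
import scala.collection.mutable

// LinkedHashSet keeps insertion order, so an already-ordered sequence of
// peers needs no re-shuffling after being deduplicated.
val orderedPeers = mutable.LinkedHashSet("rack1-host1", "rack2-host1", "rack1-host2")
orderedPeers.toSeq  // Seq("rack1-host1", "rack2-host1", "rack1-host2"), order preserved
```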

Contributor

can we explain the replication logic for an arbitrary replication factor?

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Mar 22, 2017

Test build #75020 has finished for PR 13932 at commit ec601bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 23, 2017

Test build #75102 has finished for PR 13932 at commit 3d50cf3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

shall we add a .filter(_.host != blockManagerId.host)?

@shubhamchopra
Contributor Author
Mar 27, 2017

The master ensures that the list of peers sent to a block manager doesn't include the requesting block manager. Was that the intention here?
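
A minimal sketch of the invariant described above; the names are hypothetical, not the driver's actual method:

```scala
// Hypothetical sketch: the driver-side master excludes the requesting block
// manager from the peer list it returns, so callers need no extra host filter.
case class ManagerId(host: String, port: Int)

def getPeers(requester: ManagerId, allManagers: Seq[ManagerId]): Seq[ManagerId] =
  allManagers.filterNot(_ == requester)
```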

Contributor

The previous indentation was right.

Contributor

shall we test more explicitly that the first candidate is within rack and the second candidate is outside rack?

@shubhamchopra
Contributor Author
Mar 27, 2017

The intended behavior is to ensure that one replica is within the rack and one outside it, not necessarily in the first or second position.
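
A hedged sketch of what such an order-independent check could look like; the case class and helper are illustrative, not the suite's actual code:

```scala
// Illustrative only: assert that the chosen peers include at least one
// within-rack and one off-rack replica, regardless of their positions.
case class Peer(host: String, rack: String)

def assertPlacement(sourceRack: String, chosen: Seq[Peer]): Unit = {
  assert(chosen.exists(_.rack == sourceRack), "expected a within-rack replica")
  assert(chosen.exists(_.rack != sourceRack), "expected an off-rack replica")
}
```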

@cloud-fan
Contributor

LGTM except for a few minor comments

@sameeragarwal
Member
Mar 24, 2017

Given that you're already shuffling the sample here anyway, just out of curiosity: is there any advantage to using Robert Floyd's algorithm over (say) Fisher-Yates? Also, more generally, is space complexity really a concern here? Can't we just use r.shuffle(totalSize).take(sampleSize) for readability?

EDIT: Please ignore my first concern. I misread the code.

Contributor Author

I completely agree with you here, except that I was told earlier that iterating through a list the size of the executor set was a concern, so this was to address time complexity.

Member

But isn't the time complexity the same in both cases? It seems like they only differ in space complexity.
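
For reference on the trade-off being discussed, here is a hedged sketch of Robert Floyd's sampling technique (not the PR's actual helper): it draws a fixed number of distinct indices in time and space proportional to the sample size, whereas shuffling the full list and taking a prefix touches every element.

```scala
import scala.collection.mutable
import scala.util.Random

// Floyd's algorithm: pick `sampleSize` distinct values from 1..total using
// O(sampleSize) work and memory, without materializing or shuffling the range.
def floydSample(total: Int, sampleSize: Int, rnd: Random): Set[Int] = {
  val selected = mutable.Set.empty[Int]
  for (j <- (total - sampleSize + 1) to total) {
    val t = rnd.nextInt(j) + 1                  // uniform in 1..j
    selected += (if (selected.contains(t)) j else t)
  }
  selected.toSet
}
```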

@SparkQA

SparkQA commented Mar 27, 2017

Test build #75264 has finished for PR 13932 at commit e70c2a1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Commits added to this PR (titles truncated in the timeline):
  • …y to get peers making sure objectives previously satisfied are not violated.
  • …nagerReplicationSuite, to also run the same set of tests when using the basic strategy. Added a couple of specific test cases to verify prioritization.
  • …sed replication strategy and constraint solver associate with it.
@shubhamchopra
Contributor Author

Rebased to master

@SparkQA

SparkQA commented Mar 28, 2017

Test build #75315 has finished for PR 13932 at commit c465aaf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@cloud-fan
Contributor

I just merged a PR about the block manager, so I'm retesting this PR to make sure there is no conflict.

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75356 has finished for PR 13932 at commit c465aaf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit closed this in b454d44 Mar 30, 2017
Inline comment thread on the diff, near test("Peers in 2 racks"):

Contributor

fixed by #17624
