[SPARK-2288] Hide ShuffleBlockManager behind ShuffleManager #1241

colorant · 2014-06-27T02:19:30Z

By hiding ShuffleBlockManager behind ShuffleManager, we decouple the management of the shuffle data's block mapping from DiskBlockManager. This gives a cleaner interface and makes it easier for other shuffle managers to implement their own block management logic. The JIRA ticket has more details.
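To make the intended layering concrete, here is a minimal sketch, assuming simplified stand-in types. The trait and class names mirror the ones discussed in this PR, but the method signatures, `InMemoryShuffleBlockManager`, and `ToyShuffleManager` are illustrative assumptions, not the code in the patch.

```scala
// Hypothetical, simplified types for illustration; not the actual Spark API.
final case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)

// Maps a logical shuffle block to wherever its bytes live.
trait ShuffleBlockManager {
  def getBytes(blockId: ShuffleBlockId): Option[Array[Byte]]
  def stop(): Unit
}

// Each shuffle manager supplies its own block management logic,
// so callers never need to go through a disk-level block manager.
trait ShuffleManager {
  def shuffleBlockManager: ShuffleBlockManager
}

// Toy in-memory implementation standing in for a file-backed one.
final class InMemoryShuffleBlockManager extends ShuffleBlockManager {
  private val blocks =
    scala.collection.mutable.Map.empty[ShuffleBlockId, Array[Byte]]

  def put(blockId: ShuffleBlockId, bytes: Array[Byte]): Unit =
    blocks(blockId) = bytes

  override def getBytes(blockId: ShuffleBlockId): Option[Array[Byte]] =
    blocks.get(blockId)

  override def stop(): Unit = blocks.clear()
}

final class ToyShuffleManager extends ShuffleManager {
  override val shuffleBlockManager: InMemoryShuffleBlockManager =
    new InMemoryShuffleBlockManager
}
```

A different shuffle manager could return a completely different `ShuffleBlockManager` without any change to the code that looks blocks up.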

AmplabJenkins · 2014-06-27T02:20:24Z

Merged build triggered.

AmplabJenkins · 2014-06-27T02:20:30Z

Merged build started.

AmplabJenkins · 2014-06-27T02:30:35Z

Merged build finished.

AmplabJenkins · 2014-06-27T02:30:35Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16183/

AmplabJenkins · 2014-06-27T03:00:24Z

Merged build triggered.

AmplabJenkins · 2014-06-27T03:00:30Z

Merged build started.

rxin · 2014-06-27T03:50:55Z

core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala

add @return explaining what the boolean return value means
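As an illustration of the kind of Scaladoc being requested, here is a small sketch. The comment above does not name the method, so `unregisterShuffle`, `ShuffleRegistry`, and `SetBackedRegistry` are assumptions chosen only to show a documented Boolean return value.

```scala
// Hypothetical trait; the method name and semantics are assumptions.
trait ShuffleRegistry {
  /**
   * Remove a shuffle's metadata.
   *
   * @param shuffleId id of the shuffle to remove
   * @return true if metadata for this shuffle existed and was removed,
   *         false if nothing was registered under this id
   */
  def unregisterShuffle(shuffleId: Int): Boolean
}

// Toy implementation so the documented contract can be exercised.
final class SetBackedRegistry extends ShuffleRegistry {
  private val registered = scala.collection.mutable.Set.empty[Int]

  def register(shuffleId: Int): Unit = {
    registered += shuffleId
    ()
  }

  override def unregisterShuffle(shuffleId: Int): Boolean =
    registered.remove(shuffleId)
}
```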

AmplabJenkins · 2014-06-27T04:37:41Z

Merged build finished.

AmplabJenkins · 2014-06-27T04:37:41Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16190/

colorant · 2014-06-27T05:08:55Z

core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockManager.scala

Yeah, I see your concern; I thought about this too. In this PR it is mostly used internally by HashShuffleManager's writer. However, we can also treat it as a way to expose the internal storage objects for shortcut usage, such as the current Netty-based shuffle sender. Without this interface, that is hard to implement without introducing possibly many more interfaces. To keep it simple, I offer the option of exposing the ShuffleBlockManager.

As for the "location" concept, it might not be meaningful, but a BlockObjectId might be a good fit for all possible ShuffleManagers. After all, you are handling some object, whether it is a File, a Stream, or whatever you save your data to. So a ShuffleBlockManager itself might still be needed to access this object in certain shortcut cases for a simpler API, and you can name the method getDataObjectHandle or whatever fits. I also have a PR for this idea, which generalizes the object and passes around an ObjectID for different storage types, at #1209.

So does this make any sense to you ;) Still, I agree that if we could find a better way to solve the Netty block sender problem, this could be hidden.

@rxin, how about we also hide the current BlockFetcherIterator kind of thing behind ShuffleManager, since a specific ShuffleManager does not necessarily use the current fetcher approach to get shuffle data? Each ShuffleManager should implement its own shuffle logic, while some could reuse the same logic; for example, a file-based one could reuse the current implementation. This way we can solve the above problem and have a better chance of not exposing ShuffleBlockManager; a read/write interface for the shuffle reader/writer would be enough.
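The proposal above can be sketched roughly as follows, under stated assumptions: all names here (`PluggableShuffleManager`, `LocalShuffleManager`, and the simplified `ShuffleWriter` and `ShuffleReader` traits) are hypothetical, and the in-memory store only stands in for whatever fetch logic a real manager would hide. It shows callers touching only a reader/writer pair, never a block manager or fetcher.

```scala
// Hypothetical names throughout; a sketch of the proposal, not Spark's API.
trait ShuffleWriter[K, V] {
  def write(records: Iterator[(K, V)]): Unit
}

trait ShuffleReader[K, V] {
  def read(): Iterator[(K, V)]
}

// Callers see only readers and writers; how data is stored and
// fetched is private to each implementation.
trait PluggableShuffleManager {
  def getWriter[K, V](shuffleId: Int, mapId: Int): ShuffleWriter[K, V]
  def getReader[K, V](shuffleId: Int, reduceId: Int): ShuffleReader[K, V]
}

// Toy implementation that keeps records in memory, keyed by
// (shuffleId, reduceId). A file-based manager could reuse the
// existing fetch path here instead.
final class LocalShuffleManager(numReducers: Int) extends PluggableShuffleManager {
  private val store =
    scala.collection.mutable.Map.empty[(Int, Int), Vector[(Any, Any)]]

  override def getWriter[K, V](shuffleId: Int, mapId: Int): ShuffleWriter[K, V] =
    new ShuffleWriter[K, V] {
      override def write(records: Iterator[(K, V)]): Unit =
        records.foreach { case (k, v) =>
          // Hash-partition each record by key.
          val reduceId = java.lang.Math.floorMod(k.hashCode, numReducers)
          val slot = (shuffleId, reduceId)
          store(slot) = store.getOrElse(slot, Vector.empty[(Any, Any)]) :+ ((k, v))
        }
    }

  override def getReader[K, V](shuffleId: Int, reduceId: Int): ShuffleReader[K, V] =
    new ShuffleReader[K, V] {
      override def read(): Iterator[(K, V)] =
        store
          .getOrElse((shuffleId, reduceId), Vector.empty[(Any, Any)])
          .iterator
          .map { case (k, v) => (k.asInstanceOf[K], v.asInstanceOf[V]) }
    }
}
```

With this shape, replacing the fetch strategy means providing another `PluggableShuffleManager`, and nothing outside it needs to know whether blocks, files, or streams are involved.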

AmplabJenkins · 2014-06-27T06:30:25Z

Merged build triggered.

AmplabJenkins · 2014-06-27T06:30:32Z

Merged build started.

AmplabJenkins · 2014-06-27T07:15:19Z

Merged build finished.

AmplabJenkins · 2014-06-27T07:15:20Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16199/

AmplabJenkins · 2014-06-30T03:15:35Z

Merged build triggered.

AmplabJenkins · 2014-06-30T03:15:45Z

Merged build started.

AmplabJenkins · 2014-06-30T04:15:35Z

Merged build triggered.

AmplabJenkins · 2014-06-30T04:15:40Z

Merged build started.

colorant · 2014-06-30T04:24:33Z

@rxin Moved the getBlockLocation method from ShuffleBlockManager to HashShuffleBlockManager to make the interface more general. Does the current interface look reasonable to you?

Also, some shuffle-related code could still be moved from the block manager into specific ShuffleManager implementation classes (e.g. blockManager.getMultiple). But since it is not tightly related to this ShuffleBlockManager generalization work, and I am not sure whether other ShuffleManager implementations will reuse it, I left it as is; it could be done in a future PR, I guess.

AmplabJenkins · 2014-06-30T05:16:50Z

Merged build finished.

AmplabJenkins · 2014-06-30T05:16:50Z

Merged build finished.

AmplabJenkins · 2014-06-30T05:16:50Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16253/

AmplabJenkins · 2014-06-30T05:16:50Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16256/

rxin · 2014-06-30T05:19:39Z

Thanks - we are all super busy with Spark Summit this week so probably will get to this later in the week... feel free to send a reminder if I don't revisit this towards the end of the week.

colorant · 2014-07-07T02:10:38Z

ping @rxin ;)

AmplabJenkins · 2014-07-08T07:31:06Z

Merged build triggered.

AmplabJenkins · 2014-07-08T07:31:15Z

Merged build started.

rxin · 2014-07-08T07:40:50Z

Sorry will take a look tomorrow!

AmplabJenkins · 2014-07-08T08:17:30Z

Merged build finished.

AmplabJenkins · 2014-07-08T08:17:30Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16396/

rxin · 2014-08-27T06:28:14Z

Thanks for doing this. To help with the review, can you write a short design doc discussing the interfaces between different components, similar to the one attached here https://issues.apache.org/jira/browse/SPARK-3019 ?

rxin · 2014-08-27T06:34:33Z

core/src/main/scala/org/apache/spark/storage/DiskStore.scala

Can you add a TODO here noting that getValues should bypass getBytes and use stream-based APIs? Otherwise this uses a lot of memory during external sort merge.
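To show why a stream-based read path uses less memory, here is a minimal sketch that decodes records lazily from a buffered file stream instead of first loading the whole file into a byte array. It is not DiskStore code: `StreamingRead`, `writeValues`, and `readValues` are hypothetical names, and `DataInputStream.readUTF` stands in for Spark's serializer.

```scala
import java.io.{BufferedInputStream, BufferedOutputStream, DataInputStream,
  DataOutputStream, EOFException, File, FileInputStream, FileOutputStream}

// Hypothetical helper; illustrates streaming reads, not the DiskStore API.
object StreamingRead {
  // Write each value as a length-prefixed UTF string.
  def writeValues(file: File, values: Iterator[String]): Unit = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)))
    try values.foreach(v => out.writeUTF(v)) finally out.close()
  }

  // Decode one record at a time, so memory use is bounded by the
  // stream buffer rather than by the file size. The stream is closed
  // once the end of the file is reached.
  def readValues(file: File): Iterator[String] = new Iterator[String] {
    private val in =
      new DataInputStream(new BufferedInputStream(new FileInputStream(file)))
    private var nextValue: Option[String] = fetch()

    private def fetch(): Option[String] =
      try Some(in.readUTF())
      catch { case _: EOFException => in.close(); None }

    override def hasNext: Boolean = nextValue.isDefined

    override def next(): String = {
      val value = nextValue.getOrElse(throw new NoSuchElementException("end of stream"))
      nextValue = fetch()
      value
    }
  }
}
```

A merge step can then consume several such iterators at once while holding only one pending record per file. Note that a caller who abandons the iterator early would leave the stream open, so a real implementation would need an explicit close hook.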

colorant · 2014-08-28T08:20:35Z

Jenkins, test this please

colorant · 2014-08-29T06:20:44Z

Jenkins, test this please.

rxin · 2014-08-29T06:21:21Z

Jenkins, ok to test.

SparkQA · 2014-08-29T06:24:18Z

QA tests have started for PR 1241 at commit 0e01ae3.

This patch merges cleanly.

SparkQA · 2014-08-29T07:19:36Z

QA tests have finished for PR 1241 at commit 0e01ae3.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2014-08-30T05:50:26Z

core/src/main/scala/org/apache/spark/storage/BlockId.scala

if the reduce id is always 0, why even bother defining it?

rxin · 2014-08-30T06:05:07Z

Merging this now. I will take care of some minor things myself. Thanks!

By hiding ShuffleBlockManager behind ShuffleManager, we decouple the management of the shuffle data's block mapping from DiskBlockManager. This gives a cleaner interface and makes it easier for other shuffle managers to implement their own block management logic. The JIRA ticket has more details.

Author: Raymond Liu <[email protected]>

Closes apache#1241 from colorant/shuffle and squashes the following commits:

0e01ae3 [Raymond Liu] Move ShuffleBlockmanager behind shuffleManager


colorant force-pushed the shuffle branch from 26d65c5 to 211a018 Compare August 28, 2014 08:01

Move ShuffleBlockmanager behind shuffleManager

0e01ae3

colorant force-pushed the shuffle branch from 211a018 to 0e01ae3 Compare August 29, 2014 06:01


asfgit closed this in acea928 Aug 30, 2014
