[SPARK-8029][core] shuffle output per attempt #6648
Conversation
…rtial fix, still have some concurrent attempts
…e actual data is in the middle of it
…ts for the same stage
Conflicts:
	core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
	core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
Conflicts: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
…les, tests do not)
Conflicts: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
super nit: comment should follow javadoc formatting:
/**
* Comment.
*/
Looks sane, but this isn't really my area of expertise. Just a reminder that you should either enable DAGSchedulerFailureRecoverySuite or remove it from the patch. Also, left a question about backwards compatibility.
Test build #43401 has finished for PR 6648 at commit
Test build #43470 has finished for PR 6648 at commit
Jenkins, retest this please
Conflicts: project/MimaExcludes.scala
Test build #43475 has finished for PR 6648 at commit
@vanzin @JoshRosen made the external shuffle service backwards compatible and got rid of DAGSchedulerFailureRecoverySuite.
I looked at the diffs since my last review; looks good.
I will get @JoshRosen to take a look at this.
Hey Imran, given the number of changes required for this approach, I wonder whether an atomic rename design wouldn't be simpler (in particular, the "first attempt wins" approach in the doc). The doc seems to be worried that a file output might be corrupted, but in that case, why not send a message to the node asking it to delete its old output files, and then send a new map task? It can just be the delete-block message that the block manager already supports. This seems much nicer because it doesn't require any changes to the data structures in the rest of Spark.
BTW, with that design, I also wouldn't even implement the delete message in the first patch, unless we've actually seen block corruptions happen; but it sounds like we haven't seen such things and we probably wouldn't have a great way to detect them now anyway (i.e. the reduce task would mark a fetch successful and just crash).
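For readers unfamiliar with the alternative being suggested, here is a minimal Scala sketch of the "first attempt wins" idea. This is illustrative only, not code from Spark or from this patch; the helper name and the marker-file scheme are assumptions. Each attempt writes to its own temporary file, and the first attempt to claim a commit marker renames its output into place; later attempts discard their temporary output.

import java.nio.file.{FileAlreadyExistsException, Files, Path, StandardCopyOption}

// Hypothetical helper, not Spark's API: commits a task attempt's shuffle
// output so that only the first committing attempt's file survives.
object FirstAttemptWins {
  /** Returns true if this attempt's output became the canonical one. */
  def commit(tmp: Path, dest: Path): Boolean = {
    val marker = dest.resolveSibling(dest.getFileName.toString + ".committed")
    try {
      // Files.createFile is an atomic check-and-create, so exactly one
      // attempt can claim the marker for a given destination.
      Files.createFile(marker)
      // Only the winning attempt moves its temp file into place.
      Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE)
      true
    } catch {
      case _: FileAlreadyExistsException =>
        // Some other attempt already committed; discard this attempt's output.
        Files.deleteIfExists(tmp)
        false
    }
  }
}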
Test build #45389 has finished for PR 6648 at commit
Jenkins, retest this please
Test build #45437 has finished for PR 6648 at commit
Jenkins, retest this please
Test build #45484 has started for PR 6648 at commit
Jenkins, retest this please
Test build #45528 has finished for PR 6648 at commit
Jenkins, retest this please
Test build #45533 has finished for PR 6648 at commit
https://issues.apache.org/jira/browse/SPARK-8029
This implements one of the approaches in the design doc on the jira: now each ShuffleMapTask attempt writes to a different location. ShuffleBlockId is extended to include the stage attempt id, so the fetch side knows which files to read from. MapStatus also includes the stage attempt, so now there is one MapStatus per (executor, attempt) as opposed to one per executor. This won't really matter when there is just one attempt per stage. In a pathological case, you'd end up with one MapStatus per partition, which would be much worse, but that is very unlikely.
This touches a lot of files, but almost all of the changes are just plumbing a stageAttemptId through a lot of different places.
cc @JoshRosen
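To make the description above concrete, here is a rough Scala sketch of what carrying the stage attempt in the shuffle block identity could look like. The field name and block-name encoding are assumptions for illustration; this is not the actual patch, and the real ShuffleBlockId extends Spark's BlockId rather than standing alone.

// Illustrative only: a standalone case class, not Spark's actual BlockId subclass.
case class ShuffleBlockId(
    shuffleId: Int,
    stageAttemptId: Int, // new field: distinguishes output from different stage attempts
    mapId: Int,
    reduceId: Int) {
  // The attempt id becomes part of the on-disk block name, so different
  // attempts write to different files and the fetch side can ask for the
  // output of a specific attempt.
  def name: String = s"shuffle_${shuffleId}_${stageAttemptId}_${mapId}_${reduceId}"
}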