Conversation

@ueshin
Member

@ueshin ueshin commented Apr 25, 2016

What changes were proposed in this pull request?

DAGScheduler sometimes generates an incorrect stage graph.

Suppose you have the following DAG:

```
[A] <--(s_A)-- [B] <--(s_B)-- [C] <--(s_C)-- [D]
            \                /
              <-------------
```

Note: [] means an RDD, () means a shuffle dependency.

Here, RDD B has a shuffle dependency on RDD A, and RDD C has shuffle dependencies on both B and A. The shuffle dependency IDs are numbers in the DAGScheduler, but to make the example easier to understand, let's call the shuffle dependency that produces the shuffled data from A s_A, and the one that produces the shuffled data from B s_B.
The getAncestorShuffleDependencies method in DAGScheduler (incorrectly) does not check for duplicates when adding ShuffleDependencies to the parents data structure, so for this DAG, when getAncestorShuffleDependencies is called on C (the RDD just before the final RDD), it returns s_A, s_B, s_A: s_A gets added twice, once when the method visits RDD C and once when it visits RDD B. This is problematic because this line of code, https://github.com/apache/spark/blob/8ef3399/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L289, then generates a new shuffle stage for each dependency returned by getAncestorShuffleDependencies, resulting in duplicate map stages that compute the map output from RDD A.
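For reference, here is a condensed sketch of the traversal as it stood before this patch (simplified from DAGScheduler.scala, not the exact source; shuffleToMapStage is the scheduler's registry of already-created map stages). The visited set de-duplicates RDDs, not dependencies, so nothing stops the same ShuffleDependency from being pushed once per RDD that declares it:

```scala
import scala.collection.mutable.{HashSet, Stack}

// Condensed sketch of the pre-fix traversal.
private def getAncestorShuffleDependencies(rdd: RDD[_]): Stack[ShuffleDependency[_, _, _]] = {
  val parents = new Stack[ShuffleDependency[_, _, _]]
  val visited = new HashSet[RDD[_]]
  val waitingForVisit = new Stack[RDD[_]]
  waitingForVisit.push(rdd)
  while (waitingForVisit.nonEmpty) {
    val toVisit = waitingForVisit.pop()
    if (!visited(toVisit)) {
      visited += toVisit
      for (dep <- toVisit.dependencies) {
        dep match {
          case shufDep: ShuffleDependency[_, _, _] =>
            if (!shuffleToMapStage.contains(shufDep.shuffleId)) {
              // Bug: no duplicate check on the dependency itself, so s_A is
              // pushed while visiting C and pushed again while visiting B.
              parents.push(shufDep)
            }
          case _ =>
        }
        waitingForVisit.push(dep.rdd)
      }
    }
  }
  parents
}
```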

As a result, DAGScheduler generates the following stages and their parents for each shuffle:

| | stage | parents |
|----|----|----|
| s_A | ShuffleMapStage 2 | List() |
| s_B | ShuffleMapStage 1 | List(ShuffleMapStage 0) |
| s_C | ShuffleMapStage 3 | List(ShuffleMapStage 1, ShuffleMapStage 2) |
| - | ResultStage 4 | List(ShuffleMapStage 3) |

The stage for s_A should be ShuffleMapStage 0, but it is generated twice: ShuffleMapStage 0 is created first and then overwritten by ShuffleMapStage 2, while ShuffleMapStage 1 keeps referring to the old ShuffleMapStage 0.

This patch fixes that.

How was this patch tested?

I added the sample RDD graph above to DAGSchedulerSuite to demonstrate the duplicated stages.
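Roughly, the test builds the DAG above with the suite's MyRDD helper and submits the final RDD; the following is a hedged sketch of that shape (helper signatures and the expected stage numbering are assumptions, not the exact patch):

```scala
// Sketch of the regression test: build the DAG from the description and
// check that each shuffle gets exactly one stage.
val rddA = new MyRDD(sc, 1, Nil)
val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(1))

val rddB = new MyRDD(sc, 1, List(shuffleDepA), tracker = mapOutputTracker)
val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(1))

// C depends on both s_A and s_B; this shared ancestor is what used to make
// getAncestorShuffleDependencies return s_A twice.
val rddC = new MyRDD(sc, 1, List(shuffleDepA, shuffleDepB), tracker = mapOutputTracker)
val shuffleDepC = new ShuffleDependency(rddC, new HashPartitioner(1))

val rddD = new MyRDD(sc, 1, List(shuffleDepC), tracker = mapOutputTracker)
submit(rddD, Array(0))

// Expected after the fix: s_A -> ShuffleMapStage 0, s_B -> ShuffleMapStage 1,
// s_C -> ShuffleMapStage 2, ResultStage 3 -- and no duplicated map stage for A.
```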

val parents = new Stack[ShuffleDependency[_, _, _]]
/**
* Find ancestor shuffle dependencies that are not registered in shuffleToMapStage yet.
* This is done in topological order to create ancestor stages first to ensure that the result
Contributor

Can you move this comment somewhere inside the method? It doesn't seem relevant to someone using this method, since the order of the returned ShuffleDependencies doesn't matter.

@markhamstra
Contributor

I think some of the terminology used in this and related PRs is confusing the issues. When @kayousterhout and I ask about "correctness", what we are fundamentally concerned about is whether evaluation of the DAG produces the correct data elements. I don't think that your description of "incorrect" or "illegal" graphs is meant to imply that incorrect data is produced from their evaluation. Correct me if I am wrong, but I think that you are talking exclusively about graphs that are not optimal, causing duplication of effort and preventing further optimizations -- graphs that are taking longer to evaluate than is necessary, not graphs that are producing incorrect data elements.

If I am thinking correctly about this, then the entire effect of this and related PRs is to improve or optimize the DAGScheduler, not to create graphs and schedules that produce different end results than the DAGScheduler does now.

@SparkQA

SparkQA commented Apr 25, 2016

Test build #56868 has finished for PR 12655 at commit 3a8ff84.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 25, 2016

Test build #56873 has finished for PR 12655 at commit cab5264.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member Author

ueshin commented Apr 25, 2016

@markhamstra Thank you for your comment.
I thought the non-optimal state in DAGScheduler was a kind of bug, so I used the words "incorrect" and "illegal", but now I see what you meant.
So yes, I was talking only about the graphs, and I want to improve DAGScheduler performance with this PR and #12060.

@kayousterhout
Contributor

kayousterhout commented Apr 25, 2016

I just updated the JIRA with what I understand to be the issue. Can you take a look and let me know if that's correct? If the simpler example I showed is sufficient to reproduce the issue (and my explanation is correct), can you simplify the unit test to use that example, and also update the JIRA and pull request description to have that text?

Also, if that is a correct explanation of the issue, I think there is a simpler fix than the one you did. What about renaming the method getAncestorShuffleDependencies to createAncestorShuffleMapStages (and having it not return anything)? Then in that method, instead of adding each shuffle dependency to parents, immediately create the shuffle stage there (using the line of code that's currently here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L289). That way, the check for whether shuffleIdToMapStage already contains the dependency will work correctly, so we don't create duplicate stages (and that is a simpler change).
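To make that concrete, here is a rough sketch of the suggested shape (the method name is from the comment above; the body is illustrative, not an actual patch). Registering each stage at the moment its dependency is discovered lets the shuffleIdToMapStage check itself prevent duplicates:

```scala
import scala.collection.mutable.{HashSet, Stack}

// Rough sketch of the proposed createAncestorShuffleMapStages: create each
// missing ancestor stage as soon as it is found, instead of collecting the
// dependencies and returning them to the caller.
private def createAncestorShuffleMapStages(rdd: RDD[_], firstJobId: Int): Unit = {
  val visited = new HashSet[RDD[_]]
  val waitingForVisit = new Stack[RDD[_]]
  waitingForVisit.push(rdd)
  while (waitingForVisit.nonEmpty) {
    val toVisit = waitingForVisit.pop()
    if (!visited(toVisit)) {
      visited += toVisit
      for (dep <- toVisit.dependencies) {
        dep match {
          case shufDep: ShuffleDependency[_, _, _] =>
            if (!shuffleIdToMapStage.contains(shufDep.shuffleId)) {
              // The stage is registered immediately, so reaching the same
              // dependency a second time is a no-op rather than a new stage.
              shuffleIdToMapStage(shufDep.shuffleId) =
                newOrUsedShuffleStage(shufDep, firstJobId)
            }
          case _ =>
        }
        waitingForVisit.push(dep.rdd)
      }
    }
  }
}
```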

@ueshin
Member Author

ueshin commented Apr 26, 2016

@kayousterhout
I'll update the unit test, the JIRA, and the PR description to use the simpler example.

As for the createAncestorShuffleMapStages approach, I think it risks a StackOverflowError for a long job: the master-branch version of getAncestorShuffleDependencies finds descendants first, and a descendant with a long lineage can have a lot of ancestors. If we immediately create the shuffle stage there via the newOrUsedShuffleStage method, which builds all ancestor stages in a recursive-call fashion, a StackOverflowError will be thrown.
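As a hypothetical illustration of that risk (the numbers are arbitrary), consider a job whose lineage chains thousands of shuffles; creating the deepest stage first would recurse once per ancestor:

```scala
// Hypothetical deep-lineage job, assuming a SparkContext `sc`. Each loop
// iteration adds one ShuffleDependency, so creating the deepest stage first
// via recursive newOrUsedShuffleStage calls would need one stack frame per
// ancestor shuffle.
var rdd = sc.parallelize(1 to 100).map(x => (x % 10, x))
for (_ <- 1 to 10000) {
  // map(identity) discards the partitioner, forcing reduceByKey to shuffle.
  rdd = rdd.map(identity).reduceByKey(_ + _)
}
rdd.count()  // ~10000 ancestor shuffles; child-first stage creation would overflow the stack
```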

@kayousterhout
Contributor

I see -- now I understand the motivation for returning the shuffle dependencies topologically sorted, because it limits the depth of the recursion (it looks like the old code was trying to do that with the stack, but didn't quite get it right?). Let me think about whether there's a simpler way to accomplish that.

@ueshin
Member Author

ueshin commented Apr 26, 2016

@kayousterhout
I added a comment to JIRA.
Could you take a look at it and let me know which example I should use for the description, the original one or the "simpler" one? (The "simpler" one turns out not to be much simpler than the original.)

@markhamstra
Contributor

It looks like #8923 by @suyanNone fixes at least the test that this PR adds to the DAGSchedulerSuite, and does so much more simply.

@kayousterhout
Contributor

Thanks for pointing that out @markhamstra. I'd be in favor of that solution (but augmented with a clearer test case -- which might just mean adding a nice ascii-art description of the DAG in the test case). What do you think, Mark? @ueshin, do you see any issue with the simpler approach in #8923?

@SparkQA

SparkQA commented May 4, 2016

Test build #57709 has finished for PR 12655 at commit ab92488.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 9, 2016

Test build #58119 has finished for PR 12655 at commit b4e2eb1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin ueshin changed the title [SPARK-13902][SCHEDULER] Make DAGScheduler.getAncestorShuffleDependencies() return in topological order to ensure building ancestor stages first. [SPARK-13902][SCHEDULER] Make DAGScheduler not to create duplicate stage. May 9, 2016
@SparkQA

SparkQA commented May 9, 2016

Test build #58121 has finished for PR 12655 at commit 55d6b6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member Author

ueshin commented May 9, 2016

@kayousterhout I simplified the test and updated the JIRA and the PR title and description.
Please take another look and let me know what else needs to be done before this PR can be merged.

*
* Note: [] means an RDD, () means a shuffle dependency.
*/
test("[SPARK-13902] not to create duplicate stage.") {
Contributor

Can you change this to "[SPARK-13902] Ensure no duplicate stages are created"?

@kayousterhout
Contributor

This change looks good to me at this point (with the one small test name change). @markhamstra are you satisfied with this solution for now, and more significant clean up of this code path can be done in a later PR?

@markhamstra
Contributor

LGTM

@SparkQA

SparkQA commented May 12, 2016

Test build #58429 has finished for PR 12655 at commit 3ceb4d5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member Author

ueshin commented May 12, 2016

Jenkins, retest this please.

@SparkQA

SparkQA commented May 12, 2016

Test build #58438 has finished for PR 12655 at commit 3ceb4d5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Sorry, there was a build break in master.

retest this please

@ueshin
Member Author

ueshin commented May 12, 2016

Jenkins, retest this please.

@SparkQA

SparkQA commented May 12, 2016

Test build #58452 has finished for PR 12655 at commit 3ceb4d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 12, 2016

I took another look -- we should probably merge this into 2.0 too. Kay, can you do that? Thanks!

asfgit pushed a commit that referenced this pull request May 12, 2016
@kayousterhout
Contributor

Merged into 2.0. Thanks @ueshin for your work on this and for bearing with us as we agreed on the simplest solution -- awesome to have this fixed!

@ueshin
Member Author

ueshin commented May 13, 2016

@kayousterhout, @markhamstra Thanks a lot!
When you have time, could you go back to #12060?

zzcclp added a commit to zzcclp/spark that referenced this pull request May 13, 2016
ueshin added a commit to ueshin/apache-spark that referenced this pull request May 27, 2016
ueshin added a commit to ueshin/apache-spark that referenced this pull request Jun 28, 2016