[SPARK-13902][SCHEDULER] Make DAGScheduler not to create duplicate stage. #12655
Conversation
…cal order to ensure building ancestor stages first.
This reverts commit 1636531.
```scala
val parents = new Stack[ShuffleDependency[_, _, _]]
```
```scala
/**
 * Find ancestor shuffle dependencies that are not registered in shuffleToMapStage yet.
 * This is done in topological order to create ancestor stages first to ensure that the result
```
Can you move this comment to somewhere inside the method? It doesn't seem relevant to someone using this method, since the order of the ShuffleDependencies returned doesn't matter.
I think some of the terminology used in this and related PRs is confusing the issues. When @kayousterhout and I ask about "correctness", what we are fundamentally concerned about is whether evaluation of the DAG produces the correct data elements. I don't think that your description of "incorrect" or "illegal" graphs is meant to imply that incorrect data is produced from their evaluation. Correct me if I am wrong, but I think that you are talking exclusively about graphs that are not optimal, causing duplication of effort and preventing further optimizations -- graphs that are taking longer to evaluate than is necessary, not graphs that are producing incorrect data elements. If I am thinking correctly about this, then the entire effect of this and related PRs is to improve or optimize the DAGScheduler, not to create graphs and schedules that produce different end results than the DAGScheduler does now.
Test build #56868 has finished for PR 12655 at commit
Test build #56873 has finished for PR 12655 at commit
@markhamstra Thank you for your comment.
I just updated the JIRA with what I understand to be the issue. Can you take a look and let me know if that's correct? If the simpler example I showed is sufficient to reproduce the issue (and my explanation is correct), can you simplify the unit test to use that example, and also update the JIRA and pull request description to have that text? Also, if that is a correct explanation of the issue, I think there is a simpler fix than the one you did. What about changing the method getAncestorShuffleDependencies to instead be called createAncestorShuffleMapStages (and have it not return anything)? Then in that method, instead of adding each shuffle dependency to parents, immediately create the shuffle stage there (using the line of code that's currently here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L289). That way, the check for whether shuffleIdToMapStage already contains the dependency will work correctly so we don't create duplicate stages (and that is a simpler change).
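A rough, self-contained sketch of the eager-creation idea described in the comment above (a toy model with made-up names such as `Node`, `Registry`, and `createAncestorShuffleStages`, not actual `DAGScheduler` code): because each map stage is registered the moment its shuffle dependency is first seen, the already-registered check stays accurate, so a dependency reachable along two paths is registered only once.

```scala
import scala.collection.mutable

// Toy model of an RDD lineage: each node lists its parents, and a parent edge may
// carry a shuffle id (Some(id)) or be a narrow dependency (None).
case class Node(name: String, parents: Seq[(Option[Int], Node)])

final class Registry {
  val shuffleIdToStage = mutable.Map.empty[Int, String]

  // Register a map stage the moment a shuffle edge is first discovered.
  // Because registration happens immediately, a shuffle reachable along two paths
  // (like s_A in the example DAG) is registered exactly once.
  def createAncestorShuffleStages(rdd: Node): Unit = {
    val visited = mutable.Set.empty[String]
    val waiting = mutable.Stack(rdd)
    while (waiting.nonEmpty) {
      val node = waiting.pop()
      if (visited.add(node.name)) {
        node.parents.foreach { case (maybeShuffleId, parent) =>
          maybeShuffleId.foreach { id =>
            if (!shuffleIdToStage.contains(id)) {
              shuffleIdToStage(id) = s"ShuffleMapStage for shuffle $id"
            }
          }
          waiting.push(parent)
        }
      }
    }
  }
}

object EagerStageDemo extends App {
  // The DAG from the PR description: D <- C, C <- {A via s_A, B via s_B}, B <- A via s_A.
  val a = Node("A", Nil)
  val b = Node("B", Seq((Some(0), a)))               // s_A = 0
  val c = Node("C", Seq((Some(0), a), (Some(1), b))) // reads s_A and s_B
  val d = Node("D", Seq((Some(2), c)))               // s_C = 2

  val registry = new Registry
  registry.createAncestorShuffleStages(d)
  println(registry.shuffleIdToStage.size) // 3: one stage per shuffle, no duplicates
}
```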
@kayousterhout As for the
I see -- now I understand the motivation for returning the shuffle dependencies topologically sorted, because it limits the depth of the recursion (it looks like the old code was trying to do that with the stack, but didn't quite get it right?). Let me think about whether there's a simpler way to accomplish that.
@kayousterhout
It looks like #8923 by @suyanNone fixes at least the test that this PR adds to the DAGSchedulerSuite, and does so much more simply.
Thanks for pointing that out @markhamstra. I'd be in favor of that solution (but augmented with a clearer test case -- which might just mean adding a nice ASCII-art description of the DAG in the test case). What do you think, Mark? @ueshin do you see any issue with the simpler approach in #8923?
Test build #57709 has finished for PR 12655 at commit
Test build #58119 has finished for PR 12655 at commit
Test build #58121 has finished for PR 12655 at commit
@kayousterhout I simplified the test and updated the JIRA and the PR title and description.
```scala
 *
 * Note: [] means an RDD, () means a shuffle dependency.
 */
test("[SPARK-13902] not to create duplicate stage.")
```
Can you change this to "[SPARK-13902] Ensure no duplicate stages are created"?
This change looks good to me at this point (with the one small test name change). @markhamstra, are you satisfied with this solution for now, with more significant cleanup of this code path to be done in a later PR?
LGTM
Test build #58429 has finished for PR 12655 at commit
Jenkins, retest this please.
Test build #58438 has finished for PR 12655 at commit
Sorry, there was a build break in master. Retest this please.
Jenkins, retest this please.
Test build #58452 has finished for PR 12655 at commit
I took another look - we should probably merge this into 2.0 too. Kay, can you do that? Thanks!
…age.
## What changes were proposed in this pull request?
`DAGScheduler` sometimes generates an incorrect stage graph.
Suppose you have the following DAG:
```
[A] <--(s_A)-- [B] <--(s_B)-- [C] <--(s_C)-- [D]
  \                            /
   <--------------------------
```
Note: [] means an RDD, () means a shuffle dependency.
Here, RDD `B` has a shuffle dependency on RDD `A`, and RDD `C` has shuffle dependencies on both `B` and `A`. The shuffle dependency IDs are numbers in the `DAGScheduler`, but to make the example easier to understand, let's call the shuffle dependency that produces the shuffled data from `A` `s_A` and the one that produces the shuffled data from `B` `s_B`.
The `getAncestorShuffleDependencies` method in `DAGScheduler` (incorrectly) does not check for duplicates when it adds ShuffleDependencies to the parents data structure. So for this DAG, when `getAncestorShuffleDependencies` gets called on `C` (the RDD just before the final RDD), it will return `s_A`, `s_B`, `s_A` (`s_A` gets added twice: once when the method visits RDD `C`, and once when the method visits RDD `B`). This is problematic because this line of code: https://github.com/apache/spark/blob/8ef3399/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L289 then generates a new shuffle stage for each dependency returned by `getAncestorShuffleDependencies`, resulting in duplicate map stages that compute the map output from RDD `A`.
As a result, `DAGScheduler` generates the following stages and their parents for each shuffle:
| shuffle | stage | parents |
|----|----|----|
| s_A | ShuffleMapStage 2 | List() |
| s_B | ShuffleMapStage 1 | List(ShuffleMapStage 0) |
| s_C | ShuffleMapStage 3 | List(ShuffleMapStage 1, ShuffleMapStage 2) |
| - | ResultStage 4 | List(ShuffleMapStage 3) |
The stage for `s_A` should be `ShuffleMapStage 0`, but the stage for `s_A` is generated twice: `ShuffleMapStage 0` is created first and then overwritten by the duplicate `ShuffleMapStage 2`, while `ShuffleMapStage 1` keeps referring to the old `ShuffleMapStage 0`.
This patch fixes it.
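A minimal, self-contained sketch of the traversal described above (a toy model with made-up names such as `ToyRdd` and `ancestorShuffles`, not the real `getAncestorShuffleDependencies`): without a seen-set it collects `s_A`, `s_B`, `s_A`, which is exactly what leads to the duplicate map stage; with the seen-set each shuffle is collected once.

```scala
import scala.collection.mutable

object DuplicateShuffleDemo extends App {
  // Toy RDD: a name plus its shuffle dependencies as (shuffleId, parent) pairs.
  case class ToyRdd(name: String, shuffleDeps: Seq[(String, ToyRdd)])

  // Collect ancestor shuffle ids; `dedup = false` mimics the buggy behaviour where
  // every shuffle edge reached is appended, even if it was already collected.
  def ancestorShuffles(rdd: ToyRdd, dedup: Boolean): List[String] = {
    val result = mutable.ArrayBuffer.empty[String]
    val seenShuffles = mutable.Set.empty[String]
    val visitedRdds = mutable.Set.empty[String]
    val waiting = mutable.Stack(rdd)
    while (waiting.nonEmpty) {
      val r = waiting.pop()
      if (visitedRdds.add(r.name)) {
        r.shuffleDeps.foreach { case (id, parent) =>
          if (!dedup || seenShuffles.add(id)) result += id
          waiting.push(parent)
        }
      }
    }
    result.toList
  }

  val a = ToyRdd("A", Nil)
  val b = ToyRdd("B", Seq(("s_A", a)))
  val c = ToyRdd("C", Seq(("s_A", a), ("s_B", b))) // C reads both s_A and s_B

  println(ancestorShuffles(c, dedup = false)) // List(s_A, s_B, s_A): s_A collected twice
  println(ancestorShuffles(c, dedup = true))  // List(s_A, s_B): one stage per shuffle
}
```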
## How was this patch tested?
I added a test to `DAGSchedulerSuite` that builds the sample RDD graph above and checks that no duplicate stages are created.
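For illustration, the DAG above can be wired up in `DAGSchedulerSuite` style roughly as follows. This is only an illustrative sketch that assumes the suite's existing `MyRDD` and `submit` test helpers; it is not necessarily the exact test added by this PR.

```scala
test("[SPARK-13902] Ensure no duplicate stages are created") {
  // [A] <--(s_A)-- [B] <--(s_B)-- [C] <--(s_C)-- [D], with C also reading s_A directly.
  val rddA = new MyRDD(sc, 1, Nil)
  val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(1))

  val rddB = new MyRDD(sc, 1, List(shuffleDepA))
  val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(1))

  // rddC depends on the same shuffleDepA that rddB does, plus shuffleDepB.
  val rddC = new MyRDD(sc, 1, List(shuffleDepA, shuffleDepB))
  val shuffleDepC = new ShuffleDependency(rddC, new HashPartitioner(1))

  val rddD = new MyRDD(sc, 1, List(shuffleDepC))

  submit(rddD, Array(0))
  // With the fix, the scheduler registers exactly one ShuffleMapStage per shuffle
  // dependency (s_A, s_B, s_C) plus the final ResultStage, instead of creating a
  // duplicate map stage for s_A.
}
```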
Author: Takuya UESHIN <[email protected]>
Closes #12655 from ueshin/issues/SPARK-13902.
Merged into 2.0. Thanks @ueshin for your work on this and for bearing with us as we agreed on the simplest solution -- awesome to have this fixed!
@kayousterhout, @markhamstra Thanks a lot!