-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage #30691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @jiangxb1987 @Victsm @mridulm @Ngone51 @tgravescs Please review |
core/src/main/scala/org/apache/spark/internal/config/package.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/internal/config/package.scala
Outdated
Show resolved
Hide resolved
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
+CC @venkata91 please resolve conflicts |
1. Handling of MergeResults from the executors in MapOutputTracker
2. Shuffle merge finalization in DagScheduler
This also includes the following changes:
- LIHADOOP-52972 Tests for changes in MapOutputTracker and DagScheduler related to pushbased shuffle.
Author: Chandni Singh <[email protected]>
- LIHADOOP-52202 Utility to create a directory with 770 permission.
Author: Chandni Singh <[email protected]>
- LIHADOOP-52972 Moved isPushBasedShuffleEnabled to Utils and added a unit test for it.
Author: Ye Zhou <[email protected]>
LIHADOOP-53321 Magnet: Merge client shuffle block fetcher related changes
LIHADOOP-54115 Unregister map and merge outputs on the host when DAG scheduler encounters a shuffle chunk failure
RB=2151376
BUG=LIHADOOP-54115
G=spark-reviewers
R=yezhou,mshen
A=mshen
LIHADOOP-52494 Magnet fallback to origin shuffle blocks when fetch of a shuffle chunk fails
Prepare for PR
SPARK-32920: Push based shuffle finalization of shuffle push/merge as part of
ShuffleMapStage and preparing for the reduce stage
Address otterc review comments
|
Gentle ping @Ngone51 @tgravescs @mridulm @Victsm |
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
Outdated
Show resolved
Hide resolved
mridulm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @venkata91 !
Took a pass through it
core/src/main/scala/org/apache/spark/internal/config/package.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/internal/config/package.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| private def processShuffleMapStageCompletion(shuffleStage: ShuffleMapStage): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review note: no changes here. Method extracted from handleTaskCompletion
fetch failure cases
aa889f1 to
7046206
Compare
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
Outdated
Show resolved
Hide resolved
|
Changes lgtm. |
|
jenkins, test this please |
|
Test build #139624 has finished for PR 30691 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
jenkins, test this please |
|
Kubernetes integration test starting |
|
Test build #139650 has finished for PR 30691 at commit
|
|
The test failures are unrelated to this pr. |
|
Kubernetes integration test status success |
|
Thanks for working on this @venkata91 |
…ased shuffle and preparation step for the reduce stage ### What changes were proposed in this pull request? Summary of the changes made as part of this PR: 1. `DAGScheduler` changes to finalize a ShuffleMapStage which involves talking to all the shuffle mergers (`ExternalShuffleService`) and getting all the completed merge statuses. 2. Once the `ShuffleMapStage` finalization is complete, mark the `ShuffleMapStage` to be finalized which marks the stage as complete and subsequently letting the child stage start. 3. Also added the relevant tests to `DAGSchedulerSuite` for changes made as part of [SPARK-32919](https://issues.apache.org/jira/browse/SPARK-32919) Lead-authored-by: Min Shen mshenlinkedin.com Co-authored-by: Venkata krishnan Sowrirajan vsowrirajanlinkedin.com Co-authored-by: Chandni Singh chsinghlinkedin.com ### Why are the changes needed? Refer to [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added unit tests to DAGSchedulerSuite Closes apache#30691 from venkata91/SPARK-32920. Lead-authored-by: Venkata krishnan Sowrirajan <[email protected]> Co-authored-by: Min Shen <[email protected]> Co-authored-by: Chandni Singh <[email protected]> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
…ased shuffle and preparation step for the reduce stage Summary of the changes made as part of this PR: 1. `DAGScheduler` changes to finalize a ShuffleMapStage which involves talking to all the shuffle mergers (`ExternalShuffleService`) and getting all the completed merge statuses. 2. Once the `ShuffleMapStage` finalization is complete, mark the `ShuffleMapStage` to be finalized which marks the stage as complete and subsequently letting the child stage start. 3. Also added the relevant tests to `DAGSchedulerSuite` for changes made as part of [SPARK-32919](https://issues.apache.org/jira/browse/SPARK-32919) Lead-authored-by: Min Shen mshenlinkedin.com Co-authored-by: Venkata krishnan Sowrirajan vsowrirajanlinkedin.com Co-authored-by: Chandni Singh chsinghlinkedin.com Refer to [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602) No Added unit tests to DAGSchedulerSuite Closes #30691 from venkata91/SPARK-32920. Lead-authored-by: Venkata krishnan Sowrirajan <[email protected]> Co-authored-by: Min Shen <[email protected]> Co-authored-by: Chandni Singh <[email protected]> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
What changes were proposed in this pull request?
Summary of the changes made as part of this PR:
DAGSchedulerchanges to finalize a ShuffleMapStage which involves talking to all the shuffle mergers (ExternalShuffleService) and getting all the completed merge statuses.ShuffleMapStagefinalization is complete, mark theShuffleMapStageto be finalized which marks the stage as complete and subsequently letting the child stage start.DAGSchedulerSuitefor changes made as part of SPARK-32919Lead-authored-by: Min Shen [email protected]
Co-authored-by: Venkata krishnan Sowrirajan [email protected]
Co-authored-by: Chandni Singh [email protected]
Why are the changes needed?
Refer to SPARK-30602
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit tests to DAGSchedulerSuite