Skip to content

Conversation

@venkata91
Copy link
Contributor

@venkata91 venkata91 commented Dec 9, 2020

What changes were proposed in this pull request?

Summary of the changes made as part of this PR:

  1. DAGScheduler changes to finalize a ShuffleMapStage which involves talking to all the shuffle mergers (ExternalShuffleService) and getting all the completed merge statuses.
  2. Once the ShuffleMapStage finalization is complete, mark the ShuffleMapStage to be finalized which marks the stage as complete and subsequently letting the child stage start.
  3. Also added the relevant tests to DAGSchedulerSuite for changes made as part of SPARK-32919

Lead-authored-by: Min Shen [email protected]
Co-authored-by: Venkata krishnan Sowrirajan [email protected]
Co-authored-by: Chandni Singh [email protected]

Why are the changes needed?

Refer to SPARK-30602

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests to DAGSchedulerSuite

@github-actions github-actions bot added the CORE label Dec 9, 2020
@venkata91
Copy link
Contributor Author

cc @jiangxb1987 @Victsm @mridulm @Ngone51 @tgravescs Please review

@HyukjinKwon HyukjinKwon changed the title [SPARK-32920][SHUFFLE] WIP Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage [WIP][SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage Dec 11, 2020
@HyukjinKwon HyukjinKwon marked this pull request as draft December 11, 2020 03:39
@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 22, 2021
@github-actions github-actions bot closed this Mar 23, 2021
@mridulm mridulm removed the Stale label Apr 28, 2021
@mridulm mridulm reopened this Apr 28, 2021
@mridulm
Copy link
Contributor

mridulm commented Apr 28, 2021

+CC @venkata91 please resolve conflicts

@venkata91 venkata91 changed the title [WIP][SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage Apr 28, 2021
@venkata91 venkata91 marked this pull request as ready for review April 28, 2021 17:14
    1. Handling of MergeResults from the executors in MapOutputTracker
    2. Shuffle merge finalization in DagScheduler

    This also includes the following changes:
      - LIHADOOP-52972 Tests for changes in MapOutputTracker and DagScheduler related to pushbased shuffle.
        Author: Chandni Singh <[email protected]>

      - LIHADOOP-52202 Utility to create a directory with 770 permission.
        Author: Chandni Singh <[email protected]>

      - LIHADOOP-52972 Moved isPushBasedShuffleEnabled to Utils and added a unit test for it.
        Author: Ye Zhou <[email protected]>

LIHADOOP-53321 Magnet: Merge client shuffle block fetcher related changes

LIHADOOP-54115 Unregister map and merge outputs on the host when DAG scheduler encounters a shuffle chunk failure

RB=2151376
BUG=LIHADOOP-54115
G=spark-reviewers
R=yezhou,mshen
A=mshen

LIHADOOP-52494 Magnet fallback to origin shuffle blocks when fetch of a shuffle chunk fails

Prepare for PR

SPARK-32920: Push based shuffle finalization of shuffle push/merge as part of
ShuffleMapStage and preparing for the reduce stage

Address otterc review comments
@venkata91
Copy link
Contributor Author

Fixed the PR rebasing the latest master changes and also marked it as open for review. cc @mridulm @Victsm @otterc @Ngone51

@venkata91
Copy link
Contributor Author

Gentle ping @Ngone51 @tgravescs @mridulm @Victsm

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @venkata91 !
Took a pass through it

}
}

private def processShuffleMapStageCompletion(shuffleStage: ShuffleMapStage): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review note: no changes here. Method extracted from handleTaskCompletion

@venkata91 venkata91 force-pushed the SPARK-32920 branch 2 times, most recently from aa889f1 to 7046206 Compare May 7, 2021 17:47
@Victsm
Copy link
Contributor

Victsm commented Jun 9, 2021

Changes lgtm.

@mridulm
Copy link
Contributor

mridulm commented Jun 10, 2021

jenkins, test this please

@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Test build #139624 has finished for PR 30691 at commit e570818.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44151/

@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44151/

@mridulm
Copy link
Contributor

mridulm commented Jun 10, 2021

jenkins, test this please

@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44178/

@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Test build #139650 has finished for PR 30691 at commit e570818.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mridulm
Copy link
Contributor

mridulm commented Jun 10, 2021

The test failures are unrelated to this pr.
Merging to master.

@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44178/

@asfgit asfgit closed this in b5a1503 Jun 10, 2021
@mridulm
Copy link
Contributor

mridulm commented Jun 10, 2021

Thanks for working on this @venkata91
Thanks for all the reviews and comments @Ngone51, @otterc, @Victsm !

domybest11 pushed a commit to domybest11/spark that referenced this pull request Jun 15, 2022
…ased shuffle and preparation step for the reduce stage

### What changes were proposed in this pull request?

Summary of the changes made as part of this PR:

1. `DAGScheduler` changes to finalize a ShuffleMapStage which involves talking to all the shuffle mergers (`ExternalShuffleService`) and getting all the completed merge statuses.
2. Once the `ShuffleMapStage` finalization is complete, mark the `ShuffleMapStage` to be finalized which marks the stage as complete and subsequently letting the child stage start.
3. Also added the relevant tests to `DAGSchedulerSuite` for changes made as part of [SPARK-32919](https://issues.apache.org/jira/browse/SPARK-32919)

Lead-authored-by: Min Shen mshenlinkedin.com
Co-authored-by: Venkata krishnan Sowrirajan vsowrirajanlinkedin.com
Co-authored-by: Chandni Singh chsinghlinkedin.com

### Why are the changes needed?

Refer to [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit tests to DAGSchedulerSuite

Closes apache#30691 from venkata91/SPARK-32920.

Lead-authored-by: Venkata krishnan Sowrirajan <[email protected]>
Co-authored-by: Min Shen <[email protected]>
Co-authored-by: Chandni Singh <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
wangyum pushed a commit that referenced this pull request May 26, 2023
…ased shuffle and preparation step for the reduce stage

Summary of the changes made as part of this PR:

1. `DAGScheduler` changes to finalize a ShuffleMapStage which involves talking to all the shuffle mergers (`ExternalShuffleService`) and getting all the completed merge statuses.
2. Once the `ShuffleMapStage` finalization is complete, mark the `ShuffleMapStage` to be finalized which marks the stage as complete and subsequently letting the child stage start.
3. Also added the relevant tests to `DAGSchedulerSuite` for changes made as part of [SPARK-32919](https://issues.apache.org/jira/browse/SPARK-32919)

Lead-authored-by: Min Shen mshenlinkedin.com
Co-authored-by: Venkata krishnan Sowrirajan vsowrirajanlinkedin.com
Co-authored-by: Chandni Singh chsinghlinkedin.com

Refer to [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)

No

Added unit tests to DAGSchedulerSuite

Closes #30691 from venkata91/SPARK-32920.

Lead-authored-by: Venkata krishnan Sowrirajan <[email protected]>
Co-authored-by: Min Shen <[email protected]>
Co-authored-by: Chandni Singh <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants