
Conversation

@drdarshan (Contributor) commented:

Resolves PyTorch issue #35160 (pytorch/pytorch#35160). The example shows how to structure a DDP application so it can be started via the distributed launcher script in several configurations: one process per GPU, one process per node, or anything in between.
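
For reference, a minimal sketch of the kind of entry point described above, assuming launch via `python -m torch.distributed.launch --nproc_per_node=<N> example.py` with one process per GPU; the model, tensor shapes, and argument names here are illustrative and not the PR's actual code:

```python
import argparse

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank to each spawned process.
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()

    # The launcher sets MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE in the
    # environment, so init_process_group only needs the backend.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(args.local_rank)

    model = nn.Linear(10, 10).to(args.local_rank)
    # With one process per GPU, device_ids pins this replica to its GPU.
    ddp_model = DDP(model, device_ids=[args.local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(20, 10).to(args.local_rank)
    loss = ddp_model(inputs).sum()
    loss.backward()  # gradients are all-reduced across processes here
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In the one-process-per-node configuration, `device_ids` would be omitted and the module would itself span the node's GPUs.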

@mrshenli (Contributor) left a comment:

This is awesome!! Thanks for putting this together in such a short period of time!! I left some minor comments inline.

Fix typos and make a few grammatical improvements.
DDP now broadcasts the initial model from rank 0, so it is no longer necessary to randomly initialize it on all ranks with the same random seed (see the sketch below).
Per a suggestion on the PR, uploaded and replaced the SVG image with a GitHub permalink.
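
A minimal sketch (not the PR's code) of the behavior that commit relies on: DistributedDataParallel broadcasts the module's state from rank 0 to all other processes at construction time, so seeding every rank identically is unnecessary. It assumes an already initialized process group on a single node with one process per GPU, so the global rank doubles as the CUDA device index.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

rank = dist.get_rank()
torch.manual_seed(rank)           # deliberately different init on every rank
model = nn.Linear(4, 4).to(rank)
ddp_model = DDP(model, device_ids=[rank])
# After construction, every rank holds rank 0's weights.

# Optional check: broadcast rank 0's weight and compare with the local copy.
with torch.no_grad():
    param = ddp_model.module.weight.clone()
    dist.broadcast(param, src=0)
    assert torch.equal(param, ddp_model.module.weight)
```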
@drdarshan requested a review from @mrshenli on April 14, 2020 at 16:52.
@jlin27 merged commit 8dcb9c7 into pytorch:master on May 20, 2020.
YinZhengxun pushed a commit to YinZhengxun/mt-exercise-02 that referenced this pull request on Mar 30, 2025:
Add example for distributed launcher