Adding distributed pipeline parallelism example #749
Conversation
Left a few suggestions for comments. Thanks for the great example @mrshenli!
labels = torch.zeros(batch_size, num_classes) \
             .scatter_(1, one_hot_indices, 1)

with dist_autograd.context() as context_id:
Should we add a short comment about what dist_autograd/dist_optimizer is doing here?
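For reference, here is a minimal sketch of how a distributed autograd context and DistributedOptimizer usually fit together; `model`, `inputs`, `labels`, and `parameter_rrefs()` are assumptions, not necessarily the names this example uses:

```python
import torch
import torch.nn as nn
import torch.distributed.autograd as dist_autograd
from torch.distributed.optim import DistributedOptimizer

loss_fn = nn.MSELoss()

# DistributedOptimizer takes RRefs to parameters that may live on remote
# workers; here we assume the model exposes them via parameter_rrefs().
opt = DistributedOptimizer(
    torch.optim.SGD,
    model.parameter_rrefs(),
    lr=0.05,
)

with dist_autograd.context() as context_id:
    outputs = model(inputs)
    # backward() propagates gradients across RPC boundaries and records them
    # in this autograd context instead of in the parameters' .grad fields.
    dist_autograd.backward(context_id, [loss_fn(outputs, labels)])
    # step() applies the gradients recorded in this context on every worker
    # that owns part of the model.
    opt.step(context_id)
```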
return nn.Sequential(*layers)
Maybe we should include some comments about what we're doing here at a high level (defining ResNet with 2 partitions so we can place them on separate machines). Also, should we call these Partitions or Shards instead of parts?
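As a rough illustration of what one of the two partitions might look like (the class name, layer layout, and worker names below are placeholders, not the exact ones in this example):

```python
import torch.nn as nn
import torch.distributed.rpc as rpc

class ResNetShard1(nn.Module):
    # First partition: the ResNet50 stem plus the early stages, placed on one worker.
    def __init__(self, device):
        super().__init__()
        self.device = device
        self.seq = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            # ... the layer1 and layer2 blocks would be built here ...
        ).to(device)

    def forward(self, x_rref):
        # Inputs arrive as an RRef from the caller; fetch and move to this device.
        x = x_rref.to_here().to(self.device)
        return self.seq(x).cpu()

# The driver would then place each shard on its own RPC worker, e.g.:
# p1_rref = rpc.remote("worker1", ResNetShard1, args=("cuda:0",))
# p2_rref = rpc.remote("worker2", ResNetShard2, args=("cuda:0",))
```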
Distributed Pipeline Parallel Example

This example shows how to distribute a ResNet50 model on two RPC workers and
then implement distributed pipeline parallelism using RPC.
Should we include a quick description of the pipelining strategy (pipelining micro-batches within a batch and then synchronously running the optimizer step)? Since this is like GPipe, should we also link the paper here?
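For context, the GPipe-style flow boils down to splitting each mini-batch into micro-batches, feeding them through the two shards asynchronously, and then running one synchronous optimizer step per mini-batch. A hedged sketch of the forward pass (`p1_rref`, `p2_rref`, and `split_size` are assumed attributes of the distributed model, not confirmed names):

```python
import torch
import torch.distributed.rpc as rpc

def forward(self, xs):
    out_futures = []
    # Split the mini-batch into micro-batches so the two shards can work
    # on different micro-batches at the same time (GPipe-style pipelining).
    for x in xs.split(self.split_size, dim=0):
        x_rref = rpc.RRef(x)
        # Stage 1 runs remotely and returns an RRef to its activations.
        y_rref = self.p1_rref.remote().forward(x_rref)
        # Stage 2 is launched asynchronously so micro-batches overlap.
        z_fut = self.p2_rref.rpc_async().forward(y_rref)
        out_futures.append(z_fut)
    # Wait for all micro-batches and stitch them back into one output batch.
    return torch.cat(torch.futures.wait_all(out_futures))
```

The paper it resembles is "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism" (https://arxiv.org/abs/1811.06965).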
Co-authored-by: Shen Li <[email protected]>
This example shows how to use RPC to implement pipeline parallelism. It can be viewed as a distributed counterpart of single-machine, multi-GPU pipeline parallelism.
The numbers below show how the total execution time decreases as num_split increases.
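A hypothetical timing loop, just to make the num_split knob concrete (`DistResNet50` and `train()` are stand-ins for whatever the example actually defines):

```python
import time

batch_size = 120
for num_split in [1, 2, 4, 8]:
    # Larger num_split means smaller micro-batches, so the two workers spend
    # more time overlapping and less time waiting on each other.
    split_size = batch_size // num_split
    model = DistResNet50(split_size, ["worker1", "worker2"])  # assumed constructor
    tik = time.time()
    train(model)  # assumed training helper
    print(f"num_split={num_split}: {time.time() - tik:.2f}s")
```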