
Commit 6a62f96

Distinguish single-machine and distributed model parallel
1 parent c66913a commit 6a62f96

2 files changed: +10, -3 lines changed


intermediate_source/model_parallel_tutorial.py

Lines changed: 8 additions & 1 deletion
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 """
-Model Parallel Best Practices
+Single-Machine Model Parallel Best Practices
 ================================
 **Author**: `Shen Li <https://mrshenli.github.io/>`_
 
@@ -27,6 +27,13 @@
 of model parallel. It is up to the readers to apply the ideas to real-world
 applications.
 
+.. note::
+
+   For distributed model parallel training where a model spans multiple
+   servers, please refer to
+   `Getting Started With Distributed RPC Framework <rpc_tutorial.html>`__
+   for examples and details.
+
 Basic Usage
 -----------
 """

intermediate_source/rpc_tutorial.rst

Lines changed: 2 additions & 2 deletions
@@ -12,10 +12,10 @@ This tutorial uses two simple examples to demonstrate how to build distributed
 training with the `torch.distributed.rpc <https://pytorch.org/docs/master/rpc.html>`__
 package which is first introduced as an experimental feature in PyTorch v1.4.
 Source code of the two examples can be found in
-`PyTorch examples <https://github.com/pytorch/examples>`__
+`PyTorch examples <https://github.com/pytorch/examples>`__.
 
 Previous tutorials,
-`Getting Started With Distributed Data Parallel <https://pytorch.org/tutorials/intermediate/ddp_tutorial.html>`__
+`Getting Started With Distributed Data Parallel <ddp_tutorial.html>`__
 and `Writing Distributed Applications With PyTorch <https://pytorch.org/tutorials/intermediate/dist_tuto.html>`__,
 described `DistributedDataParallel <https://pytorch.org/docs/stable/_modules/torch/nn/parallel/distributed.html>`__
 which supports a specific training paradigm where the model is replicated across
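The note added above directs readers to the RPC framework for the case where model shards live on different machines. A minimal, hypothetical sketch of that style of call is shown below; the worker names, port, and the remote_linear helper are illustrative assumptions and are not part of this commit or the linked tutorial.

# Hypothetical two-worker RPC sketch: rank 0 calls a function that executes on rank 1.
import os
import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp

def remote_linear(x):
    # Runs on the callee worker; stands in for a model shard hosted on another server.
    layer = torch.nn.Linear(10, 5)
    return layer(x)

def run(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        # Caller side: invoke the remote shard synchronously and receive the result.
        out = rpc.rpc_sync("worker1", remote_linear, args=(torch.randn(4, 10),))
        print(out.shape)
    # shutdown() blocks until all outstanding RPC work has completed on every worker.
    rpc.shutdown()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)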
