@@ -4,40 +4,43 @@ Getting Started with Distributed RPC Framework
 
 
 This tutorial uses two simple examples to demonstrate how to build distributed
-applications with the `torch.distributed.rpc` package. Source code of the two
-examples can be found in `PyTorch examples <https://github.com/pytorch/examples>`__
+training with the `torch.distributed.rpc <https://pytorch.org/docs/master/rpc.html>`__
+package. Source code of the two examples can be found in
+`PyTorch examples <https://github.com/pytorch/examples>`__
 
-Previous tutorials described `DistributedDataParallel <https://pytorch.org/docs/stable/_modules/torch/nn/parallel/distributed.html>`__
+`Previous <https://deploy-preview-807--pytorch-tutorials-preview.netlify.com/intermediate/ddp_tutorial.html>`__
+`tutorials <https://deploy-preview-807--pytorch-tutorials-preview.netlify.com/intermediate/dist_tuto.html>`__
+described `DistributedDataParallel <https://pytorch.org/docs/stable/_modules/torch/nn/parallel/distributed.html>`__
 which supports a specific training paradigm where the model is replicated across
 multiple processes and each process handles a split of the input data.
 Sometimes, you might run into scenarios that require different training
-paradigms:
+paradigms. For example:
 
 1) In reinforcement learning, it might be relatively expensive to acquire
    training data from environments while the model itself can be quite small. In
    this case, it might be useful to spawn multiple observers running in parallel
    and share a single agent. In this case, the agent takes care of the training
    locally, but the application would still need libraries to send and receive
-   data between observers and the trainer
+   data between observers and the trainer.
 2) Your model might be too large to fit in GPUs on a single machine, and hence
-   would need a library to help split a model onto multiple machines. Or you
+   would need a library to help split the model onto multiple machines. Or you
    might be implementing a `parameter server <https://www.cs.cmu.edu/~muli/file/parameter_server_osdi14.pdf>`__
    training framework, where model parameters and trainers live on different
    machines.
 
 
 The `torch.distributed.rpc <https://pytorch.org/docs/master/rpc.html>`__ package
 can help with the above scenarios. In case 1, `RPC <https://pytorch.org/docs/master/rpc.html#rpc>`__
-and `RRef <https://pytorch.org/docs/master/rpc.html#rref>`__ can help send data
-from one worker to another and also easily referencing remote data objects. In
+and `RRef <https://pytorch.org/docs/master/rpc.html#rref>`__ allow sending data
+from one worker to another while easily referencing remote data objects. In
 case 2, `distributed autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__
 and `distributed optimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__
-allows executing backward and optimizer step as if it is local training. In the
-next two sections, we will demonstrate APIs of
+let you run the backward pass and optimizer step as if it were local
+training. In the next two sections, we will demonstrate APIs of
 `torch.distributed.rpc <https://pytorch.org/docs/master/rpc.html>`__ using a
 reinforcement learning example and a language model example. Please note, this
-tutorial is not aiming at building the most accurate or efficient models to
-solve given problems, instead the main goal is to show how to use the
+tutorial does not aim at building the most accurate or efficient models to
+solve the given problems; instead, the main goal here is to show how to use the
 `torch.distributed.rpc <https://pytorch.org/docs/master/rpc.html>`__ package to
 build distributed training applications.
 
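+For illustration only, here is a minimal sketch (not taken from the two
+examples' source code) of the basic primitives used throughout this tutorial.
+It assumes ``rpc.init_rpc`` has already been called on two workers, which are
+named ``"worker0"`` and ``"worker1"`` purely for the sake of this sketch:
+
+.. code:: python
+
+    import torch
+    import torch.distributed.rpc as rpc
+
+    # run torch.add on "worker1" and block until the result is copied back
+    ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), torch.ones(2)))
+
+    # run torch.add on "worker1" but keep the result there; locally we only get
+    # an RRef, i.e., a reference to the remote tensor, which can be passed to
+    # other RPC calls without copying the data
+    rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
+    print(rref.to_here())  # fetch a copy of the remote value when needed
+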
@@ -76,12 +79,13 @@ usages.
             action_scores = self.affine2(x)
             return F.softmax(action_scores, dim=1)
 
-Let's first prepare a helper function to call a function on a local ``RRef``. It
-might look unnecessary at the first glance, as you could simply do
-``rref.local_value().some_func(args)`` to run the target function. The reason
-for adding this helper function is because there is no way to get a reference
-of a remote value, and ``local_value`` is only available on the owner of the
-``RRef``.
+Let's first prepare a helper to run functions remotely on the owner worker of an
+``RRef``. You will find this function used in several places in this tutorial's
+examples. Ideally, the `torch.distributed.rpc` package should provide these
+helper functions out of the box. For example, it would be easier if applications
+could directly call ``RRef.some_func(*arg)``, which would then translate into an
+RPC to the ``RRef`` owner. The progress on this API is tracked in
+`pytorch/pytorch#31743 <https://github.com/pytorch/pytorch/issues/31743>`__.
 
 .. code:: python
 
@@ -99,13 +103,6 @@ of a remote value, and ``local_value`` is only available on the owner of the
     # _remote_method(some_func, rref, *args)
 
 
-Ideally, the `torch.distributed.rpc` package should provide these helper
-functions out of box. For example, it will be easier if applications can
-directly call ``RRef.some_func(*arg)`` which will then translate to RPC to the
-``RRef`` owner. The progress on this API is tracked in in
-`pytorch/pytorch#31743 <https://github.com/pytorch/pytorch/issues/31743>`__.
-
-
 We are ready to present the observer. In this example, each observer creates its
 own environment, and waits for the agent's command to run an episode. In each
 episode, one observer loops at most ``n_steps`` iterations, and in each
@@ -116,7 +113,8 @@ RPC to report the reward to the agent. Again, please note that, this is
 obviously not the most efficient observer implementation. For example, one
 simple optimization could be packing current state and last reward in one RPC to
 reduce the communication overhead. However, the goal is to demonstrate RPC API
-instead of building the best solver for CartPole.
+instead of building the best solver for CartPole. So, let's keep the logic
+simple and the two steps explicit in this example.
 
 .. code:: python
 
@@ -152,9 +150,13 @@ such that it sends command to multiple distributed observers to run episodes,
 and it also records all actions and rewards locally which will be used during
 the training phase after each episode. The code below shows ``Agent``
 constructor where most lines are initializing various components. The loop at
-the end initializes observers on other workers, and holds ``RRefs`` to those
-observers locally. The agent will use those observer ``RRefs`` later to send
-commands.
+the end initializes observers remotely on other workers, and holds ``RRefs`` to
+those observers locally. The agent will use those observer ``RRefs`` later to
+send commands. Applications don't need to worry about the lifetime of ``RRefs``.
+The owner of each ``RRef`` maintains a reference counting map to track its
+lifetime, and guarantees the remote data object will not be deleted as long as
+there is any live user of that ``RRef``. Please refer to the ``RRef``
+`design doc <https://pytorch.org/docs/master/notes/rref.html>`__ for details.
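+
+Before moving on to the constructor code, here is a minimal illustrative sketch
+(not part of this example's source) of what that lifetime guarantee means in
+practice. It assumes ``rpc.init_rpc`` has already been called, that a worker
+named ``"observer1"`` exists, and that ``remote_sum`` is defined on every
+worker:
+
+.. code:: python
+
+    import torch
+    import torch.distributed.rpc as rpc
+
+    def remote_sum(t_rref):
+        # using the RRef here counts as a live user, so its owner keeps the
+        # underlying tensor alive while this function runs
+        return t_rref.to_here().sum()
+
+    # the tensor lives on "observer1"; locally we only hold an RRef to it
+    t_rref = rpc.remote("observer1", torch.ones, args=(3,))
+
+    # forward the RRef to another RPC; reference counting handles the rest
+    total = rpc.rpc_sync("observer1", remote_sum, args=(t_rref,))
+
+    del t_rref  # once all users are gone, the owner frees the remote tensor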
 
 
 .. code:: python
@@ -186,9 +188,9 @@ commands.
                 self.saved_log_probs[ob_info.id] = []
 
 
-Next, the agent exposes two APIs to allow observers to select actions and report
-rewards. Those functions are only run locally on the agent, but will be
-triggered by observers through RPC.
+Next, the agent exposes two APIs to observers for selecting actions and
+reporting rewards. Those functions are only run locally on the agent, but will
+be triggered by observers through RPC.
 
 
 .. code:: python
@@ -212,9 +214,9 @@ to execute an episode. In this function, it first creates a list to collect
 futures from asynchronous RPCs, and then loop over all observer ``RRefs`` to
 make asynchronous RPCs. In these RPCs, the agent also passes an ``RRef`` of
 itself to the observer, so that the observer can call functions on the agent as
-well. As shown above, each observer will make RPCs back to the agent, which is
-actually nested RPCs. After each episode, the ``saved_log_probs`` and
-``rewards`` will contain the recorded action probs and rewards.
+well. As shown above, each observer will make RPCs back to the agent, which are
+nested RPCs. After each episode, the ``saved_log_probs`` and ``rewards`` will
+contain the recorded action probs and rewards.
 
 
 .. code:: python