This is a basic example of RPC-based training, in which several trainers remotely train a model hosted on a server.
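As a rough illustration of what that means in `torch.distributed.rpc` terms (this is a sketch, not the script's actual code), the server process owns the model while each trainer holds only a remote reference (`RRef`) to it and invokes the forward pass over RPC; `ParameterServer` and `get_param_server` below are made-up names:

```python
# Sketch of the parameter-server pattern with torch.distributed.rpc.
# ParameterServer / get_param_server are illustrative names, not the example's API.
import torch
import torch.distributed.rpc as rpc

class ParameterServer:
    def __init__(self):
        # The model lives only in the server process.
        self.model = torch.nn.Linear(10, 5)

    def forward(self, x):
        return self.model(x)

_server = None

def get_param_server():
    # Created lazily in the server process when the first trainer asks for it.
    global _server
    if _server is None:
        _server = ParameterServer()
    return _server

def run_trainer():
    # Assumes rpc.init_rpc() has already been called (see the launch sketch below).
    ps_rref = rpc.remote("parameter_server", get_param_server)  # RRef owned by the server
    out = ps_rref.rpc_sync().forward(torch.randn(4, 10))        # forward pass runs remotely
    print(tuple(out.shape))
```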
To run the example locally, run the following command for the server and for each worker you wish to spawn, in separate terminal windows:
`python rpc_parameter_server.py --world_size=WORLD_SIZE --rank=RANK`. For example, for a master node with world size of 2, the command would be `python rpc_parameter_server.py --world_size=2 --rank=0`. The trainer can then be launched with the command `python rpc_parameter_server.py --world_size=2 --rank=1` in a separate window, and this will begin training with one server and a single trainer.
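As a hedged sketch of how those flags might map onto the RPC bootstrap (the flag names follow the commands above, but the dispatch itself is illustrative rather than the script's exact code):

```python
# Sketch of how --world_size/--rank could drive the RPC setup; illustrative only.
import argparse
import os
import torch.distributed.rpc as rpc

parser = argparse.ArgumentParser()
parser.add_argument("--world_size", type=int, default=2)
parser.add_argument("--rank", type=int, default=0)
args = parser.parse_args()

# All workers need to agree on a rendezvous address for RPC initialization.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

if args.rank == 0:
    # Rank 0 is the parameter server: it hosts the model and serves trainer RPCs.
    rpc.init_rpc("parameter_server", rank=args.rank, world_size=args.world_size)
else:
    # Every other rank is a trainer with its own RPC worker name.
    rpc.init_rpc(f"trainer_{args.rank}", rank=args.rank, world_size=args.world_size)
    # ...the trainer's training loop would run here...
rpc.shutdown()  # each process blocks here until all workers have finished
```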
Note that, for demonstration purposes, this example supports between 0 and 2 GPUs, although the pattern can be extended to make use of additional GPUs. To configure the number of GPUs, pass `--num_gpus=N` to your training command.
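As a rough, illustrative sketch of what such a flag can control (not the script's exact code), the value might simply decide which device each layer of the model is placed on:

```python
# Illustrative sketch only: map --num_gpus (0, 1, or 2) onto device placement.
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", type=int, default=0, choices=[0, 1, 2])
args = parser.parse_args()

# With 0 GPUs everything stays on the CPU; with 1 GPU both layers share cuda:0;
# with 2 GPUs the two layers are split across cuda:0 and cuda:1.
first_device = torch.device("cuda:0" if args.num_gpus > 0 else "cpu")
second_device = torch.device("cuda:1") if args.num_gpus > 1 else first_device

layer1 = torch.nn.Linear(10, 20).to(first_device)
layer2 = torch.nn.Linear(20, 5).to(second_device)

x = torch.randn(4, 10).to(first_device)
out = layer2(layer1(x).to(second_device))
print(out.device)
```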