
Commit e94d48c

Jeff Yang authored and rohitgr7 committed
[docs] distributed_backend -> accelerator (#4429)
* distributed_backend -> accelerator
* distributed_backend -> accelerator
* use_amp -> precision
* format

Co-authored-by: rohitgr7 <[email protected]>
(cherry picked from commit ebe3a31)
1 parent 7f48c87 commit e94d48c

File tree

7 files changed: +37 -37 lines changed


docs/source/introduction_guide.rst

Lines changed: 1 addition & 1 deletion
@@ -543,7 +543,7 @@ Or multiple nodes
  # (32 GPUs)
  model = LitMNIST()
- trainer = Trainer(gpus=8, num_nodes=4, distributed_backend='ddp')
+ trainer = Trainer(gpus=8, num_nodes=4, accelerator='ddp')
  trainer.fit(model, train_loader)

Refer to the :ref:`distributed computing guide for more details <multi_gpu>`.

docs/source/lightning_module.rst

Lines changed: 2 additions & 2 deletions
@@ -256,7 +256,7 @@ The matching pseudocode is:
Training with DataParallel
~~~~~~~~~~~~~~~~~~~~~~~~~~
- When training using a `distributed_backend` that splits data from each batch across GPUs, sometimes you might
+ When training using an `accelerator` that splits data from each batch across GPUs, sometimes you might
need to aggregate them on the master GPU for processing (dp, or ddp2).

In this case, implement the `training_step_end` method
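For readers landing on this hunk: `training_step_end` is the hook that receives the per-GPU outputs of `training_step` and combines them. A minimal sketch under `accelerator='dp'` or `'ddp2'`; the class name, the linear layer and the dictionary keys are illustrative assumptions, not the file's own example:

.. code-block:: python

    import torch.nn.functional as F
    from torch import nn
    import pytorch_lightning as pl


    class LitClassifier(pl.LightningModule):  # hypothetical module, for illustration only
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(28 * 28, 10)

        def forward(self, x):
            return self.layer(x.view(x.size(0), -1))

        def training_step(self, batch, batch_idx):
            x, y = batch
            # under dp/ddp2 this runs once per GPU, on a slice of the batch
            return {'y_hat': self(x), 'y': y}

        def training_step_end(self, outputs):
            # the gathered per-GPU results arrive here; aggregate on the master GPU
            loss = F.cross_entropy(outputs['y_hat'], outputs['y'])
            return loss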
@@ -360,7 +360,7 @@ If you need to do something with all the outputs of each `validation_step`, over
Validating with DataParallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- When training using a `distributed_backend` that splits data from each batch across GPUs, sometimes you might
+ When training using an `accelerator` that splits data from each batch across GPUs, sometimes you might
need to aggregate them on the master GPU for processing (dp, or ddp2).

In this case, implement the `validation_step_end` method
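The validation path mirrors the training one; a hedged sketch of `validation_step_end` for the same hypothetical module as above, with the accuracy computation being an illustrative choice:

.. code-block:: python

    def validation_step(self, batch, batch_idx):
        x, y = batch
        return {'y_hat': self(x), 'y': y}

    def validation_step_end(self, outputs):
        # gathered per-GPU outputs when using dp/ddp2
        acc = (outputs['y_hat'].argmax(dim=-1) == outputs['y']).float().mean()
        self.log('val_acc', acc)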

docs/source/multi_gpu.rst

Lines changed: 22 additions & 22 deletions
@@ -231,11 +231,11 @@ Distributed modes
-----------------
Lightning allows multiple ways of training

- - Data Parallel (`distributed_backend='dp'`) (multiple-gpus, 1 machine)
- - DistributedDataParallel (`distributed_backend='ddp'`) (multiple-gpus across many machines (python script based)).
- - DistributedDataParallel (`distributed_backend='ddp_spawn'`) (multiple-gpus across many machines (spawn based)).
- - DistributedDataParallel 2 (`distributed_backend='ddp2'`) (DP in a machine, DDP across machines).
- - Horovod (`distributed_backend='horovod'`) (multi-machine, multi-gpu, configured at runtime)
+ - Data Parallel (`accelerator='dp'`) (multiple-gpus, 1 machine)
+ - DistributedDataParallel (`accelerator='ddp'`) (multiple-gpus across many machines (python script based)).
+ - DistributedDataParallel (`accelerator='ddp_spawn'`) (multiple-gpus across many machines (spawn based)).
+ - DistributedDataParallel 2 (`accelerator='ddp2'`) (DP in a machine, DDP across machines).
+ - Horovod (`accelerator='horovod'`) (multi-machine, multi-gpu, configured at runtime)
  - TPUs (`tpu_cores=8|x`) (tpu or TPU pod)

.. note::
@@ -258,7 +258,7 @@ after which the root node will aggregate the results.
:skipif: torch.cuda.device_count() < 2

  # train on 2 GPUs (using DP mode)
- trainer = Trainer(gpus=2, distributed_backend='dp')
+ trainer = Trainer(gpus=2, accelerator='dp')

Distributed Data Parallel
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -281,10 +281,10 @@ Distributed Data Parallel
.. code-block:: python

  # train on 8 GPUs (same machine (ie: node))
- trainer = Trainer(gpus=8, distributed_backend='ddp')
+ trainer = Trainer(gpus=8, accelerator='ddp')

  # train on 32 GPUs (4 nodes)
- trainer = Trainer(gpus=8, distributed_backend='ddp', num_nodes=4)
+ trainer = Trainer(gpus=8, accelerator='ddp', num_nodes=4)

This Lightning implementation of DDP calls your script under the hood multiple times with the correct environment
variables:
@@ -330,7 +330,7 @@ In this case, we can use DDP2 which behaves like DP in a machine and DDP across
.. code-block:: python

  # train on 32 GPUs (4 nodes)
- trainer = Trainer(gpus=8, distributed_backend='ddp2', num_nodes=4)
+ trainer = Trainer(gpus=8, accelerator='ddp2', num_nodes=4)

Distributed Data Parallel Spawn
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -348,7 +348,7 @@ project module) you can use the following method:
.. code-block:: python

  # train on 8 GPUs (same machine (ie: node))
- trainer = Trainer(gpus=8, distributed_backend='ddp')
+ trainer = Trainer(gpus=8, accelerator='ddp')

We STRONGLY discourage this use because it has limitations (due to Python and PyTorch):
@@ -400,7 +400,7 @@ You can then call your scripts anywhere
.. code-block:: bash

  cd /project/src
- python some_file.py --distributed_backend 'ddp' --gpus 8
+ python some_file.py --accelerator 'ddp' --gpus 8


Horovod
@@ -421,10 +421,10 @@ Horovod can be configured in the training script to run with any number of GPUs
.. code-block:: python

  # train Horovod on GPU (number of GPUs / machines provided on command-line)
- trainer = Trainer(distributed_backend='horovod', gpus=1)
+ trainer = Trainer(accelerator='horovod', gpus=1)

  # train Horovod on CPU (number of processes / machines provided on command-line)
- trainer = Trainer(distributed_backend='horovod')
+ trainer = Trainer(accelerator='horovod')

When starting the training job, the driver application will then be used to specify the total
number of worker processes:
@@ -554,13 +554,13 @@ Below are the possible configurations we support.
  +=======+=========+====+=====+=========+============================================================+
  | Y | | | | | `Trainer(gpus=1)` |
  +-------+---------+----+-----+---------+------------------------------------------------------------+
- | Y | | | | Y | `Trainer(gpus=1, use_amp=True)` |
+ | Y | | | | Y | `Trainer(gpus=1, precision=16)` |
  +-------+---------+----+-----+---------+------------------------------------------------------------+
- | | Y | Y | | | `Trainer(gpus=k, distributed_backend='dp')` |
+ | | Y | Y | | | `Trainer(gpus=k, accelerator='dp')` |
  +-------+---------+----+-----+---------+------------------------------------------------------------+
- | | Y | | Y | | `Trainer(gpus=k, distributed_backend='ddp')` |
+ | | Y | | Y | | `Trainer(gpus=k, accelerator='ddp')` |
  +-------+---------+----+-----+---------+------------------------------------------------------------+
- | | Y | | Y | Y | `Trainer(gpus=k, distributed_backend='ddp', use_amp=True)` |
+ | | Y | | Y | Y | `Trainer(gpus=k, accelerator='ddp', precision=16)` |
  +-------+---------+----+-----+---------+------------------------------------------------------------+
@@ -590,10 +590,10 @@ In (DDP, Horovod) your effective batch size will be 7 * gpus * num_nodes.
.. code-block:: python

  # effective batch size = 7 * 8
- Trainer(gpus=8, distributed_backend='ddp|horovod')
+ Trainer(gpus=8, accelerator='ddp|horovod')

  # effective batch size = 7 * 8 * 10
- Trainer(gpus=8, num_nodes=10, distributed_backend='ddp|horovod')
+ Trainer(gpus=8, num_nodes=10, accelerator='ddp|horovod')


In DDP2, your effective batch size will be 7 * num_nodes.
@@ -602,10 +602,10 @@ The reason is that the full batch is visible to all GPUs on the node when using
.. code-block:: python

  # effective batch size = 7
- Trainer(gpus=8, distributed_backend='ddp2')
+ Trainer(gpus=8, accelerator='ddp2')

  # effective batch size = 7 * 10
- Trainer(gpus=8, num_nodes=10, distributed_backend='ddp2')
+ Trainer(gpus=8, num_nodes=10, accelerator='ddp2')


.. note:: Huge batch sizes are actually really bad for convergence. Check out:
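To make the batch-size arithmetic in these two hunks concrete, a small helper; it is not part of the changed docs, and the per-process batch size of 7 comes from the surrounding example:

.. code-block:: python

    def effective_batch_size(batch_size, gpus=1, num_nodes=1, accelerator=None):
        # Rule of thumb matching the examples above (illustrative only).
        if accelerator in ('ddp', 'ddp_spawn', 'horovod'):
            # every GPU process gets its own batch of `batch_size`
            return batch_size * gpus * num_nodes
        if accelerator == 'ddp2':
            # DP within a node: the node-level batch is split across its GPUs
            return batch_size * num_nodes
        return batch_size  # single device, or dp on one machine


    assert effective_batch_size(7, gpus=8, num_nodes=10, accelerator='ddp') == 560
    assert effective_batch_size(7, gpus=8, num_nodes=10, accelerator='ddp2') == 70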
@@ -619,7 +619,7 @@ Lightning supports the use of PytorchElastic to enable fault-tolerent and elasti
.. code-block:: python

- Trainer(gpus=8, distributed_backend='ddp')
+ Trainer(gpus=8, accelerator='ddp')


Following the `PytorchElastic Quickstart documentation <https://pytorch.org/elastic/latest/quickstart.html>`_, you then need to start a single-node etcd server on one of the hosts:

docs/source/performance.rst

Lines changed: 2 additions & 2 deletions
@@ -33,9 +33,9 @@ The best thing to do is to increase the ``num_workers`` slowly and stop once you
Spawn
^^^^^
- When using ``distributed_backend=ddp_spawn`` (the ddp default) or TPU training, the way multiple GPUs/TPU cores are used is by calling ``.spawn()`` under the hood.
+ When using ``accelerator=ddp_spawn`` (the ddp default) or TPU training, the way multiple GPUs/TPU cores are used is by calling ``.spawn()`` under the hood.
The problem is that PyTorch has issues with ``num_workers > 0`` when using ``.spawn()``. For this reason we recommend you
- use ``distributed_backend=ddp`` so you can increase the ``num_workers``, however your script has to be callable like so:
+ use ``accelerator=ddp`` so you can increase the ``num_workers``, however your script has to be callable like so:

.. code-block:: bash
docs/source/slurm.rst

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ To train a model using multiple nodes, do the following:
.. code-block:: python

  # train on 32 GPUs across 4 nodes
- trainer = Trainer(gpus=8, num_nodes=4, distributed_backend='ddp')
+ trainer = Trainer(gpus=8, num_nodes=4, accelerator='ddp')

3. It's a good idea to structure your training script like this:
@@ -37,7 +37,7 @@ To train a model using multiple nodes, do the following:
  trainer = pl.Trainer(
      gpus=8,
      num_nodes=4,
-     distributed_backend='ddp'
+     accelerator='ddp'
  )

  trainer.fit(model)
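For step 3 ("structure your training script like this"), a hedged, self-contained sketch of such a script; the module name `MyLightningModule` and the argparse wiring are illustrative assumptions rather than the file's own example:

.. code-block:: python

    # train.py
    from argparse import ArgumentParser

    import pytorch_lightning as pl

    from my_project import MyLightningModule  # hypothetical import


    def main(args):
        model = MyLightningModule(args)

        trainer = pl.Trainer(
            gpus=8,
            num_nodes=4,
            accelerator='ddp'
        )

        trainer.fit(model)


    if __name__ == '__main__':
        parser = ArgumentParser()
        parser.add_argument('--learning_rate', type=float, default=1e-3)
        main(parser.parse_args())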

docs/source/tpu.rst

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ Lightning supports training on a single TPU core. Just pass the TPU core ID [1-8
Distributed Backend with TPU
----------------------------
- The ```distributed_backend``` option used for GPUs does not apply to TPUs.
+ The ``accelerator`` option used for GPUs does not apply to TPUs.
TPUs work in DDP mode by default (distributing over each core)

----------------
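In other words, TPUs are selected through `tpu_cores` and the `accelerator` flag is simply not used; a minimal sketch, assuming `model` is an existing LightningModule:

.. code-block:: python

    import pytorch_lightning as pl

    # 8 cores of a single TPU; a list such as [1] picks one specific core
    trainer = pl.Trainer(tpu_cores=8)
    trainer.fit(model)  # `model` is your existing LightningModule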

pytorch_lightning/trainer/__init__.py

Lines changed: 7 additions & 7 deletions
@@ -203,18 +203,18 @@ def forward(self, x):
.. testcode::

  # default used by the Trainer
- trainer = Trainer(distributed_backend=None)
+ trainer = Trainer(accelerator=None)

Example::

  # dp = DataParallel
- trainer = Trainer(gpus=2, distributed_backend='dp')
+ trainer = Trainer(gpus=2, accelerator='dp')

  # ddp = DistributedDataParallel
- trainer = Trainer(gpus=2, num_nodes=2, distributed_backend='ddp')
+ trainer = Trainer(gpus=2, num_nodes=2, accelerator='ddp')

  # ddp2 = DistributedDataParallel + dp
- trainer = Trainer(gpus=2, num_nodes=2, distributed_backend='ddp2')
+ trainer = Trainer(gpus=2, num_nodes=2, accelerator='ddp2')

.. note:: this option does not apply to TPU. TPUs use ```ddp``` by default (over each core)
@@ -948,16 +948,16 @@ def on_train_end(self, trainer, pl_module):
|

Number of processes to train with. Automatically set to the number of GPUs
- when using ``distrbuted_backend="ddp"``. Set to a number greater than 1 when
- using ``distributed_backend="ddp_cpu"`` to mimic distributed training on a
+ when using ``accelerator="ddp"``. Set to a number greater than 1 when
+ using ``accelerator="ddp_cpu"`` to mimic distributed training on a
machine without GPUs. This is useful for debugging, but **will not** provide
any speedup, since single-process Torch already makes efficient use of multiple
CPUs.

.. testcode::

  # Simulate DDP for debugging on your GPU-less laptop
- trainer = Trainer(distributed_backend="ddp_cpu", num_processes=2)
+ trainer = Trainer(accelerator="ddp_cpu", num_processes=2)

num_sanity_val_steps
^^^^^^^^^^^^^^^^^^^^
