.. note:: Huge batch sizes are actually really bad for convergence. Check out:
@@ -619,7 +619,7 @@ Lightning supports the use of PytorchElastic to enable fault-tolerant and elastic

 .. code-block:: python

-    Trainer(gpus=8, distributed_backend='ddp')
+    Trainer(gpus=8, accelerator='ddp')

 Following the `PytorchElastic Quickstart documentation <https://pytorch.org/elastic/latest/quickstart.html>`_, you then need to start a single-node etcd server on one of the hosts:
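For reference, a minimal sketch of the renamed argument as it would appear in user code (the commented-out old spelling and the ``trainer.fit`` call are illustrative, not part of the diff):

.. code-block:: python

    import pytorch_lightning as pl

    # Old, deprecated spelling shown on the removed line above:
    # trainer = pl.Trainer(gpus=8, distributed_backend='ddp')

    # New spelling introduced by this change:
    trainer = pl.Trainer(gpus=8, accelerator='ddp')
    # trainer.fit(model)  # `model` would be your LightningModule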
docs/source/performance.rst
+2 -2 (2 additions & 2 deletions)
@@ -33,9 +33,9 @@ The best thing to do is to increase the ``num_workers`` slowly and stop once you

 Spawn
 ^^^^^
-When using ``distributed_backend=ddp_spawn`` (the ddp default) or TPU training, the way multiple GPUs/TPU cores are used is by calling ``.spawn()`` under the hood.
+When using ``accelerator=ddp_spawn`` (the ddp default) or TPU training, the way multiple GPUs/TPU cores are used is by calling ``.spawn()`` under the hood.
 The problem is that PyTorch has issues with ``num_workers > 0`` when using ``.spawn()``. For this reason we recommend you
-use ``distributed_backend=ddp`` so you can increase the ``num_workers``, however your script has to be callable like so:
+use ``accelerator=ddp`` so you can increase the ``num_workers``, however your script has to be callable like so:
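The "callable" requirement exists because ``accelerator='ddp'`` re-launches the training script once per GPU process. A minimal sketch of such a script, assuming a toy LightningModule (``ToyModel`` and the file name ``train.py`` are illustrative placeholders, not from the docs):

.. code-block:: python

    # train.py -- launched from the shell as: python train.py
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import pytorch_lightning as pl


    class ToyModel(pl.LightningModule):
        """Placeholder module; only the Trainer/DataLoader arguments matter here."""

        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.cross_entropy(self.layer(x), y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    if __name__ == "__main__":
        # ddp re-runs this file in each worker process, so the training code
        # must sit under this guard; with ddp (unlike ddp_spawn),
        # num_workers > 0 in the DataLoader is safe.
        dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
        train_loader = DataLoader(dataset, batch_size=32, num_workers=4)

        trainer = pl.Trainer(gpus=2, accelerator='ddp')
        trainer.fit(ToyModel(), train_loader)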