docs/source/common/optimizers.rst (10 additions, 14 deletions)
@@ -40,10 +40,9 @@ to manually manage the optimization process. To do so, do the following:
         loss = self.compute_loss(batch)
         self.manual_backward(loss)
 
-
 .. note:: This is only recommended for experts who need ultimate flexibility. Lightning will handle only the precision and accelerator logic. The user is left with ``optimizer.zero_grad()``, gradient accumulation, model toggling, etc.
 
-.. warning:: Before 1.2, ``optimzer.step`` was calling ``optimizer.zero_grad()`` internally. From 1.2, it is left to the users expertize.
+.. warning:: Before 1.2, ``optimizer.step`` was calling ``optimizer.zero_grad()`` internally. From 1.2, it is left to the user's expertise.
 
 .. tip:: To perform ``accumulate_grad_batches`` with one optimizer, you can do as such.
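The manual-optimization contract above can be sketched in plain PyTorch. This is not the Lightning API itself: the model, data, and loss below are hypothetical stand-ins, and inside a real ``LightningModule`` you would call ``self.manual_backward(loss)`` instead of ``loss.backward()``. The point is the ordering the user now owns: ``zero_grad``, then backward, then ``step``.

```python
import torch

# Hypothetical stand-in model and data; in Lightning these would come from
# the LightningModule and the batch passed to training_step.
model = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(2):  # two toy batches
    batch = torch.randn(4, 3)
    opt.zero_grad()                    # the user's responsibility from 1.2 on
    loss = model(batch).pow(2).mean()  # stand-in for self.compute_loss(batch)
    loss.backward()                    # in Lightning: self.manual_backward(loss)
    opt.step()
```

Skipping the ``opt.zero_grad()`` call here would silently sum gradients across batches, which is why the warning above flags the 1.2 behavior change.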
@@ -65,8 +64,7 @@ to manually manage the optimization process. To do so, do the following:
         opt.step()
         opt.zero_grad()
 
-
-.. tip:: It is a good practice to provide the optimizer with a ``closure`` function that performs a ``forward`` and ``backward`` pass of your model. It is optional for most optimizers, but makes your code compatible if you switch to an optimizer which requires a closure. See also `the PyTorch docs <https://pytorch.org/docs/stable/optim.html#optimizer-step-closure>`_.
+.. tip:: It is good practice to provide the optimizer with a ``closure`` function that performs a ``forward``, ``zero_grad`` and ``backward`` pass of your model. It is optional for most optimizers, but makes your code compatible if you switch to an optimizer which requires a closure. See also `the PyTorch docs <https://pytorch.org/docs/stable/optim.html#optimizer-step-closure>`_.
 
 Here is the same example as above using a ``closure``.
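To see why a closure matters, here is a minimal plain-PyTorch sketch (not taken from the docs page; the model and data are hypothetical) using ``torch.optim.LBFGS``, one of the optimizers that actually *requires* a closure, since it may need to re-evaluate the loss several times per step:

```python
import torch

# Hypothetical toy regression problem.
model = torch.nn.Linear(4, 1)
data = torch.randn(8, 4)
target = torch.randn(8, 1)
opt = torch.optim.LBFGS(model.parameters(), lr=0.1)

def closure():
    # The closure re-runs zero_grad, forward, and backward, so the
    # optimizer can call it as many times as it needs.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    loss.backward()
    return loss

loss = opt.step(closure)  # LBFGS will not work without a closure
```

Code written this way also runs unchanged with closure-optional optimizers such as SGD, which is the compatibility argument the tip makes.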
@@ -78,20 +76,20 @@ Here is the same example as above using a ``closure``.
     def training_step(self, batch, batch_idx):
         opt = self.optimizers()
 
-        def forward_and_backward():
+        def closure():
+            # Only zero_grad on the first batch to accumulate gradients
+            is_first_batch_to_accumulate = batch_idx % 2 == 0
+            if is_first_batch_to_accumulate:
+                opt.zero_grad()
+
             loss = self.compute_loss(batch)
             self.manual_backward(loss)
+            return loss
 
-        opt.step(closure=forward_and_backward)
-
-        # accumulate gradient batches
-        if batch_idx % 2 == 0:
-            opt.zero_grad()
+        opt.step(closure=closure)
 
 .. tip:: Be careful where you call ``zero_grad`` or your model won't converge. It is good practice to call ``zero_grad`` before ``manual_backward``.
 
-
 .. testcode:: python
 
     import torch
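The accumulate-every-2-batches closure above can be exercised in plain PyTorch. This is a stand-in sketch, not the Lightning snippet itself: ``self.manual_backward(loss)`` becomes ``loss.backward()``, and the model and batches are hypothetical. ``zero_grad`` runs only on even batch indices, so each odd batch's gradients are added on top of the previous batch's before the step:

```python
import torch

# Hypothetical model and a short stream of toy batches.
model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [torch.ones(1, 2), torch.ones(1, 2) * 2, torch.ones(1, 2)]

for batch_idx, batch in enumerate(batches):
    def closure():
        # Only zero_grad on the first batch of each accumulation pair,
        # so the second batch's gradients accumulate on top.
        if batch_idx % 2 == 0:
            opt.zero_grad()
        loss = model(batch).sum()   # stand-in for self.compute_loss(batch)
        loss.backward()             # in Lightning: self.manual_backward(loss)
        return loss

    opt.step(closure=closure)
```

Note that, as in the docs' version, ``opt.step`` still runs on every batch; only the ``zero_grad`` cadence changes, which is what the tip about ``zero_grad`` placement is warning about.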
@@ -174,10 +172,8 @@ Setting ``sync_grad`` to ``False`` will block this synchronization and improve y
 
 Here is an example for an advanced use-case.
 
-
 .. testcode:: python
 
-
     # Scenario for a GAN with gradient accumulation every 2 batches and optimized for multiple gpus.