🚀 Feature
Provide an easy-to-use method to skip gradient synchronization.
Motivation
Skipping the synchronization of gradients in `.backward()` is useful when accumulating gradients in the training loop, since it cuts down communication overhead for DDP and similar distributed strategies (see the raw-PyTorch sketch below).
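For context, this is roughly the boilerplate the feature would remove. With plain PyTorch DDP, users have to call `DistributedDataParallel.no_sync()` by hand. A minimal sketch, assuming `model` is already wrapped in DDP and that `train_dataloader`, `loss_fn`, and `optimizer` are set up elsewhere:

```python
import contextlib

num_accumulation_steps = 8

for step, (x, y) in enumerate(train_dataloader):
    # DDP's built-in `no_sync()` suppresses the gradient all-reduce for the
    # backward passes run inside it; use a null context on the sync steps.
    context = model.no_sync() if step % num_accumulation_steps != 0 else contextlib.nullcontext()
    with context:
        loss = loss_fn(model(x), y)
        loss.backward()
    if step % num_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```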
Pitch
Introduce a context manager like the one we have internally in PL. Possible names:

- `Lite.skip_gradient_sync(model)`
- `Lite.skip_backward_sync(model)`
- `Lite.block_gradient_sync(model)`
- `Lite.block_backward_sync(model)`
- `Lite.enable_gradient_sync(model, True|False)`
- `Lite.disable_gradient_sync(model, True|False)`
- `Lite.no_sync(model, enabled=True|False)`

Usage in the training loop:
```python
num_accumulation_steps = 8

for step, (x, y) in enumerate(train_dataloader):
    with self.no_sync(model, step % num_accumulation_steps != 0):
        out = model(x)
        loss = loss_fn(out, y)
        self.backward(loss)
    if step % num_accumulation_steps == 0:
        # apply accumulated gradients
        optimizer.step()
        optimizer.zero_grad()
```
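For completeness, here is one way such a context manager could be implemented on top of DDP's existing `no_sync()`. This is only a minimal sketch, not the actual Lite API: the function name and `enabled` flag mirror the pitch above, and a real implementation would also need to unwrap the module returned by `Lite.setup()` and handle non-DDP strategies appropriately:

```python
import contextlib

from torch.nn.parallel import DistributedDataParallel


@contextlib.contextmanager
def no_sync(model, enabled: bool = True):
    """Skip gradient synchronization in ``backward()`` while the context is active.

    Sketch only: assumes ``model`` is either a plain module or DDP-wrapped.
    """
    if enabled and isinstance(model, DistributedDataParallel):
        # Defer to DDP's own context manager, which disables the gradient
        # all-reduce for backward passes run inside it.
        with model.no_sync():
            yield
    else:
        # Nothing to skip for single-device / non-DDP setups, or when disabled.
        yield
```

With a helper like this exposed as a method on `LightningLite`, the training loop above would read exactly as pitched.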
Alternatives

Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.