Proposed refactoring or deprecation
The `LightningDistributed` class is used only by the DDP and DDPSpawn plugins, and it holds a single `broadcast` function that wraps the torch collectives. It is unnecessary to keep it as a separate class.

We also have to set its `rank` and `device` in the setup steps. If a subclass extends the DDP or DDPSpawn plugin and overrides the method where `LightningDistributed.rank` and `LightningDistributed.device` are set, this can cause silent failures (see the sketch below).

- Currently, the `src` argument is not respected in the torch broadcast; it is hardcoded to `0`.
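As a sketch of that failure mode (the class layout is simplified and the setup hook name is illustrative, not the exact Lightning API):

```python
# Illustrative sketch only: simplified stand-ins for the real classes,
# showing how an override can leave LightningDistributed half-configured.

class LightningDistributed:
    def __init__(self, rank=None, device=None):
        self.rank = rank        # stays None until a setup step assigns it
        self.device = device    # stays None until a setup step assigns it

class DDPPlugin:
    def __init__(self):
        self.dist = LightningDistributed()

    def pre_dispatch(self):  # illustrative setup hook name
        self.dist.rank = 0            # in Lightning: the global rank
        self.dist.device = "cuda:0"   # in Lightning: the root device

class CustomDDPPlugin(DDPPlugin):
    def pre_dispatch(self):
        # Override without calling super().pre_dispatch():
        # dist.rank and dist.device silently stay None, and a later
        # broadcast() misbehaves instead of failing loudly at setup.
        ...
```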
Motivation
Simplify the code structure and reduce the possibility of silent failures.
Pitch
Deprecate `LightningDistributed`. It contains only one function:
```python
def broadcast(self, obj: Any, group=_group.WORLD):
    # always wrap into a list so it can be broadcasted.
    obj = [obj]
    if self.rank != 0:
        obj = [None] * len(obj)
    broadcast_object_list(obj, 0, group=group or _group.WORLD)  # src is hardcoded to 0
    return obj[0]
```
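For reference, `torch.distributed.broadcast_object_list` overwrites the list in place on non-source ranks, which is why the object is wrapped in a list. A minimal sketch of the call pattern (not Lightning code; run with `torchrun --nproc_per_node=2 demo.py`):

```python
import torch.distributed as dist

def main():
    dist.init_process_group("gloo")
    # the payload only exists on the source rank; others hold a placeholder
    obj = ["payload from rank 0"] if dist.get_rank() == 0 else [None]
    # broadcast_object_list mutates `obj` in place on non-src ranks,
    # which is why the broadcast helpers above wrap the object in a list
    dist.broadcast_object_list(obj, src=0)
    print(f"rank {dist.get_rank()} received: {obj[0]}")

if __name__ == "__main__":
    main()
```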
Move the implementation into the DDP and DDPSpawn plugins:
```python
def broadcast(self, obj: object, src: int = 0) -> object:
    if not distributed_available():
        raise RuntimeError(
            "DDP is not initialized and torch.distributed is not available, cannot broadcast object"
        )
    obj = [obj]
    # compare against `src` (not a hardcoded 0) so the source rank keeps its payload
    if self.global_rank != src:
        obj = [None] * len(obj)
    broadcast_object_list(obj, src, group=_group.WORLD)
    return obj[0]
```
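A hedged end-to-end sketch of the proposed method, showing a broadcast from a non-zero `src` (here `MiniDDP` is a hypothetical stand-in for the plugin classes, and the group argument is left at the torch default; run with `torchrun --nproc_per_node=2 sketch.py`):

```python
import torch.distributed as dist

def distributed_available() -> bool:
    return dist.is_available() and dist.is_initialized()

class MiniDDP:  # hypothetical stand-in for DDPPlugin / DDPSpawnPlugin
    @property
    def global_rank(self) -> int:
        return dist.get_rank()

    def broadcast(self, obj: object, src: int = 0) -> object:
        if not distributed_available():
            raise RuntimeError("torch.distributed is not available, cannot broadcast object")
        obj = [obj]
        if self.global_rank != src:
            obj = [None] * len(obj)
        dist.broadcast_object_list(obj, src)
        return obj[0]

if __name__ == "__main__":
    dist.init_process_group("gloo")
    plugin = MiniDDP()
    # broadcast from rank 1: every rank, including rank 0, receives the payload
    result = plugin.broadcast({"step": 42} if dist.get_rank() == 1 else None, src=1)
    print(f"rank {dist.get_rank()}: {result}")
```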
Additional context
Related to #7534