Commit 571195c

ananthsub authored and Borda committed
Fix hang in DDP HPC accelerators (#5157)
* Fix hang in DDP HPC accelerators

  init_device was never called

* Update CHANGELOG.md
1 parent ca886e7 commit 571195c

3 files changed: +5 -0 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 - Do not warn when the `name` key is used in the `lr_scheduler` dict ([#5057](https://github.com/PyTorchLightning/pytorch-lightning/pull/5057))
 
+- Fixed `DDPHPCAccelerator` hangs in DDP construction by calling `init_device` ([#5157](https://github.com/PyTorchLightning/pytorch-lightning/pull/5157))
 
 
 ## [1.1.0] - 2020-12-09

pytorch_lightning/accelerators/ddp_cpu_hpc_accelerator.py

Lines changed: 3 additions & 0 deletions
@@ -43,3 +43,6 @@ def model_to_device(self, model, process_idx):
     def get_device_ids(self):
         device_ids = None
         return device_ids
+
+    def init_device(self, process_idx):
+        pass

pytorch_lightning/accelerators/ddp_hpc_accelerator.py

Lines changed: 1 addition & 0 deletions
@@ -121,6 +121,7 @@ def ddp_train(self, process_idx, model):
         """
         # determine which process we are and world size
         self.set_world_ranks(process_idx)
+        self.init_device(process_idx)
 
         # toggle prog bar
         if (self.trainer.node_rank != 0 or process_idx != 0) and self.trainer.progress_bar_callback is not None:
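
Note: the two code changes work as a pair. `ddp_train` now calls an `init_device` hook right after the ranks are set, and the CPU variant supplies a no-op for that hook. The sketch below illustrates this shape with toy classes only. The names `ToyDDPHPCAccelerator`, `ToyDDPCPUHPCAccelerator`, the `calls` log, and the `"setup_ddp"` step are hypothetical and are not part of PyTorch Lightning. It assumes the CPU accelerator inherits `ddp_train` from the HPC accelerator, and it does not reproduce what the real `init_device` does.

```python
class ToyDDPHPCAccelerator:
    """Hypothetical stand-in for an accelerator that needs per-process device setup."""

    def __init__(self):
        # Record the order of steps so the call sequence can be inspected.
        self.calls = []

    def set_world_ranks(self, process_idx):
        self.calls.append(("set_world_ranks", process_idx))

    def init_device(self, process_idx):
        # Placeholder for device setup that must happen before DDP construction.
        self.calls.append(("init_device", process_idx))

    def ddp_train(self, process_idx):
        self.set_world_ranks(process_idx)
        # The step this commit adds: run device setup before building DDP.
        self.init_device(process_idx)
        # Placeholder for wrapping the model in DDP and starting training.
        self.calls.append(("setup_ddp", process_idx))
        return self.calls


class ToyDDPCPUHPCAccelerator(ToyDDPHPCAccelerator):
    """Hypothetical CPU variant: nothing to initialise, so the hook is a no-op."""

    def init_device(self, process_idx):
        pass


if __name__ == "__main__":
    print(ToyDDPHPCAccelerator().ddp_train(1))
    print(ToyDDPCPUHPCAccelerator().ddp_train(0))
```

Making the hook a no-op in the CPU subclass lets the shared `ddp_train` call it unconditionally, without branching on the device type.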
