Skip to content

Commit b546431

Browse files
SeanNarenlexierule
authored andcommitted
[Fix] Ensure we set the default device before initializing deepspeed (#6460)
* Ensure we set the default device before initializing deepspeed * Add CHANGELOG.md * Update pytorch_lightning/plugins/training_type/deepspeed.py Co-authored-by: Kaushik B <[email protected]> Co-authored-by: Kaushik B <[email protected]> (cherry picked from commit 1c013b4)
1 parent be65b2e commit b546431

File tree

2 files changed

+9
-17
lines changed

2 files changed

+9
-17
lines changed

CHANGELOG.md

Lines changed: 7 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -104,21 +104,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
104104
- Expose DeepSpeed loss parameters to allow users to fix loss instability ([#6115](https://github.com/PyTorchLightning/pytorch-lightning/pull/6115))
105105

106106

107-
- Fixed `AttributeError` when `logger=None` on TPU ([#6221](https://github.com/PyTorchLightning/pytorch-lightning/pull/6221))
108-
109-
110-
- Fixed `ModelPruning(make_pruning_permanent=True)` pruning buffers getting removed when saved during training ([#6073](https://github.com/PyTorchLightning/pytorch-lightning/pull/6073))
111-
112-
113-
- Fixed `trainer.test` from `best_path` hangs after calling `trainer.fit` ([#6272](https://github.com/PyTorchLightning/pytorch-lightning/pull/6272))
114-
115-
116107
- Fixed duplicate logs appearing in console when using the python logging module ([#5509](https://github.com/PyTorchLightning/pytorch-lightning/pull/5509), [#6275](https://github.com/PyTorchLightning/pytorch-lightning/pull/6275))
117108

118109

119-
- Fixed `SingleTPU` calling `all_gather` ([#6296](https://github.com/PyTorchLightning/pytorch-lightning/pull/6296))
120-
121-
122110
- Fixed DP reduction with collection ([#6324](https://github.com/PyTorchLightning/pytorch-lightning/pull/6324))
123111

124112

@@ -128,21 +116,23 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
128116
- Fixed `.on_fit_{start,end}()` getting called during `trainer.test` ([#6386](https://github.com/PyTorchLightning/pytorch-lightning/pull/6386))
129117

130118

131-
- Fixed PyTorch Profiler with `emit_nvtx` ([#6260](https://github.com/PyTorchLightning/pytorch-lightning/pull/6260))
119+
- Fixed an issue where the tuner would not tune the learning rate if also tuning the batch size ([#4688](https://github.com/PyTorchLightning/pytorch-lightning/pull/4688))
132120

133121

134-
- Fixed `Trainer` not resetting `lightning_optimizers` when calling `Trainer.fit()` multiple times ([#6372](https://github.com/PyTorchLightning/pytorch-lightning/pull/6372))
122+
- Fixed logger creating directory structure too early in DDP ([#6380](https://github.com/PyTorchLightning/pytorch-lightning/pull/6380))
135123

136124

137-
- Fixed an issue where the tuner would not tune the learning rate if also tuning the batch size ([#4688](https://github.com/PyTorchLightning/pytorch-lightning/pull/4688))
125+
- Fixed LightningModule `all_gather` on cpu tensors ([#6416](https://github.com/PyTorchLightning/pytorch-lightning/pull/6416))
138126

139127

140-
- Fixed logger creating directory structure too early in DDP ([#6380](https://github.com/PyTorchLightning/pytorch-lightning/pull/6380))
128+
- Fixed DeepSpeed additional memory use on rank 0 when default device not set early enough ([#6460](https://github.com/PyTorchLightning/pytorch-lightning/pull/6460))
141129

142130

143-
## [1.2.3] - 2021-03-09
131+
- Fixed `DummyLogger.log_hyperparams` raising a `TypeError` when running with `fast_dev_run=True` ([#6398](https://github.com/PyTorchLightning/pytorch-lightning/pull/6398))
144132

145133

134+
## [1.2.3] - 2021-03-09
135+
146136
### Fixed
147137

148138
- Fixed `ModelPruning(make_pruning_permanent=True)` pruning buffers getting removed when saved during training ([#6073](https://github.com/PyTorchLightning/pytorch-lightning/pull/6073))

pytorch_lightning/plugins/training_type/deepspeed.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -207,6 +207,8 @@ def _init_scheduler_optimizer(self):
207207
return optimizer, scheduler, optimizer_frequencies
208208

209209
def _initialize_deepspeed_train(self, model):
210+
if self.on_gpu:
211+
torch.cuda.set_device(self.root_device)
210212
optimizer, lightning_scheduler, optimizer_frequencies = None, None, None
211213
if "optimizer" not in self.config:
212214
rank_zero_info(

0 commit comments

Comments
 (0)