
Commit 2b420e2

Merge branch 'master' into docs/half_precision_on_cpu
2 parents: b20947f + 6ad05d3

File tree

12 files changed: +102, -330 lines


CHANGELOG.md

Lines changed: 36 additions & 41 deletions
@@ -88,9 +88,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added support for the PyTorch 1.8.1 autograd profiler ([#6618](https://github.com/PyTorchLightning/pytorch-lightning/pull/6618))


-- Added `outputs` parameter to callback's `on_validation_epoch_end` & `on_test_epoch_end` hooks ([#6120](https://github.com/PyTorchLightning/pytorch-lightning/pull/6120))
-
-
 - Added `configure_sharded_model` hook ([#6679](https://github.com/PyTorchLightning/pytorch-lightning/pull/6679))


@@ -213,6 +210,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Deprecated `Trainer.truncated_bptt_steps` in favor of `LightningModule.truncated_bptt_steps` ([#7323](https://github.com/PyTorchLightning/pytorch-lightning/pull/7323))


+- Deprecated `outputs` in both `LightningModule.on_train_epoch_end` and `Callback.on_train_epoch_end` hooks ([#7339](https://github.com/PyTorchLightning/pytorch-lightning/pull/7339))
+
+
 - Deprecated `LightningModule.grad_norm` in favor of `pytorch_lightning.utilities.grads.grad_norm` ([#7292](https://github.com/PyTorchLightning/pytorch-lightning/pull/7292))


@@ -342,11 +342,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed incorrect removal of `WORLD_SIZE` environment variable in DDP training when launching with torch distributed/torchelastic ([#6942](https://github.com/PyTorchLightning/pytorch-lightning/pull/6942))


-- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic:
-    * Support SLURM and torchelastic global rank environment variables ([#5715](https://github.com/PyTorchLightning/pytorch-lightning/pull/5715))
-    * Remove hardcoding of local rank in accelerator connector ([#6878](https://github.com/PyTorchLightning/pytorch-lightning/pull/6878))
-
-
 - Made the `Plugin.reduce` method more consistent across all Plugins to reflect a mean-reduction by default ([#6011](https://github.com/PyTorchLightning/pytorch-lightning/pull/6011))


@@ -356,9 +351,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Do not print top-k verbose log with `ModelCheckpoint(monitor=None)` ([#6109](https://github.com/PyTorchLightning/pytorch-lightning/pull/6109))


-- Fixed csv extension check ([#6436](https://github.com/PyTorchLightning/pytorch-lightning/pull/6436))
-
-
 - Fixed `ModelCheckpoint(monitor=None, save_last=True)` not saving checkpoints ([#6136](https://github.com/PyTorchLightning/pytorch-lightning/pull/6136))


@@ -380,30 +372,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed `trainer.tuner.{lr_find,scale_batch_size}` not setting the `Trainer` state properly ([#7258](https://github.com/PyTorchLightning/pytorch-lightning/pull/7258))


-- Fixed bug where `BaseFinetuning.flatten_modules()` was duplicating leaf node parameters ([#6879](https://github.com/PyTorchLightning/pytorch-lightning/pull/6879))
-
-
 - Fixed bug where the learning rate schedulers did not follow the optimizer frequencies ([#4868](https://github.com/PyTorchLightning/pytorch-lightning/pull/4868))


-- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met ([#6705](https://github.com/PyTorchLightning/pytorch-lightning/pull/6705))
-
-
-- Fixed TPU Spawn all gather ([#6896](https://github.com/PyTorchLightning/pytorch-lightning/pull/6896))
-
-
-- Fixed `--gpus` default for parser returned by `Trainer.add_argparse_args` ([#6898](https://github.com/PyTorchLightning/pytorch-lightning/pull/6898))
-
-
 - Fixed pickle error checker to now check for `pickle.PickleError` to catch all pickle errors ([#6917](https://github.com/PyTorchLightning/pytorch-lightning/pull/6917))


-- Fixed `AttributeError` for `require_backward_grad_sync` when running manual optimization with sharded plugin ([#6915](https://github.com/PyTorchLightning/pytorch-lightning/pull/6915))
-
-
-- Fixed multi-gpu join for Horovod ([#6954](https://github.com/PyTorchLightning/pytorch-lightning/pull/6954))
-
-
 - Fixed a bug where `LightningModule.training_epoch_end` was called after the `on_train_end_epoch` hook ([#6969](https://github.com/PyTorchLightning/pytorch-lightning/pull/6969))


@@ -413,27 +387,18 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed a bug where the outputs passed to `train_batch_end` would be lists even when using a single optimizer and no truncated backprop through time steps ([#6969](https://github.com/PyTorchLightning/pytorch-lightning/pull/6969))


-- Fixed `sync_dist` for tpus ([#6950](https://github.com/PyTorchLightning/pytorch-lightning/pull/6950))
-
-
 - Fixed bug for trainer error handling which would cause hang for distributed training ([#6864](https://github.com/PyTorchLightning/pytorch-lightning/pull/6864))


 - Fixed `self.device` not returning the correct device in replicas of data-parallel ([#6414](https://github.com/PyTorchLightning/pytorch-lightning/pull/6414))


-- Fixed process rank not being available right away after `Trainer` instantiation ([#6941](https://github.com/PyTorchLightning/pytorch-lightning/pull/6941))
-
-
 - Fixed `lr_find` trying beyond `num_training` steps and suggesting a too high learning rate ([#7076](https://github.com/PyTorchLightning/pytorch-lightning/pull/7076))


 - Fixed logger creating incorrect version folder in DDP with repeated `Trainer.fit` calls ([#7077](https://github.com/PyTorchLightning/pytorch-lightning/pull/7077))


-- Fixed the order to call for world ranks & the `root_device` property in `TPUSpawnPlugin` ([#7074](https://github.com/PyTorchLightning/pytorch-lightning/pull/7074))
-
-
 - Fixed metric objects passed directly to `self.log` not being reset correctly ([#7055](https://github.com/PyTorchLightning/pytorch-lightning/pull/7055))


@@ -443,9 +408,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed the save_dir in `WandbLogger` when the run was initiated externally ([#7106](https://github.com/PyTorchLightning/pytorch-lightning/pull/7106))


-- Fixed parsing for pre-release package versions ([#6999](https://github.com/PyTorchLightning/pytorch-lightning/pull/6999))
-
-
 - Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling ([#7014](https://github.com/PyTorchLightning/pytorch-lightning/pull/7014))


@@ -485,6 +447,39 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed a bug where an error would be raised if the train dataloader sometimes produced None for a batch ([#7342](https://github.com/PyTorchLightning/pytorch-lightning/pull/7342))


+## [1.2.9] - 2021-04-20
+
+### Fixed
+
+- Fixed the order to call for world ranks & the `root_device` property in `TPUSpawnPlugin` ([#7074](https://github.com/PyTorchLightning/pytorch-lightning/pull/7074))
+- Fixed multi-gpu join for Horovod ([#6954](https://github.com/PyTorchLightning/pytorch-lightning/pull/6954))
+- Fixed parsing for pre-release package versions ([#6999](https://github.com/PyTorchLightning/pytorch-lightning/pull/6999))
+
+
+## [1.2.8] - 2021-04-14
+
+### Added
+
+- Added TPUSpawn + IterableDataset error message ([#6875](https://github.com/PyTorchLightning/pytorch-lightning/pull/6875))
+
+### Fixed
+
+- Fixed process rank not being available right away after `Trainer` instantiation ([#6941](https://github.com/PyTorchLightning/pytorch-lightning/pull/6941))
+- Fixed `sync_dist` for tpus ([#6950](https://github.com/PyTorchLightning/pytorch-lightning/pull/6950))
+- Fixed `AttributeError` for `require_backward_grad_sync` when running manual optimization with sharded plugin ([#6915](https://github.com/PyTorchLightning/pytorch-lightning/pull/6915))
+- Fixed `--gpus` default for parser returned by `Trainer.add_argparse_args` ([#6898](https://github.com/PyTorchLightning/pytorch-lightning/pull/6898))
+- Fixed TPU Spawn all gather ([#6896](https://github.com/PyTorchLightning/pytorch-lightning/pull/6896))
+- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met ([#6705](https://github.com/PyTorchLightning/pytorch-lightning/pull/6705))
+- Fixed csv extension check ([#6436](https://github.com/PyTorchLightning/pytorch-lightning/pull/6436))
+- Fixed checkpoint issue when using Horovod distributed backend ([#6958](https://github.com/PyTorchLightning/pytorch-lightning/pull/6958))
+- Fixed tensorboard exception raising ([#6901](https://github.com/PyTorchLightning/pytorch-lightning/pull/6901))
+- Fixed setting the eval/train flag correctly on accelerator model ([#6983](https://github.com/PyTorchLightning/pytorch-lightning/pull/6983))
+- Fixed DDP_SPAWN compatibility with bug_report_model.py ([#6892](https://github.com/PyTorchLightning/pytorch-lightning/pull/6892))
+- Fixed bug where `BaseFinetuning.flatten_modules()` was duplicating leaf node parameters ([#6879](https://github.com/PyTorchLightning/pytorch-lightning/pull/6879))
+- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic:
+    * Support SLURM and torchelastic global rank environment variables ([#5715](https://github.com/PyTorchLightning/pytorch-lightning/pull/5715))
+    * Remove hardcoding of local rank in accelerator connector ([#6878](https://github.com/PyTorchLightning/pytorch-lightning/pull/6878))
+

 ## [1.2.7] - 2021-04-06
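
The deprecation entries above change the hook signatures that user callbacks implement. A minimal before/after sketch, based only on the signatures in this commit; the callback name and print statement are illustrative, not part of the library:

    import pytorch_lightning as pl

    class MyEpochCallback(pl.Callback):
        # Old style (pre-1.3), now deprecated: relying on the `outputs` argument.
        # def on_train_epoch_end(self, trainer, pl_module, outputs):
        #     ...

        # New style: no epoch-level `outputs`; the trailing argument survives only
        # as `unused` for backward compatibility with the old signature.
        def on_train_epoch_end(self, trainer, pl_module, unused=None):
            print(f"finished epoch {trainer.current_epoch}")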

pytorch_lightning/callbacks/base.py

Lines changed: 10 additions & 6 deletions
@@ -22,7 +22,7 @@
 from torch.optim import Optimizer

 import pytorch_lightning as pl
-from pytorch_lightning.utilities.types import EPOCH_OUTPUT, STEP_OUTPUT
+from pytorch_lightning.utilities.types import STEP_OUTPUT


 class Callback(abc.ABC):
@@ -101,24 +101,28 @@ def on_train_epoch_start(self, trainer: 'pl.Trainer', pl_module: 'pl.LightningMo
     def on_train_epoch_end(
         self, trainer: 'pl.Trainer', pl_module: 'pl.LightningModule', unused: Optional = None
     ) -> None:
-        """Called when the train epoch ends."""
+        """Called when the train epoch ends.
+
+        To access all batch outputs at the end of the epoch, either:
+
+        1. Implement `training_epoch_end` in the `LightningModule` and access outputs via the module OR
+        2. Cache data across train batch hooks inside the callback implementation to post-process in this hook.
+        """
         pass

     def on_validation_epoch_start(self, trainer: 'pl.Trainer', pl_module: 'pl.LightningModule') -> None:
         """Called when the val epoch begins."""
         pass

-    def on_validation_epoch_end(
-        self, trainer: 'pl.Trainer', pl_module: 'pl.LightningModule', outputs: EPOCH_OUTPUT
-    ) -> None:
+    def on_validation_epoch_end(self, trainer: 'pl.Trainer', pl_module: 'pl.LightningModule') -> None:
         """Called when the val epoch ends."""
         pass

     def on_test_epoch_start(self, trainer: 'pl.Trainer', pl_module: 'pl.LightningModule') -> None:
         """Called when the test epoch begins."""
         pass

-    def on_test_epoch_end(self, trainer: 'pl.Trainer', pl_module: 'pl.LightningModule', outputs: EPOCH_OUTPUT) -> None:
+    def on_test_epoch_end(self, trainer: 'pl.Trainer', pl_module: 'pl.LightningModule') -> None:
         """Called when the test epoch ends."""
         pass
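
The new docstring recommends caching data across train batch hooks inside the callback. A minimal sketch of that pattern, assuming the batch-level hook signature of this release; the callback name and what it caches are illustrative:

    import pytorch_lightning as pl

    class TrainOutputCache(pl.Callback):
        """Caches per-batch training outputs and post-processes them at epoch end."""

        def __init__(self):
            self.cached_outputs = []

        def on_train_epoch_start(self, trainer, pl_module):
            self.cached_outputs = []  # reset the cache at the start of every epoch

        def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
            # batch-level hooks still receive the step outputs, so cache them here
            self.cached_outputs.append(outputs)

        def on_train_epoch_end(self, trainer, pl_module, unused=None):
            # post-process the cache instead of receiving `outputs` from the hook
            print(f"cached {len(self.cached_outputs)} batch outputs this epoch")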

pytorch_lightning/core/hooks.py

Lines changed: 8 additions & 3 deletions
@@ -20,7 +20,7 @@
 from torch.utils.data import DataLoader

 from pytorch_lightning.utilities import move_data_to_device, rank_zero_warn
-from pytorch_lightning.utilities.types import EPOCH_OUTPUT, STEP_OUTPUT
+from pytorch_lightning.utilities.types import STEP_OUTPUT


 class ModelHooks:
@@ -238,14 +238,19 @@ def on_train_epoch_start(self) -> None:
     def on_train_epoch_end(self, unused: Optional = None) -> None:
         """
         Called in the training loop at the very end of the epoch.
+
+        To access all batch outputs at the end of the epoch, either:
+
+        1. Implement `training_epoch_end` in the LightningModule OR
+        2. Cache data across steps on the attribute(s) of the `LightningModule` and access them in this hook
         """

     def on_validation_epoch_start(self) -> None:
         """
         Called in the validation loop at the very beginning of the epoch.
         """

-    def on_validation_epoch_end(self, outputs: EPOCH_OUTPUT) -> None:
+    def on_validation_epoch_end(self) -> None:
         """
         Called in the validation loop at the very end of the epoch.
         """
@@ -255,7 +260,7 @@ def on_test_epoch_start(self) -> None:
         Called in the test loop at the very beginning of the epoch.
         """

-    def on_test_epoch_end(self, outputs: EPOCH_OUTPUT) -> None:
+    def on_test_epoch_end(self) -> None:
         """
         Called in the test loop at the very end of the epoch.
         """

pytorch_lightning/core/lightning.py

Lines changed: 18 additions & 39 deletions
@@ -1123,6 +1123,24 @@ def configure_optimizers(self):
             - **Tuple of dictionaries** as described above, with an optional ``"frequency"`` key.
             - **None** - Fit will run without any optimizer.

+        Note:
+            The lr_dict is a dictionary which contains the scheduler and its associated configuration.
+            The default configuration is shown below.
+
+            .. code-block:: python
+
+                lr_dict = {
+                    'scheduler': lr_scheduler,  # The LR scheduler instance (required)
+                    # The unit of the scheduler's step size, could also be 'step'
+                    'interval': 'epoch',
+                    'frequency': 1,  # The frequency of the scheduler
+                    'monitor': 'val_loss',  # Metric for `ReduceLROnPlateau` to monitor
+                    'strict': True,  # Whether to crash the training if `monitor` is not found
+                    'name': None,  # Custom name for `LearningRateMonitor` to use
+                }
+
+            Only the ``"scheduler"`` key is required, the rest will be set to the defaults above.
+
         Note:
             The ``frequency`` value specified in a dict along with the ``optimizer`` key is an int corresponding
             to the number of sequential batches optimized with the specific optimizer.
@@ -1148,33 +1166,6 @@ def configure_optimizers(self):
             If an LR scheduler is specified for an optimizer using the ``lr_scheduler`` key in the above dict,
             the scheduler will only be updated when its optimizer is being used.

-        Note:
-            The lr_dict is a dictionary which contains the scheduler and its associated configuration.
-            The default configuration is shown below.
-
-            .. code-block:: python
-
-                lr_dict = {
-                    'scheduler': lr_scheduler,  # The LR scheduler instance (required)
-                    'interval': 'epoch',  # The unit of the scheduler's step size
-                    'frequency': 1,  # The frequency of the scheduler
-                    'reduce_on_plateau': False,  # For ReduceLROnPlateau scheduler
-                    'monitor': 'val_loss',  # Metric for ReduceLROnPlateau to monitor
-                    'strict': True,  # Whether to crash the training if `monitor` is not found
-                    'name': None,  # Custom name for LearningRateMonitor to use
-                }
-
-            Only the ``"scheduler"`` key is required, the rest will be set to the defaults above.
-
-        Note:
-            The ``"frequency"`` value is an ``int`` corresponding to the number of sequential batches optimized with the
-            specific optimizer. It should be given to none or to all of the optimizers.
-
-            There is a difference between passing multiple optimizers in a list and passing multiple optimizers in
-            dictionaries with a frequency of 1:
-            In the former case, all optimizers will operate on the given batch in each optimization step.
-            In the latter, only one optimizer will operate on the given batch at every step.
-
         Examples::

             # most cases
@@ -1226,18 +1217,6 @@ def configure_optimizers(self):
               at each training step.
             - If you need to control how often those optimizers step or override the default ``.step()`` schedule,
               override the :meth:`optimizer_step` hook.
-            - If you only want to call a learning rate scheduler every ``x`` step or epoch, or want to monitor a custom
-              metric, you can specify these in a lr_dict:
-
-              .. code-block:: python
-
-                  lr_dict = {
-                      'scheduler': lr_scheduler,
-                      'interval': 'step',  # or 'epoch'
-                      'monitor': 'val_f1',
-                      'frequency': x,
-                  }
-
         """
         rank_zero_warn("`configure_optimizers` must be implemented to be used with the Lightning Trainer")
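
Since the relocated note documents the full `lr_dict` schema, here is a minimal `configure_optimizers` sketch that exercises it; the model, optimizer choice, and monitored metric are illustrative, and only the `'scheduler'` key is actually required:

    import torch
    import pytorch_lightning as pl

    class LitRegressor(pl.LightningModule):

        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 1)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.mse_loss(self.layer(x), y)

        def validation_step(self, batch, batch_idx):
            x, y = batch
            # log the metric that the scheduler's `monitor` key refers to
            self.log("val_loss", torch.nn.functional.mse_loss(self.layer(x), y))

        def configure_optimizers(self):
            optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
            lr_dict = {
                # only 'scheduler' is required; the other keys spell out the documented defaults
                'scheduler': torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer),
                'interval': 'epoch',
                'frequency': 1,
                'monitor': 'val_loss',
                'strict': True,
                'name': None,
            }
            return {'optimizer': optimizer, 'lr_scheduler': lr_dict}

Returning the scheduler under the ``lr_scheduler`` key keeps it tied to its optimizer, which matches the surrounding docstring text about schedulers only being updated when their optimizer is in use.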

pytorch_lightning/trainer/callback_hook.py

Lines changed: 6 additions & 30 deletions
@@ -111,44 +111,20 @@ def on_validation_epoch_start(self):
         for callback in self.callbacks:
             callback.on_validation_epoch_start(self, self.lightning_module)

-    def on_validation_epoch_end(self, outputs: EPOCH_OUTPUT):
-        """Called when the epoch ends.
-
-        Args:
-            outputs: List of outputs on each ``validation`` epoch
-        """
+    def on_validation_epoch_end(self):
+        """Called when the validation epoch ends."""
         for callback in self.callbacks:
-            if is_param_in_hook_signature(callback.on_validation_epoch_end, "outputs"):
-                callback.on_validation_epoch_end(self, self.lightning_module, outputs)
-            else:
-                warning_cache.warn(
-                    "`Callback.on_validation_epoch_end` signature has changed in v1.3."
-                    " `outputs` parameter has been added."
-                    " Support for the old signature will be removed in v1.5", DeprecationWarning
-                )
-                callback.on_validation_epoch_end(self, self.lightning_module)
+            callback.on_validation_epoch_end(self, self.lightning_module)

     def on_test_epoch_start(self):
         """Called when the epoch begins."""
         for callback in self.callbacks:
             callback.on_test_epoch_start(self, self.lightning_module)

-    def on_test_epoch_end(self, outputs: EPOCH_OUTPUT):
-        """Called when the epoch ends.
-
-        Args:
-            outputs: List of outputs on each ``test`` epoch
-        """
+    def on_test_epoch_end(self):
+        """Called when the test epoch ends."""
         for callback in self.callbacks:
-            if is_param_in_hook_signature(callback.on_test_epoch_end, "outputs"):
-                callback.on_test_epoch_end(self, self.lightning_module, outputs)
-            else:
-                warning_cache.warn(
-                    "`Callback.on_test_epoch_end` signature has changed in v1.3."
-                    " `outputs` parameter has been added."
-                    " Support for the old signature will be removed in v1.5", DeprecationWarning
-                )
-                callback.on_test_epoch_end(self, self.lightning_module)
+            callback.on_test_epoch_end(self, self.lightning_module)

     def on_predict_epoch_start(self) -> None:
         """Called when the epoch begins."""
