
Commit afd0adf

Merge branch 'master' into dataloader_warn
2 parents 511f962 + 01b9cf8 commit afd0adf


47 files changed, +1690 −216 lines

CHANGELOG.md

Lines changed: 24 additions & 4 deletions
@@ -13,9 +13,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added more explicit exception message when trying to execute `trainer.test()` or `trainer.validate()` with `fast_dev_run=True` ([#6667](https://github.com/PyTorchLightning/pytorch-lightning/pull/6667))


+- Added `LightningCLI` class to provide simple reproducibility with minimum boilerplate training cli. ([#4492](https://github.com/PyTorchLightning/pytorch-lightning/pull/4492))
+
+
 - Trigger warning when non-metric logged value with multi processes hasn't been reduced ([#6417](https://github.com/PyTorchLightning/pytorch-lightning/pull/6417))


+- Added `gradient_clip_algorithm` argument to Trainer for gradient clipping by value ([#6123](https://github.com/PyTorchLightning/pytorch-lightning/pull/6123)).
+
+
 - Added a way to print to terminal without breaking up the progress bar ([#5470](https://github.com/PyTorchLightning/pytorch-lightning/pull/5470))

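For reference, a minimal sketch of how the new `LightningCLI` class is used. The `TinyModel` and `RandomData` classes are hypothetical placeholders, and the `pytorch_lightning.utilities.cli` import path is assumed for this release line:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pytorch_lightning import LightningDataModule, LightningModule
from pytorch_lightning.utilities.cli import LightningCLI  # assumed import path


class TinyModel(LightningModule):
    def __init__(self, hidden_dim: int = 8, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(32, hidden_dim)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return self.layer(x).pow(2).mean()  # dummy scalar loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)


class RandomData(LightningDataModule):
    def train_dataloader(self):
        return DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)


if __name__ == "__main__":
    # __init__ arguments of the model, datamodule and Trainer become CLI flags,
    # e.g. `python train.py --model.hidden_dim 16 --trainer.max_epochs 3`,
    # and LightningCLI then runs trainer.fit(model, datamodule).
    LightningCLI(TinyModel, RandomData)
```

Running the script with `--help` should list every exposed flag, and a saved YAML config can be passed back in to reproduce a run, which is the reproducibility the changelog entry refers to.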

@@ -75,6 +81,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 - Added support for `precision=64`, enabling training with double precision ([#6595](https://github.com/PyTorchLightning/pytorch-lightning/pull/6595))

+- Added support for DDP communication hooks ([#6736](https://github.com/PyTorchLightning/pytorch-lightning/issues/6736))

 - Added `artifact_location` argument to `MLFlowLogger` which will be passed to the `MlflowClient.create_experiment` call ([#6677](https://github.com/PyTorchLightning/pytorch-lightning/pull/6677))
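The DDP communication hooks and double-precision entries map to Trainer and plugin arguments. A rough sketch, assuming a multi-GPU machine, PyTorch 1.8+, and the `ddp_comm_hook` argument on `DDPPlugin` that the linked PR introduces:

```python
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# precision=64 (#6595): parameters and inputs are run in float64 during training.
trainer_fp64 = Trainer(precision=64)

# DDP communication hook (#6736): compress gradients to fp16 before the all-reduce
# to cut communication volume (ddp_comm_hook argument assumed from the PR above).
trainer_ddp = Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(ddp_comm_hook=default_hooks.fp16_compress_hook)],
)
```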

@@ -173,7 +180,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Fixed

-- Sanitize `None` params during pruning ([#6836](https://github.com/PyTorchLightning/pytorch-lightning/pull/6836))
+- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic ([#6802](https://github.com/PyTorchLightning/pytorch-lightning/pull/6802/))


 - Made the `Plugin.reduce` method more consistent across all Plugins to reflect a mean-reduction by default ([#6011](https://github.com/PyTorchLightning/pytorch-lightning/pull/6011))
@@ -185,6 +192,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Do not print top-k verbose log with `ModelCheckpoint(monitor=None)` ([#6109](https://github.com/PyTorchLightning/pytorch-lightning/pull/6109))


+- Fixed csv extension check ([#6436](https://github.com/PyTorchLightning/pytorch-lightning/pull/6436))
+
+
 - Fixed `ModelCheckpoint(monitor=None, save_last=True)` not saving checkpoints ([#6136](https://github.com/PyTorchLightning/pytorch-lightning/pull/6136))

@@ -203,7 +213,19 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed torch distributed not available in setup hook for DDP ([#6506](https://github.com/PyTorchLightning/pytorch-lightning/pull/6506))


-- Fixed an issue with `IterableDataset` when `__len__` is not defined ([#6828](https://github.com/PyTorchLightning/pytorch-lightning/pull/6828))
+- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met ([#6705](https://github.com/PyTorchLightning/pytorch-lightning/pull/6705))
+
+
+## [1.2.7] - 2021-04-06
+
+### Fixed
+
+- Fixed resolve a bug with omegaconf and xm.save ([#6741](https://github.com/PyTorchLightning/pytorch-lightning/pull/6741))
+- Fixed an issue with IterableDataset when __len__ is not defined ([#6828](https://github.com/PyTorchLightning/pytorch-lightning/pull/6828))
+- Sanitize None params during pruning ([#6836](https://github.com/PyTorchLightning/pytorch-lightning/pull/6836))
+- Enforce an epoch scheduler interval when using SWA ([#6588](https://github.com/PyTorchLightning/pytorch-lightning/pull/6588))
+- Fixed TPU Colab hang issue, post training ([#6816](https://github.com/PyTorchLightning/pytorch-lightning/pull/6816))
+- Fixed a bug where `TensorBoardLogger` would give a warning and not log correctly to a symbolic link `save_dir` ([#6730](https://github.com/PyTorchLightning/pytorch-lightning/pull/6730))


 ## [1.2.6] - 2021-03-30
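Several of the fixes above concern specific callback configurations; the sketch below merely illustrates those configurations (the values are arbitrary):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# monitor=None with save_last=True keeps an up-to-date `last.ckpt` (#6136)
# and no longer prints the top-k verbose log (#6109).
checkpoint = ModelCheckpoint(monitor=None, save_last=True)

# EarlyStopping together with a minimum training duration (#6705): stopping
# is deferred until min_epochs / min_steps have been reached.
early_stop = EarlyStopping(monitor="val_loss", patience=3)

trainer = Trainer(callbacks=[checkpoint, early_stop], min_epochs=5)
```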
@@ -240,8 +262,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added Autocast in validation, test and predict modes for Native AMP ([#6565](https://github.com/PyTorchLightning/pytorch-lightning/pull/6565))


-- Fixed resolve a bug with omegaconf and xm.save ([#6741](https://github.com/PyTorchLightning/pytorch-lightning/pull/6741))
-
 ## [1.2.4] - 2021-03-16

 ### Changed

docs/source/advanced/multi_gpu.rst

Lines changed: 5 additions & 5 deletions
@@ -794,7 +794,7 @@ DeepSpeed ZeRO Stage 3
 DeepSpeed ZeRO Stage 3 shards the optimizer states, gradients and the model parameters (also optionally activations). Sharding model parameters and activations comes with an increase in distributed communication, however allows you to scale your models massively from one GPU to multiple GPUs.
 **The DeepSpeed team report the ability to fine-tune models with over 40B parameters on a single GPU and over 2 Trillion parameters on 512 GPUs.** For more information we suggest checking the `DeepSpeed ZeRO-3 Offload documentation <https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html>`__.

-We've ran benchmarks and give a simple example of how all these features in Lightning, which you can see at `minGPT <https://github.com/SeanNaren/minGPT/tree/stage3>`_.
+We've ran benchmarks for all these features and given a simple example of how all these features work in Lightning, which you can see at `minGPT <https://github.com/SeanNaren/minGPT/tree/stage3>`_.

 Currently this functionality is only available on master and will be included in our next 1.3 Release Candidate and 1.3 release.

@@ -815,7 +815,7 @@ Also please have a look at our :ref:`deepspeed-zero-stage-3-tips` which contains

 .. note::
     Currently we only support non-elastic checkpointing. This means saving the model across GPUs will save shards of the model on all processes, which will then require the same amount of GPUS to load.
-    This additionally means for inference you must use the ``Trainer.test` or ``Trainer.predict`` functionality as described below, to ensure we set up the distributed environment correctly.
+    This additionally means for inference you must use the ``Trainer.test`` or ``Trainer.predict`` functionality as described below, to ensure we set up the distributed environment correctly.

     This limitation is actively being worked on and will be resolved in the near future.

@@ -849,10 +849,10 @@ We expose a hook that layers initialized within the hook will be sharded instant
 This reduces the time taken to initialize very large models, as well as ensure we do not run out of memory when instantiating larger models. For more information you can refer to the DeepSpeed docs for `Constructing Massive Models <https://deepspeed.readthedocs.io/en/latest/zero3.html>`_.

 .. note::
-    When using ``configure_sharded_model`` hook to shard models, note that ``LightningModule.load_from_checkpoint`` for loading saved checkpoints may not work. If you've trained on one GPU, you can manually instantiate the model and call the hook,
+    When using the ``configure_sharded_model`` hook to shard models, note that ``LightningModule.load_from_checkpoint`` may not work for loading saved checkpoints. If you've trained on one GPU, you can manually instantiate the model and call the hook,
     however when using multiple GPUs, this will not work as ``LightningModule.load_from_checkpoint`` doesn't support sharded checkpoints.

-    We recommend using the ``Trainer`` and using ``Trainer.test`` or ``Trainer.predict`` for inference.
+    We recommend using ``Trainer.test`` or ``Trainer.predict`` for inference.

 .. code-block:: python

@@ -945,7 +945,7 @@ This saves memory when training larger models however requires using a checkpoin
 DeepSpeed ZeRO Stage 3 Tips
 """""""""""""""""""""""""""

-Here are some helpful information when setting up DeepSpeed ZeRO Stage 3 with Lightning.
+Here is some helpful information when setting up DeepSpeed ZeRO Stage 3 with Lightning.

 * If you're using Adam or AdamW, ensure to use FusedAdam or DeepSpeedCPUAdam (for CPU Offloading) rather than the default torch optimizers as they come with large speed benefits
 * Treat your GPU/CPU memory as one large pool. In some cases, you may not want to offload certain things (like activations) to provide even more space to offload model parameters
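To make the ``configure_sharded_model`` note above concrete, here is a rough sketch of a module that builds its layers inside the hook. The ``deepspeed_stage_3`` plugin string and the Trainer arguments are assumptions based on this documentation page, not a verbatim excerpt:

```python
import torch
import torch.nn.functional as F

from pytorch_lightning import LightningModule, Trainer


class ShardedModel(LightningModule):
    def configure_sharded_model(self):
        # Called after the distributed environment is set up, so layers created
        # here are instantiated shard-by-shard instead of on a single device first.
        self.block = torch.nn.Sequential(
            torch.nn.Linear(32, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.block(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# Assumed plugin string for ZeRO Stage 3. As the note above says, prefer
# Trainer.test / Trainer.predict for inference over load_from_checkpoint.
trainer = Trainer(gpus=4, plugins="deepspeed_stage_3", precision=16)
```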

docs/source/advanced/training_tricks.rst

Lines changed: 8 additions & 2 deletions
@@ -26,8 +26,10 @@ The effect is a large effective batch size of size KxN.

 Gradient Clipping
 -----------------
-Gradient clipping may be enabled to avoid exploding gradients. Specifically, this will `clip the gradient
-norm <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_norm_>`_ computed over all model parameters together.
+Gradient clipping may be enabled to avoid exploding gradients. By default, this will `clip the gradient norm
+<https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_norm_>`_ computed over all model parameters together.
+If ``gradient_clip_algorithm`` option is set to ``value``, which is ``norm`` by default, this will
+`clip the gradient value <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_value_>`_ for each parameter instead.

 .. seealso:: :class:`~pytorch_lightning.trainer.trainer.Trainer`

@@ -39,6 +41,10 @@ norm <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_norm_>`_
     # clip gradients with norm above 0.5
     trainer = Trainer(gradient_clip_val=0.5)

+    # clip gradients with value above 0.5
+    # gradient_clip_algorithm types => :class:`~pytorch_lightning.utilities.enums.GradClipAlgorithmType`
+    trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm='value')
+
 ----------

 Stochastic Weight Averaging
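For a stand-alone version of the snippet added above (the 0.5 threshold is arbitrary):

```python
from pytorch_lightning import Trainer

# Default algorithm ("norm"): rescale gradients so their combined norm stays at or below 0.5.
trainer = Trainer(gradient_clip_val=0.5)

# New gradient_clip_algorithm="value": clamp each gradient element to the range [-0.5, 0.5].
trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")
```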

docs/source/api_references.rst

Lines changed: 1 addition & 0 deletions
@@ -93,5 +93,6 @@ Utilities API
     :toctree: api
     :nosignatures:

+    cli
     argparse_utils
     seed
