Commit b49c809

Merge branch 'master' into attach-dataloaders

2 parents 9dcfd09 + 075de93, commit b49c809

88 files changed: 1504 additions, 366 deletions

.github/workflows/ci_test-full.yml

Lines changed: 14 additions & 7 deletions

@@ -18,11 +18,17 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-18.04, windows-2019, macOS-10.15]
-        python-version: [3.6, 3.7, 3.8, 3.9]
+        python-version: [3.6, 3.8, 3.9]
         requires: ['minimal', 'latest']
+        release: ['stable']
         exclude:
           - python-version: 3.9
             requires: 'minimal'
+        include:
+          - os: ubuntu-20.04
+            python-version: 3.9
+            requires: 'latest'
+            release: 'pre'

     # Timeout: https://stackoverflow.com/a/59076067/4521646
     # TODO: the macOS is taking too long, probably caching did not work...

@@ -96,9 +102,9 @@ jobs:
       uses: actions/cache@v2
       with:
         path: ${{ steps.pip-cache.outputs.dir }}
-        key: ${{ runner.os }}-pip-py${{ matrix.python-version }}-${{ matrix.requires }}-${{ hashFiles('requirements.txt') }}-${{ hashFiles('requirements/extra.txt') }}
+        key: ${{ runner.os }}-pip-py${{ matrix.python-version }}-${{ matrix.release }}-${{ matrix.requires }}-${{ hashFiles('requirements.txt') }}-${{ hashFiles('requirements/extra.txt') }}
         restore-keys: |
-          ${{ runner.os }}-pip-py${{ matrix.python-version }}-${{ matrix.requires }}-
+          ${{ runner.os }}-pip-py${{ matrix.python-version }}-${{ matrix.release }}-${{ matrix.requires }}-

     - name: Pull checkpoints from S3
       run: |

@@ -126,7 +132,8 @@ jobs:
         python --version
         pip --version
         # python -m pip install --upgrade --user pip
-        pip install --requirement requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
+        flag=$(python -c "print('--pre' if '${{matrix.release}}' == 'pre' else '')" 2>&1)
+        pip install --requirement requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade $flag
         # adjust versions according installed Torch version
         python ./requirements/adjust_versions.py requirements/extra.txt
         python ./requirements/adjust_versions.py requirements/examples.txt

@@ -158,7 +165,7 @@ jobs:
     - name: Tests
       run: |
         # NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
-        coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-py${{ matrix.python-version }}-${{ matrix.requires }}.xml
+        coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-py${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}.xml

     - name: Examples
       run: |

@@ -167,8 +174,8 @@ jobs:
     - name: Upload pytest results
       uses: actions/upload-artifact@v2
       with:
-        name: pytest-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}
-        path: junit/test-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}.xml
+        name: pytest-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}
+        path: junit/test-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}.xml
       if: failure()

     - name: Statistics

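The interesting mechanics here are the new `release` matrix dimension and the extra `include` entry, which together add a single Ubuntu 20.04 / Python 3.9 job that installs pre-release dependencies. The install step turns `matrix.release` into a pip flag with an inline Python expression; a minimal sketch of that computation, with the Actions interpolation replaced by a plain variable for illustration:

```python
# Sketch of the flag computation in the install step above. In CI, GitHub
# Actions substitutes '${{ matrix.release }}' before the shell runs Python.
release = "pre"  # every other matrix entry carries release == "stable"
flag = "--pre" if release == "pre" else ""
# `pip install --pre` permits pre-release and development wheels, so only
# the single 'pre' job exercises upcoming dependency versions.
print(flag)
```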
CHANGELOG.md

Lines changed: 50 additions & 1 deletion

@@ -9,6 +9,13 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

+- Added support for the `EarlyStopping` callback to run at the end of the training epoch ([#6944](https://github.com/PyTorchLightning/pytorch-lightning/pull/6944/))
+
+- Added synchronization points before and after `setup` hooks are run ([#7202](https://github.com/PyTorchLightning/pytorch-lightning/pull/7202))
+
 - Added a `teardown` hook to `ClusterEnvironment` ([#6942](https://github.com/PyTorchLightning/pytorch-lightning/pull/6942))

@@ -114,6 +121,20 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added new `EarlyStopping` parameters `stopping_threshold` and `divergence_threshold` ([#6868](https://github.com/PyTorchLightning/pytorch-lightning/pull/6868))

+- Added `debug` flag to TPU Training Plugins (PT_XLA_DEBUG) ([#7219](https://github.com/PyTorchLightning/pytorch-lightning/pull/7219))
+
+- Added new `UnrepeatedDistributedSampler` and `IndexBatchSamplerWrapper` for tracking distributed predictions ([#7215](https://github.com/PyTorchLightning/pytorch-lightning/pull/7215))
+
+- Added `trainer.predict(return_predictions=None|False|True)` ([#7215](https://github.com/PyTorchLightning/pytorch-lightning/pull/7215))
+
+- Added `BasePredictionWriter` callback to implement prediction saving ([#7127](https://github.com/PyTorchLightning/pytorch-lightning/pull/7127))
+
+- Added `tpu_distributed` check for TPU Spawn barrier ([#7241](https://github.com/PyTorchLightning/pytorch-lightning/pull/7241))
+
 ### Changed

@@ -144,11 +165,20 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Changed warnings and recommendations for dataloaders in `ddp_spawn` ([#6762](https://github.com/PyTorchLightning/pytorch-lightning/pull/6762/))

-- `pl.seed_everyting` will now also set the seed on the `DistributedSampler` ([#7024](https://github.com/PyTorchLightning/pytorch-lightning/pull/7024))
+- `pl.seed_everything` will now also set the seed on the `DistributedSampler` ([#7024](https://github.com/PyTorchLightning/pytorch-lightning/pull/7024))
+
+- Changed default setting for communication of multi-node training using `DDPShardedPlugin` ([#6937](https://github.com/PyTorchLightning/pytorch-lightning/pull/6937))
+
+- `LightningModule.from_datasets()` now accepts `IterableDataset` instances as training datasets. ([#7503](https://github.com/PyTorchLightning/pytorch-lightning/pull/7503))

 ### Deprecated

+- Deprecated the `save_function` property from the `ModelCheckpoint` callback ([#7201](https://github.com/PyTorchLightning/pytorch-lightning/pull/7201))
+
 - Deprecated `LightningModule.write_predictions` and `LigtningModule.write_predictions_dict` ([#7066](https://github.com/PyTorchLightning/pytorch-lightning/pull/7066))

@@ -190,6 +220,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Removed

+- Removed `automatic_optimization` as a property from the training loop in favor of `LightningModule.automatic_optimization` ([#7130](https://github.com/PyTorchLightning/pytorch-lightning/pull/7130))
+
 - Removed evaluation loop legacy returns for `*_epoch_end` hooks ([#6973](https://github.com/PyTorchLightning/pytorch-lightning/pull/6973))

@@ -225,6 +259,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed legacy code to log or include metrics in the progress bar by returning them in a dict with the `"log"/"progress_bar"` magic keys. Use `self.log` instead ([#6734](https://github.com/PyTorchLightning/pytorch-lightning/pull/6734))

+- Removed `trainer.fit()` return value of `1`. It has no return now ([#7237](https://github.com/PyTorchLightning/pytorch-lightning/pull/7237))
+
 - Removed `optimizer_idx` argument from `training_step` in manual optimization ([#6093](https://github.com/PyTorchLightning/pytorch-lightning/pull/6093))

@@ -233,6 +270,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed attaching train and validation dataloaders when `reload_dataloaders_every_epoch=True` and `num_sanity_val_steps=0` ([#7207](https://github.com/PyTorchLightning/pytorch-lightning/pull/7207))

+- Added a barrier in the accelerator `teardown` to synchronize processes before execution finishes ([#6814](https://github.com/PyTorchLightning/pytorch-lightning/pull/6814))
+
 - Fixed multi-node DDP sub-process launch by using `local_rank` instead of `global_rank` for main process assertion ([#7061](https://github.com/PyTorchLightning/pytorch-lightning/pull/7061))

@@ -337,9 +377,18 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed parsing for pre-release package versions ([#6999](https://github.com/PyTorchLightning/pytorch-lightning/pull/6999))

+- Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling ([#7014](https://github.com/PyTorchLightning/pytorch-lightning/pull/7014))
+
 - Fixed resetting device after `fitting/evaluating/predicting` ([#7188](https://github.com/PyTorchLightning/pytorch-lightning/pull/7188))

+- Fixed metrics not being properly logged with `precision=16` and `manual_optimization` ([#7228](https://github.com/PyTorchLightning/pytorch-lightning/pull/7228))
+
+- Fixed `parameters_to_ignore` not properly set to DDPWrapper ([#7239](https://github.com/PyTorchLightning/pytorch-lightning/pull/7239))
+
 ## [1.2.7] - 2021-04-06

 ### Fixed

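Several of the changelog entries above are user-facing API. As a rough, hypothetical illustration of the new `trainer.predict(return_predictions=...)` argument from #7215 (the tiny module and loader below are stand-ins, not part of the commit):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    """Minimal stand-in module for illustrating trainer.predict()."""
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 2)

    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        return self.layer(batch[0])

loader = DataLoader(TensorDataset(torch.randn(8, 4)), batch_size=4)
trainer = pl.Trainer()
# True collects and returns the predict_step outputs; False skips collecting
# them (useful when predictions would not fit in memory); None lets Lightning
# decide based on the training plugin, per the changelog entry.
preds = trainer.predict(TinyModel(), dataloaders=loader, return_predictions=True)
```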
dockers/README.md

Lines changed: 21 additions & 1 deletion

@@ -45,7 +45,7 @@ docker image list
 docker image rm pytorch-lightning:latest
 ```

-### Run docker image with GPUs
+## Run docker image with GPUs

 To run docker image with access to you GPUs you need to install
 ```bash

@@ -63,3 +63,23 @@ and later run the docker image with `--gpus all` so for example
 ```
 docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6
 ```
+
+## Run Jupyter server
+
+Inspiration comes from https://u.group/thinking/how-to-put-jupyter-notebooks-in-a-dockerfile
+
+1. Build the docker image:
+    ```bash
+    docker image build \
+        -t pytorch-lightning:v1.2.9 \
+        -f dockers/nvidia/Dockerfile \
+        --build-arg LIGHTNING_VERSION=1.2.9 \
+        .
+    ```
+2. start the server and map ports:
+    ```bash
+    docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -p 8888:8888 pytorch-lightning:v1.2.9
+    ```
+3. Connect in local browser:
+    - copy the generated path e.g. `http://hostname:8888/?token=0719fa7e1729778b0cec363541a608d5003e26d4910983c6`
+    - replace the `hostname` by `localhost`

dockers/nvidia/Dockerfile

Lines changed: 18 additions & 8 deletions

@@ -13,18 +13,18 @@
 # limitations under the License.

 # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_21-03.html#rel_21-03
-FROM nvcr.io/nvidia/pytorch:20.12-py3
+FROM nvcr.io/nvidia/pytorch:21.03-py3

 MAINTAINER PyTorchLightning <https://github.com/PyTorchLightning>

 ARG LIGHTNING_VERSION=""

+RUN python -c "import torch ; print(torch.__version__)" >> torch_version.info
+
 COPY ./ /workspace/pytorch-lightning/

 RUN \
     cd /workspace && \
-    mv pytorch-lightning/notebooks . && \
-    mv pytorch-lightning/pl_examples . && \
     # replace by specific version if asked
     if [ ! -z "$LIGHTNING_VERSION" ] ; then \
         rm -rf pytorch-lightning ; \

@@ -33,18 +33,28 @@ RUN \
         mv pytorch-lightning-*/ pytorch-lightning ; \
         rm *.zip ; \
     fi && \
+    # save the examples
+    mv pytorch-lightning/notebooks . && \
+    mv pytorch-lightning/pl_examples . && \

     # Installations
     python -c "fname = './pytorch-lightning/requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('horovod')] ; open(fname, 'w').writelines(lines)" && \
     pip install -r ./pytorch-lightning/requirements/extra.txt --no-cache-dir --upgrade-strategy only-if-needed && \
     pip install -r ./pytorch-lightning/requirements/examples.txt --no-cache-dir --upgrade-strategy only-if-needed && \
     pip install ./pytorch-lightning --no-cache-dir && \
-    pip install "Pillow>=8.1" "torchtext>=0.9.0" ipython[all] --no-cache-dir --upgrade-strategy only-if-needed && \
-    rm -rf pytorch-lightning
+    pip install "Pillow>=8.1" --no-cache-dir --upgrade-strategy only-if-needed && \
+    rm -rf pytorch-lightning && \
+    pip list
+
+ENV PYTHONPATH="/workspace"

-RUN python --version && \
+RUN \
+    TORCH_VERSION=$(cat torch_version.info) && \
+    rm torch_version.info && \
+    python --version && \
     pip --version && \
-    pip list && \
+    pip list | grep torch && \
+    python -c "from torch import __version__ as ver ; assert ver == '$TORCH_VERSION', ver" && \
     python -c "import pytorch_lightning as pl; print(pl.__version__)"

-# CMD ["/bin/bash"]
+CMD ["jupyter", "notebook", "--port=8888", "--no-browser", "--ip=0.0.0.0", "--allow-root"]

docs/source/advanced/multi_gpu.rst

Lines changed: 5 additions & 2 deletions

@@ -282,7 +282,10 @@ Data Parallel
 That is, if you have a batch of 32 and use DP with 2 gpus, each GPU will process 16 samples,
 after which the root node will aggregate the results.

-.. warning:: DP use is discouraged by PyTorch and Lightning. Use DDP which is more stable and at least 3x faster
+.. warning:: DP use is discouraged by PyTorch and Lightning. State is not maintained on the replicas created by the
+    :class:`~torch.nn.DataParallel` wrapper and you may see errors or misbehavior if you assign state to the module
+    in the ``forward()`` or ``*_step()`` methods. For the same reason we cannot fully support
+    :ref:`manual_optimization` with DP. Use DDP which is more stable and at least 3x faster.

 .. testcode::
     :skipif: torch.cuda.device_count() < 2

@@ -675,7 +678,7 @@ To use Sharded Training, you need to first install FairScale using the command b
 .. code-block:: python

     # train using Sharded DDP
-    trainer = Trainer(accelerator='ddp', plugins='ddp_sharded')
+    trainer = Trainer(plugins='ddp_sharded')

 Sharded Training can work across all DDP variants by adding the additional ``--plugins ddp_sharded`` flag.

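The expanded warning describes a real pitfall; a short sketch of the failure mode, not taken from the docs: under DP, `training_step` executes on per-GPU replicas that are thrown away after each step, so attribute writes made there never reach the original module.

```python
import pytorch_lightning as pl

class StatefulModule(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        # Under DP this counter is updated on a temporary replica and then
        # discarded, so the module held by the Trainer never sees the change.
        self.seen_batches = getattr(self, "seen_batches", 0) + 1
        ...
```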
docs/source/common/lightning_cli.rst

Lines changed: 68 additions & 4 deletions

@@ -33,6 +33,9 @@
 MyModelBaseClass = MyModel
 MyDataModuleBaseClass = MyDataModule

+EncoderBaseClass = MyModel
+DecoderBaseClass = MyModel
+
 mock_argv = mock.patch("sys.argv", ["any.py"])
 mock_argv.start()

@@ -116,7 +119,7 @@ The start of a possible implementation of :class:`MyModel` including the recomme
 docstring could be the one below. Note that by using type hints and docstrings there is no need to duplicate this
 information to define its configurable arguments.

-.. code-block:: python
+.. testcode::

     class MyModel(LightningModule):

@@ -131,7 +134,8 @@ information to define its configurable arguments.
             encoder_layers: Number of layers for the encoder
             decoder_layers: Number of layers for each decoder block
         """
-        ...
+        super().__init__()
+        self.save_hyperparameters()

 With this model class, the help of the trainer tool would look as follows:

@@ -258,7 +262,67 @@ A possible config file could be as follows:
     ...

 Only model classes that are a subclass of :code:`MyModelBaseClass` would be allowed, and similarly only subclasses of
-:code:`MyDataModuleBaseClass`.
+:code:`MyDataModuleBaseClass`. If as base classes :class:`~pytorch_lightning.core.lightning.LightningModule` and
+:class:`~pytorch_lightning.core.datamodule.LightningDataModule` are given, then the tool would allow any lightning
+module and data module.
+
+.. tip::
+
+    Note that with the subclass modes the :code:`--help` option does not show information for a specific subclass. To
+    get help for a subclass the options :code:`--model.help` and :code:`--data.help` can be used, followed by the
+    desired class path. Similarly :code:`--print_config` does not include the settings for a particular subclass. To
+    include them the class path should be given before the :code:`--print_config` option. Examples for both help and
+    print config are:
+
+    .. code-block:: bash
+
+        $ python trainer.py --model.help mycode.mymodels.MyModel
+        $ python trainer.py --model mycode.mymodels.MyModel --print_config
+
+
+Models with multiple submodules
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Many use cases require to have several modules each with its own configurable options. One possible way to handle this
+with LightningCLI is to implement a single module having as init parameters each of the submodules. Since the init
+parameters have as type a class, then in the configuration these would be specified with :code:`class_path` and
+:code:`init_args` entries. For instance a model could be implemented as:
+
+.. testcode::
+
+    class MyMainModel(LightningModule):
+
+        def __init__(
+            self,
+            encoder: EncoderBaseClass,
+            decoder: DecoderBaseClass
+        ):
+            """Example encoder-decoder submodules model
+
+            Args:
+                encoder: Instance of a module for encoding
+                decoder: Instance of a module for decoding
+            """
+            super().__init__()
+            self.encoder = encoder
+            self.decoder = decoder
+
+If the CLI is implemented as :code:`LightningCLI(MyMainModel)` the configuration would be as follows:
+
+.. code-block:: yaml
+
+    model:
+        encoder:
+            class_path: mycode.myencoders.MyEncoder
+            init_args:
+                ...
+        decoder:
+            class_path: mycode.mydecoders.MyDecoder
+            init_args:
+                ...
+
+It is also possible to combine :code:`subclass_mode_model=True` and submodules, thereby having two levels of
+:code:`class_path`.


 Customizing LightningCLI

@@ -275,7 +339,7 @@ extended to customize different parts of the command line tool. The argument par
 adding arguments can be done using the :func:`add_argument` method. In contrast to argparse it has additional methods to
 add arguments, for example :func:`add_class_arguments` adds all arguments from the init of a class, though requiring
 parameters to have type hints. For more details about this please refer to the `respective documentation
-<https://omni-us.github.io/jsonargparse/#classes-methods-and-functions>`_.
+<https://jsonargparse.readthedocs.io/en/stable/#classes-methods-and-functions>`_.

 The :class:`~pytorch_lightning.utilities.cli.LightningCLI` class has the
 :meth:`~pytorch_lightning.utilities.cli.LightningCLI.add_arguments_to_parser` method which can be implemented to include

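To make the subclass-mode workflow above concrete, a hypothetical `trainer.py` entry point could look like the sketch below, reusing the `MyModelBaseClass`/`MyDataModuleBaseClass` names from the docs; the `subclass_mode_*` flags are the documented options for enabling subclass selection.

```python
# Hypothetical entry point for the subclass-mode example above;
# MyModelBaseClass and MyDataModuleBaseClass come from the docs snippet.
from pytorch_lightning.utilities.cli import LightningCLI

# Any subclass of the given bases may then be selected on the command line
# via --model/--data using class_path and init_args, as described above.
cli = LightningCLI(
    MyModelBaseClass,
    MyDataModuleBaseClass,
    subclass_mode_model=True,
    subclass_mode_data=True,
)
```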
docs/source/common/optimizers.rst

Lines changed: 2 additions & 0 deletions

@@ -15,6 +15,8 @@ For advanced/expert users who want to do esoteric optimization schedules or tech

 -----

+.. _manual_optimization:
+
 Manual optimization
 ===================
 For advanced research topics like reinforcement learning, sparse coding, or GAN research, it may be desirable to

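Since the DP warning now cross-references the new :ref:`manual_optimization` label, here is a rough sketch of the pattern that section introduces (the loss computation is a placeholder):

```python
import torch
import pytorch_lightning as pl

class ManualOptModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # opt out of automatic optimization
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()          # optimizer(s) from configure_optimizers
        loss = self.layer(batch).mean()  # placeholder loss for illustration
        opt.zero_grad()
        self.manual_backward(loss)       # use instead of loss.backward()
        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```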