CHANGELOG.md (26 additions, 13 deletions)
@@ -4,18 +4,8 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
-
-## Unreleased
-
-### Added
-
-- Added `all_gather` method to `LightningModule` which allows gradient based tensor synchronizations for use-cases such as negative sampling. ([#5012](https://github.com/PyTorchLightning/pytorch-lightning/pull/5012))
-
-### Fixed
-
-- Fixed `LoggerConnector` to have logged metrics on root device in DP ([#4138](https://github.com/PyTorchLightning/pytorch-lightning/pull/4138))
-
-## [1.1.0rc] - 2020-12-02
+
+## [1.1.0rc2] - 2020-12-02
 
 ### Added
 
@@ -89,6 +79,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added `Pytorch Geometric` integration example with Lightning ([#4568](https://github.com/PyTorchLightning/pytorch-lightning/pull/4568))
 
 
+- Added `all_gather` method to `LightningModule` which allows gradient based tensor synchronizations for use-cases such as negative sampling. ([#5012](https://github.com/PyTorchLightning/pytorch-lightning/pull/5012))
+
+- Enabled `self.log` in most functions ([#4969](https://github.com/PyTorchLightning/pytorch-lightning/pull/4969))
+
+- Added changeable extension variable for `ModelCheckpoint` ([#4977](https://github.com/PyTorchLightning/pytorch-lightning/pull/4977))
+
 
 ### Changed
 
 - Removed `multiclass_roc` and `multiclass_precision_recall_curve`, use `roc` and `precision_recall_curve` instead ([#4549](https://github.com/PyTorchLightning/pytorch-lightning/pull/4549))
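For orientation, a rough sketch of how the new `all_gather` hook and the broader `self.log` support might be used together in a `LightningModule`; the `sync_grads` flag, the toy encoder, and the loss wiring are illustrative assumptions rather than code taken from this changelog:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class NegativeSamplingModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Linear(128, 64)  # toy encoder for illustration

    def training_step(self, batch, batch_idx):
        anchors, positives = batch
        a = self.encoder(anchors)
        p = self.encoder(positives)

        # Gather the positive embeddings from every process *with* gradients,
        # so they can act as shared in-batch negatives across GPUs (sketch only).
        candidates = self.all_gather(p, sync_grads=True).flatten(end_dim=-2)

        logits = a @ candidates.t()
        targets = torch.arange(a.size(0), device=a.device) + self.global_rank * a.size(0)
        loss = F.cross_entropy(logits, targets)

        # `self.log` is now available from most hooks.
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```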
@@ -108,6 +107,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Changed `Simple Profiler` report to order by percentage time spent + num calls ([#4880](https://github.com/PyTorchLightning/pytorch-lightning/pull/4880))
 - Deprecated `prefix` argument in `ModelCheckpoint` ([#4765](https://github.com/PyTorchLightning/pytorch-lightning/pull/4765))
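With `prefix` deprecated, checkpoint file names in this line of releases are steered through `ModelCheckpoint`'s `dirpath`/`filename` arguments instead; a small sketch under that assumption:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Template the checkpoint name instead of relying on the deprecated `prefix`.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/",
    filename="model-{epoch:02d}-{val_loss:.2f}",
    monitor="val_loss",
    save_top_k=3,
)
```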
@@ -127,12 +132,22 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 - Added feature to move tensors to CPU before saving ([#4309](https://github.com/PyTorchLightning/pytorch-lightning/pull/4309))
 
+
 - Fixed `LoggerConnector` to have logged metrics on root device in DP ([#4138](https://github.com/PyTorchLightning/pytorch-lightning/pull/4138))
 
 - Auto convert tensors to contiguous format when `gather_all` ([#4907](https://github.com/PyTorchLightning/pytorch-lightning/pull/4907))
 
+- Fixed `PYTHONPATH` for ddp test model ([#4528](https://github.com/PyTorchLightning/pytorch-lightning/pull/4528))
+
+- Fixed allowing logger to support indexing ([#4595](https://github.com/PyTorchLightning/pytorch-lightning/pull/4595))
+
+- Fixed DDP and manual_optimization ([#4976](https://github.com/PyTorchLightning/pytorch-lightning/pull/4976))
+
 
 ## [1.0.8] - 2020-11-24
 
 ### Added
@@ -166,11 +181,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 - Added lambda closure to `manual_optimizer_step` ([#4618](https://github.com/PyTorchLightning/pytorch-lightning/pull/4618))
 
-
 ### Changed
 
 - Change Metrics `persistent` default mode to `False` ([#4685](https://github.com/PyTorchLightning/pytorch-lightning/pull/4685))
-
 - LoggerConnector log_metrics will use `total_batch_idx` instead of `global_step` when logging on `training step` ([#4738](https://github.com/PyTorchLightning/pytorch-lightning/pull/4738))
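As background for the `manual_optimizer_step` and manual-optimization entries above, manual optimization in this era of Lightning roughly follows the pattern below; the hook names (`automatic_optimization`, `optimizers()`, `manual_backward`) are assumptions about the 1.x API of the time, not something spelled out in this changelog:

```python
import torch
import pytorch_lightning as pl


class ManualOptModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    @property
    def automatic_optimization(self) -> bool:
        # Opt out of Lightning's automatic optimization loop.
        return False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.layer(batch).mean()

        # Let Lightning run the backward pass (scaling, DDP hooks), then step manually.
        self.manual_backward(loss, opt)
        opt.step()
        opt.zero_grad()
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```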
docs/source/multi_gpu.rst (81 additions, 6 deletions)
@@ -612,6 +612,7 @@ This is useful when dealing with large Transformer based models, or in environme
 Lightning currently offers the following methods to leverage model parallelism:
 
 - Sharded Training (partitioning your gradients and optimizer state across multiple GPUs, for reduced memory overhead with **no performance loss**)
+- Sequential Model Parallelism with Checkpointing (partition your :class:`nn.Sequential <torch.nn.Sequential>` module across multiple GPUs, leverage checkpointing and microbatching for further memory improvements and device utilization)
 
 Sharded Training
 ^^^^^^^^^^^^^^^^
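For reference, enabling Sharded Training in this version goes through the ``plugins`` argument of the ``Trainer``; a minimal sketch, where the ``'ddp_sharded'`` plugin name is an assumption based on the surrounding docs rather than something shown in this excerpt:

.. code-block:: python

    from pytorch_lightning import Trainer

    # Shard optimizer state and gradients across 4 GPUs (sketch only).
    trainer = Trainer(gpus=4, accelerator="ddp", plugins="ddp_sharded")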
@@ -666,7 +667,7 @@ To use Sharded Training, you need to first install FairScale using the command b
@@ -678,6 +679,80 @@ Sharded Training can work across all DDP variants by adding the additional ``--p
 
 Internally we re-initialize your optimizers and shard them across your machines and processes. We handle all communication using PyTorch distributed, so no code changes are required.
 
+
+----------
+
+.. _sequential-parallelism:
+
+Sequential Model Parallelism with Checkpointing
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+PyTorch Lightning integration for Sequential Model Parallelism using `FairScale <https://github.com/facebookresearch/fairscale>`_.
+Sequential Model Parallelism splits a sequential module onto multiple GPUs, reducing peak GPU memory requirements substantially.
+We also provide auto-balancing techniques through FairScale, to find optimal balances for the model across GPUs.
+In addition, we use Gradient Checkpointing to reduce GPU memory requirements further, and micro-batches to minimize device under-utilization automatically.
+
+Reference: https://arxiv.org/abs/1811.06965
+
+.. note:: DDPSequentialPlugin is currently supported only for PyTorch 1.6.
+
+To get started, install FairScale via the extras with ``pip install pytorch-lightning["extra"]``.
+
+To use Sequential Model Parallelism, you must define a :class:`nn.Sequential <torch.nn.Sequential>` module that defines the layers you wish to parallelize across GPUs.
+This should be kept within the ``sequential_module`` variable within your ``LightningModule``, like below.
+
+.. code-block:: python
+
+    from pytorch_lightning.plugins.ddp_sequential_plugin import DDPSequentialPlugin
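    # Illustrative sketch only: the `balance` argument and the Trainer wiring
    # below are assumptions about the plugin's API, not the original example.
    import torch.nn as nn
    from pytorch_lightning import LightningModule, Trainer

    class MyModel(LightningModule):
        def __init__(self):
            super().__init__()
            # Layers to be partitioned across GPUs must live in `sequential_module`.
            self.sequential_module = nn.Sequential(
                nn.Linear(32, 32),
                nn.ReLU(),
                nn.Linear(32, 2),
            )

        def forward(self, x):
            return self.sequential_module(x)

    # Split the sequential module across 2 GPUs: two layers on GPU 0, one on GPU 1.
    trainer = Trainer(
        gpus=2,
        accelerator="ddp",
        plugins=[DDPSequentialPlugin(balance=[2, 1])],
    )
    trainer.fit(MyModel())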
+We provide a minimal example of Sequential Model Parallelism using a convolutional model trained on CIFAR-10, split onto GPUs `here <https://github.com/PyTorchLightning/pytorch-lightning/tree/master/pl_examples/basic_examples/conv_sequential_example.py>`_.
+To run the example, you need to install `Bolts <https://github.com/PyTorchLightning/pytorch-lightning-bolts>`_. Install with ``pip install pytorch-lightning-bolts``.
+
+When running the Sequential Model Parallelism example on 2 GPUs, we achieve these memory savings:
+
+.. list-table:: GPU Memory Utilization
+   :widths: 25 25 50
+   :header-rows: 1
+
+   * - GPUs
+     - Without Balancing
+     - With Balancing
+   * - GPU 0
+     - 4436 MB
+     - 1554 MB
+   * - GPU 1
+     - ~0
+     - 994 MB
+
+To run the example with Sequential Model Parallelism:
@@ -728,17 +803,17 @@ Lightning supports the use of TorchElastic to enable fault-tolerant and elastic
 .. code-block:: python
 
     Trainer(gpus=8, accelerator='ddp')
-
-
+
+
 Following the `TorchElastic Quickstart documentation <https://pytorch.org/elastic/latest/quickstart.html>`_, you then need to start a single-node etcd server on one of the hosts: