
Commit fc61804

Merge branch 'master' into bugfix/sanitize-array
2 parents dca3dcf + c8e9fb4 commit fc61804

28 files changed: +704 −335 lines changed


CHANGELOG.md

Lines changed: 22 additions & 4 deletions
@@ -60,6 +60,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added sanitization of tensors when they get logged as hyperparameters in `TensorBoardLogger` ([#9031](https://github.com/PyTorchLightning/pytorch-lightning/pull/9031))
 
 
+- Added `InterBatchParallelDataFetcher` ([#9020](https://github.com/PyTorchLightning/pytorch-lightning/pull/9020))
+
+
+- Added `DataLoaderIterDataFetcher` ([#9020](https://github.com/PyTorchLightning/pytorch-lightning/pull/9020))
+
+
+- Added a friendly error message when DDP attempts to spawn new distributed processes with rank > 0 ([#9005](https://github.com/PyTorchLightning/pytorch-lightning/pull/9005))
+
+
 ### Changed
 
 - Parsing of the `gpus` Trainer argument has changed: `gpus="n"` (str) no longer selects the GPU index n and instead selects the first n devices. ([#8770](https://github.com/PyTorchLightning/pytorch-lightning/pull/8770))
@@ -114,10 +123,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Deprecated `DataModule` properties: `train_transforms`, `val_transforms`, `test_transforms`, `size`, `dims` ([#8851](https://github.com/PyTorchLightning/pytorch-lightning/pull/8851))
 
 
--
-
-
--
+- Deprecated `prepare_data_per_node` flag on Trainer and set it as a property of `DataHooks`, accessible in the `LightningModule` and `LightningDataModule` [#8958](https://github.com/PyTorchLightning/pytorch-lightning/pull/8958)
 
 
 -
@@ -139,6 +145,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed the deprecated `optimizer_idx` from `training_step` as an accepted argument in manual optimization ([#8576](https://github.com/PyTorchLightning/pytorch-lightning/pull/8576))
 
 
+- Removed support for the deprecated `on_save_checkpoint` signature. The hook now takes a `checkpoint` positional parameter ([#8697](https://github.com/PyTorchLightning/pytorch-lightning/pull/8697))
+
+
+- Removed support for the deprecated `on_load_checkpoint` signature. The hook now takes a `pl_module` positional parameter ([#8697](https://github.com/PyTorchLightning/pytorch-lightning/pull/8697))
+
+
 - Removed the deprecated `save_function` property in `ModelCheckpoint` ([#8680](https://github.com/PyTorchLightning/pytorch-lightning/pull/8680))
 
 
@@ -160,9 +172,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed deprecated `GradInformation` module in favor of `pytorch_lightning.utilities.grads` ([#8831](https://github.com/PyTorchLightning/pytorch-lightning/pull/8831/))
 
 
+- Removed `TrainingTypePlugin.on_save` and `Accelerator.on_save` ([#9023](https://github.com/PyTorchLightning/pytorch-lightning/pull/9023))
+
+
 - Removed deprecated `connect_precision_plugin` and `connect_training_type_plugin` from `Accelerator` ([#9019](https://github.com/PyTorchLightning/pytorch-lightning/pull/9019))
 
 
+- Removed `on_train_epoch_end` from `Accelerator` ([#9035](https://github.com/PyTorchLightning/pytorch-lightning/pull/9035))
+
+
 ### Fixed
 
 - Ensure the existence of `DDPPlugin._sync_dir` in `reconciliate_processes` ([#8939](https://github.com/PyTorchLightning/pytorch-lightning/pull/8939))
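
As a rough sketch of the `gpus` parsing change listed under "Changed" (device counts here are illustrative and assume a machine with at least three GPUs):

```python
from pytorch_lightning import Trainer

# After #8770, a plain numeric string behaves like an int:
# it requests the first n visible devices rather than the device with index n.
trainer = Trainer(gpus="2")   # trains on the first 2 GPUs, same as gpus=2

# To target a specific device index, pass a list of indices instead.
trainer = Trainer(gpus=[2])   # trains only on the GPU with index 2
```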

docs/source/governance.rst

Lines changed: 57 additions & 3 deletions
@@ -1,7 +1,14 @@
 .. _governance:
 
-Lightning Governance | Persons of interest
-==========================================
+Lightning Governance
+####################
+
+This document describes governance processes we follow in developing PyTorch Lightning.
+
+Persons of Interest
+*******************
+
+.. _governance_bdfl:
 
 BDFL
 ----
@@ -14,7 +21,7 @@ Leads
 -----
 - Jirka Borovec (`Borda <https://github.com/Borda>`_)
 - Ethan Harris (`ethanwharris <https://github.com/ethanwharris>`_) (Torchbearer founder)
-- Justus Schock (`justusschock <https://github.com/justusschock>`_) (Former Core Member PyTorch Ignite)
+- Justus Schock (`justusschock <https://github.com/justusschock>`_)
 - Adrian Wälchli (`awaelchli <https://github.com/awaelchli>`_)
 - Thomas Chaton (`tchaton <https://github.com/tchaton>`_)
 - Sean Narenthiran (`SeanNaren <https://github.com/SeanNaren>`_)
@@ -44,3 +51,50 @@ Alumni
 - Teddy Koker (`teddykoker <https://github.com/teddykoker>`_)
 - Nate Raw (`nateraw <https://github.com/nateraw>`_)
 - Peter Yu (`yukw777 <https://github.com/yukw777>`_)
+
+
+Releases
+********
+
+We release a new minor version (e.g., 1.5.0) every three months and bugfix releases every week.
+The minor versions contain new features, API changes, deprecations, removals, potential backward-incompatible
+changes and also all previous bugfixes included in any bugfix release. With every release, we publish a changelog
+where we list additions, removals, changed functionality and fixes.
+
+Project Management and Decision Making
+**************************************
+
+The decision about what goes into a release is governed by the :ref:`staff contributors and leaders <governance>` of
+Lightning development. Whenever possible, discussion happens publicly on GitHub and includes the whole community.
+For controversial changes, it is mandatory to seek consultation from :ref:`governance_bdfl` for a final decision.
+When a consensus is reached, staff and core contributors assign milestones and labels to the issue and/or pull request
+and start tracking the development. It is possible that priorities change over time.
+
+Commits to the project are exclusively to be added by pull requests on GitHub and anyone in the community is welcome to
+review them. However, reviews submitted by
+`code owners <https://github.com/PyTorchLightning/pytorch-lightning/blob/master/.github/CODEOWNERS>`_
+have higher weight and it is necessary to get the approval of code owners before a pull request can be merged.
+Additional requirements may apply case by case.
+
+API Evolution
+*************
+
+Lightning's development is driven by research and best practices in a rapidly developing field of AI and machine
+learning. Change is inevitable and when it happens, the Lightning team is committed to minimizing user friction and
+maximizing ease of transition from one version to the next. We take backward compatibility and reproducibility very
+seriously.
+
+For API removal, renaming or other forms of backward-incompatible changes, the procedure is:
+
+#. A deprecation process is initiated at version X, producing warning messages at runtime and in the documentation.
+#. Calls to the deprecated API remain unchanged in their function during the deprecation phase.
+#. Two minor versions in the future, at version X+2, the breaking change takes effect.
+
+The "X+2" rule is a recommendation and not a strict requirement. Longer deprecation cycles may apply for some cases.
+
+New API and features are declared as:
+
+- *Experimental*: Anything labelled as *experimental* or *beta* in the documentation is considered unstable and should
+  not be used in production. The community is encouraged to test the feature and report issues directly on GitHub.
+- *Stable*: Everything not specifically labelled as experimental should be considered stable. Reported issues will be
+  treated with priority.
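
As a rough illustration of the X → X+2 deprecation window described under "API Evolution" (the function names and version numbers below are hypothetical; only `rank_zero_deprecation` is an existing Lightning utility, visible in the `wandb.py` diff further down):

```python
from pytorch_lightning.utilities.warnings import rank_zero_deprecation


def new_api(x):
    return 2 * x


# Deprecated at version X: behaviour is unchanged, but a warning is emitted at runtime.
# At version X+2 the function (and this shim) is removed entirely.
def old_api(x):
    rank_zero_deprecation("`old_api` is deprecated in v1.5 and will be removed in v1.7. Use `new_api` instead.")
    return new_api(x)
```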

pytorch_lightning/accelerators/accelerator.py

Lines changed: 0 additions & 7 deletions
@@ -371,9 +371,6 @@ def lightning_module_state_dict(self) -> Dict[str, Union[Any, Tensor]]:
         """
         return self.training_type_plugin.lightning_module_state_dict()
 
-    def on_save(self, checkpoint: Dict[str, Union[Any, Tensor]]) -> Dict[str, Union[Any, Tensor]]:
-        return self.training_type_plugin.on_save(checkpoint)
-
     def barrier(self, name: Optional[str] = None) -> None:
         self.training_type_plugin.barrier(name=name)
 
@@ -479,10 +476,6 @@ def restore_checkpoint_after_pre_dispatch(self) -> bool:
     def update_global_step(self, total_batch_idx: int, current_global_step: int) -> int:
         return self.training_type_plugin.update_global_step(total_batch_idx, current_global_step)
 
-    def on_train_epoch_end(self) -> None:
-        """Hook to do something on the end of an training epoch."""
-        pass
-
     def on_train_start(self) -> None:
         """Called when train begins."""
         return self.training_type_plugin.on_train_start()

pytorch_lightning/callbacks/early_stopping.py

Lines changed: 3 additions & 1 deletion
@@ -159,7 +159,9 @@ def on_save_checkpoint(
             "patience": self.patience,
         }
 
-    def on_load_checkpoint(self, callback_state: Dict[str, Any]) -> None:
+    def on_load_checkpoint(
+        self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", callback_state: Dict[str, Any]
+    ) -> None:
         self.wait_count = callback_state["wait_count"]
         self.stopped_epoch = callback_state["stopped_epoch"]
         self.best_score = callback_state["best_score"]
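
For custom callbacks, a minimal sketch written against the updated hook signatures used here and in `timer.py` below (the callback and its state key are hypothetical):

```python
from typing import Any, Dict

import pytorch_lightning as pl
from pytorch_lightning.callbacks import Callback


class CounterCallback(Callback):
    """Toy callback whose only persistent state is a counter."""

    def __init__(self) -> None:
        self.batches_seen = 0

    def on_save_checkpoint(
        self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", checkpoint: Dict[str, Any]
    ) -> Dict[str, Any]:
        # The returned dict is stored as this callback's state inside the checkpoint.
        return {"batches_seen": self.batches_seen}

    def on_load_checkpoint(
        self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", callback_state: Dict[str, Any]
    ) -> None:
        # `callback_state` is exactly the dict returned by `on_save_checkpoint`.
        self.batches_seen = callback_state["batches_seen"]
```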

pytorch_lightning/callbacks/timer.py

Lines changed: 3 additions & 1 deletion
@@ -158,7 +158,9 @@ def on_save_checkpoint(
     ) -> Dict[str, Any]:
         return {"time_elapsed": {stage.value: self.time_elapsed(stage) for stage in list(RunningStage)}}
 
-    def on_load_checkpoint(self, callback_state: Dict[str, Any]) -> None:
+    def on_load_checkpoint(
+        self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", callback_state: Dict[str, Any]
+    ) -> None:
         time_elapsed = callback_state.get("time_elapsed", {})
         self._offset = time_elapsed.get(RunningStage.TRAINING.value, 0)

pytorch_lightning/core/hooks.py

Lines changed: 14 additions & 0 deletions
@@ -372,6 +372,16 @@ def configure_sharded_model(self) -> None:
 class DataHooks:
     """Hooks to be used for data related stuff."""
 
+    def __init__(self) -> None:
+        """
+        Attributes:
+            prepare_data_per_node:
+                If True, each LOCAL_RANK=0 will call prepare data.
+                Otherwise only NODE_RANK=0, LOCAL_RANK=0 will prepare data.
+        """
+        super().__init__()
+        self.prepare_data_per_node: bool = True
+
     def prepare_data(self) -> None:
         """
         Use this to download and prepare data.
@@ -405,6 +415,10 @@ def prepare_data(self):
             # call on GLOBAL_RANK=0 (great for shared file systems)
             Trainer(prepare_data_per_node=False)
 
+        Note:
+            Setting ``prepare_data_per_node`` with the trainer flag is deprecated and will be removed in v1.7.0.
+            Please set ``prepare_data_per_node`` in LightningDataModule or LightningModule directly instead.
+
         This is called before requesting the dataloaders:
 
         .. code-block:: python
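
Following the deprecation note added in this hunk, a minimal sketch of setting the attribute on a DataModule instead of passing the Trainer flag (the DataModule itself is hypothetical):

```python
from pytorch_lightning import LightningDataModule, Trainer


class MyDataModule(LightningDataModule):
    def __init__(self) -> None:
        super().__init__()
        # Replaces the deprecated Trainer(prepare_data_per_node=...) flag:
        # True  -> prepare_data() runs once per node (every LOCAL_RANK=0)
        # False -> prepare_data() runs only on NODE_RANK=0, LOCAL_RANK=0
        self.prepare_data_per_node = True

    def prepare_data(self) -> None:
        # download / tokenize / write to disk here; do not assign model state
        ...


trainer = Trainer()  # no prepare_data_per_node argument needed anymore
```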

pytorch_lightning/core/saving.py

Lines changed: 0 additions & 19 deletions
@@ -212,25 +212,6 @@ def _load_model_state(cls, checkpoint: Dict[str, Any], strict: bool = True, **cl
 
         return model
 
-    def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
-        """
-        Do something with the checkpoint.
-        Gives model a chance to load something before ``state_dict`` is restored.
-
-        Args:
-            checkpoint: A dictionary with variables from the checkpoint.
-        """
-
-    def on_save_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
-        """
-        Give the model a chance to add something to the checkpoint.
-        ``state_dict`` is already there.
-
-        Args:
-            checkpoint: A dictionary in which you can save variables to save in a checkpoint.
-                Contents need to be pickleable.
-        """
-
     # -------------------------
     # OPTIONAL HOOKS
     # -------------------------
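
These definitions are removed from the saving mixin; the equivalent model-level hooks remain overridable on a `LightningModule`. A minimal sketch (the module and the extra state key are hypothetical):

```python
from typing import Any, Dict

import torch
from pytorch_lightning import LightningModule


class LitModel(LightningModule):
    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
        self.secret_number = 42  # extra state not covered by state_dict

    def on_save_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
        # state_dict is already in `checkpoint`; add extra pickleable entries here
        checkpoint["secret_number"] = self.secret_number

    def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
        # called before state_dict is restored
        self.secret_number = checkpoint["secret_number"]
```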

pytorch_lightning/loggers/mlflow.py

Lines changed: 26 additions & 4 deletions
@@ -171,14 +171,24 @@ def experiment(self) -> MlflowClient:
         return self._mlflow_client
 
     @property
-    def run_id(self):
-        # create the experiment if it does not exist to get the run id
+    def run_id(self) -> str:
+        """
+        Create the experiment if it does not exist to get the run id.
+
+        Returns:
+            The run id.
+        """
         _ = self.experiment
         return self._run_id
 
     @property
-    def experiment_id(self):
-        # create the experiment if it does not exist to get the experiment id
+    def experiment_id(self) -> str:
+        """
+        Create the experiment if it does not exist to get the experiment id.
+
+        Returns:
+            The experiment id.
+        """
         _ = self.experiment
         return self._experiment_id
 
@@ -239,8 +249,20 @@ def save_dir(self) -> Optional[str]:
 
     @property
     def name(self) -> str:
+        """
+        Get the experiment id.
+
+        Returns:
+            The experiment id.
+        """
         return self.experiment_id
 
     @property
     def version(self) -> str:
+        """
+        Get the run id.
+
+        Returns:
+            The run id.
+        """
         return self.run_id
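
A short usage sketch for the newly documented properties (the experiment name and tracking URI are illustrative values):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

mlf_logger = MLFlowLogger(experiment_name="default", tracking_uri="file:./ml-runs")

# Accessing either property lazily creates the MLflow experiment/run if needed.
print(mlf_logger.experiment_id)  # also returned by mlf_logger.name
print(mlf_logger.run_id)         # also returned by mlf_logger.version

trainer = Trainer(logger=mlf_logger)
```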

pytorch_lightning/loggers/wandb.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 from pytorch_lightning.utilities import _module_available, rank_zero_only
 from pytorch_lightning.utilities.exceptions import MisconfigurationException
 from pytorch_lightning.utilities.imports import _compare_version
-from pytorch_lightning.utilities.warnings import rank_zero_deprecation, rank_zero_warn
+from pytorch_lightning.utilities.warnings import rank_zero_warn
 
 _WANDB_AVAILABLE = _module_available("wandb")
 _WANDB_GREATER_EQUAL_0_10_22 = _compare_version("wandb", operator.ge, "0.10.22")

pytorch_lightning/loops/epoch/training_epoch_loop.py

Lines changed: 4 additions & 2 deletions
@@ -135,8 +135,10 @@ def advance(self, dataloader_iter: Iterator, **kwargs: Any) -> None:
         # ------------------------------------
         # TRAINING_STEP + TRAINING_STEP_END
         # ------------------------------------
-        with self.trainer.profiler.profile("training_batch_to_device"):
-            batch = self.trainer.accelerator.batch_to_device(batch)
+        # FIXME: Remove with InterBatchProcessor.
+        if not self.trainer.data_connector.data_fetcher.store_on_device:
+            with self.trainer.profiler.profile("training_batch_to_device"):
+                batch = self.trainer.accelerator.batch_to_device(batch)
 
         self.batch_progress.increment_ready()
