2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -434,6 +434,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed `LightningModule.save_hyperparameters()` when attempting to save an empty container ([#7268](https://github.com/PyTorchLightning/pytorch-lightning/pull/7268))


- Fixed `apex` not properly instantiated when running with `ddp` ([#7274](https://github.com/PyTorchLightning/pytorch-lightning/pull/7274))


## [1.2.7] - 2021-04-06

5 changes: 5 additions & 0 deletions pytorch_lightning/accelerators/accelerator.py
@@ -107,6 +107,11 @@ def pre_dispatch(self, trainer: 'pl.Trainer') -> None:
self.setup_optimizers(trainer)
self.precision_plugin.pre_dispatch()

def dispatch(self, trainer: 'pl.Trainer') -> None:
"""Hook to do something before the training/evaluation/prediction starts."""
Contributor: It might be a good idea to clarify somewhere here that this happens after accelerator setup; otherwise this looks the same as pre_dispatch.

Contributor: @tchaton, the order in which these hooks are executed could be confusing.

Contributor: I find the pre/dispatch/post hooks confusing now :/

Contributor: Yes, we should think about the naming of these hooks. But more importantly, I think we can do a better job of formally defining what these hooks are supposed to do. Maybe another action item for 1.3 is to do a full pass over the plugins and improve all these docs. That would help everyone 1. implement plugins, 2. fix bugs, and 3. review plugin PRs.
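
To make the ordering concrete, here is a minimal, self-contained sketch of how the three hooks are sequenced after this PR. The stub classes are for illustration only and omit everything except the hook calls visible in the diff (the real pre_dispatch also sets up optimizers), and the compressed run() driver is an assumption about the relative order, not actual Trainer code:

```python
# Simplified stubs illustrating the relative hook order implied by this PR;
# not the actual pytorch_lightning implementation.


class Plugin:
    """Stub mirroring the three hooks on base_plugin.Plugin."""

    def pre_dispatch(self):
        pass

    def dispatch(self, trainer):
        pass

    def post_dispatch(self):
        pass


class Accelerator:
    def __init__(self, training_type_plugin, precision_plugin):
        self.training_type_plugin = training_type_plugin
        self.precision_plugin = precision_plugin

    def pre_dispatch(self, trainer):
        # runs during setup, before any stage starts
        self.precision_plugin.pre_dispatch()

    def dispatch(self, trainer):
        # runs at the start of Trainer.run_stage() (the call added in this PR)
        self.training_type_plugin.dispatch(trainer)
        self.precision_plugin.dispatch(trainer)

    def post_dispatch(self, trainer):
        # runs after the stage has finished
        self.training_type_plugin.post_dispatch()


def run(trainer, accelerator):
    # relative order, compressed into one function for illustration
    accelerator.pre_dispatch(trainer)   # before the stage
    accelerator.dispatch(trainer)       # as run_stage begins
    # ... training / evaluation / prediction runs here ...
    accelerator.post_dispatch(trainer)  # after the stage finishes


run(trainer=None, accelerator=Accelerator(Plugin(), Plugin()))
```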

Contributor: n00b question: would this be easier if the precision plugin were owned by the training type plugin instead of the accelerator?

Contributor: Yes, I think that could make it easier to interleave these operations between one plugin and the other. Here in this PR we see that the precision plugin needs to configure the model before it is wrapped, and needs to overwrite the reference in the training plugin. This really breaks the contract that these plugins currently have with each other.

Contributor Author: Yes, we should definitely refactor this and move the optimizers and lr_schedulers to the training_type_plugin.
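
A rough, purely hypothetical sketch of the ownership change being discussed; every class and attribute name below is an assumption for illustration, not the current Lightning API. If the training type plugin owned the precision plugin along with the optimizers and lr_schedulers, precision could be configured before the model is wrapped, and no reference in the training plugin would need to be overwritten afterwards:

```python
# Hypothetical design sketch only; not the pytorch_lightning API.
from typing import Any, List

from torch.nn import Module
from torch.optim import Optimizer


class PrecisionPlugin:
    def connect(self, model: Module, optimizers: List[Optimizer]):
        # e.g. for apex: amp.initialize(model, optimizers, opt_level=...)
        return model, optimizers


class TrainingTypePlugin:
    """Owns the precision plugin, optimizers and schedulers directly (hypothetical)."""

    def __init__(self, precision_plugin: PrecisionPlugin) -> None:
        self.precision_plugin = precision_plugin
        self.optimizers: List[Optimizer] = []
        self.lr_schedulers: List[Any] = []

    def setup(self, model: Module) -> Module:
        # Precision configuration happens *before* the model is wrapped,
        # so no reference in the training plugin has to be overwritten later.
        model, self.optimizers = self.precision_plugin.connect(model, self.optimizers)
        return self.wrap(model)

    def wrap(self, model: Module) -> Module:
        # e.g. DistributedDataParallel(model) in a DDP-style plugin
        return model
```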

self.training_type_plugin.dispatch(trainer)
self.precision_plugin.dispatch(trainer)

def post_dispatch(self, trainer: 'pl.Trainer') -> None:
"""Hook to do something after the training/evaluation/prediction starts."""
self.training_type_plugin.post_dispatch()
5 changes: 5 additions & 0 deletions pytorch_lightning/plugins/base_plugin.py
@@ -15,13 +15,18 @@
from abc import ABC
from typing import Generator

import pytorch_lightning as pl


class Plugin(ABC):
"""Basic class for all precision- and training type plugins."""

def pre_dispatch(self) -> None:
"""Hook to do something before the training/evaluation/prediction starts."""

def dispatch(self, trainer: "pl.Trainer") -> None:
"""Hook to do something at trainer run_stage starts."""

def post_dispatch(self) -> None:
"""Hook to do something after the training/evaluation/prediction finishes."""

63 changes: 12 additions & 51 deletions pytorch_lightning/plugins/precision/apex_amp.py
@@ -11,14 +11,14 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Any, Callable, ContextManager, List, Sequence, Tuple, Type
from typing import Any, Callable, ContextManager, Sequence

import torch
from torch import Tensor
from torch.nn import Module
from torch.optim import Optimizer

from pytorch_lightning.core import LightningModule
import pytorch_lightning as pl
from pytorch_lightning.core.lightning import LightningModule
from pytorch_lightning.plugins.precision.mixed import MixedPrecisionPlugin
from pytorch_lightning.utilities import _APEX_AVAILABLE, AMPType
from pytorch_lightning.utilities.types import _PARAMETERS
@@ -34,24 +34,19 @@ def __init__(self, amp_level: str = "O2") -> None:
super().__init__()
self.backend = AMPType.APEX
self.amp_level = amp_level
self._connected = False

def master_params(self, optimizer: Optimizer) -> _PARAMETERS:
return amp.master_params(optimizer)

def connect(
self,
model: Module,
optimizers: List[Optimizer],
lr_schedulers: List[Any],
) -> Tuple[Module, List[Optimizer], List[Any]]:
"""Connects the precision plugin to the training process,
configures apex and reinits the schedulers
"""
if model.device.type != "cuda":
return model, optimizers, lr_schedulers
model, optimizers = self.configure_apex(amp, model, list(optimizers), self.amp_level)
self.reinit_scheduler_properties(optimizers, lr_schedulers)
return model, optimizers, lr_schedulers
def dispatch(self, trainer: "pl.Trainer") -> None:
if not self._connected:
accelerator = trainer.accelerator
_, accelerator.optimizers = amp.initialize(
trainer.lightning_module, accelerator.optimizers, opt_level=self.amp_level
)
self._connected = True
return super().dispatch(trainer)

def backward(
self,
@@ -99,40 +94,6 @@ def backward(
closure_loss = closure_loss.detach()
return closure_loss

def configure_apex(
self,
amp: Type,
model: Module,
optimizers: List[Optimizer],
amp_level: str,
) -> Tuple[Module, List[Optimizer]]:
r"""
Override to init AMP your own way.
Must return a model and list of optimizers.

Args:
amp: pointer to amp library object.
model: pointer to current :class:`torch.nn.Module`.
optimizers: list of optimizers passed in :meth:`configure_optimizers`.
amp_level: AMP mode chosen ('O1', 'O2', etc...)

Return:
Apex wrapped model and optimizers

Examples:
.. code-block:: python

# Default implementation used by Trainer.
def configure_apex(self, amp, model, optimizers, amp_level):
model, optimizers = amp.initialize(
model, optimizers, opt_level=amp_level,
)

return model, optimizers
"""
model, optimizers = amp.initialize(model, optimizers, opt_level=amp_level)
return model, optimizers

@staticmethod
def reinit_scheduler_properties(optimizers: Sequence[Optimizer], schedulers: Sequence[Any]) -> None:
"""Reinitializes schedulers with correct properties"""
3 changes: 2 additions & 1 deletion pytorch_lightning/trainer/trainer.py
@@ -516,7 +516,8 @@ def dispatch(self):
else:
self.accelerator.start_training(self)

def run_stage(self) -> Optional[Union[_EVALUATE_OUTPUT, _PREDICT_OUTPUT]]:
def run_stage(self):
self.accelerator.dispatch(self)
self.profile_connector.setup()

if self.evaluating:
60 changes: 59 additions & 1 deletion tests/plugins/test_amp_plugins.py
@@ -1,3 +1,17 @@
# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from unittest import mock

@@ -37,7 +51,7 @@ class MyApexPlugin(ApexMixedPrecisionPlugin):
pytest.param('native', False, NativeMixedPrecisionPlugin, marks=RunIf(amp_native=True)),
pytest.param('native', True, MyNativeAMP, marks=RunIf(amp_native=True)),
pytest.param('apex', False, ApexMixedPrecisionPlugin, marks=RunIf(amp_apex=True)),
pytest.param('apex', True, MyApexPlugin, marks=RunIf(amp_apex=True))
pytest.param('apex', True, MyApexPlugin, marks=RunIf(amp_apex=True)),
]
)
def test_amp_apex_ddp(
@@ -83,3 +97,47 @@ def test_amp_gradient_unscale(tmpdir, accum: int):
accumulate_grad_batches=accum,
)
trainer.fit(model)


@RunIf(min_gpus=2, amp_apex=True, special=True)
@pytest.mark.parametrize("amp_level", ['O2'])
def test_amp_apex_ddp_fit(amp_level, tmpdir):

Comment on lines +103 to +105
awaelchli (Contributor), Apr 29, 2021: We have another apex test, tests/models/test_amp.py::test_amp_with_apex; it uses 1 GPU. @tchaton your fix only applies to multi-GPU due to the dispatch, am I right? Single GPU seems to be OK.

Contributor Author: Yes.

class CustomBoringModel(BoringModel):

def training_step(self, batch, batch_idx):
assert self.layer.weight.dtype == torch.float16
assert self.trainer.precision_plugin._connected
return super().training_step(batch, batch_idx)

trainer = Trainer(
default_root_dir=tmpdir,
fast_dev_run=True,
precision=16,
amp_backend="apex",
gpus=2,
accelerator='ddp',
plugins=ApexMixedPrecisionPlugin(amp_level=amp_level),
)
assert isinstance(trainer.precision_plugin, ApexMixedPrecisionPlugin)
model = CustomBoringModel()
trainer.fit(model)
trainer.test(model)


@RunIf(min_gpus=2, amp_apex=True)
@pytest.mark.parametrize("amp_level", ['O2'])
def test_amp_apex_ddp_spawn_fit(amp_level, tmpdir):

trainer = Trainer(
default_root_dir=tmpdir,
fast_dev_run=True,
precision=16,
amp_backend="apex",
gpus=2,
accelerator='ddp_spawn',
plugins=ApexMixedPrecisionPlugin(amp_level=amp_level),
)
assert isinstance(trainer.precision_plugin, ApexMixedPrecisionPlugin)
model = BoringModel()
trainer.fit(model)
13 changes: 13 additions & 0 deletions tests/plugins/test_cluster_integration.py
@@ -1,3 +1,16 @@
# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from unittest import mock
