Commit 7ed3eea

Merge branch 'release/1.2-dev' into feature/lambdacallback
2 parents 72f3f0c + 54d20dc commit 7ed3eea

67 files changed, 560 additions and 445 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -65,6 +65,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Changed `iou` [func] to allow float input ([#4704](https://github.com/PyTorchLightning/pytorch-lightning/pull/4704))
 
 
+- Changed `callbacks` argument in `Trainer` to allow `Callback` input ([#5446](https://github.com/PyTorchLightning/pytorch-lightning/pull/5446))
+
+
 ### Deprecated
 
 - `stat_scores_multiple_classes` is deprecated in favor of `stat_scores` ([#4839](https://github.com/PyTorchLightning/pytorch-lightning/pull/4839))
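
The `callbacks` entry above (#5446) says the `Trainer` argument now accepts a single `Callback` in addition to a list. A minimal sketch of what such input normalization could look like is given here. The `Callback` class is a stand-in and `as_callback_list` is a hypothetical helper name; neither is taken from the Lightning source, and the real change may be implemented differently.

```python
from typing import List, Optional, Union


class Callback:
    """Stand-in for pytorch_lightning.callbacks.Callback (assumption)."""


def as_callback_list(
    callbacks: Optional[Union[Callback, List[Callback]]],
) -> List[Callback]:
    """Hypothetical helper: always hand the caller a list of callbacks."""
    if callbacks is None:
        return []
    if isinstance(callbacks, Callback):
        # A lone callback is wrapped so downstream code can iterate uniformly.
        return [callbacks]
    # Copy the list so later appends do not mutate the caller's object.
    return list(callbacks)
```

With a helper of this shape, both a bare callback and a list of callbacks end up as a list, so the rest of the code only has to handle one case.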

docs/source/tpu.rst

Lines changed: 25 additions & 4 deletions
@@ -40,7 +40,7 @@ To access TPUs, there are three main ways.
 ----------------
 
 Colab TPUs
------------
+----------
 Colab is like a jupyter notebook with a free GPU or TPU
 hosted on GCP.
 
@@ -129,8 +129,7 @@ That's it! Your model will train on all 8 TPU cores.
 ----------------
 
 TPU core training
-
-------------------------
+-----------------
 
 Lightning supports training on a single TPU core or 8 TPU cores.
 
@@ -177,7 +176,7 @@ on how to set up the instance groups and VMs needed to run TPU Pods.
 ----------------
 
 16 bit precision
------------------
+----------------
 Lightning also supports training in 16-bit precision with TPUs.
 By default, TPU training will use 32-bit precision. To enable 16-bit,
 set the 16-bit flag.
@@ -194,6 +193,28 @@ Under the hood the xla library will use the `bfloat16 type <https://en.wikipedia
 
 ----------------
 
+Performance considerations
+--------------------------
+
+The TPU was designed for specific workloads and operations to carry out large volumes of matrix multiplication,
+convolution operations and other commonly used ops in applied deep learning.
+The specialization makes it a strong choice for NLP tasks, sequential convolutional networks, and under low precision operation.
+There are cases in which training on TPUs is slower when compared with GPUs, for possible reasons listed:
+
+- Too small batch size.
+- Explicit evaluation of tensors during training, e.g. ``tensor.item()``
+- Tensor shapes (e.g. model inputs) change often during training.
+- Limited resources when using TPU's with PyTorch `Link <https://github.com/pytorch/xla/issues/2054#issuecomment-627367729>`_
+- XLA Graph compilation during the initial steps `Reference <https://github.com/pytorch/xla/issues/2383#issuecomment-666519998>`_
+- Some tensor ops are not fully supported on TPU, or not supported at all. These operations will be performed on CPU (context switch).
+- PyTorch integration is still experimental. Some performance bottlenecks may simply be the result of unfinished implementation.
+
+The official PyTorch XLA `performance guide <https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#known-performance-caveats>`_
+has more detailed information on how PyTorch code can be optimized for TPU. In particular, the
+`metrics report <https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#get-a-metrics-report>`_ allows
+one to identify operations that lead to context switching.
+
+
 About XLA
 ----------
 XLA is the library that interfaces PyTorch with the TPUs.
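
One of the slowdown causes listed in the new "Performance considerations" section is tensor shapes that change often during training. Since XLA compiles a graph per distinct input shape, a common mitigation is to pad variable-length inputs to a small, fixed set of lengths. The sketch below illustrates that idea in plain Python; the bucket sizes and the names `BUCKETS`, `bucket_length` and `pad_to_bucket` are assumptions made for illustration and do not come from the commit or from the PyTorch XLA API.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical bucket sizes, chosen only for illustration; must be sorted.
BUCKETS = (32, 64, 128, 256)


def bucket_length(length: int, buckets: Sequence[int] = BUCKETS) -> int:
    """Return the smallest bucket that can hold `length` items."""
    idx = bisect_left(buckets, length)
    if idx == len(buckets):
        raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[idx]


def pad_to_bucket(tokens: List[int], pad_id: int = 0) -> List[int]:
    """Right-pad `tokens` so its length is one of the bucket sizes."""
    target = bucket_length(len(tokens))
    return tokens + [pad_id] * (target - len(tokens))
```

With four buckets, the model only ever sees four distinct sequence lengths, which bounds the number of distinct shapes at the cost of some wasted computation on padding.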

pyproject.toml

Lines changed: 0 additions & 2 deletions
@@ -39,15 +39,13 @@ skip_glob = [
     "tests/backends/*",
     "tests/base/*",
     "tests/callbacks/*",
-    "tests/checkpointing/*",
     "tests/core/*",
     "tests/loggers/*",
     "tests/metrics/*",
     "tests/models/*",
     "tests/plugins/*",
     "tests/trainer/*",
     "tests/tuner/*",
-    "tests/utilities/*",
 ]
 profile = "black"
 line_length = 120

pytorch_lightning/accelerators/accelerator_connector.py

Lines changed: 24 additions & 13 deletions
@@ -185,14 +185,21 @@ def select_accelerator(self):
         # ----------------------------------
         # choose an accelerator for the user
         # ----------------------------------
-        use_slurm_ddp = self.trainer.use_ddp and self.trainer.is_slurm_managing_tasks
+        use_slurm_ddp = (
+            self.trainer._distrib_type in (DistributedType.DDP, DistributedType.DDP_SPAWN)
+            and self.trainer.is_slurm_managing_tasks
+        )
 
         # torchelastic or general non_slurm ddp
         te_flags_passed = 'WORLD_SIZE' in os.environ and ('GROUP_RANK' in os.environ or 'NODE_RANK' in os.environ)
-        use_torchelastic_ddp = self.trainer.use_ddp and te_flags_passed
+        use_torchelastic_ddp = (
+            self.trainer._distrib_type in (DistributedType.DDP, DistributedType.DDP_SPAWN) and te_flags_passed
+        )
 
-        use_ddp_spawn = self.trainer.use_ddp and self.trainer.distributed_backend == "ddp_spawn"
-        use_ddp_cpu_spawn = self.trainer.use_ddp and self.trainer.distributed_backend == "ddp_cpu"
+        use_ddp_cpu_spawn = (
+            self.trainer._distrib_type in (DistributedType.DDP, DistributedType.DDP_SPAWN)
+            and self.trainer._device_type == DeviceType.CPU
+        )
 
         use_ddp_cpu_torch_elastic = use_ddp_cpu_spawn and self._is_using_torchelastic()
         use_ddp_cpu_slurm = use_ddp_cpu_spawn and self.trainer.is_slurm_managing_tasks
@@ -204,8 +211,9 @@ def select_accelerator(self):
 
         cluster_env = self._select_environment()
 
+        # TODO: clean-up this branching as most just select class and uses the very same arguments
         # choose the appropriate accelerator backend
-        if self.trainer.use_ddp2:
+        if self.trainer._distrib_type == DistributedType.DDP2:
             accelerator_backend = accelerators.DDP2Accelerator(
                 self.trainer,
                 cluster_env,
@@ -240,7 +248,7 @@ def select_accelerator(self):
                 self.trainer.plugin_connector.ddp_plugin
             )
 
-        elif use_ddp_spawn:
+        elif self.trainer._distrib_type == DistributedType.DDP_SPAWN:
             accelerator_backend = accelerators.DDPSpawnAccelerator(
                 self.trainer,
                 nprocs=self.trainer.num_processes,
@@ -263,16 +271,16 @@ def select_accelerator(self):
                 ddp_plugin=self.trainer.plugin_connector.ddp_plugin
             )
 
-        elif self.trainer.use_dp:
+        elif self.trainer._distrib_type == DistributedType.DP:
             accelerator_backend = accelerators.DataParallelAccelerator(self.trainer, cluster_env)
 
-        elif self.trainer.use_horovod:
+        elif self.trainer._distrib_type == DistributedType.HOROVOD:
             accelerator_backend = accelerators.HorovodAccelerator(self.trainer, cluster_env)
 
-        elif self.trainer.use_single_gpu:
+        elif self.trainer._device_type == DeviceType.GPU and self.trainer.num_gpus == 1:
             accelerator_backend = accelerators.GPUAccelerator(self.trainer, cluster_env)
 
-        elif self.trainer.use_tpu:
+        elif self.trainer._device_type == DeviceType.TPU:
             accelerator_backend = accelerators.TPUAccelerator(self.trainer, cluster_env)
 
         elif self.trainer.distributed_backend is None:
@@ -347,13 +355,16 @@ def set_distributed_mode(self):
             self._set_horovod_backend()
 
         # throw error to force user ddp or ddp2 choice
-        if self.trainer.num_nodes > 1 and self.trainer._distrib_type not in (DistributedType.DDP2, DistributedType.DDP):
+        _ddp = (DistributedType.DDP, DistributedType.DDP_SPAWN, DistributedType.DDP2)
+        if (self.trainer.num_nodes > 1 and self.trainer._distrib_type not in _ddp):
             raise MisconfigurationException(
                 'DataParallel does not support num_nodes > 1. Switching to DistributedDataParallel for you. '
                 'To silence this warning set `accelerator="ddp"` or `accelerator="ddp2"`'
             )
 
-        rank_zero_info(f'GPU available: {torch.cuda.is_available()}, used: {self.trainer.on_gpu}')
+        rank_zero_info(
+            f'GPU available: {torch.cuda.is_available()}, used: {self.trainer._device_type == DeviceType.GPU}'
+        )
         num_cores = self.trainer.tpu_cores if self.trainer.tpu_cores is not None else 0
         rank_zero_info(f'TPU available: {_TPU_AVAILABLE}, using: {num_cores} TPU cores')
 
@@ -366,7 +377,7 @@ def _set_horovod_backend(self):
 
         # Initialize Horovod to get rank / size info
         hvd.init()
-        if self.trainer.on_gpu:
+        if self.trainer._device_type == DeviceType.GPU:
             # Horovod assigns one local GPU per process
             self.trainer.root_gpu = hvd.local_rank()
 
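
The hunks above replace boolean trainer flags (`use_ddp`, `use_dp`, `on_gpu`, ...) with comparisons against two enums. A small, self-contained sketch of that pattern is shown here. The member names mirror those used in the diff, but the string values, the stand-in enum definitions and the helper names `is_ddp_on_cpu` and `is_single_gpu` are assumptions for illustration, not the Lightning implementation.

```python
from enum import Enum
from typing import Optional


class DistributedType(Enum):
    # Member names follow the diff; the string values are assumed.
    DP = "dp"
    DDP = "ddp"
    DDP2 = "ddp2"
    DDP_SPAWN = "ddp_spawn"
    HOROVOD = "horovod"


class DeviceType(Enum):
    # Member names follow the diff; the string values are assumed.
    CPU = "cpu"
    GPU = "gpu"
    TPU = "tpu"


def is_ddp_on_cpu(distrib_type: Optional[DistributedType], device_type: DeviceType) -> bool:
    """Same shape as the new `use_ddp_cpu_spawn` expression in the diff."""
    return (
        distrib_type in (DistributedType.DDP, DistributedType.DDP_SPAWN)
        and device_type == DeviceType.CPU
    )


def is_single_gpu(device_type: DeviceType, num_gpus: int) -> bool:
    """Same shape as the condition that replaces `use_single_gpu` in the diff."""
    return device_type == DeviceType.GPU and num_gpus == 1
```

Keeping the distribution strategy and the device in two separate fields means each combination is expressed as a comparison rather than as one more boolean attribute that has to be kept in sync.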

pytorch_lightning/accelerators/horovod_accelerator.py

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@
 
 from pytorch_lightning.accelerators.accelerator import Accelerator, ReduceOp
 from pytorch_lightning.cluster_environments import ClusterEnvironment
-from pytorch_lightning.utilities import _HOROVOD_AVAILABLE, AMPType
+from pytorch_lightning.utilities import _HOROVOD_AVAILABLE, AMPType, DeviceType
 from pytorch_lightning.utilities.distributed import rank_zero_only
 
 if _HOROVOD_AVAILABLE:
@@ -46,7 +46,7 @@ def setup(self, model):
         # call setup after the ddp process has connected
         self.trainer.call_setup_hook(model)
 
-        if torch.cuda.is_available() and self.trainer.on_gpu:
+        if torch.cuda.is_available() and self.trainer._device_type == DeviceType.GPU:
             # Horovod: pin GPU to local rank
             assert self.trainer.root_gpu == hvd.local_rank()
             torch.cuda.set_device(self.trainer.root_gpu)
@@ -116,7 +116,7 @@ def train(self):
         return results
 
     def _step(self, model_step: Callable, args):
-        if self.trainer.on_gpu:
+        if self.trainer._device_type == DeviceType.GPU:
             args[0] = self.batch_to_device(args[0], hvd.local_rank())
 
         if self.trainer.amp_backend == AMPType.NATIVE:
@@ -141,7 +141,7 @@ def backward(self, closure_loss, optimizer, opt_idx, *args, **kwargs):
         optimizer.synchronize()
 
     def on_train_epoch_end(self, outputs):
-        hvd.join(hvd.local_rank() if self.trainer.on_gpu else -1)
+        hvd.join(hvd.local_rank() if self.trainer._device_type == DeviceType.GPU else -1)
 
     def barrier(self, name: Optional[str] = None):
         hvd.join()

pytorch_lightning/callbacks/early_stopping.py

Lines changed: 5 additions & 7 deletions
@@ -25,6 +25,7 @@
 
 from pytorch_lightning.callbacks.base import Callback
 from pytorch_lightning.utilities import rank_zero_info, rank_zero_warn
+from pytorch_lightning.utilities.exceptions import MisconfigurationException
 
 
 class EarlyStopping(Callback):
@@ -96,15 +97,12 @@ def __init__(
         self.best_score = torch_inf if self.monitor_op == torch.lt else -torch_inf
 
     def __init_monitor_mode(self):
-        # TODO: Update with MisconfigurationException when auto mode is removed in v1.3
         if self.mode not in self.mode_dict and self.mode != 'auto':
-            if self.verbose > 0:
-                rank_zero_warn(
-                    f'EarlyStopping mode={self.mode} is unknown, fallback to auto mode.',
-                    RuntimeWarning,
-                )
-            self.mode = 'auto'
+            raise MisconfigurationException(
+                f"`mode` can be auto, {', '.join(self.mode_dict.keys())}, got {self.mode}"
+            )
 
+        # TODO: Update with MisconfigurationException when auto mode is removed in v1.3
         if self.mode == 'auto':
             rank_zero_warn(
                 "mode='auto' is deprecated in v1.1 and will be removed in v1.3."

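The `EarlyStopping` hunk above turns an unknown `mode` from a warning with a silent fallback to `'auto'` into a hard error. The following stand-alone sketch reproduces that behaviour so the resulting message can be seen. `MisconfigurationException` is redefined locally as a stand-in, `check_mode` is a hypothetical function name, and the values in `MODE_DICT` are placeholders, since only the keys matter for the check.

```python
class MisconfigurationException(Exception):
    """Local stand-in for pytorch_lightning.utilities.exceptions.MisconfigurationException."""


# Placeholder values; only the keys are used by the validation below.
MODE_DICT = {"min": min, "max": max}


def check_mode(mode: str, mode_dict: dict = MODE_DICT) -> str:
    """Hypothetical helper: reject any mode that is neither a known key nor 'auto'."""
    if mode not in mode_dict and mode != "auto":
        raise MisconfigurationException(
            f"`mode` can be auto, {', '.join(mode_dict.keys())}, got {mode}"
        )
    return mode
```

Under these assumptions, `check_mode("foo")` raises with the message `` `mode` can be auto, min, max, got foo ``, whereas the removed code would have warned (only when `verbose > 0`) and continued in `'auto'` mode.
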
pytorch_lightning/callbacks/gpu_stats_monitor.py

Lines changed: 2 additions & 2 deletions
@@ -27,7 +27,7 @@
 from typing import Dict, List, Tuple
 
 from pytorch_lightning.callbacks.base import Callback
-from pytorch_lightning.utilities import rank_zero_only
+from pytorch_lightning.utilities import rank_zero_only, DeviceType
 from pytorch_lightning.utilities.exceptions import MisconfigurationException
 from pytorch_lightning.utilities.parsing import AttributeDict
 
@@ -104,7 +104,7 @@ def on_train_start(self, trainer, *args, **kwargs):
                 'Cannot use GPUStatsMonitor callback with Trainer that has no logger.'
             )
 
-        if not trainer.on_gpu:
+        if trainer._device_type != DeviceType.GPU:
             raise MisconfigurationException(
                 'You are using GPUStatsMonitor but are not running on GPU'
                 f' since gpus attribute in Trainer is set to {trainer.gpus}.'

pytorch_lightning/callbacks/model_checkpoint.py

Lines changed: 3 additions & 5 deletions
@@ -287,14 +287,12 @@ def __init_monitor_mode(self, monitor, mode):
             "max": (-torch_inf, "max"),
         }
 
-        # TODO: Update with MisconfigurationException when auto mode is removed in v1.3
         if mode not in mode_dict and mode != 'auto':
-            rank_zero_warn(
-                f"ModelCheckpoint mode {mode} is unknown, fallback to auto mode",
-                RuntimeWarning,
+            raise MisconfigurationException(
+                f"`mode` can be auto, {', '.join(mode_dict.keys())}, got {mode}"
             )
-            mode = "auto"
 
+        # TODO: Update with MisconfigurationException when auto mode is removed in v1.3
         if mode == 'auto':
             rank_zero_warn(
                 "mode='auto' is deprecated in v1.1 and will be removed in v1.3."

pytorch_lightning/core/lightning.py

Lines changed: 2 additions & 11 deletions
@@ -85,17 +85,8 @@ def __init__(self, *args, **kwargs):
         #: Pointer to the logger object
         self.logger = None
 
-        #: True if using dp
-        self.use_dp = False
-
-        #: True if using ddp
-        self.use_ddp = False
-
-        #: True if using ddp2
-        self.use_ddp2 = False
-
-        # True if on tpu
-        self.use_tpu = False
+        self._distrib_type = None
+        self._device_type = None
 
         #: True if using amp
         self.use_amp = False

pytorch_lightning/core/memory.py

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@
 import torch.nn as nn
 from torch.utils.hooks import RemovableHandle
 
-from pytorch_lightning.utilities import AMPType
+from pytorch_lightning.utilities import AMPType, DeviceType
 
 PARAMETER_NUM_UNITS = [" ", "K", "M", "B", "T"]
 UNKNOWN_SIZE = "?"
@@ -229,7 +229,7 @@ def _forward_example_input(self) -> None:
         input_ = model.example_input_array
         input_ = model.transfer_batch_to_device(input_, model.device)
 
-        if trainer is not None and trainer.amp_backend == AMPType.NATIVE and not trainer.use_tpu:
+        if trainer is not None and trainer.amp_backend == AMPType.NATIVE and trainer._device_type != DeviceType.TPU:
             model.forward = torch.cuda.amp.autocast()(model.forward)
 
         mode = model.training
