3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -708,6 +708,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed passing `_ddp_params_and_buffers_to_ignore` ([#11949](https://github.com/PyTorchLightning/pytorch-lightning/pull/11949))


- Prevent modification of `torch.backends.cudnn.benchmark` when `benchmark` not set on the `Trainer` ([#12020](https://github.com/PyTorchLightning/pytorch-lightning/pull/12020))


- Fixed an `AttributeError` when calling `save_hyperparameters` and no parameters need saving ([#11827](https://github.com/PyTorchLightning/pytorch-lightning/pull/11827))


21 changes: 14 additions & 7 deletions docs/source/common/trainer.rst
@@ -416,22 +416,29 @@ benchmark

|

Defaults to ``True`` if :paramref:`~pytorch_lightning.trainer.Trainer.deterministic` is not set.
This flag sets the ``torch.backends.cudnn.deterministic`` flag. You can read more about its impact
The value (``True`` or ``False``) to set ``torch.backends.cudnn.benchmark`` to. If neither this flag nor
:paramref:`~pytorch_lightning.trainer.Trainer.deterministic` is set, the value of
``torch.backends.cudnn.benchmark`` in the current session is used. If
:paramref:`~pytorch_lightning.trainer.Trainer.deterministic` is set to ``True``, this flag defaults to ``False``.
You can read more about the interaction between ``torch.backends.cudnn.benchmark`` and ``torch.backends.cudnn.deterministic``
`here <https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking>`__.

This is likely to increase the speed of your system if your input sizes don't change. However, if they do, then it
might make your system slower. The CUDNN auto-tuner will try to find the best algorithm for the hardware when a new
input size is encountered. Read more about it `here <https://discuss.pytorch.org/t/what-does-torch-backends-cudnn-benchmark-do/5936>`__.
Setting this flag to ``True`` is likely to increase the speed of your system if your input sizes don't
change. However, if they do, then it might make your system slower. The CUDNN auto-tuner will try to find the best
algorithm for the hardware when a new input size is encountered. Read more about it
`here <https://discuss.pytorch.org/t/what-does-torch-backends-cudnn-benchmark-do/5936>`__.

Example::

# defaults to True if not deterministic (which is False by default)
trainer = Trainer()
# default used by the Trainer (uses the current value of torch.backends.cudnn.benchmark)
trainer = Trainer(benchmark=None)

# you can overwrite the value
trainer = Trainer(benchmark=False)

# `benchmark` defaults to False when deterministic is True
trainer = Trainer(deterministic=True)

deterministic
^^^^^^^^^^^^^

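To make the precedence concrete, here is a minimal usage sketch of the behaviour described above (it assumes the semantics introduced in this PR; no GPU is required just to set the flag):

import torch
from pytorch_lightning import Trainer

# A value set before constructing the Trainer is respected when benchmark is left unset.
torch.backends.cudnn.benchmark = False
Trainer(benchmark=None)
assert torch.backends.cudnn.benchmark is False

# An explicit `benchmark` argument always wins.
Trainer(benchmark=True)
assert torch.backends.cudnn.benchmark is True

# With `deterministic=True` and `benchmark` left unset, benchmarking is forced off.
Trainer(deterministic=True)
assert torch.backends.cudnn.benchmark is False
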
19 changes: 12 additions & 7 deletions pytorch_lightning/trainer/connectors/accelerator_connector.py
@@ -101,7 +101,7 @@ def __init__(
sync_batchnorm: bool = False,
benchmark: Optional[bool] = None,
replace_sampler_ddp: bool = True,
deterministic: bool = False,
deterministic: Optional[bool] = None,
num_processes: Optional[int] = None, # deprecated
tpu_cores: Optional[Union[List[int], int]] = None, # deprecated
ipus: Optional[int] = None, # deprecated
@@ -147,9 +147,13 @@ def __init__(
"You passed `deterministic=True` and `benchmark=True`. Note that PyTorch ignores"
" torch.backends.cudnn.deterministic=True when torch.backends.cudnn.benchmark=True.",
)
self.benchmark = not deterministic if benchmark is None else benchmark
if deterministic and benchmark is None:
# Set benchmark to False to ensure determinism
benchmark = False
# TODO: move to gpu accelerator
torch.backends.cudnn.benchmark = self.benchmark
if benchmark is not None:
torch.backends.cudnn.benchmark = benchmark
self.benchmark = torch.backends.cudnn.benchmark
self.replace_sampler_ddp = replace_sampler_ddp
self._init_deterministic(deterministic)

@@ -211,12 +215,13 @@ def __init__(
self._lazy_init_strategy()

def _init_deterministic(self, deterministic: bool) -> None:
self.deterministic = deterministic
# Default to False if not set
self.deterministic = deterministic or False
if _TORCH_GREATER_EQUAL_1_8:
torch.use_deterministic_algorithms(deterministic)
torch.use_deterministic_algorithms(self.deterministic)
else:
torch.set_deterministic(deterministic)
if deterministic:
torch.set_deterministic(self.deterministic)
if self.deterministic:
# fixing non-deterministic part of horovod
# https://github.com/PyTorchLightning/pytorch-lightning/pull/1572/files#r420279383
os.environ["HOROVOD_FUSION_THRESHOLD"] = "0"
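For reference, the resolution performed by the connector above boils down to the following standalone sketch (the helper names are illustrative only, and the deterministic part omits the pre-1.8 fallback and the Horovod workaround):

from typing import Optional

import torch


def resolve_cudnn_benchmark(benchmark: Optional[bool], deterministic: Optional[bool]) -> bool:
    # `deterministic=True` with an unset `benchmark` must not enable benchmarking.
    if deterministic and benchmark is None:
        benchmark = False
    # Only touch the global flag when an explicit value was resolved;
    # otherwise keep whatever the current session already uses.
    if benchmark is not None:
        torch.backends.cudnn.benchmark = benchmark
    return torch.backends.cudnn.benchmark


def init_deterministic(deterministic: Optional[bool]) -> bool:
    # `None` behaves like the previous default of False.
    deterministic = deterministic or False
    torch.use_deterministic_algorithms(deterministic)  # PyTorch >= 1.8
    return deterministic
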
12 changes: 8 additions & 4 deletions pytorch_lightning/trainer/trainer.py
@@ -174,7 +174,7 @@ def __init__(
resume_from_checkpoint: Optional[Union[Path, str]] = None,
profiler: Optional[Union[BaseProfiler, str]] = None,
benchmark: Optional[bool] = None,
deterministic: bool = False,
deterministic: Optional[bool] = None,
reload_dataloaders_every_n_epochs: int = 0,
auto_lr_find: Union[bool, str] = False,
replace_sampler_ddp: bool = True,
@@ -230,8 +230,11 @@ def __init__(
Default: ``False``.

benchmark: Sets ``torch.backends.cudnn.benchmark``.
Defaults to ``True`` if :paramref:`~pytorch_lightning.trainer.trainer.Trainer.deterministic`
is ``False``. Overwrite to manually set a different value. Default: ``None``.
The value (``True`` or ``False``) to set ``torch.backends.cudnn.benchmark`` to. If not specified, the
value set in the current session will be used. However, if
:paramref:`~pytorch_lightning.trainer.trainer.Trainer.deterministic` is ``True``, ``benchmark`` defaults
to ``False`` to ensure determinism. Override to manually set a different value.
Default: ``None``.

callbacks: Add a callback or list of callbacks.
Default: ``None``.
@@ -260,7 +263,8 @@ def __init__(
Default: ``False``.

deterministic: If ``True``, sets whether PyTorch operations must use deterministic algorithms.
Default: ``False``.
If not set, defaults to ``False``.
Default: ``None``.

devices: Will be mapped to either `gpus`, `tpu_cores`, `num_processes` or `ipus`,
based on the accelerator type.
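A short sanity check of the two new defaults documented above (assuming the behaviour in this PR; `_accelerator_connector` is the same internal attribute the tests below use):

import torch
from pytorch_lightning import Trainer

trainer = Trainer()  # benchmark=None, deterministic=None
# `deterministic=None` behaves like the previous default of False ...
assert trainer._accelerator_connector.deterministic is False
# ... and `benchmark=None` leaves torch.backends.cudnn.benchmark untouched.
assert trainer._accelerator_connector.benchmark == torch.backends.cudnn.benchmark
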
8 changes: 6 additions & 2 deletions tests/trainer/test_trainer.py
@@ -641,12 +641,15 @@ def test_trainer_max_steps_accumulate_batches(tmpdir):
@pytest.mark.parametrize(
["benchmark_", "deterministic", "expected"],
[
(None, False, True),
(None, False, None),
(None, True, False),
(None, None, None),
Contributor:

So now, somebody who creates Trainer() with no arguments won't get cudnn.benchmark=True as it does in master?

I think having benchmark=True by default is best.

The proposal in this PR would only work if we were able to know whether the user had set benchmark manually beforehand.

Author (@timesler, Mar 2, 2022):

The justification for this change is that PyTorch Lightning shouldn't break standard PyTorch functionality, but extend it. In general, PL offers value by establishing sensible defaults so that users usually don't need to think about them. However, in this case, we are silently setting a global variable, resulting in this behaviour:

import torch
from pytorch_lightning import Trainer

torch.backends.cudnn.benchmark = False
trainer = Trainer(gpus=1)
print(torch.backends.cudnn.benchmark)

Output:

True
# When it should be False

Contributor:

I understand that, but since there's no way for us to know whether the user previously set the value, we have two mutually exclusive options:

  1. Have the better default for most people which may override an existing value (current master)
  2. Always respect the existing value but users have to remember to set this flag (this PR)

I personally prefer 1 as it establishes a "sensible default". We could request the torch folks to add an optional default so the options are not exclusive in the future.

cc'ing reviewers @awaelchli @ananthsub @krshrimali to see if they think differently.

Contributor:

> I understand that, but since there's no way for us to know whether the user previously set the value,

We can know, by making the default value for the argument None. A default Trainer() would result in taking the value from torch.backends.cudnn.benchmark, which by default is True.
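
A minimal sketch of the sentinel idea in this comment (hypothetical helper, not code from the PR): a None default makes "not passed" distinguishable from an explicit False.

from typing import Optional

import torch


def pick_benchmark(benchmark: Optional[bool] = None) -> bool:
    # None means the user did not pass the flag: respect the current global value.
    return torch.backends.cudnn.benchmark if benchmark is None else benchmark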

Contributor:

> torch.backends.cudnn.benchmark which by default is True

Is it? Locally:

$ python -c 'import torch; print(torch.backends.cudnn.benchmark)'
False

(True, False, True),
(True, True, True),
(False, True, False),
(True, None, True),
(False, False, False),
(False, True, False),
(False, None, False),
],
)
def test_benchmark_option(benchmark_, deterministic, expected):
@@ -659,6 +662,7 @@ def test_benchmark_option(benchmark_, deterministic, expected):
trainer = Trainer(benchmark=benchmark_, deterministic=deterministic)
else:
trainer = Trainer(benchmark=benchmark_, deterministic=deterministic)
expected = original_val if expected is None else expected
assert torch.backends.cudnn.benchmark == expected
assert trainer._accelerator_connector.benchmark == expected
