reduce_boolean_decision behavior diverges from EarlyStopping callback's intended usage of it #15252

@speediedan

### Bug description

When EarlyStopping is used in a distributed context, early stopping conditions may be met in some processes before others.

Per the intended behavior described in the EarlyStopping callback, training should stop in all processes as soon as an EarlyStopping threshold is reached in any process:
https://github.com/Lightning-AI/lightning/blob/be1eb5e86d07fe22b53a59184089aac569875117/src/pytorch_lightning/callbacks/early_stopping.py#L204-L205

However, reduce_boolean_decision currently returns True only when the decisions from all processes are True:
https://github.com/Lightning-AI/lightning/blob/be1eb5e86d07fe22b53a59184089aac569875117/src/lightning_lite/strategies/parallel.py#L88-L92

Although this issue can be avoided by logging the monitored metric with sync_dist=True, that configuration is not mandatory, so reduce_boolean_decision should be adapted to behave as the EarlyStopping callback expects.
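To illustrate why sync_dist=True sidesteps the problem, here is a minimal pure-Python model (not Lightning code). It assumes the synchronized metric is the mean across ranks; the function names, metric values, and threshold are hypothetical and chosen only for illustration.

```python
from statistics import mean


def local_decisions(per_rank_metric, threshold):
    """Each rank decides from its own metric value (no synchronization)."""
    return [value > threshold for value in per_rank_metric]


def synced_decisions(per_rank_metric, threshold):
    """Each rank decides from the cross-rank mean, so all ranks agree."""
    synced = mean(per_rank_metric)
    return [synced > threshold for _ in per_rank_metric]


# Hypothetical monitored metric on rank 0 and rank 1.
metrics = [0.9, 0.4]

print(local_decisions(metrics, threshold=0.5))   # ranks disagree
print(synced_decisions(metrics, threshold=0.5))  # ranks agree
```

Without synchronization the per-rank decisions can differ, which is exactly the case in which the reduction semantics of reduce_boolean_decision matter.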

I will be submitting a PR shortly that keeps the current reduce_boolean_decision behavior as the default but extends the function to support the `any`-like semantics that the EarlyStopping callback expects. The PR will also include a new test to validate that this behavior resolves the issue described above.
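The difference between the two semantics can be sketched in plain Python. This is only an illustration, not the code in the forthcoming PR: the function name, the `require_all` parameter, and the use of a list in place of a distributed SUM reduction are all assumptions.

```python
def reduce_boolean_decision_sketch(decisions, require_all=True):
    """Combine per-process boolean decisions into one shared decision.

    `decisions` stands in for the values held by each process; summing the
    list stands in for a SUM all-reduce across processes.
    """
    world_size = len(decisions)
    total = sum(int(d) for d in decisions)
    if require_all:
        # "all" semantics: True only if every process voted True.
        return total == world_size
    # "any" semantics: True if at least one process voted True.
    return total > 0


# One of two processes has met its early stopping condition.
votes = [True, False]
print(reduce_boolean_decision_sketch(votes))                     # all-semantics
print(reduce_boolean_decision_sketch(votes, require_all=False))  # any-semantics
```

With the default `require_all=True`, the single True vote is discarded and training continues; with `require_all=False`, one vote is enough to stop every process, which matches the behavior EarlyStopping describes.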

### How to reproduce the bug

The easiest way to reproduce the bug is to check out the forthcoming PR and run its new test against the original `EarlyStopping` callback usage of `reduce_boolean_decision`:


pytest -v tests/tests_pytorch/callbacks/test_early_stopping.py::test_multiple_early_stopping_callbacks[callbacks2-2-False-ddp_spawn-2-2]


### Error messages and logs

_No response_

### Environment

  • CUDA:
    • GPU:
      • NVIDIA GeForce RTX 2070 SUPER
      • NVIDIA GeForce RTX 2070
    • available: True
    • version: 11.7
  • Lightning:
    • lightning-utilities: 0.3.0
    • pt-lightning-sphinx-theme: 0.0.31
    • pytorch-lightning: 1.8.0rc0
    • torch: 1.13.0
    • torchmetrics: 0.10.0
    • torchtext: 0.14.0
    • torchvision: 0.14.0
  • Packages:
    • absl-py: 1.3.0
    • aiohttp: 3.8.3
    • aiosignal: 1.2.0
    • alabaster: 0.7.12
    • alembic: 1.8.1
    • antlr4-python3-runtime: 4.9.3
    • anyio: 3.6.2
    • argon2-cffi: 21.3.0
    • argon2-cffi-bindings: 21.2.0
    • asttokens: 2.0.8
    • async-generator: 1.10
    • async-timeout: 4.0.2
    • attrs: 22.1.0
    • babel: 2.10.3
    • backcall: 0.2.0
    • beautifulsoup4: 4.11.1
    • black: 22.10.0
    • bleach: 5.0.1
    • boto3: 1.24.95
    • botocore: 1.27.95
    • bracex: 2.3.post1
    • bravado: 11.0.3
    • bravado-core: 5.17.1
    • brotlipy: 0.7.0
    • cachetools: 5.2.0
    • certifi: 2022.9.24
    • cffi: 1.15.1
    • cfgv: 3.3.1
    • charset-normalizer: 2.0.4
    • click: 8.1.3
    • cloudpickle: 2.2.0
    • codecov: 2.1.12
    • coloredlogs: 15.0.1
    • comet-ml: 3.31.15
    • commonmark: 0.9.1
    • configobj: 5.0.6
    • contourpy: 1.0.5
    • coverage: 6.5.0
    • cryptography: 37.0.1
    • curio: 1.5
    • cycler: 0.11.0
    • databricks-cli: 0.17.3
    • debugpy: 1.6.3
    • decorator: 5.1.1
    • deepspeed: 0.7.3
    • defusedxml: 0.7.1
    • distlib: 0.3.6
    • docker: 6.0.0
    • docker-pycreds: 0.4.0
    • docstring-parser: 0.15
    • docutils: 0.17.1
    • dulwich: 0.20.46
    • entrypoints: 0.4
    • everett: 3.0.0
    • exceptiongroup: 1.0.0rc9
    • executing: 1.1.1
    • fairscale: 0.4.12
    • fastapi: 0.85.1
    • fastjsonschema: 2.16.2
    • filelock: 3.8.0
    • fire: 0.4.0
    • flask: 2.2.2
    • flatbuffers: 22.9.24
    • fonttools: 4.37.4
    • frozenlist: 1.3.1
    • fsspec: 2022.10.0
    • future: 0.18.2
    • gitdb: 4.0.9
    • gitpython: 3.1.29
    • google-auth: 2.13.0
    • google-auth-oauthlib: 0.4.6
    • greenlet: 1.1.3.post0
    • grpcio: 1.50.0
    • gunicorn: 20.1.0
    • gym: 0.26.2
    • gym-notices: 0.0.8
    • h11: 0.14.0
    • hjson: 3.1.0
    • humanfriendly: 10.0
    • hydra-core: 1.2.0
    • identify: 2.5.6
    • idna: 3.4
    • imagesize: 1.4.1
    • importlib-metadata: 5.0.0
    • iniconfig: 1.1.1
    • ipykernel: 6.16.1
    • ipyparallel: 8.4.1
    • ipython: 8.5.0
    • ipython-genutils: 0.2.0
    • ipywidgets: 8.0.2
    • itsdangerous: 2.1.2
    • jedi: 0.18.1
    • jinja2: 3.0.3
    • jmespath: 1.0.1
    • joblib: 1.2.0
    • jsonargparse: 4.15.2
    • jsonpointer: 2.3
    • jsonref: 0.3.0
    • jsonschema: 3.2.0
    • jupyter-client: 7.4.3
    • jupyter-core: 4.11.2
    • jupyter-server: 1.21.0
    • jupyterlab-pygments: 0.2.2
    • jupyterlab-widgets: 3.0.3
    • kiwisolver: 1.4.4
    • lightning-utilities: 0.3.0
    • mako: 1.2.3
    • markdown: 3.4.1
    • markdown-it-py: 2.1.0
    • markupsafe: 2.1.1
    • matplotlib: 3.6.1
    • matplotlib-inline: 0.1.6
    • mdit-py-plugins: 0.3.1
    • mdurl: 0.1.2
    • mistune: 2.0.4
    • mkl-fft: 1.3.1
    • mkl-random: 1.2.2
    • mkl-service: 2.4.0
    • mlflow: 1.30.0
    • monotonic: 1.6
    • mpmath: 1.2.1
    • msgpack: 1.0.4
    • multidict: 6.0.2
    • mypy: 0.971
    • mypy-extensions: 0.4.3
    • myst-parser: 0.16.1
    • nbclassic: 0.4.5
    • nbclient: 0.7.0
    • nbconvert: 7.2.2
    • nbformat: 5.7.0
    • nbsphinx: 0.8.9
    • neptune-client: 0.16.9
    • nest-asyncio: 1.5.6
    • ninja: 1.10.2.4
    • nodeenv: 1.7.0
    • notebook: 6.5.1
    • notebook-shim: 0.2.0
    • numpy: 1.23.3
    • oauthlib: 3.2.2
    • omegaconf: 2.2.3
    • onnxruntime: 1.12.1
    • outcome: 1.2.0
    • packaging: 21.3
    • pandas: 1.5.1
    • pandoc: 2.2
    • pandocfilters: 1.5.0
    • parso: 0.8.3
    • pathspec: 0.10.1
    • pathtools: 0.1.2
    • pexpect: 4.8.0
    • pickleshare: 0.7.5
    • pillow: 9.2.0
    • pip: 22.2.2
    • platformdirs: 2.5.2
    • pluggy: 1.0.0
    • plumbum: 1.8.0
    • ply: 3.11
    • pre-commit: 2.20.0
    • prometheus-client: 0.15.0
    • prometheus-flask-exporter: 0.20.3
    • promise: 2.3
    • prompt-toolkit: 3.0.31
    • protobuf: 3.19.6
    • psutil: 5.9.3
    • pt-lightning-sphinx-theme: 0.0.31
    • ptyprocess: 0.7.0
    • pure-eval: 0.2.2
    • py: 1.11.0
    • py-cpuinfo: 8.0.0
    • pyasn1: 0.4.8
    • pyasn1-modules: 0.2.8
    • pycparser: 2.21
    • pydantic: 1.10.2
    • pygame: 2.1.0
    • pygments: 2.13.0
    • pyjwt: 2.6.0
    • pyopenssl: 22.0.0
    • pyparsing: 3.0.9
    • pyrsistent: 0.18.1
    • pysocks: 1.7.1
    • pytest: 7.0.1
    • pytest-asyncio: 0.20.1
    • pytest-cov: 4.0.0
    • pytest-forked: 1.4.0
    • pytest-rerunfailures: 10.2
    • python-dateutil: 2.8.2
    • pytorch-lightning: 1.8.0rc0
    • pytz: 2022.5
    • pyyaml: 6.0
    • pyzmq: 24.0.1
    • qtconsole: 5.3.2
    • qtpy: 2.2.1
    • querystring-parser: 1.2.4
    • requests: 2.28.1
    • requests-oauthlib: 1.3.1
    • requests-toolbelt: 0.10.0
    • rfc3987: 1.3.8
    • rich: 12.6.0
    • rsa: 4.9
    • s3transfer: 0.6.0
    • scikit-learn: 1.1.2
    • scipy: 1.9.3
    • semantic-version: 2.10.0
    • send2trash: 1.8.0
    • sentry-sdk: 1.10.1
    • setproctitle: 1.3.2
    • setuptools: 63.4.1
    • shortuuid: 1.0.9
    • simplejson: 3.17.6
    • six: 1.16.0
    • smmap: 5.0.0
    • sniffio: 1.3.0
    • snowballstemmer: 2.2.0
    • sortedcontainers: 2.4.0
    • soupsieve: 2.3.2.post1
    • sphinx: 4.5.0
    • sphinx-autodoc-typehints: 1.19.1
    • sphinx-copybutton: 0.5.0
    • sphinx-multiproject: 1.0.0rc1
    • sphinx-paramlinks: 0.5.4
    • sphinx-togglebutton: 0.3.2
    • sphinxcontrib-applehelp: 1.0.2
    • sphinxcontrib-devhelp: 1.0.2
    • sphinxcontrib-fulltoc: 1.2.0
    • sphinxcontrib-htmlhelp: 2.0.0
    • sphinxcontrib-jsmath: 1.0.1
    • sphinxcontrib-mockautodoc: 0.0.1.dev20130518
    • sphinxcontrib-qthelp: 1.0.3
    • sphinxcontrib-serializinghtml: 1.1.5
    • sqlalchemy: 1.4.42
    • sqlparse: 0.4.3
    • stack-data: 0.5.1
    • starlette: 0.20.4
    • strict-rfc3339: 0.7
    • swagger-spec-validator: 3.0.2
    • sympy: 1.11.1
    • tabulate: 0.9.0
    • tensorboard: 2.10.1
    • tensorboard-data-server: 0.6.1
    • tensorboard-plugin-wit: 1.8.1
    • termcolor: 2.0.1
    • terminado: 0.16.0
    • testpath: 0.6.0
    • threadpoolctl: 3.1.0
    • tinycss2: 1.2.1
    • toml: 0.10.2
    • tomli: 2.0.1
    • torch: 1.13.0
    • torchmetrics: 0.10.0
    • torchtext: 0.14.0
    • torchvision: 0.14.0
    • tornado: 6.2
    • tqdm: 4.64.1
    • traitlets: 5.5.0
    • trio: 0.22.0
    • types-croniter: 1.3.2
    • types-cryptography: 3.3.23.1
    • types-protobuf: 3.20.4.1
    • types-pyopenssl: 22.1.0.1
    • types-python-dateutil: 2.8.19.2
    • types-pyyaml: 6.0.12
    • types-redis: 4.3.21.2
    • types-requests: 2.28.11.2
    • types-setuptools: 65.5.0.1
    • types-six: 1.16.21
    • types-tabulate: 0.9.0.0
    • types-ujson: 5.5.0
    • types-urllib3: 1.26.25.1
    • typing-extensions: 4.3.0
    • urllib3: 1.26.12
    • uvicorn: 0.19.0
    • virtualenv: 20.16.5
    • wandb: 0.13.4
    • wcmatch: 8.4.1
    • wcwidth: 0.2.5
    • webcolors: 1.12
    • webencodings: 0.5.1
    • websocket-client: 1.3.3
    • werkzeug: 2.2.2
    • wheel: 0.37.1
    • widgetsnbextension: 4.0.3
    • wrapt: 1.14.1
    • wurlitzer: 3.0.2
    • yarl: 1.8.1
    • zipp: 3.9.0
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.10.6
    • version: #144-Ubuntu SMP Tue Sep 20 11:00:04 UTC 2022

### More info

_No response_
