@ifsheldon
Contributor

What does this PR do?

Fixes #5966
Raise an exception rather than a warning to force users to use compatible accelerators.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Check that target branch and milestone match!

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks

pep8speaks commented Feb 14, 2021

Hello @ifsheldon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-23 02:30:28 UTC

@codecov

codecov bot commented Feb 14, 2021

Codecov Report

Merging #5970 (5fd9f47) into master (ae6ce17) will decrease coverage by 0%.
The diff coverage is 94%.

@@          Coverage Diff           @@
##           master   #5970   +/-   ##
======================================
- Coverage      91%     91%   -0%     
======================================
  Files         160     160           
  Lines       11405   11435   +30     
======================================
  Hits        10417   10417           
- Misses        988    1018   +30     

@tchaton tchaton added this to the 1.2.x milestone Feb 15, 2021
@tchaton tchaton added design Includes a design discussion feature Is an improvement or enhancement and removed design Includes a design discussion labels Feb 15, 2021
Contributor

@tchaton tchaton left a comment

Overall, looks good. Some nits to resolve. Thanks for your work !

Contributor

@tchaton tchaton left a comment

Overall looks good ! Small nits left !

Contributor

@awaelchli awaelchli left a comment

Could you comment on the robustness of this approach?
Can there be instances where IPython is loaded but we're not in a notebook? We should make sure we don't raise any false positive error messages

@ifsheldon
Contributor Author

Could you comment on the robustness of this approach?
Can there be instances where IPython is loaded but we're not in a notebook? We should make sure we don't raise any false positive error messages

I would say it is reliable, based on several tests on Linux, Windows and macOS with different distributions of IPython (x64 and aarch). The code was also provided by the maintainers of IPython, which I think adds to its reliability.
I also dug into the code a bit. It calls get_ipython(), which returns an InteractiveShell instance. That does seem to be a reasonable way to check for an IPython environment, although I cannot find where the kernel attribute is attached to an InteractiveShell instance. Maybe @minrk can be so kind as to comment on that.

@ifsheldon
Contributor Author

ifsheldon commented Feb 16, 2021

Can you please detail which kinds of IPython kernels and shells are not compatible? Or can you be more specific on what went wrong with Jupyter-lab and Jupyter notebook that causes such incompatibility?

I just found out that Jupyter-lab and Jupyter-notebook use zmqshell, which means get_ipython() returns an instance of ipykernel.zmqshell.ZMQInteractiveShell, which has a kernel attribute. However, if I run get_ipython() in a terminal inside ipython, I get an instance of IPython.terminal.interactiveshell.TerminalInteractiveShell, which has no kernel attribute, and ddp works well with that interactive shell when testing the boring model with 2 GPUs.

@ifsheldon
Contributor Author

ifsheldon commented Feb 16, 2021

get_ipython() explicitly checks for an IPython environment: if Python code is run in IPython, it returns a shell instance. The subsequent check, getattr(ip, 'kernel', None) is not None, only detects whether a kernel is present and implicitly infers whether the code is run with Jupyter, since Jupyter uses the zmq interactive shell, which has a kernel attribute.

So, if we just want to check for IPython, get_ipython() is not None should suffice. But if the ddp incompatibility is Jupyter-only and we want to detect a Jupyter frontend, the only way (as IPython aims to separate frontends and backends) is to infer it from the shell type.
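To make the two checks above concrete, here is a minimal sketch, not the code from this PR. The function name detect_interactive_environment and its get_shell parameter are hypothetical and exist only so the logic can be exercised without a running kernel; the only real API assumed is IPython's get_ipython().

```python
def detect_interactive_environment(get_shell=None):
    """Classify the current environment as 'script', 'ipython', or 'jupyter'.

    get_shell is a hypothetical injection point for testing; by default the
    real IPython.get_ipython is used when IPython is importable.
    """
    if get_shell is None:
        try:
            from IPython import get_ipython as get_shell
        except ImportError:
            # IPython is not installed, so we cannot be inside it.
            return "script"

    shell = get_shell()
    if shell is None:
        # IPython is importable, but the code is not running inside it.
        return "script"

    # Inference described above: the zmq shell used by Jupyter carries a
    # `kernel` attribute, while the terminal shell does not.
    if getattr(shell, "kernel", None) is not None:
        return "jupyter"
    return "ipython"
```

Checking only `detect_interactive_environment() != "script"` corresponds to the plain get_ipython() is not None test; distinguishing "jupyter" from "ipython" relies on the shell-type inference and is therefore less robust.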

@awaelchli
Contributor

awaelchli commented Feb 17, 2021

Can you please detail which kinds of IPython kernels and shells are not compatible? Or can you be more specific on what went wrong with Jupyter-lab and Jupyter notebook that causes such incompatibility?

DDP incompatibility is not strictly just with "notebooks". Our observation is simply that it doesn't work with them.

A script that contains Trainer(accelerator="ddp", gpus=4) must be launched with a python command, i.e.
python train.py --args ...
Under the hood, the process (call it main process) will then launch other processes (in the example here 3) using subprocess.Popen, so essentially:

LOCAL_RANK=1 {... other env vars} python train.py
LOCAL_RANK=2 {... other env vars} python train.py
LOCAL_RANK=3 {... other env vars} python train.py

This can't work with notebooks, because notebooks can't be launched this way. Same with interactive shell I guess...
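The launch pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not Lightning's actual launcher: the helper names worker_envs and launch_workers are hypothetical, and only the LOCAL_RANK variable comes from the comment above.

```python
import os
import subprocess
import sys


def worker_envs(num_processes, base_env):
    """Build one environment per child process (local ranks 1..num_processes-1).

    Rank 0 is the main process itself, so it needs no extra environment.
    """
    envs = []
    for local_rank in range(1, num_processes):
        env = dict(base_env)
        env["LOCAL_RANK"] = str(local_rank)
        envs.append(env)
    return envs


def launch_workers(command, num_processes):
    """Start the child processes by re-running `command` once per extra rank."""
    return [
        subprocess.Popen(command, env=env)
        for env in worker_envs(num_processes, os.environ)
    ]


if __name__ == "__main__":
    # In a real script the command would be [sys.executable] + sys.argv,
    # i.e. "python train.py --args ...". Here a tiny probe stands in for it.
    probe = [sys.executable, "-c", "import os; print(os.environ['LOCAL_RANK'])"]
    for child in launch_workers(probe, num_processes=4):
        child.wait()
```

The sketch also shows why a notebook is a problem: there is no script path for the children to re-run, so a command such as "python train.py" cannot be reconstructed from inside a kernel.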

@ifsheldon
Contributor Author

ifsheldon commented Feb 17, 2021

I see. I checked again and saw some weird behavior when running the boring model with ddp in the IPython interactive shell. So checking only for IPython should suffice, and I updated my code accordingly in the new commit. This should work as long as get_ipython() works.
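For readers following along, the "raise an exception rather than a warning" behaviour this PR aims for could look roughly like the sketch below. It is not the merged code: the function name validate_accelerator, the INTERACTIVE_INCOMPATIBLE set, and the use of RuntimeError are all assumptions made for illustration.

```python
# Assumption: only "ddp" is listed here because it is the accelerator
# discussed in this thread; the real list may differ.
INTERACTIVE_INCOMPATIBLE = {"ddp"}


def validate_accelerator(accelerator, interactive):
    """Return `accelerator` unchanged, or raise if it cannot run interactively.

    `interactive` is expected to be the result of an environment check such
    as `get_ipython() is not None`.
    """
    if interactive and accelerator in INTERACTIVE_INCOMPATIBLE:
        raise RuntimeError(
            f"accelerator={accelerator!r} is not compatible with an interactive "
            "environment; run the code as a script with the python command instead."
        )
    return accelerator
```

Raising instead of warning means the failure appears immediately at Trainer construction rather than as a stuck trainer.fit() later on.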

Contributor

@awaelchli awaelchli left a comment

@ifsheldon I fixed the pep8 problems (line too long) and added a test for coverage, hope this is fine with you.
otherwise LGTM

@mergify mergify bot removed the has conflicts label Feb 22, 2021
@carmocca carmocca changed the title added ipython environment check in accelerator connector Ensure accelerator is valid if running interactively Feb 23, 2021
@carmocca carmocca added the ready PRs ready to be merged label Feb 23, 2021
@carmocca carmocca merged commit ebabe56 into Lightning-AI:master Feb 23, 2021
@awaelchli
Contributor

Thanks for sending in the PR @ifsheldon and also @carmocca for the help polishing it

@ifsheldon ifsheldon deleted the ipython_env_check branch February 23, 2021 19:34
ananthsub pushed a commit to ananthsub/pytorch-lightning that referenced this pull request Feb 24, 2021
Co-authored-by: chaton <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Carlos Mocholi <[email protected]>
@carmocca carmocca mentioned this pull request Mar 25, 2021
3 tasks

Labels

feature Is an improvement or enhancement
ready PRs ready to be merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ipython kernel detection and give warning when accelerator = "ddp"
trainer.fit() stuck with accelerator set to "ddp"

7 participants