
Conversation

@ryanking13 (Contributor) commented May 6, 2021

What does this PR do?

Fixes #7400.

pytorch_lightning/utilities/argparse_utils.py was renamed to pytorch_lightning/utilities/argparse.py in PL 1.2.0.

A checkpoint saved in PL < 1.2 tries to use pytorch_lightning.utilities.argparse_utils._gpus_arg_default(), which does not exist in PL >= 1.2.0.

# pytorch_lightning/utilities/argparse_utils.py
from pytorch_lightning.utilities.argparse import *

The backward-compatibility shim above does not work because a wildcard import only re-exports names that do not start with an underscore, unless they are listed in the source module's __all__.
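For illustration, one way to restore the name (a sketch, not necessarily the exact diff in this PR) is to re-import the private helper explicitly alongside the wildcard import:

# pytorch_lightning/utilities/argparse_utils.py
# `import *` skips names with a leading underscore (unless the source module
# defines __all__), so the private helper needs an explicit import.
from pytorch_lightning.utilities.argparse import *  # noqa: F401, F403
from pytorch_lightning.utilities.argparse import _gpus_arg_default  # noqa: F401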

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

codecov bot commented May 6, 2021

Codecov Report

Merging #7402 (dfa8800) into master (c3fc031) will decrease coverage by 4%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #7402    +/-   ##
=======================================
- Coverage      92%     87%    -4%     
=======================================
  Files         200     200            
  Lines       12982   12983     +1     
=======================================
- Hits        11908   11358   -550     
- Misses       1074    1625   +551     

@carmocca (Contributor) left a comment

I think a better solution is to set

__all__ = ['_gpus_arg_default']

in pytorch_lightning.utilities.argparse, since argparse_utils.py will eventually be removed.
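For reference, a minimal sketch of that alternative (assuming the rest of the module stays unchanged):

# pytorch_lightning/utilities/argparse.py
# Names listed in __all__ are exported by `from ... import *` even if they
# start with an underscore.
__all__ = ['_gpus_arg_default']

One caveat: __all__ replaces the default export list rather than extending it, so every public name that argparse_utils.py should keep re-exporting would have to be listed there as well.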

@ryanking13 (Author) commented May 6, 2021

> I think a better solution is to set
>
> __all__ = ['_gpus_arg_default']
>
> in pytorch_lightning.utilities.argparse, since argparse_utils.py will eventually be removed.

I thought of that solution, but _gpus_arg_default is already unused and remains only for backward compatibility:
https://github.com/PyTorchLightning/pytorch-lightning/blob/5a498d3b975460ca1658b658a0a74e18f48bea79/pytorch_lightning/utilities/argparse.py#L289-L293
Once argparse_utils.py is removed in v1.4, no other file will import the function.

@Borda Borda added this to the v1.3 milestone May 6, 2021
@Borda Borda added the 'bug' label May 6, 2021
@carmocca (Contributor) commented May 6, 2021

> is already unused and remains only for backward compatibility.

I know, but I would say it's nicer to have both things together.

> Once argparse_utils.py is removed in v1.4, no other file will import the function.

But if we remove this file in v1.4, will a checkpoint created before 1.2 be able to reload? As you said:

> A checkpoint saved in PL < 1.2 tries to use pytorch_lightning.utilities.argparse_utils._gpus_arg_default()

@awaelchli (Contributor) commented May 6, 2021
In future versions, before loading the checkpoint we need to monkey patch a dummy function onto the modules where it was used. The fact that this function got pickled into the checkpoints means we will forever have to track changes surrounding that. Horrific. :(
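For concreteness, a hedged sketch of what such a patch could look like (a hypothetical helper, not PL's actual implementation): inject a stand-in module under the legacy path before unpickling, so pickle can resolve the stale reference.

import sys
import types

# Recreate the legacy module path removed in later versions and attach a
# dummy, so that unpickling a PL < 1.2 checkpoint can resolve the pickled
# `pytorch_lightning.utilities.argparse_utils._gpus_arg_default` reference.
def _gpus_arg_default(x):
    # Stand-in body; the pickled argparse default should not be called on load.
    return x

_legacy = types.ModuleType("pytorch_lightning.utilities.argparse_utils")
_legacy._gpus_arg_default = _gpus_arg_default
sys.modules["pytorch_lightning.utilities.argparse_utils"] = _legacy

# checkpoint = torch.load("old.ckpt")  # hypothetical path; now unpickles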

@awaelchli awaelchli added the 'priority: 0' label May 6, 2021
@tchaton (Contributor) left a comment

LGTM!

@ryanking13 (Author) commented May 6, 2021

> But if we remove this file in v1.4, will a checkpoint created before 1.2 be able to reload?

That is the real problem: neither solution will help once this file is removed. As @awaelchli said, we would need a monkey-patching step, which doesn't seem easy.

@awaelchli awaelchli enabled auto-merge (squash) May 6, 2021 13:31
@awaelchli awaelchli merged commit d9bdc56 into Lightning-AI:master May 6, 2021
@carmocca carmocca mentioned this pull request May 12, 2021

Labels

bug · priority: 0


Development

Successfully merging this pull request may close these issues.

Loading a checkpoint that was saved in PL < 1.2 still breaks
