
Conversation

@awaelchli
Contributor

@awaelchli awaelchli commented May 8, 2021

What does this PR do?

Fixes #7396

We can skip a test for PyTorch 1.9 since it now implements the seeding for NumPy in DataLoader workers.
Here is where it got implemented:
pytorch/pytorch@3b977a0
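
For context, the PyTorch 1.9 change is roughly equivalent to every DataLoader worker running a worker_init_fn like the following (a minimal sketch, not PyTorch's actual code; `seed_numpy_in_worker` is just an illustrative name):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_numpy_in_worker(worker_id: int) -> None:
    # torch.initial_seed() is already unique per worker process;
    # NumPy's legacy global RNG only accepts seeds below 2**32.
    np.random.seed(torch.initial_seed() % 2**32)


dataset = TensorDataset(torch.arange(100, dtype=torch.float32))
loader = DataLoader(dataset, num_workers=4, worker_init_fn=seed_numpy_in_worker)
```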

Thanks @ananthsub for pointing me to the commit.

cc @Borda

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli awaelchli added the ci Continuous Integration label May 8, 2021
@awaelchli awaelchli added this to the v1.4 milestone May 8, 2021
@codecov

codecov bot commented May 8, 2021

Codecov Report

Merging #7441 (9206798) into master (45143fd) will decrease coverage by 0%.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #7441   +/-   ##
======================================
- Coverage      92%     92%   -0%     
======================================
  Files         199     199           
  Lines       12987   12987           
======================================
- Hits        11925   11909   -16     
- Misses       1062    1078   +16     

Contributor

@ananthsub ananthsub left a comment


Nice that your initial commit was a reference for the PyTorch fix!

@Borda Borda enabled auto-merge (squash) May 8, 2021 05:18
pass


@RunIf(max_torch="1.8.9")
Collaborator


Shall we rather have an arg that specifies the max version as < instead of <=?

Collaborator


Or set the max as just 1.8, without minor numbers?

Contributor Author


< would be more practical I guess, but it would not be obvious from the name whether it means < or <=.
Maybe a different approach would be to have torch_gt, torch_ge, torch_lt, torch_le, but that would mean more arguments ...
not sure

Collaborator


I may say that more practical would be setting the level (sketched below):

  • max 1.8.1 will keep 1.8.0, 1.8.1 and ignore 1.8.2 and above
  • max 1.8 will keep 1.8.0, 1.8.1, 1.8.x and ignore 1.9.x and above
  • max 1 will keep 1.8.0, 1.x and ignore 2.x and above
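
A minimal sketch of that prefix-level comparison (the helper name `torch_version_at_most` is hypothetical, not the actual RunIf code):

```python
import torch


def torch_version_at_most(max_version: str) -> bool:
    # Compare only as many version components as the caller gave, so
    # max_version="1.8" accepts any 1.8.x and rejects 1.9.x and above.
    # (Pre-release suffixes like "1.9.0a0" would need extra handling.)
    limit = [int(p) for p in max_version.split(".")]
    installed = torch.__version__.split("+")[0]  # drop local tags like "+cu111"
    current = [int(p) for p in installed.split(".")[: len(limit)]]
    return current <= limit
```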

Member


+1 for the gt, ge, lt and le. Another option would be to have something like `torch_req='<1.9'`. But this would involve more parsing logic.
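
A sketch of that parsing variant using the `packaging` library; `torch_req` is the hypothetical argument name from the comment above, not an existing RunIf parameter:

```python
import torch
from packaging.specifiers import SpecifierSet
from packaging.version import Version


def torch_matches(torch_req: str) -> bool:
    # SpecifierSet understands "<", "<=", and ranges like ">=1.8,<1.9".
    return Version(torch.__version__.split("+")[0]) in SpecifierSet(torch_req)


# e.g. run the seeding test only below 1.9:
skip = not torch_matches("<1.9")
```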

Contributor


@awaelchli should the pl worker init function also be a no-op when used with PyTorch 1.9, given that PyTorch now sets the seed correctly?

Contributor Author


For numpy, yes, we don't need to worry about it in 1.9+.

But another part of pl_worker_init_fn is that it derives a seed that is not only unique for each worker but also unique across all distributed processes (we do that by incorporating the global_rank into the seed sequence). I don't think 1.9 does that.
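
For reference, a minimal sketch of what "unique per worker and per rank" can look like (illustrative only, not Lightning's exact pl_worker_init_fn; the rank lookup here is an assumption for the sketch):

```python
import numpy as np
import torch
import torch.distributed as dist


def worker_init_fn(worker_id: int) -> None:
    # Assumption: the global rank is looked up via torch.distributed
    # when a process group is initialized, else it defaults to 0.
    global_rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
    base_seed = torch.initial_seed() % 2**32
    # SeedSequence mixes the entropy tuple into an independent stream, so
    # (worker 0, rank 0) and (worker 0, rank 1) end up with different seeds.
    ss = np.random.SeedSequence([base_seed, worker_id, global_rank])
    np.random.seed(ss.generate_state(1)[0])
```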

@Borda Borda merged commit 1af42d7 into master May 8, 2021
@Borda Borda deleted the tests/seed-pt1.9 branch May 8, 2021 18:03
@carmocca carmocca modified the milestones: v1.4, v1.3.x May 10, 2021
@carmocca carmocca mentioned this pull request May 10, 2021
Borda pushed a commit that referenced this pull request May 11, 2021
(cherry picked from commit 1af42d7)

Labels

ci Continuous Integration


Development

Successfully merging this pull request may close these issues.

fix pl_worker_init_fn for pytorch 1.9

5 participants