
Conversation

@ananthsub (Contributor) commented Aug 25, 2021

What does this PR do?

https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html
Sets torch.use_deterministic_algorithms based on the Trainer's deterministic flag (a minimal sketch of the mapping is shown below).

Fixes #9107
Fixes #9544
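
A minimal sketch of the mapping described above (illustration only; the helper name is hypothetical and the real change lives inside the Trainer setup):

```python
import torch


def _set_torch_determinism(deterministic: bool) -> None:
    # With deterministic=True, PyTorch raises a RuntimeError at runtime if an
    # operation without a deterministic implementation is executed.
    torch.use_deterministic_algorithms(deterministic)
```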

Does your PR introduce any breaking changes? If yes, please list them.

Yes. With deterministic=True, a RuntimeError will now be raised if an operation without a deterministic implementation is executed. Previously, determinism was best-effort and no error was raised.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@ananthsub (Contributor, Author)

The parity test failures are related to the new flag; will take a deeper look.

@codecov (bot) commented Aug 26, 2021

Codecov Report

Merging #9121 (9ab9bbd) into master (8c9cb0c) will decrease coverage by 4%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #9121    +/-   ##
=======================================
- Coverage      93%     89%    -4%     
=======================================
  Files         177     177            
  Lines       15456   15460     +4     
=======================================
- Hits        14317   13698   -619     
- Misses       1139    1762   +623     

@tchaton (Contributor) left a comment

LGTM !

@mergify bot added the "ready" label (PRs ready to be merged) on Aug 30, 2021
@ananthsub closed this on Sep 10, 2021
@ananthsub force-pushed the feat/torch-deterministic-algorithm branch from 454f826 to b294c57 on September 10, 2021 22:07
@ananthsub reopened this on Sep 10, 2021
@justusschock (Member)

Actually, I am not sure we can switch it that easily. Before, deterministic was more like "make it as deterministic as possible but still run it". Changing that to raising errors is quite a breaking change, even though it is closer to what one would expect.

Personally, I have a lot of code using that flag and relying on it not to fail.

@ananthsub (Contributor, Author) commented Sep 11, 2021

@justusschock I agree, this would be a breaking change. Even some of our tests had to be updated to avoid the newly raised runtime error.

I am not sure how to square this with the new expectations PyTorch sets around these deterministic checks. The new API is more comprehensive and offers stricter guarantees around reproducibility.

Options:

  • go with the current approach in this PR, where deterministic follows the strictest interpretation currently offered. This is a breaking change: runtime exceptions are now raised if deterministic implementations aren't available. To opt out, users leave the flag unset on the Trainer and instead configure the cudnn flag independently (as seen in the parity test change, and sketched after this list). The runtime error could be seen as a good thing if we're trying to provide more visibility & guarantees around determinism, but dealing with newly raised exceptions is obviously not great.
  • or ask users to call torch.use_deterministic_algorithms themselves and deal with the runtime errors, while the Trainer makes only safe changes that cannot raise them. This is cleaner from the exception-handling point of view, but it's misleading because the guarantees the Lightning Trainer can make around determinism are inherently weaker, and the framework would not be leveraging PyTorch's capabilities to the fullest.
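
A sketch of the opt-out path from the first option (assumed usage, mirroring the idea behind the parity test change rather than quoting it):

```python
import torch
from pytorch_lightning import Trainer

# Best-effort determinism via cuDNN only: ops without deterministic
# implementations still run, so no RuntimeError is raised.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

trainer = Trainer()  # the deterministic flag is left at its default (False)
```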

Are there other paths you see?

@carmocca (Contributor) commented Sep 11, 2021

Are there other paths you see?

deterministic=False: Not enabled
deterministic=True: Same as today. Prints info message about deterministic="strict" for discoverability
deterministic="strict": Uses torch.use_deterministic_algorithms

IMO torch should've provided a function or flag that doesn't raise a RuntimeError so that cudnn.benchmark can be easily replaced, since the new function also "will make other PyTorch operations [than CUDA] behave deterministically".

For context: pytorch/pytorch#15359

@carmocca (Contributor) commented Sep 11, 2021

@kurtamohler (author of use_deterministic_algorithms) Looks like the initial design included a mechanism to default to a warning, but it was eventually removed: pytorch/pytorch#38683 (comment). Where was the final decision made? Do you have any suggestions for how this PR should proceed?

Thanks :)

@kurtamohler

I'm not sure at the moment where the discussion was; I'll do some searching.

To me, it seems like we would have to add a warn-only option to use_deterministic_algorithms. Would you mind opening an issue in PyTorch and tagging me in it? Otherwise, I can open an issue next time I'm at my computer.
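
For illustration, such a warn-only mode might be used like this (the warn_only keyword is an assumption here, not an existing argument at the time of this discussion):

```python
import torch

# Hypothetical: warn instead of raising when an operation has no deterministic
# implementation, so existing code keeps running.
torch.use_deterministic_algorithms(True, warn_only=True)
```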

@ananthsub added the "breaking change" label (Includes a breaking change) on Sep 17, 2021
@awaelchli (Contributor)

+1 for the current approach. IMO, if our Trainer has a "deterministic" argument, it should do what it says, and the stricter version introduced here makes sense to me.

@tchaton (Contributor) left a comment

LGTM !

@ananthsub force-pushed the feat/torch-deterministic-algorithm branch from 346264a to e04c956 on September 25, 2021 04:32
@mergify bot removed the "has conflicts" label on Sep 25, 2021
@ananthsub force-pushed the feat/torch-deterministic-algorithm branch from 704d2da to 3b2ad55 on September 29, 2021 22:31
@mergify bot removed the "has conflicts" label on Sep 29, 2021
@ananthsub enabled auto-merge (squash) on September 29, 2021 23:23
@ananthsub merged commit 0d3325e into Lightning-AI:master on Sep 30, 2021

Labels

breaking change (Includes a breaking change), feature (Is an improvement or enhancement), ready (PRs ready to be merged)
