-
Notifications
You must be signed in to change notification settings - Fork 3.6k
[bugfix] Prevent a DDP failure using copy #9239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pytorch_lightning/trainer/connectors/logger_connector/result.py
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #9239 +/- ##
=======================================
+ Coverage 88% 92% +4%
=======================================
Files 176 176
Lines 14807 14810 +3
=======================================
+ Hits 13043 13663 +620
+ Misses 1764 1147 -617 |
| if not enable_graph: | ||
|
|
||
| def detach_fn(tensor: Tensor) -> Tensor: | ||
| return tensor.detach() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to add a clone() here but doesn't solve the issue. I'm just perplexed why the deepcopy is necessary. Do you have any intuition what is going on?
|
Hey @awaelchli, Mind sharing more details on what you are trying to do ? Best, |
This reverts commit ff7305f.
What does this PR do?
Fixes #8821
This error is critical as it blocks DDP. Observations:
epoch=Truebreaks with anyreduce_fxstep=True, sync_dist=Truebreaks withreduce_fx != "mean"The failure can be reproduced with this minimal repro: #8821 (comment)
We believe this to be a bug upstream on PyTorch but haven't been able to easily reproduce it without Lightning.
A test hasn't been added because
pytestseems to hang on teardown due to thenum_workers>0requirement. Not sure why.This would break on master.
Does your PR introduce any breaking changes? If yes, please list them.
None
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list: