Skip to content

Conversation

@williamFalcon
Copy link
Contributor

@williamFalcon williamFalcon commented Jun 18, 2020

Implements #2140 in a way that works.

For local ddp, we now use the process id as a numpy sub-seed to generate a random port. This way since the user sets the full seed we can use the parent process id as a seed before spawning off children.

Also solves an issue where on init when ranks are know, we shouldn't see duplicate messages

@mergify mergify bot requested a review from a team June 18, 2020 03:42
@williamFalcon williamFalcon mentioned this pull request Jun 18, 2020
5 tasks
@williamFalcon
Copy link
Contributor Author

@ZhaofengWu mind putting your Full name + email here for the commit?

@ZhaofengWu
Copy link
Contributor

Zhaofeng Wu; [email protected] :)

@ZhaofengWu
Copy link
Contributor

@williamFalcon williamFalcon changed the title Pid port Pid port + duplicate rank_zero logging Jun 18, 2020
@codecov
Copy link

codecov bot commented Jun 18, 2020

Codecov Report

Merging #2231 into master will decrease coverage by 0%.
The diff coverage is 22%.

@@          Coverage Diff           @@
##           master   #2231   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files          70      70           
  Lines        5415    5421    +6     
======================================
+ Hits         4759    4761    +2     
- Misses        656     660    +4     

@williamFalcon williamFalcon merged commit 476911d into master Jun 18, 2020
@Borda Borda added the bug Something isn't working label Jun 18, 2020
@Borda Borda added this to the 0.8.0 milestone Jun 18, 2020
@Borda Borda deleted the pid_port branch June 18, 2020 07:53
@Borda
Copy link
Collaborator

Borda commented Jun 18, 2020

@ZhaofengWu mind putting your Full name + email here for the commit?

unfortunately, this way of co-authoring is pure git thing and dos do not appear in any GitHub stats, see my comment here #1920 (comment)

if 'LOCAL_RANK' in os.environ:
rank_zero_only.rank = os.environ['LOCAL_RANK']
if 'SLURM_JOB_ID' in os.environ:
rank_zero_only.rank = os.environ['SLURM_JOB_ID']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line hear breaks logging in my setup. rank_zero_only.rank is set to a non-zero value (the slurm job id) and thus non of the logging functions are ever executed. Did you mean SLURM_PROCID which is the MPI rank?

@ExpectationMax
Copy link
Contributor

ExpectationMax commented Jun 20, 2020

I think this pull request breaks logging in some setups (when running on slurm clusters). See https://github.com/PyTorchLightning/pytorch-lightning/pull/2231/files#r443131338

(sorry for the double post, I did not realize that the review also shows up in this thread)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants