
Conversation

@sneiman (Contributor) commented Mar 18, 2020

Fix #1161 - when using ddp/ddp2, the validation and training loops run the full respective dataset on each GPU. This costs time and changes the batch counts for any statistics being collected.

The fix simply ensures that for ddp and ddp2, `auto_add_sampler()` creates a `DistributedSampler` for each dataset (see the sketch below).
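For anyone skimming, here is a minimal sketch of the idea, not the actual Lightning implementation; the helper name and the explicit `num_replicas`/`rank` arguments are illustrative (under ddp they come from the process group), and the real `auto_add_sampler()` carries over more of the original loader's settings:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def add_distributed_sampler(loader: DataLoader, num_replicas: int, rank: int) -> DataLoader:
    # Each of the num_replicas processes gets a disjoint ~1/num_replicas
    # slice of the dataset instead of iterating the full thing.
    sampler = DistributedSampler(loader.dataset, num_replicas=num_replicas, rank=rank)
    # Rebuild the loader around the distributed sampler, carrying over the
    # user's settings. Note that sampler and shuffle are mutually exclusive:
    # shuffling is handled by the sampler (call sampler.set_epoch() each
    # epoch in real training so the shuffle order changes).
    return DataLoader(
        loader.dataset,
        batch_size=loader.batch_size,
        sampler=sampler,
        num_workers=loader.num_workers,
        pin_memory=loader.pin_memory,
        drop_last=loader.drop_last,
    )
```

With this in place, a validation set of M samples on N GPUs is split into shards of roughly M/N per process, rather than every process running all M samples.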

This passes all the tests on my machine except the SLURM- and apex-related ones, as I have neither. I don't think this needs any doc changes. I can look into writing a test for this if needed; let me know.

@Borda changed the title from "Issue 1161" to "validation and training loops run the partial dataset" Mar 18, 2020
@Borda added the `docs` (Documentation related) label Mar 18, 2020
@Borda requested review from ethanwharris and neggert Mar 18, 2020 23:35
@williamFalcon (Contributor) commented

@srush mind taking a look? This came from our chats with the HF code.

@srush (Contributor) commented Mar 30, 2020

This seems good to me. We have some val sets that are quite large.

@Borda requested review from a team, jeffling and jeremyjordan Mar 30, 2020 16:11
@Borda (Collaborator) left a comment


LGTM 🚀

@Borda added the `ready` (PRs ready to be merged) label Mar 30, 2020
@williamFalcon merged commit 6dfe995 into Lightning-AI:master Mar 30, 2020
alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020

* auto_add_sampler() fix

* auto_add_sampler() fix

Co-authored-by: seth <[email protected]>

Labels

docs (Documentation related), ready (PRs ready to be merged)


Development

Successfully merging this pull request may close these issues.

multi-gpu ddp calls validation and testing loops too many times (#1161)

5 participants