val_dataloader is called twice in each worker #3377

@undertherain

Description

🐛 Bug

I'm using a LightningDataModule class to manage the data, with the Horovod backend, in case that matters.
I've noticed that each rank calls train_dataloader once, but val_dataloader is somehow called twice.

To Reproduce

Run Lightning with a LightningDataModule and the Horovod backend, and add a debug print showing when each dataloader hook is called, something like:

    # Both hooks live on my LightningDataModule subclass; hvd is horovod.torch,
    # and load_ds_from_dir is my helper that builds a DataLoader from a directory.
    def train_dataloader(self):
        print(f"\n##### worker {hvd.rank()} of {hvd.size()} creating train loader\n")
        return load_ds_from_dir(os.path.join(self.path, "train"), self.batch_size)

    def val_dataloader(self):
        print(f"\n##### worker {hvd.rank()} of {hvd.size()} creating val loader\n")
        return load_ds_from_dir(os.path.join(self.path, "validation"), self.batch_size)
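Independent of Lightning and Horovod, the call counts can be verified with a small counting decorator. This is a minimal standard-library sketch, not the actual setup from the report: `DummyDataModule`, the hook bodies, and the explicit double call are placeholders standing in for what the trainer does.

```python
import functools

def count_calls(fn):
    """Wrap a dataloader hook and record how many times it runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"{fn.__name__} call #{wrapper.calls}")
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

class DummyDataModule:
    # Placeholder hooks; the real ones return DataLoaders.
    @count_calls
    def train_dataloader(self):
        return ["train-batch"]

    @count_calls
    def val_dataloader(self):
        return ["val-batch"]

dm = DummyDataModule()
dm.train_dataloader()
dm.val_dataloader()
dm.val_dataloader()  # a second call, mimicking the behavior observed in the report
print(dm.train_dataloader.calls, dm.val_dataloader.calls)  # → 1 2
```

Dropping this decorator onto the real hooks (instead of the inline prints above) gives an exact per-rank call count rather than a stream of prints to eyeball.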

Expected behavior

I expect the val loader to be called only once per rank.

Environment

* CUDA:
	- GPU:
		- Tesla V100-SXM2-16GB
		- Tesla V100-SXM2-16GB
		- Tesla V100-SXM2-16GB
		- Tesla V100-SXM2-16GB
	- available:         True
	- version:           10.2
* Packages:
	- numpy:             1.19.1
	- pyTorch_debug:     False
	- pyTorch_version:   1.6.0
	- pytorch-lightning: 0.9.0
	- tensorboard:       2.2.0
	- tqdm:              4.46.1
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- ELF
	- processor:         x86_64
	- python:            3.8.2
	- version:           #1 SMP Fri Apr 20 16:44:24 UTC 2018


Labels

* docs: Documentation related
* help wanted: Open to be worked on
* won't fix: This will not be worked on
