Closed
Labels
docs (Documentation related), help wanted (Open to be worked on), won't fix (This will not be worked on)
Description
🐛 Bug
I'm trying to use a LightningDataModule class to manage the data.
I'm using the Horovod backend, if that matters.
I've noticed that each rank calls train_dataloader once, but somehow calls val_dataloader twice.
To Reproduce
Run Lightning with a LightningDataModule and Horovod, and add a debug print to see when val_dataloader is called,
something like:
def train_dataloader(self):
    print(f"\n#####worker {hvd.rank()} of {hvd.size()} creating train_loader\n")
    return load_ds_from_dir(os.path.join(self.path, "train"), self.batch_size)

def val_dataloader(self):
    print(f"\n#####worker {hvd.rank()} of {hvd.size()} creating val\n")
    return load_ds_from_dir(os.path.join(self.path, "validation"), self.batch_size)
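For a fuller picture, below is a minimal, self-contained sketch of the setup (a hedged reconstruction, not my exact code): the random tensors and DebugModel are placeholders for the real dataset and model, and the Trainer flags assume pytorch-lightning 0.9.0's distributed_backend="horovod". With it, the extra val_dataloader print should be visible on each rank.

# Minimal repro sketch (assumptions: pytorch-lightning 0.9.0, Horovod installed,
# launched via `horovodrun -np 4 python repro.py`). The random tensors and
# DebugModel stand in for the real data and model.
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
import horovod.torch as hvd


class DebugDataModule(pl.LightningDataModule):
    def train_dataloader(self):
        print(f"\n##### worker {hvd.rank()} of {hvd.size()} creating train_loader\n")
        return DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)

    def val_dataloader(self):
        print(f"\n##### worker {hvd.rank()} of {hvd.size()} creating val_loader\n")
        return DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)


class DebugModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        return {"loss": self(batch[0]).sum()}

    def validation_step(self, batch, batch_idx):
        return {"val_loss": self(batch[0]).sum()}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


if __name__ == "__main__":
    # One GPU per Horovod process, as in the 0.9.0 docs.
    trainer = pl.Trainer(max_epochs=1, gpus=1, distributed_backend="horovod")
    trainer.fit(DebugModel(), datamodule=DebugDataModule())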
Expected behavior
I expect val_dataloader to be called only once per rank.
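As a possible workaround until the double call is explained, one option (just a sketch, under the assumption that the expensive part of load_ds_from_dir is building the dataset) is to cache the datasets once in setup() and only wrap them in DataLoaders inside the *_dataloader hooks, so the extra val_dataloader call stays cheap. load_dataset_from_dir below is a hypothetical helper that returns a Dataset rather than a DataLoader.

# Workaround sketch: cache the datasets in setup() so a duplicate
# val_dataloader call only constructs a lightweight DataLoader object.
# `load_dataset_from_dir` is a hypothetical helper (the dataset-building
# part of load_ds_from_dir above) that returns a torch Dataset.
import os
from torch.utils.data import DataLoader
import pytorch_lightning as pl


class CachedDataModule(pl.LightningDataModule):
    def __init__(self, path, batch_size):
        super().__init__()
        self.path = path
        self.batch_size = batch_size
        self.train_ds = None
        self.val_ds = None

    def setup(self, stage=None):
        # Runs once per process; do the expensive loading here.
        if self.train_ds is None:
            self.train_ds = load_dataset_from_dir(os.path.join(self.path, "train"))
        if self.val_ds is None:
            self.val_ds = load_dataset_from_dir(os.path.join(self.path, "validation"))

    def train_dataloader(self):
        return DataLoader(self.train_ds, batch_size=self.batch_size)

    def val_dataloader(self):
        # Even if Lightning calls this twice, only a cheap wrapper is rebuilt.
        return DataLoader(self.val_ds, batch_size=self.batch_size)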
Environment
* CUDA:
- GPU:
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- available: True
- version: 10.2
* Packages:
- numpy: 1.19.1
- pyTorch_debug: False
- pyTorch_version: 1.6.0
- pytorch-lightning: 0.9.0
- tensorboard: 2.2.0
- tqdm: 4.46.1
* System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.8.2
- version: #1 SMP Fri Apr 20 16:44:24 UTC 2018