
Validation not called when using an IterableDataset and limit_train_batches flag #6332

@DeWittmm

Description


When using an IterableDataset, the limit_train_batches trainer flag prevents the validation loop from ever being called. Here is a simplified notebook that reproduces the issue. Note that the error may manifest as:

ValueError: cannot convert float NaN to integer

if only limit_train_batches is set, but it can also interfere with model checkpointing when the limit_val_batches flag is set as well:

MisconfigurationException: ModelCheckpoint(monitor='val_log') not found in the returned metrics: ['loss']. HINT: Did you call self.log('val_log', tensor) in the LightningModule?
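
For reference, a minimal sketch of the kind of setup that triggers this. The module and dataset below are illustrative stand-ins (not the code from the linked notebook); the key ingredients are an IterableDataset with no __len__ and an int limit_train_batches:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint


class RandomIterableDataset(IterableDataset):
    """Endless stream of random vectors; deliberately defines no __len__."""

    def __iter__(self):
        while True:
            yield torch.randn(32)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        self.log("loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # Never reached when the bug triggers, so "val_log" is never logged
        # and the ModelCheckpoint monitor below cannot find it.
        self.log("val_log", self.layer(batch).sum())

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


trainer = pl.Trainer(
    max_epochs=1,
    limit_train_batches=10,  # with an unsized IterableDataset, validation is skipped
    limit_val_batches=5,
    callbacks=[ModelCheckpoint(monitor="val_log")],
)
trainer.fit(
    BoringModel(),
    DataLoader(RandomIterableDataset(), batch_size=4),
    DataLoader(RandomIterableDataset(), batch_size=4),
)
```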

Possible Workaround:
Interestingly, if the IterableDataset has __len__ defined, then neither flag is an issue.
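
As a sketch of the workaround, giving the same dataset a fixed length and a __len__ is enough for Lightning to compute the batch counts, after which both flags behave normally (the length of 64 is an arbitrary choice for illustration):

```python
class SizedIterableDataset(IterableDataset):
    """Same stream as above, but bounded and sized."""

    def __init__(self, length: int = 64):
        self.length = length

    def __iter__(self):
        for _ in range(self.length):
            yield torch.randn(32)

    def __len__(self):
        # With __len__ present, Lightning no longer treats the number of
        # batches as infinite, and limit_train_batches/limit_val_batches
        # resolve to concrete batch counts.
        return self.length
```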

Environment Info:

  • PyTorch Version: 1.7.1+cu101
  • OS: Linux
  • How you installed PyTorch: pip
  • Python version: 3.8.2
  • Any other relevant information: 32 cores, 128 GB mem
  • PyTorch Lightning Version: 1.1.5

Metadata


Labels

  • bug: Something isn't working
  • data handling: Generic data-related topic
