Labels: bug, data handling
Description
When using an IterableDataset, the limit_train_batches trainer flag prevents the validation loop from being called. Here is a simplified notebook that reproduces the issue. Note that this error may manifest as:
ValueError: cannot convert float NaN to integer
if only limit_train_batches is set, but it can also interfere with model checkpointing if the limit_val_batches flag is set as well:
MisconfigurationException: ModelCheckpoint(monitor='val_log') not found in the returned metrics: ['loss']. HINT: Did you call self.log('val_log', tensor) in the LightningModule?
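For reference, here is a minimal sketch of the setup described above (the class and metric names are illustrative, not taken from the linked notebook): an IterableDataset that defines no __len__, trained with limit_train_batches and a ModelCheckpoint monitoring a metric logged from validation_step.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, IterableDataset

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint


class RandomIterableDataset(IterableDataset):
    """Streams random (x, y) pairs; deliberately defines no __len__."""

    def __iter__(self):
        for _ in range(64):
            yield torch.randn(8), torch.randn(1)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.mse_loss(self.layer(x), y)
        self.log("loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        # ModelCheckpoint below monitors this metric; it never gets logged
        # when the validation loop is skipped, hence the exception above.
        self.log("val_log", F.mse_loss(self.layer(x), y))

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


model = BoringModel()
trainer = pl.Trainer(
    max_epochs=1,
    limit_train_batches=0.5,  # fractional limit on a dataset of unknown length
    callbacks=[ModelCheckpoint(monitor="val_log")],
)
trainer.fit(
    model,
    DataLoader(RandomIterableDataset(), batch_size=4),
    DataLoader(RandomIterableDataset(), batch_size=4),
)
```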
Possible Workaround:
Interestingly, if the IterableDataset has __len__ defined, then neither flag is an issue. A sketch of the workaround, extending the dataset from the reproduction above, follows.
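```python
# Workaround sketch: the same streaming dataset, but with __len__ defined so
# Lightning can compute a finite number of batches per epoch. With this
# dataset substituted into the reproduction above, both limit flags behave
# as expected.
class SizedIterableDataset(RandomIterableDataset):
    def __len__(self):
        return 64  # must match the number of samples __iter__ yields
```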
Environment Info:
- PyTorch Version: 1.7.1+cu101
- OS: Linux
- How you installed PyTorch: pip
- Python version: 3.8.2
- Any other relevant information: 32 cores, 128 GB RAM
- PyTorch Lightning Version: 1.1.5