
Validation not called when using an IterableDataset and limit_train_batches flag #6332

@DeWittmm

Description


When using an IterableDataset, the limit_train_batches trainer flag prevents the validation loop from ever being called. Here is a simplified notebook that reproduces the issue. Note that the error may manifest as:

ValueError: cannot convert float NaN to integer

if only limit_train_batches is set, but it can also interfere with model checkpointing when the limit_val_batches flag is set as well:

MisconfigurationException: ModelCheckpoint(monitor='val_log') not found in the returned metrics: ['loss']. HINT: Did you call self.log('val_log', tensor) in the LightningModule?
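
For reference, a minimal sketch of the kind of setup that triggers this. The module and dataset below are illustrative stand-ins (not the code from the linked notebook); the key ingredients are an IterableDataset with no __len__ and an int limit_train_batches:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint


class RandomIterableDataset(IterableDataset):
    """Endless stream of random vectors; deliberately defines no __len__."""

    def __iter__(self):
        while True:
            yield torch.randn(32)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        self.log("loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # Never reached when the bug triggers, so "val_log" is never logged
        # and the ModelCheckpoint monitor below cannot find it.
        self.log("val_log", self.layer(batch).sum())

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


trainer = pl.Trainer(
    max_epochs=1,
    limit_train_batches=10,  # with an unsized IterableDataset, validation is skipped
    limit_val_batches=5,
    callbacks=[ModelCheckpoint(monitor="val_log")],
)
trainer.fit(
    BoringModel(),
    DataLoader(RandomIterableDataset(), batch_size=4),
    DataLoader(RandomIterableDataset(), batch_size=4),
)
```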

Possible Workaround:
Interestingly, if the IterableDataset has __len__ defined, then neither flag is an issue.
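
As a sketch of the workaround, giving the same dataset a fixed length and a __len__ is enough for Lightning to compute the batch counts, after which both flags behave normally (the length of 64 is an arbitrary choice for illustration):

```python
class SizedIterableDataset(IterableDataset):
    """Same stream as above, but bounded and sized."""

    def __init__(self, length: int = 64):
        self.length = length

    def __iter__(self):
        for _ in range(self.length):
            yield torch.randn(32)

    def __len__(self):
        # With __len__ present, Lightning no longer treats the number of
        # batches as infinite, and limit_train_batches/limit_val_batches
        # resolve to concrete batch counts.
        return self.length
```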

Environment Info:

  • PyTorch Version: 1.7.1+cu101
  • OS: Linux
  • How you installed PyTorch: pip
  • Python version: 3.8.2
  • Any other relevant information: 32 cores, 128 GB mem
  • PyTorch Lightning Version: 1.1.5

Metadata


Labels

  • bug: Something isn't working
  • data handling: Generic data-related topic
