
Move reload_dataloaders_every_n_epochs to the DataHooks class #8738

@ananthsub

Description


🚀 Feature

Motivation

We are auditing the Lightning components and APIs to assess opportunities for improvement:

reload_dataloaders_every_n_epochs is currently an argument to the Trainer constructor. However, it could instead be a property of the DataHooks class, since whether to reload the dataloaders every n epochs should be determined by the actor providing those dataloaders (e.g. the LightningModule or LightningDataModule).
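For illustration, a minimal sketch of the proposed shape, assuming the property keeps the same name as today's Trainer argument (the exact spelling is an open question):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

# Today: the reload cadence is configured on the Trainer, far away
# from the code that actually builds the dataloaders.
trainer = pl.Trainer(reload_dataloaders_every_n_epochs=10)

# Proposed: the provider of the dataloaders declares the cadence itself.
class MyModel(pl.LightningModule):
    # Hypothetical DataHooks property; the name here simply mirrors
    # the existing Trainer argument.
    reload_dataloaders_every_n_epochs = 10

    def train_dataloader(self):
        # Under the proposal, this would be rebuilt every 10 epochs.
        dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        return DataLoader(dataset, batch_size=8)
```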

This is very similar to #8733 and to how automatic/manual optimization is a property of the LightningModule. That property also started out as a Trainer argument before being migrated to the LightningModule. Since this pattern keeps recurring, we should separately understand why it is so appealing to add settings to the Trainer constructor instead of to a more specific component.

Moreover, this single setting controls reloading for both the train and val dataloaders. Do we need more granular control? Do we need two properties, one for training and one for validation? This could make sense, since features like val_check_interval can produce very different epoch counts for training vs. validation (training epoch count != val epoch count). The validation property would only apply during trainer.fit, as trainer.validate makes only a single pass through the data.
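If granular control were needed, one possible shape is a pair of properties, one per dataloader; both names below are hypothetical and exist nowhere in Lightning today:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MyDataModule(pl.LightningDataModule):
    # Hypothetical split of the single setting into two properties.
    reload_train_dataloader_every_n_epochs = 1
    reload_val_dataloader_every_n_epochs = 5  # only consulted during trainer.fit

    def train_dataloader(self):
        return DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)

    def val_dataloader(self):
        return DataLoader(TensorDataset(torch.randn(16, 32)), batch_size=8)
```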

However, note the current documentation for test_dataloader at https://github.com/PyTorchLightning/pytorch-lightning/blob/963c26764682fa4cf64c93c5a7572ae0040e9c32/pytorch_lightning/core/hooks.py#L535-L537
Is this a copy/paste issue?

Pitch

  • Add a property to the DataHooks class for this in v1.5
  • Deprecate the Trainer argument for this in v1.5 (a possible migration shim is sketched after this list)
  • Remove the Trainer argument in v1.7

Benefits:

  • Simplify the Trainer constructor (one fewer argument)
  • Keep dataloader management in one place (the DataHooks level) instead of two
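
During the deprecation window (v1.5 to v1.7), the Trainer would need to reconcile the old argument with the new property. A minimal sketch of one way to do that; the helper name, the None sentinel, and the resolution order are assumptions, not the actual implementation:

```python
import warnings

def _resolve_reload_every_n_epochs(trainer_arg, module):
    """Hypothetical helper: prefer the DataHooks property, warn on the
    deprecated Trainer argument, and fall back to today's default of 0
    (never reload)."""
    hook_value = getattr(module, "reload_dataloaders_every_n_epochs", None)
    if trainer_arg is not None:
        warnings.warn(
            "Setting `reload_dataloaders_every_n_epochs` on the Trainer is"
            " deprecated in v1.5 and will be removed in v1.7. Set it on your"
            " LightningModule or LightningDataModule instead.",
            DeprecationWarning,
        )
        if hook_value is None:
            return trainer_arg
    return hook_value if hook_value is not None else 0
```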

Alternatives

Keep as is?

