Description
🚀 Feature
LightningModules and DataModules currently support a setup API which takes an optional stage argument.
#6386 addresses some issues in the setup/teardown lifecycle, so I was wondering if we should take this further (#6401)
Motivation
Pros of making separate hooks for each stage:
- Clarity in the API that helps forwards compatibility: in the current scheme, the Lightning trainer can pass an arbitrary value for `stage` that user code might not handle. With explicit hooks, new stages become opt-in, as users must implement the corresponding hook in their Lightning/data module.
- Consistency in the API: this matches the pattern already established for Lightning/data modules, which have train/validation/test/predict defined as separate hooks.
- On the Lightning internals, we can remove the base datamodule wrapper class and the `has_setup_{stage}` attributes, since it'll be obvious when the hooks are called.
Cons:
- This requires a deprecation process and can cause thrash for users
- Users now have to implement more hooks. However, a mitigation is that the refactoring should be straightforward as users can easily share code with a helper function in the lightning/data module.
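As a rough illustration of the mitigation above, here is a minimal sketch of sharing code between per-stage hooks through a helper. The class is a plain Python stand-in rather than a real `LightningDataModule`, the hook names follow the `on_{stage}_setup` naming proposed in this issue, and `_build_split` is a hypothetical helper invented for the example.

```python
class MyDataModule:
    """Plain stand-in for a data module; Lightning is not imported in this sketch."""

    def __init__(self):
        self.datasets = {}

    def _build_split(self, name):
        # Hypothetical shared helper: whatever logic used to live in
        # setup(stage) can be factored out here and reused by each hook.
        self.datasets[name] = list(range(3))

    def on_fit_setup(self):
        # Proposed hook name (not an existing Lightning API).
        self._build_split("train")
        self._build_split("val")

    def on_test_setup(self):
        # Proposed hook name (not an existing Lightning API).
        self._build_split("test")
```

Each hook stays a one- or two-line call into the helper, so the per-stage split adds little duplicated code.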
Pitch
We add the following hooks to the DataHooks base:
- `on_{stage}_prepare_data`
- `on_{stage}_setup`
- `on_{stage}_teardown`
for the existing values of stage: fit, test, validate, predict
Similarly, we add corresponding hooks to the Callback base:
- `on_{stage}_setup`
- `on_{stage}_teardown`
During the migration, if the Lightning(Data)Module implements the stage-specific hook, the trainer calls it. Otherwise, the trainer falls back to calling the existing setup/teardown hooks. We do the same for the callback hooks.
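The fallback during migration could look roughly like the sketch below. It assumes the base class ships no-op versions of the new hooks, so "implemented" is read as "overridden in a subclass". `HooksBaseSketch`, `_is_overridden`, and `call_setup` are hypothetical names for this example and are not Lightning APIs.

```python
class HooksBaseSketch:
    """Hypothetical base with the legacy hook plus no-op per-stage hooks."""

    def setup(self, stage=None):
        pass

    def on_fit_setup(self):
        pass

    def on_validate_setup(self):
        pass

    def on_test_setup(self):
        pass

    def on_predict_setup(self):
        pass


def _is_overridden(module, name):
    # A subclass that defines the hook has a different function object
    # on its class than the one defined on the base.
    return getattr(type(module), name) is not getattr(HooksBaseSketch, name)


def call_setup(module, stage):
    """Call on_{stage}_setup if the user overrode it, else legacy setup(stage)."""
    name = f"on_{stage}_setup"
    if _is_overridden(module, name):
        getattr(module, name)()
        return name
    module.setup(stage)
    return "setup"


class NewStyle(HooksBaseSketch):
    def on_fit_setup(self):
        self.called = "on_fit_setup"


class OldStyle(HooksBaseSketch):
    def setup(self, stage=None):
        self.called = f"setup({stage})"
```

With this shape, `call_setup(NewStyle(), "fit")` dispatches to the new hook, while `call_setup(OldStyle(), "fit")` still reaches the legacy `setup`. The same pattern would apply to teardown and to the callback hooks.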
We could set a longer deprecation timeline for this given how prevalent these hooks are. For example, we don't deprecate prepare_data, setup, or teardown until version 1.7+.
Additionally, we should move the trainer argument prepare_data_per_node to the DataHooks base, similar to how automatic_optimization is a property of the LightningModule. This point is separate from the overall hooks discussion and could happen faster to slightly simplify the trainer API.
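A minimal sketch of what moving `prepare_data_per_node` onto the hooks base might look like follows. `DataHooksSketch` and `should_prepare_data` are hypothetical names, the default of `True` is an assumption, and the rank logic reflects my understanding of the flag's current meaning (per node: local rank 0 on every node; otherwise only local rank 0 on node 0).

```python
class DataHooksSketch:
    """Hypothetical hooks base that owns the flag instead of the Trainer."""

    def __init__(self):
        # Assumed default; users would override it on their module.
        self.prepare_data_per_node = True

    def prepare_data(self):
        pass


def should_prepare_data(module, local_rank, node_rank):
    """Decide whether this process should run prepare_data."""
    if module.prepare_data_per_node:
        # Once per node: the first process on each node.
        return local_rank == 0
    # Once globally: the first process on the first node.
    return node_rank == 0 and local_rank == 0
```

The trainer would then read the flag from the module, much as it reads `automatic_optimization` from the LightningModule, rather than taking it as a constructor argument.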
Alternatives
Keep the existing hooks