Should the model's grads be cleared before entering the validation loop? #18713

@awaelchli

Discussion

Our trainer performs optimization in this order: 1) zero_grad() 2) backward() 3) optimizer.step()
This means that if validation runs right after such an optimization step (either the last one in the epoch, or mid-epoch according to the Trainer's val_check_interval), the gradients from that step are still held in memory.
To minimize memory usage and the risk of OOM during validation, we call zero_grad on the model params before entering the validation loop.
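
To illustrate the ordering described above, here is a minimal sketch in plain PyTorch (not the Trainer's actual code). The toy model, the optimizer choice, and the `training_step` helper are assumptions made for this example only. It shows that gradients survive the optimizer step and are only released by a later zero_grad with `set_to_none=True`.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # toy model, stands in for the user's module
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def training_step(batch: torch.Tensor) -> None:
    # Same order as described in this issue: zero_grad, backward, step.
    optimizer.zero_grad(set_to_none=True)
    loss = model(batch).sum()
    loss.backward()
    optimizer.step()


training_step(torch.randn(8, 4))

# After the step, every parameter still holds its gradient tensor.
grads_alive_after_step = all(p.grad is not None for p in model.parameters())

# Clearing before validation releases the gradient tensors.
model.zero_grad(set_to_none=True)
grads_alive_after_clear = any(p.grad is not None for p in model.parameters())
```

With `set_to_none=False`, the gradients would be zeroed in place instead, so the memory would stay allocated.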

In normal use cases, this zero_grad call is not noticeable. However, NeMo's MLPerf team reported that:

touching the grads there is causing some of the all gathers we are doing before the validation_step to get exposed and not allowing it to be overlapped with the compute of validation_step. We want the all_gathers of all the buckets other than first to be overlapped with the val_step.

Option 1: Remove the explicit zero_grad call as suggested by NeMo.
If this zero_grad call were removed, grads would remain available during validation (i.e. all(p.grad is not None for p in self.parameters()) would hold) and could use more memory than expected.

Option 2: Change the optimization step
An alternative would be to change the order to 1) backward() 2) optimizer.step() 3) zero_grad(). Then explicit zeroing at the beginning of validation wouldn't be necessary. However, this could be a breaking change for users who rely on the current order and have overridden hooks accordingly.
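
A rough sketch of the reordered step from Option 2, again in plain PyTorch rather than Trainer code. The `reordered_step` helper and the toy model are hypothetical, and the sketch assumes zero_grad is called with `set_to_none=True`. Under that assumption, no gradients are left over once the step returns, so validation could start without any extra clearing.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def reordered_step(batch: torch.Tensor) -> None:
    # Proposed order: backward, step, then zero_grad.
    loss = model(batch).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)


reordered_step(torch.randn(8, 4))

# Nothing to clear before validation: grads were released inside the step.
no_grads_left = all(p.grad is None for p in model.parameters())
```

Any user hook that inspects gradients after optimizer.step() would see None here, which is the breaking-change risk mentioned above.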

Option 3: Make it possible to override the zero_grad call (turning it off) in the validation loop. For example, in the form of a hook LightningModule.on_evaluation_zero_grad() similar to our on_validation_model_eval() hooks.
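
To make Option 3 more concrete, here is a framework-free toy sketch of how such a hook could be dispatched. Nothing in it is Lightning API: `ToyModule`, `KeepGradsModule`, and `run_validation` are made up for illustration, and `on_evaluation_zero_grad` is only the hook name proposed in this issue. The default hook keeps today's behavior, and a subclass opts out by overriding it.

```python
class ToyModule:
    """Stand-in for a LightningModule; not the real class."""

    def __init__(self) -> None:
        # Pretend these are gradients left over from the last training step.
        self.grads = {"weight": [0.1, 0.2], "bias": [0.3]}

    def zero_grad(self) -> None:
        self.grads = {name: None for name in self.grads}

    def on_evaluation_zero_grad(self) -> None:
        # Hypothetical hook: by default, clear grads as the loop does today.
        self.zero_grad()


class KeepGradsModule(ToyModule):
    def on_evaluation_zero_grad(self) -> None:
        # Opt out: leave the grads untouched before validation.
        pass


def run_validation(module: ToyModule) -> bool:
    # The loop calls the hook instead of calling zero_grad directly.
    module.on_evaluation_zero_grad()
    # ... validation steps would run here ...
    return all(g is None for g in module.grads.values())


default_cleared = run_validation(ToyModule())
override_cleared = run_validation(KeepGradsModule())
```

Existing users would see no change in behavior, while a team like NeMo could skip the call without patching the loop.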

cc @Borda
