Add AMP for validation, prediction and testing #6565

justusschock · 2021-03-17T13:17:26Z

What does this PR do?

Adds AMP Autocast for validation, prediction and testing
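
As a rough illustration of what this change amounts to, the sketch below models a precision plugin that exposes one context manager per stage and lets the evaluation stages reuse the same autocast context as training. `SketchAmpPlugin`, `run_step`, `fake_autocast`, and `autocast_is_active` are hypothetical stand-ins written for this example, not Lightning's API. In the real plugin the wrapped context would presumably be `torch.cuda.amp.autocast()`.

```python
from contextlib import contextmanager

# Stand-in for the real autocast context (an assumption for this sketch):
# it only records whether "mixed precision" is currently active.
_state = {"autocast": False}


@contextmanager
def fake_autocast():
    previous = _state["autocast"]
    _state["autocast"] = True
    try:
        yield
    finally:
        # Restore the previous state even if the step raises.
        _state["autocast"] = previous


class SketchAmpPlugin:
    """Hypothetical precision plugin with one context manager per stage."""

    @contextmanager
    def train_step_context(self):
        with fake_autocast():
            yield

    # Going by the PR title, the three evaluation stages below are the ones
    # that gain an autocast context.
    @contextmanager
    def val_step_context(self):
        with fake_autocast():
            yield

    @contextmanager
    def test_step_context(self):
        with fake_autocast():
            yield

    @contextmanager
    def predict_step_context(self):
        with fake_autocast():
            yield


def run_step(plugin, stage, step_fn):
    """Run step_fn inside the context the plugin defines for the given stage."""
    context = getattr(plugin, f"{stage}_step_context")
    with context():
        return step_fn()


def autocast_is_active():
    return _state["autocast"]


plugin = SketchAmpPlugin()
results = {
    stage: run_step(plugin, stage, autocast_is_active)
    for stage in ("train", "val", "test", "predict")
}
```

In this sketch every stage reports the autocast flag as active while its step runs, and the flag is restored once the step returns.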

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

pep8speaks · 2021-03-17T13:17:30Z

Hello @justusschock! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-03-20 21:40:41 UTC

tchaton

Awesome catch! Thanks a lot!

awaelchli · 2021-03-17T15:36:46Z

tests/models/test_amp.py

+    def validation_step(self, batch, batch_idx):
+        self._step(batch, batch_idx)
+
+    def test_step(self, batch, batch_idx):
+        self._step(batch, batch_idx)


You are missing the return statements.
The boring model has epoch_end methods implemented that need the outputs.

This will probably fix the tests.

Suggested change

-    def validation_step(self, batch, batch_idx):
-        self._step(batch, batch_idx)
-
-    def test_step(self, batch, batch_idx):
-        self._step(batch, batch_idx)
+    def validation_step(self, batch, batch_idx):
+        return self._step(batch, batch_idx)
+
+    def test_step(self, batch, batch_idx):
+        return self._step(batch, batch_idx)

@awaelchli Actually I omitted them on purpose, since these steps don't have to return anything if self.log is used there.

If you think they should be there I can add them though

The epoch_end methods require it.
You can also do model.validation_epoch_end = None, I suppose.
In any case, something has to change for the tests to pass.

My intuition was wrong: adding the return is not enough to fix the test. The training step returns a key "loss", but the validation epoch-end hooks try to access a key "x" returned from validation_step.
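
The mismatch described in this thread can be reproduced in miniature without Lightning. The sketch below uses a hypothetical `SketchModel` whose shared `_step` returns a dict keyed by "loss" while its epoch-end hook reads the key "x". The classes, the hook body, and the batch values are assumptions made for illustration, not the actual test code.

```python
class SketchModel:
    """Hypothetical stand-in for a model with an epoch-end hook."""

    def _step(self, batch, batch_idx):
        # Shared helper: returns its result under the key "loss".
        return {"loss": float(sum(batch)) / len(batch)}

    def validation_step(self, batch, batch_idx):
        # Adding `return` alone is not enough: the key is still "loss".
        return self._step(batch, batch_idx)

    def validation_epoch_end(self, outputs):
        # The hook expects each step output to carry the key "x".
        return sum(out["x"] for out in outputs) / len(outputs)


class FixedModel(SketchModel):
    def validation_step(self, batch, batch_idx):
        # Re-key the shared result so the epoch-end hook finds what it expects.
        return {"x": self._step(batch, batch_idx)["loss"]}


def run_validation(model, batches):
    """Collect step outputs and hand them to the epoch-end hook."""
    outputs = [model.validation_step(b, i) for i, b in enumerate(batches)]
    return model.validation_epoch_end(outputs)


batches = [[1.0, 3.0], [2.0, 4.0]]

try:
    run_validation(SketchModel(), batches)
    broken_error = None
except KeyError as err:
    # The hook fails on the missing key even though the step returns a value.
    broken_error = err.args[0]

fixed_mean = run_validation(FixedModel(), batches)
```

Here the unfixed model raises a `KeyError` for "x", while the re-keyed variant lets the hook aggregate the per-batch values.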

Co-authored-by: Adrian Wälchli <[email protected]>

Borda · 2021-03-20T11:23:07Z

@justusschock mind checking the failing GPU test?

* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog

(cherry picked from commit 634d831)

…ter) to github/third-party/PyTorchLightning/pytorch-lightning

Summary:

### New commit log messages

## [UnReleased] - 2021-MM-DD

### Added

- Added more explicit exception message when trying to execute `trainer.test()` or `trainer.validate()` with `fast_dev_run=True` ([#6667](Lightning-AI/pytorch-lightning#6667))
- Added `LightningCLI` class to provide simple reproducibility with minimum boilerplate training cli. ([#4492](Lightning-AI/pytorch-lightning#4492))
- Trigger warning when non-metric logged value with multi processes hasn't been reduced ([#6417](Lightning-AI/pytorch-lightning#6417))
- Added `gradient_clip_algorithm` argument to Trainer for gradient clipping by value ([#6123](Lightning-AI/pytorch-lightning#6123)).
- Added a way to print to terminal without breaking up the progress bar ([#5470](Lightning-AI/pytorch-lightning#5470))
- Added support to checkpoint after training steps in `ModelCheckpoint` callback ([#6146](Lightning-AI/pytorch-lightning#6146))
- Added `checkpoint` parameter to callback's `on_save_checkpoint` hook ([#6072](Lightning-AI/pytorch-lightning#6072))
- Added `RunningStage.SANITY_CHECKING` ([#4945](Lightning-AI/pytorch-lightning#4945))
- Added `TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}` ([#4945](Lightning-AI/pytorch-lightning#4945))
- Added `Trainer.validate()` method to perform one evaluation epoch over the validation set ([#4948](Lightning-AI/pytorch-lightning#4948))
- Added `LightningEnvironment` for Lightning-specific DDP ([#5915](Lightning-AI/pytorch-lightning#5915))
- Added `teardown()` hook to LightningDataModule ([#4673](Lightning-AI/pytorch-lightning#4673))
- Added `auto_insert_metric_name` parameter to `ModelCheckpoint` ([#6277](Lightning-AI/pytorch-lightning#6277))
- Added arg to `self.log` that enables users to give custom names when dealing with multiple dataloaders ([#6274](Lightning-AI/pytorch-lightning#6274))
- Added `teardown` method to `BaseProfiler` to enable subclasses defining post-profiling steps outside of `__del__` ([#6370](Lightning-AI/pytorch-lightning#6370))
- Added `setup` method to `BaseProfiler` to enable subclasses defining pre-profiling steps for every process ([#6633](Lightning-AI/pytorch-lightning#6633))
- Added no return warning to predict ([#6139](Lightning-AI/pytorch-lightning#6139))
- Added `Trainer.predict` config validation ([#6543](Lightning-AI/pytorch-lightning#6543))
- Added `AbstractProfiler` interface ([#6621](Lightning-AI/pytorch-lightning#6621))
- Added support for including module names for forward in the autograd trace of `PyTorchProfiler` ([#6349](Lightning-AI/pytorch-lightning#6349))
- Added support for the PyTorch 1.8.1 autograd profiler ([#6618](Lightning-AI/pytorch-lightning#6618))
- Added `outputs` parameter to callback's `on_validation_epoch_end` & `on_test_epoch_end` hooks ([#6120](Lightning-AI/pytorch-lightning#6120))
- Added `configure_sharded_model` hook ([#6679](Lightning-AI/pytorch-lightning#6679))
- Added support for `precision=64`, enabling training with double precision ([#6595](Lightning-AI/pytorch-lightning#6595))
- Added support for DDP communication hooks ([#6736](Lightning-AI/pytorch-lightning#6736))
- Added `artifact_location` argument to `MLFlowLogger` which will be passed to the `MlflowClient.create_experiment` call ([#6677](Lightning-AI/pytorch-lightning#6677))
- Added `model` parameter to precision plugins' `clip_gradients` signature ([#6764](Lightning-AI/pytorch-lightning#6764))

### Changed

- Renamed `pytorch_lightning.callbacks.swa` to `pytorch_lightning.callbacks.stochastic_weight_avg` ([#6259](Lightning-AI/pytorch-lightning#6259))
- Refactor `RunningStage` and `TrainerState` usage ([#4945](Lightning-AI/pytorch-lightning#4945))
- Changed `trainer.evaluating` to return `True` if validating or testing ([#4945](Lightning-AI/pytorch-lightning#4945))
- Changed `setup()` and `teardown()` stage argument to take any of `{fit,validate,test,predict}` ([#6386](Lightning-AI/pytorch-lightning#6386))
- Changed profilers to save separate report files per state and rank ([#6621](Lightning-AI/pytorch-lightning#6621))
- Changed `PyTorchProfiler` to use `torch.autograd.profiler.record_function` to record functions ([#6349](Lightning-AI/pytorch-lightning#6349))

### Deprecated

- `period` has been deprecated in favor of `every_n_val_epochs` in the `ModelCheckpoint` callback ([#6146](Lightning-AI/pytorch-lightning#6146))
- Deprecated `trainer.running_sanity_check` in favor of `trainer.sanity_checking` ([#4945](Lightning-AI/pytorch-lightning#4945))
- Deprecated `Profiler(output_filename)` in favor of `dirpath` and `filename` ([#6621](Lightning-AI/pytorch-lightning#6621))
- Deprecated `PytorchProfiler(profiled_functions)` in favor of `record_functions` ([#6349](Lightning-AI/pytorch-lightning#6349))
- Deprecated metrics in favor of `torchmetrics` ([#6505](Lightning-AI/pytorch-lightning#6505), [#6530](Lightning-AI/pytorch-lightning#6530), [#6540](Lightning-AI/pytorch-lightning#6540), [#6547](Lightning-AI/pytorch-lightning#6547), [#6515](Lightning-AI/pytorch-lightning#6515), [#6572](Lightning-AI/pytorch-lightning#6572), [#6573](Lightning-AI/pytorch-lightning#6573), [#6584](Lightning-AI/pytorch-lightning#6584), [#6636](Lightning-AI/pytorch-lightning#6636), [#6637](Lightning-AI/pytorch-lightning#6637), [#6649](Lightning-AI/pytorch-lightning#6649), [#6659](Lightning-AI/pytorch-lightning#6659))

### Removed

- Removed support for passing a bool value to `profiler` argument of Trainer ([#6164](Lightning-AI/pytorch-lightning#6164))
- Removed no return warning from val/test step ([#6139](Lightning-AI/pytorch-lightning#6139))
- Removed passing a `ModelCheckpoint` instance to `Trainer(checkpoint_callback)` ([#6166](Lightning-AI/pytorch-lightning#6166))
- Removed deprecated Trainer argument `enable_pl_optimizer` and `automatic_optimization` ([#6163](Lightning-AI/pytorch-lightning#6163))
- Removed deprecated metrics ([#6161](Lightning-AI/pytorch-lightning#6161))
    * from `pytorch_lightning.metrics.functional.classification` removed `to_onehot`, `to_categorical`, `get_num_classes`, `roc`, `multiclass_roc`, `average_precision`, `precision_recall_curve`, `multiclass_precision_recall_curve`
    * from `pytorch_lightning.metrics.functional.reduction` removed `reduce`, `class_reduce`
- Removed deprecated `ModelCheckpoint` arguments `prefix`, `mode="auto"` ([#6162](Lightning-AI/pytorch-lightning#6162))
- Removed `mode='auto'` from `EarlyStopping` ([#6167](Lightning-AI/pytorch-lightning#6167))
- Removed legacy references for magic keys in the `Result` object ([#6016](Lightning-AI/pytorch-lightning#6016))
- Removed deprecated `LightningModule` `hparams` setter ([#6207](Lightning-AI/pytorch-lightning#6207))
- Removed legacy code to log or include metrics in the progress bar by returning them in a dict with the `"log"/"progress_bar"` magic keys. Use `self.log` instead ([#6734](Lightning-AI/pytorch-lightning#6734))
- Removed `optimizer_idx` argument from `training_step` in manual optimization ([#6093](Lightning-AI/pytorch-lightning#6093))

### Fixed

- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic ([#6802](Lightning-AI/pytorch-lightning#6802))
- Made the `Plugin.reduce` method more consistent across all Plugins to reflect a mean-reduction by default ([#6011](Lightning-AI/pytorch-lightning#6011))
- Move lightning module to correct device type when using LightningDistributedWrapper ([#6070](Lightning-AI/pytorch-lightning#6070))
- Do not print top-k verbose log with `ModelCheckpoint(monitor=None)` ([#6109](Lightning-AI/pytorch-lightning#6109))
- Fixed csv extension check ([#6436](Lightning-AI/pytorch-lightning#6436))
- Fixed `ModelCheckpoint(monitor=None, save_last=True)` not saving checkpoints ([#6136](Lightning-AI/pytorch-lightning#6136))
- Fixed `ModelCheckpoint(save_top_k=0, save_last=True)` not saving the `last` checkpoint ([#6136](Lightning-AI/pytorch-lightning#6136))
- Fixed `.teardown(stage='fit')` getting called during `trainer.test` ([#6386](Lightning-AI/pytorch-lightning#6386))
- Fixed `.on_fit_{start,end}()` getting called during `trainer.test` ([#6386](Lightning-AI/pytorch-lightning#6386))
- Fixed LightningModule `all_gather` on cpu tensors ([#6416](Lightning-AI/pytorch-lightning#6416))
- Fixed torch distributed not available in setup hook for DDP ([#6506](Lightning-AI/pytorch-lightning#6506))
- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met ([#6705](Lightning-AI/pytorch-lightning#6705))

## [1.2.7] - 2021-04-06

### Fixed

- Fixed resolve a bug with omegaconf and xm.save ([#6741](Lightning-AI/pytorch-lightning#6741))
- Fixed an issue with IterableDataset when `__len__` is not defined ([#6828](Lightning-AI/pytorch-lightning#6828))
- Sanitize None params during pruning ([#6836](Lightning-AI/pytorch-lightning#6836))
- Enforce an epoch scheduler interval when using SWA ([#6588](Lightning-AI/pytorch-lightning#6588))
- Fixed TPU Colab hang issue, post training ([#6816](Lightning-AI/pytorch-lightning#6816))
- Fixed a bug where `TensorBoardLogger` would give a warning and not log correctly to a symbolic link `save_dir` ([#6730](Lightning-AI/pytorch-lightning#6730))

## [1.2.6] - 2021-03-30

### Changed

- Changed the behavior of `on_epoch_start` to run at the beginning of validation & test epoch ([#6498](Lightning-AI/pytorch-lightning#6498))

### Removed

- Removed legacy code to include `step` dictionary returns in `callback_metrics`. Use `self.log_dict` instead. ([#6682](Lightning-AI/pytorch-lightning#6682))

### Fixed

- Fixed `DummyLogger.log_hyperparams` raising a `TypeError` when running with `fast_dev_run=True` ([#6398](Lightning-AI/pytorch-lightning#6398))
- Fixed error on TPUs when there was no `ModelCheckpoint` ([#6654](Lightning-AI/pytorch-lightning#6654))
- Fixed `trainer.test` freeze on TPUs ([#6654](Lightning-AI/pytorch-lightning#6654))
- Fixed a bug where gradients were disabled after calling `Trainer.predict` ([#6657](Lightning-AI/pytorch-lightning#6657))
- Fixed bug where no TPUs were detected in a TPU pod env ([#6719](Lightning-AI/pytorch-lightning#6719))

## [1.2.5] - 2021-03-23

### Changed

- Update Gradient Clipping for the TPU Accelerator ([#6576](Lightning-AI/pytorch-lightning#6576))
- Refactored setup for typing friendly ([#6590](Lightning-AI/pytorch-lightning#6590))

### Fixed

- Fixed a bug where `all_gather` would not work correctly with `tpu_cores=8` ([#6587](Lightning-AI/pytorch-lightning#6587))
- Fixed comparing required versions ([#6434](Lightning-AI/pytorch-lightning#6434))
- Fixed duplicate logs appearing in console when using the python logging module ([#6275](Lightning-AI/pytorch-lightning#6275))
- Added Autocast in validation, test and predict modes for Native AMP ([#6565](Lightning-AI/pytorch-lightning#6565))

Reviewed By: shuyingsunshine21

Differential Revision: D27528929

fbshipit-source-id: 311c88f71461c2c79bbf185e28d7a6d683ccc26f

Add Tests for val and test-steps

f481d2b

Add native AMP

ae5ed78

justusschock mentioned this pull request Mar 17, 2021

Native mixed precision OOM Val Step #6566

Closed

justusschock self-assigned this Mar 17, 2021

justusschock added priority: 0 High priority task bug Something isn't working labels Mar 17, 2021

justusschock marked this pull request as ready for review March 17, 2021 13:52

justusschock requested review from Borda, SeanNaren, awaelchli, carmocca, tchaton and williamFalcon as code owners March 17, 2021 13:52

tchaton approved these changes Mar 17, 2021

pep8 tests

94fef92

SeanNaren approved these changes Mar 17, 2021

justusschock added 2 commits March 17, 2021 14:57

pep8 plugin

6d69d16

changelog

c1c9599

SeanNaren self-requested a review March 17, 2021 13:57

justusschock added the ready PRs ready to be merged label Mar 17, 2021

justusschock enabled auto-merge (squash) March 17, 2021 13:58

SeanNaren approved these changes Mar 17, 2021

SeanNaren added this to the 1.2.x milestone Mar 17, 2021

ananthsub approved these changes Mar 17, 2021

awaelchli approved these changes Mar 17, 2021

justusschock and others added 2 commits March 17, 2021 17:38

add missing import

19bf51f

Update tests/models/test_amp.py

cbc8a40

Co-authored-by: Adrian Wälchli <[email protected]>

Borda approved these changes Mar 20, 2021

fix step return values

58b5dac

awaelchli mentioned this pull request Mar 20, 2021

1.2.x cherries 🍒 #6083

Closed

awaelchli added 2 commits March 20, 2021 22:17

fix dict syntax

7a3f553

fix predict method signature

f02d2a3

justusschock merged commit 634d831 into master Mar 20, 2021

justusschock deleted the justusschock-patch-1 branch March 20, 2021 23:15

Borda mentioned this pull request Mar 23, 2021

Weekly Patch Release v.1.2.5 [full merge, no squash] #6646

Merged

4 tasks

Borda pushed a commit that referenced this pull request Mar 23, 2021

Add AMP for validation, prediction and testing (#6565)

d15cec3

* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog

(cherry picked from commit 634d831)

Borda pushed a commit that referenced this pull request Mar 23, 2021

Add AMP for validation, prediction and testing (#6565)

8a52513

* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog

(cherry picked from commit 634d831)

Borda pushed a commit that referenced this pull request Mar 23, 2021

Add AMP for validation, prediction and testing (#6565)

c3be721

* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog

(cherry picked from commit 634d831)

Borda pushed a commit that referenced this pull request Mar 23, 2021

Add AMP for validation, prediction and testing (#6565)

e69f66f

* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog

(cherry picked from commit 634d831)

lexierule pushed a commit that referenced this pull request Mar 24, 2021

Add AMP for validation, prediction and testing (#6565)

fec603d

* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog

(cherry picked from commit 634d831)

7 participants
