Test AMP and Apex checkpointing #11885

@awaelchli

🚀 Feature

As far as I can tell, we currently have no tests that verify the AMP/Apex states (i.e. the scaler) are saved and restored correctly.

Motivation

PRs like #11638, which change the loading and saving behavior, risk introducing bugs, especially when complicated logic is needed to remain backward compatible.

Pitch

Add tests.
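
A rough sketch of the kind of round-trip test this could mean. It uses only the standard library so that it runs anywhere. `StandInScaler`, `save_checkpoint`, `load_checkpoint`, and the `"scaler_state"` key are all hypothetical names made up for illustration, not Lightning's checkpoint format or plugin API. A real test would presumably swap in the actual scaler (for native AMP, `torch.cuda.amp.GradScaler`, which exposes `state_dict()` and `load_state_dict()`) and go through the Trainer's save and load path.

```python
import pickle
import tempfile
from pathlib import Path


class StandInScaler:
    """Hypothetical stand-in for a grad scaler; not a real AMP/Apex class."""

    def __init__(self, scale=65536.0, growth_tracker=0):
        self.scale = scale
        self.growth_tracker = growth_tracker

    def update(self, found_inf):
        # Simplified dynamic loss scaling: halve the scale on overflow,
        # otherwise count one more overflow-free step.
        if found_inf:
            self.scale *= 0.5
            self.growth_tracker = 0
        else:
            self.growth_tracker += 1

    def state_dict(self):
        return {"scale": self.scale, "growth_tracker": self.growth_tracker}

    def load_state_dict(self, state):
        self.scale = state["scale"]
        self.growth_tracker = state["growth_tracker"]


def save_checkpoint(path, scaler):
    # "scaler_state" is an illustrative key chosen for this sketch.
    with open(path, "wb") as f:
        pickle.dump({"scaler_state": scaler.state_dict()}, f)


def load_checkpoint(path, scaler):
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    scaler.load_state_dict(checkpoint["scaler_state"])


def test_scaler_state_round_trip():
    # Move the scaler away from its defaults so that a restore which
    # silently does nothing would be caught.
    scaler = StandInScaler()
    scaler.update(found_inf=True)
    scaler.update(found_inf=False)

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "checkpoint.pkl"
        save_checkpoint(path, scaler)

        restored = StandInScaler()
        assert restored.state_dict() != scaler.state_dict()
        load_checkpoint(path, restored)

    assert restored.state_dict() == scaler.state_dict()
    return restored.state_dict()
```

The important part is mutating the state before saving: a freshly constructed scaler compares equal to another fresh one, so a test that skips this step would pass even if restoring were broken.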

Alternatives

None

Additional context


cc @awaelchli @ananthsub @ninginthecloud @rohitgr7 @Borda @akihironitta @carmocca @justusschock
