🚀 Feature
ddp_fork (and its alias strategies) cannot currently be used with native AMP because the NativeMixedPrecisionPlugin constructs a GradScaler, which invokes the CUDA Runtime API, initializing CUDA and poisoning subsequent forks.
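For illustration, a minimal sketch of the mechanism outside of Lightning (assuming CUDA hardware is present; the worker function and process count are placeholders):

```python
import torch
import torch.multiprocessing as mp


def _worker(rank: int) -> None:
    # Fails in a forked child once the parent has initialized CUDA, typically with
    # "RuntimeError: Cannot re-initialize CUDA in forked subprocess".
    torch.zeros(1, device=f"cuda:{rank}")


if __name__ == "__main__":
    # GradScaler's constructor checks torch.cuda.is_available(), which calls the
    # CUDA Runtime API and initializes CUDA in the parent process.
    scaler = torch.cuda.amp.GradScaler(enabled=True)

    # Any fork-based start now inherits the poisoned CUDA state and errors out.
    mp.start_processes(_worker, nprocs=2, start_method="fork")
```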
A future version of PyTorch may make it possible to switch the default behavior of torch.cuda.is_available() to an NVML-based CUDA assessment throughout Lightning. In the meantime, patching torch.cuda.is_available() with Lightning's implementation of the upstream NVML-based assessment can unlock this functionality.
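A rough sketch of the patching approach, using pynvml here purely for illustration (Lightning would rely on its own port of the upstream NVML-based check rather than a new dependency):

```python
from unittest import mock

import pynvml  # illustrative NVML binding; not the implementation Lightning would ship
import torch


def _nvml_cuda_available() -> bool:
    """Report CUDA availability via NVML, without touching the CUDA Runtime API."""
    try:
        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount() > 0
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        return False


# Patch torch.cuda.is_available() only around the fork-sensitive call site,
# e.g. the GradScaler construction inside the precision plugin.
with mock.patch("torch.cuda.is_available", _nvml_cuda_available):
    scaler = torch.cuda.amp.GradScaler(enabled=True)
```

Because the NVML query never initializes the CUDA Runtime in the parent process, subsequently forked workers can still set up their own CUDA contexts.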
I'll be opening a PR shortly that patches torch.cuda.is_available() within NativeMixedPrecisionPlugin (both the Lite and PL versions) and adds a standalone test for the ddp_fork strategy in a CUDA + AMP context (a standalone test only for PL, given how expensive standalone multi-GPU tests can be).
Motivation
Many users run AMP inside Jupyter notebooks, where ddp_fork is the strategy that matters when training on multiple GPUs.
Pitch
Allow AMP to be used with ddp_fork, so that multi-GPU mixed-precision training works inside Jupyter notebooks, as sketched below.
I will open a small PR shortly that makes this available.
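For reference, the intended usage from a notebook cell once this works (model and data are placeholders):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp_fork",  # fork-based DDP, usable from an already-running interpreter
    precision=16,         # native AMP, currently blocked by the early CUDA initialization
)
# trainer.fit(MyLightningModule(), datamodule=my_datamodule)  # placeholders
```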
Additional context
There's a related PR in PyTorch that may allow the requested modification of torch.cuda.is_available() throughout Lightning without needing to patch the function or add Lightning's own NVML-based assessment (once the relevant PyTorch version becomes Lightning's minimum).