🚀 Feature
Allow a LightningWork to use CUDA when an app runs locally with the MPBackend.
Motivation
Lightning App's MPBackend uses multiprocessing.Process to run LightningWork instances. Because those processes are forked, trying to move a model to the GPU locally raises RuntimeError: Cannot re-initialize CUDA in forked subprocess:
```python
import lightning as L
import torch


class Work(L.LightningWork):
    def run(self):
        torch.zeros(1, device="cuda")


app = L.LightningApp(Work())
```

Running this app fails with:

```
  File "/home/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 418, in __call__
    raise e
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 401, in __call__
    self.run_once()
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 549, in run_once
    self.work.on_exception(e)
  File "/home/thomas/lightning/src/lightning/app/core/work.py", line 564, in on_exception
    raise exception
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 514, in run_once
    ret = self.run_executor_cls(self.work, work_run, self.delta_queue)(*args, **kwargs)
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 350, in __call__
    return self.work_run(*args, **kwargs)
  File "gpu_app.py", line 8, in run
    torch.zeros(1, device="cuda")
  File "/home/thomas/Dreambooth_app/.venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 207, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```

Pitch
Alternatives
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @tchaton