
Enable moving the model to GPU within a work locally #15699

@tchaton


🚀 Feature

Motivation

Lightning App's MPBackend uses multiprocessing.Process to run the works. Because those processes are created with the default fork start method, trying to move a model to the GPU inside a work locally raises RuntimeError: Cannot re-initialize CUDA in forked subprocess. Minimal reproduction:

```python
import lightning as L
import torch


class Work(L.LightningWork):
    def run(self):
        # Any CUDA call inside the work triggers the error below.
        torch.zeros(1, device="cuda")


app = L.LightningApp(Work())
```

Running this locally (saved as gpu_app.py) fails with:
  File "/home/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 418, in __call__
    raise e
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 401, in __call__
    self.run_once()
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 549, in run_once
    self.work.on_exception(e)
  File "/home/thomas/lightning/src/lightning/app/core/work.py", line 564, in on_exception
    raise exception
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 514, in run_once
    ret = self.run_executor_cls(self.work, work_run, self.delta_queue)(*args, **kwargs)
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 350, in __call__
    return self.work_run(*args, **kwargs)
  File "gpu_app.py", line 8, in run
    torch.zeros(1, device="cuda")
  File "/home/thomas/Dreambooth_app/.venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 207, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Pitch
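The error message itself points at the direction: create the work processes with the 'spawn' start method instead of the default fork. Below is a minimal, Lightning-independent sketch of that change (assuming a CUDA-capable machine); it illustrates the start-method fix, not the actual MPBackend patch:

```python
import multiprocessing as mp

import torch


def run_work():
    # A spawned child starts a fresh interpreter, so it does not inherit
    # the parent's already-initialized CUDA state the way a forked child does.
    print(torch.zeros(1, device="cuda"))


if __name__ == "__main__":
    # With the default "fork" context on Linux, this would reproduce the
    # RuntimeError above whenever CUDA was already touched in the parent.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=run_work)
    p.start()
    p.join()
```

In Lightning's case this would amount to having the MPBackend create its multiprocessing.Process instances through a spawn context.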

Alternatives

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @tchaton

Labels: app (removed; generic label for the Lightning App package) · priority: 0 (high-priority task)
