Hanging when importing pytorch_lightning on google cloud vm. #4324

@L4zyy

❓ Questions and Help

Hi,

I am trying to use pytorch-lightning with a TPU on a Google Cloud virtual machine. The VM was created with this command:

gcloud compute instances create tpu-vm \
    --machine-type=n1-standard-4 \
    --image-project=ml-images \
    --image-family=torch-xla \
    --boot-disk-size=200GB \
    --scopes=cloud-platform

When I try to import pytorch_lightning, it hangs indefinitely. I tried both a Jupyter notebook and Python in a terminal, with the same result. When I interrupt the import with a KeyboardInterrupt, I get the traceback below. It seems to be related to the multiprocessing part: the traceback ends in proc.join(), called from XLADeviceUtils.tpu_device_exists(). I have tried different installation methods, but the problem persists. Could you help me solve this? Any ideas or help would be much appreciated. Thanks!
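For anyone trying to reproduce this without pressing Ctrl+C by hand, a watchdog from the standard library's faulthandler module can record where the import is stuck. This is only a sketch: the helper name import_with_watchdog, the 30-second default, and the log file name are assumptions for illustration and are not part of pytorch_lightning.

```python
import faulthandler
import importlib


def import_with_watchdog(module_name, seconds=30.0, log_path="import_hang.log"):
    """Import a module; if the import is still running after `seconds`,
    write the traceback of every thread to `log_path`.

    Hypothetical helper for debugging, not part of any library.
    """
    # faulthandler needs a real file descriptor, so write to a file instead of
    # sys.stderr (which may not have one inside a Jupyter notebook).
    with open(log_path, "w") as log:
        faulthandler.dump_traceback_later(seconds, exit=False, file=log)
        try:
            return importlib.import_module(module_name)
        finally:
            # Cancel the pending dump once the import returns or raises.
            faulthandler.cancel_dump_traceback_later()
```

Calling import_with_watchdog("pytorch_lightning") would then leave a stack dump in the log file if the import hangs. The dump covers the parent process only, so it would be expected to show the same os.waitpid frame as the traceback below.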

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-1-702a026384b9> in <module>
      1 import torch_xla.core.xla_model as xm
----> 2 import pytorch_lightning as pl
      3 # from pytorch_lightning import Trainer, seed_everything

/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/__init__.py in <module>
     54     # We are not importing the rest of the lightning during the build process, as it may not be compiled yet
     55 else:
---> 56     from pytorch_lightning.core import LightningDataModule, LightningModule
     57     from pytorch_lightning.callbacks import Callback
     58     from pytorch_lightning.trainer import Trainer

/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/core/__init__.py in <module>
     14 
     15 from pytorch_lightning.core.datamodule import LightningDataModule
---> 16 from pytorch_lightning.core.lightning import LightningModule
     17 
     18 __all__ = [

/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/core/lightning.py in <module>
     44 
     45 
---> 46 TPU_AVAILABLE = XLADeviceUtils.tpu_device_exists()
     47 
     48 if TPU_AVAILABLE:

/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/utilities/xla_device_utils.py in tpu_device_exists()
     88         """
     89         if XLADeviceUtils.TPU_AVAILABLE is None and TORCHXLA_AVAILABLE:
---> 90             XLADeviceUtils.TPU_AVAILABLE = pl_multi_process(XLADeviceUtils._is_device_tpu)()
     91         return XLADeviceUtils.TPU_AVAILABLE

/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/utilities/xla_device_utils.py in wrapper(*args, **kwargs)
     41         proc = Process(target=inner_f, args=(queue, func,), kwargs=kwargs)
     42         proc.start()
---> 43         proc.join()
     44         return queue.get()
     45 

/anaconda3/envs/ldetr/lib/python3.6/multiprocessing/process.py in join(self, timeout)
    122         assert self._parent_pid == os.getpid(), 'can only join a child process'
    123         assert self._popen is not None, 'can only join a started process'
--> 124         res = self._popen.wait(timeout)
    125         if res is not None:
    126             _children.discard(self)

/anaconda3/envs/ldetr/lib/python3.6/multiprocessing/popen_fork.py in wait(self, timeout)
     48                     return None
     49             # This shouldn't block if wait() returned successfully.
---> 50             return self.poll(os.WNOHANG if timeout == 0.0 else 0)
     51         return self.returncode
     52 

/anaconda3/envs/ldetr/lib/python3.6/multiprocessing/popen_fork.py in poll(self, flag)
     26             while True:
     27                 try:
---> 28                     pid, sts = os.waitpid(self.pid, flag)
     29                 except OSError as e:
     30                     # Child process not yet created. See #1731717

KeyboardInterrupt: 
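The traceback shows the parent blocked in proc.join() with no timeout, so a child process that never exits stalls the import forever. As a rough illustration of how such a check could be bounded, here is a minimal sketch that runs a function in a forked child and gives up after a timeout. It is not the pytorch_lightning implementation; run_with_timeout, _child, slow_probe, and the default values are hypothetical names chosen for this example, and it assumes a Unix-like system where the "fork" start method is available (as on the VM above).

```python
import multiprocessing as mp
import queue as queue_mod
import time


def _child(result_queue, func, args, kwargs):
    # Runs in the child process: send the function's result back to the parent.
    result_queue.put(func(*args, **kwargs))


def run_with_timeout(func, *args, timeout=10.0, default=False, **kwargs):
    """Run `func` in a forked child; return `default` if no result arrives in time."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS, matching popen_fork above
    result_queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(result_queue, func, args, kwargs))
    proc.start()
    try:
        # Read the result with a timeout first, and only join afterwards.
        return result_queue.get(timeout=timeout)
    except queue_mod.Empty:
        # The child produced nothing in time (it hung or crashed).
        return default
    finally:
        if proc.is_alive():
            proc.terminate()  # stop a child that is still running
        proc.join()


def slow_probe():
    # Hypothetical stand-in for a device check that never returns.
    time.sleep(60)
    return True
```

With this sketch, run_with_timeout(slow_probe, timeout=1.0) returns False after about one second instead of blocking. Reading from the queue before joining follows the guidance in the multiprocessing programming guidelines on joining processes that use queues.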

Labels: accelerator: tpu, bug, priority: 0
