❓ Questions and Help
Hi,
I was trying to use PyTorch Lightning with a TPU on a Google Cloud virtual machine. The VM was created with this command:
gcloud compute instances create tpu-vm \
--machine-type=n1-standard-4 \
--image-project=ml-images \
--image-family=torch-xla \
--boot-disk-size=200GB \
--scopes=cloud-platform
When I try to import pytorch_lightning, it hangs every time. I tried it both in a Jupyter notebook and in the Python interpreter in a terminal, with the same result. When I interrupt the import with KeyboardInterrupt, it prints the traceback below. It seems to be related to the multiprocessing part of the TPU check. I have tried different installation methods but still hit this problem. Could you help me solve it? Any ideas or help would be much appreciated! Thanks!
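As a sanity check independent of Lightning, one can try to acquire an XLA device from torch_xla directly. A minimal sketch; it assumes the TPU runtime (e.g. XRT_TPU_CONFIG) is configured for this VM, and the call may itself block if it is not:

# Sanity check outside pytorch_lightning: can torch_xla itself reach a TPU?
# Assumes the TPU runtime is configured for this VM; if not, this call may
# also block while trying to contact the XRT server.
import torch_xla.core.xla_model as xm

device = xm.xla_device()   # acquires the default XLA device (TPU if available)
print(device)              # e.g. xla:1

The traceback from the interrupted import of pytorch_lightning follows.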
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-1-702a026384b9> in <module>
1 import torch_xla.core.xla_model as xm
----> 2 import pytorch_lightning as pl
3 # from pytorch_lightning import Trainer, seed_everything
/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/__init__.py in <module>
54 # We are not importing the rest of the lightning during the build process, as it may not be compiled yet
55 else:
---> 56 from pytorch_lightning.core import LightningDataModule, LightningModule
57 from pytorch_lightning.callbacks import Callback
58 from pytorch_lightning.trainer import Trainer
/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/core/__init__.py in <module>
14
15 from pytorch_lightning.core.datamodule import LightningDataModule
---> 16 from pytorch_lightning.core.lightning import LightningModule
17
18 __all__ = [
/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/core/lightning.py in <module>
44
45
---> 46 TPU_AVAILABLE = XLADeviceUtils.tpu_device_exists()
47
48 if TPU_AVAILABLE:
/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/utilities/xla_device_utils.py in tpu_device_exists()
88 """
89 if XLADeviceUtils.TPU_AVAILABLE is None and TORCHXLA_AVAILABLE:
---> 90 XLADeviceUtils.TPU_AVAILABLE = pl_multi_process(XLADeviceUtils._is_device_tpu)()
91 return XLADeviceUtils.TPU_AVAILABLE
/anaconda3/envs/ldetr/lib/python3.6/site-packages/pytorch_lightning/utilities/xla_device_utils.py in wrapper(*args, **kwargs)
41 proc = Process(target=inner_f, args=(queue, func,), kwargs=kwargs)
42 proc.start()
---> 43 proc.join()
44 return queue.get()
45
/anaconda3/envs/ldetr/lib/python3.6/multiprocessing/process.py in join(self, timeout)
122 assert self._parent_pid == os.getpid(), 'can only join a child process'
123 assert self._popen is not None, 'can only join a started process'
--> 124 res = self._popen.wait(timeout)
125 if res is not None:
126 _children.discard(self)
/anaconda3/envs/ldetr/lib/python3.6/multiprocessing/popen_fork.py in wait(self, timeout)
48 return None
49 # This shouldn't block if wait() returned successfully.
---> 50 return self.poll(os.WNOHANG if timeout == 0.0 else 0)
51 return self.returncode
52
/anaconda3/envs/ldetr/lib/python3.6/multiprocessing/popen_fork.py in poll(self, flag)
26 while True:
27 try:
---> 28 pid, sts = os.waitpid(self.pid, flag)
29 except OSError as e:
30 # Child process not yet created. See #1731717
KeyboardInterrupt:
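For context, the check that the traceback points to follows roughly this pattern. This is a simplified sketch based on the source lines visible in the traceback, not the real Lightning code; the helper name _check_tpu and its arguments are illustrative. The parent process blocks forever on proc.join() (which has no timeout) if the child never returns, e.g. when contacting the TPU runtime stalls:

# Simplified sketch of the TPU-detection pattern shown in
# pytorch_lightning/utilities/xla_device_utils.py above (illustrative only).
from multiprocessing import Process, Queue

import torch_xla.core.xla_model as xm


def _check_tpu(queue):
    # Child process: try to fetch an XLA device and report success/failure.
    try:
        queue.put(xm.xla_device() is not None)
    except Exception:
        queue.put(False)


def tpu_device_exists():
    queue = Queue()
    proc = Process(target=_check_tpu, args=(queue,))
    proc.start()
    proc.join()          # no timeout here -> hangs if the child blocks
    return queue.get()

In my case the child appears to block, so the import of pytorch_lightning never completes.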