🐛 Describe the bug
tl;dr: Some CUDA libraries are distributed alongside Torch via PyPI packages such as nvidia-cudnn-cu11, nvidia-cusparse-cu11, and so on. Torch's __init__.py has various tricks to find and load these libraries, but one of those tricks breaks when Torch is installed in a different location from the nvidia-* packages. This could be fixed by linking all of Torch's CUDA dependencies into libtorch_global_deps.so.
Longer version:
I'm using Torch from PyPI with the pants build system, which creates Python environments with a slightly unusual layout. Specifically, each package ends up in its own directory, rather than everything landing in site-packages as it would in a virtualenv. This causes problems when I attempt to import PyTorch 2.0.0:
ImportError Traceback (most recent call last)
<ipython-input-20-eb42ca6e4af3> in <cell line: 1>()
----> 1 import torch
~/.cache/pants/named_caches/pex_root/installed_wheels/6befaad784004b7af357e3d87fa0863c1f642866291f12a4c2af2de435e8ac5c/torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/__init__.py in <module>
--> 239 from torch._C import * # noqa: F403
240
241 # Appease the type checker; ordinarily this binding is inserted by the
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
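For context, in this layout the torch wheel and each nvidia-* wheel resolve from separate installed_wheels directories rather than from a single site-packages. Here is a hypothetical diagnostic snippet (not part of Torch) that shows where each package lives without importing torch itself, since that is what fails:
import importlib.util

# Locate the packages without executing torch/__init__.py.
for name in ('torch', 'nvidia.cudnn', 'nvidia.cusparse'):
    spec = importlib.util.find_spec(name)
    print(f"{name}: {spec.origin if spec else 'not found'}")
With the pants/pex layout each of these prints a different installed_wheels directory, whereas in a virtualenv they all sit under the same site-packages.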
I think this may point at an issue with the shared-library loading logic in Torch. Specifically, _load_global_deps() in Torch's __init__.py first attempts to load the global deps from libtorch_global_deps.so, and then attempts to load any missing libraries only if that CDLL() call fails:
# See Note [Global dependencies]
def _load_global_deps():
    # ... snip ...
    lib_name = 'libtorch_global_deps' + ('.dylib' if platform.system() == 'Darwin' else '.so')
    here = os.path.abspath(__file__)
    lib_path = os.path.join(os.path.dirname(here), 'lib', lib_name)
    try:
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        cuda_libs: Dict[str, str] = {
            'cublas': 'libcublas.so.*[0-9]',
            'cudnn': 'libcudnn.so.*[0-9]',
            'cuda_nvrtc': 'libnvrtc.so.*[0-9].*[0-9]',
            'cuda_runtime': 'libcudart.so.*[0-9].*[0-9]',
            'cuda_cupti': 'libcupti.so.*[0-9].*[0-9]',
            'cufft': 'libcufft.so.*[0-9]',
            'curand': 'libcurand.so.*[0-9]',
            'cusolver': 'libcusolver.so.*[0-9]',
            'cusparse': 'libcusparse.so.*[0-9]',
            'nccl': 'libnccl.so.*[0-9]',
            'nvtx': 'libnvToolsExt.so.*[0-9]',
        }
        is_cuda_lib_err = [lib for lib in cuda_libs.values() if lib.split('.')[0] in err.args[0]]
        # ... some more logic to load libs by looking through `sys.path` ...
On my system, the CDLL() call succeeds at loading torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/lib/libtorch_global_deps.so, so it returns immediately without attempting to load the libraries in the cuda_libs dict. However, that .so file only links against a subset of the libraries listed above:
$ ldd /long/path/to/torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/lib/libtorch_global_deps.so
linux-vdso.so.1 (0x00007ffe3b7d1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6d85c92000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6d85b41000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6d85b3b000)
libcurand.so.10 => /lib/x86_64-linux-gnu/libcurand.so.10 (0x00007f6d7ff4b000)
libcufft.so.10 => /lib/x86_64-linux-gnu/libcufft.so.10 (0x00007f6d774be000)
libcublas.so.11 => /lib/x86_64-linux-gnu/libcublas.so.11 (0x00007f6d6dd40000)
libcublasLt.so.11 => /lib/x86_64-linux-gnu/libcublasLt.so.11 (0x00007f6d58cda000)
libcudart.so.11.0 => /lib/x86_64-linux-gnu/libcudart.so.11.0 (0x00007f6d58a34000)
libnvToolsExt.so.1 => /lib/x86_64-linux-gnu/libnvToolsExt.so.1 (0x00007f6d5882a000)
libgomp-a34b3233.so.1 => /long/path/to/torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/lib/libgomp-a34b3233.so.1 (0x00007f6d58600000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6d5840e000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6d85cd9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6d58404000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6d583e7000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6d58205000)
Some libraries from cuda_libs are missing from the ldd output. This is fine when the nvidia-* Python packages are installed in the same directory as Torch, because the dynamic linker can use Torch's RPATH to find them. Specifically, the RPATH contains a bunch of relative paths to the nvidia libraries, which look like this:
$ORIGIN/../../nvidia/cublas/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cudnn/lib:$ORIGIN/../../nvidia/cufft/lib:$ORIGIN/../../nvidia/curand/lib:$ORIGIN/../../nvidia/cusolver/lib:$ORIGIN/../../nvidia/cusparse/lib:$ORIGIN/../../nvidia/nccl/lib:$ORIGIN/../../nvidia/nvtx/lib:$ORIGIN
Unfortunately, these relative paths do not work when Torch is installed in a different directory from the nvidia-* packages, which is the case for me.
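As a quick sanity check (a hypothetical snippet, not part of Torch), one can emulate those $ORIGIN-relative entries from torch/lib and see whether they point anywhere:
import glob
import importlib.util
import os

# $ORIGIN for torch/lib/*.so is the torch/lib directory itself.
torch_dir = os.path.dirname(importlib.util.find_spec('torch').origin)
origin = os.path.join(torch_dir, 'lib')
for pkg in ('cublas', 'cuda_cupti', 'cuda_nvrtc', 'cuda_runtime', 'cudnn',
            'cufft', 'curand', 'cusolver', 'cusparse', 'nccl', 'nvtx'):
    # Mirror the $ORIGIN/../../nvidia/<pkg>/lib entries from the RPATH above.
    target = os.path.normpath(os.path.join(origin, '..', '..', 'nvidia', pkg, 'lib'))
    libs = glob.glob(os.path.join(target, 'lib*.so*'))
    print(f"{pkg:12s} {target} -> {'found' if libs else 'MISSING'}")
With the pants layout every entry comes back MISSING, because the nvidia-* wheels are not two directories above torch/lib.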
__init__.py already has the logic needed to fix this problem by scanning sys.path for the missing libraries. However, that logic is currently only triggered when the libtorch_global_deps load fails. When I modify the code to always look for these libraries, I can import PyTorch again:
    try:
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
        raise OSError("libcudnn libnvrtc libcupti libcusolver libcusparse libnccl")  # always look for these libraries
    except OSError as err:
        cuda_libs: Dict[str, str] = {
            # ... etc. ...
Ideally __init__.py should use a more robust test to determine whether libcudnn and friends can be loaded. Probably the easiest fix is to link all the libs from cuda_libs into libtorch_global_deps.
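For illustration, here is a rough sketch of the "always preload" alternative. The helper name and structure are my own, not the actual torch implementation, and it assumes the nvidia-* wheels keep their nvidia/<pkg>/lib layout:
import ctypes
import glob
import os
import sys

def _preload_nvidia_libs(cuda_libs):
    # cuda_libs is the {lib_folder: soname_pattern} dict quoted above.
    for lib_folder, pattern in cuda_libs.items():
        for entry in sys.path:
            # nvidia-* wheels install their libraries under nvidia/<pkg>/lib/.
            candidates = glob.glob(os.path.join(entry, 'nvidia', lib_folder, 'lib', pattern))
            if candidates:
                try:
                    ctypes.CDLL(candidates[0], mode=ctypes.RTLD_GLOBAL)
                except OSError:
                    pass  # let the dynamic linker resolve it later if it can
                break
Calling something like this unconditionally after the libtorch_global_deps load, rather than only in the except branch, would cover layouts where the RPATH does not resolve but the wheels are still on sys.path.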
Versions
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9.5 (default, Nov 23 2021, 15:27:38) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
Is CUDA available: N/A
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000
GPU 2: NVIDIA RTX A6000
GPU 3: NVIDIA RTX A6000
GPU 4: NVIDIA RTX A6000
GPU 5: NVIDIA RTX A6000
GPU 6: NVIDIA RTX A6000
GPU 7: NVIDIA RTX A6000
Nvidia driver version: 510.60.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC 7763 64-Core Processor
Stepping: 1
Frequency boost: enabled
CPU MHz: 3249.791
CPU max MHz: 2450.0000
CPU min MHz: 1500.0000
BogoMIPS: 4900.34
Virtualization: AMD-V
L1d cache: 4 MiB
L1i cache: 4 MiB
L2 cache: 64 MiB
L3 cache: 512 MiB
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and _user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
Versions of relevant libraries:
[pip3] flake8==3.7.9
[pip3] numpy==1.17.4
[conda] No relevant packages