
Shared library loading logic breaks when CUDA packages are installed in a non-standard location #101314


🐛 Describe the bug

tl;dr: Some CUDA libraries are distributed alongside Torch via PyPI packages such as nvidia-cudnn-cu11, nvidia-cusparse-cu11, and so on. Torch's __init__.py has various tricks to find and load these libraries, but one of these tricks breaks when Torch is installed in a different location from the nvidia-* packages. This could be fixed by linking all of Torch's CUDA dependencies into libtorch_global_deps.so.
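
For reference, the nvidia-* helper wheels can be listed straight from the environment. This is just a quick sketch, assuming their .dist-info metadata is discoverable by importlib.metadata:

from importlib import metadata

# Enumerate the nvidia-* wheels (and torch itself) installed in this environment,
# together with the directory each one was installed into.
for dist in metadata.distributions():
    name = dist.metadata["Name"] or ""
    if name == "torch" or name.startswith("nvidia-"):
        print(name, dist.version, dist.locate_file(""))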


Longer version:

I'm using the Torch wheels from PyPI with the pants build system, which creates Python environments with a slightly unusual layout: each package ends up in its own directory, rather than everything landing in site-packages as it would in a virtualenv. This causes problems when I attempt to import PyTorch 2.0.0; the traceback and a small ctypes reproduction follow:

ImportError                               Traceback (most recent call last)
<ipython-input-20-eb42ca6e4af3> in <cell line: 1>()
----> 1 import torch

~/.cache/pants/named_caches/pex_root/installed_wheels/6befaad784004b7af357e3d87fa0863c1f642866291f12a4c2af2de435e8ac5c/torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/__init__.py in <module>
--> 239     from torch._C import *  # noqa: F403
    240 
    241 # Appease the type checker; ordinarily this binding is inserted by the

ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
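
The failure can also be reproduced with ctypes alone, without going through torch/__init__.py. A minimal sketch, locating torch via importlib so the broken import itself isn't needed:

import ctypes
import importlib.util
import os

# Find the installed torch package without importing it.
torch_dir = os.path.dirname(importlib.util.find_spec("torch").origin)
global_deps = os.path.join(torch_dir, "lib", "libtorch_global_deps.so")

ctypes.CDLL(global_deps, mode=ctypes.RTLD_GLOBAL)  # loads fine on my system
ctypes.CDLL("libcudnn.so.8")  # OSError: cannot open shared object file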

I think this may point at an issue with the shared library loading logic in Torch. Specifically, _load_global_deps() in Torch's __init__.py first attempts to load the global deps from libtorch_global_deps.so, and only falls back to loading the individual CUDA libraries if that CDLL() call fails:

# See Note [Global dependencies]
def _load_global_deps():
    # ... snip ...

    lib_name = 'libtorch_global_deps' + ('.dylib' if platform.system() == 'Darwin' else '.so')
    here = os.path.abspath(__file__)
    lib_path = os.path.join(os.path.dirname(here), 'lib', lib_name)

    try:
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        cuda_libs: Dict[str, str] = {
            'cublas': 'libcublas.so.*[0-9]',
            'cudnn': 'libcudnn.so.*[0-9]',
            'cuda_nvrtc': 'libnvrtc.so.*[0-9].*[0-9]',
            'cuda_runtime': 'libcudart.so.*[0-9].*[0-9]',
            'cuda_cupti': 'libcupti.so.*[0-9].*[0-9]',
            'cufft': 'libcufft.so.*[0-9]',
            'curand': 'libcurand.so.*[0-9]',
            'cusolver': 'libcusolver.so.*[0-9]',
            'cusparse': 'libcusparse.so.*[0-9]',
            'nccl': 'libnccl.so.*[0-9]',
            'nvtx': 'libnvToolsExt.so.*[0-9]',
        }
        is_cuda_lib_err = [lib for lib in cuda_libs.values() if(lib.split('.')[0] in err.args[0])]
        # ... some more logic to load libs by looking through `sys.path` ...

On my system, the CDLL() call succeeds at loading torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/lib/libtorch_global_deps.so, so it returns immediately without attempting to load the libraries in the cuda_libs dict. However, that .so file only links to a subset of the libraries listed above:

$ ldd /long/path/to/torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/lib/libtorch_global_deps.so
        linux-vdso.so.1 (0x00007ffe3b7d1000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6d85c92000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6d85b41000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6d85b3b000)
        libcurand.so.10 => /lib/x86_64-linux-gnu/libcurand.so.10 (0x00007f6d7ff4b000)
        libcufft.so.10 => /lib/x86_64-linux-gnu/libcufft.so.10 (0x00007f6d774be000)
        libcublas.so.11 => /lib/x86_64-linux-gnu/libcublas.so.11 (0x00007f6d6dd40000)
        libcublasLt.so.11 => /lib/x86_64-linux-gnu/libcublasLt.so.11 (0x00007f6d58cda000)
        libcudart.so.11.0 => /lib/x86_64-linux-gnu/libcudart.so.11.0 (0x00007f6d58a34000)
        libnvToolsExt.so.1 => /lib/x86_64-linux-gnu/libnvToolsExt.so.1 (0x00007f6d5882a000)
        libgomp-a34b3233.so.1 => /long/path/to/torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl/torch/lib/libgomp-a34b3233.so.1 (0x00007f6d58600000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6d5840e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6d85cd9000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6d58404000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6d583e7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6d58205000)

Some libraries from cuda_libs are missing from the ldd output. This is fine when the nvidia-* Python packages are installed in the same directory as Torch, because the dynamic linker can use Torch's RPATH to find them. Specifically, the RPATH contains a series of relative paths to the nvidia libraries, which look like this:

$ORIGIN/../../nvidia/cublas/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cudnn/lib:$ORIGIN/../../nvidia/cufft/lib:$ORIGIN/../../nvidia/curand/lib:$ORIGIN/../../nvidia/cusolver/lib:$ORIGIN/../../nvidia/cusparse/lib:$ORIGIN/../../nvidia/nccl/lib:$ORIGIN/../../nvidia/nvtx/lib:$ORIGIN

Unfortunately, these relative paths do not resolve when Torch is installed in a different directory from the nvidia-* packages, which is the case for me.
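
To make that concrete: $ORIGIN expands to torch/lib, so the $ORIGIN/../../nvidia/cudnn/lib entry resolves to a nvidia/ directory sitting next to the torch package. A rough sketch for checking where that entry lands on a given install (again via importlib, since import torch fails here):

import importlib.util
import os

torch_pkg = os.path.dirname(importlib.util.find_spec("torch").origin)

# "$ORIGIN/../../nvidia/cudnn/lib" with $ORIGIN = <torch_pkg>/lib:
rpath_target = os.path.normpath(
    os.path.join(torch_pkg, "lib", "..", "..", "nvidia", "cudnn", "lib"))
print(rpath_target)
print(os.path.isdir(rpath_target))  # True in a virtualenv; False in my pants environment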

__init__.py already has the logic needed to fix this problem: it can scan sys.path for the missing libraries. However, that logic is currently only triggered when loading libtorch_global_deps.so fails. When I modify the code to always look for these libraries, I can import PyTorch again:

    try:
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
        raise OSError("libcudnn libnvrtc libcupti libcusolver libcusparse libnccl")  # always look for these libraries
    except OSError as err:
        cuda_libs: Dict[str, str] = {
            # ... etc. ...

Ideally __init__.py should use a more robust test to determine whether libcudnn and friends can be loaded. Probably the easiest fix is to link all the libs from cuda_libs into libtorch_global_deps.
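
For the first option, here is a rough sketch of what an unconditional fallback could look like. The helper name is mine, not torch's; it just mirrors the existing sys.path-scanning logic so that libraries the RPATH cannot resolve are preloaded with RTLD_GLOBAL before torch._C is imported:

import ctypes
import glob
import os
import sys

def preload_cuda_lib(lib_folder: str, lib_pattern: str) -> bool:
    """Find nvidia/<lib_folder>/lib/<lib_pattern> on sys.path and dlopen it globally."""
    for path in sys.path:
        candidates = glob.glob(os.path.join(path, "nvidia", lib_folder, "lib", lib_pattern))
        if candidates:
            ctypes.CDLL(candidates[0], mode=ctypes.RTLD_GLOBAL)
            return True
    return False

# Same mapping as the cuda_libs dict above (trimmed here for brevity).
for folder, pattern in {
    "cudnn": "libcudnn.so.*[0-9]",
    "cusparse": "libcusparse.so.*[0-9]",
    "nccl": "libnccl.so.*[0-9]",
}.items():
    preload_cuda_lib(folder, pattern)

The second option (linking everything into libtorch_global_deps.so) would mean the CDLL() call fails whenever any of these libraries is missing, which is exactly the signal the existing fallback keys on.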

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.5 (default, Nov 23 2021, 15:27:38) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
Is CUDA available: N/A
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000
GPU 2: NVIDIA RTX A6000
GPU 3: NVIDIA RTX A6000
GPU 4: NVIDIA RTX A6000
GPU 5: NVIDIA RTX A6000
GPU 6: NVIDIA RTX A6000
GPU 7: NVIDIA RTX A6000

Nvidia driver version: 510.60.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC 7763 64-Core Processor
Stepping: 1
Frequency boost: enabled
CPU MHz: 3249.791
CPU max MHz: 2450.0000
CPU min MHz: 1500.0000
BogoMIPS: 4900.34
Virtualization: AMD-V
L1d cache: 4 MiB
L1i cache: 4 MiB
L2 cache: 64 MiB
L3 cache: 512 MiB
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and _user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca

Versions of relevant libraries:
[pip3] flake8==3.7.9
[pip3] numpy==1.17.4
[conda] No relevant packages


Labels: module: bazel, topic: build, triaged
