-
Notifications
You must be signed in to change notification settings - Fork 201
Support non-CTK Nvidia libraries #864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support non-CTK Nvidia libraries #864
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
This comment has been minimized.
This comment has been minimized.
/ok to test |
/ok to test |
…ep find_all_so_files_via_metadata as fallback
…longer needed due to resolution of NVIDIA#608).
… keep find_all_dll_files_via_metadata as fallback
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a quick pass. Will resume tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the two make_site_packages_libdirs_*.py
files can be merged into one. You already handled the directory separators (/
vs \
or \\
) properly, and always use the former in supported_nvidia_libs.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot Could you please merge the newly added toolshed/make_site_packages_libdirs_*.py
scripts into one?
(These scripts were both auto-generated by ChatGPT (with a very minimal effort), I decided it's not worth a manual effort consolidating them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copilot doesn't listen here. I tried via the Spaces feature, but that's only GPT 4.1, and it prompted me to attach files manually. So I went back to ChatGPT 5 Pro, the thinking version, that I mostly use.
I'm attaching what it generated. I'll have time to try it out only later. The line counts:
76 make_site_packages_libdirs_linux.py
110 make_site_packages_libdirs_windows.py
155 make_site_packages_libdirs.py
Do you think it's worth more effort? The original scripts were generated in ~5 minutes each, including writing the prompts.
Throwing them away seems wrong.
Working on the more, I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've manually consolidated the two script now: commit 69d7c98
Even ChatGPT 5 Pro didn't get it right the first time through, two iterations later it was a pretty big mess, then I just did it manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to make copilot as an assignee before we can "at" it, but we probably can't do so in an existing PR, only before a PR is created.
…ll_using_nvidia_bin_dirs and associated foreign_wheels unit test
As discussed in our meeting today: I removed the fallback code paths in find_nvidia_dynamic_lib.py commit 5711602 |
/ok to test |
…libdirs_windows.py → make_site_packages_libdirs.py
901dc47
to
69d7c98
Compare
@leofang I'm holding off rerunning the tests, pending your full review. (I think the delta between the last time the full tests ran is small.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks a lot for the hard and thorough work Ralf!
|
||
LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY = ( | ||
"cufft", | ||
"nvrtc", | ||
) | ||
|
||
LIBNAMES_REQUIRING_RTLD_DEEPBIND = ("cufftMp",) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: is deepbind needed so as to avoid symbol collision with libcufft?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but would be nice to revisit after the dust is settled.
I believe this is another file that can be merged with find_site_packages_so.py
. If the names dll
vs so
are bothering you, just refer to them as, say, dso
(dynamic shared libraries).
Even on Windows, we also have the same version "suffix" concept despite they are not literally the suffices. Taking cuBLASLt as example: cublasLt64_12.dll
is the Windows-equivalent of libcublasLt.so.12
on Linux, and 12 is the "suffix" . So we could have a split_dso_version_suffix
that handles both platforms, and then find_all_dso_files_via_metadata
is unified.
The same reasoning that find_all_so_files_via_metadata
returns a dict of dicts, strictly speaking, also applies to Windows, so a later unification of these two files can also make our treatment more robust.
"nvfatbin": ("nvidia/cu13/lib", "nvidia/nvfatbin/lib"), | ||
"nvjpeg": ("nvidia/cu13/lib", "nvidia/nvjpeg/lib"), | ||
"nvrtc": ("nvidia/cu13/lib", "nvidia/cuda_nvrtc/lib"), | ||
"nvvm": ("nvidia/cu13/lib", "nvidia/cuda_nvcc/nvvm/lib64"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh man, so in CUDA 13 the nvvm
subdir still exists, but only libdevice is still kept there while everything else is moved to bin/include/lib... 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is annoying... so the wheel layout for NVVM is changed, but not the system CTK which still has $CUDA_PATH/nvvm
...
$ tree /usr/local/cuda-13.0/nvvm/
/usr/local/cuda-13.0/nvvm/
├── bin
│ └── cicc
├── include
│ └── nvvm.h
├── lib64
│ ├── libnvvm.so -> libnvvm.so.4
│ ├── libnvvm.so.4 -> libnvvm.so.4.0.0
│ └── libnvvm.so.4.0.0
└── libdevice
└── libdevice.10.bc
4 directories, 6 files
/ok to test 69d7c98 |
Thanks Leo! |
|
Closes #776, #823, #856
Bump cuda-pathfinder version to
1.1.1a3
Adds support for non-CTK Nvidia libraries, as needed for
nvmath
:CUDA-related:
mathdx
([FEA]: Support for finding libmathdx through pathfinder #776)cufftMp
([FEA]: pathfinder support forcufftMp
wheel #856)nvshmem_host
Non-CUDA:
nvpl_fftw
A general approach for finding .so/.dll files under site-packages was implemented in these files:
cuda/pathfinder/_utils/find_site_packages_dll.py
cuda/pathfinder/_utils/find_site_packages_so.py
However, this turned out to be relatively slow (~1.9 seconds in a certain environment with a network drive), even though only
RECORD
files are inspected, via the stdlibimportlib.metadata
API. (An initial version that simply walked all site-packages directories was prohibitively slow, ~100 seconds in the same environment.)To maximize performance, and to make supporting more libraries more intutive at the same time, the approach to searching site-packages was changed to rely on new dictionaries added in
supported_nvidia_libs.py
:SITE_PACKAGES_LIBDIRS_LINUX
SITE_PACKAGES_LIBDIRS_WINDOWS
These dictionaries are based on the output from two pairs of helper scripts added under the
toolshed/
directory:collect_site_packages_so_files.sh
,make_site_packages_libdirs_linux.py
collect_site_packages_dll_files.ps1
,make_site_packages_libdirs_windows.py
With this, the same test that took 1.9 seconds to complete with the slower
importlib.metadata
-based approach completes in ~0.2 seconds (similar to status quo). — The slower approach is now used only fromtest_load_nvidia_dynamic_lib.py
, as an easy way to determine dynamically which wheels are available on each platform.A simple
LIBNAMES_REQUIRING_RTLD_DEEPBIND
feature was added to supportcufftMp
.Piggy-backed:
tests/child_load_nvidia_dynamic_lib_helper.py
was factored out oftests/test_load_nvidia_dynamic_lib.py
to improve test performance, especially on Windows (factor 2x).