[Cuda] Handle -fcuda-short-ptr even with -nocudalib #111682
When passed -nocudalib/-nogpulib, Cuda's argument handling would bail out before handling -fcuda-short-ptr, meaning the frontend and backend data layouts would mismatch: the frontend still sees -fcuda-short-ptr, but the backend never receives -mllvm --nvptx-short-ptr.
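The ordering problem can be modeled in a few lines. This is a simplified, hypothetical sketch and not the clang driver code: the command line is a plain vector of strings instead of llvm::opt::ArgList, hasFlag and hasNoGpuLib are stand-ins for the ArgList queries, and -nocudalib is treated as a spelling of -nogpulib. It only shows that anything appended after the early return is silently dropped.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of CudaToolChain::addClangTargetOptions.
// The real driver uses llvm::opt::ArgList and options::OPT_* IDs.
using Args = std::vector<std::string>;

// Models ArgList::hasFlag(Pos, Neg, Default): the last of Pos/Neg wins.
bool hasFlag(const Args &args, const std::string &pos, const std::string &neg,
             bool dflt) {
  bool result = dflt;
  for (const std::string &a : args) {
    if (a == pos)
      result = true;
    else if (a == neg)
      result = false;
  }
  return result;
}

// In this sketch -nocudalib is treated as another spelling of -nogpulib.
bool hasNoGpuLib(const Args &args) {
  for (const std::string &a : args)
    if (a == "-nogpulib" || a == "-nocudalib")
      return true;
  return false;
}

void appendShortPtr(const Args &args, Args &cc1) {
  if (hasFlag(args, "-fcuda-short-ptr", "-fno-cuda-short-ptr", false)) {
    cc1.push_back("-mllvm");
    cc1.push_back("--nvptx-short-ptr");
  }
}

// Pre-patch ordering: the early return skips the short-pointer handling.
Args targetOptionsBefore(const Args &args) {
  Args cc1;
  if (hasNoGpuLib(args))
    return cc1;
  // ... library-dependent options would be added here ...
  appendShortPtr(args, cc1);
  return cc1;
}

// Patched ordering: forward the data-layout flag first, then bail out.
Args targetOptionsAfter(const Args &args) {
  Args cc1;
  appendShortPtr(args, cc1);
  if (hasNoGpuLib(args))
    return cc1;
  // ... library-dependent options would be added here ...
  return cc1;
}
```

Moving the append above the return, as the patch does, keeps the backend flag in step with the frontend whether or not the CUDA libraries are used.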
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-driver

Author: Fraser Cormack (frasercrmck)

Changes

When passed -nocudalib/-nogpulib, Cuda's argument handling would bail out before handling -fcuda-short-ptr, meaning the frontend and backend data layouts would mismatch.

Full diff: https://github.com/llvm/llvm-project/pull/111682.diff

2 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 509cd87b28c37e..7a70cf1c5694fd 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -848,6 +848,10 @@ void CudaToolChain::addClangTargetOptions(
if (CudaInstallation.version() >= CudaVersion::CUDA_90)
CC1Args.push_back("-fcuda-allow-variadic-functions");
+ if (DriverArgs.hasFlag(options::OPT_fcuda_short_ptr,
+ options::OPT_fno_cuda_short_ptr, false))
+ CC1Args.append({"-mllvm", "--nvptx-short-ptr"});
+
if (DriverArgs.hasArg(options::OPT_nogpulib))
return;
@@ -873,10 +877,6 @@ void CudaToolChain::addClangTargetOptions(
clang::CudaVersion CudaInstallationVersion = CudaInstallation.version();
- if (DriverArgs.hasFlag(options::OPT_fcuda_short_ptr,
- options::OPT_fno_cuda_short_ptr, false))
- CC1Args.append({"-mllvm", "--nvptx-short-ptr"});
-
if (CudaInstallationVersion >= CudaVersion::UNKNOWN)
CC1Args.push_back(
DriverArgs.MakeArgString(Twine("-target-sdk-version=") +
diff --git a/clang/test/Driver/cuda-short-ptr.cu b/clang/test/Driver/cuda-short-ptr.cu
new file mode 100644
index 00000000000000..e0ae4505e0b567
--- /dev/null
+++ b/clang/test/Driver/cuda-short-ptr.cu
@@ -0,0 +1,6 @@
+// Checks that cuda compilation does the right thing when passed -fcuda-short-ptr
+
+// RUN: %clang -### --target=x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 -fcuda-short-ptr -nocudainc -nocudalib --cuda-path=%S/Inputs/CUDA/usr/local/cuda %s 2>&1 | FileCheck %s
+
+// CHECK: "-mllvm" "--nvptx-short-ptr"
+// CHECK-SAME: "-fcuda-short-ptr"
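The test's two directives amount to a single-line condition on the clang -### output. The sketch below is a rough, hypothetical model using plain substring search (FileCheck itself supports regexes, variables and much more): CHECK finds the first line containing "-mllvm" "--nvptx-short-ptr", and CHECK-SAME then requires "-fcuda-short-ptr" later on that same line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the test suite: returns true if some line
// contains the CHECK pattern and, after it on the same line, the CHECK-SAME
// pattern. Like FileCheck, it does not retry with a later CHECK match.
bool matchesShortPtrChecks(const std::string &output) {
  const std::string first = "\"-mllvm\" \"--nvptx-short-ptr\"";
  const std::string second = "\"-fcuda-short-ptr\"";
  std::size_t lineStart = 0;
  while (lineStart <= output.size()) {
    std::size_t lineEnd = output.find('\n', lineStart);
    if (lineEnd == std::string::npos)
      lineEnd = output.size();
    std::string line = output.substr(lineStart, lineEnd - lineStart);
    std::size_t hit = line.find(first);
    if (hit != std::string::npos) {
      // CHECK-SAME: continue after the previous match, on this line only.
      return line.find(second, hit + first.size()) != std::string::npos;
    }
    lineStart = lineEnd + 1;
  }
  return false;
}
```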
Seems reasonable. Which architectures require this? I know that NVIDIA deprecated the 32-bit nvptx target in CUDA 12 or something.
I'm not an expert on CUDA but, AFAICT, even in 64-bit CUDA certain pointers, such as those pointing to shared memory, are 32-bit, because shared memory is only somewhere in the kB range. Using 32-bit pointers there generates better code: fewer registers, etc. Personally, I'm not sure why the option isn't enabled by default; I was just playing with it downstream and noticed this issue.
I figured it was something like that, since it saves a register per address. I don't know the history of why this isn't the default; it's pretty much just a data layout modifier stating that certain address spaces are 32-bit. Maybe @Artem-B or @jlebar can comment.
Just threw together a nonsensical example on godbolt: https://godbolt.org/z/bhdEhrxd7. Notice the
I'm not sure why we would ever want the current default if this is an option. It might be a somewhat intrusive change (albeit a relatively mechanical one), but until a GPU comes to market with more than 4 GiB of addressable shared memory, I think we should use the "short pointer" data layout as the default.
It also applies to the constant and private/local address spaces. I don't think those hit 4 GiB yet, but that is more feasible for them than for shared memory. Making address space 3 32-bit by default would make sense to me.
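Since the option boils down to a data layout modifier, it can help to see the piece it changes. The sketch below builds only the pointer-size part of such a layout string, assuming NVPTX's address-space numbering (3 = shared, 4 = constant, 5 = local) and LLVM's "pN:size:abi" data-layout syntax. The function name is hypothetical, and the full string the NVPTX backend produces contains further components not shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: emits one "-pN:32:32" entry per address space, i.e. a
// 32-bit pointer with 32-bit ABI alignment in address space N. Only the
// pointer specs are built; the rest of the NVPTX layout string is omitted.
std::string shortPointerSpecs(const std::vector<unsigned> &addrSpaces) {
  std::string specs;
  for (unsigned as : addrSpaces)
    specs += "-p" + std::to_string(as) + ":32:32";
  return specs;
}
```

Passing {3, 4, 5} covers all three spaces the thread mentions, while passing {3} alone corresponds to the suggestion of making only shared-memory pointers 32-bit by default.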
Yep, I was also considering splitting address space 3 out into its own option. At that point I suppose you might as well split out the remaining two, and have
@jhuber6 do you have any thoughts on how we'd best split up It seems as though a comma-joined option is best marshalled as a list of strings, but having a corresponding "no-" case is pretty rare, and it would undoubtedly add extra code and complicate tying it in with the old option format. Maybe we don't need
We don't need marshalling because this isn't a cc1 option. It's handled entirely by the driver, which forwards it as
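If the split were handled purely in the driver, no cc1 marshalling would be involved: the driver could parse a comma-joined value itself. The sketch below is entirely hypothetical (clang has no such option or value names); it maps names such as "shared,local" to NVPTX address spaces and rejects unknown names. How the result would reach the backend is left open, since the existing --nvptx-short-ptr switch, as described in this thread, covers the shared, constant and local spaces together.

```cpp
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: parses the value of an imagined comma-joined option into NVPTX
// address-space numbers (3 = shared, 4 = constant, 5 = local). Neither the
// option nor these value names exist in clang; they only illustrate the idea.
std::optional<std::vector<unsigned>>
parseShortPtrSpaces(const std::string &value) {
  static const std::map<std::string, unsigned> kSpaces = {
      {"shared", 3}, {"const", 4}, {"local", 5}};
  std::vector<unsigned> spaces;
  std::stringstream ss(value);
  std::string item;
  while (std::getline(ss, item, ',')) {
    auto it = kSpaces.find(item);
    if (it == kSpaces.end())
      return std::nullopt; // unknown name: a real driver would diagnose it
    spaces.push_back(it->second);
  }
  return spaces;
}
```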
While it does indeed help generate better code, using this option when compiling CUDA code may be problematic. IMO short pointers are currently only safe to use at the IR level. You may get by using them from CUDA, but I do not think that enabling them by default is a good idea.