Rebuild ESPResSo for CUDA sanity check #1168
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
|
Hm... seems to be compiled for SM_52, and with PTX code for SM_52. So, something needs to be fixed here: how do we convince the ESPResSo build system to compile for a different CUDA arch? |
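One way to see which architectures a binary carries is to list its embedded device code (for example with `cuobjdump --list-elf` and `cuobjdump --list-ptx`) and pull the `sm_XX` tags out of the listing. A minimal sketch of that parsing step follows; the sample listing and its file names are made up for illustration, and the exact output layout may differ between CUDA versions.

```python
import re

# Illustrative listing only: the file names are hypothetical, and the exact
# layout printed by cuobjdump may differ between CUDA versions.
SAMPLE_LISTING = """\
ELF file    1: example.1.sm_52.cubin
PTX file    1: example.1.sm_52.ptx
"""


def device_code_archs(listing):
    """Return (sm_archs, ptx_archs) found in a cuobjdump-style listing."""
    sm_archs, ptx_archs = set(), set()
    for line in listing.splitlines():
        match = re.search(r"sm_(\d+)", line)
        if match is None:
            continue  # line carries no architecture tag
        if line.startswith("ELF file"):
            sm_archs.add(match.group(1))   # native device code (SASS)
        elif line.startswith("PTX file"):
            ptx_archs.add(match.group(1))  # PTX for JIT compilation
    return sm_archs, ptx_archs


if __name__ == "__main__":
    print(device_code_archs(SAMPLE_LISTING))
```

If only `52` shows up in both sets, the build ignored the requested target (e.g. `cc80`), which matches the observation above.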
|
Seems like that's hardcoded here for version 4.2.2: Though new versions apparently will allow you to explicitly define this: |
|
The following patch should work:
diff --git a/cmake/FindCUDACompilerNVCC.cmake b/cmake/FindCUDACompilerNVCC.cmake
index 08f9db312..f68d4db94 100644
--- a/cmake/FindCUDACompilerNVCC.cmake
+++ b/cmake/FindCUDACompilerNVCC.cmake
@@ -52,8 +52,16 @@ list(APPEND CUDA_NVCC_FLAGS_COVERAGE -O3 -g -Xptxas=-O3 -Xcompiler=-Og,-g)
 list(APPEND CUDA_NVCC_FLAGS_RELWITHASSERT -O3 -g -Xptxas=-O3 -Xcompiler=-O3,-g)
-if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 11)
- list(APPEND CUDA_NVCC_FLAGS -gencode=arch=compute_30,code=sm_30)
+if(NOT DEFINED ESPRESSO_CUDA_ARCHITECTURES)
+ if("$ENV{CUDAARCHS}" STREQUAL "")
+ set(ESPRESSO_CUDA_ARCHITECTURES "75;86;89" CACHE INTERNAL "")
+ else()
+ set(ESPRESSO_CUDA_ARCHITECTURES "$ENV{CUDAARCHS}" CACHE INTERNAL "")
+ endif()
 endif()
+foreach(ESPRESSO_CUDA_ARCH ${ESPRESSO_CUDA_ARCHITECTURES})
+ list(APPEND CUDA_NVCC_FLAGS
+ "-gencode=arch=compute_${ESPRESSO_CUDA_ARCH},code=sm_${ESPRESSO_CUDA_ARCH}"
+ "-gencode=arch=compute_${ESPRESSO_CUDA_ARCH},code=compute_${ESPRESSO_CUDA_ARCH}")
+endforeach()
 list(APPEND CUDA_NVCC_FLAGS
- -gencode=arch=compute_52,code=sm_52
- -gencode=arch=compute_52,code=compute_52 -std=c++${CMAKE_CUDA_STANDARD}
+ -std=c++${CMAKE_CUDA_STANDARD}
 $<$<BOOL:${WARNINGS_ARE_ERRORS}>:-Xcompiler=-Werror;-Xptxas=-Werror>
With this one can invoke the |
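To make the precedence in the patch easier to follow, here is a small Python sketch that mirrors its logic: an explicitly defined `ESPRESSO_CUDA_ARCHITECTURES` wins, otherwise a non-empty `CUDAARCHS` environment variable is used, otherwise the fallback `75;86;89` applies. The function names `resolve_archs` and `gencode_flags` are hypothetical and exist only for this illustration; the real logic lives in the CMake file.

```python
import os

# Fallback list taken from the patch above.
DEFAULT_ARCHS = "75;86;89"


def resolve_archs(explicit=None, environ=None):
    """Mirror the patch: explicit value > CUDAARCHS env var > default."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        value = explicit  # ESPRESSO_CUDA_ARCHITECTURES was defined
    else:
        # An unset or empty CUDAARCHS falls back to the default list.
        value = environ.get("CUDAARCHS", "") or DEFAULT_ARCHS
    return [arch for arch in value.split(";") if arch]


def gencode_flags(archs):
    """Emit one SASS and one PTX -gencode flag per architecture."""
    flags = []
    for arch in archs:
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
        flags.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags(resolve_archs(environ={"CUDAARCHS": "80"})):
        print(flag)
```

Each architecture gets both native code (`code=sm_XX`) and PTX (`code=compute_XX`), the same pair that was previously hardcoded for `52`.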
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
edit: oh, of course, we need to set |
|
Or pass |
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc70 |
|
New job on instance
|
Only need to do one more build with the Surf bot, but the cluster is currently down. |
|
Snellius is back online. Not sure if the bot is already available, but let's give it a try: bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90 |
|
New job on instance
|
|
The Surf job was OOM killed. Don't really understand this, the other job that ran on Snellius (for icelake+cc80, see #1168 (comment)) only used: |
|
Let's try it one more time: bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90 |
|
New job on instance
|
|
Staging PR merged, all tarballs have been ingested 🎉 |
Note: I've seen in an interactive build that this didn't pass the CUDA sanity check. So we may have to investigate why not (i.e. which files don't provide the correct device code, and why), and how to make it pass...
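For the follow-up investigation, a rough sketch of how one could narrow down which files lack device code for the requested target. It assumes the per-file architecture sets have already been collected (e.g. from `cuobjdump` listings); the file names, the helper names, and the simple "target must be present" rule are all hypothetical, and the actual EasyBuild sanity check may apply different criteria.

```python
import re


def target_arch(accel):
    """Extract '80' from a target string such as 'accel=nvidia/cc80'."""
    match = re.search(r"nvidia/cc(\d+)", accel)
    if match is None:
        raise ValueError(f"no compute capability found in {accel!r}")
    return match.group(1)


def files_missing_arch(archs_by_file, arch):
    """Return the sorted names of files whose device code lacks `arch`."""
    return sorted(
        name for name, archs in archs_by_file.items() if arch not in archs
    )


if __name__ == "__main__":
    # Hypothetical data: file names and architecture sets are made up.
    found = {
        "libexample_a.so": {"52"},
        "libexample_b.so": {"52", "80"},
    }
    print(files_missing_arch(found, target_arch("accel=nvidia/cc80")))
```

Files reported here would be the first candidates to check for build rules that bypass the `-gencode` flags set by the patched CMake module.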