This repository was archived by the owner on Feb 26, 2025. It is now read-only.

@pramodk pramodk commented Dec 10, 2021

  • Add NVIDIA HPC-SDK 21.11
  • Pass -mno-abm to 21.11 NVIDIA compilers to avoid an issue with the __ABM__ macro and Random123 (https://forums.developer.nvidia.com/t/21-11-tp-behaviour/198115)
  • Add + install CUDA 11.5.1, which matches the version bundled with HPC-SDK 21.11 (🤞 that "Alternate CUDA provider", spack/spack#19365, progresses soon)
  • Patch localrc generation for NVIDIA compilers to run in a cleaner environment. Previously it ran after the compiler configuration for LLVM, where the sequence module load llvm ... module unload llvm left LLVM's autoloaded dependencies, python and cuda, in the environment, and their paths made their way into the localrc file.
  • Don't set CMAKE_C_FLAGS for CoreNEURON + NMODL builds that contain no C code.
  • Add a helper script, deploy/set-compiler-flags.py, which is no longer used. cc: @matz-e, who was interested in tweaking it.
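
The cleaner-environment idea from the localrc item above can be sketched roughly as follows. This is an illustration in Python, not the patch from this PR: the allow-list `KEEP`, the helper names `clean_env` and `run_clean`, the fixed `PATH`, and the assumption that the stray python and cuda paths arrive through inherited environment variables (for example `CPATH`) are all hypothetical.

```python
import os
import subprocess

# Illustrative allow-list; which variables are really needed is an assumption.
KEEP = ("HOME", "USER", "LANG")


def clean_env(base=None, path="/usr/bin:/bin"):
    """Build a minimal environment from `base`, dropping everything not in KEEP."""
    base = os.environ if base is None else base
    env = {name: base[name] for name in KEEP if name in base}
    env["PATH"] = path  # fixed PATH, so module-provided directories are gone
    return env


def run_clean(cmd, base=None):
    """Run `cmd` with the scrubbed environment and return the completed process."""
    return subprocess.run(
        cmd, env=clean_env(base), check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Hypothetical leftovers from `module load llvm ... module unload llvm`.
    polluted = {
        "HOME": "/home/user",
        "PATH": "/opt/llvm/bin:/usr/bin",
        "CPATH": "/opt/cuda/include:/opt/python/include",
    }
    print(sorted(clean_env(polluted)))
```

Building the environment from an allow-list, rather than unsetting known offenders one by one, means that variables added by future module dependencies are dropped without further changes.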

@pramodk pramodk marked this pull request as draft December 10, 2021 00:41

@matz-e matz-e left a comment

Can you file this against merge-upstream, too?

olupton commented Dec 10, 2021

Retest this please Jenkins.

olupton commented Dec 13, 2021

Retest this please Jenkins.

olupton commented Dec 13, 2021

Retest this please Jenkins.

olupton commented Dec 13, 2021

I believe the failures are because the new NVC++ with its default target architecture (-tp host) defines __ABM__.

I'm not quite sure what's going on, as

$ nvc++ -V
nvc++ 21.11-0 64-bit target on x86-64 Linux -tp skylake-avx512

suggests it is auto-detecting -tp skylake-avx512, but

$ nvc++ -tp skylake-avx512 test.cpp
"test.cpp", line 2: error: identifier "__ABM__" is undefined
    return __ABM__;
           ^

while

$ nvc++ test.cpp

compiles successfully.
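
A probe like the one above could be scripted along these lines, to compare compiler flags side by side. This is a hedged sketch: the helper names `probe_source` and `macro_is_defined` are made up for illustration, the probe body is only a guess at what test.cpp contained (the error message shows just its second line), and the check assumes the macro expands to something valid in a `return` statement.

```python
import os
import subprocess
import tempfile


def probe_source(macro):
    """Return a C++ translation unit that only compiles if `macro` is defined."""
    return f"int main() {{\n    return {macro};\n}}\n"


def macro_is_defined(compiler, macro, flags=()):
    """Compile the probe; success means `macro` is defined to a returnable value."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "test.cpp")
        obj = os.path.join(tmp, "test.o")
        with open(src, "w") as handle:
            handle.write(probe_source(macro))
        # Compile only (-c); the compiler must be on PATH or given as a full path.
        result = subprocess.run(
            [compiler, *flags, "-c", src, "-o", obj],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0
```

With `nvc++` available, calling `macro_is_defined("nvc++", "__ABM__")` and `macro_is_defined("nvc++", "__ABM__", ["-tp", "skylake-avx512"])` would reproduce the two cases above, and passing `["-mno-abm"]` would show whether that flag suppresses the macro.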

I guess we need to report this to the NVIDIA forums.

I also see in the generated localrc file:

set GPPDIR= /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/externals/2021-01-06/linux-rhel7-x86_64/gcc-9.3.0/python-3.8.3-suxrst/include/python3.8 /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/externals/2021-01-06/linux-rhel7-x86_64/gcc-9.3.0/cuda-11.0.2-kb4wci/include /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/externals/2021-01-06/linux-rhel7-x86_64/gcc-9.3.0/python-3.8.3-suxrst/include /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2021-01-06/linux-rhel7-x86_64/gcc-4.8.5/gcc-9.3.0-45gzrp/include/c++/9.3.0 /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2021-01-06/linux-rhel7-x86_64/gcc-4.8.5/gcc-9.3.0-45gzrp/include/c++/9.3.0/x86_64-pc-linux-gnu /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2021-01-06/linux-rhel7-x86_64/gcc-4.8.5/gcc-9.3.0-45gzrp/include/c++/9.3.0/backward /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2021-01-06/linux-rhel7-x86_64/gcc-4.8.5/gcc-9.3.0-45gzrp/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include /usr/local/include /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2021-01-06/linux-rhel7-x86_64/gcc-4.8.5/gcc-9.3.0-45gzrp/include /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2021-01-06/linux-rhel7-x86_64/gcc-4.8.5/gcc-9.3.0-45gzrp/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include-fixed /usr/include;

which looks like it contains a lot of irrelevant entries, for example the python and cuda include directories.
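
A check for such stray entries could look roughly like this. It is only a sketch: the function names `parse_gppdir` and `unexpected_dirs` are invented here, the example paths are shortened and hypothetical, and treating the GCC installation prefix plus `/usr/` as the only legitimate prefixes is an assumption.

```python
def parse_gppdir(line):
    """Split a `set GPPDIR= dir1 dir2 ...;` line into its directories."""
    name, sep, value = line.partition("=")
    if not sep or name.split() != ["set", "GPPDIR"]:
        raise ValueError("not a `set GPPDIR=` line")
    return value.strip().rstrip(";").split()


def unexpected_dirs(dirs, allowed_prefixes):
    """Return the directories that start with none of the allowed prefixes."""
    return [d for d in dirs if not d.startswith(tuple(allowed_prefixes))]


# Shortened, hypothetical paths; the real entries are much longer.
EXAMPLE = (
    "set GPPDIR= /opt/externals/python-3.8.3/include/python3.8 "
    "/opt/externals/cuda-11.0.2/include "
    "/opt/compilers/gcc-9.3.0/include/c++/9.3.0 "
    "/usr/local/include /usr/include;"
)

if __name__ == "__main__":
    leaked = unexpected_dirs(
        parse_gppdir(EXAMPLE),
        allowed_prefixes=("/opt/compilers/gcc-9.3.0/", "/usr/"),
    )
    for directory in leaked:
        print(directory)
```

Run against a freshly generated localrc, a non-empty result would indicate that the environment was not clean when the file was produced.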

@olupton olupton force-pushed the pramodk/nvhpc-21.11 branch from 75606e6 to b71ea24 Compare December 15, 2021 14:34

olupton commented Dec 15, 2021

Note that the release notes say CUDA 11.5.1 is included, so I think we should install that one. Hopefully Spack will learn how to express that nvhpc already contains CUDA "soon"...

@olupton olupton changed the title from "Add NVHPC 21.11 release (only x86_64 linux)" to "Add NVHPC/21.11 and CUDA/11.5.1" Dec 16, 2021
@olupton olupton requested a review from matz-e December 16, 2021 11:18

olupton commented Dec 17, 2021

@pramodk I can't request a review from you because you're the original author 😅

@pramodk pramodk left a comment

Overall LGTM if everything is working in production :)

@olupton olupton marked this pull request as ready for review December 21, 2021 10:53
@olupton olupton merged commit 596e000 into develop Dec 21, 2021
@olupton olupton deleted the pramodk/nvhpc-21.11 branch December 21, 2021 12:09
olupton added a commit that referenced this pull request Dec 21, 2021
* Add + deploy NVIDIA HPC SDK 21.11.
* Add + deploy CUDA 11.5.1 to match it.
* Do not define CMAKE_C_FLAGS in `coreneuron+nmodl`.
* Run `makelocalrc` for `nvhpc` in a cleaner environment.
* Add `-mno-abm` to [Core]NEURON recipes for `[email protected]`.

Co-authored-by: Olli Lupton <[email protected]>
olupton added a commit to BlueBrain/CoreNeuron that referenced this pull request Dec 21, 2021
Presumably this was working before because our nvhpc localrc files
accidentally included CUDA include directories before
BlueBrain/spack#1392.
olupton added a commit that referenced this pull request Dec 22, 2021
* Add + deploy NVIDIA HPC SDK 21.11.
* Add + deploy CUDA 11.5.1 to match it.
* Do not define CMAKE_C_FLAGS in `coreneuron+nmodl`.
* Run `makelocalrc` for `nvhpc` in a cleaner environment.
* Add `-mno-abm` to [Core]NEURON recipes for `[email protected]`.

Co-authored-by: Olli Lupton <[email protected]>
olupton added a commit that referenced this pull request Dec 22, 2021
* Cherry-pick of #1392.
* Add + deploy NVIDIA HPC SDK 21.11.
* Add + deploy CUDA 11.5.1 to match it.
* Do not define CMAKE_C_FLAGS in `coreneuron+nmodl`.
* Run `makelocalrc` for `nvhpc` in a cleaner environment.
* Use GCC 11.2.0 for NVHPC instead of GCC 9.4.0.
* Add `-mno-abm` to [Core]NEURON recipes for `[email protected]`.

Co-authored-by: Pramod Kumbhar <[email protected]>
ohm314 pushed a commit to BlueBrain/CoreNeuron that referenced this pull request Dec 23, 2021
Presumably this was working before because our nvhpc localrc files
accidentally included CUDA include directories before
BlueBrain/spack#1392.