Re-enable CI tests on hackathon branch #717
Conversation
Force-pushed from 7f5253c to 8542e9c
The CI failures seem to be somehow related to differences between the phase 1 and phase 2 GPU nodes on the cluster. I can only reproduce the failure locally on a phase 2 node, and only with 2 MPI ranks.
Force-pushed from 7f060ad to f4c2eb6
LGTM in general. Some small questions here and there.
Another question I have is whether OpenMP + unified memory has been tested, either in the CI or locally.
I think that the Jenkins CI running on #713 will be trying to do something there, but I have not checked it carefully. I never tried this myself locally.
Co-authored-by: Ioannis Magkanaris <[email protected]>
Force-pushed from 8954db4 to 9f8da7c
I see. Something to check before merging.
LGTM!
(I skimmed through the changes except the CI part and LGTM. When the PR is created against master, we can discuss the overall changes.)
// It seems that with NVHPC 21.9 then only setting the default OpenMP device
// is not enough: there were errors on some nodes when not-the-0th GPU was
// used. These seemed to be related to the NMODL instance structs, which are
// allocated using cudaMallocManaged.
Not sure if this is still an issue, but I'm curious whether it is.
Still an issue in 21.11 you mean?
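For context, a minimal sketch of the workaround the quoted comment describes, with hypothetical function and variable names (not the actual CoreNEURON code): select the GPU through both the OpenMP runtime and the CUDA runtime, so that OpenMP target regions and cudaMallocManaged allocations agree on which device to use.

```cpp
// Hypothetical sketch: bind each MPI rank to a GPU via both the OpenMP and
// CUDA runtimes, so OpenMP target regions and cudaMallocManaged allocations
// use the same device. Not the actual CoreNEURON implementation.
#include <cuda_runtime.h>
#include <omp.h>

void select_gpu_for_rank(int local_rank) {
    int num_devices = omp_get_num_devices();
    if (num_devices <= 0) {
        return;  // no GPU available, stay on the host
    }
    int device = local_rank % num_devices;

    // With NVHPC 21.9, setting only the default OpenMP device was reportedly
    // not enough when device != 0 ...
    omp_set_default_device(device);

    // ... so also point the CUDA runtime (used by cudaMallocManaged for the
    // NMODL instance structs) at the same device.
    cudaSetDevice(device);
}
```

In a multi-rank job each local rank would call something like this once during startup, before any target regions or managed allocations.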
Summary of changes:
- Support OpenMP target offload when NMODL and GPU support are enabled. (#693, #704, #705, #707, #708, #716, #719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (#702, #703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (#698, #717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Cleanup unused code. (#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>
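To illustrate the Boost memory pool item above, here is a minimal sketch, under assumed names (cuda_managed_allocator, stream_state) and not the actual CoreNEURON implementation, of fronting cudaMallocManaged with boost::pool so that many small Random123-style stream objects are carved out of a few large unified-memory blocks:

```cpp
// Hypothetical sketch: serve many small, fixed-size allocations (e.g. one per
// Random123 stream) from a Boost pool whose underlying blocks are CUDA
// unified memory. Not the actual CoreNEURON implementation.
#include <boost/pool/pool.hpp>
#include <cuda_runtime.h>
#include <cstddef>

// UserAllocator that backs the pool with cudaMallocManaged instead of new[].
struct cuda_managed_allocator {
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;

    static char* malloc(const size_type bytes) {
        void* ptr = nullptr;
        return cudaMallocManaged(&ptr, bytes) == cudaSuccess
                   ? static_cast<char*>(ptr)
                   : nullptr;
    }
    static void free(char* const block) {
        cudaFree(block);
    }
};

struct stream_state {  // stand-in for a Random123 stream object
    unsigned long long counter[4];
    unsigned long long key[2];
};

int main() {
    // One pool: a few large managed allocations, many small objects carved out.
    boost::pool<cuda_managed_allocator> pool(sizeof(stream_state));

    stream_state* s = static_cast<stream_state*>(pool.malloc());
    // ... use s on host or device ...
    pool.free(s);
    // Remaining blocks are released when the pool is destroyed.
}
```

Fewer, larger cudaMallocManaged calls also mean fewer allocations for profilers such as NSight Compute and NSight Systems to track, which is the motivation given above.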
Re-enable various bits of CI and don't try to enable OpenMP target offload with MOD2C.
With this change we have one new GitLab CI build, giving a total of:
(*) SymPy doesn't work because of OpenMP/Eigen issues: https://forums.developer.nvidia.com/t/enabling-openmp-offload-breaks-openacc-code/196643
The other main change concerning OpenMP support with the NVIDIA compilers is that we no longer enable OpenACC. During the hackathon, when we had not yet migrated the data transfer code to use OpenMP, we were enabling both OpenACC and OpenMP (`-acc -mp=gpu`) and relying on the compilers' interoperability. This PR drops the `-acc` in that case. Making this work required numerous small fixes, with a lot of overlap with the draft changes for LLVM and XLC offload support; a sketch of the kind of migration involved is below.
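As a rough illustration (hypothetical macro and function names, not the actual CoreNEURON data-transfer code), the migration amounts to replacing OpenACC unstructured data directives with their OpenMP target-offload equivalents, so that only `-mp=gpu` is needed:

```cpp
// Hypothetical sketch of moving an unstructured data region from OpenACC to
// OpenMP target offload. Not the actual CoreNEURON data-transfer code.
#include <cstddef>

void copy_to_device(double* data, std::size_t n) {
#ifdef USE_OPENACC  // hackathon-era path: requires -acc
    #pragma acc enter data copyin(data[0:n])
#else               // this PR's path: only -mp=gpu is needed
    #pragma omp target enter data map(to: data[0:n])
#endif
}

void copy_from_device(double* data, std::size_t n) {
#ifdef USE_OPENACC
    #pragma acc exit data copyout(data[0:n])
#else
    #pragma omp target exit data map(from: data[0:n])
#endif
}
```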
Use certain branches for the SimulationStack CI
CI_BRANCHES:NEURON_BRANCH=master,NMODL_BRANCH=hackathon_main,