Skip to content
This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

@olupton
Copy link
Contributor

@olupton olupton commented Dec 14, 2021

Re-enable various bits of CI and don't try and enable OpenMP target offload with MOD2C.

With this change we have one new GitLab CI build, giving a total of:

  • CPU-only with the Intel compiler and MOD2C
  • CPU-only with the Intel compiler and NMODL, no SymPy
  • GPU-enabled using OpenACC, using MOD2C and OpenMP host parallelism
  • GPU-enabled using OpenACC, using NMODL and SymPy, no OpenMP host parallelism
  • GPU-enabled using OpenMP target offload, using NMODL, no SymPy (*), and OpenMP host parallelism

(*) SymPy doesn't work because of OpenMP/Eigen issues: https://forums.developer.nvidia.com/t/enabling-openmp-offload-breaks-openacc-code/196643

The other main change concerning OpenMP support with the NVIDIA compilers is that we no longer enable OpenACC. During the hackathon, when we had not migrated the data transfer code to use OpenMP, we were enabling both OpenACC and OpenMP (-acc -mp=gpu) and relying on the compilers' interoperability. This PR drops the -acc in that case. Making this work required numerous small fixes, with a lot of overlap with the draft changes for LLVM and XLC offload support.

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=master,NMODL_BRANCH=hackathon_main,

@olupton olupton force-pushed the olupton/reenable-ci branch from 7f5253c to 8542e9c Compare December 14, 2021 09:46
@olupton
Copy link
Contributor Author

olupton commented Dec 14, 2021

The CI failures seem to be somehow related to differences between the phase 1 and phase 2 GPU nodes on the cluster. I can only reproduce the failure locally on a phase 2 node, and only with 2 MPI ranks.

@olupton olupton closed this Dec 15, 2021
@olupton olupton reopened this Dec 15, 2021
@olupton olupton requested a review from iomaganaris December 15, 2021 15:44
@olupton olupton force-pushed the olupton/reenable-ci branch from 7f060ad to f4c2eb6 Compare December 16, 2021 09:00
@olupton olupton requested review from alkino and pramodk December 17, 2021 09:08
Copy link
Contributor

@iomaganaris iomaganaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. Some small questions here and there.
Another question I have whether the OpenMP + Unified memory has been tested either in the CI or locally

@olupton
Copy link
Contributor Author

olupton commented Dec 17, 2021

Another question I have whether the OpenMP + Unified memory has been tested either in the CI or locally

I think that the Jenkins CI running on #713 will be trying to do something there, but I have not checked it carefully. I never tried this myself locally.

Co-authored-by: Ioannis Magkanaris <[email protected]>
@olupton olupton force-pushed the olupton/reenable-ci branch from 8954db4 to 9f8da7c Compare December 17, 2021 10:53
@iomaganaris
Copy link
Contributor

Another question I have whether the OpenMP + Unified memory has been tested either in the CI or locally

I think that the Jenkins CI running on #713 will be trying to do something there, but I have not checked it carefully. I never tried this myself locally.

I see. Something to check before merging to master then

Copy link
Collaborator

@pramodk pramodk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
(Skimmed through changes except CI part and LGTM. When the PR will be created against master then we can discuss overall changes.)

Comment on lines +67 to +70
// It seems that with NVHPC 21.9 then only setting the default OpenMP device
// is not enough: there were errors on some nodes when not-the-0th GPU was
// used. These seemed to be related to the NMODL instance structs, which are
// allocated using cudaMallocManaged.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is still an issue but curious if this is still an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still an issue in 21.11 you mean?

@olupton olupton merged commit 3fc7037 into hackathon_main Dec 17, 2021
@olupton olupton deleted the olupton/reenable-ci branch December 17, 2021 13:53
olupton added a commit that referenced this pull request Dec 23, 2021
Summary of changes:
 - Support OpenMP target offload when NMODL and GPU support are enabled.
   (#693, #704, #705, #707, #708, #716, #719)
 - Use sensible defaults for the --nwarp parameter, improving the performance
   of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
 - Use a Boost memory pool, if Boost is available, to reduce the number of
   independent CUDA unified memory allocations used for Random123 stream
   objects. This speeds up initialisation of models using Random123, and also
   makes it feasible to use NSight Compute on models using Random123 and for
   NSight Systems to profile initialisation. (#702, #703)
 - Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended
   on the NVIDIA forums. (#721)
 - Do not compile for compute capability 6.0 by default, as this is not
   supported by NVHPC with OpenMP target offload.
 - Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and
   OpenMP. (#698, #717)
 - Add CUDA runtime header search path explicitly, so we don't rely on it being
   implicit in our NVHPC localrc.
 - Cleanup unused code. (#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
Summary of changes:
 - Support OpenMP target offload when NMODL and GPU support are enabled.
   (BlueBrain/CoreNeuron#693, BlueBrain/CoreNeuron#704, BlueBrain/CoreNeuron#705, BlueBrain/CoreNeuron#707, BlueBrain/CoreNeuron#708, BlueBrain/CoreNeuron#716, BlueBrain/CoreNeuron#719)
 - Use sensible defaults for the --nwarp parameter, improving the performance
   of the Hines solver with --cell-permute=2 on GPU. (BlueBrain/CoreNeuron#700, BlueBrain/CoreNeuron#710, BlueBrain/CoreNeuron#718)
 - Use a Boost memory pool, if Boost is available, to reduce the number of
   independent CUDA unified memory allocations used for Random123 stream
   objects. This speeds up initialisation of models using Random123, and also
   makes it feasible to use NSight Compute on models using Random123 and for
   NSight Systems to profile initialisation. (BlueBrain/CoreNeuron#702, BlueBrain/CoreNeuron#703)
 - Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended
   on the NVIDIA forums. (BlueBrain/CoreNeuron#721)
 - Do not compile for compute capability 6.0 by default, as this is not
   supported by NVHPC with OpenMP target offload.
 - Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and
   OpenMP. (BlueBrain/CoreNeuron#698, BlueBrain/CoreNeuron#717)
 - Add CUDA runtime header search path explicitly, so we don't rely on it being
   implicit in our NVHPC localrc.
 - Cleanup unused code. (BlueBrain/CoreNeuron#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants