This repository was archived by the owner on Mar 20, 2023. It is now read-only.

@olupton olupton commented Nov 23, 2021

Description
Migrate some offloaded kernels from OpenACC to OpenMP.

The macros defined here, nrn_acc_pragma and nrn_omp_pragma, annotate statements that can be offloaded with either OpenACC or OpenMP. For example,

nrn_acc_pragma(atomic update)
nrn_omp_pragma(atomic update)
rhs += p * rhs;

will expand to either

#pragma acc atomic update
rhs += p * rhs;

or

#pragma omp atomic update
rhs += p * rhs;
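
A minimal sketch of how such a macro pair could be defined, for illustration only. The selection macro SKETCH_PREFER_OPENMP_OFFLOAD, the helper SKETCH_STRINGIFY and the function add_contribution are hypothetical names, not taken from this PR, and the actual definitions in the PR may differ. The sketch uses a distinct right-hand operand because OpenMP's atomic update does not allow the expression to read the location being updated.

```cpp
// Hypothetical build-time switch; a real build would set this from CMake.
// #define SKETCH_PREFER_OPENMP_OFFLOAD

// Turn the macro arguments into a string literal for _Pragma.
#define SKETCH_STRINGIFY(...) #__VA_ARGS__

#if defined(SKETCH_PREFER_OPENMP_OFFLOAD) && defined(_OPENMP)
// OpenMP build: the OpenACC annotation vanishes, the OpenMP one becomes a pragma.
#define nrn_acc_pragma(...)
#define nrn_omp_pragma(...) _Pragma(SKETCH_STRINGIFY(omp __VA_ARGS__))
#elif defined(_OPENACC)
// OpenACC build: the reverse.
#define nrn_acc_pragma(...) _Pragma(SKETCH_STRINGIFY(acc __VA_ARGS__))
#define nrn_omp_pragma(...)
#else
// Plain CPU build: both annotations expand to nothing.
#define nrn_acc_pragma(...)
#define nrn_omp_pragma(...)
#endif

// Accumulate p * d into *rhs; at most one of the two annotations is active.
void add_contribution(double* rhs, double p, double d) {
    nrn_acc_pragma(atomic update)
    nrn_omp_pragma(atomic update)
    *rhs += p * d;
}
```

Because each annotation sits on its own line and looks like a function call, clang-format has less opportunity to rearrange it than a raw _Pragma string.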

Some other directives may be needed. For example, if we want to keep OpenACC directives in a build that prefers OpenMP offload, we may need code like

#ifdef CORENRN_PREFER_OPENMP_OFFLOAD
// Make sure OpenACC work is done before we start OpenMP work
if (nt->compute_gpu) {
    _Pragma("acc wait(nt->stream_id)")
}
#endif
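
To show where such a wait would sit, here is a rough sketch of one function that runs an asynchronous OpenACC loop followed by an OpenMP target loop on the same array. ThreadSketch and double_then_increment are hypothetical stand-ins, not code from this PR; the directives are guarded so the function also compiles and runs serially without either programming model enabled.

```cpp
// Hypothetical stand-in for the thread structure used in the snippet above.
struct ThreadSketch {
    bool compute_gpu;  // true if this thread offloads to a GPU
    int stream_id;     // OpenACC async queue used by this thread
};

void double_then_increment(ThreadSketch* nt, double* v, int n) {
    // Step 1: OpenACC kernel, queued asynchronously on the thread's stream.
#ifdef _OPENACC
#pragma acc parallel loop copy(v[0:n]) async(nt->stream_id) if(nt->compute_gpu)
#endif
    for (int i = 0; i < n; ++i) {
        v[i] *= 2.0;
    }

    // Step 2: make sure the OpenACC work is done before OpenMP work starts.
#ifdef _OPENACC
    if (nt->compute_gpu) {
#pragma acc wait(nt->stream_id)
    }
#endif

    // Step 3: OpenMP target region operating on the same data.
#ifdef _OPENMP
#pragma omp target teams distribute parallel for map(tofrom: v[0:n]) if(nt->compute_gpu)
#endif
    for (int i = 0; i < n; ++i) {
        v[i] += 1.0;
    }
}
```

Without the wait in step 2, the OpenMP region could start while the asynchronous OpenACC kernel is still running.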

Use certain branches for the GitLab/SimulationStack CI
CI_BRANCHES:NMODL_BRANCH=hackathon_main,NEURON_BRANCH=master,

@olupton olupton force-pushed the olupton/basic-openmp branch from e3c8352 to 713b7ad Compare November 29, 2021 09:25
@olupton olupton force-pushed the olupton/basic-openmp branch from 174211a to 80da782 Compare November 29, 2021 14:13
@olupton olupton closed this Nov 30, 2021
@olupton olupton reopened this Nov 30, 2021
@olupton olupton force-pushed the olupton/basic-openmp branch from 1effa13 to e868af4 Compare November 30, 2021 15:39
@olupton olupton closed this Dec 1, 2021
@olupton olupton reopened this Dec 1, 2021
So far:
- Pass -mp=gpu when we pass -acc.
- Pass -gpu=lineinfo for better debug information.
- Pass -Minfo=accel,mp for better compile time diagnostics.
- Add nrn_{acc,omp}_pragma macros to make clang-format less painful.
- Add omp_set_default_device call so the CTest suite works.
- Transform one loop in the matrix solver from OpenACC to OpenMP.
- Drop cc60 because of OpenMP offload incompatibility.
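
The commit message does not say how the default device is chosen. As a sketch only, assuming each process picks a device round-robin from its node-local rank (pick_device and select_default_device are hypothetical names, and local_rank is assumed to be non-negative), the omp_set_default_device call could look like:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: map a node-local rank to a device index, round-robin.
// Returns -1 when no device is available.
int pick_device(int local_rank, int num_devices) {
    return num_devices > 0 ? local_rank % num_devices : -1;
}

// Select the default OpenMP target device for this process.
// Returns the chosen device index, or -1 if nothing was selected.
int select_default_device(int local_rank) {
#ifdef _OPENMP
    const int num_devices = omp_get_num_devices();
    const int device = pick_device(local_rank, num_devices);
    if (device >= 0) {
        // Later target regions without a device clause use this device.
        omp_set_default_device(device);
    }
    return device;
#else
    (void) local_rank;  // OpenMP disabled: nothing to select
    return -1;
#endif
}
```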
@olupton olupton force-pushed the olupton/basic-openmp branch from 572066f to 8cad9eb Compare December 2, 2021 10:06
@pramodk pramodk left a comment

Overall looks good to me. As discussed offline, we will fix the clang-formatted part of pragmas manually.

Prefer CORENEURON_ prefixes for macros, CORENRN_ prefixes for CMake
variables.
@olupton olupton closed this Dec 7, 2021
@olupton olupton reopened this Dec 7, 2021
@pramodk pramodk left a comment

LGTM. My only open question is about the seq part in cellorder.cpp. We can merge this and review it later if you like.

@olupton olupton merged commit 21dc2c8 into hackathon_main Dec 7, 2021
@olupton olupton deleted the olupton/basic-openmp branch December 7, 2021 12:13
olupton added a commit that referenced this pull request Dec 23, 2021
Summary of changes:
 - Support OpenMP target offload when NMODL and GPU support are enabled.
   (#693, #704, #705, #707, #708, #716, #719)
 - Use sensible defaults for the --nwarp parameter, improving the performance
   of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
 - Use a Boost memory pool, if Boost is available, to reduce the number of
   independent CUDA unified memory allocations used for Random123 stream
   objects. This speeds up initialisation of models using Random123, and also
   makes it feasible to use NSight Compute on models using Random123 and for
   NSight Systems to profile initialisation. (#702, #703)
 - Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended
   on the NVIDIA forums. (#721)
 - Do not compile for compute capability 6.0 by default, as this is not
   supported by NVHPC with OpenMP target offload.
 - Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and
   OpenMP. (#698, #717)
 - Add CUDA runtime header search path explicitly, so we don't rely on it being
   implicit in our NVHPC localrc.
 - Cleanup unused code. (#711)

Co-authored-by: Pramod Kumbhar
Co-authored-by: Ioannis Magkanaris
Co-authored-by: Christos Kotsalos
Co-authored-by: Nicolas Cornu
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
Summary of changes:
 - Support OpenMP target offload when NMODL and GPU support are enabled.
   (BlueBrain/CoreNeuron#693, BlueBrain/CoreNeuron#704, BlueBrain/CoreNeuron#705, BlueBrain/CoreNeuron#707, BlueBrain/CoreNeuron#708, BlueBrain/CoreNeuron#716, BlueBrain/CoreNeuron#719)
 - Use sensible defaults for the --nwarp parameter, improving the performance
   of the Hines solver with --cell-permute=2 on GPU. (BlueBrain/CoreNeuron#700, BlueBrain/CoreNeuron#710, BlueBrain/CoreNeuron#718)
 - Use a Boost memory pool, if Boost is available, to reduce the number of
   independent CUDA unified memory allocations used for Random123 stream
   objects. This speeds up initialisation of models using Random123, and also
   makes it feasible to use NSight Compute on models using Random123 and for
   NSight Systems to profile initialisation. (BlueBrain/CoreNeuron#702, BlueBrain/CoreNeuron#703)
 - Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended
   on the NVIDIA forums. (BlueBrain/CoreNeuron#721)
 - Do not compile for compute capability 6.0 by default, as this is not
   supported by NVHPC with OpenMP target offload.
 - Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and
   OpenMP. (BlueBrain/CoreNeuron#698, BlueBrain/CoreNeuron#717)
 - Add CUDA runtime header search path explicitly, so we don't rely on it being
   implicit in our NVHPC localrc.
 - Cleanup unused code. (BlueBrain/CoreNeuron#711)

Co-authored-by: Pramod Kumbhar
Co-authored-by: Ioannis Magkanaris
Co-authored-by: Christos Kotsalos
Co-authored-by: Nicolas Cornu

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c