[Hackathon] Basic OpenACC -> OpenMP migration. #693
So far:
- Pass -mp=gpu when we pass -acc
- Pass -gpu=lineinfo for better debug information.
- Pass -Minfo=accel,mp for better compile time diagnostics.
- Add nrn_{acc,omp}_pragma macros to make clang-format less painful.
- Add omp_set_default_device call so the CTest suite works.
- Transform one loop in the matrix solver from OpenACC to OpenMP.
- Drop cc60 because of OpenMP offload incompatibility.
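To illustrate the `omp_set_default_device` item above, here is a minimal sketch of selecting a default offload device per process. The PR text does not say how the device number is chosen, so the round-robin mapping and the helper names `pick_device` and `select_default_device` are assumptions made for illustration only. `omp_get_num_devices` and `omp_set_default_device` are standard OpenMP runtime routines; the sketch falls back to a no-op when compiled without OpenMP.

```cpp
#include <cassert>
#if defined(_OPENMP)
#include <omp.h>
#endif

// Hypothetical helper: map a node-local rank onto one of num_devices GPUs
// in round-robin fashion. The mapping policy is an assumption of this sketch.
int pick_device(int local_rank, int num_devices) {
    assert(local_rank >= 0 && num_devices > 0);
    return local_rank % num_devices;
}

// Hypothetical helper: make the chosen GPU the default target for later
// `#pragma omp target` regions. Returns the device number, or -1 when the
// build has no OpenMP support or no offload device is visible.
int select_default_device(int local_rank) {
#if defined(_OPENMP)
    const int num_devices = omp_get_num_devices();
    if (num_devices > 0) {
        const int device = pick_device(local_rank, num_devices);
        omp_set_default_device(device);
        return device;
    }
#endif
    (void) local_rank;  // unused in host-only builds
    return -1;
}
```

With four visible devices, a process with local rank 5 would target device 1 under this assumed policy.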
Overall looks good to me. As discussed offline, we will fix the clang-formatted part of pragmas manually.
These OpenACC pragmas are debug-only and have not been re-implemented for OpenMP target offload.
Prefer CORENEURON_ prefixes for macros, CORENRN_ prefixes for CMake variables.
LGTM. My only open question is about the seq part in cellorder.cpp. We can merge this and review that later if you like.
Summary of changes:
- Support OpenMP target offload when NMODL and GPU support are enabled. (#693, #704, #705, #707, #708, #716, #719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (#702, #703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (#698, #717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Cleanup unused code. (#711)

Co-authored-by: Pramod Kumbhar
Co-authored-by: Ioannis Magkanaris
Co-authored-by: Christos Kotsalos
Co-authored-by: Nicolas Cornu
Summary of changes:
- Support OpenMP target offload when NMODL and GPU support are enabled. (BlueBrain/CoreNeuron#693, BlueBrain/CoreNeuron#704, BlueBrain/CoreNeuron#705, BlueBrain/CoreNeuron#707, BlueBrain/CoreNeuron#708, BlueBrain/CoreNeuron#716, BlueBrain/CoreNeuron#719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (BlueBrain/CoreNeuron#700, BlueBrain/CoreNeuron#710, BlueBrain/CoreNeuron#718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (BlueBrain/CoreNeuron#702, BlueBrain/CoreNeuron#703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (BlueBrain/CoreNeuron#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (BlueBrain/CoreNeuron#698, BlueBrain/CoreNeuron#717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Cleanup unused code. (BlueBrain/CoreNeuron#711)

Co-authored-by: Pramod Kumbhar
Co-authored-by: Ioannis Magkanaris
Co-authored-by: Christos Kotsalos
Co-authored-by: Nicolas Cornu

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c
Description
Migrate some offloaded kernels from OpenACC to OpenMP.
The idea with the macros defined here is that the macros `nrn_acc_pragma` and `nrn_omp_pragma` are used to annotate expressions that can be used with either OpenACC or OpenMP, i.e.

```cpp
nrn_acc_pragma(atomic update)
nrn_omp_pragma(atomic update)
rhs += p * rhs;
```

will expand to either

```cpp
#pragma acc atomic update
rhs += p * rhs;
```

or

```cpp
#pragma omp atomic update
rhs += p * rhs;
```

Some other directives may be needed, for example if we want OpenACC directives in a build that prefers OpenMP offload then we may need code like
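Stepping back to the macros themselves, the following is a minimal sketch of how such a pair could be defined with the standard `_Pragma` operator. It is neither the example elided above nor the definitions from this PR: `SKETCH_PRAGMA`, the `SKETCH_PREFER_OPENMP_OFFLOAD` switch, and the `weighted_sum` function are hypothetical names introduced for illustration. In a build with neither OpenACC nor OpenMP enabled, both annotations expand to nothing, so the code compiles as plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stringify the directive and pass it to the standard _Pragma operator.
#define SKETCH_PRAGMA(x) _Pragma(#x)

// SKETCH_PREFER_OPENMP_OFFLOAD is a hypothetical switch for this sketch.
#if defined(SKETCH_PREFER_OPENMP_OFFLOAD) && defined(_OPENMP)
// OpenMP build: keep the omp annotation, drop the acc one.
#define nrn_acc_pragma(x)
#define nrn_omp_pragma(x) SKETCH_PRAGMA(omp x)
#elif defined(_OPENACC)
// OpenACC build: keep the acc annotation, drop the omp one.
#define nrn_acc_pragma(x) SKETCH_PRAGMA(acc x)
#define nrn_omp_pragma(x)
#else
// Host-only build: both annotations vanish.
#define nrn_acc_pragma(x)
#define nrn_omp_pragma(x)
#endif

// Hypothetical kernel body annotated for both programming models.
// Exactly one of the two annotations survives preprocessing, at most.
double weighted_sum(const std::vector<double>& values, double weight) {
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        nrn_acc_pragma(atomic update)
        nrn_omp_pragma(atomic update)
        total += weight * values[i];
    }
    return total;
}
```

Because each annotation is a function-like macro on its own line rather than a raw `#pragma`, both models can be written side by side at the call site and the preprocessor decides which one reaches the compiler.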
Use certain branches for the GitLab/SimulationStack CI
CI_BRANCHES:NMODL_BRANCH=hackathon_main,NEURON_BRANCH=master,