This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

@olupton (Contributor) commented Dec 1, 2021

This speeds up initialisation when running on GPU if Boost is available.

Previously, many small Random123 stream objects were each allocated separately, ultimately via cudaMallocManaged in GPU builds. This is very slow and makes setup on GPU much slower than on CPU.

This change places a pool allocator "in front of" cudaMallocManaged, which both makes allocation faster and (hopefully) reduces the number of unified memory page faults during simulation.

In a small channel-benchmark-based test this makes model setup 3x faster.

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=master,

This speeds up initialisation when running on GPU.
@pramodk (Collaborator) left a comment:

👎 boost

@olupton olupton requested a review from pramodk December 1, 2021 17:11
@pramodk (Collaborator) left a comment:

LGTM

@pramodk merged commit a8bb716 into hackathon_main Dec 1, 2021
@pramodk deleted the olupton/faster-random123-gpu-initialisation branch December 1, 2021 20:05
@pramodk (Collaborator) commented Dec 2, 2021

As reported on Hackathon slack, this fails with:

[ 87%] Building CXX object coreneuron/CMakeFiles/coreneuron.dir/mpi/core/nrnmpidec.cpp.o
/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/randoms/nrnran123.cu(72): error: identifier "nrnran123_State" is undefined

/autofs/nccsopen-svm1_sw/ascent/gcc/6.4.0/include/c++/6.4.0/bits/unique_ptr.h(171): error: no instance of constructor "std::tuple<_T1, _T2>::tuple [with _T1=<error-type>, _T2=coreneuron::alloc_deleter<<unnamed>::random123_allocator>]" matches the argument list
          detected during instantiation of "std::unique_ptr<_Tp, _Dp>::unique_ptr(std::unique_ptr<_Tp, _Dp>::pointer) [with _Tp=coreneuron::nrnran123_State, _Dp=coreneuron::alloc_deleter<<unnamed>::random123_allocator>]"
/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/randoms/nrnran123.cu(292): here

/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/memory.h(90): error: no instance of function template "std::allocator_traits<_Alloc>::destroy [with _Alloc=<unnamed>::random123_allocator]" matches the argument list
            argument types are: (<unnamed>::random123_allocator, <error-type>)
          detected during:
            instantiation of "void coreneuron::alloc_deleter<Alloc>::operator()(coreneuron::alloc_deleter<Alloc>::pointer) const [with Alloc=<unnamed>::random123_allocator]"
/autofs/nccsopen-svm1_sw/ascent/gcc/6.4.0/include/c++/6.4.0/bits/unique_ptr.h(239): here
            instantiation of "std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp=coreneuron::nrnran123_State, _Dp=coreneuron::alloc_deleter<<unnamed>::random123_allocator>]"
/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/randoms/nrnran123.cu(292): here

3 errors detected in the compilation of "/tmp/tmpxft_0000e191_00000000-6_nrnran123.cpp1.ii".
gmake[2]: *** [coreneuron/CMakeFiles/coreneuron.dir/utils/randoms/nrnran123.cu.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
[ 88%] Linking CXX static library ../lib/libscopmath.a
[ 88%] Built target scopmath
gmake[1]: *** [coreneuron/CMakeFiles/coreneuron.dir/all] Error 2
gmake: *** [all] Error 2

olupton added a commit that referenced this pull request Dec 2, 2021
This was a silly bug in #702.
olupton added a commit that referenced this pull request Dec 23, 2021
Summary of changes:
 - Support OpenMP target offload when NMODL and GPU support are enabled.
   (#693, #704, #705, #707, #708, #716, #719)
 - Use sensible defaults for the --nwarp parameter, improving the performance
   of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
 - Use a Boost memory pool, if Boost is available, to reduce the number of
   independent CUDA unified memory allocations used for Random123 stream
   objects. This speeds up initialisation of models using Random123, and also
   makes it feasible to use Nsight Compute on models using Random123 and for
   Nsight Systems to profile initialisation. (#702, #703)
 - Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended
   on the NVIDIA forums. (#721)
 - Do not compile for compute capability 6.0 by default, as this is not
   supported by NVHPC with OpenMP target offload.
 - Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and
   OpenMP. (#698, #717)
 - Add CUDA runtime header search path explicitly, so we don't rely on it being
   implicit in our NVHPC localrc.
 - Clean up unused code. (#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
(Same summary of changes as in the Dec 23, 2021 commit above.)
CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c
4 participants