-
Notifications
You must be signed in to change notification settings - Fork 41
Implement fast_imem calculation on GPU. #574
Conversation
da0f33d
to
30845dd
Compare
Retest this please Jenkins. |
Updates mod2c/nmodl submodule commits to include relevant fixes, BlueBrain/mod2c#64 and BlueBrain/nmodl#681. Closes #197.
30845dd
to
3dd6780
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
so here we transform so it only has membrane current contribution | ||
*/ | ||
double* p = _nt->nrn_fast_imem->nrn_sav_d; | ||
#pragma acc parallel loop present(p, vec_d) if (_nt->compute_gpu) async(_nt->stream_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is launched via async
but I believe the synchronisation via acc wait
is not immediately required (there should be top level acc wait
already). cc: @iomaganaris
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my conclusion too; let's wait for @iomaganaris's review though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC stream_id
should always be 0
unless we use openmp
threading, which in the normal case we don't.
More details here:
CoreNeuron/coreneuron/io/phase2.cpp
Line 843 in f61ec32
nt.stream_id = 0; |
CoreNeuron/coreneuron/io/phase2.cpp
Line 854 in f61ec32
nt.stream_id = omp_get_thread_num(); |
However since there is still this case I think that the
async(stream_id)
is still needed
I just checked that with NMODL+GPU+the current version of this branch then I get
as expected (BlueBrain/nmodl#675, BlueBrain/nmodl#678). Because the tests are not included in the CI yet, I think it's fine to merge this without waiting for fixes for those issues. |
thanks! good to merge! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for taking care of this! The PR LGTM as well 👍
This commit updates the CoreNEURON submodule commit to include BlueBrain/CoreNeuron#574, which fixes BlueBrain/CoreNeuron#197 by adding support for fast_imem computation on GPU. This means that various workarounds can be removed from the NEURON test configuration.
acc_memcpy_to_device(&(d_nrb->_displ), &d_displ, sizeof(int*)); | ||
|
||
d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * (nrb->_size + 1)); | ||
d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * nrb->_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you briefly explain what is behind this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change is to make it match
CoreNeuron/coreneuron/io/phase2.cpp
Line 461 in 2c51992
nrb->_nrb_index = (int*) ecalloc_align(nrb->_size, sizeof(int)); |
without the change then
acc_copyin
reads 4 undefined bytes at the end (Valgrind complained).
This commit updates the CoreNEURON submodule commit to include BlueBrain/CoreNeuron#574, which fixes BlueBrain/CoreNeuron#197 by adding support for fast_imem computation on GPU. This means that various workarounds can be removed from the NEURON test configuration.
* Avoids crashing without --mpi in an MPI build. * Fix off-by-one error in _nrb_index size. * Consistently pad the size of the `pdata` block. * Updates mod2c/nmodl submodule commits to include relevant fixes. CoreNEURON Repo SHA: BlueBrain/CoreNeuron@2c51992
Description
This PR adds support for copying the
fast_imem
data structures to/from a compute device/GPU and adds OpenACC#pragma
s to offload computations on those data. This closes #197.This will allow us to improve the coverage of the NEURON tests, where some checks are disabled because of this issue. It is also necessary to get the test suite running with CoreNEURON+NMODL+GPU, because NMODL's generated code crashes if the data structures are not copied to the GPU.
Some fixes were also needed to the
.mod
file translation to make the NEURON tests pass, so this changes the submodule commits to include BlueBrain/mod2c#64 and BlueBrain/nmodl#681.This PR also includes fixes for two memory errors found with Valgrind/memcheck, and a workaround so that not passing
--mpi
to an MPI build does not cause a crash.How to test this?
Build NEURON with CoreNEURON, NMODL and GPU support enabled and run the tests:
without this PR, some tests will fail with
(with an NVIDIA GPU), with this PR the only failures should be in the
testcorenrn_gf
andtestcorenrn_watch
tests, which will be fixed separately.Test System
Use certain branches for the SimulationStack CI
CI_BRANCHES:NEURON_BRANCH=olupton/gpu-fast-imem,