Skip to content
This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

olupton
Copy link
Contributor

@olupton olupton commented Jun 1, 2021

Description
This PR adds support for copying the fast_imem data structures to/from a compute device/GPU and adds OpenACC #pragmas to offload computations on those data. This closes #197.

This will allow us to improve the coverage of the NEURON tests, where some checks are disabled because of this issue. It is also necessary to get the test suite running with CoreNEURON+NMODL+GPU, because NMODL's generated code crashes if the data structures are not copied to the GPU.

Some fixes were also needed to the .mod file translation to make the NEURON tests pass, so this changes the submodule commits to include BlueBrain/mod2c#64 and BlueBrain/nmodl#681.

This PR also includes fixes for two memory errors found with Valgrind/memcheck, and a workaround so that not passing --mpi to an MPI build does not cause a crash.

How to test this?
Build NEURON with CoreNEURON, NMODL and GPU support enabled and run the tests:

cmake ..  -DNRN_ENABLE_TESTS=ON -DNRN_ENABLE_CORENEURON=ON -DCORENRN_ENABLE_GPU=ON -DCORENRN_ENABLE_NMODL=ON
cmake --build . --parallel
ctest -j 8

without this PR, some tests will fail with

CUDA Exception: Warp Illegal Address

(with an NVIDIA GPU), with this PR the only failures should be in the testcorenrn_gf and testcorenrn_watch tests, which will be fixed separately.

Test System

  • OS: BB5
  • Compiler: NVHPC 21.2
  • Backend: GPU

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=olupton/gpu-fast-imem,

@olupton olupton force-pushed the olupton/gpu-fast-imem branch from da0f33d to 30845dd Compare June 2, 2021 11:43
@olupton
Copy link
Contributor Author

olupton commented Jun 2, 2021

Retest this please Jenkins.

Updates mod2c/nmodl submodule commits to include relevant fixes,
BlueBrain/mod2c#64 and
BlueBrain/nmodl#681.

Closes #197.
@olupton olupton force-pushed the olupton/gpu-fast-imem branch from 30845dd to 3dd6780 Compare June 2, 2021 11:47
@olupton olupton marked this pull request as ready for review June 2, 2021 11:48
Copy link
Collaborator

@pramodk pramodk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

so here we transform so it only has membrane current contribution
*/
double* p = _nt->nrn_fast_imem->nrn_sav_d;
#pragma acc parallel loop present(p, vec_d) if (_nt->compute_gpu) async(_nt->stream_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is launched via async but I believe the synchronisation via acc wait is not immediately required (there should be top level acc wait already). cc: @iomaganaris

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my conclusion too; let's wait for @iomaganaris's review though.

Copy link
Contributor

@iomaganaris iomaganaris Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC stream_id should always be 0 unless we use openmp threading, which in the normal case we don't.
More details here:

nt.stream_id = 0;

nt.stream_id = omp_get_thread_num();

However since there is still this case I think that the async(stream_id) is still needed

@olupton
Copy link
Contributor Author

olupton commented Jun 2, 2021

I just checked that with NMODL+GPU+the current version of this branch then I get

98% tests passed, 2 tests failed out of 83
The following tests FAILED:
         51 - testcorenrn_gf::compare_results (Failed)
         73 - testcorenrn_watch::compare_results (Failed)

as expected (BlueBrain/nmodl#675, BlueBrain/nmodl#678).

Because the tests are not included in the CI yet, I think it's fine to merge this without waiting for fixes for those issues.

@pramodk
Copy link
Collaborator

pramodk commented Jun 2, 2021

thanks! good to merge!

Copy link
Contributor

@iomaganaris iomaganaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for taking care of this! The PR LGTM as well 👍

@iomaganaris iomaganaris merged commit 2c51992 into master Jun 2, 2021
@iomaganaris iomaganaris deleted the olupton/gpu-fast-imem branch June 2, 2021 14:24
olupton added a commit to neuronsimulator/nrn that referenced this pull request Jun 2, 2021
This commit updates the CoreNEURON submodule commit to include
BlueBrain/CoreNeuron#574, which fixes
BlueBrain/CoreNeuron#197 by adding support for
fast_imem computation on GPU. This means that various workarounds can be
removed from the NEURON test configuration.
acc_memcpy_to_device(&(d_nrb->_displ), &d_displ, sizeof(int*));

d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * (nrb->_size + 1));
d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * nrb->_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you briefly explain what is behind this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is to make it match

nrb->_nrb_index = (int*) ecalloc_align(nrb->_size, sizeof(int));

without the change then acc_copyin reads 4 undefined bytes at the end (Valgrind complained).

pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Jun 2, 2021
This commit updates the CoreNEURON submodule commit to include
BlueBrain/CoreNeuron#574, which fixes
BlueBrain/CoreNeuron#197 by adding support for
fast_imem computation on GPU. This means that various workarounds can be
removed from the NEURON test configuration.
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
* Avoids crashing without --mpi in an MPI build.
* Fix off-by-one error in _nrb_index size.
* Consistently pad the size of the `pdata` block.
* Updates mod2c/nmodl submodule commits to include relevant fixes.

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@2c51992
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement fast_imem_calculation for GPU

4 participants