Implement fast_imem calculation on GPU. #574

olupton · 2021-06-01T09:23:39Z

Description
This PR adds support for copying the fast_imem data structures to/from a compute device/GPU and adds OpenACC #pragmas to offload computations on those data. This closes #197.

This will allow us to improve the coverage of the NEURON tests, where some checks are disabled because of this issue. It is also necessary to get the test suite running with CoreNEURON+NMODL+GPU, because NMODL's generated code crashes if the data structures are not copied to the GPU.

Some fixes were also needed to the .mod file translation to make the NEURON tests pass, so this changes the submodule commits to include BlueBrain/mod2c#64 and BlueBrain/nmodl#681.

This PR also includes fixes for two memory errors found with Valgrind/memcheck, and a workaround so that not passing --mpi to an MPI build does not cause a crash.

How to test this?
Build NEURON with CoreNEURON, NMODL and GPU support enabled and run the tests:

cmake ..  -DNRN_ENABLE_TESTS=ON -DNRN_ENABLE_CORENEURON=ON -DCORENRN_ENABLE_GPU=ON -DCORENRN_ENABLE_NMODL=ON
cmake --build . --parallel
ctest -j 8

Without this PR, some tests fail on an NVIDIA GPU with

CUDA Exception: Warp Illegal Address

With this PR, the only failures should be in the testcorenrn_gf and testcorenrn_watch tests, which will be fixed separately.

Test System

OS: BB5
Compiler: NVHPC 21.2
Backend: GPU

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=olupton/gpu-fast-imem,

olupton · 2021-06-02T11:44:15Z

Retest this please Jenkins.

pramodk

LGTM

coreneuron/gpu/nrn_acc_manager.cpp

pramodk · 2021-06-02T12:13:35Z

coreneuron/sim/treeset_core.cpp

           so here we transform so it only has membrane current contribution
        */
        double* p = _nt->nrn_fast_imem->nrn_sav_d;
+#pragma acc parallel loop present(p, vec_d) if (_nt->compute_gpu) async(_nt->stream_id)


This is launched via async, but I believe synchronisation via acc wait is not immediately required (there should already be a top-level acc wait). cc: @iomaganaris

That was my conclusion too; let's wait for @iomaganaris's review though.

IIUC stream_id should always be 0 unless we use OpenMP threading, which we normally don't.
More details here:

CoreNeuron/coreneuron/io/phase2.cpp

Line 843 in f61ec32

nt.stream_id = 0;

CoreNeuron/coreneuron/io/phase2.cpp

Line 854 in f61ec32

nt.stream_id = omp_get_thread_num();

However, since that case still exists, I think the async(stream_id) is still needed.
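
To make the async discussion above concrete, here is a minimal sketch of the pattern: a loop queued on a per-thread OpenACC queue, followed by one synchronisation point. `ThreadSketch`, `accumulate_async`, `sync_stream` and `host_only_example` are hypothetical names, and the loop body is an illustrative elementwise update rather than the actual CoreNEURON kernel. A compiler without OpenACC support ignores the pragmas and runs the loop on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, stripped-down stand-in for CoreNEURON's NrnThread; only the
// two fields used by the pragma in the diff are modelled here.
struct ThreadSketch {
    bool compute_gpu;  // offload only when true
    int stream_id;     // OpenACC async queue used by this thread
};

// Queue an elementwise update on the thread's async queue.
// The loop body is an assumption made for illustration only.
void accumulate_async(const ThreadSketch& nt, double* p, const double* vec_d, std::size_t n) {
    assert(p != nullptr && vec_d != nullptr);
#pragma acc parallel loop present(p[0:n], vec_d[0:n]) if (nt.compute_gpu) async(nt.stream_id)
    for (std::size_t i = 0; i < n; ++i) {
        p[i] += vec_d[i];
    }
}

// Single synchronisation point for everything queued on this thread's queue.
void sync_stream(const ThreadSketch& nt) {
#pragma acc wait(nt.stream_id)
}

// Host-only usage: compute_gpu is false, so nothing is offloaded.
std::vector<double> host_only_example() {
    ThreadSketch nt{false, 0};
    std::vector<double> p{1.0, 2.0, 3.0};
    const std::vector<double> d{0.5, 0.5, 0.5};
    accumulate_async(nt, p.data(), d.data(), p.size());
    sync_stream(nt);
    return p;
}
```

Kernels queued on the same queue execute in order, so a single wait at a higher level is enough when nothing on the host reads the results in between. This is the reasoning behind not adding an extra acc wait right after the loop.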

olupton · 2021-06-02T12:47:21Z

I just checked that with NMODL + GPU and the current version of this branch I get

98% tests passed, 2 tests failed out of 83
The following tests FAILED:
         51 - testcorenrn_gf::compare_results (Failed)
         73 - testcorenrn_watch::compare_results (Failed)

as expected (BlueBrain/nmodl#675, BlueBrain/nmodl#678).

Because the tests are not included in the CI yet, I think it's fine to merge this without waiting for fixes for those issues.

pramodk · 2021-06-02T12:49:02Z

thanks! good to merge!

iomaganaris

Thanks a lot for taking care of this! The PR LGTM as well 👍

alkino · 2021-06-02T15:27:52Z

coreneuron/gpu/nrn_acc_manager.cpp

                acc_memcpy_to_device(&(d_nrb->_displ), &d_displ, sizeof(int*));

-                d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * (nrb->_size + 1));
+                d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * nrb->_size);


Can you briefly explain what is behind this change?

The change is to make it match

CoreNeuron/coreneuron/io/phase2.cpp

Line 461 in 2c51992

nrb->_nrb_index = (int*) ecalloc_align(nrb->_size, sizeof(int));

Without the change, acc_copyin reads 4 undefined bytes at the end (Valgrind complained).
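
As a rough illustration of why the two sizes have to agree, the sketch below derives the transfer length from the same helper that describes the allocation. `index_bytes`, `copyin_sketch` and `copy_index_example` are hypothetical names, and `copyin_sketch` is a host-side stand-in that only mimics how a copy-in reads exactly the requested number of bytes from the host pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Number of bytes backing an index array of `size` ints. Using one helper for
// both allocation and transfer keeps the two sizes from drifting apart.
std::size_t index_bytes(std::size_t size) {
    return sizeof(int) * size;
}

// Hypothetical stand-in for a device copy-in: reads exactly `bytes` from `host`.
std::vector<unsigned char> copyin_sketch(const void* host, std::size_t bytes) {
    assert(host != nullptr || bytes == 0);
    std::vector<unsigned char> device(bytes);
    if (bytes != 0) {
        std::memcpy(device.data(), host, bytes);
    }
    return device;
}

std::vector<unsigned char> copy_index_example() {
    const std::vector<int> nrb_index(4, 7);  // 4 ints allocated on the host
    // Copy exactly as many bytes as were allocated. Passing
    // index_bytes(size + 1) would read sizeof(int) bytes past the buffer.
    return copyin_sketch(nrb_index.data(), index_bytes(nrb_index.size()));
}
```

The difference between the old and new lengths is one `int`, which matches the 4 bytes reported by Valgrind on platforms where `sizeof(int)` is 4.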

This commit updates the CoreNEURON submodule commit to include BlueBrain/CoreNeuron#574, which fixes BlueBrain/CoreNeuron#197 by adding support for fast_imem computation on GPU. This means that various workarounds can be removed from the NEURON test configuration.

* Avoids crashing without --mpi in an MPI build.
* Fix off-by-one error in _nrb_index size.
* Consistently pad the size of the `pdata` block.
* Updates mod2c/nmodl submodule commits to include relevant fixes.

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@2c51992

olupton requested review from iomaganaris and pramodk June 1, 2021 09:23

olupton mentioned this pull request Jun 1, 2021

Check fast_imem results on GPU too. neuronsimulator/nrn#1313

Merged

olupton closed this Jun 1, 2021

olupton reopened this Jun 1, 2021

olupton mentioned this pull request Jun 1, 2021

testcorenrn_patstim::coreneuron_gpu_offline test does not produce reliable output #563

Open

olupton closed this Jun 1, 2021

olupton reopened this Jun 1, 2021

olupton added 3 commits June 2, 2021 13:41

Do not crash without --mpi in an MPI build.

46ab8b3

Fix off-by-one error in _nrb_index size.

61f7ea5

Consistently pad the size of the pdata block.

9b18271

olupton force-pushed the olupton/gpu-fast-imem branch from da0f33d to 30845dd Compare June 2, 2021 11:43

Enable fast_imem on GPU.

3dd6780

Updates mod2c/nmodl submodule commits to include relevant fixes, BlueBrain/mod2c#64 and BlueBrain/nmodl#681. Closes #197.

olupton force-pushed the olupton/gpu-fast-imem branch from 30845dd to 3dd6780 Compare June 2, 2021 11:47

olupton marked this pull request as ready for review June 2, 2021 11:48

pramodk approved these changes Jun 2, 2021

iomaganaris approved these changes Jun 2, 2021

iomaganaris merged commit 2c51992 into master Jun 2, 2021

iomaganaris deleted the olupton/gpu-fast-imem branch June 2, 2021 14:24

alkino reviewed Jun 2, 2021

olupton mentioned this pull request Jun 30, 2021

coreneuron_modtests::{direct_hoc,datareturn_py} tests sometimes fail on GPU #586

Closed
