- 
                Notifications
    You must be signed in to change notification settings 
- Fork 41
Integrate changes from NERSC GPU hackathon. #713
Conversation
* Disable cmake-format and clang-format checks. * Disable GitLab CI except for NMODL + GPU.
* Add a hackathon-specific argument for benchmarks. * Add a reference comparison for channel-benchmark.
* create build/benchmark folder before trying to use it * run nrnivmodl-core in parallel than serially (too slow)
…umber and update the related documentation (#700)
* Add memory pool for Random123 streams. This speeds up initialisation when running on GPU. * Make Boost optional.
This was a silly bug in #702.
* Simplify unified memory logic.
* Pass -mp=gpu when we pass -acc
* Pass -gpu=lineinfo for better debug information.
* Pass -Minfo=accel,mp for better compile time diagnostics.
* Add nrn_pragma_{acc,omp} macros for single-source Open{ACC,MP} support.
* Call omp_set_default_device.
* Drop cc60 because of OpenMP offload incompatibility.
* Add --gpu to test.
* Default (BB5-valid) CORENRN_EXTERNAL_BENCHMARK_DATA.
* Remove cuda_add_library.
* Don't print number of GPUs when quiet.
* Set OMP_NUM_THREADS=1 for lfp_test.
* Update NMODL to emit nrn_pragma{acc,omp} macros.
Co-authored-by: Pramod Kumbhar <[email protected]>
    * Add wrapper functions for using OpenMP or OpenACC API
* Add -mp=gpu in order to link gpu runtime with tests as well
* Avoid copying VecPlay members twice otherwise association fails with OpenMP
     * IvocVect members t_ and y_ were copied twice
     * only discon_indices_ is pointer and hence that
        needs to be copied
    | Logfiles from GitLab pipeline #28910 (:no_entry:) have been uploaded here! Status and direct links: | 
| Logfiles from GitLab pipeline #28931 (:white_check_mark:) have been uploaded here! Status and direct links: | 
…erGrid & threadsPerBlock (#710)
| Logfiles from GitLab pipeline #28992 (:no_entry:) have been uploaded here! Status and direct links: | 
* Use #pragma omp instead of runtime API in `cnrn_target_{copyin,delete}`
* Fix `VecPlayContinuous::discon_indices_` device transfer.
* Name `cnrn_target_` wrappers more consistently.
Co-authored-by: Olli Lupton <[email protected]>
    | Logfiles from GitLab pipeline #29035 (:white_check_mark:) have been uploaded here! Status and direct links: | 
We prefer selective host-to-device updates.
| Logfiles from GitLab pipeline #29110 (:white_check_mark:) have been uploaded here! Status and direct links: | 
Code fixes for XLC and Clang execution without build system changes. This mainly adds missing OpenMP pragmas and makes cnrn_target_ wrappers visible to NMODL.
| Logfiles from GitLab pipeline #29143 (:white_check_mark:) have been uploaded here! Status and direct links: | 
omp_get_mapped_ptr was added in OpenMP 5.1 and is not widely supported. With this change then calling cnrn_target_deviceptr on a pointer that is not present on the device is a hard error instead of returning nullptr, so avoid calling it for artificial cells.
| Logfiles from GitLab pipeline #29497 (:white_check_mark:) have been uploaded here! Status and direct links: | 
* Set nwarp to very big number for optimal parallelization and improve a bit grid config of CUDA solve_interleaved2
| Logfiles from GitLab pipeline #29728 (:white_check_mark:) have been uploaded here! Status and direct links: | 
* Re-enable GitLab CI.
* Add NMODL + OpenACC test.
* Restore {clang,cmake}-format checks.
* Prefer OpenACC with MOD2C.
* Do not enable OpenACC in NMODL + OpenMP mode.
* Convert more #pragma acc to nrn_pragma_acc(...).
* Call cudaSetDevice in OpenMP mode.
Co-authored-by: Ioannis Magkanaris <[email protected]>
    Presumably this was working before because our nvhpc localrc files accidentally included CUDA include directories before BlueBrain/spack#1392.
* Compile NVHPC+Open{ACC,MP} with -cuda.
* Pull in NMODL+Eigen fixes to make this work.
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Just one question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Overall all changes look fine to me. Added one/two comments for clarification.
Summary of changes: - Support OpenMP target offload when NMODL and GPU support are enabled. (BlueBrain/CoreNeuron#693, BlueBrain/CoreNeuron#704, BlueBrain/CoreNeuron#705, BlueBrain/CoreNeuron#707, BlueBrain/CoreNeuron#708, BlueBrain/CoreNeuron#716, BlueBrain/CoreNeuron#719) - Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (BlueBrain/CoreNeuron#700, BlueBrain/CoreNeuron#710, BlueBrain/CoreNeuron#718) - Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (BlueBrain/CoreNeuron#702, BlueBrain/CoreNeuron#703) - Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (BlueBrain/CoreNeuron#721) - Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload. - Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (BlueBrain/CoreNeuron#698, BlueBrain/CoreNeuron#717) - Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc. - Cleanup unused code. (BlueBrain/CoreNeuron#711) Co-authored-by: Pramod Kumbhar <[email protected]> Co-authored-by: Ioannis Magkanaris <[email protected]> Co-authored-by: Christos Kotsalos <[email protected]> Co-authored-by: Nicolas Cornu <[email protected]> CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c
Description
During the NERSC GPU hackathon we used
hackathon_mainas our "base" branch.Now the hackathon is over, we should merge our developments into
master, after re-enabling the full test suite and fixing any issues we delayed addressing during the hackathon.See also:
neuronsimulator/gpuhackathon#4
Summary of hackathon developments:
--nwarpparameter, improving the performance of the Hines solver with--cell-permute=2on GPU.-cudawhen compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums.localrc.TODO:
Use certain branches for the SimulationStack CI
CI_BRANCHES:NEURON_BRANCH=master,NMODL_BRANCH=master,