Set by default the number of warps #700

iomaganaris · 2021-11-29T15:18:42Z

Set the default number of warps to a reasonably large number and update the related documentation

Description

The cell permute 2 algorithm distributes the cells into groups based on the nwarp option passed on the CLI. Because nwarp used to default to 0, all cells were executed by a single warp on the GPU with the --cell-permute 2 option, and the cells were not interleaved in the order in which they are sorted in memory.
This resulted in suboptimal performance.
As a quick solution, nwarp is now set to a reasonably large number, which might not give the best load balancing but should be enough to hide the large latencies introduced by memory accesses.
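
To illustrate why the number of groups matters, here is a minimal Python sketch of distributing cells over warps. It is not CoreNEURON's implementation: the function name `distribute_cells`, the greedy "largest cell to the least-loaded group" strategy, and the handling of `nwarp == 0` as a single group are all assumptions made for illustration.

```python
import heapq


def distribute_cells(cell_sizes, nwarp):
    """Assign cell indices to groups, one group per warp (illustrative only).

    cell_sizes: per-cell cost, e.g. a compartment count (hypothetical input).
    nwarp: requested number of groups; 0 is assumed to mean a single group.
    """
    ngroup = max(1, nwarp)
    groups = [[] for _ in range(ngroup)]
    # Min-heap of (current load, group index), so the least-loaded group pops first.
    heap = [(0, g) for g in range(ngroup)]
    heapq.heapify(heap)
    # Visit the largest cells first, which is a common greedy balancing heuristic.
    order = sorted(range(len(cell_sizes)), key=lambda i: cell_sizes[i], reverse=True)
    for cell in order:
        load, g = heapq.heappop(heap)
        groups[g].append(cell)
        heapq.heappush(heap, (load + cell_sizes[cell], g))
    return groups


sizes = [50, 10, 40, 20, 30, 60]
single = distribute_cells(sizes, 0)  # old default: everything lands in one group
spread = distribute_cells(sizes, 3)  # several groups share the work
```

With `nwarp` set to 0 every cell ends up in the same group, which mirrors the single-warp behaviour described above. With a larger value the cells are spread over several groups, so independent warps have work to interleave while others wait on memory.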

TODO:

Find a better, dynamic way to set nwarp


bbpbuildbot · 2021-11-29T16:09:31Z

Logfiles from GitLab pipeline #27115 (:white_check_mark:) have been uploaded here!

Status and direct links:

…umber and update the related documentation (#700)

Summary of changes:

- Support OpenMP target offload when NMODL and GPU support are enabled. (#693, #704, #705, #707, #708, #716, #719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (#702, #703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (#698, #717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Cleanup unused code. (#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>

Summary of changes:

- Support OpenMP target offload when NMODL and GPU support are enabled. (BlueBrain/CoreNeuron#693, BlueBrain/CoreNeuron#704, BlueBrain/CoreNeuron#705, BlueBrain/CoreNeuron#707, BlueBrain/CoreNeuron#708, BlueBrain/CoreNeuron#716, BlueBrain/CoreNeuron#719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (BlueBrain/CoreNeuron#700, BlueBrain/CoreNeuron#710, BlueBrain/CoreNeuron#718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (BlueBrain/CoreNeuron#702, BlueBrain/CoreNeuron#703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (BlueBrain/CoreNeuron#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (BlueBrain/CoreNeuron#698, BlueBrain/CoreNeuron#717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Cleanup unused code. (BlueBrain/CoreNeuron#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c

Set by default the number of warps to execute in a large reasonable n…

f3f63d6

…umber and update the related documentation

iomaganaris requested review from kotsaloscv, olupton and pramodk November 29, 2021 15:18

kotsaloscv approved these changes Nov 29, 2021


pramodk approved these changes Nov 29, 2021


pramodk merged commit 3e394c4 into hackathon_main Nov 29, 2021

pramodk deleted the hackathon/magkanar/fix_nwarp branch November 29, 2021 20:39

olupton pushed a commit that referenced this pull request Nov 30, 2021

Set by default the number of warps to execute in a large reasonable n…

6c0cef1

…umber and update the related documentation (#700)
