Array4: __cuda_array_interface__ v3 #30

ax3l · 2022-03-26T23:01:36Z

Start implementing the __cuda_array_interface__ for zero-copy data exchange on Nvidia CUDA GPUs.

Optional: access an external __cuda_array_interface__ object in a non-owning manner as an AMReX Array4:
https://github.com/cupy/cupy/blob/a5b24f91d4d77fa03e6a4dd2ac954ff9a04e21f4/cupy/core/core.pyx#L2478-L2514
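
For orientation, here is a minimal sketch of what an exporter and a non-owning consumer of the protocol could look like in pure Python. It is not the pyAMReX implementation: `DeviceBufferView` and `describe` are hypothetical names, and the pointer is a made-up integer, so nothing here touches a GPU. The dictionary keys follow version 3 of the interface, which added the optional `stream` entry.

```python
class DeviceBufferView:
    """Hypothetical non-owning description of a device buffer."""

    def __init__(self, ptr, shape, typestr, read_only=False, stream=None):
        self.ptr = int(ptr)          # device address as a plain integer
        self.shape = tuple(shape)    # extents of the array
        self.typestr = typestr       # NumPy-style type string, e.g. "<f8"
        self.read_only = read_only
        self.stream = stream         # None: consumer need not synchronize

    @property
    def __cuda_array_interface__(self):
        return {
            "version": 3,
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, self.read_only),
            "strides": None,         # None means C-contiguous
            "stream": self.stream,
        }


def describe(obj):
    """Read the interface of any exporter without copying its data."""
    cai = obj.__cuda_array_interface__
    ptr, read_only = cai["data"]
    itemsize = int(cai["typestr"][2:])  # "<f8" -> 8 bytes per element
    nbytes = itemsize
    for extent in cai["shape"]:
        nbytes *= extent
    return {"ptr": ptr, "read_only": read_only, "nbytes": nbytes}


# The address below is a placeholder, not a real allocation.
view = DeviceBufferView(ptr=0x7F0000000000, shape=(4, 8, 8), typestr="<f8")
info = describe(view)
```

Because the consumer only reads the pointer and metadata, the exporter stays responsible for keeping the memory alive, which is the non-owning access described above.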

mfab and mfab_device need to become functions, not fixtures. Otherwise they will be cached and outlive amrex.finalize(). See AMReX Initialize/Finalize as Context Manager #81 and MultiFab: Fix Fixture Lifetime #84.
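
The lifetime problem can be illustrated with a small stand-in that uses no pytest or AMReX at all. All names below (`Runtime`, `Resource`, `make_resource`) are hypothetical; the sketch only assumes a library with a global finalize step that must not be outlived by its objects.

```python
class Runtime:
    """Hypothetical stand-in for a library with global init/finalize."""

    def __init__(self):
        self.live = 0  # number of resources not yet released

    def finalize(self):
        # Report how many resources outlived the runtime.
        return self.live


class Resource:
    """Hypothetical object that is only valid while the runtime is up."""

    def __init__(self, runtime):
        self.runtime = runtime
        runtime.live += 1

    def close(self):
        self.runtime.live -= 1


def make_resource(runtime):
    """Factory: every caller gets a fresh object it is responsible for."""
    return Resource(runtime)


# Cached object: created once and never released before finalize.
rt_cached = Runtime()
cached = make_resource(rt_cached)
leaked_cached = rt_cached.finalize()

# Per-test object: created and released inside the test body.
rt_fresh = Runtime()
res = make_resource(rt_fresh)
res.close()
leaked_fresh = rt_fresh.finalize()
```

With the cached object, finalize sees one resource still alive; with the factory function, the test body controls the lifetime and nothing is left over.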

Particle Iter & MFIter: Python does not destruct keys by default: add a GPU stream synchronize here?

pyamrex/src/Base/MultiFab.cpp, lines 72 to 75 in 78bbbc7:

    if( !mfi.isValid() )
    {
        first_or_done = true;
        throw py::stop_iteration();

depends on MFIter::Finalize amrex#2983 and MFIter: Make Finalize Public amrex#2985
if and for do not create a scope in Python (they do in C++):

In [1]: import numpy as np                                                                                                           

In [2]: x = np.array([1,2,3])                                                   

In [3]: for a in x: 
   ...:     print(a)                                                                
1
2
3

# a is still alive xD
In [4]: a                                                                       
Out[4]: 3
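
The same effect can be made visible for object lifetimes. The following sketch assumes CPython's reference counting; `Tile` and `tiles` are hypothetical stand-ins for objects handed out by an iterator.

```python
import gc
import weakref


class Tile:
    """Hypothetical stand-in for an object handed out by an iterator."""

    def __init__(self, index):
        self.index = index


def tiles(n):
    # Yield fresh objects so that only the loop variable references them.
    for i in range(n):
        yield Tile(i)


for tile in tiles(3):
    pass

# The loop variable is still bound to the last object after the loop.
last = weakref.ref(tile)
alive_after_loop = last() is not None

# Only an explicit `del`, a rebinding, or leaving the enclosing
# function releases the object.
del tile
gc.collect()
alive_after_del = last() is not None
```

So any cleanup that relies on the destructor of the last loop object will not run when the loop ends, only when the name is released.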

ax3l · 2022-10-06T17:05:30Z

⚠️ There is an nvcc host code generation bug that we fixed with Nvidia last night. It affects CUDA Toolkit 11.4-11.8 with pybind11 (pybind/pybind11#4193).
I will ship a work-around for pybind11 (pybind/pybind11#4220) before the next CUDA release; please use an older NVCC (e.g. 11.3) in the meantime.

Update: the patch in pybind/pybind11#4220 unlocks that broken compiler range.
Add these options to the CMake configure step:

cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8

ax3l · 2022-10-06T17:10:41Z

@RemiLehe build logic from README.md is this:

So concretely:

# Python packages if not already installed as described
python3 -m pip install -U pip setuptools wheel
python3 -m pip install -U cmake pytest
python3 -m pip install -U -r requirements.txt

# depending on what you try
python3 -m pip install cupy-cuda11x
python3 -m pip install numba
python3 -m pip install torch

# configure once (unless changing backend or versions heavily)
cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA \
    -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git \
    -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8
# rinse & repeat: builds, packages & runs pip install
cmake --build build --target pip_install -j 8

and tests:

# Run all tests
python3 -m pytest tests/

# Run tests from a single file
python3 -m pytest tests/test_array4.py

# Run a single test (useful during debugging)
python3 -m pytest tests/test_array4.py::test_array4_cupy
python3 -m pytest tests/test_multifab.py::test_mfab_ops_cuda_cupy

# Run all tests, do not capture "print" output and be verbose
python3 -m pytest -s -vvvv tests/test_array4.py

and with nsight:

nsys profile -f true -t cuda,nvtx,osrt python3 -m pytest -s -vvv tests/test_multifab.py::test_mfab_ops_cuda_cupy

GUI:

nsight-sys

ax3l · 2022-10-06T20:35:04Z

Found a tiny bug, will rebase after #77 is merged. Update: done.

Found another arena bug, will rebase after #78 is merged. Update: done.

ax3l · 2022-10-07T04:00:12Z

First cupy progress. Gotta learn how to do in-place updates on arrays in kernels...

tests/test_multifab.py

ax3l · 2022-10-14T01:02:42Z

With the new MFIter::Finalize, I can also see the cudaStreamSynchronize calls at the end of the iteration :)

tests/test_multifab.py


ax3l · 2022-10-17T07:26:37Z

Wuup, wuup. First part done.
Larger tests and particles next :)

ax3l added the enhancement (New feature or request) label on Mar 26, 2022

ax3l force-pushed the array4-cuda-array-interface branch 3 times, most recently from 0e72782 to 44eb90c, on March 26, 2022 23:09

ax3l force-pushed the array4-cuda-array-interface branch from 44eb90c to d2937d0 on April 8, 2022 07:14

ax3l requested a review from n01r July 1, 2022 15:58

ax3l mentioned this pull request Aug 3, 2022

.to_numpy(), .to_cupy(), etc. #55

Closed

ax3l force-pushed the array4-cuda-array-interface branch 2 times, most recently from 7c0289a to f3ff788, on October 5, 2022 20:52

ax3l mentioned this pull request Oct 6, 2022

Geometry: Fix Overloads #77

Merged

ax3l force-pushed the array4-cuda-array-interface branch from f3ff788 to 7fe40bd on October 6, 2022 21:05

ax3l requested review from RemiLehe and removed request for n01r October 6, 2022 21:14

ax3l force-pushed the array4-cuda-array-interface branch 3 times, most recently from d0f2dec to 5395043, on October 7, 2022 03:16

ax3l commented Oct 7, 2022

tests/test_multifab.py (resolved)

ax3l mentioned this pull request Oct 7, 2022

pytest fixtures: per function #78

Merged

ax3l force-pushed the array4-cuda-array-interface branch from 5395043 to df349c5 on October 7, 2022 18:06

ax3l commented Oct 7, 2022

tests/test_multifab.py (outdated, resolved)

ax3l mentioned this pull request Oct 11, 2022

MFIter::Finalize AMReX-Codes/amrex#2983

Merged

5 tasks

ax3l force-pushed the array4-cuda-array-interface branch from f5138a9 to a7fc736 on October 14, 2022 07:12

ax3l changed the title from "[WIP] Array4: __cuda_array_interface__ v2" to "Array4: __cuda_array_interface__ v2" on Oct 14, 2022

ax3l commented Oct 14, 2022

tests/test_multifab.py (outdated, resolved)

ax3l mentioned this pull request Oct 15, 2022

AMReX Initialize/Finalize as Context Manager #81

Open

ax3l mentioned this pull request Oct 17, 2022

Memory Arenas #85

Merged

ax3l and others added 6 commits October 16, 2022 20:27

Array4: __cuda_array_interface__ v2

62f2340

Start implementing the `__cuda_array_interface__` for zero-copy data exchange on Nvidia CUDA GPUs.

MultiFab: CuPy Test

271a021

MFIter: Finalize() on StopIteration

5acd36a

Since `for` loops create no scope in Python, we need to trigger the finalize logic, including stream syncs, before the destructors of `MultiFab` iterators are called.
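
As a rough illustration of that idea in pure Python (the actual change lives in the C++ bindings, which are not reproduced here), the sketch below runs cleanup at the moment the iterator signals exhaustion. `SyncingIterator` and the log strings are hypothetical, and the "stream sync" is only a log entry.

```python
class SyncingIterator:
    """Hypothetical iterator that runs cleanup when iteration ends."""

    def __init__(self, items, log):
        self._items = iter(items)
        self._log = log
        self._finalized = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._items)
        except StopIteration:
            # Finalize before the loop body is left, not in a destructor.
            self.finalize()
            raise

    def finalize(self):
        # Idempotent, so a later destructor call would not sync twice.
        if not self._finalized:
            self._finalized = True
            self._log.append("finalize: stream sync")


log = []
it = SyncingIterator(["box0", "box1"], log)
for box in it:
    log.append(f"work on {box}")
log.append("after loop")
```

In this sketch the cleanup entry lands between the last loop body and the first statement after the loop. A loop that exits early via `break` never reaches the `StopIteration` path, so that case would still need the destructor or an explicit call.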

Add numba test

6965f9a

incl. 3D kernel launch

Add pytorch

9f539f9

CuPy Fuse: Avoid Extra Memset

4175194

ax3l force-pushed the array4-cuda-array-interface branch from f204a6d to 4175194 on October 17, 2022 03:33

ax3l mentioned this pull request Oct 17, 2022

Discussion on mapping between amrex, numpy.ndarray, and torch.tensor data types #9

Open

ax3l commented Oct 17, 2022

src/Base/Array4.cpp (outdated, resolved)

MultiFab Device Test: Fixes

6eb2da4

ax3l force-pushed the array4-cuda-array-interface branch from a6a1199 to 6eb2da4 on October 17, 2022 04:35

Update to v3

7f6d80b

ax3l changed the title from "Array4: __cuda_array_interface__ v2" to "Array4: __cuda_array_interface__ v3" on Oct 17, 2022

Array4: TODO from CUDA

e65cd41

A bit tricky to implement this caster as a new constructor. Not currently needed, but this adds comments on where to do it.

ax3l enabled auto-merge (squash) October 17, 2022 07:26

ax3l merged commit 16ce636 into AMReX-Codes:development Oct 17, 2022

ax3l deleted the array4-cuda-array-interface branch October 17, 2022 07:28

ax3l mentioned this pull request Oct 17, 2022

Particles: CUDA Array Interface #86

Merged

7 tasks

ax3l mentioned this pull request Jun 6, 2023

AMReX Tiny Profiler Context #54

Open
