Array4: __cuda_array_interface__ v3 #30
Update: a patch that unlocks the broken compiler range is in pybind/pybind11#4220:
```bash
cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8
```
@RemiLehe, the build logic from README.md is this:
So concretely:
```bash
# Python packages, if not already installed as described
python3 -m pip install -U pip setuptools wheel
python3 -m pip install -U cmake pytest
python3 -m pip install -U -r requirements.txt

# depending on what you try
python3 -m pip install cupy-cuda11x
python3 -m pip install numba
python3 -m pip install torch

# configure once (unless changing backend or versions heavily)
cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA \
    -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git \
    -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8

# rinse & repeat: builds, packages & runs pip install
cmake --build build --target pip_install -j 8
```
and tests:
```bash
# Run all tests
python3 -m pytest tests/

# Run tests from a single file
python3 -m pytest tests/test_array4.py

# Run a single test (useful during debugging)
python3 -m pytest tests/test_array4.py::test_array4_cupy
python3 -m pytest tests/test_multifab.py::test_mfab_ops_cuda_cupy

# Run all tests in one file, do not capture "print" output and be verbose
python3 -m pytest -s -vvvv tests/test_array4.py
```
and with Nsight:
GUI:
Start implementing the `__cuda_array_interface__` for zero-copy data exchange on NVIDIA CUDA GPUs.
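For orientation, a producer of the v3 protocol exposes a dict with the required keys `shape`, `typestr`, `data` and `version`, plus optional keys such as `strides` and `stream`. The sketch below is not the pyAMReX implementation: `make_cuda_array_interface` and `HypotheticalDeviceView` are hypothetical names, and it assumes a contiguous buffer whose x index varies fastest (as in an AMReX Array4), described with C-ordered axes `(comp, z, y, x)`.

```python
def make_cuda_array_interface(ptr, nx, ny, nz, ncomp, typestr="<f8", stream=1):
    """Hypothetical helper: build a v3 __cuda_array_interface__ dict for a
    contiguous 4D device buffer with x fastest, exposed as (comp, z, y, x)."""
    if stream == 0:
        # v3 disallows 0 because it is ambiguous (legacy vs per-thread default)
        raise ValueError("stream must not be 0; use 1 (legacy) or 2 (per-thread)")
    itemsize = int(typestr[2:])  # e.g. "<f8" -> 8 bytes
    return {
        "version": 3,
        "shape": (ncomp, nz, ny, nx),
        "typestr": typestr,
        "data": (ptr, False),  # (device pointer as int, read-only flag)
        # byte strides; passing None instead would also mean C-contiguous
        "strides": (nz * ny * nx * itemsize, ny * nx * itemsize, nx * itemsize, itemsize),
        "stream": stream,  # consumers sync on this stream before reading
    }


class HypotheticalDeviceView:
    """Non-owning stand-in for an Array4-like view of device memory."""

    def __init__(self, ptr, nx, ny, nz, ncomp, stream=1):
        self._cai = make_cuda_array_interface(ptr, nx, ny, nz, ncomp, stream=stream)

    @property
    def __cuda_array_interface__(self):
        return self._cai


view = HypotheticalDeviceView(0x1000, nx=4, ny=3, nz=2, ncomp=1)
print(view.__cuda_array_interface__["strides"])
```

With a real device pointer, consumers such as `cupy.asarray(view)` or `numba.cuda.as_cuda_array(view)` can wrap such an object without copying; the fake pointer used here must never be handed to them.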
Since `for` loops create no scope in Python, we need to trigger finalize logic, including stream syncs, before the destructors of `MultiFab` iterators are called.
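The scoping issue can be reproduced in plain Python. In this sketch, `HypotheticalMFIter` is a made-up stand-in (not the pyAMReX class) that, like an MFIter, yields itself on each step; the loop variable keeps it alive after the loop, so its destructor cannot be relied on to run at loop end. Running finalize logic when iteration is exhausted avoids that.

```python
events = []


class HypotheticalMFIter:
    """Stand-in for an MFIter-like object that yields itself each step."""

    def __init__(self, n):
        self.n = n
        self.i = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.i += 1
        if self.i < self.n:
            return self
        # Iteration is over: run finalize logic (e.g. stream syncs) now,
        # because the loop variable may keep this object alive afterwards.
        self.finalize()
        raise StopIteration

    def finalize(self):
        events.append("finalize")

    def __del__(self):
        events.append("del")


def run():
    for mfi in HypotheticalMFIter(3):
        events.append(f"tile {mfi.i}")
    # No new scope: `mfi` is still bound here, so __del__ has not run yet.
    events.append("after loop")
    assert mfi.i == 3


run()
print(events)
```

In CPython the destructor only runs once `run()` returns and its locals are released, i.e. after `"after loop"` has been recorded.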
incl. 3D kernel launch
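A 3D launch typically sizes its grid by ceiling division so that every cell of a box is covered by at least one thread. This is a generic sketch, not code from this PR: `launch_dims_3d` is a hypothetical name and the `(8, 8, 4)` block shape (256 threads) is an assumed default. With Numba, the result could be used as `kernel[blocks, threads](...)`, and the kernel must still guard against out-of-range indices.

```python
def launch_dims_3d(nx, ny, nz, threads=(8, 8, 4)):
    """Hypothetical helper: return (blocks, threads) covering an nx*ny*nz box."""
    tx, ty, tz = threads
    # ceiling division: round up so partial blocks cover the box edges
    blocks = ((nx + tx - 1) // tx, (ny + ty - 1) // ty, (nz + tz - 1) // tz)
    return blocks, threads


blocks, threads = launch_dims_3d(32, 32, 10)
print(blocks, threads)
# Inside the kernel, threads past the edge must skip work, e.g.
# `if i < nx and j < ny and k < nz: ...`
```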
This caster is a bit tricky to implement as a new constructor. It is not currently needed, but comments now mark where to add it.
Wuup, wuup. First part done.



Start implementing the `__cuda_array_interface__` for zero-copy data exchange on NVIDIA CUDA GPUs.

Optional: accessing an external `__cuda_array_interface__` object in a non-owning manner as an AMReX Array4: https://github.com/cupy/cupy/blob/a5b24f91d4d77fa03e6a4dd2ac954ff9a04e21f4/cupy/core/core.pyx#L2478-L2514
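Reading an external object in a non-owning way starts with parsing its interface dict. The sketch below is an assumption-laden illustration, not pyAMReX code: `read_cuda_array_interface` and `c_contiguous_strides` are hypothetical helpers that extract the pointer, shape, byte strides (filling in C-contiguous strides when `strides` is `None`), the read-only flag, and the v3 `stream` a consumer should synchronize on before touching the data.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array."""
    strides = []
    acc = itemsize
    for n in reversed(shape):
        strides.append(acc)
        acc *= n
    return tuple(reversed(strides))


def read_cuda_array_interface(obj):
    """Hypothetical helper: collect what a non-owning view needs."""
    cai = obj.__cuda_array_interface__
    if cai.get("mask") is not None:
        raise ValueError("masked arrays are not supported")
    ptr, read_only = cai["data"]
    shape = tuple(cai["shape"])
    itemsize = int(cai["typestr"][2:])  # e.g. "<f4" -> 4 bytes
    strides = cai.get("strides")
    if strides is None:  # None means C-contiguous per the protocol
        strides = c_contiguous_strides(shape, itemsize)
    # v3: if a stream is given, sync on it before accessing the data
    stream = cai.get("stream")
    return ptr, shape, tuple(strides), read_only, stream


class FakeProducer:
    """Made-up producer; the pointer is fake and never dereferenced."""
    __cuda_array_interface__ = {
        "version": 3,
        "shape": (2, 3),
        "typestr": "<f4",
        "data": (0x2000, True),
        "strides": None,
        "stream": None,
    }


print(read_cuda_array_interface(FakeProducer()))
```

Note that an AMReX Array4 indexes with x fastest, so a real wrapper would also need to map these C-ordered strides onto Array4's layout.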
`mfab` and `mfab_device` need to become functions, not fixtures. Otherwise they will be cached and outlive `amrex.finalize()`: AMReX Initialize/Finalize as Context Manager #81, MultiFab: Fix Fixture Lifetime #84

pyamrex/src/Base/MultiFab.cpp (lines 72 to 75 in 78bbbc7) depends on MFIter::Finalize amrex#2983 and MFIter: Make Finalize Public amrex#2985
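The fixture-lifetime item (#81, #84) can be illustrated without AMReX. In this sketch, `HypotheticalRuntime`, `Buffer` and `session` are made-up names, not the amrex API: a context manager wraps a global initialize/finalize pair, and finalize fails if objects created inside the session are still alive, which is what happens when a cached fixture keeps a reference.

```python
import contextlib
import gc
import weakref


class HypotheticalRuntime:
    """Stand-in for a library with global initialize/finalize (not amrex)."""

    def __init__(self):
        self.live = weakref.WeakSet()  # tracks objects that are still alive
        self.active = False

    def initialize(self):
        self.active = True

    def finalize(self):
        gc.collect()
        if len(self.live) > 0:
            raise RuntimeError(f"{len(self.live)} object(s) outlive finalize()")
        self.active = False


class Buffer:
    """Stand-in for a MultiFab-like object tied to the runtime."""

    def __init__(self, rt):
        rt.live.add(self)


@contextlib.contextmanager
def session(rt):
    rt.initialize()
    try:
        yield rt
    finally:
        rt.finalize()


rt = HypotheticalRuntime()

# Function-like usage: the object is released before finalize() -> OK
with session(rt):
    buf = Buffer(rt)
    del buf

# Fixture-like caching: a reference survives past finalize() -> error
cache = []
err_msg = None
try:
    with session(rt):
        cache.append(Buffer(rt))
except RuntimeError as err:
    err_msg = str(err)
print(err_msg)
```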
`if` and `for` do not create a scope in Python (they do in C++):