cudev: Add __shfl_down implementation for long long and unsigned long for CUDA Toolkit < 9.0 #3963
Hi, I thought I would chime in, as this relates to my recent PR. The company I work for uses a Jetson TX2 (CC 6.2) with CUDA Toolkit 10.2, and everything seems to work, so I took a look at it. It looks like all devices that support warp shuffle (CC ≥ 3.0) will support shuffle with

As to whether it is implemented in software or hardware: I think warp shuffles are always done 32 bits at a time because registers are 32 bits wide, so a 64-bit type just takes two shuffles. The PTX in Compiler Explorer also indicates this: https://godbolt.org/z/nxdYcqoWe. If the PTX view doesn't show, try opening a new compiler window.

I think the if-statement should be changed to check the CUDA Toolkit version instead. The current code will change the behavior on Jetson TX2 even though it should be supported. Does OpenCV specify a minimum CUDA Toolkit version?
@troelsy The flag was just a test to try to fix the crash on CC 5.3 devices. Do you have access to CUDA Toolkit < 9.0 to test whether
TX2 should be able to run CUDA Toolkit 8.0, but my department doesn't have access to the firmware, so I can't try it out.
@asmorkalov Do you have a machine with CUDA Toolkit < 9.0 on it to check this?
No, unfortunately. I want to deploy something on a desktop PC, but only after the 4.12 release. It's almost ready.
Would you like me to kill this PR or change the
…long for CUDA Toolkit versions < 9.0
3051f9a to af8945e
@cudawarped Is it ready for merge?
Draft fix for #3962.
Support for __shfl_down on long long was not introduced until CUDA Toolkit 9.0. I don't know whether this is just software support or whether hardware support was added as well. It's a long shot, but it may be the reason that the tests are failing on Compute Capability 5.3 devices.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.