Open
Labels: cuda kernels, enhancement
Description
In CUDA C you can explicitly request vectorized loads/stores using the special vector types (float2, float4). I have sometimes found those useful for squeezing out the last bit of performance. This definitely isn't high priority, but I was wondering how hard it would be to add something similar to CUDAnative.
JuliaGPU/CUDAnative.jl#174 is related, but maybe some of the problems have been solved?
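For reference, here is a minimal sketch of the CUDA C pattern I have in mind. The kernel name `copy_vec4` and the sizes are made up for illustration, and it assumes both pointers are 16-byte aligned (which holds for pointers returned by the CUDA allocators). Casting to `float4` typically makes the compiler emit a single 128-bit load/store per thread instead of four 32-bit ones, though the exact code generated depends on the compiler and architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: copies n floats from src to dst.
// Assumes src and dst are 16-byte aligned so the float4 casts are valid.
__global__ void copy_vec4(const float* __restrict__ src,
                          float* __restrict__ dst, int n)
{
    int i  = blockIdx.x * blockDim.x + threadIdx.x;
    int n4 = n / 4;  // number of complete float4 chunks

    if (i < n4) {
        // One vectorized (128-bit) load and store per thread.
        float4 v = reinterpret_cast<const float4*>(src)[i];
        reinterpret_cast<float4*>(dst)[i] = v;
    }

    // The first (n % 4) threads copy the leftover elements with scalar accesses.
    if (i < n % 4) {
        int tail = n4 * 4 + i;
        dst[tail] = src[tail];
    }
}

int main()
{
    const int n = 1027;  // deliberately not a multiple of 4
    float *src = nullptr, *dst = nullptr;
    cudaMallocManaged(&src, n * sizeof(float));
    cudaMallocManaged(&dst, n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        src[i] = static_cast<float>(i);
        dst[i] = -1.0f;
    }

    const int threads = 256;
    int blocks = (n / 4 + threads - 1) / threads;
    if (blocks == 0) blocks = 1;  // still need threads for the scalar tail

    copy_vec4<<<blocks, threads>>>(src, dst, n);
    cudaDeviceSynchronize();

    // Verify the copy on the host.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (dst[i] != src[i]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(src);
    cudaFree(dst);
    return mismatches == 0 ? 0 : 1;
}
```

The question is whether an equivalent of the `float4` cast above could be expressed from Julia so that the generated PTX uses the vector load/store instructions.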