Status: Closed
Labels: gpu (anything involving a CuArray or similar)
Description
I was asked to make this an issue, but I'm not sure there's anything to be done about it.
TensorCast uses scalar indexing when an index shared by both inputs (and not summed over) changes position in the output, which is nightmarishly slow for big calculations on the GPU.
```julia
using TensorCast
using CUDA
CUDA.allowscalar(false)

a, b, c, d = 1, 2, 3, 4
A = cu(rand(a,b,c))
B = cu(rand(a,c,d))

# Fast
@reduce C[a,b,d] := sum(c) A[a,b,c] * B[a,c,d]
# Fast
@reduce C[a,d,b] := sum(c) A[a,b,c] * B[a,c,d]
# Not fast
@reduce C[b,a,d] := sum(c) A[a,b,c] * B[a,c,d]
# Fast
@reduce C[b,c,d] := sum(a) A[a,b,c] * B[a,c,d]
# Not fast
@reduce C[c,b,d] := sum(a) A[a,b,c] * B[a,c,d]
```
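To make the relationship between the "fast" and "not fast" cases concrete, here is a minimal CPU-only sketch using plain loops on ordinary `Array`s (no TensorCast, no CUDA). The helper names `contract_abd` and `contract_bad` are hypothetical and only spell out what the first and third `@reduce` lines compute. It suggests that the slow output layout `C[b,a,d]` is just a permutation of the fast layout `C[a,b,d]`.

```julia
# Hypothetical reference implementations, for illustration only.

# C[a,b,d] = sum over c of A[a,b,c] * B[a,c,d]  (the "fast" layout)
function contract_abd(A, B)
    na, nb, nc = size(A)
    nd = size(B, 3)
    @assert size(B, 1) == na && size(B, 2) == nc
    C = zeros(eltype(A), na, nb, nd)
    for d in 1:nd, c in 1:nc, b in 1:nb, a in 1:na
        C[a, b, d] += A[a, b, c] * B[a, c, d]
    end
    return C
end

# C[b,a,d] = the same sum, with the shared index a moved to position 2
# (the "not fast" layout)
function contract_bad(A, B)
    na, nb, nc = size(A)
    nd = size(B, 3)
    @assert size(B, 1) == na && size(B, 2) == nc
    C = zeros(eltype(A), nb, na, nd)
    for d in 1:nd, c in 1:nc, b in 1:nb, a in 1:na
        C[b, a, d] += A[a, b, c] * B[a, c, d]
    end
    return C
end

A = rand(1, 2, 3)
B = rand(1, 3, 4)
C_fast = contract_abd(A, B)
C_slow = contract_bad(A, B)

# Swapping the first two dimensions of the fast result gives the slow layout
@assert C_slow ≈ permutedims(C_fast, (2, 1, 3))
```

Assuming this carries over to `CuArray`s, one possible workaround (not something TensorCast is confirmed to do) is to run the fast `@reduce C[a,b,d] := ...` line and then call `permutedims(C, (2, 1, 3))`, since `permutedims` on a GPU array should run as a single whole-array operation rather than element-by-element scalar indexing.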