Conversation

@mcabbott mcabbott commented Apr 3, 2020

This is an alternative to #187.

It similarly allows batched_mul to work on many PermutedDimsArrays, but does this simply by calling strides(A) and branching. It also extends batched_mul! to accept α, β scaling arguments, like 5-argument mul!.
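To illustrate the intended α, β semantics (not the PR's actual implementation — just a hypothetical reference loop, with `batched_mul_ref!` a made-up name), each batch slice follows 5-argument mul!, i.e. C[:,:,k] = α*A[:,:,k]*B[:,:,k] + β*C[:,:,k]:

```julia
using LinearAlgebra

# Hypothetical reference implementation of the α, β semantics described above.
# Each slice along dim 3 follows 5-argument mul!: C .= α*A*B + β*C, batchwise.
function batched_mul_ref!(C, A, B, α=true, β=false)
    for k in axes(C, 3)
        mul!(view(C, :, :, k), view(A, :, :, k), view(B, :, :, k), α, β)
    end
    return C
end

A = rand(2, 3, 4); B = rand(3, 2, 4); C = zeros(2, 2, 4)
batched_mul_ref!(C, A, B)          # plain batched product, α=1, β=0
```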

It adds two functions, storage_type and is_strided, both of which recursively unwrap wrapper types. This avoids having to dispatch on types like BatchedAdjoint{PermutedDimsArray{...,CuArray}}; instead it can separately check the underlying storage, and whether it should be safe to call strides(A).
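A minimal sketch of the recursive-unwrap idea — the names follow the PR's description, but these simplified definitions are illustrative, not the ones actually merged into NNlib:

```julia
using LinearAlgebra

# Walk through wrappers via parent() until we hit the underlying storage.
storage_type(A::AbstractArray) = typeof(A)
storage_type(A::Union{PermutedDimsArray, SubArray, Adjoint, Transpose}) =
    storage_type(parent(A))

# Dense arrays and permutations of them support strides(A); by default, assume not.
is_strided(A::DenseArray) = true
is_strided(A::PermutedDimsArray) = is_strided(parent(A))
is_strided(A::AbstractArray) = false

P = PermutedDimsArray(rand(2, 3, 4), (3, 1, 2))
storage_type(P)   # Array{Float64, 3} — the wrapper is peeled away
is_strided(P)     # true — safe to call strides(P)
```

The point is that a check like `storage_type(A) <: CuArray` keeps working no matter how deeply the array is wrapped, without enumerating every wrapper combination in dispatch.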

It also improves batched_gemm! to multi-thread over the batch index (using JuliaLang/julia#36360 to save and restore the number of BLAS threads), and to allow size(A, 3) == 1 (batching only B and C).
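The threading pattern can be sketched as below — a hypothetical simplification (`batched_gemm_sketch!` is a made-up name), assuming Julia ≥ 1.6 for `BLAS.get_num_threads`, which JuliaLang/julia#36360 added:

```julia
using LinearAlgebra

# Sketch: thread over the batch index with BLAS temporarily single-threaded
# (saving and restoring the thread count), and reuse A's single slice when
# size(A, 3) == 1 so that only B and C are batched.
function batched_gemm_sketch!(C, A, B)
    old = BLAS.get_num_threads()    # needs Julia ≥ 1.6 (JuliaLang/julia#36360)
    BLAS.set_num_threads(1)         # avoid oversubscription inside @threads
    try
        Threads.@threads for k in axes(C, 3)
            ka = size(A, 3) == 1 ? 1 : k    # broadcast A along the batch
            mul!(view(C, :, :, k), view(A, :, :, ka), view(B, :, :, k))
        end
    finally
        BLAS.set_num_threads(old)   # restore whatever the caller had
    end
    return C
end
```

Running one BLAS thread per batch slice trades intra-matrix parallelism for inter-slice parallelism, which tends to pay off when the individual matrices are small and the batch is large.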

@mcabbott

Bump. Who has merge permissions here? @CarloLucibello, @DhairyaLGandhi?

mcabbott pushed a commit to mcabbott/CUDA.jl that referenced this pull request Oct 24, 2020
@CarloLucibello

Oops, didn't see this, thanks

@CarloLucibello CarloLucibello merged commit 9780c29 into FluxML:master Nov 11, 2020
@mcabbott mcabbott deleted the fix2 branch November 11, 2020 08:54
@mcabbott

Thanks!
