Closed
Description
OpenBLAS's optimized matrix multiplication appears not to be working on an AWS EC2 ARM Graviton2 (Neoverse N1) system with the following Julia setup:
- OpenBLAS 0.3.9 (shipped with Julia master and v1.5-rc1)
- LLVM 10 (PR "upgrade LLVM to 10", JuliaLang/julia#35318) (may be irrelevant)
- Updated ARM CPU detection (PR "Update ARM feature and CPU detection (supersedes #36464)", JuliaLang/julia#36485) (may be irrelevant)
```
julia> versioninfo()
Julia Version 1.6.0-DEV.341
Commit 8367e441ac* (2020-07-01 18:30 UTC)
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.0 (ORCJIT, neoverse-n1)
Environment:
  JULIA_NUM_THREADS = 16

julia> LinearAlgebra.BLAS.openblas_get_config()
"OpenBLAS 0.3.9 NO_AFFINITY ARMV8 MAX_THREADS=32"

julia> using BenchmarkTools

julia> @btime x * x setup=(x=rand(Float32, 100, 100));
  21.123 ms (2 allocations: 39.14 KiB)
```
For comparison, the same benchmark on a Mac:
```
julia> @btime x * x setup=(x=rand(Float32, 100, 100));
  18.161 μs (2 allocations: 39.14 KiB)
```
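To put the gap in perspective, here is a rough throughput estimate from the two `@btime` timings above. It assumes the usual count of about 2n³ floating-point operations for a dense n×n by n×n product and ignores memory and threading effects, so treat it as a back-of-the-envelope sketch rather than a precise measurement.

```julia
# Rough throughput estimate from the two @btime results in this report.
# Assumption: a dense n×n * n×n product costs about 2n^3 floating-point operations.
n = 100
flops = 2 * n^3                  # 2_000_000 operations per x * x

t_graviton = 21.123e-3           # seconds (21.123 ms on the Graviton2 system)
t_mac      = 18.161e-6           # seconds (18.161 μs on the Mac)

# Convert a wall-clock time in seconds into GFLOP/s.
gflops(t) = flops / t / 1e9

println("Graviton2: ", round(gflops(t_graviton); digits = 3), " GFLOP/s")
println("Mac:       ", round(gflops(t_mac); digits = 1), " GFLOP/s")
println("Slowdown:  ", round(Int, t_graviton / t_mac), "x")
```

The Graviton2 result works out to well under 1 GFLOP/s against roughly 110 GFLOP/s on the Mac, a slowdown of about three orders of magnitude. That is far more than differences in core count or clock speed would explain for a 100×100 `Float32` product.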