This came up in the context of #22210, where I noticed a big performance hit in transpose for sparse matrices. A convenient test case is to copy these lines to a separate file, annotate _computecolptrs_halfperm! with @noinline (not strictly necessary, since it doesn't inline on master), and then compare the results of annotating _distributevals_halfperm! with either @noinline or @inline.
Demo:
A = sprand(600, 600, 0.01);
X = transpose(A);
using BenchmarkTools
With @inline on _distributevals_halfperm!:
julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial:
memory estimate: 166.98 KiB
allocs estimate: 10685
--------------
minimum time: 921.938 μs (0.00% GC)
median time: 936.064 μs (0.00% GC)
mean time: 954.923 μs (0.40% GC)
maximum time: 1.627 ms (38.60% GC)
--------------
samples: 1046
evals/sample: 1
With @noinline on _distributevals_halfperm!:
julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial:
memory estimate: 64 bytes
allocs estimate: 2
--------------
minimum time: 23.175 μs (0.00% GC)
median time: 23.390 μs (0.00% GC)
mean time: 23.658 μs (0.00% GC)
maximum time: 52.727 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Inspection does not suggest an immediate reason for this roughly 40x performance gap (or for the jump from 2 to 10685 allocations); profiling places all the blame on this line, where the function is evaluated. That made me wonder whether there is some problem with inlining the function call.
However, the truly bizarre part is that, with @inline, the output of @code_llvm _distributevals_halfperm!(X, A, 1:A.n, identity) is, for all practical purposes that I can see, identical to that of @code_llvm halfperm!(X, A, 1:A.n, identity) (aside from the obvious call to _computecolptrs_halfperm!). I am not at all good at reading assembly, but even there the differences do not seem dramatic to me (some movq instructions use different constants, which might be problematic?).
This seems really puzzling. LLVM bug? Present at least on 0.6.0-rc3 and master.
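For anyone who wants to poke at this without copying the sparse code, here is a minimal sketch of the kind of comparison I have in mind. It is not the real halfperm! code: kernel_inline!, kernel_noinline!, outer_inline!, outer_noinline! and measure are hypothetical names made up for illustration, and I would not assume this toy version actually reproduces the slowdown. It only shows the shape of the test: the same kernel taking a function argument, once with @inline and once with @noinline, called through a thin outer function, with allocations compared after a warm-up call.

```julia
# Hypothetical stand-in kernels (NOT the real _distributevals_halfperm!).
# Both apply `f` to each index and store the corresponding value of `src`.
# The `f::F where {F}` signature asks Julia to specialize on the function type.
@inline function kernel_inline!(dest::Vector{Int}, src::Vector{Int}, f::F) where {F}
    @inbounds for i in eachindex(src)
        dest[f(i)] = src[i]
    end
    return dest
end

@noinline function kernel_noinline!(dest::Vector{Int}, src::Vector{Int}, f::F) where {F}
    @inbounds for i in eachindex(src)
        dest[f(i)] = src[i]
    end
    return dest
end

# Thin outer functions, playing the role that halfperm! plays in the issue.
outer_inline!(dest, src, f::F) where {F} = kernel_inline!(dest, src, f)
outer_noinline!(dest, src, f::F) where {F} = kernel_noinline!(dest, src, f)

# Measure allocations inside a function so that global-scope overhead
# is less likely to distort the numbers.
measure(outer!, dest, src) = @allocated outer!(dest, src, identity)

src = collect(1:1000)
dest = similar(src)

# Warm up so that compilation is not counted.
measure(outer_inline!, dest, src)
measure(outer_noinline!, dest, src)

alloc_inline = measure(outer_inline!, dest, src)
alloc_noinline = measure(outer_noinline!, dest, src)

println("bytes allocated with @inline kernel:   ", alloc_inline)
println("bytes allocated with @noinline kernel: ", alloc_noinline)
```

If the two byte counts differ substantially for a given kernel, that would point in the same direction as the benchmarks above, namely that the extra cost comes from allocations around the function call rather than from the loop itself.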