Skip to content

Conversation

judober
Copy link
Contributor

@judober judober commented Jan 13, 2019

Concerning this issue.
My proposal for increased performance of A/B if A is 2x2 or 3x3 and B is 2or3xn.
There is an ongoing discussion about this.

@coveralls
Copy link

coveralls commented Jan 13, 2019

Coverage Status

Coverage increased (+0.04%) to 66.012% when pulling 9faf9e2 on judober:Increase_Matrix_Div_Performance into 786b6f0 on JuliaArrays:master.

@judober
Copy link
Contributor Author

judober commented Jan 14, 2019

Speed comparison (other machine than on discourse).
Original version:

julia> a = Float64.(@SMatrix rand(3,3)); b = Float64.(@SMatrix rand(3,3));

julia> @benchmark a\b
BenchmarkTools.Trial: 
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     149.989 ns (0.00% GC)
  median time:      153.293 ns (0.00% GC)
  mean time:        184.259 ns (7.69% GC)
  maximum time:     75.550 μs (99.69% GC)
  --------------
  samples:          10000
  evals/sample:     818

This PR (with the same matrices):

julia> @benchmark a\b
BenchmarkTools.Trial: 
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     66.859 ns (0.00% GC)
  median time:      70.639 ns (0.00% GC)
  mean time:        98.225 ns (16.09% GC)
  maximum time:     73.637 μs (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     978

@tkoolen
Copy link
Contributor

tkoolen commented Jan 14, 2019

Could you run the following benchmark (results below are on my machine with StaticArrays 0.10.2):

julia> using StaticArrays, BenchmarkTools

julia> @benchmark a \ b setup = begin
           a = rand(SMatrix{3, 3})
           b = rand(SMatrix{3, 3})
       end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     71.394 ns (0.00% GC)
  median time:      71.689 ns (0.00% GC)
  mean time:        74.010 ns (0.00% GC)
  maximum time:     113.456 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     974

@andyferris
Copy link
Member

Seems like a reasonable approach to me.

@judober
Copy link
Contributor Author

judober commented Jan 15, 2019

New test with old version:

julia> using StaticArrays, BenchmarkTools

julia> @benchmark a \ b setup = begin
                  a = rand(SMatrix{3, 3})
                  b = rand(SMatrix{3, 3})
              end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     104.663 ns (0.00% GC)
  median time:      106.303 ns (0.00% GC)
  mean time:        118.250 ns (0.00% GC)
  maximum time:     1.198 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     937

With PR:

julia> using StaticArrays, BenchmarkTools

julia> @benchmark a \ b setup = begin
                  a = rand(SMatrix{3, 3})
                  b = rand(SMatrix{3, 3})
              end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     30.143 ns (0.00% GC)
  median time:      32.320 ns (0.00% GC)
  mean time:        35.998 ns (0.00% GC)
  maximum time:     1.324 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     994

Copy link
Contributor

@tkoolen tkoolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few minor comments. Nice that the allocations are getting elided now.

@tkoolen
Copy link
Contributor

tkoolen commented Jan 15, 2019

Do you think the repeated computation of the determinant of A for each of the columns inside the inner solve calls is being optimized away? How long does det(A) take on your machine?

@judober
Copy link
Contributor Author

judober commented Jan 15, 2019

Ok, first the performance of det(A):

julia> using LinearAlgebra

julia> @benchmark det(a) setup = begin
                  a = rand(SMatrix{3, 3})
              end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     17.886 ns (0.00% GC)
  median time:      17.888 ns (0.00% GC)
  mean time:        20.050 ns (0.00% GC)
  maximum time:     169.474 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

You might be right that this is a point for further improvements. I will try it.
Though, when I compare the numbers, it must be optimised away, otherwise the four calculations of det(a) would need alone approx. 60 ns.

@judober
Copy link
Contributor Author

judober commented Jan 15, 2019

The results when adding OneTo:

julia> @benchmark a \ b setup = begin
                  a = rand(SMatrix{3, 3})
                  b = rand(SMatrix{3, 3})
              end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     30.143 ns (0.00% GC)
  median time:      30.303 ns (0.00% GC)
  mean time:        34.886 ns (0.00% GC)
  maximum time:     460.771 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     994

seems pretty identical.

@tkoolen
Copy link
Contributor

tkoolen commented Jan 16, 2019

Though, when I compare the numbers, it must be optimised away, otherwise the four calculations of det(a) would need alone approx. 60 ns.

Yeah, I agree. That's cool!

No further comments from me, @andyferris, good to go?

@c42f
Copy link
Member

c42f commented Apr 26, 2019

I think this is fine, let's merge it.

@c42f c42f merged commit 0759078 into JuliaArrays:master Apr 26, 2019
@c42f c42f mentioned this pull request Jun 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants