
Large (20x) runtime difference between value and Jacobian with simple sparsity pattern #138

@moyner

Description


As mentioned in issue #136, I am observing a larger-than-expected gap between the AD and non-AD runtimes. Here is a MWE with a simple sparsity pattern that isolates the behavior and performs some sanity checks:

using ForwardDiff
using SparsityDetection, SparseArrays
using SparseDiffTools
using BenchmarkTools
using Test

n = 1000*1000
# n = 1000
L = [i for i in 1:n]         # left/right index arrays (not used by g below,
R = [i % n + 1 for i in 1:n] # kept here to document the sparsity pattern)

function g(out, in)
    n = length(in)
    for i in eachindex(in)
        l = i
        r = i % n + 1
        out[i] = in[l] - in[r]
    end
end
inp = rand(n)
out = similar(inp)
# Time case without any AD
println("Calling value version:")
@btime g(out, inp)

pattern = jacobian_sparsity(g, out, inp)
jacA = Float64.(sparse(pattern)) # Output from non-cached version
jacB = Float64.(sparse(pattern)) # Output from JacCache'd version

colors = matrix_colors(jacA)
# Verify that it works without cache
println("Calling sparse AD without cache")
@btime forwarddiff_color_jacobian!(jacA, g, inp, colorvec = colors,
                                           sparsity = pattern)
# Set up cache
cache = ForwardColorJacCache(g, inp, colorvec = colors, 
                                     sparsity = pattern);
# Call with cache
println("Calling sparse AD with cache")
@btime forwarddiff_color_jacobian!(jacB, g, inp, cache)
# Check that they have same pattern & values
@test nnz(jacA-jacB) == 0
# Basic sanity check
@test maximum(colors) == 2

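As a side note, the alternating coloring can be reproduced on a tiny instance of the same cyclic pattern outside the benchmark script (this snippet is illustrative and not part of the original report; variable names mirror the MWE):

```julia
# Build the cyclic difference pattern directly for a small even n and
# verify that it 2-colors with alternating colors.
using SparseArrays, SparseDiffTools

n = 6
rows = repeat(1:n, 2)                      # each row i has two entries...
cols = vcat(1:n, [i % n + 1 for i in 1:n]) # ...in columns i and i % n + 1
pattern = sparse(rows, cols, ones(2n), n, n)
colors = matrix_colors(pattern)
@assert maximum(colors) == 2               # two seed directions suffice
```

For odd n the conflict graph is an odd cycle and a third color would be needed, so the 2-color expectation only holds for even n as used here.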
The function is a cyclic version of diff, the coloring alternates 1,2, and the Jacobian is stored as a general sparse matrix. Filling the Jacobian should ideally cost somewhere around 2-4x the value function call, since with two colors a single AD pass operates on Duals carrying two extra floats each, if we disregard cache effects and the cost of handling the sparse data structure. Output for n = 2, 1000 and 1,000,000:

n = 2
Calling value version:
  25.803 ns (0 allocations: 0 bytes)
Explored path: SparsityDetection.Path(Bool[], 1)
Calling sparse AD without cache
  8.733 μs (83 allocations: 4.38 KiB)
Calling sparse AD with cache
  93.172 ns (0 allocations: 0 bytes)
n = 1000
Calling value version:
  1.550 μs (0 allocations: 0 bytes)
Explored path: SparsityDetection.Path(Bool[], 1)
Calling sparse AD without cache
  213.300 μs (2573 allocations: 223.75 KiB)
Calling sparse AD with cache
  26.000 μs (0 allocations: 0 bytes)
n = 1000000
Calling value version:
  1.579 ms (0 allocations: 0 bytes)
Explored path: SparsityDetection.Path(Bool[], 1)
Calling sparse AD without cache
  283.338 ms (2999579 allocations: 218.15 MiB)
Calling sparse AD with cache
  35.923 ms (0 allocations: 0 bytes)

We see that the cache works: it gives roughly an 8x improvement in runtime and eliminates the allocations while yielding the same output, which is good! The ratio between the AD and value runtimes is 3.72 for n = 2, 17.3 for n = 1000 and 23.3 for n = 1e6, so there may be something here that is not linear with respect to the input size... or I have a bug somewhere in my setup. My versioninfo():

Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, icelake-client)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Relevant packages:

  "ForwardDiff"       => v"0.10.17"
  "BenchmarkTools"    => v"0.7.0"
  "SparsityDetection" => v"0.3.4"
  "SparseDiffTools"   => v"1.13.0"
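To make the 2-4x expectation concrete, here is a hand-rolled version of what a two-color compressed forward pass computes for this specific pattern (a hypothetical helper for illustration, not the SparseDiffTools implementation):

```julia
using ForwardDiff: Dual, partials
using SparseArrays: spzeros

# Hand-rolled two-color compressed Jacobian for the cyclic difference
# pattern (row i touches columns i and i % n + 1). Assumes `colors` is a
# valid 2-coloring, i.e. n is even and the colors alternate 1,2.
function manual_color_jacobian(g, x, colors)
    n = length(x)
    # Seed column j along unit direction number colors[j]; one call to g
    # on these Duals evaluates both seed directions simultaneously.
    xd = [Dual(x[j], colors[j] == 1 ? 1.0 : 0.0,
                     colors[j] == 2 ? 1.0 : 0.0) for j in 1:n]
    outd = similar(xd)
    g(outd, xd)
    # Decompress: entry (i, j) of the Jacobian is the partial of outd[i]
    # along the seed direction that column j was assigned.
    J = spzeros(n, n)
    for i in 1:n, j in (i, i % n + 1)
        J[i, j] = partials(outd[i])[colors[j]]
    end
    return J
end
```

A single call to `g` on Duals carrying two partials produces both colors at once, which is where the 2-4x estimate comes from; the decompression loop that scatters partials into the sparse matrix is where overhead beyond that estimate can creep in.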
