
Conversation

@timholy (Member) commented Mar 5, 2016

This is a largely janitorial change that allows us to defer the definition of StridedArray until much later in the bootstrap process. This was motivated by my (still unfinished) work on ReshapedArrays, but this appears to have some value on its own merits: now that we have nice iterators and indexing, "strides" simply aren't a very important concept anymore for pure julia code. So the usage of StridedArray should generally be limited to interactions with outside libraries that require strided data layout.

This also fixes a bug (currently on master):

immutable MyArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end

Base.size(A::MyArray) = size(A.data)
Base.getindex(A::MyArray, indexes...) = A.data[indexes...]

A = MyArray(rand(4,5))
b = rand(5)

julia> A*b
ERROR: LoadError: MethodError: no method matching A_mul_B!(::Array{Float64,1}, ::MyArray{Float64,2}, ::Array{Float64,1})
Closest candidates are:
...

The most important reason I decided to submit this separately is that janitorial changes have a nasty habit of generating unexpected performance regressions, so I wanted to run runbenchmarks("array", vs = "JuliaLang/julia:master").


@test_throws ErrorException transpose(sub(sprandn(10, 10, 0.3), 1:4, 1:4))
@test_throws ErrorException ctranspose(sub(sprandn(10, 10, 0.3), 1:4, 1:4))

@timholy (Member Author):

While it's always scary to delete tests, I wasn't sure why this was something that needed testing. On this branch,

julia> A = sub(sprandn(10, 10, 0.3), 1:4, 1:4)
4x4 SubArray{Float64,2,SparseMatrixCSC{Float64,Int64},Tuple{UnitRange{Int64},UnitRange{Int64}},false}:
 0.0  0.0      -1.41166    0.0    
 0.0  0.0       0.0        0.0    
 0.0  0.0       0.628624   1.40626
 0.0  1.08286   0.0       -0.97074

julia> transpose(A)
4x4 sparse matrix with 5 Float64 entries:
        [3, 1]  =  -1.41166
        [3, 3]  =  0.628624
        [4, 3]  =  1.40626
        [2, 4]  =  1.08286
        [4, 4]  =  -0.97074

julia> full(transpose(A))
4x4 Array{Float64,2}:
  0.0      0.0  0.0        0.0    
  0.0      0.0  0.0        1.08286
 -1.41166  0.0  0.628624   0.0    
  0.0      0.0  1.40626   -0.97074

which seems like a rather satisfactory result. (It's even sparse!)

CC @kshyatt

Contributor:

How does it perform relative to the non-SubArray sparse transpose?

Contributor:

And could this be changed to an affirmative test now that it works?

@timholy (Member Author):

Haven't tested, but I'm sure it's dreadful. We need a good iterator for sparse matrices and their views.
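
For context, here's a rough sketch (not part of this PR) of the kind of stored-entry iteration such an iterator would expose for a plain SparseMatrixCSC; foreach_stored is a hypothetical name, and views would need an analogous (and harder) version:

function foreach_stored(f, S::SparseMatrixCSC)
    for j in 1:size(S, 2)
        # stored entries of column j occupy colptr[j]:(colptr[j+1]-1)
        for k in S.colptr[j]:(S.colptr[j+1]-1)
            f(S.rowval[k], j, S.nzval[k])
        end
    end
end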

@timholy (Member Author) commented Mar 5, 2016

@jrevels, any idea about the Nanosoldier error?

@nalimilan (Member):

Really nice!

@tkelman (Contributor) commented Mar 6, 2016

The fixed bug with a wrapper AbstractArray type should be tested for, no?
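
For illustration, such a test might look something like the following sketch, mirroring the MyArray example from the PR description (WrappedArray and the surrounding scaffolding are hypothetical, not actual test-suite code):

using Base.Test

immutable WrappedArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::WrappedArray) = size(A.data)
Base.getindex(A::WrappedArray, i...) = A.data[i...]

# With this PR, generic matvec should dispatch instead of throwing a MethodError.
let A = WrappedArray(rand(4,5)), b = rand(5)
    @test isapprox(A*b, A.data*b)
end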

@timholy (Member Author) commented Mar 6, 2016

Let's try again and see if it magically works this time:
runbenchmarks("array", vs = "JuliaLang/julia:master")

timholy added 3 commits on March 7, 2016, among them:
"This allows us to defer the definition of StridedArray until much later in the bootstrap process."
@jrevels (Member) commented Mar 7, 2016

@jrevels, any idea about the Nanosoldier error?

Sorry about the delay; I've been battling the flu. Let's see if the patch I just pushed fixed things:

runbenchmarks("array", vs = "JuliaLang/julia:master")

@timholy force-pushed the teh/less_striding branch from 87cfcb4 to 1982118 on March 7, 2016 at 15:58
@timholy (Member Author) commented Mar 7, 2016

Sorry to hear about the flu! Hope you're winning the battle!

I think I discovered the problem: sparse matrix multiplication was dispatching differently. I added another commit to fix that. Can we cancel the run you started and try fresh?

@timholy (Member Author) commented Mar 7, 2016

When we launch it, I guess we probably want "array" || "sparse" || "linalg".

@jrevels (Member) commented Mar 7, 2016

runbenchmarks("array" || "sparse" || "linalg", vs = "JuliaLang/julia:master")

@jrevels (Member) commented Mar 7, 2016

Hmm. Looks like the tag predicate validation is wonky now. I'll look into it.

@timholy (Member Author) commented Mar 7, 2016

I should add that I've run the sparse tests locally, and things look fine. I'm launching a run with the full "array" || "linalg" || "sparse" locally, and will fix any issues. You can go back to bed & rest now!

@jrevels (Member) commented Mar 7, 2016

Appreciated. Though it's difficult to rest when there are bugs afoot! Let's give it one last shot:

runbenchmarks("array" || "sparse" || "linalg", vs = "JuliaLang/julia:master")

@Jutho (Contributor) commented Mar 7, 2016

Am I correct that the goal here is to delay the definition of StridedArray, even when this widens function definitions to generic AbstractArray instances in cases where it is probably not wise or even possible?

I spotted that permutedims for AbstractArray calls permutedims! which is still only defined for StridedArray. Likewise, the definitions of (c)transpose! using a recursive blocking strategy were written with a strided memory layout in mind.

@timholy (Member Author) commented Mar 7, 2016

I spotted that permutedims for AbstractArray calls permutedims! which is still only defined for StridedArray

I noticed that too. My view is that one MethodError is as good as another: now permutedims will succeed, but the nested call to permutedims! will not.

Likewise, the definitions of (c)transpose! using a recursive blocking strategy were written with a strided memory layout in mind.

I've given some thought to defining permutedims! in cases where we don't rely on striding, but I haven't coded anything. In general terms, I think it will look vaguely like this:

for jblk in 1:skipj:size(A,d1)  # d1 is the dimension that is fastest in the permuted array B
    for iblk in 1:skipi:size(A,1)
        for j in jblk:min(jblk+skipj-1, size(A,d1))
            for i in iblk:min(iblk+skipi-1, size(A,1))
                B[j,IB1,i,IB2] = A[i,IA1,j,IA2]
            end
        end
    end
end

where all the I iterators are CartesianIndexes.

Might be a good "up for grabs" issue.
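
For the simplest case, a plain 2-d transpose, a runnable version of this blocking idea might look like the sketch below; blocked_transpose! and the default block size are illustrative choices, not part of this PR:

# Visit A in blk-by-blk tiles so that each tile of A, and the corresponding
# tile of B, stays cache-resident while it is being copied.
function blocked_transpose!(B::AbstractMatrix, A::AbstractMatrix, blk::Int=64)
    size(B,1) == size(A,2) && size(B,2) == size(A,1) || throw(DimensionMismatch())
    for jblk in 1:blk:size(A,2), iblk in 1:blk:size(A,1)
        for j in jblk:min(jblk+blk-1, size(A,2)), i in iblk:min(iblk+blk-1, size(A,1))
            B[j,i] = A[i,j]
        end
    end
    return B
end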

@timholy (Member Author) commented Mar 8, 2016

Oh, and I forgot to respond to

Am I correct that the goal here is to delay the definition of StridedArray, even when this widens function definitions to generic AbstractArray instances in cases where it is probably not wise or even possible?

Delaying is part of the goal, but not the only one. Our multiplication routine, for example, is general, yet we get the error at the top. In general, I suspect we've been using StridedMatrix to mean "not sparse," but the problem is that it cuts out more than just sparse.

@andreasnoack (Member):

I suspect we've been using StridedMatrix to mean "not sparse," but the problem is that it cuts out more than just sparse.

Any kind of distributed array is also doomed once you start elementwise indexing. It is already enough of a problem that, during the development cycle, I don't make any kind of distributed array a subtype of AbstractArray, because otherwise it would silently hit the "generic" fallbacks that use elementwise indexing.

I'm sure you have mentioned it somewhere before, but why is it that your arrays cannot be considered Dense?

@timholy (Member Author) commented Mar 8, 2016

Sure, I completely agree that some types won't be efficient, and that this is a genuine problem. But that's true for many algorithms not among the ones I modified: try calling sum on a DistributedArray, and you'll be quite unhappy unless you've written a specialized version of sum. Yet we've not been restricting sum to StridedArrays.

AFAICT, with the exception of cummin, cummax, and permutedims, there's nothing in the implementations of the functions I modified that assumes strides. I replaced cummin and cummax with more generic versions, so only permutedims would need to be added (and I left the type constraint on the workhorse function permutedims!, so we're basically in the same place we were before).

Since essentially none of the algorithms required strides, dispatching on StridedArray doesn't seem to make sense. I suppose we could dispatch on DenseArray, but what about this type:

type ProductMatrix{T} <: AbstractMatrix{T}
    xrange::StepRange{T,T}
    yrange::StepRange{T,T}
end

Base.size(A::ProductMatrix) = (length(A.xrange), length(A.yrange))  # required by the AbstractArray interface
Base.getindex(A::ProductMatrix, i, j) = A.xrange[i]*A.yrange[j]

Definitely not a DenseArray, yet matrix multiplication and all those arithmetic operations work just fine as written. Therefore, they should be supported.
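
For instance, a quick (illustrative) check that the generic fallbacks apply:

A = ProductMatrix(1:1:4, 2:2:10)  # 4x5 rank-one matrix with A[i,j] = xrange[i]*yrange[j]
b = rand(5)
A*b                               # hits the generic AbstractMatrix matvec under this PR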

I'm simply of the growing belief that things should work first and foremost, and be efficient secondarily. If one needs a more efficient version of an algorithm, one should write it, but we shouldn't make people re-implement all of array arithmetic simply because we were worried that the generic algorithm might be slow for a subset of types.

@nanosoldier (Collaborator):

Your benchmark job has completed, but something went wrong when trying to upload the result data. cc @jrevels

EDIT: I forgot to turn off a debug flag I was using when messing with the report uploading code...oops. Here are the results -- Jarrett

| Group ID | Benchmark ID | time | GC time | memory allocated | number of allocations |
|---|---|---|---|---|---|
| "array index sum" | ("sumelt", "BaseBenchmarks.ArrayBenchmarks.ArrayLF{Float32,2}", (3,5)) | 0.78 | 1.00 | 1.00 | 1.00 |
| "array index sum" | ("sumelt", "BaseBenchmarks.ArrayBenchmarks.ArrayLSLS{Int32,2}", (300,500)) | 0.65 | 1.00 | 1.00 | 1.00 |
| "array index sum" | ("sumelt", "BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32,2}", (300,500)) | 0.65 | 1.00 | 1.00 | 1.00 |
| "array index sum" | ("sumrange", "BaseBenchmarks.ArrayBenchmarks.ArrayLF{Int32,2}", (3,5)) | 1.62 | 0.97 | 1.00 | 1.00 |
| "simd" | ("axpy!", "Float32", 10) | 0.71 | 1.00 | 1.00 | 1.00 |
| "simd" | ("conditional_loop!", "Float32", 9) | 0.79 | 1.00 | 1.00 | 1.00 |
| "simd" | ("conditional_loop!", "Float32", 10) | 0.79 | 1.00 | 1.00 | 1.00 |
| "simd" | ("conditional_loop!", "Float64", 9) | 0.75 | 1.00 | 1.00 | 1.00 |
| "simd" | ("conditional_loop!", "Float64", 10) | 0.79 | 1.00 | 1.00 | 1.00 |
| "simd" | ("inner", "Float32", 10) | 0.74 | 1.00 | 1.00 | 1.00 |
| "simd" | ("inner", "Float32", 255) | 0.76 | 1.00 | 1.00 | 1.00 |
| "simd" | ("inner", "Float32", 256) | 0.77 | 1.00 | 1.00 | 1.00 |
| "simd" | ("loop_fields!", "Float32", "BaseBenchmarks.SIMDBenchmarks.ImmutableFields{V<:AbstractArray{T,1}}", 256) | 0.48 | 0.98 | 1.00 | 1.00 |
| "simd" | ("loop_fields!", "Float32", "BaseBenchmarks.SIMDBenchmarks.MutableFields{V<:AbstractArray{T,1}}", 256) | 0.48 | 0.99 | 1.00 | 1.00 |
| "simd" | ("manual_example!", "Float32", 10) | 0.71 | 1.00 | 1.00 | 1.00 |
| "simd" | ("manual_example!", "Int64", 9) | 0.76 | 1.00 | 1.00 | 1.00 |
| "simd" | ("manual_example!", "Int64", 10) | 0.76 | 1.00 | 1.00 | 1.00 |
| "simd" | ("two_reductions", "Float32", 10) | 0.66 | 1.00 | 1.00 | 1.00 |
| "simd" | ("two_reductions", "Float32", 256) | 0.79 | 1.00 | 1.00 | 1.00 |

@timholy (Member Author) commented Mar 8, 2016

While the nanosoldier benchmark didn't work out, it's not worth trying again: I've run the benchmarks on my local machine, and there are no reproducible regressions. For the record, when nanosoldier errors out it's definitely worth looking into: without 1982118, this PR would have had massive performance regressions for sparse matrix multiplication because of the change in dispatch priority. ("...janitorial changes have a nasty habit of generating unexpected performance regressions" is a lesson learned the hard way.)

I'll leave this open longer to ensure that any concerns have been addressed.

@nalimilan (Member):

I agree with @timholy. Generic definitions should accept any AbstractArray as long as that interface is enough for them to return correct results. More efficient methods can then be provided to improve performance, but there's no reason to refuse to do something just because it's likely to be slow. In some cases, the slow version can be good enough; by contrast, a MethodError is never useful.

@timholy (Member Author) commented Mar 8, 2016

Thanks, @jrevels!

@andreasnoack (Member):

a MethodError is never useful.

Above, I argued for the opposite. It is actually pretty useful when you try to exploit the structure of a type.

I'm okay with this change since it is consistent and will probably give a lot of users fewer error messages, but I wanted to point out that methods that extract each element of AbstractArrays will, in general, result in a lot of slow methods. Not just for sparse matrices but for all kinds of structured arrays. This includes the ProductMatrix, for which you could easily write an O(n) matvec but which will fall back to the generic O(n^2) version with this change.
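
For concreteness, that O(n) matvec could look roughly like this sketch (not code from the thread): since A[i,j] = xrange[i]*yrange[j], A*v collapses to one inner product followed by one scaling.

import Base: *

function *(A::ProductMatrix, v::AbstractVector)
    length(v) == length(A.yrange) || throw(DimensionMismatch())
    s = zero(promote_type(eltype(A), eltype(v)))
    for j in 1:length(v)              # O(n) inner product with the row factor
        s += A.yrange[j] * v[j]
    end
    return [x * s for x in A.xrange]  # O(n) scaling of the column factor
end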

@nalimilan (Member):

Above, I argued for the opposite. It is actually pretty useful when you try to exploit the structure of a type.

Sorry, but I don't understand what a MethodError allows you that a generic fallback prevents.

I'm okay with this change since it is consistent and will probably give a lot of users fewer error messages, but I wanted to point out that methods that extract each element of AbstractArrays will, in general, result in a lot of slow methods. Not just for sparse matrices but for all kinds of structured arrays. This includes the ProductMatrix, for which you could easily write an O(n) matvec but which will fall back to the generic O(n^2) version with this change.

You could easily write it even if the fallback exists, so what's the problem?

@andreasnoack (Member):

Sorry, but I don't understand what a MethodError allows you that a generic fallback prevents.

Getting the MethodError makes it easy to figure out which methods you'll have to define for a type. I'm simply speaking from my own experience implementing different kinds of distributed arrays: I end up not making them subtypes of AbstractArray, to make sure they don't fall back on any generic method that uses indexing. The complete call graph of a method is not always easy to guess, so in practice you have to do a lot of profiling to find the places where generic fallbacks are slowing down your code. It is easy to miss a method definition, and therefore we'll end up with slow methods. In theory, nothing prevents people from writing the fast methods, but in practice people will forget some when they don't get an error.

@nalimilan (Member):

I see. With protocols/traits, it would be easy to list all the methods that a type needs to support and to show, for each one, whether it uses the AbstractArray fallback or a specialized method.
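
As a toy sketch of that idea (all names hypothetical, using the same 0.4-era syntax as the examples above), a trait could record which implementation a type uses:

abstract MatVecStyle
immutable SpecializedMatVec <: MatVecStyle end
immutable FallbackMatVec    <: MatVecStyle end

# Default: assume a type relies on the generic AbstractArray fallback.
matvecstyle{T<:AbstractMatrix}(::Type{T}) = FallbackMatVec()

# A type with a fast method would opt in, e.g.:
# matvecstyle{T}(::Type{ProductMatrix{T}}) = SpecializedMatVec()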

@timholy (Member Author) commented Mar 8, 2016

That's a quite intriguing development strategy; I'd never thought of that before.

I expect that the debugger will change the calculus a lot, since it will be much easier to step in and see what gets called. In other words, even though it's useful from the standpoint of someone writing the library, I doubt we want such considerations to affect how we architect the standard library from a user perspective.

@tkelman (Contributor) commented Mar 8, 2016

There are certainly some types where you'd want fallback methods to be noticeable via warnings or errors in some debug/development/noisy configurations, but I think the default behavior should be to do the correct thing rather than error even if it's slow. Especially if the slowness is fixable - but to fix it someone has to be aware of it first.

timholy added a commit that referenced this pull request Mar 8, 2016
StridedArray->AbstractArray in many places
@timholy merged commit dc0a038 into master on Mar 8, 2016
@timholy deleted the teh/less_striding branch on March 8, 2016 at 22:49