
LinearAlgebra and TensorCore future design: Be more greedy #729

@EricForgy

Description


TL;DR

I'm proposing a change to the internal structure of LinearAlgebra that is more suitable for a to-be-developed TensorAlgebra library.

What it does:

  • Identifies the vector spaces to which vectors and matrices belong, e.g. ℝ³, U, U*, etc.
  • Makes Covectors first-class citizens, while retaining Transpose and Adjoint.
  • Introduces consistent but distinct associative operators * and ⊗.

What it does not do:

  • Introduce new indexing.

For easily 95% of use cases, I don't think there will be any impact at all. Most of the machinery can be hidden behind the scenes, but it paves the way to a clean tensor implementation that makes the effort worthwhile.

Summary

This is a follow-up from a discussion in PR JuliaLang/julia#35150.

In particular (from Jeff):

It would be great to have an issue (is there one already?) listing all the dodgy stuff in LinearAlgebra that we want to deprecate or revisit.

Any concerns I have don't rise to the level of "dodgy" by any means since I think we already do things better than any alternative I have seen, but I do think we can do some things better.

If I were to try to summarize Jiahao's already nice summary, I'd say the conclusion was that scalar types, vector types, and matrix types are not sufficient for linear algebra. You need a fourth fundamental type, i.e. covectors, which at the time they called RowVector (although at some point along the way, RowVector was removed from LinearAlgebra for some reason).

The fact that "LinearAlgebra done right" requires dual vectors is no surprise since a linear map (matrix) is already a once-contravariant (vector) and once-covariant (covector) tensor, i.e. a machine that takes vectors and spits out vectors in a linear manner.

Although we lost RowVector (which I would call Covector if we were to introduce it again), given a vector u, we still have Transpose(u) and Adjoint(u). These are both, in fact, covectors so this is already better than any alternatives I have seen. However, not all covectors arise from taking the transpose or adjoint of a vector. For instance, the simplest nontrivial example of a pure covector I can think of is the differential:

df = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz.

[Aside: Given the importance of machine learning and automatic differentiation, I think getting this stuff right has some real practical benefits, so I hope what I'm saying isn't seen as purely theoretical.]

From a differential topological point of view, the differential df is more fundamental than grad(f), since grad(f) is defined as

grad(f) := (df)'

which depends on the metric, whereas df does not. In fact, when you see things like

[f, ∂f/∂x, ∂f/∂y, ∂f/∂z]

in automatic differentiation, one could argue (correctly) that you are actually dealing with covectors. Calling these "gradients" is a misnomer that I'm not sure is healthy to propagate. You don't "pullback vectors". You "pullback covectors".

In the spirit of greediness, I want Julia to be even more greedy about getting this right, which no one else I am aware of (except maybe Haskell, which I am not familiar with) gets right. The difference between vectors and covectors matters, although you can probably get away without worrying about it for 90% of use cases, but I don't think we are shooting for 90% excellence. We want to be 100% excellent, and we can be.

LinearAlgebra

As I see it now, LinearAlgebra has 3 fundamental types:

  • Scalars: T
  • Vectors: Array{T,1}
  • Matrices: Array{T,2}

with corresponding derived types:

  • Transpose
  • Adjoint

Given a vector u, Transpose(u) and Adjoint(u) are covectors, but rightly have their own type distinct from Covector because of the way the data is laid out and how they interact with other operations (via dispatch).
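For concreteness, this is how it stands in Julia today (nothing new here, just the existing types):

```julia
using LinearAlgebra

u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]

typeof(transpose(u))  # Transpose{Float64, Vector{Float64}} -- a lazy wrapper around u
typeof(u')            # Adjoint{Float64, Vector{Float64}}

# Both behave like covectors: applied (via *) to a vector, they produce a scalar.
u' * v                # 32.0
```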

To do LinearAlgebra right, I think we need (at least) 3 fundamental types:

  • Scalars
  • Vectors
  • Covectors

with corresponding derived types:

  • Matrices
  • Transpose
  • Adjoint

[Note: If we have vectors and covectors as fundamental types, a matrix is actually a derived type.]

If we were super greedy, we'd also include:

  • PseudoVectors
  • PseudoCovectors

For example, if u and v are vectors, the vector cross product u × v is actually a pseudovector.

I have put some thought into how to do this, but it is not easy to get right (otherwise everyone would have already done it). For example, one of my earlier ideas was to extend Array to negative dimensions so that:

const Covector{T} = Array{T,-1}

but that seems unsatisfactory and not great for students just learning linear algebra.

Then, there is a beautiful package: TensorKit.jl. The documentation goes into some awesome background material. It is probably my own failing, but some things there are a little hard for me to understand; what I do understand looks great and forms a nice basis for thinking through this stuff.

I think that getting LinearAlgebra right requires getting tensors right. While we sort all this out, Tim has created TensorCore.jl.

TensorCore

After the discussion in JuliaLang/julia#35150, a merged PR that introduced tensor product to LinearAlgebra was reverted and moved to TensorCore.jl. That seems like a good place to experiment and sort this out.

Both vectors and covectors are special kinds of tensors, so it seems sensible to try to work this out for tensors and then make Vector and Covector special instances of a new Tensor type rather than Array.

I sketched some initial (only lightly baked) ideas in a comment on the PR. The idea is that given vector spaces U, V, W and their respective dual spaces U*, V*, W*, we can define a vector u ϵ U as an instance of Tensor{T,U}, i.e.

u isa Tensor{T,U} where {T<:Number, U<:VectorSpace}

Similarly, we can define a covector α ϵ U* as an instance of Tensor{T,U*}.

Note: U* doesn't actually parse here, but I experimented with something like

struct U{T} end

so that we can use types U and U{*} for dispatch. When written out, it would look like Tensor{T,U{*}}, but I'll continue to write U* for now.
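To make this concrete, here is a minimal sketch of what I have in mind. All of the names here (VectorSpace, Dual, Tensor) are hypothetical, and I use a Dual{S} wrapper in place of the U{*} notation above:

```julia
# Hypothetical sketch -- none of these types exist in LinearAlgebra or TensorCore today.
abstract type VectorSpace end

struct U <: VectorSpace end
struct V <: VectorSpace end

# Dual{S} plays the role of U* in the text.
struct Dual{S<:VectorSpace} <: VectorSpace end

# A tensor carries a scalar type and an ordered list of vector spaces.
struct Tensor{T<:Number,Spaces<:Tuple}
    data::Array{T}
end

u = Tensor{Float64,Tuple{U}}([1.0, 2.0, 3.0])        # a vector in U
α = Tensor{Float64,Tuple{Dual{U}}}([1.0, 0.0, 0.0])   # a covector in U*
A = Tensor{Float64,Tuple{V,Dual{U}}}(rand(2, 3))      # a "matrix", i.e. a linear map U -> V
```

With this encoding, the list of spaces is pure type information, so dispatch can distinguish vectors, covectors and higher tensors without changing how the data is laid out.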

In this way, a matrix A would be an instance of type Tensor{T,V,U*}.

The tensor product of two vectors of type Tensor{T,U} and Tensor{T,V} is a tensor of type Tensor{T,U,V}, i.e. the list of vector space type parameters is concatenated.

Given two vectors u ϵ U and v ϵ V together with covectors α ϵ U* and β ϵ V*, the tensor product results in four distinct geometric objects:

  • u ⊗ v: An instance of Tensor{T,U,V}
  • u ⊗ β: An instance of Tensor{T,U,V*}, i.e. a "matrix"
  • α ⊗ v: An instance of Tensor{T,U*,V}
  • α ⊗ β: An instance of Tensor{T,U*,V*}

with additional obvious permutations. This extends to higher order, e.g.

  • (u ⊗ v) ⊗ (α ⊗ β): An instance of Tensor{T,U,V,U*,V*}.
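Continuing the sketch above (again, hypothetical), ⊗ simply concatenates the two space lists at the type level and takes the outer product of the data:

```julia
# Hypothetical ⊗ for the sketch Tensor type above.
function ⊗(a::Tensor{T,SA}, b::Tensor{T,SB}) where {T,SA<:Tuple,SB<:Tuple}
    spaces = Tuple{SA.parameters..., SB.parameters...}     # concatenate the space lists
    data = reshape(vec(a.data) * transpose(vec(b.data)),   # outer product of the data
                   size(a.data)..., size(b.data)...)
    return Tensor{T,spaces}(data)
end

v = Tensor{Float64,Tuple{V}}([1.0, 2.0])
u ⊗ v   # Tensor{Float64,Tuple{U,V}} with a 3×2 data array
```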

Multilinear Machines

Tensors can be thought of as multilinear machines taking vectors and covectors as arguments and producing a number that is linear in each argument, e.g. if A is an instance of Tensor{T,U,V*,U*,V} with u, v, α and β as above, then

A(α,v,u,β)

is an instance of T.
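In plain component form (using bare arrays rather than any proposed type, purely to illustrate the multilinearity), full evaluation is just a sum over all indices:

```julia
# Illustrative only: full evaluation of a 4-index array standing in for
# A::Tensor{T,U,V*,U*,V} on arguments (α, v, u, β).
function evaluate(A::Array{T,4}, α::Vector{T}, v::Vector{T}, u::Vector{T}, β::Vector{T}) where {T}
    s = zero(T)
    for i in axes(A, 1), j in axes(A, 2), k in axes(A, 3), l in axes(A, 4)
        s += A[i, j, k, l] * α[i] * v[j] * u[k] * β[l]
    end
    return s  # an instance of T, linear in each argument separately
end
```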

Partial Evaluation

With the same A as above, we can consider partial evaluation:

A(-,v,u,β).

This is an instance of type Tensor{T,U}, i.e. a vector.

Similarly,

A(α,-,u,β)

is an instance of type Tensor{T,V*}, i.e. a covector.

Hence, a vector can be thought of as a machine that takes a covector and produces a number in a linear manner. Similarly, a covector can be thought of as a machine that takes a vector and produces a number in a linear manner.

Now, we can revisit matrices as linear maps by considering

A(-,-,u,β)

which is an instance of Tensor{T,U,V*}, which we've said is a matrix. If we feed A one more vector like

A(-,v,u,β)

we get a machine that will take one more covector and produce a number, but we already said this type of machine is a vector, so A(-,-,u,β) is a linear map V -> U.
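In the same bare-array form as before, A(-,-,u,β) is just a contraction over the last two slots, and the 2-index array that remains acts on a vector exactly as a matrix does:

```julia
# Illustrative only: partial evaluation A(-,-,u,β) of a 4-index array,
# contracting the last two slots and leaving a "matrix" of type Tensor{T,U,V*}.
function partial_evaluate(A::Array{T,4}, u::Vector{T}, β::Vector{T}) where {T}
    B = zeros(T, size(A, 1), size(A, 2))
    for i in axes(A, 1), j in axes(A, 2), k in axes(A, 3), l in axes(A, 4)
        B[i, j] += A[i, j, k, l] * u[k] * β[l]
    end
    return B  # feeding B one more vector (B * v) produces a vector in U
end
```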

Tensor Operations

I think there is one area we did not get right with LinearAlgebra and I suspect it is the source of issues elsewhere. Given vectors u ϵ U and v ϵ V, LinearAlgebra currently defines

u*v'

to be a matrix. This is unfortunate. The correct statement is that

u ⊗ v'

is a matrix. So we have effectively made * = ⊗, which is problematic because * and ⊗ are two distinct tensor operators with different meanings, and identifying them obscures that difference.
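For reference, this is how it stands in plain Julia today:

```julia
using LinearAlgebra

u = [1.0, 2.0]
v = [3.0, 4.0]

u * v'   # 2×2 Matrix: here * is doing double duty as the outer (tensor) product
u' * v   # 11.0: here * really is contraction, a covector applied to a vector
```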

Since the proposed tensor type Tensor consists of a scalar type T and an ordered list of vector spaces and dual vector spaces, I'd probably introduce three methods:

  • lastspace(A::Tensor)::VectorSpace
    • Returns the last (rightmost) vector space.
  • firstspace(A::Tensor)::VectorSpace
    • Returns the first (leftmost) vector space.
  • aredual(U::VectorSpace,V::VectorSpace)::Bool
    • Returns true if the two spaces are dual to each other.

Then, we can define A1*A2 for tensors A1 and A2 only if

aredual(lastspace(A1), firstspace(A2))

i.e. multiplication of tensors is only defined if the last space of the first tensor is dual to the first space of the second tensor. This can be thought of as path concatenation. Two paths can only be concatenated if the end of the first path coincides with the beginning of the second path.

If that is the case, then we simply contract the last tensor index of the first tensor with the first tensor index of the second tensor.
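In terms of the hypothetical sketch above, those three methods and the contraction rule might look something like this (a sketch under the same assumed types, not a final design):

```julia
# Hypothetical, continuing the sketch Tensor type from above.
firstspace(::Tensor{T,S}) where {T,S} = (S.parameters...,)[1]
lastspace(::Tensor{T,S})  where {T,S} = (S.parameters...,)[end]

aredual(::Type{Dual{S}}, ::Type{S}) where {S<:VectorSpace} = true
aredual(::Type{S}, ::Type{Dual{S}}) where {S<:VectorSpace} = true
aredual(::Type, ::Type) = false

function Base.:*(a::Tensor{T,SA}, b::Tensor{T,SB}) where {T,SA<:Tuple,SB<:Tuple}
    aredual(lastspace(a), firstspace(b)) ||
        error("last space of the first tensor is not dual to the first space of the second")
    pa, pb = (SA.parameters...,), (SB.parameters...,)
    spaces = Tuple{pa[1:end-1]..., pb[2:end]...}
    # Contract the last index of `a` with the first index of `b` by flattening
    # both to matrices and multiplying.
    sa, sb = size(a.data), size(b.data)
    M = reshape(a.data, prod(sa[1:end-1]), sa[end]) * reshape(b.data, sb[1], prod(sb[2:end]))
    return Tensor{T,spaces}(reshape(M, sa[1:end-1]..., sb[2:end]...))
end

B = Tensor{Float64,Tuple{U,Dual{V}}}(rand(3, 2))   # a linear map V -> U
w = Tensor{Float64,Tuple{V}}([1.0, 2.0])
B * w   # Tensor{Float64,Tuple{U}}: genuine contraction, not an outer product
```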

For instance, given a matrix A of type Tensor{T,U,V*} and a vector v of type Tensor{T,V}, we have

*:  Tensor{T,U,V*} x Tensor{T,V} -> Tensor{T,U}.

This is clearly distinct from:

⊗: Tensor{T,U,V*} x Tensor{T,V} -> Tensor{T,U,V*,V}

As a more general example, consider

*: Tensor{T,U,V*,W} x Tensor{T,W*,V,U} -> Tensor{T,U,V*,V,U}

and

⊗: Tensor{T,U,V*,W} x Tensor{T,W*,V,U} -> Tensor{T,U,V*,W,W*,V,U}

Conclusion

It may seem like overkill to bring in such a discussion of tensor algebra just to get LinearAlgebra right, but I am greedy and I would like LinearAlgebra to be consistent with a more general TensorAlgebra (or TensorCore). I think it is worth trying to get tensors right and then revise LinearAlgebra accordingly for consistency.

I also think the scope of LinearAlgebra should be confined to vectors, covectors, matrices (as linear maps) and possibly other rank-2 tensors since

  • u ⊗ v can be thought of as a linear map Covector -> Vector
  • u ⊗ β can be thought of as a linear map Vector -> Vector
  • α ⊗ v can be thought of as a linear map Covector -> Covector
  • α ⊗ β can be thought of as a linear map Vector -> Covector

Operations that keep you in scope, e.g. dot, are fine, but things like kron that take you out of scope should probably be in another package like TensorCore (or some new TensorAlgebra).

I see this as a long term project. Maybe for Julia v2.0 or v3.0. It would be awesome if something like this could make it to v2.0, but it is a lot of work. I can try to make a POC package and see if it makes sense to others. In any case, after the discussion in JuliaLang/julia#35150, I felt like writing a comprehensive summary might be of interest.
