
LinearAlgebra and TensorCore future design: Be more greedy #729

@EricForgy

Description


TL;DR

I'm proposing a change to the internal structure of LinearAlgebra that is more suitable for a to-be-developed TensorAlgebra library.

What it does:

  • Identifies the vector spaces to which vectors and matrices belong, e.g. ℝ³, U, U*, etc.
  • Makes Covectors first-class citizens, while retaining Transpose and Adjoint.
  • Introduces consistent but distinct associative operators * and ⊗.

What it does not do:

  • Introduce new indexing.

For easily 95% of use cases, I don't think there will be any impact at all. Most of the machinery can be hidden behind the scenes, but it paves the way to a clean tensor implementation that makes the effort worthwhile.

Summary

This is a follow-up from a discussion in PR JuliaLang/julia#35150.

In particular (from Jeff):

It would be great to have an issue (is there one already?) listing all the dodgy stuff in LinearAlgebra that we want to deprecate or revisit.

Any concerns I have don't rise to the level of "dodgy" by any means since I think we already do things better than any alternative I have seen, but I do think we can do some things better.

If I were to try to summarize Jiahao's already nice summary, I'd say the conclusion was that scalar types, vector types, and matrix types are not sufficient for linear algebra. You need a fourth fundamental type, i.e. covectors, which at the time they called RowVector (although at some point along the way, RowVector was removed from LinearAlgebra for some reason).

The fact that "LinearAlgebra done right" requires dual vectors is no surprise since a linear map (matrix) is already a once-contravariant (vector) and once-covariant (covector) tensor, i.e. a machine that takes vectors and spits out vectors in a linear manner.

Although we lost RowVector (which I would call Covector if we were to introduce it again), given a vector u, we still have Transpose(u) and Adjoint(u). These are both, in fact, covectors so this is already better than any alternatives I have seen. However, not all covectors arise from taking the transpose or adjoint of a vector. For instance, the simplest nontrivial example of a pure covector I can think of is the differential:

df = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz.

[Aside: Given the importance of machine learning and automatic differentiation, I think getting this stuff right has some real practical benefits, so I hope what I'm saying isn't seen as purely theoretical.]

From a differential topological point of view, the differential df is more fundamental than grad(f), since grad(f) is defined as

grad(f) := (df)'

which depends on the metric, whereas df does not. In fact, when you see things like

[f, ∂f/∂x, ∂f/∂y, ∂f/∂z]

in automatic differentiation, one could argue (correctly) that you are actually dealing with covectors. Calling these "gradients" is a misnomer that I'm not sure is healthy to propagate. You don't "pullback vectors". You "pullback covectors".

In the spirit of greediness, I want Julia to be even more greedy about getting this right, which no one else I am aware of (except maybe Haskell, which I am not familiar with) gets right. The difference between vectors and covectors matters, although you can probably get away without worrying about it for 90% of use cases, but I don't think we are shooting for 90% excellence. We want to be 100% excellent, and we can be.

LinearAlgebra

As I see it now, LinearAlgebra has 3 fundamental types:

  • Scalars: T
  • Vectors: Array{T,1}
  • Matrices: Array{T,2}

with corresponding derived types:

  • Transpose
  • Adjoint

Given a vector u, Transpose(u) and Adjoint(u) are covectors, but rightly have their own type distinct from Covector because of the way the data is laid out and how they interact with other operations (via dispatch).
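For concreteness, this is how it stands in Julia today (nothing new here, just the existing types):

```julia
using LinearAlgebra

u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]

typeof(transpose(u))  # Transpose{Float64, Vector{Float64}} -- a lazy wrapper around u
typeof(u')            # Adjoint{Float64, Vector{Float64}}

# Both behave like covectors: applied (via *) to a vector, they produce a scalar.
u' * v                # 32.0
```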

To do LinearAlgebra right, I think we need (at least) 3 fundamental types:

  • Scalars
  • Vectors
  • Covectors

with corresponding derived types:

  • Matrices
  • Transpose
  • Adjoint

[Note: If we have vectors and covectors as fundamental types, a matrix is actually a derived type.]

If we were super greedy, we'd also include:

  • PseudoVectors
  • PseudoCovectors

For example, if u and v are vectors, the vector cross product u × v is actually a pseudovector.

I have put some thought into how to do this, but it is not easy to get right (otherwise everyone would have already done it). For example, one of my earlier ideas was to extend Array to negative dimensions so that:

const Covector{T} = Array{T,-1}

but that seems unsatisfactory and not great for students just learning linear algebra.

Then, there is a beautiful package: TensorKit.jl. The documentation goes into some awesome background material. It is probably my own failing, but some things there are a little hard for me to understand; what I do understand looks great and forms a nice basis for thinking through this stuff.

I think that getting LinearAlgebra right requires getting tensors right. While we sort all this out, Tim has created TensorCore.jl.

TensorCore

After the discussion in JuliaLang/julia#35150, a merged PR that introduced tensor product to LinearAlgebra was reverted and moved to TensorCore.jl. That seems like a good place to experiment and sort this out.

Both vectors and covectors are special kinds of tensors, so it seems sensible to try to work this out for tensors and then make Vector and Covector special instances of a new Tensor type rather than Array.

I sketched some initial (only lightly baked) ideas in a comment on the PR. The idea is that given vector spaces U, V, W and their respective dual spaces U*, V*, W*, we can define a vector u ϵ U as an instance of Tensor{T,U}, i.e.

u isa Tensor{T,U} where {T<:Number, U<:VectorSpace}

Similarly, we can define a covector α ϵ U* as an instance of Tensor{T,U*}.

Note: U* doesn't actually parse here, but I experimented with something like

struct U{T} end

so that we can use types U and U{*} for dispatch. When written out, it would look like Tensor{T,U{*}}, but I'll continue to write U* for now.
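To make this concrete, here is a minimal sketch of what I have in mind. All of the names here (VectorSpace, Dual, Tensor) are hypothetical, and I use a Dual{S} wrapper in place of the U{*} notation above:

```julia
# Hypothetical sketch -- none of these types exist in LinearAlgebra or TensorCore today.
abstract type VectorSpace end

struct U <: VectorSpace end
struct V <: VectorSpace end

# Dual{S} plays the role of U* in the text.
struct Dual{S<:VectorSpace} <: VectorSpace end

# A tensor carries a scalar type and an ordered list of vector spaces.
struct Tensor{T<:Number,Spaces<:Tuple}
    data::Array{T}
end

u = Tensor{Float64,Tuple{U}}([1.0, 2.0, 3.0])        # a vector in U
α = Tensor{Float64,Tuple{Dual{U}}}([1.0, 0.0, 0.0])   # a covector in U*
A = Tensor{Float64,Tuple{V,Dual{U}}}(rand(2, 3))      # a "matrix", i.e. a linear map U -> V
```

With this encoding, the list of spaces is pure type information, so dispatch can distinguish vectors, covectors and higher tensors without changing how the data is laid out.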

In this way, a matrix A would be an instance of type Tensor{T,V,U*}.

The tensor product of two vectors of type Tensor{T,U} and Tensor{T,V} is a tensor of type Tensor{T,U,V}, i.e. the list of vector space type parameters is concatenated.

Given two vectors u ϵ U and v ϵ V together with covectors α ϵ U* and β ϵ V*, the tensor product results in four distinct geometric objects:

  • u ⊗ v: An instance of Tensor{T,U,V}
  • u ⊗ β: An instance of Tensor{T,U,V*}, i.e. a "matrix"
  • α ⊗ v: An instance of Tensor{T,U*,V}
  • α ⊗ β: An instance of Tensor{T,U*,V*}

with additional obvious permutations. This extends to higher order, e.g.

  • (u ⊗ v) ⊗ (α ⊗ β): An instance of Tensor{T,U,V,U*,V*}.
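Continuing the sketch above (again, hypothetical), ⊗ simply concatenates the two space lists at the type level and takes the outer product of the data:

```julia
# Hypothetical ⊗ for the sketch Tensor type above.
function ⊗(a::Tensor{T,SA}, b::Tensor{T,SB}) where {T,SA<:Tuple,SB<:Tuple}
    spaces = Tuple{SA.parameters..., SB.parameters...}     # concatenate the space lists
    data = reshape(vec(a.data) * transpose(vec(b.data)),   # outer product of the data
                   size(a.data)..., size(b.data)...)
    return Tensor{T,spaces}(data)
end

v = Tensor{Float64,Tuple{V}}([1.0, 2.0])
u ⊗ v   # Tensor{Float64,Tuple{U,V}} with a 3×2 data array
```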

Multilinear Machines

Tensors can be thought of as multilinear machines taking vectors and covectors as arguments and producing a number that is linear in each argument, e.g. if A is an instance of Tensor{T,U,V*,U*,V} with u, v, α and β as above, then

A(α,v,u,β)

is an instance of T.
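In plain component form (using bare arrays rather than any proposed type, purely to illustrate the multilinearity), full evaluation is just a sum over all indices:

```julia
# Illustrative only: full evaluation of a 4-index array standing in for
# A::Tensor{T,U,V*,U*,V} on arguments (α, v, u, β).
function evaluate(A::Array{T,4}, α::Vector{T}, v::Vector{T}, u::Vector{T}, β::Vector{T}) where {T}
    s = zero(T)
    for i in axes(A, 1), j in axes(A, 2), k in axes(A, 3), l in axes(A, 4)
        s += A[i, j, k, l] * α[i] * v[j] * u[k] * β[l]
    end
    return s  # an instance of T, linear in each argument separately
end
```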

Partial Evaluation

With the same A as above, we can consider partial evaluation:

A(-,v,u,β).

This is an instance of type Tensor{T,U}, i.e. a vector.

Similarly,

A(α,-,u,β)

is an instance of type Tensor{T,V*}, i.e. a covector.

Hence, a vector can be thought of as a machine that takes a covector and produces a number in a linear manner. Similarly, a covector can be thought of as a machine that takes a vector and produces a number in a linear manner.

Now, we can revisit matrices as linear maps by considering

A(-,-,u,β)

which is an instance of Tensor{T,U,V*}, which we've said is a matrix. If we feed A one more vector like

A(-,v,u,β)

we get a machine that will take one more covector and produce a number, but we already said this type of machine is a vector, so A(-,-,u,β) is a linear map V -> U.
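In the same bare-array form as before, A(-,-,u,β) is just a contraction over the last two slots, and the 2-index array that remains acts on a vector exactly as a matrix does:

```julia
# Illustrative only: partial evaluation A(-,-,u,β) of a 4-index array,
# contracting the last two slots and leaving a "matrix" of type Tensor{T,U,V*}.
function partial_evaluate(A::Array{T,4}, u::Vector{T}, β::Vector{T}) where {T}
    B = zeros(T, size(A, 1), size(A, 2))
    for i in axes(A, 1), j in axes(A, 2), k in axes(A, 3), l in axes(A, 4)
        B[i, j] += A[i, j, k, l] * u[k] * β[l]
    end
    return B  # feeding B one more vector (B * v) produces a vector in U
end
```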

Tensor Operations

I think there is one area we did not get right with LinearAlgebra and I suspect it is the source of issues elsewhere. Given vectors u ϵ U and v ϵ V, LinearAlgebra currently defines

u*v'

to be a matrix. This is unfortunate. The correct statement is that

u ⊗ v'

is a matrix. So we have effectively made * = ⊗, which is problematic because * and ⊗ are two distinct tensor operators with different meanings, and identifying them obscures that difference.
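For reference, this is how it stands in plain Julia today:

```julia
using LinearAlgebra

u = [1.0, 2.0]
v = [3.0, 4.0]

u * v'   # 2×2 Matrix: here * is doing double duty as the outer (tensor) product
u' * v   # 11.0: here * really is contraction, a covector applied to a vector
```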

Since the proposed tensor type Tensor consists of a scalar type T and an ordered list of vector spaces and dual vector spaces, I'd probably introduce three methods:

  • lastspace(A::Tensor)::VectorSpace
    • Returns the last (rightmost) vector space.
  • firstspace(A::Tensor)::VectorSpace
    • Returns the first (leftmost) vector space.
  • aredual(U::VectorSpace,V::VectorSpace)::Bool
    • Returns true if the two spaces are dual to each other.

Then, we can define A1*A2 for tensors A1 and A2 only if

aredual(lastspace(A1), firstspace(A2))

i.e. multiplication of tensors is only defined if the last space of the first tensor is dual to the first space of the second tensor. This can be thought of as path concatenation. Two paths can only be concatenated if the end of the first path coincides with the beginning of the second path.

If that is the case, then we simply contract the last tensor index of the first tensor with the first tensor index of the second tensor.
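In terms of the hypothetical sketch above, those three methods and the contraction rule might look something like this (a sketch under the same assumed types, not a final design):

```julia
# Hypothetical, continuing the sketch Tensor type from above.
firstspace(::Tensor{T,S}) where {T,S} = (S.parameters...,)[1]
lastspace(::Tensor{T,S})  where {T,S} = (S.parameters...,)[end]

aredual(::Type{Dual{S}}, ::Type{S}) where {S<:VectorSpace} = true
aredual(::Type{S}, ::Type{Dual{S}}) where {S<:VectorSpace} = true
aredual(::Type, ::Type) = false

function Base.:*(a::Tensor{T,SA}, b::Tensor{T,SB}) where {T,SA<:Tuple,SB<:Tuple}
    aredual(lastspace(a), firstspace(b)) ||
        error("last space of the first tensor is not dual to the first space of the second")
    pa, pb = (SA.parameters...,), (SB.parameters...,)
    spaces = Tuple{pa[1:end-1]..., pb[2:end]...}
    # Contract the last index of `a` with the first index of `b` by flattening
    # both to matrices and multiplying.
    sa, sb = size(a.data), size(b.data)
    M = reshape(a.data, prod(sa[1:end-1]), sa[end]) * reshape(b.data, sb[1], prod(sb[2:end]))
    return Tensor{T,spaces}(reshape(M, sa[1:end-1]..., sb[2:end]...))
end

B = Tensor{Float64,Tuple{U,Dual{V}}}(rand(3, 2))   # a linear map V -> U
w = Tensor{Float64,Tuple{V}}([1.0, 2.0])
B * w   # Tensor{Float64,Tuple{U}}: genuine contraction, not an outer product
```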

For instance, given a matrix A of type Tensor{T,U,V*} and a vector v of type Tensor{T,V}, we have

*:  Tensor{T,U,V*} x Tensor{T,V} -> Tensor{T,U}.

This is clearly distinct from:

⊗: Tensor{T,U,V*} x Tensor{T,V} -> Tensor{T,U,V*,V}

As a more general example, consider

*: Tensor{T,U,V*,W} x Tensor{T,W*,V,U} -> Tensor{T,U,V*,V,U}

and

⊗: Tensor{T,U,V*,W} x Tensor{T,W*,V,U} -> Tensor{T,U,V*,W,W*,V,U}

Conclusion

It may seem like overkill to bring in such a discussion of tensor algebra just to get LinearAlgebra right, but I am greedy and I would like LinearAlgebra to be consistent with a more general TensorAlgebra (or TensorCore). I think it is worth trying to get tensors right and then revise LinearAlgebra accordingly for consistency.

I also think the scope of LinearAlgebra should be confined to vectors, covectors, matrices (as linear maps) and possibly other rank-2 tensors since

  • u ⊗ v can be thought of as a linear map Covector -> Vector
  • u ⊗ β can be thought of as a linear map Vector -> Vector
  • α ⊗ v can be thought of as a linear map Covector -> Covector
  • α ⊗ β can be thought of as a linear map Vector -> Covector

Operations that keep you in scope, e.g. dot, are fine, but things like kron that take you out of scope should probably be in another package like TensorCore (or some new TensorAlgebra).

I see this as a long term project. Maybe for Julia v2.0 or v3.0. It would be awesome if something like this could make it to v2.0, but it is a lot of work. I can try to make a POC package and see if it makes sense to others. In any case, after the discussion in JuliaLang/julia#35150, I felt like writing a comprehensive summary might be of interest.
