Skip to content

Missing TBAA information on view #34847

@vchuravy

Description

@vchuravy

In https://discourse.julialang.org/t/rfc-ann-restacker-jl-a-workaround-for-the-heap-allocated-immutable-problem/35037 @tkf that copying a view manually at that tart of the function causes a sizable performance improvement. It seems different from the stack allocation of immutable that contain heap references (x-ref: #34126). From my brief investigation it seems like the performance difference is due to a vectorization failure. The example I looked at is:

@noinline function f!(ys, xs)
         @inbounds for i in eachindex(ys, xs)
           x = xs[i]
           if -0.5 < x < 0.5
             ys[i] = 2x
           end
        end
end

"Restacking" ys causes this function to be vectorizable.
Looking at the optimized LLVM IR I noticed

   %40 = add i64 %39, %28, !dbg !67
   %41 = getelementptr inbounds double, double addrspace(13)* %31, i64 %40, !dbg !67
   %42 = load double, double addrspace(13)* %41, align 8, !dbg !67, !tbaa !70
...
   %47 = add i64 %39, %36, !dbg !81
   %48 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %38, align 8, !dbg !81, !tbaa !59, !nonnull !4
   %49 = getelementptr inbounds double, double addrspace(13)* %48, i64 %47, !dbg !81
   store double %46, double addrspace(13)* %49, align 8, !dbg !81, !tbaa !70

Where %42 = load is the getindex from xs and store double is the setindex! into ys. Notice how in %48 we are reloading the base pointer of the array at each iteration.

So how are we deriving %38 and %48, why don't we have to reload the pointer we are loading from?

The series of operations we do to obtain the base pointer for the getpointer

  %24 = bitcast %jl_value_t addrspace(11)* %11 to %jl_value_t addrspace(10)* addrspace(11)*
  %25 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %24, align 8, !tbaa !41, !nonnull !4, !dereferenceable !57, !align !58
  %26 = getelementptr inbounds i8, i8 addrspace(11)* %12, i64 16
  %27 = bitcast i8 addrspace(11)* %26 to i64 addrspace(11)*
  %28 = load i64, i64 addrspace(11)* %27, align 8, !tbaa !41
  %29 = addrspacecast %jl_value_t addrspace(10)* %25 to %jl_value_t addrspace(11)*
  %30 = bitcast %jl_value_t addrspace(11)* %29 to double addrspace(13)* addrspace(11)*
  %31 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %30, align 8, !tbaa !59, !nonnull !4

the operations we do to obtain the pointer from which to load the base pointer.

  %32 = bitcast %jl_value_t addrspace(11)* %8 to %jl_value_t addrspace(10)* addrspace(11)*
  %33 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %32, align 8
  %34 = getelementptr inbounds i8, i8 addrspace(11)* %9, i64 16
  %35 = bitcast i8 addrspace(11)* %34 to i64 addrspace(11)*
  %36 = load i64, i64 addrspace(11)* %35, align 8
  %37 = addrspacecast %jl_value_t addrspace(10)* %33 to %jl_value_t addrspace(11)*
  %38 = bitcast %jl_value_t addrspace(11)* %37 to double addrspace(13)* addrspace(11)*

Notice how in the later case we are missing all TBAA information. The two interesting tbaa trees are below.

!41 = !{!42, !42, i64 0}
!42 = !{!"jtbaa_immut", !43, i64 0}
!43 = !{!"jtbaa_value", !44, i64 0}
!44 = !{!"jtbaa_data", !45, i64 0}
!45 = !{!"jtbaa", !46, i64 0}
!46 = !{!"jtbaa"}
!59 = !{!60, !60, i64 0}
!60 = !{!"jtbaa_arrayptr", !61, i64 0}
!61 = !{!"jtbaa_array", !45, i64 0}

I don't understand our TBAA annotations well enough to conclude that this is a bug, but shouldn't the view be marked immutable either way?

Metadata

Metadata

Assignees

No one assigned

    Labels

    performanceMust go fasterupstreamThe issue is with an upstream dependency, e.g. LLVM

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions