Skip to content

Conversation

@KDr2
Copy link
Member

@KDr2 KDr2 commented Mar 4, 2022

No description provided.

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

CodeInfoTools gives us wrong inferred types: https://github.com/TuringLang/Libtask.jl/runs/5416269891?check_suite_focus=true#step:6:77

CodeInfo(
    @ /home/runner/work/Libtask.jl/Libtask.jl/test/tf.jl:8 within `S`
1 ─ %1 = Main.S::Nothing
│   %2 = Core.fieldtype(%1, 1)::Core.Const(S)
│   %3 = (x + y)::Core.Const(Int64)
│   %4 = Base.convert(%2, %3)::Int64
│   %5 = %new(%1, %4)::Int64
└──      return %5
)

Should be:

CodeInfo(
    @ /home/runner/work/Libtask.jl/Libtask.jl/test/tf.jl:8 within `S`
1 ─ %1 = Main.S::Core.Const(S)
│   %2 = Core.fieldtype(%1, 1)::Core.Const(Int64)
│   %3 = (x + y)::Int64
│   %4 = Base.convert(%2, %3)::Int64
│   %5 = %new(%1, %4)::S
└──      return %5
)

I don't know why there's a Nothing at the beginning, but on my own machine it gives the correct one.

@yebai
Copy link
Member

yebai commented Mar 4, 2022

I don't know why there's a Nothing at the beginning, but on my own machine it gives the correct one.

@KDr2 Are you running the test using ] test on your machine? There might be some weird scoping issue with GlobalRef.

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

@KDr2 Are you running the test using ] test on your machine? There might be some weird scoping issue with GlobalRef.

Yes, I used ] test and all tests passed. I checked the generated CodeInfo in REPL, all goes right.

EDIT:
by all tests I meant all unit tests in Libtask.

@yebai
Copy link
Member

yebai commented Mar 4, 2022

Main.S is a global variable, in general, its type can change during runtime. But the example above Main.S is a type, so it should be constant.

1 ─ %1 = Main.S::Nothing
│   %2 = Core.fieldtype(%1, 1)::Core.Const(S)

In addition, how can we get fieldtype of a Main.S::Nothing variable?

@femtomc Do you know what's going on here?

@yebai
Copy link
Member

yebai commented Mar 4, 2022

@KDr2 can you post your local package environment, and compare it against the one on CI?

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

I am now on my way home, but I had done that already: I removed my Manifest.toml and confirmed my env was the same as the CI's, and all went well. Could you guys reproduce the error which CI encountered?

@yebai
Copy link
Member

yebai commented Mar 4, 2022

I just run the test and get the following error

Error message
counter=7
tf=TapedFunction:
* .func => #218#f
* .ir   =>
------------------
CodeInfo(
    @ /Users/hg344/.julia/dev/Libtask/test/tapedtask.jl:164 within `#218#f`
1 ─      (x = Main.zeros(2))::Vector{Float64}
    @ /Users/hg344/.julia/dev/Libtask/test/tapedtask.jl:165 within `#218#f`
2 ┄      Core.typeassert(true, Core.Bool)::Any
│   @ /Users/hg344/.julia/dev/Libtask/test/tapedtask.jl:166 within `#218#f`
│        Base.setindex!(x, 1, 1)::Any
│   @ /Users/hg344/.julia/dev/Libtask/test/tapedtask.jl:167 within `#218#f`
│        Base.setindex!(x, 2, 2)::Any
│   @ /Users/hg344/.julia/dev/Libtask/test/tapedtask.jl:168 within `#218#f`%5 = Base.getindex(x, 2)::Float64
│        Main.produce(%5)::Any
│   @ /Users/hg344/.julia/dev/Libtask/test/tapedtask.jl:169 within `#218#f`
│        Base.setindex!(x, 3, 3)::Any
└──      goto #2
3 ─      Core.Const(:(return nothing))::Union{}
)
------------------
* .tape =>
------------------
9-element RawTape:
	1 => Instruction(_2=##385(Symbol("##386"),)
	2 => Instruction(2=##387(Symbol("##388"), Symbol("##389"))
	3 => Instruction(3=##390(:_2, Symbol("##391"), Symbol("##392"))
	4 => Instruction(4=##393(:_2, Symbol("##394"), Symbol("##395"))
	5 => Instruction(5=##396(:_2, Symbol("##397"))
	6 => Instruction(6=##398(Symbol("5"),)
	7 => Instruction(7=##399(:_2, Symbol("##400"), Symbol("##401"))
	8 => GotoInstruction(_false, dest=2)
	9 => A Libtask.ReturnInstruction

------------------

bindings=(##390 = Box{GlobalRef}(##390), ##400 = Box{Int64}(##400), ##385 = Box{GlobalRef}(##385), ##387 = Box{GlobalRef}(##387), 2 = Box{Any}(2), 7 = Box{Any}(7), ##399 = Box{GlobalRef}(##399), ##394 = Box{Int64}(##394), ##393 = Box{GlobalRef}(##393), ##388 = Box{Bool}(##388), ##389 = Box{GlobalRef}(##389), ##386 = Box{Int64}(##386), 3 = Box{Any}(3), 5 = Box{Float64}(5), ##401 = Box{Int64}(##401), ##397 = Box{Int64}(##397), ##395 = Box{Int64}(##395), ##398 = Box{GlobalRef}(##398), ##391 = Box{Int64}(##391), ##392 = Box{Int64}(##392), 6 = Box{Any}(6), _2 = Box{Vector{Float64}}(_2), 4 = Box{Any}(4), ##402 = Box{Nothing}(##402), ##396 = Box{GlobalRef}(##396))
┌ Error: Instruction Executing Error: 
│   exception =
│    BoundsError: attempt to access 2-element Vector{Float64} at index [3]
│    Stacktrace:
│     [1] setindex!(A::Vector{Float64}, x::Int64, i1::Int64)
│       @ Base ./array.jl:903
│     [2] (::Libtask.Instruction{Symbol, 3})(tf::Libtask.TapedFunction{var"#218#f#10", NamedTuple{(Symbol("##390"), Symbol("##400"), Symbol("##385"), Symbol("##387"), Symbol("2"), Symbol("7"), Symbol("##399"), Symbol("##394"), Symbol("##393"), Symbol("##388"), Symbol("##389"), Symbol("##386"), Symbol("3"), Symbol("5"), Symbol("##401"), Symbol("##397"), Symbol("##395"), Symbol("##398"), Symbol("##391"), Symbol("##392"), Symbol("6"), :_2, Symbol("4"), Symbol("##402"), Symbol("##396")), Tuple{Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{GlobalRef}, Libtask.Box{Any}, Libtask.Box{Any}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Bool}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Float64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Vector{Float64}}, Libtask.Box{Any}, Libtask.Box{Nothing}, Libtask.Box{GlobalRef}}}})
│       @ Libtask ~/.julia/dev/Libtask/src/tapedfunction.jl:159
│     [3] (::Libtask.TapedFunction{var"#218#f#10", NamedTuple{(Symbol("##390"), Symbol("##400"), Symbol("##385"), Symbol("##387"), Symbol("2"), Symbol("7"), Symbol("##399"), Symbol("##394"), Symbol("##393"), Symbol("##388"), Symbol("##389"), Symbol("##386"), Symbol("3"), Symbol("5"), Symbol("##401"), Symbol("##397"), Symbol("##395"), Symbol("##398"), Symbol("##391"), Symbol("##392"), Symbol("6"), :_2, Symbol("4"), Symbol("##402"), Symbol("##396")), Tuple{Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{GlobalRef}, Libtask.Box{Any}, Libtask.Box{Any}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Bool}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Float64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Vector{Float64}}, Libtask.Box{Any}, Libtask.Box{Nothing}, Libtask.Box{GlobalRef}}}})(; callback::typeof(Libtask.producer))
│       @ Libtask ~/.julia/dev/Libtask/src/tapedfunction.jl:101
│     [4] wrap_task(::Libtask.TapedFunction{var"#218#f#10", NamedTuple{(Symbol("##390"), Symbol("##400"), Symbol("##385"), Symbol("##387"), Symbol("2"), Symbol("7"), Symbol("##399"), Symbol("##394"), Symbol("##393"), Symbol("##388"), Symbol("##389"), Symbol("##386"), Symbol("3"), Symbol("5"), Symbol("##401"), Symbol("##397"), Symbol("##395"), Symbol("##398"), Symbol("##391"), Symbol("##392"), Symbol("6"), :_2, Symbol("4"), Symbol("##402"), Symbol("##396")), Tuple{Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{GlobalRef}, Libtask.Box{Any}, Libtask.Box{Any}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Bool}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Float64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Vector{Float64}}, Libtask.Box{Any}, Libtask.Box{Nothing}, Libtask.Box{GlobalRef}}}}, ::Channel{Any}, ::Channel{Int64})
│       @ Libtask ~/.julia/dev/Libtask/src/tapedtask.jl:35
│     [5] (::Libtask.var"#11#12"{Libtask.TapedFunction{var"#218#f#10", NamedTuple{(Symbol("##390"), Symbol("##400"), Symbol("##385"), Symbol("##387"), Symbol("2"), Symbol("7"), Symbol("##399"), Symbol("##394"), Symbol("##393"), Symbol("##388"), Symbol("##389"), Symbol("##386"), Symbol("3"), Symbol("5"), Symbol("##401"), Symbol("##397"), Symbol("##395"), Symbol("##398"), Symbol("##391"), Symbol("##392"), Symbol("6"), :_2, Symbol("4"), Symbol("##402"), Symbol("##396")), Tuple{Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{GlobalRef}, Libtask.Box{Any}, Libtask.Box{Any}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Bool}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Float64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{GlobalRef}, Libtask.Box{Int64}, Libtask.Box{Int64}, Libtask.Box{Any}, Libtask.Box{Vector{Float64}}, Libtask.Box{Any}, Libtask.Box{Nothing}, Libtask.Box{GlobalRef}}}}, Tuple{}, Channel{Int64}, Channel{Any}})()
│       @ Libtask ./task.jl:123
└ @ Libtask ~/.julia/dev/Libtask/src/tapedfunction.jl:167
Packages environment
  Status `~/.julia/environments/v1.7/Project.toml`

[80f14c24] AbstractMCMC v3.3.1
[0bf59076] AdvancedHMC v0.3.3
[5b7e9947] AdvancedMH v0.6.6
[576499cb] AdvancedPS v0.3.5
[b5ca4192] AdvancedVI v0.1.3
[6e4b80f9] BenchmarkTools v1.3.1
[bc773b8a] CodeInfoTools v0.3.5
[b0b7db55] ComponentArrays v0.11.10
[8f4d0f93] Conda v1.7.0
[31c24e10] Distributions v0.25.49
[e30172f5] Documenter v0.27.14
[366bfd00] DynamicPPL v0.17.8
[f6369f11] ForwardDiff v0.10.25
[069b7b12] FunctionWrappers v1.1.2
[6f1fad26] Libtask v0.7.0 ~/.julia/dev/Libtask
[429524aa] Optim v1.6.2
[91a5bcdd] Plots v1.26.0
[c3e4b0f8] Pluto v0.18.1
[438e738f] PyCall v1.93.1
[37e2e3b7] ReverseDiff v1.12.0
[efcf1570] Setfield v0.8.2
[f3b207a7] StatsPlots v0.14.33
[3ba4a88b] ThArrays v0.2.0
[9f7883ad] Tracker v0.2.20
[e88e6eb3] Zygote v0.6.35

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

I just run the test and get the following error

It seems these errors are what we expect? In the test cases we intend to test exceptions?

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

1 ─ %1 = Main.S::Nothing

In the integrated tests, there's no such problem, though it has some other issues.

@yebai
Copy link
Member

yebai commented Mar 4, 2022

It seems these errors are what we expect? In the test cases we intend to test exceptions?

I see, sorry for the noise. Then I can confirm ]test Libtask runs without broken tests on my computer.

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

Another issue we need to fix: https://discourse.julialang.org/t/conversion-while-constructing-object/77401

Maybe there are other types who also have this issue in the dependencies, and they may cause false data copy while actually we want to share the data.

@yebai
Copy link
Member

yebai commented Mar 4, 2022

Another issue we need to fix: https://discourse.julialang.org/t/conversion-while-constructing-object/77401

Maybe there are other types who also have this issue in the dependencies, and they may cause false data copy while actually we want to share the data.

@KDr2 Would it address these issues if we simply replace Box type with NamedTuple(id=:x, val=Ref(val))

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

Would it address these issues if we simply replace Box type with NamedTuple(id=:x, val=Ref(val))

Unfortunately, it won't work.

see here, ref[]=v is seen as Assigning to an array converts to the array's element type. I think.

mutable struct S{T}
    i::T
end

function Base.convert(::Type{S{T1}}, d::S{T2}) where {T1, T2}
    println("Convert: ", T2, " --> ", T1)
    return S{T1}(zero(T1))
end
# Base.convert(::Type{S{T}}, d::S{T}) where {T} = d
r = Ref(S{Float64}(1.0))
r[] = S{Float64}(1.0)

@yebai
Copy link
Member

yebai commented Mar 4, 2022

I played with this example a bit more. It seems the Ref constructor will always convert the value.

julia> r = Ref(S{Float64}(1.0))
Convert: Float64 --> Float64
Base.RefValue{S{Float64}}(S{Float64}(0.0))

However, the following works. Maybe we can adopt a non-mutating design internally, and use BangBang.set!! to update val where needed?

julia> a = (;id=:x, val=S{Float64}(1.0))
(id = :x, val = S{Float64}(1.0))

# Also, copying a.val won't cause conversion. 
julia> b = (;id=:y, a.val)
(id = :y, val = S{Float64}(1.0))

@KDr2
Copy link
Member Author

KDr2 commented Mar 4, 2022

However, the following works. Maybe we can adopt a non-mutating design internally, and use BangBang.set!! to update val where needed?

This creates a new Box each time we update the value and would break the argument passing chain:
Instruction-1 creates a new Box when sets its output, Instruction still holes the old one.

So it won't work.

EDIT:

Currently, we look up value from bindings each time we execute instructions rather than holding them directly, so it will work if we update the value in the bindings. But binding is an immutable NamedTuple, so we can't update its field, we only can change the whole bindings to a new object, which means we should construct a new bindings (which is a LARGE object) each time we update a variable.

Changing bindings back to a Dict{Symbol, Box{<:Any}} may be another way, but I think it's nearly the same as we use Box{Any} everywhere.

@KDr2
Copy link
Member Author

KDr2 commented Mar 5, 2022

I found another way to involve in the type info in #130.

@yebai
Copy link
Member

yebai commented Mar 7, 2022

CodeInfo(
    @ /home/runner/work/Libtask.jl/Libtask.jl/test/tf.jl:8 within `S`
1%1 = Main.S::Nothing%2 = Core.fieldtype(%1, 1)::Core.Const(S)
│   %3 = (x + y)::Core.Const(Int64)
│   %4 = Base.convert(%2, %3)::Int64%5 = %new(%1, %4)::Int64
└──      return %5
)

Many thanks for the interesting explorations. @KDr2 In order to debug and understand the issue here better, could you 1) create an MWE and open an issue at CodeInfoTools? 2) compare the Julia startup options between CI and the one used on desktop computers? It seems Github actions set a few more options differently than the default.

@KDr2
Copy link
Member Author

KDr2 commented Mar 8, 2022

open an issue at CodeInfoTools

JuliaCompilerPlugins/CodeInfoTools.jl#18

@yebai
Copy link
Member

yebai commented Mar 8, 2022

Thanks, @KDr2 - that makes sense now! It seems this bug is specific to the case of converting a constructor to a taped function. Maybe we can detect it and wrap it into a function as a temporary fix? That is,

T(x::Int)

# becomes

f(x::Int) = (x -> T(x))(x)
julia> f(3)
T(3)

Then we replace T(x) with f(x) inside TapedFunction. This example is a bit heuristic, though. There might be better ways of getting around this issue.

@KDr2
Copy link
Member Author

KDr2 commented Mar 8, 2022

this bug is specific to the case of converting a constructor

It applies to all functions, but only the first one in our test cases is a type constructor.

It can be temporarily fixed by:

function fix_ir(ir::Core.CodeInfo)
    if Base.JLOptions().code_coverage != 0 &&
        length(ir.ssavaluetypes) > length(ir.code) &&
        ir.ssavaluetypes[1] === Nothing
        popfirst!(ir.ssavaluetypes)
    end
end

Edit:
fix_ir doesn't work in all kinds of cases...

@KDr2 KDr2 marked this pull request as ready for review March 8, 2022 15:35
@KDr2
Copy link
Member Author

KDr2 commented Mar 8, 2022

It's too slow: (master 34 secs) V.S. (this branch 238 secs) ...

Copy link
Member

@yebai yebai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling NamedTuple requires additional time. It maybe help to use BenchmarkTools so that we can decouple the compilation time and run time.

On my computer

using Libtask

# first run 
@time include("runtests.jl");
 22.500156 seconds (33.86 M allocations: 1.874 GiB, 2.07% gc time, 91.76% compilation time)

# second run
@time include("runtests.jl");
 13.962012 seconds (17.00 M allocations: 1003.062 MiB, 1.31% gc time, 88.14% compilation time)


_lookup(tf::TapedFunction, v) = v
_lookup(tf::TapedFunction, v::Symbol) = tf.bindings[v] |> val
_update_var!(tf::TapedFunction, v::Symbol, c) = tf.bindings[v][] = (;val=c)
Copy link
Member

@yebai yebai Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rikhuijzer @devmotion Is there a way to avoid unnecessary memory allocation for NamedTuple (i.e. (;val=c))here? The type of tf.bindings[v]::Ref{NamedTuple} remains the same; therefore, it would improve efficiency by mutating the named tuple inside Ref.

Copy link
Contributor

@rikhuijzer rikhuijzer Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the allocation at the tf.bindings[v][]? Specifically, the creation of the vector due to []?

julia> @time (; a=1)
  0.000001 seconds

julia> nt = (; a=[1], b=2);

julia> @time nt[:a][] = 2
  0.000007 seconds (1 allocation: 32 bytes)

val(x::TapedFunction) = x.func

_lookup(tf::TapedFunction, v) = v
_lookup(tf::TapedFunction, v::Symbol) = tf.bindings[v] |> val
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for myself: tf.bindings[v] is still type-unstable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KDr2 we might need to bring back the instruction design to ensure type-stability. If we reference variables by Symbol from TF.bindings::NamedTuple, we will lose type information in thegetindex / getfield calls.

@yebai
Copy link
Member

yebai commented Mar 10, 2022

@KDr2 can you take a look at the broken integration test before this gets merged?

@KDr2
Copy link
Member Author

KDr2 commented Mar 11, 2022

broken integration

Two kinds of errors:

  • sometimes the bindings (a tuple) is too big to be put on the stack, we got a stack overflow.
  • a fatal error in type inference (type bound) error, I guess we triggered a compiler bug somehow.

@yebai
Copy link
Member

yebai commented Mar 11, 2022

sometimes the bindings (a tuple) is too big to be put on the stack, we got a stack overflow.
a fatal error in type inference (type bound) error, I guess we triggered a compiler bug somehow.

These two errors are probably related: the Tuple type becomes too big for the compiler, leading to both memory issues and type inference failures. We need to revisit the design to cap the size of Tuple types.

@yebai
Copy link
Member

yebai commented Mar 17, 2022

Replaced by #130

@yebai yebai closed this Mar 17, 2022
@KDr2 KDr2 deleted the type-info branch March 18, 2022 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants