[Merged by Bors] - Method to extract loglikelihoods #166
Conversation
Nice! Some quick high-level comments:
It's also a bit unfortunate that the approach will be broken with the new interface (which we can switch to soon, hopefully).
How come?
Actually, I think it might be possible to simplify this a bit: since you don't accumulate log probabilities (and hence there is no benefit from using ThreadSafeVarInfo) and don't need to keep track of [lines 87 to 98 in 405546f], you could use `_evaluate` instead (as in line 115 in 405546f).
Agreed. Suggestions?
I 100% agree with this, as I don't like the "implicit" dependencies on MCMCChains.jl due to how we use …
If we wouldn't enforce …
The sampler setup will change and be immutable by design, with a separate state, so it will require changes to the current implementation (but it will still be possible). Making it a context would avoid these issues, I guess.
Yep, that's the important point IMO. Maybe we could add a way to build a "dummy" VarInfo object from a set of samples?
I'm not using a sampler state here though, right?
Not in the "Turing" sense, but the array of likelihoods is like an internal state that is mutated when running the model, isn't it?
I'm confused. In the new interface, isn't that just a change to …
Actually, I'm not certain I quite get how I'd use a …
Probably most parts will be fine, just some of the additional methods might have to be redefined. But what's your opinion about just using a special context here? I guess it could even be interesting in combination with other samplers than …
Sorry, I didn't properly respond to this; I think it's a great idea! Do you know off the top of your head what I'll need to overload here? I'll have a look myself too, ofc.
Probably the context itself would be defined similar to [lines 31 to 34 in 405546f], with

```julia
function tilde_assume(rng, ctx::TrackedLikelihoodsContext, sampler, right, vn, inds, vi)
    return tilde_assume(rng, ctx.context, sampler, right, vn, inds, vi)
end
```

and for the observations

```julia
function tilde_observe(ctx::TrackedLikelihoodsContext, sampler, right, left, vname, vinds, vi)
    # this is slightly unfortunate since it is not completely generic...
    # ideally we would call `tilde_observe` recursively but then we don't get the loglikelihood value
    logp = tilde(ctx.context, sampler, right, left, vi)
    acclogp!(vi, logp)
    # track loglikelihood value
    ...
    return left
end

function tilde_observe(ctx, sampler, right, left, vi)
    # same here
    ...
end
```

and similarly for …
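For concreteness, the observe overload with the tracking step filled in might look roughly like this (a sketch only, mirroring the diff reviewed further below; `loglikelihoods` is assumed to be a `Dict{String, Vector{Float64}}`):

```julia
function tilde_observe(ctx::TrackedLikelihoodsContext, sampler, right, left, vname, vinds, vi)
    # compute and accumulate the log probability as usual
    logp = tilde(ctx.context, sampler, right, left, vi)
    acclogp!(vi, logp)
    # track the loglikelihood value under the variable's name,
    # appending one entry per model evaluation
    ℓ = get!(ctx.loglikelihoods, string(vname), Float64[])
    push!(ℓ, logp)
    return left
end
```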
That looks great; I'll change it at some point today (gotta get some dinner now though!). Thanks, man :)
Implemented
I'm very much for this. But I'd prefer to merge this PR without it, if that's okay. Then we raise an issue in DynamicPPL/MCMCChains (whichever is most suitable) and figure out where to go from there?
Another comment on this: could do something similar to what we do in …
Btw, is this ok to merge?
Hmm that's a bit unfortunate but I'm wondering if we should just consider these statements as a single observation for now, since there are ambiguities and design choices to make anyway. On the other hand, it's a bit unfortunate that it leads to a discrepancy between for-loop and dotted implementations - but the same problem arises if one uses, e.g., MvNormal for a set of observations.
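To make the discrepancy concrete, here is a hypothetical pair of models (illustrative only, not from the PR): the for-loop version would yield one tracked loglikelihood per `xs[i]`, while the dotted version would be treated as a single observation.

```julia
using Turing

@model function demo_loop(xs)
    m ~ Normal()
    for i in eachindex(xs)
        xs[i] ~ Normal(m, 1.0)  # one loglikelihood entry per xs[i]
    end
end

@model function demo_dot(xs)
    m ~ Normal()
    xs .~ Normal(m, 1.0)  # a single loglikelihood entry for all of xs
end
```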
Yes, better to keep this PR focused on the loglikelihoods.
Maybe:

```julia
function varinfo_from_chain(
    model::Turing.Model,
    chain::MCMCChains.Chains;
    sampler = DynamicPPL.SampleFromPrior()
)
    vi = Turing.VarInfo(model)
    vis = map(1:length(chain)) do i
        c = chain[i]
        md = vi.metadata
        for v in keys(md)
            for vn in md[v].vns
                vn_sym = Symbol(vn)
                # Cannot use `vn_sym` to index in the chain
                # so we have to extract the corresponding "linear"
                # indices and use those.
                # `ks` is empty if `vn_sym` not in `c`.
                ks = MCMCChains.namesingroup(c, vn_sym)
                if !isempty(ks)
                    # 1st dimension is of size 1 since `c`
                    # only contains a single sample, and the
                    # last dimension is of size 1 since
                    # we're assuming we're working with a single chain.
                    val = copy(vec(c[ks].value))
                    DynamicPPL.setval!(vi, val, vn)
                    DynamicPPL.settrans!(vi, false, vn)
                else
                    DynamicPPL.set_flag!(vi, vn, "del")
                end
            end
        end
        new_vi = VarInfo(vi, sampler, vi[sampler])
        setlogp!(new_vi, first(chain[i][:lp])) # Is there a better way?
        return new_vi
    end
    return vis
end
```

and reconstruction (at least in the context of …) would be

```julia
vis = varinfo_from_chain(model, chain)
chain_new = AbstractMCMC.bundle_samples(
    rng, model, SampleFromPrior(), length(vis), vis, MCMCChains.Chains
)
```

Doesn't require executing the model or anything.
src/loglikelihoods.jl
Outdated
```julia
struct TrackedLikelihoodContext{A, Ctx, Tvars} <: AbstractContext
    loglikelihoods::A
    ctx::Ctx
    vars::Tvars
end
```
It seems we don't use them currently - so either check for them when tracking the loglikelihoods (preferably IMO) or just remove this field?
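For illustration, such a check inside `tilde_observe` might look like the following fragment (a hypothetical sketch, not the committed code; it assumes `vars === nothing` means "track everything" and that `vars` is otherwise a collection of variable names):

```julia
# only track the loglikelihood if no subset of variables was requested,
# or if this variable is among the requested ones (hypothetical check)
if ctx.vars === nothing || vname in ctx.vars
    ℓ = get!(ctx.loglikelihoods, string(vname), Float64[])
    push!(ℓ, logp)
end
```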
Do you know what the field is supposed to do?
Co-authored-by: David Widmann <[email protected]>
src/loglikelihoods.jl
Outdated
```julia
# Do the usual thing
logp = tilde(ctx.ctx, sampler, right, left, vi)
acclogp!(vi, logp)

# track loglikelihood value
lookup = ctx.loglikelihoods
ℓ = get!(lookup, string(vname), Float64[])
push!(ℓ, logp)

return left
```
Is it possible to merge this with the implementation above?
I just removed it because it doesn't have `vname`, i.e. the method wasn't even runnable. When is this actually called? Seems like it never should be?
```julia
# Update the values
setval!(vi, chain, sample_idx, chain_idx)
```
So here `empty!` is not needed?
Actually, `empty!` will ruin it! I did that initially, but `empty!` also made it so that values would be resampled. So it ended up sampling from the prior instead.
Hmm, but why would it resample values in this case? Shouldn't `setval!` fix them? There's something going on with this `empty!`/`setval!` thing that I don't understand 🤔
Ah, okay, then there's something weird. I thought I had just misunderstood something, but if you also don't know why that's the case then there's something going on 😅
Can it be the fact that `empty!` clears the "del" flag + `setval!` does NOT set it to false? So then, when you run the model again, it will resample?
Yeah, something is wrong:

```julia
julia> using DynamicPPL, Turing

julia> import DynamicPPL: setval!

julia> @model function demo(xs)
           m ~ MvNormal(2, 1.)
           for i in eachindex(xs)
               xs[i] ~ Normal(m[1], 1.)
           end
       end
demo (generic function with 1 method)

julia> model = demo(randn(3));

julia> chain = sample(model, MH(), 10);

julia> var_info = VarInfo(model);

julia> θ_old = var_info[SampleFromPrior()]
2-element Array{Float64,1}:
 0.41438744434831887
 0.373145757716783

julia> θ_chain = vec(MCMCChains.group(chain, :m)[1, :, 1].value)
2-element reshape(::AxisArrays.AxisArray{Float64,3,Array{Float64,3},Tuple{AxisArrays.Axis{:iter,StepRange{Int64,Int64}},AxisArrays.Axis{:var,Array{Symbol,1}},AxisArrays.Axis{:chain,UnitRange{Int64}}}}, 2) with eltype Float64:
  0.3939341450590006
 -1.1020030439893758

julia> empty!(var_info)
VarInfo{NamedTuple{(:m,),Tuple{DynamicPPL.Metadata{Dict{VarName{:m,Tuple{}},Int64},Array{MvNormal{Float64,PDMats.ScalMat{Float64},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}}},1},Array{VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}((m = DynamicPPL.Metadata{Dict{VarName{:m,Tuple{}},Int64},Array{MvNormal{Float64,PDMats.ScalMat{Float64},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}}},1},Array{VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}(Dict{VarName{:m,Tuple{}},Int64}(), VarName{:m,Tuple{}}[], UnitRange{Int64}[], Float64[], MvNormal{Float64,PDMats.ScalMat{Float64},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}}}[], Set{DynamicPPL.Selector}[], Int64[], Dict{String,BitArray{1}}("del" => [],"trans" => [])),), Base.RefValue{Float64}(0.0), Base.RefValue{Int64}(0))

julia> setval!(var_info, chain, 1, 1)
VarInfo{NamedTuple{(:m,),Tuple{DynamicPPL.Metadata{Dict{VarName{:m,Tuple{}},Int64},Array{MvNormal{Float64,PDMats.ScalMat{Float64},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}}},1},Array{VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}((m = DynamicPPL.Metadata{Dict{VarName{:m,Tuple{}},Int64},Array{MvNormal{Float64,PDMats.ScalMat{Float64},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}}},1},Array{VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}(Dict{VarName{:m,Tuple{}},Int64}(), VarName{:m,Tuple{}}[], UnitRange{Int64}[], Float64[], MvNormal{Float64,PDMats.ScalMat{Float64},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}}}[], Set{DynamicPPL.Selector}[], Int64[], Dict{String,BitArray{1}}("del" => [],"trans" => [])),), Base.RefValue{Float64}(0.0), Base.RefValue{Int64}(0))

julia> θ_new = var_info[SampleFromPrior()]
Float64[]

julia> model(var_info)

julia> var_info[SampleFromPrior()]
2-element Array{Float64,1}:
 -0.7971105060749638
  0.9046609763240063
```
Figured it out: `empty!` clears `vi.metadata.$n.vns` for every `$n` in `names`, so the following (lines 1167 to 1171 in 334cb98) never touches `_setval_kernel!`:

```julia
quote
    for vn in metadata.$n.vns
        _setval_kernel!(vi, vn, values, keys)
    end
end
```
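For reference, the pattern that does work (and what this PR relies on) is to skip `empty!` and simply overwrite the values; a minimal sketch, reusing the names and imports from the snippet above:

```julia
vi = VarInfo(model)  # running the model once populates the metadata

for sample_idx in 1:length(chain)
    # overwrite the values in place; calling `empty!` first would clear
    # `vi.metadata.$n.vns`, so `setval!` would become a no-op and the
    # next model run would resample from the prior
    setval!(vi, chain, sample_idx, 1)
    model(vi)  # re-execute the model with the values fixed
end
```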
Ready for merge?
devmotion left a comment:
I'm happy with it, just some final comments:
- If you increase the version number, we can make a new release immediately (given that bors doesn't complain).
- Models with `.~` are not supported since only `dot_tilde_assume` is implemented. I added a comment on how we could implement `dot_tilde_observe` for now, but it's fine with me if we postpone this to a separate PR and do not support such models for the time being; see the sketch after this list.
- We should improve the `setval!` stuff, maybe copy your debugging steps to a separate issue? (We should also get the `generated_quantities` draft in and probably just use `empty!` there as long as it is not fixed...)
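For the record, a `dot_tilde_observe` overload would presumably mirror the `tilde_observe` method in the diff above; a rough sketch (the signature and the `dot_tilde` call are assumed by analogy, not the committed code):

```julia
function dot_tilde_observe(ctx::TrackedLikelihoodContext, sampler, right, left, vn, inds, vi)
    # for now, treat the whole `.~` statement as a single observation
    logp = dot_tilde(ctx.ctx, sampler, right, left, vi)
    acclogp!(vi, logp)
    # track one loglikelihood value for the entire dotted statement
    ℓ = get!(ctx.loglikelihoods, string(vn), Float64[])
    push!(ℓ, logp)
    return left
end
```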
Will do!
Ah, sorry, forgot to respond to that! So you said:
> At the moment, I make it clear in the docstring that stuff like …

Agree! I actually went back to check the …
I just remembered that we were quite confused about the behaviour of the different suggestions. Basically, I meant that even though it is still unclear to me why …
Just to make sure we're on the same page: I'm also worried that it shouldn't be used in …
I think we need to fix #167 before merging this; this should also fail similarly on models using …
Waiting to merge until #168 has been merged.
bors r+
For several reasons, it would be very nice to have a way of extracting the log-likelihoods from a chain. This PR implements the method `loglikelihoods` to do exactly this.
# Up for discussion
1. **Return value.** Right now it returns a `Dict{String, Vector{Float64}}` with the keys being `string(varname)` and the values being an array whose i-th index corresponds to the log-likelihood for `string(varname)` in `chain[i]`. Alternatives:
   - Dict of the form `Dict(y => Dict(y[1] => ..., y[2] => ...), ...)`, i.e. "hierarchical"
   - Dict of the form `Dict(y[1] => ..., y[2] => ..., ...)`, i.e. "flattened"
   - ????
2. **Project structure.** I'm a bit uncertain where to actually put the implementation. As I've now experienced, what you actually need to implement to make an `AbstractSampler` is a bit unclear, e.g. there are some methods in `varinfo.jl` which also require implementation (e.g. `getindex`). So, should I put it in its own file, like I have now, or should I follow suit with `SampleFromPrior` and `SampleFromUniform`?
# Example
```julia
julia> using DynamicPPL, Turing

julia> @model function demo(xs, y)
           s ~ InverseGamma(2, 3)
           m ~ Normal(0, √s)
           for i in eachindex(xs)
               xs[i] ~ Normal(m, √s)
           end
           y ~ Normal(m, √s)
       end
demo (generic function with 1 method)

julia> model = demo(randn(3), randn());

julia> chain = sample(model, MH(), 10);

julia> DynamicPPL.loglikelihoods(model, chain)
Dict{String,Array{Float64,1}} with 4 entries:
"xs[3]" => [-1.02616, -1.26931, -1.05003, -5.05458, -1.33825, -1.02904, -1.23761, -1.30128, -1.04872, -2.03716]
"xs[1]" => [-2.08205, -2.51387, -3.03175, -2.5981, -2.31322, -2.62284, -2.70874, -1.18617, -1.36281, -4.39839]
"xs[2]" => [-2.20604, -2.63495, -3.22802, -2.48785, -2.40941, -2.78791, -2.85013, -1.24081, -1.46019, -4.59025]
"y" => [-1.36627, -1.21964, -1.03342, -7.46617, -1.3234, -1.14536, -1.14781, -2.48912, -2.23705, -1.26267]
```
Pull request successfully merged into master. Build succeeded.