
Conversation

@torfjelde
Member

Fixes the issues discussed in #1579 regarding failing tests by making maybe_link! also link when proposal is a NamedTuple of proposals which should all be linked. If even one of the proposals does not require linking, we don't link.

It might be better not to dispatch here, but instead to check at runtime and warn the user when there's a mix of proposals, some of which are linked by default, and then not link any of them?
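
For concreteness, a minimal sketch of that all-or-nothing rule (the signature here is illustrative and assumes DynamicPPL's link! is in scope; the merged diff may differ in details):

using AdvancedMH

# Link only if *every* proposal in the NamedTuple is a random walk proposal,
# i.e. a proposal that operates in unconstrained space.
function maybe_link!(varinfo, sampler, proposal::NamedTuple)
    if !isempty(proposal) && all(p -> p isa AdvancedMH.RandomWalkProposal, values(proposal))
        link!(varinfo, sampler)
    end
    return varinfo
end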

@devmotion
Member

Somehow this feels a bit wrong - wouldn't it be better to extract the different proposals from the NamedTuple and link only the variables that should be linked?

@torfjelde
Member Author

Somehow this feels a bit wrong - wouldn't it be better to extract the different proposals from the NamedTuple and link only the variables that should be linked?

Yep. But isn't this when you should be using Gibbs?

@torfjelde
Member Author

Also tests are now failing o.O

@torfjelde
Member Author

Also tests are now failing o.O

Ah crap. My definition is too general and will also catch proposal == NamedTuple(). Though why would "unnecessarily" linking cause issues?

@devmotion
Member

Somehow this feels a bit wrong - wouldn't it be better to extract the different proposals from the NamedTuple and link only the variables that should be linked?

Yep. But isn't this when you should be using Gibbs?

But the docstrings explicitly explain how to set different proposals for different variables without Gibbs, so it seems it should work. I still don't understand what the underlying problem is here, but it seems wrong to fail or emit a warning here, and it seems suboptimal to link everything.

@devmotion
Member

Ideally, the linking would be more transparent and less confusing. There are so many bugs and hours of debugging...

IIRC when moving to AbstractMCMC 2 we discussed that in general it would be good if one could avoid linking the samples only at the beginning and the end of sampling, since it leads to surprising results when one uses the iteration interface.

@torfjelde
Member Author

But the docstrings explicitly explain how to set different proposals for different variables without Gibbs, so it seems it should work. I still don't understand what the underlying problem is here, but it seems wrong to fail or emit a warning here, and it seems suboptimal to link everything.

I agree! But it's not clear to me how to go about doing that without additional implementations in DynamicPPL 😕

Ideally, the linking would be more transparent and less confusing. There are so many bugs and hours of debugging...

100% agree.

IIRC when moving to AbstractMCMC 2 we discussed that in general it would be good if one could avoid linking the samples only at the beginning and the end of sampling, since it leads to surprising results when one uses the iteration interface.

Wait, you want to NOT only link at beginning and end of sampling?

@torfjelde
Member Author

Also tests are now failing o.O

Ah crap. My definition is too general and will also catch proposal == NamedTuple(). Though why would "unnecessarily" linking cause issues?

Ah, so when we do MH() we use the priors as proposals and thus linking causes issues.
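
For example, in an illustrative model (not from this PR):

using Turing

@model function demo()
    s ~ InverseGamma(2, 3)        # constrained to (0, ∞)
    m ~ Normal(0, sqrt(s))
end

# With no arguments, MH proposes from the priors, which live in the
# *constrained* space, so the VarInfo must stay unlinked here.
chain = sample(demo(), MH(), 100)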

@torfjelde
Member Author

torfjelde commented Apr 10, 2021

Btw, one simple way of allowing the desired behavior is to add a DynamicPPL.link!(vi, spl, space) method rather than forcing link! to work on the "entire space" of spl. E.g.

function link!(vi::TypedVarInfo, spl::AbstractSampler)
    return link!(vi, spl, Val(getspace(spl)))
end
function link!(vi::TypedVarInfo, spl::AbstractSampler, spaceval::Val)
    vns = _getvns(vi, spl)
    return _link!(vi.metadata, vi, vns, spaceval)
end

Combined with

function maybe_link!(varinfo, sampler, proposal::NamedTuple{(), Tuple{}})
    return varinfo
end
function maybe_link!(
    varinfo,
    sampler,
    proposal::NamedTuple{Syms, <:Tuple{<:AdvancedMH.RandomWalkProposal, Vararg}}
) where {Syms}
    # Link the variable corresponding to the first (random walk) proposal,
    # then recurse on the remaining proposals.
    link!(varinfo, sampler, Val((first(Syms),)))
    return maybe_link!(varinfo, sampler, Base.tail(proposal))
end
function maybe_link!(varinfo, sampler, proposal::NamedTuple)
    # The first proposal is not a random walk proposal: skip it without linking.
    return maybe_link!(varinfo, sampler, Base.tail(proposal))
end

we get what we want.
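
For instance, with hypothetical proposals just to illustrate (assuming Distributions' Normal is available, and given some vi and spl):

# Only `s` ends up linked; `m` keeps its constrained representation.
proposal = (
    s = AdvancedMH.RandomWalkProposal(Normal()),
    m = AdvancedMH.StaticProposal(Normal()),
)
maybe_link!(vi, spl, proposal)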

@devmotion
Member

Wait, you want to NOT only link at beginning and end of sampling?

Yep, maybe it would be better to just apply the bijective mapping in set_namedtuple! (called in

function (f::MHLogDensityFunction)(x)
    sampler = f.sampler
    vi = f.vi

    x_old, lj_old = vi[sampler], getlogp(vi)
    set_namedtuple!(vi, x)
    f.model(vi)
    lj = getlogp(vi)
    vi[sampler] = x_old
    setlogp!(vi, lj_old)
    return lj
end

and in set_namedtuple!(vi, trans.params)) and in dist_val_tuple (dt, vt = dist_val_tuple(spl, vi)), as well as in vals = vi[spl] and vi[spl] = trans.params. Then one could perform whatever transformation is required for a specific variable and proposal.

As I said above, I am always confused by the linking. In general, IMO it is problematic if the intermediate steps return the transformed samples. I am not completely sure if that happens here, but it is one reason why I think the current design is a bit suboptimal.

Also, if I understand correctly, the linking only makes it easier to set and retrieve the transformed samples (which could be moved to the other functions mentioned above), but when the model is executed the variables still have to be converted back to the original space anyway, so I wonder if and to what extent it actually improves performance. Maybe it would generally be cleaner to always store the samples in the original space in the VarInfo object and map them on the fly if needed (or at least not modify the VarInfo object in-place, since that e.g. makes it impossible to implement bijections from a submanifold in a higher-dimensional space to some lower-dimensional space, where the dimensions and potentially the type change).
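
A rough sketch of what applying the bijective mapping in set_namedtuple! could look like (everything here except set_namedtuple! itself is hypothetical, including the inverse and bijectors names, which follow Bijectors.jl conventions):

# Map each unconstrained proposal value back to the constrained space before
# writing it into the VarInfo. `bijectors` is a hypothetical NamedTuple of
# per-variable bijectors (constrained -> unconstrained).
function set_namedtuple_invlinked!(vi, x::NamedTuple, bijectors::NamedTuple)
    constrained = map(keys(x)) do name
        inverse(bijectors[name])(x[name])
    end
    return set_namedtuple!(vi, NamedTuple{keys(x)}(constrained))
end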

@torfjelde
Member Author

Yep, maybe it would be better to just apply the bijective mapping in set_namedtuple!

I disagree with this right now, simply because it diverges from the current approach of the rest of the samplers.

BUT I do agree that the current approach is suboptimal; just feels like we should address that head on rather than approach it differently for different samplers.

In general, IMO it is problematic if the intermediate steps return the transformed samples. I am not completely sure if that happens here, but it is one reason why I think the current design is a bit suboptimal.

I see what you're saying and I remember the conversation we had about potential issues this causes when using the iterator interface for AbstractMCMC.jl.

And yes, it does happen here. We only link! at the beginning of the sampling process.

Also, if I understand correctly, the linking only makes it easier to set and retrieve the transformed samples (which could be moved to the other functions mentioned above), but when the model is executed the variables still have to be converted back to the original space anyway, so I wonder if and to what extent it actually improves performance.

Currently we do:

for i = 1:num_samples
    # Propose in unconstrained space
    ϕ_new = step(ϕ)

    # Transform to constrained
    θ_new = f(ϕ_new)

    # Evaluate logpdf
    logpdf(model, θ_new)

    # Update sample
    ϕ = ϕ_new
end

but if we instead made it so that samples are transformed on the fly, we'd get

for i = 1:num_samples
    # ADDITIONAL STEP
    # Transform previous (constrained) sample to unconstrained
    ϕ = f⁻¹(θ)

    # Propose in unconstrained space
    ϕ_new = step(ϕ)

    # Transform to constrained
    θ_new = f(ϕ_new)

    # Evaluate logpdf
    logpdf(model, θ_new)

    # Update sample
    θ = θ_new
end

So it definitely helps performance-wise in the case where the linking is actually a significant part of the computation.

Maybe it would generally be cleaner to always store the samples in the original space in the VarInfo object and map them on the fly if needed (or at least not modify the VarInfo object in-place, since that e.g. makes it impossible to implement bijections from a submanifold in a higher-dimensional space to some lower-dimensional space, where the dimensions and potentially the type change).

It does have a performance cost though, as mentioned above. But I do think it's an important point, and for certain bijectors we could possibly even gain some performance by doing what you mention.

With all this being said, I do agree that the implicit transformation is both:

  1. Very confusing at times.
  2. Often not even worth it.

I'm also of the opinion that we should exploit Bijectors.jl more.
These days, using Bijectors.jl we can easily construct a bijector which handles the entire VarInfo without any additional overhead (for static models). We can even optimize the structure of the transformation, e.g. by combining contiguous ranges into a single vectorized bijector.
So as a "step" along the way, we could for example make something similar to TransformedDistribution for VarInfo, i.e. a bijector + VarInfo, and then overload the necessary operations, e.g. setindex!, to take in unconstrained variables, convert them to constrained, and set the underlying VarInfo.

This would also make it possible to construct completely arbitrary transformations, use different bijectors than default if that's wanted, etc., by simply calling b = bijector(varinfo) and then replacing parts of b as desired.
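
A minimal sketch of that wrapper idea, assuming a Bijectors.jl-style bijector(varinfo) and inverse exist; all names here are hypothetical:

struct TransformedVarInfo{V,B}
    varinfo::V
    # maps the constrained (model) space to the unconstrained space
    bijector::B
end

TransformedVarInfo(varinfo) = TransformedVarInfo(varinfo, bijector(varinfo))

# Samplers write unconstrained values; we store them constrained.
function Base.setindex!(tvi::TransformedVarInfo, ϕ, spl)
    tvi.varinfo[spl] = inverse(tvi.bijector)(ϕ)
    return tvi
end

# Samplers read unconstrained values; we transform on the fly.
Base.getindex(tvi::TransformedVarInfo, spl) = tvi.bijector(tvi.varinfo[spl])

Replacing parts of b = bijector(varinfo) before constructing the wrapper would then give the arbitrary per-variable transformations mentioned above.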

@torfjelde
Member Author

Also, we should take this discussion to an issue over at DPPL 👍

In addition, we should open an issue regarding how MH is a bit disingenuous about what it claims versus what it can actually do w.r.t. unconstrained variables.

@devmotion
Member

but if we instead made it so that samples are transformed on the fly, we'd get

The unconstrained sample could just be part of the state.

But whatever, I agree that the general design should be discussed (and possibly changed) separately and not in this PR 👍
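
For instance, the "part of the state" idea could look like this hypothetical sketch:

# Caches the unconstrained (linked) sample alongside the constrained one,
# so the extra f⁻¹ transform from the loop above is only paid when the
# constrained sample actually changes.
struct MHState{T1,T2,L<:Real}
    θ::T1             # sample in the constrained (model) space
    ϕ::T2             # the same sample, mapped to unconstrained space
    logdensity::L
end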

@coveralls

coveralls commented Apr 10, 2021

Pull Request Test Coverage Report for Build 736531705

  • 7 of 7 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.08%) to 78.167%

Totals Coverage Status

  • Change from base Build 730211560: 0.08%
  • Covered Lines: 1117
  • Relevant Lines: 1429

💛 - Coveralls

@codecov

codecov bot commented Apr 10, 2021

Codecov Report

Merging #1582 (46a9ff4) into master (2eab753) will increase coverage by 0.07%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1582      +/-   ##
==========================================
+ Coverage   78.08%   78.16%   +0.07%     
==========================================
  Files          23       23              
  Lines        1424     1429       +5     
==========================================
+ Hits         1112     1117       +5     
  Misses        312      312              
Impacted Files        Coverage Δ
src/inference/mh.jl   85.71% <100.00%> (+0.55%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2eab753...46a9ff4.

@devmotion left a comment
Member

Looks good, but it seems it doesn't handle the case where the NamedTuple contains not only random walk proposals, does it?

And I wonder if we should add specific tests, or whether it's sufficient that the tests don't fail anymore?

@torfjelde
Member Author

Looks good, but it seems it doesn't handle the case where the NamedTuple contains not only random walk proposals, does it?

Correct; IMO we should fix the general case through TuringLang/DynamicPPL.jl#231.
I'll open an issue here that we need to handle the general case though.

And I wonder if we should add specific tests or it's sufficient that the tests don't fail anymore?

Well, technically the tests to catch this were already present 😅 The issue was that they only caught it every now and then, depending on the RNG. But good point, I'll add some tests.

@devmotion left a comment
Member

Looks good to me, given that tests pass. Can you also update the version number?

@torfjelde merged commit 9104637 into master Apr 10, 2021
delete-merged-branch bot deleted the tor/MH-fix branch April 10, 2021 23:23
bors bot pushed a commit to TuringLang/DynamicPPL.jl that referenced this pull request Apr 11, 2021
Makes it possible to only link/invlink a subset of the space of a sampler *for static models only* (as there's no equivalent for `UntypedVarInfo`). It is non-breaking since it just adds an intermediate method to an existing implementation, giving increased flexibility.

Probably don't want to encourage use of this, but it can be useful in cases such as the MH sampler TuringLang/Turing.jl#1582 (comment).