-
Notifications
You must be signed in to change notification settings - Fork 37
subset and merge for VarInfo (clean version)
#544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
subset and merge for VarInfosubset and merge for VarInfo (clean version)
src/varinfo.jl
Outdated
| """ | ||
| merge(varinfo_left::VarInfo, varinfo_right::VarInfo) | ||
| Merge two `VarInfo` instances into one, giving precedence to `varinfo_right` when reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add some more docs for this a bit later today
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #544 +/- ##
==========================================
+ Coverage 81.51% 82.76% +1.24%
==========================================
Files 25 25
Lines 3003 3180 +177
==========================================
+ Hits 2448 2632 +184
+ Misses 555 548 -7
☔ View full report in Codecov by Sentry. |
|
Do any of you want to take a look at this @devmotion @yebai ? |
|
I'll take a look later today. @sunxd3 can you also help review this PR? |
| end | ||
|
|
||
| function subset(metadata::Metadata, vns::AbstractVector{<:VarName}) | ||
| # TODO: Should we error if `vns` contains a variable that is not in `metadata`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least a warning?
src/varinfo.jl
Outdated
| metadata = merge_metadata(varinfo_left.metadata, varinfo_right.metadata) | ||
| lp = getlogp(varinfo_left) + getlogp(varinfo_right) | ||
| # TODO: Is this really the way we want to combine `num_produce`? | ||
| num_produce = varinfo_left.num_produce[] + varinfo_right.num_produce[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me the use of num_produce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, taking the sum of two num_produce produces no meaning. I think we should throw an error if both num_produce are non-zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_produce is used by the particle samplers to indicate how many observations we've seen / what's the current "observation index".
Semantically, taking the sum of two num_produce produces no meaning. I think we should throw an error if both num_produce are non-zero.
I don't think that's quite right. It makes sense if you go "execute model A and then separately execute model B, and now we want to merge the two"; to ensure that the observation index is correctly, we need to add them, no?
Is this a scenario we want to consider? Probably not. But to me it wasn't obvious what else to do.
No matter, I don't think erroring if num_produce is non-zero is the right way to go. We should allow something like
varinfo_with_num_produce = last(evaluate!!(model, varinfo, context))
merge(varinfo_with_num_produce, varinfo)If there's no meaning in adding them, then I suggest the alternative is to just give precedence to varinfo_right, as is the semantics of merge (if a field/property/key is present in both left and right, then the value in right takes precedence).
EDIT: It's fair to ask if we should also do that for logp. I went with sum because I was imagining scenarios where we want to do something like execute two (sub-)models in parallel and then merge the resulting varinfos. In such a scenario, we want to add the logp fields together. But we could maybe make this a separate function, e.g. merge_with_add_logp or something (preferably named in a better way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what should we do here? Give precedence to varinfo_right or leave as it is?
|
Looks good to me, but I am not very familiar with |
yebai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a lot of additional code to support VarInfo. Is there a plan to drop VarInfo in favour of SimpleVarInfo at some point?
We have nearly everything we need in SimpleVarInfo now. So, the marginal benefit of keeping both VarInfo and SimpleVarInfo is diminishing.
src/varinfo.jl
Outdated
| metadata = merge_metadata(varinfo_left.metadata, varinfo_right.metadata) | ||
| lp = getlogp(varinfo_left) + getlogp(varinfo_right) | ||
| # TODO: Is this really the way we want to combine `num_produce`? | ||
| num_produce = varinfo_left.num_produce[] + varinfo_right.num_produce[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, taking the sum of two num_produce produces no meaning. I think we should throw an error if both num_produce are non-zero.
| end | ||
|
|
||
| """ | ||
| getorder(vi::VarInfo, vn::VarName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this API -- we can consider depreciating num_produce in favour of getorder in the longer run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. Isn't num_produce and order different things? At least they are two different fields in VarInfo. I I just added a getorder method because I've been trying to make the interface of VarInfo simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_produce is used to track the current observation index (for all vns), and its current value inserted to VarInfo.metadata when a new vn is created.
We don't though 😕 Until that has been done, we need to continue supporting |
Yes, I liked the |
Agreed, but that will take time; it's non-trivial to get right IMO. |
| # the `eltype` to `VarName`? This might be useful when someone does `[@varname(x[1]), @varname(m)]` which | ||
| # might result in a `Vector{Any}`. | ||
| """ | ||
| subset(varinfo::AbstractVarInfo, vns::AbstractVector{<:VarName}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we would not already have so many getindex methods, I would have thought that getindex would be a natural name for this function. But maybe it's still an option?
Then we could have getindex(::AbstractVarInfo, ::AbstractVector{<:VarName}) -> AbstractVarInfo and getindex(::T, ::VarName) -> typeof_varname_variate, similar to [1,2,3][[1,3]] = [1, 3] and [1,2,3][2] = 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like this yes, but I also really don't want to touch getindex in this codebase 😅
Happy to make this a long-term goal or something though!
src/abstract_varinfo.jl
Outdated
| See docstring of [`subset(varinfo, vns)`](@ref) for examples. | ||
| """ | ||
| function Base.merge(varinfo::AbstractVarInfo, varinfo_others::AbstractVarInfo...) | ||
| return merge(Base.merge(varinfo, first(varinfo_others)), Base.tail(varinfo_others)...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that length(varinfo_others) == 0, and then the function would error it seems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true; I'll address this.
I meant to add tests for merge with more than two inputs too, but didn't get around to it. Will do that too.
|
Okay, so I think this is now properly ready for a potential approval (after comments have been resolved) |
Pull Request Test Coverage Report for Build 6518813946
💛 - Coveralls |
|
Okay, so I added quite a bit of changes to ensure that |
|
Test coverage should now be improved 👍 |
|
This is ready for a look-over now btw. |
yebai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @torfjelde -- it looks good to me!
|
@devmotion had some open comments here, so I really would have preferred to have his 👍 on this PR too |
I was thinking that #542 would be a quick merge, and so I just based #543 on #542 so I could easily test the new
Gibbsin Turing.jl, buuut it seems that wasn't as quick as I thought and so this PR replaces #543 without any dependence on #542.Hence: see #543 for examples and some comments