Skip to content

Composing partial models? #1553

@burnpanck

Description

@burnpanck

When experimenting with complex hierarchical models, I try to build the model in a modular fashion, with a clean structured interface between the weakly coupled parts of the model. While not much of an actual problem in practice, I find it conceptually unfitting that the modular parts need to provide a reference to the complete Model while it is constructed. In my view, a model is just a collection of random variables that are either dependent on other random variables, or free random variables that are associated with a probability distribution. Such collections should be easy to compose. Thus I'm wondering if there is an API to compose multiple related or unrelated models.

In fact, I think it should be possible to specify a model by simply describing the relations between random variables without ever explicitly referencing a global model with which the variables are associated. In that view, the model could in principle always stay implicit, as a user you simply ask PyMC3 to calculate/sample a posterior distribution of a given set of random variables, and PyMC3 would then traverse the theano graph to identify all involved random variables.

An intermediate version of these two extremes would be an API that allows a user to explicitly ask PyMC3 to generate a model by collecting all involved random variables from a set of random variables of explicit interest, thus removing the reference to the model during construction of the random variables. Indeed, as far as I can tell from reading the code, the random variables themselves don't do much with the model that they are so eager to know at construction time, except storing it in an attribute. The model on the other hand collects the variables directly, but it shouldn't be too difficult to delay this collection effort to later.

I am inclined to try to implement a mock Model which doesn't store anything to keep the random variables code happy, and then add a function to collect random variables into a model at a later time. Do you see any difficulties with that approach, i.e. are there situations where the random variables actually need to know their model at construction time?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions