Skip to content

Conversation

junpenglao
Copy link
Member

@junpenglao junpenglao commented Jun 5, 2018

The output from sample_ppc does not equal to the shape of the observed sometimes, as the distribution.shape of ObservedRVs are []. Initially, this PR adds the specific shape (i.e., shape of the data) to the ObservedRVs, so that generating random sample will have the correct shape, which makes working with random method easier (e.g., PR #2984). However, doing so turns out breaks a lot of examples when theano.shared observed is used, as updating the value via set_values change the shape.

Now this PR just fix the mixture random error (close #2954).

@junpenglao
Copy link
Member Author

Wow this is much more difficult than I thought with theano.shared variables...

@ColCarroll
Copy link
Member

Merge #2984 first? It doesn't break anything new...

@junpenglao
Copy link
Member Author

Totally - is that ready?

@ColCarroll
Copy link
Member

Yep! Tests all pass now, and I think it just passes size to more places.

Junpeng Lao added 7 commits June 5, 2018 21:22
Revert back to original, ie not setting the shape of observedRV
Also updated dependent_density_regression notebook
@junpenglao junpenglao changed the title Add observed RV shape to its distribution Fix random sampling in Mixture Jun 6, 2018
Copy link
Member

@ColCarroll ColCarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a tricky change! I made a few tiny suggestions (and a docstring suggestion that might be wrong?)

@@ -197,21 +212,21 @@ class NormalMixture(Mixture):
the component standard deviations
tau : array of floats
the component precisions
distshape : shape of the Normal component
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this is already called dist_shape in generate_samples (instead of distshape)... is it easy to change that here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it to comp_shape to avoid confusion.

distshape : shape of the Normal component
notice that it should be different than the shape
of the mixture distribution, with one axis being
the number of component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small typo: "number of components". I am also not sure I understand the docstring either -- would adding "a mixture of three 2d normal distributions would have shape=(3, 2) and dist_shape=(2,)" be accurate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a tricky one.
"a mixture of three 2d normal distributions would have shape=(3, 2) and dist_shape=(2,)" this is not correct. A mixture of three 2d normal distributions would have shape=(..., 2), and dist_shape=(2, 3). But I am not sure multi-D normal mixture actually works...
In practice, if you have a NormalMixture RV that is shape=(a,b,c,...), the component shape get one more axis comp_shape=(a,b,c,...,k), with k components. However, passing comp_shape=(a,b,c,...,k) (also what the previous code trying to do) break cases when you are using theano.shared observed and sample_ppc.

@junpenglao
Copy link
Member Author

Will merge if no more comments.

@junpenglao junpenglao merged commit 9e7495b into pymc-devs:master Jun 14, 2018
@junpenglao junpenglao deleted the fix_mixture_random branch June 14, 2018 07:48
@ahmadsalim
Copy link

Of course, I am glad that bugs with this distribution is fixed 😄 .

However, I experienced that some of my code which relied on mixtures for multiple dimensions broke. Have you considered updating the RELEASE_NOTES to note the change in the way shapes are handled?

Thanks!

PS: I can provide a minimal breaking example if needed.

@twiecki
Copy link
Member

twiecki commented Jun 20, 2018

@ahmadsalim Please provide an example.

@ahmadsalim
Copy link

@twiecki

Here, you go:

import pymc3 as pm
import numpy as np

with pm.Model() as model:
    mus = pm.Normal('mus', shape=(6,12))
    taus = pm.Gamma('taus', alpha=1, beta=1, shape=(6, 12))
    ws = pm.Dirichlet('ws', np.ones(12))
    mixture = pm.NormalMixture('m', w=ws, mu=mus, tau=taus)

I get the error:

IndexError: index 10 is out of bounds for size 6

@junpenglao
Copy link
Member Author

Can confirm - could you please raise an issue?

@ahmadsalim
Copy link

Yes, of course!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sample_ppc does not respect the size of Mixture likelihoods
4 participants