BNP priors for random partitions #591

trappmartin · 2018-10-22T15:10:24Z

This is a work-in-progress PR that integrates the existing code from #370 and #374 for random partitions.

TODO:

Add tests for distributions.
Add SB, SBS, and CRP code for DP- and PYP-related models.
Modify Turing.Chain to work with missing values.
Add CRP example & test.
Add Stick-Breaking example & test.
Add SBS example & test.
Check that we sample from the correct posterior; see: https://github.com/TuringLang/Turing.jl/blob/project-bnp/test/rpm.jl/pym_posterior.jl

Changes to code base:

Added BNP priors

mlomeli1 · 2018-11-30T10:10:09Z

This PR is a work in progress PR, integrating the existing codes of #370 and #374 for random partitions.

Do not merge!

TODO:

Add tests for distributions.

Add missing codes.

Modify Turing.Chain to work with missing values.

Add DP-MM example.

cc: @mlomeli1 , @emilemathieu

That's great, @trappmartin. If you want to have a look at https://github.com/mlomeli1/SMC-MPhilproject/tree/master/MFM_for_Turing, there is some code for SMC for DPM as well. I believe you have access to this private repo. I also have my PhD MATLAB code for the Q-class; let me know if access to that repo would be useful :)

emilemathieu · 2018-11-30T10:25:58Z

Looking forward to that :)
What's still missing, then? I'm glad to answer questions if that helps.
Cheers,
Emile

trappmartin · 2018-11-30T10:55:41Z

Thanks, @emilemathieu and @mlomeli1!
A few things are still breaking, but I'll keep you up to date.

I changed the implementation of BNP priors by separating the representation from the stochastic process. For example, a Pitman-Yor process can now be constructed as follows:

a = 0.5
θ = 0.1
t = 2

# stick-breaking representation
d = StickBreakingProcess(PitmanYorProcess(a, θ, t))

# size-biased sampling representation
surplus = 2.0
d = SizeBiasedSamplingProcess(PitmanYorProcess(a, θ, t), surplus)

# CRP representation
cluster_counts = [2, 1]
d = ChineseRestaurantProcess(PitmanYorProcess(a, θ, t), cluster_counts)

Let me know what you think of the new interface. I hope it's easier to use and allows us to have a more flexible interface for BNP priors.

Cheers,
Martin
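For readers less familiar with the CRP representation, here is a minimal sketch of the predictive probabilities that a Pitman-Yor CRP assigns given the current cluster counts. It is plain Julia and does not use the interface from this PR; the function name `crp_probabilities` and the argument names `discount` and `concentration` are assumptions made for illustration (they correspond to `a` and `θ` in the example above).

```julia
# Sketch: predictive table probabilities of a Pitman-Yor CRP.
# `crp_probabilities`, `discount`, and `concentration` are illustrative
# names, not part of the interface proposed in this PR.
function crp_probabilities(discount::Real, concentration::Real,
                           counts::AbstractVector{<:Integer})
    n = sum(counts)            # customers seated so far
    K = length(counts)         # occupied tables (all counts assumed > 0)
    denom = n + concentration
    # Existing table k: (n_k - discount) / (n + concentration).
    p_existing = [(c - discount) / denom for c in counts]
    # New table: (concentration + K * discount) / (n + concentration).
    p_new = (concentration + K * discount) / denom
    return vcat(p_existing, p_new)
end

# Same numbers as the example above: a = 0.5, θ = 0.1, counts = [2, 1].
p = crp_probabilities(0.5, 0.1, [2, 1])
```

The last entry of `p` is the probability of opening a new cluster; the entries sum to one by construction.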

emilemathieu · 2018-11-30T11:49:50Z

I believe such an interface is much better!
The only question is how such processes are represented internally, and therefore how inference (e.g. SMC) can be performed.

trappmartin · 2018-11-30T11:54:17Z

Thanks, I think it should not have much of an influence on the sampling process. I should see soon. :D

trappmartin · 2018-12-10T13:37:36Z

Chinese Restaurant Process Example using current implementation:

@model infiniteMM(y; H = Normal(mean(y), std(y) * 2), rpm = DirichletProcess(0.1) ) = begin
    
    # Latent assignments.
    N = length(y)
    z = tzeros(Int, N)

    # Cluster counts.
    cluster_counts = tzeros(Int, N)

    # Cluster locations.
    x = tzeros(Float64, N)

    for i in 1:N

        # Draw assignments using a CRP.
        z[i] ~ ChineseRestaurantProcess(rpm, cluster_counts)
        if cluster_counts[z[i]] == 0
            # Cluster is new, therefore, draw new location.
            x[z[i]] ~ H
        end
        cluster_counts[z[i]] += 1

        # Draw observation.
        y[i] ~ Normal(x[z[i]], 0.5)
    end
    return z
end
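To make the prior over assignments in this model easier to inspect, here is a small sketch that simulates table assignments from a Dirichlet-process CRP in plain Julia, outside of Turing. The names `simulate_crp` and `alpha` are assumptions for illustration, and the sketch grows the count vector on demand rather than preallocating `N` entries as the model does.

```julia
using Random

# Sketch: simulate table assignments from a Dirichlet-process CRP.
# `simulate_crp` and `alpha` are illustrative names only.
function simulate_crp(rng::AbstractRNG, alpha::Real, N::Integer)
    z = zeros(Int, N)          # table assignment of each customer
    counts = Int[]             # customers per occupied table
    for i in 1:N
        # Unnormalised weights: existing tables by size, a new table by alpha.
        weights = vcat(Float64.(counts), float(alpha))
        u = rand(rng) * sum(weights)
        k = findfirst(>=(u), cumsum(weights))
        k = k === nothing ? length(weights) : k   # guard against round-off
        if k > length(counts)
            push!(counts, 0)   # open a new table
        end
        counts[k] += 1
        z[i] = k
    end
    return z, counts
end

z, counts = simulate_crp(MersenneTwister(1), 0.1, 20)
```

With a small concentration such as `0.1`, most draws tend to end up in very few tables, which matches the choice of `DirichletProcess(0.1)` as the default in the model above.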

emilemathieu · 2018-12-10T13:54:15Z

Nice! Do you think this interface would extend easily to NIGP and similar processes?

trappmartin · 2018-12-10T14:06:04Z

Yes, I'm pretty certain this should be possible. For this PR the focus is only on DP and PYP but it totally makes sense to extend the code after merging this PR.

trappmartin · 2018-12-10T14:07:32Z

Oops, I think I broke something. o.O

emilemathieu · 2018-12-11T17:23:08Z

Changing data = vcat(rand(Normal(0, 0.5), 10), rand(Normal(8, 0.5), 10)) to data = vcat(rand(Normal(0, 0.5), 10), rand(Normal(1, 0.5), 10)) yields an error.
I was moving the clusters closer together to try to get different values for the particles, since they are currently all equal...


trappmartin · 2019-03-11T14:28:17Z

I added the test for the stick-breaking representation. However, I'm a bit unsure if my implementation is correct / necessary or if we can use the one by @emilemathieu.

Here is my version of a truncated stick-breaking in Turing.

@model sbimm(y, rpm, trunc) = begin
    # Base distribution.
    # Note: mu_0, sigma_0 and sigma_1 are assumed to be defined in the enclosing scope.
    H = Normal(mu_0, sigma_0)

    # Latent assignments.
    N = length(y)
    z = tzeros(Int, N)

    # Slice variables, one per observation.
    u = tzeros(Float64, N)

    # Truncated collection of stick pieces and weights.
    v = tzeros(Float64, trunc)
    w = tzeros(Float64, trunc)
    K = 0

    # Cluster locations.
    x = tzeros(Float64, trunc)

    for i in 1:N

        # Draw a slice ∈ [0,1].
        u[i] ~ Beta(1, 1)

        # Instantiate new clusters until the weights cover the slice
        # or the truncation level is reached.
        while (sum(w) < u[i]) && (K < trunc)
            K += 1
            v[K] ~ StickBreakingProcess(rpm)
            x[K] ~ H
            w[K] = v[K] * prod(1 .- v[1:(K-1)])
        end

        # Find truncation point.
        K_ = findfirst(u[i] .< cumsum(w))

        # Sample assignments.
        w_ = w[1:K_] / sum(w[1:K_])
        z[i] ~ Categorical(w_)

        # Draw observation.
        y[i] ~ Normal(x[z[i]], sigma_1)
    end
end
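For comparison, here is a minimal sketch of stick-breaking weights for a Dirichlet process with a fixed truncation level, written in plain Julia without Turing. The function name `truncated_stick_breaking` and the parameter `alpha` are assumptions for illustration. Each stick piece is Beta(1, alpha), drawn by inverting its CDF, and the last weight absorbs the leftover mass so that the weights sum to one.

```julia
using Random

# Sketch: stick-breaking weights of a Dirichlet process with a fixed
# truncation level. `truncated_stick_breaking` is an illustrative name.
function truncated_stick_breaking(rng::AbstractRNG, alpha::Real, trunc::Integer)
    w = zeros(Float64, trunc)
    remaining = 1.0                      # length of the stick not yet broken off
    for k in 1:trunc-1
        # v ~ Beta(1, alpha), drawn by inverting the CDF 1 - (1 - v)^alpha.
        v = 1 - rand(rng)^(1 / alpha)
        w[k] = v * remaining
        remaining *= 1 - v
    end
    w[trunc] = remaining                 # last weight absorbs the leftover mass
    return w
end

w = truncated_stick_breaking(MersenneTwister(1), 0.1, 10)
```

Assigning the leftover mass to the final component is one common convention for truncation; it is a design choice of this sketch, not something taken from the test in this PR.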

@emilemathieu and @yebai what are your thoughts?

emilemathieu · 2019-03-11T15:03:18Z

Hi Martin! This seems to be a valid implementation of a stick-breaking process :)
PS: It is inefficient, though, as we argue in our workshop paper.

trappmartin · 2019-03-11T15:33:49Z

@cpfiffer the tests of this PR seem to be broken due to some bug in displaying MCMCChains. Can you have a look?

yebai · 2019-03-11T15:35:31Z

@cpfiffer the tests of this PR seem to be broken due to some bug in displaying MCMCChains. Can you have a look?

It's solved on the master branch; you need to rebase master into this PR.


trappmartin · 2019-03-11T17:09:42Z

@emilemathieu I made some minor adjustments to your code for the SBS. Could you let me know if this is correct for size-biased sampling? I'm still not too familiar with the SBS and should probably read the paper again once I find the time. :)

Based on: https://github.com/TuringLang/Turing.jl/blob/project-bnp/test/rpm.jl/imm.jl

Thanks!

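@model sbsimm(y, rpm) = begin
    # Base distribution.
    # Note: mu_0, sigma_0 and sigma_1 are assumed to be defined in the enclosing scope.
    H = Normal(mu_0, sigma_0)

    # Latent assignments.
    N = length(y)
    z = tzeros(Int, N)

    # Cluster locations and size-biased weights.
    x = tzeros(Float64, N)
    J = tzeros(Float64, N)

    # Number of instantiated clusters and remaining (unassigned) mass.
    k = 0
    surplus = 1.0

    for i in 1:N
        # Choose an existing cluster or the surplus mass.
        ps = vcat(J[1:k], surplus)
        z[i] ~ Categorical(ps)
        if z[i] > k
            # New cluster: draw its weight and location.
            k = k + 1
            J[k] ~ SizeBiasedSamplingProcess(rpm, surplus)
            x[k] ~ H
            surplus -= J[k]
        end

        # Draw observation.
        y[i] ~ Normal(x[z[i]], sigma_1)
    end
end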
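The lazy instantiation of weights in the model above can be sketched in plain Julia as follows. This is only an illustration under the assumption that, for a Dirichlet process with concentration `alpha`, each new size-biased weight is the current surplus times a Beta(1, alpha) draw; whether `SizeBiasedSamplingProcess` in this PR does exactly this is not confirmed here. The name `lazy_size_biased` is hypothetical.

```julia
using Random

# Sketch: assignments drawn with lazily instantiated size-biased weights
# for a Dirichlet process. All names here are illustrative.
function lazy_size_biased(rng::AbstractRNG, alpha::Real, N::Integer)
    J = Float64[]        # weights instantiated so far
    surplus = 1.0        # mass not yet assigned to any cluster
    z = zeros(Int, N)
    for i in 1:N
        # Choose among existing clusters or the surplus (total mass is 1).
        u = rand(rng)
        k = findfirst(>=(u), cumsum(J))
        if k === nothing
            # The draw fell into the surplus: break off a new weight,
            # assumed here to be surplus * Beta(1, alpha).
            v = 1 - rand(rng)^(1 / alpha)
            push!(J, surplus * v)
            surplus -= J[end]
            k = length(J)
        end
        z[i] = k
    end
    return z, J, surplus
end

z, J, surplus = lazy_size_biased(MersenneTwister(1), 0.1, 20)
```

A weight is only created when an observation actually lands in the surplus, so the number of instantiated weights never exceeds the number of observations.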

trappmartin · 2019-03-11T18:03:55Z

@yebai once the SBS sampling example is correct, this PR is ready for review and merging.


trappmartin · 2019-03-18T11:23:45Z

This PR is ready to be merged from my side.

src/randomMeasures/rpm.jl

yebai · 2019-03-19T22:07:52Z

test/rpm.jl/sb.jl

+        end
+
+        # Find truncation point
+        K_ = findfirst(u[i] .< cumsum(w))


This seems like a non-typical slice sampler for DPs. Do you have a reference for this slice sampling representation?

You are right. Looking at it again, it seems rather odd and is probably not quite correct. I can change it to the retrospective sampler by Papaspiliopoulos and Roberts, which seems straightforward to implement in Turing. Or do you have a preference for another one, e.g. Walker et al.?

I don't have a preference; the method of Papaspiliopoulos and Roberts sounds good to me. Alternatively, we can simply implement a basic recursive stick-breaking if it's only for testing purposes. We can leave more advanced implementations until later, perhaps in a BNP tutorial?

Sounds good to me. I have already started a BNP tutorial and will put the Papaspiliopoulos and Roberts code in there. By basic recursive stick-breaking, do you mean a truncated implementation with a fixed truncation point?

I mean Alg 1 from the following paper:

http://www.robots.ox.ac.uk/~twgr/assets/pdf/bloemreddy2017rpm.pdf

This version doesn't involve any truncation through the use of a random coin-flip based termination criterion. I still need to read the original paper to have a better understanding of why this is equivalent to the standard stick-breaking, but my guess is that the expectation of the process converges to the standard stick-breaking process.

I see. Yes, it looks like it does. I'll read the paper more carefully again as I forgot about the recursive coin-flipping.

Algo 1 (coin-flipping based) is complicated to implement in Turing because of the variable name issue we have, i.e. the coin will not be resampled recursively because of:

Turing.jl/src/inference/pgibbs.jl

Lines 154 to 167 in c5ca896

if ~haskey(vi, vn)
    r = rand(dist)
    push!(vi, vn, r, dist, spl.alg.gid)
    spl.info[:cache_updated] = CACHERESET # sanity flag mask for getidcs and getranges
elseif is_flagged(vi, vn, "del")
    unset_flag!(vi, vn, "del")
    r = rand(dist)
    vi[vn] = vectorize(dist, r)
    setgid!(vi, spl.alg.gid, vn)
    setorder!(vi, vn, vi.num_produce)
else
    updategid!(vi, vn, spl)
    r = vi[vn]
end

I'll keep the truncated stick-breaking test for now.

I see, perhaps move this into a separate issue?

This should be fixed soon - see related discussion #720 (review).
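To illustrate the variable-name issue discussed in this thread, here is a toy sketch of a store that only samples a value the first time a name is seen. It is not Turing's internal code; `draw!` and the `Dict`-based store are hypothetical stand-ins for the lookup by variable name in the excerpt from pgibbs.jl.

```julia
using Random

# Sketch: a store that draws a value only the first time a name is seen.
# `draw!` and the Dict-based store are illustrative, not Turing internals.
function draw!(store::Dict{Symbol,Float64}, name::Symbol, rng::AbstractRNG)
    if !haskey(store, name)
        store[name] = rand(rng)   # first visit: sample and record
    end
    return store[name]            # later visits: reuse the recorded value
end

store = Dict{Symbol,Float64}()
rng = MersenneTwister(42)
first_flip = draw!(store, :coin, rng)
second_flip = draw!(store, :coin, rng)   # same name, so no fresh draw
```

Because both calls use the same name, the second call returns the stored value instead of a new coin flip, which is why a recursive coin-flipping scheme would need a distinct name for each flip.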

yebai · 2019-03-20T11:21:17Z

It's probably good to have a dedicated folder for customised distributions in Turing, e.g. src/distrs/. If so, we can consider placing the main BNP module in src/distrs/RandomMeasures.jl.

src/randomMeasures/RandomMeasures.jl

src/Turing.jl

yebai · 2019-03-20T13:53:43Z

Excellent work. Ready to merge except for one minor filename issue (see above).

trappmartin added the bnp label Nov 15, 2018

trappmartin self-assigned this Dec 7, 2018

trappmartin changed the title ~~[WIP] BNP priors for random partitions~~ BNP priors for random partitions Dec 10, 2018

trappmartin added 10 commits February 11, 2019 12:57

Added parts of the existing BNP code.

2718f99

Refactoring of DP and PYP, added CRP and tests for existing BNP priors.

dbbb46d

Cleaned up chain type construction.

be84689

changes code base to correctly initialize Chain type, adjusted tests

6702202

minor fix

d9b6569

infinite MM test, wip

80cbaea

CRP now seems to be working.

4f775c7

clean up.

e4c580d

Bug fix.

0660ae5

Changed to SMC.

2a6c130

trappmartin force-pushed the ref/BNP branch from 745ab92 to 2a6c130 February 11, 2019 12:04

trappmartin added 6 commits February 11, 2019 14:57

reverted

444d0b8

reverting

e42dd53

Refactoring and fixing distribution tests.

58eb02b

Work in progress.

b5f094c

Merge branch 'master' into ref/BNP

d36691f

Merge branch 'master' of https://github.com/TuringLang/Turing.jl into…

128be53

… ref/BNP

Added stick-breaking process test, truncated.

fe27f0d

trappmartin requested review from emilemathieu and yebai March 11, 2019 14:24

Fix in trunc. stick-breaking test.

4152c67

Merge branch 'master' of https://github.com/TuringLang/Turing.jl into…

d545e4b

… ref/BNP

trappmartin added 3 commits March 17, 2019 12:05

Merge branch 'master' of https://github.com/TuringLang/Turing.jl into…

4668951

… ref/BNP

Fixed test to work with new Chains type.

1eddb58

added size-biased sampling test.

719ec45

trappmartin added the new-feature label Mar 17, 2019

Minor change on SBS test.

e2ea777

yebai requested changes Mar 19, 2019


Changed crp to _logpdf_table, changed internal interface to use dispatch

329e1e0

yebai reviewed Mar 20, 2019

src/randomMeasures/RandomMeasures.jl (outdated)

trappmartin added 2 commits March 20, 2019 12:51

Changed stick-breaking test to use fixed truncation point.

5d44ee9

Moved code to single file under distributions folder

20f7551

yebai reviewed Mar 20, 2019

src/Turing.jl (outdated)

trappmartin added 2 commits March 20, 2019 16:59

Renamed randomMeasures.jl to RandomMeasures.jl

2e8b17d

Increased test bounds to account for truncation error.

99920f9

yebai approved these changes Mar 20, 2019


yebai merged commit 8f6aee6 into master Mar 20, 2019

yebai deleted the ref/BNP branch March 20, 2019 17:12
