More efficient `PoissonBinomial` #1285

devmotion · 2021-02-23T12:46:47Z

For many computations, such as e.g. mean and var, it is not necessary to evaluate the probability mass function of PoissonBinomial for all possible outcomes. However, currently, it is computed when PoissonBinomial is constructed. For large number of trials this induces a significant delay and might even make it too inefficient to work with PoissonBinomial at all.

In this PR, instead the values of the probability mass function are only computed when a user performs an evaluation that requires it.

Simple benchmark on my computer:

master

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  4.993 μs (1 allocation: 896 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  4.885 μs (1 allocation: 896 bytes)

julia> @btime mean(PoissonBinomial($p));
  5.000 μs (1 allocation: 896 bytes)

julia> @btime var(PoissonBinomial($p));
  5.035 μs (3 allocations: 2.63 KiB)

julia> @btime mode(PoissonBinomial($p));
  5.239 μs (1 allocation: 896 bytes)

julia> @btime median(PoissonBinomial($p));
  5.256 μs (2 allocations: 1.75 KiB)

julia> @btime modes(PoissonBinomial($p));
  6.412 μs (50 allocations: 6.23 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  5.221 μs (2 allocations: 1.75 KiB)

julia> @btime cf(PoissonBinomial($p), 0.1);
  5.380 μs (2 allocations: 2.64 KiB)

julia> @btime entropy(PoissonBinomial($p));
  5.988 μs (2 allocations: 1.75 KiB)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  87.389 ns (1 allocation: 896 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  337.177 ns (1 allocation: 1.77 KiB)

This PR

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  90.943 ns (1 allocation: 32 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  5.545 ns (1 allocation: 32 bytes)

julia> @btime mean(PoissonBinomial($p));
  104.703 ns (0 allocations: 0 bytes)

julia> @btime var(PoissonBinomial($p));
  191.663 ns (0 allocations: 0 bytes)

julia> @btime mode(PoissonBinomial($p));
  5.118 μs (2 allocations: 928 bytes)

julia> @btime median(PoissonBinomial($p));
  5.004 μs (3 allocations: 1.78 KiB)

julia> @btime modes(PoissonBinomial($p));
  6.132 μs (50 allocations: 6.17 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  136.762 ns (0 allocations: 0 bytes)

julia> @btime cf(PoissonBinomial($p), 0.1);
  332.973 ns (0 allocations: 0 bytes)

julia> @btime entropy(PoissonBinomial($p));
  5.617 μs (2 allocations: 928 bytes)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  43.553 ns (0 allocations: 0 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  245.246 ns (0 allocations: 0 bytes)

This PR is motivated by TuringLang/MCMCChains.jl#263: often it is sufficient to evaluate the mean and the variance (as an uncertainty measure) of the Poisson binomial distribution of the statistic of interest, and therefor the quadratic complexity of the evaluation of the probability mass function slows down the computations for large number of trials (e.g., MCMC samples in this case) unnecessarily.

andreasnoack · 2021-02-23T13:00:01Z

src/univariate/discrete/poissonbinomial.jl

-        new{T}(p, pb)
+mutable struct PoissonBinomial{T<:Real,P<:AbstractVector{T}} <: DiscreteUnivariateDistribution
+    p::P
+    pmf::Union{Nothing,Vector{T}} # lazy computation of the probability mass function


Did you consider using an empty vector here? That would allow the struct to be immutable.

Yes, I did. I wanted to try this approach first since arguably it is a bit "clearer" (nothing clearly indicates that it was not initialized whereas otherwise one has to check isempty) and the alternative would require to change poissonbinomial_pdf to an in-place function (which is clearly possible though). But I have no strong opinion here, I can compare it to the other approach.

codecov-io · 2021-02-23T13:02:46Z

Codecov Report

Merging #1285 (5d303cd) into master (7120d50) will decrease coverage by 0.06%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##           master    #1285      +/-   ##
==========================================
- Coverage   81.60%   81.54%   -0.07%     
==========================================
  Files         117      115       -2     
  Lines        6665     6636      -29     
==========================================
- Hits         5439     5411      -28     
+ Misses       1226     1225       -1

Impacted Files	Coverage Δ
src/univariate/discrete/poissonbinomial.jl	`89.65% <88.23%> (-0.83%)`	⬇️
src/univariates.jl	`61.41% <0.00%> (-5.25%)`	⬇️
src/common.jl	`67.56% <0.00%> (-0.86%)`	⬇️
src/multivariates.jl	`70.66% <0.00%> (-0.77%)`	⬇️
src/matrixvariates.jl	`76.81% <0.00%> (-0.66%)`	⬇️
src/quantilealgs.jl	`80.95% <0.00%> (-0.23%)`	⬇️
src/utils.jl	`90.47% <0.00%> (-0.23%)`	⬇️
src/Distributions.jl
src/samplers.jl
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7120d50...5d303cd. Read the comment docs.

devmotion · 2021-02-23T19:40:06Z

I compared the implementation in this PR with an implementation of an immutable struct that uses pmf::Vector{T} which is empty in the uninitialized state and resized and then filled with an in-place variant of poissonbinomial_pdf if needed.

Without `sizehint!`

Benchmarks without sizehint!(pmf, length(p) + 1) in the constructor:

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  110.461 ns (1 allocation: 80 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  31.534 ns (1 allocation: 80 bytes)

julia> @btime mean(PoissonBinomial($p));
  121.792 ns (1 allocation: 80 bytes)

julia> @btime var(PoissonBinomial($p));
  217.599 ns (1 allocation: 80 bytes)

julia> @btime mode(PoissonBinomial($p));
  4.675 μs (4 allocations: 960 bytes)

julia> @btime median(PoissonBinomial($p));
  4.706 μs (5 allocations: 1.81 KiB)

julia> @btime modes(PoissonBinomial($p));
  5.808 μs (52 allocations: 6.20 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  152.058 ns (1 allocation: 80 bytes)

julia> @btime cf(PoissonBinomial($p), 0.1);
  360.134 ns (1 allocation: 80 bytes)

julia> @btime entropy(PoissonBinomial($p));
  5.155 μs (4 allocations: 960 bytes)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  42.884 ns (0 allocations: 0 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  246.104 ns (0 allocations: 0 bytes)

With `sizehint!`

With sizehint!(pmf, length(p) + 1) in the constructor I get:

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  139.443 ns (2 allocations: 896 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  57.826 ns (2 allocations: 896 bytes)

julia> @btime mean(PoissonBinomial($p));
  146.683 ns (2 allocations: 896 bytes)

julia> @btime var(PoissonBinomial($p));
  240.535 ns (2 allocations: 896 bytes)

julia> @btime mode(PoissonBinomial($p));
  4.688 μs (4 allocations: 960 bytes)

julia> @btime median(PoissonBinomial($p));
  4.714 μs (5 allocations: 1.81 KiB)

julia> @btime modes(PoissonBinomial($p));
  5.825 μs (52 allocations: 6.20 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  175.456 ns (2 allocations: 896 bytes)

julia> @btime cf(PoissonBinomial($p), 0.1);
  385.257 ns (2 allocations: 896 bytes)

julia> @btime entropy(PoissonBinomial($p));
  5.162 μs (4 allocations: 960 bytes)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  42.805 ns (0 allocations: 0 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  244.248 ns (0 allocations: 0 bytes)

Summary

So in both cases the number of allocations is larger than in this PR and the functions that do not require the evaluation of the probability mass function are slower (which I tried to optimize in this PR). I did not expect these results - what's the explanation for them? Are the compiler optimizations for Union{Nothing,T} just "too" good?

andreasnoack · 2021-02-23T19:54:39Z

Thanks for benchmarking this. The compiler optimizations for the small union here might be really good. Why would e.g. the mean function allocate in the latter case but not in the Union case?

devmotion · 2021-02-23T20:20:28Z

Why would e.g. the mean function allocate in the latter case but not in the Union case?

Yes, IMO this was the most surprising result. I assume that we just see the allocation of the empty vector (80 bytes) in the benchmark without sizehint! (with sizehint! it increases to 896 bytes) whereas in the Union case no vectors have to be allocated. Therefore the number of allocations is reduced and it is faster.

andreasnoack · 2021-02-24T08:06:14Z

This is puzzling to me

julia> @btime PoissonBinomial($p);
  90.943 ns (1 allocation: 32 bytes)
...
julia> @btime mean(PoissonBinomial($p));
  104.703 ns (0 allocations: 0 bytes)

I'd think that the latter would time construction as well as the mean calculation whereas the first would only time the construction. Why is the first allocating when the second one isn't?

devmotion · 2021-02-24T10:40:39Z

Yes, this was surprising to me as well. One can get rid of the allocation by changing the returned value:

julia> f(p) = (PoissonBinomial(p); return)

julia> @btime f($p);
  88.468 ns (0 allocations: 0 bytes)

I just don't fully understand why one gets this additional allocation. I inspected the output of @code_llvm for both cases, and it seems the main/only difference apart from the returned value are the additional statements

; │││ @ /home/david/.julia/dev/Distributions/src/univariate/discrete/poissonbinomial.jl:32 within `_#64'
     %44 = bitcast %jl_value_t*** %ptls to i8*
     %45 = call noalias nonnull %jl_value_t* @jl_gc_pool_alloc(i8* %44, i32 1424, i32 32) #2
     %46 = bitcast %jl_value_t* %45 to %jl_value_t**
     %47 = getelementptr %jl_value_t*, %jl_value_t** %46, i64 -1
     store %jl_value_t* inttoptr (i64 140610215652240 to %jl_value_t*), %jl_value_t** %47
     %48 = bitcast %jl_value_t* %45 to %jl_value_t**
     store %jl_value_t* %13, %jl_value_t** %48, align 8
     %49 = bitcast %jl_value_t* %45 to i8*
     %50 = getelementptr inbounds i8, i8* %49, i64 8
     %51 = bitcast i8* %50 to %jl_value_t**
     store %jl_value_t* inttoptr (i64 140610869056864 to %jl_value_t*), %jl_value_t** %51, align 8

in the version with one allocation. So I assume in the non-allocating case (and also the mean etc. examples) the PoissonBinomial instance is never instantiated but the call of new{T,typeof(p)}(p, nothing) is removed by the compiler?

devmotion · 2021-03-25T14:15:50Z

How should we move on with this PR?

andreasnoack · 2021-03-26T08:32:05Z

I think we should just stick with your current implementation since the alternative one didn't show improvements.

More efficient PoissonBinomial

5d303cd

andreasnoack reviewed Feb 23, 2021

View reviewed changes

This was referenced Feb 23, 2021

Generalize LocationScale to discrete distributions #1286

Merged

Return analytical distribution of R star statistic TuringLang/MCMCChains.jl#263

Closed

andreasnoack merged commit d63d370 into JuliaStats:master Mar 26, 2021

devmotion mentioned this pull request Apr 22, 2021

Update ChainRules definitions and add differential for PoissonBinomial pdf TuringLang/DistributionsAD.jl#162

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

More efficient `PoissonBinomial` #1285

More efficient `PoissonBinomial` #1285

Uh oh!

devmotion commented Feb 23, 2021

Uh oh!

andreasnoack Feb 23, 2021

Uh oh!

devmotion Feb 23, 2021

Uh oh!

codecov-io commented Feb 23, 2021 •

edited

Loading

Uh oh!

devmotion commented Feb 23, 2021

Uh oh!

andreasnoack commented Feb 23, 2021

Uh oh!

devmotion commented Feb 23, 2021

Uh oh!

andreasnoack commented Feb 24, 2021

Uh oh!

devmotion commented Feb 24, 2021

Uh oh!

devmotion commented Mar 25, 2021

Uh oh!

andreasnoack commented Mar 26, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

More efficient PoissonBinomial #1285

More efficient PoissonBinomial #1285

Uh oh!

Conversation

devmotion commented Feb 23, 2021

master

This PR

Uh oh!

andreasnoack Feb 23, 2021

Choose a reason for hiding this comment

Uh oh!

devmotion Feb 23, 2021

Choose a reason for hiding this comment

Uh oh!

codecov-io commented Feb 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

devmotion commented Feb 23, 2021

Without sizehint!

With sizehint!

Summary

Uh oh!

andreasnoack commented Feb 23, 2021

Uh oh!

devmotion commented Feb 23, 2021

Uh oh!

andreasnoack commented Feb 24, 2021

Uh oh!

devmotion commented Feb 24, 2021

Uh oh!

devmotion commented Mar 25, 2021

Uh oh!

andreasnoack commented Mar 26, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

More efficient `PoissonBinomial` #1285

More efficient `PoissonBinomial` #1285

codecov-io commented Feb 23, 2021 •

edited

Loading

Without `sizehint!`

With `sizehint!`