Skip to content

Conversation

@devmotion
Copy link
Member

For many computations, such as e.g. mean and var, it is not necessary to evaluate the probability mass function of PoissonBinomial for all possible outcomes. However, currently, it is computed when PoissonBinomial is constructed. For large number of trials this induces a significant delay and might even make it too inefficient to work with PoissonBinomial at all.

In this PR, instead the values of the probability mass function are only computed when a user performs an evaluation that requires it.

Simple benchmark on my computer:

master

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  4.993 μs (1 allocation: 896 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  4.885 μs (1 allocation: 896 bytes)

julia> @btime mean(PoissonBinomial($p));
  5.000 μs (1 allocation: 896 bytes)

julia> @btime var(PoissonBinomial($p));
  5.035 μs (3 allocations: 2.63 KiB)

julia> @btime mode(PoissonBinomial($p));
  5.239 μs (1 allocation: 896 bytes)

julia> @btime median(PoissonBinomial($p));
  5.256 μs (2 allocations: 1.75 KiB)

julia> @btime modes(PoissonBinomial($p));
  6.412 μs (50 allocations: 6.23 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  5.221 μs (2 allocations: 1.75 KiB)

julia> @btime cf(PoissonBinomial($p), 0.1);
  5.380 μs (2 allocations: 2.64 KiB)

julia> @btime entropy(PoissonBinomial($p));
  5.988 μs (2 allocations: 1.75 KiB)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  87.389 ns (1 allocation: 896 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  337.177 ns (1 allocation: 1.77 KiB)

This PR

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  90.943 ns (1 allocation: 32 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  5.545 ns (1 allocation: 32 bytes)

julia> @btime mean(PoissonBinomial($p));
  104.703 ns (0 allocations: 0 bytes)

julia> @btime var(PoissonBinomial($p));
  191.663 ns (0 allocations: 0 bytes)

julia> @btime mode(PoissonBinomial($p));
  5.118 μs (2 allocations: 928 bytes)

julia> @btime median(PoissonBinomial($p));
  5.004 μs (3 allocations: 1.78 KiB)

julia> @btime modes(PoissonBinomial($p));
  6.132 μs (50 allocations: 6.17 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  136.762 ns (0 allocations: 0 bytes)

julia> @btime cf(PoissonBinomial($p), 0.1);
  332.973 ns (0 allocations: 0 bytes)

julia> @btime entropy(PoissonBinomial($p));
  5.617 μs (2 allocations: 928 bytes)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  43.553 ns (0 allocations: 0 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  245.246 ns (0 allocations: 0 bytes)

This PR is motivated by TuringLang/MCMCChains.jl#263: often it is sufficient to evaluate the mean and the variance (as an uncertainty measure) of the Poisson binomial distribution of the statistic of interest, and therefor the quadratic complexity of the evaluation of the probability mass function slows down the computations for large number of trials (e.g., MCMC samples in this case) unnecessarily.

new{T}(p, pb)
mutable struct PoissonBinomial{T<:Real,P<:AbstractVector{T}} <: DiscreteUnivariateDistribution
p::P
pmf::Union{Nothing,Vector{T}} # lazy computation of the probability mass function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider using an empty vector here? That would allow the struct to be immutable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did. I wanted to try this approach first since arguably it is a bit "clearer" (nothing clearly indicates that it was not initialized whereas otherwise one has to check isempty) and the alternative would require to change poissonbinomial_pdf to an in-place function (which is clearly possible though). But I have no strong opinion here, I can compare it to the other approach.

@codecov-io
Copy link

codecov-io commented Feb 23, 2021

Codecov Report

Merging #1285 (5d303cd) into master (7120d50) will decrease coverage by 0.06%.
The diff coverage is 88.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1285      +/-   ##
==========================================
- Coverage   81.60%   81.54%   -0.07%     
==========================================
  Files         117      115       -2     
  Lines        6665     6636      -29     
==========================================
- Hits         5439     5411      -28     
+ Misses       1226     1225       -1     
Impacted Files Coverage Δ
src/univariate/discrete/poissonbinomial.jl 89.65% <88.23%> (-0.83%) ⬇️
src/univariates.jl 61.41% <0.00%> (-5.25%) ⬇️
src/common.jl 67.56% <0.00%> (-0.86%) ⬇️
src/multivariates.jl 70.66% <0.00%> (-0.77%) ⬇️
src/matrixvariates.jl 76.81% <0.00%> (-0.66%) ⬇️
src/quantilealgs.jl 80.95% <0.00%> (-0.23%) ⬇️
src/utils.jl 90.47% <0.00%> (-0.23%) ⬇️
src/Distributions.jl
src/samplers.jl
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7120d50...5d303cd. Read the comment docs.

@devmotion
Copy link
Member Author

I compared the implementation in this PR with an implementation of an immutable struct that uses pmf::Vector{T} which is empty in the uninitialized state and resized and then filled with an in-place variant of poissonbinomial_pdf if needed.

Without sizehint!

Benchmarks without sizehint!(pmf, length(p) + 1) in the constructor:

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  110.461 ns (1 allocation: 80 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  31.534 ns (1 allocation: 80 bytes)

julia> @btime mean(PoissonBinomial($p));
  121.792 ns (1 allocation: 80 bytes)

julia> @btime var(PoissonBinomial($p));
  217.599 ns (1 allocation: 80 bytes)

julia> @btime mode(PoissonBinomial($p));
  4.675 μs (4 allocations: 960 bytes)

julia> @btime median(PoissonBinomial($p));
  4.706 μs (5 allocations: 1.81 KiB)

julia> @btime modes(PoissonBinomial($p));
  5.808 μs (52 allocations: 6.20 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  152.058 ns (1 allocation: 80 bytes)

julia> @btime cf(PoissonBinomial($p), 0.1);
  360.134 ns (1 allocation: 80 bytes)

julia> @btime entropy(PoissonBinomial($p));
  5.155 μs (4 allocations: 960 bytes)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  42.884 ns (0 allocations: 0 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  246.104 ns (0 allocations: 0 bytes)

With sizehint!

With sizehint!(pmf, length(p) + 1) in the constructor I get:

julia> using Distributions, BenchmarkTools, Random

julia> Random.seed!(1234);

julia> p = rand(100);

julia> @btime PoissonBinomial($p);
  139.443 ns (2 allocations: 896 bytes)

julia> @btime PoissonBinomial($p; check_args=false);
  57.826 ns (2 allocations: 896 bytes)

julia> @btime mean(PoissonBinomial($p));
  146.683 ns (2 allocations: 896 bytes)

julia> @btime var(PoissonBinomial($p));
  240.535 ns (2 allocations: 896 bytes)

julia> @btime mode(PoissonBinomial($p));
  4.688 μs (4 allocations: 960 bytes)

julia> @btime median(PoissonBinomial($p));
  4.714 μs (5 allocations: 1.81 KiB)

julia> @btime modes(PoissonBinomial($p));
  5.825 μs (52 allocations: 6.20 KiB)

julia> @btime mgf(PoissonBinomial($p), 0.1);
  175.456 ns (2 allocations: 896 bytes)

julia> @btime cf(PoissonBinomial($p), 0.1);
  385.257 ns (2 allocations: 896 bytes)

julia> @btime entropy(PoissonBinomial($p));
  5.162 μs (4 allocations: 960 bytes)

julia> @btime mgf($(PoissonBinomial(p)), 0.1);
  42.805 ns (0 allocations: 0 bytes)

julia> @btime cf($(PoissonBinomial(p)), 0.1);
  244.248 ns (0 allocations: 0 bytes)

Summary

So in both cases the number of allocations is larger than in this PR and the functions that do not require the evaluation of the probability mass function are slower (which I tried to optimize in this PR). I did not expect these results - what's the explanation for them? Are the compiler optimizations for Union{Nothing,T} just "too" good?

@andreasnoack
Copy link
Member

Thanks for benchmarking this. The compiler optimizations for the small union here might be really good. Why would e.g. the mean function allocate in the latter case but not in the Union case?

@devmotion
Copy link
Member Author

Why would e.g. the mean function allocate in the latter case but not in the Union case?

Yes, IMO this was the most surprising result. I assume that we just see the allocation of the empty vector (80 bytes) in the benchmark without sizehint! (with sizehint! it increases to 896 bytes) whereas in the Union case no vectors have to be allocated. Therefore the number of allocations is reduced and it is faster.

@andreasnoack
Copy link
Member

This is puzzling to me

julia> @btime PoissonBinomial($p);
  90.943 ns (1 allocation: 32 bytes)
...
julia> @btime mean(PoissonBinomial($p));
  104.703 ns (0 allocations: 0 bytes)

I'd think that the latter would time construction as well as the mean calculation whereas the first would only time the construction. Why is the first allocating when the second one isn't?

@devmotion
Copy link
Member Author

Yes, this was surprising to me as well. One can get rid of the allocation by changing the returned value:

julia> f(p) = (PoissonBinomial(p); return)

julia> @btime f($p);
  88.468 ns (0 allocations: 0 bytes)

I just don't fully understand why one gets this additional allocation. I inspected the output of @code_llvm for both cases, and it seems the main/only difference apart from the returned value are the additional statements

; │││ @ /home/david/.julia/dev/Distributions/src/univariate/discrete/poissonbinomial.jl:32 within `_#64'
     %44 = bitcast %jl_value_t*** %ptls to i8*
     %45 = call noalias nonnull %jl_value_t* @jl_gc_pool_alloc(i8* %44, i32 1424, i32 32) #2
     %46 = bitcast %jl_value_t* %45 to %jl_value_t**
     %47 = getelementptr %jl_value_t*, %jl_value_t** %46, i64 -1
     store %jl_value_t* inttoptr (i64 140610215652240 to %jl_value_t*), %jl_value_t** %47
     %48 = bitcast %jl_value_t* %45 to %jl_value_t**
     store %jl_value_t* %13, %jl_value_t** %48, align 8
     %49 = bitcast %jl_value_t* %45 to i8*
     %50 = getelementptr inbounds i8, i8* %49, i64 8
     %51 = bitcast i8* %50 to %jl_value_t**
     store %jl_value_t* inttoptr (i64 140610869056864 to %jl_value_t*), %jl_value_t** %51, align 8

in the version with one allocation. So I assume in the non-allocating case (and also the mean etc. examples) the PoissonBinomial instance is never instantiated but the call of new{T,typeof(p)}(p, nothing) is removed by the compiler?

@devmotion
Copy link
Member Author

How should we move on with this PR?

@andreasnoack
Copy link
Member

I think we should just stick with your current implementation since the alternative one didn't show improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants