
Conversation

@torfjelde (Member)

In more recent Julia versions `view` is zero-overhead, so maybe we should be using it all over the place?

There might also be some neat stuff we can do by accessing `parent` within the tilde-statements :)
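For context, a minimal sketch (not part of the PR) of the two building blocks mentioned above — `@view` aliasing the parent array without copying, and `parent` recovering the underlying array:

```julia
x = [1.0 2.0; 3.0 4.0]

# `@view` creates a lightweight SubArray that aliases the original data
# instead of allocating a copy.
col = @view x[:, 1]
col[1] = 10.0
@assert x[1, 1] == 10.0  # mutation is visible through the parent

# `parent` recovers the underlying array from the view.
@assert parent(col) === x
```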

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@torfjelde torfjelde mentioned this pull request Jul 9, 2021
@yebai (Member) left a comment


@torfjelde Maybe add some benchmarking with a concrete example? Also, a test would be helpful.

@torfjelde torfjelde mentioned this pull request Jul 13, 2021
@torfjelde (Member, Author)

> Maybe add some benchmarking with a concrete example? Also, a test would be helpful.

Benchmarks I'm with you on (but I see you found #248 which is exactly what I was going to refer to). But regarding tests, shouldn't this be covered abundantly by the existing test suite?

maybe_view(x::Expr) = :($(DynamicPPL.maybe_unwrap_view)(@view($x)))

maybe_unwrap_view(x) = x
maybe_unwrap_view(x::SubArray{<:Any,0}) = x[1]
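A minimal sketch (not from the PR itself; the standalone `maybe_unwrap_view` definition below just mirrors the snippet above) of why the unwrap step is needed: an all-scalar index under `@view` produces a zero-dimensional `SubArray` rather than a plain scalar, so tilde-statements would otherwise receive a wrapped value.

```julia
x = [1.0, 2.0, 3.0]

# A scalar index under @view yields a zero-dimensional SubArray...
v = @view x[1]
@assert v isa SubArray{<:Any,0}

# ...which maybe_unwrap_view collapses back to a scalar, while leaving
# ordinary views untouched.
maybe_unwrap_view(y) = y
maybe_unwrap_view(y::SubArray{<:Any,0}) = y[1]

@assert maybe_unwrap_view(v) === 1.0
@assert maybe_unwrap_view(@view x[1:2]) == [1.0, 2.0]
```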
Member


Maybe add a comment on why we need to unwrap the view here? Otherwise, happy to merge as-is.

@torfjelde (Member, Author)

torfjelde commented Jul 13, 2021

Benchmarks show a tiny performance improvement, though the models we're currently benchmarking are probably not ideal for demonstrating the difference. demo3 is probably the best indication, since it includes expressions such as x[:, i].

Current release

Setup

using BenchmarkTools, DynamicPPL, Distributions, Serialization
import DynamicPPLBenchmarks: time_model_def, make_suite, typed_code, weave_child

Models

demo1

@model function demo1(x)
    m ~ Normal()
    x ~ Normal(m, 1)

    return (m = m, x = x)
end

model_def = demo1;
data = 1.0;
@time model_def(data)();
1.235004 seconds (2.86 M allocations: 178.948 MiB, 5.86% gc time, 99.91% compilation time)
m = time_model_def(model_def, data);
0.000005 seconds (2 allocations: 48 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  661.000 ns …  13.963 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     721.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.182 μs ± 139.622 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▆██▆▅▄▄▃▂▁▁                                                  ▂
  ██████████████▇█▇▇▆▆▆▅▅▆▆▆▅▆▆▆▇▇▆▆█▇▇▆▇▆▆▆▅▄▅▃▄▄▂▂▅▅▄▅▅▆▄▅▄▅▆ █
  661 ns        Histogram: log(frequency) by time       1.78 μs <

 Memory estimate: 528 bytes, allocs estimate: 14.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  497.000 ns …  19.219 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     542.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   601.136 ns ± 386.032 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▇▆█▇▅▃▂▁▁                      ▁▂▂▂▂▂▁▁▁ ▁       ▁▁         ▂
  ████████████▇▆▆▆▅▅▃▁▃▅▃▅▆▆▇▇▇█▇███████████████████████▆▇▇▇▆▆▆ █
  497 ns        Histogram: log(frequency) by time       1.15 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo2

@model function demo2(y) 
    # Our prior belief about the probability of heads in a coin.
    p ~ Beta(1, 1)

    # The number of observations.
    N = length(y)
    for n in 1:N
        # Heads or tails of a coin are drawn from a Bernoulli distribution.
        y[n] ~ Bernoulli(p)
    end
end

model_def = demo2;
data = rand(0:1, 10);
@time model_def(data)();
0.534269 seconds (894.98 k allocations: 53.237 MiB, 3.11% gc time, 99.91% compilation time)
m = time_model_def(model_def, data);
0.000003 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  1.827 μs …  11.291 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.948 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.395 μs ± 112.885 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▆▅▃▃▂▁           ▁▂▂▂▁▁▂▂▂▂▂▂▂▂▁                         ▂
  ███████████▇▆▅▅▆▄▅▆█████████████████████▇▇▆▆▆▆▆▅▅▅▅▅▆▂▄▅▄▅▅ █
  1.83 μs      Histogram: log(frequency) by time      4.63 μs <

 Memory estimate: 1.70 KiB, allocs estimate: 48.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  837.000 ns …  24.108 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     907.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.207 μs ± 606.730 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅▄█                                                          
  ▅▆███▃▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂▁▁▂▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▄▄▄▄▄▄▅▅▄▄▃▃▃▂▂ ▃
  837 ns           Histogram: frequency by time         1.79 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo3

@model function demo3(x)
    D, N = size(x)

    # Draw the parameters for cluster 1.
    μ1 ~ Normal()

    # Draw the parameters for cluster 2.
    μ2 ~ Normal()

    μ = [μ1, μ2]

    # Comment out this line if you instead want to draw the weights.
    w = [0.5, 0.5]

    # Draw assignments for each datum and generate it from a multivariate normal.
    k = Vector{Int}(undef, N)
    for i in 1:N
        k[i] ~ Categorical(w)
        x[:,i] ~ MvNormal([μ[k[i]], μ[k[i]]], 1.)
    end
    return k
end

model_def = demo3

# Construct 30 data points for each cluster.
N = 30

# Parameters for each cluster, we assume that each cluster is Gaussian distributed in the example.
μs = [-3.5, 0.0]

# Construct the data points.
data = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2);
@time model_def(data)();
0.971409 seconds (1.50 M allocations: 83.990 MiB, 1.54% gc time, 99.95% compilation time)
m = time_model_def(model_def, data);
0.000004 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  58.644 μs …  10.989 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     64.808 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   84.832 μs ± 251.726 μs  ┊ GC (mean ± σ):  9.77% ± 3.68%

  ▄█▅▅▄                                                         
  ██████▇█▃▂▂▂▂▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▃▂▂▂▂▂▂▂▁▁▁▂▁▁ ▂
  58.6 μs         Histogram: frequency by time          141 μs <

 Memory estimate: 60.56 KiB, allocs estimate: 1229.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  41.223 μs …  5.131 ms  ┊ GC (min … max): 0.00% … 98.79%
 Time  (median):     43.693 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   48.518 μs ± 92.478 μs  ┊ GC (mean ± σ):  4.16% ±  2.20%

  █▆▆▅▅▅▅▄▃▄▄▄▃▂▁  ▁▁                                         ▂
  ████████████████████▇▇▆▇▇▇▆▇▇▆▆▇▇▅▆▆▆▇▆▆▇▆▇▇▇▆▆▆▇▆▆▆▅▇▇▇█▇▇ █
  41.2 μs      Histogram: log(frequency) by time      81.9 μs <

 Memory estimate: 28.88 KiB, allocs estimate: 303.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

This PR

Setup

using BenchmarkTools, DynamicPPL, Distributions, Serialization
import DynamicPPLBenchmarks: time_model_def, make_suite, typed_code, weave_child

Models

demo1

@model function demo1(x)
    m ~ Normal()
    x ~ Normal(m, 1)

    return (m = m, x = x)
end

model_def = demo1;
data = 1.0;
@time model_def(data)();
1.277506 seconds (2.86 M allocations: 178.945 MiB, 4.26% gc time, 99.91% compilation time)
m = time_model_def(model_def, data);
0.000003 seconds (2 allocations: 48 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  682.000 ns …  14.221 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     722.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.193 μs ± 142.205 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▇▇▅▄▃▃▂▁▁▁                                                  ▂
  ██████████████▇▆▆▆▆▅▆▅▄▆▅▅▅▅▆▆▆▅▅▅▆▆▆▅▆▅▆▄▅▄▆▅▅▄▃▅▄▅▂▄▅▄▅▄▅▆▅ █
  682 ns        Histogram: log(frequency) by time        1.7 μs <

 Memory estimate: 528 bytes, allocs estimate: 14.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  497.000 ns …  17.007 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     550.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   578.488 ns ± 367.308 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▇█▅                                                        
  ▁▃▅█████▅▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  497 ns           Histogram: frequency by time          180 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo2

@model function demo2(y) 
    # Our prior belief about the probability of heads in a coin.
    p ~ Beta(1, 1)

    # The number of observations.
    N = length(y)
    for n in 1:N
        # Heads or tails of a coin are drawn from a Bernoulli distribution.
        y[n] ~ Bernoulli(p)
    end
end

model_def = demo2;
data = rand(0:1, 10);
@time model_def(data)();
0.482644 seconds (970.80 k allocations: 57.795 MiB, 2.40% gc time, 99.90% compilation time)
m = time_model_def(model_def, data);
0.000004 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  1.750 μs …  11.173 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.895 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.176 μs ± 111.710 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄███▇▆▅▅▃▃▂▂▁▁          ▁▁▂▂▁▁▁▁▁    ▁                      ▂
  ████████████████▇▇▆▆▅▆▅▆████████████▇██▇▇▇▆▆▆▆▅▇▅▅▆▆▆▅▄▅▃▅▅ █
  1.75 μs      Histogram: log(frequency) by time       420 μs <

 Memory estimate: 1.70 KiB, allocs estimate: 48.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  763.000 ns …  23.271 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     782.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   809.620 ns ± 273.985 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▆▅▄▃▃▁                                                      ▂
  █████████▆▇▇▆▅▄▃▄▄▅▄▄▄▄▄▃▁▃▄▃▃▄▃▄▃▃▁▁▃▃▅▄▃▅▅▆▆▅▆▆▆▇▆▇▆▆▆▆▇▆▆▆ █
  763 ns        Histogram: log(frequency) by time       1.39 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo3

@model function demo3(x)
    D, N = size(x)

    # Draw the parameters for cluster 1.
    μ1 ~ Normal()

    # Draw the parameters for cluster 2.
    μ2 ~ Normal()

    μ = [μ1, μ2]

    # Comment out this line if you instead want to draw the weights.
    w = [0.5, 0.5]

    # Draw assignments for each datum and generate it from a multivariate normal.
    k = Vector{Int}(undef, N)
    for i in 1:N
        k[i] ~ Categorical(w)
        x[:,i] ~ MvNormal([μ[k[i]], μ[k[i]]], 1.)
    end
    return k
end

model_def = demo3

# Construct 30 data points for each cluster.
N = 30

# Parameters for each cluster, we assume that each cluster is Gaussian distributed in the example.
μs = [-3.5, 0.0]

# Construct the data points.
data = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2);
@time model_def(data)();
1.016511 seconds (1.70 M allocations: 96.028 MiB, 2.22% gc time, 99.96% compilation time)
m = time_model_def(model_def, data);
0.000004 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  56.578 μs …  10.830 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     58.916 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   70.547 μs ± 227.694 μs  ┊ GC (mean ± σ):  9.71% ± 3.41%

  ██▇▆▅▄▄▃▃▃▂▂▁▁▁▁                                             ▂
  ██████████████████▇▇█▆▇▆▆▇▄▅▆▆▇▄▆▆▅▇▇▇▆▇▇▆▆▇▇▆▆▆▆▅▆▅▆▅▅▄▃▁▄▅ █
  56.6 μs       Histogram: log(frequency) by time       131 μs <

 Memory estimate: 52.12 KiB, allocs estimate: 1169.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  38.024 μs …  5.795 ms  ┊ GC (min … max): 0.00% … 99.03%
 Time  (median):     39.284 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   42.942 μs ± 81.470 μs  ┊ GC (mean ± σ):  3.15% ±  1.71%

  ██▆▄▅▃▄▅▅▅▄▃▂▁                                              ▂
  ███████████████████▇▇▇▆▆▆▇▆▆▆▆▆▆▆▆▆▅▆▅▆▄▅▆▅▇▆▆▇▆▆▅▆▆▆▆▆▆▆▆▇ █
  38 μs        Histogram: log(frequency) by time      70.4 μs <

 Memory estimate: 17.62 KiB, allocs estimate: 183.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

@torfjelde (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jul 14, 2021
@bors (Contributor)

bors bot commented Jul 14, 2021

try

Build failed:

@torfjelde (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jul 14, 2021
@yebai (Member)

yebai commented Jul 15, 2021

bors r+

@bors (Contributor)

bors bot commented Jul 15, 2021

👎 Rejected by code reviews

@yebai (Member)

yebai commented Jul 15, 2021

bors r+

bors bot pushed a commit that referenced this pull request Jul 15, 2021
In more recent Julia versions `view` is zero-overhead, so maybe we should be using it all over the place?

There might also be some neat stuff we can do by accessing `parent` within the tilde-statements :)

Co-authored-by: Hong Ge <[email protected]>
@torfjelde (Member, Author)

It needs a version-bump, but we can do this in a direct commit to master after bors is done 👍

@bors (Contributor)

bors bot commented Jul 15, 2021

Timed out.

@yebai yebai merged commit bdbaf32 into master Jul 16, 2021
@yebai yebai deleted the tor/views branch July 16, 2021 12:33
@yebai (Member)

yebai commented Jul 16, 2021

Merging this now as the previous bors test was successful.
