
Conversation

@torfjelde (Member)

In more recent Julia versions `view` is zero-overhead, so maybe we should be using it all over the place?

There might also be some neat stuff we can do by accessing `parent` within the tilde-statements :)
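For context, a minimal sketch (not part of the PR) of the two building blocks mentioned above — `@view` aliasing the parent array without copying, and `parent` recovering the underlying array:

```julia
x = [1.0 2.0; 3.0 4.0]

# `@view` creates a lightweight SubArray that aliases the original data
# instead of allocating a copy.
col = @view x[:, 1]
col[1] = 10.0
@assert x[1, 1] == 10.0  # mutation is visible through the parent

# `parent` recovers the underlying array from the view.
@assert parent(col) === x
```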

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@torfjelde torfjelde mentioned this pull request Jul 9, 2021
@yebai (Member) left a comment


@torfjelde Maybe add some benchmarking with a concrete example? Also, a test would be helpful.

@torfjelde torfjelde mentioned this pull request Jul 13, 2021
@torfjelde (Member, Author)

> Maybe add some benchmarking with a concrete example? Also, a test would be helpful.

Benchmarks I'm with you on (but I see you found #248 which is exactly what I was going to refer to). But regarding tests, shouldn't this be covered abundantly by the existing test suite?

maybe_view(x::Expr) = :($(DynamicPPL.maybe_unwrap_view)(@view($x)))

maybe_unwrap_view(x) = x
maybe_unwrap_view(x::SubArray{<:Any,0}) = x[1]
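A minimal sketch (not from the PR itself; the standalone `maybe_unwrap_view` definition below just mirrors the snippet above) of why the unwrap step is needed: an all-scalar index under `@view` produces a zero-dimensional `SubArray` rather than a plain scalar, so tilde-statements would otherwise receive a wrapped value.

```julia
x = [1.0, 2.0, 3.0]

# A scalar index under @view yields a zero-dimensional SubArray...
v = @view x[1]
@assert v isa SubArray{<:Any,0}

# ...which maybe_unwrap_view collapses back to a scalar, while leaving
# ordinary views untouched.
maybe_unwrap_view(y) = y
maybe_unwrap_view(y::SubArray{<:Any,0}) = y[1]

@assert maybe_unwrap_view(v) === 1.0
@assert maybe_unwrap_view(@view x[1:2]) == [1.0, 2.0]
```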
Member


Maybe add a comment on why we need to unwrap the view here? Otherwise, happy to merge as-is.

@torfjelde (Member, Author)

torfjelde commented Jul 13, 2021

Benchmarks show a tiny performance improvement, though the models we're currently benchmarking are probably not ideal for demonstrating the difference. demo3 is probably the best indication, since it includes expressions such as x[:, i].

Current release

Setup

using BenchmarkTools, DynamicPPL, Distributions, Serialization
import DynamicPPLBenchmarks: time_model_def, make_suite, typed_code, weave_child

Models

demo1

@model function demo1(x)
    m ~ Normal()
    x ~ Normal(m, 1)

    return (m = m, x = x)
end

model_def = demo1;
data = 1.0;
@time model_def(data)();
1.235004 seconds (2.86 M allocations: 178.948 MiB, 5.86% gc time, 99.91% compilation time)
m = time_model_def(model_def, data);
0.000005 seconds (2 allocations: 48 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  661.000 ns …  13.963 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     721.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.182 μs ± 139.622 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▆██▆▅▄▄▃▂▁▁                                                  ▂
  ██████████████▇█▇▇▆▆▆▅▅▆▆▆▅▆▆▆▇▇▆▆█▇▇▆▇▆▆▆▅▄▅▃▄▄▂▂▅▅▄▅▅▆▄▅▄▅▆ █
  661 ns        Histogram: log(frequency) by time       1.78 μs <

 Memory estimate: 528 bytes, allocs estimate: 14.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  497.000 ns …  19.219 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     542.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   601.136 ns ± 386.032 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▇▆█▇▅▃▂▁▁                      ▁▂▂▂▂▂▁▁▁ ▁       ▁▁         ▂
  ████████████▇▆▆▆▅▅▃▁▃▅▃▅▆▆▇▇▇█▇███████████████████████▆▇▇▇▆▆▆ █
  497 ns        Histogram: log(frequency) by time       1.15 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo2

@model function demo2(y) 
    # Our prior belief about the probability of heads in a coin.
    p ~ Beta(1, 1)

    # The number of observations.
    N = length(y)
    for n in 1:N
        # Heads or tails of a coin are drawn from a Bernoulli distribution.
        y[n] ~ Bernoulli(p)
    end
end

model_def = demo2;
data = rand(0:1, 10);
@time model_def(data)();
0.534269 seconds (894.98 k allocations: 53.237 MiB, 3.11% gc time, 99.91% compilation time)
m = time_model_def(model_def, data);
0.000003 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  1.827 μs …  11.291 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.948 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.395 μs ± 112.885 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▆▅▃▃▂▁           ▁▂▂▂▁▁▂▂▂▂▂▂▂▂▁                         ▂
  ███████████▇▆▅▅▆▄▅▆█████████████████████▇▇▆▆▆▆▆▅▅▅▅▅▆▂▄▅▄▅▅ █
  1.83 μs      Histogram: log(frequency) by time      4.63 μs <

 Memory estimate: 1.70 KiB, allocs estimate: 48.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  837.000 ns …  24.108 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     907.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.207 μs ± 606.730 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅▄█                                                          
  ▅▆███▃▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂▁▁▂▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▄▄▄▄▄▄▅▅▄▄▃▃▃▂▂ ▃
  837 ns           Histogram: frequency by time         1.79 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo3

@model function demo3(x)
    D, N = size(x)

    # Draw the parameters for cluster 1.
    μ1 ~ Normal()

    # Draw the parameters for cluster 2.
    μ2 ~ Normal()

    μ = [μ1, μ2]

    # Comment out this line if you instead want to draw the weights.
    w = [0.5, 0.5]

    # Draw assignments for each datum and generate it from a multivariate normal.
    k = Vector{Int}(undef, N)
    for i in 1:N
        k[i] ~ Categorical(w)
        x[:,i] ~ MvNormal([μ[k[i]], μ[k[i]]], 1.)
    end
    return k
end

model_def = demo3

# Construct 30 data points for each cluster.
N = 30

# Parameters for each cluster, we assume that each cluster is Gaussian distributed in the example.
μs = [-3.5, 0.0]

# Construct the data points.
data = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2);
@time model_def(data)();
0.971409 seconds (1.50 M allocations: 83.990 MiB, 1.54% gc time, 99.95% compilation time)
m = time_model_def(model_def, data);
0.000004 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  58.644 μs …  10.989 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     64.808 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   84.832 μs ± 251.726 μs  ┊ GC (mean ± σ):  9.77% ± 3.68%

  ▄█▅▅▄                                                         
  ██████▇█▃▂▂▂▂▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▃▂▂▂▂▂▂▂▁▁▁▂▁▁ ▂
  58.6 μs         Histogram: frequency by time          141 μs <

 Memory estimate: 60.56 KiB, allocs estimate: 1229.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  41.223 μs …  5.131 ms  ┊ GC (min … max): 0.00% … 98.79%
 Time  (median):     43.693 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   48.518 μs ± 92.478 μs  ┊ GC (mean ± σ):  4.16% ±  2.20%

  █▆▆▅▅▅▅▄▃▄▄▄▃▂▁  ▁▁                                         ▂
  ████████████████████▇▇▆▇▇▇▆▇▇▆▆▇▇▅▆▆▆▇▆▆▇▆▇▇▇▆▆▆▇▆▆▆▅▇▇▇█▇▇ █
  41.2 μs      Histogram: log(frequency) by time      81.9 μs <

 Memory estimate: 28.88 KiB, allocs estimate: 303.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

This PR

Setup

using BenchmarkTools, DynamicPPL, Distributions, Serialization
import DynamicPPLBenchmarks: time_model_def, make_suite, typed_code, weave_child

Models

demo1

@model function demo1(x)
    m ~ Normal()
    x ~ Normal(m, 1)

    return (m = m, x = x)
end

model_def = demo1;
data = 1.0;
@time model_def(data)();
1.277506 seconds (2.86 M allocations: 178.945 MiB, 4.26% gc time, 99.91% compilation time)
m = time_model_def(model_def, data);
0.000003 seconds (2 allocations: 48 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  682.000 ns …  14.221 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     722.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.193 μs ± 142.205 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▇▇▅▄▃▃▂▁▁▁                                                  ▂
  ██████████████▇▆▆▆▆▅▆▅▄▆▅▅▅▅▆▆▆▅▅▅▆▆▆▅▆▅▆▄▅▄▆▅▅▄▃▅▄▅▂▄▅▄▅▄▅▆▅ █
  682 ns        Histogram: log(frequency) by time        1.7 μs <

 Memory estimate: 528 bytes, allocs estimate: 14.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  497.000 ns …  17.007 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     550.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   578.488 ns ± 367.308 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▇█▅                                                        
  ▁▃▅█████▅▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  497 ns           Histogram: frequency by time          180 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo2

@model function demo2(y) 
    # Our prior belief about the probability of heads in a coin.
    p ~ Beta(1, 1)

    # The number of observations.
    N = length(y)
    for n in 1:N
        # Heads or tails of a coin are drawn from a Bernoulli distribution.
        y[n] ~ Bernoulli(p)
    end
end

model_def = demo2;
data = rand(0:1, 10);
@time model_def(data)();
0.482644 seconds (970.80 k allocations: 57.795 MiB, 2.40% gc time, 99.90% compilation time)
m = time_model_def(model_def, data);
0.000004 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  1.750 μs …  11.173 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.895 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.176 μs ± 111.710 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄███▇▆▅▅▃▃▂▂▁▁          ▁▁▂▂▁▁▁▁▁    ▁                      ▂
  ████████████████▇▇▆▆▅▆▅▆████████████▇██▇▇▇▆▆▆▆▅▇▅▅▆▆▆▅▄▅▃▅▅ █
  1.75 μs      Histogram: log(frequency) by time       420 μs <

 Memory estimate: 1.70 KiB, allocs estimate: 48.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  763.000 ns …  23.271 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     782.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   809.620 ns ± 273.985 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▆▅▄▃▃▁                                                      ▂
  █████████▆▇▇▆▅▄▃▄▄▅▄▄▄▄▄▃▁▃▄▃▃▄▃▄▃▃▁▁▃▃▅▄▃▅▅▆▆▅▆▆▆▇▆▇▆▆▆▆▇▆▆▆ █
  763 ns        Histogram: log(frequency) by time       1.39 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

demo3

@model function demo3(x)
    D, N = size(x)

    # Draw the parameters for cluster 1.
    μ1 ~ Normal()

    # Draw the parameters for cluster 2.
    μ2 ~ Normal()

    μ = [μ1, μ2]

    # Comment out this line if you instead want to draw the weights.
    w = [0.5, 0.5]

    # Draw assignments for each datum and generate it from a multivariate normal.
    k = Vector{Int}(undef, N)
    for i in 1:N
        k[i] ~ Categorical(w)
        x[:,i] ~ MvNormal([μ[k[i]], μ[k[i]]], 1.)
    end
    return k
end

model_def = demo3

# Construct 30 data points for each cluster.
N = 30

# Parameters for each cluster, we assume that each cluster is Gaussian distributed in the example.
μs = [-3.5, 0.0]

# Construct the data points.
data = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2);
@time model_def(data)();
1.016511 seconds (1.70 M allocations: 96.028 MiB, 2.22% gc time, 99.96% compilation time)
m = time_model_def(model_def, data);
0.000004 seconds (1 allocation: 32 bytes)
suite = make_suite(m);
results = run(suite);
results["evaluation_untyped"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  56.578 μs …  10.830 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     58.916 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   70.547 μs ± 227.694 μs  ┊ GC (mean ± σ):  9.71% ± 3.41%

  ██▇▆▅▄▄▃▃▃▂▂▁▁▁▁                                             ▂
  ██████████████████▇▇█▆▇▆▆▇▄▅▆▆▇▄▆▆▅▇▇▇▆▇▇▆▆▇▇▆▆▆▆▅▆▅▆▅▅▄▃▁▄▅ █
  56.6 μs       Histogram: log(frequency) by time       131 μs <

 Memory estimate: 52.12 KiB, allocs estimate: 1169.
results["evaluation_typed"]
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  38.024 μs …  5.795 ms  ┊ GC (min … max): 0.00% … 99.03%
 Time  (median):     39.284 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   42.942 μs ± 81.470 μs  ┊ GC (mean ± σ):  3.15% ±  1.71%

  ██▆▄▅▃▄▅▅▅▄▃▂▁                                              ▂
  ███████████████████▇▇▇▆▆▆▇▆▆▆▆▆▆▆▆▆▅▆▅▆▄▅▆▅▇▆▆▇▆▆▅▆▆▆▆▆▆▆▆▇ █
  38 μs        Histogram: log(frequency) by time      70.4 μs <

 Memory estimate: 17.62 KiB, allocs estimate: 183.
if WEAVE_ARGS[:include_typed_code]
    typed = typed_code(m)
end

@torfjelde (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jul 14, 2021
@bors (Contributor)

bors bot commented Jul 14, 2021

try

Build failed:

@torfjelde (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jul 14, 2021
@yebai (Member)

yebai commented Jul 15, 2021

bors r+

@bors (Contributor)

bors bot commented Jul 15, 2021

👎 Rejected by code reviews

@yebai (Member)

yebai commented Jul 15, 2021

bors r+

bors bot pushed a commit that referenced this pull request Jul 15, 2021
In more recent Julia versions `view` is zero-overhead, so maybe we should be using it all over the place?

There might also be some neat stuff we can do by accessing `parent` within the tilde-statements :)

Co-authored-by: Hong Ge <[email protected]>
@torfjelde (Member, Author)

It needs a version-bump, but we can do this in a direct commit to master after bors is done 👍

@bors (Contributor)

bors bot commented Jul 15, 2021

Timed out.

@yebai yebai merged commit bdbaf32 into master Jul 16, 2021
@yebai yebai deleted the tor/views branch July 16, 2021 12:33
@yebai (Member)

yebai commented Jul 16, 2021

Merging this now as the previous bors test was successful.
