Conversation

@torfjelde
Member

Changes to DPPL can often have quite significant effects on the compilation time and performance of both DPPL itself and downstream packages. It's also sometimes difficult to discover these performance regressions.

E.g. in #221 we made a small simplification to the compiler, and it ended up taking quite a while to figure out what was going wrong; we had to test several models to identify the issue.

So, this is a WIP PR for including a small set of models which we can `weave` into a document where we can look at the changes. It's unclear to me whether this should go in DPPL itself or in a separate package. I found it useful myself and figured I'd put it here so we can maybe start building up some "standard" benchmarks to run for testing purposes. IMO we don't need many of them, as we will add more as we go along.

For each model, the following will be included in the document:

  • Benchmarked evaluation of the model on untyped and typed `VarInfo` (see the sketch after this list).
  • Timing of the compilation of the model with the typed `VarInfo`.
  • Lowered code for the model.
    • If `:prefix` is provided to `weave`, the string representation of `code_typed` for the evaluation of the model will be saved to a file `$(prefix)_(model.name)`. Furthermore, if `:prefix_old` is provided, pointing to the `:prefix` used for a previous run (likely using a different version of DPPL), we will `diff` the `code_typed` for the two models by loading the saved files.
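
To make the first two bullets concrete, here is a minimal sketch of the kind of measurements meant above, using BenchmarkTools and a toy model. The model, variable names, and exact `VarInfo` calls are illustrative assumptions (call signatures differ between DynamicPPL versions) and may not match what ends up in `benchmarks/`.

```julia
# Illustrative sketch only; exact constructors and call signatures depend on
# the DynamicPPL version.
using DynamicPPL, Distributions, BenchmarkTools

# A toy model, purely for demonstration.
@model function demo(x)
    m ~ Normal()
    x ~ Normal(m, 1)
end

model = demo(1.0)

# Evaluation with an untyped VarInfo (generic, untyped storage).
vi_untyped = VarInfo()
model(vi_untyped)               # first call populates the VarInfo
@benchmark $model($vi_untyped)

# Evaluation with a typed VarInfo (concretely typed storage).
vi_typed = VarInfo(model)
@benchmark $model($vi_typed)

# Rough proxy for compilation time: in a fresh session, constructing the
# typed VarInfo forces the model evaluator to be compiled.
@time VarInfo(model)
```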

@devmotion
Member

I assume this can be very helpful 👍

Maybe put it in `/benchmarks/`? Since it's for benchmarking DynamicPPL specifically, it seems fine to me to put it in this repo.

@torfjelde
Member Author

> Maybe put it in `/benchmarks/`? Since it's for benchmarking DynamicPPL specifically, it seems fine to me to put it in this repo.

Good point 👍

I'll also add a README.md to the folder specifying how it can be used, roughly along the lines of the sketch below.
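
For illustration, a run of the benchmarks might look something like this. The file name is hypothetical, and `:prefix`/`:prefix_old` are assumed to be passed through Weave's `args` keyword (available inside the document as `WEAVE_ARGS`).

```julia
using Weave

# First run, e.g. on the current master of DPPL; saves the code_typed output
# under the "master" prefix.
weave("benchmarks.jmd"; doctype = "github", args = Dict(:prefix => "master"))

# Second run on the branch under test; diffs its code_typed output against
# the files saved by the previous run.
weave("benchmarks.jmd"; doctype = "github",
      args = Dict(:prefix => "branch", :prefix_old => "master"))
```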

@torfjelde
Member Author

@devmotion Could you maybe have another look at this? I found it extremely useful when evaluating whether #269 and #271 would cause issues, so it would be nice to just get it merged. Then we can develop it further once it's there, and potentially separate it out into its own project if it reaches sufficient complexity/size.

@torfjelde torfjelde marked this pull request as ready for review July 8, 2021 17:06
@torfjelde torfjelde changed the title from [WIP] Benchmarking to Benchmarking Jul 10, 2021
@yebai
Member

yebai commented Jul 13, 2021

bors r+

bors bot pushed a commit that referenced this pull request Jul 13, 2021
@bors
Contributor

bors bot commented Jul 13, 2021

Build failed.

@yebai
Member

yebai commented Jul 13, 2021

@torfjelde Some tests are failing in loglikelihoods.jl. Can you take a look?

Related #268

@torfjelde
Member Author

torfjelde commented Jul 13, 2021

> @torfjelde Some tests are failing in loglikelihoods.jl. Can you take a look?

Seems like there have been some downstream changes that are causing these failures. Having a look.

> Related #268

Can you elaborate on why #268 is related? Not quite seeing the connection 😕

EDIT: Did you mean to point to #272?

@torfjelde
Member Author

Also, this PR makes no changes to DynamicPPL's functionality; it only touches `benchmarks/`, so it shouldn't be held back by failing tests that it's not responsible for.

Still need to figure out what is breaking the tests though!

@yebai
Member

yebai commented Jul 13, 2021

> Can you elaborate on why #268 is related? Not quite seeing the connection 😕

Sorry for the confusion - I meant that we are experiencing similar failing tests there.

@yebai yebai merged commit 4de6f54 into master Jul 13, 2021
@yebai yebai deleted the tor/benchmarks branch July 13, 2021 21:51