Skip to content

Update after Binaryen change to inline all single-use functions #13744

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 30, 2021
Merged

Conversation

kripken
Copy link
Member

@kripken kripken commented Mar 23, 2021

Investigating the tuning of the binaryen inliner with @manoskouk , it looks like always inlining single-use functions is beneficial. In that case we can delete the function after inlining, so the code is "moved". Binaryen's default is to only inline fairly small functions in that case, but the heuristic is too conservative it seems. This PR shows what happens if we always inline such functions.

Positive findings:

  • As the updated test results here show, code size shrinks by a few % on several of our reference code size tests. And the metadce tests show which function names were inlined and then removed. (If we decide to go with this, this PR would be landed with these text expectation changes after the binaryen roll.)
  • The emscripten benchmark suite shows a 3.5% code size reduction. One big 30% dominates that; ignoring it using the median, the improvement is 2%, and that remains the same when looking just at the large/real-world benchmarks. Every single benchmark shrinks with this, and none regress.
  • Speed improves on a few benchmarks. That is mainly on the microbenchmarks, though, so even while some of those speedups are large (one by 40%), I don't think it's very realistic. The real-world codebases in the benchmark suite do not change much in speed (but they do shrink by 2% as mentioned before).

Negative findings:

  • One reason we never did this is the risk of increasing compile times on VMs, due to creating larger functions (and regalloc etc. can be nonlinear). I think that risk is much lower today since we have great baseline compilers pretty much in all wasm VMs. However, disabling the baseline compiler in v8 for example I can see some startup regressions on the benchmark suite, like 10% on base64. So there may be a downside here of slower times to reach the optimizing tier. That's just in a few of the microbenchmarks, but it is a risk.
  • Even if we can delete the function after inlining we have no guarantee of a total reduction in code size. That is the second reason we never did this. It is hard to construct counterexamples manually, but the fuzzer easily does so when instrumented for that. Investigating several of them, they seem like unlikely scenarios. For example, in one case if we did not inline a function then the target of the inlining would be identical to another function and duplicate function elimination would end up helping a lot more. My impression is that these possible regressions are unlikely (especially since every single benchmark in the benchmark suite shrinks), but this is a risk.

Overall I think we should make this change. The positives are very strong, better than I expected. It would also let us deprecate and eventually remove a flag from Binaryen (unless we decide it's worth keeping?)

Thoughts?

cc @dschuff @sbc100 @aheejin @tlively @aardappel @juj @manoskouk @jakobkummerow

@sbc100
Copy link
Collaborator

sbc100 commented Mar 23, 2021

Nice!

@aardappel
Copy link
Collaborator

Awesome!

I think I argued for more aggressive inlining by default in the past :)

It is an interesting question who is responsible for things like start-up times, and whether the tools side should be in the business of protecting engines against their own slowness in any current implementation. Engines may eventually have to deal with these as other toolchains produce these, and they're a moving target.

I'd definitely say that focusing on speed (which is also abstract to some extent) & size is more important.

@aheejin
Copy link
Member

aheejin commented Mar 23, 2021

Nice! I agree it's better to do this. By the way, how can we be sure if a function is single use, in the presence of indirect calls? Do we exclude calls that can be called indirectly from multiple places?

@dschuff
Copy link
Member

dschuff commented Mar 23, 2021

This looks nice, and I agree with what folks have said here so far. Regarding indirect calls, I think the answer is that if a function is address-taken (i.e. if it's in the table) then it can't be removed entirely, so it probably shouldn't be inlined into a direct caller (since that would be a duplication). Or maybe you could say that a direct caller is one use, and presence in the table is another use, so if we had another heuristic that said inlining/duplicating was good even for some multiple-use functions, it could still apply there.

@kripken
Copy link
Member Author

kripken commented Mar 24, 2021

Yes, the pass considers an appearance in the table as a global use (which can't be inlined into, and prevents inlining as a single-use function).

@juj
Copy link
Collaborator

juj commented Mar 28, 2021

Agreed that it would be very beneficial to have this on by default. I would recommend having it as a flag that can be disabled if explicitly desired.

@manoskouk
Copy link

I strongly support this. The current V8 policy is to allow more compile time in the optimizing compiler, and this will help us produce better code.

kripken added a commit to WebAssembly/binaryen that referenced this pull request Mar 29, 2021
This implements emscripten-core/emscripten#13744

Inlining functions with a single use allows us to remove the function afterward.
That looks highly beneficial, shrinking every single benchmark in emscripten's
benchmark suite, by an average of 2% on the macrobenchmarks and 3.5% on
all of them. Speed also improves, although mostly on the microbenchmarks so
that might be less realistic.

There may be a slight downside to startup time due to emitting larger functions,
but given the baseline compilers in VMs these days it seems worth it, as the
delay would be just to get to the upper tier. On the benchmark suite the risk
seems low.

See more details in the PR above.
@kripken
Copy link
Member Author

kripken commented Mar 29, 2021

I would recommend having it as a flag that can be disabled if explicitly desired.

That can currently be done with -s BINARYEN_EXTRA_PASSES=--one-caller-inline-max-function-size=1, that is, passing a flag through to binaryen. I think that's good enough as this is not likely to be modified by many users. (But we can consider an emscripten flag perhaps if the need arises.)

kripken added a commit that referenced this pull request Mar 29, 2021
To allow the binaryen inlining improvement to roll in.

See WebAssembly/binaryen#3730 and #13744
kripken added a commit that referenced this pull request Mar 29, 2021
… traces (#13792)

To allow the binaryen inlining improvement to roll in.

See WebAssembly/binaryen#3730 and #13744
@kripken kripken changed the title Effects of inlining all single-use functions Update after Binaryen change to inline all single-use functions Mar 30, 2021
@kripken kripken marked this pull request as ready for review March 30, 2021 00:32
@kripken
Copy link
Member Author

kripken commented Mar 30, 2021

The binaryen change landed, and this should be ready to go.

It must wait for the binaryen change to roll however.

@kripken kripken requested a review from sbc100 March 30, 2021 00:33
@kripken kripken merged commit 90d9863 into main Mar 30, 2021
@kripken kripken deleted the inline branch March 30, 2021 05:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants