Description
I've observed an unexpected increase in binary size in response to a change in a crate that we use. The change only adds new public methods, which we never call, so all of the changed code is effectively dead code, yet it still results in a significant increase in our binary size. My guess is that the presence of this new code causes LLVM to make different inlining decisions, even though the new code isn't actually called anywhere.
This happens on 1.50.0. The increase (for a minimal binary included below) is from 932 bytes to 2164 bytes.
Switching from 1.50 to 1.51 (currently in beta) without the above change causes the same increase from 932 bytes to 2164 bytes.
I was going to mark this as a stable-to-beta regression, but TBH I think it's probably a pre-existing issue that just triggers in response to legitimate changes in library code. I expect that whatever changed between 1.50 and 1.51 is similar in nature to the library change described above.
I've tarred up a moderately minimal bit of code that reproduces this:
To reproduce, run the `./check-size` script contained within the tarball. You might need to `rustup target install thumbv6m-none-eabi` first.
For me, with current stable 1.50, this shows a change in binary size from 932 bytes to 2164 bytes:
Size prior to commit 77dace37908f281feb9432fc13874475d9dc0765:

```
-rwxr-xr-x 1 dml eng  932 Mar  4 16:39 a.bin
```

Size after commit 77dace37908f281feb9432fc13874475d9dc0765:

```
-rwxr-xr-x 1 dml eng 2164 Mar  4 16:39 b.bin
```
If I adjust the script to use 1.51, then I get 2164 bytes for both.
Looking at the disassembly of each binary, it seems that the larger binary includes `compiler_builtins::int::specialized_div_rem::u64_div_rem`, where the smaller binary doesn't. `u64_div_rem` is called from `__udivmoddi4`, which is called from `__aeabi_uldivmod`. These are also absent from the smaller binary, but present and called from `MicroSecond::cycles` / `Delay::delay` in the larger binary.
`Cargo.toml` sets `opt-level = "s"`. Similar results are observed with `opt-level = "z"`.
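For reference, the relevant profile settings look roughly like this (a sketch; only the `opt-level` and `lto` values are stated above, anything else in the real profile is unknown):

```toml
[profile.release]
opt-level = "s"  # optimize for size; "z" shows similar results
lto = true       # LTO is enabled, as noted below
```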
Given that LTO is enabled, I'd have expected dead code to be removed before inlining decisions were made, so I'm surprised that a change to code that is never called has this effect.
If there's anything we can do to help LLVM make more optimal decisions when optimizing for binary size, that'd be awesome, although I'm sure it's a pretty difficult problem.