
Arm thumbv6m binary size increase with dead code change (or from 1.50 to 1.51) #82748

Open
@davidlattimore

Description


I've observed an unexpected increase in binary size in response to a change in a crate that we use. The change only adds new public methods, which we don't call, so all of the changed code is effectively dead. Yet it still results in a significant increase in our binary size. My guess is that the presence of this new code causes LLVM to make different inlining decisions, even though the new code isn't called anywhere.
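To make the scenario concrete, here is a hypothetical sketch of the shape of change described: a library type gains a new public method that the application never calls. Timer, cycles_for_us, cycles_for_ms and app_main are invented names for illustration only; they are not the real crate's API. With LTO, cycles_for_ms would be expected to disappear from the final binary entirely. The report is that changes of this shape nonetheless altered the binary size.

```rust
/// Hypothetical library type (not the real crate's API).
pub struct Timer {
    /// Core clock frequency in Hz (assumed field for this sketch).
    pub sysclk_hz: u32,
}

impl Timer {
    /// Method that existed before the change and that the application calls.
    /// Returns the number of clock cycles in `us` microseconds, using a
    /// 64-bit intermediate so the multiplication cannot overflow.
    pub fn cycles_for_us(&self, us: u32) -> u32 {
        ((us as u64 * self.sysclk_hz as u64) / 1_000_000) as u32
    }

    /// Method added by the (hypothetical) change. Nothing in the application
    /// calls it, so after LTO it should be dead code.
    pub fn cycles_for_ms(&self, ms: u32) -> u32 {
        ((ms as u64 * self.sysclk_hz as u64) / 1_000) as u32
    }
}

/// Stand-in for application code: it only ever uses `cycles_for_us`.
pub fn app_main(timer: &Timer) -> u32 {
    timer.cycles_for_us(250)
}
```

With an 8 MHz clock, app_main returns 2000 cycles for 250 µs; cycles_for_ms is never reached from application code.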

This happens on Rust 1.50.0. For the minimal binary attached below, the size grows from 932 bytes to 2164 bytes, an increase of 1232 bytes (roughly 2.3x).

Separately, switching from 1.50 to 1.51 (currently in beta), without applying the crate change above, causes the same increase from 932 bytes to 2164 bytes.

I was going to mark this as a stable-to-beta regression, but to be honest, I think it's probably a pre-existing issue that is simply triggered by legitimate changes in library code. I expect that whatever changed between 1.50 and 1.51 is similar in nature to the crate change above.

I've tarred up a moderately minimal bit of code that reproduces this:

binary-size-increase.tar.gz

To reproduce, run the ./check-size script contained in the tarball. You might need to run rustup target add thumbv6m-none-eabi first.

For me, with current stable 1.50, this shows a change in binary size from 932 bytes to 2164 bytes:

Size prior to commit 77dace37908f281feb9432fc13874475d9dc0765
-rwxr-xr-x 1 dml eng 932 Mar  4 16:39 a.bin
Size after commit 77dace37908f281feb9432fc13874475d9dc0765
-rwxr-xr-x 1 dml eng 2164 Mar  4 16:39 b.bin

If I adjust the script to use 1.51, then I get 2164 bytes for both a.bin and b.bin.

Looking at the disassembly of each binary, the larger binary includes compiler_builtins::int::specialized_div_rem::u64_div_rem, while the smaller binary doesn't. u64_div_rem is called from __udivmoddi4, which in turn is called from __aeabi_uldivmod. These two functions are likewise absent from the smaller binary; in the larger binary they are present and are called from MicroSecond::cycles / Delay::delay.
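Some background on why these routines appear at all: ARMv6-M cores such as the Cortex-M0/M0+ (the thumbv6m-none-eabi target) have no hardware divide instruction, so a 64-bit division in Rust source is typically lowered to a call into compiler_builtins, i.e. the __aeabi_uldivmod, __udivmoddi4, u64_div_rem chain seen in the disassembly. The sketch below uses plain shift-and-subtract (restoring) long division to illustrate the kind of work such a routine has to do in software. soft_u64_div_rem is a hypothetical name, and the actual compiler_builtins algorithm is different and more elaborate.

```rust
/// Software 64-bit unsigned division with remainder using restoring
/// (shift-and-subtract) long division. Illustration only: this is NOT the
/// algorithm used by `compiler_builtins`.
pub fn soft_u64_div_rem(n: u64, d: u64) -> (u64, u64) {
    assert!(d != 0, "division by zero");
    let mut quotient: u64 = 0;
    let mut remainder: u64 = 0;
    // Walk the dividend from its most significant bit downwards.
    for i in (0..64u32).rev() {
        // Bring down the next dividend bit. The remainder never exceeds the
        // bits of `n` consumed so far (at most 63 before this shift), so the
        // shift cannot drop a set bit.
        remainder = (remainder << 1) | ((n >> i) & 1);
        // If the divisor fits, subtract it and set this quotient bit.
        if remainder >= d {
            remainder -= d;
            quotient |= 1u64 << i;
        }
    }
    (quotient, remainder)
}
```

On a host that does have a divide instruction, the result should match the native / and % operators for any nonzero divisor.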

Cargo.toml sets opt-level = "s". Similar results are observed with opt-level = "z".

Given that LTO is enabled, I'd have expected that dead code would be removed before inlining decisions were made, so I'm surprised that a change to code that isn't called would have this effect.

If there's anything we can do to help LLVM make more optimal decisions when optimizing for binary size, that'd be awesome, although I'm sure it's a pretty difficult problem.

Metadata


Labels:
C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
I-heavy (Issue: Problems and improvements with respect to binary size of generated code.)
O-Arm (Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state)
