
Conversation

@m-ou-se (Member) commented Nov 10, 2025

Part of #99012

This is a new implementation of fmt::Arguments. With this implementation, fmt::Arguments is only two pointers in size (instead of six before). This makes it the same size as a &str and lets it fit in a register pair.


This fmt::Arguments can store a &'static str without any indirection or additional storage. This means that simple cases like print_fmt(format_args!("hello")) are now just as efficient for the caller as print_str("hello"), as shown by this example:

code:

fn main() {
    println!("Hello, world!");
}

before:

main:
 sub     rsp, 56
 lea     rax, [rip + .Lanon_hello_world]
 mov     qword ptr [rsp + 8], rax
 mov     qword ptr [rsp + 16], 1
 mov     qword ptr [rsp + 24], 8
 xorps   xmm0, xmm0
 movups  xmmword ptr [rsp + 32], xmm0
 lea     rdi, [rsp + 8]
 call    qword ptr [rip + std::io::stdio::_print]
 add     rsp, 56
 ret

after:

main:
 lea     rsi, [rip + .Lanon_hello_world]
 mov     edi, 29
 jmp     qword ptr [rip + std::io::stdio::_print]

(panic!("Hello, world!"); shows a similar change.)


This implementation stores all static information as just a single (byte) string, without any indirection:

code:

format_args!("Hello, {name:-^20}!")

lowering before:

fmt::Arguments::new_v1_formatted(
    &["Hello, ", "!\n"],
    &args,
    &[
        Placeholder {
            position: 0usize,
            flags: 3355443245u32,
            precision: format_count::Implied,
            width: format_count::Is(20u16),
        },
    ],
)

lowering after:

fmt::Arguments::new(
    b"\x07Hello, \xc3-\x00\x00\xc8\x14\x00\x02!\n\x00",
    &args,
)

This saves a ton of pointers and simplifies the expansion significantly, but does mean that individual pieces (e.g. "Hello, " and "!\n") cannot be reused. (Those pieces are often smaller than a pointer to them, though, in which case reusing them is useless.)


The details of the new representation are documented in library/core/src/fmt/mod.rs.
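To make the flat-byte-string idea concrete, here is a toy decoder for a made-up encoding: length-prefixed literal pieces interleaved with one-byte argument indices, terminated by 0xFF. The real encoding used by this PR is different (it also packs the placeholder's fill, flags, width, and precision into the byte string, which is presumably what the bytes between the two literal pieces in the example above carry), so treat this purely as an illustration of walking a single flat byte string instead of a slice of string slices.

```rust
// Toy format, for illustration only: a `len` byte, `len` literal bytes,
// then one argument-index byte (0xFF means "no more placeholders").
fn render(template: &[u8], args: &[&str]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < template.len() {
        // Copy the next literal piece straight out of the template.
        let len = template[i] as usize;
        i += 1;
        out.push_str(std::str::from_utf8(&template[i..i + len]).unwrap());
        i += len;
        // Then either stop, or splice in the next runtime argument.
        match template.get(i) {
            None | Some(&0xFF) => break,
            Some(&idx) => out.push_str(args[idx as usize]),
        }
        i += 1;
    }
    out
}

fn main() {
    // "Hello, " + args[0] + "!\n"
    assert_eq!(render(b"\x07Hello, \x00\x02!\n\xff", &["world"]), "Hello, world!\n");
}
```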

@m-ou-se m-ou-se self-assigned this Nov 10, 2025
@m-ou-se m-ou-se added the A-fmt Area: `core::fmt` label Nov 10, 2025
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-clippy Relevant to the Clippy team. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 10, 2025
@m-ou-se (Member, Author) commented Nov 10, 2025

@bors try @rust-timer queue


rust-bors bot added a commit that referenced this pull request Nov 10, 2025
 Experiment: New fmt::Arguments implementation (another one)
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2025

@rust-bors bot commented Nov 10, 2025

☀️ Try build successful (CI)
Build commit: 6e6ba94 (6e6ba949d24fbfbd9cd48ca4c98adf59fbd04482, parent: a7b3715826827677ca8769eb88dc8052f43e734b)


@rust-timer (Collaborator) commented:

Finished benchmarking commit (6e6ba94): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

|                            | mean   | range            | count |
|----------------------------|--------|------------------|-------|
| Regressions ❌ (primary)    | 0.7%   | [0.1%, 5.8%]     | 26    |
| Regressions ❌ (secondary)  | 0.6%   | [0.1%, 1.3%]     | 44    |
| Improvements ✅ (primary)   | -0.7%  | [-4.3%, -0.1%]   | 109   |
| Improvements ✅ (secondary) | -1.7%  | [-38.2%, -0.0%]  | 93    |
| All ❌✅ (primary)           | -0.5%  | [-4.3%, 5.8%]    | 135   |

Max RSS (memory usage)

Results (primary -1.5%, secondary -0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

|                            | mean   | range            | count |
|----------------------------|--------|------------------|-------|
| Regressions ❌ (primary)    | 2.2%   | [2.2%, 2.2%]     | 1     |
| Regressions ❌ (secondary)  | 3.7%   | [1.0%, 6.7%]     | 12    |
| Improvements ✅ (primary)   | -1.6%  | [-6.0%, -0.5%]   | 31    |
| Improvements ✅ (secondary) | -2.6%  | [-7.9%, -0.7%]   | 25    |
| All ❌✅ (primary)           | -1.5%  | [-6.0%, 2.2%]    | 32    |

Cycles

Results (primary -0.5%, secondary -4.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

|                            | mean    | range            | count |
|----------------------------|---------|------------------|-------|
| Regressions ❌ (primary)    | 4.8%    | [3.4%, 6.2%]     | 2     |
| Regressions ❌ (secondary)  | 8.8%    | [2.6%, 18.8%]    | 6     |
| Improvements ✅ (primary)   | -3.1%   | [-5.0%, -2.1%]   | 4     |
| Improvements ✅ (secondary) | -10.3%  | [-39.4%, -2.1%]  | 13    |
| All ❌✅ (primary)           | -0.5%   | [-5.0%, 6.2%]    | 6     |

Binary size

Results (primary -0.7%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

|                            | mean   | range            | count |
|----------------------------|--------|------------------|-------|
| Regressions ❌ (primary)    | 0.5%   | [0.0%, 1.4%]     | 4     |
| Regressions ❌ (secondary)  | 3.2%   | [0.0%, 7.5%]     | 12    |
| Improvements ✅ (primary)   | -0.8%  | [-3.3%, -0.0%]   | 129   |
| Improvements ✅ (secondary) | -1.7%  | [-23.6%, -0.0%]  | 123   |
| All ❌✅ (primary)           | -0.7%  | [-3.3%, 1.4%]    | 133   |

Bootstrap: 476.631s -> 471.922s (-0.99%)
Artifact size: 391.32 MiB -> 388.56 MiB (-0.70%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 10, 2025
@m-ou-se (Member, Author) commented Nov 10, 2025

Ooh that's pretty good :D

@m-ou-se (Member, Author) commented Nov 10, 2025

Pretty much everything looks like a great improvement. Not only number of instructions executed, but also memory usage and binary size. 🎉

Only two significant negative results:

1. "image-0.25.6 opt incr-patched:println" with almost +6% instructions:u.

Looking at the detailed results, it looks like that's all LLVM, probably because LLVM got more optimization opportunities. That's not necessarily a bad thing.

2. The fmt-write-str runtime benchmark with over +12% instructions:u.

This could be concerning, but I can't seem to fully replicate it locally.

If I recompile and run this benchmark 100 times in both nightly and with this PR, I do get this interesting result though:

[Violin plot: fmt-write-str run times over 100 runs each, nightly vs. this PR]

With the nightly compiler, the results vary, with many measurements clustered close to 25ms but also many around 40ms. With this PR, the results are very consistent, all clustered around 27ms. (Update: It's around 26ms now, after a minor optimization.)

So, the median result is worse, but the average is better.

My guess is that the indirection (a slice of string slices) can make things unpredictable, as the strings aren't always in the optimal place for caching. The lack of indirection in the new version then makes it much more predictable. This is just a guess though.

@rustbot rustbot added the A-run-make Area: port run-make Makefiles to rmake.rs label Nov 11, 2025
@m-ou-se (Member, Author) commented Nov 11, 2025

@bors try @rust-timer queue


rust-bors bot added a commit that referenced this pull request Nov 11, 2025
 Experiment: New fmt::Arguments implementation (another one)
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 11, 2025
@m-ou-se m-ou-se changed the title from "Experiment: New fmt::Arguments implementation (another one)" to "New format_args!() and fmt::Arguments implementation" Nov 11, 2025
@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 12, 2025
@bors (Collaborator) commented Nov 12, 2025

☔ The latest upstream changes (presumably #148851) made this pull request unmergeable. Please resolve the merge conflicts.

@m-ou-se m-ou-se marked this pull request as ready for review November 12, 2025 11:01
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 12, 2025

@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 12, 2025
@m-ou-se m-ou-se marked this pull request as draft November 12, 2025 11:01
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 12, 2025
@Zalathar (Member) commented:

Heads up that you might also need to do `./x test coverage --set=build.profiler=true --bless`, as the coverage-run tests are automatically skipped if the profiler runtime isn't enabled in your `bootstrap.toml`.

@m-ou-se (Member, Author) commented Nov 12, 2025

There are probably tons of micro-optimizations left that we can do: lots of ways in which we can tweak the exact encoding to make it faster to execute. But this PR as it is seems good enough and a clear improvement, so I'd like to get it merged first and then do smaller tweaks in later PRs that we can benchmark individually.

@m-ou-se m-ou-se marked this pull request as ready for review November 12, 2025 11:49
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 12, 2025
@m-ou-se (Member, Author) commented Nov 12, 2025

The fmt-write-str benchmark results aren't great, but I don't think that's too surprising. This PR makes it cheaper to create a fmt::Arguments, but a bit more expensive to execute/format a fmt::Arguments. In most realistic scenarios, that results in a net win, as a fmt::Arguments is usually created directly before getting used once. But in a benchmark situation where the fmt::Arguments is created once and then executed many many times, it results in a net loss.

If I move the write!() to a separate function (and prevent inlining that function into the benchmark), I see a ~30% increase in performance rather than a ~10% decrease.
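Roughly, that restructuring looks like the sketch below (this is not the actual fmt-write-str benchmark from rustc-perf, and `write_message` is a hypothetical helper). With the write!() hidden behind an #[inline(never)] function, the format_args!() value is built right before each use, the pattern this PR optimizes for, instead of being created once and reused across every iteration of the loop.

```rust
use std::fmt::Write;

// Sketch only, not the real benchmark: the hypothetical `write_message`
// keeps the format_args!() expansion out of the benchmark loop.
#[inline(never)]
fn write_message(out: &mut String, name: &str) {
    // The Arguments value is created and consumed together, as in most
    // real-world formatting code.
    let _ = write!(out, "Hello, {name:-^20}!");
}

fn bench(iters: u32) -> String {
    let mut out = String::new();
    for _ in 0..iters {
        out.clear();
        write_message(&mut out, "world");
    }
    out
}

fn main() {
    let out = bench(1_000);
    // "Hello, " + the name centered in a 20-character field + "!"
    assert_eq!(out.len(), "Hello, ".len() + 20 + 1);
    println!("{out}");
}
```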

@m-ou-se (Member, Author) commented Nov 12, 2025

So, summary: This improves rustc performance (0-1%, peaks up to 40%), improves rustc memory usage (1-2%, even 3-4% for Hello World and Ripgrep), improves binary size (~1%, close to 2% for both Hello World and Cargo), and improves runtime performance (0-2%, up to 30% in extreme cases).

The few losses can be attributed to LLVM getting more optimization opportunities (e.g. the image-0.25.6 test) and to an unrealistic benchmark.
