pub async fn impl is monomorphized when func itself is monomorphized #143290

azhogin · 2025-07-01T17:15:08Z

The implementation coroutine (func::{closure#0}) is monomorphized when func itself is monomorphized.

Currently, when pub async fn foo(..) is exported from a library and used in several dependent crates, only the 'header' function is monomorphized in the defining crate. The 'header' function, which returns the coroutine object, is monomorphized, but the coroutine's poll function (which actually implements all the logic of the function) is not. As a result, func::{closure#0} is monomorphized again in every dependent crate.

This PR adds monomorphization of func::{closure#0} (the coroutine poll function) when func itself is monomorphized.

A simple test with one library async function and ten dependent crates (executables) that use the function shows a 5-7% compilation time improvement (single-threaded).
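
To make the 'header' versus poll-function split concrete, here is a rough hand-written analogue of how a pub async fn is lowered. This is only a sketch: the names foo, FooFuture, foo_desugared, and poll_once are hypothetical, and the real compiler-generated coroutine (foo::{closure#0}) is an anonymous state machine rather than a named struct. Per the description above, the compiler-generated poll body was previously instantiated only in the crates that use the future, which is the behaviour this PR changes.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// What a library author writes.
pub async fn foo(x: u32) -> u32 {
    x + 1
}

/// Hypothetical stand-in for the coroutine type the compiler generates.
pub struct FooFuture {
    x: u32,
}

impl Future for FooFuture {
    type Output = u32;

    // Analogue of `foo::{closure#0}`: the body that does the actual work.
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(self.x + 1)
    }
}

/// Analogue of the 'header' function: it only constructs the future object.
pub fn foo_desugared(x: u32) -> FooFuture {
    FooFuture { x }
}

/// A waker that does nothing; enough to poll futures that never suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once and returns its output if it is ready.
pub fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

fn main() {
    // Both forms behave the same from the caller's point of view.
    assert_eq!(poll_once(foo(1)), Some(2));
    assert_eq!(poll_once(foo_desugared(1)), Some(2));
}
```

Calling foo(1) only runs the header part; the logic runs when the returned future is polled, which is why the poll body is the expensive piece to compile.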

rustbot · 2025-07-01T17:15:12Z

r? @oli-obk

rustbot has assigned @oli-obk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

oli-obk · 2025-07-02T07:55:56Z

I don't think this needs a -Z flag. It makes a lot of sense to just change this everywhere. We can then benchmark it in the benchmark suite, too.

A similar change could be done for iterator or closure returning functions.

On that note: instead of collecting nested bodies in general, wouldn't it be slightly more correct to collect the types in the opaque return type of monomorphized functions? (Non-opaque types must already have all their impls monomorphized, as they are publicly reachable.) For opaque types we'd only need to monomorphize the trait impls that the opaque type has in its bounds.
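
For illustration, the iterator- and closure-returning functions mentioned above might look like the following sketch. The function names evens and adder are hypothetical. In both cases a downstream crate sees only an opaque type, and the bounds written in the signature (Iterator, Fn) name the trait impls that the suggestion would consider for monomorphization in the defining crate.

```rust
/// Returns an opaque iterator type; callers only know it implements `Iterator`.
pub fn evens(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

/// Returns an opaque closure type; callers only know it implements `Fn(u32) -> u32`.
pub fn adder(k: u32) -> impl Fn(u32) -> u32 {
    move |x| x + k
}

fn main() {
    // The concrete types behind `impl Iterator` and `impl Fn` are never named here.
    let collected: Vec<u32> = evens(7).collect();
    assert_eq!(collected, vec![0, 2, 4, 6]);
    assert_eq!(adder(3)(4), 7);
}
```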

cc @compiler-errors for thoughts as you wrote #135314

rustbot · 2025-07-02T11:54:45Z

Some changes occurred in coverage tests.

cc @Zalathar

azhogin · 2025-07-02T12:00:23Z

Flag removed; the behaviour is now the default.
"opaque return type of monomorphized functions" - not done yet.

oli-obk · 2025-07-02T12:57:52Z

@bors try @rust-timer queue

bors · 2025-07-02T12:59:08Z

⌛ Trying commit d6e2c24 with merge 9443527...

bors · 2025-07-02T15:19:22Z

☀️ Try build successful - checks-actions
Build commit: 9443527 (9443527f3f94abad97e5cecc88f429e299eac1f0)

rust-timer · 2025-07-03T12:09:26Z

Finished benchmarking commit (9443527): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	6.0%	[0.8%, 13.6%]	6
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results (primary -3.1%, secondary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.4%	[0.4%, 5.0%]	4
Improvements ✅ (primary)	-3.1%	[-3.8%, -2.5%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-3.1%	[-3.8%, -2.5%]	2

Cycles

Results (primary 2.5%, secondary 4.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.5%	[2.0%, 3.7%]	11
Regressions ❌ (secondary)	5.6%	[0.9%, 14.1%]	9
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.2%	[-1.2%, -1.2%]	1
All ❌✅ (primary)	2.5%	[2.0%, 3.7%]	11

Binary size

Results (secondary 9.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	9.4%	[1.2%, 15.8%]	5
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Bootstrap: 462.08s -> 462.853s (0.17%)
Artifact size: 372.26 MiB -> 372.28 MiB (0.01%)

oli-obk · 2025-07-04T13:35:04Z

Some tests need blessing

The performance regression is expected: the regressing benchmark is a library build with public async fns, so it now does the work that previously happened in the downstream crate.

Zalathar · 2025-07-06T09:29:06Z

In particular, you will need to set build.profiler = true in your bootstrap.toml config to properly bless the modified coverage test.

azhogin · 2025-07-08T20:50:48Z

It looks like '-Z print-type-sizes' shows different results for x86_64 and aarch64 for some tests. Are there any helpful flags to remove the difference between targets (other than -C panic=abort)?

Also, I ran the deeply-nested-multi test locally with one dependent crate that polls the main async function. The compilation time is the same with and without this change (library crate plus dependent crate).

oli-obk · 2025-07-09T10:00:30Z

You could limit the test to one platform with //@ only-x86_64
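
A minimal sketch of what a test pinned to one architecture might look like, assuming the usual compiletest directive comments at the top of the file. The functions wait and big are hypothetical and are not the actual tests touched by this PR. To rustc the directives are ordinary comments; only the test harness interprets them.

```rust
//@ only-x86_64
//@ edition: 2021
//@ run-pass

use std::mem::size_of_val;

async fn wait() {}

async fn big(arg: [u8; 1024]) {
    wait().await;
    drop(arg);
}

fn main() {
    // The future has to store `arg`, so it is at least 1024 bytes.
    // The exact size can differ between targets (as reported above),
    // which is the reason for restricting the test to one architecture.
    let fut = big([0u8; 1024]);
    assert!(size_of_val(&fut) >= 1024);
}
```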

bors · 2025-07-22T22:55:12Z

☔ The latest upstream changes (presumably #144249) made this pull request unmergeable. Please resolve the merge conflicts.

oli-obk · 2025-07-23T08:24:22Z

oh sorry I didn't realize you addressed this.

@bors delegate+

r=me after a rebase

bors · 2025-08-31T08:44:48Z

💔 Test failed - checks-actions

azhogin · 2025-08-31T14:06:54Z

@bors r=oli-obk

bors · 2025-08-31T14:06:56Z

📌 Commit 961e96a has been approved by oli-obk

It is now in the queue for this repository.

bors · 2025-08-31T15:02:11Z

⌛ Testing commit 961e96a with merge b1275d7...

bors · 2025-08-31T16:18:02Z

💔 Test failed - checks-actions

azhogin · 2025-09-01T08:52:51Z

@bors r=oli-obk

bors · 2025-09-01T08:52:54Z

📌 Commit c2c58cb has been approved by oli-obk

It is now in the queue for this repository.

bors · 2025-09-01T10:54:44Z

⌛ Testing commit c2c58cb with merge c0bb3b9...

bors · 2025-09-01T14:01:32Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing c0bb3b9 to master...

github-actions · 2025-09-01T14:04:30Z

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 84a1747 (parent) -> c0bb3b9 (this PR)

Test differences

Show 13 test diffs

Stage 1

[codegen-units] tests/codegen-units/item-collection/async-fn-impl.rs: [missing] -> pass (J1)
[codegen-units] tests/codegen-units/item-collection/opaque-return-impls.rs: [missing] -> pass (J1)
[ui] tests/ui/async-await/future-sizes/async-awaiting-fut.rs: pass -> ignore (ignored when randomizing layouts) (J1)
[ui] tests/ui/async-await/future-sizes/large-arg.rs: pass -> ignore (ignored when randomizing layouts) (J1)
[ui] tests/ui/print_type_sizes/async.rs: pass -> ignore (ignored when randomizing layouts) (J1)

Stage 2

[ui] tests/ui/async-await/future-sizes/async-awaiting-fut.rs: pass -> ignore (only executed when the architecture is x86_64) (J0)
[ui] tests/ui/async-await/future-sizes/large-arg.rs: pass -> ignore (only executed when the architecture is x86_64) (J0)
[ui] tests/ui/print_type_sizes/async.rs: pass -> ignore (only executed when the architecture is x86_64) (J0)
[codegen-units] tests/codegen-units/item-collection/opaque-return-impls.rs: [missing] -> pass (J2)
[codegen-units] tests/codegen-units/item-collection/opaque-return-impls.rs: [missing] -> ignore (only executed when the target is x86_64-unknown-linux-gnu) (J3)
[codegen-units] tests/codegen-units/item-collection/async-fn-impl.rs: [missing] -> pass (J4)

Additionally, 2 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard c0bb3b98bb7aac24a37635e5d36d961e0b14f435 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

pr-check-1: 1740.6s -> 1374.3s (-21.0%)
dist-aarch64-apple: 5811.0s -> 6892.3s (18.6%)
aarch64-gnu-debug: 5013.7s -> 4127.7s (-17.7%)
dist-x86_64-apple: 6838.0s -> 7912.1s (15.7%)
x86_64-rust-for-linux: 3104.4s -> 2620.8s (-15.6%)
dist-aarch64-msvc: 5467.8s -> 6186.7s (13.1%)
i686-gnu-2: 6181.5s -> 5405.2s (-12.6%)
aarch64-apple: 6674.9s -> 5902.3s (-11.6%)
i686-gnu-1: 8286.5s -> 7384.6s (-10.9%)
pr-check-2: 2386.1s -> 2152.3s (-9.8%)

How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

rust-timer · 2025-09-01T15:11:44Z

Finished benchmarking commit (c0bb3b9): comparison URL.

Overall result: ❌ regressions - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

If the regression was expected or you think it can be justified,
please write a comment with sufficient written justification, and add
@rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
If you think that you know of a way to resolve the regression, try to create
a new PR with a fix for the regression.
If you do not understand the regression or you think that it is just noise,
you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	6.1%	[0.9%, 13.5%]	6
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results (secondary 4.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.0%	[2.0%, 5.9%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

Results (secondary 4.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	8.5%	[2.6%, 13.9%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.0%	[-3.6%, -2.4%]	2
All ❌✅ (primary)	-	-	0

Binary size

Results (secondary 9.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	9.5%	[1.2%, 16.0%]	5
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Bootstrap: 468.417s -> 466.41s (-0.43%)
Artifact size: 388.46 MiB -> 388.46 MiB (-0.00%)

Kobzol · 2025-09-02T11:05:12Z

The regression is expected, as we now do more work for async fns in their crate, with the hope of reducing the amount of work required in downstream crates.

@rustbot label: +perf-regression-triaged

rustbot assigned oli-obk Jul 1, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 1, 2025

azhogin force-pushed the azhogin/link-pub-async-impls branch from d6bb74b to a6b81d5 Compare July 2, 2025 11:54

azhogin force-pushed the azhogin/link-pub-async-impls branch from a6b81d5 to d6e2c24 Compare July 2, 2025 11:55

azhogin changed the title ~~-Zlink-pub-async-impls flag added to monomorphize pub async fn impl~~ pub async fn impl is monomorphized when func itself is monomorphized Jul 2, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 2, 2025

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 3, 2025

azhogin force-pushed the azhogin/link-pub-async-impls branch from d6e2c24 to 57c2602 Compare July 8, 2025 19:25

azhogin force-pushed the azhogin/link-pub-async-impls branch from 57c2602 to e6b35f4 Compare July 9, 2025 11:23

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 31, 2025

azhogin force-pushed the azhogin/link-pub-async-impls branch from 97b9fc4 to 961e96a Compare August 31, 2025 14:06

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 31, 2025

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 31, 2025

Commit c2c58cb: pub async fn implementation coroutine (func::{closure#0}) is monomorphized, when func itself is monomorphized

azhogin force-pushed the azhogin/link-pub-async-impls branch from 961e96a to c2c58cb Compare September 1, 2025 06:45

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 1, 2025

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 1, 2025

bors merged commit c0bb3b9 into rust-lang:master Sep 1, 2025
11 checks passed

rustbot added this to the 1.91.0 milestone Sep 1, 2025

rustbot added the perf-regression-triaged The performance regression has been triaged. label Sep 2, 2025
