azhogin
Contributor

@azhogin azhogin commented Jul 1, 2025

The implementation coroutine (`func::{closure#0}`) is monomorphized when `func` itself is monomorphized.

Currently, when `pub async fn foo(..)` is exported from a library and used in several dependent crates, only the 'header' function is monomorphized in the defining crate. The 'header' function, which returns the coroutine object, is monomorphized, but the coroutine's poll function (which actually implements all the logic of the function) is not. As a result, `func::{closure#0}` is monomorphized in every dependent crate.

This PR adds monomorphization of `func::{closure#0}` (the coroutine poll function) when `func` itself is monomorphized.

A simple test with one library async function and ten dependent crates (executables) that use the function shows a 5-7% compilation time improvement (single-threaded).
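
To make the 'header' versus poll-function split concrete, here is a minimal sketch in plain Rust. The names `foo_header`, `NoopWake`, and `poll_once` are hypothetical and only approximate the shape of the desugaring; the compiler's actual lowering differs in detail. Calling the async function only builds the coroutine object, and the body runs when the returned future is polled.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The kind of function the PR is about: an async fn exported from a library.
pub async fn foo(x: u32) -> u32 {
    x + 1
}

/// Hand-written approximation of the 'header' (hypothetical name): it only
/// constructs and returns the coroutine object. The `async move` block plays
/// the role of `foo::{closure#0}`, whose poll function holds the real logic.
pub fn foo_header(x: u32) -> impl Future<Output = u32> {
    async move { x + 1 }
}

/// A waker that does nothing, so a future can be polled without a runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once and returns the result of that poll.
pub fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let fut = pin!(fut);
    fut.poll(&mut cx)
}

fn main() {
    // Neither future contains an await point, so one poll completes it.
    assert_eq!(poll_once(foo(1)), Poll::Ready(2));
    assert_eq!(poll_once(foo_header(1)), Poll::Ready(2));
}
```

In this picture, the defining crate previously generated code for the header only, and each dependent crate generated the body behind `poll` for itself.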

@rustbot
Collaborator

rustbot commented Jul 1, 2025

r? @oli-obk

rustbot has assigned @oli-obk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 1, 2025
@oli-obk
Contributor

oli-obk commented Jul 2, 2025

I don't think this needs a -Z flag. It makes a lot of sense to just change this everywhere. We can then benchmark it in the benchmark suite, too.

A similar change could be done for iterator or closure returning functions.

On that note: instead of collecting nested bodies in general, wouldn't it be slightly more correct to collect the types that appear in the opaque return type of monomorphized functions? (Non-opaque types must already have all their impls monomorphized, as they are publicly reachable.) For opaque types we'd only need to monomorphize the trait impls that the opaque type has in its bounds.

cc @compiler-errors for thoughts as you wrote #135314
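
For readers unfamiliar with the terms, the sketch below shows the kinds of functions this comment refers to: functions whose return type is opaque (`impl Trait`), so callers can only use the impls named in its bounds. The function names are hypothetical, and the comments restate the suggestion as worded above rather than describing what the compiler currently does.

```rust
/// Iterator-returning function: the opaque type only promises `Iterator`,
/// so the `Iterator` impl of the hidden adapter type is what a downstream
/// caller ends up needing.
pub fn evens(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

/// Closure-returning function: the bound is `Fn(i32) -> i32`, so the
/// closure's call implementation is what callers invoke.
pub fn adder(k: i32) -> impl Fn(i32) -> i32 {
    move |x| x + k
}

fn main() {
    let v: Vec<u32> = evens(7).collect();
    assert_eq!(v, vec![0, 2, 4, 6]);
    assert_eq!(adder(2)(40), 42);
}
```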

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from d6bb74b to a6b81d5 Compare July 2, 2025 11:54
@rustbot
Collaborator

rustbot commented Jul 2, 2025

Some changes occurred in coverage tests.

cc @Zalathar

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from a6b81d5 to d6e2c24 Compare July 2, 2025 11:55
@azhogin azhogin changed the title from "-Zlink-pub-async-impls flag added to monomorphize pub async fn impl" to "pub async fn impl is monomorphized when func itself is monomorphized" Jul 2, 2025
@azhogin
Contributor Author

azhogin commented Jul 2, 2025

Flag removed; the behaviour is now the default.
The "opaque return type of monomorphized functions" suggestion is not implemented yet.

@oli-obk
Contributor

oli-obk commented Jul 2, 2025

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 2, 2025
@bors
Collaborator

bors commented Jul 2, 2025

⌛ Trying commit d6e2c24 with merge 9443527...

bors added a commit that referenced this pull request Jul 2, 2025
pub async fn impl is monomorphized when func itself is monomorphized

@bors
Collaborator

bors commented Jul 2, 2025

☀️ Try build successful - checks-actions
Build commit: 9443527 (9443527f3f94abad97e5cecc88f429e299eac1f0)

@rust-timer
Collaborator

Finished benchmarking commit (9443527): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 6.0% | [0.8%, 13.6%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary -3.1%, secondary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.4% | [0.4%, 5.0%] | 4 |
| Improvements ✅ (primary) | -3.1% | [-3.8%, -2.5%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.1% | [-3.8%, -2.5%] | 2 |

Cycles

Results (primary 2.5%, secondary 4.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | 2.5% | [2.0%, 3.7%] | 11 |
| Regressions ❌ (secondary) | 5.6% | [0.9%, 14.1%] | 9 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.2% | [-1.2%, -1.2%] | 1 |
| All ❌✅ (primary) | 2.5% | [2.0%, 3.7%] | 11 |

Binary size

Results (secondary 9.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 9.4% | [1.2%, 15.8%] | 5 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Bootstrap: 462.08s -> 462.853s (0.17%)
Artifact size: 372.26 MiB -> 372.28 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 3, 2025
@oli-obk
Contributor

oli-obk commented Jul 4, 2025

Some tests need blessing.

The performance regression is expected: the regressing benchmark is a library build containing a public async fn, so it now does the work that previously happened in the downstream crate.
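
The trade-off can be sketched with a deliberately simplified counting model. It assumes a non-generic `pub async fn` and that dependent crates can reuse the copy emitted by the defining crate, which is the PR's stated intent; the `Strategy` enum and `poll_fn_instances` are hypothetical and not part of the compiler.

```rust
/// Where the coroutine's poll function gets monomorphized (simplified model).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Strategy {
    /// Old behaviour: every dependent crate that uses the function
    /// monomorphizes the poll function itself.
    PerDependent,
    /// New behaviour: the defining crate monomorphizes it once, up front.
    InDefiningCrate,
}

/// Number of times the poll function is monomorphized across the whole build.
pub fn poll_fn_instances(strategy: Strategy, dependents: u32) -> u32 {
    match strategy {
        Strategy::PerDependent => dependents,
        Strategy::InDefiningCrate => 1,
    }
}

fn main() {
    // A library benchmark built on its own has no dependents, so the new
    // behaviour does strictly more work there.
    assert_eq!(poll_fn_instances(Strategy::PerDependent, 0), 0);
    assert_eq!(poll_fn_instances(Strategy::InDefiningCrate, 0), 1);

    // With ten dependents, as in the PR description's test, the work is
    // done once instead of ten times.
    assert_eq!(poll_fn_instances(Strategy::PerDependent, 10), 10);
    assert_eq!(poll_fn_instances(Strategy::InDefiningCrate, 10), 1);
}
```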

@Zalathar
Contributor

Zalathar commented Jul 6, 2025

In particular, you will need to set build.profiler = true in your bootstrap.toml config to properly bless the modified coverage test.

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from d6e2c24 to 57c2602 Compare July 8, 2025 19:25
@azhogin
Contributor Author

azhogin commented Jul 8, 2025

It looks like `-Z print-type-sizes` shows different results on x86_64 and aarch64 for some tests. Are there any helpful flags that remove the difference between targets (other than `-C panic=abort`)?

Also, I ran the deeply-nested-multi test locally with one dependent crate that polls the main async function. Compilation time is the same with and without this change (lib crate plus dependent crate).

@oli-obk
Contributor

oli-obk commented Jul 9, 2025

You could limit the test to one platform with `//@ only-x86_64`.
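
As a rough sketch, a size-sensitive test restricted this way could start like the file below. The `//@` lines are compiletest directives and are ordinary comments to rustc; the function names and the array size are made up for illustration and are not taken from the tests touched by this PR.

```rust
//@ only-x86_64
//@ edition: 2021

/// Hypothetical helper that introduces an await point.
async fn wait() {}

/// Hypothetical async fn whose future has to keep `arg` alive across the
/// await, which is why its size is interesting to a size test.
pub async fn hold(arg: [u8; 4096]) -> u8 {
    wait().await;
    arg[0]
}

fn main() {
    let fut = hold([0u8; 4096]);
    // The future owns `arg`, so it is at least as large as the array. The
    // exact size depends on layout, which is why such tests get pinned to
    // a single target.
    assert!(std::mem::size_of_val(&fut) >= 4096);
}
```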

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from 57c2602 to e6b35f4 Compare July 9, 2025 11:23
@bors
Collaborator

bors commented Jul 22, 2025

☔ The latest upstream changes (presumably #144249) made this pull request unmergeable. Please resolve the merge conflicts.

@oli-obk
Contributor

oli-obk commented Jul 23, 2025

oh sorry I didn't realize you addressed this.

@bors delegate+

r=me after a rebase

@bors
Collaborator

bors commented Aug 31, 2025

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 31, 2025
@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from 97b9fc4 to 961e96a Compare August 31, 2025 14:06
@azhogin
Contributor Author

azhogin commented Aug 31, 2025

@bors r=oli-obk

@bors
Collaborator

bors commented Aug 31, 2025

📌 Commit 961e96a has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 31, 2025
bors added a commit that referenced this pull request Aug 31, 2025
pub async fn impl is monomorphized when func itself is monomorphized

@bors
Collaborator

bors commented Aug 31, 2025

⌛ Testing commit 961e96a with merge b1275d7...

@bors
Collaborator

bors commented Aug 31, 2025

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 31, 2025
@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from 961e96a to c2c58cb Compare September 1, 2025 06:45
@azhogin
Contributor Author

azhogin commented Sep 1, 2025

@bors r=oli-obk

@bors
Collaborator

bors commented Sep 1, 2025

📌 Commit c2c58cb has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 1, 2025
@bors
Collaborator

bors commented Sep 1, 2025

⌛ Testing commit c2c58cb with merge c0bb3b9...

@bors
Collaborator

bors commented Sep 1, 2025

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing c0bb3b9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 1, 2025
@bors bors merged commit c0bb3b9 into rust-lang:master Sep 1, 2025
11 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Sep 1, 2025
Contributor

github-actions bot commented Sep 1, 2025

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 84a1747 (parent) -> c0bb3b9 (this PR)

Test differences

Show 13 test diffs

Stage 1

  • [codegen-units] tests/codegen-units/item-collection/async-fn-impl.rs: [missing] -> pass (J1)
  • [codegen-units] tests/codegen-units/item-collection/opaque-return-impls.rs: [missing] -> pass (J1)
  • [ui] tests/ui/async-await/future-sizes/async-awaiting-fut.rs: pass -> ignore (ignored when randomizing layouts) (J1)
  • [ui] tests/ui/async-await/future-sizes/large-arg.rs: pass -> ignore (ignored when randomizing layouts) (J1)
  • [ui] tests/ui/print_type_sizes/async.rs: pass -> ignore (ignored when randomizing layouts) (J1)

Stage 2

  • [ui] tests/ui/async-await/future-sizes/async-awaiting-fut.rs: pass -> ignore (only executed when the architecture is x86_64) (J0)
  • [ui] tests/ui/async-await/future-sizes/large-arg.rs: pass -> ignore (only executed when the architecture is x86_64) (J0)
  • [ui] tests/ui/print_type_sizes/async.rs: pass -> ignore (only executed when the architecture is x86_64) (J0)
  • [codegen-units] tests/codegen-units/item-collection/opaque-return-impls.rs: [missing] -> pass (J2)
  • [codegen-units] tests/codegen-units/item-collection/opaque-return-impls.rs: [missing] -> ignore (only executed when the target is x86_64-unknown-linux-gnu) (J3)
  • [codegen-units] tests/codegen-units/item-collection/async-fn-impl.rs: [missing] -> pass (J4)

Additionally, 2 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard c0bb3b98bb7aac24a37635e5d36d961e0b14f435 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. pr-check-1: 1740.6s -> 1374.3s (-21.0%)
  2. dist-aarch64-apple: 5811.0s -> 6892.3s (18.6%)
  3. aarch64-gnu-debug: 5013.7s -> 4127.7s (-17.7%)
  4. dist-x86_64-apple: 6838.0s -> 7912.1s (15.7%)
  5. x86_64-rust-for-linux: 3104.4s -> 2620.8s (-15.6%)
  6. dist-aarch64-msvc: 5467.8s -> 6186.7s (13.1%)
  7. i686-gnu-2: 6181.5s -> 5405.2s (-12.6%)
  8. aarch64-apple: 6674.9s -> 5902.3s (-11.6%)
  9. i686-gnu-1: 8286.5s -> 7384.6s (-10.9%)
  10. pr-check-2: 2386.1s -> 2152.3s (-9.8%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (c0bb3b9): comparison URL.

Overall result: ❌ regressions - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 6.1% | [0.9%, 13.5%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (secondary 4.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.0% | [2.0%, 5.9%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (secondary 4.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 8.5% | [2.6%, 13.9%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.0% | [-3.6%, -2.4%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (secondary 9.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:-:|:-:|:-:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 9.5% | [1.2%, 16.0%] | 5 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Bootstrap: 468.417s -> 466.41s (-0.43%)
Artifact size: 388.46 MiB -> 388.46 MiB (-0.00%)

@Kobzol
Member

Kobzol commented Sep 2, 2025

The regression is expected, as we now do more work for async fns in their crate, with the hope of reducing the amount of work required in downstream crates.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Sep 2, 2025
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.