Split Bound index into Canonical and Bound #147138
@bors try @rust-timer queue
3d24484 to 1372123
Finished benchmarking commit (76676ec): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never

Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage): Results (primary -0.5%, secondary -4.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles: Results (primary 2.3%, secondary -1.6%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size: Results (primary 0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 470.189s -> 470.608s (0.09%)
gotta say, I do not understand perf xd, I was so sure this would have a bigger perf impact on the new solver 😄
compiler/rustc_trait_selection/src/traits/select/candidate_assembly.rs
@bors2 try @rust-timer queue
let anon_bound_tys = (0..NUM_PREINTERNED_ANON_BOUND_TYS_I)
    .map(|i| {
        (0..NUM_PREINTERNED_ANON_BOUND_TYS_V)
please either check it yourself or leave a fixme here: we should now have far fewer bound tys/regions/consts so we could preintern fewer of them
I added a comment to `NUM_PREINTERNED_ANON_BOUND_TYS_V`, let me know your thoughts. tl;dr 90% 0 vars, 9% 1, usually not more than 3-5. But, given that it's heap allocated, I don't know what reducing it from 20 actually buys us.
yeah 🤔 20 is low enough to not really matter
i thought we had like 400 of them (for regions at least) and i would expect to pretty much never have that many bound vars
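To make the trade-off in this thread concrete, here is a minimal sketch of a preinterned lookup table with a slow-path fallback. It is not rustc's interner: `Interner`, `BoundTy`, `bound_ty`, `slow_lookups`, and the outer size of 3 are hypothetical names and values chosen for illustration; only the inner size of 20 comes from the discussion above.

```rust
use std::collections::HashMap;

// Hypothetical sizes: only the inner value (20) is mentioned in the thread.
const NUM_PREINTERNED_I: usize = 3;
const NUM_PREINTERNED_V: usize = 20;

/// Stand-in for an interned bound type: (binder index, variable index).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BoundTy {
    index: usize,
    var: usize,
}

struct Interner {
    /// Heap-allocated fast-path table, built once up front.
    preinterned: Vec<Vec<BoundTy>>,
    /// Slow path for anything outside the table.
    slow: HashMap<(usize, usize), BoundTy>,
    /// Counts how often the slow path was taken.
    slow_lookups: usize,
}

impl Interner {
    fn new() -> Self {
        let preinterned: Vec<Vec<BoundTy>> = (0..NUM_PREINTERNED_I)
            .map(|i| {
                (0..NUM_PREINTERNED_V)
                    .map(|v| BoundTy { index: i, var: v })
                    .collect::<Vec<_>>()
            })
            .collect();
        Interner { preinterned, slow: HashMap::new(), slow_lookups: 0 }
    }

    /// Returns the bound type, using the table when the indices are in range.
    fn bound_ty(&mut self, index: usize, var: usize) -> BoundTy {
        if let Some(ty) = self.preinterned.get(index).and_then(|row| row.get(var)) {
            return *ty;
        }
        self.slow_lookups += 1;
        *self.slow.entry((index, var)).or_insert(BoundTy { index, var })
    }
}
```

With the distribution quoted above (almost always 0 or 1 variables, rarely more than 3-5), nearly every lookup in this sketch would hit the table, and shrinking the inner dimension below 20 would only save a few heap entries.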
compiler/rustc_next_trait_solver/src/canonical/canonicalizer.rs
assert!(
    !predicate
        .trait_ref
        .has_type_flags(TypeFlags::HAS_CANONICAL_BOUND | TypeFlags::HAS_TY_BOUND)
how does this assert work? It would ICE for `for<T> fn(T)`. I guess that doesn't exist yet (if ever)🤔
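The concern can be illustrated with a toy model of flag propagation. This is a sketch under assumptions, not rustc's `TypeFlags` machinery: the `Ty` enum, the flag values, and the free functions below are hypothetical, and the "any of the flags" semantics of `has_type_flags` is my understanding rather than something stated in this thread.

```rust
// Hand-rolled flag bits for illustration; the real `TypeFlags` differs.
const HAS_TY_BOUND: u32 = 1 << 0;
const HAS_CANONICAL_BOUND: u32 = 1 << 1;

#[derive(Debug)]
enum Ty {
    Int,
    /// A type variable bound by some enclosing binder.
    Bound,
    /// A variable introduced by canonicalization.
    CanonicalVar,
    /// Function pointer with a single argument type.
    FnPtr(Box<Ty>),
}

/// Flags are computed bottom-up, so a bound type anywhere inside sets the bit.
fn flags(ty: &Ty) -> u32 {
    match ty {
        Ty::Int => 0,
        Ty::Bound => HAS_TY_BOUND,
        Ty::CanonicalVar => HAS_CANONICAL_BOUND,
        Ty::FnPtr(arg) => flags(arg),
    }
}

/// True if any bit of `mask` is set on the type.
fn has_type_flags(ty: &Ty, mask: u32) -> bool {
    (flags(ty) & mask) != 0
}
```

In this model, a function pointer whose argument is a bound type variable (the body of a hypothetical `for<T> fn(T)`) reports `HAS_TY_BOUND` even though the variable is bound inside the type itself, which is the situation in which the assertion would fire.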
can you add a next-solver revision to the previously very slow test
Finished benchmarking commit (f13b256): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never

Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage): Results (primary -3.7%, secondary -4.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles: Results (secondary 0.3%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size: Results (primary 0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 470.099s -> 472.362s (0.48%)
f0434cc to 050d60f
r=me after you fix CI :3
050d60f to d1bbd39
@bors r=lcnr
☀️ Test successful - checks-actions
What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 42b384e (parent) -> 4b9c62b (this PR)

Test differences: 48 test diffs (Stage 1, Stage 2). Additionally, 42 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard: Run
cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 4b9c62b4da3e17cee99d3d2052f1c576b188e2a8 --output-dir test-dashboard
And then open

Job duration changes
How to interpret the job duration changes? Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (4b9c62b): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR. Next Steps:
@rustbot label: +perf-regression

Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage): Results (primary -0.8%, secondary -4.5%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles: Results (primary -2.8%, secondary -2.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size: Results (primary 0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 471.941s -> 471.25s (-0.15%)
perf triage: Improvements outweigh regressions, but some of those improvements are noise (clap-derive and syn). Main regressions match pre-merge results, so I assume this was deemed acceptable as a part of work on new solver, but I don't see any explicit justification.
@rustbot label: +perf-regression-triaged
Perf here is somewhat irrelevant, since this change fixes a hang with the next solver that isn't covered by the test suite.
See #t-types/trait-system-refactor > perf `async-closures/post-mono-higher-ranked-hang.rs` for context
Things compile and tests pass, but not sure if this actually solves the perf issue (edit: it does). Opening up this to do a perf (and maybe crater) run.
r? lcnr
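For readers without access to the linked thread, the following is a rough sketch of what splitting a bound-variable index into a "bound" and a "canonical" case could look like. The type and method names here (`DebruijnIndex`, `BoundVarIndexKind`, `shifted_in`) are assumptions made for illustration and may not match the code in this PR; the sketch also does not claim to show why the change resolves the hang.

```rust
/// De Bruijn index: how many binders to skip outward to reach the binder.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DebruijnIndex(u32);

/// Hypothetical split of the index half of a bound variable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BoundVarIndexKind {
    /// Bound by a `for<...>` binder at the given depth.
    Bound(DebruijnIndex),
    /// Refers to a canonical variable, which is not tied to any binder depth.
    Canonical,
}

impl BoundVarIndexKind {
    /// Adjusts the index when the value is moved under `amount` more binders.
    /// Only the `Bound` case carries a depth, so only it changes.
    fn shifted_in(self, amount: u32) -> Self {
        match self {
            BoundVarIndexKind::Bound(DebruijnIndex(depth)) => {
                BoundVarIndexKind::Bound(DebruijnIndex(depth + amount))
            }
            BoundVarIndexKind::Canonical => BoundVarIndexKind::Canonical,
        }
    }
}
```

One consequence of a representation like this is that canonical variables can be told apart from binder-bound variables by a plain match, without comparing depths.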