-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[AArch64] Neoverse V2 FeatureDisableLatencySchedHeuristic #140897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] Neoverse V2 FeatureDisableLatencySchedHeuristic #140897
Conversation
This adds FeatureDisableLatencySchedHeuristic to the Neoverse V2 core tuning description. This gives us a 20% improvement on a key workload, some other minor improvements here and there, and no real regressions; nothing outside the noise levels. Earlier attempts to solve this problems included disabling the MI scheduler entirely (llvm#127784), and llvm#139557 was about a heuristic to not schedule hand-written vector code. This solution is preferred because it avoids another heuristic and achieves what we want, and for what is worth, there is a lot of precedent for setting this feature. Thanks to: - Ricardo Jesus for pointing out this subtarget feature, and - Cameron McInally for the extensive performance testing.
@llvm/pr-subscribers-backend-aarch64 Author: Sjoerd Meijer (sjoerdmeijer) ChangesThis adds FeatureDisableLatencySchedHeuristic to the Neoverse V2 core tuning description. This gives us a 20% improvement on a key workload, some other minor improvements here and there, and no real regressions; nothing outside the noise levels. Earlier attempts to solve this problems included disabling the MI scheduler entirely (#127784), and #139557 was about a heuristic to not schedule hand-written vector code. This solution is preferred because it avoids another heuristic and achieves what we want, and for what is worth, there is a lot of precedent for setting this feature. Thanks to:
Full diff: https://github.com/llvm/llvm-project/pull/140897.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index a5f0f6a2eb150..e7a3527202f6a 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -561,7 +561,8 @@ def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2
FeatureEnableSelectOptimize,
FeatureUseFixedOverScalableIfEqualCost,
FeatureAvoidLDAPUR,
- FeaturePredictableSelectIsExpensive]>;
+ FeaturePredictableSelectIsExpensive,
+ FeatureDisableLatencySchedHeuristic]>;
def TuneNeoverseV3 : SubtargetFeature<"neoversev3", "ARMProcFamily", "NeoverseV3",
"Neoverse V3 ARM processors", [
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have looked at this and this looks good to me, but maybe give it a day or so before merging so that everyone can share their thoughts. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah SGTM. (And I don't know a good way to test it either).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also LGTM, can revisit if we notice any fallout for workloads we're interested in.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/29556 Here is the relevant piece of the build log for the reference
|
This adds FeatureDisableLatencySchedHeuristic to the Neoverse V2 core tuning description. This gives us a 20% improvement on a key workload, some other minor improvements here and there, and no real regressions; nothing outside the noise levels. Earlier attempts to solve this problems included disabling the MI scheduler entirely (llvm#127784), and llvm#139557 was about a heuristic to not schedule hand-written vector code. This solution is preferred because it avoids another heuristic and achieves what we want, and for what is worth, there is a lot of precedent for setting this feature. Thanks to: - Ricardo Jesus for pointing out this subtarget feature, and - Cameron McInally for the extensive performance testing.
This adds FeatureDisableLatencySchedHeuristic to the Neoverse V2 core tuning description. This gives us a 20% improvement on a key workload, some other minor improvements here and there, and no real regressions; nothing outside the noise levels. Earlier attempts to solve this problems included disabling the MI scheduler entirely (llvm#127784), and llvm#139557 was about a heuristic to not schedule hand-written vector code. This solution is preferred because it avoids another heuristic and achieves what we want, and for what is worth, there is a lot of precedent for setting this feature. Thanks to: - Ricardo Jesus for pointing out this subtarget feature, and - Cameron McInally for the extensive performance testing.
This adds FeatureDisableLatencySchedHeuristic to the Neoverse V2 core tuning description. This gives us a 20% improvement on a key workload, some other minor improvements here and there, and no real regressions; nothing outside the noise levels. Earlier attempts to solve this problems included disabling the MI scheduler entirely (llvm#127784), and llvm#139557 was about a heuristic to not schedule hand-written vector code. This solution is preferred because it avoids another heuristic and achieves what we want, and for what is worth, there is a lot of precedent for setting this feature. Thanks to: - Ricardo Jesus for pointing out this subtarget feature, and - Cameron McInally for the extensive performance testing.
Hi, @sjoerdmeijer! There’s a bit of a story here: PR #136329 caused about a 1.8% performance regression in 525.x264 on Neoverse-V2 concentrated in With my current patch, we’re only recovering around 0.3–0.4% of the lost performance, while reverting your patch on top gives an additional 1.4–1.5% improvement. I’m not sure if this is something we can address easily - the scheduler is tricky territory - but I just wanted to bring it to your attention in case you have any thoughts. |
Hi @igogo-x86 , sorry for the late reply, thanks for sharing this. We have certainly seen some up and down behaviour too. But there were a lot more ups than downs (and a big up), making this overall a good change. We do have a follow up task on our backlog, we are interested to see if we can claw back one of the regressions. But like you say, this is tricky business because of scheduler and regalloc interaction, and there are different ways of solving this, so we will have to see. I don't think I can get more concrete than this at this point. The x264 problem sounds a bit different though, so I am going to have a look at #146694 and see if I can help with that one. |
Ah, sorry, I misunderstood. It was the SLP patch that caused a slow down first. And then you found that this patch is interacting, and is not really helping. Do you know why, by any chance? |
LLVM IR is exactly the same before the SLP patch (#136329) and after my fix (#140897). However, the performance was still worse. So I looked at the assembly and saw that the instructions were identical, but the order was different. I then bisected the issue down to your change. Here are the
P.S. Something improved |
This adds FeatureDisableLatencySchedHeuristic to the Neoverse V2 core tuning description. This gives us a 20% improvement on a key workload, some other minor improvements here and there, and no real regressions; nothing outside the noise levels.
Earlier attempts to solve this problems included disabling the MI scheduler entirely (#127784), and #139557 was about a heuristic to not schedule hand-written vector code. This solution is preferred because it avoids another heuristic and achieves what we want, and for what is worth, there is a lot of precedent for setting this feature.
Thanks to: