Skip to content

Conversation

@tomershafir
Copy link
Contributor

@tomershafir tomershafir commented Jun 11, 2025

  • Add a gpr32 suffix to test name to denote the specific register class being checked
  • Expand -mtriple=arm64-apple-ios to -march=arm64 to broaden the test context to the generic architecture, as the specific triple is not required
  • Port bl match to Linux too via the regex: {{_?foo}}
  • Advance -mcpu=cyclone to the newer M series major -mcpu=apple-m1
  • Use -mcpu so that -mattr=-zcm has a real effect
  • Add a test that generic arm64 doesn't optimize for ZCM
  • Distinguish 4 different assembly layouts: NOTCPU, CPU, NOTATTR, ATTR
  • Fix broken test logic, for example: ; NOT: mov [[REG2:w[0-9]+]], w3 matched mov w1, w3 then REG2 captured w1 but then ; NOT: mov w1, [[REG2]] matched by prefix mov, w1, w19 even though it should have matched mov w1, w1. This change adds explicit matches for all of the generated copies.

@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Tomer Shafir (tomershafir)

Changes
  • Add a gpr32 suffix to test name to denote the specific register class being checked
  • Expand -mtriple=arm64-apple-ios to -march=arm64 to broaden the test context to the generic architecture, as the specific triple is not required
  • Advance -mcpu=cyclone to the newer M series major -mcpu=apple-m1
  • Use -mcpu instead of -mtriple so that -mattr=-zcm has a real effect
  • Remove uneeded -mtriple where -mcpu is the config that enables ZCM optimization
  • Add a test that generic arm64 doesn't optimize for ZCM
  • Distinguish 4 different assembly layouts: NOTCPU, CPU, NOTATTR, ATTR
  • Fix broken test logic, for example: ; NOT: mov [[REG2:w[0-9]+]], w3 matched mov w1, w3 then REG2 captured w1 but then ; NOT: mov w1, [[REG2]] matched by prefix mov, w1, w19 even though it should have matched mov w1, w1. This change adds explicit matches for all of the generated copies.

Full diff: https://github.com/llvm/llvm-project/pull/143680.diff

2 Files Affected:

  • (added) llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll (+39)
  • (removed) llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll (-23)
diff --git a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll
new file mode 100644
index 0000000000000..9891da6622600
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll
@@ -0,0 +1,39 @@
+; RUN: llc < %s -march=arm64 | FileCheck %s -check-prefixes=CHECK,NOTCPU
+; RUN: llc < %s -mcpu=apple-m1 | FileCheck %s -check-prefixes=CHECK,CPU
+; RUN: llc < %s -mcpu=apple-m1 -mattr=-zcm | FileCheck %s -check-prefixes=CHECK,NOTATTR
+; RUN: llc < %s -march=arm64 -mattr=+zcm | FileCheck %s -check-prefixes=CHECK,ATTR
+
+define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {
+entry:
+; CHECK-LABEL: t:
+; NOTCPU: mov w0, w2
+; NOTCPU: mov w1, w3
+; NOTCPU: mov [[REG2:w[0-9]+]], w3
+; NOTCPU: mov [[REG1:w[0-9]+]], w2
+; CPU: mov [[REG2:x[0-9]+]], x3
+; CPU: mov [[REG1:x[0-9]+]], x2
+; CPU: mov x0, x2
+; CPU: mov x1, x3
+; NOTATTR: mov [[REG2:w[0-9]+]], w3
+; NOTATTR: mov [[REG1:w[0-9]+]], w2
+; NOTATTR: mov w0, w2
+; NOTATTR: mov w1, w3
+; ATTR: mov x0, x2
+; ATTR: mov x1, x3
+; ATTR: mov [[REG2:x[0-9]+]], x3
+; ATTR: mov [[REG1:x[0-9]+]], x2
+; CHECK: bl _foo
+; NOTCPU: mov w0, [[REG1]]
+; NOTCPU: mov w1, [[REG2]]
+; CPU: mov x0, [[REG1]]
+; CPU: mov x1, [[REG2]]
+; NOTATTR: mov w0, [[REG1]]
+; NOTATTR: mov w1, [[REG2]]
+; ATTR: mov x0, [[REG1]]
+; ATTR: mov x1, [[REG2]]
+  %call = call i32 @foo(i32 %c, i32 %d) nounwind
+  %call1 = call i32 @foo(i32 %c, i32 %d) nounwind
+  unreachable
+}
+
+declare i32 @foo(i32, i32)
diff --git a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
deleted file mode 100644
index b390853d44bff..0000000000000
--- a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
+++ /dev/null
@@ -1,23 +0,0 @@
-; RUN: llc < %s -mtriple=arm64-apple-ios -mattr=-zcm   | FileCheck %s -check-prefixes=CHECK,NOT
-; RUN: llc < %s -mtriple=arm64-apple-ios -mattr=+zcm   | FileCheck %s -check-prefixes=CHECK,YES
-; RUN: llc < %s -mtriple=arm64-apple-ios -mcpu=cyclone | FileCheck %s -check-prefixes=CHECK,YES
-
-; rdar://12254953
-define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {
-entry:
-; CHECK-LABEL: t:
-; NOT: mov [[REG2:w[0-9]+]], w3
-; NOT: mov [[REG1:w[0-9]+]], w2
-; YES: mov [[REG2:x[0-9]+]], x3
-; YES: mov [[REG1:x[0-9]+]], x2
-; CHECK: bl _foo
-; NOT: mov w0, [[REG1]]
-; NOT: mov w1, [[REG2]]
-; YES: mov x0, [[REG1]]
-; YES: mov x1, [[REG2]]
-  %call = call i32 @foo(i32 %c, i32 %d) nounwind
-  %call1 = call i32 @foo(i32 %c, i32 %d) nounwind
-  unreachable
-}
-
-declare i32 @foo(i32, i32)

@tomershafir tomershafir force-pushed the aarch64-improve-0-cycle-regmove-test branch 2 times, most recently from 1ac2414 to 8e6cb06 Compare June 11, 2025 12:28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {
define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) {

nounwind/ssp shouldn't be needed I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big is the full assembly for the function? Would it make sense to auto-generate the check lines via llvm/utils/update_llc_test_checks.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its only 20-30 lines of assembly, I think its small enough to keep it manual and more manageable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interleaving off checks for different configuration makes it very hard to see what's going on.

I would expect that CPU/ATTR check lines are the same? Same for NOTATTR/NOTCPU?

If that's the case, they should be merged. Then it would make sense to auto-generate and check the full assembly for the case with and without the feature enabled (either via CPU or attribute)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So CPU/ATTR differ because -mcpu changes the order of instructions. Same for NOTATTR/NOTCPU. I can separate each configuration by duplicating ; CHECK: bl {{_?foo}}, or autogen it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will separate the outputs as described, as I dont think the cons of autogenerating the test worth it here (overfitting, robustness, readability, harder to debug)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So CPU/ATTR differ because -mcpu changes the order of instructions. Same for NOTATTR/NOTCPU. I can separate each configuration by duplicating ; CHECK: bl {{_?foo}}, or autogen it.

Ah I see, thanks. Separating the output makes things clearer.

@tomershafir tomershafir force-pushed the aarch64-improve-0-cycle-regmove-test branch from 8e6cb06 to 916c109 Compare June 12, 2025 12:58
@tomershafir
Copy link
Contributor Author

tomershafir commented Jun 12, 2025

v2:

  • Remove nownwind and ssp from t function decleration
  • Remove nounwind from foo call sites

@tomershafir tomershafir requested a review from fhahn June 12, 2025 13:00
@tomershafir tomershafir force-pushed the aarch64-improve-0-cycle-regmove-test branch 2 times, most recently from c0fd9b7 to 0034d15 Compare June 13, 2025 06:53
@tomershafir
Copy link
Contributor Author

v3:

  • Separate different outputs for readability

@tomershafir
Copy link
Contributor Author

@fhahn Ping

@tomershafir tomershafir force-pushed the aarch64-improve-0-cycle-regmove-test branch from 0034d15 to 7c835ad Compare June 15, 2025 19:39
@tomershafir
Copy link
Contributor Author

v4:

  • Add --match-full-lines param to FileCheck

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks with the suggested edits

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also check that the call comes next

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So CPU/ATTR differ because -mcpu changes the order of instructions. Same for NOTATTR/NOTCPU. I can separate each configuration by duplicating ; CHECK: bl {{_?foo}}, or autogen it.

Ah I see, thanks. Separating the output makes things clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) {
define void @t(i32 %a, i32 %b, i32 %c, i32 %d) {

nothing returned

@tomershafir tomershafir force-pushed the aarch64-improve-0-cycle-regmove-test branch from 7c835ad to 13052e5 Compare June 18, 2025 07:11
@tomershafir tomershafir force-pushed the aarch64-improve-0-cycle-regmove-test branch 2 times, most recently from 53d57cc to ec31b1b Compare June 18, 2025 08:33
- Add a `gpr32` suffix to test name to denote the specific register class being checked
- Expand `-mtriple=arm64-apple-ios` to `-march=arm64` to broaden the test context to the generic architecture, as the specific triple is not required
- Port `bl` match to Linux too via the regex: `{{_?foo}}`
- Advance `-mcpu=cyclone` to the newer M series major `-mcpu=apple-m1`
- Use `-mcpu` so that `-mattr=-zcm` has a real effect
- Add a test that generic arm64 doesn't optimize for ZCM
- Distinguish 4 different assembly layouts: NOTCPU, CPU, NOTATTR, ATTR
- Fix broken test logic, for example: `; NOT: mov [[REG2:w[0-9]+]], w3` matched `mov w1, w3` then `REG2` captured `w1` but then `; NOT: mov w1, [[REG2]]` matched by prefix `mov, w1, w19` even though it should have matched `mov w1, w1`. This change adds explicit matches for all of the generated copies, and `--match-full-lines` param to FileCheck.
- Remove nownwind and ssp from t function decleration
- Remove nounwind from foo call sites
- Separate different outputs for readability
- Change return type from `i32` to `void`
- FileCheck exact position of calls with `*-NEXT`
@tomershafir tomershafir force-pushed the aarch64-improve-0-cycle-regmove-test branch from ec31b1b to 9cc45c9 Compare June 18, 2025 09:10
@tomershafir
Copy link
Contributor Author

v5:

  • Change return type from i32 to void
  • FileCheck exact position of calls

@fhahn fhahn merged commit 835d303 into llvm:main Jun 18, 2025
7 checks passed
@tomershafir tomershafir deleted the aarch64-improve-0-cycle-regmove-test branch June 18, 2025 18:22
tomershafir added a commit that referenced this pull request Jun 26, 2025
This change emits optimized copy instructions for FPR32, FPR16, FPR8
register classes on targets that support it. The implementation is
similar to what has been done for GPR32. It adds 2 regression tests for
FPR32 and FPR16.

Depends on: #143680 to resolve
the test structure.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 26, 2025
…(#144152)

This change emits optimized copy instructions for FPR32, FPR16, FPR8
register classes on targets that support it. The implementation is
similar to what has been done for GPR32. It adds 2 regression tests for
FPR32 and FPR16.

Depends on: llvm/llvm-project#143680 to resolve
the test structure.
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
This change emits optimized copy instructions for FPR32, FPR16, FPR8
register classes on targets that support it. The implementation is
similar to what has been done for GPR32. It adds 2 regression tests for
FPR32 and FPR16.

Depends on: llvm#143680 to resolve
the test structure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants