Skip to content

Conversation

@dtellenbach
Copy link
Member

@dtellenbach dtellenbach commented Aug 19, 2025

Fix incorrect super-register lookup when copying from $wzr on subtargets that lack zero-cycle zeroing but support 64-bit zero-cycle moves.

When copying from $wzr, we used the wrong register class to lookup the super-register, causing

$w0 = COPY $wzr

to get expanded as

$x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,

rather than the correct

$x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.

Fix incorrect super-register lookup when copying from $wzr on
subtargets that lack zero-cycle zeroing but support 64-bit zero-cycle
moves.

When copying from $wzr, we used the wrong register class to lookup the
super-register, causing

  $w0 = COPY $wzr

to get expanded as

  $x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,

rather than the correct

  $x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.
@llvmbot
Copy link
Member

llvmbot commented Aug 19, 2025

@llvm/pr-subscribers-backend-aarch64

Author: David Tellenbach (dtellenbach)

Changes

Fix incorrect super-register lookup when copying from $wzr on subtargets that lack zero-cycle zeroing but support 64-bit zero-cycle moves.

When copying from $wzr, we used the wrong register class to lookup the super-register, causing

$w0 = COPY $wzr

to get expanded as

$x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,

rather than the correct

$x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.


Full diff: https://github.com/llvm/llvm-project/pull/154362.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+7-2)
  • (added) llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir (+109)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index c987a2950f152..d15f90deba74e 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -5085,8 +5085,13 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
         // Cyclone recognizes "ORR Xd, XZR, Xm" as a zero-cycle register move.
         MCRegister DestRegX = TRI->getMatchingSuperReg(
             DestReg, AArch64::sub_32, &AArch64::GPR64spRegClass);
-        MCRegister SrcRegX = TRI->getMatchingSuperReg(
-            SrcReg, AArch64::sub_32, &AArch64::GPR64spRegClass);
+        assert(DestRegX.isValid() && "Destination super-reg not valid");
+        MCRegister SrcRegX =
+            SrcReg == AArch64::WZR
+                ? AArch64::XZR
+                : TRI->getMatchingSuperReg(SrcReg, AArch64::sub_32,
+                                           &AArch64::GPR64spRegClass);
+        assert(SrcRegX.isValid() && "Source super-reg not valid");
         // This instruction is reading and writing X registers.  This may upset
         // the register scavenger and machine verifier, so we need to indicate
         // that we are reading an undefined value from SrcRegX, but a proper
diff --git a/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir b/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
new file mode 100644
index 0000000000000..76b5b76130657
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
@@ -0,0 +1,109 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -o - -mtriple=arm64-apple-ios -run-pass=postrapseudos -simplify-mir -verify-machineinstrs -mattr="-zcm-gpr32,-zcm-gpr64,-zcz" %s \
+# RUN:    | FileCheck --check-prefix=CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ %s
+# RUN: llc -o - -mtriple=arm64-apple-ios -run-pass=postrapseudos -simplify-mir -verify-machineinstrs -mattr="+zcm-gpr32,-zcm-gpr64,-zcz" %s \
+# RUN:    | FileCheck --check-prefix=CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ %s
+# RUN: llc -o - -mtriple=arm64-apple-ios -run-pass=postrapseudos -simplify-mir -verify-machineinstrs -mattr="-zcm-gpr32,+zcm-gpr64,-zcz" %s \
+# RUN:    | FileCheck --check-prefix=CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ %s
+# RUN: llc -o - -mtriple=arm64-apple-ios -run-pass=postrapseudos -simplify-mir -verify-machineinstrs -mattr="+zcm-gpr32,+zcm-gpr64,-zcz" %s \
+# RUN:    | FileCheck --check-prefix=CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ %s
+# RUN: llc -o - -mtriple=arm64-apple-ios -run-pass=postrapseudos -simplify-mir -verify-machineinstrs -mattr="-zcm-gpr32,-zcm-gpr64,+zcz" %s \
+# RUN:    | FileCheck --check-prefix=CHECK-NO-ZCM-ZCZ %s
+# RUN: llc -o - -mtriple=arm64-apple-ios -run-pass=postrapseudos -simplify-mir -verify-machineinstrs -mattr="+zcm-gpr32,+zcm-gpr64,+zcz" %s \
+# RUN:    | FileCheck --check-prefix=CHECK-ZCM-ZCZ %s
+
+--- |
+  define void @f0(i64 noundef %x) { ret void }
+  define void @f1(i64 noundef %x) { ret void }
+  define void @f2(i32 noundef) { ret void }
+...
+---
+name:            f0
+liveins:
+  - { reg: '$x0' }
+body:             |
+  bb.0:
+    liveins: $x0, $lr
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-LABEL: name: f0
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: $w0 = ORRWrr $wzr, $wzr
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-LABEL: name: f0
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: $w0 = ORRWrr $wzr, $wzr
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-LABEL: name: f0
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $x0 = ORRXrr $xzr, undef $xzr, implicit $wzr
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-LABEL: name: f0
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $w0 = ORRWrr $wzr, $wzr
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-NO-ZCM-ZCZ-LABEL: name: f0
+    ; CHECK-NO-ZCM-ZCZ: liveins: $x0, $lr
+    ; CHECK-NO-ZCM-ZCZ-NEXT: {{  $}}
+    ; CHECK-NO-ZCM-ZCZ-NEXT: $w0 = MOVZWi 0, 0
+    ; CHECK-NO-ZCM-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-ZCM-ZCZ-LABEL: name: f0
+    ; CHECK-ZCM-ZCZ: liveins: $x0, $lr
+    ; CHECK-ZCM-ZCZ-NEXT: {{  $}}
+    ; CHECK-ZCM-ZCZ-NEXT: $w0 = MOVZWi 0, 0
+    ; CHECK-ZCM-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    $w0 = COPY $wzr
+    BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+...
+---
+name:            f1
+liveins:
+  - { reg: '$x0' }
+body:             |
+  bb.0:
+    liveins: $x0, $lr
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-LABEL: name: f1
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: $x0 = ORRXrr $xzr, $xzr
+    ; CHECK-NO-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-LABEL: name: f1
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: $x0 = ORRXrr $xzr, $xzr
+    ; CHECK-ZCM-GPR32-NO-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-LABEL: name: f1
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $x0 = ORRXrr $xzr, $xzr
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-LABEL: name: f1
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $x0 = ORRXrr $xzr, $xzr
+    ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-NO-ZCM-ZCZ-LABEL: name: f1
+    ; CHECK-NO-ZCM-ZCZ: liveins: $x0, $lr
+    ; CHECK-NO-ZCM-ZCZ-NEXT: {{  $}}
+    ; CHECK-NO-ZCM-ZCZ-NEXT: $x0 = MOVZXi 0, 0
+    ; CHECK-NO-ZCM-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ;
+    ; CHECK-ZCM-ZCZ-LABEL: name: f1
+    ; CHECK-ZCM-ZCZ: liveins: $x0, $lr
+    ; CHECK-ZCM-ZCZ-NEXT: {{  $}}
+    ; CHECK-ZCM-ZCZ-NEXT: $x0 = MOVZXi 0, 0
+    ; CHECK-ZCM-ZCZ-NEXT:BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    $x0 = COPY $xzr
+    BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+...

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The undef seems odd and it could avoid the implicit, but SGTM as a fix.

Copy link
Contributor

@tomershafir tomershafir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

@@ -0,0 +1,109 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't the test miss 2 cases +zcm-gpr32,-zcm-gpr64,+zcz and -zcm-gpr32,+zcm-gpr64,+zcz?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but that's not exercising any other control-flow which because of +zcz.

SrcReg, AArch64::sub_32, &AArch64::GPR64spRegClass);
assert(DestRegX.isValid() && "Destination super-reg not valid");
MCRegister SrcRegX =
SrcReg == AArch64::WZR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add if (SrcReg == AArch64::WZR) and have 2 ways to build the MI to avoid undef and implicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to check if that's really okay or causes any problems. There is a comment by Tim Northover about using undef to help register allocation so we shouldn't blindly remove it.

Following Dave's reasoning here that this is good enough to fix the real issue.

@dtellenbach dtellenbach merged commit 0542355 into llvm:main Aug 19, 2025
9 checks passed
tomershafir pushed a commit to swiftlang/llvm-project that referenced this pull request Nov 11, 2025
Fix incorrect super-register lookup when copying from $wzr on subtargets
that lack zero-cycle zeroing but support 64-bit zero-cycle moves.

When copying from $wzr, we used the wrong register class to lookup the
super-register, causing

    $w0 = COPY $wzr

to get expanded as

    $x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,

rather than the correct

    $x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.

(cherry-pick b51486f88a7f90719a1e744c828738e717ba5ffc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants