
Conversation

@a4lg (Contributor) commented Sep 11, 2025

Using inline assembly with .insn could avoid the current issues around intrinsics that should be available when either Zknd or Zkne is enabled.

Note: this is intentionally incomplete, as the corresponding rustc changes have not been made yet.
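
As an illustration of the approach, here is a minimal hypothetical sketch of one such intrinsic (aes64ds) exposed through .insn. The wrapper name is made up, and the encoding (opcode 0x33, funct3 0x0, funct7 0x1d) follows the scalar-crypto specification; this is not necessarily the exact code in this PR.

use core::arch::asm;

#[inline]
unsafe fn aes64ds_via_insn(rs1: u64, rs2: u64) -> u64 {
    let rd: u64;
    unsafe {
        // `.insn r` emits a raw R-type instruction, so the assembler accepts
        // it even when Zknd is not enabled at assembly time.
        asm!(
            ".insn r 0x33, 0x0, 0x1d, {rd}, {rs1}, {rs2}",
            rd = lateout(reg) rd,
            rs1 = in(reg) rs1,
            rs2 = in(reg) rs2,
            options(nomem, nostack),
        );
    }
    rd
}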

@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from f31c152 to bc825ff on September 11, 2025 06:09
@a4lg changed the title from "[DRAFT] RISC-V: Test .insn on Zknd or Zkne intrinsics" to "[DRAFT] RISC-V: Test .insn on Zkne or Zknd intrinsics" on Sep 11, 2025
@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from bc825ff to 3ad3868 on September 11, 2025 06:18
@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch 2 times, most recently from 7ee6e1e to 643b06a, on September 11, 2025 07:01
@a4lg changed the title from "[DRAFT] RISC-V: Test .insn on Zkne or Zknd intrinsics" to "[DRAFT] RISC-V: Use .insn on (Zkne or Zknd) intrinsics" on Sep 11, 2025
@sayantn (Contributor) commented Sep 11, 2025

My only problem with this approach is that it will interact badly with the optimizer. LLVM must treat all asm blocks as black boxes, so they are major hazards to optimization. And considering that these are cryptography instructions, which are meant for performance, this might not be the right choice.

Also, you can now do the rustc PR and the stdarch PR together, since stdarch is a subtree of the rustc repo rather than a submodule.

@a4lg (Contributor, Author) commented Sep 11, 2025

> My only problem with this approach is that it will interact badly with the optimizer. LLVM must treat all asm blocks as black boxes, so they are major hazards to optimization. And considering that these are cryptography instructions, which are meant for performance, this might not be the right choice.

I see.

It's worth experimenting, so I'll check later whether actual AES decryption code is heavily affected.

> Also, you can now do the rustc PR and the stdarch PR together, since stdarch is a subtree of the rustc repo rather than a submodule.

This is intentional: stdarch's CI hasn't been integrated into the main repository yet, and I wanted to test stdarch's CI/testing process.

I actually found that some of the tests (assert_instr) pass by accident (due to undocumented behavior of LLVM's disassembler), so it was worth it.

@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from 643b06a to 1209865 on September 12, 2025 06:17
**DRAFT: THIS COMMIT SHALL BE SQUASHED INTO THE NEXT COMMIT.**

Because the rustc changes have not been made yet, we cannot switch the
`#[target_feature]` attribute to enable the desired features.

Instead, this commit just comments out `#[target_feature]`.
@a4lg (Contributor, Author) commented Sep 13, 2025

It seems that even if we use .insn and inline assembly, the output is pretty much the same, as long as the function using the intrinsics enables the same target features (through #[target_feature] or Rust compiler options) as the #[target_feature] attribute on each intrinsic.

The code below implements steps 1 and 2 of an AES-256 decryption pipeline.

  1. Key scheduling: expand the 256-bit (32-byte) AES key into a 240-byte scheduled key (the same routine as for encryption).
  2. Key conversion for decryption: because key scheduling (1.) produces a key suited to encryption, we convert the scheduled key into a decryption-oriented one for more efficient decryption.
  3. Actual AES-256 decryption (not part of this PR; a hypothetical sketch follows the code below): it would consume 16-byte block(s) and the scheduled key (the output of 2.) in ECB mode.
#![feature(riscv_ext_intrinsics)]
#![allow(clippy::identity_op)]

use core::arch::riscv64::*;

/*
    Key Sizes:

    AES-256 key:
        256-bit key data
        == 4 u64 words
    AES-256 scheduled key:
        128-bit block * (14 + 1)  ; where 14 is the number of AES-256 rounds
        == 30 u64 words
*/

// Adjust #[target_feature] attributes below depending on the configuration and build.

// #[target_feature(enable = "zknd")]
// #[target_feature(enable = "zknd", enable = "zkne")]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_key_schedule(key: *const [u64; 4], scheduled_key: *mut [u64; 30]) {
    unsafe {
        let key: &[u64; 4] = &*key;
        let mut rk0 = key[0];
        let mut rk1 = key[1];
        let mut rk2 = key[2];
        let mut rk3 = key[3];
        let scheduled_key: &mut [u64; 30] = &mut *scheduled_key;
        scheduled_key[0] = rk0;
        scheduled_key[1] = rk1;
        scheduled_key[2] = rk2;
        scheduled_key[3] = rk3;
        macro_rules! double_round {
            ($i: expr) => {
                // First half of the double round: SubWord with round constant $i.
                let tmp = aes64ks1i::<$i>(rk3);
                rk0 = aes64ks2(tmp, rk0);
                rk1 = aes64ks2(rk0, rk1);
                // Second half: rnum 10 selects SubWord without a round constant
                // (the extra step used by the AES-256 key schedule).
                let tmp = aes64ks1i::<10>(rk1);
                rk2 = aes64ks2(tmp, rk2);
                rk3 = aes64ks2(rk2, rk3);
                scheduled_key[4 * ($i + 1) + 0] = rk0;
                scheduled_key[4 * ($i + 1) + 1] = rk1;
                scheduled_key[4 * ($i + 1) + 2] = rk2;
                scheduled_key[4 * ($i + 1) + 3] = rk3;
            };
        }
        double_round!(0);
        double_round!(1);
        double_round!(2);
        double_round!(3);
        double_round!(4);
        double_round!(5);
        // Process the tail: the last round key needs only the first half of a double round.
        let tmp = aes64ks1i::<6>(rk3);
        rk0 = aes64ks2(tmp, rk0);
        rk1 = aes64ks2(rk0, rk1);
        scheduled_key[4 * 7 + 0] = rk0;
        scheduled_key[4 * 7 + 1] = rk1;
    }
}

// #[target_feature(enable = "zknd")]
// #[target_feature(enable = "zknd", enable = "zkne")]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_key_schedule_for_decryption(
    scheduled_key: *const [u64; 30],
    scheduled_key_dec: *mut [u64; 30],
) {
    unsafe {
        let scheduled_key: &[u64; 30] = &*scheduled_key;
        let scheduled_key_dec: &mut [u64; 30] = &mut *scheduled_key_dec;
        // Copy the first scheduled key as-is.
        scheduled_key_dec[0] = scheduled_key[0];
        scheduled_key_dec[1] = scheduled_key[1];
        // Apply InvMixColumns (aes64im) to the round keys at indices 1..=13.
        macro_rules! invmc {
            ($i: expr) => {
                scheduled_key_dec[$i * 2 + 0] = aes64im(scheduled_key[$i * 2 + 0]);
                scheduled_key_dec[$i * 2 + 1] = aes64im(scheduled_key[$i * 2 + 1]);
            };
        }
        invmc!(1);
        invmc!(2);
        invmc!(3);
        invmc!(4);
        invmc!(5);
        invmc!(6);
        invmc!(7);
        invmc!(8);
        invmc!(9);
        invmc!(10);
        invmc!(11);
        invmc!(12);
        invmc!(13);
        // Copy the last scheduled key as-is.
        scheduled_key_dec[28] = scheduled_key[28];
        scheduled_key_dec[29] = scheduled_key[29];
    }
}
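
For completeness, here is a hypothetical sketch of step 3, which is not part of this PR: single-block AES-256 decryption (the ECB building block) using the converted key from aes256_key_schedule_for_decryption. The function name and signature are made up for illustration.

// #[target_feature(enable = "zknd")] // adjust as for the routines above
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_decrypt_block(
    block: *mut [u64; 2],
    scheduled_key_dec: *const [u64; 30],
) {
    unsafe {
        let rk: &[u64; 30] = &*scheduled_key_dec;
        let block: &mut [u64; 2] = &mut *block;
        // AddRoundKey with the last round key (words 28..=29).
        let mut s0 = block[0] ^ rk[28];
        let mut s1 = block[1] ^ rk[29];
        // Middle rounds 13..=1: InvShiftRows + InvSubBytes + InvMixColumns,
        // then AddRoundKey with the InvMixColumns-converted round key.
        for r in (1..=13usize).rev() {
            let t0 = aes64dsm(s0, s1);
            let t1 = aes64dsm(s1, s0);
            s0 = t0 ^ rk[2 * r];
            s1 = t1 ^ rk[2 * r + 1];
        }
        // Final round: no InvMixColumns; AddRoundKey with the first round key.
        block[0] = aes64ds(s0, s1) ^ rk[0];
        block[1] = aes64ds(s1, s0) ^ rk[1];
    }
}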

Using inline assembly with `.insn` could avoid the current issues around
intrinsics that should be available when either Zknd or Zkne is enabled.

Note that the coincidental success of the testing process depends on
possibly flaky disassembler behavior (it attempts to disassemble
instructions *not* enabled on the target of the output object file);
this reliance should be removed, either now or as soon as anything breaks.
@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from 1209865 to 1a47fad on September 13, 2025 06:44
@a4lg (Contributor, Author) commented Sep 13, 2025

Small update: added preserves_flags to the options.

While RISC-V is normally described as having no condition flags, by default the Rust compiler assumes that RISC-V inline assembly may overwrite some vector-related registers. Since we don't touch them, we should add preserves_flags to ensure that mixed code doesn't need to save and restore those registers.

This affects other intrinsics implemented with inline assembly as well, and I'll submit another (non-draft) PR for better efficiency in mixed RVV + non-vector intrinsic code.
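
On the hypothetical wrapper sketched in the PR description, this is a one-line change (only the changed line is shown; the .insn template and operands stay the same):

            options(nomem, nostack, preserves_flags),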

@a4lg (Contributor, Author) commented Sep 14, 2025

Also note that the pure option can make inline assembly optimization-friendly (it states that the block has no side effects other than its declared outputs).
Rust's asm! is, by default, the equivalent of __asm__ __volatile__ in GCC syntax, but the pure option turns it into plain __asm__, which allows rearranging instructions across the block and removing the block entirely.
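
Putting both changes together, a hypothetical final form of the wrapper from the PR description (same illustrative encoding assumptions as before; note that pure requires at least one output and either nomem or readonly):

use core::arch::asm;

#[inline]
unsafe fn aes64ds_via_insn(rs1: u64, rs2: u64) -> u64 {
    let rd: u64;
    unsafe {
        asm!(
            ".insn r 0x33, 0x0, 0x1d, {rd}, {rs1}, {rs2}",
            rd = lateout(reg) rd,
            rs1 = in(reg) rs1,
            rs2 = in(reg) rs2,
            // pure + nomem: no side effects and no memory access, so LLVM may
            // reorder this block or delete it entirely if `rd` is unused.
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    rd
}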

@sayantn (Contributor) commented Sep 18, 2025

OK, if it optimizes well enough, we can probably use the asm implementation then. Did you send the rustc PR yet?

@a4lg (Contributor, Author) commented Sep 18, 2025

@sayantn
I'll submit it after rust-lang/rust#146709 is merged.
By the way, I'd like to credit you in my commit (the zkne_or_zknd target-feature part); is a Co-authored-by trailer okay with you? If not, I'll credit you with a Helped-by trailer.

@sayantn (Contributor) commented Sep 18, 2025

I have no problem with that, but changes made in response to review comments typically don't need attribution, so you really don't have to credit me.

@a4lg (Contributor, Author) commented Sep 18, 2025

We didn't share code directly, but I think we helped each other, so Helped-by seems better (at first, I thought Co-authored-by was fine, since there aren't many ways to implement zkne_or_zknd and our experimental code should be pretty much the same).

I understand that attribution is completely optional, but since you were the first to examine possible fixes for issue #1765, I want to acknowledge your contribution in some way.

@a4lg (Contributor, Author) commented Sep 20, 2025

Closing this in favor of rust-lang/rust#146798.

In that PR, the inline assembly became symbolic (mnemonics rather than raw .insn encodings).

@a4lg closed this on Sep 20, 2025