
Conversation

@a4lg (Contributor) commented Sep 11, 2025

Using inline assembly with .insn could avoid the current issues around intrinsics that should be available when either Zknd or Zkne is enabled.

Note: this is intentionally incomplete, as the corresponding rustc changes have not been made yet.
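
As an illustration of the approach, here is a minimal hypothetical sketch of one such intrinsic (aes64ds) exposed through .insn. The wrapper name is made up, and the encoding (opcode 0x33, funct3 0x0, funct7 0x1d) follows the scalar-crypto specification; this is not necessarily the exact code in this PR.

use core::arch::asm;

#[inline]
unsafe fn aes64ds_via_insn(rs1: u64, rs2: u64) -> u64 {
    let rd: u64;
    unsafe {
        // `.insn r` emits a raw R-type instruction, so the assembler accepts
        // it even when Zknd is not enabled at assembly time.
        asm!(
            ".insn r 0x33, 0x0, 0x1d, {rd}, {rs1}, {rs2}",
            rd = lateout(reg) rd,
            rs1 = in(reg) rs1,
            rs2 = in(reg) rs2,
            options(nomem, nostack),
        );
    }
    rd
}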

@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from f31c152 to bc825ff on September 11, 2025 06:09
@a4lg changed the title from "[DRAFT] RISC-V: Test .insn on Zknd or Zkne intrinsics" to "[DRAFT] RISC-V: Test .insn on Zkne or Zknd intrinsics" on Sep 11, 2025
@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from bc825ff to 3ad3868 on September 11, 2025 06:18
@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch 2 times, most recently from 7ee6e1e to 643b06a, on September 11, 2025 07:01
@a4lg changed the title from "[DRAFT] RISC-V: Test .insn on Zkne or Zknd intrinsics" to "[DRAFT] RISC-V: Use .insn on (Zkne or Zknd) intrinsics" on Sep 11, 2025
@sayantn (Contributor) commented Sep 11, 2025

My only problem with this approach is that it will interact badly with the optimizer. LLVM must treat all asm blocks as black boxes, so they are major hazards to optimization. And considering that these are cryptography instructions, which are meant for performance, this might not be the right choice.

Also, you can now do the rustc PR and the stdarch PR together, since stdarch is a subtree of the rustc repo rather than a submodule.

@a4lg (Contributor, Author) commented Sep 11, 2025

> My only problem with this approach is that it will interact badly with the optimizer. LLVM must treat all asm blocks as black boxes, so they are major hazards to optimization. And considering that these are cryptography instructions, which are meant for performance, this might not be the right choice.

I see.

It's worth experimenting, so I'll check later whether actual AES decryption code is heavily affected.

> Also, you can now do the rustc PR and the stdarch PR together, since stdarch is a subtree of the rustc repo rather than a submodule.

This is intentional: stdarch's CI hasn't been integrated into the main repository yet, and I wanted to test stdarch's CI/testing process.

I actually found that some of the tests (assert_instr) pass by accident (due to undocumented behavior of LLVM's disassembler), so it was worth it.

@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from 643b06a to 1209865 on September 12, 2025 06:17
**DRAFT: THIS COMMIT SHALL BE SQUASHED INTO THE NEXT COMMIT.**

Because the rustc changes have not been made yet, we cannot switch the
`#[target_feature]` attribute to enable the desired features.

Instead, this commit just comments out `#[target_feature]`.
@a4lg (Contributor, Author) commented Sep 13, 2025

It seems that even if we use .insn and inline assembly, the output is pretty much the same, as long as the function using the intrinsics enables the same target features (through #[target_feature] or Rust compiler options) as the #[target_feature] attribute on each intrinsic.

The code below implements steps 1 and 2 of an AES-256 decryption pipeline.

  1. Key scheduling: expand the 256-bit (32-byte) AES key into a 240-byte scheduled key (the same routine as for encryption).
  2. Key conversion for decryption: because key scheduling (1.) produces a key suited to encryption, we convert the scheduled key into a decryption-oriented one for more efficient decryption.
  3. Actual AES-256 decryption (not part of this PR; a hypothetical sketch follows the code below): it would consume 16-byte block(s) and the scheduled key (the output of 2.) in ECB mode.
#![feature(riscv_ext_intrinsics)]
#![allow(clippy::identity_op)]

use core::arch::riscv64::*;

/*
    Key Sizes:

    AES-256 key:
        256-bit key data
        == 4 u64 words
    AES-256 scheduled key:
        128-bit block * (14 + 1)  ; where 14 is the number of AES-256 rounds
        == 30 u64 words
*/

// Adjust #[target_feature] attributes below depending on the configuration and build.

// #[target_feature(enable = "zknd")]
// #[target_feature(enable = "zknd", enable = "zkne")]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_key_schedule(key: *const [u64; 4], scheduled_key: *mut [u64; 30]) {
    unsafe {
        let key: &[u64; 4] = &*key;
        let mut rk0 = key[0];
        let mut rk1 = key[1];
        let mut rk2 = key[2];
        let mut rk3 = key[3];
        let scheduled_key: &mut [u64; 30] = &mut *scheduled_key;
        scheduled_key[0] = rk0;
        scheduled_key[1] = rk1;
        scheduled_key[2] = rk2;
        scheduled_key[3] = rk3;
        macro_rules! double_round {
            ($i: expr) => {
                // First half of the double round: SubWord with round constant $i.
                let tmp = aes64ks1i::<$i>(rk3);
                rk0 = aes64ks2(tmp, rk0);
                rk1 = aes64ks2(rk0, rk1);
                // Second half: rnum 10 selects SubWord without a round constant
                // (the extra step used by the AES-256 key schedule).
                let tmp = aes64ks1i::<10>(rk1);
                rk2 = aes64ks2(tmp, rk2);
                rk3 = aes64ks2(rk2, rk3);
                scheduled_key[4 * ($i + 1) + 0] = rk0;
                scheduled_key[4 * ($i + 1) + 1] = rk1;
                scheduled_key[4 * ($i + 1) + 2] = rk2;
                scheduled_key[4 * ($i + 1) + 3] = rk3;
            };
        }
        double_round!(0);
        double_round!(1);
        double_round!(2);
        double_round!(3);
        double_round!(4);
        double_round!(5);
        // Process the tail: the last round key needs only the first half of a double round.
        let tmp = aes64ks1i::<6>(rk3);
        rk0 = aes64ks2(tmp, rk0);
        rk1 = aes64ks2(rk0, rk1);
        scheduled_key[4 * 7 + 0] = rk0;
        scheduled_key[4 * 7 + 1] = rk1;
    }
}

// #[target_feature(enable = "zknd")]
// #[target_feature(enable = "zknd", enable = "zkne")]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_key_schedule_for_decryption(
    scheduled_key: *const [u64; 30],
    scheduled_key_dec: *mut [u64; 30],
) {
    unsafe {
        let scheduled_key: &[u64; 30] = &*scheduled_key;
        let scheduled_key_dec: &mut [u64; 30] = &mut *scheduled_key_dec;
        // Copy the first scheduled key as-is.
        scheduled_key_dec[0] = scheduled_key[0];
        scheduled_key_dec[1] = scheduled_key[1];
        // Apply InvMixColumns (aes64im) to the round keys at indices 1..=13.
        macro_rules! invmc {
            ($i: expr) => {
                scheduled_key_dec[$i * 2 + 0] = aes64im(scheduled_key[$i * 2 + 0]);
                scheduled_key_dec[$i * 2 + 1] = aes64im(scheduled_key[$i * 2 + 1]);
            };
        }
        invmc!(1);
        invmc!(2);
        invmc!(3);
        invmc!(4);
        invmc!(5);
        invmc!(6);
        invmc!(7);
        invmc!(8);
        invmc!(9);
        invmc!(10);
        invmc!(11);
        invmc!(12);
        invmc!(13);
        // Copy the last scheduled key as-is.
        scheduled_key_dec[28] = scheduled_key[28];
        scheduled_key_dec[29] = scheduled_key[29];
    }
}
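
For completeness, here is a hypothetical sketch of step 3, which is not part of this PR: single-block AES-256 decryption (the ECB building block) using the converted key from aes256_key_schedule_for_decryption. The function name and signature are made up for illustration.

// #[target_feature(enable = "zknd")] // adjust as for the routines above
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_decrypt_block(
    block: *mut [u64; 2],
    scheduled_key_dec: *const [u64; 30],
) {
    unsafe {
        let rk: &[u64; 30] = &*scheduled_key_dec;
        let block: &mut [u64; 2] = &mut *block;
        // AddRoundKey with the last round key (words 28..=29).
        let mut s0 = block[0] ^ rk[28];
        let mut s1 = block[1] ^ rk[29];
        // Middle rounds 13..=1: InvShiftRows + InvSubBytes + InvMixColumns,
        // then AddRoundKey with the InvMixColumns-converted round key.
        for r in (1..=13usize).rev() {
            let t0 = aes64dsm(s0, s1);
            let t1 = aes64dsm(s1, s0);
            s0 = t0 ^ rk[2 * r];
            s1 = t1 ^ rk[2 * r + 1];
        }
        // Final round: no InvMixColumns; AddRoundKey with the first round key.
        block[0] = aes64ds(s0, s1) ^ rk[0];
        block[1] = aes64ds(s1, s0) ^ rk[1];
    }
}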

Using inline assembly with `.insn` could avoid the current issues around
intrinsics that should be available when either Zknd or Zkne is enabled.

Note that the coincidental success of the testing process depends on
possibly flaky disassembler behavior (it attempts to disassemble
instructions *not* enabled on the target of the output object file);
this reliance should be removed, either now or as soon as anything breaks.
@a4lg force-pushed the test-riscv64-zknd-zkne-by-direct-asm branch from 1209865 to 1a47fad on September 13, 2025 06:44
@a4lg (Contributor, Author) commented Sep 13, 2025

Small update: added preserves_flags to the options.

While RISC-V is normally described as having no condition flags, by default the Rust compiler assumes that RISC-V inline assembly may overwrite some vector-related registers. Since we don't touch them, we should add preserves_flags to ensure that mixed code doesn't need to save and restore those registers.

This affects other intrinsics implemented with inline assembly as well, and I'll submit another (non-draft) PR for better efficiency in mixed RVV + non-vector intrinsic code.
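
On the hypothetical wrapper sketched in the PR description, this is a one-line change (only the changed line is shown; the .insn template and operands stay the same):

            options(nomem, nostack, preserves_flags),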

@a4lg (Contributor, Author) commented Sep 14, 2025

Also note that the pure option can make inline assembly optimization-friendly (it states that the block has no side effects other than its declared outputs).
Rust's asm! is, by default, the equivalent of __asm__ __volatile__ in GCC syntax, but the pure option turns it into plain __asm__, which allows rearranging instructions across the block and removing the block entirely.
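
Putting both changes together, a hypothetical final form of the wrapper from the PR description (same illustrative encoding assumptions as before; note that pure requires at least one output and either nomem or readonly):

use core::arch::asm;

#[inline]
unsafe fn aes64ds_via_insn(rs1: u64, rs2: u64) -> u64 {
    let rd: u64;
    unsafe {
        asm!(
            ".insn r 0x33, 0x0, 0x1d, {rd}, {rs1}, {rs2}",
            rd = lateout(reg) rd,
            rs1 = in(reg) rs1,
            rs2 = in(reg) rs2,
            // pure + nomem: no side effects and no memory access, so LLVM may
            // reorder this block or delete it entirely if `rd` is unused.
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    rd
}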

@sayantn (Contributor) commented Sep 18, 2025

OK, if it optimizes well enough, we can probably use the asm implementation then. Did you send the rustc PR yet?

@a4lg (Contributor, Author) commented Sep 18, 2025

@sayantn
I'll submit it after rust-lang/rust#146709 is merged.
By the way, I'd like to credit you in my commit (the zkne_or_zknd target-feature part); is a Co-authored-by trailer okay with you? If not, I'll credit you with a Helped-by trailer.

@sayantn (Contributor) commented Sep 18, 2025

I have no problem with that, but changes made in response to review comments typically don't need attribution, so you really don't have to credit me.

@a4lg (Contributor, Author) commented Sep 18, 2025

We didn't share code directly, but I think we helped each other, so Helped-by seems better (at first, I thought Co-authored-by was fine, since there aren't many ways to implement zkne_or_zknd and our experimental code should be pretty much the same).

I understand that attribution is completely optional, but since you were the first to examine possible fixes for issue #1765, I want to acknowledge your contribution in some way.

@a4lg (Contributor, Author) commented Sep 20, 2025

Closing this in favor of rust-lang/rust#146798.

In that PR, the inline assembly became symbolic (mnemonics rather than raw .insn encodings).

@a4lg closed this on Sep 20, 2025