Skip to content

Conversation

ananas-block
Copy link

@ananas-block ananas-block commented Jun 29, 2025

Issue

  • p-token uses partial_eq to compare pubkeys which is not optimal.

  • spl-token uses sol_memcmp which is better than partial_eq (program/src/processor.rs).

  • A custom pubkey_eq implementation can improve performance of many p-token instructions by 10s of CU see table.

    Instruction p-token main sol_memcmp pubkey_eq Savings (pubkey_eq - main)
    InitializeMint 148 CU 144 CU 143 CU -5 CU
    InitializeAccount 228 CU 227 CU 225 CU -3 CU
    Mint 187 CU 174 CU 148 CU -39 CU
    Transfer 189 CU 175 CU 144 CU -45 CU
    TransferChecked 256 CU - 186 CU -70 CU
    Batch 1,691 CU - 1,598 CU -93 CU

To reproduce CU measurements, see tx logs by transfer , transfer_checked, batch tests in branches:

Changes:

  • introduce pubkey_eq, cast 32 byte arrays to 4 u64 chunks, compare in a loop and exit early on unequal
  • replace pubkey comparisons with pubkey_eq

Notes:

  • u64 chunks perform significantly better than u16, u32, u128 chunks see this repo.
  • I have only run transfer, transfer_checked, and batch tests (building all tests doesn't work with 32gb ram). It is unlikely that performance of other instructions decreased with this change but best double check.
  • Maybe it makes sense to upstream this to pinocchio to use it in AccountInfo::is_owned_by.
  • Versions:
    solana-cargo-build-sbf 2.2.15
    platform-tools v1.48
    rustc 1.84.1

@joncinque joncinque requested a review from febo June 30, 2025 14:56
@joncinque
Copy link
Contributor

@febo can you look at this? I thought the compiler did the right thing here, but it might not be the case

@febo
Copy link
Contributor

febo commented Jun 30, 2025

@febo can you look at this? I thought the compiler did the right thing here, but it might not be the case

Interesting, this is what the compiler should be doing – I will check with @LucasSte. It might depend on what platform tools version is being used to benchmark.

@febo
Copy link
Contributor

febo commented Jul 1, 2025

On a small test program using platform-tools 1.48 (solana cli 2.2.15), pubkey comparisons are done using memcmp syscall. Not sure whether it is generating bloated code in the case of p-token or not.

I am not sure we can use the u64 chunks approach, since it requires the memory to be 8-bytes aligned, which we cannot guarantee in every case:

let a_chunks = unsafe { from_raw_parts(a.as_ptr() as *const u64, 4) };
let b_chunks = unsafe { from_raw_parts(b.as_ptr() as *const u64, 4) };

Another aspect that is interesting is that if you use platform-tools 1.43 (solana cli 2.1.22), CUs significantly improve for the same instruction – e.g., transfer goes from 189 to 153 without any code change.

@ananas-block
Copy link
Author

ananas-block commented Jul 1, 2025

True wrt alignment, put it in draft.

AccountInfos should be aligned to u64 right?

If so, alignment should be guaranteed for all uses of pubkey_eq except one:

 else if !pubkeys_eq(destination_account_info.key(), &INCINERATOR_ID)

Fyi, I had similar CU results in p-token tests with the following implementation,
but it performed significantly worse than this pr impl in my benchmark repo over 1k iterations.

#[inline(always)]
pub fn pubkeys_eq(a: &Pubkey, b: &Pubkey) -> bool {
    let a_bytes = a.as_ref();
    let b_bytes = b.as_ref();

    // Compare 8 bytes at a time using safe array slicing
    let a_chunks = [
        &a_bytes[0..8],
        &a_bytes[8..16],
        &a_bytes[16..24],
        &a_bytes[24..32],
    ];
    let b_chunks = [
        &b_bytes[0..8],
        &b_bytes[8..16],
        &b_bytes[16..24],
        &b_bytes[24..32],
    ];

    for i in 0..4 {
        if a_chunks[i] != b_chunks[i] {
            return false;
        }
    }
    true
}

@ananas-block ananas-block marked this pull request as draft July 1, 2025 17:26
@LucasSte
Copy link

LucasSte commented Jul 1, 2025

It is not clear the version of platform tools used for these benchmarks. The numbers for the latest released version v1.50 are in the table below. They are lower than the base for the benchmark, but still do not match the optimized version for this PR.

The compiler has a threshold for when to transform a chain of loads and compares into a syscall. Currently, we consider 3 sequences for that, since a comparison involves two loads and one compare. Thus, 3*3 = 9 CUs, while the syscall overhead is 10 CUs.

That does not account for the arguments set up we must do before invoking the syscall. We must adjust three registers which will hold the arguments for memcmp. If we account that and raise the threshold to 4 sequences (8 loads and 4 compares = 12 CUs), we achieve very good results (see the third column in the table).

Instruction v1.50 v1.50 + adjustments
InitializeMint 135 CU 121 CU
InitializeAccount 227 CU 176 CU
Mint 170 CU 142 CU
Transfer 176 CU 146 CU
TransferChecked 232 CU 184 CU
Batch 1,598 CU 1,387 CU

I'll update the compiler, and put this improvement in a new release.

When not invoking memcmp, the code the compiler emits for comparison is a chain of u64 comparisons, similar to what this PR proposes.

@febo
Copy link
Contributor

febo commented Jul 1, 2025

AccountInfos should be aligned to u64 right?

The AccountInfos are aligned, but the account data is not necessarily aligned. For example, the close account instruction compares the close authority pubkey stored on an Account in the validate_owner call and that close authority pubkey is not aligned to 8 bytes. Same situation happens with the freeze authority of a Mint account.

All signers of a Multisig are also not 8 bytes aligned.

@LucasSte
Copy link

LucasSte commented Jul 1, 2025

The fix is here: anza-xyz/llvm-project#160

@ananas-block
Copy link
Author

ananas-block commented Jul 1, 2025

It is not clear the version of platform tools used for these benchmarks.

solana-cargo-build-sbf 2.2.15
platform-tools v1.48
rustc 1.84.1

@ananas-block
Copy link
Author

With platform tools 1.50 a transfer with custom eq is 148 CU, instead of 144 CU with 1.48.
Anyway I am looking forward to the platform tools release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants