perf: p-token optimize pubkey cmp #64
@febo can you look at this? I thought the compiler did the right thing here, but it might not be the case.
On a small test program using platform-tools … I am not sure we can use the following cast:

```rust
let a_chunks = unsafe { from_raw_parts(a.as_ptr() as *const u64, 4) };
let b_chunks = unsafe { from_raw_parts(b.as_ptr() as *const u64, 4) };
```

Another interesting aspect is that if you use platform-tools …
True regarding alignment; I've moved it to draft. `AccountInfo`s should be aligned to u64, right? If so, alignment should be guaranteed for all uses of …
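The `from_raw_parts` cast above is only sound when both pointers meet u64 alignment, which is exactly what the thread is unsure about. A minimal sketch that makes no assumption about how keys are laid out in memory: the hypothetical helper `pubkey_eq_checked` takes the cast path only after checking both pointers at runtime, and otherwise falls back to the plain array comparison. The extra check costs instructions on SBF, so this illustrates the soundness condition rather than a CU-optimal choice; if `AccountInfo` data is in fact guaranteed to be u64-aligned, the check could become a `debug_assert!`.

```rust
use std::mem::align_of;
use std::slice::from_raw_parts;

/// Returns true if `ptr` meets the alignment required to read a `u64`.
fn is_u64_aligned(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<u64>() == 0
}

/// Hypothetical helper: compares two 32-byte keys as four u64 chunks when
/// both are u64-aligned, and falls back to a byte-wise compare otherwise.
fn pubkey_eq_checked(a: &[u8; 32], b: &[u8; 32]) -> bool {
    if is_u64_aligned(a.as_ptr()) && is_u64_aligned(b.as_ptr()) {
        // SAFETY: both pointers are u64-aligned and point to 32 initialized
        // bytes (exactly 4 u64s); every bit pattern is a valid u64.
        let a_chunks = unsafe { from_raw_parts(a.as_ptr() as *const u64, 4) };
        let b_chunks = unsafe { from_raw_parts(b.as_ptr() as *const u64, 4) };
        a_chunks == b_chunks
    } else {
        // Unaligned input: reinterpreting as &[u64] would be undefined behavior.
        a == b
    }
}
```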
FYI, I got similar CU results in p-token tests with the following implementation: …
It is not clear which version of platform-tools was used for these benchmarks. The numbers for the latest release, v1.50, are in the table below. They are lower than the benchmark baseline, but still do not match the optimized version from this PR. The compiler has a threshold for when to transform a chain of loads and compares into a memcmp syscall. Currently that threshold is 3 sequences: each comparison involves two loads and one compare, so 3 * 3 = 9 CUs, while the syscall overhead is 10 CUs. That does not account for the argument setup required before invoking the syscall: three registers must be set to hold the arguments for memcmp. If we account for that and raise the threshold to 4 sequences (8 loads and 4 compares = 12 CUs), we get very good results (see the third column in the table).
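The threshold arithmetic above can be written as a small cost model. This is a sketch only: it takes the per-instruction and syscall figures from the comment, assumes 1 CU per argument register for the memcmp setup, and assumes the comparison uses u64 loads, so a 32-byte pubkey is four sequences. Under those assumptions, a threshold of 3 sends a pubkey comparison to memcmp (at least 13 CUs), while a threshold of 4 keeps it inline at 12 CUs.

```rust
/// Figures quoted in the comment (1 CU per load or compare).
const LOADS_PER_SEQUENCE: u64 = 2;
const COMPARES_PER_SEQUENCE: u64 = 1;
const SYSCALL_OVERHEAD_CU: u64 = 10;
/// Assumption: setting each of the three memcmp argument registers costs 1 CU.
const ARG_SETUP_CU: u64 = 3;

/// CUs for an inline chain of `sequences` (load, load, compare) groups.
fn inline_cost(sequences: u64) -> u64 {
    sequences * (LOADS_PER_SEQUENCE + COMPARES_PER_SEQUENCE)
}

/// Lower bound for the memcmp path: ignores anything the syscall charges
/// beyond its fixed overhead.
fn memcmp_cost_lower_bound() -> u64 {
    SYSCALL_OVERHEAD_CU + ARG_SETUP_CU
}

/// Models the threshold: chains longer than `threshold` sequences are
/// lowered to a memcmp syscall, shorter ones stay inline.
fn compare_cost(bytes: u64, threshold: u64) -> u64 {
    // Simplification: round up to whole u64 loads per operand.
    let sequences = (bytes + 7) / 8;
    if sequences > threshold {
        memcmp_cost_lower_bound()
    } else {
        inline_cost(sequences)
    }
}
```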
I'll update the compiler and ship this improvement in a new release. When it does not invoke memcmp, the compiler emits the comparison as a chain of u64 comparisons, similar to what this PR proposes.
The … All signers of a …
The fix is here: anza-xyz/llvm-project#160
solana-cargo-build-sbf 2.2.15
With platform-tools …
Issue
p-token uses `partial_eq` to compare pubkeys, which is not optimal. spl-token uses `sol_memcmp`, which is better than `partial_eq` (program/src/processor.rs). A custom `pubkey_eq` implementation can improve the performance of many p-token instructions by tens of CUs; see table.

To reproduce the CU measurements, see the tx logs from the `transfer`, `transfer_checked`, and `batch` tests in branches: …

Changes:
- `pubkey_eq`: cast 32-byte arrays to 4 u64 chunks, compare in a loop, and exit early on the first unequal chunk
- … `pubkey_eq`
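The change described above can be sketched as follows, assuming the pubkey type is a plain `[u8; 32]`. This is not the PR's code: instead of reinterpreting the arrays as `&[u64]`, it reads each chunk with `std::ptr::read_unaligned`, so it stays sound regardless of alignment. Whether it lowers to the same SBF instructions, and therefore matches the CU numbers reported here, has not been measured.

```rust
use std::ptr::read_unaligned;

/// Compares two 32-byte keys as four u64 chunks, returning false at the
/// first chunk that differs.
#[inline(always)]
fn pubkey_eq(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let a_ptr = a.as_ptr() as *const u64;
    let b_ptr = b.as_ptr() as *const u64;
    for i in 0..4 {
        // SAFETY: for i in 0..4 the read covers bytes 8*i..8*i + 8, which lie
        // inside both 32-byte arrays; read_unaligned does not require alignment.
        let (x, y) = unsafe { (read_unaligned(a_ptr.add(i)), read_unaligned(b_ptr.add(i))) };
        if x != y {
            return false; // early exit on the first unequal chunk
        }
    }
    true
}
```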
Notes:
- … `transfer`, `transfer_checked`, and `batch` tests (building all tests doesn't work with 32 GB of RAM). It is unlikely that the performance of other instructions decreased with this change, but it is best to double-check.
- … `pinocchio` to use it in `AccountInfo::is_owned_by`.

solana-cargo-build-sbf 2.2.15
platform-tools v1.48
rustc 1.84.1