-
Notifications
You must be signed in to change notification settings - Fork 70
feat(nightly): execution becomes faster
#2013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`become` keyword was causing some corruption and not passing references properly between calls.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment has been minimized.
This comment has been minimized.
CodSpeed WallTime Performance ReportMerging #2013 will improve performances by 95.21%Comparing
|
| Benchmark | BASE |
HEAD |
Change | |
|---|---|---|---|---|
| ⚡ | benchmark_execute[bubblesort] |
33.1 ms | 21.6 ms | +53.62% |
| ⚡ | benchmark_execute[factorial_iterative_u256] |
150.6 ms | 84.6 ms | +78.03% |
| ⚡ | benchmark_execute[fibonacci_iterative] |
48.8 ms | 26.5 ms | +84.06% |
| ⚡ | benchmark_execute[fibonacci_recursive] |
75.7 ms | 38.8 ms | +95.21% |
| ⚡ | benchmark_execute[keccak256] |
38.3 ms | 24.7 ms | +55.16% |
| ⚡ | benchmark_execute[keccak256_iter] |
126.4 ms | 83.7 ms | +51.09% |
| ⚡ | benchmark_execute[pairing] |
148.9 ms | 134.8 ms | +10.45% |
| ⚡ | benchmark_execute[quicksort] |
37.4 ms | 23.6 ms | +58.44% |
| ⚡ | benchmark_execute[revm_transfer] |
53.6 ms | 39.2 ms | +36.76% |
| ⚡ | benchmark_execute[sha256] |
37.7 ms | 23.2 ms | +62.73% |
| ⚡ | benchmark_execute[sha256_iter] |
109.6 ms | 60.1 ms | +82.4% |
| ⚡ | benchmark_execute_metered[bubblesort] |
67.7 ms | 44.4 ms | +52.41% |
| ⚡ | benchmark_execute_metered[factorial_iterative_u256] |
252.8 ms | 210.8 ms | +19.91% |
| ⚡ | benchmark_execute_metered[fibonacci_recursive] |
109.1 ms | 91.6 ms | +19.09% |
| ⚡ | benchmark_execute_metered[keccak256_iter] |
179.6 ms | 147.9 ms | +21.4% |
| ⚡ | benchmark_execute_metered[quicksort] |
71.4 ms | 50.1 ms | +42.53% |
| ⚡ | benchmark_execute_metered[revm_transfer] |
66.7 ms | 60.1 ms | +11.06% |
| ⚡ | benchmark_internal_verifier_execute[fibonacci] |
46.4 ms | 40.4 ms | +14.68% |
98988d9 to
9d9aa8a
Compare
becomes faster
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
3505ba2 to
64c0571
Compare
This comment has been minimized.
This comment has been minimized.
4a1e5e8 to
e066de4
Compare
e066de4 to
ce7c037
Compare
This comment has been minimized.
This comment has been minimized.
| return; | ||
| } | ||
| // exec_state.pc should have been updated by execute_impl at this point | ||
| let next_handler = interpreter.get_handler(exec_state.vm_state.pc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both get_handler and get_pre_compute call get_pc_index. maybe we can calculate the index once and reuse it
or add a get_pre_handler_and_pre_compute function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the compiler should optimize this out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh actually this is the new pc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still think there's duplicate work happening here that won't be optimized out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh you mean like because new pc doesn't get passed across function boundary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
| } | ||
| // exec_state.pc should have been updated by execute_impl at this point | ||
| let next_handler = interpreter.get_handler(exec_state.vm_state.pc); | ||
| if next_handler.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not required rn but maybe we can just return a null pointer instead of Option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again I hope the compiler just optimizes this out
| let handler_fn = quote! { | ||
| #[inline(never)] | ||
| unsafe fn #handler_name #handler_generics ( | ||
| interpreter: &::openvm_circuit::arch::interpreter::InterpretedInstance<#f_type, #ctx_type>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if it matters but maybe we just pass like a slimmed down instruction table that contains a function/mapping from pc -> (data, handler) instead of a reference to this whole struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea will do with the register pinning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. there's an argument that we can have a single handler function instead of separate precompute and handler functions but i think the current approach with using macros to avoid duplication is fine too
| /// `pc_index = (pc - pc_base) / DEFAULT_PC_STEP`. | ||
| /// `pc_index = pc / DEFAULT_PC_STEP`. | ||
| /// SAFETY: The first `pc_base / DEFAULT_PC_STEP` entries will be unreachable. We do this to | ||
| /// avoid needing to subtract `pc_base` during runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is pc_base guaranteed to be bounded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well it's bounded by the ELF size
| } | ||
| #[cfg(feature = "tco")] | ||
| { | ||
| tracing::debug!("execute_tco"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly pedantic, but i feel the tail-recursive function is more like a "trampoline"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I thought (but didn't validate) that other interpreters refer to trampoline as the standard one?
| return; | ||
| } | ||
| // exec_state.pc should have been updated by execute_impl at this point | ||
| let next_handler = interpreter.get_handler(exec_state.vm_state.pc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still think there's duplicate work happening here that won't be optimized out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing else besides Ayush's comments. LGTM if the comments are addressed
Commit: aee66b1 |
Uses the nightly
becomekeyword to tell LLVM tomusttail. This did not work at first when theHandlertype returnedResultbut after I changed it to return nothing it does.Ideas taken from https://github.com/xacrimon/tcvm/blob/main/src/interp.rs
To remove code duplication, I created local declarative macros
dispatch!for any "enum dispatch" logic of function pointers we were doing. The benefit is that this is now shared across four functions: {e1,e2} x {pre_compute,handler}. I did not make a single general purpose macro because Rust token parsing rules actually make this hard because it doesn't let you reference ident's you don't declare, and the ident's used actually vary based on the function.Removed
pc_baseand just leave some empty space to avoidpc - pc_basecalculation. This is becauseself.pc_baseis a runtime variable, so we don't want to have to use a register (or worse load/store) to access it.To complete this:
#[create_tco_handler]attribute aboveexecute_e1_implfunctions. Then for eachExecutorimplementation, copy thepre_computefunction implementation verbatim but switch tohandlerfunction signature and return the tco handler fn pointer instead.Switch to x86 global asm instead of relying on LLVM if we want to be extra safe.this seemed complicated and hard to do fully properly so I preferbecome.Closes INT-4309