Skip to content

Conversation

@drozdziak1
Copy link
Contributor

This new value is bumped on chain with each new attestation, on a per-symbol basis. The timestamp is common for all cranks. This enables attester cranks to rate limit their attestations of a symbol together with all other attesters. Ultimately, our tx expenses should drop while still fulfilling our preferred attestation rates.

Rate limiting logic in depth

The rate-limiting works by letting clients specify a desired rate limiting interval in instruction args. If all of the batch's symbols were already attested up to that amount of time ago, the TX fails and no SOL is spent. Note: this rate limit may be reached because of a different attester working in parallel with the client that sees this soft error. The rate limiting error does not contribute to metrics nor healthcheck.

Review areas of interest

  • Comments and naming - This feature is the first of its kind, affecting concurrent attester behavior. I think it's important to make sure it's easy to understand the results coming from using this new feature in the crank service. Hopefully the comments are enough and easy to understand
  • New logic in attest.rs code path - there's new logic evaluating the rate limit on chain and updating the value afterwards.
  • New attestation job logic in main.rs - We use tx simulation to detect the exact rate limit error and avoid counting it in monitoring ok/error counters. This is done by returning early from a rather dense code block. Style/readability suggestions are very welcome there.

Testing

The mock attester in Tilt gets a 2-second rate limit using the config value on its 1-second min interval attestation group. The rate limiting code can be seen as non-error INFO level messages about the rate limit. This happens only on 50-66% of resend attempts, as expected for the fast resends on a slower rate limit.

In consequence, attester clients are able to rate-limit attestations
among _all_ active attesters - because the new last attestation timestamp is kept up
to date on chain. Ultimately, this value being shared by concurrent clients,
this feature limits our tx expenses while fulfilling our preferred attestation rates.
@vercel
Copy link

vercel bot commented Feb 22, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Ignored Deployments
Name Status Preview Comments Updated
example-oracle-amm ⬜️ Ignored (Inspect) Feb 24, 2023 at 2:00PM (UTC)
xc-admin-frontend ⬜️ Ignored (Inspect) Feb 24, 2023 at 2:00PM (UTC)

Copy link
Contributor

@jayantk jayantk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great to me

- group_name: fast_interval_rate_limited
conditions:
min_interval_secs: 1
rate_limit_interval_secs: 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to have a version of this under default_attestation_conditions as well? I think in practice we're going to set this to 1 for most batches, and that would save us some configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense, will do

Copy link
Contributor

@guibescos guibescos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This made sense to me. PR description was great. main.rs was hard to review because I don't fully understand the crank logic.

rate_limit_interval_secs,
)?;

// Detect rate limiting error early
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit hesitant to this change because the RPC latency (which seems to be around 1s) might actually make this check worthless and even reduce the actual attestation rate hitting the network (regardless of the result).

So 2 unfortunate consequences are:

  1. You decide not to attest because the simulation fails, but if you continue the instruction and it actually sends the transaction it might not get rate limited.
  2. You decide to attest and then it gets rate limited when it's sent for real.

I think it's better to send the tx and only capture the log of its result instead.

The slightly more nuance here is that rpc performs simulation before the actual tx send. I assume that when the contract returns error then the transaction will get reverted, right?
So they do the simulation and the above scenario's can still happen that actually make our performance a bit un-reliable.
My suggestion is that you do not fail return an error when it is rate limited and just keep sending the msg to the log (either it is attesting or not). Then you can also have 2 metrics instead of 1 for sent messages. one for sent tx and one for tx es that really attest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing this. I thought about this and for some reason I couldn't find a way to get the logs from an error in solana_client :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in that case, We would only need to do a map_err and fish the expected log line out of the error type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, the right thing to do might be using a custom error code instead of a log line, I think I'll do that instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't a custom error code face the same simulation problem i mentioned? this time in the rpc?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But our concern is not much the fees, they are essentially free right?

Copy link
Contributor

@guibescos guibescos Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can disable preflight simulation (I know how to do it in javascript).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Guillermo is right. I'll keep thinking about this for a while and settle for a preflight-less tx. This is probably the only way to get 100% precise rate limiting of the wormhole cross-txs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I came up with the right compromise. We use an error code, look for it in failing tx, finish attestation job if it's detected, pass the error through otherwise. This means the attester will never breach the desired rate limit, and at absolute worst pay for the failing tx if the rare race condition is met.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tilt just went green locally.

Copy link
Collaborator

@ali-behjati ali-behjati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing code and PR description 🔥

I just have one comment that I like to know your opinion about. I might be wrong about what i think.

}

pub const fn default_rate_limit_interval_secs() -> Option<u32> {
Some(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good default value :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uuugh, now we can't pass none 🥇

This lets users pass 0 to disable the feature (0-rate limiting means
no rate limiting at all), which was not possible with the Option type.
@drozdziak1 drozdziak1 merged commit 1978d73 into main Feb 24, 2023
@drozdziak1 drozdziak1 deleted the drozdziak1/attester-last-attestation-time branch February 24, 2023 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants