-
Notifications
You must be signed in to change notification settings - Fork 5
227 consensus validation / Flight Control #262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Rationale: Within memory there is no extra cost to sending the full block body in the two cases (EpochState, SPOState) which subscribe only to headers, and having separate header messages complicates the forthcoming insertion of consensus into the block flow. The only case where this would add overhead would be if either EpochState and/or SPOState were remote while all the other consumers of block bodies were still internal. This seems very unlikely...
Consensus listens on cardano.block.available, and resends as cardano.block.proposed. Downstreams BlockUnpacker, EpochsState, SPOState now subscribe for cardano.block.proposed. Side-effect: Removed info-level filter from omnibus main to allow debug output again - needs investigation what it was for (ajw)
Reads BlockValidation messages from all configured validators, fails the block if any say NoGo. All must respond for it to continue. Note: Validation failure is only logged so far, we don't have multiple upstreams to run consensus on. Note: There are no configured topics yet because no validators yet issue messages!
modules/consensus/src/consensus.rs
Outdated
|
|
||
| // Read validation responses from all validators in parallel | ||
| let results: Vec<_> = | ||
| try_join_all(validator_subscriptions.iter_mut().map(|s| s.read())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this deal with a validator that doesn't respond, like a timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It will indeed lock up if a validator doesn't respond, or if a validator is configured which doesn't exist. I did wonder about a timeout but actually a non-response is a system failure - we can't safely either accept or reject the block. If we did had a short timeout, there's always a risk that something gets slowed down and it gets falsely triggered. If it's long, it's game over anyway because we'll miss the slot.
But very open to suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we validate block header, will we have several modules to validate each category?
e.g. vrf validator module and kes validator module?
Or will we have them all in one module (Will this reduce some duplication? since these validations can share same state)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's right - either of which can flag a failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because we're listening for messages, it's best to have a timeout and a log message, otherwise it will be very difficult to debug the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agreed - I'll add a timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, but see input-output-hk/caryatid#13
|
@sandtreader this is really nice! Super clean design and easy to follow. Hope you don't mind - I turned on the co-pilot reviewer just to see what it would find. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements the "Flight Control" validation mechanism by introducing a consensus module that acts as a validation gateway between block sources and downstream consumers. The key changes consolidate block message handling and establish a validation framework where multiple validators can provide Go/NoGo decisions on proposed blocks.
Key Changes:
- Merged
BlockHeaderandBlockBodymessages into a singleRawBlockMessagecontaining both header and body data - Created new consensus module that receives blocks on
cardano.block.availableand publishes validated blocks oncardano.block.proposed - Defined validation framework with
ValidationStatusandValidationErrortypes for Go/NoGo decisions
Reviewed Changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/consensus/src/consensus.rs | New consensus module implementing validation orchestration |
| common/src/validation.rs | Validation types defining Go/NoGo status and error categories |
| common/src/messages.rs | Consolidated block messages and added BlockValidation message type |
| modules/upstream_chain_fetcher/src/utils.rs | Updated to publish single RawBlockMessage instead of separate header/body |
| modules/mithril_snapshot_fetcher/src/mithril_snapshot_fetcher.rs | Updated to publish consolidated RawBlockMessage |
| modules/epochs_state/src/epochs_state.rs | Updated subscription from header-only to full block messages |
| modules/spo_state/src/spo_state.rs | Updated subscription from header-only to full block messages |
| modules/block_unpacker/src/block_unpacker.rs | Updated to consume RawBlockMessage instead of BlockBody |
| processes/omnibus/src/main.rs | Registered consensus module and modified logging filter |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
lowhung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach feels intuitive to me 👍
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
golddydev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me 👍🏼
Note ChainStore now listens on cardano.block.proposed
processes/omnibus/src/main.rs
Outdated
| .with_filter(filter::filter_fn(|meta| meta.is_event())); | ||
| let fmt_layer = fmt::layer().with_filter(EnvFilter::from_default_env()); | ||
|
|
||
| // TODO disabled this filter because it prevents debugging - investigate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it intentional to check this in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was going to ask you about that... Maybe shouldn't go in this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now reverted and opened #273 for it
Routes blocks to consensus
Note: Triggers a Caryatid bug where we don't handle timeouts on the future which produces a panic in select_all() - should be fine once this is fixed (and this is a Black Swan safety catch anyway)
Still needs fixing to allow DEBUG but left for future PR
Resolves issue #227
This PR implements the skeleton of the "Flight Control" Go/NoGo mechanism for validation:
cardano.block.availablefrom Mithril/Upstream and sending oncardano.block.proposed(which downstreams all now switch to)For now consensus can't do anything with a validation failure other than log it, because it doesn't have multiple chains to choose from. #205 will extend this.
I've requested lots of reviewers because (a) this is pretty central to a lot we are doing, and comments welcome and (b) almost everyone will be involved in implementing validators which need to connect to it.