Skip to content

Conversation

@sandtreader
Copy link
Collaborator

@sandtreader sandtreader commented Oct 21, 2025

Resolves issue #227

This PR implements the skeleton of the "Flight Control" Go/NoGo mechanism for validation:

  • Merges BlockHeader and BlockBody messages into RawBlock, switches handling in EpochState, SPOState
  • Definition of BlockValidation messages which contain a ValidationResult (in common::validation)
  • New consensus module which interposes in the (merged) block message stream, accepting cardano.block.available from Mithril/Upstream and sending on cardano.block.proposed (which downstreams all now switch to)
  • Configuration of validators in consensus - currently empty because there aren't any. VRF Validator (VRF validation #197) can be the first one.

For now consensus can't do anything with a validation failure other than log it, because it doesn't have multiple chains to choose from. #205 will extend this.

I've requested lots of reviewers because (a) this is pretty central to a lot we are doing, and comments welcome and (b) almost everyone will be involved in implementing validators which need to connect to it.

Rationale: Within memory there is no extra cost to sending the full block
body in the two cases (EpochState, SPOState) which subscribe only to headers,
and having separate header messages complicates the forthcoming insertion of
consensus into the block flow.

The only case where this would add overhead would be if either EpochState and/or
SPOState were remote while all the other consumers of block bodies were still
internal.  This seems very unlikely...
Consensus listens on cardano.block.available, and resends as
cardano.block.proposed.  Downstreams BlockUnpacker, EpochsState,
SPOState now subscribe for cardano.block.proposed.

Side-effect: Removed info-level filter from omnibus main to allow
debug output again - needs investigation what it was for (ajw)
Reads BlockValidation messages from all configured validators, fails
the block if any say NoGo.  All must respond for it to continue.

Note: Validation failure is only logged so far, we don't have multiple
upstreams to run consensus on.

Note: There are no configured topics yet because no validators yet
issue messages!

// Read validation responses from all validators in parallel
let results: Vec<_> =
try_join_all(validator_subscriptions.iter_mut().map(|s| s.read()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this deal with a validator that doesn't respond, like a timeout?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It will indeed lock up if a validator doesn't respond, or if a validator is configured which doesn't exist. I did wonder about a timeout but actually a non-response is a system failure - we can't safely either accept or reject the block. If we did had a short timeout, there's always a risk that something gets slowed down and it gets falsely triggered. If it's long, it's game over anyway because we'll miss the slot.

But very open to suggestions!

Copy link
Collaborator

@golddydev golddydev Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we validate block header, will we have several modules to validate each category?
e.g. vrf validator module and kes validator module?

Or will we have them all in one module (Will this reduce some duplication? since these validations can share same state)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's right - either of which can flag a failure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because we're listening for messages, it's best to have a timeout and a log message, otherwise it will be very difficult to debug the system.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed - I'll add a timeout.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@buddhisthead buddhisthead requested a review from Copilot October 21, 2025 20:45
@buddhisthead
Copy link
Collaborator

buddhisthead commented Oct 21, 2025

@sandtreader this is really nice! Super clean design and easy to follow. Hope you don't mind - I turned on the co-pilot reviewer just to see what it would find.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements the "Flight Control" validation mechanism by introducing a consensus module that acts as a validation gateway between block sources and downstream consumers. The key changes consolidate block message handling and establish a validation framework where multiple validators can provide Go/NoGo decisions on proposed blocks.

Key Changes:

  • Merged BlockHeader and BlockBody messages into a single RawBlockMessage containing both header and body data
  • Created new consensus module that receives blocks on cardano.block.available and publishes validated blocks on cardano.block.proposed
  • Defined validation framework with ValidationStatus and ValidationError types for Go/NoGo decisions

Reviewed Changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
modules/consensus/src/consensus.rs New consensus module implementing validation orchestration
common/src/validation.rs Validation types defining Go/NoGo status and error categories
common/src/messages.rs Consolidated block messages and added BlockValidation message type
modules/upstream_chain_fetcher/src/utils.rs Updated to publish single RawBlockMessage instead of separate header/body
modules/mithril_snapshot_fetcher/src/mithril_snapshot_fetcher.rs Updated to publish consolidated RawBlockMessage
modules/epochs_state/src/epochs_state.rs Updated subscription from header-only to full block messages
modules/spo_state/src/spo_state.rs Updated subscription from header-only to full block messages
modules/block_unpacker/src/block_unpacker.rs Updated to consume RawBlockMessage instead of BlockBody
processes/omnibus/src/main.rs Registered consensus module and modified logging filter

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Collaborator

@lowhung lowhung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach feels intuitive to me 👍

Copy link
Collaborator

@golddydev golddydev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me 👍🏼

Note ChainStore now listens on cardano.block.proposed
.with_filter(filter::filter_fn(|meta| meta.is_event()));
let fmt_layer = fmt::layer().with_filter(EnvFilter::from_default_env());

// TODO disabled this filter because it prevents debugging - investigate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it intentional to check this in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was going to ask you about that... Maybe shouldn't go in this PR though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now reverted and opened #273 for it

Note: Triggers a Caryatid bug where we don't handle timeouts
on the future which produces a panic in select_all() - should be fine
once this is fixed (and this is a Black Swan safety catch anyway)
Still needs fixing to allow DEBUG but left for future PR
@sandtreader sandtreader merged commit 651746a into main Oct 23, 2025
2 checks passed
@sandtreader sandtreader deleted the prc/227-consensus-validation branch October 23, 2025 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants