Nexus Quiescing: Only update db_metadata_nexus when locally quiesced AND all other instances with old nexus_generations report that they have finished re-assignment based on the same blueprint. #8857

@smklein

From RFD 588:

The Nexus quiesce process (i.e., the body of the quiesce task) needs to change to do this:

  1. Immediately disable creation of new sagas.
  2. Any time the local saga quiesce state changes (because a saga reassignment pass has completed successfully or because sagas have finished), if sagas are currently fully drained, update the db_metadata_nexus record’s sagas_drained_as_of_blueprint_id field for this Nexus.
  3. Periodically wake up:
    a. Check if this Nexus has fully drained its sagas (see definition below). If not, go back to sleep.
    b. Load the current target blueprint (B)
    c. Load db_metadata_nexus records for all possibly-running Nexus instances (as determined by the target blueprint B).
    d. If all these Nexus instances have sagas_drained_as_of_blueprint_id equal to B, or if any of them has state quiesced, then proceed with quiescing. Otherwise, go back to sleep.
  4. If we get this far, then we’ve determined that all sagas on all old instances have been drained and no new ones can be started. Begin database quiesce the same way we do it today.
  5. Once the database is quiesced, update our db_metadata_nexus record’s state to quiesced, as well as that of any known-dead Nexus instances (according to blueprint B).
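
The check in step 3 can be sketched as a pure function over the loaded records. This is only an illustration, not the Omicron implementation: `NexusRecord`, `NexusState`, `may_begin_db_quiesce`, and the integer ID types are hypothetical stand-ins, and the real `db_metadata_nexus` state may have more values than the two modelled here.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins; the real code presumably uses typed UUIDs.
type BlueprintId = u32;
type NexusId = u32;

// Simplified model of the db_metadata_nexus state column (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NexusState {
    Active,
    Quiesced,
}

// Simplified model of one db_metadata_nexus row (assumption).
#[derive(Clone, Copy, Debug)]
struct NexusRecord {
    state: NexusState,
    sagas_drained_as_of_blueprint_id: Option<BlueprintId>,
}

/// Returns true when database quiesce may begin (steps 3a through 3d).
///
/// `records` is expected to hold one row per possibly-running Nexus
/// instance under the target blueprint, including this Nexus itself.
fn may_begin_db_quiesce(
    locally_drained: bool,
    target: BlueprintId,
    records: &BTreeMap<NexusId, NexusRecord>,
) -> bool {
    // Step 3a: this Nexus must have fully drained its own sagas.
    if !locally_drained {
        return false;
    }
    // Step 3d, second clause: some instance already got past this check.
    let any_quiesced = records
        .values()
        .any(|r| r.state == NexusState::Quiesced);
    // Step 3d, first clause: every instance drained as of blueprint B.
    let all_drained = records
        .values()
        .all(|r| r.sagas_drained_as_of_blueprint_id == Some(target));
    any_quiesced || all_drained
}
```

Keeping the decision separate from the database reads means the periodic task only has to load the target blueprint and the records, call the function, and go back to sleep when it returns false.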

This issue covers performing those validation steps and only setting the db_metadata_nexus record state to quiesced once it is safe to do so.
