
Conversation

@dzajkowski dzajkowski commented Jul 21, 2021

Check whether the DB is consistent. It has been observed that DBs get broken on Sagano; this should help with figuring out when it happens.

@dzajkowski dzajkowski marked this pull request as ready for review July 21, 2021 14:05
    (for {
      hash <- blockNumberMappingStorage.get(idx)
      _    <- blockHeadersStorage.get(hash)
    } yield ()).fold {
      log.error("Database seems to be in inconsistent state, shutting down")
Contributor

Shouldn't we include information on how the user should act when this happens? I guess instructions on how to delete the DB?

Contributor Author


Is this a good place to add such documentation for the client? I'm not convinced; I think we have a mantis docs project for that.
We don't have a policy for such situations.

Contributor


In that case, advise the user to check the website. But I think that saying nothing is the worst option :)

Contributor


I think we should log the missing block's hash and number. I'm not sure it would really help us when this happens, but it doesn't hurt.
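
A minimal sketch of what that logging could look like, assuming the Option-based storage getters from the snippet above; the exact messages and the shutdown() call are illustrative, not the actual implementation:

    // Hedged sketch: report which lookup failed before shutting down.
    // `idx` is the sampled block number; both getters return Option.
    blockNumberMappingStorage.get(idx) match {
      case None =>
        log.error(s"DB inconsistency: no hash mapping for block number $idx; shutting down")
        shutdown()
      case Some(hash) =>
        if (blockHeadersStorage.get(hash).isEmpty) {
          log.error(s"DB inconsistency: missing header for block $idx (hash $hash); shutting down")
          shutdown()
        }
    }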

@leo-bogastry
Contributor

Is the long-term plan to have Mantis fix this inconsistency itself?

@dzajkowski
Contributor Author

> Is the long-term plan to have Mantis fix this inconsistency itself?

@LeonorLunatech it depends; we don't know what the setting is. We haven't had a user report this as an issue, so it might be specific to the test environment.

@leo-bogastry
Contributor

> Is the long-term plan to have Mantis fix this inconsistency itself?
>
> @LeonorLunatech it depends; we don't know what the setting is. We haven't had a user report this as an issue, so it might be specific to the test environment.

OK, I get that. In that case, and because I'm also wondering whether there's a performance impact, would it make sense to enable this check only when running on test networks?
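
If the check were made opt-in, the gating might look like the sketch below, assuming a Typesafe Config instance in scope and the startConsistencyCheck() wrapper suggested further down; the mantis.consistency-check.enabled key is invented for illustration:

    // Hypothetical config gate; the key name is not an existing Mantis setting.
    val consistencyCheckEnabled: Boolean =
      config.getBoolean("mantis.consistency-check.enabled")

    if (consistencyCheckEnabled) startConsistencyCheck()

Test-network configs could then default the flag to true while mainnet leaves it off.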

loadGenesisData()

private[this] def startSyncController(): Unit = syncController ! SyncProtocol.Start
StorageConsistencyChecker.checkStorageConsistency(
Contributor

Do you mind wrapping this in a startConsistencyCheck() or something similar?

      blockHeadersStorage,
      shutdown
    )(log)
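
A sketch of the wrapper suggested above; the leading arguments are assumed from the checker's signature shown further down and may not match the actual call site:

    // Hypothetical wrapper per the review suggestion.
    private[this] def startConsistencyCheck(): Unit =
      StorageConsistencyChecker.checkStorageConsistency(
        bestBlockNumber,           // assumed: current best block number
        blockNumberMappingStorage, // assumed: same storages as in the diff
        blockHeadersStorage,
        shutdown
      )(log)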
timers.startSingleTimer(Tick, 5.seconds)
Contributor

It's a bit confusing: are we doing the check every 10 minutes (https://github.com/input-output-hk/mantis/pull/1070/files#diff-a8c02d6306b1c231f0efde1b3f46fa9cc6659899bfa882c317479432df489a7aR23) or every 5 seconds?
It's probably a good idea to make this configurable, unless it's just a temporary measure.

Contributor Author

Good catch, this should be a method.
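
If the check is meant to repeat, a hedged sketch of a config-driven periodic timer; the interval key is invented, and Tick is assumed to be the actor's timer message:

    import java.util.concurrent.TimeUnit
    import scala.concurrent.duration._

    // Hypothetical: read the interval from config instead of hard-coding
    // 5 seconds, and re-fire on a fixed delay rather than a single shot.
    val checkInterval: FiniteDuration =
      config.getDuration("mantis.consistency-check.interval", TimeUnit.MILLISECONDS).millis

    timers.startTimerWithFixedDelay(Tick, checkInterval)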

        shutdown: ShutdownOp,
        step: Int = DefaultStep
    )(implicit log: Logger): Unit =
      Range(0, bestBlockNumber.intValue, step).foreach { idx =>
Contributor

I'm wondering if it makes sense to check only every 1000th block. I can imagine a case where the check passes but the DB is inconsistent.
Perhaps it would be worthwhile to occasionally check the whole range?

Contributor Author

This is a first pass; I want to start identifying the circumstances under which clients lose consistency.
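
One way to act on the reviewer's idea later, sketched under assumptions: tickCount and fullScanEvery are invented names, and DefaultStep = 1000 is inferred from the comment above.

    // Hypothetical: run the cheap sampled scan on most ticks, but widen to
    // a full scan (step = 1) every `fullScanEvery` ticks.
    val step = if (tickCount % fullScanEvery == 0) 1 else DefaultStep
    StorageConsistencyChecker.checkStorageConsistency(
      bestBlockNumber,
      blockNumberMappingStorage,
      blockHeadersStorage,
      shutdown,
      step
    )(log)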

@dzajkowski dzajkowski force-pushed the feature/etcm-967-db-consistency-check branch from ea77359 to 70f9679 on July 23, 2021 14:27
@dzajkowski dzajkowski merged commit 5ccbdfd into develop Jul 26, 2021
@dzajkowski dzajkowski deleted the feature/etcm-967-db-consistency-check branch July 26, 2021 09:42