-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state #948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not execute this in the default ForkJoinPool. That can suffer from thread exhaustion/deadlock issues since there are very few threads in the default pool.
Instead use the overload of supplyAsync that accepts an Executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question: currently we are passing leaderId = null. So will this call to getOzoneManagerDBSnapshot fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was temporary till RATIS-564. Fixed now.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One risk is that this code path for stop may not be well tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, also the correct way would be to pause the Ratis server and not stop it. Need a Ratis patch to support pause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omDoubleBuffer.stop interrupts the thread but does not call join. So the doubleBuffer thread may still be running when the stop call returns.
We should probably fix stop to call join.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a directory correct? Since I assume a DB is multiple files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Updated variable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand this TODO. Could you clarify a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstallSnaphsot notification response requires the TermIndex. But we only have the install snapshot index. The term index is irrelevant here. We return a dummy (0) term index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can some of this code be shared with startup initialization by moving to a common function?
|
💔 -1 overall
This message was automatically generated. |
|
Reviewing this patch, @hanishakoneru can you resolve the merge conflict meanwhile? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we will snapshot every 1024 transactions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, when a snapshot is being taken, if the gap between log purges is more than 1024, then it will purge the logs. Snapshot frequency is not dependent on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's set this to a higher value. We don't need to be too aggressive about purging Ratis logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1024 transactions is 100ms worth of edits in a busy cluster. We could set this as high as 1M maybe to keep more history. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would they be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ratis is not enabled. This is for the non-HA code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we restore the original files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bharatviswa504 can you take a look at this part to see if we need to instantiate anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (isAclEnabled) {
accessAuthorizer = getACLAuthorizerInstance(conf);
if (accessAuthorizer instanceof OzoneNativeAuthorizer) {
OzoneNativeAuthorizer authorizer =
(OzoneNativeAuthorizer) accessAuthorizer;
authorizer.setVolumeManager(volumeManager);
authorizer.setBucketManager(bucketManager);
authorizer.setKeyManager(keyManager);
authorizer.setPrefixManager(prefixManager);
}
} else {
accessAuthorizer = null;
}
We should do this part also during reloadOMState.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO looks a little worrying. Something we need to address now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks for catching this. On startup, we should read the saved ratis snapshot index from disk. I will update the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: On reloading state, we should not read the saved snaphsot index. Instead, we should updated the snapshot index on disk.
During normal startup, we already read the saved snapshot index.
|
Question: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: MiniOzoneCluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have set the lifeCycle State here, but I don't see how this will pause stateMachine.
As this state is not being used anywhere else except during initliaze and unpause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is taken care of internally by Ratis. The StateMachineUpdater in Ratis checks if the state is RUNNING before applying log entries to StateMachine.
|
I am mostly +1 on this change. Couple of minor comments and one thing I requested Bharat to double check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shutdown of this executor needs to be done in StateMachine stop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If checkpointSnapshotIndex <= lastAppliedIndex, I think here we need to clean up the DB checkpoint which is downloaded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to delete this file here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should delete the old metrics file as the metrics are no longer valid. We should also save the new metrics. Updated the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where this is being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is by mistake. Not sure how this came here. Removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is successfully done, when can we delete back up?
Might be at end of notifyInstallSnapshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to understand, as according to your tests with log purge gap 50 and snapshot interval 50. Old logs will be purged. So, when inactive OM start's ratis will internally makes the notifyInstallSnapshot and do reload automatically right. Why in this test it is being done manually, is this just to test the steps. If so, can we have the test automatically to do so by ratis. (I mean have some wait, until it is done, and then check DB, not sure for IT this is too much)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this was for the test only. The problem with testing end to end is that HttpGet does not work for unit tests. The CheckpointServlet (from Recon) which we use for downloading the checkpoint from leader uses HttpGet for the checkpoint transfer.
We do not want the snapshots to be handled via Ratis. When a follower receives installSnaphsot notification, it sends the new loaded DB's snapshot index back to the leader. The leader updates the followers snaphsot index through this. But when Ratis server is starting up, it should be able to determine the latest snapshot index. Otherwise, all the logs will be replayed from the start. I will create a new Jira to address this. Thanks Bharat. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please use the new format for configs? Here are some examples: https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! Let me file a followup jira to fix that. Want to get this patch committed today, it's been hanging around for over a month.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed HDDS-1831.
|
💔 -1 overall
This message was automatically generated. |
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 thanks for fixing this @bharatviswa504!
|
💔 -1 overall
This message was automatically generated. |
ca605d8 to
f954e14
Compare
|
Rebased with latest trunk. |
|
💔 -1 overall
This message was automatically generated. |
|
/retest |
|
I am merging this with couple of caveats.
|
* SAMZA-2124: Add Beam API doc to the website * Address pr feedback
…heckpoint and reload OM state (apache#948)
Unit tests pending.