HBASE-29682: Roll HMaster WAL in response to FlushMasterStoreRequest #7407
base: master
Conversation
🎊 +1 overall
This message was automatically generated.
Which WAL implementation do you use? We should be able to deal with a DataNode restart. Data that has been hflushed should be OK, and once there is a write error, we will try rolling the WAL and then write the data out again.
The problem we see in HMasters is this: since several DataNodes have recently restarted, they get added to the DFSClient's bad node list, and once enough DataNodes are on that list, there are none left to write to. This makes the HMaster restart.
It seems we have already started a log roll here? Then I do not think issuing a log roll manually can fix the problem.
I argue that, in addition to flushing the master region, the `flushMasterStore()` RPC should also roll the WAL files being written by the HMaster. In clusters with small numbers of DataNodes (fewer than 9, in my experience), a rolling restart of the DataNodes will break WAL writers, in both HMasters and RegionServers. This can be worked around on RegionServers by calling `admin.rollWALWriter()` on every RegionServer after restarting each DataNode. Currently, there is no equivalent way to do this for an HMaster. I think the most elegant solution is to augment `admin.flushMasterStore()`.
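The RegionServer-side workaround described above could be sketched roughly as follows, using the existing `Admin` API (`getRegionServers()`, `rollWALWriter()`, and `flushMasterStore()` are all real `Admin` methods; the class name and the assumption that this runs between DataNode restarts are illustrative only, and the snippet needs a live cluster to actually run):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical helper: run this after restarting each DataNode during a
// rolling restart, so WAL writers abandon pipelines that include the
// restarted node before the DFSClient's bad-node list fills up.
public final class RollAllWals {
  public static void main(String[] args) throws IOException {
    try (Connection conn =
           ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Roll the WAL writer on every RegionServer, forcing a fresh WAL
      // file (and thus a fresh write pipeline) on each.
      for (ServerName rs : admin.getRegionServers()) {
        admin.rollWALWriter(rs);
      }
      // Today this only flushes the master region; this PR proposes that
      // it also roll the HMaster's WAL, closing the gap for the master.
      admin.flushMasterStore();
    }
  }
}
```

The design choice argued here is that overloading `flushMasterStore()` avoids adding a new RPC: operators already call it during maintenance, and a flush plus a roll together leave the master's WAL in a clean state.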