HBASE-25720 Sync WAL stuck when prepare flush cache will prevent flus… #3140
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
    doSyncOfUnflushedWALChanges(wal, getRegionInfo());
  } catch (Throwable t) {
    status.abort("Sync unflushed WAL changes failed: " + StringUtils.stringifyException(t));
    fatalForFlushCache(t);
Mind explaining a bit why we need to abort the regionserver here? We do not need to abort the regionserver when hitting the IOException above?
The IOException thrown above is not caused by syncing the WAL; only a sync failure should abort the RS as early as possible. And if we only allow aborting the RS in the commit step of flushing the cache, an OOM kill will always remain possible: every failure in the earlier prepare step only aborts the flush process, so no flush can ever reach the commit step.
@Apache9 We currently abort the RS only for critical problems, such as a stuck WAL, right? The IOExceptions above only abort the in-memory flush and should not prevent the next flushes, right? Here the abort is caused by a WAL sync that gets stuck in the prepare step, ahead of the WAL sync in the commit step of the flush. What do you think?
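To make the policy discussed above concrete, here is a minimal, self-contained sketch of the two failure paths in the prepare step. It is not the HBase code: `FlushAbortSketch`, `Outcome`, `Step`, and `prepareFlush` are hypothetical names invented for illustration, and the real `HRegion` logic involves locks, the flush status monitor, and the regionserver abort path that are left out here.

```java
import java.io.IOException;

public class FlushAbortSketch {
    /** Hypothetical outcome of the prepare step of a flush. */
    public enum Outcome { PREPARED, FLUSH_ABORTED, SERVER_ABORTED }

    /** Hypothetical stand-in for one unit of work that may fail. */
    public interface Step {
        void run() throws Exception;
    }

    /**
     * Runs the prepare step. A failure that is unrelated to the WAL sync
     * aborts only this flush, so a later flush can retry. A failed WAL sync
     * is treated as fatal, because a WAL that cannot sync would fail every
     * later prepare step too and memory would never be released.
     */
    public static Outcome prepareFlush(Step snapshotMemstore, Step syncUnflushedWal) {
        try {
            snapshotMemstore.run();
        } catch (Exception e) {
            // Not a WAL sync problem: give up on this flush only.
            return Outcome.FLUSH_ABORTED;
        }
        try {
            syncUnflushedWal.run();
        } catch (Throwable t) {
            // WAL sync failed: escalate (the PR calls fatalForFlushCache here).
            return Outcome.SERVER_ABORTED;
        }
        return Outcome.PREPARED;
    }

    public static void main(String[] args) {
        // Both steps succeed.
        System.out.println(prepareFlush(() -> { }, () -> { }));
        // The snapshot fails with an ordinary IOException.
        System.out.println(prepareFlush(
            () -> { throw new IOException("snapshot failed"); }, () -> { }));
        // The WAL sync fails.
        System.out.println(prepareFlush(
            () -> { }, () -> { throw new IOException("sync failed"); }));
    }
}
```

The design point is only the asymmetry between the two `catch` blocks: the first keeps the failure local to one flush, while the second reports a condition that retrying the flush cannot fix.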
…hing cache and cause OOM