HBASE-23157 WAL unflushed seqId tracking may wrong when Durability.AS… #762
Changes from all commits
```diff
@@ -352,9 +352,36 @@ Long startCacheFlush(final byte[] encodedRegionName, final Map<byte[], Long> fam
     return lowestUnflushedInRegion;
   }
 
-  void completeCacheFlush(final byte[] encodedRegionName) {
+  void completeCacheFlush(byte[] encodedRegionName, long maxFlushedSeqId) {
+    // This is a simple hack to avoid maxFlushedSeqId going backwards.
+    // The system works fine normally, but if we make use of Durability.ASYNC_WAL and we are going
+    // to flush all the stores, the maxFlushedSeqId will be the next seq id of the region, but we
+    // may still have some unsynced WAL entries in the ringbuffer after we call startCacheFlush,
+    // and then it will be recorded as the lowestUnflushedSeqId by the above update method, which
+    // is less than the current maxFlushedSeqId. And if next time we only flush the family with
+    // this unusual lowestUnflushedSeqId, the maxFlushedSeqId will go backwards.
+    // This is unexpected behavior, so we should fix it, otherwise it may cause unexpected
+    // behavior in other areas.
+    // The solution here is a bit of a hack, but fine. Just replace the lowestUnflushedSeqId with
+    // maxFlushedSeqId + 1 if it is less. The meaning of maxFlushedSeqId is that all edits less
+    // than or equal to it have been flushed, i.e., persisted to HFiles, so setting
+    // lowestUnflushedSequenceId to maxFlushedSeqId + 1 will not cause data loss.
+    // And technically, using +1 is fine here. If the maxFlushedSeqId is just the flushOpSeqId,
+    // it means we have flushed all the stores, so the seq id for actual data is at least one
+    // greater. And if we do not flush all the stores, then the maxFlushedSeqId is calculated as
+    // lowestUnflushedSeqId - 1, so here we just add the 1 back.
+    Long wrappedSeqId = Long.valueOf(maxFlushedSeqId + 1);
     synchronized (tieLock) {
       this.flushingSequenceIds.remove(encodedRegionName);
+      Map<ImmutableByteArray, Long> unflushed = lowestUnflushedSequenceIds.get(encodedRegionName);
+      if (unflushed == null) {
+        return;
+      }
+      for (Map.Entry<ImmutableByteArray, Long> e : unflushed.entrySet()) {
+        if (e.getValue().longValue() <= maxFlushedSeqId) {
+          e.setValue(wrappedSeqId);
+        }
+      }
     }
   }
```

Review thread on the `Long wrappedSeqId = Long.valueOf(maxFlushedSeqId + 1);` line:

> Reviewer: The +1 makes me wary. Just add these edits w/ maxFlushedSeqId? Wouldn't that be safer?

> Author: I explained why we need the +1 here, so what's your real problem? If you do not want the +1, you need to change a bunch of comments and field names...

> Reviewer: Makes sense. Your comment/explanation here is good. It is also where it should be, in the class named SequenceIdAccounting. Though hard to follow, it should be possible to read this one class to find how sequence id accounting is done in the system, including the tricks to keep the system working when the likes of ASYNC_WAL is enabled.
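To make the scenario in the new code comment concrete, here is a minimal, self-contained sketch in plain Java (not HBase code: the family name "a", the sequence id values 9 and 10, and the plain HashMap standing in for the real per-family accounting are all made up for illustration) showing how maxFlushedSeqId could go backwards without the clamp, and why clamping to maxFlushedSeqId + 1 prevents it:

```java
import java.util.HashMap;
import java.util.Map;

// Toy walk-through (not HBase code) of the scenario described in the code comment above.
public class SeqIdClampSketch {
  public static void main(String[] args) {
    // Lowest unflushed sequence id per column family, as tracked by SequenceIdAccounting.
    Map<String, Long> lowestUnflushed = new HashMap<>();

    // A full-region flush completes with maxFlushedSeqId = 10.
    final long maxFlushedSeqId = 10;

    // With Durability.ASYNC_WAL, an edit for family "a" with seq id 9 was still sitting
    // unsynced in the ring buffer when startCacheFlush ran, so once it syncs it is recorded
    // as family "a"'s lowest unflushed id, even though seq id 9 is already in the HFiles.
    lowestUnflushed.put("a", 9L);

    // Without the clamp: a later flush of family "a" alone would derive its maxFlushedSeqId
    // as lowestUnflushedSeqId - 1 = 8, i.e. it moves backwards from the 10 already reported.
    System.out.println("without clamp: " + (lowestUnflushed.get("a") - 1)); // 8

    // With the fix: completeCacheFlush clamps any tracked value <= maxFlushedSeqId up to
    // maxFlushedSeqId + 1, which is safe because everything up to maxFlushedSeqId is
    // already persisted.
    Long wrappedSeqId = maxFlushedSeqId + 1;
    lowestUnflushed.replaceAll((family, seqId) ->
      seqId <= maxFlushedSeqId ? wrappedSeqId : seqId);
    System.out.println("with clamp:    " + (lowestUnflushed.get("a") - 1)); // 10
  }
}
```

Running it prints 8 for the unclamped case and 10 for the clamped one, matching the comment's argument that the +1 only skips sequence ids that are already persisted.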
A second review thread:

> Reviewer: How are we going to fix it? Can't we drive the flush marker through before completing the flush?

> Author: Then we would need to do a WAL sync under the updateLock.writeLock, which would impact performance.

> Reviewer: ... maybe a slight hiccup for the async WAL case -- maybe, given syncing to HDFS is such an erratic affair -- but we would also be more 'correct', having pushed out everything in the ring buffer ahead of this flush sync without having to rewrite their sequence ids.
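For context on the performance concern raised in the reply, here is a hedged toy illustration (not HBase code: the class name, the sleep, and the print statements are stand-ins, and it assumes the HRegion-style pattern where mutations take the read side of the updates lock while the flush prepare phase takes the write side) of why syncing the WAL while holding the write lock would stall every concurrent append for the duration of the sync:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model: a "flush" that syncs while holding the write lock blocks every
// "append", which needs the read lock, for the whole duration of the sync.
public class SyncUnderWriteLockSketch {
  private static final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

  // Stand-in for appending an edit (mutations take the read lock).
  static void append(String edit) {
    updatesLock.readLock().lock();
    try {
      System.out.println("appended " + edit);
    } finally {
      updatesLock.readLock().unlock();
    }
  }

  // Stand-in for the alternative discussed above: drive the flush marker through and
  // sync the WAL before completing the flush, all while the write lock is held.
  static void flushWithSyncUnderLock() throws InterruptedException {
    updatesLock.writeLock().lock();
    try {
      Thread.sleep(200); // simulated erratic HDFS sync latency
      System.out.println("flush completed after sync");
    } finally {
      updatesLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) throws Exception {
    Thread flusher = new Thread(() -> {
      try {
        flushWithSyncUnderLock();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    flusher.start();
    Thread.sleep(50); // let the flusher grab the write lock first
    long start = System.nanoTime();
    append("row1"); // blocks until the simulated sync releases the lock
    System.out.println("append waited ~" + (System.nanoTime() - start) / 1_000_000 + " ms");
    flusher.join();
  }
}
```

The merged fix avoids this by never syncing under the lock and instead clamping the tracked sequence ids after the fact, as shown in the diff above.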