- 
                Notifications
    You must be signed in to change notification settings 
- Fork 3.4k
HBASE-28456 HBase Restore restores old data if data for the same timestamp is in different hfiles #5775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e788c26    to
    cdef073      
    Compare
  
    …stamp is in different hfiles
cdef073    to
    5c744b1      
    Compare
  
    | 🎊 +1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| * ExtendedCell. | ||
| */ | ||
| static final String EXTENDED_CELL_SERIALIZATION_ENABLED_KEY = | ||
| public static final String EXTENDED_CELL_SERIALIZATION_ENABLED_KEY = | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is becoming part of the public API. It's worth updating the comment about its usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndimiduk I decided to annotate that as IA.Private. I know we generally prefer not to do that for fields/methods, but for better or worse there's already a strong convention of doing it in this class (11 methods annotated IA.Private). It's probably worth a larger refactor/cleanup of HFileOutputFormat2
Let me know if you strongly disagree with this approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good by me.
        
          
                hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStoreFile.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      |  | ||
| /** | ||
| * Cells in a bulkloaded file don't have a sequenceId since they don't go through memstore. When a | ||
| * bulkload file is committed, the current memstore ts is stamped onto the file name as the | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes! Seems like we need a seqId field in the file header instead of relying on the file name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason why we don't do this already is that a bulkloaded HFile is typically created outside the cluster. At that point, we don't know what the memstore seq id is at. Only at the time of bulkload file commit, when we've locked and flushed the memstore, do we know the memstore seq id.
Since hfiles are immutable, we can't open it at that point and change a metadata. So we need to add it to the filename I guess. I agree relying on the file name seems brittle. A more robust solution might be tricky given the above tho
| 🎊 +1 overall 
 
 This message was automatically generated. | 
|  | ||
| import static org.apache.hadoop.hbase.backup.BackupInfo.BackupState.COMPLETE; | ||
| import static org.apache.hadoop.hbase.backup.BackupType.FULL; | ||
| import static org.junit.Assert.*; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid star imports.
| import java.io.IOException; | ||
| import java.nio.ByteBuffer; | ||
| import java.time.Instant; | ||
| import java.util.*; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
| } | ||
| value = scanner.getCell(); | ||
| if (value != null && bulkloadSeqId.isPresent()) { | ||
| PrivateCellUtil.setSequenceId(value, bulkloadSeqId.getAsLong()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not use bulkloadSeqId.ifPresent because PrivateCellUtil.setSequenceId will throw exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea setSequenceId throws IOException. Figured it looks cleaner this way
…stamp is in different hfiles (#5775) Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…stamp is in different hfiles (#5775) Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…stamp is in different hfiles (#5775) Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 💔 -1 overall 
 
 This message was automatically generated. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
… for the same timestamp is in different hfiles (apache#5775) Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
_SeqId_handling