HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell #6557
Conversation
| I checked the code: we do have logic to seek to the next row or column when we hit a DeleteFamily cell. Line 207 in 28c4353
 But the problem is that we seem to return earlier, before we actually call that method. Line 76 in 28c4353
 The code block above seems incorrect: we always return MatchCode.SKIP if we get a delete marker. I think the reason we did not find this before is that usually there is only one delete marker, so when we check the next cell we fall through and call the checkDeleted method, which seeks to the next row or column. The scenario here is that we have a bunch of delete markers; we will then scan them all instead of seeking to the next row or column, since we always enter the code block above and return MatchCode.SKIP. I think we should try to optimize the logic of that code block. Thanks. |
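The control flow described above can be illustrated with a deliberately simplified sketch (hypothetical names, not the real ScanQueryMatcher): the early return of MatchCode.SKIP on every delete marker means a run of markers is consumed one cell at a time, whereas consulting the delete tracker first would allow a seek verdict past the whole run.

```java
// Simplified, hypothetical sketch of the matcher logic discussed above;
// this is NOT the real ScanQueryMatcher, only an illustration of the control flow.
enum MatchCode { SKIP, SEEK_NEXT_COL, INCLUDE }

public class MatcherSketch {

  // Current behavior: every delete marker short-circuits to SKIP,
  // so a run of n markers costs n heap.next() calls.
  public static MatchCode matchCurrent(boolean isDeleteMarker) {
    if (isDeleteMarker) {
      return MatchCode.SKIP; // early return; checkDeleted() is never reached
    }
    return MatchCode.INCLUDE;
  }

  // Sketch of the proposed behavior: once a delete marker for this row/column
  // has been tracked, further markers yield a seek verdict, letting the
  // scanner jump past the whole accumulated run.
  public static MatchCode matchOptimized(boolean isDeleteMarker, boolean alreadyTracked) {
    if (isDeleteMarker) {
      return alreadyTracked ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
    }
    return MatchCode.INCLUDE;
  }
}
```

With one marker both variants behave the same; the difference only appears once markers accumulate on the same row or cell.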
| FYI @kadirozde | 
| 🎊 +1 overall. This message was automatically generated. |
| 💔 -1 overall. This message was automatically generated. |
```java
private boolean canOptimizeReadDeleteMarkers() {
  // for simplicity, optimization works only for these cases
  return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
    && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
}
```
@EungsopYoo have you also considered Dual File Compaction #5545?
Could you also run some perf tests comparing Dual File Compaction with this optimization? That would be really helpful.
@virajjasani
I have reviewed the Dual File Compaction work you mentioned. This PR and Dual File Compaction have something in common, especially in how they handle delete markers, but I think there are some differences.
This PR focuses on accumulated delete markers on the same row or cell, whereas Dual File Compaction handles delete markers across different rows or columns. Also, this PR can optimize reads from both the MemStore and StoreFiles, while Dual File Compaction can only optimize reads from StoreFiles.
So I think they are complementary and can be used together.
Thanks @EungsopYoo, this is what I was also expecting.
On the Jira https://issues.apache.org/jira/browse/HBASE-25972, Kadir has also shown how much full-scan improvement is observed using PE (second comment on the Jira). Could you run the same steps to see how much improvement you observe with this PR?
@virajjasani
PE does not have a test case that does Put, Delete, and Get on the same row. Should I add a new test case and run it?
It's not necessary because the steps mentioned in the Jira will take care of adding many delete markers so you can follow the exact same steps. Thank you!
To me, this improvement is only meaningful when the scanned data is in the memstore, assuming the skip list is used for jumping from one column to the next (I have not looked at the code in detail recently, so I assume that is the case). However, when HBase scans data from an HFile, do we have data structures in place to jump from one column to the next? I think we do not. Not only do we linearly scan the cells within a row, we also linearly scan all rows within an HBase block, do we not? So I do not understand why skipping to the next column would be a significant optimization in general.
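This point can be made concrete with a small self-contained sketch using plain Java collections (not HBase code; the key layout is invented for illustration): a memstore-like ConcurrentSkipListMap can jump past every version and delete marker of a column in a single O(log n) lookup, while a flat array standing in for an HFile block can only be walked cell by cell.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical model, not HBase's real classes: the memstore is backed by a
// skip list, so "seek to the next column" is one ordered-map lookup, while a
// flat HFile block only supports a linear walk over its cells.
public class SeekVsLinear {

  // Skip-list seek: one call lands on the first key after the given column.
  public static String seekPastColumn(ConcurrentSkipListMap<String, String> memstore,
                                      String columnPrefix) {
    // '~' sorts after the digits used in the timestamps, so this jumps past
    // all "row:col:<ts>" keys of the column in a single lookup.
    Map.Entry<String, String> e = memstore.higherEntry(columnPrefix + "~");
    return e == null ? null : e.getKey();
  }

  // Linear walk: counts how many cells must be touched to leave the column.
  public static int linearStepsPastColumn(String[] block, String columnPrefix) {
    int steps = 0;
    while (steps < block.length && block[steps].startsWith(columnPrefix)) {
      steps++; // every delete marker / version is visited individually
    }
    return steps;
  }

  public static void main(String[] args) {
    ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
    String[] block = new String[10001];
    for (int ts = 0; ts < 10000; ts++) {
      String key = String.format("row:colA:%05d", ts); // 10000 delete markers
      memstore.put(key, "DELETE_MARKER");
      block[ts] = key;
    }
    memstore.put("row:colB:00000", "value");
    block[10000] = "row:colB:00000";

    System.out.println(seekPastColumn(memstore, "row:colA:")); // row:colB:00000
    System.out.println(linearStepsPastColumn(block, "row:colA:")); // 10000
  }
}
```

The benchmark results later in the thread suggest the seek verdict still helps on StoreFiles, presumably because seeking avoids per-cell matcher work even when the underlying read is linear within a block.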
> Thanks @EungsopYoo, how about you keep KEEP_DELETED_CELL false above and see how the perf numbers look with and without your patch?

I have run the same tests again, except that KEEP_DELETED_CELL is set to false.

master
- scan with dual file compaction enabled: 4047ms
- scan with dual file compaction disabled: 4277ms
- scan with delete markers and dual file compaction disabled: 2.5031 seconds
- scan with delete markers and dual file compaction enabled: 0.0198 seconds

this PR
- scan with dual file compaction enabled: 4134ms
- scan with dual file compaction disabled: 3383ms
- scan with delete markers and dual file compaction disabled: 3.4726 seconds
- scan with delete markers and dual file compaction enabled: 0.0245 seconds

It looks like there is some performance degradation in the third result. I will dig into it.
> To me, this improvement is only meaningful when the scanned data is in the memstore [...] So I did not understand why skipping to the next column would be a significant optimization in general.
https://issues.apache.org/jira/browse/HBASE-29039
The performance tests in the Jira description only cover reading from the MemStore. So I have run new performance tests that read from StoreFiles only, with and without dual file compaction.
```ruby
create 'test', 'c'
java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.util.Bytes # may already be available in the shell
java_import java.lang.System
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))
1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j| # renamed from i to avoid shadowing the outer counter
    del = Delete.new(Bytes.toBytes('row'))
    del.addFamily(Bytes.toBytes('c'), now + j) # addFamily returns the Delete itself
  end
  table.delete(dels)
  sleep(10)
  flush 'test'
  # Trigger a minor compaction manually, because compaction is not triggered
  # automatically here (I don't know why yet)
  compact 'test' if i % 10 == 0
  puts "i - #{i}"
  # Read from StoreFiles after the flush
  get 'test', 'row'
end
```
Every `get` returned 0 row(s); the table below shows only the reported `Took` time in seconds at each sampled iteration (DFC = dual file compaction).

| i  | master, DFC off | master, DFC on | this PR, DFC off | this PR, DFC on |
|----|-----------------|----------------|------------------|-----------------|
| 0  | 0.0184          | 0.0050         | 0.0061           | 0.0046          |
| 10 | 0.0206          | 0.0175         | 0.0102           | 0.0103          |
| 20 | 0.0398          | 0.0265         | 0.0052           | 0.0074          |
| 30 | 0.0462          | 0.0310         | 0.0073           | 0.0067          |
| 40 | 0.0517          | 0.0205         | 0.0077           | 0.0077          |
| 50 | 0.0730          | 0.0532         | 0.0115           | 0.0091          |
| 60 | 0.0903          | 0.0360         | 0.0101           | 0.0106          |
The results show that this PR's optimization also works when reading from StoreFiles, even without dual file compaction. What do you think about these results?
> It looks like there is some performance degradation in the third result. I will dig into it.
The slight performance degradation is due to the removal of the early return of MatchCode.SKIP in the normal case. Because the early return is gone, the checkDeleted() method is executed more often than before, which adds some computational overhead. I found this by removing the added code blocks one by one.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 644 in 6390115:

```java
ScanQueryMatcher.MatchCode qcode = matcher.match(cell, prevCell);
```

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, lines 758 to 768 in 6390115:

```java
case SEEK_NEXT_COL:
  seekOrSkipToNextColumn(cell);
  NextState stateAfterSeekNextColumn = needToReturn(outResult);
  if (stateAfterSeekNextColumn != null) {
    return scannerContext.setScannerState(stateAfterSeekNextColumn).hasMoreValues();
  }
  break;
case SKIP:
  this.heap.next();
  break;
```
With some more digging, I found that the actual cause of the degradation is the return value of matcher.match(). Processing a SKIP return value is very lightweight, but processing a SEEK_NEXT_COL return value is much heavier.
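That trade-off can be captured in a toy cost model (the unit costs below are illustrative guesses, not measured HBase numbers): per verdict, SKIP only advances the heap, while SEEK_NEXT_COL also builds a seek key and may reseek the heap, so SKIP wins with a single marker and a seek wins once markers accumulate.

```java
// Toy cost model for the SKIP vs SEEK_NEXT_COL trade-off described above.
// The unit costs are illustrative guesses, not measured HBase costs.
public class SeekVsSkipTradeoff {

  static final int SKIP_COST = 1; // just this.heap.next()
  static final int SEEK_COST = 8; // build a seek key, compare, possibly reseek the heap

  // Scanning past n consecutive delete markers by returning SKIP for each one.
  public static int costAllSkip(int n) {
    return n * SKIP_COST;
  }

  // Scanning past the same run with a single SEEK_NEXT_COL verdict
  // (independent of n, since one seek jumps the whole run).
  public static int costOneSeek(int n) {
    return SEEK_COST;
  }

  public static void main(String[] args) {
    // Common case of a single delete marker: SKIP is cheaper, which matches
    // the slight regression observed once the early return is removed.
    System.out.println(costAllSkip(1) < costOneSeek(1)); // true
    // Accumulated markers: one seek beats thousands of skips,
    // which is the scenario this PR targets.
    System.out.println(costAllSkip(10000) > costOneSeek(10000)); // true
  }
}
```

Under this model the crossover sits at a handful of markers, so gating the seek verdict on having already seen a marker (as sketched earlier in the thread) keeps the common single-marker path on the cheap SKIP branch.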
    | @virajjasani @Apache9 @kadirozde | 
No description provided.