
Conversation

@EungsopYoo
Contributor

No description provided.

@EungsopYoo EungsopYoo changed the title Optimize read performance for accumulated delete markers on the same row or cell HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell Dec 23, 2024

@Apache9
Contributor

Apache9 commented Dec 29, 2024

I checked the code; we do have logic to seek to the next row or column when we hit a delete family cell.

But the problem is that it seems we return early, before we actually call this method here.

The following code block

    if (PrivateCellUtil.isDelete(typeByte)) {
      boolean includeDeleteMarker =
        seePastDeleteMarkers ? tr.withinTimeRange(timestamp) : tr.withinOrAfterTimeRange(timestamp);
      if (includeDeleteMarker) {
        this.deletes.add(cell);
      }
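      // always SKIP: no seek, even when many markers for the same cell follow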
      return MatchCode.SKIP;
    }

seems incorrect: we will always return MatchCode.SKIP when we get a delete marker...

I think the reason we did not find this before is that usually there is only one delete marker, so when we check the next cell we fall through and call the checkDeleted method, and then we seek to the next row or column.

The scenario here is that we have a bunch of delete markers, so we will scan them all instead of seeking to the next row or column, since we always go into the code block above and return MatchCode.SKIP.

I think we should try to optimize the logic of the above code block.

Thanks.
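To make the cost concrete, here is a self-contained toy model of the behavior described above (plain Java, not HBase code; all names are invented for illustration): with per-marker SKIP the scanner touches every accumulated marker, while a SEEK_NEXT_COL-style jump touches only the first one.

    import java.util.ArrayList;
    import java.util.List;

    // Toy model: a sorted cell stream for one row, holding N delete markers
    // for the same column followed by one cell of the next column.
    public class DeleteMarkerScanCost {
      public static void main(String[] args) {
        int n = 10_000;
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < n; i++) {
          cells.add("col1/DeleteFamily/ts=" + i);
        }
        cells.add("col2/Put/ts=0");

        // SKIP-style matching: advance one cell per delete marker.
        int pos = 0;
        while (cells.get(pos).contains("Delete")) {
          pos++; // MatchCode.SKIP -> heap.next()
        }
        System.out.println("cells touched with SKIP: " + pos);

        // SEEK_NEXT_COL-style matching: one jump past the whole column,
        // e.g. via a skip list or block index instead of cell-by-cell reads.
        System.out.println("cells touched with SEEK_NEXT_COL: 1");
      }
    }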

@virajjasani
Contributor

FYI @kadirozde


@EungsopYoo
Contributor Author

>     if (PrivateCellUtil.isDelete(typeByte)) {
>       boolean includeDeleteMarker =
>         seePastDeleteMarkers ? tr.withinTimeRange(timestamp) : tr.withinOrAfterTimeRange(timestamp);
>       if (includeDeleteMarker) {
>         this.deletes.add(cell);
>       }
>       return MatchCode.SKIP;
>     }
>
> Seems incorrect, we will always return MatchCode.SKIP when we get a delete marker...
>
> I think the reason we did not find this before is that usually there is only one delete marker, so when we check the next cell we fall through and call the checkDeleted method, and then we seek to the next row or column.
>
> The scenario here is that we have a bunch of delete markers, so we will scan them all instead of seeking to the next row or column, since we always go into the code block above and return MatchCode.SKIP.
>
> I think we should try to optimize the logic of the above code block.

f6739e2
I removed the early return MatchCode.SKIP statement, but some test cases related to NewVersionBehavior failed after that.

74ecf71
So I restored the return MatchCode.SKIP statement, guarded by some exceptional conditions.
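For readers following the thread, here is a sketch of the resulting shape of the logic. It is pieced together from this discussion and the canOptimizeReadDeleteMarkers() guard quoted later in this review; it is an illustration, not the literal content of the two commits above.

    if (PrivateCellUtil.isDelete(typeByte)) {
      boolean includeDeleteMarker =
        seePastDeleteMarkers ? tr.withinTimeRange(timestamp) : tr.withinOrAfterTimeRange(timestamp);
      if (includeDeleteMarker) {
        this.deletes.add(cell);
      }
      if (!canOptimizeReadDeleteMarkers()) {
        // exceptional cases (filters, visibility labels, NewVersionBehavior, ...)
        // keep the old behavior
        return MatchCode.SKIP;
      }
      // otherwise fall through so checkDeleted() can answer with a seek
      // (SEEK_NEXT_COL / SEEK_NEXT_ROW) instead of skipping marker by marker
    }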

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|--------:|:--------|:--------|
| +0 🆗 | reexec | 0m 28s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | _ master Compile Tests _ | | | |
| +1 💚 | mvninstall | 3m 7s | | master passed |
| +1 💚 | compile | 3m 7s | | master passed |
| +1 💚 | checkstyle | 0m 37s | | master passed |
| +1 💚 | spotbugs | 1m 38s | | master passed |
| +1 💚 | spotless | 0m 45s | | branch has no errors when running spotless:check. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 2m 51s | | the patch passed |
| +1 💚 | compile | 3m 8s | | the patch passed |
| +1 💚 | javac | 3m 8s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 35s | /results-checkstyle-hbase-server.txt | hbase-server: The patch generated 4 new + 8 unchanged - 0 fixed = 12 total (was 8) |
| +1 💚 | spotbugs | 1m 37s | | the patch passed |
| +1 💚 | hadoopcheck | 11m 6s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 43s | | patch has no errors when running spotless:check. |
| | _ Other Tests _ | | | |
| +1 💚 | asflicense | 0m 10s | | The patch does not generate ASF License warnings. |
| | | 36m 49s | | |
| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6557/10/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6557 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 4ce58ddd6d71 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 6390115 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 85 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6557/10/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|--------:|:--------|:--------|
| +0 🆗 | reexec | 0m 27s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 2s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | _ Prechecks _ | | | |
| | _ master Compile Tests _ | | | |
| +1 💚 | mvninstall | 3m 15s | | master passed |
| +1 💚 | compile | 0m 52s | | master passed |
| +1 💚 | javadoc | 0m 26s | | master passed |
| +1 💚 | shadedjars | 5m 36s | | branch has no errors when building our shaded downstream artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 2m 52s | | the patch passed |
| +1 💚 | compile | 0m 52s | | the patch passed |
| +1 💚 | javac | 0m 52s | | the patch passed |
| +1 💚 | javadoc | 0m 25s | | the patch passed |
| +1 💚 | shadedjars | 5m 34s | | patch has no errors when building our shaded downstream artifacts. |
| | _ Other Tests _ | | | |
| -1 ❌ | unit | 206m 15s | /patch-unit-hbase-server.txt | hbase-server in the patch failed. |
| | | 230m 30s | | |
| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6557/10/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6557 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux a34d57e7da1c 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 6390115 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6557/10/testReport/ |
| Max. process+thread count | 5332 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6557/10/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

Comment on lines +117 to +121
    private boolean canOptimizeReadDeleteMarkers() {
      // for simplicity, optimization works only for these cases
      return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
        && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
    }
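For illustration, here is one way such a guard could be consulted in the matcher. The trigger condition and return code are assumptions made for the sketch, not the PR's literal diff.

    // Illustration only: on the simple read path, a delete-family marker
    // covers every older cell in the column, so seek instead of SKIP.
    if (cell.getType() == Cell.Type.DeleteFamily && canOptimizeReadDeleteMarkers()) {
      return MatchCode.SEEK_NEXT_COL;
    }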
Contributor

@EungsopYoo have you also considered Dual File Compaction (#5545)?
Could you also run some perf tests comparing Dual File Compaction with this optimization? That might be really helpful.

Contributor Author

@EungsopYoo EungsopYoo Jan 20, 2025

@virajjasani
I have reviewed the Dual File Compaction work you mentioned. This PR and Dual File Compaction have something in common, especially in handling delete markers, but I think there are some differences.

This PR focuses on accumulated delete markers on the same row or cell, while Dual File Compaction handles delete markers across different rows or columns. Also, this PR can optimize reads from both the MemStore and StoreFiles, whereas Dual File Compaction can optimize reads from StoreFiles only.

So I think they are complementary and can be used together.

Contributor

Thanks @EungsopYoo, this is what I was also expecting.

On the Jira https://issues.apache.org/jira/browse/HBASE-25972, Kadir has also shown how much full scans improve, measured using PE (see the second comment on the Jira). Could you also run the same steps to see how much improvement you observe with this PR?

Contributor Author

@virajjasani
PE does not have a test case that does Put, Delete, and Get on the same row. Should I add a new test case and run it?

Contributor

It's not necessary, because the steps mentioned in the Jira will take care of adding many delete markers, so you can follow the exact same steps. Thank you!

Contributor

To me, this improvement is only meaningful when the scanned data is in the memstore, assuming that the skip list is used for jumping from one column to the next (I have not looked at the code in detail recently, so I assume that is the case). However, when HBase scans data from an HFile, do we have data structures in place to jump from one column to the next one? I think we do not. Not only do we linearly scan the cells within a row, we also linearly scan all rows within an HBase block, don't we? So I do not understand why skipping to the next column would be a significant optimization in general.
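One hedged note on the HFile side, based on the general scanner code rather than this PR's diff: even without a per-column index, a SEEK_NEXT_COL answer is turned into a reseek on a synthetic key, and a reseek can consult the HFile block index. So once the accumulated markers span multiple blocks, the scanner can skip whole blocks instead of decoding every cell, though within a single block the scan is indeed linear, as noted above.

    // Sketch using public HBase API names (simplified; see
    // ScanQueryMatcher.getKeyForNextColumn for the real logic):
    Cell seekKey = PrivateCellUtil.createLastOnRowCol(cell); // fake key just past this column
    heap.reseek(seekKey); // may consult the block index and skip whole HFile blocks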

Contributor Author

> Thanks @EungsopYoo, how about you keep KEEP_DELETED_CELL false above and see how the perf numbers look with and without your patch?

I have run the same tests again, except that KEEP_DELETED_CELL is set to false.

master

  1. scan with dual file compaction enabled 4047ms
  2. scan with dual file compaction disabled 4277ms
  3. scan with delete markers and dual file compaction disabled 2.5031 seconds
  4. scan with delete markers and dual file compaction enabled 0.0198 seconds

this PR

  1. scan with dual file compaction enabled 4134ms
  2. scan with dual file compaction disabled 3383ms
  3. scan with delete markers and dual file compaction disabled 3.4726 seconds
  4. scan with delete markers and dual file compaction enabled 0.0245 seconds

It looks like there is some performance degradation in result 3. I will dig into it.

Contributor Author

@EungsopYoo EungsopYoo Jan 23, 2025

> To me, this improvement is only meaningful when the scanned data is in the memstore, assuming that the skip list is used for jumping from one column to the next (I have not looked at the code in detail recently, so I assume that is the case). However, when HBase scans data from an HFile, do we have data structures in place to jump from one column to the next one? I think we do not. Not only do we linearly scan the cells within a row, we also linearly scan all rows within an HBase block, don't we? So I do not understand why skipping to the next column would be a significant optimization in general.

https://issues.apache.org/jira/browse/HBASE-29039
The performance tests in the Jira description cover only the case of reading from the MemStore. So I have run new performance tests that read from StoreFiles only, with and without dual file compaction.

    create 'test', 'c'

    java_import org.apache.hadoop.hbase.client.Delete
    java_import org.apache.hadoop.hbase.util.Bytes
    java_import org.apache.hadoop.hbase.TableName
    java_import java.lang.System

    con = @hbase.instance_variable_get(:@connection)
    table = con.getTable(TableName.valueOf('test'))

    1000.times do |i|
      # batch 10000 deletes with different timestamps every 10 seconds
      now = System.currentTimeMillis()
      dels = 10000.times.map do |j|
        del = Delete.new(Bytes.toBytes('row'))
        del.addFamily(Bytes.toBytes('c'), now + j)
      end
      table.delete(dels)
      sleep(10)
      flush 'test'
      # trigger a minor compaction manually, because compaction is not
      # triggered automatically here (I don't know why yet)
      compact 'test' if i % 10 == 0
      puts "i - #{i}"
      # read from StoreFiles after the flush
      get 'test', 'row'
    end

Every get returned 0 row(s) in all four configurations; the table below shows the Took time (in seconds) reported by the shell at each sampled iteration.

| i | master, no dual file compaction | master, dual file compaction | this PR, no dual file compaction | this PR, dual file compaction |
|--:|--------------------------------:|-----------------------------:|---------------------------------:|------------------------------:|
| 0 | 0.0184 | 0.0050 | 0.0061 | 0.0046 |
| 10 | 0.0206 | 0.0175 | 0.0102 | 0.0103 |
| 20 | 0.0398 | 0.0265 | 0.0052 | 0.0074 |
| 30 | 0.0462 | 0.0310 | 0.0073 | 0.0067 |
| 40 | 0.0517 | 0.0205 | 0.0077 | 0.0077 |
| 50 | 0.0730 | 0.0532 | 0.0115 | 0.0091 |
| 60 | 0.0903 | 0.0360 | 0.0101 | 0.0106 |

The results show that the optimization in this PR works when reading from StoreFiles too, even without dual file compaction. What do you think about these results?

Contributor Author

@EungsopYoo EungsopYoo Jan 23, 2025

> Thanks @EungsopYoo, how about you keep KEEP_DELETED_CELL false above and see how the perf numbers look with and without your patch?
>
> I have run the same tests again, except that KEEP_DELETED_CELL is set to false.
>
> master
>
>   1. scan with dual file compaction enabled 4047ms
>   2. scan with dual file compaction disabled 4277ms
>   3. scan with delete markers and dual file compaction disabled 2.5031 seconds
>   4. scan with delete markers and dual file compaction enabled 0.0198 seconds
>
> this PR
>
>   1. scan with dual file compaction enabled 4134ms
>   2. scan with dual file compaction disabled 3383ms
>   3. scan with delete markers and dual file compaction disabled 3.4726 seconds
>   4. scan with delete markers and dual file compaction enabled 0.0245 seconds
>
> It looks like there is some performance degradation in result 3. I will dig into it.

https://github.com/EungsopYoo/hbase/blob/63901155caf5c226b02564128669234c08251e8d/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java#L81-L99

The slight performance degradation is due to the removal of the early return of MatchCode.SKIP in the normal cases. Because of that removal, the checkDeleted() method is executed more often than before, which adds some computational overhead. I found this by removing the added code blocks one by one.

Contributor Author

    ScanQueryMatcher.MatchCode qcode = matcher.match(cell, prevCell);

    case SEEK_NEXT_COL:
      seekOrSkipToNextColumn(cell);
      NextState stateAfterSeekNextColumn = needToReturn(outResult);
      if (stateAfterSeekNextColumn != null) {
        return scannerContext.setScannerState(stateAfterSeekNextColumn).hasMoreValues();
      }
      break;
    case SKIP:
      this.heap.next();
      break;

With some more digging, I found that the actual cause of the degradation is the return value of matcher.match(). Processing a SKIP return value is very lightweight, but processing a SEEK_NEXT_COL return value is much heavier.
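This is the classic seek-versus-skip trade-off: a seek pays a higher fixed cost (building a fake key and reseeking every scanner in the heap) but jumps over many cells, while a skip is cheap but advances only one cell. If I recall correctly, StoreScanner's existing optimize() heuristic addresses the same tension by demoting seeks to skips when the seek target is close by. A toy cost model of this trade-off (plain Java, not HBase code; the constants are made up for illustration):

    public class SeekVsSkipCost {
      public static void main(String[] args) {
        long skipCost = 1;  // SKIP: just advance the top scanner of the heap
        long seekCost = 40; // SEEK_NEXT_COL: build a fake key, reseek the heap
        for (long markers : new long[] { 1, 10, 100, 10_000 }) {
          long skipPath = markers * skipCost; // one SKIP per delete marker
          String winner = skipPath > seekCost ? "seek wins" : "skip wins";
          System.out.printf("markers=%-6d skip-path=%-6d seek-path=%-3d %s%n",
            markers, skipPath, seekCost, winner);
        }
      }
    }

In other words, the seek only pays off once enough delete markers have accumulated, which is exactly the scenario this PR targets.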

@EungsopYoo
Contributor Author

@virajjasani @Apache9 @kadirozde
The review seems to have stalled. Is there anything I can do to help move it forward?

