HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell #6557
Conversation
| I checked the code: we do have logic to seek to the next row or column when we hit a DeleteFamily cell. Line 207 in 28c4353
 But the problem is that we seem to return earlier, before we actually call that method. Line 76 in 28c4353
 The code block above seems incorrect: we always return MatchCode.SKIP if we get a delete marker. I think the reason we did not find this before is that usually there is only one delete marker, so when we check the next cell we fall through and call the checkDeleted method, which seeks to the next row or column. The scenario here is that we have a bunch of delete markers; we will then scan them all instead of seeking to the next row or column, since we always enter the code block above and return MatchCode.SKIP. I think we should try to optimize the logic of that code block. Thanks. |
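The control flow described above can be illustrated with a deliberately simplified sketch (hypothetical names, not the real ScanQueryMatcher): the early return of MatchCode.SKIP on every delete marker means a run of markers is consumed one cell at a time, whereas consulting the delete tracker first would allow a seek verdict past the whole run.

```java
// Simplified, hypothetical sketch of the matcher logic discussed above;
// this is NOT the real ScanQueryMatcher, only an illustration of the control flow.
enum MatchCode { SKIP, SEEK_NEXT_COL, INCLUDE }

public class MatcherSketch {

  // Current behavior: every delete marker short-circuits to SKIP,
  // so a run of n markers costs n heap.next() calls.
  public static MatchCode matchCurrent(boolean isDeleteMarker) {
    if (isDeleteMarker) {
      return MatchCode.SKIP; // early return; checkDeleted() is never reached
    }
    return MatchCode.INCLUDE;
  }

  // Sketch of the proposed behavior: once a delete marker for this row/column
  // has been tracked, further markers yield a seek verdict, letting the
  // scanner jump past the whole accumulated run.
  public static MatchCode matchOptimized(boolean isDeleteMarker, boolean alreadyTracked) {
    if (isDeleteMarker) {
      return alreadyTracked ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
    }
    return MatchCode.INCLUDE;
  }
}
```

With one marker both variants behave the same; the difference only appears once markers accumulate on the same row or cell.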
| FYI @kadirozde | 
| 🎊 +1 overall. This message was automatically generated. |
| 💔 -1 overall. This message was automatically generated. |
```java
private boolean canOptimizeReadDeleteMarkers() {
  // for simplicity, optimization works only for these cases
  return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
    && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
}
```
@EungsopYoo have you also considered Dual File Compaction #5545?
Could you also run some perf tests comparing Dual File Compaction with this optimization? That would be really helpful.
@virajjasani
I have reviewed the Dual File Compaction work you mentioned. This PR and Dual File Compaction have something in common, especially in how they handle delete markers, but I think there are some differences.
This PR focuses on accumulated delete markers on the same row or cell, whereas Dual File Compaction handles delete markers across different rows or columns. Also, this PR can optimize reads from both the MemStore and StoreFiles, while Dual File Compaction can only optimize reads from StoreFiles.
So I think they are complementary and can be used together.
Thanks @EungsopYoo, this is what I was also expecting.
On the Jira https://issues.apache.org/jira/browse/HBASE-25972, Kadir has also shown how much full-scan improvement is observed using PE (second comment on the Jira). Could you run the same steps to see how much improvement you observe with this PR?
@virajjasani
PE does not have a test case that does Put, Delete, and Get on the same row. Should I add a new test case and run it?
It's not necessary because the steps mentioned in the Jira will take care of adding many delete markers so you can follow the exact same steps. Thank you!
To me, this improvement is only meaningful when the scanned data is in the memstore, assuming the skip list is used for jumping from one column to the next (I have not looked at the code in detail recently, so I assume that is the case). However, when HBase scans data from an HFile, do we have data structures in place to jump from one column to the next? I think we do not. Not only do we linearly scan the cells within a row, we also linearly scan all rows within an HBase block, do we not? So I do not understand why skipping to the next column would be a significant optimization in general.
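This point can be made concrete with a small self-contained sketch using plain Java collections (not HBase code; the key layout is invented for illustration): a memstore-like ConcurrentSkipListMap can jump past every version and delete marker of a column in a single O(log n) lookup, while a flat array standing in for an HFile block can only be walked cell by cell.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical model, not HBase's real classes: the memstore is backed by a
// skip list, so "seek to the next column" is one ordered-map lookup, while a
// flat HFile block only supports a linear walk over its cells.
public class SeekVsLinear {

  // Skip-list seek: one call lands on the first key after the given column.
  public static String seekPastColumn(ConcurrentSkipListMap<String, String> memstore,
                                      String columnPrefix) {
    // '~' sorts after the digits used in the timestamps, so this jumps past
    // all "row:col:<ts>" keys of the column in a single lookup.
    Map.Entry<String, String> e = memstore.higherEntry(columnPrefix + "~");
    return e == null ? null : e.getKey();
  }

  // Linear walk: counts how many cells must be touched to leave the column.
  public static int linearStepsPastColumn(String[] block, String columnPrefix) {
    int steps = 0;
    while (steps < block.length && block[steps].startsWith(columnPrefix)) {
      steps++; // every delete marker / version is visited individually
    }
    return steps;
  }

  public static void main(String[] args) {
    ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
    String[] block = new String[10001];
    for (int ts = 0; ts < 10000; ts++) {
      String key = String.format("row:colA:%05d", ts); // 10000 delete markers
      memstore.put(key, "DELETE_MARKER");
      block[ts] = key;
    }
    memstore.put("row:colB:00000", "value");
    block[10000] = "row:colB:00000";

    System.out.println(seekPastColumn(memstore, "row:colA:")); // row:colB:00000
    System.out.println(linearStepsPastColumn(block, "row:colA:")); // 10000
  }
}
```

The benchmark results later in the thread suggest the seek verdict still helps on StoreFiles, presumably because seeking avoids per-cell matcher work even when the underlying read is linear within a block.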
> Thanks @EungsopYoo, how about you keep KEEP_DELETED_CELL false above and see how the perf numbers look with and without your patch?

I have run the same tests again, except that KEEP_DELETED_CELL is set to false.

master
- scan with dual file compaction enabled: 4047ms
- scan with dual file compaction disabled: 4277ms
- scan with delete markers and dual file compaction disabled: 2.5031 seconds
- scan with delete markers and dual file compaction enabled: 0.0198 seconds

this PR
- scan with dual file compaction enabled: 4134ms
- scan with dual file compaction disabled: 3383ms
- scan with delete markers and dual file compaction disabled: 3.4726 seconds
- scan with delete markers and dual file compaction enabled: 0.0245 seconds

It looks like there is some performance degradation in the third result. I will dig into it.
> To me, this improvement is only meaningful when the scanned data is in the memstore [...] So I did not understand why skipping to the next column would be a significant optimization in general.
https://issues.apache.org/jira/browse/HBASE-29039
The performance tests in the Jira description only cover reading from the MemStore. So I have run new performance tests that read from StoreFiles only, with and without dual file compaction.
```ruby
create 'test', 'c'
java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.util.Bytes # may already be available in the shell
java_import java.lang.System
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))
1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j| # renamed from i to avoid shadowing the outer counter
    del = Delete.new(Bytes.toBytes('row'))
    del.addFamily(Bytes.toBytes('c'), now + j) # addFamily returns the Delete itself
  end
  table.delete(dels)
  sleep(10)
  flush 'test'
  # Trigger a minor compaction manually, because compaction is not triggered
  # automatically here (I don't know why yet)
  compact 'test' if i % 10 == 0
  puts "i - #{i}"
  # Read from StoreFiles after the flush
  get 'test', 'row'
end
```
Every `get` returned 0 row(s); the table below shows only the reported `Took` time in seconds at each sampled iteration (DFC = dual file compaction).

| i  | master, DFC off | master, DFC on | this PR, DFC off | this PR, DFC on |
|----|-----------------|----------------|------------------|-----------------|
| 0  | 0.0184          | 0.0050         | 0.0061           | 0.0046          |
| 10 | 0.0206          | 0.0175         | 0.0102           | 0.0103          |
| 20 | 0.0398          | 0.0265         | 0.0052           | 0.0074          |
| 30 | 0.0462          | 0.0310         | 0.0073           | 0.0067          |
| 40 | 0.0517          | 0.0205         | 0.0077           | 0.0077          |
| 50 | 0.0730          | 0.0532         | 0.0115           | 0.0091          |
| 60 | 0.0903          | 0.0360         | 0.0101           | 0.0106          |
The results show that this PR's optimization also works when reading from StoreFiles, even without dual file compaction. What do you think about these results?
> It looks like there is some performance degradation in the third result. I will dig into it.
The slight performance degradation is due to the removal of the early return of MatchCode.SKIP in the normal case. Because the early return is gone, the checkDeleted() method is executed more often than before, which adds some computational overhead. I found this by removing the added code blocks one by one.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 644 in 6390115:

```java
ScanQueryMatcher.MatchCode qcode = matcher.match(cell, prevCell);
```

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, lines 758 to 768 in 6390115:

```java
case SEEK_NEXT_COL:
  seekOrSkipToNextColumn(cell);
  NextState stateAfterSeekNextColumn = needToReturn(outResult);
  if (stateAfterSeekNextColumn != null) {
    return scannerContext.setScannerState(stateAfterSeekNextColumn).hasMoreValues();
  }
  break;
case SKIP:
  this.heap.next();
  break;
```
With some more digging, I found that the actual cause of the degradation is the return value of matcher.match(). Processing a SKIP return value is very lightweight, but processing a SEEK_NEXT_COL return value is much heavier.
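That trade-off can be captured in a toy cost model (the unit costs below are illustrative guesses, not measured HBase numbers): per verdict, SKIP only advances the heap, while SEEK_NEXT_COL also builds a seek key and may reseek the heap, so SKIP wins with a single marker and a seek wins once markers accumulate.

```java
// Toy cost model for the SKIP vs SEEK_NEXT_COL trade-off described above.
// The unit costs are illustrative guesses, not measured HBase costs.
public class SeekVsSkipTradeoff {

  static final int SKIP_COST = 1; // just this.heap.next()
  static final int SEEK_COST = 8; // build a seek key, compare, possibly reseek the heap

  // Scanning past n consecutive delete markers by returning SKIP for each one.
  public static int costAllSkip(int n) {
    return n * SKIP_COST;
  }

  // Scanning past the same run with a single SEEK_NEXT_COL verdict
  // (independent of n, since one seek jumps the whole run).
  public static int costOneSeek(int n) {
    return SEEK_COST;
  }

  public static void main(String[] args) {
    // Common case of a single delete marker: SKIP is cheaper, which matches
    // the slight regression observed once the early return is removed.
    System.out.println(costAllSkip(1) < costOneSeek(1)); // true
    // Accumulated markers: one seek beats thousands of skips,
    // which is the scenario this PR targets.
    System.out.println(costAllSkip(10000) > costOneSeek(10000)); // true
  }
}
```

Under this model the crossover sits at a handful of markers, so gating the seek verdict on having already seen a marker (as sketched earlier in the thread) keeps the common single-marker path on the cheap SKIP branch.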
    | @virajjasani @Apache9 @kadirozde | 
No description provided.