HBASE-25984: Avoid premature reuse of sync futures in FSHLog #3371
Conversation
 */
private void markFutureDoneAndOffer(SyncFuture future, long txid, Throwable t) {
  future.done(txid, t);
  syncFutureCache.offer(future);
This patch in its current form doesn't get rid of future overwrites, since they don't seem to cause any issues in the AsyncWAL case (based on code reading). If the reviewers think we should address that too, I can refactor accordingly.
Where is the future overwrite here? The call to 'done'?
@saintstack I documented the race here.
Once done() is called, the future can be reused immediately from another handler (without this patch). That was causing deadlocks in FSHLog. Based on my analysis of AsyncFSWAL, I think the overwrites are possible there too, but they should not affect correctness since the safe point is attained in a different manner. So I wanted to check with Duo, who is the expert on that.
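To make the race concrete, here is a rough sketch of the reuse pattern the patch moves to. The names getSyncFuture, getIfPresentOrNew, and offer are assumptions pieced together from the diff hunks quoted in this review, not necessarily the exact committed signatures.

```java
// Rough sketch only: names are assumptions based on the diff hunks quoted in
// this review, not necessarily the exact committed code.
class SyncFutureReuseSketch {
  private final SyncFutureCache syncFutureCache;

  SyncFutureReuseSketch(SyncFutureCache cache) {
    this.syncFutureCache = cache;
  }

  // Handler thread: obtain a SyncFuture for this sync. The cache only hands
  // back futures that were offered after done() ran, so a handler can never
  // pick up a future that is still in flight inside the WAL.
  SyncFuture getSyncFuture() {
    return syncFutureCache.getIfPresentOrNew();
  }

  // Consumer/sync thread: complete the future first, then make it reusable.
  void markFutureDoneAndOffer(SyncFuture future, long txid, Throwable t) {
    future.done(txid, t);          // wakes the blocked handler
    syncFutureCache.offer(future); // only now is it eligible for reuse
  }
}
```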
Sorry. Should have done more background reading before commenting.
💔 -1 overall
This message was automatically generated.
Looks nice. Cache makes sense.
Should there be bounds on the cache size? There doesn't seem to be any.
I do wonder about all threads contending on the cache object (where, IIRC, there was no coordination before).
public SyncFutureCache(final Configuration conf) {
  final int handlerCount = conf.getInt(HConstants.REGION_SERVER_HANDLER_COUNT,
    HConstants.DEFAULT_REGION_SERVER_HANDLER_COUNT);
  syncFutureCache = CacheBuilder.newBuilder().initialCapacity(handlerCount)
I thought this Guava cache was 'slow'. Ben Manes tried to get a patch into Guava that improved it but couldn't get interest, so his Caffeine cache has a means of implementing the Guava cache API... so you can drop in his thing instead.
Maybe it doesn't matter here because the scale of objects is small? It is a critical section though.
Ah sweet, I heard about this Guava vs Caffeine thing, let me replace that. (didn't notice any performance issues with the patch but if we get some extra performance why not)
I'm not sure if there is a benefit over a ThreadLocal in your case. The expiration time here seems to be only to evict when the thread dies, which a TL does automatically. A weakKey cache might be the closest equivalent to that.
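For reference, a minimal sketch of the weak-key alternative described here, assuming Guava's CacheBuilder; the class and method names are illustrative, and this is not necessarily what the patch ends up doing.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Minimal sketch of the weak-key alternative: entries are keyed by the handler
// Thread, and once that Thread dies and is no longer referenced anywhere the
// entry becomes collectable, which is roughly what a ThreadLocal gives for free.
final class WeakKeySyncFutureCacheSketch {
  private final Cache<Thread, SyncFuture> cache =
      CacheBuilder.newBuilder().weakKeys().build();

  // Take this thread's cached future if one was offered back, else create one.
  SyncFuture getIfPresentOrNew() {
    SyncFuture future = cache.asMap().remove(Thread.currentThread());
    return (future == null) ? new SyncFuture() : future;
  }

  // Return a completed future so the same thread can reuse it next time.
  void offer(SyncFuture future) {
    cache.put(Thread.currentThread(), future);
  }
}
```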
It also looks like you're on 2.8.1, whereas a putIfAbsent optimization was added in 2.8.2 to avoid locking if the entry is present. That might help.
We don't need a putIfAbsent() either; I switched to put() since we are okay with overwrites here, and it doesn't show up in the profiler now.

> I'm not sure if there is a benefit over a ThreadLocal in your case.

Right, we can't use a ThreadLocal here because of the bug, but a weak-key cache seems like a good alternative, thanks for the pointer.
Good point, I can add a limit, but practically it doesn't matter, I think. We only grow to about as many entries as there are handler threads (plus some extra for temporary operations), which should be in the low thousands, and we already have "expireAfterAccess" set to 2 mins, so the entries should be GC-ed automatically. I'll still add a max limit, just in case.
The cache is keyed by the Thread object using the future, so the contention scope is the cache key; and since only a single thread uses a given sync future at any point in time, in effect I think there is no contention.
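To make the bounds/expiry point concrete, here is a sketch of how such a cache could be configured. The maximumSize bound and the use of expireAfterWrite are illustrative assumptions; the committed code may differ.

```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;

// Sketch of the construction being discussed. The maximumSize bound and the use
// of expireAfterWrite here are illustrative assumptions, not necessarily what
// the committed patch does.
final class SyncFutureCacheConstructionSketch {
  private final Cache<Thread, SyncFuture> syncFutureCache;

  SyncFutureCacheConstructionSketch(Configuration conf) {
    int handlerCount = conf.getInt(HConstants.REGION_SERVER_HANDLER_COUNT,
        HConstants.DEFAULT_REGION_SERVER_HANDLER_COUNT);
    this.syncFutureCache = CacheBuilder.newBuilder()
        .initialCapacity(handlerCount)          // roughly one entry per handler thread
        .maximumSize(handlerCount * 2L)         // hard cap, illustrative value only
        .expireAfterWrite(2, TimeUnit.MINUTES)  // entries not re-offered within 2 mins are evicted
        .build();
  }
}
```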
You might check if the Guava cache is backed by a ConcurrentHashMap (going by the API, it looks like it). CHM #get and #put are lockless but the likes of #putIfAbsent are not; they lock the 'bucket' the key is in. We have a PerformanceEvaluation for WAL that might help here... Could try a run and see if the cache shows up in a profile, etc.
Nice catch, this coarse-grained segment locking didn't cross my mind, thanks for the correction. I did run the WAL PE; here are the results. There is roughly a ~1% drop for async WAL, but performance was better in the FSHLog case. Following are the results for the default AsyncFSWAL implementation. cc: @apurtell

bin/hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 256 -roll 10000 -verify

With Patch:

Without patch:

It does show up on the profiler though.. doesn't seem like a great idea then?
Let me try Caffeine and see what it looks like.
Caffeine seems to perform even worse overall (even though the flame graph % is slightly lower); maybe it's some noise on my machine too.

With patch using Caffeine
Flame graph without any changes.. overall it seems like we have 1.xx% overhead due to the cache, and Guava seems slightly better than Caffeine. cc: @saintstack / @apurtell

Without Changes
Thread local was used to avoid threads having to coordinate around a resource; this plus the ring buffer was meant to give us a write path that was w/o locking (till we hit dfsclient at least). This impl is old now though, given the redo in asyncwal.

Thanks for doing the nice graphs and perf runs above. Interesting on Caffeine vs Guava; probably this workload (but in a compare of Caffeine against our block cache CHM, CHM seemed better, though Caffeine looked to have a much nicer 'locking' profile).

It is hard to argue w/ a perf improvement (and improved safety). I'd be in favor of commit. Funny though how it's 103 appends per sync w/ patch and w/o (buffer limit?). You might try using get/put instead of getIfPresent or whatever it is you are using, just to see what is possible perf-wise. Have you looked at flamegraphs for asyncwal? Perhaps you'll see where the 1% is going?

Yeah, would be good to get @Apache9 input. Nice find @bharathv (after reading the JIRA)
Ya, that makes sense. I also have a version that fixes that problem by retaining the ThreadLocals (see this). It works by distinguishing two states, DONE and RELEASED; if this contention is a concern for any reason, just FYI (a rough sketch of that idea follows below).

I don't know what % of it is noise in the environment; it is on my local machine and may have some interference (I tried to minimize it as much as I can).

It is going into putIfAbsent(), see below.
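For reference, a very rough sketch of what that two-state idea might look like; everything here is illustrative and may differ from the linked version.

```java
// Very rough sketch of the DONE/RELEASED idea: the completing thread marks the
// future DONE, but the instance only becomes reusable once the waiting handler
// explicitly releases it. Names and structure are illustrative only.
final class TwoStateSyncFutureSketch {
  private enum State { PENDING, DONE, RELEASED }

  private State state = State.PENDING;
  private long doneTxid;
  private Throwable throwable;

  // Consumer/sync thread: publish the result and wake the waiting handler.
  synchronized void done(long txid, Throwable t) {
    this.doneTxid = txid;
    this.throwable = t;
    this.state = State.DONE;
    notifyAll();
  }

  // Handler thread: block until the sync completes (error handling elided).
  synchronized long get() throws InterruptedException {
    while (state == State.PENDING) {
      wait();
    }
    return doneTxid;
  }

  // Handler thread: only after this is the (ThreadLocal-cached) instance safe
  // to reset and reuse for another sync.
  synchronized void release() {
    this.state = State.RELEASED;
  }

  synchronized boolean isReleased() {
    return state == State.RELEASED;
  }
}
```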
I think the assumption here is that the thread which holds the SyncFuture will block on it. If it times out, we will kill the regionserver, so it is not a big deal to use a per-thread cache. If this is not the case, then I agree with using a general cache, but it will require us to manually return the SyncFuture though. Will take a look at the performance results later. Thanks.
Thanks for taking a look. Can you please point me to the code (I may have missed it)? If we timed out (as seen by the client), the timeout (in some form) is propagated back to the caller, but in the background the sync call could still be successful (if there are issues like slow syncs/HDFS etc)? Or did I misinterpret what you are saying?
For example, on region flush, a DroppedSnapshotException will trigger a region server abort. Anyway, there is no strong guarantee on killing the regionserver on timeout, especially since it is not easy to add this logic in SyncFuture, so I'm OK with the approach here. Thanks.
syncFutureCache.asMap().compute(Thread.currentThread(), (thread, syncFuture) -> {
  result[0] = syncFuture;
  return null;
});
you could simplify this by using remove(key) instead of a computation, e.g.

SyncFuture future = syncFutureCache.asMap().remove(Thread.currentThread());
return (future == null) ? new SyncFuture() : future;
Neat, done!
public void cleanUp() {
  if (syncFutureCache != null) {
    syncFutureCache.invalidateAll();
  }
}
This might be a confusing method name, as cleanUp for the cache lib means that pending maintenance work is performed immediately (e.g. discarding any expired entries). It doesn't clear the cache like this does, so a user might be surprised by whichever mental model they have. Instead, please call this clear, invalidateAll, etc to avoid conflicting names.
Makes sense, done.
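For illustration, the rename could look roughly like this (the method name clear is an assumption; the committed code may differ):

```java
// Sketch of the rename: cleanUp() on Guava's Cache only runs pending
// maintenance, so expose the "drop everything" operation under a different,
// non-conflicting name (clear here is an assumption).
public void clear() {
  if (syncFutureCache != null) {
    syncFutureCache.invalidateAll();
  }
}
```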
💔 -1 overall
This message was automatically generated.

🎊 +1 overall
This message was automatically generated.
Can someone sign off on this change please? Need a +1 to merge.
Left minor questions, overall looks good to ship, +1
SyncFuture future = syncFutureCache.asMap().remove(Thread.currentThread());
return (future == null) ? new SyncFuture() : future;
This suggestion was great. I think we can make this change in branch-1 as well (sorry, I lost track; not sure if the branch-1 PR is still pending merge or already merged).
There is no branch-1 patch (yet).
@InterfaceAudience.Private
public final class SyncFutureCache {

  private static final long SYNC_FUTURE_INVALIDATION_TIMEOUT_MINS = 2;
Is 2 min a good estimate? Trying to understand if we might run into overhead (a cache entry getting expired, followed by the same entry getting created for the same thread from the pool).
If a handler remained idle for 2 mins, that indicates there isn't enough load on the server, in which case there is no advantage in keeping this around. It is usually only helpful if there is high load and potential for reuse, so that we don't need to GC these small objects often.
…3371) Signed-off-by: Viraj Jasani <[email protected]> (cherry picked from commit 5a19bcf)
…3392) Signed-off-by: Viraj Jasani <[email protected]> (cherry picked from commit 5a19bcf)
…3393) Signed-off-by: Viraj Jasani <[email protected]> Signed-off-by: Wei-Chiu Chuang <[email protected]> (cherry picked from commit 5a19bcf)
…3394) Signed-off-by: Viraj Jasani <[email protected]> (cherry picked from commit 5a19bcf)
…3398) Signed-off-by: Viraj Jasani <[email protected]> (cherry picked from commit 5a19bcf)





