HBASE-27223 Avoid data inconsistent between primary and secondary rep… #4633

Closed

comnetwork wants to merge 2 commits into apache:master from comnetwork:replicaex

Contributor

comnetwork commented Jul 19, 2022

…licas for the new region replication framework

comnetwork added 2 commits

July 19, 2022 19:55


          HBASE-27223 Avoid data inconsistent between primary and secondary rep…

161d843

…licas for the new region replication framework


          remove throws IOException

44f13c2

Apache-HBase commented Jul 19, 2022

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	2m 41s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
+1 💚	mvninstall	2m 27s	master passed
+1 💚	compile	2m 16s	master passed
+1 💚	checkstyle	0m 33s	master passed
+1 💚	spotless	0m 46s	branch has no errors when running spotless:check.
+1 💚	spotbugs	1m 17s	master passed
		_ Patch Compile Tests _
+1 💚	mvninstall	2m 11s	the patch passed
+1 💚	compile	2m 9s	the patch passed
+1 💚	javac	2m 9s	the patch passed
+1 💚	checkstyle	0m 30s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	hadoopcheck	11m 36s	Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚	spotless	0m 44s	patch has no errors when running spotless:check.
+1 💚	spotbugs	1m 22s	the patch passed
		_ Other Tests _
+1 💚	asflicense	0m 11s	The patch does not generate ASF License warnings.
		33m 54s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#4633
Optional Tests	dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname	Linux 81f457351398 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `acf1447`
Default Java	AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count	64 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/console
versions	git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase commented Jul 19, 2022

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 38s	Docker mode activated.
-0 ⚠️	yetus	0m 3s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ master Compile Tests _
+1 💚	mvninstall	2m 22s	master passed
+1 💚	compile	0m 33s	master passed
+1 💚	shadedjars	3m 53s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 23s	master passed
		_ Patch Compile Tests _
+1 💚	mvninstall	2m 8s	the patch passed
+1 💚	compile	0m 35s	the patch passed
+1 💚	javac	0m 35s	the patch passed
+1 💚	shadedjars	3m 58s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 21s	the patch passed
		_ Other Tests _
+1 💚	unit	201m 48s	hbase-server in the patch passed.
		218m 1s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR	#4633
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux 0a3601907fd2 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `acf1447`
Default Java	AdoptOpenJDK-1.8.0_282-b08
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/testReport/
Max. process+thread count	2561 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/console
versions	git=2.17.1 maven=3.6.3
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase commented Jul 19, 2022

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 7s	Docker mode activated.
-0 ⚠️	yetus	0m 3s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ master Compile Tests _
+1 💚	mvninstall	2m 59s	master passed
+1 💚	compile	0m 47s	master passed
+1 💚	shadedjars	3m 44s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 28s	master passed
		_ Patch Compile Tests _
+1 💚	mvninstall	2m 35s	the patch passed
+1 💚	compile	0m 48s	the patch passed
+1 💚	javac	0m 48s	the patch passed
+1 💚	shadedjars	3m 45s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 26s	the patch passed
		_ Other Tests _
+1 💚	unit	205m 15s	hbase-server in the patch passed.
		223m 52s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR	#4633
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux 0bb6b15b5419 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `acf1447`
Default Java	AdoptOpenJDK-11.0.10+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/testReport/
Max. process+thread count	2739 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/1/console
versions	git=2.17.1 maven=3.6.3
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase commented Jul 19, 2022

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 3s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
+1 💚	mvninstall	2m 8s	master passed
+1 💚	compile	2m 14s	master passed
+1 💚	checkstyle	0m 30s	master passed
+1 💚	spotless	0m 43s	branch has no errors when running spotless:check.
+1 💚	spotbugs	1m 15s	master passed
		_ Patch Compile Tests _
+1 💚	mvninstall	2m 15s	the patch passed
+1 💚	compile	2m 14s	the patch passed
+1 💚	javac	2m 14s	the patch passed
+1 💚	checkstyle	0m 31s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	hadoopcheck	11m 22s	Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚	spotless	0m 45s	patch has no errors when running spotless:check.
+1 💚	spotbugs	1m 22s	the patch passed
		_ Other Tests _
+1 💚	asflicense	0m 9s	The patch does not generate ASF License warnings.
		31m 44s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#4633
Optional Tests	dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname	Linux 444a247c0975 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `acf1447`
Default Java	AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count	64 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/console
versions	git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase commented Jul 19, 2022

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 40s	Docker mode activated.
-0 ⚠️	yetus	0m 3s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ master Compile Tests _
+1 💚	mvninstall	2m 25s	master passed
+1 💚	compile	0m 34s	master passed
+1 💚	shadedjars	3m 56s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 23s	master passed
		_ Patch Compile Tests _
+1 💚	mvninstall	2m 5s	the patch passed
+1 💚	compile	0m 34s	the patch passed
+1 💚	javac	0m 34s	the patch passed
+1 💚	shadedjars	3m 56s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 21s	the patch passed
		_ Other Tests _
+1 💚	unit	201m 32s	hbase-server in the patch passed.
		217m 47s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR	#4633
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux c89783809a76 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `acf1447`
Default Java	AdoptOpenJDK-1.8.0_282-b08
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/testReport/
Max. process+thread count	2412 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/console
versions	git=2.17.1 maven=3.6.3
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase commented Jul 19, 2022

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 9s	Docker mode activated.
-0 ⚠️	yetus	0m 2s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ master Compile Tests _
+1 💚	mvninstall	2m 35s	master passed
+1 💚	compile	0m 49s	master passed
+1 💚	shadedjars	3m 43s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 25s	master passed
		_ Patch Compile Tests _
+1 💚	mvninstall	2m 39s	the patch passed
+1 💚	compile	0m 46s	the patch passed
+1 💚	javac	0m 46s	the patch passed
+1 💚	shadedjars	3m 41s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 27s	the patch passed
		_ Other Tests _
+1 💚	unit	205m 46s	hbase-server in the patch passed.
		223m 37s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR	#4633
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux 3e69fe08c33e 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `acf1447`
Default Java	AdoptOpenJDK-11.0.10+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/testReport/
Max. process+thread count	2683 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4633/2/console
versions	git=2.17.1 maven=3.6.3
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache9 reviewed

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java

    
  protected static final int DEFAULT_SLOW_SYNC_ROLL_INTERVAL_MS = 60 * 1000; // in ms, 1 minute

- protected static final String WAL_SYNC_TIMEOUT_MS = "hbase.regionserver.wal.sync.timeout";
+ public static final String WAL_SYNC_TIMEOUT_MS = "hbase.regionserver.wal.sync.timeout";

Contributor

Apache9 Jul 20, 2022

This is a missing part of our design. If we get a timeout exception here, the only correct response is to abort the region server, because WAL sync is designed to succeed or die; there is no 'failure'. It is usually not a big deal because we set a very large default value here, 5 minutes, and the WAL system will usually abort the region server if it cannot finish the sync within 5 minutes...

So I think we should throw a special IOException to the upper layer, and if we get this exception, we abort the region server.
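A minimal sketch of the suggestion above, not the actual HBase code: every name here (`WalSyncTimeoutException`, `Wal`, `Abortable`, `syncOrAbort`) is a hypothetical stand-in chosen for illustration. It only shows the shape of "throw a special IOException, and abort when the upper layer sees it".

```java
import java.io.IOException;

public class SyncOrDieSketch {
  /** Hypothetical marker exception: the sync outcome is unknown, not a clean failure. */
  public static class WalSyncTimeoutException extends IOException {
    public WalSyncTimeoutException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical stand-in for the region server's abort hook. */
  public interface Abortable {
    void abort(String why, Throwable cause);
  }

  /** Hypothetical stand-in for a WAL whose sync may time out. */
  public interface Wal {
    void sync(long txid) throws IOException;
  }

  /** Upper layer: a sync either succeeds, or a timeout aborts the server and is rethrown. */
  public static void syncOrAbort(Wal wal, long txid, Abortable server) throws IOException {
    try {
      wal.sync(txid);
    } catch (WalSyncTimeoutException e) {
      // We cannot tell whether the edit was persisted, so stop serving and let failover decide.
      server.abort("WAL sync timed out, outcome unknown", e);
      throw e;
    }
  }
}
```

Using a dedicated exception type lets the caller tell "outcome unknown" apart from other IOExceptions without parsing messages.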

Contributor Author

comnetwork Jul 20, 2022

@Apache9, OK, thank you very much for the explanation. I will try to fix the code following your suggestion.

Contributor

Apache9 Jul 20, 2022

I think this should be another issue.

Contributor Author

comnetwork Jul 20, 2022 (edited)

@Apache9, after reading the WAL code, I have a question:
For AsyncFSWAL, WAL.sync basically cannot throw any exception except TimeoutIOException, but for FSHLog, WAL.sync can throw any exception thrown by ProtobufLogWriter.append and ProtobufLogWriter.sync. When those exceptions are thrown, it just requests a WAL roll and does not abort the RegionServer. So for AsyncFSWAL we could abort the RegionServer, but for FSHLog it is not suitable to abort the RegionServer when WAL.sync throws an exception other than TimeoutIOException, and we still need a way to avoid the situation described by this issue.

Contributor

Apache9 Jul 20, 2022

This is also a problem for the FSHLog implementation. Basically, if the write to HDFS fails, we do not know whether the data has been persisted or not. The approach in AsyncFSWAL is to open a new writer and write the WAL entries again, with logic added to WAL split and replay to deal with duplicate entries. So for FSHLog, if it is not easy to add the same logic as in AsyncFSWAL, the correct way is to abort the region server and let the failover logic detect whether the WAL entries have been persisted.
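The retry-then-deduplicate idea described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the AsyncFSWAL implementation: `Entry`, `Writer`, `writeWithRetry`, and `replay` are hypothetical names, the "file" is an in-memory list, and duplicates are detected by sequence id purely for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetryAndDedupSketch {
  /** Hypothetical WAL entry identified by a sequence id. */
  public static final class Entry {
    public final long seqId;
    public final String payload;

    public Entry(long seqId, String payload) {
      this.seqId = seqId;
      this.payload = payload;
    }
  }

  /** Hypothetical writer: data may land in the file even though the caller sees an error. */
  public static final class Writer {
    public final List<Entry> persisted = new ArrayList<>();
    private final int failAfter; // negative means never fail

    public Writer(int failAfter) {
      this.failAfter = failAfter;
    }

    public void append(Entry e) throws IOException {
      persisted.add(e);
      if (failAfter >= 0 && persisted.size() > failAfter) {
        throw new IOException("write outcome unknown");
      }
    }
  }

  /** On failure, roll to a new writer and write the unacknowledged entry again. */
  public static List<Writer> writeWithRetry(List<Entry> entries, Writer first) {
    List<Writer> files = new ArrayList<>();
    Writer current = first;
    files.add(current);
    int acked = 0;
    while (acked < entries.size()) {
      try {
        current.append(entries.get(acked));
        acked++;
      } catch (IOException e) {
        current = new Writer(-1); // new file; the same entry is retried on the next loop
        files.add(current);
      }
    }
    return files;
  }

  /** Replay side: skip any entry whose sequence id was already applied. */
  public static List<Entry> replay(List<Writer> files) {
    Set<Long> seen = new HashSet<>();
    List<Entry> applied = new ArrayList<>();
    for (Writer w : files) {
      for (Entry e : w.persisted) {
        if (seen.add(e.seqId)) {
          applied.add(e);
        }
      }
    }
    return applied;
  }
}
```

In this model an entry whose write "failed" can appear in both the old and the new file, which is why the replay side has to tolerate duplicates.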

Contributor Author

comnetwork Jul 20, 2022 (edited)

@Apache9, OK, I think we could open two new JIRAs: one for AsyncFSWAL to abort the RegionServer on TimeoutIOException, and the other to implement in FSHLog the same WAL entry retry logic as AsyncFSWAL, which is a little more complicated.

Contributor Author

comnetwork Jul 21, 2022 (edited)

@Apache9, I have opened HBASE-27230 and HBASE-27231 for these two problems.

comnetwork closed this

comnetwork mentioned this pull request

HBASE-27230 RegionServer should be aborted when WAL.sync throws Timeo… #4641

Merged

This was referenced Aug 23, 2022

HBASE-27303 Unnecessary replication to secondary region replicas shou… #4707

Merged

HBASE-27231 FSHLog should retry writing WAL entries when syncs to HDF… #4721

Open
