Skip to content

Conversation

@chenxu14
Copy link
Contributor

during Log rolling, we should bypass the slow DN that the previously Log used.

@chenxu14 chenxu14 changed the title when WAL sync failed, we should bypass the failed DN that previously used HBASE-20902 when WAL sync failed, we should bypass the failed DN that previously used Apr 30, 2019
}
if (LOG.isDebugEnabled()) {
StringBuilder sb = new StringBuilder("create new output because old wal sync failed, old path is: ");
sb.append(oldPathStr).append(", newPath excludesNodes are :");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:tabs in line

if (LOG.isDebugEnabled()) {
StringBuilder sb = new StringBuilder("create new output because old wal sync failed, old path is: ");
sb.append(oldPathStr).append(", newPath excludesNodes are :");
for(DatanodeInfo info : excludesNodes) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:tabs in line

StringBuilder sb = new StringBuilder("create new output because old wal sync failed, old path is: ");
sb.append(oldPathStr).append(", newPath excludesNodes are :");
for(DatanodeInfo info : excludesNodes) {
sb.append(info.getInfoAddr()).append(";");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:tabs in line

sb.append(oldPathStr).append(", newPath excludesNodes are :");
for(DatanodeInfo info : excludesNodes) {
sb.append(info.getInfoAddr()).append(";");
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:tabs in line

for(DatanodeInfo info : excludesNodes) {
sb.append(info.getInfoAddr()).append(";");
}
LOG.debug(sb.toString());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:tabs in line

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 23 Docker mode activated.
_ Prechecks _
+1 hbaseanti 0 Patch does not have any anti-patterns.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 23 new or modified test files.
_ master Compile Tests _
0 mvndep 26 Maven dependency ordering for branch
+1 mvninstall 240 master passed
+1 compile 83 master passed
+1 checkstyle 92 master passed
+1 shadedjars 265 branch has no errors when building our shaded downstream artifacts.
+1 findbugs 264 master passed
+1 javadoc 64 master passed
_ Patch Compile Tests _
0 mvndep 16 Maven dependency ordering for patch
+1 mvninstall 234 the patch passed
+1 compile 86 the patch passed
+1 javac 86 the patch passed
-1 checkstyle 71 hbase-server: The patch generated 14 new + 157 unchanged - 0 fixed = 171 total (was 157)
-1 whitespace 0 The patch 5 line(s) with tabs.
+1 shadedjars 257 patch has no errors when building our shaded downstream artifacts.
+1 hadoopcheck 509 Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0.
+1 findbugs 279 the patch passed
+1 javadoc 63 the patch passed
_ Other Tests _
+1 unit 27 hbase-hadoop-compat in the patch passed.
+1 unit 36 hbase-hadoop2-compat in the patch passed.
-1 unit 8582 hbase-server in the patch failed.
+1 asflicense 62 The patch does not generate ASF License warnings.
11424
Reason Tests
Failed junit tests hadoop.hbase.replication.TestSerialSyncReplication
hadoop.hbase.replication.TestSyncReplicationMoreLogsInLocalCopyToRemote
hadoop.hbase.TestFullLogReconstruction
hadoop.hbase.replication.TestSyncReplicationActive
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/1/artifact/out/Dockerfile
GITHUB PR #205
Optional Tests dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname Linux 1b4b38d5e6a3 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux
Build tool maven
Personality /testptch/patchprocess/precommit/personality/provided.sh
git revision master / 3f40df8
maven version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z)
Default Java 1.8.0_181
findbugs v3.1.11
checkstyle https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/1/artifact/out/diff-checkstyle-hbase-server.txt
whitespace https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/1/artifact/out/whitespace-tabs.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/1/artifact/out/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/1/testReport/
Max. process+thread count 4812 (vs. ulimit of 10000)
modules C: hbase-hadoop-compat hbase-hadoop2-compat hbase-server U: .
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/1/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@Apache9
Copy link
Contributor

Apache9 commented May 1, 2019

Mind explaining the implementation a bit? The patch is so big, and across lots of files, not easy to follow up...

@chenxu14
Copy link
Contributor Author

chenxu14 commented May 6, 2019

Mind explaining the implementation a bit? The patch is so big, and across lots of files, not easy to follow up...

  1. modified WALActionsListener#logRollRequested(boolean tooFewReplicas, boolean syncFailed)
    add syncFailed param to identify whether logRoll request was caused by a sync failure,
    If so, LogRoller thread will call WAL#rollWriter(boolean force, boolean syncFailed), and pass syncFailed param to it

  2. modified WAL#rollWriter(boolean force, boolean syncFailed) throws FailedLogCloseException, IOException;
    If syncFailed is true, pass current WAL’s path to AbstractFSWAL#createWriterInstance(newPath, oldPath) as oldPath,
    so FanOutOneBlockAsyncDFSOutputHelper can determine which DN is slow based on the oldPath, and add it to excludesNodes

Copy link
Member

@carp84 carp84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also fix the checkstyle warnings reported by @Apache-HBase robot.

for(LocatedBlock block : namenode.getBlockLocations(oldPathStr, Math.max(0, len - 1), len)
.getLocatedBlocks()) {
for(DatanodeInfo dn : block.getLocations()) {
excludesNodes = ArrayUtils.add(excludesNodes, dn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it always adding nodes into exclude list but never check and remove even after the DN recovers, right? So it's possible that one day all DN nodes are excluded and the OutputStream will fail due to could only be replicated to 0 nodes?

Copy link
Contributor Author

@chenxu14 chenxu14 May 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for your review @carp84, there seems to be some difference opinions with HBASE-22301, i will fix the checkstyle first.

So it's possible that one day all DN nodes are excluded

excludesNodes are not a global variable, each FanOutOneBlockAsyncDFSOutput will use different instance, when new FanOutOneBlockAsyncDFSOutput created, it's Initial excludesNodes will be an empty array(code in FanOutOneBlockAsyncDFSOutputHelper#createOutput)

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 0 Docker mode activated.
-1 patch 7 #205 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.
Subsystem Report/Notes
GITHUB PR #205
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/3/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 0 Docker mode activated.
-1 patch 6 #205 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.
Subsystem Report/Notes
GITHUB PR #205
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-205/4/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@saintstack
Copy link
Contributor

Any updates @chenxu14 ? Thanks.

@Apache9
Copy link
Contributor

Apache9 commented Jul 23, 2019

Close since the PR is stale. Feel free to reopen.

@Apache9 Apache9 closed this Jul 23, 2019
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants