
@xiaoyuyao
Contributor

… on Datanode details from heartbeat. Contributed by Xiaoyu Yao.
@xiaoyuyao xiaoyuyao self-assigned this Jun 23, 2019
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 42 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 495 | trunk passed |
| +1 | compile | 252 | trunk passed |
| +1 | checkstyle | 69 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 935 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 155 | trunk passed |
| 0 | spotbugs | 316 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 511 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 441 | the patch passed |
| +1 | compile | 259 | the patch passed |
| +1 | javac | 259 | the patch passed |
| +1 | checkstyle | 74 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 728 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 155 | the patch passed |
| +1 | findbugs | 536 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 270 | hadoop-hdds in the patch passed. |
| -1 | unit | 1351 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
| | | 6479 | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.ozone.client.rpc.TestBlockOutputStream |
| | hadoop.ozone.client.rpc.TestBCSID |
| | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
| | hadoop.ozone.scm.node.TestQueryNode |
| | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
| | hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline |
| | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
| | hadoop.ozone.container.TestContainerReplication |
| | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
| | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
| | hadoop.ozone.client.rpc.TestCommitWatcher |
| | hadoop.ozone.client.rpc.TestOzoneRpcClient |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/1/artifact/out/Dockerfile |
| GITHUB PR | #1008 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1c1ecfa50501 4.4.0-141-generic #167~14.04.1-Ubuntu SMP Mon Dec 10 13:20:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / b28ddb2 |
| Default Java | 1.8.0_212 |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/1/artifact/out/patch-unit-hadoop-ozone.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/1/testReport/ |
| Max. process+thread count | 5147 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/1/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@xiaoyuyao
Contributor Author

/retest

```java
} else {
  // Get the datanode details again from node manager with the topology info
  // for registered datanodes.
  datanodeDetails = nodeManager.getNode(datanodeDetails.getIpAddress());
```
Contributor

We should not rely on the IP address of the datanode in the NetworkTopology path; instead we should use the datanode UUID. It is possible for more than one datanode process to be running on the same machine.

Contributor Author

More than one DN instance on the same machine is most likely a test/dev environment such as MiniOzoneCluster. In production, even containers in K8s have dedicated IPs.

Contributor

But the IP address can change for the same datanode. In fact, we have a Jira to remove it from the yaml file in the future: HDDS-1480

Contributor

The property "dfs.datanode.use.datanode.hostname" controls whether the IP address or the hostname is used. By using the IP address or hostname, existing Hadoop/HDFS/YARN topology tools and cluster-management scripts can be reused, which would make it easier for users to adopt Ozone. @xiaoyuyao, I can take this over if you are fully occupied.

Contributor

> More than one DN instance on the same machine is most likely a test/dev environment such as MiniOzoneCluster. In production, even containers in K8s have dedicated IPs.

I agree, but the problem here is that after this change, a test/dev environment with more than one datanode process running on the same machine will not even work properly. Heartbeats from different datanode processes (running on the same machine) will be mapped to a single datanode, and all the other datanode processes will be marked as dead even though they are heartbeating.

Contributor Author

@nandakumar131, yes. We will need to handle this case for the minicluster-based tests.
The current topology awareness is based on a map of ip/dns -> location; I think changing it to uuid -> location should work, as long as we maintain a mapping from uuid -> ip/dns.

Contributor

+1 on changing it to uuid -> location and maintaining a map for uuid -> ip/dns.
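A minimal sketch of that suggestion, with illustrative names only (this is not the actual SCM NodeManager API): topology state is keyed by the datanode UUID, and a secondary uuid -> ip/dns map lets the address change without disturbing the node's network location.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch only: topology location keyed by datanode UUID, with a separate
 * uuid -> ip/dns map so addresses can be updated (or multiple DN processes
 * can share one IP, as in MiniOzoneCluster) without touching the topology.
 */
class TopologyRegistry {
  private final Map<UUID, String> uuidToLocation = new ConcurrentHashMap<>();
  private final Map<UUID, String> uuidToAddress = new ConcurrentHashMap<>();

  void register(UUID id, String address, String networkLocation) {
    uuidToAddress.put(id, address);
    uuidToLocation.put(id, networkLocation);
  }

  /** An IP change only touches the address map; the location stays stable. */
  void updateAddress(UUID id, String newAddress) {
    uuidToAddress.put(id, newAddress);
  }

  String getLocation(UUID id) { return uuidToLocation.get(id); }
  String getAddress(UUID id)  { return uuidToAddress.get(id); }
}
```

With this shape, heartbeats resolve by UUID, so two DN processes on one machine remain distinct entries even though their IPs collide.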

```java
} else {
  // Get the datanode details again from node manager with the topology info
  // for registered datanodes.
  datanodeDetails = nodeManager.getNode(datanodeDetails.getIpAddress());
```
Contributor

Xiaoyu, a node can use either its IP address or its hostname as the topology network name.
Maybe we should refactor the nodeManager.getNode function to take datanodeDetails as a parameter, and make the choice between IP address and hostname an inner detail of getNode.
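A hedged sketch of that refactor, again with illustrative names rather than the real Ozone API: the caller passes the full node details, and the lookup internally decides whether the topology network name is the IP address or the hostname, mirroring the intent of "dfs.datanode.use.datanode.hostname".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch only: the IP-vs-hostname decision lives inside the lookup, so
 * callers never pick a network name themselves. NodeDetails stands in for
 * the relevant fields of the real DatanodeDetails.
 */
class NodeLookup {
  static final class NodeDetails {
    final String ip;
    final String hostname;
    NodeDetails(String ip, String hostname) {
      this.ip = ip;
      this.hostname = hostname;
    }
  }

  private final boolean useHostname; // would come from configuration
  private final Map<String, String> locationByNetworkName = new ConcurrentHashMap<>();

  NodeLookup(boolean useHostname) { this.useHostname = useHostname; }

  // The single place where the network name is chosen.
  private String networkName(NodeDetails d) {
    return useHostname ? d.hostname : d.ip;
  }

  void register(NodeDetails d, String location) {
    locationByNetworkName.put(networkName(d), location);
  }

  /** Callers pass the whole details object; the name choice is internal. */
  String getNode(NodeDetails d) {
    return locationByNetworkName.get(networkName(d));
  }
}
```

Centralizing the choice this way means flipping the configuration property cannot leave registration and lookup disagreeing about which name to use.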


@ChenSammi
Contributor

New patch to support the datanode uuid -> ip/hostname -> network path mapping. It also resolves the heartbeat issue.

#1112

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 39 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 508 | trunk passed |
| +1 | compile | 259 | trunk passed |
| +1 | checkstyle | 64 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 823 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 152 | trunk passed |
| 0 | spotbugs | 332 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 528 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 452 | the patch passed |
| +1 | compile | 261 | the patch passed |
| +1 | javac | 261 | the patch passed |
| +1 | checkstyle | 71 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 637 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 161 | the patch passed |
| +1 | findbugs | 566 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 287 | hadoop-hdds in the patch passed. |
| -1 | unit | 1571 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 43 | The patch does not generate ASF License warnings. |
| | | 6561 | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
| | hadoop.ozone.client.rpc.TestOzoneRpcClient |
| | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
| | hadoop.ozone.scm.node.TestQueryNode |
| | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
| | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
| | hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline |
| | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
| | hadoop.ozone.client.rpc.TestBlockOutputStream |
| | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
| | hadoop.ozone.client.rpc.TestCommitWatcher |
| | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
| | hadoop.ozone.client.rpc.TestKeyInputStream |
| | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
| | hadoop.ozone.web.client.TestKeysRatis |
| | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
| | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
| | hadoop.ozone.container.server.TestSecureContainerServer |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/2/artifact/out/Dockerfile |
| GITHUB PR | #1008 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 3786908b30e1 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 4e66cb9 |
| Default Java | 1.8.0_212 |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/2/artifact/out/patch-unit-hadoop-ozone.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/2/testReport/ |
| Max. process+thread count | 4839 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1008/2/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@xiaoyuyao xiaoyuyao closed this Jul 25, 2019
@xiaoyuyao
Contributor Author

The JIRA has been taken over with a different PR.

@xiaoyuyao xiaoyuyao deleted the HDDS-1713 branch July 25, 2019 17:25
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
…in changelog (apache#1008)

* Throw a record too large exception for changelog oversized records
* Change implementation to handle large messages in CachedStore based on user defined configs
* Address review and change new Scala classes to Java
* Address review and add a test case

5 participants