org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException)
Today I redeployed a three-node fully distributed Hadoop cluster. After installing Hadoop on the first node, I ran it in pseudo-distributed mode, uploaded files to HDFS, and ran the wordcount example. When I then switched to fully distributed mode, one node failed to start, and at the end of that node's .log file I found the error below. Reviewing my steps, what I had effectively done was add new nodes to an existing cluster. So I stopped the cluster, created a dfs.hosts file under etc/hadoop on the namenode (without yet adding the dfs.hosts property to hdfs-site.xml), and listed the new nodes' hostnames in it (see the sketch just after the stack trace below). After restarting the cluster, all the datanode daemons started normally.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.10.106:50010, datanodeUuid=1ed09498-904c-43fd-99c5-5d1611b15c89, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a27afbc7-b086-4f78-9e39-7afdd34ed8cc;nsid=2127549857;c=0) is attempting to report storage ID 1ed09498-904c-43fd-99c5-5d1611b15c89. Node 192.168.10.107:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:495)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1788)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1321)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:171)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28756)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy14.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:203)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:463)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:688)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
at java.lang.Thread.run(Thread.java:745)
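For reference, dfs.hosts is just a plain-text include list with one hostname per line. A minimal sketch of what mine contained at this point (hadoop106 is an assumed hostname inferred from the IP 192.168.10.106 in the stack trace above, and hadoop105 is a pure placeholder for the other new node; substitute your own node names):

hadoop105
hadoop106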
However, the HDFS web UI did not show all of the datanodes as live.
So I added the dfs.hosts property to hdfs-site.xml on the namenode, refreshed the datanodes, and ran sbin/start-balancer.sh to rebalance the cluster.
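For reference, the dfs.hosts property simply points the namenode at the include file. A minimal sketch of the entry in hdfs-site.xml (the path /opt/module/hadoop-2.7.2 is only an assumed install location; set the value to wherever your dfs.hosts file actually lives):

<property>
    <name>dfs.hosts</name>
    <value>/opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts</value>
</property>

Refreshing and rebalancing then use the standard commands:

bin/hdfs dfsadmin -refreshNodes
sbin/start-balancer.sh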
Neither step brought the missing datanodes back in the web UI. In the .log file of the datanode running on the same host as the namenode, I saw the following error:
2019-07-25 19:25:46,033 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(192.168.10.107:50010, datanodeUuid=1ed09498-904c-43fd-99c5-5d1611b15c89, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a27afbc7-b086-4f78-9e39-7afdd34ed8cc;nsid=2127549857;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:876)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4528)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1285)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:96)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28752)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy14.registerDatanode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:124)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:754)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:886)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:609)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:858)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:672)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
at java.lang.Thread.run(Thread.java:745)
I restarted the cluster, but the web UI still did not show all of the datanodes as live.
Checking the log files again, the datanode on the namenode host had thrown the following error:
2019-07-25 19:35:38,323 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-812608005-192.168.10.107-1564046841125 (Datanode Uuid null) service to hadoop107/192.168.10.107:9000 beginning handshake with NN
2019-07-25 19:35:38,326 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-812608005-192.168.10.107-1564046841125 (Datanode Uuid null) service to hadoop107/192.168.10.107:9000 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(192.168.10.107:50010, datanodeUuid=1ed09498-904c-43fd-99c5-5d1611b15c89, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-a27afbc7-b086-4f78-9e39-7afdd34ed8cc;nsid=2127549857;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:876)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4528)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1285)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:96)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28752)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
Reading the logs above carefully, this is a datanode registration problem. The first place I looked was the slaves file, but on close inspection its configuration was fine. So I turned to the dfs.hosts file. I had initially assumed it only needed to list newly commissioned nodes, but once the dfs.hosts property is configured in hdfs-site.xml, the namenode treats the file as an include list and only accepts registrations from datanodes whose hosts are listed in it. So I appended the hostname of the namenode's own node to dfs.hosts. After restarting the cluster, every node came up normally, and the log output was clean.
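Concretely, the fix is to make sure every host that runs a datanode appears in dfs.hosts, including the namenode's host when it doubles as a datanode. A sketch of the final file (hadoop107 comes from the logs above, hadoop106 is inferred from the IP 192.168.10.106, and hadoop105 is a placeholder for the remaining node), followed by the restart:

hadoop105
hadoop106
hadoop107

sbin/stop-dfs.sh
sbin/start-dfs.sh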
Note: regarding the first error above, the blog post below offers a related analysis. If your situation differs from mine, it may be worth consulting.
https://blog.csdn.net/shaock2018/article/details/87890920