Solving problems I ran into with HBase

Shared by shengda ⋅ 2017-03-14 15:55:37 ⋅ last reply by shengda 2017-03-20 11:17:55 ⋅ 10175 views

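Over the past two days I have been learning to set up a fully distributed HBase cluster. Because I was careless in several places, several problems showed up at the same time, and it took me quite a while to untangle them. To make things easier to follow, once everything was fixed I reproduced one error at a time and recorded each one separately.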

1. Firewall not turned off

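I remembered turning off the firewall on every node back when I was learning Hadoop, so at first this possibility never even crossed my mind. Only after searching online for a long time and finding a post describing the same experience did I go back and check.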

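Symptom: after running start-hbase.sh, the HMaster process was up, but the web UI at http://<masternode>:16010 could not be reached (my master node is hadoop.lsd1.com; I won't repeat this below). After a while, HMaster and HRegionServer died on every node. The master node's log, hbase-root-master-hadoop.lsd1.com.log, contained the following errors: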


2017-03-13 12:02:29,850 INFO  [main-SendThread(hadoop.lsd3.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server hadoop.lsd3.com/192.168.56.13:2181. Will not attempt to authenticate using SASL (unknown error)
2017-03-13 12:02:29,851 WARN  [main-SendThread(hadoop.lsd3.com:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-03-13 12:02:30,934 INFO  [main-SendThread(hadoop.lsd1.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server hadoop.lsd1.com/192.168.56.11:2181. Will not attempt to authenticate using SASL (unknown error)
2017-03-13 12:02:30,935 INFO  [main-SendThread(hadoop.lsd1.com:2181)] zookeeper.ClientCnxn: Socket connection established to hadoop.lsd1.com/192.168.56.11:2181, initiating session
2017-03-13 12:02:30,938 INFO  [main-SendThread(hadoop.lsd1.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2017-03-13 12:02:31,038 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper create failed after 4 attempts
2017-03-13 12:02:31,040 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster. 
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2426)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:231)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:137)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2436)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: master:160000x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase Unexpected KeeperException creating base node
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:206)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:187)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:585)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:381)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2419)
... 5 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:565)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:544)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1204)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1182)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:194)
... 13 more
[root@hadoop hbase-1.2.4]# 

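This looks like ZooKeeper ran into trouble communicating. Network problems of this kind can probably have causes other than a running firewall, and I can't say much about those; all I can say is that in my case the firewall was one cause. In later experiments, with the firewall off on all other nodes, turning it on for just a single RegionServer node still brought down every HMaster and HRegionServer (which is a bit puzzling; does HBase not tolerate node failures?). On the other hand, if the firewalls are off at startup and one node's firewall (the master's included) is turned on after the cluster has started successfully, the cluster does not exit: that node simply becomes unreachable, and access recovers once that node's firewall is turned off again.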

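Solution:
On every node, check whether the firewall is off:
service firewalld status
If it is still running, stop it and disable it:
systemctl stop firewalld.service
systemctl disable firewalld.service
Then restart HBase. That solved this problem, and the web UI on port 16010 became reachable; the other remaining problems are described separately below.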
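Summary: if, after running start-hbase.sh, jps shows the HMaster process running normally but the master cannot be reached through the web UI, and the log contains connection errors like the ones above, consider whether the firewall is really off. The HMaster and HRegionServer processes may also all die after a short while, so in the end you need to read the logs to decide.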
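Alongside reading the logs, it can help to check from another node whether the ports involved are reachable at all. The following is a minimal sketch, not an HBase tool: `probe()` is a hypothetical helper that tries a TCP connection to the ZooKeeper client port 2181 and the master web UI port 16010 seen above, and labels the outcome. It assumes Python 3 is available on the nodes, and the labels are only hints: a firewall that rejects packets with an ICMP error typically surfaces as "No route to host", matching the `NoRouteToHostException` in the log. Run it from a node other than the one being checked, since a host's own firewall usually does not block its local connections.

```python
import errno
import socket
import sys

# Ports seen in this post: 2181 = ZooKeeper client port, 16010 = HMaster web UI.
# 16010 is only expected to be open on the master node.
PORTS = {2181: "zookeeper", 16010: "hmaster web UI"}


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try one TCP connection and classify the result as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The name did not resolve at all (compare the /etc/hosts problem).
        return "unresolvable"
    except socket.timeout:
        # Packets silently dropped, or the host is down.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # An ICMP reject from a firewall usually shows up as "No route to host".
            return "no route to host (firewall?)"
        if exc.errno == errno.ECONNREFUSED:
            # The host answered, but nothing is listening on this port.
            return "refused (process not running?)"
        return f"error: {exc}"


if __name__ == "__main__":
    # Example: python3 probe_ports.py hadoop.lsd1.com hadoop.lsd2.com hadoop.lsd3.com hadoop.lsd4.com
    for host in sys.argv[1:]:
        for port, label in PORTS.items():
            print(f"{host}:{port} ({label}) -> {probe(host, port)}")
```

Distinguishing "refused" from "no route to host" is the useful part: the first means the node is reachable but the process is gone, the second points at the network path, firewall included.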

2. Cluster clocks out of sync

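Symptom: after starting the HBase cluster with start-hbase.sh, the web UI at masternode:16010 was reachable, but a few seconds later the RegionServer processes on the worker nodes had all died on their own. The node's hbase-<user>-regionserver-<hostname>.log (here hbase-root-regionserver-hadoop.lsd3.com.log) showed: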


2017-03-13 08:39:32,547 FATAL [regionserver/hadoop.lsd3.com/192.168.56.13:16020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop.lsd3.com,16020,1489408762185 has been rejected; Reported time is too far out of sync with master.  Time difference of 3729977ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2180)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2298)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:906)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop.lsd3.com,16020,1489408762185 has been rejected; Reported time is too far out of sync with master.  Time difference of 3729977ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2180)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)

at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1267)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2296)
... 2 more

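This one is easier to pin down: the first line of the error already says the clock is out of sync. I used the date command to compare each node's time with the master's, and sure enough they were hours apart.
Solution:
Set the same time on every node. In my case the dates matched and only the hours differed, so date -s xx:xx:xx was enough. Because the nodes are set one after another, their clocks will end up a few seconds apart, which is fine: as the log shows, the cluster tolerates some clock difference (30000 ms here).
NTP can of course also be used to keep the clocks in sync; I won't go into details here.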
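The check that rejected the RegionServer is easy to reproduce offline, which helps when deciding whether a few seconds of manual drift is acceptable. The sketch below is illustrative only: `parse_skew()` and `rejected_nodes()` are hypothetical helpers, and the 30000 ms limit is the value printed in the log above (as far as I know it comes from the `hbase.master.maxclockskew` property, which can be set in hbase-site.xml). Like the log message, it compares the absolute difference between a node's reported time and the master's clock against that limit.

```python
import re

# Limit printed in the log above ("max allowed of 30000ms"); assumed to be the
# default of hbase.master.maxclockskew.
MAX_CLOCK_SKEW_MS = 30_000

SKEW_RE = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_skew(log_line: str):
    """Return (difference_ms, allowed_ms) from a ClockOutOfSyncException message, or None."""
    m = SKEW_RE.search(log_line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def rejected_nodes(master_ms: int, node_ms: dict, max_skew_ms: int = MAX_CLOCK_SKEW_MS) -> list:
    """Nodes whose clock differs from the master's by more than max_skew_ms, in either direction."""
    return sorted(name for name, t in node_ms.items() if abs(t - master_ms) > max_skew_ms)


if __name__ == "__main__":
    diff, allowed = parse_skew("Time difference of 3729977ms > max allowed of 30000ms")
    print(f"skew {diff} ms = {diff / 60_000:.1f} min, limit {allowed} ms")
```

For the hadoop.lsd3.com log above this prints `skew 3729977 ms = 62.2 min, limit 30000 ms`, far beyond the 30-second tolerance, while a few seconds of drift from setting each node by hand stays well inside it.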

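--- Update 2017-3-16
I found that suspending a virtual machine affects its clock: if one of several VM nodes whose clocks had been synchronized is paused for a while, the nodes drift out of sync again, so nodes you are not using are better shut down outright. I use VirtualBox and don't know whether other VM platforms behave the same way. If anyone knows how to configure things to avoid this, please leave a comment; I'd be very grateful.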

3. A node in the cluster without proper hostname-to-IP mappings

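Symptom: a short while after running start-hbase.sh, all HMaster and RegionServer processes died on their own.
Because both the HMaster and HRegionServer processes died, I kept assuming some other cause and didn't have the patience to read the logs, so I wasted a lot of time poking around. Eventually I noticed by chance that two of my nodes could not resolve another node's hostname (hadoop.lsd4.com) at all, which is what caused this. Here is the log: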


2017-03-13 09:18:41,194 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase
2017-03-13 09:18:41,195 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection
java.net.UnknownHostException: hadoop.lsd4.com
at java.net.InetAddress.getAllByName0(InetAddress.java:1259)
at java.net.InetAddress.getAllByName(InetAddress.java:1171)
at java.net.InetAddress.getAllByName(InetAddress.java:1105)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)
at java.lang.Thread.run(Thread.java:745)
2017-03-13 09:18:42,206 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase
2017-03-13 09:18:42,207 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection
java.net.UnknownHostException: hadoop.lsd4.com
at java.net.InetAddress.getAllByName0(InetAddress.java:1259)
at java.net.InetAddress.getAllByName(InetAddress.java:1171)
at java.net.InetAddress.getAllByName(InetAddress.java:1105)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)
at java.lang.Thread.run(Thread.java:745)
2017-03-13 09:18:44,207 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase
2017-03-13 09:18:44,208 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection
java.net.UnknownHostException: hadoop.lsd4.com
at java.net.InetAddress.getAllByName0(InetAddress.java:1259)
at java.net.InetAddress.getAllByName(InetAddress.java:1171)
at java.net.InetAddress.getAllByName(InetAddress.java:1105)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)
at java.lang.Thread.run(Thread.java:745)
2017-03-13 09:18:48,208 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase
2017-03-13 09:18:48,209 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection
java.net.UnknownHostException: hadoop.lsd4.com: unknown error
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
at java.net.InetAddress.getAllByName0(InetAddress.java:1255)
at java.net.InetAddress.getAllByName(InetAddress.java:1171)
at java.net.InetAddress.getAllByName(InetAddress.java:1105)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)
at java.lang.Thread.run(Thread.java:745)
2017-03-13 09:18:56,210 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, 
2017-03-13 09:18:56,211 ERROR [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 4 attempts
2017-03-13 09:18:56,212 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:144)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)
at java.lang.Thread.run(Thread.java:745)
2017-03-13 09:18:56,213 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] regionserver.HRegionServer: stopping server hadoop.lsd2.com,16020,1489411066501; zookeeper connection closed.
2017-03-13 09:18:56,213 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] regionserver.HRegionServer: regionserver/hadoop.lsd2.com/192.168.56.12:16020 exiting
2017-03-13 09:18:56,225 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2677)
[root@hadoop logs]# 

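In hindsight, reading the log carefully would have made the problem easy to find: the /etc/hosts files on hadoop.lsd2.com and hadoop.lsd3.com had no mapping for hadoop.lsd4.com (probably removed during an earlier experiment and never restored), so the hostname could not be resolved when the nodes tried to communicate.
Solution: copy the /etc/hosts file from the node with the most complete hostname mappings to every node, make sure every node's hostname resolves, and restart the cluster.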
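To confirm after copying that every node really has a complete hosts file, a small check can be run on each node. This is a minimal sketch with hypothetical helper names; the host list is this post's cluster and should be replaced with your own. `missing_in_hosts_file()` only inspects /etc/hosts-style text, while `unresolvable()` asks a resolver (by default the system one via `socket.gethostbyname`, which also consults DNS), so the two can disagree if some names come from DNS.

```python
import socket

# Hostnames from this post's cluster; substitute your own node list.
CLUSTER_HOSTS = ["hadoop.lsd1.com", "hadoop.lsd2.com", "hadoop.lsd3.com", "hadoop.lsd4.com"]


def hosts_file_names(text: str) -> set:
    """Every hostname or alias mapped in /etc/hosts-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split on whitespace
        if len(fields) >= 2:
            names.update(fields[1:])  # fields[0] is the IP address
    return names


def missing_in_hosts_file(text: str, required) -> list:
    """Names from `required` that have no entry in the hosts-file text."""
    mapped = hosts_file_names(text)
    return [h for h in required if h not in mapped]


def unresolvable(required, resolve=socket.gethostbyname) -> list:
    """Names the resolver cannot resolve (default: system resolver, hosts file and DNS)."""
    bad = []
    for h in required:
        try:
            resolve(h)
        except socket.gaierror:
            bad.append(h)
    return bad


if __name__ == "__main__":
    # Run on every node; the list should come back empty.
    with open("/etc/hosts") as f:
        print("missing from /etc/hosts:", missing_in_hosts_file(f.read(), CLUSTER_HOSTS))
```

On hadoop.lsd2.com and hadoop.lsd3.com in the situation above, this would have reported `['hadoop.lsd4.com']`.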


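Each of the three cases above causes some of the cluster's processes to die, so the main lesson is to read the logs carefully whenever something goes wrong. A reminder to myself.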
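Since each failure above left a recognizable exception in the logs, a crude first pass can simply search the log files for those exception names. The helper below is a hypothetical triage sketch, not an HBase tool; its three signatures and the likely causes attached to them come only from the cases in this post, so any other failure will go unrecognized and still needs a manual read of the log.

```python
import sys

# Exception names seen in this post's logs, mapped to the cause found for each.
# A rough triage aid only; it is far from an exhaustive list.
KNOWN_CAUSES = {
    "NoRouteToHostException": "network path blocked: check the firewall on every node",
    "ClockOutOfSyncException": "clock skew: synchronize time across nodes",
    "UnknownHostException": "hostname not resolvable: check /etc/hosts on every node",
}


def triage(lines):
    """Return {exception name: first log line containing it} for the known signatures."""
    found = {}
    for line in lines:
        for name in KNOWN_CAUSES:
            if name in line and name not in found:
                found[name] = line.rstrip("\n")
    return found


if __name__ == "__main__":
    # Example: python3 triage.py logs/hbase-root-master-hadoop.lsd1.com.log logs/*.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for name, line in triage(f).items():
                print(f"{path}: {KNOWN_CAUSES[name]}\n    first hit: {line}")
```

Reporting only the first hit per exception keeps the output short, since the same exception tends to repeat on every retry, as in the UnknownHostException log above.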

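Copyright notice: original work. Reposting is permitted, provided the source and author are credited with a hyperlink; otherwise legal liability will be pursued. From 海汼部落-shengda, http://hainiubl.com/topics/71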
Replies: 1
  • shengda 奔跑的多多
    2017-03-20 11:17:55

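    I also learned that a classmate ran into similar problems because hbase.zookeeper.quorum (configured in hbase-site.xml) was set incorrectly. Another classmate had leftover old settings in ZooKeeper's zoo.cfg that had not been removed, which also made the processes unstable (I don't know ZooKeeper well yet, so I'm just noting this down).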
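    For the two configuration mistakes mentioned above, one quick consistency check is to list the hosts in hbase.zookeeper.quorum and, when ZooKeeper runs as its own service with a zoo.cfg, compare them with the server.N=host:peerPort:electionPort entries there; a stale entry then shows up as a difference. This is a minimal sketch with hypothetical helper names. It assumes the usual Hadoop-style configuration/property/name/value layout of hbase-site.xml and does not handle every variant (for example `${...}` variable substitution).

```python
import sys
import xml.etree.ElementTree as ET


def get_property(xml_data, name: str):
    """Return the <value> of property `name` in a Hadoop-style *-site.xml (str or bytes), or None."""
    root = ET.fromstring(xml_data)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def quorum_hosts(xml_data) -> list:
    """Host names listed in hbase.zookeeper.quorum, with any :port suffix removed."""
    value = get_property(xml_data, "hbase.zookeeper.quorum") or ""
    return [h.strip().split(":")[0] for h in value.split(",") if h.strip()]


def zoo_cfg_servers(cfg_text: str) -> list:
    """Host names from server.N=host:peerPort:electionPort lines in zoo.cfg."""
    hosts = []
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("server.") and "=" in line:
            hosts.append(line.split("=", 1)[1].split(":")[0].strip())
    return hosts


if __name__ == "__main__":
    # Example (paths are illustrative): python3 check_quorum.py conf/hbase-site.xml conf/zoo.cfg
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as f:  # read bytes so an XML encoding declaration is accepted
            hbase_hosts = set(quorum_hosts(f.read()))
        with open(sys.argv[2]) as f:
            zk_hosts = set(zoo_cfg_servers(f.read()))
        print("in quorum but not in zoo.cfg:", sorted(hbase_hosts - zk_hosts))
        print("in zoo.cfg but not in quorum:", sorted(zk_hosts - hbase_hosts))
```

Both printed lists should be empty; every host that does appear should also pass the /etc/hosts check from the third case.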
