
FATAL master.HMaster: Unexpected state : .. Cannot transit it to OFFLINE

Stack Overflow user
Asked 2013-07-23 00:21:31
Answers: 2 · Views: 1.9K · Followers: 0 · Votes: 3

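I have a serious HBase crash problem. I am running HBase 0.94.7 with one master and two region servers. The HBase master crashes frequently, and I cannot even restart it. The master log shows the following: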

DEBUG master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=master,60020,1374506461230, region=46c2333f401964bf877254be19c2cc8c
DEBUG handler.ClosedRegionHandler: Handling CLOSED event for 6423df864603aa6e8c45c726ab3ae62f
DEBUG master.AssignmentManager: Forcing OFFLINE; was=LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF8\xB3\x8F\x17\xCE\xE2g\x84,1374498065657.6423df864603aa6e8c45c726ab3ae62f. state=CLOSED, ts=1374508769672, server=slave,60020,1374506460892
DEBUG zookeeper.ZKAssign: master:60000-0x14006f52f3f000e Creating (or updating) unassigned node for 6423df864603aa6e8c45c726ab3ae62f with OFFLINE state
FATAL master.HMaster: Unexpected state : LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. state=PENDING_OPEN, ts=1374508769697, server=master,60020,1374506461230 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. state=PENDING_OPEN, ts=1374508769697, server=master,60020,1374506461230 .. Cannot transit it to OFFLINE.
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
INFO master.HMaster: Aborting
DEBUG handler.ClosedRegionHandler: Handling CLOSED event for 0710b486dcb3d51465695b51db376255

……

DEBUG master.AssignmentManager: The znode of region LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. has been deleted.
INFO master.AssignmentManager: The master has opened the region LogDetail,\x00\x00\x01\xE8\x00\x00\x01?\xF6\xC17p&c\x8F\x14,1374498085655.c2f4143750eb1559a1dd92e937ea712d. that was online on master,60020,1374506461230
DEBUG master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=master,60000,1374508461536, region=c9cfdd360c09b292412ba5ad88815e6f
DEBUG catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@5c061cd2
INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x14006f52f3f000f
INFO zookeeper.ZooKeeper: Session: 0x14006f52f3f000f closed
INFO zookeeper.ClientCnxn: EventThread shut down
INFO master.AssignmentManager$TimerUpdater: master,60000,1374508461536.timerUpdater exiting
INFO master.SplitLogManager$TimeoutMonitor: master,60000,1374508461536.splitLogManagerTimeoutMonitor exiting
INFO master.AssignmentManager$TimeoutMonitor: master,60000,1374508461536.timeoutMonitor exiting
INFO zookeeper.ZooKeeper: Session: 0x14006f52f3f000e closed
INFO zookeeper.ClientCnxn: EventThread shut down
INFO master.HMaster: HMaster main thread exiting
ERROR master.HMasterCommandLine: Failed to start master

I also found something unusual in the ZooKeeper log:

INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /master:37856
INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /master:37856
INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140100dda0300e1 with negotiated timeout 180000 for client /master:37856
WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x140100dda0300e1, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:662)
INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /master:37856 which had sessionid 0x140100dda0300e1

Can anyone help me figure out what is wrong? Is it related to unassigned regions or something similar? I tried bin/hbase hbck -repair and bin/hbase hbck -fix, but neither helped.

Thanks


2 Answers

Stack Overflow user

Accepted answer

Answered 2013-07-26 21:56:39

After carefully checking the logs of my region servers, I found the answer.
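To decide which region server log to read first, it can help to pull the region and server out of the master's FATAL line. The following is a minimal sketch, assuming the master log uses the same line layout as the excerpt in the question; the function name `fatal_regions` is hypothetical and not part of HBase.

```shell
# Read a master log on stdin and print, for each FATAL "Unexpected state"
# line, the encoded region name, the region state, and the server host.
# Assumes the line layout shown in the question's master log.
fatal_regions() {
  grep 'FATAL master.HMaster: Unexpected state' |
    sed -n 's/.*\.\([0-9a-f]\{32\}\)\. state=\([A-Z_]*\),.* server=\([^,]*\),.*/\1 \2 \3/p'
}
```

It would be called as `fatal_regions < /path/to/master.log`, where the log path is a placeholder. The host in the third column is the region server whose log is worth checking for the time around the failure.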

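Cause

It turned out that the 'SNAPPY' library, which is used to compress HBase tables, was not installed properly on the region servers. All of my tables were created with this compression algorithm. When the master tried to balance regions onto the region servers, the assignments failed, and eventually the master aborted.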
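A quick way to look for this kind of problem is to filter a region server log for lines that mention a compression codec together with a warning, an error, or an exception. This is only a sketch: the function name `codec_problems` is hypothetical, the codec names and level keywords are assumptions, and the exact message text depends on the HBase version.

```shell
# Read a region server log on stdin and print lines that mention a
# compression codec AND look like a warning, an error, or an exception.
# The codec list (snappy, lzo) and the keywords are assumptions; adjust them.
codec_problems() {
  grep -iE 'snappy|lzo' | grep -E 'WARN|ERROR|FATAL|Exception|Error'
}
```

It would be called as `codec_problems < /path/to/regionserver.log`, with the path as a placeholder. Empty output does not prove the codec is healthy; it only means no line matched both filters.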

Solution

Install and configure SNAPPY on every node as follows:

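# As root: install the Snappy shared library
apt-get install libsnappy1
# Switch to the hbase user and link the library into HBase's native-library directory
su hbase
mkdir -p /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64
ln -s /usr/lib/libsnappy.so.1.1.2 /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64/libsnappy.so
# Leave the hbase shell and return to root
exit
# As root: make the library visible under /usr/lib64 and under its unversioned name
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so.1.1.2
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so.1
ln -s /usr/lib/libsnappy.so.1.1.2 /usr/lib64/libsnappy.so
ln -s /usr/lib/libsnappy.so.1 /usr/lib/libsnappy.so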

Now everything is fine! The regions are well balanced across the region servers.
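Before restarting HBase, it may be worth confirming that the links created above actually resolve on each node. The sketch below is one way to do that; the function name `check_snappy_libs` is hypothetical, and it only checks that a `libsnappy.so*` entry exists and is not a dangling link, not that HBase can load it.

```shell
# For each directory given as an argument, report whether it contains a
# libsnappy.so* entry that resolves to an existing file.
# Returns non-zero if any directory has no usable entry.
check_snappy_libs() {
  status=0
  for dir in "$@"; do
    found=no
    for lib in "$dir"/libsnappy.so*; do
      # -e follows symlinks, so a dangling link counts as missing
      if [ -e "$lib" ]; then found=yes; break; fi
    done
    echo "$dir: $found"
    [ "$found" = yes ] || status=1
  done
  return "$status"
}
```

With the paths from this answer it would be called as `check_snappy_libs /usr/lib /usr/lib64 /home/hbase/hbase-0.94.7/lib/native/Linux-amd64-64`. For an end-to-end check, HBase also ships the `org.apache.hadoop.hbase.util.CompressionTest` utility, which can be run through `bin/hbase` with a test file path and the codec name `snappy`.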

Votes: 4

Stack Overflow user

Answered 2015-01-12 18:10:20

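Check the region server logs. If the cause is a missing LZO compressor and you are using Cloudera Hadoop, you can easily install LZO by following these instructions: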

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v1/v1-0-1/Installing-and-Using-Impala/ciiu_lzo.html

Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/17792619
