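About the author: an operations engineer whose résumé does not list a single skill as "mastered". The mind map below also shows the planned content and current progress (updated irregularly).
In the previous chapters we introduced several commonly used proxy servers. In this chapter we turn to the ZooKeeper middleware.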
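In the distributed systems we will cover later, one basic principle applies: run an odd number of nodes. An election needs a strict majority (more than half) of the nodes: 2 out of 3 nodes, or 3 out of 5. Only then can the cluster elect a leader or serve requests.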
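The majority rule above is plain integer arithmetic. The following is a minimal sketch of it; the function names `quorum_size` and `tolerated_failures` are made up for illustration and are not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest node count that is strictly more than half of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this arithmetic, 4 nodes tolerate only 1 failure, the same as 3 nodes, and 6 tolerate 2, the same as 5. This is the usual argument for odd-sized ensembles: the extra even node adds cost without adding fault tolerance.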
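Earlier we deployed ZooKeeper in cluster mode. It has two roles there, Leader and Follower. How do the nodes elect their Leader?
First we need to distinguish between an election with data and an election without data. When we configured the cluster earlier, we gave every node a node ID (myid), which matters most in the very first election.
ZooKeeper's election process is the core mechanism behind the cluster's high availability, and it relies mainly on the ZAB protocol (ZooKeeper Atomic Broadcast). The detailed steps of the election are as follows: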
5. Election process
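No-data case: the cluster has just been configured and is starting for the first time. Assume we start the nodes in order, beginning with the smallest myid.
Node 1 starts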
2025-04-19 10:26:53,060 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1455] - LOOKING
2025-04-19 10:26:53,061 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] - New election. My id = 1, proposed zxid=0x0
2025-04-19 10:26:53,071 [myid:] - INFO [ListenerHandler-/192.168.31.140:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1071] - 1 is accepting connections now, my election bind port: /192.168.31.140:3888
2025-04-19 10:26:53,071 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:26:53,078 [myid:] - WARN [QuorumConnectionThread-[myid=1]-1:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 2 at election address /192.168.31.141:3888
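Roughly, this says: an election is starting, my myid is 1, my zxid is 0x0 (meaning there is no data), and I vote for myself as Leader. Because the other nodes have not started yet, the node stays in the election state, which is the LOOKING state mentioned earlier.
Node 2 starts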
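To read zxid values in these logs, it helps to know that a zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch. The sketch below splits a zxid into those two parts; the helper name `split_zxid` is an assumption made for illustration, not a ZooKeeper API.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


if __name__ == "__main__":
    print(split_zxid(0x0))          # (0, 0): fresh node, no transactions yet
    print(split_zxid(0x100000001))  # (1, 1): first transaction of epoch 1
```

Read this way, the `proposed zxid=0x0` in the log corresponds to epoch 0 and counter 0, which matches a node that has never processed a transaction.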
2025-04-19 10:33:21,444 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1455] - LOOKING
2025-04-19 10:33:21,444 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] - New election. My id = 2, proposed zxid=0x0
2025-04-19 10:33:21,454 [myid:] - INFO [ListenerHandler-/192.168.31.141:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1071] - 2 is accepting connections now, my election bind port: /192.168.31.141:3888
2025-04-19 10:33:21,455 [myid:] - INFO [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
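These lines mean essentially the same as node 1's. The log lines that follow show that node 2 has been elected leader. The cluster has 3 nodes in total, so with 2 nodes running the majority requirement is met. Both nodes have zxid 0x0, so the tie is broken by myid: node 2's myid (2) is greater than node 1's (1), and node 2 becomes Leader.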
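The startup sequence can be approximated with a small model. This is a deliberately simplified sketch, not ZooKeeper's actual FastLeaderElection implementation: it assumes the election concludes as soon as a majority of nodes is running, that the winner is the running node with the largest (zxid, sid) pair, and that nodes starting later simply follow the existing leader. `Node` and `simulate_startup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    sid: int       # the myid of the node
    zxid: int = 0  # 0 means no data, as in a first start


def simulate_startup(start_order: List[Node], ensemble_size: int) -> Optional[int]:
    """Return the sid of the elected leader, or None if no quorum was reached."""
    quorum = ensemble_size // 2 + 1
    running: List[Node] = []
    leader: Optional[int] = None
    for node in start_order:
        running.append(node)
        # Elect only once, at the moment a majority is first available.
        if leader is None and len(running) >= quorum:
            leader = max(running, key=lambda n: (n.zxid, n.sid)).sid
        if leader is None:
            role = "LOOKING"
        elif node.sid == leader:
            role = "LEADING"
        else:
            role = "FOLLOWING"
        print(f"node {node.sid} starts -> {role}")
    return leader


if __name__ == "__main__":
    elected = simulate_startup([Node(1), Node(2), Node(3)], ensemble_size=3)
    print("leader:", elected)
```

With the start order 1, 2, 3 the model elects node 2, which matches the logs: node 3 has the largest myid but arrives after a leader already exists.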
2025-04-19 10:33:21,675 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@903] - Peer state changed: leading
2025-04-19 10:33:21,675 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1549] - LEADING
At this point node 1's log also shows that it received node 2's vote (without further detail), and its state then changes to FOLLOWING.
2025-04-19 10:33:21,454 [myid:] - INFO [ListenerHandler-/192.168.31.140:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1076] - Received connection request from /192.168.31.141:37540
2025-04-19 10:33:21,466 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:33:21,469 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x4, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:33:21,473 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:2, n.round:0x4, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:33:21,474 [myid:] - WARN [QuorumConnectionThread-[myid=1]-9:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 3 at election address /192.168.31.142:3888
# (some lines omitted)
2025-04-19 10:33:21,673 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@903] - Peer state changed: following
2025-04-19 10:33:21,674 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1537] - FOLLOWING
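The `Notification:` lines carry the votes exchanged during the election, so it can be handy to pull their fields out programmatically. The sketch below parses one such line with a regular expression written against the log format shown above; the names `NOTIFICATION_RE` and `parse_notification` are assumptions for illustration, and the pattern may need adjusting for other ZooKeeper versions.

```python
import re
from typing import Dict, Optional, Union

# Matches the vote fields of a FastLeaderElection "Notification:" log line.
NOTIFICATION_RE = re.compile(
    r"my state:(?P<my_state>\w+); "
    r"n\.sid:(?P<sid>\d+), "
    r"n\.state:(?P<state>\w+), "
    r"n\.leader:(?P<leader>\d+), "
    r"n\.round:(?P<round>0x[0-9a-fA-F]+), "
    r"n\.peerEpoch:(?P<peer_epoch>0x[0-9a-fA-F]+), "
    r"n\.zxid:(?P<zxid>0x[0-9a-fA-F]+)"
)


def parse_notification(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the vote fields of a Notification line, or None if it does not match."""
    match = NOTIFICATION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "my_state": fields["my_state"],             # state of the node writing the log
        "sid": int(fields["sid"]),                  # sender of the notification
        "state": fields["state"],                   # sender's state
        "leader": int(fields["leader"]),            # sid the sender votes for
        "round": int(fields["round"], 16),          # election round
        "peer_epoch": int(fields["peer_epoch"], 16),
        "zxid": int(fields["zxid"], 16),
    }


if __name__ == "__main__":
    sample = (
        "Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:2, "
        "n.round:0x4, n.peerEpoch:0x0, n.zxid:0x0, "
        "message format version:0x2, n.config version:0x0"
    )
    print(parse_notification(sample))
```

In the sample, `sid` is 1 but `leader` is 2: node 1 has switched its own vote to node 2, which is the moment the election is effectively decided.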
Node 3 starts
It also enters the election (LOOKING) state, but because the cluster has already elected a Leader, it becomes a Follower right away.
2025-04-19 10:43:45,137 [myid:] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1455] - LOOKING
2025-04-19 10:43:45,137 [myid:] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] - New election. My id = 3, proposed zxid=0x0
2025-04-19 10:43:45,151 [myid:] - INFO [ListenerHandler-/192.168.31.142:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1071] - 3 is accepting connections now, my election bind port: /192.168.31.142:3888
2025-04-19 10:43:45,152 [myid:] - INFO [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:43:45,169 [myid:] - INFO [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LEADING, n.leader:2, n.round:0x4, n.peerEpoch:0x1, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:43:45,169 [myid:] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@903] - Peer state changed: following
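With-data case: this works much like the case above, except that the deciding criterion becomes the zxid: the node with the larger zxid (the most recent data) wins, and the sid (myid) is compared only when the zxids are equal. More precisely, the peerEpoch seen in the notifications is compared first, then the zxid, then the sid.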
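The comparison order can be expressed as a lexicographic comparison. The sketch below assumes the order peer epoch first, then zxid, then sid, which is how the vote comparison in FastLeaderElection is usually described; the `Vote` type, the `should_replace` helper, and all the values are made up for illustration.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    # Field order matters: tuples compare left to right.
    peer_epoch: int
    zxid: int
    sid: int


def should_replace(new: Vote, current: Vote) -> bool:
    """True if `new` beats `current` under the (epoch, zxid, sid) ordering."""
    return new > current


if __name__ == "__main__":
    a = Vote(peer_epoch=1, zxid=0x100000003, sid=3)
    b = Vote(peer_epoch=1, zxid=0x100000005, sid=1)
    c = Vote(peer_epoch=1, zxid=0x100000005, sid=2)

    print(should_replace(b, a))  # True: larger zxid wins despite the smaller sid
    print(should_replace(c, b))  # True: equal zxid, so the larger sid wins
    print(max([a, b, c]).sid)    # 2
```

Here the node with the largest sid (3) loses because its data is older, which is the key difference from the no-data case, where every zxid is 0x0 and only the sid decides.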