Note: 2PC assumes that the data in stable storage at each node is never lost and that no node crashes forever. Data loss is still possible if the data in stable storage is corrupted in a crash.
A network partition is the failure of a network link to one or several nodes. The nodes themselves continue to stay active, and they may even be able to receive requests from clients on their side of the network partition.
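A defining feature of a network partition is that it is hard to tell apart from a node failure. Once a partition occurs, several parts of the system can remain active at the same time; in a primary/backup setup this can produce two primaries. Partition-tolerant consensus algorithms must therefore solve one problem: during a network partition, only one partition of the system remains active.
The main techniques for achieving this are: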
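Majority decisions: in a system of N nodes, the system keeps working as long as at least N/2+1 nodes (integer division, i.e. a strict majority) are up and reachable. Because at most one side of a partition can contain a majority, at most one side stays active.
Roles: there are two approaches: all nodes may have the same responsibilities, or nodes may have separate, distinct roles. Electing a master can make the system more efficient; the simplest benefit is that all operations pass through the master, which forces them into a single order.
Epochs: an epoch acts much like a logical clock, giving the different nodes a shared view of the current state of the system.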
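The majority rule and the epoch check can be sketched in a few lines of Python. This is a minimal illustration, not taken from any particular algorithm: the names quorum_size, has_quorum and accept_message are hypothetical, and the rule "reject anything stamped with an older epoch" is an assumption about how epochs are typically used (for example, terms in Raft).

```python
from typing import Tuple


def quorum_size(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    return n // 2 + 1


def has_quorum(reachable: int, n: int) -> bool:
    """True if a partition that can reach `reachable` nodes may stay active."""
    return reachable >= quorum_size(n)


def accept_message(current_epoch: int, msg_epoch: int) -> Tuple[bool, int]:
    """Hypothetical epoch check: reject messages from an older epoch,
    and adopt the sender's epoch if it is newer than ours.

    Returns (accepted, epoch_after_processing).
    """
    if msg_epoch < current_epoch:
        return False, current_epoch  # sender is outdated
    return True, msg_epoch


if __name__ == "__main__":
    # Five nodes split 3 / 2: only the side with three nodes stays active.
    print(has_quorum(3, 5), has_quorum(2, 5))
    # Four nodes split 2 / 2: neither side has a majority.
    print(has_quorum(2, 4))
    # A node in epoch 3 ignores a message from a leader of epoch 2.
    print(accept_message(3, 2))
```

Two majorities of the same N nodes always overlap, since 2 * (N // 2 + 1) > N, which is why two partitions can never both pass has_quorum.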
Beyond the techniques above, a few further points need attention:
practical optimizations:
avoiding repeated leader election via leadership leases (rather than heartbeats)
avoiding repeated propose messages when in a stable state where the leader identity does not change
ensuring that followers and proposers do not lose items in stable storage and that results stored in stable storage are not subtly corrupted (e.g. disk corruption)
enabling cluster membership to change in a safe manner (e.g. base Paxos depends on the fact that majorities always intersect in one node, which does not hold if the membership can change arbitrarily)
procedures for bringing a new replica up to date in a safe and efficient manner after a crash, disk loss or when a new node is provisioned
procedures for snapshotting and garbage collecting the data required to guarantee safety after some reasonable period (e.g. balancing storage requirements and fault tolerance requirements)
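To make the leadership-lease idea from the list above concrete, here is a minimal single-process sketch in Python. It rests on several assumptions: the class LeaderLease and its methods are hypothetical names, the lease lives in one object rather than being granted by a majority of nodes, and clock drift between machines is ignored. A real system would have to handle both.

```python
import time
from typing import Callable, Optional


class LeaderLease:
    """Hypothetical time-bounded leadership lease (single-process sketch)."""

    def __init__(self, duration_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.duration_s = duration_s
        self.clock = clock              # injectable so tests can fake time
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, node_id: str) -> bool:
        """Grant or renew the lease if it is free, expired, or already ours."""
        now = self.clock()
        if (self.holder is None or now >= self.expires_at
                or self.holder == node_id):
            self.holder = node_id
            self.expires_at = now + self.duration_s
            return True
        return False                    # another node holds a live lease

    def is_leader(self, node_id: str) -> bool:
        """A node may act as leader only while its lease has not expired."""
        return self.holder == node_id and self.clock() < self.expires_at


if __name__ == "__main__":
    lease = LeaderLease(duration_s=10.0)
    print(lease.try_acquire("node-a"))  # node-a becomes leader
    print(lease.try_acquire("node-b"))  # refused while the lease is live
```

While the lease is valid, no election is needed; the leader only has to renew before expiry, and other nodes wait for the lease to run out instead of starting an election after every missed heartbeat.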