What do we mean when we say X is more abstract than Y? First, that X does not introduce anything new or fundamentally different from Y. In fact, X may remove some aspects of Y or present them in a way that makes them more manageable. Second, that X is in some sense easier to grasp than Y, assuming that the things that X removed from Y are not important to the matter at hand.
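Programs in a distributed system:
run concurrently on independent nodes …【executed concurrently on independent nodes】
are connected by a network that may introduce nondeterminism and message loss …【interconnected by a network】
and have no shared memory or shared clock.【no shared memory or clock】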
In more detail:
each node executes a program concurrently【every node executes concurrently】
knowledge is local: nodes have fast access only to their local state, and any information about global state is potentially out of date【each node only knows the information on its own node】
nodes can fail and recover from failure independently【each node fails and recovers independently】
messages can be delayed or lost (independent of node failure; it is not easy to distinguish network failure and node failure)【communication is unreliable】
and clocks are not synchronized across nodes (local timestamps do not correspond to the global real time order, which cannot be easily observed)【clocks are not synchronized】
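The last point is easy to see in a toy example. A minimal sketch, assuming two nodes whose clocks are offset from real time by made-up amounts (`CLOCK_OFFSET`, `Event` and `local_timestamp` are names invented for this note): sorting events by local timestamp gives a different order from the real one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    node: str
    real_time: float  # global real time; no node can observe this directly


# Hypothetical clock offsets (seconds) of each node relative to real time.
CLOCK_OFFSET = {"A": 0.0, "B": -5.0}


def local_timestamp(event: Event) -> float:
    """Timestamp the node would record using its own, skewed clock."""
    return event.real_time + CLOCK_OFFSET[event.node]


events = [
    Event("write x=1", node="A", real_time=10.0),
    Event("write x=2", node="B", real_time=12.0),  # really happens later
]

# Order an omniscient observer would see.
by_real_time = [e.name for e in sorted(events, key=lambda e: e.real_time)]
# Order reconstructed from the timestamps the nodes actually recorded.
by_local_time = [e.name for e in sorted(events, key=local_timestamp)]

print(by_real_time)
print(by_local_time)
```

Node B's clock runs five seconds behind, so its later write carries the smaller timestamp and sorts first.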
So what is a system model?
System model: a set of assumptions about the environment and facilities on which a distributed system is implemented
A system model defines assumptions about the environment and facilities. These assumptions include:
what capabilities the nodes have and how they may fail【each node's capabilities and failure modes】
how communication links operate and how they may fail and【how nodes communicate and how communication fails】
properties of the overall system, such as assumptions about time and order【properties of the whole system, such as timing and ordering】
Synchronous system model
Processes execute in lock-step; there is a known upper bound on message transmission delay; each process has an accurate clock.
Asynchronous system model
No timing assumptions, e.g. processes execute at independent rates; there is no bound on message transmission delay; useful clocks do not exist.
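One practical consequence of the two models is how far a timeout can be trusted. A minimal sketch, assuming a heartbeat-based failure detector with made-up constants (`HEARTBEAT_INTERVAL`, `MAX_DELAY` and the function names are illustrative, not from any library): with a known delay bound a live node is never suspected, while without a bound the same timeout can wrongly suspect a node that is merely slow.

```python
HEARTBEAT_INTERVAL = 1.0  # hypothetical: seconds between heartbeats sent by a live node
MAX_DELAY = 0.5           # hypothetical: delay bound assumed by the synchronous model
TIMEOUT = HEARTBEAT_INTERVAL + MAX_DELAY


def gap_between_heartbeats(prev_delay: float, next_delay: float) -> float:
    """Time the receiver waits between two consecutive heartbeats."""
    return HEARTBEAT_INTERVAL + next_delay - prev_delay


def suspects_failure(gap: float) -> bool:
    """Suspect the sender once the silence exceeds the timeout."""
    return gap > TIMEOUT


# Synchronous model: delays lie in [0, MAX_DELAY], so the worst case is a
# fast heartbeat followed by a maximally delayed one.
sync_worst_gap = gap_between_heartbeats(prev_delay=0.0, next_delay=MAX_DELAY)
sync_false_suspicion = suspects_failure(sync_worst_gap)

# Asynchronous model: nothing bounds the delay, so a live node can stay
# silent for arbitrarily long from the receiver's point of view.
async_gap = gap_between_heartbeats(prev_delay=0.0, next_delay=30.0)
async_false_suspicion = suspects_failure(async_gap)

print(sync_false_suspicion, async_false_suspicion)
```

In the asynchronous case the sender is alive the whole time; the receiver just cannot tell a slow message from a dead node.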
The consensus problem
The discussion below turns on two conditions:
whether or not network partitions are included in the failure model, and
synchronous vs. asynchronous timing assumptions
First, the properties that define the consensus problem:
Agreement: Every correct process must agree on the same value.
Integrity: Every correct process decides at most one value, and if it decides some value, then it must have been proposed by some process.
Termination: All processes eventually reach a decision.
Validity: If all correct processes propose the same value V, then all correct processes decide V.
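To make the four properties concrete, here is a rough sketch of a checker that inspects the outcome of one finished run. The function name `check_consensus` and its input shape are assumptions made for this note. It does not implement consensus, it only tests a recorded outcome, and it checks termination over the correct processes only.

```python
from typing import Dict, Optional


def check_consensus(proposals: Dict[str, str],
                    decisions: Dict[str, Optional[str]]) -> Dict[str, bool]:
    """Check one finished run against the four properties.

    proposals: value proposed by every process (correct or faulty).
    decisions: final decision of each *correct* process, None if undecided.
    """
    decided = [v for v in decisions.values() if v is not None]
    proposed_by_correct = {proposals[p] for p in decisions}

    # Agreement: all decided values are the same.
    agreement = len(set(decided)) <= 1
    # Integrity: every decided value was proposed by someone. "At most one
    # value" is implied by storing a single decision per process.
    integrity = all(v in proposals.values() for v in decided)
    # Termination: every correct process has decided.
    termination = len(decided) == len(decisions)
    # Validity: only constrains runs where all correct processes proposed
    # the same value.
    if len(proposed_by_correct) == 1:
        (v,) = proposed_by_correct
        validity = all(d == v for d in decisions.values())
    else:
        validity = True

    return {"agreement": agreement, "integrity": integrity,
            "termination": termination, "validity": validity}


proposals = {"p1": "a", "p2": "b", "p3": "a"}  # p3 crashes before deciding
good_run = check_consensus(proposals, {"p1": "a", "p2": "a"})
split_run = check_consensus(proposals, {"p1": "a", "p2": "b"})
print(good_run)
print(split_run)
```

The second run violates only agreement: each process decided a value that was proposed, but not the same one.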
Two impossibility results
What is an impossibility result?
A proof of impossibility, also known as negative proof, proof of an impossibility theorem, or negative result, is a proof demonstrating that a particular problem cannot be solved, or cannot be solved in general. Often proofs of impossibility have put to rest decades or centuries of work attempting to find a solution. To prove that something is impossible is usually much harder than the opposite task; it is necessary to develop a theory. Impossibility theorems are usually expressible as universal propositions in logic (see universal quantification).
A CA system does not distinguish between node failures and network failures, and hence must stop accepting writes everywhere to avoid introducing divergence (multiple copies). It cannot tell whether a remote node is down, or whether just the network connection is down: so the only safe thing is to stop accepting writes.【it cannot tell a network partition from a node failure, so it must stop writes to avoid introducing inconsistency】
A CP system prevents divergence (e.g. maintains single-copy consistency) by forcing asymmetric behavior on the two sides of the partition. It only keeps the majority partition around, and requires the minority partition to become unavailable (e.g. stop accepting writes), which retains a degree of availability (the majority partition) and still ensures single-copy consistency.【even when the network is partitioned, the side holding the majority of nodes can still serve requests】
Because a CP system includes network partitions in its failure model, it can use protocols such as Paxos or Raft to distinguish a majority partition from a minority partition.
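The asymmetry between the two designs can be sketched as a simple counting rule. This is only the majority test, not Paxos or Raft themselves, and the function names are made up for illustration. `reachable` counts the node itself plus the peers it can currently contact.

```python
def has_majority(reachable: int, cluster_size: int) -> bool:
    """True if this side of a partition holds a strict majority of nodes."""
    return reachable > cluster_size // 2


def cp_accepts_writes(reachable: int, cluster_size: int) -> bool:
    """CP-style rule: only the majority side keeps accepting writes."""
    return has_majority(reachable, cluster_size)


def ca_accepts_writes(reachable: int, cluster_size: int) -> bool:
    """CA-style rule: any unreachable node forces writes to stop everywhere."""
    return reachable == cluster_size


# A 5-node cluster split 3 / 2 by a network partition.
for side in (3, 2):
    print(side, cp_accepts_writes(side, 5), ca_accepts_writes(side, 5))
```

With an even split (for example 2 / 2 in a 4-node cluster) neither side has a strict majority, so under this rule both sides stop accepting writes.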
First, that many system designs used in early distributed relational database systems did not take into account partition tolerance (e.g. they were CA designs). Partition tolerance is an important property for modern systems, since network partitions become much more likely if the system is geographically distributed (as many large systems are).【most early systems did not consider P and were therefore CA systems; modern systems, especially geo-distributed multi-master deployments, must take partitions into account】
Second, that there is a tension between strong consistency and high availability during network partitions. The CAP theorem is an illustration of the tradeoffs that occur between strong guarantees and distributed computation.【since P cannot be avoided, we can only choose between C and A; sometimes we can weaken the consistency model and stop insisting on strong consistency in order to get "CAP"】
Third, that there is a tension between strong consistency and performance in normal operation.【when an operation involves fewer messages and fewer nodes, latency is naturally lower, but it also means some nodes are contacted less often and may hold stale data】
Fourth - and somewhat indirectly - that if we do not want to give up availability during a network partition, then we need to explore whether consistency models other than strong consistency are workable for our purposes.【"pick two out of three" can be a misreading; if we do not restrict ourselves to strong consistency, we have more options】
Keep in mind:
ACID consistency != CAP consistency != Oatmeal consistency
A consistency model is defined as:
Consistency model
a contract between programmer and system, wherein the system guarantees that if the programmer follows some specific rules, the results of operations on the data store will be predictable
Put simply: as long as the programmer follows the rules, the results of operations are predictable.
Some consistency models:
Strong consistency vs. other consistency models
Strong consistency models (capable of maintaining a single copy)
Linearizable consistency: Under linearizable consistency, all operations appear to have executed atomically in an order that is consistent with the global real-time ordering of operations. (Herlihy & Wing, 1991)
Sequential consistency: Under sequential consistency, all operations appear to have executed atomically in some order that is consistent with the order seen at individual nodes and that is equal at all nodes. (Lamport, 1979)
First, how long is "eventually"? It would be useful to have a strict upper bound, or at least some idea of how long it typically takes for the system to converge to the same value.【how long is "eventually"?】
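Second, how do the replicas agree on a value? The "how" matters a great deal, because a poorly designed method can lead to data loss.
So when we talk about eventual consistency, we should be aware that it may really mean: "eventually last-writer-wins, and read-the-latest-observed-value in the meantime"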
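A minimal sketch of what "last-writer-wins" can look like, assuming each replica stores a value together with a timestamp and a writer id used as a tie-breaker (`Versioned` and `merge` are names invented for this note, not any particular database's API). It also shows where the data loss comes from: one of two concurrent writes is silently discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # taken from the writer's local clock (may be skewed)
    writer: str       # tie-breaker when timestamps are equal


def merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: keep the version with the larger (timestamp, writer)."""
    return max(a, b, key=lambda v: (v.timestamp, v.writer))


# Two replicas accept different writes at nearly the same time.
replica_1 = Versioned("red", timestamp=100.0, writer="n1")
replica_2 = Versioned("blue", timestamp=100.5, writer="n2")

# In the meantime, a read returns whatever the contacted replica has observed.
read_before_sync = replica_1.value

# Once the replicas exchange state, both converge to the same version,
# whichever order the merge happens in.
converged_1 = merge(replica_1, replica_2)
converged_2 = merge(replica_2, replica_1)

print(read_before_sync, converged_1.value, converged_2.value)
```

The write of "red" is lost without any error being reported, and because the winner is chosen by local timestamps, clock skew can decide which write survives.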