Understanding CAP Theorem for Distributed Systems
The Relation between Consistency, Availability, & Partition Tolerance in Distributed Systems
When designing a stateful distributed system, software engineers frequently encounter challenges related to three essential properties: Consistency (C), Availability (A), and Partition Tolerance (P). While engineers ideally aim to address all three properties in their design, the CAP theorem dictates that a system can only fulfill two of these properties simultaneously.
Three Properties
Consistency
In distributed systems, consistency in the CAP theorem ensures that after a write operation, all subsequent read operations should return the most recently written value, thereby preventing stale reads. This means that regardless of which node clients are connected to and which node the writes happen, all clients should read the same data simultaneously.
Availability
For a distributed system to be considered available, it must be able to respond to every request it receives, even if some of its nodes are failing or experiencing network partition. However, in such cases, the response may not always reflect the most recent data written to the system.
Partition Tolerance
In a distributed system, a network partition occurs when the system divides into multiple sub-networks due to communication failures. A partition-tolerant system will continue to serve requests even if communication between nodes is dropped or delayed.
Pick Any Two!
As previously mentioned, the CAP theorem states that a distributed system can only achieve two out of the three properties: consistency, availability, and partition tolerance. This means that if a node or group of nodes becomes partitioned from the rest of the system, it can either continue responding to client queries (AP), stop responding to client queries (CP), or shut down completely (CA).
AP — Availability and Partition Tolerance
In this scenario, the system prioritizes availability and partition tolerance, resulting in slightly weaker consistency. This means that during a network partition, the disconnected nodes may return outdated data to clients (hence sacrificing consistency). However, once the network delay is resolved or the partition is healed, the data on the nodes will eventually synchronize to the most recent version, achieving eventual consistency.
CP — Consistency and Partition Tolerance
In this situation, a distributed system emphasizes consistency and partition tolerance, which leads to decreased availability when there are network faults. In other words, if a network partition happens, the nodes that are disconnected will not respond to client queries (hence sacrificing availability), which contradicts the idea of availability.
CA — Consistency and Availability
In this situation, the system prioritizes consistency and availability, resulting in a lack of partition tolerance. During a network partition, the disconnected nodes shut down, sacrificing partition tolerance. Nevertheless, the connected nodes continue to maintain consistency and availability.
Good one 👍