Paxos Made Simple Leslie Lamport
Introduction • Lock is the easiest way to manage concurrency • Mutex and semaphore. • Read and write locks in 2PL for transaction. • In distributed system: • No master for issuing locks. • Failures
Problem • How to reach consensus/data consistency in distributed system that can tolerate non-malicious failures?
Requirements • Safety • Only a value that has been proposed may be chosen. • Only a single value is chosen. • A node never learns that a value has been chosen unless it actually has been. • Liveness • Some proposed value is eventually chosen. • If a value has been chosen, a node can eventually learn the value.
Paxos‘s notation • Classes of agents: • Proposers • Acceptors • Learners • A node can act as more than one clients (usually 3).
Paxos algorithm • Phase 1 (prepare): • A proposer selects a proposal number n and sends a prepare request with number n to majority of acceptors. • If an acceptor receives a prepare request with number n greater than that of any prepare request it saw, it responses YES to that request with a promise not to accept any more proposals numbered less than n and include the highest-numbered proposal (if any) that it has accepted.
Paxos algorithm • Phase 2 (accept): • If the proposer receives a response YES to its prepare requests from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a values v which is the value of the highest-numbered proposal among the responses. • If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.
Definition of chosen • A value is chosen at proposal number n iff majority of acceptor accept that value in phase 2 of the proposal number.
Paxos’s properties • P1: Any proposal number is unique. • P2: Any two set of acceptors have at least one acceptor in common. • P3: the value sent out in phase 2 is the value of the highest-numbered proposal of all the responses in phase 1.
Interpretation of P3 # value pool of acceptors 2 α a1 a2 a3a4 5 β a1 a2a3 a5 14 αa2 a4a5 27 βa1 a3 a4 29 βa2 a3 a4
Proof of safety • Claim: if a value v is chosen at proposal number n, any value that is sent out in phase 2 of any later prososal numbers must be also v. • Proof (by contradiction): Let m is the first proposal number that is later than n and in phase 2, the value sent out is not v.
Proof # value pool of acceptors a a n v … n+1 v … … m-1 v … m v’ … the highest # chosen in phase 2 ≥ the highest # that a accept ≥ n
Learning a chosen value • There are some options: • Each acceptor, whenever it accepts a proposal, informs all the learners. • Acceptors informs a distinguished learner (usually the proposer) and let the distinguished learner broadcast the result.
Tunable knobs • Acceptors have many options to response: • Prepare request: No/Yes • Accept request: No/Yes if it didn’t promise not to do so • Back off time after abandon a proposal: exponential back-off/pre-assigned values • Should we wait for nodes to online in each phase?
Applications • Chubby lock service. • Petal: Distributed virtual disks. • Frangipani: A scalable distributed file system.