About ballots in Paxos
Recently I am reading papers about asynchronous consensus protocols, and I was trying to contrast these protocols with Paxos (Lamport, 2001) and PBFT (Castro et al., 1999). When re-reading the Paxos paper, some of the design points take me some time to reason about. These thoughts may seem naive for Paxos experts, but I will just jot them down.
In Paxos, a value is confirmed only if it is accepted with the same ballot by a majority of acceptors. Why is having a majority of acceptors accept the same value not enough?
(At first, I thought the accept action in Paxos corresponds to decide in Ben-Or’s algorithm (Ben-Or, 1983), and thus I was wondering why Ben-Or’s algorithm can let nodes decide a value in different rounds, but Paxos requires a majority of nodes to accept with the same ballot. However, this comparison is wrong. Ben-Or’s decide is more like the behavior of learners in Paxos, and Ben-Or’s algorithm can be regarded as skipping the first phase of Paxos if we have to parallelize them.)
To answer this question, we can simply construct a counterexample which shows that, after a majority
of acceptors have accepted a same value
Let’s use the simplest setting with 3 acceptors
If the prepare message of $n’$ arrives after
We visualize the construction below. A new proposer can use
Overall, in Paxos, although an acceptor will try to let subsequent proposers follow
its accepted value, the proposer can find a majority/quorum without including this acceptor.
Even if a majority of acceptors accept the same value
This makes me think about another possible design. What if we can have a method
that makes there no gap between the ballots for accepting the same value? I did
not conduct a formal proof, but it seems to be enough for correctness. However, how can
we ensure there are no new ballots can be inserted? Using consecutive integers? Using
timestamps and safe timestamp mechanism?
Another thought is about the proposed value. In Ben-Or’s algorithm, a node tries to follow the value from other nodes in its Phase-1 in order to facilitate convergence. In contrast, in Phase-1 of Paxos, the proposer does not follow the value prepared by acceptors; its proposed value is affected only by accepted values.
What if we adapt Paxos and let a proposer follow the value observed in Phase-1. I was thinking this may make it possible to enable Paxos to confirm a value by accepting it value with different ballots. However, by simple variants of the previous construction, we can see that it is not correct. However, will this adaption cannot make the consecutive-ballot design viable?
Anyway, these are just some random thought, and I did not spend much time to verify them, as the vanilla Paxos (Paxos-made-simple without a stable leader) is rarely used in deployments.
References:
-
Paxos made simple. ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) 2001.
-
Practical byzantine fault tolerance. In OsDI 1999.
-
Another advantage of free choice (extended abstract) completely asynchronous agreement protocols. In Proceedings of the second annual ACM symposium on Principles of distributed computing 1983.