ELSEVIER
Information
Information Processing Letters 50 (1994) 69-73Message complexity of hierarchical
quorum consensus algorithm
Her-Km Chang, Shyan-Ming Yuan *
Department of Computer and Information Science, National Chiao Tung lJnir,ersity, 1001 Ta Hsueh Roud, Hsinchu 30050, Taiwan
(Communicated by W.M. Turski; received 20 September 1993; revised 10 January 1994)
Abstract
The hierarchical quorum consensus (HQC) algorithm, which is a generalization of the standard quorum consensus algorithm for managing replicated data, can reduce the quorum size to NO.‘j”. This paper analyzes the message complexity of HQC. Moreover, an asymptotic analysis on the ratio of the message complexity to the quorum size is presented. It is shown that the ratio converges to a constant.
Key words: Analysis of algorithms; Replicated data; Hierarchical quorum consensus; Quorum size; Message complex- ity
1. Introduction
In a distributed system, copies of the same data are kept in different sites to increase the availability of data. To maintain the mutual con- sistency among replicated copies, a replica con- trol algorithm is required to synchronize read and write operations such that the multiple copies of the same data behave like a single copy. A survey of algorithms for replica control can be found in
121.
Quorum CONSENSUS algorithms [1,3,5] are well known for replica control. With quorum consen- sus, a read operation is allowed to be performed if it can get permissions from a read quorum of r
sites. On the other hand, a write operation must get permissions from a write quorum of w sites.
* Corresponding author. Email: smyuan@ tiger.cis. nctu.edu.tw.
To ensure consistency, r and w must satisfy the following constraints:
r+w>N, (1)
2w>N, (2)
where N is the number of sites in the system. Condition (1) ensures that a read operation can access a most recently updated copy (i.e. a copy updated by the last write operation). Ver- sion numbers can be used to determine which copy is most recently updated. Condition (2) en- sures that the most recently updated copy has the largest version number (see [3] for details).
A major problem with quorum consensus algo- rithms is that the write quorum size is at least ](N + 1)/2]. The hierarchical quorum consensus (HQC) [4] algorithm, which logically organizes the sites in a system into a multilevel hierarchy, can reduce the quorum size to N”.‘j”. Quorum size was used to evaluate quorum consensus algo- rithms based on the assumption that message
0020.0190/94/$07.00 0 1994 El sevier Science B.V. All rights reserved SSDZ 0020-0190(94)00012-N
complexity is proportional to quorum size. How- ever, the communication cost of a distributed algorithm is usually estimated by the message complexity, which is defined to be the average number of exchanged messages per operation. In this paper, the message complexity of HQC is analyzed. Also, an asymptotic analysis on the ratio of the message complexity to the quorum size is presented. It is shown that the ratio con- verges to a constant y. The presented analysis shows that, for HQC, the message complexity is proportional to the quorum size in most practical cases.
The remainder of this paper is organized as follows. The next section describes the model and notation. Section 3 reviews HQC. In Section 4, the message complexity of HQC is analyzed and an asymptotic analysis on the ratio of the message complexity to the quorum size is presented. Some concluding remarks are given in the final section.
2. Model and notation
A distributed system consists of a set of sites connected by a computer network. There is no shared memory between any pair of sites. The sites can communicate by exchanging messages only. In this paper, we assume that each site stores a replicated copy of the replicated data. The sites can be unreliable. When a site fails, the copy at the site becomes unavailable.
The availability of a site is the probability that the site is operational at any time instant. The
availability of HQC is the probability that at least one hierarchical quorum can be formed. Message complexity is defined to be the expected number of exchanged messages per operation. Quorum size is defined to be the average number of sites that form a quorum.
In this paper, we use the following notation:
a p: availability of a single site, i <p < 1,
l Ai: availability of an i level hierarchy, i > 0, o Cj: quorum size of an i Ievel hierarchy, i >, 0, * Mj: message complexity of an i level hierarchy,
i > 0,
l Ri: M,/C,, i 2 0.
For HQC, it can be shown that if p < 5, then A, GAi_i, for all i 2 1. So we consider only the
case that i <p < 1.
3. Hierarchical quorums
HQC is based on logically organizing sites into a multilevel hierarchy of depth m with root at level m. The physical sites are represented as leaves of the hierarchy at level 0, while the higher level nodes correspond to logical groups. A node at level i + 1, 0 G i < m - 1, is viewed as a logical group which in turn consists of Ii subgroups at level i. The read quorum, ri, and write quorum, wi, at level i, 0 G i .G m - 1, are defined as the number of subgroups that must be collected for read and write operations by a level i + 1 group, respectively. HQC requires read and write opera- tions to (recursively) assemble quorums of rm _ 1
and wm_ 1 level m - 1 subgroups, respectively, such that
(a> ri + wi > li, and (b) 2Wi > I,,
for all levels i, 0 < i 6 m - 1.
Fig. 1 shows some examples of hierarchical (read/ write) quorums for a 2 level hierarchy, wherein I, = Zi = 3 and r0 = ri = w0 = w1 = 2.
It was shown that the minimum quorum size required for a write operation is No.63 which can be achieved in systems having N = 3” * 5h sites where b = 0 or 1 [4]. In this paper, we discuss only ternary hierarchies, in which every non-leaf (logical) node consists of three nodes in the next lower level and the (read/write~ quorums in all levels are 2. For a ternary hierarchy of depth m,
the number of sites in the system is N = 3” and the size of quorums is 2” + N”,63.
Fig. 1. Examples of hierarchical quorums: dark nodes form a quorum in each case.
H.-K. Chang, S.-M. Yuan /Information Processing Letters 50 (1994) 69-73 71
For 1 G i G m, A, can be computed from the following recurrence:
Ai =A;_, + 3&,(1 -/f-i)
= 3& - 2&,, (3)
where A,, =p and the availability of the hierarchy is A,.
The quorum size of an i level hierarchy, Ci,
i > 0, is
ci = 2’. (4)
4. Analysis of HQC
In this section, the message complexity of HQC is analyzed. An asymptotic analysis on the ratio of the message complexity to the quorum size, say
Ri, is presented. It is shown that the sequence
{R;} is convergent and a way to estimate the limit of (Ri} is given.
4.1. Message complexity of HQC
For an m level ternary hierarchy, with root at level m and leaves (physical sites) at level 0, M, is the expected number of messages required to form a quorum at level i, 0 < i < m. Consider the construction of a quorum at level i, 1 < i G m,
if the first two children (in some predefined order) are available - only 2M,-, messages are required;
if the first two children are unavailable - since it is impossible to form a quorum, only 2M,_, messages are required;
otherwise - 3M,_, messages are required de- spite whether the quorum can be formed or not.
Thus,
Mj= (/I-, + (1 +IJ)2Mj&i
+(2A,_,(l -Ai_,))3M,_,
Z2(1 +t~Pi-A?_i)Mi_ij (5)
where M, = 1 and M,,, is the message complexity of the m level hierarchy.
4.2. Asymptotic analysis of R, for H&C
Lemma 1. For HQC, i <p < 1, Ai has the follow-
ing properties :
(1) 1 -A,+, < (1 fp - 2~‘)~(1 -A;),
(2) (I+ (1 -A;))(1 + (1 -A,+,)>
. . . (1 + (1 -A,+,)) <e(1--A,)/L’(2P-1).
where i > 0 and m > 0.
Proof. The proof is shown in the Appendix. Lemma2. ForHQC, Ri>Ri_,, forallial.
Proof. R, = M,/C, 2(1 +Aj_, -A:-,) M,_, = 2 Ci&1 = (1 +Ai~,(l -A,_,))Ri_, >Ri_,. 0 Lemma3. ForHQC, $<p<l, R r+m < e (1 --A,)/P@P ‘,R i’ wherei> andmal. Proof. For i > 0, Ri+, =M,+,/‘Ci+,
(6)
2(1 +A,-A:) Mi 2 c, = (1 +Ai(l -A,))R; < (1 + (1 -A,))R;. By iteration,R r+nz ’ {(l + (I1 -A,))(1 + (l -Ai+l>> ... (1 + (1 -A,+,_,))}R, According to Lemma 1,
R;+m < ,o -A,)/p(Zp- I)&. 0
Theorem 4. For HQC, i <p < 1, the sequence
Proof. Lemma 2 shows that (RJ is monotonically
increasing. By Lemma 3, Ri < ef1-p)/pf2p-‘), i.e.,
{R$ is bounded. Thus, {Ri} is convergent to a
constant y, where y < e(‘--p)/P(‘2J’-t1).
Theorem 4 shows that, for HQC, the message complexity is proportional to the quorum size if the system is sufficiently large.
4.3. Estimation of the limit of (Ri) for HQC
It is shown by Lemma 3 that, for HQC, $ <p < 1,
Ri+m < e(’ -Ai)/P(2P- “)Ri
where i z= 0 and m 2 1. For any F > 0, if e(l--A,f/P(2P--i) < (I +E) then Ritrn < (1 +E)Ri. Since ,U--A,)/P(2P--l) < (I + S) if A,>l-p(2p-l)In(l+&).
Thus, for any E > 0, we can find condition (7) such that Rj+m < (1 m > 1. Consequently, the limit smaller than (1 + c)Ri. That is, y val (Ri, (1 + c)Ri).
(7)
an i satisfying f E)J\~, for any
of CR,), Y, is is in the inter-
Table 1 illustrates the smallest value of i that satisfies condition (7) for several combinations of p and E. Several notable observations from Table
1 are:
Table 1
The smallest value of i that satisfies condition (5)
P E 10-Z 10-x 10-4 10-s 0.6 7 7 8 8 0.7 5 5 6 6 0.8 3 4 4 5 0.9 2 3 3 4
0 In most practical applications, p 2 0.9 (most current workstations can achieve this availabil- ity) and i GS 2, the ratio of the message com- plexity to the quorum size is almost a constant.
l Even the site availability is low, e.g. p = 0.6, the ratio converges very fast 6.2 7 for p = 0.6).
5. Conclusions
This paper analyzes the message complexity of the hierarchical quorum consensus algorithm. Also, an asymptotic analysis on the ratio of the message complexity to the quorum size is pre- sented. It is shown that the ratio converges to a constant y, where y < ecr-pf/p(2p-11f. Finally, a way to estimate the value of y is given. The result shows that, in most practical cases (p 2 0.9 and
i & 2), the ratio is almost a constant. Even for
some impractical cases, the ratio converges very fast. Quorum size was used to appraise the per- formance of quorum consensus algorithms based on the assumption that the message complexity is proportional to the quorum size. However, the assumption was not proved in previous works. The presented analysis shows that the assumption is valid in most practical cases.
6. Appendix
Lemma 5. Zf 0 <xi < 1, for all i, then
(l+x,)(l+~,)~~~(l+~,)<e~~~.
Proof. It is known that
In(liz)=z-~+~-q+~*~.
Hence, for any 0 <z < 1, ln(1 + z) <z. Then, if 0 <xi < 1, for all i, we have
In((1 +x,)(1 -t-X,) **. (1 +xJ) < cxi
That is,
H.-K. Chang, S.-M. Yuan /Information Processing Letters 50 (1994) 69-73 73
Proof of Lemma 1. For i > 0, by (3), 1 -/Ii+1 = l-3&+2& = (1 -A;) - 241 -Ai) = (1+ (1 - 24)4)(1 -Ai) Q (1+ (1 - 2p)A,)(l -Ai) < (1 + (1 - 2p)p)(l -Ai) = (1 +p - 2p2)(1 -Ai). By iteration, for i > 0 and m & 0,
1 -Ai+, ~ (1 +p - 2pZ)m( 1 -Ai). Next, according to (8),
(1 -Ai) + (1 -Ai+1) + *.. +(l -Ai+,)
< (1 -A,){1 -t (1 t-p - 2p2) + . *. + (1 tp - 2p”)m} 1 < (1 -Ai) 1 - (1 +p - 2p2) 1 -Aj = p(2p - 1) ’ Thus, by Lemma 5,
(1 + (1
-A,))(1 +
(1
-A,+,>>
. . .
(1 + (1 -Ai+,)) < e(1-At)/p(2P-*). 7. References(8)
[l] M. Ahamad and M.H. Ammar, Performance characteriza- tion of quorum-consensus algorithms for replicated data, IEEE Trans. Soffware Engrg. 15 (4) (1989) 492-496. [2] S.B. Davidson, H. Garcia-Molina and D. Skeen, Consis-tency in partitioned networks, ACM Compuf. Surveys 17 (3) (1985) 341-370.
[3] D.K. Gifford, Weighted voting for replicated data, in: Proc. 7th ACM Symp. on Operating System Principles (1979)
150-162.
[4] A. Kumar, Hierarchical quorum consensus: A new algo- rithm for managing replicated data, IEEE Trans. Comput. 40 (9) (1991) 996-1004.
[5] R.H. Thomas, A majority consensus approach to concur- rency control for multiple copy databases, ACM Trans. Database Systems 4 (2) (1979) 180-209.