Reaching strong consensus in the presence of mixed failure types

(1)

I N F O R M A T I O N S C I E N C E S a n II¢l~rNA~O~a~ fOU'~A~ ELSEVIER Journal of Information Sciences 108 (1998) 157 180

Reaching strong consensus in the

presence of mixed failure types

Hin-Sing Siu a,., Yeh-Hao Chin b, Wei-Pang Yang c

~' Department of Industrial Engineering and Management, Mingchi Institute o[" Technology, Taipei, Taiwan 24306, ROC

b Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 30043, ROC c Department of Computer and Information Science, National Chiao Tung University, Hsinehu,

Taiwan 30050, ROC

Received 10 October 1995; received in revised form 24 May 1997; accepted 26 August 1997

Abstract

The Strong Consensus (SC) is a variant of the conventional distributed consensus problem. The protocol designed for the SC problem requires that the agreed value among fault-free processors be one of the fault-free processor's initial value. The SC problem is re-examined with the assumption of mixed failure types (also referred to as the hybrid fault model). Compared with the features of the existing protocols, the under- lying network topologies of the proposed protocol do not have to be fully connected, the mixed failure types can be tolerated, and no prior information of the system's faulty status is required. The proposed protocol can tolerate a maximum number of faults to enable each fault-free processor to reach an agreement with a minimum number of message exchanges. © 1998 Published by Elsevier Science Inc. All rights reserved.

Keywords." Byzantine agreement; Distributed consensus; Fault-tolerant distributed

system; Mixed failure types; Nonfully connected network; Strong consensus

* Corresponding author. E-mail: hssiu@ccsun.mit.edu.tw, fax: 886 2 29041914.

(2)

158 H.-S. Siu et al. / Journal o f lnjormation Sciences 108 (1998) 157-180

I. I n t r o d u c t i o n

The Distributed Consensus problem [1-5] is one o f the most important prob-

lems in designing a fault-tolerant distributed system. A variant of distributed consensus, called the Strong Consensus (SC) problem, was introduced by Ne- iger [6]. In the SC problem, each processor A starts with an initial value, be- longs to a finite set, V, o f all possible values ([ V] = m). The protocol for the SC problem is to enable all fault-free processors to obtain a c o m m o n value. Af- ter the execution o f the protocol, the c o m m o n value obtained by the fault-free processors shall be the value that satisfies the following conditions:

1. Agreement: All fault-free processors agree on the same c o m m o n value v, and

2. strong validity: v is the initial value o f some fault-free processors.

In practice, most network topologies are not fully connected and network processors may be subjected to different types o f failure simultaneously [1]. F r o m the standpoint o f disruptive effects, the processor failure types can be di- vided into two disjoint subsets: dormant faults and arbitrary faults, termed by Meyer and Pradhan [7]. A dormant fault is defined as a fault that does not con- taminate the content of a message and produces missing values detectable by all fault-free processors. An arbitrary fault refers to the case when the behavior o f a fault is not restricted.

However, the SC protocol proposed by Neiger [6], like most conventional consensus protocols [8,9,3], is designed to handle arbitrary faulty processors in a fully connected network. Based on the discussion of [10,7,11], the protocol of [6] cannot tolerate a maximum number of faults when dormant faults are considered because the faulty behaviors of the dormant faults are more consistent than that of the arbitrary faults. Also, the protocol cannot be applied to a nonfully connected network such as the one shown in Fig. 1.

ve=l f

t

v~=2 B A

"

S

H Vc= l @ : Fault-flee processor 0 : Crashed process~ • : Byzantine faulty processor

(3)

H.-S. Siu et al. / Journal of Information Sciences 108 (1998) 15~180 159 Some existing protocols [10,7,11] are designed to solve the consensus problem with mixed failure types (also referred to as the hybrid fault model). H o w -

ever, they will not guarantee the strong validity condition as specified by [6]. Moreover, the m a j o r limitation o f these protocols is that the n u m b e r o f arbitrary faulty processors must be k n o w n prior to execution of the protocols. However, this requirement violates the general assumption of the consensus p r o b l e m - that a fault-free processor does not know which processor is faulty [3,5]. Furthermore, Shin and R a m a n a t h a n [12] found that it is impractical to run diagnostics to detect all arbitrary faults in a network. Thus, these protocols cannot solve the SC p r o b l e m with mixed failure types.

The strategies used in previous research of the consensus problem can be classified into two groups: deterministic strategy and nondeterministic strategy

[13,14]. Each step for executing deterministic strategy is predetermined, such as all processors execute the same protocol, and all processors start and stop the execution of protocol at the same time. Conversely, certain steps of a protocol are not predefined in the nondeterministic strategy. F o r example, a processor can stop the executing of the protocol early when it can decide the

c o m m o n value. However, in such a strategy, a processor cannot decide whether another processor has stopped early or crashed when no message was received f r o m it; thus, the nondeterministic strategy is inappropriate for mixed failure types. Hence, the research in the consensus problem with mixed failure types (including this paper) concentrates on the deterministic strategy.

We re-examined the SC p r o b l e m with mixed failure types in a synchronous

network I and proposed a protocol that can solve the problem under the following assumptions:

1. The underlying network m a y not be fully connected and the communication links in the network are assumed to be fault-free.

2. Let n be the total n u m b e r of processors. Each processor's identifier is unique. A processor does not k n o w the fault status of another processor. 3. Let c be the connectivity of the underlying network. Due to the Menger the-

o r e m [15], at least c disjoint paths exist between any pair of processors S and R if the connectivity of the network is c. F o r any two paths the only c o m m o n c o m p o n e n t s are S and R.

4. Let Pa be the n u m b e r of processors subjected to arbitrary faults. 5. Let Pd be the n u m b e r of processors subjected to d o r m a n t faults.

6. Let V be the set of all possible values o f the SC problem, m be the size of V, and each processor uses a predefined enumeration o f the values in V: v l ~ v2~ . . . ~ v m.

1 In a synchronous network, the bounds for processing and communication delay in fault-flee

(4)

160 H.-S. Siu et al. / Journal o f lnformation Sciences 108 (1998) 157-180

The protocol designed for solving the SC problem with mixed failure types is called SCMIX. SCMIX enables all fault-free processors to obtain a c o m m o n value for solving the SC if n > max{mPa + Pd, 3Pa + Pd} and c > 2Pd + Pd. It uses a minimum number of message exchanges and can tolerate a maximum allowable number o f faulty processors. SCMIX is based on the oral message

model (one of the deterministic strategy) [3], which has two phases: the message

exchange phase and the decision making phase. The goal of the message ex-

change phase is to collect the messages and that o f the decision making phase is to compute a c o m m o n value for solving the SC problem. The formal descriptions o f these phases are given in Section 3.

The remainder o f the paper proceeds as follows. In Section 2 we give the conditions for the SC problems. Section 3 proposes the detaiI descriptions of the proposed protocol. The analysis and evaluation o f SCMIX are presented in Section 4. The conclusion is given in Section 5.

2. The conditions for strong consensus

In order to solve the SC problem, the number of message exchange rounds required and the n u m b e r o f faulty processors allowed shall first be considered.

2.1. The number of rounds required by SCMIX

Since a processor does not know the fault status o f another processor, each fault-free processor requires t + 1 rounds 2 to exchange the messages needed to reach an agreement [3,6,5], where t = L ( n - 1 ) / ( m a x { m , 3 } ) J . Fischer and Lynch [16] also pointed oat that the t + 1 rounds are the minimum number o f rounds required to reach a c o m m o n value when the network's fault status is unknown. Therefore, the minimum number of rounds required by S C M I X i s t + l .

2.2. The number of allowable faulty processors by SCMIX

Essentially, the fault tolerant capabilities of a network depend on the total number of processors and the network topology (connectivity). F o r example, every faulty processor can prevent the fault-free processors from achieving an agreement if the network topology is a tree. To generalize, the complete characterizations o f constraints on failures for the SC problem are shown in Theorem 1. The SC protocol that meets Theorem 1 is given in Section 3.

(5)

H,-S. Siu et aL / Journal o f lnformation Sciences 108 (1998) 15~180 161

Theorem 1. For any network with n processors and c connectivity, S C can be

achieved if'.

(1) n > max{mP~ + Pd, 3Pa + Pd}, and

(2) c > 2Pa + Pd

Proof. The first constraint specifies the number of processors required and follows the concept o f [3,7,17]. Via a time-out mechanism (the approach is presented in Section 3), a fault-free processor can detect the occurrence o f a dormant fault in a processor and ignore all messages from that processor if no message is received from it within a predefined time interval. Then the network behaves as a network with n - Pd processors. The constraint n - Pd > 3P~, namely n > 3Pa + Pd, requires that the SC protocol also solve the consensus problem [7,17]. Follow the results o f [7], the constraint n - Pd > mP~, namely

n > mPa + Pd, is required to solve the SC problem. Thus,

n > max{mPa + Pa, 3Pa + Pd} is the constraint on the number of processors required.

On the other hand, the second constraint specifies the connectivity required. In each round, every processor sends its message to other processors. In order to decide whether a processor has sent out its message, by the concept of ma-

jority, the total number o f arbitrary faulty processors must be less than half of

c - Pd, namely c > 2P~, + Pd [9]. Otherwise, such a goal cannot be reached. [] By Theorem 1, the maximum number of tolerable faulty processors by SCMIX is Pa + Pd ifn > max{mPa + Pd, 3Pa + Pd} and c > 2P~ + Pd (the formal p r o o f is shown in Theorem 7).

3. Basic concept and approaches

In a nonfully connected network, a processor can act either as sender, receiv-

er, or relay depending on the message flow. A message, sent from a sender to a receiver, may be passed through some relay. The message may be influenced by either the sender (dormant or arbitrary fault), some faulty relay, or both. To solve the SC problem, S C M I X must therefore completely remove the influences of d o r m a n t faulty senders, arbitrary faulty senders, and faulty relays.

In order to solve the SC problem, based on the oral message model, each processor should execute t message exchange rounds (stated in the Section 2) to collect the messages. At each round, a processor should broadcast its message to all processors and receive messages from all processors. To remove the influence caused by the faulty relays, each processor uses the fault-tolerant vir-

tual channel (FTVC) protocol to broadcast its message. The formal description

is presented in Section 3.1. Using FTVC, a fault-free processor can therefore detect that a processor Q is faulty if no message is received from Q during

(6)

one or some rounds. Once a processor Q is detected as faulty, the messages received from Q can be ignored at each subsequent round. Such an approach is called the

Absent Rule

ARsc and its formal definition is presented in Sec- tion 3.2. The absent rule can handle the faults, especially the dormant faults, that do not send messages as it should be. In [17], we found that the traditional majority vote is inappropriate for mixed failure types (also see Section 3.3). As a result, a new voting scheme VOTEsc is proposed for solving the SC problem with mixed failure types, and the formal description o f VOTEsc is presented in Section 3.3. Since the messages received from the faulty processors, detected by the absent rule, are ignored, the network behaves like a network with n - P d processors. Based on the majority concept, the total number of the tolerable faulty processors by SCMIX can be increased. With F T V C protocol, the absent rule ARsc and voting function VOTEsc, the proposed protoocol SCMIX consists of:

the message-exchange phase

and

the decision-making phase.

The main functions of these phases are shown in Fig. 2. Using these two phases, SCMIX can solve the SC problem.

Phase [ • Collect the messages to compute a common value for Message Exchange

I strong consensus Initialization

• Create an IG-tree with root E

Step 1: The first message exchange round

• Use FTVC to broadcast its initial value

• Receive the messages sent by all processors and store the received messages to the depth 1 of the 1G-tree

• Apply the absent rule to depth 1 of the 1G-tree

Step 2: The i-th message exchange round, where i = 2 to t + 1

• Pack the values stored in depth (i-1) of the IG-tree to the message M • Use FTVC to broadcast the message M

• Receive the messages from all processors

• According to the structure of the IG-tree, unpack the received messages, and store the unpacked messages to depth i of the IG-tree

• Apply the absent rule to depth i of the IG-tree

~ I G - t r e e

Decision _MakingPhase ] • Compute a common value for strong consensus Step 3: Compute the common value

• Apply the voting function VOTEsc to the root E of the IG-tree

Step 4: Output the common value

• Output the value stored at the root E of the 1G-tree as the comman value Fig. 2. The basic approaches of SCMIX.

(7)

H.-S. Siu et al. /Journal o f ln/ormation Sciences 108 (1998) 157 180 163

As for the data structure used to collect the messages, each fault-free processor maintains a tree structure, called the

information gathering tree

(IG-tree) [8], of depth t + 1. It is organized and labeled as follows. Each vertex of the IG-tree is labeled by a non-repeating sequence, :~, of processor identifiers. The root of the IG-tree is labeled by the empty sequence E and the parent of a vertex labeled by sequence c~p is labeled ~. Because all sequences are nonrepeating~ the root o f the IG-tree has n children, each child of the root has n - 1 children, and the vertices at depth t each has n - t leaves as it children. If a vertex is labeled c~p, it is said to correspond to processor p. The root of the IG-tree corresponds to no processor. Note that each level of an IG-tree contains a round of received messages. N o repeating processor identifier can avoid the recursive influences made by a faulty processor.

S C M I X is illustrated by an example that shows the complete procedure for executing S C M I X on the fault-free processor A in the network shown in Fig. 1. The same procedure is executed by each fault-free processor. Suppose that the d o r m a n t faulty processor F does not send any messages during the entire execution o f the protocol and m = 3. When S C M I X is finished, all the fault-free processors reach a c o m m o n value '1' that is the initial value of processor A, i.e., the

Agreement

and

Strong Validity

condition of the SC problem are both satisfied. Hence, S C M I X does solve the SC problem with mixed failure types. The procedure of S C M I X on processor A is presented as follows.

Message exchange phase

Initialization

• Create an IG-tree with root E

In the initial step, processor A creates an IG-tree with roQt E.

Step

l: The first message exchange round • Use FTVC to broadcast its initial value

Processor A uses F T V C to broadcast its initial value to all processors. • Receive the messages sent by all processors and store the received messages

to the depth 1 of the IG-tree

Although the message passing may be influenced by the faulty processors F, H, and I, processor A receives the messages sent by all processors, and stores the messages to depth 1 of its IG-tree as shown in Fig. 3(a). That means FTVC can remove the influences of the faulty relay processors. For example, A stores the value '1' received from processor B in the vertex B, denoted as val(B) = l, at depth 1 o f IG-tree as shown in Fig. 3(a).

• Apply the absent rule A R s c to depth 1 o f the IG-tree

To remove the influence of a d o r m a n t faulty processor, processor A then applies

ARsc

to depth 1 of its IG-tree as shown in Fig. 3(b). Note that the value stored as vertex F in the IG-tree is Q3 as shown in Fig. 3(a). That means processor A does not receive any messages from F and uses the value ~ to represent the message from F. After the absent rule is applied, processor A will use the value ,~/to replace the messages received from F as shown in Fig. 3(b). The

(8)

164 H.-S. Siu et al. / J o u r n a l o f I n f o r m a t i o n Sciences 108 ( 1 9 9 8 ) 1 5 ~ 1 8 0 Depth 0 Depth 1 O O A v.~(A )=1 E O B val (B)=I O c val ( C ) =1 O D v~ (D)=I O E val (E)=2 O F val (F)=¢ ~ O G val (G)=2 O H va] (H)=o O : voJ(: )=o : no message received

(a)

D e p t h 0 D e p t h 1 val ( A)= 1 o o E a val( B)= l o B va]( C)= ] O ~ - C v~l(D)= I O - - - - " D val( E ) = 2 O ~ • E v a l ( F ) = ,4 o F val( G ) = 2 O - - - • G val(/-/) = o O - - - H val( 1)=o ~ O 1 D e p t h 2 ~ s c O A B val (AB) = I O A C val(AC)= I O A D val (AD) = l O A E val(AE)=l O A F val (AF) =,,¢ O A G val(AG) = I O A H val(AH) =0 O A I val( A I ) = 0 Depth 0 O E O FA val ( FA ) = ~ 1 O F B va](FB)= ~ 4 1 O F C va](FC)= ~,¢1 O F D v)I(FD)= .~4/ O F E val(FE)= ;~¢/ O F G val(FG)= ~041 O F H val (F/./) = 0 o F I val( FI )=0 D e p t h 1 O A v~(A )=1 O B va] (B)=l O c v~ (c)=I O D va1(D)=l 0 E val (E)=2 O F val (F)-~ O G val (G)=2 O H val (H)=o 0 1 v~(l )=o

(b)

D e p t h 3

~

ABC vaI(ABC)= 1 ABD vaI(ABD)= I ABE val ( A B E ) = I

ABF val (ABF) = .,# ABG val (ABG) = I ABH val (ABH ) = 0

ABI val (ABI) = 0

O FAB val (FaB )= ~ ¢ 2 O F A C vaI(FAC) = ~ 4 2 O F A D vaI(FAD) = ~'¢2 O F A E val(FAE )= 7(~'¢ 2 O F A G vaI(FAG) = ~ ¢ 2 ' "(~FAH val (FA//) = 0

O F B I val(FBl ) = 0

01.4 val{iA )=o ~ / A B val(/Ao )=o

O I B val(lB )=O i ' O I A C val(lAC)=O

O I C val (/C ) = 0 O I A D val(1AD)=O

O l D val ( I O ) = 0 [ OlAE val(IAE)=O

O I E v a l ( l E ) - O O l A F val (/AF ) = ,.¢

O I F val ( IF ) = A [ O I A G val(1AG)=O

O I G val ( IG )=0 J ~ I A H val(lAH)=l

0114 val ( tH ) - I

(c)

Fig. 3. S C M I X solves t h e S C p r o b l e m f o r p r o c e s s o r A in the n e t w o r k m o d e l s h o w n in Fig. 1: (a) T h e first r o u n d I G - t r e e . (b) A f t e r t h e a b s e n t r u l e is a p p l i e d . (c) A f t e r t h e m e s s a g e e x c h a n g e p h a s e (d) A p p l y i n g t h e V O T E s o

(9)

H.-S. Siu et al. / J o u r n a l o f I n f o r m a t i o n Sciences 108 (1998) 1 5 ~ 1 8 0 165 Depth 0 , 0 *-Ty--~ E D ~ t h 1 D e p t h 2 Death 3 O A C "" I 1 0 A B D wJ(ABD)=I A

l

O A D l I O A B E ~ val(ABE)=l O A E [ ,,¢ O A B F ~ vaI(ABF)=.,¢

l ' l i

OAF l ,

O A G

[ 0 0 A B H

OABG ~

~,(AB~)= .

vaI(ABH)=O

O A H L 0 0 A B I vnI( ABI )=O

- O A 1 ~ o ~ B , o IDO

2 0

E , ¢ O F < c4 I FB ~" I;~'¢20FAC " cl val(FAC)= ~ ¢ 2 ~41 0 FC I ~ ! 2 0 F A D v~I ( FAD ) = ~ ¢ 2 ~ 4 1 0 FD 1 ~ 1 2 0 F A E ~ v~(FAE)= ~ ¢ 2 ~ ¢ I O F E 1 ~ 4 2 0 F A G ~ v~(FAG)= ~ ¢ 2

I OFG

|0

OFAH ~

val(rAH)= 0

0 FH L 0 o FAI vM ( FAI ) = 0

O F I

0 0 9 c3

H

m 0 0 1B c3 0 0 1AC val ( IAC ) = 0

0 0 I C O I A D ~ vaJ(lXD )=O

o o l ~ o _{0 IAE} ~ _{vaJ ( IAE}~.o

0 0 I E O l A F ~ val(lXF)=,¢ A O I F O I A G ~ ~ ( I A G ) = 0 0 o I G O I A H v'A (/AH) = ~ I O I H (d) Fig. 3 (continued)

value d will be relayed to all receivers as value ~ / 1 and the value ~ ¢ j will be relayed to all receivers as value N d j + l (the meaning of value ~ and . ~ ¢ / w i l l be described later), where 1 ~< j ~< t.

S t e p 2: the second message exchange round

• Pack the values stored in depth 1 of the IG-tree to the message M

Processor A packs the values stored in depth 1 of the IG-tree to the message M, namely (1,1,1,1,2, ~ 4 1 , 2,0,0). Note that the value d is replaced by . ~ d ] by using the absent rule.

• Use FTVC to broadcast the message M

Using FTVC, processor A broadcasts the massage M to all processors. • According to the structure of the IG-tree, unpack the received messages, and

(10)

166 H.-S. Siu et al. / Journal of Information Sciences 108 (1998) 157-180

• Processor A receives the messages sent by all processors, unpacks these messages, and stores the unpacked messages to the vertices at depth 2 of its IG-tree. • Apply the absent rule to depth 2 o f the IG-tree.

T o remove the influence of the detected faulty processors, processor A applies the absent rule to depth 2 of its IG-tree.

In the third round, processor A executes the same procedure as in the second round did. Processor A broadcast, using F T V C , the messages stored at depth 2 o f its IG-tree, receives the messages sent by all processors, and stores the received messages to depth 3 o f its IG-tree. After the message exchange phase, the messages collected in A's IG-tree is presented in Fig. 3(c) (due to space limitations, only a part of the IG-tree is shown). F o r example, the val(ABC) (val(ABC) = 1)

represents that the message was first sent by processor A to processor B, and then processor B relayed this message to processor C. Finally, A received this message from C and stored it in vertex ABC as shown in depth 3 o f the IG-tree in

Fig. 3(c). In this example, 505 (1 + 9 • 8 • 7) vertices are created in the IG-tree.

Decision making phase

Step 3: C o m p u t e the c o m m o n value

• Apply the voting function V O T E s c to the root E of the IG-tree.

• Processor A applies the voting function V O T E s c to its IG-tree to c o m p u t e the c o m m o n value for strong consensus. N o t e the value ~ ' (excluding the last round) is not counted at the time the VOTEs¢ is taken. The value '1' is stored in the root E after V O T E s c is applied as shown in Fig. 3(d). • Output the value stored at the root E of the IG-tree as the c o m m o n value.

Finally, processor A selects the value '1' stored in the root of the IG-tree as the c o m m o n value.

The detailed descriptions of the above steps for removing the influences of the multiple faulty processors are presented below.

3.1. Removing the influence o f a faulty relay

To remove the influence of a faulty relay, a protocol, called FTVC, provides

a fault-tolerant virtual channel on the physical links in a nonfully connected net-

work. To illustrate the concept o f FTVC, we first consider the case o f a single sender S and a single receiver R. S uses F T V C to send its message ms to R. An- alyzing this exchange will enable us to p o r t r a y the general situation in which every sender sends a message to every receiver. F o r example, when F T V C is applied to the network model shown in Fig. 4(a), receiver R can receive the fault-free message sent by sender S; while in the case o f Fig. 4(b), R can posi- tively detect that S did not send a message to it even if it does receive the false message sent by the arbitrary faulty relay.

As the Menger theorem [15] states, at least c disjoint paths exist between S and R if the connectivity o f the network is c. Hence S is able to send e copies of its messages through c disjoint paths to R. The c disjoint paths between S and R

(11)

H.-S. Siu et al. / Journal of lnformation Sciences 108 (1998) 157-180

167 0 Fault-free processor

S e n d e r ~ R e c e i v e r

(a)

I~ Dormant faulty processor • Arbitrary faulty processor

Sender ~ R e R i v e r

S

(b)

Fig. 4. An example of the function of FTVC.

can be predefined as stated in [9,7], and the path information is distributed to the relaying processors between S and R. A detailed description of the path information distribution is presented in Appendix A. According to the path information, a relay processor receives the message (R, S, ms) from a predefined immediate predecessor and sends the message to a predefined immediate successor. Since the network is synchronous, the predefined immediate successor P of S should have the message sent by S after a predefined time interval [13]; otherwise, it knows that S is faulty. When P receives no message from S, it will relay the symbol O (O ~ V) to its immediate successor along the predefined disjoint path between S and R to reflect the faulty status. These are the concepts of the transfer rules obeyed by each relay processor. The formal definition of the transfer rules is presented in Appendix B.

According to the transfer rules, an arbitrary faulty relay can modify at most one message, and a dormant faulty relay can drop at most one message. In the worst case, R will receive c - Pa copies of messages sent by S. Applying the majority vote MAJ to these messages, R can determine what message was sent by S if the constraint on connectivity, namely c > 2Pa + Pd, holds. MAJ has three possible outcomes:

Case 1: ms, if S is fault-free.

Case 2: O, if S does not send the message to R.

Case 3: Arbitrary value, if S has an arbitrary fault.

In case 1, R receives the message ms sent by the fault-free sender S when MAJ is applied to the receiver messages. If S does not send the message to R (case 2), R will use O as the message sent by S because the major of c - Pa copies of messages is O. The third outcome of MAJ implies that the received message is not only contaminated by a faulty relay, but is also contaminated by an arbitrary faulty sender. F T V C is unable to remove the influence in such a case; hence such an outcome for MAJ shall be an arbitrary value.

3.2. Removing the influence of a dormant faulty sender

Each fault-flee sender must send its messages to all receivers in each round of the message exchange phase. As mentioned in Section 3.1, a receiver can

(12)

therefore detect that a sender is faulty if no message is received from the sender (the output o f M A J o f F T V C is •). A fault-free receiver R can detect that a sender S is faulty if no message is received from S. If R receives no message from S at the rth round, all messages received from S (directly) at the rth round and any subsequent rounds will be replaced by the value d ; and this value will be relayed to the other receivers as value ~ d l . In each subsequent round, the value ~ 4 j will be relayed to the other receivers as value N d j + l ( ~ ' and ~ ¢ / j ~ V), where 1 ~<j ~< t.

Semantically, the value ~4 is represented as an absentee vote, and sender S is

treated as an absentee. Hence, the voting ticket of S is ignored during the de-

cision making phase. The value ~ ' j will be interpreted as the j t h time an ab- sent vote is reported. Receiver R will report to all other receivers that S is an

absentee, and then the faulty sender S will be forced out of the game of agreement; thus, S has no influence on the others when the voting function VOTEsc is taken in the decision making phase. The approach is called the Absent Rule (ARsc) and it can be formalized as follows.

"ARsc: When receiver R receives no message directly from sender S in the rth round, then all messages received from S in the rth and any subsequent rounds will be replaced by value ~¢, and this value will be relayed to the other receivers (if any) as value ~ ¢ 1 . In each subsequent round, the value ~ 4 j will be relayed to the other receivers as value ~ d j + 1 , where

l<~j<~t."

3.3. Removing the influence of an arbitrary faulty sender

After the message exchange phase, the messages collected in a fault-free receiver's IG-tree are free from the influence of faulty relays and the dormant faulty senders. However, the messages may still be contaminated by arbitrary faulty senders. In order to reach an agreement, such influences must be removed in the decision making phase.

Conventionally, the influence of arbitrary faulty senders is removed by means o f a recursive majority vote when only arbitrary faults are considered [3,6]. The main concept used in these protocols is majority because a majority

of the processors in the network are assumed to be fault-free. However, this concept is inappropriate for mixed failure types because a majority o f the processors may also fail. Using Fig. 5 as an example, there are four faulty proces-

sors (three d o r m a n t faulty processors and one arbitrary faulty processor; i.e., Pd = 3 and Pa = 1), a greater number than the number o f fault-free processors (three). Suppose that m = 3 (the number o f values o f V). The bound on the constraints on failures, namely n > max{mPa + P d, 3Pa +Pd}, holds because 7 > 3 • 1 + 3. However, the fault-free processors, A , B and C, are unable to

(13)

H.-S. Siu et al. / Journal o f Information Sciences 108 (1998) 15~180 169

v c = l C D

Fig. 5. A fully connected network with mixed failure types (n = 7). O Fault-free processor • Dormant fault O Arbitrary fault

reach agreement when a conventional majority vote is taken. A detailed description of this example is presented in Appendix C. Thus, a new voting scheme V O T E s c should be proposed to solve the SC problem with mixed failure types.

By the constraint on the number of processors required, namely n > max{mP~ + Pd, 3Pa + Pa}, it can tolerate k(-- max{m, 3}) more dormant faulty senders because k(Pa - 1) + (Pd + k) = kPa +Pd, where Pa ~> 1 if the net- work eliminates one arbitrary faulty sender. This phenomenon can be used by VOTEsc to remove the influence o f an arbitrary faulty sender. The basic concept o f V O T E s c is as follows. Let P be a fault-free processor and a be the vertex at depth i o f P's IG-tree, 1 <.i<~t. If P detects that k ( t - i + 1)+ [(n - 1)modk] children of a have value sJ, it uses the original value stored at o, namely val(a), as the output o f VOTEsc to remove the influence of the arbitrary faulty sender as in the above discussion; otherwise, it uses the most c o m m o n value of children o f a as the output of VOTEsc.

V O T E s c is always correct if vertex a corresponds to a fault-free or a dormant faulty sender since each fault-free receiver has the same message sent by the sender. On the other hand, the output of VOTEsc may be contaminated by Q after our approach is applied if vertex a corresponds to an arbitrary faulty sender Q (Q cooperates with other arbitrary faulty senders to prevent the fault-free processors from achieving a c o m m o n value). However, the influence of Q can still be removed during upper level voting if n > max{mPa +Pd,3Pa +Pd}. Appendix D presents the formal definition of VOTEsc.

4. Analysis and evaluation

S C M I X removes the influence o f processors subjected to various types of failures to enable all fault-free processors to reach a c o m m o n value to solve

(14)

170 H.-S. Siu et al. I Journal of Information Sciences 108 (1998) 157-180

the SC problem. Since a fault-free processor cannot know the fault status of the other processors, t + 1 rounds are required to reach an agreement. SCMIX uses the approaches stated in Section 3 to remove the multiple faulty components, and these approaches can be presented as the following primitives: • FTVC_SEND(m, Q): send the message m to processor Q by using FTVC.

• F T V C _ R E C E I V E ( m , Q): receive the message m from processor Q by using FTVC.

• A B S E N T _ R U L E ( r ) : apply the absent rule ARs¢ to depth r o f the IG-tree.

• VOTEsc(s): apply the function VOTE to vertex s.

Some additional primitives should be presented to ensure a thorough solu- tion:

• C R E A T E ( ~ Q , v): create the vertex ~Q, and set val(aQ) = v. • P A C K ( r , m): fold depth r o f the IG-tree to the message m.

• U N P A C K ( m , r): according the structure o f depth r o f the IG-tree, unfold the message.

• O U T P U T ( v ) : output the value v.

Using the above primitives, the formal procedure o f S C M I X is stated as follows.

1. Protocol S C M I X (for each processor P) 2. begin

3. /* Initialization */

4. C R E A T E (E, N U L L ) ;

5. /* M e s s a g e E x c h a n g e Phase *l

6. /* The first round */ 7. for Q E N do

8. FTVC_SEND(vp, O);

9. for Q E N do 10. begin

11. FTVC_RECEIVE(vq, Q);/* Vq is the initial value o f processor Q

*/ 12.

CREATE(Q,

Vq);

13. end; 14. A B S E N T _ R U L E ( l ) ; 15. /* round 2 to round t + 1 */ 16. f o r r = 2 t o t + l do 17. begin 18. P A C K ( r - 1, m); 19. for Q c N do 20. FTVC_SEND(m, Q); 21. for Q E N do 22. begin 23. F T V C _ R E C E I V E ( m , Q); 24. U N P A C K ( m , r);

(15)

H.-S. Siu et al. / Journal oJIn]brmation Sciences 108 (1998) 1 5 ~ 1 8 0 171 25. for a E m do 26. begin 27. v = val(o-); 28. C R E A T E ( a Q , v) 29. end 30. end; 31. A B S E N T _ R U L E ( r ) 32. end;

33. /* Decision Making Phase */

34. O U T P U T ( V O T E s c ( E ) ) 35. end.

4.1. Correctness

The goal of S C M I X is to enable all fault-free processors to reach a c o m m o n value to solve the SC problem; thus, the correctness of S C M I X can be proven f r o m the fact that the c o m m o n value of each fault-free processor satisfies the conditions of Agreement and Strong Validity. To reach a c o m m o n agreement, each fault-free processor must be insulated f r o m contamination by faulty processors. As stated in Section 3, S C M I X uses F T V C to remove the influences of faulty relays during each round of message exchange. To remove the influence of d o r m a n t faulty senders, S C M I X applies A R s c to be received messages after each round. Finally, it applies V O T E s c to the messages, received in the message exchange phase, to remove the influence of arbitrary faulty senders. When all contamination by faulty processors has been removed, an agreement is reached. This is the basic concept for proving the correctness of SCMIX.

Since S C M I X uses the IG-tree (based on the oral message model) to collect the messages as presented in Section 3, some concepts and terminology used by [8] are presented here. A vertex cr is called common, if each fault-free processor

computes a same value for ~r. In other words, a c o m m o n value for solving the SC p r o b l e m can be reached if the root of each fault-free processor's IG-tree is c o m m o n . To prove the root is c o m m o n , the term commonJJ'ontier is defined as

follows. I f every root-to-leaf path in an IG-tree contains a c o m m o n vertex, then the collection of c o m m o n vertices forms a c o m m o n frontier. By theJi'on- tier lemma in [8], the fault-free processor's IG-tree root is c o m m o n if the com-

m o n frontier exists on each fault-free processor's IG-tree. Hence, a c o m m o n agreement can be reached a m o n g the fault-free processors if a c o m m o n frontier does exist in each fault-free processor's IG-tree.

To prove the correctness of FTVC, the output of M A J shall be proven free from the influence of faulty relays. Thus, we shall prove that a fault-free receiver can receive a message sent by a fault-free sender, or can detect that the sender did not send a message to it. Accordingly, we first define the consistent vertex

(16)

Consistent vertex, vertex ~(~ = •i) in a fault-free receiver's IG-tree is a consistent vertex if sender i is fault-free or d o r m a n t fault. By the behavior of i, all fault-free receivers receive the identical message sent by i. Although a processor does not know which vertex is consistent, the consistent vertices do exist since some processors in the network are fault-free or d o r m a n t fault.

According to the definition of a consistent vertex, all fault-free receivers should receive an identical message sent by a sender if the influence of a faulty relay is removed. Therefore, the consistent vertices of an IG-tree are c o m m o n . Since the m a x i m u m n u m b e r of arbitrary faulty processors is Pa( ~< t), each root- to-leaf p a t h has at least one consistent vertex. Therefore, the c o m m o n frontier does exist in the IG-tree. Thus the root is a c o m m o n vertex due to the existence of a c o m m o n frontier. A c o m m o n agreement is reached a m o n g all fault-free processors; thus, the SC p r o b l e m with mixed failure types is solved.

To summarize the semantics for the following lemmas and theorems, Lem- m a 1 indicates that a fault-free receiver can receive the message sent by a fault- free sender by using FTVC. L e m m a 2 shows that a fault-free receiver can detect that the sender did not send a message to it by using FTVC. T h e o r e m 2 proves the correctness of FTVC. L e m m a 3 states that all consistent vertices in an IG-tree are c o m m o n after the voting function V O T E s c is applied to an IG-tree. By the definition o f a c o m m o n frontier, L e m m a 4 shows the existence o f a com- m o n frontier in an IG-tree. Based on the frontier lemma [8], T h e o r e m 3 shows that the root of a fault-free processor's IG-tree is c o m m o n . Finally, T h e o r e m 4 proves that the S C M I X is correct under the constraints on failures stated in Section 2.

Lemma 1. Using F T V C , f a u l t - f r e e receiver R can receive message m sent by f a u l t - f r e e sender S if c > 2Pa + Pd.

Proof. Using FTVC, fault-free sender S sends c copies of m to R through c disjoint paths. According to the p a t h information and transfer rules presented in Section 3.1, each d o r m a n t faulty relay can d r o p at most one message. In the worst case, R receives at least c - Pd messages sent by S. By hypothesis, we k n o w that c - Pd > 2Pa. Therefore, R can decide the message sent by S when the majority vote M A J is applied to these c - Pd messages. []

L e m m a 2. Using F T V C , f a u l t - f r e e receiver R can detect that sender S did not send a message to it i f c > 2P~ + Pd.

Proof. When S does not send a message to R, each fault-free immediate successor o f S (along the disjoint paths between S and R) will relay the symbol ~3 to R. In the worst case, R receives at least c - (Pa - 1) messages of value Q. By hypothesis, we k n o w that c - (Pd - 1) > 2Pa. Hence, the output

(17)

H.-S. Siu et al. / Journal o f lnformation Sciences 108 (1998) 157-180 173

of the majority vote M A J is @, and R notices that S did not send a message to it. []

Theorem 2. F T V C does remove the influence o f a f a u l t y relay if c > 2P~ + Pd.

Proof. By L e m m a s 1 and 2, the message received by R is free f r o m the influence of a faulty relay; thus, the theorem is proved. []

L e m m a 3. All consistent vertices are common after V O T E s c is" applied to an IG-tree if n > max{mPa + Pd, 3Pa + Pd}.

Proof. Suppose that k = max{m, 3}. Each consistent vertex a of an IG-tree can be proven to be c o m m o n in the following cases.

Case 1 (o- is a leaf). Fault-free and d o r m a n t faulty senders always send identical messages to all receivers. Hence, a is c o m m o n after V O T E s c is applied to o-.

Case 2 (a is at depth i, 1 ~< i ~< t).

Case 2.1. a has at least k*(t - i + 1) + [(n - 1)modk] children, each of which has a stored value ~¢. By condition c2 of V O T E s c stated in Appendix D, the original value stored at a, namely val(a), is used as the output of VOTEsc; thus, a is c o m m o n .

Case 2.2. a has j ( < k*(t - i + 1) + [(n - 1)modk]) children, each of which has a stored value s~¢. According to the structure o f the IG-tree, a has n - i children. By hypothesis, we have n - P ~ - Pd > 2P~. Since t ~> P~, we have n - i ~> n - t I> n - Pa; moreover, j ~< Pd, we can write n -- i -- j > 2Pa. Hence, by condition c3, c4 or c5 of V O T E s c stated in Appendix D, a is c o m m o n . [] Lemma 4. A common J~ontier does exist in the IG-tree.

Proof. By definition, an IG-tree is a tree o f depth t + 1. Since the m a x i m u m n u m b e r o f arbitrary faulty processors is Pa( ~< t), each root-to-leaf path has at least one consistent vertex. By L e m m a 3, a consistent vertex is c o m m o n . Therefore, a c o m m o n frontier does exist in an IG-tree. []

Theorem 3. The root o f a fault-free processor's IG-tree is common.

Proof. Let k = max{m, 3}. According to the structure of the IG-tree, root E has n children. I f no arbitrary faulty processor exists in the network, namely P~ = 0, the message passing is influenced by d o r m a n t faulty processors only. These influences are removed by using ARsc; therefore, E is c o m m o n . Generally, suppose that some senders in the network are subjected to arbitrary faults, namely P~ > 0. By hypothesis, we have,

n - P ~ - P ~ > ( k - 1)P... Since P~ > 0, we can write,

(18)

174 H.-S. Siu et al. /Journal of Information Sciences 108 (1998) 157-180 n > n - Pa, =~ n - Pd > n - Pa - Pd > (k - 1 ) P a ,

n > (k - 1)P~ + Pd.

Since (k - 1) > 2 (k = max{m, 3}), by majority concept, E is c o m m o n after VOTEsc is applied. []

Theorem 4. S C M I X does solve the S C problem with m i x e d f a i l u r e types i f

n > max{mPa + Pd, 3Pa + P d } a n d c > 2Pa +Pd-

Proof. By Theorem 3, the agreement condition is satisfied. That SCMIX satisfies the strong validity condition is shown as follows. Let VFF c_ V be the initial values o f the fault-free processors. If VVF = V, then strong validity is trivially satisfied. On the other hand, if lIFE ¢ V, then there are at most

k - l ( k = m a x { m , 3 } ) values among the fault-free processors. Since

n > kP.d + Pd, there are at least n - Pa - Pd > (k - 1)Pa fault-free processors. Thus, for at least one value in VFF, there are more than Pa fault-free processors with that initial value.

When a fault-free processor applies VOTEsc to the root E o f its IG-tree, it first applies VOTEs¢ to the n children o f E. By Lemma 3, each depth 1 vertex that corresponds to a fault-free processor outputs the original value stored at that vertex, which is the initial value of the corresponding processor. By the above observation, some value in VFF is output by VOTEsc for more than Pa vertices at depth 1 because n > kPa + Pd and the influence of the dormant faults is removed by ARsc. Since all fault-free processors agree on this value, strong validity is satisfied. Thus, the theorem is proven. []

4.2. C o m p l e x i t y

The SC problem with mixed failure types is solved by SCMIX that is based on the oral message model [3]. In this model, all fault-free processors should exchange enough messages in order to reach a c o m m o n value for the SC problem. Thus, the time for message passing dominates the entire execution o f SCMIX and the complexity analysis o f SCMIX is focused on message complex- ity. The complexity o f S C M I X is defined in terms of: (1) the number of rounds required, (2) the number o f messages required, and (3) the number o f faulty components allowed, In this section, we prove that SCMIX is optimal. It uses the minimum number of rounds and messages, and tolerates a maximum number o f faulty components.

To solve the SC problem with mixed failure types in a generalized network, Theorem 5 shows that SCMIX requires t + 1 rounds and (t + 1)cn 2 messages. Theorem 6 shows that SCMIX can solve the problem by using a minimum number o f rounds and messages, and Theorem 7 proves that SCMIX can tolerate a maximum number o f allowable faulty processors.

(19)

H.-S. Siu et al. I Journal o f lnformation Sciences 108 (1998) 157-180 175

Theorem 5. S C M I X requires t + 1 rounds and (t + 1)cn 2 messages to solve the SC problem with mixed failure types tf n > max{mPa +Pd,3Pa + P d } and c > 2Pa + Pd.

Proof. The message passing is required in the message exchange phase only; thus, S C M I X requires t + 1 rounds and this number is the minimum as shown by Fischer and Lynch [16]. In each round, a processor packs the values stored at the last level o f the IG-tree to a message, and uses FTVC (c copies o f the message are sent to a processor) to broadcast the message to all processors. Hence, there are cn 2 messages generated in each message exchange round.

Therefore, the total number o f messages required by SCMIX is (t + 1)cn 2. By Theorem 4, S C M I X can enable all fault-free processors to reach an agreement. Hence, the theorem is proven. []

Theorem 6. S C M I X solves the SC problem with mixed failure types by using a minimum number of rounds and messages.

Proof. If the system's fault status is unknown, then t + 1 rounds are proven to be the lower bound on message passing for reaching an agreement [16]. By Theorem 5, at least (t + 1)cn 2 messages are required to reach a c o m m o n value. Hence the theorem is proven. []

Theorem 7. The total number of allowable faulty processors by SCMIX, namely

Pa +Pd, is maximum t f n > max{mPa + Pd, 3Pa + P d } a n d c > 2Pa +Pd.

Proof. As stated in Section 1, a protocol for the SC problem with mixed failure types does exist if the constraints on failures, namely n > max{mPa + Pd, 3Pa + Pd} and c > 2Pa + Pd, hold. Otherwise, an agreement cannot be reached. If Pa + Pd is not the maximum number o f allowable faulty processors, then other constraints on failures should exist, namely n ~< max{mPa + Pd, 3Pa + Pa} or c ~< 2Pa + Pa- However, this stands in con- tradiction with Theorem 1. Thus, the theorem is proven. []

5. Conclusion

S C M I X is a protocol for solving the SC problem with mixed failure types in a network proven in Theorem 4. We have shown the conditions for an agreement, namely the number o f processors required and the connectivity required as stated in Theorem 1. Since S C M I X is based on the general assumptions of mixed failure types and generalized network topology, the protocol o f [6] is a special case o f S C M I X as shown in Table 1. F r o m the previous discussion, we can present the following results.

(20)

176 H.-S. Siu et al. / Journal o f Information Sciences 108 (1998) 15~180

1. In solving the SC problem for a nonfully connected network, SCMIX is optimal in terms of the number o f rounds required, the number o f messages required, and the number o f faulty components allowable as proven in The- orems 5-7.

2. S C M I X does not require a priori knowledge o f processor fault status. 3. S C M I X is designed to solve the SC problem with the most general assump-

tions on processors as shown in Table 1.

4. The F T V C protocol provides a reliable communication mechanism for removing the influence of faulty relays.

Since SCMIX was originally designed for handling processor faults, it cannot tolerate the maximum number o f allowable faulty components when processors and links can both fail [14]. Our future work will focus on improving SCMIX to where it can solve the SC problem with mixed failure types in both processors and links.

Appendix A. Path information

The path information about each sender and receiver pair is distributed to the reply processors between sender and receiver. Each relay processor P maintains tuple

(receiver, sender, predecessor, successor)

path information such that the path

(predecessor, P, successor)

constitute a subpath o f the path from the

sender

to the

receiver.

The sender and receiver also need the c neighbors along a prescribed set o f processor-disjoint paths. The sender will send c copies o f the message formatted

(receiver, sender, message)

along the c predefined paths to the receiver during each round of message passing.

Appendix B. Transfer rules

The transfer rules obeyed by a relay processor P are defined as follows: R 1: According to the path information described above, P only relays messages to its predefined immediate successor if it receives them from its predefined immediate predecessor.

R2: Let P be a predefined immediate successor of the sender S. If after time

Tk + Tsp, P

has not received a message from S, then P will relay the symbol O to its predefined immediate successor, where Tk is the starting time o f the kth round o f the message exchange phase, and T~p is the upper bound on communication time between S and P.

Semantically, R 1 indicates that a fault-free relay receives messages

only

from its predefined immediate predecessor and sends messages only to its predefined immediate successor. R2 is proposed to help R to determine the status of S.

(21)

Table 1 The constraints on failures for the SC protocols Assumption Arbitrary fault Dormant fault Result Fully connected Nonfully connected Fully connected Nonfully (protocol) network network network connected network Mixed fault Fully connected network Nonfully connected network Neiger [6] n > max{mP~, 3P~} NA SCMIX n > max{mP~, 3P~} n > max{mP~, 3P~} c > 2P~ n > max{mPd,3Pd} NA n > Pa n > Pd c >Pd n > max{m(& + pd), 3(& + &)} n > max{mPa + Pu, 3& + &} NA n > max{m& + pd, 3& +&}, c > 2Pa + Pd NA: not applicable. --d

(22)

178 H.-S. Siu et al. / Journal of lnformation Sciences 108 (1998) 157-180

Since the network is synchronous, the starting time of each round and the upper bound on each link's communication time can be predefined by each fault-free processor [13]. At T w after the starting time of the kth round, namely T~ + Tsp, the predefined immediate successor P of S should have the message sent by S; otherwise, it knows that S is faulty. When P receives no message from S, it relays the symbol • to its predefined immediate successor to reflect the fault status of S. The properties of path information and transfer rules can be found in current network path components such as ATM-based networks [18].

Appendix C. Conventional majority vote for mixed failure types

Fig. 6 shows that an agreement cannot be reached in the network shown in Fig. 5 when conventional majority voting is used. When conventional majority voting is applied to the internal vertex AB shown in Fig. 6, the output of the voting is still contaminated by vertex A B G that corresponds to the arbitrary faulty sender G (it sends different messages to different receivers). Although the vertices correspond to dormant faulty senders, ABD, ABE, and ABF, they are not counted when the vote is taken [10,7,11], and the number of children related to the fault-free senders in vertex AB is not greater than that of the ar- bitrary faulty senders. Hence, the voting result is dominated by the value stored in vertex ABG. Consequently, the fault-free processors, A, B and C, are unable to reach an agreement when conventional majority voting is used.

Appendix D. VOTEsc

VOTEsc only counts the non-sO values (excluding the last level of the IG- tree). Suppose that k = max{m, 3}. For all vertex a at depth i of an IG-tree, the output of VOTEsc depends on the following conditions:

O : Vertex corresponding to a fault-free sender t ~ : Vertex corresponding to a dormant tiaulty sender • : Vertex corresponding to a arbitrary faulty sender

0

ARC

AB • ABD

• ABE

• ABF

ABG

(23)

H.-S. Siu et al. / Journal of lnformation Sciences 108 (1998) 157-180 179 V O T E s c ( g ) begin if a is a l e a f / * c o n d i t i o n cl */ then o u t p u t val(a) else begin

let v be the m o s t c o m m o n value o f VOTEsc(o-p), for all child p o f vertex a stored at depth i o f IG-tree, a n d w be the n u m b e r o f copies o f value v; let x = k * (t - i + 1) + [(n - 1)modk]; if w ~> x and v = ~ / * c o n d i t i o n c2 */ then o u t p u t val(a) else if v :~ ~ d j , where 1 ~< j < t / * c o n d i t i o n c3 */ then o u t p u t v else if v = '~d:~/1 /* c o n d i t i o n c4 */ then o u t p u t value d else if v = ~ 4 j a n d j 7 £ 1 /* condition c5 */ then o u t p u t ~o~'j 1 end end.

N o t e t h a t if there is m o r e t h a n one m o s t c o m m o n value in conditions c3, e4, a n d c5, then the value returned is the one that appears first in a n y predefined ordering o f the values o f V ( V = {vl, v 2 , . . . , Vm}). All fault-free processors use the same ordering. I f the m o s t c o m m o n value is n o t unique, the value returned is the one that a p p e a r s first in a n y fixed e n u m e r a t i o n o f the values in V. C o n - ditions cl a n d c3 are similar to c o n v e n t i o n a l majority voting. T h e other three c o n d i t i o n s are used to handle cases o f mixed failure types. Semantically, conditions c4 a n d c5 are used to r e p o r t the existence o f an absentee. W h e n a m a - jority o f processors r e p o r t that an absentee exists, V O T E s c returns the value , ~ or ~ d i _ l to represent the event. As m e n t i o n e d in Section 3.3, V O T E s c uses val(a) as the o u t p u t if c o n d i t i o n c2 is satisfied.

W h e n V O T E s c is applied to the vertex A B shown in Fig. 5, c o n d i t i o n c2 o f V O T E s c is satisfied a n d the original value stored in vertex A B is used as the o u t p u t o f V O T E s c . Therefore, the influence o f the faulty processor G is re- m o v e d by using V O T E s c a n d all fault-free processors can reach a c o m m o n value '1' after the decision m a k i n g phase.

References

[1] M. Barborak, M. Malek, A. Dahbura, The Consensus Problem in Fault-Tolerant Computing, ACM Computing Surveys 25 (2) (1993) 171 220.

[2] D. Dolev, M.J. Fischer, R. Fowler, N.A. Lynch, H.R. Strong, An efficient algorithm for Byzantine agreement without authentication, Inform. Comput. 52 (1982) 257 274.

(24)

[3] L. Lamport, R. Shostak, M. Pease, The Byzantine Generals Problem, ACM Trans. Prog. Lang. Syst. 4 (3) (1982) 382401.

[4] H.G. Molina, F. Pittelli, S. Davidson, Applications of Byzantine Agreement in Database Systems, ACM Trans. TODS 11 (1) (1986) 2747.

[5] M. Pease, R. Shostak, L. Lamport, Reaching agreement in presence of faults, J. ACM 27 (2) (1980) 228-234.

[6] G. Neiger, Distributed Consensus revisited, Inform, Process. Lett. 49 (1994) 195-201. [7] F.J. Meyer, D.K. Pradhan, Consensus with dual failure modes, IEEE Trans. Parallel Distrib.

Syst. 2 (2) (1991) 214-222.

[8] A. Bar-Noy, D. Dolev, C. Dwork, R. Strong, Shifting gears: Changing algorithms on the fly to expedite Byzantine agreement, Proceedings of the Symposium on Principle of Distributed Computing 1987, pp. 42 51.

[9] D. Dolev, The Byzantine generals strike again, J. Algorithms 3 (1) (1982) 14-30.

[10] P. Lincoln, J. Rushby, A formally verified algorithm for interactive consistency under a hybrid fault model, Proceedings of the Symposium on Fault-Tolerate Computing Toulouse, 1993, pp. 402411.

[11] P. Thambidurai, Y.-K. Park, Interactive Consistency with Multiple failure modes, Proc. Symp. on Reliable Distributed Systems Columbus, OH, 1988, pp. 93-100.

[12] K. Shin, P. Ramanathan, Diagnosis of processors with Byzantine faults in a distributed computing systems, Proceedings of the Symposium on Fault-Tolerate Computing, 1987, pp. 55-60.

[13] M. Fischer, M. Paterson, N. Lynch, Impossibility of distributed consensus with one faulty process, J. ACM 32 (4) (1985) 374-382.

[14] K.Q. Yan, Y.H. Chin, S.C. Wang, Optimal agreement protocol in malicious faulty processors and faulty links, IEEE Trans. on Knowledge and Data Engrg. 4 (3) (1992) 266-280. [15] N. Deo, GRAPH THEORY with Applications to Engineering and Computer Science,

Prentice-Hall, Englewood Cliffs, NJ, 1974.

[16] M. Fischer, N. Lynch, A lower bound for the assure interactive consistency, Inform. Process. Lett. 14 (4) (1982) 183-186.

[17] H.S. Siu, Y.H. Chin, W.P. Yang, A note on consensus on dual failure modes, IEEE Trans. Parallel Distrib. Syst. 3 (1996) 230-255.

[18] R. Handel, M.N. Huber, Integrated Broadband Networks: An Introduction to ATM-based Networks, Addison-Wesley, Reading, MA, 1991.