Efficient algorithms for reliability analysis of distributed computing systems


Min-Sheng Lin a,*, Ming-Sang Chang b, Deng-Jyi Chen b

a Department of Information Management, Tamsui Oxford University College, 32, Chen-Li Rd., Tamsui, Taipei, 25103, Taiwan, ROC

b Institute of Computer Science and Information Engineering, National Chiao-Tung University, Hsin Chu, 30050, Taiwan, ROC

Received 12 March 1998; received in revised form 23 October 1998; accepted 1 January 1999

Abstract

A distributed computing system is modeled as a collection of resources (e.g. processing elements, data files and programs) interconnected via an arbitrary communication network and controlled by a distributed operating system. The distributed program reliability in a distributed computing system is the probability of successful execution of a program that runs on multiple processing elements and needs to retrieve data files from other processing elements. This reliability varies according to (1) the topology of the distributed computing system, (2) the reliability of the communication edges, (3) the distribution of data files and programs among processing elements and (4) the data files required to execute a program. In addition, computing the reliability of distributed computing systems is #P-complete even when the distributed computing system is restricted to a series-parallel, a 2-tree, a tree, or a star structure. This paper presents efficient algorithms for computing the reliability of a distributed program running on other restricted classes of networks. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: Distributed computing systems; Distributed program reliability; Computational complexity; Algorithms

www.elsevier.com/locate/ins

* Corresponding author. E-mail: [email protected]


1. Introduction

A typical distributed computing system (DCS) consists of processing elements (nodes), communication links (links), memory units, data files, and programs [1,2]. These resources are interconnected via a communication network that dictates how information flows between nodes. Programs residing on some nodes can run using data files at other nodes.

A previous investigation [3] introduced distributed program reliability (DPR) to evaluate the reliability of DCSs. Consider a DCS in which the nodes are perfectly reliable but the links can fail, s-independently of each other, with known probabilities. Successfully executing a distributed program depends on the node containing the program, the other nodes that hold required data files, and the links between them being operational. DPR is thus defined as the probability that a program with distributed files can run successfully despite some faults in the links. For example, consider the DCS in Fig. 1, which consists of four nodes (processing elements) and five edges (communication links). This figure also includes the available files at each processing element. Assume that program $f_1$ requires data files $f_2$, $f_3$, and $f_4$ to complete its execution, and that it is running at node $v_1$, which holds data files $f_2$ and $f_3$. Hence, it must access data file $f_4$, which is stored in both nodes $v_2$ and $v_4$. Therefore, the DPR of the DCS in Fig. 1 can be formulated as: DPR = Prob[($v_1$ and $v_2$ are connected) or ($v_1$ and $v_4$ are connected)].

Although several algorithms have been proposed for evaluating DPR [4,5], none satisfy our desire for more efficient algorithms. We hypothesize that either the approaches examined are ineffective, or that no efficient algorithms exist for our reliability problems. Lin and Chen [6] demonstrated, for the first time, that computing DPR is #P-hard even when the distributed computing system is restricted to a series-parallel, a 2-tree, a tree, or a star structure. The class of #P-complete problems was introduced by Valiant [7]. The class #P contains those problems that involve counting the accepting computations for problems in NP; the class of #P-complete problems contains the hardest problems in #P. All known exact algorithms for these problems have exponential time complexity, which makes it unlikely that efficient (polynomial-time) algorithms can be developed for this class of problems. This complexity can be averted by considering only a restricted class of DCS's. In light of the above discussion, this paper presents a polynomially solvable case of the DPR problem for star topologies in which data files are restricted to a certain type of distribution. A linear-time algorithm is also proposed to verify whether or not a star DCS has this restricted class of file distribution. Also proposed herein are two polynomial-time algorithms for computing the DPR of a DCS with a linear and a circular structure, respectively.

2. Assumptions, definitions and notation

Assumptions

· The nodes are perfect.

· The edges are s-independent and either function or fail with known probabilities.

Definitions

· A star DCS $D_s$ has the consecutive file distribution property if and only if its nodes can be linearly ordered such that, for each distinct file $f_d$, the nodes containing file $f_d$ occur consecutively. More formally, a star DCS $D_s$ has the consecutive file distribution property if and only if there exists a permutation $P = [p(1), p(2), \ldots, p(n)]$ of the numbers $\{1, 2, \ldots, n\}$ such that if file $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$.

· A set $C$ of edges of $D_s$ is referred to as a file cut set if and only if the failure of all edges in $C$ implies system failure.

· A file cut set $C$ is referred to as minimal if there is no other file cut set $C'$ such that $C' \subset C$.

· A set $I$ of edges of a linear DCS $D_l$ is referred to as a file path set if and only if the functioning of all edges in $I$ implies that the system functions.

· A file path set $I$ is referred to as minimal if there is no other file path set $I'$ such that $I' \subset I$.

Notation (general)

$D$  a distributed computing system (DCS)
$e_i$  edge $i$ in $D$
$v_i$  node $i$ in $D$
$f_i$  data file $i$
$m$  number of distinct files in $D$
$t$  total number of files in $D$
$A_i$  the set of files available at node $v_i$
$p_i$  probability that edge $e_i$ functions
$q_i$  probability that edge $e_i$ fails; $q_i \equiv 1 - p_i$
$\bar{E}$  complement of event $E$

for star topology

$D_s$  a star DCS with $n + 1$ nodes $\{s, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (s, v_1), e_2 = (s, v_2), \ldots, e_n = (s, v_n)\}$
$P$  $\equiv [p(1), p(2), \ldots, p(n)]$, a permutation of the numbers $\{1, 2, \ldots, n\}$ such that if file $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$
$C_d$  the minimal file cut set for file $f_d$, consisting of all edges $(s, v_i)$ such that node $v_i$ contains file $f_d$, i.e. $C_d = \{(s, v_i) \mid f_d \in A_i\}$. (Without loss of generality, we reorder the minimal file cut sets, if necessary, by their minimal component, i.e. for two distinct minimal file cut sets $C_i$ and $C_j$, $i < j$ if and only if $\min\{k \mid (s, v_{p(k)}) \in C_i\} < \min\{k \mid (s, v_{p(k)}) \in C_j\}$.)
$U$  ordered set of all minimal file cut sets according to their minimal components
$r$  number of minimal file cut sets in $U$
$a_i$  $\equiv \min\{k \mid e_{p(k)} \in C_i\}$, i.e. the index of the minimal component in $C_i$
$b_i$  $\equiv \max\{k \mid e_{p(k)} \in C_i\}$, i.e. the index of the maximal component in $C_i$
$H(i, j)$  $\equiv \{e_{p(i)}, e_{p(i+1)}, \ldots, e_{p(j)}\}$; $1 \le i \le j \le n$ (note that $C_i \equiv H(a_i, b_i)$)
$X(i, j)$  event: all edges in $H(i, j)$ fail
$W_i$  $\equiv \bigcup_{j=1}^{i} X(a_j, b_j)$ (note that the DPR of $D_s$ can be expressed as $1 - \Pr(W_r)$)
$F_i$  event: the star DCS $D_s'$ fails, where $D_s'$ consists of the $i + 1$ nodes $s, v_{p(1)}, v_{p(2)}, \ldots, v_{p(i)}$ and the $i$ edges $e_{p(1)}, e_{p(2)}, \ldots, e_{p(i)}$

for linear topology

$D_l$  a linear DCS with $n + 1$ nodes $\{v_0, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (v_0, v_1), e_2 = (v_1, v_2), \ldots, e_n = (v_{n-1}, v_n)\}$
$I_i$  the minimal file path set which starts at edge $e_i$
$b_i$  $\equiv \max\{k \mid e_k \in I_i\}$, i.e. the index of the maximal component in $I_i$

$Y_i$  event: all edges in $I_i$ function
$U_i$  $\equiv \bigcup_{j=1}^{i} Y_j$ (notably, the DPR of $D_l$ can be expressed as $\Pr(U_n)$)
$R_j$  event: there exists an operating event $Y_i$ between edges $e_1$ and $e_j$

for ring topology

$D_r$  a ring DCS with $n$ nodes $\{v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (v_1, v_2), e_2 = (v_2, v_3), \ldots, e_{n-1} = (v_{n-1}, v_n), e_n = (v_n, v_1)\}$
$D_r * e_i$  the DCS $D_r$ with edge $e_i = (v_i, v_{i+1})$ contracted so that nodes $v_i$ and $v_{i+1}$ are merged into a single node. This newly merged node contains all data files that were previously in nodes $v_i$ and $v_{i+1}$.

3. Efficient algorithms for computing DPR of DCS's

According to a previous investigation [6], computing DPR over a star DCS is #P-complete, implying that polynomial-time algorithms are unlikely to exist for this problem. However, efficient algorithms may exist for computing DPR over some restricted classes.

3.1. Star DCS's with a consecutive file distribution

In this section, we present a polynomial-time algorithm for computing the DPR of a star DCS with a consecutive file distribution. Let $D_s$ be a star DCS that has the consecutive file distribution property. Then the minimal file cut sets can be ordered by their minimal component, i.e. for two distinct minimal file cut sets $C_i$ and $C_j$, $i < j$ if and only if $\min\{k \mid (s, v_{p(k)}) \in C_i\} < \min\{k \mid (s, v_{p(k)}) \in C_j\}$. By definition, $D_s$ fails if and only if at least one event $X(a_i, b_i)$, $1 \le i \le r$, occurs, where $a_i$ and $b_i$ are the indexes of the minimal and maximal components in $C_i$, respectively. Clearly, if $r = 1$, the unreliability of $D_s$ can be easily obtained as $\Pr[W_1] = \Pr[X(a_1, b_1)]$. Next consider the case with $r \ge 2$. The unreliability of $D_s$ with the first $i$ file cut sets is

$$\Pr(W_i) = \Pr\big(W_{i-1} \cup X(a_i, b_i)\big).$$

This expression can be decomposed using conditional probability as

$$\Pr(W_i) = \Pr(W_{i-1}) + \Pr\big(\overline{W}_{i-1} \cap X(a_i, b_i)\big). \tag{1}$$

Consider the event $\overline{W}_{i-1} \cap X(a_i, b_i)$, which implies

· $E_1$: For each $k$, $1 \le k \le i - 1$, at least one edge $e \in H(a_k, b_k) = C_k$ functions

and

· $E_2$: All edges in $H(a_i, b_i) = C_i$ fail.

By event $E_2$, event $E_1$ can be rewritten as

· $E_1'$: For each $k$, $1 \le k \le i - 1$, at least one edge $e \in \{H(a_k, b_k) - H(a_i, b_i)\}$ functions.

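A fundamental difficulty in calculating $\Pr(E_1')$ is that events in $E_1'$ are not, in general, disjoint. However, we can define events $S_j$ that are disjoint by

$$S_j = \{E_1' \text{ occurs and edge } e_{p(j)} \text{ is the last good one}\}, \quad \text{for } a_{i-1} \le j \le a_i - 1.$$

Thus,

$$E_1' \cap E_2 = \bigcup_{j=a_{i-1}}^{a_i - 1} (S_j \cap E_2)$$

and

$$\Pr\big(\overline{W}_{i-1} \cap X(a_i, b_i)\big) = \Pr\Bigg[\bigcup_{j=a_{i-1}}^{a_i - 1} (S_j \cap E_2)\Bigg]. \tag{2}$$

Since the $S_j$ are disjoint events, we have

$$\Pr\Bigg[\bigcup_{j=a_{i-1}}^{a_i - 1} (S_j \cap E_2)\Bigg] = \sum_{j=a_{i-1}}^{a_i - 1} \Pr(S_j \cap E_2). \tag{3}$$

The event $S_j \cap E_2$, $a_{i-1} \le j \le a_i - 1$, can be decomposed into three independent events: {no file cut set fails between edges $e_{p(1)}$ and $e_{p(j-1)}$}, {edge $e_{p(j)}$ functions}, and {all edges between $e_{p(j+1)}$ and $e_{p(b_i)}$ fail}. So

$$\Pr(S_j \cap E_2) = [1 - \Pr(F_{j-1})] \cdot p_{p(j)} \cdot \Pr[X(j+1, b_i)]. \tag{4}$$

Therefore, according to Eqs. (1)–(4), we have

$$\Pr(W_i) = \Pr(W_{i-1}) + \sum_{j=a_{i-1}}^{a_i - 1} [1 - \Pr(F_{j-1})] \cdot p_{p(j)} \cdot \Pr[X(j+1, b_i)].$$

The following theorem can now be easily established.

Theorem 1. For $2 \le i \le r$:

$$\Pr(W_i) = \Pr(W_{i-1}) + \sum_{j=a_{i-1}}^{a_i - 1} [1 - \Pr(F_{j-1})] \cdot p_{p(j)} \cdot \Pr[X(j+1, b_i)], \tag{5}$$

with the boundary conditions: $\Pr(W_1) = \Pr[X(a_1, b_1)]$, and $\Pr(F_k) = 0$ for $0 \le k < b_1$.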

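Before applying Theorem 1, first compute the values of $\Pr[X(j+1, b_i)]$ and $\Pr(F_{j-1})$ for $2 \le i \le r$ and $a_{i-1} \le j \le a_i - 1$. By noting that $a_g < a_h$ whenever $g < h$, the values of $\Pr[X(j+1, b_i)]$ can be computed recursively as

$$\Pr[X(j+1, b_i)] = \begin{cases} \dfrac{1}{q_{p(a_{i-1})}} \cdot \Pr[X(a_{i-1}, b_{i-1})] \cdot \displaystyle\prod_{k=b_{i-1}+1}^{b_i} q_{p(k)} & \text{for } j = a_{i-1}, \\[2ex] \dfrac{1}{q_{p(j)}} \cdot \Pr[X(j, b_i)] & \text{for } a_{i-1} < j \le a_i - 1. \end{cases} \tag{6}$$

By starting with $\Pr[X(a_1, b_1)] = \prod_{k=a_1}^{b_1} q_{p(k)}$, we successively determine

$\Pr[X(a_1+1, b_2)], \Pr[X(a_1+2, b_2)], \ldots, \Pr[X(a_2, b_2)]$;
$\Pr[X(a_2+1, b_3)], \Pr[X(a_2+2, b_3)], \ldots, \Pr[X(a_3, b_3)]$;
$\ldots$
$\Pr[X(a_{r-1}+1, b_r)], \Pr[X(a_{r-1}+2, b_r)], \ldots$, and $\Pr[X(a_r, b_r)]$.

To obtain the values of $\Pr(F_{j-1})$ in Theorem 1, by definition, we have

$$\Pr(F_k) = \begin{cases} 0 & \text{for } 0 \le k \le b_1 - 1, \\ \Pr(W_{i-1}) & \text{for } b_{i-1} \le k \le b_i - 1. \end{cases} \tag{7}$$

Hence, while computing $\Pr(W_i)$ by Theorem 1, we can also obtain $\Pr(F_k)$ for $b_{i-1} \le k \le b_i - 1$.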

Next, the major algorithmic strategies for computing the DPR of star DCS's are outlined. We are given a star DCS $D_s$ and the file distribution $A_i$ of each node. Assuming that $D_s$ has the consecutive file distribution property, let $P$ be a permutation of the numbers $\{1, 2, \ldots, n\}$ such that if file $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$. All file cut sets can be easily enumerated from the $A_i$'s in the following manner: if node $v_i$ contains file $f_d$, then file cut set $C_d$ contains edge $e_i$. Subsequently, the values $a_i$ and $b_i$ of $C_i$ can be determined from the permutation $P$ such that $a_i = \min\{k \mid e_{p(k)} \in C_i\}$ and $b_i = \max\{k \mid e_{p(k)} \in C_i\}$. Then, remove the file cut sets which are not minimal and rearrange the remaining minimal file cut sets according to their $a_i$ values. Finally, use Theorem 1 and Eqs. (6) and (7) to compute the DPR ($= 1 - \Pr[W_r]$). The algorithm is formally described below.

Algorithm Reliability_Star_DCS

Input: A star DCS $D_s$ with $n + 1$ nodes $\{s, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{(s, v_1), (s, v_2), \ldots, (s, v_n)\}$.
A permutation $P = [p(1), p(2), \ldots, p(n)]$ of the numbers $\{1, 2, \ldots, n\}$ such that if file $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$, where $A_i$ represents the set of files available at node $v_i$.

Output: the DPR of $D_s$.

begin

Step 1: // find all file cut sets //
  for $i \leftarrow 1$ to $m$ do $C_i \leftarrow \emptyset$; // initialization step; $m$ is the number of distinct files //
  for $i \leftarrow 1$ to $n$ do
    for each $f_d \in A_i$ do $C_d \leftarrow C_d \cup \{e_i\}$; // for convenience, let $e_i$ denote edge $(s, v_i)$ //

Step 2: // set the values of $a_i$ and $b_i$ for $1 \le i \le m$ //
  for $i \leftarrow 1$ to $m$ do
  begin
    $a_i \leftarrow \min\{k \mid e_{p(k)} \in C_i\}$;
    $b_i \leftarrow \max\{k \mid e_{p(k)} \in C_i\}$;
  end

Step 3: // find all minimal file cut sets //
  $U \leftarrow \emptyset$;
  for $i \leftarrow 1$ to $m$ do $U \leftarrow U \cup \{C_i\}$;
  for $1 \le i, j \le m$, $i \ne j$, do
    if ($a_i \ge a_j$ and $b_i \le b_j$) then remove $C_j$ from $U$; // which implies $C_i \subseteq C_j$ //

Step 4: reorder the minimal file cut sets in $U$ such that, for two distinct minimal file cut sets $C_i$ and $C_j$, $i < j$ if and only if $a_i < a_j$;

Step 5: // compute $\Pr[X(j+1, b_i)]$, for $2 \le i \le r$ and $a_{i-1} \le j \le a_i - 1$, by Eq. (6) //
  $\Pr[X(a_1, b_1)] \leftarrow \prod_{k=a_1}^{b_1} q_{p(k)}$;
  for $i \leftarrow 2$ to $r$ do // $r$ is the number of minimal file cut sets in $U$ //
  begin
    $\Pr[X(a_{i-1}+1, b_i)] \leftarrow (1/q_{p(a_{i-1})}) \cdot \Pr[X(a_{i-1}, b_{i-1})] \cdot \prod_{k=b_{i-1}+1}^{b_i} q_{p(k)}$;
    for $j \leftarrow a_{i-1}+1$ to $a_i - 1$ do $\Pr[X(j+1, b_i)] \leftarrow (1/q_{p(j)}) \cdot \Pr[X(j, b_i)]$;
  end

Step 6: // apply Theorem 1 and Eq. (7) to compute $\Pr(W_i)$ and $\Pr(F_j)$ //
  $\Pr(W_1) \leftarrow \Pr[X(a_1, b_1)]$; // boundary condition //
  for $k \leftarrow 0$ to $b_1 - 1$ do $\Pr(F_k) \leftarrow 0$; // boundary condition //
  for $i \leftarrow 2$ to $r$ do
  begin
    for $k \leftarrow b_{i-1}$ to $b_i - 1$ do $\Pr(F_k) \leftarrow \Pr(W_{i-1})$;
    $\Pr(W_i) \leftarrow \Pr(W_{i-1}) + \sum_{j=a_{i-1}}^{a_i - 1} [1 - \Pr(F_{j-1})] \cdot p_{p(j)} \cdot \Pr[X(j+1, b_i)]$;
  end

Step 7: DPR $\leftarrow 1 - \Pr(W_r)$; Output(DPR);

end Reliability_Star_DCS

Complexity analysis

The time complexity of Algorithm Reliability_Star_DCS is analyzed as follows. Step 1 takes $O(m) + \sum_{i=1}^{n} O(|A_{p(i)}|) = O(m + t) = O(t)$ time (since $m \le t$) to identify all file cut sets, where $t$ denotes the total number of files in $D_s$. Step 2 requires $O(2 \sum_{i=1}^{m} |C_i|) = O(t)$ time to set $a_i$ and $b_i$, $1 \le i \le m$, and step 3 takes $O(m^2)$ time to obtain all minimal file cut sets. Step 4 requires the reordering of all minimal file cut sets in nondecreasing order of the index of their minimal component. This ordering can be executed in $O(r \cdot \log r)$ time using an efficient sorting algorithm, where $r$ denotes the number of minimal file cut sets. In step 5, evaluating $\Pr[X(j+1, b_i)]$ by making use of Eq. (6) requires

$$\begin{cases} O\Big(\sum_{i=2}^{r} (b_i - b_{i-1} + 2)\Big) = O(b_r - b_1 + r) = O(n + r) & \text{for } j = a_{i-1}, \\[1ex] O\Big(\sum_{i=2}^{r} 1\Big) = O(r - 1) = O(r) & \text{for } a_{i-1} < j \le a_i - 1. \end{cases}$$

Hence, the total time to evaluate all $\Pr[X(j+1, b_i)]$ is $O(n + r)$. In step 6, computing all $\Pr(F_k)$ takes $O(\sum_{i=2}^{r} (b_i - b_{i-1})) = O(b_r - b_1) = O(n)$ time and computing all $\Pr(W_i)$ takes $O(\sum_{i=2}^{r} [1 + 3 (a_i - a_{i-1})]) = O(n)$ time. Therefore, the total time in step 6 is $O(n)$. Clearly, step 7 runs in constant time. Finally, the entire algorithm has time complexity $O[t + t + m^2 + r \cdot \log r + (n + r) + n]$. Since $t \le m \cdot n$ and $r \le m$, the complexity of Algorithm Reliability_Star_DCS is $O(m^2 + m \cdot n)$.
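Steps 1 to 4 of the algorithm (enumerating the file cut sets, locating $a_i$ and $b_i$, discarding non-minimal sets and sorting) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name and the dictionary layout are assumptions, and the quadratic minimality filter mirrors the $O(m^2)$ bound of step 3. The file distribution below is the one implied by the file cut sets of the star DCS of Fig. 2, with the permutation $P = [3, 6, 4, 2, 5, 1, 7]$.

```python
def minimal_cut_intervals(files_at, perm):
    """Return the minimal file cut sets as sorted (a, b) position intervals.

    files_at: dict mapping node index i to the set of file labels at v_i.
    perm:     list [p(1), ..., p(n)] with the consecutive file distribution
              property; position k (1-based) holds node p(k).
    """
    pos = {node: k for k, node in enumerate(perm, start=1)}
    # Step 1: file f is cut off when every edge to a node holding f fails.
    cut = {}
    for node, files in files_at.items():
        for f in files:
            cut.setdefault(f, set()).add(pos[node])
    # Step 2: a = minimal position, b = maximal position of each cut set.
    spans = set()
    for f, positions in cut.items():
        a, b = min(positions), max(positions)
        if b - a + 1 != len(positions):
            raise ValueError(f"file {f} is not consecutive under perm")
        spans.add((a, b))
    # Step 3: drop every interval that strictly contains another one.
    minimal = [s for s in spans
               if not any(t != s and t[0] >= s[0] and t[1] <= s[1]
                          for t in spans)]
    # Step 4: order by the minimal component a.
    return sorted(minimal)

# File distribution implied by the file cut sets of the Fig. 2 example.
files_at = {
    1: {"f2", "f3"}, 2: {"f1", "f3", "f5"}, 3: {"f4"}, 4: {"f5"},
    5: {"f1", "f2", "f3", "f5"}, 6: {"f4"}, 7: {"f2"},
}
intervals = minimal_cut_intervals(files_at, [3, 6, 4, 2, 5, 1, 7])
print(intervals)
```

Representing each cut set by its interval $(a_i, b_i)$ is possible only because the consecutive property makes every file cut set a contiguous run in the permuted order; the sketch raises an error when that assumption is violated.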

An illustrative example

To illustrate Algorithm Reliability_Star_DCS as stated above, consider the star DCS in Fig. 2, which has the consecutive file distribution property with the associated permutation $P = [3, 6, 4, 2, 5, 1, 7]$. (In Section 3.2, we will show how to identify the associated permutation when the star DCS has the consecutive file distribution property.) The overall procedure is as follows:

Step 1: The file cut sets are found to be

$C_1 = \{e_2, e_5\}$; $C_2 = \{e_1, e_5, e_7\}$; $C_3 = \{e_1, e_2, e_5\}$; $C_4 = \{e_3, e_6\}$; $C_5 = \{e_2, e_4, e_5\}$.

Step 2: According to the permutation $p(1) = 3$; $p(2) = 6$; $p(3) = 4$; $p(4) = 2$; $p(5) = 5$; $p(6) = 1$; $p(7) = 7$ and the results of Step 1, we have

$a_1 = 4$; $b_1 = 5$; $a_2 = 5$; $b_2 = 7$; $a_3 = 4$; $b_3 = 6$; $a_4 = 1$; $b_4 = 2$; $a_5 = 3$; $b_5 = 5$.

Step 3: Since $C_1 \subset C_3$ and $C_1 \subset C_5$, remove $C_3$ and $C_5$. Thus, the set of minimal file cut sets is $U = \{C_1, C_2, C_4\}$.

Step 4: Reorder the minimal file cut sets in such a manner that for $C_i$ and $C_j$, $i < j$ if and only if $a_i < a_j$, and we obtain

$C_1 = \{e_3, e_6\}$; $a_1 = 1$; $b_1 = 2$;
$C_2 = \{e_2, e_5\}$; $a_2 = 4$; $b_2 = 5$;
$C_3 = \{e_1, e_5, e_7\}$; $a_3 = 5$; $b_3 = 7$.

Step 5: By using Eq. (6), we have

$\Pr[X(1, 2)] = q_3 q_6$; $\Pr[X(2, 5)] = q_6 q_4 q_2 q_5$; $\Pr[X(3, 5)] = q_4 q_2 q_5$; $\Pr[X(4, 5)] = q_2 q_5$; and $\Pr[X(5, 7)] = q_5 q_1 q_7$.

Step 6: We use Theorem 1 and Eq. (7) to compute $\Pr(W_i)$ and $\Pr(F_k)$ for $2 \le i \le 3$ and $b_{i-1} \le k \le b_i - 1$, and obtain

$\Pr(W_1) = q_3 q_6$; $\Pr(F_0) = \Pr(F_1) = 0$ (boundary condition).

$i = 2$: $\Pr(F_2) = \Pr(F_3) = \Pr(F_4) = \Pr(W_1) = q_3 q_6$,

$$\begin{aligned} \Pr(W_2) &= \Pr(W_1) + [1 - \Pr(F_0)] \cdot p_3 \cdot \Pr[X(2, 5)] \quad (j = 1) \\ &\quad + [1 - \Pr(F_1)] \cdot p_6 \cdot \Pr[X(3, 5)] \quad (j = 2) \\ &\quad + [1 - \Pr(F_2)] \cdot p_4 \cdot \Pr[X(4, 5)] \quad (j = 3) \\ &= q_3 q_6 + p_3 q_6 q_4 q_2 q_5 + p_6 q_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_4 q_2 q_5. \end{aligned}$$

$i = 3$: $\Pr(F_5) = \Pr(W_2)$,

$$\begin{aligned} \Pr(W_3) &= \Pr(W_2) + [1 - \Pr(F_3)] \cdot p_2 \cdot \Pr[X(5, 7)] \quad (j = 4) \\ &= q_3 q_6 + p_3 q_6 q_4 q_2 q_5 + p_6 q_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_2 q_5 q_1 q_7. \end{aligned}$$

Step 7: Therefore, DPR is

$$\text{DPR} = 1 - \Pr(W_3) = 1 - \{q_3 q_6 + p_3 q_6 q_4 q_2 q_5 + p_6 q_4 q_2 q_5 + (1 - q_3 q_6)\, p_4 q_2 q_5 + (1 - q_3 q_6)\, p_2 q_5 q_1 q_7\}.$$
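The recurrence of Theorem 1 with the boundary values of Eq. (7) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the failure probabilities are arbitrary test values, and $\Pr[X(i, j)]$ is recomputed as a direct product for clarity instead of the incremental update of Eq. (6). A brute-force enumeration over all edge states is included only as a cross-check.

```python
from itertools import product
from math import prod

def star_unreliability(intervals, q):
    """Pr(W_r) via Theorem 1.

    intervals: minimal file cut sets as (a, b) positions, sorted by a.
    q:         q[k-1] is the failure probability of the edge at position k
               of the permuted order.
    """
    p = [1.0 - x for x in q]

    def pr_x(i, j):
        # Pr[X(i, j)]: all edges at positions i..j fail.
        return prod(q[i - 1:j])

    a1, b1 = intervals[0]
    w = pr_x(a1, b1)                      # Pr(W_1), boundary condition
    f = {k: 0.0 for k in range(b1)}       # Pr(F_k) = 0 for 0 <= k < b_1
    for (a_prev, b_prev), (a, b) in zip(intervals, intervals[1:]):
        for k in range(b_prev, b):        # Eq. (7)
            f[k] = w
        w += sum((1.0 - f[j - 1]) * p[j - 1] * pr_x(j + 1, b)
                 for j in range(a_prev, a))   # Theorem 1, Eq. (5)
    return w

def star_unreliability_brute_force(intervals, q):
    """Reference value: sum over all 2^n edge states (small n only)."""
    total = 0.0
    for state in product([False, True], repeat=len(q)):   # True = works
        pr = prod(1.0 - qk if s else qk for qk, s in zip(q, state))
        if any(not any(state[a - 1:b]) for a, b in intervals):
            total += pr
    return total

# Minimal cut sets of the illustrative example in permuted positions;
# the q values are arbitrary and listed in the order e3, e6, e4, e2, e5, e1, e7.
intervals = [(1, 2), (4, 5), (5, 7)]
q = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
dpr = 1.0 - star_unreliability(intervals, q)
print(dpr)
```

Replacing `pr_x` with the incremental products of Eq. (6) would recover the stated $O(n + r)$ bound for step 5; the direct product is kept here only for readability.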

3.2. A linear-time algorithm for testing the consecutive file distribution property in a star DCS

The previous section presented a polynomial-time algorithm for computing the DPR of a star DCS when it has the consecutive file distribution property. In this section, we show how to determine whether or not a star DCS has the consecutive file distribution property. The problem statement is:

Input: A star DCS $D_s$ with $n + 1$ nodes $s, v_1, v_2, \ldots, v_n$ and file distributions $A_i$, $1 \le i \le n$.

Output: A permutation $P = [p(1), p(2), \ldots, p(n)]$ of the numbers $\{1, 2, \ldots, n\}$ such that if file $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$.

Notably, a solution does not always exist. To facilitate the search for a correct ordering $P$, we use the PQ-tree data structure proposed by Booth and Lueker [8]. A PQ-tree is a rooted tree that has nodes of two varieties: P-nodes and Q-nodes. A P-node is a node whose children can be arbitrarily permuted. A Q-node is a node whose children are ordered or reverse ordered. The frontier of a PQ-tree is the permutation of its leaves from left to right. Two PQ-trees are equivalent if and only if one can be transformed into the other by applying a sequence of the following transformation rules:

· arbitrarily permute the children of a P-node,

· reverse the children of a Q-node.

By using the PQ-tree data structure, we have the following algorithm.

Algorithm Check_Consecutive_File_Distribution

Input: A star DCS $D_s$ with $n + 1$ nodes $s, v_1, v_2, \ldots, v_n$, $n$ edges $e_1, e_2, \ldots, e_n$, where $e_i = (s, v_i)$ for $1 \le i \le n$, and file available sets $A_i = \{f_j \mid f_j \text{ is stored in node } v_i\}$ for $1 \le i \le n$.

Output: A permutation $P = [p(1), p(2), \ldots, p(n)]$ of the numbers $\{1, 2, \ldots, n\}$ such that if file $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$.

begin
  $T \leftarrow$ universal tree; // a single P-node connected to all the leaf nodes of $\{1, 2, \ldots, n\}$ //
  for $j \leftarrow 1$ to $m$ do $A_j^{-1} \leftarrow \emptyset$; // $m$ denotes the number of distinct files in $D_s$ //
  // $A_j^{-1}$ is the set of indexes of nodes which contain the file $f_j$ //
  for $i \leftarrow 1$ to $n$ do
    for each $f_j \in A_i$ do $A_j^{-1} \leftarrow A_j^{-1} \cup \{i\}$;
  for $j \leftarrow 1$ to $m$ do $T \leftarrow$ REDUCE($T$, $A_j^{-1}$);
  if $T$ is a null tree then
    print out "$D_s$ has no consecutive file distribution property";
  else
    print out the frontier of $T$;
end Check_Consecutive_File_Distribution

The routine REDUCE attempts to apply a set of eleven templates. Each template consists of a pattern to be matched against the current PQ-tree and the set $A_j^{-1}$, and a replacement to be substituted for the pattern. The templates are applied from the bottom to the top of the tree. Notably, the null tree may be returned when no template applies. For brevity, the details are omitted herein; they can be found in Booth and Lueker [8].

Complexity analysis

The sets $A_j^{-1}$, $1 \le j \le m$, can be obtained in $O(m + \sum_{i=1}^{n} |A_i|)$ steps. According to [8], the loop of the REDUCE routine can be computed in $O(m + n + \sum_{j=1}^{m} |A_j^{-1}|)$ steps. Furthermore, it is easy to verify that $\sum_{i=1}^{n} |A_i| = \sum_{j=1}^{m} |A_j^{-1}| = t$ (the total number of files in $D_s$). Therefore, the time complexity of the above algorithm is $O(m + t) + O(m + n + t) = O(m + n + t)$.

Consider the star DCS $D_s$ shown in Fig. 2. Applying the above algorithm leads to

$A_1^{-1} = \{2, 5\}$; $A_2^{-1} = \{1, 5, 7\}$; $A_3^{-1} = \{1, 2, 5\}$; $A_4^{-1} = \{3, 6\}$; $A_5^{-1} = \{2, 4, 5\}$.

Fig. 3 displays the reduction steps. In an illustration of a PQ-tree, a P-node is drawn as a circle and a Q-node as a rectangle. From this figure, we can conclude that the star DCS $D_s$ of Fig. 2 has the consecutive file distribution property and that one of the associated permutations is $P = [3, 6, 4, 2, 5, 1, 7]$.
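A candidate permutation can be validated directly against the sets $A_j^{-1}$, which gives a simple way to check the output of the PQ-tree algorithm. The sketch below is not the PQ-tree method of [8]: `is_consecutive` only verifies a given ordering, and `find_order_brute_force` (a hypothetical helper) tries every permutation, which is exponential and is meant purely as a reference for small $n$.

```python
from itertools import permutations

def is_consecutive(perm, node_sets):
    """True if every set in node_sets occupies consecutive positions in perm."""
    pos = {node: k for k, node in enumerate(perm)}
    for nodes in node_sets:
        ks = [pos[v] for v in nodes]
        if max(ks) - min(ks) + 1 != len(ks):
            return False
    return True

def find_order_brute_force(n, node_sets):
    """Return some valid ordering of 1..n, or None if none exists (small n only)."""
    for perm in permutations(range(1, n + 1)):
        if is_consecutive(perm, node_sets):
            return list(perm)
    return None

# The sets A_j^{-1} obtained for the star DCS of Fig. 2.
inverse_sets = [{2, 5}, {1, 5, 7}, {1, 2, 5}, {3, 6}, {2, 4, 5}]
print(is_consecutive([3, 6, 4, 2, 5, 1, 7], inverse_sets))
print(find_order_brute_force(7, inverse_sets))
```

The ordering returned by the brute-force search need not equal $P = [3, 6, 4, 2, 5, 1, 7]$, since several orderings can satisfy the property; equivalent PQ-trees have different frontiers for the same reason.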

3.3. Linear DCS's

In this section, we extend the results in Section 3.1 to computing the DPR of linear DCS's. Consider a linear DCS $D_l$ with $n + 1$ nodes $\{v_0, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (v_0, v_1), e_2 = (v_1, v_2), \ldots, e_n = (v_{n-1}, v_n)\}$. Let $I_i$ be the minimal file path set which starts at edge $e_i$. Notably, a linear DCS has a consecutive property resembling that of a star DCS: for each minimal file path set $I$, if $e_i \in I$ and $e_j \in I$, then $e_k \in I$ for all $k$, $i < k < j$.

The reliability of a linear DCS can be expressed as Prob{at least one minimal file path set $I$ whose edges all function}, and the unreliability of a star DCS with the consecutive file distribution property can be expressed as Prob{at least one minimal file cut set $C$ whose edges all fail}. Owing to this duality, a simple relationship exists between a linear DCS and a star DCS with the consecutive file distribution property. The relationship is stated in Table 1.

Table 1
The relationship between a linear DCS and a star DCS with the consecutive file distribution

| Star DCS $D_s$ with the consecutive file distribution | $\leftrightarrow$ | Linear DCS $D_l$ |
| minimal file cut set $C$ | $\leftrightarrow$ | minimal file path set $I$ |
| $q_i \equiv$ probability that edge $e_i$ fails | $\leftrightarrow$ | $p_i \equiv$ probability that edge $e_i$ functions |
| $[p(1), p(2), \ldots, p(n)]$, a permutation such that if file $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$ | $\leftrightarrow$ | $[p(1), p(2), \ldots, p(n)] = (1, 2, \ldots, n)$ |

According to the mirror image described in Table 1, if we let $W_i = U_i$, $a_i = p(i) = i$, $p_i = q_i$, $\Pr(F_i) = \Pr(R_i)$, and $X(i, b_i) = Y_i$ in Theorem 1, then the following theorem can be readily obtained to compute the reliability of a linear DCS $D_l$.

Theorem 2. For $2 \le i \le n$:

$$\Pr(U_i) = \Pr(U_{i-1}) + [1 - \Pr(R_{i-2})] \cdot q_{i-1} \cdot \Pr(Y_i)$$

with the boundary conditions $\Pr(U_1) = \Pr(Y_1)$ and $\Pr(R_j) = 0$ for $0 \le j < b_1$.

In addition, $\Pr(Y_i)$ and $\Pr(R_j)$ can be easily obtained, analogously to Eqs. (6) and (7), as follows:

$$\Pr(Y_i) = \begin{cases} \dfrac{1}{p_{i-1}} \cdot \Pr(Y_{i-1}) \cdot \displaystyle\prod_{j=b_{i-1}+1}^{b_i} p_j & \text{for } b_i \le n, \\[2ex] 0 & \text{for } b_i = \infty, \end{cases} \tag{8}$$

with the boundary condition $\Pr(Y_1) = \prod_{j=1}^{b_1} p_j$, and

$$\Pr(R_j) = \begin{cases} 0 & \text{for } 0 \le j \le b_1 - 1, \\ \Pr(U_i) & \text{for } b_i \le j \le b_{i+1} - 1. \end{cases} \tag{9}$$

Next, the complete algorithm for computing the reliability of a linear DCS is presented as follows.

Algorithm Reliability_Linear_DCS

Input: A linear DCS $D_l$ with $n + 1$ nodes $\{v_0, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (v_0, v_1), e_2 = (v_1, v_2), \ldots, e_n = (v_{n-1}, v_n)\}$.
$A_i$: the set of files available at node $v_i$.

Output: the DPR of $D_l$.

begin

Step 1: // find all $b_i$'s //
  for $i \leftarrow 1$ to $m$ do $NF_i \leftarrow 0$; // $NF_i$ is the number of copies of file $f_i$ between $v_h$ and $v_k$ //
  for each $f_i \in A_0$ do $NF_i \leftarrow 1$;
  $h \leftarrow 0$; // $h$ and $k$ are two indexes moving among nodes //
  for $k \leftarrow 1$ to $n$ do
  begin
    for each file $f_i \in A_k$ do $NF_i \leftarrow NF_i + 1$; // update the total number of file $i$ for node $v_k$ //
    MFPS $\leftarrow$ true; // if there is a minimal file path set between $v_h$ and $v_k$, then MFPS = true //
    while MFPS do
    begin
      for $i \leftarrow 1$ to $m$ do if $NF_i = 0$ then MFPS $\leftarrow$ false; // check if there exists a minimal file path set //
      if MFPS then
      begin
        for each file $f_i \in A_h$ do $NF_i \leftarrow NF_i - 1$;
        $h \leftarrow h + 1$; $b_h \leftarrow k$;
      end
    end
  end
  for $i \leftarrow h + 1$ to $n$ do $b_i \leftarrow \infty$;

Step 2: // compute $\Pr(Y_i)$ by Eq. (8) //
  $\Pr(Y_1) \leftarrow \prod_{j=1}^{b_1} p_j$; // boundary condition //
  for $i \leftarrow 2$ to $n$ do
  begin
    if $b_i \le n$ then $\Pr(Y_i) \leftarrow (1/p_{i-1}) \cdot \Pr(Y_{i-1}) \cdot \prod_{j=b_{i-1}+1}^{b_i} p_j$
    else $\Pr(Y_i) \leftarrow 0$
  end

Step 3: // apply Theorem 2 and Eq. (9) to compute $\Pr(U_i)$ and $\Pr(R_j)$ //
  for $i \leftarrow 0$ to $b_1 - 1$ do $\Pr(R_i) \leftarrow 0$; // boundary condition //
  $\Pr(U_1) \leftarrow \Pr(Y_1)$; // boundary condition //
  for $i \leftarrow b_1$ to $b_2 - 1$ do $\Pr(R_i) \leftarrow \Pr(U_1)$;
  for $i \leftarrow 2$ to $n$ do
  begin
    $\Pr(U_i) \leftarrow \Pr(U_{i-1}) + [1 - \Pr(R_{i-2})] \cdot q_{i-1} \cdot \Pr(Y_i)$;
    for $j \leftarrow b_i$ to $b_{i+1} - 1$ do $\Pr(R_j) \leftarrow \Pr(U_i)$;
  end

Step 4: DPR $\leftarrow \Pr(U_n)$; Output(DPR);

end Reliability_Linear_DCS

Complexity analysis

For step 1, the computational complexity of finding the $b_i$'s is $O(nm)$, since the value of $h$ in the inner while loop increases monotonically and does not exceed the value of $k$, i.e. the index of the outer for loop. Computing $\Pr(Y_i)$ in step 2 is similar to computing $\Pr[X(j, b_i)]$ in step 5 of Algorithm Reliability_Star_DCS. Thus, the complexity of step 2 is $O(n + n) = O(n)$. Step 3, which is the same as step 6 of Algorithm Reliability_Star_DCS, can be computed in $O(n)$ time. Therefore, the algorithm Reliability_Linear_DCS takes $O(nm) + O(n) + O(n) = O(nm)$ time.
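Theorem 2 and Eq. (9) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm exactly as listed: the function name is hypothetical, the $b_i$ values are found with a plain double loop instead of the sliding window of step 1, $\Pr(Y_i)$ is a direct product instead of the update in Eq. (8), and no single node is assumed to hold every file. The file distribution is a hypothetical one, chosen so that $b_1 = 1$, $b_2 = b_3 = 4$, $b_4 = 5$ and $b_5 = \infty$.

```python
from math import inf, prod

def linear_dpr(files_at, p, all_files):
    """DPR of a linear DCS via Theorem 2.

    files_at[i]: set of files at node v_i, i = 0..n.
    p[i-1]:      probability that edge e_i = (v_{i-1}, v_i) functions.
    """
    n = len(p)
    # b[i]: last edge of the minimal file path set that starts at e_i.
    b = [inf] * (n + 1)
    for i in range(1, n + 1):
        have = set(files_at[i - 1])
        for k in range(i, n + 1):
            have |= files_at[k]
            if have >= all_files:
                b[i] = k
                break
    # Pr(Y_i): all edges e_i .. e_{b_i} function (0 if no such path set).
    y = [0.0] * (n + 1)
    for i in range(1, n + 1):
        if b[i] != inf:
            y[i] = prod(p[i - 1:b[i]])
    u = [0.0] * (n + 1)

    def r(j):
        # Eq. (9): Pr(R_j) = Pr(U_i) for the largest i with b_i <= j.
        cands = [i for i in range(1, n + 1) if b[i] <= j]
        return u[max(cands)] if cands else 0.0

    u[1] = y[1]                                   # boundary condition
    for i in range(2, n + 1):                     # Theorem 2
        u[i] = u[i - 1] + (1.0 - r(i - 2)) * (1.0 - p[i - 2]) * y[i]
    return u[n]

# Hypothetical file distribution on v_0..v_5 (an assumption, not Fig. 4).
files_at = [{1, 3}, {2, 4}, {2}, {3}, {1, 4}, {2}]
p = [0.9, 0.8, 0.7, 0.6, 0.5]
print(linear_dpr(files_at, p, all_files={1, 2, 3, 4}))
```

Because the $b_i$ are nondecreasing in $i$, the lookup in `r` could be replaced by a single moving pointer, which is what the array $\Pr(R_j)$ in step 3 achieves.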

Consider the linear DCS $D_l$ in Fig. 4. Applying the algorithm Reliability_Linear_DCS yields

Step 1:

$b_1 = 1$; $b_2 = 4$; $b_3 = 4$; $b_4 = 5$; $b_5 = \infty$;

Step 2:

$\Pr(Y_1) = p_1$; $\Pr(Y_2) = p_2 p_3 p_4$; $\Pr(Y_3) = p_3 p_4$; $\Pr(Y_4) = p_4 p_5$; $\Pr(Y_5) = 0$;

Step 3:

$\Pr(R_0) = 0$;

$\Pr(U_1) = p_1$; $\Pr(R_1) = \Pr(R_2) = \Pr(R_3) = \Pr(U_1) = p_1$;

$i = 2$: $\Pr(U_2) = \Pr(U_1) + [1 - \Pr(R_0)] \cdot q_1 \cdot \Pr(Y_2) = p_1 + q_1 p_2 p_3 p_4$

$i = 3$: $\Pr(U_3) = \Pr(U_2) + [1 - \Pr(R_1)] \cdot q_2 \cdot \Pr(Y_3) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4$

$\Pr(R_4) = \Pr(U_3) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4$

$i = 4$: $\Pr(U_4) = \Pr(U_3) + [1 - \Pr(R_2)] \cdot q_3 \cdot \Pr(Y_4) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$

$\Pr(R_5) = \Pr(U_4) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$

$i = 5$: $\Pr(U_5) = \Pr(U_4) + [1 - \Pr(R_3)] \cdot q_4 \cdot \Pr(Y_5) = \Pr(U_4)$ // since $\Pr(Y_5) = 0$ //
$= p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$

Step 4: Therefore, DPR is $\Pr(U_5) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$.

3.4. Ring DCS's

A ring DCS is a DCS with a circular communication link. Each node is connected to its two neighboring nodes by its two incident edges. Assume that $D_r$ is a DCS with a ring structure. According to the well-known factoring theorem [7], the DPR of $D_r$ is obtained as follows:

$$\text{DPR}(D_r) = p_i \cdot \text{DPR}(D_r * e_i) + q_i \cdot \text{DPR}(D_r - e_i), \tag{10}$$

where $e_i$ is an arbitrary edge of $D_r$. Since $D_r - e_i$ is a DCS with a linear structure with $n - 1$ edges, its reliability can be computed by the algorithm Reliability_Linear_DCS in $O(nm)$ time. Notably, $D_r * e_i$ remains a DCS with a ring structure with $n - 1$ edges. The same analysis is then applied to $D_r * e_i$. By recursively applying Eq. (10), we decompose the ring DCS $D_r$ with $n$ edges into, in the worst case, $n$ linear DCSs. Therefore, we have an $O(n^2 m)$ time algorithm for computing the reliability of a DCS with a ring structure.

Algorithm Reliability_Ring_DCS($D_r$)

Step 1: if there exists one node that holds all distinct data files then Return(DPR $\leftarrow 1$);
Step 2: Select an arbitrary edge $e_i$ of $D_r$;
Step 3: $Rel_l \leftarrow$ Reliability_Linear_DCS($D_r - e_i$);
Step 4: $Rel_r \leftarrow$ Reliability_Ring_DCS($D_r * e_i$);
Step 5: Return(DPR $\leftarrow p_i \cdot Rel_r + q_i \cdot Rel_l$);

end Reliability_Ring_DCS

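Consider the DCS with a ring topology in Fig. 5. This is a modification of the DCS in Fig. 4 with one edge $e_6$ added between nodes $v_5$ and $v_0$. Applying algorithm Reliability_Ring_DCS yields

$$\begin{aligned} \text{DPR}(D_r) &= q_6 \cdot \text{DPR}(D_r - e_6) + p_6 \cdot \text{DPR}(D_r * e_6) \\ &= q_6 \cdot \text{DPR}(D_r - e_6) + p_6 \cdot [q_5 \cdot \text{DPR}(D_r * e_6 - e_5) + p_5 \cdot \text{DPR}(D_r * e_6 * e_5)]. \end{aligned}$$

Since there exists one node in $D_r * e_6 * e_5$ that holds all distinct data files $\{f_1, f_2, f_3, f_4\}$, we have $\text{DPR}(D_r * e_6 * e_5) = 1$. The example in Section 3.3 reveals that $\text{DPR}(D_r - e_6) = \Pr(U_5)$ and $\text{DPR}(D_r * e_6 - e_5) = \Pr(U_3)$. Therefore, we have

$$\text{DPR}(D_r) = q_6 \cdot (p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5) + p_6 \cdot [q_5 \cdot (p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4) + p_5].$$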
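The factoring recursion of Eq. (10) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the edge chosen for factoring is always the last one, and the chain reliability is computed by a brute-force stand-in (`chain_dpr`) to keep the block short, where the polynomial-time Reliability_Linear_DCS would be used in practice. The file distribution is a hypothetical one, chosen so that the closed form derived above for Fig. 5 applies.

```python
from itertools import product
from math import prod

def chain_dpr(files_at, p, all_files):
    """Brute-force DPR of a chain v_0..v_n (stand-in for Reliability_Linear_DCS)."""
    total = 0.0
    for state in product([False, True], repeat=len(p)):   # True = edge works
        pr = prod(pe if s else 1.0 - pe for pe, s in zip(p, state))
        have = set(files_at[0])
        ok = have >= all_files
        for k, s in enumerate(state, start=1):
            # Extend the current connected segment, or start a new one.
            have = have | files_at[k] if s else set(files_at[k])
            ok = ok or have >= all_files
        if ok:
            total += pr
    return total

def ring_dpr(files_at, p, all_files):
    """DPR of a ring; p[i] belongs to the edge between nodes i and (i+1) mod n."""
    if any(f >= all_files for f in files_at):
        return 1.0                                   # Step 1
    if len(files_at) == 1:
        return 0.0                                   # one node, files missing
    # Delete the last edge (v_{n-1}, v_0): a chain v_0 .. v_{n-1} remains.
    rel_l = chain_dpr(files_at, p[:-1], all_files)
    # Contract the last edge: v_{n-1} is merged into v_0, ring shrinks by one.
    merged = [files_at[0] | files_at[-1]] + list(files_at[1:-1])
    rel_r = ring_dpr(merged, p[:-1], all_files)
    return p[-1] * rel_r + (1.0 - p[-1]) * rel_l     # Eq. (10)

# Hypothetical file distribution on v_0..v_5 (an assumption, not Fig. 5);
# p lists e_1..e_6, with e_6 joining v_5 and v_0.
files_at = [{1, 3}, {2, 4}, {2}, {3}, {1, 4}, {2}]
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(ring_dpr(files_at, p, all_files={1, 2, 3, 4}))
```

Always factoring on the last edge keeps the node and edge lists aligned after contraction; any other edge could be chosen, as Step 2 of the algorithm allows.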

4. Conclusions

This paper elucidates the distributed program reliability in various classes of distributed computing systems. This reliability is computationally intractable for arbitrary distributed computing systems, even when it is restricted to the class of star distributed computing systems. A particular solvable case for star distributed computing systems is identified, in which data files are distributed with respect to a consecutive property, and a polynomial-time algorithm is developed for this case. Also proposed herein is a linear-time algorithm to verify whether or not an arbitrary star distributed computing system has this consecutive file distribution property. Furthermore, the results for star DCS's are applied to obtain the reliability of linear and ring DCS's in polynomial time. Future work should attempt to construct efficient algorithms for computing lower and upper bounds on the distributed program reliability for arbitrary distributed computing systems.

References

[1] P. Enslow, What is a distributed data processing system, Computer, vol. 11, Jan. 1978.

[2] J. Garcia-Molina, Reliability issues for fully replicated distributed database, IEEE Trans. Computer 16 (1982) 34–42.

[3] A. Satyanarayana, J.N. Hagstrom, A new algorithm for the reliability analysis of multi-terminal networks, IEEE Trans. on Reliability 30 (1981) 325–334.

[4] A. Kumar, S. Rai, D.P. Agrawal, On computer communication network reliability under program execution constraints, IEEE JSAC 6 (1988) 1393–1399.

[5] V.K.P. Kumar, S. Hariri, C.S. Raghavendra, Distributed program reliability analysis, IEEE Trans. Software Eng. 12 (1986) 42–50.

[6] M.S. Lin, D.J. Chen, The computational complexity of the reliability problem on distributed systems, Information Processing Letters 64 (1997) 143–147.

[7] L.G. Valiant, The complexity of enumeration and reliability problems, SIAM J. Computing 8 (1979) 410–421.

[8] K.S. Booth, G.S. Lueker, Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms, Journal of Computer and System Sciences 13 (1976) 335–379.
