Efficient algorithms for reliability analysis of distributed computing systems



Min-Sheng Lin (a,*), Ming-Sang Chang (b), Deng-Jyi Chen (b)

(a) Department of Information Management, Tamsui Oxford University College, 32, Chen-Li Rd., Tamsui, Taipei, 25103, Taiwan, ROC

(b) Institute of Computer Science and Information Engineering, National Chiao-Tung University, Hsin Chu, 30050, Taiwan, ROC

Received 12 March 1998; received in revised form 23 October 1998; accepted 1 January 1999

Abstract

A distributed computing system is modeled as a collection of resources (e.g. processing elements, data files and programs) interconnected via an arbitrary communication network and controlled by a distributed operating system. The distributed program reliability of a distributed computing system is the probability of successful execution of a program that runs on multiple processing elements and needs to retrieve data files from other processing elements. This reliability varies according to (1) the topology of the distributed computing system, (2) the reliability of the communication edges, (3) the distribution of data files and programs among processing elements and (4) the data files required to execute a program. In addition, computing the reliability of distributed computing systems is #P-complete even when the distributed computing system is restricted to a series-parallel, a 2-tree, a tree, or a star structure. This paper presents efficient algorithms for computing the reliability of a distributed program running on other restricted classes of networks. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: Distributed computing systems; Distributed program reliability; Computational complexity; Algorithms

www.elsevier.com/locate/ins

*Corresponding author. E-mail: [email protected]

0020-0255/99/$ – see front matter © 1999 Elsevier Science Inc. All rights reserved. PII: S0020-0255(99)00003-1

1. Introduction

A typical distributed computing system (DCS) consists of processing elements (nodes), communication links (links), memory units, data files, and programs [1,2]. These resources are interconnected via a communication network that dictates how information flows between nodes. Programs residing on some nodes can run using data files at other nodes.

A previous investigation [3] introduced distributed program reliability (DPR) to evaluate the reliability of DCSs. Consider a DCS in which the nodes are perfectly reliable but the links can fail, s-independently of each other, with known probabilities. Successful execution of a distributed program requires that the node containing the program, the other nodes that hold the required data files, and the links between them be operational. DPR is thus defined as the probability that a program with distributed files can run successfully despite some faults in the links. For example, consider the DCS in Fig. 1, which consists of four nodes (processing elements) and five edges (communication links). This figure also shows the files available at each processing element. Assume that program $f_1$ requires data files $f_2$, $f_3$, and $f_4$ to complete its execution, and that it is running at node $v_1$, which holds data files $f_2$ and $f_3$. Hence, it must access data file $f_4$, which is stored in both nodes $v_2$ and $v_4$. Therefore, the DPR of the DCS in Fig. 1 can be formulated as: DPR = Prob[($v_1$ and $v_2$ are connected) or ($v_1$ and $v_4$ are connected)].
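The definition of DPR above can be made concrete with a brute-force sketch in Python that enumerates every edge state. This is only an illustration under stated assumptions, not one of the paper's algorithms: the function names are hypothetical, the running time is exponential in the number of edges, and the 4-node, 5-edge network below is invented because the topology of Fig. 1 is not reproduced in the text.

```python
from itertools import product

def connected_nodes(n_nodes, edges, up, start):
    """Return the set of nodes reachable from `start` over working edges."""
    adj = {v: [] for v in range(n_nodes)}
    for (u, v), ok in zip(edges, up):
        if ok:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def dpr_brute_force(n_nodes, edges, p, files, start, needed):
    """Sum the probability of every edge state in which the nodes reachable
    from `start` jointly hold all files in `needed` (nodes are perfect)."""
    total = 0.0
    for up in product([True, False], repeat=len(edges)):
        prob = 1.0
        for ok, pi in zip(up, p):
            prob *= pi if ok else 1.0 - pi
        reach = connected_nodes(n_nodes, edges, up, start)
        held = set().union(*(files[v] for v in reach))
        if needed <= held:
            total += prob
    return total

# Hypothetical network (NOT the topology of Fig. 1): node 0 runs the program
# and holds f2 and f3; f4 is stored at nodes 1 and 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
p = [0.9] * 5
files = {0: {"f1", "f2", "f3"}, 1: {"f4"}, 2: set(), 3: {"f4"}}
dpr = dpr_brute_force(4, edges, p, files, start=0, needed={"f2", "f3", "f4"})
```

Such an enumeration is useful mainly as a reference against which faster algorithms can be checked on small instances.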

Although several algorithms have been proposed for evaluating DPR [4,5], none satisfy our desire for more efficient algorithms. We hypothesize that either the approaches examined are ineffective, or that no efficient algorithms exist for our reliability problems. Lin and Chen [6] demonstrated, for the first time, that computing DPR is #P-hard even when the distributed computing system is restricted to a series-parallel, a 2-tree, a tree, or a star structure. The class of #P-complete problems was introduced by Valiant [7]. The class #P contains those problems that involve counting the accepting computations for problems in NP; the class of #P-complete problems contains the hardest problems in #P. As is widely recognized, all known exact algorithms for these problems have exponential time complexity, making it unlikely that efficient (polynomial-time) algorithms can be developed for this class of problems. This complexity can be averted by considering only a restricted class of DCS's. In light of the above discussion, this paper presents a polynomially solvable case of the DPR problem for star topologies in which data files are restricted to a certain type of distribution. A linear-time algorithm is also proposed to verify whether or not a star DCS has this restricted class of file distribution. Also proposed herein are two polynomial-time algorithms for computing the DPR of a DCS with a linear and a circular structure, respectively.

2. Assumptions, definitions and notation

Assumptions

· The nodes are perfect.

· The edges are s-independent and either function or fail with known probabilities.

Definitions

· A star DCS $D_s$ has the consecutive file distribution property if and only if its nodes can be linearly ordered such that, for each distinct file $f_d$, the nodes containing file $f_d$ occur consecutively. More formally, a star DCS $D_s$ has the consecutive file distribution property if and only if there exists a permutation $P = [p(1), p(2), \ldots, p(n)]$ of the numbers $\{1, 2, \ldots, n\}$ such that if $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$.

· A set $C$ of edges of $D_s$ is referred to as a file cut set if and only if the failure of all edges in $C$ implies system failure.

· A file cut set $C$ is referred to as minimal if there is no other file cut set $C'$ such that $C' \subset C$.

· A set $I$ of edges of a linear DCS $D_l$ is referred to as a file path set if and only if the functioning of all edges in $I$ implies that the system functions.

· A file path set $I$ is referred to as minimal if there is no other file path set $I'$ such that $I' \subset I$.
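The consecutive file distribution property can be checked directly for a given ordering. The following Python sketch assumes a hypothetical representation in which `A` maps each node index to the set of files it holds and `perm` is a candidate ordering; the four-node example is invented for illustration.

```python
def is_consecutive(A, perm):
    """Return True if, in the node ordering `perm`, the nodes holding each
    file occupy a contiguous block of positions.

    A    : dict mapping node index -> set of files stored at that node
    perm : list of node indices, a candidate ordering [p(1), ..., p(n)]
    """
    positions = {}
    for pos, node in enumerate(perm):
        for f in A[node]:
            positions.setdefault(f, []).append(pos)
    # Positions are collected in increasing order, so a block is contiguous
    # exactly when its span equals its size.
    return all(ps[-1] - ps[0] + 1 == len(ps) for ps in positions.values())

# Hypothetical file placement on four nodes.
A = {1: {"f1"}, 2: {"f1", "f2"}, 3: {"f2"}, 4: {"f1"}}
ok_identity = is_consecutive(A, [1, 2, 3, 4])   # f1 sits at positions 0, 1, 3
ok_reordered = is_consecutive(A, [4, 1, 2, 3])  # f1 at 0..2, f2 at 2..3
```

The star DCS in this example has the property because at least one ordering passes the check, even though the identity ordering does not.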

Notation (general)

$D$: a distributed computing system (DCS)

$e_i$: edge $i$ in $D$

$v_i$: node $i$ in $D$

$f_i$: data file $i$

$m$: number of distinct files in $D$

$t$: total number of files in $D$

$A_i$: the set of files available at node $v_i$

$p_i$: probability that edge $e_i$ functions

$q_i$: probability that edge $e_i$ fails; $q_i \equiv 1 - p_i$

$\bar{E}$: complement of event $E$

Notation (for star topology)

$D_s$: a star DCS with $n + 1$ nodes $\{s, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (s, v_1), e_2 = (s, v_2), \ldots, e_n = (s, v_n)\}$

$P \equiv [p(1), p(2), \ldots, p(n)]$: a permutation of the numbers $\{1, 2, \ldots, n\}$ such that if $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$

$C_d$: the minimal file cut set for file $f_d$, consisting of all edges $(s, v_i)$ such that node $v_i$ contains file $f_d$, i.e. $C_d = \{(s, v_i) \mid f_d \in A_i\}$. (Without loss of generality, we reorder the minimal file cut sets, if necessary, by their minimal component, i.e. for two distinct minimal file cut sets $C_i$ and $C_j$, $i < j$ if and only if $\min\{k \mid (s, v_{p(k)}) \in C_i\} < \min\{k \mid (s, v_{p(k)}) \in C_j\}$.)

$U$: ordered set of all minimal file cut sets according to their minimal components

$r$: number of minimal file cut sets in $U$

$a_i \equiv \min\{k \mid e_{p(k)} \in C_i\}$, i.e. the index of the minimal component in $C_i$

$b_i \equiv \max\{k \mid e_{p(k)} \in C_i\}$, i.e. the index of the maximal component in $C_i$

$H(i, j) \equiv \{e_{p(i)}, e_{p(i+1)}, \ldots, e_{p(j)}\}$; $1 \le i \le j \le n$ (note that $C_i \equiv H(a_i, b_i)$)

$X(i, j)$: event: all edges in $H(i, j)$ fail

$W_i \equiv \bigcup_{j=1}^{i} X(a_j, b_j)$ (note that the DPR of $D_s$ can be expressed as $1 - \Pr(W_r)$)

$F_i$: event: the star DCS $D'_s$ fails, where $D'_s$ consists of the $i + 1$ nodes $s, v_{p(1)}, v_{p(2)}, \ldots, v_{p(i)}$ and the $i$ edges $e_{p(1)}, e_{p(2)}, \ldots, e_{p(i)}$

Notation (for linear topology)

$D_l$: a linear DCS with $n + 1$ nodes $\{v_0, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (v_0, v_1), e_2 = (v_1, v_2), \ldots, e_n = (v_{n-1}, v_n)\}$

$I_i$: the minimal file path set which starts at edge $e_i$

$b_i \equiv \max\{k \mid e_k \in I_i\}$, i.e. the index of the maximal component in $I_i$

$U_i \equiv \bigcup_{j=1}^{i} Y_j$ (notably, the DPR of $D_l$ can be expressed as $\Pr(U_n)$)

$R_j$: event: there exists an operating event $Y_i$ between edges $e_1$ and $e_j$

Notation (for ring topology)

$D_r$: a ring DCS with $n$ nodes $\{v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (v_1, v_2), e_2 = (v_2, v_3), \ldots, e_{n-1} = (v_{n-1}, v_n), e_n = (v_n, v_1)\}$

$D_r * e_i$: the DCS $D_r$ with edge $e_i = (v_i, v_{i+1})$ contracted so that nodes $v_i$ and $v_{i+1}$ are merged into a single node. This newly merged node contains all data files that were previously in nodes $v_i$ and $v_{i+1}$.

3. Efficient algorithms for computing DPR of DCS's

According to a previous investigation [6], computing DPR over a star DCS is #P-complete, implying that a polynomial-time algorithm is unlikely to exist for the general problem. However, efficient algorithms may exist for computing DPR over some restricted classes.

3.1. Star DCS's with a consecutive file distribution

In this section, we present a polynomial-time algorithm for computing the DPR of a star DCS with a consecutive file distribution. Let $D_s$ be a star DCS that has the consecutive file distribution property. Then, the minimal file cut sets can be ordered by their minimal component, i.e. for two distinct minimal file cut sets $C_i$ and $C_j$, $i < j$ if and only if $\min\{k \mid (s, v_{p(k)}) \in C_i\} < \min\{k \mid (s, v_{p(k)}) \in C_j\}$. By definition, $D_s$ fails if and only if at least one event $X(a_i, b_i)$, $1 \le i \le r$, occurs, where $a_i$ and $b_i$ are the indexes of the minimal and maximal components in $C_i$, respectively. Clearly, if $r = 1$, the unreliability of $D_s$ can be easily obtained as $\Pr[W_1] = \Pr[X(a_1, b_1)]$. Next consider the case with $r \ge 2$. The unreliability of $D_s$ with the first $i$ file cut sets is

$$\Pr[W_i] = \Pr[W_{i-1} \cup X(a_i, b_i)].$$

Since $W_{i-1}$ and $\bar{W}_{i-1} \cap X(a_i, b_i)$ are disjoint, this expression can be decomposed as

$$\Pr[W_i] = \Pr[W_{i-1}] + \Pr[\bar{W}_{i-1} \cap X(a_i, b_i)]. \qquad (1)$$

Consider the event $\bar{W}_{i-1} \cap X(a_i, b_i)$, which implies

· $E_1$: for each $k$, $1 \le k \le i - 1$, at least one edge $e \in H(a_k, b_k) \equiv C_k$ functions, and

· $E_2$: all edges in $H(a_i, b_i) \equiv C_i$ fail.

By event $E_2$, event $E_1$ can be rewritten as

· $E'_1$: for each $k$, $1 \le k \le i - 1$, at least one edge $e \in \{H(a_k, b_k) - H(a_i, b_i)\}$ functions.

A fundamental difficulty in calculating $\Pr(E'_1)$ is that the events in $E'_1$ are not, in general, disjoint. However, we can define events $S_j$ that are disjoint by

$$S_j = \{E'_1 \text{ occurs and edge } e_{p(j)} \text{ is the last good one}\}, \quad a_{i-1} \le j \le a_i - 1.$$

Thus,

$$E'_1 \cap E_2 = \bigcup_{j=a_{i-1}}^{a_i - 1} (S_j \cap E_2)$$

and

$$\Pr[\bar{W}_{i-1} \cap X(a_i, b_i)] = \Pr\left[\bigcup_{j=a_{i-1}}^{a_i - 1} (S_j \cap E_2)\right]. \qquad (2)$$

Since the $S_j$'s are disjoint events, we have

$$\Pr\left[\bigcup_{j=a_{i-1}}^{a_i - 1} (S_j \cap E_2)\right] = \sum_{j=a_{i-1}}^{a_i - 1} \Pr(S_j \cap E_2). \qquad (3)$$

The event $S_j \cap E_2$, $a_{i-1} \le j \le a_i - 1$, can be decomposed into three independent events: {no file cut set fails between edges $e_{p(1)}$ and $e_{p(j-1)}$}, {edge $e_{p(j)}$ functions}, and {all edges between $e_{p(j+1)}$ and $e_{p(b_i)}$ fail}. So

$$\Pr(S_j \cap E_2) = [1 - \Pr(F_{j-1})] \cdot p_{p(j)} \cdot \Pr[X(j + 1, b_i)]. \qquad (4)$$

Therefore, according to Eqs. (1)–(4), we have

$$\Pr(W_i) = \Pr(W_{i-1}) + \sum_{j=a_{i-1}}^{a_i - 1} \left\{[1 - \Pr(F_{j-1})] \cdot p_{p(j)} \cdot \Pr[X(j + 1, b_i)]\right\}.$$

The following theorem can now be easily established.

Theorem 1. For $2 \le i \le r$:

$$\Pr(W_i) = \Pr(W_{i-1}) + \sum_{j=a_{i-1}}^{a_i - 1} \left\{[1 - \Pr(F_{j-1})] \cdot p_{p(j)} \cdot \Pr[X(j + 1, b_i)]\right\}, \qquad (5)$$

with the boundary conditions: $\Pr(W_1) = \Pr[X(a_1, b_1)]$, and $\Pr(F_k) = 0$ for $0 \le k < b_1$.

Before applying Theorem 1, first compute the values of $\Pr[X(j + 1, b_i)]$ and $\Pr(F_{j-1})$ for $2 \le i \le r$ and $a_{i-1} \le j \le a_i - 1$. By noting that $a_g < a_h$ whenever $g < h$, we have

$$\Pr[X(j + 1, b_i)] = \begin{cases} \dfrac{1}{q_{p(a_{i-1})}} \cdot \Pr[X(a_{i-1}, b_{i-1})] \cdot \prod_{k=b_{i-1}+1}^{b_i} q_{p(k)} & \text{for } j = a_{i-1}, \\[2ex] \dfrac{1}{q_{p(j)}} \cdot \Pr[X(j, b_i)] & \text{for } a_{i-1} < j \le a_i - 1. \end{cases} \qquad (6)$$

By starting with $\Pr[X(a_1, b_1)] = \prod_{k=a_1}^{b_1} q_{p(k)}$, we successively determine

$\Pr[X(a_1 + 1, b_2)], \Pr[X(a_1 + 2, b_2)], \ldots, \Pr[X(a_2, b_2)]$;

$\Pr[X(a_2 + 1, b_3)], \Pr[X(a_2 + 2, b_3)], \ldots, \Pr[X(a_3, b_3)]$;

. . .

$\Pr[X(a_{r-1} + 1, b_r)], \Pr[X(a_{r-1} + 2, b_r)], \ldots$, and $\Pr[X(a_r, b_r)]$.

To obtain the values of $\Pr(F_{j-1})$ in Theorem 1, by definition, we have that

$$\Pr(F_k) = \begin{cases} 0 & \text{for } k \le b_1 - 1, \\ \Pr(W_{i-1}) & \text{for } b_{i-1} \le k \le b_i - 1. \end{cases} \qquad (7)$$

Hence, while computing $\Pr(W_i)$ by Theorem 1, we can also obtain $\Pr(F_k)$ for $b_{i-1} \le k \le b_i - 1$.

Next, the major steps for computing the DPR of star DCS's are outlined. Suppose we are given a star DCS $D_s$ and the file distribution $A_i$ for each node. Assuming that $D_s$ has the consecutive file distribution property, let $P$ be a permutation of the numbers $\{1, 2, \ldots, n\}$ such that if $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$. All file cut sets can be easily enumerated from the $A_i$'s in the following manner: if node $v_i$ contains file $f_d$, then file cut set $C_d$ contains edge $e_i$. Subsequently, the $a_i$ and $b_i$ values of $C_i$ can be determined from the permutation $P$ such that $a_i = \min\{k \mid e_{p(k)} \in C_i\}$ and $b_i = \max\{k \mid e_{p(k)} \in C_i\}$. Then, remove the file cut sets which are not minimal and rearrange the remaining minimal file cut sets according to their $a_i$ values. Finally, use Theorem 1 and Eqs. (6) and (7) to compute the DPR ($= 1 - \Pr[W_r]$). The algorithm is formally described as follows.

Algorithm Reliability_Star_DCS

Input: A star DCS D_s with n + 1 nodes {s, v_1, v_2, ..., v_n} and n edges {(s, v_1), (s, v_2), ..., (s, v_n)}.
       A permutation P = [p(1), p(2), ..., p(n)] of numbers {1, 2, ..., n} such that if file f_d ∈ A_p(i) and f_d ∈ A_p(j), then f_d ∈ A_p(k) for all k, i < k < j, where A_i represents the set of files available at node v_i.

begin

Step 1: // find all file cut sets //
  for i ← 1 to m do C_i ← ∅; // initialization step; m is the number of distinct files //
  for i ← 1 to n do
    for each f_d ∈ A_i do C_d ← C_d ∪ {e_i}; // For convenience, let e_i denote edge (s, v_i) //

Step 2: // set the values of a_i and b_i for 1 ≤ i ≤ m //
  for i ← 1 to m do
  begin
    a_i ← min{k | e_p(k) ∈ C_i};
    b_i ← max{k | e_p(k) ∈ C_i};
  end

Step 3: // find all minimal file cut sets //
  U ← ∅;
  for i ← 1 to m do U ← U ∪ {C_i};
  for 1 ≤ i, j ≤ m, i ≠ j do
    if (C_i ∈ U and a_i ≥ a_j and b_i ≤ b_j) then remove C_j from U; // which implies C_i ⊆ C_j //

Step 4: reorder the minimal file cut sets in U so that, for two distinct minimal file cut sets C_i and C_j, i < j if and only if a_i < a_j;

Step 5: // compute Pr[X(j + 1, b_i)], for 2 ≤ i ≤ r and a_{i-1} ≤ j ≤ a_i − 1, by Eq. (6) //
  Pr[X(a_1, b_1)] ← ∏_{k=a_1}^{b_1} q_p(k);
  for i ← 2 to r do // r is the number of minimal file cut sets in U //
  begin
    Pr[X(a_{i-1} + 1, b_i)] ← (1 / q_p(a_{i-1})) · Pr[X(a_{i-1}, b_{i-1})] · ∏_{k=b_{i-1}+1}^{b_i} q_p(k);
    for j ← a_{i-1} + 1 to a_i − 1 do Pr[X(j + 1, b_i)] ← (1 / q_p(j)) · Pr[X(j, b_i)];
  end

Step 6: // apply Theorem 1 and Eq. (7) to compute Pr(W_i) and Pr(F_j) //
  Pr(W_1) ← Pr[X(a_1, b_1)]; // boundary condition //
  for k ← 0 to b_1 − 1 do Pr(F_k) ← 0; // boundary condition //
  for i ← 2 to r do
  begin
    for k ← b_{i-1} to b_i − 1 do Pr(F_k) ← Pr(W_{i-1});
    Pr(W_i) ← Pr(W_{i-1}) + Σ_{j=a_{i-1}}^{a_i − 1} {[1 − Pr(F_{j-1})] · p_p(j) · Pr[X(j + 1, b_i)]};
  end

Step 7: DPR ← 1 − Pr(W_r); Output(DPR);

end Reliability_Star_DCS

Complexity analysis

The time complexity of Algorithm Reliability_Star_DCS is analyzed as follows. Step 1 takes $O(m + \sum_{i=1}^{n} |A_{p(i)}|) = O(m + t) = O(t)$ time (since $m \le t$) to identify all file cut sets, where $t$ denotes the total number of files in $D_s$. Step 2 requires $O(2 \sum_{i=1}^{m} |C_i|) = O(t)$ time to set $a_i$ and $b_i$, $1 \le i \le m$, and step 3 takes $O(m^2)$ time to obtain all minimal file cut sets. Step 4 requires the reordering of all minimal file cut sets in nondecreasing order of the index of their minimal component. This ordering can be executed in $O(r \cdot \log r)$ time using an efficient sorting algorithm, where $r$ denotes the number of minimal file cut sets. In step 5, evaluating $\Pr[X(j + 1, b_i)]$ by making use of Eq. (6) requires

$$\begin{cases} O\left(\sum_{i=2}^{r} [(b_i - b_{i-1}) + 2]\right) = O(b_r - b_1 + r) \subseteq O(n + r) & \text{for } j = a_{i-1}, \\ O\left(\sum_{i=2}^{r} (1)\right) = O(r - 1) = O(r) & \text{for } a_{i-1} < j \le a_i - 1. \end{cases}$$

Hence, the total time to evaluate all $\Pr[X(j + 1, b_i)]$ is $O(n + r)$. In step 6, computing all $\Pr(F_k)$ takes $O[\sum_{i=2}^{r} (b_i - b_{i-1})] = O(b_r - b_1) \subseteq O(n)$ time and computing all $\Pr(W_i)$ takes $O\{\sum_{i=2}^{r} [1 + (a_i - a_{i-1}) \cdot 3]\} = O[1 + 3 \cdot (a_r - a_1)] \subseteq O(n)$ time. Therefore, the total time in step 6 is $O(n)$. Clearly, step 7 takes constant time. Finally, the entire algorithm has time complexity $O[t + t + m^2 + r \cdot \log r + (n + r) + n]$. Since $t \le m \cdot n$ and $r \le m$, the complexity of Algorithm Reliability_Star_DCS is $O(m^2 + m \cdot n)$.

An illustrative example

To illustrate Algorithm Reliability_Star_DCS, consider the star DCS in Fig. 2, which has the consecutive file distribution property with the associated permutation $P = [3, 6, 4, 2, 5, 1, 7]$. (Section 3.2 shows how to identify the associated permutation when the star DCS has the consecutive file distribution property.) The overall procedure is as follows:

Step 1: The file cut sets are found to be

$C_1 = \{e_2, e_5\}$; $C_2 = \{e_1, e_5, e_7\}$; $C_3 = \{e_1, e_2, e_5\}$; $C_4 = \{e_3, e_6\}$; $C_5 = \{e_2, e_4, e_5\}$.

Step 2: According to the permutation $p(1) = 3$, $p(2) = 6$, $p(3) = 4$, $p(4) = 2$, $p(5) = 5$, $p(6) = 1$, $p(7) = 7$ and the results of Step 1, we have

$a_1 = 4$, $b_1 = 5$; $a_2 = 5$, $b_2 = 7$; $a_3 = 4$, $b_3 = 6$; $a_4 = 1$, $b_4 = 2$; $a_5 = 3$, $b_5 = 5$.

Step 3: Since $C_1 \subset C_3$ and $C_1 \subset C_5$, remove $C_3$ and $C_5$. Thus, the set of minimal file cut sets is $U = \{C_1, C_2, C_4\}$.

Step 4: Reorder the minimal file cut sets in such a manner that for $C_i$ and $C_j$, $i < j$ if and only if $a_i < a_j$, and we obtain

$C_1 = \{e_3, e_6\}$, $a_1 = 1$, $b_1 = 2$;

$C_2 = \{e_2, e_5\}$, $a_2 = 4$, $b_2 = 5$;

$C_3 = \{e_1, e_5, e_7\}$, $a_3 = 5$, $b_3 = 7$.

Step 5: By using Eq. (6), we have

$\Pr[X(1, 2)] = q_3 q_6$; $\Pr[X(2, 5)] = q_6 q_4 q_2 q_5$; $\Pr[X(3, 5)] = q_4 q_2 q_5$; $\Pr[X(4, 5)] = q_2 q_5$; and $\Pr[X(5, 7)] = q_5 q_1 q_7$.

Step 6: We use Theorem 1 and Eq. (7) to compute $\Pr(W_i)$ and $\Pr(F_k)$ for $2 \le i \le 3$ and $b_{i-1} \le k \le b_i - 1$, and obtain

$\Pr(W_1) = q_3 q_6$; $\Pr(F_0) = \Pr(F_1) = 0$ (boundary condition).

$i = 2$: $\Pr(F_2) = \Pr(F_3) = \Pr(F_4) = \Pr(W_1) = q_3 q_6$,

$\Pr(W_2) = \Pr(W_1) + [1 - \Pr(F_0)] \cdot p_3 \cdot \Pr[X(2, 5)]$ ($j = 1$) $+ [1 - \Pr(F_1)] \cdot p_6 \cdot \Pr[X(3, 5)]$ ($j = 2$) $+ [1 - \Pr(F_2)] \cdot p_4 \cdot \Pr[X(4, 5)]$ ($j = 3$)

$= q_3 q_6 + p_3 q_6 q_4 q_2 q_5 + p_6 q_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_4 q_2 q_5$.

$i = 3$: $\Pr(F_5) = \Pr(W_2)$,

$\Pr(W_3) = \Pr(W_2) + [1 - \Pr(F_3)] \cdot p_2 \cdot \Pr[X(5, 7)]$ ($j = 4$)

$= q_3 q_6 + p_3 q_6 q_4 q_2 q_5 + p_6 q_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_2 q_5 q_1 q_7$.

Step 7: Therefore, DPR is

$$\mathrm{DPR} = 1 - \Pr(W_3) = 1 - \{q_3 q_6 + p_3 q_6 q_4 q_2 q_5 + p_6 q_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_4 q_2 q_5 + (1 - q_3 q_6) \cdot p_2 q_5 q_1 q_7\}.$$
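The recurrence of Theorem 1 and Eq. (7) can be checked numerically on this example. The Python sketch below is a simplified illustration rather than a transcription of Algorithm Reliability_Star_DCS: the function name `star_dpr` and the failure probabilities are assumptions, and $\Pr[X(i, j)]$ is recomputed as a plain product instead of incrementally through Eq. (6), so the sketch does not attain the stated time bound. It also assumes the supplied ordering really has the consecutive property, so that every cut set is an interval of positions.

```python
def star_dpr(q, perm, cut_sets):
    """DPR of a star DCS with the consecutive file distribution property.

    q[e]     : failure probability of edge e (edges are numbered 1..n)
    perm     : the ordering [p(1), ..., p(n)] of edge numbers
    cut_sets : one set of edge numbers per file (the file cut sets)
    """
    pos = {e: k for k, e in enumerate(perm, start=1)}
    # Steps 1-2: each cut set becomes an interval (a, b) of positions in perm.
    intervals = sorted({(min(pos[e] for e in c), max(pos[e] for e in c))
                        for c in cut_sets})
    # Steps 3-4: drop intervals that contain another one; the rest stay sorted by a.
    minimal = [x for x in intervals
               if not any(y != x and x[0] <= y[0] and y[1] <= x[1]
                          for y in intervals)]

    def fail_all(i, j):
        """Pr[X(i, j)]: every edge at positions i..j fails."""
        prob = 1.0
        for k in range(i, j + 1):
            prob *= q[perm[k - 1]]
        return prob

    a1, b1 = minimal[0]
    w = fail_all(a1, b1)                 # Pr(W_1)
    f = [0.0] * (len(perm) + 1)          # f[k] = Pr(F_k); zero for k < b_1
    for (a_prev, b_prev), (a, b) in zip(minimal, minimal[1:]):
        for k in range(b_prev, b):       # Eq. (7)
            f[k] = w
        # Theorem 1: sum over the position j of the last working edge.
        w += sum((1.0 - f[j - 1]) * (1.0 - q[perm[j - 1]]) * fail_all(j + 1, b)
                 for j in range(a_prev, a))
    return 1.0 - w

# Cut sets and ordering from the example; the q values are hypothetical.
q = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.15, 5: 0.25, 6: 0.05, 7: 0.35}
perm = [3, 6, 4, 2, 5, 1, 7]
cut_sets = [{2, 5}, {1, 5, 7}, {1, 2, 5}, {3, 6}, {2, 4, 5}]
dpr = star_dpr(q, perm, cut_sets)
```

For any choice of failure probabilities, the returned value should agree with the closed-form expression obtained in Step 7.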

3.2. A linear-time algorithm for testing the consecutive file distribution property in a star DCS

The previous section presented a polynomial-time algorithm for computing the DPR of a star DCS when it has the consecutive file distribution property. In this section, we show how to determine whether or not a star DCS has the consecutive file distribution property. The problem statement is:

Input: A star DCS $D_s$ with $n + 1$ nodes $s, v_1, v_2, \ldots, v_n$ and file distributions $A_i$, $1 \le i \le n$.

Output: A permutation $P = [p(1), p(2), \ldots, p(n)]$ of the numbers $\{1, 2, \ldots, n\}$ such that if $f_d \in A_{p(i)}$ and $f_d \in A_{p(j)}$, then $f_d \in A_{p(k)}$ for all $k$, $i < k < j$.

Notably, a solution does not always exist. To facilitate the search for a correct ordering $P$, we use the PQ-tree data structure proposed by Booth and Lueker [8]. A PQ-tree is a rooted tree that has nodes of two varieties: P-nodes and Q-nodes. A P-node is a node whose children can be arbitrarily permuted. A Q-node is a node whose children are ordered and may only be reversed. The frontier of a PQ-tree is the permutation of its leaves read from left to right. Two PQ-trees are equivalent if and only if one can be transformed into the other by applying a sequence of the following transformation rules:

· arbitrarily permute the children of a P-node,

· reverse the children of a Q-node.

By using the PQ-tree data structure, we have the following algorithm.

Algorithm Check_Consecutive_File_Distribution

Input: A star DCS D_s with n + 1 nodes s, v_1, v_2, ..., v_n, n edges e_1, e_2, ..., e_n, where e_i = (s, v_i) for 1 ≤ i ≤ n, and file available set A_i = {f_j | for each f_j stored in node v_i} for 1 ≤ i ≤ n.

Output: A permutation P = [p(1), p(2), ..., p(n)] of numbers {1, 2, ..., n} such that if file f_d ∈ A_p(i) and f_d ∈ A_p(j), then f_d ∈ A_p(k) for all k, i < k < j.

begin
  T ← universal tree; // a single P-node connected to all the leaf nodes of {1, 2, ..., n} //
  for j ← 1 to m do A_j^{-1} ← ∅; // m denotes the number of distinct files in D_s //
                                  // A_j^{-1} is the set of indexes of nodes which contain the file f_j //
  for i ← 1 to n do
    for each f_j ∈ A_i do A_j^{-1} ← A_j^{-1} ∪ {i};
  for j ← 1 to m do T ← REDUCE(T, A_j^{-1});
  if T is a null tree
  then
    print out "D_s has no consecutive file distribution property";
  else
    print out the frontier of T;
end Check_Consecutive_File_Distribution

The routine REDUCE attempts to apply a set of eleven templates. Each template consists of a pattern to be matched against the current PQ-tree and the set $A_j^{-1}$, and a replacement to be substituted for the pattern. The templates are applied from the bottom to the top of the tree. Notably, the null tree is returned when no template can be applied. For brevity, the details are omitted herein; they can be found in Booth and Lueker [8].

Complexity analysis

The sets $A_j^{-1}$, $1 \le j \le m$, can be obtained in $O(m + \sum_{i=1}^{n} |A_i|)$ steps. According to [8], the loop of REDUCE calls can be computed in $O(m + n + \sum_{j=1}^{m} |A_j^{-1}|)$ steps. Furthermore, it is easy to verify that $\sum_{i=1}^{n} |A_i| = \sum_{j=1}^{m} |A_j^{-1}| = t$ (the total number of files in $D_s$). Therefore, the time complexity of the above algorithm is $O(m + t) + O(m + n + t) = O(m + n + t)$.

An illustrative example

Consider the star DCS $D_s$ shown in Fig. 2. Applying the above algorithm leads to

$A_1^{-1} = \{2, 5\}$; $A_2^{-1} = \{1, 5, 7\}$; $A_3^{-1} = \{1, 2, 5\}$; $A_4^{-1} = \{3, 6\}$; $A_5^{-1} = \{2, 4, 5\}$.

Fig. 3 displays the reduction steps. In an illustration of a PQ-tree, a P-node is drawn as a circle and a Q-node as a rectangle. From this figure, we can conclude that the star DCS $D_s$ of Fig. 2 has the consecutive file distribution property and that one of the associated permutations is $P = [3, 6, 4, 2, 5, 1, 7]$.
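The result of this example can be cross-checked without a PQ-tree. The Python sketch below is a brute-force stand-in, not the linear-time algorithm of Booth and Lueker: it simply tries orderings until every set $A_j^{-1}$ occupies contiguous positions, which takes exponential time in general and is only practical for small $n$. The function names are hypothetical.

```python
from itertools import permutations

def is_contiguous(inv, perm):
    """True if every set in `inv` occupies consecutive positions in `perm`."""
    pos = {node: k for k, node in enumerate(perm)}
    return all(max(pos[v] for v in s) - min(pos[v] for v in s) + 1 == len(s)
               for s in inv)

def find_consecutive_order(inv, n):
    """Return some ordering of nodes 1..n in which every (non-empty) set of
    `inv` is contiguous, or None if the property does not hold."""
    for perm in permutations(range(1, n + 1)):
        if is_contiguous(inv, perm):
            return list(perm)
    return None

# The sets A_j^{-1} of the example (nodes holding each file).
inv = [{2, 5}, {1, 5, 7}, {1, 2, 5}, {3, 6}, {2, 4, 5}]
order = find_consecutive_order(inv, 7)
paper_order_ok = is_contiguous(inv, [3, 6, 4, 2, 5, 1, 7])
```

The ordering found this way need not equal the permutation read from the frontier of the PQ-tree, since several orderings can satisfy the property.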

3.3. Linear DCS's

In this section, we extend the results of Section 3.1 to compute the DPR of linear DCS's. Consider a linear DCS $D_l$ with $n + 1$ nodes $\{v_0, v_1, v_2, \ldots, v_n\}$ and $n$ edges $\{e_1 = (v_0, v_1), e_2 = (v_1, v_2), \ldots, e_n = (v_{n-1}, v_n)\}$. Let $I_i$ be the minimal file path set which starts at edge $e_i$. Notably, a linear DCS has a consecutive property resembling that of a star DCS: for each minimal file path set $I$, if $e_i \in I$ and $e_j \in I$ then $e_k \in I$ for all $k$, $i < k < j$. The reliability of a linear DCS can be expressed as Prob{at least one minimal file path set $I$ whose edges all function}, and the unreliability of a star DCS with the consecutive file distribution property can be expressed as Prob{at least one minimal file cut set $C$ whose edges all fail}. Owing to this duality, a simple relationship exists between a linear DCS and a star DCS with the consecutive file distribution property. The relationship is summarized in Table 1.

According to the mirror image described in Table 1, if we let $W_i = U_i$, $a_i = p(i) = i$, $p_i = q_i$, $\Pr(F_i) = \Pr(R_i)$, and $X(i, b_i) = Y_i$ in Theorem 1, then the following theorem can be readily obtained to compute the reliability of a linear DCS $D_l$.

Theorem 2. For $2 \le i \le n$:

$$\Pr(U_i) = \Pr(U_{i-1}) + [1 - \Pr(R_{i-2})] \cdot q_{i-1} \cdot \Pr(Y_i)$$

with the boundary conditions $\Pr(U_1) = \Pr(Y_1)$ and $\Pr(R_j) = 0$ for $0 \le j < b_1$.

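In addition, $\Pr(Y_i)$ and $\Pr(R_j)$ can be easily obtained from Eq. (6) as follows.

$$\Pr(Y_i) = \begin{cases} \dfrac{1}{p_{i-1}} \cdot \Pr(Y_{i-1}) \cdot \prod_{j=b_{i-1}+1}^{b_i} p_j & \text{for } b_i \le n, \\[2ex] 0 & \text{for } b_i = \infty, \end{cases} \qquad (8)$$

with the boundary condition $\Pr(Y_1) = \prod_{j=1}^{b_1} p_j$, and

$$\Pr(R_j) = \begin{cases} 0 & \text{for } 0 \le j \le b_1 - 1, \\ \Pr(U_i) & \text{for } b_i \le j \le b_{i+1} - 1. \end{cases} \qquad (9)$$

Next, the complete algorithm for computing the reliability of a linear DCS is presented as follows.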

Algorithm Reliability_Linear_DCS

Input: A linear DCS D_l with n + 1 nodes {v_0, v_1, v_2, ..., v_n} and n edges {e_1 = (v_0, v_1), e_2 = (v_1, v_2), ..., e_n = (v_{n-1}, v_n)}
       A_i: the set of files available at node v_i.

Output: the DPR of D_l

begin

Step 1: // find all b_i's //
  for i ← 1 to m do NF_i ← 0; // NF_i is the number of copies of file f_i between v_h and v_k //
  for each f_i ∈ A_0 do NF_i ← 1;
  h ← 0; // h and k are two indexes moving among nodes //
  for k ← 1 to n do
  begin
    for each file f_i ∈ A_k do NF_i ← NF_i + 1; // update the count of file f_i for node v_k //
    MFPS ← true; // if there is a minimal file path set between v_h and v_k, then MFPS = true //
    while MFPS do
    begin
      for i ← 1 to m do if NF_i = 0 then MFPS ← false; // check if there exists a minimal file path set //
      if MFPS then
      begin
        for each file f_i ∈ A_h do NF_i ← NF_i − 1;
        h ← h + 1; b_h ← k;
      end
    end
  end
  for i ← h + 1 to n do b_i ← ∞;

Step 2: // compute Pr(Y_i) by Eq. (8) //
  Pr(Y_1) ← ∏_{j=1}^{b_1} p_j; // boundary condition //
  for i ← 2 to n do
  begin
    if b_i ≤ n then Pr(Y_i) ← (1 / p_{i-1}) · Pr(Y_{i-1}) · ∏_{j=b_{i-1}+1}^{b_i} p_j
    else Pr(Y_i) ← 0
  end

Step 3: // apply Theorem 2 and Eq. (9) to compute Pr(U_i) and Pr(R_j) //
  for i ← 0 to b_1 − 1 do Pr(R_i) ← 0; // boundary condition //
  Pr(U_1) ← Pr(Y_1); // boundary condition //
  for i ← b_1 to b_2 − 1 do Pr(R_i) ← Pr(U_1);
  for i ← 2 to n do
  begin
    Pr(U_i) ← Pr(U_{i-1}) + [1 − Pr(R_{i-2})] · q_{i-1} · Pr(Y_i);
    for j ← b_i to b_{i+1} − 1 do Pr(R_j) ← Pr(U_i);
  end

Step 4: DPR ← Pr(U_n); Output(DPR);

end Reliability_Linear_DCS

Table 1
The relationship between a linear DCS and a star DCS with the consecutive file distribution

Star DCS D_s with the consecutive file distribution ↔ Linear DCS D_l
minimal file cut set C ↔ minimal file path set I
q_i ≡ probability that edge e_i fails ↔ p_i ≡ probability that edge e_i functions
[p(1), p(2), ..., p(n)], a permutation such that if file f_d ∈ A_p(i) and f_d ∈ A_p(j), then f_d ∈ A_p(k) for all k, i < k < j ↔ [p(1), p(2), ..., p(n)] = (1, 2, ..., n)

Complexity analysis

For step 1, the computational complexity of finding the $b_i$'s is $O(nm)$, since the value of $h$ in the inner while loop increases monotonically and does not exceed the value of $k$, i.e. the index of the outer for loop. Computing $\Pr(Y_i)$ in step 2 is similar to computing $\Pr[X(j, b_i)]$ in step 5 of Algorithm Reliability_Star_DCS. Thus, the complexity of step 2 is $O(n + n) = O(n)$. Step 3, which is the same as step 6 of Algorithm Reliability_Star_DCS, can be computed in $O(n)$. Therefore, the algorithm Reliability_Linear_DCS takes $O(nm) + O(n) + O(n) = O(nm)$ time.
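The three computational steps of Algorithm Reliability_Linear_DCS can be sketched in Python as follows. This is a sketch under stated assumptions: the function name `linear_dpr` is hypothetical, $\Pr(Y_i)$ is recomputed as a plain product rather than through Eq. (8), a node that holds every required file by itself is treated as a special case returning 1, and the file placement is invented (Fig. 4 is not reproduced in the text).

```python
def linear_dpr(p, A, needed):
    """DPR of a linear DCS v_0 - v_1 - ... - v_n.

    p[i]   : probability that edge e_i = (v_{i-1}, v_i) works, i = 1..n (p[0] unused)
    A[i]   : set of files stored at node v_i, i = 0..n
    needed : set of files that must all be reachable over working edges
    """
    n = len(p) - 1
    if any(needed <= files for files in A):
        return 1.0                       # one node already holds everything
    INF = float("inf")
    # Step 1: sliding window over nodes v_h..v_k; b[i] is the last edge of the
    # minimal file path set that starts at edge e_i.
    b = [INF] * (n + 2)
    count = {f: 0 for f in needed}
    for f in A[0] & needed:
        count[f] += 1
    h = 0
    for k in range(1, n + 1):
        for f in A[k] & needed:
            count[f] += 1
        while h < k and all(c > 0 for c in count.values()):
            for f in A[h] & needed:
                count[f] -= 1
            h += 1
            b[h] = k

    def path_works(i):
        """Pr(Y_i): edges e_i..e_{b_i} all work (0 if no path set starts at e_i)."""
        if b[i] == INF:
            return 0.0
        prob = 1.0
        for j in range(i, b[i] + 1):
            prob *= p[j]
        return prob

    # Steps 2-3: Theorem 2 with Eq. (9).
    u = path_works(1)                    # Pr(U_1)
    r = [0.0] * (n + 1)                  # r[j] = Pr(R_j); zero for j < b_1
    for i in range(2, n + 1):
        if b[i - 1] == INF:
            break                        # no further path sets exist
        last = n if b[i] == INF else b[i] - 1
        for j in range(b[i - 1], last + 1):
            r[j] = u                     # Pr(R_j) = Pr(U_{i-1})
        u += (1.0 - r[i - 2]) * (1.0 - p[i - 1]) * path_works(i)
    return u

# Hypothetical placement on six nodes and five edges.
p = [None, 0.9, 0.8, 0.7, 0.6, 0.5]
A = [{"f1", "f2", "f3"}, {"f4"}, {"f1"}, {"f3"}, {"f2", "f4"}, {"f1"}]
dpr = linear_dpr(p, A, {"f1", "f2", "f3", "f4"})
```

With this placement the minimal file path sets are $\{e_1\}$, $\{e_2, e_3, e_4\}$, $\{e_3, e_4\}$ and $\{e_4, e_5\}$, so the result should equal $p_1 + q_1 p_3 p_4 + q_1 q_3 p_4 p_5$.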

An illustrative example

Consider the linear DCS $D_l$ in Fig. 4. Applying the algorithm Reliability_Linear_DCS yields:

Step 1:

$b_1 = 1$; $b_2 = 4$; $b_3 = 4$; $b_4 = 5$; $b_5 = \infty$.

Step 2:

$\Pr(Y_1) = p_1$; $\Pr(Y_2) = p_2 \cdot p_3 \cdot p_4$; $\Pr(Y_3) = p_3 \cdot p_4$; $\Pr(Y_4) = p_4 \cdot p_5$; $\Pr(Y_5) = 0$.

Step 3:

$\Pr(R_0) = 0$;

$\Pr(U_1) = p_1$; $\Pr(R_1) = \Pr(R_2) = \Pr(R_3) = \Pr(U_1) = p_1$;

$i = 2$: $\Pr(U_2) = \Pr(U_1) + [1 - \Pr(R_0)] \cdot q_1 \cdot \Pr(Y_2) = p_1 + q_1 p_2 p_3 p_4$

$i = 3$: $\Pr(U_3) = \Pr(U_2) + [1 - \Pr(R_1)] \cdot q_2 \cdot \Pr(Y_3) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4$

$\Pr(R_4) = \Pr(U_3) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4$

$i = 4$: $\Pr(U_4) = \Pr(U_3) + [1 - \Pr(R_2)] \cdot q_3 \cdot \Pr(Y_4) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$

$\Pr(R_5) = \Pr(U_4) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$

$i = 5$: $\Pr(U_5) = \Pr(U_4) + [1 - \Pr(R_3)] \cdot q_4 \cdot \Pr(Y_5) = \Pr(U_4)$ (since $\Pr(Y_5) = 0$) $= p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$

Step 4: Therefore, DPR is $\Pr(U_5) = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$.

3.4. Ring DCS's

A ring DCS is a DCS whose communication links form a cycle: each node is connected to its two neighboring nodes by two edges. Assume that $D_r$ is a DCS with a ring structure. According to the well-known factoring theorem [7], the DPR of $D_r$ is obtained as follows:

$$\mathrm{DPR}(D_r) = p_i \cdot \mathrm{DPR}(D_r * e_i) + q_i \cdot \mathrm{DPR}(D_r - e_i), \qquad (10)$$

where $e_i$ is an arbitrary edge of $D_r$. Since $D_r - e_i$ is a DCS with a linear structure with $n - 1$ edges, its reliability can be computed by the algorithm Reliability_Linear_DCS in $O(nm)$ time. Notably, $D_r * e_i$ remains a DCS with a ring structure with $n - 1$ edges. The same analysis is then applied to $D_r * e_i$. By recursively applying Eq. (10), we decompose the ring DCS $D_r$ with $n$ edges into, in the worst case, $n$ linear DCSs. Therefore, we have an $O(n^2 m)$ time algorithm for computing the reliability of a DCS with a ring structure.

Algorithm Reliability_Ring_DCS(D_r)

Step 1: if there exists one node that holds all distinct data files then Return(DPR ← 1);
Step 2: Select an arbitrary edge e_i of D_r;
Step 3: Rell ← Reliability_Linear_DCS(D_r − e_i);
Step 4: Relr ← Reliability_Ring_DCS(D_r * e_i);
Step 5: Return(DPR ← p_i · Relr + q_i · Rell);

end Reliability_Ring_DCS

An illustrative example

Consider the DCS with a ring topology in Fig. 5. This is a modification of the DCS in Fig. 4 with one edge $e_6$ added between nodes $v_5$ and $v_0$. Applying algorithm Reliability_Ring_DCS yields

$$\mathrm{DPR}(D_r) = q_6 \cdot \mathrm{DPR}(D_r - e_6) + p_6 \cdot \mathrm{DPR}(D_r * e_6) = q_6 \cdot \mathrm{DPR}(D_r - e_6) + p_6 \cdot [q_5 \cdot \mathrm{DPR}(D_r * e_6 - e_5) + p_5 \cdot \mathrm{DPR}(D_r * e_6 * e_5)].$$

Since there exists one node in $D_r * e_6 * e_5$ that holds all distinct data files $\{f_1, f_2, f_3, f_4\}$, we have $\mathrm{DPR}(D_r * e_6 * e_5) = 1$. The example in Section 3.3 reveals that $\mathrm{DPR}(D_r - e_6) = \Pr(U_5)$ and $\mathrm{DPR}(D_r * e_6 - e_5) = \Pr(U_4)$. Therefore, we have

$$\mathrm{DPR}(D_r) = q_6 \cdot (p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5) + p_6 \cdot [q_5 \cdot (p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4) + p_5].$$
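The factoring recursion of Eq. (10) can be sketched in Python as below. Several parts are assumptions made for illustration: the function names are hypothetical, the linear subproblem is solved by a brute-force enumeration standing in for Algorithm Reliability_Linear_DCS (so the sketch does not attain the $O(n^2 m)$ bound), the edge factored on is always the one that closes the ring, and the file placement is invented because Fig. 5 is not reproduced in the text.

```python
from itertools import product

def linear_dpr_brute(p, A, needed):
    """Stand-in for Reliability_Linear_DCS: enumerate all edge states of a
    chain. p[i] is the probability that the edge between nodes i and i+1 works."""
    total = 0.0
    for up in product([True, False], repeat=len(p)):
        prob = 1.0
        for ok, pi in zip(up, p):
            prob *= pi if ok else 1.0 - pi
        held = set(A[0])                 # files held by the current working run
        success = needed <= held
        for i, ok in enumerate(up):
            held = (held | A[i + 1]) if ok else set(A[i + 1])
            success = success or needed <= held
        if success:
            total += prob
    return total

def ring_dpr(p, A, needed):
    """DPR of a ring; p[i] is the probability that the edge between nodes i
    and (i + 1) mod n works, and A[i] is the set of files at node i."""
    if any(needed <= files for files in A):       # Step 1
        return 1.0
    if len(A) == 1:
        return 0.0                                # nothing left to contract
    pe = p[-1]                                    # Step 2: edge closing the ring
    rel_l = linear_dpr_brute(p[:-1], A, needed)   # Step 3: delete the edge
    merged = [A[0] | A[-1]] + A[1:-1]             # Step 4: contract the edge
    rel_r = ring_dpr(p[:-1], merged, needed)
    return pe * rel_r + (1.0 - pe) * rel_l        # Step 5, Eq. (10)

# Hypothetical ring of six nodes v_0..v_5; p lists e_1..e_5 and then e_6 = (v_5, v_0).
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
A = [{"f1", "f2", "f3"}, {"f4"}, {"f1"}, {"f3"}, {"f2", "f4"}, {"f1"}]
dpr = ring_dpr(p, A, {"f1", "f2", "f3", "f4"})
```

Replacing `linear_dpr_brute` with an implementation of the linear algorithm would leave the recursion unchanged.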

4. Conclusions

This paper elucidates the distributed program reliability of various classes of distributed computing systems. This reliability is computationally intractable for arbitrary distributed computing systems, even when the problem is restricted to the class of star distributed computing systems. A particular solvable case for star distributed computing systems is identified, in which data files are distributed with respect to a consecutive property, and a polynomial-time algorithm is developed for this case. Also proposed herein is a linear-time algorithm to verify whether or not an arbitrary star distributed computing system has this consecutive file distribution property. Furthermore, the results for star DCS's are applied to obtain the reliability of linear and ring DCS's in polynomial time. Future work should attempt to construct efficient algorithms for computing lower and upper bounds on the distributed program reliability of arbitrary distributed computing systems.

References

[1] P. Enslow, What is a distributed data processing system, Computer, vol. 11, Jan. 1978.

[2] J. Garcia-Molina, Reliability issues for fully replicated distributed database, IEEE Trans. Computer 16 (1982) 34–42.

[3] A. Satyanarayana, J.N. Hagstrom, A new algorithm for the reliability analysis of multi-terminal networks, IEEE Trans. on Reliability 30 (1981) 325–334.

[4] A. Kumar, S. Rai, D.P. Agrawal, On computer communication network reliability under program execution constraints, IEEE JSAC 6 (1988) 1393–1399.

[5] V.K.P. Kumar, S. Hariri, C.S. Raghavendra, Distributed program reliability analysis, IEEE Trans. Software Eng. 12 (1986) 42–50.

[6] M.S. Lin, D.J. Chen, The computational complexity of the reliability problem on distributed systems, Information Processing Letters 64 (1997) 143–147.

[7] L.G. Valiant, The complexity of enumeration and reliability problems, SIAM J. Computing 8 (1979) 410–421.

[8] K.S. Booth, G.S. Lueker, Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms, Journal of Computer and System Sciences 13 (1976) 335–379.

Fig. 1. A simple DCS.
Fig. 2. A star DCS with the consecutive file distribution property.
Fig. 3. The reduction steps by using a PQ-tree.
Fig. 4. A DCS with a linear structure.