The Distributed Program Reliability Analysis on Ring-Type Topologies

環狀拓樸邏輯分散式程式可靠度分析

Ming-Sang Chang (1)
Department of Information Management, Lunghwa University of Science and Technology
300, Wan Shou Rd., Sec. 1, Kueishan, Taoyuan, Taiwan, R.O.C.
Email: [email protected]

Min-Sheng Lin
Department of Electric Engineering, National Taipei University of Technology
1, Sec. 3, Chung Hsiao E. Rd., Taipei, Taiwan, R.O.C.
Email: [email protected]

Deng-Jyi Chen
Institute of Computer Science and Information Engineering, National Chiao Tung University
1001 Ta Hsueh Road, Hsin Chu, Taiwan, R.O.C.
Email: [email protected]

(1) All correspondence should be sent to Ming-Sang Chang ([email protected])

Abstract

Distributed computing systems (DCS) have become very popular for their high fault tolerance, potential for parallel processing, and better reliability performance. One of the important issues in the design of a DCS is its reliability performance. Distributed Program Reliability (DPR) is a measure that addresses this issue.

In this paper, we propose a polynomial-time algorithm for computing the DPR of ring topologies and show that solving the DPR problem on a ring of trees topology is NP-hard.

Keywords: Distributed Program Reliability, Minimal File Spanning Tree, Algorithm, Ring of Tree

1. Introduction

Distributed computing systems (DCS) have become very popular for their fault tolerance, potential for parallel processing, and better reliability performance. One of the important issues in the design of a DCS is its reliability performance. Distributed program reliability is a measure that addresses this issue [1-4].

An efficient network topology is quite important for a distributed computing system. The ring topology is a popular one in high-speed networks. It has been considered for the IEEE 802.5 token ring, for the fiber distributed data interface (FDDI) token ring, for the synchronous optical network (SONET), and for asynchronous transfer mode (ATM) networks. The ring network has been widely used in current distributed system design.

In a ring of tree topology, a ring is used to connect each tree topology in the network. This architecture can be used in FDDI that consists of (1) a tree of wiring concentrators and terminal stations, and (2) a counter-rotating dual ring [5].

A large amount of work has been devoted to developing algorithms to compute measures of reliability for a DCS. One typical reliability measure for a DCS is the K-terminal reliability (KTR) [6-8]. KTR is the probability that a specified set of nodes K, which is a subset of all the nodes in a DCS, remains connected in a DCS whose edges may fail independently of each other, with known probabilities. However, the KTR measure is not applicable to a practical DCS, since a reliability measure for a DCS should capture the effects of redundant distribution of programs and data files. In [1-4], distributed program reliability (DPR) was introduced to accurately model the reliability of a DCS. For successful execution of a distributed program, it is essential that the node containing the program, the other nodes that hold the required data files, and the edges between them be operational. DPR is thus defined as the probability that a program with distributed files can run successfully in spite of some faults occurring in the edges. In effect, the DPR problem is a logical OR-ing of Prob{K-terminals are connected} events, but computing the conditional probabilities this requires can be quite involved.

In this paper, we propose a polynomial-time algorithm to analyze the DPR of a ring topology and show that solving the DPR problem on a ring of tree topology is NP-hard.

2. Notation and Definitions

Notation

D=(V, E, F)   an undirected Distributed Computing System (DCS) graph with vertex set V, edge set E and data file set F.

FA_i   set of files available at node i. (Note: F = ∪_i FA_i)

p_i   reliability of edge i

q_i   1 − p_i

H   subset of files of F, i.e., H ⊆ F; H contains the programs to be executed and all needed data files for the execution of these programs

R(D_H)   the DPR of D with a set H of needed files: Pr{all data files in H can be accessed successfully by the executed programs in H}.

Definition: A File Spanning Tree (FST) is a tree whose nodes hold all needed files in H.

Definition: A Minimal File Spanning Tree (MFST) is an FST such that there exists no other FST that is a subset of it.

Definition: Distributed program reliability (DPR) is defined as the probability that a distributed program, which runs on multiple processing elements (PEs) and needs to communicate with other PEs for remote files, will be executed successfully.

By the definition of MFST, the DPR can be written as

R(D_H) = Prob(at least one MFST is operational), or

$$R(D_H) = \mathrm{Prob}\Big( \bigcup_{j=1}^{\#mfst} MFST_j \Big)$$

where #mfst is the number of MFSTs for a given needed file set H.
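The definition can be checked directly on small systems by enumerating every up/down pattern of the edges. The sketch below is only an illustration of the definition, not an algorithm from this paper: it relies on the equivalent reading that some MFST is operational exactly when the working edges connect a group of nodes that together hold every file in H. The function names and the dictionary/tuple input format are assumptions made for this example, and the running time is exponential in |E|.

```python
from itertools import product


def _find(parent, v):
    """Union-find lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def dpr_brute_force(node_files, edges, needed):
    """Exhaustive DPR of a small DCS.

    node_files: dict mapping node -> set of files held there (FA_i)
    edges:      list of (u, v, p) with p the reliability of edge (u, v)
    needed:     the set H of files the program requires
    """
    total = 0.0
    for state in product((0, 1), repeat=len(edges)):
        prob = 1.0
        parent = {v: v for v in node_files}
        for up, (u, v, p) in zip(state, edges):
            prob *= p if up else 1.0 - p      # edges fail independently
            if up:
                parent[_find(parent, u)] = _find(parent, v)
        # pool the files inside each connected component of working edges
        pooled = {}
        for v, files in node_files.items():
            pooled.setdefault(_find(parent, v), set()).update(files)
        if any(needed <= files for files in pooled.values()):
            total += prob
    return total


# Hypothetical three-node chain: a holds f1, c holds f2, b is a relay.
chain = {"a": {"f1"}, "b": set(), "c": {"f2"}}
print(dpr_brute_force(chain, [("a", "b", 0.9), ("b", "c", 0.8)], {"f1", "f2"}))
```

In this toy chain both edges must work, so the result is the product of the two edge reliabilities.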

3. Computing DPR Over a DCS with a Ring Topology

Now, we consider a DCS with a linear structure D=(V, E, F) with |E|=n edges in which an alternating sequence of distinct nodes and edges (v_0, e_1, v_1, e_2, ..., v_{n−1}, e_n, v_n) is given. For 1 ≤ i ≤ n, let

I_i   the FST which starts at edge e_i and has the minimal length

S_i   the event that all edges in I_i are working

Q_i ≡ ∏_{all edges j ∈ I_i} p_j   the probability that S_i occurs

E_i   the event that there exists an operating event S_j between edges e_1 and e_i

g_i   the number of I_j which lie between e_1 and e_i

x_i   state of edge e_i; x_i=0 if edge e_i fails; else x_i=1

Ā   the complement of event A.

It is easy to see that the DPR of a DCS with a linear structure D with |E|=n edges, R(D_H), can be stated as Pr(E_n). The following theorem provides a recursive method for computing Pr(E_n).

Theorem 1.

$$\Pr(E_n) = \Pr(E_{n-1}) + \sum_{i=g_{n-1}+1}^{g_n} \big[ (1 - \Pr(E_{i-2})) \cdot q_{i-1} \cdot Q_i \big]$$

with the boundary conditions Pr(E_i) = 0, g_i = 0, and p_i = 0, for i ≤ 0.

Proof. See the appendix.

Before applying Theorem 1, we use the following procedure COMGQ to compute the values of g_i and Q_i, for 1 ≤ i ≤ n, for a given linear DCS with |E|=n edges.

Procedure COMGQ

// Given a DCS with a linear structure with the alternating sequence of distinct nodes and edges //
// (v_0, e_1, v_1, e_2, ..., v_{n−1}, e_n, v_n), //
// F: the set of files (including data files and programs) distributed in D; //
// H: the set of files that must be communicated with each other through the edges in D; //
// FA_i: the set of files available at node v_i, for 0 ≤ i ≤ n; and //
// this procedure computes the values of g_i and Q_i, for 1 ≤ i ≤ n. //
// h (head) and t (tail) are two indexes moving among nodes. NF_i is the total number of file i //
// between nodes v_h and v_t. If there exists a FST between nodes v_h and v_t then flag=true //
// else flag=false //

begin
    for 2 ≤ i ≤ n do Q_i ← 0 repeat                // initialize //
    p_0 ← Q_1 ← 1                                   // initialize //
    h ← 0; t ← 1                                    // initialize //
    for each file i ∈ F do                          // initialize //
        if file i ∈ FA_h then NF_i ← 1
        else NF_i ← 0
        endif
    repeat
    while t ≤ n do
        for each file i ∈ FA_t do NF_i ← NF_i+1 repeat
        Q_{h+1} ← Q_{h+1} * p_t
        flag ← true
        while flag do
            for each file i ∈ H do                  // check if there exists a FST between //
                if NF_i=0 then flag ← false endif   // nodes v_h and v_t //
            repeat
            if flag then
                for each file i ∈ FA_h do NF_i ← NF_i−1 repeat
                h ← h+1
                Q_{h+1} ← Q_h / p_h
            endif
        repeat
        g_t ← h
        t ← t+1
    repeat
    for 1 ≤ i ≤ n do output(g_i, Q_i) repeat
end COMGQ

Now, using the procedure COMGQ and Theorem 1, we are able to provide an algorithm for computing the reliability of a DCS with a linear structure.

Algorithm Reliability_Linear_DCS(D)

// Given a DCS with a linear structure D=(V, E, F) with |E|=n and a specified set of files H, //
// this algorithm returns the DPR of D //

Step 1: Call COMGQ to compute the values of g_i and Q_i, 1 ≤ i ≤ n.

Step 2: Evaluate Pr(E_n), recursively using Theorem 1.

Step 3: Return (Pr(E_n)).

end Reliability_Linear_DCS
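The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `comgq_direct` finds g_i and Q_i by a plain scan from each starting edge rather than by the head/tail sweep of COMGQ, so it does not achieve the O(|E||F|) bound; the early return for a single node that already holds all of H is an added guard; and the file placement in the usage example is hypothetical, chosen only so that it reproduces the g_i and Q_i values reported for the banking example in this section.

```python
from math import prod


def comgq_direct(files, p, needed):
    """Return (g, Q) for a chain v_0..v_n.

    files[k] is the file set of node v_k; p[k] is the reliability of
    edge e_k for k = 1..n (p[0] is an unused placeholder).
    """
    n = len(files) - 1
    end = [None] * (n + 1)      # end[i] = index of the last edge of I_i
    Q = [0.0] * (n + 1)         # Q[i] stays 0 when I_i does not exist
    for i in range(1, n + 1):
        have = set(files[i - 1])
        for t in range(i, n + 1):
            have |= files[t]
            if needed <= have:  # v_{i-1}..v_t is the shortest FST from e_i
                end[i], Q[i] = t, prod(p[i:t + 1])
                break
    # g[t] = number of I_i lying entirely within e_1..e_t
    g = [sum(1 for i in range(1, n + 1) if end[i] is not None and end[i] <= t)
         for t in range(n + 1)]
    return g, Q


def reliability_linear_dcs(files, p, needed):
    """Pr(E_n) via the recursion of Theorem 1."""
    if any(needed <= f for f in files):   # added guard: no edge is needed
        return 1.0
    n = len(files) - 1
    g, Q = comgq_direct(files, p, needed)
    pr = [0.0] * (n + 1)                  # pr[k] = Pr(E_k), with Pr(E_0) = 0
    for t in range(1, n + 1):
        pr[t] = pr[t - 1]
        for i in range(g[t - 1] + 1, g[t] + 1):
            prev = pr[i - 2] if i >= 2 else 0.0          # Pr(E_k) = 0 for k <= 0
            q_prev = 1.0 - p[i - 1] if i >= 2 else 1.0   # p_0 = 0, so q_0 = 1
            pr[t] += (1.0 - prev) * q_prev * Q[i]
    return pr[n]


# Hypothetical placement of f1..f4 on v_0..v_5 (an assumption, see above).
files = [{"f1", "f3"}, {"f2", "f4"}, {"f2"}, {"f3"}, {"f1", "f4"}, {"f2"}]
p = [0.0, 0.9, 0.8, 0.7, 0.6, 0.5]        # p[1..5]; p[0] is a placeholder
H = {"f1", "f2", "f3", "f4"}
print(comgq_direct(files, p, H)[0])       # g_0..g_5
print(reliability_linear_dcs(files, p, H))
```

Storing the edge reliabilities from index 1 keeps the code aligned with the paper's e_1..e_n numbering, at the cost of one unused list slot.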

For step 1, the computational complexity of the procedure COMGQ is O(|E||F|), where |E| = n and |F| ≥ max(max_{0≤i≤n} |FA_i|, |H|), since the value of h in the inner while loop is monotonically increasing and doesn't exceed the value of t, which is the index of the outer while loop. For step 2, by Theorem 1, Pr(E_i) can be computed in O(g_i − g_{i−1} + 1). Since there are n such Pr(E_i)'s to compute, we need another

$$O\Big( \sum_{i=1}^{n} (g_i - g_{i-1} + 1) \Big) = O(n + g_n - g_0) = O(n) = O(|E|).$$

Therefore algorithm Reliability_Linear_DCS takes O(|E||F|)+O(|E|) = O(|E||F|) time to compute the reliability of a DCS with a linear structure.

Example 1:

[Figure 1: Computers A to F, each holding some of the files MRG, CAF, ADF and IXF.]

[Figure 2: nodes v0 to v5 joined in a line by edges e1 to e5, each node labeled with the files among f1 to f4 that it holds. The program needs data files f1, f2, and f3 for its execution.]

Figure 2. The graph model for the distributed banking system in figure 1.

Consider a possible DCS of a banking system [4,17] shown in figure 1. Each local disk stores some of the following information: Consumer accounts file (CAF), Administrative aids file (ADF), and Interest and exchange rates file (IXF).

Management report generation (MRG) in computers B and E indicates a query (program) to be executed for report generation. Figure 2 shows the graph model for this system. A node represents any computer location and the links show the communication network. We assume that the query MRG (f_4) requires data CAF (f_1), ADF (f_2) and IXF (f_3) to complete its execution. Let V={v_0, v_1, v_2, v_3, v_4, v_5}, E={e_1, e_2, e_3, e_4, e_5}, F={f_1, f_2, f_3, f_4} and H={f_1, f_2, f_3, f_4}. Applying the algorithm Reliability_Linear_DCS, we get

Step 1: g_0=0 (boundary condition), g_1=1, Q_1=p_1, g_2=1, Q_2=p_2p_3p_4, g_3=1, Q_3=p_3p_4, g_4=3, Q_4=p_4p_5, g_5=4, and Q_5=0 (I_5 does not exist).

Step 2:

$$\Pr(E_1) = \Pr(E_2) = \Pr(E_3) = q_0 Q_1 = p_1$$

$$\Pr(E_4) = \Pr(E_3) + (1-\Pr(E_0))\, q_1 Q_2 + (1-\Pr(E_1))\, q_2 Q_3 = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4$$

$$\Pr(E_5) = \Pr(E_4) + (1-\Pr(E_2))\, q_3 Q_4 = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$$

A ring DCS is a DCS whose communication links form a cycle: each node is joined by two edges to its two neighboring nodes. Let D=(V, E, F) be a DCS with a ring topology. By the factoring theorem [9], the DPR of D can be given as

$$R(D_H) = p_e R((D+e)_H) + q_e R((D-e)_H) \qquad \text{(Eq.1)}$$

where

e is an arbitrary edge of D,

p_e is the reliability of edge e,

q_e ≡ 1 − p_e,

D+e is the DCS D with edge e=(u,v) contracted so that nodes u and v are merged into a single node and this new merged node contains all data files that previously were in nodes u and v, and

D−e is the DCS D with edge e deleted.

Since D−e is a DCS with a linear structure with |E|−1 edges, its DPR can be computed by the algorithm Reliability_Linear_DCS in O(|E||F|) time. Note that D+e remains a DCS with a ring structure with |E|−1 edges. We then apply the same analysis to D+e. Recursively applying (Eq.1), the ring DCS D with |E| edges can be decomposed into, in the worst case, |E| linear DCSs. So, we have an O(|E|^2|F|) time algorithm for computing the reliability of a DCS with a ring structure.
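To make the count of linear subproblems concrete, the recursion can be written out in closed form. The notation here is introduced only for this sketch and is not the paper's: e^{(k)} is the edge selected at recursion level k, D_0 = D, D_k = D_{k-1} + e^{(k)}, and m is the first level at which some node of D_m holds every file in H, so that R((D_m)_H) = 1. It assumes H ⊆ F, which guarantees that such an m exists.

```latex
R(D_H) \;=\; \sum_{k=1}^{m} \Big( \prod_{j=1}^{k-1} p_{e^{(j)}} \Big)\, q_{e^{(k)}}\,
        R\big( (D_{k-1} - e^{(k)})_H \big) \;+\; \prod_{j=1}^{m} p_{e^{(j)}},
\qquad m \le |E| .
```

Each D_{k-1} − e^{(k)} is a linear DCS, so the sum contains at most |E| calls to Reliability_Linear_DCS, each costing O(|E||F|), which is consistent with the O(|E|^2|F|) bound stated above.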

Algorithm Reliability_Ring_DCS(D)

// Given a DCS with a ring structure D=(V, E, F) and a specified set of files H, //
// this algorithm returns the DPR of D //

Step 1: If there exists a node in V that holds all data files in H then return (1).

Step 2: Select an arbitrary edge e of D.

Step 3: R_l ← Reliability_Linear_DCS(D−e).

Step 4: R_r ← Reliability_Ring_DCS(D+e).

Step 5: Return (p_e*R_r + q_e*R_l).

end Reliability_Ring_DCS
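A compact Python sketch of the five steps is given below, again as an illustration under assumptions rather than the authors' code. The ring is stored as a list of node file sets plus a list `p` in which `p[k]` is the reliability of the edge joining node k and node (k+1) mod m; the "arbitrary" edge of Step 2 is always taken to be the last one; `linear_dpr` is a condensed chain solver based on Theorem 1 that uses a direct scan instead of COMGQ; and the file placement in the usage example is hypothetical, picked to be consistent with the values quoted in Examples 1 and 2.

```python
from math import prod


def linear_dpr(files, p, needed):
    """DPR of a chain v_0..v_n; p[k] is the reliability of e_k (p[0] unused)."""
    if any(needed <= f for f in files):
        return 1.0
    n = len(files) - 1
    end, Q = [None] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n + 1):               # shortest FST starting at e_i
        have = set(files[i - 1])
        for t in range(i, n + 1):
            have |= files[t]
            if needed <= have:
                end[i], Q[i] = t, prod(p[i:t + 1])
                break
    pr = [0.0] * (n + 1)                    # pr[k] = Pr(E_k)
    for t in range(1, n + 1):
        pr[t] = pr[t - 1]
        for i in range(1, n + 1):
            if end[i] == t:                 # same as g_{t-1} < i <= g_t
                prev = pr[i - 2] if i >= 2 else 0.0
                q_prev = 1.0 - p[i - 1] if i >= 2 else 1.0
                pr[t] += (1.0 - prev) * q_prev * Q[i]
    return pr[n]


def ring_dpr(files, p, needed):
    """DPR of a ring by factoring on its last edge (Eq.1)."""
    if any(needed <= f for f in files):                  # Step 1
        return 1.0
    if len(files) == 1:                                  # H is not available at all
        return 0.0
    pe = p[-1]                                           # Step 2: e joins last node and node 0
    r_l = linear_dpr(files, [0.0] + p[:-1], needed)      # Step 3: D - e is a chain
    merged = [files[0] | files[-1]] + files[1:-1]        # contract e into node 0
    r_r = ring_dpr(merged, p[:-1], needed)               # Step 4: D + e is a smaller ring
    return pe * r_r + (1.0 - pe) * r_l                   # Step 5


# Hypothetical placement of f1..f4 on v_0..v_5 (an assumption, see above).
files = [{"f1", "f3"}, {"f2", "f4"}, {"f2"}, {"f3"}, {"f1", "f4"}, {"f2"}]
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]          # e_1..e_6; e_6 joins v_5 and v_0
print(ring_dpr(files, p, {"f1", "f2", "f3", "f4"}))
```

Always factoring on the last edge keeps the contracted graph a ring over the remaining list entries, so no re-indexing of edges is needed between recursion levels.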

Example 2:

Consider the DCS with a ring topology shown in Figure 3. This is the DCS shown in Figure 2 with one edge e_6 added between nodes v_5 and v_0.

[Figure 3: nodes v0 to v5 joined in a ring by edges e1 to e6, each node labeled with the files it holds. Program f4 needs data files f1, f2, f3 for its execution.]

Figure 3. A DCS with a ring structure

Applying algorithm Reliability_Ring_DCS, we have

$$R(D_H) = q_6 R((D-e_6)_H) + p_6 \{ q_5 R((D+e_6-e_5)_H) + p_5 R((D+e_6+e_5)_H) \}$$

Since there exists one node in D+e_6+e_5 that holds all files in H, we have R((D+e_6+e_5)_H)=1. From example 1, it is easy to see that R((D−e_6)_H)=Pr(E_5) and R((D+e_6−e_5)_H)=Pr(E_4). So we have

$$R(D_H) = q_6 (p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5) + p_6 [ q_5 (p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4) + p_5 ].$$

4. Computational Complexity of the DPR Problem on a Ring of Tree Topology

Complexity results are obtained by transforming known NP-hard problems to our reliability problems [10-14]. For this reason, we first state some known NP-hard problems as follows.

i) K-Terminal Reliability (KTR)

Input: an undirected graph G = (V, E) where V is the set of nodes and E is the set of edges that fail s-independently of each other with known probabilities. A set K ⊆ V is distinguished with |K| ≥ 2.

Output: R(G_K), the probability that the set K of nodes of G is connected in G.

ii) Number of Edge Covers (#EC)

Input: an undirected graph G = (V, E).

Output: the number of edge covers for G ≡ |{L ⊆ E: each node of G is an end of some edge in L}|.

iii) Number of Vertex Covers (#VC)

Input: an undirected graph G = (V, E).

Output: the number of vertex covers for G ≡ |{K ⊆ V: every edge of G has at least one end in K}|.

Theorem 2. Computing DPR for a DCS with a star topology even with each |FA_i|=2 is NP-hard.

Proof. We reduce the #EC problem to our problem. For a given network G=(V_1, E_1) where E_1={e_1, e_2, ..., e_n}, we construct a DCS D=(V_2, E_2, F) with a star topology where V_2={s, v_1, v_2, ..., v_n}, E_2={(s, v_i) | 1 ≤ i ≤ n}, and F={f_i | for each node i ∈ G}. Let FA_{v_i}={f_u, f_v | e_i=(u, v) ∈ G} for 1 ≤ i ≤ n, FA_s=∅ and H=F. From the construction of D, it is easy to show that there is a one-to-one correspondence between the edge covers of G and the FSTs of D. The DPR of D, R(D_H), can be expressed as

$$R(D_H) = \sum_{\text{all FST } t \in D} \Big\{ \prod_{\text{edge } i \in t} p_i \prod_{\text{edge } i \notin t} (1-p_i) \Big\}$$

Thus, a polynomial-time algorithm for computing R(D_H) over a DCS with a star topology and each |FA_i|=2 would imply an efficient algorithm for the #EC problem; for instance, setting every p_i=1/2 makes the sum equal to the number of edge covers of G divided by 2^n. Since the #EC problem is NP-hard, Theorem 2 follows.
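The reduction can be exercised on a tiny graph. The sketch below is an illustration only: it builds the star DCS of Theorem 2 from a graph G, evaluates its DPR by exhaustive enumeration with every edge reliability set to 1/2 (a choice made here for convenience), and compares 2^n times the result with a brute-force count of edge covers. The function names are hypothetical, and G is assumed to have more than two vertices so that no single leaf holds all of H on its own.

```python
from itertools import combinations, product


def count_edge_covers(vertices, edges):
    """Brute-force #EC: subsets of edges that touch every vertex of G."""
    target = set(vertices)
    count = 0
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            if {v for e in subset for v in e} == target:
                count += 1
    return count


def star_dpr_from_graph(vertices, edges, p=0.5):
    """DPR of the star DCS built from G as in Theorem 2.

    Leaf v_i holds {f_u, f_v} for e_i = (u, v); the hub s holds no file;
    H = F contains one file per vertex of G; every star edge has reliability p.
    """
    needed = set(vertices)
    leaf_files = [set(e) for e in edges]
    total = 0.0
    for state in product((0, 1), repeat=len(edges)):
        prob = 1.0
        pooled = set()                    # files reachable through the hub
        for up, held in zip(state, leaf_files):
            prob *= p if up else 1.0 - p
            if up:
                pooled |= held
        if needed <= pooled:              # the working leaves form an edge cover
            total += prob
    return total


# Triangle graph: any two of its three edges already cover all vertices.
V = [1, 2, 3]
E = [(1, 2), (2, 3), (1, 3)]
print(count_edge_covers(V, E), star_dpr_from_graph(V, E) * 2 ** len(E))
```

Both numbers printed for the triangle should agree, since each of the 2^n equally likely edge states contributes exactly when it is an edge cover.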

Theorem 3. Computing DPR for a DCS with a star topology even when there are only two copies of each file is NP-hard.

Proof. We reduce the #VC problem to our problem. For a given G=(V_1, E_1) where |E_1|=n and V_1={v_1, v_2, ..., v_m}, we construct a DCS D=(V_2, E_2, F) with a star topology where V_2=V_1 ∪ {s}, E_2={e_i=(s, v_i) | 1 ≤ i ≤ m}, and F={f_i | for all edges i ∈ G}. Let FA_i={f_j | for all edges j that are incident on v_i ∈ G} and H=F. From the construction of D, it is easy to show that there are only two copies of each file in D and that there is a one-to-one correspondence between the vertex covers of G and the FSTs of D. The DPR of D, R(D_H), can be expressed as

$$R(D_H) = \sum_{\text{all FST } t \in D} \Big\{ \prod_{\text{edge } i \in t} p_i \prod_{\text{edge } i \notin t} (1-p_i) \Big\}$$

Since the #VC problem is NP-hard, Theorem 3 follows.

Theorem 4. Computing DPR for a DCS with a tree topology is NP-hard.

Proof. By Theorems 2 and 3, the DPR problem for a DCS with a star topology is, in general, NP-hard. This implies that the DPR problem for a DCS with a tree topology is, in general, also NP-hard, since a DCS with a star topology is just a DCS with a tree topology that has one level of branching.

Now, we use the results of Theorems 2, 3, and 4 to prove that the DPR problem on a ring of tree topology is NP-hard.

Theorem 5: Computing DPR for a DCS with a ring of trees topology even with one level of tree is NP-hard.

Proof. Given a DCS graph D=(V, E, F) with a star topology, where V={s, v_1, v_2, ..., v_n} and E={(s, v_i) | 1 ≤ i ≤ n}, we construct a DCS graph D'=(V', E', F) from graph D, where

V'={v_1, v_2, ..., v_n} ∪ {s_j | 1 ≤ j ≤ n} and

E'={(s_j, s_{j+1}) | 1 ≤ j < n} ∪ {(s_n, s_1)} ∪ {(s_j, v_j) | 1 ≤ j ≤ n}.

It is easy to see that D' is a ring of tree topology with one level of tree. If we assume that all added edges, {(s_j, s_{j+1}) | 1 ≤ j < n} ∪ {(s_n, s_1)}, of D' are perfectly reliable, then we have R(D_H)=R(D'_H) for any given H ⊆ F. By Theorems 2 and 3, computing DPR over a DCS with a star topology is NP-hard; thus, computing DPR over a DCS with a ring of tree topology with one level of tree is also NP-hard.
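The construction of D' can be checked numerically on a small star. The sketch below is illustrative and rests on assumptions that the proof does not spell out: the hub s of the star holds no files (as in Theorems 2 and 3), so the new ring nodes s_j hold none either; each spoke (s_j, v_j) keeps the reliability of the original (s, v_j); and the DPR is evaluated by exhaustive enumeration, which is only practical for a handful of edges. All function names and the node labels are hypothetical.

```python
from itertools import product


def _find(parent, v):
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def dpr_brute_force(node_files, edges, needed):
    """Exhaustive DPR: some connected group of nodes must hold all of `needed`."""
    total = 0.0
    for state in product((0, 1), repeat=len(edges)):
        prob = 1.0
        parent = {v: v for v in node_files}
        for up, (u, v, p) in zip(state, edges):
            prob *= p if up else 1.0 - p
            if up:
                parent[_find(parent, u)] = _find(parent, v)
        pooled = {}
        for v, held in node_files.items():
            pooled.setdefault(_find(parent, v), set()).update(held)
        if any(needed <= held for held in pooled.values()):
            total += prob
    return total


def star_dcs(leaf_files, spoke_p):
    """Star D: hub 's' (no files) joined to leaves ('v', j), j = 1..n."""
    nodes = {"s": set()}
    nodes.update({("v", j): set(f) for j, f in enumerate(leaf_files, 1)})
    edges = [("s", ("v", j), p) for j, p in enumerate(spoke_p, 1)]
    return nodes, edges


def ring_of_stars_dcs(leaf_files, spoke_p):
    """D' of Theorem 5: hubs ('s', j) on a perfectly reliable ring, one leaf each."""
    n = len(leaf_files)
    nodes = {("s", j): set() for j in range(1, n + 1)}
    nodes.update({("v", j): set(f) for j, f in enumerate(leaf_files, 1)})
    ring = [(("s", j), ("s", j % n + 1), 1.0) for j in range(1, n + 1)]
    spokes = [(("s", j), ("v", j), p) for j, p in enumerate(spoke_p, 1)]
    return nodes, ring + spokes


leaves = [{"a", "b"}, {"b", "c"}, {"a", "c"}]   # hypothetical file placement
probs = [0.9, 0.8, 0.7]
H = {"a", "b", "c"}
print(dpr_brute_force(*star_dcs(leaves, probs), H))
print(dpr_brute_force(*ring_of_stars_dcs(leaves, probs), H))
```

With the ring edges fixed at reliability 1, every state in which a ring edge is down has probability zero, so the two printed values should coincide up to rounding.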

Theorem 6: Computing DPR for a DCS with a ring of tree topology, in general, is NP-hard.

Proof. By Theorem 5, the DPR problem for a ring of tree topology with one level of tree is NP-hard. With the same approach as in Theorem 5, we can construct a ring of tree topology from a DCS with a tree topology. By Theorem 4, computing DPR over a DCS with a tree topology is NP-hard; thus, computing DPR over a DCS with a ring of tree topology is also NP-hard.

5. Conclusions

In this paper, we investigated the problem of distributed program reliability on ring distributed computing systems. We proposed a polynomial-time algorithm for computing the DPR on a ring topology. We also presented Theorem 5 and Theorem 6 to show that solving the DPR problem on a ring of trees topology is NP-hard.

6. Appendix

The detailed proof of Theorem 1 is as follows.

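Theorem 1.

$$\Pr(E_n) = \Pr(E_{n-1}) + \sum_{i=g_{n-1}+1}^{g_n} \big[ (1 - \Pr(E_{i-2})) \cdot q_{i-1} \cdot Q_i \big]$$

with the boundary conditions Pr(E_i) = 0, g_i = 0, and p_i = 0, for i ≤ 0.

Proof.

$$\Pr(E_n) = \Pr\Big( E_{n-1} \cup \bigcup_{i=g_{n-1}+1}^{g_n} S_i \Big) = \Pr(E_{n-1}) + \Pr\Big( \bar{E}_{n-1} \cap \Big( \bigcup_{i=g_{n-1}+1}^{g_n} S_i \Big) \Big) \qquad \text{(Eq.2)}$$

For the second term in (Eq.2), we have

$$\Pr\Big( \bar{E}_{n-1} \cap \Big( \bigcup_{i=g_{n-1}+1}^{g_n} S_i \Big) \Big) = \Pr(\bar{E}_{n-1} \cap S_{g_{n-1}+1}) + \Pr(\bar{E}_{n-1} \cap \bar{S}_{g_{n-1}+1} \cap S_{g_{n-1}+2}) + \Pr(\bar{E}_{n-1} \cap \bar{S}_{g_{n-1}+1} \cap \bar{S}_{g_{n-1}+2} \cap S_{g_{n-1}+3}) + \cdots + \Pr(\bar{E}_{n-1} \cap \bar{S}_{g_{n-1}+1} \cap \cdots \cap \bar{S}_{g_n-1} \cap S_{g_n}). \qquad \text{(Eq.3)}$$

Since S_i = S_{i+1} ∩ {x_i=1} for g_{n−1}+1 ≤ i ≤ g_n−1, we have S_i ⊂ S_{i+1} and therefore \bar{S}_i ∩ \bar{S}_{i+1} = \bar{S}_{i+1}. Thus,

$$\Pr\Big( \bar{E}_{n-1} \cap \Big( \bigcup_{i=g_{n-1}+1}^{g_n} S_i \Big) \Big) = \Pr(\bar{E}_{n-1} \cap S_{g_{n-1}+1}) + \Pr(\bar{E}_{n-1} \cap \bar{S}_{g_{n-1}+1} \cap S_{g_{n-1}+2}) + \Pr(\bar{E}_{n-1} \cap \bar{S}_{g_{n-1}+2} \cap S_{g_{n-1}+3}) + \cdots + \Pr(\bar{E}_{n-1} \cap \bar{S}_{g_n-1} \cap S_{g_n})$$

$$= \Pr(\bar{E}_{n-1} \cap S_{g_{n-1}+1}) + \sum_{i=g_{n-1}+2}^{g_n} \Pr(\bar{E}_{n-1} \cap \bar{S}_{i-1} \cap S_i)$$

$$= \Pr(\bar{E}_{g_{n-1}-1} \cap \{x_{g_{n-1}}=0\} \cap S_{g_{n-1}+1}) + \sum_{i=g_{n-1}+2}^{g_n} \Pr(\bar{E}_{i-2} \cap \{x_{i-1}=0\} \cap S_i)$$

$$= \sum_{i=g_{n-1}+1}^{g_n} \Pr(\bar{E}_{i-2} \cap \{x_{i-1}=0\} \cap S_i).$$

Note that the events \bar{E}_{i−2}, {x_{i−1}=0}, and S_i are determined by pairwise disjoint sets of edges and are therefore mutually independent. We have

$$\Pr(\bar{E}_{i-2} \cap \{x_{i-1}=0\} \cap S_i) = \Pr(\bar{E}_{i-2}) \cdot \Pr(\{x_{i-1}=0\}) \cdot \Pr(S_i).$$

So

$$\Pr\Big( \bar{E}_{n-1} \cap \Big( \bigcup_{i=g_{n-1}+1}^{g_n} S_i \Big) \Big) = \sum_{i=g_{n-1}+1}^{g_n} \Pr(\bar{E}_{i-2}) \cdot \Pr(\{x_{i-1}=0\}) \cdot \Pr(S_i) = \sum_{i=g_{n-1}+1}^{g_n} \big[ (1 - \Pr(E_{i-2})) \cdot q_{i-1} \cdot Q_i \big] \qquad \text{(Eq.4)}$$

By equations (Eq.2) and (Eq.4), we obtain Theorem 1.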

References

[1] V. K. Prasanna Kumar, S. Hariri and C. S. Raghavendra, "Distributed program reliability analysis," IEEE Trans. Software Eng., vol. SE-12, pp. 42-50, Jan. 1986.

[2] S. Hariri and C. S. Raghavendra, "SYREL: A Symbolic Reliability Algorithm based on Path and Cutset Methods," USC Tech. Rep., 1984.

[3] A. Kumar, S. Rai and D. P. Agrawal, "Reliability Evaluation Algorithms for Distributed Systems," in Proc. IEEE INFOCOM 88, pp. 851-860, 1988.

[4] A. Kumar, S. Rai and D. P. Agrawal, "On Computer Communication Network Reliability Under Task Execution Constraints," IEEE Journal on Selected Areas in Communication, vol. 6, no. 8, pp. 1393-1399, Oct. 1988.

[5] Raj Jain, "FDDI Handbook: High Speed Networking Using Fiber and Other Media," Addison-Wesley Publishing Company, 1994.

[6] Jiahnsheng Yin and Charles B. Silio Jr., "K-Terminal Reliability In Ring Networks," IEEE Trans. on Reliability, vol. 43, no. 3, pp. 389-400, 1994.

[7] Dimitris Logothetis and Kishor S. Trivedi, "Reliability Analysis of the Double Counter-rotating Ring with Concentrator Attachment," IEEE Trans. on Networking, vol. 2, no. 5, pp. 520-532, 1994.

[8] D. J. Chen, M. S. Chang, C. L. Yang and Kuo-Lung Ku, "Multimedia Task Reliability Analysis Based on Token Ring Network," 1996 Int. Conference on Parallel and Distributed System, pp. 265-272, 1996.

[9] R. Kevin Wood, "Factoring Algorithms for Computing K-terminal Network Reliability," IEEE Trans. Reliability, vol. R-35, pp. 269-278, Aug. 1986.

[10] A. Rosenthal, "A Computer Scientist Looks at Reliability Computations," in: Reliability and Fault Tree Analysis, SIAM, 1975, pp. 133-152.

[11] L. G. Valiant, "The Complexity of Enumeration and Reliability Problems," SIAM J. Computing, vol. 8, pp. 410-421, 1979.

[12] M. O. Ball, J. S. Provan and D. R. Shier, "Reliability Covering Problems," Networks, vol. 21, pp. 345-357, 1991.

[13] J. S. Provan and M. O. Ball, "The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected," SIAM J. Computing, vol. 12, no. 4, pp. 777-788, Nov. 1983.

[14] J. S. Provan, "The complexity of reliability computations in planar and acyclic graphs," SIAM Journal on Computing, vol. 15, pp. 694-702, 1986.

[15] M. S. Lin, "Program Reliability Analysis in Distributed Computing System," Ph.D. Dissertation, National Chiao Tung University, Taiwan, 1994.

[16] Min-Sheng Lin and Deng-Jyi Chen, "The Computational Complexity of the Reliability Problem on Distributed Systems," Information Processing Letters, vol. 64, pp. 143-147, 1997.

[17] D. A. Sheppard, "Standard for banking communication system," IEEE Trans. Computer, pp. 92-95, Nov. 1987.