The distributed program reliability analysis on ring-type topologies



Min-Sheng Lin*, Ming-Sang Chang, Deng-Jyi Chen, Kuo-Lung Ku

Department of Information Management, Aletheia University, 32 Chen Li Road, Tamsui, Taipei, 25103 Taiwan, ROC

Department of Information Technology, Chunghwa Telecommunication Training Institute, 168 Min Chu Road, Pan Chiao, Taipei, 22077 Taiwan, ROC

Institute of Computer Science and Information Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsin Chu, Taiwan, ROC

Chung-Shan Institute of Science and Technology, Tao-Yuan, Taiwan, ROC

Abstract

Distributed computing systems (DCSs) have become very popular for their high fault tolerance, potential for parallel processing, and better reliability performance. One of the important issues in the design of a DCS is reliability performance, and distributed program reliability (DPR) was introduced to obtain this reliability measure. In this paper, we propose a polynomial-time algorithm for computing the DPR of a ring topology and show that solving the DPR problem on a ring of trees topology is NP-hard. © 2001 Elsevier Science Ltd. All rights reserved.

Scope and purpose

The widespread use of distributed computing systems is due to the price/performance revolution in microelectronics, the development of cost-effective and efficient communication subnets, the development of resource-sharing software, and increased user demands for communication, economical sharing of resources, and productivity. This article is concerned with the analysis of distributed program reliability on a ring distributed computing system. Distributed program reliability is a useful measure for evaluating the reliability of a distributed computing system, and its analysis also provides a good index for designing distributed computing systems with high reliability performance.

Keywords: Distributed program reliability; Minimal file spanning tree; Algorithm; Ring of trees

* Corresponding author.

E-mail addresses: [email protected] (M.-S. Lin), [email protected] (M.-S. Chang), [email protected].tw (D.-J. Chen).


1. Introduction

Distributed computing systems (DCSs) have become very popular for their fault tolerance, potential for parallel processing, and better reliability performance. One of the important issues in the design of a DCS is reliability performance. Distributed program reliability (DPR) was introduced to obtain this reliability measure [1–4].

An efficient network topology is quite important for a distributed computing system. The ring topology is a popular one in high-speed networks: it has been considered for the IEEE 802.5 token ring, the Fiber Distributed Data Interface (FDDI) token ring, the Synchronous Optical Network (SONET), and asynchronous transfer mode (ATM) networks. Ring networks have been widely used in current distributed system design.

In a ring of trees topology, a ring connects the tree topologies of the network. This architecture can be used in FDDI, which consists of (1) a tree of wiring concentrators and terminal stations and (2) a counter-rotating dual ring [5].

A large amount of work has been devoted to developing algorithms that compute reliability measures for a DCS. One typical reliability measure for a DCS is the K-terminal reliability (KTR) [6–8]. KTR is the probability that a specified set K of nodes, a subset of all the nodes in the DCS, remains connected when the edges of the DCS fail s-independently of each other with known probabilities. However, the KTR measure is not adequate for a practical DCS, since a reliability measure for a DCS should capture the effects of the redundant distribution of programs and data files. Prasanna Kumar et al., Hariri and Raghavendra, and Kumar et al. [1–4] therefore introduced distributed program reliability (DPR) to model the reliability of a DCS more accurately. For the successful execution of a distributed program, the node containing the program, the other nodes that hold the required data files, and the edges between them must all be operational. DPR is thus defined as the probability that a program with distributed files can run successfully in spite of faults occurring in the edges. In effect, DPR is the probability of a logical OR of events of the form {the K-terminals are connected}, taken over the node sets K that jointly hold the needed files; however, computing the conditional probabilities this requires can be quite difficult.

In this paper, we propose a polynomial-time algorithm for analyzing the DPR of a ring topology and show that solving the DPR problem on a ring of trees topology is NP-hard.

2. Notation and definitions

Notation

$D=(V,E,F)$  an undirected distributed computing system (DCS) graph with vertex set $V$, edge set $E$, and data file set $F$

$FA_i$  set of files available at node $i$ (note: $F=\bigcup_i FA_i$)

$p_i$  reliability of edge $i$

$q_i$  $1-p_i$

$H$  subset of the files of $F$, i.e., $H\subseteq F$; $H$ contains the programs to be executed and all data files needed for their execution

$R(D_H)$  the DPR of $D$ with a set $H$ of needed files: $\Pr\{$all data files in $H$ can be accessed successfully by the executed programs in $H\}$


Definition. A file spanning tree (FST) is a tree whose nodes hold all needed files in $H$.

Definition. A minimal file spanning tree (MFST) is an FST such that no other FST is a subset of it.

Definition. Distributed program reliability (DPR) is the probability that a distributed program, which runs on multiple processing elements (PEs) and needs to communicate with other PEs for remote files, will be executed successfully.
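The definition can be evaluated directly, though in exponential time, by enumerating every combination of edge states: an FST is operational exactly when some connected component of the working edges holds all files in $H$. The Python sketch below is meant only as a reference oracle for small instances; the function name `dpr_bruteforce` and its input layout (per-node file sets, edge endpoint pairs, edge reliabilities) are our own assumptions, not notation from the paper.

```python
from itertools import product


def _root(parent, a):
    """Representative of node a in a union-find forest."""
    while parent[a] != a:
        a = parent[a]
    return a


def dpr_bruteforce(files_at, edges, p, H):
    """Pr{some operational FST exists}, by enumerating all 2^|E| edge states.

    files_at[v] -- set of files stored at node v (nodes are 0..len(files_at)-1)
    edges[k]    -- endpoints (u, v) of edge k
    p[k]        -- reliability of edge k; edges fail s-independently
    H           -- set of needed files (programs and data files)
    """
    H = set(H)
    total = 0.0
    for state in product((0, 1), repeat=len(edges)):     # state[k] plays the role of x_k
        parent = list(range(len(files_at)))
        for (u, v), up in zip(edges, state):
            if up:                                        # join the endpoints of a working edge
                parent[_root(parent, u)] = _root(parent, v)
        held = {}                                         # files held by each component
        for v, fs in enumerate(files_at):
            held.setdefault(_root(parent, v), set()).update(fs)
        if any(H <= fs for fs in held.values()):
            weight = 1.0
            for pk, up in zip(p, state):
                weight *= pk if up else 1.0 - pk
            total += weight
    return total
```

For example, on a three-node path whose end nodes hold files `a` and `b`, `dpr_bruteforce([{"a"}, set(), {"b"}], [(0, 1), (1, 2)], [0.9, 0.8], {"a", "b"})` is about 0.72, because both edges must work.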

By the definition of an MFST, the DPR can be written as

$$R(D_H)=\Pr(\text{at least one MFST is operational})$$

or

$$R(D_H)=\Pr\Big(\bigcup_{j=1}^{C_{mfst}}\mathrm{MFST}_j\Big),$$
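where $C_{mfst}$ is the number of MFSTs for a given needed-file set $H$, and $\mathrm{MFST}_j$ here denotes the event that all edges of the $j$th MFST are operational.

3. Computing DPR over a DCS with a ring topology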

We first consider a DCS with a linear structure $D=(V,E,F)$ with $|E|=n$ edges, given as an alternating sequence of distinct nodes and edges $(v_0,e_1,v_1,e_2,\ldots,v_{n-1},e_n,v_n)$, so that edge $e_i$ joins $v_{i-1}$ and $v_i$. For $1\le i\le n$, let

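$I_i$  the FST that starts at edge $e_i$ (so that its leftmost node is $v_{i-1}$) and has minimal length

$S_i$  the event that all edges in $I_i$ are working

$Q_i\equiv\prod_{e_j\in I_i}p_j$  the probability that $S_i$ occurs

$E_i$  the event that some $S_j$ whose $I_j$ lies between $e_1$ and $e_i$ occurs, i.e., that an FST within $e_1,\ldots,e_i$ is operational

$g_i$  the number of $I_j$ that lie between $e_1$ and $e_i$; since the right end of $I_j$ does not decrease as $j$ increases, these are exactly $I_1,\ldots,I_{g_i}$

$x_i$  state of edge $e_i$: $x_i=0$ if edge $e_i$ fails, else $x_i=1$

$\bar A$  the complement of event $A$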

It is easy to see that the DPR of a linear DCS $D$ with $|E|=n$ edges, $R(D_H)$, equals $\Pr(E_n)$. The following theorem provides a recursive method for computing $\Pr(E_n)$.

Theorem 1.

$$\Pr(E_n)=\Pr(E_{n-1})+\sum_{i=g_{n-1}+1}^{g_n}\big[(1-\Pr(E_{i-2}))\,q_{i-1}Q_i\big]$$

with the boundary conditions $\Pr(E_i)=0$, $g_i=0$, and $p_i=0$ (hence $q_i=1$) for $i\le0$.

Proof. See the appendix.
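Once the $g_i$ and $Q_i$ are known, the recursion takes only a few lines. The Python sketch below transcribes it with the boundary conditions handled explicitly ($\Pr(E_i)=0$ and $q_i=1$ for $i\le0$); the factor $1-\Pr(E_{i-2})$ follows the derivation in the appendix. The function name and the convention of 1-based lists with an unused element 0 are our assumptions. As a usage example, it is fed the values of $g_i$ and $Q_i$ listed in Step 1 of Example 1, together with arbitrary sample reliabilities.

```python
def linear_dpr_from_gq(g, Q, p):
    """Pr(E_n) for a linear DCS by the recursion of Theorem 1.

    g[i] -- number of minimal-length FSTs I_j lying within e_1..e_i; g[0] = 0
    Q[i] -- Pr(S_i), the product of p_j over the edges of I_i (Q[0] unused)
    p[i] -- reliability of edge e_i (p[0] unused; the paper sets p_0 = 0)
    """
    n = len(g) - 1
    E = [0.0] * (n + 1)                     # E[k] holds Pr(E_k)

    def pr_E(i):                            # boundary: Pr(E_i) = 0 for i <= 0
        return E[i] if i > 0 else 0.0

    def q(i):                               # boundary: p_i = 0, so q_i = 1, for i <= 0
        return 1.0 - p[i] if i > 0 else 1.0

    for k in range(1, n + 1):
        # The FSTs I_{g[k-1]+1}, ..., I_{g[k]} are exactly those ending at edge e_k.
        E[k] = pr_E(k - 1) + sum(
            (1.0 - pr_E(i - 2)) * q(i - 1) * Q[i]
            for i in range(g[k - 1] + 1, g[k] + 1)
        )
    return E[n]


# Usage with the g_i and Q_i of Example 1 and sample reliabilities p_1..p_5.
p = [0.0, 0.9, 0.8, 0.7, 0.6, 0.5]
g = [0, 1, 1, 1, 3, 4]
Q = [0.0, p[1], p[2] * p[3] * p[4], p[3] * p[4], p[4] * p[5], 0.0]
print(linear_dpr_from_gq(g, Q, p))          # about 0.951
```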

Before applying Theorem 1, we use the following procedure COMGQ to compute the values of $g_i$ and $Q_i$, $1\le i\le n$.


Procedure COMGQ

// Given a DCS with a linear structure, i.e., the alternating sequence of distinct nodes and edges //
// (v_0, e_1, v_1, e_2, ..., v_{n-1}, e_n, v_n); //
// F: the set of files (including data files and programs) distributed in D; //
// H: the set of files that must communicate with each other through the edges of D; //
// FA_i: the set of files available at node v_i, for 0 ≤ i ≤ n; and //
// p_i: the reliability of edge e_i, for 1 ≤ i ≤ n, //
// this procedure computes the values of g_i and Q_i, for 1 ≤ i ≤ n. //
// h (head) and t (tail) are two indexes moving along the nodes. NF_i is the total number of //
// copies of file i between nodes v_h and v_t. If there exists an FST between nodes v_h and v_t, //
// then flag = true, else flag = false. //

begin
    for 2 ≤ i ≤ n do Q_i ← 0 repeat                   // initialize //
    Q_1 ← 1                                           // initialize //
    h ← 0; t ← 1                                      // initialize //
    for each file i ∈ F do                            // initialize //
        if file i ∈ FA_h then NF_i ← 1 else NF_i ← 0 endif
    repeat
    while t ≤ n do
        for each file i ∈ FA_t do NF_i ← NF_i + 1 repeat
        Q_{h+1} ← Q_{h+1} * p_t
        flag ← true
        while flag do
            for each file i ∈ H do                    // check whether there exists an FST //
                if NF_i = 0 then flag ← false endif   // between nodes v_h and v_t //
            repeat
            if flag then
                for each file i ∈ FA_h do NF_i ← NF_i − 1 repeat
                h ← h + 1
                Q_{h+1} ← Q_h / p_h
            endif
        repeat
        g_t ← h
        t ← t + 1
    repeat
    for 1 ≤ i ≤ n do output(g_i, Q_i) repeat
end COMGQ
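The procedure translates almost line for line into Python. The sketch below keeps the counts $NF_i$ in a `collections.Counter` and the running product $p_{h+1}\cdots p_t$ in a single variable instead of in $Q_{h+1}$. Like COMGQ, it divides by $p_h$ when the head advances, so it assumes every $p_i>0$; it also assumes that no single node holds all of $H$. The function name is ours, and the file placement in the usage example is hypothetical: it was chosen only so that it reproduces the $g_i$ and $Q_i$ reported in Example 1.

```python
from collections import Counter


def comgq(files_at, p, H):
    """Compute g_t and Q_k for a linear DCS v_0 -e_1- v_1 - ... -e_n- v_n.

    files_at[i] -- files at node v_i, i = 0..n
    p[i]        -- reliability of edge e_i = (v_{i-1}, v_i), i = 1..n (p[0] unused)
    H           -- needed files; assumes no single node holds all of H and p[i] > 0
    Returns (g, Q) as lists indexed 0..n; Q[k] = 0.0 when I_k does not exist.
    """
    n = len(files_at) - 1
    H = set(H)
    g = [0] * (n + 1)
    Q = [0.0] * (n + 2)
    count = Counter(files_at[0])        # NF: copies of each file in the window v_h..v_t
    h, prod = 0, 1.0                    # prod = p_{h+1} * ... * p_t for the current window
    for t in range(1, n + 1):
        count.update(files_at[t])       # extend the window to v_t
        prod *= p[t]
        # While v_h..v_t holds all of H, edges e_{h+1}..e_t form I_{h+1},
        # the minimal-length FST starting at e_{h+1}; record it and move the head.
        while h < t and all(count[f] > 0 for f in H):
            Q[h + 1] = prod
            count.subtract(files_at[h])
            h += 1
            prod /= p[h]                # drop edge e_h from the window
        g[t] = h                        # I_1..I_h all lie within e_1..e_t
    return g, Q[: n + 1]


files_at = [{1, 2}, {3, 4}, {3}, {4}, {1, 2}, {3}]   # hypothetical placement
p = [0.0, 0.9, 0.8, 0.7, 0.6, 0.5]
g, Q = comgq(files_at, p, {1, 2, 3, 4})
print(g)   # [0, 1, 1, 1, 3, 4]
```

Both loops only move forward, which is the source of the $O(|E||F|)$ bound discussed below.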

Now, using procedure COMGQ and Theorem 1, we obtain an algorithm for computing the reliability of a DCS with a linear structure.


Algorithm Reliability_Linear_DCS(D)

// Given a DCS with a linear structure D = (V, E, F) with |E| = n and a specified set of files H, //
// this algorithm returns the DPR of D. //

Step 1: Call COMGQ to compute the values of g_i and Q_i, 1 ≤ i ≤ n.
Step 2: Evaluate Pr(E_n) recursively using Theorem 1.
Step 3: Return (Pr(E_n)).

end Reliability_Linear_DCS
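Putting the steps together gives the following Python sketch of Reliability_Linear_DCS. It is our own transcription: the function name, the list-based input, and the early return of 1 when a single node already holds all of $H$ are assumptions, and the division by $p_h$ again requires every $p_i>0$.

```python
from collections import Counter


def reliability_linear_dcs(files_at, p, H):
    """DPR of the linear DCS v_0 -e_1- v_1 - ... -e_n- v_n.

    files_at[i] -- files at node v_i (i = 0..n); p[i] -- reliability of e_i (p[0] unused)
    """
    H = set(H)
    n = len(files_at) - 1
    if any(H <= set(fs) for fs in files_at):
        return 1.0                          # a single node already holds all of H

    # Step 1 (COMGQ): sliding window v_h..v_t over the line.
    g, Q = [0] * (n + 1), [0.0] * (n + 2)
    count, h, prod = Counter(files_at[0]), 0, 1.0
    for t in range(1, n + 1):
        count.update(files_at[t])
        prod *= p[t]
        while h < t and all(count[f] > 0 for f in H):
            Q[h + 1] = prod                 # Pr(S_{h+1}), where I_{h+1} = e_{h+1}..e_t
            count.subtract(files_at[h])
            h += 1
            prod /= p[h]                    # requires p[h] > 0
        g[t] = h

    # Step 2 (Theorem 1), with Pr(E_i) = 0 and q_i = 1 for i <= 0.
    E = [0.0] * (n + 1)

    def pr_E(i):
        return E[i] if i > 0 else 0.0

    def q(i):
        return 1.0 - p[i] if i > 0 else 1.0

    for k in range(1, n + 1):
        E[k] = pr_E(k - 1) + sum(
            (1.0 - pr_E(i - 2)) * q(i - 1) * Q[i]
            for i in range(g[k - 1] + 1, g[k] + 1)
        )
    return E[n]                             # Step 3
```

With the hypothetical placement `[{1, 2}, {3, 4}, {3}, {4}, {1, 2}, {3}]` and $H=\{1,2,3,4\}$, the only minimal FSTs are $\{e_1\}$, $\{e_3,e_4\}$, and $\{e_4,e_5\}$, and the function returns the value of $p_1+q_1p_3p_4+q_1q_3p_4p_5$, which agrees with the expression obtained in Example 1.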

For Step 1, the computational complexity of procedure COMGQ is $O(|E||F|)$, where $|E|=n$ and $|F|\ge\max(\max_i|FA_i|,|H|)$: the value of $h$ in the inner while-loop increases monotonically and never exceeds the value of $t$, the index of the outer while-loop, so the inner loop body runs $O(n)$ times in total, each time at $O(|F|)$ cost. For Step 2, by Theorem 1, $\Pr(E_i)$ can be computed in $O(g_i-g_{i-1}+1)$ time. Since there are $n$ such values $\Pr(E_i)$ to compute, Step 2 needs another $O\big(\sum_{i=1}^{n}(g_i-g_{i-1}+1)\big)=O(n+g_n-g_0)=O(n)=O(|E|)$ time, because $g_0=0$ and $g_n\le n$. Therefore, algorithm Reliability_Linear_DCS takes $O(|E||F|)+O(|E|)=O(|E||F|)$ time to compute the reliability of a DCS with a linear structure.

Example 1. Consider a possible DCS of a banking system [4,15] shown in Fig. 1. Each local disk stores some of the following information: the consumer accounts file (CAF), the administrative aids file (ADF), and the interest and exchange rates file (IXF).

Management report generation (MRG) at computers B and E denotes a query (program) to be executed for report generation. Fig. 2 shows the graph model of this system: a node represents a computer location, and the links form the communication network. We assume that the query MRG ($f_1$) requires the data files CAF ($f_2$), ADF ($f_3$), and IXF ($f_4$) to complete its execution. Let $V=\{v_0,v_1,v_2,v_3,v_4,v_5\}$, $E=\{e_1,e_2,e_3,e_4,e_5\}$, $F=\{f_1,f_2,f_3,f_4\}$, and $H=\{f_1,f_2,f_3,f_4\}$. Applying algorithm Reliability_Linear_DCS, we get

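Step 1: $g_0=0$ (boundary condition), $g_1=1$, $Q_1=p_1$, $g_2=1$, $Q_2=p_2p_3p_4$, $g_3=1$, $Q_3=p_3p_4$, $g_4=3$, $Q_4=p_4p_5$, $g_5=4$, and $Q_5=0$ ($I_5$ does not exist).

Step 2:

$$\Pr(E_1)=\Pr(E_2)=\Pr(E_3)=q_0Q_1=p_1,$$

$$\Pr(E_4)=\Pr(E_3)+(1-\Pr(E_0))\,q_1Q_2+(1-\Pr(E_1))\,q_2Q_3=p_1+q_1p_2p_3p_4+q_1q_2p_3p_4,$$

$$\Pr(E_5)=\Pr(E_4)+(1-\Pr(E_2))\,q_3Q_4=p_1+q_1p_2p_3p_4+q_1q_2p_3p_4+q_1q_3p_4p_5.$$

Fig. 1. A distributed banking system.

Fig. 2. The graph model for the distributed banking system in Fig. 1.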
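Because $\Pr(E_5)$ is a polynomial in the $p_i$, this result can be sanity-checked numerically. The Python sketch below (function names and sample reliabilities are ours) compares the expression just obtained with a direct enumeration of all $2^5$ edge states. It needs only the FSTs $I_1,\ldots,I_4$ found in Step 1, not the file placement of Fig. 2.

```python
from itertools import product


def example1_closed_form(p):
    """Pr(E_5) as derived in Example 1; p maps edge index i to p_i."""
    q = {i: 1.0 - pi for i, pi in p.items()}
    return (p[1] + q[1] * p[2] * p[3] * p[4]
            + q[1] * q[2] * p[3] * p[4]
            + q[1] * q[3] * p[4] * p[5])


def example1_enumerated(p):
    """Pr(S_1 or S_2 or S_3 or S_4), summing over all 2^5 edge states."""
    intervals = [{1}, {2, 3, 4}, {3, 4}, {4, 5}]      # I_1, ..., I_4 from Step 1
    total = 0.0
    for bits in product((0, 1), repeat=5):
        up = {i + 1 for i, b in enumerate(bits) if b}  # indices of working edges
        if any(I <= up for I in intervals):
            w = 1.0
            for i in range(1, 6):
                w *= p[i] if i in up else 1.0 - p[i]
            total += w
    return total


p = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}          # sample reliabilities
print(example1_closed_form(p), example1_enumerated(p))  # both about 0.951
```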

A ring DCS is a DCS with a circular communication link: each node is connected by two edges to its two neighboring nodes. Suppose $D=(V,E,F)$ is a DCS with a ring topology. By the factoring theorem [9], the DPR of $D$ can be written as

$$R(D_H)=p_e\,R((D*e)_H)+q_e\,R((D-e)_H),\qquad(1)$$

where $e$ is an arbitrary edge of $D$, $p_e$ is the reliability of edge $e$, $q_e\equiv1-p_e$, $D*e$ is the DCS $D$ with edge $e=(u,v)$ contracted, so that nodes $u$ and $v$ are merged into a single node that contains all data files previously held by $u$ and $v$, and $D-e$ is the DCS $D$ with edge $e$ deleted.
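Eq. (1) is the factoring (pivotal decomposition) identity obtained by conditioning on the state of $e$: if $e$ works, its endpoints behave as a single node, and if it fails, it can be removed. The Python sketch below checks the identity numerically on a small four-node ring. The helper names `delete_edge` and `contract_edge` and the exponential brute-force `dpr` oracle are hypothetical, and any parallel edges created by a contraction are simply kept as self-loops, which do not affect connectivity.

```python
from itertools import product


def _root(parent, a):
    while parent[a] != a:
        a = parent[a]
    return a


def dpr(files_at, edges, p, H):
    """Brute-force DPR: Pr{some component of working edges holds all of H}."""
    H, total = set(H), 0.0
    for state in product((0, 1), repeat=len(edges)):
        parent = list(range(len(files_at)))
        for (u, v), up in zip(edges, state):
            if up:
                parent[_root(parent, u)] = _root(parent, v)
        held = {}
        for v, fs in enumerate(files_at):
            held.setdefault(_root(parent, v), set()).update(fs)
        if any(H <= fs for fs in held.values()):
            w = 1.0
            for pk, up in zip(p, state):
                w *= pk if up else 1.0 - pk
            total += w
    return total


def delete_edge(files_at, edges, p, k):
    """D - e: remove edge k."""
    return files_at, edges[:k] + edges[k + 1:], p[:k] + p[k + 1:]


def contract_edge(files_at, edges, p, k):
    """D * e: merge the endpoints u != v of edge k; the merged node keeps both file sets."""
    u, v = edges[k]
    keep = [w for w in range(len(files_at)) if w != v]
    new_id = {w: i for i, w in enumerate(keep)}
    new_id[v] = new_id[u]                         # v is renamed to u
    new_files = [set(files_at[w]) for w in keep]
    new_files[new_id[u]] |= set(files_at[v])
    new_edges = [(new_id[a], new_id[b]) for j, (a, b) in enumerate(edges) if j != k]
    return new_files, new_edges, p[:k] + p[k + 1:]


# Four-node ring; check R(D) = p_e R(D*e) + q_e R(D-e) for every choice of e.
D = ([{1}, {2}, set(), {3}], [(0, 1), (1, 2), (2, 3), (3, 0)], [0.9, 0.8, 0.7, 0.6])
H = {1, 2, 3}
for k in range(len(D[1])):
    pe = D[2][k]
    rhs = pe * dpr(*contract_edge(*D, k), H) + (1 - pe) * dpr(*delete_edge(*D, k), H)
    print(k, abs(dpr(*D, H) - rhs) < 1e-12)       # True for each k
```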

Since $D-e$ is a DCS with a linear structure with $|E|-1$ edges, its DPR can be computed by algorithm Reliability_Linear_DCS in $O(|E||F|)$ time. Note that $D*e$ remains a DCS with a ring structure, now with $|E|-1$ edges, so we apply the same analysis to $D*e$. Recursively applying Eq. (1), the ring DCS $D$ with $|E|$ edges is decomposed into at most $|E|$ linear DCSs. Hence we obtain an $O(|E|^2|F|)$ time algorithm for computing the reliability of a DCS with a ring structure.

Algorithm Reliability_Ring_DCS(D)

// Given a DCS with a ring structure D = (V, E, F) and a specified set of files H, //
// this algorithm returns the DPR of D. //

Step 1: If some node in V holds all data files in H, then return (1).
Step 2: Select an arbitrary edge e of D.
Step 3: R_l ← Reliability_Linear_DCS(D − e).
Step 4: R_r ← Reliability_Ring_DCS(D * e).
Step 5: Return (p_e * R_r + q_e * R_l).

end Reliability_Ring_DCS

Fig. 3. A DCS with a ring structure.
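Combining Eq. (1) with the linear algorithm gives the following Python sketch of Reliability_Ring_DCS. It always factors on the edge that closes the ring (between the last and the first node), so $D-e$ is the line of nodes in their original order and $D*e$ is the ring with the first and last nodes merged. The function names, the edge-indexing convention, the base case for a one-node ring whose files do not cover $H$ (returning 0, which the algorithm above leaves implicit), and the brute-force cross-check are our assumptions.

```python
from collections import Counter
from itertools import product


def linear_dpr(files_at, p, H):
    """Reliability_Linear_DCS (COMGQ window + Theorem 1); p[i] is the reliability
    of the edge between files_at[i-1] and files_at[i], p[0] unused, p[i] > 0."""
    H, n = set(H), len(files_at) - 1
    if any(H <= set(fs) for fs in files_at):
        return 1.0
    g, Q = [0] * (n + 1), [0.0] * (n + 2)
    count, h, prod = Counter(files_at[0]), 0, 1.0
    for t in range(1, n + 1):
        count.update(files_at[t])
        prod *= p[t]
        while h < t and all(count[f] > 0 for f in H):
            Q[h + 1] = prod
            count.subtract(files_at[h])
            h += 1
            prod /= p[h]
        g[t] = h
    E = [0.0] * (n + 1)                              # E[0] = 0 (boundary condition)
    for k in range(1, n + 1):
        E[k] = E[k - 1] + sum(
            (1.0 - (E[i - 2] if i >= 2 else 0.0)) * ((1.0 - p[i - 1]) if i >= 2 else 1.0) * Q[i]
            for i in range(g[k - 1] + 1, g[k] + 1)
        )
    return E[n]


def ring_dpr(files, r, H):
    """Reliability_Ring_DCS; ring edge k joins node k and node (k + 1) % m, reliability r[k]."""
    H, m = set(H), len(files)
    if any(H <= set(fs) for fs in files):            # Step 1
        return 1.0
    if m == 1:                                       # everything merged, H still not held
        return 0.0
    pe = r[m - 1]                                    # Step 2: e joins node m-1 and node 0
    R_l = linear_dpr(files, [0.0] + list(r[: m - 1]), H)                 # Step 3: D - e
    merged = [set(files[0]) | set(files[m - 1])] + [set(fs) for fs in files[1 : m - 1]]
    R_r = ring_dpr(merged, list(r[: m - 1]), H)                          # Step 4: D * e
    return pe * R_r + (1.0 - pe) * R_l                                   # Step 5, Eq. (1)


def ring_dpr_bruteforce(files, r, H):
    """Exponential-time oracle: enumerate all edge states of the ring."""
    H, m, total = set(H), len(files), 0.0
    for x in product((0, 1), repeat=m):
        if all(x):
            arcs = [list(range(m))]
        else:
            # Working edges split the ring into arcs; start just after a failed edge.
            start, arcs, cur = x.index(0) + 1, [], []
            for step in range(m):
                v = (start + step) % m
                cur.append(v)
                if not x[v]:                         # edge v -> v+1 failed: the arc ends
                    arcs.append(cur)
                    cur = []
        if any(H <= set().union(*(files[v] for v in arc)) for arc in arcs):
            w = 1.0
            for rk, up in zip(r, x):
                w *= rk if up else 1.0 - rk
            total += w
    return total


files = [{1}, set(), {2}, {3}, set(), {1, 3}]        # hypothetical placement
r = [0.9, 0.8, 0.7, 0.6, 0.95, 0.85]
print(ring_dpr(files, r, {1, 2, 3}), ring_dpr_bruteforce(files, r, {1, 2, 3}))
```

On small rings the two functions agree to rounding; the factoring version makes at most one call to the linear routine per ring edge, in line with the decomposition into at most $|E|$ linear DCSs.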

Example 2. Consider the DCS with a ring topology shown in Fig. 3. This is the DCS of Fig. 2 with one edge $e_6$ added between nodes $v_0$ and $v_5$.

Applying algorithm Reliability_Ring_DCS, we have

$$\begin{aligned}
R(D_H)&=q_6R((D-e_6)_H)+p_6R((D*e_6)_H)\\
&=q_6R((D-e_6)_H)+p_6\{q_5R((D*e_6-e_5)_H)+p_5R((D*e_6*e_5)_H)\}.
\end{aligned}$$

Since one node in $D*e_6*e_5$ holds all files in $H$, we have $R((D*e_6*e_5)_H)=1$. From Example 1, it is easy to see that $R((D-e_6)_H)=\Pr(E_5)$ and $R((D*e_6-e_5)_H)=\Pr(E_4)$. So we have

$$R(D_H)=q_6(p_1+q_1p_2p_3p_4+q_1q_2p_3p_4+q_1q_3p_4p_5)+p_6[q_5(p_1+q_1p_2p_3p_4+q_1q_2p_3p_4)+p_5].$$

4. Computational complexity of the DPR problem on a ring of trees topology

Complexity results are obtained by transforming known NP-hard problems into our reliability problems [10–14]. For this reason, we first state some known NP-hard problems.

(i) K-terminal reliability (KTR)

Input: an undirected graph $G=(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges, which fail s-independently of each other with known probabilities; a distinguished set $K\subseteq V$ with $|K|\ge2$.

Output: $R(G_K)$, the probability that the set $K$ of nodes of $G$ is connected in $G$.

(ii) Number of edge covers (CEC)

Input: an undirected graph $G=(V,E)$.

Output: the number of edge covers of $G$, $|\{L\subseteq E:\text{each node of }G\text{ is an end of some edge in }L\}|$.

(iii) Number of vertex covers (CVC)

Input: an undirected graph $G=(V,E)$.

Output: the number of vertex covers of $G$, $|\{K\subseteq V:\text{every edge of }G\text{ has at least one end in }K\}|$.

Theorem 2. Computing DPR for a DCS with a star topology is NP-hard, even when each $|FA_i|=2$.

Proof. We reduce the CEC problem to our problem. Given a network $G=(V,E)$ with $E=\{e_1,e_2,\ldots,e_n\}$, we construct a DCS $D=(V',E',F)$ with a star topology, where $V'=\{s,v_1,v_2,\ldots,v_n\}$, $E'=\{(s,v_i)\mid1\le i\le n\}$, and $F=\{f_u\mid u\in V\}$. Let $FA_{v_i}=\{f_u,f_v\}$ if $e_i=(u,v)\in E$, for $1\le i\le n$, $FA_s=\emptyset$, and $H=F$. From the construction of $D$, it is easy to show that there is a one-to-one correspondence between the edge covers of $G$ and the FSTs of $D$, edge $e_i$ of $G$ corresponding to edge $(s,v_i)$ of $D$. The DPR of $D$ can therefore be expressed as

$$R(D_H)=\sum_{T\in\mathcal{C}}\;\prod_{i\in T}p_i\prod_{i\notin T}(1-p_i),$$

where $\mathcal{C}$ is the family of edge covers of $G$. In particular, setting every $p_i=1/2$ gives $R(D_H)=|\mathcal{C}|/2^n$. Thus, a polynomial-time algorithm for computing $R(D_H)$ over a DCS with a star topology and each $|FA_i|=2$ would imply an efficient algorithm for the CEC problem. Since the CEC problem is NP-hard, Theorem 2 follows. □
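The reduction can be exercised on small graphs. In the Python sketch below (function names are ours), `count_edge_covers` solves CEC by enumeration and `star_dpr_from_edges` computes the DPR of the star built in the proof. With every $p_i=1/2$, each set of working spokes has probability $2^{-n}$, so $2^nR(D_H)$ should equal the number of edge covers. The example graphs have at least three nodes, so that no single leaf holds all of $F$ on its own.

```python
from itertools import combinations, product


def count_edge_covers(nodes, edges):
    """CEC by brute force: subsets L of E such that every node is an end of some edge in L."""
    return sum(
        1
        for r in range(len(edges) + 1)
        for L in combinations(edges, r)
        if set(nodes) <= {x for e in L for x in e}
    )


def star_dpr_from_edges(nodes, edges, p):
    """DPR of the Theorem 2 star: leaf v_i holds {f_u, f_v} for e_i = (u, v),
    the center s holds nothing, H = F, and spoke (s, v_i) has reliability p[i]."""
    H = set(nodes)                              # file f_u is represented by node u
    leaf_files = [set(e) for e in edges]
    total = 0.0
    for x in product((0, 1), repeat=len(edges)):
        # Leaves with a working spoke are connected through s; the others are isolated.
        via_center = set().union(*(fs for fs, up in zip(leaf_files, x) if up))
        alone = any(H <= fs for fs, up in zip(leaf_files, x) if not up)
        if H <= via_center or alone:
            w = 1.0
            for pk, up in zip(p, x):
                w *= pk if up else 1.0 - pk
            total += w
    return total


# A path 0-1-2-3 and a triangle; with every p_i = 1/2, 2^n * DPR = number of edge covers.
for nodes, edges in [([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]),
                     ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])]:
    n = len(edges)
    print(count_edge_covers(nodes, edges), 2 ** n * star_dpr_from_edges(nodes, edges, [0.5] * n))
# prints "2 2.0" and then "4 4.0"
```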

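Theorem 3. Computing DPR for a DCS with a star topology is NP-hard, even when there are only two copies of each file.

Proof. We reduce the CVC problem to our problem. Given $G=(V,E)$ with $|E|=n$ and $V=\{v_1,v_2,\ldots,v_m\}$, we construct a DCS $D=(V',E',F)$ with a star topology, where $V'=V\cup\{s\}$, $E'=\{e_i=(s,v_i)\mid1\le i\le m\}$, and $F=\{f_j\mid j\text{ is an edge of }G\}$. Let $FA_{v_i}=\{f_j\mid\text{edge }j\text{ is incident on }v_i\text{ in }G\}$ and $H=F$. From the construction of $D$, it is easy to show that there are only two copies of each file in $D$ and that there is a one-to-one correspondence between the vertex covers of $G$ and the FSTs of $D$. The DPR of $D$ can be expressed as

$$R(D_H)=\sum_{T\in\mathcal{V}}\;\prod_{i\in T}p_i\prod_{i\notin T}(1-p_i),$$

where $\mathcal{V}$ is the family of vertex covers of $G$. Setting every $p_i=1/2$ gives $R(D_H)=|\mathcal{V}|/2^m$, so a polynomial-time algorithm for computing $R(D_H)$ would imply an efficient algorithm for the CVC problem. Since the CVC problem is NP-hard, Theorem 3 follows. □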
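The same experiment works for Theorem 3. In the Python sketch below (function names are ours), each file corresponds to an edge of $G$ and is therefore stored at exactly two leaves; with every $p_i=1/2$, $2^mR(D_H)$ should equal the number of vertex covers. The example graphs have no vertex incident on every edge, so no single leaf holds all of $F$ on its own.

```python
from itertools import combinations, product


def count_vertex_covers(nodes, edges):
    """CVC by brute force: subsets K of V such that every edge has an end in K."""
    return sum(
        1
        for r in range(len(nodes) + 1)
        for K in combinations(nodes, r)
        if all(u in K or v in K for u, v in edges)
    )


def star_dpr_from_vertices(nodes, edges, p):
    """DPR of the Theorem 3 star: leaf v_i holds the files of the edges incident
    on node i, s holds nothing, H = F, and spoke (s, v_i) has reliability p[i]."""
    H = set(range(len(edges)))                  # file f_j is represented by j
    leaf_files = [{j for j, e in enumerate(edges) if v in e} for v in nodes]
    total = 0.0
    for x in product((0, 1), repeat=len(nodes)):
        via_center = set().union(*(fs for fs, up in zip(leaf_files, x) if up))
        alone = any(H <= fs for fs, up in zip(leaf_files, x) if not up)
        if H <= via_center or alone:
            w = 1.0
            for pk, up in zip(p, x):
                w *= pk if up else 1.0 - pk
            total += w
    return total


path = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
cycle = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
for nodes, edges in (path, cycle):
    m = len(nodes)
    print(count_vertex_covers(nodes, edges), 2 ** m * star_dpr_from_vertices(nodes, edges, [0.5] * m))
# prints "8 8.0" and then "7 7.0"
```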


Theorem 4. Computing DPR for a DCS with a tree topology is NP-hard.

Proof. By Theorems 2 and 3, the DPR problem for a DCS with a star topology is, in general, NP-hard. Hence the DPR problem for a DCS with a tree topology is also NP-hard in general, since a DCS with a star topology is just a DCS with a tree topology that has one level of branches. □

Now we use the results of Theorems 2–4 to prove that the DPR problem on a ring of trees topology is NP-hard.

Theorem 5. Computing DPR for a DCS with a ring of trees topology is NP-hard, even with one level of tree.

Proof. Given a DCS graph $D=(V,E,F)$ with a star topology, where $V=\{s,v_1,v_2,\ldots,v_n\}$ and $E=\{(s,v_i)\mid1\le i\le n\}$, we construct a DCS graph $D'=(V',E',F)$ from $D$, where

$$V'=\{v_1,v_2,\ldots,v_n\}\cup\{s_j\mid1\le j\le n\}$$

and

$$E'=\{(s_j,s_{j+1})\mid1\le j<n\}\cup\{(s_n,s_1)\}\cup\{(s_j,v_j)\mid1\le j\le n\}.$$

It is easy to see that $D'$ has a ring of trees topology with one level of tree. If we assume that all added edges of $D'$, namely $\{(s_j,s_{j+1})\mid1\le j<n\}\cup\{(s_n,s_1)\}$, are perfectly reliable, then $R(D'_H)=R(D_H)$ for any given $H\subseteq F$. By Theorems 2 and 3, computing DPR over a DCS with a star topology is NP-hard; thus, computing DPR over a DCS with a ring of trees topology with one level of tree is also NP-hard. □
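The construction can be checked numerically on a small case. The Python sketch below (helper names hypothetical, with an exponential brute-force `dpr` oracle) builds the star $D$ and the ring of trees $D'$ from the same leaf file sets, gives the added ring edges reliability 1, and compares the two DPR values. The leaf files used are those of the Theorem 2 construction for a triangle, so any two leaves together hold all of $F$.

```python
from itertools import product


def _root(parent, a):
    """Representative of node a in a union-find forest."""
    while parent[a] != a:
        a = parent[a]
    return a


def dpr(files_at, edges, p, H):
    """Brute-force DPR: Pr{some component of working edges holds all of H}."""
    H, total = set(H), 0.0
    for state in product((0, 1), repeat=len(edges)):
        parent = list(range(len(files_at)))
        for (u, v), up in zip(edges, state):
            if up:
                parent[_root(parent, u)] = _root(parent, v)
        held = {}
        for v, fs in enumerate(files_at):
            held.setdefault(_root(parent, v), set()).update(fs)
        if any(H <= fs for fs in held.values()):
            w = 1.0
            for pk, up in zip(p, state):
                w *= pk if up else 1.0 - pk
            total += w
    return total


def star(leaf_files, spoke_p):
    """Star D: node 0 is the center s (no files); node i >= 1 is the leaf v_i."""
    files_at = [set()] + [set(fs) for fs in leaf_files]
    edges = [(0, i + 1) for i in range(len(leaf_files))]
    return files_at, edges, list(spoke_p)


def ring_of_trees(leaf_files, spoke_p):
    """D' of Theorem 5: ring s_1..s_n (nodes 0..n-1), leaf v_j (node n+j-1)
    hung from s_j; the added ring edges get reliability 1."""
    n = len(leaf_files)
    files_at = [set() for _ in range(n)] + [set(fs) for fs in leaf_files]
    ring = [(j, (j + 1) % n) for j in range(n)]
    spokes = [(j, n + j) for j in range(n)]
    return files_at, ring + spokes, [1.0] * n + list(spoke_p)


# Leaf files of the Theorem 2 star for a triangle G, so H = F = {0, 1, 2}.
leaf_files = [{0, 1}, {1, 2}, {0, 2}]
spoke_p = [0.9, 0.6, 0.3]
H = {0, 1, 2}
print(dpr(*star(leaf_files, spoke_p), H), dpr(*ring_of_trees(leaf_files, spoke_p), H))
```

Both values are about 0.666, the probability that at least two of the three spokes work.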

Theorem 6. Computing DPR for a DCS with a ring of trees topology is, in general, NP-hard.

Proof. By Theorem 5, the DPR problem for a DCS with a ring of trees topology is NP-hard even with one level of tree. Using the same approach as in Theorem 5, we can construct a ring of trees topology from a tree topology. By Theorem 4, computing DPR over a DCS with a tree topology is NP-hard; thus, computing DPR over a DCS with a ring of trees topology is also NP-hard. □

5. Conclusions

In this paper, we investigated the problem of distributed program reliability on ring distributed computing systems. We proposed a polynomial-time algorithm for computing the DPR on a ring topology. We also proved Theorems 5 and 6, which show that solving the DPR problem on a ring of trees topology is NP-hard.

Appendix


Theorem 1.

$$\Pr(E_n)=\Pr(E_{n-1})+\sum_{i=g_{n-1}+1}^{g_n}\big[(1-\Pr(E_{i-2}))\,q_{i-1}Q_i\big]$$

with the boundary conditions $\Pr(E_i)=0$, $g_i=0$, and $p_i=0$ for $i\le0$.

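Proof. The FSTs $I_i$ that lie within $e_1,\ldots,e_n$ but not within $e_1,\ldots,e_{n-1}$ are exactly $I_{g_{n-1}+1},\ldots,I_{g_n}$, and all of them end at edge $e_n$. Hence

$$\Pr(E_n)=\Pr\Big(E_{n-1}\cup\bigcup_{i=g_{n-1}+1}^{g_n}S_i\Big)=\Pr(E_{n-1})+\Pr\Big(\bar E_{n-1}\cap\bigcup_{i=g_{n-1}+1}^{g_n}S_i\Big).\qquad(\mathrm{A.1})$$

For the term $\Pr\big(\bar E_{n-1}\cap\bigcup_{i=g_{n-1}+1}^{g_n}S_i\big)$ in Eq. (A.1), we have

$$\begin{aligned}
\Pr\Big(\bar E_{n-1}\cap\bigcup_{i=g_{n-1}+1}^{g_n}S_i\Big)
={}&\Pr(\bar E_{n-1}\cap S_{g_{n-1}+1})+\Pr(\bar E_{n-1}\cap\bar S_{g_{n-1}+1}\cap S_{g_{n-1}+2})\\
&+\Pr(\bar E_{n-1}\cap\bar S_{g_{n-1}+1}\cap\bar S_{g_{n-1}+2}\cap S_{g_{n-1}+3})\\
&+\cdots+\Pr(\bar E_{n-1}\cap\bar S_{g_{n-1}+1}\cap\cdots\cap\bar S_{g_n-1}\cap S_{g_n}).
\end{aligned}\qquad(\mathrm{A.2})$$

Since $S_i=S_{i+1}\cap\{x_i=1\}$ for $g_{n-1}+1\le i<g_n$, we have $S_i\subseteq S_{i+1}$, $\bar S_i\cap\bar S_{i+1}=\bar S_{i+1}$, and $\bar S_i\cap S_{i+1}=\{x_i=0\}\cap S_{i+1}$. Therefore

$$\begin{aligned}
\Pr\Big(\bar E_{n-1}\cap\bigcup_{i=g_{n-1}+1}^{g_n}S_i\Big)
={}&\Pr(\bar E_{n-1}\cap S_{g_{n-1}+1})+\Pr(\bar E_{n-1}\cap\bar S_{g_{n-1}+1}\cap S_{g_{n-1}+2})\\
&+\Pr(\bar E_{n-1}\cap\bar S_{g_{n-1}+2}\cap S_{g_{n-1}+3})+\cdots+\Pr(\bar E_{n-1}\cap\bar S_{g_n-1}\cap S_{g_n})\\
={}&\Pr(\bar E_{n-1}\cap S_{g_{n-1}+1})+\sum_{i=g_{n-1}+2}^{g_n}\Pr(\bar E_{n-1}\cap\bar S_{i-1}\cap S_i)\\
={}&\Pr(\bar E_{g_{n-1}-1}\cap\{x_{g_{n-1}}=0\}\cap S_{g_{n-1}+1})+\sum_{i=g_{n-1}+2}^{g_n}\Pr(\bar E_{i-2}\cap\{x_{i-1}=0\}\cap S_i)\\
={}&\sum_{i=g_{n-1}+1}^{g_n}\Pr(\bar E_{i-2}\cap\{x_{i-1}=0\}\cap S_i).
\end{aligned}$$

The third equality uses two observations. First, $\bar E_{n-1}\cap S_{g_{n-1}+1}$ forces $x_{g_{n-1}}=0$, because otherwise $I_{g_{n-1}}\subseteq\{e_{g_{n-1}}\}\cup I_{g_{n-1}+1}$, which ends no later than $e_{n-1}$, would be operational (when $g_{n-1}=0$, $x_0$ refers to a dummy edge with $p_0=0$). Second, given $x_{i-1}=0$, every $I_k$ with $k\le g_{n-1}$ either contains the failed edge $e_{i-1}$ or lies within $e_1,\ldots,e_{i-2}$, so $\bar E_{n-1}\cap\{x_{i-1}=0\}=\bar E_{i-2}\cap\{x_{i-1}=0\}$.

The events $\bar E_{i-2}$, $\{x_{i-1}=0\}$, and $S_i$ depend on the disjoint edge sets $\{e_1,\ldots,e_{i-2}\}$, $\{e_{i-1}\}$, and $I_i\subseteq\{e_i,\ldots,e_n\}$, respectively, so they are s-independent and $\Pr(\bar E_{i-2}\cap\{x_{i-1}=0\}\cap S_i)=\Pr(\bar E_{i-2})\Pr(\{x_{i-1}=0\})\Pr(S_i)$. So

$$\Pr\Big(\bar E_{n-1}\cap\bigcup_{i=g_{n-1}+1}^{g_n}S_i\Big)=\sum_{i=g_{n-1}+1}^{g_n}\Pr(\bar E_{i-2})\Pr(\{x_{i-1}=0\})\Pr(S_i)=\sum_{i=g_{n-1}+1}^{g_n}\big[(1-\Pr(E_{i-2}))\,q_{i-1}Q_i\big].\qquad(\mathrm{A.3})$$

Substituting (A.3) into (A.1) proves Theorem 1. □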


References

[1] Prasanna Kumar VK, Hariri S, Raghavendra CS. Distributed program reliability analysis. IEEE Transactions on Software Engineering 1986;SE-12:42–50.

[2] Hariri S, Raghavendra CS. SYREL: A symbolic reliability algorithm based on path and cutset methods. USC Tech. Rep., 1984.

[3] Kumar A, Rai S, Agrawal DP. Reliability evaluation algorithms for distributed systems. In: Proceedings of IEEE INFOCOM '88, 1988. p. 851–60.

[4] Kumar A, Rai S, Agrawal DP. On computer communication network reliability under task execution constraints. IEEE Journal on Selected Areas in Communications 1988;6(8):1393–9.

[5] Jain R. FDDI handbook: high speed networking using fiber and other media. New York: Addison-Wesley, 1994.

[6] Yin J, Silio CB Jr. K-terminal reliability in ring networks. IEEE Transactions on Reliability 1994;43(3):389–400.

[7] Logothetis D, Trivedi KS. Reliability analysis of the double counter-rotating ring with concentrator attachment. IEEE Transactions on Networking 1994;2(5):520–32.

[8] Chen DJ, Chang MS, Yang CL, Ku KL. Multimedia task reliability analysis based on token ring network. In: 1996 International Conference on Parallel and Distributed Systems, 1996. p. 265–72.

[9] Wood RK. Factoring algorithms for computing K-terminal network reliability. IEEE Transactions on Reliability 1986;R-35:269–78.

[10] Rosenthal A. A computer scientist looks at reliability computations. In: Reliability and fault tree analysis. SIAM, 1975. p. 133–52.

[11] Valiant LG. The complexity of enumeration and reliability problems. SIAM Journal on Computing 1979;8:410–21.

[12] Ball MO, Provan JS, Shier DR. Reliability covering problems. Networks 1991;21:345–57.

[13] Provan JS, Ball MO. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal on Computing 1983;12(4):777–88.

[14] Provan JS. The complexity of reliability computations in planar and acyclic graphs. SIAM Journal on Computing 1986;15:694–702.

[15] Sheppard DA. Standard for banking communication system. IEEE Transaction Computer 1987;92–5.

Min-Sheng Lin received his MS and Ph.D. degrees in Computer Science and Information Engineering from National Chiao Tung University (Hsin Chu, Taiwan). He is currently an associate professor at Aletheia University (Taipei, Taiwan). His research interests include reliability and performance evaluation of distributed computing systems.

Ming-Sang Chang received the BS degree in Electronic Engineering from National Taiwan University of Science and Technology (Taipei, Taiwan), the MS degree in Information Engineering from Tamkang University (Taipei, Taiwan), and the Ph.D. degree in Computer Science and Information Engineering from National Chiao Tung University (Hsin Chu, Taiwan). He is currently working at the Chunghwa Telecommunication Training Institute (Taipei, Taiwan). His research interests include computer networks, performance evaluation, distributed systems, and reliability evaluation.

Deng-Jyi Chen received the BS degree in Computer Science from Missouri State University (Cape Girardeau) and the MS and Ph.D. degrees in Computer Science from the University of Texas (Arlington). He is now a professor at National Chiao Tung University (Hsin Chu, Taiwan). He has published more than 100 journal and conference papers in the areas of reliability and performance modeling of distributed systems, computer networks, object-oriented systems, and software reuse. Professor Chen works very closely with industry and provides consulting for many local software and hardware companies. So far, he has been the chief leader in designing and implementing two commercial products that are now marketed around the world.

Kuo-Lung Ku received his bachelor's and master's degrees from Chiao Tung University and his Ph.D. in Electrical Engineering from the University of Washington. His major research areas include computer architecture, real-time computation, and security. He works at the Chung-Shan Institute of Science and Technology.
