The computational complexity of the reliability problem on distributed systems


Information Processing Letters 64 (1997) 143-147


Min-Sheng Lin a,*, Deng-Jyi Chen b,1

a Department of Information Management, Tamsui Oxford University College, Tamsui, Taipei, 25103, Taiwan, ROC
b Institute of Computer Science and Information Engineering, National Chiao-Tung University, Hsin Chu, 30050, Taiwan, ROC

Received 31 January 1997; revised 16 July 1997
Communicated by T. Asano

Abstract

The reliability of a distributed program in a distributed computing system is the probability that a program which runs on multiple processing elements and needs to communicate with other processing elements for remote data files will be executed successfully. This reliability varies according to (1) the topology of the distributed computing system, (2) the reliability of the communication links, (3) the data files and program distribution among processing elements, and (4) the data files required to execute a program. This paper shows that solving this reliability problem is NP-hard even when the distributed computing system is restricted to a series-parallel, a 2-tree, a tree, or a star structure. © 1997 Elsevier Science B.V.

Keywords: Distributed systems; Distributed program reliability; Computational complexity; Graph theory

1. Introduction

A typical distributed computing system (DCS) consists of processing elements (nodes), communication links (edges), memory units, data files, and programs [5,6]. These resources are interconnected via a communication network that dictates how information flows between nodes. Programs residing on some nodes can run using data files at other nodes.

One important issue in the design of a DCS is reliability. A large amount of work [1,8,12,16] has been devoted to developing algorithms to compute measures of reliability for DCSs. One typical reliability measure for DCSs is the K-terminal reliability (KTR) [20]. KTR is the probability that a specified set of nodes $K$, which is a subset of all the nodes in the DCS, remains connected in a DCS whose edges may fail independently of each other with known probabilities. However, the KTR measure is not directly applicable to practical DCSs, since a reliability measure for DCSs should capture the effects of redundant distribution of programs and data files.

* Corresponding author. Email: [email protected].
1 Email: [email protected].

In [10], distributed program reliability (DPR) was introduced to accurately model the reliability of DCSs. Consider a DCS in which the nodes are perfectly reliable but the edges can fail, statistically independently of each other, with known probabilities. For successful execution of a distributed program, it is essential that the node containing the program, other nodes that have required data files, and the edges between them be operational. DPR is thus defined as the probability that a program with distributed files can run successfully in spite of some faults occurring in the edges. To illustrate the definition of DPR, consider the DCS shown in Fig. 1. There are four nodes ($v_1$, $v_2$, $v_3$, $v_4$) and five edges ($e_1$, $e_2$, $e_3$, $e_4$, $e_5$). Program $f_1$ requires data files $f_2$, $f_3$, and $f_4$ to complete execution, and it is running at node $v_1$, which holds data files $f_2$ and $f_3$. Hence, it must access data file $f_4$, which is resident at both node $v_2$ and node $v_4$. Therefore, the reliability of distributed program $f_1$ can be formulated as follows:

DPR(program $f_1$) = Prob(($v_1$ and $v_2$ are connected) or ($v_1$ and $v_4$ are connected)).

Fig. 1. A simple DCS. Program $f_1$ needs data files $f_2$, $f_3$, and $f_4$ to complete execution.

Several algorithms have been proposed for evaluating DPR [3,4,9,11]. However, none of them meets our need for an efficient algorithm. At this point, one must conclude that either the approaches examined are just not sufficiently clever, or that no efficient algorithms exist for our reliability problems. Nevertheless, tools in complexity theory do provide a vehicle for giving strong evidence that no polynomial-time algorithm exists for certain problems. A generally accepted method for providing evidence of intractability is to prove that a problem is NP-complete or NP-hard. We refer the reader to [7] for an excellent exposition of the theory of NP-completeness and for proofs of standard NP-completeness results. NP-hardness is not a proof of intractability, but it is convincing evidence: an efficient solution for any NP-hard problem would provide efficient methods for every problem in NP, which contains a formidable list of apparently difficult problems.

The purpose of this paper is to show that the DPR problem, in general, is NP-hard even on a DCS with a series-parallel, a 2-tree, a tree, or a star structure.

In this paper we will make use of the following notation:

$D = (V, E, F)$: an undirected DCS graph with vertex (node) set $V$, edge set $E$, and file set $F$, which is distributed in $D$,

$V$: the set of nodes, all of which are perfectly reliable,

$E$: the set of edges that can fail, statistically independently of each other, with known probabilities,

$F$: the set of files (including data files and programs) distributed in $D$,

$H \subseteq F$: the specified set of files that must communicate with each other through the edges,

$FA_i \subseteq F$: the set of files available at node $i$,

$p_i$: the reliability of edge $i$,

$q_i \equiv 1 - p_i$,

$R(D_H)$: the DPR of $D$ with $H$ specified $\equiv$ the probability that all files in $H$ can communicate with each other through the edges in $D$.

Using the above notation, we can describe the example in Fig. 1 as follows:

$V = \{v_1, v_2, v_3, v_4\}$, $E = \{e_1, e_2, e_3, e_4, e_5\}$, $F = \{f_1, f_2, f_3, f_4, f_5, f_6\}$,

$H = \{f_1, f_2, f_3, f_4\}$ // Program $f_1$ needs data files $f_2$, $f_3$, and $f_4$ to complete execution. //

$FA_{v_1} = \{f_1, f_2, f_3\}$, $FA_{v_2} = \{f_2, f_4\}$, $FA_{v_3} = \{f_2, f_5\}$, and $FA_{v_4} = \{f_4, f_6\}$.
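The definition of $R(D_H)$ can be made concrete with a brute-force evaluation that enumerates every edge state. The following Python sketch is only an illustration and is not one of the algorithms of [3,4,9,11]; the function name `dpr` is hypothetical, and because the figure itself does not fix them here, the endpoints and reliabilities chosen for $e_1, \ldots, e_5$ are assumptions. The file placement and $H$ are those of the example.

```python
from itertools import product

def dpr(nodes, edges, files_at, needed):
    """Brute-force DPR over all 2^|E| edge states (exponential time).

    edges    -- list of (u, v, p), p being the reliability of edge (u, v)
    files_at -- dict mapping each node i to its file set FA_i
    needed   -- the file set H
    """
    needed = set(needed)
    total = 0.0
    for state in product([True, False], repeat=len(edges)):
        prob = 1.0
        parent = {v: v for v in nodes}  # union-find over operational edges

        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x

        for up, (u, v, p) in zip(state, edges):
            prob *= p if up else 1.0 - p
            if up:
                parent[find(u)] = find(v)
        # Pool the files held inside each connected component.
        held = {}
        for v in nodes:
            held.setdefault(find(v), set()).update(files_at[v])
        # The state counts if one component holds every file in H.
        if any(needed <= files for files in held.values()):
            total += prob
    return total

# File placement and H are taken from the Fig. 1 example in the text.
nodes = ["v1", "v2", "v3", "v4"]
files_at = {"v1": {"f1", "f2", "f3"}, "v2": {"f2", "f4"},
            "v3": {"f2", "f5"}, "v4": {"f4", "f6"}}
H = {"f1", "f2", "f3", "f4"}
# ASSUMPTION: hypothetical endpoints and reliabilities for e1..e5.
edges = [("v1", "v2", 0.9), ("v1", "v3", 0.9), ("v2", "v3", 0.9),
         ("v2", "v4", 0.9), ("v3", "v4", 0.9)]
print(dpr(nodes, edges, files_at, H))
```

The enumeration takes time exponential in the number of edges, which is consistent with, though of course no proof of, the hardness results that follow.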

2. The computational complexity of the DPR problem

Complexity results are obtained by transforming known NP-hard problems into our reliability problems. For this reason, we first state some known NP-hard problems:

(1) K-Terminal Reliability (KTR) [15,18].

Input: an undirected graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges that fail statistically independently of each other with known probabilities. A set $K \subseteq V$ is distinguished with $|K| \ge 2$.

Output: $R(G_K)$, the probability that the set $K$ of nodes of $G$ is connected in $G$.

(2) Number of Edge Covers (#EC) [2].

Input: an undirected graph $G = (V, E)$.

Output: the number of edge covers for $G \equiv |\{L \subseteq E : \text{each node of } G \text{ is an end of some edge in } L\}|$.

(3) Number of Vertex Covers (#VC) [14].

Input: an undirected graph $G = (V, E)$.

Output: the number of vertex covers for $G \equiv |\{K \subseteq V : \text{every edge of } G \text{ has at least one end in } K\}|$.

Theorem 1. Computing DPR for a general DCS is NP-hard.

Proof. We reduce the well-known KTR problem to our DPR problem. For a given network $G = (V, E)$ and a specified set $K \subseteq V$, we can define an instance of the DPR problem. Construct a DCS graph $D = (V, E, F)$ in which the topology and the reliability of each edge are the same as in $G$. Let $F = \bigcup_{\text{node } i \in K} \{f_i\}$ and $FA_i = \{f_i\}$ if node $i \in K$, else $FA_i = \emptyset$, for each node $i \in V$. If we set $H = F = \bigcup_{\text{node } i \in K} \{f_i\}$, then we have $R(D_H) = R(G_K)$. □
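The construction in this proof can be checked numerically on a small instance. The Python sketch below is a minimal illustration under stated assumptions: the helper names (`components`, `edge_states`, `ktr`, `dpr`, `ktr_to_dpr`) and the example graph are hypothetical, and both measures are evaluated by exhaustive enumeration rather than by any algorithm from the literature.

```python
from itertools import product

def components(nodes, up_edges):
    """Map each node to a component label, using only operational edges."""
    label = {v: v for v in nodes}

    def find(x):
        while label[x] != x:
            x = label[x]
        return x

    for u, v in up_edges:
        label[find(u)] = find(v)
    return {v: find(v) for v in nodes}

def edge_states(edges):
    """Yield (probability, operational edges) for every edge state."""
    for state in product([True, False], repeat=len(edges)):
        prob, up = 1.0, []
        for ok, (u, v, p) in zip(state, edges):
            prob *= p if ok else 1.0 - p
            if ok:
                up.append((u, v))
        yield prob, up

def ktr(nodes, edges, K):
    """R(G_K): probability that all nodes of K lie in one component."""
    total = 0.0
    for prob, up in edge_states(edges):
        comp = components(nodes, up)
        if len({comp[v] for v in K}) == 1:
            total += prob
    return total

def dpr(nodes, edges, files_at, needed):
    """R(D_H): probability that one component holds every file in H."""
    total = 0.0
    for prob, up in edge_states(edges):
        comp = components(nodes, up)
        held = {}
        for v in nodes:
            held.setdefault(comp[v], set()).update(files_at[v])
        if any(set(needed) <= files for files in held.values()):
            total += prob
    return total

def ktr_to_dpr(nodes, K):
    """Theorem 1 construction: FA_i = {f_i} if i in K, else empty; H = F."""
    files_at = {v: ({f"f{v}"} if v in K else set()) for v in nodes}
    needed = {f"f{v}" for v in K}
    return files_at, needed

# Hypothetical instance: a 4-cycle with one chord.
nodes = [0, 1, 2, 3]
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.7), (3, 0, 0.6), (0, 2, 0.5)]
K = {0, 2}
files_at, H = ktr_to_dpr(nodes, K)
print(ktr(nodes, edges, K), dpr(nodes, edges, files_at, H))
```

Because each file $f_i$ exists in exactly one copy, a component holds all of $H$ exactly when it contains all of $K$, so the two printed values agree.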

Corollary 2. Computing DPR for a planar DCS is NP-hard.

Proof. From the proof of Theorem 1, it is clear that the KTR problem is just a special case of the DPR problem. It has been shown that computing KTR over planar networks is NP-hard [13]. This immediately implies that computing DPR over planar networks is also NP-hard. □

The result of Theorem 1 implies that it is unlikely that polynomial-time algorithms exist for solving the DPR problem. One possible means of avoiding this complexity is to consider only a restricted class of structures. Classes of interest here include linear systems, which are widely used in bus local networks; ring systems, which are widely used in token ring local networks; stars, which are used in one-node circuit-switched networks; trees, which are used in hierarchical local access networks; and series-parallel systems, which arise in wide-area networks.

For the KTR problem, polynomial-time (or linear-time) algorithms have been developed for such restricted networks, including linear systems, ring systems, stars, trees, and series-parallel graphs [17]. Obviously, if there are no replicated files, i.e., if there is only one copy of each file in the DCS, then the DPR problem can be transformed into an equivalent KTR problem in which the set $K$ is the set of nodes that contain the data files needed for the program under consideration. However, data files are usually replicated and distributed in a DCS, so the two problems are different. In the remainder of this section, we will show that computing DPR over stars, trees, or series-parallel networks is in general still NP-hard.

Theorem 3. Computing DPR for a DCS with a star topology is NP-hard even when each $|FA_i| = 2$.

Proof. We reduce the #EC problem to our problem. For a given network $G = (V_1, E_1)$, where $E_1 = \{e_1, e_2, \ldots, e_n\}$, we construct a DCS $D = (V_2, E_2, F)$ with a star topology, where $V_2 = \{s, v_1, v_2, \ldots, v_n\}$, $E_2 = \{(s, v_i) \mid 1 \le i \le n\}$, and $F = \{f_i \mid \text{for each node } i \in G\}$. Let $FA_{v_i} = \{f_u, f_v \mid e_i = (u, v) \in G\}$ for $1 \le i \le n$, $FA_s = \emptyset$, and $H = F$. In the DCS we now define a file spanning tree (FST), which is a tree whose nodes hold all files in $H$, i.e., $H \subseteq \bigcup_{v_i \in \text{FST}} FA_i$. From the construction of $D$, it is easy to show that there is a one-to-one correspondence between the edge covers of $G$ and the FSTs of $D$. The DPR of $D$, $R(D_H)$, can be expressed as

$$R(D_H) = \sum_{\text{for all FST } t \in D} \Bigl\{ \prod_{\text{for each edge } i \in t} p_i \times \prod_{\text{for each edge } i \notin t} (1 - p_i) \Bigr\}.$$

If we set each $p_i = \frac{1}{2}$ for all $1 \le i \le n$, then we have

$$R(D_H) = \sum_{\text{for all FST } t \in D} \left(\tfrac{1}{2}\right)^n,$$

$$R(D_H)\, 2^n = \sum_{\text{for all FST } t \in D} 1 = \#\text{ of FSTs in } D = \#\text{ of edge covers in } G. \qquad \square$$
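The identity $R(D_H)\,2^n = \#$ of edge covers in $G$ can be verified by enumeration on a small graph. The following Python sketch is an illustration only: the function names and the example graph are hypothetical, exact rational arithmetic is used to avoid rounding, and `star_dpr` assumes that no single leaf holds every file of $H$ (true whenever $G$ has more than two nodes), so that success always requires going through the hub $s$.

```python
from itertools import combinations
from fractions import Fraction

def count_edge_covers(n_nodes, edges):
    """Count subsets L of edges in which every node is an end of some edge."""
    everyone = set(range(n_nodes))
    count = 0
    for r in range(len(edges) + 1):
        for L in combinations(edges, r):
            if {x for e in L for x in e} == everyone:
                count += 1
    return count

def star_dpr(leaf_files, needed, p):
    """DPR of a star whose hub s holds no files.

    leaf_files[i] is FA at leaf v_i; every edge (s, v_i) has reliability p.
    ASSUMPTION: no single leaf holds all of `needed`.
    """
    n = len(leaf_files)
    total = Fraction(0)
    for r in range(n + 1):
        for up in combinations(range(n), r):  # leaves still linked to s
            held = set().union(*(leaf_files[i] for i in up))
            if needed <= held:
                total += p ** r * (1 - p) ** (n - r)
    return total

# Hypothetical G: a triangle 0-1-2 with a pendant node 3 attached to node 2.
g_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n_nodes = 4
# Theorem 3 construction: leaf v_i holds the two files named by the ends of e_i.
leaf_files = [{u, v} for u, v in g_edges]
H = set(range(n_nodes))

r_dh = star_dpr(leaf_files, H, Fraction(1, 2))
print(r_dh * 2 ** len(g_edges), count_edge_covers(n_nodes, g_edges))
```

Both printed numbers coincide, since each edge state with operational set $t$ has probability $(1/2)^n$ and succeeds exactly when the corresponding edges of $G$ form an edge cover.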


Theorem 4. Computing DPR for a DCS with a star topology is NP-hard even when there are only two copies of each file.

Proof. We reduce the #VC problem to our problem. For a given $G = (V_1, E_1)$, where $|E_1| = n$ and $V_1 = \{v_1, v_2, \ldots, v_m\}$, we construct a DCS $D = (V_2, E_2, F)$ with a star topology, where $V_2 = V_1 \cup \{s\}$, $E_2 = \{e_i = (s, v_i) \mid 1 \le i \le m\}$, and $F = \{f_i \mid \text{for all edges } i \in G\}$. Let $FA_{v_i} = \{f_j \mid \text{for all edges } j \text{ that are incident on } v_i \in G\}$ and $H = F$. From the construction of $D$, it is easy to show that there are only two copies of each file in $D$ and that there is a one-to-one correspondence between the vertex covers of $G$ and the FSTs of $D$. The DPR of $D$, $R(D_H)$, can be expressed as

$$R(D_H) = \sum_{\text{for all FST } t \in D} \Bigl\{ \prod_{\text{for each edge } i \in t} p_i \times \prod_{\text{for each edge } i \notin t} (1 - p_i) \Bigr\}.$$

If we set each $p_i = \frac{1}{2}$ for all $1 \le i \le m$, then we have

$$R(D_H) = \sum_{\text{for all FST } t \in D} \left(\tfrac{1}{2}\right)^m,$$

$$R(D_H)\, 2^m = \sum_{\text{for all FST } t \in D} 1 = \#\text{ of FSTs in } D = \#\text{ of vertex covers in } G. \qquad \square$$
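As with the edge-cover reduction, this counting identity can be checked by enumeration. The Python sketch below is a hedged illustration: the function names and the 4-cycle used as $G$ are hypothetical, and `star_dpr` assumes that no node of $G$ is incident on every edge, so that no single leaf holds all of $H$ by itself.

```python
from itertools import combinations
from fractions import Fraction

def count_vertex_covers(n_nodes, edges):
    """Count node subsets K such that every edge has at least one end in K."""
    count = 0
    for r in range(n_nodes + 1):
        for K in combinations(range(n_nodes), r):
            chosen = set(K)
            if all(u in chosen or v in chosen for u, v in edges):
                count += 1
    return count

def star_dpr(leaf_files, needed, p):
    """DPR of a star whose hub s holds no files; each edge has reliability p.

    ASSUMPTION: no single leaf holds all of `needed`.
    """
    m = len(leaf_files)
    total = Fraction(0)
    for r in range(m + 1):
        for up in combinations(range(m), r):  # leaves still linked to s
            held = set().union(*(leaf_files[i] for i in up))
            if needed <= held:
                total += p ** r * (1 - p) ** (m - r)
    return total

# Hypothetical G: the 4-cycle 0-1-2-3-0.
g_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n_nodes = 4
# Theorem 4 construction: leaf v_i holds file f_j for every edge j incident on it.
leaf_files = [{j for j, e in enumerate(g_edges) if i in e} for i in range(n_nodes)]
H = set(range(len(g_edges)))

r_dh = star_dpr(leaf_files, H, Fraction(1, 2))
print(r_dh * 2 ** n_nodes, count_vertex_covers(n_nodes, g_edges))
```

Each file index appears in exactly two leaf sets, matching the "two copies of each file" condition of the theorem, and the two printed counts agree.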

Corollary 5. Computing DPR for a DCS with a tree topology is NP-hard.

Proof. By Theorems 3 and 4, the DPR problem for a DCS with a star topology is in general NP-hard. This implies that the DPR problem for a DCS with a tree topology is in general also NP-hard, since a DCS with a star topology is just a DCS with a tree topology that has a single level of branching. □

For KTR, it is obviously true that polynomial-time algorithms exist over DCSs with a star or a tree topology. In addition, polynomial-time algorithms do exist for computing KTR over series-parallel graphs [14] and 2-trees [13]. A 2-tree is defined recursively as follows:

(1) The complete graph $K_2$ (a single edge) is a 2-tree.

(2) Given any 2-tree $G$ on $n \ge 2$ nodes, let $(u, v)$ be an edge of $G$. Adding a new node $w$ and two edges $(w, u)$ and $(w, v)$ produces a 2-tree on $n + 1$ nodes.

We now show that the DPR problem for a DCS with a 2-tree structure in general is NP-hard.

Theorem 6. Computing DPR for a DCS with a 2-tree topology in general is NP-hard.

Proof. We reduce an arbitrary instance of a star topology to a 2-tree topology. Assume we have a DCS graph $D = (V, E, F)$ with a star topology, where $V = \{s, v_1, v_2, \ldots, v_n\}$ and $E = \{(s, v_i) \mid 1 \le i \le n\}$. We construct from $D$ a DCS graph $D' = (V, E', F)$, where $E' = E \cup \{(v_i, v_{i+1}) \mid 1 \le i \le n - 1\}$. It is easy to see that $D'$ is a 2-tree on $n + 1$ nodes. If we stipulate that all added edges $(v_i, v_{i+1})$, $1 \le i \le n - 1$, of $D'$ have a reliability of 0, then we have $R(D_H) = R(D'_H)$ for any given $H \subseteq F$. □
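The padding step of this proof can be illustrated on a three-leaf star. In the Python sketch below, the names `dpr` and `star_to_two_tree`, the file placement, and the edge reliabilities are all hypothetical choices made for illustration; DPR is again evaluated by exhaustive enumeration.

```python
from itertools import product

def dpr(nodes, edges, files_at, needed):
    """Brute-force DPR; edges is a list of (u, v, reliability)."""
    needed = set(needed)
    total = 0.0
    for state in product([True, False], repeat=len(edges)):
        prob = 1.0
        parent = {v: v for v in nodes}  # union-find over operational edges

        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x

        for up, (u, v, p) in zip(state, edges):
            prob *= p if up else 1.0 - p
            if up:
                parent[find(u)] = find(v)
        held = {}
        for v in nodes:
            held.setdefault(find(v), set()).update(files_at[v])
        if any(needed <= files for files in held.values()):
            total += prob
    return total

def star_to_two_tree(leaves, star_edges):
    """Theorem 6 construction: add edges (v_i, v_{i+1}) with reliability 0."""
    added = [(leaves[i], leaves[i + 1], 0.0) for i in range(len(leaves) - 1)]
    return star_edges + added

# Hypothetical star instance with hub s and three leaves.
leaves = ["v1", "v2", "v3"]
nodes = ["s"] + leaves
files_at = {"s": set(), "v1": {"a", "b"}, "v2": {"b", "c"}, "v3": {"a", "c"}}
H = {"a", "b", "c"}
star_edges = [("s", "v1", 0.9), ("s", "v2", 0.8), ("s", "v3", 0.7)]

two_tree_edges = star_to_two_tree(leaves, star_edges)
print(dpr(nodes, star_edges, files_at, H), dpr(nodes, two_tree_edges, files_at, H))
```

Every edge state in which an added edge is operational has probability 0, so only the states of the original star contribute and the two values agree.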

Corollary 7. Computing DPR over a series-parallel DCS is NP-hard.

Proof. From [19], a 2-tree is a maximal series-parallel graph. A maximal series-parallel graph is a series-parallel graph with neither loops nor parallel edges. Since computing DPR over a DCS with a 2-tree topology is NP-hard, computing DPR over a series-parallel DCS is also NP-hard. It is easy to see that the DCS graph $D'$ constructed in Theorem 6 is also a series-parallel DCS. The corollary follows. □

In this section, we have shown that computing DPR over a DCS with a star, a tree, a 2-tree, a series-parallel, a planar, or a general topology in general is NP-hard.

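3. Conclusions

The reliability of a distributed program in a distributed computing system is the probability that a program which runs on multiple processing elements and needs to communicate with other processing elements for remote data files will be executed successfully. This reliability varies according to (1) the topology of the distributed computing system, (2) the reliability of the communication links, (3) the data files and program distribution among processing elements, and (4) the data files required to execute a program. This paper shows that solving this reliability problem is NP-hard even when the distributed computing system is restricted to a series-parallel, a 2-tree, a tree, or a star structure.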

References

[1] K.K. Aggarwal, S. Rai, Reliability evaluation in computer-communication networks, IEEE Trans. Reliability 30 (1981) 32-35.

[2] M.O. Ball, J.S. Provan, D.R. Shier, Reliability covering problems, Networks 21 (1991) 345-357.

[3] D.J. Chen, T.H. Huang, Reliability analysis of distributed systems based on a fast reliability algorithm, IEEE Trans. Parallel Distributed Systems 3 (2) (1992) 139-153.

[4] D.J. Chen, M.S. Lin, On distributed computing systems reliability analysis under program execution constraints, IEEE Trans. Comput. 15 (12) (1993).

[5] P. Enslow, What is a distributed data processing system, Computer 11 (1978).

[6] J. Garcia-Molina, Reliability issues for fully replicated distributed database, IEEE Comput. 16 (1982) 34-42.

[7] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness, Freeman, San Francisco, CA, 1979.

[8] A.P. Grnarov, M. Gerla, Multi-terminal reliability analysis of distributed processing system, in: Proc. 1981 Internat. Conf. Parallel Processing (1981) 79-86.

[9] A. Kumar, S. Rai, D.P. Agrawal, On computer communication network reliability under program execution constraints, IEEE JSAC 6 (1988) 1393-1399.

[10] V.K. Prasanna Kumar, S. Hariri, C.S. Raghavendra, Distributed program reliability analysis, IEEE Trans. Software Engineering 12 (1986) 42-50.

[11] M.S. Lin, D.J. Chen, General reduction methods for the reliability analysis of distributed computing systems, Comput. J. 36 (7) (1993) 631-644.

[12] R.E. Merwin, M. Mirhakak, Derivation and use of a survivability criterion for DDP systems, in: Proc. 1980 Nat. Comput. Conf. (1980) 139-146.

[13] J.S. Provan, The complexity of reliability computations in planar and acyclic graphs, SIAM J. Comput. 15 (1986) 694-702.

[14] J.S. Provan, M.O. Ball, The complexity of counting cuts and of computing the probability that a graph is connected, SIAM J. Comput. 12 (4) (1983) 777-788.

[15] A. Rosenthal, A computer scientist looks at reliability computations, in: Reliability and Fault Tree Analysis, SIAM (1975) 133-152.

[16] A. Satyanarayana, J.N. Hagstrom, A new algorithm for the reliability analysis of multi-terminal networks, IEEE Trans. Reliability 30 (1981) 325-334.

[17] A. Satyanarayana, R.K. Wood, A linear-time algorithm for computing K-terminal reliability in series-parallel networks, SIAM J. Comput. 14 (4) (1985) 818-832.

[18] L.G. Valiant, The complexity of enumeration and reliability problems, SIAM J. Comput. 8 (1979) 410-421.

[19] P. Winter, Steiner problem in networks: A survey, Networks 17 (1987) 129-167.

[20] R.K. Wood, Factoring algorithms for computing K-terminal network reliability, IEEE Trans. Reliability 35 (1986) 269-278.