The Distributed Program Reliability Analysis on Ring-Type Topologies
Ming-Sang Chang (1)
Department of Information Management, Lunghwa University of Science and Technology
300, Wan Shou Rd., Sec. 1, Kueishan, Taoyuan, Taiwan, R.O.C.
Email: [email protected]

Min-Sheng Lin
Department of Electric Engineering, National Taipei University of Technology
1, Sec. 3, Chung Hsiao E. Rd., Taipei, Taiwan, R.O.C.
Email: [email protected]

Deng-Jyi Chen
Institute of Computer Science and Information Engineering, National Chiao Tung University
1001 Ta Hsueh Road, Hsin Chu, Taiwan, R.O.C.
Email: [email protected]

(1) All correspondence should be sent to Ming-Sang Chang ([email protected]).
Abstract
Distributed Computing Systems (DCSs) have become very popular for their high fault tolerance, potential for parallel processing, and better reliability performance. One of the important issues in the design of a DCS is its reliability performance. Distributed Program Reliability (DPR) is used to obtain this reliability measure.
In this paper, we propose a polynomial-time algorithm for computing the DPR of ring topologies and show that solving the DPR problem on a ring of trees topology is NP-hard.
Keywords: Distributed Program Reliability, Minimal File Spanning Tree, Algorithm, Ring of Tree
1. Introduction
Distributed Computing Systems (DCSs) have become very popular for their fault tolerance, potential for parallel processing, and better reliability performance. One of the important issues in the design of a DCS is its reliability performance. Distributed program reliability is used to obtain this reliability measure [1-4].
An efficient network topology is quite important for a distributed computing system. The ring topology is a popular one used in high-speed networks. It has been considered for the IEEE 802.5 token ring, for the fiber distributed data interface (FDDI) token ring, for the synchronous optical network (SONET), and for asynchronous transfer mode (ATM) networks. The ring network has been widely used in current distributed system design.
In a ring of tree topology, a ring is used to connect each tree topology in the network. This architecture can be used in FDDI that consists of (1) a tree of wiring concentrators and terminal stations, and (2) a counter-rotating dual ring [5].
A large amount of work has been devoted to developing algorithms to compute measures of reliability for a DCS. One typical reliability measure for a DCS is the K-terminal reliability (KTR) [6-8]. KTR is the probability that a specified set of nodes K, which is a subset of all the nodes in a DCS, remains connected in a DCS whose edges may fail independently of each other with known probabilities. However, the KTR measure is not applicable to a practical DCS, since a reliability measure for a DCS should capture the effects of redundant distribution of programs and data files. In [1-4], distributed program reliability (DPR) was introduced to accurately model the reliability of a DCS. For successful execution of a distributed program, it is essential that the node containing the program, the other nodes that hold the required data files, and the edges between them be operational. DPR is thus defined as the probability that a program with distributed files can run successfully in spite of some faults occurring in the edges. In reality, the DPR problem is a logical OR-ing of Prob{K-terminals are connected} events, but computing the required conditional probabilities can be quite involved.
In this paper, we propose a polynomial-time algorithm to analyze the DPR of a ring topology and show that solving the DPR problem on a ring of tree topology is NP-hard.
2. Notation and Definitions
Notation
$D=(V, E, F)$: an undirected Distributed Computing System (DCS) graph with vertex set $V$, edge set $E$, and data file set $F$.
$FA_i$: the set of files available at node $i$. (Note: $F = \bigcup_i FA_i$.)
$p_i$: the reliability of edge $i$.
$q_i$: $1 - p_i$.
$H$: a subset of the files of $F$, i.e., $H \subseteq F$; $H$ contains the programs to be executed and all data files needed for the execution of these programs.
$R(D_H)$: the DPR of $D$ with a set $H$ of needed files: Pr{all data files in $H$ can be accessed successfully by the executed programs in $H$}.
Definition: A File Spanning Tree (FST) is a tree whose nodes hold all needed files in $H$.
Definition: A Minimal File Spanning Tree (MFST) is an FST such that there exists no other FST that is a subset of it.
Definition: Distributed program reliability (DPR) is defined as the probability that a distributed program, which runs on multiple processing elements (PEs) and needs to communicate with other PEs for remote files, will be executed successfully.
By the definition of MFST, the DPR can be written as
$R(D_H) = \text{Prob}(\text{at least one MFST is operational})$, or
$$R(D_H) = \text{Prob}\left(\bigcup_{j=1}^{\#mfst} MFST_j\right),$$
where #mfst is the number of MFSTs for a given needed file set $H$.
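The union in this formula can be evaluated directly by inclusion-exclusion when the MFSTs are known. The following is a minimal sketch under stated assumptions: edges fail independently, each MFST is given as its set of edge ids, and the five-edge instance (edge reliabilities and the three MFST edge sets) is hypothetical. The function name `union_probability` is illustrative only.

```python
from itertools import combinations
from math import prod

def union_probability(mfsts, p):
    """Pr(at least one MFST has all of its edges working).

    mfsts: list of frozensets of edge ids (one frozenset per MFST).
    p:     dict mapping edge id -> edge reliability.
    Uses inclusion-exclusion, so the cost grows as 2^(number of MFSTs).
    """
    total = 0.0
    for k in range(1, len(mfsts) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(mfsts, k):
            # All MFSTs in the group work iff every edge in their union works.
            edges = frozenset().union(*group)
            total += sign * prod(p[e] for e in edges)
    return total

# Hypothetical instance: five edges and three MFSTs.
p = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}
mfsts = [frozenset({1}), frozenset({3, 4}), frozenset({4, 5})]
dpr = union_probability(mfsts, p)
```

The exponential cost of this direct evaluation is one reason a recursive method is preferable on structured topologies.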
3. Computing DPR Over a DCS with a Ring
Topology
Now, we consider a DCS with a linear structure $D=(V, E, F)$ with $|E|=n$ edges, in which an alternating sequence of distinct nodes and edges $(v_0, e_1, v_1, e_2, ..., v_{n-1}, e_n, v_n)$ is given. For $1 \le i \le n$, let
$I_i$: the FST which starts at edge $e_i$ and has the minimal length
$S_i$: the event that all edges in $I_i$ are working
$Q_i \equiv \prod_{\text{all edges } j \in I_i} p_j$: the probability that $S_i$ occurs
$E_i$: the event that there exists an operating event $S_j$ between edges $e_1$ and $e_i$
$g_i$: the number of $I_j$ which lie between $e_1$ and $e_i$
$x_i$: the state of edge $e_i$; $x_i=0$ if edge $e_i$ fails, else $x_i=1$
$\overline{A}$: the complement of event $A$.
It is easy to see that the DPR of a DCS with a linear structure $D$ with $|E|=n$ edges, $R(D_H)$, can be stated as $\Pr(E_n)$. The following theorem provides a recursive method for computing $\Pr(E_n)$.
Theorem 1.
$$\Pr(E_n) = \Pr(E_{n-1}) + \sum_{i=g_{n-1}+1}^{g_n} \left[(1 - \Pr(E_{i-2})) \cdot q_{i-1} \cdot Q_i\right]$$
with the boundary conditions $\Pr(E_i) = 0$, $g_i = 0$, and $p_i = 0$, for $i \le 0$.
Proof. See the appendix.
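The recursion of Theorem 1 can be evaluated bottom-up once the values $g_i$ and $Q_i$ are available. The sketch below assumes they are passed in as lists indexed from 0, with index 0 holding the boundary values; the function name `pr_E` and the numeric edge reliabilities are hypothetical, while the $g_i$ and the shape of the $Q_i$ are those reported in Example 1.

```python
def pr_E(n, g, Q, p):
    """Evaluate Pr(E_n) with the recursion of Theorem 1.

    g, Q, p are lists indexed 0..n. Index 0 holds the boundary values
    g[0] = 0 and p[0] = 0 (so q_0 = 1); Q[0] is never read.
    """
    Pr = [0.0] * (n + 1)                  # Pr[k] = Pr(E_k), with Pr(E_0) = 0

    def prob(k):                          # boundary: Pr(E_k) = 0 for k <= 0
        return Pr[k] if k > 0 else 0.0

    for k in range(1, n + 1):
        acc = Pr[k - 1]
        for i in range(g[k - 1] + 1, g[k] + 1):
            q_prev = 1.0 - p[i - 1]       # q_{i-1}
            acc += (1.0 - prob(i - 2)) * q_prev * Q[i]
        Pr[k] = acc
    return Pr[n]

# Hypothetical edge reliabilities p_1..p_5; g_i and Q_i as in Example 1.
p = [0.0, 0.9, 0.8, 0.7, 0.6, 0.5]
g = [0, 1, 1, 1, 3, 4]
Q = [0.0, p[1], p[2] * p[3] * p[4], p[3] * p[4], p[4] * p[5], 0.0]
result = pr_E(5, g, Q, p)
```

Because each index $i$ appears in exactly one inner loop, the total work is linear in $n$ once $g$ and $Q$ are known.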
Before applying Theorem 1, we use the following procedure COMGQ to compute the values of $g_i$ and $Q_i$, for $1 \le i \le n$, for a given linear DCS with $|E|=n$ edges.
Procedure COMGQ
// Given a DCS with a linear structure with the alternating sequence of distinct nodes and edges //
// (v0, e1, v1, e2, ..., vn−1, en, vn), //
// F: the set of files (including data files and programs) distributed in D; //
// H: the set of files that must be communicated to each other through the edges in D; //
// FAi: the set of files available at node vi, for 0 ≤ i ≤ n; and //
// this procedure computes the values of gi and Qi, for 1 ≤ i ≤ n. //
// h (head) and t (tail) are two indexes moving among nodes. NFi is the total number of copies of file i //
// between nodes vh and vt. If there exists a FST between nodes vh and vt then flag=true //
// else flag=false //
begin
    for 2 ≤ i ≤ n do Qi ← 0 repeat                // initialize //
    p0 ← Q1 ← 1                                   // initialize //
    h ← 0; t ← 1                                  // initialize //
    for each file i ∈ F do                        // initialize //
        if file i ∈ FAh then NFi ← 1
        else NFi ← 0
        endif
    repeat
    while t ≤ n do
        for each file i ∈ FAt do NFi ← NFi+1 repeat
        Qh+1 ← Qh+1 * pt
        flag ← true
        while flag do
            for each file i ∈ H do                // check if there exists a FST between //
                if NFi=0 then flag ← false endif  // nodes vh and vt //
            repeat
            if flag then
                for each file i ∈ FAh do NFi ← NFi−1 repeat
                h ← h+1
                Qh+1 ← Qh/ph
            endif
        repeat
        gt ← h
        t ← t+1
    repeat
    for 1 ≤ i ≤ n do output(gi, Qi) repeat
end COMGQ
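The two-index scan of COMGQ can be sketched in Python as a sliding window over the nodes. This is an illustrative sketch, not the authors' code. It assumes every edge reliability is positive (the window product is updated by division, as in the procedure), it assumes no single node holds all of H (it stops shrinking the window at h = t), and it leaves Q_i at 0 when no FST starts at e_i. The file placement is a hypothetical one chosen to be consistent with the g_i and Q_i reported in Example 1, since the figures are not reproduced here; the edge reliabilities are also hypothetical.

```python
def comgq(FA, p, H):
    """Sliding-window sketch of COMGQ.

    FA[i]: set of files at node v_i (i = 0..n).
    p[i]:  reliability of edge e_i (i = 1..n; p[0] is unused).
    H:     set of needed files.
    Returns (g, Q), both indexed 0..n.
    """
    n = len(FA) - 1
    count = {f: 0 for f in H}            # NF: copies of each needed file in v_h..v_t
    for f in FA[0] & H:
        count[f] += 1
    missing = sum(1 for f in H if count[f] == 0)
    g = [0] * (n + 1)
    Q = [0.0] * (n + 1)
    h, window = 0, 1.0                   # window = p[h+1] * ... * p[t]
    for t in range(1, n + 1):
        for f in FA[t] & H:
            if count[f] == 0:
                missing -= 1
            count[f] += 1
        window *= p[t]
        # While v_h..v_t hold all of H, the minimal FST starting at e_{h+1} ends at e_t.
        while missing == 0 and h < t:
            Q[h + 1] = window
            for f in FA[h] & H:
                count[f] -= 1
                if count[f] == 0:
                    missing += 1
            h += 1
            window /= p[h]
        g[t] = h
    return g, Q

# Hypothetical placement of f1..f4 on v0..v5 and hypothetical reliabilities.
FA = [{"f1", "f3"}, {"f2", "f4"}, {"f2"}, {"f3"}, {"f1", "f4"}, {"f2"}]
H = {"f1", "f2", "f3", "f4"}
p = [0.0, 0.9, 0.8, 0.7, 0.6, 0.5]
g, Q = comgq(FA, p, H)
```

Tracking the number of still-missing files replaces the per-iteration scan over H in the pseudocode; the head index only moves forward, which is what keeps the scan linear in the number of edges.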
Now, using the procedure COMGQ and Theorem 1, we are able to provide an algorithm for computing the reliability of a DCS with a linear structure.
Algorithm Reliability_Linear_DCS(D)
// Given a DCS with a linear structure D=(V, E, F) with |E|=n and a specified set of files H ,//
// this algorithm returns the DPR of D //
Step 1: Call COMGQ to compute the values of $g_i$ and $Q_i$, $1 \le i \le n$.
Step 2: Evaluate $\Pr(E_n)$ recursively using Theorem 1.
Step 3: Return ($\Pr(E_n)$).
end Reliability_Linear_DCS
For step 1, the computational complexity of the procedure COMGQ is $O(|E||F|)$, where $|E| = n$ and $|F| \ge \max(\max_{i=0}^{n}(|FA_i|), |H|)$, since the value of $h$ in the inner while loop is monotonically increasing and does not exceed the value of $t$, the index of the outer while loop. For step 2, by Theorem 1, $\Pr(E_i)$ can be computed in $O(g_i - g_{i-1} + 1)$. Since there are $n$ such $\Pr(E_i)$'s to compute, we need another
$$O\left(\sum_{i=1}^{n} (g_i - g_{i-1} + 1)\right) = O(n + g_n - g_0) = O(n) = O(|E|).$$
Therefore algorithm Reliability_Linear_DCS takes $O(|E||F|) + O(|E|) = O(|E||F|)$ time to compute the reliability of a DCS with a linear structure.
Example 1:
[Figure 1: the distributed banking system; six computers (A to F), each labeled with the files it stores among MRG, CAF, ADF, and IXF.]
[Figure 2: a linear graph with nodes v0 to v5 joined by edges e1 to e5, each node labeled with its files among f1 to f4. Program needs data files f1, f2, and f3 for its execution.]
Figure 2. The graph model for the distributed banking system in figure 1.
Consider a possible DCS of a banking system [4,17] shown in figure 1. Each local disk stores some of the following information:
Consumer accounts file (CAF), Administrative aids file (ADF), and Interest and exchange rates file (IXF).
Management report generation (MRG) in computers B and E indicates a query (program) to be executed for report generation. Figure 2 shows the graph model for this system. A node represents any computer location and the links show the communication network. We assume that the query MRG ($f_4$) requires data CAF ($f_1$), ADF ($f_2$), and IXF ($f_3$) to complete its execution. Let $V=\{v_0, v_1, v_2, v_3, v_4, v_5\}$, $E=\{e_1, e_2, e_3, e_4, e_5\}$, $F=\{f_1, f_2, f_3, f_4\}$, and $H=\{f_1, f_2, f_3, f_4\}$. Applying the algorithm Reliability_Linear_DCS, we get
Step 1: $g_0=0$ (boundary condition), $g_1=1$, $Q_1=p_1$, $g_2=1$, $Q_2=p_2p_3p_4$, $g_3=1$, $Q_3=p_3p_4$, $g_4=3$, $Q_4=p_4p_5$, $g_5=4$, and $Q_5=0$ ($I_5$ does not exist).
Step 2:
$\Pr(E_1) = \Pr(E_2) = \Pr(E_3) = q_0 Q_1 = p_1$
$\Pr(E_4) = \Pr(E_3) + (1-\Pr(E_0))\, q_1 Q_2 + (1-\Pr(E_1))\, q_2 Q_3 = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4$
$\Pr(E_5) = \Pr(E_4) + (1-\Pr(E_2))\, q_3 Q_4 = p_1 + q_1 p_2 p_3 p_4 + q_1 q_2 p_3 p_4 + q_1 q_3 p_4 p_5$.

A ring DCS is a DCS with a circular communication link. Each node is connected to its two neighboring nodes by two adjacent edges. Suppose $D=(V, E, F)$ is a DCS with a ring topology. By the factoring theorem [9], the DPR of $D$ can be given as
$$R(D_H) = p_e R((D+e)_H) + q_e R((D-e)_H), \quad \text{(Eq. 1)}$$
where
$e$ is an arbitrary edge of $D$,
$p_e$ is the reliability of edge $e$,
$q_e \equiv 1 - p_e$,
$D+e$ is the DCS $D$ with edge $e=(u,v)$ contracted so that nodes $u$ and $v$ are merged into a single node and this new merged node contains all data files that previously were in nodes $u$ and $v$, and
$D-e$ is the DCS $D$ with edge $e$ deleted.
Since $D-e$ is a DCS with a linear structure with $|E|-1$ edges, its DPR can be computed by the algorithm Reliability_Linear_DCS in $O(|E||F|)$ time. Note that $D+e$ remains a DCS with a ring structure with $|E|-1$ edges. We then apply the same analysis to $D+e$. Recursively applying Equation (Eq. 1), the ring DCS $D$ with $|E|$ edges can be decomposed into, in the worst case, $|E|$ linear DCSs. So, we have an $O(|E|^2|F|)$ time algorithm for computing the reliability of a DCS with a ring structure.
Algorithm Reliability_Ring_DCS(D)
// Given a DCS with a ring structure D=(V, E, F) and a specified set of files H , //
// this algorithm returns the DPR of D //
Step 1: If there exists a node in V that holds all data files in H, then return (1).
Step 2: Select an arbitrary edge e of D.
Step 3: Rl ← Reliability_Linear_DCS(D−e).
Step 4: Rr ← Reliability_Ring_DCS(D+e).
Step 5: Return (pe*Rr + qe*Rl).
end Reliability_Ring_DCS
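The factoring recursion can be sketched as follows. To keep the block short, the linear case is handled by a brute-force enumeration of edge states (`path_dpr`), which is a stand-in for Reliability_Linear_DCS and is exponential in the number of edges; only `ring_dpr` mirrors the steps above. The function names, the file placement, and all numeric reliabilities are hypothetical; the placement was chosen to be consistent with the values reported in Examples 1 and 2.

```python
from itertools import product

def path_dpr(files, p, H):
    """Brute-force DPR of a path (stand-in for Reliability_Linear_DCS).

    files[i]: set of files at node i; p[i]: reliability of the edge joining
    node i and node i+1, so len(files) == len(p) + 1.
    """
    total = 0.0
    for state in product((0, 1), repeat=len(p)):
        weight = 1.0
        for up, pe in zip(state, p):
            weight *= pe if up else 1.0 - pe
        held = set(files[0])             # files held by the current connected run
        ok = H <= held
        for i, up in enumerate(state):
            held = held | files[i + 1] if up else set(files[i + 1])
            ok = ok or H <= held
        if ok:
            total += weight
    return total

def ring_dpr(files, p, H):
    """Factoring on the edge that closes the ring (Eq. 1).

    Edge i joins node i and node (i+1) mod m, so len(files) == len(p).
    """
    if any(H <= f for f in files):       # Step 1: some node holds all of H
        return 1.0
    if len(files) == 1:                  # fully contracted and still incomplete
        return 0.0
    pe = p[-1]                           # Step 2: pick the closing edge
    r_linear = path_dpr(files, p[:-1], H)             # Step 3: D - e
    merged = [files[0] | files[-1]] + files[1:-1]     # contract e
    r_ring = ring_dpr(merged, p[:-1], H)              # Step 4: D + e
    return pe * r_ring + (1.0 - pe) * r_linear        # Step 5

# Hypothetical ring v0..v5 with edges e1..e6 (e6 joins v5 and v0).
files = [{"f1", "f3"}, {"f2", "f4"}, {"f2"}, {"f3"}, {"f1", "f4"}, {"f2"}]
H = {"f1", "f2", "f3", "f4"}
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
r = ring_dpr(files, p, H)
```

Always factoring on the closing edge keeps the deleted graph a path over the same node order, so the contraction only has to merge the last node into the first.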
Example 2:
Consider the DCS with a ring topology shown in Figure 3. This is the DCS shown in Figure 2 with one edge e6 added between nodes v5 and v0.
[Figure 3: a ring with nodes v0 to v5 joined by edges e1 to e6, each node labeled with its files among f1 to f4. Program f4 needs data files f1, f2, f3 for its execution.]
Figure 3. A DCS with a ring structure
Applying algorithm Reliability_Ring_DCS, we have
$$R(D_H) = q_6 R((D-e_6)_H) + p_6\{q_5 R((D+e_6-e_5)_H) + p_5 R((D+e_6+e_5)_H)\}.$$
Since there exists one node in $D+e_6+e_5$ that holds all files in $H$, we have $R((D+e_6+e_5)_H)=1$. From Example 1, it is easy to see that $R((D-e_6)_H)=\Pr(E_5)$ and $R((D+e_6-e_5)_H)=\Pr(E_4)$. So we have
$$R(D_H) = q_6(p_1+q_1p_2p_3p_4+q_1q_2p_3p_4+q_1q_3p_4p_5) + p_6[q_5(p_1+q_1p_2p_3p_4+q_1q_2p_3p_4)+p_5].$$
4. Computational Complexity of the DPR
Problem on a Ring of Tree Topology
Complexity results are obtained by transforming known NP-hard problems to our reliability problems [10-14]. For this reason, we first state some known NP-hard problems as follows.
i) K-Terminal Reliability (KTR)
Input: an undirected graph $G = (V, E)$ where $V$ is the set of nodes and $E$ is the set of edges that fail s-independently of each other with known probabilities. A set $K \subseteq V$ is distinguished with $|K| \ge 2$.
Output: $R(G_K)$, the probability that the set $K$ of nodes of $G$ is connected in $G$.
ii) Number of Edge Covers (#EC)
Input: an undirected graph $G = (V, E)$.
Output: the number of edge covers for $G \equiv |\{L \subseteq E: \text{each node of } G \text{ is an end of some edge in } L\}|$.
iii) Number of Vertex Covers (#VC)
Input: an undirected graph $G = (V, E)$.
Output: the number of vertex covers for $G \equiv |\{K \subseteq V: \text{every edge of } G \text{ has at least one end in } K\}|$.
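The two counting problems can be made concrete with a brute-force enumeration. This is only an illustrative sketch with hypothetical function names; it takes exponential time, which is consistent with the problems being hard in general.

```python
from itertools import combinations

def count_edge_covers(nodes, edges):
    """Number of edge subsets L such that every node is an end of some edge in L."""
    total = 0
    for k in range(len(edges) + 1):
        for L in combinations(edges, k):
            touched = {v for e in L for v in e}
            if touched >= set(nodes):
                total += 1
    return total

def count_vertex_covers(nodes, edges):
    """Number of node subsets K such that every edge has at least one end in K."""
    total = 0
    for k in range(len(nodes) + 1):
        for K in combinations(nodes, k):
            chosen = set(K)
            if all(u in chosen or v in chosen for u, v in edges):
                total += 1
    return total

# Small hypothetical graphs: a path on three nodes and a triangle.
path_nodes, path_edges = [1, 2, 3], [(1, 2), (2, 3)]
tri_nodes, tri_edges = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]
path_counts = (count_edge_covers(path_nodes, path_edges),
               count_vertex_covers(path_nodes, path_edges))
tri_counts = (count_edge_covers(tri_nodes, tri_edges),
              count_vertex_covers(tri_nodes, tri_edges))
```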
Theorem 2. Computing DPR for a DCS with a star topology, even with each $|FA_i|=2$, is NP-hard.
Proof. We reduce the #EC problem to our problem. For a given network $G=(V_1, E_1)$ where $E_1=\{e_1, e_2, ..., e_n\}$, we construct a DCS $D=(V_2, E_2, F)$ with a star topology where $V_2=\{s, v_1, v_2, ..., v_n\}$, $E_2=\{(s, v_i) \mid 1 \le i \le n\}$, and $F=\{f_i \mid \text{for each node } i \in G\}$. Let $FA_{v_i}=\{f_u, f_v \mid e_i=(u, v) \in G\}$ for $1 \le i \le n$, $FA_s=\emptyset$, and $H=F$. From the construction of $D$, it is easy to show that there is a one-to-one correspondence between the edge covers of $G$ and the FSTs of $D$. The DPR of $D$, $R(D_H)$, can be expressed as
$$R(D_H)=\sum_{\text{all FST } t \in D}\left\{\prod_{\text{edge } i \in t} p_i \prod_{\text{edge } i \notin t} (1-p_i)\right\}.$$
In particular, setting every $p_i = 1/2$ gives $R(D_H) = \#EC / 2^n$. Thus, a polynomial-time algorithm for computing $R(D_H)$ over a DCS with a star topology and each $|FA_i|=2$ would imply an efficient algorithm for the #EC problem. Since the #EC problem is NP-hard, Theorem 2 follows.
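The reduction can be checked numerically on a small instance. The sketch below builds the star DCS for a hypothetical graph G (a 4-cycle, which has 7 edge covers), sets every edge reliability to 1/2, and recovers the edge-cover count from the DPR. The helper names are illustrative; the DPR routine assumes the hub holds no files and that no single leaf holds all of H, as in the construction of the proof.

```python
from itertools import product

def star_dpr(leaf_files, p, H):
    """DPR of a star whose hub s holds no files.

    leaf_files[i]: files at leaf v_i; p[i]: reliability of edge (s, v_i).
    Sums, over all working-edge patterns, the probability of the patterns
    whose reachable leaves together hold every file in H.
    """
    total = 0.0
    for state in product((0, 1), repeat=len(leaf_files)):
        weight, reachable = 1.0, set()
        for up, files, pe in zip(state, leaf_files, p):
            weight *= pe if up else 1.0 - pe
            if up:
                reachable |= files
        if H <= reachable:
            total += weight
    return total

def star_from_graph(nodes, edges):
    """One leaf per edge (u, v) of G holding {f_u, f_v}; H is one file per node."""
    leaf_files = [{f"f{u}", f"f{v}"} for u, v in edges]
    H = {f"f{v}" for v in nodes}
    return leaf_files, H

# Hypothetical input graph G: the 4-cycle.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
leaf_files, H = star_from_graph(nodes, edges)
r = star_dpr(leaf_files, [0.5] * len(edges), H)
edge_cover_count = round(r * 2 ** len(edges))
```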
Theorem 3. Computing DPR for a DCS with a star topology, even when there are only two copies of each file, is NP-hard.
Proof. We reduce the #VC problem to our problem. For a given $G=(V_1, E_1)$ where $|E_1|=n$ and $V_1=\{v_1, v_2, ..., v_m\}$, we construct a DCS $D=(V_2, E_2, F)$ with a star topology where $V_2=V_1 \cup \{s\}$, $E_2=\{e_i=(s, v_i) \mid 1 \le i \le m\}$, and $F=\{f_i \mid \text{for all edges } i \in G\}$. Let $FA_i=\{f_j \mid \text{for all edges } j \text{ that are incident on } v_i \in G\}$ and $H=F$. From the construction of $D$, it is easy to show that there are only two copies of each file in $D$ and that there is a one-to-one correspondence between the vertex covers of $G$ and the FSTs of $D$. The DPR of $D$, $R(D_H)$, can be expressed as
$$R(D_H)=\sum_{\text{all FST } t \in D}\left\{\prod_{\text{edge } i \in t} p_i \prod_{\text{edge } i \notin t} (1-p_i)\right\}.$$
In particular, setting every $p_i = 1/2$ gives $R(D_H) = \#VC / 2^m$. Since the #VC problem is NP-hard, Theorem 3 follows.
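As a small worked instance (hypothetical, not taken from the paper), let G be the triangle on $v_1, v_2, v_3$ with edges $a=(v_1,v_2)$, $b=(v_2,v_3)$, $c=(v_1,v_3)$, and assume the hub $s$ holds no files. The construction gives $FA_1=\{f_a, f_c\}$, $FA_2=\{f_a, f_b\}$, $FA_3=\{f_b, f_c\}$, and $H=\{f_a, f_b, f_c\}$, so each file has exactly two copies. The vertex covers of G are $\{v_1,v_2\}$, $\{v_2,v_3\}$, $\{v_1,v_3\}$, and $\{v_1,v_2,v_3\}$. With every $p_i = 1/2$, each term of the sum equals $2^{-3}$:

```latex
R(D_H)\Big|_{p_i = 1/2}
  = \sum_{\text{all FST } t \in D} \left(\tfrac{1}{2}\right)^{|t|} \left(\tfrac{1}{2}\right)^{3-|t|}
  = \frac{\#VC(G)}{2^{3}}
  = \frac{4}{8}
  = \frac{1}{2}
```

Reading the count back as $\#VC(G) = 2^{3} \cdot R(D_H)$ is the step that turns a DPR algorithm into a vertex-cover counter.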
Theorem 4. Computing DPR for a DCS with a tree
topology is NP-hard.
Proof. By Theorems 2 and 3, the DPR problem for a DCS with a star topology is, in general, NP-hard. This implies that the DPR problem for a DCS with a tree topology is, in general, also NP-hard, since a DCS with a star topology is just a DCS with a tree topology that has one level of branching.
Now we use the results of Theorems 2, 3, and 4 to prove that the DPR problem on a ring of tree topology is NP-hard.
Theorem 5: Computing DPR for a DCS with a ring of trees topology, even with one level of tree, is NP-hard.
Proof. Given a DCS graph $D=(V, E, F)$ with a star topology, where $V=\{s, v_1, v_2, ..., v_n\}$ and $E=\{(s, v_i) \mid 1 \le i \le n\}$, we construct a DCS graph $D'=(V', E', F)$ from graph $D$, where
$V'=\{v_1, v_2, ..., v_n\} \cup \{s_j \mid 1 \le j \le n\}$ and
$E'=\{(s_j, s_{j+1}) \mid 1 \le j < n\} \cup \{(s_n, s_1)\} \cup \{(s_j, v_j) \mid 1 \le j \le n\}$.
It is easy to see that $D'$ is a ring of tree topology with one level of tree. If we assume that all added edges, $\{(s_j, s_{j+1}) \mid 1 \le j < n\} \cup \{(s_n, s_1)\}$, of $D'$ are perfectly reliable, then we have $R(D_H)=R(D'_H)$ for any given $H \subseteq F$. By Theorems 2 and 3, computing DPR over a DCS with a star topology is NP-hard; thus, computing DPR over a DCS with a ring of tree topology with one level of tree is also NP-hard.
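The equality between the star and the constructed ring of trees can be checked by brute force on a small instance. The sketch below is illustrative only: `dpr` enumerates all edge states of an arbitrary graph and is exponential in the number of edges, the hub nodes are assumed to hold no files, and the three-leaf instance with its reliabilities is hypothetical.

```python
from itertools import product

def find(parent, v):
    """Representative of v's connected component (simple union-find lookup)."""
    while parent[v] != v:
        v = parent[v]
    return v

def dpr(files, edges, p, H):
    """Brute-force DPR: probability that the working edges leave some
    connected component whose nodes together hold every file in H."""
    total = 0.0
    for state in product((0, 1), repeat=len(edges)):
        weight = 1.0
        parent = {v: v for v in files}
        for up, (u, v), pe in zip(state, edges, p):
            weight *= pe if up else 1.0 - pe
            if up:
                parent[find(parent, u)] = find(parent, v)
        held = {}
        for v, fs in files.items():
            held.setdefault(find(parent, v), set()).update(fs)
        if any(H <= fs for fs in held.values()):
            total += weight
    return total

# Hypothetical star D with hub s (no files) and three leaves.
n = 3
leaves = {"v1": {"f1", "f2"}, "v2": {"f2", "f3"}, "v3": {"f1", "f3"}}
H = {"f1", "f2", "f3"}
p_spoke = [0.9, 0.8, 0.7]
star_files = {"s": set(), **leaves}
star_edges = [("s", f"v{j}") for j in range(1, n + 1)]
r_star = dpr(star_files, star_edges, p_spoke, H)

# D': hub replaced by a ring s1..sn of perfectly reliable edges, one leaf per s_j.
ring_files = {**{f"s{j}": set() for j in range(1, n + 1)}, **leaves}
ring_edges = [(f"s{j}", f"s{j % n + 1}") for j in range(1, n + 1)]
spokes = [(f"s{j}", f"v{j}") for j in range(1, n + 1)]
r_ring_of_trees = dpr(ring_files, ring_edges + spokes, [1.0] * n + p_spoke, H)
```

With the ring edges at reliability 1, the nodes s1..sn always form one component and behave like the single hub s, which is why the two values agree.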
Theorem 6: Computing DPR for a DCS with a ring of tree topology, in general, is NP-hard.
Proof. By Theorem 5, the DPR problem for a ring of tree topology with one level of tree is NP-hard. With the same approach as in Theorem 5, we construct a ring of tree topology from a tree topology. By Theorem 4, computing DPR over a DCS with a tree topology is NP-hard; thus, computing DPR over a DCS with a ring of tree topology is also NP-hard.
5. Conclusions
In this paper, we investigated the problem of distributed program reliability on ring distributed computing systems. We proposed a polynomial-time algorithm for computing the DPR on a ring topology. We also gave Theorem 5 and Theorem 6 to show that solving the DPR problem on a ring of trees topology is NP-hard.
6. Appendix
The detailed proof of Theorem 1 is as follows.
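Theorem 1.
$$\Pr(E_n) = \Pr(E_{n-1}) + \sum_{i=g_{n-1}+1}^{g_n} \left[(1 - \Pr(E_{i-2})) \cdot q_{i-1} \cdot Q_i\right]$$
with the boundary conditions $\Pr(E_i) = 0$, $g_i = 0$, and $p_i = 0$, for $i \le 0$.
Proof.
$$\Pr(E_n) = \Pr\left(E_{n-1} \cup \bigcup_{i=g_{n-1}+1}^{g_n} S_i\right) = \Pr(E_{n-1}) + \Pr\left(\overline{E}_{n-1} \cap \bigcup_{i=g_{n-1}+1}^{g_n} S_i\right) \quad \text{(Eq.2)}$$
For the term $\Pr\left(\overline{E}_{n-1} \cap \bigcup_{i=g_{n-1}+1}^{g_n} S_i\right)$ in (Eq.2), we have
$$\Pr\left(\overline{E}_{n-1} \cap \bigcup_{i=g_{n-1}+1}^{g_n} S_i\right) = \Pr(\overline{E}_{n-1} \cap S_{g_{n-1}+1}) + \Pr(\overline{E}_{n-1} \cap \overline{S}_{g_{n-1}+1} \cap S_{g_{n-1}+2}) + \Pr(\overline{E}_{n-1} \cap \overline{S}_{g_{n-1}+1} \cap \overline{S}_{g_{n-1}+2} \cap S_{g_{n-1}+3}) + \cdots + \Pr(\overline{E}_{n-1} \cap \overline{S}_{g_{n-1}+1} \cap \cdots \cap \overline{S}_{g_n-1} \cap S_{g_n}). \quad \text{(Eq.3)}$$
Since $S_i = S_{i+1} \cap \{x_i=1\}$ for $g_{n-1}+1 \le i \le g_n$, we have $S_i \subset S_{i+1}$ and $\overline{S}_i \cap \overline{S}_{i+1} = \overline{S}_{i+1}$. Thus,
$$\Pr\left(\overline{E}_{n-1} \cap \bigcup_{i=g_{n-1}+1}^{g_n} S_i\right) = \Pr(\overline{E}_{n-1} \cap S_{g_{n-1}+1}) + \Pr(\overline{E}_{n-1} \cap \overline{S}_{g_{n-1}+1} \cap S_{g_{n-1}+2}) + \Pr(\overline{E}_{n-1} \cap \overline{S}_{g_{n-1}+2} \cap S_{g_{n-1}+3}) + \cdots + \Pr(\overline{E}_{n-1} \cap \overline{S}_{g_n-1} \cap S_{g_n})$$
$$= \Pr(\overline{E}_{n-1} \cap S_{g_{n-1}+1}) + \sum_{i=g_{n-1}+2}^{g_n} \Pr(\overline{E}_{n-1} \cap \overline{S}_{i-1} \cap S_i)$$
$$= \Pr(\overline{E}_{g_{n-1}-1} \cap \{x_{g_{n-1}}=0\} \cap S_{g_{n-1}+1}) + \sum_{i=g_{n-1}+2}^{g_n} \Pr(\overline{E}_{i-2} \cap \{x_{i-1}=0\} \cap S_i)$$
$$= \sum_{i=g_{n-1}+1}^{g_n} \Pr(\overline{E}_{i-2} \cap \{x_{i-1}=0\} \cap S_i).$$
Note that the events $\overline{E}_{i-2}$, $\{x_{i-1}=0\}$, and $S_i$ are mutually independent, since they depend on disjoint sets of edges. We have
$\Pr(\overline{E}_{i-2} \cap \{x_{i-1}=0\} \cap S_i) = \Pr(\overline{E}_{i-2}) \cdot \Pr(\{x_{i-1}=0\}) \cdot \Pr(S_i)$. So
$$\Pr\left(\overline{E}_{n-1} \cap \bigcup_{i=g_{n-1}+1}^{g_n} S_i\right) = \sum_{i=g_{n-1}+1}^{g_n} \Pr(\overline{E}_{i-2}) \cdot \Pr(\{x_{i-1}=0\}) \cdot \Pr(S_i) = \sum_{i=g_{n-1}+1}^{g_n} \left[(1 - \Pr(E_{i-2})) \cdot q_{i-1} \cdot Q_i\right] \quad \text{(Eq.4)}$$
By equations (Eq.2) and (Eq.4), we obtain Theorem 1.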
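As a worked instance of the derivation, take $n=4$ with the values of Example 1, where $g_3=1$ and $g_4=3$. The sum then runs over $i=2,3$, and the boundary condition gives $\Pr(\overline{E}_0)=1$:

```latex
\Pr(E_4) = \Pr(E_3)
  + \Pr(\overline{E}_0 \cap \{x_1 = 0\} \cap S_2)
  + \Pr(\overline{E}_1 \cap \{x_2 = 0\} \cap S_3)
  = \Pr(E_3) + q_1 Q_2 + (1 - \Pr(E_1))\, q_2 Q_3
```

Each term counts the case in which $I_i$ is the first working FST: the edge just before it has failed and no FST has worked further to the left.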
References
[1] V. K. Prasanna Kumar, S. Hariri and C. S. Raghavendra, "Distributed program reliability analysis," IEEE Trans. Software Eng., vol. SE-12, pp. 42-50, Jan. 1986.
[2] S. Hariri and C. S. Raghavendra, "SYREL: A Symbolic Reliability Algorithm based on Path and Cutset Methods", USC Tech. Rep., 1984.
[3] A. Kumar, S. Rai and D. P. Agrawal, "Reliability Evaluation Algorithms for Distributed Systems", in Proc. IEEE INFOCOM 88, pp. 851-860, 1988.
[4] A. Kumar, S. Rai and D. P. Agrawal, "On Computer Communication Network Reliability Under Task Execution Constraints", IEEE Journal on Selected Areas in Communication, Vol. 6, No. 8, pp. 1393-1399, Oct. 1988.
[5] Raj Jain, "FDDI Handbook: High Speed Networking Using Fiber and Other Media", Addison-Wesley Publishing Company, 1994.
[6] Jiahnsheng Yin, Charles B. Silio Jr., "K-Terminal Reliability In Ring Networks," IEEE Trans. on Reliability, vol. 43, no. 3, pp. 389-400, 1994.
[7] Dimitris Logothetis, Kishor S. Trivedi, "Reliability Analysis of the Double Counter-rotating Ring with Concentrator Attachment," IEEE Trans. On Networking, vol. 2, no. 5, pp. 520-532, 1994.
[8] D. J. Chen, M. S. Chang, C. L. Yang, Kuo-Lung Ku, "Multimedia Task Reliability Analysis Based on Token Ring Network," 1996 Int. Conference on Parallel and Distributed System, pp. 265-272, 1996.
[9] R. Kevin Wood, "Factoring Algorithms for Computing K-terminal Network Reliability", IEEE Trans. Reliability, Vol. R-35, pp. 269-278, Aug. 1986.
[10] A. Rosenthal, "A Computer Scientist Looks at Reliability Computations," in: Reliability and Fault Tree Analysis, SIAM, 1975, pp. 133-152.
[11] L. G. Valiant, "The Complexity of Enumeration and Reliability Problems," SIAM J. Computing, vol. 8, pp. 410-421, 1979.
[12] M. O. Ball, J. S. Provan and D. R. Shier, "Reliability Covering Problems," Networks, vol. 21, pp. 345-357, 1991.
[13] J. S. Provan and M. O. Ball, "The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected," SIAM J. Computing, vol. 12, no. 4, pp. 777-788, Nov. 1983.
[14] J. S. Provan, "The complexity of reliability computations in planar and acyclic graphs", SIAM
Journal on Computing 15 (1986) 694-702.
[15] M.S. Lin, "Program Reliability Analysis in Distributed Computing System", Ph.D. Dissertation, 1994; National Chiao Tung University, Taiwan.
[16] Min-Sheng Lin, Deng-Jyi Chen, "The Computational Complexity of the Reliability Problem on Distributed Systems", Information Processing Letters 64 (1997) 143-147.
[17] D.A. Sheppard, "Standard for banking communication system", IEEE Trans. Computer, 1987 Nov, pp 92-95.