Reliability Evaluation of Multi-state Systems Subject to
Imperfect Coverage using OBDD
Yung-Ruei Chang¹, Suprasad V. Amari², and Sy-Yen Kuo¹
¹ Department of Electrical Engineering
National Taiwan University
Taipei, Taiwan
[email protected]
² Relex Software Corporation
540 Pellis Road, Greensburg,
PA 15601, USA
[email protected]
Abstract
This paper presents an efficient approach based on OBDD for the reliability analysis of a multi-state system subject to imperfect fault-coverage with combinatorial performance requirements. Since there exist dependencies between combinatorial performance requirements, we apply the Multi-state Dependency Operation (MDO) of OBDD to deal with these dependencies in a multi-state system. In addition, this OBDD-based approach is combined with the conditional probability methods to find solutions for the multi-state imperfect coverage models. Using conditional probabilities, we can also apply this method for modular structures. The main advantage of this algorithm is that it will take computational time that is equivalent to the same problem without assuming imperfect coverage (i.e. with perfect coverage). This algorithm is very important for complex systems such as fault-tolerant computer systems, since it can obtain the complete results quickly and accurately even when there exist a number of dependencies such as shared loads (reconfiguration), degradation and common-cause failures.
1. Introduction
The s-coherent multi-state system theory has been investigated since 1975 [1]. Many researchers have analyzed the s-coherent multi-state system reliability [2][3][4][5]. Most of them extend the concepts and conclusions for the 2-state s-coherent systems to the multi-state systems. To describe the dynamic characteristics of the component state transition, Stochastic process (Markov process) techniques are combined with the s-coherent multi-state system theory to analyze the dynamic multi-state system reliability. The multi-state reliability theory can handle situations in which the system and its components have a range of performance levels, e.g. from perfect operation to complete failure. Because performance degradation is very common in industrial products, it is important to develop the multi-state system reliability theory.
When a multi-state system (MSS) is considered, it is important to estimate the impact of each element on the system output/performance. The general definition of MSS reliability [2] is:
$$R_{MSS}(L, t) = \Pr\{F(t) \ge L\} \qquad (1)$$
where L is the required performance level for MSS, F(t) is the MSS output/performance rate. For a multi-state system that has a finite number of states, there can be H different levels of output/performance at time t:
$$F(t) \in \{F_h,\ 1 \le h \le H\}$$
and the system output/performance distribution can be defined by two finite vectors F and q:
$$q = \{q_h(t)\}, \qquad q_h(t) = \Pr\{F(t) = F_h\}, \quad 1 \le h \le H$$
Therefore, the non-repairable MSS reliability is the probability that the system remains in the states with $F_h \ge L$ during (0, t):
$$R_{MSS}(L, t) = \sum_{F_h \ge L} q_h(t) \qquad (2)$$
In addition, systems that are used in life-critical applications such as flight control, nuclear power plant monitoring, space missions, etc., are designed with sufficient redundancy to be tolerant of errors. However, if the system cannot adequately detect, locate and recover from faults and errors in the system, then system failure can still result even when there exists adequate redundancy [6]. An accurate analysis must account for not only the complex system structure, but also the system fault and error recovery behavior. Therefore, the fault coverage problem of a system should be considered. This helps in fixing the optimal level of redundancy [7].
Most published works use Markov models (non-homogeneous Markov or semi-Markov models) to solve multi-state problems [8]. However, it is difficult to find the correct model of a system, and there will be a total of $N = (m+1)^n$ states if there are n modules in the system and each module has (m+1) states including the imperfect coverage state. The computational time is proportional to $N^3 = [(m+1)^n]^3$. Hence, the computational complexity of the problem is $O(m^{3n})$. It is not just an NP problem, there are [...]
This paper provides a new approach to model a multi-state system and proposes an efficient method that combines conditional probability concepts with the OBDD method to evaluate the reliability of a multi-state system with imperfect coverage. This method could also be extended to use modularization methods for reliability analysis. This efficient integration of OBDD and modularization simplifies the problem further.
Section 2 introduces the concepts of OBDD and coverage model. Section 3 illustrates a new model and a new approach to evaluate the reliability of a multi-state system with imperfect coverage. Section 4 proposes an OBDD-based algorithm to deal with the dependency problem in the probability evaluation of a multi-state system with imperfect coverage. Section 5 gives some examples. The last section gives the conclusions and future works.
2. Preliminaries
2.1. Ordered Binary Decision Diagram (OBDD)
This section introduces the representation and manipulation of Boolean functions based on OBDD. OBDD [9] is based on a decomposition of Boolean functions called the Shannon expansion. A function f can be decomposed in terms of a variable x as:
$$f = x \cdot f_{x=1} + \bar{x} \cdot f_{x=0}$$
A node and its descendants in an OBDD represent a Boolean function f, where for node label x, one outgoing edge is directed to the subgraph representing $f_{x=1}$, and the other to $f_{x=0}$. Shannon decomposition is the basis for using OBDD. In order to express Shannon decomposition concisely, the if-then-else (ite) format [10][11] is defined as:
$$f = ite(x, f_{x=1}, f_{x=0})$$
2.2. Manipulation of OBDD
The manipulation of OBDD to represent logical operations is simple. In practice, the OBDD is generated by using logical operations on variables. Let Boolean expressions f and g be:
$$f = ite(x, f_{x=1}, f_{x=0}) = ite(x, F_1, F_0), \qquad g = ite(y, g_{y=1}, g_{y=0}) = ite(y, G_1, G_0)$$
A logic operation between f and g can be represented by OBDD manipulations as:
$$ite(x, F_1, F_0) \diamond ite(y, G_1, G_0) = \begin{cases} ite(x,\ F_1 \diamond G_1,\ F_0 \diamond G_0) & \text{order}(x) = \text{order}(y) \\ ite(x,\ F_1 \diamond g,\ F_0 \diamond g) & \text{order}(x) < \text{order}(y) \\ ite(y,\ f \diamond G_1,\ f \diamond G_0) & \text{order}(x) > \text{order}(y) \end{cases} \qquad (3)$$
where $\diamond$ represents a logic operation such as AND or OR. Figure 1 illustrates the construction and manipulation steps of a Boolean function. For more details on using the operations of OBDD, please refer to [9].
F = (x1 and x3) or (x2 and x3)
Variable ordering: x1 < x2 < x3
Evaluation steps:
x1 = declare_var(x1, 1)
x2 = declare_var(x2, 1)
x3 = declare_var(x3, 1)
T1 = BDD_and(x1, x3)
T2 = BDD_and(x2, x3)
F = BDD_or(T1, T2)
[Diagram: the OBDDs of x1, x2, x3, T1, T2, and F produced by the evaluation steps.]
Figure 1. The OBDD generated from a Boolean equation.
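The evaluation steps of Figure 1 can be mimicked with a very small OBDD sketch in Python. This is only an illustration of Equation (3), not the BDD package used by the authors: the tuple representation and the names `mk`, `declare_var`, `apply_op`, and `evaluate` are assumptions introduced here (in particular, `declare_var` takes a single index rather than the two arguments shown in Figure 1).

```python
from functools import lru_cache

# Terminals are the Python booleans True/False. An internal node is a tuple
# (var, high, low); a smaller var index comes earlier in the variable ordering.

def mk(var, high, low):
    """Return a reduced node: drop the test when both branches are identical."""
    return high if high == low else (var, high, low)

def declare_var(index):
    """OBDD of the single variable x_index (hypothetical helper)."""
    return mk(index, True, False)

def top_var(node):
    """Index tested at the root; terminals sort after every variable."""
    return float("inf") if isinstance(node, bool) else node[0]

def apply_op(op, f, g):
    """Combine two OBDDs with a binary Boolean operator, as in Equation (3)."""
    @lru_cache(maxsize=None)
    def rec(f, g):
        if isinstance(f, bool) and isinstance(g, bool):
            return op(f, g)
        v = min(top_var(f), top_var(g))
        # Cofactors: a node that does not test v is passed down unchanged.
        f1, f0 = (f[1], f[2]) if top_var(f) == v else (f, f)
        g1, g0 = (g[1], g[2]) if top_var(g) == v else (g, g)
        return mk(v, rec(f1, g1), rec(f0, g0))
    return rec(f, g)

def evaluate(node, assignment):
    """Follow the 1-edge or 0-edge of each node until a terminal is reached."""
    while not isinstance(node, bool):
        node = node[1] if assignment[node[0]] else node[2]
    return node

AND = lambda a, b: a and b
OR = lambda a, b: a or b

x1, x2, x3 = declare_var(1), declare_var(2), declare_var(3)
T1 = apply_op(AND, x1, x3)
T2 = apply_op(AND, x2, x3)
F = apply_op(OR, T1, T2)
print(F)
```

Because nodes are plain tuples, two equal sub-functions compare equal, which stands in for the node sharing of a real OBDD package.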
2.3. Coverage Model
Figure 2(a) shows the general structure of a fault-coverage model representing a recovery process [12][13] initiated when a fault occurs. The entry point to the model signifies the occurrence of a fault, and the three exits
(R, S, C) signify the 3 possible outcomes.
• If the offending fault is transient and can be handled without discarding any components, then the transient
restoration exit (R) is taken.
• If the fault is determined to be permanent, and the offending component is discarded, then the permanent fault-coverage exit (C) is taken.
• If the fault by itself causes a system to fail, then the
single-point failure exit (S) is taken.
[Figure 2(a): a fault occurs and enters the coverage model, which is left through one of three exits: transient restoration (R exit), permanent coverage (C exit), or single-point failure (S exit).]
[Figure 2(b): event space of component i. Pr{x[i]} = a[i]: component not failed. Pr{y[i]} = b[i]: component failed and covered. Pr{z[i]} = c[i]: component failed and uncovered.]
Figure 2. (a) General structure of a fault coverage model. (b) The event and probability space of component i.
The exit probabilities r0,c0,s0 are required for the analysis
of system reliability. The exits are a partitioning of the event space; thus the three exit probabilities sum to one, i.e.
(c0+ s0) = (1 – r0). The r0,c0,s0 can be determined by an
appropriate fault coverage model [13]; for more details, see [6][8].
For the fault coverage model, each component is always in one of three states: x[i], y[i], z[i]. To determine the system reliability (unreliability), it is required to have a[i], b[i], c[i], which represent the probabilities of component i associated respectively with the exits of the fault coverage model. Figure 2(b) shows the event space (and corresponding probability) representation of a component. Therefore,
$$\begin{aligned} a[i] &= \exp[-(1 - r_{i0})\lambda_{i0} t] \\ b[i] &= \frac{c_{i0}}{c_{i0} + s_{i0}} \left[1 - \exp[-(1 - r_{i0})\lambda_{i0} t]\right] \\ c[i] &= \frac{s_{i0}}{c_{i0} + s_{i0}} \left[1 - \exp[-(1 - r_{i0})\lambda_{i0} t]\right] \end{aligned} \qquad (4)$$
where $(r_{i0}, c_{i0}, s_{i0})$ are the probabilities of taking the (transient restoration, permanent coverage, single-point failure) exit in the coverage model, and $\lambda_{i0}$ is the rate of occurrence of faults of component i. It should be noted that the effective failure rate $\lambda_i$ and the effective coverage factor $c_i$ of component i are
$$\lambda_i \equiv (c_{i0} + s_{i0})\lambda_{i0} = (1 - r_{i0})\lambda_{i0}, \qquad c_i \equiv \frac{c_{i0}}{c_{i0} + s_{i0}} \qquad (5)$$
Amari et al. [12] proposed an efficient algorithm, the SEA, to calculate the reliability of a system under the imperfect coverage model. The basic idea is shown in the following equation and can be easily proved [12] by using conditional probabilities.
System Unreliability ($U_s$) =
Pr{at least one uncovered failure} × Pr{system failure | at least one uncovered failure} + Pr{no uncovered failure} × Pr{system failure | no uncovered failure} (6)
Let Pr{no uncovered failure} $= \prod_{i \in S}(a[i] + b[i]) \equiv P_u$; then Pr{at least one uncovered failure} $= 1 - P_u$. Also let Pr{system failure | no uncovered failure} $= U_{cs}$. Since Pr{system failure | at least one uncovered failure} is always equal to 1, we have
$$U_s = (1 - P_u) + P_u U_{cs} = 1 - P_u R_{cs}, \qquad R_s = 1 - U_s = P_u R_{cs} \qquad (7)$$
where $R_s$ is the system reliability and $R_{cs}$ is Pr{system success | no uncovered failure}.
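Equations (4) to (7) can be checked numerically on a small system. The following Python sketch is an illustration only: it writes Equation (4) in terms of the effective failure rate and coverage factor of Equation (5), evaluates the conditional reliability by plain enumeration rather than by OBDD, and compares the result of Equation (7) with a direct enumeration of the three-state space. The 2-out-of-3 structure, the rates, and all function names are assumptions chosen for the example.

```python
import math
from itertools import product

def exit_probabilities(rate, coverage, t):
    """a[i], b[i], c[i] of Equation (4), using the effective failure rate
    and effective coverage factor of Equation (5)."""
    failed = 1.0 - math.exp(-rate * t)
    return 1.0 - failed, coverage * failed, (1.0 - coverage) * failed

def sea_reliability(abc, works):
    """Equation (7): R_s = P_u * R_cs."""
    p_u = math.prod(a + b for a, b, _ in abc)
    r_cs = 0.0
    for up in product((True, False), repeat=len(abc)):
        if works(up):
            # Conditional probabilities a/(a+b) (working) and b/(a+b) (failed).
            r_cs += math.prod((a if u else b) / (a + b)
                              for u, (a, b, _) in zip(up, abc))
    return p_u * r_cs

def direct_reliability(abc, works):
    """Reference value: enumerate every combination of states x, y, z."""
    total = 0.0
    for states in product("xyz", repeat=len(abc)):
        if "z" in states:            # any uncovered failure fails the system
            continue
        if works(tuple(s == "x" for s in states)):
            total += math.prod(p["xyz".index(s)] for s, p in zip(states, abc))
    return total

def two_out_of_three(up):            # hypothetical example structure
    return sum(up) >= 2

# Hypothetical components: coverage 0.9, t = 20, three different rates.
abc = [exit_probabilities(rate, 0.9, 20.0) for rate in (0.010, 0.015, 0.020)]
r_sea = sea_reliability(abc, two_out_of_three)
r_ref = direct_reliability(abc, two_out_of_three)
print(r_sea, r_ref)
```

The two values agree up to floating-point error, which is the content of Equation (7): the coverage effect factors out as the single multiplier $P_u$.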
Example 1:
For a terminal-pair network system, Kuo [14] proposed an efficient approach to determine the terminal-pair (from source node s to target node t) reliability based on edge expansion diagrams using OBDD. The main idea, which makes his approach very efficient, is that the OBDD of a given network is automatically constructed with mergence of isomorphic sub-problems during tracing all paths of the terminal-pair. Therefore, the system reliability is efficiently derived from OBDD.
For the bridge network shown in Figure 3(a), Figure 3(b) shows the corresponding OBDD. We obtain the conditional reliability, Rcs, of the network system by substituting the conditional reliability/unreliability (p[i]/q[i]) for the reliability/unreliability of component i. The reliability of the network system subject to imperfect coverage then follows from Equation (7). With this integration, we do not need to solve the full state-space problem using Markov chains even when the network system is large and complex. In addition, because conditional probabilities are used, the computational complexity of this method is the same as that of the method for solving perfect coverage problems.
[Figure 3(a): a bridge network from source s to target t with links x1 to x5. Figure 3(b): its OBDD over the nodes x1 to x5, evaluated with $p[i] = a[i] / (a[i] + b[i])$ and $q[i] = b[i] / (a[i] + b[i])$.]
Figure 3. (a) A bridge network. (b) The OBDD of (a).
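As a rough numerical companion to Example 1, the sketch below evaluates the bridge network with the conditional probabilities p[i] and then applies Equation (7). It replaces the OBDD traversal by brute-force enumeration over the five links, so it illustrates the substitution only, not the efficiency of the approach. The path sets are those of the path function given in Section 5; the link probabilities and function names are hypothetical.

```python
import math
from itertools import product

# Minimal paths of the bridge network, 0-based link indices:
# x1x4, x2x5, x1x3x5, x2x3x4.
PATHS = [(0, 3), (1, 4), (0, 2, 4), (1, 2, 3)]

def connected(up):
    """True if at least one s-t path has all of its links working."""
    return any(all(up[i] for i in path) for path in PATHS)

def conditional_reliability(p):
    """R_cs by enumeration, with p[i] the conditional link reliability."""
    total = 0.0
    for up in product((True, False), repeat=len(p)):
        if connected(up):
            total += math.prod(pi if u else 1.0 - pi for u, pi in zip(up, p))
    return total

def bridge_reliability(a, b):
    """Equation (7) with p[i] = a[i] / (a[i] + b[i])."""
    p = [ai / (ai + bi) for ai, bi in zip(a, b)]
    p_u = math.prod(ai + bi for ai, bi in zip(a, b))
    return p_u * conditional_reliability(p)

# Hypothetical identical links: a = 0.74, b = 0.23 (so c = 0.03).
a, b = [0.74] * 5, [0.23] * 5
r_s = bridge_reliability(a, b)
print(r_s)
```

For identical links the conditional reliability can be cross-checked against the well-known closed form of the bridge network, $2p^2 + 2p^3 - 5p^4 + 2p^5$.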
3. Multi-state Coverage Model
3.1. Multi-state Systems with Imperfect Coverage
Assume there are n modules in a system and module i has $m_i$ states (i = 1,…,n). Depending upon the performance/capacity, we can arrange the states such that state $m_i$ is a perfect state and state 1 is a failed state (the performance level decreases from state $m_i$ to state 1). The ordering is not a constraint for applying the proposed algorithm, but it helps to reuse the existing algorithms for multi-state coherent systems subject to the perfect coverage model (PCM) as a part of an algorithm to solve the MSS problem subject to the imperfect coverage model (IPCM). If imperfect coverage is introduced in the model, then each module will have an extra state; i.e. the total number of states in module i becomes $m_i + 1$. Here, state 0 is the state corresponding to the uncovered failure of module i. Figure 4 shows the event and probability space of multi-state module i.
Assumption
A system subject to imperfect coverage will function satisfactorily as long as there exist system conditions that satisfy the system success requirements under the condition of perfect coverage and when there are no uncovered failures.
State Representation
state $m_i$   perfect state of module i (highest performance level of module i)
state j   state of a module at performance level j
state 1   state of a module at zero performance level (module failed and covered state)
state 0   module failed and uncovered state
Notation
$x_i$, $x_i^I$   indicator variable of the state of module i for [PCM, IPCM]; $x_i = l$ means module i of PCM is in state l.
$x_{i:j}$, $x_{i:j}^I$   represents that module i of [PCM, IPCM] is at performance level j or above; i.e., [$x_{i:j}$, $x_{i:j}^I$] is equivalent to [$x_i \ge j$, $x_i^I \ge j$].
$P_i(t, j)$   Pr{module i of PCM is in state j at time t}
$P_i^I(t, j)$   Pr{module i of IPCM is in state j at time t}
$P_i^c(t, j)$   Pr{module i of IPCM is in state j at time t | no uncovered failure in module i}
$R_i(t, j)$   Pr{module i of PCM is in state j or above at time t} $= \sum_{k=j}^{m_i} P_i(t, k) = \Pr\{x_{i:j}\}$
$R_i^I(t, j)$   Pr{module i of IPCM is in state j or above at time t} $= \sum_{k=j}^{m_i} P_i^I(t, k) = \Pr\{x_{i:j}^I\}$
$R_i^c(t, j)$   Pr{module i of IPCM is in state j or above at time t | no uncovered failure in module i}
$P_s(t, j)$   Pr{system of PCM is in state j at time t}
$P_s^I(t, j)$   Pr{system of IPCM is in state j at time t}
$P_s^c(t, j)$   Pr{system of IPCM is in state j at time t | no uncovered failure in the system}
$R_s(t, j)$   Pr{system of PCM is in state j or above at time t} $= \sum_{k=j}^{m} P_s(t, k)$
$R_s^I(t, j)$   Pr{system of IPCM is in state j or above at time t} $= \sum_{k=j}^{m} P_s^I(t, k)$
$R_s^c(t, j)$   Pr{system of IPCM is in state j or above at time t | no uncovered failure in the system}
[Figure 4: probability space of module i. Module at various performance levels: Pr{xi = m} = PiI(t, m), …, Pr{xi = 3} = PiI(t, 3), Pr{xi = 2} = PiI(t, 2). Module failed and covered: Pr{xi = 1} = PiI(t, 1). Module failed and not covered: Pr{xi = 0} = PiI(t, 0).]
Figure 4. The probability space of multi-state module i.
Example 2 :
Figure 5 shows the combinatorial performance requirements of a system for being operational at performance level s. It includes three sub-requirement trees T1, T2,T3. The event, xi:j, of the tree means the minimum performance requirement for the system to operate at performance level s. That is, xi:j means module i needs to be operational at performance level j or above. For example, in Figure 5, the system will be at performance level s if every module i (i = 1, 2, 3) is operational at level 2 or above, or if any module i (i = 1, 2, 3) is operational at level 3 or above, or if module 1 is operational at level 4 or above, or both module 2 and module 3 are operational at level 4 or above.
From the definition of MSS [2], the probability of event $x_{i:j}$ is
$$\Pr\{x_{i:j}\} = \sum_{k=j}^{m_i} P_i(t, k) = R_i(t, j) \qquad (8)$$
However, since "module 3 is operational at level 4 or above" implies "module 3 must be operational at level 3 or above", there exists a dependency between $\Pr\{x_{3:3}\}$ and $\Pr\{x_{3:4}\}$. We need to deal with this dependency problem in the probability calculation of combinatorial performance requirements.
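A tiny numerical illustration of this dependency, under assumed state probabilities for module 3 (the values below are hypothetical and not taken from the paper): the joint probability of $x_{3:3}$ and $x_{3:4}$ equals $\Pr\{x_{3:4}\}$, not the product of the two marginal probabilities.

```python
# Hypothetical PCM state probabilities P_3(t, k) of module 3, states k = 1..4.
P3 = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def level_probability(P, j):
    """Equation (8): Pr{x_{i:j}} = sum of P_i(t, k) over k >= j."""
    return sum(p for k, p in P.items() if k >= j)

r33 = level_probability(P3, 3)      # Pr{x_{3:3}}
r34 = level_probability(P3, 4)      # Pr{x_{3:4}}

# Both events concern the same module, so "level >= 3 and level >= 4"
# is simply "level >= 4".
joint = sum(p for k, p in P3.items() if k >= 3 and k >= 4)
naive = r33 * r34                   # wrong: treats the events as independent

print(joint, naive)
```

Treating the two nodes as s-independent would understate the joint probability here, which is why the OBDD operations need the modifications described in Section 4.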
[Figure 5: the top event "system at performance level s" is the OR of three sub-requirement trees: T1 = AND(x1:2, x2:2, x3:2); T2 = OR(x1:3, x2:3, x3:3); T3 = OR(x1:4, AND(x2:4, x3:4)).]
Figure 5. The combinatorial performance requirements of a multi-state system being operational at performance level s.
3.2. Reliability/Availability Evaluation of a Multi-state System
In this section, an algorithm similar to SEA for evaluating the system reliability of multi-state imperfect coverage is proposed. Using this method, the MSS with IPCM can be solved using the corresponding MSS subject to PCM.
System Reliability ($R_s^I$) = Pr{no uncovered failure in system} × Pr{system success | no uncovered failure in system}
Pr{no uncovered failure in system} ($P_u$) $= \prod_{i \in S}$ Pr{no uncovered failure of module i}
Pr{no uncovered failure of module i} $= R_i^I(t, 1) = 1 - P_i^I(t, 0)$ (9)
Further,
$$R_i^c(t, j) = \Pr\{x_i^I \ge j \mid x_i^I > 0\} = R_i^I(t, j) / R_i^I(t, 1) \qquad (10)$$
This is the conditional probability that module i of IPCM is in state j or above (i.e. at performance level j or above), given that there is no uncovered failure in module i. Therefore, Pr{system success | no uncovered failures in system} can be obtained by substituting $R_i^c(t, j)$ for $R_i(t, j)$ in the corresponding PCM model. Hence, the system reliability subject to imperfect coverage will be
$$R_s^I = P_u \times \{\text{Reliability of PCM with } R_i(t, j) = R_i^c(t, j),\ P_i(t, j) = P_i^c(t, j)\} \qquad (11)$$
where $P_u = \prod_{i \in S} R_i^I(t, 1)$, $R_i^c(t, j) = R_i^I(t, j) / R_i^I(t, 1)$, and $P_i^c(t, j) = P_i^I(t, j) / R_i^I(t, 1)$.
It should be noted that the same algorithm is applicable for availability evaluation, but in this case the input set to the algorithm should be derived using component availability models. Moreover, the state probabilities of MSS subject to IPCM can be found as follows:
$$P_s^I(t, j) = P_u \times P_s^c(t, j), \qquad P_s^c(t, j) = P_s(t, j) \text{ of PCM with } P_i(t, j) = P_i^c(t, j) \qquad (12)$$
Therefore, the stepwise procedure of the proposed algorithm is as follows:
1. Read the state probabilities of all modules, i.e. $P_i^I(t, j)$ for i = 1, …, n; j = 0, 1, …, $m_i$.
3. Find $P_u = \prod_{i=1}^{n} R_i^I(t, 1)$.
4. Find the conditional probabilities of each module at every level:
• $R_i^c(t, j) = R_i^I(t, j) / R_i^I(t, 1)$
• $P_i^c(t, j) = P_i^I(t, j) / R_i^I(t, 1)$
5. Use these conditional probabilities to find the system reliability/availability (or the probability of a system state) at the required performance level of the corresponding MSS subject to PCM. Solve modular structures using the modularization method. Use the proposed OBDD method to solve generic problems. Let this conditional probability be P.
• Reliability $R_s^c(t, j)$: Find the reliability of MSS with PCM by substituting either $P_i^c(t, j)$ for $P_i(t, j)$ or $R_i^c(t, j)$ for $R_i(t, j)$.
• Availability $R_s^c(t, j)$: Find the availability of MSS with PCM by substituting either $P_i^c(t, j)$ for $P_i(t, j)$ or $R_i^c(t, j)$ for $R_i(t, j)$ (here $R_i^c(t, j)$ and $R_i(t, j)$ are the corresponding availabilities).
• System state probability $P_s^c(t, j)$: Find the probability of the state of MSS with respect to PCM by substituting either $P_i^c(t, j)$ for $P_i(t, j)$ or $R_i^c(t, j)$ for $R_i(t, j)$.
6. Find the system reliability/availability (or the probability of a system state) of MSS with IPCM using:
Reliability/Availability = $P_u \times P$
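The procedure can be sketched in Python for the system of Example 2. This is a minimal illustration under stated assumptions: the module state probabilities are hypothetical, each module is given four performance states plus the uncovered state, and step 5 is carried out by plain enumeration of the PCM state space instead of the OBDD or modularization methods. A direct enumeration over the IPCM state space serves as the reference value.

```python
import math
from itertools import product

# Hypothetical IPCM state probabilities P_i^I(t, j), j = 0..4, modules 1..3.
P_I = [
    [0.02, 0.08, 0.20, 0.30, 0.40],
    [0.03, 0.07, 0.25, 0.35, 0.30],
    [0.01, 0.09, 0.30, 0.25, 0.35],
]

def system_ok(x):
    """Performance requirement of Example 2 (Figure 5); x = module states."""
    t1 = all(xi >= 2 for xi in x)
    t2 = any(xi >= 3 for xi in x)
    t3 = x[0] >= 4 or (x[1] >= 4 and x[2] >= 4)
    return t1 or t2 or t3

def reliability_ipcm(P_I, ok):
    # Step 3: P_u = product of R_i^I(t, 1) = 1 - P_i^I(t, 0).
    p_u = math.prod(1.0 - P[0] for P in P_I)
    # Step 4: conditional state probabilities P_i^c(t, j) for j >= 1.
    P_c = [[p / (1.0 - P[0]) for p in P[1:]] for P in P_I]
    # Step 5: solve the PCM problem (plain enumeration in this sketch).
    r_c = 0.0
    for x in product(*(range(1, len(P)) for P in P_I)):
        if ok(x):
            r_c += math.prod(P_c[i][xi - 1] for i, xi in enumerate(x))
    # Step 6: combine.
    return p_u * r_c

def reliability_direct(P_I, ok):
    """Reference: enumerate all IPCM states, uncovered state included."""
    total = 0.0
    for x in product(*(range(len(P)) for P in P_I)):
        if 0 not in x and ok(x):
            total += math.prod(P_I[i][xi] for i, xi in enumerate(x))
    return total

r_proc = reliability_ipcm(P_I, system_ok)
r_ref = reliability_direct(P_I, system_ok)
print(r_proc, r_ref)
```

The step-5 loop visits 4³ states while the reference loop visits 5³, which is the saving that the conditional-probability formulation is meant to deliver.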
Suppose that the problem is solved using general multi-state algorithms. The computational time is proportional to $(m+1)^n$. Hence, the computational complexity is $O(m^n)$. However, we cannot use modularization methods in this case. In our method, the use of conditional probabilities makes it possible to apply the method to modular structures. Then the problem can be solved in linear time (in most cases). Without using conditional probabilities, we cannot apply the solutions of modular structures.
The percentage reduction (reduction factor) achieved by our method is approximately $1 - \left(\frac{m}{m+1}\right)^n$. That means the reduction increases with n and decreases with m. But the difference $(m+1)^n - m^n$ actually increases with m. However, this is the worst-case situation, without modular structures in the system. In general, our method is much faster than existing methods.
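The behaviour of the reduction factor is easy to tabulate. The short Python sketch below uses hypothetical values of m and n; the function names are introduced here for illustration only.

```python
def reduction_factor(m, n):
    """Approximate fraction of work saved: 1 - (m / (m + 1))**n."""
    return 1.0 - (m / (m + 1)) ** n

def state_count_difference(m, n):
    """Absolute difference in state counts: (m + 1)**n - m**n."""
    return (m + 1) ** n - m ** n

for m, n in [(2, 3), (5, 5), (5, 10), (10, 5)]:
    print(m, n, round(reduction_factor(m, n), 4), state_count_difference(m, n))
```

For fixed m the factor grows with n, and for fixed n it shrinks as m grows, while the absolute difference in state counts keeps growing with m.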
Figure 6 shows the OBDDs of the three sub-requirement trees in Example 2. As mentioned earlier, there are dependencies between the sub-requirement trees, and they create dependency problems for the probability calculation in step 5 of our algorithm. The next section presents methods to deal with these problems.
4. Multi-state OBDD
4.1. Multi-state Dependency Operation (MDO)
In a multi-state system with multi-state modules, different performance levels for the same module are represented by different nodes of the OBDD as shown in Figure 6. The operations on OBDD should be modified to deal with the dependency problem in probability calculation. We use the methods in [15][16] to solve this problem.
For example, $x_{i:k}$ means module i is operational at performance level k or above and $x_{i:l}$ means the same module i is operational at performance level l or above. If level k is greater than level l, then $x_{i:k} = 1$ implies $x_{i:l} = 1$, i.e. $x_{i:k} \Rightarrow x_{i:l}$. Therefore, there exist dependencies between $x_{i:k}$ and $x_{i:l}$. We need to deal with this dependency when constructing the OBDD tree of a multi-state system. It means some branches should be cut during the multi-state dependency operation (MDO).
Lemma 1:
If performance level k is greater than performance level l, i.e. $x_{i:k} \Rightarrow x_{i:l}$, and the order of $x_{i:k}$ is smaller than that of $x_{i:l}$, then
$$ite(x_{i:k}, K_1, K_0) \diamond ite(x_{i:l}, L_1, L_0) = E_k \diamond E_l = ite[x_{i:k},\ K_1 \diamond L_1,\ K_0 \diamond E_l] \qquad (13)$$
Proof: Since the two nodes belong to the same module, module i being operational at performance level k or above means module i must be operational at performance level l or above, i.e. $x_{i:k} = 1$ implies $x_{i:l} = 1$. Therefore,
$$\begin{aligned} ite(x_{i:k}, K_1, K_0) \diamond ite(x_{i:l}, L_1, L_0) &= E_k \diamond E_l \\ &= ite[x_{i:k},\ (E_k \diamond E_l)_{x_{i:k}=1},\ (E_k \diamond E_l)_{x_{i:k}=0}] \\ &= ite[x_{i:k},\ (E_k)_{x_{i:k}=1} \diamond (E_l)_{x_{i:k}=1},\ (E_k)_{x_{i:k}=0} \diamond (E_l)_{x_{i:k}=0}] \\ &= ite[x_{i:k},\ K_1 \diamond L_1,\ K_0 \diamond E_l] \end{aligned}$$
The derivation uses the relation $(E_l)_{x_{i:k}=0} = E_l$, since $x_{i:k} = 0$ is not relevant to $E_l$. □
For example, in Figure 6, if the ordering of the nodes is $x_{1:4} < x_{1:3} < x_{1:2} < x_{2:4} < x_{2:3} < x_{2:2} < x_{3:4} < x_{3:3} < x_{3:2}$, then after applying the MDO on the system we get the result shown in Figure 7. It should be noted that $x_{3:4}$ is automatically eliminated during MDO. This is because, from the sub-requirement tree T2 in Figure 5, when module 3 is operational at performance level 3 or above, the system meets the required performance level s. Hence, we do not need to consider whether module 3 is operational at performance level 4 or above in sub-requirement tree T3, i.e. the node $x_{3:4}$ disappears.
[Diagram: the OBDDs of the three sub-requirement trees, T3 over nodes x1:4, x2:4, x3:4; T2 over nodes x1:3, x2:3, x3:3; T1 over nodes x1:2, x2:2, x3:2.]
[Diagram: the combined OBDD over nodes x1:4, x1:3, x1:2, x2:4, x2:3, x2:2, x3:3, x3:2.]
Figure 7. The OBDD of the system in Figure 5.
4.2. Dependency Probability Calculation
The traditional recursive algorithm can efficiently calculate the probability for an OBDD. However, for the OBDD tree of a multi-state system as shown in Figure 7, there exists dependency between a node and its child node if they belong to the same module but with different performance levels. Therefore, the traditional recursive algorithm should be modified to deal with the dependency problem in probability calculation.
Let k and l be two performance levels (k > l). Table 1 shows the rules of level algebra. The validity of these relationships can be easily verified. Note that $x_{i:k} = w_1 w_2 \ldots w_k$, where $w_h$ is the Boolean value (True/False) that represents module i being (operational/failed) at performance level h ($1 \le h \le k$).
Table 1. The rules of level algebra (k > l).
    operation                               value
1.  $x_{i:k} \cdot x_{i:l}$                 $x_{i:k}$
2.  $x_{i:k} \cdot \bar{x}_{i:l}$           0
3.  $\bar{x}_{i:k} \cdot \bar{x}_{i:l}$     $\bar{x}_{i:l}$
4.  $\bar{x}_{i:k} \cdot x_{i:l}$           $x_{i:l} - x_{i:k}$
Therefore, for the 1st rule,
$$x_{i:k} \cdot x_{i:l} = (w_1 w_2 \ldots w_k)(w_1 w_2 \ldots w_l) = w_1 w_2 \ldots w_k = x_{i:k}$$
The physical meaning of this equation is that the requirement "module i is operational at both performance levels k and l" is equivalent to the requirement "module i is operational at performance level k". For the 2nd rule,
$$x_{i:k} \cdot \bar{x}_{i:l} = (w_1 w_2 \ldots w_k)(1 - w_1 w_2 \ldots w_l) = 0$$
The physical meaning of this equation is that the requirement "module i is operational at performance level k, but is not operational at performance level l" cannot occur. For the 3rd rule,
$$\bar{x}_{i:k} \cdot \bar{x}_{i:l} = (1 - w_1 w_2 \ldots w_k)(1 - w_1 w_2 \ldots w_l) = 1 - w_1 w_2 \ldots w_l = \bar{x}_{i:l}$$
The physical meaning of this equation is that the requirement "module i is operational at neither performance level k nor l" is equivalent to the requirement "module i is not operational at performance level l".
For the 4th rule, $\bar{x}_{i:k} \cdot x_{i:l}$ means that module i is operational at performance level l but fails to reach performance level k, and we have
$$\bar{x}_{i:k} \cdot x_{i:l} = (1 - w_1 w_2 \ldots w_k)(w_1 w_2 \ldots w_l) = x_{i:l} - x_{i:k}$$
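The four rules can be verified exhaustively for a small module. The Python sketch below assumes a module with states 1 to 5 and models $x_{i:k}$ as the predicate "state ≥ k"; the function names and the number of states are assumptions made for the check.

```python
M = 5  # hypothetical number of states of one module (states 1..M)

def x(level, state):
    """Indicator x_{i:level}: the module is at performance level >= level."""
    return state >= level

def level_algebra_holds(m):
    """Check the four rules of Table 1 for every k > l and every state."""
    for k in range(2, m + 1):
        for l in range(1, k):
            for s in range(1, m + 1):
                xk, xl = x(k, s), x(l, s)
                rule1 = (xk and xl) == xk
                rule2 = (xk and not xl) == False
                rule3 = ((not xk) and (not xl)) == (not xl)
                rule4 = int((not xk) and xl) == int(xl) - int(xk)
                if not (rule1 and rule2 and rule3 and rule4):
                    return False
    return True

print(level_algebra_holds(M))
```

Rule 4 is checked on 0/1 values, since the right-hand side $x_{i:l} - x_{i:k}$ is an arithmetic difference of indicators rather than a Boolean expression.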
These rules in Table 1 are only applicable to variables or nodes belonging to the same module. The ordinary Boolean relationships hold for the indicator variables belonging to different modules since module i and j are s-independent.
Looking into the OBDD tree constructed from MDO as shown in Figure 7, we find that a 1-edge always connects two nodes that belong to different modules. However, for the 0-edge, there are two cases that must be treated differently:
1. The 0-edge links nodes that belong to different modules.
2. The 0-edge links nodes that belong to the same module.
In case 1, we calculate the probability of the node using the ordinary OBDD method since the nodes are s-independent of each other. However, in case 2, we must modify the probability calculation.
Lemma 2:
For the nodes ($x_{i:k}$, $x_{j:l}$) belonging to different modules, if $G = ite(x_{i:k}, G_1, G_0)$ and $G_0 = ite(x_{j:l}, H_1, H_0)$, then
$$\Pr\{G\} = \Pr\{x_{i:k} G_1\} + \Pr\{\bar{x}_{i:k} G_0\} = \Pr\{x_{i:k}\}[\Pr\{G_1\} - \Pr\{G_0\}] + \Pr\{G_0\} \qquad (14)$$
Proof: This is the ordinary OBDD probability calculation. □
Lemma 3:
For the nodes ($x_{i:k}$, $x_{i:l}$) belonging to the same module but with different performance levels, if $G = ite(x_{i:k}, G_1, G_0)$ and $G_0 = ite(x_{i:l}, H_1, H_0)$, then
$$\Pr\{G\} = \Pr\{x_{i:k}\}[\Pr\{G_1\} - \Pr\{H_1\}] + \Pr\{G_0\} \qquad (15)$$
Proof: Since the 1-edge branch of $x_{i:k}$ always links to a node which belongs to a different module, $\Pr\{x_{i:k}\}$ is independent of $\Pr\{G_1\}$, i.e. $\Pr\{x_{i:k} G_1\} = \Pr\{x_{i:k}\}\Pr\{G_1\}$. Therefore, applying the rules of level algebra in Table 1, we get
$$\begin{aligned} \Pr\{G\} &= \Pr\{x_{i:k} G_1\} + \Pr\{\bar{x}_{i:k} G_0\} \\ &= \Pr\{x_{i:k}\}\Pr\{G_1\} + \Pr\{\bar{x}_{i:k} x_{i:l} H_1\} + \Pr\{\bar{x}_{i:k} \bar{x}_{i:l} H_0\} \\ &= \Pr\{x_{i:k}\}\Pr\{G_1\} + \Pr\{(x_{i:l} - x_{i:k}) H_1\} + \Pr\{\bar{x}_{i:l} H_0\} \\ &= \Pr\{x_{i:k}\}[\Pr\{G_1\} - \Pr\{H_1\}] + \Pr\{G_0\} \end{aligned}$$
□
Let us consider the reliability calculation of a multi-state system with imperfect coverage using OBDD. As described in Section 3, we first find the conditional probability $\Pr\{x_{i:k}^c\}$ of each multi-state module i. Second, we use $\Pr\{x_{i:k}^c\}$ instead of the probability or reliability of module i, $\Pr\{x_{i:k}\}$, to calculate the multi-state system conditional reliability $R_{mss}^c$ from Equations (14) and (15). Therefore, we have the following lemma.
If $G = ite(x_{i:k}, G_1, G_0)$, $G_0 = ite(Z, H_1, H_0)$, and the order of node $x_{i:k}$ is smaller than that of node Z, the probability of G is
$$\Pr\{G\} = \begin{cases} \Pr\{x_{i:k}^c\}[\Pr\{G_1\} - \Pr\{G_0\}] + \Pr\{G_0\} & \text{(if } x_{i:k} \text{ and } Z \text{ belong to different modules)} \\ \Pr\{x_{i:k}^c\}[\Pr\{G_1\} - \Pr\{H_1\}] + \Pr\{G_0\} & \text{(if } x_{i:k} \text{ and } Z \text{ belong to the same module)} \end{cases} \qquad (16)$$
where $\Pr\{x_{i:k}^c\}$ is the conditional reliability of module i being operational at performance level k or above given that no uncovered failure occurred in that module (i.e. module i).
Therefore, the probability of the OBDD's root node, representing the multi-state system conditional reliability $R_{mss}^c$, is obtained from Equation (16). Hence, we get the multi-state system reliability $R_{MSS}$ by
$$R_{MSS} = R_{mss}^c \times P_u \qquad (17)$$
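Equation (16) translates into a short recursive traversal. The Python sketch below is a minimal illustration under stated assumptions: the `Node` class, the two-module example, its conditional state probabilities, and the hand-built OBDD of $(x_{1:3} \cdot x_{2:2}) + (x_{1:2} \cdot x_{2:3})$ with ordering $x_{1:3} < x_{1:2} < x_{2:3} < x_{2:2}$ are all hypothetical and not taken from the paper. The result is compared with a direct enumeration of the module states.

```python
from itertools import product

class Node:
    """OBDD node for x_{module:level}; children are Nodes or booleans."""
    def __init__(self, module, level, high, low):
        self.module, self.level, self.high, self.low = module, level, high, low

def prob(node, R):
    """Equation (16); R[module][level] plays the role of Pr{x^c_{module:level}}."""
    if isinstance(node, bool):
        return 1.0 if node else 0.0
    p = R[node.module][node.level]
    g1, g0 = prob(node.high, R), prob(node.low, R)
    low = node.low
    if not isinstance(low, bool) and low.module == node.module:
        # 0-edge stays inside the same module: subtract Pr{H1}, not Pr{G0}.
        return p * (g1 - prob(low.high, R)) + g0
    return p * (g1 - g0) + g0        # 0-edge leads to a different module

# Hypothetical conditional state probabilities P_i^c(t, j), states 1..3.
P_c = {1: {1: 0.2, 2: 0.3, 3: 0.5}, 2: {1: 0.1, 2: 0.5, 3: 0.4}}
R = {i: {j: sum(p for k, p in P.items() if k >= j) for j in P}
     for i, P in P_c.items()}

# Hand-built OBDD: on the 1-edge of x_{1:3} the requirement reduces to x_{2:2}.
x22 = Node(2, 2, True, False)
x23 = Node(2, 3, True, False)
x12 = Node(1, 2, x23, False)
root = Node(1, 3, x22, x12)

r_obdd = prob(root, R)
r_ref = sum(P_c[1][s1] * P_c[2][s2]
            for s1, s2 in product(P_c[1], P_c[2])
            if (s1 >= 3 and s2 >= 2) or (s1 >= 2 and s2 >= 3))
print(r_obdd, r_ref)
```

Applying the ordinary rule of Equation (14) at the root of this example would give 0.61 instead of the enumerated 0.57, which shows why the same-module case needs separate treatment.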
5. Examples
Let us consider the bridge network shown in Figure 3(a) of Example 1. Assume that redundancy techniques are used such that each link has a fault-tolerance scheme. We can then treat a link as a module with various link capacities or performance levels (i.e. a multi-state network system). The fault-coverage condition should also be considered. The path function of the system is
$$F = x_1 x_3 x_5 + x_1 x_4 + x_2 x_5 + x_2 x_3 x_4$$
Case I: the basic requirement for the system being at an acceptable performance level is:
$$\Phi_{accept} = x_{1:2} x_{3:2} x_{5:2} + x_{1:2} x_{4:2} + x_{2:2} x_{5:2} + x_{2:2} x_{3:2} x_{4:2}$$
Case II: the path x1x4 is the backbone of the network, most of the dataflow runs through it, and the requirement for this path is stricter. Therefore, if x1 needs to be at least at level 5 and x4 needs to be at least at level 4 for path x1x4, the requirement for the system being at a good performance level is:
$$\Phi_{good} = x_{1:2} x_{3:2} x_{5:2} + x_{1:5} x_{4:4} + x_{2:2} x_{5:2} + x_{2:2} x_{3:2} x_{4:2}$$
Figures 8(a) and 8(b) show the results of $\Phi_{accept}$ and $\Phi_{good}$ after applying MDO. Table 2 shows the parameters of a module obtained by using Markov techniques and assuming each module has 6 performance states including the failed and uncovered state. If all modules in Figure 3(a) are identical, the system reliabilities of $\Phi_{accept}$ and $\Phi_{good}$ with different coverage factors are obtained from Equations (16) and (17), as shown in Figure 9. Figure 9 shows that $\Phi_{good}$ is less reliable than $\Phi_{accept}$. That means more must be invested in the system if we want to raise the reliability of $\Phi_{good}$ to that of $\Phi_{accept}$. Figure 10 illustrates that the system reliability decreases when the required level of path x1x4 increases. This means that the higher the required performance level of a module, the harder it is for the system to satisfy the demand.
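The conditional columns of Table 2 can be recomputed from its $P^I(t, j)$ column, which gives a quick consistency check of Equations (9) and (10). The Python sketch below takes the $P^I$ values and the parameters t = 20, failure rate 0.015, and coverage factor 0.9 from Table 2; reading rows j = 0 and j = 1 as the uncovered and covered failure probabilities of Equation (4) is an interpretation made here, and the variable names are illustrative.

```python
import math

t, rate, coverage = 20.0, 0.015, 0.9      # parameters stated for Table 2
P_I = {0: 0.0259, 1: 0.2333, 2: 0.0823, 3: 0.2469, 4: 0.2469, 5: 0.1646}

# Rows j = 0 and j = 1, read as c[i] and b[i] of Equation (4) with the
# effective rate and coverage factor of Equation (5).
failed = 1.0 - math.exp(-rate * t)
p_uncovered = (1.0 - coverage) * failed
p_covered = coverage * failed

# Equation (9): R^I(t, 1) = 1 - P^I(t, 0); then the conditional values.
r1 = 1.0 - P_I[0]
P_c = {j: P_I[j] / r1 for j in range(1, 6)}
R_c = {j: sum(P_c[k] for k in range(j, 6)) for j in range(1, 6)}

for j in range(1, 6):
    print(j, round(P_c[j], 4), round(R_c[j], 4))
```

Because the tabulated inputs are rounded to four decimals, the recomputed values can differ from the printed table in the last digit.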
[Figure 8(a): the OBDD of $\Phi_{accept}$ over nodes x1:2, x2:2, x3:2, x4:2, x5:2. Figure 8(b): the OBDD of $\Phi_{good}$ over nodes x1:5, x1:2, x2:2, x3:2, x4:4, x4:2, x5:2.]
Figure 8. The results of $\Phi_{accept}$ and $\Phi_{good}$ after MDO.
Table 2: The parameters of individual module with exponential distribution given t = 20, failure rate $\lambda_i$ = 0.015 and coverage factor $c_i$ = 0.9.
j    P^I(t, j)    P^c(t, j)    R^c(t, j)
0    0.0259       -            -
1    0.2333       0.2395       1.0000
2    0.0823       0.0845       0.7605
3    0.2469       0.2535       0.6760
4    0.2469       0.2535       0.4225
5    0.1646       0.1690       0.1690
[Plot: system reliability (0.5 to 1) versus time (0 to 50) for $\Phi_{accept}$ and $\Phi_{good}$, each with c = 1, c = 0.9, and c = 0.8.]
Figure 9. The system reliability of $\Phi_{accept}$ and $\Phi_{good}$ with different coverage factors c.
[Plot: system reliability (0.5 to 1) versus time (0 to 50) for level = 2, level = 3, level = 4, and level = 5.]
Figure 10. The system reliabilities of $\Phi_{good}$ with
6. Conclusions
This paper has proposed an OBDD-based approach for the reliability evaluation of a multi-state system with combinatorial performance requirements subject to imperfect fault coverage. A new model for multi-state systems with imperfect fault coverage has also been proposed. It was shown that the algorithm used to evaluate the reliability can also be used to evaluate the availability of a system subject to imperfect fault coverage if the Markov process is applied to analyze the state transition behavior. Further, with the application of conditional probabilities, the time complexity of this method for reliability evaluation is the same as that without considering imperfect coverage.
In addition, an efficient integration of OBDD and modularization simplifies the problem further. The multi-state dependency operation (MDO) handles the dependencies between the combinatorial performance requirements. Through MDO, some of the redundant nodes in the OBDD are automatically eliminated, which means that the combinatorial performance requirements themselves are simplified. Moreover, our approach handles the dependencies among multi-state modules in the probability calculation.
References [2] and [17], with helpful comments from [2], have presented various performance measures related to multi-state systems. Computing these measures requires the reliability of the system at various performance levels, so the results of this paper can be used to obtain the performance measures of multi-state systems. This process is straightforward and is therefore not discussed explicitly here [2].
This algorithm can be applied to complex systems such as fault-tolerant computer systems and network systems with variable link capacities, since it generates the complete results quickly and accurately even when there are a number of dependencies such as shared loads (reconfiguration), degradation and common-cause failures. Building on this approach, our future work will focus on sensitivity analysis, importance measures, failure frequency analysis and optimal design of multi-state systems.
7. Acknowledgement
This research was supported by the National Science Council, Taiwan, R.O.C., under grant NSC 90-2213-E-002-113.
8. References
[1] J.D. Murchland, “Fundamental concepts and relations for reliability analysis of multistate systems”, Reliability and Fault Tree Analysis, Theoretical and Applied Aspects of System Reliability, SIAM, 1975, pp. 581-618.
[2] J. Xue and K. Yang, “Dynamic reliability analysis of coherent multi-state systems”, IEEE Trans. on Reliability, Vol. 44, Dec. 1995, pp. 683-688.
[3] G. Levitin, “Incorporating Common-Cause Failure Into Nonrepairable Multistate Series-Parallel System Analysis”, IEEE Trans. on Reliability, Vol. 50, No. 4, Dec. 2001, pp. 380-388.
[4] G. Levitin, A. Lisnianski, H. Ben-Haim, and D. Elmakis, “Redundancy optimization for series-parallel multi-state systems”, IEEE Trans. on Reliability, Vol. 47, June 1998, pp. 165-172.
[5] J. Xue, “On multistate system analysis”, IEEE Trans. on Reliability, Vol. R-34, Oct. 1985, pp. 329-337.
[6] J.B. Dugan, “Fault Tree and Imperfect Coverage”, IEEE Trans. on Reliability, Vol. R-38, June 1989, pp. 177-185.
[7] S.V. Amari, J.B. Dugan, and R.B. Misra, “Optimal Reliability of Systems subject to Imperfect Fault-Coverage”, IEEE Trans. on Reliability, Vol. 48, No. 3, Sep. 1999, pp. 275-284.
[8] S.A. Doyle, J.B. Dugan, F.A. Patterson-Hine, “A combinatorial Approach to Modeling Imperfect Coverage”, IEEE Trans. on Reliability, Vol. 44, Mar. 1995, pp. 87-94.
[9] R.E. Bryant, “Graph-based algorithms for Boolean function manipulation”, IEEE Trans. on Computers, Vol. C-35, Aug. 1986, pp. 677-691.
[10] A. Rauzy, “New algorithms for fault tree analysis”, Reliability Engineering and System Safety, Vol. 40, 1993, pp. 203-211.
[11] R.M. Sinnamon and J.D. Andrews, “Improved efficiency in qualitative fault tree analysis”, Quality and Reliability Engineering Int’l., Vol. 13, 1997, pp. 293-298.
[12] S.V. Amari, J.B. Dugan, and R.B. Misra, “A separable method for incorporating imperfect fault-coverage into combinatorial models”, IEEE Trans. on Reliability, Sep. 1999, pp. 267-274.
[13] J.B. Dugan and K.S. Trivedi, “Coverage modeling for dependability analysis of fault-tolerant systems”, IEEE Trans. on Computers, Vol. 38, June 1989, pp. 775-787.
[14] S.Y. Kuo, S.K. Lu, and F.M. Yeh, “Determining Terminal-Pair Reliability Based on Edge Expansion Diagrams Using OBDD”, IEEE Trans. on Reliability, Vol. 48, Sep. 1999, pp. 234-246.
[15] X. Zang, H. Sun, and K.S. Trivedi, “Dependability Analysis of Distributed Computer Systems with Imperfect Coverage”, Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing, 1999, pp. 330-337.
[16] X. Zang, H. Sun, and K.S. Trivedi, “A BDD-based algorithm for analysis of multi-state systems with multi-state components”, Technical Report, 1998.
[17] S.V. Amari and R.B. Misra, “Comment on: Dynamic
Reliability Analysis of Coherent Multistate Systems”, IEEE Trans.