OBDD-Based Evaluation of Reliability and Importance Measures for Multistate Systems Subject to Imperfect Fault Coverage


Yung-Ruei Chang, Member, IEEE, Suprasad V. Amari, Senior Member, IEEE, and

Sy-Yen Kuo, Fellow, IEEE

Abstract—Algorithms for evaluating the reliability of a complex system such as a multistate fault-tolerant computer system have become more important recently. They are designed to obtain the complete results quickly and accurately even when there exist a number of dependencies such as shared loads (reconfiguration), degradation, and common-cause failures. This paper presents an efficient method based on Ordered Binary Decision Diagram (OBDD) for evaluating the multistate system reliability and the Griffith’s importance measures which can be regarded as the importance of a system-component state of a multistate system subject to imperfect fault-coverage with various performance requirements. This method combined with the conditional probability methods can handle the dependencies among the combinatorial performance requirements of system modules and find solutions for multistate imperfect coverage model. The main advantage of the method is that its time complexity is equivalent to that of the methods for perfect coverage model and it is very helpful for the optimal design of a multistate fault-tolerant system.

Index Terms— Reliability, multistate system, OBDD, fault-coverage, importance measure.


1 INTRODUCTION

Multistate system theory has been investigated since 1975 [1]. A power plant that has states 0, 1, 2, 3, and 4, corresponding to generating 0, 25, 50, 75, and 100 percent of its full capacity, is an example of a multistate system with ordered multiple states [2]. A nuclear reactor system [3] or a pumping system [4] that performs differently according to the many different combinations of states of its subsystems or submodules is an example of a multistate system with unordered multiple states. Many researchers have analyzed multistate system reliability [5], [6], [7]. Most of them extend the concepts and conclusions for two-state systems to multistate systems. To describe the dynamic characteristics of component state transitions, stochastic process (Markov process) techniques are combined with multistate system theory to analyze dynamic multistate system reliability. Multistate reliability theory can handle situations in which the system and its components have a range of performance levels, e.g., from perfect operation to complete failure. Because performance degradation is very common in industrial products, it is important to develop multistate system reliability theory.

When a multistate system (MSS) is considered, it is important to estimate the impact of each element on the system output/performance. The general definition of the MSS reliability [6] is:

$$R_{MSS}(t, L) = \Pr\{F(t) \ge L\}, \tag{1}$$

where $L$ is the required performance level for the MSS and $F(t)$, a random variable, is the MSS output/performance at time $t$. That is, $R_{MSS}(t, L)$ is the probability that the system output/performance is at or above the required performance level over the time interval $(0, t)$. For a multistate system that has a finite number of states, there can be $H$ different levels of output/performance at time $t$. Therefore, $F(t)$ belongs to the finite vector set $\mathbf{F}$, which is composed of $H$ different levels of output/performance, $F_h$:

$$F(t) \in \mathbf{F} = \{F_h,\ 1 \le h \le H\} \tag{2}$$

and the system output/performance distribution can be defined by the following finite vector set:

$$\mathbf{q} = \{q_h(t)\} = \{\Pr\{F(t) = F_h\}\}, \quad (1 \le h \le H). \tag{3}$$

Therefore, the nonrepairable MSS reliability, which is also equivalent to availability, is the probability that the system remains in the states with $F_h \ge L$ during $(0, t)$:

$$R_{MSS}(t, L) = \sum_{F_h \ge L} q_h(t). \tag{4}$$

Similarly, (4) could be used for the definition of MSS availability if the multistate system is repairable, where $q_h(t)$ is the availability of the system running at the performance $F_h$.
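As an illustration of (4), the following minimal sketch sums the state probabilities of all states whose performance meets the requirement. The function name `mss_reliability` and the numerical values are assumptions made for this example only; the five performance levels mirror the power plant example above.

```python
def mss_reliability(levels, probs, required):
    """Eq. (4): add up q_h(t) over every state h whose output F_h >= L."""
    if len(levels) != len(probs):
        raise ValueError("levels and probs must have the same length")
    return sum(q for f, q in zip(levels, probs) if f >= required)


# Hypothetical five-state plant: F_h in percent of full capacity.
levels = [0, 25, 50, 75, 100]
# Made-up state probabilities q_h(t) at some fixed t; they sum to one.
probs = [0.05, 0.05, 0.10, 0.20, 0.60]

# Probability that the plant delivers at least 50 percent of capacity.
r_50 = mss_reliability(levels, probs, required=50)
```

With these made-up numbers, the states at 50, 75, and 100 percent contribute, so `r_50` is the sum of their three probabilities.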

In addition, systems that are used in life-critical applications such as flight control, nuclear power plant monitoring, space missions, etc., are designed with sufficient redundancy to be tolerant of errors. However, if the system cannot adequately detect, locate, and recover from faults and errors in the system, then system failure can still occur even when there exists adequate redundancy [8]. An accurate analysis must account for not only the complex system structure, but also the system fault and error recovery behavior. Therefore, the fault-coverage problem of a system should be considered. This helps in determining the optimal level of redundancy [9].

. Y.-R. Chang is with Institute of Nuclear Energy Research, Atomic Energy Council, Taoyuan 32546, Taiwan. E-mail: raymond@iner.gov.tw.

. S.V. Amari is with Relex Software Corporation, 540 Pellis Road, Greensburg, PA 15601. E-mail: suprasad.amari@relexsoftware.com.

. S.-Y. Kuo is with the Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan. E-mail: sykuo@cc.ee.ntu.edu.tw.

Manuscript received 2 Feb. 2004; revised 18 July 2005; accepted 6 Oct. 2005; published online 3 Nov. 2005.

For information on obtaining reprints of this article, please send e-mail to: tdsc@computer.org, and reference IEEECS Log Number TDSC-0025-0204.

Most of the published works use Markov models (nonhomogeneous Markov or semi-Markov models) to solve multistate problems. However, it is difficult to find the correct model of a system since there will be a total of $N = (m+1)^n$ states if there are $n$ modules in the system and each module has $(m+1)$ states including the imperfect coverage state. The computational time to solve the Markov model is proportional to $N^3 = [(m+1)^n]^3$. Hence, the computational complexity of the problem is $O(m^{3n})$ [10], [11].
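To give a feel for how quickly this state space grows, the short sketch below evaluates $N = (m+1)^n$ and the $N^3$ cost factor for a given configuration. The function names are hypothetical and the cost is only the proportionality factor stated above, not a measured run time.

```python
def markov_state_count(n_modules, m_states):
    """N = (m + 1)^n joint states, counting the uncovered-failure state."""
    return (m_states + 1) ** n_modules


def markov_cost_factor(n_modules, m_states):
    """Cost factor N^3 = [(m + 1)^n]^3 quoted in the text."""
    return markov_state_count(n_modules, m_states) ** 3


# Example: ten modules with three covered states each (m = 3).
n_states = markov_state_count(10, 3)
cost = markov_cost_factor(10, 3)
```

For ten modules with $m = 3$, the model already has $4^{10} = 1{,}048{,}576$ states.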

The recent literature [12], [13], [14], [15], [16], [17], [18], [19] showed that, in many cases, Ordered Binary Decision Diagram (OBDD)-based algorithms are more efficient in reliability evaluation than other methods such as the Inclusion-Exclusion (I-E) method and the sum of disjoint products (SDP) method. This paper, which is modified from [14], provides an approach for modelling a multistate system and proposes an efficient method that integrates the OBDD method with conditional probability concepts to evaluate the reliability of a multistate system with imperfect coverage. This efficient integration of the OBDD and the modularization methods simplifies the problem further. This method could also be integrated with those methodologies that use the OBDD method for reliability analysis [13], [15], [16], [17], [18].

Furthermore, the importance measure, also called the sensitivity analysis, is used to measure the effect of the individual component reliabilities on the system reliability. Identifying the weaknesses of the system and how the failure of each individual component affects the proper functioning of the system is an important topic in optimal design, so that efforts can be spent properly to improve the system reliability [20]. Several works have introduced importance measures for two-state systems. The Birnbaum-importance measure indicates the contribution of component reliability to the system reliability [20], [21]. The Structural-importance measure indicates the topographic importance of a position in the system [22], [23]. The Criticality-importance measure corresponds to the conditional probability of a component failure, given that the system has failed [20], [21]. The Joint-importance measure indicates how components in a system interact and contribute to the system reliability [24], [25].

Very few publications discuss how the particular state of a component contributes to a multistate system, and how the presence of a component and a particular state of a component affect the contributions of other components in the system. Most importance concepts have been built on how the state change of a component affects the system [26], [27], [28], [29], [30], [31], [32]. In this paper, we also propose an OBDD-based method to compute the Griffith importance vector [31], which can be interpreted as the range of a multistate system performance when a module moves from one state to another state and can be regarded as the importance of the state of the system-module. Moreover, with a little modification, we can extend our method to evaluate the Griffith’s importance vector under the imperfect fault-coverage model.

Section 2 introduces the fundamentals of OBDD and the coverage model. Section 3 illustrates a multistate imperfect coverage model and an OBDD-based approach to evaluating the reliability and availability of a multistate system with imperfect coverage. In Section 4, an OBDD-based approach is introduced to compute the Griffith’s importance measure subject to either perfect or imperfect fault-coverage. Section 5 gives some examples. The last section gives the conclusions and future work.

2 PRELIMINARIES

2.1 Ordered Binary Decision Diagram (OBDD)

This section introduces the representation and manipulation of Boolean functions based on OBDD. OBDD [12] is based on a decomposition of a Boolean function called the Shannon expansion. A function $f$ can be decomposed in terms of a variable $x$ as:

$$f = x \cdot f_{x=1} + \bar{x} \cdot f_{x=0}. \tag{5}$$

A node and its descendants in an OBDD represent a Boolean function $f$, where, for node label $x$, one outgoing edge is directed to the subgraph representing $f_{x=1}$ and the other to $f_{x=0}$. Shannon decomposition is the basis for using OBDD. In order to express Shannon decomposition concisely, the if-then-else (ite) format [33], [34] is defined as:

$$f = \mathrm{ite}(x, f_{x=1}, f_{x=0}). \tag{6}$$

The manipulation of OBDD to represent logical operations is simple. In practice, the OBDD is generated by using logical operations on variables. Let Boolean expressions $f$ and $g$ be:

$$\begin{aligned} f &= \mathrm{ite}(x, f_{x=1}, f_{x=0}) = \mathrm{ite}(x, F_1, F_0) \\ g &= \mathrm{ite}(y, g_{y=1}, g_{y=0}) = \mathrm{ite}(y, G_1, G_0). \end{aligned} \tag{7}$$

A logic operation between $f$ and $g$ can be represented by OBDD manipulations as:

$$\mathrm{ite}(x, F_1, F_0) \diamond \mathrm{ite}(y, G_1, G_0) = \begin{cases} \mathrm{ite}(x, F_1 \diamond G_1, F_0 \diamond G_0) & \mathrm{ordering}(x) = \mathrm{ordering}(y) \\ \mathrm{ite}(x, F_1 \diamond g, F_0 \diamond g) & \mathrm{ordering}(x) < \mathrm{ordering}(y) \\ \mathrm{ite}(y, f \diamond G_1, f \diamond G_0) & \mathrm{ordering}(x) > \mathrm{ordering}(y), \end{cases} \tag{8}$$

where $\diamond$ represents a logic operation such as AND and OR. Fig. 1 illustrates the construction and manipulation steps of a Boolean function. For more details on using the operations of OBDD, please refer to [12].
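The recursion in (8) can be sketched in a few lines. The following is a minimal illustration, not the implementation of [12]: nodes are plain tuples `(var, high, low)`, the variable ordering is assumed to be the integer index itself, and the helper names (`ite`, `var`, `apply_op`, `evaluate`) are hypothetical.

```python
from functools import lru_cache
import operator

# A node is either a bool terminal or a tuple (var, high, low), where var is
# the position in the variable ordering, high = f|var=1 and low = f|var=0.


def ite(index, high, low):
    """Build ite(index, high, low); skip the node if both branches coincide."""
    return high if high == low else (index, high, low)


def var(index):
    """OBDD of the single variable with the given ordering index."""
    return ite(index, True, False)


def top(node):
    """Ordering of the root variable; terminals sort after every variable."""
    return node[0] if isinstance(node, tuple) else float("inf")


def cofactors(node, index):
    """Return (f|index=1, f|index=0); a node not testing index is unchanged."""
    if isinstance(node, tuple) and node[0] == index:
        return node[1], node[2]
    return node, node


def apply_op(op, f, g):
    """Combine two OBDDs with a binary Boolean operator, following Eq. (8)."""
    @lru_cache(maxsize=None)
    def rec(a, b):
        if isinstance(a, bool) and isinstance(b, bool):
            return op(a, b)
        index = min(top(a), top(b))      # smaller ordering is expanded first
        a1, a0 = cofactors(a, index)
        b1, b0 = cofactors(b, index)
        return ite(index, rec(a1, b1), rec(a0, b0))
    return rec(f, g)


def evaluate(node, assignment):
    """Follow one path from the root to a terminal for a given assignment."""
    while isinstance(node, tuple):
        node = node[1] if assignment[node[0]] else node[2]
    return node


# f = (x0 AND x1) OR x2, built purely from logical operations on variables.
f = apply_op(operator.or_, apply_op(operator.and_, var(0), var(1)), var(2))
```

Taking the smaller of the two root orderings and leaving the other operand unchanged covers all three cases of (8) in one code path; the memoized `rec` plays the role of a computed table.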

2.2 Coverage Model

Fig. 2a shows the general structure of a fault-coverage model representing a recovery process [35], [36] initiated when a fault occurs. The entry point to the model signifies the occurrence of a fault, and the three exits $(R, S, C)$ signify the three possible outcomes:

. If the offending fault is transient and can be handled without discarding any components, then the transient restoration exit (R) is taken.

. If the fault is determined to be permanent, and the offending component is discarded, then the permanent fault-coverage exit (C) is taken.

. If the fault by itself causes a system to fail, then the single-point failure exit (S) is taken.

The exit probabilities $r'$, $c'$, $s'$ are required for the analysis of system reliability. The exits are a partitioning of the event space; thus, the three exit probabilities sum to one, i.e., $(c' + s') = (1 - r')$. The values of $r'$, $c'$, $s'$ can be determined by an appropriate fault-coverage model [36]; for more details, see [8], [37].

For the fault-coverage model, each component is always in one of three states, $x[i]$, $y[i]$, $z[i]$, which represent, in Fig. 2b, the states of component not failed, component failed and covered, and component failed and uncovered, respectively. To determine the system reliability (unreliability), it is required to have $a[i]$, $b[i]$, $c[i]$ that represent the probabilities of component $i$ associated with the three exits of the fault-coverage model, respectively. Fig. 2b shows the event space (and corresponding probability) representation of a component. Therefore,

$$\begin{aligned} a[i] &= \exp[-(1 - r_i')\,\lambda_i'\,t] \\ b[i] &= \frac{c_i'}{c_i' + s_i'}\,\bigl[1 - \exp[-(1 - r_i')\,\lambda_i'\,t]\bigr] \\ c[i] &= \frac{s_i'}{c_i' + s_i'}\,\bigl[1 - \exp[-(1 - r_i')\,\lambda_i'\,t]\bigr], \end{aligned} \tag{9}$$

where $(r_i', c_i', s_i')$ are the probabilities of taking the (transient restoration, permanent coverage, and single-point failure) exit, respectively, in the coverage model and $\lambda_i'$ is the fault occurrence rate of component $i$. It should be noted that the effective failure rate $\lambda_i$ and the effective coverage factor $c_i$ of component $i$ are

$$\begin{aligned} \lambda_i &\equiv (c_i' + s_i')\,\lambda_i' = (1 - r_i')\,\lambda_i' \\ c_i &\equiv c_i' / (c_i' + s_i'). \end{aligned} \tag{10}$$
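The three probabilities in (9) follow directly from the effective failure rate and coverage factor in (10). The sketch below shows one way to compute them; the function name `component_state_probs` and the numerical inputs are assumptions for illustration only.

```python
import math


def component_state_probs(r_exit, c_exit, s_exit, fault_rate, t):
    """Return (a, b, c) of Eq. (9) for one component at time t."""
    if abs(r_exit + c_exit + s_exit - 1.0) > 1e-9:
        raise ValueError("exit probabilities must sum to one")
    lam = (1.0 - r_exit) * fault_rate        # effective failure rate, Eq. (10)
    cov = c_exit / (c_exit + s_exit)         # effective coverage factor, Eq. (10)
    a = math.exp(-lam * t)                   # component has not failed
    b = cov * (1.0 - a)                      # failed and covered
    c = (1.0 - cov) * (1.0 - a)              # failed and uncovered
    return a, b, c


# Hypothetical inputs: r' = 0.5, c' = 0.45, s' = 0.05, 2e-3 faults per hour.
a, b, c = component_state_probs(0.5, 0.45, 0.05, fault_rate=2e-3, t=100.0)
```

Because $s_i'/(c_i' + s_i') = 1 - c_i$, the three values always sum to one, which is a quick sanity check on any input set.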

When a system consists of multiple components and each component is subject to imperfect fault-coverage, the reliability evaluation of such a system becomes more complicated. Amari et al. [35] proposed an efficient algorithm, the SEA, to calculate the reliability of a system that is composed of multiple components under the imperfect coverage model (IPCM). The basic idea is shown in (11) and could be easily proven [35] by using conditional probabilities or the total probability theorem.

$$\begin{aligned} \text{System Unreliability } (U_s(t)) = {} & \Pr\{\text{at least one uncovered failure}\} \cdot \Pr\{\text{system failure} \mid \text{at least one uncovered failure}\} \\ & + \Pr\{\text{no uncovered failure}\} \cdot \Pr\{\text{system failure} \mid \text{no uncovered failure}\}. \end{aligned} \tag{11}$$

Let $\Pr\{\text{no uncovered failure}\} = \prod_{i \in S} (a[i] + b[i]) = P_u(t)$; then $\Pr\{\text{at least one uncovered failure}\} = 1 - P_u(t)$. Also, let $\Pr\{\text{system failure} \mid \text{no uncovered failure}\} = U_s^c(t)$. Since $\Pr\{\text{system failure} \mid \text{at least one uncovered failure}\}$ is always equal to 1, we have

$$\begin{aligned} U_s(t) &= [1 - P_u(t)] + P_u(t) \cdot U_s^c(t) = 1 - P_u(t) \cdot R_s^c(t) \\ R_s(t) &= 1 - U_s(t) = P_u(t) \cdot R_s^c(t), \end{aligned} \tag{12}$$

where $R_s(t)$ is the system reliability and $R_s^c(t)$ is $\Pr\{\text{system success} \mid \text{no uncovered failure}\}$. $R_s^c(t)$ can be derived by using the conditional reliability instead of the reliability of each component during the evaluation of the conditional system reliability. The correctness and the proof are given in [35]. Therefore, the time complexity of (12) will be equivalent to that of the method under perfect coverage model (PCM).
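The separation in (12) can be checked numerically on a small system. The sketch below is an illustration only: it assumes a hypothetical 2-out-of-3 structure, made-up $(a[i], b[i], c[i])$ triples, and uses plain enumeration in place of a real perfect-coverage solver. `sea_reliability` applies (12), while `three_state_reliability` enumerates all three-state combinations directly.

```python
from itertools import product


def sea_reliability(components, works):
    """Eq. (12): R_s = P_u * R_s^c for components given as (a, b, c) triples."""
    p_u = 1.0
    cond_rel = []
    for a, b, _c in components:
        p_u *= a + b                     # Pr{no uncovered failure of component}
        cond_rel.append(a / (a + b))     # Pr{working | no uncovered failure}
    # Perfect-coverage evaluation with the conditional reliabilities.
    r_c = 0.0
    for up in product((True, False), repeat=len(components)):
        if works(up):
            prob = 1.0
            for is_up, r in zip(up, cond_rel):
                prob *= r if is_up else 1.0 - r
            r_c += prob
    return p_u * r_c


def three_state_reliability(components, works):
    """Reference value: enumerate states 0 (up), 1 (covered), 2 (uncovered)."""
    total = 0.0
    for states in product(range(3), repeat=len(components)):
        if 2 in states:                  # any uncovered failure fails the system
            continue
        if works(tuple(s == 0 for s in states)):
            prob = 1.0
            for s, triple in zip(states, components):
                prob *= triple[s]
            total += prob
    return total


def two_of_three(up):
    """Hypothetical structure function: at least two components working."""
    return sum(up) >= 2


# Made-up (a, b, c) triples; each sums to one.
components = [(0.90, 0.08, 0.02), (0.85, 0.12, 0.03), (0.95, 0.04, 0.01)]
r_sea = sea_reliability(components, two_of_three)
r_ref = three_state_reliability(components, two_of_three)
```

Both routes give the same value; the point of (12) is that the inner evaluation sees only two states per component, so any perfect-coverage algorithm can be reused unchanged.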

Fig. 1. The OBDD generated from a Boolean equation.

Fig. 2. (a) General structure of a fault-coverage model. (b) The event and probability space of component i.


3 MULTISTATE COVERAGE MODEL

3.1 Multistate Systems with Imperfect Coverage

In this section, we will extend the three-state model as depicted in Section 2 into a multistate imperfect fault-coverage model. Assume there are $n$ modules in a system, module $i$ has $m_i$ states $(i = 1, \ldots, n)$, and a system has $H$ different levels of output/performance. Depending upon the performance/capacity, we can arrange the states such that state $m_i$ is a perfect state and state 1 is a failed state (the performance level decreases from state $m_i$ to state 1). The ordering is not a constraint, but it helps our algorithm to include the existing algorithms that are for a multistate coherent system under the perfect coverage model (PCM). However, for the imperfect coverage model (IPCM), each module will have an extra state, i.e., the total number of states in module $i$ becomes $m_i + 1$. Here, state 0 is the state corresponding to the uncovered failure of module $i$. Fig. 3 shows the event and probability space of the multistate module $i$.

Assumption. The uncovered failure of any module within a system causes immediate uncovered failure of the system.

State Representation

state H: perfect state of a system (highest performance level of a system)

state h: state of a system at performance level h

state mi: perfect state of module i (highest performance level of module i)

state j: state of a module at performance level j

state 1: state of a system/module at zero performance level (state of system/module failed and covered)

state 0: state of system/module failed and uncovered

Notation

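$x_i$, $x_i^I$: indicator variable of the state of module $i$ under [PCM, IPCM]; $x_i = l$ means module $i$ under PCM is in state $l$.

$x_{i:j}$, $x_{i:j}^I$: represents that module $i$ under [PCM, IPCM] is at performance level $j$ or above, i.e., $x_{i:j}$, $x_{i:j}^I$ are equivalent to $x_i \ge j$, $x_i^I \ge j$.

$P_i(t, j) \equiv \Pr\{\text{module } i \text{ under PCM is in state } j \text{ at time } t\}$

$P_i^I(t, j) \equiv \Pr\{\text{module } i \text{ under IPCM is in state } j \text{ at time } t\}$

$P_i^c(t, j) \equiv \Pr\{\text{module } i \text{ under IPCM is in state } j \text{ at time } t \mid \text{no uncovered failure in module } i\}$

$R_i(t, j) \equiv \Pr\{\text{module } i \text{ under PCM is in state} \ge j \text{ at time } t\} = \sum_{k=j}^{m_i} P_i(t, k) = \Pr\{x_{i:j}\}$

$R_i^I(t, j) \equiv \Pr\{\text{module } i \text{ under IPCM is in state} \ge j \text{ at time } t\} = \sum_{k=j}^{m_i} P_i^I(t, k) = \Pr\{x_{i:j}^I\}$

$R_i^c(t, j) \equiv \Pr\{\text{module } i \text{ under IPCM is in state} \ge j \text{ at time } t \mid \text{no uncovered failure in module } i\}$

$P_s(t, h) \equiv \Pr\{\text{system under PCM is in state } h \text{ at time } t\}$

$P_s^I(t, h) \equiv \Pr\{\text{system under IPCM is in state } h \text{ at time } t\}$

$P_s^c(t, h) \equiv \Pr\{\text{system under IPCM is in state } h \text{ at time } t \mid \text{no uncovered failure in the system}\}$

$R_s(t, h) \equiv \Pr\{\text{system under PCM is in state} \ge h \text{ at time } t\} = \sum_{k=h}^{H} P_s(t, k)$

$R_s^I(t, h) \equiv \Pr\{\text{system under IPCM is in state} \ge h \text{ at time } t\} = \sum_{k=h}^{H} P_s^I(t, k)$

$R_s^c(t, h) \equiv \Pr\{\text{system under IPCM is in state} \ge h \text{ at time } t \mid \text{no uncovered failure in the system}\}$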

Fig. 4 shows a combination of performance requirements of a multistate system for being operational at performance level $h$ or above. It includes three subrequirements $T_1$, $T_2$, and $T_3$. $x_{i:j}$ is the basic event of subrequirements and represents the minimum performance requirement of module $i$. That is, $x_{i:j}$ means module $i$ needs to be operational at performance level $j$ or above. For example, in Fig. 4, the system will be at performance level $h$ or above if every module $i$ $(i = 1, 2, 3)$ is operational at level 2 or above $(T_1)$, or if any module $i$ $(i = 1, 2, 3)$ is operational at level 3 or above $(T_2)$, or if module 1 is operational at level 4 or above, or both module 2 and module 3 are operational at level 4 or above $(T_3)$.

There exist several methods to solve the problem of reliability evaluation of a multistate system that includes various combinations of subrequirements. However, when there exists imperfect coverage, the entire problem has so far had to be solved using Markov chains, even when the system requirements can be represented as a combination of modular subrequirements. For large complex systems, Markov models are not suitable because of the well-known exponential state explosion problem. Therefore, a reasonable method is needed to solve MSS reliability under IPCM. In the next section, we will propose an efficient method for evaluating the reliability of a multistate system subject to imperfect fault-coverage. The computational complexity of the method is equivalent to that of the method under PCM.

Fig. 3. The probability space of multistate module i.

Fig. 4. The combination of performance requirements of a system being operational at performance level h or above.

3.2 Reliability/Availability Evaluation of a Multistate System Subject to Imperfect Fault-Coverage

In this section, an algorithm for evaluating the reliability of a multistate system subject to imperfect coverage is proposed. Based on (12), the MSS with IPCM can be solved using the corresponding MSS under PCM.

$$\text{System Reliability } (R_s^I(t, h)) = \Pr\{\text{no uncovered failure in system}\} \cdot \Pr\{\text{system success} \mid \text{no uncovered failure in system}\}, \tag{13}$$

where

$$\Pr\{\text{no uncovered failure in system}\}\ (P_u(t)) = \prod_{i \in S} \Pr\{\text{no uncovered failure of module } i\} = \prod_{i \in S} \bigl(1 - P_i^I(t, 0)\bigr) = \prod_{i \in S} R_i^I(t, 1). \tag{14}$$

Further,

$$R_i^c(t, j) = \Pr\{x_i^I \ge j \mid x_i^I > 0\} = R_i^I(t, j) / R_i^I(t, 1). \tag{15}$$

This probability represents the conditional probability that module $i$ under IPCM is in state $j$ (i.e., performance level $j$) or above. Therefore, Pr{system success | no uncovered failures in system} can be obtained by substituting $R_i^c(t, j)$ for $R_i(t, j)$ in the corresponding PCM. Hence, the system reliability subject to imperfect coverage will be

$$R_s^I(t, h) = P_u(t) \cdot \bigl[\text{Reliability under PCM with } R_i(t, j) = R_i^c(t, j),\ P_i(t, j) = P_i^c(t, j)\bigr]. \tag{16}$$

It should be noted that the same algorithm is applicable for availability evaluation, but, in this case, the input-set to the algorithm should be derived using component availability models. Moreover, the state probabilities of MSS under IPCM can be found as follows:

$$\begin{aligned} P_s^I(t, h) &= P_u(t) \cdot P_s^c(t, h) \\ P_s^c(t, h) &= \bigl[P_s(t, h) \text{ of PCM with } P_i(t, j) = P_i^c(t, j)\bigr] \\ & 0 < h \le H;\quad 0 < j \le m_i. \end{aligned} \tag{17}$$

Therefore, the procedure of the proposed algorithm is as follows:

1. Read the state probabilities of all modules, i.e., $P_i^I(t, j)$ for $i = 1, \ldots, n$; $j = 0, 1, \ldots, m_i$.

2. For all $i$, find $R_i^I(t, 1) \equiv 1 - P_i^I(t, 0)$.

3. Find $P_u(t) = \prod_{i=1}^{n} R_i^I(t, 1)$.

4. Find the conditional probability of each module at every level:

$$\begin{aligned} R_i^c(t, j) &= R_i^I(t, j) / R_i^I(t, 1) \\ P_i^c(t, j) &= P_i^I(t, j) / R_i^I(t, 1). \end{aligned} \tag{18}$$

5. Solve modular structures using the modularization method and deal with the dependency problem in the probability calculation using the OBDD method. Then, use these conditional probabilities to find the system reliability/availability (or the probability of a system state) at the required performance level $h$ of the corresponding MSS under PCM and solve the generic problems as follows:

. Conditional reliability $R_s^c(t, h)$: Find the conditional reliability of a MSS under PCM by substituting either $P_i^c(t, j)$ for $P_i(t, j)$ or $R_i^c(t, j)$ for $R_i(t, j)$.

. Conditional availability $R_s^c(t, h)$: Find the conditional availability of a MSS under PCM by substituting either $P_i^c(t, j)$ for $P_i(t, j)$ or $R_i^c(t, j)$ for $R_i(t, j)$ (here, $R_i^c(t, j)$ and $R_i(t, j)$ are the corresponding availabilities).

. Conditional system state probability $P_s^c(t, h)$: Find the conditional probability of the state of a MSS with respect to PCM by substituting either $P_i^c(t, j)$ for $P_i(t, j)$ or $R_i^c(t, j)$ for $R_i(t, j)$.

6. Find the multistate system reliability/availability (or the probability of a system state) under IPCM using:

$$\begin{aligned} R_s^I(t, h) &= P_u(t) \cdot R_s^c(t, h) \\ P_s^I(t, h) &= P_u(t) \cdot P_s^c(t, h). \end{aligned} \tag{19}$$

Consider that, if a problem subject to imperfect fault-coverage is solved using a general multistate algorithm such as the Markov method, the computational time is proportional to $O((m+1)^{3n})$, where $n$ is the number of modules in the system and each module has $m + 1$ states including the imperfect coverage state. Furthermore, in this case, it is very difficult to combine with the modularization methods to solve the problem. In our method, only the $m$ (instead of $m + 1$) state probabilities are manipulated with the conditional probability method under the imperfect fault-coverage model. Therefore, the percentage reduction (reduction factor) by using our method is approximately $1 - (m/(m+1))^{3n}$. That means the reduction increases with $n$ and decreases with $m$. However, this is a worst-case situation without any modular structure in a system. In our method, the advantage of using conditional probabilities makes it possible to apply this method for modular structures. If there exist some modular structures in the system, the modularization methods can be combined and $n$ will be reduced. In general, our method is much faster than existing methods.
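The six steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Step 5 is replaced by plain enumeration instead of the modularization and OBDD methods, the system requirement is a hypothetical predicate `meets_level`, and all probabilities are made up. `direct_reliability` enumerates the full state space (including state 0) as a cross-check.

```python
from itertools import product


def ipcm_reliability(module_probs, meets_level):
    """Steps 1-6: module_probs[i][j] is P_i^I(t, j) for j = 0..m_i."""
    # Steps 2 and 3: R_i^I(t, 1) = 1 - P_i^I(t, 0) and P_u(t) as their product.
    r_one = [1.0 - p[0] for p in module_probs]
    p_u = 1.0
    for r in r_one:
        p_u *= r
    # Step 4: conditional state probabilities P_i^c(t, j), j = 1..m_i, Eq. (18).
    cond = [[pj / r for pj in p[1:]] for p, r in zip(module_probs, r_one)]
    # Step 5 (stand-in): perfect-coverage evaluation by enumeration.
    r_c = 0.0
    for states in product(*[range(1, len(c) + 1) for c in cond]):
        if meets_level(states):
            prob = 1.0
            for c, s in zip(cond, states):
                prob *= c[s - 1]
            r_c += prob
    # Step 6: Eq. (19).
    return p_u * r_c


def direct_reliability(module_probs, meets_level):
    """Reference: enumerate all m_i + 1 states; any state 0 fails the system."""
    total = 0.0
    for states in product(*[range(len(p)) for p in module_probs]):
        if 0 in states:
            continue
        if meets_level(states):
            prob = 1.0
            for p, s in zip(module_probs, states):
                prob *= p[s]
            total += prob
    return total


def meets_level(states):
    """Hypothetical requirement: the two module levels add up to at least 5."""
    return states[0] + states[1] >= 5


# Made-up P_i^I(t, j) for two modules with states 0..3; each row sums to one.
module_probs = [[0.02, 0.08, 0.30, 0.60], [0.01, 0.09, 0.20, 0.70]]
r_ipcm = ipcm_reliability(module_probs, meets_level)
r_check = direct_reliability(module_probs, meets_level)
```

The two results agree because dividing each module's probabilities by $R_i^I(t, 1)$ in Step 4 is exactly undone by the multiplication with $P_u(t)$ in Step 6.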

In Fig. 4, from the definition of MSS [6], the probability of event $x_{i:j}$ is

$$\Pr\{x_{i:j}\} = \sum_{k=j}^{m_i} P_i(t, k) = R_i(t, j). \tag{20}$$

For the calculation of the multistate system reliability, there exists dependency between $\Pr\{x_{3:4}\}$ and $\Pr\{x_{3:3}\}$ since “module 3 is operational at level 4 or above” implies “module 3 must be operational at level 3 or above.” In Step 5 of our algorithm, we need not only to construct the OBDD representing the combination of performance requirements, but also to deal with the dependency problem in the probability calculation of the OBDD. The algorithm we adopt here is the Multistate Dependency Operation (MDO) algorithm in [18], whose idea is derived from [38] and which gives good results. Next, we combine it with our idea as described in Step 5 to solve the problem under IPCM. Fig. 5 shows the OBDD representations of the three subrequirements of Fig. 4. The resultant OBDD and the probability calculation applied by MDO are shown in Fig. 6. It should be noted that $x_{3:4}$ is automatically eliminated during MDO. This is because, from the subrequirement $T_2$ in Fig. 4, when module 3 is operational at performance level 3 or above, the system already meets the requirement of being at performance level $h$ or above. Hence, we do not need to consider whether module 3 is operational at performance level 4 or above in subrequirement $T_3$, i.e., the node $x_{3:4}$ disappears. For the detailed MDO operations and the probability calculations, please refer to [18].

However, for the condition under IPCM in Step 5, the procedure of constructing a multistate OBDD is the same as in the MDO, but the reliability calculation for IPCM needs some modifications. We first find the conditional probability $\Pr\{x'_{i:j}\}$ of each multistate module $i$. Then, we use $\Pr\{x'_{i:j}\}$ instead of the probability or reliability of module $i$ to calculate the conditional multistate system reliability $R^c_{MSS}$. Therefore, we combine our algorithm with the MDO and get the following: If $G = \mathrm{ite}(x_{i:j}, G_1, G_0)$, $G_0 = \mathrm{ite}(Z, H_1, H_0)$, and the order of node $x_{i:j}$ is smaller than that of node $Z$, the probability of $G$ is

$$\Pr\{G\} = \begin{cases} \Pr\{x'_{i:j}\}\,[\Pr\{G_1\} - \Pr\{G_0\}] + \Pr\{G_0\} & (\text{if } x_{i:j} \text{ and } Z \text{ belong to different modules}) \\ \Pr\{x'_{i:j}\}\,[\Pr\{G_1\} - \Pr\{H_1\}] + \Pr\{G_0\} & (\text{if } x_{i:j} \text{ and } Z \text{ belong to the same module}), \end{cases} \tag{21}$$

where $\Pr\{x'_{i:j}\}$ is the conditional reliability of module $i$ being operational at performance level $j$ or above given that no uncovered failure occurred in that module (or module $i$). Therefore, the probability of the OBDD’s root node, which represents the conditional multistate system reliability $R^c_{MSS}(t, h)$, is obtained by (21). $P_u(t)$ is derived from (14). Hence, we get the multistate system reliability $R^I_{MSS}(t, h)$ under IPCM by

$$R^I_{MSS}(t, h) = P_u(t) \cdot R^c_{MSS}(t, h). \tag{22}$$
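A small sketch of the two cases in (21) is given below. It is not the MDO implementation of [18]: the node layout `(module, level, high, low)`, the helper names, the example diagram, and all numbers are assumptions for illustration. The sketch also assumes that the diagram is already reduced as MDO would leave it, so that nodes of one module are adjacent in the ordering with higher levels first, and no lower-level node of the same module appears on a 1-branch.

```python
from itertools import product


def reliabilities(cond_probs):
    """R^c(j) = sum of P^c(k) for k >= j, returned as a dict keyed by j."""
    return {j: sum(cond_probs[j - 1:]) for j in range(1, len(cond_probs) + 1)}


def node_prob(node, rel):
    """Eq. (21) applied bottom-up; terminals are the bools True and False."""
    if isinstance(node, bool):
        return 1.0 if node else 0.0
    module, level, high, low = node
    p = rel[module][level]               # Pr{x'_{i:j}}
    p_high = node_prob(high, rel)        # Pr{G1}
    p_low = node_prob(low, rel)          # Pr{G0}
    if isinstance(low, tuple) and low[0] == module:
        # Same module: subtract Pr{H1}, the 1-branch of G0.
        return p * (p_high - node_prob(low[2], rel)) + p_low
    return p * (p_high - p_low) + p_low  # different modules


def holds(node, states):
    """Evaluate the diagram for one assignment {module: state}."""
    while isinstance(node, tuple):
        module, level, high, low = node
        node = high if states[module] >= level else low
    return node


def brute_force(node, cond):
    """Reference value by enumerating every combination of module states."""
    modules = sorted(cond)
    total = 0.0
    for combo in product(*[range(1, len(cond[m]) + 1) for m in modules]):
        states = dict(zip(modules, combo))
        if holds(node, states):
            weight = 1.0
            for m in modules:
                weight *= cond[m][states[m] - 1]
            total += weight
    return total


# Made-up conditional state probabilities P_i^c(t, j), j = 1..m_i.
cond = {1: [0.1, 0.3, 0.6], 2: [0.2, 0.8]}
rel = {m: reliabilities(p) for m, p in cond.items()}

# Hypothetical requirement: x_{1:3} OR (x_{1:2} AND x_{2:2}).
x22 = (2, 2, True, False)
diagram = (1, 3, True, (1, 2, x22, False))

r_c = node_prob(diagram, rel)            # conditional reliability, Eq. (21)
p_u = 0.95 * 0.98                        # hypothetical P_u(t) from Eq. (14)
r_ipcm = p_u * r_c                       # Eq. (22)
```

For this diagram the same-module case applies at the root, because its 0-branch starts with another node of module 1; the result matches the enumeration, whereas treating $x_{1:3}$ and $x_{1:2}$ as independent events would not.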

4 THE GRIFFITH’S IMPORTANCE MEASURE

In this section, we will first discuss the importance measure, also called the sensitivity analysis, of a multistate system under PCM. An OBDD-based method is proposed to evaluate the Griffith importance vector of a multistate system under PCM. With a little modification, the method is also applicable for computing the Griffith importance vector under imperfect fault-coverage model.

In [31], the Griffith importance vector, $I^G(i)$, was proposed to study how the particular states of a module contribute to a multistate system, and how the presence of a module and a particular state of a module affect the contributions of other modules in the system:

$$\begin{aligned} I^G(i) &= \bigl(I_1^G(i), \ldots, I_{m_i}^G(i)\bigr) \\ I_j^G(i) &= \sum_{h=1}^{H} (F_h - F_{h-1}) \cdot \bigl[R_{s|x_i=j}(t, h) - R_{s|x_i=j-1}(t, h)\bigr] \\ &= \sum_{h=1}^{H} (F_h - F_{h-1}) \cdot \nabla_{i:j}(R_s(t, h)), \quad j = 1, \ldots, m_i, \end{aligned} \tag{23}$$

where $F_h$ is the value of output/performance level when the multistate system is in state $h$, $H$ is the index of maximum performance level of the system, $1 \le h \le H$, $R_{s|x_i=j}(t, h)$ is the reliability of a system being at performance level $h$ or above given that module $i$ is at performance level $j$, and $\nabla_{i:j}(R_s(t, h)) = R_{s|x_i=j}(t, h) - R_{s|x_i=j-1}(t, h)$.

The $I_j^G(i)$ in Griffith’s importance vector can be interpreted as the range of the system performance when module $i$ moves from state $j$ to state $j - 1$; it can be regarded as the importance of state $j$ of system-module $i$. If the Boolean expression of $R_s(t, h)$ is transformed into the OBDD representation $BDD_h$ as described in Section 3, then

$$\begin{aligned} \nabla_{i:j}(R_s(t, h)) &= \Pr\{BDD_h|_{x_i=j}\} - \Pr\{BDD_h|_{x_i=j-1}\} \\ &= \Pr\{BDD_h|_{(x_{i:1}, \ldots, x_{i:j})=1,\ (x_{i:j+1}, \ldots, x_{i:m_i})=0}\} \\ &\quad - \Pr\{BDD_h|_{(x_{i:1}, \ldots, x_{i:j-1})=1,\ (x_{i:j}, \ldots, x_{i:m_i})=0}\}. \end{aligned} \tag{24}$$

4.1 Two-Pass Traversal Method

This method needs to traverse the OBDD twice to obtain $\nabla_{i:j}(R_s(t, h))$.

. Find

$$\Pr\{BDD_h|_{x_i=j}\} = \Pr\{BDD_h|_{(x_{i:1}, \ldots, x_{i:j})=1,\ (x_{i:j+1}, \ldots, x_{i:m_i})=0}\}$$

by assuming that module $i$ is running at level $j$, i.e., $(x_{i:1}, \ldots, x_{i:j}) = 1$, $(x_{i:j+1}, \ldots, x_{i:m_i}) = 0$.

. Find

$$\Pr\{BDD_h|_{x_i=j-1}\} = \Pr\{BDD_h|_{(x_{i:1}, \ldots, x_{i:j-1})=1,\ (x_{i:j}, \ldots, x_{i:m_i})=0}\}$$

by assuming that module $i$ is running at level $j - 1$, i.e., $(x_{i:1}, \ldots, x_{i:j-1}) = 1$, $(x_{i:j}, \ldots, x_{i:m_i}) = 0$.

Fig. 5. The OBDD of subrequirement tree T1, T2, and T3.

Fig. 6. The OBDD and the reliability evaluation of the system under PCM in Fig. 4.

Then, $\nabla_{i:j}(R_s(t, h)) = \Pr\{BDD_h|_{x_i=j}\} - \Pr\{BDD_h|_{x_i=j-1}\}$ can be obtained and, therefore, we can derive the Griffith’s importance vector by (23) and (24).
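The two evaluations and the sum in (23) can be sketched as below. This is only an illustration of the definition: it conditions on the state of module $i$ by enumeration rather than by traversing an OBDD, and the structure function `phi`, the performance values, and the probabilities are all hypothetical. Module indices start at 0 in the code, and only $j \ge 2$ is handled because state 0 does not exist under PCM.

```python
from itertools import product


def conditional_reliability(probs, phi, h, i, j):
    """Pr{system level >= h | module i is in state j} under PCM."""
    others = [k for k in range(len(probs)) if k != i]
    total = 0.0
    for combo in product(*[range(1, len(probs[k]) + 1) for k in others]):
        states = [0] * len(probs)
        states[i] = j
        weight = 1.0
        for k, s in zip(others, combo):
            states[k] = s
            weight *= probs[k][s - 1]
        if phi(states) >= h:
            total += weight
    return total


def griffith_component(probs, phi, perf, i, j):
    """I_j^G(i) of Eq. (23) for j >= 2, where perf[h] is F_h."""
    total = 0.0
    for h in range(1, len(perf)):
        # First pass: module i fixed at level j; second pass: level j - 1.
        delta = (conditional_reliability(probs, phi, h, i, j)
                 - conditional_reliability(probs, phi, h, i, j - 1))
        total += (perf[h] - perf[h - 1]) * delta
    return total


def phi(states):
    """Hypothetical structure: the system level is the weakest module level."""
    return min(states)


# Made-up state probabilities P_i(t, j), j = 1..3, for two modules.
probs = [[0.1, 0.3, 0.6], [0.2, 0.3, 0.5]]
# Hypothetical performance values F_0..F_3 (state 1 is zero performance).
perf = [0, 0, 50, 100]

i3 = griffith_component(probs, phi, perf, i=0, j=3)
i2 = griffith_component(probs, phi, perf, i=0, j=2)
```

With these numbers, each component equals the drop in expected system performance when the first module moves down one state, which is the interpretation given for $I_j^G(i)$ above.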

4.2 Single-Pass Traversal Method

This method needs to traverse the OBDD only once to get $\nabla_{i:j}(R_s(t, h))$. There are two parts. First, to evaluate $\nabla_{i:j}(R_s(t, h))$ from (24), we found that the only difference between computing $\Pr\{BDD_h|_{x_i=j}\}$ and $\Pr\{BDD_h|_{x_i=j-1}\}$ is to let $x_{i:j} = 1$ and $x_{i:j} = 0$ in the OBDD traversal, respectively. Therefore, only the disjoint paths of the OBDD that go from the root to the terminal one and include node $x_{i:j}$ contribute to the importance measure. We should delete the disjoint paths that do not include node $x_{i:j}$, or let the probabilities of those paths be 0 during the OBDD traversal.

In the second part, we can combine the two calculations of Section 4.1 at node $x_{i:j}$. When we visit node $x_{i:j}$ in the OBDD traversal, the probability of the node is equivalent to the probability of the right subtree (i.e., $x_{i:j} = 1$) minus the probability of the left subtree (i.e., $x_{i:j} = 0$). Further, let $(x_{i:1}, \ldots, x_{i:j-1}) = 1$ and $(x_{i:j+1}, \ldots, x_{i:m_i}) = 0$ since module $i$ moves only from state $j$ to state $j - 1$ and does not affect the other states. Let the Boolean expression of the OBDD at node $v$ be $f_v = \mathrm{ite}(v, f_{v=1}, f_{v=0})$, let $\Pr\{v\}$ be the probability that the Boolean variable corresponding to node $v$ is true, let node $u$ be the subnode of node $v$, and let the Boolean expression of the OBDD at node $u$ be $f_u = \mathrm{ite}(u, f_{u=1}, f_{u=0})$. Then the single-pass traversal algorithm to evaluate $\nabla_{i:j}(R_s(t, h))$ from (24) is as follows:

Compute the probability $\Pr\{f_v\}$ of each node (say $v$) from the bottom to the top of the OBDD using the following steps:

1. If node $v$ has been visited, then retrieve $\Pr\{f_v\}$ from the computed table and go to Step 8.

2. If node $v$ corresponds to $x_{i:1}, \ldots, x_{i:j-1}$, then let $\Pr\{v\} = 1$ (i.e., $(x_{i:1}, \ldots, x_{i:j-1}) = 1$). Go to Step 5.

3. If node $v$ corresponds to $x_{i:j+1}, \ldots, x_{i:m_i}$, then let $\Pr\{v\} = 0$ (i.e., $(x_{i:j+1}, \ldots, x_{i:m_i}) = 0$). Go to Step 6.

4. If node $v$ corresponds to $x_{i:j}$, then

$$\Pr\{f_v\} = \begin{cases} \Pr\{f_{v=1}\} - \Pr\{f_{v=0}\} & (\text{if } v \text{ and } u \text{ belong to different modules}) \\ \Pr\{f_{v=1}\} - \Pr\{f_{u=1}\} & (\text{if } v \text{ and } u \text{ belong to the same module}). \end{cases}$$

5. If node $v$ does not correspond to $x_{i:j}$ and $\mathrm{ordering}(v) > \mathrm{ordering}(x_{i:j})$, then

$$\Pr\{f_v\} = \begin{cases} \Pr\{v\}\,[\Pr\{f_{v=1}\} - \Pr\{f_{v=0}\}] + \Pr\{f_{v=0}\} & (\text{if } v \text{ and } u \text{ belong to different modules}) \\ \Pr\{v\}\,[\Pr\{f_{v=1}\} - \Pr\{f_{u=1}\}] + \Pr\{f_{v=0}\} & (\text{if } v \text{ and } u \text{ belong to the same module}). \end{cases} \tag{25}$$

6. If node $v$ does not correspond to $x_{i:j}$ and $\mathrm{ordering}(v) < \mathrm{ordering}(x_{i:j})$, then also use (25) to calculate $\Pr\{f_v\}$, except that $\Pr\{f_{v=1}\}$ or $\Pr\{f_{v=0}\}$ is set to 0 in (25) if the right or left subtree is independent of $x_{i:j}$, respectively. Checking whether the subtree of node $v$ is independent of $x_{i:j}$ is simple: if $\mathrm{ordering}(u) > \mathrm{ordering}(x_{i:j})$, then the subtree is independent of $x_{i:j}$.

7. Record node $v$ and $\Pr\{f_v\}$ into the computed table.

8. Visit next node.

Therefore, the probability Prffvg of the root node (top node) of the OBDD represents ri:jðRsðt; hÞÞ and can be efficiently obtained by the single-pass OBDD traversal. The pseudo code of the algorithm for computing the Griffith’s importance measure is shown in Fig. 7. It should be noted that the computed table recording the probabilities of the OBDD nodes is also used to avoid repeated computations. This scheme improves the efficiency of the algorithm.

4.3 Griffith’s Importance Measure under Imperfect Coverage Model

In this section, we will discuss the Griffith’s importance measure under imperfect coverage model and propose a method to compute ri:jRIsðt; hÞÞ. For the imperfect fault-coverage model, from (14), the probability of no uncovered failure in the system is

Fig. 7. The pseudo code of the algorithm for computing the Griffith’s importance measure.

(8)

PuðtÞ ¼ Yn v¼1

RI

vðt; 1Þ ð26Þ

and the conditional state probability of module i given that module i is not in failed and uncovered state is

P_icðt; jÞ ¼ P I iðt; jÞ PI iðt; 1Þ þ þ PiIðt; miÞ : ð27Þ

For evaluating the Griffith’s importance measure under the imperfect coverage model, by (16), we have

$$\nabla_{i.j}(R^I_s(t,h)) = \nabla_{i.j}(P_u(t) \cdot R^c_s(t,h)) = R^c_s(t,h)\,\nabla_{i.j}(P_u(t)) + P_u(t)\,\nabla_{i.j}(R^c_s(t,h)), \tag{28}$$

where

$$\nabla_{i.j}(P_u(t)) = \prod_{v=1,\, v \neq i}^{n} R^I_v(t,1)\,\bigl[P^I_i(t,1) + \cdots + P^I_i(t,j-1) + P^I_i(t,j+1) + \cdots + P^I_i(t,m_i)\bigr] \tag{29}$$

and, by the chain rule of differentiation,

$$\nabla_{i.j}(R^c_s(t,h)) = \frac{\partial R^c_s(t,h)}{\partial P^c_i(t,j)} \cdot \frac{\partial P^c_i(t,j)}{\partial P^I_i(t,j)},$$

where, applying the quotient rule to (27),

$$\frac{\partial P^c_i(t,j)}{\partial P^I_i(t,j)} = \frac{\bigl(P^I_i(t,1) + \cdots + P^I_i(t,m_i)\bigr) - P^I_i(t,j)}{\bigl(P^I_i(t,1) + \cdots + P^I_i(t,m_i)\bigr)^2} = \frac{1}{P^I_i(t,1) + \cdots + P^I_i(t,m_i)}\,[1 - P^c_i(t,j)] = \frac{1}{R^I_i(t,1)}\,[1 - P^c_i(t,j)]. \tag{30}$$

Therefore,

$$\nabla_{i.j}(R^I_s(t,h)) = R^c_s(t,h) \prod_{v=1,\, v \neq i}^{n} R^I_v(t,1)\,\bigl[P^I_i(t,1) + \cdots + P^I_i(t,j-1) + P^I_i(t,j+1) + \cdots + P^I_i(t,m_i)\bigr] + \frac{P_u(t)}{R^I_i(t,1)}\,[1 - P^c_i(t,j)]\,\frac{\partial R^c_s(t,h)}{\partial P^c_i(t,j)}. \tag{31}$$

In order to derive $\nabla_{i.j}(R^I_s(t,h))$, we need to find only $\partial R^c_s(t,h)/\partial P^c_i(t,j)$ because all the other parameters are already known. Finding $\partial R^c_s(t,h)/\partial P^c_i(t,j)$ is equivalent to finding $\nabla_{i.j}(R_s(t,h))$ under PCM as described in Section 4.2, but with the modified values of the module’s state reliabilities, that is, the conditional state reliability of module $i$, $P^c_i(t,j)$. Then, $\nabla_{i.j}(R^I_s(t,h))$ can be evaluated and, therefore, the Griffith’s importance vectors under IPCM can be derived.
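The factor $\partial P^c_i(t,j)/\partial P^I_i(t,j)$ that enters the chain rule can be sanity-checked numerically. The sketch below is an illustration only: the function names and state probabilities are hypothetical, and the closed form it tests is the quotient-rule result $(1 - P^c_i(t,j))/R^I_i(t,1)$ obtained by differentiating (27) directly. It compares that closed form with a central finite difference.

```python
def conditional_prob(states, j):
    """P_i^c(t, j) from (27); states[k - 1] holds P_i^I(t, k) for k = 1..m_i."""
    return states[j - 1] / sum(states)


def analytic_derivative(states, j):
    """Quotient-rule form: (1 - P_i^c(t, j)) / R_i^I(t, 1)."""
    return (1.0 - conditional_prob(states, j)) / sum(states)


def numeric_derivative(states, j, eps=1e-6):
    """Central finite difference of P_i^c(t, j) with respect to P_i^I(t, j)."""
    up = list(states)
    down = list(states)
    up[j - 1] += eps
    down[j - 1] -= eps
    return (conditional_prob(up, j) - conditional_prob(down, j)) / (2 * eps)


# Hypothetical covered-state probabilities of one module (uncovered mass is 0.05).
states = [0.10, 0.25, 0.60]
for j in (1, 2, 3):
    print(j, analytic_derivative(states, j), numeric_derivative(states, j))
```

The two columns agree to within the finite-difference error for every state $j$, which is a quick way to confirm the sign and the normalizing factor before the term is combined with $\partial R^c_s(t,h)/\partial P^c_i(t,j)$.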

5 ILLUSTRATIVE EXAMPLES

Example 1. Let us consider an example of a Fault-Tolerant Parallel Processor (FTPP) in [39]. Fig. 8 is an instance of an FTPP cluster that consists of 16 processing elements (PEs), with four connected to each of four network elements (NEs). This configuration divides the active elements of a triad among NE1, NE2, and NE3, and uses the PEs on NE4 as spares. The PEs that are in the same relative position on the first three NEs form a triad, and the PE in the same relative position on NE4 serves as a hot spare for the triad.

The fault-tree model (Fig. 9) for this configuration uses four functional-dependency gates (FDEP) [39] to reflect the dependence of the PEs on the NEs. The FDEP has one trigger input and one or more dependent basic events. In an FDEP, the dependent basic events are functionally dependent on the trigger event. When the trigger event occurs, the dependent basic events are forced to occur. The FDEP gates are not explicitly connected to the other gates in the tree since the reliability requirements (all four triads must be operational) do not explicitly mention the NEs. Fig. 9 shows four 3/4 gates connected to the top OR gate, one 3/4 gate for each triad. A triad fails when only one member PE remains (three of the four PEs in the triad have failed).

Fig. 8. An FTPP cluster with one spare per triad.

Let us consider this FTPP as a multistate system subject to imperfect coverage. NE1 is a trigger event for an FDEP gate. The system fails if NE1 fails in uncovered mode. If its failure is covered, then none of the components connected to NE1 work properly in the FTPP cluster. In addition, if NE1 has not failed, then the components connected to NE1 may act independently; that is, they can fail in covered or uncovered mode or can function properly. Obviously, the system performs better when more NEs are functioning. Therefore, if we consider NE1, NE2, NE3, and NE4 as the key roles of system performance, there will be $2^4 + 1 = 17$ states:

state 16: all NEs are good. $P^I_s(t,16) = (p_1 \cdot p_2 \cdot p_3 \cdot p_4) \cdot (1 - U(t,16))$

state 15: NE1 failure is covered; others are good. $P^I_s(t,15) = ((q_1 c_1) \cdot p_2 \cdot p_3 \cdot p_4) \cdot (1 - U(t,15))$

$\vdots$

state 11: NE1 and NE2 failed and covered; others are good. $P^I_s(t,11) = ((q_1 c_1) \cdot (q_2 c_2) \cdot p_3 \cdot p_4) \cdot (1 - U(t,11))$

$\vdots$

state 1: all failed and covered. $P^I_s(t,1) = ((q_1 c_1) \cdot (q_2 c_2) \cdot (q_3 c_3) \cdot (q_4 c_4)) \cdot (1 - U(t,1))$

state 0: at least one uncovered failure. $P^I_s(t,0) = 1 - (p_1 + q_1 c_1) \cdot (p_2 + q_2 c_2) \cdot (p_3 + q_3 c_3) \cdot (p_4 + q_4 c_4)$,

where $p_i$, $q_i$, and $c_i$ are the reliability, unreliability, and coverage factor of NE$_i$, respectively, and $U(t,h)$ is the unreliability of the system configuration at performance state $h$ (note that different performance states have different system configurations). Then, the multistate system reliability subject to IPCM is $R^I_s(t,h) = \sum_{k=h}^{16} P^I_s(t,k)$. The imperfect coverage of NEs introduces temporal system failure logic. Therefore, the sequence of events is important and the model would ordinarily have to be solved using Markov chains. However, the computation becomes very complex if each component or each PE has multiple states, including the failed and uncovered state. In contrast, using the concepts of our approach, we can solve it easily using combinatorial models. For example, if the required performance level of the system is state 11 and each element has three states (not failed, failed and covered, and failed and uncovered), the failure logic of this configuration is shown in Fig. 10.
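The NE-level part of these state probabilities can be enumerated mechanically. The following Python sketch is illustrative only: it ignores the factor $(1 - U(t,h))$, which depends on the PE-level configuration, it does not assign the state numbers that the list above leaves out, and the function names and the values of $p_i$ and $c_i$ are hypothetical. It checks that the 16 covered NE configurations together with state 0 account for all of the probability mass.

```python
from itertools import product
from math import prod


def ne_configuration_probs(p, c):
    """Probability of each NE configuration with no uncovered failure.

    Each NE is either 'good' (probability p_i) or 'covered' (failed and
    covered, probability q_i * c_i with q_i = 1 - p_i).
    """
    probs = {}
    for status in product(("good", "covered"), repeat=len(p)):
        probs[status] = prod(
            p[i] if s == "good" else (1.0 - p[i]) * c[i]
            for i, s in enumerate(status)
        )
    return probs


def uncovered_prob(p, c):
    """State 0: at least one NE failure is uncovered."""
    return 1.0 - prod(p[i] + (1.0 - p[i]) * c[i] for i in range(len(p)))


# Hypothetical reliabilities and coverage factors of NE1..NE4.
P = [0.95, 0.90, 0.90, 0.85]
C = [0.99, 0.98, 0.97, 0.96]
configs = ne_configuration_probs(P, C)
print(len(configs), sum(configs.values()) + uncovered_prob(P, C))
```

With four NEs this yields $2^4 = 16$ configurations, and adding state 0 gives the 17 states counted in the text.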

To evaluate the system unreliability of this configuration (or this performance state) under IPCM, we first compute the conditional probability of each element by (18). Using the conditional probability of each element, the conditional system unreliability can be derived by Step 5. The system unreliability in performance state 11 subject to imperfect fault-coverage is then obtained by (19). Generally, our method can be easily combined with modularization techniques to obtain the reliability or unreliability of different performance states under IPCM. The overall multistate system reliabilities $R^I_s(t,h)$ are listed in Table 1.

Example 2. Let us consider a simple bridge network shown in Fig. 11a. The two-state OBDD-based path function of this network system is

$$F = x_1 x_3 x_5 + x_1 x_4 + x_2 x_5 + x_2 x_3 x_4$$

and is constructed by the algorithm in [16] as shown in Fig. 11b. Let us now consider the multistate network system. Assume that redundancy techniques are used so that each link has a fault-tolerance scheme. We can then treat a link as a module with various link capacities or various performance levels (i.e., a multistate network system), and the fault-coverage condition should be considered.

Fig. 10. The failure logic of system running in performance state 11.

Fig. 11. (a) A bridge network. (b) The corresponding OBDD-based path function.

TABLE 1

Case 1. The basic requirement for the system being in an acceptable performance level is:

$$\mathit{accept} = x_{1.2} x_{3.2} x_{5.2} + x_{1.2} x_{4.2} + x_{2.2} x_{5.2} + x_{2.2} x_{3.2} x_{4.2}.$$

Case 2. The path $x_1 x_4$ is the backbone of the network. Most of the dataflow runs through this path and, thus, the requirement for the path is much stricter. Therefore, if $x_1$ needs to be at least in level 5 and $x_4$ needs to be at least in level 4 for path $x_1 x_4$, the requirement for the system being in a good performance level is:

$$\mathit{good} = x_{1.2} x_{3.2} x_{5.2} + x_{1.5} x_{4.4} + x_{2.2} x_{5.2} + x_{2.2} x_{3.2} x_{4.2}.$$

Fig. 12 shows the results of accept and good after applying MDO. Table 2 illustrates the parameters of a module with the assumption that each module has six performance states including the failed and uncovered state. If all modules are identical, the system reliabilities of accept and good with different coverage factors are obtained by (21) and (22) as shown in Table 3.
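The two requirements can be checked on a small scale by brute-force enumeration, which is used here in place of the OBDD and MDO machinery purely for illustration. The sketch assumes that $x_{i.j}$ is true when module $i$ performs at level $j$ or higher, and it uses hypothetical state probabilities (the values of Table 2 are not reproduced). It compares direct enumeration over all six states with the separable form $P_u(t) \cdot R^c_s(t,h)$ that appears in (28).

```python
from itertools import product
from math import prod

# Hypothetical state probabilities of one module: index 0 is the failed and
# uncovered state, indices 1..5 are performance levels 1..5.
MODULE = [0.01, 0.04, 0.05, 0.10, 0.20, 0.60]
N_MODULES = 5


def at_least(s, i, j):
    """x_{i.j}: module i (1-based) is at level j or higher (assumed meaning)."""
    return s[i - 1] >= j


def accept(s):
    return ((at_least(s, 1, 2) and at_least(s, 3, 2) and at_least(s, 5, 2))
            or (at_least(s, 1, 2) and at_least(s, 4, 2))
            or (at_least(s, 2, 2) and at_least(s, 5, 2))
            or (at_least(s, 2, 2) and at_least(s, 3, 2) and at_least(s, 4, 2)))


def good(s):
    return ((at_least(s, 1, 2) and at_least(s, 3, 2) and at_least(s, 5, 2))
            or (at_least(s, 1, 5) and at_least(s, 4, 4))
            or (at_least(s, 2, 2) and at_least(s, 5, 2))
            or (at_least(s, 2, 2) and at_least(s, 3, 2) and at_least(s, 4, 2)))


def reliability_direct(structure):
    """Sum over all state vectors with no uncovered failure that satisfy structure."""
    return sum(prod(MODULE[k] for k in s)
               for s in product(range(6), repeat=N_MODULES)
               if min(s) >= 1 and structure(s))


def reliability_separable(structure):
    """P_u times the conditional reliability computed from conditional probabilities."""
    covered = sum(MODULE[1:])
    cond = [0.0] + [pk / covered for pk in MODULE[1:]]
    p_u = covered ** N_MODULES          # identical modules
    r_c = sum(prod(cond[k] for k in s)
              for s in product(range(1, 6), repeat=N_MODULES)
              if structure(s))
    return p_u * r_c


print(reliability_direct(accept), reliability_separable(accept))
print(reliability_direct(good), reliability_separable(good))
```

Both routes give the same value for each requirement, and the reliability of good never exceeds that of accept, since every state vector that satisfies good also satisfies accept.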

Table 4 illustrates the $\nabla_{i.j}(R_s(t,h))$ and $\nabla_{i.j}(R^I_s(t,h))$ of accept and good obtained by our algorithm under PCM and IPCM with different coverage factors. The results show that the importance measure of each module becomes larger if imperfect fault-coverage is taken into consideration. For good, $x_{1.2}$ is more important than $x_{1.5}$. This is because $x_1 x_4$ is the backbone and, if $x_{1.5}$ is not functioning, we can use an alternative path to slightly compensate for the dataflow. However, if the performance of $x_1$ drops too much (below $x_{1.2}$), the system cannot afford it. Therefore, keeping $x_{1.2}$ functional is more important than keeping $x_{1.5}$ functional. If we apply Markov techniques or other techniques to each module to solve the behavior of its state transitions, then the time-dependent multistate system reliability of different performance levels with different coverage factors is as shown in Fig. 13. Fig. 13 shows that good is less reliable than accept. That means more must be invested in the system if we want to raise the reliability of good to that of accept.

6 CONCLUSIONS

This paper has proposed a model for multistate systems with imperfect fault-coverage. An OBDD-based approach for the evaluation of the multistate system reliability and the Griffith’s importance measure has also been proposed. With a little modification, the OBDD-based method can be applied to the multistate systems under IPCM. The evaluation of the system reliability and importance measure is very helpful for the optimal design of a multistate system. It was shown that with the application of conditional probabilities, the time complexity of this method for reliability evaluation is equivalent to that of the methods for perfect coverage model. Furthermore, the approach used to evaluate the reliability can also be employed to evaluate the availability of a system if the Markov process is applied to analyze the behavior of the state transition of each module.

Various performance measures related to multistate systems have been presented in [6] and in [40], which comments on [6]. In order to compute these measures, we need to find the reliability of a system at various performance levels. Therefore, the results of this paper can be integrated to find the performance measures of a multistate system. This process is straightforward and, therefore, is not discussed here explicitly [6]. Also, the method proposed for IPCM in this paper can be integrated with the techniques in [13], [15], [16] that use OBDD for reliability analysis of a system.

Fig. 12. The resultant OBDD after MDO. (a) accept. (b) good.

TABLE 2

The Parameters of Individual Module with a Coverage Factor of c1 > c2

TABLE 3

Reliability with a Coverage Factor of c1 > c2

TABLE 4

This approach is applicable to complex systems such as fault-tolerant computer systems and network systems with variable link capacities. It generates the complete results quickly and accurately even when there exist a number of dependencies such as shared loads (reconfiguration), degradation, and common-cause failures. Based on this approach, research on failure frequency analysis and optimal design issues of multistate systems will be the focus of our future work.

ACKNOWLEDGMENTS

This research was supported by the National Science Council, Taiwan, R.O.C. under grant NSC 92-2213-E-002-007.

REFERENCES

[1] J.D. Murchland, “Fundamental Concepts and Relations for Reliability Analysis of Multistate Systems,” Reliability and Fault Tree Analysis, Theoretical, and Applied Aspects of System Reliability (SIAM), pp. 581-618, 1975.

[2] A.P. Wood, “Multistate Block Diagrams and Fault Trees,” IEEE Trans. Reliability, vol. 34, no. 3, pp. 236-240, Aug. 1985.

[3] S. Garribba, E. Guagnini, and P. Mussio, “Multiple-Valued Logic Trees: Meaning and Prime Implicants,” IEEE Trans. Reliability, vol. 34, no. 5, pp. 463-472, Dec. 1985.

[4] A. Gandini, “Importance and Sensitivity Analysis in Assessing System Reliability,” IEEE Trans. Reliability, vol. 39, no. 1, pp. 61-69, Apr. 1990.

[5] J. Xue, “On Multistate System Analysis,” IEEE Trans. Reliability, vol. 34, no. 4, pp. 329-337, Oct. 1985.

[6] J. Xue and K. Yang, “Dynamic Reliability Analysis of Coherent Multistate Systems,” IEEE Trans. Reliability, vol. 44, pp. 683-688, Dec. 1995.

[7] G. Levitin, “Incorporating Common-Cause Failure into Nonrepairable Multistate Series-Parallel System Analysis,” IEEE Trans. Reliability, vol. 50, pp. 380-388, Dec. 2001.

[8] J.B. Dugan, “Fault Tree and Imperfect Coverage,” IEEE Trans. Reliability, vol. 38, pp. 177-185, June 1989.

[9] S.V. Amari, J.B. Dugan, and R.B. Misra, “Optimal Reliability of Systems Subject to Imperfect Fault-Coverage,” IEEE Trans. Reliability, vol. 48, pp. 275-284, Sept. 1999.

[10] A. Reibman and K.S. Trivedi, “Numerical Transient Analysis of Markov Models,” Computers and Operations Research, vol. 15, no. 1, pp. 19-36, 1998.

[11] K.S. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer Science Applications. Prentice-Hall, 1982.

[12] R.E. Bryant, “Graph-Based Algorithms for Boolean Function Manipulation,” IEEE Trans. Computers, vol. 35, no. 8, pp. 677-691, Aug. 1986.

[13] Y.R. Chang, S.V. Amari, and S.Y. Kuo, “Computing System Failure Frequencies and Reliability Importance Measure Using OBDD,” IEEE Trans. Computers, vol. 53, no. 1, pp. 54-68, Jan. 2004.

[14] Y.R. Chang, S.V. Amari, and S.Y. Kuo, “Reliability Evaluation of Multi-State Systems Subject to Imperfect Coverage Using OBDD,” Proc. 2002 Pacific Rim Int’l Symp. Dependable Computing (PRDC ’02), pp. 193-200, 2002.

[15] F.M. Yeh, S.K. Lu, and S.Y. Kuo, “OBDD-Based Evaluation of K-Terminal Network Reliability,” IEEE Trans. Reliability, vol. 51, no. 4, pp. 443-451, Dec. 2002.

[16] S.Y. Kuo, S.K. Lu, and F.M. Yeh, “Determining Terminal-Pair Reliability Based on Edge Expansion Diagrams Using OBDD,” IEEE Trans. Reliability, vol. 48, pp. 234-246, Sept. 1999.

[17] X. Zang, H. Sun, and K.S. Trivedi, “Dependability Analysis of Distributed Computer Systems with Imperfect Coverage,” Proc. 29th Ann. Int’l Symp. Fault-Tolerant Computing (FTC-29), pp. 330-337, 1999.

[18] X. Zang, D. Wang, H. Sun, and K.S. Trivedi, “A BDD-Based Algorithm for Analysis of Multistate Systems with Multistate Components,” IEEE Trans. Computers, vol. 52, no. 12, pp. 1608-1618, Dec. 2003.

[19] B. Lin, O. Coudert, and J.C. Madre, “Symbolic Prime Generation for Multiple-Valued Function,” Proc. 29th ACM/IEEE Design Automation Conf., pp. 40-44, 1992.

[20] E.A. Elsayed, Reliability Engineering. Addison Wesley Longman, 1996.

[21] A. Hoyland and M. Rausand, System Reliability Theory: Models and Statistical Methods. John Wiley & Sons, 1994.

[22] F.C. Meng, “Comparing the Importance of System Components by Some Structural Characteristics,” IEEE Trans. Reliability, vol. 45, no. 1, pp. 59-65, Mar. 1996.

[23] F.C. Meng, “Some Further Results on Ranking the Importance of System Components,” Reliability Eng. and System Safety, vol. 47, pp. 97-101, 1995.

[24] J.S. Hong and C.H. Lie, “Joint Reliability-Importance of Two Edges in an Undirected Network,” IEEE Trans. Reliability, vol. 42, no. 1, pp. 17-23, Mar. 1993.

[25] M.J. Armstrong, “Joint Reliability-Importance of Components,” IEEE Trans. Reliability, vol. 44, no. 3, pp. 408-412, Sept. 1995.

[26] R.E. Barlow and A.S. Wu, “Coherent Systems with Multistate Components,” Math. Operations Research, vol. 3, pp. 275-281, 1978.

[27] T. Aven, “On Performance Measures for Multistate Monotone Systems,” Reliability Eng. and System Safety, vol. 41, pp. 259-266, 1993.

[28] B. Natvig, “Two Suggestions of How to Define a Multistate Coherent System,” Advanced Applied Probability, vol. 14, pp. 434-457, 1982.

[29] H.W. Block, “A Decomposition for Multistate Monotone System,” J. Applied Probability, vol. 19, pp. 391-402, 1982.

[30] F.C. Meng, “Component-Relevancy and Characterization Results in Multistate Systems,” IEEE Trans. Reliability, vol. 42, no. 3, pp. 478-483, Sept. 1993.

[31] W.S. Griffith, “Multistate Reliability Models,” J. Applied Probability, vol. 17, pp. 735-744, 1980.

[32] S. Wu and L.Y. Chan, “Performance Utility-Analysis of Multistate Systems,” IEEE Trans. Reliability, vol. 52, no. 1, pp. 14-21, Mar. 2003.

[33] A. Rauzy, “New Algorithms for Fault Tree Analysis,” Reliability Eng. and System Safety, vol. 40, pp. 203-211, 1993.

[34] R.M. Sinnamon and J.D. Andrews, “Improved Efficiency in Qualitative Fault Tree Analysis,” Quality and Reliability Eng. Int’l, vol. 13, pp. 293-298, 1997.

[35] S.V. Amari, J.B. Dugan, and R.B. Misra, “A Separable Method for Incorporating Imperfect Fault-Coverage into Combinatorial Models,” IEEE Trans. Reliability, pp. 267-274, Sept. 1999.

[36] J.B. Dugan and K.S. Trivedi, “Coverage Modeling for Dependability Analysis of Fault-Coverage Systems,” IEEE Trans. Computers, vol. 38, pp. 775-787, June 1989.

[37] S.A. Doyle, J.B. Dugan, and F.A. Patterson-Hine, “A Combinatorial Approach to Modeling Imperfect Coverage,” IEEE Trans. Reliability, vol. 44, pp. 87-94, Mar. 1995.

[38] H. Sun, X. Zang, and K.S. Trivedi, “A BDD-Based Algorithm for Reliability Analysis of Phased-Mission Systems,” IEEE Trans. Reliability, vol. 48, no. 1, pp. 50-60, Mar. 1999.

Fig. 13. The system reliability of accept and good with different coverage

[39] J.B. Dugan, S.J. Bavuso, and M.A. Boyd, “Dynamic Fault-Tree Models for Fault-Tolerant Computer Systems,” IEEE Trans. Reliability, vol. 41, no. 3, pp. 363-377, Sept. 1992.

[40] S.V. Amari and R.B. Misra, “Comment on: Dynamic Reliability Analysis of Coherent Multi-State Systems,” IEEE Trans. Reliability, vol. 46, pp. 460-461, Dec. 1997.

Yung-Ruei Chang received the MS degree (1995) and the PhD degree (2004) in electrical engineering from National Taiwan University. He is an assistant researcher at the Institute of Nuclear Energy Research (INER), Atomic Energy Council, Taiwan, where he has been working since 1996. He is a member of INER’s fourth nuclear power plant development team and is involved in system reliability and fault-tolerant system design. His research interests include nuclear system reliability analysis, fault-tolerant systems, dependable computing, and software reliability. He is a member of the IEEE.

Suprasad V. Amari received the BS degree (1990) in mechanical engineering from Sri Venkateswara University, Tirupati, the MS degree in reliability engineering (1992) and the PhD degree in reliability, risk, and fault tolerance of complex systems (1998) from the Indian Institute of Technology, Kharagpur. He is a senior reliability engineer at Relex Software Corporation. He is a member of the Relex development team and consulting services. He was with Tata Consultancy Services from 1996 to 2000, where he was involved in software design and development using object-oriented methodologies and formal methods, and worked as a consultant and technical lead for data mining and data warehousing projects. His research interests include hardware and software reliability, risk assessment, fault-tolerant computing, and optimization. He has published more than 30 research papers in reputed international journals and conferences. He is an editorial board member of the International Journal of Reliability, Quality and Safety Engineering, and the International Journal on Performability Engineering. He is also a management committee member of RAMS, an advisory board member of several international conferences, and a reviewer for several journals on reliability and safety. He is a senior member of ASQ and the IEEE and a member of IIE, ACM, ASA, SSC, SRE, and SOLE.

Sy-Yen Kuo received the BS degree (1979) in electrical engineering from National Taiwan University, the MS degree (1982) in electrical and computer engineering from the University of California at Santa Barbara, and the PhD degree (1987) in computer science from the University of Illinois at Urbana-Champaign. He is the dean of the College of Electrical Engineering and Computer Science, National Taiwan University, Keelung, Taiwan. He is also a professor in the Department of Electrical Engineering, National Taiwan University where he is currently on leave and was the Chairman of the same department from 2001 to 2004. He spent his sabbatical years as a visiting professor in the Computer Science and Engineering Department at the Chinese University of Hong Kong from 2004-2005 and as a visiting researcher at AT&T Labs-Research, New Jersey, from 1999 to 2000, respectively. He was the chairman of the Department of Computer Science and Information Engineering at National Dong Hwa University, Taiwan, from 1995 to 1998, a faculty member in the Department of Electrical and Computer Engineering at the University of Arizona from 1988 to 1991, and an engineer at Fairchild Semiconductor and Silvar-Lisco, both in California, from 1982 to 1984. In 1989, he also worked as a summer faculty fellow at the Jet Propulsion Laboratory of California Institute of Technology. His current research interests include dependable systems and networks, software reliability engineering, mobile computing, and reliable sensor networks. Professor Kuo is an IEEE fellow. He has published more than 240 papers in journals and conferences. He received the distinguished research award between 1997 and 2005, consecutively, from the National Science Council in Taiwan and is now a research fellow there. 
He was also a recipient of the Best Paper Award in the 1996 International Symposium on Software Reliability Engineering, the Best Paper Award in the simulation and test category at the 1986 IEEE/ACM Design Automation Conference (DAC), the US National Science Foundation’s Research Initiation Award in 1989, and the IEEE/ACM Design Automation Scholarship in 1990 and 1991.
