Optimal Resource Allocation and Reliability Analysis for Component-based Software Applications

(1)

Abstract

In this paper we propose an analytical approach for estimating the reliability of a component-based software. This methodology assumes that the software components are heterogeneous and the transfers of control between components follow a discrete time Markov process. Besides, we also formulate and solve two resource allocation problems. Finally, we demonstrate how these analytical approaches can be employed to measure the reliability of a software system including multiple-input/multiple-output systems and distributed software systems. Experimental results show that the proposed methods can solve the testing-effort allocation problems and improve the quality and reliability of a software system.

1. Introduction

With the great advancement of computer technology, software designers are motivated to integrate commercial off-the-shelf (COTS) software components for rapid software development. To ensure high reliability for such applications using software components as their building blocks to construct a software system, dependable components have to be deployed to meet the reliability requirements. Therefore, it is necessary to assess the reliabilities of such systems by investigating the architectures, the testing strategies, and the component reliabilities [1-4].To ensure the overall reliability of a software application, software components in the system have to meet certain reliability requirements, subject to some resource constraints [5-9]. These resources include human power, CPU hours, and elapsed time, etc. Without loss of generality, we call all these resources as the testing-effort [10]. Hence, to develop a good reliable software system, a project manager must determine in advance how to effectively allocate these resources [7]. In this paper, we investigate two optimal testing-effort allocation problems: minimization of the number of remaining faults in a system given a fixed amount of

testing-effort given specific reliability requirements [5-9]. The organization of this paper is as follows. Section 2 presents an analytical approach to estimating the reliability of a system. Two optimum testing-effort allocation problems are discussed in Section 3. Section 4 illustrates how the proposed approach is actually applied on the three applications. Conclusions and future works are presented in Section 5.

2. Reliability analysis for component-based

systems

A software system can be regarded as composed of logically individual components, which can be implemented and tested independently [2, 11-14]. In this section, we propose an approach to estimating the reliability of a component-based system by taking the architecture of the software system and the reliabilities of the components into consideration. For example, if a system consists of n components with reliabilities denoted by R1,…, Rn respectively, the reliability of an execution path, 1, 3, 2, 3, 2, 3, 4, 3, n, is given by R ×R ×R ×R4×Rn 4 3 2 2 1 . Thus, the

objective here is to estimate the reliability of a system by averaging over all path reliabilities [16].

2.1. Reviews of some testing-effort functions

In the field of software reliability modeling, Yamada et. al. [6] adopted the concept of testing-effort within an NHPP model to get a better description of the software fault phenomenon. The testing-effort can be measured by the man power, the number of test cases, the number of CPU hours,..., etc. Furthermore, if the number of faults detected by the current testing-effort expenditures is proportional to the number of remaining faults, then we we have

m

(

t

)

=

a

(

1 −

e

−r(W(t)−W(0))

)

(1) wherem(t)=the number of faults detected in time (0,t) ,

W(t)= the testing-effort consumption in time (0,t), a= the expected number of initial faults,

Optimal Resource Allocation and Reliability Analysis

for Component-Based Software Applications

Jung-Hua Lo, Sy-Yen Kuo, Michael R. Lyu

*

, and Chin-Yu Huang

Department of Electrical Engineering

National Taiwan University

Taipei, Taiwan

sykuo@cc.ee.ntu.edu.tw

(2)

From [15], we know the fact that a superposition or a decomposition of independent Poisson processes is also a Poisson process. Consider a series system ofnindependent components. Assume that the system failure intensity isλs(t) and the failure intensity of component i isλi(t),

i=1,…,n, the relationship between λs(t)and λi(t) is given

as follows:

λs(t)=λ₁(t)+λ₂(t)+...+λn(t). (2) Xie et. al. [16] proposed an additive model for assessing the reliability of this type of integrated system. Although the approach of the additive model is straightforward, there are some potential drawbacks. For example, Eq. (2) reveals that each component has equal influence on the overall system no matter how often it executes. In practice, the frequency of a component being executed affects the overall system reliability. A higher frequency indicates a greater effect of that component on the performance of the system. This fact shows that the components should have distinct weights according to the architecture of the software system. Consequently, we propose a new approach to representing the effects of weighting on the components. This means that we add a weight vector to Eq. (2) and the new solution is as follows:

λs(t)=w₁λ₁(t)+w₂λ₂(t)+...+wnλn(t)(3) where the vector of weights (w1, w2, ..., wn) represents the importance of each component and is obtained based on the system architecture.

2.3. Markov process to model the architecture of a

component-based system

In this section, we consider systems with different architecture styles and utilize the Markov process to model the failure behaviors of the applications. Three general input-output cases were employed. In addition, we develop three methodologies to estimating the reliability of a software system.

Definition : Let {Xn, n=0, 1, 2…} be a Markov process

with some absorbing states and some transient states. Define the random variable, Nij, to represent the number of visits to state j before entering an absorbing state given Xo=i. The expected value of Nij , E(Nij), is denoted by µij .

Moreover, let ηkdenote the probability of absorption when

a process terminates at an absorbing state k. Furthermore, let the probability of reaching state k from state i in n steps

be denoted as f ._ikn

The proofs of Theorem 1 and Theorem 2 are similar to Theorem 3 and are omitted.

Theorem 1 (single-input/single-output system):Consider

a single-input and single-output system consisting of N components with reliabilities R1,…, RN. Let {Xn} be the

Markov process where state N is an absorbing state, i.e., an output node, while states {1, 2, ..., N−1} are transient states. In particular, assume state 1 is the input node. Therefore, we have the reliability of the system:

Rs= × × ∏ − = 1 2 1 1 ( , ) N i i i N pow R R R µ ,

where the pow(x, y) is the power function, namely, pow(x,

y)=xy.

Theorem 2 (single-input/multiple-output system):

Consider a single-input and r-output system consisting of N components with individual reliabilities denoted by R1,…,

RN.. Let {Xn} be the Markov process where {N, N−1, ..., N−r+1} are absorbing states (i.e. r output nodes) and {1, 2, ..., N−r} are transient states. In particular, assume state 1 is the input node. Therefore, we have the reliability of the system:

R

s

=

× ∏ × ∏ − = = − + r N i N r N k j j i i powR R pow R 2 1 1 1 ( ,µ ) ( ,η )

.

Theorem 3 (multiple-input/multiple-output system):

Consider an s-input and r-output system consisting of N components with reliabilities R1,…, RN.. Let {Xn} be a Markov process where {N, N−1, ..., N−r+1} are absorbing states (i.e. r output nodes) and {1, 2, ..., N−r} are transient states. In particular, assume states {1, 2, ..., s} are the input nodes with probability p1, p2, ..., ps, respectively. Therefore, the system reliability, Rs, equals

∏ ∑ × ∏ × ∏ − + = = = − + = r N s j N r N k k k s l l lj j s i i i R pow p R pow p R pow 1 1 1 1 ) , ( ) , ( ) , ( µ η Proof:

(1) The transformation probability matrix can be written: P= N N r N r N r r N r r Q R I × − × − × − ×     ) ( ) ( ) ( 0

where Q is the transformation probability matrix corresponding to the transient states, R(N− )r×r is the matrix

of transition probabilities from transient to recurrent state, and I is an identity matrix of size r.

(2) Next, we focus on µij where N−r≥i,j≥1. By the

definition of µij and the conditional probability on X1, we can have µij as follows:

E(Nij|X1=1)Pr(X1=1|X0=i)+..+E(Nij|X1=N−r)Pr(X1=N−r|X0=i)+

E(Nij|X1=N−r+1)Pr(X1=N−r+1|X0=i)+..+E(Nij|X1=N)Pr(X1=N|X0=i)

Because {N, ..., N−r+1} are absorbing states, it indicates that the mathematical terms, E(Nij | X1=K)Pr(X1= K| X0=i) where N≥K ≥N−r+1, equal to one. That is,

= ij µ ∑− = +N r k kj ik ij P 1µ δ where    ≠ = = _ifif _ii _jj ij ₀ 1 δ (4)

(3)

M=I+QM, and get M=(I−Q)-1where M is composed of µij.

(3) Assume that the starting state i is the transient state and the ending state j is the absorbing states. Therefore, we have fij1=Rij and ∑ − = − =N r k n kj ik n ij Q f f 1 1 . Equivalently, we can have η =i ((I−Q)−(N1−r)×(N−r)×R(N−r)×r)1i. Therefore, in this

case, the process may be absorbed from one of the input nodes. Thus we have the result for multiple inputs:

i η =∑ = − × − − × − × − × s j j N r N r N r r R Q I p 1 ( ) 1 ) ( ) ( ). ) ((

(4) At last, we can conclude that the nodes i (s≥ i≥1) will have pi visits, the node i (N−r≥i≥s+1) have li

s

l l

pµ

∑

=1

visits on average, and the output nodes i (N≥i≥N−r+1) have η visits. Therefore, the system reliability , Ri s, equals

∏ ∑ ∏ ∏ − + = = = − + = × × r N s j N r N k k k s l l lj j s i i i R pow p R pow p R pow 1 1 1 1 ) , ( ) , ( ) , ( µ η

3. Optimum testing-effort allocation problems

In this section we describe a general problem of allocating testing resources to software components so that the software applications can be constructed effectively, given that the applications have prescribed reliability requirements [5-9]. Furthermore, we consider two testing-effort allocation problems [5-8] based on the proposed model:

(1)minimizing the number of software faults remaining in the system given fixed amount of testing-effort,

(2)minimizing the total testing-effort given the fixed reliability requirements.

3.1. Minimizing the number of remaining faults

Suppose that each application is distributed with a limited amount of testing-effort over its components, and component i is allotted Wi testing-effort, and thus the optimization problem can be represented as follows:

Minimize _∑ = − N i i i i i W r a v 1 )

exp( , subject to the requirements:

1 1 2 12 1 11 α ... α ε α W + W + + NWN ≤ … M N MN M MW α W α W ε α 1 1+ 2 2 +...+ ≤ N j M i W_j i ij≥0, ε ≥0, ≥0, =1,2,..., , =1,2,..., α

Note that the parameters vi, ai, riandαijhave already been

estimated by the proposed model and εi is the limited

amount of testing-effort available for components used by application i.

3.2. Minimizing the total testing-effort

On the other hand, suppose the applications have pre-specified reliability requirements, one has to allocate an amount of testing-effort to each component to minimize the total testing-effort such that all applications meet their reliability requirement. Therefore, the optimization problem can be represented as follows:

Minimize _∑ = N i i W 1

, subject to the requirements:

1 1 2 2 12 1 1

11exp( ) β exp( ) ... β exp( ) γ

β −rW + −rW + + N −rNWN ≤ … M N N MN M M rW β rW β rW γ

β 1exp(−1 1)+ 2exp(− 2 2)+...+ exp(− )≤ N j M i Wj i ij ≥0, γ ≥0, ≥0, =1,2,..., , =1,2,..., β

Note that the parameters ri and β have already beenij

estimated by the proposed model and γ is the reliability_i requirement used by application i.

3.3. Results for single application environment

3.3.1. Minimizing the number of remaining faults given

fixed amount of testing-effort. The optimization problem

is that the total amount of testing-effort is fixed, and we want to allocate these efforts to each component to maximize the system reliability. Suppose the total amount of testing-effort is W, and component i is allotted Wi

testing-effort, and thus the optimization problem can be represented as follows: Minimize: ∑ − = N i i i i i W r a v 1 ) exp( , subject to N i W W W _i N i i ..., , 2 , 1 , 0 , 1 = ≥ = ∑ = .

Note that the parameters ai and vi have already been

estimated by the proposed model in Section 2. To solve the above problem, the Lagrange multiplier method [20] can be applied. Furthermore, we propose a simple optimization algorithm to solve the above problem.

Algorithm 1for minimizing the number of remaining faults

Step 1: Set l=0.

Step 2: Calculate the following equations where i=1,…,N-l. ]. ln ) [ln( , ) / 1 ( ) )(ln / 1 ( ln 1 1 1 _λ λ= − = − ∑ ∑ = − = i i i i r i N i i l N i i i ii _W _v_a_r r W r a v r

Step 3: Rearrange the index i such that W1*≥...≥WN*−l. Step 4: If WN*_−l ≥0 then stop, else update WN*−l =0 and

l=l+1.

Step 5: Go to Step 2.

3.3.2. Minimizing the total testing-effort given the fixed

(4)

number of remaining faults in the system is specified by γ , one has to allocate an amount of testing-effort to each component to minimize the total testing-effort. Therefore, the optimization problem can be represented as follows: Minimize ∑ = N i i W 1 , subject to exp( ) , 0 1 − = ≥ ∑ = i N i i i i i W W r a v γ

We also propose a simple optimization algorithm to solve it.

Algorithm 2 for minimizing the total amount of

testing-effort expenditures: Step 1: Set l=0. Step 2: Calculate 1[ln( 1)], 1,2,..., . 1 l N i r r a v r W N l i i i i i i i = ∑ = − − = γ

Step 3: Rearrange the index i such that W₁*≥...≥W_N*₋_l. Step 4: IfW_N*_−l ≥0 then stop, else update W_N*_−l =0 and l=l+1.

Step 5: Go to Step 2.

4. Numerical examples

4.1. Reliability evaluation of component-based systems

The following examples adapted from [2, 11] are used to illustrate the three architecture cases discussed in Section 2. Without loss of generality, we use the terminating application reported in [11] as a running example and let the estimated reliabilities of the components be regarded as unchanged throughout the following three subsections and listed in Table 1.

4.1.1. Example 1: a single-input/single-output system.

The first example is a single-input/single-output system. It consists of 10 components where component 1 is the input component and component 10 the output component. Figure 1 depicts the control-flow graph of the example, and the transition probabilities among the components are given as follows: P1,2= 0.6, P1,3= 0.2, P1,4= 0.2, P2, 3= 0.7, P2,5= 0.3,

P3,5= 1.0, P4,5= 0.4, P4, 6= 0.6, P5,7= 0.4, P5,8= 0.6, P6,3= 0.3,

P6,7= 0.3, P6,8= 0.1, P6,9= 0.3, P7,2= 0.5, P7,9= 0.5, P8,4= 0.25,

P8,10= 0.75, P9,8= 0.1, P9,10= 0.9. Therefore, the expected

number of visits on each transient state before absorption from the input node (component 1) and the probability of absorption can be derived as follows:

, 9784 . 1 , 5289 . 0 , 3254 . 1 , 4717 . 1 , 1 ₁₂ ₁₃ ₁₄ ₁₅ 11 = µ = µ = µ = µ = µ . 1 , 9669 . 0 , 3155 . 1 , 7433 . 1 , 3173 . 0 ₁₇ ₁₈ ₁₉ ₁₀ 16 = µ = µ = µ = η = µ

Thus, the system reliability is estimated as R1= 0.7715. Table 1: The estimated reliabilities of the components.

1 2 3 4 5 6 7 8 9 10 0.99 0.98 0.99 0.96 0.98 0.95 0.98 0.96 0.97 0.99 1 2 4 3 5 7 6 8 9 10 Figure 1: A single-input/single-output system.

4.1.2. Example 2: a single-input/multiple-output type.

In this example, we delete two links of the original program control graph in Example 1, and obtain the modified graph as shown in Figure 2. The modification is a simple transformation from a single-output system to a multiple-output system and the corresponding transition probabilities are similar to Example1.

1 2 4 3 5 7 6 8 9 10

Figure 2: A single-input/multiple-output system. Therefore, following the same approach we can have following results: µ₁₁=1,µ₁₂ =0.6,µ₁₃ =0.6845, , 6326 . 0 , 2149 . 0 , 0077 . 1 , 3581 . 0 ₁₅ ₁₆ ₁₈ 14 = µ = µ = µ = µ , 0645 . 0 19= µ η₇ =0.4676,η₁₀ =0.5324. Thus, the reliability of the application,R2, is 0.8890.

4.1.3. Example 3: a multiple-input/multiple-output type.

(5)

input components (components 1 and 2) with equal probability and terminates at the output components (components 7 and 10). That is, the modification is a transformation from a single-input system to a multiple-input system. Figure 3 depicts the control-flow graph of the example and the transition probabilities are similar to Example 2 except P1,3= 0.5 and P1,4= 0.5.

0.5 0.5 1 2 4 3 5 7 6 8 9 10

Figure 3: A multiple-input/multiple-output system. Therefore, according to Theorem 3, portion of the vector of weights in Eq. (3) can be obtained by ∑

= = 2 1 . l l lk k p w µ Therefore, we have (w1, w2, w3, w4, w5, w6, w8, w9) = (0.5, 0.5, 0.673, 0.4057, 0.9853, 0.2434, 0.6228, 0.073). On the other hand, with the aim to computing the probability of absorption at each absorbing state, the following information about the two absorbing states is obtained based on Theorem 3: w₇ =0.4672,w₁₀ =0.5327. Thus, we have the reliability of the system is R3= 0.8929.

4.2. Optimum testing-effort allocation problems

In this subsection, three numerical examples for the optimum testing-effort allocation problem are demonstrated. In particular, assume the components are adopted from [6] and the estimated parameters ai, ri, in Section 3.3.1, for i=1,…, 10, in the software system are summarized in Table 2. Moreover, the control flows of the three systems of interest are similar to the cases in Section 3 and the weighting vector viin Eq. (3), for i=1,…, 10, are also listed in Table 2.

4.2.1. Minimizing the number of remaining faults.

Assume the total amount of testing-effort expenditures W is 50,000. One has to allocate the expenditures to each component to minimize the number of remaining faults. Using the algorithm 1 in Section 4.1, the optimal

testing-effort expenditures for three systems are estimated and shown in Table 3. Furthermore, the number of initial faults and the number of remaining faults estimated for three examples are also shown in Table 4.

Table 2: The estimated values of a, riand vi. ai ri(10-4) viin Example1 viin Example 2 viin Example 3 1 89 4.1823 1 1 0.5 2 25 5.0923 1.4717 0.6 0.5 3 27 3.9611 1.3254 0.6845 0.6730 4 45 2.2956 0.5289 0.3581 0.4057 5 39 2.5336 1.9784 1.0077 0.9853 6 39 1.7246 0.3173 0.2149 0.2434 7 59 0.8819 1.7433 0.4676 0.4672 8 68 0.7274 1.3155 0.6326 0.6228 9 37 0.6824 0.9669 0.0645 0.073 10 14 1.5309 1 0.5324 0.5327

Table 3: The optimal solution using Algorithm1.

* i

W for example 1 W_i*for example 2W_i*for example 3

1 6215 8112 6512 2 3756 3552 3241 3 4125 4459 4477 4 2964 4721 5396 5 7718 8261 8192 6 0 835 1697 7 13465 7538 7800 8 11757 1598 12713 9 0 0 0 10 0 0 0

Table 4: The reduction in the number of faults. Initial faults Remaining faults Reduction (%)

Example 1 517.0 173.6 33.6

Example 2 266.7 67.5 25.3

Example 3 221.4 66.4 30.0

Table 5: The optimal solution using Algorithm2.

* i

W for example 1W_i*for example 2W_i*for example 3

1 7700 6954 5340 2 4976 2602 2278 3 5692 3237 3239 4 5669 2612 3233 5 10168 6275 6256 6 2096 0 0 7 20505 2046 2240 8 20293 5943 5971 9 7265 0 0 10 2388 0 0

(6)

4.2.2. Minimizing the total testing-effort expenditures.

Assume the total number of remaining faultsZ 100. One has to allocate the expenditures to each component to minimize the total amount of testing-effort expenditures. Using algorithm 2 in Section 4.2 and Table 2, the optimal solutions for Section 3.3.2 are derived and shown in Table 5. Furthermore, the relationship between the total amount of testing-effort expenditures and the reduction rate of remaining faults is depicted in Figure 4.

Figure 4: The reduction rate of remaining faults v.s. the total testing-effort expenditures.

5. Conclusions

This paper presents a new approach to analyzing the reliability of a component-based software, based on the reliabilities of the individual components and the architecture of the system. Furthermore, we derive some useful mathematical properties to show that the model is indeed very powerful. Three general cases are utilized to validate the proposed approach. On the other hand, two testing-effort allocation problems are also studied and efficient solutions are provided. Experimental results show that the proposed methods can solve the testing-effort allocation problems and improve the quality and reliability of the software system.

Acknowledgment

This research was supported by the National Science Council, Taiwan, ROC., under Grant NSC 90-2213-E-002-113 and also substantially supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region (Project No. CUHK4222/01E). Further, we thank the anonymous referees for their critical review and comments.

References

[1] M. R. Lyu. Handbook of Software Reliability Engineering. McGraw-Hill, 1996.

[2] S. S. Gokhale, "Analysis of Software Reliability and Performance," Ph.D. Dissertation, Department of Electrical and Computer Engineering, Duke University, Durham, 1998. [3] J. D. Musa, A. Iannino, and K. Okumoto (1987). Software Reliability, Measurement, Prediction and Application. McGraw-Hill.

[4] J. D. Musa (1998). Software Reliability Engineering: More

Reliable Software, Faster Development and Testing.

McGraw-Hill.

[5] P. Kubat and H. S. Koch, "Managing Test-Procedure to Achieve Reliable Software," IEEE Trans. on Reliability, vol. 32, No. 3, pp. 299-303, September 1983.

[6] H. Ohtera and S. Yamada, "Optimal Allocation & Control Problems for Software-testing Resources," IEEE Trans. on Reliability, 39, vol. 2, pp. 171-176, June 1990.

[7] Y. W. Leung, "Software Reliability Growth Model with Debugging Efforts," Microelectron. Reliab. Vol. 32, No. 5, pp. 699-704, 1992.

[8] R. H. Hou, S. Y. Kuo, and Y. P. Chang, " Needed Resources for Software Module Test, Using the Hyper-Geometric Software Reliability Growth Model," IEEE Trans. on Reliability, Vol. 45, No. 4, pp. 541-549, December 1996. [9] Michael R. Lyu and Aad P. A. van Moorsel, "Optimization of

Reliability Allocation and Testing Schedule for Software Systems," Proceedings of the 8th International Symposium on Software Reliability Engineering, pp. 336-347, November 1997, Los Almitos, California.

[10] C. Y. Huang, J. H. Lo, S. Y. Kuo, and Michael R. Lyu, "Software Reliability Modeling and Cost Estimation Incorporating Testing-Effort and Efficiency," Proceedings of the 10th International Symposium on Software Reliability Engineering, pp. 62-72, November 1999, Florida.

[11] R. C. Cheung, "A User-Oriented Software Reliability Model," IEEE Trans. on Software Engineering, SE-6(2), pp. 118-125, March 1980.

[12] W. L. Wang, Y. Wu, and M. H. Chen, "An

Architecture-Based Software Reliability Model," Proceedings of the Pacific Rim International Symposium on Dependable Computing, pp. 143-150, Dec. 1999, HongKong. [13] S. Krishnamurthy and A. P. Mathur, "On the Estimation of Reliability of a Software System using Reliabilities of its Component", Proceedings of the 8th International Symposium on Software Reliability Engineering, pp. 146-155, November 1997, Albuquerque, New Mexico.

[14] B. Littlewood, "Software Reliability Model for Modular Program Structure", IEEE Trans. on Reliability, vol. 28, No. 3, pp. 241-246, August 1979.

[15] K. S. Trivedi. Probability and Statistics with Reliability, Queuing and Computer Science Applications. Prentice-Hall, Englewood Cliffs, New Jersey, 1982.

[16] M. Xie and C. Wohlin, "An additive Reliability Model for the Analysis of Modular Software Failure Data," Proceedings of the 6th International Symposium on Software Reliability Engineering, pp. 188-194, October 1995, Toulouse, France. [17] M. S. Bazaraa and C. M. Shetty, Nonlinear Programming: