Survivability and Performance Optimization of Mobile Wireless Communication Networks in the Event of Base Station Failure


Kuo-Chung Chu a,*, Frank Yeong-Sung Lin b,1

a Department of Information Management, Jin-Wen Institute of Technology, No. 99, An-Chung Road, Shin-Tien City, Taipei County 231, Taiwan

b Department of Information Management, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan

Available online 17 April 2006

Abstract

In this paper, we investigate the survivability of mobile wireless communication networks in the event of base station (BS) failure. A survivable network is modeled as a mathematical optimization problem in which the objective is to minimize the total amount of blocked traffic. We apply Lagrangean relaxation as a solution approach and analyze the experiment results in terms of the blocking rate, service rate, and CPU time. The results show that the total call blocking rate (CBR) is much less sensitive to the call blocking probability (CBP) threshold of each BS when the load is light, rather than heavy; therefore, the more traffic loaded, the less the service rate will vary. BS recovery is much more important when the network load is light. However, the BS recovery ratio (BSRR), which is a key factor in reducing the blocking rate for a small number of BSs, is more important when a system is heavily loaded. The proposed model provides network survivability subject to available resources. The model also fits capacity expansion requirements by locating mobile/portable BSs in the places they are most needed.

Keywords: Base station recovery; Lagrangean relaxation; Mathematical modeling; Network survivability; Performance evaluation; Quality of service

1. Introduction

Although the survivability of telecommunication networks has been thoroughly investigated, it is still a crucial issue for both service providers and end users. Service providers become more competitive by provisioning and deploying reliable/survivable network services. The problem of network survivability stems from network planning. The planning process can be divided into the following phases: topological design, traffic routing/management, and circuit routing design of the transmission facility network [1]. In traffic management, the primary goal is to provide optimal capacity and meet quality of service (QoS) requirements, while minimum-cost routing is primarily concerned with circuit routing design.

* Corresponding author. Tel.: +886 2 8212 2323; fax: +886 2 8212 2339. E-mail address: kcchu8992@gmail.com (K.-C. Chu).

1 Tel.: +882 2 3366 1191.

Nowadays, a fiber optic-based backbone is incorporated in the design of network topologies. As a single transmission link carries a significant amount of traffic, failure of such a link causes a breakdown of services. Failures can occur as a result of natural disasters (floods, hurricanes, etc.), human actions (war, terrorism, e.g., 911), or failure of software or control systems. Thus, the network must be designed for survivability so that traffic can still be carried immediately after a failure. Efforts should focus on designing networks/protocols in such a way that service can be maintained at a reasonable cost if there is a failure. To improve the performance of the network in such an event, the following network planning and management implementation options should be considered: dynamic call routing between switching nodes, circuit diversity through cross-connect nodes, augmented trunk capacity, real-time restoration in the transmission facility network, or a combination of these options [2–4]. Finding an approach that meets survivability requirements is generally regarded as an NP-hard problem [5].

To date, most research has concentrated on survivability in wired networks. In general, a wireless or mobile network consists of a number of components, including base stations (BSs), BS controllers (BSCs), a mobile switch center (MSC), home location registers (HLRs), visiting location registers (VLRs), signaling system 7 (SS7), and high-capacity trunks. As a failure could involve one or more of these components, the survivability of mobile wireless networks is also an important research area [6,7]. The impact of a failure can be measured in terms of the number of users affected and the duration of the outage.

Table 1 summarizes the effects of failure and the mitigation strategies in a wireless environment [6]. A BS serves hundreds of mobile users (MUs) who either initiate new calls or keep their on-going (handoff) calls, while a BSC serves thousands of MUs by supporting switching between several neighboring BSs. An MSC, which is a switch that interconnects a number of BSCs, is capable of supporting at least 100,000 MUs. The HLR is a central database that contains details of each MU authorized to use the network. The VLR is a temporary database of MUs that have roamed into the particular area the database serves. Each BS in the network is served by exactly one VLR; hence, an MU cannot be present in more than one VLR at a time. The failure of switch-level components significantly affects services, so that most MUs are unable to access service due to the loss of initial call requests or the end of call delivery. Fortunately, by using redundant components the switches can be operated reliably, and in the event of failure of HLRs/VLRs, the replicated databases are sufficient to continue service.

Even though redundant components can be used to mitigate the impact of a BS failure, the cost is an important issue. In addition, wireless network survivability has to take account of radio channel capacity, interference, and user mobility, as well as the impact a failure has on the signaling network and how a failure in one part of a network impacts on several other parts of the network. To provide reliable and survivable wireless and mobile services, network providers must find ways to reduce the number of network failures and cope with failures when they do occur.

The remainder of this paper is organized as follows. Section 2 contains a literature review. In Section 3, a mathematical problem formulation of a survivable network is proposed. Section 4 presents a solution approach to the problem based on Lagrangean relaxation. Section 5 describes the computational experiments. In Section 6, a sensitivity analysis of associated constraints is conducted by calculating the Lagrangean multipliers for all procedures. Finally, in Section 7, we present our conclusions.

Table 1
The effects of wireless component failure and mitigation strategies

| Failure | Cause | Number of users affected | Time to fix | Mitigation strategies |
| --- | --- | --- | --- | --- |
| BS/BSC | Hardware, software, nature | 1000–20,000 | Hours to days | Overlay BS, redundant components |
| MSC | Hardware, software, operators | 100,000 | Hours to days | Spare components, power, smaller switches |
| HLR/VLR | Hardware, software | 100,000 | Hours to days | Replicated database, redundant |

2. Literature review

The next-generation mobile environment of CDMA-based (code division multiple access-based) networks will have a variety of requirements, such as multimedia data services, higher data rates, mobility, and QoS [8]. With regard to QoS, several previous works have addressed the issue of multimedia traffic in terms of call admission control (CAC). Generally, a network system manages available resources and allocates them in an optimal way among the system's users. Survivable service is another important issue in wireline/wireless communications networks [2–6].

A typical infrastructure of a cellular network consists of a number of components, as shown in Fig. 1. A failure in an MSC affects nearly all customers covered by it, perhaps hundreds of thousands of users. Even though failed components can be replaced with spare parts, natural disasters, such as floods and earthquakes, as well as some human factors, may cause a BS to crash. In a wireless system, a BS that communicates directly with MUs is a critical facility. If more BSs are deployed, the coverage as well as the system's capacity can be enhanced, but there is a greater possibility of BS failure. Thus, some users may be out of service and the overall QoS would be degraded.

In terms of operation, if we can properly recover some of the failed BSs, it would enhance QoS and provide survivable service. Accordingly, BS recovery is one of the most important approaches for minimizing the total system call blocking rate (CBR). Intensive research on the comparison of FH-CDMA and DS-CDMA for wireless survivable networks was reported in [6], but it did not deal with the overall call blocking problem in conjunction with BS recovery. Even though [9] dealt with the survivability issue as an integrated planning problem, it focused on the long-term planning problem; however, network monitoring/servicing to identify and avoid potential failures in a system's operation is more important in the short term.

The basic model of network survivability proposed in [10] uses an exhaustive search approach. It denotes the number of failed BSs and the number of fixed BSs as $|F|$ and $|F'|$, respectively. The overall procedure of the basic model is as follows:

Step 1. Specify the number of BSs $|F'|$ to be repaired, after which the combinations of BSs to fix can be generated; there are $C(|F|, |F'|)$ of them.

Step 2. Sequentially repair the BSs selected from each combination identified in Step 1.

Step 3. Solve the network survivability problem of all workable BSs and calculate the minimal CBR value of each combination; compare the optimal values of all combinations and retain the overall optimal value; if all combinations have been processed, go to Step 4; otherwise, return to Step 2.

Step 4. End solution procedure.
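The four steps above can be sketched in a few lines of Python. This is only an illustration of the enumeration, not the implementation used in [10]: the function `exhaustive_recovery`, the callback `evaluate_cbr`, and the toy numbers in `gain` are hypothetical stand-ins for the survivability solver that Step 3 would call.

```python
from itertools import combinations
from math import comb

def exhaustive_recovery(failed, n_fix, evaluate_cbr):
    """Try every way of repairing n_fix of the failed BSs and keep the best one."""
    best_cbr, best_fixed = float("inf"), None
    for fixed in combinations(sorted(failed), n_fix):  # Step 1: enumerate combinations
        cbr = evaluate_cbr(set(fixed))                 # Steps 2-3: repair, solve, score
        if cbr < best_cbr:                             # Step 3: retain the overall optimum
            best_cbr, best_fixed = cbr, set(fixed)
    return best_cbr, best_fixed                        # Step 4: end

# Hypothetical toy evaluator: each repaired BS removes a fixed amount of blocked traffic.
gain = {1: 3.0, 2: 5.0, 5: 1.0, 7: 4.0}

def toy_cbr(fixed):
    return 20.0 - sum(gain[j] for j in fixed)

best_cbr, best_fixed = exhaustive_recovery({1, 2, 5, 7}, 2, toy_cbr)
n_cases = comb(4, 2)  # number of combinations examined for |F| = 4, |F'| = 2
```

With four failed BSs and two repairs, the loop examines `comb(4, 2)` = 6 combinations, so the cost of the basic model is the number of combinations times the cost of one survivability solve.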

The time complexity of the basic model is $O(|F|^2)$. However, in this paper, we propose a combinatorial optimization model (hereafter called the extended model) that reduces the complexity to $O(1)$. We focus on BS recovery decisions to formulate the survivability problem in terms of call blocking control. The objective is to minimize the blocked traffic in the overall system by allocating some restricted resources to recover broken BSs. To better describe the problem, the sets for modeling the problem are listed in Table 2. We assume there are $|B|$ BSs in a system, and that a number of BSs, say $|F|$, have failed. To minimize the total blocked traffic, some BSs, say $|F'|$, can be fixed by allocating restricted resources. The BS recovery ratio (BSRR) is defined as $|F'|/|F|$. A system that enables available (i.e., functioning) BSs to cooperate with fixed BSs provides survivable service with overall minimal traffic blocking.

3. Network survivability and performance modeling

3.1. Problem description

The main advantage of the extended model is the time complexity, which is reduced from $O(|F|^2)$ to $O(1)$, where $|F|$ is the total number of failed BSs. In the basic model [10], the decision for each BSRR is selected from the minimal objective values among all recovery combinations. For example, if $|F| = 4$ and BSRR = 0.5 (i.e., the number of BSs to be fixed is two), there exist $C(4, 2) = 6$ possible cases. However, instead of using the time-consuming approach in the basic model, the extended model considers the BS recovery decision for network survivability jointly with performance modeling and CBR minimization. The more traffic loaded into the system, the greater the improvement in time consumption. To effectively evaluate the performance of network survivability, the model only focuses on voice service. We do not consider the existing MU connections, but we do assume perfect power control, the uplink perfectly separated from the downlink, fading, and downlink analysis. In order to simplify the analysis, some complicated scenarios like new, re-homing, outbound, and handover calls are not dealt with.

Table 2
Definition of sets for network survivability

| Notation | Description |
| --- | --- |
| $B$ | A set of BSs |
| $B'$ | A set of available BSs, $B' = B \setminus F$ |
| $B''$ | A set of workable BSs, $B'' = B' \cup F'$ |
| $F$ | A set of failed BSs, $F \subseteq B$ |
| $F'$ | A set of fixed BSs, $F' \subseteq F$ |
| $T$ | A set of MUs |

Fig. 2. An example of network survivability: (a) normal case; (b) BS1 and BS2 failures with BS0 and BS3 power adjustment; (c) BS2 recovery with BS0 power adjustment; and (d) BS1 recovery with BS3 power adjustment.

The extended model provides network survivability by allocating redundant resources. Furthermore, overlaying the architecture, by adjusting the transmission power of all available/workable BSs, reduces the impact of BS failures. Fig. 2 shows an example of four BSs (BS0–BS3) and eight MUs (M0–M7), where two MUs are covered by each BS. In Fig. 2(a), all BSs work normally (no failures) within power radius $R$. M2 and M5 are served by BS1 and BS2, respectively. In Fig. 2(b), both BS1 and BS2 have failed. Thus, BS0 and BS3 adjust their power to radius $R'$ to serve M2 and M5, respectively. However, it is a complicated process to recover some failed BSs. Fig. 2(c) and Fig. 2(d) show alternative ways of recovering BS2 and BS1, respectively. The power of the available BSs, BS0 and BS3, is also adjusted. Network survivability deals with recovery decisions about failed BSs; choosing which $N$ of $M$ failed BSs to recover, with $C(M, N)$ possible choices, is a combinatorial optimization problem.

3.2. Performance indicators

Snow et al. [6], in their study of the performance issues of wireless communications, propose an outage index that considers the magnitude (the number of customers affected) and duration (the length of time that service is disrupted). In addition, the index focuses on two service components: (1) registration blocking ($I_{RB}$), i.e., new customers are unable to register with the wireless network; and (2) call blocking ($I_{CB}$), i.e., the failure of calls by registered customers. Based on $I_{RB}$ and $I_{CB}$, we now introduce two performance indicators to be analyzed in this research, namely, the service rate and the call blocking rate.

(1) Service rate ($R_S$): No matter what causes BSs to fail, the power of available/workable BSs (integrating partially fixed BSs) is adjusted. Some MUs originally covered by the failed BSs will probably still be out of service, which will result in registration blocking, $R_S = 1 - I_{RB}$.

(2) Call blocking rate ($R_B$): The call blocking probability ($P_B$) of MUs is expressed by Erlang's B formula $P_B = B(g, c)$, where $g$ is the aggregate traffic, and $c$ denotes the available channel resources. Thus, CBR is $R_B = g \cdot P_B = g \cdot B(g, c)$. Actually, in a survivable environment, $P_B$ can be further decomposed into two cases: (i) $P_B = B(g, c)$, where $g$ is the aggregate flow of total users covered by workable BSs, and $c$ denotes all available channels; and (ii) $P_B = B(g', c') = 1$, where $g'$ is the aggregate flow of lost traffic (out of service), and $c'$ is zero, since the system cannot allocate any channels.
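As a small numerical illustration of the call blocking rate, the Erlang-B probability can be evaluated with the standard recursion $B(g, 0) = 1$, $B(g, k) = g B(g, k-1) / (k + g B(g, k-1))$. The function names below are illustrative choices, not notation from the paper.

```python
def erlang_b(g, c):
    """Erlang-B blocking probability B(g, c) for g Erlangs offered to c channels."""
    b = 1.0  # B(g, 0) = 1: with no channels every call is blocked (case (ii) in the text)
    for k in range(1, c + 1):
        b = g * b / (k + g * b)  # recursion from B(g, k-1) to B(g, k)
    return b

def call_blocking_rate(g, c):
    """CBR R_B = g * B(g, c), i.e. the blocked traffic in Erlangs."""
    return g * erlang_b(g, c)

p_two = erlang_b(2.0, 2)            # 2 Erlangs on 2 channels
r_two = call_blocking_rate(2.0, 2)
p_none = erlang_b(5.0, 0)           # lost traffic: no channel can be allocated
```

For 2 Erlangs offered to 2 channels the closed form gives $(2^2/2!)/(1 + 2 + 2^2/2!) = 0.4$, which the recursion reproduces; with zero channels the probability is 1, so the whole lost traffic $g'$ counts as blocked.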

3.3. Performance modeling

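The extended model is formulated as the following mathematical optimization problem (IP). The objective is to minimize the total CBR of the overall system, which is the sum of the average CBR traffic in the system and lost traffic ($g'$). Associated parameters used in the model are defined in Table 3. The decision variables (DVs) of the extended model are defined in Table 4.

\begin{align}
Z_{IP} = \min \sum_{j \in B''} g_j B_{B_j}(g_j, c_j) + g' \tag{IP}
\end{align}

subject to:

\begin{align}
\left(\frac{E_b}{N_{total}}\right)_{req} &\le \frac{\dfrac{P}{N_0}}{1 + \dfrac{1}{G}\,\alpha\,\dfrac{P}{N_0}(c_j - 1) + \dfrac{1}{G}\,\alpha\,\dfrac{P}{N_0} \sum\limits_{\substack{j' \in B'' \\ j' \neq j}} \left( \dfrac{r_{j'}/2}{\max\left(D_{jj'} - r_{j'}/2,\ \varepsilon\right)} \right)^{\tau} c_{j'}} && \forall j \in B'' \tag{1}\\
\sum_{t \in T} A_t z_{jt} &= g_j && \forall j \in B'' \tag{2}\\
D_{jt} z_{jt} &\le r_j u_{jt} && \forall j \in B'',\ t \in T \tag{3}\\
\sum_{j \in B'' \cup \{b'\}} z_{jt} &= 1 && \forall t \in T \tag{4}\\
u_{jt} &\le f_j \oplus s_j && \forall j \in B,\ t \in T \tag{5}\\
f_j &\le s_j && \forall j \in B \tag{6}\\
\sum_{j \in F} f_j &\le U \tag{7}\\
c_j &\le M_j && \forall j \in B'' \tag{8}\\
0 \le r_j &\le R_j && \forall r_j \in Y_j,\ j \in B'' \tag{9}\\
B_{B_j}(g_j, c_j) &\le \beta_j && \forall j \in B'' \tag{10}\\
z_{jt} &= 0 \text{ or } 1 && \forall j \in B'',\ t \in T \tag{11}\\
f_j &= 0 \text{ or } 1 && \forall j \in B \tag{12}\\
c_j &\in Z^{+} && \forall j \in B'' \tag{13}
\end{align}

Here $\alpha$, $\tau$, $\varepsilon$, and $\beta_j$ denote the voice activity factor, the attenuation factor, a small number, and the CBP threshold of BS $j$, respectively (Table 3).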

Constraint (1) ensures that each traffic demand is serviced by a BS with the required QoS. The second and third terms of the denominator represent intra-cell and inter-cell interference, respectively. Constraint (2) calculates the aggregate flow of each BS $j$. Constraint (3) requires that an MU must be in the coverage area of a BS before it can be served by that BS. Constraint (4) guarantees that each MU cannot home into more than one BS. Constraint (5) ensures that $u_{jt}$ can be 1 only if $f_j \oplus s_j = 1$, which holds in each of the following cases: (i) BS $j$ has failed ($s_j = 1$) and has to be fixed ($f_j = 1$); and (ii) BS $j$ is operating normally ($s_j = 0$), so that maintenance is not needed ($f_j = 0$). Constraint (6) ensures that the DV to fix a BS can be set to 1 only if $s_j = 1$. Constraint (7) guarantees that the total number of fixed BSs does not exceed a pre-defined threshold $U$ (available resources). Constraint (8) ensures that the number of users who can be active at the same time in a BS is no greater than the upper bound $M_j$. Constraint (9) ensures that the transmission power radius of each BS $j$ is between 0 and $R_j$. Constraint (10) requires that a BS serves its slave MUs under a pre-defined CBP threshold. Constraints (11) and (12) enforce the binary property of the DVs. Finally, Constraint (13) denotes the integer and nonnegative property of DV $c_j$ for channel allocation.

Table 3
Definitions of parameters in the extended survivability model

| Notation | Description |
| --- | --- |
| $\varepsilon$ | A small number |
| $\alpha$ | Voice activity factor |
| $\tau$ | Attenuation factor |
| $\beta_j$ | Threshold of the call blocking probability (CBP) for each BS $j$, $j \in B$ |
| $A_t$ | The traffic requirement of MU $t$ (in Erlangs), $t \in T$ |
| $b'$ | The artificial BS that carries a rejected call when the admission control function decides to reject the call request |
| $B_{B_j}(g_j, c_j)$ | CBP of the Erlang-B function with aggregate traffic $g_j$ and available channels $c_j$ in BS $j$ |
| $D_{jj'}$ | Distance between BS $j$ and BS $j'$ |
| $D_{jt}$ | Distance between BS $j$ and MU $t$ |
| $E_b$ | The energy that BSs receive |
| $G$ | The processing gain |
| $M_j$ | Upper bound (UB) on the number of users that can be active at the same time in BS $j$, $j \in B$ |
| $N_0$ | Background noise |
| $N_{total}$ | Total noise |
| $P$ | The power a BS receives from an MU that is homed to the BS with perfect power control |
| $R_j$ | Upper bound on the transmission power radius of BS $j$, $j \in B$ |
| $s_j$ | Indicator function of BS $j$, which is 1 if the BS has failed, and 0 otherwise |
| $U$ | UB on the number of failed BSs to be repaired |
| $u_{jt}$ | The coverage indicator function, which is 1 if MU $t$ can be served by BS $j$, and 0 otherwise |
| $Y_j$ | The set of transmission radii of BS $j$ |

Table 4
Decision variables in the extended survivability model

| Notation | Description |
| --- | --- |
| $c_j$ | The number of MUs active at the same time in BS $j$ |
| $f_j$ | DV, which is 1 if failed BS $j$ is fixed, and 0 otherwise |
| $f_j \oplus s_j$ | DV, described in the paper as the exclusive-or (XOR) function of $f_j$ and $s_j$; as used in problem (LR) it is expressed by $(1 - f_j)(1 - s_j) + f_j s_j$, which equals 1 when $f_j = s_j$ |
| $g'$ | DV of aggregate lost traffic in the system, where $g' = \sum_{t \in T} A_t \left(1 - \sum_{j \in B''} z_{jt}\right)$ |
| $g_j$ | DV of aggregate traffic in BS $j$ |
| $r_j$ | DV of transmission power radius in BS $j$ |
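To make the structure of the QoS Constraint (1) concrete, the sketch below evaluates its right-hand side for one BS. It assumes the reading of the constraint in which the inter-cell term for a neighbor $j'$ is $\left(\frac{r_{j'}/2}{\max(D_{jj'} - r_{j'}/2,\ \varepsilon)}\right)^{\tau} c_{j'}$, and it takes all ratios as linear values rather than dB. The function names and the two-BS numbers are hypothetical.

```python
def qos_rhs(j, c, r, D, p_over_n0, G, alpha, tau, eps=1e-6):
    """Right-hand side of Constraint (1) for BS j (linear ratios, not dB)."""
    k = alpha * p_over_n0 / G                 # common factor (1/G) * alpha * P/N0
    intra = k * (c[j] - 1)                    # intra-cell interference term
    inter = k * sum(                          # inter-cell interference term
        ((r[jp] / 2) / max(D[j][jp] - r[jp] / 2, eps)) ** tau * c[jp]
        for jp in c if jp != j
    )
    return p_over_n0 / (1 + intra + inter)

def qos_ok(j, c, r, D, req, **params):
    """True if the required Eb/Ntotal is met at BS j."""
    return req <= qos_rhs(j, c, r, D, **params)

# Hypothetical two-BS instance.
c = {0: 5, 1: 4}                  # active users per BS
r = {0: 2.0, 1: 2.0}              # power radii
D = {0: {1: 3.0}, 1: {0: 3.0}}    # BS-to-BS distances
params = dict(p_over_n0=5.0, G=100.0, alpha=0.5, tau=4)
rhs0 = qos_rhs(0, c, r, D, **params)
```

Raising `c[j]` or the radius of a neighbor enlarges the denominator, which is why the constraint couples channel assignment and power control across BSs.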

4. Solution approach

4.1. Lagrangean relaxation

The procedure of the Lagrangean relaxation (LR) method is as follows: relax complicating constraints, multiply the relaxed constraints by the corresponding Lagrangean multipliers, and add them to the primal objective function. The primal optimization problem (IP) can be transformed into the problem (LR) and then into a Lagrangean dual problem. The best primal feasible solution found is an upper bound (UB) of the primal problem, while the Lagrangean dual problem solution guarantees the lower bound (LB) of the primal problem. Generally, selecting a level of relaxation is a tradeoff between two properties: the tightness of the gaps between the bounds produced, and the computation time needed to obtain these bounds. To reduce the computational complexity, the constraints that contain at least two decision variables are relaxed, i.e., Constraints (1)–(3), because those DVs can be separated and decided independently after relaxing the respective constraints. In addition, some constraints related to the BS recovery decision, i.e., Constraints (5) and (6), are relaxed. The problem (LR) is further decomposed into three independent subproblems: (SUB 1) channel assignment, power control, and capacity management; (SUB 2) admission control; and (SUB 3) BS recovery. Each subproblem can be optimally and independently solved.

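\begin{align}
Z_D(v^1_j, v^2_j, v^3_{jt}, v^4_{jt}, v^5_j) = \min\; & \sum_{j \in B''} g_j B_{B_j}(g_j, c_j) + \sum_{t \in T} A_t \Big( 1 - \sum_{j \in B''} z_{jt} \Big) \notag\\
& + \sum_{j \in B''} v^1_j \left[ \left(\frac{E_b}{N_{total}}\right)_{req} + \left(\frac{E_b}{N_{total}}\right)_{req} \frac{1}{G}\,\alpha\,\frac{P}{N_0} \left( (c_j - 1) + \sum_{\substack{j' \in B'' \\ j' \neq j}} \left( \frac{r_{j'}/2}{\max\left(D_{jj'} - r_{j'}/2,\ \varepsilon\right)} \right)^{\tau} c_{j'} \right) - \frac{P}{N_0} \right] \notag\\
& + \sum_{j \in B''} v^2_j \Big( \sum_{t \in T} A_t z_{jt} - g_j \Big) + \sum_{t \in T} \sum_{j \in B''} v^3_{jt} \left( D_{jt} z_{jt} - r_j u_{jt} \right) \notag\\
& + \sum_{t \in T} \sum_{j \in B} v^4_{jt} \big( u_{jt} - (1 - f_j)(1 - s_j) - f_j s_j \big) + \sum_{j \in B} v^5_j (f_j - s_j) \tag{LR}
\end{align}

subject to: (4), (7)–(13).

Subproblem (SUB 1) related to DVs $c_j$, $r_j$, and $g_j$:

\begin{align}
Z_{SUB1} = \min \sum_{j \in B''} \Bigg( & g_j B_{B_j}(g_j, c_j) + \left(\frac{E_b}{N_{total}}\right)_{req} \frac{1}{G}\,\alpha\,\frac{P}{N_0}\, v^1_j \left( c_j + \sum_{\substack{j' \in B'' \\ j' \neq j}} \left( \frac{r_{j'}/2}{\max\left(D_{jj'} - r_{j'}/2,\ \varepsilon\right)} \right)^{\tau} c_{j'} \right) \notag\\
& + \sum_{t \in T} \left( v^4_{jt} - v^3_{jt} r_j \right) u_{jt} - v^2_j g_j + v^1_j \left( \left(\frac{E_b}{N_{total}}\right)_{req} - \frac{P}{N_0} - \left(\frac{E_b}{N_{total}}\right)_{req} \frac{1}{G}\,\alpha\,\frac{P}{N_0} \right) \Bigg) \tag{SUB 1}
\end{align}

subject to: (8)–(10), and (13).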

Problem (SUB 1) can be decomposed into $|B''|$ subproblems, since the values of DVs $r_j$ and $c_j$ are discrete and limited. To get an optimal solution, we exhaustively search for all possible combinations of $c_j$, $r_j$, and $g_j$.

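Subproblem (SUB 2) related to DV $z_{jt}$:

\begin{align}
Z_{SUB2} &= \min \sum_{t \in T} A_t \Big( 1 - \sum_{j \in B''} z_{jt} \Big) + \sum_{j \in B''} v^2_j \sum_{t \in T} A_t z_{jt} + \sum_{t \in T} \sum_{j \in B''} v^3_{jt} D_{jt} z_{jt} \notag\\
&= \min \sum_{t \in T} \sum_{j \in B''} \left( v^3_{jt} D_{jt} + (v^2_j - 1) A_t \right) z_{jt} + \sum_{t \in T} A_t \tag{SUB 2}
\end{align}

subject to: (4) and (11).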

In (SUB 2), the second term $\sum_{t \in T} A_t = |T| \cdot A_t$ is a constant value, i.e., the total aggregate traffic in the system. This can be dropped initially and added to the optimal value later, since it will not affect the optimal solution of (SUB 2). The subproblem can then be decomposed into $|B''| \times |T|$ independent subproblems for each $\mu_{jt} = v^3_{jt} D_{jt} + (v^2_j - 1) A_t$, where $j \in B''$ and $t \in T$. To derive the minimal value of (SUB 2), for each MU $t$ we set $z_{jt} = 1$ for the BS $j$ with the minimal value of $\mu_{jt}$, provided that $\mu_{jt} \le 0$; otherwise all $z_{jt}$ of that MU are set to 0 and, by Constraint (4), the MU is homed to the artificial BS $b'$.
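The assignment rule for (SUB 2) can be sketched as follows, under the interpretation that Constraint (4) lets each MU home to at most one real BS and that an MU is rejected (homed to the artificial BS) when its best coefficient is positive. The function name `solve_sub2` and the multiplier values are hypothetical.

```python
def solve_sub2(A, D, v2, v3):
    """For each MU t, home it to the BS with the smallest mu_jt if that value is <= 0."""
    home = {}
    value = sum(A.values())  # constant term sum_t A_t, added back to the optimum
    for t, a_t in A.items():
        # mu_jt = v3_jt * D_jt + (v2_j - 1) * A_t for every workable BS j
        mu = {j: v3[j][t] * D[j][t] + (v2[j] - 1.0) * a_t for j in v2}
        j_best = min(mu, key=mu.get)
        if mu[j_best] <= 0:
            home[t] = j_best          # z_jt = 1 for this BS
            value += mu[j_best]
        else:
            home[t] = None            # rejected: homed to the artificial BS b'
    return home, value

# Hypothetical instance with two BSs and two MUs.
A = {"t0": 0.11, "t1": 0.11}
D = {0: {"t0": 1.0, "t1": 1.0}, 1: {"t0": 1.0, "t1": 1.0}}
v2 = {0: 0.5, 1: 2.0}
v3 = {0: {"t0": 0.0, "t1": 1.0}, 1: {"t0": 0.0, "t1": 0.0}}
home, z_sub2 = solve_sub2(A, D, v2, v3)
```

In this instance MU `t0` is admitted by BS 0 because its coefficient is negative, while `t1` is rejected because every coefficient is positive.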

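Subproblem (SUB 3) related to $f_j$:

\begin{align}
Z_{SUB3} &= \min\; -\sum_{t \in T} \sum_{j \in B} v^4_{jt} \big( (1 - f_j)(1 - s_j) + f_j s_j \big) + \sum_{j \in B} v^5_j (f_j - s_j) \notag\\
&= \min \sum_{j \in B} \Big( \sum_{t \in T} v^4_{jt} (1 - 2 s_j) + v^5_j \Big) f_j + \sum_{j \in B} \Big[ \Big( \sum_{t \in T} v^4_{jt} - v^5_j \Big) s_j - \sum_{t \in T} v^4_{jt} \Big] \tag{SUB 3}
\end{align}

subject to: (7) and (12).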

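In (SUB 3), the second term $\sum_{j \in B} \big[ \big( \sum_{t \in T} v^4_{jt} - v^5_j \big) s_j - \sum_{t \in T} v^4_{jt} \big]$ is a constant, which can be calculated easily. Let $\lambda_j = \sum_{t \in T} v^4_{jt} (1 - 2 s_j) + v^5_j$ so that (SUB 3) can be decomposed into $|B|$ independent subproblems for each $\lambda_j$. To obtain the minimal value of (SUB 3), we set $f_j = 1$ for at most $U$ BSs, taken in increasing order of $\lambda_j$ (Constraint (7)), and $f_j = 0$ otherwise. A BS is selected only while its $\lambda_j$ is negative, since a nonnegative coefficient cannot lower the objective value.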
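A minimal sketch of this selection rule is given below. It assumes that only failed BSs are candidates for repair and that the sums $\sum_{t} v^4_{jt}$ have already been accumulated per BS; the function name `solve_sub3` and all multiplier values are hypothetical.

```python
def solve_sub3(failed, s, v4_sum, v5, U):
    """Set f_j = 1 for at most U failed BSs with the most negative lambda_j."""
    # lambda_j = (sum_t v4_jt) * (1 - 2 s_j) + v5_j
    lam = {j: v4_sum[j] * (1 - 2 * s[j]) + v5[j] for j in s}
    # Only negative coefficients can lower the objective; sort them ascending.
    candidates = sorted((j for j in failed if lam[j] < 0), key=lam.get)
    fixed = set(candidates[:U])
    f = {j: int(j in fixed) for j in s}
    return f, lam

# Hypothetical instance: BSs 0-2 have failed, BS 3 is operating normally.
s = {0: 1, 1: 1, 2: 1, 3: 0}
v4_sum = {0: 0.5, 1: 0.2, 2: 0.1, 3: 0.3}
v5 = {0: 0.1, 1: 0.3, 2: 0.0, 3: 0.0}
f_two, lam = solve_sub3({0, 1, 2}, s, v4_sum, v5, U=2)
f_one, _ = solve_sub3({0, 1, 2}, s, v4_sum, v5, U=1)
```

Here BS 1 is never repaired because its coefficient is positive, even when the repair budget `U` would allow it.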

4.2. Lagrangean dual problem and subgradient method

According to the weak Lagrangean duality theorem [11], for any $(v^1_j, v^3_{jt}, v^4_{jt}, v^5_j) \ge 0$ and any $v^2_j$, the objective value of $Z_D(v^1_j, v^2_j, v^3_{jt}, v^4_{jt}, v^5_j)$ is a lower bound on $Z_{IP}$. Based on the problem (LR), the following dual problem (D) is constructed to calculate the tightest lower bound (LB).

\begin{align}
Z_D = \max Z_D(v^1_j, v^2_j, v^3_{jt}, v^4_{jt}, v^5_j) \tag{D}
\end{align}

subject to: $(v^1_j, v^3_{jt}, v^4_{jt}, v^5_j) \ge 0$, with $v^2_j$ unrestricted in sign.

Then, a subgradient method is applied to solve the dual problem. Let the vector $S$ be a subgradient of $Z_D(v^1_j, v^2_j, v^3_{jt}, v^4_{jt}, v^5_j)$ at $(v^1_j, v^2_j, v^3_{jt}, v^4_{jt}, v^5_j)$. In iteration $k$ of the subgradient optimization procedure, the multiplier vector $\pi$ is updated by $\pi^{k+1} = \pi^k + t^k S^k$, in which $t^k$ is a step size determined by $t^k = \delta \left( Z^{*}_{IP} - Z_D(\pi^k) \right) / \|S^k\|^2$, where $Z^{*}_{IP}$ is an UB on the primal objective function value after iteration $k$, and $\delta$ is a constant, where $0 \le \delta \le 2$.
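One multiplier update can be sketched as below. The projection of the inequality multipliers onto nonnegative values is a standard ingredient of subgradient methods and is an assumption here, since the text does not spell it out; the function name and the sample numbers are hypothetical.

```python
def subgradient_step(pi, S, z_ub, z_dual, delta, unrestricted=()):
    """One update pi <- pi + t * S with t = delta * (z_ub - z_dual) / ||S||^2."""
    norm_sq = sum(x * x for x in S)
    if norm_sq == 0:
        return list(pi), 0.0  # zero subgradient: the multipliers stay where they are
    t = delta * (z_ub - z_dual) / norm_sq
    new_pi = []
    for i, (p, x) in enumerate(zip(pi, S)):
        q = p + t * x
        # Assumption: multipliers of relaxed inequalities are kept nonnegative,
        # while those of equality constraints (such as v^2_j) may take any sign.
        new_pi.append(q if i in unrestricted else max(0.0, q))
    return new_pi, t

# Hypothetical data: index 1 plays the role of an equality-constraint multiplier.
new_pi, step = subgradient_step(
    pi=[0.0, 1.0, 0.5], S=[1.0, -2.0, -2.0],
    z_ub=10.0, z_dual=1.0, delta=2.0, unrestricted={1},
)
```

The step size shrinks automatically as the dual value approaches the upper bound, because the gap `z_ub - z_dual` appears in the numerator.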

4.3. Getting primal feasible solutions

Based on the problem (LR), the dual problem $Z_D = \max Z_D(v^1_j, v^2_j, v^3_{jt}, v^4_{jt}, v^5_j)$ is constructed to calculate the tightest LB subject to $(v^1_j, v^3_{jt}, v^4_{jt}, v^5_j) \ge 0$ and any $v^2_j$. We have developed Algorithm A to get primal feasible solutions.

Algorithm A

Step 1. Check the decision feasibility of BS recovery by Constraint (6). Reset $f_j = 0$ if the solution of problem $Z_D$ set $f_j = 1$ for a BS $j$ that is in a workable state; otherwise, go to Step 2.

Step 2. Recalculate the indication function $\mu_{jt}$ if DV $f_j$ was adjusted in Step 1.

Step 3. Check QoS Constraint (1) for each BS $j$.

Step 3.1. Reduce the transmission power radius $r_j$ if the QoS constraint is still violated; otherwise go to Step 3.2.

Step 3.2. Compute the aggregate traffic flow $g_j$ based on the transmission power radius $r_j$ determined in Step 3.1.

Step 4. Check the CBP Constraint (10), $B_{B_j}(g_j, c_j) \le \beta_j$. If it is still violated, increase the available channels $c_j$ to meet the requirement $\beta_j$; otherwise, go to Step 5.

Step 5. Adjust the transmission power radius $r_j$ so that it just covers all MUs in each BS $j$.

Step 6. Calculate the total blocked traffic in the system based on the DVs, including $c_j$, $r_j$, $g_j$, and $z_{jt}$, solved in the previous steps.

Step 7. End algorithm.
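Step 4 of Algorithm A can be illustrated with a short search for the smallest number of channels that meets the CBP threshold. This is a sketch under the assumption that channels are added one at a time up to the bound of Constraint (8); the function name `min_channels` is hypothetical.

```python
def min_channels(g, beta, max_channels):
    """Smallest c <= max_channels with Erlang-B B(g, c) <= beta, or None if none exists."""
    b = 1.0  # B(g, 0) = 1
    for c in range(1, max_channels + 1):
        b = g * b / (c + g * b)  # Erlang-B recursion from c-1 to c channels
        if b <= beta:
            return c
    return None  # the threshold cannot be met within the channel bound

c_needed = min_channels(2.0, 0.03, 120)  # 2 Erlangs, threshold 0.03, bound 120
c_capped = min_channels(2.0, 0.03, 5)    # same demand with only 5 channels allowed
```

For 2 Erlangs the blocking probability falls to about 0.037 with five channels and about 0.012 with six, so six channels are needed to satisfy a 0.03 threshold; when the bound is five, no feasible value exists and the traffic in that BS would have to be reduced instead.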

5. Computational experiments

5.1. Experiment environment

The parameters used in the experiment are as follows [12,13]: $P/N_0 = 7$ dB, $E_b/N_{total} = 6$ dB, $M_j = 120$, $G = 156.25$, $A_t = 0.11$, $\alpha = 0.75$.

The algorithm is implemented in the C programming language and runs on a PC with an Intel P4 1.6 GHz CPU. We evaluate the algorithm for 5 cases of BS ($|B|$): 40, 80, 120, 160, and 200; and 6 cases of MU ($|T|$): 500, 1000, 1500, 2000, 2500, and 3000. The number of failed BSs ($|F|$) accounts for one-tenth of all BSs, and the BS recovery ratios (BSRR) are 0.00, 0.25, 0.50, and 0.75. The limit of iterations for the Lagrangean relaxation approach is 1000, and the improvement counter is 25. The parameter $\delta$ adopted in the subgradient method is initialized to 2 and halved when the dual objective function value does not improve for 25 iterations. In addition, a pre-defined CBP threshold for each BS is given as $\beta_j = 0.03$.
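The step-size schedule described above can be sketched as follows. The sketch assumes that the improvement counter is reset after each halving and that "improve" means a strictly larger dual value than the best seen so far; both details, and the function name `delta_schedule`, are assumptions rather than statements from the paper.

```python
def delta_schedule(dual_values, delta0=2.0, patience=25):
    """Return the delta used at each iteration; halve it after `patience` stalled iterations."""
    delta, best, stall, used = delta0, float("-inf"), 0, []
    for z in dual_values:
        used.append(delta)           # delta applied in this iteration
        if z > best:
            best, stall = z, 0       # dual value improved: reset the counter
        else:
            stall += 1
            if stall == patience:    # no improvement for `patience` iterations
                delta, stall = delta / 2.0, 0
    return used

# Hypothetical trace: one improvement followed by 26 stalled iterations.
used = delta_schedule([1.0] + [0.5] * 26)
```

In this trace the first 26 iterations use a value of 2, and the 27th uses 1 because the dual value has stalled for 25 consecutive iterations.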

5.2. Performance analysis

The analysis of the scalability of the extended model is based on $|T| = 3000$ (Fig. 3) and $|B| = 200$ (Fig. 4). Several values of $|B|$ and $|T|$ are compared in these figures. Our experiments calculate all near-optimal solutions with gaps of less than 0.001%, and the results include the CBR (UB of $Z_{IP}$), the service rate, and the CPU time. We now discuss the results in detail.

(1) CBR ($R_B$): In the case of $|T| = 3000$, the difference in $R_B$ is insignificant across the values of $|B|$, as shown in Fig. 3(a). In the case of $|B| = 200$ in Fig. 4(a), $R_B$ is a monotonically increasing function of $|T|$, which affects $R_B$ significantly. In Fig. 3(b), the case of $|T| = 3000$, the increase in $R_B$ varies between 15% and 71% in the case of $|B| = 40$, while it is in the range 10–44% in the case of $|B| = 200$. The BSRR is a key factor in reducing $R_B$ for a small number of BSs. In Fig. 4(b), given $|B| = 200$, the maximum increase of $R_B$ is 27% in the case of $|T| = 500$, while it is 44% in the case of $|T| = 3000$. The BSRR is an important factor in reducing $R_B$ when the system is heavily loaded.

(2) Service rate ($R_S$): Obviously, $R_S$ is a monotonically increasing function of the BSRR. However, the BSRR has an insignificant impact on $R_S$ in all cases of $|B|$ and $|T|$ in Figs. 3(c) and 4(c), respectively. Irrespective of which case is calculated, $R_S$ is in the range 0.92–0.98.

(3) CPU time: Even though our experiments run the LR method up to 1000 iterations, the experiments converge in less than 1000 iterations. Since survivability analysis focuses on available BSs, the higher the required BSRR, the more CPU time is consumed. In the case of $|T| = 3000$ in Fig. 3(d), the CPU time consumed is between 8 and 16 s in the smallest case $|B| = 40$, while it is between 209 and 556 s in the largest case $|B| = 200$. In the case of $|B| = 200$ in Fig. 4(d), the CPU time consumed is between 57 and 117 s in the smallest case $|T| = 500$, while it is between 209 and 556 s in the largest case $|T| = 3000$. The time consumed increases more when the number of BSs increases than when the number of MUs increases.

Comparison of the two models demonstrates the flexibility of the modeling and that the extended model is capable of solving the scalability problem. Table 5 compares the CPU time of the two models, where the problem size is given by $|T|$, $|B|$, and $|F|$; and $|F'|$ is defined by the BSRR, i.e., $|F'| = |F| \times \text{BSRR}$. The total number of candidate BSs to be fixed is calculated by $C(|F|, |F'|)$.

Fig. 3. Analysis of the extended survivability model, with $|T| = 3000$: (a) CBR with respect to BSRR; (b) increasing CBR compared with the normal case; (c) service rate with respect to BSRR; (d) CPU time with respect to BSRR. Horizontal axis: $|B|$ = 40, 80, 120, 160, 200; curves for the normal case (except in (b)) and for BSRR = 0.75, 0.50, 0.25, and 0.00.

Fig. 4. Analysis of the extended survivability model, given $|B| = 200$: (a) CBR with respect to BSRR; (b) increasing CBR compared with the normal case; (c) service rate with respect to BSRR; (d) CPU time with respect to BSRR. Horizontal axis: $|T|$ = 500, 1000, 1500, 2000, 2500, 3000; curves for the normal case (except in (b)) and for BSRR = 0.75, 0.50, 0.25, and 0.00.

Table 5
Comparison of survivability models in terms of scalability

| Model | $\lvert T\rvert / \lvert B\rvert / \lvert F\rvert$ | CPU/$C(\lvert F\rvert, \lvert F'\rvert)$, BSRR = 0.25 | CPU/$C(\lvert F\rvert, \lvert F'\rvert)$, BSRR = 0.5 | CPU/$C(\lvert F\rvert, \lvert F'\rvert)$, BSRR = 0.75 |
| --- | --- | --- | --- | --- |
| Basic | 1000/16/4 | 344/4 | 517/6 | 346/4 |
| Extended | 3000/200/4 | 250/20 | 310/190 | 400/1140 |

Fig. 5. Sensitivity analysis of the Lagrangean multipliers for network survivability with different BSRRs: (a) BSRR = 0.75, (b) BSRR = 0.5, (c) BSRR = 0.25, (d) BSRR = 0. Each panel plots the mean value of multipliers V1–V5 against the iteration number.

6. Sensitivity analysis

In this section, given $|B| = 36$ and $|T| = 1000$, we analyze the mean value of the following multipliers: V1 related to Constraint (1), V2 related to Constraint (2), V3 related to Constraint (3), V4 related to Constraint (5), and V5 related to Constraint (6). All values of V5 are calculated close to zero. Fig. 5 shows the effect of the BSRRs on the values of the multipliers, which are functions of the number of iterations. The calculations converge with different numbers of iterations. Nevertheless, the multipliers are all increasing functions of the number of iterations, except for multiplier V2, because it can be a negative value. The results show that the QoS constraint (V1) is the most important in the survivability problem, especially when the BSRR is 0.25. This finding is validated by Fig. 6. In Fig. 6(a), the V1 value converges to 0.01 (the largest value) with respect to BSRR = 0.25 (vs. other BSRRs). The CBR is affected by the constraint of checking the aggregate flow (V2), as in Fig. 6(b). This constraint becomes much more important with a smaller BSRR (0–0.25) than a larger BSRR (0.5–0.75). The coverage constraint (V3) plays a significant role when no BS is recovered, as shown in Fig. 6(c).

We also investigate the effects of the problem of scalability. Given $|B| = 200$ in Fig. 7, irrespective of which BSRR is considered, the QoS constraint (V1) is more critical when the system has a heavy load ($|T|$ larger than 2500) than when there is a light load ($|T|$ less than 2500). The larger $|T|$ is, the larger the value obtained. When the system's load is light, the constraint for calculating the aggregate flow is more important than the QoS constraint. This means that it is more important to minimize the CBR for a light load than for a heavy load.

In Fig. 8, given |T| = 3000, the QoS constraint (V1) is a significant factor. Nevertheless, the multiplier value decreases substantially as |B| increases, because the average system load decreases. The sensitivity analysis is consistent with the experimental analysis.

[Figure 6, panels (a)-(d): mean multiplier value (vertical axis) versus iteration number (horizontal axis), with one curve each for BSRR = 0, 0.25, 0.5, and 0.75.]


7. Concluding remarks

7.1. Research contributions

Survivability is always a crucial issue in both wireline and wireless networks. For BS failures, where maintenance is undertaken to ensure a network's performance, we propose a flexible survivability model that relocates spare resources so that total call blocking is minimized. In other words, we consider the survivability issue at the operational level. We reformulate the basic model to reduce the time complexity from O(N^2) to O(1), where N is the number of failed BSs. This shows that the mathematical formulation itself is an important concern in problem solving. The goal of the extended survivability model is to offer survivable services that minimize total system call blocking. In addition, the model can be applied to dynamic capacity expansion. A comparison of the basic and extended models demonstrates the flexibility of our modeling, and shows that the extended model can deal with the scalability problem by allocating portable/mobile BSs wherever they are needed.

[Figure 7, panels (a)-(d): mean multiplier value (vertical axis, 0 to 0.0001) versus |T| (horizontal axis, 500 to 3000) for multipliers V1 to V5.]

Fig. 7. Sensitivity analysis of network survivability given |B| = 200 and different BSRRs. The Lagrangean multiplier is a function of |T|: (a) BSRR = 0.75, (b) BSRR = 0.5, (c) BSRR = 0.25, (d) BSRR = 0.

7.2. Solution approach advantages

We apply LR as our solution approach because the complicated optimization problem discussed in this paper is NP-hard. Although LR is a standard solution technique that can solve a wide range of combinatorial optimization problems, solving those problems efficiently and effectively also requires attention to the following nontrivial and novel tasks/stages: (1) the problem formulation stage: finding a suitable formulation that can be decomposed into a number of subproblems to which LR can be successfully applied, which may require many attempts to reformulate the problem by trial and error; (2) the solution procedure stage: deciding which constraints should be relaxed and how the critical parameters should be determined, so that the optimal solution and a high convergence rate can be achieved; (3) the primal feasible solution stage: applying the Lagrangean multipliers and developing an efficient algorithm to obtain primal feasible solutions, which is a challenging issue. We also use the Lagrangean multipliers for sensitivity analysis, so that the corresponding constraints can be evaluated for decision support.
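The three stages can be illustrated on a toy instance. The sketch below is an assumption-laden illustration and not the authors' algorithm: the formulation (terminals, BS coverage lists, BS capacities, unit blocking cost), the choice to relax the capacity constraints, and the greedy repair heuristic are all hypothetical. It shows how a relaxed solution yields a lower bound on blocked calls, and how a primal heuristic turns it into a feasible assignment that yields an upper bound.

```python
def relaxed_assignment(coverage, u):
    """Stage 2: with capacities relaxed, each terminal decides on its own."""
    assign = []
    for bss in coverage:
        best = min(bss, key=lambda b: u[b])
        # Serving costs u[best]; blocking costs 1.
        assign.append(best if u[best] < 1.0 else None)
    return assign


def lower_bound(assign, capacity, u):
    """Lagrangean dual value: a lower bound on the minimum blocked calls."""
    cost = sum(1.0 if b is None else u[b] for b in assign)
    return cost - sum(u[b] * capacity[b] for b in capacity)


def repair(assign, coverage, capacity):
    """Stage 3: greedy heuristic that restores capacity feasibility."""
    spare = dict(capacity)
    feasible = [None] * len(assign)
    # Keep relaxed choices while the chosen BS still has room.
    for t, b in enumerate(assign):
        if b is not None and spare[b] > 0:
            feasible[t] = b
            spare[b] -= 1
    # Try to re-home the remaining terminals on any covering BS with room.
    for t, bss in enumerate(coverage):
        if feasible[t] is None:
            for b in bss:
                if spare[b] > 0:
                    feasible[t] = b
                    spare[b] -= 1
                    break
    return feasible


# Stage 1 (formulation): five terminals, two BSs with two channels each.
coverage = [[0], [0], [0, 1], [0, 1], [1]]
capacity = {0: 2, 1: 2}
u = {0: 0.5, 1: 0.5}  # multipliers fixed here; normally updated iteratively

relaxed = relaxed_assignment(coverage, u)
lb = lower_bound(relaxed, capacity, u)
feasible = repair(relaxed, coverage, capacity)
blocked = feasible.count(None)
print(lb, blocked)  # 0.5 1
```

The difference between the number of blocked calls in the repaired solution and the lower bound is the duality gap, which is the usual way to judge how good a primal heuristic is when the true optimum is unknown.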

7.3. Managerial implications

The total system CBR is much less affected by the CBP threshold of each BS when the network load is light than when it is heavy; hence, the more traffic is loaded, the less the service rate varies. BS recovery is much more important under a light load than under a heavy load. The BSRR is a key factor in reducing the blocking rate when a small number of BSs are deployed, and it is the most important factor in reducing the blocking rate when the network load is heavy. The proposed extended model provides a survivable service, subject to the allocation of available resources.
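One way to see why the CBP threshold matters little under a light load is to compute the blocking probability of a single BS at different loads. The sketch below assumes an Erlang B loss model, which is a common choice for call blocking but is not confirmed by this section; the channel count, the offered loads, and the two thresholds are made-up values for illustration.

```python
def erlang_b(load_erlangs: float, channels: int) -> float:
    """Blocking probability of an Erlang B loss system.

    Uses the recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids large factorials.
    """
    b = 1.0
    for n in range(1, channels + 1):
        b = load_erlangs * b / (n + load_erlangs * b)
    return b


CHANNELS = 10                # hypothetical channels per BS
THRESHOLDS = (0.01, 0.03)    # hypothetical CBP thresholds

light = erlang_b(2.0, CHANNELS)   # lightly loaded BS
heavy = erlang_b(10.0, CHANNELS)  # heavily loaded BS

for name, cbp in (("light", light), ("heavy", heavy)):
    verdict = [cbp <= th for th in THRESHOLDS]
    print(name, f"{cbp:.6f}", verdict)
```

Under the light load the blocking probability is far below both thresholds, so tightening or loosening the threshold changes nothing. Under the heavy load both thresholds are violated, so the threshold directly limits how much traffic the BS may admit.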

[Figure 8, panels (a)-(d): mean multiplier value (vertical axis, 0 to 0.003) versus |B| (horizontal axis, 40 to 200) for multipliers V1 to V5.]

Fig. 8. Sensitivity analysis of network survivability given |T| = 3000 and different BSRRs. The Lagrangean multiplier is a function of |B|: (a) BSRR = 0.75, (b) BSRR = 0.5, (c) BSRR = 0.25, (d) BSRR = 0.

References

[1] Medhi D. A unified approach to network survivability for teletraffic networks: models, algorithms and analysis. IEEE Trans Commun 1994;42(2–4):534–48.

[2] Medhi D, Khurana R. Optimization and performance of network restoration schemes for wide-area teletraffic networks. J Network Syst Manage 1995;3(September):265–94.

[3] Clarke LW, Anandalingam G. A bootstrap heuristic for designing minimum cost survivable networks. Comput Oper Res 1995;22(9):921–34.

[4] Koh SJ, Lee CY. A tabu search for the survivable fiber optic communication network design. Comput Indust Eng 1995;28(4):689–700.

[5] Ghashghai E, Rardin RL. Using a hybrid of exact and genetic algorithms to design survivable networks. Comput Oper Res 2002;29(1):53–66.

[6] Snow AP, Varshney U, Malloy AD. Reliability and survivability of wireless and mobile networks. IEEE Comput 2000;33(7):49–55.

[7] Chuprun S, Bergstrom CS. Comparison of FH/CDMA and DS/CDMA for wireless survivable networks. In: Proc IEEE GLOBECOM, vol. 3, 1998. p. 1823–7.

[8] Chen AC. Overview of code division multiple access technology for wireless communications. In: Proc IEEE 27th IECON, Aachen, Germany, 1998. p. T15–24.

[9] Chu K-C, Lin FY-S, Lee S-H. Integrated planning and capacity management of survivable DS-CDMA networks. In: Proc IEEE ICNSC, vol. 2, 2004. p. 1154–9.

[10] Chu K-C. Network survivability and performance modeling in cellular communication systems. GESTS Int Trans Comput Sci Eng 2005;8(1):13–24.

[11] Fisher ML. The Lagrangian relaxation method for solving integer programming problems. Manage Sci 1981;27:1–18.

[12] Hernandez MA, Janssen GJM, Prasad R. Uplink performance enhancement for WCDMA systems through adaptive antenna and multiuser detection. In: Proc IEEE VTC-Spring, vol. 1, 2000. p. 571–5.

[13] Tam W-M, Lau FCM. Analysis of power control and its imperfections in CDMA cellular systems. IEEE Trans Vehic Technol 1999;48(5):1706–17.

Kuo-Chung Chu received his B.S. and M.S. degrees in Computer Science from Feng-Chia University, Taiwan, in 1988 and 1990, respectively; and his Ph.D. degree in Information Management from the Department of Information Management, National Taiwan University in 2005. After graduating from the FCU, he joined the Computer Centre, Academia Sinica, Taiwan, where he was responsible for network systems management. In 1994, Professor Chu joined the Faculty of Information Management, Jin-Wen Institute of Technology, Taipei as a lecturer. Since 2005, he has been an Associate Professor in that department. His research interests include decision modeling, simulation, network planning and optimization, management, and performance evaluation of mobile wireless communications networks.

Professor Frank Yeong-Sung Lin received his B.S. degree in Electrical Engineering from the Department of Electrical Engineering, National Taiwan University in 1983; and his Ph.D. degree in Electrical Engineering from the Electrical Engineering Department, University of Southern California in 1991. After graduating from the USC, he joined Telcordia Technologies (formerly Bell Communications Research, abbreviated as Bellcore) in New Jersey, USA, where he was responsible for developing network planning and capacity management algorithms for a wide range of advanced networks. In 1994, Prof. Lin joined the Faculty of Electronic Engineering, National Taiwan University of Science and Technology. Since 1996, he has been with the Department of Information Management, National Taiwan University. His research interests include network optimization, network planning, performance evaluation, high-speed networks, wireless communications systems, distributed algorithms, and information security.