Channel - System Specification - A Markov-Decision-Based Admission Control Mechanism Using Neighboring State Information in Road-Based Cellular Networks

Chapter 3 System Specification

3.3 Channel

In our model, we assume that each cell can support up to C mobiles simultaneously and that each mobile uses one channel for its connection. As a result, each cell has C channels. In our cellular system structure, every cell is surrounded by six cells. The C channels of a cell and the 6C channels of its adjacent cells therefore evolve as a two-dimensional Markov chain, as shown in Fig. 3.2 below.

In Fig. 3.2, the first dimension is the channel state of the base cell, where C1 is the fixed link capacity C. The second dimension is the total channel state of all adjacent cells, where C2 is six times the fixed link capacity C. The total number of states of this two-dimensional Markov chain is C1 × C2.

Fig. 3.2 Total states diagram of the two-dimensional Markov chain

Chapter 4 Problem Formulation by MDP in FA Strategy and Proposed Method

In this chapter, we introduce how to find the optimal policy under a Markov Decision Process (MDP). Making a correct decision depends on the cost function. We then use the policy-iteration method and the state aggregation method to solve the MINOBJ problem.

In the case of single-service networks, Krishnan and Ott [15], and Lazarev and Starobinets [16] have proposed state dependent routing schemes with roots in Markov decision theory. We use the separable routing concept defined by Krishnan and Ott which is appropriately modified for the case of cellular networks. We also study the problem of call admission control where we follow Zachary’s procedure [17] to determine the cost of rejecting new calls and dropping handoff calls.

4.1 Our Model

The cell is described by a two-dimensional Markov chain with the following assumptions:

1. New call arrivals in the base cell and in the adjacent cells follow stationary Poisson processes with mean rates λ1 and λ2, respectively.

2. The holding time of both new and handoff calls is exponentially distributed with departure rate μ.

3. The times until handoff from the base cell to adjacent cells and from adjacent cells to the base cell are also exponentially distributed, with rates h1 and h2, respectively.

We consider a homogeneous system where each radio cell can support up to C calls. The cell state vector n(t), which provides the complete state description of the cell at any time instant, is defined as

$$n(t) = (x, y), \quad \forall\, n \in N \tag{4.1}$$

where x is the number of calling mobiles in the base cell at time t, and y is the number of calling mobiles in all adjacent cells at time t. The cell space is denoted by N, which contains a finite but large number of states. The state transition rate diagram is shown in Fig. 4.1.

Fig. 4.1 State transition diagram
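As an illustration, the assumptions of this section can be written as a small transition-rate function. This is a minimal sketch, not the model used in this work: since the transition diagram is not reproduced here, it assumes that departures and handoffs scale with the number of active calls (x·μ, x·h1, y·μ, y·h2), that every call is admitted while a channel is free, that a handoff into a full cell is dropped, and that handoffs among the adjacent cells themselves are ignored. The function name `transition_rates` and all numeric values are hypothetical.

```python
def transition_rates(x, y, C1, C2, lam1, lam2, mu, h1, h2):
    """Outgoing rates from state (x, y) when every call that fits is admitted.

    x: calls in the base cell, y: calls in all adjacent cells.
    Assumption (not stated in the text): per-call rates scale with occupancy.
    """
    rates = {}

    def add(target, rate):
        if rate > 0:
            rates[target] = rates.get(target, 0.0) + rate

    if x < C1:
        add((x + 1, y), lam1)              # new call in the base cell
    if y < C2:
        add((x, y + 1), lam2)              # new call in an adjacent cell
    if x > 0:
        add((x - 1, y), x * mu)            # call ends in the base cell
        if y < C2:
            add((x - 1, y + 1), x * h1)    # handoff base -> adjacent, accepted
        else:
            add((x - 1, y), x * h1)        # handoff base -> adjacent, dropped
    if y > 0:
        add((x, y - 1), y * mu)            # call ends in an adjacent cell
        if x < C1:
            add((x + 1, y - 1), y * h2)    # handoff adjacent -> base, accepted
        else:
            add((x, y - 1), y * h2)        # handoff adjacent -> base, dropped
    return rates


# Hypothetical example: C = 2 channels per cell, so C1 = 2 and C2 = 12.
example = transition_rates(1, 2, C1=2, C2=12, lam1=0.5, lam2=3.0, mu=0.2, h1=0.1, h2=0.05)
```

Each key of the returned dictionary is a neighboring state of (x, y) and each value is the rate of moving there, which is the information a transition diagram such as Fig. 4.1 encodes.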

4.2 Alternatives and Costs

The MDP with costs is a means to an end: the analysis of decisions in sequential processes that are Markovian in nature [5]. In this section, we first introduce and define the alternatives and costs of the sequential decision process.

In our cell model, we have two alternatives when a new call (or a handoff call) comes:

• Alternative 1: accept

• Alternative 2: reject (or drop)

We then define a cost ω1 (or ω2) that is incurred when the cell rejects a new call (or drops a handoff call). With these definitions, each alternative leads to a different behavior. As Fig. 4.2 shows, admitting a new (or handoff) call incurs no cost, whereas rejecting (or dropping) it incurs the cost ω1 (or ω2). This analysis will help us find the solution of the sequential decision process.

Fig. 4.2 Transition diagram with alternatives 1 and 2

4.3 Our Policy-Iteration Method

An optimal policy is defined as a policy that minimizes the cost. In principle, we could compute the cost of every candidate decision in order to find the policy with the least cost.

We are interested in infinite-horizon systems, for which the appropriate objective is average cost optimization: our goal is to minimize the expected rate of the cost of lost calls. Let Vπ(t) denote the lost revenue in the cell during the time interval [0, t] under the policy π ∈ Π, where Π is the set of all policies. Then, using the result from [5], we have the expected value

$$E\left[ V_\pi(t) \mid n_0 = n \right] = g_\pi t + v_\pi(n) + o(1), \quad t \to \infty \tag{4.2}$$

where n ∈ N is the cell state at time t = 0. In Markov decision theory, vπ(n) is the well-known relative value, or cost, of starting in state n0 = n. In Eq. (4.2), gπ represents the expected cost per unit time under the policy π on the original continuous-time scale. Since the system is ergodic, we may call gπ the gain of the process. The objective is to minimize the equilibrium expected cost per unit time, that is, gπ. The "small o" symbol o(1) means that, as t → ∞, both the left-hand side (LHS) and the right-hand side (RHS) of the equation go to infinity while the difference between the LHS and gπ t + vπ(n) goes to zero.

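Before finding the relative cost values vπ(n), we define two vectors, e_k and f_k.

[Eqs. (4.3) and (4.4): definitions of the vectors e_k and f_k; not recoverable from the source text]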

Then, when a call departs while the cell is in state n, the immediately subsequent state dk(n) ∈ N is

$$d_k(n) = n - e_k \tag{4.5}$$

A new call admission decision must be made at each call attempt epoch: either accept or reject. We denote the alternative taken on the arrival of a call by πk(n), where n ∈ N is the current cell state. In the case of call rejection,

$$\pi_k(n) = n \tag{4.6}$$

If the new call is accepted, the subsequent state of the cell is

$$\pi_k(n) = n + e_k \tag{4.7}$$

A handoff call admission decision must be made whenever a call crosses a cell boundary: either accept or drop. Using the same notation as above, in the case of dropping a handoff call,

$$\pi_k(n) = n - e_k \tag{4.8}$$

If the handoff call is accepted, the subsequent state of the cell is

$$\pi_k(n) = n + f_k \tag{4.9}$$
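Equations (4.5) to (4.9) amount to adding or subtracting a fixed vector from the state n = (x, y). The sketch below illustrates this under an assumption that is not taken from the text: k = 1 denotes the base cell and k = 2 the adjacent cells, k always refers to the cell in which the event originates, e_1 = (1, 0), e_2 = (0, 1), f_1 = (−1, 1) and f_2 = (1, −1). The function names are hypothetical.

```python
# Assumed vectors (hypothetical; the definitions in Eqs. (4.3)-(4.4) are not shown here).
E = {1: (1, 0), 2: (0, 1)}     # e_k: one call in cell group k (1 = base, 2 = adjacent)
F = {1: (-1, 1), 2: (1, -1)}   # f_k: one call moves out of cell group k into the other


def shift(n, vec, sign=1):
    """Return n + sign * vec for two-component state tuples."""
    return (n[0] + sign * vec[0], n[1] + sign * vec[1])


def departure(n, k):
    """Eq. (4.5): a call ends in cell group k."""
    return shift(n, E[k], -1)


def new_call(n, k, accept):
    """Eq. (4.7) if the new call is accepted, Eq. (4.6) if it is rejected."""
    return shift(n, E[k]) if accept else n


def handoff(n, k, accept):
    """Eq. (4.9) if the handoff is accepted, Eq. (4.8) if it is dropped."""
    return shift(n, F[k]) if accept else shift(n, E[k], -1)
```

With these assumed vectors, a handoff from the adjacent cells into the base cell moves the state from (3, 7) to (4, 6) when accepted and to (3, 6) when dropped.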

Now we introduce how to find the relative cost values vπ(n) for all n ∈ N. The same equation also governs the asymptotic behavior of the process if we assume that it started immediately after the first event that occurred after t = 0. This is because of the ergodic nature of the system: the initial state has no effect on the asymptotic behavior of the process far enough in the future. The first event is either a call termination or a new (handoff) call arrival. The expected time τ until the first event after t = 0 is given as

[Eq. (4.10): expression for τ; not recoverable from the source text]

where we used the memoryless property of the system. Writing Eq. (4.2) for a starting time t = 0 and for a first event time t = τ (the latter conditional on the type of the first event), we obtain, after some rearrangement, the system of linear equations (4.11).

In the system of linear equations (4.11), the unknown variables are vπ(n) for all n ∈ N and the gain of the process gπ. The system has one more variable than equations, so the values vπ(⋅) can be determined only up to an additive constant. To solve system (4.11), we follow the standard procedure of setting vπ(0) = 0. Thus, we get the system

[equation not recoverable from the source text]
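To make the value-determination step concrete, the sketch below evaluates a fixed policy on a deliberately simplified model. It is an illustration under stated assumptions, not the two-dimensional system of this chapter: a single cell with occupancy x in 0..C, a guard-channel policy with threshold T, a new-call rate `lam_new`, a handoff-in rate `lam_ho` (assumed positive), a per-call departure rate `mu`, and a cost rate equal to the arrival rate of refused calls times ω1 or ω2. Because this chain is a birth-death chain, the gain g and the relative values v(x) with v(0) = 0 follow from a simple recursion instead of Gaussian elimination. All names and numbers are hypothetical.

```python
def evaluate_guard_channel(C, T, lam_new, lam_ho, mu, w1, w2):
    """Gain g and relative costs v[0..C] of a 1-D guard-channel policy, with v[0] = 0."""

    def birth(x):
        # Admitted arrival rate: handoffs while x < C, new calls only while x <= T.
        if x >= C:
            return 0.0
        return lam_ho + (lam_new if x <= T else 0.0)

    def cost(x):
        # Expected cost per unit time of the calls turned away in state x.
        c = w1 * lam_new if (x > T or x == C) else 0.0
        if x == C:
            c += w2 * lam_ho
        return c

    # Stationary distribution of the birth-death chain via detailed balance.
    weights = [1.0]
    for x in range(1, C + 1):
        weights.append(weights[-1] * birth(x - 1) / (x * mu))
    total = sum(weights)
    g = sum(w / total * cost(x) for x, w in enumerate(weights))

    # Balance in state x: g = cost(x) + birth(x)*(v[x+1]-v[x]) - x*mu*(v[x]-v[x-1]).
    v, prev_diff = [0.0], 0.0
    for x in range(C):
        diff = (g - cost(x) + x * mu * prev_diff) / birth(x)
        v.append(v[-1] + diff)
        prev_diff = diff
    return g, v


gain, values = evaluate_guard_channel(C=5, T=3, lam_new=2.0, lam_ho=1.0, mu=1.0, w1=1.0, w2=5.0)
```

The recursion uses the balance equations of states 0 to C − 1; the equation of the full state C is then satisfied automatically, which reflects the fact that the relative values are determined only up to an additive constant.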

4.4 Our State Aggregation Method

When Gaussian elimination is used to solve Eq. (4.11), we face the same problem already described in Section 2.5: inverting the transition probability matrix P has complexity O(n³), which is impractical for large n.

We take the Guard Channel policy mentioned in Section 2.2 as an example. The threshold T divides the states of the cell into three groups. States 0 to T form group one, in which all kinds of calls can be accepted. States (T + 1) to (C − 1) form group two, in which only handoff calls can be accepted. Group three consists of the single state C, in which no call can be accepted because no channel is available. This example shows that we can group states that are reachable from one another within a few steps.
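The three groups of the Guard Channel example can be expressed as a small classification function. This is only a sketch of the grouping described above; the function name and the values T = 40 and C = 50 used in the example are hypothetical.

```python
def guard_channel_group(x, T, C):
    """Return 1, 2 or 3 for the group that occupancy x falls into."""
    if not 0 <= x <= C:
        raise ValueError("occupancy out of range")
    if x <= T:
        return 1      # new calls and handoff calls are both accepted
    if x < C:
        return 2      # only handoff calls are accepted
    return 3          # cell is full: no call can be accepted


groups = [guard_channel_group(x, T=40, C=50) for x in (0, 40, 41, 49, 50)]
```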

After that, we use a quantization-like method to divide the one-dimensional Markov chain into groups of equal size, excluding the last state, which forms an independent group. Finally, when the states of the adjacent cells are taken into consideration, the two-dimensional Markov chain can be grouped as shown in Fig. 4.3 below.

Fig. 4.3 Make two-dimensional Markov chain into smaller groups
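The quantization-style grouping can be sketched as a mapping from a detailed state (x, y) to an aggregated state. The group counts used here (5 equal groups plus the full state for the base cell, 10 equal groups plus the full state for the adjacent cells, with capacities 50 and 300) are assumptions chosen for illustration; the actual group boundaries are given by the figure and are not reproduced in the text. The function names are hypothetical.

```python
def aggregate_index(n, capacity, n_groups):
    """Map occupancy n in 0..capacity to a group index; the full state is its own group."""
    if not 0 <= n <= capacity:
        raise ValueError("occupancy out of range")
    if n == capacity:
        return n_groups                    # last state forms an independent group
    size = capacity // n_groups            # equal-size groups over states 0..capacity-1
    return min(n // size, n_groups - 1)    # clamp in case capacity is not divisible


def aggregate_state(x, y, C1=50, C2=300, gx=5, gy=10):
    """Aggregated state of (x, y) under the assumed group counts."""
    return aggregate_index(x, C1, gx), aggregate_index(y, C2, gy)


# Number of distinct aggregated states over the whole detailed state space.
n_aggregated = len({aggregate_state(x, y) for x in range(51) for y in range(301)})
```

With these assumed group counts the detailed state space collapses to 6 × 11 aggregated states.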

Chapter 5 Modification from FA to BDCL by One-Step Policy and Update Rules of Policy with Time-varying MDP Parameters

In this chapter, we introduce how we modify the previous MDP model of the FA strategy into the BDCL strategy by the One-Step Policy, which makes use of the previous computational result. Because the parameters of the MDP model vary with time, we also introduce update rules for the time-varying MDP parameters so that the model fits the actual system.

5.1 Effects of Borrowing Operation

The main difference between FA and BDCL strategies is “Borrowing” operation; therefore, we have to define the state transition diagram of MDP model for “Borrowing” operation.

Fig 5.1 Transition diagram of 3 alternatives when new call, handoff call arrive for Base Cell

As shown in Fig 5.1 above, in the BDCL model there are three alternatives when a new call (or a handoff call) arrives:

• Alternative 1: accept

• Alternative 2: reject (block or drop)

• Alternative 3: borrow

When a new call arrives, accepting the call puts one channel of the Base Cell in use, so the state transitions to the right; blocking the call neither occupies nor releases any channel of the Base Cell, so the state transitions to itself; borrowing a channel from a neighbor puts one channel of that neighboring cell in use on behalf of the Base Cell, so the state transitions down. This is illustrated in Fig 5.1 above.

When a handoff call arrives, accepting the call puts one channel of the Base Cell in use and releases one channel of the neighboring cell, so the state transitions up and to the right; dropping the call releases one channel of the neighboring cell, so the state transitions up; borrowing a channel from a neighbor releases one channel of one neighboring cell and puts one channel of another in use, so the state transitions to itself. This is illustrated in Fig 5.1 above.
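The six transitions described above can be collected in a lookup table. The sketch assumes that the state is (x, y) with x the number of busy channels of the Base Cell and y the number of busy channels of the neighboring cells, that "right" means x + 1 and that "down" means y + 1; these orientation conventions and all names are assumptions made for illustration.

```python
# Offsets (dx, dy) applied to the state (x, y); x grows to the right, y grows downward.
BDCL_MOVES = {
    ("new", "accept"): (1, 0),       # Base Cell uses one of its own channels
    ("new", "block"): (0, 0),        # nothing changes
    ("new", "borrow"): (0, 1),       # a neighbor's channel becomes busy
    ("handoff", "accept"): (1, -1),  # neighbor releases a channel, Base Cell takes one
    ("handoff", "drop"): (0, -1),    # neighbor releases a channel, call is lost
    ("handoff", "borrow"): (0, 0),   # one neighbor releases, another lends: no net change
}


def bdcl_next_state(state, call_type, alternative):
    """State reached from `state` when `alternative` is applied to an arriving call."""
    dx, dy = BDCL_MOVES[(call_type, alternative)]
    return (state[0] + dx, state[1] + dy)
```

For example, starting from (10, 40), borrowing for a new call leads to (10, 41), while accepting a handoff call leads to (11, 39).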

Fig 5.2 Transition diagram for alternatives 1, 2 and 3

As illustrated in Fig 5.2 above, we define a cost ω1 that is incurred by alternative 2 when a new call arrives, a cost ω2 that is incurred by alternative 2 when a handoff call arrives, and a cost ω3 that is incurred by alternative 3 when a call (new or handoff) arrives. The cost ω3 is not fixed: it reflects the effect of the "Borrowing" operation on all adjacent cells and varies with their condition. ω3 is derived online and depends on which channel is borrowed and on the states of all adjacent cells. After borrowing, alternative 3 causes state transitions in the adjacent cells of the base cell. An example is illustrated in Fig 5.3 below.

Fig 5.3 Effect of borrowing operation on adjacent cells

If Cell P borrows Channel Ch1 from Cell A1, the state transitions of the adjacent cells are as illustrated in Fig 5.4 and Fig 5.5 below.

Fig 5.4 Example of effect by borrowing operation on adjacent cells

Fig 5.5 Example of effect by borrowing operation on adjacent cells

When a call (new call or handoff call) arrives, we have to check all channels of the neighboring cells and get the channel that will cause the least cost ω3. The channel that causes the least cost of ω3 is selected as the channel to borrow if alternative 3 is the best alternative to take.

5.2 One-Step Policy for BDCL Strategy

There are 9 policies covering both new calls and handoff calls; they are listed in Table 5.1 below.

To obtain the One-Step Policy online, we make use of the MDP computational result that is derived offline under the FA strategy described in Chapter 4. We use the state values derived offline to obtain the One-Step Policy, which is an improved policy: the policy derived online is not the optimal policy but an improvement on the offline one. The derivation of the One-Step Policy differs from the Policy Iteration Routine mentioned in Chapter 3 because we perform the Policy Improvement Routine only once and do not perform the Value Determination Routine again. This is illustrated in Fig 5.6 below. It has been shown that the One-Step Policy, although not optimal, is close to the optimal one [5]; since it performs the Policy Improvement Routine only once, it is also economical in computational resources.

Table 5.1 Alternatives of One-Step Policy

Alternatives    New Call    Handoff Call
0               block       drop
1               use         drop
2               borrow      drop
3               block       use
4               use         use
5               borrow      use
6               block       borrow
7               use         borrow
8               borrow      borrow
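The nine rows of Table 5.1 follow a base-3 pattern: the new-call action cycles through block, use, borrow while the handoff action changes every three rows. The text does not state this encoding; it is an observation about the table, and the function below is a hypothetical helper that exploits it.

```python
NEW_CALL = ("block", "use", "borrow")
HANDOFF = ("drop", "use", "borrow")


def decode_policy(index):
    """Split a Table 5.1 index (0..8) into (new-call action, handoff action)."""
    if not 0 <= index <= 8:
        raise ValueError("policy index must be in 0..8")
    handoff, new_call = divmod(index, 3)
    return NEW_CALL[new_call], HANDOFF[handoff]


table = [decode_policy(i) for i in range(9)]
```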

Fig 5.6 One-Step Policy

The One-Step Policy improvement routine is illustrated in Fig 5.6 above: we perform the Policy-Improvement Routine only once and do not run the Value-Determination Operation again.
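A single policy-improvement step for an arriving new call can be sketched as follows. The sketch assumes a simplified criterion: each alternative is scored by its immediate cost plus the offline relative value of the state it leads to, and the cheapest alternative is chosen. The dictionary `values`, the function name, and the way ω3 is passed in (already minimized over the borrowable channels, or `None` if no channel can be borrowed) are assumptions for illustration, not the routine used in this work.

```python
def one_step_new_call(state, values, w1, w3, C1, C2):
    """Pick the alternative for an arriving new call with one policy-improvement step.

    values: dict mapping (x, y) to the relative cost computed offline.
    w3: borrowing cost of the cheapest borrowable channel, or None if none exists.
    """
    x, y = state
    candidates = {"block": w1 + values[(x, y)]}           # pay w1, state unchanged
    if x < C1:
        candidates["use"] = values[(x + 1, y)]            # no immediate cost
    if w3 is not None and y < C2:
        candidates["borrow"] = w3 + values[(x, y + 1)]    # pay w3, a neighbor channel is taken
    return min(candidates, key=candidates.get)


# Hypothetical offline values on a tiny grid (C1 = 2, C2 = 3).
toy_values = {(x, y): float(x * x + y) for x in range(3) for y in range(4)}
choice = one_step_new_call((1, 1), toy_values, w1=10.0, w3=0.5, C1=2, C2=3)
```

A handoff call would be handled in the same way, with ω2 as the dropping cost and with the next states of the handoff alternatives.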

5.3 Update Rules of Policy with Time-varying MDP Parameters

There are six parameters in our MDP model: λ1, λ2, μ1, μ2, h1 and h2. In the actual system these six parameters vary with time; therefore, we have to adjust them periodically to bring our MDP model closer to the actual system and to obtain a better policy. We adjust the parameters by comparing the system cost, which is induced by rejecting calls (new calls or handoff calls), with the model cost, which is the "gain" computed by our MDP model. The system cost is divided into two parts: the "Block Cost", which is due to blocking new calls, and the "Drop Cost", which is due to dropping handoff calls. The update rule is derived from the simulation results. We find that each of the six parameters may be proportional to either the Block Cost or the Drop Cost, as listed below.

Table 5.2 Parameters update rules

Parameter                      Proportional cost (Block Cost or Drop Cost; +: positively proportional, −: negatively proportional)
Base cell's arrival rate λ1    Block Cost (+)

[The remaining rows of Table 5.2 and the text up to Eq. (5.2) are not recoverable from the source text.]

The model cost is defined as Eq. (5.2) and is the sum of the gain of the MDP model per update period.

Before explaining the update rules, we define the difference between the model cost and the system cost as Eq. (5.3) below:

Furthermore, the adjustment factor of parameters is defined as Eq. (5.4) below:

Eq. (5.4) gives one rule for each of the following parameters (the expressions themselves are not recoverable from the source text):

λ1: base cell's arrival rate
λ2: neighboring cells' arrival rate
μ1: base cell's departure rate
μ2: neighboring cells' departure rate
h1: base cell's handoff-out rate
h2: neighboring cells' handoff-in rate

The parameters are adjusted by these update rules to ensure that the model is closer to the actual system. The update period is determined by how fast the system changes.

Chapter 6 Simulator and Results

6.1 Simulator Settings

1. The size of the simulation map is 12.12 km × 24.25 km. The map is composed of nodes that contain information about:

i. the position : (x , y);

ii. the type of the node : road or not;

iii. the coverage of the cell.

2. There are 98 (14 × 7) cells on the map, and the radius of each cell is 2 km.

3. Wrapped-around map: when a mobile reaches a border of the map, it moves to the opposite border of the map.

4. There are 50 channels per cell: the capacity of the base cell (the number of nominal channels, C1) is 50, and the capacity of the neighboring cells, C2, is 300.

5. The traffic load is distributed non-uniformly over the whole map.

6. Poisson Arrivals on the whole map with arrival rate λ (arrivals / cell / hour). Arrival Rate of mobiles in Base Cell : λ1 ; Arrival Rate of mobiles in Neighboring Cells : λ2.

7. Exponentially distributed service time per mobile with average service time 180 seconds per call. Departure Rate of mobiles in Base Cell : μ1 ; Departure Rate of mobiles in Neighboring Cells : μ2.

8. Handoff rates are determined by, among other factors, the random number of mobiles in the base cell, the number of mobiles in the neighboring cells, the speeds of the mobiles in the base cell and the neighboring cells, and the road topology across the base cell and the neighboring cells. Handoff-out rate: h1; handoff-in rate: h2. Both h1 and h2 are obtained by measurement in the Base Cell.

9. Mobiles move mainly in one direction at speeds of 20 ~ 90 km/hr with 5% variance, and they do not move back unless there is no way to move forward, right, or left.

10. We define the call failure rate as

$$\text{Call Failure Rate} = P_b + (1 - P_b) \times P_d \tag{6.1}$$

where P_b is the call blocking rate and P_d is the call dropping rate.
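Eq. (6.1) counts a call as failed if it is blocked on arrival, or if it is admitted (with probability 1 − P_b) and later dropped. A direct transcription is shown below; the function name and the sample probabilities are hypothetical.

```python
def call_failure_rate(p_block, p_drop):
    """Eq. (6.1): a call fails if it is blocked, or admitted and later dropped."""
    for p in (p_block, p_drop):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_block + (1.0 - p_block) * p_drop


rate = call_failure_rate(0.10, 0.05)   # 0.10 + 0.90 * 0.05
```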

Fig 6.1 Road Topology Example for Simulation

6.2 UML Statechart of Our Model

We use UML (Unified Modeling Language) to simulate the environment. The map is transformed from the simulator as illustrated in Fig 6.1 above. The OMD (Object Model Diagram) is illustrated in Fig 6.2 below. When the simulation starts, one CellsGen object generates the Map, Cell, and Channel objects and sets all links between Map and Cells, between Cell and Cell, and between Cells and Channels. The Map object generates the map and its roads at the beginning and, at the moment a call arrives, creates a Mobile object at a non-uniformly chosen position on the roads. Mobile objects move along the roads generated by the Map object and perform the handoff operation from one Cell object to another when they cross cell boundaries. The Cell object has three jobs. First, it receives new calls and handoff calls from Mobile objects. Second, it sets the states of the Channel objects. Finally, it informs the Cell objects that are affected by the handoff operation, and keeps the list of channels in use, the list of borrowing channels, and the list of borrowed channels. The Channel objects are passive: they only hold the records of their own states as set by Cell objects. The MDP object owned by each Cell object performs the Policy Iteration calculation to decide the policy.

Fig 6.2 OMD of the Simulator

The State Chart of Mobile is illustrated in Fig 6.3 below. When a Mobile object is constructed, it is given a randomly generated, exponentially distributed service time and a speed randomly selected from 20 ~ 90 km/hr. Its residual time is determined by the roads and by the speed at which the Mobile object moves.

Fig 6.3 State Chart of Mobile

The State Chart of Map is illustrated as Fig 6.4 below. When it is constructed, it produces the map in the beginning, and generates the inter-arrival time randomly with exponential distribution to determine when the object of Mobile is constructed.
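The call-generation behavior described for Map and Mobile can be sketched with the Python standard library. This is a minimal sketch, not the UML simulator: it generates one Poisson call stream with exponential inter-arrival times and exponential service times (mean 180 seconds, as in the simulator settings), and it ignores positions, roads, and cells. The function name, the seed, and the arrival rate used in the example are hypothetical.

```python
import random


def generate_calls(rate_per_hour, mean_service_s, horizon_s, seed=0):
    """Return (arrival_time_s, service_time_s) pairs for one Poisson call stream."""
    rng = random.Random(seed)
    rate_per_s = rate_per_hour / 3600.0
    calls, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_s)                  # exponential inter-arrival time
        if t > horizon_s:
            return calls
        service = rng.expovariate(1.0 / mean_service_s)   # exponential service time
        calls.append((t, service))


# Hypothetical load: 3600 arrivals per hour observed for 20000 seconds.
calls = generate_calls(rate_per_hour=3600.0, mean_service_s=180.0, horizon_s=20000.0, seed=1)
mean_service = sum(s for _, s in calls) / len(calls)
```

Over a long horizon the number of generated calls approaches rate × horizon and the sample mean of the service times approaches the configured mean.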

The State Chart of Cell is illustrated in Fig 6.5 below. It receives evIn events, which include new call events from Mobile objects and handoff-in call events from Cell objects. It also receives evOut events, which include call ending events from Mobile objects and handoff-out events from Cell objects. It also periodically updates the parameters, including the Base Cell arrival rate λ1, the Neighboring Cell arrival rate λ2, the Base Cell departure rate μ1, the Neighboring Cell departure rate μ2, the handoff-out rate h1, and the handoff-in rate h2.

Fig 6.4 State Chart of Map

Fig 6.5 State Chart of Cell

Because there are 50 × 300 states in total, which is too difficult to compute and not efficient in real time, we use the aggregation method described in Section 4.4 to group the states into smaller groups. In our model, we choose 6 × 11 aggregated states in total, as shown in Table 6.1 below, which is a compromise between computational complexity and the deviation of the derived result. After the offline policies are determined, the values of all states are derived. With these state values, we then determine the online policies by the One-Step Policy when evIn events occur.
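The effect of aggregation on the computational load can be estimated with simple arithmetic. The sketch below applies the cubic cost model of Section 4.4 to the two state counts quoted above; constant factors are ignored, so the numbers indicate orders of magnitude only.

```python
full_states = 50 * 300          # state count quoted in the text
aggregated_states = 6 * 11      # aggregated states chosen in the text

# Cubic cost model for solving the linear system by elimination (constants ignored).
full_ops = full_states ** 3
aggregated_ops = aggregated_states ** 3
reduction = full_ops / aggregated_ops
```

Under this model the aggregated system needs roughly ten million times fewer elementary operations than the full one.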

Note that the last column and row (with gray background) of the table is made of only one single state. Because no matter there is a new call or a handoff call arrives in that state, it will
