
Dynamic programming deals with situations where decisions are made in stages. The outcome of each decision may not be fully predictable, but it can be anticipated to some extent before the next decision is made. The objective is to minimize a certain cost, a mathematical expression of what is considered an undesirable outcome. The key aspect of such situations is that one must trade off the desire for a low present cost against the risk of high future costs.

Therefore, at each stage, decisions are made based on the sum of present cost and the expected future cost.

2.2.1 Bellman Equation

The basic model of dynamic programming has two features: (1) an underlying discrete-time dynamic system, and (2) a cost function that is additive over time. The dynamic system expresses the evolution of some variables, the system “state”, under the influence of decisions made at discrete instants of time. The system has the form

$$x_{k+1} = f_k(x_k, u_k, w_k), \quad k = 0, 1, \ldots, N-1,$$

where

$k$ indexes discrete time,

$x_k$ is the state of the system and summarizes past information that is relevant for future optimization,

$u_k$ is the control or decision variable to be selected at time $k$,

$w_k$ is a random parameter (also called disturbance or noise),

$N$ is the number of times control is applied,

$f_k$ is a function that describes the system and the mechanism by which the state is updated.

The cost function is additive in the sense that the cost incurred at time $k$, denoted by $g_k(x_k, u_k, w_k)$, accumulates over time. The total cost is

$$g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k)$$

where $g_N(x_N)$ is a terminal cost incurred at the end of the process. However, because of the presence of $w_k$, the cost is generally a random variable and cannot be meaningfully optimized. We therefore formulate the problem as the minimization of the expected cost

$$E\left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \right\}$$

where the expectation is with respect to the joint distribution of the random variables involved. The optimization is over the controls $u_0, u_1, \ldots, u_{N-1}$, but each control $u_k$ is selected with some knowledge of the current state $x_k$.
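To make the system equation and the expected cost concrete, here is a minimal Python sketch that simulates one trajectory and estimates the expected cost by Monte Carlo averaging. The scalar system $f_k(x, u, w) = x + u + w$, the quadratic costs, the i.i.d. Gaussian noise, and the linear policy used in the example are hypothetical choices made only for illustration; they are not taken from the text.

```python
import random
from typing import Callable, Sequence

# Hypothetical example system and costs (assumptions, not from the text).
def f(k: int, x: float, u: float, w: float) -> float:
    """System equation x_{k+1} = f_k(x_k, u_k, w_k)."""
    return x + u + w

def g(k: int, x: float, u: float, w: float) -> float:
    """Stage cost g_k(x_k, u_k, w_k)."""
    return x * x + u * u

def g_terminal(x: float) -> float:
    """Terminal cost g_N(x_N)."""
    return x * x

# A policy maps (k, x_k) to the control u_k = mu_k(x_k).
Policy = Callable[[int, float], float]

def total_cost(x0: float, policy: Policy, noise: Sequence[float]) -> float:
    """Cost g_N(x_N) + sum_k g_k(x_k, u_k, w_k) of one trajectory driven by w_0..w_{N-1}."""
    x, cost = x0, 0.0
    for k, w in enumerate(noise):
        u = policy(k, x)  # the control is chosen knowing the current state x_k
        cost += g(k, x, u, w)
        x = f(k, x, u, w)
    return cost + g_terminal(x)

def expected_cost(x0: float, policy: Policy, N: int,
                  samples: int = 10_000, sigma: float = 1.0, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected cost, assuming w_k i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        noise = [rng.gauss(0.0, sigma) for _ in range(N)]
        acc += total_cost(x0, policy, noise)
    return acc / samples

if __name__ == "__main__":
    def half(k: int, x: float) -> float:
        return -0.5 * x

    print(total_cost(1.0, half, [0.0, 0.0]))  # 1.625 (noise-free trajectory)
    print(expected_cost(1.0, half, N=2))      # close to the exact value 4.125
```

Averaging over many noise realizations is what turns the random total cost into the deterministic quantity that the DP problem minimizes.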

2.2.2 Dynamic Programming Algorithm

The dynamic programming (DP) technique is based on a simple idea, the principle of optimality. Roughly, the principle of optimality states the following fact.

Principle of Optimality

Let $\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}$ be an optimal policy for the basic problem (each $\mu_k$ maps the state $x_k$ to a control $u_k = \mu_k(x_k)$), and assume that when using $\pi$, a given state $x_i$ occurs at time $i$ with positive probability. Consider the subproblem whereby we are at $x_i$ at time $i$ and wish to minimize the “cost-to-go” from time $i$ to time $N$

$$E\left\{ g_N(x_N) + \sum_{k=i}^{N-1} g_k(x_k, u_k, w_k) \right\}$$

Then the truncated policy $\{\mu_i, \mu_{i+1}, \ldots, \mu_{N-1}\}$ is optimal for this subproblem.

The principle of optimality suggests that an optimal policy can be constructed in the following steps. First construct an optimal policy for the “tail subproblem” involving the last stage. Then extend the optimal policy to the “tail subproblem” involving the last two stages.

Continue in this manner until an optimal policy for the entire problem is constructed.
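This stage-by-stage construction is usually written as the DP algorithm: $J_N(x_N) = g_N(x_N)$ and $J_k(x_k) = \min_{u_k} E_{w_k}\{g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k))\}$ for $k = N-1, \ldots, 0$, where the minimizing $u_k$ defines $\mu_k(x_k)$. Below is a minimal Python sketch of this backward recursion, assuming finite state and control sets and a disturbance distribution that is the same at every stage and independent of $x_k$ and $u_k$. The small inventory-style example (stock $x_k \in \{0, 1, 2\}$, order $u_k$, random demand $w_k$) is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

def backward_induction(
    N: int,
    states: Sequence[int],
    controls: Callable[[int, int], Sequence[int]],
    disturbances: Sequence[Tuple[int, float]],  # (w, probability) pairs
    f: Callable[[int, int, int, int], int],      # x_{k+1} = f(k, x, u, w)
    g: Callable[[int, int, int, int], float],    # stage cost g_k(x, u, w)
    g_N: Callable[[int], float],                 # terminal cost g_N(x)
) -> Tuple[List[Dict[int, float]], List[Dict[int, int]]]:
    """Return cost-to-go tables J[k][x] and an optimal policy mu[k][x]."""
    J: List[Dict[int, float]] = [{} for _ in range(N + 1)]
    mu: List[Dict[int, int]] = [{} for _ in range(N)]
    J[N] = {x: g_N(x) for x in states}
    # Solve the tail subproblems from the last stage back to stage 0.
    for k in range(N - 1, -1, -1):
        for x in states:
            best_u: Optional[int] = None
            best_val = float("inf")
            for u in controls(k, x):
                # Expected stage cost plus optimal cost-to-go from the next state.
                val = sum(p * (g(k, x, u, w) + J[k + 1][f(k, x, u, w)])
                          for w, p in disturbances)
                if val < best_val:
                    best_u, best_val = u, val
            assert best_u is not None, "every state needs at least one control"
            J[k][x] = best_val
            mu[k][x] = best_u
    return J, mu

if __name__ == "__main__":
    # Hypothetical example: order u with x + u <= 2, demand w in {0, 1, 2}
    # with probabilities 0.1, 0.7, 0.2; unmet demand is lost.
    J, mu = backward_induction(
        N=3,
        states=[0, 1, 2],
        controls=lambda k, x: range(0, 3 - x),
        disturbances=[(0, 0.1), (1, 0.7), (2, 0.2)],
        f=lambda k, x, u, w: max(0, x + u - w),
        g=lambda k, x, u, w: u + (x + u - w) ** 2,
        g_N=lambda x: 0.0,
    )
    print({x: round(v, 3) for x, v in J[0].items()})  # {0: 3.7, 1: 2.7, 2: 2.818}
    print(mu[0])                                       # {0: 1, 1: 0, 2: 0}
```

Each pass over `k` solves one tail subproblem, and the principle of optimality is what justifies reusing `J[k + 1]` unchanged when extending to stage `k`.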

2.2.3 Dynamic Programming for the Handoff Problem

In [3], a dynamic programming algorithm for handoff is proposed. Dynamic programming allows optimization of the total cost along a state trajectory of a discrete-time dynamical system that has a stepwise additive cost criterion and, conditioned on the state, stepwise independent noise statistics.

First, the signal strength $X_k^{(i)}$ received from base station $B^{(i)}$ at distance $d_k^{(i)}$ at the $k$th sampling instant can be written as

$$X_k^{(i)} = \mu - \eta \log d_k^{(i)} + Z_k^{(i)} \ \text{dB}, \quad i = 1, 2,$$

where $\mu$ and $\eta$ account for path loss: $\mu$ depends on the transmitted power at the base station, and $\eta$ is the path-loss exponent. The term $Z_k^{(i)}$ is the shadow fading component, which is accurately modeled (in decibels) as a zero-mean stationary Gaussian random process [14].
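A minimal Python sketch of this signal-strength model follows. Beyond what the text states, it assumes that the logarithm is base 10 and that the shadow fading is generated as a first-order Gauss-Markov sequence $Z_{k+1} = a Z_k + \sqrt{1-a^2}\,\sigma e_k$ with i.i.d. standard Gaussian $e_k$, which gives a zero-mean stationary Gaussian process with variance $\sigma^2$ and lag-one correlation $a$. The numerical values of $\mu$, $\eta$, $\sigma$, $a$, and the distances are hypothetical.

```python
import math
import random
from typing import List, Sequence

def shadow_fading(n: int, sigma: float, a: float, rng: random.Random) -> List[float]:
    """Stationary zero-mean Gauss-Markov sequence: variance sigma^2, lag-1 correlation a."""
    z = [rng.gauss(0.0, sigma)]  # start in the stationary distribution
    for _ in range(n - 1):
        z.append(a * z[-1] + math.sqrt(1.0 - a * a) * rng.gauss(0.0, sigma))
    return z

def signal_strength(distances: Sequence[float], mu: float, eta: float,
                    z: Sequence[float]) -> List[float]:
    """X_k = mu - eta * log10(d_k) + Z_k in dB (the base-10 log is an assumption)."""
    return [mu - eta * math.log10(d) + zk for d, zk in zip(distances, z)]

if __name__ == "__main__":
    rng = random.Random(0)
    n, span = 50, 2000.0  # hypothetical: 50 samples, base stations 2000 m apart
    d1 = [1.0 + (span - 2.0) * k / (n - 1) for k in range(n)]  # distance to B(1)
    d2 = [span - d for d in d1]                                 # distance to B(2)
    x1 = signal_strength(d1, mu=0.0, eta=30.0, z=shadow_fading(n, 8.0, 0.9, rng))
    x2 = signal_strength(d2, mu=0.0, eta=30.0, z=shadow_fading(n, 8.0, 0.9, rng))
    print([round(v, 1) for v in x1[:5]], [round(v, 1) for v in x2[:5]])
```

Drawing the first sample from the stationary distribution keeps the fading variance equal to $\sigma^2$ at every time step rather than only asymptotically.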

Suppose there are a total of $n$ time steps $k = 1, 2, \ldots, n$ on the portion of the mobile's trajectory that involves $B^{(1)}$ and $B^{(2)}$. Let $B_k$ denote the index of the operative base station at time $k$ (i.e., $B_k = i$ when the mobile is communicating with $B^{(i)}$), and let $B_k^c$ denote the other base station. A handoff decision is made during each sampling interval. The decision variable $U_k$, which takes on two values, can be based on all signal-strength measurements up to time $k$. If $U_k = 1$, a handoff is made, resulting in $B_{k+1} = B_k^c$. If $U_k = 0$, no handoff is made and $B_{k+1} = B_k$.

Handoff algorithm design involves choosing the handoff decision functions $\phi_k$ at times $k = 1, 2, \ldots, n-1$. Let $\Delta$ denote the minimum signal strength required for satisfactory service, and let $N_{SF}$ and $N_H$ denote the total number of service failures and the number of handoffs, respectively, from time 1 to $n$. Then

$$N_{SF} = \sum_{k=1}^{n} I\{X_k^{(B_k)} < \Delta\}, \qquad N_H = \sum_{k=1}^{n-1} I\{U_k = 1\},$$

where $I\{\cdot\}$ is the indicator function.
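These definitions can be checked on a single recorded trajectory. The Python sketch below applies a given decision sequence $U_1, \ldots, U_{n-1}$ to obtain $B_1, \ldots, B_n$ and then counts service failures and handoffs. The function name, the 0-based list indexing, and the example measurements are hypothetical.

```python
from typing import List, Sequence, Tuple

def count_failures_and_handoffs(
    x1: Sequence[float], x2: Sequence[float], decisions: Sequence[int],
    b1: int, delta: float,
) -> Tuple[List[int], int, int]:
    """Return (B_1..B_n, N_SF, N_H).

    x1[k], x2[k] hold X^(1), X^(2) at time k+1 (0-based lists);
    decisions[k] holds U_{k+1} in {0, 1}; b1 is the initial base station (1 or 2).
    """
    n = len(x1)
    if len(x2) != n or len(decisions) != n - 1:
        raise ValueError("need n measurements per base station and n-1 decisions")
    b = [b1]
    for u in decisions:
        # U_k = 1 hands off to the other base station B_k^c; U_k = 0 keeps B_k.
        b.append(3 - b[-1] if u == 1 else b[-1])
    serving = [x1[k] if b[k] == 1 else x2[k] for k in range(n)]
    n_sf = sum(1 for s in serving if s < delta)  # sum of I{X_k^(B_k) < delta}
    n_h = sum(1 for u in decisions if u == 1)    # sum of I{U_k = 1}
    return b, n_sf, n_h

if __name__ == "__main__":
    x1 = [-80.0, -90.0, -100.0, -105.0]  # hypothetical measurements (dB)
    x2 = [-110.0, -100.0, -92.0, -85.0]
    print(count_failures_and_handoffs(x1, x2, [0, 1, 0], b1=1, delta=-95.0))
    # ([1, 1, 2, 2], 0, 1)
    print(count_failures_and_handoffs(x1, x2, [0, 0, 0], b1=1, delta=-95.0))
    # ([1, 1, 1, 1], 2, 0)
```

The two runs show the tradeoff directly: one handoff avoids both service failures, while never handing off costs two failures.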

An optimal handoff algorithm is the set of decision functions $\phi = \{\phi_1, \ldots, \phi_{n-1}\}$ that provides the best tradeoff between $E[N_{SF}]$ and $E[N_H]$. This optimal tradeoff problem can be posed in a Bayes formulation:

$$\min_{\phi} \; c\,E[N_H] + E[N_{SF}]$$

where c > 0 is a tradeoff parameter.
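The Bayes formulation can be explored numerically by estimating $E[N_H]$, $E[N_{SF}]$, and $c\,E[N_H] + E[N_{SF}]$ for a simple, non-optimal rule. The Python sketch below uses a conventional hysteresis rule (hand off when the other base station is stronger than the serving one by more than $h$ dB); this rule is not the DP-optimal $\phi$. The Gauss-Markov fading generator, the base-10 logarithm, the straight-line trajectory between the two base stations, and all parameter values are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple

def fading(n: int, sigma: float, a: float, rng: random.Random) -> List[float]:
    """Stationary zero-mean Gauss-Markov shadow fading (variance sigma^2, lag-1 correlation a)."""
    z = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        z.append(a * z[-1] + math.sqrt(1.0 - a * a) * rng.gauss(0.0, sigma))
    return z

def bayes_risk(h: float, c: float = 5.0, n: int = 100, trials: int = 500,
               delta: float = -95.0, mu: float = 0.0, eta: float = 30.0,
               sigma: float = 8.0, a: float = 0.9, span: float = 2000.0,
               seed: int = 1) -> Tuple[float, float, float]:
    """Estimate (E[N_H], E[N_SF], c*E[N_H] + E[N_SF]) for a hysteresis rule with margin h dB."""
    rng = random.Random(seed)
    total_h = total_sf = 0
    for _ in range(trials):
        z1, z2 = fading(n, sigma, a, rng), fading(n, sigma, a, rng)
        b = 1  # start attached to B(1)
        for k in range(n):
            d1 = 1.0 + (span - 2.0) * k / (n - 1)  # mobile moves from B(1) toward B(2)
            d2 = span - d1
            x1 = mu - eta * math.log10(d1) + z1[k]
            x2 = mu - eta * math.log10(d2) + z2[k]
            serving, other = (x1, x2) if b == 1 else (x2, x1)
            total_sf += int(serving < delta)       # service failure at this time step
            if k < n - 1 and other > serving + h:  # decisions exist only for k < n
                b = 3 - b
                total_h += 1
    e_h, e_sf = total_h / trials, total_sf / trials
    return e_h, e_sf, c * e_h + e_sf

if __name__ == "__main__":
    for h in (0.0, 4.0, 8.0, 12.0):
        e_h, e_sf, risk = bayes_risk(h)
        print(f"h={h:4.1f}  E[N_H]={e_h:6.2f}  E[N_SF]={e_sf:6.2f}  risk={risk:6.2f}")
```

Because the same seed is used for every `h`, all rules see the same fading realizations, so differences in the estimates reflect the rules rather than sampling noise.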

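For the handoff problem, the state $S_k$ at time $k$ consists of $(X_k^{(1)}, X_k^{(2)}, B_k)$, where $B_k$ denotes the base station with which the mobile communicates at time $k$. Thus we get the following update equation for $S_k$:

$$S_{k+1} = \left(X_{k+1}^{(1)}, X_{k+1}^{(2)}, B_{k+1}\right), \qquad B_{k+1} = \begin{cases} B_k, & U_k = 0, \\ B_k^c, & U_k = 1, \end{cases}$$

and, conditioned on $S_k$, the random quantities driving this update have the required independence structure.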

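Finally, the cost criterion as defined before is additive over time. If we define

$$g_k(S_k, U_k) = c\,I\{U_k = 1\} + I\{X_k^{(B_k)} < \Delta\}, \quad k = 1, \ldots, n-1, \qquad g_n(S_n) = I\{X_n^{(B_n)} < \Delta\},$$

then the Bayes optimal handoff algorithm minimizes

$$E\left[ g_n(S_n) + \sum_{k=1}^{n-1} g_k(S_k, U_k) \right] = c\,E[N_H] + E[N_{SF}].$$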
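The additive decomposition can be checked numerically: summing the stage costs along one trajectory should reproduce $c N_H + N_{SF}$. The Python sketch below assumes the stage cost $g_k(S_k, U_k) = c\,I\{U_k = 1\} + I\{X_k^{(B_k)} < \Delta\}$ for $k < n$ and the terminal cost $g_n(S_n) = I\{X_n^{(B_n)} < \Delta\}$; the function names and the example measurements are hypothetical.

```python
from typing import Sequence

def stage_cost(u: int, serving: float, c: float, delta: float) -> float:
    """g_k(S_k, U_k) = c * I{U_k = 1} + I{X_k^(B_k) < delta}."""
    return c * (u == 1) + float(serving < delta)

def terminal_cost(serving: float, delta: float) -> float:
    """g_n(S_n) = I{X_n^(B_n) < delta}."""
    return float(serving < delta)

def additive_cost(x1: Sequence[float], x2: Sequence[float], decisions: Sequence[int],
                  b1: int, c: float, delta: float) -> float:
    """Sum of g_k for k = 1..n-1 plus g_n, tracking B_k as decisions are applied."""
    b, total = b1, 0.0
    for k, u in enumerate(decisions):
        serving = x1[k] if b == 1 else x2[k]
        total += stage_cost(u, serving, c, delta)
        if u == 1:
            b = 3 - b  # switch to the other base station for the next time step
    return total + terminal_cost(x1[-1] if b == 1 else x2[-1], delta)

if __name__ == "__main__":
    x1 = [-80.0, -90.0, -100.0, -105.0]  # hypothetical measurements (dB)
    x2 = [-110.0, -100.0, -92.0, -85.0]
    for us in ([0, 1, 0], [0, 0, 0], [1, 0, 0]):
        print(us, additive_cost(x1, x2, us, b1=1, c=5.0, delta=-95.0))
    # [0, 1, 0] 5.0   (N_H = 1, N_SF = 0)
    # [0, 0, 0] 2.0   (N_H = 0, N_SF = 2)
    # [1, 0, 0] 6.0   (N_H = 1, N_SF = 1)
```

Writing the objective as a sum of per-stage terms is what lets the DP recursion handle it one time step at a time.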

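The DP solution is obtained recursively as follows. Let the expected cost-to-go at time $k$, when the mobile is in state $S_k$ and is served by base station $B_k$, be denoted by $J_k(S_k)$. Then the optimal handoff decision functions are obtained by solving the following set of recursive equations:

$$J_n(S_n) = I\{X_n^{(B_n)} < \Delta\},$$

$$J_k(S_k) = I\{X_k^{(B_k)} < \Delta\} + \min\left\{ E\left[J_{k+1}(S_{k+1}) \mid S_k, U_k = 0\right],\; c + E\left[J_{k+1}(S_{k+1}) \mid S_k, U_k = 1\right] \right\}, \quad k = n-1, \ldots, 1, \tag{1}$$

where the conditional expectations depend on the past only through $S_k$, which summarizes the relevant past signal-strength measurements. These optimum decision functions are described by

$$U_k = \phi_k(S_k) = \begin{cases} 1, & \text{if } c + E\left[J_{k+1}(S_{k+1}) \mid S_k, U_k = 1\right] < E\left[J_{k+1}(S_{k+1}) \mid S_k, U_k = 0\right], \\ 0, & \text{otherwise}. \end{cases}$$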

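For the lognormal fading model that we have assumed, the conditional distribution of $X_{k+1}^{(i)}$ given $X_k^{(i)}$ is Gaussian; hence the probabilities in (1) are entirely determined by the conditional means and variances

$$E\left[X_{k+1}^{(i)} \mid X_k^{(i)}\right] = \mu - \eta \log d_{k+1}^{(i)} + a\left(X_k^{(i)} - \mu + \eta \log d_k^{(i)}\right),$$

$$\operatorname{Var}\left[X_{k+1}^{(i)} \mid X_k^{(i)}\right] = \sigma^2 \left(1 - a^2\right),$$

where $a$ is the correlation coefficient of the discrete-time fading process and $\sigma^2$ is the variance of $Z_k^{(i)}$.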
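At the last decision time $k = n-1$, the only future cost that depends on the decision is $c\,I\{U_{n-1} = 1\}$ plus the failure indicator at time $n$, so the Gaussian conditional moments are enough to decide. The Python sketch below computes those moments, evaluates $P\{X_n^{(i)} < \Delta \mid X_{n-1}^{(i)}\}$ with the Gaussian CDF, and hands off only if $c + P\{\text{failure} \mid \text{switch}\} < P\{\text{failure} \mid \text{stay}\}$. It assumes known distances, a base-10 logarithm, and the weighting $c\,E[N_H] + E[N_{SF}]$; the function names and numbers are hypothetical.

```python
import math
from typing import Tuple

def conditional_moments(x_k: float, d_k: float, d_next: float, mu: float,
                        eta: float, sigma: float, a: float) -> Tuple[float, float]:
    """Mean and variance of X_{k+1} given X_k for X = mu - eta*log10(d) + Z."""
    m_k = mu - eta * math.log10(d_k)        # path-loss mean at time k
    m_next = mu - eta * math.log10(d_next)  # path-loss mean at time k+1
    mean = m_next + a * (x_k - m_k)         # uses E[Z_{k+1} | Z_k] = a * Z_k
    var = sigma ** 2 * (1.0 - a ** 2)
    return mean, var

def prob_below(threshold: float, mean: float, var: float) -> float:
    """P{X < threshold} for X ~ N(mean, var)."""
    return 0.5 * (1.0 + math.erf((threshold - mean) / math.sqrt(2.0 * var)))

def last_stage_decision(p_fail_stay: float, p_fail_switch: float, c: float) -> int:
    """U_{n-1} = 1 (hand off) iff c + P{fail | switch} < P{fail | stay}."""
    return 1 if c + p_fail_switch < p_fail_stay else 0

if __name__ == "__main__":
    delta, c = -95.0, 0.2  # hypothetical threshold (dB) and handoff weight
    # Serving station B(1) is weak and receding; B(2) is strong and approaching.
    m1, v1 = conditional_moments(-97.0, 1200.0, 1220.0, 0.0, 30.0, 8.0, 0.9)
    m2, v2 = conditional_moments(-84.0, 800.0, 780.0, 0.0, 30.0, 8.0, 0.9)
    p_stay, p_switch = prob_below(delta, m1, v1), prob_below(delta, m2, v2)
    print(round(p_stay, 3), round(p_switch, 3), last_stage_decision(p_stay, p_switch, c))
```

Since both probabilities lie in $[0, 1]$, this last-stage rule never hands off when $c \ge 1$; earlier stages can still justify a handoff because their cost-to-go includes more future failures.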

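However, with $d_k^{(i)}$ and $d_{k+1}^{(i)}$ unknown, we are forced to use the best available estimate of $E\left[X_{k+1}^{(i)} \mid X_k^{(i)}\right]$, denoted by $\hat{X}_{k+1}^{(i)}$, based on the information available at time $k$. The resulting decision functions are: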

( )i
