


Self-Configurable Congestion Control

3.1 Model for MDP

We use a Markov decision process (MDP) to generate our desired policy, and we now begin to build this model. We first define the discrete state space as the packet arrival rate, measured as a percentage of the target bandwidth, since our goal is to control the packet arrival rate. Let S denote the state space; then S = {10, 20, ..., 200}, which represents 10% to 200% of the target bandwidth. We assume that in each state we can issue “rate commands” to control the TCP sessions’ throughput. These “rate commands” form our action space A, A = {15, 25, 35, 45, 50, 60, 76, 86, 90, 94, 95, 96, 97, 98, 99, 105, 110, 120, 130, 140}.

Each action represents a rate expressed as a percentage of the target bandwidth.
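For concreteness, the two sets can be written down directly; Python is used for the illustrative sketches in this section:

```python
# State space S: packet arrival rate, in percent of the target bandwidth.
S = list(range(10, 201, 10))       # {10, 20, ..., 200}

# Action space A: "rate commands" issued to the TCP senders,
# also in percent of the target bandwidth.
A = [15, 25, 35, 45, 50, 60, 76, 86, 90, 94,
     95, 96, 97, 98, 99, 105, 110, 120, 130, 140]
```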

An essential part of a Markov decision process is the state transition probability, but such transition probabilities are very hard to determine for TCP traffic. Instead of using a complicated model, we build a simplified model that satisfies our needs.

Another important requirement is that the effect of the real network environment should be condensed into a single parameter, so that the model can configure this parameter by itself and reduce the difference between our model and reality.

Obviously, no existing technique can build such a model for us, and no model of this kind has been reported. We therefore first build a model that can generate our desired policy, and analyze it afterwards to see whether it is reasonable. The properties of the desired policy are as follows:

1. When the arrival rate is larger than 100% of the bandwidth, we should ask the TCP senders to decrease their throughput below 100% of the bandwidth so that the queue length decreases.

2. When the arrival rate is smaller than 100% of the bandwidth, we should ask the TCP senders to increase their throughput above 100% of the bandwidth so that the queue length increases.

3. By changing just one parameter, the Markov decision process should be able to generate a new policy.

We use the MDP to decide when to decrease the rate and by how much. To generate the state transition probabilities, we use normal random variables: their mean and deviation can easily be controlled to produce the desired policy.

After numerous trials, we conclude that the state transition probability P_ij(k) for a transition from state i to state j under action k should be generated from a normal random variable N_ik, whose mean and deviation are defined in eq. (3.3).

The parameters α and error are used to represent the effect of the real network environment, and we can configure their values to reduce the difference between our model and reality. Their roles and functionality are explained later. The immediate cost is defined as:

r_ij = |state_i - 100|.    (3.4)
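Since eq. (3.3) is not reproduced in this excerpt, the following sketch only illustrates the construction: the normal random variable N_ik is discretised over the state bins to obtain P_ij(k). The mean and deviation used below (drifting toward the commanded rate with weight α, with error as the deviation) are illustrative assumptions, not the thesis's actual eq. (3.3); the cost follows eq. (3.4).

```python
import numpy as np
from math import erf, sqrt

S = list(range(10, 201, 10))                       # states: 10%, ..., 200% of target bandwidth
A = [15, 25, 35, 45, 50, 60, 76, 86, 90, 94,
     95, 96, 97, 98, 99, 105, 110, 120, 130, 140]  # rate commands (percent)

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2); used to discretise N_ik over the state bins."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def transition_matrix(alpha, error):
    """P[k, i, j] = Pr(next state S[j] | current state S[i], action A[k]).

    The mean and deviation below are illustrative stand-ins for eq. (3.3):
    the next arrival rate is assumed to drift toward the commanded rate,
    weighted by alpha, with 'error' as the deviation of N_ik.
    """
    n, m = len(S), len(A)
    edges = [s + 5 for s in S[:-1]]                 # midpoints between adjacent states
    P = np.zeros((m, n, n))
    for k in range(m):
        for i in range(n):
            mu = alpha * A[k] + (1 - alpha) * S[i]  # assumed mean of N_ik
            sigma = error                           # assumed deviation of N_ik
            cdf = [normal_cdf(e, mu, sigma) for e in edges]
            P[k, i] = np.diff([0.0] + cdf + [1.0])  # probability mass per state bin (tails truncated)
    return P

def immediate_cost(i):
    """Eq. (3.4): r_ij = |state_i - 100|, the distance from the target bandwidth."""
    return abs(S[i] - 100)
```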

So we are looking for a policy that minimizes this cost; such a policy keeps the arrival rate close to 100% of the target bandwidth. If we set α to 0.9 and error to 40, we can use the method introduced in Section (2.1) to find the policy. In general, it takes only five iterations. The result is shown in Table (3.1).
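The iteration method of Section (2.1) is not shown in this excerpt; as one standard possibility, the sketch below uses Howard-style policy iteration for an average-cost MDP, reusing the transition_matrix and immediate_cost helpers from the previous sketch. The returned gain g plays the role of the “loss” discussed below.

```python
import numpy as np

def evaluate_policy(P, c, policy):
    """Solve g + h[i] = c[i] + sum_j P[policy[i], i, j] * h[j], with h[0] fixed to 0."""
    n = len(c)
    P_pi = np.array([P[policy[i], i] for i in range(n)])
    M = np.zeros((n, n))
    M[:, 0] = 1.0                          # column for the average cost (gain) g
    M[:, 1:] = (np.eye(n) - P_pi)[:, 1:]   # columns for h[1], ..., h[n-1]
    x = np.linalg.solve(M, c)
    return x[0], np.concatenate(([0.0], x[1:]))   # gain g, relative values h

def policy_iteration(P, c, max_iter=50):
    """Average-cost policy iteration; the thesis reports convergence in about 5 iterations."""
    m, n, _ = P.shape
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate_policy(P, c, policy)
        # Improvement step: pick the action with the smallest expected relative cost.
        new_policy = np.array([int(np.argmin([c[i] + P[k, i] @ h for k in range(m)]))
                               for i in range(n)])
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, g

# Example (reusing the helpers from the previous sketch):
#   P = transition_matrix(alpha=0.9, error=40)
#   c = np.array([immediate_cost(i) for i in range(len(S))])
#   policy, loss = policy_iteration(P, c)   # policy[i] indexes A; loss plays the role of the "loss"
```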

State   10  20  30  40  50  60  70  80  90 100
Action 105 105 105 105 105  99  99  99  99  99

State  110 120 130 140 150 160 170 180 190 200
Action  99  99  99  99  99  98  97  95  94  94

Table (3.1): Policy obtained by the iteration method for α = 0.9, error = 40.
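The resulting policy can be stored as a simple lookup table that the router consults for each measured state; the entries below are copied from Table (3.1), while the variable name and the nearest-state rounding are illustrative choices.

```python
# Policy of Table (3.1): measured state (percent of target bandwidth) -> rate command.
POLICY_09_40 = {
     10: 105,  20: 105,  30: 105,  40: 105,  50: 105,
     60:  99,  70:  99,  80:  99,  90:  99, 100:  99,
    110:  99, 120:  99, 130:  99, 140:  99, 150:  99,
    160:  98, 170:  97, 180:  95, 190:  94, 200:  94,
}

def rate_command(arrival_rate_percent):
    """Round the measured arrival rate to the nearest state and look up the command."""
    state = min(POLICY_09_40, key=lambda s: abs(s - arrival_rate_percent))
    return POLICY_09_40[state]
```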

The “loss” of the Markov process is 28.730847. This means that under such a policy the average distance between the instantaneous packet arrival rate and the target bandwidth is about 28.7% of the target bandwidth, i.e., the instantaneous rate is roughly [1 + 0.2873] × (target bandwidth). We try to keep the average packet arrival rate at the target bandwidth by using the policy that minimizes this loss. Simply by changing the value of α, we can generate different policies, as shown in Table (3.2).

Table (3.2): Policy for different α, error = 40.

The corresponding “loss” for each α is:

α        0.1     0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
“loss”   9.4158  11.6888  13.9770  16.2859  18.6062  20.9794  23.4336  26.0079  28.7308

Table (3.3): “Loss” value for each α.

From Table (3.2) we can see that the policy becomes more aggressive as α increases: when the current state is much larger than the target bandwidth, the policy decreases the TCP senders’ throughput more significantly.

In our model there are two parameters to be configured: α and error. However, our current technique can dynamically adjust only one of them. Since each policy corresponds to a different value of the “loss”, we can use this value, the gain of the Markov process, to connect the model with the real network environment. For example, we first assume that the error is very large, set α to 0.9, and use the Markov decision process to find the policy. We then use this policy to control the arrival rate and record the immediate cost as the system transits between states. After a while we obtain the actual “loss” observed in the system, use it to find the corresponding value of α, and then use the new α to compute a new policy.

We define g_t to be the average of the immediate cost over time, so g_t is actually the “loss” of the Markov process, and we update g_t in the following way:

g_t = [(t - 1) * g_{t-1} + r_t] / t    (3.5)

In eq. (3.5), t denotes the number of stages and r_t denotes the immediate cost observed at stage t. After we obtain the new value of g_t, we find the corresponding value of α according to Table (3.3) and generate a new policy.
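As a minimal sketch of this adaptation loop: the controller keeps the running average of the immediate cost exactly as in eq. (3.5) and periodically maps the estimate back to an α by a nearest-value lookup in Table (3.3). The functions measure_arrival_rate and recompute_policy and the update interval are placeholders, not part of the thesis.

```python
# Table (3.3): average "loss" obtained for each alpha (error = 40).
ALPHA_TO_LOSS = {0.1: 9.4158, 0.2: 11.6888, 0.3: 13.9770, 0.4: 16.2859,
                 0.5: 18.6062, 0.6: 20.9794, 0.7: 23.4336, 0.8: 26.0079,
                 0.9: 28.7308}

def alpha_from_loss(g):
    """Invert Table (3.3): pick the alpha whose tabulated loss is closest to g."""
    return min(ALPHA_TO_LOSS, key=lambda a: abs(ALPHA_TO_LOSS[a] - g))

def self_configure(measure_arrival_rate, recompute_policy, stages_per_update=1000):
    """Control loop: track the average cost (eq. 3.5) and refresh alpha and the policy.

    measure_arrival_rate() and recompute_policy(alpha) are placeholders for the
    router's rate measurement and for re-running the MDP solver; the update
    interval stages_per_update is likewise an illustrative choice.
    """
    alpha = 0.9                          # start by assuming the error is very large
    policy = recompute_policy(alpha)
    g, t = 0.0, 0
    while True:
        state = measure_arrival_rate()   # arrival rate in percent of target bandwidth
        r = abs(state - 100)             # immediate cost, eq. (3.4)
        t += 1
        g = ((t - 1) * g + r) / t        # running average of the cost, eq. (3.5)
        if t % stages_per_update == 0:
            alpha = alpha_from_loss(g)   # map observed loss back to alpha via Table (3.3)
            policy = recompute_policy(alpha)
        yield policy, alpha, g
```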

