Fairness Model - System Model - Using Fuzzy Q-learning to Achieve Fair Dynamic Bandwidth Allocation in MPLS-TP Rings

Chapter 2 System Model

2.3 Fairness Model

In a transport network, if the nodes transmit at an aggregate rate larger than the link capacity, an unfair bandwidth allocation problem occurs: the upstream nodes have more opportunity to transmit their data. The remedy is to generate an advertised fair rate at each node to regulate transit traffic flows. To define the meaning of fairness, an appropriate fairness model is indispensable. We choose a max-min fairness model for utilization optimization. The ring ingress aggregated with spatial reuse (RIAS) reference model was originally defined for RPR.

The formal definition of RIAS [24] is as follows: a matrix of rates $R$ is said to be RIAS fair if it is feasible and if, for each flow $(i,j)$, the rate $R_{i,j}$ cannot be increased while maintaining feasibility without decreasing $R_{i',j'}$ for some flow $(i',j')$ for which $R_{i,j'} \le R_{i,j}$, $(IA(i') - IA(i))$ at $B^{n}(i,j) \le (IA(i') - IA(i))$ at $B^{n'}(i,j)$, and $IA(i') \le IA(i)$ otherwise. The definition ensures a fair share both between different IA flows and between the subflows of each IA flow.

In the RIAS model, the available bandwidth on the current link is fairly allocated among all ingress-aggregated (IA) flows, where an IA flow is the aggregation of all flows originating from a specific ingress node. Furthermore, RIAS ensures maximal spatial reuse: unused bandwidth can be reallocated to other IA flows while maintaining fairness among them. Note that the RIAS model considers only a single ring, which is not fully efficient for dual-ring networks. That is why we need traffic engineering techniques to achieve network load balancing. If the ring selection is done appropriately, the traffic load on the two rings should be balanced.

Then, if the transmission rate of each flow is tuned to satisfy the RIAS model in each ring, we can say that the dual-ring network is fair. Note that the RIAS model assumes each flow is greedy. Even though the final target rates of the IA flows that traverse the congested link are fair, the rate tuning procedure should consider the finite bandwidth demand of each IA flow. Otherwise, persistent oscillation would be severe and would slow convergence.
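To make the role of finite demands concrete, the following is a minimal sketch of max-min fair sharing on a single congested link. It is not the RIAS algorithm itself (which also covers spatial reuse across several links); the function name `max_min_fair_share` and the demand values are illustrative assumptions.

```python
from typing import List


def max_min_fair_share(capacity: float, demands: List[float]) -> List[float]:
    """Water-filling allocation of one link among flows with finite demands.

    Hypothetical helper for illustration: flows are served in order of
    increasing demand, and bandwidth a flow does not need is passed on
    to the flows that ask for more.
    """
    allocation = [0.0] * len(demands)
    remaining = capacity
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for position, i in enumerate(order):
        flows_left = len(order) - position
        equal_share = remaining / flows_left
        # A flow never receives more than it asks for.
        allocation[i] = min(demands[i], equal_share)
        remaining -= allocation[i]
    return allocation


if __name__ == "__main__":
    # Illustrative demands in Mbps on a 2500 Mbps link.
    print(max_min_fair_share(2500.0, [650, 650, 400, 400, 200, 200, 100]))
    # Greedy flows (unbounded demand) simply split the link equally.
    print(max_min_fair_share(2500.0, [float("inf")] * 7))
```

With greedy flows every flow receives the same share, whereas with finite demands the small flows are fully served and only the large flows are limited, which is why a rate tuning procedure that assumes greedy flows can overshoot.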

Chapter 3 Fuzzy Q-learning Fair Rate Generator

3.1 Fuzzy Q-learning Fair Rate Generator

The Fuzzy Q-learning Fair Rate Generator (QLFG) contains four functional blocks: an adaptive fair rate calculator, a fuzzy Q-learning congestion estimator, a fuzzy local fair rate generator, and an advertised fair rate selector for each ringlet.

Figure 3.1 shows the functional blocks and the input/output parameters of QLFG.

Figure 3.1: Functional blocks of QLFG

The fairness control scheme defined in IEEE 802.17, called fairRate, implements congestion control in each node. The downstream node periodically generates an advertised fair rate and advertises it to its upstream node to regulate the added fairness-eligible traffic flow. We adopt the fair rate scheme and generate two fair rates, one for each of the two ringlets. At the same time, flows belonging to different traffic types may select between the two rings with different considerations to guarantee quality of service. Since the bandwidth of EF and AF traffic must be guaranteed, the only factor we can tune is the transmission rate of BE traffic.

During the $n$-th aging interval, which runs from time $(n-1)T$ to $nT$, the QLFG determines the advertised fair rate, denoted by $f_v(n)$. The fuzzy Q-learning congestion estimator estimates the congestion degree $D(n)$ of the local node according to the buffer queue length and the flow arrival rate. In addition to the traditional fuzzy inference system, we give it Q-learning ability. The fuzzy rule base output of the congestion estimator becomes adjustable through the Q-function and the reinforcement signal, which is called the rate difference. The congestion estimator then passes the congestion degree to the fuzzy local fair rate generator.

The adaptive fair rate calculator collects the received fair rate advertised by downstream nodes, the bandwidth reserved for EF and AF traffic, the flow arrival rate of transit BE traffic, and the flow arrival rate of locally added BE traffic. It then generates a provisional fair rate $f_p(n)$ and passes it to the fuzzy local fair rate generator.

The fuzzy local fair rate generator generates a local fair rate $f_l(n)$ according to the provisional fair rate $f_p(n)$ and the congestion degree $D(n)$. Finally, the advertised fair rate selector chooses either the received fair rate or the local fair rate as the advertised fair rate and transmits it to the upstream node via the opposite-direction ringlet.

3.2 Fuzzy Q-learning Congestion Estimator

For a certain ringlet-$x$, the fuzzy congestion estimator is in charge of generating the congestion degree, denoted by $D(n)$, at the $n$-th aging interval for the local fair rate generator. Since a larger flow arrival rate increases the buffer queue length when the service rate is unchanged, the congestion degree is determined not only by the buffer queue length of transit BE traffic at the $n$-th aging interval, denoted by $l_{t\_BE}(n)$, but also by the flow arrival rate at the $n$-th aging interval, denoted by $r_{t\_BE}(n)$.

In our design, we adopt a fuzzy logic control system to perform the calculation.

The typical structure of a fuzzy logic control (FLC) system, shown in Fig. 3.2, comprises four principal components: a fuzzifier, a fuzzy rule base, an inference engine, and a defuzzifier. The fuzzifier transforms crisp measured input variables $X$ into suitable fuzzy linguistic terms. These fuzzy linguistic terms are specified by membership functions $\mu(X)$ and defined in a term set $T(X)$. The fuzzy rule base stores the empirical knowledge, which is described by a collection of fuzzy control rules (e.g., IF-THEN rules) involving linguistic variables. These rules describe the relationship between the input variable $X$ and the control action $Y$. The inference engine is the kernel of an FLC system. It simulates human decision-making by performing an inference method to yield a desired control action, which is described in fuzzy linguistic terms. These fuzzy linguistic terms are specified by membership functions $\mu(Y)$ and defined in a term set $T(Y)$. The defuzzifier transforms the inferred fuzzy control action into a non-fuzzy control action $Y$.

Figure 3.2: The basic structure of a fuzzy logic control system

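Typically in the traffic control area, the triangular function $f(x; x_0, a_0, a_1)$ and the trapezoidal function $g(x; x_0, x_1, a_0, a_1)$ are used to define the membership functions for terms in the term set. These two functions are given by

$$f(x; x_0, a_0, a_1) = \begin{cases} \dfrac{x - x_0}{a_0} + 1, & x_0 - a_0 < x \le x_0, \\ \dfrac{x_0 - x}{a_1} + 1, & x_0 < x < x_0 + a_1, \\ 0, & \text{otherwise,} \end{cases}$$

$$g(x; x_0, x_1, a_0, a_1) = \begin{cases} \dfrac{x - x_0}{a_0} + 1, & x_0 - a_0 < x \le x_0, \\ 1, & x_0 < x \le x_1, \\ \dfrac{x_1 - x}{a_1} + 1, & x_1 < x < x_1 + a_1, \\ 0, & \text{otherwise,} \end{cases}$$

where $x_0$ in $f(\cdot)$ is the center of the triangular function; $x_0$ ($x_1$) in $g(\cdot)$ is the left (right) edge of the trapezoidal function; and $a_0$ ($a_1$) is the left (right) width of the triangular or trapezoidal function. The center, edge, or width of the triangular or trapezoidal membership function is set intuitively, based on the characteristics of the linguistic variables.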
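As an illustration of the triangular function $f(x; x_0, a_0, a_1)$ and the trapezoidal function $g(x; x_0, x_1, a_0, a_1)$ described above, here is a small sketch. It assumes the usual piecewise-linear shapes with peak value 1, and it treats a zero width as a vertical edge so that $f(x; x_0, 0, 0)$ behaves as a singleton; the function names are hypothetical.

```python
def triangular(x: float, x0: float, a0: float, a1: float) -> float:
    """f(x; x0, a0, a1): peak 1 at centre x0, left width a0, right width a1."""
    if x == x0:
        return 1.0
    if x0 - a0 < x < x0:          # rising edge (empty when a0 == 0)
        return (x - x0) / a0 + 1.0
    if x0 < x < x0 + a1:          # falling edge (empty when a1 == 0)
        return (x0 - x) / a1 + 1.0
    return 0.0


def trapezoidal(x: float, x0: float, x1: float, a0: float, a1: float) -> float:
    """g(x; x0, x1, a0, a1): value 1 on [x0, x1], widths a0 and a1 outside."""
    if x0 <= x <= x1:
        return 1.0
    if x0 - a0 < x < x0:
        return (x - x0) / a0 + 1.0
    if x1 < x < x1 + a1:
        return (x1 - x) / a1 + 1.0
    return 0.0


if __name__ == "__main__":
    print(triangular(0.5, 1.0, 1.0, 1.0))        # halfway up the left edge
    print(triangular(0.75, 0.75, 0.0, 0.0))      # singleton at 0.75
    print(trapezoidal(2.5, 1.0, 2.0, 1.0, 1.0))  # halfway down the right edge
```

Because the edge intervals are empty when a width is zero, the singleton case never divides by zero.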

Membership functions are defined for the terms S, L, and H in $T(l_{t\_BE}(n))$. To keep the defuzzification computationally simple, the membership functions of the terms VL, L, M, H, and VH in $T(D(n))$ are singletons, for example

$$\mu_{H}(D(n)) = f(D(n);\ 0.75,\ 0,\ 0).$$

Then we incorporate Q-learning into the fuzzy inference system. Let $X(n) = (l_{t\_BE}(n),\ r_{t\_BE}(n))$ denote the vector of input linguistic variables, and let $S_m$, for $m = 1, \ldots, 6$, denote the fuzzy linguistic terms of $X(n)$ at the $n$-th interval. The fuzzy Q-learning rule for the congestion estimator can then be designed as:

Rule $m$: if $X(n)$ is $S_m$, then $D_k$ with $q_n(S_m, D_k)$, for $1 \le k \le 5$, (3.15)

where $D_k = 0.25(k-1)$ is the value of the congestion degree and $q_n(S_m, D_k)$ is the Q-value for the state-action pair $(S_m, D_k)$. The space of $q_n(S_m, D_k)$ has size 30, since there are 6 rules and 5 output terms. The Q-value can be viewed as the preference for each $D_k$ under a given input state. Note that the value of $q_n(S_m, D_k)$ is learned from the reinforcement signal at each episode. The intensity with which $X(n)$ belongs to $S_m$, also called the rule intensity, is obtained by

$$\mu_m(n) = \mu_\alpha(l_{t\_BE}(n)) \times \mu_\beta(r_{t\_BE}(n)) \qquad (3.16)$$

where $\alpha \in \{S, L\}$ and $\beta \in \{L, M, H\}$. For each rule $m$, we choose the $D_k$ with the maximum Q-value as the most suitable action for this input state. Let this selected action for rule $m$ be $a_m(n)$. The global optimal action, denoted by $a_n(X(n))$, is then obtained from the selected actions $a_m(n)$ and the rule intensities $\mu_m(n)$.

The main idea of reinforcement learning is to learn an optimal value such that the accumulated future benefit is maximized. Watkins and Dayan proposed Q-learning, a strong off-policy method, to obtain the optimal Q-function. The value of the Q-function, denoted by $q_n(S_m, D_k)$, is updated recursively according to the rate difference $r_d(n)$. Once the state-action pair has been determined, the rate difference $r_d(n)$ [...] oscillation effect. We find that $r_d(n)$ can reveal the convergence condition of the network flows. $r_d(n)$ varies from $-C$ to $C$. Every time the available bandwidth changes, the flows constrained by the fair rate oscillate for a certain period. Along with the flow rates of the nodes, the magnitude of the rate difference $r_d(n)$ decreases and converges to zero during this period. So we take the rate difference as the reinforcement signal.

The idea is that if the rate difference is positive, meaning that the sum of the transit flow rate and the local add flow rate exceeds the available bandwidth, the congestion degree should become higher to reduce the local fair rate. Conversely, if the rate difference is negative, meaning that the sum of the transit flow rate and the local add flow rate falls short of the available bandwidth, the congestion degree should become lower to release the local fair rate. The value of $q_{n+1}(S_m, D_k)$ can then be updated [...] receive $r(X(n), a_n(X(n)))$ → update $q_n(S_m, D_k)$, and go back to the first step.
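The learning loop of the congestion estimator can be sketched as follows. The rule intensity follows (3.16) and the action values follow $D_k = 0.25(k-1)$, but two parts are assumptions rather than the method of this thesis: the per-rule actions are combined by an intensity-weighted mean, and the Q-values are updated with a generic Watkins-style temporal-difference step shared among rules by normalised intensity. All function names are hypothetical, and `reward` stands in for whatever reinforcement signal is derived from the rate difference. The defaults for the learning rate and discount factor are the values given in Section 4.1.

```python
from typing import List, Sequence

# Congestion degree values D_k = 0.25 (k - 1) for k = 1..5.
D_VALUES = [0.25 * k for k in range(5)]


def rule_intensities(mu_queue: Sequence[float], mu_rate: Sequence[float]) -> List[float]:
    """Eq. (3.16): product of queue-length and arrival-rate memberships."""
    return [mq * mr for mq in mu_queue for mr in mu_rate]


def greedy_actions(q_table: Sequence[Sequence[float]]) -> List[int]:
    """For each rule, the index of the D_k with the largest Q-value."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]


def congestion_degree(mu: Sequence[float], actions: Sequence[int]) -> float:
    """ASSUMED combination: intensity-weighted mean of the per-rule D_k."""
    total = sum(mu)
    if total == 0.0:
        return 0.0
    return sum(m * D_VALUES[a] for m, a in zip(mu, actions)) / total


def q_update(q_table: List[List[float]], mu: Sequence[float],
             actions: Sequence[int], reward: float,
             next_mu: Sequence[float],
             eta: float = 1.0, gamma: float = 0.9) -> None:
    """ASSUMED Watkins-style update, shared by normalised rule intensity."""
    total, next_total = sum(mu), sum(next_mu)
    if total == 0.0:
        return
    # Value of the fuzzy state-action pair that was just applied.
    q_now = sum(m * q_table[i][a]
                for i, (m, a) in enumerate(zip(mu, actions))) / total
    # Best achievable value in the next fuzzy state.
    v_next = 0.0
    if next_total > 0.0:
        v_next = sum(m * max(row) for m, row in zip(next_mu, q_table)) / next_total
    td_error = reward + gamma * v_next - q_now
    for i, (m, a) in enumerate(zip(mu, actions)):
        q_table[i][a] += eta * td_error * m / total


if __name__ == "__main__":
    q = [[0.0] * 5 for _ in range(6)]               # 6 rules x 5 actions
    mu = rule_intensities([0.3, 0.7], [0.0, 0.4, 0.6])
    acts = greedy_actions(q)
    print(congestion_degree(mu, acts))
    q_update(q, mu, acts, reward=-0.2, next_mu=mu)  # penalise the chosen actions
    print(congestion_degree(mu, greedy_actions(q)))
```

In this sketch a negative reward lowers the Q-value of the action each active rule just took, so the next greedy selection moves those rules to a different congestion degree.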

3.3 Adaptive Fair Rate Calculator

The adaptive fair rate calculator is designed to calculate a provisional fair rate $f_{p,x}(n)$ and transmit it to the fuzzy local fair rate generator. The provisional fair rate $f_{p,x}(n)$ can be seen as the base of the local fair rate $f_{l,x}(n)$, which is tuned by the fuzzy local fair rate generator according to the congestion degree.

The concept of fairness is based on the RIAS model. We denote by $M(n)$ the equivalent number of IA transit flows traversing node $i$ at the $n$-th aging interval. The [...] aging interval and the flow arrival rate of transit BE traffic at the $n$-th aging interval.

Since the ring network might be large, the propagation delay can also be high. We must take the time average to eliminate the influence of the propagation delay.

The provisional fair rate of ringlet-$x$ at the $n$-th aging interval, denoted by $f_{p,x}(n)$, is then calculated by Eq. (3.24), where $r_R(n)$ is the bandwidth reserved for EF/AF traffic at the $n$-th aging interval.

3.4 Fuzzy Local Fair Rate Generator

The fuzzy local fair rate generator is designed to find a suitable value of the local fair rate for BE traffic on a certain ringlet-$x$. The function of the fair rate is to limit the excess traffic of upstream nodes. A congested node generates an advertised fair rate, and the upstream nodes receive it via the opposing ringlet. We take the provisional fair rate, denoted by $f_{p,x}(n)$, and the congestion degree generated by the congestion estimator of ringlet-$x$ at the $n$-th aging interval, denoted by $D_x(n)$, as the fuzzy linguistic inputs.

The input membership functions include the triangular form $f(f_{p,x}(n);\ 0.8B,\ 0.2B,\ 0.2B)$.

To keep the defuzzification computationally simple, the membership functions of the terms EL, VL, PL, L, SL, M, SH, H, PH, VH, and EH in $T(f_{l,x}(n))$ are singletons, for example

$$\mu_{VH}(f_{l,x}(n)) = f(f_{l,x}(n);\ 0.9B,\ 0,\ 0), \qquad (3.43)$$

$$\mu_{EH}(f_{l,x}(n)) = f(f_{l,x}(n);\ B,\ 0,\ 0). \qquad (3.44)$$

There are 18 fuzzy rules for the fuzzy local fair rate generator. As shown in Table 3.1, the order of significance of the input linguistic variables is $f_{p,x}(n)$, then $D_x(n)$. The fuzzy rule base is set according to the concept that the value of $f_{l,x}(n)$ mainly follows $f_{p,x}(n)$ but is slightly adjusted by $D_x(n)$, so as to achieve a shorter convergence period and higher utilization.

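Table 3.1: The rule base of the fuzzy local fair rate generator

| Rule | $f_{p,x}(n)$ | $D_x(n)$ | $f_{l,x}(n)$ |
|------|--------------|----------|--------------|
| 1  | EL | L | PL |
| 2  | EL | M | VL |
| 3  | EL | H | EL |
| 4  | PL | L | SL |
| 5  | PL | M | L  |
| 6  | PL | H | PL |
| 7  | SL | L | M  |
| 8  | SL | M | SL |
| 9  | SL | H | L  |
| 10 | SH | L | H  |
| 11 | SH | M | SH |
| 12 | SH | H | M  |
| 13 | PH | L | VH |
| 14 | PH | M | PH |
| 15 | PH | H | H  |
| 16 | EH | L | EH |
| 17 | EH | M | VH |
| 18 | EH | H | PH |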

The inference engine then uses the max-min inference method to generate the local fair rate, because this method is suitable for real-time operation. To explain the max-min inference method, we take rule 7 and rule 12, which have the same control action "$f_{l,x}(n)$ is M", as an example. Applying the min operator to the antecedents of each rule gives the membership values $m_7(n) = \min\{\mu_{SL}(f_{p,x}(n)),\ \mu_{L}(D_x(n))\}$ and $m_{12}(n) = \min\{\mu_{SH}(f_{p,x}(n)),\ \mu_{H}(D_x(n))\}$. Subsequently, applying the max operator yields the overall membership value of the control action "$f_{l,x}(n)$ is M", denoted by $w_M(n)$:

$$w_M(n) = \max\{m_7(n),\ m_{12}(n)\}.$$

The same procedure is applied to EL, VL, PL, L, SL, SH, H, PH, VH, and EH to obtain all of the output indices. Finally, the fuzzy inference results are defuzzified to become usable values. By adopting the center of area defuzzification method, a crisp value of the local fair rate of ringlet-$x$ at the $n$-th aging interval, denoted by $f_{l,x}(n)$, is obtained.
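The max-min inference over the rule base of Table 3.1, followed by center of area defuzzification, can be sketched as below. Two things are assumptions for illustration: the input membership degrees are passed in directly as dictionaries (the input membership functions are not reproduced here), and the eleven output singletons are taken to be evenly spaced from 0 to $B$, which agrees with the value $0.9B$ for VH in (3.43) but is otherwise a guess. The function names are hypothetical.

```python
from typing import Dict

# Rule base of Table 3.1: (f_p term, D term) -> f_l term.
RULES = {
    ("EL", "L"): "PL", ("EL", "M"): "VL", ("EL", "H"): "EL",
    ("PL", "L"): "SL", ("PL", "M"): "L",  ("PL", "H"): "PL",
    ("SL", "L"): "M",  ("SL", "M"): "SL", ("SL", "H"): "L",
    ("SH", "L"): "H",  ("SH", "M"): "SH", ("SH", "H"): "M",
    ("PH", "L"): "VH", ("PH", "M"): "PH", ("PH", "H"): "H",
    ("EH", "L"): "EH", ("EH", "M"): "VH", ("EH", "H"): "PH",
}

# ASSUMED singleton positions as fractions of B: 0, 0.1, ..., 1.0.
OUTPUT_TERMS = ["EL", "VL", "PL", "L", "SL", "M", "SH", "H", "PH", "VH", "EH"]
CENTERS = {term: 0.1 * i for i, term in enumerate(OUTPUT_TERMS)}


def aggregate_rules(mu_fp: Dict[str, float], mu_d: Dict[str, float]) -> Dict[str, float]:
    """Max-min inference: min over a rule's inputs, max over rules per output."""
    weights = {term: 0.0 for term in OUTPUT_TERMS}
    for (fp_term, d_term), out_term in RULES.items():
        strength = min(mu_fp.get(fp_term, 0.0), mu_d.get(d_term, 0.0))
        weights[out_term] = max(weights[out_term], strength)
    return weights


def center_of_area(weights: Dict[str, float], bandwidth: float) -> float:
    """Weighted mean of the singleton positions, scaled by the bandwidth B."""
    total = sum(weights.values())
    if total == 0.0:
        return 0.0
    return bandwidth * sum(w * CENTERS[t] for t, w in weights.items()) / total


if __name__ == "__main__":
    w = aggregate_rules({"SL": 0.4, "SH": 0.6}, {"L": 0.7, "H": 0.3})
    # Rules 7 and 12 both conclude M; the larger strength is kept.
    print(w["M"])
    print(center_of_area(w, 1000.0))
```

In the usage example, rule 7 fires with strength min(0.4, 0.7) and rule 12 with min(0.6, 0.3), so the weight of M is the larger of the two.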

3.5 Advertised Fair Rate Selector

Finally, the advertised fair rate selector observes the incoming transit BE traffic $r_{t\_BE}(n)$ and compares it with the minimum of the local fair rate $f_{l,x}(n)$ and the advertised fair rate $f_{r,x}(n)$ received from the other ringlet for ringlet-$x$. The main reason for using the minimum operation is to be slightly conservative, so that the link is not overused too often. If $r_{t\_BE}(n)$ is greater than or equal to the minimum of $f_{l,x}(n)$ and $f_{r,x}(n)$, the link is considered overused, so the fair rate scheme decreases the rates of IA flows and chooses the minimum of $f_{l,x}(n)$ and $f_{r,x}(n)$ as the advertised fair rate $f_{v,x}(n)$. If $r_{t\_BE}(n)$ is smaller than the minimum of $f_{l,x}(n)$ and $f_{r,x}(n)$, the link is considered underused, so QLFG increases the rates of IA flows and chooses the maximum of $f_{l,x}(n)$ and $f_{r,x}(n)$ as the advertised fair rate $f_{v,x}(n)$. The function is described below:

$$f_{v,x}(n) = \begin{cases} \min\bigl(f_{r,x}(n),\ f_{l,x}(n)\bigr), & \text{if } r_{t\_BE}(n) \ge \min\bigl(f_{r,x}(n),\ f_{l,x}(n)\bigr), \\ \max\bigl(f_{r,x}(n),\ f_{l,x}(n)\bigr), & \text{if } r_{t\_BE}(n) < \min\bigl(f_{r,x}(n),\ f_{l,x}(n)\bigr). \end{cases} \qquad (3.46)$$

However, we can simplify (3.46) as:

$$f_{v,x}(n) = \begin{cases} f_{l,x}(n), & \text{if } f_{l,x}(n) > f_{r,x}(n) > r_{t\_BE}(n), \\ \min\bigl(f_{r,x}(n),\ f_{l,x}(n)\bigr), & \text{otherwise.} \end{cases} \qquad (3.47)$$
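The two selector rules (3.46) and (3.47) translate directly into code. The sketch below uses hypothetical function and argument names; the rates are in arbitrary but consistent units.

```python
def advertised_fair_rate(r_transit: float, f_local: float, f_received: float) -> float:
    """Eq. (3.46): take the minimum when the link is overused, else the maximum."""
    low = min(f_received, f_local)
    high = max(f_received, f_local)
    return low if r_transit >= low else high


def advertised_fair_rate_simplified(r_transit: float, f_local: float,
                                    f_received: float) -> float:
    """Eq. (3.47): advertise the local rate only when it is the largest of the three."""
    if f_local > f_received > r_transit:
        return f_local
    return min(f_received, f_local)


if __name__ == "__main__":
    for r, fl, fr in [(300, 400, 350), (360, 400, 350), (300, 350, 400)]:
        print(r, fl, fr,
              advertised_fair_rate(r, fl, fr),
              advertised_fair_rate_simplified(r, fl, fr))
```

Working through the cases suggests that the two forms give the same result except when the received rate exceeds the local rate and both exceed the transit arrival rate; there (3.46) advertises the received rate while (3.47) advertises the smaller local rate, so the simplified form is the more conservative of the two.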

Chapter 4 Simulation Results and Discussions

4.1 Simulation Environment

In the simulations, we compare the performance of QLFG with that of DBA and FLAG. The environment settings include a 2.5 Gbps link capacity, a 100 μs propagation delay between nodes, a 4 Mbyte transit queue buffer for BE traffic, and a 100 μs aging interval. The transit queue high threshold is 1 Mbyte and the transit queue low threshold is 0.5 Mbytes. Simulation results are recorded per aging interval. For simplicity, we assume that no EF flow and only one AF traffic flow passes through each node, with a rate of 1 Gbps. The learning rate η and the discount factor γ of the fuzzy Q-learning congestion estimator are 1 and 0.9, respectively. The initial value of the Q-function is based on the rule base of the fuzzy congestion detector in FLAG [26]. We focus on the variation of the BE traffic flow rate of each node in the first period, before it stabilizes. All of the simulation parameters are shown in Table 4.1. A flow is considered to have converged once its rate oscillation stays within 1% of its ideal fair rate, and the convergence time is measured at that point.

Table 4.1: The simulation parameters

| Parameter | Assumption |
|-----------|------------|
| Link capacity | 2.5 Gbps |
| Propagation delay | 100 μs to neighbor nodes |
| Aging interval | 100 μs |
| Simulation time | 300 s |
| Transit queue buffer for BE traffic | 4 Mbytes |
| Transit queue high threshold | 1 Mbyte |
| Transit queue low threshold | 0.5 Mbytes |
| AF flow rate | 1 Gbps |
| AF flow on-off switch period | 25 ms for greedy case; 50 ms for finite traffic case |
| Learning rate | 1 |
| Discounted factor | 0.9 |
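The 1% convergence criterion of Section 4.1 can be applied to a recorded rate trace as in the following sketch. It assumes one sample per aging interval (100 μs by default) and reports the time after the last sample that falls outside the tolerance band; the function name and the sample values are illustrative, not simulation output.

```python
from typing import Optional, Sequence


def convergence_time(samples: Sequence[float], ideal_rate: float,
                     interval_us: float = 100.0,
                     tolerance: float = 0.01) -> Optional[float]:
    """Time in microseconds after which every sample stays within
    `tolerance` (relative) of `ideal_rate`, or None if the trace never settles."""
    last_bad = -1
    for n, rate in enumerate(samples):
        if abs(rate - ideal_rate) > tolerance * ideal_rate:
            last_bad = n
    if last_bad == len(samples) - 1:
        return None  # still outside the band at the end (or empty trace)
    return (last_bad + 1) * interval_us


if __name__ == "__main__":
    trace = [500.0, 300.0, 380.0, 359.0, 356.0, 357.0]  # illustrative Mbps values
    print(convergence_time(trace, ideal_rate=357.1))
```

Scanning for the last out-of-band sample, rather than the first in-band one, avoids declaring convergence during a brief crossing of the ideal rate in the middle of an oscillation.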

The following scenarios are set up to examine the performance of the fairness algorithms. We adopt the large parking lot scenario with greedy traffic flows and with various finite traffic flows to observe how bandwidth is fairly shared on the congested link. For each scenario we compare three sets of figures: flow throughput, congestion detection signals, and the output rate of the congested node.

4.2 Large Parking Lot Scenario with Greedy Traffic Flows

Fig. 4.1 shows a large parking lot scenario with eight stations and seven greedy traffic flows. Nodes 1 to 7 each have a traffic flow to node 8, and the bandwidth demand of each flow equals the link capacity of 2.5 Gbps. One on/off AF traffic flow passes through each node. The rate of the AF flow is 1 Gbps when it is switched on, and the switch period is 25 ms.

Figure 4.1: Large parking lot scenario with greedy traffic flows.

Figs. 4.2(a), 4.2(b), and 4.2(c) present the throughput of flow(1-8), flow(3-8), flow(5-8), and flow(7-8) at node 7 under DBA, FLAG, and QLFG, respectively. First, QLFG and FLAG take less than 20 ms to stabilize the traffic flows during the first 25 ms. The flows all obtain the same bandwidth, about 357.1 Mbps, which guarantees the fairness property. Although QLFG takes slightly longer than FLAG to stabilize the traffic flows, this interval can be regarded as an initial state. DBA, in contrast, cannot stabilize the traffic flows within 25 ms. At 25 ms, the AF flow starts transmitting with 1 Gbps of bandwidth, and the bandwidth available to BE traffic flows decreases to 1.5 Gbps. DBA continues to oscillate from this point on. The reason for the DBA oscillation is that the propagation delay from node 7 to node 1 is large. QLFG and FLAG take 10.7 ms and 18.8 ms, respectively, to stabilize the traffic flows in the interval from 25 ms to 50 ms. The flows again obtain the same bandwidth, about 214.3 Mbps, so the fairness property still holds. At 50 ms, the AF flow terminates and the bandwidth available to BE traffic flows increases back to 2.5 Gbps. QLFG and FLAG take 7 ms and 10.6 ms, respectively, to stabilize the traffic flows in the interval from 50 ms to 75 ms. The better stabilization performance of QLFG is due to its fuzzy Q-learning ability and to taking the rate difference into account.
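The two per-flow bandwidths quoted above are consistent with an equal split of the BE bandwidth among the seven greedy flows, first with the full 2.5 Gbps link and then with the 1 Gbps AF flow switched on:

```latex
\frac{2500\ \text{Mbps}}{7} \approx 357.1\ \text{Mbps},
\qquad
\frac{(2500 - 1000)\ \text{Mbps}}{7} = \frac{1500\ \text{Mbps}}{7} \approx 214.3\ \text{Mbps}.
```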

Figure 4.2 (a): Throughput of DBA. (Axes: time in ms versus throughput in Mbps; curves: flow(1,8), flow(3,8), flow(5,8), flow(7,8).)

Figure 4.2 (b): Throughput of FLAG.

Figure 4.2 (c): Throughput of QLFG.

The difference between QLFG and FLAG can be seen clearly in Fig. 4.3. Figs. 4.3(a), 4.3(b), and 4.3(c) show the congestion detection at node 7 under DBA, FLAG, and QLFG, respectively. Focus on the AF flow switching times, which occur every 25 ms. By taking the rate difference into account, the fuzzy Q-learning ability makes the congestion degree much more sensitive. The congestion degree in QLFG increases more quickly than in FLAG at 50 ms, and the same holds when the available bandwidth increases at 500 ms. Since the congestion degree is more sensitive in QLFG, the advertised fair rate from node 7 stabilizes the network flows more quickly.

Figure 4.3 (a): Congestion detection on Node 7 by DBA. (Axes: time in ms versus normalized value from -1 to 1; curves: transit BE queue length / size, rate diff. / capacity, transit BE arrival rate / capacity, congestion degree.)

Figure 4.3 (b): Congestion detection on Node 7 by FLAG.

Figure 4.3 (c): Congestion detection on Node 7 by QLFG.

Figs. 4.4(a), 4.4(b), and 4.4(c) show the output rate of the transit queue, the output rate of the local added queue, and the advertised fair rate as the BE available bandwidth switches at node 7 under DBA, FLAG, and QLFG, respectively. The advertised fair rate generated by QLFG oscillates less. The transit rate and the local add rate change smoothly and properly because of the fuzzy Q-learning ability.

Figure 4.4 (a): Node 7 Output by DBA. (Axes: time in ms versus rate in Mbps; curves: BE available bandwidth, transit BE arrival rate, local BE add rate, adv. fair rate.)

Figure 4.4 (b): Node 7 Output by FLAG.

Figure 4.4 (c): Node 7 Output by QLFG.

4.3 Large Parking Lot Scenario with Various Finite Traffic Flows

Fig. 4.5 shows a large parking lot scenario with eight stations and seven non-greedy traffic flows. Nodes 1 to 7 each have a traffic flow to node 8. Assume that flow(1-8) and flow(2-8) require 650 Mbps, flow(3-8) and flow(4-8) require 400 Mbps, flow(5-8) and flow(6-8) require 200 Mbps, and flow(7-8) requires 100 Mbps. One on/off AF traffic flow passes through each node. The rate of the AF flow is 1 Gbps when it is switched on, and the switch period is 50 ms.

Figure 4.5: Large parking lot scenario with various finite traffic flows.

Figs. 4.6(a), 4.6(b), and 4.6(c) present the throughput of flow(1-8), flow(3-8), flow(5-8), and flow(7-8) at node 7 under DBA, FLAG, and QLFG, respectively. First, QLFG and FLAG take about 21 ms to stabilize the traffic flows during the first 50 ms. Although QLFG takes slightly longer than FLAG to stabilize the traffic flows, this interval can be regarded as an initial state, as in the previous greedy
