Chapter 3 Fuzzy Q-learning based H-ARQ Scheme

3.3 Design of the FQL Rules Base

In Section 3.2, we selected the input and output linguistic variables and defined the membership functions of the input variables for the fuzzy interface. In this section, we design the fuzzy rule base, which consists of the if-then rules and the Q-learning algorithm, together with the corresponding reinforcement signal for each rule.

The rule form of the FQL is shown in Eq. (3-4). We need to design the reinforcement signal for each rule in order to update the q-value of each action and carry out the Q-learning operation. The design of the fuzzy rules is based on the concept that the choice of MCS should be more aggressive when the BLER(n) performance is better and more conservative when BLER(n) is worse. The decision relies mainly on BLER(n), while CQI(n) is used to determine the selection base so as to accelerate the learning procedure. Therefore we divide the rules into three parts based on the three fuzzy terms, Green, Yellow and Red. Rules in the same part have the same reinforcement signal. The details of each part are described in the following.

Part 1: if BLER(n) is Green and CQI(n) is Level m, then MCS_k with q_n((BLER(n) is Green, CQI(n) is m), MCS_k), k > m.

There are 10 rules (1 ≤ m ≤ 10) in Part 1. In this part, BLER(n) is considered to be in a safe region, where BLER(n) is much smaller than BLER^*. The main goal is to maximize the throughput. To be more aggressive, only MCS_k with k > m is considered. The reward focuses on the amount of information carried and on whether the block can be successfully decoded before it is dropped. Thus the reinforcement signal is designed as

$$
r = \begin{cases}
\cdots, & \text{if the transmission is successful,} \\
-10, & \text{if the transmission fails,}
\end{cases}
$$

where R^k represents the number of information bits in the packet with MCS_k and R_rd^k is the number of redundancy bits, contained in the initial transmission and the retransmissions, required for the successful transmission. We update the q-value after the transmission of the packet is completed. A higher achievable data rate after transmission is expected to receive a larger reward. If the block still cannot be decoded after 3 retransmissions, it is dropped and a severe punishment, r = -10, is given.

Part 2: if BLER(n) is Yellow and CQI(n) is Level m, then MCS_k with q_n((BLER(n) is Yellow, CQI(n) is m), MCS_k), m − 1 ≤ k ≤ m + 1.

There are 10 rules (1 ≤ m ≤ 10) in Part 2. A Yellow BLER(n) means that the BLER performance is around the requirement. It is better to keep the BLER within a safe range. Hence we choose the MCS whose BLER is nearest to BLER^* under CQI(n) while carrying the most information. MCS_k, k = m − 1, m, m + 1, are the candidate actions. The reinforcement signal in this part is set as

$$
r = \sigma \times \frac{R^{k}}{R^{*}_{MCS}} \tag{3-27}
$$

where R_MCS^* is the number of information bits with 16 QAM modulation and coding rate 4/5, and σ is a scalar, which is 1/8 if the block is successfully decoded after the initial transmission and −1 if it is not. The reward is normalized by R_MCS^* so that it is proportional to the amount of information data.

σ is a weighting factor chosen according to the BLER requirement of 0.1. At this requirement, successful and failed initial transmissions occur with a ratio of 9 to 1, so weighting factors with a ratio of 1/9 for success and failure would balance each other. To be more aggressive, we increase the ratio to 1/8. If the transmission is dropped, we give a severe punishment, r = -10.
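The effect of moving the weighting from 1/9 to 1/8 can be checked with a short calculation. The sketch below computes the expected value of σ per initial transmission at the target BLER of 0.1 for both weightings, and evaluates the Part 2 reward for one packet, assuming the reward has the form σ × R^k / R_MCS^*. The function names and the bit counts are illustrative assumptions, not values from the simulation.

```python
from fractions import Fraction

DROP_PENALTY = -10  # severe punishment when the block is dropped


def expected_sigma(success_weight, bler):
    """Expected sigma per initial transmission: success_weight on success, -1 on failure."""
    return (1 - bler) * success_weight + bler * (-1)


def part2_reward(info_bits, ref_bits, initial_success, dropped=False,
                 success_weight=Fraction(1, 8)):
    """Assumed Part 2 form r = sigma * R^k / R*_MCS, overridden by -10 on a drop."""
    if dropped:
        return DROP_PENALTY
    sigma = success_weight if initial_success else Fraction(-1)
    return sigma * Fraction(info_bits, ref_bits)


target = Fraction(1, 10)
balanced = expected_sigma(Fraction(1, 9), target)    # reward and punishment cancel
aggressive = expected_sigma(Fraction(1, 8), target)  # small net reward remains

# BLER at which the 1/8 weighting breaks even: (1 - p) / 8 = p
break_even = Fraction(1, 8) / (1 + Fraction(1, 8))

# Hypothetical bit counts, for illustration only
r_ok = part2_reward(info_bits=2000, ref_bits=4000, initial_success=True)
r_fail = part2_reward(info_bits=2000, ref_bits=4000, initial_success=False)
```

With the 1/8 weighting the expected σ is zero at an initial BLER of 1/9 (about 0.111) rather than 0.1, which is the sense in which the rule is slightly more aggressive.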

Part 3: if BLER(n) is Red and CQI(n) is Level m, then MCS_k with q_n((BLER(n) is Red, CQI(n) is m), MCS_k), k < m.

There are 10 rules (1 ≤ m ≤ 10) in this part. A Red BLER(n) represents a violation of the BLER requirement. It is better to take action to recover from this situation. The action decision should be more conservative, so MCS_k with k < m is chosen. The reinforcement signal is set as

$$
r = \begin{cases}
\dfrac{R^{k}}{R^{*}_{MCS}}, & \text{if successfully decoded after the initial transmission,} \\
-1, & \text{if decoding fails after the initial transmission,} \\
-10, & \text{if the transmission is dropped.}
\end{cases}
$$

The system is rewarded only when the packet is successfully decoded in the initial transmission. The degree of reward is proportional to the amount of information bits. If the initial transmission fails, the system is punished, and if the block is eventually dropped, it receives the severe punishment r = -10.

There are ten rules in each part. The intensity μ_{j,n} of the 30 rules, j = 1, ..., 30, is inferred from the membership functions of BLER(n) and CQI(n) by a max-product operation, which can be expressed as

$$
\mu_{j,n} = \mu_{\alpha}\big(\mathrm{BLER}(n)\big) \times \mu_{\beta}\big(\mathrm{CQI}(n)\big)
$$

for each pair of input variable fuzzy terms (α, β), ∀α ∈ T(BLER(n)) and ∀β ∈ T(CQI(n)), where j is the index of the fuzzy term pair.
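The rule intensities can be sketched as follows, assuming that the intensity of each rule is the product of the membership degrees of its BLER term and its CQI term. The membership degrees used here are hypothetical numbers for a single TTI, not values derived from the membership functions of Section 3.2, and `rule_strengths` is an illustrative helper name.

```python
def rule_strengths(bler_membership, cqi_membership):
    """Return {(alpha, beta): intensity} for every fuzzy-term pair, using the product."""
    return {
        (alpha, beta): mu_a * mu_b
        for alpha, mu_a in bler_membership.items()
        for beta, mu_b in cqi_membership.items()
    }


# Hypothetical membership degrees for one TTI (assumed, for illustration only)
bler_mu = {"Green": 0.25, "Yellow": 0.75, "Red": 0.0}
cqi_mu = {level: 0.0 for level in range(1, 11)}
cqi_mu[4] = 0.5
cqi_mu[5] = 0.5

mu = rule_strengths(bler_mu, cqi_mu)                    # 3 x 10 = 30 rules
active = {pair: s for pair, s in mu.items() if s > 0}   # rules that actually fire
```

In this example only four of the 30 rules have a non-zero intensity, so only their local actions contribute to the global action in that TTI.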

Every TTI, the system state pair (BLER(n), CQI(n)) is input, and the local optimal actions a_j^*(n), j = 1, ..., J, are inferred separately for each of the above fuzzy rules by the select-max EEP, Eq. (3-5), together with the Q-learning algorithm. Here a_j^*(n) represents the index of the selected action. These local optimal actions a_j^*(n), j = 1, ..., J, are then combined with μ_{j,n} to obtain the global optimal action a^*(n) by Eq. (3-6). However, a^*(n) is continuous, while the output must be discrete in our application. We therefore map the continuous result a^*(n) to a discrete output action a^*_{n,d}. The continuous result a^*(n) is quantized by the following principle:

* * *

After the base station transmits with the selected MCS and the transmission is finished, so that the reinforcement signal of each part is available, the operations in Eqs. (3-7), (3-8), (3-9) and (3-10) are used to update the Q-values.

Chapter 4 Simulation Results and Discussions

4.1 System Environment and Parameters

In our simulation, we consider a hexagonal grid cell structure. There are 19 base stations (BS) in the multi-cell system to consider 2-tier neighboring cell interference.

For an HSDPA user, we assume that the HS-DSCH is allocated at most 80% of the total power of a BS. In this thesis, we define the HSDPA service power ratio (HSPR) as the ratio of the transmission power on the HS-DSCH for the HSDPA user to the total transmission power at the BS. The residual power not used for the HSDPA service is used for other services and control channels within the same cell. The interference from other cells is fixed. We use HSPR, which controls not only the amount of HSDPA transmission power but also the interference from the user's own cell, as the condition variable for observing the system performance.

In the simulation, we observe and compare the system performance with and without CQI delay. The CQI delay is set to 6 ms when it is considered. To evaluate the maximum achievable throughput, we assume that the users always have data to receive. The channel model is described in Section 2.3. The channel condition within a TTI is assumed to be constant. The detailed simulation environment parameters are shown in Table 4-1.

Table 4-1: Simulation parameters

| Parameter | Assumption |
| --- | --- |
| Cellular layout | Hexagonal grid, 19 sites, 1000 m cell radius |
| Path loss model (ξ) | 128.1 + 37.6 log10(r), where r is the base station separation in kilometers |
| Decorrelation length (d_cor) | 30 m |
| σ_L | 8.0 |
| Mobility assignment | 0 km/hr to 120 km/hr, random distribution |
| Carrier frequency | 2.0 GHz |
| Channel bandwidth | 5.0 MHz |
| Chip rate | 3.84 Mcps |
| Spreading factor | 16 |
| Thermal noise density | -174 dBm/Hz |
| TTI length | 2 ms |
| Forgetting factor (γ) | 0.1 |
| Learning rate (η) | 0.9 |
| N | 50 |
| BS total Tx power | Up to 44 dBm |
| Power for HSDPA data transmission | Maximum of 80% of total maximum available transmission power |
| ACK/NACK delay | 6 ms |
| HARQ | IR |
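A few of the link budget quantities implied by Table 4-1 can be computed directly. The following sketch evaluates the path loss model at a distance of 1 km, the thermal noise power, and the HS-DSCH power at the 80% limit. Integrating the noise density over the 3.84 MHz chip rate is an assumption made here for illustration, as are the function names.

```python
import math


def path_loss_db(distance_km):
    """Path loss from Table 4-1: 128.1 + 37.6 * log10(r), with r in kilometers."""
    return 128.1 + 37.6 * math.log10(distance_km)


def thermal_noise_dbm(bandwidth_hz, density_dbm_per_hz=-174.0):
    """Thermal noise power over a bandwidth, from the noise density in dBm/Hz."""
    return density_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def dbm_to_watt(power_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


loss_at_1km = path_loss_db(1.0)                  # 128.1 dB
noise_dbm = thermal_noise_dbm(3.84e6)            # about -108.2 dBm (assumed bandwidth)
hsdpa_dbm = 44.0 + 10.0 * math.log10(0.8)        # 80% of 44 dBm, about 43.0 dBm
bs_total_watt = dbm_to_watt(44.0)                # about 25.1 W
```

Halving the distance to 0.5 km lowers the path loss by 37.6 × log10(2), roughly 11.3 dB, which shows how quickly the received SINR changes across the cell.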

4.2 Conventional Schemes

In the simulation, we will compare the proposed FQL-HARQ scheme with some other conventional schemes. According to [10], we need to choose a suitable MCS at initial transmission in the HARQ process to maintain the BLER requirement 0.1.

Three conventional schemes are described in the following:

• Fixed threshold selection [10]:

Based on the BLER performance known in advance, the fixed threshold selection (FTS) scheme sets a fixed SINR threshold for each MCS. The threshold is the SINR at which the MCS has a BLER equal to the requirement of 0.1. At each TTI, FTS chooses the MCS whose threshold is below and closest to the measured SINR.

• Adaptive threshold selection (adaptive control of link adaptation [11]):

Compared with FTS, the adaptive threshold selection (ATS) scheme improves the performance of users with high mobility. ATS also sets a threshold for every MCS. In addition, after a transmission is completed, the thresholds close to the SINR of the last transmission are updated based on the block decoding result. The thresholds are increased if the initial transmission failed and decreased if it succeeded. The ratio of the increasing step to the decreasing step is set to (1 − BLER^*) / BLER^*.

• Q-learning based HARQ (QL-HARQ) [13]:

Without any prior knowledge of the BLER performance of each MCS, QL-HARQ uses the Q-learning algorithm to learn an optimal policy for both link adaptation and the HARQ retransmission version. The reinforcement signal is designed to be the normalized squared difference between the received SINR and the SINR required to maintain BLER = 0.1. After learning, QL-HARQ chooses the MCS whose required SINR for maintaining BLER = 0.1 is closest to the received SINR.
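The threshold-based schemes above can be sketched in a few lines. The example assumes thresholds sorted in ascending order, a fallback to the most robust MCS when the SINR is below every threshold, and an up/down step ratio of (1 − BLER^*) / BLER^* for ATS. The threshold values, the 0.1 dB step and the function names are hypothetical, and the sketch updates a single threshold rather than all thresholds near the last SINR.

```python
from bisect import bisect_right


def fts_select(thresholds_db, sinr_db):
    """Index of the MCS whose threshold is closest to, but not above, the SINR.

    thresholds_db must be sorted in ascending order. Returns 0 (the most robust
    MCS) when the SINR is below every threshold; that fallback is an assumption.
    """
    return max(bisect_right(thresholds_db, sinr_db) - 1, 0)


def ats_update(threshold_db, initial_success, step_down_db=0.1, target_bler=0.1):
    """Raise the threshold after a failed initial transmission, lower it after a success.

    The up/down step ratio is (1 - BLER*) / BLER*, so the expected change is
    zero when the initial BLER equals BLER*.
    """
    step_up_db = step_down_db * (1.0 - target_bler) / target_bler
    if initial_success:
        return threshold_db - step_down_db
    return threshold_db + step_up_db


thresholds = [-3.0, 0.5, 4.0, 7.5]     # hypothetical SINR thresholds in dB
choice = fts_select(thresholds, 5.2)   # selects the MCS with the 4.0 dB threshold
after_ack = ats_update(4.0, initial_success=True)
after_nack = ats_update(4.0, initial_success=False)
```

With BLER^* = 0.1, one failure raises the threshold nine times as far as one success lowers it, so nine successes are needed to undo a single failure.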

In the next section, we show the performance of the FQL-HARQ scheme and the conventional schemes versus HSPR, with and without CQI delay. We also present and discuss the simulation results of these schemes versus UE mobility with fixed power allocation.

4.3 Simulation Results and Discussions

Fig 4.1(a) and Fig 4.1(b) show the transmission block error rate versus HSPR without and with a 6 ms CQI delay for the proposed FQL-HARQ scheme and the three comparative schemes. Fig 4.1(a) shows that, without CQI delay, all schemes can perform MCS adaptation while satisfying the BLER requirement when more than 70% of the BS transmission power is allocated to the HSDPA service. However, when the CQI delay is considered, FTS and QL-HARQ violate the BLER requirement even with HSPR up to 80%, as shown in Fig 4.1(b). The mobility of the UEs in the simulation is uniformly distributed between 0 and 120 km/hr. UE motion causes not only the Doppler effect but also higher channel variance, which degrades the accuracy of the channel condition information used for MCS determination. After 6 ms, the actual channel condition may differ considerably from the information used for the decision. Comparing Fig 4.1(a) with Fig 4.1(b), we find that FTS and QL-HARQ are not flexible enough to accommodate an imperfect CQI report: the selected MCS may no longer suit the transmission channel, and the BLER requirement is violated.

Figure 4.1(a): The BLER comparison versus HSPR without CQI delay.

Figure 4.1(b): The BLER comparison versus HSPR with CQI delay.

On the other hand, ATS, the modified version of FTS, and our proposed scheme, FQL-HARQ, are more sensitive to the channel variance and can adaptively modify the MCS decision policy based on past transmission results, so they meet the BLER requirement in both Fig 4.1(a) and Fig 4.1(b). If initial transmissions fail too frequently, ATS and FQL-HARQ lower their rating of the CQI and adjust the decision rule to be more conservative, and thereby maintain the BLER requirement.

It can be observed that the BLER of FQL-HARQ slightly violates the requirement when HSPR is smaller than 35%, both with and without CQI delay. This is because at low SNR there are fewer MCSs available for selection, as shown in Fig 4.2. Since the SNR gap between the candidate MCSs is larger at low SNR, the FQL-HARQ principle of choosing a more aggressive MCS when the short-term BLER (SBLER) is better than the requirement leads to an overly aggressive MCS decision. At low HSPR, the UE faces bad channel conditions (low SNR) more frequently, so overly aggressive MCS selections accumulate, and FQL-HARQ violates the requirement for HSPR below 35%. This can be resolved by increasing N, the window size of the SBLER. If N is increased to 500, FQL-HARQ can maintain the requirement even with HSPR below 40%, at the cost of lower system throughput.

For the same reason, Fig 4.1(a) and Fig 4.1(b) show that the BLER of QL-HARQ is affected by low HSPR more strongly. When the power allocated to the HSDPA user is less than 65% of the BS transmission power, the BLER performance of QL-HARQ degrades and violates the requirement severely. As mentioned in Section 4.2, QL-HARQ chooses the MCS whose required SINR for maintaining a BLER of 0.1 is closest to the reported CQI, regardless of whether the former is less than the latter. This is too aggressive because of the larger SNR gap between candidate MCSs at low SNR, as shown in Fig 4.2, and results in too high a BLER.

Figure 4.2: The BLER of turbo code of each MCS versus SNR under AWGN.

Comparing FQL-HARQ with QL-HARQ: because its reinforcement signal accounts for the performance of the subsequent HARQ process after a more aggressive MCS is used, FQL-HARQ avoids the BLER violation at low HSPR more effectively than QL-HARQ does.

Fig 4.1(a) also shows that FTS and ATS have almost the same performance when CQI delay is not considered, except at low HSPR, where ATS has a slightly higher BLER than FTS. This is also due to the larger SINR gap between MCSs at low SNR and the threshold updating range of ATS.

Fig 4.3(a) and Fig 4.3(b) show the system throughput of the four schemes without and with a 6 ms CQI delay. As HSPR increases, the throughput of all schemes increases in both cases, and every scheme achieves higher system throughput with perfect CQI than with CQI delay, especially ATS. For ATS, more accurate channel condition information yields higher system throughput because it reduces the probability of a wrong threshold adaptation. We can also see that FTS, the only non-adaptive scheme, achieves the highest throughput among the four schemes. With perfect CQI, the adaptive operation of the other schemes influences the instantaneous MCS decision and makes it too conservative when the channel condition is good.

Next we compare FQL-HARQ and ATS, the only two schemes able to meet the requirement when CQI delay is considered. Fig 4.1(b) shows that ATS keeps a lower BLER than FQL-HARQ. The reason is that ATS tunes the selection threshold directly and immediately based on the current ACK/NACK result, whereas FQL-HARQ tunes the selection policy based on a long-term measure, BLER(n), in a more sophisticated way, which results in a slower updating process than that of ATS. Nevertheless, Fig 4.3(b) shows that FQL-HARQ reaches a much higher throughput than ATS. This is because only the BLER performance is considered in, and affects, the threshold updating process of the ATS scheme.

On the other hand, as mentioned in Chapter 3, the MCS decision of FQL-HARQ is inferred from the fuzzy rules, which are adjusted by the reinforcement signals. The rule base is designed and separated into different parts according to the BLER requirement, while the reinforcement signals are set so as to reward MCSs with higher throughput. Since FQL-HARQ considers both maintaining the BLER and maximizing the throughput, the throughput can be increased by a more aggressive but still safe MCS determination.

Because its threshold tuning is too immediate, the selection policy of ATS may oscillate and choose an overly conservative MCS in good channel conditions.

Figure 4.3(a): The system throughput versus HSPR without CQI delay.

Figure 4.3(b): The system throughput versus HSPR with CQI delay.

Figure 4.4: The dropping rate comparison versus HSPR with 6 ms CQI delay (x-axis: HSPR (%); y-axis: dropping rate; curves: FQL, QL, ATS, FTS).

Fig 4.4 depicts the dropping rate versus HSPR with a 6 ms CQI delay. In the simulation, every transmission block has at most three retransmissions. If the block still cannot be decoded after the third retransmission, it is dropped. A more conservative MCS selection for the initial transmission, which has a smaller initial BLER, clearly results in a lower dropping rate. A low dropping rate decreases the signaling cost, but an overly conservative MCS may reduce the system throughput. As shown in Fig 4.3(b) and Fig 4.4, FQL-HARQ keeps a better balance in the trade-off between dropping rate and throughput than the other three schemes. Fig 4.4 also shows that QL-HARQ is more sensitive to HSPR than the others. As mentioned, this is because of the BLER distribution of the MCSs versus SNR shown in Fig 4.2. At high SNR, more MCSs are available for selection and the SNR gap between candidate MCSs is smaller, so QL-HARQ can execute a more accurate learning process.

We can see that the dropping rates of ATS and FTS rise slightly as HSPR increases, while the BLER of both schemes decreases, as shown in Fig 4.1(b). This is because the SNR gap between candidate MCSs is smaller at high SNR than at low SNR. The considered coding rates are 4/5, 3/4, 2/3, 1/2 and 1/3, ordered by SNR performance. If decoding of the transmission block fails, additional redundancy bits are transmitted to the user so that the block has the next-stage coding rate after retransmission. When the SNR gap between MCSs is small, which means that the BLER performance improves only a little after each retransmission, the decoding failure rate remains high after three retransmissions. So the dropping rates of FTS and ATS rise slightly at high HSPR. On the other hand, since the dropping condition is considered in FQL-HARQ, where the Q-values of the fuzzy rules are updated with a severe punishment signal, its dropping rate stays stable as HSPR increases.
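The staged coding rates can be illustrated by counting how many redundancy bits each retransmission appends. The sketch below assumes that the coded length equals the number of information bits divided by the coding rate, and that each retransmission moves the block exactly one stage down the list of rates. The 1200-bit block size and the function name are hypothetical.

```python
from fractions import Fraction

CODING_RATES = [Fraction(4, 5), Fraction(3, 4), Fraction(2, 3),
                Fraction(1, 2), Fraction(1, 3)]


def redundancy_schedule(info_bits, initial_rate, max_retransmissions=3):
    """Bits appended by each retransmission so the block reaches the next coding rate.

    Returns a list of (new_rate, appended_bits) pairs and stops early once the
    lowest rate (1/3) has been reached.
    """
    start = CODING_RATES.index(initial_rate)
    schedule = []
    sent = Fraction(info_bits) / initial_rate  # coded bits already transmitted
    for rate in CODING_RATES[start + 1:start + 1 + max_retransmissions]:
        total = Fraction(info_bits) / rate     # coded bits needed at the new rate
        schedule.append((rate, total - sent))
        sent = total
    return schedule


# Hypothetical 1200-bit block first sent at rate 4/5 (1500 coded bits)
steps = redundancy_schedule(1200, Fraction(4, 5))
```

For this block the three retransmissions append 100, 200 and 600 bits, so the first stages add little redundancy relative to the 1500 bits already sent, in line with the small BLER improvement per retransmission described above.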

Fig 4.5(a) and Fig 4.5(b) show the BLER performance and system throughput of the four schemes versus user mobility. In the simulation, the BS allocates 80% of the total transmission power to the HSDPA user, and the CQI has a 6 ms delay. All schemes perform better when the UE is stationary than when it is moving. In addition, FTS, the only scheme with a fixed selection policy, has better BLER and throughput performance than the three adaptive schemes when the UE moves at low speed and the channel condition variance is low. However, FTS has the worst BLER requirement violation among all schemes when the UE mobility is higher than 45 km/hr. This is due to the channel information inaccuracy resulting from the CQI delay and the Doppler effect. On the other hand, when the variance of the CQI inaccuracy is small, i.e. when the UE is at low mobility, schemes that adapt to the channel too rapidly, such as ATS, choose an overly conservative MCS and achieve inefficient system throughput. Again we find that FQL-HARQ reaches the highest system throughput among the schemes that can maintain the BLER requirement at the same time.

Surprisingly, Fig 4.5(a) and Fig 4.5(b) also show that when the mobility exceeds 30 km/hr and keeps increasing, the BLER decreases and the system throughput increases slightly for all schemes, contrary to expectation. This is because when the mobility of the UE is higher than 30 km/hr, the effect of channel variance and CQI inaccuracy
