HARQ Process for HSDPA by Fuzzy Q-learning Technique

National Chiao Tung University

Department of Communication Engineering

Master's Thesis

Student: Chiao-Yin Huang

Advisor: Dr. Chung-Ju Chang

A Thesis

Submitted to Department of Communication Engineering

College of Electrical and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of Master of Science

in

Communication Engineering

July 2008

Hsinchu, Taiwan

July, year 97 of the Republic of China

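HARQ Process for HSDPA by Fuzzy Q-learning Technique

Student: Chiao-Yin Huang Advisor: Dr. Chung-Ju Chang

Master's Program, Department of Communication Engineering, National Chiao Tung University

Abstract (Mandarin)

The 3rd Generation Partnership Project (3GPP) has proposed high speed downlink packet access (HSDPA) on top of the existing third-generation communication system to provide faster and more secure downlink packet data transmission. HSDPA adopts a hybrid automatic retransmission request (HARQ) process, which carries an important quality-of-service requirement: when the MCS of the initial transmission is chosen according to the channel condition, the resulting packet error rate must stay below 0.1. To address this issue, this thesis uses the fuzzy Q-learning algorithm to realize the HARQ process of HSDPA (FQL-based HARQ). Taking into account the interaction between link adaptation and the retransmission procedure, we model the HARQ process as a discrete-time Markov decision process (MDP). The fuzzy rules are designed to meet the BLER quality-of-service requirement, while Q-learning continuously learns the throughput of each MCS under different system conditions and corrects the fuzzy rules. After the learning converges, the scheme is expected to select the transmission format that achieves the highest system throughput without violating the QoS requirement. Simulation results show that, in the presence of channel information delay, the proposed scheme selects an appropriate MCS for different mobile-user channel environments. It satisfies the QoS requirement and achieves a higher transmission rate than conventional schemes that consider link adaptation only. Compared with the other schemes considered, FQL-HARQ obtains the largest system throughput while meeting the QoS requirement.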

HARQ Process for HSDPA by Fuzzy Q-learning Technique

Student: Chiao-Yin Huang Advisor: Dr. Chung-Ju Chang

Department of Communication Engineering

National Chiao Tung University

ABSTRACT

In order to provide higher-speed and more effective downlink packet data service in 3G, high speed downlink packet access (HSDPA) was proposed by the 3rd Generation Partnership Project (3GPP). An important QoS requirement defined in the specification for the hybrid automatic retransmission request (HARQ) process is to choose, based on the channel quality information, a suitable MCS that keeps the block error rate (BLER) of the initial transmission below 0.1. In this thesis, we propose a fuzzy Q-learning based HARQ (FQL-HARQ) scheme for HSDPA to solve this problem. The HARQ scheme is modeled as a Markov decision process (MDP). On one hand, the fuzzy rules are divided into different parts according to the short-term BLER performance so as to maintain the BLER requirement. On the other hand, by considering both link adaptation and the HARQ version, the Q-learning algorithm is used to learn the performance of each MCS under different environments. After learning, the goal is to choose the MCS with the highest throughput that does not violate the BLER requirement.

The simulation results show that the proposed scheme can indeed choose a suitable MCS for the initial transmission when channel information delay is taken into account. Compared with other traditional schemes, the FQL-HARQ scheme achieves higher system throughput while maintaining the BLER requirement.

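Acknowledgement

My master's studies have come to an end almost before I noticed. For the everyday moments of these two years, the knowledge I accumulated, and the completion of this thesis, I owe thanks to the many people who gave me strength and help. First, I thank my advisor, Dr. Chung-Ju Chang, who guided my research direction, encouraged me, and taught me how to conduct myself throughout my master's studies. Next, I thank 芳慶, a senior who had long since graduated, for giving up his own time every week to answer our questions and offer advice. I thank my seniors 立峰, 志明, 詠翰, 文祥, 耀興, and 煖玉 for their help and suggestions; whenever I ran into problems, they pointed my thinking in the right direction and helped me through every difficulty. I thank 建興, 佳璇, 世宏, 建安, 佳泓, 正昕, and 尚樺 for warmly helping me settle into the laboratory family and for cheering me on while I wrote this thesis. I thank 宗利, 英奇, 維謙, 邱胤, and 浩翔, who walked this road with me for two years and always gave me the courage to keep going when I was low, so that I never felt alone; thank you all. I thank 惟媜 for the warm care I received whenever I felt down during this past year. I also thank 盈伃, 志遠, 欣毅, and 和儁, who made the laboratory livelier and brought color to a dreary routine; I wish you smooth thesis research in the coming year. I thank 玉棋 for a great deal of administrative help this year. I also thank my junior 彥碩; without your help, this thesis could not have been completed smoothly. Finally, I thank my parents and my sister, whose support, care, and encouragement were my greatest motivation to work hard and persevere. I thank my friends, and especially my good friend 斯寗, who always helped and cared for me unconditionally when I needed it most. Thank you all.

Chiao-Yin Huang

July 2008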

Contents

Mandarin Abstract
English Abstract
Acknowledgement
Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 System Model
2.1 HSDPA System
2.2 HARQ Scheme
2.3 Channel Model
Chapter 3 Fuzzy Q-learning based HARQ Scheme
3.1 Fuzzy Q-Learning Algorithm
3.2 Input and Output Linguistic Variables
3.3 Design of the FQL Rules Base
Chapter 4 Simulation Results and Discussions
4.1 System Environment and Parameters
4.2 Conventional Schemes
4.3 Simulation Results and Discussions
Chapter 5 Conclusion
Bibliography

List of Tables

Table 3-1: the action pool and numbers of actions

List of Figures

Figure 2.1 : Release’99 and Release 5 HSDPA retransmission control
Figure 2.2 : HSDPA protocol architecture
Figure 2.3 : HS-SCCH and HS-DSCH timing relationship
Figure 2.4 : The allocation of bits in adjacent TTIs
Figure 3.1 : Block diagram of a learning system
Figure 3.2 : The overall structure of FQL
Figure 3.3 : Definition for function f(·)
Figure 3.4 : Definition for function g(·)
Figure 3.5 : The membership function of BLER(n)
Figure 3.6 : The membership function of CQI(n)
Figure 4.1(a): The BLER comparison versus HSPR without CQI delay.
Figure 4.1(b): The BLER comparison versus HSPR with CQI delay.
Figure 4.2 : The BLER of turbo code of each MCS versus SNR under AWGN.
Figure 4.3(a): The system throughput versus HSPR without CQI delay
Figure 4.3(b): The system throughput versus HSPR with CQI delay
Figure 4.4 : The dropping rate comparison versus HSPR with 6ms CQI delay.
Figure 4.5(a): The BLER versus different mobility of UE with CQI delay
Figure 4.5(b): The system throughput versus different mobility of UE with CQI delay

Chapter 1 Introduction

High Speed Downlink Packet Access (HSDPA) has been included in the 3GPP Release 5 [1] UMTS specification and was developed to enhance the downlink packet data services already provided in Release 99 for WCDMA networks. There are two main design targets for the HSDPA concept. First, HSDPA aims to support downlink best-effort packet data services with peak data rates up to 14.4 Mb/s. Second, it aims to reduce downlink transmission delays and ultimately reach three times the capacity of Release 99 [2].

To provide such an efficient, robust, and high-speed packet data service, HSDPA uses several techniques. A key characteristic of HSDPA is the use of the high speed downlink shared channel (HS-DSCH) [3]. The fast scheduler treats all the available resources within a cell, such as channelization codes and transmission power, as a common resource and schedules users in a time-multiplexed fashion. At each transmission time interval (TTI), which is 2 ms in HSDPA, the fast scheduler chooses the most suitable user to transmit to on the HS-DSCH, which contains all the resources allocated to the HSDPA system. Then

instead of power control and tunable number of CDMA codes, data rate adaptation

the Eb/N0 (energy per bit/noise) for each transmission every TTI.

Another important technique used in HSDPA is hybrid automatic repeat request (HARQ). When the receiver sends a NACK back, the Node B uses this enhanced retransmission scheme to recover from the scheduling error. Compared with the traditional ARQ scheme, which treats every (re)transmission of a block independently, the most powerful improvement in HARQ is that it softly combines the energy of the previous erroneous transmissions with that of the present retransmission in order to increase the probability of successful decoding.

There are three kinds of schemes for implementing HARQ. The first is the chase combining (CC) scheme, in which each retransmission is identical to the original transmission. The second is the incremental redundancy (IR) scheme, in which each retransmission consists of new redundancy bits generated by the turbo encoder. The third is the HARQ type-III scheme, which also belongs to the class of incremental redundancy HARQ schemes, but in which each retransmission is self-decodable because it consists of both the mother code and redundancy bits.

The performance of HARQ with CC and HARQ with IR for HSDPA was compared in [4] and [5]. The benefit of the chase combining scheme is that the final received SINR is the sum of the SINRs of all (re)transmissions when maximal-ratio combining (MRC) [6] is used. IR can be expected to outperform CC because of the coding gain from the turbo code. However, IR implies larger memory requirements for the mobile receivers and a larger amount of control signaling than CC. Simulation results show that when the SINR of the first transmission is worse than that of the retransmission, the IR scheme may not obtain any gain and may even perform worse than the CC scheme. This is due to the fact that

systematic bits are included in every (re)transmission for CC. While the channel state

correct systematic bits. It is shown in [7] that the three schemes perform very similarly when a smart antenna is used. In this thesis, we concentrate on the IR scheme for HARQ.
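The chase combining property described above, namely that the post-combining SINR under MRC is the sum of the per-transmission SINRs, can be checked numerically. The following is a minimal sketch assuming ideal maximal-ratio combining and SINR values given in dB; the function names are hypothetical and are not taken from the thesis.

```python
import math


def db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def linear_to_db(value):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(value)


def chase_combined_sinr_db(sinr_db_per_transmission):
    """Post-combining SINR in dB, assuming ideal MRC (linear SINRs add up)."""
    total = sum(db_to_linear(s) for s in sinr_db_per_transmission)
    return linear_to_db(total)


if __name__ == "__main__":
    # Two copies received at the same SINR: the combined SINR is doubled.
    print(round(chase_combined_sinr_db([0.0, 0.0]), 2))  # 3.01
```

Two transmissions received at equal SINR therefore give a gain of about 3 dB, which is the energy accumulation that CC relies on.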

To accommodate different channel conditions, several adaptive HARQ techniques have been studied. A2IR HARQ (asynchronous and adaptive hybrid ARQ) schemes were proposed to provide more diversity [8], [9]. Here, “asynchronous” means that each retransmission can take place at any time, whereas “synchronous” means that retransmissions occur only at specific time slots. The asynchronous operation introduces multi-user diversity. The “adaptive” operation, on the other hand, refers to the rate-compatible selection of the modulation order and coding rate scheme (MCS) for each (re)transmission, based on the channel state information and an estimate of the residual energy required for the packet to succeed. Two adaptive schemes with MCS adaptation were proposed: the variant TTI scheme [8] and the variant CDMA code scheme [9]. With variant TTI [8], the turbo encoder generates different sub-packet formats with different code rates. Selecting a higher resulting rate shortens the TTI for the retransmission and frees up time slots for scheduling other users. With fixed TTI [9], selecting a higher resulting rate frees up codes for assignment to other users. Simulation results show that HARQ with an adaptive scheme provides higher gain when the channel state is worse, for example at lower SINR and higher Doppler frequency.

In the HARQ procedure, an important objective is to find a suitable MCS for the first transmission based on the fed-back channel quality indicator (CQI). The MCS selected for the initial transmission of each packet influences the number of retransmissions needed for success and thus affects the system throughput, packet drop rate, and so on. Usually, the relation between the feedback

scheme has already been decided in a predetermined table so as to achieve the block error rate (BLER) requirement for the initial transmission, which is set to 0.1 according to [10].

Many methods have been studied to choose the MCS adaptively while maintaining the BLER requirement of 0.1. In [11], Nakamura, Awad and Vadgama proposed an adaptive method that tunes the SINR threshold of each MCS based on the result of the last transmission. In [12], Muller and Chen proposed a modified MCS SINR threshold adaptation method that considers not only the transmission result but also the rating of CQI for different CQI delay schemes.

To achieve higher system efficiency and resource utilization, a new HARQ procedure based on the Q-learning algorithm (Q-HARQ) was proposed in [13]. The Q-HARQ procedure is modeled as a discrete-time Markov decision process (MDP), since the decision making is based on the current channel state. Q-learning is a powerful kind of reinforcement learning [14]. Q-HARQ uses the Q-learning method and a so-called Q-value function to evaluate, step by step, the expected sum of an error feedback, which is called the reinforcement signal in reinforcement learning. The base station chooses the MCS with the minimum Q-value, meaning that the accumulated difference between the BLER requirement and the instantaneous BLER of each initial transmission is minimal. Simulation results have shown that the Q-HARQ scheme improves the system throughput over the conventional scheme and gives better QoS performance when the channel condition is bad [13].

However, the reinforcement information used in Q-HARQ [13] requires a large amount of additional signaling to be fed back to the base station, since the algorithm was implemented in an HSDPA scheme. This wastes bandwidth resources. Moreover, the effect of mobility, which might cause CQI delay and inaccuracy, is not considered in [13].

time and conventionally used to model the motion and way of thinking of humans in robot design [15], [16], [17], [18]. The fuzzy Q-learning (FQL) algorithm can be seen as an extension of Q-learning into fuzzy environments. Q-learning is a powerful off-policy temporal-difference (TD) control method for reinforcement learning; it learns the action-value function Q and decides the most suitable action from a feedback reinforcement signal, which may be a reward or a punishment. Fuzzy logic, on the other hand, can be considered a mathematical approach that emulates the human way of thinking by using if-then rules to handle control under imprecision. Therefore, the fuzzy Q-learning algorithm, which combines these two methods, can help the system adapt efficiently to the environment and choose suitable actions.

Since it is hard to find an explicit mathematical equation describing the relation between fulfilling the BLER requirement and maximizing the throughput, it is attractive to combine the fuzzy inference system with the on-line learning of the Q-learning algorithm to handle this imprecise problem and gradually obtain the best action-value function.

In this thesis, we propose a fuzzy Q-learning (FQL) based HARQ process for HSDPA in UMTS to find the most suitable MCS for each transmission time interval. The proposed FQL-based HARQ process not only provides an efficient method for making the MCS decision and achieving better system capacity, but also addresses the problems described above.

The rest of the thesis is organized as follows. The system model is described in Chapter 2. Chapter 3 describes the concept of fuzzy Q-learning and the design of the fuzzy Q-learning based HARQ scheme. Simulation results are given in Chapter 4, which compares the performance of the proposed schemes and the conventional schemes. Chapter 5 concludes the thesis.

Chapter 2 System Model

2.1 HSDPA System

HSDPA is designed to increase downlink packet data throughput by means of fast physical layer (L1) retransmission and transmission combining, as well as fast link adaptation controlled by the Node B. Note that retransmissions for HSDPA are handled in the Node B instead of in the radio network controller (RNC). The advantage is faster retransmission and shorter delay when a retransmission is needed. Fig 2.1 shows the difference in retransmission handling between Release 5 HSDPA and Release’99.

The medium access control (MAC) layer protocol architecture of HSDPA is shown in Fig 2.2. A new MAC functionality called MAC-hs is added in the Node B. The MAC-hs handles the automatic repeat request (ARQ) functionality, scheduling, and priority handling.

Figure 2.2 : HSDPA protocol architecture

In HSDPA, there are three new channels introduced in the physical layer

specification. They are high-speed downlink shared channel (HS-DSCH), high-speed

shared control channel (HS-SCCH), and uplink high-speed dedicated physical control

channel (HS-DPCCH).

HS-DSCH carries the user data in the downlink direction, with a peak rate of up to 14 Mbps with 16QAM. The TTI, or interleaving period, is defined to be 2 ms (three slots) to achieve a short round-trip delay between the Node B and the terminal for retransmissions. The spreading factor (SF) is fixed at 16, and multi-code transmission as well as code multiplexing of different users can take place on the HS-DSCH. The maximum number of codes that can be allocated is 15, since code space must remain available for common channels. The only channel coding scheme of HS-DSCH is turbo coding. To achieve the multiplexing coding gain, HARQ functionality is added to vary the transport block size, the modulation scheme, and the number of multicodes.

data of HS-DSCH and perform the possible physical layer combining of the data sent on HS-DSCH when a retransmission is needed. If there is no data on the HS-DSCH, there is no need to transmit the HS-SCCH. Each HS-SCCH block has three slots and is divided into two functional parts. The first slot carries the time-critical information needed to start the demodulation process in due time and to avoid chip-level over-buffering. The remaining two slots contain less time-critical parameters, such as the cyclic redundancy check (CRC) used to check the validity of the HS-DSCH information, and the HARQ process information. Fig 2.3 shows the timing relationship between HS-SCCH and HS-DSCH. From this figure, we can see that the terminal has one slot in which to determine which codes to de-spread from the HS-DSCH.

Figure 2.3 : HS-SCCH and HS-DSCH timing relationship

HS-DPCCH carries the necessary control information in the uplink direction and is divided into two parts, which carry the ACK/NACK messages and the downlink quality feedback information, respectively. The second part is also called the channel quality indicator (CQI) feedback. The Node B scheduler can use the information on HS-DPCCH to decide which terminal to transmit to and at what data rate.

The HSDPA physical layer operation goes through the following steps:

(i) The scheduler in the Node B evaluates several channel quality information

much data is pending in the buffer for each user, for which users

retransmissions are pending and how much time has elapsed since a

particular user was last served and so forth.

(ii) Once a terminal has been selected to be served in a particular TTI, the Node B identifies the necessary HS-DSCH parameters. These parameters are, for example, the number of available codes, the modulation order that can be used, and the terminal capability limitations. The terminal soft memory capability also defines which kind of HARQ can be used.

(iii) The Node B starts to transmit the HS-SCCH two slots before the

corresponding HS-DSCH TTI to inform the terminal of the necessary

parameters. The HS-SCCH selection is free if there was no data for the

terminal in the previous HS-DSCH frame.

(iv) The terminal monitors the HS-SCCHs given by the network. Once the terminal has decoded part 1 (as shown in Fig 2.3) from an HS-SCCH intended for that terminal, it starts to decode the rest of that HS-SCCH and buffers the necessary codes from the HS-DSCH.

(v) Once the HS-SCCH parameters have been decoded from part 2, the terminal can determine which HARQ process the data belongs to and whether it needs to be combined with data received previously in the soft buffer.

(vi) After decoding the combined data, the terminal sends an ACK/NACK

indicator in the uplink HS-DPCCH, depending on the outcome of the CRC

check conducted on the HS-DSCH data.

(vii) If the network continues to transmit data for the same terminal in

consecutive TTIs, the terminal will stay on the same HS-SCCH which was

2.2 HARQ Scheme

HARQ plays an important role in HSDPA. When a NACK is received, it combines the previous packet with the redundancy carried by the present packet to help decoding. Fig 2.4 shows an example of bit allocation in adjacent TTIs. Adding redundancy for a previously failed packet improves the probability of decoding that packet successfully. As redundancy bits are appended, the effective coding rate decreases to the next lower stage after each retransmission. With an appropriate HARQ functionality, the retransmission delay can be reduced. HARQ can determine the transport block size, modulation scheme, and code rate according to the channel quality information received as the CQI on HS-DPCCH. The CQI reports the corresponding modulation scheme and coding rate to the Node B scheduler according to the CQI table, which stores the modulation order (QPSK or 16QAM) and coding rate suitable for each channel condition. The CQI is sent to the Node B based on the instantaneous channel condition. The Node B receives the CQI and determines the actual modulation order and coding rate of the (re)transmission, since the reported information is only a suggestion to the Node B. By changing the modulation order according to the instantaneous channel condition, HARQ can effectively reduce the number of retransmissions and improve the throughput.
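The drop in coding rate as redundancy accumulates can be seen with a small calculation. The following is a minimal sketch, assuming that the effective rate is the number of information bits divided by the total number of distinct coded bits received so far; the bit counts are hypothetical and are not taken from the thesis.

```python
from fractions import Fraction


def effective_code_rates(info_bits, coded_bits_per_transmission):
    """Effective code rate after each (re)transmission of an IR HARQ process.

    Assumes every transmission delivers new (non-repeated) coded bits.
    """
    rates = []
    received = 0
    for coded_bits in coded_bits_per_transmission:
        received += coded_bits
        rates.append(Fraction(info_bits, received))
    return rates


if __name__ == "__main__":
    # Hypothetical block: 600 information bits, three transmissions.
    for index, rate in enumerate(effective_code_rates(600, [800, 400, 600]), start=1):
        print(f"after transmission {index}: rate = {rate}")
```

With these hypothetical numbers the rate steps down from 3/4 to 1/2 and then to 1/3, so each retransmission makes the accumulated codeword easier to decode.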

2.3 Channel Model

In this thesis, we consider a terrestrial radio channel for urban areas, as Chang did in [13]. The channel model includes three types of propagation factors: path loss, slow variation resulting from shadowing and scattering, and rapid variation of the signal due to multi-path effects. Here we denote by $F(t)$ the fading channel condition function at time $t$ for the WCDMA cellular system. $F(t)$ is mainly modeled by long-term and short-term fading and can be represented by

$$F(t) = \xi(r) \times 10^{\eta/10} \times \zeta(t), \tag{2-1}$$

where $\xi(r) \times 10^{\eta/10}$ is the long-term fading including path loss and shadowing, $r$ is the distance from the base station to the mobile user, and $\eta$ is a normally distributed random variable with zero mean and variance $\sigma_L^2$. The short-term fading factor $\zeta(t)$, caused by the multi-path effect, is assumed to follow the Jakes model [19], which is given by

$$\zeta(t) = \sigma \sqrt{\frac{2}{L}} \sum_{m=1}^{M} 2 e^{j\beta_m} \cos\left(2\pi f_D t \cos(2\pi m / L) + \theta_m\right), \tag{2-2}$$

where $\sigma$ is the square root of the average signal power, $f_D$ is the Doppler frequency, $L = 4M + 2$ is the number of signal paths, $\beta_m = \pi m/(M+1)$, and

$$\theta_m = \beta_m + 2\pi m s/(M+1), \quad s = 0, 1, 2, \ldots, M-1. \tag{2-3}$$

Because the summation components of $\zeta(t)$ are mutually independent, this technique can produce up to $M$ independent short-term fading processes. Therefore we choose $M$ equal to the total number of links in all cells of the system. Since it is reasonable to suppose that the scattering geometry is time invariant within some

simulations.
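A short-term fading sample can be generated directly from the sum-of-sinusoids form. The sketch below assumes the reading ζ(t) = σ·sqrt(2/L)·Σ_{m=1..M} 2·e^{jβ_m}·cos(2π f_D t cos(2πm/L) + θ_m), with β_m = πm/(M+1) and θ_m = β_m + 2πms/(M+1), for Eqs. (2-2) and (2-3). The function name, the Doppler frequency, and the value of M in the usage lines are hypothetical.

```python
import cmath
import math


def jakes_fading(t, doppler_hz, num_oscillators, s=0, sigma=1.0):
    """Complex short-term fading sample zeta(t), assumed form of Eqs. (2-2), (2-3).

    num_oscillators is M, s selects one of the fading processes (0 <= s <= M-1),
    and sigma is the square root of the average signal power.
    """
    m_total = num_oscillators
    num_paths = 4 * m_total + 2  # L = 4M + 2
    total = 0j
    for m in range(1, m_total + 1):
        beta_m = math.pi * m / (m_total + 1)
        theta_m = beta_m + 2.0 * math.pi * m * s / (m_total + 1)
        phase = 2.0 * math.pi * doppler_hz * t * math.cos(2.0 * math.pi * m / num_paths)
        total += 2.0 * cmath.exp(1j * beta_m) * math.cos(phase + theta_m)
    return sigma * math.sqrt(2.0 / num_paths) * total


if __name__ == "__main__":
    tti = 2e-3  # HSDPA TTI in seconds
    for n in range(5):
        sample = jakes_fading(n * tti, doppler_hz=5.5, num_oscillators=8, s=1)
        print(n, round(abs(sample), 4))
```

Evaluating the function once per TTI gives the fading amplitude seen by one link; a different value of s gives the process for another link.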

The shadowing experienced by a moving user changes as the position of the user changes. In a practical system, however, the change between two sampling times is small, because the sampling interval in HARQ is very short compared with the motion of the user. In other words, the shadowing effects at these sampling points are expected to be highly correlated, and the correlation is a function of the distance between two adjacent sampling points. In this thesis, we use the normalized autocorrelation function $\rho(\Delta x)$ in [20] to model the correlated shadow fading, where $\Delta x$ is the position difference between two adjacent TTIs. $\rho(\Delta x)$ is obtained by

$$\rho(\Delta x) = e^{-\frac{|\Delta x|}{d_{cor}} \ln 2}, \tag{2-4}$$

where $d_{cor}$ is the decorrelation length.
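One common way to use such an autocorrelation in a simulator is a first-order autoregressive update of the shadowing value from one TTI to the next. The sketch below assumes that approach, which the thesis does not state explicitly; the function names and the numeric values are hypothetical.

```python
import math
import random


def shadow_correlation(delta_x, d_cor):
    """Normalized autocorrelation of Eq. (2-4): exp(-(|dx| / d_cor) * ln 2)."""
    return math.exp(-abs(delta_x) / d_cor * math.log(2.0))


def correlated_shadowing(num_samples, delta_x, d_cor, sigma_db, rng=None):
    """Shadowing values (dB) at successive TTIs, using an assumed AR(1) model.

    Each new value keeps a fraction rho of the previous one and adds fresh
    Gaussian noise scaled so that the variance stays sigma_db ** 2.
    """
    if rng is None:
        rng = random.Random(0)
    rho = shadow_correlation(delta_x, d_cor)
    eta = rng.gauss(0.0, sigma_db)
    samples = [eta]
    for _ in range(num_samples - 1):
        innovation = rng.gauss(0.0, sigma_db)
        eta = rho * eta + math.sqrt(1.0 - rho * rho) * innovation
        samples.append(eta)
    return samples


if __name__ == "__main__":
    # Hypothetical values: 0.03 m moved per TTI, 20 m decorrelation length.
    print(shadow_correlation(20.0, 20.0))
    print(correlated_shadowing(5, delta_x=0.03, d_cor=20.0, sigma_db=8.0))
```

At a distance equal to the decorrelation length the correlation is one half, which is what the ln 2 factor in Eq. (2-4) encodes.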

After the UE measures the channel quality on the common pilot channel (CPICH) at time t, the estimate is transformed to a discrete CQI level from 0 to 30.

Chapter 3 Fuzzy Q-learning based HARQ Scheme

The fuzzy Q-learning based HARQ (FQL-HARQ) scheme is designed to determine a proper MCS for each initial transmission in the HARQ process such that the QoS requirement BLER* is maintained and the system throughput is increased.

The decision at every TTI, which is a function of the channel quality indicator (CQI) and the block error rate (BLER) performance, depends only on the current and past system states, so the process is modeled as a discrete-time Markov decision process (MDP) in this thesis. Because the MCS selection at the current TTI influences not only the current but also the future system performance, we apply the concept of reinforcement learning to solve this problem.

Furthermore, we combine the learning technique with fuzzy logic, which can be considered a mathematical approach that emulates the human way of thinking by using if-then rules, to handle control under imprecision. In this thesis, the so called

Figure 3.1 : Block diagram of a learning system

3.1 Fuzzy Q-Learning Algorithm

First we give a brief introduction to the fuzzy Q-learning (FQL) algorithm [15], [16], [17], [18].

As shown in Fig 3.1, a general learning system consists of five elements. The learner interacts with the environment and makes a decision according to the state. After the chosen action is applied, a reward resulting from the action is fed back to the learner and used to adjust the decision policy.

The basic idea of reinforcement learning is to learn an optimal policy that chooses the best action so as to maximize (minimize) the accumulated rewards (costs) induced by the selected actions. The expectation of this accumulation, called the Q-function, depends on the action, denoted by $a$, as well as on the state of the system, denoted by $x$, and is defined as

$$Q(x_0, a_0) = E\left\{ \sum_{n=0}^{\infty} \gamma^n r(x(n), a(n)) \,\middle|\, x(0) = x_0,\ a(0) = a_0 \right\}, \tag{3-1}$$

where $r$ is the reward, also called the reinforcement signal, $\gamma$ is the discount factor, $n$ in $x(n)$ or $a(n)$ is the index of the episode of the system state or action, and $E\{\cdot\}$ is the expectation operation. The output of the Q-function, called the Q-value, is the

from now on and this expectation value will be affected by the selected action $a_0$ under state $x_0$ at the current decision episode $n = 0$. The optimal action, denoted by $a^*$, can be obtained by

$$a^* = \arg\max_{a} Q(x_0, a). \tag{3-2}$$

However, the future system states, $x(n)$ for $n > 0$, and the expected value of $r$ are usually unavailable in the real world, and it is hard to build the relation among the action $a$, the system state $x$, and the expected value of the feedback $r$. Watkins and Dayan [14] proposed a recursive method, called the Q-learning algorithm, to solve these problems and obtain the optimal Q-function, denoted by $Q^*$. The rule for each step of the learning method, derived from Eq. (3-1), is given by [14]

$$Q_{n+1}(x, a) = \begin{cases} Q_n(x, a) + \eta_n \left[ r(x(n), a(n)) + \gamma \max_{a} \left[ Q_n(x(n+1), a) \right] - Q_n(x(n), a(n)) \right], & \text{for } (x, a) = (x(n), a(n)), \\ Q_n(x, a), & \text{for } (x, a) \neq (x(n), a(n)), \end{cases} \tag{3-3}$$

where $Q_n(x, a)$ is the transient function for $Q^*$ at episode $n$, and $\eta_n \in [0, 1]$ is the learning rate at the $n$-th episode. It is assumed that the next system state $x(n+1)$ is available. At episode $n+1$, the feedback reward $r(x(n), a(n))$ caused by the last action is used to update $Q_n(x, a)$ and obtain a new function $Q_{n+1}(x, a)$. Note that only the state-action pair $(x, a)$ that occurred in the episode just past has the information needed to correct its Q-value in this episode. After updating, we can use the more accurate Q-function approximation and the policy in Eq. (3-2) to select the action.
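The update rule of Eq. (3-3) can be written in a few lines for a tabular Q-function. The following is a minimal sketch of that rule; the function name, the state labels, and the numeric values are hypothetical.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      learning_rate, discount):
    """Apply Eq. (3-3) to the visited pair (state, action) and return its new value.

    All other entries of q_table are left unchanged, as in the second case
    of Eq. (3-3).
    """
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + discount * best_next - q_table[(state, action)]
    q_table[(state, action)] += learning_rate * td_error
    return q_table[(state, action)]


if __name__ == "__main__":
    q_table = defaultdict(float)  # all Q-values start at zero
    actions = [1, 2, 3]
    new_value = q_learning_update(q_table, "s0", 2, reward=1.0, next_state="s1",
                                  actions=actions, learning_rate=0.5, discount=0.9)
    print(new_value)  # 0.5
```

Only the entry of the visited state-action pair moves toward the target, which is why many visits to each pair are needed before the table is a good approximation of the optimal Q-function.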

The fuzzy Q-learning (FQL) algorithm can be regarded as the Q-learning

combination scheme can be observed from the general form of fuzzy inference system (FIS) rules:

$$\text{Rule } j: \text{ if } X(n) \text{ is } S_j, \text{ then } a_k \text{ with } q_n(S_j, a_k), \quad 1 \le k \le K, \tag{3-4}$$

where $X(n) = [x_1(n), \ldots, x_H(n)]$ is the vector of input linguistic variables, $H$ is the number of input variables, and $q_n(S_j, a_k)$ is the Q-value of the state-action pair $(S_j, a_k)$ at episode $n$. Denote by $S = \{S_j, j = 1, \ldots, J\}$ the set of state vectors. The uncertain input vector $X(n)$ belongs to each $S_j$ with a different intensity, depending on the membership functions of the input variables. $A = \{a_k, k = 1, \ldots, K\}$ is the set of action candidates. For simplicity, we assume that each $S_j$ contains one rule. We then get $J$ different Q-values $q_n(S_j, a(n))$, $j = 1, \ldots, J$, for each pair $(X(n), a_k)$, $k = 1, \ldots, K$, and can infer $J$ consequences from these $J$ rules separately. In this thesis, the so-called select-max strategy is adopted for each rule to choose the most suitable action:

$$a^*_j(n) = \arg\max_{a_k \in A} q_n(S_j, a_k). \tag{3-5}$$

These consequences $a^*_j(n)$, $j = 1, \ldots, J$, are then gathered to infer the global optimal action, denoted by $a^*(n)$:

$$a^*(n) = \frac{\sum_{j=1}^{J} \mu_{j,n} \times a^*_j(n)}{\sum_{j=1}^{J} \mu_{j,n}}, \tag{3-6}$$

the membership function of each input variable.

As in the traditional Q-learning algorithm, $q_n(S_j, a_k)$ is a transient value at time $n$ and is updated after the reward is returned. The Q-function update that yields the new $q_{n+1}(S_j, a^*(n))$, $j = 1, \ldots, J$, is given by

$$q_{n+1}(S_j, a^*(n)) = q_n(S_j, a^*(n)) + \eta_n \times \Delta q_n(S_j, a^*(n)), \quad \text{for } 1 \le j \le J, \tag{3-7}$$

and by Eqs. (3-8) and (3-9). $Q^*_n(X(n+1), a(n+1))$ is the next-stage optimal global Q-value. Since the next-stage Q-values $q_{n+1}(S_j, a_k)$, $j = 1, \ldots, J$, $k = 1, \ldots, K$, are not available, $Q^*_n(X(n+1), a(n+1))$ is calculated from $q_n(S_j, a_k)$, $j = 1, \ldots, J$, $k = 1, \ldots, K$, and is defined as

$$Q^*_n(X(n+1), a(n+1)) = \frac{\sum_{j=1}^{J} \left[ \mu_{j,n+1} \times q_n(S_j, a^*_j(n)) \right]}{\sum_{j=1}^{J} \mu_{j,n+1}}. \tag{3-10}$$

In summary, the procedure of the FQL algorithm is listed as follows:

Step 1: Initialize $q_n(S_j, a_k)$ for all pairs $(S_j, a_k)$, $k = 1, \ldots, K$, $j = 1, \ldots, J$.

Step 2: Observe the input linguistic vector $X(n)$.

Step 3: Use Eq. (3-5) to find $a^*_j(n)$, $j = 1, \ldots, J$.

Step 4: Use Eq. (3-6) to infer the global optimal action $a^*(n)$ and then apply it to the system.

Step 5: Compute the reinforcement signal $r(S_j, a^*(n))$ and measure $X(n+1)$.

Step 6: Update $q_n(S_j, a^*(n))$ for all $j$ using Eqs. (3-7), (3-8), (3-9), and (3-10).

Step 7: Return to Step 2 and repeat.

The overall FQL structure is shown in Fig 3.2.

Figure 3.2 : The overall structure of FQL (blocks: Fuzzy Rule Base, Inference with input X(n), EEP (select-max), System, Q-function Update with reward r(n))
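One pass through Steps 3 to 6 can be sketched in code. This is only an illustration under stated assumptions, not the thesis's implementation: the rule firing strengths μ are supplied by the caller; the averaged action of Eq. (3-6) is rounded to the nearest action index; and, because Eqs. (3-8) and (3-9) are not legible in this copy, the increment Δq is assumed to take the usual fuzzy Q-learning form, a temporal-difference error shared among the rules in proportion to their normalized firing strengths and applied to each rule's selected action. All function names are hypothetical.

```python
def select_max(q_row):
    """Eq. (3-5): index of the action with the largest Q-value in one rule."""
    return max(range(len(q_row)), key=lambda k: q_row[k])


def weighted_average(values, weights):
    """Firing-strength weighted average; at least one weight must be positive."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def fql_step(q, mu_now, mu_next, reward, learning_rate, discount):
    """One FQL update. q[j][k] holds q_n(S_j, a_k); mu_* are firing strengths.

    Returns the global action index chosen before the update.
    """
    rules = range(len(q))
    local = [select_max(q[j]) for j in rules]                    # Eq. (3-5)
    # Eq. (3-6); rounding to an integer index is an assumption.
    global_action = int(weighted_average(local, mu_now) + 0.5)
    local_q = [q[j][local[j]] for j in rules]
    q_now = weighted_average(local_q, mu_now)                    # assumed global Q-value
    q_next = weighted_average(local_q, mu_next)                  # Eq. (3-10)
    td_error = reward + discount * q_next - q_now                # assumed form of (3-8)/(3-9)
    norm = sum(mu_now)
    for j in rules:                                              # Eq. (3-7)
        q[j][local[j]] += learning_rate * td_error * mu_now[j] / norm
    return global_action


if __name__ == "__main__":
    q_values = [[0.0, 1.0], [0.0, 0.0]]  # two rules, two actions
    action = fql_step(q_values, mu_now=[1.0, 0.0], mu_next=[1.0, 0.0],
                      reward=1.0, learning_rate=0.5, discount=0.9)
    print(action, q_values)
```

Rules that do not fire (μ = 0) receive no update, so learning is concentrated on the fuzzy states that actually matched the observed input.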

3.2 Input and Output Linguistic Variables

The structure of the FQL algorithm was described in section 3.1. In this section,

HARQ scheme (FQL-HARQ).

At decision episode $n$, the FQL-HARQ scheme takes two system measures as input linguistic variables. The first is a short-term block error rate (BLER) performance indicator, denoted by $BLER(n)$ and defined as the number of failed transmissions among the last $N$ packets, from the $(n-N)$-th to the $(n-1)$-th transmission, divided by $N$. The other is $CQI(n)$, the channel quality indicator received at episode $n$. $CQI(n)$ takes discrete values in the range [0, 30], since the reported information consists of 5 bits in the HSDPA scheme. These two variables are the fuzzy inputs used to infer the most suitable modulation order and coding rate scheme (MCS) decision. That is, we have $X(n) = \{BLER(n), CQI(n)\}$ in the FQL-HARQ system.

In this thesis, we assume two modulation orders, QPSK and 16QAM, and five coding rates, 1/3, 1/2, 2/3, 3/4, and 4/5, resulting in a total of 10 modulation and coding pairs to choose from. The MCS pair pool, shown in Table 3-1, is the set for the output linguistic variable $a^*(n)$ mentioned in section 3.1. We number these MCSs from 1 to 10 according to their BLER performance: under the same channel condition, an MCS with lower BLER has a smaller number.
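The short-term indicator $BLER(n)$ can be sketched as a sliding window over the most recent decoding results. The following is a minimal illustration, not the thesis implementation: the class name `ShortTermBler` is hypothetical, the default window of 50 follows the value of N listed in Table 4-1, and dividing by the number of packets seen so far before the window fills is an assumption, since the text only defines the full-window case.

```python
from collections import deque


class ShortTermBler:
    """Fraction of failed transmissions among the last N packets."""

    def __init__(self, window: int = 50):
        # Each entry is True for a decoding failure, False for a success.
        self.results = deque(maxlen=window)

    def record(self, failed: bool) -> None:
        """Store the outcome of the latest transmission."""
        self.results.append(failed)

    def value(self) -> float:
        """Return BLER(n) over the packets currently in the window."""
        if not self.results:
            return 0.0  # assumption: no history is treated as no errors
        return sum(self.results) / len(self.results)


if __name__ == "__main__":
    bler = ShortTermBler(window=4)
    for failed in (False, True, False, False):
        bler.record(failed)
    print(bler.value())
```

A larger window makes the indicator smoother but slower to react, which is the trade-off discussed for N in section 4.3.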

The fuzzy term sets for the two input linguistic variables, $BLER(n)$ and $CQI(n)$, are defined as $T(BLER(n))$ = {Green, Yellow, Red} = {G, Y, R} and $T(CQI(n))$ = {Level 1, Level 2, Level 3, Level 4, Level 5, Level 6, Level 7, Level 8, Level 9, Level 10} = {L1, L2, L3, L4, L5, L6, L7, L8, L9, L10}, respectively. Terms in $T(BLER(n))$ describe the degree of the BLER performance. Based on the QoS requirement of BLER, denoted by $BLER^*$, “Green” here means that the BLER performance is “safe”, “Yellow” means “general”, and “Red” means “violation”.

MCS number    Modulation order    Coding rate
1             QPSK                1/3
2             QPSK                1/2
3             16QAM               1/3
4             QPSK                2/3
5             16QAM               1/2
6             QPSK                3/4
7             QPSK                4/5
8             16QAM               2/3
9             16QAM               3/4
10            16QAM               4/5

Table 3-1: The action pool and numbers of actions

On the other hand, terms in $T(CQI(n))$ stand for the judgment of the channel quality indication and are designed with ten levels, which are related to the 10 kinds of MCS adopted.

[Plot: $\mu(BLER(n))$, with maximum value 1, versus $BLER(n)$; marked points at $0.2 \times BLER^*$ and $0.7 \times BLER^*$.]

Figure 3.5 : The membership function of $BLER(n)$

The membership function $\mu(CQI(n))$ is defined and shown in Fig 3.6. As mentioned, $A_i$ is set to be the SINR required to maintain $BLER^*$ while using the $i$-th modulation and coding scheme ($MCS_i$, $i = 1, \ldots, 10$). Again we express $\mu(CQI(n))$, which is constituted by the membership functions for the terms L1, …, L10 of $T(CQI(n))$, as

$$ \mu_{L1}(CQI(n)) = g(CQI(n); -\infty, 0, A_1, A_2), \tag{3-16} $$

…

$$ \mu_{L9}(CQI(n)) = f(CQI(n); \ldots), \tag{3-24} $$

$$ \mu_{L10}(CQI(n)) = g(CQI(n); A_9, A_{10}, 30, \infty). \tag{3-25} $$

[Plot: $\mu(CQI(n))$, with maximum value 1, versus $CQI(n)$; membership functions for L1 to L10.]

Figure 3.6 : The membership function of $CQI(n)$

3.3 Design of the FQL Rules Base


In section 3.2, we selected the input and output linguistic variables and defined the membership functions of the input variables for the fuzzy interface. In this section, we design the fuzzy rule base, which consists of the if-then rules and the reinforcement signal of each rule.

The rule form of the FQL is shown in Eq. (3-4). We need to design the reinforcement signal for each rule to update the q-value of each action and accomplish the Q-learning operation. The design of the fuzzy rules is based on the concept that the choice of MCS should be more aggressive when the $BLER(n)$ performance is better and more conservative when $BLER(n)$ is worse. The decision depends mainly on $BLER(n)$, while $CQI(n)$ is used to determine the selection base so as to accelerate the learning procedure. Therefore, we divide the rules into three parts based on the three fuzzy terms Green, Yellow, and Red. Rules in the same part have the same reinforcement signal. The details of each part are described in the following.

Part 1: if $BLER(n)$ is Green and $CQI(n)$ is Level $m$, then $MCS_k$ with $q_n\big((BLER(n) \text{ is Green}, CQI(n) \text{ is } m), MCS_k\big)$, $k > m$.

There are 10 rules ($1 \le m \le 10$) in part 1. In this part, $BLER(n)$ is considered to be in a safe region, where $BLER(n)$ is much smaller than $BLER^*$. The main goal is to maximize the throughput. To be more aggressive, only $MCS_k$ with $k > m$ is considered. The focus is on the amount of information carried and on whether the signal can be successfully decoded before being dropped. Thus the reinforcement signal is designed as

$$ r = \begin{cases} \dfrac{R_{MCS_k}}{R_{MCS_k} + R_{rd_k}}, & \text{if successful transmission,} \\ -10, & \text{if failure transmission,} \end{cases} \tag{3-26} $$

where $R_{MCS_k}$ is the number of information bits carried with $MCS_k$ and $R_{rd_k}$ is the number of redundancy bits, contained in the initial transmission and the retransmissions, required for the successful transmission. We update the q-value after the transmission of the packet is completed. It can be expected that a higher achievable data rate after transmission gets a larger reward. If the block fails to be decoded after 3 retransmissions, it is dropped and we give a severe punishment, $r = -10$.

Part 2: if $BLER(n)$ is Yellow and $CQI(n)$ is Level $m$, then $MCS_k$ with $q_n\big((BLER(n) \text{ is Yellow}, CQI(n) \text{ is } m), MCS_k\big)$, $m-1 \le k \le m+1$.

There are 10 rules ($1 \le m \le 10$) in part 2. A Yellow $BLER(n)$ means that the BLER performance is around the requirement, and it is better to keep the BLER in a safe range. Hence we choose the MCS whose BLER is nearest to $BLER^*$ under $CQI(n)$ while containing the most information. $MCS_k$, $k = m-1, m, m+1$, are the candidate actions. The reinforcement signal in this part is set as

$$ r = \sigma \times \frac{R_{MCS_k}}{R_{MCS^*}}, \tag{3-27} $$

where $R_{MCS^*}$ is the number of information bits with 16QAM modulation and coding rate 4/5, and $\sigma$ is a scalar equal to 1/8 if decoding succeeds after the initial transmission and $-1$ if it fails. The degree of reward is normalized by $R_{MCS^*}$ so that it is proportional to the amount of information data. $\sigma$ is a weighting factor chosen according to the BLER requirement of 0.1. Since this requirement means that successful and failed transmissions occur with a ratio of 9 to 1, the reward and the punishment should have the ratio 1/9. To be more aggressive, we increase the ratio to 1/8. If the transmission is dropped, we give a severe punishment, $r = -10$.
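The choice of the weighting factor can be checked with a short expected-value calculation. This is only an illustrative sketch: it ignores the normalization factor $R_{MCS_k}/R_{MCS^*}$ and the dropping penalty, and it assumes that the initial transmission fails with probability $p$.

```latex
\begin{align*}
\text{ratio } 1/9:\quad & E[\sigma] = (1-p)\cdot\tfrac{1}{9} - p\cdot 1 = 0
  \iff p = 0.1, \\
\text{ratio } 1/8:\quad & E[\sigma] = (1-p)\cdot\tfrac{1}{8} - p\cdot 1 = 0
  \iff p = \tfrac{1}{9} \approx 0.111, \\
\text{at } p = 0.1:\quad & E[\sigma] = 0.9\cdot\tfrac{1}{8} - 0.1 = 0.0125 > 0 .
\end{align*}
```

Under these assumptions, the ratio 1/9 balances reward and punishment exactly at the required BLER of 0.1, whereas 1/8 moves the break-even point to a slightly higher error rate, which is consistent with the stated intent of being more aggressive.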

Part 3: if $BLER(n)$ is Red and $CQI(n)$ is Level $m$, then $MCS_k$ with $q_n\big((BLER(n) \text{ is Red}, CQI(n) \text{ is } m), MCS_k\big)$, $k < m$.

There are 10 rules ($1 \le m \le 10$) in this part. A Red $BLER(n)$ represents a violation of the BLER requirement, so action should be taken to recover from this situation. The action decision should be more conservative; thus $MCS_k$ with $k < m$ is chosen. The reinforcement signal is set as

$$ r = \begin{cases} \dfrac{R_{MCS_k}}{R_{MCS^*}}, & \text{if successful decoding after initial transmission,} \\ -1, & \text{if failed decoding after initial transmission,} \\ -10, & \text{if transmission is dropped.} \end{cases} \tag{3-28} $$

The system is rewarded only when the packet is successfully decoded in the initial transmission, and the degree of reward is proportional to the number of information bits. If the initial transmission fails, the system is punished with $r = -1$; if the block is eventually dropped, it is given the severe punishment $r = -10$.
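The three rule parts can be summarized in a short sketch that returns the candidate MCS numbers for a CQI level $m$ and the reinforcement signal of each part, as read from Eqs. (3-26) to (3-28). The function names and argument names are hypothetical, and the bit counts are supplied by the caller. For the edge cases $m = 10$ in part 1 and $m = 1$ in part 3 the candidate list comes out empty; how these cases are handled is not stated in this section.

```python
from typing import List

DROP_PENALTY = -10.0  # severe punishment when the block is dropped


def candidate_actions(part: str, m: int, num_mcs: int = 10) -> List[int]:
    """MCS numbers k allowed for CQI level m in the given rule part."""
    if part == "green":       # more aggressive: k > m
        ks = range(m + 1, num_mcs + 1)
    elif part == "yellow":    # around the requirement: m-1 <= k <= m+1
        ks = range(m - 1, m + 2)
    elif part == "red":       # more conservative: k < m
        ks = range(1, m)
    else:
        raise ValueError(f"unknown part: {part}")
    return [k for k in ks if 1 <= k <= num_mcs]


def reward_green(info_bits: float, redundancy_bits: float, dropped: bool) -> float:
    """Eq. (3-26): achieved rate after all transmissions, or -10 if dropped."""
    if dropped:
        return DROP_PENALTY
    return info_bits / (info_bits + redundancy_bits)


def reward_yellow(info_bits: float, ref_bits: float,
                  initial_ok: bool, dropped: bool) -> float:
    """Eq. (3-27): sigma * R_MCSk / R_MCS*, with sigma = 1/8 or -1."""
    if dropped:
        return DROP_PENALTY
    sigma = 1.0 / 8.0 if initial_ok else -1.0
    return sigma * info_bits / ref_bits


def reward_red(info_bits: float, ref_bits: float,
               initial_ok: bool, dropped: bool) -> float:
    """Eq. (3-28): reward only a successful initial transmission."""
    if dropped:
        return DROP_PENALTY
    return info_bits / ref_bits if initial_ok else -1.0


if __name__ == "__main__":
    print(candidate_actions("yellow", 5))
    print(reward_green(info_bits=100, redundancy_bits=100, dropped=False))
```

Here `ref_bits` stands for $R_{MCS^*}$, the number of information bits with 16QAM and coding rate 4/5.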

Each of the three parts contains ten rules. The intensity of the 30 rules $j$, $j = 1, \ldots, 30$, is inferred from the membership functions of $BLER(n)$ and $CQI(n)$ by a max-product operation, which can be expressed as

$$ \mu_{j,n} = \mu_\alpha(BLER(n)) \times \mu_\beta(CQI(n)), \quad \text{where } \alpha \in T(BLER(n)) \text{ and } \beta \in T(CQI(n)). \tag{3-29} $$

Here $j$ is the number of the rule for the case that $BLER(n)$ is $\alpha$ and $CQI(n)$ is $\beta$. Each rule corresponds to one pair of input fuzzy terms $(\alpha, \beta)$, $\forall \alpha \in T(BLER(n))$ and $\forall \beta \in T(CQI(n))$, and $j$ is the index number of that pair.
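The product in Eq. (3-29) can be illustrated with a small sketch that combines the membership degrees of the two inputs into the 30 rule intensities. The membership degrees are taken as given inputs here, because the shapes of the membership functions are defined elsewhere; the function name `rule_strengths` and the use of the term pair as dictionary key, instead of a rule number $j$, are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Tuple

BLER_TERMS = ["G", "Y", "R"]
CQI_TERMS = [f"L{i}" for i in range(1, 11)]


def rule_strengths(mu_bler: Dict[str, float],
                   mu_cqi: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Intensity of every (alpha, beta) rule as the product of memberships.

    Terms missing from an input dictionary are treated as membership 0.
    """
    return {
        (alpha, beta): mu_bler.get(alpha, 0.0) * mu_cqi.get(beta, 0.0)
        for alpha, beta in product(BLER_TERMS, CQI_TERMS)
    }


if __name__ == "__main__":
    # Hypothetical membership degrees for one decision episode.
    strengths = rule_strengths({"G": 0.25, "Y": 0.75}, {"L3": 0.5, "L4": 0.5})
    active = {pair: s for pair, s in strengths.items() if s > 0}
    print(active)
```

Only the rules whose two terms both have nonzero membership are active, so at most a few of the 30 rules contribute in any one TTI.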

Every TTI, the system state pair $(BLER(n), CQI(n))$ is input, and the local optimal actions $a_j^*(n)$, $j = 1, \ldots, J$, are inferred separately by the select-max EEP, Eq. (3-5), and the Q-learning algorithm based on the above fuzzy rules. Here $a_j^*(n)$ represents the number of the selected action. These local optimal actions are then used together with $\mu_{j,n}$ to obtain the global optimal action $a^*(n)$ by Eq. (3-6). However, $a^*(n)$ is continuous, while the output must be discrete in our application. We therefore map the continuous result $a^*(n)$ to a discrete output action $a_d^*(n)$ by the following principle:

$$ a_d^*(n) = \begin{cases} \lceil a^*(n) \rceil, & \text{with probability } a^*(n) - \lfloor a^*(n) \rfloor, \\ \lfloor a^*(n) \rfloor, & \text{with probability } \lceil a^*(n) \rceil - a^*(n), \end{cases} \tag{3-30} $$

where $\lfloor a^*(n) \rfloor$ and $\lceil a^*(n) \rceil$ are the integers in [1, 10] such that

$$ \begin{cases} a^*(n) - 1 < \lfloor a^*(n) \rfloor \le a^*(n), \\ a^*(n) < \lceil a^*(n) \rceil \le a^*(n) + 1. \end{cases} \tag{3-31} $$

$a_d^*(n)$ is the final action.
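The quantization rule of Eqs. (3-30) and (3-31) amounts to a randomized rounding whose expected value equals the continuous action. A minimal sketch follows; the function name `quantize_action` and the optional `rng` argument are hypothetical and only serve to make the example reproducible.

```python
import math
import random


def quantize_action(a_star: float, rng=None) -> int:
    """Map a continuous action in [1, 10] to a neighboring integer MCS number."""
    rng = rng or random
    lower = math.floor(a_star)   # a* - 1 < lower <= a*
    upper = lower + 1            # a* < upper <= a* + 1
    p_upper = a_star - lower     # probability of rounding up
    # rng.random() lies in [0, 1), so an integer a* (p_upper = 0) is never
    # rounded up and the result stays inside [1, 10].
    return upper if rng.random() < p_upper else lower


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [quantize_action(3.25, rng) for _ in range(10000)]
    print(sum(samples) / len(samples))  # close to 3.25 on average
```

Because the two probabilities sum to one and weight the neighbors by their distance to $a^*(n)$, the discrete action is unbiased with respect to the inferred continuous action.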

After the base station transmits with the decided MCS and the transmission is finished, so that the reinforcement signal of each part is available, the operations in Eqs. (3-7) to (3-10) are performed to update the q-values.

Chapter 4 Simulation Results and Discussions

4.1 System Environment and Parameters


In our simulation, we consider a hexagonal grid cell structure. There are 19 base stations (BSs) in the multi-cell system, so that 2-tier neighboring cell interference is considered. For an HSDPA user, we assume that the HS-DSCH is allocated at most 80% of the total power of a BS. In this thesis, we define the HSDPA service power ratio (HSPR) as the ratio of the transmission power on the HS-DSCH for the HSDPA user to the total transmission power at the BS side. The residual power is used for other services and control channels within the same cell. The interference from other cells is fixed. We use HSPR, which controls not only the amount of HSDPA transmission power but also the interference from the own cell, as the condition variable for observing the system performance.

In the simulation, we observe and compare the system performance with and without CQI delay. The CQI delay is set to 6 ms when considered. To evaluate the maximal achievable throughput, we assume that the users always have data to receive. The channel model is described in section 2.3. The detailed simulation environment parameters are shown in Table 4-1.

Table 4-1 : Simulation parameters

Parameter                            Assumption
Cellular layout                      Hexagonal grid, 19 sites, 1000 m cell radius
Path loss model ($\xi(r)$)           128.1 + 37.6 log10(r), r is the base station separation in kilometers
Decorrelation length ($d_{cor}$)     30 m
$\sigma_L$                           8.0
Mobility assignment                  0 km/hr to 120 km/hr, random distribution
Carrier frequency                    2.0 GHz
Channel bandwidth                    5.0 MHz
Chip rate                            3.84 Mcps
Spreading factor                     16
Thermal noise density                -174 dBm/Hz
TTI length                           2 ms
Forgetting factor ($\gamma$)         0.1
Learning rate ($\eta$)               0.9
N                                    50
BS total Tx power                    Up to 44 dBm
Power for HSDPA data transmission    Maximum of 80% of total maximum available transmission power
ACK/NACK delay                       6 ms
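A few derived quantities follow directly from the parameters in Table 4-1 and may help when reading the results. The sketch below is a back-of-the-envelope calculation, not part of the simulator: the function names are hypothetical, and taking the full 5 MHz channel bandwidth as the noise bandwidth is an assumption.

```python
import math


def thermal_noise_dbm(bandwidth_hz: float, density_dbm_hz: float = -174.0) -> float:
    """Thermal noise power over the given bandwidth, in dBm."""
    return density_dbm_hz + 10.0 * math.log10(bandwidth_hz)


def hsdpa_power_dbm(total_power_dbm: float, hspr: float) -> float:
    """HS-DSCH power when a fraction `hspr` of the BS power is allocated."""
    return total_power_dbm + 10.0 * math.log10(hspr)


def delay_in_ttis(delay_ms: float, tti_ms: float = 2.0) -> float:
    """Express a feedback delay as a number of TTIs."""
    return delay_ms / tti_ms


if __name__ == "__main__":
    print(round(thermal_noise_dbm(5.0e6), 2))      # noise power in dBm
    print(round(hsdpa_power_dbm(44.0, 0.8), 2))    # HS-DSCH power at 80% HSPR
    print(delay_in_ttis(6.0))                      # CQI delay in TTIs
```

With a 2 ms TTI, the 6 ms CQI delay corresponds to three TTIs between the channel measurement and the transmission that uses it.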

4.2 Conventional Schemes


In the simulation, we compare the proposed FQL-HARQ scheme with three conventional schemes. According to [10], a suitable MCS must be chosen at the initial transmission in the HARQ process to maintain the BLER requirement of 0.1. The three conventional schemes are described in the following:

• Fixed threshold selection [10]:
Based on the pre-known BLER performance, the fixed threshold selection (FTS) scheme sets a fixed SINR threshold for each MCS. The threshold is the SINR required for the MCS to have a BLER equal to the requirement of 0.1. At each TTI, FTS chooses the MCS whose threshold is just under and closest to the measured SINR.

• Adaptive threshold selection (adaptive control of link adaptation [11]):
Compared with FTS, the adaptive threshold selection (ATS) scheme improves the performance of users with high mobility. ATS also sets a threshold for every MCS. In addition, after a transmission is completed, the thresholds that are close to the SINR of the last transmission are updated based on the block decoding result. The thresholds are increased if the initial transmission failed and decreased if it succeeded. The ratio of the increasing step to the decreasing step is set to $(1 - BLER^*)/BLER^*$.

• Q-learning based HARQ (QL-HARQ) [13]:
Without any prior knowledge of the BLER performance of each MCS, QL-HARQ uses the Q-learning algorithm to learn an optimal policy in both link adaptation and HARQ. Its reinforcement signal is designed to be the normalized squared difference between the received SINR and the SINR required for maintaining BLER = 0.1. After learning, QL-HARQ chooses the MCS whose required SINR for maintaining BLER = 0.1 is closest to the received SINR.
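The two threshold-based schemes described above can be sketched in a few lines. This is a simplified illustration, not the implementation of [10] or [11]: the threshold values and the step size are hypothetical, falling back to MCS 1 when the SINR lies below every threshold is an assumption, and only the threshold of the MCS just used is updated, whereas the text updates the thresholds close to the SINR of the last transmission.

```python
from bisect import bisect_right
from typing import List


def fts_select(sinr_db: float, thresholds_db: List[float]) -> int:
    """Pick the MCS (1-based) with the largest threshold not above the SINR.

    `thresholds_db` must be sorted in ascending order, one entry per MCS.
    """
    count_at_or_below = bisect_right(thresholds_db, sinr_db)
    return max(1, count_at_or_below)  # assumption: MCS 1 if none qualifies


def ats_update(thresholds_db: List[float], mcs: int, initial_ok: bool,
               step_down_db: float = 0.01, target_bler: float = 0.1) -> List[float]:
    """Return new thresholds after one initial-transmission result.

    The up/down step ratio is (1 - BLER*) / BLER*, so the steps balance
    when the initial-transmission error rate equals the target.
    """
    step_up_db = step_down_db * (1.0 - target_bler) / target_bler
    updated = list(thresholds_db)
    updated[mcs - 1] += -step_down_db if initial_ok else step_up_db
    return updated


if __name__ == "__main__":
    thresholds = [-2.0, 0.0, 2.0, 4.0]   # hypothetical values for four MCSs
    mcs = fts_select(2.5, thresholds)
    print(mcs, ats_update(thresholds, mcs, initial_ok=False))
```

With a target of 0.1, one failure raises a threshold by nine times the amount that one success lowers it, which is what keeps the long-run error rate near the requirement.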

In the next section, we show the performance of the FQL-HARQ scheme and the conventional schemes versus HSPR, with and without CQI delay. We also present and discuss the simulation results of these schemes versus UE mobility with fixed power allocation.

4.3 Simulation Results and Discussions


Fig 4.1(a) and Fig 4.1(b) show the block error rate versus HSPR without and with the 6 ms CQI delay for the proposed FQL-HARQ scheme and the three comparative schemes. It can be seen in Fig 4.1(a) that, when more than 70% of the BS transmission power is allocated to the HSDPA service, all schemes can perform MCS adaptation while satisfying the BLER requirement without CQI delay. However, when the CQI delay is considered, FTS and QL-HARQ violate the BLER requirement even with HSPR up to 80%, as shown in Fig 4.1(b). The mobility of the UEs in the simulation is uniformly distributed in the range from 0 to 120 km/hr. The motion of a UE incurs not only the Doppler effect but also higher channel variance, and thus affects the accuracy of the channel condition information used for MCS determination. After 6 ms, the actual transmission channel condition may differ considerably from the information used for the determination. Comparing Fig 4.1(a) and Fig 4.1(b), we find that FTS and QL-HARQ are not flexible enough to accommodate an imperfect CQI report. The MCS determination may therefore not suit the actual transmission channel condition.

[Plot: BLER versus HSPR (%); curves: FQL, QL, ATS, FTS.]

Figure 4.1(a): The BLER comparison versus HSPR without CQI delay.

[Plot: BLER versus HSPR (%); curves: FQL, QL, ATS, FTS.]

Figure 4.1(b): The BLER comparison versus HSPR with CQI delay.

On the other hand, ATS, the modified version of FTS, and our proposed scheme, FQL-HARQ, are more sensitive to the channel variance and can adaptively modify the MCS selection policy based on past transmission results, so they meet the BLER requirement, as shown in Fig 4.1(a) and Fig 4.1(b). If failures of the initial transmission occur too frequently, ATS and FQL-HARQ lower the rating of the CQI and adjust the decision rule to be more conservative. Thus they can maintain the BLER requirement.

It can be observed that the BLER of FQL-HARQ slightly violates the requirement when HSPR is smaller than 35%, both with and without CQI delay. This is because there are fewer MCSs to select from at low SNR, as shown in Fig 4.2. Since the SNR gap between the candidate MCSs at low SNR is larger, the idea of FQL-HARQ of choosing a more aggressive MCS when the short-term BLER (SBLER) is better than the requirement results in a too aggressive MCS decision. At low HSPR, the UE faces bad channel conditions (low SNR) more frequently, and hence overly aggressive MCS selections accumulate. So FQL-HARQ violates the requirement at HSPR smaller than 35%. This can be resolved by increasing N, the window size of SBLER. If we increase N to 500, FQL-HARQ can maintain the requirement even with HSPR below 40%, but the throughput of the system decreases.

For the same reason, it can be seen in Fig 4.1(a) and Fig 4.1(b) that the BLER of QL-HARQ is affected by low HSPR more strongly. When the power allocated to the HSDPA user is less than 65% of the BS transmission power, the BLER performance of QL-HARQ degrades and violates the requirement severely. As mentioned in section 4.2, QL-HARQ chooses the MCS whose required SINR for maintaining a BLER of 0.1 is closest to the reported CQI, regardless of whether the former is higher than the latter. Together with the larger SNR gap between the candidate MCSs at low SNR, as shown in Fig 4.2, this results in too high a BLER. Comparing FQL-HARQ and QL-HARQ, because its reinforcement signal considers the performance of the subsequent HARQ process after a more aggressive MCS is used, FQL-HARQ avoids the BLER violation at low HSPR more effectively than QL-HARQ does.

[Plot: BLER versus SNR (dB); curves: MCS1 to MCS10.]

Figure 4.2 : The BLER of turbo code of each MCS versus SNR under AWGN.

It can also be seen in Fig 4.1(a) that FTS and ATS have almost the same performance when CQI delay is not considered, except at low HSPR, where ATS has a slightly higher BLER than FTS. This is also due to the larger SINR gap between MCSs at low SNR and the threshold updating range of ATS.

Fig 4.3(a) and Fig 4.3(b) show the system throughput of the four schemes without and with the 6 ms CQI delay. As HSPR increases, the throughput of all schemes increases in both cases, and every scheme achieves better system throughput with perfect CQI than with CQI delay, because more accurate channel information results in higher system throughput: it decreases the probability of executing a wrong threshold adaptation. We can also see that FTS, the only non-adaptive scheme, keeps the maximal throughput among the four schemes. With perfect CQI, the adaptive operation of the other schemes interferes with the instantaneous MCS decision and makes it too conservative when the channel condition is good.

We next compare the performance of FQL-HARQ and ATS, which are the only two schemes able to meet the requirement when CQI delay is considered. It can be seen in Fig 4.1(b) that ATS keeps a lower BLER than FQL-HARQ does. This is because ATS tunes the selection threshold directly and immediately based on the current ACK/NACK result, while FQL-HARQ tunes the selection policy based on a longer-term measure, $BLER(n)$, in a more sophisticated way, which results in a slower updating process than that of ATS. Nevertheless, it can be seen in Fig 4.3(b) that FQL-HARQ reaches a much higher throughput than ATS does. This is because only the BLER performance is considered in, and affects, the threshold updating process of the ATS scheme. On the other hand, as mentioned in Chapter 3, the MCS decision of FQL-HARQ is inferred from the fuzzy rules, which are adjusted by reinforcement signals. The rule base is designed and separated into different parts according to the BLER requirement, while the reinforcement signals are set so as to reward MCSs with higher throughput. Since both maintaining the BLER and maximizing the throughput are considered in FQL-HARQ, the throughput can be increased by a more aggressive but safe MCS determination. Because of its overly immediate threshold tuning, the selection policy of ATS may oscillate.

[Plot: throughput (Mbps) versus HSPR (%); curves: FQL, QL, ATS, FTS.]

Figure 4.3(a): The system throughput versus HSPR without CQI delay.

[Plot: throughput (Mbps) versus HSPR (%); curves: FQL, QL, ATS, FTS.]

Figure 4.3(b): The system throughput versus HSPR with CQI delay.

[Plot: dropping rate versus HSPR (%); curves: FQL, QL, ATS, FTS.]

Figure 4.4 : The dropping rate comparison versus HSPR with 6 ms CQI delay

Fig 4.4 depicts the dropping rate versus HSPR with the 6 ms CQI delay. In the simulation, every transmission block has at most three retransmissions. If the block fails to be decoded after the third retransmission, it is dropped. It can be seen that a more conservative MCS selection for the initial transmission, which has a smaller initial BLER, results in a lower dropping rate. A low dropping rate decreases the signaling cost but may reduce the system throughput if the MCS is too conservative. As shown in Fig 4.3(b) and Fig 4.4, FQL-HARQ keeps a more balanced performance in the trade-off between dropping rate and throughput than the other three schemes. It can also be seen in Fig 4.4 that QL-HARQ is more sensitive to HSPR than the others. As mentioned, this is because of the BLER distribution of the MCSs versus SNR shown in Fig 4.2. At high SNR, there are more MCSs to select from and hence a smaller SNR gap between the candidate MCSs, so QL-HARQ can execute a more accurate learning process.

The dropping rate of FTS and ATS rises slightly as HSPR increases, while the BLER of both schemes decreases, as shown in Fig 4.1(b). This is because the SNR gap between the candidate MCSs at high SNR is smaller than that at low SNR. The considered coding rates are 4/5, 3/4, 2/3, 1/2, and 1/3, in order of SNR performance. If the decoding of the transmission block fails, additional redundancy bits are transmitted to the user, so that the block has the next-stage coding rate after retransmission. When the SNR gap between MCSs is small, the BLER performance improves only a little after retransmission, and the decoding failure rate remains high after three retransmissions. So the dropping rate of FTS and ATS rises a little at high HSPR. On the other hand, since the dropping condition is considered in FQL-HARQ, where the Q values of the fuzzy rules are updated by a severe punishment signal, the dropping rate remains stable as HSPR increases.

Fig 4.5(a) and Fig 4.5(b) show the BLER performance and the system throughput of the four schemes versus user mobility. In the simulation, the BS allocates 80% of the total transmission power to the HSDPA user, and the CQI has a 6 ms delay. We can see that all schemes perform better when the UE is stationary than when it is moving. Moreover, FTS, the only scheme with a fixed selection policy, has better BLER and throughput performance than the three adaptive schemes when the UE moves at low speed and the channel condition variance is low. However, FTS has the worst violation of the BLER requirement among all schemes when the UE mobility is higher than 45 km/hr. This is due to the channel information inaccuracy resulting from the CQI delay and the Doppler effect. On the other hand, when the variance of the CQI inaccuracy is small, i.e., the UE is at low mobility, schemes that adapt to the channel too rapidly, i.e., ATS, choose a too conservative MCS and achieve ineffective system throughput. Again we find that FQL-HARQ reaches the maximal system throughput among the schemes that maintain the BLER requirement.

Surprisingly, we can also see in Fig 4.5(a) and Fig 4.5(b) that, when the mobility rises beyond 30 km/hr, the BLER decreases and the system throughput increases slightly for all schemes. This is because, when the mobility of the UE is higher than 30 km/hr, the effects of channel variance and CQI inaccuracy remain almost the same. Instead, the effect of path loss dominates the system throughput. A traveling UE may either move toward the BS and get a better channel condition or move away from the BS and suffer a worse path loss. Fig 4.6 shows the path loss versus the distance in km between the BS and the UE. In reality, a mobile user in the cell should have a moving direction with uniform probability distribution. It can be observed from the path loss curve in Fig 4.6 that, with the same movement, the reduction in path loss when moving toward the BS is larger than the increase when moving away from it.
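The path loss model of Table 4-1 grows logarithmically with distance, so equal displacements toward and away from the BS do not change the loss by equal amounts. The following quick check illustrates this with a hypothetical starting distance of 0.5 km and a hypothetical displacement of 0.1 km; only the formula itself is taken from the table.

```python
import math


def path_loss_db(r_km: float) -> float:
    """Path loss model from Table 4-1, with r in kilometers."""
    return 128.1 + 37.6 * math.log10(r_km)


if __name__ == "__main__":
    r, d = 0.5, 0.1  # hypothetical distance and displacement in km
    gain_toward = path_loss_db(r) - path_loss_db(r - d)   # loss reduction
    loss_away = path_loss_db(r + d) - path_loss_db(r)     # loss increase
    print(f"moving toward the BS: -{gain_toward:.2f} dB")
    print(f"moving away from the BS: +{loss_away:.2f} dB")
```

For these values the reduction is roughly 3.6 dB while the increase is roughly 3.0 dB, so a user whose direction is uniformly distributed gains slightly more on average than it loses for the same displacement.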

[Plot: BLER versus mobility (km/hr); curves: FQL, QL, ATS, FTS.]

Figure 4.5(a): The BLER versus different mobility of UE with CQI delay

[Figure 4.5(b) plot: throughput (Mbps) versus mobility (km/hr); curves: FQL, QL, ATS, FTS.]
