An Intelligent HARQ Scheme for HSDPA


Chiao-Yin Huang, Wen-Ching Chung, Member, IEEE, Chung-Ju Chang, Fellow, IEEE, and Fang-Ching Ren, Member, IEEE

Abstract—In this paper, we propose an intelligent hybrid automatic repeat request (iHARQ) scheme for high-speed downlink packet access (HSDPA) systems. The challenge in the hybrid automatic repeat request (HARQ) control problem of HSDPA is how to choose an appropriate modulation and coding scheme (MCS) for the initial transmission when the channel quality indication (CQI) report is delayed. The iHARQ scheme determines a suitable MCS to maximize the system throughput while guaranteeing the block error rate requirement in such an uncertain environment. By modeling the HARQ behavior as a Markov decision process, we adopt fuzzy logics to determine an appropriate MCS for each initial packet transmission. In addition, a Q-learning algorithm is utilized to update the fuzzy rule base, according to well-designed reinforcement signals fed back from the HSDPA system, such that the iHARQ scheme can adapt to the delayed CQI. Simulation results show that, compared with a conventional adaptive threshold selection method, the proposed iHARQ scheme increases the system throughput by up to 75.2%.

Index Terms—Block error rate (BLER), fuzzy logics, high-speed downlink packet access (HSDPA), hybrid automatic repeat request (HARQ), Q-learning (QL), system throughput.

I. INTRODUCTION

HIGH-SPEED downlink packet access (HSDPA) [1] was introduced in Release 5 of the Universal Mobile Telecommunications System to provide a higher peak data rate, increase capacity, and reduce latency. The HSDPA contains a high-speed downlink shared channel (HS-DSCH) to allocate a large fraction of downlink resources to a specific user [2]. A number of performance enhancement technologies are applied in HS-DSCH to achieve link adaptation, such as adaptive modulation and coding (AMC), extensive multicode operation, and hybrid automatic repeat request (HARQ) [3]. The HARQ combines the forward error correction mechanism with the original automatic repeat request function and operates per transport block. When the cyclic redundancy check (CRC) for HS-DSCH has an error, a retransmission of the original transport block is required. The AMC is used to enhance the spectrum efficiency according to the channel condition. Although a higher modulation and coding scheme (MCS) level can transmit more information bits, it is more susceptible to bit errors, which causes more retransmissions and lower system throughput. Therefore, the challenge of HARQ in HSDPA is the choice of a suitable MCS level for the initial packet transmission.

Manuscript received April 15, 2010; revised March 2, 2011; accepted March 2, 2011. Date of publication March 10, 2011; date of current version May 16, 2011. This work was supported in part by the National Science Council (NSC), Taiwan, under Contract NSC 97-2221-E-009-098-MY3 and in part by the Ministry of Education, Taiwan, under the Aiming for Top University plan. The review of this paper was coordinated by Prof. N. Arumugam.

C.-Y. Huang, W.-C. Chung, and C.-J. Chang are with the Department of Electrical Engineering, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: [email protected]).

F.-C. Ren is with the Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu 300, Taiwan.

Digital Object Identifier 10.1109/TVT.2011.2125809

There are three kinds of schemes to implement the HARQ [4]–[6]: 1) chase combining (CC); 2) incremental redundancy (IR); and 3) HARQ-type-III schemes. In the CC scheme, each retransmission is an identical copy of the original transmission. After each retransmission, the receiver combines these multiple copies of the packet according to the received signal-to-noise ratio (SNR). In the IR scheme, each retransmission consists of new redundancy bits to enhance the error correction ability. The modulation scheme and the number of coded bits for each retransmission can be different from those for the original transmission. Since the retransmission contains additional parity bits, the code rate is lowered by the retransmission. In the HARQ-type-III scheme, each retransmission consists of both information bits and new redundancy bits. Hence, in this scheme, each retransmission is self-decodable without combining with the previous transmission. In this paper, we adopt the IR scheme to implement the HARQ since it has higher error correction ability.

In addition, the MCS for HARQ control has attracted a lot of attention [7]–[12]. The MCS level selection for the initial packet transmission of the transport block is usually given according to the channel quality indication (CQI) to satisfy the block error rate (BLER) requirement. A fixed threshold selection method for MCS level was proposed in the 3rd Generation Partnership Project TR25.214 [8]. However, due to the measurement inaccuracy and the report delay of CQI, the fixed threshold method would not be effective. An adaptive threshold selection (ATS) method [9] was proposed to tune the signal-to-interference-plus-noise ratio (SINR) threshold for each MCS based on the last transmission result. In addition, an MCS selection criterion was analyzed to find the mapping between SINR and MCS to maximize the average user throughput [10]. An adaptive control algorithm, considering not only the last transmission result but the age of CQI reports as well, was studied in [11].

Moreover, a new HARQ procedure based on a Q-learning algorithm (QL-HARQ) [12] was proposed to learn the policy of choosing the MCS. Q-learning (QL) is a powerful off-policy temporal-difference (TD) control method for reinforcement learning [13]–[16]. To decide the most suitable action, the QL algorithm learns the action-value function from a feedback reinforcement signal, which can be a reward or a punishment. However, it is very hard for the QL algorithm to explicitly describe the relationship between the MCS level and the throughput when the CQI report is obsolete, considering the fulfillment of the BLER requirement.

0018-9545/$26.00 © 2011 IEEE

A more powerful intelligent technique called the fuzzy QL (FQL) algorithm [17], [18] has been proposed to model the motion and way of thinking of humans in robot design. The FQL algorithm is an extension of the QL into fuzzy inference systems, where fuzzy logic [19], [20] is a mathematical approach to emulate the way humans think by using if–then rules. Fuzzy logics can deal with soft reasoning instead of crisp reasoning. However, the performance of fuzzy logics is highly related to the design of the fuzzy rule base. The FQL algorithm applies the QL algorithm in fuzzy logics to adaptively adjust the fuzzy rule base for the varying environment according to the feedback reinforcement signal. Therefore, the FQL algorithm takes advantage of both fuzzy logics and QL to adapt the closed-loop system to uncertain environments. Many applications of the FQL algorithm can be found in [21]–[23]. For example, call admission control using the FQL algorithm for wideband code-division multiple-access/wireless local area network heterogeneous networks with multimedia traffic can achieve higher system throughput and lower handoff rate than conventional effective schemes under the quality-of-service (QoS) requirements [23]. Therefore, in this paper, we propose an intelligent HARQ (iHARQ) scheme using the FQL algorithm for HSDPA systems, with the purpose of maximizing the system throughput while keeping the BLER requirement guaranteed. The iHARQ scheme, which is a sort of IR scheme, not only determines an appropriate MCS level for the initial packet transmission but also learns the function between the MCS level and the measured BLER with delayed CQI. An initial result of this paper was given in [24]. In this paper, we further divide the fuzzy rules into three parts to accelerate the learning process. Moreover, the reinforcement signal is designed according to the results of both the initial transmission and the retransmissions.

Simulation results show that the iHARQ scheme outperforms the conventional ATS scheme [9] and the QL-based HARQ (QL-HARQ) scheme [12]. In particular, when the mobile user moves at a speed between 10 and 120 km/h, the iHARQ scheme can increase the system throughput by up to 75.2% compared with the ATS scheme and by up to 16.4% compared with the QL-HARQ scheme.

The remainder of this paper is organized as follows: The system model and the QL algorithm are introduced in Section II. Section III presents the design of the iHARQ scheme. Section IV shows the simulation results for the performance analysis of the iHARQ scheme. Finally, conclusions are given in Section V.

II. PRELIMINARIES

A. HSDPA Model

TABLE I

MCS LEVELS FOR THE HSDPA SYSTEM

In HSDPA, Node B, instead of the radio network controller, controls the link adaptation and the fast physical retransmission combining [1]. The user data are transmitted on the HS-DSCH, which is multiplexed into the high-speed physical downlink shared channel. The HS-DSCH supports multicode transmission and code multiplexing of different users. The spreading factor for HS-DSCH is fixed at 16. However, the maximum number of codes that can be allocated to a user is 15, since the common channel reserves one code space. The modulation orders supported are quaternary phase-shift keying and 16-quadrature amplitude modulation, and the coding scheme of HS-DSCH is turbo code. In this paper, the code rates of the turbo encoder are considered as {1/3, 1/2, 2/3, 3/4, 4/5} [12]. Therefore, there are ten MCS levels in total to choose from, as shown in Table I, where MCS level 10 can carry the highest data rate, whereas MCS level 1 can carry the lowest data rate, under the additive white Gaussian noise (AWGN). To achieve a short round trip delay and improve the tracking of fast channel variations, a packet transmission time interval (TTI) is specified as 2 ms.
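To make the count of ten MCS levels concrete, the following Python sketch enumerates the two modulations and five code rates named above and estimates how many bits fit in one 2-ms TTI. The chip rate of 3.84 Mcps is the standard UMTS value; the function name, the use of all 15 codes, and the neglect of CRC and tail bits are illustrative assumptions, and the enumeration order is not claimed to match the level numbering of Table I.

```python
from fractions import Fraction

CHIP_RATE = 3_840_000            # UMTS chip rate in chips per second
SPREADING_FACTOR = 16            # fixed for HS-DSCH
TTI_SECONDS = Fraction(2, 1000)  # 2-ms transmission time interval
MAX_CODES = 15                   # one code space is reserved for common channels

BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}
CODE_RATES = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3),
              Fraction(3, 4), Fraction(4, 5)]


def approx_info_bits(modulation, code_rate, codes=MAX_CODES):
    """Rough information bits per TTI (assumption: no CRC or tail-bit overhead)."""
    symbols_per_code = CHIP_RATE * TTI_SECONDS / SPREADING_FACTOR
    channel_bits = symbols_per_code * BITS_PER_SYMBOL[modulation] * codes
    return int(channel_bits * code_rate)


# Two modulations times five code rates give the ten combinations.
combos = [(m, r) for m in BITS_PER_SYMBOL for r in CODE_RATES]
for modulation, rate in combos:
    print(modulation, rate, approx_info_bits(modulation, rate))
```

Under these assumptions, each code carries 480 symbols per TTI, so the estimate ranges from a few thousand bits for the most robust combination to a few tens of thousands for the least robust one.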

The physical layer control information is carried on the high-speed shared control channel. Each control information contains the channelization code set, the modulation scheme, the transport block size, HARQ-related parameters, and a CRC attachment. An acknowledgement (ACK) or a negative acknowledgement (NACK) will be sent in the uplink high-speed dedicated physical control channel (HS-DPCCH) to indicate the result of the HS-DSCH decoding. In addition to ACK/NACK indication, the HS-DPCCH also carries CQI to Node B to recommend which transport format is sufficient to satisfy the BLER requirement. The value of CQI is measured by the user equipment (UE) through the common pilot channel and is an integer ranging from 1 to 30.

B. Channel Model

Terrestrial mobile radio channels in urban areas are considered for the HSDPA system. The channel condition at time $t$, which is denoted by $F(t)$, is modeled by long-term fading and short-term fading and is given by

$$F(t) = \xi(d) \times 10^{\eta/10} \times \zeta(t) \tag{1}$$

where $\xi(d) \times 10^{\eta/10}$ is the long-term fading, including path loss $\xi(d)$ and shadowing effect $10^{\eta/10}$; $d$ is the distance between Node B and UE; $\eta$ is a normally distributed random variable with zero mean and variance $\sigma_L^2$; and $\zeta(t)$ is the short-term fading caused by the multipath effect. The $\zeta(t)$ is modeled by the Jakes model [25] and is given by

$$\zeta(t) = 2\sigma\sqrt{\frac{2}{L}} \sum_{m=1}^{M} \cos\left(2\pi f_D t \cos(2\pi m/L) + \theta_m\right) e^{j\beta_m} \tag{2}$$

where $\sigma$ is the square root of the average signal power, $L = 4M + 2$ is the number of signal paths, $M$ is the number of complex oscillators used to model the fading waveform, $f_D$ is the Doppler frequency, $\beta_m = \pi m/(M + 1)$, and $\theta_m = \beta_m + 2\pi s/(M + 1)$, for $s = 0, 1, 2, \ldots, M - 1$. In this paper, we set $M = 4$, and there are in total 18 signal paths. The change of the shadowing effect for a mobile user is also considered. In practice, the shadowing effects at two sampling points are highly correlated since the sampling interval is very short compared with the motion of the user. We model the correlation of the shadow fading as a normalized autocorrelation function [26] denoted by $\rho(\Delta x)$, which is given by $\rho(\Delta x) = e^{-(|\Delta x|/d_{cor}) \ln 2}$, where $\Delta x$ is the user distance between the two sampling points, and $d_{cor}$ is the decorrelation length.
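As a minimal illustration of the shadowing correlation, the Python sketch below evaluates the autocorrelation function and uses it to draw a correlated shadowing sample. The first-order autoregressive update is a common way to realize such a correlation and is an assumption here, not a method stated in this paper; the function names and the example decorrelation length are likewise hypothetical.

```python
import math
import random


def shadow_correlation(delta_x, d_cor):
    """rho(dx) = exp(-(|dx| / d_cor) * ln 2); equals 0.5 at one decorrelation length."""
    return math.exp(-(abs(delta_x) / d_cor) * math.log(2.0))


def next_shadowing_db(eta_prev, delta_x, d_cor, sigma_l, rng=random):
    """Assumed AR(1) generator: correlate with the previous sample and keep
    the stationary variance at sigma_l ** 2."""
    rho = shadow_correlation(delta_x, d_cor)
    innovation = rng.gauss(0.0, sigma_l)
    return rho * eta_prev + math.sqrt(1.0 - rho * rho) * innovation


# Example: a user at 60 km/h moves about 0.033 m within one 2-ms TTI,
# so consecutive samples are almost fully correlated for d_cor = 20 m.
step_m = 60.0 / 3.6 * 0.002
print(shadow_correlation(step_m, 20.0))
```

The example supports the remark above: over one TTI the user moves only a few centimeters, so the correlation between consecutive shadowing samples stays very close to one.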

C. QL Algorithm

The QL algorithm learns the value of the state–action pair via a reinforcement signal. The main idea of the reinforcement learning adopted here for iHARQ is to learn an optimal policy to make a proper decision that can maximize (minimize) the accumulation of benefit (cost) in the future [13]. The expectation of the benefit accumulation is called the Q-function, which counts from the initial state–action pair $(X_0, a_0)$ over an infinite time. It is denoted by $Q(X_0, a_0)$ and is given as

$$Q(X_0, a_0) = E\left[\sum_{n=0}^{\infty} \gamma^n r(X(n), a(n)) \,\middle|\, X(0) = X_0,\ a(0) = a_0\right] \tag{3}$$

where X(n) is the input system state at the decision episode of the nth TTI, a(n) is the action at episode n, r(X(n), a(n)) is the reward (cost) or the so-called reinforcement signal that is a function of X(n) resulting from the action a(n), γ is the discount factor, and E[·] is the expectation operation. The output of the Q-function is called the Q-value and is the expectation value of the weighted accumulation of rewards (costs) from present to the future. This expectation value will be affected by the selected action a under system state X at the current decision episode n = 0. The optimal action, which is denoted by a∗, can be obtained by

$$a^* = \arg\max_{a} Q(X_0, a). \tag{4}$$
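The following Python sketch illustrates (3) and (4) on a small scale. It sums a finite reward sequence along one sample path, which is only a truncated, single-realization stand-in for the expectation over an infinite horizon in (3); the function names and the example numbers are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**n * r(n) over a finite reward sequence (truncation of (3))."""
    total = 0.0
    for n, reward in enumerate(rewards):
        total += (gamma ** n) * reward
    return total


def greedy_action(q_values):
    """Arg max of (4); q_values maps each action to Q(X0, action)."""
    return max(q_values, key=q_values.get)


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
print(greedy_action({1: 0.2, 2: 0.9, 3: 0.4}))
```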

However, the system state and the value of the reinforcement signal in the future are usually unavailable in reality. Watkins and Dayan [13] proposed a recursive method called the Q-learning algorithm to obtain an optimal Q-function. This algorithm works recursively at each decision episode by

$$Q_{n+1}(X, a) = \begin{cases} Q_n(X, a) + \alpha \Delta Q_n(X, a), & \text{if } X = X(n) \text{ and } a = a(n) \\ Q_n(X, a), & \text{otherwise} \end{cases} \tag{5}$$

where $Q_n(X, a)$ is a transient form of the optimal Q-function at the decision episode of the $n$th TTI, $\alpha \in [0, 1]$ is the learning rate at the $n$th TTI, and

$$\Delta Q_n(X, a) = r(X(n), a(n)) - Q_n(X, a) + \gamma \max_{b} \left[Q_n(X(n + 1), b)\right]. \tag{6}$$

Here, $(X(n + 1), b)$ is the state–action pair at TTI $(n + 1)$. The feedback reinforcement signal $r(X(n), a(n))$ is used to update $Q_n(X, a)$ to get a new Q-value $Q_{n+1}(X, a)$ at TTI $(n + 1)$. Note that only the selected state–action pair $(X(n), a(n))$ has to update its Q-value. Thus, an optimal Q-function can be achieved to perform the suitable action selection.
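A minimal tabular sketch of the update in (5) and (6) is given below in Python. The dictionary-based table, the function name, and the example states are assumptions made for illustration; only the visited state–action pair is modified, as noted above.

```python
from collections import defaultdict


def q_learning_step(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply (5)-(6) to the visited pair (state, action) and return Delta Q."""
    best_next = max(q[(next_state, b)] for b in actions)
    delta = reward - q[(state, action)] + gamma * best_next   # (6)
    q[(state, action)] += alpha * delta                       # (5), visited pair only
    return delta


# Example with a Q-table that defaults to zero for unseen pairs.
q_table = defaultdict(float)
q_table[("s1", 2)] = 1.0
delta_q = q_learning_step(q_table, "s0", 1, reward=1.0, next_state="s1",
                          actions=[1, 2], alpha=0.5, gamma=0.9)
print(delta_q, q_table[("s0", 1)])
```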

III. INTELLIGENT HYBRID AUTOMATIC REPEAT REQUEST (iHARQ) SCHEME

The iHARQ scheme using the FQL algorithm is designed here to determine a suitable MCS level for the initial packet transmission of the transport block in HARQ control. Usually, the decision problem of the initial packet transmission in the HARQ control scheme is formulated as a discrete-time Markov decision process (MDP). The iHARQ scheme applies the FQL technique to model and solve the MDP problem.

At every decision episode of the $n$th TTI, the iHARQ scheme selects the average BLER performance indication and the CQI, which are denoted by BLER(n) and CQI(n), respectively, as input linguistic variables to infer the suitable MCS level for the initial packet transmission. Thus, the input system state at the $n$th TTI, which is denoted by $X(n)$, is expressed as

$$X(n) = (\mathrm{BLER}(n), \mathrm{CQI}(n)). \tag{7}$$

The output linguistic variable at the $n$th TTI, which is denoted by $a(n)$, is defined as the MCS level and is expressed as

$$a(n) = \mathrm{MCS}_k, \quad k = 1, \ldots, K. \tag{8}$$

Note that a smaller MCS level has better BLER performance but lower system throughput, and there are ten MCS levels (K = 10) for the HARQ control, as given in Table I.

The functional block diagram of the iHARQ scheme is shown in Fig. 1. At the beginning of the $n$th episode of TTI, the input linguistic variables of the system state $X(n) = (\mathrm{BLER}(n), \mathrm{CQI}(n))$ are measured and transformed into fuzzy linguistic terms with their associated membership functions defined in the fuzzifier. According to these fuzzy linguistic terms, the Q-values can be obtained through the designed fuzzy rules in the fuzzy rule base. These Q-values are denoted by $q_n(S_j, \mathrm{MCS}_k)$ for $j = 1, \ldots, J$ and $k = 1, \ldots, K$, where $S_j$ is the $j$th fuzzy linguistic term of $X(n)$, $\mathrm{MCS}_k$ is the $k$th MCS level of $a(n)$, $J$ is the total number of input linguistic terms (system states), and $K$ is the total number of output MCS levels. Then, the optimal action $a_j^*(n)$ for each state–action pair $j$ can be inferred at the $n$th episode of the TTI by the inference engine. The intensity of $X(n)$ belonging to $S_j$, which is denoted by $\mu_{j,n}$, is also yielded in this block. Finally, the optimal action $a^*(n)$ can be obtained for the HSDPA system through the defuzzifier. Afterward, the reinforcement signal $r(n)$ is attained and fed back to the block of Q-function update, where the adjustment step of the Q-value in the fuzzy rule base for the input $S_j$ and output $a^*(n)$, which is denoted by $\Delta q_n(S_j, a^*(n))$, is derived. The detailed designs of the iHARQ scheme are subsequently given.

Fig. 1. Functional blocks of the iHARQ scheme.

Fig. 2. Membership function of BLER(n).

A. Fuzzifier

The fuzzifier emulates the feeling of humans toward the environment. Therefore, the fuzzifier transforms each measured input of the system state $X(n) = (\mathrm{BLER}(n), \mathrm{CQI}(n))$ into fuzzy linguistic terms and their associated membership functions. The fuzzy term set for BLER(n) is defined as $T(\mathrm{BLER}(n)) = \{\text{Green (G)}, \text{Yellow (Y)}, \text{Red (R)}\}$. The term G (R) in $T(\mathrm{BLER}(n))$ denotes that the BLER performance satisfies (violates) the BLER requirement $\mathrm{BLER}^*$. Each term $A$ in BLER(n) has a membership function denoted by $\mu_A(\mathrm{BLER}(n))$, $A$ = G, Y, or R. As shown in Fig. 2, $\mu_Y(\mathrm{BLER}(n))$ is a trapezoidal function with left (right) edge $A_2$ ($A_3$) and left (right) width $A_2 - A_1$ ($A_4 - A_3$); $\mu_G(\mathrm{BLER}(n))$ ($\mu_R(\mathrm{BLER}(n))$) is a right-hand-sided (left-hand-sided) trapezoidal function with right (left) edge $A_1$ ($A_4$) and right (left) width $A_2 - A_1$ ($A_4 - A_3$). The $A_i$, for $i = 1, \ldots, 4$, is set to indicate the degree of the performance for BLER(n). The $A_2$ and $A_3$ should be chosen to be close to $\mathrm{BLER}^*$ to satisfy the BLER requirement and enhance the system throughput. We set $A_2 = 0.7 \times \mathrm{BLER}^*$ and $A_3 = 0.9 \times \mathrm{BLER}^*$. In addition, we set $A_1$ ($A_4$) to be 0.5 (1.0) $\times\ \mathrm{BLER}^*$ to indicate that BLER(n) is in the very safe (dangerous) region.
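A small Python sketch of the three BLER membership functions is shown below. It assumes one plausible reading of the edges and widths above: Green is fully active up to $A_1$ and fades out at $A_2$, Yellow is fully active between $A_2$ and $A_3$, and Red fades in from $A_3$ and is fully active from $A_4$ onward. The function and variable names are hypothetical.

```python
def bler_memberships(bler, bler_target=0.1):
    """Return membership grades of the terms G, Y, and R for a measured BLER."""
    a1, a2, a3, a4 = (f * bler_target for f in (0.5, 0.7, 0.9, 1.0))

    def ramp_down(x, lo, hi):
        # 1 at or below lo, 0 at or above hi, linear in between.
        if x <= lo:
            return 1.0
        if x >= hi:
            return 0.0
        return (hi - x) / (hi - lo)

    green = ramp_down(bler, a1, a2)
    not_red = ramp_down(bler, a3, a4)
    yellow = min(1.0 - green, not_red)   # trapezoid: up A1..A2, flat, down A3..A4
    return {"G": green, "Y": yellow, "R": 1.0 - not_red}


for sample in (0.04, 0.06, 0.08, 0.095, 0.2):
    print(sample, bler_memberships(sample))
```

With this reading, the three grades sum to one for every BLER value, so a measured BLER is always shared between at most two neighboring terms.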

The fuzzy term set for CQI(n) is defined as $T(\mathrm{CQI}(n))$ = {Level 1 (L1), Level 2 (L2), Level 3 (L3), Level 4 (L4), Level 5 (L5), Level 6 (L6), Level 7 (L7), Level 8 (L8), Level 9 (L9), Level 10 (L10)}. Each term in $T(\mathrm{CQI}(n))$ stands for the status of one CQI region and is denoted by the one of the ten MCS levels adopted to guarantee $\mathrm{BLER}^*$ during this CQI region. Each term $B$ in CQI(n) has a membership function denoted by $\mu_B(\mathrm{CQI}(n))$, $B$ = L1, L2, ..., or L10. As shown in Fig. 3, $\mu_{Li}(\mathrm{CQI}(n))$, $i = 2, \ldots, 9$, is a triangular function with center at $B_i$ and left (right) width $B_i - B_{i-1}$ ($B_{i+1} - B_i$); $\mu_{L1}(\mathrm{CQI}(n))$ ($\mu_{L10}(\mathrm{CQI}(n))$) is a trapezoidal function with right (left) edge at $B_1$ ($B_{10}$) and right (left) width $B_1$ ($B_{10} - B_9$). The $B_k$, for $k = 1, \ldots, K$, is set to be the required CQI value to maintain $\mathrm{BLER}^*$ for $\mathrm{MCS}_k$, and $B_1 = 1$ and $B_k = 3 \times k$, for $k = 2, \ldots, 10$. There are $J = 30$ kinds of linguistic terms (states) in $X(n)$, and $S_j$, $1 \le j \le J$, and $(\mu_A(\mathrm{BLER}(n)), \mu_B(\mathrm{CQI}(n)))$ are the outputs to the fuzzy rule base and the inference engine, respectively.

Fig. 3. Membership function of CQI(n).

B. Fuzzy Rule Base

The fuzzy rule base emulates the memory of humans reacting to the environment. Each fuzzy rule $j$ for the $j$th linguistic term in the fuzzy rule base, $1 \le j \le J$, is designed as

Rule $j$: if $X(n)$ is $S_j$, then $a(n)$ is $\mathrm{MCS}_k$ with $q_n(S_j, \mathrm{MCS}_k)$, for $1 \le k \le K$. (9)

The $q_n(S_j, \mathrm{MCS}_k)$ denotes the Q-value of the state–action pair $(S_j, \mathrm{MCS}_k)$ and will be reported to the inference engine.

To accelerate the learning procedure, we divide the fuzzy rule base into green, yellow, and red parts according to the fuzzy term of BLER(n). Within each part, the candidate set for MCS selection is determined by the fuzzy term of CQI(n).

Green Part: If BLER(n) is Green and CQI(n) is Level $m$, then $a(n)$ is $\mathrm{MCS}_k$ with $q_n$((BLER(n) is Green, CQI(n) is Level $m$), $\mathrm{MCS}_k$), $k > m$. In this part, since the value of BLER(n) is much smaller than the BLER requirement $\mathrm{BLER}^*$, the MCS selection should be more aggressive to enhance the system throughput. Thus, only $\mathrm{MCS}_k$ with $k > m$ are taken into account. Note that an MCS with a larger level can transmit more information bits.

Yellow Part: If BLER(n) is Yellow and CQI(n) is Level $m$, then $a(n)$ is $\mathrm{MCS}_k$ with $q_n$((BLER(n) is Yellow, CQI(n) is Level $m$), $\mathrm{MCS}_k$), $m - 1 \le k \le m + 1$. In this part, since the BLER performance is around $\mathrm{BLER}^*$, it is better to keep the BLER performance unchanged. Hence, only $\mathrm{MCS}_k$ with $k = m - 1$, $m$, and $m + 1$ are considered as candidate actions.

Red Part: If BLER(n) is Red and CQI(n) is Level $m$, then $a(n)$ is $\mathrm{MCS}_k$ with $q_n$((BLER(n) is Red, CQI(n) is Level $m$), $\mathrm{MCS}_k$), $k < m$. In this part, since the BLER requirement is violated, the MCS selection should become conservative so as to fulfill the BLER performance requirement. Hence, only $\mathrm{MCS}_k$ with $k < m$ are chosen.
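The three parts can be summarized as a candidate-set lookup, sketched below in Python. The function name and the clipping to the valid level range 1 to K are assumptions; in particular, the text does not state what happens at the boundaries (Green with Level 10 or Red with Level 1), where this sketch simply returns an empty list.

```python
K = 10  # number of MCS levels in Table I


def candidate_mcs(bler_term, cqi_level, k_max=K):
    """Candidate MCS levels for a rule with BLER term G/Y/R and CQI Level m."""
    m = cqi_level
    if bler_term == "G":        # aggressive: only levels above m
        levels = range(m + 1, k_max + 1)
    elif bler_term == "Y":      # hold: m - 1, m, m + 1
        levels = range(m - 1, m + 2)
    elif bler_term == "R":      # conservative: only levels below m
        levels = range(1, m)
    else:
        raise ValueError("bler_term must be 'G', 'Y', or 'R'")
    return [k for k in levels if 1 <= k <= k_max]


print(candidate_mcs("G", 7), candidate_mcs("Y", 1), candidate_mcs("R", 4))
```

Restricting each rule to such a subset shrinks the number of Q-values that must be learned, which is consistent with the stated goal of accelerating the learning procedure.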

In addition, the Q-value of the selected action $a^*(n)$ will be updated here by another input from the Q-function update, named $\Delta q_n(S_j, a^*(n))$, which is a function of the reinforcement signal and will be given in Section III-D. The Q-value update is given by

$$q_{n+1}(S_j, a^*(n)) = q_n(S_j, a^*(n)) + \alpha \times \Delta q_n(S_j, a^*(n)), \quad \text{for } 1 \le j \le J \tag{10}$$

where $\alpha$ is the learning rate. If the packet is received successfully under the selected MCS, then the Q-value for this MCS would be increased. Hence, for each rule $j$, the MCS with maximum Q-value implies that the selected MCS is optimal at this situation.
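As a rough sketch of how the update in (10) could be coded, the Python function below also fills in the adjustment step using the intensity-weighted temporal-difference form that Section III-D specifies in (17)–(19). The data layout (a list of per-rule dictionaries), the function name, and the skipping of rules with zero intensity are illustrative assumptions.

```python
def fql_update(q, mu_now, mu_next, best_action, chosen_action,
               reward, alpha, gamma):
    """q[j][k] is the Q-value of rule j and MCS level k; mu_now and mu_next are
    the rule intensities at TTI n and n + 1; best_action[j] is a_j*(n)."""
    total_now = sum(mu_now)
    total_next = sum(mu_next)
    # (18): intensity-weighted Q-value of the current state.
    q_now = sum(m * q[j][best_action[j]] for j, m in enumerate(mu_now)) / total_now
    # (19): the same rule-wise values, weighted by the next-state intensities.
    q_next = sum(m * q[j][best_action[j]] for j, m in enumerate(mu_next)) / total_next
    td_error = reward - q_now + gamma * q_next
    for j, m in enumerate(mu_now):
        if m > 0.0:
            delta_q = (m / total_now) * td_error                                   # (17)
            q[j][chosen_action] = q[j].get(chosen_action, 0.0) + alpha * delta_q   # (10)
    return td_error


q_rules = [{1: 0.0, 2: 1.0}, {1: 0.5, 2: 0.0}]
td = fql_update(q_rules, mu_now=[0.75, 0.25], mu_next=[0.5, 0.5],
                best_action=[2, 1], chosen_action=2,
                reward=1.0, alpha=0.5, gamma=0.5)
print(td, q_rules)
```

Each active rule receives a share of the same temporal-difference error in proportion to its intensity, so rules that describe the current state more strongly are adjusted more.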

C. Inference Engine and Defuzzifier

The inference engine emulates the way of thinking of humans to infer an action for each possible situation. Here, the inference engine determines an optimal action $a_j^*(n)$ for each fuzzy rule $j$, $1 \le j \le J$, at every $n$th TTI. It takes $\mu_A(\mathrm{BLER}(n))$, $A$ = G, Y, R, $\mu_B(\mathrm{CQI}(n))$, $B$ = L1, ..., L10, and $q_n(S_j, \mathrm{MCS}_k)$, $1 \le j \le J$, $1 \le k \le K$, as inputs and delivers the intensity of $X(n)$ belonging to $S_j$, which is denoted by $\mu_{j,n}$, as an output to the defuzzifier. The $\mu_{j,n}$ is defined as the product operation [20] of the $\mu_A(\mathrm{BLER}(n))$ and $\mu_B(\mathrm{CQI}(n))$ associated with $S_j$ and is expressed as

$$\mu_{j,n} = \mu_A(\mathrm{BLER}(n)) \times \mu_B(\mathrm{CQI}(n)), \quad 1 \le j \le J \tag{11}$$

where $A$ is G, Y, or R ($B$ is L1, ..., or L10) with respect to $S_j$. It also infers the $a_j^*(n)$ for each fuzzy rule $j$, $1 \le j \le J$, using the exploration/exploitation policy (EEP) [21] as

$$a_j^*(n) = \arg\max_{k} q_n(S_j, \mathrm{MCS}_k), \quad 1 \le j \le J. \tag{12}$$

The inference engine then outputs the intensity and the optimal action of $S_j$ at the $n$th TTI, $\mu_{j,n}$ and $a_j^*(n)$, to the defuzzifier.
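The two outputs of the inference engine, (11) and (12), can be sketched in Python as follows. The CQI memberships use centers $B_1 = 1$ and $B_k = 3k$ from Section III-A; treating Level 1 as fading out at $B_2$ is an assumption made so that neighboring terms overlap, and the exploration part of the EEP is not modeled, so the action choice here is purely greedy. All names are hypothetical.

```python
B = [1] + [3 * k for k in range(2, 11)]   # B_1 = 1, B_k = 3k for k = 2..10


def cqi_memberships(cqi):
    """Grades of L1..L10: triangles centered at B_k with flat outer shoulders."""
    grades = []
    for i, center in enumerate(B):
        left = B[i - 1] if i > 0 else None
        right = B[i + 1] if i < len(B) - 1 else None
        if cqi == center:
            value = 1.0
        elif cqi < center:
            value = 1.0 if left is None else max(0.0, (cqi - left) / (center - left))
        else:
            value = 1.0 if right is None else max(0.0, (right - cqi) / (right - center))
        grades.append(value)
    return grades


def rule_intensities(mu_bler, mu_cqi):
    """(11): product of BLER and CQI grades for each of the 30 rules (A, m)."""
    return {(a, m + 1): mu_bler[a] * mu_cqi[m]
            for a in ("G", "Y", "R") for m in range(len(mu_cqi))}


def best_action_per_rule(q):
    """(12): greedy MCS level per rule; q[rule] maps MCS level to Q-value."""
    return {rule: max(values, key=values.get) for rule, values in q.items()}


grades = cqi_memberships(7)
weights = rule_intensities({"G": 0.5, "Y": 0.5, "R": 0.0}, grades)
print(weights[("G", 2)], best_action_per_rule({("G", 2): {3: 0.1, 4: 0.7}}))
```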

Finally, the defuzzifier determines an optimal action for the HARQ operation at the $n$th TTI. The defuzzifier emulates the way humans think when making a decision. Taking $\mu_{j,n}$ and $a_j^*(n)$ as inputs, the defuzzifier obtains the global optimal action, which is denoted by $a^*(n)$, by

$$a^*(n) = \frac{\sum_{j=1}^{J} \mu_{j,n} \times a_j^*(n)}{\sum_{j=1}^{J} \mu_{j,n}}. \tag{13}$$

However, the value of $a^*(n)$ in (13) may be a real number, whereas the actual output value should be an integer. Therefore, the final output action is obtained by rounding $a^*(n)$ in (13) to an integer value.
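A minimal Python sketch of the weighted average in (13) followed by rounding is given below. Rounding halves upward is an assumption, since the text does not specify a tie rule, and the function name and rule keys are illustrative.

```python
def defuzzify(mu, best_action):
    """Weighted average of per-rule actions (13), rounded to an integer MCS level.

    mu maps each rule to its intensity; best_action maps each rule to a_j*(n).
    """
    total = sum(mu.values())
    if total <= 0.0:
        raise ValueError("at least one rule must have positive intensity")
    crisp = sum(w * best_action[rule] for rule, w in mu.items() if w > 0.0) / total
    # Assumed tie rule: halves round up (Python's round() rounds halves to even).
    return int(crisp + 0.5)


print(defuzzify({"r1": 0.75, "r2": 0.25}, {"r1": 4, "r2": 6}))
```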

D. Q-Function Update

The Q-function update emulates the learning process of humans through interaction with the environment. It generates $\Delta q_n(S_j, a^*(n))$ to update the Q-value in the fuzzy rule base. The input of the Q-function update is the reinforcement signal $r(n)$, which is derived according to the transmission results of the iHARQ scheme in the HSDPA system. The Q-value update for each state–action pair by the reinforcement signal is performed whenever a packet transmission is finished. Each rule in the same part of the fuzzy rule base has the same reinforcement signal to generate the Q-value to reward the selected MCS level.

For the green part, the MCS level selection would be more aggressive to maximize the system throughput. The reinforcement signal at the $n$th TTI is designed here according to the results of both the initial transmission and the retransmissions and is given as

$$r(n) = \begin{cases} \dfrac{R_k}{R_k + R_{d,k}}, & \text{if packet is successfully received} \\ -10, & \text{if packet is dropped} \end{cases} \tag{14}$$

where $R_k$ is the number of information bits in the transmitted packet with $\mathrm{MCS}_k$, and $R_{d,k}$ is the summation of redundant bits in the initial transmission and in the retransmissions. Notice that BLER(n) decreases only when the initial packet is successfully received without retransmission. A retransmission increases BLER(n); however, the throughput is also increased. Therefore, to achieve aggressive MCS selection, the selected MCS level is rewarded whenever the packet is successfully received, regardless of whether this happens in the initial transmission or in a retransmission. It can be expected that the higher the ratio of information bits to the total transmitted bits, the larger the reward feedback. However, as retransmissions occur, the reward decreases since $R_{d,k}$ increases. If the reception of this packet fails after three retransmissions, then this packet is dropped, and a severe punishment is given by letting $r(n) = -10$.

For the yellow part, the MCS level would be selected to keep the BLER performance unchanged. Hence, the reinforcement signal is defined as

$$r(n) = \begin{cases} \sigma \times \dfrac{R_k}{R^*}, & \text{if packet is successfully received without retransmission} \\ -1 \times \dfrac{R_k}{R^*}, & \text{if packet is successfully received with retransmission} \\ -10, & \text{if packet is dropped} \end{cases} \tag{15}$$

where $R^*$ is the largest number of information bits that the system can support, which is used as a normalization factor, and $\sigma$ is the ratio of failed transmissions to successful transmissions, which is used as a weighting factor. Since the BLER requirement is 0.1, the value of $\sigma$ is set to 1/8. In this part, if the packet is successfully received without retransmission, then the selected MCS level receives a slight reward that slowly increases its Q-value. Once the MCS level makes BLER(n) increase, a punishment should be given to react to this event and to preserve the BLER performance in this part. Therefore, if the packet is successfully received with retransmission, then the selected MCS level receives a punishment by setting $r(n) = -1 \times (R_k/R^*)$ to offset the reward. If the packet is dropped, then a severe punishment, i.e., $r(n) = -10$, is given.

For the red part, the MCS level would be chosen to recover the BLER performance. Hence, the reinforcement signal $r(n)$ is set as

$$r(n) = \begin{cases} \dfrac{R_k}{R^*}, & \text{if packet is successfully received without retransmission} \\ -1, & \text{if packet is successfully received with retransmission} \\ -10, & \text{if packet is dropped.} \end{cases} \tag{16}$$

If the packet is successfully received without retransmission, then the selected MCS level will receive a reward. Since this MCS level selection decreases the BLER(n), the reward is set to $R_k/R^*$ to make a fast recovery of the BLER performance. If the packet is successfully received with retransmission, then the punishment will be set larger than the reward to let the iHARQ scheme have an opportunity to select other MCS levels in the next decision process. However, this punishment should not be too large, since the packet is successfully received. Therefore, we set $r(n) = -1$. If the packet is dropped, then a severe punishment, i.e., $r(n) = -10$, is given to retrain this action.
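The three reinforcement signals in (14)–(16) can be collected into one function, as in the Python sketch below. The outcome labels, the argument names, and the example bit counts are hypothetical; the reward values themselves follow the three definitions above, with $\sigma = 1/8$.

```python
DROP_PENALTY = -10.0


def reinforcement(part, outcome, info_bits, redundant_bits, max_info_bits,
                  sigma=1 / 8):
    """Reward for one finished packet.

    part: 'G', 'Y', or 'R'; outcome: 'first_try', 'after_retx', or 'dropped'.
    info_bits is R_k, redundant_bits is R_{d,k}, and max_info_bits is R*.
    """
    if outcome == "dropped":
        return DROP_PENALTY
    if outcome not in ("first_try", "after_retx"):
        raise ValueError("unknown outcome")
    if part == "G":                                   # (14)
        return info_bits / (info_bits + redundant_bits)
    ratio = info_bits / max_info_bits
    if part == "Y":                                   # (15)
        return sigma * ratio if outcome == "first_try" else -ratio
    if part == "R":                                   # (16)
        return ratio if outcome == "first_try" else -1.0
    raise ValueError("part must be 'G', 'Y', or 'R'")


print(reinforcement("G", "after_retx", 4800, 14400, 23040))
print(reinforcement("Y", "first_try", 5760, 0, 23040))
print(reinforcement("R", "after_retx", 5760, 0, 23040))
```

Comparing the branches side by side shows the intent of the design: the green part never punishes a successful packet, whereas the yellow and red parts punish any success that needed a retransmission.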

Whenever the reinforcement signals are yielded, the adjustment step for the Q-value of state $S_j$ and the selected action $a^*(n)$, i.e., $\Delta q_n(S_j, a^*(n))$, is obtained by

$$\Delta q_n(S_j, a^*(n)) = \frac{\mu_{j,n}}{\sum_{i=1}^{J} \mu_{i,n}} \times \left\{ r(n) - Q_n(X(n), a^*(n)) + \gamma \times Q_n^*(X(n + 1), a(n + 1)) \right\} \tag{17}$$

where $Q_n(X(n), a^*(n))$ is the Q-value for the state–action pair $(X(n), a^*(n))$, which is the weighted summation of $q_n(S_j, a_j^*(n))$, for $j = 1, \ldots, J$, by using the rule intensity $\mu_{j,n}$ of $X(n)$ and is given by

$$Q_n(X(n), a^*(n)) = \frac{\sum_{j=1}^{J} \mu_{j,n} \times q_n(S_j, a_j^*(n))}{\sum_{j=1}^{J} \mu_{j,n}} \tag{18}$$

and $Q_n^*(X(n + 1), a(n + 1))$ is the optimal Q-value among all state–action pairs at the $(n + 1)$st TTI. Since the next-stage Q-value for each state–action pair is unavailable, we calculate $Q_n^*(X(n + 1), a(n + 1))$ by $q_n(S_j, a_j^*(n))$, instead of $q_{n+1}(S_j, a_j^*(n + 1))$, and obtain it by

$$Q_n^*(X(n + 1), a(n + 1)) = \frac{\sum_{j=1}^{J} \mu_{j,n+1} \times q_n(S_j, a_j^*(n))}{\sum_{j=1}^{J} \mu_{j,n+1}}. \tag{19}$$

Finally, the procedure of the iHARQ scheme is summarized as follows.

Step 1: Transform the received input variables $X(n) = (\mathrm{BLER}(n), \mathrm{CQI}(n))$ into a fuzzy linguistic term $S_j$, $1 \le j \le J$, where $S_j = (A, B)$, $A$ = G, Y, or R, and $B$ = L1, L2, ..., or L10, and calculate the values of the membership functions $\mu_A(\mathrm{BLER}(n))$ by Fig. 2 and $\mu_B(\mathrm{CQI}(n))$ by Fig. 3 in the fuzzifier.

Step 2: Find the Q-value of the state–action pair $(S_j, \mathrm{MCS}_k)$, $q_n(S_j, \mathrm{MCS}_k)$, by (9) from the fuzzy rule base.

Step 3: Calculate the intensity of $X(n)$ belonging to $S_j$, $\mu_{j,n}$, by (11), and determine an optimal action for each fuzzy rule $j$, $a_j^*(n)$, by (12) in the inference engine.

Step 4: Determine a global optimal action $a^*(n)$ by (13) in the defuzzifier.

Step 5: Derive the reinforcement signal $r(n)$ by (14), (15), or (16) from the HSDPA system and calculate the adjustment value of the Q-value $\Delta q_n(S_j, a^*(n))$ by (17) in the Q-function update.

Step 6: Update the Q-value by (10) in the fuzzy rule base.

IV. SIMULATION RESULTS AND PERFORMANCE ANALYSIS

A. Simulation Environment

Matlab is used for the simulation. We consider a hexagonal grid multicell HSDPA system, where each cell suffers two-tier neighbor cell interference. Assume that there are in total ten users uniformly distributed in the HSDPA system. The direction of the UE mobility is uniformly distributed and fixed during data transfer. The speed of UEs is also uniformly distributed over the range from 10 to 120 km/h, and the motion of UEs causes the channel condition to vary due to the Doppler effect. If a UE moves out of the cell, a new UE is added into this cell with uniform distribution. For convenience, we suppose that the user always has packets waiting for transmission in the buffer. The channel condition, as given in (1) and (2), consists of long- and short-term fading but is assumed to be constant within a TTI. The delay of either the CQI report or the ACK/NACK is assumed to be 3 × TTI, i.e., 6 ms. The power allocated to HS-DSCH is assumed to be at most 80% of the total transmission power of Node B. The residual power is completely allocated to other service and control channels. In addition, we set $\mathrm{BLER}^* = 0.1$; the other parameters of this system are listed in Table II. Note that the average BLER, BLER(n), is obtained by dividing the number of failed packets by the total number of transmitted packets from the beginning of the packet transmission to the present time, i.e., the $n$th episode. We simulate the transmission of packets encoded by turbo code under the AWGN and obtain the relationship between the BLER of a packet with each MCS and the SNR under the AWGN. According to this relationship, the success or failure of packet decoding for the received SNR is determined.

TABLE II

PARAMETERS FOR THE HSDPA SYSTEM

The iHARQ scheme is compared with two conventional adaptive HARQ schemes. The first is the ATS scheme [9], which adaptively sets the SINR threshold for each MCS level. After each transmission, the thresholds that are close to the SINR of the last transmission are increased if the transmission fails and decreased if it succeeds. The increasing step of the threshold adjustment for a failed transmission is set to 0.1 × (1 − BLER∗), whereas the decreasing step for a successful transmission is set to 0.05 × BLER∗. The second is the QL-based HARQ (QL-HARQ) scheme [12], which uses the QL algorithm to learn an optimal policy for both link adaptation and the HARQ retransmission version. The transmission cost is defined as $((\mathrm{SINR}(x_k, a_k) - \mathrm{SINR}(a_k)) / \mathrm{SINR}(a_k))^2$, where $\mathrm{SINR}(x_k, a_k)$ is the received SINR at the UE for state $x_k$ with action $a_k$, and $\mathrm{SINR}(a_k)$ is the required SINR with action $a_k$. After learning, the QL-HARQ scheme chooses the MCS whose SINR threshold is closest to the received SINR.
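The two baseline rules described above can be sketched as follows. This is a minimal illustration, not the implementation of [9] or [12]: the function names are hypothetical, the `window` parameter is an assumption standing in for "close to the SINR of the last transmission" (the text gives no numeric value), and all SINR values are assumed to be expressed in the same units, with a nonzero required SINR.

```python
BLER_TARGET = 0.1  # BLER* from the simulation setup


def ats_update(thresholds, last_sinr, success, window=1.0, bler_target=BLER_TARGET):
    """Return the ATS thresholds after one ACK (success) or NACK (failure).

    Thresholds within `window` of the last SINR are raised after a failure
    and lowered after a success. `window` is a hypothetical parameter.
    """
    up = 0.1 * (1.0 - bler_target)   # increasing step for a failed transmission
    down = 0.05 * bler_target        # decreasing step for a successful one
    step = -down if success else up
    return [t + step if abs(t - last_sinr) <= window else t for t in thresholds]


def ql_harq_cost(received_sinr, required_sinr):
    """Squared relative gap between the received and the required SINR."""
    return ((received_sinr - required_sinr) / required_sinr) ** 2


def closest_threshold_mcs(received_sinr, thresholds):
    """Index of the MCS whose SINR threshold is closest to the received SINR."""
    return min(range(len(thresholds)), key=lambda i: abs(thresholds[i] - received_sinr))
```

With BLER∗ = 0.1, the increasing step evaluates to 0.09 and the decreasing step to 0.005, so a single failure moves a threshold as far as 18 successes move it back.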

B. Performance Analysis

For the performance analysis, we define the HSDPA service power ratio (HSPR) as the transmission power allocated to HS-DSCH divided by the total transmission power of Node B. If the HSPR of a Node B is small, then the Node B might not be able to guarantee the BLER requirement for UEs in bad channel conditions.

Fig. 4 depicts the system throughput versus HSPR for the proposed iHARQ scheme, the ATS scheme [9], and the QL-HARQ scheme [12], with the user mobility equal to 60 km/h. It can be observed that, with a 6-ms CQI delay, the iHARQ scheme achieves a system throughput up to 75.2% higher than that of the ATS scheme and 16.4% higher than that of the QL-HARQ scheme, and that the system throughput of every scheme increases with HSPR. These observations are explained as follows. As the HSPR becomes larger, Node B has more power to deliver the data, so the UE has a better received SINR, and the system throughput becomes higher. In contrast to the QL-HARQ scheme, the iHARQ scheme infers the MCS level selection from fuzzy rules. These fuzzy rules are well designed, both in overall structure and in each individual rule, based upon domain knowledge. In addition, they are adaptively and optimally adjusted by reinforcement signals, which are set to reward the MCS that enhances the system throughput. Moreover, the inference engine adopts the EEP [21] to choose the most suitable action, and the defuzzifier uses the center-of-mass method to determine the global optimal MCS level. Because both the BLER requirement and the system throughput are considered in the iHARQ scheme, its throughput can be raised by a more aggressive but safe MCS level selection. On the other hand, the QL-HARQ scheme estimates the transmission cost to decide the MCS for the initial packet transmission. Owing to the self-tuning capability of the QL algorithm, the QL-HARQ scheme chooses a more suitable MCS than the ATS scheme and thus improves the throughput. However, due to the lack of domain knowledge, the throughput of the QL-HARQ scheme is less than that of the iHARQ scheme. The ATS scheme adjusts the SINR threshold for each MCS according to the ACK/NACK information. Since the granularity of the adjustment is fixed and the dynamic range of the adjustment is small, this scheme is less flexible, and its system throughput is the lowest.

Fig. 4. System throughput versus HSPR.

Fig. 5 shows the BLER performance measure versus HSPR. It can be seen that the three compared schemes always satisfy the BLER requirement. For the iHARQ scheme, the reason is that it aims to enhance the system throughput while guaranteeing the BLER performance via the FQL algorithm.
When BLER(n) is lower than the BLER requirement, the iHARQ scheme takes an aggressive action by selecting a higher MCS level to enhance the system throughput, which in turn increases BLER(n). When BLER(n) exceeds the BLER requirement, the iHARQ scheme takes a conservative action by choosing a lower MCS level to decrease BLER(n). Hence, the iHARQ scheme can maintain the BLER performance just below the BLER requirement. On the other hand, the QL-HARQ scheme adjusts the Q-value according to the received SINR and the required SINR, which is too sensitive to the fast-varying channel condition. This makes the QL-HARQ scheme adopt a conservative action and have the best BLER performance. The ATS scheme adjusts the SINR thresholds based only on the ACK/NACK information, which is also too sensitive to the fast-varying channel condition. However, since the adjustment step is fixed, the ATS scheme has a higher BLER than the QL-HARQ scheme.

Fig. 5. BLER performance measure versus HSPR.

Fig. 6. Packet dropping ratio versus HSPR.

Fig. 6 presents the packet dropping ratio versus HSPR. In the simulation, a packet is dropped if it still fails to be decoded after three retransmissions. It can be found that the ATS scheme has the largest packet dropping ratio, the QL-HARQ scheme has the smallest, and the iHARQ scheme is in the middle. Although its packet dropping ratio is in the middle, the iHARQ scheme still achieves the largest system throughput while having the highest BLER. This is because the iHARQ scheme chooses a more aggressive but safe MCS to enhance the system throughput, and a higher MCS level carries more information bits. This safe MCS selection gives the iHARQ scheme a higher probability of successfully decoding a packet after retransmission. However, since the ATS scheme is less flexible, it has a higher probability of dropping a transmitted packet once an unsuitable MCS level is determined. On the other hand, since the QL-HARQ scheme chooses a more conservative action and has the self-tuning capability of the QL algorithm, it has the lowest packet dropping ratio.

Fig. 7. System throughput versus UE mobility.

Fig. 8. BLER performance versus UE mobility.
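The dropping rule used for Fig. 6 can be written down compactly. The sketch below assumes that each packet is represented by the ordered list of its decoding outcomes (initial transmission first, then retransmissions); the function names and this representation are hypothetical, and combining of retransmissions at the receiver is not modeled.

```python
MAX_RETRANSMISSIONS = 3  # from the simulation description


def is_dropped(attempt_outcomes, max_retx=MAX_RETRANSMISSIONS):
    """True if the initial transmission and all allowed retransmissions failed.

    attempt_outcomes: decode results (True = decoded) in transmission order.
    """
    allowed = attempt_outcomes[: 1 + max_retx]
    return len(allowed) == 1 + max_retx and not any(allowed)


def dropping_ratio(packets):
    """Fraction of packets dropped; `packets` is a list of outcome lists."""
    if not packets:
        return 0.0
    return sum(is_dropped(p) for p in packets) / len(packets)
```

For example, a packet with outcomes `[False, False, False, True]` is delivered on its third retransmission and is not counted as dropped, whereas `[False, False, False, False]` is.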

In the following two figures, the effect of UE mobility on the performance measures is evaluated, where 80% of the total transmission power is supposed to be allocated to HSDPA users. Fig. 7 shows the system throughput versus the UE mobility. It can be observed that the iHARQ scheme still has the largest system throughput among the three schemes and the ATS scheme the smallest, regardless of the UE mobility. The reasons are the same as those given for Fig. 4. As the UE mobility increases, the short-term fading becomes more severe (deeper) due to the Doppler effect, so the channel condition becomes worse, and the system throughput is lower. Fig. 8 depicts the BLER performance measure versus the UE mobility. It can be seen that the three compared schemes can guarantee the BLER requirement, regardless of the UE mobility. The reasons are similar to those given for Fig. 5. In addition, as the UE mobility decreases, the effect of short-term fading decreases, so the BLER of the QL-HARQ and ATS schemes decreases as well. However, since the iHARQ scheme operates in a closed-loop iterative manner, it keeps the BLER performance just below the BLER requirement, regardless of the UE mobility.

Fig. 9. Convergence performance of three compared schemes.
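The closed-loop behavior of the iHARQ scheme (aggressive while BLER(n) is below the requirement, conservative once it is exceeded) can be illustrated with a deliberately simplified stand-in. The function below is not the fuzzy Q-learning rule base of the paper: it is a one-step threshold rule, the name `next_mcs_level` and the `num_levels` parameter are hypothetical, and treating BLER(n) equal to the requirement as the conservative case is an assumption.

```python
def next_mcs_level(current_level, bler_avg, num_levels, bler_target=0.1):
    """One-step threshold rule standing in for the fuzzy Q-learning policy.

    Levels are indexed 0 .. num_levels - 1, with a higher index meaning a
    higher MCS (more information bits per block).
    """
    if bler_avg < bler_target:
        candidate = current_level + 1  # aggressive: spend the BLER margin
    else:
        candidate = current_level - 1  # conservative: pull BLER(n) back down
    # Clamp to the valid range of MCS levels.
    return max(0, min(num_levels - 1, candidate))
```

A rule of this kind makes BLER(n) hover around the target instead of drifting with the channel, which is the qualitative effect attributed to the iHARQ scheme in Fig. 8; the actual scheme selects the level through fuzzy inference over learned Q-values.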

Fig. 9 shows the convergence performance of the three compared schemes at an HSPR of 80% and a UE mobility of 60 km/h. The value of BLER(n) for each scheme is initially set to 0. At the beginning, since only a few packets have been transmitted, a few erroneously transmitted packets make the BLER performance metric high. In this condition, the iHARQ scheme can take a more conservative action to quickly recover the BLER performance within a few iterations. It can be seen that the iHARQ scheme achieves the best convergence performance. After 90 iterations, the learning process of the iHARQ scheme converges, and the BLER performance is kept within the BLER requirement. On the other hand, the QL-HARQ scheme needs about 180 iterations to fulfill the BLER requirement since it adjusts the Q-value based only on the received SINR and the required SINR, which does not provide enough information about the BLER for the QL-HARQ scheme to recover the BLER performance quickly. The ATS scheme needs about 140 iterations to satisfy the BLER requirement since it adjusts the SINR thresholds for the MCS levels with a fixed step according to the ACK/NACK information, which has little relation to the BLER.

V. CONCLUSION

In this paper, we have proposed an iHARQ scheme for the HSDPA system to achieve efficient resource utilization when the CQI report is delayed. Based on the received but obsolete CQI and the past transmission results, we combine fuzzy logics and the QL algorithm to assist the HARQ error control mechanism in selecting the MCS level for the initial packet transmission. The fuzzy rule base is divided into green, yellow, and red parts to indicate oversatisfied, just-satisfied, and violated conditions, respectively. This well-organized design guarantees the BLER performance and enhances the system throughput. In addition, the QL algorithm is adopted to adjust the Q-values of the fuzzy rules to adapt to channel conditions, where the reinforcement signals are designed using domain knowledge and consider the results of both the initial transmission and the retransmissions. Owing to the self-tuning ability of the iHARQ scheme, an appropriate selection of the MCS level for the initial packet transmission can be achieved. The simulation results show that the iHARQ scheme guarantees the QoS requirement of BLER and attains higher system throughput than conventional HARQ schemes, such as the ATS scheme [9] and the QL-based HARQ (QL-HARQ) scheme [12]. In addition, the iHARQ scheme achieves the best convergence performance. Moreover, the iHARQ scheme is feasible to realize in practical systems because the fuzzy logic function can easily be implemented on a chip.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their suggestions to improve the presentation of this paper.

REFERENCES

[1] 3GPP TR 25.308, Tech. Rep. High speed downlink packet access (HSDPA): Overall description; Stage 2, Dec. 2004.

[2] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communication, 3rd ed. Hoboken, NJ: Wiley, 2005.

[3] M. Wrulich, W. Weiler, and M. Rupp, “HSDPA performance in a mixed traffic network,” in Proc. IEEE VTC-Spring, 2008, pp. 2056–2060.

[4] S. Parkvall and E. Dahlman, “Performance comparison of HARQ with chase combining and incremental redundancy for HSDPA,” in Proc. IEEE VTC-Fall, 2001, pp. 1829–1833.

[5] T. Cheng, “Coding performance of hybrid ARQ schemes,” IEEE Trans. Commun., vol. 54, no. 6, pp. 1017–1029, Jun. 2006.

[6] S. Chen, J. Du, M. Peng, and W. Wang, “Performance analysis and improvement of HARQ techniques in TDD-HSDPA/SA system,” in Proc. ITS Telecommun., 2006, pp. 523–526.

[7] M. Park, B. Keum, M. Lee, and H. S. Lee, “A selective HARQ scheme operating based on channel conditions for high speed packet data transmission systems,” in Proc. IEEE PIMRC, 2007, pp. 1–5.

[8] 3GPP TR 25.214, Tech. Rep. UMTS Physical layer procedures (FDD), Jun. 2002.

[9] M. Nakamura, Y. Awad, and S. Vadgama, “Adaptive control of link adaptation for high speed downlink packet access in WCDMA,” in Proc. Wireless Pers. Multimedia Commun., 2002, vol. 2, pp. 382–386.

[10] H. Zheng and H. Viswanathan, “Optimizing the ARQ performance in downlink packet data systems with scheduling,” IEEE Trans. Wireless Commun., vol. 4, no. 2, pp. 495–506, Mar. 2005.

[11] A. Muller and T. Chen, “Improving HSDPA link adaptation by considering the age of channel quality feedback information,” in Proc. IEEE VTC-Fall, 2005, pp. 1643–1647.

[12] C. J. Chang, C. Y. Chang, and F. C. Ren, “Q-learning-based hybrid ARQ for high speed downlink packet access in UMTS,” in Proc. IEEE VTC-Spring, 2007, pp. 2610–2615.

[13] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, pp. 279–292, 1992.

[14] Y. S. Chen, C. J. Chang, and F. C. Ren, “Q-learning-based multirate transmission control scheme for RRM in multimedia WCDMA systems,” IEEE Trans. Veh. Technol., vol. 53, no. 1, pp. 38–48, Jan. 2004.

[15] V. L. R. Chinthalapati, N. Yadati, and R. Karumanchi, “Learning dynamic prices in MultiSeller electronic retail markets with price sensitive customers, stochastic demands, and inventory replenishments,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 1, pp. 92–106, Jan. 2006.

[16] Y. V. Kiran, T. Venkatesh, and C. S. R. Murthy, “A reinforcement learning framework for path selection and wavelength selection in optical burst switched networks,” IEEE J. Sel. Areas Commun., vol. 25, no. 9, pp. 18–26, Dec. 2007.

[17] P. Y. Glorennec, “Fuzzy Q-learning and dynamic fuzzy Q-learning,” in Proc. IEEE Int. Conf. Fuzzy Syst., 1994, vol. 1, pp. 474–479.

[18] P. Y. Glorennec and L. Jouffe, “Fuzzy Q-learning,” in Proc. IEEE Int. Conf. Fuzzy Syst., 1997, vol. 2, pp. 659–662.

[19] L. Giupponi, R. Agusti, J. Perez-Romero, and O. Sallent, “Fuzzy neural control for economic-driven radio resource management in beyond 3G networks,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 39, no. 2, pp. 170–189, Mar. 2009.

[20] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Englewood Cliffs, NJ: Prentice-Hall, 1996.

[21] A. Waldock and B. Carse, “Fuzzy Q-learning with an adaptive representation,” in Proc. IEEE Int. Conf. Fuzzy Syst., 2008, pp. 720–725.

[22] L. Jouffe, “Fuzzy inference system learning by reinforcement methods,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 28, no. 3, pp. 338–355, Aug. 1998.

[23] Y. H. Chen, C. J. Chang, and C. Y. Huang, “Fuzzy Q-learning admission control for WCDMA/WLAN heterogeneous networks with multimedia traffic,” IEEE Trans. Mobile Comput., vol. 8, no. 11, pp. 1469–1479, Nov. 2009.

[24] C. Y. Huang, W. C. Chung, C. J. Chang, and F. C. Ren, “Fuzzy Q-learning-based hybrid ARQ for high speed downlink packet access,” in Proc. IEEE VTC-Fall, 2009, pp. 1–4.

[25] G. L. Stuber, Principle of Mobile Communication. Norwell, MA: Kluwer, 2001.

[26] M. Gudmundson, “Correlation model for shadow fading in mobile radio systems,” Electron. Lett., vol. 27, no. 23, pp. 2145–2146, Nov. 1991.

Chiao-Yin Huang was born in Taiwan in September 1984. She received the B.E. degree in electrical control engineering and the M.E. degree in communication engineering from the National Chiao Tung University, Hsinchu, Taiwan, in 2006 and 2008, respectively.

She is currently with the Department of Electrical Engineering, National Chiao Tung University. Her research interests include radio resource management and link adaptation for wireless communication networks.

Wen-Ching Chung (M’11) was born in Taiwan in June 1977. He received the B.E. and Ph.D. degrees in electrical control engineering from the National Chiao Tung University, Hsinchu, Taiwan, in 1999 and 2006, respectively.

Since January 2008, he has been a Postdoctoral Researcher with the Department of Electrical Engineering, National Chiao Tung University. His research interests are in the areas of radio resource management for wireless communication networks, link adaptation for broadband networks, and intelligent control systems.

Chung-Ju Chang (F’06) was born in Taiwan in August 1950. He received the B.E. and M.E. degrees in electronics engineering from the National Chiao Tung University, Hsinchu, Taiwan, in 1972 and 1976, respectively, and the Ph.D. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1985.

From 1976 to 1988, he was a Design Engineer, Supervisor, Project Manager, and then Division Director with the Telecommunication Laboratories, Directorate General of Telecommunications, Ministry of Communications, Taiwan. From 1987 to 1989, he also acted as a Science and Technical Advisor for the Minister with the Ministry of Communications. In 1988, he joined the faculty of the Department of Communication Engineering, College of Electrical Engineering and Computer Science, National Chiao Tung University, as an Associate Professor. He has been a Professor since 1993 and a Chair Professor since 2009. He was the Director of the Institute of Communication Engineering from August 1993 to July 1995, Chairman of the Department of Communication Engineering from August 1999 to July 2001, and Dean of the Research and Development Office from August 2002 to July 2004. In addition, he was an Advisor for the Ministry of Education to promote the education of communication science and technologies for colleges and universities in Taiwan during 1995–1999. He is acting as a Committee Member of the Telecommunication Deliberate Body, Taiwan. His research interests include performance evaluation, radio resource management for wireless communication networks, and traffic control for broadband networks.

Dr. Chang is a member of the Chinese Institute of Engineers and the Chinese Institute of Electrical Engineers. Moreover, he once served as Editor for the IEEE COMMUNICATIONS MAGAZINE and as Associate Editor for the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY.

Fang-Ching Ren (M’03) was born in Hsinchu, Taiwan. He received the B.E., M.E., and Ph.D. degrees in communication engineering from the National Chiao Tung University, Hsinchu, in 1992, 1994, and 2001, respectively.

Since 2001, he has been a Protocol Design Engineer with the Industrial Technology Research Institute, Hsinchu, where he was involved in the design and development of wireless code-division multiple access chipsets, WiMAX mobile multihop relay technology, and fourth-generation access technology. Since 2005, he has been working on standard development as an active contributor, including the IEEE 802.16j/16m and LTE/LTE-A standards. His current research interests include system performance analysis, protocol design, and mobile radio networks.