An MIMO Configuration Mode and MCS Level Selection Scheme by Fuzzy Q-Learning for HSPA(+) Systems



Wen-Ching Chung, Member, IEEE, Chung-Ju Chang, Fellow, IEEE,

Kai-Ten Feng, Member, IEEE, and Ying-Yu Chen

Abstract—In this paper, we propose a fuzzy Q-learning-based MIMO configuration mode and MCS level (FQL-MOMS) selection scheme for high speed packet access evolution (HSPA+) systems. The FQL-MOMS selection scheme intends to enhance the system throughput while guaranteeing the block error rate (BLER) requirement. It determines an appropriate MIMO configuration mode and MCS (modulation and coding scheme) level for packet data transmission in HSPA+ systems, in situations where the channel status varies and the channel quality indication (CQI) is reported with delay. The FQL-MOMS scheme considers not only the reported CQI and the last transmission result but also the BLER performance metric and the transmission efficiency. Moreover, it is effectively configured, in that the fuzzy rules and the reinforcement signals for the Q-learning algorithm are carefully designed. Simulation results show that the proposed FQL-MOMS scheme increases the system throughput by up to 49.3 and 35.9 percent, compared to the conventional adaptive threshold selection (ATS) scheme [12] and the Q-HARQ scheme [14], respectively, while fulfilling the BLER requirement.

Index Terms—HSPA+, MIMO, MCS, HARQ, BLER, fuzzy logic, and Q-learning.


1 INTRODUCTION

High speed packet access evolution (HSPA+) has been introduced in Release 7 by the 3rd generation partnership project (3GPP) for the universal mobile telecommunications system (UMTS) [1]. HSPA+ adopts many effective techniques to enhance the performance of high speed downlink packet access (HSDPA) services proposed in Release 5, such as multiple-input multiple-output (MIMO), higher order modulation and coding, continuous packet connectivity, and so on [2].

When MIMO is enabled in HSPA+, a larger peak data rate can be achieved through spatial multiplexing (SM) when the channel quality is good, or a higher link reliability can be provided through spatial diversity (SD) when the channel quality is bad [3], [4], [5]. 3GPP extended the double transmitter antenna array (D-TxAA) in Release 99 to be the standard of Release 7 [6], [7]. The use of multiple transmit antennas at the base station can provide diversity gain without additional receiver chains at the mobile terminal. When the channel quality is good, the D-TxAA transmits two data streams (transport blocks) simultaneously over the radio channel, using the same channelization codes, to increase the system throughput. Each data stream is processed and coded separately. When the channel quality is bad, the D-TxAA transmits one data stream through two antennas to increase the probability of successful decoding. Consequently, there are two MIMO configuration (transmission) modes in total. How to determine an appropriate MIMO configuration mode to increase the system throughput is an interesting and important issue.

At the data link layer of the protocol for packet data transmission in HSPA+ systems, hybrid automatic repeat request (HARQ) is conventionally the error control scheme. The HARQ control scheme combines a forward error correction (FEC) mechanism with the original ARQ scheme. Li and Zhao analyzed the ARQ scheme adopted in parallel multichannel communications for error control [8]. Multiple parallel channels are often created by using orthogonal frequency division multiplex (OFDM) technology or MIMO technology. Wang and Chang analyzed the performance of stall avoidance schemes for HSDPA with parallel HARQ mechanisms and determined a proper number of processes for the parallel HARQ mechanisms [9]. On the other hand, there are traditionally three kinds of schemes to implement the HARQ scheme: chase combining (CC), incremental redundancy (IR), and partially IR schemes. The performance of these three schemes was studied in [10] and [11]. Since the IR scheme has higher error correction ability, we adopt the IR scheme to implement the HARQ scheme in this paper.

As for the modulation and coding scheme (MCS) level selection in the HARQ control scheme, the 3GPP has predefined a table to show the relationship between the MCS level and the required channel quality indication (CQI) to fulfill the block error rate (BLER) performance requirement, according to fixed signal-to-interference-plus-noise ratio (SINR) thresholds [1]. However, the relationship is defined by a fixed threshold selection (FTS) method, which has some drawbacks since the CQI has report delay and measurement inaccuracy. Nakamura et al. proposed an adaptive threshold selection (ATS) scheme to adjust the SINR threshold for the selected MCS level by considering the last transmission result [12]. Also, Muller and Chen further adjusted the SINR threshold for each MCS level according to not only the last transmission result but also the CQI delay [13]. However, these two adaptive schemes still cannot cope with the rapidly varying channel status since the granularity of the SINR threshold adjustment according to the reported acknowledgment (ACK) or negative acknowledgment (NACK) is coarse and inflexible. A new HARQ scheme using the Q-learning algorithm (Q-HARQ) was proposed in [14] to learn the policy of MCS decision. The Q-learning algorithm is a kind of unsupervised learning method, which can learn optimal control rules through dynamic interactions with the environment [15]. By using the feedback reinforcement signal obtained from the HSDPA system, the Q-HARQ scheme proposed in [14] outperforms the conventional IR scheme and two other link adaptation IR (LA-IR) schemes proposed in [16] in total system throughput. The reason is that the Q-learning algorithm is performed in a closed-loop iterative manner such that the optimal solution for the initial packet MCS can be found.

The authors are with the Department of Electrical Engineering, National Chiao Tung University, 1001 University Road, Hsinchu, Taiwan 300, ROC. E-mail: [email protected], [email protected], [email protected], [email protected].

Manuscript received 12 Aug. 2010; revised 3 June 2011; accepted 10 June 2011; published online 22 June 2011.

For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-2010-08-0379. Digital Object Identifier no. 10.1109/TMC.2011.139.

On the other hand, fuzzy logic has been well developed [17], [18]. Fuzzy logic emulates the way of human thinking to describe the behavior of systems which are too complicated to tackle mathematically. Also, it can deal with uncertain and imprecise knowledge to reason a suitable decision. By combining fuzzy logic with the Q-learning algorithm, the fuzzy Q-learning (FQL) algorithm [19], [20], [21], [22], [23], [24], [25] takes advantage of both the Q-learning algorithm and fuzzy logic to efficiently learn optimal control rules for problems with uncertainty and imprecision. Therefore, the FQL algorithm can use domain knowledge to accelerate the Q-learning algorithm in finding the optimal solution of the MCS selection for packet transmission. A preliminary study on the selection of the MCS level, but without the MIMO configuration mode, for an HARQ system using the FQL algorithm has been given in [26], under the BLER requirement guarantee.

In this paper, we propose a fuzzy Q-learning-based MIMO configuration mode and MCS level (FQL-MOMS) selection scheme for HSPA+ systems. The FQL-MOMS scheme intends to enhance the system throughput while guaranteeing the BLER performance requirement. In order to overcome the imprecision caused by CQI report delay, an effective configuration for the FQL-MOMS selection scheme is designed. The FQL-MOMS selection scheme adopts fuzzy logic to reason a suitable MIMO configuration (transmission) mode and an MCS level simultaneously for a new packet data transmission. Also, to accelerate the learning process, the fuzzy rule bases for SD and SM are each partitioned into three regions. Moreover, the FQL-MOMS selection scheme uses the Q-learning algorithm to learn optimal decision values for the rules in its configuration. The fuzzy rule bases are updated by feedback reinforcement signals, which are designed according to not only the CQI and the transmission results but also the BLER performance metric and the transmission efficiency. Because of the closed-loop iterative manner, the fuzzy Q-learning can learn an excellent relationship between the MIMO configuration mode combined with the MCS level and the measured BLER with delayed CQI. Simulation results show that, when user mobility is at around 60 km/hr, the proposed FQL-MOMS selection scheme increases the system throughput by up to 49.3 percent compared to the conventional ATS scheme [12], and by up to 35.9 percent compared to the Q-HARQ scheme [14].

The remainder of the paper is organized as follows: The system model is introduced in Section 2. In Section 3, the design of the FQL-MOMS scheme is presented. It is followed by the performance analysis of the FQL-MOMS scheme in Section 4. Finally, conclusions are given in Section 5.

2 SYSTEM MODEL

2.1 HSPA+ System

In the HSPA+ system shown in Fig. 1, both the base station (Node B) and the user equipment (UE) are assumed to be equipped with two antennas. Node B controls link adaptation and physical layer transmission and retransmission combining in order to react quickly to variations of the channel condition. The link adaptation applies a suitable MIMO configuration mode and MCS level for data flow transmission according to the decision from the fuzzy Q-learning-based MIMO configuration mode and MCS level selection scheme, which will be designed in Section 3. The data flow is divided and coded into suitable data streams by the HARQ control scheme.

Fig. 1. The HSPA+ system with the FQL-MOMS scheme.

Data flows for UE are transmitted via the high-speed downlink shared channel (HS-DSCH) in HSPA+ [1], which supports multicode transmission and code multiplexing of different users. The HS-DSCH is coded by a turbo code with lowest code rate $1/3$, and the code rates considered in this study are $\{1/3, 1/2, 2/3, 3/4\}$ [27]. The modulation orders supported in the operation are QPSK and 16-QAM. Therefore, the HSPA+ system has eight MCS levels in total [1], shown in Table 1. Each MCS level has a different transport block size, which is the number of data bits transmitted in one data stream. Let $MCS_k$ be the $k$th MCS level, $k = 1, \ldots, 8$. Note that a higher MCS level has higher system throughput but less reliability for data transmission.

The MIMO configuration has two modes of operation, based on the downlink channel condition reported in CQI by UE. Mode 1 is with SD transmission and mode 2 is with SM transmission. Let $MCS_H$ ($MCS_L$) be the MCS level $H$ ($L$) assigned to the primary (secondary) data stream with $CQI_H$ ($CQI_L$), $1 \le MCS_L \le MCS_H \le 8$. In the SD transmission mode, only the primary stream carries a data packet with a proper $MCS_H$ level to enhance the reliability of transmission. In the SM transmission mode, the primary and secondary streams carry different data packets with proper $MCS_H$ and $MCS_L$ levels, respectively, to increase the system throughput. Notice that the primary (secondary) stream is the one with the larger (smaller) CQI. After the spatial processing, the data stream(s) will be transmitted to UE through the two antennas. A packet transmission time interval (TTI) is specified as 2 ms to achieve a shorter round trip delay. Each data stream for UE is processed and coded separately per TTI. Node B sets precoding weights for each data stream according to the precoding control indication (PCI) reported by UE.

Control information for UE is delivered via the high-speed shared control channel (HS-SCCH), which contains the channelization code set, modulation scheme, transport-block size, HARQ-related parameters, and CRC attachment. This control information is used to decode the data transported by the HS-DSCH and to perform soft combining when a retransmission is carried out. Control information from UE to Node B is delivered via the high-speed dedicated physical control channel (HS-DPCCH), which consists of ACK or NACK messages, PCI, and CQI value. The ACK/NACK message indicates the result of the HS-DSCH decoding. The CQI contains the channel quality indications of the primary and the secondary data streams, denoted by $CQI_H$ and $CQI_L$, respectively, with $CQI_H \ge CQI_L$. The values of $CQI_H$ and $CQI_L$ are measured by UE through the common pilot channel (CPICH) and are integers ranging from 0 to 14 [1]. According to the reported CQIs, the measured BLER, and some system information, the FQL-MOMS selection scheme designed in the Node B will choose a suitable MIMO configuration mode and MCS level for packet data transmission to enhance the system throughput and to satisfy the BLER requirement for the HSPA+ system.

2.2 Channel Model

In this study, we consider the HSPA+ system in an urban area with a terrestrial mobile radio channel. Three types of propagation factors are considered in the channel model: path loss, slow fading resulting from shadowing and scattering, and fast fading due to multipath effects. Let $\zeta$ be a normally distributed random variable with zero mean and variance $\sigma_L^2$. The channel condition at time $t$, denoted by $F(t)$, can be expressed as

$$F(t) = \xi(d) \times 10^{\zeta/10} \times \chi(t), \tag{1}$$

where $d$ is the distance between the Node B and the UE, $\chi(t)$ is the short-term fading, and $\xi(d) \times 10^{\zeta/10}$ is the long-term fading with path loss $\xi(d)$ and shadowing effect $10^{\zeta/10}$. The short-term fading is caused by the multipath effects and is given by Stuber [29]

$$\chi(t) = 2\rho\sqrt{\frac{2}{L}} \sum_{m=1}^{M} \cos\bigl(2\pi f_D t \cos(2\pi m/L) + \theta_m\bigr)\, e^{j\beta_m}, \tag{2}$$

where $L = 4M + 2$ is the number of signal paths, $\beta_m = \pi m/(M+1)$, $\theta_m = \beta_m + 2\pi s/(M+1)$, $s = 0, 1, \ldots, M-1$, $\rho$ is the square root of the average signal power, and $f_D$ is the Doppler frequency.

When the position of the UE changes, the shadowing effect of the UE is different. Since the sampling rate in HARQ is very fast compared to the motion of the UE, the shadowing effects at two sampling points are highly correlated in practice. Let $\Delta x$ be the user distance between two sampling points. According to [30], the correlation of shadow fading, denoted by $\varphi(\Delta x)$, is modeled by a normalized autocorrelation function and is given by $\varphi(\Delta x) = e^{-(|\Delta x|/d_{cor}) \ln 2}$, where $d_{cor}$ is the decorrelation length.
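The exponential shadowing correlation can be illustrated with a minimal Python sketch. It assumes a first-order autoregressive recursion to realize the correlation between successive sampling points; the paper does not state which generator is used, and the function names and the numerical values in the usage note are hypothetical.

```python
import math
import random


def shadow_correlation(delta_x, d_cor):
    """Normalized shadowing autocorrelation exp(-(|dx| / d_cor) * ln 2)."""
    return math.exp(-(abs(delta_x) / d_cor) * math.log(2.0))


def shadowing_trace(num_samples, step_m, d_cor, sigma_db, seed=0):
    """Correlated shadowing values (in dB) at equally spaced positions.

    Assumption: a first-order autoregressive recursion realizes the
    exponential autocorrelation; this generator is not taken from the paper.
    """
    rng = random.Random(seed)
    rho = shadow_correlation(step_m, d_cor)
    # Scale the innovation so that the stationary variance stays sigma_db**2.
    innovation_std = sigma_db * math.sqrt(1.0 - rho * rho)
    trace = [rng.gauss(0.0, sigma_db)]
    for _ in range(num_samples - 1):
        trace.append(rho * trace[-1] + rng.gauss(0.0, innovation_std))
    return trace
```

For instance, a UE moving at 60 km/hr travels about 0.033 m per 2 ms TTI, so `shadow_correlation(0.033, 50.0)` is very close to 1 for a hypothetical decorrelation length of 50 m, which matches the statement that consecutive samples are highly correlated.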

3 FQL-MOMS SELECTION SCHEME

The fuzzy Q-learning-based MIMO configuration (transmission) mode and MCS level (FQL-MOMS) selection scheme is here designed to determine a suitable MIMO configuration mode with MCS level for packet data transmission in HSPA+ systems. As is known, the selection decision for packet transmission should be based on current and past system states. Thus, this selection scheme can be modeled as a discrete-time Markov decision process (MDP). We here adopt the fuzzy Q-learning algorithm to solve this MDP problem.

TABLE 1

Functional blocks of the FQL-MOMS selection scheme are depicted in Fig. 2. Since the MIMO transmission mode with the MCS level is tightly related to the channel quality and the resulting performance metric BLER, at each $n$th TTI (episode), the FQL-MOMS selection scheme chooses three measures to be the input linguistic variables. They are $BLER(n)$, $CQI_H(n)$, and $CQI_L(n)$, where $BLER(n)$ is defined as the number of packets needing retransmission over the total number of transmitted packets within an observation window at the $n$th TTI. The input state vector, denoted by $X(n)$, is $(BLER(n), CQI_H(n), CQI_L(n))$. There are two fuzzy rule bases designed in the FQL-MOMS selection scheme. Fuzzy rule base SD is for the SD transmission mode and fuzzy rule base SM is for the SM transmission mode. The Q-value for the state-action pair at the $n$th episode, denoted by $q_n$, can then be obtained from these two rule bases. The optimal action for each transmission mode is inferred by the corresponding inference engine. In the action decision block, a suitable MIMO transmission mode with an MCS level can be determined. The reinforcement signal generator will generate a reinforcement signal, denoted by $r(n)$, based on not only the results of transmission and retransmission but also the BLER performance metric and the transmission efficiency. This $r(n)$ will be used to finely update the Q-values in the fuzzy rule bases in the Q-function update block. The detailed design of each functional block is given as follows.

3.1 Fuzzifier

The fuzzifier receives the system state vector $X(n) = (BLER(n), CQI_H(n), CQI_L(n))$ as the input linguistic vector. $BLER(n)$ is an essential performance metric for the HSPA+ service, which has a BLER requirement denoted by $BLER^*$. This performance metric $BLER(n)$ would be in the satisfaction, attention, or violation region with respect to the performance requirement $BLER^*$ of the HSPA+ service. Therefore, the fuzzifier defines the fuzzy term set for $BLER(n)$ as $T(BLER(n)) = \{\text{Satisfaction } (S), \text{Attention } (A), \text{Violation } (V)\}$. The membership function of each term in $BLER(n)$, denoted by $\mu_{\iota}(BLER(n))$, $\iota = S$, $A$, or $V$, is given in Fig. 3.

The membership function of a fuzzy term indicates the intensity with which the input variable belongs to that fuzzy label, and is designed with prior knowledge of the system. In Fig. 3, we set $A_1 = 0.5 \times BLER^*$ ($A_4 = BLER^*$) to indicate that the BLER performance metric is in the satisfaction (violation) situation when it resides in the region $(0, A_1)$ ($(A_4, 1)$). Also, we set $A_2 = 0.7 \times BLER^*$ and $A_3 = 0.9 \times BLER^*$ to represent that, whenever the BLER performance measure is in this region approaching $BLER^*$, the BLER performance metric is in the attention situation.

The fuzzy term sets for $CQI_H(n)$ and $CQI_L(n)$ are defined as

$$T(CQI_H(n)) = T(CQI_L(n)) = \{\text{Level 1 } (L_1), \text{Level 2 } (L_2), \text{Level 3 } (L_3), \text{Level 4 } (L_4), \text{Level 5 } (L_5), \text{Level 6 } (L_6), \text{Level 7 } (L_7), \text{Level 8 } (L_8)\}.$$

Terms in $T(CQI_H(n))$ and $T(CQI_L(n))$ represent the ranges of the channel quality indication, and each term corresponds to one MCS level which can guarantee the BLER requirement $BLER^*$ within this CQI range. The membership functions of the terms in $CQI_H(n)$ and $CQI_L(n)$, denoted by $\mu_{\kappa}(CQI_H(n))$ and $\mu_{\kappa}(CQI_L(n))$, respectively, $\kappa = L_1, L_2, \ldots$, or $L_8$, are designed to be the same and are shown in Fig. 4. In the figure, we set $B_k$, $k = 1, \ldots, 8$, to be the required CQI to maintain $BLER^*$ for $MCS_k$.

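Since $CQI_H(n)$ is greater than or equal to $CQI_L(n)$, there are $J = 108$ kinds of $X(n)$ in total: the three BLER terms combined with the 36 pairs of CQI levels in which the level of $CQI_H(n)$ is not lower than that of $CQI_L(n)$. Let $S_j$ be the fuzzy linguistic terms of $X(n)$, $1 \le j \le J$. The intensity of $X(n)$ belonging to $S_j$, denoted by $\phi_{j,n}$, is obtained from the membership functions of $BLER(n)$, $CQI_H(n)$, and $CQI_L(n)$ via a max-product operation [17], [18]. It is given by

$$\phi_{j,n} = \mu_{\iota}(BLER(n)) \times \mu_{\kappa_H}(CQI_H(n)) \times \mu_{\kappa_L}(CQI_L(n)), \tag{3}$$

where $(\iota, \kappa_H, \kappa_L)$ are the fuzzy terms of $BLER(n)$, $CQI_H(n)$, and $CQI_L(n)$ that constitute $S_j$.

Fig. 2. The functional blocks of the FQL-MOMS selection scheme.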
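The fuzzification step and the product in (3) can be sketched in Python as follows. The membership shapes of Figs. 3 and 4 are not available here, so the sketch assumes piecewise-linear BLER memberships with breakpoints at 0.5, 0.7, 0.9, and 1.0 times the BLER requirement and triangular CQI memberships centered at the required CQI of each MCS level. The list `REQUIRED_CQI` and all function names are hypothetical.

```python
# Hypothetical required CQI per MCS level (B_1..B_8); not taken from the paper.
REQUIRED_CQI = [1, 3, 5, 7, 9, 11, 13, 14]


def _ramp_up(x, lo, hi):
    """Linear ramp from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def bler_memberships(bler, target=0.1):
    """Assumed piecewise-linear memberships for Satisfaction/Attention/Violation."""
    a1, a2, a3, a4 = 0.5 * target, 0.7 * target, 0.9 * target, target
    violation = _ramp_up(bler, a3, a4)
    rising = _ramp_up(bler, a1, a2)
    return {"S": 1.0 - rising, "A": min(rising, 1.0 - violation), "V": violation}


def cqi_memberships(cqi, required=REQUIRED_CQI):
    """Assumed triangular memberships for Level 1..8, with shoulders at both ends."""
    mu = []
    for k, center in enumerate(required):
        if cqi == center:
            mu.append(1.0)
        elif cqi < center:
            left = required[k - 1] if k > 0 else None
            mu.append(1.0 if left is None else max(0.0, (cqi - left) / (center - left)))
        else:
            right = required[k + 1] if k < len(required) - 1 else None
            mu.append(1.0 if right is None else max(0.0, (right - cqi) / (right - center)))
    return mu


def rule_intensities(bler, cqi_h, cqi_l, target=0.1):
    """Intensity of every state (term, b, c) with b >= c, as the product in (3)."""
    mu_b = bler_memberships(bler, target)
    mu_h = cqi_memberships(cqi_h)
    mu_l = cqi_memberships(cqi_l)
    return {
        (term, b, c): mu_b[term] * mu_h[b - 1] * mu_l[c - 1]
        for term in ("S", "A", "V")
        for b in range(1, 9)
        for c in range(1, b + 1)
    }
```

Enumerating the keys of `rule_intensities(...)` yields 108 states, which agrees with $J = 108$ in the text.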

3.2 Fuzzy Rule Base

Two fuzzy rule bases are constituted in the FQL-MOMS selection scheme. They are fuzzy rule base SD for the SD transmission mode and fuzzy rule base SM for the SM transmission mode, where each input vector has one fuzzy Q-learning rule. The fuzzy rule in the fuzzy rule base SD is designed as

$$\text{Rule } j: \text{ if } X(n) \text{ is } S_j, \text{ then } a_{1,k} \text{ with } q_n(S_j, a_{1,k}), \quad \text{for } 1 \le j \le J,\ 1 \le k \le 8, \tag{4}$$

where $a_{1,k}$ is the action for the SD transmission mode and $q_n(S_j, a_{1,k})$ is the Q-value for the state-action pair $(S_j, a_{1,k})$. If $a_{1,k}$ is selected, the FQL-MOMS selection scheme uses the $k$th MCS level for the primary data stream. Similarly, the fuzzy rule in the fuzzy rule base SM is designed as

$$\text{Rule } j: \text{ if } X(n) \text{ is } S_j, \text{ then } a_{2,k} \text{ with } q_n(S_j, a_{2,k}), \quad \text{for } 1 \le j \le J,\ 1 \le k \le 36, \tag{5}$$

where $a_{2,k}$ is the action for the SM transmission mode. If $a_{2,k}$ is selected, the FQL-MOMS selection scheme assigns the MCS level $H$ ($MCS_H$) to the primary data stream with $CQI_H(n)$ and the MCS level $L$ ($MCS_L$) to the secondary data stream with $CQI_L(n)$. The mappings of $MCS_H$ and $MCS_L$ with respect to $a_{2,k}$ are given in Table 2, where $k = f(H, L) = H - 8 + 0.5L(17 - L)$.
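The index mapping between an SM action and its pair of MCS levels can be checked with a short Python sketch. The function and table names are hypothetical; only the formula $k = H - 8 + 0.5L(17 - L)$ comes from the text.

```python
def action_index(h, l):
    """Map MCS levels (H, L) with 1 <= L <= H <= 8 to the SM action index k."""
    if not 1 <= l <= h <= 8:
        raise ValueError("expected 1 <= L <= H <= 8")
    # L * (17 - L) is even for every integer L, so integer division is exact.
    return h - 8 + (l * (17 - l)) // 2


# Inverse lookup: action index k -> (H, L), built by enumerating all valid pairs.
ACTION_TO_LEVELS = {
    action_index(h, l): (h, l) for l in range(1, 9) for h in range(l, 9)
}
```

The enumeration covers each index from 1 to 36 exactly once, so the mapping is a bijection between the 36 admissible level pairs and the SM actions.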

In these two fuzzy rule bases, the Q-value is learned via the reinforcement signal at each TTI. The reinforcement signal is designed to reward a selected action that increases the system throughput and to punish a selected action so as to guarantee the BLER performance requirement. Therefore, the Q-value can be seen as the preference value for each action under different system states.

In order to accelerate the learning process, we have divided each fuzzy rule base into satisfaction, attention, and violation regions according to the fuzzy terms of $BLER(n)$. The candidate set for action selection in each region is decided by the fuzzy terms of $CQI_H(n)$ and $CQI_L(n)$, since each fuzzy term for the CQIs indicates one MCS level that guarantees the BLER requirement $BLER^*$ within this CQI region. If $BLER(n)$ is in the Satisfaction region, the BLER performance is much smaller than $BLER^*$, and a more aggressive action should be selected so as to increase the system throughput. Since an action with a larger MCS level can transmit more information bits, the actions with the required CQI larger than the reported CQI are considered as action candidates. If $BLER(n)$ is in the Attention region, the BLER performance is close to $BLER^*$, and the BLER performance should be kept within this region. In order to keep the BLER performance unchanged, only the actions with the required CQI at around the reported CQI are considered as action candidates. If $BLER(n)$ is in the Violation region, the BLER requirement is violated, and a more conservative action should be selected so as to comply with the BLER requirement of the HSPA+ service. In order to recover the BLER performance from the violation region, only the actions whose required CQI is less than the reported CQI are considered as action candidates. In summary, the fuzzy rule bases SD and SM are given by:

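Fuzzy Rule Base SD:

• If $BLER(n)$ is in the satisfaction region, $CQI_H(n)$ is Level $b$, and $CQI_L(n)$ is Level $c$, then $a_{1,k}$ with $q_n((S, L_b, L_c), a_{1,k})$, $\lceil (b+c)/2 \rceil \le k \le 8$.

• If $BLER(n)$ is in the attention region, $CQI_H(n)$ is Level $b$, and $CQI_L(n)$ is Level $c$, then $a_{1,k}$ with $q_n((A, L_b, L_c), a_{1,k})$, $k = \lfloor (b+c)/2 \rfloor - 1, \ldots, b + 1$.

• If $BLER(n)$ is in the violation region, $CQI_H(n)$ is Level $b$, and $CQI_L(n)$ is Level $c$, then $a_{1,k}$ with $q_n((V, L_b, L_c), a_{1,k})$, $k = c - 1, \ldots, b - 1$.

Fuzzy Rule Base SM:

• If $BLER(n)$ is in the satisfaction region, $CQI_H(n)$ is Level $b$, and $CQI_L(n)$ is Level $c$, then $a_{2,k}$ with $q_n((S, L_b, L_c), a_{2,k})$, $k = f(H, L)$, $H > b$, $L > c$, $H \ge L$.

• If $BLER(n)$ is in the attention region, $CQI_H(n)$ is Level $b$, and $CQI_L(n)$ is Level $c$, then $a_{2,k}$ with $q_n((A, L_b, L_c), a_{2,k})$, $k = f(H, L)$, $b - 1 \le H \le b + 1$, $c - 1 \le L \le c + 1$, $H \ge L$.

• If $BLER(n)$ is in the violation region, $CQI_H(n)$ is Level $b$, and $CQI_L(n)$ is Level $c$, then $a_{2,k}$ with $q_n((V, L_b, L_c), a_{2,k})$, $k = f(H, L)$, $H < b$, $L < c$, $H \ge L$.

TABLE 2. The Mapping between the MCS Levels and Action $a_{2,k}$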
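The candidate sets of the two rule bases can be enumerated with a small Python sketch. It assumes that indices falling outside the valid MCS range 1 to 8 are simply discarded, which the text does not state explicitly; the function names are hypothetical, and the SM candidates are returned as $(H, L)$ pairs rather than as action indices.

```python
import math


def sd_candidates(region, b, c):
    """Candidate MCS levels k for the SD rule with CQI_H at Level b, CQI_L at Level c."""
    if region == "S":
        ks = range(math.ceil((b + c) / 2), 9)
    elif region == "A":
        ks = range((b + c) // 2 - 1, b + 2)
    elif region == "V":
        ks = range(c - 1, b)
    else:
        raise ValueError("region must be 'S', 'A', or 'V'")
    # Assumption: levels outside 1..8 are dropped from the candidate set.
    return [k for k in ks if 1 <= k <= 8]


def sm_candidates(region, b, c):
    """Candidate (H, L) pairs with H >= L for the SM rule of the given region."""
    tests = {
        "S": lambda h, l: h > b and l > c,
        "A": lambda h, l: abs(h - b) <= 1 and abs(l - c) <= 1,
        "V": lambda h, l: h < b and l < c,
    }
    if region not in tests:
        raise ValueError("region must be 'S', 'A', or 'V'")
    keep = tests[region]
    return [(h, l) for h in range(1, 9) for l in range(1, h + 1) if keep(h, l)]
```

With $b = 5$ and $c = 3$, for example, the SD candidates are levels 4 to 8 in the satisfaction region, 3 to 6 in the attention region, and 2 to 4 in the violation region. Some corner states, such as the SM satisfaction region with $b = 8$, produce an empty set under this sketch.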

3.3 Inference Engine and Action Decision

By using a select-max strategy in the exploration/exploitation policy (EEP) [23], [25], a most suitable action for rule $j$ in fuzzy rule base SD, denoted by $a^*_{1,j}(n)$, can be obtained by

$$a^*_{1,j}(n) = \arg\max_{k} q_n(S_j, a_{1,k}). \tag{6}$$

The $a^*_{1,j}(n)$ in (6) is also called the greedy action. In order to explore the set of possible actions and acquire experience through the reinforcement signals, the action of each rule is selected according to the select-max strategy of EEP [23], [25]. In the exploitation policy, the greedy action is selected. In the exploration policy, one of the nongreedy actions is selected to produce a larger total reward in the long run. Then, the optimal action for the SD transmission mode, denoted by $a^*_1(n)$, can be inferred by

$$a^*_1(n) = \frac{\sum_{j=1}^{J} \phi_{j,n} \times a^*_{1,j}(n)}{\sum_{j=1}^{J} \phi_{j,n}}, \tag{7}$$

where $\phi_{j,n}$ is the rule intensity given in (3), and the Q-value for $a^*_1(n)$ can then be obtained by

$$Q_n(X(n), a^*_1(n)) = \frac{\sum_{j=1}^{J} \bigl[\phi_{j,n} \times q_n(S_j, a^*_{1,j}(n))\bigr]}{\sum_{j=1}^{J} \phi_{j,n}}. \tag{8}$$

Using the same strategy as in (6), a most suitable action for rule $j$ in fuzzy rule base SM, denoted by $a^*_{2,j}(n)$, can be obtained. Similarly, substituting $a^*_{2,j}(n)$ for $a^*_{1,j}(n)$ in (7) and (8), we can get the optimal action for the SM transmission mode, denoted by $a^*_2(n)$, and the Q-value for $a^*_2(n)$, denoted by $Q_n(X(n), a^*_2(n))$. Note that, since the output value of the FQL-MOMS selection scheme should be an integer, the optimal actions $a^*_1(n)$ and $a^*_2(n)$ are rounded off to integer values.

The proper transmission mode at every episode $n$, denoted by $z^*$, can then be decided in the action decision block by

$$z^* = \arg\max_{z=1,2} Q_n(X(n), a^*_z(n)). \tag{9}$$

If $z^* = 1$, the FQL-MOMS selection scheme selects the SD transmission mode and outputs $(MCS_H, MCS_L) = (a^*_1(n), 0)$ for the single primary data stream. If $z^* = 2$, it chooses the SM transmission mode and outputs $(MCS_H, MCS_L) = (MCS^*_H, MCS^*_L)$ based on $a^*_2(n) = f(H, L)$ for the (primary, secondary) data streams given in Table 2.
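The inference and action decision steps can be sketched in Python as below. This is a minimal illustration that covers only the exploitation (greedy) branch. The dictionary-based data layout, the function names, the round-half-up rule, and the tie-break toward the SD mode are assumptions made for the sketch and are not specified in the text.

```python
def infer_action(phi, q_table, candidates):
    """Weighted-average action and Q-value of one transmission mode.

    phi:        dict rule -> rule intensity
    q_table:    dict (rule, action) -> Q-value
    candidates: dict rule -> list of candidate action indices
    """
    weighted_action = 0.0
    weighted_q = 0.0
    total = 0.0
    for rule, intensity in phi.items():
        actions = candidates.get(rule, [])
        if intensity <= 0.0 or not actions:
            continue
        # Greedy action of this rule: the candidate with the largest Q-value.
        greedy = max(actions, key=lambda a: q_table[(rule, a)])
        weighted_action += intensity * greedy
        weighted_q += intensity * q_table[(rule, greedy)]
        total += intensity
    if total == 0.0:
        raise ValueError("no rule with a nonempty candidate set was fired")
    # Assumption: "rounded off" is taken as round-half-up.
    return int(weighted_action / total + 0.5), weighted_q / total


def decide_mode(sd_result, sm_result):
    """Pick the mode with the larger inferred Q-value (ties go to SD, an assumption)."""
    (a1, q1), (a2, q2) = sd_result, sm_result
    return (1, a1) if q1 >= q2 else (2, a2)
```

For the SM mode, the returned action index would then be translated to a pair of MCS levels through the mapping of Table 2.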

3.4 Reinforcement Signal Generator

The reinforcement signal generator in the FQL-MOMS selection scheme will generate a reinforcement signal to effectively update the Q-value for each state-action pair at every TTI. The reinforcement signals generated for fuzzy rules in the same region of a fuzzy rule base are designed to be in the same form. Also, in order to let the Q-values in the SD and SM fuzzy rule bases have a similar degree of reward and punishment, the reinforcement signals in these two rule bases have a similar form. The ratio of the largest reward to the severest punishment for the reinforcement signal is set at around 0.1 since $BLER^* = 0.1$. In the following, the reinforcement signals are carefully designed so as to achieve a fine granularity and a large dynamic range to intelligently tune the Q-values in the fuzzy rule bases SD and SM.

3.4.1 Reinforcement Signal for SD Fuzzy Rule Base

If $BLER(n)$ is in the satisfaction region, a more aggressive action to increase the system throughput should be encouraged. As noted, $BLER(n)$ can improve only when the initial packet is successfully received without retransmission. Therefore, in order to act in a more aggressive manner, a reward reinforcement signal is given if a packet is successfully received, no matter whether retransmission occurs or not. The reinforcement signal is designed, according to the results of both the initial transmission and the retransmissions, by

$$r(n) = \begin{cases} \dfrac{R_k}{R_{d,k} + R^*}, & \text{if packet is successfully received}, \\[2mm] -(\epsilon + \omega \times BLER(n)), & \text{if packet is dropped}, \end{cases} \tag{10}$$

where $R_k$ is the number of information bits of the transmitted packet with the selected action $a_{1,k}$, $R_{d,k}$ is the summation of redundant bits in the initial transmission and the retransmissions, $R^*$ is the maximum number of information bits that the system can support, used as a normalization factor, $\epsilon$ is a bias constant for punishment if the packet is dropped, and $\omega$ is the weight constant for $BLER(n)$. From (10), we can see that if the selected action can deliver more information bits with fewer redundant bits (more efficient transmission), a larger reward will be generated. Retransmission will decrease the reward since the value of $R_{d,k}$ increases. However, if a packet is dropped after three transmissions, a severe punishment is given and a negative reinforcement signal is designed as in (10). Here, $\epsilon$ is set to let the punishment be greater than the reward, for example, $\epsilon = 5$. Also, a smaller $BLER(n)$ should incur less punishment for a dropped packet since it can tolerate more unsuccessful transmissions. The $\omega$ is set to let the effect of $BLER(n)$ be observable and comparable with respect to the reward, for example, $\omega = 40$.

If $BLER(n)$ is in the attention region, an action should be determined to keep the BLER performance unchanged. Therefore, the reward in this region should be smaller than that in the satisfaction region. The reinforcement signal is then designed as

$$r(n) = \begin{cases} \dfrac{1}{1 + N_{r,k}} \times \dfrac{R_k}{R_{d,k} + R^*}, & \text{if packet is successfully received}, \\[2mm] -(\epsilon + \omega \times BLER(n)), & \text{if packet is dropped}, \end{cases} \tag{11}$$

where $N_{r,k}$ is the number of retransmissions for the transmitted packet with the selected action $a_{1,k}$, $0 \le N_{r,k} \le 2$, and $\epsilon$ and $\omega$ are the bias and weight constants of the punishment as in (10). If the packet can be received without retransmission, the selected action gets the same reward as that in the satisfaction region. If retransmission occurs, the reward is divided by $(1 + N_{r,k})$ to decrease the reward of the selected action, since the BLER performance is still within $BLER^*$. However, if the packet is dropped, a similar severe punishment is given. Since $BLER(n)$ in the attention region is greater than that in the satisfaction region, the punishment in the attention region is usually larger than that in the satisfaction region.

If $BLER(n)$ is in the violation region, a conservative action should be determined to recover the BLER requirement. Therefore, a reward reinforcement signal is given only if the packet is successfully received without retransmission. The reinforcement signal is given by

$$r(n) = \begin{cases} \dfrac{R_k}{R_{d,k} + R^*}, & \text{if packet is successfully received without retransmission}, \\[2mm] -\dfrac{1 + N_{r,k}}{3} \times \dfrac{R_k}{R_{d,k} + R^*}, & \text{if packet is successfully received with retransmission}, \\[2mm] -(\epsilon + \omega \times BLER(n)), & \text{if packet is dropped}. \end{cases} \tag{12}$$

If the packet is successfully received without retransmission, the reward for the selected action has the same form as that in the satisfaction region since $BLER(n)$ can be reduced. However, if the packet is successfully received but with retransmission, the selected action will still receive a slight punishment since the BLER performance has violated the requirement and is deteriorating. This punishment increases as the number of retransmissions increases. If the packet is dropped, a severe punishment is put on the selected action. Notice that the punishment in this region is the severest since $BLER(n)$ in this region is the largest.

3.4.2 Reinforcement Signal for SM Fuzzy Rule Base

Similarly, if $BLER(n)$ is in the satisfaction region, the reinforcement signal is designed to increase the system throughput by

$$r(n) = \begin{cases} \dfrac{R_k}{R^* + R_{d,k}}, & \text{if both packets are successfully received}, \\[2mm] -\dfrac{N_{d,k}}{2} \times (\epsilon + \omega \times BLER(n)), & \text{if one or two packets are dropped}, \end{cases} \tag{13}$$

where $N_{d,k}$ is the number of dropped packets transmitted with action $a_{2,k}$, $N_{d,k} = 0$, $1$, or $2$. Notice that, in the SM transmission mode, $R_k$ is the summation of information bits in the two packets transmitted with action $a_{2,k}$, and the value of $R_k$ would be greater than $(R^* + R_{d,k})$ if the action $a_{2,k}$ adopts higher MCS levels for the two data streams. The selected action $a_{2,k}$ would then get a higher reward than that in fuzzy rule base SD. This is necessary since the SM transmission mode can enhance the system throughput more than the SD transmission mode. If both packets are dropped, a severe punishment is set to $-(\epsilon + \omega \times BLER(n))$. However, if one of the two packets is dropped, the punishment is set to half of that for two dropped packets since one packet can still be successfully received.

If $BLER(n)$ is in the attention region, the reinforcement signal is designed to keep the BLER performance unchanged by

$$r(n) = \begin{cases} \dfrac{1}{1 + N_{r,k}} \times \dfrac{R_k}{R_{d,k} + R^*}, & \text{if both packets are successfully received}, \\[2mm] -\dfrac{N_{d,k}}{2} \times (\epsilon + \omega \times BLER(n)), & \text{if one or two packets are dropped}. \end{cases} \tag{14}$$

Notice that in the SM transmission mode, $0 \le N_{r,k} \le 4$. As $N_{r,k}$ increases, the reward for the selected action $a_{2,k}$ is decreased. The reason is the same as that given for the fuzzy rule base SD.

If $\mathrm{BLER}(n)$ is in the violation region, the reinforcement signal is designed to recover the BLER requirement by

$$
r(n) =
\begin{cases}
\dfrac{R_k}{R_{d,k} + R}, & \text{if both packets are successfully received without retransmission;} \\[2ex]
-\dfrac{1 + N_{r,k}}{5} \cdot \dfrac{R_k}{R_{d,k} + R}, & \text{if both packets are successfully received but with retransmission;} \\[2ex]
-\bigl(\xi + \mathrm{BLER}(n)\bigr), & \text{if one or two packets are dropped.}
\end{cases}
\tag{15}
$$

Similar to the design for fuzzy rule base SD, the selected action $a_{2,k}$ gets a reward only when both transmitted packets are successfully decoded without retransmission, since the BLER performance is then improved. If any retransmission occurs in the transmission of the two packets, a slight punishment is put on the selected action $a_{2,k}$, since the BLER performance is already outside the BLER requirement $\mathrm{BLER}^*$. This slight punishment is divided by 5 since the maximum value of $N_{r,k}$ is 4. If any packet is dropped, the selected action receives a severe punishment, since the BLER performance has already violated the requirement and cannot be allowed to deteriorate further.
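As a minimal sketch, the attention-region and violation-region signals in (14) and (15) can be written as a single function. The names `XI` (the constant term of the drop punishment) and `R_OFFSET` (the term added to $R_{d,k}$ in the denominator), together with their numeric values, are placeholders chosen only for illustration; this section does not give the values used in the paper, and the signs follow the reward/punishment wording of the text.

```python
# Illustrative sketch of the SM reinforcement signals (14) and (15).
# XI and R_OFFSET are hypothetical placeholder constants, not values from the paper.
XI = 1.0        # assumed constant term of the drop punishment
R_OFFSET = 0.0  # assumed offset added to r_dk in the denominator


def sm_reward(region, r_k, r_dk, n_retx, n_dropped, bler):
    """Reinforcement signal for the SM rule base.

    region    : "attention" or "violation"
    r_k       : information bits carried by the two packets of the action
    r_dk      : reference number of bits R_{d,k} (must make the denominator nonzero)
    n_retx    : total number of retransmissions N_{r,k}, 0..4
    n_dropped : number of dropped packets N_{d,k}, 0..2
    bler      : current BLER(n)
    """
    if not 0 <= n_retx <= 4 or n_dropped not in (0, 1, 2):
        raise ValueError("n_retx must be in 0..4 and n_dropped in 0..2")
    efficiency = r_k / (r_dk + R_OFFSET)
    if region == "attention":
        if n_dropped > 0:
            # Half punishment for one dropped packet, full for two.
            return -(n_dropped / 2) * (XI + bler)
        # Reward shrinks as more retransmissions were needed.
        return efficiency / (1 + n_retx)
    if region == "violation":
        if n_dropped > 0:
            return -(XI + bler)
        if n_retx > 0:
            # Slight punishment, scaled by (1 + N_r) / 5 since N_r <= 4.
            return -((1 + n_retx) / 5) * efficiency
        return efficiency
    raise ValueError(f"unknown region: {region}")
```

With these placeholder constants, an action carrying twice the reference number of bits earns a reward of 2 when both packets pass at the first attempt, and that reward is halved by a single retransmission in the attention region.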

3.5 Q-Function Update

Applying the fuzzy Q-learning algorithm in [20], [22], the Q-values in the fuzzy rule base for transmission mode $z$ in the next episode are updated by the Bellman equation [22], [28] given below:

$$
q_{n+1}(S_j, a_z(n)) = q_n(S_j, a_z(n)) + \eta \, \frac{\alpha_{j,n}}{\sum_{i=1}^{J} \alpha_{i,n}} \, \Delta q_n(S_j, a_z(n)), \quad \text{for } 1 \le j \le J,
\tag{16}
$$

where $\eta$ is the learning rate, $0 < \eta \le 1$, and $\Delta q_n(S_j, a_z(n))$ is the temporal difference between the Q-values of the current episode and the next episode. By the ordinary gradient descent method [21], $\Delta q_n(S_j, a_z(n))$ can be obtained by

$$
\Delta q_n(S_j, a_z(n)) = r(n) + \gamma \, Q_n^*(X(n+1), a_z^*(n+1)) - Q_n(X(n), a_z(n)),
\tag{17}
$$

where $\gamma$ is the discount factor, $0 < \gamma < 1$, and $Q_n(X(n), a_z(n))$ is the Q-value for the state-action pair $(X(n), a_z(n))$. Using the rule intensity $\alpha_{j,n}$ of $X(n)$, we can get $Q_n(X(n), a_z(n))$ by

$$
Q_n(X(n), a_z(n)) = \frac{\sum_{j=1}^{J} \left[ \alpha_{j,n} \times q_n(S_j, a_{z,j}(n)) \right]}{\sum_{j=1}^{J} \alpha_{j,n}}.
\tag{18}
$$

Also, $Q_n^*(X(n+1), a_z^*(n+1))$ in (17) is the optimal Q-value at the next episode. Since the next-episode Q-value for each state-action pair is unavailable, we use $q_n(S_j, a_{z,k}^*)$ instead of $q_{n+1}(S_j, a_{z,k}^*)$ to get $Q_n^*(X(n+1), a_z^*(n+1))$ by the center of area (COA) method [17]:

$$
Q_n^*(X(n+1), a_z^*(n+1)) = \frac{\sum_{j=1}^{J} \left[ \alpha_{j,n+1} \times q_n(S_j, a_{z,j}^*(n)) \right]}{\sum_{j=1}^{J} \alpha_{j,n+1}}.
\tag{19}
$$

Based on Bellman's principle of optimality [28], as $n$ becomes sufficiently large, $a_z^*(n)$ will be an optimal action for the transmission mode $z$, at which the maximum Q-value is attained. Therefore, the fuzzy Q-learning algorithm can make an optimal decision for this MDP problem.
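The update in (16)-(19) can be illustrated with a short sketch. It assumes that the Q-values of one rule base are stored as a list of `J` rows (rules) by `K` columns (candidate actions), that `alpha_now` and `alpha_next` hold the rule intensities of $X(n)$ and $X(n+1)$, and that `chosen[j]` is the index of the action selected in rule $j$. These names and the data layout are illustrative assumptions, and taking the per-rule maximum as the optimal next-episode value is one common reading of (19), not necessarily the authors' implementation.

```python
def fuzzy_q_update(q, alpha_now, alpha_next, chosen, reward, eta=0.9, gamma=0.1):
    """One fuzzy Q-learning step for a rule base with J rules and K actions.

    q          : J x K nested list of rule Q-values
    alpha_now  : J rule intensities of the current state X(n)
    alpha_next : J rule intensities of the next state X(n+1)
    chosen     : J indices, the action selected in each rule
    reward     : reinforcement signal r(n)
    Returns a new J x K nested list; the input table is left unchanged.
    """
    total_now = sum(alpha_now)
    total_next = sum(alpha_next)

    # (18): Q-value of the current state-action pair, weighted by rule intensity.
    q_now = sum(a * q[j][chosen[j]] for j, a in enumerate(alpha_now)) / total_now
    # (19): best attainable Q-value of the next state, from the current table.
    q_next = sum(a * max(q[j]) for j, a in enumerate(alpha_next)) / total_next
    # (17): temporal difference.
    delta = reward + gamma * q_next - q_now

    # (16): each rule is corrected in proportion to its normalized intensity.
    updated = [row[:] for row in q]
    for j, a in enumerate(alpha_now):
        updated[j][chosen[j]] += eta * (a / total_now) * delta
    return updated
```

Rules that fire weakly for the current state receive only a small share of the correction, which is what lets a single observed reward refine several neighboring fuzzy rules at once.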

Note that the learning rate $\eta$ controls the convergence of the Q-values. A larger $\eta$ makes the system learn faster but with more oscillation. The discount factor $\gamma$ makes the Q-value at the next episode less valuable. A larger $\gamma$ makes the system more farsighted but requires a longer training time.

4 SIMULATION RESULTS

4.1 Simulation Environment

In the simulations, we consider an HSPA+ system supported in a hexagonal-grid multicell CDMA system, where each cell suffers interference from two tiers of neighboring cells. The spreading factor (SF) of the CDMA system is fixed at 16, where one spreading code is reserved for common channels and 15 spreading codes are assigned to the HS-DSCH. The channel model consists of the long-term fading and the short-term fading given in (1) and (2), respectively. The channel condition is fixed within a TTI. The delay of either the ACK/NACK or the CQI report is set to 10 ms, i.e., 5 TTIs. Also, we assume that the maximum power allocated to the HS-DSCH is up to 80 percent of the total transmission power of the Node B, and the residual power is allocated to other control and service channels. Users always have packets waiting for transmission in the buffer. Since the channel condition varies very fast, the FQL-MOMS scheme should be myopic, and we set the discount factor to $\gamma = 0.1$. Also, the FQL-MOMS selection scheme has to make a decision for the HSPA+ system in every TTI (2 ms). Thus, it needs a fast learning speed, and we set the learning rate to $\eta = 0.9$. Other parameters of the HSPA+ system in the CDMA mobile network and of the FQL-MOMS selection scheme are given in Table 3.
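For readers who want to reproduce the setup, the values quoted in the paragraph above can be gathered in one small configuration object. The class name `SimulationSetup` and its field names are hypothetical; only the numbers come from the text, the pairing of 0.1 with the discount factor and 0.9 with the learning rate follows the surrounding sentences, and the Table 3 parameters are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationSetup:
    """Values quoted in the text; field names are illustrative only."""
    spreading_factor: int = 16
    common_channel_codes: int = 1
    tti_ms: float = 2.0
    report_delay_ms: float = 10.0         # ACK/NACK and CQI report delay
    max_hs_dsch_power_ratio: float = 0.8  # share of Node B transmission power
    discount_factor: float = 0.1          # small value: myopic learner
    learning_rate: float = 0.9            # large value: fast learning

    @property
    def hs_dsch_codes(self) -> int:
        """Spreading codes left for HS-DSCH after the common channels."""
        return self.spreading_factor - self.common_channel_codes

    @property
    def report_delay_ttis(self) -> int:
        """Report delay expressed in TTIs."""
        return round(self.report_delay_ms / self.tti_ms)
```

The two derived properties recover the figures stated in the text: 15 codes for the HS-DSCH and a report delay of 5 TTIs.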

We compare the FQL-MOMS selection scheme with two conventional MCS selection schemes. One is the ATS scheme proposed in [12]. The ATS scheme adaptively sets the SINR threshold for each MCS level, where the SINR thresholds that are close to the SINR of the last transmission are adjusted after each transmission. The increasing step of the threshold adjustment for a failed (NACK) transmission is set to $\frac{\mathrm{BLER}^*}{1 - \mathrm{BLER}^*}$, while the decreasing step for a successful (ACK) transmission is set to $0.1 \times \frac{\mathrm{BLER}^*}{1 - \mathrm{BLER}^*}$. The other is the Q-learning-based HARQ (Q-HARQ) scheme studied in [14]. The Q-HARQ scheme uses only the Q-learning algorithm to learn an optimal policy for choosing the MCS level. The reinforcement signal of the Q-HARQ scheme is defined as

$$
\left( \frac{\mathrm{SINR}(x_k, a_k) - \widetilde{\mathrm{SINR}}(a_k)}{\widetilde{\mathrm{SINR}}(a_k)} \right)^{2},
$$

where $\mathrm{SINR}(x_k, a_k)$ is the received SINR at the UE for the state $x_k$ with action $a_k$, and $\widetilde{\mathrm{SINR}}(a_k)$ is the required SINR with action $a_k$. It is a transmission cost function: the action for which the received SINR is closest to the required SINR has the smallest transmission cost. After learning, the Q-HARQ scheme selects the MCS with the minimum Q-value. Meanwhile, for both the ATS and the Q-HARQ schemes, the MIMO configuration mode is assumed to be selected according to $\mathrm{CQI}_L$. If $\mathrm{CQI}_L$ is larger (smaller) than $\mathrm{CQI}_{th}$, the SM (SD) transmission mode is selected since the channel condition is good (bad). Here, we set $\mathrm{CQI}_{th} = 7$, the middle value of the CQI range and also the threshold for switching between QPSK and 16-QAM, which is a suitable threshold for judging the quality of the channel.
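The two baselines described above can be sketched in a few lines. The function names are hypothetical, the default `bler_target=0.1` is an assumed example value for the BLER requirement (it is not stated in this section), and the text does not say whether SINR is in linear or dB units or which mode is used when the CQI equals the threshold; the sketch maps that boundary case to SD.

```python
def ats_step(ack, bler_target=0.1, down_ratio=0.1):
    """Signed SINR-threshold adjustment after one transmission (ATS baseline).

    A NACK raises the thresholds by BLER*/(1 - BLER*); an ACK lowers them
    by down_ratio times that amount. bler_target is an assumed example value.
    """
    up = bler_target / (1.0 - bler_target)
    return -down_ratio * up if ack else up


def q_harq_cost(received_sinr, required_sinr):
    """Q-HARQ transmission cost: squared relative SINR mismatch."""
    return ((received_sinr - required_sinr) / required_sinr) ** 2


def baseline_mimo_mode(cqi_l, cqi_th=7):
    """Mode rule assumed for both baselines: SM above the threshold, else SD."""
    return "SM" if cqi_l > cqi_th else "SD"
```

Because the cost is zero when the received SINR equals the required SINR and grows quadratically on both sides, the Q-HARQ learner is pushed toward the MCS whose requirement matches the channel most closely, regardless of whether the mismatch is a margin or a shortfall.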

In the following figures, we define the HSDPA service power ratio (HSPR) as the ratio of the transmission power allocated to the HS-DSCH to the total transmission power of the Node B. If the HSPR is too small, there is not enough power to transmit the data. On the contrary, if the HSPR is too large, the interference from other cells will seriously affect the data transmission.

4.2 Performance Evaluation

Fig. 5 shows the system throughput versus the user mobility in the HSPA+ system for the FQL-MOMS selection scheme, the ATS scheme [12], and the Q-HARQ scheme [14], with HSPR equal to 80 percent. The FQL-MOMS selection scheme enhances the system throughput of the HSPA+ system by up to 49.3 percent over the ATS scheme and by up to 35.9 percent over the Q-HARQ scheme. The reasons are as follows. The decision of the FQL-MOMS selection scheme is inferred from fuzzy rules, the rules are adaptively refined by reinforcement signals in the Q-learning algorithm, and the reinforcement signals are effectively designed. The FQL-MOMS selection scheme considers not only the CQI and the last transmission results but also the BLER performance metric and the transmission efficiency. Also, based on the BLER performance metric, the FQL-MOMS selection scheme makes an aggressive (conservative) decision if the BLER performance falls in the satisfaction (violation) region. The granularity and the dynamic range of the adjustment for the decision of the MIMO transmission mode and MCS level are therefore more flexible and better able to cope with the varying channel environment. Consequently, a good balance between system throughput enhancement and the BLER requirement guarantee can be achieved. On the other hand, the ATS scheme determines the MIMO configuration mode and MCS level by adjustable SINR thresholds. However, the adjustment of the SINR thresholds is less flexible, meaning that either the granularity of the adjustment is fixed or the dynamic range of the adjustment is small. The Q-HARQ scheme uses only the Q-learning algorithm to learn the optimal policy of MCS level selection, but its reinforcement signal is somewhat coarse since it considers only the received SINR and the required SINR. Therefore, the Q-HARQ scheme attains a system throughput higher than that of the ATS scheme but lower than that of the FQL-MOMS scheme.

TABLE 3. Parameters for the HSPA+ System and the FQL-MOMS Selection Scheme

Fig. 6 shows the packet dropping ratio versus the user mobility at HSPR equal to 80 percent. In the HSPA+ system, if the decoding of a packet still fails after three transmissions, the packet is dropped. It can be observed that the FQL-MOMS (ATS) scheme has the smallest (largest) packet dropping ratio, and the Q-HARQ scheme lies in between. The reasons are similar to those given for Fig. 5, indicating that a larger system throughput usually comes with a smaller packet dropping ratio. Since the FQL-MOMS scheme infers a suitable MIMO transmission mode and MCS level by fuzzy rules, it achieves the lowest packet dropping ratio.

Fig. 7 shows the BLER performance versus the user mobility at HSPR equal to 80 percent. The Q-HARQ scheme has the best BLER performance, while the FQL-MOMS selection scheme attains a BLER about 0.01 higher than that of the Q-HARQ scheme, and all three schemes can guarantee the BLER requirement regardless of the user mobility. These observations are explained below. The FQL-MOMS selection scheme intends to enhance the system throughput while guaranteeing the BLER performance via the learning process for the fuzzy rule bases. When $\mathrm{BLER}(n)$ exceeds the BLER requirement, the FQL-MOMS scheme adopts a conservative action to recover the BLER performance. When $\mathrm{BLER}(n)$ is far below the BLER requirement, the FQL-MOMS scheme adopts an aggressive action to enhance the system throughput, which may increase $\mathrm{BLER}(n)$. Therefore, the FQL-MOMS selection scheme can achieve a good balance between system throughput enhancement and the BLER requirement guarantee. Indeed, the FQL-MOMS scheme attains the highest system throughput in Fig. 5, and at the same time it maintains the BLER just below the BLER requirement in this figure. On the other hand, the Q-HARQ scheme is too sensitive to the fast-varying channel condition. It selects the action with the minimum Q-value, where the Q-value is adjusted based only on the received SINR and the required SINR. A selected action whose required SINR is closer to the received SINR has a smaller reinforcement signal. This leads the Q-HARQ scheme to select conservative actions, giving it the best BLER performance. The ATS scheme can rapidly adjust the SINR thresholds based on the current ACK/NACK information, where the ratio of the decreasing step for a successful transmission to the increasing step for a failed transmission is set to 0.1. This selection decision is too coarse, and thus the ATS scheme has the largest BLER.

Fig. 5. The system throughput versus the user mobility.

Fig. 6. The packet dropping ratio versus the user mobility.

Fig. 8 presents the system throughput versus HSPR at a user mobility of 60 km/h. The FQL-MOMS (ATS) scheme still has the largest (smallest) system throughput among the three schemes, no matter what the HSPR is. The reasons are the same as those given for Fig. 5. Also, as the HSPR increases, the system throughput increases, since the Node B has enough power to deliver the data and the UE has a better received SINR. The packet dropping ratio and the BLER versus HSPR were also measured in the simulations. The packet dropping ratio (BLER) versus HSPR relates to Fig. 6 (Fig. 7) in the same way that Fig. 8 relates to Fig. 5; these results are not shown here.

Fig. 9 shows a worst-case learning process of the FQL-MOMS selection scheme among 1,000 simulations, where SINRs related to a desired action $a_{2,22}$ are randomly generated and input to the FQL-MOMS scheme with three partitioned regions and to the FQL-MOMS scheme without partition. It is observed that the FQL-MOMS scheme with three partitioned regions first stays at a temporary action $a_{2,17}$ after four iterations, then oscillates between $a_{2,17}$ and $a_{2,22}$, and finally reaches the desired action after 56 iterations, while the FQL-MOMS scheme without partition oscillates considerably and finally reaches the desired action after 78 iterations. The FQL-MOMS scheme with three partitioned regions thus accelerates the learning process by up to 28 percent compared to the FQL-MOMS scheme without partition. Also, on average over the 1,000 simulations, the former increases the learning speed by up to 61 percent relative to the latter. This observation is explained below. In the FQL-MOMS scheme with three partitioned regions, only the actions in one region are enabled for each learning update. On the other hand, in the FQL-MOMS scheme without partition, all actions are enabled. Although the former might temporarily settle on a locally optimal action if the initial BLER performance metric leads to an improper region of actions in the fuzzy rule base, it will quickly move to the desired action. This is because the purpose of the fuzzy rule bases is to keep the BLER performance in the attention region, where the desired action is enabled in each fuzzy rule.
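The 28 percent figure follows directly from the two iteration counts quoted above, taking the scheme without partition as the baseline:

$$
\frac{78 - 56}{78} = \frac{22}{78} \approx 0.282,
$$

that is, about 28 percent fewer iterations in this worst-case run.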

5 CONCLUSIONS

In this paper, we propose a fuzzy Q-learning-based MIMO configuration mode and MCS level selection scheme for HSPA+ systems, intending to efficiently utilize the system radio resource. The FQL-MOMS selection scheme combines fuzzy logic and the Q-learning algorithm to intelligently choose a MIMO configuration mode and MCS level for packet data transmission in HSPA+ systems, so as to enhance the system throughput while guaranteeing the BLER requirement. It is designed to have a clear and effective configuration, which contains the SD fuzzy rule base and the SM fuzzy rule base. Also, in order to accelerate the learning process, each rule base is partitioned into satisfaction, attention, and violation regions according to the BLER performance. The fuzzy rules with Q-values in the three regions of the SD and SM fuzzy rule bases are carefully designed using domain knowledge to maximize the system throughput and fulfill the BLER requirement. Moreover, the Q-learning algorithm is used to update the Q-values of the fuzzy rules in both rule bases to adapt to the variation of the channel condition, where the reinforcement signals consider the BLER performance metric, the transmission efficiency, and the results of both transmission and retransmission to reward or punish the selected action. Through this closed-loop iteration, the FQL-MOMS selection scheme can determine an appropriate MIMO configuration mode and MCS level for packet data transmission in HSPA+ systems. It achieves a good balance between system throughput maximization and BLER requirement fulfillment. Simulation results show that the FQL-MOMS selection scheme can achieve higher system throughput than the conventional ATS and Q-HARQ schemes, under the BLER requirement guarantee.

Fig. 8. The system throughput versus HSPR.

Fig. 9. The learning process of the FQL-MOMS selection scheme.

ACKNOWLEDGMENTS

The authors would like to give thanks to the editor and the four anonymous reviewers for their kind comments to improve the presentation of the paper. This work was supported by the National Science Council (NSC), Taiwan, under contract number NSC 97-2221-E-009-098-MY3, and the Ministry of Education, Taiwan, under the Aiming for Top University plan.

REFERENCES

[1] “UMTS Physical Layer Procedures (FDD),” technical report 3GPP TR 25.214, Third Generation Partnership Project, May 2008.

[2] E. Dahlman, S. Parkvall, J. Sköld, and P. Beming, 3G Evolution: HSPA and LTE for Mobile Broadband. Elsevier, 2007.

[3] G.J. Foschini, “Layered Space-Time Architecture for Wireless Communication in Fading Environment when Using Multiple Antennas,” Bell Labs Technical J., vol. 1, no. 2, pp. 41-59, 1996.

[4] V. Tarokh, H. Jafarkhani, and A.R. Calderbank, “Space-Time Block Codes from Orthogonal Designs,” IEEE Trans. Information Theory, vol. 45, no. 5, pp. 1456-1467, July 1999.

[5] L. Zheng and D.N.C. Tse, “Diversity and Multiplexing: A Fundamental Trade Off in Multiple-Antenna Channels,” IEEE Trans. Information Theory, vol. 49, no. 5, pp. 1073-1096, May 2003.

[6] M. Wrulich, S. Eder, I. Viering, and M. Rupp, “Efficient Link-to-System Level Model for MIMO HSDPA,” Proc. IEEE GlobeCom, pp. 1-6, Dec. 2008.

[7] C. Mehlführer, S. Caban, M. Wrulich, and M. Rupp, “Joint Throughput Optimized CQI and Precoding Weight Calculation for MIMO HSDPA,” Proc. Signals, Systems and Computers Conf., pp. 1320-1325, Oct. 2008.

[8] J. Li and Y.Q. Zhao, “Resequencing Analysis of Stop-and-Wait ARQ for Parallel Multichannel Communications,” IEEE/ACM Trans. Networking, vol. 17, no. 3, pp. 817-830, June 2009.

[9] L.C. Wang and C.W. Chang, “Gap Processing Time Analysis of Stall Avoidance Schemes for High-Speed Downlink Packet Access with Parallel HARQ Mechanism,” IEEE Trans. Mobile Computing, vol. 5, no. 11, pp. 1591-1605, Nov. 2006.

[10] T. Cheng, “Coding Performance of Hybrid ARQ Schemes,” IEEE Trans. Comm., vol. 54, no. 6, pp. 1017-1029, June 2006.

[11] S. Chen, J. Du, M. Peng, and W. Wang, “Performance Analysis and Improvement of HARQ Techniques in TDD-HSDPA/SA System,” Proc. Sixth Int’l Conf. ITS-Telecomm., pp. 523-526, 2006.

[12] M. Nakamura, Y. Awad, and S. Vadgama, “Adaptive Control of Link Adaptation for High Speed Downlink Packet Access in WCDMA,” Wireless Personal Multimedia Comm., vol. 2, pp. 382-386, 2002.

[13] A. Muller and T. Chen, “Improving HSDPA Link Adaptation by Considering the Age of Channel Quality Feedback Information,” Proc. IEEE Vehicular Technology Conf., pp. 1643-1647, Sept. 2005.

[14] C.J. Chang, C.Y. Chang, and F.C. Ren, “Q-Learning-Based Hybrid ARQ for High Speed Downlink Packet Access in UMTS,” Proc. IEEE Vehicular Technology Conf., pp. 2610-2615, Apr. 2007.

[15] C.J.C.H. Watkins and P. Dayan, “Technical Note: Q-Learning,” Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.

[16] L. Zhao, J.W. Mark, and T.C. Yoon, “A Combined Link Adaptation and Incremental Redundancy Protocol for Enhanced Data Transmission,” Proc. IEEE GlobeCom, vol. 2, pp. 1277-1281, Nov. 2001.

[17] C.T. Lin and C.S.G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice Hall, 1996.

[18] J. Ye, X. Shen, and J.W. Mark, “Call Admission Control in Wideband CDMA Cellular Networks by Using Fuzzy Logic,” IEEE Trans. Mobile Computing, vol. 4, no. 2, pp. 129-141, Mar./Apr. 2005.

[19] H.R. Berenji, “Fuzzy Q-Learning: A New Approach for Fuzzy Dynamic Programming,” Proc. IEEE Int’l Conf. Fuzzy Systems, pp. 486-491, 1994.

[20] T. Horiuchi, A. Fujino, O. Katai, and T. Sawaragi, “Fuzzy Interpolation-Based Q-Learning with Profit Sharing Plan Scheme,” Proc. IEEE Int’l Conf. Fuzzy Systems, pp. 1707-1712, 1997.

[21] P.Y. Glorennec and J. Jouffe, “Fuzzy Q-Learning,” Proc. IEEE Int’l Conf. Fuzzy Systems, pp. 659-662, 1997.

[22] L. Jouffe, “Fuzzy Inference System Learning by Reinforcement Methods,” IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Rev., vol. 28, no. 3, pp. 338-355, Aug. 1998.

[23] C.F. Juang and C.M. Lu, “Ant Colony Optimization Incorporated with Fuzzy Q-Learning for Reinforcement Fuzzy Control,” IEEE Trans. Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 39, no. 3, pp. 597-608, May 2009.

[24] C.F. Juang, “Combination of Online Clustering and Q-Value Based GA for Reinforcement Fuzzy System Design,” IEEE Trans. Fuzzy Systems, vol. 13, no. 3, pp. 289-302, June 2005.

[25] Y.H. Chen, C.J. Chang, and C.Y. Huang, “Fuzzy Q-Learning Admission Control for WCDMA/WLAN Heterogeneous Networks with Multimedia Traffic,” IEEE Trans. Mobile Computing, vol. 8, no. 11, pp. 1469-1479, Nov. 2009.

[26] C.Y. Huang, W.C. Chung, C.J. Chang, and F.C. Ren, “Fuzzy Q-Learning-Based Hybrid ARQ for High Speed Downlink Packet Access,” Proc. IEEE Vehicular Technology Conf., pp. 1-4, Sept. 2009.

[27] “Physical Layer Aspects of UTRA High Speed Downlink Packet Access (Release 4),” technical report 3GPP TR 25.848, Third Generation Partnership Project, Mar. 2001.

[28] R. Bellman, Dynamic Programming. Princeton Univ., 1957.

[29] G.L. Stuber, Principle of Mobile Communication. Kluwer Academic Publisher, 2001.

[30] M. Gudmundson, “Correlation Model for Shadow Fading in Mobile Radio Systems,” IEE Electronics Letters, vol. 27, no. 23, pp. 2145-2146, Nov. 1991.

Wen-Ching Chung received the BE and PhD degrees in electrical control engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1999 and 2006, respectively. In January 2008, he joined the Department of Electrical Engineering of National Chiao Tung University in Taiwan as a postdoctor. His research interests are in the areas of radio resources management for wireless communication networks, link adaptation for broadband networks, and intelligent control systems. He is a member of the IEEE.

Chung-Ju Chang received the BE and ME degrees in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1972 and 1976, respectively, and the PhD degree in electrical engineering from National Taiwan University in 1985. From 1976 to 1988, he was with Telecommunication Laboratories, Directorate General of Telecommunications, Ministry of Communications, Taiwan, as a design engineer, supervisor, project manager, and then division director. He also acted as a science and technical advisor for the Minister of the Ministry of Communications from 1987 to 1989. In 1988, he joined the Faculty of the Department of Communication Engineering, College of Electrical Engineering and Computer Science, National Chiao Tung University, as an associate professor. He has been a professor since 1993 and a chair professor since 2009. He was the director of the Institute of Communication Engineering from August 1993 to July 1995, chairman of the Department of Communication Engineering from August 1999 to July 2001, and dean of the Research and Development Office from August 2002 to July 2004. Also, he was an advisor for the Ministry of Education to promote the education of communication science and technologies for colleges and universities in Taiwan during 1995-1999. He is acting as a committee member of the Telecommunication Deliberate Body, Taiwan. Moreover, he once served as editor for IEEE Communications Magazine and associate editor for the IEEE Transactions on Vehicular Technology. His research interests include performance evaluation, radio resources management for wireless communication networks, and traffic control for broadband networks. He is a member of the Chinese Institute of Engineers (CIE) and the Chinese Institute of Electrical Engineers (CIEE). He is a fellow of the IEEE.


Kai-Ten Feng received the BS degree from National Taiwan University, Taipei, in 1992, the MS degree from the University of Michigan, Ann Arbor, in 1996, and the PhD degree from the University of California, Berkeley, in 2000. Since August 2007, he has been with the Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan, as an associate professor. He was an assistant professor with the same department between February 2003 and July 2007. He was with OnStar Corp., a subsidiary of General Motors Corporation, as an in-vehicle development manager/senior technologist between 2000 and 2003, working on the design of future Telematics platforms and in-vehicle networks. His current research interests include cooperative and cognitive networks, mobile ad hoc and sensor networks, embedded system design, wireless location technologies, and Intelligent Transportation Systems (ITSs). He received the Best Paper Award from the IEEE Vehicular Technology Conference in Spring 2006, which ranked his paper first among the 615 accepted papers. He was also the recipient of the Outstanding Young Electrical Engineer Award in 2007 from the Chinese Institute of Electrical Engineering (CIEE). He has served on the technical program committees of VTC, ICC, and APWCS. He is a member of the IEEE.

Ying-Yu Chen received the BE degree in electrical engineering from National Chung Hsing University in 2007 and the ME degree in electrical engineering from National Chiao Tung University, Hsinchu, Taiwan, in 2009. Her research interests include radio resource management and link adaptation for wireless communication networks.
