Chapter 2 System Model
2.3 Channel Model
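The channel condition in WCDMA at time t, denoted by F(t), is modeled mainly by long-term fading and short-term fading:
$$F(t) = \xi(r) \cdot 10^{\eta/10} \cdot \zeta(t)$$
where $\xi(r)$ is the path loss, $10^{\eta/10}$ is the shadowing, and $\zeta(t)$ is the short-term fading.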
Since the processes ζ(t) are mutually independent, the model can produce up to M independent short-term fading processes. Hence we choose M equal to the total number of links in all cells of the system.
The shadowing effect for a user changes as the user moves. In practice, however, the change in shadowing between two sampling times is small, since the interval between two sampling times in HARQ is very short compared with the motion of the user. Hence the shadowing values at these sampling points are highly correlated. We model the correlated shadow fading by a normalized autocorrelation function $\rho(\Delta x)$, where $\Delta x$ is the distance the user moves between two adjacent TTIs. The value $\rho(\Delta x)$ can be obtained by
$$\rho(\Delta x) = \exp\left(-\frac{|\Delta x|}{d_{cor}} \ln 2\right)$$
where $d_{cor}$ is the decorrelation length.
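A minimal sketch of how correlated shadowing samples could be generated from one TTI to the next, assuming the exponential autocorrelation ρ(Δx) = exp(−(|Δx|/dcor) ln 2) that is commonly paired with a decorrelation length. The function names, the 2 ms TTI, and the first-order autoregressive update are illustrative assumptions rather than the simulator actually used; the 20 m decorrelation length and the 8.0 standard deviation are taken from Table 4-1.

```python
import math
import random

def shadow_correlation(delta_x_m, d_cor_m=20.0):
    """Assumed exponential autocorrelation; equals 0.5 when delta_x == d_cor."""
    return math.exp(-abs(delta_x_m) / d_cor_m * math.log(2.0))

def next_shadowing_db(prev_db, speed_kmh, sigma_db=8.0, tti_s=0.002, rng=random):
    """First-order autoregressive update of the shadowing value (in dB)."""
    delta_x = speed_kmh / 3.6 * tti_s            # distance moved in one TTI (m)
    rho = shadow_correlation(delta_x)
    innovation = rng.gauss(0.0, sigma_db)        # fresh Gaussian component
    # Keeps the long-run standard deviation at sigma_db for any rho in [0, 1].
    return rho * prev_db + math.sqrt(1.0 - rho * rho) * innovation
```

At 60 km/hr a user moves only about 3 cm per 2 ms TTI, so under these assumptions the correlation between adjacent samples stays above 0.99, which matches the statement that adjacent sampling points are highly correlated.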
Chapter 3 Fuzzy Q-learning based H-ARQ Scheme
Conventional H-ARQ procedures try to select a proper modulation and coding scheme (MCS) for transmission according to the CQI value reported by the UE, but they cannot adapt the MCS to the channel condition precisely. The reason is that the range of CQI values is limited and cannot represent every possible channel condition precisely. We therefore seek an advanced scheme in which the adaptation between the channel quality and the MCS is more precise. As shown in Figure 3.1, we consider a learning system that interacts with its environment so that the rate adjustment accommodates the full range of channel conditions. We take the practical channel condition as the system state and the MCS adopted by the Node B as the action. The objective is then to find an appropriate cost function that tells the Node B whether the previous action was suitable.
To implement the advanced H-ARQ scheme with learning ability, we first define the system state and the action. The signal to interference and noise ratio (SINR) is the most intuitive parameter for estimating the channel condition, and the block error rate (BLER) is another important parameter that must be guaranteed. Therefore we choose the measurements of these parameters as the system state. On the other hand, the action is naturally defined as the MCS, since it is the major factor influencing the transmission rate.
The last parameter we should define is the cost function. The cost function needs to reflect whether the applied action is suitable or not. We define it explicitly in a later section.
Before designing the FQL-based H-ARQ scheme, we briefly explain the concept of the fuzzy Q-learning (FQL) algorithm.
Figure 3.1 : Block diagram of a learning system
3.1 Fuzzy Q-learning Algorithm
The fuzzy Q-learning algorithm combines fuzzy theory with the learning ability of the Q-learning algorithm to help the system match the adopted actions to the channel states.
Since the FQL is based on Q-learning, their learning processes are very similar. The major difference between the two algorithms is that in the FQL the system state is described by a fuzzy system. We describe the FQL in detail in the following.
Owing to the property of the H-ARQ process, the input state information can be regarded as a Markov decision process (MDP). We denote S as the set of state vectors described by the fuzzy inference system (FIS), $S = \{S_j,\ j = 1, 2, \ldots, N\}$. Each state vector $S_j$ is constituted by M fuzzy linguistic variables to describe the system. The set of candidate output actions is denoted as A, $A = \{A_k,\ k = 1, 2, \ldots, K\}$. The input state vector received in the ith time period is denoted as $x_i$. Then the rule representation of FQL for state $S_j$ is in the following form:
if $x_i$ is $S_j$, then $A_k$ with $q(S_j, A_k)$, $1 \le j \le N$ and $1 \le k \le K$.
Every fuzzy rule has to choose an action from the action candidate set A by a selection policy.
In the FQL, the action selection policy is called the exploration/exploitation policy (EEP). The EEP we adopt for each fuzzy rule is the select-max strategy.
As to the defuzzification of the N fuzzy rules, the global inferred action $a(x_i)$ for the received state information $x_i$ is computed by (3-2) from the membership values of the fuzzy rules and the actions $a_j$, where $a_j$ is the action selected by the EEP of the jth fuzzy rule. Also, the Q-value $Q_i(x_i, a(x_i))$ for the state-action pair $(x_i, a(x_i))$ in the ith time period is computed by (3-3).
According to the Q-learning rule [11], the Q-values in the FQL are updated using (3-4), (3-5) and (3-6).
The procedure of the FQL algorithm is summarized in Figure 3.2. The system repeats Steps 2~7 iteratively whenever it receives state information in each time period. That means the FQL algorithm is implemented frame by frame.
Step 1: Initialize Q-values for all fuzzy rules
Step 2: Receive the state information vector xi
Step 3: For each fuzzy rule, choose the rule consequence aj by the EEP
Step 4: Compute the global consequence a(xi) and its corresponding Q-value Qi(xi, a(xi)) using (3-2) and (3-3)
Step 5: Apply the action a(xi) to system
Step 6: Receive the reinforcement r(xi, a(xi))
Step 7: Update qi(Sj, aj) for all j using (3-4), (3-5) and (3-6)
Figure 3.2 : Procedure of FQL algorithm
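The steps of Figure 3.2 can be sketched in code as follows. This is only an illustration under stated assumptions. The global action and its Q-value are taken as membership-weighted averages, and the update is the usual temporal-difference rule shared among the rules by membership, as is common in the fuzzy Q-learning literature; these are not necessarily the exact forms of (3-2) to (3-6). The class and method names are hypothetical, actions are represented by their indices, and the reinforcement is treated here as a reward to be maximised, so a cost-type signal would need its sign flipped. The default learning rate and discount factor are the values listed in Table 4-1.

```python
class FuzzyQLearner:
    """Sketch of FQL with N fuzzy rules and K candidate actions (indices 0..K-1)."""

    def __init__(self, n_rules, n_actions, eta=0.2, gamma=0.9):
        # Step 1: initialise the Q-value of every (rule, action) pair.
        self.q = [[0.0] * n_actions for _ in range(n_rules)]
        self.eta = eta      # learning rate
        self.gamma = gamma  # discount factor

    def select(self):
        # Step 3: select-max policy, one action index per rule.
        return [max(range(len(row)), key=row.__getitem__) for row in self.q]

    def infer(self, alpha, chosen):
        # Step 4: membership-weighted global action and its Q-value (assumed form).
        total = sum(alpha)
        action = sum(a * k for a, k in zip(alpha, chosen)) / total
        q_val = sum(a * self.q[j][k]
                    for j, (a, k) in enumerate(zip(alpha, chosen))) / total
        return action, q_val

    def update(self, alpha, chosen, q_val, reward, next_alpha):
        # Step 7: temporal-difference error shared among rules by membership.
        best_next = (sum(a * max(row) for a, row in zip(next_alpha, self.q))
                     / sum(next_alpha))
        delta = reward + self.gamma * best_next - q_val
        total = sum(alpha)
        for j, (a, k) in enumerate(zip(alpha, chosen)):
            self.q[j][k] += self.eta * delta * a / total
```

Here `alpha` holds the membership value of each rule for the current state and `next_alpha` those for the next state; Steps 2, 5 and 6 (receiving the state, applying the action, receiving the reinforcement) belong to the surrounding system and are left to the caller.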
3.2 Design of FQL based H-ARQ
In this paper we incorporate the fuzzy Q-learning algorithm into the design of the H-ARQ scheme. The FQL combines the benefits of a fuzzy system and reinforcement learning. The fuzzy system provides a good function approximation for the FQL, and a priori knowledge can be easily applied to the system design. By applying the FQL, the system can operate with uncertain information. From the measured state information in each TTI, the H-ARQ mechanism should select a suitable modulation and coding scheme to transmit the information data. In the design, the state information means the channel quality indicator (CQI) and the block error rate (BLER). Once a decision has been made, we know how many information bits will be transmitted in this TTI. The output signal of the H-ARQ scheme can therefore change the modulation and coding scheme and adjust the number of information bits transmitted in each TTI.
The H-ARQ scheme with FQL is shown in Figure 3.3. The H-ARQ segments arriving packets according to the modulation order and coding rate that the FQL-based H-ARQ determines. The two-stage rate matching matches the physical layer transmission rate to the determined action and buffers the retransmission data packet for IR requests. The CRC attachment and the interleaving technique are used for error detection and for protection against errors. Turbo coding with a minimum code rate of 1/3 is employed in this paper.
Figure 3.3 : The FQL-based H-ARQ for packet transmission in HS-DSCH
Figure 3.4 shows the block diagram of the FQL-based H-ARQ scheme. The FIS describes the system state using the channel information received from the HS-DPCCH. The EEP block decides the action of each fuzzy rule according to the Q-values of the fuzzy rules. Using the membership values of all fuzzy rules, the inferred action is output to the controller block, which translates it into the control signal for the H-ARQ procedure. After the global inferred action is applied, a reinforcement signal indicates whether the inferred action was suitable for the previous system state. The Q-learning block then updates the Q-values of the fuzzy rules with the reinforcement value.
The state information in Figure 3.4 is the state vector mentioned in the description of the FQL. In our proposed scheme, we choose the signal to interference and noise ratio (SINR) and the ratio of the block error rate (BLER) to its requirement value as the input parameters describing the system state. Instead of the CQI value, we use the measured SINR value as the input of the scheme to achieve a more accurate calculation. We define $SINR_i$ as the measured value of SINR at the beginning of a TTI, and $R_i$ as
$$R_i = \frac{BLER_i}{BLER_{req}}, \qquad (3\text{-}7)$$
where $BLER_i$ is the measured BLER and $BLER_{req}$ is its requirement value. The value $R_i$ clearly represents the degree of disparity between the actual BLER and the guaranteed value.
Figure 3.4 : Block diagram of the FQL-based H-ARQ scheme
We denote the linguistic variables of SINR and the ratio of BLER as LSINR and LR, respectively. Then the input state information set xi is defined as
$$x_i = \{L_{SINR}^{i},\ L_{R}^{i}\}, \qquad (3\text{-}8)$$
where the index i denotes the ith TTI. We use 9 terms for LSINR and 2 terms for LR. We define their fuzzy term sets as T(LSINR) = {Extremely High, Very High, High, Slightly High, Medium, Slightly Low, Low, Very Low, Extremely Low} = {EH, VH, H, SH, M, SL, L, VL, EL}, and T(LR) = {Medium, Small} = {M, S}. From fuzzy set theory, the fuzzy rule base has dimensions |T(LSINR)| × |T(LR)| = 9 × 2, so the total number of fuzzy rules is 18. The fuzzy term set of LSINR is chosen according to the desired SINR values of all action candidates under the guaranteed BLER condition.
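As a small illustration of how the rule base is formed, the 18 fuzzy rules can be enumerated as the Cartesian product of the two term sets. The variable names are hypothetical, and the table layout (one zero-initialised Q-value per rule and action candidate, with the 9 action candidates used in this scheme) is an assumed data structure rather than the implementation used in the simulations.

```python
from itertools import product

SINR_TERMS = ["EH", "VH", "H", "SH", "M", "SL", "L", "VL", "EL"]  # T(LSINR)
R_TERMS = ["M", "S"]                                              # T(LR)
N_ACTIONS = 9  # number of action candidates per fuzzy rule

# One fuzzy rule per (SINR term, BLER-ratio term) pair.
RULES = list(product(SINR_TERMS, R_TERMS))

# Assumed layout: Q-values indexed first by rule, then by action index.
Q_TABLE = {rule: [0.0] * N_ACTIONS for rule in RULES}
```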
On the other hand, the output action of the FQL-based H-ARQ scheme is the modulation order and coding rate. From the combinations of the two modulation orders (QPSK and 16QAM) and five coding rates $\{1/3, 1/2, 2/3, 3/4, 1\}$, we obtain 9 action candidates for each fuzzy rule, as shown in Table 2-1. Since each modulation order with a coding rate corresponds to a specific peak data rate, Table 2-1 lists the modulation and coding schemes that can be adopted for each state together with their corresponding peak data rates. The fuzzy term set of the output action is equal to the action candidate set A, $A = \{A_k,\ k = 1, 2, \ldots, 9\}$, where the $A_k$ are ordered as in Table 2-1 from the lowest peak data rate to the highest.
The reinforcement signal should be an indicator of the system performance or of related quantities. Since the objective is to keep the BLER at or below the requirement value, the reinforcement signal should be defined to reflect this condition. Based on this concept, we define the reinforcement function in terms of the following quantities.
Here $SINR(x_i)$ is the measured SINR value of state $x_i$. The notation $SINR_{A_k}$ is the theoretical SINR value when applying action $A_k$ under the requirement BLER, while the notation $SINR_{a(x_i)}^{3 \times BLER}$ is the theoretical SINR value when applying the global action $a(x_i)$ at a BLER equal to three times the requirement value 0.1. The physical meaning of the reinforcement function is to favor the actions that reach the requirement BLER under different channel conditions. We define two boundary conditions to speed up the learning process, since we already have prior knowledge of the system. If the BLER is smaller than the requirement value, we can venture to take an action with a higher code rate to increase the system throughput. Under this condition, we design the reinforcement function to choose, as far as possible, the action whose theoretical BLER under the measured SINR value is three times the requirement value.
Notice that we use the normalized SINR value instead of the normalized BLER value in this case because the theoretical BLER is not linear in SINR: if the SINR is higher than a certain value, the normalized BLER values of different actions become indistinguishable. However, the normalized BLER value is more effective than the normalized SINR value for guaranteeing the requirement value, so we still adopt it as the reinforcement in the other cases.
Hence we define the action with the lowest reinforcement value as the most suitable one. In other words, the system eventually applies actions adaptively according to the channel conditions, satisfying the QoS while keeping the throughput as high as it can.
The value $BLER(SINR(x_i), a(x_i))$ is obtained as follows. We assume that a block is regarded as a failed block as long as it contains one bit error. The theoretical uncoded bit error rate $P_{e,uncoded}$ is calculated from the erfc-based expression in [2] as a function of the SINR. To model the effect of the error correcting code, we assume that the relation between the code rate and SINR is linear. In other words, the performance of code rate 0.5 is equal to the uncoded performance at twice the SINR value. Hence we can find the BLER values of transmissions using different modulation and coding schemes under any measured SINR value.
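The calculation described above can be sketched as follows. The bit error rate expression 0.5·erfc(√SINR) is an assumption standing in for the expression of [2] (it is the familiar QPSK-type form), and the function names and the block length argument are hypothetical. The two modelling assumptions from the text are kept: the code rate scales the SINR linearly, and a block fails if at least one of its bits is in error.

```python
import math

def uncoded_ber(sinr_linear):
    """Assumed uncoded bit error rate: 0.5 * erfc(sqrt(SINR))."""
    return 0.5 * math.erfc(math.sqrt(sinr_linear))

def theoretical_bler(sinr_db, code_rate, block_bits):
    """BLER of a block of `block_bits` bits at the measured SINR (in dB)."""
    sinr = 10.0 ** (sinr_db / 10.0)
    # Linear coding model: rate 0.5 behaves like uncoded at twice the SINR.
    effective_sinr = sinr / code_rate
    p = uncoded_ber(effective_sinr)
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(block_bits * math.log1p(-p))
```

Under these assumptions a lower code rate always gives a lower BLER at the same measured SINR, which is the property the action selection relies on.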
After receiving the system state information, the EEP decides the output actions for all fuzzy rules based on the Q-values. Since each fuzzy rule represents a specific system state, the Q-values of all fuzzy rules represent the degree of suitability of the adopted actions for the corresponding system states. After an action is applied, the reinforcement signal is fed back and used to update the Q-values of the related fuzzy rules. The Q-value of an unsuitable action is then decreased rapidly by the reinforcement signal. Eventually the most suitable action of a fuzzy rule has the largest Q-value and is always chosen by the EEP in that fuzzy rule.
3.3 Membership Function Definition
Finally, we define the membership functions of all fuzzy inputs. To illustrate the membership functions, we first define a function $f(\cdot)$, whose definition is shown in Figure 3.5.
Figure 3.5 : Definitions for function f(.)
For simplicity, we set the membership functions of the SINR input as triangular functions. Note that we build the input SINR levels according to the actions we can adopt. We define a new notation $SINR_{A_k}^{0.5}$ to denote the theoretical SINR value at which action $A_k$ reaches a BLER of 0.5. We obtain these values for all actions and assign them to the corresponding linguistic levels.
Hence $SINR_{A_k}$ (or $SINR_{A_k}^{0.5}$) is defined as the desired SINR value at which action $A_k$ reaches a BLER equal to the requirement value (or 0.5). With the prior knowledge of the relation between the action candidates and the channel conditions, we set the theoretical SINR values at which the actions reach the specific BLER value 0.5 as the left boundaries of the SINR levels. The reason is that an action should be adopted only if no other action is more suitable under the given channel condition. Since each SINR level has a best suitable action corresponding to it, it is intuitive that if the input SINR is very close to the desired SINR value of this action, the matching degree of this level should be the highest.
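A triangular membership function of this kind can be sketched as below. The placement of the left foot at the SINR giving a BLER of 0.5 and of the peak at the desired SINR of the action follows the description above; the right foot is not specified in the text and is left as a free parameter here. The function name is hypothetical.

```python
def triangle(x, left, peak, right):
    """Triangular membership: 0 at `left` and `right`, 1 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)   # rising edge
    return (right - x) / (right - peak)     # falling edge
```

For one SINR level, `left` would be the SINR at which the corresponding action reaches a BLER of 0.5 and `peak` the desired SINR of that action, so the matching degree is highest when the measured SINR coincides with the desired value.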
For the other input (the ratio of the measured BLER to the requirement value), we set the membership function as in Figure 3.7. We simply divide it into two regions according to whether or not we expect to adopt a venturesome action. For the “Small” level, we expect to adopt an action with a higher code rate to reach higher throughput. For the “Medium” level, we prefer to adopt an action that helps the system satisfy the QoS requirement. The membership functions of the BLER ratio input are therefore defined in terms of $f(\cdot\,; 0, 0.8, 1)$.
The membership value of a fuzzy rule is determined from the membership values of its two inputs, and the action selected for the rule is regarded as the desired action in that scene. We adopt the center of area (CoA) defuzzification method to calculate the final output action. For simplicity, we set the membership function of the output as an impulse function with unit length.
For the procedure of defuzzification, we adopt the max-min inference method to get the truth value of each corresponding fuzzy rule. Through the CoA method, a crisp value is produced to decide which action will be applied. Since this crisp value usually does not correspond to any value of an action candidate, a quantized value is output to apply an action.
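A minimal sketch of this defuzzification step, assuming that the truth value of a rule is the minimum of its two input memberships (max-min inference), that actions are represented by their indices 1 to 9, and that the crisp CoA output is quantized by rounding to the nearest index. The function names and the rounding rule are illustrative assumptions.

```python
import math

def rule_truth(mu_sinr, mu_r):
    """Truth value of one rule under max-min inference (assumed: minimum)."""
    return min(mu_sinr, mu_r)

def defuzzify(truths, chosen_actions, n_actions=9):
    """Center-of-area output over impulse-shaped action memberships,
    quantized to the nearest action index in 1..n_actions."""
    total = sum(truths)
    if total == 0.0:
        raise ValueError("no fuzzy rule is active for this input")
    crisp = sum(t * a for t, a in zip(truths, chosen_actions)) / total
    index = math.floor(crisp + 0.5)            # round half up
    return min(max(index, 1), n_actions)
```

For example, two active rules with truth values 0.2 and 0.8 that selected actions 3 and 4 give a crisp value of 3.8, which is quantized to action 4.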
Figure 3.6 : The membership function of the SINR input
Figure 3.7 : The membership function of the BLER ratio input
Chapter 4 Simulation Results
4.1 System Environment and Parameters
In our simulation, we consider a hexagonal-grid cell structure with 19 base stations (BS) in the multi-cell system. For an HSDPA user, we assume that the HS-DSCH is allocated up to 80% of the total transmission power. The power received from other cells is treated as interference to the users in the serving cell.
For an HSDPA link, we consider the long-term fading and the short-term fading. The long-term fading consists of two components, the shadowing effect and the path loss. Since the mobility of the users is considered in our system, we model the time correlation of the shadowing on the same HSDPA link to describe the real environment. For the short-term fading, we adopt the Jakes model. All simulation parameters are listed in Table 4-1.
The total of the propagation delay and the processing delay is assumed to be 6 ms. It implies that the Node B requires two TTIs to receive an ACK/NACK from the UE after two additional TTIs. To reduce the simulation complexity and reach a higher data rate, we assume that the users always have data to transmit.
Table 4-1 : Simulation parameters
Parameter | Assumption
Cellular layout | Hexagonal grid, 19 sites, 2000 m cell radius
Path loss model ξ(r) | 128.1 + 37.6 log10(r), r is the base station separation in kilometers
Decorrelation length (dcor) | 20 m
σL | 8.0
Mobility assignment | 0, 20, 40, 60 km/hr, random distribution
Carrier frequency | 2.0 GHz
Number of UE in one cell | 4, random distribution
Discount factor (γ) | 0.9
Learning rate (η) | 0.2
BS total Tx power | Up to 44 dBm
Power for HSDPA data transmission | Maximum of 80% of total maximum available transmission power
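To make two of the entries in Table 4-1 concrete, the short sketch below evaluates the path loss model and converts the power budget to watts. The function names are hypothetical and the code only restates the table arithmetic; it is not part of the simulator.

```python
import math

def path_loss_db(r_km):
    """Path loss in dB from Table 4-1: 128.1 + 37.6 * log10(r), r in km."""
    return 128.1 + 37.6 * math.log10(r_km)

def hsdpa_power_w(total_dbm=44.0, share=0.8):
    """Maximum HSDPA transmission power in watts for a given share of the total."""
    total_w = 10.0 ** ((total_dbm - 30.0) / 10.0)   # dBm to watts
    return share * total_w
```

With these values, the path loss at r = 1 km is 128.1 dB, and 80% of the 44 dBm total (about 25.1 W) leaves roughly 20.1 W for HSDPA data transmission.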
4.2 Conventional Schemes
In the simulation, we compare the proposed scheme with conventional schemes. The proposed scheme adjusts the initial transmission rate adaptively according to the received channel information. After the training process, the H-ARQ scheme can adopt a suitable action to satisfy the QoS requirement and keep high throughput. The natural conventional schemes for comparison are the incremental redundancy (IR) scheme and its advanced variants proposed in [17]. Reference [17] proposes two advanced schemes that combine link adaptation (LA) with the IR protocol, and shows that they effectively reduce the number of transmissions. This implies that the two schemes can adaptively adjust the initial transmission rate to avoid retransmissions. We also compare our proposed scheme with the Q-learning based scheme [12], since both are learning schemes.
The two advanced IR schemes are described as follows:
LA_IR_1: The code rate of the initial transmission is the one-step incremental version of the most recent code rate when a successful decoding occurs without an IR request. This scheme tries to make the initial code rate track the variation of the channel condition. It makes the initial code rate always retreat one version and thereby maintains the throughput of the conventional IR.
LA_IR_2: The current decoded code rate is kept as the next initial code rate whenever a successful decoding occurs. If there is no IR request for two consecutive transmissions, the two-step incremental version of the current code rate is taken as the next initial code rate.
4.3 Performance Evaluation and Discussions
To show the simulation result, we define a parameter called signal to other services’ interference ratio (SIOR) to describe the ratio of the transmitting power for