This section considers a multi-cell WCDMA system containing N cells, where each cell has a base station that uses FQ-SDAM to allocate the radio resource to the real-time and non-real-time terminals within its coverage area. An uplink supporting slotted transmission is adopted. All terminals transmit in the same frequency band and are distinguished by their own spreading codes. Each terminal holds two channels, the dedicated physical data channel (DPDCH) and the dedicated physical control channel (DPCCH). The DPDCH carries data generated by the layer-2 protocol, while the DPCCH carries control information. Each channel has a frame-based structure: a frame of length Tf = 10 ms is divided into 15 slots of length Tslot = 2560 chips, and each slot corresponds to one power control period. Hence, the power control frequency is 15/(10 ms) = 1500 Hz. The spreading factor (SF) of the DPDCH can vary from 4 to 256 as SF = 256/2^k, k = 0, 1, · · · , 6, carrying 10 × 2^k bits per slot, while the SF of the DPCCH is fixed at 256, carrying 10 bits per slot.
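The frame arithmetic above can be checked with a short sketch. It uses only the numbers stated in the text (15 slots per 10 ms frame, 2560 chips per slot, SF = 256/2^k); the function name `dpdch_config` is hypothetical. Bits per slot times SF always equals the 2560 chips of a slot, which is why halving the SF doubles the bits carried.

```python
FRAME_MS = 10            # frame length Tf in milliseconds
SLOTS_PER_FRAME = 15     # slots per frame
CHIPS_PER_SLOT = 2560    # Tslot in chips

# One power-control period per slot: 15 slots / 10 ms = 1500 Hz.
power_control_hz = 1000 * SLOTS_PER_FRAME / FRAME_MS


def dpdch_config(k: int) -> tuple[int, int, float]:
    """Return (SF, bits per slot, raw rate in kbit/s) for DPDCH index k = 0..6."""
    if not 0 <= k <= 6:
        raise ValueError("k must lie in 0..6")
    sf = 256 // 2 ** k
    bits_per_slot = 10 * 2 ** k
    rate_kbps = bits_per_slot * power_control_hz / 1000   # 1500 slots per second
    return sf, bits_per_slot, rate_kbps


for k in range(7):
    sf, bits, rate = dpdch_config(k)
    assert sf * bits == CHIPS_PER_SLOT   # one bit per SF chips fills the slot
    print(f"k={k}: SF={sf:3d}, {bits:3d} bits/slot, {rate:6.1f} kbit/s")
```

The printed rates are raw channel bit rates (bits per slot times 1500 slots/s), from 15 kbit/s at SF = 256 up to 960 kbit/s at SF = 4.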
Two types of traffic are considered: real-time (type-1) traffic and non-real-time (type-2) traffic. The system provides continuous transmission for real-time traffic and bursty transmission for non-real-time traffic. Here, a real-time terminal is a terminal supporting real-time services, and a non-real-time terminal is a terminal supporting non-real-time services. Real-time terminals may transmit at any possible data rate whenever necessary; by contrast, the transmission of non-real-time terminals is controlled by the data access manager at the base station. Considering each terminal's link gain and the interference power received from both the home and adjacent cells, the data access manager assigns an appropriate data rate to each non-real-time terminal. For bursty transmission, the available data rates are 1X, 2X, 4X and 8X, where 1X is called the basic rate. A strength-based power control scheme is assumed, so that the required transmission power of a mobile is directly proportional to its transmission rate [18]. Additionally, the overall capacity is set by the upper bound of the total received interference power, and the residual capacity is defined as the allowable received interference power from the non-real-time terminals.
The link gain between terminal i and base station j, denoted by $h_{ij}$, is usually determined by the long-term fading $F^{L}_{ij}$ and the short-term fading $F^{S}_{ij}$ [19], as given by
$$h_{ij} = F^{L}_{ij} \times F^{S}_{ij}. \tag{1}$$
The long-term fading $F^{L}_{ij}$, which combines path loss and shadowing, is modelled as
$$F^{L}_{ij} = k \times r^{-\alpha} \times 10^{\eta/10}, \tag{2}$$
where k is a constant, r is the distance from mobile i to base station j, α is the path-loss exponent, usually lying between 2 and 5 in a mobile environment (α = 4 here), and η is a normally distributed random variable (in dB) with zero mean and variance $\sigma_L^2$. The parameter $\sigma_L$ depends on the terrain configuration and ranges from 5 to 12 ($\sigma_L^2 = 10$ here) [19]. The short-term fading $F^{S}_{ij}$ is mainly caused by multipath reflections and is modelled by a Rayleigh distribution.
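A minimal sampling sketch of (1) and (2) follows. It assumes that `k = 1` (hypothetical normalization), that `sigma_l_db` is the standard deviation of η in dB, and that the Rayleigh short-term fading enters $h_{ij}$ as a power gain, i.e. the squared Rayleigh amplitude with unit mean (exponentially distributed). The function names are illustrative, not the authors' simulator.

```python
import math
import random


def long_term_fading(r, eta_db, k=1.0, alpha=4.0):
    """Eq. (2): F_L = k * r**(-alpha) * 10**(eta/10), with eta in dB."""
    return k * r ** (-alpha) * 10 ** (eta_db / 10)


def short_term_fading(rng):
    """Assumed form of F_S: squared Rayleigh amplitude with unit mean power."""
    # Real and imaginary parts ~ N(0, 1/2) give a Rayleigh-distributed |a|
    # with E[|a|^2] = 1, so |a|^2 is exponentially distributed with mean 1.
    re = rng.gauss(0.0, math.sqrt(0.5))
    im = rng.gauss(0.0, math.sqrt(0.5))
    return re * re + im * im


def sample_link_gain(r, sigma_l_db, rng, k=1.0, alpha=4.0):
    """Eq. (1): h_ij = F_L * F_S for one random draw of shadowing and fading."""
    eta_db = rng.gauss(0.0, sigma_l_db)   # zero-mean normal shadowing in dB
    return long_term_fading(r, eta_db, k, alpha) * short_term_fading(rng)


rng = random.Random(42)
gains = [sample_link_gain(500.0, 8.0, rng) for _ in range(5)]
```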
The real-time service is modelled as an ON-OFF Markov process with transition rate µ from ON to OFF and λ from OFF to ON. The non-real-time service is modelled as a batch Poisson process, in which data bursts arrive according to a Poisson process and the data length of each burst is geometrically distributed. The packet error probability, denoted by Pe, is used as the system performance index, and the maximum tolerable packet error probability, denoted by Pe∗, is defined as the system QoS requirement.
Additionally, the measure of packet transmission delay is used as a parameter for the data rate scheduler.
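The two source models can be sketched as below, assuming a continuous-time ON-OFF chain (exponential sojourns with rates µ and λ) and bursts whose length in packets is geometric on {1, 2, ...}. The burst arrival rate, the geometric parameter `p` and all function names are hypothetical; the paper's actual parameters are in Table I.

```python
import math
import random


def on_probability(lam, mu):
    """Long-run fraction of time ON (OFF->ON rate lam, ON->OFF rate mu)."""
    return lam / (lam + mu)


def simulate_on_fraction(lam, mu, horizon, rng):
    """Simulate the ON-OFF Markov source from OFF; return the time fraction spent ON."""
    t, on, on_time = 0.0, False, 0.0
    while t < horizon:
        stay = min(rng.expovariate(mu if on else lam), horizon - t)  # exponential sojourn
        if on:
            on_time += stay
        t += stay
        on = not on
    return on_time / horizon


def geometric_length(p, rng):
    """Burst length with P(L = n) = (1 - p)**(n - 1) * p for 0 < p < 1."""
    u = 1.0 - rng.random()                       # u in (0, 1], avoids log(0)
    return 1 + int(math.log(u) / math.log(1.0 - p))


def batch_poisson_arrivals(rate, p, horizon, rng):
    """(arrival time, burst length) pairs: Poisson burst arrivals, geometric lengths."""
    arrivals, t = [], rng.expovariate(rate)
    while t < horizon:
        arrivals.append((t, geometric_length(p, rng)))
        t += rng.expovariate(rate)
    return arrivals
```

With λ = 1 and µ = 3, for instance, the simulated ON fraction approaches λ/(λ + µ) = 0.25, and bursts drawn with p = 0.5 average two packets.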
III. DESIGN OF FQ-SDAM
The FQ-SDAM contains two functional blocks: a fuzzy Q-learning-based residual capacity estimator (FQ-RCE) and a data rate scheduler (DRS). The FQ-RCE estimates the residual interference power budget, and the DRS then allocates this budget to the non-real-time terminals. The following subsections describe fuzzy Q-learning and the detailed design of the two functional blocks.
A. The Fuzzy Q-Learning (FQL)
Denote by S the set of system state vectors, $S = \{S_i,\ i = 1, 2, \ldots, M\}$, where each state vector $S_i$ comprises L fuzzy linguistic variables selected to describe the system. Denote by A the set of actions that may be chosen in the system states, $A = \{A_j,\ j = 1, 2, \ldots, N\}$. For an input state vector x containing the L linguistic variables, the FQL rule for state $S_i$ takes the form
if x is $S_i$, then $A_j$ with $q[S_i, A_j]$, $1 \le i \le M$ and $1 \le j \le N$,
where $A_j$ is the jth action candidate that may be chosen in state $S_i$, and $q[S_i, A_j]$ is the Q-value of the state-action pair $(S_i, A_j)$. The number of state-action pairs for each state $S_i$ equals the number of elements in the action set; i.e., each antecedent has N possible consequences.
Each fuzzy rule must choose an action $a_i$ from the action candidate set A according to an action selection policy. In FQL, the action selection policy of each fuzzy rule may be select-max or another exploration strategy. Defuzzifying the M fuzzy rules, the inferred action $a(x)$ for the input vector x is expressed as
$$a(x) = \frac{\sum_{i=1}^{M} \alpha_i \, a_i}{\sum_{k=1}^{M} \alpha_k}, \tag{3}$$
where $\alpha_i$ is the truth value of the FQL rule for state $S_i$. Additionally, the Q-value of the state-action pair $(x, a(x))$ is given by
$$Q(x, a(x)) = \frac{\sum_{i=1}^{M} \alpha_i \, q[S_i, a_i]}{\sum_{k=1}^{M} \alpha_k}. \tag{4}$$
For the current system state x, after the chosen action $a(x)$ is applied, the next-stage system state is assumed to be y, and the system reinforcement signal is given by $c(x, a(x))$. To update the Q-value, the next-stage optimal Q-value, $Q^*(y, a(y))$, is defined as
Q∗(y, a(y)) =
According to the Q-learning rule [17], the Q-value update in FQL can be expressed as
$$q[S_i, a_i] = q[S_i, a_i] + \eta \, \Delta q[S_i, a_i], \tag{6}$$
where η is the learning rate, 0 ≤ η ≤ 1, and
$$\Delta q[S_i, a_i] = \left\{ c(x, a(x)) + \gamma Q^*(y, a(y)) - Q(x, a(x)) \right\} \times \frac{\alpha_i}{\sum_{k=1}^{M} \alpha_k}. \tag{7}$$
Here, c(x, a(x)) is the reinforcement signal and γ is the discount factor.
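The inference and update steps can be sketched as follows, as a minimal illustration rather than the authors' implementation. Each rule's consequent is stored as the index of its chosen candidate, the inferred action and Q-value are the truth-value-weighted averages over the rules, and the update follows (6) and (7). The next-stage optimal value Q*(y, a(y)) is simply supplied by the caller; the defaults `eta = 0.1` and `gamma = 0.9` are hypothetical.

```python
def normalize(alpha):
    """Normalized truth values alpha_i / sum_k alpha_k."""
    total = sum(alpha)
    return [a / total for a in alpha]


def infer_action(alpha, chosen, action_values):
    """Inferred action a(x): weighted average of the actions chosen by the rules."""
    return sum(w * action_values[j] for w, j in zip(normalize(alpha), chosen))


def q_value(alpha, chosen, q):
    """Q(x, a(x)): weighted average of the chosen q[S_i, a_i]."""
    return sum(w * q[i][j] for i, (w, j) in enumerate(zip(normalize(alpha), chosen)))


def fql_update(alpha, chosen, q, c, q_star_next, eta=0.1, gamma=0.9):
    """Eqs. (6)-(7): update q[S_i, a_i] of every rule in place; return the TD error."""
    td = c + gamma * q_star_next - q_value(alpha, chosen, q)
    for i, (w, j) in enumerate(zip(normalize(alpha), chosen)):
        q[i][j] += eta * td * w            # each rule learns in proportion to its truth value
    return td
```

Only the chosen pair of each rule is updated, and strongly fired rules (large normalized truth value) absorb most of the temporal-difference error.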
B. Fuzzy Q-learning-based Residual Capacity Estimator (FQ-RCE)
The FQ-RCE selects three interference measures as input linguistic variables: the received interference power from real-time terminals at the home cell (Ih1), the received interference power from non-real-time terminals at the home cell (Ih2), and the received interference power from adjacent cells (Io). Notably, the received interference power in the WCDMA system is a good indicator of system loading because the system capacity is interference-limited; moreover, the interference generated from the home cell can be identified by PN codes and the interference from adjacent cells can be distinguished by long scrambling codes [21]. Accordingly, the system state vector x containing the three linguistic variables input to FQ-RCE is defined as
$$x = (I_{h1}, I_{h2}, I_o). \tag{8}$$
Comprehensive experiments showed that five terms for both Ih1 and Io, and three terms for Ih2, are appropriate. Hence, their fuzzy term sets are T(Ih1) = {Largely High, HiGh, MeDium, LoW, Largely Low} = {LH, HG, MD, LW, LL}, T(Ih2) = {HiGh, MeDium, LoW} = {HG, MD, LW}, and T(Io) = {Largely High, HiGh, MeDium, LoW, Largely Low} = {LH, HG, MD, LW, LL}. From fuzzy set theory, the fuzzy rule base has dimensions |T(Ih1)| × |T(Ih2)| × |T(Io)| = 5 × 3 × 5.
Accordingly, M = 75. On the other hand, the step-wise increment/decrement of the interference power budget for the non-real-time services, denoted by Pinc, is selected as the output linguistic variable. Here, seven levels of increment actions (N = 7) are used, and the corresponding fuzzy term set is T(Pinc) = {PI1, PI2, PI3, PI4, PI5, PI6, PI7}. After the interference increment is estimated by the FQ-RCE, the residual system capacity (RC) allocated to the non-real-time services is defined as
$$RC = I_{h2} + P_{inc}, \tag{9}$$
where Ih2 is the capacity previously assigned to the non-real-time services. Additionally, the reinforcement learning signal c(x, a(x)) is defined as
$$c(x, a(x)) = \left[ \frac{P_e(x, P_{inc}) - P_e^*}{P_e^*} \right]^2, \tag{10}$$
where Pe(x, Pinc) is the packet error probability of real-time services for the state-action pair (x, Pinc), which is a performance measure of the system, and Pe∗ is the QoS requirement of real-time packet error probability.
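A small sketch of (9) and (10) makes the shape of the reinforcement signal concrete: it is zero only when the measured real-time packet error probability hits the target, and it penalizes overshoot and undershoot symmetrically in relative terms. The default target 0.01 is the Pe∗ used in Section IV; the function names are illustrative.

```python
def residual_capacity(i_h2, p_inc):
    """Eq. (9): RC = Ih2 + Pinc (same linear power units as Ih2)."""
    return i_h2 + p_inc


def reinforcement(pe, pe_target=0.01):
    """Eq. (10): squared relative deviation of Pe from the QoS target Pe*."""
    return ((pe - pe_target) / pe_target) ** 2


for pe in (0.005, 0.01, 0.015, 0.02):
    print(pe, reinforcement(pe))
```

Because Pe = 0.005 and Pe = 0.015 both give c = 0.25, the learner is pushed toward the target rather than toward the lowest possible error, which is what lets it hand the unused margin to non-real-time traffic.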
Figure 1 shows the structure of the FQ-RCE as a five-layer adaptive-network-based implementation of a fuzzy inference system (FIS). In the FQ-RCE, layers 1 to 3 form the antecedent part of the FIS, while layers 4 and 5 form the consequent part. The node function in each layer is described as follows.
Layer 1: Every node k, 1 ≤ k ≤ 13, in this layer is a term node representing one fuzzy term of an input linguistic variable: nodes k = 1, · · · , 5 are the kth terms of T(Ih1), nodes k = 6, 7, 8 are the (k − 5)th terms of T(Ih2), and nodes k = 9, · · · , 13 are the (k − 8)th terms of T(Io). The node function is a bell-shaped membership function for the term. Thus, for an input linguistic variable x, the output $O_{1,k}$ is given by
$$O_{1,k} = b(x; m_k, \sigma_k) = e^{-(x - m_k)^2 / \sigma_k^2}, \tag{11}$$
where $b(\cdot)$ is the bell-shaped function, and $m_k$ and $\sigma_k$ are the mean (center) and the width of node k, respectively.
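Layer 1 can be sketched directly from (11). The term centres and widths below are hypothetical placeholders (evenly spaced on a normalized [0, 1] interference axis), since the tuned membership parameters are not given in the text; only the term counts (5, 3, 5) come from the paper.

```python
import math


def bell(x, m, sigma):
    """Eq. (11): b(x; m, sigma) = exp(-(x - m)**2 / sigma**2)."""
    return math.exp(-((x - m) ** 2) / sigma ** 2)


def layer1(x, params):
    """Layer-1 outputs O1,k grouped per input variable.

    x:      (Ih1, Ih2, Io), here assumed normalized to [0, 1]
    params: params[v] lists (m_k, sigma_k) for the terms of variable v:
            5 terms for Ih1, 3 for Ih2 and 5 for Io (13 term nodes in total).
    """
    return [[bell(xv, m, s) for m, s in params[v]] for v, xv in enumerate(x)]


# Hypothetical, evenly spaced term centres; not taken from the paper.
FIVE = [(c / 4, 0.2) for c in range(5)]    # LL, LW, MD, HG, LH (increasing centre)
THREE = [(c / 2, 0.3) for c in range(3)]   # LW, MD, HG
grades = layer1((0.3, 0.1, 0.05), [FIVE, THREE, FIVE])
```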
Layer 2: Every node k, 1 ≤ k ≤ 75, in this layer is a rule node representing the truth value of the kth fuzzy rule; it acts as a fuzzy-AND operator, implemented here by the product operation. Since each fuzzy rule has three input linguistic variables, the node output $O_{2,k}$ is the product of the three fuzzy membership values corresponding to the inputs. Therefore, $O_{2,k}$ is given by
$$O_{2,k} = \prod_{l \in P_k} O_{1,l}, \tag{12}$$
where $P_k = \{\, l \mid \text{node } l \text{ is a precondition node of the } k\text{th fuzzy rule} \,\}$.
Layer 3: Every node k, 1 ≤ k ≤ 75, in this layer is a normalization node which performs a normalization operation so that all the truth values sum to unity. After the normalization, the output of this node O3,k is given by
$$O_{3,k} = \frac{O_{2,k}}{\sum_{l=1}^{75} O_{2,l}}. \tag{13}$$
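Layers 2 and 3 amount to a product over one membership grade per input variable, followed by normalization. In the sketch below, `itertools.product` enumerates the rule antecedents, which yields exactly 5 × 3 × 5 = 75 rules; the rule ordering is an assumption of the sketch, not something the text specifies.

```python
import math
from itertools import product


def layers_2_3(grades):
    """Layer 2 (eq. 12) and layer 3 (eq. 13) of the FQ-RCE.

    grades: layer-1 membership values grouped per input variable, e.g. lists of
            length 5 (Ih1), 3 (Ih2) and 5 (Io). Each rule takes one term per
            variable, so itertools.product enumerates all 5 * 3 * 5 = 75 rules.
    Returns (O2, O3): rule truth values and their normalized versions.
    """
    o2 = [math.prod(combo) for combo in product(*grades)]   # fuzzy AND by product
    total = sum(o2)
    if total == 0:
        raise ValueError("no rule fires for this input")
    o3 = [v / total for v in o2]                            # truth values sum to 1
    return o2, o3
```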
Layer 4: Every node k, 1 ≤ k ≤ 75, in this layer is an action-select node representing the consequent part of the kth fuzzy rule. Based on the action selection policy and the Q-values of the possible action candidates (PIj, j = 1, 2, · · · , 7), the node chooses an appropriate action. Since improper initial settings of the fuzzy parameters could lead to poor learning results, the Boltzmann-distributed exploration strategy in [20] is employed to explore the set of all possible action candidates. In Boltzmann-distributed exploration, the node chooses the state-action pair $(S_k, a_k)$, $a_k \in T(P_{inc})$, for the kth rule with probability $\xi(S_k, a_k)$ given by
$$\xi(S_k, a_k) = \frac{e^{q[S_k, a_k]/T}}{\sum_{j=1}^{7} e^{q[S_k, PI_j]/T}}, \tag{14}$$
where T is the temperature which reflects the randomness of action selection. After the action is chosen, the node sends two outputs O4,k,1 and O4,k,2 to the action node and Q-value node in layer 5, respectively. Outputs O4,k,1 and O4,k,2 are represented by
$$O_{4,k,1} = O_{3,k} \times a_k, \tag{15}$$
and
$$O_{4,k,2} = O_{3,k} \times q[S_k, a_k]. \tag{16}$$
Layer 5: This layer has two output nodes, the action node $O_{5,1}$ and the Q-value node $O_{5,2}$, which perform the defuzzification of the FQ-RCE. Here, the center-of-area method is applied for defuzzification. Since layer 3 has already normalized the truth values of the antecedent parts of the fuzzy rules, the node functions in layer 5 are simply summations of the inputs from layer 4. Hence, $O_{5,1}$ and $O_{5,2}$ are given by
$$O_{5,1} = P_{inc} = \sum_{k=1}^{M=75} O_{4,k,1}, \tag{17}$$
and
$$O_{5,2} = Q(x, P_{inc}) = \sum_{k=1}^{M=75} O_{4,k,2}. \tag{18}$$
After the action is performed, the FQ-RCE calculates the reinforcement signal c(x, a(x)) by (10) and updates the Q-value of each state-action pair according to (6).
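Layers 4 and 5 can be sketched as a Boltzmann draw per rule, per (14), followed by the weighted sums (15)-(18). The numeric increments behind PI1..PI7 (`action_values`) are hypothetical here, since the text only names the seven levels; subtracting the row maximum before exponentiating leaves (14) unchanged and only avoids overflow.

```python
import math
import random


def boltzmann_choice(q_row, temperature, rng):
    """Eq. (14): pick index j with probability exp(q_j / T) / sum_l exp(q_l / T)."""
    top = max(q_row)   # shifting by the max leaves (14) unchanged but avoids overflow
    weights = [math.exp((v - top) / temperature) for v in q_row]
    return rng.choices(range(len(q_row)), weights=weights, k=1)[0]


def layers_4_5(o3, q, action_values, temperature, rng):
    """Per-rule action selection (layer 4) and summation (layer 5).

    o3:            normalized truth values O3,k from layer 3
    q:             q[k][j], Q-value of rule k for candidate PI_(j+1)
    action_values: numeric increment represented by each candidate (assumed)
    Returns (chosen indices, Pinc, Q(x, Pinc)).
    """
    chosen = [boltzmann_choice(q[k], temperature, rng) for k in range(len(o3))]
    o4_1 = [w * action_values[j] for w, j in zip(o3, chosen)]            # eq. (15)
    o4_2 = [w * q[k][j] for k, (w, j) in enumerate(zip(o3, chosen))]     # eq. (16)
    return chosen, sum(o4_1), sum(o4_2)                                  # eqs. (17), (18)
```

As the temperature T shrinks, each rule's draw concentrates on its highest-Q candidate (select-max); a large T makes the choice nearly uniform, which is the exploration the text relies on early in learning.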
Notably, the convergence property of Q-learning holds for the single-agent (learner) case but may not hold for multiple-agent cases. In particular, convergence of Q-learning in multi-cell WCDMA systems is difficult to achieve because the decision policies of all cells change concurrently during the learning phase. To handle this difficulty, the perceptual coordination mechanism [16] is applied to the FQ-RCE through the design of its input linguistic variables, which incorporate two parts: Ih1 and Ih2 represent the current state of radio resource usage in the home cell, and Io represents the radio resource allocations performed in adjacent cells. Therefore, by measuring the adjacent-cell interference, the FQ-RCE at the home cell can implicitly perceive the radio resource allocation (action) in adjacent cells. The multi-cell learning environment can then be simplified to a single-cell environment, and, as a result, the convergence property can be retained for the FQ-RCE.
C. The Data Rate Scheduler (DRS)
The DRS modifies the exponential rule scheduling algorithm in [10]. The formula of the modified exponential rule is given by
$$j = \arg\max_{i} \left\{ \frac{r_i}{\bar{r}_i} \times e^{\frac{W_i - \bar{W}}{1 + \sqrt{\bar{W}}}} \right\}, \tag{19}$$
where $r_i$, $\bar{r}_i$, and $W_i$ are the link capacity, the average transmission rate, and the waiting time of the ith data terminal, respectively, and $\bar{W}$ is the average waiting time of all the data terminals.
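A minimal sketch of the selection in (19) follows, assuming strictly positive average rates and waiting times measured in a common unit (for example slots); the function names are hypothetical. The third test case illustrates the intended trade-off: a terminal with a small link capacity but a long wait can still outrank a high-capacity terminal that has just been served.

```python
import math


def exp_rule_priority(r, r_avg, w, w_avg):
    """Priority metric inside the argmax of (19)."""
    return (r / r_avg) * math.exp((w - w_avg) / (1.0 + math.sqrt(w_avg)))


def pick_terminal(link_caps, avg_rates, waits):
    """Return j, the index of the highest-priority data terminal under (19)."""
    w_avg = sum(waits) / len(waits)        # W-bar: mean waiting time of all data terminals
    scores = [exp_rule_priority(r, ra, w, w_avg)
              for r, ra, w in zip(link_caps, avg_rates, waits)]
    return max(range(len(scores)), key=scores.__getitem__)
```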
The main difference between the modified and the original exponential rules lies in the definition of the link capacity. The original exponential rule was proposed for downlink transmission in the CDMA HDR system [9], where the link capacity was defined as the maximum transmission rate under the current link condition. However, in the multi-cell WCDMA environment, the uplink transmission power interferes with adjacent cells: the closer a terminal is to the cell boundary, the larger the interference power it causes there. Therefore, the modified exponential rule sets a guard threshold on adjacent-cell interference for the uplink transmission power, so that the adjacent-cell interference incurred by each terminal stays below a predefined level. The location-dependent link capacity ri is then defined as the maximum transmission rate available for a radio link that satisfies the following condition:
$$P(r_i) \times h_{ai} \le P_d, \tag{20}$$
where $P(r_i)$ is the transmission power of terminal i at rate $r_i$, $h_{ai}$ is the maximum link gain between terminal i and the adjacent cells, and $P_d$ is the guard threshold of the adjacent-cell interference. Under the strength-based power control scheme, the transmission power $P(r_i)$ is given by
$$P(r_i) = \frac{r_i \times (E_b/N_0)^* \times I_{max}}{PG \times h_i}, \tag{21}$$
where (Eb/N0)∗ is the required signal-to-noise ratio, Imax is the maximum received interference power, PG is the processing gain, and hi is the link gain between the terminal and its home cell. Both hi and hai can be measured by monitoring the pilot strengths received from the home and adjacent cells. Hence, under the modified exponential rule, a terminal with a higher maximum available transmission rate, a lower average transmission rate and a longer waiting time obtains a higher transmission priority. As a terminal moves toward the cell boundary, its emission power toward the adjacent cells increases, so its link capacity, and hence its transmission priority, falls while its waiting time accumulates. However, once its waiting time becomes long, its transmission priority rises again. Therefore, the modified exponential rule strikes a balance among the link gain, the location and the waiting time of the terminals.
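The location-dependent link capacity defined by (20) and (21) can be sketched as a search over the bursty rates 1X to 8X. All inputs are assumed to be in linear units (a guard threshold quoted in dB, such as Pd = 2 dB, would be converted first), and the numbers in the usage lines are normalized, hypothetical values.

```python
RATES = (1, 2, 4, 8)   # bursty-transmission rates as multiples of the basic rate


def tx_power(r, ebno_req, i_max, pg, h_home):
    """Eq. (21): P(r) = r * (Eb/N0)* * Imax / (PG * h_i)."""
    return r * ebno_req * i_max / (pg * h_home)


def link_capacity(ebno_req, i_max, pg, h_home, h_adj_max, p_d):
    """Largest r in RATES satisfying (20), P(r) * h_ai <= Pd; 0 if none fits."""
    feasible = [r for r in RATES
                if tx_power(r, ebno_req, i_max, pg, h_home) * h_adj_max <= p_d]
    return max(feasible, default=0)


def db_to_linear(db):
    """Convert a power ratio in dB to linear units."""
    return 10 ** (db / 10)


# Same home-cell link, weak vs. stronger adjacent-cell link (normalized values).
print(link_capacity(1.0, 1.0, 1.0, 1.0, 0.1, 0.5))   # near the cell centre
print(link_capacity(1.0, 1.0, 1.0, 1.0, 0.3, 0.5))   # closer to the boundary
```

Since the left side of (20) scales with r · hai / hi, a terminal whose adjacent-cell gain approaches its home-cell gain is limited to lower rates; in this example the capacity drops from 4X to 1X.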
The DRS performs rate allocation according to the terminals' priorities. The terminal with the highest priority is allocated a rate first, and the other terminals are then served in priority order. The operation of the DRS stops when the data power budget is used up. Its procedure is described below:
[The DRS Algorithm]
Step 1 Obtain the residual system capacity (RC) for non-real-time services from FQ-RCE.
Step 2 Choose the highest-priority terminal, j, out of data terminals that are not allocated yet, by (19).
Step 3 Compute the remaining RC by
RC = RC − P(rj)/PG.
If the remaining RC is larger than 0, go back to Step 2. Otherwise, go to Step 4.
Step 4 Inform terminals of the assigned data rate via the signaling channel. End
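The DRS procedure can be sketched as below, assuming the priorities from (19) are computed once per scheduling round, so repeatedly taking the highest-priority unallocated terminal (Step 2) is equivalent to walking the terminals in descending priority order. Step 3 is followed literally, so the last allocated terminal may drive RC below zero. Stopping when every terminal has been served is an assumption that the listed steps leave implicit; the tuple layout and names are hypothetical.

```python
def drs_allocate(rc, candidates, pg):
    """Sketch of the DRS procedure (Steps 1-4).

    rc:         residual capacity RC obtained from the FQ-RCE (Step 1)
    candidates: (terminal_id, priority, rate, power) tuples, where priority is
                the metric maximized in (19) and power is P(r_j) from (21)
    pg:         processing gain PG
    Returns {terminal_id: assigned rate}, i.e. what Step 4 would signal.
    """
    assigned = {}
    # Step 2: highest-priority unallocated terminal first.
    for tid, _priority, rate, power in sorted(candidates, key=lambda c: c[1], reverse=True):
        assigned[tid] = rate
        rc -= power / pg          # Step 3: RC = RC - P(r_j)/PG
        if rc <= 0:               # budget used up: go to Step 4
            break
    return assigned               # the loop also ends once every terminal is served


demo = [("a", 2.0, 4, 10.0), ("b", 5.0, 8, 20.0), ("c", 1.0, 1, 5.0)]
print(drs_allocate(25.0, demo, pg=1.0))   # {'b': 8, 'a': 4}
```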
IV. SIMULATION RESULTS AND DISCUSSION
In the simulations, a concatenated 19-cell (N = 19) environment was configured as the multi-cell WCDMA system. The central cell was labelled cell 1, the cells in the first tier were cells 2 ∼ 7, and the cells in the second tier were cells 8 ∼ 19. Three kinds of real-time traffic were considered: voice traffic, high-bursty real-time data traffic and low-bursty real-time data traffic. Voice traffic was assumed to have two transmission-rate levels and was modelled by a 2-level MMDP (Markov modulated deterministic process) [22]. The real-time data traffic was modelled by an ON/OFF traffic stream with burstiness 1/ρh (1/ρl) and peak rate Rp,h (Rp,l) for high-bursty (low-bursty) real-time traffic. The two real-time data traffic flows had the same mean rate but different burstiness levels. The non-real-time data traffic was assumed to have a Poisson arrival process with geometrically distributed data burst lengths. Table I shows all the detailed traffic parameters. The basic rate in the WCDMA system is assumed to be a physical channel with SF = 256. For each connection, the DPCCH is always active to maintain connection reliability. To reduce the interference overhead produced by the DPCCHs, the transmission power of a DPCCH was assumed to be 3 dB lower than that of its respective DPDCH. The QoS requirement on the packet error probability, Pe∗, was set to 0.01.
The conventional resource reservation scheme proposed in [7], LIDA (load and interference demand assignment), was used as a benchmark for performance comparison. The basic concept of the LIDA scheme is twofold: first, a portion β of the interference power budget is reserved to avoid overloading; second, burst-mode admission is applied to high-rate traffic.
Additionally, the increment of transmission power, Pinc, allocated to the non-real-time data traffic in the LIDA scheme is given by
$$P_{inc} = (1 - \beta) I_{max} - I_{h1} - I_{h2} - I_o. \tag{22}$$
The performance of the LIDA scheme relies heavily on the choice of the reservation threshold, β.
The simulations considered three reservation thresholds, β = 0%, 5%, and 10%, and the modified exponential rule with Pd = 2 dB was applied to the LIDA scheme. Moreover, a scheme combining the FQ-RCE with the original exponential rule, called FQ-RCE/EXP, was considered to further evaluate the effectiveness of the modified exponential rule. Notably, all the considered schemes were applied only to non-real-time terminals, and all real-time terminals initiated data transmission whenever they had packets in their queues.
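For comparison with the learned budget of the FQ-RCE, the LIDA allocation in (22) can be sketched in a few lines, assuming linear power units. Substituting (22) into the residual-capacity definition RC = Ih2 + Pinc gives RC = (1 − β)Imax − Ih1 − Io, so the non-real-time budget is fixed by the reservation fraction rather than adapted to the observed packet error probability. The load values in the loop are hypothetical.

```python
def lida_increment(beta, i_max, i_h1, i_h2, i_o):
    """Eq. (22): Pinc = (1 - beta) * Imax - Ih1 - Ih2 - Io (linear power units)."""
    return (1.0 - beta) * i_max - i_h1 - i_h2 - i_o


def lida_residual_capacity(beta, i_max, i_h1, i_h2, i_o):
    """RC = Ih2 + Pinc, which simplifies to (1 - beta) * Imax - Ih1 - Io."""
    return i_h2 + lida_increment(beta, i_max, i_h1, i_h2, i_o)


for beta in (0.0, 0.05, 0.10):   # the reservation thresholds used in the simulations
    print(beta, lida_residual_capacity(beta, 1.0, 0.4, 0.2, 0.2))
```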
A. Homogeneous Case
In the homogeneous case, all cells are assumed to contain 22 voice terminals, 40 real-time data terminals and 20 non-real-time data terminals. The 40 real-time data terminals consist of ND,h high-bursty and ND,l low-bursty data users, where ND,h+ND,l=40.
Figure 2 shows the packet error probabilities versus the number of high-bursty real-time data users. The packet error probability of the LIDA scheme was found to violate the QoS requirement, and the LIDA scheme without reservation (β = 0%) had the largest packet error probability. The results demonstrate the need for precise residual capacity estimation to avoid overloading in the multi-cell WCDMA environment. The packet error probabilities of the FQ-SDAM and FQ-RCE/EXP schemes always fulfill the QoS requirement because the FQ-RCE adopts FQL, which inherently possesses the capability of reinforcement learning. Thus, the FQ-RCE can precisely determine the residual system capacity by monitoring the loading status of the home cell and the interference variation of adjacent cells. Additionally, regardless of the value of ND,h, the FQ-SDAM scheme always achieves lower packet error probabilities than FQ-RCE/EXP. This is because, in the multi-cell environment, the uplink transmission powers emitted from terminals interfere with users in both the home cell and adjacent cells; being aware of user locations, the modified exponential rule in FQ-SDAM effectively keeps the interference on adjacent cells at a sustainable level and consequently reduces the packet error probabilities.
Figure 3 shows the aggregate throughput of non-real-time data traffic for three numbers of high-bursty real-time users: ND,h = 10, 20 and 30. These three cases were used to simulate the low-bursty, medium-bursty and high-bursty scenarios. Here, the LIDA scheme with β = 0% was not considered because of its QoS violation. FQ-SDAM was found to achieve the highest data throughput for non-real-time services, while LIDA with β = 10% produced the lowest. Compared with LIDA with β = 10%, FQ-SDAM, FQ-RCE/EXP, and LIDA with β = 5% improved the throughput by 75.3%, 73.3% and 52.9% (53.3%, 51.1% and 49.2%), respectively, in the low-bursty (medium-bursty) case. In the high-bursty case, under the QoS constraint, the FQ-SDAM and FQ-RCE/EXP schemes improved the throughput over LIDA with β = 10% by 16.8% and 10.7%, respectively, because FQ-SDAM approaches the desired transmission target (Pe∗ = 0.01) through fuzzy Q-learning.
According to the definition of the reinforcement signal c(x, a(x)), FQ-SDAM tries to allocate the maximum possible resource under the QoS requirement. By contrast, LIDA with β = 10% is