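Radio Resource Management Techniques for B3G Radio Access Networks, Subproject 1: Resource Management Techniques for Heterogeneous Multiple Access Networks (II)

Subproject 1: Resource Management Techniques for Heterogeneous Multiple Access Networks (2/2)

Project type: Integrated project
Project number: NSC93-2219-E-009-011-
Execution period: August 1, 2004 to July 31, 2005
Executing institution: Department of Communication Engineering, National Chiao Tung University
Principal investigator: 張仲儒
Co-principal investigators: 廖維國, 王蒞君
Project participants: 陳詠翰, 楊煖玉, 顏志明, 郭立忠, 李宗軒
Report type: Complete report
Report attachments: Report on attending an international conference, and published papers
Handling: This project report is open to public inquiry

October 27, 2005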


High-efficiency resource allocation and quality-of-service (QoS) guarantees in B3G heterogeneous multiple access networks depend directly on how resource control in the PHY and MAC layers collaborates. In this subproject, we take advantage of intelligent technologies to design three critical mechanisms: intelligent data access management, intelligent multimedia scheduling, and situation-aware dynamic cell configuration. The intelligent data access management adopts fuzzy Q-learning technology to monitor the communication situation, such as inter-cell and intra-cell interference. The fuzzy Q-learning residual capacity estimator (FQ-RCE) and data rate scheduler (DRS) are proposed to efficiently estimate and control system resources. The real-time system information from the FQ-RCE and DRS can further support radio resource allocation and scheduling. We then propose a cellular neural network utility (CNNU)-based scheduler, which combines the technologies of the cellular neural network (CNN) and the utility function. The CNNU-based scheduler decides the radio resource usage and allocation according to the system changes. While guaranteeing QoS, the CNNU-based scheduler can achieve maximum system utilization and throughput. The system changes can in turn be controlled by dynamic cell configuration. In this design, we consider soft handoff, link power allocation, and admission control ranges, and use reinforcement-learning technology to dynamically adjust the pilot power in the PHY layer. This changes the coverage of the controlled cell to maximize the performance of power allocation and load balancing.

Keywords: heterogeneous multiple access network, QoS, multimedia scheduling, dynamic cell configuration, fuzzy Q-learning, FQ-RCE, DRS, CNN, utility function, reinforcement-learning

Contents

Mandarin Abstract
English Abstract
List of Figures
List of Tables
1 Project Overview
2 Situation-Aware Data Access Manager Using Fuzzy Q-learning Technique for Multi-cell WCDMA Systems
    I. Introduction
    II. System Model
    III. Design of FQ-SDAM
        A. The Fuzzy Q-Learning (FQL)
        B. Fuzzy Q-learning-based Residual Capacity Estimator (FQ-RCE)
        C. The Data Rate Scheduler (DRS)
    IV. Simulation Results and Discussion
        A. Homogeneous Case
3 [...] Multimedia CDMA Cellular Networks
    I. Introduction
    II. System Model
    III. Formulation of the Utility Function
        A. Radio Resource Function R_i(t)
        B. The QoS Requirement Deviation Function A_i(t)
        C. The Fairness Compensation Function F_i(t)
    IV. Design of the CNNU-Based Scheduler
        A. Preliminaries for Cellular Neural Networks
        B. Cost Function for CNN Processor
        C. The Architecture of CNN Processor
        D. The Two-Layer Structure for CNN Processor
    V. Simulation Results and Discussion
4 A Novel Dynamic Cell Configuration Scheme in Next-Generation Situation-Aware CDMA Networks
    I. Introduction
    II. Issues of Dynamic Cell Configuration
        A. Effects of Pilot Power Allocation Schemes
        B. Effects of Soft Handoff Power Allocation Schemes
        C. Effects of New/Handoff Call Admission Control
    III. System Model
        A. Signal Model
        B. Initial Cell Coverage Design
    IV. Proposed DCC-RL Scheme
        C. Dynamic Maximum Link Power Constraint Design
        D. Dynamic CAC Criterion Design
    V. Simulation Results and Discussions
        A. Simulation Model
        B. Performance Measurements and Discussions

List of Figures

Chapter 2
    1. Structure of FQ-RCE
    2. Packet error probabilities: homogeneous case
    3. Aggregate throughput of non-real-time data traffic: homogeneous case
    4. Packet error probabilities: non-homogeneous case
    5. Aggregate throughput of non-real-time data traffic: non-homogeneous case

Chapter 3
    1. The block diagram of CNNU-based scheduler
    2. The two-layer structure of CNN processor
    3. The average system throughput
    4. QoS performance measures of PD and Rm
    5. The ratio ӽPD for RT connections and the ratio ӽRm for NRT interactive connections
    6. The fairness variation index for NRT connections

Chapter 4
    1. Power allocation in downlink CDMA systems
    2. System block diagram of proposed DCC-RL scheme
    3. Average pilot power of hotspot, 1st-tier, and 2nd-tier cells for (a) LPPA scheme and (b) SSDT scheme under FIX and DCC-RL
    4. Comparison of blocking probability of (a) real-time and (b) non-real-time services
    5. Comparison of handoff forced termination probability
    6. Comparison of average total throughput
    7. Comparison of frame error probability

List of Tables

Chapter 2
    1. TRAFFIC PARAMETERS IN THE MULTI-CELL WCDMA SYSTEM

Chapter 4

Chapter 1

Project Overview

Applications of multimedia services over wideband communication networks have increased dramatically in recent years. To support a diverse range of multimedia applications, next-generation broadband networks are required to satisfy quality-of-service (QoS) requirements. Real-time and precise traffic control and scheduling mechanisms are essential to achieve the QoS guarantee and maximum utilization. Major topics in traffic control and resource management include link situation awareness, capacity estimation, rate allocation, traffic scheduling, call admission, resource monitoring, and cell configuration. According to the required services, users can access the network through beyond third generation (B3G) mobile systems, which could be composed of heterogeneous networks. Radio resource management (RRM) in the heterogeneous network is essential, and the


find the ways of achieving the best system utilization while maintaining the QoS of every service in such B3G systems. To maximize utilization in B3G systems, we should focus on the essential elements of RRM to efficiently allocate, manage, and monitor the radio resources. Therefore, the key RRM technologies in B3G systems are critical to providing a comprehensive and satisfactory mobile communication experience.

In this subproject, we propose a set of intelligent RRM schemes to reach these goals. In the second chapter, we propose a novel situation-aware data access manager using the fuzzy Q-learning technique (FQ-SDAM) for multi-cell WCDMA systems. The FQ-SDAM contains a fuzzy Q-learning-based residual capacity estimator (FQ-RCE) and a data rate scheduler (DRS). The FQ-RCE accurately estimates the situation-dependent residual system capacity. It takes the received interference powers from the home cell and the adjacent cells as input linguistic variables, which simplifies the multi-cell environment into a single-cell environment by applying a perceptual coordination mechanism. The DRS effectively allocates the resource to non-real-time terminals by modifying the exponential rule so that it considers the effect of interference on adjacent cells.

In the third chapter, a cellular neural network and utility (CNNU)-based scheduler is proposed for multimedia CDMA cellular networks supporting differentiated quality of service (QoS). The cellular neural network is powerful for complicated optimization problems and has been proven to converge rapidly to a desired equilibrium; the utility-based scheduling algorithm can efficiently utilize the radio resource of the system and satisfy the QoS requirements and fairness of connections. A utility function for each connection is defined here as its radio resource function further weighted by both a QoS requirement deviation function and a fairness compensation function. The CNNU-based scheduler determines a radio resource assignment vector for all connections so that the overall system utility is maximized and the system throughput is as high as possible. At the same time, the performance measures of all connections are kept close to their QoS requirements in an efficient way.

The fourth chapter presents a novel dynamic cell configuration scheme in next-generation situation-aware CDMA networks. To balance the time-varying traffic load between cells,


cellular networks to configure cell coverage and capacity dynamically. In this chapter, we show that pilot power allocation is highly coupled to other facets of radio resource management. We propose a novel dynamic cell configuration scheme for multimedia CDMA cellular networks, based on reinforcement-learning, which takes into account pilot, soft handoff, and maximum link power allocations as well as call admission control mechanisms. Simulation results demonstrate the effectiveness of the proposed scheme in situation-aware CDMA networks.

Chapter 2

Situation-Aware Data Access Manager Using Fuzzy Q-learning Technique for Multi-cell WCDMA Systems

Abstract

This paper proposes a novel situation-aware data access manager using fuzzy Q-learning technique (FQ-SDAM) for multi-cell WCDMA systems. The FQ-SDAM contains a fuzzy Q-learning-based residual capacity estimator (FQ-RCE) and a data rate scheduler (DRS). The FQ-RCE can accurately estimate the situation-dependent residual system capacity, and appropriately chooses the received interference powers from the home cell and adjacent cells as input linguistic variables, which simplifies the multi-cell environment into a single-cell environment by applying a perceptual coordination mechanism. The DRS can effectively allocate the resource to non-real-time terminals by modifying the exponential rule [10] so that it considers the effect of interference on adjacent cells.

I. INTRODUCTION

The WCDMA cellular system supports integrated services with mixed quality-of-service (QoS) requirements: real-time services require continuous transmission and are intolerant of time delay, while non-real-time services require bursty transmission and tolerate moderate time delay. Adequate radio resource management (RRM) is required to maximize the system capacity and fulfill the complementary QoS requirements. Among the many traffic engineering techniques for RRM, a call admission control method is applied to prevent system overloading, based on the long-term availability of radio resources. On the other hand, a data access control scheme provides bursty transmission permission for non-real-time services, based on the short-term availability of radio resources.

The main purpose of the data access control scheme in WCDMA systems supporting integrated services is to maximize the throughput of non-real-time services while maintaining the transmission quality of real-time services [1]-[5]. To achieve this goal, dynamic access probability schemes [2]-[4] and a base station-controlled scheduling scheme [5] have been proposed. In these schemes, the residual system capacity for non-real-time services is first estimated and then shared among non-real-time terminals. A single-cell environment was considered in [2]-[4], while a multi-cell environment was studied in [5]. The multi-cell scheme [5] treats the interference generated by other-cell terminals as if it came from several home-cell terminals, and consequently the multi-cell environment is regarded as a single-cell environment. However, the mutually affecting behavior of radio resource allocation in the multi-cell environment is still not considered. Notably, in the multi-cell WCDMA system, an increase of data transmission power in one cell causes the interference level to rise in the adjacent cells. If each cell allocates the entire residual capacity for bursty transmission without considering the interference influence from adjacent cells, then the system becomes overloaded.

The overloading phenomenon could be alleviated by an appropriate coordination method among cells [6]. Knowing the radio resources of all cells, a centralized data access method for the multi-cell WCDMA system can maximize the system throughput by applying a global optimization method. Unfortunately, the coordination procedure takes a long time to exchange the resource information between cells, making practical implementation infeasible. Usually, the data access control scheme operates on a short-term time scale, e.g., the frame time, making distributed schemes preferable. Kumar and Nanda [7] proposed a distributed scheme called load and interference-based demand assignment (LIDA). LIDA is a resource reservation-based scheme which reserves some resources in each cell against the interference variation. Additionally, LIDA uses the concept of a burst admission threshold for high-rate transmission in a cell to avoid excess interference power to adjacent cells, allowing bursty transmission only when the strength difference between the received pilot signals from the home cell and adjacent cells is larger than the threshold. The effectiveness of this scheme relies on the selection of the reservation threshold, which should be dynamically chosen according to the system loading and the received interference power level.

Additionally, a rate scheduling scheme is embedded in the data access control scheme to allocate the residual capacity to non-real-time terminals according to a service principle. Ramakrishna and Holtzman adopted a throughput maximization criterion for the scheduling scheme [8]. This criterion can maximize the system throughput, but may cause low-class users to suffer from starvation. Alternatively, Jalali, Padovani, and Pankaj proposed a proportional fairness criterion [9] for a downlink scheduling scheme in a CDMA-HDR (high data rate) system. Their scheme defines a utility function as the ratio of the supported data rate to the average data rate. The supported data rate is determined by the channel condition, while the average data rate is calculated as the window average of the transmitted throughput. The terminal with the highest utility value transmits data in the next frame time. This algorithm may lead to large transmission delay for some terminals. Additionally, Shakkottai and Stolyar proposed an exponential rule criterion [10] as another definition of the utility function to strike a good balance between the system throughput and the transmission delay. However, applying the exponential rule to uplink transmission requires considering the terminal’s location factor to minimize interference with adjacent cells.
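The proportional fairness selection described above can be sketched as follows. This is a minimal illustration and not the scheduler of [9]: the exponentially weighted form of the window average and the names `pf_select`, `update_average`, and `window` are assumptions introduced here.

```python
def pf_select(supported_rates, average_rates):
    """Index of the terminal with the largest supported/average rate ratio.

    average_rates must be strictly positive (assumption of this sketch).
    """
    utilities = [s / a for s, a in zip(supported_rates, average_rates)]
    return max(range(len(utilities)), key=utilities.__getitem__)


def update_average(average_rate, served_rate, window):
    """Exponentially weighted average over roughly `window` frames (assumed form)."""
    return (1.0 - 1.0 / window) * average_rate + served_rate / window


if __name__ == "__main__":
    supported = [2.0, 1.0, 4.0]   # rates the channel can support this frame
    average = [4.0, 1.0, 16.0]    # window averages of past throughput
    winner = pf_select(supported, average)
    # Only the selected terminal is served in the next frame.
    average = [
        update_average(avg, supported[i] if i == winner else 0.0, window=4)
        for i, avg in enumerate(average)
    ]
    print(winner, average)
```

A terminal that is not served sees its average decay, which raises its utility in later frames; this is also why a terminal with a persistently poor channel can wait a long time, the delay issue noted in the text.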

This chapter proposes a situation-aware data access manager using fuzzy Q-learning technique (FQ-SDAM) for multi-cell WCDMA systems. The proposed FQ-SDAM scheme consists of two parts: a fuzzy Q-learning-based residual capacity estimator (FQ-RCE) and a data rate scheduler (DRS). The FQ-RCE, by fuzzy Q-learning, estimates the appropriate situation-dependent residual system capacity, in terms of interference power, for non-real-time services, while the DRS assigns transmission rates to non-real-time terminals by a modified exponential rule.

The fuzzy inference system (FIS) and the reinforcement learning technique have been separately applied to solve network resource management problems [11]-[14]. A fuzzy resource allocation controller was proposed in [12], where the FIS method was adopted to estimate the resource availability. A reinforcement learning technique, Q-learning, was applied to handle dynamic channel assignment in [13] and multi-rate transmission control problems in [14] for wireless communication systems. By learning from the system environment, the Q-learning technique can converge to a pre-defined optimal control target. In [15], Jouffe proposed a reinforcement learning technique for FIS, called fuzzy Q-learning (FQL). The FQL technique combines the advantages of FIS and reinforcement learning. The FIS provides a good function approximation for the FQL, which enables a priori knowledge to be applied to the system design. Additionally, the reinforcement learning provides a model-free approach to obtain a control target. By applying the FQL technique, the radio resource can be managed under partial, uncertain information, and the optimal resource management can be reached incrementally.

FQ-RCE uses interference measures from three sources as input linguistic variables to estimate the situation-dependent residual capacity in the multi-cell environment: the received interference power from real-time terminals at the home cell, the received interference power from non-real-time terminals at the home cell, and the received interference power from the adjacent cells. Notably, the received interference power from adjacent cells is treated as a separate variable from the received interference power from the home cell in order to distinguish the interference variations. Therefore, through the linguistic variable of the adjacent-cell interference power, the RCE at the home cell can perceive the radio resource allocation made by the FQ-SDAMs in adjacent cells, that is, be aware of the loading of adjacent cells, and precisely estimate the residual resource in a distributed fashion. Thus, the multi-cell WCDMA environment does not require explicit action coordination.

On the other hand, the DRS modifies the exponential rule in [10] to assign the transmission rates for non-real-time terminals, based on the residual capacity estimated by FQ-RCE. The modified exponential rule is a utility-function-based scheduling algorithm which considers the transmission delay, average transmission rate, and link capacity. The modified rule differs from the original exponential rule [10] in the link capacity definition. For the modified exponential rule, the link capacity is defined as the maximum available rate at which the interference influence of the transmission power on adjacent cells is below a guard threshold, considering location awareness. The modified exponential rule is most suitable for applications in the uplink transmission of multi-cell WCDMA systems, which is explained later. Simulation results show that the proposed FQ-SDAM outperforms the LIDA scheme since it can effectively reduce the packet error probability and improve the aggregate throughput in both homogeneous and non-homogeneous multi-cell WCDMA environments. Additionally, the modified exponential rule can achieve better system performance than the original exponential rule. In the homogeneous case, FQ-SDAM achieves 75.3% (53.3%) higher aggregate throughput than LIDA with β=10% under high-bursty (low-bursty) real-time traffic. In the non-homogeneous case, FQ-SDAM achieves 31.53%, 35.5%, and 34.2% greater aggregate throughput than LIDA with β=10% for the central, first-tier, and second-tier cells, respectively.

The rest of this chapter is organized as follows. The system model is described in Section II. Section III briefly describes the concept of fuzzy Q-learning and proposes the design of FQ-SDAM. Simulation results are presented in Section IV, which compares the performance of the FQ-SDAM and a conventional LIDA scheme.

II. SYSTEM MODEL

This chapter considers a multi-cell WCDMA system containing $N$ cells, where each cell has a base station using FQ-SDAM to allocate the radio resource for real-time and non-real-time terminals within its coverage area. An uplink supporting slotted transmission is adopted. All terminals transmit in the same frequency band and are distinguished by their own spreading codes. Each terminal holds two communication channels, the dedicated physical data channel (DPDCH) and the dedicated physical control channel (DPCCH). The DPDCH carries data generated by the layer 2 protocol, while the DPCCH carries control information. A channel has a frame-based structure, where the frame length $T_f = 10$ ms is divided into 15 slots of length $T_{slot} = 2560$ chips, each slot corresponding to one power control period. Hence, the power control frequency is 1500 Hz. The spreading factor (SF) for the DPDCH can vary from 4 to 256 according to $SF = 256/2^k$, $k = 0, 1, \cdots, 6$, carrying $10 \times 2^k$ bits per slot, and the SF for the DPCCH is fixed at 256, carrying 10 bits per slot.
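As a quick check of the slot arithmetic above, the following sketch tabulates the spreading factor, bits per slot, and resulting channel bit rate for each value of k. The function and constant names are illustrative and do not come from the report; the bit rate is simply bits per slot times 1500 slots per second, before any channel coding.

```python
# Illustrative helper names; only the numeric relations come from the text.
FRAMES_PER_SECOND = 100      # one frame lasts 10 ms
SLOTS_PER_FRAME = 15
CHIPS_PER_SLOT = 2560
SLOTS_PER_SECOND = FRAMES_PER_SECOND * SLOTS_PER_FRAME  # 1500, the power control frequency


def dpdch_format(k):
    """Return (SF, bits per slot, channel bit rate in bit/s) for k = 0..6."""
    if k not in range(7):
        raise ValueError("k must be an integer in 0..6")
    spreading_factor = 256 // 2 ** k
    bits_per_slot = 10 * 2 ** k
    return spreading_factor, bits_per_slot, bits_per_slot * SLOTS_PER_SECOND


def chip_rate():
    """Chips per second implied by the frame structure."""
    return CHIPS_PER_SLOT * SLOTS_PER_SECOND


if __name__ == "__main__":
    for k in range(7):
        print(k, dpdch_format(k))
    print("chip rate:", chip_rate())
```

In every format the product of the spreading factor and the bits per slot equals the 2560 chips of a slot, which is a convenient sanity check on the two formulas.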

Two types of traffic are considered: real-time (type-1) traffic and non-real-time (type-2) traffic. The system provides continuous transmission for real-time traffic and bursty transmission for non-real-time traffic. Here, a real-time terminal is a terminal supporting real-time services, and a non-real-time terminal is a terminal supporting non-real-time services. The real-time terminals may transmit at any possible data rate when necessary; on the other hand, the transmission of non-real-time terminals is controlled by the data access manager at the base station. Considering the terminal’s link gain and the received interference power from both the home and adjacent cells, the data access manager assigns an appropriate data rate to each non-real-time terminal. For bursty transmission, the available data transmission rates are 1X, 2X, 4X, and 8X, where the 1X transmission rate is called the basic rate. A strength-based power control scheme is assumed such that the required transmission power of a mobile is directly proportional to the transmission rate [18]. Additionally, the overall capacity is set by the upper bound of the total received interference power, and the residual capacity is defined as the allowable received interference power from the non-real-time terminals.

The link gain between terminal $i$ and base station $j$, denoted by $h_{ij}$, is usually determined by the long-term fading $F^{L}_{ij}$ and the short-term fading $F^{S}_{ij}$ [19], and is given by

$$h_{ij} = F^{L}_{ij} \times F^{S}_{ij}. \tag{1}$$

The long-term fading $F^{L}_{ij}$, combining the path loss and shadowing, is modelled as

$$F^{L}_{ij} = k \times r^{-\alpha} \times 10^{\eta/10}, \tag{2}$$

where $k$ is a constant, $r$ is the distance from mobile $i$ to base station $j$, $\alpha$ is the path loss exponent, usually lying between 2 and 5 for a mobile environment ($\alpha = 4$), and $\eta$ is a normally distributed random variable with zero mean and variance $\sigma_L^2$. The parameter $\sigma_L$ is affected by the configuration of the terrain and ranges from 5 to 12 ($\sigma_L^2 = 10$) [19]. The short-term fading $F^{S}_{ij}$ is mainly caused by multi-path reflections, and is modelled by a Rayleigh distribution.
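The link gain model of (1) and (2) can be sampled with a few lines of code. The sketch below is illustrative only: the function names, the default constant k = 1, the Rayleigh scale parameter, and the reading of the shadowing setting as a standard deviation of sqrt(10) are assumptions made here, not values confirmed by the report.

```python
import math
import random


def long_term_fading(distance, k=1.0, alpha=4.0, sigma_l=math.sqrt(10.0), rng=random):
    """Path loss with log-normal shadowing: k * r^(-alpha) * 10^(eta/10).

    sigma_l is the standard deviation of eta in dB (default value is an assumption).
    """
    eta = rng.gauss(0.0, sigma_l)
    return k * distance ** (-alpha) * 10.0 ** (eta / 10.0)


def short_term_fading(scale=1.0, rng=random):
    """Rayleigh-distributed sample drawn by inverse-transform sampling."""
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return scale * math.sqrt(-2.0 * math.log(u))


def link_gain(distance, rng=random, **long_term_kwargs):
    """h_ij as the product of the long-term and short-term fading terms."""
    return long_term_fading(distance, rng=rng, **long_term_kwargs) * short_term_fading(rng=rng)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    for r in (100.0, 500.0, 1000.0):
        print(r, link_gain(r, rng=rng))
```

Passing a seeded `random.Random` instance keeps simulation runs repeatable, which matters when comparing schedulers under identical channel realizations.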

The real-time service is modelled as an ON-OFF Markov process with a transition rate $\mu$ from ON to OFF and $\lambda$ from OFF to ON. The non-real-time service is modelled as a batch Poisson process, in which the data bursts arrive according to a Poisson process and the data length is assumed to have a geometric distribution. The packet error probability, denoted by $P_e$, is regarded as the system performance index. The maximum tolerable packet error probability, denoted by $P_e^*$, is defined as the system QoS requirement. Additionally, the packet transmission delay is used as a parameter for the data rate scheduler.
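The two traffic sources can be simulated as sketched below. This is an illustrative sketch, not the simulator used in the report: the function names, the choice to start each source in the OFF state, and the parameter `p` of the geometric burst length are assumptions introduced here.

```python
import math
import random


def on_probability(mu, lam):
    """Stationary probability that an ON-OFF source is ON."""
    return lam / (lam + mu)


def on_off_periods(mu, lam, horizon, rng=random):
    """Yield (state, duration) pairs of an ON-OFF Markov source until `horizon`.

    mu is the ON->OFF rate and lam the OFF->ON rate, so sojourn times are
    exponential with means 1/mu (ON) and 1/lam (OFF).
    """
    elapsed, state = 0.0, "OFF"  # starting in OFF is an arbitrary choice
    while True:
        duration = rng.expovariate(lam if state == "OFF" else mu)
        if elapsed + duration >= horizon:
            yield state, horizon - elapsed  # truncate the last period
            return
        yield state, duration
        elapsed += duration
        state = "ON" if state == "OFF" else "OFF"


def geometric_length(p, rng=random):
    """Burst length n >= 1 with P(n) = (1 - p)^(n - 1) * p, for 0 < p < 1."""
    u = 1.0 - rng.random()  # in (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))


def batch_poisson_arrivals(burst_rate, p, horizon, rng=random):
    """Return (arrival time, burst length) pairs with Poisson burst arrivals."""
    arrivals = []
    t = rng.expovariate(burst_rate)
    while t < horizon:
        arrivals.append((t, geometric_length(p, rng)))
        t += rng.expovariate(burst_rate)
    return arrivals
```

For example, `list(on_off_periods(mu=1.0, lam=2.0, horizon=50.0))` returns alternating OFF and ON periods whose durations add up to the horizon.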

III. DESIGN OF FQ-SDAM

The FQ-SDAM contains two functional blocks: a fuzzy Q-learning-based residual capacity estimator (FQ-RCE) and a data rate scheduler (DRS). The FQ-RCE estimates the residual interference power budget, and then the DRS allocates the resource to the non-real-time terminals. The following subsections describe fuzzy Q-learning and the detailed design of the two functional blocks.


A. The Fuzzy Q-Learning (FQL)

Let $S$ denote the set of state vectors for the system, $S = \{S_i,\ i = 1, 2, \cdots, M\}$; each state vector $S_i$ comprises $L$ fuzzy linguistic variables selected to describe the system. Let $A$ denote the set of actions that can be chosen by the system states, $A = \{A_j,\ j = 1, 2, \cdots, N\}$. For an input state vector $x$ containing the $L$ linguistic variables, the rule representation of FQL for state $S_i$ takes the form

if $x$ is $S_i$, then $A_j$ with $q[S_i, A_j]$, $1 \le i \le M$ and $1 \le j \le N$,

where $A_j$ is the $j$th action candidate that can be chosen by state $S_i$, and $q[S_i, A_j]$ is the Q-value for the state-action pair $(S_i, A_j)$. The number of state-action pairs for each state $S_i$ equals the number of elements in the action set; i.e., each antecedent has $N$ possible consequences. Every fuzzy rule needs to choose an action $A_i$ from the action candidate set $A$ by an action selection policy. In the FQL, the action selection policy for each fuzzy rule may be select-max or another exploration strategy. To defuzzify the $M$ fuzzy rules, the inferred action $a(x)$ for the input vector $x$ is expressed as

$$a(x) = \frac{\sum_{i=1}^{M} \alpha_i \times A_i}{\sum_{i=1}^{M} \alpha_i}, \tag{3}$$

where $\alpha_i$ is the truth value of the rule representation of FQL for state $S_i$. Additionally, the Q-value for the state-action pair $(x, a(x))$ is given by

$$Q(x, a(x)) = \frac{\sum_{i=1}^{M} \alpha_i \times q[S_i, A_i]}{\sum_{i=1}^{M} \alpha_i}. \tag{4}$$
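The truth-weighted averages in (3) and (4) can be computed as in the following sketch. The function name and argument layout are illustrative assumptions: each list holds one entry per fuzzy rule, namely its truth value, the action that rule has selected, and the Q-value of that rule-action pair.

```python
def fql_infer(truth_values, chosen_actions, chosen_q_values):
    """Return (a(x), Q(x, a(x))) as truth-weighted averages over the rules."""
    total = sum(truth_values)
    if total <= 0.0:
        raise ValueError("at least one rule must have a positive truth value")
    action = sum(t * a for t, a in zip(truth_values, chosen_actions)) / total
    q_value = sum(t * q for t, q in zip(truth_values, chosen_q_values)) / total
    return action, q_value


if __name__ == "__main__":
    # Two rules fire with truth values 1 and 3; the second dominates the result.
    print(fql_infer([1.0, 3.0], [2.0, 4.0], [0.5, 1.5]))
```

With truth values 1 and 3, the inferred action is (1·2 + 3·4)/4 = 3.5 and the inferred Q-value is (1·0.5 + 3·1.5)/4 = 1.25.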

For the current system state $x$, after the chosen action $a(x)$ is applied, the next-stage system state is assumed to be $y$, and the system reinforcement signal is given by $c(x, a(x))$. To update the Q-value, the next-stage optimal Q-value, $Q^*(y, a(y))$, is defined as

$$Q^*(y, a(y)) = \frac{\sum_{i=1}^{M} \alpha_i \times q[S_i, a_i^*]}{\sum_{i=1}^{M} \alpha_i}, \tag{5}$$

where $q[S_i, a_i^*]$ is the Q-value of the state-action pair $(S_i, a_i^*)$ and $a_i^* = \arg\max_{A_j} \{q[S_i, A_j]\}$.

According to the Q-learning rule [17], the Q-value update in the FQL can be expressed as

$$q[S_i, a_i] = q[S_i, a_i] + \eta\, \Delta q[S_i, a_i], \tag{6}$$

where $\eta$ is the learning rate, $0 \le \eta \le 1$, and

$$\Delta q[S_i, a_i] = \{c(x, a(x)) + \gamma Q^*(y, a(y)) - Q(x, a(x))\} \times \frac{\alpha_i}{\sum_{k=1}^{M} \alpha_k}. \tag{7}$$

Here $c(x, a(x))$ in (7) is the reinforcement signal and $\gamma$ is the discount factor.
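One update step built from (5) and (7) might look like the sketch below. Several points are assumptions made here rather than statements of the report: the increment is applied in the standard Q-learning form q ← q + η·Δq, the truth values in (5) are evaluated at the next state y, and the function name, argument names, and default values of `eta` and `gamma` are purely illustrative.

```python
def fql_update(q, chosen, truth_x, truth_y, reinforcement, eta=0.1, gamma=0.9):
    """Apply one FQL update in place and return the temporal-difference term.

    q[i][j]    Q-value of rule i and action candidate j
    chosen[i]  index of the action rule i selected in state x
    truth_x    truth values of the rules for the current state x
    truth_y    truth values of the rules for the next state y (assumption)
    """
    total_x = sum(truth_x)
    total_y = sum(truth_y)
    # Q(x, a(x)): truth-weighted Q-value of the actions the rules selected.
    q_current = sum(t * q[i][chosen[i]] for i, t in enumerate(truth_x)) / total_x
    # Q*(y, a(y)): truth-weighted best Q-value of each rule in the next state.
    q_next_best = sum(t * max(q[i]) for i, t in enumerate(truth_y)) / total_y
    td_term = reinforcement + gamma * q_next_best - q_current
    for i, t in enumerate(truth_x):
        # Each rule is credited in proportion to its normalized truth value.
        q[i][chosen[i]] += eta * td_term * t / total_x
    return td_term


if __name__ == "__main__":
    table = [[0.0, 0.0], [0.0, 0.0]]
    print(fql_update(table, [0, 1], [1.0, 1.0], [1.0, 1.0], 1.0, eta=0.5))
    print(table)
```

Rules that did not fire (truth value zero) receive no update, so learning stays local to the region of the state space that was actually visited.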

B. Fuzzy Q-learning-based Residual Capacity Estimator (FQ-RCE)

The FQ-RCE selects three interference measures as input linguistic variables: the received interference power from real-time terminals at the home cell ($I_{h1}$), the received interference power from non-real-time terminals at the home cell ($I_{h2}$), and the received interference power from adjacent cells ($I_o$). Notably, the received interference power in the WCDMA system is a good indicator of system loading because the system capacity is interference-limited; moreover, the interference generated from the home cell can be identified by PN codes and the interference from adjacent cells can be distinguished by long scrambling codes [21]. Accordingly, the system state vector $x$ containing the three linguistic variables input to FQ-RCE is defined as

$$x = (I_{h1}, I_{h2}, I_o). \tag{8}$$

Comprehensive experiments found that five terms for both $I_{h1}$ and $I_o$, and three terms for $I_{h2}$, were appropriate. Hence, their fuzzy term sets are $T(I_{h1})$ = {Largely High, HiGh, MeDium, LoW, Largely Low} = {LH, HG, MD, LW, LL}, $T(I_{h2})$ = {HiGh, MeDium, LoW} = {HG, MD, LW}, and $T(I_o)$ = {Largely High, HiGh, MeDium, LoW, Largely Low} = {LH, HG, MD, LW, LL}. From fuzzy set theory, the fuzzy rule base has dimensions $|T(I_{h1})| \times |T(I_{h2})| \times |T(I_o)|$. Accordingly, $M = 5 \times 3 \times 5 = 75$. On the other hand, the step-wise incremental/decremental action of the interference power budget for the non-real-time services, denoted by $P_{inc}$, is selected as the output linguistic variable. Here, seven levels of increment actions ($N = 7$) are given, and the corresponding fuzzy term set is $T(P_{inc}) = \{PI_1, PI_2, PI_3, PI_4, PI_5, PI_6, PI_7\}$. After the interference increment is estimated by the FQ-RCE, the residual system capacity ($RC$) allocated to the non-real-time services is defined as

$$RC = I_{h2} + P_{inc}, \tag{9}$$

where $I_{h2}$ is the capacity previously assigned to the non-real-time services. Additionally, the reinforcement learning signal $c(x, a(x))$ is defined as

$$c(x, a(x)) = \left[\frac{P_e(x, P_{inc}) - P_e^*}{P_e^*}\right], \tag{10}$$

where $P_e(x, P_{inc})$ is the packet error probability of real-time services for the state-action pair $(x, P_{inc})$, which is a performance measure of the system, and $P_e^*$ is the QoS requirement of the real-time packet error probability.

Figure 1 shows the structure of FQ-RCE as a five-layer adaptive-network-based implementation of a fuzzy inference system. In the FQ-RCE, layers 1 to 3 are the antecedent components of the FIS, while layers 4 and 5 represent the consequent components. The node function in each layer is described as follows.

Layer 1: Every node $k$, $1 \le k \le 13$, in this layer is a term node which represents a fuzzy term of an input linguistic variable, where $k = 1, \cdots, 5$ ($6, 7, 8$) ($9, \cdots, 13$) denotes that node $k$ is the $k$th (($k-5$)th) (($k-8$)th) term in $T(I_{h1})$ ($T(I_{h2})$) ($T(I_o)$). The node function is defined as the bell-shaped membership function for the term. Thus, for an input linguistic variable $x$, the output $O_{1,k}$ is given by

$$O_{1,k} = b(x; m_k, \sigma_k) = e^{-\frac{(x - m_k)^2}{\sigma_k^2}}, \tag{11}$$

where $b(\cdot)$ is the bell-shaped function, and $m_k$ and $\sigma_k$ are the mean and the variance of node $k$, respectively.
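The layer 1 node function in (11) is easy to evaluate directly. In the sketch below the function names are illustrative, and the five (mean, width) pairs are hypothetical placeholders on a normalized interference scale, since the report's actual membership parameters are not given in this section.

```python
import math


def bell_membership(x, mean, sigma):
    """Bell-shaped membership value exp(-(x - mean)^2 / sigma^2)."""
    return math.exp(-((x - mean) ** 2) / sigma ** 2)


def term_memberships(x, terms):
    """Layer 1 outputs of one linguistic variable; terms is a list of (mean, sigma)."""
    return [bell_membership(x, mean, sigma) for mean, sigma in terms]


# Hypothetical term parameters for a five-term variable (LL, LW, MD, HG, LH).
EXAMPLE_TERMS = [(0.0, 0.2), (0.25, 0.2), (0.5, 0.2), (0.75, 0.2), (1.0, 0.2)]

if __name__ == "__main__":
    print(term_memberships(0.6, EXAMPLE_TERMS))
```

The membership value is 1 when the input equals the term's mean and falls to 1/e when the input is one width away from it.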

Layer 2: Every node $k$, $1 \le k \le 75$, in this layer is a rule node which represents the truth value of the $k$th fuzzy rule; it is a fuzzy-AND operator. Here, the product operation is employed as the node function. Since each fuzzy rule has three input linguistic variables, the node output $O_{2,k}$ is the product of the three fuzzy membership values corresponding to the inputs. Therefore, $O_{2,k}$ is given by

$$O_{2,k} = \prod_{l \in P_k} O_{1,l}, \tag{12}$$

where $P_k = \{l \mid l \text{ is a precondition node of the } k\text{th fuzzy rule}\}$.

Layer 3: Every node $k$, $1 \le k \le 75$, in this layer is a normalization node which performs a normalization operation so that all the truth values sum to unity. After the normalization, the output of this node $O_{3,k}$ is given by

$$O_{3,k} = \frac{O_{2,k}}{\sum_{l=1}^{75} O_{2,l}}. \tag{13}$$
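Layers 2 and 3 can be sketched as a product over every combination of input terms followed by a normalization. The function names and the ordering of the rules are assumptions of this sketch; the inputs are the layer 1 membership values of the three linguistic variables (five, three, and five terms).

```python
from itertools import product


def rule_truth_values(mu_h1, mu_h2, mu_o):
    """Layer 2: one product per combination of terms (5 * 3 * 5 = 75 rules)."""
    return [a * b * c for a, b, c in product(mu_h1, mu_h2, mu_o)]


def normalize(truth_values):
    """Layer 3: scale the truth values so that they sum to one."""
    total = sum(truth_values)
    if total <= 0.0:
        raise ValueError("no rule fires for this input")
    return [t / total for t in truth_values]


if __name__ == "__main__":
    mu_h1 = [0.1, 0.7, 0.2, 0.0, 0.0]
    mu_h2 = [0.3, 0.6, 0.1]
    mu_o = [0.0, 0.2, 0.5, 0.3, 0.0]
    normalized = normalize(rule_truth_values(mu_h1, mu_h2, mu_o))
    print(len(normalized), max(normalized))
```

Because the bell-shaped memberships are strictly positive for finite inputs, the normalization denominator is nonzero in practice; the guard only protects against degenerate hand-made inputs such as all-zero lists.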

Layer 4: Every node k, 1 ≤ k ≤ 75, in this layer is an action-select node which represents the consequence part of the kth fuzzy rule. Based on the action selection policy and the Q-values of the possible action candidates ($PI_j$, j = 1, 2, ..., 7), the node needs to choose an appropriate action. Since improper initial fuzzy parameter settings would lead to a bad learning result, the Boltzmann-distributed exploration strategy in [20] is employed to explore the set of all the possible action candidates. In the Boltzmann-distributed exploration, the node chooses the state-action pair $(S_k, a_k)$, $a_k \in T(P_{inc})$, for the kth rule with the probability $\xi(S_k, a_k)$ given by

$$\xi(S_k, a_k) = \frac{e^{q[S_k, a_k]/T}}{\sum_{j=1}^{7} e^{q[S_k, PI_j]/T}}, \tag{14}$$

where T is the temperature which reflects the randomness of action selection. After the action is chosen, the node sends two outputs, $O_{4,k,1}$ and $O_{4,k,2}$, to the action node and the Q-value node in layer 5, respectively. The outputs $O_{4,k,1}$ and $O_{4,k,2}$ are given by

$$O_{4,k,1} = O_{3,k} \times a_k, \tag{15}$$

and

$$O_{4,k,2} = O_{3,k} \times q[S_k, a_k]. \tag{16}$$
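The Boltzmann-distributed selection of Eq. (14) can be sketched in a few lines of Python. This is only an illustration: the Q-values and the temperature are hypothetical, and subtracting the maximum Q-value before exponentiation is a numerical safeguard added here that leaves the probabilities of Eq. (14) unchanged.

```python
import math
import random

def boltzmann_probabilities(q_values, temperature):
    """Selection probability of each candidate action, following Eq. (14)."""
    # Subtracting the maximum keeps the ratios intact and avoids overflow.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def select_action(q_values, temperature, rng=random):
    """Draw the index of one candidate action with the Eq. (14) probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

# Hypothetical Q-values for the seven candidates PI_1 ... PI_7 of one rule.
q_row = [0.1, 0.4, 0.2, 0.9, 0.3, 0.0, 0.5]
probs = boltzmann_probabilities(q_row, temperature=0.5)
chosen = select_action(q_row, temperature=0.5, rng=random.Random(0))
```

A large temperature flattens `probs` toward uniform exploration, while a small one concentrates the probability on the candidate with the largest Q-value.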

Layer 5: This layer has two output nodes, the action node $O_{5,1}$ and the Q-value node $O_{5,2}$, which perform the defuzzification of the FQ-RCE. Herein, the center-of-area method is applied for defuzzification. Since layer 3 normalizes the truth value of the antecedent part of each fuzzy rule, the node functions in layer 5 are summations of the inputs from layer 4. Hence, $O_{5,1}$ and $O_{5,2}$ are given by

$$O_{5,1} = P_{inc} = \sum_{k=1}^{M=75} O_{4,k,1}, \tag{17}$$

and

$$O_{5,2} = Q(x, P_{inc}) = \sum_{k=1}^{M=75} O_{4,k,2}. \tag{18}$$

After the action is performed, the FQ-RCE calculates the reinforcement signal c(x, a(x)) by (10) and updates the Q-value of each state-action pair according to (6).
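To make the layer 4 and layer 5 computations of Eqs. (15) to (18) concrete, the short Python sketch below forms the two outputs from the normalized truth values and the per-rule choices. The three-rule input is a hypothetical toy case; the FQ-RCE described above uses 75 rules.

```python
def defuzzify(strengths, actions, q_values):
    """Layers 4 and 5: weight each rule's choice and sum, Eqs. (15)-(18).

    strengths: normalized truth values O_3,k from layer 3.
    actions:   the action a_k chosen by each rule.
    q_values:  q[S_k, a_k] for the chosen action of each rule.
    """
    p_inc = sum(o3 * a for o3, a in zip(strengths, actions))    # Eq. (17)
    q_out = sum(o3 * q for o3, q in zip(strengths, q_values))   # Eq. (18)
    return p_inc, q_out

# Hypothetical three-rule example.
p_inc, q_out = defuzzify([0.5, 0.3, 0.2], [1.0, 2.0, 4.0], [0.2, 0.4, 0.6])
```

Because the truth values already sum to one, both outputs are weighted averages of the per-rule quantities, which is why no further division is needed in layer 5.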

Notably, the convergence property of Q-learning holds for the single-agent (single-learner) case and may not hold for multiple-agent cases. Additionally, the convergence of Q-learning in multi-cell WCDMA systems is difficult to achieve because the decision policies of all cells change concurrently during the learning phase. To handle this difficulty, the perceptual coordination mechanism [16] is applied to the FQ-RCE through the design of the input linguistic variables, which comprise two parts: $I_{h1}$ and $I_{h2}$ represent the current state of the radio resource usage in the home cell, and $I_o$ represents the radio resource allocations performed in adjacent cells. Therefore, by measuring the adjacent-cell interference, the FQ-RCE at the home cell can implicitly perceive the situation of radio resource allocation (action) in adjacent cells. The multi-cell learning environment can then be simplified to a single-cell environment, and the convergence property of the FQ-RCE holds as a result.

C. The Data Rate Scheduler (DRS)

The DRS modifies the exponential rule scheduling algorithm in [10]. The modified exponential rule is given by

$$j = \arg\max_i \left\{ \frac{r_i}{\bar{r}_i} \times \exp\left( \frac{W_i - \overline{W}}{1 + \sqrt{\overline{W}}} \right) \right\}, \tag{19}$$

where $r_i$, $\bar{r}_i$, and $W_i$ are the link capacity, the average transmission rate, and the waiting time of the ith data terminal, respectively, and $\overline{W}$ is the average waiting time of all the data terminals. The main difference between the modified and the original exponential rules is in the definition of the link capacity. The original exponential rule was proposed for downlink transmission in the CDMA HDR system [9], where the link capacity was defined as the maximum transmission rate under the current link condition. However, in the multi-cell WCDMA environment, the uplink transmission power interferes with adjacent cells. The closer the terminal is to the cell boundary, the larger the interference power. Therefore, the modified exponential rule algorithm sets a guard threshold of adjacent-cell interference for the uplink transmission power such that the adjacent-cell interference it incurs is lower than the pre-defined level. Then, the location-dependent link capacity $r_i$ is defined as the maximum transmission rate available for a radio link, which must satisfy the following condition:

$$P(r_i) \times h_i^a \le P_d, \tag{20}$$

where $P(r_i)$ is the transmission power of terminal i with rate $r_i$, $h_i^a$ is the maximum link gain between terminal i and the adjacent cells, and $P_d$ is the guard threshold of the adjacent-cell interference. In the strength-based power control scheme, the transmission power $P(r_i)$ is given by

$$P(r_i) = \frac{r_i \times (E_b/N_0)^* \times I_{max}}{PG \times h_i}, \tag{21}$$

where $(E_b/N_0)^*$ is the signal-to-noise requirement, $I_{max}$ is the maximum received interference power, PG is the processing gain, and $h_i$ is the link gain between the terminal and its home cell. Additionally, $h_i$ and $h_i^a$ can be measured by monitoring the received pilot strength from the home and adjacent cells. Hence, the modified exponential rule states that the terminal with a higher maximum available transmission rate, a lower average transmitted rate, and a longer delay obtains a higher transmission priority. As the terminal moves toward the cell boundary, the emission power to the adjacent cells increases, the transmission priority falls, and the waiting time accumulates. However, if the terminal’s waiting time is long, the transmission priority is high. Therefore, the modified exponential rule can strike a balance among the link gain, the location, and the waiting time of terminals.
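The interplay of Eqs. (19) to (21) can be sketched in Python as follows. This is an illustrative sketch only: all quantities are taken in linear (not dB) units, the candidate rate set and the dictionary keys are hypothetical, and returning a capacity of zero when no rate satisfies Eq. (20) is an assumption made here.

```python
import math

def link_capacity(rates, h_home, h_adj, eb_n0, i_max, pg, p_d):
    """Largest candidate rate whose adjacent-cell interference obeys Eq. (20)."""
    feasible = []
    for r in rates:
        power = r * eb_n0 * i_max / (pg * h_home)   # Eq. (21)
        if power * h_adj <= p_d:                    # Eq. (20)
            feasible.append(r)
    # Assumption: a terminal with no feasible rate gets capacity zero.
    return max(feasible, default=0.0)

def priority(r, r_avg, wait, wait_avg):
    """Priority metric inside the argmax of Eq. (19)."""
    return (r / r_avg) * math.exp((wait - wait_avg) / (1.0 + math.sqrt(wait_avg)))

def pick_terminal(terminals):
    """Index of the terminal with the highest priority, Eq. (19)."""
    wait_avg = sum(t["wait"] for t in terminals) / len(terminals)
    scores = [priority(t["r"], t["r_avg"], t["wait"], wait_avg) for t in terminals]
    return max(range(len(terminals)), key=scores.__getitem__)

# Hypothetical numbers: a terminal near the boundary (large h_adj) is capped.
cap_edge = link_capacity([1, 2, 4, 8], 0.5, 0.1, 2.0, 1.0, 256.0, 0.01)
cap_center = link_capacity([1, 2, 4, 8], 0.5, 0.01, 2.0, 1.0, 256.0, 0.01)
winner = pick_terminal([
    {"r": 4.0, "r_avg": 2.0, "wait": 1.0},
    {"r": 4.0, "r_avg": 2.0, "wait": 9.0},
])
```

In this toy case the boundary terminal is limited to a lower rate than the central one, and of two terminals with equal rate ratios the one that has waited longer wins the argmax.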

The DRS performs the rate allocation according to the terminal’s priority. The terminal with the highest priority is given its rate allocation first, and the other terminals are given their allocations in priority order. The operation of the DRS stops when the whole power budget for data has been used up. Its procedure is described below:

[The DRS Algorithm]

Step 1 Obtain the residual system capacity (RC) for non-real-time services from the FQ-RCE.

Step 2 Choose the highest-priority terminal, j, out of the data terminals that are not allocated yet, by (19).

Step 3 Compute the remaining RC by RC = RC − P(r_j)/PG. If the remaining RC is larger than 0, go back to Step 2. Otherwise, go to Step 4.

Step 4 Inform the terminals of the assigned data rates via the signaling channel. End.
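The four steps above can be sketched as a short Python loop. This is a hedged illustration rather than the authors' implementation: the dictionary keys are hypothetical, the priorities are assumed to be computed beforehand by (19) and to stay fixed during one scheduling run (so sorting once is equivalent to repeating Step 2), and stopping when no terminals remain is an assumption added here.

```python
def drs_allocate(residual_capacity, terminals, pg):
    """Follow Steps 1-4: serve terminals in priority order until RC runs out.

    terminals: list of dicts with hypothetical keys "id", "priority"
               (the metric of (19)), "rate" (r_j) and "power" (P(r_j)).
    Returns a mapping from terminal id to assigned rate.
    """
    rc = residual_capacity                      # Step 1: RC from the FQ-RCE
    pending = sorted(terminals, key=lambda t: t["priority"], reverse=True)
    assigned = {}
    for t in pending:                           # Step 2: next highest priority
        assigned[t["id"]] = t["rate"]
        rc -= t["power"] / pg                   # Step 3: RC = RC - P(r_j)/PG
        if rc <= 0:                             # no capacity left: go to Step 4
            break
    return assigned                             # Step 4: rates to be signaled

# Hypothetical example with three data terminals.
grants = drs_allocate(
    1.0,
    [
        {"id": "a", "priority": 3.0, "rate": 8, "power": 1.0},
        {"id": "b", "priority": 1.0, "rate": 2, "power": 1.0},
        {"id": "c", "priority": 2.0, "rate": 4, "power": 1.0},
    ],
    pg=2.0,
)
```

As in the written procedure, the terminal whose allocation exhausts the residual capacity is still served; only the terminals after it are left without a rate in this round.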

IV. SIMULATION RESULTS AND DISCUSSION

In the simulations, a concatenated 19-cell (N=19) environment was configured as the multi-cell WCDMA system. The central cell was labelled as cell 1, the cells in the first tier were cell 2 ∼ cell 7, and the cells in the second tier were cell 8 ∼ cell 19. Three kinds of real-time traffic were considered: voice traffic, high-bursty real-time data traffic, and low-bursty real-time data traffic. The voice traffic was assumed to be 2-level transmission rate traffic, modelled by a 2-level MMDP (Markov modulated deterministic process) [22]. The real-time data traffic was modelled by an ON/OFF traffic stream with specific burstiness $1/\rho_h$ ($1/\rho_l$) and peak rate $R_{p,h}$ ($R_{p,l}$) for high-bursty (low-bursty) real-time traffic. The two real-time data traffic flows had the same mean rate but different burstiness levels. The non-real-time data traffic was considered to have a Poisson arrival process with the data burst length in a geometric distribution. Table I shows all the detailed traffic parameters. A basic rate in the WCDMA system is assumed to be a physical channel with SF=256. For each connection, the DPCCH is always active to maintain the connection reliability. To reduce the interference overhead produced by DPCCHs, the transmitting power of a DPCCH was assumed to be 3 dB lower than that of its respective DPDCH. The QoS requirement on the packet error probability, $P_e^*$, is set to 0.01.

The conventional resource reservation scheme proposed in [7], LIDA (load and interference demand assignment), was used as a benchmark for performance comparison. The basic concept of the LIDA scheme is twofold: first, a portion of the interference power budget, β, is reserved to avoid overloading, and second, a burst-mode admission is applied for the high-rate traffic. Additionally, the transmission power increment, $P_{inc}$, allocated to the non-real-time data traffic in the LIDA scheme is given by

$$P_{inc} = (1 - \beta) I_{max} - I_{h1} - I_{h2} - I_o. \tag{22}$$

The performance of the LIDA scheme relies heavily on the choice of the reservation threshold β. The simulations considered three reservation thresholds, β = 0%, 5%, and 10%, and the modified exponential rule with $P_d$ = 2 dB was applied for the LIDA scheme. Moreover, a scheme which combines the FQ-RCE with the original exponential rule, called FQ-RCE/EXP, was considered to further evaluate the effectiveness of the modified exponential rule. Notably, all the considered schemes were applied only to non-real-time terminals, and all the real-time terminals initiated data transmission whenever they had packets in their queues.
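To show how the reservation threshold enters the LIDA allocation of Eq. (22), here is a minimal Python sketch. The interference values are hypothetical numbers in arbitrary linear units, used only to show that a larger β leaves a smaller increment for non-real-time traffic.

```python
def lida_increment(beta, i_max, i_h1, i_h2, i_o):
    """Power increment granted to non-real-time traffic, following Eq. (22)."""
    return (1.0 - beta) * i_max - i_h1 - i_h2 - i_o

# Hypothetical interference levels; beta takes the three simulated thresholds.
increments = [lida_increment(b, 10.0, 3.0, 2.0, 1.0) for b in (0.0, 0.05, 0.10)]
```

With a fixed reservation the increment ignores how bursty the current load is, which is the behaviour the FQ-RCE is meant to improve on by learning the residual capacity instead.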

A. Homogeneous Case

In the homogeneous case, all cells are assumed to contain 22 voice terminals, 40 real-time data terminals, and 20 non-real-time data terminals. The 40 real-time data terminals consist of $N_{D,h}$ high-bursty and $N_{D,l}$ low-bursty data users, where $N_{D,h} + N_{D,l} = 40$.

Figure 2 shows the packet error probabilities versus the number of high-bursty real-time data users. The packet error probability of the LIDA scheme was found to violate the QoS requirement, and the LIDA scheme without reservation (β=0%) had the largest packet error probability. The results demonstrate the necessity of precise residual capacity estimation to avoid overloading in the multi-cell WCDMA environment. The packet error probabilities of the FQ-SDAM and FQ-RCE/EXP schemes always fulfill the QoS requirement because the FQ-RCE adopts the FQL, which inherently possesses the capability of reinforcement learning. Thus, the FQ-RCE can precisely determine the residual system capacity by monitoring the loading status of the home cell and the interference variation of adjacent cells. Additionally, regardless of the value of $N_{D,h}$, the FQ-SDAM scheme always achieves lower packet error probabilities than FQ-RCE/EXP because the uplink transmission powers emitted from terminals interfere with users at the home cell and adjacent cells in the multi-cell environment. With the awareness of the location of users, the modified exponential rule in FQ-SDAM effectively keeps the interference imposed on adjacent cells within a sustainable level and consequently reduces the packet error probabilities.

Figure 3 shows the aggregate throughput of non-real-time data traffic versus three numbers of high-bursty real-time users: $N_{D,h}$ = 10, 20, and 30. The three cases of different real-time data users were used to simulate the low-bursty, medium-bursty, and high-bursty scenarios. Here, the performance of the LIDA scheme with β=0% was not considered due to its QoS violation. FQ-SDAM was found to achieve the highest data throughput for non-real-time services, while LIDA with β=10% produced the lowest throughput. Compared with the LIDA scheme with β=10%, the FQ-SDAM, FQ-RCE/EXP, and LIDA with β=5% improved the throughput by 75.3%, 73.3%, and 52.9% (53.3%, 51.1%, and 49.2%), respectively, in the low-bursty (medium-bursty) case. In the high-bursty case, under the QoS constraint, the FQ-SDAM and FQ-RCE/EXP schemes improved the throughput over LIDA with β=10% by 16.8% and 10.7%, respectively, because FQ-SDAM approaches the desired transmission target ($P_e^*$ = 0.01) by fuzzy Q-learning. According to the definition of the reinforcement signal c(x, a(x)), FQ-SDAM tries to allocate the maximum possible resource under the QoS requirement. By contrast, LIDA with β=10% is a conservative scheme, which has the lowest packet error probability at the expense of wasted capacity. Additionally, in the three cases, FQ-SDAM achieved a higher aggregate throughput than FQ-RCE/EXP by 1.4%, 1.43%, and 5.5%, respectively. As the number of high-bursty real-time users goes up, the performance gain rises because the modified exponential rule considers the terminal’s interference influence on adjacent cells and accordingly cuts the packet error probability in the multi-cell WCDMA environment. With a reinforcement signal containing a lower packet error probability, the FQ-RCE tends to allocate more capacity in the next-turn decision during the fuzzy Q-learning period; consequently, the data throughput increases as more packets are successfully transmitted.

B. Non-homogeneous Case

In the non-homogeneous case, the real-time data terminals for the first-tier cells (cell 2 to cell 8) are $N_{D,h} = 25 - 2(i - 1)$ and $N_{D,l} = 40 - N_{D,h}$, i = 2, ..., 8, while for the central and second-tier cells, the real-time data terminals are $N_{D,h} = N_{D,l} = 20$.

Figure 4 shows the packet error probabilities of the three tiers in the multi-cell WCDMA system. As the figure reveals, only FQ-SDAM, FQ-RCE/EXP, and LIDA with β=10% meet the QoS requirement. FQ-SDAM and FQ-RCE/EXP do so because they consider the received adjacent-cell interference power as an input parameter for resource estimation. The resource allocation in the adjacent cells is perceived by observing the interference fluctuation. Consequently, the resource allocations between cells are implicitly coordinated. Additionally, compared to Fig. 2 at $N_{D,h}$ = 20, the packet error probability in the non-homogeneous case is larger than that in the homogeneous case because the fluctuation of the received adjacent-cell interference in the non-homogeneous case differs from cell to cell when the cells compete for the residual capacity in the multi-cell environment. Without coordination, each cell allocates myopically, causing the system to overload.

Figure 5 shows the aggregate throughputs of non-real-time data traffic in the three tiers of the multi-cell WCDMA system. Here, the aggregate throughputs of LIDA with β=0% and β=5% are not considered due to their QoS violations. The aggregate throughput in the non-homogeneous case is smaller than that in the homogeneous case due to the higher interference fluctuation. Also, the FQ-SDAM and FQ-RCE/EXP schemes still achieve higher aggregate throughput than the LIDA scheme with β = 10%, by 31.53% and 28.346% (35.5% and 33.63%) (34.2% and 32%) for the cells in the central (first) (second) tier.

REFERENCES

[1] K. Das and S. D. Morgera, “Interference and SIR in integrated voice/data wireless DS-CDMA networks - a simulation study,” IEEE J. Select. Areas Commun., vol. 15, no. 8, pp. 1527-1537, Oct. 1997.

[2] T. K. Liu and J. A. Silvester, “Joint admission/congestion control for wireless CDMA systems supporting integrated services,” IEEE J. Select. Areas Commun., vol. 16, no. 6, pp. 845-857, Aug. 1998.

[3] C. Comaniciu and N. B. Mandayam, “Delta modulation based prediction for access control in integrated voice/data CDMA systems,” IEEE J. Select. Areas Commun., vol. 18, no. 1, pp. 112-122, Jan. 2000.

[4] A. Sampath and J. M. Holtzman, “Access control of data in integrated voice/data CDMA systems: benefits and tradeoffs,” IEEE J. Select. Areas Commun., vol. 15, no. 8, pp. 1511-1526, Oct. 1997.

[5] C. Comaniciu, N. B. Mandayam, D. Famolari, and P. Agrawal, “Wireless access to the world wide web in an integrated CDMA system,” IEEE J. Select. Areas Commun., vol. 2, pp. 472-483, May 2003.

[6] L. Chen, H. Kayama, and N. Umeda, “Power resource cooperation control considering wireless QoS for CDMA packet mobile communication systems,” IEEE Int’l Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2002), vol. 3, pp. 1092-1096.

[7] S. Kumar and S. Nanda, “High data-rate packet communication for cellular network using CDMA: algorithm and performance,” IEEE J. Select. Areas Commun., vol. 17, no. 3, pp. 472-492, Mar. 1999.

[8] S. Ramakrishna and J. M. Holtzman, “A scheme for throughput maximization in a dual-class CDMA system,” IEEE J. Select. Areas Commun., vol. 16, no. 6, pp. 830-844, Aug. 1998.

[9] A. Jalali, R. Padovani, and R. Pankaj, “Data throughput of CDMA-HDR a high efficiency-high data rate personal communication wireless system,” IEEE VTC2000-Spring, Tokyo, May 2000, pp. 1854-1858.

[10] S. Shakkottai and A. L. Stolyar, “Scheduling algorithms for a mixture of real-time and non-real-time data in HDR,” 17th International Teletraffic Congress (ITC-17), Sep. 2001.

[11] L. Wang (Ed.), Soft Computing in Communications, Springer, 2003.

[12] Y. S. Chen and C. J. Chang, “A resource allocation scheme using adaptive-network based fuzzy control for mobile multimedia network,” IEICE Trans. Commun., vol. E85-B, no. 2, pp. 502-513, Feb. 2002.

[13] J. Nie and S. Haykin, “A Q-learning-based dynamic channel assignment technique for mobile communication systems,” IEEE Trans. Veh. Technol., vol. 48, no. 5, pp. 1676-1687, Sep. 1999.

[14] Y. S. Chen, C. J. Chang, and F. C. Ren, “A Q-learning-based multi-rate transmission control scheme for RRM in multimedia WCDMA systems,” IEEE Trans. Veh. Technol., vol. 53, no. 1, pp. 38-48, Jan. 2004.

[15] L. Jouffe, “Fuzzy inference system learning by reinforcement methods,” IEEE Trans. Syst. Man Cybern., vol. 28, no. 3, pp. 338-355, Aug. 1998.

[16] O. Abul, F. Polat, and R. Alhajj, “Multiagent reinforcement learning using function approximation,” IEEE Trans. Syst. Man Cybern., vol. 30, no. 4, pp. 485-497, Nov. 2000.

[17] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, pp. 279-292, 1992.

[18] S. Ariyavisitakul and L. F. Chang, “Signal and interference statistics of a CDMA system with feedback power control,” IEEE Trans. Commun., no. 11, pp. 1626-1634, Nov. 1993.

[19] G. L. Stüber, Principles of Mobile Communication, Kluwer Academic Publishers, 1996.

[20] S. Haykin, Neural Networks, 2nd ed., Prentice Hall, 1999.

[21] 3rd Generation Partnership Project, Spreading and Modulation (FDD), 3GPP TS 25.213, Sep. 2002. [On-line] http://www.3gpp.org.

[22] P. T. Brady, “A model for on-off speech patterns in two-way conversation,” Bell Syst. Tech. J., vol. 48, pp. 2445-2472, Jan. 1969.

[23] J. L. Huertas, S. Sanchez-Solano, I. Baturone, and A. Barriga, “Integrated circuit implementation of fuzzy controllers,” IEEE J. Solid-State Circuits, vol. 31, no. 7, pp. 1051-1058, Jul. 1996.


Fig. 1. Structure of FQ-RCE


TABLE I

TRAFFIC PARAMETERS IN THE MULTI-CELL WCDMA SYSTEM

Traffic Type                          Traffic Parameters
2-level real-time voice               Mean talkspurt duration: 1.00 seconds
                                      Mean silence duration: 1.35 seconds
High-bursty real-time data traffic    Peak rate (R_p,h): 4-fold of basic rate
                                      Mean rate: 1-fold of basic rate
                                      ρ_h: 0.25
Low-bursty real-time data traffic     Peak rate (R_p,l): 2-fold of basic rate
                                      Mean rate: 1-fold of basic rate
                                      ρ_l: 0.5
Non-real-time data traffic            Mean data burst size: 200 packets
                                      r_min: 1-fold of basic rate
                                      r_max: 8-fold of basic rate


Fig. 4. Packet error probabilities: non-homogeneous case


Chapter 3

A Cellular Neural Network and Utility-based Scheduler for Multimedia CDMA Cellular Networks

Scott Shen and Chung-Ju Chang
Department of Communication Engineering
National Chiao Tung University
Hsinchu 300, Taiwan ROC
E-mail: [email protected]
Tel. No.: 886-3-5731923
Fax No.: 886-3-5710116

Abstract

In this paper, a cellular neural network and utility (CNNU)-based scheduler is proposed for multimedia CDMA cellular networks supporting differentiated quality-of-service (QoS). The cellular neural network is powerful for complicated optimization problems and has been proven to converge rapidly to a desired equilibrium; the utility-based scheduling algorithm can efficiently utilize the radio resource of the system and provide QoS guarantees and fairness for connections. A relevant utility function for each connection is defined here as its radio resource function weighted by both a QoS requirement deviation function and a fairness compensation function. The CNNU-based scheduler determines a radio resource assignment vector for all connections so that the overall system utility is maximized and the system throughput is as high as possible. At the same time, the performance measures of all connections are kept close to their QoS requirements in an efficient way.

I. INTRODUCTION

In future wireless networks, heterogeneous and customized services with diverse traffic characteristics and QoS requirements are expected to be provided via a number of air interfaces. Also, multimedia applications are commonly accepted as enabling services, which are categorized into several classes [1]. To meet the various traffic characteristics and QoS requirements of these potential applications, a sophisticated scheduling algorithm plays an essential role so that the system resource allocation is optimal while the pre-defined QoS requirements and the fairness among the applications are retained.

Many scheduling algorithms have been widely studied for wireline networks [2]-[3]. In wireless communication networks, radio channels have quite different characteristics from wireline links. The transmission error probability is several orders of magnitude greater than that in wireline links, and the maximum transmission rate available to each connection is location-dependent and time-varying due to link loss, shadowing, and multi-path fading. The QoS requirements and the weighted fairness among all connections should be modified accordingly.

The literature has studied resource scheduling and allocation among connections with consideration of physical layer processing, power control range, and link conditions [4]-[5]. Bharghavan, Lu, and Nandagopal [6] proposed a framework to achieve long-term fairness in wireless networks. Varsou and Poor [7] proposed another class of scheduling algorithms derived from the earliest-deadline-first (EDF) concept for the wireless environment; this class of schemes considers the delay bound as its QoS requirement. In [8], a throughput-optimal scheduling algorithm for delay-bounded systems was proposed and proved. Shakkottai and Stolyar [10] considered both link quality and QoS requirements as the criteria and derived the exponential form of the scheduling function via fluid Markovian techniques. Many of the scheduling algorithms above, [4]-[5] and [8]-[10], were formulated as utility-based approaches.

The utility-based scheduling algorithm over radio channels is usually formulated as a complicated constrained optimization problem with a real-time requirement. To solve this optimization problem, the class of generalized Hopfield neural networks (HNN) has been adopted for real-time tasks, but with several inherent deficiencies. A special type of HNN, named the cellular neural network (CNN) and proposed in [11], has been proven to converge rapidly to a desired equilibrium on a vertex along the prescribed trajectories when properly designed [12]. The CNN has been widely applied in the image processing field and is suitable for VLSI implementation. However, to adopt the CNN technique for the scheduling optimization problem, modifications of its architecture are necessary.

In this paper, we propose a CNN and utility (CNNU)-based scheduler for the downlink in multimedia CDMA cellular networks. The CNNU-based scheduler contains a utility function (UF) preprocessor, a radio-resource range (RR) decision maker, and a CNN processor. Notably, the utility function for each connection, adopted in the UF preprocessor, jointly considers radio resource efficiency, diverse QoS requirements, and fairness. It is a radio resource function weighted by both its QoS requirement deviation function and its fairness compensation function. The UF preprocessor generates a matrix of normalized utility functions of all connections. On the other hand, the RR decision maker determines a matrix showing the upper limit of radio resource assignment for each connection. The CNN processor receives the two matrices as inputs and determines an optimal normalized radio resource assignment vector for connections in multimedia CDMA cellular systems by minimizing the system cost function, which is expressed in terms of the overall system utility function under the system constraints of maximum transmission power, minimum spreading factor, and remaining queue length. The architecture of the CNN is constructed via the energy-based approach [13]-[14] by mapping the system cost function to a proper energy function. It is designed in a two-layered configuration, which consists of a decision layer and an output layer, to reduce the number of inter-connections in the CNN. It can be shown that the stable equilibria are located in the desired state space and that stability exists. The performance of the proposed CNNU-based scheduler is investigated by comparison with the Exponential Rule [10] for systems using both dedicated and shared channels. Results show that the CNNU-based scheduler is efficient and effective for multimedia CDMA cellular networks.

The rest of the paper is organized as follows. Section II presents the features and the operations of the considered system. In Section III, a relevant utility function is proposed. In Section IV, the architecture of the CNNU-based scheduler and the structure of the CNN are discussed. Finally, simulation results and concluding remarks are summarized in Section ??.

II. SYSTEM MODEL

Assume that there are N real-time (RT) and non-real-time (NRT) connections (users) in the downlink transmissions of the multimedia CDMA cellular system with chip rate W. RT connections transmit on dedicated channels and NRT connections transmit on shared channels. For every active connection using either dedicated or shared channels, a fixed number of code channels with their corresponding spreading factors are given in the connection setup phase. A minimum spreading factor $SF_i$ is therefore associated with the assigned code channels for connection i. The system radio resource is defined here to be the transmission power. It is limited by a maximum power budget denoted by $P^*_{max}$ and scheduled to all connections every frame time period $T_f$.

For a downlink connection i, there are four QoS requirements, defined at either the packet level, such as $BER^*_i$, or the call level, such as the delay bound $D^*_i$, the packet dropping ratio $P^*_{D,i}$, and the minimum transmission rate $R^*_{m,i}$. For RT connections, a hard delay bound $D^*_i$ exists and $P^*_{D,i}$ can be larger than zero, while for NRT connections, no explicit delay bound is imposed, but $R^*_{m,i} > 0$ should be satisfied for interactive connections and $R^*_{m,i} = 0$ is set for best-effort connections.

For an RT connection i, a transmission suspension in a soft fashion is carried out by allocating zero transmission power when its utility calculated by the scheduler is lower than those of NRT connections. At that moment, its link gain $\zeta_i(t)$ is lower than the averaged mean link gain of all NRT connections, $\zeta_{NRT}$, by a relative margin, and this relative margin should be chosen to restrict the probability of transmission suspension below $P^*_{D,i}$ because of the delay-sensitive nature of RT traffic. Denote by $\zeta^*_i$ the suspension threshold of connection i, which is obtained by $P\{\zeta_i(t) \le \zeta^*_i\} \le P^*_{D,i}$. Then the relative margin of $\zeta_i(t)$ is a function of $\zeta_{NRT}$ and $\zeta^*_i$, and is dependent on the design of the scheduling algorithm. For NRT connections, transmissions are scheduled so that NRT connections are allocated proper radio resource to achieve high system utilization and to keep the fairness and the QoS requirements fulfilled as much as possible.

Assume that the link gain $\zeta_i(t)$ and the interference $I_i(t)$ for connection i at time t can be measured at the user side and perfectly signaled to the base station. The link gain $\zeta_i(t)$ consists of the mean path loss, long-term fading, and short-term fading, and is given by

$$\zeta_i(t) = d_i^{-4} \cdot 10^{\zeta_i^L(t)/10} \cdot \zeta_i^S(t),$$

where $d_i$ is the distance between user i and its base station, $\zeta_i^L(t)$ is the log-normal shadowing component, and $\zeta_i^S(t)$ is the Rayleigh-fading component. Adaptive QAM modulation is adopted, and the modulation order $M_{\kappa_i}$ with index $\kappa_i$ for connection i is determined according to the link gain quality and interference. The traffic source of connection i generates packets, which are queued in its individual buffer. The buffer size is infinite. The source models are assumed to be on-off for RT connections, Pareto for NRT interactive (NRT-I) connections, and batch Poisson with truncated geometrical batch size for NRT best-effort (NRT-B) connections.
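The link gain model above is a product of three factors, which the following small Python sketch evaluates for given component values. The function and argument names are hypothetical, the shadowing term is taken in dB and the fading term as a linear power gain, and no particular shadowing or fading distribution parameters are implied.

```python
def link_gain(distance, shadowing_db, fading):
    """Link gain: d^-4 path loss times shadowing (dB) times fading (linear)."""
    return distance ** -4 * 10.0 ** (shadowing_db / 10.0) * fading

# Hypothetical values: 10 distance units, 10 dB of shadowing gain, unit fading.
gain_plain = link_gain(10.0, 0.0, 1.0)
gain_shadowed = link_gain(10.0, 10.0, 1.0)
```

In a simulation the shadowing and fading arguments would be drawn from the log-normal and Rayleigh models named above for every frame.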

The proposed CNNU-based scheduler determines an optimal normalized radio resource assignment vector $c^*(t) = (c^*_1(t), \ldots, c^*_N(t))$ for the N connections by maximizing an overall system utility function. The transmission rate for connection i at the t-th frame, denoted by $r_i(t)$, is then $c_i(t) \cdot R_i(t)$, where $R_i(t)$ is the radio resource function of connection i.

III. FORMULATION OF THE UTILITY FUNCTION

The utility function for connection i, $U_i(t)$, is defined as the radio resource function of connection i, $R_i(t)$, weighted by its QoS requirement deviation function $A_i(t)$ and its fairness compensation function $F_i(t)$. It can be expressed as

$$U_i(t) = R_i(t) \cdot A_i(t) \cdot F_i(t). \tag{III.1}$$

A. Radio Resource Function $R_i(t)$

With the modulation order $M_{\kappa_i}$ of the adaptive QAM modulation scheme and the corresponding $(E_b/N_0)^*_{\kappa_i}$ to satisfy the $BER^*_i$ requirement for connection i, the following inequality should hold:

$$\frac{W}{R_{s,i}(t)} \cdot \frac{c_i(t) \cdot P^*_{max} \cdot \zeta_i(t)}{I_i(t)} \ge \left(\frac{E_b}{N_0}\right)^*_{\kappa_i}, \tag{III.2}$$

where $R_{s,i}(t)$ is its symbol rate and $c_i(t)$ is its normalized radio resource assignment at time t. The $I_i(t)$ in (III.2) is given by $[(1 - \alpha) P^*_{max} \cdot \zeta_i(t) + \sum_b P^*_{max} \cdot \zeta_{i,b}(t) + N_0 W]$, where α is the orthogonality factor for the downlink, b is the index referring to the adjacent base stations, and $\zeta_{i,b}(t)$ is the link gain from base station b to connection i. The $(E_b/N_0)^*_{\kappa_i}$ in (III.2) is given by $\frac{-(M_{\kappa_i} - 1) \cdot \ln\{5 BER^*_i\}}{1.5}$. We denote the maximum achievable symbol rate that can fulfill the $(E_b/N_0)^*_{\kappa_i}$ at $c_i(t) = 1$ by $R^*_{s,i}(t)$. Clearly, $R^*_{s,i}(t) = \frac{W}{(E_b/N_0)^*_{\kappa_i}} \cdot \frac{P^*_{max} \cdot \zeta_i(t)}{I_i(t)}$. The $R^*_{s,i}(t)$ is further limited by $W/SF_i$ for a given spreading factor $SF_i$ of the allocated code channel. Thus $R^*_{s,i}(t)$ can be obtained by

$$R^*_{s,i}(t) = \min\left\{ \frac{W}{(E_b/N_0)^*_{\kappa_i}} \cdot \frac{P^*_{max} \cdot \zeta_i(t)}{I_i(t)},\ \frac{W}{SF_i} \right\}. \tag{III.3}$$

According to (III.3), the most efficient modulation order $M_{\kappa_i}$ is selected by the following inequality:

$$M_{\kappa_i} \le SF_i \cdot \frac{P^*_{max} \cdot \zeta_i(t)}{I_i(t)} \cdot \frac{1.5}{-\ln\{5 BER^*_i\}} + 1 \le M_{(\kappa_i + 1)}. \tag{III.4}$$

Since the number of information bits in one symbol is $\log_2 M_{\kappa_i}$, the radio resource function of connection i, $R_i(t)$, can be obtained by

$$R_i(t) = \frac{1.5 W \cdot \log_2 M_{\kappa_i}}{(M_{\kappa_i} - 1) \cdot \left( \ln\frac{1}{BER^*_i} - \ln 5 \right)} \cdot \frac{P^*_{max} \cdot \zeta_i(t)}{I_i(t)}. \tag{III.5}$$

Note that once the radio resource assignment $c_i(t)$ for connection i is determined, the transmission rate $r_i(t)$ is equal to $c_i(t) \cdot R_i(t)$.
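The chain from the required $(E_b/N_0)^*$ to the radio resource function can be traced numerically with the Python sketch below. It is an illustration under stated assumptions: `snr` is a hypothetical name for the ratio $P^*_{max} \cdot \zeta_i(t) / I_i(t)$ in linear units, the set of QAM orders is an example, and falling back to the lowest order when none satisfies (III.4) is a choice made here rather than something the text specifies.

```python
import math

def required_eb_n0(m, ber):
    """(Eb/N0)* for QAM order m at the target BER, in linear scale."""
    return -(m - 1) * math.log(5.0 * ber) / 1.5

def select_order(orders, sf, snr, ber):
    """Largest order M with M <= SF * snr * 1.5 / (-ln(5 BER)) + 1, as in (III.4)."""
    bound = sf * snr * 1.5 / (-math.log(5.0 * ber)) + 1.0
    feasible = [m for m in orders if m <= bound]
    # Assumption: use the lowest order if none satisfies the bound.
    return max(feasible) if feasible else min(orders)

def radio_resource(w, m, snr, ber):
    """R_i(t) of (III.5): bit rate available at c_i(t) = 1."""
    denom = (m - 1) * (math.log(1.0 / ber) - math.log(5.0))
    return 1.5 * w * math.log2(m) * snr / denom

# Hypothetical setting: 3.84 Mchip/s, SF = 16, target BER = 1e-3.
W_CHIP = 3.84e6
order_low = select_order((4, 16, 64), sf=16, snr=2.0, ber=1e-3)
order_high = select_order((4, 16, 64), sf=16, snr=20.0, ber=1e-3)
rate = radio_resource(W_CHIP, order_low, snr=2.0, ber=1e-3)
```

A better link (larger `snr`) admits a higher modulation order, and the transmission rate then follows as $c_i(t) \cdot R_i(t)$ once the scheduler has fixed $c_i(t)$.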


B. The QoS Requirement Deviation Function $A_i(t)$

The QoS requirement deviation function $A_i(t)$ indicates the extent to which connection i deviates from its call-level QoS requirements. For an RT connection i, a hard delay bound $D^*_i$ is imposed on each packet. Since QoS over the wireless interface can be provided in a soft fashion, the QoS guarantee on the packet dropping ratio due to excess delay is expressed by $\mathrm{Prob}\{D_i(t) > D^*_i\} < P^*_{D,i}$, where $D_i(t)$ is the waiting time of the head-of-line packet at time t. For an NRT interactive (NRT-I) connection i, a different notion of QoS requirement applies: a minimum transmission rate must be guaranteed by $E[r_i(t)] \ge R^*_{m,i}$. As for an NRT best-effort (NRT-B) connection i, no call-level QoS requirements are guaranteed and $R^*_{m,i}$ is set to 0.

According to [18], which proposed the Modified Largest Weighted Delay First (M-LWDF) algorithm, an exponential rule [10] is a throughput-optimal form under the above call-level QoS requirement constraints. Therefore, the QoS requirement deviation function $A_i(t)$ is defined as

$$
A_i(t) =
\begin{cases}
\exp\left( \dfrac{\frac{-\log(P_{D,i}^*)}{D_i^*} \cdot D_i(t) - \bar{D}(t)}{1 + [\bar{D}(t)]^{1/2}} \right), & \text{if } i \in \{\text{RT}\}, \\[2ex]
\exp\left( \dfrac{\hat{L}_i(t) - \bar{L}(t)}{1 + [\bar{L}(t)]^{1/2}} \right), & \text{if } i \in \{\text{NRT-I}\}, \\[2ex]
1, & \text{if } i \in \{\text{NRT-B}\},
\end{cases}
\tag{III.6}
$$

where $\bar{D}(t) = \frac{1}{N} \sum_i \frac{-\log(P_{D,i}^*)}{D_i^*} \cdot D_i(t)$ is the average weighted delay, $\hat{L}_i(t) = \hat{L}_i(t-1) + \frac{R_{m,i}^* - r_i(t)}{R_{m,i}^*}$ is the normalized measurement of the difference between the guaranteed minimum transmission rate and the assigned rate, and $\bar{L}(t) = \frac{1}{N} \sum_i \hat{L}_i(t)$. For the RT connections, if the weighted delay exceeds the average weighted delay of all connections, $A_i(t)$ increases exponentially and more resource is scheduled; conversely, if the weighted delay is below the average weighted delay, $A_i(t)$ decays sharply and less resource is allocated. Similarly, for the NRT-I connections, if the accumulated difference between the guaranteed minimum transmission rate and the assigned rate is greater than the average value, more resource is assigned. For the NRT-B connections, this function is simply bypassed.
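The behaviour of the deviation function in (III.6) can be illustrated with a short numerical sketch. It rests on two assumptions that the text does not state: the averages are taken over the connections of the same class, so that N is the number of values passed in, and the average is non-negative so that its square root is defined. All function names are hypothetical.

```python
import math


def weighted_delay(delay, delay_bound, drop_prob_bound):
    """Weighted delay (-log(P*_{D,i}) / D*_i) * D_i(t) of an RT connection."""
    return -math.log(drop_prob_bound) / delay_bound * delay


def update_rate_deficit(prev_deficit, min_rate, assigned_rate):
    """L_hat_i(t) = L_hat_i(t-1) + (R*_{m,i} - r_i(t)) / R*_{m,i} (NRT-I)."""
    return prev_deficit + (min_rate - assigned_rate) / min_rate


def exponential_rule(value, average):
    """exp((value - average) / (1 + sqrt(average))); needs average >= 0."""
    return math.exp((value - average) / (1.0 + math.sqrt(average)))


def deviation_values(values):
    """A_i(t) for one class, given weighted delays (RT) or deficits (NRT-I).

    Assumption: the average runs over the values passed in.
    """
    average = sum(values) / len(values)
    return [exponential_rule(v, average) for v in values]


# NRT-B connections bypass the function: A_i(t) = 1.
NRT_B_DEVIATION = 1.0
```

With two RT connections whose weighted delays are 1.0 and 3.0, the average is 2.0, so the first connection receives a value below 1 and the second a value above 1. When all connections have the same weighted delay, every $A_i(t)$ equals 1.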

C. The Fairness Compensation Function Fi(t)

The fairness compensation function ensures that RT connections using dedicated channels have relative priority over NRT connections using shared channels. It is also the way that


Contents

Mandarin Abstract
English Abstract
Contents
List of Figures
List of Tables

1 Project Overview

2 Situation-Aware Data Access Manager Using Fuzzy Q-learning Technique for Multi-cell WCDMA Systems
    I. Introduction
    II. System Model
    III. Design of FQ-SDMA
    IV. Simulation Results and Discussion

3 [...] Multimedia CDMA Cellular Networks
    I. Introduction
    II. System Model
    III. Formulation of the Utility Function
        B. The QoS Requirement Deviation Function Ai(t)
        C. The Fairness Compensation Function Fi(t)
    IV. Design of the CNNU-Based Scheduler
    V. Simulation Results and Discussion

4 A Novel Dynamic Cell Configuration Scheme in Next-Generation Situation-Aware CDMA Networks
    I. Introduction
    II. Issues of Dynamic Cell Configuration
    III. System Model
    IV. Proposed CDD-RL Scheme
    V. Simulation Results and Discussions
