On the Efficiency of Intelligent Technologies for Next Generation Heterogeneous Networks

Doctoral Dissertation

Graduate Institute of Communication Engineering, College of Electrical Engineering and Computer Science

National Taiwan University

Zanyu Chen (陳贊羽)

Advisor: Tsung-Nan Lin, Ph.D. (林宗男)

July 2015

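Certification by the Oral Examination Committee

On the Efficiency of Intelligent Technologies for Next Generation Heterogeneous Networks

This is to certify that this doctoral dissertation was completed by Zanyu Chen (D98942025) at the Graduate Institute of Communication Engineering, National Taiwan University, and that on July 20, 2015 it was reviewed and approved, and the oral examination passed, by the following examination committee.

Examination committee:

(Signature)

(Advisor)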

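Acknowledgements

For better or worse, I have finally graduated. I am grateful to Professor Tsung-Nan Lin, who gave me a strong push toward the doctoral degree. His encouragement made a great difference when I was at my lowest, and it is what finally allowed this dissertation to be completed.

I thank the junior members of the laboratory for their company in every respect during my doctoral years: 書銘, 育宏 and 琮訓 of R97; 訓甫, 賢祥, 舜橋 and 政偉 of R98; 重良, 彥志, 明甫 and 中超 of R99; 致程, 崇瑋, 俊堎 and 潘叡 of R00; 康林, 大豪, 智彥 and 韋翰 of R01; 洋銘, 瑋時, 柏宇 and 冠翔 of R02; and 三豐 and 昇毅 of R03. I trust that everyone has a bright future after leaving 507.

I also thank 歐肥, keter, 昊呆 and 佳雯, with whom I regularly goofed off, played cards and badminton, went on trips, hiked and went to the sea. They brought a great deal of variety to my doctoral years. Finally, of course, I thank my parents, who always respected my decisions, for their support throughout this time. They kept asking when I would graduate; fortunately, I finally have!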

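Abstract (Chinese version, translated)

Today's heterogeneous networks contain many macro and small base stations, but the current deployment of these base stations cannot satisfy future user demand. Forecasts indicate that by 2020 network capacity will need to be at least 100 times today's in order to meet demand of every kind. Network vendors and operators are therefore carefully weighing the methods available to them for increasing network capacity.

In this coming campaign, three directions deserve consideration: increasing network density, using more spectrum, and techniques that improve spectral efficiency. This dissertation studies the situations that arise when network density is increased, which relies on large numbers of small base stations such as femtocells, picocells and relay nodes. The dissertation is divided into two parts. The first concerns relay node selection in cooperative communication; the second concerns interference management in heterogeneous networks. For the first topic, we propose a fully decentralized algorithm called "Decentralized Learning based Relay Assignment" to solve the relay selection problem in cooperative communication. For the second topic, we propose a scheme called "Multi-Tone Subframes" to mitigate interference in heterogeneous networks.

Keywords: heterogeneous networks, cooperative communication, relay selection, interference management, almost blank subframes, primal-dual interior-point method.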

Abstract

Today's heterogeneous networks, composed mostly of macrocells and small cells, will not be able to meet upcoming traffic demands. Indeed, it is forecast that at least a 100× increase in network capacity will be required to meet the traffic demands of 2020. As a result, vendors and operators are now looking at using every tool at hand to improve network capacity. In this epic campaign, three paradigms are noteworthy: network densification, the use of higher frequency bands, and spectral efficiency enhancement techniques. In this dissertation, we focus on network densification, which deploys many small cells in the network, such as femtocells, picocells and relay nodes. The dissertation is divided into two parts: the first concerns relay node selection in cooperative communication, and the second concerns interference management in heterogeneous networks. We propose a fully decentralized algorithm called "Decentralized Learning based Relay Assignment" to solve the relay assignment problem in cooperative communication. For interference management, we propose an approach called "Multi-Tone Subframes" to mitigate interference in heterogeneous networks.

Keywords: Heterogeneous Network, Cooperative Communication, Relay Selection, Interference Management, Almost Blank Subframe, Primal-Dual Interior-Point Method

List of Figures

1.1 Existing paradigms to improve network capacity: Spectrum (more spectrum), Technology (more spectral efficiency) and Topology (more spatial efficiency).
2.1 The concept of Cell Range Expansion (CRE).
2.2 The transmission power on each subframe of approaches: ABS, RPS and DRPS.
3.1 A Stochastic learning automata.
3.2 State transition diagram of DLRA.
3.3 Markov chain model of DLRA.
3.4 The number of transitions DLRA would take vs. the probability ξ(b).
3.5 The CDF of the capacity of all source nodes for the two algorithms in the environment while Ns < Nr.
3.6 The CDF of the capacity of all source nodes for the two algorithms in the environment while Ns > Nr.
3.7 The 5%-outage, median and mean capacity of DLRA and ORA in both cases.
3.8 […] relay selection result for each source node.
3.9 The variation of aggregate performance of average results with different values of ∆.
3.10 The average convergent time with different values of ∆.
3.11 The state evolution of source node 5.
3.12 The performance evolution for source node 5.
3.13 The probability vector evolution for source node 5 without small scale fading.
3.14 The probability vector evolution for source node 5 with Rayleigh fading.
3.15 Different initialization of P0: (a) Uniform distribution, (b) Non-uniform distribution and the probability of the best state is not equal to 0, and (c) the probability of the best state is equal to 0.
3.16 The CDF of capacity for all algorithms.
3.17 The aggregate capacity for all algorithms.
3.18 The average improvement for all algorithms.
3.19 The fairness index for all algorithms.
4.1 Illustration of coverage area in different transmission.
4.2 The comparison of the capacity: the original case means the maximum transmission power on one subframe and the after case means use 1/3 maximum transmission power on each subframes.
4.3 Example of coverage area of MBS for 7 power levels.
4.4 Effect of t on approximation accuracy. As t increase, the approximation becomes more accurate.
4.5 The sketch map of the performance relation among ABS, RPS and MTS with different levels.
4.6 The topology used in the simple numeric analysis.
4.7 The distance between the MBS and the PBS is 800 meters, and different ABS proportions are evaluated.
4.10 The performance loss rate with different power levels compared to the optimal solution obtained from the interior point relaxed MTS optimization algorithm.
4.11 The user association in the center cell.
4.12 The group of UEs served by the power level P2.
4.14 The group of UEs served by the power level P4.
4.18 Comparison of the capacity of the MBS and the PBS in the original ABS scheme and the proposed MTS approach.
4.19 The average throughput of cells in different approaches.
4.20 […] MTS.
4.21 The distribution of transmission power levels in the system.
4.22 The power consumption of all approaches.
4.23 The over-the-air delay of the ABS scheme and the proposed approaches.

List of Tables

4.1 Definitions of Notations
4.2 Simulation Settings
4.3 Transmission Power Level
4.4 Number of UEs in the Groups

Chapter 1 Introduction

1.1 Technology Trends and Motivations of the Dissertation

The phenomenal growth in mobile broadband has created a massive challenge for the industry: satisfying the thirst for data. Mobile data traffic continues to grow rapidly, and the industry is gearing up to meet it. In view of such significant future traffic demands, the mobile industry has set its targets high and has decided to improve the capacity of today's networks by a factor of 100× or more over the next 20 years, with the most ambitious targets reaching 1000× [1].

In order to achieve this goal, vendors and operators are currently looking at using every tool they have at hand. The existing tools can be classified within the following three paradigms, as illustrated in Fig. 1.1:

• Enhance spatial reuse through network densification, i.e., Heterogeneous Networks (HetNets) and small cells [2–5].

• Use larger bandwidths by exploiting higher carrier frequencies, in both licensed and unlicensed spectrum [6–8].

• Enhance spectral efficiency through […] cooperative communications [10], dynamic TDD techniques [11,12], etc.

Figure 1.1: Existing paradigms to improve network capacity: Spectrum (more spectrum), Technology (more spectral efficiency) and Topology (more spatial efficiency).

Given the different approaches to enhancing network capacity, it is worth understanding how network capacity has been improved in the past and what lessons have been learnt, so that the best choices are made. To this end, [13] reviews the methods used to enhance network capacity from 1950 to 2000, a period in which wireless capacity increased around one million fold. The breakdown of these gains is as follows: a 15× improvement from wider spectrum, a 5× improvement from better Medium Access Control (MAC) and modulation schemes, a 5× improvement from better coding techniques, and an astounding 2700× gain from network densification and reduced cell sizes. According to these data, if we are looking for a 1000× improvement in network performance, network densification through ultra-dense small cell deployments is the most appealing approach, and today's networks have already started going down this path.
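
As a quick check on the figures quoted above, the four gain factors can be multiplied to confirm that they account for the roughly one-million-fold increase. The snippet below is only an arithmetic illustration; the dictionary labels are shorthand introduced here, and the factors are taken as stated in the text.

```python
# Capacity gain factors for 1950-2000 as quoted in the text (attributed to [13]).
gains = {
    "wider spectrum": 15,
    "better MAC and modulation": 5,
    "better coding": 5,
    "network densification": 2700,
}

total_gain = 1
for factor in gains.values():
    total_gain *= factor

print(total_gain)  # 1012500, i.e. roughly one million
```

Densification alone contributes 2700 of the 1,012,500 total, while the other three factors together contribute 375.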

In order to meet the exponentially increasing traffic demands [14], mobile operators are already evolving their networks from traditional macrocellular-only networks to HetNets [15,16], in which small cells reuse the spectrum locally and provide most of the capacity while macrocells provide blanket coverage for mobile UEs. Small cells are currently deployed in large numbers. Indeed, according to recent surveys, the number of small BSs was already larger than that of macro BSs in 2012 [17]. These small cell deployments are mainly in the form of home small cells, known as femtocells [3,4,18], but many operators have also started to deploy outdoor small cell solutions to complement their macrocellular coverage [19].

Since there are many technologies for improving network capacity, we focus on topics in HetNets, which are considered the main technology for meeting the increasing data traffic demands. HetNets include traditional macrocells and low-power small cells. Small cells, such as femtocells, picocells and relay nodes, share the same spectrum resources as macrocells. Interference is therefore the main factor affecting the performance of small cells. In addition, relay nodes are used to increase the cell-edge capacity in HetNets. We would address this […]

1.2 Topic to Be Addressed

In this dissertation, we consider two types of small cells: relay nodes and pico base stations. First, we consider relay nodes and address the relay assignment problem in cooperative communication. Second, we consider a heterogeneous network consisting of a macro BS and pico BSs, and study interference management among these cells.

1.2.1 Cooperative Communication

Spatial diversity, in the form of multiple transceiver antennas, has been shown to be very effective in coping with fading in wireless channels. However, equipping a wireless node with multiple antennas may not be practical, as the footprint of multiple antennas may not fit on a wireless node. To achieve spatial diversity without requiring multiple transceiver antennas on the same node, so-called cooperative communication has been introduced. Under cooperative communication, each node is equipped with only a single transceiver, and spatial diversity is achieved by exploiting the antenna on another node in the network.

In this dissertation, we study the relay node assignment problem in cooperative communication systems, including ad hoc networks and LTE relay networks. We propose a fully decentralized algorithm called "Decentralized Learning based Relay Assignment" (DLRA). We evaluate DLRA from several aspects: mathematical analysis, performance manipulation and computer simulation.

1.2.2 Interference Management

A HetNet is a two-tier network, and co-channel cross-tier interference is a major factor affecting network performance. To handle this problem, 3GPP has proposed a framework called Enhanced Inter-Cell Interference Coordination (eICIC). Almost blank subframes (ABSs) are the major part of eICIC; the concept is to blank some subframes of the interfering tier so that only pilot and system signals are transmitted in them [20]. Because femto BSs (FBSs) and pico BSs (PBSs) are deployed within the coverage of macro BSs (MBSs), the interference from MBSs is severe. Adopting ABSs therefore lets these small BSs perform better during the blanked subframes.

In this work, we address some weak points of the ABS scheme and propose an approach to improve on them. First, the latency of real-time traffic, such as voice, grows as the proportion of ABSs increases. Because MBSs cannot deliver user data during ABS periods, some real-time information cannot be transmitted in time when MBSs are under heavy load; the higher the proportion of ABSs, the longer the latency at the MBSs. Second, the ABS scheme sacrifices some subframes to protect PBSs, which decreases spectrum utilization. This leads to capacity loss at the MBSs and degrades the performance of the users they serve.

1.3 Dissertation Organization

The rest of this dissertation is organized as follows. In Chapter 2, we review literature related to this work. Chapter 3 addresses relay assignment and presents the proposed "Decentralized Learning based Relay Assignment" (DLRA) algorithm. We show the proposed interference […] (MTS) is evaluated in the same chapter. Finally, we conclude this work in Chapter 5.

Chapter 2 Related Work

In this chapter, some works related to the thesis are introduced, and some literature is also reviewed.

2.1 Cooperative Communication

The concept of cooperative communication was pioneered by van der Meulen [21] and by Cover and El Gamal [22]. This section focuses on related work on the relay node selection problem. Zhao et al. [23] show that it is sufficient to choose the best relay node for transmission rather than having multiple relay nodes participate. This work thus considers a scheme in which each source node chooses at most one relay node for transmission.

The problem has been studied in many works and from several different aspects. The basic structure of a cooperative communication system is also discussed in this section. For example, Yang et al. [24] introduce TDD/FDD system frame structures in LTE-Advanced and WiMAX systems. To implement cooperative communication systems in the real world, the degree of complexity must be limited. Jing et al. [25] therefore propose a relay selection scheme with polynomial […] that an optimal relay node selection problem solution can be obtained in polynomial time. Furthermore, many works have adopted network coding schemes in cooperative communication systems [27–29]. In these schemes, network capacity can be significantly increased, but it is fairly challenging to implement network coding schemes in real-world systems [30,31]. Instead of using a centralized mechanism, Cai et al. [32] propose a semi-distributed algorithm based on a greedy methodology; there is therefore no performance guarantee for this algorithm [33]. Other works consider relay selection jointly with power allocation [34–39]. These approaches achieve the goal of energy saving in relay networks. The works in [40] and [41] combine relay selection and rate adaptation to achieve higher system performance. Abouelseoud et al. [42] combine many different protocols in relay networks in order to enhance system performance.

The last part of this section introduces the works that are compared with DLRA in this dissertation. The first, proposed by Sharma et al., is called "Optimal Relay Assignment" (ORA); its goal is to maximize the minimal performance in the system [43]. ORA uses a linear marking algorithm whose idea is to mark the node with the worst performance; ORA then tries to increase that node's performance without decreasing the worst performance in the system. A relay node can be used by only one source node in ORA. The second, proposed by Yang et al., is called "OPtimal Relay Assignment" (OPRA); its goal is to maximize the total system performance [44]. OPRA discusses the possibility of allowing a relay node to be shared by multiple source nodes to achieve its objective, formulates the problem as a maximum weighted bipartite matching problem, and solves it with the corresponding algorithm. The third work, proposed by Cai et al., is a semi-distributed approach [32]. In Cai's algorithm, each source node in turn randomly selects a relay node, a process based on a greedy approach.
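
The two objectives described above, maximizing the minimum capacity and maximizing the total capacity, can lead to different assignments. The following sketch contrasts them by exhaustive search on a tiny, made-up example. It is not the linear marking algorithm of ORA nor the matching algorithm of OPRA; the capacity matrix, the one-to-one restriction and the function name `best_assignment` are assumptions introduced here for illustration.

```python
from itertools import permutations

# Hypothetical capacities (arbitrary units): capacity[s][r] is what source s
# would achieve when assigned relay r. The numbers are made up.
capacity = [
    [10.0, 5.0],
    [6.0, 2.0],
]

def best_assignment(capacity, objective):
    """Exhaustively search one-to-one source-to-relay assignments."""
    n = len(capacity)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        rates = [capacity[s][perm[s]] for s in range(n)]
        score = objective(rates)
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

max_min = best_assignment(capacity, min)  # max-min objective (as in ORA)
max_sum = best_assignment(capacity, sum)  # max-sum objective (as in OPRA)

print(max_min)  # ((1, 0), 5.0): source 0 takes relay 1, source 1 takes relay 0
print(max_sum)  # ((0, 1), 12.0): source 0 takes relay 0, source 1 takes relay 1
```

In this example the max-sum assignment gives the best relay to the strongest source and leaves the other source with a capacity of 2, whereas the max-min assignment raises the worst capacity to 5 at the cost of one unit of total capacity.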

2.2 enhanced Inter-Cell Interference Management (eICIC)

2.2.1 Range Expansion and Inter-cell Interference Coordination

Cell association is usually performed according to the reference signal received power (RSRP) [45]. Because of the large difference in transmission power between an MBS and a PBS, a pseudo bias is added to the RSRP from the PBS. This technique, called cell range expansion (CRE), was proposed by 3GPP to allow load balancing between macrocells and picocells. 3GPP has studied the concept of CRE through handover biasing and resource partitioning among nodes with different levels of transmission power [46,47]. The larger the bias, the more UEs are associated with picocells. The CRE approach simply adds an offset to the RSRP of picocells to increase their coverage range; however, the downlink signal quality of the users in the expanded range, which we call CRE users in this work, is significantly reduced. Fig. 2.1 illustrates users being offloaded from macrocells to picocells. CRE users are not associated with the cells that provide the best downlink signal quality, and they may suffer severe interference when CRE is adopted.
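
The biased association rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `associate`, the cell labels and the RSRP and bias values are hypothetical, and real cell selection involves further procedures not modelled here.

```python
def associate(rsrp_dbm, bias_db):
    """Return the cell with the largest biased RSRP.

    rsrp_dbm: dict mapping cell name -> measured RSRP in dBm
    bias_db:  dict mapping cell name -> CRE bias in dB (0 if absent)
    """
    return max(rsrp_dbm, key=lambda cell: rsrp_dbm[cell] + bias_db.get(cell, 0.0))

# Hypothetical measurements for one UE near the edge of a picocell.
rsrp = {"macro": -85.0, "pico": -90.0}

no_bias = associate(rsrp, {"pico": 0.0})    # macro wins: -85 > -90
with_bias = associate(rsrp, {"pico": 6.0})  # pico wins: -90 + 6 = -84 > -85

print(no_bias, with_bias)  # macro pico
```

With the 6 dB bias the UE becomes a CRE user: it is served by the picocell even though the macrocell signal is 5 dB stronger, which is why such users see strong downlink interference.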

Figure 2.1: The concept of Cell Range Expansion (CRE).

In 3GPP LTE Releases 8 and 9, ICIC schemes did not yet consider the HetNet environment. To mitigate cross-tier interference, 3GPP proposed ABS as an enhancement to ICIC, referred to as eICIC. According to [48], the ICIC techniques in 3GPP Releases 10-12 can be grouped into four categories: time-domain, frequency-domain, power-based and antenna/spatial-based techniques [46,49]. ABS belongs to the time-domain ICIC techniques. The idea of ABS is to mute some subframes of the MBS in order to protect pico UEs (PUEs). In these muted subframes, no data is transmitted to macro UEs (MUEs), so CRE users no longer suffer severe interference from MBSs. The ABS scheme is illustrated in Fig. 2.2(a).

The drawback of ABS is evident: it wastes much of the spectrum resources of MBSs. Although the performance of picocells is increased, the performance of MBSs is sacrificed. According to [46], instead of muting the MBS completely during ABSs, transmitting at reduced power to serve only its nearby UEs would considerably improve HetNet performance in terms of the trade-off between cell-edge and average throughput. Later, reduced power subframe (RPS) transmission was also standardized under 3GPP LTE Release 11, where it is commonly referred to as further-enhanced ICIC (FeICIC) [50]. In another study [51], simulation results show that FeICIC is less sensitive to the ABS duty cycle than eICIC. RPS is illustrated in Fig. 2.2(b).

RPS transmits at reduced power during the original ABSs; however, the purpose of blanking these subframes is to protect CRE users. The reduced-power transmission still interferes with CRE users, although the interference is lighter than with full-power transmission, so the performance of CRE users is sacrificed compared to the ABS scheme. An approach called dynamic RPS (DRPS) is therefore proposed, as illustrated in Fig. 2.2(c). Compared to RPS, DRPS allows more flexible transmission power in the original ABSs.
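
The difference between the three schemes can be pictured as a per-subframe power pattern for MBS data transmission. The sketch below is an illustration only: the function `macro_data_power`, the frame layout and the power values are hypothetical, and DRPS is modelled here simply as a power level that may differ from one protected subframe to the next, which is an assumption about what "more flexible" means.

```python
def macro_data_power(scheme, protected, p_max, p_reduced=0.0, p_per_subframe=None):
    """Per-subframe MBS data transmission power for one frame.

    protected: list of booleans, True where the ABS scheme would blank a subframe.
    """
    powers = []
    for k, is_protected in enumerate(protected):
        if not is_protected:
            powers.append(p_max)              # normal subframe: full power
        elif scheme == "ABS":
            powers.append(0.0)                # no data transmitted
        elif scheme == "RPS":
            powers.append(p_reduced)          # one fixed reduced level
        elif scheme == "DRPS":
            powers.append(p_per_subframe[k])  # level may differ per subframe
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return powers

# Hypothetical frame of four subframes; powers in watts are made up.
protected = [False, True, False, True]
abs_pattern = macro_data_power("ABS", protected, p_max=40.0)
rps_pattern = macro_data_power("RPS", protected, p_max=40.0, p_reduced=10.0)
drps_pattern = macro_data_power("DRPS", protected, p_max=40.0,
                                p_per_subframe=[None, 5.0, None, 20.0])

print(abs_pattern)   # [40.0, 0.0, 40.0, 0.0]
print(rps_pattern)   # [40.0, 10.0, 40.0, 10.0]
print(drps_pattern)  # [40.0, 5.0, 40.0, 20.0]
```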

2.2.2 Literature Review

There is a sizeable body of literature on the use of CRE for traffic load balancing in HetNets; see e.g. [19,52–60]. Lopez-Perez et al. [19] calculate CRE bias values for different range expansion strategies, e.g., equal downlink RSS boundary and equal path-loss boundary, and derive closed-form expressions. Guvenc et al. [52] propose a cell selection procedure based on subframe blanking to improve downlink capacity and UE fairness. Shirakabe et al. [57] report, based on system-level simulations, the performance for different CRE values and different ratios of protected subframes. Using tools from stochastic geometry, analytical models accounting for BS and UE locations have been studied to analyze spectral efficiencies in range-expanded picocell networks in [56], later extended to ICIC scenarios in [58,59].

For the purpose of interference management, frequency-domain interference coordination techniques have also been studied in the literature. The simplest strategy is called universal reuse, or reuse factor 1. It allows each cell to access the whole bandwidth without any restriction. Simonsson [61] shows that universal reuse performs best for wideband services. Decentralized inter-cell interference coordination is proposed by Ellenbeck et al. [62]. Many carrier aggregation based ICIC techniques have also been proposed in the literature; see e.g. [63–66].

Figure 2.2: The transmission power on each subframe of approaches: ABS, RPS and DRPS. Panels: (a) ABS, (b) RPS, (c) DRPS.

For time-domain ICIC, Wang et al. [67] consider time- and power-domain interference management in HetNets. However, the authors apply ABSs and power reduction to small BSs, which may reduce the performance of small BSs, whereas the purpose of eICIC is to protect the transmissions of small BSs. Cherny et al. [68] study the number of ABSs needed in HetNets. The authors address the problem with tools from stochastic geometry and show a moderate performance gain for victim users in HetNets. Deb et al. [69] formulate the same problem as an optimization problem and find that it is NP-hard; they therefore propose a distributed approach to obtain a suboptimal solution. The problem of ABS duty cycles is addressed in [70,71]. Ding et al. [12] evaluate dynamic time-division duplex (TDD) transmission with CRE and ABS in co-channel HetNets.

Many studies aim to improve the performance of ABS. Soret et al. [72] propose a scheme named low-power ABS, in which MBSs reduce their transmission power in particular subframes instead of blanking them. Merwaday et al. [59,60] evaluate the performance of HetNets with reduced power subframes and show the impact of interference coordination in HetNets using stochastic geometry techniques. Soret et al. [73] […]

Chapter 3 Decentralized Learning-Based Relay Assignment for Cooperative Communications

3.1 Background Information

Cooperative communication [74] is considered a promising approach to achieving spatial diversity and addressing the increasing demand for data throughput in wireless networks. Network systems achieve spatial diversity by exploiting the broadcast nature and antennas of other nodes, i.e. relay nodes. Such systems therefore do not require multiple antennas on individual nodes. In addition, the deployment of relay nodes is also able to address increasing mobile user demands.

Only an appropriate relay node can lead to better performance; an inappropriate relay assignment may negatively affect network performance [75].

To date, many approaches to solving the relay selection problem have been proposed. The two leading objectives are maximizing the aggregate performance [44] and maximizing the minimal performance [43]. Many studies have developed centralized algorithms to handle the problem [23,76–82], for example by formulating relay selection problems as optimization problems. Centralized approaches always require link metrics, such as signal-to-noise ratio (SNR), distance between nodes or channel state information, to make relay assignment decisions. As the numbers of source nodes and relay nodes increase, the optimization problems become increasingly complicated. To address this problem, some distributed algorithms have been proposed [32,83,84]. Some of these distributed approaches are based on opportunistic cooperation, which still causes system overhead.

This study proposes a fully distributed algorithm called "Decentralized Learning-based Relay Assignment" (DLRA), which is based on stochastic learning automata (SLA), to solve the relay selection problem in cooperative communications. SLA are used in many areas, such as pattern recognition [85] and robot systems [86]. The idea of SLA is to solve a problem without any prior information about the solutions: an action is selected according to a probability vector, feedback is observed from the environment, the probability vector is updated according to the feedback, and the procedure is repeated until the most suitable solution is found.

DLRA is a self-organizing algorithm; it can give each source node self-optimizing and self-learning abilities. Each source node is therefore able to select an appropriate relay node for itself without exchanging information with all of the other source nodes. Thus, it is unnecessary for DLRA to maintain a central control unit to handle the whole system. In addition, the complexity of DLRA does not increase with the number of source nodes.

DLRA performance is evaluated by mathematical analysis and computer experiments. The convergence and optimality of DLRA are demonstrated in the mathematical analysis. In the computer experiments, two different cooperative network systems are constructed: one is a cooperative ad hoc network, and the other is an LTE-Advanced relay network. This study demonstrates the effectiveness of DLRA in the first network system, where all simulation topologies and parameters follow [43]. DLRA performance is then compared with that of other algorithms in the second network system to show its superiority.

The main contributions of this study are summarized as follows:

• A fully distributed and self-optimizing algorithm based on SLA is proposed. The need for information exchange among all source nodes is eliminated since each source node is able to individually and autonomously find the most appropriate relay node for transmission.

• The selection mechanism is based on existing environmental feedback, which enables source nodes to adjust the preferred transmission method, and no additional overhead is produced in the system; it balances the load of the relay nodes in turn.

• Through mathematical analysis, the proposed algorithm converges into one state; it is shown that the convergent state exhibits the best performance. The optimality of the proposed algorithm is also shown in the mathematical analysis.

• The experiments are not performed in a specified network system. Different network systems are used to apply the proposed algorithm, and the experimental results show that the proposed algorithm exhibits good performance in different network systems. The proposed algorithm can therefore be applied to various cooperative communication systems.

3.2 System Model

3.2.1 Cooperative Communication Modes

Two modes of cooperative communication are considered in this section: amplify-and-forward (AF) and decode-and-forward (DF) [75]. The expressions for capacity in cooperative communication are also given in this section.

Amplify-and-Forward (AF)

In AF mode, relay nodes simply amplify the received signal from the source nodes and transmit it to the destination nodes. It is a simple method, and makes for low-cost implementation. However, it also amplifies noise at relay nodes with the desired signal component, thus decreasing the received SNR and reducing the enhanced gain. According to [75], the capacity of AF can be expressed as:

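\[
C_{AF}(s, r, d) = \frac{W}{2} \log_2\!\left(1 + \mathrm{SNR}_{sd} + \frac{\mathrm{SNR}_{sr}\,\mathrm{SNR}_{rd}}{1 + \mathrm{SNR}_{sr} + \mathrm{SNR}_{rd}}\right) = W\, I_{AF}(\mathrm{SNR}_{sd}, \mathrm{SNR}_{sr}, \mathrm{SNR}_{rd}) \tag{3.1}
\]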

where W is the transmission bandwidth, and s, r and d denote the source node, relay node and destination node, respectively. $\mathrm{SNR}_{sd}$ is the signal-to-noise ratio at the destination node for the signal from the source node; $\mathrm{SNR}_{sr}$ and $\mathrm{SNR}_{rd}$ are defined similarly.

Decode-and-Forward (DF)

In DF mode, relay nodes demodulate and decode the received signal, then modulate and encode it again before transmitting it to destination nodes. DF offers better performance gain than AF. However, DF incurs a delay associated with the modulation/demodulation and encoding/decoding processes. The capacity of DF can be expressed as:

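\[
C_{DF}(s, r, d) = \frac{W}{2} \log_2\!\left(\min\{1 + \mathrm{SNR}_{sr},\; 1 + \mathrm{SNR}_{sd} + \mathrm{SNR}_{rd}\}\right) = W\, I_{DF}(\mathrm{SNR}_{sd}, \mathrm{SNR}_{sr}, \mathrm{SNR}_{rd}) \tag{3.2}
\]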

Direct Transmissions

When a source node communicates with a destination node without using relay nodes, it is a direct transmission. The capacity of direct transmission is expressed as:

\[
C_{D}(s, d) = W \log_2(1 + \mathrm{SNR}_{sd}) \tag{3.3}
\]
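
The three capacity expressions in Eqs. (3.1) to (3.3) can be evaluated numerically to see when relaying pays off. The sketch below assumes linear (not dB) SNR values and a normalized bandwidth W = 1; the function names and the sample SNRs are hypothetical and chosen only for illustration.

```python
import math

def i_af(snr_sd, snr_sr, snr_rd):
    """I_AF from Eq. (3.1); all SNRs are linear ratios, not dB."""
    return 0.5 * math.log2(1 + snr_sd + snr_sr * snr_rd / (1 + snr_sr + snr_rd))

def i_df(snr_sd, snr_sr, snr_rd):
    """I_DF from Eq. (3.2)."""
    return 0.5 * math.log2(min(1 + snr_sr, 1 + snr_sd + snr_rd))

def c_direct(w, snr_sd):
    """Direct-transmission capacity from Eq. (3.3)."""
    return w * math.log2(1 + snr_sd)

W = 1.0  # normalized bandwidth (assumed)

# Weak direct link, strong relay links: relaying helps.
weak = {
    "direct": c_direct(W, 1.0),
    "AF": W * i_af(1.0, 15.0, 15.0),
    "DF": W * i_df(1.0, 15.0, 15.0),
}

# Strong direct link: the factor 1/2 of two-slot relaying makes it worse.
strong = {
    "direct": c_direct(W, 15.0),
    "AF": W * i_af(15.0, 15.0, 15.0),
    "DF": W * i_df(15.0, 15.0, 15.0),
}

print(weak)
print(strong)
```

With a weak direct link (SNR of 1), direct transmission yields 1 bit/s/Hz while DF via the relay yields 2. With a strong direct link (SNR of 15), direct transmission yields 4 and both relay modes fall below it, which is the situation in which a source node is better off without a relay.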

3.2.2 Network Model

This work assumes that a network system has N nodes, with each node being either a source node, a destination node, or a relay node. Denote $X = \{x_1, \ldots, x_{N_x}\}$ as the set of source nodes, $Y = \{y_1, \ldots, y_{N_y}\}$ as the set of destination nodes, and $R = \{r_1, \ldots, r_{N_r}\}$ as the set of relay nodes. The destination node of source node $x_i$ is denoted by $Y(x_i)$, and the relay node used by $x_i$ is $R(x_i)$. If $x_i$ does not use a relay node, then $R(x_i) = \phi$; this may happen because direct transmission is better than transmission via relay nodes.

Suppose that all nodes are equipped with a single antenna and work in half-duplex mode, so they are unable to transmit and receive simultaneously. One transmission is therefore structured as a frame divided into two time slots: one for the source node to transmit to the relay node, and the other for the relay node to transmit to the destination node. For both AF and DF, the capacity of $x_i$ when $R(x_i) \neq \phi$ can be written as:

\[
W\, I_R\!\left(\mathrm{SNR}_{x_i, Y(x_i)},\; \mathrm{SNR}_{x_i, R(x_i)},\; \mathrm{SNR}_{R(x_i), Y(x_i)}\right) \tag{3.4}
\]

where $I_R(\cdot) = I_{AF}(\cdot)$ for AF and $I_R(\cdot) = I_{DF}(\cdot)$ for DF.

If $x_i$ does not use a relay, the capacity is that of direct transmission, namely

\[
W \log_2\!\left(1 + \mathrm{SNR}_{x_i, Y(x_i)}\right) \tag{3.5}
\]

Denote $A(r_i)$ as the number of source nodes that use relay node $r_i$. When multiple source nodes choose the same relay node, it is assumed that they share radio resources equally. The proposed algorithm is a decentralized approach: no information is exchanged among source nodes.

The number of source nodes served by a relay node is known only to that relay node. From the viewpoint of relay nodes, the capacity of $x_i$ when using relay node $R(x_i)$ is denoted as $C(x_i, R(x_i), Y(x_i))$ and is given by:

\[
C(x_i, R(x_i), Y(x_i)) =
\begin{cases}
\dfrac{W}{A(R(x_i))}\, I_R\!\left(\mathrm{SNR}_{x_i, Y(x_i)},\; \mathrm{SNR}_{x_i, R(x_i)},\; \mathrm{SNR}_{R(x_i), Y(x_i)}\right), & \text{if } R(x_i) \neq \phi \\[2ex]
W \log_2\!\left(1 + \mathrm{SNR}_{x_i, Y(x_i)}\right), & \text{if } R(x_i) = \phi
\end{cases}
\tag{3.6}
\]
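
The effect of equal resource sharing in Eq. (3.6) can be illustrated with a small computation. This is a sketch under stated assumptions: DF is used for $I_R$, SNRs are linear, W is normalized to 1, and the function name `shared_capacities`, the data layout and the sample values are hypothetical.

```python
import math
from collections import Counter

def i_df(snr_sd, snr_sr, snr_rd):
    """I_DF from Eq. (3.2); SNRs are linear ratios."""
    return 0.5 * math.log2(min(1 + snr_sr, 1 + snr_sd + snr_rd))

def shared_capacities(w, choices, snr_sd, snr_sr, snr_rd, i_r=i_df):
    """Evaluate Eq. (3.6) for every source node.

    choices[i]   : relay index chosen by source i, or None for direct transmission
    snr_sd[i]    : SNR from source i to its destination
    snr_sr[i][r] : SNR from source i to relay r
    snr_rd[r][i] : SNR from relay r to the destination of source i
    """
    load = Counter(r for r in choices if r is not None)  # A(r) for each relay
    caps = []
    for i, r in enumerate(choices):
        if r is None:
            caps.append(w * math.log2(1 + snr_sd[i]))
        else:
            caps.append(w / load[r] * i_r(snr_sd[i], snr_sr[i][r], snr_rd[r][i]))
    return caps

# Three sources and one relay; sources 0 and 1 share the relay, source 2 goes direct.
snr_sd = [1.0, 1.0, 15.0]
snr_sr = [[15.0], [15.0], [15.0]]
snr_rd = [[15.0, 15.0, 15.0]]
caps = shared_capacities(1.0, [0, 0, None], snr_sd, snr_sr, snr_rd)

print(caps)  # [1.0, 1.0, 4.0]
```

Each of the two sharing sources would obtain 2 bit/s/Hz if it had the relay to itself; sharing halves this to 1, which equals what they would get by direct transmission. This is why the load on each relay matters for the assignment.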


Source nodes only send and receive information to and from relay nodes, allowing them to obtain the capacity among them. In this work, the proposed algorithm is evaluated in two different network systems. Detailed information on both systems is shown in Section 3.5.

3.3 Decentralized Learning based Relay Assignment algorithm

Figure 3.1: A Stochastic learning automata. (Block diagram: the learning automaton selects a state S(t) = Q(P(t)), with S(t) ∈ {S_1, S_2, …, S_r}; the environment returns the responses D(t) = {D_1, D_2, …, D_r}; the automaton then updates P(t+1) = T(D(t)).)

3.3.1 SLA: Stochastic Learning Automata

Stochastic learning automatas are self-optimizing, reinforcement learning techniques in machine learning [87]. An SLA is a finite state machine which interacts

(38)

with an unknown environment and attempts to learn the best action offered by that environment via a learning process [88]. SLAs learn by means of the feedback from their environments, which can tell an SLA whether or not its selection is good. The learning process is iteratively performed until the SLA reaches a stable condition. There is no predetermined relationship between actions and responses, and SLAs are therefore suitable for use in unknown network environments, for example, cooperative networks where UEs do not know which relays are the best for them. So, SLAs are an attractive mechanism in such environments, and many studies have been conducted to apply them to network systems. In [89], a rate adaptation mechanism of 802.11 networks is proposed based on SLA. In [90], an algorithm to perform opportunistic spectrum access is based on SLA. In [66], SLA is used to conduct an energy-saving algorithm in LTE-advanced networks.

An illustration of an SLA is given in Figure 3.1. An SLA is defined by the 5-tuple $\{S, D, P, T, Q\}$:

• $S = \{S_1, \ldots, S_n\}$ is the set of $n$ states in the system. The state selected at time $t$ is written as $S(t) = S_i \in S$.

• $D = \{D_1, \ldots, D_n\}$ is the set of environmental responses corresponding to each state.

• $P$ is the probability distribution over the set of states, $P(t) = \{P_1(t), \ldots, P_n(t)\}$, where $P_i(t)$ is the probability of state $S_i \in S$ at time $t$.

• $T$ is the learning algorithm that produces the probability vector $P(t+1)$ of the next iteration from $D(t)$.

• $Q$ is the output function from $P(t)$ to $S(t)$.


In an SLA, each agent has a set $S$ of available states and must choose a state $S(t)$ according to the probability vector $P(t)$ and the output function $Q$. Once the agent selects a state, the environment responds with a performance estimate $D(t)$. After time period $T$, the agent uses $D(t)$ to update the probability vector to $P(t+1)$ for the next time instant.
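The selection step described above, in which the output function $Q$ maps the probability vector $P(t)$ to a state $S(t)$, can be sketched as inverse-CDF (roulette-wheel) sampling. This is only an illustrative sketch: the function name `select_state`, the zero-based state indices, and the round-off fallback are assumptions that do not appear in the text.

```python
import random


def select_state(probabilities, rng=random):
    """Sketch of the output function Q: draw a state index from P(t).

    `probabilities` is assumed to be a list of non-negative floats that
    sums to 1; states are indexed from 0 here (an assumption, the text
    indexes them from 1).
    """
    u = rng.random()  # uniform sample in [0, 1)
    cumulative = 0.0
    for state, p in enumerate(probabilities):
        cumulative += p
        if u < cumulative:
            return state
    # Floating-point round-off can leave the cumulative sum slightly
    # below 1; fall back to the last state with non-zero probability.
    for state in range(len(probabilities) - 1, -1, -1):
        if probabilities[state] > 0:
            return state
    raise ValueError("probability vector has no positive entry")
```

A state whose probability has been driven to zero is never returned, which matches the behavior expected once a learning rule has eliminated a state.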

3.3.2 Proposed Algorithm

This section describes the proposed algorithm, DLRA (Decentralized Learning-based Relay Assignment), which is based on SLA and is an online, fully distributed algorithm. The goal of DLRA is to choose an appropriate relay for transmission, so each relay selection corresponds to a different state. A probability distribution is maintained over these states, and DLRA chooses a relay node according to this distribution. Once a state, namely a relay node, is chosen, the system returns feedback about this relay node, which serves as the performance measure in DLRA. DLRA updates the probability distribution according to the performance observed over every period $T$. The five tuples defined in the previous section are instantiated as follows:

• $S = \{1, \ldots, N_r + 1\}$ is the set of states, where each relay node represents one state. State $N_r + 1$ denotes direct transmission, and state $i$ ($i \neq N_r + 1$) denotes using relay node $i$ for the transmission.

• $D_t^i = \{d_t^i(1), \ldots, d_t^i(N_r + 1)\}$ is the performance vector of source node $i$ at time $t$. When a relay node is chosen, an environmental response is obtained; it represents the performance measure of the chosen relay node. The capacity of the source node is used as the performance measure, so $d_t^i(r)$ is the capacity of source node $i$ at time $t$ when choosing relay node $r$.

• $P_t^i = \{p_t^i(1), \ldots, p_t^i(N_r + 1)\}$ is a probability vector. DLRA chooses a relay node according to this vector; $p_t^i(r)$ is the probability of source node $i$ choosing relay node $r$ at time $t$.

• $T$ is the period after which DLRA updates the probability vector.

• $Q_t^i$ is the output function from $P_t^i$ to $S$. $Q_t^i(P_t^i) = r$ means that source node $i$ picks state $r$ at time $t$ according to $P_t^i$.

In DLRA, source node $i$ picks a state from the set $S$ according to its probability vector $P_t^i$ at time $t$. After a state is selected, say state $r$, DLRA obtains feedback from the system and denotes it by $d_t^i(r)$, the capacity of source node $i$ when choosing state $r$ at time $t$ in cooperative communication. DLRA thus holds a performance vector for source node $i$, namely $D_t^i = \{d_t^i(1), \ldots, d_t^i(N_r + 1)\}$, and updates the probabilities of source node $i$ according to $D_t^i$.

Before updating the probability vector, DLRA needs to find the best state for the source node. The best state for source node $i$ at time $t$, denoted by $b_t^i$, is the state with the highest capacity among all states:

$$
b_t^i = \arg\max_{j}\, d_t^i(j) \tag{3.7}
$$

After the best state is obtained, the next step is to update the probability vector. DLRA applies the discrete pursuit reward-inaction (DPRI) algorithm to update probability vectors [91], which is able to obtain the best reward. The DPRI algorithm has been shown to exhibit good convergence properties: the probability of the best state is increased and the probabilities of the other states are decreased, with zero as the minimum probability of any state. The probabilities are updated according to the following equation:

$$
p_{t+1}^i(j) =
\begin{cases}
\max\{p_t^i(j) - \Delta,\ 0\}, & \text{if } j \neq b_t^i \\
1 - \sum_{k \neq b_t^i} p_{t+1}^i(k), & \text{if } j = b_t^i
\end{cases}
\tag{3.8}
$$

where $\Delta = \frac{1}{n(N_r + 1)}$ is the smallest step size, and $n \in [1, \infty)$ is a resolution parameter that determines the size of $\Delta$. The pseudocode of DLRA is shown in Algorithm 1, where $T$ is the training period and end means that DLRA has converged to one state.
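The update in (3.7) and (3.8) can be illustrated with a short Python sketch. It takes the best state to be the one with the highest capacity, as stated in the text. The function name `dpri_update`, the zero-based indexing, and the list-based representation are assumptions made for illustration only.

```python
def dpri_update(p, d, delta):
    """One DPRI step, following (3.7) and (3.8).

    p     -- current probability vector p_t (list of floats summing to 1)
    d     -- performance vector D_t (capacity observed for each state)
    delta -- step size, Delta = 1 / (n * (N_r + 1))
    Returns (best_state, new_probability_vector).
    """
    # (3.7): the best state is the one with the highest capacity.
    best = max(range(len(d)), key=lambda j: d[j])
    new_p = [0.0] * len(p)
    # (3.8), case j != b: decrease by delta, never below zero.
    for j in range(len(p)):
        if j != best:
            new_p[j] = max(p[j] - delta, 0.0)
    # (3.8), case j == b: the best state takes the remaining mass.
    # new_p[best] is still 0.0 here, so the sum covers only j != b.
    new_p[best] = 1.0 - sum(new_p)
    return best, new_p
```

For example, with four states at probability 0.25 each, a step size of 0.1, and state 1 having the highest capacity, the three other states drop to 0.15 and state 1 rises to 0.55.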

3.3.3 Complexity

There are five loops in the pseudocode. The first loop iterates a variable $j$ from 1 to $N_r + 1$, so its complexity is $O(N_r)$. The second loop contains the first loop and iterates a variable from 1 to $T$, so its complexity is $O(TN_r)$. The third loop iterates a variable $l$ from 1 to $N_r + 1$, so its complexity is $O(N_r)$. The fourth loop iterates a variable $j$ from 1 to $N_r + 1$, so its complexity is also $O(N_r)$.

The fifth loop contains the second, third, and fourth loops and iterates a variable $t$ from 1 to end. We use $E$ to represent end, the number of iterations that a source node needs for DLRA to converge. The complexity of the fifth loop is $O(E(TN_r + N_r + N_r)) = O(ETN_r)$. Since $T$ and $E$ are constants for a node, the complexity of DLRA is $O(N_r)$, a first-degree polynomial in $N_r$. In contrast, the complexities of ORA and OPRA are $O(N_s N_r^2)$ and $O(N_s^2 N_r)$, respectively.


Algorithm 1 DLRA: Decentralized Learning-based Relay Assignment algorithm
Initialization:
    p_0(k) = 1/|S|, trial(k) = 0, D(k) = 0, ∀k ∈ {1, ..., N_r + 1}
    i = 1, j = 1, l = 1, t = 0, state = 0, Best = 1
repeat
    repeat
        temp = rand()
        repeat
            if temp < p_t(j) then
                state ← j
                trial(j) ← trial(j) + 1
                break
            else
                temp ← temp − p_t(j)
            end if
            j ← j + 1
        until j > N_r + 1
        Calculate the capacity of the state according to (3.4).
        D(state) ← D(state) + capacity
        j ← 1
        i ← i + 1
    until i > T
    i ← 1
    Best ← 1
    repeat
        if trial(l) > 0 then
            D(l) ← D(l)/trial(l)
        end if
        if D(l) > D(Best) then
            Best ← l
        end if
        l ← l + 1
    until l > N_r + 1
    l ← 1
    trial(k) = 0 ∀k
    D(k) = 0 ∀k
    p_{t+1}(Best) ← 1
    repeat
        if j ≠ Best then
            if p_t(j) > ∆ then
                p_{t+1}(j) ← p_t(j) − ∆
            else
                p_{t+1}(j) ← 0
            end if
            p_{t+1}(Best) ← p_{t+1}(Best) − p_{t+1}(j)
        end if
        j ← j + 1
    until j > N_r + 1
    j ← 1
    t ← t + 1
until p_t(Best) = 1


Although all three algorithms have polynomial complexity, the complexity of DLRA, being linear in $N_r$, is much lower than that of the other two.
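For readers who prefer runnable code, the loop structure of Algorithm 1 can be sketched in Python for a single source node. This is a minimal sketch under stated assumptions, not the implementation used in the dissertation: the `capacity` callback is a hypothetical stand-in for the system feedback, states are indexed from 0, only states actually tried during a training period are compared, and `max_iters`, `seed`, and the convergence tolerance are illustrative parameters.

```python
import random


def dlra(capacity, num_states, n=10, T=20, max_iters=10_000, seed=0):
    """Sketch of DLRA for one source node.

    capacity(state) -> float is a hypothetical callback that returns the
    capacity measured when `state` is used (assumption: it stands in for
    the feedback of the network system).
    Returns (converged_state, iterations_used).
    """
    rng = random.Random(seed)
    delta = 1.0 / (n * num_states)        # Delta = 1 / (n (N_r + 1))
    p = [1.0 / num_states] * num_states   # uniform initial probabilities
    for t in range(1, max_iters + 1):
        total = [0.0] * num_states
        trials = [0] * num_states
        for _ in range(T):                # training period of T selections
            state = rng.choices(range(num_states), weights=p)[0]
            trials[state] += 1
            total[state] += capacity(state)
        # Best state: highest average capacity among the states tried.
        tried = [k for k in range(num_states) if trials[k] > 0]
        best = max(tried, key=lambda k: total[k] / trials[k])
        # DPRI update: decrease the others by delta (floored at zero),
        # then give the remaining probability mass to the best state.
        new_p = [max(pk - delta, 0.0) for pk in p]
        new_p[best] = 0.0
        new_p[best] = 1.0 - sum(new_p)
        p = new_p
        if p[best] >= 1.0 - 1e-12:        # converged to a single state
            return best, t
    return max(range(num_states), key=lambda k: p[k]), max_iters
```

With a fixed capacity per state, for example `dlra(lambda s: [1.0, 2.0, 5.0, 3.0][s], num_states=4, T=50)`, the sketch is expected to settle on the state with the largest capacity. Each outer iteration costs one pass of length $T$ plus two passes over the states, which mirrors the loop count used in the complexity argument.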

3.4 Mathematical Analysis

3.4.1 Convergence

In this study, DLRA is said to converge to a state when the probability of that state reaches its maximum value, namely 1. This section establishes the convergence of DLRA. Suppose that, at time $t$, the update policy in (3.8) increases the probability of the actual best state, $p_t(b)$ (the superscript $i$ is dropped for brevity), with probability $\xi_t(b)$ and decreases it with probability $1 - \xi_t(b)$. Thus:

$$
p_{t+1}(b) =
\begin{cases}
1 - \sum_{j \neq b} \max\{p_t(j) - \Delta,\ 0\}, & \text{w.p. } \xi_t(b) \\
\max\{p_t(b) - \Delta,\ 0\}, & \text{w.p. } 1 - \xi_t(b)
\end{cases}
\tag{3.9}
$$

where w.p. stands for "with probability." The algorithm has converged when $p_t(b) = 1 - N_r\Psi$. Suppose that the algorithm has not yet converged to state $b$; then there exists a state $j$ ($j \neq b$) with probability $p_t(j)$ that satisfies:

$$
p_t(j) > \max\{p_t(j) - \Delta,\ 0\} \tag{3.10}
$$

According to the second axiom of probability:

$$
p_t(b) = 1 - \sum_{j \neq b} p_t(j) \tag{3.11}
$$

and, thus,

$$
1 - \sum_{j \neq b} \max\{p_t(j) - \Delta,\ 0\} > p_t(b) \tag{3.12}
$$

As long as at least one $p_t(j)$ is larger than 0, $p_t(b)$ can be increased by decreasing $p_t(j)$, and the increase is at least $\min\{p_t(j), \Delta\}$.

Therefore, (3.9) is rewritten as:

$$
p_{t+1}(b) =
\begin{cases}
p_t(b) + a_t\Delta, & \text{w.p. } \xi_t(b) \\
p_t(b) - \Delta, & \text{w.p. } 1 - \xi_t(b)
\end{cases}
\tag{3.13}
$$

where $a_t \in (0, N_r]$.

For a given source node, the current system state, which includes the algorithm states of the other source nodes, is denoted as $\theta_t$, and the probability vector of the source node is $P_t$. The expected value of $p_{t+1}(b)$ conditioned on $\theta_t$ and $P_t$ can then be written as

$$
E[p_{t+1}(b) \mid \theta_t, P_t] = \xi_t(b)\{p_t(b) + a_t\Delta\} + (1 - \xi_t(b))\{p_t(b) - \Delta\} \tag{3.14}
$$

In (3.14), $p_t(b)$ has not yet reached its maximum value 1. The next step derives the condition under which $\{p_t(b)\}$ is a submartingale, meaning that $p_t(b)$ increases in expectation toward its maximum value 1. A submartingale is defined as follows:

Definition 1. Submartingale: A discrete-time submartingale is a sequence $X_1, X_2, \ldots, X_n, \ldots$ of integrable random variables satisfying $E[X_{n+1} \mid X_1, \ldots, X_n] \geq X_n$.


Since the maximum value of $p_t(b)$ is 1:

$$
\sup_{t \geq 0} E[p_t(b) \mid \theta_t, P_t] < \infty \tag{3.15}
$$

and (3.14) can be rewritten as

$$
E[p_{t+1}(b) - p_t(b) \mid \theta_t, P_t] = [\xi_t(b)(a_t + 1) - 1]\Delta \tag{3.16}
$$

The right-hand side of (3.16) is nonnegative if and only if

$$
\xi_t(b)(a_t + 1) - 1 \geq 0 \;\Rightarrow\; \xi_t(b) \geq \frac{1}{a_t + 1} \tag{3.17}
$$

The sequence is therefore a submartingale when (3.17) holds. Suppose that the algorithm satisfies the condition at time $t_0$ and that the condition holds for all $t > t_0$. Then, according to the submartingale convergence theorem [92], the sequence $\{p_t(b)\}_{t > t_0}$ converges, such that

$$
E[p_{t+1}(b) - p_t(b) \mid \theta_t, P_t] \to 0 \quad \text{w.p. } 1 \tag{3.18}
$$

and the maximum value of $p_t(b)$, namely 1, is achieved as $t \to \infty$.
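As a worked check of the step from (3.14) to (3.16), the expected one-step drift follows from subtracting $p_t(b)$ from both branches of (3.13) and weighting them by their probabilities. No symbols beyond those in the text are introduced; the value $a_t = 1$ in the closing comment is only an illustrative choice.

```latex
\begin{align*}
E[p_{t+1}(b) - p_t(b) \mid \theta_t, P_t]
  &= \xi_t(b)\,\{p_t(b) + a_t\Delta\} + (1 - \xi_t(b))\,\{p_t(b) - \Delta\} - p_t(b) \\
  &= \xi_t(b)\, a_t \Delta - (1 - \xi_t(b))\, \Delta \\
  &= [\xi_t(b)(a_t + 1) - 1]\, \Delta .
\end{align*}
% Illustration: with a_t = 1 the drift is (2\xi_t(b) - 1)\Delta,
% which is nonnegative exactly when \xi_t(b) \ge 1/2, in agreement with (3.17).
```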

3.4.2 Asymptotic Theorems

Asymptotic theory is often used in the mathematical sciences to provide limiting approximations of the probability distribution of sample statistics. In this section, three asymptotic theorems for DLRA are established, based on the technique presented by Oommen et al. [93]. Theorem 1 shows that DLRA reaches the required number of trials in finite time. Theorem 2 shows that if each state is chosen more than the required number of times, the best state chosen actually has the best performance. Theorem 3 shows the optimality of DLRA.

Theorem 1. For each state $s_i$, suppose $p_0(i) \neq 0$. Then, for any constants $\delta > 0$ and $M < \infty$, there exist $t_0 < \infty$ and $n_0 < \infty$ such that, under the DLRA algorithm, for all $t > t_0$ and all $n > n_0$:

$$
\Pr\{\text{each state is chosen more than } M \text{ times at time } t\} \geq 1 - \delta
$$

Proof. Let the random variable $Z_i^t$ denote the number of times that state $s_i$ has been chosen up to time $t$. For any iteration of the DLRA algorithm:

$$
\Pr\{s_i \text{ is chosen}\} \leq 1 \tag{3.19}
$$

Likewise, the magnitude by which the probability of any state can decrease in a single iteration is bounded by $\Delta$. Therefore, during any of the first $t$ iterations of DLRA:

$$
\Pr\{s_i \text{ is not chosen}\} \leq 1 - \max\{p_0(i) - t\Delta,\ 0\} \tag{3.20}
$$

According to (3.19) and (3.20), the probability that state $s_i$ is chosen at most $M$ times among $t$ choices satisfies:

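$$
\Pr\{Z_i^t \leq M\} \leq \sum_{j=0}^{M} \binom{t}{j} (1)^j \big(1 - \max\{p_0(i) - t\Delta,\ 0\}\big)^{t-j}
= \sum_{j=0}^{M} \binom{t}{j} (1)^j \varphi^{t-j} \tag{3.21}
$$

where $\varphi = 1 - \max\{p_0(i) - t\Delta,\ 0\}$. It must now be shown that the right-hand side of (3.21) is at most $\delta$. Since the sum has $M + 1$ terms, it suffices to show that each term, say the $m$-th, is at most $\delta/(M + 1)$. Therefore, it must be proved that:

$$
\binom{t}{m} (1)^m \varphi^{t-m} \leq \frac{\delta}{M + 1}
\;\Rightarrow\;
(M + 1) \binom{t}{m} \varphi^{t-m} \leq \delta \tag{3.22}
$$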


Since $\binom{t}{m} \leq t^m$, it suffices that:

$$
(M + 1)\, t^m \varphi^{t-m} \leq \delta \tag{3.23}
$$

For the L.H.S. of (3.23) to fall below $\delta$ as $t$ increases, $\varphi$ must be strictly less than unity. The step size $\Delta = 1/(n(N_r + 1))$ must therefore be bounded, relative to $t$, so that $\varphi < 1$. Thus:

$$
1 - \Big[p_0(i) - \frac{t}{n(N_r + 1)}\Big] < 1
\;\Rightarrow\;
n > \frac{t}{p_0(i)(N_r + 1)} \tag{3.24}
$$

So, the value of $n$ is set to $\frac{2t}{p_0(i)(N_r + 1)}$ to meet this requirement. Then, according to (3.21), (3.22), and (3.23):

$$
\Pr\{Z_i^t \leq M\} \leq (M + 1)\, t^m \varphi^{t-m} \tag{3.25}
$$

The R.H.S. of (3.25) is now considered as $t$ approaches infinity:

$$
\lim_{t \to \infty} (M + 1)\, t^m \varphi^{t-m} = (M + 1) \lim_{t \to \infty} \frac{t^m}{(1/\varphi)^{t-m}} \tag{3.26}
$$

By using L'Hopital's rule $m$ times, the following is obtained:

$$
(M + 1) \lim_{t \to \infty} \frac{m!}{\big(\ln(1/\varphi)\big)^m (1/\varphi)^{t-m}} = 0 \tag{3.27}
$$

Thus, since the limit exists, for every state $s_i$ there is a $t(i)$ such that $\Pr\{Z_i^t \leq M\} \leq \delta$ for all $t > t(i)$. In addition, for any $t > t(i)$, $Z_i^{t(i)} \geq M$ implies $Z_i^t \geq M$. So, by the laws of probability:

$$
\Pr\{Z_i^t \leq M\} \leq \Pr\{Z_i^{t(i)} \leq M\} \tag{3.28}
$$

Therefore, for any state $s_i$, $\Pr\{Z_i^t \leq M\} \leq \delta$ whenever $t > t(i)$. Define

$$
t_0 = \max_{1 \leq i \leq N_r + 1} \{t(i)\} \tag{3.29}
$$

In this way, for all $t > t_0$, $\Pr\{Z_i^t \leq M\} \leq \delta$ for all $i$, which implies

$$
\Pr\{Z_i^t \geq M\} \geq 1 - \delta \tag{3.30}
$$
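The decay of the bound in (3.25) can be illustrated numerically. With $n$ set to $2t/(p_0(i)(N_r + 1))$ as in the proof, $t\Delta = p_0(i)/2$, so $\varphi = 1 - p_0(i)/2$ is a constant below 1. The sketch below evaluates the logarithm of $(M + 1)\,t^m\varphi^{t-m}$ to avoid floating-point underflow; the function name `log_tail_bound` and the sample values of $M$, $m$, and $p_0$ are assumptions chosen only for illustration.

```python
import math


def log_tail_bound(t, M, m, p0):
    """Natural log of (M + 1) * t**m * phi**(t - m), with phi = 1 - p0 / 2.

    phi = 1 - p0 / 2 follows from choosing n = 2t / (p0 (N_r + 1)),
    which gives t * Delta = p0 / 2.
    """
    phi = 1.0 - p0 / 2.0
    return math.log(M + 1) + m * math.log(t) + (t - m) * math.log(phi)
```

For instance, with $M = m = 10$ and $p_0 = 0.2$, the bound is still far above 1 at $t = 100$, where it is vacuous, but it falls below $10^{-10}$ by $t = 1000$, since the exponential factor $\varphi^{t-m}$ eventually dominates the polynomial factor $t^m$.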


Next, the second theorem shows that if all the states are chosen enough times, the best state chosen by DLRA actually has the best performance among all of the states.

Theorem 2. For every $\delta \in (0, 1)$, there exists an integer $M$ such that, if every state $s_i$ is selected at least $M$ times by time $t$:

$$
\Pr\{\text{the best state chosen by DLRA actually has the best performance among all states}\} > 1 - \delta
\;\Rightarrow\;
\Pr\{\hat{b}_t = \arg\max_{j}\, d(j)\} > 1 - \delta
$$

Proof. Let $h$ denote the difference between the two largest performances in the network system. The best performance $d(b)$, that of the best state, is assumed to be unique; therefore, $h > 0$ and $d(b) - h \geq d(i)$ for all $i \neq b$. Let $Z_i^t$ be the number of times $s_i$ is chosen up to time instant $t$, and let $\hat{d}_t(i)$ be the estimator of the performance of state $s_i$ at time $t$. Then, according to the weak law of large numbers, for a given $\delta > 0$ there exists an $M_i < \infty$ such that, if $s_i$ is chosen at least $M_i$ times:

$$
\Pr\{|\hat{d}_t(i) - d(i)| < h/2\} > 1 - \delta \tag{3.31}
$$

Let $M = \max_{1 \leq i \leq N_r + 1}\{M_i\}$. If $\min_{1 \leq i \leq N_r + 1}\{Z_i^t\} > M$, then:

$$
\Pr\{|\hat{d}_t(j) - d(j)| < h/2\} > 1 - \delta, \quad \forall j,\ \forall t \tag{3.32}
$$

From Theorem 1, a $t_0$ can be found such that

$$
\Pr\Big\{\min_{1 \leq i \leq N_r + 1}\{Z_i^t\} > M\Big\} > 1 - \delta, \quad \forall t > t_0 \tag{3.33}
$$

Therefore, each $\hat{d}_t(i)$ lies in an $h/2$ neighborhood of $d(i)$ if all states are chosen at least $M$ times. So, for every $i \neq b$:

$$
\hat{d}_t(b) > d(b) - h/2 \geq d(i) + h/2 > \hat{d}_t(i)
\;\Rightarrow\;
\hat{d}_t(b) > \hat{d}_t(i) \tag{3.34}
$$

Based on the two previous theorems, the last theorem can be obtained. The last theorem shows the optimality of DLRA which means that the probability of the best state achieves its maximal value.


Theorem 3. In every stationary network system, the DLRA algorithm is $\epsilon$-optimal. More explicitly, given any $\epsilon > 0$ and $\delta > 0$, there exist $n_0 < \infty$ and $t_0 < \infty$ such that, for all $n > n_0$ and $t > t_0$:

$$
\Pr\{|p_t(b) - 1| < \epsilon\} > 1 - \delta
$$

Proof. According to Theorem 2, $M$ is a constant for each network system, and by Theorem 1 there exist $t_0 < \infty$ and $n_0 < \infty$ such that, under DLRA:

$$
\Pr\{Z_i^t > M\} > 1 - \delta \tag{3.35}
$$

Define $U$ and $V$ as the following two events:

$$
U \equiv |p_t(b) - 1| < \epsilon, \qquad
V \equiv \max_{1 \leq i \leq N_r + 1} |\hat{d}_t(i) - d(i)| < h/2 \tag{3.36}
$$

So

$$
\Pr\{U \mid V\} = \Pr\Big\{|p_t(b) - 1| < \epsilon \;\Big|\; \max_{1 \leq i \leq N_r + 1} |\hat{d}_t(i) - d(i)| < h/2\Big\} \tag{3.37}
$$

According to the previous discussion:

$$
\lim_{t \to \infty} \Pr\{U \mid V\} = 1 \tag{3.38}
$$

By Theorem 2 and (3.35):

$$
\lim_{t \to \infty} \Pr\{V\} \geq 1 - \delta \tag{3.39}
$$

By the law of total probability and the continuity of probability:

$$
\lim_{t \to \infty} \Pr\{U\} \geq \lim_{t \to \infty} \Pr\{U \mid V\}\, \lim_{t \to \infty} \Pr\{V\} \tag{3.40}
$$

From (3.38), (3.39), and (3.40):

$$
\lim_{t \to \infty} \Pr\{|p_t(b) - [1 - N_r\Psi]| < \epsilon\} \geq 1 - \delta \tag{3.41}
$$

Proposition 1. DLRA converges under any initialization of P₀.

Proof. Assume that $P_0 = \{p_0(1), p_0(2), \ldots, p_0(N_r + 1)\}$, and consider three cases:

(1) If $p_0(i) > 0$ $\forall i$:

This case was discussed earlier in this section. When (3.17) holds, the sequence $p_t(b)$ is a submartingale; therefore, DLRA converges to the best state. According to Theorem 2, there always exists an integer $M$; thus, (3.17) always holds, and DLRA converges to the best state in this case.

(2) If $p_0(i) = 0$ for some $i \neq b$:

When DLRA converges to the best state, the value of $p_t(i)$ is zero for all $i \neq b$. The sequence $p_t(i)$ has therefore already converged under this initial condition, and DLRA eventually converges to the best state, namely $p_t(b) = 1$.

(3) If $p_0(b) = 0$:

Since the probability of the best state is zero in the initial condition, DLRA cannot converge to the best state $b$. Let $\hat{b}$ denote the best state among all states excluding the original best state $b$, and let $\xi(\hat{b})$ denote the probability that the update policy actually increases the value of $p_t(\hat{b})$. This condition can be regarded as the best state $b$ no longer existing in the system, which reduces to case (1) with $\hat{b}$ as the best state. So, DLRA converges to state $\hat{b}$, namely $p_t(\hat{b}) = 1$.

From the discussion of the three cases, we conclude that DLRA converges under any initialization of $P_0$.

Proposition 2. DLRA converges under any channel fading model.

Proof. The impact of the channel fading model is captured by the variable $\theta_t$. According to Theorem 2, there always exists an $M$ for which $\xi_t(b)$ meets (3.17). Different channel fading models yield different values of $M$. Since $M$ always exists, DLRA converges under any channel model.


Figure 3.2: State transition diagram of DLRA. (Tree diagram over $t = 0, 1, 2, 3$: the initial state 0 branches into states $1, \ldots, N$ with transition probabilities $\xi_0(1), \ldots, \xi_0(N)$, and each state branches again into $N$ states at every subsequent step.)

3.4.3 Performance

In this section, we provide a performance analysis based on mathematical manipulation. Assume that each source node initially assigns equal probability to each choice, namely each choice has probability $1/N$ of being chosen. Fig. 3.2 shows the state transition diagram of one node in DLRA; the different states stand for the different evolutions of the probability vector. In Fig. 3.2, $N$ is the number of choices, $\xi_i(j)$ is the probability that the probability of choosing choice $j$ is increased at time $i$, and $S_i$ stands for state $i$. In this section, different choices refer to different relay selections, and each state stands for a probability vector over relay selections. Suppose the performance of choice $i$ is denoted by $C_i$. The expected performance of the initial state, namely state 0 in Fig. 3.2, is then:

$$
E[S_0] = \sum_{i=1}^{N} P_i \times C_i = \sum_{i=1}^{N} \frac{1}{N} \times C_i = \frac{\sum_{i=1}^{N} C_i}{N} \tag{3.42}
$$

In Fig. 3.2, state $i$ ($i \in \{1, \ldots, N\}$) is the state in which the probability of choosing choice $i$ has been increased and the probabilities of the other choices have been decreased; assume the step size is $\Delta$. The expected performance of state $i$ is therefore:

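$$
\begin{aligned}
E[S_i] &= \sum_{j=1,\, j \neq i}^{N} \Big(\frac{1}{N} - \Delta\Big) C_j + \Big[\frac{1}{N} + (N - 1)\Delta\Big] C_i \\
&= \Big(\frac{1}{N} - \Delta\Big) \sum_{j=1,\, j \neq i}^{N} C_j + \frac{C_i}{N} + \Delta (N - 1) C_i \\
&= \frac{\sum_{j=1}^{N} C_j}{N} + \Delta \sum_{j=1}^{N} (C_i - C_j)
\end{aligned}
\tag{3.43}
$$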
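The closed form in (3.43) can be checked numerically against a direct evaluation of the expectation. In the sketch below, the function names and the sample performance values are assumptions made for illustration; choices are indexed from 0.

```python
def expected_direct(C, i, delta):
    """E[S_i] computed directly from the probability vector of state i.

    Choice i has probability 1/N + (N - 1) * delta and every other
    choice has probability 1/N - delta.
    """
    N = len(C)
    probs = [1.0 / N - delta] * N
    probs[i] = 1.0 / N + (N - 1) * delta
    return sum(p * c for p, c in zip(probs, C))


def expected_closed_form(C, i, delta):
    """E[S_i] from the last line of (3.43): mean plus delta times the gaps."""
    N = len(C)
    return sum(C) / N + delta * sum(C[i] - c for c in C)
```

For the performances `[1.0, 2.0, 5.0, 3.0]`, choice index 2, and a step size of 0.05, both functions give 3.2, compared with the initial expected performance of 2.75 from (3.42). The gain is positive because choice 2 has the highest performance, so every gap $C_i - C_j$ is nonnegative.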

In this state diagram, there are 1 and $N$ states at $t = 0$ and $t = 1$, respectively. The number of states is $N^2$ at $t = 2$ and grows to $N^T$ at $t = T$. This number of states becomes unmanageable, and the diagram too complicated to solve, as $t$ grows.

We therefore merge the states that are not the best choice into one state and assume that they share the same performance, denoted by $C_a$, which is lower than the best performance, denoted by $C_b$. In addition, we assume that $\xi_b(t)$ is the same for all $t$; according to the asymptotic theorems, this assumption is reasonable, and we write $p = \xi_b(t)$ for simplicity. The resulting diagram is modeled as a Markov chain, shown in Fig. 3.3.
