Achieving reliable, high-data-rate communication over wireless links remains a challenging problem. In fact, the inherent nature of the wireless medium has created a number of new research topics. In contrast to wire-line communications, the wireless medium is a ubiquitous resource that multiple transmissions can access simultaneously. Sharing the medium among multiple links creates a mutually interfering environment and gives rise to challenges in resource management. Conventional cellular networks consisting of multiple base stations adopt frequency planning. However, we have to consider universal frequency reuse, for two reasons. First, a frequency reuse factor larger than one limits spectrum efficiency, because each cell uses only a fraction of the spectrum regardless of the actual interference conditions. Second, in newly developed network topologies, base stations can be deployed in a distributed manner, which makes cell planning hard. Clearly, universal frequency reuse among nearby cells results in inter-cell interference (ICI) and degrades performance. This observation, though straightforward, underlies many research topics in wireless communications. We mention two examples below.
Cooperative communications.
The broadcast nature of wireless communications means that a receiver node can overhear the source signal transmitted towards a neighboring node. Instead of treating the overheard information as interference and trying to mitigate its negative effect, cooperative communication takes advantage of the proximity of nodes to create spatial diversity, thereby improving spectrum efficiency and reliability. In practice, cooperation can be implemented in different ways. In relay (multi-hop) networks, the signal is received and processed at the surrounding nodes, then re-transmitted towards the destination. On the other hand, when a multi-antenna system is considered, signal processing techniques can be applied to transmit the signal simultaneously from multiple nodes. In this case, the signal to be transmitted is pre-processed to suppress the ICI and obtain a diversity gain. Assuming a perfect back-haul connection, the network consisting of multiple cells can be viewed as a virtual MIMO system. The two scenarios are shown in Figure 1.1.
Self-organized resource management.
The limitations on coordination in distributed networks give rise to new challenges for resource management. Moreover, self-organized network (SoN) capability has received much attention because, unlike negotiation-based approaches, it does not suffer from information exchange overhead. SoN has been studied in several settings. From the spectrum utilization perspective, dynamic spectrum access (DSA) calls for a distributed decision-making mechanism that accounts for a possibly time-varying environment. Another example is heterogeneous networks, in which the spectrum is owned by multiple service providers and users need a proper network selection mechanism. Two fundamental mathematical tools frequently used in SoN are game theory and reinforcement learning (RL). Game theory investigates the interaction among self-acting agents, in either cooperative or non-cooperative ways. A game-theoretic formulation defines the solution concept of an equilibrium, a point from which no agent can obtain a better result by deviating unilaterally. On the other hand, RL algorithms help individual agents learn a better strategy based on their own action-reward history. Interestingly, the two tools may be combined; several reinforcement learning techniques have been proven to achieve the equilibrium point.

Figure 1.1: Two scenarios in cooperative communications: relay network and virtual MIMO.
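The equilibrium concept mentioned above can be made concrete for the pure-strategy case. The notation here is an assumption introduced for illustration and may differ from the notation used later in the thesis: $\mathcal{N}$ is the set of agents, $\mathcal{A}_i$ the action set of agent $i$, $u_i$ its utility function, and $a_{-i}$ the actions of all agents other than $i$.

```latex
% Pure-strategy Nash equilibrium (illustrative notation, assumed here):
% a profile a^* = (a_1^*, \dots, a_N^*) is an equilibrium if no agent
% gains by deviating unilaterally.
\[
  u_i\bigl(a_i^{*}, a_{-i}^{*}\bigr) \;\ge\; u_i\bigl(a_i, a_{-i}^{*}\bigr)
  \qquad \forall\, a_i \in \mathcal{A}_i,\;\; \forall\, i \in \mathcal{N}.
\]
```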
This thesis investigates distributed resource management in wireless communications. Specifically, we study the use of reinforcement learning under game-theoretic formulations. The motivation is that, while the problem structures can be quite different, we would like to propose a unifying scheme that is suitable for various applications. A general guideline of the proposed scheme is as follows. First, some components of the network (e.g., base stations or users) are identified as the agents (players) of the game. Second, the utility function is defined to reflect the agents’ interests, either individual or common. Finally, assuming they are rational and selfish devices, the agents act as learning automata and learn the strategies that maximize their individual payoffs. Notice that, in addition to the interaction among players, the time-varying external state is also considered in the learning procedure.
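The three steps above can be sketched in a toy setting. The example below is only an illustration under stated assumptions: it uses the linear reward-inaction rule, a standard learning-automata update, as a stand-in for the learning rule (the rule adopted in the thesis may differ), the 2x2 common-payoff matrix `PAYOFF` is hypothetical, rewards are assumed to be normalized to [0, 1], and no external state is modeled.

```python
import random


def lri_update(probs, action, reward, step=0.05):
    """Linear reward-inaction update (assumed rule, for illustration).

    Shifts probability mass toward the chosen action in proportion to the
    reward, which must lie in [0, 1]. A zero reward leaves probs unchanged.
    """
    return [
        p + step * reward * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


def run_common_payoff_game(payoff, rounds=5000, step=0.05, seed=0):
    """Two agents learn independently from their own action-reward history."""
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    p1 = [1.0 / n_rows] * n_rows  # mixed strategy of agent 1
    p2 = [1.0 / n_cols] * n_cols  # mixed strategy of agent 2
    for _ in range(rounds):
        # Step 1: each agent (player) samples an action from its strategy.
        a1 = rng.choices(range(n_rows), weights=p1)[0]
        a2 = rng.choices(range(n_cols), weights=p2)[0]
        # Step 2: the utility; here a common payoff shared by both agents.
        reward = payoff[a1][a2]
        # Step 3: each agent updates using only its own action and reward.
        p1 = lri_update(p1, a1, reward, step)
        p2 = lri_update(p2, a2, reward, step)
    return p1, p2


# Hypothetical common-payoff matrix with entries in [0, 1].
PAYOFF = [[1.0, 0.0],
          [0.0, 0.6]]

p1, p2 = run_common_payoff_game(PAYOFF)
print("agent 1 strategy:", p1)
print("agent 2 strategy:", p2)
```

In this hypothetical game both coordinated outcomes are pure-strategy equilibria, so which one the agents settle on can depend on the random seed and the step size. Note that neither agent observes the other's action or strategy; each relies only on its own action-reward history.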
Starting with the seminal contributions of von Neumann and Morgenstern [1] and Nash [2], game theory was extensively investigated in the previous century. While early works focused on economics, game theory has become a popular tool among researchers in wireless networks. Comprehensive surveys of game-theoretic studies for different wireless network applications can be found in [3,4]. We have also seen rapid development of RL algorithms over the past few decades. Q-learning [5] is a simple way for agents to learn how to act optimally in controlled Markovian domains. It works by successively improving its evaluations of the quality (Q-value) of particular actions at particular states. Another learning method, referred to as stochastic learning (SL), is based on updating the probability of selecting each action. Using techniques from stochastic approximation [6], the SL process can be shown to track the ODE of different dynamics, and the resulting state depends on the learning rule adopted. Hybrid learning, where the agents may adopt different learning rules to obtain their strategies, was discussed in [7]. SL has been applied to several areas in wireless networks, for example, precoder selection [8], network selection [9], and cognitive radio [10]. The connection between learning and games has been investigated by Sastry et al. [11]. The authors proposed an SL algorithm and pointed out that a Nash equilibrium (NE) can be achieved when the algorithm is applied to common-payoff games. In this thesis we further show that the same algorithm achieves NE for potential games, of which the common-payoff game is a special case.
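For reference, the two notions used above can be written compactly. Both expressions are the standard textbook forms and the notation is an assumption introduced here: $\alpha_t$ is a learning rate, $\gamma$ a discount factor, $r_t$ the reward, $u_i$ the utility of agent $i$, and $\Phi$ a potential function. The second expression is the exact potential game condition; the class of potential games treated in the thesis may be broader or defined differently.

```latex
% Q-learning update (standard form, illustrative notation):
\[
  Q_{t+1}(s_t, a_t) = (1-\alpha_t)\, Q_t(s_t, a_t)
  + \alpha_t \Bigl[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \Bigr].
\]

% Exact potential game: there exists a function \Phi such that every
% unilateral change in an agent's utility equals the change in \Phi.
\[
  u_i(a_i', a_{-i}) - u_i(a_i, a_{-i})
  = \Phi(a_i', a_{-i}) - \Phi(a_i, a_{-i})
  \qquad \forall\, i,\; \forall\, a_i, a_i' \in \mathcal{A}_i,\; \forall\, a_{-i}.
\]

% Common-payoff special case: if u_i = u for all i, the condition
% holds with \Phi = u.
```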