National Chiao Tung University
Department of Electronics Engineering & Institute of Electronics, Master's Program
Master's Thesis
Multistage Bayesian Game based Spectrum Trading for
Cognitive Radio Networks
Student: Chong-You Lee
Advisor: Dr. Feng-Tsun Chien
A Thesis
Submitted to the Department of Electronics Engineering & Institute of Electronics,
College of Electrical and Computer Engineering,
National Chiao Tung University,
in Partial Fulfillment of the Requirements
for the Degree of Master of Science
in
Electronics Engineering
January 2010
Hsinchu, Taiwan, Republic of China
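Abstract (Chinese Version)
In this thesis, we study spectrum trading in cognitive radio networks from a game-theoretic perspective. We consider a cognitive radio network composed of multiple primary services and multiple secondary services. The primary services are the spectrum sellers in this trading game and can set the price per unit bandwidth for leasing bandwidth to the secondary services; the secondary services are the buyers in the game and decide how much bandwidth to buy from the sellers. We propose a trading model based on a multistage Bayesian game to model the private information that each player may not disclose, and sequentially obtain the perfect Bayesian equilibrium subject to the bandwidth constraints. The bandwidth constraints require that the total bandwidth demanded by all buyers not exceed what the seller can supply and that the bandwidth demanded by each buyer be nonnegative. Following the backward induction principle, we convert the buyers' Karush-Kuhn-Tucker (KKT) conditions into conditions of the sellers' optimization problems, and collect the KKT conditions of all sellers into a joint KKT condition; the solutions satisfying this joint KKT condition are the solutions of the game. We further propose an active-set algorithm to solve the joint KKT condition and analyze its complexity. This thesis also examines whether the players' actions and their beliefs about the unknown information converge. In the simulations, we compare our approach with previous work and numerically investigate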
Abstract
In this thesis, we study the problem of spectrum trading in cognitive radio (CR) networks from a game-theoretic perspective. In particular, we consider a CR network with multiple primary services (PSs) and multiple secondary services (SSs), in which the PSs are sellers that set the prices for spectrum leasing and the SSs are buyers that decide how much spectrum to demand from each PS in the trading game. To capture these trading behaviors, we propose a multistage Bayesian game based trading model that accounts for the possibly unknown private information of each player, and we obtain the perfect Bayesian equilibrium (PBE) sequentially under a bandwidth constraint, which requires that the total bandwidth demanded by all SSs not exceed what the PS can offer and that each SS's demand be nonnegative. Following the backward induction principle, we transform the Karush-Kuhn-Tucker (KKT) conditions of the SSs into constraints of each PS's optimization problem, and collectively form joint KKT conditions that satisfy the bandwidth constraint. We present an active-set based algorithm to solve the joint KKT conditions and analyze its complexity. Furthermore, we investigate the convergence behaviors of the action profiles and of the beliefs about the unknown information. Finally, in the simulations, we compare the proposed approach with earlier work and numerically study the convergence behaviors of the proposed
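Acknowledgments
I would like to thank my advisor, Prof. Feng-Tsun Chien, who has actively provided me with all kinds of learning resources since my senior year of college. When I went to Leuven in the first year of my master's program to pursue a double degree, he readily agreed, which gave me the chance to experience studying abroad. What I saw that year had a great impact on me, in terms of the learning environment abroad, the way academic research is done, and the society and culture. After I returned to Taiwan for the second year of my master's program, I worked with him in game theory, a field I had never studied before. Although learning it on my own was hard, whenever I had a question he always discussed it with me tirelessly; sometimes a single discussion took the whole afternoon, and sometimes we met three times a week. Through this process I not only learned a great deal but also gradually built up the confidence to do research independently, and for this I thank him for his challenges and guidance. I also thank Prof. 桑梓賢 for introducing me to optimization theory and for his discussions and guidance on my research.
In addition, I thank my colleagues in the Communication Electronics and Signal Processing Laboratory, especially 江清德 and 黃盈叡, who helped me adapt to the environment and fit into the lab when I had just returned, and whom I often asked, in the midst of their own research, to check whether my logic and reasoning were sound. To the other senior and junior students, classmates, and friends whom I cannot thank one by one: without you, my second year would not have been so rich.
During my six years in Hsinchu, I also thank my uncle 李宗仁 and 如英 for taking care of me. Finally, I thank my family for their constant support; you are always my greatest source of strength.
I dedicate this thesis to everyone who loves me and whom I love.
李重佑 (Chong-You Lee)
January 2010 (Year 99 of the Republic of China), Hsinchu
Contents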
1 Introduction
1.1 Significance of the Research
1.2 Motivation
1.2.1 Why Game Theory?
1.2.2 Related Work and Our Approach
1.3 Contribution
2 Cognitive Radio and Game Theory Preliminary
2.1 Cognitive Radio
2.2 Game Theory
2.2.1 A Basic Game – Definitions and Theorems
2.2.2 Multistage Bayesian Game
3 Multistage Bayesian Spectrum Trading Game
3.1 Problem Setup
3.2 Game Formulation
3.3 General Formulation for Multiple Sellers and Multiple Buyers
3.3.1 Utility Model
3.3.2 Self-Optimization and KKT Translation
3.3.3 Perfect Bayesian Equilibrium and Joint KKT Condition
3.4 Explicit System for Multiple Sellers and Multiple Buyers
3.4.1 Utility Model
3.4.3 Algorithm for Solving Joint KKT Condition
3.5 Convergence of Beliefs and Actions
4 Simulations
4.1 Simulation Setup
4.2 Numerical Results
4.2.1 Effectiveness of the Joint KKT Conditions
4.2.2 Evolutions of Beliefs and Actions
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
List of Figures
2.1 Simplified Cognition Cycle.
3.1 The cognitive radio network with multiple primary services and multiple secondary services.
3.2 The evolution of the multistage sequential game.
4.1 The best response, Nash equilibrium, and feasible region for PS's with $W_1 = 15$ MHz, $W_2 = 15$ MHz, $\gamma^p_1 = 15$ dB, $\gamma^p_2 = 15$ dB, $\gamma^s_{11} = 22$ dB, and $\gamma^s_{12} = 22$ dB.
4.2 Profit function given that the opponents play the equilibrium strategy, with $W_1 = 15$ MHz, $W_2 = 15$ MHz, $\gamma^p_1 = 15$ dB, $\gamma^p_2 = 15$ dB, $\gamma^s_{11} = 22$ dB, and $\gamma^s_{12} = 22$ dB. (a) Of PS1. (b) Of PS2. (c) Of SS.
4.3 The best response, Nash equilibrium, and feasible region for PS's with $W_1 = 5$ MHz, $W_2 = 5$ MHz, $\gamma^p_1 = 15$ dB, $\gamma^p_2 = 15$ dB, $\gamma^s_{11} = 22$ dB, and $\gamma^s_{12} = 10$ dB.
4.4 Profit function given that the opponents play the equilibrium strategy, with $W_1 = 5$ MHz, $W_2 = 5$ MHz, $\gamma^p_1 = 15$ dB, $\gamma^p_2 = 15$ dB, $\gamma^s_{11} = 22$ dB, and $\gamma^s_{12} = 10$ dB. (a) Of PS1. (b) Of PS2. (c) Of SS.
4.5 The equilibrium strategy over stages: a comparison between the scenario with $M = 3$ and that with $M = 4$. (a) Of PS's. (b) Of SS's.
4.6 The equilibrium strategies over stages of Case 1. (a) Of PS's. (b) Of SS's. (c) The possible minimal equilibrium strategies of SS's.
4.7 The equilibrium strategies over stages of Case 2. (a) Of PS's. (b) Of SS's.
4.8 The equilibrium strategies over stages of Case 3. (a) Of PS's. (b) Of SS's.
List of Tables
4.1 Belief Updating versus Stage for Case 1
4.2 Belief Updating versus Stage for Case 2
Chapter 1
Introduction
1.1 Significance of the Research
Nowadays, the radio spectrum is more congested than ever before. Most of the spectrum suitable for long-distance radio transmission has been allocated to licensed users, leaving little room for emerging wireless applications. However, licensed spectrum utilization varies greatly over time and geographic location, and, according to the survey in [1], only about 2% of the licensed spectrum is always in use. That is, the efficiency of spectrum utilization is unacceptably low, so researchers have begun to consider different spectrum allocation policies to tackle the problem. There are two main approaches to this issue. In one, unlicensed users opportunistically utilize licensed spectrum as long as they do not interfere with licensed users; in the other, spectrum trading (or active negotiation) between licensed and unlicensed users is regarded as a promising solution [2, 3]. Both schemes are considered possible solutions for dynamic spectrum management. The idea of spectrum trading comes from economics, motivated by the success of market mechanisms in the economic world. Both schemes can clearly enhance the efficiency of spectrum utilization, but how the network behaves, how much the efficiency can be increased, and how fairness can be guaranteed are open issues. In this thesis, we attempt to address several of these issues in the spectrum trading problem.
1.2 Motivation
The promise of providing multimedia services anytime and anywhere demands a large amount of spectrum for broadband wireless communications. On one hand, this drives radio technology toward faster, more convenient, and more reliable communications. On the other hand, the enormous demand also exposes the scarcity and the inefficient utilization of the current radio spectrum. Useful radio spectrum is a scarce resource because propagation characteristics differ across frequencies; e.g., communication at 60 GHz is suitable only for short distances because of the absorption of radio signals by atmospheric oxygen. Nowadays, the most useful spectrum for medium- and long-distance communication lies below 5 GHz, owing to these propagation characteristics and to current circuit technology. To tackle the problem, the idea of exploiting under-utilized licensed spectrum for more flexible and efficient transmission has lately received significant attention. In particular, cognitive radio (CR) [4] is considered a promising technique for improving the efficiency of radio spectrum utilization.
A cognitive radio (CR) is a software-defined radio capable of intelligently sensing, adapting, and responding to a constantly varying environment, in particular to the available spectrum temporarily unused by licensed users. However, many technical challenges remain before cognitive radios can be practically deployed. One critical challenge is how to persuade licensed service operators to accept coexistence with cognitive users, so that they are willing to share their unused spectrum with unlicensed cognitive (secondary) services. Leasing available spectrum to unlicensed services is an attractive solution that gives legitimate licensed operators an incentive to support the deployment of cognitive radios [3]: it earns monetary profit for the licensed operators, while the unlicensed services satisfy their requirements by renting.
1.2.1 Why Game Theory?
Conventional Media Access Control (MAC) theory is based on optimization, and the objective function it optimizes is the network system utility, possibly in a fairness-aware form such as proportional fairness. Although some optimization-based formulations can be decomposed by primal/dual decomposition methods [5] into subproblems that optimize the network utility and the user utilities separately, which makes distributed decision making possible, the solution of the optimization problem inherently cannot always satisfy each user's individual utility.
In contrast to the optimization-based approach, game theory is a mathematical tool for analyzing interactions among multiple entities, each with its own utility function, and it intrinsically looks for equilibrium solutions at which each user maximizes its own utility given the actions of the others. Although the network system utility may not be optimized, the strategy obtained from the game-theoretic perspective provides a solution that achieves efficiency and fairness under certain criteria.
1.2.2 Related Work and Our Approach
An overview of the general ideas and recent developments of dynamic spectrum sharing games can be found in [6]. An auction mechanism for spectrum bands in CR networks with multiple primary and multiple secondary users is considered in [7], where the authors discuss the competitive equilibrium and cheating behaviors that may deteriorate the efficiency of spectrum sharing, and propose using reserve prices and beliefs to prevent collusion. The works in [8] and [9] consider game models that incorporate both the monetary gain and the quality-of-service (QoS) satisfaction of wireless services in the utility functions. The authors in [9] explicitly model the price of available bandwidth as a function of demand, and obtain the Nash equilibrium (NE) of the spectrum sharing strategy in a network consisting of a single primary service (PS) and multiple secondary services (SS's). The work in [8] considers the spectrum trading game in a CR network with multiple PS's and a single SS, and models the utilities of the PS and the SS separately, wherein the demand of the SS implicitly affects the price. However, under certain circumstances, the equilibrium bandwidth demand of the SS would be negative, and the corresponding NE turns out to be infeasible, though theoretically solvable. The work in [10] discusses the same problem as [8], and compares different features such as market equilibrium and competitive and cooperative pricing strategies. In [11], the authors investigate spectrum trading behaviors with a more general model in which multiple primary users (PUs) and multiple secondary users (SUs) are considered in the CR network. However, the utility model considered in [11] may not capture the different QoS requirements of SUs, and it assumes that each PU sets the same price for all SUs. One key assumption underlying all the above work is that each player in the modeled game has complete knowledge of the other players' private information, which is in general not realistic. To account for the unknown private information of each player, one can resort to tools from Bayesian games or stochastic games to study spectrum trading behaviors in a sequential (dynamic) manner [12, 13]. In [12], we formulate the spectrum trading behaviors of a CR network with multiple PS's and a single SS as a Bayesian game, and study the corresponding solution concept, i.e., the perfect Bayesian equilibrium, sequentially. The work in [13] proposes to characterize the dynamics of spectrum access strategies under a stochastic game framework with state transitions. The authors also propose to predict future dynamics using approaches from learning theory in order to obtain better strategies for spectrum bidding.
In this work, extending the studies in [12], we address the problem of spectrum trading in a CR network consisting of multiple PS's and multiple SS's. We assume that each player (PS or SS) in the game has its own private information, such as the number of active connections within each service or the channel conditions, that is unknown to the other players. Under this assumption, we formulate a multistage trading model based on the Bayesian game to statistically account for the unknown private information (incomplete information), and sequentially obtain the perfect Bayesian equilibrium (PBE) in the trading process. We further assume that each PS is allowed to set different prices for SS's with different QoS, and that SS's with different QoS can demand different bandwidth sizes from a particular PS. In particular, we impose a bandwidth constraint on the aggregate bandwidth demand from all SS's, so that the total demand lies within the feasible supply region provided by each PS.
1.3 Contribution
To deal with trading behaviors in which each player has its own private information, we propose a multistage Bayesian game based trading model that accounts for the possibly unknown private information of each player, and we obtain the perfect Bayesian equilibrium (PBE) sequentially under a bandwidth constraint, which requires that the total bandwidth demanded by all SS's not exceed what the PS can offer and that each SS's demand be nonnegative. We formulate the problem as a multistage game, since a one-shot game cannot capture the time-varying demand for resources caused by the dynamic nature of wireless channels and wireless services. In a multistage game, the allocation is performed repeatedly, and belief updates based on observing the others' actions become possible. Our formulation captures different pricing and demand strategies for different seller-buyer pairs based on their QoS. More specifically, on one hand, we allow a primary service to set different prices per unit bandwidth for different secondary services based on their operating conditions and QoS requirements. On the other hand, different secondary services can demand different bandwidth sizes from the same primary service. Following the backward induction principle, we transform the Karush-Kuhn-Tucker (KKT) conditions of the SS's into constraints of each PS's optimization problem, and collectively form joint KKT conditions that satisfy the bandwidth constraint, which guarantees that our PBE is physically feasible. We present an active-set based algorithm to solve the joint KKT conditions and analyze its complexity. Furthermore, we illustrate the spectrum trading game with an example using specific utility functions for the PS's and SS's. We also investigate the convergence behaviors of the action profiles and of the beliefs about the unknown information.
In the simulations, we compare the proposed approach with that in [8] and numerically study the convergence behaviors of the proposed multistage game.
As a final remark in this section, we would like to emphasize the general applicability of the joint KKT approach to solving games with constraints. Mathematically, we formulate the problem considered in this thesis as a game with constraints, which is often very difficult to solve. Relevant approaches are rarely seen in the field of pure game theory, let alone in the literature on wireless networks. In most studies that consider games with constraints, the problems have a particular mathematical structure such that the solutions always lie on the boundary set by the constraints. In this thesis, we solve a bandwidth-constrained game, where the constraints include budgets and feasible bandwidths, using the proposed joint KKT conditions. It is worth noting that the joint KKT conditions are generally applicable to constrained games, and their solutions need not lie on the boundary of the constraints.
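To make the joint KKT idea concrete, the following is a minimal Python sketch on a toy problem, not the spectrum trading model of this thesis: an $N$-player Cournot game in which player $i$ chooses $x_i$ to maximize $x_i(a - \sum_j x_j) - c_i x_i$ subject to $0 \le x_i \le \mathrm{cap}_i$. Each utility is strictly concave in the player's own variable, so a point that satisfies every player's KKT conditions simultaneously is a pure-strategy NE. The sketch stacks the players' KKT systems and enumerates all $3^N$ active-set patterns (lower bound active, interior, upper bound active), solving one linear system per pattern and keeping the patterns whose multiplier signs and bounds check out. The function names, the utility, and the brute-force enumeration are illustrative assumptions; the active-set algorithm of this thesis (Section 3.4.3) may be organized differently.

```python
from itertools import product


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None  # singular: this active set gives no unique candidate
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def cournot_joint_kkt(a, costs, caps, tol=1e-9):
    """Enumerate active sets of the stacked (joint) KKT conditions of a toy Cournot game.

    Player i maximizes x_i * (a - sum(x)) - c_i * x_i subject to 0 <= x_i <= caps[i].
    Marginal utility of player i: g_i(x) = a - c_i - sum(x) - x_i.
    """
    n = len(costs)
    equilibria = []
    for status in product(("low", "free", "up"), repeat=n):
        A, b = [], []
        for i, s in enumerate(status):
            if s == "free":  # stationarity g_i(x) = 0: 2 x_i + sum_{j != i} x_j = a - c_i
                A.append([2.0 if j == i else 1.0 for j in range(n)])
                b.append(a - costs[i])
            else:  # bound active: x_i fixed at 0 or at caps[i]
                A.append([1.0 if j == i else 0.0 for j in range(n)])
                b.append(0.0 if s == "low" else caps[i])
        x = solve_linear(A, b)
        if x is None:
            continue
        ok = True
        for i, s in enumerate(status):
            g = a - costs[i] - sum(x) - x[i]
            if s == "free":
                ok &= -tol <= x[i] <= caps[i] + tol  # interior point must be feasible
            elif s == "low":
                ok &= g <= tol  # multiplier of x_i >= 0 equals -g, must be nonnegative
            else:
                ok &= g >= -tol  # multiplier of x_i <= cap_i equals g, must be nonnegative
        if ok:
            equilibria.append((status, [round(v, 6) for v in x]))
    return equilibria


print(cournot_joint_kkt(10.0, [1.0, 1.0], [5.0, 5.0]))  # [(('free', 'free'), [3.0, 3.0])]
print(cournot_joint_kkt(10.0, [1.0, 1.0], [2.0, 5.0]))  # [(('up', 'free'), [2.0, 3.5])]
```

In the first call the equilibrium is interior; in the second, the first player's capacity binds while the second player stays interior, which mirrors the remark above that joint KKT solutions need not lie on the constraint boundary. The enumeration grows as $3^N$, so it only suits very small games.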
Chapter 2
Cognitive Radio and Game Theory Preliminary
2.1 Cognitive Radio
Cognitive radio, which first appeared in Joseph Mitola's doctoral dissertation in 2000 [4], is defined as an intelligent wireless communication system that is capable of achieving highly reliable communication whenever and wherever needed by adjusting its own transmission parameters according to the radio environment it senses. CR is called "cognitive" because it is equipped with structures supporting a cognition cycle consisting of Observe, Orient, Plan, Decide, and Act phases, as Fig. 2.1¹ shows. For practical implementation, CR is built on software-defined radio and a wideband RF front end. Several CR prototypes have already been built, such as the first prototype, CR1, by Mitola [4], and the CR and networking prototype by Virginia Tech [14].
Figure 2.1: Simplified Cognition Cycle.
¹This figure is adapted from Mitola, "Cognitive Radio: An Integrated Agent Architecture for Software
Although the initial aim of CR was not efficient utilization of the radio spectrum, it is a natural candidate for solving the problem of spectrum under-utilization. A CR can either opportunistically detect spectrum holes and transmit, or actively negotiate with primary users, i.e., the existing licensed users, to access the spectrum. In recent years, a tremendous amount of research has been conducted on CR-related topics. It can be classified into three fundamental tasks [3]:
1. Radio-scene analysis, which includes estimation of interference temperature of the radio environment and detection of spectrum holes.
2. Channel state estimation and predictive modeling, which encompasses estimation of channel-state information and prediction of channel capacity for use by the transmitter.
3. Transmit power control and dynamic spectrum management.
Our work focuses on dynamic spectrum management, which we tackle with a game-theoretic approach.
2.2 Game Theory
Game theory is a mathematical tool for predicting the outcome of interactions among rational decision makers. Predicting such outcomes has great merit in many fields, such as chess, card games, gambling, business and economics, politics, international diplomacy, and also wireless networks, in which we are interested, since in all these fields no player can achieve its goal or maximize its own profit without considering the competitors' behavior. Sometimes an explicit model is difficult to define (e.g., in politics), or the model is too complex for predicting the result and deriving the winning strategy² (e.g., chess cannot easily be formulated in a familiar mathematical form, and its strategy space is discrete, which makes it non-differentiable and far too large for every strategy profile to be examined). Even so, game theory is an important tool that provides either a solution to a simplified problem or useful insight. For wireless networks, applying game theory to predict and further to regulate the network is promising, since the increasing complexity of wireless networks results in significant interference and in the foreseeable dynamics of interacting users in cognitive networks.
In this section, we introduce the basics of noncooperative game theory that are necessary for understanding our work; interested readers can refer to [15] or [16] for more in-depth material.
2.2.1 A Basic Game – Definitions and Theorems
In essence, a game consists of multiple players, each of which has its own strategy (i.e., a decision variable), which it can freely adjust, and its own objective function (i.e., a utility), which depends on its own and the other players' strategies. Mathematically, a game is defined as follows.
Definition 1 A game $\Gamma$ is
$$\Gamma = \left\langle \mathcal{I}, \{A_x\}_{x \in \mathcal{I}}, \{u_x\}_{x \in \mathcal{I}} \right\rangle, \qquad (2.1)$$
where $\mathcal{I} \equiv \{1, 2, \cdots, N\}$ is the set of players, $A_x$ is the set of actions available to player $x$, and we denote the set of all available actions for all players as $A = A_1 \times A_2 \times \cdots \times A_N$. An action taken by player $x$ is $a_x \in A_x$, and the action profile of all players is $a = a_1 \times a_2 \times \cdots \times a_N \in A$. For notational simplicity, we denote by $a_{-x}$ the action profile taken by all players except player $x$. $u_x$ is player $x$'s utility function, which is a function of $a_x$ and $a_{-x}$.
²Actually, game theory predicts the equilibrium strategy instead of the winning strategy, but one can pick
Game theory rests on several assumptions. First, each player is rational and selfish, so that each wants to maximize its own utility. Note that "selfish" does not mean "malicious": a selfish player cares only about its own utility, while a malicious player aims at harming other players. It is also assumed that all players know the rules of the game, i.e., each knows all players' action sets and utilities, each knows that the other players know this, and so on, and that actions are perfectly observable by all. These assumptions make the scenario rather idealized, so other kinds of game models have been developed to make the model more practical. For instance, the Bayesian game, the game model we apply in this thesis, is a game in which each player's utility function contains some private parameters. These private parameters are not known to all players, so such a game is also called a game of incomplete information. The details of the Bayesian game are introduced in a later section. We now continue with the basic game.
Which action, or strategy, would each player take? Naturally, each player chooses the action that is best for it given the other players' actions; this action of player $x$ is defined as follows.
Definition 2 The best response $b_x(a_{-x})$ of player $x$ to the action profile $a_{-x}$ is an action $a_x$ such that
$$b_x(a_{-x}) = \arg\max_{a_x \in A_x} u_x(a_x, a_{-x}). \qquad (2.2)$$
Since the best response is the best choice for player $x$, player $x$ would stick to it.
Since every player knows that each player takes its best response, the outcome of the game is an action profile that is a best response for all players, if such a profile exists. This mutual best-response point, introduced by Nobel laureate John Forbes Nash, is an equilibrium, since every player would stick to it. The formal definition is as follows.
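Definition 3 The pure-strategy profile $a^*$ constitutes a Nash equilibrium (NE) if, for each player $x$,
$$u_x(a_x^*, a_{-x}^*) \ge u_x(a_x, a_{-x}^*) \quad \text{for all } a_x \in A_x. \qquad (2.3)$$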
Note that this definition is for pure strategies; there is a corresponding NE for mixed strategies³. In the following, we state conditions for the existence of a pure-strategy NE.
Theorem 1 (Debreu 1952; Glicksberg 1952; Fan 1952 [16]) Consider a strategic-form game whose strategy spaces $A_x$ are nonempty compact convex subsets of a Euclidean space. If the payoff functions $u_x$ are continuous in $a$ and quasi-concave in $a_x$, there exists a pure-strategy Nash equilibrium.
Theorem 2 (Dasgupta and Maskin [16]) Let $A_x$ be a nonempty, convex, and compact subset of a finite-dimensional Euclidean space for all $x$. If, for all $x$, $u_x$ is quasi-concave in $a_x$, upper semi-continuous in $a$, and has a continuous maximum, there exists a pure-strategy Nash equilibrium.
The definitions of quasiconcavity, upper semi-continuity, and continuous maximum are given as follows.
Definition 4 If $f(\lambda x + (1 - \lambda) y) \ge \min(f(x), f(y))$ for all $x, y \in \operatorname{dom} f$ and $0 \le \lambda \le 1$, and $\operatorname{dom} f$ is convex [17], then $f$ is quasiconcave.
Definition 5 A function $u_i(\cdot)$ on $S$ is upper semi-continuous at $s$ if, for any sequence $s_n$ converging to $s$ [16],
$$\limsup_{n \to +\infty} u_i(s_n) \le u_i(s). \qquad (2.4)$$
Definition 6 A function $u_i$ has a continuous maximum if $u_i^*(s_{-i}) \equiv \max_{s_i} u_i(s_i, s_{-i})$ is continuous in $s_{-i}$ [16].
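Definition 4 can also be probed numerically. The Python sketch below is a hypothetical helper for scalar functions on an interval $[lo, hi]$: it samples random $x$, $y$, and $\lambda$ and searches for a violation of $f(\lambda x + (1-\lambda) y) \ge \min(f(x), f(y))$. A returned counterexample disproves quasiconcavity, whereas `None` is only evidence for it, not a proof.

```python
import random


def violates_quasiconcavity(f, lo, hi, trials=10_000, seed=0):
    """Random search for a counterexample to Definition 4 on [lo, hi].

    Returns (x, y, lam) with f(lam*x + (1-lam)*y) < min(f(x), f(y)), or None if none was found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, lam = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        z = lam * x + (1 - lam) * y
        # A small slack avoids flagging floating-point rounding as a violation.
        if f(z) < min(f(x), f(y)) - 1e-12:
            return x, y, lam
    return None


print(violates_quasiconcavity(lambda t: -t * t, -1.0, 1.0))  # None: concave, hence quasiconcave
print(violates_quasiconcavity(lambda t: t ** 3, -1.0, 1.0))  # None: monotone, not concave
```

The second call illustrates that quasiconcavity is weaker than concavity, which is why Theorems 1 and 2 only require it. By contrast, $f(t) = t^2$ on $[-1, 1]$ is expected to yield a counterexample, e.g., with $x < 0 < y$ and $z$ near 0.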
NE is regarded as the solution concept, i.e., the rule for predicting how the game will be played, for static games of complete and perfect information; interested readers can find the corresponding theorem for the mixed-strategy version in [16]. Note that different kinds of games have different solution concepts, for example, the perfect Bayesian equilibrium for Bayesian games.
³A mixed strategy is a randomization over pure strategies and can be viewed as a more general strategy than a pure one. The conditions for the existence of a mixed-strategy NE are also weaker than those for a pure-strategy NE. However, we often prefer to find a pure-strategy NE, since it is more physically achievable.
2.2.2 Multistage Bayesian Game
When each player has its own private information and does not know that of the others, NE is not a suitable solution concept because of the unknown information. Such problems arise often. For example, consider two competing firms whose strategy is to determine the quantity of goods to produce: what is the best strategy for each of them without knowing the details of the other's operating cost? More generally, how would the game proceed if there is uncertainty about the players' information? The Bayesian game is a type of game aimed at such situations. The Bayesian approach forms beliefs about the unknown information and allows each player to update its posterior beliefs, i.e., posterior probabilities, about the other players' private information by observing their actions in prior stages. Each player can then act in the current stage based on the updated beliefs, and the game proceeds in such a way that each player maximizes its expected profit according to its beliefs.
Game Formulation
A multistage Bayesian game $\Gamma$ can be formulated as follows:
$$\Gamma = \left\langle \mathcal{I}, \{A_x(h^t)\}, \{\theta_x \in \Theta_x\}, \{u_x\}, \{\mu_x(\theta_{-x} \mid \theta_x, h^0)\} \right\rangle_{x \in \mathcal{I},\, h^t \in H^t,\, t = 0, 1, 2, \cdots, T}, \qquad (2.5)$$
where $\mathcal{I} \equiv \{1, 2, \cdots, N\}$ is the set of players; $A_x(h^t)$ is the set of actions available to player $x$ given a history $h^t = (a^0, a^1, \cdots, a^{t-1})$ at the beginning of stage $t$, with $a^\tau = a_1^\tau \times a_2^\tau \times \cdots \times a_N^\tau$ the action profile at stage $\tau$ and $a_i^\tau \in A_i(h^\tau)$ the action of the $i$th player at stage $\tau$; $H^t$ is the set of all histories $h^t$, with $h^0 = \emptyset$; and $T$ is the length of the game. We denote the set of all available actions for all players at stage $\tau$ as $A(h^\tau) = A_1(h^\tau) \times A_2(h^\tau) \times \cdots \times A_N(h^\tau)$. $\theta_x$ is the private information, also known as the type, of player $x$. The type, which is the incomplete information in a Bayesian game, cannot be known by the other players but can be inferred by them. The type profile is $\theta = \theta_1 \times \theta_2 \times \cdots \times \theta_N$, and $\theta_{-x}$ denotes the type profile $\theta$ excluding $\theta_x$. The actual type value of player $x$ is denoted by $\hat{\theta}_x$, and the corresponding type profile by $\hat{\theta}$. The utility function $u_x$ of player $x$ …; in other words, $u_x$ is a function of all players' types and of the past and current actions. $\mu_x(\theta_{-x} \mid \theta_x, h^0)$ is player $x$'s belief about the other players' types $\theta_{-x}$ given its own type $\theta_x$ at history $h^0$. In contrast with the static game of incomplete information, the beliefs about others can be updated stage by stage. The Bayesian game defines the rule by which players update their beliefs stage by stage, and the players' actions can change according to the updated beliefs.
Solution Concept and Belief System
In the game theory literature, the solution concept for a multistage game of incomplete information is called the perfect Bayesian equilibrium (PBE), which is the counterpart of the subgame perfect equilibrium (SPE) in a multistage game of complete information. Just as SPE is a refinement of the Nash equilibrium in a multistage game of complete information, PBE is a refinement of the Bayesian NE (BNE) in a multistage game of incomplete information. To obtain a PBE, certain restrictions and assumptions on the belief system must be satisfied, and the players' behavior must be sequentially rational [16]. To keep this thesis self-contained, we list the definition of the pure-strategy PBE; the mixed-strategy version can be found in [16].
Definition 7 A perfect Bayesian equilibrium is a pair $(a^*, \mu)$ that satisfies (P) and B(i)-B(iv).
B(i) Posterior beliefs are independent, and all types of player $x$ have the same beliefs. For all $\theta$, $t$, and $h^t$, we have
$$\mu_x(\theta_{-x} \mid \theta_x, h^t) = \prod_{y \ne x} \mu_x(\theta_y \mid h^t). \qquad (2.6)$$
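B(ii) Bayes' rule is used to update beliefs from $\mu_x(\theta_y \mid h^t)$ to $\mu_x(\theta_y \mid h^{t+1})$ whenever possible. For all $x$, $y$, $h^t$, and $a_y^t \in A_y(h^t)$, if there exists $\breve{\theta}_y$ with $\mu_x(\breve{\theta}_y \mid h^t) > 0$ and $a_y^{t*}(\breve{\theta}_y) = a_y^{t*}(\hat{\theta}_y)$, then, for all $\theta_y$,
$$\mu_x(\theta_y \mid h^{t+1}) = \frac{\mu_x(\theta_y \mid h^t)\, \delta\big(a_y^{t*}(\theta_y) - a_y^{t*}(\hat{\theta}_y)\big)}{\sum_{\theta'_y :\, a_y^{t*}(\theta'_y) = a_y^{t*}(\hat{\theta}_y)} \mu_x(\theta'_y \mid h^t)}, \qquad (2.7)$$
where
$$\delta\big(a_y^{t*}(\theta_y) - a_y^{t*}(\hat{\theta}_y)\big) = \begin{cases} 1, & \text{if } a_y^{t*}(\theta_y) = a_y^{t*}(\hat{\theta}_y), \\ 0, & \text{otherwise}, \end{cases} \qquad (2.8)$$
and $a_y^{t*}(\theta_y)$ denotes the best action of player $y$ corresponding to type $\theta_y$ at stage $t$.
Note that B(ii) does not restrict how the belief about player $y$ is updated if player $y$'s stage-$t$ action had conditional probability 0; this is precisely where it differs from the sequential equilibrium (SE).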
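A minimal Python sketch of the update in (2.7)-(2.8) for discrete types, assuming pure strategies, so that each type $\theta_y$ maps to a single best action $a_y^{t*}(\theta_y)$ and the observed action is $a_y^{t*}(\hat{\theta}_y)$. The function name and the dictionary-based interface are hypothetical. Because B(ii) leaves the zero-probability case unrestricted, the sketch arbitrarily keeps the prior in that case; any other rule would also be consistent with B(ii).

```python
def update_belief(prior, best_action, observed_action):
    """One application of rule B(ii), eq. (2.7), to player x's belief about one player y.

    prior:           dict type -> mu_x(theta_y | h^t)
    best_action:     dict type -> a_y^{t*}(theta_y), the action that type would take
    observed_action: the action a_y^{t*}(hat theta_y) actually observed at stage t
    """
    # delta(.) in eq. (2.8): keep only the types whose equilibrium action matches.
    consistent = {th: p for th, p in prior.items() if best_action[th] == observed_action}
    total = sum(consistent.values())  # denominator of eq. (2.7)
    if total <= 0:
        # Zero-probability observation: B(ii) imposes no restriction, so this
        # sketch keeps the prior (one arbitrary choice among many).
        return dict(prior)
    return {th: consistent.get(th, 0.0) / total for th in prior}


posterior = update_belief(
    prior={"low": 0.25, "mid": 0.25, "high": 0.5},
    best_action={"low": 1, "mid": 1, "high": 2},
    observed_action=1,
)
print(posterior)  # {'low': 0.5, 'mid': 0.5, 'high': 0.0}
```

Types that share the same best action cannot be told apart by a single observation, so their relative weights are preserved; here "low" and "mid" split the remaining probability mass equally, while "high" is ruled out.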
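B(iii) For all $h^t$, $x$, $y$, $\theta_y$, $a^t$, and $\tilde{a}^t$,
$$\mu_x(\theta_y \mid (h^t, a^t)) = \mu_x(\theta_y \mid (h^t, \tilde{a}^t)) \quad \text{if } a_y^t = \tilde{a}_y^t. \qquad (2.9)$$
This condition means that even if player $y$ deviates at stage $t$, the updating of beliefs about player $y$ should not be influenced by the actions of the other players.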
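B(iv) For all $h^t$, $\theta_z$, and $x \ne y \ne z$,
$$\mu_x(\theta_z \mid h^t) = \mu_y(\theta_z \mid h^t) = \mu(\theta_z \mid h^t), \qquad (2.10)$$
i.e., the beliefs of players $x$ and $y$ about a third player $z$ are the same.
This condition implies that the posterior beliefs are consistent with a common joint distribution on $\Theta$ given $h^t$, with
$$\mu(\theta_{-x} \mid h^t)\, \mu(\theta_x \mid h^t) = \mu(\theta \mid h^t). \qquad (2.11)$$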
(P) Sequential rationality: for each player $x$, type $\theta_x$, and history $h^t$,
$$a_x^{t*}(\theta_x) = \arg\max_{a_x^t \in A_x} \sum_{\theta_{-x}} \mu_x(\theta_{-x} \mid h^t)\, u_x\big(a_x^t, a_{-x}^{t*}(\theta_{-x}) \mid \theta, h^t\big). \qquad (2.12)$$
Here we assume that $\Theta$ is a discrete set. For a continuous set, we replace the summation in condition (P) with an integral, or approximate the continuous set by quantizing it into a discrete one.
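Under the discrete-$\Theta$ assumption and with a quantized action set, as suggested above, condition (P) reduces to maximizing a belief-weighted sum over a finite grid. The Python sketch below does exactly that for one player at one stage. The function name and arguments are hypothetical, and the opponents' type-contingent strategies $a_{-x}^{t*}(\theta_{-x})$ are taken as given inputs, whereas in a PBE they must be mutually consistent. The usage example is a Bayesian Cournot-style toy, not the spectrum trading utilities of Chapter 3.

```python
def sequentially_rational_action(actions, belief, opponent_policy, utility):
    """Grid-search version of condition (P), eq. (2.12), for one player and one stage.

    actions:         finite candidate set for a_x^t (e.g., a quantized A_x)
    belief:          dict theta_minus_x -> mu_x(theta_minus_x | h^t)
    opponent_policy: dict theta_minus_x -> a_{-x}^{t*}(theta_minus_x)
    utility:         callable (a_x, a_minus_x, theta_minus_x) -> payoff of player x
                     (player x's own type and the history are assumed fixed inside it)
    """
    def expected_utility(a_x):
        # Belief-weighted sum over the opponents' possible types, as in (2.12).
        return sum(p * utility(a_x, opponent_policy[th], th) for th, p in belief.items())

    return max(actions, key=expected_utility)


# Toy: inverse demand 10 - q - q_opp, own unit cost 1; the opponent produces 3.0 if its
# (unknown) cost type is "low" and 2.0 if it is "high"; both types are equally likely.
grid = [i * 0.25 for i in range(41)]  # quantized action set [0, 10]
best = sequentially_rational_action(
    grid,
    belief={"low": 0.5, "high": 0.5},
    opponent_policy={"low": 3.0, "high": 2.0},
    utility=lambda q, q_opp, th: q * (10.0 - q - q_opp) - 1.0 * q,
)
print(best)  # 3.25
```

With equal beliefs, the expected opponent quantity is 2.5, so the expected profit is $q(9 - q) - 2.5q = q(6.5 - q)$, which is maximized at $q = 3.25$, a point on the grid.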
Chapter 3
Multistage Bayesian Spectrum Trading Game
3.1 Problem Setup
We consider a cognitive radio network with $N$ primary services (e.g., existing cellular services) and $M$ secondary services (e.g., an SS can be a small network with a CR base station and multiple CR users), as shown in Fig. 3.1. The $i$th PS operates on its own exclusive spectrum $W_i$, from which it can lease available unused bandwidth $b_{ji}$ to the $j$th SS, which does not own the legal right to use the spectrum. To maximize its profit, each PS offers different prices to different SS's. In the trading process, all PS's compete with each other in the prices they offer to the SS's, and each SS decides from whom and how much of the available spectrum to rent. Specifically, we model the spectrum trading process as a multistage game in which all PS's simultaneously set their own prices $p_{ji}$, for all $i$ and $j$, in the first stage, and, in the subsequent stage, the $j$th SS requests bandwidth $b_{ji}$ from the $i$th PS, for $i = 1, 2, \cdots, N$ and $j = 1, 2, \cdots, M$.
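As a small illustration of the bandwidth constraint (each PS's total leased bandwidth within its supply, and every demand nonnegative), the Python sketch below checks whether a demand matrix $b_{ji}$ is feasible. It assumes that the most PS $i$ can offer is its spectrum $W_i$; the thesis may use a different supply bound, and the function name is hypothetical.

```python
def is_feasible_demand(b, W, tol=1e-9):
    """Check the bandwidth constraint for demands b[j][i] (SS j requesting from PS i).

    Assumption for this sketch: PS i can offer at most W[i] (e.g., in MHz), so for every i
        sum_j b[j][i] <= W[i]   and   b[j][i] >= 0 for all j.
    """
    M, N = len(b), len(W)
    if any(len(row) != N for row in b):
        raise ValueError("each row of b must have one entry per PS")
    nonnegative = all(b[j][i] >= -tol for j in range(M) for i in range(N))
    within_supply = all(sum(b[j][i] for j in range(M)) <= W[i] + tol for i in range(N))
    return nonnegative and within_supply


W = [15.0, 15.0]  # e.g., W_1 = W_2 = 15 MHz
print(is_feasible_demand([[5.0, 7.0], [10.0, 8.0]], W))  # True: column sums 15 and 15
print(is_feasible_demand([[5.0, 7.0], [11.0, 8.0]], W))  # False: PS 1 oversubscribed
```

A negative entry, such as the infeasible equilibrium demand discussed for [8] in Section 1.2.2, is likewise rejected.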
In practice, however, each player (PS or SS) may possess private information that is unknown to the other players. Therefore, no player can predict the overall trading behavior correctly, which makes determining the optimal strategies a challenging task. For this game of incomplete information, we propose using the theory of multistage Bayesian games. The dynamic Bayesian approach allows each player to update its posterior beliefs, i.e., posterior probabilities, about the other players' private information by observing their actions in prior stages, and each player can then act in the current stage based on the updated beliefs.
Figure 3.1: The cognitive radio network with multiple primary services and multiple secondary services.
We assume that each player is selfish but rational in the considered multistage sequential Bayesian game [16]. The objective is to find the perfect Bayesian equilibrium of all players' actions, in which each player maximizes its expected profit as the game evolves sequentially.
Since the private information may not be updated promptly and the channel conditions may change, we study a repeated version of the multistage Bayesian game. The evolution of the repeated multistage game is illustrated in Fig. 3.2, where one unit game, composed of two stages, is completed in each period.
Rather than learning in the game to reach an equilibrium, which spends time and energy on signaling and evolution, we believe that computing the optimal strategy in one shot is more suitable for our scenario, for the following reasons. First, the decisions of each primary/secondary service are made by its primary/secondary base station (BS), so the complexity of solving the optimization problem is affordable. Second, since the players are BSs, the numbers $N$ and $M$ are smaller than the number of terminal wireless users in a typical cell.
Figure 3.2: The evolution of the multistage sequential game.
3.2 Game Formulation
In this section, we describe the proposed multistage Bayesian game for spectrum trading with a general utility function. We will illustrate the idea by a specific example in Sec. 3.4.
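We formulate the spectrum trading process as a multistage Bayesian game
$$\Gamma = \Big\langle\mathcal{I}, \{A_x(h^t)\}, \{\theta_x\in\Theta_x\}, \{P_x\}, \{\mu_x(\theta_{-x}\mid\theta_x, h^0)\}\Big\rangle,\quad x\in\mathcal{I},\ h^t\in H^t,\ t = 0, 1, 2, \ldots, T_E, \tag{3.1}$$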
where $\mathcal{I}\triangleq\mathcal{I}_p\cup\mathcal{I}_s$ is the set of players, with $\mathcal{I}_p = \{p_1, p_2, \ldots, p_N\}$ the set of all PSs and $\mathcal{I}_s = \{s_1, s_2, \ldots, s_M\}$ the set of all SSs; $A_x(h^t)$ is the set of actions available to player $x$ given the history $h^t = (a^0, a^1, \ldots, a^{t-1})$ at the beginning of stage $t$, where
$$a^\tau = a^\tau_{p_1}\times a^\tau_{p_2}\times\cdots\times a^\tau_{p_N}\times a^\tau_{s_1}\times a^\tau_{s_2}\times\cdots\times a^\tau_{s_M}$$
is the action profile consisting of the actions of all players (PSs and SSs) at stage $\tau$, and $a^\tau_{p_i}\in A_{p_i}(h^\tau)$ and $a^\tau_{s_j}\in A_{s_j}(h^\tau)$ are the actions of the $i$th PS and the $j$th SS at stage $\tau$, respectively. The set of all available actions for all players is $A(h^\tau) = A_{p_1}(h^\tau)\times A_{p_2}(h^\tau)\times\cdots\times A_{p_N}(h^\tau)\times A_{s_1}(h^\tau)\times A_{s_2}(h^\tau)\times\cdots\times A_{s_M}(h^\tau)$. We denote $\mathbf{p}^p_i = (p_{1i}, p_{2i}, \ldots, p_{Mi})^T$, $\mathbf{p}^s_j = (p_{j1}, p_{j2}, \ldots, p_{jN})^T$, $\mathbf{b}^p_i = (b_{1i}, b_{2i}, \ldots, b_{Mi})^T$, $\mathbf{b}^s_j = (b_{j1}, b_{j2}, \ldots, b_{jN})^T$, $\mathbf{p}^\tau = \mathbf{p}^{p,\tau}_1\times\mathbf{p}^{p,\tau}_2\times\cdots\times\mathbf{p}^{p,\tau}_N$, and $\mathbf{b}^\tau = \mathbf{b}^{s,\tau}_1\times\mathbf{b}^{s,\tau}_2\times\cdots\times\mathbf{b}^{s,\tau}_M$. In each time period, PS $i$ sets its prices $a_{p_i} = \mathbf{p}^p_i$ at the even stage and stays silent (i.e., $a_{p_i} = \phi$) at the odd stage [16]. Conversely, SS $j$ does nothing (i.e., $a_{s_j} = \phi$) at the even stage and demands bandwidth $a_{s_j} = \mathbf{b}^s_j$ at the odd stage. Therefore, $a^\tau = \mathbf{p}^\tau\times\phi$ at an even stage and $a^\tau = \phi\times\mathbf{b}^\tau$ at an odd stage, where $\phi$ denotes the "do nothing" action profile.
$\Theta_x$ is the set of possible private information (types) $\theta_x$ of player $x$. The type profiles are $\theta_p = (\theta_{p_1}, \theta_{p_2}, \ldots, \theta_{p_N})$ and $\theta_s = (\theta_{s_1}, \theta_{s_2}, \ldots, \theta_{s_M})$; $\theta_{p_{-i}}$ denotes $\theta_p$ excluding $\theta_{p_i}$, and similarly $\theta_{s_{-j}}$ denotes $\theta_s$ excluding $\theta_{s_j}$. The overall type profile is $\theta = (\theta_s, \theta_p)$. The actual type of player $x$ is denoted by $\hat\theta_x$, and the actual type profiles of the PSs, the SSs, and all players are $\hat\theta_p$, $\hat\theta_s$, and $\hat\theta$, respectively. $P_x$, the profit function (i.e., the net utility) of player $x$, is a mapping $P_x: H^\tau\times\Theta\to\mathbb{R}$ from the space $H^\tau\times\Theta$ to the real numbers. $\mu_x(\theta_{-x}\mid\theta_x, h^t)$ is player $x$'s belief about the other players' types given its own type $\theta_x$ and the history $h^t$. The beliefs are described in more detail in the next section.
In contrast with a static game of incomplete information, the beliefs about the other players can be updated stage by stage, and the players' actions can change according to the updated beliefs.
3.3 General Formulation for Multiple Sellers and Multiple Buyers
3.3.1 Utility Model
As mentioned in the system model, PS $i$ leases bandwidth $b_{ji}$ to SS $j$. The amount $b_{ji}$ affects the remaining available bandwidth of PS $i$, and thus its QoS satisfaction, which is denoted by the utility function $u_{p_i}(\mathbf{b}^p_i\mid\theta, h^t)$. The monetary gain of trading is $\sum_j p_{ji}b_{ji} = (\mathbf{p}^p_i)^T\mathbf{b}^p_i$, and the total profit of PS $i$ is given by
$$P_{p_i}(\mathbf{p}^p_i, \mathbf{b}^p_i\mid\theta) = u_{p_i}(\mathbf{b}^p_i\mid\theta) + (\mathbf{p}^p_i)^T\mathbf{b}^p_i, \tag{3.2}$$
where $u_{p_i}(\mathbf{b}^p_i\mid\theta, h^t)$ is written as $u_{p_i}(\mathbf{b}^p_i\mid\theta)$ for notational simplicity; the conditioning on the history $h^t$ is likewise omitted in $P_{p_i}$. We assume that $P_{p_i}(\mathbf{p}^p_i, \mathbf{b}^p_i\mid\theta)$ is a concave function of $(\mathbf{p}^p_i, \mathbf{b}^p_i)$. Although this assumption is made here, the explicit system in Section 3.4 shows that it need not hold in this form: there, concavity after substituting the SSs' best demands plays the same role in guaranteeing the joint KKT condition. $\Theta$ is assumed to be a discrete space in the formulation.
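For the secondary services, the QoS utility $u_{s_j}(\mathbf{b}^s_j\mid\theta)$ is modeled as a concave function of $\mathbf{b}^s_j$. The cost of buying bandwidth is $\sum_i p_{ji}b_{ji} = (\mathbf{p}^s_j)^T\mathbf{b}^s_j$, and the total profit of SS $j$ is
$$P_{s_j}(\mathbf{p}^s_j, \mathbf{b}^s_j\mid\theta) = u_{s_j}(\mathbf{b}^s_j\mid\theta) - (\mathbf{p}^s_j)^T\mathbf{b}^s_j, \tag{3.3}$$
which is still a concave function of $\mathbf{b}^s_j$.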
3.3.2 Self-Optimization and KKT Translation
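Since SS $j$ is a follower in the game, it can observe the sellers' actions $\mathbf{p}^s_j(\hat\theta_p) = \big(p_{j1}(\hat\theta_{p_1}), p_{j2}(\hat\theta_{p_2}), \ldots, p_{jN}(\hat\theta_{p_N})\big)^T$. Note that $\mathbf{p}^s_j(\hat\theta_p)$ is the optimal price vector corresponding to the type profile $\hat\theta_p$, and the SS observes only the prices, without knowledge of $\hat\theta_p$. That is, the SS may still not know $\hat\theta_p$ ($\theta_p$ remains random), but the prices corresponding to $\hat\theta_p$ are deterministic. Based on that, SS $j$ of type $\hat\theta_{s_j}$ maximizes its expected profit:
$$\mathbf{b}^{s*}_j = \arg\max_{\mathbf{b}^s_j}\mathbb{E}_{\theta_{s_{-j}},\theta_p}\Big[P_{s_j}\big(\mathbf{p}^s_j(\hat\theta_p), \mathbf{b}^s_j\mid\hat\theta_{s_j}, \theta_{s_{-j}}, \theta_p\big)\Big]. \tag{3.4}$$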
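The KKT condition [17] for the profit maximization of SS $j$ of type $\hat\theta_{s_j}$ is
$$\nabla_{\mathbf{b}^s_j}\mathbb{E}_{\theta_{s_{-j}},\theta_p}\Big[P_{s_j}\big(\mathbf{p}^s_j(\hat\theta_p), \mathbf{b}^s_j\mid\hat\theta_{s_j}, \theta_{s_{-j}}, \theta_p\big)\Big]\Big|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j} = \mathbf{0}, \tag{3.5}$$
which is equivalent to
$$\nabla_{\mathbf{b}^s_j}\mathbb{E}_{\theta_{s_{-j}},\theta_p}\Big[u_{s_j}\big(\mathbf{b}^s_j\mid\hat\theta_{s_j}, \theta_{s_{-j}}, \theta_p\big)\Big]\Big|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j} = \mathbf{p}^s_j(\hat\theta_p). \tag{3.6}$$
We observe that $\mathbf{b}^{s*}_j$ is a function of $\hat\theta_{s_j}$ and $\hat\theta_p$, and hence can be denoted by $\mathbf{b}^{s*}_j(\hat\theta_{s_j}, \hat\theta_p)$. Another observation is that the SSs' self-optimization problems are not coupled with each other: the profit maximization of SS $j$ depends on all PSs and on SS $j$ itself, but not on the other SSs. Hence, to be sequentially rational, SS $j$ only needs to solve (3.5) or (3.6), without taking the other SSs' actions into consideration.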
But how do the PSs move, knowing that each SS is sequentially rational? Backward induction [16] is a well-known technique for solving finite dynamic games of perfect information, and we apply a similar idea to the trading game.
All PSs know that SS $j$ will request its best demand $\mathbf{b}^{s*}_j(\hat\theta_{s_j}, \hat\theta_p)$; equivalently, they know the KKT conditions (3.6) of all SSs. However, since PS $i$ does not know the exact types $\hat\theta_{s_j}$ and $\hat\theta_{p_{-i}}$, it views $\mathbf{b}^{s*}_j(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$ as a random variable with uncertain $\theta_{s_j}$ and $\theta_{p_{-i}}$. The objective of PS $i$ is therefore to maximize its expected profit based on its beliefs $\mu(\theta_s, \theta_{p_{-i}}\mid h^t)$ about the other players' private information, taking the SSs' KKT conditions into account. The optimization for PS $i$ of type $\hat\theta_{p_i}$ is given by
$$\mathbf{p}^{p*}_i(\hat\theta_{p_i}) = \arg\max_{\mathbf{p}^p_i}\mathbb{E}_{\theta_s,\theta_{p_{-i}}}\Big[P_{p_i}\big(\mathbf{p}^p_i, \mathbf{b}^{p*}_i(\theta_s, \hat\theta_{p_i}, \theta_{p_{-i}})\mid\theta_s, \hat\theta_{p_i}, \theta_{p_{-i}}\big)\Big] \tag{3.7}$$
subject to
$$0\le b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}),\quad\forall s_j,\ \forall\theta_{s_j}\in\Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}}\in\Pi_{p_{-i}}(h^t), \tag{3.8}$$
$$W_i\ge\sum_j b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}),\quad\forall\theta_s\in\Pi_s(h^t),\ \forall\theta_{p_{-i}}\in\Pi_{p_{-i}}(h^t), \tag{3.9}$$
$$\mathbf{p}^s_j(\hat\theta_{p_i}, \theta_{p_{-i}}) = \nabla_{\mathbf{b}^s_j}\mathbb{E}_{\theta_{s_{-j}},\theta_p}\big[u_{s_j}(\mathbf{b}^s_j\mid\theta)\big]\Big|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})},\quad\forall s_j,\ \forall\theta_{s_j}\in\Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}}\in\Pi_{p_{-i}}(h^t), \tag{3.10}$$
where $\Pi_{s_j}(h^t)$ is the set of all $\theta_{s_j}$ with $\mu(\theta_{s_j}\mid h^t) > 0$, $\Pi_s(h^t)$ is the set of all $\theta_s$ with $\mu(\theta_s\mid h^t) > 0$, and $\Pi_{p_{-i}}(h^t)$ is the set of all $\theta_{p_{-i}}$ with $\mu(\theta_{p_{-i}}\mid h^t) > 0$. The constraints (3.8) and (3.9) restrict the demand to the physically realizable spectrum region that PS $i$ can afford under all possible type profiles of the other players. Since (3.8) and (3.9) contain many inequalities, we can reduce them by finding minimal sets $\Theta_{m,i}(h^t)$ and $\Theta_{M,i}(h^t)$ that represent these two groups of inequalities; how these sets are determined depends largely on the utility profile. We call this approach the KKT translation. Under the assumptions made above, if the constraints (3.10) are affine, the problem is a convex optimization problem; otherwise, it is a nonconvex optimization problem [17]. The difference lies in whether the KKT conditions are necessary and sufficient, or only necessary, for optimality.
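The objective in (3.7) is an expectation over the other players' types, and the constraints (3.8) and (3.9) are indexed by the belief supports $\Pi(h^t)$. A minimal Python sketch of both pieces, assuming for illustration only that PS $i$'s joint belief factorizes into independent per-player marginals; `support` and `expected_profit` are hypothetical helper names, and `profit` stands in for $P_{p_i}$ evaluated at a fixed price vector.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple


def support(belief: Dict[int, float]) -> Tuple[int, ...]:
    """Pi(h^t): the types that still have positive posterior probability."""
    return tuple(sorted(t for t, prob in belief.items() if prob > 0.0))


def expected_profit(
    profit: Callable[[Tuple[int, ...]], float],
    beliefs: Sequence[Dict[int, float]],
) -> float:
    """Expectation of `profit` over the other players' type profiles.

    profit  : maps a profile (one type per other player) to P_{p_i} at fixed prices
    beliefs : one marginal belief per other player; the joint belief is ASSUMED
              to be their product (an illustrative simplification)
    """
    total = 0.0
    for profile in itertools.product(*(support(b) for b in beliefs)):
        weight = 1.0
        for theta, b in zip(profile, beliefs):
            weight *= b[theta]
        total += weight * profit(profile)
    return total
```

The loop visits one profile per element of the product of the supports, which is also why (3.8) and (3.9) contain so many inequalities before any minimal sets are introduced.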
3.3.3 Perfect Bayesian Equilibrium and Joint KKT Condition
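We are now ready to find the PBE at stage $t$ of the multistage Bayesian game modeled in the considered cognitive radio network. The posterior beliefs are obtained by the PBE updating rules B(i)-B(iv). With these, condition (P) of the PBE at any stage is
$$\mathbf{p}^{p*}_i(\theta_{p_i}) = \arg\max_{\mathbf{p}^p_i}\mathbb{E}_{\theta_s,\theta_{p_{-i}}}\Big[P_{p_i}\big(\mathbf{p}^p_i, \mathbf{b}^{p*}_i(\theta_s, \theta_{p_i}, \theta_{p_{-i}})\mid\theta\big)\Big] \tag{3.11}$$
subject to
$$0\le b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}}),\quad\forall s_j,\ \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i}(h^t), \tag{3.12}$$
$$W_i\ge\sum_j b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}}),\quad\forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t), \tag{3.13}$$
$$\mathbf{p}^{s*}_j(\theta_{p_i}, \theta_{p_{-i}}) = \nabla_{\mathbf{b}^s_j}\mathbb{E}_{\theta_{s_{-j}},\theta_p}\big[u_{s_j}(\mathbf{b}^s_j\mid\theta)\big]\Big|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}})},\quad\forall s_j,\ \forall\theta_{s_j}\in\Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}}\in\Pi_{p_{-i}}(h^t), \tag{3.14}$$
for all $\theta_{p_i}\in\Pi_{p_i}(h^t)$ and all $p_i\in\mathcal{I}_p$.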
It is clear that if the constraint (3.14) is affine and the price profile $\mathbf{p}^{p*}_{-i}(\theta_{p_{-i}})$ is known for all type profiles, then the KKT condition is necessary and sufficient for solving this convex optimization problem. However, finding the optimal strategy profile $\mathbf{p}^{p*}_{-i}(\theta_{p_{-i}})$ for all possible $\theta_{p_{-i}}$ requires $\mathbf{p}^{p*}_i(\theta_{p_i})$ for all possible $\theta_{p_i}$. It follows that the KKT conditions of all PSs must be solved simultaneously. The joint KKT conditions are given by
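$$\begin{aligned}
& -b^*_{ji}(\theta_{s_j}, \theta_p)\le 0, && \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i}(h^t),\ \forall s_j\in\mathcal{I}_s\\
& \textstyle\sum_{j=1}^M b^*_{ji}(\theta_{s_j}, \theta_p) - W_i\le 0, && \forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t)\\
& K_{jk}(\theta_{s_j}) = p_{jk}(\theta_{p_k}), && \forall\theta_{s_j}\in\Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}}\in\Pi_{p_{-i}}(h^t),\ \forall s_j\in\mathcal{I}_s,\ \forall p_k\in\mathcal{I}_p\\
& \lambda_{i,j,\theta_{s_j},\theta_p}\ge 0, && \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i}(h^t),\ \forall s_j\in\mathcal{I}_s\\
& \nu_{i,\theta_s,\theta_p}\ge 0, && \forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t)\\
& \lambda_{i,j,\theta_{s_j},\theta_p}\,b^*_{ji}(\theta_{s_j}, \theta_p) = 0, && \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i}(h^t),\ \forall s_j\in\mathcal{I}_s\\
& \nu_{i,\theta_s,\theta_p}\Big(\textstyle\sum_{j=1}^M b^*_{ji}(\theta_{s_j}, \theta_p) - W_i\Big) = 0, && \forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t)\\
& \nabla_{\mathbf{p}^p_i}L_i = \mathbf{0}, && \forall\theta_{p_i}\in\Pi_{p_i}(h^t),\ \forall p_i\in\mathcal{I}_p,
\end{aligned}\tag{3.15}$$
where $\lambda_{i,j,\theta_{s_j},\theta_p}$, $\nu_{i,\theta_s,\theta_p}$, and $\eta_{k,j,\theta_{s_j},\theta_p}$ are Lagrange multipliers, and $K_{jk}(\theta_{s_j})$ denotes the $k$th component of the right-hand side of (3.14), i.e.,
$$K_{jk}(\theta_{s_j})\equiv\frac{\partial\,\mathbb{E}_{\theta_{s_{-j}},\theta_p}\big[u_{s_j}(\mathbf{b}^s_j\mid\theta)\big]}{\partial b_{jk}}\bigg|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j(\theta_{s_j}, \theta_p)},\quad\forall p_k\in\mathcal{I}_p. \tag{3.16}$$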
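$L_i$ is the Lagrangian function of PS $i$ of type $\theta_{p_i}$, whose gradient is
\begin{align}
\nabla_{\mathbf{p}^p_i}L_i ={}& \nabla_{\mathbf{p}^p_i}\mathbb{E}_{\theta_s,\theta_{p_{-i}}}\Big[P_{p_i}\big(\mathbf{p}^p_i, \mathbf{b}^{p*}_i(\theta)\mid\theta\big)\Big] \tag{3.17}\\
&- \sum_{s_j\in\mathcal{I}_s,\,(\theta_{s_j},\theta_{p_{-i}})\in\Theta_{m,i}(h^t)}\lambda_{i,j,\theta_{s_j},\theta_p}\,\nabla_{\mathbf{p}^p_i}\big(-b^*_{ji}(\theta_{s_j}, \theta_p)\big) \tag{3.18}\\
&- \sum_{(\theta_s,\theta_{p_{-i}})\in\Theta_{M,i}(h^t)}\nu_{i,\theta_s,\theta_p}\,\nabla_{\mathbf{p}^p_i}\Big(\sum_{j=1}^M b^*_{ji}(\theta_{s_j}, \theta_p) - W_i\Big) \tag{3.19}\\
&- \sum_{p_k\in\mathcal{I}_p,\,s_j\in\mathcal{I}_s,\,\theta_{s_j}\in\Pi_{s_j}(h^t),\,\theta_{p_{-i}}\in\Pi_{p_{-i}}(h^t)}\eta_{k,j,\theta_{s_j},\theta_p}\,\nabla_{\mathbf{p}^p_i}\big[p_{jk}(\theta_{p_k}) - K_{jk}(\theta_{s_j})\big]. \tag{3.20}
\end{align}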
3.4 Explicit System for Multiple Sellers and Multiple Buyers
3.4.1 Utility Model
In this part, we adopt and modify the utility models in [8]. The profit function of the $i$th PS is given by
$$P_{p_i}(\mathbf{p}^p_i, \mathbf{b}^p_i\mid\theta_{p_i}) = (\mathbf{p}^p_i)^T\mathbf{b}^p_i + c_1\theta_{p_i} - c_2\theta_{p_i}\left(B^{req}_i - \frac{k^{(p)}_i\big(W_i - \sum_{j=1}^M b_{ji}\big)}{\theta_{p_i}}\right)^2, \tag{3.21}$$
where $c_1$ and $c_2$ are constant weights, $B^{req}_i$ is the bandwidth requirement of a primary connection, and
$$k^{(p)}_i = \log_2\!\left(1 + \frac{1.5\,\gamma^p_i}{\ln\big(0.2/\mathrm{BER}^{tar}_i\big)}\right)$$
denotes the spectral efficiency of wireless transmission for the $i$th PS, with $\gamma^p_i$ the signal-to-noise ratio (SNR) at the $i$th PS's receivers and $\mathrm{BER}^{tar}_i$ the target bit-error rate (BER) of the $i$th PS's local connection [18]. The private information $\theta_{p_i}$, taking values in the set $\Theta_p$, represents the number of connections in the $i$th PS. The first term on the right-hand side of (3.21) is the monetary gain from selling bandwidth. The second term is the revenue from maintaining primary connections, which is proportional to $\theta_{p_i}$. The third term is the cost of sharing the spectrum with the SSs; the squared term can be interpreted as amplifying the difference between the required throughput and the actual served throughput per terminal user of the $i$th PS. Unlike the single-SS scenario in [8], the profit function (3.21) accounts for multiple SSs.
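As a numerical illustration of (3.21), the sketch below evaluates the spectral efficiency $k^{(p)}_i$ and the PS profit for given prices and demands. The function names are hypothetical; the SNR is taken in dB and converted to linear scale, and measuring $W_i$ in MHz with $B^{req}_i$ in Mbps is an assumed unit convention (it matches the values used in Chapter 4). The target BER is left as a free parameter because this section does not fix its value.

```python
import math

import numpy as np


def spectral_efficiency(snr_db: float, ber_target: float) -> float:
    """k = log2(1 + 1.5 * gamma / ln(0.2 / BER)), with gamma the linear-scale SNR."""
    gamma = 10.0 ** (snr_db / 10.0)
    return math.log2(1.0 + 1.5 * gamma / math.log(0.2 / ber_target))


def ps_profit(prices, demands, theta_p, W, B_req, k_p, c1=2.0, c2=2.0):
    """Profit (3.21) of one PS.

    prices, demands : p_{ji} and b_{ji} for j = 1..M (this PS's column)
    theta_p         : number of primary connections (the PS's private type)
    W, B_req, k_p   : spectrum, per-connection requirement, spectral efficiency
    c1, c2          : constant weights (defaults follow Section 4.1)
    """
    prices = np.asarray(prices, dtype=float)
    demands = np.asarray(demands, dtype=float)
    monetary_gain = float(prices @ demands)
    served_rate_per_user = k_p * (W - demands.sum()) / theta_p
    return monetary_gain + c1 * theta_p - c2 * theta_p * (B_req - served_rate_per_user) ** 2
```

For instance, `spectral_efficiency(15.0, 1e-4)` would give $k^{(p)}_i$ for $\gamma^p_i = 15$ dB under a hypothetical target BER of $10^{-4}$.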
The profit function of SS $j$ is given by
$$P_{s_j}(\mathbf{p}^s_j, \mathbf{b}^s_j\mid\theta_{s_j}) = \frac{1}{\theta_{s_j}}\left[\sum_{i=1}^N b_{ji}k^{(s_j)}_i - \frac{1}{2}\left((\mathbf{b}^s_j)^T\mathbf{b}^s_j + 2\xi_j\sum_{i=1}^N\sum_{k>i}b_{jk}b_{ji}\right)\right] - (\mathbf{p}^s_j)^T\mathbf{b}^s_j, \tag{3.22}$$
where $\xi_j\in[-1, 1]$ is the $j$th SS's spectrum substitutability, defined as follows. When $\xi_j = 1$, SS $j$ can switch freely among the spectrum rented from all PSs. When $\xi_j = 0$, SS $j$ cannot switch among its operating spectrum bands. If $\xi_j < 0$, the spectrum bands are complementary for SS $j$, that is, it needs to buy one or more additional bands simultaneously. We consider $0\le\xi_j\le 1$ for the rest of the thesis; the case $-1\le\xi_j\le 0$ is straightforward. $k^{(s_j)}_i = \log_2\!\big(1 + 1.5\,\gamma^s_{ji}/\ln(0.2/\mathrm{BER}^{tar}_j)\big)$ is the spectral efficiency acquired by the $j$th SS's secondary users on the band $W_i$ owned by PS $i$. The first two terms on the right-hand side of (3.22) form the QoS satisfaction function of SS $j$, which is modeled as a concave function of $\mathbf{b}^s_j$. The last term is the payment for buying bandwidth from all PSs. Compared with the utility in [8], we introduce the private information $\theta_{s_j}$ of the $j$th SS in this thesis to represent a factor weighting QoS against the spectrum trading expense. This weighting factor is implicitly related to the number of active connections within the SS. When no connections are requested by the cognitive users in the $j$th SS, the $j$th SS has zero profit in terms of QoS, and the corresponding $\theta_{s_j}$ is $\infty$.
3.4.2 Solving for Perfect Bayesian Equilibrium
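To obtain the optimal strategy of the $j$th SS of type $\hat\theta_{s_j}$, the KKT condition for maximizing its profit function is
$$\nabla_{\mathbf{b}^s_j}\mathbb{E}_{\theta_{s_{-j}},\theta_p}\Big[P_{s_j}\big(\mathbf{p}^s_j(\hat\theta_p), \mathbf{b}^s_j\mid\hat\theta_{s_j}\big)\Big]\Big|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j} = \nabla_{\mathbf{b}^s_j}P_{s_j}\big(\mathbf{p}^s_j(\hat\theta_p), \mathbf{b}^s_j\mid\hat\theta_{s_j}\big)\Big|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j} = \mathbf{0}. \tag{3.23}$$
In this example, the closed-form best demand of the $j$th SS from the $i$th PS is
$$b^*_{ji}(\hat\theta_{s_j}, \hat\theta_p) = D_{ji}\big(\mathbf{p}^s_j(\hat\theta_p), \hat\theta_{s_j}\big) = D_{1,ji}\big(\mathbf{p}^s_{j-i}(\hat\theta_{p_{-i}}), \hat\theta_{s_j}\big) - \hat\theta_{s_j}\,p_{ji}(\hat\theta_{p_i})\,D_{2,j}, \tag{3.24}$$
where $\mathbf{p}^s_{j-i}(\hat\theta_{p_{-i}})$ is $\mathbf{p}^s_j(\hat\theta_p)$ with $p_{ji}(\hat\theta_{p_i})$ excluded, and
$$D_{1,ji}\big(\mathbf{p}^s_{j-i}(\hat\theta_{p_{-i}}), \hat\theta_{s_j}\big) = \frac{C_{ji}}{A_j} + \frac{\xi_j\hat\theta_{s_j}\sum_{k\ne i}p_{jk}(\hat\theta_{p_k})}{A_j}, \tag{3.25}$$
$$D_{2,j} = \frac{\xi_j(N-2) + 1}{A_j} > 0\quad\text{if } 0\le\xi_j<1, \tag{3.26}$$
with $A_j = (1-\xi_j)\big(\xi_j(N-1) + 1\big)$, which is positive for $0\le\xi_j<1$, and $C_{ji} = k^{(s_j)}_i\big(\xi_j(N-2) + 1\big) - \xi_j\sum_{k\ne i}k^{(s_j)}_k$.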
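The closed form (3.24)-(3.26) can be checked numerically. Reading the cross term of (3.22) as one term per unordered pair of bands, the stationarity condition (3.23) becomes $(1/\theta_{s_j})(\mathbf{k}^{(s_j)} - Q_j\mathbf{b}^s_j) = \mathbf{p}^s_j$ with $Q_j = (1-\xi_j)I + \xi_j\mathbf{1}\mathbf{1}^T$, so $\mathbf{b}^{s*}_j = Q_j^{-1}(\mathbf{k}^{(s_j)} - \theta_{s_j}\mathbf{p}^s_j)$. The sketch below (hypothetical function names, NumPy assumed available) evaluates both expressions for one SS; like (3.24), it ignores the bandwidth constraints, so the result may be negative.

```python
import numpy as np


def closed_form_demand(prices, k_s, theta_s, xi):
    """Unconstrained best demand (3.24)-(3.26) of one SS from each of N PSs, 0 <= xi < 1."""
    prices = np.asarray(prices, dtype=float)
    k_s = np.asarray(k_s, dtype=float)
    N = prices.size
    A = (1.0 - xi) * (xi * (N - 1) + 1.0)
    D2 = (xi * (N - 2) + 1.0) / A
    demand = np.empty(N)
    for i in range(N):
        others = np.arange(N) != i
        C = k_s[i] * (xi * (N - 2) + 1.0) - xi * k_s[others].sum()
        D1 = C / A + xi * theta_s * prices[others].sum() / A
        demand[i] = D1 - theta_s * prices[i] * D2
    return demand


def stationary_demand(prices, k_s, theta_s, xi):
    """Solve (1/theta)(k - Q b) = p directly, with Q = (1 - xi) I + xi 11^T."""
    N = len(prices)
    Q = (1.0 - xi) * np.eye(N) + xi * np.ones((N, N))
    rhs = np.asarray(k_s, dtype=float) - theta_s * np.asarray(prices, dtype=float)
    return np.linalg.solve(Q, rhs)


b_closed = closed_form_demand([1.0, 2.0, 1.5], [5.0, 4.0, 6.0], theta_s=1.0, xi=0.4)
b_direct = stationary_demand([1.0, 2.0, 1.5], [5.0, 4.0, 6.0], theta_s=1.0, xi=0.4)
```

The two vectors agree because $D_{2,j}$ is exactly the diagonal entry of $Q_j^{-1}$, which is also why $A_j$ must be nonzero, i.e., $\xi_j < 1$.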
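We observe that $D_{ji}(\mathbf{p}^s_j(\hat\theta_p), \hat\theta_{s_j})$ is an affine function of $\mathbf{p}^s_j$. It is nondecreasing in $p_{jk}(\hat\theta_{p_k})$ for all $p_k\in\mathcal{I}_p$ with $p_k\ne p_i$ (strictly increasing when $\xi_j > 0$), and decreasing in $p_{ji}(\hat\theta_{p_i})$. Hence the minimum of $D_{ji}$ occurs when $p_{ji}(\hat\theta_{p_i})$ is highest and $\mathbf{p}^s_{j-i}(\hat\theta_{p_{-i}})$ is lowest. The dependence on $\hat\theta_{s_j}$, however, is not monotone in general: $D_{ji}$ is affine in $\hat\theta_{s_j}$ with a slope whose sign depends on $\mathbf{p}^s_{j-i}(\hat\theta_{p_{-i}})$ and $p_{ji}(\hat\theta_{p_i})$, so its extremes over a finite support are attained at the smallest or the largest type. Similar reasoning applies to the maximum of $D_{ji}(\mathbf{p}^s_j(\hat\theta_p), \hat\theta_{s_j})$. Accordingly, the minimal sets for the bandwidth constraints (3.8) and (3.9) are
$$\Theta_{m,i,j}(h^t) = \big\{(\theta^m_{s_j}, \theta^m_{p_{-i}}),\ (\theta^M_{s_j}, \theta^m_{p_{-i}})\big\}, \tag{3.27}$$
$$\Theta_{M,i}(h^t) = \big\{(\theta^c_s, \theta^M_{p_{-i}})\,\big|\,(\theta^c_s)_j = \theta^m_{s_j}\text{ or }\theta^M_{s_j},\ \forall s_j\in\mathcal{I}_s\big\}, \tag{3.28}$$
where $\theta^m_{s_j}$ is the minimum of $\theta_{s_j}$ with $\mu(\theta^m_{s_j}\mid h^t) > 0$, $\theta^M_{s_j}$ is the maximum of $\theta_{s_j}$ with $\mu(\theta^M_{s_j}\mid h^t) > 0$, $\theta^m_{p_{-i}}$ is the elementwise minimum of $\theta_{p_{-i}}$ with $\mu(\theta^m_{p_{-i}}\mid h^t) > 0$, and $\theta^M_{p_{-i}}$ is the elementwise maximum of $\theta_{p_{-i}}$ with $\mu(\theta^M_{p_{-i}}\mid h^t) > 0$.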
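The sets (3.27) and (3.28) can be built directly from the current belief supports. The Python sketch below does so for one PS $i$; the function name and input layout are hypothetical, and the pairing with $\theta^m_{p_{-i}}$ for the lower bounds and $\theta^M_{p_{-i}}$ for the upper bounds is copied from (3.27)-(3.28) as stated.

```python
import itertools


def minimal_sets(ss_supports, ps_supports_others):
    """Build Theta_{m,i,j} (3.27) for every SS j and Theta_{M,i} (3.28) for one PS i.

    ss_supports        : list over SSs of the support Pi_{s_j}(h^t)
    ps_supports_others : list over the other PSs k != i of their supports Pi_{p_k}(h^t)
    """
    theta_p_min = tuple(min(s) for s in ps_supports_others)  # elementwise minimum of theta_{p_-i}
    theta_p_max = tuple(max(s) for s in ps_supports_others)  # elementwise maximum of theta_{p_-i}
    lower = [
        {(min(s), theta_p_min), (max(s), theta_p_min)}  # Theta_{m,i,j}: two profiles per SS
        for s in ss_supports
    ]
    upper = {
        (corner, theta_p_max)  # every SS type at its minimum or maximum
        for corner in itertools.product(*((min(s), max(s)) for s in ss_supports))
    }
    return lower, upper


lower, upper = minimal_sets([{1, 2, 3}, {1, 2, 3}], [{10, 11, 12}])
```

With two SSs of support $\{1, 2, 3\}$ and one opposing PS of support $\{10, 11, 12\}$, this yields two lower-bound profiles per SS and $2^M = 4$ upper-bound profiles. When a support shrinks to a single type, the duplicate profiles collapse and fewer constraints remain.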
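Next, we examine the objective function of the PSs. In this explicit case, $P_{p_i}(\mathbf{p}^p_i, \mathbf{b}^{p*}_i(\theta_s, \theta_{p_i}, \theta_{p_{-i}})\mid\theta)$ is not a concave function of $(\mathbf{p}^p_i, \mathbf{b}^p_i)$, since the monetary term $(\mathbf{p}^p_i)^T\mathbf{b}^p_i$ is bilinear. However, when $\mathbf{b}^{p*}_i(\theta_s, \theta_{p_i}, \theta_{p_{-i}})$ is replaced by $\mathbf{D}^p_i(\mathbf{p}(\theta_p), \theta_s)$, the new function $P_{p_i}(\mathbf{p}^p_i, \mathbf{D}^p_i(\mathbf{p}(\theta_p), \theta_s)\mid\theta_{p_i})$ is concave in $\mathbf{p}^p_i$, where $\mathbf{D}^p_i(\mathbf{p}(\theta_p), \theta_s) = \big(D_{1i}(\mathbf{p}^s_1(\theta_p), \theta_{s_1}), \ldots, D_{ji}(\mathbf{p}^s_j(\theta_p), \theta_{s_j}), \ldots, D_{Mi}(\mathbf{p}^s_M(\theta_p), \theta_{s_M})\big)^T$. Replacing also $b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}})$ in (3.12) and (3.13) by $D_{ji}(\mathbf{p}^s_j(\theta_p), \theta_{s_j})$, we can drop (3.14), and (3.11)-(3.13) become
$$\mathbf{p}^{p*}_i(\theta_{p_i}) = \arg\max_{\mathbf{p}^p_i}\mathbb{E}_{\theta_s,\theta_{p_{-i}}}\Big[P_{p_i}\big(\mathbf{p}^p_i, \mathbf{D}^p_i(\mathbf{p}(\theta_p), \theta_s)\mid\theta_{p_i}\big)\Big] \tag{3.29}$$
subject to
$$0\le D_{ji}\big(\mathbf{p}^s_j(\theta_p), \theta_{s_j}\big),\quad\forall s_j\in\mathcal{I}_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i,j}(h^t), \tag{3.30}$$
$$W_i\ge\sum_{j=1}^M D_{ji}\big(\mathbf{p}^s_j(\theta_p), \theta_{s_j}\big),\quad\forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t), \tag{3.31}$$
for all $\theta_{p_i}\in\Pi_{p_i}(h^t)$ and all $p_i\in\mathcal{I}_p$.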
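The optimization of the PSs in the explicit system is a convex optimization problem, and the joint KKT condition becomes
$$\begin{aligned}
& -D_{ji}\big(\mathbf{p}^s_j(\theta_p), \theta_{s_j}\big)\le 0, && \forall s_j\in\mathcal{I}_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i,j}(h^t)\\
& \textstyle\sum_{j=1}^M D_{ji}\big(\mathbf{p}^s_j(\theta_p), \theta_{s_j}\big) - W_i\le 0, && \forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t)\\
& \lambda_{i,j,\theta_{s_j},\theta_p}\ge 0, && \forall s_j\in\mathcal{I}_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i,j}(h^t)\\
& \nu_{i,\theta_s,\theta_p}\ge 0, && \forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t)\\
& \lambda_{i,j,\theta_{s_j},\theta_p}\,D_{ji}\big(\mathbf{p}^s_j(\theta_p), \theta_{s_j}\big) = 0, && \forall s_j\in\mathcal{I}_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}})\in\Theta_{m,i,j}(h^t)\\
& \nu_{i,\theta_s,\theta_p}\Big(\textstyle\sum_{j=1}^M D_{ji}\big(\mathbf{p}^s_j(\theta_p), \theta_{s_j}\big) - W_i\Big) = 0, && \forall(\theta_s, \theta_{p_{-i}})\in\Theta_{M,i}(h^t)\\
& \nabla_{\mathbf{p}^p_i}L_i(\theta_{p_i}) = \mathbf{0}, && \forall\theta_{p_i}\in\Pi_{p_i}(h^t),\ \forall p_i\in\mathcal{I}_p,
\end{aligned}\tag{3.32}$$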
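where $L_i(\theta_{p_i})$ is the Lagrangian for the maximization problem of the $i$th PS of type $\theta_{p_i}$, and the $n$th element of $\nabla_{\mathbf{p}^p_i}L_i(\theta_{p_i})$ is
$$\big[\nabla_{\mathbf{p}^p_i}L_i(\theta_{p_i})\big]_n = \frac{\partial L_i(\theta_{p_i})}{\partial p_{ni}} = \frac{\partial\,\mathbb{E}\big[P_{p_i}\big(\mathbf{p}^p_i, \mathbf{D}^p_i(\mathbf{p}^s(\theta_p), \theta_s)\mid\theta_{p_i}\big)\big]}{\partial p_{ni}} - \lambda_{i,n,\theta^m_{s_n},\theta^m_{p_{-i}}}\,\theta^m_{s_n}D_{2,n} - \lambda_{i,n,\theta^M_{s_n},\theta^m_{p_{-i}}}\,\theta^M_{s_n}D_{2,n} + \sum_{(\theta_s,\theta_{p_{-i}})\in\Theta_{M,i}(h^t)}\theta_{s_n}D_{2,n}\,\nu_{i,\theta_s,\theta_p}.$$
Since $\nabla_{\mathbf{p}^p_i}\mathbb{E}_{\theta_s,\theta_{p_{-i}}}[P_{p_i}] = \mathbb{E}_{\theta_s,\theta_{p_{-i}}}[\nabla_{\mathbf{p}^p_i}P_{p_i}]$, we compute, for all $s_n\in\mathcal{I}_s$,
\begin{align*}
\big[\nabla_{\mathbf{p}^p_i}P_{p_i}\big]_n = \frac{\partial P_{p_i}\big(\mathbf{p}^p_i, \mathbf{D}^p_i(\mathbf{p}^s(\theta_p), \theta_s)\mid\theta_{p_i}\big)}{\partial p_{ni}} ={}& E_{n,i}(\theta_{p_i},\theta_{s_n}) - G_{n,i}(\theta_{p_i},\theta_{s_n})\,p_{ni}(\theta_{p_i}) - H_{n,i}(\theta_{p_i},\theta_{s_n})\sum_{j\ne n}\theta_{s_j}\,p_{ji}(\theta_{p_i})\,D_{2,j}\\
&+ F_{n,i}(\theta_{p_i},\theta_{s_n})\sum_{k\ne i}p_{nk}(\theta_{p_k}) + I_{n,i}(\theta_{p_i},\theta_{s_n})\sum_{j\ne n}\frac{\xi_j\theta_{s_j}}{A_j}\sum_{k\ne i}p_{jk}(\theta_{p_k}),
\end{align*}
where
\begin{align*}
E_{n,i}(\theta_{p_i},\theta_{s_n}) &= \frac{C_{ni}}{A_n} + 2c_2k^{(p)}_i\theta_{s_n}D_{2,n}\left(B^{req}_i - \frac{k^{(p)}_i\big(W_i - \sum_{j=1}^M C_{ji}/A_j\big)}{\theta_{p_i}}\right),\\
G_{n,i}(\theta_{p_i},\theta_{s_n}) &= 2\theta_{s_n}D_{2,n} + \frac{2c_2\big(k^{(p)}_i\theta_{s_n}D_{2,n}\big)^2}{\theta_{p_i}},\\
H_{n,i}(\theta_{p_i},\theta_{s_n}) &= I_{n,i}(\theta_{p_i},\theta_{s_n}) = \frac{2c_2\big(k^{(p)}_i\big)^2\theta_{s_n}D_{2,n}}{\theta_{p_i}},\\
F_{n,i}(\theta_{p_i},\theta_{s_n}) &= \frac{\xi_n\theta_{s_n}}{A_n} + \frac{2c_2\big(k^{(p)}_i\theta_{s_n}\big)^2D_{2,n}\,\xi_n}{\theta_{p_i}A_n}.
\end{align*}
Therefore, since $p_{ni}(\theta_{p_i})$ and $p_{ji}(\theta_{p_i})$ are fixed for a given $\theta_{p_i}$,
$$\mathbb{E}_{\theta_s,\theta_{p_{-i}}}\left[\frac{\partial P_{p_i}}{\partial p_{ni}}\right] = \mathbb{E}[E_{n,i}] - \mathbb{E}[G_{n,i}]\,p_{ni}(\theta_{p_i}) - \sum_{j\ne n}\mathbb{E}[H_{n,i}\theta_{s_j}]\,p_{ji}(\theta_{p_i})D_{2,j} + \sum_{k\ne i}\mathbb{E}\big[F_{n,i}\,p_{nk}(\theta_{p_k})\big] + \sum_{j\ne n}\frac{\xi_j}{A_j}\sum_{k\ne i}\mathbb{E}\big[I_{n,i}\theta_{s_j}\,p_{jk}(\theta_{p_k})\big],$$
where all expectations are taken over $\theta_s$ and $\theta_{p_{-i}}$.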
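The coefficient expressions above can be sanity-checked by finite differences under complete information, where there is a single type profile and the expectation drops out. The sketch below, with hypothetical function names and NumPy assumed, recomputes $D_{ji}$ from (3.24)-(3.26), evaluates (3.21) with these demands substituted, and compares the analytic $n$th gradient element built from $E_{n,i}$, $G_{n,i}$, $H_{n,i} = I_{n,i}$, and $F_{n,i}$ with a central difference.

```python
import numpy as np


def ss_constants(k_s, xi):
    """A_j and D_{2,j} of (3.26) and C_{ji} for every SS j; k_s[j, i] = k_i^{(s_j)}."""
    N = k_s.shape[1]
    A = (1 - xi) * (xi * (N - 1) + 1)
    D2 = (xi * (N - 2) + 1) / A
    others_k = k_s.sum(axis=1, keepdims=True) - k_s          # sum_{k != i} k_k^{(s_j)}
    C = k_s * (xi * (N - 2) + 1)[:, None] - xi[:, None] * others_k
    return A, D2, C


def demands(P, k_s, theta_s, xi):
    """D_{ji} of (3.24)-(3.25) for all j, i; P[j, i] = p_{ji}."""
    A, D2, C = ss_constants(k_s, xi)
    others_p = P.sum(axis=1, keepdims=True) - P               # sum_{k != i} p_{jk}
    return (C + (xi * theta_s)[:, None] * others_p) / A[:, None] - (theta_s * D2)[:, None] * P


def ps_profit(P, i, k_s, theta_s, xi, theta_p, W, B, kp, c1, c2):
    """(3.21) for PS i with b_{ji} = D_{ji} substituted."""
    b = demands(P, k_s, theta_s, xi)[:, i]
    return P[:, i] @ b + c1 * theta_p - c2 * theta_p * (B - kp * (W - b.sum()) / theta_p) ** 2


def analytic_gradient(P, i, k_s, theta_s, xi, theta_p, W, B, kp, c2):
    """[grad P_{p_i}]_n assembled from the E, G, H (= I) and F coefficients."""
    A, D2, C = ss_constants(k_s, xi)
    M = P.shape[0]
    others_p = P.sum(axis=1) - P[:, i]                        # sum_{k != i} p_{jk}, per SS j
    grad = np.empty(M)
    for n in range(M):
        t = theta_s[n]
        E = C[n, i] / A[n] + 2 * c2 * kp * t * D2[n] * (B - kp * (W - (C[:, i] / A).sum()) / theta_p)
        G = 2 * t * D2[n] + 2 * c2 * (kp * t * D2[n]) ** 2 / theta_p
        H = 2 * c2 * kp ** 2 * t * D2[n] / theta_p
        F = xi[n] * t / A[n] + 2 * c2 * (kp * t) ** 2 * D2[n] * xi[n] / (theta_p * A[n])
        rest = np.arange(M) != n
        grad[n] = (E - G * P[n, i]
                   - H * np.sum(theta_s[rest] * P[rest, i] * D2[rest])
                   + F * others_p[n]
                   + H * np.sum(xi[rest] * theta_s[rest] / A[rest] * others_p[rest]))
    return grad
```

Because the substituted profit is quadratic in $\mathbf{p}^p_i$, a central difference reproduces the analytic gradient up to rounding, which makes this a convenient check when the coefficients are transcribed into a solver.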
3.4.3 Algorithm for Solving Joint KKT Condition
The joint KKT conditions can be solved by active-set method [19], which is summarized in Algorithm 1.
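Algorithm 1 Active-set method for solving the joint KKT condition
0: Define: $\mathcal{S}\triangleq\bigcup_{i,j}\Theta_{m,i,j}(h^t)\cup\bigcup_i\Theta_{M,i}(h^t)$, whose elements index the inequality constraints (3.30) and (3.31), and let $\mathcal{W}$ be the working set.
1: Initialize: Set $\mathcal{W} = \emptyset$.
2: Repeat: Solve the joint KKT conditions with $\lambda_{i,j,\theta_{s_j},\theta_p} = 0$ and $\nu_{i,\theta_s,\theta_p} = 0$ for the constraints $\notin\mathcal{W}$, imposing the constraints in $\mathcal{W}$ as equalities.
3: Condition 1: Check whether (3.30) is satisfied for $\theta^M_{p_i}$, $\forall p_i\in\mathcal{I}_p$.
4: Condition 2: Check whether (3.31) is satisfied for $\theta^m_{p_i}$, $\forall p_i\in\mathcal{I}_p$.
5: Condition 3: Check whether $\lambda_{i,j,\theta_{s_j},\theta_p}\ge 0$ and $\nu_{i,\theta_s,\theta_p}\ge 0$ for the constraints $\in\mathcal{W}$.
6: If Conditions 1, 2, and 3 are all satisfied, then the optimal $\mathbf{p}^{p*}_i(\theta_{p_i})$ has been obtained for all $\theta_{p_i}\in\Pi_{p_i}(h^t)$ and all $p_i$; stop.
7: Else choose another $\mathcal{W}\subset\mathcal{S}$.
8: End repeat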
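Steps 2 to 7 can be illustrated on a generic problem with the same structure as (3.29)-(3.31) after substitution: a concave quadratic objective in the prices with affine inequality constraints. The Python sketch below (NumPy assumed, names hypothetical) is a brute-force working-set search for a single such quadratic program. It is not the thesis's implementation; in the joint problem the stationarity rows would instead be stacked from every PS's condition in (3.32), but the acceptance test (primal feasibility plus nonnegative multipliers on the working set) is the same.

```python
import itertools

import numpy as np


def active_set_qp(Q, r, G, h, tol=1e-9):
    """Maximize -0.5 x^T Q x + r^T x subject to G x <= h by enumerating working sets.

    For a working set W, the constraints in W are imposed as equalities and the
    multipliers of all other constraints are fixed to zero; the candidate is
    accepted when it is primal feasible and every multiplier in W is >= 0.
    """
    Q, r, G, h = (np.asarray(a, dtype=float) for a in (Q, r, G, h))
    n, m = Q.shape[0], G.shape[0]
    for size in range(m + 1):  # "linear choosing": smallest working sets first
        for W in itertools.combinations(range(m), size):
            rows = list(W)
            GW = G[rows]
            # KKT system: Q x + GW^T mu = r (stationarity), GW x = h_W (active constraints)
            K = np.zeros((n + size, n + size))
            K[:n, :n] = Q
            K[:n, n:] = GW.T
            K[n:, :n] = GW
            rhs = np.concatenate([r, h[rows]])
            try:
                sol = np.linalg.solve(K, rhs)
            except np.linalg.LinAlgError:  # singular system for this W: try the next one
                continue
            x, mu = sol[:n], sol[n:]
            if np.all(G @ x <= h + tol) and np.all(mu >= -tol):
                return x, W, mu
    raise ValueError("no working set satisfies the KKT conditions")


x, W, mu = active_set_qp(np.eye(2), [1.0, -1.0], -np.eye(2), [0.0, 0.0])
```

In this example the unconstrained maximizer $(1, -1)$ violates $x_2\ge 0$, and the search returns $x = (1, 0)$ with the second constraint in the working set and multiplier 1. Exhaustive enumeration is exponential in the number of constraints; practical active-set solvers instead add or drop one constraint at a time.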
The complexity of this algorithm depends on two factors: how the next working set is chosen, and how the linear equations are solved. If the simplest working-set selection, i.e., linear choosing (trying the working sets one after another), is used, the worst-case number of searches is $2^{2MN + N2^M}$, because there are $2MN + N2^M$ constraints in total and hence $2^{2MN + N2^M}$ possible working sets. For a given working set $\mathcal{W}$, the number of linear equations is $N\cdot M\cdot|\Theta_p| + |\mathcal{W}|$, where $|\mathcal{W}|$ is the number of active constraints, which ranges from $0$ to $2MN + N2^M$.
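The counts above are easy to tabulate. A small sketch with hypothetical helper names: for the 2-PS, 1-SS setting of Section 4.2.1 it gives 8 constraints and 256 candidate working sets, and with $|\Theta_p| = 3$ (the PS type space of Section 4.1) the smallest linear system has 6 equations. Adding a second SS already raises the number of candidate working sets to 65,536.

```python
def constraint_count(N: int, M: int) -> int:
    """2MN lower-bound constraints plus N * 2^M upper-bound constraints."""
    return 2 * M * N + N * 2 ** M


def worst_case_working_sets(N: int, M: int) -> int:
    """Every subset of the constraints is a candidate working set under linear choosing."""
    return 2 ** constraint_count(N, M)


def linear_system_size(N: int, M: int, num_ps_types: int, num_active: int) -> int:
    """N * M * |Theta_p| price equations plus one equation per active constraint."""
    return N * M * num_ps_types + num_active


print(constraint_count(2, 1), worst_case_working_sets(2, 1), linear_system_size(2, 1, 3, 0))
# 8 256 6
```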
To make the algorithm more practical, we can reduce its complexity by quantizing $\Theta_p$. For instance, if $\Theta_p\equiv\{1, 2, \ldots, 10\}$, we can quantize it into two subsets, an upper set and a lower set, with 8 as the representative element of the upper set and 3 as the representative element of the lower set: all elements greater than 5 are treated as 8, and all elements less than or equal to 5 are treated as 3. The algorithm is then performed on the quantized type space $\Theta^q_p\equiv\{3, 8\}$. After the current period game is finished and the opponents' types have been classified into either the upper or the lower set, that set can be quantized further for the next period game. In this way, the type space has size 2 in every calculation, so the complexity is greatly reduced.
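The quantization step can be written in a few lines. In this hypothetical Python sketch each cell is represented by its middle element, a rule chosen here only because it reproduces the representatives 3 and 8 of the example above; the thesis does not prescribe a general rule.

```python
from typing import Dict, Iterable, Tuple


def quantize(types: Iterable[int]) -> Dict[Tuple[int, ...], int]:
    """Split a type set into a lower and an upper cell, each mapped to its middle element."""
    ordered = sorted(types)
    if not ordered:
        raise ValueError("empty type set")
    if len(ordered) == 1:
        return {tuple(ordered): ordered[0]}  # nothing left to split
    half = len(ordered) // 2
    cells = (ordered[:half], ordered[half:])
    return {tuple(cell): cell[len(cell) // 2] for cell in cells}


def representative(theta: int, cells: Dict[Tuple[int, ...], int]) -> int:
    """Map an actual type to the representative of the cell containing it."""
    for cell, rep in cells.items():
        if theta in cell:
            return rep
    raise KeyError(theta)


coarse = quantize(range(1, 11))  # {(1, 2, 3, 4, 5): 3, (6, 7, 8, 9, 10): 8}
fine = quantize(range(6, 11))    # refine the upper cell next period: {(6, 7): 7, (8, 9, 10): 9}
```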
3.5 Convergence of Beliefs and Actions
In this section, we discuss the convergence of beliefs and actions. We will conclude that (1) the belief update always moves toward the correct belief but may not converge to it, and (2) although the beliefs may not converge, the actions converge to those of the actual types.
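Proposition 1: The belief of player $i$ about the actual type $\hat\theta_j$ of another player $j\ne i$ at stage $t$ is greater than or equal to its belief at stage $t'$ if $t > t'$:
$$\mu_i(\hat\theta_j\mid h^t)\ge\mu_i(\hat\theta_j\mid h^{t'}). \tag{3.33}$$
Proof 1: By Bayes' rule, with player $j$ playing its equilibrium strategy $a^{t'*}_j(\cdot)$ at stage $t'$,
$$\mu_i(\hat\theta_j\mid h^{t'+1}) = \frac{\mu_i(\hat\theta_j\mid h^{t'})\,\delta\big(a^{t'*}_j(\hat\theta_j) - a^{t'*}_j(\hat\theta_j)\big)}{\sum_{\theta'_j:\,a^{t'*}_j(\theta'_j) = a^{t'*}_j(\hat\theta_j)}\mu_i(\theta'_j\mid h^{t'})} \tag{3.34}$$
$$= \frac{\mu_i(\hat\theta_j\mid h^{t'})}{\sum_{\theta'_j:\,a^{t'*}_j(\theta'_j) = a^{t'*}_j(\hat\theta_j)}\mu_i(\theta'_j\mid h^{t'})}\ge\mu_i(\hat\theta_j\mid h^{t'}), \tag{3.35}$$
where the inequality holds because the denominator is a sum of probabilities and hence at most 1. Applying (3.35) repeatedly from stage $t'$ to stage $t$ yields (3.33).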
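The update (3.34) amounts to conditioning on the set of types whose equilibrium action matches the observed one. A minimal Python sketch, assuming pure (deterministic) strategies; the function name and the example strategy are hypothetical. It also shows the pooling situation discussed next: when two types choose the same action, the belief in the actual type increases but stops short of 1, and repeating the same observation changes nothing.

```python
from typing import Any, Dict


def update_belief(
    belief: Dict[Any, float],
    strategy: Dict[Any, Any],
    observed_action: Any,
) -> Dict[Any, float]:
    """Bayes update (3.34) for a pure strategy: keep types consistent with the observation."""
    consistent = {t: p for t, p in belief.items() if strategy[t] == observed_action}
    total = sum(consistent.values())
    if total == 0.0:
        raise ValueError("observed action has zero probability under the current belief")
    return {t: consistent.get(t, 0.0) / total for t in belief}


prior = {10: 1 / 3, 11: 1 / 3, 12: 1 / 3}
strategy = {10: "a", 11: "a", 12: "b"}                    # types 10 and 11 pool on one action
posterior = update_belief(prior, strategy, strategy[10])  # actual type is 10
# {10: 0.5, 11: 0.5, 12: 0.0}
```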
By the above proposition, belief updating is never misleading. However, the proposition does not establish whether the update converges to the actual type; the improvement may stop before reaching it. Fortunately, even if the beliefs do not converge to the actual profile, the action profile taken by all players converges, and it converges to the same action profile as in the complete-information game. The reasoning is as follows.
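Given that the prices $\mathbf{p}^s_{j-i}(\theta_{p_{-i}})$ are set by the joint KKT method, PS $i$ knows that the optimal demand of SS $j$ of type $\theta_{s_j}$ is $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$, obtained by solving (3.10). The optimal price $p^{*unc}_{ji}(\hat\theta_{p_i})$ of (3.7) without the constraints (3.8) and (3.9) may result in either a feasible or an infeasible demand $b^{*unc}_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$. However, the demand must be feasible. If $p^{*unc}_{ji}(\hat\theta_{p_i})$ makes the $i$th demand negative, then by the joint KKT condition $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$ is fixed to 0, which in turn generates a new optimal $i$th price $p^*_{ji}(\hat\theta_{p_i})$ through the following equations:
$$p^*_{ji}(\hat\theta_{p_i}) = \frac{\partial\,\mathbb{E}_{\theta_p}\big[u_{s_j}(\mathbf{b}^s_j\mid\theta)\big]}{\partial b_{ji}}\bigg|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j(\theta)}, \tag{3.36}$$
$$p_{jk}(\theta_{p_k}) = \frac{\partial\,\mathbb{E}_{\theta_p}\big[u_{s_j}(\mathbf{b}^s_j\mid\theta)\big]}{\partial b_{jk}}\bigg|_{\mathbf{b}^s_j = \mathbf{b}^{s*}_j(\theta)},\quad\forall k\ne i, \tag{3.37}$$
where the $i$th element of $\mathbf{b}^{s*}_j(\theta)$ is $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}) = 0$. Since $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}) = 0$, solving (3.37) yields $\mathbf{b}^*_{j-i}$ for the given $\mathbf{p}^s_{j-i}(\theta_{p_{-i}})$, so $\mathbf{b}^*_{j-i}$ does not depend on $p^*_{ji}(\hat\theta_{p_i})$ when $p^{*unc}_{ji}(\hat\theta_{p_i})$ gives a negative demand. Thus $\mathbf{b}^*_{j-i}$ is determined solely by $\mathbf{p}^s_{j-i}(\theta_{p_{-i}})$, and $p^*_{ji}(\hat\theta_{p_i})$ is in turn determined completely by $\mathbf{b}^*_{j-i}$ through (3.36). The relation between the newly generated optimal price $p^*_{ji}(\hat\theta_{p_i})$ and the type $\hat\theta_{p_i}$ therefore hinges on $p^{*unc}_{ji}(\hat\theta_{p_i})$. If $p^{*unc}_{ji}(\hat\theta_{p_i})$ results in a feasible $b^{*unc}_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$, then $p^*_{ji}(\hat\theta_{p_i}) = p^{*unc}_{ji}(\hat\theta_{p_i})$, which depends on $\hat\theta_{p_i}$. If $p^{*unc}_{ji}(\hat\theta_{p_i})$ results in a negative $b^{*unc}_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$, then $p^*_{ji}(\hat\theta_{p_i})$ is determined completely by $\mathbf{p}^s_{j-i}(\theta_{p_{-i}})$ and is independent of $\theta_{p_i}$.
To proceed, define $\Theta_{j,i,neg}\equiv\{\theta_{p_i}\ne\hat\theta_{p_i}\mid p^{*unc}_{ji}(\theta_{p_i})\text{ results in a negative demand}\}$. For $\theta_{p_i}\in\Theta_{j,i,neg}$, the $i$th demand is $b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}}) = 0$ by the joint KKT condition, and $p^*_{ji}(\theta_{p_i})$ is likewise constrained as in (3.36). By the same reasoning, $\mathbf{p}^s_{j-i}(\theta_{p_{-i}})$ determines $p^*_{ji}(\theta_{p_i})$ completely, and the constrained price $p^*_{ji}(\theta_{p_i})$ is independent of $\theta_{p_i}$. Therefore, if $p^{*unc}_{ji}(\hat\theta_{p_i})$ results in a negative demand, then $p^*_{ji}(\hat\theta_{p_i})$ equals $p^*_{ji}(\theta_{p_i})$ for every $\theta_{p_i}\in\Theta_{j,i,neg}$, given the same $\mathbf{p}^s_{j-i}(\theta_{p_{-i}})$ (hence the same $\theta_{-i}$). Clearly, if $\Theta_{j,i,neg}$ is nonempty, PS $i$'s opponents cannot tell which type PS $i$ actually has, since the best strategies of those types coincide; note, however, that this common best strategy is still the one that corresponds to the actual type.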
The same reasoning applies when the demand exceeds $W_i$: by the joint KKT condition, $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$ is fixed to $W_i$, which in turn generates a new optimal $i$th price $p^*_{ji}(\hat\theta_{p_i})$ through (3.36), where the $i$th element of $\mathbf{b}^{s*}_j(\theta)$ is now $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}) = W_i$.
To sum up, although the beliefs may not converge to the actual types, the actions always converge to those of the actual types.
Chapter 4
Simulations
4.1 Simulation Setup
The explicit model developed in Section 3.4 is adopted for the simulations. In the first part, we show the effectiveness of the proposed joint KKT method in several cases and compare it with existing work. In the second part, we examine the players' actions and their beliefs about the other players' types as time evolves, and analyze the results numerically.
The type space of the PSs is set to $\Theta_p = \{10, 11, 12\}$, and that of the SSs to $\Theta_s = \{1, 2, 3\}$. The initial beliefs are uniformly distributed over the type spaces, i.e., $\mu(\theta_{p_i}\mid h^0) = 1/3$ for all $p_i$ and $\mu(\theta_{s_j}\mid h^0) = 1/3$ for all $s_j$. The constants in the PS utility are $c_1 = 2$ and $c_2 = 2$, and the spectrum substitutability is $\xi_j = 0.4$ for all $s_j$. Some parameters may change depending on the simulation scenario; the remaining parameters are specified in each scenario.
4.2 Numerical Results
4.2.1 Effectiveness of the Joint KKT Conditions
In this section, we simulate the multistage game with complete information, i.e., $\mu_x(\hat\theta_y) = 1$ for all $x, y$, and compare the results of the proposed joint KKT conditions with those in [8] to show the effectiveness of the joint KKT conditions.
2 PS vs. 1 SS
First, we simulate the game with 2 PSs and 1 SS under complete information, and compare the results of the proposed joint KKT conditions with those in [8], which correspond to the unconstrained (unc) game. Since there is only one SS, $b_i$ denotes $b_{1i}$ and $p_i$ denotes $p_{1i}$ for simplicity. The constraint on $b_i$ in (3.8) is denoted here as $f_{i,1}$, and that in (3.9) as $f_{i,2}$. We show the best responses (BR), the Nash equilibrium (NE), and the corresponding feasible regions in Fig. 4.1 and Fig. 4.3. The intersection of the best responses is the NE, which is the outcome of sequential rationality when the information is complete.
In Fig. 4.1, with parameters $W_1 = W_2 = 15$ MHz, $B^{req}_1 = B^{req}_2 = 2$ Mbps, $\hat\theta_{p_1} = \hat\theta_{p_2} = 10$, $\hat\theta_{s_1} = 1$, and received SNRs $\gamma^p_1 = \gamma^p_2 = 15$ dB and $\gamma^s_{11} = \gamma^s_{12} = 22$ dB, the unc solution satisfies the bandwidth constraints, so it agrees with the solution of the proposed joint KKT conditions. Fig. 4.2(a) shows the profit function of PS 1 given that PS 2 plays the equilibrium strategy obtained by solving the joint KKT condition and the SS takes its best demand. Fig. 4.2(b) shows the profit function of PS 2 under similar conditions. In this case, the feasible region of each PS's profit function covers the unconstrained best-response point. Fig. 4.2(c) shows the contour plot of the SS's profit given that PS 1 and PS 2 play the equilibrium strategies obtained by solving the joint KKT condition; the SS's highest profit lies in the strictly feasible region.
In Fig. 4.3, with parameters $W_1 = W_2 = 5$ MHz, $B^{req}_1 = B^{req}_2 = 2$ Mbps, $\hat\theta_{p_1} = \hat\theta_{p_2} = 10$, $\hat\theta_{s_1} = 1$, and received SNRs $\gamma^p_1 = \gamma^p_2 = 15$ dB, $\gamma^s_{11} = 22$ dB, and $\gamma^s_{12} = 10$ dB, the unc solution lies outside the bandwidth constraints, while the optimal strategies $b^*_1 = 0$ and $b^*_2 = 0$ of the joint KKT conditions satisfy the constraints. Fig. 4.4(a) shows the profit function of PS 1 given that PS 2 plays the equilibrium strategy obtained by solving the joint KKT condition and the SS takes its best demand. Fig. 4.4(b) shows the profit function of PS 2 under similar conditions. Fig. 4.4(c) shows the contour plot of the SS's profit given that PS 1 and PS 2 play the equilibrium strategies obtained