Multistage Bayesian Game based Spectrum Trading for Cognitive Radio Networks

Academic year: 2021


National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

Multistage Bayesian Game based Spectrum Trading for

Cognitive Radio Networks

Student: Chong-You Lee

Advisor: Dr. Feng-Tsun Chien

Multistage Bayesian Game based Spectrum Trading for Cognitive Radio Networks

Student: Chong-You Lee

Advisor: Dr. Feng-Tsun Chien

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

A Thesis

Submitted to Department of Electronics Engineering & Institute of Electronics

College of Electrical and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of Master of Science

in

Electronics Engineering

January 2010

Hsinchu, Taiwan, Republic of China

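Multistage Bayesian Game based Spectrum Trading for Cognitive Radio Networks

Student: Chong-You Lee Advisor: Dr. Feng-Tsun Chien

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

摘要 (Abstract)

In this thesis, we study spectrum trading in cognitive radio networks from a game-theoretic perspective. We consider a cognitive radio network consisting of multiple primary services and multiple secondary services. The primary services are the spectrum sellers in the trading game and set the price per unit bandwidth for leasing spectrum to the secondary services; the secondary services are the buyers and decide how much bandwidth to buy from the sellers. We propose a trading model based on a multistage Bayesian game to capture the private information that each player may not disclose, and we obtain the perfect Bayesian equilibrium sequentially under a bandwidth constraint. The bandwidth constraint requires that the total bandwidth demanded by all buyers not exceed what the seller can supply and that each buyer's demand be nonnegative. By backward induction, we convert the buyers' Karush-Kuhn-Tucker (KKT) conditions into constraints of the sellers' optimization problems and collect all sellers' KKT conditions into a joint KKT condition; a solution that satisfies the joint KKT condition is a solution of the game. We also propose an active-set algorithm to solve the joint KKT condition and analyze its complexity. The thesis further examines whether the players' actions and their beliefs about the unknown information converge. In the simulations, we compare our approach with earlier work and numerically investigate the convergence behavior of the proposed multistage game.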

Multistage Bayesian Game based Spectrum

Trading for Cognitive Radio Networks

Student: Chong-You Lee Advisor: Dr. Feng-Tsun Chien

Department of Electronics Engineering

Institute of Electronics

National Chiao Tung University

Abstract

In this thesis, we study the problem of spectrum trading in cognitive radio (CR) networks from a game-theoretic perspective. In particular, we consider a CR network with multiple primary services (PSs) and multiple secondary services (SSs), where the PSs are sellers that set the prices for spectrum leasing and the SSs are buyers that decide how much spectrum to demand from each PS in the trading game. To model the trading behavior, we propose a multistage Bayesian game based trading model that accounts for possibly unknown private information of each player, and we obtain the perfect Bayesian equilibrium (PBE) sequentially under a bandwidth constraint, which requires that the total bandwidth demanded by all SSs not exceed what the PS can offer and that each SS's demand be nonnegative. Following the backward induction principle, we transfer the Karush-Kuhn-Tucker (KKT) conditions of the SSs into each PS's optimization constraints, and collectively form joint KKT conditions that satisfy the bandwidth constraint. We present an active-set based algorithm to solve the joint KKT conditions and analyze its complexity. Furthermore, the convergence behavior of the action profiles and of the beliefs about the unknown information is investigated. Finally, in the simulations, we compare the proposed approach with earlier work and numerically study the convergence behavior of the proposed multistage game.

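Acknowledgements

I would like to thank my advisor, Prof. Feng-Tsun Chien, who has actively provided all kinds of learning resources since my senior undergraduate year. He readily agreed when I went to Leuven in the first year of my master's program to pursue a dual degree, which let me experience studying abroad. What I saw that year made a deep impression on me, whether the learning environment, the way academic research is done, or the society and culture. After I returned to Taiwan in my second year, I worked with him on game theory, a field I had never studied before. Although teaching myself was hard, whenever I had a question he patiently discussed it with me; sometimes a discussion took a whole afternoon, and sometimes we met three times in one week. In this process I not only learned a great deal but also gradually built up the confidence to do independent research, for which I am grateful for his challenges and guidance. I thank Prof. 桑梓賢 for introducing me to optimization theory and for his discussions and guidance on my research.

I also thank my fellow students in the Communication Electronics and Signal Processing Laboratory, above all 江清德 and 黃盈叡, who helped me settle in and become part of the lab when I had just returned and whom, alongside their own research, I often asked to check whether my logic and reasoning held up. To the other senior and junior students, classmates, and friends whom I have not thanked one by one: without you, my second year would not have been so rich.

For these six years in Hsinchu, I also thank my uncle 李宗仁 and 如英姊 for looking after me. Finally, I thank my family for your constant support; you will always be my greatest source of strength.

I dedicate this thesis to everyone who loves me and everyone I love.

Chong-You Lee
January 2010, Hsinchu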

Contents

1 Introduction
1.1 Significance of the Research
1.2 Motivation
1.2.1 Why Game Theory?
1.2.2 Related Work and Our Approach
1.3 Contribution

2 Cognitive Radio and Game Theory Preliminary
2.1 Cognitive Radio
2.2 Game Theory
2.2.1 A Basic Game – Definitions and Theorems
2.2.2 Multistage Bayesian Game

3 Multistage Bayesian Spectrum Trading Game
3.1 Problem Setup
3.2 Game Formulation
3.3 General Formulation for Multiple Sellers and Multiple Buyers
3.3.1 Utility Model
3.3.2 Self-Optimization and KKT Translation
3.3.3 Perfect Bayesian Equilibrium and Joint KKT Condition
3.4 Explicit System For Multiple Sellers and Multiple Buyers
3.4.1 Utility Model
3.4.3 Algorithm for Solving Joint KKT Condition
3.5 Convergence of Beliefs and Actions

4 Simulations
4.1 Simulation Setup
4.2 Numerical Results
4.2.1 Effectiveness of The Joint KKT Conditions
4.2.2 Evolutions of Beliefs and Actions

5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work

List of Figures

2.1 Simplified Cognition Cycle.

3.1 The cognitive radio network with multiple primary services and multiple secondary services.

3.2 The evolution of the multistage sequential game.

4.1 The best response, Nash equilibrium and feasible region for PS's with $W_1 = 15$ MHz and $W_2 = 15$ MHz, $\gamma_1^p = 15$ dB, $\gamma_2^p = 15$ dB, $\gamma_{11}^s = 22$ dB, and $\gamma_{12}^s = 22$ dB.

4.2 Profit function given that the opponents play their equilibrium strategies, with $W_1 = 15$ MHz and $W_2 = 15$ MHz, $\gamma_1^p = 15$ dB, $\gamma_2^p = 15$ dB, $\gamma_{11}^s = 22$ dB, and $\gamma_{12}^s = 22$ dB. (a) Of PS1. (b) Of PS2. (c) Of SS.

4.3 The best response, Nash equilibrium and feasible region for PS's with $W_1 = 5$ MHz and $W_2 = 5$ MHz, $\gamma_1^p = 15$ dB, $\gamma_2^p = 15$ dB, $\gamma_{11}^s = 22$ dB, $\gamma_{12}^s = 10$ dB.

4.4 Profit function given that the opponents play their equilibrium strategies, with $W_1 = 5$ MHz and $W_2 = 5$ MHz, $\gamma_1^p = 15$ dB, $\gamma_2^p = 15$ dB, $\gamma_{11}^s = 22$ dB, $\gamma_{12}^s = 10$ dB. (a) Of PS1. (b) Of PS2. (c) Of SS.

4.5 The equilibrium strategy over stages, a comparison between the scenario with M=3 and that with M=4. (a) Of PS's. (b) Of SS's.

4.6 The equilibrium strategies over stages of Case 1. (a) Of PS's. (b) Of SS's. (c) The possible minimal equilibrium strategies of SS's.

4.7 The equilibrium strategies over stages of Case 2. (a) Of PS's. (b) Of SS's.

4.8 The equilibrium strategies over stages of Case 3. (a) Of PS's. (b) Of SS's.

List of Tables

4.1 Belief Updating versus Stage for Case 1

4.2 Belief Updating versus Stage for Case 2

Chapter 1

Introduction

1.1 Significance of the Research

The radio spectrum is more congested today than ever before. Most spectrum suitable for long-distance radio transmission has already been allocated to licensed users, leaving little room for emerging wireless applications. However, the utilization of licensed spectrum varies widely over time and location, and according to the survey in [1] only about 2% of it is in use at all times. In other words, the efficiency of spectrum utilization is unacceptably low, and researchers have begun to consider different spectrum allocation policies to tackle the problem. There are two main approaches. In the first, unlicensed users opportunistically use licensed spectrum as long as they do not interfere with licensed users. In the second, licensed and unlicensed users engage in spectrum trading (or active negotiation) [2, 3]. This thesis addresses several issues in the spectrum trading problem. The idea of spectrum trading is borrowed from economics, motivated by the success of market mechanisms in the economic world. Both schemes are considered possible solutions for dynamic spectrum management. Both can evidently improve the efficiency of spectrum utilization, but how the network behaves, how much efficiency can be gained, and how fairness is guaranteed remain open issues.

1.2 Motivation

The promise of multimedia services anytime and anywhere demands a large amount of spectrum for broadband wireless communications. On the one hand, this drives radio technology toward faster, more convenient, and more reliable communications. On the other hand, the enormous demand exposes both the scarcity and the under-utilization of the current radio spectrum. Useful radio spectrum is a scarce resource because propagation characteristics differ across frequencies; for example, communication at 60 GHz is suitable only for short distances because the radio signal is absorbed by atmospheric oxygen. Today, the most useful band for medium- and long-distance communication lies below 5 GHz, owing to the propagation characteristics and current circuit technology. To tackle the problem, the idea of exploiting under-utilized licensed spectrum for more flexible and efficient transmission has recently received significant attention. In particular, cognitive radio (CR) [4] is considered a promising technique for improving the efficiency of the current radio spectrum.

A cognitive radio is a software-defined radio capable of intelligently sensing, adapting, and responding to a constantly varying environment, in particular to spectrum that is temporarily unused by licensed users. However, many technical challenges remain before cognitive radios can be deployed in practice. One critical challenge is how to persuade licensed service operators to accept coexistence with cognitive users, so that they are willing to share their unused spectrum with unlicensed cognitive (secondary) services. Leasing available spectrum to unlicensed services is an attractive solution that gives legitimate licensed operators an incentive to support the deployment of cognitive radios [3]. Licensed operators earn monetary profit, while unlicensed services meet their satisfaction requirements by renting spectrum.

1.2.1 Why Game Theory?

Conventional Media Access Control (MAC) theory is based on optimization, and the objective function it aims to optimize is the network system utility, possibly defined in terms of fairness, e.g. proportional fairness. Some problem formulations based on optimization theory can be decomposed by the dual-primal method [5] into subproblems that optimize network and user utilities separately, which makes distributed decision making possible. Nevertheless, the solution of such an optimization problem inherently cannot always satisfy each user's individual utility.

In contrast to the optimization-based approach, game theory is a mathematical tool for analyzing interactions among multiple entities, each with its own utility function, and it intrinsically looks for equilibrium solutions in which each user maximizes its individual utility. Although the network system utility may not be optimized, the strategy obtained from the game-theoretic perspective provides a solution that achieves efficiency and fairness under certain criteria.

1.2.2 Related Work and Our Approach

An overview of the general idea and recent developments in dynamic spectrum sharing games can be found in [6]. An auction mechanism for spectrum bands in CR networks with multiple primary and multiple secondary users is considered in [7], where the authors discuss the competitive equilibrium and cheating behaviors that may degrade the efficiency of spectrum sharing, and propose using reserve prices and beliefs to prevent collusion. The works in [8] and [9] consider game models that incorporate both monetary gain and quality-of-service (QoS) satisfaction of wireless services in the utility functions. The authors of [9] explicitly model the price of available bandwidth as a function of demand and obtain the Nash equilibrium (NE) of the spectrum sharing strategy in a network consisting of a single primary service (PS) and multiple secondary services (SS). The work in [8] considers the spectrum trading game in a CR network with multiple PS's and a single SS, and models the utilities of the PS's and the SS separately, so that the demand of the SS implicitly affects the price. However, under certain circumstances the equilibrium bandwidth demand of the SS is negative, and the corresponding NE is infeasible even though it is theoretically solvable. The work in [10] discusses the same problem as [8] and compares features such as market equilibrium and competitive and cooperative pricing strategies. In [11], the authors investigate spectrum trading with a more general model in which multiple primary users (PUs) and multiple secondary users (SUs) are considered in the CR network. However, the utility model in [11] may not capture the different QoS requirements of SUs, and it assumes that each PU sets the same price for all SUs.

One key assumption underlying all of the above work is that each player in the modeled game has complete knowledge of the other players' private information. This is in general not realistic. To account for the unknown private information of each player, one can resort to Bayesian games or stochastic games and study spectrum trading in a sequential (dynamic) manner [12, 13]. In [12], we formulate the spectrum trading behavior of a CR network with multiple PS's and a single SS as a Bayesian game and study the corresponding solution concept, i.e. the perfect Bayesian equilibrium, sequentially. The work in [13] characterizes the dynamics of spectrum access strategies under a stochastic game framework with state transitions. Its authors also propose predicting the future dynamics with approaches from learning theory in order to obtain better strategies for spectrum bidding.

In this work, extending the study in [12], we address the problem of spectrum trading in a CR network consisting of multiple PS's and multiple SS's. We assume that each player (PS or SS) has its own private information, such as the number of active connections within each service or the channel conditions, that is unknown to the other players. Under this assumption, we formulate a multistage trading model based on the Bayesian game to account statistically for the unknown private information (incomplete information), and we sequentially obtain the perfect Bayesian equilibrium (PBE) of the trading process. We further assume that each PS may set different prices for SS's with different QoS, and that SS's with different QoS may demand different amounts of bandwidth from a particular PS. In particular, we impose a bandwidth constraint on the aggregate demand from all SS's, so that the total demand stays within the feasible supply region of each PS.

1.3 Contribution

To deal with trading behavior when each player has its own private information, we propose a trading model based on a multistage Bayesian game that accounts for possibly unknown private information of each player, and we obtain the perfect Bayesian equilibrium (PBE) sequentially under a bandwidth constraint. The constraint requires that the total bandwidth demanded by all SS's not exceed what the PS can offer and that each SS's demand be nonnegative. We formulate the problem as a multistage game because a one-shot game cannot capture the time-varying demand for resources caused by the dynamic nature of wireless channels and wireless services. In a multistage game, the allocation is performed repeatedly, and beliefs can be updated by observing others' actions. Our formulation captures different pricing and demand strategies for different seller-buyer pairs based on their QoS. More specifically, a primary service may set different prices per unit bandwidth for different secondary services based on their operating conditions and QoS requirements, and different secondary services may demand different amounts of bandwidth from the same primary service. Following the backward induction principle, we transfer the Karush-Kuhn-Tucker (KKT) conditions of the SS's into each PS's optimization constraints, and collectively form joint KKT conditions that satisfy the bandwidth constraint, which guarantees that our PBE is physically feasible. We present an active-set based algorithm to solve the joint KKT conditions and analyze its complexity. Furthermore, we illustrate the spectrum trading game by an example with specific utility functions for the PS's and SS's. The convergence behavior of the action profiles and of the beliefs about the unknown information is also investigated. In the simulations, we compare the proposed approach with that in [8] and numerically study the convergence behavior of the proposed multistage game.

As a final remark in this section, we emphasize the general applicability of the joint KKT approach to games with constraints. Mathematically, the problem considered in this thesis is a game with constraints, which is often very difficult to solve. Relevant approaches are rarely seen in pure game theory, let alone in the literature on wireless networks. In most studies that consider games with constraints, the problems have a mathematical structure such that the solutions always lie on the boundary set by the constraints. In this thesis, we solve a bandwidth-constrained game, where the constraints include budgets and feasible bandwidths, using the proposed joint KKT conditions. The joint KKT conditions are generally applicable to constrained games, and the solutions need not lie on the boundary of the constraints.

Chapter 2

Cognitive Radio and Game Theory Preliminary

2.1 Cognitive Radio

Cognitive radio, which first appeared in Joseph Mitola's doctoral dissertation in 2000 [4], is defined as an intelligent wireless communication system that is capable of achieving highly reliable communication whenever and wherever needed by adjusting its own transmission parameters according to the radio environment it senses. CR is called "cognitive" because it is equipped with structures that support a cognition cycle consisting of Observe, Orient, Plan, Decide, and Act phases, as Fig. 2.1 shows¹. In practice, CR is built on software-defined radio and a wide-band RF front end. Prototypes of CR have already been built, such as the first prototype, CR1, by Mitola [4], and the CR and networking prototype by Virginia Tech [14].

Figure 2.1: Simplified Cognition Cycle.

Although CR was not initially aimed at efficient utilization of the radio spectrum, it is a natural candidate for addressing spectrum under-utilization. A CR can either opportunistically detect spectrum holes and transmit in them, or actively negotiate with primary users, i.e. the existing licensed users, for access to the spectrum. In recent years there has been a tremendous amount of research on CR-related topics, which can be classified into three fundamental tasks [3]:

1. Radio-scene analysis, which includes estimation of the interference temperature of the radio environment and detection of spectrum holes.

2. Channel state estimation and predictive modeling, which encompasses estimation of channel-state information and prediction of channel capacity for use by the transmitter.

3. Transmit power control and dynamic spectrum management.

Our work focuses on dynamic spectrum management, which we tackle with a game-theoretic approach.

¹ This figure is adapted from Mitola, "Cognitive Radio: An Integrated Agent Architecture for Software

2.2 Game Theory

Game theory is a mathematical tool for predicting the outcome of interactions among rational decision makers. Such predictions are valuable in many fields, such as chess, card games, gambling, business and economics, politics, international diplomacy, and also wireless networks, which are our interest here, because in these fields no player can achieve its goal or maximize its own profit without considering its competitors' behavior. Sometimes an explicit model is difficult to define (e.g. in politics) or too complex for predicting the result and deriving a winning strategy² (e.g. in chess, the problem cannot easily be put into a familiar mathematical form, and the strategy space is discrete, which makes it non-differentiable and too large for examining every strategy profile). Even so, game theory remains an important tool that provides either a solution to a simplified problem or insight. For wireless networks, applying game theory to predict and further regulate the network is promising, because the increasing complexity of wireless networks results in significant interference and in foreseeable dynamics among interacting users in a cognitive network.

In this section, we introduce the basic concepts of noncooperative game theory that are necessary for understanding our work; interested readers can refer to [15] or [16] for more advanced material.

2.2.1 A Basic Game – Definitions and Theorems

In essence, a game involves multiple players, each of which has its own strategy (e.g. a variable), which it can freely adjust, and its own objective function (e.g. a function), which depends on its own strategy and on the other players' strategies. Mathematically, a game is defined as follows.

Definition 1 A game $\Gamma$ is

$$\Gamma = \left\langle I, \{A_x\}_{x \in I}, \{u_x\}_{x \in I} \right\rangle, \tag{2.1}$$

where $I \equiv \{1, 2, \cdots, N\}$ is the set of players and $A_x$ is the set of actions available to player $x$. We denote the set of all available actions for all players as $A = A_1 \times A_2 \times \cdots \times A_N$. An action taken by player $x$ is $a_x \in A_x$, and the action profile of all players is $a = a_1 \times a_2 \times \cdots \times a_N \in A$. For notational simplicity, we denote by $a_{-x}$ the action profile of all players except player $x$. $u_x$ is player $x$'s utility function, which is a function of $a_x$ and of $a_{-x}$.

² Actually, game theory predicts the equilibrium strategy instead of winning strategy, but one can pick

Game theory rests on several assumptions. First, each player is rational and selfish, so each wants to maximize its own utility. Note that "selfish" does not mean "malicious": a selfish player cares about its own utility, while a malicious player aims at harming other players. It is also assumed that all players know the rules of the game, i.e. each knows all players' action sets and utilities, each knows that the other players know this, and so on, and that actions are perfectly observable by all. These assumptions make the scenario quite idealized, so other kinds of game models have been developed to make the model more practical. For instance, a Bayesian game, the model we apply in this thesis, is a game in which each player's utility function contains private parameters. The private parameters are not known to all players, which is why such a game is also called a game of incomplete information. Bayesian games are introduced in detail in Section 2.2.2. We now continue with the basic game.

What action or strategy would each player take? Evidently, each player chooses the action that is best for it given the other players' actions. The action that player $x$ would take is defined as follows.

Definition 2 The best response $b_x(a_{-x})$ of player $x$ to the action profile $a_{-x}$ is an action $a_x$ such that

$$b_x(a_{-x}) = \arg\max_{a_x \in A_x} u_x(a_x, a_{-x}). \tag{2.2}$$

Since the best response is the best choice for player $x$, player $x$ would stick to it.
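For a finite action set, the best response in Definition 2 can be found by direct enumeration. The following is a minimal sketch for one opponent; the function name `best_response` and the utility `toy_utility` are hypothetical and chosen only for illustration, they are not taken from the thesis.

```python
from typing import Callable, Iterable


def best_response(
    utility: Callable[[int, int], float],
    own_actions: Iterable[int],
    opponent_action: int,
) -> int:
    """Return the action a_x in own_actions that maximizes u_x(a_x, a_-x)."""
    return max(own_actions, key=lambda a: utility(a, opponent_action))


# Hypothetical utility (an assumption for illustration only):
# u_x(a_x, a_-x) = a_x * (10 - a_x - a_-x)
def toy_utility(own: int, other: int) -> float:
    return own * (10 - own - other)


actions = range(0, 11)
# Best response to an opponent playing 4
print(best_response(toy_utility, actions, opponent_action=4))
```

With a continuous action set, the enumeration would be replaced by solving the maximization in (2.2) analytically or numerically.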

Every player knows that every other player will play a best response, so the outcome of the game is an action profile that is a best response for all players, if such a profile exists. This point of mutual best responses, introduced by the Nobel laureate John Forbes Nash, is an equilibrium because every player would stick to it. The formal definition is given below.

Definition 3 The pure strategy profile $a^*$ constitutes a Nash equilibrium (NE) if, for each player $x$,

$$u_x(a_x^*, a_{-x}^*) \ge u_x(a_x, a_{-x}^*) \quad \text{for all } a_x \in A_x. \tag{2.3}$$
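In a finite two-player game, a pure-strategy Nash equilibrium is an action profile from which neither player can gain by deviating unilaterally, so all such profiles can be found by checking every cell of the payoff matrices. The sketch below assumes payoffs are given as nested lists; the function name and the prisoner's dilemma payoffs are illustrative assumptions, not material from the thesis.

```python
from itertools import product


def pure_nash_equilibria(u1, u2):
    """List all pure-strategy NE of a two-player finite game.

    u1[i][j] and u2[i][j] are the payoffs of player 1 and player 2
    when player 1 plays action i and player 2 plays action j.
    """
    n_rows, n_cols = len(u1), len(u1[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Player 1 cannot gain by changing its row while column j is fixed.
        row_ok = all(u1[i][j] >= u1[k][j] for k in range(n_rows))
        # Player 2 cannot gain by changing its column while row i is fixed.
        col_ok = all(u2[i][j] >= u2[i][k] for k in range(n_cols))
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria


# Prisoner's dilemma (illustrative payoffs): action 0 = cooperate, 1 = defect
pd_u1 = [[-1, -3], [0, -2]]
pd_u2 = [[-1, 0], [-3, -2]]
print(pure_nash_equilibria(pd_u1, pd_u2))
```

A game may have no pure-strategy equilibrium at all, in which case the function returns an empty list.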

Note that this definition is for pure strategies; there is a corresponding NE for mixed strategies³. In the following, we state conditions for the existence of a pure-strategy NE.

Theorem 1 (Debreu 1952; Glicksberg 1952; Fan 1952 [16]) Consider a strategic-form game whose strategy spaces $A_x$ are nonempty compact convex subsets of a Euclidean space. If the payoffs are continuous in $a$ and quasi-concave in $a_x$, there exists a pure-strategy Nash equilibrium.

Theorem 2 (Dasgupta and Maskin [16]) Let $A_x$ be a nonempty, convex and compact subset of a finite-dimensional Euclidean space, for all $x$. If, for all $x$, $u_x$ is quasi-concave in $a_x$, is upper semi-continuous in $a$, and has a continuous maximum, there exists a pure-strategy Nash equilibrium.

The definitions of quasiconcavity, upper semi-continuity, and continuous maximum are given below.

Definition 4 If $\operatorname{dom} f$ is convex and $f(\lambda x + (1 - \lambda) y) \ge \min(f(x), f(y))$ for all $x, y \in \operatorname{dom} f$ and $0 \le \lambda \le 1$, then $f$ is quasiconcave [17].

Definition 5 A function $u_i(\cdot)$ on $S$ is upper semi-continuous at $s$ if, for any sequence $s^n$ converging to $s$ [16],

$$\limsup_{n \to +\infty} u_i(s^n) \le u_i(s). \tag{2.4}$$

Definition 6 A function $u_i$ has a continuous maximum if $u_i^*(s_{-i}) \equiv \max_{s_i} u_i(s_i, s_{-i})$ is continuous in $s_{-i}$ [16].
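The quasiconcavity inequality of Definition 4 can be spot-checked numerically for a scalar function. The sketch below tests the inequality on sampled point pairs only, so a `True` result is a necessary check on those samples and not a proof; the function name, the sample grid, and the tolerance are assumptions made for illustration.

```python
import itertools


def is_quasiconcave_on_grid(f, points, lambdas=(0.25, 0.5, 0.75), tol=1e-12):
    """Check f(l*x + (1-l)*y) >= min(f(x), f(y)) on all sampled pairs."""
    for x, y in itertools.combinations(points, 2):
        for lam in lambdas:
            z = lam * x + (1 - lam) * y
            if f(z) < min(f(x), f(y)) - tol:
                return False  # found a violating triple (x, y, lam)
    return True


grid = [i / 10 for i in range(-30, 31)]  # samples in [-3, 3]

print(is_quasiconcave_on_grid(lambda t: -abs(t), grid))  # concave function
print(is_quasiconcave_on_grid(lambda t: t ** 3, grid))   # monotone, not concave
print(is_quasiconcave_on_grid(lambda t: t * t, grid))    # convex, fails at x=-3, y=3
```

The cubic example illustrates that quasiconcavity is weaker than concavity: any monotone scalar function satisfies the inequality.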

NE is the solution concept, i.e. the rule for predicting how the game will be played, for static games of complete and perfect information; interested readers can find the corresponding theorem for the mixed-strategy version in [16]. Note that different kinds of games have different solution concepts, for example the perfect Bayesian equilibrium for Bayesian games.

³ A mixed strategy is a randomization over pure strategies and can be viewed as more general than a pure strategy. The conditions for the existence of a mixed-strategy NE are also looser than those for a pure-strategy NE. However, we often prefer a pure-strategy NE because it is more physically achievable.

2.2.2 Multistage Bayesian Game

When each player holds private information and does not know the private information of the others, NE is not a suitable solution concept. This kind of problem arises often. For example, consider two competing firms whose strategy is the quantity of goods to produce: what is the best strategy for each firm if it does not know the other's operating cost? More generally, how does the game proceed if there is uncertainty about the players' information? The Bayesian game is a type of game designed for this kind of situation. The Bayesian approach forms beliefs about the unknown information and allows each player to update its posterior beliefs, i.e. posterior probabilities, about the other players' private information by observing their actions in prior stages. Each player can then act in the current stage based on the updated beliefs, and the game proceeds in such a way that each player maximizes its expected profit according to its beliefs.
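The two-firm example can be made concrete with a textbook Cournot duopoly under incomplete information. The sketch below is not the model of this thesis: it assumes a linear inverse demand $P = a - q_1 - q_2$, a publicly known unit cost `c1` for firm 1, a private unit cost for firm 2 drawn from `c2_types` with probabilities `belief`, and an interior solution with nonnegative quantities. Under these assumptions the first-order conditions give firm 1's quantity as $(a - 2c_1 + E[c_2])/3$ and firm 2's quantity as $(a - c_2 - q_1)/2$ for each type.

```python
def bayesian_cournot(a, c1, c2_types, belief):
    """Closed-form Bayesian Nash equilibrium of a linear Cournot duopoly.

    Assumptions (illustrative, not from the thesis):
      - inverse demand P = a - q1 - q2,
      - firm 1's unit cost c1 is common knowledge,
      - firm 2's unit cost is private: c2_types[k] with probability belief[k],
      - the equilibrium is interior (all quantities nonnegative).
    Returns (q1, [q2 for each type of firm 2]).
    """
    expected_c2 = sum(p * c for p, c in zip(belief, c2_types))
    # Firm 1 maximizes its expected profit against firm 2's expected quantity.
    q1 = (a - 2 * c1 + expected_c2) / 3
    # Each type of firm 2 best-responds to q1 knowing its own cost.
    q2 = [(a - c - q1) / 2 for c in c2_types]
    return q1, q2


q1, q2 = bayesian_cournot(a=10, c1=2, c2_types=[1, 3], belief=[0.5, 0.5])
print(q1, q2)
```

Firm 1 produces against the expected behavior of firm 2, while each type of firm 2 responds with full knowledge of its own cost; this asymmetry is what the belief in a Bayesian game captures.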

Game Formulation

A multistage Bayesian game $\Gamma$ can be formulated as follows:

$$\Gamma = \left\langle I, \{A_x(h^t)\}, \{\theta_x \in \Theta_x\}, \{u_x\}, \{\mu_x(\theta_{-x} \mid \theta_x, h^0)\} \;\middle|\; x \in I,\ h^t \in H^t,\ t = 0, 1, 2, \cdots, T \right\rangle, \tag{2.5}$$

where $I \equiv \{1, 2, \cdots, N\}$ is the set of players and $A_x(h^t)$ is the set of actions available to player $x$ given a history $h^t = (a^0, a^1, \cdots, a^{t-1})$ at the beginning of stage $t$. Here $a^\tau = a_1^\tau \times a_2^\tau \times \cdots \times a_N^\tau$ is the action profile at stage $\tau$, with $a_i^\tau \in A_i(h^\tau)$ being the action of the $i$th player at stage $\tau$; $H^t$ is the set of all histories $h^t$, with $h^0 = \emptyset$; and $T$ is the length of the game. We denote the set of all available actions for all players at stage $\tau$ as $A(h^\tau) = A_1(h^\tau) \times A_2(h^\tau) \times \cdots \times A_N(h^\tau)$. $\theta_x$ is the private information, also known as the type, of player $x$. The type, which is the incomplete information in a Bayesian game, cannot be known but can be inferred by other players. The type profile is $\theta = \theta_1 \times \theta_2 \times \cdots \times \theta_N$, and $\theta_{-x}$ denotes the type profile $\theta$ excluding $\theta_x$. The actual type value of player $x$ is denoted by $\hat{\theta}_x$, and the corresponding type profile is $\hat{\theta}$. The utility function $u_x$ of player $x$ depends on the type profile and on the history of actions; in other words, $u_x$ is a function of all players' types and of past and current actions. $\mu_x(\theta_{-x} \mid \theta_x, h^0)$ is player $x$'s belief about the other players' types $\theta_{-x}$ given its own type $\theta_x$ at history $h^0$. In contrast with a static game of incomplete information, the beliefs about others can be updated stage by stage. The Bayesian game defines the rule by which players update their beliefs stage by stage, and the players' actions can change according to the updated beliefs.

Solution Concept and Belief System

In the game theory literature, the solution concept for a multistage game of incomplete information is called the perfect Bayesian equilibrium (PBE), which parallels the subgame perfect equilibrium (SPE) in a multistage game of complete information. Just as SPE is a refinement of the Nash equilibrium in a multistage game of complete information, PBE is a refinement of the Bayesian NE (BNE) in a multistage game of incomplete information. To obtain a PBE, some restrictions and assumptions on the belief system must be satisfied, and the players' behavior must be sequentially rational [16]. To keep this thesis self-contained, we list the definition of the pure-strategy PBE. The mixed-strategy version can be found in [16].

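Definition 7 A perfect Bayesian equilibrium is a pair $(a^*, \mu)$ that satisfies (P) and B(i)-B(iv).

B(i) Posterior beliefs are independent, and all types of player $x$ have the same beliefs. For all $\theta$, $t$, and $h^t$, we have

$$\mu_x(\theta_{-x} \mid \theta_x, h^t) = \prod_{y \ne x} \mu_x(\theta_y \mid h^t). \tag{2.6}$$

B(ii) Bayes' rule is used to update beliefs from $\mu_x(\theta_y \mid h^t)$ to $\mu_x(\theta_y \mid h^{t+1})$ whenever possible. For all $x$, $y$, $h^t$, and $a_y^t \in A_y(h^t)$, if there exists $\breve{\theta}_y$ with $\mu_x(\breve{\theta}_y \mid h^t) > 0$ and $a_y^{t*}(\breve{\theta}_y) = a_y^{t*}(\hat{\theta}_y)$, then, for all $\theta_y$,

$$\mu_x(\theta_y \mid h^{t+1}) = \frac{\mu_x(\theta_y \mid h^t)\, \delta\!\left(a_y^{t*}(\theta_y) - a_y^{t*}(\hat{\theta}_y)\right)}{\sum_{\theta_y' :\, a_y^{t*}(\theta_y') = a_y^{t*}(\hat{\theta}_y)} \mu_x(\theta_y' \mid h^t)}, \tag{2.7}$$

where

$$\delta\!\left(a_y^{t*}(\theta_y) - a_y^{t*}(\hat{\theta}_y)\right) = \begin{cases} 1, & \text{if } a_y^{t*}(\theta_y) = a_y^{t*}(\hat{\theta}_y), \\ 0, & \text{otherwise,} \end{cases} \tag{2.8}$$

and $a_y^{t*}(\theta_y)$ denotes the best action of player $y$ corresponding to type $\theta_y$ at stage $t$. Note that B(ii) does not restrict how beliefs about player $y$ are updated if player $y$'s stage-$t$ action had conditional probability 0, which is the key difference from the sequential equilibrium (SE).

B(iii) For all $h^t$, $x$, $y$, $\theta_y$, $a^t$, and $\tilde{a}^t$,

$$\mu_x(\theta_y \mid (h^t, a^t)) = \mu_x(\theta_y \mid (h^t, \tilde{a}^t)) \quad \text{if } a_y^t = \tilde{a}_y^t. \tag{2.9}$$

This condition means that even if player $y$ does deviate at stage $t$, the updating of beliefs about player $y$ should not be influenced by the actions of the other players.

B(iv) For all $h^t$, $\theta_z$, and $x \ne y \ne z$,

$$\mu_x(\theta_z \mid h^t) = \mu_y(\theta_z \mid h^t) = \mu(\theta_z \mid h^t); \tag{2.10}$$

that is, players $x$ and $y$ hold the same belief about a third player $z$. This condition implies that the posterior beliefs are consistent with a common joint distribution on $\Theta$ given $h^t$, with

$$\mu(\theta_{-x} \mid h^t)\, \mu(\theta_x \mid h^t) = \mu(\theta \mid h^t). \tag{2.11}$$

(P) Sequentially rational: For each player $x$, type $\theta_x$, and history $h^t$,

$$a_x^{t*}(\theta_x) = \arg\max_{a_x^t \in A_x} \sum_{\theta_{-x}} \mu_x(\theta_{-x} \mid h^t)\, u_x\!\left(a_x^t, a_{-x}^{t*}(\theta_{-x}) \mid \theta, h^t\right). \tag{2.12}$$

Here we assume that $\Theta$ is a discrete set. For a continuous set, the summation in condition (P) is replaced by an integral, or the continuous set can be approximated by quantizing it into a discrete one.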
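The belief update of B(ii) and the expected-utility maximization of (P) can be sketched for a discrete type set and a single opponent as follows. The dictionary-based representation, the function names, the toy utility, and the choice to keep the prior after a zero-probability action (which B(ii) leaves unrestricted) are assumptions made for illustration.

```python
def update_belief(belief, strategy, observed_action):
    """Bayes update in the spirit of (2.7) for pure strategies.

    belief:   dict mapping opponent type -> prior probability
    strategy: dict mapping opponent type -> equilibrium action of that type
    Types whose equilibrium action differs from the observation get
    probability 0; the remaining mass is renormalized.
    """
    consistent = {th: p for th, p in belief.items()
                  if strategy[th] == observed_action}
    total = sum(consistent.values())
    if total == 0:
        # Zero-probability action: B(ii) imposes no restriction.
        # Keeping the prior is an arbitrary choice made in this sketch.
        return dict(belief)
    return {th: consistent.get(th, 0.0) / total for th in belief}


def sequentially_rational_action(actions, belief, opponent_strategy, utility):
    """Pick the action maximizing expected utility under the belief, as in (P)."""
    def expected_utility(a):
        return sum(p * utility(a, opponent_strategy[th], th)
                   for th, p in belief.items())
    return max(actions, key=expected_utility)


prior = {"low": 0.5, "mid": 0.3, "high": 0.2}
opponent_strategy = {"low": 2, "mid": 2, "high": 5}
posterior = update_belief(prior, opponent_strategy, observed_action=2)
print(posterior)


# Hypothetical utility: depends on own action and the opponent's action only.
def toy_utility(own, other, other_type):
    return own * (10 - own - other)


print(sequentially_rational_action(range(0, 11), posterior,
                                   opponent_strategy, toy_utility))
```

Because the types "low" and "mid" take the same action, observing that action cannot separate them; the posterior only rules out "high" and rescales the remaining probabilities.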

Chapter 3

Multistage Bayesian Spectrum Trading Game

3.1 Problem Setup

We consider a cognitive radio network with $N$ primary services (e.g. the existing cellular services) and $M$ secondary services (e.g. an SS can be a small network with a CR base station and multiple CR users), as shown in Fig. 3.1. The $i$th PS operates on its own exclusive spectrum $W_i$, from which it can lease available unused bandwidth $b_{ji}$ to the $j$th SS, which does not own the legal right to use the spectrum. To maximize its profit, each PS offers different prices to different SS's. In the trading process, all PS's compete with each other through the prices they offer to the SS's, and each SS decides from whom to rent and how much of the available spectrum to rent. Specifically, we model the spectrum trading process as a multistage game in which all PS's simultaneously set their prices $p_{ji}$, for all $i$ and $j$, in the first stage. In the subsequent stage, the $j$th SS requests bandwidth $b_{ji}$ from the $i$th PS, for $i = 1, 2, \cdots, N$ and $j = 1, 2, \cdots, M$.

In practice, however, each player (PS or SS) may possess its own private information that is unknown to the other players. Therefore, no player can predict the overall trading behavior correctly, which makes deciding on optimal strategies a challenging task. For this incomplete-information game, we propose using the theory of multistage Bayesian games. The dynamic Bayesian approach allows each player to update its posterior beliefs, i.e., posterior probabilities, about the other players' private information by observing their actions in prior stages. Each player can then act in the current stage based on the updated beliefs.

Figure 3.1: The cognitive radio network with multiple primary services and multiple secondary services.

We assume that each player is selfish but rational in the considered multistage sequential Bayesian game [16]. The objective is to find the perfect Bayesian equilibrium for all players' actions, such that each player maximizes its expected profit as the game evolves sequentially.

As the private information may not be updated promptly and the channel conditions may change, we study a repeated version of the multistage Bayesian game. The evolution of the repeated multistage game is illustrated in Fig. 3.2, where one unit game, composed of two stages, is completed in each period.

Rather than learning in the game to reach equilibrium, which spends time and energy on signaling and evolving, we believe that computing the optimal strategy in one shot is more suitable for our scenario, for the following reasons. First, the decision making of each primary/secondary service is done by its base station (BS), so the complexity of solving the optimization problem is affordable. Second, since the players are BSs, the numbers $N$ and $M$ are much smaller than the number of terminal wireless users in a normal cell.

Figure 3.2: The evolution of the multistage sequential game.

3.2 Game Formulation

In this section, we describe the proposed multistage Bayesian game for spectrum trading with a general utility function. We will illustrate the idea by a specific example in Sec. 3.4.

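We formulate the spectrum trading process as a multistage Bayesian game

$$\Gamma = \Big\langle I, \{A_x(h^t)\}, \{\theta_x \in \Theta_x\}, \{P_x\}, \{\mu_x(\theta_{-x} \mid \theta_x, h^0)\} \;\Big|\; x \in I,\ h^t \in H^t,\ t = 0, 1, 2, \ldots, T \Big\rangle, \tag{3.1}$$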

where $I \triangleq I_p \cup I_s$ is the set of players, with $I_p = \{p_1, p_2, \ldots, p_N\}$ being the set of all PS's and $I_s = \{s_1, s_2, \ldots, s_M\}$ the set of all SS's. $A_x(h^t)$ is the set of actions available to player $x$ given a history $h^t = (a^0, a^1, \cdots, a^{t-1})$ at the beginning of stage $t$, with $a^\tau = a^\tau_{p_1} \times a^\tau_{p_2} \times \cdots \times a^\tau_{p_N} \times a^\tau_{s_1} \times a^\tau_{s_2} \times \cdots \times a^\tau_{s_M}$ the action profile consisting of the actions of all players (including PS's and SS's) at stage $\tau$, where $a^\tau_{p_i} \in A_{p_i}(h^\tau)$ and $a^\tau_{s_j} \in A_{s_j}(h^\tau)$ are the actions of the $i$th PS and the $j$th SS at stage $\tau$, respectively. The set of all available actions for all players is denoted as $A(h^\tau) = A_{p_1}(h^\tau) \times A_{p_2}(h^\tau) \times \cdots \times A_{p_N}(h^\tau) \times A_{s_1}(h^\tau) \times A_{s_2}(h^\tau) \times \cdots \times A_{s_M}(h^\tau)$. We denote $p^p_i = (p_{1i}, p_{2i}, \cdots, p_{Mi})^T$, $p^s_j = (p_{j1}, p_{j2}, \cdots, p_{jN})^T$, $b^p_i = (b_{1i}, b_{2i}, \cdots, b_{Mi})^T$, $b^s_j = (b_{j1}, b_{j2}, \cdots, b_{jN})^T$, $p^\tau = p^{p,\tau}_1 \times p^{p,\tau}_2 \times \cdots \times p^{p,\tau}_N$, and $b^\tau = b^{s,\tau}_1 \times b^{s,\tau}_2 \times \cdots \times b^{s,\tau}_M$.

In each time period, PS $i$ sets the price $a_{p_i} = p^p_i$ at the even stage and stays silent (i.e., $a_{p_i} = \phi$) at the odd stage [16]. In contrast, SS $j$ performs "do nothing" (i.e., $a_{s_j} = \phi$) at the even stage and demands bandwidth $a_{s_j} = b^s_j$ at the odd stage. Therefore, $a^\tau = p^\tau \times \phi$ at an even stage and $a^\tau = \phi \times b^\tau$ at an odd stage, where $\phi$ is the action profile of "do nothing".

$\Theta_x$ is the set of possible private information $\theta_x$ for player $x$. The type profiles are $\theta_p = (\theta_{p_1}, \theta_{p_2}, \cdots, \theta_{p_N})$ and $\theta_s = (\theta_{s_1}, \theta_{s_2}, \cdots, \theta_{s_M})$. $\theta_{p_{-i}}$ denotes the type profile $\theta_p$ excluding $\theta_{p_i}$. Similarly, $\theta_{s_{-j}}$ denotes the type profile $\theta_s$ excluding $\theta_{s_j}$. The overall type profile is $\theta = (\theta_s, \theta_p)$. The actual type of player $x$ is denoted by $\hat\theta_x$, and the actual type profiles of the PS's, the SS's, and all players are $\hat\theta_p$, $\hat\theta_s$, and $\hat\theta$, respectively. $P_x$, the profit function (i.e., the net utility) of player $x$, is a mapping $P_x : H^\tau \times \Theta \to \mathbb{R}$ from the space $H^\tau \times \Theta$ to the set of real numbers $\mathbb{R}$. $\mu_x(\theta_{-x} \mid \theta_x, h^t)$ is player $x$'s belief about the other players' types given its own type $\theta_x$ and the history $h^t$. More details about the beliefs will be described in the next section.

In contrast with the static game of incomplete information, the beliefs about the other players can be updated stage by stage, and the players' actions can change according to the newly updated beliefs.

3.3 General Formulation for Multiple Sellers and Multiple Buyers

3.3.1 Utility Model

As mentioned in the system model, PS $i$ leases bandwidth $b_{ji}$ to SS $j$. The amount of $b_{ji}$ affects the remaining available bandwidth of PS $i$, and thus the corresponding QoS satisfaction, which for PS $i$ is denoted by the utility function $u_{p_i}(b^p_i \mid \theta, h^t)$. The monetary gain of trading is $\sum_j p_{ji} b_{ji} = (p^p_i)^T b^p_i$. The total profit of PS $i$ is given by

$$P_{p_i}(p^p_i, b^p_i \mid \theta) = u_{p_i}(b^p_i \mid \theta) + (p^p_i)^T b^p_i, \tag{3.2}$$

where $u_{p_i}(b^p_i \mid \theta, h^t)$ is written as $u_{p_i}(b^p_i \mid \theta)$ for notational simplicity; the condition on the history $h^t$ is likewise omitted in $P_{p_i}$. We assume that $P_{p_i}(p^p_i, b^p_i \mid \theta)$ is a concave function of $(p^p_i, b^p_i)$. Although we make this assumption here, we will show in the explicit system that the assumption need not hold exactly as in the general formulation for the joint KKT condition to apply. $\Theta$ is assumed to be a discrete space in the formulation.

For a secondary service, the QoS utility $u_{s_j}(b^s_j \mid \theta)$ is modeled as a concave function of $b^s_j$. The cost of buying bandwidth is $\sum_i p_{ji} b_{ji} = (p^s_j)^T b^s_j$. The total profit of the secondary service is

$$P_{s_j}(p^s_j, b^s_j \mid \theta) = u_{s_j}(b^s_j \mid \theta) - (p^s_j)^T b^s_j \tag{3.3}$$

which is still a concave function of $b^s_j$.

3.3.2 Self-Optimization and KKT Translation

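Since SS $j$ is a follower of the game, it can observe the sellers' action $p^s_j(\hat\theta_p) = (p_{j1}(\hat\theta_{p_1}), p_{j2}(\hat\theta_{p_2}), \cdots, p_{jN}(\hat\theta_{p_N}))^T$. Note that $p^s_j(\hat\theta_p)$ is the optimal price corresponding to the type profile $\hat\theta_p$, and the SS only observes the prices without knowledge of $\hat\theta_p$. That is, the SS may still not know $\hat\theta_p$ correctly ($\theta_p$ is still random), but the prices corresponding to $\hat\theta_p$ are deterministic. Based on that, SS $j$ of type $\hat\theta_{s_j}$ maximizes its expected profit, which can be formulated as

$$b^{s*}_j = \arg\max_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\big[P_{s_j}(p^s_j(\hat\theta_p), b^s_j \mid \hat\theta_{s_j}, \theta_{s_{-j}}, \theta_p)\big] \tag{3.4}$$

The KKT condition [17] for the profit maximization of SS $j$ of type $\hat\theta_{s_j}$ is

$$\nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\big[P_{s_j}(p^s_j(\hat\theta_p), b^s_j \mid \hat\theta_{s_j}, \theta_{s_{-j}}, \theta_p)\big]\Big|_{b^s_j = b^{s*}_j} = 0, \tag{3.5}$$

which is equivalent to

$$\nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\big[u_{s_j}(b^s_j \mid \hat\theta_{s_j}, \theta_{s_{-j}}, \theta_p)\big]\Big|_{b^s_j = b^{s*}_j} = p^s_j(\hat\theta_p), \tag{3.6}$$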

Here we observe that $b^{s*}_j$ is a function of $\hat\theta_{s_j}$ and $\hat\theta_p$, hence it can be denoted as $b^{s*}_j(\hat\theta_{s_j}, \hat\theta_p)$. Another observation is that the self-optimizations of the SS's are not coupled with each other; namely, the profit maximization of SS $j$ depends on all PS's and on SS $j$ itself, but not on the other SS's. Hence, to be sequentially rational, SS $j$ only needs to solve (3.5) or (3.6) without taking the other SS's actions into consideration.
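The step from (3.5) to (3.6) can be spelled out with the profit definition (3.3). The following is a short sketch, assuming that gradient and expectation can be interchanged (which holds for the discrete type space assumed here) and using the fact that the observed price vector is deterministic:

```latex
\begin{align*}
\nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\!\left[P_{s_j}\right]
 &= \nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\!\left[u_{s_j}(b^s_j \mid \theta)
    - \big(p^s_j(\hat\theta_p)\big)^T b^s_j\right] \\
 &= \nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\!\left[u_{s_j}(b^s_j \mid \theta)\right]
    - p^s_j(\hat\theta_p).
\end{align*}
```

Setting the last line to zero at $b^s_j = b^{s*}_j$ gives (3.6): the expected marginal utility of each band equals its observed price.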

But how would the PS's move, knowing that each SS is sequentially rational? It is widely known that backward induction [16] is useful in solving finite dynamic games of perfect information. Here we apply a similar idea to the trading game. All primary services know that SS $j$ will ask for the best demand $b^{s*}_j(\hat\theta_{s_j}, \hat\theta_p)$, or equivalently, they know the KKT condition (3.6) for all SS's. However, since PS $i$ does not know the types $\hat\theta_{s_j}$ and $\hat\theta_{p_{-i}}$ exactly, PS $i$ views $b^{s*}_j(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$ as a random variable with uncertain $\theta_{s_j}$ and $\theta_{p_{-i}}$. The objective of PS $i$ is then to maximize its expected profit based on the beliefs $\mu(\theta_s, \theta_{p_{-i}} \mid h^t)$ about the other players' private information, taking the KKT conditions of the SS's into account. The optimization for PS $i$ of type $\hat\theta_{p_i}$ is therefore given by

$$p^{p*}_i(\hat\theta_{p_i}) = \arg\max_{p^p_i} E_{\theta_s,\theta_{p_{-i}}}\big[P_{p_i}(p^p_i, b^{p*}_i(\theta_s, \hat\theta_{p_i}, \theta_{p_{-i}}) \mid \theta_s, \hat\theta_{p_i}, \theta_{p_{-i}})\big], \tag{3.7}$$

subject to

$$0 \le b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}), \quad \forall s_j,\ \forall\theta_{s_j} \in \Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}} \in \Pi_{p_{-i}}(h^t), \tag{3.8}$$

$$W_i \ge \sum_j b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}), \quad \forall\theta_s \in \Pi_s,\ \forall\theta_{p_{-i}} \in \Pi_{p_{-i}}(h^t), \tag{3.9}$$

$$p^s_j(\hat\theta_{p_i}, \theta_{p_{-i}}) = \nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\big[u_{s_j}(b^s_j \mid \theta)\big]\Big|_{b^s_j = b^{s*}_j(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})}, \quad \forall s_j,\ \forall\theta_{s_j} \in \Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}} \in \Pi_{p_{-i}}(h^t), \tag{3.10}$$

where $\Pi_{s_j}(h^t)$ is the set of all possible $\theta_{s_j}$ that satisfy $\mu(\theta_{s_j} \mid h^t) > 0$, $\Pi_s$ is the set of all possible $\theta_s$ that satisfy $\mu(\theta_s \mid h^t) > 0$, and $\Pi_{p_{-i}}(h^t)$ is the set of all possible $\theta_{p_{-i}}$ that satisfy $\mu(\theta_{p_{-i}} \mid h^t) > 0$. The constraints in (3.8) and (3.9) limit the demand to the physically realizable spectrum region afforded by PS $i$ under all possible type profiles of the other players. Note that (3.8) and (3.9) contain many inequalities, but we can reduce them by finding the minimal sets $\Theta_{m,i}(h^t)$ and $\Theta_{M,i}(h^t)$ that represent these two groups of inequalities. The determination of $\Theta_{m,i}(h^t)$ and $\Theta_{M,i}(h^t)$ depends largely on the utility profile. In this work, we call this approach the KKT translation. In this optimization problem, under the assumptions we have made, if the constraints (3.10) are affine functions, then the problem is a convex optimization problem; otherwise it is a general (not necessarily convex) optimization problem [17]. The difference lies in whether the KKT condition is both sufficient and necessary or only necessary for the problem.

3.3.3 Perfect Bayesian Equilibrium and Joint KKT Condition

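We are now ready to find the PBE at stage $t$ of the multistage Bayesian game modeled in the considered cognitive radio network. The posterior belief is obtained by the PBE updating rules B(i)-B(iv). With that, the condition (P) of PBE at any stage is

$$p^{p*}_i(\theta_{p_i}) = \arg\max_{p^p_i} E_{\theta_s,\theta_{p_{-i}}}\big[P_{p_i}(p^p_i, b^{p*}_i(\theta_s, \theta_{p_i}, \theta_{p_{-i}}) \mid \theta)\big], \tag{3.11}$$

subject to

$$0 \le b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}}), \quad \forall s_j,\ \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i}(h^t), \tag{3.12}$$

$$W_i \ge \sum_j b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}}), \quad \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t), \tag{3.13}$$

$$p^{s*}_j(\theta_{p_i}, \theta_{p_{-i}}) = \nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\big[u_{s_j}(b^s_j \mid \theta)\big]\Big|_{b^s_j = b^{s*}_j(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}})}, \quad \forall s_j,\ \forall\theta_{s_j} \in \Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}} \in \Pi_{p_{-i}}(h^t), \tag{3.14}$$

$$\forall\theta_{p_i} \in \Pi_{p_i}(h^t),\ \forall p_i \in I_p.$$

It is clear that if the constraint (3.14) is affine and the price profile $p^{p*}_{-i}(\theta_{p_{-i}})$ for all type profiles is known, then the KKT condition is sufficient and necessary for solving the convex optimization problem. However, finding the optimal strategy profile $p^{p*}_{-i}(\theta_{p_{-i}})$ for all possible $\theta_{p_{-i}}$ requires $p^{p*}_i(\theta_{p_i})$ for all possible $\theta_{p_i}$. It follows that each player has to solve all PS's KKT conditions jointly and simultaneously. The joint KKT conditions are given by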

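$$\begin{cases}
-b^*_{ji}(\theta_{s_j}, \theta_p) \le 0, & \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i}(h^t),\ \forall s_j \in I_s \\
\sum_{j=1}^{M} b^*_{ji}(\theta_{s_j}, \theta_p) - W_i \le 0, & \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t) \\
K_{jk}(\theta_{s_j}) = p_{jk}(\theta_{p_k}), & \forall\theta_{s_j} \in \Pi_{s_j}(h^t),\ \forall\theta_{p_{-i}} \in \Pi_{p_{-i}}(h^t),\ \forall s_j \in I_s,\ \forall p_k \in I_p \\
\lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}} \ge 0, & \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i}(h^t),\ \forall s_j \in I_s \\
\nu_{i,\theta_s,\theta_{p_{-i}}} \ge 0, & \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t) \\
\lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}}\, b^*_{ji}(\theta_{s_j}, \theta_p) = 0, & \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i}(h^t),\ \forall s_j \in I_s \\
\nu_{i,\theta_s,\theta_{p_{-i}}}\Big(\sum_{j=1}^{M} b^*_{ji}(\theta_{s_j}, \theta_p) - W_i\Big) = 0, & \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t) \\
\nabla_{p^p_i} L_i = 0 &
\end{cases}
\quad \forall\theta_{p_i} \in \Pi_{p_i}(h^t),\ \forall p_i \in I_p, \tag{3.15}$$

where $\lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}}$, $\nu_{i,\theta_s,\theta_{p_{-i}}}$, and $\eta_{k,j,\theta_{s_j},\theta_{p_{-i}}}$ are Lagrange multipliers. $K_{jk}(\theta_{s_j})$ represents the right-hand side of equation (3.14), which is

$$K_{jk}(\theta_{s_j}) \equiv \frac{\partial E_{\theta_{s_{-j}},\theta_p}\big[u_{s_j}(b^s_j \mid \theta)\big]}{\partial b_{jk}}\bigg|_{b^s_j = b^{s*}_j(\theta_{s_j}, \theta_p)}, \quad \forall p_k \in I_p \tag{3.16}$$

$L_i$ is the Lagrangian function of PS $i$ of type $\theta_{p_i}$, whose gradient is

\begin{align}
\nabla_{p^p_i} L_i ={}& \nabla_{p^p_i} E_{\theta_s,\theta_{p_{-i}}}\big[P_{p_i}(p^p_i, b^{p*}_i(\theta) \mid \theta)\big] \tag{3.17} \\
&- \sum_{s_j \in I_s,\ (\theta_{s_j},\theta_{p_{-i}}) \in \Theta_{m,i}(h^t)} \lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}} \nabla_{p^p_i}\big(-b^*_{ji}(\theta_{s_j}, \theta_p)\big) \tag{3.18} \\
&- \sum_{(\theta_s,\theta_{p_{-i}}) \in \Theta_{M,i}(h^t)} \nu_{i,\theta_s,\theta_{p_{-i}}} \nabla_{p^p_i}\Big(\sum_{j=1}^{M} b^*_{ji}(\theta_{s_j}, \theta_p) - W_i\Big) \tag{3.19} \\
&- \sum_{p_k \in I_p,\ s_j \in I_s,\ \theta_{s_j} \in \Pi_{s_j}(h^t),\ \theta_{p_{-i}} \in \Pi_{p_{-i}}(h^t)} \eta_{k,j,\theta_{s_j},\theta_{p_{-i}}} \nabla_{p^p_i}\big[p_{jk}(\theta_{p_k}) - K_{jk}(\theta_{s_j})\big] \tag{3.20}
\end{align}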

3.4 Explicit System For Multiple Sellers and Multiple Buyers

3.4.1 Utility Model

In this part, we adopt and modify the utility models in [8]. The profit function of the $i$th PS is given by

$$P_{p_i}(p^p_i, b^p_i \mid \theta_{p_i}) = (p^p_i)^T b^p_i + c_1\theta_{p_i} - c_2\theta_{p_i}\left(B^{req}_i - k^{(p)}_i \frac{W_i - \sum_{j=1}^{M} b_{ji}}{\theta_{p_i}}\right)^2, \tag{3.21}$$

where $c_1$ and $c_2$ are constant weights, $B^{req}_i$ is the bandwidth requirement for a primary connection, and $k^{(p)}_i = \log_2\left(1 + \frac{1.5\gamma^p_i}{\ln(0.2/\mathrm{BER}^{tar}_i)}\right)$ denotes the spectral efficiency of wireless transmission for the $i$th PS, with $\gamma^p_i$ being the signal-to-noise ratio (SNR) at the $i$th PS's receivers and $\mathrm{BER}^{tar}_i$ being the target bit-error-rate (BER) for the $i$th PS's local connection [18]. The private information $\theta_{p_i}$, taking values in the set $\Theta_p$, represents the number of connections in the $i$th PS. The first term on the right-hand side of (3.21) is the monetary gain of selling bandwidth. The second term is the revenue of maintaining primary connections, which is proportional to $\theta_{p_i}$. The third term is the cost of sharing the spectrum with the SS's; the squared term can be interpreted as magnifying the difference between the required throughput and the actual serving throughput per terminal user of the $i$th PS. Instead of the single-SS scenario in [8], the profit function (3.21) considers multiple SS's.
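To make the terms of (3.21) concrete, here is a minimal Python sketch that evaluates the spectral efficiency and the PS profit for one PS. The function names and the numerical inputs (in particular the target BER of $10^{-4}$, the price, and the demand) are assumptions chosen for illustration; only $c_1 = c_2 = 2$ follows the simulation setup in Section 4.1.

```python
import math

def spectral_efficiency(snr_db, ber_target):
    """k = log2(1 + 1.5*gamma / ln(0.2/BER)), with gamma converted from dB."""
    gamma = 10 ** (snr_db / 10)
    return math.log2(1 + 1.5 * gamma / math.log(0.2 / ber_target))

def ps_profit(prices, demands, theta_p, W, B_req, k_p, c1=2.0, c2=2.0):
    """Profit (3.21) of one PS: revenue + connection gain - QoS cost."""
    revenue = sum(p * b for p, b in zip(prices, demands))
    leftover = W - sum(demands)                 # bandwidth kept for primary users
    qos_gap = B_req - k_p * leftover / theta_p  # required minus served throughput
    return revenue + c1 * theta_p - c2 * theta_p * qos_gap ** 2

# Hypothetical inputs: 15 dB SNR, assumed target BER 1e-4, one SS buying 2 MHz.
k_p = spectral_efficiency(snr_db=15.0, ber_target=1e-4)
profit = ps_profit(prices=[1.0], demands=[2.0], theta_p=10,
                   W=15.0, B_req=2.0, k_p=k_p)
```

With bandwidth in MHz and throughput in Mbps, the spectral efficiency is in bit/s/Hz, so the units inside the squared term are consistent.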

The profit function of SS $j$ is given by

$$P_{s_j}(p^s_j, b^s_j \mid \theta_{s_j}) = \frac{1}{\theta_{s_j}}\left[\sum_{i}^{N} b_{ji} k^{(s_j)}_i - \frac{1}{2}\left((b^s_j)^T b^s_j + 2\xi_j \sum_{k \neq i} b_{jk} b_{ji}\right)\right] - (p^s_j)^T b^s_j, \tag{3.22}$$

where $\xi_j \in [-1.0, 1.0]$, the $j$th SS's spectrum substitutability, is defined as follows. When $\xi_j = 1$, SS $j$ can switch freely among the spectrum rented from all PS's. When $\xi_j = 0$, SS $j$ cannot switch among the operating spectrum. If $\xi_j < 0$, spectrum sharing by SS $j$ is complementary, that is, it needs to buy one or more additional spectrum bands simultaneously. We consider $0 \le \xi_j \le 1$ for the rest of the thesis; the other case, $-1 \le \xi_j \le 0$, is straightforward. $k^{(s_j)}_i = \log_2\left(1 + \frac{1.5\gamma^s_{ji}}{\ln(0.2/\mathrm{BER}^{tar}_j)}\right)$ denotes the spectral efficiency acquired by the $j$th SS's secondary users on the band $W_i$ owned by PS $i$. The first two terms on the right-hand side of (3.22) form the QoS satisfaction function of SS $j$, which is modeled as a concave function of $b^s_j$. The last term is the payment for buying bandwidth from all PS's. Compared with the utility in [8], we introduce the private information $\theta_{s_j}$ of the $j$th SS to represent the factor that balances QoS against the spectrum trading expense. This weighting factor is implicitly related to the number of active connections within the SS. When no connections are requested by the cognitive users in the $j$th SS, the $j$th SS must have zero profit in terms of QoS, and the corresponding $\theta_{s_j}$ is $\infty$.

3.4.2 Solving for Perfect Bayesian Equilibrium

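To obtain the optimal strategy of the $j$th SS of type $\hat\theta_{s_j}$, the KKT condition of the maximization of the $j$th SS's profit function is

$$\nabla_{b^s_j} E_{\theta_{s_{-j}},\theta_p}\big[P_{s_j}(p^s_j(\hat\theta_p), b^s_j \mid \hat\theta_{s_j})\big]\Big|_{b^s_j = b^{s*}_j} = \nabla_{b^s_j} P_{s_j}(p^s_j(\hat\theta_p), b^s_j \mid \hat\theta_{s_j})\Big|_{b^s_j = b^{s*}_j} = 0, \tag{3.23}$$

In this example, the closed-form solution of the best demand from the $j$th SS to the $i$th PS is obtained as follows:

$$b^*_{ji}(\hat\theta_{s_j}, \hat\theta_p) = D_{ji}(p^s_j(\hat\theta_p), \hat\theta_{s_j}) = D_{1,ji}(p^s_{j_{-i}}(\hat\theta_{p_{-i}}), \hat\theta_{s_j}) - \hat\theta_{s_j}\, p_{ji}(\hat\theta_{p_i})\, D_{2,j}, \tag{3.24}$$

where $p^s_{j_{-i}}(\hat\theta_{p_{-i}})$ is $p^s_j(\hat\theta_p)$ with the exclusion of $p_{ji}(\hat\theta_{p_i})$ and

$$D_{1,ji}(p^s_{j_{-i}}(\hat\theta_{p_{-i}}), \hat\theta_{s_j}) = \frac{C_{ji}}{A_j} + \frac{\xi_j \hat\theta_{s_j} \sum_{k \neq i} p_{jk}(\hat\theta_{p_k})}{A_j} \tag{3.25}$$

$$D_{2,j} = \frac{\xi_j(N-2) + 1}{A_j} > 0, \quad \text{if } 0 \le \xi_j \le 1 \tag{3.26}$$

with $A_j = (1 - \xi_j)(\xi_j(N-1) + 1) \ge 0$ and $C_{ji} = k^{(s_j)}_i(\xi_j(N-2) + 1) - \xi_j \sum_{k \neq i} k^{(s_j)}_k$.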
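The closed form (3.24)-(3.26) can be checked numerically against the stationarity condition of (3.22). The sketch below is only an illustration: the function names and input values are hypothetical, it assumes $0 \le \xi_j < 1$ so that $A_j > 0$, and it reads the cross term in (3.22) as counting each pair of bands once, which is the reading consistent with (3.24)-(3.26).

```python
import numpy as np

def closed_form_demand(k, p, theta_s, xi):
    """Best demand b*_ji of one SS from (3.24)-(3.26), for all PS i at once.

    k       -- spectral efficiencies k_i^(sj), shape (N,)
    p       -- observed prices p_ji, shape (N,)
    theta_s -- SS type theta_sj
    xi      -- substitutability, assumed here to satisfy 0 <= xi < 1
    """
    k, p = np.asarray(k, float), np.asarray(p, float)
    N = k.size
    A = (1 - xi) * (xi * (N - 1) + 1)
    C = k * (xi * (N - 2) + 1) - xi * (k.sum() - k)   # C_ji for every i
    D1 = C / A + xi * theta_s * (p.sum() - p) / A     # (3.25)
    D2 = (xi * (N - 2) + 1) / A                       # (3.26)
    return D1 - theta_s * p * D2                      # (3.24)

def demand_from_first_order(k, p, theta_s, xi):
    """Solve b_i + xi * sum_{k != i} b_k = k_i - theta_s * p_i for all i."""
    k, p = np.asarray(k, float), np.asarray(p, float)
    N = k.size
    M = (1 - xi) * np.eye(N) + xi * np.ones((N, N))
    return np.linalg.solve(M, k - theta_s * p)

k = [3.0, 2.5, 2.0]   # hypothetical spectral efficiencies
p = [0.5, 0.4, 0.3]   # hypothetical prices
b_closed = closed_form_demand(k, p, theta_s=2.0, xi=0.4)
b_solved = demand_from_first_order(k, p, theta_s=2.0, xi=0.4)
```

Both routes should agree up to rounding; the closed form makes the affine dependence of the demand on the prices explicit, which is what the PS optimization relies on.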

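We observe that $D_{ji}(p^s_j(\hat\theta_p), \hat\theta_{s_j})$ is an affine function of $p^s_j$. It increases as $p_{jk}(\hat\theta_{p_k})$ increases for all $p_k \in I_p$, $p_k \neq p_i$, and decreases as $p_{ji}(\hat\theta_{p_i})$ increases. The minimum of $D_{ji}(p^s_j(\hat\theta_p), \hat\theta_{s_j})$ occurs when $p_{ji}(\hat\theta_{p_i})$ is highest and $p^s_{j_{-i}}(\hat\theta_{p_{-i}})$ is lowest. However, the dependency on $\hat\theta_{s_j}$ is not clear, since it also depends on $p^s_{j_{-i}}(\hat\theta_{p_{-i}})$ and $p_{ji}(\hat\theta_{p_i})$. Similar reasoning can be applied to the maximum of $D_{ji}(p^s_j(\hat\theta_p), \hat\theta_{s_j})$. Accordingly, the minimal sets for the bandwidth constraints (3.8) and (3.9) are

$$\Theta_{m,i,j}(h^t) = \{(\theta^m_{s_j}, \theta^m_{p_{-i}}),\ (\theta^M_{s_j}, \theta^m_{p_{-i}})\} \tag{3.27}$$

$$\Theta_{M,i}(h^t) = \{(\theta^c_s, \theta^M_{p_{-i}}) \mid (\theta^c_s)_j = \theta^m_{s_j} \text{ or } \theta^M_{s_j},\ \forall s_j \in I_s\} \tag{3.28}$$

where $\theta^m_{s_j}$ is the minimum of $\theta_{s_j}$ with $\mu(\theta^m_{s_j} \mid h^t) > 0$, $\theta^M_{s_j}$ is the maximum of $\theta_{s_j}$ with $\mu(\theta^M_{s_j} \mid h^t) > 0$, $\theta^m_{p_{-i}}$ is the elementwise minimum of $\theta_{p_{-i}}$ with $\mu(\theta^m_{p_{-i}} \mid h^t) > 0$, and $\theta^M_{p_{-i}}$ is the elementwise maximum of $\theta_{p_{-i}}$ with $\mu(\theta^M_{p_{-i}} \mid h^t) > 0$.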

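Then, we examine the objective function of the PS's. $P_{p_i}(p^p_i, b^{p*}_i(\theta_s, \theta_{p_i}, \theta_{p_{-i}}) \mid \theta)$ is not a concave function of $(p^p_i, b^p_i)$ in this explicit case, but with $b^{p*}_i(\theta_s, \theta_{p_i}, \theta_{p_{-i}})$ replaced by $D^p_i(p(\theta_p), \theta_s)$, the new function $P_{p_i}(p^p_i, D^p_i(p(\theta_p), \theta_s) \mid \theta_{p_i})$ is concave in $p^p_i$, where $D^p_i(p(\theta_p), \theta_s) = (D_{1i}(p^s_1(\theta_p), \theta_{s_1}), \cdots, D_{ji}(p^s_j(\theta_p), \theta_{s_j}), \cdots, D_{Mi}(p^s_M(\theta_p), \theta_{s_M}))^T$. With $b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}})$ in (3.12) and (3.13) also replaced by $D_{ji}(p^s_j(\theta_p), \theta_{s_j})$, we can drop equation (3.14), and equations (3.11)-(3.13) become

$$p^{p*}_i(\theta_{p_i}) = \arg\max_{p^p_i} E_{\theta_s,\theta_{p_{-i}}}\big[P_{p_i}(p^p_i, D^p_i(p(\theta_p), \theta_s) \mid \theta_{p_i})\big], \tag{3.29}$$

subject to

$$0 \le D_{ji}(p^s_j(\theta_p), \theta_{s_j}), \quad \forall s_j \in I_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i,j}(h^t), \tag{3.30}$$

$$W_i \ge \sum_{j=1}^{M} D_{ji}(p^s_j(\theta_p), \theta_{s_j}), \quad \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t), \tag{3.31}$$

$$\forall\theta_{p_i} \in \Pi_{p_i}(h^t),\ \forall p_i \in I_p.$$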

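The optimization of the PS's in the explicit system is a convex optimization problem, and the joint KKT condition now becomes

$$\begin{cases}
-D_{ji}(p^s_j(\theta_p), \theta_{s_j}) \le 0, & \forall s_j \in I_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i,j}(h^t) \\
\sum_{j=1}^{M} D_{ji}(p^s_j(\theta_p), \theta_{s_j}) - W_i \le 0, & \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t) \\
\lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}} \ge 0, & \forall s_j \in I_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i,j}(h^t) \\
\nu_{i,\theta_s,\theta_{p_{-i}}} \ge 0, & \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t) \\
\lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}}\, D_{ji}(p^s_j(\theta_p), \theta_{s_j}) = 0, & \forall s_j \in I_s,\ \forall(\theta_{s_j}, \theta_{p_{-i}}) \in \Theta_{m,i,j}(h^t) \\
\nu_{i,\theta_s,\theta_{p_{-i}}}\Big(\sum_{j=1}^{M} D_{ji}(p^s_j(\theta_p), \theta_{s_j}) - W_i\Big) = 0, & \forall(\theta_s, \theta_{p_{-i}}) \in \Theta_{M,i}(h^t) \\
\nabla_{p^p_i} L_i(\theta_{p_i}) = 0 &
\end{cases}
\quad \forall\theta_{p_i} \in \Pi_{p_i}(h^t),\ \forall p_i \in I_p, \tag{3.32}$$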

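where $L_i(\theta_{p_i})$ is the Lagrangian function for the maximization of the $i$th PS of type $\theta_{p_i}$, and the $n$-th element of $\nabla_{p^p_i} L_i(\theta_{p_i})$ is

\begin{align*}
\big[\nabla_{p^p_i} L_i(\theta_{p_i})\big]_n = \frac{\partial L_i(\theta_{p_i})}{\partial p_{ni}}
={}& \frac{\partial E\big[P_{p_i}(p^p_i, D^p_i(p^s(\theta_p), \theta_s) \mid \theta_{p_i})\big]}{\partial p_{ni}}
 - \lambda_{i,n,\theta^m_{s_n},\theta^m_{p_{-i}}}\, \theta^m_{s_n} D_{2,n}
 - \lambda_{i,n,\theta^M_{s_n},\theta^m_{p_{-i}}}\, \theta^M_{s_n} D_{2,n} \\
&+ \sum_{(\theta_s,\theta_{p_{-i}}) \in \Theta_{M,i}(h^t)} \theta_{s_n} D_{2,n}\, \nu_{i,\theta_s,\theta_{p_{-i}}}.
\end{align*}

Since $\nabla_{p^p_i} E_{\theta_s,\theta_{p_{-i}}}[P_{p_i}] = E_{\theta_s,\theta_{p_{-i}}}[\nabla_{p^p_i} P_{p_i}]$, we compute

\begin{align*}
\big[\nabla_{p^p_i} P_{p_i}(p^p_i, D^p_i(p^s(\theta_p), \theta_s) \mid \theta_{p_i})\big]_n
={}& \frac{\partial P_{p_i}(p^p_i, D^p_i(p^s(\theta_p), \theta_s) \mid \theta_{p_i})}{\partial p_{ni}} \\
={}& E_{n,i}(\theta_{p_i}, \theta_{s_n}) - G_{n,i}(\theta_{p_i}, \theta_{s_n})\, p_{ni}(\theta_{p_i})
 - H_{n,i}(\theta_{p_i}, \theta_{s_n}) \sum_{j \neq n} \theta_{s_j}\, p_{ji}(\theta_{p_i})\, D_{2,j} \\
&+ F_{n,i}(\theta_{p_i}, \theta_{s_n}) \sum_{k \neq i} p_{nk}(\theta_{p_k})
 + I_{n,i}(\theta_{p_i}, \theta_{s_n}) \sum_{j \neq n} \frac{\xi_j \theta_{s_j}}{A_j} \sum_{k \neq i} p_{jk}(\theta_{p_k}), \quad \forall s_n \in I_s,
\end{align*}

with the coefficients

\begin{align*}
E_{n,i}(\theta_{p_i}, \theta_{s_n}) &= \frac{C_{ni}}{A_n} + 2 c_2 k^{(p)}_i \theta_{s_n} D_{2,n}\left(B^{req}_i - k^{(p)}_i \frac{W_i - \sum_{j=1}^{M} \frac{C_{ji}}{A_j}}{\theta_{p_i}}\right), \\
G_{n,i}(\theta_{p_i}, \theta_{s_n}) &= 2\theta_{s_n} D_{2,n} + \frac{2 c_2 \big(k^{(p)}_i \theta_{s_n} D_{2,n}\big)^2}{\theta_{p_i}}, \\
H_{n,i}(\theta_{p_i}, \theta_{s_n}) &= \frac{2 c_2 \big(k^{(p)}_i\big)^2 \theta_{s_n} D_{2,n}}{\theta_{p_i}}, \\
F_{n,i}(\theta_{p_i}, \theta_{s_n}) &= \frac{\xi_n \theta_{s_n}}{A_n} + \frac{2 c_2 \big(k^{(p)}_i \theta_{s_n}\big)^2 D_{2,n}\, \xi_n}{\theta_{p_i} A_n}, \\
I_{n,i}(\theta_{p_i}, \theta_{s_n}) &= \frac{2 c_2 \big(k^{(p)}_i\big)^2 \theta_{s_n} D_{2,n}}{\theta_{p_i}}.
\end{align*}

Therefore,

$$E_{\theta_s,\theta_{p_{-i}}}\left[\frac{\partial P_{p_i}(p^p_i, D^p_i(p^s(\theta_p), \theta_s) \mid \theta_{p_i})}{\partial p_{ni}}\right]
= E_{n,i} - G_{n,i}\, p_{ni}(\theta_{p_i}) - H_{n,i} \sum_{j \neq n} \theta_{s_j} \cdot p_{ji}(\theta_{p_i})\, D_{2,j} + F_{n,i} \sum_{k \neq i} p_{nk}(\theta_{p_k}) + I_{n,i} \sum_{j \neq n} \frac{\xi_j \theta_{s_j}}{A_j} \sum_{k \neq i} p_{jk}(\theta_{p_k}).$$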

3.4.3 Algorithm for Solving Joint KKT Condition

The joint KKT conditions can be solved by an active-set method [19], which is summarized in Algorithm 1.

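Algorithm 1 Active-set method for solving joint KKT condition

0: Define: $\mathcal{S} \triangleq \bigsqcup_{i,j} \Theta_{m,i,j}(h^t) \cup \Theta_{M,i}(h^t)$, and $\mathcal{W}$ is the working set.

1: Initialize: Set $\mathcal{W} = \emptyset$.

2: Repeat: Solve the joint KKT conditions with $\lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}} = 0$ and $\nu_{i,\theta_s,\theta_{p_{-i}}} = 0$ for those constraints $\notin \mathcal{W}$.

3: Condition 1: Check whether equation (3.30) is satisfied for $\theta^M_{p_i}$, $\forall p_i \in I_p$.

4: Condition 2: Check whether equation (3.31) is satisfied for $\theta^m_{p_i}$, $\forall p_i \in I_p$.

5: Condition 3: Check whether $\lambda_{i,j,\theta_{s_j},\theta_{p_{-i}}} \ge 0$ and $\nu_{i,\theta_s,\theta_{p_{-i}}} \ge 0$ for those constraints $\in \mathcal{W}$.

6: If conditions 1, 2, and 3 are all satisfied, then we obtain the optimal $p^{p*}_i(\theta_{p_i})$ for all $\theta_{p_i} \in \Pi_{p_i}(h^t)$ and for all $p_i$. We finish.

7: Else choose another $\mathcal{W} \subset \mathcal{S}$.

8: End repeat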
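The working-set loop of Algorithm 1 can be illustrated on a generic small problem. The sketch below is not the joint KKT system of this chapter: it is a toy convex quadratic program (written as a minimization, whereas the PS's maximize a concave profit) with hypothetical data, and it uses the simple "linear choosing" of working sets. The function name `active_set_enumeration` is likewise an assumption for illustration.

```python
from itertools import combinations
import numpy as np

def active_set_enumeration(Q, c, G, h, tol=1e-9):
    """Minimize 0.5*x'Qx + c'x subject to Gx <= h by trying working sets.

    For each working set W, the constraints in W are treated as equalities and
    the multipliers of all other constraints are set to zero. The resulting
    linear KKT system is solved and then checked for primal feasibility and
    nonnegative multipliers.
    """
    n, m = Q.shape[0], G.shape[0]
    for size in range(m + 1):                      # "linear choosing" of W
        for W in combinations(range(m), size):
            idx = np.array(W, dtype=int)
            Gw, hw = G[idx], h[idx]
            if size and np.linalg.matrix_rank(Gw) < size:
                continue                           # linearly dependent constraints
            kkt = np.zeros((n + size, n + size))
            kkt[:n, :n] = Q
            kkt[:n, n:] = Gw.T
            kkt[n:, :n] = Gw
            try:
                sol = np.linalg.solve(kkt, np.concatenate([-c, hw]))
            except np.linalg.LinAlgError:
                continue                           # singular system, try next W
            x, lam = sol[:n], sol[n:]
            if np.all(G @ x <= h + tol) and np.all(lam >= -tol):
                return x, W, lam
    return None

# Toy problem: minimize (x0-2)^2 + (x1-2)^2 s.t. x0 + x1 <= 2, x0 >= 0, x1 >= 0.
Q = 2.0 * np.eye(2)
c = np.array([-4.0, -4.0])
G = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
h = np.array([2.0, 0.0, 0.0])
x_opt, working_set, multipliers = active_set_enumeration(Q, c, G, h)
```

In the toy problem, the empty working set gives the unconstrained optimum, which violates the sum constraint, so the search moves on until a working set passes all three checks, mirroring steps 2 to 7 above.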

The complexity of this algorithm depends on two factors: how the next working set is chosen, and how the linear equations are solved. If the simplest working-set selection, i.e., linear choosing, is implemented, then the worst-case number of searches is $2^{2MN + N2^M}$. This is because there are $2MN + N2^M$ constraints in total, so $2^{2MN + N2^M}$ combinations of working sets are possible. The number of linear equations for a given working set $\mathcal{W}$ is $(N \cdot M \cdot |\Theta_p| + |\mathcal{W}|)$, where $|\mathcal{W}|$ is the number of active constraints, which ranges from 0 to $2MN + N2^M$.
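To get a feeling for how quickly these counts grow, the following small sketch evaluates them for given $N$ and $M$. The function names are hypothetical; the formulas are the ones stated in the paragraph above.

```python
def constraint_counts(N, M):
    """Return (number of bandwidth constraints, number of working sets)."""
    total = 2 * M * N + N * 2 ** M   # 2 per (i, j) pair plus 2^M per PS
    return total, 2 ** total

def num_linear_equations(N, M, num_ps_types, num_active):
    """Size of the linear system for a working set with num_active constraints."""
    return N * M * num_ps_types + num_active

total, working_sets = constraint_counts(N=2, M=1)
equations = num_linear_equations(N=2, M=1, num_ps_types=3, num_active=0)
```

Even for two PS's and one SS the exhaustive search already covers a few hundred working sets, which motivates the quantization described next.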

To make this algorithm more practical, we can reduce the complexity by quantizing $\Theta_p$. For instance, if $\Theta_p \equiv \{1, 2, \cdots, 10\}$, we can quantize it into two subsets, the upper set and the lower set, and let 8 be the representative element of the upper set and 3 the representative element of the lower set. All elements greater than 5 are treated as 8, and all elements less than or equal to 5 are treated as 3. The algorithm is then performed with the quantized type space $\Theta^q_p \equiv \{3, 8\}$. After the current period game is finished and the opponents' types are classified into either the upper set or the lower set, that set can be further quantized for the next period game. In this way, the type space has size 2 in every computation, so the complexity is greatly reduced.
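The quantization step can be sketched as follows. The helper names are hypothetical, and choosing the middle element of each half as its representative is an assumption that happens to reproduce the example above (3 and 8); the text does not prescribe a particular rule.

```python
def quantize_type_space(types):
    """Split a type space (at least two types) into lower and upper halves.

    Returns a dict mapping each representative to the subset it stands for.
    The representative is the middle element of the subset (an assumption).
    """
    types = sorted(types)
    half = len(types) // 2
    lower, upper = types[:half], types[half:]

    def middle(subset):
        return subset[len(subset) // 2]

    return {middle(lower): lower, middle(upper): upper}

def representative(theta, quantized):
    """Map a type to the representative of the subset that contains it."""
    for rep, subset in quantized.items():
        if theta in subset:
            return rep
    raise ValueError("type not in the quantized space")

quantized = quantize_type_space(range(1, 11))
# Refinement for the next period once the opponent is known to be "upper":
refined = quantize_type_space(quantized[representative(7, quantized)])
```

Each period thus works with only two representative types, and the subset containing the opponent's type shrinks from one period to the next.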

3.5 Convergence of Beliefs and Actions

In this section, we discuss the convergence of beliefs and actions. We will conclude that (1) the belief update always moves toward the correct belief but may not converge to it; and (2) although the beliefs may not converge, the actions converge to those of the actual type.

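Proposition 1 The belief of player $i$ about the actual type of player $j \neq i$ at stage $t$ is greater than or equal to the belief at stage $t'$ if $t > t'$:

$$\mu_i(\hat\theta_j \mid h^t) \ge \mu_i(\hat\theta_j \mid h^{t'}) \tag{3.33}$$

Proof 1

\begin{align}
\mu_i(\hat\theta_j \mid h^{t'+1}) &= \frac{\mu_i(\hat\theta_j \mid h^{t'})\, \delta\big(a^{t*}_j(\hat\theta_j) - a^{t*}_j(\hat\theta_j)\big)}{\sum_{\theta'_j :\, a^{t*}_j(\theta'_j) = a^{t*}_j(\hat\theta_j)} \mu_i(\theta'_j \mid h^{t'})} \tag{3.34} \\
&= \frac{\mu_i(\hat\theta_j \mid h^{t'})}{\sum_{\theta'_j :\, a^{t*}_j(\theta'_j) = a^{t*}_j(\hat\theta_j)} \mu_i(\theta'_j \mid h^{t'})} \ge \mu_i(\hat\theta_j \mid h^{t'}) \tag{3.35}
\end{align}

The inequality holds because the denominator is a sum of probabilities and is therefore at most 1.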
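The update in (3.34) can be sketched in a few lines of Python for a single opponent with pure equilibrium strategies. The function name and the numbers are hypothetical; keeping the prior after a zero-probability action is just one possible convention, since the PBE conditions leave that case unrestricted.

```python
def update_belief(belief, strategy, observed_action):
    """Bayes update of the belief about one opponent.

    belief   -- dict: type -> prior probability mu(theta | h^t)
    strategy -- dict: type -> equilibrium action a*(theta) (pure strategy)
    Types whose equilibrium action differs from the observed one receive
    probability zero; the remaining types are renormalized.
    """
    consistent = {t: p for t, p in belief.items()
                  if strategy[t] == observed_action}
    total = sum(consistent.values())
    if total == 0:
        return dict(belief)   # zero-probability action: keep the prior (assumed convention)
    return {t: consistent.get(t, 0.0) / total for t in belief}

prior = {10: 1 / 3, 11: 1 / 3, 12: 1 / 3}
strategy = {10: 1.0, 11: 1.0, 12: 2.0}   # hypothetical: types 10 and 11 act alike
posterior = update_belief(prior, strategy, observed_action=strategy[10])
```

In this example the actual type is 10. Its belief rises after the observation but cannot reach 1, because type 11 takes the same action; this is the situation discussed in the remainder of this section.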

According to the above proposition, a belief update is never misleading. However, it does not tell us whether the updates converge to the actual type; the improvement may stop before reaching it. Fortunately, even though the beliefs may not converge to the actual type profile, the action profile taken by all players converges, and it converges to the same action profile as the one taken in the complete-information game. The reasoning is as follows.

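Given that $p_{j_{-i}}(\theta_{p_{-i}})$ is obtained by the joint KKT method, PS $i$ knows that the optimal demand from SS $j$ of type $\theta_{s_j}$ is $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$ by solving (3.10). The optimal price $p^{*unc}_{ji}(\hat\theta_{p_i})$ of (3.7) without the constraints (3.8) and (3.9) may result in a feasible or an infeasible demand $b^{*unc}_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$. However, the demand must be feasible. If $p^{*unc}_{ji}(\hat\theta_{p_i})$ makes the $i$-th demand negative, then by the joint KKT condition, $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$ is fixed to 0, which in turn generates a new optimal $i$-th price $p^*_{ji}(\hat\theta_{p_i})$ through the following equations:

$$p^*_{ji}(\hat\theta_{p_i}) = \frac{\partial E_{\theta_p}\big[u_{s_j}(b^s_j \mid \theta)\big]}{\partial b_{ji}}\bigg|_{b^s_j = b^{s*}_j(\theta)}, \tag{3.36}$$

$$p_{jk}(\theta_{p_k}) = \frac{\partial E_{\theta_p}\big[u_{s_j}(b^s_j \mid \theta)\big]}{\partial b_{jk}}\bigg|_{b^s_j = b^{s*}_j(\theta)}, \quad \forall k \neq i, \tag{3.37}$$

where the $i$-th term of $b^{s*}_j(\theta)$ is $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}) = 0$. Note that since $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}) = 0$, solving (3.37) yields $b^*_{j_{-i}}$ for the given $p_{j_{-i}}(\theta_{p_{-i}})$, which means that $b^*_{j_{-i}}$ does not depend on $p^*_{ji}(\hat\theta_{p_i})$ if $p^{*unc}_{ji}(\hat\theta_{p_i})$ gives a negative demand. Then, since $b^*_{ji}(\theta_s, \hat\theta_{p_i}, \theta_{p_{-i}}) = 0$, $b^*_{j_{-i}}$ is determined solely by $p_{j_{-i}}(\theta_{p_{-i}})$, and $p^*_{ji}(\hat\theta_{p_i})$ is determined completely by $b^*_{j_{-i}}$.

The relation between the newly generated optimal price $p^*_{ji}(\hat\theta_{p_i})$ and the type $\hat\theta_{p_i}$ depends on $p^{*unc}_{ji}(\hat\theta_{p_i})$. If $p^{*unc}_{ji}(\hat\theta_{p_i})$ results in a feasible $b^{*unc}_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$, then $p^*_{ji}(\hat\theta_{p_i}) = p^{*unc}_{ji}(\hat\theta_{p_i})$, which depends on $\hat\theta_{p_i}$. If $p^{*unc}_{ji}(\hat\theta_{p_i})$ results in a negative $b^{*unc}_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$, then $p^*_{ji}(\hat\theta_{p_i})$ is determined completely by $p_{j_{-i}}(\theta_{p_{-i}})$, which is independent of $\theta_{p_i}$. To proceed with the discussion, we define $\Theta_{j,i,neg} \equiv \{\theta_{p_i} \text{ except } \hat\theta_{p_i} \mid p^{*unc}_{ji}(\theta_{p_i}) \text{ results in negative demand}\}$. For those $\theta_{p_i} \in \Theta_{j,i,neg}$, the $i$-th demand is $b^*_{ji}(\theta_{s_j}, \theta_{p_i}, \theta_{p_{-i}}) = 0$ by the joint KKT condition, and $p^*_{ji}(\theta_{p_i})$ is also constrained as in (3.36). Following the same reasoning, $p_{j_{-i}}(\theta_{p_{-i}})$ determines $p^*_{ji}(\theta_{p_i})$ completely, and the constrained price $p^*_{ji}(\theta_{p_i})$ is independent of $\theta_{p_i}$. Therefore, if $p^{*unc}_{ji}(\hat\theta_{p_i})$ results in a negative demand, then $p^*_{ji}(\hat\theta_{p_i})$ is the same as $p^*_{ji}(\theta_{p_i})$ for $\theta_{p_i} \in \Theta_{j,i,neg}$ given the same $p_{j_{-i}}(\theta_{p_{-i}})$ (hence for the same $\theta_{-i}$). Clearly, if $\Theta_{j,i,neg}$ is nonempty, then PS $i$'s opponents cannot tell what the actual type of PS $i$ is, since the best strategies for those types are the same; but we should note that the best strategy still corresponds to the actual type.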

Similar reasoning applies to the case where the demand exceeds $W_i$: by the joint KKT condition, $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}})$ is fixed to $W_i$, which in turn generates a new optimal $i$-th price $p^*_{ji}(\hat\theta_{p_i})$ through (3.36), where the $i$-th term of $b^{s*}_j(\theta)$ is $b^*_{ji}(\theta_{s_j}, \hat\theta_{p_i}, \theta_{p_{-i}}) = W_i$.

To sum up, although the belief may not converge to the actual type, the actions always converge to the actual value.


Chapter 4

Simulations

4.1 Simulation Setup

The explicit model developed in Section 3.4 is adopted for simulation. In the first section, we show the effectiveness of the proposed joint KKT method for several cases and compare it with other existing work. In the second section, we examine the players' actions and the beliefs about the players' types as time evolves and numerically analyze the results.

The type space of the PS's is set to $\Theta_P = \{10, 11, 12\}$, and the type space of the SS's is set to $\Theta_S = \{1, 2, 3\}$. The initial beliefs are assumed to be uniformly distributed over the type space, i.e., $\mu(\theta_{p_i} \mid h^0) = \frac{1}{3}$ for all $p_i$ and $\mu(\theta_{s_j} \mid h^0) = \frac{1}{3}$ for all $s_j$. The constants in the PS's utility are chosen as $c_1 = 2$ and $c_2 = 2$, and the spectrum substitutability $\xi_j$ is $0.4$ for all $s_j$. Note that some parameters may change depending on the simulation scenario, and the remaining parameters will be specified in each simulation scenario.

4.2 Numerical Results

4.2.1 Effectiveness of The Joint KKT Conditions

In this section, we simulate the multistage game with complete information, i.e., $\mu_x(\hat\theta_y) = 1$ for all $x, y$, and compare the results of the proposed joint KKT conditions with those of existing work to show the effectiveness of the joint KKT conditions.

2 PS vs. 1 SS

First, we simulate the game with 2 PS's and 1 SS with complete information, and compare the results of the proposed joint KKT conditions with those in [8], which correspond to the unconstrained (unc) game. Since there is only one SS, $b_i$ denotes $b_{1i}$ and $p_i$ denotes $p_{1i}$ for simplicity. The constraint for $b_i$ in (3.8) is denoted here as $f_{i,1}$, and that in (3.9) as $f_{i,2}$. We show the best responses (BR), the Nash equilibrium (NE), and the corresponding feasible regions in both Fig. 4.1 and Fig. 4.3. The intersection of the best responses is the NE, which is the result of sequential rationality when the information is complete.

In Fig. 4.1, with parameters $W_1 = 15$ MHz and $W_2 = 15$ MHz, $B^{req}_1 = 2$ Mbps and $B^{req}_2 = 2$ Mbps, $\hat\theta_{p_1} = 10$ and $\hat\theta_{p_2} = 10$, $\hat\theta_{s_1} = 1$, and received SNRs $\gamma^p_1 = 15$ dB, $\gamma^p_2 = 15$ dB, $\gamma^s_{11} = 22$ dB, and $\gamma^s_{12} = 22$ dB, the unc solution satisfies the bandwidth constraints, so it agrees with the solution of the proposed joint KKT conditions. Fig. 4.2(a) shows the profit function of PS 1 given that PS 2 plays the equilibrium strategy obtained by solving the joint KKT conditions and the SS takes its best demand. Fig. 4.2(b) shows the profit function of PS 2 under similar conditions. In this case, we observe that the feasible region on each PS's profit function covers the unconstrained best response point. Fig. 4.2(c) shows the contour plot of the profit of the SS given that PS 1 and PS 2 play the equilibrium strategies obtained by solving the joint KKT conditions, and it shows that the SS's highest profit lies in the strictly feasible region.

In Fig. 4.3, with parameters $W_1 = 5$ MHz, $W_2 = 5$ MHz, $B^{req}_1 = 2$ Mbps and $B^{req}_2 = 2$ Mbps, $\hat\theta_{p_1} = 10$ and $\hat\theta_{p_2} = 10$, $\hat\theta_{s_1} = 1$, and received SNRs $\gamma^p_1 = 15$ dB, $\gamma^p_2 = 15$ dB, $\gamma^s_{11} = 22$ dB, and $\gamma^s_{12} = 10$ dB, the unc solution lies outside the bandwidth constraints, while the optimal strategies $b^*_1 = 0$ and $b^*_2 = 0$ of the joint KKT conditions satisfy the constraints. Fig. 4.4(a) shows the profit function of PS 1 given that PS 2 plays the equilibrium strategy obtained by solving the joint KKT conditions and the SS takes its best demand. Fig. 4.4(b) shows the profit function of PS 2 under similar conditions. Fig. 4.4(c) shows the contour plot of the profit of the SS given that PS 1 and PS 2 play the equilibrium strategies obtained

