A POMDP-based Spectrum Handoff Protocol for Partially Observable Cognitive Radio Networks


National Chiao Tung University

Department of Communications Engineering

Master's Thesis

A POMDP-based Spectrum Handoff Protocol for Partially Observable Cognitive Radio Networks

Student: Rui-Ting Ma

Advisor: Prof. Kai-Ten Feng


A Thesis Submitted to the Department of Communications Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Communications Engineering

June 2009

Hsinchu, Taiwan, Republic of China

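Student: Rui-Ting Ma

Advisor: Prof. Kai-Ten Feng

Master's Program, Department of Communications Engineering, National Chiao Tung University

Chinese Abstract

Recent studies have shown that static spectrum allocation is the main cause of inefficient spectrum usage. To improve spectrum utilization, cognitive radio (CR), which can dynamically sense and use licensed bands, has emerged. How to provide efficient spectrum handoff is an important issue in CR. Existing spectrum handoff methods assume that the CR user can correctly sense every band in order to find a suitable band for handoff. However, this assumption is unrealistic in practice, because the time the CR user spends sensing the bands would be too long and would affect the quality of the primary user. In this thesis, with the help of the partially observable Markov decision process (POMDP), the entire network environment can be estimated by sensing only part of the bands. This thesis proposes a POMDP-based spectrum handoff scheme (POSH), whose goal is to find the most suitable band for handoff from partial channel states. In addition, to accommodate a multi-user environment, this thesis proposes a POMDP-based multi-user spectrum handoff scheme (M-POSH), built on the idea of fully distributing the bandwidth resources among the users. With the band selected by the POMDP-based spectrum handoff scheme, the time that the CR user must wait at each handoff is minimized. Numerical results show that the method efficiently allows CR users to achieve the least waiting time at each spectrum handoff.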

Student: Rui-Ting Ma

Advisor: Dr. Kai-Ten Feng

Department of Communications Engineering

National Chiao Tung University

ABSTRACT

Recent studies indicate that licensed bands are used inefficiently because of static spectrum allocation. To improve spectrum utilization, cognitive radio (CR) has been suggested as a way to dynamically exploit opportunistically available primary frequency spectrum. Providing efficient spectrum handoff is considered a crucial issue in CR networks. Existing spectrum handoff algorithms assume that all the channels within the network can be correctly sensed by the CR users in order to perform an appropriate spectrum handoff. However, this assumption is impractical in realistic circumstances, primarily because of the excessive time required for the CR user to sense the entire spectrum space. In this paper, the partially observable Markov decision process (POMDP) is exploited to estimate the network information by sensing only part of the frequency spectrum. A POMDP-based spectrum handoff (POSH) scheme is proposed to determine the optimal target channel for spectrum handoff according to the partially observable channel state information. Moreover, a POMDP-based multi-user spectrum handoff (M-POSH) protocol is proposed to adapt the POMDP policy to multi-user CR networks by distributing CR users to opportunistic

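Acknowledgments

At commencement, many people say "Congratulations on graduating!" Graduation is certainly something to be happy about: it represents the result of two years of effort and the close of one stage of student life. For me, however, what is worth celebrating is not only the graduation itself but, more importantly, these two unforgettable years. Two years ago I stood at a loss before the great academic institution that is NCTU, knowing nothing about research life and not even understanding much about the laboratory I was about to join. Looking back now, joining the MINT Lab family was without doubt the best choice I have made.

For the successful completion of this thesis I am most grateful for the guidance of my advisor, Prof. Kai-Ten Feng, who time after time patiently discussed and revised the thesis with me, a complete novice, so that I could gradually learn and finish it. The oral defense committee members, Prof. 蘇育德 and Prof. 黃經堯, offered many valuable comments and pointed out where the thesis was still incomplete, allowing me to think from different angles. Besides the professors, my senior 欲彬, the unsung hero behind this thesis, helped enormously: he got me started when I had no concept of research, helped me clarify problems, pointed me in the right direction when I was confused, and gave me more confidence that I could finish the thesis. I was fortunate to have their help, so that I did not have to spend time groping in an unfamiliar field and could quickly find the right path.

Besides 欲彬, my seniors 仲賢, 小華, Walker, and 文俊 are all very capable. Not only in research but in every respect they set many good examples for me. The spirit of research assistants 昭霖 and 美代, who still came back to take part in research after graduating, is also admirable. From them I learned many research methods and came to roughly understand what research is about. Thanks to their help, I did not become a "graduate student" in name only. My juniors 其懋, 萬邦, 俊宇, 承澤, and 惟能 brought a lot of joy to the lab, so that lab life was not only dry research, and they brought everyone in the lab closer together in different ways.

To my classmates of the same year, 佳偉, 俊傑, and 佳仕: during these days we took courses together, did research together, exercised together, discussed interesting if not very meaningful topics together, and spent these two happy years together. Although I never did understand 佳偉's research, we were in the same group for these two years, and his diligence in research had a positive influence on me. I worked with 俊傑 quite often during these two years. He is a reassuring partner in whom one can often find many admirable qualities, and I hope we can continue to work well together in the future. Although I did not collaborate much with 佳仕 on research, in other contexts I found him to have very distinctive qualities. Constrained by the title of this section, I can only wish him, in one short sentence, a smooth doctoral program, and that he lead MINT Lab well after the seniors leave one after another. With his help, future juniors will surely be clear-headed and eloquent. I am glad to have been your classmate, and thank you for your help over these two years. I have learned from you many things of lasting benefit.

Finally, I thank my family, who gave me unreserved support so that I could live away from home without worries. With their support I was able to experience all these wonderful things. To borrow a saying: our family is not wealthy, but they are willing to give me the best. Research results can be recorded in this thesis, but the benefit gained from these two years of experience can never be fully told. Thank you to the friends who have helped me along the way. It is you who made me who I am today. Although I cannot express here, one by one, the help that every friend has given me, I will carry this gratitude with me as I head toward the next stage of my journey.

Rui-Ting Ma, National Chiao Tung University, Hsinchu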

List of Figures

2.1 The schematic diagram of the POMDP framework.

3.1 Performance comparison: the number of waiting time slots versus the number of spectrum handoff.

4.1 Practical design for the time-slotted channels under different spectrum handoff schemes.

6.1 Performance validation: the number of waiting time slots versus traffic arrival rate.

6.2 Performance validation: the biased percentage β versus traffic arrival rate.

6.3 Performance comparison: the number of waiting time slots versus the number of spectrum handoff.

6.4 Performance comparison: the number of waiting time slots versus the total number of transmission time slots.

6.5 Performance comparison: the number of waiting time slots versus traffic arrival rate (numbers of spectrum handoff = 250).

6.6 Performance comparison: the number of waiting time slots versus traffic arrival rate (number of transmission time slots = 10000).

6.7 Performance comparison: the waiting time with practical consideration

6.8 Performance comparison: the net transmission time per slot versus different arrival rates.

6.9 Performance comparison: the expected number of waiting slots versus different number of channels (with 2 CR users).

6.10 Performance comparison: the expected number of waiting slots versus

Chapter 1 Introduction

According to research conducted by the FCC [1; 2], a large portion of the priced frequency spectrum remains idle at any given time and location. It has been indicated that the spectrum shortage problem results primarily from the spectrum management policy rather than from the physical scarcity of frequency spectrum. Consequently, a great amount of research has been devoted to the study of cognitive radio (CR) in recent years [3–6]. The CR user (i.e. the secondary user) is capable of sensing the channel condition and can adapt its internal parameters to access the licensed channels while these channels are not being utilized by the primary users. IEEE 802.22 [7; 8], considered a realistic implementation of the CR concept, is an emerging standard that allocates spectrum for TV broadcast services on a license-exempt basis. Since there is no guarantee that a CR user can finish its transmission on a certain spectrum, a mechanism called spectrum handoff has been introduced to allow the CR user to select another channel to maintain its data transmission. The main objective of spectrum handoff is therefore to select a feasible target channel to which the CR user can switch in order to retain its ongoing transmission. The performance of spectrum handoff is dominated primarily by the feasibility of the channel selection.


Target channel selection can be categorized into two types of schemes according to their sensing strategies [9], i.e. the pre-sensing and post-sensing methods. In a pre-sensing scheme, the secondary user senses the frequency spectrum and chooses a sequence of target channels before the beginning of its data transmission. Once the secondary user is interrupted by the primary user, it switches to the next channel in the sequence determined during the pre-sensing phase. In general, pre-sensing techniques can reduce the waiting time for spectrum handoff since the target channel is selected from a pre-determined channel list. However, since the stochastic characteristics of a channel can vary drastically in realistic situations, the pre-determined channel list can become infeasible for target channel selection. The channel reservation scheme proposed in [10] conducts pre-sensing by exploiting the balance between blocking probability and forced termination in order to reserve idle channels for spectrum handoff. Analytical models have been studied in [11; 12] to illustrate the beneficial aspects of the pre-sensing strategy. However, the reserved idle channels cannot be guaranteed to be available at the time of spectrum handoff. As a result, the performance of pre-sensing strategies cannot be guaranteed, especially in fast fading channel environments.

On the other hand, post-sensing techniques are carried out when the secondary user is forced to terminate its transmission by the primary user. The CR user starts to sense the spectrum in order to verify whether there are available channels that can be accessed and consequently become its target channel. Compared to the pre-sensing approaches, a more feasible and accurate channel can be selected by exploiting the post- [...] not permissible in spectrum handoff since the allowable time duration is considered limited for the CR user to vacate its current channel for the primary user. Furthermore, most of the existing pre-sensing and post-sensing strategies assume that all the channels within the network can be correctly sensed, which is impractical in realistic environments. In other words, the transition probabilities of all the channel states are not always available to the CR users within the network. An estimation algorithm has been proposed in [13] to estimate the transition probability by adopting the maximum likelihood function. However, the convergence of the estimation can become intolerably slow when the transition probability takes a small value.

In this paper, without the necessity of obtaining all the correct channel information, the partially observable Markov decision process (POMDP) [14–17] is utilized to reveal the network information by partially sensing the available frequency channels. A POMDP-based spectrum handoff (POSH) mechanism is proposed as a post-sensing strategy in order to acquire a policy such that the optimal channel can be obtained with minimal waiting time at each occurrence of spectrum handoff. The transition probabilities for the channel states are derived in this paper by considering the channel as an M/G/1 system with given packet arrival rate and service rate. In order to observe the behavior of spectrum handoff, analytical models for the proposed POSH method and other existing channel selection schemes are derived, validated, and compared via simulations. Furthermore, one may suspect that, even though the proposed POSH scheme can effectively reduce the number of waiting time slots, its complicated calculation process may introduce excessive overhead that actually degrades its performance. Therefore, the phases required for precise channel selection behavior are discussed and compared under a realistic environment. The computational complexity of the precise POSH scheme associated with the update of the belief state is also presented. It is demonstrated in the simulation results that reduced waiting time can be obtained from the proposed POSH mechanism under the consideration of precise behaviors.

Furthermore, in a more realistic environment multiple users can exist concurrently and access the licensed spectrum. Unlike the single-user scenario, a specific frequency spectrum can be simultaneously chosen by more than one CR user as the target channel. Therefore, it is crucial to provide a channel selection policy that considers information from other CR users, and also to alleviate the overhead when multiple CR users are contending to access the target channel. In this paper, a POMDP-based multi-user spectrum handoff (M-POSH) protocol is proposed to extend the original POSH scheme to multi-user CR networks. In addition to observing the traffic of the primary user, the M-POSH scheme is required to monitor the traffic from the CR users. A negotiation procedure is employed in order to facilitate the determination of a feasible CR user to operate within a specific target channel. In comparison with other existing algorithms, the proposed M-POSH scheme can effectively minimize the waiting time for spectrum handoff.

The rest of this paper is organized as follows. Chapter 2 briefly summarizes the concept of the POMDP approach. The proposed POSH scheme is modeled and derived in Chapter 3. Performance analysis of the POSH protocol is conducted in Chapter 4, while the precise schemes of channel selection are presented in Section 4.2. Considering the network scenario with multiple users, the M-POSH protocol is proposed in Chapter 5. Chapter 6 illustrates the performance evaluation for both the proposed POSH and M-POSH mechanisms, and the conclusions are drawn in Chapter 7.


Chapter 2 POMDP framework

A Markov decision process (MDP) refers to a discrete-time stochastic control process that conducts decision-making based on the present state information, i.e. $s_k \in S_k$. The subscript $k$ denotes the $k$th time slot under consideration, while $S_k$ represents the entire state space at the $k$th time slot. Considering the realistic case that not all the current states are obtainable, the partially observable Markov decision process (POMDP) [14] is utilized to determine the decision policy based on the partially available information and the observations from the environment. The schematic diagram of the POMDP framework is illustrated in Fig. 2.1. In general, optimization techniques are required in order to obtain the solution for a POMDP-based problem.

2.1 Observation

Since not all the states are directly observable within the POMDP setting, a set of observations $z_k \in Z_k$ is essential to provide an indication of which state the environment is in. The observations are probabilistic in nature: an observation function $o$ is defined as a probability distribution over all possible observations $z_k$ for each action $a_k$ and resulting state $s_{k+1}$, i.e.

$$o(s_{k+1}, a_k, z_k) = \Pr(z_k \mid a_k, s_{k+1}) \tag{2.1}$$

$\forall z_k \in Z_k$, $a_k \in A_k$, $s_{k+1} \in S_{k+1}$, where $A_k$ stands for the action set at the $k$th time slot. The parameter $a_k$ denotes the action chosen by the POMDP formulation and $s_{k+1}$ is the resulting state after executing action $a_k$. In the MDP case, a policy maps the current state to the corresponding action, since the present state of an MDP is fully observable. The POMDP, on the other hand, can only map from the latest observation to the corresponding action (as in Fig. 2.1), which is in general insufficient to represent the history of the process. Therefore, the belief state is utilized to reveal the statistical distribution of the current state information, as explained in the next section.

Figure 2.1: The schematic diagram of the POMDP framework. (Blocks and signals recovered from the figure: realistic environments, state estimator, belief state, POMDP policy, observation, action.)


2.2 Belief State

The concept of the belief state, i.e. the information state, is developed to reveal the state of the environment and to help the decision-maker behave effectively in a partially observable world. The belief state $b(S_k)$ is a statistical distribution over the state space $S_k$, while $b(s_k)$ corresponds to the probability of state $s_k$, with $\sum_{s_k \in S_k} b(s_k) = 1$. The belief state is a sufficient statistic for the past history, including all the actions and observations, and thus provides a basis for decision-making under environmental uncertainty. Furthermore, the essential property of the belief state is that it can be updated after each action in order to incorporate one additional step of information into the history. This helps capture the variations of a dynamic environment and consequently yields more accurate information about the environment. As shown in Fig. 2.1, the updated belief state is the outcome of the state estimator, whose inputs are the observation, the action, and the former belief state. Consequently, the resulting belief state $b(s_{k+1})$ w.r.t. the state $s_{k+1}$ can be obtained as

$$b(s_{k+1}) = \Pr(s_{k+1} \mid b(S_k), a_k, z_k) = \frac{o(s_{k+1}, a_k, z_k) \sum_{s_k \in S_k} \Gamma(s_k, a_k, s_{k+1})\, b(s_k)}{\Pr(z_k \mid b(S_k), a_k)} \tag{2.2}$$

where $b(s_k)$ indicates the former belief state of $s_k$. The parameter $\Gamma(s_k, a_k, s_{k+1})$ represents the state transition probability from $s_k$ to $s_{k+1}$ according to the action $a_k$, i.e. $\Gamma(s_k, a_k, s_{k+1}) = \Pr(s_{k+1} \mid a_k, s_k)$. The denominator of (2.2) can be considered a normalizing factor, which is obtained as

$$\Pr(z_k \mid b(S_k), a_k) = \sum_{s_k \in S_k} \sum_{s_{k+1} \in S_{k+1}} o(s_{k+1}, a_k, z_k) \cdot \Gamma(s_k, a_k, s_{k+1})\, b(s_k) \tag{2.3}$$
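The belief update in (2.2), with the normalizing factor of (2.3), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `belief_update`, the nested-list layout of `trans` and `obs`, and the two-state toy numbers are hypothetical and do not come from the thesis.

```python
def belief_update(belief, trans, obs, action, z):
    """One Bayes update of the belief vector, following (2.2) and (2.3).

    belief[s]          : b(s_k)
    trans[a][s][s2]    : Gamma(s, a, s2) (hypothetical layout)
    obs[a][s2][z]      : o(s2, a, z)     (hypothetical layout)
    Returns (new belief, Pr(z | b, a)).
    """
    n = len(belief)
    # Prediction step: sum over s of Gamma(s, a, s2) * b(s)
    predicted = [sum(trans[action][s][s2] * belief[s] for s in range(n))
                 for s2 in range(n)]
    # Numerator of (2.2) for every next state s2
    unnormalized = [obs[action][s2][z] * predicted[s2] for s2 in range(n)]
    # Normalizing factor, eq. (2.3)
    evidence = sum(unnormalized)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / evidence for u in unnormalized], evidence


# Toy numbers (assumed): one action, two states, two observations.
toy_trans = [[[0.9, 0.1], [0.3, 0.7]]]
toy_obs = [[[0.8, 0.2], [0.1, 0.9]]]
new_belief, p_z = belief_update([0.5, 0.5], toy_trans, toy_obs, action=0, z=0)
```

The returned vector always sums to one, which is the property that lets it be fed back in as the "former belief state" at the next step.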


2.3 Reward and Value Functions

In order to ensure that an optimal decision is made by the POMDP, it is necessary to provide a measure that evaluates the cost or reward of each state update. An immediate reward function $r(s_k, a_k, s_{k+1}, z_k)$ is defined to represent the reward of executing action $a_k$ to move from state $s_k$ to $s_{k+1}$ with the associated observation $z_k$. Since both the state transition and the observation function are probabilistic, the expected reward $R(s_k, a_k)$ can be obtained as

$$R(s_k, a_k) = \sum_{s_{k+1} \in S_{k+1}} \sum_{z_k \in Z_k} \Gamma(s_k, a_k, s_{k+1}) \cdot o(s_{k+1}, a_k, z_k) \cdot r(s_k, a_k, s_{k+1}, z_k) \tag{2.4}$$

The immediate reward function is referred to as the one-step value function since only the present reward is of concern. In that one-step case, the optimal policy can be determined directly from the reward function in (2.4). However, a certain period of time is in general considered for evaluating the value of the reward. Therefore, the decision policy of the POMDP is determined by optimizing the $L$-step value function with $L \geq 1$.
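As a small illustration of (2.4), the expected reward is a double sum over next states and observations. The sketch below assumes a hypothetical nested-list layout for the transition and observation functions and a made-up reward that pays 1 whenever the next state is state 0; none of these names or numbers are taken from the thesis.

```python
def expected_reward(s, a, trans, obs, reward):
    """R(s, a) as in (2.4).

    trans[a][s][s2] : Gamma(s, a, s2) (hypothetical layout)
    obs[a][s2][z]   : o(s2, a, z)     (hypothetical layout)
    reward          : callable r(s, a, s2, z)
    """
    total = 0.0
    for s2, p_trans in enumerate(trans[a][s]):
        for z, p_obs in enumerate(obs[a][s2]):
            total += p_trans * p_obs * reward(s, a, s2, z)
    return total


# Toy model (assumed): one action, two states, two observations.
toy_trans = [[[0.9, 0.1], [0.3, 0.7]]]
toy_obs = [[[0.8, 0.2], [0.1, 0.9]]]
toy_reward = lambda s, a, s2, z: 1.0 if s2 == 0 else 0.0

r_from_state_0 = expected_reward(0, 0, toy_trans, toy_obs, toy_reward)
r_from_state_1 = expected_reward(1, 0, toy_trans, toy_obs, toy_reward)
```

Because this toy reward ignores the observation, the inner sum over $z_k$ collapses to one and the result equals the probability of moving to state 0.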

Chapter 3 Proposed POMDP-based Spectrum Handoff (POSH) Scheme

In this chapter, the POMDP framework will be utilized to model the spectrum handoff problem in a slotted overlay CR network. The proposed POSH scheme will be exploited under the single CR user scenario. The value function will consequently be formulated in order to obtain the optimal policy for spectrum handoff.

3.1 System Model for POSH Scheme

In the considered CR network, there are $N$ channels that are available to be accessed by both the primary and the secondary users. From the secondary user's point of view, each channel is assumed to be either in the busy state, i.e. occupied by the primary user, or in the idle state, i.e. free to be accessed. Letting $c_{i,k}$ denote the state of the $i$th channel in time slot $k$ (i.e. $c_{i,k} = 0$ indicates the idle state and $c_{i,k} = 1$ the busy state), the state $s_k$ can be written as

$$s_k = [c_{1,k}, \ldots, c_{i,k}, \ldots, c_{N,k}], \quad c_{i,k} \in \{0, 1\}, \quad \forall s_k \in S_k \tag{3.1}$$

The most essential part of spectrum handoff is the target channel selection, which is defined as the action set within the POMDP framework. In other words, an action $a_k$ at time instant $k$ is to appropriately choose the target handoff channel from the $N$ channels within the CR network, i.e. $a_k \in \{1, \ldots, N\}$. After the execution of an action, the channel state can consequently be observed. The observation $z_k \in \{0, 1\}$ is defined as the sensing outcome, where 0 represents the idle state and 1 stands for the busy state.

Furthermore, the transition probability can be determined by modeling a channel as an M/G/1 system with arrival rate $\lambda$ and service rate $\mu$. Assuming Poisson traffic for the arriving packets, the probability distribution of the number of arriving packets can be represented as

$$\Pr(n_{\lambda,k} = x) = \frac{e^{-\lambda} \lambda^{x}}{x!} \tag{3.2}$$

where $n_{\lambda,k}$ denotes the number of packets arriving in the $k$th time slot. With the execution of action $a_k$, the channel transition probability $\tau(c_{i,k}, a_k, c_{i,k+1})$ represents the transition from the present channel state $c_{i,k}$ to the channel state $c_{i,k+1}$ at the next time slot. By adopting the result from (3.2), the transition probability from idle to idle state for a channel $c_{i,k}$ (for $i \in \{1, \ldots, N\}$) after executing action $a_k$ can be acquired as


It is noted that the second equality in (3.3) is attributed to the fact that the availability of a channel in the $k$th time slot is independent of the total number of packets arriving within the same slot. On the other hand, the transition probability for the channel $c_{i,k}$ from busy to idle state can be represented as

$$\tau(c_{i,k} = 1, a_k, c_{i,k+1} = 0) = \frac{\Pr(n_{\lambda,k} = 0,\ n_{\lambda,k-1} > 0,\ T_{s,k} \leq 1)}{\Pr(c_{i,k} = 1)} = \frac{\Pr(n_{\lambda,k} = 0) \cdot \Pr(n_{\lambda,k-1} > 0) \cdot \Pr(T_{s,k} \leq 1)}{\Pr(c_{i,k} = 1)} \tag{3.4}$$

where the second equality is likewise attributed to the independence of the three probabilities within the numerator of (3.4). The parameter $T_{s,k}$ is the total service time in time slot $k$, which includes both the time for serving packets coming into the $k$th time slot and the time for serving the remaining packets carried over from the previous $(k-1)$th slot. It is assumed that $\gamma_1, \gamma_2, \ldots, \gamma_\alpha$ are the random variables of a packet service time with mean value $1/\mu$, where $\alpha$ represents the number of packets that arrived in the previous $(k-1)$th slot, i.e. $n_{\lambda,k-1} = \alpha$. The time the server takes to serve these $\alpha$ packets within the $k$th slot is denoted as $T_{\alpha,k} = \sum_{j=1}^{\alpha} \gamma_j \Pr(n_{\lambda,k-1} = \alpha)$. Therefore, the third term in the numerator of (3.4) can be rewritten as

$$\Pr(T_{s,k} \leq 1) = \Pr(T_{s,k-1} \leq 1) \cdot \Pr(T_{\alpha,k} \leq 1) + \Pr(T_{s,k-1} > 1) \cdot \Pr(T_{s,k-1} - 1 + T_{\alpha,k} \leq 1) \tag{3.5}$$

which combines the following two cases: (a) the packets can be served in both the previous and the current time slots; and (b) the packets have not been entirely served in the previous slot but are able to be served in the current time slot. Furthermore, the denominator of (3.4), which represents the probability of a busy channel, can be expressed as

$$\Pr(c_{i,k} = 1) = \Pr(n_{\lambda,k-1} > 0) + \Pr(n_{\lambda,k-1} = 0) \cdot \Pr(T_{s,k-1} > 1) \tag{3.6}$$
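To make the quantities in (3.2) and (3.6) concrete, the following sketch evaluates the Poisson arrival probabilities and the busy probability of a channel. The function names and the parameter `p_backlog`, which stands in for $\Pr(T_{s,k-1} > 1)$, are hypothetical. The thesis derives that term from the service-time distribution, whereas here it is simply supplied by the caller.

```python
import math


def poisson_pmf(x, lam):
    """Pr(n = x) for Poisson arrivals with rate lam, as in (3.2)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def prob_busy(lam, p_backlog):
    """Pr(c = 1) as in (3.6).

    p_backlog is an assumed input standing for Pr(T_{s,k-1} > 1),
    the probability that service spills over from the previous slot.
    """
    p_no_arrival = poisson_pmf(0, lam)          # Pr(n_{k-1} = 0)
    return (1.0 - p_no_arrival) + p_no_arrival * p_backlog


# Example with an assumed arrival rate of 0.5 packets per slot.
p_idle_arrivals = poisson_pmf(0, 0.5)
p_busy_example = prob_busy(0.5, p_backlog=0.2)
```

With `p_backlog = 0` the busy probability reduces to the probability of at least one arrival in the previous slot, and with `p_backlog = 1` the channel is always busy, which matches the two extremes of (3.6).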

Based on (3.3) and (3.4), the transition probabilities from idle to busy state and from busy to busy state can be respectively obtained as $\tau(c_{i,k} = 0, a_k, c_{i,k+1} = 1) = 1 - \tau(c_{i,k} = 0, a_k, c_{i,k+1} = 0)$ and $\tau(c_{i,k} = 1, a_k, c_{i,k+1} = 1) = 1 - \tau(c_{i,k} = 1, a_k, c_{i,k+1} = 0)$. As a result, by assuming that the channels $c_{i,k}$ are independent of each other for $i = 1$ to $N$, the transition probability for the entire network $\Gamma(s_k, a_k, s_{k+1})$ can be obtained as

$$\Gamma(s_k, a_k, s_{k+1}) = \prod_{i=1}^{N} \tau(c_{i,k} = \varsigma_1, a_k, c_{i,k+1} = \varsigma_2) \tag{3.7}$$

where $c_{i,k} \in s_k$, $c_{i,k+1} \in s_{k+1}$, and $\varsigma_1, \varsigma_2 \in \{0, 1\}$.

The time required for spectrum handoff of a CR user is defined as the time duration from the termination of packet transmission in one channel to the starting time of retransmission in another channel. Three factors contribute to the spectrum handoff time: the switching time, the handshaking time, and the waiting time. In general, both the switching and the handshaking time intervals are assumed fixed with comparably smaller values. Therefore, the main objective of the proposed POSH scheme is to select a target channel that has the minimum waiting time, i.e. the smallest number of waiting slots required for the CR user in the case that the target channel is still occupied by the primary user.
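The product form of (3.7) can be checked numerically with a short sketch. The layout `tau[i][c][c2]` for the per-channel transition probabilities and the toy numbers are assumptions made for illustration. The action argument is dropped here because the per-channel probabilities in this toy example do not vary with the action.

```python
from itertools import product


def network_transition(s, s_next, tau):
    """Gamma(s, a, s') as the product of per-channel transitions, eq. (3.7).

    s, s_next   : tuples of channel states (0 = idle, 1 = busy)
    tau[i][c][c2] : transition probability of channel i from c to c2
                    (hypothetical layout; action dependence omitted)
    """
    p = 1.0
    for i, (c, c2) in enumerate(zip(s, s_next)):
        p *= tau[i][c][c2]
    return p


# Two channels with assumed 2x2 transition matrices.
toy_tau = [
    [[0.8, 0.2], [0.4, 0.6]],   # channel 1
    [[0.7, 0.3], [0.5, 0.5]],   # channel 2
]
all_states = list(product((0, 1), repeat=len(toy_tau)))   # 2^N network states
row_sum = sum(network_transition((0, 1), s2, toy_tau) for s2 in all_states)
p_to_all_idle = network_transition((0, 1), (0, 0), toy_tau)
```

Enumerating `all_states` also makes the size of the state space visible: it has $2^N$ entries, which is the source of the complexity concern raised for the value function later in this chapter.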

Figure 3.1: Performance comparison: the number of waiting time slots versus the number of spectrum handoff. (Labels recovered from the figure: Channel 1, Channel 2, Channel 3, handoff switch, B, I, spectrum hole, CR waiting slots, primary transmission, CR user transmission.)

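(i.e. $a^* = a_k$) for spectrum handoff. Consequently, the expected reward $R(s_k, a_k)$ can be written as

$$\begin{align*}
R(s_k, a_k) &= E[n_w = \ell \mid a^* = a_k] = \sum_{\ell=0}^{\infty} \ell \cdot \Pr(n_w = \ell \mid a^* = a_k) \\
&= \sum_{\ell=1}^{\infty} \ell \cdot \Pr\left( \bigcap_{p=1}^{\ell} c_{a_k,k+p-1} = 1,\ c_{a_k,k+\ell} = 0 \right) \\
&= \sum_{\ell=1}^{\infty} \ell \cdot \tau(c_{a_k,\ell+k-1} = 1, a_k, c_{a_k,\ell+k} = 0) \cdot \Pr\left( \bigcap_{p=1}^{\ell} c_{a_k,k+p-1} = 1 \right) \\
&= \sum_{\ell=1}^{\infty} \ell \cdot \tau(c_{a_k,\ell+k-1} = 1, a_k, c_{a_k,\ell+k} = 0) \cdot \Pr(c_{a_k,k} = 1) \cdot \tau(c_{a_k,k} = 1, a_k, c_{a_k,k+1} = 1)^{\ell-1} \tag{3.8} \\
&= \sum_{\ell=1}^{\infty} \ell \cdot c_{a_k,k} \cdot \tau(c_{a_k,\ell+k-1} = 1, a_k, c_{a_k,\ell+k} = 0) \cdot \tau(c_{a_k,k} = 1, a_k, c_{a_k,k+1} = 1)^{\ell-1} \tag{3.9}
\end{align*}$$

where $c_{a_k,k}$ denotes the channel state after selecting channel $a_k$ at time instant $k$. It is noted that the step from (3.8) to (3.9) is due to the fact that $\Pr(c_{a_k,k} = 1) = c_{a_k,k}$, since a specific channel $a_k$ is considered for the calculation of the waiting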

3.2 Protocol Implementation of POSH Scheme

An overlay slotted CR network with partially observable information is considered for the POSH scheme: the secondary user is not allowed to coexist with the primary user, and the time duration for packet transmission is divided into time slots. As shown in Fig. 2.1, partial channel information $o(s_{k+1}, a_k, z_k)$ is assumed to be observable by the secondary users and is exploited for the update of the belief state $b(s_{k+1})$ as in (2.2). The secondary user utilizes the updated belief state in order to estimate the channel state of the CR network.

According to the POSH scheme, an $L$-step value function is adopted to obtain the action that results in the minimal waiting time after the handoff process. In other words, based on the $L$-step value function $V^*_L[b(s_k)]$, which maps from the belief state space, the CR user determines the feasible action to take in order to achieve the highest reward. The $L$-step value function for the CR user can be obtained as

$$V^*_L[b(s_k)] = \max_{a_k \in A_k} \left\{ \sum_{s_k \in S_k} b(s_k) \cdot R(s_k, a_k) + \rho \sum_{z_k \in Z_k} \Pr(z_k \mid b(S_k), a_k) \cdot V^*_{L+1}[b(s_{k+1})] \right\} \tag{3.10}$$

where $\rho$ denotes a discount factor for convergence control of the value function. The probability $\Pr(z_k \mid b(S_k), a_k)$ is defined in (2.3). At the beginning of the time slot in which the spectrum handoff occurs, the CR user chooses the target channel with the minimum number of waiting slots based on the results obtained from the $L$-step value function in (3.10). After switching to the target channel, the CR user conducts the sensing task to observe the newly updated channel state even though

Moreover, the computation of the $L$-step value function in (3.10) is complex and in general difficult to solve. The dimension of the belief state grows exponentially with the number of channels, which makes it difficult to adopt for practical implementation. A reduced state strategy has been proposed in [18] to establish an approximated linear state vector, which can effectively decrease the computational complexity of the value function. The complex optimization problem associated with (3.10) can therefore be resolved in an efficient manner.
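To illustrate why the recursion in (3.10) becomes expensive, a brute-force finite-horizon version can be sketched as follows. This is not the reduced state strategy of [18] and not necessarily how the thesis solves the problem. It is a naive sketch in which the function name `value`, the data layouts, the remaining-step counter `steps`, and the toy rewards (chosen so that larger is better, e.g. negative expected waiting slots) are all assumptions.

```python
def value(belief, steps, trans, obs, reward, rho=0.9):
    """Brute-force finite-horizon value of a belief, in the spirit of (3.10).

    trans[a][s][s2], obs[a][s2][z], reward[s][a] are hypothetical layouts;
    `steps` counts the remaining decision steps (an assumption of this sketch).
    Returns (best value, best action).
    """
    n_states = len(belief)
    best_value, best_action = float("-inf"), None
    for a in range(len(trans)):
        # Immediate term: sum over s of b(s) * R(s, a)
        q = sum(belief[s] * reward[s][a] for s in range(n_states))
        if steps > 1:
            predicted = [sum(trans[a][s][s2] * belief[s] for s in range(n_states))
                         for s2 in range(n_states)]
            for z in range(len(obs[a][0])):
                unnorm = [obs[a][s2][z] * predicted[s2] for s2 in range(n_states)]
                p_z = sum(unnorm)                      # Pr(z | b, a), eq. (2.3)
                if p_z > 0.0:
                    next_belief = [u / p_z for u in unnorm]   # eq. (2.2)
                    future = value(next_belief, steps - 1, trans, obs, reward, rho)[0]
                    q += rho * p_z * future
        if q > best_value:
            best_value, best_action = q, a
    return best_value, best_action


# Toy model (assumed): two states, two actions, two observations.
T = [[0.9, 0.1], [0.3, 0.7]]
O = [[0.8, 0.2], [0.1, 0.9]]
toy_trans, toy_obs = [T, T], [O, O]
toy_reward = [[0.0, -1.0], [-2.0, -0.5]]    # reward[s][a], larger is better
v1, a1 = value([0.5, 0.5], 1, toy_trans, toy_obs, toy_reward)
v2, a2 = value([0.5, 0.5], 2, toy_trans, toy_obs, toy_reward)
```

Each extra step multiplies the number of recursive calls by the number of actions times the number of observations, and the belief vector itself has one entry per network state, so the cost of this naive approach grows quickly with both the horizon and the number of channels.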

Chapter 4 Performance Analysis of Proposed POSH Scheme

In this chapter, the analytical model for the proposed POSH scheme is derived in Section 4.1 in order to analyze its performance. The models for both the no spectrum handoff (NSH) scheme and the randomly choose strategy (RCS) are also presented. The analytical models serve to validate these schemes and are compared with simulation results in Chapter 6. Furthermore, the analytical models for these three handoff algorithms under practical considerations are derived and explained in Section 4.2.

4.1 Analytical Modeling of Spectrum Handoff Schemes

However, it is not essential to point out a particular moment in the analytical expressions, i.e. the subscript $k$ for each action $a_k$ is omitted. Instead, two different types of actions are defined: selecting the current channel $a_c$ and selecting the destination channel $a_d$, where $a_c, a_d \in A_k$ and $a_c \neq a_d$. It is noted that the channel (either $a_c$ or $a_d$) selected by the action is retained until another action is executed.

4.1.1 NSH Scheme

Let $n_{w,nsh}$ be the expected waiting time if the NSH scheme is performed, in which the CR user does not switch to other channels but stays at its current channel $a_c$ to wait for the next spectrum hole. Similar to (3.8), the expected waiting time $n^{a_c}_{w,nsh}$ can be obtained as

$$\begin{align*}
n_{w,nsh} &= E[n_w = \ell \mid a^* = a_c] = \sum_{\ell=1}^{\infty} \ell \cdot \Pr\left( \bigcap_{p=1}^{\ell-1} c_{a_c,k+p} = 1,\ c_{a_c,k+\ell} = 0 \;\middle|\; c_{a_c,k} = 1 \right) \\
&= \sum_{\ell=1}^{\infty} \ell \cdot \tau(c_{a_c,k+\ell-1} = 1, a_c, c_{a_c,k+\ell} = 0) \cdot \tau(c_{a_c,k} = 1, a_c, c_{a_c,k+1} = 1)^{\ell-1} \tag{4.1}
\end{align*}$$

It is noted that the second equality in (4.1) indicates that the NSH method is adopted under the condition that no spectrum handoff is executed at the current time $k$ even though the current channel state is busy, i.e. $c_{a_c,k} = 1$. Therefore, the expected waiting time of the NSH scheme $n_{w,nsh}$ is obtained by assigning $\Pr(c_{a_c,k} = 1) = 1$ in (3.8).
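The series in (4.1) can be evaluated numerically. The sketch below assumes a time-homogeneous channel, so that a single busy-to-idle probability describes every slot, and truncates the infinite sum. The function name and the `max_slots` cutoff are hypothetical. Under this assumption the series is the mean of a geometric distribution, so it should agree with the closed form $1/\tau$, where $\tau$ is the busy-to-idle probability.

```python
def waiting_time_nsh(tau_busy_to_idle, max_slots=10_000):
    """Truncated evaluation of (4.1) for a time-homogeneous channel.

    tau_busy_to_idle : tau(c = 1 -> c = 0), assumed constant over slots
    max_slots        : truncation point of the infinite sum (assumption)
    """
    tau_busy_to_busy = 1.0 - tau_busy_to_idle
    return sum(l * tau_busy_to_idle * tau_busy_to_busy ** (l - 1)
               for l in range(1, max_slots + 1))


# With an assumed busy-to-idle probability of 0.25, the CR user waits
# about 1 / 0.25 = 4 slots on average when it stays on a busy channel.
n_w_nsh = waiting_time_nsh(0.25)
```

The closed form makes the dependence explicit: the less likely a busy channel is to become idle in the next slot, the longer a CR user that refuses to hand off must wait.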

4.1.2 RCS Scheme

Let $n_{w,rcs}$ be the expected waiting time when the RCS scheme is performed, in which the CR user randomly switches to a target channel $a_s$ among the network channels 1 to $N$ to [...] $\Pr(a^* = a_s)$ for the action $a_s$ to select one channel within all the $N$ channels as below:

$$\begin{align*}
n_{w,rcs} &= \sum_{a_s=1}^{N} E[n_w = \ell \mid a^* = a_s] \cdot \Pr(a^* = a_s) \\
&= \frac{1}{N} \cdot E[n_w = \ell \mid a^* = a_c] + \frac{1}{N} \cdot \sum_{a_s=1, a_s \neq a_c}^{N} E[n_w = \ell \mid a^* = a_s] \\
&= \frac{1}{N} \cdot n_{w,nsh} + \frac{1}{N} \cdot \sum_{a_s=1, a_s \neq a_c}^{N} n_w^{a_s} \tag{4.2}
\end{align*}$$

where $n_{w,nsh}$ is defined in (4.1) and $n_w^{a_s}$ denotes the expected waiting time of selecting channel $a_s$ instead of staying at the current channel $a_c$. In the case that the traffic patterns of all the $N$ channels are identical, (4.2) can be reformulated by incorporating (4.1) into (4.2) as

$$n_{w,rcs} = \frac{1}{N} \left[ 1 + (N-1) \Pr(c_{a_d,k} = 1) \right] \cdot \underbrace{\sum_{\ell=1}^{\infty} \ell \cdot \tau(c_{a_d,k+\ell-1} = 1, a_d, c_{a_d,k+\ell} = 0) \cdot \tau(c_{a_d,k} = 1, a_d, c_{a_d,k+1} = 1)^{\ell-1}}_{n_{w,nsh}} \tag{4.3}$$

where the selected destination channel $a_d$ comprises both the randomly selected channel $a_s$ and the current channel $a_c$. Furthermore, as illustrated in (4.3), the expected waiting time obtained from the RCS scheme is less than or at least equal to that of the NSH method (i.e. $n_{w,rcs} \leq n_{w,nsh}$) since the probability $\Pr(c_{a_s,k} = 1) \leq 1$ for $\forall a_s \neq a_c$. The benefit for using the random channel selection scheme over the case
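The averaging in (4.2) can be written out as a short sketch. The function name and argument layout are hypothetical. The example numbers assume identical traffic on all channels, in which case switching to another channel costs the busy probability times the NSH waiting time, as follows from comparing (3.8) with (4.1).

```python
def waiting_time_rcs(current, n_nsh, n_handoff):
    """Expected waiting time of uniform random channel selection, eq. (4.2).

    current      : index of the current channel a_c
    n_nsh        : expected waiting time when staying on a_c, eq. (4.1)
    n_handoff[a] : expected waiting time after switching to channel a
                   (the entry for the current channel is ignored)
    """
    n_channels = len(n_handoff)
    others = sum(w for a, w in enumerate(n_handoff) if a != current)
    return (n_nsh + others) / n_channels


# Assumed example: N = 4 identical channels, NSH waiting time of 4 slots,
# busy probability 0.5, so each other channel costs 0.5 * 4 = 2 slots.
n_nsh_example = 4.0
p_busy = 0.5
handoff_costs = [p_busy * n_nsh_example] * 4
n_w_rcs = waiting_time_rcs(0, n_nsh_example, handoff_costs)
```

In this example the random strategy averages one "stay" outcome and three "switch" outcomes, so its expected waiting time falls below that of never handing off whenever the busy probability is below one.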


4.1.3 POSH Scheme

The channel selection behavior of both the NSH and RCS schemes is straightforward, so their analytical models can be directly expressed by stationary probabilities and transition probabilities. The proposed POSH scheme, on the other hand, determines its target channel by the POMDP policy, which is mainly mapped from the belief state at each step as shown in Fig. 2.1. Since the value of the belief state is obtained from the observation, action, and former belief state at each step, it becomes impossible to predict and obtain the target channel in advance. Nevertheless, the analytical model of the POSH scheme can still be approximately estimated, since the updated belief state gradually approaches the stationary probability provided that the network does not vary rapidly. The effectiveness of the analytical model for the POSH scheme is validated and evaluated via simulations in Chapter 6.

Let $n_{w,posh}$ be the expected waiting time when the POSH scheme is adopted. The resulting formulation is similar to that of the RCS scheme except that the probability distribution $\tilde{\Pr}(a^* = a_s)$ of action $a^*$ is non-uniform in this case, i.e.

$$n_{w,posh} = \sum_{a_s=1}^{N} E[n_w = \ell \mid a^* = a_s] \cdot \tilde{\Pr}(a^* = a_s) \tag{4.4}$$

Let $n_w^{a_{d1}}$ and $n_w^{a_{d2}}$ be the first and second minimum expected waiting times, resulting from actions $a_{d1}$ and $a_{d2}$ respectively, where these two actions are obtained as

$$a_{d1} = \arg\min_{\forall a_s} n_w^{a_s}, \qquad a_{d2} = \arg\min_{\forall a_s,\, a_s \neq a_{d1}} n_w^{a_s} \tag{4.5}$$

where the expected waiting time by conducting spectrum handoff $n_w^{a_s}$ is defined in [...] identical, switching to another channel will result in either smaller or at least equal expected waiting time than staying at the current channel, i.e. the probability $\Pr(c_{a_s,k} = 1) \leq 1$ for $\forall a_s \neq a_c$, which results in $n_w^{a_s} \leq n^{a_c}_{w,nsh}$ as can be observed from (4.3). Therefore, the statistical distribution of the chosen action $a^*$ in the proposed POSH scheme can be acquired as

$$\tilde{\Pr}(a^* = a_s) = \begin{cases} 1, & \text{for } a_s = a_{d1} \text{ and } n^{a_{d1}}_{w,nsh} \leq n_w^{a_{d2}} \\ 0.5, & \text{for } a_s = a_{d1} \text{ and } n^{a_{d1}}_{w,nsh} > n_w^{a_{d2}} \\ 0.5, & \text{for } a_s = a_{d2} \text{ and } n^{a_{d1}}_{w,nsh} > n_w^{a_{d2}} \\ 0, & \text{otherwise.} \end{cases} \tag{4.6}$$

It can be observed that the first case $\tilde{\Pr}(a^* = a_s) = 1$ happens when $a_s$ is selected as $a_{d1}$ and $n^{a_{d1}}_{w,nsh} \leq n_w^{a_{d2}}$. It indicates that, if staying at the channel $a_{d1}$ with minimal expected waiting time still results in a smaller or equal value than that from POSH-based spectrum handoff $n_w^{a_{d2}}$, it is suggested not to conduct spectrum handoff to another channel $a_{d2}$. Therefore, the probability of staying at the channel $a^* = a_{d1}$ is equal to 1, i.e. $\tilde{\Pr}(a^* = a_{d1}) = 1$. On the other hand, it is also required to consider the other case with $n^{a_{d1}}_{w,nsh} > n_w^{a_{d2}}$. Under the situation that the previously selected channel is $a_{d1}$, if the expected waiting time resulting from $a_{d1}$ without spectrum handoff is worse than that from $a_{d2}$ with spectrum handoff, the CR user will decide to switch to channel $a_{d2}$ at its spectrum handoff. Nevertheless, instead of remaining at channel $a_{d2}$, the CR user will choose $a_{d1}$ again at the next spectrum handoff since the waiting time $n_w^{a_{d1}}$ is definitely smaller than $n^{a_{d2}}_{w,nsh}$.


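expected waiting time of the proposed POSH scheme can be reformulated from (4.4) as

$$n_{w,posh} = \begin{cases} n_{w,nsh}^{a_{d1}}, & \text{if } n_{w,nsh}^{a_{d1}} \le n_w^{a_{d2}} \\ \tfrac{1}{2} n_w^{a_{d1}} + \tfrac{1}{2} n_w^{a_{d2}}, & \text{if } n_{w,nsh}^{a_{d1}} > n_w^{a_{d2}} \end{cases} \tag{4.7}$$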
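The case split in (4.6) and the resulting expectation in (4.7) can be sketched in a few lines. This is a minimal illustration, assuming the per-channel expected waiting slots are already known; the function names, the 0-based channel indices, and the example numbers are hypothetical and only serve to show how the two cases are evaluated.

```python
def posh_action_distribution(n_w, n_w_nsh_d1):
    """Return (distribution, a_d1, a_d2) following the case split of (4.6).

    n_w          expected waiting slots when handing off to each channel (0-based list)
    n_w_nsh_d1   expected waiting slots when staying on channel a_d1 without handoff
    """
    if len(n_w) < 2:
        raise ValueError("at least two channels are required")
    # a_d1 and a_d2 are the channels with the smallest and second smallest waiting time
    order = sorted(range(len(n_w)), key=lambda a: n_w[a])
    a_d1, a_d2 = order[0], order[1]
    dist = {a: 0.0 for a in range(len(n_w))}
    if n_w_nsh_d1 <= n_w[a_d2]:
        dist[a_d1] = 1.0          # staying on a_d1 is never worse: no handoff
    else:
        dist[a_d1] = 0.5          # the user alternates between a_d1 and a_d2
        dist[a_d2] = 0.5
    return dist, a_d1, a_d2


def posh_expected_waiting(n_w, n_w_nsh_d1):
    """Expected waiting slots of POSH following the two cases of (4.7)."""
    dist, a_d1, a_d2 = posh_action_distribution(n_w, n_w_nsh_d1)
    if dist[a_d1] == 1.0:
        return n_w_nsh_d1
    return 0.5 * n_w[a_d1] + 0.5 * n_w[a_d2]


if __name__ == "__main__":
    print(posh_expected_waiting([1.0, 2.0, 3.0], 1.2))  # first case of (4.7)
    print(posh_expected_waiting([1.0, 2.0, 3.0], 4.0))  # second case of (4.7)
```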

These analytical models provide the flexibility to evaluate the performance of different spectrum handoff schemes in advance, so that the CR user can select the mechanism that is most beneficial under a specific circumstance. The performance validation and comparison between these schemes are illustrated in Section 6.

4.2 Practical Considerations for Spectrum Handoff Schemes

In the previous section, the analytical results show that, according to the POMDP-based channel selection policies, the proposed POSH scheme results in fewer expected waiting time slots than both the RCS and NSH algorithms at the occurrence of spectrum handoff. However, an ideal circumstance was assumed in which only data transmission is considered in a specific time slot. In practice, it is inevitable to spend a period of time detecting the network condition and making sure that spectrum handoff can be successfully executed. It is intuitive that additional time may be required by the proposed POSH scheme for conducting the POMDP-based channel selection with partially observable channel information. Therefore, practical considerations for spectrum handoff that involve the necessary channel sensing and handshaking are discussed in this section, so that the performance difference between these schemes under practical circumstances can be observed.

Figure 4.1: Practical design for the time-slotted channels under different spectrum handoff schemes. Panels: (a) NSH, (b) RCS, (c) POSH. Each panel marks the handoff slot, the waiting slots, and the next transmission slot for primary and secondary traffic. Legend: H: Hand-Shake Message; P: POMDP Algorithm; U: Belief State Update.

As mentioned in previous chapters, a time-slotted system is considered in this paper, as shown in Fig. 4.1. The CR user is required to perform initial sensing on its current spectrum at the beginning of each time slot in order to assure the availability of the present channel. If the outcome of the initial sensing is idle, the CR user can remain in the same slot to conduct packet transmissions. The receiver will consequently return an ACK frame at the end of this slot to acknowledge the reception of the data packet. On the other hand, if the sensing outcome is busy, additional messages will be delivered in order to perform the different spectrum handoff schemes, which are described as follows.


At the slot $S_k$, the intention of the primary user to utilize this channel is observed during the spectrum sensing period. Based on the NSH scheme, the CR user will remain silent on the existing channel and wait until the primary user has finished its transmissions. Once no traffic from the primary user is observed during the sensing period of slot $S_{k+p}$, the CR user can conduct its data transmission. Therefore, the total waiting time by adopting the NSH scheme (i.e. $T_{w,nsh}$) can be expressed as

$$T_{w,nsh} = n_{w,nsh} \cdot T_{slot} \tag{4.8}$$

where $n_{w,nsh}$ is obtained from (4.1), and $T_{slot}$ represents the time duration of each slot. Furthermore, the net transmission time within a slot by adopting the NSH method (i.e. $T_{s,nsh}$) can be acquired as

$$T_{s,nsh} = T_{slot} - T_{ack} - T_{sens} \tag{4.9}$$

where $T_{sens}$ denotes the time interval of the sensing period, and $T_{ack}$ indicates the time required for returning the ACK packet in order to conduct handshake for data transmission.

4.2.2 RCS Scheme

The practical consideration for the RCS scheme is illustrated in Fig. 4.1(b). After the initial sensing, the CR transmitter randomly selects a target channel for spectrum handoff, where the determination process is considered short and is ignored in the design. Suppose that there are N channels to be selected in the network. According to the RCS method, the CR user remains at the current channel with probability 1/N and switches to another spectrum with probability (N - 1)/N.


If the CR user is determined to stay at the existing channel, the required waiting time acquired in (4.1) will be adopted. In the case that the CR user is suggested to switch to another channel, both the switching time $T_{sw}$ and an additional sensing time $T_{sens}$ will be required to observe whether the destination channel is busy or not. If the target channel is occupied by the primary user with probability $\Pr(c_{a_d,k} = 1)$, additional waiting time slots are required until the channel becomes idle. On the other hand, if the randomly selected channel is found to be idle with probability $\Pr(c_{a_d,k} = 0)$, the CR transmitter and receiver will spend $T_{hs}$ time exchanging handshake messages in order to confirm the utilization of the target channel. As a result, the average waiting time of the RCS scheme $T_{w,rcs}$ becomes

$$T_{w,rcs} = \frac{1}{N} \cdot n_{w,nsh} T_{slot} + \frac{N-1}{N} \Big[ T_{sw} + T_{sens} + \Pr(c_{a_d,k} = 1) \cdot T_{slot} \cdot \sum_{a_s=1,\, a_s \neq a_c}^{N} n_w^{a_s} + \Pr(c_{a_d,k} = 0) \cdot T_{hs} \Big] \tag{4.10}$$

where $n_{w,nsh}$ is obtained from (4.1). The resulting net transmission time in a slot $T_{s,rcs}$ can therefore be obtained as

$$T_{s,rcs} = T_{slot} - T_{ack} - T_{sens} - (T_{hs} + T_{sw}) \cdot \Pr(c_{a_d,k} = 0) \tag{4.11}$$
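As a quick numerical check, (4.10) and (4.11) can be evaluated directly. The sketch below follows the two equations term by term as printed; the function and argument names are hypothetical, the timing values are the ones listed later in the performance evaluation (100 ms slot, 5 ms sensing, 10 ms switching, 1 ms handshake, 1 ms ACK), and the waiting-slot counts and busy probability are made-up inputs.

```python
def rcs_waiting_time(n_channels, n_w_nsh, n_w_others, p_busy,
                     t_slot, t_sw, t_sens, t_hs):
    """Average waiting time of RCS following (4.10).

    n_w_others  expected waiting slots n_w^{a_s} for every channel a_s != a_c
    p_busy      Pr(c_{a_d,k} = 1), probability that the target channel is busy
    """
    p_idle = 1.0 - p_busy
    stay = (1.0 / n_channels) * n_w_nsh * t_slot
    switch = (t_sw + t_sens
              + p_busy * t_slot * sum(n_w_others)
              + p_idle * t_hs)
    return stay + ((n_channels - 1.0) / n_channels) * switch


def rcs_net_transmission_time(t_slot, t_ack, t_sens, t_hs, t_sw, p_idle):
    """Net transmission time per slot of RCS following (4.11)."""
    return t_slot - t_ack - t_sens - (t_hs + t_sw) * p_idle


if __name__ == "__main__":
    # Times in ms; n_w values and probabilities are illustrative assumptions.
    print(rcs_waiting_time(3, 2.0, [1.0, 1.5], 0.5, 100.0, 10.0, 5.0, 1.0))
    print(rcs_net_transmission_time(100.0, 1.0, 5.0, 1.0, 10.0, 0.5))
```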

4.2.3 Precise POSH Scheme

The precise calculation for the proposed POSH scheme is illustrated in Fig. 4.1(c). The POSH scheme does not need to spend handshake time between the transmitter and the receiver as required by the RCS method; while additional calculation time for [...] and receiver, it is unnecessary to exchange messages in order to inform the receiver which spectrum is selected as the target channel. Nevertheless, the CR user needs to spend additional time $T_p$ to execute the POMDP algorithm, and the time $T_u$ to update the belief state at the end of each slot. Combining all the components mentioned above, the overall waiting time $T_{w,posh}$ can be represented as

$$T_{w,posh} = \Pr(c_{a_d,k} = 1) \cdot (n_{w,posh} \cdot T_{slot} + T_{sens}) + \Pr(c_{a_d,k} = 0) \cdot (T_p + T_{sw} + T_{sens}) \tag{4.12}$$

The net transmission time in a slot $T_{s,posh}$ becomes

$$T_{s,posh} = T_{slot} - T_{ack} - T_{sens} - T_u - (T_p + T_{sw}) \cdot \Pr(c_{a_d,k} = 0) \tag{4.13}$$
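The trade-off introduced by the policy time and the update time can be explored by evaluating (4.12) and (4.13) for a few parameter choices. The following sketch mirrors the two equations; the function names are hypothetical and the numerical inputs in the usage lines are illustrative assumptions rather than results from the thesis.

```python
def posh_waiting_time(n_w_posh, p_busy, t_slot, t_sens, t_sw, t_p):
    """Overall waiting time of POSH following (4.12)."""
    p_idle = 1.0 - p_busy
    busy_part = p_busy * (n_w_posh * t_slot + t_sens)   # target channel found busy
    idle_part = p_idle * (t_p + t_sw + t_sens)          # target channel found idle
    return busy_part + idle_part


def posh_net_transmission_time(t_slot, t_ack, t_sens, t_u, t_p, t_sw, p_idle):
    """Net transmission time per slot of POSH following (4.13)."""
    return t_slot - t_ack - t_sens - t_u - (t_p + t_sw) * p_idle


if __name__ == "__main__":
    # Times in ms. Compare a free policy computation with a 5 ms one.
    for t_p in (0.0, 5.0):
        print(t_p, posh_waiting_time(1.5, 0.4, 100.0, 5.0, 10.0, t_p))
    print(posh_net_transmission_time(100.0, 1.0, 5.0, 3.0, 5.0, 10.0, 0.6))
```

Because the policy time only enters the idle branch of (4.12), its contribution in this sketch scales with the idle probability of the target channel.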

Due to the large number of system states and the complexity of the witness algorithm [19–21], computing a POMDP-based policy is in general considered a complicated process. As described in Section 3.1, the number of system states grows exponentially as the number of channels increases, which complicates the update process of the belief state. A reduced form of the system state was introduced in [3; 18] for problems with POMDP-based formulation. It has been proved to reduce the total number of states, which effectively decreases the computational complexity: the number of reduced system states becomes equal to the number of channels in the system. Furthermore, an efficient approach for belief state update is also exploited in order to simplify the update procedure. Since the number of reduced belief states is equal to the number of channels N and each state is updated by a specific equation, the time complexity of the belief state update becomes O(N).

Chapter 5 Proposed POMDP-based Multi-user Spectrum Handoff (M-POSH) Protocol

In Section 3, the POSH scheme is introduced to provide the POMDP-based policy for a single CR user in order to determine the target channel for spectrum handoff. When more than one CR user in the network has its transmission interrupted by the primary user at a specific time slot, those CR users are likely to select an identical frequency channel for spectrum handoff since all the secondary users observe the same behavior of the primary user. Therefore, performing the POSH scheme in a multi-user network will result in packet collisions among those CR users that intend to perform spectrum handoff. A simple solution to the collision problem is to adopt CSMA-based contention resolution [? ]. The


chosen spectrum within a specific time slot. All the other CR users will turn out to lose the opportunity to transmit data in this slot.

In this chapter, the multi-user POSH (M-POSH) protocol is proposed to distribute the CR users to different opportunistic spectrums rather than having them contend for the access right in the same spectrum. The main purpose of the proposed scheme is to reduce the waiting time of the entire network by ensuring that every possible spectrum hole can be fully exploited, while the fairness of channel access among all CR users is still maintained. In the following two sections, the system model and the implementation of the proposed M-POSH scheme are described.

5.1 System Model of M-POSH Protocol

The system model of the proposed M-POSH protocol is described as follows. It is assumed that there are n CR users which conduct saturated data transmission in a slotted CR network with N (n < N) licensed spectrums. All the CR users are considered to observe the same channel behavior in the network. Unlike the POSH scheme, the M-POSH protocol not only needs to consider the traffic of the primary user but is also required to coordinate the channel access among the secondary users. Moreover, instead of exchanging messages among the CR users in a distributed manner, a common control channel is utilized to exchange the required information between the CR users. There have been arguments regarding whether to utilize a dedicated channel for delivering control messages. The investigation in [22] provides analytical results that illustrate the benefits of adopting a dedicated control channel in the network.

The belief state for multiple users can be extended from that for a single user as defined in section II.B and (3.1). The belief state of the rth CR user is represented as $b_r(c_{i,k})$,


the set of channels that are utilized by the CR users at time slot k, i.e. $\tilde{A}_k \in A_k$ where $A_k$ represents the total available channels in the network as defined in section III.A. The update for the belief state in the multi-user scenario can be obtained as

$$b_r(c_{i,k+1}) = \begin{cases} 0, & \text{if } z_{r,k} = 0,\ a_{r,k} = i, \text{ and } a_{r,k} \notin \tilde{A}_k \\ 1, & \text{if } z_{r,k} = 1 \text{ and } a_{r,k} = i, \text{ or } a_{r,k} \in \tilde{A}_k \\ b_r(c_{i,k}) \cdot \tau(c_{i,k} = 0, a_{r,k}, c_{i,k+1} = 0) + (1 - b_r(c_{i,k})) \cdot \tau(c_{i,k} = 1, a_{r,k}, c_{i,k+1} = 0), & \text{if } a_{r,k} \neq i \end{cases} \tag{5.1}$$

where $a_{r,k}$ represents the action of the rth user in time slot k, and $z_{r,k}$ denotes the observation of the rth user in slot k. The update process of the belief state in (5.1) is derived via the reduced strategy in [3] for POMDP-based formulation. For the rth user that takes action $a_{r,k}$ to access the ith channel, i.e. $a_{r,k} = i$, the update of the belief state $b_r(c_{i,k+1})$ is determined by the observation $z_{r,k}$. In other words, $b_r(c_{i,k+1}) = 0$ if the observation $z_{r,k}$ is equal to 0, while $b_r(c_{i,k+1}) = 1$ in the case that $z_{r,k} = 1$. Another condition for the belief state at the (k + 1)th time slot to indicate the busy state is that the action taken at time slot k belongs to the busy channel set, i.e. $a_{r,k} \in \tilde{A}_k$. Furthermore, the update of the belief state for the other n - 1 channels (i.e. $a_{r,k} \neq i$) is determined by the probability computation shown in the third item of (5.1). In other words, the idle probability for the next time slot is equal to the idle-to-idle probability $\tau(c_{i,k} = 0, a_{r,k}, c_{i,k+1} = 0)$ times
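The three cases of the belief update in (5.1) translate into a short per-channel function. The sketch below is only an illustration: the function and argument names are hypothetical, and where the printed cases overlap (an action that is in the CR-occupied set but different from channel i) it assumes that the second case takes precedence, which the text does not state explicitly.

```python
def update_belief(belief, channel, action, observation, occupied_by_cr,
                  p_idle_to_idle, p_busy_to_idle):
    """Return b_r(c_{i,k+1}) for one channel following the three cases of (5.1).

    belief          current belief b_r(c_{i,k}) of channel i
    channel         channel index i
    action          channel a_{r,k} accessed by user r in slot k
    observation     z_{r,k}, either 0 or 1
    occupied_by_cr  set of channels used by the CR users in slot k
    """
    if observation not in (0, 1):
        raise ValueError("observation must be 0 or 1")
    if action in occupied_by_cr:
        return 1.0  # assumption: the occupied-set condition is checked first
    if action == channel:
        return 1.0 if observation == 1 else 0.0
    # Channel i was not accessed: propagate the belief through the transition model.
    return belief * p_idle_to_idle + (1.0 - belief) * p_busy_to_idle


def update_all_beliefs(beliefs, action, observation, occupied_by_cr,
                       p_idle_to_idle, p_busy_to_idle):
    """Apply the update to every channel; one pass over the N channels."""
    return [
        update_belief(b, i, action, observation, occupied_by_cr,
                      p_idle_to_idle[i], p_busy_to_idle[i])
        for i, b in enumerate(beliefs)
    ]


if __name__ == "__main__":
    print(update_all_beliefs([0.5, 0.5, 0.5], 0, 1, set(),
                             [0.8, 0.7, 0.65], [0.4, 0.5, 0.55]))
```

The single pass over the channel list reflects the linear cost of the reduced belief update discussed at the end of Chapter 4.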

5.2 Implementation of M-POSH Protocol

Without loss of generality, it is assumed that all the existing CR users initially operate on different frequency spectrums. The occupancy of each channel by its corresponding CR user is known to all the CR users. When a CR user intends to conduct spectrum handoff, it broadcasts a handoff message on the common control channel in order to inform all the users of the change in network condition. Since more than one CR user may need to broadcast its handoff message in the same time slot, these messages from different CR users can potentially collide with each other. A random backoff contention window is therefore utilized in the proposed M-POSH scheme to resolve the potential packet collision problem. Each CR user that intends to deliver a handoff message on the control channel waits for a random backoff drawn from a pre-specified interval so that the message can be successfully received.

The handoff message includes the following three fields: (a) access number, (b) original channel, and (c) candidate set of target channels. The access number is assigned to each CR user that intends to perform spectrum handoff, and its value is randomly selected by the CR user within a predefined interval. The original channel field records the ID of the channel that the CR user is using before handoff. The candidate set of target channels for each specific CR user denotes the available channels to which it can potentially hand off, i.e. $D_r = [d_{r,1}, ..., d_{r,n}]$ where $d_{r,i}$ represents the ith priority in the target channel set of user r. The procedure for user r to acquire the candidate set of target channels $D_r$ is given in Algorithm 1.
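Before turning to Algorithm 1, the handoff message itself can be pictured as a small record with the three fields listed above. This is a minimal sketch: the class name, field names, and the `max_access_number` parameter are hypothetical, since the thesis does not specify a message encoding or the bounds of the predefined interval.

```python
import random
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class HandoffMessage:
    access_number: int                    # randomly drawn by the CR user
    original_channel: int                 # channel ID in use before handoff
    candidate_channels: Tuple[int, ...]   # D_r, ordered by priority


def make_handoff_message(original_channel: int,
                         candidate_channels: Sequence[int],
                         max_access_number: int,
                         rng=random) -> HandoffMessage:
    """Build a message with an access number drawn from [0, max_access_number]."""
    if max_access_number < 0:
        raise ValueError("max_access_number must be non-negative")
    return HandoffMessage(
        access_number=rng.randint(0, max_access_number),
        original_channel=original_channel,
        candidate_channels=tuple(candidate_channels),
    )


if __name__ == "__main__":
    print(make_handoff_message(2, [0, 1], 15, random.Random(1)))
```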

The concept of the candidate set of target channels for each CR user is to collect the unoccupied channels based on the criterion of minimal waiting time. As shown in Algorithm 1, the target candidates are selected sequentially based on the corresponding cost function Ψ. Note that the parameter m denotes the number of CR users that intend to conduct spectrum handoff at time slot k, which constitute the user set $R_m$. If the selected action is found to exist in $\tilde{A}_k$, the chosen $a^*$ is replaced by the original channel and recorded in the candidate set. As a result, this replacement criterion ensures that no two CR users' candidate sets $D_r$ are identical. This facilitates the negotiation process by which the CR users select a designated target channel from their candidate sets, as described in Algorithm 2.

Algorithm 1: Acquiring Candidate Set of Target Channels $D_r$ for User r

Input : $\tilde{A}_k$ with size (?? n), $A_k$ with size N

Output: $D_r$ with size n

for i ← 1 to n do
    $\Psi \leftarrow \sum_{\ell=1}^{m} \ell \cdot \tau(c_{i,\ell+k-1} = 1, a_{r,k}, c_{i,\ell+k} = 0) \cdot \Pr(c_{i,k} = 1) \cdot \tau(c_{i,k} = 1, a_{r,k}, c_{i,k+1} = 1)^{\ell-1}$
    $a_r^* = \arg\min_{a_{r,k}} \Psi$
    if $a_r^* \notin \tilde{A}_k$ then
        Add $a_r^*$ to $D_r$
        Remove $a^*$ from $A_k$ why???
    end
    else
        Add original channel to $D_r$
    end
end
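A rough Python rendering of Algorithm 1 is given below as a sketch under stated assumptions, not as the thesis implementation. The cost function follows the printed expression for Ψ with a time-invariant two-state channel model, so the busy-to-busy probability is taken as one minus the busy-to-idle probability. One deviation is assumed and marked in the code: a channel found in the occupied set is also dropped from later rounds, so that the loop does not keep selecting it; the printed algorithm does not say how this case is handled. All names are hypothetical.

```python
def waiting_cost(p_busy, p_busy_to_idle, horizon):
    """Truncated cost: sum over l of l * P(busy) * P(stay busy)^(l-1) * P(busy -> idle)."""
    p_busy_to_busy = 1.0 - p_busy_to_idle
    return sum(
        l * p_busy_to_idle * p_busy * p_busy_to_busy ** (l - 1)
        for l in range(1, horizon + 1)
    )


def candidate_set(costs, occupied, original_channel, size):
    """Sequentially pick minimum-cost channels into the candidate list D_r.

    costs             dict: channel -> cost value
    occupied          set of channels currently used by other CR users
    original_channel  channel the user is leaving
    size              number of entries to produce
    """
    remaining = dict(costs)
    candidates = []
    for _ in range(size):
        if not remaining:
            candidates.append(original_channel)
            continue
        best = min(remaining, key=lambda ch: (remaining[ch], ch))
        del remaining[best]  # assumption: occupied channels are dropped as well
        if best not in occupied:
            candidates.append(best)
        else:
            candidates.append(original_channel)
    return candidates


if __name__ == "__main__":
    costs = {ch: waiting_cost(0.5, p, 3) for ch, p in {1: 0.4, 2: 0.5, 3: 0.55}.items()}
    print(candidate_set(costs, occupied={3}, original_channel=0, size=2))
```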

As illustrated in Algorithm 2, after collecting the handoff messages from the m CR users, the target channel for each user can be determined by a negotiation procedure between the users. Let $T_k$ denote the set of target channel

CR users within the network can choose their target channels without colliding with the other CR users. Note that, with the consideration of other CR traffic within the network, the occupied channel set of the CR users $\tilde{A}_k$ is also updated based on the information from both the original channel and the target channel of the user set $R_m$ that conducts spectrum handoff. In summary, the proposed M-POSH protocol provides a negotiation mechanism for each CR user to select its target channel without colliding with the selections made by other CR users in the network.

Algorithm 2: Selection of Target Channel

Input : $D_r$ with size n, $R_m$ with size m

Output: Target channel of each CR user $a_{k,r}$

foreach CR user in $R_m$ do
    for i ← 1 to m do
        if $d_{r,i} \notin T_k$ and $d_{r,i} \notin \tilde{A}_k$ then
            Destination channel ← $d_{r,i}$
            Add $d_{r,i}$ to $T_k$
            break
        end
    end
end
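The negotiation in Algorithm 2 amounts to a greedy pass over the candidate sets. The sketch below is an illustration under two assumptions that the listing leaves open: users are processed in the order in which their candidate sets are supplied (for instance sorted by access number), and a user for whom no candidate is free receives no target channel. Function and variable names are hypothetical.

```python
def select_target_channels(candidate_sets, occupied):
    """Assign each handoff user the first free channel in its candidate list.

    candidate_sets  dict: user -> list of candidate channels in priority order;
                    users are processed in the dict's insertion order
    occupied        set of channels used by CR users that are not handing off
    Returns a dict: user -> target channel, or None if no candidate is free.
    """
    taken = set()          # plays the role of the target channel set T_k
    assignment = {}
    for user, candidates in candidate_sets.items():
        assignment[user] = None
        for channel in candidates:
            if channel not in taken and channel not in occupied:
                assignment[user] = channel
                taken.add(channel)
                break
    return assignment


if __name__ == "__main__":
    sets = {"u1": [3, 2], "u2": [3, 2], "u3": [3]}
    print(select_target_channels(sets, occupied={5}))
```

In the usage lines, the second user falls back to its second priority because the first user already claimed channel 3, which is the collision-avoidance behaviour the protocol aims for.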


Chapter 6 Performance Evaluation

In this chapter, simulations are presented to demonstrate the performance of the proposed POSH and M-POSH protocols. The major focus of the simulations is to obtain the number of waiting time slots required by the secondary user once it has been directed to the target channel. Since full channel state information is required by all of the existing spectrum handoff algorithms, it is considered unfair to compare the existing schemes within the environment adopted by the proposed POSH and M-POSH algorithms, where only partial channel information is observable. Therefore, the proposed scheme is compared with the two cases mentioned in Section 4, namely the NSH scheme and the RCS mechanism. The traffic of the primary user follows the Poisson distribution, and the service time is assumed to follow a uniform distribution with mean 1/µ = 1. Three channels are considered in the simulations, i.e. $s_k = [c_{1,k}, c_{2,k}, c_{3,k}]$, while the discount factor in (3.10) is selected as ρ = 1. The reduced state strategy is utilized in the simulations to obtain the numerical results of the POMDP-based optimization

Figure 6.1: Performance validation: the number of waiting time slots versus traffic arrival rate. Axes: traffic arrival rate (λ) versus expected number of waiting slots. Curves: RCS-Simulation, RCS-Analysis, NSH-Simulation, NSH-Analysis, POSH-Simulation, POSH-Estimation.

6.1 Model Validation

The analytical models for the required waiting time slots of the three schemes, i.e. the NSH, RCS, and POSH algorithms, as presented in (4.1), (4.3), and (4.7), are validated via simulation results as shown in Fig. 6.1. It can be observed that the expected number of waiting time slots of all three schemes increases as the traffic arrival rate (λ) of the primary user increases. Under different arrival rates, the proposed POSH algorithm provides the smallest number of waiting time slots compared with the other two schemes. Furthermore, the simulation results of both the NSH and RCS schemes match their corresponding analytical results. On the other hand, there exists a slight difference between the analytical and simulation results of the proposed POSH scheme. The major reason for this deviation can be attributed to the imprecise modeling of the POSH scheme based on stationary probabilities.

In order to clearly illustrate the difference between these two results, the biased percentage β is introduced and is defined as $\beta = \frac{n^a_{w,\zeta} - n^s_{w,\zeta}}{n^s_{w,\zeta}} \times 100$, where $n^a_{w,\zeta}$ and $n^s_{w,\zeta}$

Figure 6.2: Performance validation: the biased percentage β versus traffic arrival rate. Axes: traffic arrival rate (λ) versus biased percentage (β). Curves: RCS & NSH, POSH with $10^6$ slots, POSH with $10^5$ slots, POSH with $10^4$ slots.

respectively. The parameter ζ indicates one of the three spectrum handoff schemes. Fig. 6.2 illustrates the biased percentage β for the three schemes under different traffic arrival rates. Note that the proposed POSH method is evaluated in simulation runs with different numbers of transmission time slots, i.e. $T = 10^4$, $10^5$, and $10^6$. It can be observed that even though the behavior of the belief state cannot be exactly modeled and analyzed, the analytical results derived from the stationary probability still approach the simulation values within 3% of estimation difference. It can also be seen from Fig. 6.2 that the bias diminishes as the number of transmission time slots in the simulations increases. This result reveals that, with longer simulation time, the incomplete network information can be updated more accurately, and the simulation results tend to exhibit the stationary behavior represented by the analytical model in (4.7).
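For reference, the biased percentage β can be computed with a one-line helper. The sketch assumes that the superscripts a and s in the definition refer to the analytical and the simulated number of waiting slots respectively; the function name and the sample values are illustrative only and do not reproduce data from Fig. 6.2.

```python
def biased_percentage(analytical, simulated):
    """beta = (n_a - n_s) / n_s * 100, in percent."""
    if simulated == 0:
        raise ValueError("simulated value must be non-zero")
    return (analytical - simulated) / simulated * 100.0


if __name__ == "__main__":
    # Hypothetical values: an analytical estimate slightly above the simulated one.
    print(biased_percentage(2.06, 2.0))
```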

Figure 6.3: Performance comparison: the number of waiting time slots versus the number of spectrum handoff. Axes: number of spectrum handoff versus number of waiting time slots. Curves: RCS_Cw, POSH_Cw, RCS_Cb, POSH_Cb.

6.2 Performance Comparison

Fig. 6.3 shows the performance comparison of the number of waiting time slots versus the total number of spectrum handoffs for both the proposed POSH scheme and the RCS method. Two different channel conditions are considered for comparison purposes. A better channel condition $C_b$ is chosen with the transition probability from idle to idle state for each channel as $\tau(c_{i,k} = 0, a_k, c_{i,k+1} = 0) = 0.8, 0.7, 0.65$ for i = 1, 2, 3, while that from busy to idle state is selected as $\tau(c_{i,k} = 1, a_k, c_{i,k+1} = 0) = 0.4, 0.5, 0.55$ for i = 1, 2, 3. On the other hand, a worse channel condition $C_w$ is determined with the transition probability from idle to idle state as $\tau(c_{i,k} = 0, a_k, c_{i,k+1} = 0) = 0.4, 0.3, 0.35$, and that from busy to idle state set as $\tau(c_{i,k} = 1, a_k, c_{i,k+1} = 0) = 0.1, 0.2, 0.15$.
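To get a feel for how different the two channel conditions are, each channel can be simulated as a two-state Markov chain with the transition probabilities listed above. The following sketch is not the simulator used in the thesis; the function name, the initial idle state, the seed, and the number of slots are assumptions made only for illustration.

```python
import random

# Idle-to-idle and busy-to-idle probabilities for channels 1..3, as listed in the text.
CHANNEL_CONDITIONS = {
    "Cb": {"idle_to_idle": [0.8, 0.7, 0.65], "busy_to_idle": [0.4, 0.5, 0.55]},
    "Cw": {"idle_to_idle": [0.4, 0.3, 0.35], "busy_to_idle": [0.1, 0.2, 0.15]},
}


def simulate_idle_fraction(p_idle_to_idle, p_busy_to_idle, slots, seed=0):
    """Estimate the long-run fraction of idle slots of one channel."""
    rng = random.Random(seed)
    idle = True          # assumed initial state
    idle_slots = 0
    for _ in range(slots):
        p_idle_next = p_idle_to_idle if idle else p_busy_to_idle
        idle = rng.random() < p_idle_next
        idle_slots += idle
    return idle_slots / slots


if __name__ == "__main__":
    for name, cond in CHANNEL_CONDITIONS.items():
        fractions = [
            simulate_idle_fraction(p00, p10, 50000)
            for p00, p10 in zip(cond["idle_to_idle"], cond["busy_to_idle"])
        ]
        print(name, [round(f, 3) for f in fractions])
```

For a two-state chain the estimate should settle near busy-to-idle / (1 - idle-to-idle + busy-to-idle), i.e. about 0.67 for channel 1 under the better condition and about 0.14 under the worse one.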

For fair comparison, the NSH method is not implemented in this case since the CR user can always stay at the channel with better condition. It is intuitive that the total number of waiting time slots increases as the number of spectrum handoffs increases. Furthermore, the secondary user has to wait for comparably more time slots in the worse channel case under both schemes. Nevertheless, the total number of waiting time slots acquired from the proposed POSH scheme is smaller than that from the RCS method under both channel conditions. It is also observed that the POSH scheme performs better as the number of spectrum handoffs increases. The reason can be attributed to the fact that more updated belief states are acquired by the POSH scheme as the number of handoffs grows.

Figure 6.4: Performance comparison: the number of waiting time slots versus the total number of transmission time slots. Axes: total number of transmission time slots versus number of waiting time slots. Curves: RCS_Cw, POSH_Cw, RCS_Cb, POSH_Cb.

Fig. 6.4 illustrates the number of waiting time slots versus the total number of transmission time slots. Note that each scheme yields different numbers of waiting time slots and handoffs at every specific number of transmission time slots. In other words, the combined effects of both the waiting time slots and the number of handoffs are revealed in Fig. 6.4 at each horizontal data point. It can be observed that the proposed POSH algorithm still outperforms the RCS scheme under both the $C_b$ and $C_w$ channel conditions. Even

Figure 6.5: Performance comparison: the number of waiting time slots versus traffic arrival rate (numbers of spectrum handoff = 250). Axes: λ versus number of waiting slots (logarithmic scale). Curves: RCS, NSH, POSH.

Figure 6.6: Performance comparison: the number of waiting time slots versus traffic arrival rate (number of transmission time slots = 10000). Axes: λ versus number of waiting slots (logarithmic scale). Curves: RCS, NSH, POSH.

Figure 6.7: Performance comparison: the waiting time with practical consideration versus different arrival rates. Axes: λ versus waiting time (ms). Curves: NSH, RCS, POSH with T_policy = 0 ms, POSH with T_policy = 5 ms.

and the NSH schemes under different values of packet arrival rate λ of the primary user. Fig. 6.5 shows the comparison under a fixed number of spectrum handoffs equal to 250, and Fig. 6.6 is obtained under a number of transmission time slots equal to 1200. It can be observed that the proposed POSH scheme outperforms the other two methods under different packet arrival rates. The benefit of adopting the POSH algorithm is especially evident at smaller values of packet arrival rate since there is more opportunity for the POSH scheme to select a feasible target channel.

The practical considerations for the three handoff schemes are presented in Figs. 6.7 and 6.8. The corresponding parameters are listed as follows: $T_{slot}$ = 100 ms, $T_{sens}$ = 5 ms, $T_{sw}$ = 10 ms, $T_{hs}$ = 1 ms, and $T_{ack}$ = 1 ms. Fig. 6.7 illustrates the expected waiting time obtained from (4.8), (4.10), and (4.12) for the NSH, RCS, and POSH schemes respectively. Two cases with policy time $T_p$ = 0 and

Figure 6.8: Performance comparison: the net transmission time per slot versus different arrival rates. Axes: λ versus net transmission time (ms) per slot. Curves: NSH, RCS, POSH with T_update = 0 ms, POSH with T_update = 3 ms.

the overall performance is still better in comparison with the NSH and RCS methods. Furthermore, it is observed in Fig. 6.7 that the waiting time $T_{w,posh}$ is only affected by $T_p$ when the target channel of spectrum handoff has a high probability of being in the idle state. In other words, $T_p$ degrades the performance of the proposed POSH scheme under lower primary traffic, and becomes insignificant with higher primary traffic. Nevertheless, since the POSH scheme provides the best performance among the three schemes at low primary traffic (as shown in Figs. 6.5 and 6.6), it is concluded that the degradation caused by $T_p$ is not significant when adopting the POSH scheme.

In order to better present the utilization of licensed spectrum, Fig. 6.8 illustrates the net transmission time within a time slot. Two cases with update time $T_u$ = 0 and 3 ms for the proposed POSH scheme are considered, while the policy time $T_p$ =??. Note that the update time $T_u$ is additional time spent by the proposed POSH scheme in every slot, while the policy time $T_p$ is only incurred when spectrum handoff happens. It is intuitive that the net transmission time obtained from all three schemes decreases as the arrival rate of primary traffic increases. Furthermore, as λ increases, the net transmission time of the proposed POSH scheme becomes closer to or even worse than that of the RCS and NSH schemes, e.g. the POSH scheme with $T_u$ = 3 ms is worse than the RCS algorithm for λ > 0.75. Therefore, practical consideration of these handoff schemes provides a criterion for the CR users to determine which handoff scheme should be applied to obtain their target channel.

Figure 6.9: Performance comparison: the expected number of waiting slots versus different number of channels (with 2 CR users). Axes: number of channels versus expected number of waiting time slots. Curves: NSH, RCS, and M-POSH, each with λ = 0.5 and λ = 0.2.

Figs. 6.9 and 6.10 compare the performance of the NSH scheme, the RCS scheme, and the proposed M-POSH protocol with multiple CR users in a multi-channel network. Fig. 6.9 shows the performance comparison with two CR users, while Fig. 6.10 illustrates the case with 7 channels in the network. Two different arrival rates of primary traffic are considered for all schemes, i.e. λ = 0.2 and 0.5. It is apparent that the NSH scheme results in the same performance under different numbers of channels and CR users since it does not perform any spectrum handoff activity. In other words, the NSH scheme in multi-user scenario

Figure 6.10: Performance comparison: the expected number of waiting slots versus different number of CR users (with 7 available channels). Horizontal axis: number of CR users. Curves: NSH, RCS, and M-POSH, each with λ = 0.5 and λ = 0.2.

channels increases, while it increases with the total number of CR users due to the potential collisions in the network. The performance of the RCS scheme can become worse than that of the NSH method in certain circumstances, since the NSH scheme at least guarantees that no collisions will happen in the network. For example, in Fig. 6.10, the RCS scheme with λ = 0.5 results in a higher number of waiting slots than the NSH method when the number of CR users is greater than 5.

Nevertheless, the proposed M-POSH protocol can adaptively select the target channel based on the availability of network channels with the consideration of collisions between the CR users. As shown in both Figs. 6.9 and 6.10, the proposed M-POSH scheme outperforms both the RCS and NSH algorithms under different scenarios. When the network channels are fully occupied, e.g. when the number of CR users is equal to the number of licensed spectrums as at the leftmost data point in Fig. 6.9, the proposed M-POSH protocol decides to stay at the current channel in order to reduce the probability of collision between the CR users. On the other hand, as long as the network contains more available channels that can be utilized, the M-POSH protocol distributes the CR users that intend to conduct handoff to exploit those available channels. Consequently, the performance of M-POSH scheme can achieve the optimal

