A Study of Cluster-Based Pattern-Matching Localization Schemes in Large-Scale Networks


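National Chiao Tung University

Institute of Network Engineering

Master's Thesis

A Study of Cluster-Based Pattern-Matching Localization Schemes in Large-Scale Networks

Cluster-Based Pattern-Matching Localization Schemes for Large-Scale Wireless Networks

Student: Bing-Jhen Wu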


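A Study of Cluster-Based Pattern-Matching Localization Schemes in Large-Scale Networks

Student: Bing-Jhen Wu    Advisor: Prof. Yu-Chee Tseng

Master's Program, Department (Institute) of Computer Science and Information Engineering, National Chiao Tung University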

Chinese Abstract

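In location services, the response time of the system is a key issue, especially for real-time applications. This requirement is even more evident for pattern-matching-based localization systems in large-scale networks (such as wireless cities). Such localization methods determine an object's location by comparing the signal strength features it currently collects with a database of signal strength patterns built in advance during the training phase. In this thesis, we propose a cluster-based pattern-matching localization framework to speed up the positioning procedure. By grouping training locations that have similar signal strength patterns, we show how to reduce the comparison complexity required for localization and thereby accelerate the whole positioning process. To deal with signal fluctuation, we further propose several effective clustering methods. Extensive simulation results show that, on average, the proposed system reduces the comparison complexity by at least 90% compared with the original pattern-matching method, without affecting positioning accuracy.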


Cluster-Based Pattern-Matching Localization Schemes for Large-Scale Wireless Networks

Student: Bing-Jhen Wu    Advisor: Prof. Yu-Chee Tseng

Department of Computer Science and Information Engineering, National Chiao Tung University

Abstract

In location-based services, the response time of location determination is critical, especially for real-time applications. This is particularly true for pattern-matching localization methods when the sensing field is large (such as a wireless city), because these methods compare an object's current signal strength pattern against a location database of signal strength patterns collected in advance during the training phase. In this work, we propose a cluster-based localization framework to speed up the positioning process of pattern-matching localization schemes. By grouping training locations with similar signal strength patterns, we show how to reduce the associated comparison cost and thereby accelerate the pattern-matching process. To deal with signal fluctuations, several clustering strategies are proposed. Extensive simulation studies are conducted, and the results show that the computation cost can be reduced by more than 90% on average without degrading positioning accuracy.


Acknowledgements

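For the completion of this thesis, I first thank Prof. Tseng for his guidance and advice, which not only improved the overall quality of the thesis but also led me through the entire research process. I thank my senior labmate Sheng-Po Kuo (郭聖博), who helped a great deal along the way and offered valuable suggestions and much assistance on the whole thesis, allowing it to be completed smoothly. I also thank my lab mates for cheering one another on, so that I was never alone on the road of research. Finally, I thank my family for quietly supporting me over the past two years, which allowed me to concentrate on my research.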


Chinese Abstract

Abstract

Acknowledgements

Contents

List of Figures

1 Introduction

2 Related Works

3 The Cluster-Based Pattern-Matching Localization Framework

3.1 The Training Phase

3.2 The Positioning Phase

4 Clustering Algorithms

4.1 k-means Algorithm

4.2 Clustering Techniques Allowing Overlaps

4.2.1 Multi-Nearest-Neighbor Strategy

4.2.2 Voronoi-based Strategy

4.2.3 Probability-based Strategy

5 Simulations

5.1 Simulation Model

5.2 Impact of Clustering on the Average Positioning Error

5.3 Sensitivity Performance Study for Clustering Strategies

5.4 Performance Comparison of Clustering Strategies

5.5 Performance Study of Total Comparison Cost

6 Conclusion


List of Figures

3.1 The cluster-based localization framework.

4.1 An example of the problem of the k-means algorithm.

4.2 An example of the Voronoi-based overlapping mechanism.

5.1 The comparison of the average positioning error under different σ.

5.2 Sensitivity performance studies for three clustering strategies.

5.3 Effect of the standard deviation σ on the (a) hit rate and (b) average cluster size.

5.4 The total comparison cost under different numbers of clusters.


Chapter 1 Introduction

Location-based services (LBSs) have emerged as one of the killer applications for mobile computing and wireless data services. Besides providing great market value to business applications, such services are also critical to public safety, transportation, emergency response, and disaster management. Consequently, location estimation is essential to the success of LBSs. In addition to the well-known GPS [1], many techniques have been proposed for indoor localization, such as infrared-based [2], ultrasonic-based [3], and RF-based [4, 5] systems.

Among all localization systems, RF-based systems are probably the most cost-effective because they can rely on existing wireless network infrastructures (such as IEEE 802.11 WLANs). However, such systems must cope with signal strengths that may fluctuate frequently. Pattern-matching schemes [4, 5, 6, 7, 8], also known as fingerprinting schemes, deal with this problem through two phases: training and positioning. In the training phase, given a set of training locations, the received signal strengths of all base stations (or beacons) at these locations are collected for a sufficient amount of time, and a feature vector is calculated for each training location. Then, in the positioning phase, when an object needs to determine its location, it compares its current received signal strengths against the feature vectors in the location database to check their similarity. The location of the most similar feature vector is selected as the object's estimated location.

Recently, many studies have applied pattern-matching localization methods to large-scale environments [9, 10, 11, 12]. However, they may encounter a scalability problem because of the huge calibration effort required in the training phase and the high comparison cost incurred in the positioning phase. For example, in a wireless city, thousands or millions of training records may have to be stored in a location database. Several efforts have been dedicated to reducing the calibration cost [10, 13, 14, 15]. In this paper, we aim to reduce the computation cost incurred in the positioning phase, which would enable us to support real-time LBSs. We propose a cluster-based localization framework that also consists of two phases. In the training phase, as in existing pattern-matching approaches, we first collect the feature vectors of training locations. Through clustering techniques, training locations with similar feature vectors are grouped together, resulting in a small number of clusters. For each cluster, a representative feature vector is derived. Then, in the positioning phase, given a signal strength vector, we first compare it against the representative feature vectors of all clusters and pick the cluster whose representative feature vector is most similar to the given vector. Finally, only the training locations in the selected cluster are further evaluated to determine the estimated location of the object.

Although the clustering technique can reduce the computation cost, its positioning accuracy may degrade if the right cluster is not selected: if a false cluster is selected, the final location estimate may be incorrect. We refer to this as false cluster selection. Clearly, the probability of false cluster selection should be kept low. In this paper, we propose several clustering strategies. First, we show that the traditional k-means algorithm [16] is not suitable when the effect of noise is not negligible. Then, we propose three clustering strategies that enhance the k-means algorithm by allowing clusters to have overlapping members. Although such duplication is redundant, it can effectively reduce false cluster selections caused by noise. To verify our results, a simulation model is built and extensive simulation studies are conducted. Experimental results show that this framework can reduce the computation cost by at least 90% without sacrificing accuracy.

The rest of this paper is organized as follows. Chapter 2 reviews related work. The proposed cluster-based framework is described in Chapter 3. Chapter 4 presents several clustering strategies. Chapter 5 contains our performance studies. Chapter 6 concludes this paper.


Chapter 2 Related Works

Several localization systems have explored pattern-matching techniques. In [4], the nearest-neighbor algorithm is applied to search the location database for the training location with the shortest Euclidean distance in the signal space. Based on probability theory, [5] presents a probabilistic localization framework to handle signal strength fluctuations. Reference [8] adopts a similar concept to develop recursive Bayesian filters for localization. In general, the nearest-neighbor approach is not as effective as the probabilistic one [17].

In [6], a more sophisticated network-based classification method is proposed. A neural network, which consists of multiple layers of interconnected neurons, is adopted to model the dependencies among a set of random variables. In the training phase, a forward- and back-propagation mechanism adaptively assigns suitable weights to the neuron connections. The well-trained network can then be used to classify an observed sample of signal strengths in the positioning phase. Based on statistical learning theory, [7] applies a support vector machine (SVM), which finds a hyperplane in a high-dimensional space that separates two sets of training data onto its two sides while keeping their distances to the hyperplane as large as possible.


For large-scale environments, some studies have considered the scalability issues incurred in the training phase [13, 14, 11] and the positioning phase [18, 19]. In the training phase, an intuitive way to relieve the huge labor cost of training data collection is to collect fewer training locations. However, this also means that detailed signal strength patterns cannot be captured, so the positioning results will be coarse-grained. Hence, [13] proposes to generate a small number of virtual training locations from the actual ones by interpolation techniques. Similarly, multidimensional regression [14] is used to build a nonlinear mapping between the signal space and the physical space. Because manually collecting training samples with correct location labels is time-consuming, [13] suggests using unlabeled user traces to compensate for the loss of accuracy caused by a relatively small number of training locations. An unlabeled user trace is a sequence of continuously received signal strength measurements without location labels. By modeling user traces with a hidden Markov model, unlabeled user traces can be used to simplify the training process while keeping a certain degree of positioning accuracy. Furthermore, a calibration-free mechanism is the extreme solution to save labor cost. Without a training phase, signal propagation models can be used to predict the characteristics of signal strengths in the environment [4]. However, such systems have higher positioning errors because multipath fading and interference are hard to model precisely in an indoor environment.

The issue of reducing the real-time comparison cost in the positioning phase is discussed in [18, 19]. Both apply clustering techniques to the training locations so that only a subset of them needs to be searched. Reference [18] constructs clusters according to the physical coordinates of training locations. It argues that the estimated locations of two consecutive location queries should be very close in the physical space. Thus, only the training locations close to the previously estimated location need to be searched for the current query. However, it does not consider the actual signal space, and its searching range strongly depends on the query interval and the user mobility model.

In [19], training locations that observe their $q$ strongest signals from the same $q$ access points (APs) are grouped together. This clustering technique is simple but has several drawbacks. First, the top $q$ APs with the strongest signals at a fixed location may vary over time, thus causing false cluster selection. Second, the number of clusters is not a controllable parameter. In our work, the number of clusters is tunable, and false cluster selection can be effectively avoided.


Chapter 3 The Cluster-Based Pattern-Matching Localization Framework

The proposed clustering framework can be applied to most pattern-matching localization methods. It also consists of two phases: training and positioning. Fig. 3.1 depicts the structure of the framework.

3.1 The Training Phase

We assume that there are $m$ beacons (or APs), denoted as $B = \{b_1, b_2, \ldots, b_m\}$, deployed in the field. In this field, we define $n$ training locations $L = \{\ell_1, \ell_2, \ldots, \ell_n\}$. Let the feature space be $F = R^m$, where $R$ is the set of possible signal strengths. For each training location $\ell_i$, $i = 1..n$, we collect a sufficient number of training samples from the beacons and calculate the feature vector $\upsilon_i = [\upsilon_{i,1}, \upsilon_{i,2}, \ldots, \upsilon_{i,m}] \in F$ for $\ell_i$, where $\upsilon_{i,j}$ is the average received signal strength from $b_j$ at $\ell_i$. Training locations with similar feature vectors are then grouped together by clustering techniques. Specifically, we compute $k$ location sets $C_1, C_2, \ldots, C_k$ such that $C_i \subseteq L$, $i = 1..k$, and $\bigcup_{i=1}^{k} C_i = L$. The detailed clustering algorithms are given in Chapter 4.

Figure 3.1: The cluster-based localization framework (training phase: the training data of $\ell_1, \ldots, \ell_n$ are clustered into $C_1, \ldots, C_k$; positioning phase: real-time data are matched to a cluster $C^*$ and a location $\ell^*$).

For each location set $C_i$, its representative feature vector is $\omega_i = [\omega_{i,1}, \omega_{i,2}, \ldots, \omega_{i,m}] \in F$, where $\omega_{i,j}$ is the average signal strength from $b_j$ over the members of $C_i$:

$$\omega_{i,j} = \frac{\sum_{x:\ell_x \in C_i} \upsilon_{x,j}}{|C_i|}.$$

Note that two location sets may overlap, i.e., $C_i \cap C_j$ is not necessarily an empty set.
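The averaging formula for $\omega_{i,j}$ can be illustrated with a short sketch. It assumes feature vectors are plain lists of floats and that each cluster is stored as a list of training-location indices; the function name `representative_vectors` and this representation are illustrative choices, not taken from the thesis. Because clusters are index lists, overlapping clusters need no special handling.

```python
from typing import List, Sequence


def representative_vectors(
    features: Sequence[Sequence[float]],
    clusters: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return omega_i for each cluster C_i.

    features[x] is the feature vector upsilon_x (one average RSS per beacon);
    clusters[i] lists the indices x of the training locations l_x in C_i.
    An index may appear in several clusters (overlaps are allowed).
    """
    reps: List[List[float]] = []
    for members in clusters:
        if not members:
            raise ValueError("an empty cluster has no representative vector")
        m = len(features[members[0]])
        # omega_{i,j} = (sum of upsilon_{x,j} over l_x in C_i) / |C_i|
        reps.append([sum(features[x][j] for x in members) / len(members) for j in range(m)])
    return reps


features = [[-40.0, -70.0], [-42.0, -68.0], [-80.0, -45.0]]
print(representative_vectors(features, [[0, 1], [1, 2]]))  # [[-41.0, -69.0], [-61.0, -56.5]]
```

In the example, location 1 belongs to both clusters, so it contributes to both representative vectors.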

3.2 The Positioning Phase

In the positioning phase, when an object needs to determine its location, we measure its signal strength vector $s = [s_1, s_2, \ldots, s_m]$, where $s_j$ is the signal strength of $b_j$. Our goal is to determine the object's location in real time. In typical pattern-matching methods, $s$ is compared with all $n$ feature vectors in the location database. However, in a large-scale field (such as a wireless city), thousands or millions of vectors may need to be compared. By clustering training locations with similar feature vectors into groups, we first only need to compare $s$ against the representative feature vector $\omega_i$ of each $C_i$. As in most works, the similarity between $s$ and $C_i$ is defined as the Euclidean distance between their feature vectors in $F$:

$$sim(s, C_i) = \|s, \omega_i\| = \sqrt{\sum_{j=1}^{m} (s_j - \omega_{i,j})^2}.$$

Then, the most similar cluster, denoted by $C^*$, is selected, i.e., $C^* = \arg\min_{C_i} sim(s, C_i)$, and only the training locations in $C^*$ are searched further. Within $C^*$, we apply the Nearest Neighbor in Signal Space (NNSS) algorithm [4]: the user's location is estimated by comparing $s$ against each training location $\ell_i$ in $C^*$ according to the Euclidean distance in $F$,

$$sim(s, \ell_i) = \|s, \upsilon_i\| = \sqrt{\sum_{j=1}^{m} (s_j - \upsilon_{i,j})^2}.$$

The estimated location is $\ell^* = \arg\min_{\ell_i \in C^*} sim(s, \ell_i)$. Therefore, when the $k$ clusters are of roughly equal size, the computation cost is decreased from $O(|L|)$ to $O(k + |L|/k)$.
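The two-stage search (cluster selection, then NNSS inside $C^*$) can be sketched as follows. This is a minimal illustration assuming the same list-based representation as above; the names `sim` and `locate` are hypothetical, and the function also returns how many distances were evaluated, which is $k + |C^*|$ rather than $n$.

```python
import math
from typing import Sequence, Tuple


def sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance ||a, b|| in the feature space F."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def locate(
    s: Sequence[float],
    features: Sequence[Sequence[float]],
    clusters: Sequence[Sequence[int]],
    reps: Sequence[Sequence[float]],
) -> Tuple[int, int, int]:
    """Return (index of C*, index of l*, number of distance evaluations)."""
    # Stage 1: compare s with the k representative vectors and keep the closest cluster.
    c_star = min(range(len(reps)), key=lambda i: sim(s, reps[i]))
    # Stage 2: NNSS restricted to the training locations of C*.
    l_star = min(clusters[c_star], key=lambda x: sim(s, features[x]))
    return c_star, l_star, len(reps) + len(clusters[c_star])


features = [[-40.0, -70.0], [-42.0, -68.0], [-80.0, -45.0], [-78.0, -47.0]]
clusters = [[0, 1], [2, 3]]
reps = [[-41.0, -69.0], [-79.0, -46.0]]
print(locate([-79.0, -46.5], features, clusters, reps))  # (1, 3, 4)
```

If the selected cluster does not contain the object's true location, stage 2 cannot recover it; this is the false cluster selection problem discussed in Chapter 4.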


Chapter 4 Clustering Algorithms

Below, we propose several clustering algorithms to partition the location database. We start with the well-known k-means algorithm, followed by three enhanced clustering strategies.

4.1 k-means Algorithm

The k-means algorithm developed in [16] can be applied to our model. It runs in multiple iterations. In the $x$-th iteration, we form $k$ clusters $C_1^{(x)}, C_2^{(x)}, \ldots, C_k^{(x)}$. Initially, we construct $k$ seeds $\omega_1^{(0)}, \omega_2^{(0)}, \ldots, \omega_k^{(0)} \in F$, where each seed $\omega_i^{(0)}$, $i = 1..k$, is randomly selected from the set of feature vectors $\{\upsilon_1, \upsilon_2, \ldots, \upsilon_n\}$, and $\omega_i^{(0)} \ne \omega_j^{(0)}$ for all $i \ne j$. (Other ways to choose the initial values of $\omega_1^{(0)}, \omega_2^{(0)}, \ldots, \omega_k^{(0)}$ are discussed in [20]; here we adopt the random strategy.) With these seeds, we define cluster $C_i^{(1)}$ in the first iteration as follows:

$$\ell_j \in C_i^{(1)} \Leftrightarrow \omega_i^{(0)} = \arg\min_{\omega_y^{(0)}} \|\upsilon_j, \omega_y^{(0)}\|.$$

That is, $\ell_j$ is categorized as a member of cluster $C_i^{(1)}$ if $\upsilon_j$ is closest to the seed $\omega_i^{(0)}$ among all seeds. Then, for each cluster $C_i^{(1)}$, a new seed is obtained by averaging the received signal strengths of all $\ell_j \in C_i^{(1)}$:

$$\omega_i^{(1)} = avg\{\upsilon_j : \forall \ell_j \in C_i^{(1)}\}.$$

Figure 4.1: An example of the problem of the k-means algorithm (feature space with axes RSS from $b_1$ and RSS from $b_2$; clusters $C_1$, $C_2$, and $C_3$).

In the $x$-th iteration, for all $x \ge 2$, according to the seeds generated in the $(x-1)$-th iteration, we define cluster $C_i^{(x)}$ as follows:

$$\ell_j \in C_i^{(x)} \Leftrightarrow \omega_i^{(x-1)} = \arg\min_{\omega_y^{(x-1)}} \|\upsilon_j, \omega_y^{(x-1)}\|.$$

Similarly, from each cluster $C_i^{(x)}$, we calculate another seed $\omega_i^{(x)} = avg\{\upsilon_j : \forall \ell_j \in C_i^{(x)}\}$. The regrouping process is repeated until the condition $C_j^{(x)} = C_j^{(x+1)}$ is satisfied for all $j = 1..k$. At last, we obtain $\omega_j = \omega_j^{(x)}$ for each location set $C_j = C_j^{(x)}$.
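The iteration above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the thesis's code: seeds are drawn with Python's `random.Random.sample` from the distinct feature vectors (so they are pairwise different), squared distances replace distances in the argmin (same minimizer), and a cluster that becomes empty keeps its previous seed, a choice the text does not specify. The names `kmeans` and `_sq_dist` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    features: Sequence[Sequence[float]], k: int, rng_seed: int = 0, max_iter: int = 100
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return (clusters, seeds): clusters[i] holds location indices, seeds[i] is omega_i."""
    distinct = sorted({tuple(v) for v in features})
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct feature vectors")
    rng = random.Random(rng_seed)
    # omega_i^(0): k pairwise-distinct feature vectors chosen at random.
    seeds = [list(v) for v in rng.sample(distinct, k)]
    assign: List[int] = []
    for _ in range(max_iter):
        # l_j joins the cluster whose current seed is closest to upsilon_j.
        new_assign = [min(range(k), key=lambda i: _sq_dist(v, seeds[i])) for v in features]
        if new_assign == assign:  # C_j^(x) == C_j^(x+1) for every j: converged
            break
        assign = new_assign
        for i in range(k):
            members = [features[j] for j in range(len(features)) if assign[j] == i]
            if members:  # assumption: an empty cluster keeps its previous seed
                seeds[i] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[j for j in range(len(features)) if assign[j] == i] for i in range(k)]
    return clusters, seeds


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
clusters, seeds = kmeans(pts, 2, rng_seed=1)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

Each location lands in exactly one cluster here; the strategies in Section 4.2 start from this partition and add overlaps.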

Ideally, in the positioning phase, when an object provides its current signal strength vector $s$, we would expect the correct cluster, i.e., the one with the most similar feature vector, to be selected. However, due to radio signal fluctuation, this cannot always be achieved. Fig. 4.1 shows an example with three clusters in a feature space. Because of signal fluctuation, the signal strength vector $s$ of an object located at $\ell_1$ may fall into multiple clusters. As shown by dotted … discussion. According to the k-means algorithm, if $s$ is in the gray region, it is more similar to cluster $C_2$ than to $C_1$ and $C_3$. Hence, a false cluster selection happens. Clearly, this situation is more serious near cluster boundaries.

4.2 Clustering Techniques Allowing Overlaps

The problem shown in Fig. 4.1 calls for clusters with a certain degree of overlap. Below, we propose three clustering schemes extended from the k-means algorithm that allow a training location to join multiple clusters. We define the overlapping degree $\lambda$ as the average number of clusters a training location joins. Clearly, this increases the searching complexity in the positioning phase to $O(k + \lambda \times |L|/k)$.

All schemes partition $L$ into $k$ sets as the k-means algorithm does. The main difference is how they determine which clusters a training location joins. In the first, the multi-nearest-neighbor strategy, each training location joins the few clusters closest to it. In the second, the Voronoi-based strategy, the overlapping degree is determined by the geometric characteristics of the cluster distribution. The last, the probability-based strategy, adaptively adjusts the overlapping degree of each training location according to the level of environmental noise.

4.2.1 Multi-Nearest-Neighbor Strategy

The multi-nearest-neighbor strategy assigns a constant overlapping degree $\lambda_M$ to all training locations. As mentioned before, $k$ clusters of training locations $\{C_1, C_2, \ldots, C_k\}$ are obtained by the k-means clustering algorithm. Then, each training location $\ell_i$ joins the top $\lambda_M$ clusters closest to it in $F$. Hence, the searching space is increased from $O(k + |L|/k)$ to $O(k + \lambda_M \times |L|/k)$ compared with the k-means clustering algorithm.

This strategy allows each training location to join multiple closest clusters in the feature space, instead of the single one in the k-means clustering algorithm. It is an intuitive solution to the signal fluctuation problem: if samples collected at a location may be assigned to several nearby clusters, there is no reason to make this training location join only one cluster. For the example illustrated in Section 4.1, we can avoid incorrect location estimates caused by false cluster selection if $\ell_1$ is allowed to join its two closest clusters, $C_2$ and $C_3$, simultaneously.
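The multi-nearest-neighbor rule can be sketched as below, assuming the representative vectors come from a prior k-means run. The name `mnn_overlap` is hypothetical; `heapq.nsmallest` picks the $\lambda_M$ closest representatives for each location.

```python
import heapq
from typing import List, Sequence


def mnn_overlap(
    features: Sequence[Sequence[float]], reps: Sequence[Sequence[float]], lam: int
) -> List[List[int]]:
    """Each l_i joins the lam clusters whose representative vectors are closest to upsilon_i."""
    k = len(reps)
    if not 1 <= lam <= k:
        raise ValueError("lambda_M must be between 1 and k")
    clusters: List[List[int]] = [[] for _ in range(k)]
    for i, v in enumerate(features):
        d2 = [sum((a - b) ** 2 for a, b in zip(v, w)) for w in reps]
        for c in heapq.nsmallest(lam, range(k), key=d2.__getitem__):
            clusters[c].append(i)
    return clusters


reps = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
print(mnn_overlap([[1.0, 0.0], [6.0, 0.0], [0.0, 9.0]], reps, 2))  # [[0, 1, 2], [0, 1], [2]]
```

Every location appears in exactly $\lambda_M$ clusters, so the average cluster size becomes $\lambda_M \times |L|/k$, matching the complexity stated above.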

4.2.2 Voronoi-based Strategy

Although the multi-nearest-neighbor strategy is simple to implement and makes it easy to control the average searching space, the parameter $\lambda_M$ is hard to determine. If $\lambda_M$ is too small, it may not compensate for the effect of signal fluctuations, and the problem of false cluster selection remains. On the other hand, if $\lambda_M$ is too large, some training locations may join unnecessary clusters, causing redundancy. We observe that a training location whose feature vector $\upsilon_i$ is close to the center of a cluster in $F$ should not join the same number of clusters as another training location whose feature vector $\upsilon_j$ is near the periphery of a cluster.

For this reason, we next propose the Voronoi-based strategy. After performing the k-means algorithm, $\{\upsilon_1, \upsilon_2, \ldots, \upsilon_n\}$ is decomposed into $k$ partitions centered at $\omega_j$, $1 \le j \le k$. It can be observed that $\|\upsilon_i, \omega_x\| \le \|\upsilon_i, \omega_y\|$ for all $y \ne x$ if $\ell_i \in C_x$. This property is equivalent to that of a Voronoi diagram [21], in which all points in a Voronoi cell are closest to the site (generator point) of that cell. Thus, the members of a cluster $C_x$ are contained in a Voronoi cell $V_x$ with a … clusters and, conversely, let the others join fewer clusters, we can improve the effectiveness of the overlapping technique.

Figure 4.2: An example of the Voronoi-based overlapping mechanism: (a) Voronoi cells with the overlapping regions $R_{1,2}$, $R_{1,3}$, and $R_{2,3}$ around the edges $e_{1,2}$, $e_{1,3}$, and $e_{2,3}$; (b) the overlapping region $R_{2,3}$ and the points $a$ and $b$ used to compute the distance from $\upsilon_1$ to $e_{2,3}$.

Motivated by the observation above, we propose the Voronoi-based strategy. For each pair of neighboring Voronoi cells $V_x$ and $V_y$, we formally define an overlapping region $R_{x,y}$ ($x < y$) such that any training location whose feature vector lies in $R_{x,y}$ joins both $C_x$ and $C_y$. For example, in Fig. 4.2(a), there are three Voronoi cells $V_1$, $V_2$, and $V_3$, separated by three Voronoi edges $e_{1,2}$, $e_{2,3}$, and $e_{1,3}$, and the three overlapping regions $R_{1,2}$, $R_{1,3}$, and $R_{2,3}$ are shaded. The feature vector $\upsilon_1$ is inside $R_{2,3}$, and $\upsilon_2$ is located in both $R_{2,3}$ and $R_{1,3}$. As a result, $\ell_1$ joins $C_2$ and $C_3$, while $\ell_2$ joins $C_1$, $C_2$, and $C_3$. An overlapping region $R_{x,y}$ can be regarded as an expansion of the Voronoi edge $e_{x,y}$ along the edges incident to the endpoints of $e_{x,y}$, like the gray region $R_{2,3}$ shown in Fig. 4.2(b). The expansion range $\delta$ controls the size of $R_{x,y}$ by expanding both sides of $e_{x,y}$.

To determine in which overlapping regions a feature vector $\upsilon_i$ is located, we first determine the Voronoi cell $V_x$ such that $\ell_i \in C_x$ and then each neighboring Voronoi cell $V_y$ of $V_x$. Let $dist(\upsilon_i, e_{x,y})$ be the perpendicular distance between $\upsilon_i$ and the Voronoi edge $e_{x,y}$. If $dist(\upsilon_i, e_{x,y}) < \delta$, then $\upsilon_i$ is in $R_{x,y}$. Hence, $\ell_i$ joins $C_y$ in the Voronoi-based strategy.

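However, because computing $e_{x,y}$ in a high-dimensional feature space is costly [21], we do not calculate $dist(\upsilon_i, e_{x,y})$ directly. Instead, we use the projection of the vector $\overrightarrow{\omega_x \upsilon_i}$ onto the line $\overline{\omega_x \omega_y}$ to obtain $dist(\upsilon_i, e_{x,y})$. Again in Fig. 4.2(b), we want to determine $dist(\upsilon_1, e_{2,3}) = |\upsilon_1 a|$. First, $\overrightarrow{\omega_3 \upsilon_1}$ is projected onto $\overline{\omega_3 \omega_2}$ as $\overrightarrow{\omega_3 \upsilon_1'}$. Let $b$ be the intersection point of $\overline{\omega_3 \omega_2}$ and $e_{2,3}$. The edge $e_{2,3}$ and $\overline{\omega_3 \omega_2}$ are mutually orthogonal, which is a property of a Voronoi diagram. Therefore, the points $\upsilon_1$, $\upsilon_1'$, $b$, and $a$ form a rectangle, so $|\upsilon_1' b| = |\upsilon_1 a|$. Since $e_{2,3}$ lies on the perpendicular bisector of $\overline{\omega_2 \omega_3}$, $|\omega_3 b| = |\omega_2 \omega_3|/2$, and since $\upsilon_1$ lies in $V_3$, $|\omega_3 \upsilon_1'| \le |\omega_3 b|$. Finally, $|\upsilon_1' b|$ can be obtained by

$$|\upsilon_1' b| = \big|\, |\omega_3 b| - |\omega_3 \upsilon_1'| \,\big| = \frac{|\omega_2 \omega_3|}{2} - \frac{\overrightarrow{\omega_3 \upsilon_1} \cdot \overrightarrow{\omega_3 \omega_2}}{\|\overrightarrow{\omega_3 \omega_2}\|}.$$

Compared with finding $|\upsilon_1 a|$ directly, this method saves computation.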
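The projection formula above needs only a dot product and a norm, so it works in any dimension. The sketch below is an illustration, not the thesis's code: the names `dist_to_bisector` and `joins_neighbor` are hypothetical, and the value computed is the distance to the whole bisecting hyperplane that contains $e_{x,y}$, which is what the rectangle argument yields.

```python
import math
from typing import Sequence


def dist_to_bisector(v: Sequence[float], w_x: Sequence[float], w_y: Sequence[float]) -> float:
    """|v' b| = |w_x w_y| / 2 - (w_x->v . w_x->w_y) / ||w_x->w_y||.

    Perpendicular distance from v to the bisector of w_x and w_y (the hyperplane
    containing e_{x,y}); non-negative when v lies in cell V_x.
    """
    d = [b - a for a, b in zip(w_x, w_y)]                          # vector w_x -> w_y
    norm = math.sqrt(sum(c * c for c in d))                        # |w_x w_y|
    proj = sum((p - a) * c for p, a, c in zip(v, w_x, d)) / norm   # |w_x v'|
    return norm / 2 - proj


def joins_neighbor(v: Sequence[float], w_x: Sequence[float], w_y: Sequence[float],
                   delta: float) -> bool:
    """Voronoi-based rule: l_i in C_x also joins C_y when dist(v, e_{x,y}) < delta."""
    return dist_to_bisector(v, w_x, w_y) < delta


print(dist_to_bisector([1.0, 3.0], [0.0, 0.0], [4.0, 0.0]))  # 1.0
```

Near the endpoints of a Voronoi edge, the distance to the hyperplane can be smaller than the distance to the edge segment itself; this is consistent with describing $R_{x,y}$ as an expansion of $e_{x,y}$ along its incident edges.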

The above procedure works under the assumption that each Voronoi cell knows its neighborhood information. For example, $V_3$ knows that $V_1$ and $V_2$ are its neighbors in Fig. 4.2(b). Unfortunately, this information cannot be obtained until the relationship between Voronoi cells is completely discovered, which is as hard as finding the Voronoi edges. Note that the k-means clustering algorithm only finds the site (the center $\omega_x$) of each cell. Here, we propose a simple technique, called neighborhood speculation, to guess the neighborhood relationship. It is based on the observation that if two Voronoi cells $V_x$ and $V_y$ are neighbors, then the midpoint of $\overline{\omega_x \omega_y}$ is usually closer to $\omega_x$ and $\omega_y$ than to any other center. Therefore, we use the position of the midpoint of $\overline{\omega_x \omega_y}$ to speculate on the relationship between $V_x$ and $V_y$: if the midpoint lies inside a cell other than $V_x$ and $V_y$, we tend to believe that $V_x$ and $V_y$ are not neighbors.
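Neighborhood speculation needs only the cluster centers, so it can be sketched with pairwise midpoint tests. This is an assumption-laden illustration: the name `speculate_neighbors` is hypothetical, and a midpoint that is exactly equidistant to a third center is treated as not lying inside that center's cell.

```python
from typing import Dict, Sequence, Set


def speculate_neighbors(reps: Sequence[Sequence[float]]) -> Dict[int, Set[int]]:
    """Guess that V_x and V_y are neighbors unless the midpoint of w_x w_y
    lies strictly inside some other cell V_z."""
    def sq(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    k = len(reps)
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(k)}
    for x in range(k):
        for y in range(x + 1, k):
            mid = [(a + b) / 2 for a, b in zip(reps[x], reps[y])]
            own = sq(mid, reps[x])  # same as sq(mid, reps[y])
            # The midpoint is inside another cell if some other center is strictly closer.
            if all(sq(mid, reps[z]) >= own for z in range(k) if z not in (x, y)):
                neighbors[x].add(y)
                neighbors[y].add(x)
    return neighbors


print(speculate_neighbors([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]))
# {0: {1}, 1: {0, 2}, 2: {1}}
```

With three collinear centers, the midpoint of the outer pair coincides with the middle center, so the outer cells are (correctly) guessed not to be neighbors. The test costs $O(k^3)$ distance evaluations, which is done once off-line.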

4.2.3 Probability-based Strategy

So far, the above overlapping strategies cannot adaptively adjust the overlapping degree of each training location to different levels of environmental noise. Moreover, they provide no guarantee on the probability of correct cluster selection. Hence, we propose the probability-based strategy, which overcomes these problems through an off-line analysis.


As we have mentioned, the received samples are uncertain because of signal fluctuations. This uncertainty is usually modeled by a zero-mean Gaussian distribution. Hence, we denote the possible received samples at $\ell_i$ as a vector of random variables $S_i = [r_{i,1} + N_1, r_{i,2} + N_2, \ldots, r_{i,m} + N_m]$, where $r_{i,j}$ is the expected signal strength of $b_j$ at $\ell_i$ without fluctuations and $N_j \sim N(0, \sigma_j)$, $j = 1..m$, are independent zero-mean normal random variables with standard deviations $\sigma_j$.

Then, we define a random variable $Z_{x,y} = X - Y$, where $X$ is the square of the Euclidean distance between a random sample $S_i$ collected at a fixed location $\ell_i$ and a cluster feature vector $\omega_x$, and $Y$ is the square of the Euclidean distance between $S_i$ and another cluster feature vector $\omega_y$. Then we have

$$Z_{x,y} = X - Y = \|S_i, \omega_x\|^2 - \|S_i, \omega_y\|^2 = \sum_{j=1}^{m} \left\{ [(r_{i,j} + N_j) - \omega_{x,j}]^2 - [(r_{i,j} + N_j) - \omega_{y,j}]^2 \right\} = \sum_{j=1}^{m} \left[ 2(\omega_{y,j} - \omega_{x,j})(r_{i,j} + N_j) - (\omega_{y,j}^2 - \omega_{x,j}^2) \right]. \quad (4.1)$$

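Assume that the number of training samples is large, so that we can expect $\upsilon_{i,j} = r_{i,j}$. Let $\Theta_j = \omega_{y,j}^2 - \omega_{x,j}^2$ and $\Phi_j = \omega_{y,j} - \omega_{x,j}$. Hence,

$$Z_{x,y} = \sum_{j=1}^{m} 2\Phi_j N_j + \sum_{j=1}^{m} (2\Phi_j \upsilon_{i,j} - \Theta_j). \quad (4.2)$$

Because the $N_j \sim N(0, \sigma_j)$, $j = 1..m$, are independent and $\Phi_j$, $\Theta_j$, and $\upsilon_{i,j}$ are constants, $Z_{x,y}$ is a constant plus a linear combination of independent normal random variables and is therefore still normally distributed. Its mean and variance are

$$\mu_{x,y} = \sum_{j=1}^{m} (2\Phi_j \upsilon_{i,j} - \Theta_j), \qquad \sigma_{x,y}^2 = \sum_{j=1}^{m} (2\Phi_j \sigma_j)^2.$$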


Therefore, the probability of the event $X < Y$ is

$$Pr(Z_{x,y} < 0) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi\sigma_{x,y}^2}} \exp\left(-\frac{(z_{x,y} - \mu_{x,y})^2}{2\sigma_{x,y}^2}\right) dz_{x,y}. \quad (4.3)$$
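Because $Z_{x,y}$ is Gaussian, the integral in Eq. (4.3) is the normal CDF at 0, which can be written with the error function as $\frac{1}{2}\left(1 + \mathrm{erf}\left(-\mu_{x,y} / \sqrt{2\sigma_{x,y}^2}\right)\right)$. The sketch below computes $\mu_{x,y}$ and $\sigma_{x,y}^2$ term by term from Eq. (4.2) and then evaluates this closed form with `math.erf`; the function name `pr_z_negative` is hypothetical.

```python
import math
from typing import Sequence


def pr_z_negative(u: Sequence[float], w_x: Sequence[float], w_y: Sequence[float],
                  sigma: Sequence[float]) -> float:
    """Pr(Z_{x,y} < 0): a sample taken where the feature vector is u is closer to
    w_x than to w_y (Eqs. (4.2)-(4.3)). sigma[j] is the standard deviation of N_j."""
    phi = [b - a for a, b in zip(w_x, w_y)]                        # Phi_j
    theta = [b * b - a * a for a, b in zip(w_x, w_y)]              # Theta_j
    mu = sum(2 * p * v - t for p, v, t in zip(phi, u, theta))      # mu_{x,y}
    var = sum((2 * p * s) ** 2 for p, s in zip(phi, sigma))        # sigma_{x,y}^2
    if var == 0.0:  # degenerate case: Z_{x,y} is the constant mu
        return 1.0 if mu < 0 else 0.0
    # Normal CDF evaluated at 0, expressed with the error function.
    return 0.5 * (1.0 + math.erf(-mu / math.sqrt(2.0 * var)))


print(pr_z_negative([0.0, 0.0], [0.0, 0.0], [2.0, 0.0], [1.0, 1.0]))  # about 0.8413
```

In the example, $\mu_{x,y} = -4$ and $\sigma_{x,y} = 4$, so the result is the standard normal CDF at 1. A location equidistant from both centers gives exactly 0.5.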

Since $Z_{x,y}$ is a continuous random variable, $Pr(X = Y) = 0$, so we may treat $X < Y$ and $X \le Y$ as equivalent. For a randomly collected sample $S_i$, if $\|S_i, \omega_x\|$ is smaller than $\|S_i, \omega_j\|$ for all $j = 1..k$, $j \ne x$, the estimated cluster will be $C_x$. We can express the probability of this event by

$$Pr(C^* = C_x) = Pr(Z_{x,1} \le 0, Z_{x,2} \le 0, \ldots, Z_{x,x-1} \le 0, Z_{x,x+1} \le 0, \ldots, Z_{x,k} \le 0). \quad (4.4)$$

For ease of computation, we assume that the events $Z_{x,j} \le 0$ for all $j = 1..k$, $j \ne x$, are independent. Thus, Eq. (4.4) can be rewritten as

$$Pr(C^* = C_x) = \prod_{j=1, j \ne x}^{k} Pr(Z_{x,j} \le 0). \quad (4.5)$$

In the multi-nearest-neighbor strategy, $\ell_i$ is allowed to join the top $\lambda_M$ closest clusters in $F$. In this strategy, by contrast, $\ell_i$ may join a different number of clusters based on Eq. (4.5). A probability threshold $\xi$ is defined here to denote the expected probability of a correct cluster selection. We sort the clusters in descending order of $Pr(C^* = C_x)$, $x = 1..k$, and $\ell_i$ joins the clusters one by one in this order until $\sum_{x} Pr(C^* = C_x) \ge \xi$, where the sum is taken over the clusters joined so far.
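The selection rule can be sketched end to end: Eq. (4.5) multiplies the pairwise probabilities, and the location joins clusters in descending order of probability until the running sum reaches $\xi$. This is a hypothetical, self-contained illustration (`_pr_z_le_0`, `pr_select`, and `prob_based_membership` are not names from the thesis). Because Eq. (4.5) rests on an independence approximation, the probabilities need not sum to exactly 1; if $\xi$ is never reached, the location simply joins every cluster.

```python
import math
from typing import List, Sequence, Tuple


def _pr_z_le_0(u, w_x, w_y, sigma) -> float:
    """Pr(Z_{x,y} <= 0) from the mean and variance derived from Eq. (4.2)."""
    mu = sum(2 * (b - a) * v - (b * b - a * a) for a, b, v in zip(w_x, w_y, u))
    var = sum((2 * (b - a) * s) ** 2 for a, b, s in zip(w_x, w_y, sigma))
    if var == 0.0:
        return 1.0 if mu <= 0 else 0.0
    return 0.5 * (1.0 + math.erf(-mu / math.sqrt(2.0 * var)))


def pr_select(u, reps, sigma, x: int) -> float:
    """Eq. (4.5): product over j != x of Pr(Z_{x,j} <= 0), assuming independence."""
    p = 1.0
    for j, w in enumerate(reps):
        if j != x:
            p *= _pr_z_le_0(u, reps[x], w, sigma)
    return p


def prob_based_membership(u: Sequence[float], reps: Sequence[Sequence[float]],
                          sigma: Sequence[float], xi: float) -> Tuple[List[int], float]:
    """Clusters joined by a location with feature vector u, and their total probability."""
    probs = [pr_select(u, reps, sigma, x) for x in range(len(reps))]
    joined: List[int] = []
    total = 0.0
    for x in sorted(range(len(reps)), key=lambda c: probs[c], reverse=True):
        joined.append(x)
        total += probs[x]
        if total >= xi:
            break
    return joined, total


reps = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
print(prob_based_membership([0.0, 0.0], reps, [1.0, 1.0], 0.9)[0])  # [0]
print(prob_based_membership([5.0, 0.0], reps, [1.0, 1.0], 0.9)[0])  # [0, 1]
```

A location at a cluster center joins only its own cluster, while a location on the boundary between $C_0$ and $C_1$ joins both, which is the adaptive overlapping degree described above.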

In summary, this strategy provides a more effective and efficient way to determine the overlapping degree of each training location. It has two key advantages. First, it assigns the overlapping degree of a training location according to its probability of correct cluster selection. Second, whatever the noise level of the environment, it ensures that the clusters to which a training location belongs cover most of the region in $F$ over which its signal strengths may fluctuate, with a total selection probability of at least $\xi$ as estimated by Eq. (4.5).


Chapter 5 Simulations

In this chapter, we conduct experiments to evaluate the performance of the proposed framework and study the impact of the parameters it uses.

5.1 Simulation Model

We consider a 100 m × 100 m sensing field. Eight beacons are placed at (0, 0), (0, 99), (99, 0), (0, 50), (50, 0), (50, 99), (99, 50), and (99, 99). At each of the other 9992 grid points (the 100 × 100 integer grid points minus the eight beacon positions), we collect 200 training samples in the training phase. The log-distance path loss model [22] is used to model signal propagation:

$$PL(d) = PL(d_0) + 10\alpha \log_{10}\left(\frac{d}{d_0}\right) + N(0, \sigma), \quad (5.1)$$

where $d_0 = 1$ is the reference distance and $d$ is the distance between the transmitter and the receiver. $\alpha$ denotes the path loss exponent, typically from 2 to 6, and $N(0, \sigma)$ is a zero-mean normally distributed random variable with standard deviation $\sigma$. Also, the transmit power $P_t$ is …

• Positioning error: the distance between the estimated location and the true location. We use this metric to compare our proposed framework with other fingerprinting-based methods.

• Hit rate: to gain insight into the impact of clustering on localization, the hit rate is defined as the probability of correctly predicting a cluster that contains the true location. Obviously, the higher the hit rate, the smaller the positioning error caused by false cluster selection.

• Average cluster size: this metric reflects the reduction in computation. According to our cluster-based localization framework, the total number of comparisons is $O(k + \lambda \times |L|/k)$. Since the number of clusters is a tunable parameter in our clustering strategies, we focus on the average number of training locations in a cluster, i.e., $\sum_{i=1}^{k} |C_i| / k$.
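The two clustering-specific metrics above can be computed as in the following sketch. It assumes each test sample is recorded as a pair (true location index, selected cluster index); the function names and that record format are illustrative assumptions.

```python
from typing import Iterable, List, Sequence, Tuple


def average_cluster_size(clusters: Sequence[Sequence[int]]) -> float:
    """sum_{i=1..k} |C_i| / k; with overlaps this equals lambda * |L| / k."""
    return sum(len(c) for c in clusters) / len(clusters)


def hit_rate(trials: Iterable[Tuple[int, int]], clusters: Sequence[Sequence[int]]) -> float:
    """trials: (true location index, selected cluster index) pairs, one per test sample.

    A hit occurs when the selected cluster contains the true location.
    """
    members = [set(c) for c in clusters]
    outcomes: List[bool] = [loc in members[c] for loc, c in trials]
    if not outcomes:
        raise ValueError("no test samples")
    return sum(outcomes) / len(outcomes)


clusters = [[0, 1, 2], [0, 1], [2]]
print(average_cluster_size(clusters))                        # 2.0
print(hit_rate([(0, 1), (2, 1), (2, 2), (1, 0)], clusters))  # 0.75
```

Overlaps raise the hit rate and the average cluster size together, which is the trade-off the simulations explore.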

We evaluate the following clustering techniques: the k-means algorithm, the Joint Clustering (abbreviated as JC) technique in [19], the multi-nearest-neighbor (MNN) strategy, the Voronoi-based (Voronoi) strategy, and the probability-based (Prob) strategy. A good clustering strategy should have a higher hit rate, a smaller average cluster size, and a lower positioning error.

The signal propagation model in Eq. (5.1) is used again in the positioning phase to simulate test samples. For each grid point, 200 test samples are generated, and the clustering technique under evaluation is applied to each sample to determine its nearest cluster. A hit event occurs when the selected cluster contains the location at which the sample was generated.
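The simulated sampling can be sketched from Eq. (5.1) and the beacon layout above. The exponent `alpha`, the transmit power `pt_dbm`, and the reference loss `pl_d0` are placeholder values chosen only for illustration, since the thesis's settings are not given in this section, and received power is assumed to be $P_t - PL(d)$. The grid construction also checks the 9992-point count.

```python
import math
import random
from typing import List, Tuple

BEACONS = [(0, 0), (0, 99), (99, 0), (0, 50), (50, 0), (50, 99), (99, 50), (99, 99)]
_BEACON_SET = set(BEACONS)
# Training/test grid points: integer coordinates 0..99 that are not beacon positions.
GRID = [(x, y) for x in range(100) for y in range(100) if (x, y) not in _BEACON_SET]


def sample_rss(pos: Tuple[int, int], rng: random.Random, sigma: float,
               alpha: float = 3.0, pt_dbm: float = 0.0, pl_d0: float = 40.0,
               d0: float = 1.0) -> List[float]:
    """One simulated RSS vector (dBm) at pos, one entry per beacon, via Eq. (5.1).

    alpha, pt_dbm and pl_d0 are illustrative placeholders, not the thesis's settings.
    """
    rss = []
    for bx, by in BEACONS:
        d = max(math.hypot(pos[0] - bx, pos[1] - by), d0)  # never below d0
        path_loss = pl_d0 + 10 * alpha * math.log10(d / d0) + rng.gauss(0.0, sigma)
        rss.append(pt_dbm - path_loss)  # assumed: received power = P_t - path loss
    return rss


rng = random.Random(0)
test_samples = {pos: [sample_rss(pos, rng, sigma=4.0) for _ in range(200)] for pos in GRID[:3]}
print(len(GRID))  # 9992
```

Each test sample would then be passed to the cluster-selection step, and a hit recorded when the chosen cluster contains `pos`.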

Figure 5.1: The comparison of the average positioning error under different $\sigma$ (average positioning error in m versus standard deviation $\sigma$, for JC with $q = 3$ and k-means with $k$ = 10, 16, and 22).

To implement the probability-based strategy, we have to evaluate the integrals of Eq. (4.3) inside the product of Eq. (4.5). However, this is computationally inefficient, and a precise value is hard to obtain. Hence, in our simulations, we approximate $Pr(C^* = C_x)$ by randomly generating $h$ samples for each grid point $\ell_i$ according to the path loss model. After determining the closest cluster for each of these $h$ samples, we obtain $Pr(C^* = C_x) = h_x / h$, where $h_x$ is the number of samples whose closest cluster is $C_x$. In our simulation model, $h$ is set to 1000.
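The Monte Carlo approximation $Pr(C^* = C_x) = h_x / h$ can be sketched as follows. As a simplifying assumption, each sample is the location's feature vector plus independent Gaussian noise per beacon (the noise model of Section 4.2.3) rather than a fresh draw from the path loss model; `estimate_selection_probs` is a hypothetical name.

```python
import random
from collections import Counter
from typing import List, Sequence


def estimate_selection_probs(u: Sequence[float], reps: Sequence[Sequence[float]],
                             sigma: Sequence[float], h: int = 1000,
                             rng_seed: int = 0) -> List[float]:
    """Monte Carlo estimate of Pr(C* = C_x) = h_x / h for every cluster x.

    Each of the h samples is u plus independent N(0, sigma_j) noise per beacon,
    standing in for samples drawn from the path loss model.
    """
    rng = random.Random(rng_seed)
    counts: Counter = Counter()
    for _ in range(h):
        s = [uj + rng.gauss(0.0, sj) for uj, sj in zip(u, sigma)]
        nearest = min(range(len(reps)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(s, reps[i])))
        counts[nearest] += 1  # h_x: samples whose closest cluster is C_x
    return [counts[x] / h for x in range(len(reps))]


reps = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
print(estimate_selection_probs([5.0, 0.0], reps, [1.0, 1.0]))
```

For a location on the boundary between the first two clusters, both estimates come out near 0.5. Unlike Eq. (4.5), the estimates always sum to 1 because no independence assumption is involved.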

5.2 Impact of Clustering on the Average Positioning Error

We first investigate the impact of clustering on the average positioning error. To demonstrate that clustering does not sacrifice the accuracy of location estimation, we only compare the baseline clustering method (i.e., the k-means algorithm) with the JC algorithm. We vary the standard deviation $\sigma$ in the path loss model to show the effectiveness of clustering. Note that JC generates only 14 to 18 clusters in our simulation, so we set the number of clusters for the k-means algorithm to 10, 16, and 22. Fig. 5.1 shows the experimental results: JC incurs a larger average positioning error under different noise levels, whereas our proposed framework achieves a lower average positioning error than JC.


5.3 Sensitivity Performance Study for Clustering Strategies

The above experiment shows that our framework with the baseline clustering algorithm (i.e., k-means) outperforms the existing algorithm (i.e., JC). Before comparing k-means with the other proposed strategies, we conduct a sensitivity study of the clustering strategies to determine the best parameter for each one. For each strategy, the number of clusters k is set to 50, 100, and 200. Fig. 5.2 shows the performance of the three clustering strategies as their parameters vary (λ_M in MNN, δ in Voronoi, and ξ in Prob). To show their differences, we compare the strategies in terms of hit rate and average cluster size. As Fig. 5.2(a) and Fig. 5.2(c) show, the hit rates of MNN and Voronoi follow similar trends. However, Fig. 5.2(b) and Fig. 5.2(d) reveal that Voronoi yields a smaller average cluster size in a lower-density environment (k = 50). In other words, Voronoi is more suitable for a sparse environment. This agrees with our claim that Voronoi effectively prevents training locations from joining unnecessary clusters.

The performance of the Prob strategy under different ξ is shown in Fig. 5.2(e) and Fig. 5.2(f). Comparing it with the other strategies, we make two observations. First, Voronoi and Prob achieve almost the same hit rate, and their average cluster sizes are similar. Second, the hit rate of Prob always stays above the required threshold ξ in every scenario (i.e., for every setting of k). This indicates that the Prob strategy adapts itself automatically to different environments. As a result, the Prob strategy is superior to the Voronoi strategy.
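The sketch below shows one way to read the Prob strategy and why its hit rate tracks ξ. This reading is our interpretation, not a definition taken from the text above. We assume that each training location ℓ_i joins the fewest clusters whose estimated probabilities Pr(C* = C_x) add up to at least ξ. If the estimates are accurate, a sample measured at ℓ_i then selects a cluster containing ℓ_i with probability at least ξ. This is consistent with the observation that the hit rate of Prob stays above ξ. The function name `prob_overlap` is hypothetical.

```python
def prob_overlap(probabilities, xi):
    """Hypothetical Prob assignment for one training location: return the
    indices of the fewest clusters whose probabilities sum to at least xi."""
    # Visit clusters from most to least likely (ties keep index order).
    order = sorted(range(len(probabilities)),
                   key=lambda x: probabilities[x], reverse=True)
    chosen, total = [], 0.0
    for x in order:
        chosen.append(x)
        total += probabilities[x]
        if total >= xi:
            break
    return sorted(chosen)

# Example: Pr(C* = C_x) estimates for one training location, xi = 0.85.
print(prob_overlap([0.05, 0.7, 0.2, 0.05], 0.85))  # [1, 2]
```

Taking the most likely clusters first guarantees the smallest number of memberships that reaches ξ. A noisier location, whose probability mass is spread over more clusters, automatically joins more of them.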

Figure 5.2: Sensitivity study of the overlapping strategies for k = 50, 100, and 200. Panels (a) and (b) show hit rate and average cluster size versus λ_M for the MNN overlapping strategy. Panels (c) and (d) show hit rate and average cluster size versus δ for the Voronoi-based overlapping strategy. Panels (e) and (f) show hit rate and average cluster size versus ξ for the probability-based overlapping strategy.

5.4 Performance Comparison of Clustering Strategies

Based on the sensitivity study in Section 5.3, we select MNN with λ_M = 4, Voronoi with δ = 5, and Prob with ξ = 0.9. We then compare their performance as the noise level varies. The results are shown in Fig. 5.3. Fig. 5.3(a) shows that in a very noisy environment (i.e., a larger standard deviation), the Prob strategy achieves a higher hit rate than the others. The hit rates of MNN and Voronoi are very high when σ ≤ 3.5, but they decrease quickly as the noise level increases. Without the overlapping technique, k-means performs worst in terms of hit rate. However, Fig. 5.3(b) shows that k-means has the smallest average cluster size. If positioning latency matters more, k-means is a good choice, because a smaller average cluster size implies a shorter latency. Both MNN and Voronoi also keep a reasonable average cluster size at larger noise levels. On the other hand, if the accuracy of location estimation matters more, the Prob strategy should be employed. Hence, a suitable clustering technique should be chosen according to the requirements of the application.
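The two metrics behind this trade-off can be computed as sketched below. We assume the following definitions:

* **Hit rate:** the fraction of test samples whose true training location belongs to the selected cluster.
* **Average cluster size:** the mean number of training locations per cluster.

The toy clusters, the selector, and the name `evaluate` are hypothetical. Overlap raises the chance that the true location lies in the selected cluster, but it also enlarges the clusters, which is the trade-off discussed above.

```python
def evaluate(tests, clusters, select):
    """Return (hit rate, average cluster size).

    tests:    list of (sample, true_location_index) pairs
    clusters: list of sets of training-location indices (may overlap)
    select:   function mapping a sample to the index of the chosen cluster
    """
    hits = sum(1 for sample, true_i in tests if true_i in clusters[select(sample)])
    hit_rate = hits / len(tests)
    avg_size = sum(len(c) for c in clusters) / len(clusters)
    return hit_rate, avg_size

def select(sample):
    """Hypothetical cluster selector for one-dimensional toy samples."""
    return 0 if sample[0] < 0.5 else 1

# Toy data: location 2 belongs to both (overlapping) clusters.
clusters = [{0, 1, 2}, {2, 3, 4}]
tests = [((0.1,), 0), ((0.9,), 4), ((0.2,), 3), ((0.7,), 2)]
print(evaluate(tests, clusters, select))  # (0.75, 3.0)
```

The sample from location 2 scores a hit because of the overlap. The sample from location 3 is routed to the first cluster and misses.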

5.5 Performance Study of Total Comparison Cost

The number of clusters affects the computation cost, so we further conduct experiments that vary the number of clusters k. The results are shown in Fig. 5.4. There, the total comparison cost is defined as the sum of the number of clusters and the average cluster size, since a query is first compared with each cluster and then with the training locations in the selected cluster. In this experiment, each strategy must keep its hit rate above 0.85. As Fig. 5.4 shows, the total costs of Voronoi and Prob are minimal when k = 100. On

Figure 5.3: Effect of the standard deviation σ on the (a) hit rate and (b) average cluster size, for k-means, MNN with λ_M = 4, Voronoi with δ = 5, and Prob with ξ = 0.9.

Figure 5.4: Total comparison cost under different numbers of clusters k, for MNN with λ_M = 8, Voronoi with δ = 4.5, and Prob with ξ = 0.85.
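The total comparison cost of Section 5.5 can be written as cost(k) = k + S(k), where S(k) is the average cluster size. Suppose, ideally, that N training locations are split evenly into k disjoint clusters. Then S(k) = N/k, and the cost k + N/k is minimized at k = √N. Overlapping clusters enlarge S(k) and can shift this optimum. The sketch below picks the cheapest k among settings that satisfy the hit-rate constraint. The input numbers are made up for illustration and are not results from Fig. 5.4. The function names are hypothetical.

```python
def total_cost(k, avg_cluster_size):
    """Comparisons per query: k cluster comparisons plus the average
    number of training locations searched inside the selected cluster."""
    return k + avg_cluster_size

def best_k(results, min_hit_rate=0.85):
    """results maps k -> (hit_rate, avg_cluster_size). Returns the k with the
    lowest total cost among settings whose hit rate exceeds min_hit_rate."""
    feasible = {k: total_cost(k, size)
                for k, (hit_rate, size) in results.items()
                if hit_rate > min_hit_rate}
    if not feasible:
        raise ValueError("no setting satisfies the hit-rate constraint")
    return min(feasible, key=feasible.get)

# Made-up inputs for illustration only (not measured values).
results = {50: (0.90, 600.0), 100: (0.88, 350.0), 200: (0.87, 300.0)}
print(best_k(results))  # 100: the costs are 650, 450 and 500
```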


Chapter 6 Conclusion

In this thesis, we presented an efficient cluster-enhanced localization framework that speeds up pattern-matching positioning algorithms in large-scale wireless networks. The framework can be combined with any clustering algorithm and any pattern-matching positioning algorithm. With the aid of clustering techniques, the training data are divided into groups according to their similarity in a specific feature space. At positioning time, the cluster most similar to the real-time received sample is selected, and only the training locations in that cluster are searched. Selecting the wrong cluster can cause positioning errors. To reduce them, we proposed three clustering strategies that allow clusters to overlap. Our performance evaluation shows that the proposed overlapping strategies greatly improve the hit rate of the clustering technique and reduce the computation cost by at least 90%.


Bibliography

[1] P. Enge and P. Misra, “Special Issue on Global Positioning System,” Proc. IEEE, vol. 87, no. 1, pp. 3–15, 1999.

[2] R. Want, A. Hopper, V. Falcão, and J. Gibbons, “The Active Badge Location System,” ACM Trans. on Information Systems (TOIS), vol. 10, no. 1, pp. 91–102, 1992.

[3] N. B. Priyantha, A. Chakraborty, and H. Balakrishnan, “The Cricket Location-support System,” in IEEE/ACM MOBICOM, ACM Press, New York, NY, USA, 2000, pp. 32–43.

[4] P. Bahl and V. N. Padmanabhan, “RADAR: An In-Building RF-based User Location and Tracking System,” in IEEE INFOCOM, 2000, pp. 775–784.

[5] T. Roos, P. Myllymäki, H. Tirri, P. Misikangas, and J. Sievänen, “A Probabilistic Approach to WLAN User Location Estimation,” Int’l Journal of Wireless Information Networks, vol. 9, no. 3, pp. 155–164, 2002.

[6] R. Battiti, T. L. Nhat, and A. Villani, “Location-aware Computing: A Neural Network Model for Determining Location in Wireless LANs,” University of Trento, Department of Information and Communication Technology, Tech. Rep. DIT-5, 2002.

[7] M. Brunato and R. Battiti, “Statistical Learning Theory for Location Fingerprinting in Wireless LANs,” Computer Networks, vol. 47, no. 6, 2005.

[8] V. Seshadri, G. V. Záruba, and M. Huber, “A Bayesian Sampling Approach to Indoor Localization of Wireless Devices using Received Signal Strength Indication,” in IEEE PERCOM, 2005, pp. 75–84.

[9] J. Letchner, D. Fox, and A. LaMarca, “Large-Scale Localization from Wireless Signal Strength,” in Proc. of the Nat’l Conf. on Artificial Intelligence (AAAI), 2005, pp. 15–20.

[10] Y.-C. Cheng, Y. Chawathe, A. LaMarca, and J. Krumm, “Accuracy Characterization for


[11] A. LaMarca, J. Hightower, I. Smith, and S. Consolvo, “Self-Mapping in 802.11 Location Systems,” in Proc. 7th Int’l Conf. on Ubiquitous Computing (UBICOMP). Springer, 2005, pp. 87–104.

[12] A. Haeberlen, E. Flannery, A. M. Ladd, A. Rudys, D. S. Wallach, and L. E. Kavraki, “Practical Robust Localization over Large-scale 802.11 Wireless Networks,” in IEEE/ACM MOBICOM, 2004.

[13] X. Chai and Q. Yang, “Reducing the Calibration Effort for Location Estimation Using Unlabeled Samples,” in IEEE PERCOM, 2005, pp. 95–104.

[14] J. J. Pan, J. T. Kwok, Q. Yang, and Y. Chen, “Multidimensional Vector Regression for Accurate and Low-Cost Location Estimation in Pervasive Computing,” IEEE Trans. on Knowledge and Data Engineering, vol. 18, no. 9, pp. 1181–1193, 2006.

[15] P. Krishnan, A. S. Krishnakumar, W.-H. Ju, C. Mallows, and S. Ganu, “A System for LEASE: Location Estimation Assisted by Stationary Emitters for Indoor RF Wireless Networks,” in IEEE INFOCOM, vol. 2, 2004, pp. 1001–1011.

[16] J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” in Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281–297.

[17] M. Youssef and A. Agrawala, “On the Optimality of WLAN Location Determination Systems,” in Comm. Networks and Dist. Syst. Modeling and Simulation Conf., 2004.

[18] A. Agiwal, P. Khandpur, and H. Saran, “LOCATOR: Location Estimation System for Wireless LANs,” in ACM WMASH, 2004, pp. 102–109.

[19] M. A. Youssef, A. Agrawala, and A. U. Shankar, “WLAN Location Determination via Clustering and Probability Distributions,” in IEEE PERCOM, 2003, pp. 143–150.

[20] R. Xu and D. Wunsch II, “Survey of Clustering Algorithms,” IEEE Trans. on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[21] F. Aurenhammer, “Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure,” ACM Computing Surveys (CSUR), vol. 23, no. 3, pp. 345–405, 1991.

