Monitoring Path Nearest Neighbor in Road Networks

Zaiben Chen†, Heng Tao Shen†, Xiaofang Zhou†, Jeffrey Xu Yu‡

†School of Information Technology & Electrical Engineering, The University of Queensland, QLD 4072 Australia

‡The Chinese University of Hong Kong, Hong Kong, China

{zaiben, shenht, zxf}@itee.uq.edu.au, yu@se.cuhk.edu.hk

ABSTRACT

This paper addresses the problem of monitoring the k nearest neighbors to a dynamically changing path in road networks. Given a destination that a user is heading to, this new query returns the k-NN with respect to the shortest path connecting the destination and the user's current location, and thus provides a list of nearest candidates for reference by considering the whole upcoming journey. We name this query the k-Path Nearest Neighbor query (k-PNN). As the user is moving and may not always follow the shortest path, the query path keeps changing. The challenge of monitoring the k-PNN for an arbitrarily moving user is to dynamically determine the update locations and then refresh the k-PNN efficiently. We propose a three-phase Best-first Network Expansion (BNE) algorithm for monitoring the k-PNN and the corresponding shortest path. In the searching phase, the BNE finds the shortest path to the destination, during which a candidate set that is guaranteed to include the k-PNN is generated at the same time. Then, in the verification phase, a heuristic algorithm examines the candidates' exact distances to the query path, and it achieves a significant reduction in the number of visited nodes. The monitoring phase deals with computing update locations as well as refreshing the k-PNN under different user movements. Since determining the network distance is a costly process, an expansion tree and the candidate set are carefully maintained by the BNE algorithm, which can provide efficient updates of the shortest path and the k-PNN results. Finally, we conduct extensive experiments on real road networks and show that our methods achieve satisfactory performance.

Categories and Subject Descriptors

H.2.8 [Database Applications]: Spatial databases and GIS

General Terms

Algorithms

SIGMOD’09, June 29–July 2, 2009, Providence, Rhode Island, USA.

Keywords

Path Nearest Neighbor, Road Networks, Spatial Databases

1. INTRODUCTION

The Nearest Neighbor query is one of the fundamental problems in spatial database research. It is designed to find the closest object p to a specified query point q, given a set of objects and a distance metric. This problem is well studied in the literature, and its variants include k-Nearest Neighbor search [6, 17], Continuous Nearest Neighbor search [1, 10, 21], Aggregate Nearest Neighbor queries [14, 22], etc.

While all the queries mentioned above concern only locally optimized results, in this paper we investigate the problem of the Path Nearest Neighbor (PNN) query, which retrieves the nearest neighbor with respect to the whole query path. Here, 'locally optimized results' means the nearest neighbors with respect to the current query location. However, a moving user may want to know the best choice by considering the whole path to be traveled, so a globally optimal choice for the nearest neighbor to a given path is required, and that is the motivation of this work. As exemplified in Figure 1, assume that we are traveling from s to t along a path $P = \{s, n_2, n_3, t\}$ and we hope to find the nearest gas station for refueling. If we use a conventional Nearest Neighbor query, gas station A is returned at the beginning. However, A is not the best choice because there is another gas station B not far away which is much closer to the path we are traveling on. The PNN query therefore suits applications where a user wants to consume a service when traveling towards a given destination. For such applications, neither the current nearest neighbor nor the nearest neighbor at any particular point is the best for the user; instead, the user wants to know the nearest neighbor relative to the route he/she will travel (B in this example).

Figure 1: An example

A similar query, the In-Route Nearest Neighbor (IRNN) query, was first proposed by Shekhar et al. in [20] to search for a facility instance (e.g. a gas station) with the minimum detour distance from the query route on the way to the destination. Still considering Figure 1, the detour distance of A from P is greater than that of B, so IRNN(P) would return B to the user. The intuition is that users (e.g. commuters) prefer to follow the route they are familiar with, and thus would like to choose the gas station with the smallest deviation from the route. After refueling, they return to the previous route and continue the journey.

However, a drawback of IRNN is that the user has to input the exact query path in advance, identified by all intersections along the path, while a user's driving path often cannot be precisely decided beforehand. Imagine that a user is driving from Washington to New York, which is a long journey. It is impractical for a user to input hundreds of intersections before successfully making a query.

Therefore, we propose the Path Nearest Neighbor query, which requires users to input only the destination and the current location rather than the whole path. For each PNN query, we construct a shortest path connecting the destination and the current location and then search for the nearest facility instance to the shortest path (i.e. the facility instance with the minimum detour distance). Since a moving user may not follow the shortest path and the driving route might change over time during the journey, we provide continuous monitoring of the k-PNN, which always gives the user the best candidates for consideration. This raises the issue of how to efficiently query the nearest neighbors to a changing shortest path. To provide efficient monitoring of the k-PNN, we propose in this paper a Best-first Network Expansion (BNE) method. Specifically, the BNE consists of three main phases: a searching phase, a verification phase and a monitoring phase.

In the searching phase, the BNE algorithm incorporates a bi-directional search method for establishing the shortest path, which conducts two independent network expansions from the starting location and the destination separately; when the two expansions meet, the shortest path is determined. The novelty of the searching phase is that we can also derive lower bounds and upper bounds of the minimum detour distance for all encountered nodes during the bidirectional search, which are further utilized in determining a candidate set for the k-PNN and in examining the candidates' exact detour distances. As searching for the shortest path is inevitable for the k-PNN query (if pre-computation for distance browsing is not considered), the BNE algorithm is designed to retrieve as much information as possible during the searching process, and to improve the performance of monitoring by using this information.

With a list of potential candidates, guaranteed to include the k-PNN results, returned from the searching phase, the verification phase processes these candidates in the order of their lower bounds. Here, a heuristic verification function for examining the candidates' exact minimum detour distances to the query path is devised. The heuristic function searches for the minimum detour path from a candidate towards the query path directionally, instead of simply conducting a Dijkstra network expansion. By doing so, the search area is reduced greatly, especially when the candidate is not close to the query path.

In the monitoring phase, the main task is to figure out where an update of the k-PNN is needed, which could be an update of the order or a re-calculation of the k-PNN. We discuss these two cases for the situations in which the user follows the shortest path or deviates from the shortest path, respectively. To facilitate the k-PNN updates, the BNE carefully maintains an expansion tree rooted at the destination, which stores the shortest paths (from the destination) to the surrounding nodes. This expansion tree is first recorded during the bi-directional search in the searching phase, and it enlarges or shrinks accordingly while the user's current location is changing. Besides, the candidate set and the candidates' lower and upper bounds acquired previously are also updated gradually in the monitoring phase, and they are utilized to accelerate the update algorithms.

To sum up, we make the following main contributions:

• We define a new type of query for searching the k nearest neighbors to a changing shortest path. It provides new features for advanced spatial-temporal information systems, and may benefit users by reporting the best candidates from a global view.

• We devise the BNE algorithm, which efficiently monitors the k-PNN while the user is moving arbitrarily. An expansion tree and the candidate set are utilized with lower and upper bounds on the minimum detour distance for fast k-PNN update.

• We also propose methods for determining the update locations that invoke potential updates of the k-PNN results under different user movements, as well as algorithms for efficiently updating the k-PNN results.

• We conduct extensive experiments on real datasets to study the performance of the proposed approaches.

The remainder of the paper is organized as follows. In Section 2 we discuss the related work. In Section 3, a formal definition of the problem is given. The searching phase and the verification phase of the BNE algorithm are presented in Section 4, and the monitoring phase is introduced in Section 5. Finally, we show our experimental results in Section 6 and draw a conclusion in Section 7.

2. RELATED WORK

Spatial queries in advanced traveler information systems have continued to proliferate in recent years. The Nearest Neighbor (NN) query is considered an important issue in this kind of application. This query aims to retrieve the closest neighbor to a query point from a set of given objects. In [17] and [6], a depth-first and a best-first tree traversal approach are proposed, respectively, for the NN query in Euclidean space, and both employ a branch-and-bound strategy.

The Nearest Neighbor query is also extended to the road network scenario by using network distance as the distance metric. Papadias et al. present in [15] the Incremental Euclidean Restriction (IER) and Incremental Network Expansion (INE) algorithms for retrieving the k-NN according to network distance. IER uses the Euclidean distance as a lower bound for pruning during the search, and INE performs a network expansion similar to Dijkstra's algorithm [3]. Jensen et al. also propose in [7] a general spatial-temporal framework for NN queries in a road network which is represented by a graph. In [19], a graph embedding technique is proposed to transform a road network into a high-dimensional Euclidean space, in which the approximate k-NN can then be found. Pre-computation based methods for k-NN queries are also studied in [8] and [18], in which Voronoi diagrams and Shortest Path Quadtrees are utilized, respectively.

Many variants of Nearest Neighbor search are studied as well, such as Aggregate k-NN monitoring [16], Trip Planning Queries [9] and Continuous Nearest Neighbor (CNN) queries [1, 10, 13, 21]. CNN queries report the k-NN results continuously while the user is moving along a path. The main challenge of this type of query is to find the split points on the query path where an update of the k-NN is required, and thus to avoid unnecessary k-NN re-calculations. However, a limitation of CNN queries is that the query path has to be given in advance and cannot change during the user's movement. Therefore, in [11], Mouratidis et al. investigate the Continuous Nearest Neighbor monitoring problem in a road network, in which the query point moves freely and the data objects' positions also change dynamically.

The basic idea of [11] is to carefully maintain a spanning tree originating from the query point and to grow or discard branches of the spanning tree according to the movements of the data objects and the query point. To some extent, the motivation of our k-PNN monitoring problem is similar to that of CNN monitoring. However, we aim to monitor the k-NN to a dynamically changing path rather than to a moving query point, and we assume all data objects (e.g. restaurants, gas stations) are stationary.

The In-Route Nearest Neighbor (IRNN) query in [20] is designed for users that drive along a fixed path routinely. As such drivers would like to follow their preferred routes, IRNN queries are proposed for finding the nearest neighbor with the minimum detour distance from the fixed route, under the assumption that a commuter will return to the route after going to the nearest facility (e.g. a gas station) and will continue the journey along the previous route. Our problem is an extension of the IRNN query: it monitors the k nearest neighbors to a continuously changing shortest path, and the user only needs to input the destination rather than the exact query path.

3. PROBLEM DEFINITION

In this paper, a road network is modeled as a weighted undirected graph G(V, E), in which V consists of all vertices (nodes) of the network, and E is the set of all edges. We assume that all facility instances (data objects) lie on the road. If a data object is not located at a road intersection, we treat the data object as a node and further divide the edge it lies on into two edges. So V is a node set comprised of all intersections and data objects, and E contains all the edges between them. Each edge is associated with a non-negative weight representing the time cost of traveling or simply the road distance between the two neighboring nodes.

We define the network distance $D_n(n_1, n_2)$ between two nodes $n_1$ and $n_2$ as the length of the shortest path $SP(n_1, n_2)$ connecting $n_1$ and $n_2$. A path P from node s to destination t is represented by a series of nodes $P = \{n_1, n_2, \cdots, n_r\}$, in which $n_1 = s$, $n_r = t$, and the length $|P|$ is the sum of the weights of all edges on P. The minimum detour distance $D_d(o, P)$ of a data object o from a path P is defined as:

$$D_d(o, P) = \min_{n_i \in P} \{D_n(o, n_i)\}$$

We may also denote $D_d(o, P)$ by $D_d(o)$ when the context is clear.
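The definition above can be illustrated with a minimal sketch, assuming the road network is stored as a dictionary of adjacency dictionaries. The helper names `dijkstra` and `min_detour_distance` and the toy network (loosely modeled on Figure 1, with made-up weights) are assumptions for illustration and do not come from the paper.

```python
import heapq

def dijkstra(graph, source):
    """Shortest network distances D_n(source, v) for every reachable node v."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter label was found already
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def min_detour_distance(graph, o, path):
    """D_d(o, P): the smallest network distance from o to any node of P."""
    dist = dijkstra(graph, o)
    return min(dist.get(n, float("inf")) for n in path)

# Hypothetical toy network in the spirit of Figure 1 (weights are made up).
graph = {
    "s": {"n2": 2, "A": 3},
    "n2": {"s": 2, "n3": 2},
    "n3": {"n2": 2, "t": 2, "B": 1},
    "t": {"n3": 2},
    "A": {"s": 3},
    "B": {"n3": 1},
}
path = ["s", "n2", "n3", "t"]
print(min_detour_distance(graph, "A", path))  # detour distance of A
print(min_detour_distance(graph, "B", path))  # detour distance of B
```

In this toy network A is the nearest neighbor of s, yet B has the smaller detour distance from the path, which is the situation the PNN query is meant to capture.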

Table 1: A list of notations

Notation | Description
V | The set of all nodes
E | The set of all edges
weight(n_1, n_2) | The weight of edge (n_1, n_2)
P, |P| | A path in a road network, and its length
(n_1, n_2) | The edge between n_1 and n_2, or the path from n_1 to n_2 if the context is clear
SP(n_1, n_2) | The shortest path between n_1 and n_2
D_n(n_1, n_2) | The network distance between n_1 and n_2
D_d(o, P) | The minimum detour distance of data object o from path P
D_e(n_1, n_2) | The Euclidean distance between n_1 and n_2
LB(o, P), UB(o, P) | The lower bound and upper bound of the minimum detour distance of o from P
L_f(), L_r(), L_v() | The distance labels in the forward, reverse and verification searches
Dist_p(n_i, s, t) | The perpendicular distance from n_i to line (s, t)

Definition 1. (k-Path Nearest Neighbor query) Given a starting node s, a destination node t, a road network G(V, E) and a set of data objects O ($O \subseteq V$), the k-Path Nearest Neighbor (k-PNN) query is to find the k data objects $O' = \{o_1, o_2, \cdots, o_k\}$ ($O' \subseteq O$), such that

$$D_d(o_i, SP(s, t)) \le D_d(o_j, SP(s, t)), \quad \forall o_i \in O',\ o_j \in O - O'$$

Here SP(s, t) is the shortest path from s to t. We aim to monitor the k-PNN relative to SP(s, t) while s is moving in a road network. In our application scenarios, SP(s, t) keeps changing and the k-PNN needs to be reported dynamically.

Table 1 shows a list of notations used in this paper.

4. K-PATH NEAREST NEIGHBOR QUERY

Intuitively, the k-PNN query can be solved by issuing a traditional k-NN search at each node of the current shortest path and then combining all the results. However, the cost of this method is high, especially in a monitoring scenario. Therefore, in this section, we propose the Best-first Network Expansion (BNE) algorithm for efficient monitoring of the k-PNN. The BNE is composed of three phases: the searching phase for finding the shortest path and potential candidates at the beginning; the verification phase for determining the exact k-PNN results; and the monitoring phase for updating the k-PNN efficiently. In the verification phase, the BNE always selects from the candidate set the data object that is most likely to be the closest one for verification, which is why we call it best-first. As determining distance in a road network is a costly network expansion process, the BNE takes advantage of previous expansion results by maintaining an expansion tree and a candidate set of data objects that must contain the k-PNN results. In our approach, we estimate the minimum detour distance of a data object by a lower bound derived from the triangular inequality of shortest paths, which is the basis of our searching and verification algorithms. In a road network, the triangular inequality holds for shortest paths such that

$$|SP(n_1, n_2)| + |SP(n_2, n_3)| \ge |SP(n_1, n_3)|$$

$$|SP(n_1, n_2)| - |SP(n_2, n_3)| \le |SP(n_1, n_3)|$$

Here $SP(n_1, n_2)$ indicates the shortest path between nodes $n_1$ and $n_2$, and $|SP(n_1, n_2)|$ is the length of the path.

Figure 2: Lower bound

Figure 3: (a, b)

Considering the illustration in Figure 2, there is a shortest path SP(s, t) connecting the two nodes s and t with $|SP(s, t)| = l$, while o is a data object in the road network with $|SP(o, s)| = c_1$ and $|SP(o, t)| = c_2$. $n_i$ is a node on the shortest path SP(s, t). Obviously, o has an upper bound UB of the minimum detour distance determined by

$$UB(o, SP(s, t)) = \min\{c_1, c_2\} \qquad (1)$$

This upper bound can be further tightened during the searching phase, as discussed later in this section. We now estimate the lower bound LB of the minimum detour distance for the data object o. Assume that the distance from s to $n_i$ is x, and the distance from o to $n_i$ is y. According to the triangular inequality stated above, we have:

$$\begin{cases} c_1 - x \le y \\ c_2 - (l - x) \le y \end{cases}$$

Therefore, the distance y from data object o to the shortest path SP(s, t) is no shorter than LB:

$$LB = \min_{x \in [0, l]} \{\max\{c_1 - x,\ c_2 - (l - x)\}\} \qquad (2)$$

Consequently, the lower bound LB(o, SP(s, t)) of the minimum detour distance of o from SP(s, t) is determined by finding the intersection point (a, b) of the two lines $y = c_1 - x$ and $y = c_2 - (l - x)$, as shown in Figure 3. We get:

$$a = \frac{l + c_1 - c_2}{2}, \qquad b = \frac{c_1 + c_2 - l}{2}$$

So the lower bound is estimated by

$$LB(o, SP(s, t)) = \frac{c_1 + c_2 - l}{2} \qquad (3)$$
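Equations 1 to 3 can be checked numerically with a short sketch. The function names `detour_bounds` and `lower_bound_by_scan` and the sample values are assumptions made for illustration; the grid scan simply evaluates Equation 2 directly so that it can be compared with the closed form of Equation 3.

```python
def detour_bounds(c1, c2, l):
    """Return (LB, UB) of the minimum detour distance from Equations 3 and 1."""
    upper = min(c1, c2)            # Equation 1
    lower = (c1 + c2 - l) / 2      # Equation 3
    return lower, upper

def lower_bound_by_scan(c1, c2, l, steps=1000):
    """Evaluate Equation 2 on a uniform grid of x values in [0, l]."""
    best = float("inf")
    for i in range(steps + 1):
        x = l * i / steps
        best = min(best, max(c1 - x, c2 - (l - x)))
    return best

# Hypothetical distances: |SP(o, s)| = 5, |SP(o, t)| = 7, |SP(s, t)| = 8.
lb, ub = detour_bounds(5, 7, 8)
print(lb, ub)
print(lower_bound_by_scan(5, 7, 8))
```

The two lines intersect inside [0, l] because the triangular inequality gives |c_1 - c_2| ≤ l, which is why the minimum in Equation 2 is attained at the intersection point.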

With l fixed, we can infer from Equation 3 that a smaller lower bound implies a smaller value of $(c_1 + c_2)$, which means that $(c_1 + c_2)$ approaches l. This happens when the data object is closer to the shortest path connecting s and t. Therefore, a data object with a smaller lower bound is more likely to have a shorter minimum detour distance. Based on this observation, the BNE algorithm chooses data objects for verification in the order of their lower bounds, until the currently selected data object's minimum detour distance is smaller than the next object's lower bound.

First, in the searching phase of our algorithm, the BNE finds the shortest path between s and t. Here, we adopt a bidirectional algorithm [12] that runs the forward and reverse versions of Dijkstra's algorithm [3] from s and t separately. The novel point is that we can also obtain the scanned nodes' lower bounds and upper bounds of the minimum detour distance while searching for the shortest path. The forward version of Dijkstra's algorithm expands from s and the reverse version expands from t in the road network, and each of them maintains its own set of distance labels. Once the two searches meet (a node scanned by the forward search has also been scanned by the reverse search, or vice versa), a shortest path from s to t is detected.

During the search for the shortest path SP(s, t), some data objects around s and t are scanned, and their distances to s or t are determined as well. We can utilize these recorded distances for the verification of the k nearest neighbors in the subsequent verification phase.

Another task during the bidirectional search is to obtain a candidate set of data objects that is guaranteed to include the k-PNN results. To achieve that, the bidirectional expansion may need to continue even after the shortest path is found, until we find a data object o whose lower bound LB(o, SP(s, t)) is not less than the upper bounds of at least k data objects found so far. We denote by $L_f(n_i)$ the distance label of a node $n_i$ maintained by the forward search, by $L_r(n_i)$ the distance label of a node $n_i$ maintained by the reverse search, and by l the length of the shortest path SP(s, t).

We formalize the process as follows. Assume that, so far during the bidirectional search, a set $O'$ of data objects has been scanned (expanded) by the forward search, the reverse search, or both. Each $o_i \in O'$ is assigned an upper bound $UB(o_i, SP(s, t)) = \min\{L_f(o_i), L_r(o_i)\}$ according to Equation 1, or $L_f(o_i)$ if it is scanned only by the forward search, or $L_r(o_i)$ if it is scanned only by the reverse search. Those scanned by both searches also have a lower bound $LB(o_i, SP(s, t)) = \frac{L_f(o_i) + L_r(o_i) - l}{2}$ according to Equation 3.

Theorem 1. During the bidirectional search, if there exists a data object $o \in O'$, and we can find at least k data objects $O'' = \{o_1, o_2, \cdots, o_k\}$ from $O'$, such that

$$LB(o, SP(s, t)) \ge \max_{o_i \in O''} \{UB(o_i, SP(s, t))\}$$

then the k-PNN must be included in $O'$.

Proof. For any data object $o_j$ that is not in $O'$, which means it has not been scanned yet, if we continue the bidirectional search until $o_j$ gets both distance labels from the forward and the reverse searches, we have

$$L_f(o_j) \ge L_f(o_i),\ \forall o_i \in O' \qquad L_r(o_j) \ge L_r(o_i),\ \forall o_i \in O'$$

because the search process based on Dijkstra's algorithm always chooses the node with the smallest distance label for expansion. Since $o \in O'$, then

$$\frac{L_f(o_j) + L_r(o_j) - l}{2} \ge \frac{L_f(o) + L_r(o) - l}{2}$$

$$\Rightarrow LB(o_j, SP(s, t)) \ge LB(o, SP(s, t))$$

$$\Rightarrow LB(o_j, SP(s, t)) \ge UB(o_i, SP(s, t)),\ \forall o_i \in O''$$

Therefore, no $o_j$ can have a minimum detour distance less than that of the k data objects in $O''$ found so far.

Notice that the k data objects $\{o_1, o_2, \cdots, o_k\}$ are not necessarily the k-PNN results. We can only guarantee that the k-PNN is within the set of scanned data objects $O'$. The searching phase of the BNE is shown in Algorithm 1.
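The condition of Theorem 1 can be sketched as a small predicate. This is only an illustration under assumptions: the function name `candidate_set_complete` and the dictionary layout are hypothetical, `upper_bounds` holds every scanned data object, and `lower_bounds` holds only those scanned by both searches. The sketch tests the existential form stated in the theorem; Algorithm 1 as printed compares the minimum recorded lower bound instead, which is a stricter test.

```python
import heapq

def candidate_set_complete(lower_bounds, upper_bounds, k):
    """Theorem 1 test on the scanned data objects.

    lower_bounds: {object: LB} for objects scanned by both searches.
    upper_bounds: {object: UB} for every scanned object.
    Returns True when some lower bound is at least the k-th smallest upper bound.
    """
    if len(upper_bounds) < k or not lower_bounds:
        return False
    kth_smallest_ub = heapq.nsmallest(k, upper_bounds.values())[-1]
    return any(lb >= kth_smallest_ub for lb in lower_bounds.values())

# Hypothetical bounds for three scanned objects.
ubs = {"o1": 1.0, "o2": 3.0, "o3": 6.0}
lbs = {"o3": 4.0}
print(candidate_set_complete(lbs, ubs, k=2))
print(candidate_set_complete(lbs, ubs, k=3))
```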

Algorithm 1: BNE - searching phase
input : Node s, t; G(V, E)
output: SP(s, t); Candidate Set CS
 1  S, T, Q_s, Q_t ← null; l ← ∞;
 2  ∀p ∈ V, L_f(p), L_r(p) ← ∞; L_f(s), L_r(t) ← 0;
 3  Q_s ← Q_s ∪ s; Q_t ← Q_t ∪ t;
 4  Heap Lowerbounds, Upperbounds;
 5  while Q_s, Q_t ≠ null do
        // Forward search
 6      u ← ExtractMin(Q_s);
 7      S ← S ∪ u;
 8      if u ∈ T and l = ∞ then
 9          l ← L_f(u) + L_r(u);
10          record SP(s, t);
11      foreach node v ∈ u.adjacentNodes do
12          if L_f(v) > L_f(u) + weight(u, v) then
13              L_f(v) ← L_f(u) + weight(u, v);
14              Q_s ← Q_s ∪ v;
15              π_f(v) ← u;
16      if u is a data object then
17          Upperbounds.add(L_f(u));
18          if L_r(u) ≠ ∞ then
19              u.lowerbound ← (L_f(u) + L_r(u) − l) / 2;
20              Lowerbounds.add(u.lowerbound);
21      k-minimal values ← Upperbounds.minK();
22      if Lowerbounds.min ≥ max{the k-minimal values} then
23          CS ← all data objects in S ∪ T;
24          return SP(s, t) & CS;
        // Reverse search
25      The same process as the forward search, with (S, Q_s, L_f(), π_f()) replaced by (T, Q_t, L_r(), π_r());

In Algorithm 1 the forward and reverse searches run alternately. During the initialization step, the sets of scanned nodes S and T are initialized to null, and all nodes' distance labels except $L_f(s)$ and $L_r(t)$ are set to ∞. The heaps record the lower bounds and upper bounds of all data objects found so far (the lower and upper bounds of non-data-object nodes are recorded in separate heaps). The search process is similar to Dijkstra's algorithm, which always chooses the node with the minimal distance label for expansion (line 6). When a node scanned by both searches is found, the shortest path SP(s, t) is recorded (lines 9-10). A data object's upper bound of the minimum detour distance is stored as $\min\{L_f(u), L_r(u)\}$ (line 17), and once the object has been scanned by both the forward and reverse searches, it is assigned a lower bound of the minimum detour distance (line 19). This part of the algorithm stops when the condition of Theorem 1 is met (lines 22-24), and a candidate set is then returned.

Note that after the candidate set CS and the shortest path SP(s, t) are returned, there could still be some data objects in CS that have not been scanned by both the forward and reverse searches, so their lower bounds are still unknown. Therefore, before going to the candidate verification phase, we continue the network expansion of the bidirectional search until all data objects in CS have their lower bounds determined. This part of the searching phase is intuitive, and we omit it from Algorithm 1 for simplicity of presentation.

During the searching phase presented above, we can also obtain two expansion trees $T_f$ and $T_r$ originating from s and t respectively (by recording parent nodes as $\pi_f(v)$, $\pi_r(v)$ at line 15), which can be re-used as 'pre-computed' knowledge in our monitoring phase. As illustrated in Figure 4 (we only show the expansion tree originating from t, with thicker lines), if the user moves from s to another node $s'$ that has already been included in $T_r$, then the shortest path from $s'$ to t is found to be $SP(s', t) = \{s', n_4, t\}$ directly from the expansion tree, without extra search. Besides, during the network expansion after SP(s, t) is found in the searching phase, we can also tighten the upper bounds of some found data objects if their ancestor nodes in the expansion tree are on SP(s, t). For example, the data object o in Figure 4 has an ancestor node $n_3$ (not necessarily the parent node) on $SP(s, t) = \{s, n_2, n_3, t\}$, so the upper bound UB(o, SP(s, t)) is tightened to $D_n(o, n_3)$, and Algorithm 1 may return results faster since smaller upper bounds make Theorem 1 easier to satisfy.
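The two uses of the expansion tree described above can be sketched as follows, assuming the reverse tree $T_r$ is stored as a parent map (the role of $\pi_r$) together with the reverse distance labels $L_r$. The function names, the node names and the label values are hypothetical and chosen only to mirror the situation in Figure 4.

```python
def path_from_tree(parent, node):
    """Follow parent pointers from node up to the root t of the reverse tree."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path

def tightened_upper_bound(parent, label_r, o, sp_nodes):
    """Distance from o to its first tree ancestor that lies on SP(s, t).

    Along a tree branch the labels are exact distances to t, so the
    distance from o to an ancestor a is label_r[o] - label_r[a].
    """
    node = parent.get(o)
    while node is not None:
        if node in sp_nodes:
            return label_r[o] - label_r[node]
        node = parent.get(node)
    return label_r[o]  # no ancestor on the path: fall back to L_r(o)

# Hypothetical reverse expansion tree rooted at t.
pi_r = {"t": None, "n3": "t", "n4": "t", "n2": "n3", "s": "n2",
        "s_new": "n4", "n5": "n3", "o": "n5"}
L_r = {"t": 0.0, "n3": 2.0, "n4": 1.5, "n2": 4.0, "s": 6.0,
       "s_new": 3.0, "n5": 3.0, "o": 4.5}

print(path_from_tree(pi_r, "s_new"))
print(tightened_upper_bound(pi_r, L_r, "o", {"s", "n2", "n3", "t"}))
```

Because the root t is itself on the path, the walk always terminates at a path node in a connected tree; the fallback branch only guards against an incomplete parent map.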

Figure 4: Expansion tree originated from t

On acquiring the candidate set CS together with the lower bounds of the candidates, as well as the shortest path SP(s, t), the verification phase verifies the k-PNN candidates in CS in the order of their lower bounds, as shown in Algorithm 2.

Algorithm 2: BNE - verification phase
input : Lowerbounds, SP(s, t)
output: k-PNN
 1  count ← 0; Heap kpnn; kpnn.max ← ∞;
 2  while Lowerbounds ≠ null do
 3      o ← Lowerbounds.popMin();
 4      if kpnn.max > o.lowerbound then
 5          D_d(o, SP(s, t)) ← verify(o, SP(s, t));
 6          if D_d(o, SP(s, t)) < kpnn.max then
 7              if count < k then
 8                  kpnn.add(o);
 9                  count++;
10              else
11                  kpnn.deleteMax();
12                  kpnn.add(o);
13      else
14          return kpnn;
15  return kpnn;
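The verification loop can be sketched in Python with the standard `heapq` module. This is a sketch under stated assumptions rather than the authors' implementation: candidates are given as `(lower_bound, object)` pairs, `verify` is any callable returning the exact minimum detour distance, and the current k-th best distance is treated as ∞ until k objects have been collected. The names `verify_candidates`, `exact` and `calls` are hypothetical.

```python
import heapq

def verify_candidates(candidates, k, verify):
    """Best-first verification of (lower_bound, obj) candidates.

    Returns up to k (detour_distance, obj) pairs sorted by distance.
    """
    pending = list(candidates)
    heapq.heapify(pending)              # min-heap ordered by lower bound
    best = []                           # max-heap of size <= k (negated distances)
    while pending:
        lb, obj = heapq.heappop(pending)
        kth = -best[0][0] if len(best) == k else float("inf")
        if lb >= kth:
            break                       # no remaining candidate can do better
        d = verify(obj)
        if len(best) < k:
            heapq.heappush(best, (-d, obj))
        elif d < kth:
            heapq.heapreplace(best, (-d, obj))
    return sorted((-neg_d, obj) for neg_d, obj in best)

# Hypothetical candidates and exact detour distances.
exact = {"A": 2.5, "B": 1.0, "C": 3.0, "D": 5.0}
calls = []
def verify(obj):
    calls.append(obj)
    return exact[obj]

result = verify_candidates([(0.5, "B"), (1.0, "C"), (2.0, "A"), (4.0, "D")], 2, verify)
print(result, calls)
```

In this example D is never verified, because its lower bound already exceeds the second-best verified distance when it is popped.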

The verification phase examines the exact minimum detour distance of each candidate from CS in the order of lower bound (the node with the minimal lower bound is popped at line 3), until a candidate's lower bound is not less than kpnn's max value (lines 4-12; kpnn stores the k minimal detour distances found so far). The verify() function performs a network expansion from the candidate o to get its exact minimum detour distance. As this function is invoked every time an update occurs, the expansion method can affect the efficiency of monitoring significantly. Normally, Dijkstra's expansion method can be used. Here, we propose a heuristic expansion approach that greatly improves the efficiency. The basic idea is to select the next node n with the minimum ($D_n(n, o)$ + n.detourEstimate) for expansion.

n.detourEstimate is the estimate of n's minimum detour distance, and it is determined by either LB(n, SP(s, t)) or $Dist_p(n, s, t)$, the perpendicular distance from n to the line (s, t). $Dist_p(n, s, t)$ uses Euclidean distance to approximate the minimum detour distance, and it can be easily computed using the law of cosines as follows. Let $c_1 = D_e(n, s)$, $c_2 = D_e(n, t)$ and $l = D_e(s, t)$ ($D_e()$ is the Euclidean distance); then we have:

$$Dist_p(n, s, t) = \left| c_1 \times \sin\left(\arccos\left(\frac{c_1^2 + l^2 - c_2^2}{2 c_1 l}\right)\right) \right|$$
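The formula for $Dist_p$ translates directly into code. The sketch below assumes nodes are given as 2D coordinate tuples; the function name `perpendicular_distance`, the clamping of the cosine against rounding error and the handling of degenerate inputs are additions for illustration, not part of the paper.

```python
import math

def perpendicular_distance(n, s, t):
    """Perpendicular Euclidean distance from point n to the line through s and t."""
    c1 = math.dist(n, s)
    c2 = math.dist(n, t)
    l = math.dist(s, t)
    if c1 == 0:
        return 0.0          # n coincides with s
    if l == 0:
        return c1           # s and t coincide, so the line degenerates to a point
    cos_angle = (c1 ** 2 + l ** 2 - c2 ** 2) / (2 * c1 * l)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return abs(c1 * math.sin(math.acos(cos_angle)))

# Hypothetical coordinates: the line is the x-axis and n lies 2 units above it.
print(perpendicular_distance((1.0, 2.0), (0.0, 0.0), (4.0, 0.0)))
```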

However, the Euclidean detour estimate may not be applicable when the weight of an edge is not measured by real geographic distance (e.g. time cost). In contrast, LB(n, SP(s, t)) gives a tighter estimate and holds for any type of edge weight. One potential drawback is that some nodes encountered during the expansion may not have been scanned previously and thus have no lower bound determined. In this case we need to expand $T_f$ and $T_r$ further to get the node's lower bound. However, in our experiments on real datasets this situation is rare, and only a very limited number of encountered nodes have not been scanned, as most of them are covered by the expansion trees.

Basically, the search area of the verify() function using Dijkstra's expansion is a circle, while the search area normally has a triangular shape pointing towards SP(s, t) if the detour estimate is used as a heuristic. Algorithm 3 describes the details.

Algorithm 3: verify(o, SP(s, t))
 1  S_v, Q_v ← null; detourDist ← ∞;
 2  ∀p ∈ V, L_v(p) ← ∞; L_v(o) ← 0; Q_v ← Q_v ∪ o;
 3  while Q_v ≠ null do
 4      n ← ExtractMin(Q_v), such that L_v(n) + n.detourEstimate is minimized;
 5      if L_v(n) + n.detourEstimate ≥ detourDist then
 6          return detourDist;
 7      if n ∈ SP(s, t) and detourDist > L_v(n) then
 8          detourDist ← L_v(n);
 9      S_v ← S_v ∪ n;
10      foreach node v ∈ n.adjacentNodes do
11          if L_v(v) > L_v(n) + weight(n, v) then
12              L_v(v) ← L_v(n) + weight(n, v);
13              Q_v ← Q_v ∪ v;
14  return detourDist;

In Algorithm 3, the node with the minimum ($D_n(n, o)$ + n.detourEstimate) gets explored first (line 4). Once a node on SP(s, t) gets scanned (lines 7-8), a detour path from o to SP(s, t) is found, and we update the current minimum detour distance detourDist if a shorter one is found. Here, $L_v()$ is the distance label indicating how far a node is from o. Notice that the verify() function may continue the search even after it reaches the shortest path SP(s, t), since a node with a smaller distance label $L_v(n)$ does not necessarily get explored first. The search stops when the current detourDist is not greater than the current ($L_v(n)$ + n.detourEstimate), which is a lower bound on the length of any detour path from o through a node not yet scanned (lines 5-6). The correctness of Algorithm 3 is guaranteed by the following lemma:

Lemma 1. For every node n scanned by the verify() function, $L_v(n)$ is equal to the length of the shortest path SP(o, n), where o is the data object for verification.

Proof. Denote detourEstimate by e. The verify() function's expansion method is equivalent to Dijkstra's algorithm if we replace the distance label $L_v(n)$ by $L_v(n) + n.e$. Thus we can define a new weight of an edge as:

$$\mathit{weight}'(n_1, n_2) = L_v(n_2) + n_2.e - (L_v(n_1) + n_1.e) = \mathit{weight}(n_1, n_2) - n_1.e + n_2.e$$

where $\mathit{weight}(n_1, n_2)$ is the original weight defined in G(V, E). It follows that $\mathit{weight}'(n_1, n_2) = \mathit{weight}(n_1, n_2) - n_1.e + n_2.e \ge 0$ because of the triangular inequality (proof by replacing e with Equation 3). Suppose we replace the weight of each edge in G(V, E) by the non-negative $\mathit{weight}'$. Then for any two nodes $n_x$, $n_y$, the length of any path from $n_x$ to $n_y$ changes by the same amount: $n_y.e - n_x.e$. Therefore, a path is the shortest path from $n_x$ to $n_y$ with respect to $\mathit{weight}'$ if and only if it is also the shortest path from $n_x$ to $n_y$ with respect to $\mathit{weight}$.
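The claim that every path length changes by the same amount can be written out as a telescoping sum. Here $\mathit{weight}'(n_1, n_2)$ denotes the reweighted edge cost $\mathit{weight}(n_1, n_2) - n_1.e + n_2.e$ from the proof, and the intermediate node names $m_0, \dots, m_r$ of a path P from $n_x$ to $n_y$ are introduced only for this illustration:

```latex
\begin{aligned}
\sum_{i=0}^{r-1} \mathit{weight}'(m_i, m_{i+1})
  &= \sum_{i=0}^{r-1} \bigl( \mathit{weight}(m_i, m_{i+1}) - m_i.e + m_{i+1}.e \bigr) \\
  &= \sum_{i=0}^{r-1} \mathit{weight}(m_i, m_{i+1}) + m_r.e - m_0.e \\
  &= |P| + n_y.e - n_x.e
\end{aligned}
```

Since the offset $n_y.e - n_x.e$ depends only on the two endpoints, the relative order of all paths between $n_x$ and $n_y$ is the same under both weightings.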

The rationale of the expansion method in Algorithm 3 is similar to that of the A* algorithm [5], although a different heuristic is designed, and the detour estimate is essentially a feasible potential function as in [4]. As L_v(n) is guaranteed to be the length of the shortest path from o by Lemma 1, once the minimal detourDist is confirmed, it must be the minimum detour distance from o to SP(s, t).
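The best-first expansion of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the adjacency-dict graph representation, the function name verify, and the detour_estimate callable are all hypothetical, and the estimate is assumed to be a feasible potential so that Lemma 1 applies.

```python
import heapq
from math import inf

def verify(graph, o, path_nodes, detour_estimate):
    """Return the minimum network distance from object node o to the query path.

    graph: dict node -> dict neighbor -> non-negative weight (assumed undirected).
    path_nodes: set of nodes on SP(s, t).
    detour_estimate: callable node -> lower bound on the node's distance to the
        path (assumed feasible: est(u) <= weight(u, v) + est(v)).
    """
    label = {o: 0.0}                       # L_v: tentative distance from o
    scanned = set()                        # S_v
    queue = [(detour_estimate(o), o)]      # Q_v, keyed by L_v(n) + estimate(n)
    detour_dist = inf
    while queue:
        key, n = heapq.heappop(queue)
        if n in scanned:
            continue                       # stale queue entry
        if detour_dist <= key:
            break                          # key lower-bounds all unscanned detours
        scanned.add(n)
        if n in path_nodes and label[n] < detour_dist:
            detour_dist = label[n]         # a shorter detour path was found
        for v, w in graph[n].items():
            if label.get(v, inf) > label[n] + w:
                label[v] = label[n] + w
                heapq.heappush(queue, (label[v] + detour_estimate(v), v))
    return detour_dist
```

Passing a zero estimate (lambda n: 0.0) reduces the sketch to a plain Dijkstra expansion that stops once no unscanned node can improve detour_dist.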

5. MONITORING K-PNN

In this section, we present the monitoring phase of the BNE algorithm and show how to update the k-PNN results when the user is moving arbitrarily. As described before, the user may deviate from the shortest path, so the current shortest path, which is the query path, may change from time to time, and each change of the query path triggers an update of the k-PNN results. Even if the user always follows the shortest path, the path becomes shorter as the user moves towards the destination. Therefore, we need to deal with the shortest path update and, consequently, the k-PNN update.

In this part, the candidate set CS of data objects, the expansion trees T_r and T_f rooted at t and s respectively, and the lower and upper bounds of scanned nodes acquired previously are all reused and carefully maintained in the monitoring phase, as they provide 'pre-computed' knowledge that accelerates our update algorithm. T_r is static because the destination does not change, so a node's shortest path to the destination can be looked up quickly. As the user is probably moving gradually closer to the destination, the candidate set CS probably covers the new k-PNN results. All this information is also updated gradually in the monitoring phase, and we design the update algorithms on top of it.

There are basically two types of updates for the k-PNN: (1) update of the order; and (2) update of the members. In the first category, the members of the k-PNN results stay the same but their order with respect to minimum detour distance changes, while in the second category some data objects of the k-PNN become invalid and new data objects are inserted into the k-PNN results. The problem is then to determine where an update of the k-PNN will be needed (i.e., the update location), and to refresh the k-PNN results only when necessary. In the following, we present our update algorithms for the case when the user follows the shortest path and the case when the user deviates from it.

5.1 Following the Shortest Path

Firstly, we discuss the case in which the user has so far followed the shortest path found previously. Figure 5 illustrates such an example with 4-PNN = {o_2, o_5, o_4, o_3}, in which we assume the user follows SP(s, t) and his/her current position is denoted by s'. The shortest path from a data object o_i to SP(s, t) intersects SP(s, t) at n_i; we call SP(o_i, n_i) the minimum detour path of o_i, and n_i the entrance point of o_i's minimum detour path. For instance, SP(o_2, n_2) is the minimum detour path of o_2, and n_2 is o_2's entrance point.

Figure 5: Update locations

It is not hard to see that before the user's current position s' reaches the first entrance point of the current k-PNN (n_2 in this example), neither the order nor the members of the k-PNN need to be updated. When s' is on SP(s, t), we have SP(s', t) ⊆ SP(s, t), which means SP(s', t) is the same as the part of SP(s, t) from s' to t. Hence the minimum detour path of any o_i does not change, and there cannot be any other data object closer to SP(s', t); otherwise the closer data object would have been included in the k-PNN of SP(s, t).

Once the current position s' overtakes the entrance point of a data object o_i, the minimum detour distance of o_i will increase and thus may affect the order of the k-PNN. For instance, when s' overtakes n_2 and keeps going forwards, the minimum detour distance of o_2 becomes larger, and the order of o_2 (the 1st PNN) and o_5 (the 2nd PNN) may change when o_2's minimum detour distance rises to a certain value. If o_i is the kth PNN, it may also become invalid, and the (k + 1)th PNN will replace it as the kth PNN. To detect the change of the kth PNN, we actually maintain the (k + 1)-PNN results in the algorithm, and we calculate the update locations for the k-PNN to indicate where a change of the order could happen. Normally, an update location for a data object o_i is computed every time s' arrives at o_i's entrance point by:

$$d(o_i) = |SP(o_j, n_j)| - |SP(o_i, n_i)| \qquad (4)$$

where d(o_i) is the distance from o_i's entrance point n_i to the update location. Let o_i be the λth PNN (λ ≤ k); then we choose o_j to be the (λ + 1)th PNN when calculating d(o_i) in Equation 4. The idea is that an upper bound of the minimum detour distance of o_i from SP(s', t), where s' is the user's current position, is |SP(o_i, n_i)| + |SP(n_i, s')|, and as long as this upper bound is smaller than the (λ + 1)th PNN's minimum detour distance |SP(o_j, n_j)|, the order of the k-PNN stays the same.

For example, in Figure 5, where the 4-PNN = {o_2, o_5, o_4, o_3}, when the current position s' arrives at n_2 it generates an update location for o_2 determined by d(o_2), which is equal to |SP(o_5, n_5)| − |SP(o_2, n_2)|. While the user is traveling within the range of d(o_2) from n_2, no change of the order between o_2 and o_5 is expected. However, the (λ + 1)th PNN's entrance point may be met before s' arrives at the λth PNN's update location. For example, s' meets o_5's entrance point n_5 and generates an update location for o_5 with d(o_5), as shown in Figure 5. In this case o_5's update location is reset to be the same as o_2's update location, which is closer to s', because we need to re-compute both o_2's and o_5's minimum detour distances at o_2's update location to determine whether the order changes and to figure out their next update locations. If instead o_5's update location is closer to s', we do not need to reset it. Similarly, the (λ − 1)th PNN's entrance point may be met before s' arrives at the λth PNN's update location, as when n_4 is encountered before s' reaches o_3's update location in Figure 5; since o_3's update location is closer to s', there is no need to adjust it. Algorithm 4 shows how to determine the update location when encountering a data object's entrance point.

Algorithm 4: Encountering o_i's entrance point
    /* o_i = the λth PNN */
    /* o_j = the (λ + 1)th PNN */
    /* o_k = the (λ − 1)th PNN */
1   o_i.updateLoc ← pos(n_i) + d(o_i);
2   if o_k.updateLoc ≠ null and o_k.updateLoc < o_i.updateLoc then
3       o_i.updateLoc ← o_k.updateLoc;
4   if o_j.updateLoc ≠ null and o_i.updateLoc < o_j.updateLoc then
5       o_j.updateLoc ← o_i.updateLoc;

Here, pos(n_i) is the position of the entrance point n_i, and d(o_i) is computed by Equation 4. The criterion is to reset a lower-ranking PNN's update location (denoted by updateLoc) to the higher-ranking one's update location if the higher-ranking one's update location is closer to the user's current position (i.e., has a smaller value).
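The update-location rule of Algorithm 4, together with Equation 4, can be sketched as below. This is only an illustration: the Candidate record, its field names, and the convention that positions are offsets measured along the query path from its start are assumptions made here, as is the early return for the last maintained entry, which has no successor o_j.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    entrance_pos: float                  # pos(n_i): offset of the entrance point on the path
    detour: float                        # |SP(o_i, n_i)|
    update_loc: Optional[float] = None   # None plays the role of null

def on_entrance_point(ranked: List[Candidate], lam: int) -> None:
    """Apply Algorithm 4 to the candidate at 0-based rank lam of the (k+1)-PNN list."""
    o_i = ranked[lam]
    if lam + 1 >= len(ranked):
        return                           # assumption: no successor, nothing to schedule
    o_j = ranked[lam + 1]
    # Line 1, with d(o_i) from Equation 4.
    o_i.update_loc = o_i.entrance_pos + (o_j.detour - o_i.detour)
    # Lines 2-3: inherit the higher-ranked neighbour's earlier update location.
    if lam > 0:
        o_k = ranked[lam - 1]
        if o_k.update_loc is not None and o_k.update_loc < o_i.update_loc:
            o_i.update_loc = o_k.update_loc
    # Lines 4-5: pull the lower-ranked neighbour's update location forward.
    if o_j.update_loc is not None and o_i.update_loc < o_j.update_loc:
        o_j.update_loc = o_i.update_loc
```

With detours 1, 4 and 8 and entrance offsets 2, 3 and 9, the first candidate is scheduled at offset 5; when the second candidate's entrance point is met, its own value 7 is replaced by the earlier offset 5.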

On arriving at o_i's update location, the minimum detour distance of o_i is re-examined by running Algorithm 3, and the k-PNN is refreshed accordingly. Recalling Algorithm 3, note that the lower bound LB(o_i, SP(s, t)) determined previously at s can still be used as the detour estimate in the verification process even though the current query path has changed to SP(s', t), where s' is the user's current position, because LB(o_i, SP(s, t)) ≤ LB(o_i, SP(s', t)).

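Let D_n(o_i, s) = c_1, D_n(o_i, t) = c_2, and D_n(o_i, s') = c_3, where s' is the user's current position; then we have:

$$
\begin{aligned}
LB(o_i, SP(s, t)) - LB(o_i, SP(s', t))
&= \frac{c_1 + c_2 - |SP(s, t)|}{2} - \frac{c_2 + c_3 - |SP(s', t)|}{2} \\
&= \frac{(c_1 - c_3) - (|SP(s, t)| - |SP(s', t)|)}{2} \\
&= \frac{(c_1 - c_3) - |SP(s, s')|}{2} \le 0
\end{aligned}
$$

The last step holds because s' lies on SP(s, t), so |SP(s, t)| − |SP(s', t)| = |SP(s, s')|, and c_1 − c_3 ≤ |SP(s, s')| by the triangle inequality.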
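The inequality can be checked numerically on a toy network. The sketch below assumes an undirected graph stored as nested dicts and uses the lower bound formula (D_n(o, start) + D_n(o, t) − |SP(start, t)|) / 2 that appears in the derivation; the graph, the node names and the helper functions are invented for illustration. Node "m" plays the role of the current position s' on SP(s, t).

```python
import heapq

def dijkstra(graph, src):
    """Plain Dijkstra; returns a dict of shortest distances from src."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def lower_bound(graph, o, start, t):
    """LB(o, SP(start, t)) = (D_n(o, start) + D_n(o, t) - |SP(start, t)|) / 2."""
    d_o = dijkstra(graph, o)
    d_start = dijkstra(graph, start)
    return (d_o[start] + d_o[t] - d_start[t]) / 2

# Toy network: SP(s, t) = s - m - t with length 5; object o hangs off s and m.
graph = {
    "s": {"m": 2, "o": 4},
    "m": {"s": 2, "t": 3, "o": 3},
    "t": {"m": 3},
    "o": {"s": 4, "m": 3},
}
lb_at_s = lower_bound(graph, "o", "s", "t")   # (4 + 6 - 5) / 2 = 2.5
lb_at_m = lower_bound(graph, "o", "m", "t")   # (3 + 6 - 3) / 2 = 3.0
```

Here the bound computed at s (2.5) does not exceed the bound computed at the later position (3.0), and both stay below the true detour distance of 3, so the older bound remains a valid, if looser, estimate.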

The update algorithm is invoked when encountering an update location Loc, as described in Algorithm 5. It first verifies the minimum detour distances of all corresponding data objects, and then refreshes the order of the (k + 1)-PNN. If the previous kth PNN is no longer valid (line 5), the whole (k + 1)-PNN is re-computed by calling the updateKPNN() function in Algorithm 6.

Algorithm 5: Encountering an update location Loc
1   foreach object o_i that o_i.updateLoc = Loc do
2       D_d(o_i) ← verify(o_i, SP(Loc, t));
3       remove o_i.updateLoc;
4   refresh the order of the (k + 1)-PNN;
5   if the kth PNN is changed then
6       updateKPNN(Loc, t);
7   foreach object o_i that o_i.entrancePoint = Loc do
8       calculate o_i.updateLoc by Algorithm 4;

In some cases, after verification o_i's minimum detour path may have a new entrance point, possibly even ahead of the user's current position s', such as n_5 in Figure 5. After the update of the k-PNN, a data object is assigned a new update location if its new entrance point is right at s' (lines 7-8).
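The control flow of Algorithm 5 can be sketched as an event handler. The sketch is hypothetical in its details: candidates are plain dicts, and the three callables stand in for Algorithm 3 (verify), Algorithm 6 (update_kpnn) and Algorithm 4 (assign_update_loc); their signatures are assumptions, not the paper's interfaces. Line 5 is read literally, as a change in which object occupies rank k.

```python
def on_update_location(loc, ranked, k, verify, update_kpnn, assign_update_loc):
    """Process update location `loc` for the (k+1)-PNN list `ranked`.

    ranked: list of dicts with keys 'name', 'detour', 'entrance', 'update_loc',
        sorted by 'detour'.
    verify(obj, loc) -> (detour, entrance)      # stand-in for Algorithm 3
    update_kpnn(loc, ranked) -> new ranked list # stand-in for Algorithm 6
    assign_update_loc(obj, ranked)              # stand-in for Algorithm 4
    """
    old_kth = ranked[k - 1]["name"]
    for obj in ranked:                                        # lines 1-3
        if obj["update_loc"] == loc:
            obj["detour"], obj["entrance"] = verify(obj, loc)
            obj["update_loc"] = None
    ranked = sorted(ranked, key=lambda obj: obj["detour"])    # line 4
    if ranked[k - 1]["name"] != old_kth:                      # line 5
        ranked = update_kpnn(loc, ranked)                     # line 6
    for obj in ranked:                                        # lines 7-8
        if obj["entrance"] == loc:
            assign_update_loc(obj, ranked)
    return ranked
```

Injecting the three procedures as callables keeps the handler independent of how distances are computed, which also makes it easy to exercise with stubs.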

Algorithm 6: updateKPNN(n, t)
1   S, Q_s ← null; ∀p ∈ V, L_f(p) ← ∞;
2   L_f(n) ← 0; Q_s ← Q_s ∪ {n};
3   Heap Lowerbounds, Upperbounds;
4   while Q_s ≠ null do
5       u ← ExtractMin(Q_s);
6       S ← S ∪ {u};
7       if u ∈ T_r and SP(n, t) is not determined then
8           record SP(n, t);
9       foreach node v ∈ u.adjacentNodes do
10          if L_f(v) > L_f(u) + weight(u, v) then
11              L_f(v) ← L_f(u) + weight(u, v);
12              Q_s ← Q_s ∪ {v}; π_f(v) ← u;
13      if u is a data object then
14          Upperbounds.add(L_f(u));
15          if u ∉ T_r then
16              further expand T_r until L_r(u) ≠ ∞;
17          u.lowerbound ← (L_f(u) + L_r(u) − |SP(n, t)|) / 2;
18          Lowerbounds.add(u.lowerbound);
19      k-minimal values ← Upperbounds.minK();
20      if Lowerbounds.min ≥ max{the k-minimal values} then
21          T_r ← T_r − {n_i : L_r(n_i) > L_r(u)};
22          CS ← all data objects in S ∪ T_r;
23          break;
24  continue the expansion until for each n_i ∈ CS we have n_i.lowerbound ≠ null;
25  run Algorithm 2 for verifying k-PNN;

In Algorithm 6, a Dijkstra's expansion from the current position n is conducted to update the candidate set CS and all candidates' lower bounds and upper bounds of the minimum detour distance. This process is similar to the searching phase in Algorithm 1. Since the expansion tree T_r rooted at t and the distance label L_r(u) are invariable, we just need a forward expansion from n to get L_f(u) and subsequently the lower bound of u. All L_r(u) (u ∈ T_r) are added to Upperbounds in the initialization step. If a data object scanned by the forward expansion is not included in T_r (line 15), which happens when the user deviates too far from the shortest path, T_r needs a further expansion to catch up with the forward expansion, and during the expansion of T_r the shortest path SP(n, t) may also be recorded if it has not been determined yet (line 16). In fact, as the user approaches the destination, a smaller search area from n is required, and the candidate set CS and the expansion tree T_r are also shrunk accordingly (lines 21-22). At the same time, all scanned nodes' lower bounds and upper bounds of the minimum detour distance are updated. Note that at line 14, we choose min{L_f(u), L_r(u)} as u's upper bound. Finally, the verification function runs to acquire the exact (k + 1)-PNN results. As the k-PNN is already known, we just need a verification for the (k + 1)th PNN.
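The two bounds used in Algorithm 6 are simple functions of the forward and reverse distance labels. The sketch below computes them and shows one way such bounds can discard candidates; prune_candidates is a hypothetical helper written for illustration and is not the termination test of Algorithm 6 itself. It relies only on the observation that an object whose lower bound exceeds the kth smallest upper bound cannot belong to the k-PNN.

```python
import heapq

def detour_bounds(l_f, l_r, sp_len):
    """Bounds on an object's minimum detour distance to SP(n, t).

    l_f: distance label from the current position n (L_f).
    l_r: distance label from the destination t (L_r).
    sp_len: |SP(n, t)|.
    """
    lower = (l_f + l_r - sp_len) / 2   # as in line 17 of Algorithm 6
    upper = min(l_f, l_r)              # n and t both lie on the query path
    return lower, upper

def prune_candidates(labels, sp_len, k):
    """Keep objects whose lower bound does not exceed the k-th smallest upper bound.

    labels: dict name -> (l_f, l_r). Returns the surviving names, sorted.
    """
    bounds = {name: detour_bounds(l_f, l_r, sp_len)
              for name, (l_f, l_r) in labels.items()}
    uppers = [upper for _, upper in bounds.values()]
    if len(uppers) < k:
        return sorted(bounds)          # too few objects to prune anything
    kth_upper = heapq.nsmallest(k, uppers)[-1]
    return sorted(name for name, (lower, _) in bounds.items() if lower <= kth_upper)
```

For example, with |SP(n, t)| = 8 and k = 2, an object with labels (12, 10) has lower bound 7, which exceeds the second smallest upper bound among objects with labels (2, 7), (5, 5) and (9, 9), so it can be dropped without verification.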

5.2 Deviating from the Shortest Path

In the case that the user does not follow the shortest path, as exemplified in Figure 6, and leaves the current shortest path SP(s, t) = {s, n_2, n_3, t} for destination t through n_5, s_t and n_4, we first need to update the current shortest path to the destination. There will be a split point f on the coming edge such that the shortest path from the current position s' to t is {s', s, n_2, n_3, t} through node s when s' is on the path (s, f), and changes to {s', s_t, n_4, t} through node s_t after the user passes f.

Figure 6: Split point & Object types
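One way to locate the split point f, once |SP(s, t)| and |SP(s_t, t)| are known, is to equate the two route lengths. The sketch below is a derivation made here, not a formula quoted from the paper: it assumes that from any position between s and s_t the user can reach t only by going back through s or forward through s_t, and it measures the offset x from s along the road to s_t, whose length edge_len is also an assumed input. Going back costs x + |SP(s, t)|, going forward costs (edge_len − x) + |SP(s_t, t)|, and the two are equal at the split point.

```python
def split_offset(edge_len, dist_s_to_t, dist_st_to_t):
    """Offset from s (along the road towards s_t) at which both routes tie.

    Solves x + dist_s_to_t == (edge_len - x) + dist_st_to_t for x and clamps
    the result to the road segment [0, edge_len].
    """
    x = (edge_len + dist_st_to_t - dist_s_to_t) / 2
    return min(max(x, 0.0), edge_len)

def route_lengths(x, edge_len, dist_s_to_t, dist_st_to_t):
    """Lengths of the two candidate routes from offset x: (via s, via s_t)."""
    return x + dist_s_to_t, (edge_len - x) + dist_st_to_t
```

With edge_len = 4, |SP(s, t)| = 10 and |SP(s_t, t)| = 8, the tie occurs 1 unit past s, where both routes have length 11; before that offset the route through s is shorter, and after it the route through s_t is.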

To find the split point f, we first search along the coming edges until encountering the first node with out-degree ≥ 3 (s_t in this example); it is easy to see that the shortest path from the current position s' to t must go through SP(s, t) or SP(s_t, t) when s' is on the path (s, s_t). The next step is to find the shortest path SP(s_t, t). If s_t is already contained in the expansion tree T_r, SP(s_t, t) can be constructed by tracing upwards from s_t along parent nodes (recorded by π_r()) until the root t is reached. Otherwise, a Dijkstra's network expansion from s_t is conducted, trying to touch the expansion tree T_r. As stated before, T_r covers the surrounding area of t; therefore, as long as the user does not deviate too much, s_t is close to T_r and the expansion from s_t will meet T_r very soon, after which the shortest path from s_t to t is determined just as in the bidirectional search of the searching phase. In addition, the expansion tree T_f rooted