Dynamic Monitoring of Optimal Locations in Road Network Databases


Bin Yao¹, Xiaokui Xiao², Feifei Li³, Yifan Wu¹

¹Department of Computer Science and Engineering, Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, Shanghai, China

²School of Computer Engineering, Nanyang Technological University, Singapore

³School of Computing, University of Utah


Abstract Optimal location (OL) queries are a type of spatial query that is particularly useful for the strategic planning of resources. Given a set of existing facilities and a set of clients, an OL query asks for a location to build a new facility that optimizes a certain cost metric (defined based on the distances between the clients and the facilities). Several techniques have been proposed to address OL queries, assuming that all clients and facilities reside in an L_p space. In practice, however, movements between spatial locations are usually confined by the underlying road network, and hence, the actual distance between two locations can differ significantly from their L_p distance.

Motivated by the deficiency of the existing techniques, this paper presents a comprehensive study on OL queries in road networks. We propose a unified framework that addresses three variants of OL queries that find important applications in practice, and we instantiate the framework with several novel query processing algorithms. We further extend our framework to efficiently monitor the OLs when locations for facilities and/or clients have been updated. Our dynamic update methods lead to efficient answering of continuous optimal location queries. We demonstrate the efficiency of our solutions through extensive experiments with large real data.

1 Introduction

An optimal location (OL) query concerns three spatial point sets: a set F of facilities, a set C of clients, and a set P of candidate locations. The objective of this query is to identify a candidate location p ∈ P , such that a new facility built at p can optimize a certain cost metric that is defined based on the distances between the facilities and the clients.


OL queries find important applications in the strategic planning of resources (e.g., hospitals, post offices, banks, retail facilities) in both public and private sectors [2, 6, 29]. As an example, we illustrate three OL queries based on different cost metrics.

Example 1 Julie would like to open a new supermarket in San Francisco that can attract as many customers as possible. Given the set F (C) of all existing supermarkets (residential locations) in the city, Julie may look for a candidate location p, such that a new supermarket on p would be the closest supermarket for the largest number of residential locations.

Example 2 John owns a set F of pizza shops that deliver to a set C of places in Gotham city. If John wants to extend his business by adding another pizza shop, a natural choice for him is a candidate location that minimizes the average distance from the points in C to their respective nearest pizza shops.

Example 3 Gotham city government plans to establish a new fire station. Given the set F (C) of existing fire stations (buildings), the government may seek a candidate location that minimizes the maximum distance from any building to its nearest fire station.

Several techniques [2, 6, 26, 29] have been proposed for processing OL queries under various cost metrics. All those techniques, however, assume that F and C are point sets in an L_p space. This assumption is rather restrictive because, in practice, movements between spatial locations are usually confined by the underlying road network, and hence, the commute distance between two locations can differ significantly from their L_p distance. Consequently, the existing solutions for OL queries cannot provide useful results for practical applications in road networks.

Fig. 1 Example of G^◦ (vertices v1-v6, facilities f1-f3, clients c1-c9)

Fig. 2 Example of G (vertices v1-v6, facilities f1-f3, clients c1-c9)

Problem Formulation. This paper presents a novel and comprehensive study on OL queries in road network databases.

We consider a problem setting as follows. First, every facility in F and every client in C is located on an edge of an undirected connected graph G^◦ = (V^◦, E^◦), where V^◦ (E^◦) denotes the set of vertices (edges) in G^◦. Second, every client c ∈ C is associated with a positive weight w(c) that captures the importance of the client. For example, if each client point c represents a residential location, then w(c) may be specified as the size of the population residing at c. Third, there exists a user-specified set E_c^◦ of edges in E^◦, such that a new facility f can be built on any point on any edge in E_c^◦, as long as f does not overlap with an existing facility in F. E_c^◦ can be arbitrary, e.g., we can have E_c^◦ = E^◦. We define P as the set of points on the edges in E_c^◦ that are not in F, and we refer to any point in P as a candidate location. For example, Figure 1 illustrates a road network that consists of 5 vertices and 8 edges. The squares (crosses) in the figure denote the facilities (clients) in the road network. The highlighted edges are the user-specified set E_c^◦ of edges where a new facility may be built.

We investigate three variants of OL queries as follows:

1) The competitive location query asks for a candidate location p ∈ P that maximizes the total weight of the clients attracted by a new facility built on p. Specifically, we say that a client c is attracted by a facility f, and that f is an attractor for c, if the network distance d(c, f) between c and f is at most the distance between c and any facility in F. In other words, the competitive location query ensures that

$$p = \operatorname*{argmax}_{p \in P} \sum_{c \in C_p} w(c), \tag{1}$$

where C_p = {c | c ∈ C ∧ ∀f ∈ F, d(c, p) ≤ d(c, f)}, i.e., C_p is the set of clients attracted by p. Example 1 demonstrates an instance of this query.

2) The MinSum location query asks for a candidate location p ∈ P on which a new facility can be built to minimize the total weighted attractor distance (WAD) of the clients. In particular, the WAD of a client c is defined as â(c) = w(c) · a(c), where a(c) denotes the distance from c to its attractor (referred to as the attractor distance of c). That is, the MinSum location query requires that

$$p = \operatorname*{argmin}_{p \in P} \sum_{c \in C} w(c) \cdot \min\bigl\{ d(c, f) \mid f \in F \cup \{p\} \bigr\} = \operatorname*{argmin}_{p \in P} \sum_{c \in C} \hat{a}(c). \tag{2}$$

Example 2 shows a special case of the MinSum location query where all clients have the same weight.

3) The MinMax location query asks for a candidate location p ∈ P to construct a new facility that minimizes the maximum WAD of the clients, i.e.,

$$p = \operatorname*{argmin}_{p \in P} \Bigl( \max_{c \in C} \bigl\{ \hat{a}(c) \mid F = F \cup \{p\} \bigr\} \Bigr). \tag{3}$$

Example 3 illustrates a MinMax location query.
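To make the three cost metrics concrete, the following minimal sketch evaluates the objectives of Equations 1-3 by brute force over a small, finite sample of candidate locations. It rests on assumptions that are not part of the paper: all points lie on a single straight road, so the network distance d is the absolute difference of positions, and the function names (`competitive_score`, `minsum_cost`, `minmax_cost`) are illustrative only. Real OL queries range over infinitely many candidates, so this enumeration only illustrates the definitions.

```python
def d(x, y):
    """Network distance on a single straight road (toy assumption)."""
    return abs(x - y)

def competitive_score(p, clients, facilities, w):
    """Total weight of the clients attracted by p (objective of Equation 1)."""
    return sum(w[c] for c in clients
               if all(d(c, p) <= d(c, f) for f in facilities))

def wad(c, facilities, w):
    """Weighted attractor distance of client c w.r.t. the given facilities."""
    return w[c] * min(d(c, f) for f in facilities)

def minsum_cost(p, clients, facilities, w):
    """Total WAD after adding a facility at p (objective of Equation 2)."""
    return sum(wad(c, facilities + [p], w) for c in clients)

def minmax_cost(p, clients, facilities, w):
    """Maximum WAD after adding a facility at p (objective of Equation 3)."""
    return max(wad(c, facilities + [p], w) for c in clients)

# Hypothetical data: positions along the road.
clients = [0, 4, 10]
facilities = [2]
w = {c: 1 for c in clients}
candidates = [5, 9]

best_competitive = max(candidates, key=lambda p: competitive_score(p, clients, facilities, w))
best_minsum = min(candidates, key=lambda p: minsum_cost(p, clients, facilities, w))
best_minmax = min(candidates, key=lambda p: minmax_cost(p, clients, facilities, w))
```

In this toy instance the competitive metric prefers the candidate at position 5, whereas the MinSum and MinMax metrics both prefer position 9, which shows that the three variants can disagree on the same input.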

One fundamental challenge in answering an OL query is that there exists an infinite number of candidate locations in P where the new facility may be built, i.e., P is a continuous domain on the edges of the network. (Recall that P contains all points on the edges in the user-specified set E_c^◦, except the points where existing facilities are located.) This necessitates query processing techniques that can identify query results without enumerating all candidate locations. Another complicating issue is that the answer to an OL query may not be unique, i.e., there may exist multiple candidate locations in P that satisfy Equation 1, 2, or 3. We propose to identify all answers for any given optimal location query, and return them to the user for final selection. This renders the problem even more challenging, since it requires additional effort to ensure the completeness of the query results.

Lastly, in some applications it is common that clients or existing facilities have moved on the road network after the last execution of OL queries. Instead of computing the OLs again from scratch, we expect to monitor the query results in an incremental fashion, which may dramatically reduce the query cost (compared to the naive approach of recomputing the OLs after location updates for facilities and clients).

Contributions. In this paper, we propose a unified solution that addresses all aforementioned variants of optimal location queries in road network databases. Our first contribution is a solution framework based on the divide-and-conquer paradigm. In this framework, we process a query by first (i) dividing the edges in G^◦ into smaller intervals, then (ii) computing the best query answers on each interval, and finally (iii) combining the answers from individual intervals to derive the global optimal locations. A distinct feature of this framework is that most of its algorithmic components are generic, i.e., they are not specific to any of the three types of OL queries. This significantly simplifies the design of query processing algorithms, and enables us to develop general optimization techniques that work for all three query types.

Secondly, we instantiate the proposed framework with a set of novel algorithms that optimize query efficiency by exploiting the characteristics of OL queries. We provide theoretical analysis on the performance of each algorithm in terms of time complexity and space consumption.

Thirdly, we present extensions to our framework to enable the incremental monitoring of the query results from OL queries, when the locations of facilities or clients have changed.

Last, we demonstrate the efficiency of our algorithms with extensive experiments on large-scale real datasets. In particular, our algorithms can answer an OL query efficiently on a commodity machine, in a road network with 174,955 vertices and 500,000 clients. Furthermore, the query result can be incrementally updated in just a few seconds after a location update for either a client or a facility.

2 Related Work

The problem of locating “preferred” facilities with respect to a given set of client points, referred to as the facility location problem, has been extensively studied in past years (see [8, 20] for surveys). In its most common form, the problem (i) involves a finite set C of clients and a finite set P of candidate facilities, and (ii) asks for a subset of k (k > 0) facilities in P that optimizes a predefined metric. The problem is polynomial-time solvable when k is a constant, but is NP-hard for general k [8, 20]. Furthermore, existing solutions do not scale well for large P and C. Hence, existing work on the problem mainly focuses on developing approximate solutions.

OL queries can be regarded as variations of the facility location problem with three modified assumptions: (i) P is an infinite set, (ii) k = 1, i.e., only one location in P is to be selected (but all locations that tie with each other need to be returned), and (iii) a finite set F of facilities has been constructed in advance. These modified assumptions distinguish OL queries from the facility location problem.

Previous work [2, 6, 26, 29] on OL queries considers the case when the transportation cost between a facility and a client is decided by their L_p distance. Specifically, Cabello et al. [2] and Wong et al. [26] investigate competitive location queries in the L_2 space. Du et al. [6] and Zhang et al. [29] focus on the L_1 space, and propose solutions for competitive and MinSum location queries, respectively. None of the solutions developed therein is applicable when the facilities and clients reside in a road network.

There also exist two other variations of the facility location problem, namely, the single facility location problem [8, 20] and the online facility location problem [9, 17], that are related to (but different from) OL queries. The single facility location problem asks for one location in P that optimizes a predefined metric with respect to a given set C of clients. It requires that no facility has been built previously, whereas OL queries consider the existence of a set F of facilities.

The online facility location problem assumes a dynamic setting where (i) the set C of clients is initially empty, and (ii) new clients may be inserted into C as time evolves. It asks for a solution that constructs facilities incrementally (i.e., one at a time), such that the quality of the solution (with respect to some predefined metric) is competitive against any solution that is given all client points in advance. This problem is similar to OL queries, in the sense that they all aim to optimize the locations of new facilities based on the existing facilities and clients. However, the techniques [9, 17] for the online facility location problem cannot address OL queries, since those techniques assume that the set P of candidate facility locations is finite; in contrast, OL queries assume that P contains an infinite number of points, e.g., P may consist of all points (i) in an L_p space (as in [2, 6, 26, 29]) or (ii) on a set of edges in a road network (as in our setting).

In the preliminary version of this paper [27], we investigated the static version of optimal location queries. Compared with the preliminary version, this paper presents a new study on handling updates in the locations of facilities and clients. In particular, we present novel incremental methods to identify OLs after updates for all three types of optimal location queries. We also include extensive experiments that demonstrate the efficiency of the incremental update methods over the naive approach of recomputing OLs from scratch after each update (using the static methods from our preliminary version [27]).

Ghaemi et al. [10–12] studied static and dynamic versions of competitive location queries. Their solutions, however, are not applicable to MinSum and MinMax location queries. In contrast, we present a uniform framework for all three variants of optimal location queries. Furthermore, as will be shown in Section 10, our solution for competitive location queries has a much lower memory consumption than Ghaemi et al.’s while only incurring a slightly higher computation cost.

Lastly, there is a large body of literature on query processing techniques for road network databases [3, 4, 15, 16, 19, 21–24, 28]. Most of those techniques are designed for the nearest neighbor (NN) query [16, 21, 22] or its variants, e.g., approximate NN queries [23, 24], aggregate NN queries [28], continuous NN queries [19], path NN queries [3], etc. None of those techniques can address the problem we consider, due to the fundamental differences between NN queries and OL queries. Such differences are also demonstrated by the fact that, despite the plethora of solutions for L_p-space NN queries, considerable research effort [2, 6, 26, 29] is still devoted to OL queries in L_p spaces.

3 Solution Overview

We propose one unified framework for the three variants of OL queries. In a nutshell, our solution adopts a divide-and-conquer paradigm as follows. First, we divide the edges in E^◦ into smaller intervals, such that all facilities and clients fall on only the endpoints (but not the interior) of the intervals. As a second step, we collect the intervals that are segments of some edges in E_c^◦, i.e., all points in such an interval are candidate locations in P. Then, we traverse those intervals in a certain order. For each interval I examined, we compute the local optimal locations on I, i.e., the points on I that provide a better solution to the OL query than any other points on I. The global optimal locations are pinpointed and returned, once we confirm that none of the unvisited intervals can provide a better solution than the best local optima found so far.

In the following, we will introduce the basic idea of each step in our framework; the details of our algorithms will be presented in Sections 4-8. For convenience, we define n as the maximum number of elements in V^◦, E^◦, C, and F , i.e., n = max{|V^◦|, |E^◦|, |C|, |F |}. Table 1 summarizes the notations frequently used in the paper.

Construction of Road Intervals. We divide the edges in E^◦ into intervals, by inserting all facilities and clients into the road network G^◦ = (V^◦, E^◦). Specifically, for each point ρ ∈ C ∪ F, we first identify the edge e ∈ E^◦ on which ρ is located. Let v_l and v_r be the two vertices connected by e. We then break e into two road segments, one from v_l to ρ and the other from ρ to v_r. As such, ρ becomes a vertex in the network. Once all facilities and clients have been inserted into G^◦, we obtain a new road network G = (V, E) where V = V^◦ ∪ C ∪ F. For example, Figure 2 illustrates a road network transformed from the one in Figure 1. Transforming G^◦ to G requires only O(n) space and O(n) time, since |C| = O(n), |F| = O(n), and it takes only O(1) time to add a vertex in G^◦. In the sequel, we simply refer to G as our road network.
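The construction of road intervals can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the paper: edges are given as a dictionary from vertex pairs to lengths, each client or facility is described by its edge and its offset from the edge's first vertex, and no two points share the same offset or coincide with a vertex. The name `build_intervals` is hypothetical. For brevity the sketch sorts the points on each edge, so it does not reproduce the O(n) bound claimed above.

```python
from collections import defaultdict

def build_intervals(edges, points):
    """Split every edge at the clients/facilities that lie on it.

    edges:  {(u, v): length}
    points: {name: ((u, v), offset_from_u)}   # hypothetical input format
    Returns the edge set of the transformed network G as {(a, b): length}.
    """
    on_edge = defaultdict(list)
    for name, (edge, offset) in points.items():
        on_edge[edge].append((offset, name))

    new_edges = {}
    for (u, v), length in edges.items():
        prev, prev_off = u, 0.0
        # Walk along the edge from u to v, cutting at each inserted point.
        for offset, name in sorted(on_edge[(u, v)]):
            new_edges[(prev, name)] = offset - prev_off
            prev, prev_off = name, offset
        new_edges[(prev, v)] = length - prev_off
    return new_edges

# One edge of length 10 carrying a client at offset 3 and a facility at offset 7.
G_edges = build_intervals({("v1", "v2"): 10},
                          {"c1": (("v1", "v2"), 3), "f1": (("v1", "v2"), 7)})
```

After the call, every client and facility is an endpoint of some interval, which is the property the later steps rely on.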

Traversal of Road Intervals. After G is constructed, we collect the set E_c of edges in E that are partial segments of some edges in E_c^◦. For example, the highlighted edges in Figure 2 illustrate the set E_c that corresponds to the set E_c^◦ of highlighted edges in Figure 1. As a next step, we traverse E_c to look for the optimal locations. A straightforward approach is to process the edges in E_c in a random order, which, however, incurs significant overhead, since the optimal locations cannot be identified until all edges in E_c are inspected. Section 6 addresses this issue with novel techniques that avoid the exhaustive search on E_c. The idea is to first divide E_c into subsets, and then process the subsets in descending order of their likelihood of containing the optimal locations.

Table 1 Frequently Used Notations

Symbol              Description
G^◦ = (V^◦, E^◦)    the road network with vertex (edge) set V^◦ (E^◦)
C                   the set of clients
F                   the set of existing facilities
E_c^◦               the user-specified set of edges on which the new facility can be built
P                   the set of candidate locations
d(p1, p2)           the network distance between points p1 and p2
w(c)                the weight of a client c
a(c)                the attractor distance of a client c
â(c)                the weighted attractor distance of a client c
C_p                 the set of clients attracted by a point p
n                   n = max{|V^◦|, |E^◦|, |C|, |F|}
G = (V, E)          the road network transformed from G^◦ (see Section 3)
E_c                 the set of edges in E that are segments of the edges in E_c^◦ (see Section 3)
A(v)                the attraction set of a vertex v in G (see Section 3)
m(p)                the merit of a point p (see Section 4.2)

Identification of Local Optimal Locations. In Section 4, we will present algorithms for computing the local optimal locations on any edge e ∈ E_c, based on (i) the attractor distance of each client, and (ii) the attraction set A(v) of each endpoint v of e. Specifically, the attraction set A(v) contains entries of the form ⟨c, d(c, v)⟩, for any client c such that d(c, v) ≤ a(c). That is, A(v) records the clients that are at least as close to v as to their respective attractors (i.e., the respective nearest facilities). The attraction sets of e’s endpoints are crucial to our algorithm, since they capture all clients that might be affected by a new facility built on e (see Section 4 for a detailed discussion). We will present our algorithms for computing attraction sets and attractor distances in Section 5.
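The definitions of attractor distances and attraction sets can be illustrated with a plain Dijkstra search. The sketch below is not the algorithm of Section 5 (which is not reproduced here); it is a naive baseline that assumes the transformed network G is given as an adjacency dictionary, and the names `dijkstra`, `attractor_distances`, and `attraction_set` are illustrative. Attractor distances come from one multi-source search started at all facilities, and A(v) from one unpruned search started at v.

```python
import heapq

def dijkstra(adj, sources):
    """Shortest network distance from the nearest source to every vertex."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for x, length in adj[u]:
            nd = du + length
            if nd < dist.get(x, float("inf")):
                dist[x] = nd
                heapq.heappush(heap, (nd, x))
    return dist

def attractor_distances(adj, clients, facilities):
    """a(c): distance from each client to its nearest existing facility."""
    dist = dijkstra(adj, facilities)
    return {c: dist[c] for c in clients}

def attraction_set(adj, v, clients, a):
    """A(v): entries <c, d(c, v)> for the clients c with d(c, v) <= a(c)."""
    dist = dijkstra(adj, [v])
    return {c: dist[c] for c in clients
            if dist.get(c, float("inf")) <= a[c]}

# Hypothetical path network: f1 -2- c1 -1- v -3- c2 -1- f2
adj = {}
for u, x, length in [("f1", "c1", 2), ("c1", "v", 1), ("v", "c2", 3), ("c2", "f2", 1)]:
    adj.setdefault(u, []).append((x, length))
    adj.setdefault(x, []).append((u, length))

a = attractor_distances(adj, ["c1", "c2"], ["f1", "f2"])
A_v = attraction_set(adj, "v", ["c1", "c2"], a)
```

Here c1 belongs to A(v) because d(c1, v) = 1 does not exceed a(c1) = 2, while c2 is excluded because d(c2, v) = 3 exceeds a(c2) = 1.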

Updates of Facilities and Clients. In Sections 7 and 8, we present algorithms for incrementally monitoring the results of OL queries when there are updates in the locations of facilities and/or clients. The basic idea of our algorithms is to (i) maintain auxiliary information about the solutions to OL queries, and (ii) utilize the auxiliary information to accelerate the re-computation of query results in case of updates.

4 Local Optimal Locations

This section presents our initial algorithms for computing local optimal locations on any edge e ∈ E_c, given the attraction sets of e’s endpoints, and the attractor distances of the clients. For ease of exposition, we will elaborate our algorithms under the assumption that none of e’s endpoints is an existing facility in F, i.e., both endpoints of e are candidate locations in P. We will discuss how our algorithms can be extended (for the general case) at the end of the discussion for each query type.

Algorithm CompLoc (e)
1.  construct an empty one-dimensional plane R
2.  let ℓ be the length of e, and v_l (v_r) be the left (right) endpoint of e
3.  for each client c that appears in A(v_l) but not A(v_r)
4.      create in R a line segment [0, a(c) − d(c, v_l)]
5.      assign a weight w(c) to the segment
6.  for each client c that appears in A(v_r) but not A(v_l)
7.      create in R a segment [ℓ − a(c) + d(c, v_r), ℓ] with a weight w(c)
8.  for each client c that appears in both A(v_l) and A(v_r)
9.      if ℓ ≤ 2 · a(c) − d(c, v_l) − d(c, v_r)
10.         create in R a line segment [0, ℓ] with a weight w(c)
11.     else
12.         create in R two line segments [0, a(c) − d(c, v_l)] and [ℓ − a(c) + d(c, v_r), ℓ], each with a weight w(c)
13. compute the intervals I ⊆ [0, ℓ], such that I maximizes the total weights of the line segments in R that fully cover I
14. return the intervals identified at Line 13

Fig. 3 The CompLoc Algorithm

4.1 Competitive Location Queries

Recall that a competitive location query asks for a new facility that maximizes the total weight of the clients attracted by it. Intuitively, to decide the optimal locations for such a new facility on a given edge e ∈ Ec, it suffices to identify the set of clients that can be attracted by each point p on e.

As shown in the following lemma, the clients attracted by any p can be easily computed from the attraction sets of e’s endpoints.

Lemma 1 A client c is attracted by a point p on an edge e ∈ E_c, iff there exists an entry ⟨c, d(c, v)⟩ in the attraction set of an endpoint v of e, such that d(c, v) + d(v, p) ≤ a(c).

Proof Observe that d(c, p) ≤ d(c, v)+d(v, p). Hence, when d(c, v) + d(v, p) ≤ a(c), we have d(c, p) ≤ a(c), i.e., c is attracted by p. Thus, the “if” direction of the lemma holds.

Now consider the “only if” direction. Since p is a point on e, the shortest path from p to c must go through an endpoint v of e. Observe that d(p, c) ≥ d(v, c). Therefore, if c is attracted by p, we have a(c) ≥ d(p, c) ≥ d(v, c), which indicates that ⟨c, d(c, v)⟩ must be an entry in A(v).

Based on Lemma 1, we propose the CompLoc algorithm (in Figure 3) for finding local competitive locations on an edge e ∈ Ec. We illustrate the algorithm with an example.

Example 4 Suppose that we apply CompLoc on an edge e_0 with a length ℓ = 5. Figure 4(a) illustrates A(v_l) and A(v_r), where v_l (v_r) is the left (right) endpoint of e_0. Assume that each client c has a weight w(c) = 1 and an attractor distance a(c) = 5.

CompLoc starts by creating a one-dimensional plane R. After that, it identifies those clients that appear in A(v_l) but not A(v_r). By Lemma 1, for any c of those clients, if c is attracted to a point p on e_0, then d(p, v_l) ∈ [0, a(c) − d(c, v_l)], and vice versa. To capture this fact, CompLoc creates in R a line segment [0, a(c) − d(c, v_l)], and assigns a weight w(c) = 1 to the segment. In our example, c1 is the only client that appears in A(v_l) but not A(v_r), and a(c1) − d(c1, v_l) = 1. Hence, CompLoc adds in R a segment s1 = [0, 1] with a weight w(c1) = 1, as illustrated in Figure 4(b).

Next, CompLoc examines the only client c2 that is contained in A(v_r) but not A(v_l). By Lemma 1, a point p ∈ e_0 is an attractor for c2, if and only if d(p, v_l) ∈ [ℓ − a(c2) + d(c2, v_r), ℓ]. Accordingly, CompLoc inserts in R a segment s2 = [ℓ − a(c2) + d(c2, v_r), ℓ] = [3, 5] with a weight w(c2) = 1.

After that, CompLoc identifies the clients c3 and c4 that appear in both A(v_l) and A(v_r). For c3, we have ℓ ≤ 2 · a(c3) − d(c3, v_l) − d(c3, v_r), which (by Lemma 1) indicates that any point on e_0 can attract c3. Hence, CompLoc creates in R a segment s3 = [0, 5] with a weight w(c3) = 1. On the other hand, since ℓ > 2 · a(c4) − d(c4, v_l) − d(c4, v_r), a point p on e_0 can attract c4, if and only if d(p, v_l) ∈ [0, a(c4) − d(c4, v_l)] or d(p, v_l) ∈ [ℓ − a(c4) + d(c4, v_r), ℓ]. Therefore, CompLoc inserts in R two segments s4 = [0, 2] and s′4 = [4, 5], each with a weight 1 (see Figure 4(b)).

As a next step, CompLoc scans through the line segments in R to compute the local competitive locations on e_0. Let p be any point on e_0, and o be the point in R whose coordinate equals the distance from p to v_l. Observe that a client c ∈ C is attracted by p, if and only if there exists a segment s in R, such that (i) s is constructed from c and (ii) s covers o. Therefore, to identify the local competitive locations on e_0, it suffices to derive the intervals I in R, such that (i) I ⊆ [0, ℓ], and (ii) I maximizes the total weight of the line segments that fully cover I. Such intervals can be computed by applying a standard plane sweep algorithm [1] on the line segments in R. In our example, the local competitive locations on e_0 correspond to two intervals in R, namely, [0, 1] and [4, 5], each of which is covered by three segments with a total weight 3. Finally, CompLoc terminates by returning the two intervals [0, 1] and [4, 5], as well as the weight 3.

Our discussion so far assumes that no facility in F is located on an endpoint of the given edge e. Nevertheless, CompLoc can be easily extended for the case when either of e’s endpoints is a facility. The only modification required is that we need to exclude the facility endpoint(s) of e, when we construct the line segment(s) on R that correspond to each client. For example, if we have a line segment [0, 5] and the left endpoint of e is a facility, then we should change the segment to (0, 5] before we compute the local competitive locations on e. The case when the right endpoint of e is a facility can be handled in a similar manner.

Fig. 4 Demonstration of CompLoc. (a) Edge e_0 (length = 5) with attraction sets A(v_l) = {⟨c1, 4⟩, ⟨c3, 1⟩, ⟨c4, 3⟩} and A(v_r) = {⟨c2, 3⟩, ⟨c3, 2⟩, ⟨c4, 4⟩}. (b) Plane R over [0, 5] with the segments s1, s2, s3, s4, and s′4.

CompLoc runs in O(n log n) time and O(n) space. First, constructing line segments in R takes O(n) time and O(n) space, since (i) there exist O(n) clients in the attraction sets of the endpoints of e, and (ii) at most two segments are created from each client. Second, since there are only O(n) line segments in R, the plane sweep algorithm on the segments runs in O(n log n) time and O(n) space.
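The steps of CompLoc can be sketched in a few lines of Python. This is an illustrative rendering rather than the authors' implementation: the dictionary-based input format and the name `comp_loc` are assumptions, both endpoints are assumed to be candidate locations (no facility endpoint), and the sweep is a simple event-based pass in which, at equal coordinates, segment starts are processed before segment ends because the segments are closed.

```python
def comp_loc(length, A_left, A_right, a, w, eps=1e-9):
    """Local competitive locations on an edge, as offsets from v_l.

    A_left / A_right map each client in A(v_l) / A(v_r) to d(c, v_l) / d(c, v_r).
    Returns (intervals, best_weight).
    """
    segs = []  # closed weighted segments (start, end, weight) on the line R
    for c in sorted(A_left.keys() | A_right.keys()):
        in_l, in_r = c in A_left, c in A_right
        if in_l and in_r and length <= 2 * a[c] - A_left[c] - A_right[c]:
            segs.append((0.0, float(length), w[c]))        # Lines 9-10
            continue
        if in_l:
            segs.append((0.0, a[c] - A_left[c], w[c]))      # Lines 4 and 12
        if in_r:
            segs.append((length - a[c] + A_right[c], float(length), w[c]))

    # Plane sweep: kind 0 = segment start, kind 1 = segment end.
    events = sorted([(s, 0, wt) for s, _, wt in segs] +
                    [(e, 1, wt) for _, e, wt in segs])
    best, cur, start, result = 0.0, 0.0, None, []
    for x, kind, wt in events:
        if kind == 0:
            cur += wt
            if cur > best + eps:                 # strictly better: restart
                best, result, start = cur, [], x
            elif abs(cur - best) <= eps and start is None:
                start = x                        # another interval ties
        else:
            if start is not None and abs(cur - best) <= eps:
                result.append((start, x))        # close an optimal interval
                start = None
            cur -= wt
    return result, best

# Data of Example 4 (Figure 4): every client has weight 1 and a(c) = 5.
A_left = {"c1": 4, "c3": 1, "c4": 3}
A_right = {"c2": 3, "c3": 2, "c4": 4}
a = {c: 5 for c in ("c1", "c2", "c3", "c4")}
w = {c: 1 for c in a}
intervals, weight = comp_loc(5, A_left, A_right, a, w)
```

On the data of Example 4 the sketch returns the intervals [0, 1] and [4, 5] with weight 3, in agreement with the discussion above.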

4.2 MinSum Location Queries

For any candidate location p, we define the merit of p (denoted as m(p)) as

$$m(p) = \sum_{c \in C} w(c) \cdot \max\{0,\; a(c) - d(c, p)\}.$$

That is, m(p) captures how much the total WAD of all clients may reduce, if a new facility is built on p. A point is a local MinSum location on an edge e ∈ Ec, if and only if it has the maximum merit among all points on e. Interestingly, the merit of the points on any edge e is always maximized at one endpoint of e, as shown in the following lemma.

Lemma 2 For any point p in the interior of an edge e ∈ E, if m(p) is larger than the merit of one endpoint of e, then m(p) must be smaller than the merit of the other endpoint.

Proof Let v_l (v_r) be the left (right) endpoint of e. Recall that C_p is the set of clients attracted by p. First of all,

$$m(v_l) = \sum_{c \in C} w(c) \cdot \max\{0,\; a(c) - d(c, v_l)\} \ge \sum_{c \in C_p} w(c) \cdot \bigl(a(c) - d(c, v_l)\bigr),$$

and similarly,

$$m(v_r) \ge \sum_{c \in C_p} w(c) \cdot \bigl(a(c) - d(c, v_r)\bigr). \tag{4}$$

Assume w.l.o.g. that m(p) > m(v_l). We have

$$m(p) = \sum_{c \in C_p} w(c) \cdot \bigl(a(c) - d(c, p)\bigr) > m(v_l) \ge \sum_{c \in C_p} w(c) \cdot \bigl(a(c) - d(c, v_l)\bigr),$$

which leads to

$$\sum_{c \in C_p} w(c) \cdot \bigl(d(c, v_l) - d(c, p)\bigr) > 0. \tag{5}$$

Let C_p^l (C_p^r) be the subset of clients c in C_p, such that the shortest path from c to p passes through v_l (v_r). Clearly, C_p^r = C_p − C_p^l, and d(c, p) = d(c, v_l) + d(v_l, p) for any c ∈ C_p^l. By Equation 5,

$$\sum_{c \in C_p^r} w(c) \cdot \bigl(d(c, v_l) - d(c, p)\bigr) > \sum_{c \in C_p^l} w(c) \cdot \bigl(d(c, p) - d(c, v_l)\bigr) = d(v_l, p) \cdot \sum_{c \in C_p^l} w(c). \tag{6}$$

Since d(c, v_l) ≤ d(c, p) + d(v_l, p) for any c ∈ C_p^r, we have

$$d(v_l, p) \cdot \sum_{c \in C_p^r} w(c) \ge \text{LHS of (6)} > d(v_l, p) \cdot \sum_{c \in C_p^l} w(c),$$

which means that

$$\sum_{c \in C_p^r} w(c) > \sum_{c \in C_p^l} w(c). \tag{7}$$

Note that d(c, p) = d(c, v_r) + d(v_r, p) for any c ∈ C_p^r, and d(c, v_r) ≤ d(c, p) + d(v_r, p) for any c ∈ C_p^l. By Equations 4 and 5,

$$\begin{aligned}
m(v_r) - m(p) &\ge -m(p) + \sum_{c \in C_p} w(c) \cdot \bigl(a(c) - d(c, v_r)\bigr) \\
&= \sum_{c \in C_p^l} w(c) \cdot \bigl(d(c, p) - d(c, v_r)\bigr) + \sum_{c \in C_p^r} w(c) \cdot \bigl(d(c, p) - d(c, v_r)\bigr) \\
&\ge d(v_r, p) \cdot \Bigl(-\sum_{c \in C_p^l} w(c) + \sum_{c \in C_p^r} w(c)\Bigr).
\end{aligned} \tag{8}$$

By Equations 7 and 8, and since d(v_r, p) > 0 for any point p in the interior of e, we have m(v_r) − m(p) > 0. Hence, the lemma is proved.

By Lemma 2, if the endpoints of an edge e ∈ Ec have different merits, then the endpoint with the larger merit should be the only local MinSum location on e. But what if the merits of the endpoints are identical? The following lemma provides the answer.

Lemma 3 Let e be an edge in E with endpoints v_l, v_r, such that m(v_l) = m(v_r). Then, either all points on e have the same merit, or v_l and v_r have larger merit than any other points on e.

Proof First of all, by Lemma 2, for any point ρ on e, it must satisfy m(ρ) ≤ m(v_l) = m(v_r), given that m(vl) = m(v_r).

Now, assume on the contrary that there exist two points p and q on e, such that m(v_l) = m(v_r) = m(p) ≠ m(q). This indicates that m(q) < m(v_l) = m(v_r) = m(p). Assume without loss of generality that d(v_l, p) < d(v_l, q). We will prove the lemma by showing that m(p) = m(v_l) cannot hold given m(p) > m(q).

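Let C_p be the set of clients attracted by p. We divide C_p into three subsets C_1, C_2, and C_3, such that

$$\begin{aligned}
C_1 &= \{c \in C_p \mid d(c, p) = d(c, q) - d(p, q)\}, \\
C_2 &= \{c \in C_p \mid d(c, p) = d(c, q) + d(p, q)\}, \\
C_3 &= C_p - C_1 - C_2.
\end{aligned}$$

It can be verified that, for any client c ∈ C_3, the shortest path from c to p must go through v_l. This indicates that

$$d(c, v_l) = d(c, p) - d(v_l, p), \quad \forall c \in C_3. \tag{9}$$

Given m(p) > m(q) and C_p ⊆ C, we have

$$\sum_{c \in C_1} w(c) \cdot \bigl(d(c, q) - d(c, p)\bigr) - \sum_{c \in C_2} w(c) \cdot \bigl(d(c, p) - d(c, q)\bigr) + \sum_{c \in C_3} w(c) \cdot \bigl|d(c, q) - d(c, p)\bigr| > 0.$$

This leads to

$$\sum_{c \in C_1 \cup C_3} w(c) - \sum_{c \in C_2} w(c) > 0. \tag{10}$$

On the other hand, we have

$$\begin{aligned}
m(v_l) - m(p) &\ge \sum_{c \in C_1 \cup C_3} w(c) \cdot \bigl(d(c, p) - d(c, v_l)\bigr) - \sum_{c \in C_2} w(c) \cdot \bigl(d(c, v_l) - d(c, p)\bigr) \\
&= d(v_l, p) \cdot \Bigl(\sum_{c \in C_1 \cup C_3} w(c) - \sum_{c \in C_2} w(c)\Bigr) \\
&> 0 \quad \text{(by Equation 10)}.
\end{aligned} \tag{11}$$

Thus, the lemma is proved.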

By Lemmas 2 and 3, we can identify the local MinSum locations on any given edge e as follows. First, we compute the merits of e’s endpoints based on their attraction sets. If the merits of the endpoints differ, then we return the endpoint with the larger merit as the answer. Otherwise (i.e., both endpoints of e have the same merit γ), we inspect any point p in the interior of e, and derive m(p) using the attraction sets of the endpoints. If m(p) < γ, both endpoints of e are returned as the result; otherwise, we must have m(p) = γ, in which case we return the whole edge e as the answer. In summary, the local MinSum locations on e can be found by computing the merits of at most three points on e, which takes O(n) time and O(n) space given the attraction sets of e’s endpoints.
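The three-point procedure described above can be sketched as follows. The sketch makes several assumptions: the attraction sets are dictionaries from clients to distances, both endpoints are candidate locations, the interior probe is the midpoint of the edge (any interior point would do), and the names `merit` and `local_minsum` are illustrative. The merit of a point at offset x from v_l is evaluated from the two attraction sets only, since a client outside both sets cannot be attracted by any point on e (Lemma 1).

```python
INF = float("inf")

def merit(x, length, A_left, A_right, a, w):
    """m(p) for the point p on the edge at offset x from v_l."""
    total = 0.0
    for c in A_left.keys() | A_right.keys():
        via_l = A_left.get(c, INF) + x             # path through v_l
        via_r = A_right.get(c, INF) + (length - x)  # path through v_r
        total += w[c] * max(0.0, a[c] - min(via_l, via_r))
    return total

def local_minsum(length, A_left, A_right, a, w, eps=1e-9):
    """Return (where, merit) with where in
    {'left', 'right', 'both endpoints', 'whole edge'}."""
    m_l = merit(0.0, length, A_left, A_right, a, w)
    m_r = merit(length, length, A_left, A_right, a, w)
    if m_l > m_r + eps:
        return "left", m_l
    if m_r > m_l + eps:
        return "right", m_r
    # Equal endpoint merits: probe one interior point (Lemma 3).
    m_mid = merit(length / 2, length, A_left, A_right, a, w)
    if m_mid < m_l - eps:
        return "both endpoints", m_l
    return "whole edge", m_l

# Reusing the attraction sets of Figure 4 purely as illustrative input.
A_left = {"c1": 4, "c3": 1, "c4": 3}
A_right = {"c2": 3, "c3": 2, "c4": 4}
a = {c: 5 for c in ("c1", "c2", "c3", "c4")}
w = {c: 1 for c in a}
where, best_merit = local_minsum(5, A_left, A_right, a, w)
```

With this input the endpoint merits are m(v_l) = 7 and m(v_r) = 6, so only the left endpoint is reported. The post-processing for facility endpoints is not included in the sketch.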

Note that the above algorithm assumes that both endpoints of e are candidate locations. To accommodate the case when either endpoint of e is a facility, we post-process the output of our algorithm as follows. If the set S of local MinSum locations returned by our algorithm contains a facility endpoint, we set S = ∅; otherwise, we keep S intact. To understand this post-processing step, observe that the merit of any facility point is zero, since building a new facility on any point in F does not change the attractor distance of any client. Hence, if S contains a facility point, then the maximum merit of all points on e should be zero. In that case, the global MinSum location must not be on e, and hence, we can ignore the local MinSum locations found on e.

Fig. 5 Representing Attractor Distances as Functions of the Location of the New Facility. (a) Attractor Distances Without the New Facility: a(c1) = 5, a(c2) = 4.5, a(c3) = 6, a(c4) = 5.5. (b) Piecewise Linear Functions: g1, g2, g3, g4, and their upper envelope g_up, plotted as y against x ∈ [0, 5].

4.3 MinMax Location Queries

Next, we present our solution for finding the local MinMax locations on any edge e ∈ Ec, i.e., the points on e where a new facility can be built to minimize the maximum WAD of all clients. Our solution is based on the following observation: For any client c, the relationship between the WAD of c and the new facility’s location can be precisely captured using a piecewise linear function.

For example, consider the edge e0 in Figure 4(a). Assume that there exist only 4 clients c₁, c₂, c₃, and c₄, as illustrated in the attraction sets in Figure 4(a). Further assume that (i) the clients’ attractor distances are as shown in Figure 5(a), and (ii) all clients have weight 1. Then, if we add a new facility on e0 that is x (x ∈ [0, 5]) distance away from the left endpoint vl of e0, the WAD of c₃ can be expressed as a piecewise linear function:

$$g_3(x) = \begin{cases} x + 1, & \text{if } x \in [0, 3] \\ 7 - x, & \text{if } x \in (3, 5] \end{cases}$$

We define g₃ as the WAD function of c₃. Similarly, we can also derive a WAD function g_i for each of the other clients c_i (i = 1, 2, 4). Figure 5(b) illustrates g_i (i ∈ [1, 4]).

Let gup be the upper envelope [1] of {g_i}, i.e., gup(x) = max_i{g_i(x)} for any x ∈ [0, 5] (see Figure 5(b)). Then, gup(x) captures the maximum WAD of the clients when a new facility is built on x. Thus, if the point (on e0) that is x distance away from vl is a local MinMax location, then gup must be minimized at x, and vice versa. As shown in Figure 5(b), gup is minimized when x ∈ [0, 0.5]. Hence, the local MinMax locations on e0 are the points p on e0 with d(p, vl) ∈ [0, 0.5].

In general, to compute the local MinMax locations on an edge e, it suffices to first construct the upper envelope of all clients’ WAD functions, and then identify the points at which the upper envelope is minimized. This motivates our MinMaxLoc algorithm (in Figure 6) for computing local MinMax locations.

Given an edge e ∈ Ec, MinMaxLoc first retrieves two attraction sets A(vl) and A(vr), where vl (vr) is the left (right) endpoint of e. After that, it creates a two-dimensional plane R, in which it will construct the WAD functions of


Algorithm MinMaxLoc (e)

1. let ℓ be the length of e, and vl (vr) be the left (right) endpoint of e
2. construct an empty two-dimensional plane R
3. let C₋ be the set of clients that appear in neither A(vl) nor A(vr)
4. find the client c0 ∈ C₋ with the largest WAD
5. construct the WAD function of c0, i.e., draw in R a line segment from the point (0, â(c0)) to the point (ℓ, â(c0))
6. let C∆ be the set of clients that appear in both A(vl) and A(vr)
7. for each client c ∈ C − C∆ − C₋
8.     if c appears in A(vl)
9.         x1 = 0, y1 = w(c)·d(c, vl)
10.         x3 = ℓ, x2 = min{ℓ, a(c) − d(c, vl)}
11.         y2 = y3 = w(c)·(x2 + d(c, vl))
12.     else /∗ if c does not appear in A(vl), but appears in A(vr) ∗/
13.         x1 = ℓ, y1 = w(c)·d(c, vr)
14.         x3 = 0, x2 = max{0, ℓ − a(c) + d(c, vr)}
15.         y2 = y3 = w(c)·(ℓ − x2 + d(c, vr))
16.     construct the WAD function of c, i.e., draw in R two line segments, from (x1, y1) to (x2, y2), then to (x3, y3)
17. for each client c ∈ C∆ /∗ c appears in both A(vl) and A(vr) ∗/
18.     x1 = 0, y1 = w(c)·d(c, vl)
19.     β = ½ℓ − ½d(c, vl) + ½d(c, vr)
20.     x2 = min{β, a(c) − d(c, vl)}, y2 = w(c)·(x2 + d(c, vl))
21.     x3 = max{β, ℓ − a(c) + d(c, vr)}, y3 = y2
22.     x4 = ℓ, y4 = w(c)·d(c, vr)
23.     construct the WAD function of c, i.e., draw in R three line segments, from (x1, y1) to (x2, y2), then to (x3, y3), then to (x4, y4)
24. compute the upper envelope gup of the WAD functions in R
25. identify and return the points on which gup is minimized

Fig. 6 The MinMaxLoc Algorithm

some clients. Specifically, MinMaxLoc first identifies the set C₋ of clients that appear in neither A(vl) nor A(vr). By Lemma 1, for any client c ∈ C₋, the attractor distance of c is not affected by a new facility built on e. Hence, the WAD function of c can be represented by a horizontal line segment in R. Observe that only one of those segments may affect the upper envelope gup, i.e., the segment corresponding to the client c∗ with the largest WAD in C₋. Therefore, given C₋, MinMaxLoc only constructs the WAD function of c∗, ignoring all the other clients in C₋.

Next, MinMaxLoc examines each client c ∈ C − C₋, and derives the WAD function of c based on A(vl) and A(vr). In particular, each WAD function is represented using at most three line segments in R. Finally, MinMaxLoc computes the upper envelope gup of the WAD functions in R, and then identifies and returns the points at which gup is minimized.
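As a sanity check on the three-segment construction for a client that appears in both attraction sets, the following sketch computes the breakpoints and evaluates the resulting function. The function names are hypothetical. The inputs for c₃ use ℓ = 5, w = 1, and a(c₃) = 6 from the running example, while d(c₃, vl) = 1 and d(c₃, vr) = 2 are assumptions inferred from the values of g₃ at x = 0 and x = 5.

```python
def wad_points_both(ell, w, a, d_l, d_r):
    """Breakpoints of the WAD function of a client in both A(vl) and A(vr).

    ell: edge length, w: client weight, a: attractor distance,
    d_l / d_r: distance from the client to the left / right endpoint.
    """
    # beta is where reaching the new facility via vl and via vr cost the same.
    beta = 0.5 * ell - 0.5 * d_l + 0.5 * d_r
    x2 = min(beta, a - d_l)
    y2 = w * (x2 + d_l)
    x3 = max(beta, ell - a + d_r)
    # The middle piece from x2 to x3 is flat; it may collapse to one point.
    return [(0.0, w * d_l), (x2, y2), (x3, y2), (float(ell), w * d_r)]


def evaluate(points, x):
    """Evaluate the piecewise linear function through `points` at x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 > x0 and x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the edge")


g3 = wad_points_both(ell=5, w=1, a=6, d_l=1, d_r=2)
print(g3)
print(evaluate(g3, 2), evaluate(g3, 4))
```

Under these assumed distances the flat middle piece degenerates to the single point x = 3, and the function agrees with g₃(x) = x + 1 on [0, 3] and 7 − x on (3, 5].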

MinMaxLoc can be implemented in O(n log n) time and O(n) space. Specifically, given the attractor distances of the clients in C, we can identify the client c∗ with O(n) time and space. After that, it takes only O(n) time and space to construct the WAD functions of clients, since each function is represented with O(1) line segments. As there exist O(n) segments in R, the upper envelope gup should contain O(n)

Fig. 7 Example of Lemma 4

Fig. 8 Example of GPart

linear pieces, and can be computed in O(n log n) time and O(n) space [13]. Finally, by scanning the O(n) linear pieces of g_up, we can compute the local MinMax locations on e in O(n) time and space.
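The final step, minimizing the upper envelope, can be illustrated with a deliberately simple quadratic-time sketch. It is not the O(n log n) envelope construction of [13]: it evaluates the envelope at every breakpoint and every pairwise crossing, which is enough because a piecewise linear envelope attains its minimum at one of those points. All names are hypothetical, each function is assumed to be given as breakpoints sorted by increasing x, and the second input function (a horizontal line at height 3) is an invented example.

```python
from itertools import combinations


def evaluate(poly, x):
    """Evaluate a piecewise linear function given as sorted breakpoints."""
    for (x0, y0), (x1, y1) in zip(poly, poly[1:]):
        if x1 > x0 and x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the domain")


def segments(poly):
    """Non-degenerate line segments of a polyline."""
    return [(p, q) for p, q in zip(poly, poly[1:]) if q[0] > p[0]]


def crossing(s, t):
    """x-coordinate where two non-parallel segments cross, or None."""
    (ax0, ay0), (ax1, ay1) = s
    (bx0, by0), (bx1, by1) = t
    sa = (ay1 - ay0) / (ax1 - ax0)
    sb = (by1 - by0) / (bx1 - bx0)
    if sa == sb:
        return None
    x = (by0 - ay0 + sa * ax0 - sb * bx0) / (sa - sb)
    return x if max(ax0, bx0) <= x <= min(ax1, bx1) else None


def minimize_envelope(polys, tol=1e-9):
    """Return (minimum of the upper envelope, list of minimizing intervals)."""
    xs = {x for poly in polys for x, _ in poly}
    all_segments = [s for poly in polys for s in segments(poly)]
    for s, t in combinations(all_segments, 2):
        x = crossing(s, t)
        if x is not None:
            xs.add(x)
    xs = sorted(xs)
    vals = [max(evaluate(poly, x) for poly in polys) for x in xs]
    best = min(vals)
    # The envelope is linear between consecutive candidates, so a run of
    # consecutive minimizing candidates forms a minimizing interval.
    intervals, start, prev = [], None, None
    for x, v in zip(xs, vals):
        if abs(v - best) <= tol:
            if start is None:
                start = x
            prev = x
        elif start is not None:
            intervals.append((start, prev))
            start = None
    if start is not None:
        intervals.append((start, prev))
    return best, intervals


g3 = [(0, 1), (3, 4), (5, 2)]   # g3 from the running example
flat = [(0, 3), (5, 3)]         # hypothetical client unaffected by the edge
print(minimize_envelope([g3, flat]))
```

For these two functions the envelope equals 3 wherever g₃ stays at or below 3, so the sketch reports the minimum value 3 on the intervals [0, 2] and [4, 5].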

In addition, MinMaxLoc can also be extended to handle the case when either endpoint of e is a facility in F. In particular, if the left endpoint vl of e is a facility, then MinMaxLoc excludes vl when it computes the upper envelope gup of the WAD functions. That is, the domain of gup is defined as (0, ℓ] instead of [0, ℓ]. The case when the right endpoint of e is a facility can be addressed similarly.

5 Computing Attraction Sets and Attractor Distances

Our algorithms in Section 4 require as input (i) the attractor distances of all clients in C, and (ii) the attraction sets of the endpoints of the given edge e ∈ Ec. The attractor distances can be easily computed using the algorithm by Erwig and Hagen [7]. Specifically, Erwig and Hagen’s algorithm takes as input a road network G and a set F of facilities. With O(n log n) time and O(n) space, the algorithm can identify the distance from each vertex v in G to its nearest facility in F. In the following, we will investigate how to compute the attraction sets of the vertices in G, given the attractor distances derived from Erwig and Hagen’s algorithm.
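To make the input of this section concrete, the nearest-facility distances can be obtained with a plain multi-source Dijkstra traversal, sketched below. This is an illustrative stand-in and may differ in detail from the algorithm of [7]. The adjacency-list representation, the function name, and the assumption that every facility sits on a vertex are all choices made for this sketch.

```python
import heapq


def nearest_facility_distances(graph, facilities):
    """Distance from every reachable vertex to its nearest facility.

    graph: dict mapping each vertex to a list of (neighbor, edge_length).
    facilities: iterable of vertices that host a facility (assumed).
    """
    dist = {f: 0.0 for f in facilities}
    heap = [(0.0, f) for f in dist]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, length in graph[v]:
            nd = d + length
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


# Hypothetical path network a - b - c - d with edge lengths 1, 2, 3.
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("b", 2.0), ("d", 3.0)],
    "d": [("c", 3.0)],
}
print(nearest_facility_distances(graph, ["a", "d"]))
```

Starting all facilities at distance zero lets a single traversal label every vertex, instead of running one traversal per facility.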

5.1 The Blossom Algorithm

By definition, a client c appears in the attraction set of a vertex v, if and only if d(c, v) is no more than the attractor distance a(c) of c. Therefore, given the attractor distances of all clients, we can compute the attraction sets of all vertices in G in a batch as follows. First, we set the attraction set of every vertex in G to ∅. After that, for each client c ∈ C, we apply Dijkstra’s algorithm [5] to traverse the vertices in G in ascending order of their distances to c. For each vertex v encountered, we check whether d(c, v) ≤ a(c). If d(c, v) ≤ a(c), c is inserted into the attraction set of v. Otherwise, d(c, v′) > a(c) must hold for any unvisited vertex v′, i.e., none of the unvisited vertices can attract c. In that case, we terminate the traversal and proceed to the next client. Once all clients are processed, we obtain the attraction sets of all vertices in G. We refer to the above algorithm as Blossom, as illustrated in Figure 9.
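The batch procedure just described can be sketched as one bounded Dijkstra traversal per client. The adjacency-list representation, the function name, and the assumption that every client sits on a vertex are choices made for this sketch, and the attractor distances in the example are invented.

```python
import heapq


def blossom(graph, attractor_dist):
    """Compute the attraction set of every vertex.

    graph: dict mapping each vertex to a list of (neighbor, edge_length).
    attractor_dist: dict mapping each client vertex c to a(c) (assumed).
    """
    attraction = {v: set() for v in graph}
    for c, a_c in attractor_dist.items():
        dist = {c: 0.0}
        heap = [(0.0, c)]
        settled = set()
        while heap:
            d, v = heapq.heappop(heap)
            if v in settled:
                continue
            if d > a_c:
                break  # every remaining vertex is even farther from c
            settled.add(v)
            attraction[v].add(c)
            for u, length in graph[v]:
                nd = d + length
                if nd < dist.get(u, float("inf")):
                    dist[u] = nd
                    heapq.heappush(heap, (nd, u))
    return attraction


# Hypothetical path network a - b - c - d with edge lengths 1, 2, 3.
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("b", 2.0), ("d", 3.0)],
    "d": [("c", 3.0)],
}
sets = blossom(graph, {"a": 1.0, "b": 2.0})
print(sets)
```

The early `break` is the termination test from the text: vertices leave the heap in ascending distance from c, so the first one beyond a(c) ends the traversal for that client.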

Blossom has an O(n² log n) time complexity, since it invokes Dijkstra’s algorithm once for each client, and each execution of Dijkstra’s algorithm takes O(n log n) time in the