Probabilistic Range Query over Uncertain Moving Objects in Constrained 2D Space∗


Zhi Jie Wang, Dong-Hua Wang, and Bin Yao
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

[email protected], [email protected]

Abstract Recently, probabilistic range query (PRQ) over uncertain moving objects has attracted ever-increasing attention, as it returns the moving objects of interest together with quantitative probabilities. Most existing works assume that objects move freely in 2D space. This assumption, however, is impractical in real applications, since various obstacles may limit the movement of the objects. In this paper we consider the problem of PRQ over uncertain moving objects in constrained 2D space. We analyse its unique properties and show that it is infeasible to solve this problem in a direct way. To tackle it, we first propose an elementary solution and then develop a suite of optimized strategies that further improve the efficiency. We demonstrate the effectiveness and efficiency of our proposed approaches through extensive experiments under various experimental settings.

∗Technical Report


1 Introduction

In recent years, with the rapid development of positioning technologies such as GPS, RFID and WSN, and the broad application of location-based services [18] in scenarios such as the digital battlefield, traffic control, mobile workforce management and the transportation industry, range query has received much attention [31, 7, 29, 27, 22, 16, 19, 15, 14, 10]. In general, a mobile object either reports its location to servers through a wireless interface, or is tracked through ground-based radars or satellites [17]. However, it is often impossible for the database to contain the total status of an entity, due to the limited network bandwidth and the limited battery power of mobile devices [4]. Usually only discrete location information can be obtained, which implies that the specific position of an object is uncertain until the next sampled location arrives. To alleviate this problem, the idea of incorporating uncertainty into moving object data has been proposed [28]. A common model for characterizing the location uncertainty of an object associates a closed region with a probability density function (PDF) [28, 4, 23, 3]. In other words, this model assumes that an object can always be found in a closed region, and that its location follows some given probability distribution (e.g., the uniform distribution) over that region.

In the literature on moving object databases, many works discuss PRQ over uncertain moving objects [5, 21, 17, 3, 25, 4, 30, 24]. On the whole, previous works assume either that objects move on well-defined routes (e.g., [32, 5]) or that objects move freely in 2D space (e.g., [3, 30]). The first assumption suits Query processing on Road Networks (QRN), where the road network is represented by a graph. The second assumption suits Query processing on Region (QR), where moving objects have no predefined routes. For example, a battlefield usually has no fixed road network, and tanks and soldiers can move freely. In this paper we are interested in QR. The second assumption, however, is impractical in realistic applications, since various obstacles can restrict the movement of the objects. For example, an automobile usually cannot run in a lake or river. Inspired by this, we introduce the concept of restricted area (RA). First, it better fits real applications (as in the previous examples). Second, more accurate answers can be obtained by incorporating this additional information. In particular, we observe that ignoring constraints leads to incorrect answers, mainly because it ignores the resulting changes of the uncertainty region (UR)¹ and the PDF.

Take Figure 1 as an example, where RL_j, DT_j, R and RA_i denote the recorded location of the moving object O_j, the distance threshold of O_j, the query range, and the i-th restricted area, respectively. For ease of discussion, suppose the location of O_j follows a uniform distribution in its UR (denoted by UR_j). Then the probability of O_j being located in R equals the ratio of two areas, namely the area of UR_j ∩ R over the area of UR_j.

Consider the query “retrieve the objects that are possibly located in R currently and return their appearance probabilities”. Figure 1(a) depicts the case that ignores the constraints, where the circle (O_j. for short) illustrates the UR. In this case, the query answer is {(O_1, 100%), (O_3, 56%), (O_4, 42%)}. In contrast, Figure 1(b) presents the case that considers the constraints, where O_j. cannot simply be regarded as UR_j; the real UR is $O_j. - \bigcup_{i=1}^{4} RA_i$. In this case, the query answer is {(O_1, 100%), (O_3, 22%), (O_4, 76%)}. The two answers differ, and the first one is clearly incorrect.

Figure 1: Illustration of PRQ over uncertain moving objects. (a) Ignoring constraints; (b) considering constraints. Both panels show the objects O1 to O4, the restricted areas RA1 to RA4, the query range R, and the recorded location and distance threshold of O1.

¹The uncertainty region is the so-called closed region in which the object can always be found; for a more formal definition please refer to Section 2.

Motivated by the above fact, we investigate the problem of PRQ over uncertain moving objects in constrained 2D space. To the best of our knowledge, this is the first effort to address this problem. First, we formulate the problem and present a query framework; based on this framework, we analyse each step in detail. However, as demonstrated in Section 3.2, solving this problem in a direct way is complicated and difficult to implement. To solve it, we adopt a pre-approximation idea and present a label based data structure (LBDS), which is convenient not only for representing the UR and the IS (the intersection set between R and the UR), but also for the follow-up computation (e.g., computing the area of the IS). Other important issues (e.g., picking out the real UR, and computing the probabilities for different PDFs) are also discussed. Further, we present two enhanced algorithms, called CompUR and CompIS, for computing the UR and the IS, respectively. In the two algorithms, we use the MBRs (minimum bounding rectangles) of various entities to select useful candidate entities; in particular, a series of strategies (e.g., sorting candidate entities based on their spans, postponing the operation on holes, and lazy update) are incorporated. Finally, as it is time-consuming to compute the UR on the fly, we adopt a preprocessing idea to further enhance the efficiency. In summary, we make the following contributions.

• We refine the previous uncertain moving object model by introducing the concept of restricted area, and re-formulate the query, based on this refined model.

• We analyse the problem and show that tackling it in a direct way is troublesome and difficult to implement; we also note that it is almost infeasible to develop an exact solution.

• We propose a basic solution, followed by a suite of strategies that further improve its efficiency.

• We demonstrate the performance of our proposed algorithms through extensive experiments under various experimental settings.

The rest of the paper is organized as follows. In the next section we formulate our problem. We introduce a basic framework and analyse the problem in Section 3. We present an elementary solution in Section 4, and discuss two enhanced algorithms in Section 5. We introduce the preprocessing idea for further improving the efficiency in Section 6. We evaluate the efficiency of our proposed algorithms through extensive experiments in Section 7. Section 8 reviews the related work. Finally, Section 9 concludes this paper.

2 Problem Definition

Given a territory with M disjoint restricted areas (RAs) and N moving objects (MOs) that move continuously and freely in the territory but cannot enter the RAs, we assume that the last sampled location² of each MO is already stored on the server. In addition, suppose each MO reports its new location to the server when the deviation between its current location and the recorded location (RL) exceeds a given distance threshold (DT). Formally, we denote the territory by T, the i-th restricted area by RA_i (1 ≤ i ≤ M) and the j-th moving object by O_j (1 ≤ j ≤ N). For O_j, we denote its location at an arbitrary time instant t by (L^t_j.X, L^t_j.Y), its recorded location by RL_j, and its distance threshold by DT_j. In addition, we assume that the following conditions are always satisfied:

$$(L^t_j.X,\ L^t_j.Y) \notin \bigcup_{i=1}^{M} RA_i \tag{1}$$

$$(L^t_j.X,\ L^t_j.Y) \in T - \bigcup_{i=1}^{M} RA_i \tag{2}$$

$$\bigcup_{i=1}^{M} RA_i \subset T \tag{3}$$

²In this paper, the terms "last sampled location" and "recorded location" are used interchangeably.

Since the location of an MO changes continuously, it is unreasonable to simply use the RL as its current location; the specific location at the current time is usually unknown. A common model [4, 28] captures the location uncertainty of an MO through two components:

Definition 1. (Uncertainty Region) The uncertainty region of a moving object O_j at a given time t, denoted by UR^t_j, is a closed region where O_j can always be found.

Definition 2. (Uncertainty Probability Density Function) The uncertainty probability density function of O_j at time t, denoted by f^t_j(x, y), is the PDF of O_j's location at time t. Its value is 0 if (L^t_j.X, L^t_j.Y) ∉ UR^t_j.

Since f_j^t(x, y) is a PDF, in theory, it has the property:

$$\int_{UR^t_j} f^t_j(x, y)\,dx\,dy = 1 \tag{4}$$

In general, the UR under the distance based update policy [28] can be derived based on the following formula.

$$UR^t_j = C(RL_j, DT_j) \tag{5}$$

where C(·) denotes a circle with centre RL_j and radius DT_j. For convenience, we use O_j. to denote this region. The above representation holds only when there is no RA. Therefore, the real UR for our problem should be as follows.

$$UR^t_j = O_j. - \bigcup_{i=1}^{M} RA_i \tag{6}$$

Note that, under the distance based update policy, for any two different times t_1 and t_2, we have

$$UR^{t_1}_j = UR^{t_2}_j \tag{7}$$

$$f^{t_1}_j(x, y) = f^{t_2}_j(x, y) \tag{8}$$

where t_1, t_2 ∈ (L, N], L refers to the latest reporting time, and N refers to the current time. In view of this, in the remainder of the paper we use UR_j and f_j(x, y) to denote the UR and the PDF of O_j, respectively.

Definition 3. (Probability Range Query) Given a closed query range R, a probability range query over uncertain moving objects in constrained 2D space returns a set of tuples of the form (O_j, P_j), where P_j is the non-zero probability of O_j being located in R.

Since realistic application environments vary from place to place, the shapes of RAs are diverse, whereas our objective is to establish a general approach rather than to focus on a specific environment. Therefore, throughout this paper we use polygons to denote RAs. In addition, we assume

$$(L^t_j.X,\ L^t_j.Y) \neq (L^t_{j'}.X,\ L^t_{j'}.Y) \tag{9}$$

where j and j' denote two different MOs. Note that in this paper we restrict our attention to the distance based update policy³. For convenience, we summarize frequently used symbols in Table 1.

3Another common location update policy is time based update, i.e. updating the RL periodically (e.g., every 3 minutes); a time based update policy is much more challenging to handle; we leave this interesting topic for a future work.


Table 1: Frequently used symbols in this paper

| Symbol | Description |
|---|---|
| R | query range |
| RA_i | the i-th restricted area |
| O_j | the j-th moving object |
| UR_j | uncertainty region of the j-th moving object |
| DT_j | distance threshold of the j-th moving object |
| f_j(x, y) | PDF of the j-th moving object |
| IS_j | the intersection set between R and UR_j |
| P_j | probability of O_j being located in query range R |

3 Framework & problem analysis

For ease of understanding we start by presenting a basic framework and then provide a detailed analysis of our problem.

3.1 The query framework

Definition 4. (Candidate Moving Object) Given RL_j and DT_j of a moving object O_j, and the query range R, we denote by $B_R$ and $B_{O_j.}$ the MBRs of R and O_j., respectively, where O_j. = C(RL_j, DT_j). O_j is a candidate moving object (CMO) if $B_R \cap B_{O_j.} \neq \emptyset$.

Definition 5. (Candidate Restricted Area) Given RL_j and DT_j of a moving object O_j, and a restricted area RA_i, we denote by $B_{O_j.}$ and $B_{RA_i}$ the MBRs of O_j. and RA_i, respectively, where O_j. = C(RL_j, DT_j). RA_i is a candidate restricted area (CRA) if $B_{RA_i} \cap B_{O_j.} \neq \emptyset$.

Procedure QueFrame
Input: R, RLs, DTs and PDFs
Output: ∪(O_j, P_j), where P_j > 0
(1) Result ← ∅
(2) CMOs ← search moving objects that may be located in R
(3) for each O_j ∈ CMOs
(4)     CRAs ← search candidate restricted areas
(5)     UR_j ← compute the uncertainty region of O_j
(6)     IS_j ← compute UR_j ∩ R
(7)     if (IS_j ≠ ∅)
(8)         if (IS_j = UR_j) then P_j ← 1
(9)         else P_j ← ∫_{IS_j} f_j(x, y) dx dy
(10)    if (P_j ≠ 0) then Result ← (O_j, P_j) ∪ Result
(11) return Result

Figure 2: PRQ over Uncertain Moving Objects in Constrained 2D Space

The basic framework, illustrated in Figure 2, is straightforward to derive. First, we search the CMOs, which can be done by comparing MBRs (line 2). Second, for each CMO, we search its CRAs, and compute its UR and IS (lines 4-6). Third, if the IS is empty, P_j is 0; if the IS equals the UR, we set P_j to 1; otherwise, we obtain P_j by calculating the integral of the PDF over IS_j (lines 7-9). Fourth, if P_j is not equal to 0, we store the identifier of this CMO together with its probability, and then move on to the next CMO until all CMOs have been processed (line 10). Finally, we return the result, which includes all the CMOs that have a non-zero probability (line 11).
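The MBR-based candidate search of line 2 can be sketched as follows. This is a minimal illustration, not the implementation used in this report: the names `MBR`, `circle_mbr` and `candidate_moving_objects` are hypothetical, and a linear scan stands in for whatever spatial index is actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def intersects(self, other: "MBR") -> bool:
        # Closed rectangles: rectangles that merely touch count as intersecting.
        return (self.x_min <= other.x_max and other.x_min <= self.x_max
                and self.y_min <= other.y_max and other.y_min <= self.y_max)


def circle_mbr(rl, dt):
    """MBR of the circle C(RL_j, DT_j): a 2*DT_j by 2*DT_j square centred at RL_j."""
    x, y = rl
    return MBR(x - dt, y - dt, x + dt, y + dt)


def candidate_moving_objects(query_mbr, objects):
    """Return the ids of objects whose circle MBR intersects the query MBR.

    `objects` maps an object id to (recorded_location, distance_threshold).
    Assumption: a plain linear scan replaces an index lookup.
    """
    return [oid for oid, (rl, dt) in objects.items()
            if circle_mbr(rl, dt).intersects(query_mbr)]
```

The same `intersects` test applies unchanged to the search for candidate restricted areas (line 4), with the MBR of each RA in place of the query MBR.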


3.2 Where are the troubles?

With the above framework in hand, everything else seems to follow. We now study the framework in more detail.

Since all RAs are static, their MBRs can be obtained easily. In addition, since the RLs and DTs are already stored in the database, the MBRs of all the MOs can easily be computed (e.g., the MBR of O_j is a square centred at RL_j with size 2DT_j × 2DT_j). Then, for all RAs and MOs, we can directly use a twin-index to manage their MBRs. For instance, we can build two R-trees (or a variant such as the R*-tree), one for the MBRs of the MOs and one for those of the RAs. Therefore, lines 1-4 of procedure QueFrame are indeed easy to implement.

Figure 3: Illustration of a direct solution. (a) O_j. with RL_j, DT_j and its MBR; (b) UR_j and R; (c) IS_j.

Now suppose we have obtained all the CRAs of O_j; we show how to carry out the remaining steps in a direct way. In Figure 3, for example, the grey polygons and the biggest rectangle illustrate the CRAs of O_j and the query range R, respectively. In this case we can rewrite Equation 6 as follows.

$$UR_j = O_j. - \bigcup_{m=1}^{k} CRA_m \tag{10}$$

where k denotes the number of CRAs of O_j, and CRA_m denotes the m-th CRA of O_j. Based on Equation 10, we get UR_j, as shown in Figure 3(b). Further, we execute a Boolean intersection operation between UR_j and R, and get IS_j as shown in Figure 3(c). Since IS_j ≠ ∅, we go to line 9 in procedure QueFrame.

Computing the integral over such a geometrical entity is not an easy task, since its boundary consists of both straight line segments and curves, and it includes many holes. (In fact, the IS may contain multiple subdivisions in addition to holes; these even more complex cases are discussed in Section 5.) A well-known solution to this problem is the Monte Carlo method. The basic idea is to randomly generate N_1 points in UR_j; for each point p, we compute f_j(x_i, y_i), where (x_i, y_i) are the coordinates of p, and check whether p ∈ IS_j. Without loss of generality, suppose N_2 of these points are located in IS_j. Then we have

$$P_j = \frac{\sum_{i=1}^{N_2} f_j(x_i, y_i)}{\sum_{i=1}^{N_1} f_j(x_i, y_i)} \tag{11}$$

If we look a bit deeper into our original idea we realise that four main issues arise.

First, given a random point p, we need to determine whether p ∈ UR_j (or p ∈ IS_j). This is not an easy task. The solution to the point-in-polygon problem [9] cannot be applied directly in our context, since the geometrical entity considered here is more complicated; in particular, the curves on its boundary make it hard to extend the technique in [9] to our case.

Second, suppose O_j follows a uniform distribution in its UR; in this case, it is not reasonable to use the Monte Carlo method, since this method is time-consuming. As Pfoser et al. [17] pointed out, the uniform distribution corresponds to the “worst-case” scenario; in this case, we have

$$P_j = \frac{A(IS_j)}{A(UR_j)} \tag{12}$$

where A(·) denotes the area of a geometrical entity; unless stated otherwise, it has the same meaning in the rest of the paper. Obviously, it should be more efficient to compute the ratio of these two areas. How, then, can we compute the area of such a geometrical entity? A potential solution is to divide the entity into many small strips, as shown in Figure 4(a), compute the area of each strip, and add them together. Clearly, this is an approximate solution, since the curves are treated as line segments when the area of each strip is computed. In practice, this solution is complicated and difficult to implement.

Third, these geometrical entities are somewhat complicated, and as such are not easy to represent and operate on. How, then, can we represent them in a concise and efficient way? A well-known data structure, the doubly connected edge list [1], consists of three collections of records: one for the vertices, one for the faces, and one for the half-edges. For our problem, this data structure is somewhat clunky and not intuitive enough.

Fourth, computing the UR is not a simple subtraction operation. For instance, Figure 4(b) illustrates the case before the subtraction operation is executed; the result is shown in Figure 4(c), in which there are 4 subdivisions. However, only S_2 is the real UR; the other subdivisions are invalid, for reasons explained in Section 4.3.

Figure 4: Illustration of computing the area and the real UR. (a) IS_j; (b) O_j. with RL_j and DT_j; (c) the subdivisions S1 to S4.

One way to overcome the above challenges is to address them directly. However, the above analysis reveals that this natural method is troublesome and difficult to implement. In addition, developing an exact solution to our problem is almost infeasible: first, the Monte Carlo method is itself an approximation algorithm; second, calculating the area of such a complex geometrical entity is difficult.

4 Our Solution

In this section, we first adopt a pre-approximation idea, which serves as the basis for tackling our problem. Second, we propose a label based data structure that is intuitive, concise, and convenient for the follow-up computation. Third, we address the error that can arise in the computation of the UR. Finally, we discuss two methods for computing the probability.

4.1 Pre-approximation

If we approximate the curves on the boundary of the UR (or IS) by line segments, then the troubles shown in Section 3.2 seem easy to tackle. Indeed, existing curve interpolation techniques can transform the boundary of the UR (or IS) into line segments. However, this is still inconvenient and inefficient, since there are too many URs and ISs in the query processing. In addition, approximating curves by line segments in such a manner is itself difficult and troublesome, since the shapes of different URs (or ISs) vary from one to another.

In our solution, we adopt a pre-approximation idea. Specifically, before we compute the UR based on Equation 10, we first transform O_j. into an equilateral polygon (EP) as follows.

$$X_k = RL_j.X + DT_j \cdot \cos\left((k-1) \cdot \frac{2\pi}{EL}\right) \tag{13}$$

$$Y_k = RL_j.Y + DT_j \cdot \sin\left((k-1) \cdot \frac{2\pi}{EL}\right) \tag{14}$$

where (RL_j.X, RL_j.Y) denotes the coordinates of the recorded location, EL is the number of edges of the EP, k ∈ {1, 2, ..., EL}, and (X_k, Y_k) denotes the coordinates of the k-th vertex of the EP.
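The vertex formulas above translate almost directly into code. The following is a minimal sketch; the function name `equilateral_polygon` and the tuple-based point representation are illustrative choices, not part of the original method description.

```python
import math


def equilateral_polygon(rl, dt, el):
    """Vertices (X_k, Y_k), k = 1..EL, following Equations 13 and 14.

    rl: recorded location (x, y); dt: distance threshold; el: number of edges.
    """
    if el < 3:
        raise ValueError("a polygon needs at least 3 edges")
    cx, cy = rl
    step = 2.0 * math.pi / el
    return [(cx + dt * math.cos((k - 1) * step),
             cy + dt * math.sin((k - 1) * step))
            for k in range(1, el + 1)]
```

Every vertex lies on the circle of radius DT_j around RL_j, and the vertices are produced in counter-clockwise order starting from the point directly to the right of the centre.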

For clarity, we denote the EP transformed from O_j. by EP_j. Thus, according to Equation 10, we have

$$UR_j \doteq EP_j - \bigcup_{m=1}^{k} CRA_m \tag{15}$$

Generally speaking, the more edges the EP has, the more accurate the results. Note that we let O_j. be the circumscribed circle of EP_j; hence the distance from any point in EP_j to the centre never exceeds the distance threshold DT_j. The main reasons for this transformation are as follows. First, it is convenient for the follow-up calculations, since operating on line segments is, in most cases, simpler and more efficient than operating on curves. Second, it makes the calculated result easy to represent. Last, all the troubles discussed in Section 3.2 are significantly simplified.
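To give a feel for how the accuracy depends on the number of edges, the following worked calculation compares the area of the polygon with that of its circumscribed circle. It uses only the standard area formula for a regular polygon inscribed in a circle of radius DT_j and is not taken from the experiments of this paper.

$$\frac{A(EP_j)}{A(O_j.)} = \frac{\tfrac{1}{2} \cdot EL \cdot DT_j^2 \cdot \sin(2\pi/EL)}{\pi \cdot DT_j^2} = \frac{EL}{2\pi} \sin\frac{2\pi}{EL}$$

For EL = 32, this ratio is $\frac{32}{2\pi}\sin\frac{\pi}{16} \approx 0.9936$, so the polygon covers all but roughly 0.64% of the circle, and the ratio tends to 1 as EL grows.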

4.2 LBDS

Definition 6. (Outer Ring, Inner Ring) Given a closed region CR with a hole H, the boundary of CR and that of H are termed the outer ring and the inner ring of CR, respectively.

Once the pre-approximation idea is adopted, the boundaries of all the geometrical entities contain no curves. A well-known data structure, the doubly connected edge list (DCEL) [1], may be a candidate for representing the UR and the IS. However, it is redundant and not intuitive for our problem, as discussed in Section 3.2. In particular, we observe that the UR may be a closed region with hole(s) or just a simple closed region, while the IS may consist of multiple subdivisions with hole(s). To operate on them in a unified manner, we present a Label Based Data Structure (LBDS) that consists of three domains: one label domain and two pointer domains.

• Flag: This domain tells us whether there are holes in the entity. Specifically, when the Flag is equal to 0, there is no hole; otherwise, there is at least one hole.

• OPointer: This domain points to a simple polygon that denotes the outer ring of the entity. A simple polygon consists of two domains.

– VPointer: This domain points to a linked list that stores a series of vertices.

– B: This domain stores the MBR of the polygon.

• IPointer: This domain points to a linked list in which the simple polygons are stored if the Flag is not equal to 0. Here the simple polygons denote the holes (or inner rings) of this entity.

The UR can be represented directly by an LBDS. The IS can be represented by a linked list in which a series of LBDSs are stored. The benefits of this structure will become clear in the rest of the paper.
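One possible rendering of the LBDS in code is sketched below. It is an illustration under stated assumptions rather than the authors' implementation: Python lists replace the linked lists, the Flag is derived from the hole list instead of being stored, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SimplePolygon:
    vertices: List[Point]  # plays the role of VPointer
    # B: the MBR of the polygon, stored as (x_min, y_min, x_max, y_max)
    mbr: Tuple[float, float, float, float] = field(init=False)

    def __post_init__(self):
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        self.mbr = (min(xs), min(ys), max(xs), max(ys))


@dataclass
class LBDS:
    outer: SimplePolygon                                      # OPointer
    holes: List[SimplePolygon] = field(default_factory=list)  # IPointer

    @property
    def flag(self) -> int:
        # Flag: 0 means no hole, non-zero means at least one hole.
        return 1 if self.holes else 0


# An IS is a list of LBDS entries, one per subdivision.
IntersectionSet = List[LBDS]
```

Keeping the MBR next to each ring means that later steps can reject most non-overlapping pairs with a rectangle test before any polygon operation is attempted.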

4.3 Picking out the real UR

Executing Formula 15 is straightforward: we use EP_j to subtract CRA_1, then use the obtained result to subtract CRA_2, and so on.

In Section 3.2, we showed that computing the UR is not a simple subtraction operation. In other words, Equations 6 and 10 and Formula 15 are not strictly correct as written; we abuse the notation for simplicity of presentation. Let us revisit Figure 4(c), in which S_2 (rather than the other three subdivisions) is the real UR. This choice is based on the lemma below.

Lemma 1. (Choose Real UR) Given O_j., RL_j and the CRAs of O_j, let S_k be one of the subdivisions obtained after we execute the subtraction operation based on Equation 10. If RL_j ∈ S_k, then √(S_k), where √(·) denotes that its argument is the real UR. Otherwise, ¬(√(S_k)).


Proof. We first prove “RL_j ∉ S_k ⇒ ¬(√(S_k))”. According to Definition 1, we only need to prove that O_j cannot be found in S_k. First, since RL_j is the latest recorded location and the distance threshold is DT_j, O_j must be located in the circle O_j. since its last report. Second, by analytic geometry, O_j cannot reach S_k at all without walking out of O_j. (e.g., in Figure 4(b), O_j cannot reach the topmost or bottommost region of O_j.). Thus, we cannot find O_j in S_k.

Next, we prove “RL_j ∈ S_k ⇒ √(S_k)”. Similarly, according to Definition 1, we only need to prove that O_j can always be found in S_k. We argue by contradiction. Assume that we cannot find O_j in S_k; then O_j must be outside S_k. For any point p ∉ S_k, there are only two cases:

• Case 1: $p \notin O_j.$

• Case 2: $p \in S_k^*$, where $S_k^* \neq S_k \wedge S_k^* \subset O_j. \wedge S_k^* \cap S_k = \emptyset$.

Since RL_j ∈ S_k and we have assumed that O_j cannot be found in S_k, the location of O_j must belong to Case 1 or Case 2. By analytic geometry, in either case O_j must have walked out of O_j. at some point, which contradicts the given condition. Putting everything together, the lemma holds.

In particular, once the pre-approximation idea is used, determining whether RL_j ∈ S_k is simple, since it is a point-in-polygon problem [9].
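The selection rule of Lemma 1 can be sketched with a standard ray-casting point-in-polygon test. This is an illustrative sketch: ray casting is one common technique and not necessarily the one used in [9], each subdivision is assumed to be an (outer ring, list of holes) pair, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(p, vertices):
    """Ray-casting test; `vertices` is a list of (x, y) in boundary order."""
    x, y = p
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pick_real_ur(rl, subdivisions):
    """Return the subdivision (outer, holes) that contains RL_j, or None."""
    for outer, holes in subdivisions:
        if point_in_polygon(rl, outer) and not any(
                point_in_polygon(rl, h) for h in holes):
            return outer, holes
    return None
```

Since the subdivisions are pairwise disjoint, at most one of them can contain RL_j, so returning the first match is sufficient.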

After we obtain the real UR, we can get the IS by executing an intersection operation on the UR and R. There are many algorithms (e.g., [26, 8, 20, 12, 13]) that can perform the intersection operation on polygons with holes. They do not, however, handle well the case where there are many holes. Even so, there is a simple solution to this problem. Specifically, we first compute the intersection set between R and the outer ring of the UR, and term the result OcR. Next, we use OcR to subtract each inner ring of the UR, one by one. Finally, we obtain the IS.

4.4 Two solvers for different PDFs

In this subsection, we discuss two methods for computing the appearance probability; they are used for uniform and arbitrary PDFs, respectively.

Quick Method For a uniform PDF, Equation 12 applies, and the crucial task is to compute the areas of the IS and the UR. We showed in Section 3.2 that computing these areas directly is complicated. Thanks to the pre-approximation and the LBDS, however, computing them is now simple and efficient.

The quick (Q) method is straightforward. First, the area of a polygon can be derived based on the following formula [2].

$$S = \frac{1}{2} \cdot \left( \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} + \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix} + \cdots + \begin{vmatrix} x_n & x_1 \\ y_n & y_1 \end{vmatrix} \right) \tag{16}$$

where $\begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} = x_1 \cdot y_2 - x_2 \cdot y_1$, (x_1, y_1) denotes the coordinates of a vertex, and the other symbols have similar meanings. Further, since the polygon is the most basic element in the LBDS, the area of the UR can be obtained as follows.

$$A(UR_j) = A(OUR_j) - \sum_{i=1}^{K} A(IUR^i_j) \tag{17}$$

where OUR_j denotes the outer ring of UR_j, K (≥ 0) is the number of inner rings (or holes) in UR_j, and IUR^i_j denotes the i-th inner ring in UR_j, 1 ≤ i ≤ K. Similarly, we have

$$A(IS_j) = \sum_{i=1}^{NS} A(IS^i_j) = \sum_{i=1}^{NS} \left( A(OIS^i_j) - \sum_{k=1}^{K} A(IIS^{i,k}_j) \right) \tag{18}$$

where NS (≥ 1) is the number of subdivisions in IS_j, IS^i_j is the i-th subdivision, OIS^i_j is the outer ring of IS^i_j, K (≥ 0) is the number of inner rings in IS^i_j, and IIS^{i,k}_j is the k-th inner ring of IS^i_j, 1 ≤ k ≤ K.
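The quick method can be sketched in a few lines. The sketch below assumes that each region is given as an (outer ring, list of holes) pair of vertex lists; the function names are illustrative. Taking the absolute value of the determinant sum is an added convenience so that the result does not depend on whether a ring is listed clockwise or counter-clockwise.

```python
def ring_area(vertices):
    """Area of a simple polygon via the determinant sum of Equation 16."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    # abs() makes the result independent of the vertex orientation.
    return abs(twice_area) / 2.0


def region_area(outer, holes):
    """Equation 17: area of the outer ring minus the areas of the inner rings."""
    return ring_area(outer) - sum(ring_area(h) for h in holes)


def uniform_probability(ur, intersection_set):
    """Equation 12, with the areas taken from Equations 17 and 18."""
    area_ur = region_area(*ur)
    area_is = sum(region_area(outer, holes) for outer, holes in intersection_set)
    return area_is / area_ur
```

For example, a 4 × 4 square UR with a 0.5 × 0.5 hole has area 15.75; if the IS is its left half (which still contains the hole), the probability is 7.75 / 15.75.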

Remark 1. In the rest of the paper, the notations OUR_j, IUR^i_j and the like retain the same meanings, unless stated otherwise.

Monte Carlo Method We mentioned the Monte Carlo (MC) method in Section 3.2, where the trouble was to determine whether p ∈ UR_j (or p ∈ IS_j). Now there is almost no trouble, since no curve remains on the boundary of the UR (or IS) once the pre-approximation idea is used.

Definition 7. (Valid Random Point) Given a randomly generated 2D point $p \in B_{UR_j}$, where $B_{UR_j}$ denotes the MBR of UR_j, the point p is a valid random point (VRP) if p ∈ UR_j.

Procedure MonteCarlo
Input: UR_j, IS_j, N'
Output: P_j
(1) P_j ← 0, SUM_1 ← 0, SUM_2 ← 0, N_1 ← 0
(2) repeat
(3)     p ← generate a random point in the MBR of UR_j
(4)     if (p ∈ OUR_j) then
(5)         sign ← 0; if (UR_j.Flag ≠ 0) then
(6)             for each hole H in UR_j
(7)                 if (p ∈ H) then sign ← 1; break
(8)         if (sign = 1) then continue  // to generate the next point
(9)         SUM_1 ← SUM_1 + f_j(x_i, y_i); N_1 ← N_1 + 1
(10)        for each S^i_j ∈ IS_j  // where S^i_j is the i-th subdivision of IS_j
(11)            if (p ∈ OIS^i_j) then SUM_2 ← SUM_2 + f_j(x_i, y_i)
(12) until N_1 = N'
(13) P_j ← SUM_2 / SUM_1
(14) return P_j

Figure 5: Pseudocode of the MC method for computing P_j

Figure 5 illustrates the pseudocode. Specifically, there are four steps. (1) We repeatedly generate random points in the MBR of UR_j. (2) For each random point p, we check whether p ∈ UR_j. (3) If so, we compute the value of f_j(x_i, y_i) from the coordinates of p and the PDF, and add this value to a variable SUM_1. In addition, we check whether p ∈ IS_j; if so, we also add this value to another variable SUM_2. (4) When the number of VRPs reaches a pre-set value N', we compute SUM_2 / SUM_1 and assign the result to P_j.
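The four steps above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: regions are assumed to be (outer ring, list of holes) pairs, membership is tested by ray casting, `pdf` is any caller-supplied function assumed to be positive somewhere on the UR, and `n_valid` stands for the pre-set number of valid random points.

```python
import random


def _inside(p, ring):
    """Ray-casting point-in-polygon test for a ring given as [(x, y), ...]."""
    x, y = p
    hit = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            hit = not hit
    return hit


def in_region(p, outer, holes):
    """True if p lies in the outer ring and in none of the holes."""
    return _inside(p, outer) and not any(_inside(p, h) for h in holes)


def monte_carlo_probability(ur, intersection_set, pdf, n_valid, rng=None):
    """Estimate P_j as SUM_2 / SUM_1 over n_valid valid random points."""
    rng = rng if rng is not None else random.Random()
    outer, holes = ur
    xs = [x for x, _ in outer]
    ys = [y for _, y in outer]
    sum1 = sum2 = 0.0
    valid = 0
    while valid < n_valid:
        # Draw a point in the MBR of the UR.
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if not in_region(p, outer, holes):
            continue  # p is not a valid random point; draw again
        weight = pdf(p[0], p[1])
        sum1 += weight
        valid += 1
        if any(in_region(p, o, h) for o, h in intersection_set):
            sum2 += weight
    return sum2 / sum1
```

With a constant `pdf`, the estimate converges to the area ratio of Equation 12, which gives a simple way to cross-check the two methods against each other.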

5 Optimization

Our optimization is based on a well-known principle: reduce the execution time of the most frequent operations. Recall that in Section 4 we compute the UR and the IS in a direct way, which is feasible but not efficient enough. Moreover, the two operations are rather frequent in our query processing, especially when the number of MOs is large. This motivates us to develop more efficient algorithms. In this section, we first introduce some basic concepts, followed by our optimized schemes and two targeted algorithms, called CompUR and CompIS, for computing the UR and the IS, respectively.

5.1 Basic concepts

Let E be a 2D entity. We denote by $B_E$ the MBR of E, and use $B_E.h^-$, $B_E.h^+$, $B_E.v^-$ and $B_E.v^+$ to denote the boundaries of $B_E$. Unless stated otherwise, we treat other MBRs similarly (e.g., the MBR of EP_j is denoted by $B_{EP_j}$).

Definition 8. (Span) Given E and $B_E$, the horizontal span of E is $|B_E.h^+ - B_E.h^-|$, E.hs for short. Similarly, the vertical span of E is $|B_E.v^+ - B_E.v^-|$, E.vs for short. The span of E, denoted by !(E), is MAX{E.hs, E.vs}.

In the process of computing the UR, the geometrical entity evolves continuously as EP_j subtracts the CRAs one by one.

Definition 9. (Effective Subdivision) Given EP_j, RL_j and k CRAs (CRA_1, ..., CRA_k), without loss of generality, assume ‖E − CRA_m‖ > 1 when the m-th subtraction operation is executed, where m ≤ k, E denotes EP_j or its evolved version, and ‖·‖ denotes the number of subdivisions. A subdivision S_i is an effective subdivision if RL_j ∈ S_i.

Let the UR be a closed region with k (≥ 0) holes, and let the query range R be a simple closed region. The IS can be parsed by understanding the relation between R and OUR_j (or IUR^i_j), which is the basis for developing the targeted algorithm for computing the IS.

The geometrical relation between OUR_j and R has five cases, as shown in the top of Table 2. There, G and ≡ denote that the two geometrical entities intersect each other and that they totally coincide, respectively; the remaining relation is that the two entities are disjoint. Next, we mainly discuss Case 4.2 and Case 5.2, since the other cases are straightforward.

Remark 2. The notations G and ∩ used in this paper have different meanings. For example, given two 2D entities E_1 and E_2, E_1 G E_2 denotes that the outer ring of E_1 and that of E_2 intersect each other, whereas E_1 ∩ E_2 denotes the intersection set of E_1 and E_2.

For Case 4.2, the geometrical relation between R and IUR^i_j also has five cases, as shown in the middle of Table 2. Note that, for Case 4.2.5, suppose R is subdivided by a certain hole (e.g., IUR^1_j) into two subdivisions, say S_1 and S_2. It is then possible that S_1 (or S_2) is further subdivided by another hole (e.g., IUR^2_j) into two subdivisions S_{1,1} and S_{1,2} (or S_{2,1} and S_{2,2}), and so on.

Table 2: Parsing IS by understanding the relation

| Name | Condition | Result |
|---|---|---|
| Case 1 | OUR_j and R are disjoint | IS_j = ∅ |
| Case 2 | OUR_j ⊂ R | IS_j = UR_j |
| Case 3 | OUR_j ≡ R | IS_j = UR_j |
| Case 4 | OUR_j ⊃ R | |
| Case 4.1 | k = 0 | IS_j = R |
| Case 4.2 | k ≠ 0 | $IS_j = R - \bigcup_{i=1}^{k} IUR^i_j$ |
| Case 5 | OUR_j G R | |
| Case 5.1 | k = 0 | IS_j = OUR_j ∩ R |
| Case 5.2 | k ≠ 0 | $IS_j = (OUR_j \cap R) - \bigcup_{i=1}^{k} IUR^i_j$ |
| Case 4.2.1 | R ≡ IUR^i_j | IS_j = ∅ |
| Case 4.2.2 | R ⊂ IUR^i_j | IS_j = ∅ |
| Case 4.2.3 | R and IUR^i_j are disjoint | IUR^i_j has no impact on IS_j |
| Case 4.2.4 | R ⊃ IUR^i_j | IUR^i_j will be a hole of IS_j |
| Case 4.2.5 | R G IUR^i_j | IUR^i_j possibly subdivides R |
| Case 5.2.1 | OcR and IUR^i_j are disjoint | IUR^i_j has no impact on IS_j |
| Case 5.2.2 | OcR ⊃ IUR^i_j | IUR^i_j will be a hole of IS_j |
| Case 5.2.3 | OcR G IUR^i_j | IUR^i_j possibly subdivides OcR |

For Case 5.2, we should consider the impact of IUR_j^i on “OUR_j ∩ R”; for short, we write “OcR” for “OUR_j ∩ R”. There are three cases, as shown in the bottom of Table 2. Case 5.2.3 is similar to Case 4.2.5: suppose OcR is subdivided (by a certain hole) into two subdivisions S₁ and S₂; then S₁ (or S₂) is possibly further subdivided by another hole.
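The top part of Table 2 can be read as a simple dispatch on the relation between OUR_j and R and on the number of holes k. The following is a minimal sketch of that reading only; the names `Relation` and `parse_is` are hypothetical, the geometric relation is assumed to be computed elsewhere, and the results are returned as symbolic strings rather than actual regions.

```python
from enum import Enum


class Relation(Enum):
    """Relation between OUR_j and R (hypothetical names for the five cases)."""
    DISJOINT = 1    # Case 1: OUR_j and R are disjoint
    INSIDE = 2      # Case 2: OUR_j is strictly inside R
    COINCIDE = 3    # Case 3: OUR_j coincides with R
    CONTAINS = 4    # Case 4: OUR_j strictly contains R
    INTERSECT = 5   # Case 5: outer rings intersect (written G in the text)


def parse_is(relation, k):
    """Return (case label, symbolic expression for IS_j); k is the number of holes."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if relation is Relation.DISJOINT:
        return ("Case 1", "empty")
    if relation is Relation.INSIDE:
        return ("Case 2", "UR_j")
    if relation is Relation.COINCIDE:
        return ("Case 3", "UR_j")
    if relation is Relation.CONTAINS:
        if k == 0:
            return ("Case 4.1", "R")
        return ("Case 4.2", "R - union(IUR_j^i)")
    if relation is Relation.INTERSECT:
        if k == 0:
            return ("Case 5.1", "OUR_j & R")
        return ("Case 5.2", "(OUR_j & R) - union(IUR_j^i)")
    raise ValueError("unknown relation")
```

Only Case 4.2 and Case 5.2 require the further hole-by-hole analysis given in the middle and bottom parts of the table.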


5.2 Enhancing the efficiency of computing UR

Heuristic 1. Given two CRAs, say CRA_m and CRA_n, and a 2D entity E, we have Pr(‖E − CRA_m‖ > 1) > Pr(‖E − CRA_n‖ > 1) if ω(CRA_m) > ω(CRA_n), where ω(·) denotes the span, ‖·‖ denotes the number of subdivisions, and Pr(·) denotes the probability.

For example, the big equilateral polygon (with 32 edges) in Figure 6(a) denotes EP_j, and the grey polygons denote CRA_m (m ∈ [1, ⋯, 7]). Compared with the other CRAs, CRA₁ here has the largest span, and Pr(‖EP_j − CRA₁‖ > 1) > Pr(‖EP_j − CRA_n‖ > 1), where n ∈ [2, ⋯, 7].

Inspired by Heuristic 1, we have

Optimization 1. (Sort CRAs in Descending Order) Suppose O_j has k CRAs (CRA₁, ⋯, CRA_k). We first sort all the CRAs of O_j in descending order of their spans, then use EP_j to subtract the CRAs one by one.

In the above example, assume the spans of the CRAs decrease from CRA₁ to CRA₇. According to Optimization 1, we deal with CRA₁ first, then CRA₂, and so on.
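The ordering step of Optimization 1 can be sketched as follows. This is an illustration under stated assumptions: each CRA is represented as a list of (x, y) vertices, and the span is approximated here by the diagonal of the MBR, which may differ from the definition of span used elsewhere in this paper. The helper names `mbr`, `span`, and `sort_cras_descending` are hypothetical.

```python
from math import hypot


def mbr(polygon):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def span(polygon):
    """ASSUMPTION: span is taken as the MBR diagonal, for illustration only."""
    xmin, ymin, xmax, ymax = mbr(polygon)
    return hypot(xmax - xmin, ymax - ymin)


def sort_cras_descending(cras):
    """Sort (name, polygon) pairs so that the largest span comes first."""
    return sorted(cras, key=lambda item: span(item[1]), reverse=True)


# Three hypothetical CRAs given as rectangles.
cras = [
    ("small", [(0, 0), (1, 0), (1, 1), (0, 1)]),
    ("wide", [(0, 0), (10, 0), (10, 1), (0, 1)]),
    ("medium", [(0, 0), (3, 0), (3, 4), (0, 4)]),
]
ordered = sort_cras_descending(cras)
order_names = [name for name, _ in ordered]
```

With this ordering, the CRA that is most likely to split EP_j into several subdivisions is handled first.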

Heuristic 2. Recall from Section 4.3 that we choose the real UR at the last step when we compute the UR. This method, however, incurs many unnecessary calculations.

There are two subdivisions, S₁ and S₂, after we have dealt with CRA₁; see Figure 6(b). If we follow the method in Section 4.3, we have to consider each subdivision when we deal with the rest of the CRAs. For instance, when we deal with CRA₂, we may first check whether CRA₂ G S₁; finding instead that CRA₂ G S₂, we then compute S₂ − CRA₂. All these operations are, in fact, redundant.

Lemma 2. (Prune Unrelated Subdivisions) Given EP_j, RL_j and k CRAs (CRA₁, ⋯, CRA_k), without loss of generality, we assume ‖E − CRA_m‖ > 1 when the m-th “subtraction operation” is executed, where m ≤ k and E denotes EP_j or its evolved version. All subdivisions except the effective subdivision can be pruned safely.

Proof. The proof can be derived from Definition 9 and analytic geometry. It is similar to that of Lemma 1 and is omitted to save space.

For ease of discussion, we use EP_j and “EP_j.1” interchangeably. We denote the result of “EP_j.1 − CRA₁” by EP_j.2, the result of “EP_j.2 − CRA₂” by EP_j.3, and so on ⁴. Thus, based on Heuristic 2 and Lemma 2, we develop

Optimization 2. (Choose Effective Subdivision and Update MBR Immediately) Suppose ‖EP_j.m − CRA_m‖ > 1, where m ∈ [1, ⋯, k]. We choose the effective subdivision, discard the other subdivision(s), and update its MBR at once.

Optimizations 1 and 2 together contribute to quickly pruning unrelated CRAs, thus saving overhead. Continuing the example above, we choose S₁ as the effective subdivision, then update the MBR of EP_j and discard S₂; the dashed rectangle in Figure 6(b) illustrates the new MBR. When we deal with CRA₂, by comparing the MBRs, we can quickly determine that CRA₂ is irrelevant to the final result. Otherwise, many unnecessary calculations are involved, as demonstrated before.
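A minimal sketch of Optimization 2 is given below, assuming that each subdivision is available as a simple polygon (outer ring only, as a list of vertices) and that RL_j is a point. The effective subdivision is the one containing RL_j (Definition 9), found here with a standard ray-casting test; the function names and the sample coordinates are hypothetical.

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test: True if pt lies inside the polygon (vertex list)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mbr(polygon):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_disjoint(a, b):
    """True if two MBRs share no point."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def choose_effective_subdivision(subdivisions, rl):
    """Keep the subdivision containing rl and return it with its fresh MBR."""
    for s in subdivisions:
        if point_in_polygon(rl, s):
            return s, mbr(s)
    raise ValueError("rl lies in none of the subdivisions")


# Hypothetical subdivisions S1 and S2 produced by one subtraction.
s1 = [(0, 0), (4, 0), (4, 10), (0, 10)]
s2 = [(6, 0), (10, 0), (10, 10), (6, 10)]
effective, new_mbr = choose_effective_subdivision([s1, s2], rl=(2, 5))

# A later CRA lying entirely within S2 is now pruned by one MBR comparison.
later_cra_mbr = (7, 1, 9, 3)
pruned = mbr_disjoint(later_cra_mbr, new_mbr)
```

Updating the MBR immediately is what makes the later comparison effective: against the old MBR of the whole EP_j, the same CRA would not have been pruned.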

Heuristic 3. Suppose CRA_m ⊂ EP_j.m (m ∈ [1, ⋯, k]). The direct method is to compute “EP_j.m − CRA_m” and let EP_j.(m + 1) ← (EP_j.m − CRA_m) right away. This method, however, complicates the follow-up computation.

⁴ Note that EP_j.n (n ∈ [1, ⋯, k, k + 1]) refers to the effective subdivision. For instance, suppose ‖EP_j.1 − CRA₁‖ > 1; we let the effective subdivision (rather than all subdivisions) be EP_j.2. In addition, for simplicity of presentation, we abuse the notations EP_j.n and EP_j.m in this subsection.


Figure 6: Illustration of Computing UR. (a) EP_j with CRA₁, ⋯, CRA₇; (b) subdivisions S₁ and S₂ together with CRA₂, ⋯, CRA₇.

The above heuristic is derived from the well-known fact that Boolean operations on polygons with holes are generally more complicated than those on polygons without holes. In view of Heuristic 3, we develop

Optimization 3. (Postpone Processing) Suppose CRA_m ⊂ EP_j.m. We postpone the “subtraction operation” by caching CRA_m in a temporary place.

Taking CRA₃ in Figure 6(b) as an example, here CRA₃ ⊂ EP_j.3, so we store it in a temporary place (e.g., a linked list ‘uHoles’) and move on to CRA₄, and so on. After we have traversed all the remaining CRAs, we fetch CRA₃ from ‘uHoles’ and check whether it results in a hole in EP_j.8. If so, we let it be an inner ring of EP_j.8.

Heuristic 4. Suppose CRA_m G EP_j.m ∧ ‖EP_j.m − CRA_m‖ = 1. The direct method is to let EP_j.(m + 1) ← (EP_j.m − CRA_m) and update the MBR of EP_j.(m + 1). This method, however, incurs extra overhead.

The above heuristic is derived from two facts. First, in most cases such a new MBR does not contribute enough to the rest of the computation. Second, we have to traverse the vertices of EP_j.(m + 1) in order to obtain such a new MBR. Thus, we develop

Optimization 4. (Lazy Update) Suppose CRA_m G EP_j.m ∧ ‖EP_j.m − CRA_m‖ = 1. We let EP_j.(m + 1) ← (EP_j.m − CRA_m), but do not update its MBR.

Taking CRA₄ in Figure 6 as an example, it satisfies exactly the above two conditions. Here we let EP_j.5 ← (EP_j.4 − CRA₄), but we do not update the MBR.
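The lazy update is safe because a subtraction can only shrink EP_j, so the old (stale) MBR still encloses the new region; the only cost is that a few CRAs that a fresh MBR would prune are handed to the exact geometric test instead. The sketch below illustrates this with hard-coded polygons standing in for the result of a subtraction; all names and coordinates are hypothetical.

```python
def mbr(polygon):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_contains(outer, inner):
    """True if MBR `inner` lies within MBR `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def disjoint(a, b):
    """True if two MBRs share no point."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


# EP before the subtraction, and the single region left after a CRA
# removed the strip 9 <= x <= 10 (hard-coded stand-in for EP - CRA).
ep_before = [(0, 0), (10, 0), (10, 10), (0, 10)]
ep_after = [(0, 0), (9, 0), (9, 10), (0, 10)]

stale = mbr(ep_before)   # kept under the lazy update
fresh = mbr(ep_after)    # what an immediate update would compute

# The stale MBR is still a valid (conservative) bound of the new region.
still_valid = mbr_contains(stale, fresh)

# A later CRA inside the removed strip: not pruned by the stale MBR,
# so it is merely passed on to the exact test; no wrong pruning occurs.
cra_mbr = (9.2, 2, 9.8, 4)
missed_prune = (not disjoint(cra_mbr, stale)) and disjoint(cra_mbr, fresh)
```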

5.3 Procedure CompUR

Figure 7 depicts the algorithm for computing the UR. Note that some symbols (e.g., EP_j) used in this algorithm should be understood as in our previous discussion, since these entities evolve continuously.

First, we initialize the variables and sort the CRAs based on their spans (lines 1-3). Next, we handle each CRA_m (∈ CRAs) (lines 4-9). When handling each CRA_m, if EP_j has never been subdivided, we execute subprocedure HandleCRA (lines 5-6). Otherwise, we first check whether CRA_m can be pruned directly by comparing the MBRs (line 8); if it cannot be pruned, we also execute subprocedure HandleCRA (line 9). Once we have traversed all the CRAs, we handle the “postponed” CRAs (∈ uHoles). For each postponed CRA, we check whether it results in a hole in EP_j; if so, we add it to a linked list rHoles (lines 10-12). We then let all the CRAs (∈ rHoles) be the inner rings of the last EP_j (line 13). Finally, we obtain the final result (lines 14-15).

The subprocedure HandleCRA is straightforward. There are three cases in terms of the geometrical relation between the current EP_j and CRA_m. When EP_j G CRA_m, we perform a subtraction operation and check whether multiple subdivisions appear; if so, we choose the real subdivision and update the MBR (lines 1-5). When EP_j ⊃ CRA_m, we add CRA_m to the linked list uHoles, and we do nothing when EP_j and CRA_m are disjoint (lines 6-7).


Procedure CompUR
Input: CRAs, RL_j, DT_j
Output: UR_j

(1)  UR_j ← ∅, EP_j ← Transform O_j., B_EPj ← MBR of EP_j
(2)  if (CRAs ≠ ∅) then
(3)      Sort CRAs, switch ← 0, uHoles ← ∅, rHoles ← ∅
(4)      for each CRA_m ∈ CRAs
(5)          if (switch = 0) then
(6)              Subprocedure HandleCRA(uHoles, CRA_m, switch, EP_j)
(7)          else // switch = 1
(8)              if (¬(B_CRAm and B_EPj are disjoint)) then
(9)                  Subprocedure HandleCRA(uHoles, CRA_m, switch, EP_j)
(10)     if (uHoles ≠ ∅) then
(11)         for each CRA_m ∈ uHoles
(12)             if (CRA_m ⊂ EP_j) then rHoles ← rHoles ∪ CRA_m
(13)     if (rHoles ≠ ∅) then Let all CRAs (∈ rHoles) be the inner rings of EP_j
(14) UR_j ← EP_j;
(15) return UR_j

Figure 7: Algorithm for Computing UR

Subprocedure HandleCRA

Input: EP_j, CRA_m, uHoles, switch

(1) if (EP_j G CRA_m)
(2)     if (‖EP_j − CRA_m‖ = 1) then EP_j ← EP_j − CRA_m
(3)     else
(4)         EP_j ← Choose the real subdivision from (EP_j − CRA_m) and update B_EPj
(5)         if (switch = 0) then switch ← 1
(6) else // (EP_j ⊃ CRA_m) or (EP_j and CRA_m are disjoint)
(7)     if (EP_j ⊃ CRA_m) then uHoles ← uHoles ∪ CRA_m

Figure 8: Pseudocode of Handling CRA

5.4 Optimized scheme for computing IS

Let E be a single 2D entity, and let S be a set of N 2D entities (E₁, E₂, ⋯, E_N). Similar to Section 5.1, we use B_E and B_Ei (i ∈ [1, 2, ⋯, N]) to denote the MBRs of E and E_i, respectively.

In addition, let ~(E, S) be an operation that returns a set S^∗ such that (1) S^∗ ⊆ S, and (2) for each element E_i ∈ S^∗, B_Ei and B_E are disjoint.

Heuristic 5. Given E, S, and ~(E, S), it is common that ♯(S^∗) > 0, where ♯(·) denotes the cardinality.

The above heuristic implies that (1) only a part of the entities needs to be considered when we execute geometric operations between E and S, and (2) the MBR is usually an efficient screening tool. Inspired by these observations, we develop

Optimization 5. (Multi-level Screening) We use the MBRs to do an elementary screening in the process of computing the IS; a total of three levels of screening are employed.


Figure 9: Example of Computing IS (I)

The first level screening is used for quickly pruning the unrelated URs. For example, the biggest rectangle in Figure 9 illustrates the query range R, which has five CMOs since B_R ∩ B_EPj ≠ ∅ (j ∈ [1, ⋯, 5]). The grey polygons illustrate the URs. In order to get IS_j, we first execute the first level screening. Here, B_R and B_OURj are disjoint (1 ≤ j ≤ 4); hence we can immediately know that IS_j = ∅. Note that UR₅ cannot be pruned quickly by the first level screening.

Definition 10. (Candidate Hole) Given IUR_j^i and R (or OcR), we use E to denote R (or OcR). IUR_j^i is a candidate hole (CH) if B_E ∩ B_IUR_j^i ≠ ∅.

The second level screening is used for picking out CHs from the k holes IUR_j^i (i ∈ [1, ⋯, k]). There are two cases:

• Case 1: Using the MBR of R to prune unrelated holes.

• Case 2: Using the MBR of OcR (i.e., OUR_j ∩ R) to prune unrelated holes.

Figure 10(a) illustrates the first case; the grey region denotes UR_j, which has seven holes, say IUR_j^i (i ∈ [1, ⋯, 7]). Here R ⊂ OUR_j; thus IS_j = R − ∪_{i=1}^{7} IUR_j^i (recall Case 4.2 in Table 2). By comparing their MBRs, IUR_j^1 and IUR_j^2 can be pruned; the rest of the holes are CHs. Figure 10(b) presents the second case. Here R G OUR_j; thus IS_j = (R ∩ OUR_j) − ∪_{i=1}^{7} IUR_j^i (recall Case 5.2 in Table 2). Similar to the first case, IUR_j^1 and IUR_j^2 can be pruned by the second level screening.

Figure 10: Example of Computing IS (II). (a) R ⊂ OUR_j; (b) R G OUR_j.

The third level screening uses the MBR of a CH to prune unrelated subdivisions. There are also two cases:

• Case 1: The subdivisions are from R.

• Case 2: The subdivisions are from OcR.

As an example, suppose we deal with the CHs in Figure 10(a) from left to right. R − IUR_j^6 has two subdivisions S₁ and S₂ (see Figure 11(a)). Next, since B_S₁ and B_IUR_j^5 are disjoint, S₁ can be pruned. Similarly, after we have dealt with IUR_j^7, S₂ is subdivided into S_2.1 and S_2.2 (see Figure 11(b)). When we deal with IUR_j^3 (or IUR_j^4), S₁ and S_2.1 can be pruned. Figure 10(b) illustrates the second case, where OcR is subdivided into multiple subdivisions.

Remark 3. Similar to Section 5.2, once a single region is subdivided, we update the MBR, but there is a small difference: (1) we use the newly produced “multiple” subdivisions to substitute for the old “single” region, and (2) we update the MBR of “each” new subdivision.
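The three screening levels all reduce to the same MBR disjointness test applied to different pairs of entities. The sketch below shows them side by side; it assumes MBRs are stored as (xmin, ymin, xmax, ymax) tuples, and the function names and sample rectangles are hypothetical rather than taken from the figures.

```python
def disjoint(a, b):
    """True if two MBRs (xmin, ymin, xmax, ymax) share no point."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def first_level(b_r, b_our):
    """Level 1: UR_j survives only if B_R and B_OURj are not disjoint."""
    return not disjoint(b_r, b_our)


def second_level(b_e, hole_mbrs):
    """Level 2: keep the holes whose MBR meets B_E (candidate holes)."""
    return [i for i, b in hole_mbrs.items() if not disjoint(b_e, b)]


def third_level(b_ch, subdivision_mbrs):
    """Level 3: keep the subdivisions whose MBR meets the MBR of the CH."""
    return [s for s, b in subdivision_mbrs.items() if not disjoint(b_ch, b)]


# Hypothetical data: query range R inside a larger outer ring OUR_j.
b_r = (0, 0, 10, 10)
b_our = (-5, -5, 20, 20)
holes = {1: (12, 1, 14, 3), 2: (11, 6, 13, 8), 3: (2, 2, 3, 3), 4: (6, 6, 8, 8)}
subdivisions = {"S1": (0, 0, 4, 10), "S2": (5, 0, 10, 10)}

survives = first_level(b_r, b_our)
candidate_holes = second_level(b_r, holes)
affected_by_hole_4 = third_level(holes[4], subdivisions)
affected_by_hole_3 = third_level(holes[3], subdivisions)
```

In this sketch, rectangles that merely touch are treated as overlapping, which keeps the screening conservative: an entity is pruned only when its MBR is strictly separated from the other one.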

Figure 11: Example of Computing IS (III). (a) subdivisions S₁ and S₂; (b) subdivisions S₁, S_2.1, and S_2.2; (c) subdivisions S₁ and S₂.

Heuristic 6. Given two CHs, say CH_m and CH_n, we assume ω(CH_m) > ω(CH_n). It is more likely to incur extra overhead if we deal with CH_m prior to CH_n.

The above heuristic is derived from two facts. First, once multiple subdivisions appear, we cannot thoroughly discard any subdivision. Second, when we deal with a CH, we first have to choose the subdivision(s) that it may affect.


Optimization 6. (Sort CHs in Ascending Order) Suppose there are k CHs. We first sort them in ascending order of their spans, then use R (or OcR) to subtract them one by one.

Consider Figure 10(b), and suppose the spans increase from IUR_j^3 to IUR_j^7. Then, according to Optimization 6, we first deal with IUR_j^3, next IUR_j^4, and so on. After we have dealt with IUR_j^6, there are two subdivisions, S₁ and S₂ (see Figure 11(c)). When we deal with IUR_j^7, the third level screening is activated since multiple subdivisions appear. Note that here we execute only two comparisons, i.e., we compare the MBR of IUR_j^7 with that of S₁ and with that of S₂. In contrast, the number of comparisons in Figures 11(a) and 11(b) is ten, where we deal with the CHs from left to right.

Remark 4. As in Optimization 4, we update the MBR in a lazy manner if no multiple subdivisions appear. In addition, the strategy for dealing with an entity that results in a hole is the same as in Optimization 3.

5.5 Procedure CompIS

Figure 12 illustrates the algorithm for computing the IS. First, we use the MBRs of R and OUR_j to do the first level screening (lines 1-2). If UR_j is not pruned, we then do different processing based on the geometrical relation and UR_j.Flag (lines 3-12). Here we list only Case 4.2 and Case 5.2; the other cases are straightforward and can be added easily (recall Table 2).

Procedure CompIS
Input: UR_j, R
Output: IS_j

(1)  IS_j ← ∅
(2)  if (¬(B_URj and B_R are disjoint)) then
(3)      if ((UR_j.Flag = 1) ∧ ((OUR_j ⊃ R) ∨ (OUR_j G R))) then
(4)          if (OUR_j ⊃ R) then // Case 4.2
(5)              tempIS ← R
(6)          else // OUR_j G R, Case 5.2
(7)              tempIS ← OcR // OcR = OUR_j ∩ R
(8)          Subprocedure HandleComplex(tempIS, UR_j)
(9)          IS_j ← tempIS;
(10)     else // other cases are straightforward
(11)         IS_j ← “certain value” // please refer to Table 2.
(12) return IS_j

Figure 12: Algorithm for Computing IS

The subprocedure HandleComplex is used for dealing with Case 4.2 and Case 5.2; the pseudocode is shown in Figure 13. First, we obtain the CHs according to the second level screening and then sort them based on their spans (lines 1-4). Next, we process each CH (lines 5-21). If the entity tempIS has never been subdivided by any CH, we execute lines 6-12. Otherwise, we execute lines 13-21. Here we use the MBRs of h_i and S_i to prune unrelated subdivisions (line 16), i.e., the third level screening.

Note that we add the result of S_i − h_i to temp and delete S_i from tempIS (line 18), where temp is a linked list that stores the modified S_i (i.e., S_i − h_i). This is because the current hole will not have any impact on the modified (or new) S_i. Thus we store the modified S_i in temp for the time being, and combine temp and tempIS once we have traversed all the old subdivisions (line 21). In addition, when S_i ⊃ h_i, we postpone dealing with this hole by storing h_i in rHoles (lines 16 and 19), where rHoles is a linked list used for storing each h_i (∈ cHoles) that must be a hole in IS_j. Finally, if rHoles is not empty, we put all the holes (∈ rHoles) into their corresponding subdivisions (lines 22-26).