
Fast Algorithms for the Density Finding Problem


DOI 10.1007/s00453-007-9023-8


D.T. Lee · Tien-Ching Lin · Hsueh-I Lu

Received: 27 September 2006 / Accepted: 2 August 2007

© Springer Science+Business Media, LLC 2007

Abstract We study the problem of finding a specific density subsequence of a sequence arising from the analysis of biomolecular sequences. Given a sequence $A = (a_1, w_1), (a_2, w_2), \ldots, (a_n, w_n)$ of $n$ ordered pairs $(a_i, w_i)$ of numbers $a_i$ and widths $w_i > 0$ for each $1 \le i \le n$, two nonnegative numbers $\ell$, $u$ with $\ell \le u$, and a number $\delta$, the DENSITY FINDING PROBLEM is to find the consecutive subsequence $A(i^*, j^*)$ over all $O(n^2)$ consecutive subsequences $A(i, j)$ with width constraint satisfying $\ell \le w(i, j) = \sum_{r=i}^{j} w_r \le u$ such that its density $d(i^*, j^*) = \sum_{r=i^*}^{j^*} a_r / w(i^*, j^*)$ is closest to $\delta$. The extensively studied MAXIMUM-DENSITY SEGMENT PROBLEM is a special case of the DENSITY FINDING PROBLEM with $\delta = \infty$. We show that the DENSITY FINDING PROBLEM has a lower bound $\Omega(n \log n)$ in the algebraic decision tree model of computation. We give an algorithm for the DENSITY FINDING PROBLEM that runs in optimal $O(n \log n)$ time and $O(n \log n)$ space for the case when there is no upper bound on the width of the sequence, i.e., $u = w(1, n)$. For the general case, we give an algorithm that runs in $O(n \log^2 m)$ time and $O(n + m \log m)$ space, where $m = \min\{(u - \ell)/w_{\min},\, n\}$ and $w_{\min} = \min_{r=1}^{n} w_r$. As a byproduct, we give another $O(n)$ time and space algorithm for the MAXIMUM-DENSITY SEGMENT PROBLEM.

Keywords Maximum-density segment problem · Density finding problem · Slope selection problem · Convex hull · Computational geometry · GC content · DNA sequence · Bioinformatics

Grants NSC95-2221-E-001-016-MY3, NSC-94-2422-H-001-0001, and NSC-95-2752-E-002-005-PAE, and by the Taiwan Information Security Center (TWISC) under the Grants NSC NSC95-2218-E-001-001, NSC95-3114-P-001-002-Y, NSC94-3114-P-001-003-Y and NSC 94-3114-P-011-001.

D.T. Lee · T.-C. Lin · H.-I Lu
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
e-mail: kero@iis.sinica.edu.tw

D.T. Lee
e-mail: dtlee@csie.ntu.edu.tw

H.-I Lu
e-mail: hil@csie.ntu.edu.tw

D.T. Lee · T.-C. Lin
Institute of Information Science, Academia Sinica, Nankang, Taipei 115, Taiwan

1 Introduction

Let $A$ be a sequence of $n$ ordered pairs of numbers $(a_1, w_1), (a_2, w_2), \ldots, (a_n, w_n)$ with $w_i > 0$ for each $i$. A segment $A(i, j) = (a_i, w_i), (a_{i+1}, w_{i+1}), \ldots, (a_j, w_j)$ is a consecutive subsequence of $A$ starting with index $i$ and ending with index $j$. The width $w(i, j)$ of segment $A(i, j)$ is $w_i + w_{i+1} + \cdots + w_j$. The density $d(i, j)$ of segment $A(i, j)$ is $(a_i + a_{i+1} + \cdots + a_j)/w(i, j)$. Given a sequence $A$ of $n$ ordered pairs of numbers, a density $\delta$, and width bounds $\ell$ and $u$ with $\ell \le u$, the DENSITY FINDING PROBLEM is to find the feasible segment $A(i^*, j^*)$ that minimizes $|d(i, j) - \delta|$. We say that a segment $A(i, j)$ is feasible if its width satisfies $\ell \le w(i, j) \le u$.
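As a baseline for checking faster implementations, the definition above can be evaluated directly. The following Python sketch enumerates all $O(n^2)$ segments using prefix sums. The function name `density_finding_bruteforce` and its parameter names are illustrative and do not come from the paper; indices are 1-based and inclusive, as in the text.

```python
from itertools import accumulate


def density_finding_bruteforce(a, w, lo, hi, delta):
    """Return (i, j, density) for the feasible segment A(i, j) whose
    density is closest to delta, or None if no segment is feasible.

    a, w  : lists of values a_i and positive widths w_i
    lo, hi: width bounds (ell and u in the text)
    Ties are broken in favour of the first segment found.
    """
    n = len(a)
    pa = [0] + list(accumulate(a))  # pa[k] = a_1 + ... + a_k
    pw = [0] + list(accumulate(w))  # pw[k] = w_1 + ... + w_k
    best = None
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            width = pw[j] - pw[i - 1]
            if lo <= width <= hi:
                d = (pa[j] - pa[i - 1]) / width
                if best is None or abs(d - delta) < abs(best[2] - delta):
                    best = (i, j, d)
    return best
```

This quadratic routine is only meant as a reference against which the $O(n \log n)$ and $O(n \log^2 m)$ algorithms discussed later could be validated on small inputs.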

The density finding problem with $\delta = \infty$ is exactly the extensively studied MAXIMUM-DENSITY SEGMENT PROBLEM [1–9], which arises from the investigation of non-uniformity of nucleotide composition within genomic sequences. For the uniform case where all $w_i$'s are identical, for each $1 \le i \le n$, Nekrutenko and Li [8] and Rice, Longden, and Bleasby [9] presented algorithms for the case $u = \ell$, which is trivially solvable in $O(n)$ time. If $u \ne \ell$, this problem can be easily solved in $O(n(u - \ell))$ time, linear in the number of feasible segments. Huang [4] studied the special case $u = n$, and observed that an optimal segment exists with width at most $2\ell - 1$. Therefore, this case is equivalent to the case with $u = 2\ell - 1$ and can be solved in $O(n\ell)$ time. Lin, Jiang, and Chao [6] gave an $O(n \log \ell)$ time algorithm based on right-skew decomposition. For the general case when $u < n$, Goldwasser, Kao, and Lu [2,3] gave an $O(n)$ time algorithm. Without loss of generality we shall assume $w_i = 1$ for the uniform case in this paper. For the nonuniform case Goldwasser, Kao, and Lu [2] also gave an $O(n \log(u - \ell))$ time algorithm. Kim [5] gave a clever $O(n)$ time algorithm based on a geometric interpretation of the problem, which transforms the finding of a maximum-density segment into the finding of a maximum-slope line segment in computational geometry. Unfortunately, Kim's algorithm has a flaw, as pointed out by Chung and Lu [1], which causes it to run in $O(n^2)$ time in the worst case. Chung and Lu [1] recently gave an $O(n)$ time algorithm bypassing the complicated right-skew decomposition. Based on Kim's geometric technique and the decomposability of tangent queries to convex hulls, we fix the flaw of Kim's algorithm and give yet another $O(n)$ time algorithm for the nonuniform case.

The MAXIMUM-DENSITY SEGMENT PROBLEM becomes the problem of finding the most GC-rich region of a given DNA sequence when we let the input sequence $A = (a_1, w_1), (a_2, w_2), \ldots, (a_n, w_n)$ correspond to a given DNA sequence with uniform width such that $a_i = 1$ if the corresponding nucleotide in the DNA sequence is G or C, and $a_i = 0$ if the corresponding nucleotide in the DNA sequence is A or T. The output segment then corresponds to the most GC-rich region of the given DNA sequence.
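To make the encoding concrete, the Python sketch below maps a DNA string to the 0/1 sequence described above and solves the simplest uniform case $u = \ell$ (a fixed window length) with a sliding window in $O(n)$ time. The names `gc_indicator` and `most_gc_rich_window` are illustrative only, and the window start is reported as a 0-based string offset, which is an assumption of this sketch rather than the paper's convention.

```python
def gc_indicator(dna):
    """Map each nucleotide to a_i = 1 for G or C and a_i = 0 otherwise."""
    return [1 if c in "GC" else 0 for c in dna.upper()]


def most_gc_rich_window(dna, length):
    """Return (start, gc_density) of the first window of the given length
    with the highest GC content; start is a 0-based offset into dna."""
    a = gc_indicator(dna)
    if length <= 0 or length > len(a):
        raise ValueError("window length must be between 1 and len(dna)")
    current = sum(a[:length])          # GC count of the window at offset 0
    best_count, best_start = current, 0
    for start in range(1, len(a) - length + 1):
        # Slide by one: add the entering symbol, drop the leaving one.
        current += a[start + length - 1] - a[start - 1]
        if current > best_count:
            best_count, best_start = current, start
    return best_start, best_count / length
```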


The DENSITY FINDING PROBLEM for $A$ with respect to density $\delta$ was motivated by the results due to Ioshikhes and Zhang [10], and to Ohler et al. [11], who used a specific GC-ratio as one of the parameters in their programs that locate promoter regions in large-scale genomic analysis.

The rest of the paper is organized as follows. Section 2 reduces the DENSITY FINDING PROBLEM to a geometric SLOPE FINDING PROBLEM. Section 3 gives our density finding algorithm and a proof of its optimality. Section 4 gives an optimal algorithm for finding the maximum-density segment. Section 5 concludes this paper with some future research directions.

2 Reduction to a Geometric Slope Finding Problem

We first briefly review some basic concepts of computational geometry used in this paper. We say that a point $p$ is below (resp. above) a line $l$ or a line segment $uv$ if the y-coordinate of $p$ is less (resp. greater) than that of the point $p_l$, which is the intersection of the vertical line passing through $p$ with line $l$ or with line segment $uv$. The point $p_l$ is called the vertical projection of $p$ on $l$ or on $uv$. A polygonal chain $C$ is a sequence of points $c_1, c_2, \ldots, c_t$ such that $c_i c_{i+1}$, $i = 1, 2, \ldots, t - 1$, is a line segment, or an edge. A polygonal chain $C$ is said to be monotone with respect to a straight line $l$ if any line perpendicular to $l$ intersects $C$ at most once. If the line perpendicular to $l$ passes through point $c_i$, then the intersection point $q_i$ on $l$ is called the orthogonal projection of $c_i$ on $l$. Similarly, a point $p$ is said to be below (resp. above) a polygonal chain $C$ monotone with respect to the x-axis if $p$ is below (resp. above) an edge of $C$.

The upper hull of a point set $P = \{p_0, p_1, \ldots, p_n\}$ in the plane, denoted $\mathrm{UH}(P)$, is a polygonal chain monotone with respect to the x-axis such that any point of $P$ is either on $\mathrm{UH}(P)$ or below $\mathrm{UH}(P)$. That is, $\mathrm{UH}(P)$ is a sequence of hull points $p_{h_1}, p_{h_2}, \ldots, p_{h_t}$, where $p_{h_i} \in P$, such that for every point $p_i = (x_i, y_i) \in P$ the vertical line $\nu_i$ passing through $p_i$ intersects the sequence of hull edges $p_{h_1}p_{h_2}, p_{h_2}p_{h_3}, \ldots, p_{h_{t-1}}p_{h_t}$ exactly once and the intersection point $q_i$ has y-coordinate no less than $y_i$. Let $p_{h_m}$ be the hull point in $\mathrm{UH}(P)$ with the greatest y-coordinate. If the hull points in $\mathrm{UH}(P)$ with the greatest y-coordinate are not unique, we set $p_{h_m}$ to be the rightmost point among those points. We define the left branch $L(P)$ (resp. right branch $R(P)$) of $\mathrm{UH}(P)$ to be the sequence of hull points $p_{h_1}, p_{h_2}, \ldots, p_{h_m}$ (resp. $p_{h_m}, p_{h_{m+1}}, \ldots, p_{h_t}$).

Given an upper hull $\mathrm{UH}(P)$ and a point $q \notin P$ external to $\mathrm{UH}(P)$, the tangent segment of $\mathrm{UH}(P)$ from $q$ is a line segment $l_i = p_{h_i}q$ passing through a hull point $p_{h_i}$ on $\mathrm{UH}(P)$ such that all points of $P$ lie entirely below the line containing $l_i$; the point $p_{h_i}$ is called the tangent point of $\mathrm{UH}(P)$ from $q$. Given any hull point $p_{h_r} \in \mathrm{UH}(P)$, if $p_{h_r}$ is not the tangent point of $\mathrm{UH}(P)$ from $q$ and $p_{h_r}q$ does not intersect the region below $\mathrm{UH}(P)$, then $p_{h_r}$ is called a reflex point of $\mathrm{UH}(P)$ from $q$. If $p_{h_r}$ is neither a tangent point nor a reflex point from $q$, then it is called a concave point of $\mathrm{UH}(P)$ from $q$. It is easy to see that testing whether a hull point on $\mathrm{UH}(P)$ is tangent, reflex, or concave from a point $q$ not in $P$ can be done in $O(1)$ time, assuming all the hull points are maintained as a doubly-linked list and the hull points before and after a given hull point can be accessed in $O(1)$ time. An example of the upper hull is shown in Fig. 1.


Fig. 1 The upper hull $\mathrm{UH}(P) = p_{h_1}, p_{h_2}, \ldots, p_{h_t}$. The left branch $L(P)$ (resp. right branch $R(P)$) of $\mathrm{UH}(P)$ is $p_{h_1}, p_{h_2}, \ldots, p_{h_m}$ (resp. $p_{h_m}, p_{h_{m+1}}, \ldots, p_{h_t}$). $p_{h_3}$ is the tangent point on $\mathrm{UH}(P)$ from $q$, $\{p_{h_4}, p_{h_5}, \ldots, p_{h_t}\}$ are reflex points on $\mathrm{UH}(P)$ from $q$, and $\{p_{h_1}, p_{h_2}\}$ are concave points on $\mathrm{UH}(P)$ from $q$

We shall transform the DENSITY FINDING PROBLEM into a geometric SLOPE FINDING PROBLEM as follows. We define the point set $P = \{p_0, p_1, \ldots, p_n\}$ in $\mathbb{R}^2$ according to the prefix sums of the sequence $A$, where $p_i = (x_i, y_i) = \left(\sum_{k=1}^{i} w_k, \sum_{k=1}^{i} a_k\right)$, $i = 1, 2, \ldots, n$, and $p_0 = (0, 0)$. To simplify the presentation, we assume that the points are in general position, meaning that no three points are collinear and no two induced lines passing through any two points have the same slope. It is easy to see that the slope $m(i, j)$ of the line segment $s(i, j)$ connecting $p_i$ and $p_j$ is equal to the density $d(i + 1, j)$ of the segment $A(i + 1, j)$, so we can define a line segment $s(i, j)$ to be feasible if its corresponding segment $A(i + 1, j)$ is feasible, i.e., $s(i, j)$ is feasible if $\ell \le w(i + 1, j) \le u$. Let $F(\delta)$ be the set of all feasible line segments of $P$. Let $F^{+}(\delta) = \{s(i, j) \in F(\delta) \mid m(i, j) \ge \delta\}$ and $F^{-}(\delta) = \{s(i, j) \in F(\delta) \mid m(i, j) \le \delta\}$ denote the sets of all feasible line segments of $P$ with slopes no less than and no greater than $\delta$, respectively.

Without loss of generality, we may assume $\delta = 0$ for the DENSITY FINDING PROBLEM, since the density finding problem for sequence $A$ with respect to density $\delta$ and width bounds $\ell$ and $u$ is equivalent to the same problem for the sequence $B$ of $n$ ordered pairs $(b_i, w_i)$ with respect to density 0 and the same width bounds, where $b_i = a_i - \delta w_i$ for each $i = 1, 2, \ldots, n$. Clearly, we can further restrict our attention to segments in $F^{+}(0)$, since the segments in $F^{-}(0)$ can be converted to those in $F^{+}(0)$ by replacing $b_i$ with $-b_i$ for each $i = 1, 2, \ldots, n$. Therefore, it suffices to consider the following geometric SLOPE FINDING PROBLEM.

Given a set of points $P = \{p_0, p_1, \ldots, p_n\}$ in $\mathbb{R}^2$, where $p_i = (x_i, y_i)$ as defined earlier, and two width bounds $\ell$, $u$ with $\ell \le u$, find the feasible line segment $s(i^*, j^*)$ in $F^{+}(0)$ that minimizes $m(i, j)$.
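The reduction can be checked numerically. The Python sketch below builds the prefix-sum points, confirms that the slope $m(i, j)$ equals the density $d(i + 1, j)$, and applies the shift $b_i = a_i - \delta w_i$. All function names are illustrative, and exact rational arithmetic via `fractions.Fraction` is a choice made for this sketch (it assumes integer or rational inputs), not something prescribed by the paper.

```python
from fractions import Fraction
from itertools import accumulate


def prefix_points(a, w):
    """Return [p_0, ..., p_n] with p_i = (w_1+...+w_i, a_1+...+a_i)."""
    xs = [0] + list(accumulate(w))
    ys = [0] + list(accumulate(a))
    return list(zip(xs, ys))


def slope(points, i, j):
    """Slope m(i, j) of the segment joining p_i and p_j, for i < j."""
    (xi, yi), (xj, yj) = points[i], points[j]
    return Fraction(yj - yi, xj - xi)


def density(a, w, i, j):
    """Density d(i, j) of segment A(i, j), 1-based and inclusive."""
    return Fraction(sum(a[i - 1:j]), sum(w[i - 1:j]))


def shift_by_delta(a, w, delta):
    """Return b with b_i = a_i - delta * w_i, so target density becomes 0."""
    return [ai - delta * wi for ai, wi in zip(a, w)]
```

For example, with `a = [3, 1, 4, 1, 5]` and `w = [2, 1, 3, 1, 2]`, `slope(prefix_points(a, w), 0, 2)` and `density(a, w, 1, 2)` both evaluate to `Fraction(4, 3)`.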

Let $P_{a,b}$ denote the subset $\{p_a, p_{a+1}, \ldots, p_b\}$ of $P$ starting with left index $a$ and ending with right index $b$. The indices $a$ and $b$ are assumed to be in $[0, n]$: if $a < 0$, we take $a = 0$; if $b > n$, we take $b = n$. If $a > b$, we define $P_{a,b}$ to be the empty set.

For each point $p_j$ we have a set of all feasible points $P_{c_j,d_j} = \{p_{c_j}, p_{c_j+1}, \ldots, p_{d_j}\}$ such that each $p_i \in P_{c_j,d_j}$ satisfies $\ell \le x_j - x_i = w(i + 1, j) \le u$. For simplicity, and without confusion, we shall denote $P_{c_j,d_j}$ as $P_j$. That is, $P_j$ is the subset of $P$ such that the line segment $s(i, j)$ is feasible for each $p_i \in P_j$. Since the sequence $\{x_j\}_{j=1}^{n}$ of the x-coordinates of $P$ is monotonically increasing, the left and right index sequences $\{c_j\}_{j=1}^{n}$ and $\{d_j\}_{j=1}^{n}$ are non-decreasing. Therefore, we can obtain the sequences $\{c_j\}_{j=1}^{n}$ and $\{d_j\}_{j=1}^{n}$ by a linear scan of the sequence $\{x_j\}_{j=1}^{n}$.

Let $P_j^{+} = \{p_i \in P_j \mid m(i, j) \ge 0\}$. For each $j$, we define $i_j$ to be the index of the point $p_{i_j} \in P_j^{+}$ such that $m(i, j)$ is minimized. That is, we have $m(i_j, j) = \min\{m(i, j) \mid p_i \in P_j^{+}\}$. Let $i^*$ and $j^*$ be the pair of indices that minimizes $m(i_j, j)$. That is, we have $m(i^*, j^*) = \min\{m(i_j, j) \mid 1 \le j \le n\}$. The geometric SLOPE FINDING PROBLEM can now be reduced to finding the feasible point $p_{i_j}$ in $P_j^{+}$ that minimizes $m(i, j)$ for each $j$, and then selecting the minimum among all $m(i_j, j)$'s. Let $p_{t_j}$ be the tangent point of $\mathrm{UH}(P_j^{+})$ from $p_j$. It is not difficult to see that $m(t_j, j) = \min\{m(i, j) \mid p_i \in P_j^{+}\} = m(i_j, j)$. Therefore, if we can find the tangent point $p_{t_j}$ of $\mathrm{UH}(P_j^{+})$ from $p_j$ for each $1 \le j \le n$, we can solve the geometric SLOPE FINDING PROBLEM.

3 Algorithm for Density Finding Problem

We first show that the DENSITY FINDING PROBLEM has a lower bound of $\Omega(n \log n)$ in the algebraic decision tree model of computation. The ELEMENT UNIQUENESS PROBLEM is to determine whether $n$ positive numbers $z_1, z_2, \ldots, z_n$ are all distinct. It is known that the ELEMENT UNIQUENESS PROBLEM has a lower bound of $\Omega(n \log n)$ in the algebraic decision tree model of computation [12]. We can transform an instance of the ELEMENT UNIQUENESS PROBLEM to an instance of the DENSITY FINDING PROBLEM for the uniform case with $\ell = 1$, $u = n$ and $\delta = 0$ in $O(n)$ time by letting $a_1 = z_1$ and $a_i = z_i - z_{i-1}$ for $i = 2, \ldots, n$. The density $d(i, j) = (a_i + a_{i+1} + \cdots + a_j)/(j - i + 1) = (z_j - z_{i-1})/(j - i + 1)$ of the output segment $A(i, j)$ is not equal to 0 if and only if $z_1, z_2, \ldots, z_n$ are all distinct. Therefore, the DENSITY FINDING PROBLEM has a lower bound of $\Omega(n \log n)$ in the algebraic decision tree model of computation.

Lemma 1 The DENSITY FINDING PROBLEM for the uniform case with $\ell = 1$, $u = n$ and $\delta = 0$ has a lower bound of $\Omega(n \log n)$ in the algebraic decision tree model of computation.
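The reduction behind Lemma 1 can be illustrated with a small Python sketch. It builds the difference sequence $a_1 = z_1$, $a_i = z_i - z_{i-1}$, and then searches for the density closest to 0 among all segments of unit-width elements. The search here is a quadratic brute force used purely as a stand-in for a density finding solver, so it demonstrates the correctness of the reduction, not the lower bound itself; the function name `has_duplicates_via_density` is hypothetical, and the input is assumed to be a non-empty list of positive integers or rationals.

```python
from fractions import Fraction


def has_duplicates_via_density(z):
    """Decide element uniqueness for positive numbers z_1..z_n by asking
    whether some segment of the difference sequence has density 0
    (uniform widths, ell = 1, u = n, delta = 0)."""
    n = len(z)
    a = [z[0]] + [z[k] - z[k - 1] for k in range(1, n)]
    prefix = [0]
    for value in a:
        prefix.append(prefix[-1] + value)   # prefix[k] equals z_k
    closest = min(
        abs(Fraction(prefix[j] - prefix[i - 1], j - i + 1))
        for i in range(1, n + 1)
        for j in range(i, n + 1)
    )
    # The closest density is exactly 0 iff z_j == z_{i-1} for some pair.
    return closest == 0
```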

For each $p_i = (x_i, y_i) \in P_j$, we let $P_j^{i} = \{p_r \in P_j \mid y_r \le y_i\}$ be the subset of $P_j$ whose y-coordinates are no greater than the y-coordinate of $p_i$. The following lemma, Lemma 2, shows that if we can find the point $p_{k_j}$ in $P_j$ with the largest y-coordinate no greater than that of $p_j$, then we can obtain $p_{i_j}$ by finding the tangent point on the left branch of $\mathrm{UH}(P_j^{k_j})$ from $p_j$. Therefore, once we have maintained a data structure of the left branch of $\mathrm{UH}(P_j^{i})$ for each $p_i \in P_j$, we can find the tangent point on the left branch of $\mathrm{UH}(P_j^{k_j})$ from $p_j$.

Lemma 2 Let $p_{k_j}$ be the point in $P_j$ with the largest y-coordinate no greater than that of $p_j$. Then $P_j^{+} = P_j^{k_j}$ and $p_{i_j}$ must be a tangent point on the left branch $L(P_j^{k_j})$ of the upper hull $\mathrm{UH}(P_j^{k_j})$.


Proof Since $p_{k_j}$ is the point in $P_j$ with the largest y-coordinate no greater than that of $p_j$, we have $m(i, j) < 0$ for each $p_i \in P_j \setminus P_j^{k_j}$ and $m(i, j) \ge 0$ for each $p_i \in P_j^{k_j}$. This means $P_j^{+} = P_j^{k_j}$. We also have $x_{k_j} < x_i < x_j$ and $y_i \le y_{k_j}$ for each $p_i$ on the right branch $R(P_j^{k_j})$ of $\mathrm{UH}(P_j^{k_j})$. This implies

$$m(k_j, j) = \frac{y_j - y_{k_j}}{x_j - x_{k_j}} < \frac{y_j - y_{k_j}}{x_j - x_i} \le \frac{y_j - y_i}{x_j - x_i} = m(i, j).$$

Therefore, $p_{i_j}$ must be a tangent point on the left branch $L(P_j^{k_j})$ of $\mathrm{UH}(P_j^{k_j})$. □
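Lemma 2 suggests a simple way to sanity-check the geometry on small inputs: restrict the candidates to those with y-coordinate at most $y_j$, build their upper hull, and take the hull vertex giving the smallest slope to $p_j$. The Python sketch below does this with Andrew's monotone chain scan and a linear pass over the hull vertices. It is an illustration only: it rebuilds the hull from scratch instead of maintaining the dynamic left branch structure described in the paper, it assumes integer coordinates with every candidate strictly to the left of the query point, and the names `upper_hull` and `min_nonneg_slope` are hypothetical.

```python
from fractions import Fraction


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); negative = right turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper hull of the points, left to right (monotone chain scan)."""
    hull = []
    for p in sorted(points):
        # Pop while the last two hull points and p do not make a right turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def min_nonneg_slope(candidates, q):
    """Return (slope, point) minimising the slope from a candidate to q
    among candidates with y <= q.y, or None if there is none.
    Every candidate must have x strictly less than q.x."""
    below = [p for p in candidates if p[1] <= q[1]]   # plays the role of P_j^+
    if not below:
        return None
    hull = upper_hull(below)
    return min((Fraction(q[1] - p[1], q[0] - p[0]), p) for p in hull)
```

Only hull vertices need to be examined because the minimising line through $p_j$ has every candidate on or below it, which is exactly the tangent condition.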

3.1 Special Case when $u = w(1, n)$

Our density finding algorithm for the case $u = w(1, n)$ iterates from $j = 1$ to $n$ to find the tangent point $p_{t_j}$ on the left branch of $\mathrm{UH}(P_j^{+})$ from $p_j$, and is described as follows. At the beginning of iteration $j$, we have available the set of all feasible points, $P_{c_j,d_j}$ (abbreviated as $P_j$ for convenience), and maintain some data structures on $P_j$ such that we can make a predecessor query to obtain $p_{k_j}$ and a tangent query to obtain $p_{t_j}$. At the end of the iteration we update $P_j$ to $P_{j+1}$ by inserting points $p_{d_j+1}, \ldots, p_{d_{j+1}}$ one at a time from left to right. In this special case, since $u = w(1, n)$, all $c_j$'s are identical, i.e., $c_j = 1$ for all $j$. We maintain two dynamic data structures on $P_j$ for the above purposes: a balanced binary search tree $T(P_j)$ [13], which stores all points of $P_j$ in ascending y-coordinates to support both the predecessor query for finding the point $p_{k_j}$ in $P_j$ and the insertion operations; and a left branch data structure $\mathcal{L}(P_j)$, which stores the left branch $L(P_j^{i})$ for each $p_i$, $c_j \le i \le d_j$, in $P_j$ by a singly linked list in descending x-coordinates to support the tangent query for finding the tangent point $p_{t_j}$ on the left branch $L(P_j^{k_j})$ from $p_j$.

Updates from $T(P_j)$ to $T(P_{j+1})$ by insertions of points into the balanced binary search tree can be done in a straightforward manner. We briefly describe the update of the left branch data structure from $\mathcal{L}(P_j)$ to $\mathcal{L}(P_{j+1})$ by inserting each point $p_i \in P_{d_j+1,\,d_{j+1}}$ into $\mathcal{L}(P_j)$ in $O(\log n)$ time as follows. We need to construct the left branch $L(P_{j+1}^{i})$ for each $p_i$ in $P_{j+1}$. We first find the point $p_{k_i}$ in $P_j$ with the largest y-coordinate less than that of $p_i$ by a predecessor query to $T(P_j)$. If $p_{k_i}$ exists, then we can obtain the tangent point $p_{t_i}$ in $L(P_j^{k_i}) = p_{h_1}, p_{h_2}, \ldots, p_{h_m}$ from $p_i$ by a tangent query (which will be described later) to $\mathcal{L}(P_j^{k_i})$. After finding the tangent point $p_{t_i} = p_{h_v} \in L(P_j^{k_i})$ for some $h_v$, we insert $p_i$ into $\mathcal{L}(P_j)$, make a link from $p_i$ to $p_{h_v}$, and set $L(P_{j+1}^{i}) = p_{h_1}, p_{h_2}, \ldots, p_{h_v}, p_i$. If $p_{k_i}$ does not exist, we just insert $p_i$ into $\mathcal{L}(P_j)$ and set $L(P_{j+1}^{i}) = p_i$. An example of inserting a point $p_i$ into $\mathcal{L}(P_j)$ is shown in Fig. 2.

To ensure the correctness of the construction of $\mathcal{L}(P_{j+1})$, we need to show that $L(P_{j+1}^{i})$ is indeed equal to the sequence $H_i = p_{h_1}, p_{h_2}, \ldots, p_{h_v}, p_i$. It is easy to see that $H_i$ is a sequence of hull points of $\mathrm{UH}(P_{j+1}^{i})$ and $p_i$ is the hull point in $\mathrm{UH}(P_{j+1}^{i})$ with the greatest y-coordinate. Therefore, we have $L(P_{j+1}^{i}) = H_i$. The correctness of the construction then follows.

In order to support the tangent query in $O(\log n)$ time on every $L(P_{j+1}^{i})$ of $\mathcal{L}(P_{j+1})$, we need to add a few jumping pointers to $L(P_{j+1}^{i})$ so that we can perform binary search in logarithmic time. We denote the data structure for $L(P_{j+1}^{i})$ augmented with jumping pointers by $\mathcal{L}(P_{j+1}^{i})$. For each point in $\mathcal{L}(P_{j+1})$, we maintain an array of size $\log_2 n$ to store the jumping pointers. We let $\mathcal{L}(P_{j+1}^{i})^{(r)}[i]$ denote the index of the hull point on $L(P_{j+1}^{i})$ to the left of $p_i$ with link distance $2^r$ away from $p_i$ for $0 \le r \le \log_2 n$. It can be defined recursively using the following formula: $\mathcal{L}(P_{j+1}^{i})^{(0)}[i] = t_i$ and $\mathcal{L}(P_{j+1}^{i})^{(r+1)}[i] = \max\{\mathcal{L}(P_{j+1}^{i})^{(r)}[\mathcal{L}(P_{j+1}^{i})^{(r)}[i]],\, 0\}$. Note that $p_{t_i}$, the tangent point from $p_i$, is one ($= 2^0$) link distance away from point $p_i$.

Fig. 2 Insert point $p_i$ into $\mathcal{L}(P_j)$ to obtain $L(P_{j+1}^{i})$. We first find the point $p_{k_i}$ in $P_j$ with the largest y-coordinate no greater than that of $p_i$ by a predecessor query to $T(P_j)$, and then we can obtain the tangent point $p_{t_i}$ on $L(P_j^{k_i})$ from $p_i$ by a tangent query to $\mathcal{L}(P_j^{k_i})$. After finding the tangent point $p_{t_i}$ we make a link from $p_i$ to $p_{t_i}$ to get $L(P_{j+1}^{i})$

The operation of tangent queries in $\mathcal{L}(P_j)$ from $p_j$ is implemented by a binary search in $O(\log n)$ time as follows. Let $p_{k_j}$ be the point in $P_j$ with the largest y-coordinate less than that of $p_j$, obtained from the predecessor query to $T(P_j)$. We do binary search in the array $\mathcal{L}(P_j^{k_j})^{(r)}[k_j]$, $0 \le r \le \log_2 n$. When we examine a point $p_s$ in this array, if $p_s$ on $L(P_j^{k_j})$ is a reflex point from $p_j$, we search forward, and if $p_s$ is a concave point from $p_j$, we search backward, until we find the tangent point $p_{t_j}$ from $p_j$. A detailed description of the tangent query to $L(P_j^{k_j})$ is shown in the pseudo code below.

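Algorithm DENSITYFINDINGPROBLEM($P$, $\ell$)
Input: a set of points $P = \{p_0, p_1, \ldots, p_n\}$ in $\mathbb{R}^2$ and a nonnegative number $\ell$.
Output: the feasible line segment $s(i^*, j^*)$ such that $m(i^*, j^*) = \min_{s(i,j) \in F^{+}(0)} m(i, j)$.
1. $j \leftarrow 0$; $P_j \leftarrow \emptyset$;
2. $(\{c_i\}_{i=1}^{n+1}, \{d_i\}_{i=1}^{n+1}) \leftarrow$ CONSTRUCTINDEX($\{x_i\}_{i=1}^{n}$);
3. for $j = 0$ to $n$ do
4.     $p_{k_j} \leftarrow$ point in $T(P_j)$ with the largest y-coord. less than $p_j$ by a predecessor query;
5.     $p_{t_j} \leftarrow$ TANGENTQUERY($\mathcal{L}(P_j^{k_j})$, $\log_2 n$, $k_j$, $p_j$);
6.     $m(i_j, j) \leftarrow m(t_j, j)$;
7.     for $i = d_j + 1$ to $d_{j+1}$ do
8.         $T(P_{j+1}) \leftarrow$ insert point $p_i$ into $T(P_j)$;
9.         $\mathcal{L}(P_{j+1}) \leftarrow$ INSERTPOINT($\mathcal{L}(P_j)$, $p_i$);
10. $m(i^*, j^*) \leftarrow \min_{0 \le j \le n} m(i_j, j)$;
11. return $s(i^*, j^*)$;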


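Function CONSTRUCTINDEX($\{x_i\}_{i=1}^{n}$)
Input: sequence $\{x_i\}_{i=1}^{n}$.
Output: sequences $\{c_i\}_{i=1}^{n+1}$ and $\{d_i\}_{i=1}^{n+1}$.
1. $c_0 \leftarrow 0$; $d_0 \leftarrow 0$; $c_{n+1} \leftarrow 0$; $d_{n+1} \leftarrow 0$;
2. for $j = 1$ to $n$ do
3.     $c_j \leftarrow c_{j-1}$;
4.     while $x_j - x_{c_j+1} > u$ do $c_j \leftarrow c_j + 1$;
5.     $d_j \leftarrow d_{j-1}$;
6.     while $x_j - x_{d_j+1} \ge \ell$ do $d_j \leftarrow d_j + 1$;
7. return $(\{c_i\}_{i=1}^{n+1}, \{d_i\}_{i=1}^{n+1})$;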
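The linear scan can be sketched in Python with two monotone pointers. This is an independent sketch of the idea rather than a line-by-line transcription of CONSTRUCTINDEX: it uses its own boundary conventions (0-based point indices, an initial right pointer of $-1$, and `None` for an empty window), and the name `feasible_windows` is hypothetical. It assumes the x-coordinates are strictly increasing and that `lo <= hi`.

```python
def feasible_windows(xs, lo, hi):
    """For each j, return (c_j, d_j) such that exactly the indices i with
    c_j <= i <= d_j satisfy i < j and lo <= xs[j] - xs[i] <= hi,
    or None when no such index exists.

    xs: strictly increasing x-coordinates x_0, ..., x_n of the points.
    Runs in O(n) total time because both pointers only move right.
    """
    windows = []
    c, d = 0, -1
    for j in range(len(xs)):
        # Advance the left end past points that are too far from x_j.
        while c < j and xs[j] - xs[c] > hi:
            c += 1
        # Advance the right end while the next point is still far enough.
        while d + 1 < j and xs[j] - xs[d + 1] >= lo:
            d += 1
        windows.append((c, d) if c <= d else None)
    return windows
```

With `xs = [0, 2, 3, 6, 7, 9]`, `lo = 3` and `hi = 6`, the call returns `[None, None, (0, 0), (0, 2), (1, 2), (2, 3)]`.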

Function TANGENTQUERY($\mathcal{L}(P_j^{k_j})$, $r$, $t$, $p_j$)
Input: data structure $\mathcal{L}(P_j^{k_j})$ of left branch $L(P_j^{k_j})$, order $r$, index $t$, and point $p_j$.
Output: the tangent point on $L(P_j^{k_j})$ from $p_j$.
1. if $r \ge 0$
2.     $s \leftarrow \mathcal{L}(P_j^{k_j})^{(r)}[t]$;
3.     if $p_s$ is tangent on $L(P_j^{k_j})$ from $p_j$ return $p_s$;
4.     if $p_s$ is reflex on $L(P_j^{k_j})$ from $p_j$ return TANGENTQUERY($\mathcal{L}(P_j^{k_j})$, $r - 1$, $s$, $p_j$);
5.     if $p_s$ is concave on $L(P_j^{k_j})$ from $p_j$ return TANGENTQUERY($\mathcal{L}(P_j^{k_j})$, $r - 1$, $t$, $p_j$);

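Function INSERTPOINT($\mathcal{L}(P_j)$, $p_i$)
Input: data structure $\mathcal{L}(P_j)$ of left branch $L(P_j)$ and point $p_i$.
1. $p_{k_i} \leftarrow$ the largest y-coord. less than $p_i$ in $T(P_j)$ by predecessor query;
2. if $p_{k_i}$ doesn't exist
3.     then insert $p_i$ into $\mathcal{L}(P_j)$;
4. else
5.     $p_{t_i} \leftarrow$ TANGENTQUERY($\mathcal{L}(P_j^{k_i})$, $\log_2 n$, $k_i$, $p_i$);
6.     insert $p_i$ into $\mathcal{L}(P_j)$ and make a link from $p_i$ to $p_{t_i}$;
7.     $\mathcal{L}(P_{j+1}^{i})^{(0)}[i] \leftarrow t_i$;
8.     for $r = 0$ to $\log_2 n - 1$ do
9.         $\mathcal{L}(P_{j+1}^{i})^{(r+1)}[i] \leftarrow \max\{\mathcal{L}(P_{j+1}^{i})^{(r)}[\mathcal{L}(P_{j+1}^{i})^{(r)}[i]],\, 0\}$;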
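The jumping pointers follow the familiar binary lifting pattern, which the Python sketch below isolates from the geometry. It builds, for each node, the table of ancestors at link distances $2^0, 2^1, \ldots$, and then walks as far as possible along the links while a monotone predicate holds, in the spirit of "jump if the landing point is reflex, otherwise halve the step". Several things here are assumptions of the sketch and not taken from the paper: the sentinel is $-1$ rather than $0$, each node is assumed to link to an earlier index, the predicate stands in for the reflex test, and the names `build_jumps` and `farthest_while` are hypothetical.

```python
def build_jumps(link, levels):
    """up[i][r] is the node reached from i after 2**r links, or -1.

    link[i] is the node that i links to (its tangent point), or -1.
    Requires link[i] < i so that earlier tables already exist, mirroring
    points being inserted one at a time.
    """
    up = []
    for target in link:
        row = [target]
        for r in range(levels - 1):
            mid = row[r]                      # node 2**r links away
            row.append(up[mid][r] if mid != -1 else -1)
        up.append(row)
    return up


def farthest_while(up, start, keep_going):
    """Follow links from start as long as keep_going(node) stays true and
    return the last node reached. keep_going must be monotone along the
    chain (true on a prefix, false afterwards) and true at start."""
    node = start
    levels = len(up[0]) if up else 0
    for r in reversed(range(levels)):
        nxt = up[node][r]
        if nxt != -1 and keep_going(nxt):
            node = nxt                        # safe to take the 2**r jump
    return node
```

Each node stores `levels` pointers, so a chain of $n$ nodes uses $O(n \log n)$ space when `levels` is about $\log_2 n$, matching the space bound stated for the special case.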

Lemma 3 The tangent query and insertion operation can be done in $O(\log n)$ time each on the left branch data structure $\mathcal{L}(P_j)$.

Proof The insertion of a point into $\mathcal{L}(P_j)$ can be done by a predecessor query to $T(P_j)$ in $O(\log n)$ time and a tangent query to $\mathcal{L}(P_j)$ in $O(\log n)$ time by binary search using the jumping pointers, so it takes $O(\log n)$ time per insertion. Setting up the jumping pointers for each newly inserted point can also be done in $O(\log n)$ time, since only $O(\log n)$ such pointers need to be created. □


Theorem 1 The DENSITY FINDING PROBLEM for the case $u = w(1, n)$ can be solved in optimal $O(n \log n)$ time and $O(n \log n)$ space.

Proof The correctness of this algorithm follows from the arguments given earlier and the correctness of the insertion operation of $\mathcal{L}(P_j)$. Since $T(P_j)$ is a balanced binary tree, it takes $O(\log n)$ time to insert a point, and hence $O(n \log n)$ time overall. Since the algorithm performs $O(n)$ tangent queries and insertions on $\mathcal{L}(P_j)$, the overall time needed is $O(n \log n)$ by Lemma 3. Since this algorithm dynamically maintains $\mathcal{L}(P_j)$, whose size is at most $n$, and it takes $O(\log n)$ space for each point $p_i \in P_j$ to construct jumping pointers of all orders $r$, for $1 \le r \le \log_2 n$, the total space requirement is $O(n \log n)$. □

3.2 General Case when u < w(1, n)

Now we develop our density finding algorithm for the general case, where the width of the segment has an upper bound $u < w(1, n)$. At any iteration $j$, we maintain, as described above, a dynamic left branch data structure $\mathcal{L}(P_j)$ of the upper hull $\mathrm{UH}(P_j)$ such that we can make a tangent query to $\mathcal{L}(P_j)$ to obtain the tangent point $p_{t_j}$. The only difference is that the set of all feasible points $P_{c_j,d_j}$ for each $p_j$ varies in that $c_{j+1}$ is no longer identical to $c_j$. In case $c_{j+1} \ne c_j$, we need to delete points $p_{c_j}, p_{c_j+1}, \ldots, p_{c_{j+1}-1}$ from $P_j$. Insertions of points $p_{d_j+1}, \ldots, p_{d_{j+1}}$ are performed in the same way to obtain $\mathcal{L}(P_{j+1})$ from $\mathcal{L}(P_j)$. However, our dynamic left branch data structure, as described above, supports only insertion operations; it does not support deletion operations as efficiently as we desire. Further modifications to the dynamic left branch data structure are needed.

We observe that the tangent query is decomposable. A query is called decomposable if the answer to the query over the entire set can be obtained by combining the answers to the queries on a suitable collection of subsets of the set. We will partition $P_j$ into several disjoint canonical subsets such that $P_j = P_0 \cup P_1 \cup \cdots \cup P_h$, where $h = \log m$, $m = \min\{(u - \ell)/w_{\min},\, n\}$, and each canonical subset $P_i$ has size $|P_i| \le 2^i$. Note that some of these canonical subsets are empty, and that except for the last nonempty canonical subset, all of the nonempty ones will be full, i.e., of size $2^i$ for some $i \ge 0$. We will maintain a left branch data structure $\mathcal{L}(P_i)$ for each $i$. We define the dynamic left branch data structure $\mathcal{L}(P_j)$ to be $\mathcal{L}(P_0) \cup \mathcal{L}(P_1) \cup \cdots \cup \mathcal{L}(P_h)$.

At iteration $j$, we first make a tangent query to $\mathcal{L}(P_i)$ for each $i$ and take the one with the smallest slope as the tangent point $p_{t_j}$, and then we delete points $p_{c_j}, \ldots, p_{c_{j+1}-1}$ from $\mathcal{L}(P_j)$ and insert points $p_{d_j+1}, \ldots, p_{d_{j+1}}$ into $\mathcal{L}(P_j)$. When we insert a point $p_i$ into $\mathcal{L}(P_j)$, we insert it into the $\mathcal{L}(P_r)$ that contains the point $p_{i-1}$ if $|P_r| < 2^r$; otherwise we insert it into $\mathcal{L}(P_{r+1})$. When we delete a point $p_i$ from $\mathcal{L}(P_j)$, we delete the entire data structure $\mathcal{L}(P_r)$ in which $p_i$ is located and reconstruct $\mathcal{L}(P_0), \mathcal{L}(P_1), \ldots, \mathcal{L}(P_{r-1})$ by inserting all the remaining points in $\mathcal{L}(P_r)$ into $\mathcal{L}(P_0), \mathcal{L}(P_1), \ldots, \mathcal{L}(P_{r-1})$ one by one in left-to-right order. A detailed description of the density finding algorithm for the general case is shown in the pseudo code below.


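Algorithm DENSITYFINDINGPROBLEM($P$, $\ell$, $u$)
Input: a set of points $P = \{p_0, p_1, \ldots, p_n\}$ in $\mathbb{R}^2$ where $p_i = (x_i, y_i)$ and two nonnegative numbers $\ell$, $u$.
Output: the feasible line segment $s(i^*, j^*)$ such that $m(i^*, j^*) = \min_{s(i,j) \in F^{+}(0)} m(i, j)$.
1. $j \leftarrow 0$; $P_j \leftarrow \emptyset$; $t \leftarrow 0$; $h \leftarrow \log(\min\{(u - \ell)/w_{\min},\, n\})$;
2. $(\{c_i\}_{i=1}^{n+1}, \{d_i\}_{i=1}^{n+1}) \leftarrow$ CONSTRUCTINDEX($\{x_i\}_{i=1}^{n}$);
3. for $j = 0$ to $n$ do
4.     $m(i_j, j) \leftarrow \infty$;
5.     for $i = 0$ to $h$
       /* lines 6–8 do tangent query to $\mathcal{L}(P_j)$ */
6.         $p_{k_j} \leftarrow$ the largest y-coord. less than $p_j$ in $T(P_i)$ by predecessor query;
7.         $p_{t_j} \leftarrow$ TANGENTQUERY($\mathcal{L}(P_i)$, $h$, $k_j$, $p_j$);
8.         if $m(i_j, j) > m(t_j, j)$ then $s(i_j, j) \leftarrow s(t_j, j)$;
       /* lines 9–21 delete points $p_{c_j}, \ldots, p_{c_{j+1}-1}$ from $\mathcal{L}(P_j)$ */
9.     for $a = c_j$ to $c_{j+1} - 1$ do
10.        let $P_r$ be the set such that $p_a \in P_r$;
11.        let $|P_r| - 1 = \lambda_0 2^0 + \lambda_1 2^1 + \cdots + \lambda_{r-1} 2^{r-1}$, where $\lambda_k = 0$ or $1$, $k = 0, 1, \ldots, r - 1$;
12.        $q \leftarrow a + 1$;
13.        for $i = 0$ to $r - 1$
14.            $f \leftarrow 0$;
15.            while $\lambda_i = 1$ and $f < 2^i$ and $q \le |P_r| - 1$
16.                $T(P_i) \leftarrow$ insert point $p_q$ into $T(P_i)$;
17.                $\mathcal{L}(P_i) \leftarrow$ INSERTPOINT($\mathcal{L}(P_i)$, $p_q$);
18.                $q \leftarrow q + 1$; $f \leftarrow f + 1$;
19.        if $r = t$ /* $P_t$ is the last nonempty canonical subset */
20.            $t \leftarrow \max\{k \mid \lambda_k = 1,\ k = 0, 1, \ldots, r - 1\}$;
21.        Delete $\mathcal{L}(P_r)$ and $T(P_r)$;
       /* lines 22–29 insert points $p_{d_j+1}, \ldots, p_{d_{j+1}}$ into $\mathcal{L}(P_j)$ */
22.    for $b = d_j + 1$ to $d_{j+1}$ do
23.        if $|P_t| < 2^t$
24.            $T(P_t) \leftarrow$ insert point $p_b$ into $T(P_t)$;
25.            $\mathcal{L}(P_t) \leftarrow$ INSERTPOINT($\mathcal{L}(P_t)$, $p_b$);
26.        else
27.            $t \leftarrow t + 1$;
28.            $T(P_t) \leftarrow$ insert point $p_b$ into $T(P_t)$;
29.            $\mathcal{L}(P_t) \leftarrow$ INSERTPOINT($\mathcal{L}(P_t)$, $p_b$);
30. $m(i^*, j^*) \leftarrow \min\{m(i_j, j) \mid 0 \le j \le n\}$;
31. return $s(i^*, j^*)$;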
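The decomposability that the general-case algorithm relies on can be illustrated in a few lines of Python. The sketch splits a point sequence into canonical blocks of sizes $1, 2, 4, \ldots$ (the last block possibly not full), answers the minimum non-negative slope query on each block separately, and combines the per-block answers by taking their minimum. It is a static illustration only: the per-block query is a brute-force stand-in for the tangent query on $\mathcal{L}(P_i)$, deletions and rebuilding are not modelled, and the function names are hypothetical.

```python
from fractions import Fraction


def canonical_blocks(points):
    """Split points, in order, into blocks of sizes 1, 2, 4, ...;
    only the last block may be smaller than its capacity."""
    blocks, start, size = [], 0, 1
    while start < len(points):
        blocks.append(points[start:start + size])
        start += size
        size *= 2
    return blocks


def block_min_slope(block, q):
    """Smallest non-negative slope from a point of the block to q,
    or None if no point of the block has y <= q.y."""
    slopes = [Fraction(q[1] - y, q[0] - x) for x, y in block if y <= q[1]]
    return min(slopes) if slopes else None


def decomposed_min_slope(points, q):
    """Combine the per-block answers into the answer for the whole set."""
    answers = [block_min_slope(block, q) for block in canonical_blocks(points)]
    answers = [s for s in answers if s is not None]
    return min(answers) if answers else None
```

Because only $O(\log m)$ blocks exist at any time, a query costs $O(\log m)$ per-block queries, which is where the extra logarithmic factor in the general-case running time comes from.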

Theorem 2 The DENSITY FINDING PROBLEM can be solved in $O(n \log^2 m)$ time and $O(n + m \log m)$ space, where $m = \min\{(u - \ell)/w_{\min},\, n\}$.

Proof The algorithm dynamically maintains a left branch data structure $\mathcal{L}(P_j)$ whose size is at most $m = \min\{(u - \ell)/w_{\min},\, n\}$. When a point $p_i$ in $\mathcal{L}(P_r)$ is deleted, we destroy the entire data structure $\mathcal{L}(P_r)$ and reconstruct at most $r$ left branch data structures $\mathcal{L}(P_0), \mathcal{L}(P_1), \ldots, \mathcal{L}(P_{r-1})$ with the remaining points in $\mathcal{L}(P_r)$. We note that each time a data structure $\mathcal{L}(P_r)$ of size $2^r$ is destroyed, the remaining points are reinserted into some data structures $\mathcal{L}(P_i)$, $i < r$, of smaller size. Overall, in the whole deletion process of the algorithm, each point in $P$ can be reinserted into a left branch data structure of each of the sizes $2^0, 2^1, \ldots, 2^{\log m}$ at most once, and it takes $O(\log m)$ time to insert a point into a left branch data structure, so it takes $O(n \log^2 m)$ time in total for reinsertion and deletion operations. Since the algorithm needs to do $O(\log m)$ tangent queries at any iteration $j$ and it takes $O(\log m)$ time for each query, the total time taken is $O(n \log^2 m)$. As for the space requirement, since this algorithm dynamically maintains $\mathcal{L}(P_j) = \mathcal{L}(P_0) \cup \mathcal{L}(P_1) \cup \cdots \cup \mathcal{L}(P_{\log m})$, whose size is at most $m$, and it needs at most $O(\log m)$ jumping pointers for each point, it needs $O(n + m \log m)$ space in total. □

4 Algorithm for Maximum-Density Segment Problem

As a byproduct, we shall present yet another optimal algorithm for the MAXIMUM-DENSITY SEGMENT PROBLEM based on Kim's idea and the decomposability of tangent queries to convex hulls, by which we fix the flaw of his algorithm. Recall that the MAXIMUM-DENSITY SEGMENT PROBLEM is a special case of the DENSITY FINDING PROBLEM with $\delta = \infty$. For this case we have $F^{-}(\delta) = F(\delta)$. The MAXIMUM-DENSITY SEGMENT PROBLEM can be reformulated as the following geometric MAXIMUM SLOPE FINDING PROBLEM.

Given a set of points $P = \{p_0, p_1, \ldots, p_n\}$ in $\mathbb{R}^2$ and two width bounds $\ell$, $u$ with $\ell \le u$, output the feasible line segment $s(i^*, j^*)$ in $F(\infty)$ that maximizes $m(i, j)$.

Similar to the upper hull of $P_j$, we introduce the notion of the lower hull of $P_j$, denoted $\mathrm{LH}(P_j)$, which is a polygonal chain monotone with respect to the x-axis such that all points of $P_j$ are either on or above the polygonal chain. Let $p_{t_j}$ be the tangent point on the lower hull $\mathrm{LH}(P_j)$ of $P_j$ from $p_j$. It is easy to see that $m(t_j, j) = \max\{m(i, j) \mid p_i \in P_j\} = m(i_j, j)$. Therefore, if we can find the tangent point $p_{t_j}$ in $\mathrm{LH}(P_j)$ from $p_j$ for $0 \le j \le n$, we can solve the MAXIMUM-DENSITY SEGMENT PROBLEM.

Our maximum-density segment algorithm, which iterates from $j = 0$ to $n$ to find the tangent point $p_{t_j}$ on the lower hull $\mathrm{LH}(P_j)$ from $p_j$, is described as follows. As before, associated with each $p_j$ we have a set $P_j = P_{c_j,d_j}$ of all feasible points. At any iteration $j$, we maintain a dynamic lower hull data structure for $\mathrm{LH}(P_j)$ such that we can make a tangent query to $\mathrm{LH}(P_j)$ to obtain the tangent point $p_{t_j}$ in amortized $O(1)$ time, and then we delete points $p_{c_j}, p_{c_j+1}, \ldots, p_{c_{j+1}-1}$ and insert points $p_{d_j+1}, p_{d_j+2}, \ldots, p_{d_{j+1}}$, each in amortized $O(1)$ time, to obtain the dynamic lower hull data structure for $\mathrm{LH}(P_{j+1})$. It is not obvious how to maintain just one dynamic data structure that supports insertion and deletion of points as well as tangent queries. That is the difficulty that Kim's lower hull data structure faced. We provide a fix by exploiting the decomposability of tangent queries again.
