An Optimal Algorithm for the Range Maximum-Sum Segment Query Problem

Academic year: 2021


Kuan-Yu Chen and Kun-Mao Chao

Department of Computer Science and Information Engineering

National Taiwan University, Taipei, Taiwan 106

{r92047, kmchao}@csie.ntu.edu.tw

Abstract

We are given a sequence A of n real numbers which is to be preprocessed. In the Range Maximum-Sum Segment Query (RMSQ) problem, a query is comprised of two intervals [i, j] and [k, l], and our goal is to return the maximum-sum segment of A whose starting index lies in [i, j] and whose ending index lies in [k, l]. We provide the first known optimal algorithm, with O(n) preprocessing time and O(1) query time.

Keywords: RMQ, maximum sum interval, sequence analysis

1 Introduction

Sequence analysis in bioinformatics has been studied for decades. One important line of investigation in sequence analysis is to locate the biologically meaningful segments, like conserved regions or GC-rich regions in DNA sequences. A common approach is to assign a real number (also called a score) to each residue, and then look for the maximum-sum or maximum-average segment [3, 5, 10].

Ruzzo and Tompa [12] proposed a linear-time algorithm for finding all maximal scoring subsequences. Huang [9] extended the well-known recurrence relation used by Bentley [2] for solving the maximum-sum consecutive subsequence problem, and derived a linear-time algorithm for computing the optimal segments with lengths at least L. Lin, Jiang, and Chao [10] and Fan et al. [5] studied the maximum-sum segment problem with

segment whose starting and ending indices lie in given intervals. By preprocessing the sequence in O(n) time, each query can be answered in O(1) time. This also yields an alternative linear-time algorithm for computing the maximum-sum segment with length constraints.

The rest of the paper is organized as follows. Section 2 gives the formal definition of the RMSQ problem and introduces the RMQ techniques [6]. Section 3 copes with a special case of the RMSQ problem (called the SRMSQ problem). Section 4 gives the optimal algorithm for the RMSQ problem.

2 Problem Definition and Preliminaries

The input is a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ of (not necessarily positive) real numbers. The maximum-sum segment of A is the contiguous subsequence having the greatest total sum, where the sum of a subsequence $S(i, j) = \langle a_i, \ldots, a_j \rangle$ is simply $w(S(i, j)) = \sum_{k=i}^{j} a_k$. For simplicity, throughout the paper, the term "subsequence" will be taken to mean "contiguous subsequence". To avoid ambiguity, we disallow nonempty, zero-sum prefixes or suffixes (also called ties) in the maximum-sum segments. We define $c[i] = \sum_{k=1}^{i} a_k$ as the cumulative sum of A for all 1 ≤ i ≤ n, and c[0] = 0. Notice that w(S(i, j)) = c[j] − c[i − 1].
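As a quick check of the identity w(S(i, j)) = c[j] − c[i − 1], the following Python sketch builds the cumulative sums of a small sample sequence and compares them with direct summation. The function names are illustrative and not part of the paper.

```python
from itertools import accumulate

def cumulative_sums(a):
    """Return c[0..n] with c[0] = 0 and c[i] = a_1 + ... + a_i."""
    return [0] + list(accumulate(a))

def segment_sum(c, i, j):
    """w(S(i, j)) for 1 <= i <= j <= n, computed as c[j] - c[i-1]."""
    return c[j] - c[i - 1]

a = [4, -5, 2, -2, 4, 3, -2, 6]
c = cumulative_sums(a)

# Every segment sum obtained from c agrees with direct summation.
n = len(a)
for i in range(1, n + 1):
    for j in range(i, n + 1):
        assert segment_sum(c, i, j) == sum(a[i - 1:j])

print(c)                     # [0, 4, -1, 1, -1, 3, 6, 4, 10]
print(segment_sum(c, 5, 8))  # 11
```

With c[·] available, any segment sum costs one subtraction, which is what lets the later sections reduce segment queries to minimum and maximum queries on c[·].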

As an example, consider the input sequence $A = \langle 4, -5, 2, -2, 4, 3, -2, 6 \rangle$. The maximum-sum segment of A is $M = \langle 4, 3, -2, 6 \rangle$, with a total sum of 11. There is another subsequence tied for


2.1 Problem Definition

We start with a special case of the RMSQ problem, called the SRMSQ problem.

Problem 1. A Special Case of the RMSQ problem (SRMSQ)

Structure to Preprocess: $A = \langle a_1, a_2, \ldots, a_n \rangle$ is a sequence of n real numbers.

Query: For an interval I = (i, j), 1 ≤ i ≤ j ≤ n, SRMSQ_A(i, j) returns a pair of indices (x, y), i ≤ x ≤ y ≤ j, such that segment S(x, y) is the maximum-sum segment of the subsequence $\langle a_i, \ldots, a_j \rangle$.

A naive algorithm is to build an n × n table storing answers to all of the O(n^2) possible queries. Each entry (i, j) in the table represents an interval (i, j), so we only have to fill in the upper-triangular part of the table. By applying the well-known linear-time algorithm for finding the maximum-sum segment of a sequence to each entry, we have an O(n^3) preprocessing algorithm. Notice that answering an SRMSQ query now requires just one lookup in the table. To achieve O(n^2) preprocessing rather than the O(n^3) naive preprocessing, we use the online property of the algorithm, filling the table row by row. In this paper, our algorithm achieves O(n) time and space preprocessing, and O(1) time for each query.
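The row-by-row idea can be sketched in Python as follows. This is only an illustration of the O(n^2) table baseline, not the paper's O(n) method; the function name and the (x, y, sum) entry layout are assumptions made for the example. Restarting on a non-positive running sum and updating only on strict improvement is one way to avoid zero-sum prefixes and suffixes.

```python
def build_srmsq_table(a):
    """Fill table[i][j] = (x, y, sum) for all 1 <= i <= j <= n in O(n^2).

    Row i is produced by a single left-to-right scan starting at a_i, so
    each new entry costs O(1) (the online property of the linear scan).
    """
    n = len(a)
    table = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best = None                 # best (x, y, sum) inside a_i..a_j so far
        run_start, run_sum = i, 0   # best segment ending exactly at j
        for j in range(i, n + 1):
            if run_sum <= 0:        # a non-positive run never helps: restart
                run_start, run_sum = j, a[j - 1]
            else:
                run_sum += a[j - 1]
            if best is None or run_sum > best[2]:
                best = (run_start, j, run_sum)
            table[i][j] = best
    return table

a = [4, -5, 2, -2, 4, 3, -2, 6]
table = build_srmsq_table(a)
print(table[1][8])  # (5, 8, 11)
```

A query SRMSQ_A(i, j) is then a single lookup of table[i][j], at the price of quadratic space.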

Problem 2. The Range Maximum-Sum Segment Query problem (RMSQ)

Structure to Preprocess: $A = \langle a_1, a_2, \ldots, a_n \rangle$ is a sequence of n real numbers.

Query: For two intervals S = (i, j) and E = (k, l), 1 ≤ i ≤ j ≤ n and 1 ≤ k ≤ l ≤ n, RMSQ_A(i, j, k, l) returns a pair of indices (x, y) such that w(S(x, y)) is maximized for i ≤ x ≤ j and k ≤ y ≤ l.

This is a generalized version of the SRMSQ problem because when i = k and j = l we are actually querying SRMSQ_A(i, j). The naive algorithm would have to build a 4-dimensional table, and the time complexity of the preprocessing would be Ω(n^4). This is definitely inefficient for practical use. We will also provide an algorithm with O(n) preprocessing time and O(1) query time.

2.2 The RMQ Techniques

We next introduce an important technique, called RMQ, used in our algorithm. We are given a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ to be preprocessed. A Range Minima Query (RMQ) specifies an interval I, and the goal is to find the index k with minimum value a_k for k ∈ I.

Lemma 1. The RMQ problem can be solved in O(n) time and space preprocessing and O(1) time per query. [1, 6]

The well-known algorithm for the RMQ problem is to first construct the Cartesian tree (defined by Vuillemin in 1980) of the sequence, which is then preprocessed for LCA (Least Common Ancestor) queries [8, 4]. This algorithm can be easily modified to output the index k for which a_k achieves the minimum or the maximum. We denote by RMQ_min the minimum query and by RMQ_max the maximum query. That is, RMQ_min(A, i, j) returns the index k such that a_k achieves the minimum for i ≤ k ≤ j, and RMQ_max(A, i, j) returns the index k such that a_k achieves the maximum. For the correctness of our algorithm, if there is more than one minimum (maximum) in the query interval, the query always outputs the rightmost (leftmost) index k for which a_k achieves the minimum (maximum). This can be done by constructing the Cartesian tree in a particular order.
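To make the tie-breaking convention concrete, here is a Python sketch of an index-returning range query. It deliberately swaps in a sparse table, which needs O(n log n) preprocessing, in place of the Cartesian tree and LCA construction cited above, so it illustrates the interface and the rightmost-minimum and leftmost-maximum rules rather than the O(n) bound of Lemma 1. The class name and the mode parameter are assumptions of this example.

```python
class SparseTableRMQ:
    """Index-returning range queries on a list, with fixed tie-breaking.

    mode="min" returns the rightmost minimum, mode="max" the leftmost
    maximum. Preprocessing is O(n log n), queries are O(1).
    """

    def __init__(self, values, mode="min"):
        self.v = list(values)
        self.mode = mode
        n = len(self.v)
        self.table = [list(range(n))]   # level 0: windows of length 1
        span = 1
        while 2 * span <= n:
            prev = self.table[-1]
            # Merge two adjacent windows of length span into one of 2 * span.
            self.table.append([self._pick(prev[s], prev[s + span])
                               for s in range(n - 2 * span + 1)])
            span *= 2

    def _pick(self, left, right):
        """Choose between two candidate indices with left <= right."""
        if self.mode == "min":
            # Ties go to the right candidate (rightmost minimum).
            return left if self.v[left] < self.v[right] else right
        # Ties go to the left candidate (leftmost maximum).
        return right if self.v[right] > self.v[left] else left

    def query(self, lo, hi):
        """Index of the optimum of values[lo..hi] (inclusive, lo <= hi)."""
        level = (hi - lo + 1).bit_length() - 1
        return self._pick(self.table[level][lo],
                          self.table[level][hi - (1 << level) + 1])

vals = [2, 1, 3, 1, 3]
print(SparseTableRMQ(vals, "min").query(0, 4))  # 3 (rightmost of the two 1s)
print(SparseTableRMQ(vals, "max").query(0, 4))  # 2 (leftmost of the two 3s)
```

Any structure offering the same query interface and tie-breaking could be substituted without changing the algorithms that follow.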

3 Coping with the SRMSQ Problem

A key idea for solving the SRMSQ problem is to view the problem in terms of the cumulative sums. For convenience in later proofs, we give the following definition.

Definition 1. Let A be any nonempty real number sequence. We define l[j] for each index j of A to be the largest index k such that c[k] ≥ c[j] and 1 ≤ k ≤ j − 1. If no such k exists, that is, c[j] > c[k] for all 1 ≤ k ≤ j − 1, we assign l[j] = 0. The index l[j] and the cumulative sum c[j] for each index j of A can be computed by the PREPROCESS1 algorithm, as illustrated in Figure 1.


Algorithm PREPROCESS1

Input: An array of n real numbers A[1 . . . n].

Output: A length n array l[·] and a length n + 1 array c[·].

1 c[0] ← 0;
2 for j ← 1 to n do
3     c[j] ← c[j − 1] + A[j];
4     l[j] ← j − 1;
5     while c[l[j]] < c[j] and l[j] > 0 do
6         l[j] ← l[l[j]];
7     end while
8 end for

Figure 1: Algorithm for computing c[·] and l[·]
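A direct Python transcription of PREPROCESS1 might look as follows. The function name and the padding entry l[0] are choices made for this sketch so that list indices match the paper's 1-based indices; the input is the 15-element sequence of Figure 5.

```python
def preprocess1(a):
    """Compute c[0..n] and l[1..n]; list a holds a_1..a_n.

    Index 0 of l is unused padding so that indices match the paper.
    """
    n = len(a)
    c = [0] * (n + 1)
    l = [0] * (n + 1)
    for j in range(1, n + 1):
        c[j] = c[j - 1] + a[j - 1]
        k = j - 1
        # Follow l-pointers past every index whose cumulative sum is
        # smaller than c[j]; stopping at k = 0 means no index qualifies.
        while k > 0 and c[k] < c[j]:
            k = l[k]
        l[j] = k
    return c, l

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
c, l = preprocess1(a)
print(c)      # [0, 9, -1, 3, 1, 5, 0, 4, 1, 7, -4, 4, 1, 5, 0, 3]
print(l[1:])  # [0, 1, 1, 3, 1, 5, 5, 7, 1, 9, 9, 11, 9, 13, 13]
```

The printed arrays agree with the c[j] and l[j] rows of Figure 5.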

Lemma 2. Let A be any nonempty real number sequence. If segment S(i, j) is the maximum-sum segment of A, then index i is the largest index such that the cumulative sum c[i − 1] is minimized for l[j] ≤ i − 1 < j (that is, c[i − 1] is the rightmost minimum).

Proof. Suppose not. Then either (1) index i − 1 lies in the interval [0, l[j] − 1], or (2) index i − 1 lies in the interval [l[j], j − 1] but is not the largest index such that the cumulative sum c[i − 1] is minimized for l[j] ≤ i − 1 < j. We discuss the two cases as follows.

(1) Suppose index i − 1 lies in the interval [0, l[j] − 1]:

When l[j] = 0, it is obvious that i − 1 cannot lie in the interval [0, l[j] − 1]. When l[j] > 0, we have w(S(i, l[j])) = c[l[j]] − c[i − 1] ≥ c[j] − c[i − 1] = w(S(i, j)). If equality holds, then w(S(l[j] + 1, j)) = c[j] − c[l[j]] = 0, so S(l[j] + 1, j) would be a zero-sum suffix of S(i, j). Thus, w(S(i, l[j])) must be strictly greater than w(S(i, j)). But this contradicts the fact that S(i, j) is the maximum-sum segment of A. So index i − 1 cannot lie in the interval [0, l[j] − 1].

(2) Suppose index i − 1 lies in the interval [l[j], j − 1]:

Suppose c[i − 1] is not minimized for l[j] ≤ i − 1 < j. That is, there exists an index k, k ≠ i and l[j] ≤ k − 1 < j, such that c[k − 1] < c[i − 1]. Then we have w(S(k, j)) = c[j] − c[k − 1] > c[j] − c[i − 1] = w(S(i, j)). This contradicts the fact that S(i, j) is the maximum-sum segment of A. So the cumulative sum c[i − 1] must be minimized for l[j] ≤ i − 1 < j. We further suppose that c[i − 1] is not the rightmost minimum, that is, there exists an index k′ > i such that c[k′ − 1] is also a minimum. Then w(S(i, k′ − 1)) = c[k′ − 1] − c[i − 1] = 0. So S(i, j) has a zero-sum prefix S(i, k′ − 1), which also contradicts the definition of the maximum-sum segment. Hence, index i must be the largest index such that c[i − 1] is minimized for l[j] ≤ i − 1 < j.

Definition 2. Let A be any nonempty real number sequence. We define the "good partner" p[j] of each index j in A as the largest index k such that c[k − 1] is minimized for l[j] ≤ k − 1 < j. The segment S(p[j], j) is called the candidate segment of A at j. We also define m[j] to be the sum of the candidate segment of A at j, that is, m[j] = w(S(p[j], j)).

By Lemma 2, we know that each pair (p[j], j) constitutes a candidate solution of the maximum-sum segment of A, that is, segment S(p[j], j). The relationship between l[j] and p[j] as defined above is illustrated in Figure 2. The left side of the figure shows the case that there exists a largest index l[j] such that c[l[j]] ≥ c[j] and 1 ≤ l[j] ≤ j − 1. The right side of the figure shows the case that c[j] is the unique maximum of c[k] for all 1 ≤ k ≤ j.

The good partner p[j] and the sum of the candidate segment m[j] for each index j of A can be computed by the PREPROCESS2 algorithm, as illustrated in Figure 3.

Figure 2: An illustration for l[·] and p[·]. (Left panel: l[j] ≥ 1, with c[l[j]] ≥ c[j]; right panel: l[j] = 0. In both panels, c[p[j] − 1] is the minimum of c over [l[j], j − 1].)

Algorithm PREPROCESS2

Input: An array of n real numbers A[1 . . . n], array c[·] and array l[·].

Output: Two length n arrays, p[·] and m[·].

1 Apply RMQ_min preprocess to array c[·].
2 for j ← 1 to n do
3     p[j] ← RMQ_min(c, l[j], j − 1) + 1;
4     m[j] ← c[j] − c[p[j] − 1];
5 end for

Figure 3: Algorithm for computing p[·] and m[·]
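PREPROCESS2 can be sketched in Python as below. To keep the example short, RMQ_min is replaced by a linear scan that returns the rightmost minimum, so this version does not achieve the O(1) query time the paper relies on; the helper name is an assumption of the sketch. The arrays c[·] and l[·] are those of Figure 5.

```python
def rightmost_argmin(c, lo, hi):
    """Rightmost index of the minimum of c[lo..hi]; a linear-scan
    stand-in for the O(1) RMQ_min query assumed by the paper."""
    best = lo
    for k in range(lo + 1, hi + 1):
        if c[k] <= c[best]:   # "<=" moves ties to the right
            best = k
    return best

def preprocess2(c, l):
    """Compute good partners p[1..n] and candidate sums m[1..n]."""
    n = len(c) - 1
    p = [0] * (n + 1)
    m = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = rightmost_argmin(c, l[j], j - 1) + 1
        m[j] = c[j] - c[p[j] - 1]
    return p, m

# c[·] and l[·] for the sequence of Figure 5 (index 0 of l is padding).
c = [0, 9, -1, 3, 1, 5, 0, 4, 1, 7, -4, 4, 1, 5, 0, 3]
l = [0, 0, 1, 1, 3, 1, 5, 5, 7, 1, 9, 9, 11, 9, 13, 13]
p, m = preprocess2(c, l)
print(p[1:])  # [1, 2, 3, 4, 3, 6, 7, 8, 3, 10, 11, 12, 11, 14, 15]
print(m[1:])  # [9, -10, 4, -2, 6, -5, 4, -3, 8, -11, 8, -3, 9, -5, 3]
```

The output reproduces the p[j] and m[j] rows of Figure 5.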

Lemma 3. Let A be any nonempty real number sequence of length n. If index i satisfies m[i] ≥ m[k] for all 1 ≤ k ≤ n, then S(p[i], i) is the maximum-sum segment.

Proof. Suppose, on the contrary, that segment S(s, t), (s, t) ≠ (p[i], i), is the maximum-sum segment. By Lemma 2, we have s = p[t]. So m[t] = w(S(p[t], t)) = w(S(s, t)) > w(S(p[i], i)) = m[i], which contradicts the assumption that m[i] is the maximum value.

Lemma 3 tells us that, once we have computed m[j] and p[j] for each index j of A, to find the maximum-sum segment of A we only have to retrieve the index i such that m[i] ≥ m[k] for all 1 ≤ k ≤ n. Then S(p[i], i) is the maximum-sum segment of A. Lemmas 4-6 show some important properties of the candidate segments.

Lemma 4. Let A be any nonempty real number sequence. If p[j] is the good partner of index j and p[j] < j, then c[p[j] − 1] < c[k] < c[j] for all p[j] − 1 < k < j.

Proof. Suppose not. That is, there exists an index k′, p[j] − 1 < k′ < j, such that c[k′] ≤ c[p[j] − 1] or c[k′] ≥ c[j]. (1) If c[k′] ≤ c[p[j] − 1]: By the definition of p[j], we know that l[j] ≤ p[j] − 1 < j. Since p[j] − 1 < k′, we have l[j] < k′ < j and c[k′] ≤ c[p[j] − 1]. This contradicts the definition of p[j] as the largest index such that c[p[j] − 1] is minimized for l[j] ≤ p[j] − 1 < j. (2) If c[k′] ≥ c[j]: By the definitions of l[j] and p[j], we have k′ ≤ l[j] ≤ p[j] − 1 < j. So k′ ≤ p[j] − 1 < k′. A contradiction occurs.

That is, c[p[j] − 1] is the unique minimum and c[j] is the unique maximum of c[k] for all p[j] − 1 ≤ k ≤ j. The following lemma shows the nesting property of the candidate segments. See Figure 5. Notice that the pointer of each index j points to the position of p[j].


Algorithm Preprocess of SRMSQ_A(i, j)

1 Run algorithm PREPROCESS1 to compute arrays c[·], l[·] of A.
2 Run algorithm PREPROCESS2 to compute arrays p[·], m[·] of A.
3 Apply RMQ_max preprocess to array m[·].
4 Apply RMQ_min preprocess to array c[·].

Algorithm Query of SRMSQ_A(i, j)

1 r ← RMQ_max(m, i, j)
2 if p[r] < i then
3     r1 ← RMQ_min(c, i − 1, r − 1) + 1;
4     r2 ← RMQ_max(m, r + 1, j);
5     if c[r] − c[r1 − 1] < m[r2] then
6         OUTPUT (p[r2], r2);
7     else
8         OUTPUT (r1, r);
9     end if
10 else
11     OUTPUT (p[r], r);
12 end if

Figure 4: Algorithm for the SRMSQ problem.
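The preprocessing and query steps of Figure 4 can be put together in a short Python sketch. Two things here are assumptions of the sketch rather than part of the paper: the RMQ queries are simulated by linear scans (so each query is not O(1)), and the case r = j, where the range r + 1..j of line 4 is empty, is handled by returning (r1, r) directly. The input is the sequence of Figure 5.

```python
def argmin_right(v, lo, hi):
    """Rightmost index of the minimum of v[lo..hi] (linear-scan RMQ_min)."""
    return max(range(lo, hi + 1), key=lambda t: (-v[t], t))

def argmax_left(v, lo, hi):
    """Leftmost index of the maximum of v[lo..hi] (linear-scan RMQ_max)."""
    return max(range(lo, hi + 1), key=lambda t: (v[t], -t))

def preprocess(a):
    """Arrays c, p, m of the paper; list a holds a_1..a_n."""
    n = len(a)
    c, lptr = [0] * (n + 1), [0] * (n + 1)
    p, m = [0] * (n + 1), [0] * (n + 1)
    for j in range(1, n + 1):
        c[j] = c[j - 1] + a[j - 1]
        t = j - 1
        while t > 0 and c[t] < c[j]:      # PREPROCESS1 pointer jumping
            t = lptr[t]
        lptr[j] = t
        p[j] = argmin_right(c, t, j - 1) + 1   # PREPROCESS2
        m[j] = c[j] - c[p[j] - 1]
    return c, p, m

def srmsq(c, p, m, i, j):
    """Maximum-sum segment (x, y) with i <= x <= y <= j, after Figure 4."""
    r = argmax_left(m, i, j)
    if p[r] >= i:                         # candidate segment fits in [i, j]
        return p[r], r
    r1 = argmin_right(c, i - 1, r - 1) + 1
    if r == j:                            # assumption: no r2 exists
        return r1, r
    r2 = argmax_left(m, r + 1, j)
    if c[r] - c[r1 - 1] < m[r2]:
        return p[r2], r2
    return r1, r

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
c, p, m = preprocess(a)
print(srmsq(c, p, m, 3, 7))    # (3, 5)
print(srmsq(c, p, m, 6, 12))   # (11, 11)
```

Replacing the two scan helpers with a constant-time RMQ structure would give the query time claimed in the text.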

Lemma 5. Let A be any nonempty real number sequence. For two indices i and j, i < j, if p[i] is the good partner of i and p[j] is the good partner of j, then it cannot be the case that p[i] < p[j] ≤ i < j.

Proof. Suppose p[i] < p[j] ≤ i < j holds. By Lemma 4, we have

(1) c[p[i] − 1] < c[k′] < c[i] for all p[i] − 1 < k′ < i

(2) c[p[j] − 1] < c[k″] < c[j] for all p[j] − 1 < k″ < j

Since p[j] − 1 < i, we can substitute p[j] − 1 for k′ in (1) ⇒ c[p[i] − 1] < c[p[j] − 1] < c[i].

Similarly, we can substitute i for k″ in (2) ⇒ c[p[j] − 1] < c[i] < c[j]. Then we have

(3) c[p[i] − 1] < c[p[j] − 1] < c[k″] < c[j] for all p[j] − 1 < k″ < j

(4) c[p[i] − 1] < c[k′] < c[i] < c[j] for all p[i] − 1 < k′ < i

By (3) and (4), we have

(5) c[p[i] − 1] < c[k‴] < c[j] for all p[i] − 1 < k‴ < j

So, if there exists an index k such that c[k] ≥ c[j], then k < p[i] − 1. We also have by (5) that c[p[i] − 1] < c[p[j] − 1], which contradicts the fact that c[p[j] − 1] minimizes c[l] for k < l < j. If there is no such index k, then c[p[i] − 1] < c[p[j] − 1] contradicts the fact that c[p[j] − 1] minimizes c[l]

between sequence A and its subsequence Q. The following key lemma shows that a candidate segment S of A is still a candidate segment of Q if Q contains the whole candidate segment S.

Lemma 6. Let $A = \langle a_1, \ldots, a_n \rangle$ be any nonempty real number sequence and $Q = \langle a_s, \ldots, a_t \rangle$ be any subsequence of A. If p[j] is the good partner of index j for sequence A and s ≤ p[j] ≤ j ≤ t, then p[j] is still the good partner of index j for sequence Q.

Proof. Let c*[j] be the cumulative sum of each index j in Q. Then $c^*[j] = \sum_{k=s}^{j} a_k = \sum_{k=1}^{j} a_k - \sum_{k=1}^{s-1} a_k = c[j] - c[s-1]$ for all s − 1 ≤ j ≤ t. Let l*[j] be the largest index k such that c*[k] ≥ c*[j] and s ≤ k ≤ j − 1. If no such k exists, we assign l*[j] = s − 1. Our goal is to prove that p[j] is the largest index k that minimizes c*[k − 1] for all l*[j] ≤ k − 1 < j.

(1) If l[j] ≥ s: By definition, l[j] is the largest index k such that c[k] ≥ c[j] for 1 ≤ k ≤ j − 1. So l[j] is the largest index k such that c*[k] = c[k] − c[s − 1] ≥ c[j] − c[s − 1] = c*[j] for 1 ≤ s ≤ k ≤ j − 1. Hence, we have l*[j] = l[j]. And p[j], by definition, is the largest index k that minimizes c[k − 1] for all l[j] ≤ k − 1 < j. So p[j]


s ≤ k ≤ j − 1. So, c*[k] = c[k] − c[s − 1] < c[j] − c[s − 1] = c*[j] for all s ≤ k ≤ j − 1. Hence, we have l*[j] = s − 1. Since p[j] is the largest index k that minimizes c[k − 1] for all l[j] ≤ k − 1 < j, p[j] is the largest index k that minimizes c*[k − 1] = c[k − 1] − c[s − 1] for all l[j] ≤ s − 1 ≤ k − 1 < j. That is, p[j] is the largest index k that minimizes c*[k − 1] for all l*[j] ≤ k − 1 < j.

Corollary 1. Let $A = \langle a_1, \ldots, a_n \rangle$ be any nonempty real number sequence and $Q = \langle a_s, \ldots, a_t \rangle$ be any subsequence of A. If p[j] is the good partner of index j for sequence A, s ≤ p[j] ≤ j ≤ t, and m[j] is the sum of the candidate segment of A at j, then m[j] is also the sum of the candidate segment of Q at j.

Proof. A direct result of Lemma 6.

Now we are ready to present our algorithm for the SRMSQ problem (see Figure 4). Let A be a sequence of n real numbers and [i, j] be the query interval, where 1 ≤ i ≤ j ≤ n.

For instance, in Figure 5 the input sequence A has 15 elements. Suppose we are querying SRMSQ_A(3, 7). The QUERY OF SRMSQ algorithm in Figure 4 will first query the index r such that m[r] is maximized for 3 ≤ r ≤ 7 (line 1). In this case, r = 5, which means that the candidate segment S(p[5], 5) has the largest sum among the candidate segments at indices 3 to 7. The left end of S(p[5], 5), p[5] = 3, lies in the interval [3, 7]. The algorithm executes line 11 and outputs (p[5], 5), which means that segment S(3, 5) is the maximum-sum segment of the subsequence A[3 . . . 7].

Suppose we are querying SRMSQ_A(6, 12). RMQ_max(m, 6, 12) will return index 9 (line 1). Since p[9] = 3 < 6, lines 3-9 will be executed. In line 3, RMQ_min(c, 5, 8) will return index 6, so r1 = 7. In line 4, RMQ_max(m, 10, 12) will return index 11. In line 5, since c[9] − c[6] = 7 < m[11] = 8, the QUERY OF SRMSQ algorithm will output (p[11], 11), which means that S(11, 11) is the maximum-sum segment of the subsequence A[6 . . . 12].

Theorem 1. Algorithm QUERY OF SRMSQ_A(i, j) will output the maximum-sum segment of the subsequence S(i, j).

Proof. Let m*[k] be the sum of the candidate segment of S(i, j) at k and c*[k] be the cumulative sum of S(i, j) for i − 1 ≤ k ≤ j. We have c*[k] = c[k] − c[i − 1], i − 1 ≤ k ≤ j. Let p*[k] be the good partner of index k of the subsequence S(i, j) for all i ≤ k ≤ j. Let index r satisfy m[r] ≥ m[k] for all i ≤ k ≤ j (line 1).

(A) If p[r] ≥ i: Our goal is to show that m*[r] ≥ m*[k] for all i ≤ k ≤ j, and then apply Lemma 3 to complete the proof. We first consider each index k′ such that i ≤ k′ ≤ j and i ≤ p[k′] ≤ k′ ≤ j. By Corollary 1, we can deduce that m*[k′] = m[k′] ≤ m[r] = m*[r]. We next consider each index k″ such that i ≤ k″ ≤ j and p[k″] < i. Since p[k″] < i ≤ k″, we can apply Lemma 4 and get

c[p[k″] − 1] < c[k] < c[k″] for all p[k″] − 1 < k < k″    (1)

Since i ≤ p*[k″] ≤ k″ by the definition of p*[k″], we have p[k″] − 1 < i − 1 ≤ p*[k″] − 1 < k″. So, we can substitute p*[k″] − 1 for k in (1) and get c[p[k″] − 1] < c[p*[k″] − 1]. Hence, we can deduce that m*[k″] = c*[k″] − c*[p*[k″] − 1] = c[k″] − c[p*[k″] − 1] < c[k″] − c[p[k″] − 1] = m[k″]. Moreover, m[k″] ≤ m[r] = m*[r]. So, we have m*[k″] < m*[r]. Till now, we have shown that m*[r] ≥ m*[k] for all i ≤ k ≤ j. By Lemma 3, S(p[r], r) is the maximum-sum segment of S(i, j) (line 11).

(B) If p[r] < i: We first consider each index k′ such that i ≤ k′ < r. Since p[r] < i ≤ r, we can apply Lemma 4 and obtain

c[p[r] − 1] < c[k] < c[r] for all p[r] − 1 < k < r    (2)

Since c[k] < c[r] for all p[r] − 1 < i ≤ k < r, we have c*[k] = c[k] − c[i − 1] < c[r] − c[i − 1] = c*[r] for i ≤ k < r. For any segment S(k, k′), i ≤ k ≤ k′ < r, since w(S(k, k′)) = c*[k′] − c*[k − 1] < c*[r] − c*[k − 1] = w(S(k, r)), it is not hard to see that S(k, k′) cannot be the maximum-sum segment.

We next consider each index k″ such that r < k″ ≤ j. By Lemma 5, we know that it cannot be the case that p[r] < p[k″] ≤ r < k″. If p[k″] ≤ r, then it must be the case that p[k″] ≤ p[r] < r < k″. Since p[k″] − 1 ≤ r < k″ and p[k″] − 1 ≤ p[r] − 1 < k″, we can apply Lemma 4 and get c[p[k″] − 1] < c[r] < c[k″] and c[p[k″] − 1] ≤ c[p[r] − 1] < c[k″]. So, m[k″] = c[k″] − c[p[k″] − 1] > c[r] − c[p[r] − 1] = m[r], which contradicts m[r] ≥ m[k] for all i ≤ k ≤ j. So p[k″] ≰ r, that is, p[k″] > r. Hence, by Corollary 1, we have m*[k″] = m[k″] for r < k″ ≤ j. Let index r2 satisfy m[r2] ≥ m[k] for all r + 1 ≤ k ≤ j (line 4). It is not hard to see that either S(r1, r) or S(p[r2], r2) is the maximum-sum segment of S(i, j). By Lemma 3, the one with the greater sum is the maximum-sum segment of S(i, j) (lines 5-9).

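| j | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a_j | - | 9 | −10 | 4 | −2 | 4 | −5 | 4 | −3 | 6 | −11 | 8 | −3 | 4 | −5 | 3 |
| c[j] | 0 | 9 | −1 | 3 | 1 | 5 | 0 | 4 | 1 | 7 | −4 | 4 | 1 | 5 | 0 | 3 |
| l[j] | - | 0 | 1 | 1 | 3 | 1 | 5 | 5 | 7 | 1 | 9 | 9 | 11 | 9 | 13 | 13 |
| p[j] | - | 1 | 2 | 3 | 4 | 3 | 6 | 7 | 8 | 3 | 10 | 11 | 12 | 11 | 14 | 15 |
| m[j] | - | 9 | −10 | 4 | −2 | 6 | −5 | 4 | −3 | 8 | −11 | 8 | −3 | 9 | −5 | 3 |

Figure 5: The candidate segment S(p[j], j) of each index j.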

Algorithm Preprocess of RMSQ

1 Apply SRMSQ preprocess.
2 Apply RMQ_max preprocess to c[·].

Algorithm Query of RMSQ_A(i, j, k, l)

1 if j ≤ k then
2     OUTPUT (RMQ_min(c, i − 1, j − 1) + 1, RMQ_max(c, k, l))
3 else
4     (r1, r1′) ← (RMQ_min(c, i − 1, k − 1) + 1, RMQ_max(c, k, l));
5     (r2, r2′) ← (RMQ_min(c, k, j − 1) + 1, RMQ_max(c, j, l));
6     (r3, r3′) ← SRMSQ_A(k, j);
7     OUTPUT (rm, rm′) such that c[rm′] − c[rm − 1] is maximized for 1 ≤ m ≤ 3;
8 end if

Figure 6: Algorithm for the RMSQ problem.
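The query of Figure 6 can be sketched in Python on top of an SRMSQ routine. As before, the RMQ queries are simulated by linear scans and the handling of r = j inside srmsq is an assumption of the sketch, so this shows the case analysis rather than the O(1) query time. The candidate in line 7 is scored by its segment sum c[y] − c[x − 1], and the query is assumed to satisfy i ≤ k and j ≤ l.

```python
def argmin_right(v, lo, hi):
    """Rightmost index of the minimum of v[lo..hi] (linear-scan RMQ_min)."""
    return max(range(lo, hi + 1), key=lambda t: (-v[t], t))

def argmax_left(v, lo, hi):
    """Leftmost index of the maximum of v[lo..hi] (linear-scan RMQ_max)."""
    return max(range(lo, hi + 1), key=lambda t: (v[t], -t))

def preprocess(a):
    """Arrays c, p, m of the paper; list a holds a_1..a_n."""
    n = len(a)
    c, lptr = [0] * (n + 1), [0] * (n + 1)
    p, m = [0] * (n + 1), [0] * (n + 1)
    for j in range(1, n + 1):
        c[j] = c[j - 1] + a[j - 1]
        t = j - 1
        while t > 0 and c[t] < c[j]:
            t = lptr[t]
        lptr[j] = t
        p[j] = argmin_right(c, t, j - 1) + 1
        m[j] = c[j] - c[p[j] - 1]
    return c, p, m

def srmsq(c, p, m, i, j):
    """Maximum-sum segment (x, y) with i <= x <= y <= j, after Figure 4."""
    r = argmax_left(m, i, j)
    if p[r] >= i:
        return p[r], r
    r1 = argmin_right(c, i - 1, r - 1) + 1
    if r == j:                       # assumption: empty range r+1..j
        return r1, r
    r2 = argmax_left(m, r + 1, j)
    return (p[r2], r2) if c[r] - c[r1 - 1] < m[r2] else (r1, r)

def rmsq(c, p, m, i, j, k, l):
    """Best (x, y) with i <= x <= j and k <= y <= l, for i <= k, j <= l."""
    if j <= k:                       # nonoverlapping intervals (line 2)
        return argmin_right(c, i - 1, j - 1) + 1, argmax_left(c, k, l)
    candidates = [
        (argmin_right(c, i - 1, k - 1) + 1, argmax_left(c, k, l)),  # line 4
        (argmin_right(c, k, j - 1) + 1, argmax_left(c, j, l)),      # line 5
        srmsq(c, p, m, k, j),                                       # line 6
    ]
    # Line 7: keep the candidate with the largest segment sum.
    return max(candidates, key=lambda xy: c[xy[1]] - c[xy[0] - 1])

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
c, p, m = preprocess(a)
print(rmsq(c, p, m, 2, 5, 8, 12))     # (3, 9)
print(rmsq(c, p, m, 10, 13, 11, 15))  # (11, 13)
```

The first call exercises the nonoverlapping branch and the second the overlapping branch, where the best of three candidates is returned.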

Lemma 7. Algorithm PREPROCESS1 runs in O(n) time.

Proof. It can be shown by a simple amortized analysis. The total number of operations of the algorithm is clearly bounded by O(n) except for the while-loop body of Steps 5-7. In the following, we show that the amortized cost of the while-loop is a constant. Therefore, the overall time required by the loop is O(n). We define the potential function of A after the ith iteration of the for-loop to be Φ(i), i.e., the maximum number of times the pointer l[i] may be advanced. So Φ(i) ≥ 0 holds in every iteration. Suppose that the pointer l[·] is advanced c_i times in the ith iteration. Then the actual cost of the operations is c_i + 1. Observing that Φ(i) = Φ(i − 1) − c_i + 1, the change of the potential of A during the ith iteration is Φ(i) − Φ(i − 1) = 1 − c_i, so the amortized cost of the iteration is (c_i + 1) + (1 − c_i) = 2. Since n iterations would be executed in the entire process, the while-loop spends at most O(n) time overall.
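The amortized bound can also be observed empirically. The following Python sketch instruments PREPROCESS1 with a counter for the while-loop body; the function name is illustrative. Because an index that has been skipped once is never visited by a later iteration, one would expect the total count to stay below n.

```python
def count_pointer_advances(a):
    """Run PREPROCESS1 on list a and count while-loop iterations."""
    n = len(a)
    c = [0] * (n + 1)
    l = [0] * (n + 1)
    advances = 0
    for j in range(1, n + 1):
        c[j] = c[j - 1] + a[j - 1]
        k = j - 1
        while k > 0 and c[k] < c[j]:
            k = l[k]          # one advance of the pointer l[j]
            advances += 1
        l[j] = k
    return advances

a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
total = count_pointer_advances(a)
print(total, "advances for n =", len(a))
```

On the sequence of Figure 5 the counter stays below n = 15, in line with the linear bound of Lemma 7.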

Lemma 8. Algorithm PREPROCESS2 runs in O(n) time.

Proof. By Lemma 1, the preprocessing time for RMQ is O(n) and the query time is O(1). So, the cost of each step in the for-loop is a constant and the overall time required is O(n).

Theorem 2. The SRMSQ problem can be solved in O(n) preprocessing time and O(1) query time.

Proof. By Lemma 1, Lemma 7, and Lemma 8, the preprocessing algorithms all run in O(n) time. So, the time complexity of the PREPROCESS OF SRMSQ algorithm is O(n). Since each RMQ

It’s not hard to verify that the space complexity for these algorithms is also O(n).

4 Coping with the RMSQ Problem

The RMSQ problem is to answer queries comprising two intervals [i, j] and [k, l], where [i, j] specifies the range of the starting index of the maximum-sum segment, and [k, l] specifies the range of the ending index. A query is meaningless if the range of the starting index lies behind the range of the ending index, so, without loss of generality, we assume that i ≤ k and j ≤ l. We present our algorithm in Figure 6.

Theorem 3. Algorithm QUERY OF RMSQ_A(i, j, k, l) will output the correct answer.

Proof. We discuss it under two possible conditions.

(1) Nonoverlapping (j ≤ k): Suppose the intervals [i, j] and [k, l] do not overlap. Since w(S(x, y)) = c[y] − c[x − 1], maximizing w(S(x, y)) is equivalent to maximizing c[y] and minimizing c[x − 1] for i ≤ x ≤ j and k ≤ y ≤ l. By applying the RMQ technique to preprocess c[·], the maximum-sum segment can be easily located (line 2).

(2) Overlapping (j > k): In the overlapping case, just finding the maximum cumulative sum and the minimum cumulative sum might go wrong if the minimum is on the right of the maximum. We discuss it under three possible conditions for the maximum-sum segment S(x, y).

(a) Suppose i ≤ x ≤ k and k ≤ y ≤ l. This is a nonoverlapping case, so we find the minimum cumulative sum and the maximum cumulative sum (line 4).

(b) Suppose k + 1 ≤ x ≤ j and j ≤ y ≤ l. This is also a nonoverlapping case. We find the minimum cumulative sum and the maximum cumulative sum (line 5).

(c) Otherwise, k + 1 ≤ x ≤ j and k + 1 ≤ y ≤ j. This is exactly the same as an SRMSQ_A(k + 1, j) query (line 6).

The maximum-sum segment S(x, y) must fall into one of these three cases, so the candidate with the largest sum is the answer (line 7).

Theorem 4. The RMSQ problem can be solved in O(n) preprocessing time and O(1) query time.

Proof. The RMQ_max and the SRMSQ preprocesses each take O(n) time. So the PREPROCESS OF RMSQ algorithm costs O(n) time. The query time is O(1), since each step in the QUERY OF RMSQ algorithm costs O(1) time.

Acknowledgements. We thank Yu-Ru Huang and An-Chiang Chu for helpful conversations. Kuan-Yu Chen and Kun-Mao Chao were supported in part by an NSC grant 92-2213-E-002-059.

References

[1] M. A. Bender and M. Farach-Colton. The LCA Problem Revisited. In Proceedings of the 4th Latin American Symposium on Theoretical Informatics, 17: 88–94, 2000.

[2] J. Bentley. Programming Pearls - Algorithm Design Techniques. CACM, 865-871, 1984.

[3] K. Chung and H.-I. Lu. An Optimal Algorithm for the Maximum-Density Segment Problem. In Proceedings of the 11th Annual European Symposium on Algorithms (ESA 2003), LNCS 2832, pp. 136-147, 2003.

[4] D. Harel and R. E. Tarjan. Fast Algorithms for Finding Nearest Common Ancestors. SIAM J Comput. Vol. 13, No 2, 1984.

[5] T.-H. Fan, S. Lee, H.-I Lu, T.-S. Tsou, T.-C. Wang, and A. Yao. An Optimal Algorithm for Maximum-Sum Segment and Its Application in Bioinformatics. CIAA, LNCS 2759, pp. 251-257, 2003.

[6] H. Gabow, J. Bentley, and R. Tarjan. Scaling and Related Techniques for Geometry Problems. Proc. Symp Theory of Computing (STOC), 1984, 135-143.

[7] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1999.

[8] B. Schieber and U. Vishkin. On Finding Lowest Common Ancestors: Simplification and Parallelization. SIAM J Comput. Vol. 17, No 6, 1988.

[9] X. Huang. An algorithm for identifying regions of a DNA sequence that satisfy a content requirement. CABIOS, 10, 219-225, 1994.

[10] Y.-L. Lin, T. Jiang, and K.-M. Chao. Efficient Algorithms for Locating the Length-constrained Heaviest Segments with Applications to Biomolecular Sequence Analysis. Journal of Computer and System Sciences, 65, 570-586, 2002.

[11] S. Muthukrishnan. Efficient Algorithms for Document Retrieval Problems. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, 657–666, 2002.

[12] W. L. Ruzzo and M. Tompa. A Linear Time Algorithm for Finding All Maximal Scoring Subsequences. In 7th Intl. Conf. Intelligent Systems for Molecular Biology, Heidelberg, Germany, Aug. 1999.
