An improved algorithm for finding a length-constrained maximum-density subtree in a tree


Information Processing Letters 109 (2008) 161–164


Hsin-Hao Su^a, Chin Lung Lu^{b,c}, Chuan Yi Tang^{a,*}

a Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan

b Institute of Bioinformatics, National Chiao Tung University, Hsinchu 300, Taiwan

c Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan

Article history: Received 21 April 2008; received in revised form 8 August 2008; available online 30 September 2008. Communicated by F.Y.L. Chin.

Keywords: Algorithms; Dynamic programming; Trees; Network design; Divide and conquer

Abstract

Given a tree $T$ with a weight and a length on each edge, as well as a lower bound $L$ and an upper bound $U$, the so-called length-constrained maximum-density subtree problem is to find a maximum-density subtree in $T$ such that the length of this subtree is between $L$ and $U$. In this study, we present an algorithm that runs in $O(nU \log n)$ time for the case when the edge lengths are positive integers, where $n$ is the number of nodes in $T$, which is an improvement over the previous algorithms when $U = \Omega(\log n)$. In addition, we show that the time complexity of our algorithm can be reduced to $O(nL \log(n/L))$ when the edge lengths are uniform.

1. Introduction

Given a tree $T = (V, E)$, let $w(e)$ be a positive real function representing the weight of an edge $e \in E$ and $l(e)$ be a positive integer function representing the length of $e$. For a path $(e_1, e_2, \ldots, e_k)$ in $T$, its density is defined as $\sum_{i=1}^{k} w(e_i) / \sum_{i=1}^{k} l(e_i)$. Suppose now that we are given a tree $T$, as well as two positive integers $L$ and $U$ with $L \le U$. Then the so-called length-constrained maximum-density path (LDP) problem is to find a maximum-density path in $T$ such that the length of this path is between the lower bound $L$ and the upper bound $U$. The LDP problem [8,10], as well as its special case in which the given $T$ is a path [1,3,6,7,9], has applications in studying alignment and GC-content of genomic sequences in computational biology. Lin et al. [10] were the first to study this problem, under the restriction that all edges have equal length, and proposed two $O(nL)$-time algorithms to solve it, where $n$ denotes the number of nodes in the given tree. Later, Lau et al. [8] designed an $O(n \log^2 n)$-time algorithm for the general case of this problem, in which the edge lengths may be any positive real numbers, and further reduced the time complexity of this algorithm to $O(n \log n)$ when the lengths of edges are uniform (i.e., $l(e) = 1$ for each $e \in E$). More recently, by introducing an extra parameter $K$ that is a positive integer, Hsieh and Cheng [4] studied a variant of the LDP problem that is to find a maximum-density path in $T$ such that the length of this path is between $L$ and $U$ and the number of its edges is at least $K$. They proposed an $O(nKU)$-time algorithm to solve this problem.

* Corresponding author. E-mail address: cytang@cs.nthu.edu.tw (C.Y. Tang).

Lau et al. [8] studied the length-constrained maximum-density subtree (LDT) problem, a generalization of the LDP problem, which is to find a maximum-density subtree, rather than a path, in the given tree $T = (V, E)$ with length between $L$ and $U$, where the density of a subtree $T' = (V', E')$ in $T$ is similarly defined as $\sum_{e \in E'} w(e) / \sum_{e \in E'} l(e)$. As mentioned in [8], the LDT problem has applications in computer, traffic or logistics network design. In [5], Hsieh and Chou studied a variant of the LDT problem in which the weight and length functions were defined on nodes rather than edges. However, their algorithms can still be applied to solve the LDT problem we consider in this study. In fact, as pointed out in [11], the LDT problem with general integer edge lengths can be shown to be NP-hard by a simple reduction from the knapsack problem [2]. Currently, to the best of our knowledge, the LDT problem can be solved in $O(nU^2)$ time using the algorithms proposed in [5,8] when the edge lengths are positive integers and, in addition, it can be solved in $O(nL^2)$ time using the algorithm designed in [8] when the edge lengths are uniform. It should be noted here that neither of the algorithms in [5,8] for solving the LDT problem with arbitrary integer edge lengths is polynomial; both are pseudo-polynomial. Basically, these two algorithms use the same idea, which is first to transform the given tree into a rooted binary tree and then to use dynamic programming to find all maximum-weight subtrees with all possible lengths on the transformed binary tree. Further note that the LDT problem with uniform edge lengths can simply be considered as the problem of finding a size-constrained maximum-density subtree (SDT) in a given tree [10]. In [8], Lau et al. have shown that finding an SDT in a general graph is still NP-hard. However, as mentioned above, this problem can be solved in polynomial time using the algorithm proposed in [8] when the given graph is a tree, where the time complexity $O(nL^2)$ of this algorithm is polynomial because $L \le n$.

In this study, we propose an improved algorithm based on a combination of divide-and-conquer and dynamic programming to solve the LDT problem. If the edge lengths are positive integers, then our algorithm can solve the LDT problem in $O(nU \log n)$ time, which is an improvement over the previous algorithms when $U = \Omega(\log n)$. If the edge lengths are uniform, then the time complexity of our algorithm can be further reduced to $O(nL \log(n/L))$. Finally, we also show that the maximum-weight subtrees of all sizes can be computed in $O(n^2)$ time.

The rest of this paper is organized as follows. Section 2 presents the main idea of the algorithm we used to solve the LDT problem with positive integer lengths. Section 3 shows its further improvement in the case when all edge lengths are uniform. Section 4 concludes the study with some remarks.

2. Algorithm for solving the LDT problem in a tree

In the following, we propose a method to improve the algorithms that were designed in [5] and [8] based on the dynamic programming approach for solving the LDT problem. The basic steps of these two algorithms are as follows. First, the algorithms transform the given tree into a rooted binary tree. Next, for each node $x$ in this binary tree, they allocate a table of size $U$ in which the $i$th entry, where $1 \le i \le U$, represents the weight of a maximum-weight subtree that is rooted at $x$ and has length exactly $i$ (for a fixed length, maximum weight and maximum density coincide). Suppose that $y$ and $z$ are the two children of $x$. Then a subtree rooted at $x$ of length $i$ can be constructed by joining a subtree of length $j$ rooted at $y$ and a subtree of length $i - l(x,y) - l(x,z) - j$ rooted at $z$ together with the edges $(x,y)$ and $(x,z)$. Based on this property, there are $O(U)$ possible choices for the computation of each entry in the table associated with $x$, and therefore the computation of the whole table takes $O(U^2)$ time.
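The quadratic cost of this join step can be made concrete with a minimal Python sketch. It is an illustration only, not the code of [5] or [8]: the function name `join_children`, the 0-indexed tables (entry 0 standing for the child node alone), and the use of `-inf` for infeasible lengths are assumptions made here, and only the case in which both edges $(x,y)$ and $(x,z)$ are used is shown.

```python
NEG_INF = float("-inf")

def join_children(table_y, table_z, len_xy, w_xy, len_xz, w_xz, U):
    """Hypothetical helper: combine the tables of the two children y and z.

    table_y[j] is the best weight of a subtree rooted at y with length j
    (NEG_INF if no such subtree exists); likewise for table_z.
    Returns table_x, where table_x[i] is the best weight of a subtree
    rooted at x with length i that uses both edges (x, y) and (x, z).
    """
    table_x = [NEG_INF] * (U + 1)
    fixed_len = len_xy + len_xz          # length spent on the two edges at x
    fixed_w = w_xy + w_xz
    for i in range(fixed_len, U + 1):    # O(U) entries ...
        budget = i - fixed_len
        best = NEG_INF
        for j in range(budget + 1):      # ... each with O(U) split points
            best = max(best, table_y[j] + table_z[budget - j])
        if best != NEG_INF:
            table_x[i] = best + fixed_w
    return table_x
```

The two nested loops are what give the $O(U^2)$ bound per node; the cases that use only one child edge would be handled by a simpler shift of a single table.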

The basic idea we use in our dynamic programming algorithm is as follows. We first transform the input tree into a rooted tree by choosing a node $r$ as the root, and then we compute a maximum-density subtree containing $r$. Note that removing $r$ and all edges incident with it yields several subtrees, say $T_1, T_2, \ldots, T_{\deg(r)}$. If the optimal solution does not contain $r$, then it must be a subtree of some $T_i$, where $1 \le i \le \deg(r)$. This idea was also used in [8,11] to find a length-constrained maximum-density path or a length-constrained heaviest path in a tree.

Based on the above idea, we design a dynamic programming algorithm, as described below, to find a length-constrained maximum-density subtree containing the root $r$. For each node $x$ in the rooted tree $T$, we use $A_x[i]$ to store the weight of a maximum-weight subtree of length $i$ that contains $r$ and $x$, where $A_x$ is a table with entries for $i = 0, 1, \ldots, U$. Initially, we set $A_r[0] = 0$ and $A_r[i] = -\infty$ for $i = 1, 2, \ldots, U$, and $A_x[i] = -\infty$ for $i = 0, 1, 2, \ldots, U$ for every non-root node $x$. Then we traverse the tree $T$ rooted at $r$ in a depth-first search manner. In the traversing process, there are two different directions in which to visit a node $y$: (1) the descending direction, from a parent node $x$ to $y$, and (2) the ascending direction, from a child node $x$ to $y$. Depending on the traversing direction, we assign a value to $A_y[i]$ as follows when we visit node $y$. If the direction is descending, then we let $A_y[i] = A_x[i - l(x,y)] + w(x,y)$ for $i = l(x,y), l(x,y)+1, \ldots, U$. If the direction is ascending, then we let $A_y[i] = \max\{A_x[i], A_y[i]\}$ for $i = 0, 1, \ldots, U$.
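The two update rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the adjacency format (a dict mapping each node to a list of `(neighbor, length, weight)` triples) and the function name `weights_containing_root` are hypothetical, and plain recursion is used for readability, so very deep trees would need an explicit stack.

```python
NEG_INF = float("-inf")

def weights_containing_root(adj, r, U):
    """Return A_r, where A_r[i] is the maximum weight of a subtree of
    length i that contains r (NEG_INF if no such subtree exists).

    adj: dict node -> list of (neighbor, length, weight); assumed format.
    """
    A_r = [NEG_INF] * (U + 1)
    A_r[0] = 0.0                      # the subtree consisting of r alone

    def visit(x, parent, A_x):
        for y, length, weight in adj[x]:
            if y == parent:
                continue
            # Descending step: every subtree counted in A_y must use edge (x, y).
            A_y = [NEG_INF] * (U + 1)
            for i in range(length, U + 1):
                A_y[i] = A_x[i - length] + weight
            visit(y, x, A_y)
            # Ascending step: keep the better of "with y's branch" and "without".
            for i in range(U + 1):
                if A_y[i] > A_x[i]:
                    A_x[i] = A_y[i]

    visit(r, None, A_r)
    return A_r
```

Each edge triggers one descending and one ascending update of $O(U)$ entries, which is where the $O(nU)$ cost of this step comes from. For example, on the four-node tree with edges 0-1 (length 1, weight 3), 0-2 (length 2, weight 1) and 1-3 (length 1, weight 4), calling the function with root 0 and $U = 4$ yields the table `[0.0, 3.0, 7.0, 4.0, 8.0]`.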

After the traversal is finished, Lemma 1 below shows that $A_r[i]$ is the weight of a maximum-weight subtree of length $i$ containing $r$. Based on this property, the density of a length-constrained maximum-density subtree containing $r$ is $\max_{L \le i \le U} A_r[i]/i$. Note that during the traversing process, we record the sequence of the nodes we have visited and denote it by $(r, \ldots, x)$, where $x$ is the node currently being visited. Clearly, the graph induced by these nodes is a subtree of $T$, and for convenience we denote it by $T(r, \ldots, x)$.

Lemma 1. When we currently visit $x$, let $(r, \ldots, x)$ be the sequence of the nodes we have traversed. Then $A_x[i]$ stores the weight of a maximum-weight subtree in $T(r, \ldots, x)$ of length $i$ that contains both $r$ and $x$, until a new $A_x[i]$ is computed.

Proof. We prove this lemma by induction on the sequence of the visited nodes. Initially, $T(r)$ is a subtree that contains only the isolated node $r$, and clearly the lemma holds. Next, we assume that the lemma holds for $(r, \ldots, x)$ and let $y$ be the next node we are going to visit. Then there are two cases to be considered.

(1) Suppose that $y$ is a child node of $x$. Then $y$ is a leaf in the induced subtree $T(r, \ldots, x, y)$. Therefore, if we remove $y$ from any subtree of $T(r, \ldots, x, y)$ that contains $r$ and $y$, the resulting tree must be a subtree of $T(r, \ldots, x)$ that contains $r$ and $x$. According to the induction hypothesis, $A_x[i - l(x,y)] + w(x,y)$, which equals the value assigned to $A_y[i]$ by our method described above, is the weight of a maximum-weight subtree in $T(r, \ldots, x, y)$ of length $i$ that contains $r$ and $y$.

(2) Suppose that $y$ is the parent of $x$. Let $P$ be a maximum-weight subtree of $T(r, \ldots, x, y)$ with length $i$ containing $r$ and $y$. Since $y$ is the parent of $x$, our traversal sequence must be of the form $(r, \ldots, y, x, \ldots, x, y)$. Let $Q$ be the subtree that is rooted at $x$. If $P \cap Q = \emptyset$, then $P$ is a subtree of $T(r, \ldots, y, x, \ldots, x, y) \setminus Q = T(r, \ldots, y)$ passing through $r$ and $y$, indicating that the weight of $P$ has already been stored in $A_y[i]$. If $P$ contains a node in $Q$, then $P$ must contain $x$. In this case, $P$ is a maximum-weight subtree of $T(r, \ldots, y, x, \ldots, x)$ passing through $r$ and $x$, and its weight was stored in $A_x[i]$. According to our method, we select the maximum of $A_x[i]$ and the old $A_y[i]$ and assign it to the new $A_y[i]$. In other words, $A_y[i]$ keeps the weight of a maximum-weight subtree of $T(r, \ldots, x, y)$ of length $i$ that contains $r$ and $y$. □

When the traversal is completed, $A_r[i]$ stores the weight of a maximum-weight subtree in $T(r, \ldots, r) = T$ of length $i$ that contains $r$, according to Lemma 1. Therefore, the density of a length-constrained maximum-density subtree containing $r$ is $\max_{L \le i \le U} A_r[i]/i$.

Given a tree $T = (V, E)$, there exists a node $c$, called a centroid, such that deleting $c$ results in several subtrees, each containing no more than $|V|/2$ nodes. We can find this node by first rooting $T$ at some node. Then we perform a postorder traversal on $T$ to count, for each node, the number of nodes in the subtree rooted at it (the node itself included). The first node in postorder whose subtree contains more than $|V|/2$ nodes is a centroid: each of its child subtrees contains at most $|V|/2$ nodes, and fewer than $|V|/2$ nodes lie outside its subtree. This can be done in linear time.
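The centroid search just described can be sketched as follows. This is an illustrative Python version under assumptions of our own: the function name `find_centroid` and the adjacency format (a dict mapping each node to its neighbors) are hypothetical, and the reverse of a preorder is used in place of a true postorder, which is sufficient here because it also processes every child before its parent.

```python
def find_centroid(adj, root):
    """Return a node whose removal leaves components of at most n/2 nodes.

    adj: dict node -> iterable of neighboring nodes of a tree (assumed format).
    """
    parent = {root: None}
    order = []                        # preorder of the tree rooted at `root`
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    n = len(order)
    size = {u: 1 for u in order}      # subtree sizes, each node counts itself
    for u in reversed(order):         # children are handled before parents
        if 2 * size[u] > n:
            # First node whose subtree holds more than n/2 nodes: all its
            # child subtrees were at most n/2, and fewer than n/2 lie outside.
            return u
        size[parent[u]] += size[u]
    return root                       # not reached: the root always qualifies
```

Both passes touch each node and edge a constant number of times, in line with the linear-time claim above.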

Now, we describe our algorithm, which proceeds as follows:

1. Find a centroid $c$ of $T$.

2. Use the dynamic programming method described before to find a length-constrained maximum-density subtree that contains $c$.

3. Separate $T$ into several subtrees, say $T_1, T_2, \ldots, T_{\deg(c)}$, by removing $c$ and all edges incident with it from $T$, and recursively repeat steps 1 and 2 on these subtrees.

4. Compare the $\deg(c) + 1$ length-constrained maximum-density subtrees obtained in steps 2 and 3 (the one containing the centroid $c$ and the one found for each $T_i$) and choose the one with the highest density as the output.
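The four steps above can be put together in a compact Python sketch. It is offered as one possible reading of the algorithm under explicit assumptions, not as the authors' code: the names `table_containing_root` and `ldt_max_density`, the adjacency format (dict of `(neighbor, length, weight)` triples), and the use of a `removed` set and a worklist instead of literal recursion are all choices made here. It returns only the best density, not the subtree itself.

```python
NEG_INF = float("-inf")

def table_containing_root(adj, r, U, removed):
    """Step 2: A_r[i] = max weight of a length-i subtree containing r,
    ignoring nodes in `removed`. Iterative DFS with one table per frame."""
    root_table = [0.0] + [NEG_INF] * U
    stack = [(r, None, root_table, iter(adj[r]))]
    while stack:
        x, par, A_x, neighbors = stack[-1]
        for y, length, weight in neighbors:
            if y == par or y in removed:
                continue
            A_y = [NEG_INF] * (U + 1)             # descending update
            for i in range(length, U + 1):
                A_y[i] = A_x[i - length] + weight
            stack.append((y, x, A_y, iter(adj[y])))
            break
        else:                                     # x is finished: ascend
            stack.pop()
            if stack:
                A_p = stack[-1][2]
                for i in range(U + 1):
                    if A_x[i] > A_p[i]:
                        A_p[i] = A_x[i]
    return root_table

def ldt_max_density(adj, L, U):
    """Best density of a subtree with length in [L, U], or None if none exists."""
    removed, best = set(), None
    pending = [next(iter(adj))]                   # one node per remaining component
    while pending:
        start = pending.pop()
        # Step 1: centroid of the component containing `start`.
        parent, order, stack = {start: None}, [], [start]
        while stack:
            u = stack.pop()
            order.append(u)
            for v, _, _ in adj[u]:
                if v != parent[u] and v not in removed:
                    parent[v] = u
                    stack.append(v)
        size = {u: 1 for u in order}
        c = start
        for u in reversed(order):
            if 2 * size[u] > len(order):
                c = u
                break
            size[parent[u]] += size[u]
        # Step 2 and step 4: best density among subtrees containing c.
        table = table_containing_root(adj, c, U, removed)
        for i in range(max(L, 1), U + 1):
            if table[i] != NEG_INF and (best is None or table[i] / i > best):
                best = table[i] / i
        # Step 3: delete c and continue on each remaining piece.
        removed.add(c)
        pending.extend(v for v, _, _ in adj[c] if v not in removed)
    return best
```

On the four-node example with edges 0-1 (length 1, weight 3), 0-2 (length 2, weight 1) and 1-3 (length 1, weight 4), `ldt_max_density(adj, 2, 4)` returns 3.5, attained by the subtree formed by edges 0-1 and 1-3. Recovering the subtree itself would require storing back-pointers alongside the tables, which this sketch omits.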

Below, we analyze the time complexity of our algorithm. Let $T(n)$ be the time complexity of the algorithm when the size of the input tree is $n$. Step 1 can be done in $O(n)$ time, as described above. In step 2, we take $O(U)$ time to update the table associated with each node in the input tree, so the time complexity of step 2 is $O(nU)$. According to our algorithm, $T(n)$ can therefore be written as the following recurrence, where $n_i$ denotes the size of $T_i$:

$$T(n) = \sum_{i=1}^{\deg(c)} T(n_i) + O(nU).$$

Since $n_i \le n/2$, it is not hard to derive that $T(n) = O(nU \log n)$.
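One way to see this bound is a recursion-tree argument, sketched here because the text leaves the derivation to the reader. The constant $a$, the depth index $d$, and the notation $n_{d,j}$ for the size of the $j$th subtree at depth $d$ are introduced only for this illustration.

```latex
\begin{align*}
\text{work at depth } d
  &\le \sum_{j} a\, n_{d,j}\, U \;\le\; a\, n U,
  && \text{since the subtrees at one depth are node-disjoint, } \textstyle\sum_j n_{d,j} \le n, \\
\text{number of depths}
  &\le \lfloor \log_2 n \rfloor + 1,
  && \text{since } n_{d,j} \le n / 2^{d} \text{ and } n_{d,j} \ge 1, \\
T(n)
  &\le a\, n U \left( \lfloor \log_2 n \rfloor + 1 \right) = O(nU \log n).
\end{align*}
```

The $O(n)$ cost of finding each centroid is absorbed into the $a\,nU$ term because $U \ge 1$.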

Theorem 1. The LDT problem can be solved in $O(nU \log n)$ time.

3. Algorithms for solving the SDT problem in a tree

We here define the size of a tree as the number of edges it contains; the size-constrained maximum-density subtree problem is therefore equivalent to the length-constrained maximum-density subtree problem with uniform edge lengths. As pointed out in [8], if the size is constrained between $L$ and $U$, there exists a maximum-density subtree with a size less than $3L$. For completeness, we give a more detailed proof below.

Lemma 2. A tree $T$ of size greater than or equal to $3L$ can be separated into two edge-disjoint subtrees, each of size at least $L$.

Proof. Let the tree $T$ be rooted at some node $r$. Then we perform a postorder traversal on $T$ to compute the size of the subtree rooted at every node. Let $x$ be the first node we traverse in postorder such that the subtree rooted at $x$ has size at least $L$. Let $x_1, x_2, \ldots, x_k$ be the children of $x$. Denote the subtree rooted at $x_i$ by $T(x_i)$ and its size by $\mathrm{size}(T(x_i))$. Let $j$ be the smallest positive integer such that $\sum_{i=1}^{j} \mathrm{size}(T(x_i)) + j \ge L$. Such a $j$ must exist, since $\sum_{i=1}^{k} \mathrm{size}(T(x_i)) + k$ is the size of the subtree rooted at $x$, which is at least $L$. Let $P = \bigcup_{i=1}^{j} \bigl( T(x_i) \cup (x, x_i) \bigr)$. Then $P$ is a subtree of size at least $L$. Moreover, the size of $P$ is no greater than $2L$. Suppose to the contrary that $\mathrm{size}(P) > 2L$. Then $\sum_{i=1}^{j-1} \mathrm{size}(T(x_i)) + j - 1 = \mathrm{size}(P) - \mathrm{size}(T(x_j)) - 1$. Recall that $x$ is the first node whose induced subtree has size at least $L$, implying $\mathrm{size}(T(x_j)) < L$. As a result, $\sum_{i=1}^{j-1} \mathrm{size}(T(x_i)) + j - 1 > L$, which contradicts the choice of $j$ as the smallest integer with this property. Therefore, since $\mathrm{size}(T) \ge 3L$ and $\mathrm{size}(P) \le 2L$, all the edges which are in $T$ but not in $P$ induce another subtree of size at least $L$. □

If the size of $T$ exceeds $3L - 1$, then $T$ can be separated into two edge-disjoint subtrees, say $T_1$ and $T_2$, each of size at least $L$, according to Lemma 2. Then the higher of the densities of $T_1$ and $T_2$ must be at least the density of $T$. To find a size-constrained maximum-density subtree in a given tree, we can therefore employ the same algorithm for finding a length-constrained maximum-density subtree described in the previous section. For each node $v$ in the given tree, however, we only need to compute the entries of the table $A_v$ ranging from 1 to $3L - 1$, rather than to $U$, if $U$ is greater than $3L - 1$. In addition, we do not need to consider those subtrees whose size is less than $L$, since they do not satisfy the size constraint, which requires a size of at least $L$. Therefore, we can reformulate the recurrence for the time complexity as follows to get a tighter time bound:

$$T(n) = \sum_{i=1}^{\deg(c)} T(n_i) + O(nL), \quad n - 1 \ge L.$$

Note that in the above recurrence, $n_i$ denotes the size of the subtree induced by the $i$th child of the initial centroid $c$, and the size of $T$ is $n - 1$.
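The density comparison used in this section, namely that the denser of $T_1$ and $T_2$ is at least as dense as $T$, can be checked with a short calculation. The symbols $w_1, w_2$ and $l_1, l_2$ for the total weights and total lengths of $T_1$ and $T_2$ are introduced here only for illustration, and we assume without loss of generality that $w_1/l_1 \ge w_2/l_2$, that is, $w_1 l_2 \ge w_2 l_1$. Because $T_1$ and $T_2$ are edge-disjoint and together contain every edge of $T$, the density of $T$ is $(w_1 + w_2)/(l_1 + l_2)$.

```latex
\[
\frac{w_1}{l_1} - \frac{w_1 + w_2}{l_1 + l_2}
  = \frac{w_1 (l_1 + l_2) - l_1 (w_1 + w_2)}{l_1 (l_1 + l_2)}
  = \frac{w_1 l_2 - w_2 l_1}{l_1 (l_1 + l_2)}
  \;\ge\; 0 .
\]
```

Since each part has size at least $L$ and strictly fewer edges than $T$, the split can be repeated on the denser part until its size drops below $3L$, which is why the tables only need to reach $3L - 1$.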

Corollary 1. A size-constrained maximum-density subtree in a tree can be found in $O(nL \log(n/L))$ time when $n - 1 \ge L$.

Proof. Suppose by induction that $T(m) < kmL \lg(m/L)$ for all $m < n$, where $k > 0$ is a constant. Then we have

$$T(n) \le kL \sum_{n_i - 1 \ge L} n_i \lg\frac{n_i}{L} + \sum_{n_i - 1 < L} e + dnL,$$

for some constants $d > 0$ and $e > 0$. It should be noted that if all the subtrees have a size less than $L$, then the first term on the right-hand side of the above inequality is zero. We can bound the above inequality as

$$T(n) \le kL \sum_{i} n_i \lg\frac{n_{\max}}{L} + enL + dnL,$$

where $n_{\max} = \max\{ n_i \mid n_i - 1 \ge L \}$, and consequently

$$T(n) \le knL \lg\frac{n_{\max}}{L} + (d + e)nL \le knL \lg\frac{n}{2L} + (d + e)nL.$$

Here, we choose $k$ such that $k > d + e$. If the first term is zero, then $T(n) < knL \le knL \lg(n/L)$. If the first term exists, then $T(n) \le knL \lg(n/L) + (d + e - k)nL < knL \lg(n/L)$. □

It is also possible to compute the maximum-weight subtrees of all sizes in $O(n^2)$ time. The algorithm is the same as the one described above, but when applying the dynamic programming approach to each tree, we consider the table up to the size of the tree instead of $3L - 1$. In this way, the maximum-weight subtree of size $i$ is the maximum, over all trees of size greater than or equal to $i$ that are handled in the recursion, of the maximum-weight subtree of size $i$ found in each of them. In this case, the recurrence for the time complexity of the algorithm becomes $T(n) = \sum_{i=1}^{\deg(c)} T(n_i) + O(n^2)$.

Corollary 2. It takes $O(n^2)$ time to find the maximum-weight subtrees of all sizes.

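Proof. Clearly, $T(n) \le \sum_{i=1}^{\deg(c)} T(n_i) + en^2$ for some constant $e > 0$. Suppose by induction that $T(m) < dm^2$ for all $m < n$. Then, since $n_i \le n/2$ and $\sum_i n_i \le n$,

$$T(n) < \sum_{i=1}^{\deg(c)} d n_i^2 + en^2 \le \sum_{i=1}^{\deg(c)} d n_i \frac{n}{2} + en^2 \le \frac{dn^2}{2} + en^2 < dn^2,$$

if we choose $d > 2e$. Therefore, $T(n) = O(n^2)$. □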

4. Concluding remarks

In this study, we have proposed an $O(nU \log n)$-time algorithm for solving the maximum-density subtree problem, which is better than the previous algorithms when the bound $U = \Omega(\log n)$. In addition, we have shown that the time complexity of this algorithm can be reduced to $O(nL \log(n/L))$ when the edge lengths in the given tree are uniform. The idea behind the dynamic programming we designed in this study can also be used to compute a length-constrained or size-constrained optimal subtree in a tree with other objective functions, such as a length-constrained bottleneck subtree. As future work, it would be interesting to know whether the space complexity of the length-constrained maximum-density subtree problem can be reduced to $O(n + U)$.

Acknowledgements

This work was supported in part by National Science Council of Republic of China under grant NSC-97-2815-C-007-041-E.

References

[1] K.M. Chung, H.I. Lu, An optimal algorithm for the maximum-density segment problem, SIAM Journal on Computing 34 (2004) 373–387.

[2] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco, 1979.

[3] M.H. Goldwasser, M.Y. Kao, H.I. Lu, Linear-time algorithms for computing maximum-density sequence segments with bioinformatics applications, Journal of Computer and System Sciences 70 (2005) 128–144.

[4] S.Y. Hsieh, C.C. Cheng, Finding a maximum-density path in a tree under the weight and length constraints, Information Processing Letters 105 (2008) 202–205.

[5] S.Y. Hsieh, T.Y. Chou, Finding a weight-constrained maximum-density subtree in a tree, in: Proceedings of the 16th International Symposium on Algorithms and Computation (ISAAC 2005), in: Lecture Notes in Computer Science, vol. 3827, 2005, pp. 944–953.

[6] X. Huang, An algorithm for identifying regions of a DNA sequence that satisfy a content requirement, Computer Applications in the Biosciences 10 (1994) 219–225.

[7] S.K. Kim, Linear-time algorithm for finding a maximum-density segment of a sequence, Information Processing Letters 86 (2003) 339–342.

[8] H.C. Lau, T.H. Ngo, B.N. Nguyen, Finding a length-constrained maximum-sum or maximum-density subtree and its application to logistics, Discrete Optimization 3 (2006) 385–391.

[9] Y.L. Lin, T. Jiang, K.M. Chao, Efficient algorithms for locating the length-constrained heaviest segments, Journal of Computer and System Sciences 65 (2002) 570–586.

[10] R.R. Lin, W.H. Kuo, K.M. Chao, Finding a length-constrained maximum-density path in a tree, Journal of Combinatorial Optimization 9 (2005) 147–156.

[11] B.Y. Wu, K.M. Chao, C.Y. Tang, An efficient algorithm for the length-constrained heaviest path problem on a tree, Information Processing Letters 69 (1999) 63–67.