A fast algorithm for rooting a tree to minimize the ultrametric size
2. Preliminaries. In this paper, $T = (V, E)$ denotes an unweighted tree with vertex set $V$ and edge set $E$. A tree with an edge weight function $w$ is denoted by $T = (V, E, w)$. Let $n$ denote the number of species. All elements of a matrix and all edge weights of a graph are assumed to be nonnegative. We first give some definitions.

Definition 1: A distance matrix of $n$ species is a symmetric $n \times n$ matrix $M$ such that $M[i, j] \ge 0$ for all $1 \le i, j \le n$, and $M[i, i] = 0$ for all $1 \le i \le n$.

Definition 2: An $n \times n$ metric $M$ is an ultrametric if and only if $M[i, j] \le \max\{M[i, k], M[j, k]\}$ for all $1 \le i, j, k \le n$ [1].

Definition 3: Let $T = (V, E, w)$ be an edge-weighted tree and $u, v \in V$. The path length from $u$ to $v$ is denoted by $d_T(u, v)$. The size of $T$ is defined by $w(T) = \sum_{e \in E} w(e)$.

Definition 4: Let $T$ be a rooted tree and $r$ be any node of $T$. We use $T_r$ to denote the subtree rooted at $r$, and $L(T)$ to denote the leaf set of $T$.

Definition 5: An ultrametric tree $T$ of $\{1, \dots, n\}$ is a rooted and edge-weighted binary tree with $L(T) = \{1, \dots, n\}$ and root $r$ such that $d_T(u, r) = d_T(v, r)$ for all $u, v \in L(T)$.

A rooted tree is binary if every internal node has exactly two children. An unrooted binary tree is a tree in which the degree of every internal node is exactly three. We consider only binary trees, since any nonbinary tree can easily be transformed into a binary tree without changing the distances between leaves. Let $T$ be an ultrametric tree with root $r$. It is easy to see that for any internal node $v$, $T_v$ is an ultrametric tree of $L(T_v)$. It should be noted that an $n \times n$ metric $M$ is ultrametric if and only if there is an ultrametric tree $T$ of $\{1, \dots, n\}$ such that $d_T(i, j) = M[i, j]$ for all $1 \le i, j \le n$ [1]. By the definition of an ultrametric tree, the distances from an internal node $r$ to all the leaves in $T_r$ are the same. Therefore we can define the height of a node as follows.

Definition 6: Let $T = (V, E, w)$ be an ultrametric tree.
For any $r \in V$, the height of $r$ is the distance from $r$ to any leaf in the subtree $T_r$.

The minimum ultrametric tree of a distance matrix was defined in [2].

Definition 7: For an $n \times n$ distance matrix $M$, an ultrametric tree $T$ is an ultrametric tree of $M$ if $L(T) = \{1, \dots, n\}$ and $d_T(i, j) \ge M[i, j]$ for all $1 \le i, j \le n$. $T$ is the minimum ultrametric tree of $M$ if its size is minimum among all ultrametric trees of $M$.

The next definition and two lemmas were shown in [5].

Definition 8: Minimum Ultrametric Tree with a given Topology (MUTT) problem: Given a distance matrix $M$ and an unweighted rooted tree $P = (V, E)$ with $L(P) = \{1, \dots, n\}$, the MUTT problem is to find a nonnegative edge weight function $w$ of $P$ such that $T = (V, E, w)$ is the minimum ultrametric tree of $M$.

Lemma 1: A tree $T$ is a minimum ultrametric tree with respect to the fixed topology and distance matrix $M$ if and only if the height of each internal node $r$ is exactly $\max\{M[u, v]/2 \mid u, v \in L(T_r)\}$ [5].

Lemma 2: The MUTT problem can be solved, and the heights of all nodes of the minimum tree computed, in $O(n^2)$ time [5].

The problem to be solved in this paper is formally defined as follows.

Definition 9: Given any distance matrix $M$ and an unweighted unrooted tree $P = (V, E)$ with $L(P) = \{1, \dots, n\}$, the RMUT problem is to root $P$ at one of its edges and to find a nonnegative edge weight function $w$ for the resulting tree $T$ such that $T$ is an ultrametric tree of $M$ and $w(T)$ is minimum among all possible roots and edge weight functions.

3. The algorithm. As mentioned in Section 1, the RMUT problem can be solved in $O(n^3)$ time. We shall reduce the time complexity to $O(n^2)$ in this section. The next property is helpful for improving the time efficiency.

Lemma 3: Let $M$ be the distance matrix and let $M[u, v]$ be maximal among all observed distances. Then the tree can be rooted optimally at some edge of the path between $u$ and $v$ on the tree.
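The height rule of Lemma 1, on which the proof of Lemma 3 relies, can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the procedure of [5]: the rooted binary topology is written as a nested tuple, leaves are 0-based integers that index M directly (the paper numbers species from 1), and the function name mutt is hypothetical. Each pair of leaves is examined once, at the node where their two subtrees meet, which is consistent with the $O(n^2)$ bound of Lemma 2.

```python
def mutt(tree, M):
    """Return (leaves, height, size) of the minimum ultrametric tree
    for a fixed rooted binary topology given as nested tuples."""
    if isinstance(tree, int):
        # A leaf has height 0 and no edges below it.
        return [tree], 0.0, 0.0
    left_leaves, left_h, left_size = mutt(tree[0], M)
    right_leaves, right_h, right_size = mutt(tree[1], M)
    # Lemma 1: the height is half the largest distance between any two
    # leaves of the subtree. Pairs inside one child are already covered
    # by that child's height, so only cross pairs are scanned here.
    cross = max(M[a][b] for a in left_leaves for b in right_leaves) / 2
    height = max(left_h, right_h, cross)
    # Each child edge has weight (height of parent - height of child).
    size = left_size + right_size + (height - left_h) + (height - right_h)
    return left_leaves + right_leaves, height, size


# Hypothetical 3-species example (values chosen for illustration only).
M = [[0, 2, 6],
     [2, 0, 4],
     [6, 4, 0]]
leaves, height, size = mutt(((0, 1), 2), M)
print(height, size)  # 3.0 7.0
```

In this example the root has height 3 because M[0][2] = 6 is the largest entry, and the four edge weights 1, 1, 2 and 3 sum to 7.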
Proof: Let $T$ and $r$ be an optimal tree and an optimal root of the RMUT problem, respectively. By Lemma 1, the height of $r$ is $M[u, v]/2$ since $M[u, v]$ is maximal. We also have $d_T(u, v) = M[u, v]$: Definition 7 gives $d_T(u, v) \ge M[u, v]$, and no two leaves can be farther apart than twice the height of $r$. Therefore there is an internal node $r_1$ on the path between $u$ and $v$ on $T$ whose height is exactly $M[u, v]/2$. In the case that $r_1 \ne r$, since the heights of $r$ and $r_1$ are the same, we may reroot $T$ at $r_1$ and the size of the tree remains minimal.

By the above lemma, the trees rooted at one of the edges of the path are the candidate solutions. However, the path may contain up to $O(n)$ edges, so computing all of the candidates individually still takes $O(n^3)$ time in the worst case. The idea is to compute all the candidates in two passes. Let $M[u, v]$ be a maximal element of $M$ and let $(u = x_0, x_1, x_2, \dots, x_k = v)$ be the path from $u$ to $v$ on $T$. For each vertex $x_i$, we first compute $f_1(i)$ as the minimum size of the subtree rooted at $x_i$ if the optimal root is between $x_i$ and $v$. Then we compute $f_2(i)$ as the minimum size of the subtree rooted at $x_i$ if the optimal root is between $x_i$ and $u$. Finally, the minimum size of the whole tree rooted at edge $(x_i, x_{i+1})$ can be found from $f_1(i)$ and $f_2(i+1)$. The time complexity is reduced because the values $f_1(i)$ for all $0 \le i \le k$ can be computed in one pass. Similarly, every value $f_2(i)$ can be found in the second pass. Our algorithm is listed below and illustrated in Figure 1.

Algorithm RootMUT
Input: An unweighted unrooted tree $T = (\{1, \dots, n\}, E)$ and a distance matrix $M$.
Output: A rooted tree with edge weights.
1: Find $u, v$ such that $M[u, v]$ is a maximal element of $M$.
2: Find $(u = x_0, x_1, x_2, \dots, x_k = v)$, the path from $u$ to $v$ on $T$.
3: Root $T$ at edge $(x_{k-1}, v)$. For every $i$, compute $f_1(i)$, the minimum size of the subtree rooted at $x_i$, and $h_1(i)$, the height of $x_i$.
4: Root $T$ at edge $(u, x_1)$. For every $i$, compute $f_2(i)$, the minimum size of the subtree rooted at $x_i$, and $h_2(i)$, the height of $x_i$.
5: For every $i$ with $0 \le i \le k-1$, compute $f_1(i) + f_2(i+1) + M[u, v] - h_1(i) - h_2(i+1)$, which is the minimum size of the whole tree rooted at edge $(x_i, x_{i+1})$. Then find the optimal root by choosing the minimum.
6: Output the tree with the optimal root.

Theorem 4: The algorithm RootMUT finds the optimal root for the RMUT problem in $O(n^2)$ time.

Proof: Clearly Step 1 takes $O(n^2)$ time and Steps 2, 5, and 6 take $O(n)$ time. By Lemma 2, Steps 3 and 4 can be done in $O(n^2)$ time. Therefore the time complexity of the algorithm is $O(n^2)$. For the correctness of the algorithm, we shall show that $f_1(i)$ is the minimum size of the subtree rooted at $x_i$ in the case that the optimal root is between $x_i$ and $v$. Let $e_1, e_2$ be two edges of the path between $x_i$ and $v$. For the two trees resulting from rooting $T$ at $e_1$ and $e_2$ respectively, the leaf sets of the subtrees rooted at $x_i$ are the same. By Lemma 1, the subtree rooted at $x_i$ has the same minimum size whenever the root is between $x_i$ and $v$. Therefore, in the case that the optimal root is between $x_i$ and $v$, the minimum size of the subtree rooted at $x_i$ is correctly given by $f_1(i)$. The correctness of $f_2(i)$ can be shown similarly. Let $r$ be the root. The minimum size of the tree rooted at edge $(x_i, x_{i+1})$ is $f_1(i) + f_2(i+1) + w(r, x_i) + w(r, x_{i+1})$, in which $w(r, x_i) = M[u, v]/2 - h_1(i)$ and $w(r, x_{i+1}) = M[u, v]/2 - h_2(i+1)$, since the height of $r$ is $M[u, v]/2$.

4. Concluding remarks. An interesting question is how to compute the minimum additive tree size for a given tree topology, without the restriction to ultrametric trees. Such a problem can obviously be solved by linear programming, but an algorithmic approach is still open. For the RMUT problem discussed in this paper, a C program based on algorithm RootMUT was written and ported to a PC running MS-DOS. The program, together with some explanation and a sample input, is freely available at http://www.personal.stu.edu.tw/bangye/mutroot.htm.
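Independently of the authors' C program, which is not reproduced here, the six steps of RootMUT can be sketched in Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the unrooted tree is an adjacency dictionary named adj, leaves carry the 0-based labels 0..n-1 so that they index M directly, internal nodes carry any other labels, and the names root_mut and solve are hypothetical. A memo table keyed by (node, parent) plays the role of the two passes, so every directed subtree is evaluated once. The sketch returns only the minimum size and the chosen edge; the edge weights follow from the node heights.

```python
def root_mut(adj, M):
    """Return (minimum size, root edge) for the unrooted tree `adj`."""
    n = len(M)
    # Step 1: a pair (u, v) with maximal M[u][v].
    u, v = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: M[p[0]][p[1]])
    # Step 2: path u = x_0, ..., x_k = v from parent pointers of a DFS at u.
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path = [v]
    while path[-1] != u:
        path.append(parent[path[-1]])
    path.reverse()

    memo = {}

    def solve(node, par):
        """(leaves, height, size) of the subtree at `node` away from `par`.
        Recursive for brevity; very deep trees would need an explicit stack."""
        key = (node, par)
        if key not in memo:
            kids = [solve(c, node) for c in adj[node] if c != par]
            if not kids:
                memo[key] = ([node], 0.0, 0.0)  # leaf
            else:
                # Lemma 1: half the largest leaf-to-leaf distance below node.
                height = max(h for _, h, _ in kids)
                leaves = []
                for kid_leaves, _, _ in kids:
                    for a in leaves:
                        for b in kid_leaves:
                            height = max(height, M[a][b] / 2)
                    leaves = leaves + kid_leaves
                size = sum(s + height - h for _, h, s in kids)
                memo[key] = (leaves, height, size)
        return memo[key]

    solve(path[-2], path[-1])  # Step 3: fills f1(i), h1(i) for all i < k
    solve(path[1], path[0])    # Step 4: fills f2(i), h2(i) for all i > 0
    # Step 5: combine the two passes for every edge (x_i, x_{i+1}).
    best = None
    for i in range(len(path) - 1):
        _, h1, f1 = solve(path[i], path[i + 1])
        _, h2, f2 = solve(path[i + 1], path[i])
        total = f1 + f2 + M[u][v] - h1 - h2
        if best is None or total < best[0]:
            best = (total, (path[i], path[i + 1]))
    return best  # Step 6 (size and edge only)


# Hypothetical example: leaves 0..3, internal nodes 4 and 5.
adj = {0: [4], 1: [4], 2: [5], 3: [5], 4: [0, 1, 5], 5: [4, 2, 3]}
M = [[0, 2, 8, 10],
     [2, 0, 6, 6],
     [8, 6, 0, 4],
     [10, 6, 4, 0]]
print(root_mut(adj, M))  # (13.0, (4, 5))
```

Here the maximal entry is M[0][3] = 10, the path is 0, 4, 5, 3, and the three candidate edges give sizes 15, 13 and 15, so the middle edge (4, 5) is chosen.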
Acknowledgements. The work was partially supported by grant NSC 89-2218-E-366-003 from the National Science Council.
Figure 1: (a) Find the path between $u$ and $v$ on the tree. (b) Root the tree at the edge incident to $v$ and compute $f_1(i)$, $h_1(i)$. (c) Root the tree at the edge incident to $u$ and compute $f_2(i)$, $h_2(i)$. (d) The minimum size for rooting at edge $(x_i, x_{i+1})$ can be computed from $f_1(i)$, $f_2(i+1)$, $h_1(i)$ and $h_2(i+1)$.

References
[1] H.J. Bandelt, Recognition of tree metrics, SIAM Journal on Discrete Mathematics, 3(1), 1–6, 1990.
[2] M. Farach, S. Kannan and T. Warnow, A robust model for finding optimal evolutionary trees, Algorithmica, 13, 155–179, 1995.
[3] J. Felsenstein, PHYLIP - Phylogeny Inference Package (Version 3.2), Cladistics, 5, 164–166, 1989.
[4] J.D. Thompson, D.G. Higgins, and T.J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research, 22, 4673–4680, 1994.
[5] B.Y. Wu, K.M. Chao and C.Y. Tang, Approximation and exact algorithms for constructing minimum ultrametric trees from distance matrices, Journal of Combinatorial Optimization, 3, 199–211, 1999.