
Approximation Algorithms for Constructing Evolutionary Trees


Academic year: 2021


Chia-Mao Huang and Chang-Biau Yang
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
[email protected]

This research work was partially supported by the National Science Council of the Republic of China under contract NSC-90-2213-E-110-043.

Abstract. In this paper, we propose heuristic algorithms to construct evolutionary trees under the distance-based model. When the distance matrix is metric, the problem is called the triangle minimum ultrametric tree problem (ΔMUT). For the ΔMUT, we propose an approximation algorithm with error ratio $\lceil \log_\alpha n \rceil + 1 \approx 1.44 \log n + 1$, where $\alpha = \frac{\sqrt{5}+1}{2}$. We also propose a heuristic algorithm, based on a clustering scheme, to obtain a good leaf node circular order. We then design a dynamic programming algorithm to construct the optimal ultrametric tree under a fixed leaf node circular order. The time complexity of the dynamic programming is $O(n^3)$ if the scoring function is the minimum tree size or the $L_1$-min increment.

Key words: computational biology, evolutionary tree, approximation algorithm, dynamic programming.

1 Introduction

An evolutionary tree is an important tool for showing branching diagrams and the history of life. Nowadays, we can obtain the DNA (deoxyribonucleic acid) sequence of an organism. The DNA sequence is the most original information of a life form. Thus, we use DNA information to infer an evolutionary tree close to the real evolutionary process [15]. Many researchers have studied the construction of evolutionary trees [1, 4, 5, 7]. However, we are not sure what the real evolutionary process is, so many construction models for evolutionary trees have been proposed. There are also many scoring functions for evaluating an evolutionary tree. Most evolutionary tree optimization problems are NP-hard [2, 3, 16], except for some very special scoring functions with some special input data [6].

In this paper, we use the distance-based model to construct the evolutionary tree. The model is based on the computed distances between species. First, we use the DNA sequences to compute the distance between every pair of species. Then, we construct the evolutionary tree from these distance data. For example, the neighbor joining (NJ) method [13] and the unweighted pair group method with maximum (UPGMM) [11] are often used to construct the evolutionary tree from the distance data. The evolution of organisms changes their DNA sequences, and these evolution events can be viewed as insertions, deletions, point mutations, rearrangements or inversions of DNA sequences. Thus, we can compute the number of event occurrences and accordingly calculate the distance between two species [10, 12, 14]. In this model, the distance between any two species in the evolutionary tree is similar to the original distance.

The organization of this paper is as follows. In Section 2, we first present some definitions about the evolutionary tree problem. In Section 3, we define the binary splitting tree problem and propose an algorithm to construct a binary splitting tree with height no more than $\lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$ and $n$ is the number of leaf nodes. In Section 4, we propose an approximation algorithm to construct the minimum ultrametric tree under a metric distance matrix and prove that the error ratio is within $\lceil \log_\alpha n \rceil + 1 \approx 1.44 \log n + 1$. In Section 5, we propose a heuristic algorithm to construct a leaf node circular order, and also design a dynamic programming algorithm to solve the optimal ultrametric tree problem under a certain
leaf node circular order. Then, in Section 6, we show our experiment results and compare them with the UPGMM method. Finally, we give some conclusions in Section 7.

2 Preliminaries

In this section, we give some definitions about the evolutionary tree and various scoring functions to estimate the goodness of an evolutionary tree.

Definition 1 An $n \times n$ distance matrix $M$ is used to represent the distances among $n$ species, where $d^M_{ij}$ denotes the distance between the $i$th species and the $j$th species. Moreover, $M$ is a symmetric matrix, that is, $d^M_{ij} = d^M_{ji}$.

Definition 2 An $n \times n$ distance matrix $M$ is metric if the distances among any three points satisfy the triangle inequality, that is, for any three points $x, y, z$, $d^M_{xy} \le d^M_{xz} + d^M_{yz}$.

Definition 3 An $n \times n$ metric $M$ is ultrametric if and only if for any three points $i, j, k$, $d^M_{ij} \le \max\{d^M_{ik}, d^M_{jk}\}$. In other words, for any triangle, the two longer sides have the same length.

In the following, $d^T_{ij}$ denotes the distance between the $i$th species and the $j$th species in an evolutionary tree $T$, and $w(T)$ denotes the total weight assigned to the tree edges.

Definition 4 Given a set $S$ of species, the set of leaves in an evolutionary tree is equal to $S$. In the tree, each internal node represents the common ancestor of the species on the leaves of its subtree.

In a rooted evolutionary tree, each internal node has exactly two children. In an unrooted tree, the degree of each internal node is exactly 3.

Fact 1 [6] Given an ultrametric matrix $M$, there exists a unique rooted evolutionary tree $T$, called an ultrametric tree, such that $d^T_{ij} = d^M_{ij}$. In addition, for any internal node $v$, the distances from $v$ to all leaf nodes in the subtree rooted at $v$ are the same.

Definition 5 Given an arbitrary distance matrix $M$, the MUT (minimum ultrametric tree) problem is to construct an ultrametric tree $T$ such that the total weight assigned to the tree edges is minimized and $d^T_{ij} \ge d^M_{ij}$, $\forall i, j$.

Definition 6 Given a metric distance matrix $M$, the ΔMUT (minimum ultrametric tree for a metric) problem is to construct an ultrametric tree $T$ such that the total weight assigned to the tree edges is minimized and $d^T_{ij} \ge d^M_{ij}$, $\forall i, j$.

Definition 7 [19] Given an ultrametric distance matrix $M$ associated with a tree topology $T$, the MUTT (minimum ultrametric tree with a given topology) problem is to assign each tree edge a weight such that the total weight is minimized and $d^T_{ij} \ge d^M_{ij}$, $\forall i, j$.

Theorem 1 [19] The MUTT problem can be solved in $O(n^2)$ time, where $n$ is the number of species.

There are also many scoring functions to estimate the goodness of an evolutionary tree. Given an $n \times n$ distance matrix $M$, several popular scoring functions [6] for measuring an evolutionary tree $T$ are as follows.

minimum tree size: $d^T_{ij} \ge d^M_{ij}$, $\forall i, j$, and the total weight of the tree is minimized. The MUT, ΔMUT and MUTT problems are based on this scoring function.

$L_1$-min increment: $d^T_{ij} \ge d^M_{ij}$, $\forall i, j$, and $\sum_{i,j} (d^T_{ij} - d^M_{ij})$ is minimized.

$L_k$-min increment: $d^T_{ij} \ge d^M_{ij}$, $\forall i, j$, and $\sum_{i,j} (d^T_{ij} - d^M_{ij})^k$ is minimized.

$L_\infty$-min increment: $d^T_{ij} \ge d^M_{ij}$, $\forall i, j$, and $\max_{i,j} (d^T_{ij} - d^M_{ij})$ is minimized.

Because the evolution of organisms repeatedly changes their DNA sequences, the measured distance may be shorter than the real evolutionary distance. Thus, we usually use the minimum tree size, $L_1$-min increment, $L_k$-min increment and $L_\infty$-min increment scoring functions for measuring an evolutionary tree [17].

3 The Binary Splitting Tree Problem

To solve the ΔMUT problem, we first have to solve the binary splitting tree problem. In this section, we first define the binary splitting tree problem and then propose an algorithm to construct a binary splitting tree with height no more than $\lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$ and $n$ is the number of leaf nodes.
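As a quick numeric illustration of the height bound stated for Section 3, the following sketch evaluates $\lceil \log_\alpha n \rceil$ for a few values of $n$ and prints $1.44 \log_2 n$ next to it. The function name `height_bound` is ours, and rounding the bound up with a ceiling is an assumption about how the bound is meant to be read.

```python
import math

ALPHA = (math.sqrt(5) + 1) / 2  # the golden ratio, about 1.618


def height_bound(n: int) -> int:
    """Return ceil(log_alpha n), the assumed form of the height bound."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(math.log(n) / math.log(ALPHA))


if __name__ == "__main__":
    for n in (2, 7, 100, 1000):
        # Third column: 1.44 * log2(n), the approximation quoted in the text.
        print(n, height_bound(n), round(1.44 * math.log2(n), 2))
```

Since $1 / \log_2 \alpha \approx 1.44$, the two columns stay within one unit of each other, which is where the constant 1.44 in the error ratio comes from.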

Definition 8 Given a tree $T = (V, E)$, for any two nodes $v_1, v_2 \in V$, the path connecting $v_1$ and $v_2$ is denoted as $path_T(v_1, v_2)$. $E(path_T(v_1, v_2))$ denotes the set of edges contained in $path_T(v_1, v_2)$.

Definition 9 Given an unrooted tree $T = (V, E)$, the binary splitting tree $\tau = (V, V_\tau, E_\tau)$ is a rooted binary tree such that $V$ and $V_\tau$ are the set of the leaf nodes and the set of internal nodes, respectively, in $\tau$, $E_\tau$ denotes the set of tree edges in $\tau$, and for any two nodes $v_1, v_2 \in V_L$ and any two nodes $v_3, v_4 \in V_R$, where $V_L$ and $V_R$ contain the leaf nodes in the left and right subtrees rooted at node $u \in V_\tau$, respectively, $E(path_T(v_1, v_2)) \cap E(path_T(v_3, v_4)) = \phi$.

Definition 10 Given an unrooted tree $T$, the binary splitting tree problem is to find a binary splitting tree.

Figure 1: An example for the binary splitting tree. (a) A tree. (b) A binary splitting tree. (c) Another binary splitting tree. (d) Not a binary splitting tree.

For example, consider Figure 1. Figure 1 (a) shows an unrooted tree $T$, and Figure 1 (b) and (c) show two binary splitting trees of $T$. In Figure 1 (b), the binary splitting tree is built easily. Nodes $v_1$ and $v_2$ are connected by edge $e_1$, nodes $v_4$, $v_3$ and $v_5$ are connected by edges $e_3$ and $e_4$, and $\{e_1\} \cap \{e_3, e_4\} = \phi$. The situation is similar in the subtree $\{v_4, v_3, v_5\}$. Figure 1 (c) is more complicated. $\{v_2, v_1, v_3\}$ are connected by edges $\{e_1, e_2\}$, and $\{v_4, v_5\}$ are connected by edges $\{e_3, e_4\}$. It is clear that $\{e_1, e_2\} \cap \{e_3, e_4\} = \phi$. $\{v_2\}$ and $\{v_1, v_3\}$ are in a similar situation. Thus, the tree is a binary splitting tree. Figure 1 (d) is not a binary splitting tree, because edges $\{e_1, e_2, e_3\}$ connect nodes $\{v_1, v_4\}$, edges $\{e_2, e_4\}$ connect nodes $\{v_2, v_3, v_5\}$, and $\{e_1, e_2, e_3\} \cap \{e_2, e_4\} \ne \phi$.

Next, we propose an algorithm to construct the binary splitting tree and show that the height of the tree is no more than $\lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$ and $n$ is the number of leaf nodes.

Definition 11 An unrooted tree is a k-way tree if the degree of each node is no more than $k$.

Before constructing a binary splitting tree from an unrooted (k-way) tree, we have to convert the k-way tree, $k \ge 4$, to a 3-way tree by adding some virtual nodes.

Definition 12 Given a k-way tree $T = (V, E)$, its corresponding 3-way tree $T' = (V, V', E')$ is defined as follows. Let $U = \{u_i \mid u_i \in V, \mathrm{degree}(u_i) \ge 4\}$. Suppose $|U| = h$. Let $c_i = \mathrm{degree}(u_i)$, and let $v_{i,1}, v_{i,2}, \ldots, v_{i,c_i}$ be adjacent to $u_i$. $V'_i = \{v'_{i,j} \mid 3 \le j \le c_i - 1\}$, $1 \le i \le h$. $E'_i = \{(u_i, v_{i,1}), (u_i, v_{i,2}), (u_i, v'_{i,3}), (v'_{i,3}, v_{i,3}), (v'_{i,3}, v'_{i,4}), (v'_{i,4}, v_{i,4}), \ldots, (v'_{i,c_i-2}, v'_{i,c_i-1}), (v'_{i,c_i-1}, v_{i,c_i-1}), (v'_{i,c_i-1}, v_{i,c_i})\}$, $1 \le i \le h$. Then $V' = \bigcup_{1 \le i \le h} V'_i$, and $E'$ is obtained from $E$ by replacing the edges incident to each $u_i$ with $E'_i$, $1 \le i \le h$. The nodes in $V'$ are called virtual nodes.

For example, Figure 2 (a) shows a k-way tree with $k = 4$. By Definition 12, $U = \{g\}$ and $V'_1 = \{v'_{1,3}\} = \{w\}$. The edges $(g, c)$ and $(g, d)$ of $E$ are replaced by $(g, w)$, $(w, c)$ and $(w, d)$, so $V' = V'_1$ and $E' = E - \{(g, c), (g, d)\} \cup \{(g, w), (w, c), (w, d)\}$, as shown in Figure 2 (b).

Theorem 2 [18] For any tree $T = (V, E)$, there exists a node $v \in V$ such that $T$ can be split from $v$ into $k$, $k \ge 2$, subtrees and the number of nodes in any subtree is no more than $\frac{1}{2}|V|$.

Theorem 3 Given a k-way tree, $k \ge 4$, a 3-way tree $T' = (V, V', E')$ can be constructed such that there exists a node $v \in V$ or $v \in V'$ to split $T'$ into $p$ subtrees, $p = 2$ or $p = 3$, and the number of nonvirtual nodes in any subtree is no more than $\frac{1}{2}|V|$.

Theorem 3 is based on Theorem 2. The only difference between them is that Theorem 3 includes the concept of virtual nodes. Note that in Theorem 3, the splitting node $v$ is included in one of the subtrees. Our algorithm for constructing a binary splitting tree with height no more than $\lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$, is as follows.

Algorithm BST (Binary Splitting Tree)

Input: An unrooted tree $T = (V, E)$, $|V| = n$.

Output: A binary splitting tree $\tau = (V, V_\tau, E_\tau)$ with height no more than $\lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$.

Step 1: Convert $T$ to a 3-way tree $T' = (V, V', E')$.

Step 2: If $|V| = 1$, $\tau$ contains only one node $v \in V$, $V_\tau = \phi$ and $E_\tau = \phi$, and stop.

Step 3: By Theorem 3, find a node $v \in V$ to split $T'$ into 3-way subtrees $T_A = (V_A, V'_A, E'_A)$, $T_B = (V_B, V'_B, E'_B)$ and $T_C = (V_C, V'_C, E'_C)$ such that $|V_C| \le |V_B| \le |V_A| \le \frac{1}{2}|V|$.

Step 4: If $|V_A| \ge \frac{3-\sqrt{5}}{2}|V|$, combine $T_B$ and $T_C$ into $T_{BC}$. In other words, split $T'$ into two subtrees $T_A$ and $T_{BC}$. If $|V_A| < \frac{3-\sqrt{5}}{2}|V|$, go to Step 7.

Step 5: Let $T_A$ and $T_{BC}$ be $T$. Recursively apply this algorithm and obtain binary splitting trees $\tau_A = (V_A, V_{\tau_A}, E_{\tau_A})$ and $\tau_{BC} = (V_{BC}, V_{\tau_{BC}}, E_{\tau_{BC}})$, respectively.

Step 6: Create a root $r$ and build a binary splitting tree $\tau$ rooted at $r$ with the left and right subtrees being $\tau_A$ and $\tau_{BC}$, respectively, as shown in Figure 3 (b). In other words, $\tau = (V, V_\tau, E_\tau)$, where $V = V_A \cup V_{BC}$, $V_\tau = V_{\tau_A} \cup V_{\tau_{BC}} \cup \{r\}$ and $E_\tau = E_{\tau_A} \cup E_{\tau_{BC}} \cup \{(r, \text{root of } \tau_A), (r, \text{root of } \tau_{BC})\}$. Stop.

Step 7: If $|V_A| < \frac{3-\sqrt{5}}{2}|V|$, keep the splitting done in Step 3. In other words, split $T'$ into three subtrees $T_A$, $T_B$ and $T_C$.

Step 8: Let $T_A$, $T_B$ and $T_C$ be $T$. Recursively apply this algorithm and obtain binary splitting trees $\tau_A = (V_A, V_{\tau_A}, E_{\tau_A})$, $\tau_B = (V_B, V_{\tau_B}, E_{\tau_B})$ and $\tau_C = (V_C, V_{\tau_C}, E_{\tau_C})$, respectively.

Step 9: Create a root $r$ and a subroot $r'$, and build a binary splitting tree $\tau$ rooted at $r$, as shown in Figure 3 (c). Precisely, $\tau = (V, V_\tau, E_\tau)$, where $V = V_A \cup V_B \cup V_C$, $V_\tau = V_{\tau_A} \cup V_{\tau_B} \cup V_{\tau_C} \cup \{r, r'\}$ and $E_\tau = E_{\tau_A} \cup E_{\tau_B} \cup E_{\tau_C} \cup \{(r, \text{root of } \tau_A), (r, r'), (r', \text{root of } \tau_B), (r', \text{root of } \tau_C)\}$. Stop.

Figure 2: Conversion from a 4-way tree to a 3-way tree. (a) A 4-way tree. (b) A 3-way tree.

Figure 3: The binary splitting tree.
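The splitting node used in Step 3 can be found with a standard centroid search, sketched below. This is a minimal illustration of Theorem 2 only: it ignores virtual nodes and the convention that the splitting node itself is placed in one of the subtrees, and the function name `find_splitting_node` and the edge list in the usage example (inferred from the subtrees named in the worked example for Figure 2) are our assumptions, not taken from the paper.

```python
from collections import defaultdict


def find_splitting_node(edges):
    """Return (v, sizes): a node v whose removal leaves components of size
    at most n/2, and the sorted sizes of those components.

    Assumes `edges` describes an unrooted tree with at least two nodes.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)

    # Iterative DFS from an arbitrary root to get a preorder and parents.
    root = next(iter(adj))
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                stack.append(w)

    # Subtree sizes, accumulated from the leaves upward.
    size = {u: 1 for u in adj}
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]

    for u in order:
        parts = [size[w] for w in adj[u] if w != parent[u]]
        if parent[u] is not None:
            parts.append(n - size[u])  # the component containing the parent
        if max(parts) <= n / 2:
            return u, sorted(parts, reverse=True)
    return None  # unreachable for a valid tree


if __name__ == "__main__":
    # Hypothetical edge list for a 4-way tree in which g has four neighbours.
    tree = [("a", "f"), ("b", "f"), ("f", "g"),
            ("g", "e"), ("g", "c"), ("g", "d")]
    print(find_splitting_node(tree))  # -> ('g', [3, 1, 1, 1])
```

Algorithm BST would then compare the largest component against $\frac{3-\sqrt{5}}{2}|V|$ to decide between the two-way split of Steps 4 to 6 and the three-way split of Steps 7 to 9.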

For example, Figure 2 (a) shows a 4-way tree, in which there are four nodes adjacent to node $g$. We first convert the tree into a 3-way tree, as shown in Figure 2 (b). Then we further split the 3-way tree from $g$ into three subtrees $T_A = \{a, b, f\}$, $T_B = \{g, e\}$ and $T_C = \{c, d, w\}$, as shown in Figure 4. Since $|V_A| \ge \frac{3-\sqrt{5}}{2}|V|$, the two smaller trees $T_B$ and $T_C$ are merged. Thus, the tree is finally split into two subtrees $T_A = \{a, b, f\}$ and $T_{BC} = \{c, d, e, g\}$, as shown in Figure 5 (a) and Figure 5 (b). The merging procedure done in Step 6 is shown in Figure 5 (c). Finally, by Algorithm BST, a binary splitting tree can be constructed, as shown in Figure 6.

Figure 4: Splitting a tree to three subtrees.

Figure 5: Splitting the tree from node $g$. (a) Subtree $T_A = \{a, b, f\}$. (b) Subtree $T_{BC} = \{c, d, e, g\}$. (c) The merging procedure.

Figure 6: A binary splitting tree constructed from the 4-way tree in Figure 2 (a).

Theorem 4 Given an unrooted tree of $n$ nodes, Algorithm BST constructs a binary splitting tree with height no more than $\lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$.

Proof: Given an unrooted tree $T = (V, E)$ with $|V| = n$, let $\pi(n)$ denote the number of levels required for splitting the corresponding 3-way tree $T'$. By Theorem 3, we can split $T'$ into three subtrees $T_A = (V_A, V'_A, E'_A)$, $T_B = (V_B, V'_B, E'_B)$ and $T_C = (V_C, V'_C, E'_C)$ such that $|V_C| \le |V_B| \le |V_A| \le \frac{1}{2}|V|$. The possible relations between $|V_A|$ and $|V| = n$ are as follows. It is assumed that $x$ is an unknown constant.

Case 1: $nx \le |V_A| \le \frac{1}{2}n$. We combine $T_B$ with $T_C$ to get $T_{BC} = (V_{BC}, E_{BC})$. It is clear that $|V_{BC}| \ge |V_A|$ and $|V_{BC}| = n - |V_A| \le (1-x)n$. So we split $T'$ into $T_A$ and $T_{BC}$ with one level. At the next recursion level, the number of leaf nodes is reduced from $n$ at the current level to no more than $(1-x)n$.

Case 2: $\frac{1}{3}n \le |V_A| < nx$. Because $|V_B| \le |V_A|$ and $|V_C| \le |V_A|$, it
is obvious that $|V_B| < nx$ and $|V_C| < nx$. Thus, at the first level, we split $T'$ into $T_A$ and $T_B \cup T_C$. At the next level, we split $T_B \cup T_C$ into $T_B$ and $T_C$. Hence, the number of leaf nodes is reduced from $n$ to no more than $nx$ with two levels.

By Case 1 and Case 2, $\pi(n) \le \max\{\pi((1-x)n) + 1, \pi(xn) + 2\}$. We claim that if $x = \frac{3-\sqrt{5}}{2}$, then $\pi(n) \le \lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$. We prove this claim by induction. It is clear that $\pi(1) = 0$. By the induction hypothesis, suppose $\pi(k) \le \lceil \log_\alpha k \rceil$ for $k < n$. Let $x = \frac{3-\sqrt{5}}{2}$. It is clear that $\alpha = \frac{1}{1-x}$ and $\alpha^2 = \frac{1}{x}$. We have

$$\pi(n) \le \max\{\pi((1-x)n) + 1, \pi(xn) + 2\}$$
$$\le \max\{\lceil \log_\alpha ((1-x)n) \rceil + 1, \lceil \log_\alpha (xn) \rceil + 2\}$$
$$= \max\{\lceil \log_\alpha n + \log_\alpha (1-x) \rceil + 1, \lceil \log_\alpha n + \log_\alpha x \rceil + 2\}$$
$$= \max\{\lceil \log_\alpha n \rceil, \lceil \log_\alpha n \rceil\} = \lceil \log_\alpha n \rceil.$$

Thus, the proof is complete.

4 An Approximation Algorithm for ΔMUT

In this section, we propose an approximation algorithm for solving the ΔMUT problem. Our approximation algorithm uses the minimum spanning tree as the backbone to construct a rooted ultrametric tree under the minimum tree size scoring function with error ratio $\varepsilon = \lceil \log_\alpha n \rceil + 1$, where $\alpha = \frac{\sqrt{5}+1}{2}$.

Algorithm APP-ULTRA (Approximate Ultrametric Tree)

Input: An $n \times n$ metric distance matrix $M$.

Output: An ultrametric tree under the minimum tree size scoring function with error ratio $\varepsilon = \lceil \log_\alpha n \rceil + 1$, where $\alpha = \frac{\sqrt{5}+1}{2}$.

Step 1: Find the minimum spanning tree (MST) $T$ for the distance matrix $M$.

Step 2: Apply Algorithm BST to construct a binary splitting tree $B$ with input $T$.

Step 3: Given tree topology $B$, solve the MUTT problem [19] to construct a weighted evolutionary tree. The tree is the output.

We use the MST (minimum spanning tree) to prove the error ratio of our algorithm. Before giving the following lemma and theorem, we need to define some notation. Given a metric distance matrix $M$, we use the following notation:

$OPT_{MUT}$: the total weight of the optimal solution of the minimum ultrametric tree problem.

$APP_{MUT}$: the total weight of the approximation solution obtained from Algorithm APP-ULTRA.

$OPT_{MST}$: the total weight of the optimal solution of the minimum spanning tree problem.

$OPT_{TSP}$: the total weight of the optimal solution of the traveling salesperson problem.

Lemma 1 $OPT_{MST} \le OPT_{TSP} \le 2\,OPT_{MUT}$.

Proof: $OPT_{MST} \le OPT_{TSP}$ is a clear fact, and $OPT_{TSP} \le 2\,OPT_{MUT}$ has also been proved [8, 19].

Theorem 5 Given an $n \times n$ metric distance matrix, Algorithm APP-ULTRA builds an ultrametric tree and $APP_{MUT} \le (\lceil \log_\alpha n \rceil + 1)\,OPT_{MUT}$, where $\alpha = \frac{\sqrt{5}+1}{2}$.

Figure 7: The ultrametric tree with APP-ULTRA. (The root is $v_{1,1}$; node $v_{i,j}$ on level $i$ is joined to its two children by edges $e_{i,2j-1}$ and $e_{i,2j}$, down to level $k$.)

Proof: The labels of nodes and edges in the ultrametric tree constructed by Algorithm APP-ULTRA are shown in Figure 7. The cost of edge $e_{i,j}$ is denoted as $c_{i,j}$, and the height of node $v_{i,j}$, which is the length from $v_{i,j}$ to any leaf node in the subtree rooted at $v_{i,j}$, is denoted as $\mathrm{Height}(v_{i,j})$. Let $k$ denote the number of levels in the tree. By Theorem 4, $k \le \lceil \log_\alpha n \rceil$, where $\alpha = \frac{\sqrt{5}+1}{2}$. Then we have the following inequalities.

$$\mathrm{Height}(v_{1,1}) = \sum_{i=1}^{k} c_{i,2^i} \le \frac{1}{2} OPT_{MST} \qquad (1)$$

$$\sum_{j=1}^{2} \mathrm{Height}(v_{2,j}) = \sum_{j=1}^{2} \Bigl( c_{2,2j} + \sum_{i=3}^{k} c_{i,2^{i-1}j} \Bigr) \le \frac{1}{2} OPT_{MST} \qquad (2)$$

$$\sum_{j=1}^{4} \mathrm{Height}(v_{3,j}) = \sum_{j=1}^{4} \Bigl( c_{3,2j} + \sum_{i=4}^{k} c_{i,2^{i-2}j} \Bigr) \le \frac{1}{2} OPT_{MST}$$

$$\vdots$$

$$\sum_{j=1}^{2^{m-1}} \mathrm{Height}(v_{m,j}) = \sum_{j=1}^{2^{m-1}} \Bigl( c_{m,2j} + \sum_{i=m+1}^{k} c_{i,2^{i-m+1}j} \Bigr) \le \frac{1}{2} OPT_{MST}$$

$$\vdots$$

$$\sum_{j=1}^{2^{k-1}} \mathrm{Height}(v_{k,j}) = \sum_{j=1}^{2^{k-1}} c_{k,2j} \le \frac{1}{2} OPT_{MST}$$

Let $S_{i,j}$ denote the set of leaf nodes in the subtree rooted at $v_{i,j}$, and let $OPT_{MST_{i,j}}$ denote the total weight of the minimum spanning tree for $S_{i,j}$. We have $\mathrm{Height}(v_{i,j}) = \frac{1}{2} \max_{s_1, s_2 \in S_{i,j}} d(s_1, s_2) \le \frac{1}{2} OPT_{MST_{i,j}}$. Thus, Equation 1 holds. Because the ultrametric tree is a binary splitting tree (Definition 9), $S_{2,1}$ and $S_{2,2}$ are constructed from two subtrees without edge repetition. Hence $OPT_{MST_{2,1}} + OPT_{MST_{2,2}} \le OPT_{MST}$, so $\mathrm{Height}(v_{2,1}) + \mathrm{Height}(v_{2,2}) \le \frac{1}{2} OPT_{MST}$ in Equation 2. The remaining levels follow in the same way.

Summing the inequalities over the $k$ levels, and counting $\mathrm{Height}(v_{1,1})$ once more for the root, we obtain

$$APP_{MUT} \le \frac{k+1}{2}\,OPT_{MST} \le (k+1)\,OPT_{MUT} \le (\lceil \log_\alpha n \rceil + 1)\,OPT_{MUT}.$$

Thus, the approximation algorithm has error ratio $\varepsilon = \lceil \log_\alpha n \rceil + 1$, where $\alpha = \frac{\sqrt{5}+1}{2}$.

For the ΔMUT, there is a previous approximation algorithm [19] with error ratio $1.5(\lceil \log n \rceil + 1)$. Our Algorithm APP-ULTRA, with error ratio $\lceil \log_\alpha n \rceil + 1 \approx 1.44 \log n + 1$, where $\alpha = \frac{\sqrt{5}+1}{2}$, has a better approximation ratio.

5 A Heuristic Algorithm for MUT

For an evolutionary tree with $n$ different leaves, the order of the leaves from left to right is called the leaf node circular order [8]. For $n$ different leaves, there are $N(n) = 3 \cdot 5 \cdots (2n-5) = \prod_{k=1}^{n-3} (2k+1)$ different unweighted unrooted evolutionary trees [8], and there are $N(n) = (2n-3) \prod_{k=1}^{n-3} (2k+1)$ unweighted rooted evolutionary trees [9]. However, given a leaf node circular order, only $\frac{(2(n-2))!}{(n-2)!\,(n-1)!}$ different unweighted unrooted evolutionary trees can be built with that order, and $(2n-3)\frac{(2(n-2))!}{(n-2)!\,(n-1)!}$ different unweighted rooted evolutionary trees can be built. Some possible numbers are shown in Table 1.
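The closed forms quoted in this paragraph are easy to evaluate directly. The sketch below does so; the function names are ours, and the formulas are our reading of the expressions in the text, namely $(2n-3)\prod_{k=1}^{n-3}(2k+1)$ for rooted trees and $(2n-3)\frac{(2(n-2))!}{(n-2)!\,(n-1)!}$ for rooted trees under a fixed circular order.

```python
from math import factorial, prod


def unrooted_trees(n: int) -> int:
    """3 * 5 * ... * (2n - 5): unweighted unrooted trees on n leaves."""
    return prod(2 * k + 1 for k in range(1, n - 2))  # k = 1 .. n-3


def rooted_trees(n: int) -> int:
    """(2n - 3) times the unrooted count."""
    return (2 * n - 3) * unrooted_trees(n)


def unrooted_trees_with_order(n: int) -> int:
    """(2(n-2))! / ((n-2)! (n-1)!): unrooted trees for a fixed circular order."""
    return factorial(2 * (n - 2)) // (factorial(n - 2) * factorial(n - 1))


def rooted_trees_with_order(n: int) -> int:
    """(2n - 3) times the fixed-order unrooted count."""
    return (2 * n - 3) * unrooted_trees_with_order(n)


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, rooted_trees(n), rooted_trees_with_order(n))
```

For $n = 4$, $10$ and $20$ these functions return 15, 34459425 and roughly $8.2 \times 10^{21}$ without a circular order, and 10, 24310 and 17672631900 with one, which shows how strongly a fixed circular order shrinks the search space.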

The number of unweighted rooted evolutionary trees without any circular order grows very fast, so it is very difficult to solve the evolutionary tree optimization problem. In this section, we first propose a heuristic algorithm to get a good leaf node circular order. Then, we present a dynamic programming algorithm to construct the optimal ultrametric tree under a certain circular order. The concatenation of these two phases is our heuristic algorithm for solving the MUT (minimum ultrametric tree) problem. Our heuristic algorithm to obtain a leaf node circular order for a given $n \times n$ distance matrix is as follows.

Algorithm Circular-Order

Input: A set $S$ of $n$ species and its distance matrix $M$.

Output: A node circular order $\langle v_1, v_2, \ldots, v_n \rangle$.

Step 1: Create a new virtual node $v_0$ and set $d^M_{v_0 v_i} = \infty$ for all $v_i \in S$.

Step 2: Find $d_{uv} = \max_{i,j \in S} \{d^M_{ij}\}$. Set $L = \langle v_0, v_1, v_2, v_3 \rangle$, where $v_3 = v_0$, $v_1 = u$ and $v_2 = v$. Set $k = 2$. Remove $u$ and $v$ from $S$.

Step 3: Find a node $w \in S$ such that $d^M_{w v_i} = \min_{x \in S, v_j \in L} \{d^M_{x v_j}\}$.

Step 4: If $d^M_{w v_{i-1}} < d^M_{w v_{i+1}}$, insert $w$ prior to $v_i$ into $L$, that is, $L = \langle v_0, v_1, \ldots, v_{i-1}, w, v_i, \ldots, v_k, v_{k+1} \rangle$; otherwise, insert $w$ posterior to $v_i$, that is, $L = \langle v_0, v_1, \ldots, v_i, w, v_{i+1}, \ldots, v_k, v_{k+1} \rangle$. Reindex $L$ as $\langle v_0, v_1, \ldots, v_i, \ldots, v_{k+2} \rangle$. Set $k = k + 1$. Remove $w$ from $S$.

Step 5: Repeat Step 3 and Step 4 until $S$ becomes empty.

Step 6: Delete $v_0$ and $v_{n+1}$ from $L$, and obtain $L = \langle v_1, v_2, \ldots, v_n \rangle$.

In the following, we propose a dynamic programming method to construct the optimal ultrametric tree for a certain fixed leaf node circular order. Our algorithm works with the minimum tree size and $L_1$-min increment scoring functions, and its time complexity is $O(n^3)$. The algorithm is as follows.

Algorithm OPT-ULTRA

Input: An $n \times n$ distance matrix $M$ with its node circular order $\langle v_1, v_2, \ldots, v_n \rangle$.

Output: An optimal ultrametric tree $T$ with respect to the node circular order.

Step 1: Set a sequence $S = \langle v_1, v_2, \ldots, v_n, v_1, v_2, \ldots, v_{n-1} \rangle$. Reindex $S$ as $\langle u_1, u_2, \ldots, u_n, u_{n+1}, \ldots, u_{2n-1} \rangle$.

Step 2: Set $Opt_{i,i} = 0$, $1 \le i \le 2n-1$. Set $Opt_{i,i+1}$ = distance of $u_i$ and $u_{i+1}$, where $1 \le i \le 2n-2$.

Step 3: Compute $Opt_{i,j} = \min_{i \le k \le j-1} \{Opt_{i,k} + Opt_{k+1,j} + f(i, j, k)\}$ for $1 \le i < j \le 2n-1$ and $2 \le j - i \le n-1$, where $f(i, j, k)$ is a predefined scoring function.

Step 4: Find the minimum of $Opt_{i,j}$, where $j - i = n - 1$, as the cost of the optimal ultrametric tree.

Step 5: Construct the ultrametric tree with the information recorded when we determine the value of $Opt_{i,j}$.

In the following, we show how to calculate the predefined scoring function $f(p, q, k)$. Based on the measurement of the minimum ultrametric tree size, the scoring function $f$ in the above algorithm can be defined as follows:

$$f(p, q, k) = \max_{p \le i,j \le q} \{d^M_{ij}\} - \frac{1}{2}\Bigl( \max_{p \le i,j \le k} \{d^M_{ij}\} + \max_{k+1 \le i,j \le q} \{d^M_{ij}\} \Bigr)$$

Figure 8: $f(p, q, k)$ for the minimum tree size scoring function. (The figure shows a root $r$ joined by edge $e_L$ to the subtree $Opt_{p,k}$ over $\{u_p, \ldots, u_k\}$ and by edge $e_R$ to the subtree $Opt_{k+1,q}$ over $\{u_{k+1}, \ldots, u_q\}$, with $f(p, q, k) = e_L + e_R$.)
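The recurrence of Algorithm OPT-ULTRA can be sketched as follows for the minimum tree size scoring function. This is a minimal illustration under our reading of $f(p, q, k)$ as the range maximum minus half the sum of the two sub-range maxima; it returns only the optimal cost (Steps 1 to 4) and omits the tree reconstruction of Step 5. The function name `opt_ultra_min_size` and the 0-based indexing are our own choices.

```python
def opt_ultra_min_size(dist, order):
    """Cost of the best ultrametric tree for a fixed circular order.

    dist  -- symmetric matrix (list of lists) of pairwise distances
    order -- list of species indices giving the leaf node circular order
    """
    n = len(order)
    u = order + order[:-1]            # Step 1: u_1 .. u_{2n-1}, 0-based here
    m = len(u)

    def d(a, b):
        return dist[u[a]][u[b]]

    # mx[p][q] = max distance among u_p .. u_q, built by the recurrence
    # max(p..q) = max(max(p..q-1), max(p+1..q), d(p, q)).
    mx = [[0.0] * m for _ in range(m)]
    for length in range(1, n):
        for p in range(m - length):
            q = p + length
            mx[p][q] = max(mx[p][q - 1], mx[p + 1][q], d(p, q))

    # Steps 2 and 3: opt[p][q] is the best cost for the leaves u_p .. u_q.
    opt = [[0.0] * m for _ in range(m)]
    for length in range(1, n):
        for p in range(m - length):
            q = p + length
            opt[p][q] = min(
                opt[p][k] + opt[k + 1][q]
                + mx[p][q] - 0.5 * (mx[p][k] + mx[k + 1][q])   # f(p, q, k)
                for k in range(p, q)
            )

    # Step 4: best window covering all n species once.
    return min(opt[p][p + n - 1] for p in range(n))


if __name__ == "__main__":
    # Ultrametric toy matrix: species 0 and 1 are close, species 2 is far.
    M = [[0, 2, 4],
         [2, 0, 4],
         [4, 4, 0]]
    print(opt_ultra_min_size(M, [0, 1, 2]))  # -> 5.0
```

In the toy example the optimal tree joins species 0 and 1 at height 1 and attaches species 2 at height 2, giving edge weights 1 + 1 + 1 + 2 = 5. The table of range maxima is filled in $O(n^2)$ time, so each evaluation of $f$ is $O(1)$ and the whole computation is $O(n^3)$.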

Table 1: Number of possible ultrametric trees that can be built without and with a circular order.

n     without circular order               with circular order
4     15                                   10
5     105                                  98
10    34459425                             24310
20    8.2 × 10^21                          17672631900
n     $(2n-3)\prod_{k=1}^{n-3}(2k+1)$      $(2n-3)\frac{(2(n-2))!}{(n-2)!\,(n-1)!}$

In the scoring function for the minimum tree size, $d^M_{ij}$ denotes the distance between $u_i$ and $u_j$. In fact, $f(p, q, k)$ represents the cost of the edges from the root to the subroots of the left subtree $\{u_p, \ldots, u_k\}$ and the right subtree $\{u_{k+1}, \ldots, u_q\}$, as shown in Figure 8. Based on the measurement of the $L_1$-min increment, the scoring function $f$ in the above algorithm can be defined as follows:

$$f(p, q, k) = \sum_{p \le i \le k,\; k+1 \le j \le q} \Bigl( \max_{p \le i,j \le q} \{d^M_{ij}\} - d^M_{ij} \Bigr) = (q-k)(k-p+1) \max_{p \le i,j \le q} \{d^M_{ij}\} - \sum_{p \le i \le k,\; k+1 \le j \le q} d^M_{ij}$$

When the scoring function is the minimum ultrametric tree size or the $L_1$-min increment, $f(p, q, k)$ can be calculated in $O(1)$ time. Thus, the time complexity of Algorithm OPT-ULTRA is $O(n^3)$, since $\max_{p \le i,j \le q} \{d^M_{ij}\}$ and $\sum_{p \le i \le k,\; k+1 \le j \le q} d^M_{ij}$ can be precomputed by dynamic programming; in particular,

$$\max_{p \le i,j \le q} \{d^M_{ij}\} = \max\Bigl\{ \max_{p \le i,j \le q-1} \{d^M_{ij}\},\; \max_{p+1 \le i,j \le q} \{d^M_{ij}\},\; d^M_{pq} \Bigr\}$$

can be computed by dynamic programming in $O(n^2)$ time.

In addition, based on the measurement of the $L_k$-min increment, where $k \ge 2$, $f(p, q, k)$ can be defined similarly as follows:

$$f(p, q, k) = \sum_{p \le i \le k,\; k+1 \le j \le q} \Bigl( \max_{p \le i,j \le q} \{d^M_{ij}\} - d^M_{ij} \Bigr)^k$$

Here, $f(p, q, k)$ can be calculated in $O(n^2)$ time. Thus, when the scoring function is the $L_k$-min increment, where $k \ge 2$, Algorithm OPT-ULTRA requires $O(n^5)$ time. Figure 9 shows the computation dependence of subtrees. For example, $Opt_{1,4}$ needs the results of $Opt_{1,1} + Opt_{2,4}$, $Opt_{1,2} + Opt_{3,4}$ and $Opt_{1,3} + Opt_{4,4}$.

Figure 9: The dependence graph for the dynamic programming.

In the following, we present our heuristic algorithm to solve the ultrametric tree problem, which is the combination of Algorithm Circular-Order and Algorithm OPT-ULTRA.

Algorithm HEU-ULTRA

Input: An $n \times n$ distance matrix $M$ and the scoring function for the ultrametric tree problem.

Output: A good ultrametric tree $T$.

Step 1: Apply Algorithm Circular-Order on the input $M$ to construct a good leaf node circular order $L$.

Step 2: Apply Algorithm OPT-ULTRA on $M$ and $L$ to construct an ultrametric tree $T$.

Step 3: The tree $T$ is the solution of this algorithm.

6 Experiment Results

In this section, we show our experiment results. In our experiment, we use random data to test our heuristic algorithm (Circular-Order) and dynamic programming (OPT-ULTRA). Each entry in the distance matrix $M$ is between 2 and
100 and the number of test instances in each test set is 100. We compare our results with the UPGMM method [11]. In addition, we also use the combination of the leaf node circular order of UPGMM with our dynamic programming (UPGMM + OPT-ULTRA) for comparison.

In Table 2 and Table 3, we compare UPGMM, UPGMM + OPT-ULTRA and Circular-Order + OPT-ULTRA with the scoring functions minimum tree size and $L_1$-min increment, respectively. Each column represents one test set, and there are 100 test instances in each test set. Each entry represents the number of instances in which the method performs at least as well as the other two methods. If two or all three methods tie for the top performance, the count of each of them is increased by one. Thus, the total in each column may be greater than 100. For example, in Table 2, when the number of species is 10, the entry of UPGMM shows that the UPGMM method gets the top performance 8 times in 100 test instances. Because the three methods may get the same result, the sum of 8, 26 and 80 is greater than 100.

Table 2: Experiment results for the minimum tree size scoring function.

                                 # of species (n)
Method                           5     10    20    50    100
UPGMM                            0     8     24    9     3
UPGMM + OPT-ULTRA                16    26    90    100   100
Circular-Order + OPT-ULTRA       100   80    10    0     0

Table 3: Experiment results for the $L_1$-min increment scoring function.

                                 # of species (n)
Method                           5     10    20    50    100
UPGMM                            16    5     3     0     0
UPGMM + OPT-ULTRA                16    23    52    98    100
Circular-Order + OPT-ULTRA       100   85    48    3     0

In Table 2 and Table 3, we can see that UPGMM + OPT-ULTRA performs better than pure UPGMM. We conclude that combining UPGMM with our OPT-ULTRA gives a significant improvement. Since UPGMM is based on the minimum tree size scoring function, and Algorithm Circular-Order (our method) is based on neither the minimum tree size nor the $L_1$-min increment scoring function, Table 2 shows that Circular-Order + OPT-ULTRA performs worse than pure UPGMM when $n$ is large. In Table 3, by contrast, Circular-Order + OPT-ULTRA performs at least as well as pure UPGMM for every $n$. In both tables, when $n$ is small, Circular-Order + OPT-ULTRA performs better than the other methods. Thus, we conclude that when the number of species is small, our method obtains a better leaf node circular order. When the number of species becomes larger, UPGMM obtains a better leaf node circular order. Our OPT-ULTRA method is very effective at improving other methods.

7 Conclusion

In this paper, we propose an approximation algorithm, APP-ULTRA, with error ratio $\lceil \log_\alpha n \rceil + 1 \approx 1.44 \log n + 1$, where $\alpha = \frac{\sqrt{5}+1}{2}$, for solving the ΔMUT problem. Our proof of the error ratio is based on the MST (minimum spanning tree) and the BST (binary splitting tree). We define the BST problem and design an $O(n^3)$ algorithm to solve it. The algorithm produces a binary splitting tree with height no more than $\lceil \log_\alpha n \rceil$, where $n$ is the number of leaf nodes. Besides, we also propose a heuristic algorithm, Algorithm Circular-Order, to construct a leaf node circular order for a given $n \times n$ distance matrix, where $n$ is the number of species. We also design a dynamic programming algorithm, Algorithm OPT-ULTRA, to solve the optimal ultrametric tree problem under a certain leaf node circular order. Algorithm OPT-ULTRA works with the minimum tree size and $L_1$-min increment scoring functions, and the time complexity of the dynamic programming is $O(n^3)$. Our experiment results show that Algorithm OPT-ULTRA significantly improves UPGMM.

References

[1] R. Agarwala, D. Fernandez-Baca, and G. Slutzki, "Fast algorithms for inferring evolutionary trees," In Proceedings of the 30th Allerton Conference on Comm., Control, and Comput., pp. 594–603, 1992.

[2] W. H. E. Day, "Computational complexity of inferring phylogenies by dissimilarity matrices," Bulletin of Mathematical Biology, Vol. 49, No. 4, pp. 461–467, 1987.

[3] W. H. E. Day and D. Sankoff, "Computational complexity of inferring phylogenies by compatibility," Systematic Zoology, Vol. 35, No. 2, pp. 224–229, 1986.

[4] M. Farach and J. Cohen, "Numerical taxonomy on data: Experimental results," ACM-SIAM Symposium on Discrete Algorithms, 1997.

[5] M. Farach, T. Przytycka, and M. Thorup, "On the agreement of many trees," Information Processing Letters, Vol. 55, pp. 297–301, 1995.

[6] M. Farach, S. Kannan, and T. Warnow, "A robust model for finding optimal evolutionary trees," Algorithmica, Vol. 13, No. 1/2, pp. 155–179, 1995.

[7] M. Farach and M. Thorup, "Fast comparison of evolutionary trees," In Proc. 5th ACM-SIAM Symp. on Discrete Algorithms, pp. 481–488, 1994.

[8] C. Korostensky and G. H. Gonnet, "Using traveling salesman problem algorithms for evolutionary tree construction," Bioinformatics, Vol. 16, No. 7, pp. 619–627, 2000.

[9] R. C. T. Lee, "Computational biology." http://www.csie.ncnu.edu.tw/˜rctlee/biology.html, Department of Computer Science and Information Engineering, National Chi-Nan University, 2001.

[10] M. Li, J. H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang, "An information based sequence distance and its application to whole mitochondrial genome phylogeny," Bioinformatics, Vol. 17, No. 2, pp. 149–154, 2001.

[11] W. H. Li and D. Graur, Fundamentals of molecular evolution. MA: Sinauer Associates, 1991.

[12] W. J. Masek and M. S. Paterson, "How to compute string-edit distances quickly," Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison (D. Sankoff and J. Kruskal, eds.), Addison-Wesley Reading, 1983.

[13] N. Saitou and M. Nei, "The neighbor-joining method: a new method for reconstructing phylogenetic trees," Molecular Biology and Evolution, Vol. 4, pp. 406–424, 1987.

[14] P. H. Sellers, "On the theory and computation of evolutionary distances," SIAM Journal of Applied Mathematics, Vol. 26, pp. 787–793, 1974.

[15] D. L. Swofford and G. J. Olsen, "Phylogeny reconstruction," Molecular Systematics (D. M. Hillis and C. Moritz, eds.), pp. 411–501, Sinauer Associates, 1990.

[16] H. T. Wareham, "On the computational complexity of inferring evolutionary trees," Tech. Rep. 9301, Department of Computer Science, Memorial University of Newfoundland, 1993. Available by anonymous ftp from ftp.cs.mun.ca in directory pub/techreports.

[17] M. Waterman, T. Smith, M. Singh, and W. Beyer, "Additive evolutionary trees," Journal of Theoretical Biology, Vol. 64, pp. 199–213, 1977.

[18] R. Wong, "Worst-case analysis of network design problem heuristics," SIAM J. Algebraic Discrete Methods, Vol. 1, pp. 51–63, 1980.

[19] B. Y. Wu, K. M. Chao, and C. Y. Tang, "Approximation and exact algorithms for constructing minimum ultrametric trees from distance matrices," Journal of Combinatorial Optimization, Vol. 3, pp. 199–211, 1999.
