
Chung Hua University Doctoral Dissertation

Title: Design Parallel Algorithms for Ultrametric Tree Construction, Chemical Compound Inference,

and Frequent Pattern Mining on Multiple Computing Unit System

(Chinese title: 針對等距演化樹、化合物推論以及頻繁項目集探勘問題於多計算單元系統設計平行演算法)

Department: Ph.D. Program in Engineering Science   Student: D09424006 Jiayi Zhou (周嘉奕)   Advisor: Prof. Kun-Ming Yu (游坤明)

January 2010


Design Parallel Algorithms for Ultrametric Tree Construction, Chemical Compound Inference, and Frequent Pattern Mining on

Multiple Computing Unit System

by Jiayi Zhou

Advisor: Prof. Kun-Ming Yu

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in

Ph.D. Program in Engineering Science at Chung Hua University, Taiwan, R.O.C.

January 2010


Abstract

With the rapid development of science in many fields, large amounts of information need to be processed and analyzed. Although computers are advancing quickly, computing power cannot always keep up with the growth of data. Connecting multiple computing units is therefore a good strategy for obtaining higher performance at lower cost. In this dissertation, we address three important problems in bioinformatics, cheminformatics, and data mining: ultrametric tree construction, chemical compound inference, and frequent pattern mining. Ultrametric trees help biologists observe the relationships among species and are also used in orthologous-domain classification and multiple sequence alignment. Chemical compound inference produces compounds that satisfy the same given constraints; its applications include reconstructing molecular structures and classifying compounds, and it may be useful for drug design. Frequent pattern mining is a fundamental step in association rule mining, time series mining, classification, and related tasks. The computational demand of the above topics grows rapidly with the amount of data. We therefore use parallel and distributed strategies to solve these problems in this dissertation, and we address a critical issue in parallel computing: load balancing. For each topic, we propose corresponding load balancing facilities. To verify the performance of the proposed algorithms, we implemented each of them. The experimental results show that the proposed algorithms reduce computation time on multiple computing unit systems and achieve good speed-up ratios.

Keywords: Parallel and distributed algorithm, Load balancing, Minimum ultrametric tree, Chemical compound inference, Frequent pattern mining.


摘要

With the rapid development of scientific research in different fields, more and more data need to be processed and analyzed quickly. Although computer science advances rapidly, the amount of data to be computed and analyzed also grows very fast, and the demand for high-performance computing arises accordingly. High-performance computing has many different architectures; the current trend is to connect multiple computing units into one system through a system bus, a high-speed network, or the Internet to provide powerful computing capability. In this dissertation, we design parallel algorithms for three important problems in bioinformatics, cheminformatics, and data mining, so that high-performance systems can be used to speed up their solution. The three problems are ultrametric tree construction, chemical compound inference, and frequent pattern mining. Ultrametric trees help biologists observe the relationships among species and can also serve as guide trees for multiple sequence alignment. Chemical compound inference produces candidate compounds with the same given characteristics; its applications include reconstructing molecular structures, classifying compounds, and drug design. Frequent pattern mining is a fundamental and important problem in data mining, with applications in association rules, time series, classification, and so on. As the amount of data grows, the computing power required by the above topics increases very rapidly. Therefore, in this dissertation we use parallel and distributed computing strategies to solve these problems. In addition, for each problem we consider load balancing, a critical issue in parallel systems, and design load balancing mechanisms according to the characteristics of the problem. To verify the performance and the results of the programs, we designed and implemented the parallel programs described above and evaluated them on cluster, grid, and multi-core systems. The experimental results show that the proposed solutions effectively exploit the computing power of multiple computing unit systems to reduce the required computation time and achieve satisfactory speed-up ratios.

Keywords: parallel and distributed algorithms, load balancing, minimum ultrametric tree, chemical compound inference, frequent pattern mining


Acknowledgments

To be able to complete this dissertation, I would first of all like to thank my advisor, Prof. Kun-Ming Yu. He gave me helpful suggestions in many ways, and I appreciate his guidance, advice, support, and encouragement. Working with Prof. Yu has been a great learning experience. I would also like to thank the members of my Ph.D. oral committee, Professors Chuan Yi Tang, Tzung-Pei Hong, Tong-Ying Juang, and Ching-Hsien Hsu, for their very useful comments.

I wish to thank Prof. Chuan Yi Tang for his guidance in many research directions. I am grateful to Prof. Chun-Yuan Lin, who spent a lot of time discussing with me and gave me many invaluable suggestions. Prof. Wen Ouyang helped me at various points with paper writing and presentation skills. I am also grateful to my friends, particularly Shih-Chung Chen, Bin-Chang Wu, and Hui-Yuan Wang, for making my stay at Chung Hua University during my graduate studies enjoyable. Finally, I would like to thank my parents, brother, and sister for their love, encouragement, and support. I am also grateful to my most sincere love, Yanru Luo, whose encouragement kept me working. I dedicate this dissertation to all of them.


Table of Contents

Abstract ... ii

摘要 ... iii

Acknowledgments... iv

Table of Contents ... v

List of Figures ... viii

List of Tables ... x

Chapter 1 Introduction ... 1

1.1 Organization of the Dissertation ... 3

Chapter 2 Parallel Processing ... 4

2.1 Classes of Parallel Computers ... 4

2.1.1 Multi-core Computing ... 5

2.1.2 Cluster Computing ... 5

2.1.3 Grid Computing... 5

2.2 Parallel Programming APIs ... 6

2.2.1 Multi-Process ... 6

2.2.2 MPI ... 7

2.3 Summary ... 7

Chapter 3 Parallel Algorithm for Constructing Minimum Ultrametric Trees ... 9

3.1 Introduction ... 9


3.2 Related Work ... 12

3.3 Problem Definition ... 14

3.4 Sequential Branch-and-Bound Algorithm ... 16

3.5 Parallel Branch-and-Bound Algorithm (PBBU) ... 18

3.6 Experimental Results... 22

3.7 Summary ... 33

Chapter 4 Parallel Branch-and-Bound Algorithm for Chemical Compound Inference Problem ... 35

4.1 Introduction ... 35

4.2 Problem Definition ... 37

4.3 BMPBB-CCI ... 40

4.4 Experimental Results... 43

4.5 Summary ... 48

Chapter 5 Parallel FP-tree Algorithm for Frequent Pattern Mining Problem ... 50

5.1 Introduction ... 50

5.2 Related Work ... 52

5.2.1 Frequent Pattern Growth (FP-growth) ... 52

5.2.2 Parallel FP-tree Algorithm ... 53

5.3 Problem Definition ... 55

5.4 Tidset-based Parallel FP-tree (TPFP-tree) ... 56

5.5 Balanced Tidset Parallel FP-tree (BTP-tree) algorithm for Grid computing ... 62

5.6 Experimental Results... 65


5.6.1 IBM’s Quest Synthetic Data Generator ... 65

5.6.2 TPFP-tree on PC Cluster ... 65

5.6.3 BTP-tree on Multi-cluster Grid ... 69

5.7 Summary ... 75

Chapter 6 Conclusions and Future Work ... 76

References ... 79


List of Figures

Figure 3-1: An Example of BBT ... 17

Figure 3-2: The Execution Time of Five Instances of the HMDNA and Random Data Sets .. 24

Figure 3-3: Execution Time of the PBBU for the HMDNA Data Set ... 26

Figure 3-4: Execution Time of the PBBU for the Random Data Set ... 27

Figure 3-5: Speed-up Ratio of the PBBU for the HMDNA Data Set ... 29

Figure 3-6: Speed-up Ratio of the PBBU for the Random Data Set ... 30

Figure 3-7: Results of the PBBU for the HMDNA Data Set ... 32

Figure 3-8: Results of the PBBU for the BT7 Data Set ... 33

Figure 4-1: An Example of ( ,val)-labeled Multitree T ... 38

Figure 4-2: Multi-process Framework of BMPBB-CCI ... 41

Figure 4-3: Computation Time of Different Number of Processes ... 46

Figure 4-4: Speed-up Ratio of Different Entries ... 46

Figure 4-5: The Cpd-Ori and Cpd-Reb Docked Pose Into NA Protein (PDB code: 2hu4) ... 48

Figure 5-1: Example of DB Partitioning Into 4 Processors with the Given Threshold ξ ... 61

Figure 5-2: Example of the Exchange Stage of 4 Processors ... 62

Figure 5-3: Execution time (TPFP vs. PFP), Threshold = 0.0005 ... 67

Figure 5-4: Execution Time (TPFP vs. PFP), Threshold = 0.0001 ... 67

Figure 5-5: Execution Time of Various Thresholds ... 68

Figure 5-6: Execution Time of Different Processors (d50k) ... 70

Figure 5-7: Execution Time of Different Processors (d100k) ... 71

Figure 5-8: Execution Time of Different Threshold (d050k) ... 72

Figure 5-9: Execution Time of Different Threshold (d100k) ... 72

Figure 5-10: Execution Time of Each Processor (d050k) ... 73


List of Tables

Table 4-1: Properties of Selected Compounds ... 44

Table 4-2: Results for Test Compounds by the Pharmacophore Model ... 47

Table 5-1: Execution Time of Each Stage of the PFP-tree (t20.i04.d200k.n100k, Threshold = 0.0005) for the Different Processors ... 56

Table 5-2: Hardware and Software Specification of PC Cluster ... 66

Table 5-3: Statistical Characteristics of Datasets ... 66

Table 5-4: Speed-up Ratio (SU2) of TPFP and PFP ... 68

Table 5-5: Hardware and Software Specification of Multi-cluster Grid ... 69

Table 5-6: Execution Time of BTP, TPFP, and PFP (in seconds) ... 73

Table 6-1: Characteristics of MUT, CCI, and FPM ... 78


Chapter 1 Introduction

With the development of computer science, information processing and data analysis play an important role in various research areas, and the amount of information that must be handled keeps growing. According to Moore's law, which describes a long-term trend of computing hardware, the number of transistors doubles approximately every two years [57]. As the amount of data grows, the required computing power must grow as well, yet the price of high-end central processing units (CPUs) is very high. Connecting multiple computing units through a system bus, an interconnection network, or the Internet to obtain high-throughput computing power is therefore the current trend. Parallel computing communication architectures can be roughly classified into two categories: (1) shared-memory multiprocessors, e.g., multi-core processors and symmetric multiprocessing (SMP), and (2) distributed-memory multiprocessors, e.g., cluster computing and grid computing. A uniform memory address space, ease of programming, lower communication overhead, and hardware-controlled caching are the advantages of shared-memory multiprocessors; simpler hardware, higher scalability, explicit communication, and more computation and storage resources are the advantages of distributed-memory multiprocessors. The programming models have likewise progressed from POSIX threads and multiple processes to OpenMP, MPI, and others.

In the design of parallel algorithms, the most important issue is load balancing. A poor task allocation facility results in poor performance: if the workload cannot be dispatched properly to each participating computing unit, some computing units carry a heavy workload while others stay idle, and the same reasoning applies to network communication. The goal of load balancing is to maximize application performance by keeping processor idle time and inter-unit communication as low as possible. Load balancing can be discussed from two aspects: (1) static load balancing and (2) dynamic load balancing. Static load balancing fits applications with constant workloads and can be used for pre-processed task dispatching. Unpredictable tasks require dynamic facilities to balance and adjust the workload among computing units.

In this dissertation, we address how to deal with the large computing requirements of three important problems in bioinformatics, cheminformatics, and data mining with a parallel computing strategy: the ultrametric tree construction problem, the chemical compound inference problem, and the frequent pattern mining problem. Constructing an ultrametric tree (UT) is important for biologists. A UT can represent the evolutionary history of a set of species and help biologists determine the relationships among species. Wu et al. [70] proved that constructing a UT from a distance matrix is an NP-hard problem, and the execution time increases significantly as the number of species grows. Since it is worthwhile to construct a minimum UT for a middle-sized set of species, we propose a parallel branch-and-bound algorithm to solve this problem. We also address the load balancing issue with Global Pool and Local Pool facilities that balance the workload among computing units. In cheminformatics and bioinformatics, enumerating chemical compounds is another important problem. Enumerated compounds have the same path frequency, and applications include reconstructing a molecular structure from given signatures and classifying compounds. Many methods have been proposed for this problem, but most of them are heuristic algorithms that cannot guarantee an optimal solution. In this dissertation we propose a parallel branch-and-bound algorithm to solve it, again taking care of the load balancing issue as well as ring structures in compounds. Extracting frequent patterns from a transaction-oriented database is important in mining association rules, time series, classification, and related tasks. Mining strategies can be classified into (1) generate-and-test (Apriori-like) approaches and (2) pattern growth approaches (FP-growth).

Although FP-growth has better performance and can handle larger databases, its computation time still increases significantly as the database grows. For that reason, we use parallel and distributed techniques to overcome this problem. In this dissertation, a balanced parallel frequent-pattern mining algorithm is proposed to solve the frequent pattern mining problem with multiple computing units. The goal of the proposed algorithm is to reduce both communication and computation cost.

1.1 Organization of the Dissertation

The rest of this dissertation is organized as follows. Chapter 2 reviews parallel processing, including the classes of parallel computers and the programming APIs used in this work. In Chapter 3, we define the minimum ultrametric tree (MUT) construction problem and propose a parallel algorithm for it, the Parallel Branch-and-Bound Algorithm for MUT (PBBU); a Global Pool and a Local Pool are presented to address the load balancing issue, and the experimental results are also shown in that chapter. In Chapter 4, we propose a balanced parallel algorithm for the chemical compound inference (CCI) problem; the definition and related work of the CCI problem are given there, a Global Queue and a Local Queue are designed to balance the workload among multi-core computing units, and the experimental results are reported. In Chapter 5, the frequent pattern mining problem is defined, and two parallel algorithms are proposed to solve it on a PC cluster and on a grid computing system, again with load balancing taken into account; the experimental results are shown in that chapter. Chapter 6 concludes the dissertation and gives directions for future work in this area.


Chapter 2

Parallel Processing

Parallel processing is a technique for carrying out many pieces of work simultaneously. A large parallelizable problem can be divided into many sub-problems that are processed by different computing units. Several types of parallelism have been proposed: bit-level, instruction-level, data, and task parallelism. Since most algorithms were designed and written for a single computing unit, such non-parallelized methods cannot utilize the power of parallel computing; to employ multiple computing units, algorithms must be rewritten for the different forms of parallel computers. Moreover, concurrent programming languages, libraries, and APIs are important factors in developing parallel programs, and the available libraries and APIs can be classified into shared-memory programming and distributed programming. The classes of parallel computers and the APIs are described in the following sections.

2.1 Classes of Parallel Computers

Depending on the composition of the hardware, parallel computers can be divided into shared-memory computers and distributed-memory computers. A shared-memory computer has a single memory address space, and all processors see the same latency and bandwidth to every element of memory; multi-core computers and symmetric multiprocessor computers are typical shared-memory computers. A system in which each computer has its own local memory address space is called a distributed-memory computer; accesses to local memory are always faster than accesses to non-local memory. Compared with shared-memory computers, distributed-memory computers are highly scalable, since the computing units are connected by a network. Cluster and grid systems are well-known distributed-memory computers.

2.1.1 Multi-core Computing

Raising the clock rate to enhance processor performance also raises temperature and power consumption. Processor design therefore no longer pursues ever-higher clock rates and instead combines multiple computing cores into a single processor. Multi-core processors are now widely used, and their structure belongs to the shared-memory architecture. A multi-core processor can process multiple instructions per cycle from multiple instruction streams, and each core can potentially process multiple instructions through a superscalar pipeline.

2.1.2 Cluster Computing

A cluster computing system is a group of loosely coupled computers; it allows computing power, storage resources, and services to be shared among multiple standalone machines connected by a network. Beowulf is the most common type of cluster, built from multiple identical computers connected by a TCP/IP Ethernet network. The cluster has also been the dominant architecture among TOP500 supercomputers since 2005 [1]. The Message Passing Interface (MPI) or Parallel Virtual Machine (PVM) is typically used to let parallel programs communicate among computing nodes.

2.1.3 Grid Computing

Unlike conventional high-performance computing (e.g., cluster computing), in which nodes are connected by a high-speed network, grid computing nodes are connected by various kinds of networks (e.g., gigabit networks, the Internet). Since a grid connects computing devices with different software and hardware specifications at different geographic locations, middleware plays an important role in managing and integrating the resources of the participating nodes. Globus, Sun Grid Engine, and gLite are among the best-known grid middlewares; Globus in particular is a popular and widely used open-source grid middleware. There are many categories of grid computing systems, and the multi-cluster grid is the most popular and widely used type (e.g., EGEE). In a multi-cluster grid, resources are distributed across different networks. Each cluster can be a grid site, and each site has a grid head: jobs are dispatched to the grid head, which dispatches them to the computing nodes inside the cluster according to its job scheduling algorithm. The administrator issues a certificate to the grid head and permits it to manage the computing nodes inside the cluster. Moreover, when the cluster size changes, only the grid head needs to adjust its settings, instead of reconfiguring the entire grid system.

2.2 Parallel Programming APIs

Different programming languages, libraries, and APIs are needed to design and implement programs for different types of hardware architectures. Programming APIs can be roughly classified into three types according to the memory architecture they target: shared memory, distributed memory, or distributed shared memory. Multi-threading and OpenMP are the two most widely used APIs for shared memory, whereas MPI is the most commonly used message-passing API, especially for distributed-memory computers. In addition, to fully utilize the computing power of multi-core computers, multi-process programming is another practical method for multi-core computers.

2.2.1 Multi-Process

To make use of multiple computing units in a single machine, programs need to be implemented with parallel execution capabilities, and the multi-thread or multi-process programming model is typically used to develop such programs. The major difference between threads and processes is how memory is managed: threads can communicate with each other directly through shared memory objects, whereas inter-process communication (IPC) is used for communication between processes. The main benefit of multi-process programming is the protection between processes. Since each process relays data to others via some intermediary, such as the file system or the network stack, the program can be extended to execute on different computers connected via a network.
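As a concrete illustration of this model, the following minimal sketch (not code from the dissertation; the message text and buffer size are arbitrary choices) shows two processes created with fork() exchanging data through a POSIX pipe rather than a shared memory object:

#include <cstdio>
#include <cstring>
#include <sys/wait.h>
#include <unistd.h>

// Minimal multi-process sketch: the parent sends a message to a child
// process through a pipe, i.e., via an intermediary managed by the kernel
// rather than a shared memory object.
int main() {
    int fd[2];                        // fd[0]: read end, fd[1]: write end
    if (pipe(fd) != 0) { perror("pipe"); return 1; }

    pid_t pid = fork();               // create the second process
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                   // child: read the task description
        close(fd[1]);
        char buf[64] = {0};
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) printf("child received: %s\n", buf);
        close(fd[0]);
        return 0;
    }

    // parent: write a task description and wait for the child to finish
    close(fd[0]);
    const char *msg = "subproblem-1";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);
    wait(nullptr);
    return 0;
}

The same request/response pattern carries over to sockets when the cooperating processes run on different machines connected by a network.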

2.2.2 MPI

The Message Passing Interface (MPI) is a specification of APIs, defined by the MPI Forum, for passing messages among computing units. Several implementations of MPI have been proposed and developed. It is a language-independent interface for programming parallel computers such as cluster and grid systems. A virtual topology is built among the computing units, and MPI also provides synchronization and communication functionality among a set of processes. The MPI library functions fall into the following categories: point-to-point communication, communicator creation, collective operations, and derived data types. Besides Fortran, C, and C++ implementations, MPI also has bindings for Python, OCaml, Java, CLI (.NET), Ruby, and other languages.
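As an illustration of these categories, the minimal sketch below (illustrative only; the values exchanged are arbitrary) uses one collective call (MPI_Bcast) and point-to-point calls (MPI_Send/MPI_Recv) in a C++ program. Such a program is typically compiled with an MPI compiler wrapper such as mpic++ and launched with mpirun:

#include <mpi.h>
#include <cstdio>

// Minimal MPI sketch: rank 0 broadcasts a parameter to all processes
// (collective communication) and then receives one value from every
// other rank (point-to-point communication).
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int threshold = (rank == 0) ? 42 : 0;          // known only to rank 0
    MPI_Bcast(&threshold, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int src = 1; src < size; ++src) {
            int result = 0;
            MPI_Recv(&result, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 got %d from rank %d\n", result, src);
        }
    } else {
        int result = rank * threshold;             // a dummy local computation
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}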

2.3 Summary

Parallel processing is a good strategy for solving computation-intensive problems. Several forms of parallelism have been proposed to solve large-scale problems: bit-level parallelism, instruction-level parallelism, data parallelism, and task parallelism. Multi-core computers, SMP computers, cluster computers, and grid computers were designed around different memory communication mechanisms. In this dissertation, we use a data parallelism strategy to solve the MUT, CCI, and FPM problems. The parallel algorithms for MUT and FPM were designed for cluster and grid computers; since clusters and grids are distributed-memory systems, the MPI library was used to perform the communication among computing units. The algorithm for the CCI problem was designed for multi-core computers; to enhance extensibility, the multi-process programming technique was used to implement the proposed algorithm. For each proposed algorithm, we also designed and implemented load balancing strategies to balance the workload among the computing units.


Chapter 3

Parallel Algorithm for Constructing Minimum Ultrametric Trees

3.1 Introduction

The construction of evolutionary trees is important for computational biology, especially for the development of biological taxonomies. An evolutionary tree can be seen as a representation of the evolutionary history of a set of species and helps biologists observe existing species and determine their relationships in taxonomy. However, there are no real evolutionary histories (trees) available in practice, so many methods and models have been proposed that try to construct a meaningful evolutionary tree. The majority of these methods and models are based on one of two inputs: sequences or a distance matrix [52].

When the input is sequences, an evolutionary tree is usually constructed according to the results of multiple sequence alignment (MSA); however, obtaining an optimal solution for the MSA problem has been shown to be nondeterministic polynomial-time (NP)-hard [33]. When the input is a distance matrix, an evolutionary tree is constructed according to the given matrix, which is composed of user-defined values for every pair of species; generally, these values are calculated as edit distances between sequences from the two species. Many different models for the resulting algorithmic problems have been proposed [33, 53], but most optimization problems for evolutionary tree construction have been shown to be NP-hard [16-18, 23, 28-29, 48].


An important and commonly used model assumes that the rate of evolution is constant [33, 53] (the molecular clock hypothesis [75]). Under this assumption, the evolutionary tree is an ultrametric tree (UT), a rooted, leaf-labeled, and edge-weighted binary tree in which each internal node has the same path length to all leaves in its subtree. According to [36], it suffices to consider binary trees, since a general UT can easily be converted to a binary tree without changing the distances between leaves. More examples of UTs can be found in [8, 15, 23, 48]. Because many of these problems are intractable and NP-hard, biologists usually construct the trees with heuristic algorithms. The Unweighted Pair Group Method with Arithmetic mean (UPGMA) [53] is one of the most popular heuristic algorithms [38] for constructing UTs. Although the UT constructed by UPGMA is often not the true tree unless the molecular clock assumption holds, it is still useful for clocklike data and, as such, has been compared with other methods [20, 61]. Moreover, the UT has been successfully applied to other problems, such as orthologous-domain classification [65] and multiple sequence alignment (CLUSTAL [37], MUSCLE [21], M-Coffee [68]).

Depending on the scoring scheme of the alignment, the distance matrix either follows the triangle inequality or not; it is then called a metric or a non-metric distance matrix, respectively. However, constructing minimum ultrametric trees (MUTs) (the principle of minimum evolution [63]), even from a metric distance matrix, has been shown to be NP-hard [36, 70]. Although the MUT construction problem is NP-hard, it is still worth constructing MUTs for a middle-sized set of species; in practice, the number of species studied by biologists at one time is rarely very large. It might seem possible to find MUTs by exhaustively checking all possible trees; however, for n species the number of UTs is A(n) = 1 × 3 × 5 × ... × (2n - 3) [36]. The function A therefore grows very rapidly: for example, A(10) > 10^7, A(20) > 10^21, and A(30) > 10^37. Hence, it is virtually impossible to exhaustively search all possible trees even when n is middle-sized.
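The growth of A(n) can be checked directly; the short sketch below (added for illustration, computing the product in double precision) reproduces the orders of magnitude quoted above:

#include <cstdio>

// A(n) = 1 * 3 * 5 * ... * (2n - 3): the number of ultrametric tree
// topologies for n species (computed in double precision, so the values
// are approximate for large n).
static double A(int n) {
    double a = 1.0;
    for (int k = 3; k <= 2 * n - 3; k += 2) a *= k;
    return a;
}

int main() {
    for (int n : {10, 20, 30})
        printf("A(%d) ~ %.3e\n", n, A(n));   // ~3.4e7, ~8.2e21, ~5.0e38
    return 0;
}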


The branch-and-bound algorithm is a well-known technique for avoiding exhaustive searching. A partition algorithm decomposes a problem into smaller sub-problems, which are repeatedly decomposed until infeasibility is proved or a solution is found [73]. Theoretically, branch-and-bound cannot guarantee polynomial time complexity; nevertheless, it has been used to solve NP-hard problems such as Traveling Salesman, Knapsack, Vertex Cover, and Integer Programming [73], and branch-and-bound algorithms often find near-optimal solutions as well as optimal ones. A sequential branch-and-bound algorithm was presented by Hendy and Penny [36] to construct a minimum evolutionary tree of 11 species. Wu et al. [70] proposed an efficient sequential branch-and-bound algorithm (BBU) to construct MUTs from a metric distance matrix; an optimal solution for 25 species was found in a reasonable time by the BBU. Chen and Chang [12] used a look-ahead approach to compute a tighter lower bound for each sub-problem and tried to speed up the BBU; however, because the execution time of the BBU depends on the input distance matrix [70], the improvement of [12] is very limited.

These results show that the BBU is useful for MUT construction.

In this chapter, a parallel (decomposition) branch-and-bound algorithm (PBBU) for constructing MUTs from a metric distance matrix was designed. In a decomposition approach [72], the search space was first divided into p subspaces for p processors, and each subspace was assigned to one processor; each processor then ran a sequential branch-and-bound algorithm on its assigned subspace. Therefore, each processor required only local communication, which minimized the communication cost of the parallel algorithm. To be useful for biologists, a time constraint Tc was used as the reasonable running time of the PBBU: when the execution time of the PBBU exceeded the Tc set by biologists, the PBBU was used to construct near-MUTs. In addition, a partition strategy, a load-balancing strategy, search strategies, and a new data structure were applied to the PBBU to improve its performance.


In the partition strategy, a function was used to evaluate the initial nodes, which were then sorted in increasing order and distributed to the corresponding processors by the cyclic partition method [4]. Two pools, a Global pool (GP) and a Local pool (LP), were used to balance the load of branching nodes, which is a mix of the synchronous and asynchronous parallel branch-and-bound approaches [61]. Both the depth-first search [49] and the best-first search [67] were used to regulate the time and memory space required, as in the literature [73]. A new data structure was used to store the necessary information for a node in an attempt to decrease the computation time. In the experimental tests, a random data set and practical data sets of Human+Chimpanzee Mitochondrial and Bacteriophage T7 DNAs were used to evaluate the PBBU on an AMD Athlon personal computer (PC) cluster. The experimental results showed that the PBBU found an optimal solution for 36 species within a reasonable time on 16 PCs. Moreover, the PBBU achieved satisfactory speed-up ratios for most test cases.

This chapter is organized as follows. In Section 3.2, related work on parallel branch-and-bound algorithms and preliminaries are given. The MUT problem is defined in Section 3.3. The BBU is briefly described in Section 3.4. Section 3.5 presents the PBBU in detail. The experimental results are given in Section 3.6, and the summary in Section 3.7.

3.2 Related Work

The branch-and-bound algorithm is a commonly used technique for solving combinatorial search problems, and many theoretical properties of sequential and parallel branch-and-bound algorithms have been discussed. A branch-and-bound algorithm generally consists of four parts: a branching rule, a selection rule, a bounding rule, and a termination rule. The branching, bounding, and termination rules are problem dependent, whereas the selection rule is algorithm dependent [72]. The selection rule is an important factor in the performance of a designed algorithm. Four well-known search methods have been presented for the selection rule: breadth-first [49], depth-first [49], best-first [67], and random [43]. Among them, depth-first search and best-first search are two efficient and commonly used methods: in depth-first search, the list of subproblems is stored in a LIFO stack, while in best-first search the subproblem with the smallest lower bound is selected first.
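A minimal sketch of these two selection rules is given below; the Subproblem type and its lower_bound field are placeholders introduced only for this illustration:

#include <queue>
#include <stack>
#include <vector>

// A generic subproblem with a lower bound; the fields are placeholders.
struct Subproblem {
    double lower_bound;
    // ... problem-specific state (e.g., a partial tree topology) ...
};

// Depth-first selection: the most recently generated subproblem is
// examined first (LIFO stack), which keeps the memory footprint small.
Subproblem select_depth_first(std::stack<Subproblem>& open) {
    Subproblem s = open.top();
    open.pop();
    return s;
}

// Best-first selection: the subproblem with the smallest lower bound is
// examined first, which is time-efficient but may keep many subproblems
// in memory at once.
struct ByLowerBound {
    bool operator()(const Subproblem& a, const Subproblem& b) const {
        return a.lower_bound > b.lower_bound;   // min-heap on lower_bound
    }
};
using BestFirstQueue =
    std::priority_queue<Subproblem, std::vector<Subproblem>, ByLowerBound>;

Subproblem select_best_first(BestFirstQueue& open) {
    Subproblem s = open.top();
    open.pop();
    return s;
}

int main() {
    BestFirstQueue best;
    best.push({3.0});
    best.push({1.0});
    Subproblem s = select_best_first(best);  // picks the bound-1.0 subproblem
    return s.lower_bound == 1.0 ? 0 : 1;
}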

Wah et al. [66] proposed a parallel machine, called MANIP, with a parallel global best-first search branch-and-bound algorithm for NP-hard problems; they showed that the speed-up ratio increases linearly with the system size. In the literature [73], the depth-first search and best-first search branch-and-bound algorithms were investigated in a system with a two-level memory hierarchy. In general, best-first search has the best time efficiency if the main memory is large enough to hold all of its active sub-problems (it is space consuming); depth-first search, on the other hand, is space efficient but not efficient in execution time. However, [73] suggested that no single search method is good for all problems in a two-level memory hierarchy (it depends on the problem).

Regarding other research, El-Dessouki and Huen [22] presented a parallel decomposition depth-first search branch-and-bound algorithm on a network of computers. Kumar and Rao [62] also proposed a parallel decomposition depth-first search branch-and-bound algorithm on multiprocessors. A parallel decomposition best-first search branch-and-bound algorithm for a distributed-memory environment was presented by Karp and Zhang [46]. In the literature [61], Quinn analyzed the execution time of loosely synchronous and asynchronous parallel branch-and-bound algorithms: in the loosely synchronous algorithm, a centralized data structure is used to hold the unexamined nodes, whereas in the asynchronous algorithm the unexamined nodes are distributed within the processors' local memory. Yang and Das [72] proposed a parallel decomposition best-first search branch-and-bound algorithm for MIN-based multiprocessors.

Most of the works above focus on presenting a model for parallel branch-and-bound algorithms. In this dissertation, however, the goal is to design a parallel branch-and-bound algorithm, the PBBU, to construct MUTs. Hence, the PBBU is not compared with the above algorithms, although some insights from them have been used in its design.

3.3 Problem Definition

For the PBBU, some useful definitions, lemmas, and theorems from Bandelt [8], Farach et al. [23], and Wu et al. [70] were used. The detailed proofs of the lemmas and theorems can be found in [70]. In the following, an unweighted graph with vertex set V and edge set E is denoted by G = (V, E), and a graph with an edge weight function ω is denoted by G = (V, E, ω). All elements of a matrix and all weights on the edges of a graph are assumed to be nonnegative.

Some definitions, lemmas, and theorems are given below.

Definition 2-1: A distance matrix of n species is a symmetric n × n matrix M such that M[i, j] ≥ 0 and M[i, i] = 0 for all 1 ≤ i, j ≤ n [70].

Definition 2-2: A matrix M is a metric if the distances obey the triangle inequality, i.e., M[i, j] + M[j, k] ≥ M[i, k] for all 1 ≤ i, j, k ≤ n [70].

Definition 2-3: A metric M is an ultrametric if and only if M[i, j] ≤ max(M[i, k], M[j, k]) for all 1 ≤ i, j, k ≤ n [70].

Definition 2-4: Let T = (V, E, ω) be an edge-weighted tree and u, v ∈ V. The path length from u to v is denoted by d_T(u, v). The weight of T is defined by ω(T) = Σ_{e ∈ E} ω(e) [70].

Definition 2-5: Let T be a rooted tree and r be any node of T. We use T_r to denote the subtree rooted at r, and L(T) to denote the leaf set of T [70].

Definition 2-6: An ultrametric tree T of {1, ..., n} is a rooted and edge-weighted binary tree with L(T) = {1, ..., n} and root r such that d_T(u, r) = d_T(v, r) for all u, v ∈ L(T) [70].

Definition 2-7: Let T = (V, E, ω) be a UT. For any r ∈ V, the height of r, denoted by height(r), is the distance from r to any leaf in the subtree T_r, i.e., height(r) = d_T(r, v) for any v ∈ L(T_r) [70].

Definition 2-8: For any M (not necessarily a metric), an MUT for M is a T with minimum ω(T) such that L(T) = {1, ..., n} and d_T(i, j) ≥ M[i, j] for all 1 ≤ i, j ≤ n. The problem of finding an MUT for M is called the MUT problem [23].

Definition 2-9: The metric minimum ultrametric tree (∆MUT) problem has the same definition as the MUT problem except that the input is a metric [70].

In the following, unless specifically indicated, a tree is a rooted, non-negative edge weighted binary tree. For a tree T = (V, E, ω), the unweighted tree P = (V, E) is called the topology of T.

Lemma 2-1: Let T = (V, E, ω) be an MUT for M. For any internal node s in V, height(s) = max{M[i, j] | i, j ∈ L(T_s)}/2 [70].

Theorem 2-1: The ∆MUT problem is NP-hard [70].

Definition 2-10: MUT with a given topology (MUTT) problem: Given any M and an unweighted tree P = (V, E) with L(P) = {1, ..., n}, the MUTT problem is to find a nonnegative weight function ω of P such that T = (V, E, ω) is an MUT for M [70].

Theorem 2-2: The MUTT problem can be solved in O(n^2) time, where n is the number of species [70].

Definition 2-11: Let M be a matrix. max(M) denotes max_{i,j} {M[i, j]} [70].

Lemma 2-2: If M[u, v] = max(M), there exists a minimum ultrametric tree T such that u and v are in the two subtrees of the root r [70].

Lemma 2-3: Let T(i) be an MUT for M with leaf set {1, ..., i} and specified topology P(i). If T(n) is an MUT for M with leaf set {1, ..., n} and the topology of T(n) contains P(i), then ω(T(n)) ≥ ω(T(i)) + Σ_{i < j ≤ n} min{M[k, j] | k < j} / 2 [70].

Definition 2-12: Let M be an n × n distance matrix. A permutation (a1, a2, ..., an) of {1, ..., n} is called a max-min permutation if M[a1, a2] = max(M) and min_{k<i} {M[ai, ak]} ≥ min_{k<i} {M[aj, ak]} for all 1 < i < j [70].
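To make Definitions 2-2 and 2-3 concrete, the following small sketch (illustrative code, not part of the proposed algorithms; it uses 0-based indices instead of the 1-based indices of the definitions) tests whether a distance matrix satisfies the triangle inequality and the ultrametric inequality:

#include <algorithm>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Definition 2-2: M is a metric if M[i][j] + M[j][k] >= M[i][k] for all i, j, k.
bool is_metric(const Matrix& M) {
    const int n = static_cast<int>(M.size());
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                if (M[i][j] + M[j][k] < M[i][k]) return false;
    return true;
}

// Definition 2-3: a metric M is an ultrametric if
// M[i][j] <= max(M[i][k], M[j][k]) for all i, j, k.
bool is_ultrametric(const Matrix& M) {
    const int n = static_cast<int>(M.size());
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                if (M[i][j] > std::max(M[i][k], M[j][k])) return false;
    return is_metric(M);   // every ultrametric is also a metric
}

int main() {
    Matrix M = {{0, 2, 4}, {2, 0, 4}, {4, 4, 0}};   // a small ultrametric
    return (is_metric(M) && is_ultrametric(M)) ? 0 : 1;
}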

3.4 Sequential Branch-and-Bound Algorithm

Before presenting the PBBU, the BBU is briefly described as in [70]. In MUT construction, the branch-and-bound algorithm is a tree search algorithm that repeatedly searches the branch-and-bound tree (BBT) to find a better solution until the optimal one is found. The BBT is a tree containing all possible topologies of UTs. Assume that the root of the BBT has depth 0; a child of a node with depth i is then said to have depth i + 1. Hence, each node with depth i in the BBT represents a topology with the leaf set {1, ..., i + 2}. Figure 3-1 shows an example of a BBT. In Figure 3-1, the root of the BBT, with depth 0, must have the leaf set {1, 2}, and each node in the BBT is a possible topology of a UT. The goal is to find a node in the BBT with an optimal solution. The BBU for constructing MUTs from a metric distance matrix M is shown below.

Figure 3-1: An Example of BBT

Algorithm BBU

Input: An nn metric distance matrix M.

Output: The minimum ultrametric trees for M.

Step 1: Reorder M to form a max-min permutation and then re-label the species as the leaf set {1, 2, ..., n}.
Step 2: Create a root v of the BBT, where v represents the only topology with leaves 1 and 2.

Step 3: Run the UPGMM to find a feasible solution and store its weight in a variable UB as an initial upper bound.

Step 4: while there is a node in the BBT do

Delete all nodes v from BBT if LB(v) ≥ UB or all the children of v have been deleted.

Select a node s in the BBT, whose children have not yet been generated, according to the selection rule.

Generate the children of s by using the branching rule.

If a better solution is found, then update the UB as a new upper bound.

endwhile

Step 5: Report the minimum ultrametric trees for M.

End BBU
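Step 1 of the BBU reorders M into a max-min permutation (Definition 2-12). The sketch below shows one straightforward greedy way to compute such an ordering; it is an illustration under the assumption of a symmetric input matrix with 0-based indices, not the dissertation's implementation:

#include <algorithm>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Greedy construction of a max-min permutation (Definition 2-12):
// the first two species realize max(M); each later position receives the
// remaining species whose minimum distance to the already placed species
// is largest.
std::vector<int> max_min_permutation(const Matrix& M) {
    const int n = static_cast<int>(M.size());
    std::vector<int> order;
    std::vector<bool> placed(n, false);

    // Choose a1, a2 with M[a1][a2] = max(M).
    int a = 0, b = 1;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (M[i][j] > M[a][b]) { a = i; b = j; }
    order = {a, b};
    placed[a] = placed[b] = true;

    // Repeatedly append the unplaced species with the largest
    // min-distance to the species placed so far.
    while (static_cast<int>(order.size()) < n) {
        int best = -1;
        double best_min = -1.0;
        for (int s = 0; s < n; ++s) {
            if (placed[s]) continue;
            double m = std::numeric_limits<double>::max();
            for (int t : order) m = std::min(m, M[s][t]);
            if (m > best_min) { best_min = m; best = s; }
        }
        placed[best] = true;
        order.push_back(best);
    }
    return order;
}

int main() {
    Matrix M = {{0, 9, 4, 7}, {9, 0, 8, 3}, {4, 8, 0, 6}, {7, 3, 6, 0}};
    std::vector<int> p = max_min_permutation(M);   // starts with the pair {0, 1}
    return p.size() == M.size() ? 0 : 1;
}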


3.5 Parallel Branch-and-Bound Algorithm (PBBU)

In this section, the PBBU is described in detail. The PBBU was designed for distributed-memory multiprocessors with a master-slave architecture. The master processor (MP) created the initial nodes and then dispatched most of them to the slave processors (SPs). The MP was also used to do the same work as the SPs and tried to balance the nodes between the MP and the SPs.

In the MP, in Step 1, the input metric distance matrix M was reordered to form a max-min permutation and the species were re-labeled as the leaf set {1, 2, ..., n}. Of course, this could have been done in parallel; however, the time complexity of Step 1 is O(n^2), which is much smaller than the overall execution time of the PBBU, so it was done in the MP in order to simplify the problem. In Step 2, a root v of the BBT, representing only the topology with leaves 1 and 2, was created by the MP. In Step 3, the MP ran the UPGMM to find a feasible solution and stored its weight in a global variable UB as an initial upper bound. In order to dispatch nodes to the SPs, some nodes of the BBT were generated in advance; therefore, in Step 4, the MP performed the corresponding parts of Step 4 of the BBU to generate some nodes of the BBT. Note that the value of LB(v) for each node v generated by the MP had to be lower than or equal to the initial UB. In the PBBU, the number of nodes was set to be triple the number of processors p. Similarly, this step also could have been done in parallel, but it was done only by the MP for the same reason as in Step 1.

Since each node v in an SP may or may not be bounded quickly, the work of the SPs was balanced before the dispatching procedure. In Step 5, for each node v generated by the MP, a value C(v) was computed first, and then all of the C(v) values were sorted in increasing order (an increasing or a decreasing order is equally suitable here). According to the sorting results, each corresponding node was stored sequentially into the Global pool (GP). Afterwards, the MP dispatched most of them to the SPs using the cyclic partition method [4]. In the dispatching procedure, the initial UB and the matrix M with a max-min permutation were also sent to the SPs. Since the MP also did the same work as the SPs, it had to preserve some nodes in the GP; in the PBBU, the MP preserved 1/p of the nodes in the GP. Through Step 5, a potential effect existed to balance the work among the MP and the SPs. After dispatching most of the nodes from the MP to the SPs, the PBBU tried to find the optimal solution.
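A minimal sketch of the cyclic (round-robin) dispatching used in Step 5 is shown below; the node identifiers and pool contents are placeholders for illustration, and processor 0 stands for the MP keeping its own 1/p share:

#include <vector>

// Cyclic partition: nodes sorted by C(v) are dealt out round-robin, so
// every processor receives a mix of cheap and expensive nodes.
std::vector<std::vector<int>> cyclic_partition(const std::vector<int>& sorted_nodes,
                                               int num_processors) {
    std::vector<std::vector<int>> share(num_processors);
    for (std::size_t i = 0; i < sorted_nodes.size(); ++i)
        share[i % num_processors].push_back(sorted_nodes[i]);
    return share;
}

int main() {
    // 3p = 12 initial nodes (identified here by their index after sorting)
    std::vector<int> nodes = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    std::vector<std::vector<int>> share = cyclic_partition(nodes, 4);
    // share[0] = {0, 4, 8}, share[1] = {1, 5, 9}, ... (3 nodes per processor)
    return share[0].size() == 3 ? 0 : 1;
}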

In Wu [70], the BBU stopped when no node in the BBT could lead to a better solution than the current upper bound UB. However, this is not always useful in practice, since biologists do not know when the results will be available or may not have time to wait for them. Hence, in the PBBU, an additional termination rule, the time constraint Tc, can optionally be given as an input. There are therefore two termination rules in the PBBU: one is that no node in the MP and SPs can lead to a better solution than the current upper bound UB, and the other is that the execution time of the PBBU exceeds Tc.

When the execution time of the PBBU was less than Tc, the branching rule, the bounding rule, and the searching rule of the PBBU were similar to those of the BBU. It was found that the time of the depth-first search was close to that of the best-first search when the number of species was not large. Since the depth-first search requires less memory space than the best-first search, the PBBU used only the depth-first search when the number of species was not large; when the number of species was large, the PBBU used both the depth-first search and the best-first search to regulate the time and memory space requirements. In the PBBU, the MP or an SP did not have all nodes on the path from the root of the BBT to a node v when it had to branch the children of v by constructing the topology of v; therefore, the data structure used in the BBU is not useful for the PBBU. Another data structure, called the UT node, was used to represent a node v: it stores the left and right children of each internal node in the topology of v, the leaves, the parent of each leaf, and the lower bound of node v. Each internal node was represented by the number of leaves present when it was created. The UT node was used to decrease the computation time, such as the branching time and the time spent repeatedly computing the lower bound of node v. In the PBBU, two pools, the GP and the Local pool (LP), were used to balance the load of branching nodes. In the MP, when the number of nodes in the GP reached zero, it broadcast a request to the SPs; when an SP received a request from the MP, it sent two nodes to the MP if the number of nodes in its LP was larger than 2. The MP then received nodes from the SPs or reported the MUTs. Similarly, when the number of nodes in the LP of an SP reached zero, it sent a request to the MP; when the MP received a request from an SP, it sent two nodes to this SP if the number of nodes in the GP was larger than 2. The SP then received two nodes from the MP or ended its run.
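The request/transfer exchange between the LPs and the GP can be illustrated with the simplified MPI sketch below. It models only the protocol (integers stand in for UT nodes, the MP does no searching of its own, and the real PBBU also broadcasts updated upper bounds and lets the MP request nodes back from the SPs), so it should be read as an assumption-laden toy example rather than the actual implementation:

#include <mpi.h>
#include <vector>

// Tags for the simplified GP/LP protocol.
enum { TAG_REQUEST = 1, TAG_WORK = 2, TAG_DONE = 3 };

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       // MP: owns the Global pool
        std::vector<int> gp;
        for (int i = 0; i < 40; ++i) gp.push_back(i);   // dummy nodes

        int finished = 0;
        while (finished < size - 1) {
            int dummy;
            MPI_Status st;
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            if (gp.size() > 2) {           // hand out two nodes per request
                int work[2] = {gp.back(), *(gp.end() - 2)};
                gp.pop_back(); gp.pop_back();
                MPI_Send(work, 2, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
            } else {                       // GP exhausted: tell the SP to stop
                MPI_Send(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD);
                ++finished;
            }
        }
    } else {                               // SP: asks for work when its LP is empty
        while (true) {
            int req = rank;
            MPI_Send(&req, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            int work[2];
            MPI_Status st;
            MPI_Recv(work, 2, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) break;
            // ... expand work[0] and work[1] with the sequential BBU here ...
        }
    }

    MPI_Finalize();
    return 0;
}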

When the execution time of the PBBU exceeded Tc, a special procedure was followed in order to obtain near-MUTs quickly. Each SP first sent all nodes in its LP to the MP; the MP then received all nodes from the SPs and stored them in the GP. Afterwards, in this procedure, the termination rule was that no node in the GP had a better solution than the current upper bound UB. The selection rule was to select a node s from the GP directly when its lower bound was less than or equal to UB (the bounding rule). When a node s was selected, a new leaf was inserted using the greedy algorithm, choosing among all possible insertion positions the one with the minimum lower bound; this was repeated until all leaves had been inserted into node s (the branching rule). Finally, the MP reported the UTs with a better solution. Since the initial upper bound from the UPGMM is close to the optimal solution [15], the UTs reported by the MP were near-MUTs.

Although the UTs obtained by applying the time constraint termination rule are only near-MUT solutions, this rule is still useful for large-scale problems. Since the solution space and the computation time increase significantly as the number of species grows, biologists can obtain better solutions than those of the initial UPGMM and the sequential BBU by running the PBBU with a given Tc. We do not consider the speed-up ratio for the UTs obtained under the time constraint termination rule. Moreover, for the same given Tc, better UTs (lower cost) are obtained with 16 processors than with a single processor.

The PBBU for constructing MUTs or near-MUTs from M is shown below.

Algorithm PBBU

Input: An nn metric distance matrix M and a time constrain Tc.

Output: The minimum ultrametric trees (MUTs) for M or near-MUTs for M if the execution time of PBBU is larger than Tc.

Environment: p processors include a master processor (MP) and p-1 slave processors (SPs).

Master Processor (MP):

Step 1: Reorder M to form a max-min permutation and then re-label the species as the leaf set {1, 2, ..., n}.
Step 2: Create a root v of the BBT, where v represents the only topology with leaves 1 and 2.

Step 3: Run the UPGMM to find a feasible solution and store its weight in a global variable UB as an initial upper bound.

Step 4: Run Step 4 of BBU to generate 3p nodes of BBT.

Step 5: Compute C(v) for each node v, sort all C(v) in increasing order and store the corresponding nodes into the Global pool (GP), and finally send the UB, the M with a max-min permutation, and 3(p-1) nodes to the SPs using the cyclic partition method.

Step 6: while there is a node in the GP and the execution time of PBBU < Tc do

Delete all nodes v in the GP if LB(v) ≥ UB or all the children of v have been deleted.

Select a node s in the GP, whose children have not yet been generated, according to the selection rule.

Generate the children of s by using the branching rule.

If a better solution is found, then update the UB as a new upper bound and broadcast the UB to SPs.

If there is a request from an SP and the number of nodes in the GP > 2, then send 2 nodes to this SP.
endwhile

Step 7: if there is no node in the GP and the execution time of PBBU < Tc
Broadcast a request to SPs.

If 2k nodes are received from the LPs of the SPs, then go to Step 6. (k: number of available SPs)
Step 8: if the execution time of PBBU > Tc

Receive all nodes from LPs in all of SPs and store them in the GP.

while there is a node in the GP do

Delete all nodes v in the GP if LB(v) ≥ UB or all the children of v have been deleted.

Select a node s in the GP directly, whose children have not yet been generated.

while there is a leaf which needs to be inserted into node s and LB(s) < UB do
Insert a new leaf into node s by the greedy algorithm.

endwhile

If a better solution is found, then update UB as a new upper bound.


endwhile

Step 9: Report MUTs for M or near-MUTs for M if the execution time of PBBU is larger than Tc.

Each of Slave Processors (SP):

Step 1: while there is a node in the LP and the execution time of PBBU < Tc do

Delete all nodes v in Local pool (LP) if LB(v) ≥ UB or all the children of v have been deleted.

Select a node s in the LP, whose children have not yet been generated, according to the selection rule.

Generate the children of s by using the branching rule.

If a better solution is found, then update the UB as a new upper bound and broadcast the UB to the MP and other SPs.

If there is a request from the MP and the number of nodes in the LP > 2, then send 2 nodes to the MP.
endwhile

Step 2: if there is no node in the LP and the execution time of PBBU < Tc
Send a request to the MP.

If 2 nodes are received from the GP in the MP, then go to Step 1.

Step 3: if the execution time of PBBU > Tc
Send all nodes in the LP to the MP.

End PBBU

3.6 Experimental Results

In the experimental tests, the PBBU was implemented on an AMD Athlon personal computer (PC) cluster. The MP and the SPs were all AMD Athlon machines with a clock rate of 2.2 GHz and 1 GB of memory, and the SPs were connected by a 100 Mbps switch. The PBBU was written in C++ with MPI 1.25 code. Three data sets were used to verify the PBBU. The first was a Random data set, generated randomly; its distance matrix was metric, with distances ranging between 1 and 100. The second was a practical data set of 135 Human + one Chimpanzee Mitochondrial DNAs (HMDNA) obtained from [65] (clocklike data, as shown in the literature [11]). The third was a practical data set of 9 Bacteriophage T7 DNAs (BT7) obtained from the literature [38]. The distance matrices of these two practical data sets were both metric.

In the results of Wu et al. [70], the execution time for constructing MUTs was shown to depend on the input distance matrices (data dependency). Figure 3-2 shows the execution time of five instances of the HMDNA and Random data sets with various numbers of species, and it also confirms the observation in [40]. In order to eliminate the effects of data dependence, twenty instances were used to test the PBBU. For the Random data set, thirty species were randomly generated; for the HMDNA data set, each instance of thirty species was randomly selected from the 136 species. In each instance of both the HMDNA and Random data sets, the species were labeled as the set {1, 2, ..., 30}. For a different number of species i in each instance, the species {1, ..., i} were selected, again to eliminate data dependence effects.

Figure 3-2: The Execution Time of Five Instances of the HMDNA and Random Data Sets. (a) HMDNA data set, 21-27 species; (b) Random data set, 14-20 species. Each panel plots execution time (sec., logarithmic scale) for instances inst. 01 to inst. 05.

Figure 3-3 and Figure 3-4 show the execution time of the PBBU with different numbers of species and processors for the HMDNA and Random data sets, respectively. As mentioned above, the execution time depends on the input instances; hence, the worst, average, and median execution times were determined, although the average case was heavily influenced by the worst case. From Figure 3-3, it can be seen that the execution time decreased as the number of processors increased, and the same observation holds in Figure 3-4 for the Random data set.

Figure 3-5 and Figure 3-6 show the speed-up ratios of the PBBU for the HMDNA and Random data sets, respectively. From Figure 3-5, the speed-up ratios increased as the number of processors increased. Moreover, the speed-up ratios were satisfactory (> 0.5p) for all test cases, even on 16 PCs, which shows that the PBBU is scalable. The same observation holds in Figure 3-6 for most test cases of the Random data set. Assuming Tc is 24 hours, for 136 species the PBBU can find a near-MUT within 25.09 hours. The cost of the near-MUT from the PBBU was 89450, which is less than the initial upper bound of 89537 from the UPGMM.

Figure 3-3: Execution Time of the PBBU for the HMDNA Data Set. (a) average case; (b) median case; (c) worst case. Each panel plots execution time (sec., logarithmic scale) against the number of processors (1, 2, 4, 8, 16) for 21-27 species.

Figure 3-4: Execution Time of the PBBU for the Random Data Set. (a) average case; (b) median case; (c) worst case. Each panel plots execution time (sec., logarithmic scale) against the number of processors (1, 2, 4, 8, 16) for 14-20 species.


Figure 3-5: Speed-up Ratio of the PBBU for the HMDNA Data Set. (a) average case; (b) median case; (c) worst case. Each panel plots the speed-up ratio against the number of processors (2, 4, 8, 16) for 21-27 species.

Figure 3-6: Speed-up Ratio of the PBBU for the Random Data Set. (a) average case; (b) median case; (c) worst case. Each panel plots the speed-up ratio against the number of processors (2, 4, 8, 16) for 14-20 species.

In order to verify the correctness of the PBBU, the results were compared with those of Vigilant et al. [65], Ingman et al. [42], and Hillis et al. [38]. In [65], the PAUP tool (version 3) [64] was used to generate the evolutionary tree for the HMDNA data set; it was found that the Chimpanzee is an out-group in the tree and that, for Humans, the tree can be roughly divided into two parts, African and non-African. In [42], the same result was obtained using a neighbor-joining method [63] on 52 Human + one Chimpanzee Mitochondrial DNAs (a subset of the HMDNA data set). In this dissertation, thirty Human + one Chimpanzee Mitochondrial DNAs were randomly selected from the HMDNA data set. Figure 3-7 shows the evolutionary tree constructed by the PBBU: the Chimpanzee is an out-group, and the tree for Humans is roughly divided into African and non-African. This result matches those of Vigilant et al. [65] and Ingman et al. [42]. In [38], a true (known) phylogeny for the experimental lineages of the BT7 data set was generated. These data were used to test the five most popular methods: PAUP [64], the Fitch-Margoliash method [26], the Cavalli-Sforza method [10], the neighbor-joining method [63], and UPGMA [53]; these methods all predicted the correct branching orders of the known phylogeny. The BT7 data set was also used to test the PBBU, and the constructed evolutionary tree is shown in Figure 3-8; the branching orders are correct. It should be noted that the evolutionary tree constructed by the PBBU cannot be used to reject evolutionary trees constructed by other methods; the goal of the PBBU is to supply another viewpoint for studying the evolutionary relationships of species.

Figure 3-7: Results of the PBBU for the HMDNA Data Set. The constructed tree places the Chimpanzee as an out-group; the Human leaves include !Kung 7-10, European 8, Asian 84-88, P.NewG 80-82, W.Pygmy 1-2, E.Pygmy 4-6, Hadza 83, and A.American.

Figure 3-8: Results of the PBBU for the BT7 Data Set. The constructed tree contains the nine Bacteriophage T7 lineages T7-J, T7-K, T7-L, T7-M, T7-N, T7-O, T7-P, T7-Q, and T7-R.

3.7 Summary

In this chapter, a PBBU was designed for constructing MUTs or near-MUTs. The PBBU was designed for a PC cluster, a distributed-memory multiprocessor system with a master-slave architecture. The master processor (MP) created the initial nodes and dispatched most of them to the slave processors (SPs); moreover, it held a Global Pool (GP) to preserve some nodes. Each SP held a Local Pool (LP) to store candidate nodes, a facility that reduces the communication between the MP and the SPs and saves execution time. The MP was also used to do the same work as the SPs and tried to balance the nodes between the MP and the SPs. To evaluate the PBBU, a random data set and practical data sets of Human+Chimpanzee Mitochondrial and Bacteriophage T7 DNAs were used, and the execution times for various numbers of species and processors were compared. From the experimental results, it can be seen that the PBBU found an optimal solution for 36 species within a reasonable time on 16 PCs. Moreover, the PBBU achieved satisfactory speed-up ratios for most test cases.
