Definition 5: If SPi sends messages to DPj-1 and DPj+1, the transmission between SPi and DPj
4. Performance Evaluation
To evaluate the performance of the proposed method, we implemented the DRC algorithm along with the Divide-and-Conquer algorithm [23]. The simulation results are discussed in two categories: even and uneven GEN_BLOCK distributions. In an even GEN_BLOCK distribution, each processor owns a similar amount of data; in an uneven distribution, a few processors may be allocated large volumes of data.
Since data elements can be concentrated on a few specific processors, those processors are also the ones most likely to have the maximum degree of communication.
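To make the communication degree concrete, the following sketch (function names are ours, not from the paper) derives the transmissions E_ij between two GEN_BLOCK distributions over the same array and computes the maximum node degree, assuming both distributions cover the same total number of elements:

```python
from collections import Counter

def transmissions(src, dst):
    """Given per-processor block sizes for the source (src) and destination
    (dst) GEN_BLOCK distributions of one array, return the messages
    (i, j, size): SP_i sends `size` elements to DP_j."""
    msgs = []
    s, d = list(src), list(dst)   # remaining elements per block
    i = j = 0
    while i < len(s) and j < len(d):
        size = min(s[i], d[j])    # overlap of the two current blocks
        if size:
            msgs.append((i, j, size))
        s[i] -= size
        d[j] -= size
        if s[i] == 0:
            i += 1                # source block exhausted, move on
        if d[j] == 0:
            j += 1                # destination block exhausted, move on
    return msgs

def max_degree(msgs):
    """Maximum number of messages incident to any single processor."""
    sp = Counter(i for i, _, _ in msgs)
    dp = Counter(j for _, j, _ in msgs)
    return max(max(sp.values()), max(dp.values()))

msgs = transmissions([4, 6, 2], [3, 3, 6])
# msgs == [(0,0,3), (0,1,1), (1,1,2), (1,2,4), (2,2,2)], max degree 2
```

The maximum degree is what lower-bounds the number of communication steps in a contention-free schedule, which is why concentrating data on a few processors matters.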
The simulation program generates sets of random integers as the message sizes A[SPi] and A[DPi]. Moreover, the total message size sent from the source processors equals the total message size received by the destination processors, keeping the source and destination sides balanced.
We assume that the data computation (communication) time in the simulation is represented by the transmission size |Eij|. In the following figures, the percentage of events is plotted as a function of the message size and of the number of processors. In these figures, “DRC Better” denotes the percentage of events in which the DRC algorithm has a lower total computation (communication) time than the Divide-and-Conquer algorithm, while “DC Better” denotes the reverse situation. “The Same Results” denotes the percentage of events in which both algorithms have the same total computation (communication) time.
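The three curves can be tallied as in the following sketch, where `costs_drc` and `costs_dc` stand in for the per-trial total computation (communication) times of the two algorithms (the actual cost values are produced by the simulator and are not reproduced here):

```python
def tally(costs_drc, costs_dc):
    """Count trials where DRC beats DC, DC beats DRC, or they tie,
    returning each count as a percentage of all trials."""
    n = len(costs_drc)
    drc_better = sum(1 for a, b in zip(costs_drc, costs_dc) if a < b)
    dc_better = sum(1 for a, b in zip(costs_drc, costs_dc) if a > b)
    same = n - drc_better - dc_better
    return {"DRC Better": 100 * drc_better / n,
            "DC Better": 100 * dc_better / n,
            "The Same Results": 100 * same / n}

print(tally([5, 7, 9, 9], [6, 7, 8, 10]))
# → {'DRC Better': 50.0, 'DC Better': 25.0, 'The Same Results': 25.0}
```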
In the uneven distribution, the upper bound of each message size is set to 1.7B and the lower bound to 0.3B, where B is the total transmission message size divided by the total number of processors. In the even distribution, the upper bound is set to 1.3B and the lower bound to 0.7B. The total message size is 10M.
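One way to realize this setup (a sketch under our own assumptions, not the authors' actual generator) is to draw per-processor sizes uniformly within the stated bounds and then rescale so the source and destination totals match; note that rescaling and rounding can nudge a value slightly past a bound or the sum off by a few units:

```python
import random

def gen_sizes(n_procs, total, low_f, high_f, seed=None):
    """Draw n_procs message sizes in [low_f*B, high_f*B], B = total/n_procs,
    then rescale so they sum (approximately, after rounding) to total."""
    rng = random.Random(seed)
    b = total / n_procs
    sizes = [rng.uniform(low_f * b, high_f * b) for _ in range(n_procs)]
    scale = total / sum(sizes)
    return [int(round(s * scale)) for s in sizes]

total = 10 * 1024 * 1024                        # 10M total message size
src = gen_sizes(24, total, 0.3, 1.7, seed=1)    # uneven A[SP_i]
dst = gen_sizes(24, total, 0.7, 1.3, seed=2)    # even A[DP_i]
```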
Figures 6(a) and 6(b) show the simulation results of both the DRC and the Divide-and-Conquer algorithms with different numbers of processors and different total message sizes. The number of processors ranges from 8 to 24. We can observe that the DRC algorithm outperforms the Divide-and-Conquer algorithm on uneven data redistribution. In the even case (Figs. 7(a) and 7(b)), where the data is less concentrated, DRC shows even better performance than in the uneven case. In both the even and uneven cases, DRC performs better than the Divide-and-Conquer algorithm.
Figure 6. The percentage of events for computing time, plotted (a) against the number of processors and (b) against the total message size with 24 processors, on the uneven data set.
Figure 7. The percentage of events for computing time, plotted (a) against the number of processors and (b) against the total message size with 24 processors, on the even data set.
5. Conclusion
In this paper, we have presented the Degree-Reduction-Coloring (DRC) scheduling algorithm for efficiently performing HPF2 irregular array redistribution on distributed-memory multicomputers.
The DRC algorithm is a simple method with low algorithmic complexity for performing GEN_BLOCK array redistribution. It is optimal in terms of the number of communication steps and, at the same time, near-optimal with respect to the total message size over all steps. The proposed method not only avoids node contention but also shortens the overall communication length.
To verify the performance of the proposed algorithm, we implemented both DRC and the Divide-and-Conquer redistribution algorithm. The experimental results show improved communication costs and high practicability on different processor hierarchies. They also indicate that both algorithms perform well on GEN_BLOCK redistribution, and that in many situations DRC is better than the Divide-and-Conquer redistribution algorithm.
References
[1] G. Bandera and E.L. Zapata, “Sparse Matrix Block-Cyclic Redistribution,” Proceedings of the IEEE Int'l Parallel Processing Symposium (IPPS'99), San Juan, Puerto Rico, pp. 355-359, April 1999.
[2] J.A. Bondy and U.S.R. Murty, Graph Theory with Applications, Macmillan, London, 1976.
[3] Frederic Desprez, Jack Dongarra and Antoine Petitet, “Scheduling Block-Cyclic Data Redistribution,” IEEE Trans. on Parallel and Distributed Systems, vol. 9, no. 2, pp. 192-205, Feb. 1998.
[4] Minyi Guo, “Communication Generation for Irregular Codes,” The Journal of Supercomputing, vol. 25, no. 3, pp. 199-214, 2003.
[5] Minyi Guo and I. Nakata, “A Framework for Efficient Array Redistribution on Distributed Memory Multicomputers,” The Journal of Supercomputing, vol. 20, no. 3, pp. 243-265, 2001.
[6] Minyi Guo, I. Nakata and Y. Yamashita, “Contention-Free Communication Scheduling for Array Redistribution,” Parallel Computing, vol. 26, no. 8, pp. 1325-1343, 2000.
[7] Minyi Guo, I. Nakata and Y. Yamashita, “An Efficient Data Distribution Technique for Distributed Memory Parallel Computers,” Joint Symp. on Parallel Processing (JSPP'97), pp. 189-196, 1997.
[8] Minyi Guo, Yi Pan and Zhen Liu, “Symbolic Communication Set Generation for Irregular Parallel Applications,” The Journal of Supercomputing, vol. 25, pp. 199-214, 2003.
[9] Edgar T. Kalns and Lionel M. Ni, “Processor Mapping Techniques Toward Efficient Data Redistribution,” IEEE Trans. on Parallel and Distributed Systems, vol. 6, no. 12, pp. 1234-1247, Dec. 1995.
[10] S.D. Kaushik, C.-H. Huang, J. Ramanujam and P. Sadayappan, “Multiphase data redistribution: Modeling and evaluation,” International Parallel Processing Symposium (IPPS'95), pp. 441-445, 1995.
[11] Peizong Lee and Zvi Meir Kedem, “Automatic Data and Computation Decomposition on Distributed Memory Parallel Computers,” ACM Transactions on Programming Languages and Systems, vol. 24, no. 1, pp. 1-50, Jan. 2002.
[12] S. Lee, H. Yook, M. Koo and M. Park, “Processor reordering algorithms toward efficient GEN_BLOCK redistribution,” Proceedings of the ACM Symposium on Applied Computing, pp. 539-543, 2001.
[13] Y.W. Lim, Prashanth B. Bhat and Viktor K. Prasanna, “Efficient Algorithms for Block-Cyclic Redistribution of Arrays,” Algorithmica, vol. 24, no. 3-4, pp. 298-330, 1999.
[14] C.-H. Hsu, S.-W. Bai, Y.-C. Chung and C.-S. Yang, “A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution,” IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 12, pp. 1201-1216, Dec. 2000.
[15] Ching-Hsien Hsu and Kun-Ming Yu, “An Optimal Processor Replacement Scheme for Efficient Communication of Runtime Data Realignment,” Parallel and Distributed Processing and Applications, Lecture Notes in Computer Science, vol. 3358, pp. 268-273, 2004.
[16] C.-H. Hsu, Dong-Lin Yang, Yeh-Ching Chung and Chyi-Ren Dow, “A Generalized Processor Mapping Technique for Array Redistribution,” IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 7, pp. 743-757, July 2001.
[17] Antoine P. Petitet and Jack J. Dongarra, “Algorithmic Redistribution Methods for Block-Cyclic Decompositions,” IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 12, pp. 1201-1216, Dec. 1999.
[18] Neungsoo Park, Viktor K. Prasanna and Cauligi S. Raghavendra, “Efficient Algorithms for Block-Cyclic Data Redistribution Between Processor Sets,” IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 12, pp. 1217-1240, Dec. 1999.
[19] L. Prylli and B. Tourancheau, “Fast runtime block cyclic data redistribution on multiprocessors,” Journal of Parallel and Distributed Computing, vol. 45, pp. 63-72, Aug. 1997.
[20] S. Ramaswamy, B. Simons and P. Banerjee, “Optimization for Efficient Data Redistribution on Distributed Memory Multicomputers,” Journal of Parallel and Distributed Computing, vol. 38, pp. 217-228, 1996.
[21] Akiyoshi Wakatani and Michael Wolfe, “Optimization of Data Redistribution for Distributed Memory Multicomputers,” short communication, Parallel Computing, vol. 21, no. 9, pp. 1485-1490, Sept. 1995.
[22] Hui Wang, Minyi Guo and Wenxi Chen, “An Efficient Algorithm for Irregular Redistribution in Parallelizing Compilers,” Proceedings of the 2003 International Symposium on Parallel and Distributed Processing with Applications, LNCS 2745, 2003.
[23] Hui Wang, Minyi Guo and Daming Wei, “Divide-and-conquer Algorithm for Irregular Redistributions in Parallelizing Compilers,” The Journal of Supercomputing, vol. 29, no. 2, pp. 157-170, 2004.
[24] H.-G. Yook and Myung-Soon Park, “Scheduling GEN_BLOCK Array Redistribution,” Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems, Nov. 1999.
Building a High-Performance Parallel Construction Environment for Evolutionary Trees Using Grids *
游坤明1, 徐蓓芳1, 賴威廷1, 謝一功1, 周嘉奕1, 林俊淵2, 唐傳義3
1 Department of Computer Science and Information Engineering, Chung Hua University
2 Institute of Molecular and Cellular Biology, National Tsing Hua University
3 Department of Computer Science, National Tsing Hua University
1 yu@chu.edu.tw, {b9102042, b9004060, b9102004}@cc.chu.edu.tw, jyzhou@pdlab.csie.chu.edu.tw
2 cyulin@mx.nthu.edu.tw
3 cytang@cs.nthu.edu.tw
Abstract
Performing massive data computations in parallel has become a very important concept in recent years, and many different platform architectures accompany different applications. A grid is an architecture built on top of the Internet; grids can share resources with one another over the Internet, and can therefore be viewed as computing with a vast, easily scalable pool of resources. By comparison, increasing the computing power of a traditional cluster system costs considerably more than doing so with a grid, so a cluster's computing power is limited. In a typical grid, common protocols, mutually recognized authentication, security considerations, and reasonable resource access are required for grids to communicate with one another over the network. We use grid computing to process our data and programs and to obtain correct results within a reasonable time. This paper applies a parallel algorithm, taking the human mitochondrial genome as an example, to construct evolutionary trees in single-machine, grid, and cluster environments, and compares their performance.
Keywords: ultrametric evolutionary tree, cluster computing, grid computing, Globus Toolkit
1. Introduction
In bioinformatics research, scientists often need evolutionary-tree results to understand the kinship among species. Constructing an evolutionary tree from a distance matrix is an important problem in biology and taxonomy, and many models and corresponding algorithms have been proposed for it. Most of the associated optimization problems have been proven to be NP-hard.
*