# 可重組態架構上之有向圖平行演算法


Parallel Digraph Algorithms on Reconfigurable Architectures. A dissertation presented by Pan, Tien-Tai to the Department of Computer Science and Information Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Computer Science. National Taiwan Normal University, Taipei, Taiwan, R.O.C., 2011.


## 誌謝

在此感謝指導教授林順喜博士多年的辛勞、妻子瓊瑤的陪伴與支持、女兒岱華可愛的笑容、兒子禾坤調皮的神情，與其他助我良多的友人。

## 中文摘要

在過去的30年中，可重組態架構所具有的超高速計算能力，吸引許多研究者與科學家投入該領域。它們是具有強大能力的平行計算模型，但是有些類型的計算問題並沒有被完整地研究清楚，例如：有向圖問題。有方向性的可重組態架構是針對有向圖問題特別開發出來的可重組態架構，這些有方向性的可重組態架構是一種理想化的機器，且只用來處理有向圖問題。在本論文中，我們將嘗試以新的方法並使用無方向性的可重組態架構去處理有向圖問題。

根據我們所知，在本研究之前，有二種方法可以在可重組態架構上處理有向圖問題：第一種方法是基於與計算模型無關的矩陣乘法，它被用來解決代數路徑問題，例如遞移閉包、任兩端點間最短路徑、與最小擴張樹等問題，其所使用的計算模型為無方向性的可重組態架構；第二種方法則是使用有方向性的可重組態架構，例如有方向性的可重組態格狀網路、有方向性的可重組態匯流排之處理器陣列，來解決特定的有向圖問題。基於第二種方法的演算法必須利用有方向性的可重組態架構之特定能力才能正確無誤地處理問題，這能力就是它們能夠在匯流排的每一段落控制資料的流向。

在本論文中我們提出第三種解決方案，它是基於無方向性的可重組態架構去解決有向圖問題。例如，有向的遞移閉包問題可以在三維n×n×n的可重組態格狀網路上用O(log d(D))的時間解決，其中d(D)是有向圖D的直徑。基於有向的遞移閉包演算法的相同設計概念，我們可以解決下列的有向圖問題：強連接圖、強連接元件、環狀圖檢查、樹的建構、任兩端點之最短距離、單源最短距離、直徑、拓樸排序等問題。這些演算法恰恰證明了這第三種解決方案的威力，因此我們認為這個在無方向性的可重組態架構上的有向圖問題解決方案是有相當價值的，而這將對超高速計算與即時運算等領域有所貢獻。

關鍵詞：可重組態架構、可重組態格狀網路、遞移閉包、強連接元件、任兩頂點對之最短距離、單源最短距離、拓樸排序、有向圖問題。

## Abstract

Reconfigurable architectures have attracted many researchers and scientists with their high-performance computing capability for the past three decades. They are very powerful parallel computation models, but some types of problems have not been studied completely on them, for example, digraph problems. Directional reconfigurable architectures were developed especially for digraph problems; they are idealistic machines for handling such problems. In this dissertation, we focus on digraph problems on non-directional reconfigurable architectures and try to solve them with a new approach.

Before this study, to the best of our knowledge, there were two approaches to solving digraph problems on reconfigurable architectures. The first is based on matrix multiplication, which is independent of the computation model. It was used to solve the algebraic path problems (APP), for example, transitive closure (TC), all-pairs shortest path (APSP), and the minimum spanning tree (MST), on non-directional reconfigurable architectures. The second approach uses directional reconfigurable architectures, such as the directional reconfigurable mesh (DR-Mesh) and the complete directional processor arrays with reconfigurable bus systems (CD-PARBS), to solve specific digraph problems. The algorithms of the second approach rely on the capability of the directional reconfigurable architectures to control the data flow in each segment of a bus.

In this dissertation, a third approach on non-directional reconfigurable architectures is proposed to solve many digraph problems. For example, the transitive closure problem can be solved in O(log d(D)) time on a three-dimensional (3-D) n×n×n reconfigurable mesh (R-Mesh), where d(D) is the diameter of digraph D. Based on the same idea used in the transitive closure algorithm, we can solve the following digraph problems: strongly connected graph, strongly connected component (SCC), cyclic digraph checking, tree construction, all-pairs shortest distance, single source shortest distance, diameter, and topological sort (TS). These algorithms show the power of the third approach, so we believe this approach is valuable for digraph problems on non-directional reconfigurable architectures for high-performance computing and time-critical applications.

Keywords: reconfigurable architectures, reconfigurable mesh (R-Mesh), transitive closure (TC), strongly connected component (SCC), all-pairs shortest distance, single source shortest distance, topological sort (TS), digraph problems.

## Contents

- 誌謝
- 中文摘要
- Abstract
- 1 Introduction
  - 1.1 Parallel Computation Models
    - 1.1.1 Parallel Random Access Machine
    - 1.1.2 Reconfigurable Architectures
  - 1.2 Basic Algorithm for Reconfigurable Architectures
  - 1.3 Algorithms on Parallel Computation Models
  - 1.4 Notations and Definitions
  - 1.5 Organization of the Dissertation
- 2 Transitive Closure Algorithm
  - 2.1 Introduction
  - 2.2 Transitive Closure Algorithm
  - 2.3 Conclusion
- 3 Transitive Closure Related Algorithms
  - 3.1 Introduction
  - 3.2 Strongly Connected Digraph Algorithm
  - 3.3 Strongly Connected Component Algorithm
  - 3.4 Cyclic Graph Check Algorithm
  - 3.5 Tree Construction Algorithm
  - 3.6 Conclusion
- 4 All-Pairs Shortest Distance Algorithm
  - 4.1 Introduction
  - 4.2 All-Pairs Shortest Distance Algorithm
  - 4.3 Conclusion
- 5 All-Pairs Shortest Distance Related Algorithms
  - 5.1 Introduction
  - 5.2 Single Source Shortest Distance Algorithm
  - 5.3 Diameter Algorithm
  - 5.4 Topological Sort Algorithm
  - 5.5 Conclusion
- 6 Conclusion and Future Work
  - 6.1 Conclusion
  - 6.2 Future Work
- References
- List of Publications
- Appendix A：發表於Parallel Processing Letters之期刊論文

## List of Figures

- 1.1 A reconfigurable mesh of size 3×5
- 1.2 15 possible port connections
- 1.3 A 2-D reconfigurable mesh with five sub-buses
- 1.4 A digraph example on non-directional R-Mesh
- 1.5 Examine whether the arc, from V1 to V4, is redundant
- 1.6 Examine whether the arc, from V1 to V4, is redundant in D-PARBS
- 1.7 The O(1) time n-bits AND Algorithm on a 1-D R-Mesh
- 1.8 The O(1) time n-bits OR Algorithm on a 1-D R-Mesh
- 2.1 A digraph example with its adjacent matrix on R-Mesh
- 2.2 Example of the transitive closure algorithm
- 3.1 A digraph has three strongly connected components
- 3.2 A digraph has three strongly connected components
- 3.3 A digraph example
- 3.4 A digraph has three strongly connected components
- 3.5 Example for our faster cyclic checking and tree construction algorithms
- 3.6 Example of our tree construction algorithm
- 4.1 A digraph example with its adjacent matrix on R-Mesh
- 4.2 All-pairs shortest distance of the example
- 4.3 Example of the all-pairs shortest distance algorithm
- 5.1 A digraph example with its adjacent matrix on R-Mesh
- 5.2 All-pairs shortest distance of the example
- 5.3 All-pairs shortest distances result
- 5.4 Topological sort result
- 6.1 An open question - globally update reachable information

## List of Tables

- 1.1 Related Algorithms on Different Parallel Computation Models
- 2.1 Summary of previous and our results for Chapter 2
- 3.1 Summary of previous and our results for Chapter 3
- 4.1 Summary of previous and our results for Chapter 4
- 5.1 Summary of previous and our results for Chapter 5

## Chapter 1. Introduction

Reconfigurable architectures have attracted many researchers and scientists with their high-performance computing capability for the past three decades. Since the 1980s, many reconfigurable architectures have been proposed, such as the reconfigurable mesh (R-Mesh), directional reconfigurable mesh (DR-Mesh), processor arrays with reconfigurable bus systems (PARBS), directional processor arrays with reconfigurable bus systems (D-PARBS), complete directional processor arrays with reconfigurable bus systems (CD-PARBS), reconfigurable multiple bus machine (RMBM), directional reconfigurable multiple bus machine (D-RMBM), reconfigurable networks (RN), directional reconfigurable networks (D-RN), etc. Based on these reconfigurable architectures, constant-time or logarithmic-time algorithms for many problems can be obtained easily. As a matter of fact, even the concurrent read, concurrent write parallel random access machine (CRCW PRAM), an idealistic parallel computation model with concurrent-write capability, cannot compute the logical exclusive-OR of n boolean values in O(1) time. On reconfigurable architectures, the same problem can be solved in O(1) time. Therefore, reconfigurable architectures are valuable for high-performance computing (HPC) and time-critical applications.

Processor arrays with reconfigurable bus systems (PARBS) and the reconfigurable mesh (R-Mesh) are the most popular and famous models among these reconfigurable architectures. Both are parallel reconfigurable computation architectures that consist of processor arrays and reconfigurable bus systems, which can dynamically configure the internal switches of processors to form specific sub-buses. The processors connected to a common sub-bus can communicate with each other by broadcasting data on it in fixed units

of time. That is why processor arrays with reconfigurable bus systems and the reconfigurable mesh can solve many problems efficiently and quickly; after all, the capability to configure the sub-buses is itself part of the computation.

Since many engineering and science problems can be formulated as digraph or graph problems, developing algorithms for digraph or graph problems is important in theory and in practice. Because each edge in an undirected graph G can be replaced with two opposite arcs, digraphs subsume undirected graphs. Processor arrays with reconfigurable bus systems and the reconfigurable mesh can therefore be extremely useful for digraph or graph problems.

Before this study, to the best of our knowledge, there were two approaches to solving digraph problems on reconfigurable architectures. The first is based on matrix multiplication, which is independent of the computation model. It was used to solve the algebraic path problems (APP), for example, transitive closure (TC), all-pairs shortest path (APSP), and the minimum spanning tree (MST), on non-directional reconfigurable architectures. The second approach uses directional reconfigurable architectures, such as the directional reconfigurable mesh (DR-Mesh) and the complete directional processor arrays with reconfigurable bus systems (CD-PARBS), to solve specific digraph problems. The algorithms of the second approach rely on the capability of the directional reconfigurable architectures to control the data flow in each segment of a bus.

The constant-time algorithms on reconfigurable models for digraph or graph problems are all based on the s-t connectivity algorithm. Its steps are as follows: the digraph or graph is embedded in an n×n directional or non-directional reconfigurable model, a signal is sent from a specified processor (s, s) representing the source vertex s, and the target vertex t is connected to the source vertex s if and only if processor (t, t) receives the signal. The key

of designing constant-time algorithms on reconfigurable architectures is constant-time signal broadcasting: all the vertices reachable from the specified source vertex receive the signal at the same time, so the actual distance between two vertices is not significant in such algorithms. As a consequence, this technique by itself cannot solve some types of digraph problems. That is why we believe constant-time signal broadcasting is both a strength and a weakness of reconfigurable architectures for some digraph problems.

In this chapter, we will describe some of the popular and powerful parallel computation models that have been used by many researchers and scientists worldwide, for example, the parallel random access machine (PRAM) [1], reconfigurable mesh (R-Mesh), processor arrays with reconfigurable bus systems (PARBS), reconfigurable multiple bus machine (RMBM), directional reconfigurable mesh (DR-Mesh), directional processor arrays with reconfigurable bus systems (D-PARBS), complete directional processor arrays with reconfigurable bus systems (CD-PARBS), directional reconfigurable multiple bus machine (D-RMBM), etc. All of them except the well-known PRAM model belong to the reconfigurable architectures.

The concept of reconfigurable architectures has existed since the 1960s. Prof. Gerald Estrin [2] [3], the first scholar in this field, proposed the concept of a computer made of a standard processor and an array of reconfigurable hardware in his seminal paper. In the 1980s and 1990s, many researchers and scientists derived different reconfigurable architectures from his idea [4] [5] [6] [7] [8]. In this dissertation, these models are categorized into non-directional and directional reconfigurable architectures. The main difference is that the directional models (DR-Mesh, D-PARBS, CD-PARBS, and directional RMBM) can control the data flow in each segment of a bus. This is the main

reason that the directional models can solve the digraph problems while the non-directional models cannot.

Algorithms on a specified parallel computation model show the power of the model, and the complexities of these algorithms help us understand the problems better. So some significant algorithms on these parallel computation models are collected in Table 1.1.

### 1.1 Parallel Computation Models

Traditionally speaking, a sequential algorithm is constructed and implemented as a serial stream of instructions to solve a problem on the sequential computation model. Only one instruction is executed on a central processing unit (CPU) at a time; the next instruction is executed after the previous one has finished. In order to solve a problem quickly, we can use multiple processing elements (PEs) simultaneously. For example, this can be accomplished by dividing the problem into smaller parts, so that each processing element solves its part simultaneously, and then merging the partial results to obtain the final result. For this reason, many parallel computation models and model-dependent algorithms have been proposed. Parallel algorithms can reduce time complexities dramatically, although they need many processing elements. Since time, unlike other resources, cannot be recovered, we believe that time complexity is the most important parameter of an algorithm. For high-performance computing and time-critical applications, we do need fast parallel algorithms.
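The divide-solve-merge idea above can be sketched in ordinary Python. This is only an illustration of the decomposition (thread-based, so it gains no real speedup for CPU-bound Python code), not one of the parallel models discussed in this dissertation:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Divide `data` into chunks, sum each chunk concurrently, merge results."""
    if not data:
        return 0
    chunk = -(-len(data) // workers)               # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, parts)            # each "PE" solves its part
    return sum(partials)                           # merge step

print(parallel_sum(list(range(100))))              # → 4950
```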

#### 1.1.1 Parallel Random Access Machine

A parallel random access machine (PRAM) [1] is a shared-memory abstract machine which is used by researchers to design their algorithms, study the intrinsic difficulty of problems, or compare the performance of algorithms developed on other parallel computation models. The parallel random access machine is a very idealistic parallel computation model, but it is very popular and famous, and it is important in theory. We describe this model as follows. Conflicts between reads and writes accessing the same shared memory location are resolved by one of the following strategies:

• Exclusive Read Exclusive Write (EREW) – every memory cell can be read or written by only one processing element at a time
• Concurrent Read Exclusive Write (CREW) – multiple processing elements can read a memory cell, but only one element can write at a time
• Exclusive Read Concurrent Write (ERCW) – rarely considered
• Concurrent Read Concurrent Write (CRCW) – multiple processing elements can read and write a memory cell

Several assumptions are made while developing algorithms on PRAM. They are:

• No limit on the number of processing elements in the machine
• Any memory cell is uniformly accessible from any processing element
• No limit on the amount of shared memory in the machine

• No resource contention
• Algorithms on the machine are of the multiple instruction, multiple data (MIMD) or single instruction, multiple data (SIMD) type

Although PRAM is too idealistic a parallel computation model to implement, it is useful for designing parallel algorithms and for understanding the concurrency of problems, i.e., how to divide a problem into smaller sub-problems and solve them in parallel.

#### 1.1.2 Reconfigurable Architectures

A reconfigurable architecture consists of processing elements interconnected by a reconfigurable bus system that can be used to generate various connection patterns between the processing elements dynamically. The basic unit in reconfigurable architectures is the processing element, which comprises switches, local memory, and an arithmetic logic unit (ALU). A processing element can set up its switches to read from or write onto a bus or local memory, and perform one arithmetic or logic operation on local data in one unit of time. A reconfigurable bus system has four parameters:

• Width: It refers to the data width of the processing elements. Note that the width is not directly related to the bus width of a reconfigurable architecture. Two models are distinguished by the length of the operand of the processing element.
  – Bit model: the length of the operand is a bit
  – Word model: the length of the operand is a word
• Delay: The time needed to propagate a signal

  – Unit delay model: only O(1) time for broadcasting, no matter how far the signal travels
  – Logarithmic delay model: O(log n) time for broadcasting, where n is the number of processing elements the signal has to travel through
• Access: Each processing element connected to the bus through its ports can either read from or write onto it. For the write operation, only one of the processing elements connected to a common bus can write data on the bus at a time. The read operation has the following two models.
  – Exclusive Read (ER) model: only one processing element can read the bus
  – Concurrent Read (CR) model: all processing elements can read the bus
• Connection Patterns: the connection patterns that can be set among the ports of a processing element.

**Reconfigurable Mesh**

The reconfigurable mesh was presented by Miller et al. [9] [10]. It comprises an array of processors interconnected by a reconfigurable bus system that can be used to obtain various interconnection patterns between the processors. The power of reconfigurability is illustrated by solving some problems, like the exclusive OR (XOR), more efficiently on the reconfigurable mesh than on the parallel random access machine. The 2-D reconfigurable mesh of size m×n is an array of processing elements in m rows and n columns, connected to a grid-shaped reconfigurable broadcast bus. A 2-D 3×5 reconfigurable mesh (R-Mesh) is shown in Figure 1.1. The internal connections among the four ports N, S, E, W of a processing element can be configured during the execution of algorithms. Note that there

are 15 allowed connection patterns for the 2-D R-Mesh (see Figure 1.2). By configuring the internal connections, all processing elements can be partitioned into different sub-buses, as shown in Figure 1.3. A 3-D r×s×t reconfigurable mesh, with six ports N, S, E, W, U, and D, can be defined in a similar way.

[Figure 1.1: A reconfigurable mesh of size 3×5]

The 15 possible port connections (Figure 1.2) are {NSEW}, {NS,EW}, {NE,SW}, {NW,SE}, {NSW}, {NEW}, {NSE}, {SEW}, {NW}, {NE}, {SE}, {SW}, {NS}, {EW}, and {}.

In this study, all our algorithms are based on the non-directional reconfigurable mesh whose parameters are the word, unit delay, and concurrent read model.
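As a rough illustration of how the internal switch settings induce sub-buses (our own simplified sequential simulation, not part of the dissertation's models), a union-find over ports recovers the sub-bus partition: ports fused inside a processing element and ports joined by the fixed external wires end up in the same electrical component.

```python
class UnionFind:
    """Minimal union-find used to merge electrically connected ports."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def build_sub_buses(rows, cols, config):
    """config[(i, j)] lists the fused port groups of PE (i, j),
    e.g. [{'E', 'W'}] for the {EW} pattern or [] for {}."""
    uf = UnionFind()
    for (i, j), groups in config.items():          # internal switch settings
        for group in groups:
            ports = sorted(group)
            for p in ports[1:]:
                uf.union((i, j, ports[0]), (i, j, p))
    for i in range(rows):                          # fixed external wires
        for j in range(cols):
            if j + 1 < cols:
                uf.union((i, j, 'E'), (i, j + 1, 'W'))
            if i + 1 < rows:
                uf.union((i, j, 'S'), (i + 1, j, 'N'))
    return uf

# A 1×3 mesh where every PE chooses {EW} forms a single row bus:
cfg = {(0, j): [{'E', 'W'}] for j in range(3)}
uf = build_sub_buses(1, 3, cfg)
print(uf.find((0, 0, 'W')) == uf.find((0, 2, 'E')))   # → True
```

Setting the middle PE to the empty pattern {} instead would split the row into two sub-buses, which is exactly the mechanism the algorithms below exploit.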

[Figure 1.3: A 2-D reconfigurable mesh with five sub-buses]

**Processor Arrays with Reconfigurable Bus Systems**

Processor arrays with reconfigurable bus systems are the general form of reconfigurable architectures. For example, a reconfigurable mesh is a 2-D processor array with a reconfigurable bus system, so the two are functionally equivalent. A k-dimensional (k-D) PARBS consists of a k-D processor array and a reconfigurable bus system. As far as we know, the value of k is at most three in most algorithms on reconfigurable architectures.

**Directional Processor Arrays with Reconfigurable Bus Systems**

Although there are many constant-time and logarithmic-time algorithms on processor arrays with reconfigurable bus systems and the reconfigurable mesh, only a few digraph algorithms have been proposed on non-directional reconfigurable models. Pradeep and Murthy presented a constant-time algorithm in [11] for redundant-arc elimination in a digraph, using O(n^4) processors on processor arrays with reconfigurable bus systems. Because processor arrays with reconfigurable bus systems cannot control the direction of signal flow, their algorithm cannot solve the problem correctly [12]. Therefore,

the directional processor arrays with reconfigurable bus systems (D-PARBS) and complete directional processor arrays with reconfigurable bus systems (CD-PARBS) were proposed in [12] to solve some digraph problems, for example, transitive closure, strongly connected graph, strongly connected component, etc. The proposed D-PARBS and CD-PARBS architectures are allowed to control the direction of signal flow, so constant-time algorithms for digraph problems can be developed on such architectures.

Now let us explain, using Figure 1.4 and Figure 1.5, why Pradeep's algorithm in [11] cannot guarantee to solve the redundant-arc elimination problem for digraphs correctly. Figure 1.4 is the digraph example used as input for the algorithm, and Figure 1.5 shows the result. Notice that the algorithm is executed on a non-directional reconfigurable model. To examine whether the arc <1, 4> is redundant, we need to accomplish the following steps:

Step 1. Send a signal from P1,1 and check whether P4,4 receives it. If P4,4 does, perform Step 2; otherwise do nothing.

Step 2. Disconnect P1,4 and disable all processors Pi,j with i < 1 and j > 4. In this case, there is no such Pi,j.

Step 3. Send the signal again from P1,1. If P4,4 receives the signal again, then <1, 4> is a redundant arc; otherwise, it is not.

Step 1 is shown on the left of Figure 1.5, and Steps 2 and 3 on the right, where the thick black lines represent the signal flowing from P1,1 to P4,4.
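The failure mode can be reproduced with a small sequential sketch. On a non-directional model, the broadcast signal effectively travels along arcs in both directions, so the check amounts to undirected reachability. The arc set below is a hypothetical digraph consistent with the text (the signal leaks along V1–V3–V2–V4); it is not necessarily the exact digraph of Figure 1.4:

```python
from collections import defaultdict

def reachable(arcs, s, t, directed):
    """Can a signal starting at vertex s reach vertex t?"""
    adj = defaultdict(set)
    for u, v in arcs:
        adj[u].add(v)
        if not directed:                 # a non-directional bus leaks backwards
            adj[v].add(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return False

def is_redundant(arcs, arc, directed):
    """Arc <u, v> is redundant iff v is still reachable from u without it."""
    rest = [a for a in arcs if a != arc]
    return reachable(rest, arc[0], arc[1], directed)

arcs = [(1, 4), (3, 1), (3, 2), (2, 4)]              # hypothetical digraph
print(is_redundant(arcs, (1, 4), directed=False))    # → True  (wrong: leaked)
print(is_redundant(arcs, (1, 4), directed=True))     # → False (correct)
```

With `directed=False` the signal leaks backwards across <3, 1> and <3, 2>, reporting the arc redundant; the directed check shows no other directed path from V1 to V4 exists.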

[Figure 1.4: A digraph example on non-directional R-Mesh]

[Figure 1.5: Examine whether the arc, from V1 to V4, is redundant]

In this case, the algorithm finds that <1, 4> is a redundant arc, which would indicate that there is another directed path from V1 to V4 without going through <1, 4>. However, as shown on the left of Figure 1.4, <1, 4> is actually not a redundant arc. This incorrect result arises mainly because Pradeep's algorithm cannot properly control the signal flow. As a result, the signal goes through a path from vertices V1, V3, and V2 to V4 even though there is no such directed path.

Because the non-directional reconfigurable architectures cannot control the direction of a signal, some researchers proposed modified models, called directional reconfigurable architectures. The main benefit is that the modified models can control the

direction of a signal, so the digraph problems can be solved on the modified models. The D-PARBS is one such modified model. On the right of Figure 1.6, the signal sent by V1 is not broadcast to V4, so we know that <1, 4> is not a redundant arc.

[Figure 1.6: Examine whether the arc, from V1 to V4, is redundant in D-PARBS]

Directional reconfigurable architectures are useful and idealistic reconfigurable models for solving the digraph problems, but we do have difficulties building such machines. The main difficulty is that an electric signal is non-directional. Even if they can be implemented, their cost will be quite high. For example, in [13] the single source shortest path problem can be solved on the DR-Mesh in O(1) time, but the cost is O(n^2·w × n^2·w) processors, where w is the maximum arc weight. That is why we believe the directional model is not a realistic solution for solving the digraph problems.

### 1.2 Basic Algorithm for Reconfigurable Architectures

The constant-time algorithms on the non-directional and directional reconfigurable models for digraph and graph problems are based on the s-t connectivity algorithm, which

has the following outstanding feature: the digraph or graph is embedded in an n×n directional or non-directional reconfigurable model, a signal is sent from a specified processor (s, s) representing the source vertex s, and the target vertex t is connected to the source vertex s if and only if processor (t, t) receives the signal. In the s-t connectivity algorithm, we cannot identify which reachable vertices are closer to the source vertex s. Although there are many O(1)-time tree algorithms for graphs, for example, Euler tour, rooting, tree traversal, vertex levels, descendants and leaves, and tree reconstruction, no algorithm that can construct a general tree for a graph has been found [7]. The reason is constant-time signal broadcasting: all the vertices reachable from the specified source vertex receive the signal at the same time, so the actual distance between two vertices is lost. We believe that constant-time signal broadcasting is both a strength and a weakness of reconfigurable architectures for some digraph and graph problems. For example, our previous research in [14] can find the minimum spanning tree or a spanning tree in O(1) time, but we cannot list the vertices and edges in sequence from a specified source vertex s to a specified target vertex t in O(1) time.

Here we give two examples to demonstrate this approach on different problems. In Figures 1.7 and 1.8, the processors set their internal switches to {EW} or {} according to their values and the operation. The signal is broadcast by the sender, and the receiver cannot receive the signal in these two cases. This is the basic technique in reconfigurable architectures, by which some problems can be solved in O(1) time.
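The two bus-splitting figures can be mimicked with a short sequential simulation of the switch-setting logic (our own sketch; the local handling of the two endpoint PEs' bits is a simplification we introduce):

```python
def receives(bits, fuse_bit):
    """A signal written on PE 0's east segment reaches PE n-1's west port
    iff every interior PE fused its {E, W} ports."""
    return all(b == fuse_bit for b in bits[1:-1])

def rmesh_and(bits):
    # Figure 1.7 style: fuse E-W where the bit is 1; any 0 breaks the bus.
    return bits[0] == 1 and bits[-1] == 1 and receives(bits, 1)

def rmesh_or(bits):
    # Figure 1.8 style: fuse E-W where the bit is 0; any 1 breaks the bus,
    # so OR = 1 exactly when the signal fails to arrive.
    return bits[0] == 1 or bits[-1] == 1 or not receives(bits, 0)

print(rmesh_and([0, 0, 0, 1, 0]))   # → False
print(rmesh_or([0, 0, 0, 1, 0]))    # → True
```

Each PE inspects only its own bit to set its switch, which is why the whole computation takes a constant number of broadcast steps regardless of n.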

[Figure 1.7: The O(1) time n-bits AND Algorithm on a 1-D R-Mesh, input bits 0 0 0 1 0]

[Figure 1.8: The O(1) time n-bits OR Algorithm on a 1-D R-Mesh, input bits 0 0 0 1 0]

### 1.3 Algorithms on Parallel Computation Models

Many algorithms have been proposed and proven optimal on reconfigurable architectures for some basic computations like OR, AND, EXOR, addition, multiplication, etc. By using these basic computations and some techniques, problems in image processing, computational geometry, graph theory, etc. have been exploited and demonstrated to show the power of reconfigurable architectures. We present a summary of some important algorithms on reconfigurable architectures, RAM, and PRAM in Table 1.1. The notations in Table 1.1 are explained in Section 1.4.

Table 1.1: Related Algorithms on Different Parallel Computation Models

| Problem | Model | Time | CPU # |
|---|---|---|---|
| Transitive closure [15] | RAM | O(n^3) | 1 |
| Strongly connected component [16] | RAM | O(n + m) | 1 |
| A deterministic NC algorithm for general directed depth-first search [17] | PRAM | O(log^11 n × n^(1/2)) | O(log^11 n × n^(1/2)) |
| Breadth-first search [18] | RAM | O(n + m) | 1 |
| Depth-first search [18] | RAM | Θ(n + m) | 1 |
| Topological sort [18] | RAM | Θ(n + m) | 1 |
| Strongly connected components [18] | RAM | Θ(n + m) | 1 |
| Minimum spanning tree (Kruskal) [18] | RAM | O(m log n) | 1 |
| Minimum spanning tree (Prim) [18] | RAM | O(m + n log n) | 1 |
| Single source shortest path (Bellman-Ford) [18] | RAM | O(n × m) | 1 |
| Single source shortest path (DAG) [18] | RAM | O(n + m) | 1 |
| Single source shortest path (Dijkstra) [18] | RAM | O(n^2) | 1 |
| All-pairs shortest paths (Bellman-Ford) [18] | RAM | O(n^4) | 1 |
| All-pairs shortest paths (Dijkstra) [18] | RAM | O(n^3) | 1 |
| Sorting n numbers [19] | 3-D PARBS | O(1) | n×n×n |
| Transitive closure [20] | 3-D PARBS | O(1) | n×n×n |
| Transitive closure [20] | 2-D PARBS | O(1) | n^2×n^2 |
| Bipartite graph [20] | 3-D PARBS | O(1) | n×n×n |
| Bipartite graph [20] | 2-D PARBS | O(1) | n^2×n^2 |
| Connected components [20] | 3-D PARBS | O(1) | n×n×n |
| Connected components [20] | 2-D PARBS | O(1) | n^2×n^2 |
| Articulation point [20] | 3-D PARBS | O(1) | n×n×n |
| Articulation point [20] | 2-D PARBS | O(1) | n^2×n^2 |
| Bi-connected components [20] | 3-D PARBS | O(1) | n×n×n^2 |
| Bi-connected components [20] | 2-D PARBS | O(1) | n^2×n^3 |
| Bridge [20] | 3-D PARBS | O(1) | n×n×(n((n−1)/2)) |
| Bridge [20] | 2-D PARBS | O(1) | n^2×n^2×(n((n−1)/2)) |
| Minimum spanning tree [20] | 3-D PARBS | O(1) | Max{n, m}×m |
| Minimum spanning tree [20] | 2-D PARBS | O(1) | n^2 × Max{n×m, m^2} |
| Transitive closure [21] | CRCW PRAM | O(log^3 n) | n |
| Transitive closure (Digraph) [22] | 3-D PARBS | O(log n) | n×n×n |
| All-pairs shortest paths (Digraph) [22] | 3-D PARBS | O(log n) | n×n×n^2 |
| Minimum spanning tree (Digraph) [22] | 3-D PARBS | O(log n) | n×n×n^2 |
| All-pairs shortest path [23] | EREW PRAM | O(log^2 n) time, O(n^3) work | N/A |
| Strongly connected component (Planar Digraph) [24] | PRAM | O(log^3 n) | n/log n |
| All-pairs shortest path [25] | EREW PRAM | O(log^4 n) time, O(n^2) work | N/A |
| Redundant arcs elimination [11] | 2-D R-Mesh | O(1) | n^4 |
| EXOR of n bits [10] | 2-D R-Mesh | Θ(1) | 2n×3 |
| Sum of two n-bit numbers [26] | 1-D PARBS | O(1) | n |
| Product of two n-bit numbers [26] | 2-D PARBS | O(log n) | n×2n |
| Sorting and computing convex hulls [27] | 2-D PARBS | O(1) | (n×m)×(n×⌈n/m⌉), 1 ≤ m ≤ n |
| XOR of n bits [27] | 2-D PARBS | O(1) | 2n×3 |
| Summation of n bits [27] | 2-D PARBS | O(1) | n×(n + 1) |
| Summation of n m-bit integers [27] | 2-D PARBS | O(1) | 2nm×2nm |
| Multiplication of two n-bit integers [27] | 2-D PARBS | O(1) | 4n^2×4n^2 |
| Simulation of k-D RM by 2-D RM [28] | 2-D PARBS | O(1) | n^(k+1)×n |
| Minimum spanning tree [29] | 3-D PARBS | O(log n) | n^(3+ε) |
| Sorting n^k numbers [30] | (k+1)-D RMB | O(1) | n×n×…×n |
| Sorting n numbers [31] | 3-D R-Mesh | O(1) | n^(3/2) |
| Sorting n numbers [32] | 2-D R-Mesh | O(1) | n×n |
| Sorting of n left and right parentheses and all matching pairs [33] | 2-D R-Mesh | O(1) | n×n |
| List ranking [34] | 2-D R-Mesh | O(log* n) | n×n |
| List ranking (randomized) [34] | 2-D R-Mesh | O(1) | n×n |
| Minimum spanning tree [35] | 2-D R-Mesh | O(1) | n×m^2 |
| Spanning tree [36] | RMBM | O(1) | m×n (with m×n^3 switches) |
| All-pairs shortest paths [37] | 4-D HBBN | O(log N) | N^(1/c)×N×N×N, bus width O(log N) bits |
| All-pairs shortest paths [37] | 3-D HBBN | O(log N) | N×N×N, bus width N^(1/c) bits |
| Minimum-weight spanning tree [37] | 4-D HBBN | O(log N) | N^(1/c)×N×N×N, bus width O(log N) bits |
| Minimum-weight spanning tree [37] | 3-D HBBN | O(log N) | N×N×N, bus width N^(1/c) bits |
| Transitive closure [37] | 3-D HBBN | O(log N) | N×N×N, bus width O(log N) bits |
| Connected component [37] | 3-D HBBN | O(log N) | N×N×N, bus width O(log N) bits |
| Biconnected component [37] | 4-D HBBN | O(log N) | N×N×N×N, bus width O(log N) bits |
| Articulation point and bridge [37] | 4-D HBBN | O(log N) | N×N×N×N, bus width O(log N) bits |
| Minimum spanning tree [38] | 3-D PARBS | O(log^2 n) | n^3 |
| Convex hull [39] | 2-D R-Mesh | O(1) | n×n |
| k-D maxima [39] | 2-D R-Mesh | O(1) | n×n |
| Two set dominance counting [39] | 2-D R-Mesh | O(1) | n×n |
| Smallest enclosing box [39] | 2-D R-Mesh | O(1) | n×n |
| All pairs nearest neighbor [39] | 2-D R-Mesh | O(1) | n×n |
| Triangularization [39] | 2-D R-Mesh | O(1) | n×n |
| Multiply two n-bit numbers [40] | 2-D R-Mesh | O(1) | n×n |
| Verification of minimum spanning tree [41] | CREW PRAM | O(log n) | N/A |
| Topological sort (Digraph) [12] | 3-D CD-PARBS | O(1) | n×n×n |
| Transitive closure (Digraph) [12] | 3-D CD-PARBS | O(1) | n×n×n |
| Cyclic graph checking (Digraph) [12] | 3-D CD-PARBS | O(1) | n×n×n |
| Strongly connected component (Digraph) [12] | 3-D CD-PARBS | O(1) | n×n×n |
| Minimum spanning tree [42] | 3-D AROB | O(log n) | n×n×n |
| Minimum spanning tree [43] | 2-D R-Mesh | O(log^2 n) | n×n |
| Minimum spanning tree [43] | 2-D R-Mesh | O(log n) | n^(1+ε)×n |
| Longest common subsequence [13] | DR-Mesh | O(1) | mh×nh (h = min(m, n) + 1) |
| Single-source shortest path [13] | DR-Mesh | O(1) | n^2·w × n^2·w (w = maximum arc weight) |
| Minimum spanning tree [44] [14] | 3-D PARBS | O(1) | n×n×n^2 |

Minimum spanning tree [45] | EREW PRAM | O(log n) with O((m + √n) log n) operations | N/A
Minimum spanning tree [45] | CRCW PRAM | O(log n) with O(n^2) operations | N/A
Minimum spanning tree [45] | CRCW PRAM | O(log n) with O((m + √n) log n) operations | N/A
Strongly connected component [46] | PRAM | O(log^2 n) | N/A

1.4 Notations and Definitions

The following notations and definitions are used throughout this dissertation.

G=(V, E)  An undirected graph,
D=(V, A)  A digraph (directed graph),
n         The number of vertices in G or D, or the input size of the specified problem,
ε         If n is a t-th power number, then ε = 1/t,

m               The number of edges or arcs in G or D, or the input size of the specified problem,
k or K          k = 1, 2, 3, ..., n or K = 1, 2, 3, ..., n,
c               A constant number,
N               The number of vertices in G or D with O(log N)-bit bus width,
V               The set of vertices in G or D, V = {V1, V2, ..., Vn},
Vi              A vertex in G or D,
E               The set of edges in G, E = {E1, E2, ..., Em},
(Vi, Vj)        An edge in G,
A               The set of arcs in D, A = {A1, A2, ..., Am},
<Vi, Vj>        An arc in D,
<Vi, ..., Vk>   A directed path in D,
Pi              A directed path in D,
A or M          The adjacency matrix of D,
A* or M*        A* = A^k or M* = M^k, where k = 1, 2, 3, ..., n,
log*            log* = log^k, where k = 1, 2, 3, ..., n,
processor (i,j,k)  The processor located at the i-th row, the j-th column, and the k-th plane,
variable (i,j,k)   The specified variable in processor (i,j,k),
Ri              A strongly connected component in G or D.

1.5 Organization of the Dissertation

Some parallel computation models and algorithms have been introduced at the beginning of this chapter. In Chapter 2, we present our O(log d(D))-time digraph algorithm as the basic idea, where d(D) is the diameter of digraph D. Several transitive closure related algorithms are proposed in Chapter 3. These algorithms are based on the idea presented in Chapter 2 and solve their problems by adding extra steps or modifying some steps; examples include the strongly connected digraph, strongly connected component, cyclic checking, and tree construction problems. The all-pairs shortest distance problem is similar to the all-pairs shortest path problem; the only difference is that in our problem all arcs have the same weight. In Chapter 4, we use the same approach to solve the all-pairs shortest distance problem. This shows that our approach can solve not only the digraph problems whose results are limited to 0 or 1 but also more complicated problems. The all-pairs shortest distance related problems are solved in Chapter 5, for example, the single source shortest distance, diameter, and topological sort problems. We conclude the notable results of this dissertation and describe our future work in the last chapter.

Chapter 2

Transitive Closure Algorithm

The transitive closure of a digraph D is defined as D* = (V, A*), where A* = {<i, j>: there is a directed path from vertex Vi to vertex Vj in D}; D*<i, j> = 0 if and only if no such directed path exists, and D*<i, j> = 1 otherwise. In this chapter, we propose a way to compute the transitive closure of a digraph D on a non-directional reconfigurable model, for example, the R-Mesh.

2.1 Introduction

Here we summarize the previous results for the transitive closure (TC) problem. In 1962, Warshall [15] developed a sequential algorithm that solves the transitive closure problem in O(n^3) time. Chen et al. [22] proposed algorithms in 1992 for solving the transitive closure, all-pairs shortest path (APSP), and minimum spanning tree (MST) problems in O(log n) time on 3-D n×n×n, n×n×n^2, and n×n×n^2 PARBS, respectively. These algorithms are based on the matrix multiplication algorithm, which is an O(1)-time algorithm on a 3-D n×n×n PARBS and is optimal. An O(1)-time transitive closure algorithm for digraphs on a directional CREW RMBM was given by Trahan et al. [36] in 1994. Kuo et al. [12] solved the topological sort (TS), transitive closure, cyclic checking, strongly connected digraph, and strongly connected component (SCC) problems in 1999 in O(1) time on a 3-D n×n×n CD-PARBS. The algorithms developed by Trahan and Kuo are all based on the s-t connectivity algorithm, which can solve digraph problems in O(1) time. This approach works well for some digraph problems, but others cannot be solved efficiently with it [13].
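For comparison, Warshall's O(n^3) sequential algorithm mentioned above can be sketched in a few lines of Python (an illustration with 0-based vertex indices, not the dissertation's parallel algorithm):

```python
def warshall(adj):
    """Warshall's O(n^3) transitive closure: after intermediate vertex k is
    processed, m[i][j] = 1 iff some directed path from Vi to Vj exists whose
    intermediate vertices all lie among the first k+1 vertices."""
    n = len(adj)
    m = [row[:] for row in adj]          # work on a copy of the matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] and m[k][j]:
                    m[i][j] = 1
    return m

# A small digraph: 0 -> 1 -> 2 -> 0 (a directed cycle).
tc = warshall([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]])
```

Because every vertex here lies on the cycle, the resulting closure matrix is all ones.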

In [47], Reif showed that the lexicographical depth-first search (DFS) problem is P-complete, which means that DFS is difficult to parallelize. In [48], Karp and Ramachandran argued that applications of DFS seem easier to solve in parallel than DFS itself. They also defined the transitive-closure bottleneck problems, which refer to breadth-first search (BFS) and the above applications of BFS and DFS, but not DFS itself. To our knowledge, there are O(1)- and O(log n)-time transitive closure algorithms for undirected and directed graphs, but there is no DFS algorithm on reconfigurable architectures. The O(1)-time algorithms proposed in [36] and [12] cannot solve the lexicographical depth-first search problem, because they are all based on the s-t connectivity algorithm. In this chapter, we propose a transitive closure algorithm which finds the transitive closure of a digraph D=(V, A) in O(log d(D)) time on a 3-D n×n×n R-Mesh, where d(D) is the diameter of D. The adjacency matrix of the digraph D records, for any specified vertex Vi, its adjacent vertices Vj with <Vi, Vj> ∈ A. We merge the vertices of digraph D by the rule that if <Vi, Vj> ∈ A and <Vj, Vk> ∈ A, then the directed path <Vi, Vj, Vk> exists. This rule can be applied to directed paths as well: if <Vi, ..., Vj, ...> and <Vj, ..., Vk> exist, then the directed path <Vi, ..., Vj, ..., Vk> exists too. This gives us an innovative approach for solving the transitive closure and related problems on non-directional reconfigurable architectures. Figure 2.1 shows the example used to demonstrate our algorithm in this study. This digraph is not strongly connected, and it has three strongly connected components, {V1, V2, V4, V5}, {V3}, and {V6}. Obviously, cycles are generated, for example,

<V1, V2, V5, V1> and <V1, V4, V5, V1>. The grey nodes in Figure 2.1 represent the arcs of the digraph.

[Figure 2.1: A digraph example with its adjacency matrix on the R-Mesh (vertices V1 to V6).]

2.2 Transitive Closure Algorithm

Model: A 3-D n×n×n R-Mesh.

Input: For 1 ≤ i, j, k ≤ n, processor (i, j, k) holds value(i, j, k) ← 1 if and only if <Vi, Vj> ∈ A or j = i, 0 otherwise.

Output: For 1 ≤ i, j, k ≤ n, processor (i, j, k) holds value(i, j, k) = 1 if and only if there is a directed path from vertex Vi to vertex Vj or j = i, 0 otherwise.

begin
Step 1.  value(i, j, k) ← 1 if and only if <Vi, Vj> ∈ A or j = i, 0 otherwise
         /* Initialize the variables of processor (i, j, k). */

Step 2.  select(i, j, k) ← 0
Step 3.  temp(i, j, k) ← 0
Step 4.  previous(i, j, k) ← 0
Step 5.  current(i, j, k) ← 0
Step 6.  halt(i, j, k) ← 0
Step 7.  do {
Step 8.  processor (i, j, k) configures as {NS}
Step 9.  processor (k, j, k) with value(k, j, k) = 1 writes a signal and diagonal processor (j, j, k) reads from any one of its N or S ports
         /* The specified vertex Vk uses its outgoing arcs or directed paths, stored in the k-th row of the k-th plane, to find all vertices Vj that are reachable from Vk via those outgoing arcs or directed paths. */
Step 10. if processor (j, j, k) receives the signal in Step 9 then select(j, j, k) ← 1 else select(j, j, k) ← 0
         /* Each diagonal processor of a vertex Vj sets its variable select to 0 or 1. The vertex Vj with select = 1 is reachable from the vertex Vk. */
Step 11. processor (i, j, k) configures as {EW}
Step 12. processor (j, j, k) writes select(j, j, k) and processor (i, j, k) reads from any one of its E or W ports into temp(i, j, k)

Step 13. if temp(i, j, k) = 1 and value(i, j, k) = 1 then temp(i, j, k) ← 1 else temp(i, j, k) ← 0
         /* Every arc or directed path leaving a vertex Vj, i.e. <Vj, Vx> ∈ A or an existing directed path <Vj, ..., Vx>, sets its variable temp to 1 in Step 13, 0 otherwise. */
Step 14. processor (i, j, k) configures as {NS}
Step 15. value(k, j, k) ← OR(temp(1, j, k), temp(2, j, k), ..., temp(n, j, k))
         /* The vertex Vk can reach the vertex Vj together with all the reachable vertices of Vj. The OR function is the well-known constant-time n-bit OR algorithm on a 1-D n-processor R-Mesh, so the detailed steps are omitted. */
Step 16. processor (i, j, k) configures as {UD}
Step 17. processor (k, j, k) writes value(k, j, k) and processor (i, j, k) reads from any one of its U or D ports into value(i, j, k)
         /* The processors on the k-th row of the k-th plane broadcast the updated reachability information to the other planes. */
Step 18. previous(i, j, k) ← current(i, j, k)
Step 19. current(i, j, k) ← value(i, j, k)

Step 20. if previous(k, j, k) = current(k, j, k) then temp(k, j, k) ← 1 else temp(k, j, k) ← 0
         /* Steps 18 to 20 compare the reachability information of each pair of vertices in the current round with that of the previous round. When temp = 0, the relation of this pair of vertices has changed: one vertex became reachable from the other in this round. */
Step 21. processor (i, j, k) configures as {EW}
Step 22. halt(k, j, k) ← AND(temp(k, 1, k), temp(k, 2, k), ..., temp(k, n, k))
         /* halt(k, j, k) ← 1 if and only if no relation changed in this round, 0 otherwise. The AND function is the well-known constant-time n-bit AND algorithm on a 1-D n-processor R-Mesh, so the detailed steps are omitted. */
Step 23. processor (i, j, k) configures as {NSEW}
Step 24. processor (k, 1, k) writes halt(k, 1, k) and processor (i, j, k) reads from any one of its ports into halt(i, j, k)
Step 25. processor (i, j, k) configures as {UD}
Step 26. } while (AND(halt(1, 1, 1), halt(1, 1, 2), ..., halt(1, 1, n)) ≠ 1)
         /* Have all vertices found all their reachable vertices? If yes, the algorithm stops. */
end
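The net effect of one pass of the do-while loop (Steps 8 to 17) is a Boolean squaring of the current reachability matrix, and Steps 18 to 26 detect the fixed point. A minimal sequential sketch of this behaviour (an illustration only, not the R-Mesh procedure; the arc list below is an assumption chosen to match the description of Figure 2.1, whose exact arc set lives in the figure):

```python
def transitive_closure(adj):
    """Repeated Boolean squaring of the adjacency matrix with 1s on the
    diagonal (the j = i case of Step 1); returns (closure, rounds)."""
    n = len(adj)
    m = [[1 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    rounds = 0
    while True:
        rounds += 1
        # One round (Steps 8-17): compose the current matrix with itself.
        nxt = [[1 if any(m[i][j] and m[j][k] for j in range(n)) else 0
                for k in range(n)] for i in range(n)]
        if nxt == m:          # Steps 18-26: nothing changed, so halt.
            return m, rounds
        m = nxt

# Assumed arcs consistent with the text's description of Figure 2.1
# (SCCs {V1,V2,V4,V5}, {V3}, {V6}; cycles <V1,V2,V5,V1>, <V1,V4,V5,V1>).
arcs = [(1, 2), (2, 5), (5, 1), (1, 4), (4, 5), (5, 6), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for u, v in arcs:
    adj[u - 1][v - 1] = 1     # vertices are 1-based in the text

tc, rounds = transitive_closure(adj)
```

On this digraph the diameter is 4 (the path V4 to V3), so the loop squares twice and then detects the fixed point in a third pass, matching the O(log d(D)) round count; rows 1, 2, 4, and 5 of the closure come out identical, as expected for one strongly connected component.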

(42) When applying our transitive closure algorithm to the example in Figure 2.1, we use Figure 2.2 to explain how our algorithm works.. 30.

[Figure 2.2, panels (a)-(d), first round, planes k = 1 to 6:
(a) Steps 8 to 9: The arcs or directed paths of the vertex Vk broadcast the signal to find the reachable vertices Vj. The thick nodes are the senders and the diagonal nodes of each plane are the receivers.
(b) Step 10: The diagonal processors set their select(j, j, k) to 0 or 1; the black diagonal nodes set select(j, j, k) ← 1 and the other diagonal nodes set select(j, j, k) ← 0.
(c) Steps 11 to 12: The integers in the diagonal thick nodes are the values to be sent in Step 12.
(d) Step 13: Set temp(i, j, k) to 0 or 1. The thick nodes set temp(i, j, k) ← 1, temp(i, j, k) ← 0 otherwise.]

[Figure 2.2, panels (e)-(h):
(e) Steps 14 to 15: The specified vertex Vk merges all its reachable vertices. The thick white nodes on the k-th row of the k-th plane are the newly merged reachable vertices.
(f) Steps 16 to 17: Update the adjacency matrix. The thick nodes send their value(i, j, k) to each k-th plane via {UD}.
(g) Steps 18 to 26: The previous adjacency matrix on the left is not equal to the current adjacency matrix on the right, so the algorithm executes the while loop again.
(h) Steps 8 to 9, second round: The arcs or directed paths of the vertex Vk broadcast the signal to find the reachable vertices Vj. The thick nodes are the senders and the diagonal nodes of each plane are the receivers.]

[Figure 2.2, panels (i)-(l):
(i) Step 10: The diagonal processors set their select(j, j, k) to 0 or 1; the black diagonal nodes set select(j, j, k) ← 1 and the other diagonal nodes set select(j, j, k) ← 0.
(j) Steps 11 to 12: The values in the diagonal thick nodes are the values to be sent in Step 12.
(k) Step 13: Set temp(i, j, k) to 0 or 1. The thick nodes set temp(i, j, k) ← 1, temp(i, j, k) ← 0 otherwise.
(l) Steps 14 to 15: The specified vertex Vk merges all its reachable vertices. The thick white nodes on the k-th row of the k-th plane are the newly merged reachable vertices.]

[Figure 2.2, panels (m)-(p):
(m) Steps 16 to 17: Update the adjacency matrix. The thick nodes send their value(i, j, k) to each k-th plane via {UD}.
(n) Steps 18 to 26: The previous adjacency matrix on the left is not equal to the current adjacency matrix on the right, so the algorithm executes the while loop again.
(o) Steps 8 to 17, third round: These steps are almost the same as in Figure 2.2(h) to Figure 2.2(m), so they are omitted.
(p) Steps 18 to 26: Because no new reachable vertex can be added in the next round, the algorithm finishes and we obtain the transitive closure of the example.]

Figure 2.2: Example of the transitive closure algorithm.

Theorem 2.2.1 The transitive closure problem on a digraph can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh, where d(D) is the diameter of the digraph D.

Proof The maximum distance among the shortest directed paths of digraph D is called the diameter of D. We are given a digraph D=(V, A) and assume its diameter is k.

When d(D)=0, no reachable vertex Vj of any vertex Vi can be found in Steps 8 to 10, since <Vi, Vj> ∉ A, and no reachable vertex Vk of vertex Vj can be found in Steps 11 to 13, since <Vj, Vk> ∉ A. (If <Vi, Vj> ∈ A, then d(D)=1, which contradicts our assumption.) In Steps 14 to 17, we update the adjacency matrix M by adding the new reachability information; the updated matrix is M^1 = A ∪ {<Vi, Vk>} and A^1 = M^1. Since M = M^1, Step 26 stops the algorithm. In this case, only one round is needed to find the transitive closure.

When d(D)=1, all reachable vertices Vj of any vertex Vi are found in Steps 8 to 10, since <Vi, Vj> ∈ A. No reachable vertex Vk of vertex Vj can be found in Steps 11 to 13, since <Vj, Vk> ∉ A. (If <Vi, Vj> ∈ A and <Vj, Vk> ∈ A, then <Vi, Vj, Vk> exists and d(D)=2, which contradicts our assumption.) In Steps 14 to 17, the adjacency matrix M is updated to M^1 = A ∪ {<Vi, Vk>} and A^1 = M^1. Since M = M^1, Step 26 stops the algorithm, so again one round suffices.

When d(D)=2, all reachable vertices Vj of any vertex Vi with <Vi, Vj> ∈ A are found in Steps 8 to 10, and new reachable vertices Vk of vertex Vj with <Vj, Vk> ∈ A are found in Steps 11 to 13: if <Vi, Vj> ∈ A and <Vj, Vk> ∈ A, then <Vi, Vj, Vk> exists and d(D)=2. In Steps 14 to 17, the adjacency matrix M is updated to M^1 = A ∪ {<Vi, Vk>} and A^1 = M^1. Since M ≠ M^1, Step 26 continues the while loop. In Steps 8 to 10 of the second round, we find all reachable vertices Vj of any vertex Vi with <Vi, Vj> ∈ A^1. No new reachable vertex Vk of vertex Vj can be found in Steps 11 to 13, since <Vj, Vk> ∉ A^1. (If <Vi, Vj> ∈ A^1 and <Vj, Vk> ∈ A^1, then <Vi, ..., Vj, Vk> exists and d(D)=3, which contradicts our assumption.) In Steps 14 to 17, the adjacency matrix M^1 is updated to M^2 = A^1 ∪ {<Vi, Vk>} and A^2 = M^2. Since M^1 = M^2, Step 26 stops the algorithm. In this case, two rounds are needed to find the transitive closure.

Now assume our algorithm can find the transitive closure of a digraph D with d(D)=k and k=2^r; all directed paths of D with lengths between 0 and 2^r are found correctly within (r + 1) rounds.

When d(D)=k+1=2^r+1, the first r rounds find all directed paths with lengths between 0 and 2^r. In Steps 8 to 10 of the (r + 1)-th round, we find all reachable vertices Vj of any vertex Vi, since <Vi, ..., Vj> ∈ A^r, and new reachable vertices Vk of vertex Vj are found in Steps 11 to 13, since <Vj, ..., Vk> ∈ A^r. If <Vi, ..., Vj> ∈ A^r and <Vj, ..., Vk> ∈ A^r, then <Vi, ..., Vj, ..., Vk> exists. In the extreme case, there are two directed paths P1 and P2 whose vertices coincide except for Vi and Vk, and the length of <Vi, ..., Vj, ..., Vk> is 2^r + 1. In Steps 14 to 17, the adjacency matrix M^r is updated to M^(r+1) = A^r ∪ {<Vi, Vk>} and A^(r+1) = M^(r+1). Since M^r ≠ M^(r+1), Step 26 continues the while loop. In the (r + 2)-th round the algorithm stops, since M^(r+1) = M^(r+2). The transitive closure is found in (r + 2) rounds.

For d(D)=k+2=2^r+2 up to d(D)=2k−1=2^(r+1)−1, the transitive closure is obtained in (r + 2) rounds as in the previous case. When d(D)=2k=2^(r+1), the first r rounds find all directed paths with lengths between 0 and 2^r. In Steps 8 to 10 of the (r + 1)-th round, we find all reachable vertices Vj of any vertex Vi, since <Vi, ..., Vj> ∈ A^r, and new reachable vertices Vk of vertex Vj are found in Steps 11 to 13, since <Vj, ..., Vk> ∈ A^r. If <Vi, ..., Vj> ∈ A^r and <Vj, ..., Vk> ∈ A^r, then <Vi, ..., Vj, ..., Vk> exists. In the extreme case, there are two directed paths P1 and P2 of length 2^r that are disjoint except for the shared vertex Vj, and the length of <Vi, ..., Vj, ..., Vk> is 2^(r+1). In Steps 14 to 17, the adjacency matrix M^r is updated to M^(r+1) = A^r ∪ {<Vi, Vk>} and A^(r+1) = M^(r+1). Since M^r ≠ M^(r+1), Step 26 continues the while loop. In the (r + 2)-th round the algorithm stops, since M^(r+1) = M^(r+2). The transitive closure is found in (r + 2) rounds.

The time complexity of each step of the algorithm is O(1), so the total time complexity is determined by the number of rounds. This proves that the transitive closure problem on a digraph can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh, where d(D) is the diameter of the digraph D.
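The doubling argument in this proof can be restated compactly in our notation, writing $\odot$ for the Boolean matrix product:

```latex
M^{(0)}_{ij} =
  \begin{cases}
    1 & \text{if } \langle V_i, V_j \rangle \in A \text{ or } j = i,\\
    0 & \text{otherwise},
  \end{cases}
\qquad
M^{(r+1)} = M^{(r)} \odot M^{(r)} .
```

Then $M^{(r)}_{ij} = 1$ exactly when a directed path of length at most $2^r$ leads from $V_i$ to $V_j$, so $M^{(r)}$ is stationary as soon as $2^r \ge d(D)$, and the do-while loop executes $\lceil \log_2 d(D) \rceil + O(1)$ rounds, i.e. $O(\log d(D))$ rounds in total.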

2.3 Conclusion

Before this study, there were only two approaches to solving digraph problems on reconfigurable architectures. The first approach is based on matrix multiplication, which is independent of the models and can solve the algebraic path problems (APP) such as the transitive closure, all-pairs shortest path, and minimum spanning tree problems. The second approach uses directional reconfigurable architectures, such as the DR-Mesh, D-PARBS, CD-PARBS, and directional RMBM. The algorithms of the second approach rely on the capability of directional reconfigurable architectures to configure their buses to suit the problems. Our algorithm is a third approach, based on basic graph theory. We use the transitivity rule on arcs and apply it to directed paths as well: if <Vi, ..., Vj, ...> and <Vj, ..., Vk> exist, then the directed path <Vi, ..., Vj, ..., Vk> exists. This gives us an innovative approach for solving the transitive closure problem on non-directional reconfigurable architectures.

The well-known breadth-first search (BFS) and depth-first search (DFS) are the most popular graph search algorithms. Breadth-first search begins at a specified node and explores all its neighboring nodes; the search stops when all nodes are explored. Depth-first search also starts at a specified node but explores as far as possible until there is no reachable node, then backtracks to the most recent node, until all nodes are explored. In Steps 8 to 10 of the first round of the algorithm, as shown in Figure 2.2(a) and (b), the actual arcs of a specified vertex are used to find all its reachable vertices; the distances between the specified vertex and these reachable vertices are all one. From these steps, we see that our transitive closure algorithm is similar to the breadth-first search (BFS) algorithm, especially at the beginning. In Steps 11 to 15, as shown in Figure 2.2(c), (d), and (e), the reachable vertices of different specified vertices are merged by the rule we gave in the first section of this chapter. In Theorem 2.2.1, we prove that the maximum distance covered in each round can be twice that of the previous round. From these steps, we believe that our algorithm is similar to the depth-first search (DFS) algorithm. The fastest known algorithm [12] solves the transitive closure problem in O(1) time on a 3-D n×n×n CD-PARBS, a directional model. That algorithm is based on the s-t connectivity algorithm rather than depth-first search, so DFS remains difficult to parallelize even on reconfigurable architectures. Our O(log d(D))-time algorithm is therefore quite a good result on non-directional models. We summarize the previous results and ours in Table 2.1.

Table 2.1: Summary of previous and our results for Chapter 2.

Problem | Previous results | Our results
------- | ---------------- | -----------
Transitive closure | On a sequential model, O(n^3) time, Warshall [15]; on a 3-D n×n×n PARBS model, O(log n) time, Chen [22]; on a 3-D n×n×n CD-PARBS model, O(1) time, Kuo [12] | On a 3-D n×n×n R-Mesh model, O(log d(D)) time

Chen's [22] algorithm and ours are both executed on non-directional reconfigurable architectures, but ours is faster. Kuo's algorithm is a cost-optimal solution, but the

CD-PARBS is an idealistic model. Our algorithm can be used on both directional and non-directional models. Our proposed algorithm not only gives us a new approach to solving digraph problems on non-directional models, but also provides a basic idea for designing algorithms for digraph problems that have never been studied. In the following chapters, we solve different digraph problems by designing algorithms based on the same idea.

(53) Chapter 3. Transitive Closure Related Algorithms. In this chapter, we will propose algorithms to solve some related digraph problems and these algorithms are based on the idea of our transitive closure algorithm. These algorithms show the capability of our idea and we believe that it would be an important fundamental algorithm for solving other digraph problems in the future.. 3.1. Introduction. In this chapter, we will extend our transitive closure algorithm to solve some related digraph problems. By adding extra steps at the end of our transitive closure algorithm, strongly connected digraph and strongly connected components can be solved easily. Other problems such as cyclic checking and tree construction can also be solved by modifying a few steps in our transitive closure algorithm. Because most steps are similar, the details will be omitted. But the main differences among these algorithms will be explained below. In the cyclic checking algorithm, its time complexity can be further reduced to O(log c(D)) time, c(D) is the minimum distance of directed cycles in a digraph. The tree construction algorithm can build a tree from the specified root vertex to other vertices. If b(Vs , Vt ) is the distance between the source vertex Vs and the destination vertex Vt , then vertex Vt with b(Vs , Vt )=1, 2, . . ., and log d(D) will be added sequentially.. 41.

(54) 3.2. Strongly Connected Digraph Algorithm. A strongly connected digraph is a digraph that has at least one directed path between any pair of vertices. According to the definition, its transitive closure will be a matrix whose elements are all one, every pair of vertices has at least one directed path between them. After we got the transitive closure of a digraph, we can use the well-known constant-time n-bit AND algorithm on the rows and columns to decide the final result, for example AN D(AN D(value(1, 1, 1), value(1, 2, 1), . . . , value(1, n, 1)), AN D(value(2, 1, 1), value(2, 2, 1), . . . , value(2, n, 1)), . . . , AN D(value(n, 1, 1), value(n, 2, 1), . . . , value(n, n, 1))). So the strongly connected digraph problem can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh. Figure 3.1 shows a counterexample of strongly connected digraph.. j. i. Figure 3.1: A digraph has three strongly connected components. Theorem 3.2.1 The strongly connected graph problem on digraph can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh, where d(D) is the diameter 42.

(55) of the digraph D. Proof The proof of this theorem is similar to Theorem 2.2.1, so the detail steps are omitted.. 3.3. Strongly Connected Component Algorithm. Assume a digraph has three strongly connected components R1 , R2 , and R3 as shown in Figure 3.2. The thick arcs connected these three components, so they form a loop. All vertices in R1 can reach the vertices in R2 via the arc< V1 , V5 >and the vertices in R3 via the arc< V2 , V4 >. The similar results can be found in R2 and R3 , so R1 , R2 , and R3 are in the same strongly connected component. Hence, by the definition of strongly connected component, these three strongly connected components should be grouped and viewed as a single strongly connected component.. From above discussion, we know that there should not have loops among the strongly connected components of a digraph. Our transitive closure algorithm will find the relations for all vertices of the digraph. All the vertices in a strongly connected component have the same relations of reachability. For example, assume that two vertices Vi and Vj are in the same strongly connected component R. If Vi has a directed path S which connects Vi to Vk , then Vj can also reach Vk via the arcs in R and the directed path S. Hence, the two vertices Vi and Vj in the strongly connected component will have the same relations of reachability. We can use this feature to design the algorithm for solving the strongly connected component problem based on the transitive closure algorithm. The strongly connected components of a digraph are its maximal strongly connected 43.

(56) V1. V5. R1. V2. R2. V6. V4 V3. R3. Figure 3.2: A digraph has three strongly connected components. sub-digraphs. If each strongly connected component is contracted to a single vertex, the resulting digraph is a directed acyclic graph (DAG). A digraph is acyclic if and only if it has no strongly connected sub-digraph which has more than one vertex in it. Because the condensation of digraph is a directed acyclic graph, the vertices in different strongly connected components have different reachable vertices. After we got the transitive closure, we can use the k-th plane for the k-th row to find the rows which have the same reachable vertices. For example k=4, the processors on the 4-th plane will be configured as { NS }, the processors on the 4-th row will broadcast their value to their columns. After the data broadcasting phase, all processors on the 4-th plane will compare their values with the received value, and set the flags to 1 if they are the same, 0 otherwise. We can use the well-known constant-time n-bit AND algorithm on the rows to find the rows with the. 44.

(57) same reachable vertices in O(1) time. After we got the rows, we can find the row with the minimum row index and broadcast the index to other rows as their group number in O(1) time on a 2-D n×n non-directional reconfigurable mesh. In Figure 3.4, rows 1, 2, 4, and 5 have the same reachable vertices, so the vertices V1 , V2 , V4 , and V5 are in the same strongly connected component and their group number are 1. Rows 3 and 6 have no row with the same reachable vertices, so the vertices V3 and V6 are independent strongly connected components and their group numbers are 3 and 6, respectively. The example in Figure 3.3 has three strongly connected component, { V1 , V2 , V4 , V5 }, { V3 }, and { V6 }. Hence, the strongly connected component problem can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh.. V1. V2. V3. V4. V5. V6. Figure 3.3: A digraph example. 45.

[Figure 3.4: A digraph with three strongly connected components (reachability matrix, axes i and j).]

Theorem 3.3.1 The strongly connected component problem on a digraph can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh, where d(D) is the diameter of the digraph D.

Proof The proof of this theorem is similar to that of Theorem 2.2.1, so the detailed steps are omitted.

3.4 Cyclic Graph Check Algorithm

As described in the previous section, a digraph is acyclic if and only if it has no strongly connected sub-digraph with two or more vertices. In other words, a cyclic digraph must have at least one strongly connected component with two or more vertices. After we obtain the strongly connected components, we can check the size of each component to decide the final result. This can be done by adding n bits in O(1) time on a 2-D n×n non-directional reconfigurable mesh. For example, the

vertices V1, V2, V4, and V5 are in the same strongly connected component and their group number is 1 on the 4-th plane. The vertex V4 broadcasts its group number to all processors, and only the diagonal processors with the same group number set their variable temp to 1. The size of a strongly connected component is the sum of the diagonal processors' temp variables; if the sum is greater than 1, temp is set to 1, and to 0 otherwise. By using the O(1)-time n-bit OR algorithm on a 1-D n-processor R-Mesh, we can determine whether any component has size greater than 1. Hence, the cyclic graph checking problem can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh.

Theorem 3.4.1 The cyclic graph checking problem on a digraph can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh, where d(D) is the diameter of the digraph D.

Proof The proof of this theorem is similar to that of Theorem 2.2.1, so the detailed steps are omitted.

In addition to the previous algorithm, we have another algorithm that solves this problem faster. In Chapter 2, our main idea is to merge the directed paths of digraph D by the rule: if <Vi, ..., Vj, ...> and <Vj, ..., Vk> exist, then the directed path <Vi, ..., Vj, ..., Vk> exists. If Vi = Vk, then there is a cycle. For example, in the left of Figure 3.5, processor (4, 5, 4) is merged by <4, ..., 5>, processor (4, 1, 4) is merged by <5, ..., 1>, and processor (4, 6, 4) is merged by <5, ..., 6>. The white thick nodes in the right of Figure 3.5, processor (4, 2, 4) and processor (4, 3, 4), are merged by <1, ..., 2>, <5, ..., 2>, and <1, ..., 3>. Note that the 4-th plane is used to obtain the reachable vertices of the vertex V4. The 4-th column in the right of Figure 3.5 shows that <1, ..., 4> and <5, ..., 4> reach V4 again. This tells us that we can find at least

one directed path that starts from V4, visits some vertices, and stops at V4 again. Let c(D) denote the minimum length of a cycle in the digraph D; then the algorithm obtains the result in O(log c(D)) time on a 3-D n×n×n non-directional reconfigurable mesh.

Theorem 3.4.2 The cyclic graph checking problem on a digraph can be solved in O(log c(D)) time on a 3-D n×n×n non-directional reconfigurable mesh, where c(D) is the minimum length of a cycle in the digraph D.

Proof The proof of this theorem is similar to that of Theorem 2.2.1, so the detailed steps are omitted.

[Figure 3.5: Example for our faster cyclic checking and tree construction algorithms.]
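A sequential sketch of this faster check, again in Python for illustration only: square the Boolean reachability matrix round by round, exactly as in the merging rule, and stop as soon as some diagonal entry turns on, i.e. some Vi reaches itself. A cycle of length c(D) appears on the diagonal after about log2 c(D) rounds, which mirrors the O(log c(D)) bound.

```python
def has_cycle(adj):
    # One round of "squaring" merges <Vi,...,Vj> with <Vj,...,Vk> into
    # <Vi,...,Vk>; a diagonal entry reach[i][i] means Vi = Vk, a cycle.
    n = len(adj)
    reach = [row[:] for row in adj]  # paths of length exactly 1
    for _ in range(n.bit_length()):  # ceil(log2 n) rounds suffice
        if any(reach[i][i] for i in range(n)):
            return True
        reach = [[reach[i][j] or any(reach[i][k] and reach[k][j]
                                     for k in range(n))
                  for j in range(n)] for i in range(n)]
    return any(reach[i][i] for i in range(n))
```

After r rounds the matrix covers all directed paths of length up to 2^r, so the shortest cycle is detected once 2^r ≥ c(D), i.e. after ⌈log2 c(D)⌉ rounds; the early return realizes the speedup over the diameter-based bound.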

3.5 Tree Construction Algorithm

Given a digraph D, we can construct trees from any specified vertex to all other vertices. In Figure 3.5, the numbers in both 4-th rows are the indexes of the parents, and we construct the tree rooted at vertex V4 as shown in Figure 3.6. For example, the white thick nodes in the right of Figure 3.5, processor (4, 2, 4) and processor (4, 3, 4), are merged by <1, ..., 2>, <5, ..., 2>, and <1, ..., 3>. The paths <1, ..., 2> and <5, ..., 2> provide the directed paths to the vertex V2, but we do not know which arc is actually used. We can find the actual arc with the minimum row index in column 2 by modifying the well-known constant-time n-bit OR algorithm so that the signals sent are the row indexes; then <1, 2> is the actual arc and the parent of V2 is V1. From <1, ..., 3>, we find that the parent of V3 is V2. In the right of Figure 3.5, V4 can reach V3 via <1, ..., 3>. We do not know how V1 reaches V3, but we do know that the reachable vertices of V1 are V2, V3, V4, and V5. So the reachability information of the first row can be transferred to the third column; the new adjacency matrix is called C. By AND(A, C), the logical AND of the corresponding elements in A and C, we find the actual arcs that can reach the vertex V3. Hence, the tree construction problem can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh.

[Figure 3.6: Example of our tree construction algorithm.]
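The parent selection described above can be sketched sequentially as follows (Python, illustration only; the arc set used below is an assumption chosen to be consistent with the worked example, in which the parent of V2 is V1 and the parent of V3 is V2). Each vertex reachable from the root takes as its parent the minimum-index reachable vertex that has an arc into it, which is what the modified n-bit OR algorithm computes in O(1) time per column.

```python
def build_tree(n, arcs, root):
    # Sketch of the tree construction idea: v's parent is the
    # minimum-index vertex u that the root can reach (or the root
    # itself) and that has an arc <u, v>; this mirrors the
    # minimum-row-index arc selection in the text.
    adj = [[False] * (n + 1) for _ in range(n + 1)]  # 1-based vertices
    for u, v in arcs:
        adj[u][v] = True
    # Reachable-from-root set via a simple DFS (the R-Mesh algorithm
    # uses the root's transitive closure row instead).
    reach, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in range(1, n + 1):
            if adj[u][v] and v not in reach:
                reach.add(v)
                stack.append(v)
    parent = {root: None}
    for v in reach - {root}:
        parent[v] = min(u for u in reach if adj[u][v])
    return parent
```

With the assumed arcs 4→5, 5→1, 5→6, 1→2, 5→2, 2→3 and root V4, this yields parent(V5)=V4, parent(V1)=V5, parent(V2)=V1, parent(V3)=V2, and parent(V6)=V5, matching the example. Note that this minimum-index rule, like the text, leaves the choice among multiple valid in-arcs deterministic.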

Theorem 3.5.1 The tree construction problem on a digraph can be solved in O(log d(D)) time on a 3-D n×n×n non-directional reconfigurable mesh, where d(D) is the diameter of the digraph D.

Proof The proof of this theorem is similar to that of Theorem 2.2.1, so the detailed steps are omitted.

3.6 Conclusion

In Chapter 2, we proposed a transitive closure algorithm for digraphs on the non-directional reconfigurable mesh. Several related digraph problems are solved in this chapter by modifying our transitive closure algorithm. All these algorithms show that such digraph problems admit algorithms on the non-directional model. The tree construction algorithm also shows that trees with a particular ordering of neighboring arcs can be found in O(log d(D)) time on the same model. The s-t connectivity approach, by contrast, cannot distinguish vertices at different distances: because the signal is broadcast on a bus, all vertices on the bus receive it at the same time, so algorithms based on that approach must add extra steps to handle this issue. From our algorithms, we know that the time complexity of most digraph problems is related to the diameter of the digraph D, where 1 ≤ d(D) ≤ n. For some specific digraph problems, the time complexity can be even lower than O(log d(D)). For example, the cyclic graph checking problem needs only O(log c(D)) time, where 2 ≤ c(D) ≤ d(D) + 1. The diameters of most digraphs are less than n, where D = (V, A), |V| = n, and |A| = m. In previous research, algorithms for dense digraphs usually have higher time complexities than their sparse counterparts. In this study, however, we believe that the time complexities of some

digraph algorithms are related only to the diameter of the digraph, and more arcs shorten the diameter with higher probability. We summarize the previous results and ours in Table 3.1.

Table 3.1: Summary of previous and our results for Chapter 3.

| Problem | Previous results | Our results |
| --- | --- | --- |
| Strongly Connected Digraph | On 3-D n×n×n CD-PARBS model, O(1) time, Kuo [12] | On 3-D n×n×n R-Mesh model, O(log d(D)) time |
| Strongly Connected Component | On 3-D n×n×n CD-PARBS model, O(1) time, Kuo [12] | On 3-D n×n×n R-Mesh model, O(log d(D)) time |
| Cyclic Graph | On 3-D n×n×n CD-PARBS model, O(1) time, Kuo [12] | On 3-D n×n×n R-Mesh model, O(log c(D)) time |
| Tree Construction | N/A | On 3-D n×n×n R-Mesh model, O(log d(D)) time |
