Optimal All-to-All Personalized Exchange in General Shuffle-Exchange Networks


National Chiao Tung University

Department of Applied Mathematics

Master's Thesis

Optimal All-to-All Personalized Exchange in General Shuffle-Exchange Networks

Student: 陳柏澍 (Richard B. Chen)

Advisor: Prof. 陳秋媛 (Chiuyuan Chen)

June 2007

Optimal All-to-All Personalized Exchange in General Shuffle-Exchange Networks

Student: Richard B. Chen (陳柏澍)

Advisor: Chiuyuan Chen (陳秋媛)

A Thesis

Submitted to Department of Applied Mathematics

College of Science

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Master

in

Applied Mathematics

June 2007

Hsinchu, Taiwan, Republic of China

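Optimal All-to-All Personalized Exchange in General Shuffle-Exchange Networks

Student: 陳柏澍 (Richard B. Chen)

Advisor: Prof. 陳秋媛 (Chiuyuan Chen)

Department of Applied Mathematics, National Chiao Tung University

Abstract (translated from the Chinese)

All-to-all personalized exchange communication arises in many applications of parallel and distributed processing systems. In [14], Yang and Wang used the Latin square technique to propose an optimal all-to-all personalized exchange algorithm with time complexity $O(N)$ for multistage interconnection networks that are unique-path and self-routable. All the networks discussed in [14] (including shuffle-exchange networks) satisfy $N = 2^{n+1}$, where $N$ is the number of inputs and outputs of the multistage network and $n + 1$ is its number of stages. It is worth noting that Yang and Wang’s algorithm requires all the switches in each stage of the network to be in the same state; in other words, it uses the stage control technique. In [9], Padmanabhan proposed the general shuffle-exchange network, in which $2^n < N \le 2^{n+1}$ and $N = 2^{n+1}$ is no longer required. Since a general shuffle-exchange network does not necessarily have the unique-path property, Yang and Wang’s algorithm cannot be used. The purpose of this thesis is to propose two optimal all-to-all personalized exchange algorithms for general shuffle-exchange networks. Unlike Yang and Wang’s algorithm, our algorithms do not use Latin squares and do not require the network to have the unique-path property. Our first algorithm uses the stage control technique and works for any $N$; we prove that it is optimal when stage control is required and $2^{n-1} + 2^n \le N \le 2^{n+1}$. Our second algorithm does not use the stage control technique and works only when $N = 2^n + 2$; we prove that it is optimal.

Keywords: multistage interconnection network, parallel and distributed computing, all-to-all communication, all-to-all personalized exchange.

June 2007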

Optimal All-to-All Personalized Exchange

in General Shuffle-Exchange Networks

Student: Richard B. Chen

Advisor: Chiuyuan Chen

Department of Applied Mathematics National Chiao Tung University

Abstract

All-to-all personalized exchange communication has been widely applied in many parallel and distributed processing applications. In [14], by the Latin square method, Yang and Wang proposed an optimal all-to-all personalized exchange algorithm for the unique-path, self-routable multistage interconnection networks (MINs). All the networks considered in [14], including the famous shuffle-exchange networks, satisfy $N = 2^{n+1}$, in which $N$ is the number of inputs (outputs) and $n + 1$ is the number of stages of the network. Note that Yang and Wang’s algorithm requires the states of all the switches of a stage to be identical; i.e., the stage control technique is used. In [9], Padmanabhan proposed the general shuffle-exchange network (GSEN) with $2^n < N \le 2^{n+1}$. Since a GSEN is not necessarily a unique-path MIN, Yang and Wang’s algorithm may not apply. The purpose of this paper is to propose two optimal all-to-all personalized exchange algorithms for GSENs. Unlike Yang and Wang’s algorithm, we abandon the Latin square method and the requirement on the unique-path property. The first algorithm uses the stage control technique and works for arbitrary $N$. We will prove it is optimal when the stage control technique is assumed for $2^{n-1} + 2^n \le N \le 2^{n+1}$. In contrast, the second algorithm does not use the stage control technique and works only for $N = 2^n + 2$. We will prove that it is optimal.

Keywords: multistage interconnection network, parallel and distributed computing, all-to-all communication, all-to-all personalized exchange.

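Acknowledgement

Time flies, and two years have passed in the blink of an eye. I still remember being admitted to the Department of Applied Mathematics at National Chiao Tung University, a department with a strong faculty in combinatorics, and entering it full of anticipation.

The combinatorics group has excellent faculty and a united, harmonious atmosphere. In this environment, my classmates and I not only got along well in daily life but also supported one another in our studies. In this short time, my teachers' instruction broadened my horizons. I thank Prof. 陳秋媛 for the courses on algorithms, Prof. 傅恆霖 for the graph theory course, Prof. 翁志文 for the courses on combinatorial coding, and Prof. 黃大原 for the courses on design theory. They taught not only the theory but also its related applications.

The teacher I am most grateful to is my advisor, Prof. 陳秋媛. She has been not only a teacher but also a friend. Academically, she taught me a great deal; in daily life, she was like a good friend who helps in times of need. She also gave me much guidance on how to deal with people and handle matters.

I also thank Prof. 單維彰 of National Central University. He was my undergraduate mentor and taught me many subjects. Thanks to him, I have done well not only in mathematics but also in computer knowledge and skills.

I thank my classmates of the same year: 介友, 宜廷, 文強, 澍仁, 國安, 張圳, 雁婷, and 妙玲. Your help in daily life and in coursework left me with wonderful memories of these two years. I also thank my fellow students 威雄, 鈺傑, 子鴻, and 志文, and the senior doctoral students 國元 and 宏賓. Your company made my life rich and colorful.

Finally, I thank my parents, who raised and nurtured me and have continually encouraged and supported me. You are the greatest force behind my success. My gratitude goes beyond this; these few words can only stand in for what is in my heart.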

Contents

Abstract (in Chinese)

Abstract (in English)

Acknowledgement

Contents

List of Figures

1 Introduction

2 Some preliminaries

3 All-to-all personalized exchange in GSENs with stage control

4 All-to-all personalized exchange in GSENs with $N = 2^n + 2$

List of Figures

1 Communications among processors using an MIN.

2 A 10 × 10 MIN which is also a 10 × 10 GSEN.

3 The states of a 2 × 2 switch.

4 The network configuration of the GSEN in Figure 2.

5 An example of phase 1 of Algorithm GSEN-ATA-2.

1 Introduction

Processors in a parallel and distributed processing system often need to communicate with other processors. The communication among these processors could be one-to-one, one-to-many, or all-to-all. In particular, all-to-all communication can be further classified into all-to-all broadcast and all-to-all personalized exchange. In all-to-all broadcast, each processor sends the same message to all other processors, while in all-to-all personalized exchange, each processor sends a specific message to every other processor. This paper focuses on all-to-all personalized exchange.

All-to-all personalized exchange occurs in many important applications (for example, matrix transposition and fast Fourier transform (FFT)) in parallel and distributed computing. Since a processor can send only one message in each time unit, the time to complete all-to-all personalized exchange is Ω(N), where N is the number of processors in the given network. The all-to-all personalized exchange problem has been extensively studied for hypercubes, meshes, and tori; see [8, 14] for details. As was mentioned in [14], although the algorithm for a hypercube achieves optimal time complexity, a hypercube suffers from unbounded node degrees and therefore has poor scalability. On the other hand, although a mesh or torus has a constant node degree and better scalability, its algorithm has a higher time complexity [14]. An MIN (defined later) is considered to be a better choice for implementing all-to-all personalized exchange due to its shorter communication delay and better scalability.

Given $N$ processors $P_0, P_1, \ldots, P_{N-1}$, an $N \times N$ multistage interconnection network (MIN) can be used for communication among these processors as shown in Figure 1, where $N \times N$ means $N$ inputs and $N$ outputs. Figure 2 shows an example of a $10 \times 10$ MIN. A column in an MIN is called a stage and the nodes in an MIN are called switches (or switching elements or crossbars). Throughout this paper, an MIN means an $N \times N$ MIN built from $2 \times 2$ switches; see [1, 2, 3, 5, 7] for switches of other sizes. It is well known that a $2 \times 2$ switch has only two possible states: straight or cross, as shown in Figure 3.

Figure 1: Communications among processors using an MIN. (Processors $P_0, P_1, \ldots, P_{N-1}$ are attached to inputs $I_0, \ldots, I_{N-1}$ and outputs $O_0, \ldots, O_{N-1}$ of an $N \times N$ MIN.)

Figure 2: A 10 × 10 MIN which is also a 10 × 10 GSEN.

Obviously, it is meaningless to consider a network that does not have a path between an arbitrary pair of input and output. An MIN is unique-path if there is a unique path between each pair of input and output. An MIN is self-routable if the routing decision at a switch depends only on the addresses of the source and the destination. In [14], Yang and Wang proposed an optimal all-to-all personalized exchange algorithm for a class of unique-path, self-routable MINs.

Yang and Wang’s algorithm [14] uses stage control (see [10]), which is a commonly used technique to reduce the cost of the network setting for all-to-all personalized exchange communication. Stage control means that the states of all the switches of a stage have to be identical. With stage control, a single control bit (0 for straight and 1 for cross), or in other words, one electronic driver circuit, can be used to control all the switches of a stage. Thus the number of expensive electronic driver circuits needed is significantly lower than that of individual switch control.

Figure 3: The states of a 2 × 2 switch (straight and cross), with input subports $i_0$, $i_1$ and output subports $o_0$, $o_1$.

Throughout this paper, $N$ denotes the number of processors in a given MIN and $n + 1$ is the number of stages in a given MIN. Since each switch is of size $2 \times 2$, $N$ is an even integer. All the networks considered in [14], including the famous shuffle-exchange networks, satisfy $N = 2^{n+1}$. Shuffle-exchange networks have been proposed as a popular architecture for MINs; see [4, 5, 6, 9, 11]. In [9], Padmanabhan proposed the general shuffle-exchange network (GSEN) with $2^n < N \le 2^{n+1}$. The $N$ terminals in an $N \times N$ GSEN are numbered $0, 1, \ldots, N - 1$ and the shuffle-exchange operation on $N$ terminals is the permutation $\pi$ defined by

$$\pi(i) = \left(2i + \left\lfloor \frac{2i}{N} \right\rfloor\right) \bmod N, \quad 0 \le i \le N - 1.$$

See Figure 2 for an example. In the remaining part of this paper, we will simply use a GSEN to denote an $N \times N$ GSEN. Notice that in a shuffle-exchange network, $N = 2^{n+1}$, while in a GSEN, $2^n < N \le 2^{n+1}$.
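As a quick check of the definition of $\pi$, the following Python sketch tabulates the permutation for $N = 10$, the size used in Figure 2. The function name `shuffle` is our own choice and does not come from the thesis.

```python
def shuffle(i: int, N: int) -> int:
    """Shuffle-exchange permutation pi(i) = (2i + floor(2i / N)) mod N."""
    return (2 * i + (2 * i) // N) % N


N = 10
pi = [shuffle(i, N) for i in range(N)]
print(pi)  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
```

The first half of the terminals is sent to the even positions and the second half to the odd positions, which is the usual perfect-shuffle pattern extended to an even $N$ that need not be a power of two.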

Although Yang and Wang’s algorithm [14] is optimal, it works only for unique-path MINs. Since a GSEN is not necessarily a unique-path MIN, Yang and Wang’s algorithm may not apply. Besides, Yang and Wang’s algorithm requires constructing a Latin square in advance and allocating memory for storing the Latin square. In [14], the time for constructing the Latin square is not counted in the optimal $O(N)$ communication delay. The purpose of this paper is to propose two optimal all-to-all personalized exchange algorithms for GSENs. Unlike Yang and Wang’s algorithm, we abandon the Latin square method and the requirement on the unique-path property. The first algorithm uses the stage control technique and works for arbitrary $N$. We will prove it is optimal when the stage control technique is assumed for $2^{n-1} + 2^n \le N \le 2^{n+1}$. In contrast, the second algorithm does not use the stage control technique and works only for $N = 2^n + 2$. We will prove that it is optimal.

This paper is organized as follows: Section 2 gives some preliminaries. Section 3 presents our first all-to-all personalized exchange algorithm. Section 4 presents our second all-to-all personalized exchange algorithm. Concluding remarks are given in the final section.

2 Some preliminaries

In a GSEN, the switches are aligned in $n + 1$ stages: stage 0, stage 1, $\ldots$, stage $n$. Each stage $\ell$ consists of $N/2$ switches denoted as $s^\ell_0, s^\ell_1, \ldots, s^\ell_{N/2-1}$, and $s^\ell_{(i+1) \bmod N/2}$ is considered to be the successive switch of $s^\ell_i$.

The network configuration of an MIN is defined by the states of its switches. Since a GSEN has $\frac{N}{2} \times (n + 1)$ switches, the network configuration of a GSEN can be represented by an $\frac{N}{2} \times (n + 1)$ matrix in which each entry is defined by the state of its corresponding switch. And, when the stage control technique is used, the network configuration of a GSEN can be represented by a number between 0 and $2^{n+1} - 1$. For example, the network configuration of the GSEN in Figure 2 can be represented by the matrix in Figure 4 or by the number 10, which is $(1010)_2$.
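To make the two representations concrete, here is a minimal Python sketch that expands a stage-control number into the matrix form. It assumes the convention used by the algorithms of Section 3, namely that stage $j$ is driven by bit $x_{n-j}$ of the number; the helper name `config_matrix` is hypothetical.

```python
def config_matrix(x: int, N: int, n: int) -> list[list[int]]:
    """Expand stage-control number x into an (N/2) x (n+1) 0-1 matrix.

    Column j holds the state of every switch of stage j
    (0 = straight, 1 = cross), taken from bit x_{n-j} of x.
    """
    stage_bits = [(x >> (n - j)) & 1 for j in range(n + 1)]
    # Under stage control every switch of a stage shares the same state,
    # so all N/2 rows are identical.
    return [list(stage_bits) for _ in range(N // 2)]


for row in config_matrix(10, N=10, n=3):
    print(row)  # each of the 5 rows is [1, 0, 1, 0]
```

For the number 10 this reproduces the matrix of Figure 4, in which every row reads 1 0 1 0.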

$$\begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix}$$

Figure 4: The network configuration of the GSEN in Figure 2.

A permutation of an MIN is a one-to-one mapping between the inputs and outputs. For an MIN, if there is a permutation that maps input $i$ to output $p(i)$, where $p(i) \in \{0, 1, \ldots, N - 1\}$ for $i = 0, 1, \ldots, N - 1$, then we will use

$$\begin{pmatrix} 0 & 1 & \cdots & N - 1 \\ p(0) & p(1) & \cdots & p(N - 1) \end{pmatrix} \quad \text{or simply} \quad p(0)\ p(1)\ \cdots\ p(N - 1)$$

to denote the permutation. Given a network configuration of an MIN, a permutation can be obtained. For example, the network configuration shown in Figure 2 maps input 0 to output 1, input 1 to output 4, input 2 to output 6, $\ldots$, and input 9 to output 8; this configuration obtains the permutation

1 4 6 0 7 2 9 3 5 8.
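The permutation above can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions rather than the thesis's own code: it assumes that a shuffle $\pi$ precedes every stage, that switch $s$ of a stage owns ports $2s$ and $2s + 1$, and that stage $j$ is set by bit $x_{n-j}$ of the configuration number. The names `shuffle` and `route` are ours.

```python
def shuffle(i: int, N: int) -> int:
    """Shuffle-exchange permutation pi(i) = (2i + floor(2i / N)) mod N."""
    return (2 * i + (2 * i) // N) % N


def route(i: int, x: int, N: int, n: int) -> int:
    """Output reached by input i under stage-control configuration x."""
    port = i
    for j in range(n + 1):
        port = shuffle(port, N)      # wiring in front of stage j
        port ^= (x >> (n - j)) & 1   # a cross switch swaps its two ports
    return port


perm = [route(i, 10, N=10, n=3) for i in range(10)]
print(perm)  # [1, 4, 6, 0, 7, 2, 9, 3, 5, 8]
```

Under these assumptions, configuration 10 on the $10 \times 10$ GSEN yields exactly the permutation quoted in the text, which suggests the model matches the network of Figure 2.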

Permutations realizable by an MIN are called admissible permutations. Not all of the $N!$ permutations are realizable by an MIN. For example, the identity permutation is not realizable by the MIN in Figure 2.

An $N \times N$ Latin square is an $N \times N$ matrix $A = (a_{i,j})$, $i, j = 0, 1, \ldots, N - 1$, such that the entries $a_{i,j}$ are in the set $\{0, 1, \ldots, N - 1\}$ and no two entries in a row or a column are identical. In [14], Yang and Wang found that, to realize all-to-all personalized exchange for a unique-path, self-routable MIN, one only needs to arrange $N$ network configurations so that their corresponding admissible permutations form an $N \times N$ Latin square. By using this Latin square method, Yang and Wang [14] proposed an optimal all-to-all personalized exchange algorithm for a class of unique-path, self-routable MINs; see also [7, 8, 12, 13, 15].

In this paper, ⊕ denotes the XOR operation. As a reference, 0 ⊕ 0 = 0, 0 ⊕ 1 = 1, 1 ⊕ 0 = 1, 1 ⊕ 1 = 0.

3 All-to-all personalized exchange in GSENs with stage control

In a GSEN, the messages are transmitted in a pipelining pattern. In the following, a round means a process to transmit all the messages of the current stage to the next stage. Before proposing our first all-to-all personalized exchange algorithm for GSENs, we will prove that when $2^n + 2^{n-1} \le N \le 2^{n+1}$ and the stage control technique is used, at least $2^{n+1} + n$ rounds are required to complete all-to-all personalized exchange in a GSEN. The following lemma plays an important role in the remaining proofs.

Lemma 1. If the network configuration $x$ maps input 0 to output $j$, then the network configuration $(x + 2^n) \bmod 2^{n+1}$ maps input $N/2$ to the same output $j$. Moreover, $x$ and $(x + 2^n) \bmod 2^{n+1}$ differ only in the setting of stage 0.

Proof. Since the shuffle pattern makes input 0 and input $N/2$ link to the same switch of stage 0, we have the lemma.

See Figure 2 for an illustration of this lemma. It is not difficult to see that the network configuration 10 maps input 0 to output 1 and the network configuration (10 + 8) mod 16, which equals 2, maps input 5 to the same output 1. Before going further, we introduce a definition. Output j is called a unique-path output of input i if the path between them is unique.
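Lemma 1 can be checked exhaustively for the $10 \times 10$ GSEN. The sketch below reuses a simple routing model that is our own assumption (a shuffle in front of every stage, stage $j$ driven by bit $x_{n-j}$); the names `shuffle` and `route` are hypothetical.

```python
def shuffle(i: int, N: int) -> int:
    """Shuffle-exchange permutation pi(i) = (2i + floor(2i / N)) mod N."""
    return (2 * i + (2 * i) // N) % N


def route(i: int, x: int, N: int, n: int) -> int:
    """Output reached by input i under stage-control configuration x."""
    port = i
    for j in range(n + 1):
        port = shuffle(port, N)
        port ^= (x >> (n - j)) & 1
    return port


N, n = 10, 3
for x in range(2 ** (n + 1)):
    partner = (x + 2 ** n) % 2 ** (n + 1)   # same settings except at stage 0
    assert route(0, x, N, n) == route(N // 2, partner, N, n)

print(route(0, 10, N, n), route(5, 2, N, n))  # 1 1
```

Adding $2^n$ modulo $2^{n+1}$ flips only bit $x_n$, that is, the state of stage 0, which is why the two inputs sharing switch $s^0_0$ trade places.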

Lemma 2. If $j$ is a unique-path output of input 0, then $j$ is also a unique-path output of input $N/2$.

Proof. Suppose to the contrary that the path between input $N/2$ and output $j$ is not unique. Then by Lemma 1, the path between input 0 and output $j$ is not unique either, a contradiction.

Lemma 3. Input 0 has exactly $2N - 2^{n+1}$ unique-path outputs; these unique-path outputs are consecutive and they are $2^{n+1} - N, 2^{n+1} - N + 1, \ldots, N - 1$.

Proof. Since the path between input 0 and the switch $s^n_i$ is unique for $2^n - \frac{N}{2} \le i \le \frac{N}{2} - 1$, we have this lemma.

The following lemma is obvious and its proof is omitted.

Lemma 4. Suppose $j$ is a unique-path output of input 0. Then when the stage control technique is used, the network configuration that maps 0 to $j$ is exactly $j$.

Corollary 5. Suppose $j$ is a unique-path output of input 0. Then when the stage control technique is used, the network configuration that maps $N/2$ to $j$ is exactly $(j + 2^n) \bmod 2^{n+1}$.

Proof. This corollary follows directly from Lemma 1, Lemma 2, and Lemma 4.
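Lemmas 3 and 4 and Corollary 5 can be illustrated numerically. The sketch below relies on an observation of ours: for a single input, a path is determined by the $n + 1$ switch states it meets, so counting stage-control configurations that reach an output counts the paths to it. The routing model (a shuffle before each stage, stage $j$ set by bit $x_{n-j}$) and all function names are assumptions.

```python
from collections import Counter


def shuffle(i: int, N: int) -> int:
    return (2 * i + (2 * i) // N) % N


def route(i: int, x: int, N: int, n: int) -> int:
    port = i
    for j in range(n + 1):
        port = shuffle(port, N)
        port ^= (x >> (n - j)) & 1
    return port


def unique_path_outputs(N: int, n: int) -> list[int]:
    """Outputs reached from input 0 by exactly one of the 2^(n+1) configurations."""
    hits = Counter(route(0, x, N, n) for x in range(2 ** (n + 1)))
    return sorted(j for j, count in hits.items() if count == 1)


N, n = 10, 3
U = unique_path_outputs(N, n)
print(U)  # [6, 7, 8, 9]

for j in U:
    assert route(0, j, N, n) == j                                  # Lemma 4
    assert route(N // 2, (j + 2 ** n) % 2 ** (n + 1), N, n) == j   # Corollary 5
```

For $N = 10$ the list has $2N - 2^{n+1} = 4$ entries starting at $2^{n+1} - N = 6$, in agreement with Lemma 3.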

We now derive a lower bound on the number of rounds required to complete all-to-all personalized exchange in a GSEN.

Theorem 6. When $2^n + 2^{n-1} \le N \le 2^{n+1}$ and the stage control technique is used, at least $2^{n+1} + n$ rounds are required to complete all-to-all personalized exchange in a GSEN.

Proof. By Lemma 3, $U = \{2^{n+1} - N, 2^{n+1} - N + 1, \ldots, N - 1\}$ is the set of unique-path outputs of input 0. When $2^n + 2^{n-1} \le N \le 2^{n+1}$, we have $S = \{2^{n-1}, 2^{n-1} + 1, \ldots, 2^{n-1} + 2^n - 1\} \subseteq U$. Note that $|S| = 2^n$. Let

$$S_1 = \{2^{n-1}, 2^{n-1} + 1, \ldots, 2^{n-1} + 2^n - 1\}$$

and

$$S_2 = \{2^{n-1} + 2^n, 2^{n-1} + 2^n + 1, \ldots, 2^{n+1} - 1, 0, 1, \ldots, 2^{n-1} - 1\}.$$

By Lemma 4, the $2^n$ network configurations in $S_1$ are required for input 0 to get to all the outputs in $S$. By Corollary 5, the $2^n$ network configurations in $S_2$ are required for input $N/2$ to get to all the outputs in $S$. Since

$$S_1 \cup S_2 = \{0, 1, \ldots, 2^{n+1} - 1\},$$

when the stage control technique is used, at least $2^{n+1}$ network configurations are required to complete all-to-all personalized exchange. Since at least $2^{n+1}$ network configurations are required and it takes $n + 1$ rounds for a message to travel through a GSEN, we have this theorem.
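As a worked instance of this counting argument, take $n = 3$ and $N = 12$, the smallest $N$ allowed by the hypothesis for $n = 3$. The values below are our own substitution into Lemma 3, Lemma 4, and Corollary 5.

```latex
\begin{align*}
U &= \{2^{4} - 12, \ldots, 11\} = \{4, 5, \ldots, 11\}, \\
S &= \{2^{2}, \ldots, 2^{2} + 2^{3} - 1\} = \{4, 5, \ldots, 11\} \subseteq U, \\
\text{configurations needed by input } 0 &: \{4, 5, \ldots, 11\}
  && \text{(Lemma 4)}, \\
\text{configurations needed by input } 6 &: \{(j + 8) \bmod 16 : j \in S\}
  = \{12, 13, 14, 15, 0, 1, 2, 3\}
  && \text{(Corollary 5)}, \\
\text{union} &= \{0, 1, \ldots, 15\},
  \qquad 2^{n+1} + n = 16 + 3 = 19 \text{ rounds}.
\end{align*}
```

All 16 stage-control configurations are therefore forced, and pipelining them through the 4 stages costs $16 + 3 = 19$ rounds.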

We are now ready to propose our first all-to-all personalized exchange algorithm for GSENs. This algorithm uses the stage control technique and has two phases. The first phase is the message preparing phase and in this phase, personalized messages that need to be sent out from each processor are inserted into the message queue of that processor. The second phase is the message sending phase and in this phase, personalized messages are sent out from the message queue of each processor.

Algorithm GSEN-ATA-with-Stage-Control.

Phase 1: The message preparing phase.

• The $(n+1)$-digit binary representations $x_n x_{n-1} \cdots x_0$ of the numbers $0, 1, \ldots, 2^{n+1} - 1$ are sequentially generated and the labels of every input of the GSEN (the label of input 0 is 0, the label of input 1 is 1, etc.) are equipped with the current binary representation $x_n x_{n-1} \cdots x_0$.

• Before a label enters switch $s^j_i$, $s^j_i$ is set to straight if $x_{n-j} = 0$ and set to cross if $x_{n-j} = 1$.

• When a label reaches an output, a personalized message is prepared; in particular, if label $s$ reaches output $t$, then a personalized message that processor $s$ wants to send to processor $t$ is prepared and is inserted into the message queue of processor $s$.

Phase 2: The message sending phase.

• The $(n+1)$-digit binary representations $x_n x_{n-1} \cdots x_0$ of the numbers $0, 1, \ldots, 2^{n+1} - 1$ are sequentially generated and the personalized messages in the message queue of every input of the GSEN are equipped with the current binary representation $x_n x_{n-1} \cdots x_0$.

• Before a message enters switch $s^j_i$, $s^j_i$ is set according to the rules used in phase 1.

• When a message reaches an output, that output receives a personalized message for it.

End of the algorithm.

Theorem 7. Algorithm GSEN-ATA-with-Stage-Control is correct and takes $2(2^{n+1} + n)$ rounds.

Proof. To prove the correctness of this algorithm, it is sufficient to prove that for an arbitrary pair of input $i$ and output $j$, $i$ can get to $j$. Since the stage control technique is used, there are only $2^{n+1}$ possible network configurations. The network configuration for $i$ to get to $j$ is therefore a number in $0, 1, \ldots, 2^{n+1} - 1$. Since Algorithm GSEN-ATA-with-Stage-Control uses every number in $0, 1, \ldots, 2^{n+1} - 1$ as one of its network configurations, $i$ can get to $j$. It is obvious that the above algorithm takes $2(2^{n+1} + n)$ rounds.
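The message preparing phase of Algorithm GSEN-ATA-with-Stage-Control can be simulated in a few lines. This is a sketch under our own modelling assumptions (a shuffle in front of every stage, stage $j$ driven by bit $x_{n-j}$), and the pipelining itself is not simulated: each configuration is simply applied to all labels at once. The names `route` and `prepare_queues` are hypothetical.

```python
def shuffle(i: int, N: int) -> int:
    return (2 * i + (2 * i) // N) % N


def route(i: int, x: int, N: int, n: int) -> int:
    port = i
    for j in range(n + 1):
        port = shuffle(port, N)
        port ^= (x >> (n - j)) & 1
    return port


def prepare_queues(N: int, n: int) -> dict[int, list[int]]:
    """Phase 1: queue[s] lists the destinations of processor s in sending order."""
    queue = {s: [] for s in range(N)}
    for x in range(2 ** (n + 1)):        # one configuration per injected wave
        for s in range(N):
            queue[s].append(route(s, x, N, n))
    return queue


N, n = 10, 3
queue = prepare_queues(N, n)
# Correctness in the sense of Theorem 7: every input reaches every output.
assert all(set(queue[s]) == set(range(N)) for s in range(N))

rounds_per_phase = 2 ** (n + 1) + n
print(rounds_per_phase, 2 * rounds_per_phase)  # 19 38
```

For $N = 10$ each queue holds $2^{n+1} = 16$ entries for only 10 destinations, so some destinations occur more than once in a queue.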

Corollary 8. When $2^n + 2^{n-1} \le N \le 2^{n+1}$ and the stage control technique is used, Algorithm GSEN-ATA-with-Stage-Control is optimal.

Proof. By Theorem 7, Algorithm GSEN-ATA-with-Stage-Control takes $O(2^{n+1} + n)$ rounds. By Theorem 6, when the stage control technique is used, the number of rounds required to complete all-to-all personalized exchange in a GSEN is $\Omega(2^{n+1} + n)$. We now have this corollary.

4 All-to-all personalized exchange in GSENs with $N = 2^n + 2$

In this section, we will propose our second all-to-all personalized exchange algorithm for GSENs and we will assume that the given GSEN has exactly $N = 2^n + 2$ nodes. The differences between our two algorithms are as follows. The first algorithm uses the stage control technique and each phase of the algorithm requires $2^{n+1} + n$ rounds (notice that $2^n < N \le 2^{n+1}$). In contrast, each phase of the second algorithm requires only $N + n$ rounds and only the first $2^n$ rounds (note that $N = 2^n + 2$) use the stage control technique.

The following is the second algorithm; it also has two phases: the message preparing phase and the message sending phase.

Algorithm GSEN-ATA-2.

Phase 1: The message preparing phase.

• The $(n+1)$-digit binary representations $x_n x_{n-1} \cdots x_0$ of the numbers $0, 1, \ldots, 2^n - 1$, $2^n + 2^{n-1}$, and $2^n + 2^{n-1} + 1$ are sequentially generated and the labels of every input of the GSEN (the label of input 0 is 0, the label of input 1 is 1, etc.) are equipped with the current binary representation $x_n x_{n-1} \cdots x_0$.

• Before a label enters switch $s^j_i$, $s^j_i$ is set according to the number $x$ with which the label is equipped.

If $x$ is neither $2^n + 2^{n-1}$ nor $2^n + 2^{n-1} + 1$, then: $s^j_i$ is set to straight if $x_{n-j} = 0$ and set to cross if $x_{n-j} = 1$.

If $x$ is $2^n + 2^{n-1}$ or $2^n + 2^{n-1} + 1$, then: if $j = 0$ or $j = n$, then $s^j_i$ is set to straight if $x_{n-j} = 0$ and set to cross if $x_{n-j} = 1$; otherwise, $s^j_i$ is set to straight if $i \oplus x_{n-j} = 0$ and set to cross if $i \oplus x_{n-j} = 1$, where $i$ is taken modulo 2.

• When a label reaches an output, a personalized message is prepared; in particular, if label $s$ reaches output $t$, then a personalized message that processor $s$ wants to send to processor $t$ is prepared and is inserted into the message queue of processor $s$.

Phase 2: The message sending phase.

• The $(n+1)$-digit binary representations $x_n x_{n-1} \cdots x_0$ of the numbers $0, 1, \ldots, 2^n - 1$, $2^n + 2^{n-1}$, and $2^n + 2^{n-1} + 1$ are sequentially generated and the personalized messages in the message queue of every input of the GSEN are equipped with the current binary representation $x_n x_{n-1} \cdots x_0$.

• Before a message enters switch $s^j_i$, the switch is set according to the rules used in phase 1.

• When a message reaches an output, that output receives a personalized message for it.

End of the algorithm.

Phase 2 of Algorithm GSEN-ATA-2 is similar to phase 1 of Algorithm GSEN-ATA-2 except that a personalized message (instead of the label $i$) is sent from input $i$. So we only give an example for phase 1; see Figures 5 and 6. In these two figures, each 0-1 string is the binary representation of the number $x$ with which a label is equipped. From these two figures, the labels arriving at the outputs are as follows.

for output 0 : 0 4 8 5 7 6 1 3 2 9
for output 1 : 4 0 5 8 6 7 3 1 9 2
for output 2 : 8 3 0 2 1 5 7 9 6 4
for output 3 : 3 8 2 0 5 1 9 7 4 6
for output 4 : 7 2 6 3 0 9 4 5 1 8
for output 5 : 2 7 3 6 9 0 5 4 8 1
for output 6 : 6 1 7 9 4 8 0 2 5 3
for output 7 : 1 6 9 7 8 4 2 0 3 5
for output 8 : 5 9 4 1 3 2 6 8 0 7
for output 9 : 9 5 1 4 2 3 8 6 7 0

It is not difficult to see that Algorithm GSEN-ATA-2 completes all-to-all personalized exchange for a GSEN with N = 10 nodes.
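Phase 1 of Algorithm GSEN-ATA-2 for $N = 10$ can be replayed with the sketch below. It is an illustration under assumptions of ours: a shuffle precedes every stage, switch $s$ owns ports $2s$ and $2s + 1$, and in the rule $i \oplus x_{n-j}$ only the parity of the switch index $i$ is used. The function name `route_ata2` is hypothetical.

```python
def shuffle(i: int, N: int) -> int:
    return (2 * i + (2 * i) // N) % N


def route_ata2(i: int, x: int, N: int, n: int) -> int:
    """Output reached by label i when it is equipped with the number x."""
    special = x in (2 ** n + 2 ** (n - 1), 2 ** n + 2 ** (n - 1) + 1)
    port = i
    for j in range(n + 1):
        port = shuffle(port, N)
        bit = (x >> (n - j)) & 1             # x_{n-j}
        if special and 0 < j < n:
            bit ^= (port // 2) & 1           # switch-index parity XOR x_{n-j}
        port ^= bit
    return port


N, n = 10, 3
numbers = list(range(2 ** n)) + [2 ** n + 2 ** (n - 1), 2 ** n + 2 ** (n - 1) + 1]
arrivals = {t: [] for t in range(N)}
for x in numbers:                            # one wave of labels per number
    for s in range(N):
        arrivals[route_ata2(s, x, N, n)].append(s)

print(arrivals[0])  # [0, 4, 8, 5, 7, 6, 1, 3, 2, 9]
assert all(sorted(arrivals[t]) == list(range(N)) for t in range(N))
```

Every output receives each of the 10 labels exactly once over the $N = 10$ waves, so, unlike the first algorithm, no destination is served twice.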

In the remaining part of this section, we will prove that Algorithm GSEN-ATA-2 is correct and optimal. Recall that the switches of stage $\ell$ are $s^\ell_0, s^\ell_1, \ldots, s^\ell_{N/2-1}$ and $s^\ell_0$ is considered to be the successive switch of $s^\ell_{N/2-1}$. The following two observations are based on the assumption that the setting of every switch of stage 0 is straight:

Observation 1. At stage 1, only one switch is reachable from input $i$. At stage 2, exactly 2 switches are reachable from input $i$ and these switches are consecutive. In general, at stage $\ell$, $1 \le \ell \le n$, exactly $2^{\ell-1}$ switches are reachable from input $i$ and these switches are consecutive. At stage $n$ (i.e., the last stage), exactly $2^{n-1}$ switches are reachable from input $i$ and these switches are consecutive.

Since the switches of stage $\ell$ that are reachable from input $i$ are consecutive, we only need to know the first one; suppose $s^\ell_{C_\ell}$ is this switch. Then we have the following observation.

Observation 2.

$$C_\ell = \begin{cases} i \bmod N/2 & \text{if } \ell = 0, \\ 2^{\ell-1}\left(2i + \left\lfloor \frac{2i}{N} \right\rfloor\right) \bmod N/2 & \text{if } 1 \le \ell \le n. \end{cases}$$

We now use the above two observations to prove a lemma.

Lemma 9. If $i \le N/2 - 1$, then for each phase of Algorithm GSEN-ATA-2, after performing the first $N + n - 2$ rounds, only one switch of stage $n$ (the last stage) is not reachable from input $i$. Moreover, if this unique switch is $s^n_{q_i}$, then $q_i = (2^{n-1} - 2i) \bmod N/2$.

Proof. When the stage control technique is used and the setting of every switch of stage 0 is straight, there are only $2^n$ possible network configurations: $0, 1, \ldots, 2^n - 1$. For each phase of Algorithm GSEN-ATA-2, its first $2^n$ rounds use the stage control technique and the switches are set according to the $(n+1)$-digit binary representations $x_n x_{n-1} \cdots x_0$ of the numbers $0, 1, \ldots, 2^n - 1$. So by Observation 1, for each phase of Algorithm GSEN-ATA-2, after performing the first $2^n + n = N + n - 2$ rounds, the number of switches of stage $n$ that are reachable from input $i$ is $2^{n-1}$. Since each stage consists of $2^{n-1} + 1$ switches, only one switch of stage $n$ is not reachable from $i$. By Observation 2, if this unique switch is $s^n_{q_i}$, then $q_i = (C_n + 2^{n-1}) \bmod N/2$, i.e., $q_i = (2^{n-1} - 2i) \bmod N/2$.

The following corollary follows directly from Lemma 9.

Corollary 10. $q_0 = 2^{n-1}$ and $q_i = (q_{i-1} - 2) \bmod N/2$ for $i = 1, 2, \ldots, N/2 - 1$.

The proof of the following lemma is similar to that of Lemma 9 and is omitted here.

Lemma 11. If $i \ge N/2$, then for each phase of Algorithm GSEN-ATA-2, after performing the first $N + n - 2$ rounds, only one switch of stage $n$ (the last stage) is not reachable from input $i$. Moreover, if this unique switch is $s^n_{q_i}$, then $q_i = (2^{n-1} - 2i - 1) \bmod N/2$.

The following corollary follows directly from Lemma 11.

Corollary 12. $q_{N/2} = 2^{n-1} - 1$ and $q_i = (q_{i-1} - 2) \bmod N/2$ for $i = N/2 + 1, N/2 + 2, \ldots, N - 1$.
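The closed forms for $q_i$ in Lemmas 9 and 11 can be compared against a direct search for $N = 10$. The sketch below assumes our simple routing model (a shuffle before each stage, stage $j$ set by bit $x_{n-j}$) and treats output $t$ as belonging to last-stage switch $\lfloor t/2 \rfloor$; the helper names are hypothetical.

```python
def shuffle(i: int, N: int) -> int:
    return (2 * i + (2 * i) // N) % N


def route(i: int, x: int, N: int, n: int) -> int:
    port = i
    for j in range(n + 1):
        port = shuffle(port, N)
        port ^= (x >> (n - j)) & 1
    return port


def unreached_switches(i: int, N: int, n: int) -> list[int]:
    """Last-stage switches never reached by input i under configurations 0 .. 2^n - 1."""
    reached = {route(i, x, N, n) // 2 for x in range(2 ** n)}
    return sorted(set(range(N // 2)) - reached)


def q(i: int, N: int, n: int) -> int:
    """Closed form from Lemma 9 (i < N/2) and Lemma 11 (i >= N/2)."""
    if i < N // 2:
        return (2 ** (n - 1) - 2 * i) % (N // 2)
    return (2 ** (n - 1) - 2 * i - 1) % (N // 2)


N, n = 10, 3
for i in range(N):
    assert unreached_switches(i, N, n) == [q(i, N, n)]

print([q(i, N, n) for i in range(N)])  # [4, 2, 0, 3, 1, 3, 1, 4, 2, 0]
```

The printed list steps down by 2 modulo $N/2 = 5$ within each half of the inputs, as Corollaries 10 and 12 state.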

Let $M_1$ ($M_2$), an $\frac{N}{2} \times (n + 1)$ 0-1 matrix, be the network configuration defined as follows. For each $0 \le \ell \le n$, column $\ell$ of $M_1$ ($M_2$) contains the setting of switches of stage $\ell$ at round $2^n + \ell + 1$ ($2^n + \ell + 2$). When we do not want to specify which one of $M_1$ and $M_2$ is used, we will simply use $M$ to denote either $M_1$ or $M_2$. From Algorithm GSEN-ATA-2, $M_1$ and $M_2$ are obtained from the binary representations of $2^n + 2^{n-1}$ and $2^n + 2^{n-1} + 1$, respectively. Since $2^n + 2^{n-1} = (1100 \cdots 00)_2$ and $2^n + 2^{n-1} + 1 = (1100 \cdots 01)_2$, the first $n$ columns of $M_1$ and $M_2$ are identical and

(i) each entry in column 0 of $M$ is 1,

(ii) 1 and 0 appear alternately in column 1 of $M$,

(iii) 0 and 1 appear alternately in column 2, column 3, $\ldots$, column $n - 1$ of $M$,

(iv) each entry in column $n$ of $M_1$ ($M_2$) is 0 (1).

See the following for an illustration.

$$M_1 = \begin{pmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 0 \end{pmatrix} \qquad M_2 = \begin{pmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 1 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & \cdots & 1 & 1 & 1 \end{pmatrix}$$

For convenience, denote the two subports on the left-hand (right-hand) side of a $2 \times 2$ switch $i_0$ and $i_1$ ($o_0$ and $o_1$); see Figure 3. In a GSEN, the right-hand side of every stage has exactly $N$ ports: port 0, port 1, $\ldots$, port $N - 1$. For convenience, let $p^\ell_i$ denote the label of the port on the right-hand side of stage $\ell$ that is reachable from input $i$. When the network configuration $M$ is used, the following two properties hold.

Property A. If $i \le N/2 - 1$ and $1 \le \ell < n$, then port $p^\ell_i$ is an $o_0$-subport.

Property B. If $i \ge N/2$ and $1 \le \ell < n$, then port $p^\ell_i$ is an $o_1$-subport.

The following two lemmas will be used to prove that input $i$ can reach switch $s^n_{q_i}$ by using the network configuration $M$.

Lemma 13. If $i \le N/2 - 1$, then input $i$ can reach switch $s^n_{q_i}$ by using the network configuration $M$. Moreover, input $i$ can get to outputs $2q_i$ and $2q_i + 1$ (the two outputs connecting to $s^n_{q_i}$) by using $M_1$ and $M_2$, respectively.

Proof. Let $s^n_{t_i}$ be the switch of stage $n$ (the last stage) that is reachable from input $i$ when the network configuration $M$ is used. First consider the case that $i = 0$. Clearly, input 0 reaches switch $s^0_0$ via the $i_0$-subport. Since input 0 reaches $s^0_0$ via the $i_0$-subport and the setting of $s^0_0$ is cross, input 0 reaches switch $s^1_{2 \cdot 0 + 1}$ (i.e., $s^1_1$) via the $i_0$-subport. Since input 0 reaches $s^1_1$ via the $i_0$-subport and the setting of $s^1_1$ is straight, input 0 reaches switch $s^2_{2 \cdot 1 + 0}$ (i.e., $s^2_2$) via the $i_0$-subport. For $\ell = 2, 3, \ldots, n - 1$, since input 0 reaches $s^\ell_{2^{\ell-1}}$ via the $i_0$-subport and the setting of $s^\ell_{2^{\ell-1}}$ is straight, input 0 reaches switch $s^{\ell+1}_{2 \cdot 2^{\ell-1} + 0}$ (i.e., $s^{\ell+1}_{2^\ell}$) via the $i_0$-subport. In particular, when $\ell = n - 1$, input 0 reaches switch $s^n_{2^{n-1}}$, which is switch $s^n_{q_0}$. So $t_0 = q_0$.

Next consider the case that $0 < i \le N/2 - 1$. By Corollary 10, to prove this lemma, it remains to prove that

$$t_i = (t_{i-1} - 2) \bmod N/2 \quad \text{for } i = 1, 2, \ldots, N/2 - 1.$$

To prove the above statement, it suffices to prove that

$$p^{n-1}_i = (p^{n-1}_{i-1} - 2) \bmod N \quad \text{for } i = 1, 2, \ldots, N/2 - 1.$$

Again, to prove this statement, it suffices to prove that

$$(*) \qquad p^\ell_i = (p^\ell_{i-1} + 2^{\ell+1}) \bmod N \quad \text{for } 1 \le \ell \le n - 1.$$

We will prove (*) by induction on $\ell$. It is not difficult to see that (*) holds when $\ell = 1$ or 2. Suppose $\ell \ge 3$ and (*) holds for $\ell - 1$. Note that $p^{\ell-1}_i = (p^{\ell-1}_{i-1} + 2^\ell) \bmod N$. Since Property A holds, $p^\ell_i = 2p^{\ell-1}_i \bmod N$ and $p^\ell_{i-1} = 2p^{\ell-1}_{i-1} \bmod N$. So

$$p^\ell_i = 2p^{\ell-1}_i \bmod N = 2(p^{\ell-1}_{i-1} + 2^\ell) \bmod N = (p^\ell_{i-1} + 2^{\ell+1}) \bmod N$$

and (*) holds.

In the above discussion, we have proven that input $i$ can reach switch $s^n_{q_i}$ by using $M_1$ or $M_2$. Since the two outputs connecting to $s^n_{q_i}$ are $2q_i$ and $2q_i + 1$ and $s^n_{q_i}$ is set to be straight by $M_1$ and cross by $M_2$, input $i$ can get to outputs $2q_i$ and $2q_i + 1$ by using $M_1$ and $M_2$, respectively.

Lemma 14. If $i \ge N/2$, then input $i$ can reach switch $s^n_{q_i}$ by using the network configuration $M$. Moreover, input $i$ can get to outputs $2q_i$ and $2q_i + 1$ (the two outputs connecting to $s^n_{q_i}$) by using $M_1$ and $M_2$, respectively.

Proof. The proof of this lemma is similar to that of the previous lemma except that Property B is used instead of Property A; hence the proof is omitted here.
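For $N = 10$, the role of the last two numbers of Algorithm GSEN-ATA-2 can be checked directly: they should carry every input $i$ to the one last-stage switch $s^n_{q_i}$ that the first $2^n$ numbers miss. The sketch below uses our own routing model (a shuffle before each stage, switch-index parity in the rule $i \oplus x_{n-j}$) and hypothetical helper names. Which of the two outputs is hit under $M_1$ depends on the subport through which the last switch is entered (in the table of arrivals for $N = 10$, label 3 arrives at output $7 = 2q_3 + 1$ in the round for $2^n + 2^{n-1}$), so the sketch compares unordered pairs of outputs.

```python
def shuffle(i: int, N: int) -> int:
    return (2 * i + (2 * i) // N) % N


def route_ata2(i: int, x: int, N: int, n: int) -> int:
    """Output reached by label i when it is equipped with the number x."""
    special = x in (2 ** n + 2 ** (n - 1), 2 ** n + 2 ** (n - 1) + 1)
    port = i
    for j in range(n + 1):
        port = shuffle(port, N)
        bit = (x >> (n - j)) & 1
        if special and 0 < j < n:
            bit ^= (port // 2) & 1       # switch-index parity XOR x_{n-j}
        port ^= bit
    return port


def q(i: int, N: int, n: int) -> int:
    """Closed form from Lemma 9 (i < N/2) and Lemma 11 (i >= N/2)."""
    if i < N // 2:
        return (2 ** (n - 1) - 2 * i) % (N // 2)
    return (2 ** (n - 1) - 2 * i - 1) % (N // 2)


N, n = 10, 3
x1 = 2 ** n + 2 ** (n - 1)               # number behind M1
x2 = x1 + 1                              # number behind M2
for i in range(N):
    outputs = {route_ata2(i, x1, N, n), route_ata2(i, x2, N, n)}
    assert outputs == {2 * q(i, N, n), 2 * q(i, N, n) + 1}

print(route_ata2(0, x1, N, n), route_ata2(0, x2, N, n))  # 8 9
```

Together with the search over the first $2^n$ numbers, this covers all $N$ outputs for every input when $N = 10$.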

Theorem 15. Algorithm GSEN-ATA-2 is correct and takes 2(N + n) rounds.

Proof. By Lemmas 9, 11, 13, and 14, each input i reaches each output j and hence Algorithm GSEN-ATA-2 is correct. It is obvious that each phase of Algorithm GSEN-ATA-2 takes N + n rounds and the whole algorithm takes 2(N + n) rounds.

Corollary 16. Algorithm GSEN-ATA-2 is optimal.

Proof. By Theorem 15, Algorithm GSEN-ATA-2 takes O(N) rounds. Since the number of rounds required to complete all-to-all personalized exchange in a GSEN is Ω(N), we have this corollary.

5 Concluding remarks

In [14], Yang and Wang proposed an optimal all-to-all personalized exchange algorithm, called ATAPE, for a class of unique-path, self-routable multistage interconnection networks (MINs). The MINs considered in [14] include the famous shuffle-exchange networks. Algorithm ATAPE works only for unique-path MINs and requires constructing a Latin square in advance and allocating memory for storing the Latin square. Yang and Wang argued that the Latin square construction needs to be run only once, at the time a network is built. Thus the Latin square associated with the network can be viewed as one of the system parameters and the time for constructing the Latin square is not counted in their communication delay analysis.

In this paper, we consider the general shuffle-exchange networks (GSENs). A GSEN is not necessarily a unique-path MIN and hence Algorithm ATAPE may not apply. We have proposed two optimal all-to-all personalized exchange algorithms for GSENs. Each of the two algorithms consists of two phases: the message preparing phase and the message sending phase. Algorithm ATAPE also consists of two (main) steps: Steps 1 and 2, which correspond to the message preparing phase and message sending phase of our algorithms, respectively. Unlike Algorithm ATAPE, we abandon the Latin square method and the requirement on the unique-path property.

Our first algorithm uses the stage control technique and works for arbitrary $N$. We have proven that it is optimal when the stage control technique is assumed for $2^{n-1} + 2^n \le N \le 2^{n+1}$. However, an output may receive more than one (identical) message from the same input when the algorithm is executed. This overhead can be avoided, but we do not discuss this topic in this paper. Our second algorithm does not use the stage control technique and works only for $N = 2^n + 2$. We have also proven that it is optimal.

References

[1] G. J. Chang, F. K. Hwang, and L. D. Tong, “Characterizing bit permutation net-works,” Networks, vol. 33, no. 4, pp. 261-267, 1999.

[2] Z. Chen, Z. J. Liu, and Z. L. Qiu, “Bidirectional shuffle-exchange network and tag-based routing algorithm,” IEEE Commun. Lett., vol. 7, no. 3, pp. 121-123, 2003.

[3] C. Y. Chen and J. K. Lou, “An efficient tag-based routing algorithm for the backward network of a bidirectional general shuffle-exchange network,” IEEE Commun. Lett., vol. 10, no. 4, pp. 296-298, 2006.

[4] M. Gerla, E. Leonardi, F. Neri, and P. Palanti, “Routing in the bidirectional shufflenet,” IEEE-ACM Trans. Netw., vol. 9, no. 1, pp. 91-103, 2001.


[5] F. K. Hwang, “The mathematical theory of nonblocking switching networks,” Series on Applied Mathematics, vol. 15, 2004.

[6] D. H. Lawrie, “Access and alignment of data in an array processor,” IEEE Trans. Comput., vol. 24, no. 12, pp. 1145-1155, 1975.

[7] V. W. Liu, C. Y. Chen, and R. B. Chen, “Optimal all-to-all personalized exchange in d-nary banyan multistage interconnection networks,” to appear in J. Comb. Optim.

[8] A. Massini, “All-to-all personalized communication on multistage interconnection networks,” Discrete Appl. Math., vol. 128, no. 2, pp. 435-446, 2003.

[9] K. Padmanabhan, “Design and analysis of even-sized binary shuffle-exchange networks for multiprocessors,” IEEE Trans. Parallel Distrib. Syst., vol. 2, no. 4, pp. 385-397, 1991.

[10] C. Qiao and L. Zhou, “Scheduling switching element disjoint connections in stage-controlled photonic banyans,” IEEE Trans. Commun., vol. 47, no. 1, pp. 139-148, 1999.

[11] R. Ramaswami, “Multiwavelength lightwave networks for computer communication,” IEEE Commun. Mag., vol. 31, no. 2, pp. 78-88, 1993.

[12] Y. Yang and J. Wang, “All-to-all personalized exchange in banyan networks,” Proc. Parallel and Distributed Computing and Systems (PDCS’99), Cambridge, MA, pp. 78-86, 1999.

[13] Y. Yang and J. Wang, “Optimal all-to-all personalized exchange in multistage networks,” Proc. Seventh International Conference on Parallel and Distributed Systems (ICPADS’00), Iwate, Japan, 2000.


[14] Y. Yang and J. Wang, “Optimal all-to-all personalized exchange in self-routable multistage networks,” IEEE Trans. Parallel Distrib. Syst., vol. 11, no. 3, pp. 261-274, 2000.

[15] Y. Yang and J. Wang, “Optimal all-to-all personalized exchange in a class of optical multistage networks,” IEEE Trans. Parallel Distrib. Syst., vol. 12, no. 9, pp. 567-582, 2001.


[Figure 5 panels: Initial GSEN; Rounds 1 through 13; End of phase 1.]


Figure 1: Communications among processors using an MIN.
Figure 3: The states of a 2 × 2 switch.
Figure 4: The network configuration of the GSEN in Figure 2.
Figure 5: An example of phase 1 of Algorithm GSEN-ATA-2.