Universal switch blocks for three-dimensional FPGA design

G.-M. Wu, M. Shyu and Y.-W. Chang

Abstract: The authors consider the switch-block design problem for three-dimensional FPGAs. A three-dimensional switch block M with W terminals on each face is said to be universal if every set of nets satisfying the dimension constraint (i.e. the number of nets on each face of M is at most W) is simultaneously routable through M. A class of universal switch blocks for three-dimensional FPGAs is presented. Each of the switch blocks has 15W switches and switch-block flexibility 5 (i.e. $F_S = 5$). It is proved that no switch block with fewer than 15W switches can be universal. The proposed switch blocks are compared with others of the topology associated with those used in the Xilinx XC4000 FPGAs. Experimental results demonstrate that the proposed universal switch blocks improve routability at the chip level. Further, the decomposition property of a universal switch block provides a key insight into its layout implementation with a smaller silicon area.

1 Introduction

A conventional field-programmable gate array (FPGA) (Fig. 1a) consists of an array of logic blocks that can be connected by routing resources [1]. The logic blocks contain circuits used to implement logic functions. The routing resources consist of wire segments and switch blocks. Figure 1b illustrates a switch block in which the programmable switches, denoted by dashed lines between terminals, are shown. There are many reports on four-sided switch blocks [2–8].

A three-dimensional FPGA architecture (Fig. 2a) is a generalisation of the conventional 2-D FPGA; it stacks a number of 2-D FPGA blocks together by MCM fabrication techniques [9, 10], so that each logic block has six adjacent neighbours, as opposed to four in the 2-D case [9]. The 3-D switch blocks differ from the conventional switch blocks (Fig. 2b): each switch block is connected with six adjacent switch blocks. Therefore, they enable each channel segment to connect to some subset of the channel segments incident on the other five faces of the 3-D switch block. This unique architecture motivates our study of the 3-D switch blocks.

A three-dimensional switch block M with W terminals on each face is said to be universal if every set of nets satisfying the dimension constraint (i.e. the number of nets on each face of M is at most W) is simultaneously routable through M. This paper presents a class of universal switch blocks for three-dimensional FPGAs. Each switch block has 15W switches and switch-block flexibility 5 (i.e. $F_S = 5$). We prove that no switch block with fewer than 15W switches can be universal. We also compare the proposed switch blocks with others of the topology associated with those used in the Xilinx XC4000 FPGAs. Experimental results demonstrate that the universal switch blocks improve routability at the chip level.

2 Switch-block modelling

This Section presents the modelling for 3-D switch blocks and their routing. It is shown that the 3-D switch-block design problem can be transformed into the six-sided one. A three-dimensional switch block is a cubic block with W terminals on each face of the block. The size of the 3-D switch block is referred to as W. Some pairs of terminals, on different faces of the block, may have programmable switches and thus can be connected by programming the switches to be ‘ON’. We represent a 3-D switch block by $M_{3d}(T, S)$, where T is the set of terminals and S the set of switches. Let the faces $F_1, F_2, F_3, F_4, F_5$ and $F_6$ represent the front, rear, left, right, top and bottom faces, respectively (Fig. 3). Label the terminals $t_{1,1}, t_{1,2}, \ldots, t_{1,W}, t_{2,1}, t_{2,2}, \ldots, t_{2,W}, \ldots, t_{6,1}, t_{6,2}, \ldots, t_{6,W}$, starting from the terminals on $F_1$ to those on $F_6$. Let $T(F) = \{t_{1,1}, \ldots, t_{1,W}\}$ (front terminals), $T(H) = \{t_{2,1}, \ldots, t_{2,W}\}$ (rear terminals), $T(L) = \{t_{3,1}, \ldots, t_{3,W}\}$ (left terminals), $T(R) = \{t_{4,1}, \ldots, t_{4,W}\}$ (right terminals), $T(T) = \{t_{5,1}, \ldots, t_{5,W}\}$ (top terminals) and $T(B) = \{t_{6,1}, \ldots, t_{6,W}\}$ (bottom terminals). Figure 3b shows the labelling of the terminals on $F_i$. Therefore, $S = \{(t_{i,j}, t_{p,q}) \mid \text{there exists a programmable switch between } t_{i,j} \text{ and } t_{p,q}\}$, and $T = \bigcup_{i \in \{F,H,L,R,T,B\}} T(i)$. For convenience, we often refer to a switch block $M_{3d}(T, S)$ simply as $M_{3d}$, omitting T and S, if there is no ambiguity about T and S, or if T and S are not of concern in the context.

A hexagonal switch block (HSB) is a six-sided switch block with V terminals on each side of the block. We say that the HSB is of size V. We represent an HSB by $M_h(T_h, S_h)$, where $T_h$ is the set of terminals and $S_h$ the set of programmable switches. Label the terminals $t_{1,1}, t_{1,2}, \ldots, t_{1,V}, t_{2,1}, t_{2,2}, \ldots, t_{2,V}, \ldots, t_{6,1}, t_{6,2}, \ldots, t_{6,V}$, starting from the rightmost terminal on the bottom side and proceeding clockwise (Fig. 3c). Let $T_h(i) = \{t_{i,1}, \ldots, t_{i,V}\}$,

G.-M. Wu is with the Department of Management of Information System, Nanhua University, Taiwan, Republic of China

M. Shyu is with the Department of Computer and Information Science, National Chiao Tung University, Taiwan, Republic of China

Y.-W. Chang is with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taiwan, Republic of China

© IEE, 2004

IEE Proceedings online no. 20040228 doi:10.1049/ip-cds:20040228

where $i = 1, 2, \ldots, 6$. Therefore, $S_h = \{(t_{m,n}, t_{u,v}) \mid \text{there exists a programmable switch between terminal } t_{m,n} \text{ and terminal } t_{u,v}\}$, where $m \ne u$; $m, u = 1, 2, \ldots, 6$; $n, v = 1, 2, \ldots, V$; and $T_h = \bigcup_{i=1}^{6} T_h(i)$. For convenience, we often refer to $M_h(T_h, S_h)$ simply as $M_h$, omitting $T_h$ and $S_h$, if there is no ambiguity about $T_h$ and $S_h$, or if $T_h$ and $S_h$ are not of concern in the context.

In the following, we transform the design problem for the 3-D switch blocks into that for the HSBs. For convenience, we adapt the notion of isomorphism used in [3] as follows. Let $M(T, S)$ and $M'(T', S')$ each be a 3-D or a hexagonal switch block. We have the following definition.

Definition 1: Two switch blocks $M(T, S)$ and $M'(T', S')$ are isomorphic if there exists a bijection $f: T \to T'$ such that $(t_{m,n}, t_{u,v}) \in S$ if and only if $(f(t_{m,n}), f(t_{u,v})) \in S'$ and, for any two terminals $t_{m,n}$ and $t_{u,v}$, $t_{m,n}, t_{u,v} \in T$ if and only if $f(t_{m,n}), f(t_{u,v}) \in T'$.

In other words, $M(T, S)$ and $M'(T', S')$ are isomorphic if we can relabel the terminals of M to be the terminals of M', maintaining the corresponding switches in M and M', and if, for terminals on the same side (face) of M, their corresponding terminals are also on the same side (face) of M'. For any two isomorphic switch blocks, we have the following theorems.

Theorem 1 [3]: Any two isomorphic switch blocks have the same routing capacity.

Theorem 2: For any $M_{3d}$ of size W, there exists an $M_h$ of the same size such that $M_{3d}$ and $M_h$ are isomorphic, and vice versa.

Proof: For an $M_{3d}(T, S)$ of size W, we can construct an $M_h(T_h, S_h)$ of the same size such that $(t_{m,n}, t_{u,v}) \in S_h$ if $(t_{m,n}, t_{u,v}) \in S$, where $m \ne u$; $m, u = 1, 2, \ldots, 6$ and $n, v = 1, 2, \ldots, W$. Let the mapping function $f: T \to T_h$ be $f(t_{m,n}) = t_{m,n}$. Obviously, $(t_{m,n}, t_{u,v}) \in S$ if and only if $(f(t_{m,n}), f(t_{u,v})) = (t_{m,n}, t_{u,v}) \in S_h$. Therefore, by definition 1, $M_{3d}$ and $M_h$ are isomorphic. Conversely, for an $M_h(T_h, S_h)$ of size V, we can construct an $M_{3d}(T, S)$ of the same size such that $(t_{m,n}, t_{u,v}) \in S$ if $(t_{m,n}, t_{u,v}) \in S_h$, where $m \ne u$; $m, u = 1, 2, \ldots, 6$ and $n, v = 1, 2, \ldots, V$. Similarly, there exists a bijection $f': T_h \to T$ such that $f'(t_{m,n}) = t_{m,n}$, and $(t_{m,n}, t_{u,v}) \in S_h$ if and only if $(f'(t_{m,n}), f'(t_{u,v})) = (t_{m,n}, t_{u,v}) \in S$. Therefore, $M_{3d}$ and $M_h$ are isomorphic. □
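Definition 1 can be made concrete with a small check. The sketch below is illustrative only: it assumes terminals are encoded as `(side, index)` tuples and switches as unordered pairs, and the helper name `is_isomorphism` is hypothetical. The same-side test follows the informal reading given after the definition (terminals on a common side must map to terminals on a common side).

```python
from itertools import combinations

def is_isomorphism(f, S, S_prime):
    """Check a candidate relabelling f (dict: terminal -> terminal).

    Terminals are (side, index) tuples; S and S_prime are sets of
    frozenset pairs of terminals (the programmable switches).
    """
    terminals = list(f)
    # f must be injective (hence a bijection onto its image).
    if len(set(f.values())) != len(terminals):
        return False
    for a, b in combinations(terminals, 2):
        # A switch exists in M exactly when its image exists in M'.
        if (frozenset((a, b)) in S) != (frozenset((f[a], f[b])) in S_prime):
            return False
        # Terminals share a side in M exactly when their images do in M'.
        if (a[0] == b[0]) != (f[a][0] == f[b][0]):
            return False
    return True

# Size-one block: one terminal per side, every pair of sides switched.
T = [(side, 1) for side in range(1, 7)]
S = {frozenset(p) for p in combinations(T, 2)}

identity = {t: t for t in T}          # the map used in the proof of theorem 2
swap_12 = dict(identity)
swap_12[(1, 1)], swap_12[(2, 1)] = (2, 1), (1, 1)

S_missing = set(S)
S_missing.discard(frozenset(((1, 1), (2, 1))))

print(is_isomorphism(identity, S, S))
print(is_isomorphism(swap_12, S, S))
print(is_isomorphism(identity, S, S_missing))
```

The identity map mirrors the construction in the proof of theorem 2; the last call fails because one switch has no counterpart.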

By theorems 1 and 2, the design problem for the 3-D switch block is equivalent to that for the six-sided switch block. Therefore, we shall focus on the six-sided switch block in the rest of the paper. We can classify all connections passing through a switch block into a number of categories. For an HSB, connections can be of 15 types, one for each unordered pair of sides. See Fig. 4 for the type definition.

Fig. 1 Conventional FPGA and its switch block

a Conventional FPGA architecture

b Conventional four-sided switch block

Fig. 2 3-D FPGA and switch block

a 3-D FPGA

b 3-D switch block

Fig. 3 3-D switch block and corresponding six-sided switch block

a Model of 3-D switch block

b One face of 3-D switch block and its terminals

c Corresponding six-sided switch block

A routing requirement vector (RRV) n for an HSB is a 15-tuple $(n_{1,2}, \ldots, n_{1,6}, n_{2,3}, \ldots, n_{2,6}, n_{3,4}, \ldots, n_{3,6}, n_{4,5}, n_{4,6}, n_{5,6})$, where $n_{i,j}$ is the number of type-(i, j) connections required to be routed through an HSB, $0 \le n_{i,j} \le V$; $i, j = 1, 2, \ldots, 6$; $i \ne j$. An RRV n is said to be routable on an HSB $M_h$ if there exists a routing for n on $M_h$.

The routing capacity of a switch block M is defined as the number of distinct routable vectors on M; that is, the routing capacity of M is the cardinality $|\{n \mid n \text{ is routable on } M\}|$. A switch block M with V terminals on each side is called ‘universal’ if every set of nets satisfying the dimension constraint (i.e. the number of nets on each side of M is at most V) is simultaneously routable through M. We have the following definition.

Definition 2: An HSB $M_h$ of size V is called universal if the following set of inequalities is the necessary and sufficient condition for an RRV $n = (n_{1,2}, \ldots, n_{1,6}, n_{2,3}, \ldots, n_{2,6}, n_{3,4}, \ldots, n_{3,6}, n_{4,5}, n_{4,6}, n_{5,6})$ to be routable on $M_h$:

$$n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} \le V \quad (1)$$
$$n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} \le V \quad (2)$$
$$n_{1,3} + n_{2,3} + n_{3,4} + n_{3,5} + n_{3,6} \le V \quad (3)$$
$$n_{1,4} + n_{2,4} + n_{3,4} + n_{4,5} + n_{4,6} \le V \quad (4)$$
$$n_{1,5} + n_{2,5} + n_{3,5} + n_{4,5} + n_{5,6} \le V \quad (5)$$
$$n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} \le V \quad (6)$$

We refer to the dimension constraint as the set of inequalities that characterises a six-sided universal switch block of size V. Therefore, the dimension constraint for an HSB is the set of (1)–(6) listed in definition 2. Note that the number of nets routed through each side of a switch block cannot exceed V; therefore, a universal switch block has the maximum routing capacity.
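As an illustration of the dimension constraint, the following minimal sketch computes the number of connections touching each side of an HSB and tests inequalities (1)–(6). The names `PAIRS`, `side_loads` and `satisfies_dimension_constraint` are assumptions made for this example; the component order follows the RRV ordering used in the text.

```python
from itertools import combinations

# The 15 connection types (i, j) with i < j, in the order used by the RRV:
# (1,2), (1,3), ..., (1,6), (2,3), ..., (5,6).
PAIRS = list(combinations(range(1, 7), 2))

def side_loads(rrv):
    """Return the number of connections touching each of the six sides."""
    loads = [0] * 6
    for (i, j), n_ij in zip(PAIRS, rrv):
        loads[i - 1] += n_ij
        loads[j - 1] += n_ij
    return loads

def satisfies_dimension_constraint(rrv, V):
    """True if rrv is a non-negative 15-tuple and (1)-(6) hold for size V."""
    if len(rrv) != len(PAIRS) or any(x < 0 for x in rrv):
        return False
    return all(load <= V for load in side_loads(rrv))

# Two connections each of types (1,2), (3,4) and (5,6).
example = (2, 0, 0, 0, 0,  0, 0, 0, 0,  2, 0, 0,  0, 0,  2)
print(side_loads(example))
print(satisfies_dimension_constraint(example, 2))
print(satisfies_dimension_constraint(example, 1))
```

For this vector every side carries exactly two connections, so the constraint holds for V = 2 but not for V = 1.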

3 Universal switch blocks

In this Section, we present an algorithm for constructing symmetric HSBs and prove that symmetric HSBs are universal. The symmetric HSB of size V has only 15V switches. We prove that no HSB with fewer than 15V switches can be universal. Based on isomorphism operations (theorem 1), we can identify a whole class of universal switch blocks.

3.1 Symmetric switch blocks

Algorithm Symmetric_Switch_Block, shown in Fig. 5, constructs a six-sided switch block $M_h$ of size V. We refer to the topology of the switch block constructed by the algorithm as the ‘symmetric topology’ and to the switch block as the ‘symmetric switch block’. Figure 6 shows two examples of symmetric switch blocks. A symmetric switch block has flexibility $F_S = 5$; thus, the total number of switches in a symmetric switch block of size V is

$$\frac{6V F_S}{2} = 3V F_S = 15V$$

A symmetric switch block with an even number V of terminals on each side can be partitioned into V/2 sub-blocks of size two; for an odd V, it can be partitioned into $\lfloor V/2 \rfloor$ sub-blocks of size two and one sub-block of size one. Thus, we have the following property: a symmetric switch block of size V can be partitioned into $\lfloor V/2 \rfloor$ symmetric sub-blocks of size two and (V mod 2) symmetric sub-blocks of size one. (We call this property the decomposition property.)

Note that these sub-blocks do not interact with each other; thus, each sub-block can be considered independently.
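The decomposition property fixes only the sizes of the sub-blocks, which the short sketch below lists for a given V. The helper name `sub_block_sizes` is hypothetical, and the sketch does not construct the switch topology of Fig. 5; it only checks that the sizes add up to V and that 15 switches per unit of size account for all 15V switches.

```python
def sub_block_sizes(V):
    """Sizes of the sub-blocks in the decomposition of a size-V block."""
    return [2] * (V // 2) + [1] * (V % 2)

for V in (2, 3, 4):
    sizes = sub_block_sizes(V)
    # A symmetric sub-block of size s has 15 * s switches.
    print(V, sizes, sum(15 * s for s in sizes))
```

For V = 4 and V = 3 this gives the sub-block counts of the two decompositions shown in Fig. 7.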

3.2 Proof of universality

In this subsection, it is proved that the symmetric switch blocks constructed by algorithm Symmetric_Switch_Block are universal. To show that the symmetric switch blocks are universal, we ﬁrst prove that the symmetric HSBs of size two are universal.

Fig. 4 Fifteen types of connection in an HSB

Fig. 5 Algorithm for constructing a six-sided symmetric switch block of size V

Fig. 6 Two symmetric hexagonal switch blocks

a Symmetric HSB of V = 3

b Symmetric HSB of V = 2

Lemma 1: The HSB Mh of size two constructed by algorithm Symmetric_Switch_Block is universal.

Proof: By definition 2, we must prove that n is routable on $M_h$ if and only if the following inequalities are simultaneously satisfied:

$$n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} \le 2 \quad (7)$$
$$n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} \le 2 \quad (8)$$
$$n_{1,3} + n_{2,3} + n_{3,4} + n_{3,5} + n_{3,6} \le 2 \quad (9)$$
$$n_{1,4} + n_{2,4} + n_{3,4} + n_{4,5} + n_{4,6} \le 2 \quad (10)$$
$$n_{1,5} + n_{2,5} + n_{3,5} + n_{4,5} + n_{5,6} \le 2 \quad (11)$$
$$n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} \le 2 \quad (12)$$

(If) It is not difficult to identify all of the RRVs satisfying (7)–(12). (In fact, there are 2578 such RRVs.) We verify the RRVs and conclude that they can all be routed successfully on the HSB of size two constructed by algorithm Symmetric_Switch_Block. In fact, based on the work in [11], we need to check only the RRVs in the corresponding dominating set (see [11] for the definition of dominating sets). The key insight is that the two terminals, say terminals b and c, which connect to a terminal, say a, do not share any switch (Fig. 7b); thus the connections associated with them are non-interacting, except those associated with a.

(Only if) For an HSB $M_h$ of size two, the total number of connections routed through each side of $M_h$ cannot exceed two. Hence, if n is routable on $M_h$, the six inequalities must be satisfied. □
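The count quoted in the proof can be reproduced by direct enumeration. The sketch below (the function name `count_rrvs` is an assumption, not part of the paper) walks through the 15 connection types in RRV order while tracking the free terminals on each side. It only counts the vectors satisfying the dimension constraint; it does not route them on a switch block.

```python
from itertools import combinations

# Connection types (i, j), i < j, in RRV order.
PAIRS = list(combinations(range(1, 7), 2))

def count_rrvs(V):
    """Count the 15-tuples that satisfy the dimension constraint for size V."""
    free = [V] * 6  # unused terminals per side

    def extend(k):
        if k == len(PAIRS):
            return 1
        i, j = PAIRS[k]
        total = 0
        # n_{i,j} may not exceed the free terminals on either side.
        for x in range(min(free[i - 1], free[j - 1]) + 1):
            free[i - 1] -= x
            free[j - 1] -= x
            total += extend(k + 1)
            free[i - 1] += x
            free[j - 1] += x
        return total

    return extend(0)

print(count_rrvs(2))  # 2578, the number stated in the proof of lemma 1
```

The pruned recursion avoids scanning all $3^{15}$ candidate tuples, which keeps the check fast enough to run interactively.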

Let $U_V$ denote the set of RRVs which satisfy the dimension constraint for an HSB of size V. An RRV $\gamma \in U_V$ is called a maximal RRV (MRRV) if there exists no other RRV in $U_V$ that dominates $\gamma$. In the following, we show that all RRVs in $U_V$ can be decomposed into $U_{V-2}$ and $U_2$.

Similarly, we need to check only the RRVs in the corresponding dominating set (i.e. MRRVs). We have the following lemma.

Lemma 2: When an MRRV $\gamma \in U_V$ is routed on an HSB, all unused terminals, if any, must be on the same side, and the number of unused terminals is $f_{unused} = 2c$, $0 \le c \le \lfloor V/2 \rfloor$, $c \in \mathbb{Z}$.

Proof: If there are two unused terminals on different sides (say sides i and j, $i < j$), we can increase $\gamma_{i,j}$ by one without violating the dimension constraint, implying that $\gamma$ is not maximal: a contradiction. Hence, all unused terminals, if any, must be on the same side. Note that the total number of terminals is $f_{total} = 6V$, an even number. Assume that there are $f_{used}$ used terminals. Obviously, $f_{used}$ is even since each switch is incident on two terminals. Also, $f_{unused} = f_{total} - f_{used} \le V$ since all unused terminals, if any, must be on the same side. Since $f_{total}$ and $f_{used}$ are even numbers and $0 \le f_{unused} \le V$, we have $f_{unused} = f_{total} - f_{used} = 2c$, $0 \le c \le \lfloor V/2 \rfloor$, $c \in \mathbb{Z}$. □
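The argument in the proof suggests a direct test for maximality: a component can be increased only if both of its sides still have a free terminal, so an RRV in $U_V$ is maximal exactly when at most one side has free terminals. The sketch below encodes this reading; the function names and the example vectors are assumptions made for illustration.

```python
from itertools import combinations

PAIRS = list(combinations(range(1, 7), 2))  # connection types in RRV order

def side_loads(rrv):
    """Connections touching each of the six sides."""
    loads = [0] * 6
    for (i, j), n_ij in zip(PAIRS, rrv):
        loads[i - 1] += n_ij
        loads[j - 1] += n_ij
    return loads

def is_maximal(rrv, V):
    """True if no component can be increased within the dimension constraint."""
    loads = side_loads(rrv)
    if any(load > V for load in loads):
        raise ValueError("rrv violates the dimension constraint")
    sides_with_free_terminals = [s for s, load in enumerate(loads, 1) if load < V]
    return len(sides_with_free_terminals) <= 1

def unused_terminals(rrv, V):
    """f_unused = f_total - f_used, with two terminals used per connection."""
    return 6 * V - 2 * sum(rrv)

V = 3
# Sides 2..6 are full and side 1 has two free terminals.
degenerate = (1, 0, 0, 0, 0,  1, 0, 0, 1,  2, 0, 0,  1, 0,  2)
# Sides 5 and 6 both have free terminals, so n_{5,6} could still grow.
not_maximal = (3, 0, 0, 0, 0,  0, 0, 0, 0,  3, 0, 0,  0, 0,  1)

print(side_loads(degenerate), is_maximal(degenerate, V), unused_terminals(degenerate, V))
print(side_loads(not_maximal), is_maximal(not_maximal, V))
```

In the first vector the two unused terminals lie on side 1, in line with lemma 2 for c = 1.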

Consider the MRRVs in $U_V$. By lemma 2, we can classify the MRRVs into two types. In the first, all terminals are used (i.e. $f_{unused} = 0$), and we call an MRRV of this type a ‘complete MRRV’. In the second, an even number of terminals on the same side are unused (i.e. $f_{unused} = 2c$, $c \in \mathbb{Z}^+$), and we call an MRRV of this type a ‘degenerate complete MRRV’.

Fig. 7 Two symmetric HSBs and their sub-blocks

a Decomposition of symmetric HSB of V = 4

b Decomposition of symmetric HSB of V = 3

To show that an MRRV in $U_V$ can be decomposed into $U_{V-2}$ and $U_2$, we first construct a multiple graph and a weighted graph for the MRRV as follows. For any MRRV, we construct a multiple graph $G_m(V_m, E_m)$, where $V_m = \{v_1, v_2, \ldots, v_6\}$. If $n_{i,j} = 1$, construct an edge between $v_i$ and $v_j$ with weight 1; if $n_{i,j} \ge 2$, construct two edges between $v_i$ and $v_j$ with total weight equal to $n_{i,j}$. (We call the two edges a ‘multi-edge’.) We induce a weighted graph $G_w(V_w, E_w)$ from $G_m(V_m, E_m)$ by substituting a weighted edge for each multi-edge. Figures 8b and 8c show a multiple graph $G_{m1}$ for n = (1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 3) and its corresponding weighted graph $G_{w1}$, respectively. In $G_{m1}$, there are two edges between $v_5$ and $v_6$ because $n_{5,6} = 3$; thus, we introduce the weighted edge $(v_5, v_6)$ in $G_{w1}$. In a weighted graph $G_w(V_w, E_w)$, a vertex $v \in V_w$ represents one side of an HSB, an edge $e \in E_w$ represents a type of connection between two sides of the HSB, and weight(e) denotes the number of connections of the type associated with e.
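The construction of the two graphs can be sketched in a few lines. The function names below are hypothetical, and splitting a multi-edge of total weight $n_{i,j}$ into weights 1 and $n_{i,j} - 1$ is an assumption of this sketch, since the text only fixes the total. The example uses the routing instance of Fig. 8.

```python
from itertools import combinations

PAIRS = list(combinations(range(1, 7), 2))  # connection types in RRV order

def weighted_graph(rrv):
    """Map each side pair with n_{i,j} > 0 to its weight n_{i,j}."""
    return {pair: w for pair, w in zip(PAIRS, rrv) if w > 0}

def multigraph_edges(rrv):
    """One edge if n_{i,j} = 1, two edges of total weight n_{i,j} if n_{i,j} >= 2."""
    edges = []
    for pair, w in zip(PAIRS, rrv):
        if w == 1:
            edges.append((pair, 1))
        elif w >= 2:
            edges.append((pair, 1))      # assumed split: 1 and w - 1
            edges.append((pair, w - 1))
    return edges

def components(rrv):
    """Connected components of the weighted graph, as sorted vertex lists."""
    adjacent = {v: set() for v in range(1, 7)}
    for i, j in weighted_graph(rrv):
        adjacent[i].add(j)
        adjacent[j].add(i)
    seen, result = set(), []
    for start in range(1, 7):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adjacent[u] - comp)
        seen |= comp
        result.append(sorted(comp))
    return result

n = (1, 0, 1, 0, 1,  1, 0, 1, 0,  1, 0, 0,  1, 0,  3)  # instance of Fig. 8
print(weighted_graph(n)[(5, 6)])
print(len(weighted_graph(n)), len(multigraph_edges(n)))
print(components(n))
```

For this instance the weighted graph has eight edges, the multiple graph has nine (the pair of sides 5 and 6 contributes two), and all six vertices lie in one connected component.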

Let Ck denote a connected component of k vertices in Gw. We have the following lemma.

Lemma 3: For a weighted graph $G_w$ associated with a complete MRRV, there exists no isolated vertex in $G_w$ and, for $k \ge 3$, $C_k$ contains no degree-one vertex.

Proof: There exists no isolated vertex in $G_w$, since the total number of connections associated with a complete MRRV for a side of an HSB must be equal to V. Suppose there exists a $G_w(V_w, E_w)$ (associated with a complete MRRV) for an HSB of size V with a connected component $C_k$, $k \ge 3$, that contains a degree-one vertex $v_i$. Let $v_i$ connect to a vertex $v_j$ by an edge $e_{i,j} = (v_i, v_j)$. Since $G_w$ is associated with a complete MRRV and $v_i$ is a degree-one vertex, the total number of connections associated with $v_i$, weight($e_{i,j}$), is equal to the dimension constraint V. Further, the total number of connections associated with a vertex can be V at most; since weight($e_{i,j}$) = V, $v_j$ connects only to $v_i$. Hence, the degree of $v_j$ must also equal one, and $v_i$ and $v_j$ form a $C_2$: a contradiction. Therefore, there exists no $C_k$ ($k \ge 3$) with a degree-one vertex in $G_w$. □

A hamiltonian subcycle of a multiple graph $G_m(V_m, E_m)$ is a simple cycle that contains a subset of vertices in $V_m$. Two hamiltonian subcycles in a multiple graph are called ‘independent’ if no vertex is shared by the two subcycles. A multiple graph $G_m(V_m, E_m)$ is said to be sub-hamiltonian if it contains a set of independent hamiltonian subcycles and all vertices in $V_m$ are on the subcycles; otherwise, it is non-sub-hamiltonian. Also, we call a weighted graph sub-hamiltonian if its associated multiple graph is sub-hamiltonian. For RRVs u and v, we say u is a sub-RRV of v if there exists an RRV x such that v = u + x. Let $n_i$ be a sub-RRV of n. We define O and O' as follows:

$$O = \{n \mid n \in U_V\}$$

$$O' = \{n \mid n = n_1 + n_2,\ n_1 \in U_{V-2} \text{ and } n_2 \in U_2\}, \quad \text{where } V \ge 2$$

We have the following lemmas.

Lemma 4: If the multiple graph of an MRRV n is sub-hamiltonian, n has a complete sub-RRV $n_2 \in U_2$.

Proof: Let $G_m$ be a sub-hamiltonian graph associated with an MRRV n. $G_m$ has a set of independent hamiltonian subcycles $S = \{c_1, c_2, \ldots, c_k\}$ that covers all vertices in $G_m$. We can choose a sub-RRV $n_2$ of n as follows. Consider any cycle $c_i \in S$, $c_i = \langle v_{i_1}, v_{i_2}, \ldots, v_{i_k}, v_{i_1} \rangle$. If k = 2, let $n_{2,(v_{i_1}, v_{i_2})} = 2$; otherwise, let $n_{2,(v_{i_1}, v_{i_2})} = n_{2,(v_{i_2}, v_{i_3})} = \cdots = n_{2,(v_{i_k}, v_{i_1})} = 1$. We traverse $G_m$ based on S. Since S consists of independent hamiltonian subcycles that together contain all vertices in $G_m$, every vertex is visited exactly once. A vertex in $G_m$ corresponds to one side of an HSB associated with n, and every vertex has degree two in S. Thus, the number of connections on each side of the HSB is equal to 2, and $n_2$ satisfies the following equalities:

$$n_{2,(1,2)} + n_{2,(1,3)} + n_{2,(1,4)} + n_{2,(1,5)} + n_{2,(1,6)} = 2$$
$$n_{2,(1,2)} + n_{2,(2,3)} + n_{2,(2,4)} + n_{2,(2,5)} + n_{2,(2,6)} = 2$$
$$n_{2,(1,3)} + n_{2,(2,3)} + n_{2,(3,4)} + n_{2,(3,5)} + n_{2,(3,6)} = 2$$
$$n_{2,(1,4)} + n_{2,(2,4)} + n_{2,(3,4)} + n_{2,(4,5)} + n_{2,(4,6)} = 2$$
$$n_{2,(1,5)} + n_{2,(2,5)} + n_{2,(3,5)} + n_{2,(4,5)} + n_{2,(5,6)} = 2$$
$$n_{2,(1,6)} + n_{2,(2,6)} + n_{2,(3,6)} + n_{2,(4,6)} + n_{2,(5,6)} = 2$$

Thus $n_2 \in U_2$ and it is a complete MRRV. □

Lemma 5: (RRV decomposition property) $O = O'$.

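Proof: First, we show that $O' \subseteq O$. By definition 2, an RRV $n_1 \in U_{V-2}$ if and only if the following inequalities are simultaneously satisfied:

$$n_{1,(1,2)} + n_{1,(1,3)} + n_{1,(1,4)} + n_{1,(1,5)} + n_{1,(1,6)} \le V - 2 \quad (13)$$
$$n_{1,(1,2)} + n_{1,(2,3)} + n_{1,(2,4)} + n_{1,(2,5)} + n_{1,(2,6)} \le V - 2 \quad (14)$$
$$n_{1,(1,3)} + n_{1,(2,3)} + n_{1,(3,4)} + n_{1,(3,5)} + n_{1,(3,6)} \le V - 2 \quad (15)$$
$$n_{1,(1,4)} + n_{1,(2,4)} + n_{1,(3,4)} + n_{1,(4,5)} + n_{1,(4,6)} \le V - 2 \quad (16)$$
$$n_{1,(1,5)} + n_{1,(2,5)} + n_{1,(3,5)} + n_{1,(4,5)} + n_{1,(5,6)} \le V - 2 \quad (17)$$
$$n_{1,(1,6)} + n_{1,(2,6)} + n_{1,(3,6)} + n_{1,(4,6)} + n_{1,(5,6)} \le V - 2 \quad (18)$$

and for an RRV $n_2 \in U_2$, the following inequalities are simultaneously satisfied:

$$n_{2,(1,2)} + n_{2,(1,3)} + n_{2,(1,4)} + n_{2,(1,5)} + n_{2,(1,6)} \le 2 \quad (19)$$
$$n_{2,(1,2)} + n_{2,(2,3)} + n_{2,(2,4)} + n_{2,(2,5)} + n_{2,(2,6)} \le 2 \quad (20)$$
$$n_{2,(1,3)} + n_{2,(2,3)} + n_{2,(3,4)} + n_{2,(3,5)} + n_{2,(3,6)} \le 2 \quad (21)$$
$$n_{2,(1,4)} + n_{2,(2,4)} + n_{2,(3,4)} + n_{2,(4,5)} + n_{2,(4,6)} \le 2 \quad (22)$$
$$n_{2,(1,5)} + n_{2,(2,5)} + n_{2,(3,5)} + n_{2,(4,5)} + n_{2,(5,6)} \le 2 \quad (23)$$
$$n_{2,(1,6)} + n_{2,(2,6)} + n_{2,(3,6)} + n_{2,(4,6)} + n_{2,(5,6)} \le 2 \quad (24)$$

Let $n = n_1 + n_2$, i.e. $n_{1,2} = n_{1,(1,2)} + n_{2,(1,2)}$, $n_{1,3} = n_{1,(1,3)} + n_{2,(1,3)}$, …, $n_{5,6} = n_{1,(5,6)} + n_{2,(5,6)}$. Combining (13) and (19), (14) and (20), (15) and (21), (16) and (22), (17) and (23), and (18) and (24), we obtain (25)–(30):

$$n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} \le V \quad (25)$$
$$n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} \le V \quad (26)$$
$$n_{1,3} + n_{2,3} + n_{3,4} + n_{3,5} + n_{3,6} \le V \quad (27)$$
$$n_{1,4} + n_{2,4} + n_{3,4} + n_{4,5} + n_{4,6} \le V \quad (28)$$
$$n_{1,5} + n_{2,5} + n_{3,5} + n_{4,5} + n_{5,6} \le V \quad (29)$$
$$n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} \le V \quad (30)$$

Thus, $n \in U_V$, and we have $O' \subseteq O$.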

Next, we show that $O \subseteq O'$. It suffices to show that each MRRV $n \in O$ is in $O'$. By lemma 2, all unused terminals for routing an MRRV on an HSB are on the same side, and the number of unused terminals is even. Without loss of generality, assume that all unused terminals, if any, are on side 1 and that the number of unused terminals is equal to $2c_1$, $0 \le c_1 \le \lfloor V/2 \rfloor$. By definition 2, an MRRV $n \in O$ if and only if the following equalities are simultaneously satisfied:

$$n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} = V - 2c_1 \quad (31)$$
$$n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} = V \quad (32)$$
$$n_{1,3} + n_{2,3} + n_{3,4} + n_{3,5} + n_{3,6} = V \quad (33)$$
$$n_{1,4} + n_{2,4} + n_{3,4} + n_{4,5} + n_{4,6} = V \quad (34)$$
$$n_{1,5} + n_{2,5} + n_{3,5} + n_{4,5} + n_{5,6} = V \quad (35)$$
$$n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} = V \quad (36)$$

Algorithm RRV-Decomposition, listed in Fig. 9, shows a method to decompose $n \in U_V$ into $n_1$ and $n_2$, where $n_1 \in U_{V-2}$ and $n_2 \in U_2$. It distinguishes two cases: (i) n is a complete MRRV, and (ii) n is a degenerate complete MRRV.

Fig. 8 Routing instance and corresponding multiple and weighted graphs

a Routing instance n = (1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 3)

b Multiple graph $G_{m1}$ associated with RRV n

We first consider the case where n is a complete MRRV. Let $G_w$ be a weighted graph of n and $C_i$ be a connected component with i vertices in $G_w$. By lemma 3, there exists no isolated vertex or any $C_k$, $k \ge 3$, with a degree-one vertex in $G_w$. Thus, the number of vertices in $C_k$ can only be 2, 3, 4 or 6.
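The admissible component sizes can be cross-checked by listing the ways of splitting the six vertices into components with at least two vertices each (no isolated vertex is allowed). The generator below is a generic integer-partition sketch with a hypothetical name; it says nothing about edge weights or legality of the resulting graphs.

```python
def partitions(n, smallest=2, largest=None):
    """Yield partitions of n into parts >= smallest, in non-increasing order."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), smallest - 1, -1):
        for rest in partitions(n - part, smallest, part):
            yield [part] + rest

print(list(partitions(6)))
```

Only four splits exist (one component of six vertices, sizes four and two, two components of three, and three components of two), and a component of five vertices never occurs because it would leave an isolated vertex.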

We classify all weighted graphs for complete MRRVs into four categories α, β, γ and δ, listed in Table 1. (Note that the weighted graphs, except $C_2$, contain no isolated vertex or degree-one vertex.) Categories α, β, γ and δ represent the cases where $G_w$ consists of three $C_2$s, one $C_2$ and one $C_4$, two $C_3$s, and one $C_6$, respectively. The total number of weighted graphs with complete MRRVs is 56, and twelve of them are illegal, which can be verified by checking whether the total edge weight of each graph equals 3V. (Note that all 6V terminals are used for a complete MRRV, and a connection is incident on two terminals.) For example, Fig. 10a shows an infeasible weighted graph $G_w$. Consider vertices $v_1$ and $v_4$. The total number of connections associated with $v_1$ must be equal to the dimension constraint V (i.e. all terminals on each side of the HSB are used); therefore, $|e_1| + |e_2| + |e_3| + |e_4| = V$. Similarly, there are V connections associated with $v_4$, and thus $|e_5| + |e_6| + |e_7| + |e_8| = V$. Therefore, the total number of connections associated with $G_w$ is equal to 2V. By (31)–(36), however, the total number of connections associated with a complete MRRV must be equal to 3V. Therefore, $G_w$ is illegal. The other 44 weighted graphs are sub-hamiltonian. (Table 1 summarises the number of legal and illegal weighted graphs for complete MRRVs.) Figure 10b shows a sub-hamiltonian weighted graph associated with a complete MRRV. It has a hamiltonian subcycle $c_1 = \langle v_1, v_2, \ldots, v_6, v_1 \rangle$ that contains all vertices. By lemma 4, n has a complete sub-RRV $n_2 \in U_2$. In other words, $n_2$ satisfies the following equalities:

$$n_{2,(1,2)} + n_{2,(1,3)} + n_{2,(1,4)} + n_{2,(1,5)} + n_{2,(1,6)} = 2 \quad (37)$$
$$n_{2,(1,2)} + n_{2,(2,3)} + n_{2,(2,4)} + n_{2,(2,5)} + n_{2,(2,6)} = 2 \quad (38)$$
$$n_{2,(1,3)} + n_{2,(2,3)} + n_{2,(3,4)} + n_{2,(3,5)} + n_{2,(3,6)} = 2 \quad (39)$$
$$n_{2,(1,4)} + n_{2,(2,4)} + n_{2,(3,4)} + n_{2,(4,5)} + n_{2,(4,6)} = 2 \quad (40)$$
$$n_{2,(1,5)} + n_{2,(2,5)} + n_{2,(3,5)} + n_{2,(4,5)} + n_{2,(5,6)} = 2 \quad (41)$$
$$n_{2,(1,6)} + n_{2,(2,6)} + n_{2,(3,6)} + n_{2,(4,6)} + n_{2,(5,6)} = 2 \quad (42)$$

Let $n_1 = n - n_2$. Since n is a complete MRRV, the constant $c_1$ in (31) equals zero. Subtracting (37)–(42) from (31)–(36), we have the following equalities:

$$n_{1,(1,2)} + n_{1,(1,3)} + n_{1,(1,4)} + n_{1,(1,5)} + n_{1,(1,6)} = V - 2 \quad (43)$$
$$n_{1,(1,2)} + n_{1,(2,3)} + n_{1,(2,4)} + n_{1,(2,5)} + n_{1,(2,6)} = V - 2 \quad (44)$$
$$n_{1,(1,3)} + n_{1,(2,3)} + n_{1,(3,4)} + n_{1,(3,5)} + n_{1,(3,6)} = V - 2 \quad (45)$$
$$n_{1,(1,4)} + n_{1,(2,4)} + n_{1,(3,4)} + n_{1,(4,5)} + n_{1,(4,6)} = V - 2 \quad (46)$$
$$n_{1,(1,5)} + n_{1,(2,5)} + n_{1,(3,5)} + n_{1,(4,5)} + n_{1,(5,6)} = V - 2 \quad (47)$$
$$n_{1,(1,6)} + n_{1,(2,6)} + n_{1,(3,6)} + n_{1,(4,6)} + n_{1,(5,6)} = V - 2 \quad (48)$$

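Table 1: Number of weighted graphs for complete MRRVs

| Category | Number of weighted graphs | Number of illegal graphs | Number of legal graphs |
|----------|---------------------------|--------------------------|------------------------|
| α        | 1                         | 0                        | 1                      |
| β        | 3                         | 0                        | 3                      |
| γ        | 1                         | 0                        | 1                      |
| δ        | 52                        | 12                       | 40                     |
| Total    | 56                        | 12                       | 44                     |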

Fig. 9 Algorithm for RRV decomposition, assuming that all unused terminals, if any, are on side 1

Fig. 10 Weighted graphs for two complete MRRVs

a Illegal weighted graph; its total weight is equal to 2V

b Legal weighted graph with a hamiltonian subcycle $C_1 = \langle v_1, v_2, \ldots,$

Obviously, $n_1 \in U_{V-2}$. Applying similar techniques, we can show that all multiple graphs associated with degenerate complete MRRVs are sub-hamiltonian. Thus, $O \subseteq O'$. Based on the above discussion, we have $O' = O$. □
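To see the decomposition at work on a small instance, the sketch below searches by brute force for a sub-RRV $n_2$ whose six side sums all equal 2 and then forms $n_1 = n - n_2$. This is a stand-in written for illustration, not the algorithm RRV-Decomposition of Fig. 9 (which is not reproduced here); the function names and the example vector are assumptions.

```python
from itertools import combinations

PAIRS = list(combinations(range(1, 7), 2))  # connection types in RRV order

def side_loads(rrv):
    """Connections touching each of the six sides."""
    loads = [0] * 6
    for (i, j), n_ij in zip(PAIRS, rrv):
        loads[i - 1] += n_ij
        loads[j - 1] += n_ij
    return loads

def find_complete_sub_rrv(n):
    """Search for n2 <= n (componentwise) with every side sum equal to 2."""
    n2 = [0] * len(PAIRS)
    need = [2] * 6  # connections still required on each side

    def extend(k):
        if k == len(PAIRS):
            return all(r == 0 for r in need)
        i, j = PAIRS[k]
        for x in range(min(n[k], need[i - 1], need[j - 1]), -1, -1):
            n2[k] = x
            need[i - 1] -= x
            need[j - 1] -= x
            if extend(k + 1):
                return True
            need[i - 1] += x
            need[j - 1] += x
        n2[k] = 0
        return False

    return tuple(n2) if extend(0) else None

def decompose(n):
    """Return (n1, n2) with n = n1 + n2, or None if no complete n2 exists."""
    n2 = find_complete_sub_rrv(n)
    if n2 is None:
        return None
    return tuple(a - b for a, b in zip(n, n2)), n2

# A complete MRRV for V = 4: every side carries four connections.
V = 4
n = (2, 0, 0, 0, 2,  2, 0, 0, 0,  2, 0, 0,  2, 0,  2)
n1, n2 = decompose(n)
print(side_loads(n), side_loads(n1), side_loads(n2))
```

Here every side sum of $n_2$ is 2 and every side sum of $n_1$ is V − 2, matching (37)–(42) and (43)–(48) for this instance.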

Two sub-blocks are said to be ‘independent’ if they do not share any terminals. By lemma 5, independent switch sub-blocks can be considered individually and merged to form a larger switch block, and the routable RRVs for the merged switch block are identical to those given by applying vector addition operations on the routable RRVs for the independent switch sub-blocks. With the decomposition properties of symmetric switch blocks and RRVs, we can prove the following theorem by using mathematical induction.

Theorem 3: The symmetric switch blocks are universal.

Proof: Using definition 2, we shall show that, for an HSB $M_h$ of size V constructed by algorithm Symmetric_Switch_Block, n is routable on $M_h$ if and only if the following inequalities are simultaneously satisfied:

$$n_{1,2} + n_{1,3} + n_{1,4} + n_{1,5} + n_{1,6} \le V$$
$$n_{1,2} + n_{2,3} + n_{2,4} + n_{2,5} + n_{2,6} \le V$$
$$n_{1,3} + n_{2,3} + n_{3,4} + n_{3,5} + n_{3,6} \le V$$
$$n_{1,4} + n_{2,4} + n_{3,4} + n_{4,5} + n_{4,6} \le V$$
$$n_{1,5} + n_{2,5} + n_{3,5} + n_{4,5} + n_{5,6} \le V$$
$$n_{1,6} + n_{2,6} + n_{3,6} + n_{4,6} + n_{5,6} \le V$$

For the HSBs constructed by the algorithm, we have the following key observations (Fig. 7). For an HSB of an even V, we can partition it into V/2 non-interacting sub-blocks (shown in Fig. 7a); each sub-block has the same topology as that of b shown in Fig. 7a. For an HSB of an odd V, we can partition it into $\lceil V/2 \rceil$ non-interacting sub-blocks, with $\lfloor V/2 \rfloor$ of the sub-blocks identical to b and one sub-block with a clique topology formed by the six terminals from the middle of each side (Fig. 7b). Because terminals in different sub-blocks are non-interacting, each sub-block can be considered independently (lemma 5). Therefore, each symmetric HSB consists of $\lfloor V/2 \rfloor$ independent universal switch sub-blocks of size two, and an additional one of size one if V is odd. Further, by lemma 1, an HSB of size two constructed by algorithm Symmetric_Switch_Block is universal.

(If) For an even V, by lemma 5, we can decompose a vector $n = (n_{1,2}, n_{1,3}, \ldots, n_{1,6}, n_{2,3}, \ldots, n_{2,6}, n_{3,4}, \ldots, n_{3,6}, n_{4,5}, n_{4,6}, n_{5,6})$ into V/2 sub-RRVs $n_i = (n_{i,(1,2)}, n_{i,(1,3)}, \ldots, n_{i,(1,6)}, n_{i,(2,3)}, \ldots, n_{i,(2,6)}, n_{i,(3,4)}, \ldots, n_{i,(3,6)}, n_{i,(4,5)}, n_{i,(4,6)}, n_{i,(5,6)})$, where $i = 1, 2, \ldots, V/2$, such that each sub-RRV satisfies the following set of inequalities:

$$n_{i,(1,2)} + n_{i,(1,3)} + n_{i,(1,4)} + n_{i,(1,5)} + n_{i,(1,6)} \le 2 \quad (49)$$
$$n_{i,(1,2)} + n_{i,(2,3)} + n_{i,(2,4)} + n_{i,(2,5)} + n_{i,(2,6)} \le 2 \quad (50)$$
$$n_{i,(1,3)} + n_{i,(2,3)} + n_{i,(3,4)} + n_{i,(3,5)} + n_{i,(3,6)} \le 2 \quad (51)$$
$$n_{i,(1,4)} + n_{i,(2,4)} + n_{i,(3,4)} + n_{i,(4,5)} + n_{i,(4,6)} \le 2 \quad (52)$$
$$n_{i,(1,5)} + n_{i,(2,5)} + n_{i,(3,5)} + n_{i,(4,5)} + n_{i,(5,6)} \le 2 \quad (53)$$
$$n_{i,(1,6)} + n_{i,(2,6)} + n_{i,(3,6)} + n_{i,(4,6)} + n_{i,(5,6)} \le 2 \quad (54)$$

where

$$n = \sum_{i=1}^{V/2} n_i$$

By lemma 1, each RRV ni satisfying (49)–(54) must be routable on the HSB of size two, and by lemma 5, n is also routable on the symmetric HSB of size V.

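For an odd V, by lemma 5, we can decompose a vector n into $\lceil V/2 \rceil$ sub-RRVs $n_i$, where $i = 1, 2, \ldots, \lceil V/2 \rceil$, such that each of the sub-RRVs $n_i$, $i = 1, 2, \ldots, \lfloor V/2 \rfloor$, satisfies the set (49)–(54), and the remaining one, $n_{\lceil V/2 \rceil}$, satisfies the following set of inequalities:

$$n_{\lceil V/2 \rceil,(1,2)} + n_{\lceil V/2 \rceil,(1,3)} + n_{\lceil V/2 \rceil,(1,4)} + n_{\lceil V/2 \rceil,(1,5)} + n_{\lceil V/2 \rceil,(1,6)} \le 1 \quad (55)$$
$$n_{\lceil V/2 \rceil,(1,2)} + n_{\lceil V/2 \rceil,(2,3)} + n_{\lceil V/2 \rceil,(2,4)} + n_{\lceil V/2 \rceil,(2,5)} + n_{\lceil V/2 \rceil,(2,6)} \le 1 \quad (56)$$
$$n_{\lceil V/2 \rceil,(1,3)} + n_{\lceil V/2 \rceil,(2,3)} + n_{\lceil V/2 \rceil,(3,4)} + n_{\lceil V/2 \rceil,(3,5)} + n_{\lceil V/2 \rceil,(3,6)} \le 1 \quad (57)$$
$$n_{\lceil V/2 \rceil,(1,4)} + n_{\lceil V/2 \rceil,(2,4)} + n_{\lceil V/2 \rceil,(3,4)} + n_{\lceil V/2 \rceil,(4,5)} + n_{\lceil V/2 \rceil,(4,6)} \le 1 \quad (58)$$
$$n_{\lceil V/2 \rceil,(1,5)} + n_{\lceil V/2 \rceil,(2,5)} + n_{\lceil V/2 \rceil,(3,5)} + n_{\lceil V/2 \rceil,(4,5)} + n_{\lceil V/2 \rceil,(5,6)} \le 1 \quad (59)$$
$$n_{\lceil V/2 \rceil,(1,6)} + n_{\lceil V/2 \rceil,(2,6)} + n_{\lceil V/2 \rceil,(3,6)} + n_{\lceil V/2 \rceil,(4,6)} + n_{\lceil V/2 \rceil,(5,6)} \le 1 \quad (60)$$

We have

$$n = \sum_{i=1}^{\lceil V/2 \rceil} n_i$$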

By lemma 1, each RRV $n_i$ satisfying the set (49)–(54) must be routable on the symmetric HSB of size two. Further, the last RRV is also routable on an HSB of size one (a clique of six terminals). Hence, by lemma 5, n is also routable on the symmetric HSB of size V.

(Only if) For an HSB $M_h$ of size V, the total number of connections routed through each side of $M_h$ cannot exceed V. Hence, if n is routable on $M_h$, (1)–(6) must be satisfied. □

Theorem 4: No HSB with fewer than 15V switches can be universal.

Proof: By definition 2, an RRV with only one non-zero component V, such as (V, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (0, V, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), …, (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, V), is routable on a universal HSB. Hence, a universal HSB needs at least V non-interacting switches for each type of connection. Since there are 15 types of connection in an HSB, the smallest number of switches needed to construct a universal HSB is 15V.

As mentioned in subsection 3.1, the total number of switches used in a six-sided symmetric switch block is 15V. Thus, the symmetric HSBs are the ‘cheapest’ universal ones, i.e. they use the minimum number of switches needed to construct a universal switch block. Note that the requirement of 15V switches is quite small compared to a fully connected HSB, which has $15V^2$ switches. By theorem 1, any two isomorphic switch blocks have the same routing capacity. Hence, we have the following lemma.
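The two switch counts can be compared with a short calculation. In this sketch the function names are assumptions; the fully connected count treats every terminal as switched to all 5V terminals on the other five sides, with each switch shared by two terminals.

```python
def switches_symmetric(V, Fs=5):
    """6V terminals, Fs switches each, every switch counted twice."""
    return 6 * V * Fs // 2

def switches_fully_connected(V):
    """Each of the 6V terminals reaches all 5V terminals on the other sides."""
    return 6 * V * 5 * V // 2

for V in (1, 2, 4, 8):
    print(V, switches_symmetric(V), switches_fully_connected(V))
```

The ratio between the two counts is V, so the saving grows linearly with the channel width.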

Lemma 6: For any two isomorphic switch blocks $M_h(T, S)$ and $M_h'(T', S')$, $M_h(T, S)$ is universal if and only if $M_h'(T', S')$ is universal.

By lemma 6, we can obtain a whole class of universal switch blocks by performing isomorphism operations on a symmetric switch block.

4 Experimental results

To explore the effects of switch-block architectures on routing on a 3-D FPGA, we implemented a maze router in the C language and ran it on a SUN Ultra workstation. We tested the area performance of the router based on the CGE [2] and the SEGA [12] benchmark circuits. A logic-block pin was connected to any of the W tracks in the adjacent routing channel. These circuits were routed on a two-layer 3-D FPGA, with the layer for each terminal of a net assigned randomly. The switch-block architectures used were the symmetric switch blocks and clique-based (Xilinx XC4000-like) switch blocks. The quality of a switch block was evaluated by the area performance of the detailed router. Table 2 shows the results. For the results listed in this table, we determined the minimum number of tracks W required for 100% routing completion for each circuit, using the two kinds of switch blocks. Because net ordering often affects the performance of a maze router, the benchmark circuits were routed by using the following three net-ordering schemes to avoid possible biases: (i) net order as given in the original benchmark circuits, (ii) shortest net first (non-decreasing order of net lengths), and (iii) longest net first (non-increasing order of net lengths). Also, since our main goal is to make fair comparisons between switch-block architectures, no optimisation, such as a rip-up-and-reroute phase, was incorporated in the maze router (optimisation might bias the comparison).
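The three net-ordering schemes can be sketched as follows. The paper does not state how net length is measured, so this example assumes the half-perimeter of the 3-D bounding box of a net's terminals; the net names, coordinates and function names are likewise made up for illustration and are not taken from the benchmark circuits.

```python
def net_length(terminals):
    """Assumed length measure: half-perimeter of the 3-D bounding box."""
    return sum(max(axis) - min(axis) for axis in zip(*terminals))

def order_nets(nets, scheme):
    """Return nets (name, terminals) ordered by one of the three schemes."""
    if scheme == "original":
        return list(nets)
    if scheme == "shortest_first":   # non-decreasing net length
        return sorted(nets, key=lambda net: net_length(net[1]))
    if scheme == "longest_first":    # non-increasing net length
        return sorted(nets, key=lambda net: net_length(net[1]), reverse=True)
    raise ValueError(f"unknown scheme: {scheme}")

# Hypothetical nets on a two-layer array; terminals are (x, y, layer).
nets = [
    ("A", [(0, 0, 0), (3, 1, 0)]),
    ("B", [(1, 1, 0), (1, 2, 1)]),
    ("C", [(0, 0, 1), (5, 5, 1), (2, 2, 0)]),
]

for scheme in ("original", "shortest_first", "longest_first"):
    print(scheme, [name for name, _ in order_nets(nets, scheme)])
```

A router would then process the nets in the returned order; any other length estimate can be substituted for `net_length` without changing the ordering logic.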

The results show that, of the two kinds of switch blocks, the symmetric switch blocks usually needed the smaller W for 100% routing completion, whichever net order was used. The proposed symmetric switch blocks thus improve routability at the chip level. (An average improvement of 6% in area performance was achieved.) We also performed experiments to explore the effects of net density on the area performance of switch blocks. Connections were randomly generated on a 15 × 15 × 3 (number of logic blocks in the three layers) 3-D FPGA. For this purpose, it was assumed that the number of pins on each logic block was unlimited (so that we could test denser circuits). As shown in Fig. 11, no matter how dense the circuit is (numbers of connections ranging from 400 to 1600), the symmetric switch blocks consistently outperform the clique-based switch blocks. (An average improvement of 10% in area performance was achieved.)
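As a quick check of the 6% figure quoted for the benchmark circuits, the ratios in the Comparison row of Table 2 can be recomputed from the Total row. The short Python sketch below uses only the totals reported in Table 2. The variable names are illustrative, and taking the improvement to be the relative excess of tracks needed by the clique-based blocks is an assumption about how the average was formed.

```python
# Totals from Table 2: (symmetric, clique-based) tracks summed over all circuits.
totals = {
    "original order": (109, 118),
    "shortest net first": (106, 112),
    "longest net first": (125, 131),
}

# Ratio of clique-based to symmetric track counts for each net-ordering scheme.
ratios = {scheme: clique / sym for scheme, (sym, clique) in totals.items()}

# Average relative excess of the clique-based blocks over the three schemes.
avg_improvement = sum(r - 1.0 for r in ratios.values()) / len(ratios)

for scheme, r in ratios.items():
    print(f"{scheme}: {r:.2f}")
print(f"average improvement: {avg_improvement:.1%}")
```

The three ratios round to 1.08, 1.06 and 1.05, and their average excess is a little over 6%, which agrees with the figure stated in the text.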

Table 2: Minimum numbers of tracks needed for detailed-routing completion (Sym.: symmetric switch blocks; Cliq.: clique-based switch blocks)

Circuit      Logic blocks   Connections   Original order    Shortest net first   Longest net first
                                          Sym.     Cliq.    Sym.     Cliq.       Sym.     Cliq.
BUSC         13 × 12 × 2    392           7        9        7        7           8        8
DMA          18 × 16 × 2    771           9        9        8        8           10       10
BNRE         22 × 21 × 2    1257          9        9        8        9           10       10
DFSM         23 × 22 × 2    1422          9        9        8        8           9        9
Z03          27 × 26 × 2    2135          9        10       8        9           10       10
9symml       11 × 10 × 2    259           6        6        6        7           7        8
alu2         15 × 13 × 2    511           7        7        7        8           8        9
alu4         19 × 17 × 2    851           8        9        9        9           10       11
apex7        12 × 10 × 2    300           6        7        6        7           8        8
example2     14 × 12 × 2    444           8        8        8        8           9        10
k2           22 × 20 × 2    1256          11       11       10       11          12       12
term1        10 × 9 × 2     202           7        8        6        6           7        8
too_large    15 × 14 × 2    519           7        9        8        8           9        9
vda          17 × 16 × 2    722           7        9        8        9           9        10
Total        -              -             109      118      106      112         125      131
Comparison   -              -             1.00     1.08     1.00     1.06        1.00     1.05

Fig. 11 Comparison of area performance using the symmetric switch blocks and clique-based switch blocks for different numbers of connections on a 15 × 15 × 3 3-D FPGA (horizontal axis: number of connections; vertical axis: number of tracks needed for 100% routing; curves: universal, clique)

5 Concluding remarks

We have proposed a class of universal switch blocks for three-dimensional FPGAs. Each switch block has 15W switches and FS = 5. It has been proved that no switch block with fewer than 15W switches can be universal. The proposed switch blocks have been compared with those of Xilinx XC4000-like FPGAs. Experimental results have shown that the proposed universal switch blocks improve routability at the chip level.

There are several significant directions for future research.

Exploration of universal switch blocks for 3-D FPGAs with respect to multi-pin nets: in this paper, the theoretical analysis is based on two-pin nets. Nevertheless, the benchmark circuits also contain significant numbers of multi-pin nets, and the experimental results on these circuits conform to the theoretical findings based on two-pin nets. The approach can be extended to the design of universal switch blocks with respect to multi-pin nets. For example, one may first model the types of a specified multi-pin net (say, C(6, 3) = 20 types of nets for three-pin nets). Once this model is established, similar procedures may be applied for further analysis.

Development of global/detailed routers that take the universal switch-block architectures into account: to develop an FPGA router that considers the switch-block architecture, we shall first develop a congestion metric associated with the switch block, and then perform the routing based on congestion control of switch-block density as well as classical channel density.
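The count C(6, 3) = 20 mentioned above for three-pin nets can be reproduced by enumerating which faces of the six-sided switch block a net touches. The Python sketch below is illustrative only: the face labels in `FACES` are hypothetical, and treating a net type as an unordered set of distinct faces is the assumption implied by the C(6, 3) count.

```python
from itertools import combinations
from math import comb

# Hypothetical labels for the six faces of a 3-D switch block.
FACES = ("left", "right", "front", "back", "top", "bottom")

# A net type is modelled as the unordered set of distinct faces it connects.
two_pin_types = list(combinations(FACES, 2))
three_pin_types = list(combinations(FACES, 3))

print(len(two_pin_types), comb(6, 2))      # number of two-pin net types
print(len(three_pin_types), comb(6, 3))    # number of three-pin net types
```

Under the same model, two-pin nets fall into C(6, 2) = 15 types, so the step from two-pin to three-pin nets enlarges the case analysis from 15 to 20 types.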

6 References

1 Brown, S.D., Francis, R.J., Rose, J., and Vranesic, Z.G.: ‘Field-programmable gate arrays’ (Kluwer Academic Publishers, Boston, MA, 1992)

2 Rose, J., and Brown, S.: ‘Flexibility of interconnection structures for field-programmable gate arrays’, IEEE J. Solid-State Circuits, 1991, 26, (3), pp. 277–282

3 Chang, Y.-W., Wong, D.F., and Wong, C.K.: ‘Universal switch blocks for FPGA design’, ACM Trans. Des. Autom. Electron. Syst., 1996, 1, (1), pp. 80–101

4 Chang, Y.-W., Zhu, K., Wong, D.F., Wu, G.-M., and Wong, C.K.: ‘Analysis of FPGA/FPIC switch modules’, ACM Trans. Des. Autom. Electron. Syst., 2003, 8, (1), pp. 11–37

5 Zhu, K., Wong, D.F., and Chang, Y.-W.: ‘Switch module design with application to two-dimensional segmentation design’. Proc. IEEE/ACM Int. Conf. on Computer-aided design, Santa Clara, USA, Nov. 1993, pp. 481–486

6 Sun, Y., Wang, T.-C., Wong, C.K., and Liu, C.L.: ‘Routing for symmetric FPGAs and FPICs’. Proc. IEEE/ACM Int. Conf. on Computer-aided design, Santa Clara, Nov. 1993, pp. 486–490

7 Wu, G.-M., and Chang, Y.-W.: ‘Switch matrix architecture and routing for FPDs’. Proc. ACM Int. Symp. on Physical design, Monterey, CA, USA, April 1998, pp. 481–486

8 Wu, G.-M., and Chang, Y.-W.: ‘Quasi-universal switch matrices for FPD design’, IEEE Trans. Comput., 1999, 48, (10), pp. 1107–1122

9 Alexander, M.J., Cohoon, J.P., Ganley, J.L., and Robins, G.: ‘Three-dimensional field-programmable gate arrays’. Proc. IEEE Int. ASIC Conf., Austin, TX, September 1995, pp. 253–256

10 Depreitere, J., Neefs, H., Marck, H.V., Campenhout, J.V., Baets, B.D.R., Thienpont, H., and Veretennicoff, I.: ‘An optoelectronic 3-D field programmable gate array’. Proc. 4th Int. Workshop on Programmable logic applications, Prague, Czech Republic, September 1994

11 Thakur, S., Chang, Y.-W., Wong, D.F., and Muthukrishnan, S.: ‘Algorithms for an FPGA switch module routing problem with application to global routing’, IEEE Trans. Computer-Aided Design, 1997, 16, pp. 32–46

12 Lemieux, G.G., and Brown, S.D.: ‘A detailed routing algorithm for allocating wire segments in field-programmable gate arrays’. Proc. ACM/SIGDA Physical Design Workshop, Lake Arrowhead, CA, USA, 1993, pp. 215–216