Quasi-Universal Switch Matrices
for FPD Design
Guang-Ming Wu and Yao-Wen Chang, Member, IEEE
Abstract: An FPD switch module $M$ with $w$ terminals on each side is said to be universal if every set of nets satisfying the dimension constraint (i.e., the number of nets on each side of $M$ is at most $w$) is simultaneously routable through $M$ [8]. Chang et al. have identified a class of universal switch blocks in [8]. In this paper, we consider the design and routing problems for another popular model of switch modules called switch matrices. In contrast to switch blocks, we prove that no universal switch matrix exists. Nevertheless, we present quasi-universal switch matrices, which have the maximum possible routing capacities among all switch matrices of the same size, and show that their routing capacities converge to those of universal switch blocks. Each quasi-universal switch matrix of size $w$ has a total of only $14w - 20$ ($14w - 21$) switches if $w$ is even (odd), $w > 1$, compared to a fully populated one, which has $3w^2 - 2w$ switches. We prove that no switch matrix with fewer than $14w - 20$ ($14w - 21$) switches can be quasi-universal. Experimental results demonstrate that the quasi-universal switch matrices improve routability at the chip level.
Index Terms: Analysis, architecture, design, digital, gate array, programmable logic array.
1 INTRODUCTION
FIELD-PROGRAMMABLE DEVICES (FPDs) refer to any digital, user-configurable integrated circuits used to implement logic functions. Due to their short production time and low prototyping cost, FPDs have become a very popular alternative for realizing logic designs. Fig. 1 shows the architectures of major commercially available FPDs. As illustrated in Fig. 1a and Fig. 1b, a Field-Programmable Gate Array (FPGA) consists of an array of logic modules that can be connected by general routing resources. The logic modules contain combinational and sequential circuits that implement logic functions. The routing resources consist of horizontal and vertical channels and their intersection areas. An intersection area of a horizontal and a vertical channel is referred to as a switch module. A net can change its routing direction via a switch module; this requires using at least one programmable switch inside the switch module. A large circuit that cannot be accommodated in a single FPGA is divided into several parts; each part is realized by an FPGA and these FPGAs are then interconnected by a Field-Programmable Interconnect Chip (FPIC) (see Fig. 1c). In a Complex Programmable Logic Device (CPLD), logic modules are surrounded by continuous horizontal and vertical routing tracks (see Fig. 1d). As in FPGAs, an intersection area of a horizontal and a vertical channel in an FPIC or a CPLD is also referred to as a switch module.
Recent works [5], [17] have shown that the feasibility of FPGA design is constrained more by routing resources than by logic resources and that, often, routing delays, rather than logic-module delays, dominate the performance of FPGAs. Therefore, it is desirable to facilitate routing in the design of FPGAs and FPICs. Switch modules are the most important component of the routing resources in FPDs. Studies [6], [8], [15] have shown that the higher the routability of the switch modules, the smaller the track count needed to achieve 100 percent routing completion. Hence, increasing the routability of a switch module also improves the area performance of a router, and switch-module design is of significant importance. In current technology, FPD programmable switches usually consume a large amount of area. Due to the area constraint, the number of switches that can be placed in a switch module is usually limited, implying limited routability. Therefore, there is a basic trade-off between routability and area for switch-module architectures.
There are two types of switch modules in commercially available FPDs: switch matrices and switch blocks. (See Fig. 2 for their models.) The effects of switch-module architectures on routing for symmetric-array-based FPGAs were first studied experimentally by Rose and Brown [15]. A theoretical study of flexibility and routability was later presented based on a stochastic model [6]. The primary conclusion of both studies [6], [15] is that high pin-to-track connectivity together with relatively low switch-module connectivity is a better solution to the routability and area trade-off. Because the switch population in a switch module is therefore relatively small, the architecture of a switch module is of particular importance. Chang et al. [8] proposed a class of high-routability switch blocks and analyzed three types of well-known switch blocks; they showed theoretically and experimentally that switch blocks with higher routability usually lead to better area performance, which confirms the findings of [6], [15].
In this paper, we focus on switch matrices. Not much work has been reported on switch-matrix design. Zhu et al. [19] first explored the feasibility conditions for switch matrices and presented a design heuristic based on a stochastic approach. Chang et al. [7] applied a network-flow-based heuristic for switch-module design. Sun et al. [16] studied the effects on routing of using the two switch-module architectures (switch matrices and switch blocks). Based on the study in [16], an FPGA/FPIC with switch matrices in general needs fewer switches but more routing tracks for routing completion than one with switch blocks. This work shows the trade-offs in using the two types of switch modules.
• The authors are with the Department of Computer and Information Science, National Chiao Tung University, Hsinchu 300, Taiwan. E-mail: ywchang@cis.nctu.edu.tw.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number 108469.
In this paper, we consider the design and routing problems for universal switch matrices. An FPD switch module $M$ with $w$ terminals on each side is said to be universal if every set of nets satisfying the dimension constraint (i.e., the number of nets on each side of $M$ is at most $w$) is simultaneously routable through $M$ [8]. We prove that there exist no universal switch matrices. Nevertheless, we present quasi-universal switch matrices, which have the maximum possible routing capacities among all switch matrices of the same size, and show that their routing capacities converge to those of universal switch modules. Each quasi-universal switch matrix of size $w$ has a total of only $14w - 20$ ($14w - 21$) switches if $w$ is even (odd), $w > 1$, compared to a fully populated one, which has $3w^2 - 2w$ switches. We prove that no switch matrix with fewer than $14w - 20$ ($14w - 21$) switches can be quasi-universal. Experimental results demonstrate that the quasi-universal switch matrices improve routability at the chip level.
The rest of the paper is organized as follows: Section 2 gives the preliminaries for our problem. Section 3 explores the feasibility conditions of switch matrices. Section 4 presents the quasi-universal switch matrices. Section 5 gives an example graph modeling for detailed routing. Finally, experimental results are reported in Section 6.
2 PRELIMINARIES
A switch matrix consists of a grid of $w$ horizontal and $w$ vertical tracks. We represent a switch matrix by $M_w$ (or $M$ if $w$ is not of concern). There are two types of switches in a switch matrix: crossing switches and separating switches. (See Fig. 2a.) If a crossing switch at the intersection of a horizontal and a vertical track is "ON," the two tracks are connected; if it is "OFF," the tracks are not connected and are thus electrically noninteracting. If a separating switch on a track is "OFF," the track is split into two electrically noninteracting routing segments so that the terminals on opposite sides can be used independently; if it is "ON," the track becomes a single electrical track. In Fig. 2a, the crossing switches are represented by solid circles and the separating switches by hollow circles. Switch matrices are used in various symmetric-array FPGAs [4], [11], row-based FPGAs [1], [9], FPICs [3], and CPLDs [2].
Fig. 1. Major FPD architectures. (a) The symmetric-array FPGA model. (b) The row-based FPGA model. (c) The FPIC model. (d) The CPLD model.
A connection is an electrical path between two terminals on different sides of a switch module. Connections can be of six types, each of which is characterized by two sides of a module, as shown in Fig. 3. For example, type-6 connections connect terminals on the left and the bottom sides of a module. Type-1 and type-2 connections are straight connections, whereas the others are bent connections.
A routing requirement vector (RRV) $\vec{n}$ is a six-tuple $(n_1, n_2, \ldots, n_6)$, where $n_i$ is the number of type-$i$ connections required to be routed through a switch module, $0 \le n_i \le w$, $i = 1, 2, \ldots, 6$. An RRV $\vec{n}$ is said to be routable on a switch module $M$, denoted by $\vec{n} \propto M$, if there exists a routing for $\vec{n}$ on $M$. For example, the RRV $(0, 1, 0, 1, 1, 1)$ is routable on the switch matrix shown in Fig. 4 by programming the switches 1, 2, 3, and 7 to be ON, and a routing solution is illustrated by the thick lines; on the other hand, the RRV $(2, 2, 1, 0, 1, 0)$ is not routable on the switch matrix.
An RRV $\vec{m} = (m_1, m_2, \ldots, m_6)$ is dominated by another RRV $\vec{n} = (n_1, n_2, \ldots, n_6)$, denoted by $\vec{m} \le \vec{n}$, if and only if $m_i \le n_i$, $\forall i$, $1 \le i \le 6$. Any RRV $\vec{m}$ is routable on a switch matrix $M$ if there exists an RRV $\vec{n}$ that is routable on $M$ and $\vec{m} \le \vec{n}$; i.e., $\vec{n} \propto M \wedge \vec{m} \le \vec{n} \Rightarrow \vec{m} \propto M$. An RRV $\vec{n}$ is called maximally routable on a switch module $M$ if $\vec{n} \propto M$ and no additional nets can be routed through $M$; in other words, $\vec{n}$ is maximally routable if $\vec{n}$ is not dominated by any other routable RRV on $M$.
The routing capacity of a switch module $M$ is the number of distinct routable RRVs on $M$; that is, the routing capacity of $M$ is the cardinality $|\{\vec{n} \mid \vec{n} \propto M\}|$. The universal switch module (USM for short) is defined in [8] as follows:
Definition 1 [8]. A switch module $M_w$ is called universal if the following set of inequalities is the necessary and sufficient condition for an RRV $\vec{n} = (n_1, n_2, \ldots, n_6)$ to be routable on $M$:
$n_1 + n_3 + n_6 \le w$ (1)
$n_2 + n_3 + n_4 \le w$ (2)
$n_1 + n_4 + n_5 \le w$ (3)
$n_2 + n_5 + n_6 \le w$. (4)
Note that the number of nets routed through each side of M cannot exceed w; this dimension constraint is characterized by the preceding four inequalities, one for each side. Therefore, a USM has the maximum routing capacity, and it is desirable to find such a universal switch matrix, if any. In this paper, we design a class of switch matrices with best possible routing capacities and give the qualitative and quantitative analyses for the matrices.
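The dimension constraint (1)-(4) and the dominance relation can be written down directly. The following is a minimal sketch rather than anything taken from the paper: the function names `satisfies_dimension_constraint`, `is_dominated_by`, and `count_dimension_feasible` are hypothetical, and the brute-force enumeration is only practical for small $w$.

```python
from itertools import product

def satisfies_dimension_constraint(n, w):
    """Check inequalities (1)-(4): at most w nets on each side of the module."""
    n1, n2, n3, n4, n5, n6 = n
    return (n1 + n3 + n6 <= w and
            n2 + n3 + n4 <= w and
            n1 + n4 + n5 <= w and
            n2 + n5 + n6 <= w)

def is_dominated_by(m, n):
    """True if RRV m is dominated by RRV n, i.e., m_i <= n_i for every i."""
    return all(mi <= ni for mi, ni in zip(m, n))

def count_dimension_feasible(w):
    """Brute-force count of the six-tuples with 0 <= n_i <= w satisfying (1)-(4).

    For a universal switch module this count is its routing capacity,
    because (1)-(4) are then both necessary and sufficient.
    """
    return sum(1 for n in product(range(w + 1), repeat=6)
               if satisfies_dimension_constraint(n, w))
```

For instance, `count_dimension_feasible(1)` counts the zero vector, the six unit vectors, and the three pairs of connection types that share no side.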
3 FEASIBILITY CONDITIONS
In this section, we explore the feasibility conditions of switch matrices.
Lemma 1. An RRV $\vec{n}$ is routable on a switch matrix $M_w$ ($\vec{n} \propto M_w$) only if $\vec{n} = (w, w, 0, 0, 0, 0)$ or the following set of inequalities is satisfied:
$n_1 + n_3 + n_6 \le w$ (5)
$n_2 + n_3 + n_4 \le w$ (6)
$n_1 + n_4 + n_5 \le w$ (7)
$n_2 + n_5 + n_6 \le w$ (8)
$n_1 + n_2 + \max\{n_3 + n_5, n_4 + n_6\} \le 2w - 1$. (9)
Proof. The proof is inspired by the work in [19]. Obviously, $(w, w, 0, 0, 0, 0) \propto M_w$. It is trivial that (5)-(8) are necessary conditions for an RRV $\vec{n}$ to be routable on $M_w$ because the number of nets routed through each side of $M_w$ cannot exceed $w$. We show that (9) is also a necessary condition for $\vec{n} \propto M$.
Consider the RRV $\vec{n} = (0, 0, w, 0, w - 1, 0)$. We show that $\vec{n}$ is maximally routable. Clearly, any increment in $n_1$, $n_2$, $n_3$, $n_4$, or $n_6$ would result in violation of the necessary conditions (5) or (6). To verify that it is also impossible to increase $n_5$, see Fig. 5a. Since $n_3 = w$, track $t_1$, on the left side of the switch matrix in Fig. 5a, must be used to route a type-3 net, say net $x$. Net $x$ must turn upward somewhere on track $t_1$, and this will prevent one track on the bottom side of the switch matrix from routing type-5 nets. For example, if net $x$ turns upward at the intersection of track $t_1$ and track $t_3$, then track $t_3$ cannot be used to route type-5 nets. Therefore, $n_5 = w - 1$ cannot be increased and $\vec{n}$ is maximally routable.
Consider the RRV $\vec{n} = (i, i, w - i, 0, w - i - 1, 0)$, for $1 \le i \le w - 1$. Since $n_1 = n_2 = i$, there are $i$ horizontal tracks and $i$ vertical tracks that must be used to route $n_1$ type-1 and $n_2$ type-2 nets. Further, these tracks cannot be used to route any other type of net. Therefore, routing RRV $\vec{n}$ on $M_w$ can be reduced to routing $\vec{n}'$ on $M'$ of size $w - i$ by prerouting $n_1$ type-1 and $n_2$ type-2 nets, where $\vec{n}' = (0, 0, w - i, 0, w - i - 1, 0)$. For the case where $n_1 \ne n_2$, since routing a bent net (a type-3, -4, -5, or -6 net) requires using a horizontal and a vertical track, the combined number of type-3 and -5 (type-4 and -6) nets is limited by $i = \max\{n_1, n_2\}$. Hence, for the case where $n_1 \ne n_2$, routing RRV $\vec{n}$ on $M_w$ can also be reduced to routing $\vec{n}'$ on $M'$ of size $w - i$ by prerouting $n_1$ type-1 and $n_2$ type-2 nets, where $\vec{n}' = (0, 0, w - i, 0, w - i - 1, 0)$ and $i = \max\{n_1, n_2\}$. Based on the preceding observation, $\vec{n}'$ is maximally routable on $M'$, and so is $\vec{n}$ on $M_w$. Similarly, the RRVs $(i, i, 0, w - i, 0, w - i - 1)$, $(i, i, w - i - 1, 0, w - i, 0)$, and $(i, i, 0, w - i - 1, 0, w - i)$ are also maximally routable on $M_w$, $0 \le i \le w - 1$. Thus, (9) is a necessary condition for an RRV to be routable on $M_w$. □
Fig. 3. Six types of connections.
By Lemma 1, we have the following negative result.
Theorem 1. There exists no universal switch matrix.
Also, by Lemma 1, an RRV is simply unroutable on any switch matrix if the RRV fails to satisfy (5)-(9). We call an RRV nontrivial if it satisfies (5)-(9); otherwise, it is trivial (trivially unroutable). A switch matrix M is said to be
quasi-universal if all nontrivial RRVs are routable on M. We give the formal definition of a quasi-universal switch matrix (Q-USM for short) as follows:
Definition 2. A switch matrix $M_w$ is called quasi-universal if (5)-(9) are the necessary and sufficient conditions for an RRV $\vec{n} = (n_1, n_2, \ldots, n_6)$ to be routable on $M_w$.
Since (5)-(9) are the most fundamental routing constraints, a Q-USM has the maximum routing capacity among all switch matrices. It is thus of particular importance to find such a class of switch matrices.
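The conditions of Lemma 1 and the reason behind Theorem 1 can be checked mechanically. The sketch below is an illustration only; the names `passes_universal_conditions`, `passes_lemma1`, and `theorem1_witness` are hypothetical. It exhibits an RRV that satisfies the universality conditions (1)-(4) yet fails (9), so no switch matrix can route every RRV that a universal switch module must route.

```python
def passes_universal_conditions(n, w):
    """Inequalities (1)-(4), which coincide with (5)-(8)."""
    n1, n2, n3, n4, n5, n6 = n
    return (n1 + n3 + n6 <= w and n2 + n3 + n4 <= w and
            n1 + n4 + n5 <= w and n2 + n5 + n6 <= w)

def passes_lemma1(n, w):
    """Necessary condition of Lemma 1: n = (w, w, 0, 0, 0, 0) or (5)-(9) hold."""
    if tuple(n) == (w, w, 0, 0, 0, 0):
        return True
    n1, n2, n3, n4, n5, n6 = n
    return (passes_universal_conditions(n, w) and
            n1 + n2 + max(n3 + n5, n4 + n6) <= 2 * w - 1)

def theorem1_witness(w):
    """An RRV meeting (1)-(4) but violating (9): w type-3 and w type-5 nets."""
    return (0, 0, w, 0, w, 0)
```

Lowering $n_5$ by one gives $(0, 0, w, 0, w - 1, 0)$, the maximally routable RRV used in the proof of Lemma 1, which passes the check.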
4 QUASI-UNIVERSAL SWITCH MATRICES (Q-USM)
In this section, we present procedures for constructing the Q-USM and give quantitative analyses for the matrices.
4.1 Procedures for Q-USM Design
Fig. 5a, Fig. 5b, Fig. 5c, and Fig. 5d show configurations for routing $\vec{n}_1 = (0, 0, 6, 0, 5, 0)$, $\vec{n}_2 = (0, 0, 5, 0, 6, 0)$, $\vec{n}_3 = (0, 0, 0, 6, 0, 5)$, and $\vec{n}_4 = (0, 0, 0, 5, 0, 6)$ on switch matrices $M_1$, $M_2$, $M_3$, and $M_4$ of size $w = 6$, respectively. In particular, they are the only configurations for routing $\vec{n}_1, \ldots, \vec{n}_4$ on $M_1, \ldots, M_4$. Clearly, for a switch matrix $M$ to be quasi-universal, $\vec{n}_i$, $1 \le i \le 4$, must be routable on $M$. This observation motivated our construction of the diagonal switch matrix shown in Fig. 6. For the purpose of concise description, we refer to the subdiagonals of a switch matrix $M$ as the conceptual slanted lines parallel and adjacent to the two diagonals of $M$. Therefore, there are four subdiagonals on each switch matrix (see Fig. 6). A diagonal switch matrix $D$ is constructed based on the following three rules:
• Rule 1: Place crossing switches on the two diagonals of $D$;
• Rule 2: Place crossing switches on the four subdiagonals of $D$;
• Rule 3: Place separating switches between the diagonals and subdiagonals of $D$.
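Rules 1 and 2 can be sketched as a set of grid positions. The code below is an illustration under an assumption: it reads "diagonals" and "subdiagonals" as the cells $(i, j)$ of a $w \times w$ grid with $|i - j| \le 1$ or $|i + j - (w - 1)| \le 1$, which is one interpretation of Fig. 6 rather than a layout taken from the paper. The function names are hypothetical, and the placement of separating switches (Rule 3) is not modeled.

```python
def diagonal_crossing_switches(w):
    """Grid cells (row, col) that receive a crossing switch under Rules 1 and 2.

    Assumed geometry: the main diagonal and anti-diagonal of a w-by-w grid,
    plus the two lines adjacent and parallel to each of them.
    """
    cells = set()
    for i in range(w):
        for j in range(w):
            near_main = abs(i - j) <= 1            # main diagonal and its subdiagonals
            near_anti = abs(i + j - (w - 1)) <= 1  # anti-diagonal and its subdiagonals
            if near_main or near_anti:
                cells.add((i, j))
    return cells

def expected_crossing_count(w):
    """Crossing-switch count stated in the paper for w > 1."""
    return 6 * w - 8 if w % 2 == 0 else 6 * w - 9

def render(w):
    """Text picture of the crossing-switch pattern ('o' marks a switch)."""
    cells = diagonal_crossing_switches(w)
    return "\n".join("".join("o" if (i, j) in cells else "." for j in range(w))
                     for i in range(w))
```

Under this reading, the number of cells agrees with the $6w - 8$ ($6w - 9$) crossing switches stated in Section 4.2 for even (odd) $w$; calling `print(render(6))` shows the resulting X-shaped band.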
4.2 Proof of the Quasi-Universality
In this section, we show that the diagonal switch matrices constructed by the procedures mentioned earlier are the "cheapest" Q-USMs. Let
$\Omega_w = \{(w, w, 0, 0, 0, 0)\} \cup \{(n_1, n_2, n_3, n_4, n_5, n_6) \mid n_1 + n_3 + n_6 \le w;\ n_2 + n_3 + n_4 \le w;\ n_1 + n_4 + n_5 \le w;\ n_2 + n_5 + n_6 \le w;\ n_1 + n_2 + \max\{n_3 + n_5, n_4 + n_6\} \le 2w - 1\}$.
To prove that a diagonal switch matrix $D_w$ is quasi-universal, we must show that all RRVs in $\Omega_w$ are routable on $D_w$. The proof of the quasi-universality is based on mathematical induction and is informally described as follows: It is trivial to show that $D_1$ and $D_2$ are quasi-universal. Assume that all diagonal switch matrices are quasi-universal for $w \le m$. To prove that the claim holds for the case where $w = m + 2$, we give constructive routings for all maximally routable RRVs in $\Omega_{m+2} - \Omega_m$ on $D_{m+2}$ by applying the routings for the cases where $w \le m$. Further, we claim that the number of switches used by each of our diagonal switch matrices is, in fact, the minimum requirement for a switch matrix to be quasi-universal.
To establish the proof, we first need some definitions and lemmas. Let $r_{90}(\vec{n})$ ($r_{-90}(\vec{n})$) denote a 90-degree rotation counterclockwise (clockwise), and $r_h(\vec{n})$ ($r_v(\vec{n})$) a reflection along the horizontal (vertical) axis, for a routing configuration for $\vec{n}$. For example, if $\vec{n} = (1, 2, 3, 4, 5, 6)$, then $r_{90}(\vec{n}) = (2, 1, 4, 5, 6, 3)$, $r_{-90}(\vec{n}) = (2, 1, 6, 3, 4, 5)$, $r_h(\vec{n}) = (1, 2, 6, 5, 4, 3)$, and $r_v(\vec{n}) = (1, 2, 4, 3, 6, 5)$. We have the following definition and lemmas:
Definition 3. Two routing configurations for $\vec{n}$ and $\vec{m}$ are equivalent if $\vec{n}$ ($\vec{m}$) can be obtained by performing a sequence of $r_{90}$, $r_{-90}$, $r_h$, and/or $r_v$ operations on $\vec{m}$ ($\vec{n}$). We say that $\vec{n}$ and $\vec{m}$ are in the same equivalence class and denote this by $\vec{n} \sim \vec{m}$.
Lemma 2. $\forall \vec{m}, \vec{n}$: $\vec{m} \sim \vec{n} \Rightarrow (\vec{m} \propto D \Leftrightarrow \vec{n} \propto D)$.
Proof. Since the diagonal switch matrix $D$ is symmetrical, we can obtain the same diagonal switch matrix by performing the rotation or reflection operations. The claim thus follows. □
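The four operations act on an RRV as fixed permutations of its components, which can be read off the worked example for $(1, 2, 3, 4, 5, 6)$. The sketch below encodes those permutations and collects an equivalence class by closure; the function names are hypothetical and the code is an illustration, not the authors' procedure.

```python
def r90(n):
    """90-degree counterclockwise rotation: (1,2,3,4,5,6) -> (2,1,4,5,6,3)."""
    n1, n2, n3, n4, n5, n6 = n
    return (n2, n1, n4, n5, n6, n3)

def r_minus90(n):
    """90-degree clockwise rotation: (1,2,3,4,5,6) -> (2,1,6,3,4,5)."""
    n1, n2, n3, n4, n5, n6 = n
    return (n2, n1, n6, n3, n4, n5)

def rh(n):
    """Reflection along the horizontal axis: (1,2,3,4,5,6) -> (1,2,6,5,4,3)."""
    n1, n2, n3, n4, n5, n6 = n
    return (n1, n2, n6, n5, n4, n3)

def rv(n):
    """Reflection along the vertical axis: (1,2,3,4,5,6) -> (1,2,4,3,6,5)."""
    n1, n2, n3, n4, n5, n6 = n
    return (n1, n2, n4, n3, n6, n5)

def equivalence_class(n):
    """All RRVs reachable from n by sequences of the four operations."""
    seen = {tuple(n)}
    stack = [tuple(n)]
    while stack:
        current = stack.pop()
        for op in (r90, r_minus90, rh, rv):
            image = op(current)
            if image not in seen:
                seen.add(image)
                stack.append(image)
    return seen
```

By Lemma 2, routability on a diagonal switch matrix only needs to be established for one representative of each class returned by `equivalence_class`.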
Lemma 3. $(n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w \Rightarrow (n_1 + 2, n_2 + 2, n_3, n_4, n_5, n_6) \propto D_{w+2}$.
Proof. If $\vec{n} = (n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w$, $\vec{n}$ must satisfy (5)-(9), according to Lemma 1. When routing $\vec{n}' = (n_1 + 2, n_2 + 2, n_3, n_4, n_5, n_6)$ on $D_{w+2}$ (see Fig. 7), we can use tracks $t_1$ and $t_2$ ($t_3$ and $t_4$) to preroute two type-1 (type-2) nets. Hence, as illustrated in Fig. 7, the routing on the remaining part of $D_{w+2}$ is identical to routing $\vec{n}$ on $D_w$. The claim thus follows. □
Lemma 4. $(n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w \Rightarrow (n_1, n_2, n_3 + 1, n_4 + 1, n_5 + 1, n_6 + 1) \propto D_{w+2}$.
Proof. If $\vec{n} = (n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w$, $\vec{n}$ must satisfy (5)-(9), according to Lemma 1. When routing $\vec{n}' = (n_1, n_2, n_3 + 1, n_4 + 1, n_5 + 1, n_6 + 1)$ on $D_{w+2}$ (see Fig. 8), we can use the crossing switches on the four corners of $D_{w+2}$ to preroute a type-3, a type-4, a type-5, and a type-6 net. As illustrated in Fig. 8, the routing on the remaining part of $D_{w+2}$ is identical to routing $\vec{n}$ on $D_w$. Therefore, $\vec{n}' \propto D_{w+2}$. □
Lemma 5. $(n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w \Rightarrow (n_1 + 1, n_2 + 1, n_3 + 1, n_4, n_5, n_6) \propto D_{w+2}$ (also, $(n_1 + 1, n_2 + 1, n_3, n_4 + 1, n_5, n_6)$, $(n_1 + 1, n_2 + 1, n_3, n_4, n_5 + 1, n_6)$, and $(n_1 + 1, n_2 + 1, n_3, n_4, n_5, n_6 + 1) \propto D_{w+2}$).
Fig. 7. Prerouting two type-1 nets and two type-2 nets on $D_{w+2}$.
Fig. 8. Prerouting a type-3, a type-4, a type-5, and a type-6 net on the respective four corners of $D_{w+2}$.
Proof. If $\vec{n} = (n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w$, $\vec{n}$ must satisfy (5)-(9), according to Lemma 1. When routing $\vec{n}' = (n_1 + 1, n_2 + 1, n_3 + 1, n_4, n_5, n_6)$ on $D_{w+2}$ (see Fig. 9a), we can use the bottom-most and right-most tracks of $D_{w+2}$ to preroute a type-1 and a type-2 net, and the crossing switch on the upper-left corner of $D_{w+2}$ to preroute a type-3 net. As illustrated in Fig. 9a, the routing on the remaining part of $D_{w+2}$ is reduced to routing $\vec{n}$ on $D_w$. Applying similar techniques, the RRV $\vec{n}' = (n_1 + 1, n_2 + 1, n_3, n_4 + 1, n_5, n_6)$, $(n_1 + 1, n_2 + 1, n_3, n_4, n_5 + 1, n_6)$, or $(n_1 + 1, n_2 + 1, n_3, n_4, n_5, n_6 + 1)$ can also be routed on $D_{w+2}$ (see Fig. 9b, Fig. 9c, Fig. 9d). Therefore, $\vec{n}' \propto D_{w+2}$. □
Lemma 6. $(n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w \Rightarrow (n_1 + 1, n_2, n_3 + 1, n_4 + 1, n_5, n_6) \propto D_{w+2}$ (also, $(n_1, n_2 + 1, n_3 + 1, n_4, n_5, n_6 + 1)$, $(n_1 + 1, n_2, n_3, n_4, n_5 + 1, n_6 + 1)$, and $(n_1, n_2 + 1, n_3, n_4 + 1, n_5 + 1, n_6) \propto D_{w+2}$).
Proof. If $\vec{n} = (n_1, n_2, n_3, n_4, n_5, n_6) \propto D_w$, $\vec{n}$ must satisfy (5)-(9), according to Lemma 1. When routing $\vec{n}' = (n_1 + 1, n_2, n_3 + 1, n_4 + 1, n_5, n_6)$ on $D_{w+2}$ (see Fig. 10), we can use the crossing switches on the two upper corners of $D_{w+2}$ to preroute a type-3 and a type-4 net, and the bottom-most track to preroute a type-1 net. As illustrated in Fig. 10a, the routing on the remaining part of $D_{w+2}$ is reduced to routing $\vec{n}$ on $D_w$. Applying similar techniques, the RRV $\vec{n}' = (n_1, n_2 + 1, n_3 + 1, n_4, n_5, n_6 + 1)$, $(n_1 + 1, n_2, n_3, n_4, n_5 + 1, n_6 + 1)$, or $(n_1, n_2 + 1, n_3, n_4 + 1, n_5 + 1, n_6)$ can also be routed on $D_{w+2}$ (see Fig. 10b, Fig. 10c, Fig. 10d). Therefore, $\vec{n}' \propto D_{w+2}$. □
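Lemmas 3-6 each add a fixed increment vector to an RRV when the matrix size grows from $w$ to $w + 2$. The sketch below lists the ten increments and checks, by brute force for small $w$, that every extended RRV still satisfies the Lemma 1 conditions for size $w + 2$. The names `LEMMA_INCREMENTS`, `in_feasible_set`, and `derived_rrvs` are hypothetical, and the check covers only the necessary conditions, not the routings themselves.

```python
from itertools import product

# Increment vectors taken from the statements of Lemmas 3-6.
LEMMA_INCREMENTS = {
    "Lemma 3": [(2, 2, 0, 0, 0, 0)],
    "Lemma 4": [(0, 0, 1, 1, 1, 1)],
    "Lemma 5": [(1, 1, 1, 0, 0, 0), (1, 1, 0, 1, 0, 0),
                (1, 1, 0, 0, 1, 0), (1, 1, 0, 0, 0, 1)],
    "Lemma 6": [(1, 0, 1, 1, 0, 0), (0, 1, 1, 0, 0, 1),
                (1, 0, 0, 0, 1, 1), (0, 1, 0, 1, 1, 0)],
}

def in_feasible_set(n, w):
    """Lemma 1 conditions for size w: n = (w, w, 0, 0, 0, 0) or (5)-(9) hold."""
    if tuple(n) == (w, w, 0, 0, 0, 0):
        return True
    n1, n2, n3, n4, n5, n6 = n
    return (n1 + n3 + n6 <= w and n2 + n3 + n4 <= w and
            n1 + n4 + n5 <= w and n2 + n5 + n6 <= w and
            n1 + n2 + max(n3 + n5, n4 + n6) <= 2 * w - 1)

def derived_rrvs(n):
    """RRVs obtained from n by one application of Lemma 3, 4, 5, or 6."""
    return {tuple(a + b for a, b in zip(n, inc))
            for incs in LEMMA_INCREMENTS.values() for inc in incs}

def extensions_stay_feasible(w):
    """True if every RRV feasible for size w extends to RRVs feasible for w + 2."""
    return all(in_feasible_set(d, w + 2)
               for n in product(range(w + 1), repeat=6) if in_feasible_set(n, w)
               for d in derived_rrvs(n))
```

Each increment raises every side count by at most 2 and the left-hand side of (9) by at most 4, which is why the extended RRVs remain within the bounds for $w + 2$.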
Lemmas 3-6 give constructive routings for most maximally routable RRVs in $\Omega_{w+2} - \Omega_w$ on $D_{w+2}$ by extending the routings for $D_w$. However, there exist some RRVs in $\Omega_{w+2} - \Omega_w$ that cannot be derived from Lemmas 3-6. We proceed to identify those RRVs in the following discussion.
If an RRV $\vec{n} = (n_1, n_2, n_3, n_4, n_5, n_6)$ is nontrivial for $D_w$, then any of the following RRVs (say $\vec{n}'$)
$(n_1 + 2, n_2 + 2, n_3, n_4, n_5, n_6)$;
$(n_1, n_2, n_3 + 1, n_4 + 1, n_5 + 1, n_6 + 1)$;
$(n_1 + 1, n_2 + 1, n_3 + 1, n_4, n_5, n_6)$;
$(n_1 + 1, n_2 + 1, n_3, n_4 + 1, n_5, n_6)$;
$(n_1 + 1, n_2 + 1, n_3, n_4, n_5 + 1, n_6)$;
$(n_1 + 1, n_2 + 1, n_3, n_4, n_5, n_6 + 1)$;
$(n_1 + 1, n_2, n_3 + 1, n_4 + 1, n_5, n_6)$;
$(n_1 + 1, n_2, n_3, n_4, n_5 + 1, n_6 + 1)$;
$(n_1, n_2 + 1, n_3 + 1, n_4, n_5, n_6 + 1)$; and
$(n_1, n_2 + 1, n_3, n_4 + 1, n_5 + 1, n_6)$
is nontrivial for $D_{w+2}$. We say an RRV such as $\vec{n}'$ is derivable from $\Omega_w$ since it is in $\Omega_{w+2}$ and can be obtained by performing some operation defined in Lemmas 3-6 on an RRV $\vec{n} \in \Omega_w$; it is underivable, otherwise. Based on Lemmas 3-6, we have the fact that $\vec{n} \propto D_w \Rightarrow \vec{n}' \propto D_{w+2}$. Do there exist any underivable RRVs? As an example, $(0, 0, w + 1, 1, w + 1, 1) \in \Omega_{w+2}$ is underivable from $\Omega_w$. (At first glance, it seems that the RRV can be obtained by performing the operation defined in Lemma 4 on $(0, 0, w, 0, w, 0)$; unfortunately, $(0, 0, w, 0, w, 0) \notin \Omega_w$.) We have the following lemmas.
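Lemma 7. Table 1 lists all maximally underivable RRVs in $\Omega_{w+2} - \Omega_w$.
Proof. We partition all maximal, nontrivial RRVs $\vec{n}' = (n'_1, n'_2, n'_3, n'_4, n'_5, n'_6)$ into five sets $A, B, \ldots, E$ as follows:
$A = \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_1 \ge 2 \wedge n'_2 \ge 2\}$
$B = \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_3 \ge 1 \wedge n'_4 \ge 1 \wedge n'_5 \ge 1 \wedge n'_6 \ge 1\}$
$C = \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_1 \ge 1 \wedge n'_2 \ge 1 \wedge n'_3 \ge 1\} \cup \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_1 \ge 1 \wedge n'_2 \ge 1 \wedge n'_4 \ge 1\} \cup \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_1 \ge 1 \wedge n'_2 \ge 1 \wedge n'_5 \ge 1\} \cup \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_1 \ge 1 \wedge n'_2 \ge 1 \wedge n'_6 \ge 1\}$
$D = \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_1 \ge 1 \wedge n'_3 \ge 1 \wedge n'_4 \ge 1\} \cup \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_1 \ge 1 \wedge n'_5 \ge 1 \wedge n'_6 \ge 1\} \cup \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_2 \ge 1 \wedge n'_3 \ge 1 \wedge n'_6 \ge 1\} \cup \{\vec{n}' \mid \vec{n}' \in \Omega_{w+2} - \Omega_w,\ n'_2 \ge 1 \wedge n'_4 \ge 1 \wedge n'_5 \ge 1\}$
Set $E$: the remaining RRVs.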
We first prove that all RRVs in $A$ are derivable. For each $\vec{n}' = (n'_1, n'_2, n'_3, n'_4, n'_5, n'_6) \in A$, $\vec{n}'$ satisfies the following set of inequalities (by Lemma 1):
$n'_1 + n'_3 + n'_6 \le w + 2$, (10)
$n'_2 + n'_3 + n'_4 \le w + 2$, (11)
$n'_1 + n'_4 + n'_5 \le w + 2$, (12)
$n'_2 + n'_5 + n'_6 \le w + 2$, (13)
$n'_1 + n'_2 + \max\{n'_3 + n'_5, n'_4 + n'_6\} \le 2(w + 2) - 1$, (14)
$n'_1 \ge 2 \wedge n'_2 \ge 2$. (15)
By Lemma 3, $\vec{n}'$ can be derived from $\vec{n} = (n_1, n_2, n_3, n_4, n_5, n_6)$, where $n_1 = n'_1 - 2$, $n_2 = n'_2 - 2$, $n_3 = n'_3$, $n_4 = n'_4$, $n_5 = n'_5$, and $n_6 = n'_6$. Substituting $\vec{n}$ for $\vec{n}'$ in (10)-(14), we have
$n_1 + n_3 + n_6 \le w$;
$n_2 + n_3 + n_4 \le w$;
$n_1 + n_4 + n_5 \le w$;
$n_2 + n_5 + n_6 \le w$;
$n_1 + n_2 + \max\{n_3 + n_5, n_4 + n_6\} \le 2w - 1$.
Therefore, $\vec{n} \in \Omega_w$. The claim that all RRVs in Set $A$ are derivable holds.
We then identify all underivable RRVs in $B - A$. For each $\vec{n}' = (n'_1, n'_2, n'_3, n'_4, n'_5, n'_6) \in B - A$, $\vec{n}'$ satisfies the following set of inequalities:
$n'_1 + n'_3 + n'_6 \le w + 2$, (16)
$n'_2 + n'_3 + n'_4 \le w + 2$, (17)
$n'_1 + n'_4 + n'_5 \le w + 2$, (18)
$n'_2 + n'_5 + n'_6 \le w + 2$, (19)
$n'_1 + n'_2 + \max\{n'_3 + n'_5, n'_4 + n'_6\} \le 2(w + 2) - 1$, (20)
$n'_3 \ge 1 \wedge n'_4 \ge 1 \wedge n'_5 \ge 1 \wedge n'_6 \ge 1$ and $\vec{n}' \notin A$. (21)
By Lemma 4, $\vec{n}'$ can be derived from $\vec{n} = (n_1, n_2, n_3, n_4, n_5, n_6)$, where $n_1 = n'_1$, $n_2 = n'_2$, $n_3 = n'_3 - 1$, $n_4 = n'_4 - 1$, $n_5 = n'_5 - 1$, and $n_6 = n'_6 - 1$. Substituting $\vec{n}$ for $\vec{n}'$ in (16)-(20), we have
$n_1 + n_3 + n_6 \le w$; (22)
$n_2 + n_3 + n_4 \le w$; (23)
$n_1 + n_4 + n_5 \le w$; (24)
$n_2 + n_5 + n_6 \le w$; (25)
$n_1 + n_2 + \max\{n_3 + n_5, n_4 + n_6\} \le 2w + 1$. (26)
By (9) in Lemma 1, $\vec{n} \in \Omega_w$ except when
$n_1 + n_2 + n_3 + n_5 = 2w + 1$; (27)
$n_1 + n_2 + n_4 + n_6 = 2w + 1$; (28)
$n_1 + n_2 + n_3 + n_5 = 2w$; or (29)
$n_1 + n_2 + n_4 + n_6 = 2w$. (30)
We show that (27) and (28) are impossible. Combining (22) and (23), (23) and (24), (22) and (25), and (24) and (25), we have
$n_1 + n_2 + n_4 + n_6 + 2n_3 \le 2w$; (31)
$n_1 + n_2 + n_3 + n_5 + 2n_4 \le 2w$; (32)
$n_1 + n_2 + n_3 + n_5 + 2n_6 \le 2w$; (33)
$n_1 + n_2 + n_4 + n_6 + 2n_5 \le 2w$. (34)
By (28) and (31), we have $2w + 1 = n_1 + n_2 + n_4 + n_6 \le n_1 + n_2 + n_4 + n_6 + 2n_3 \le 2w$, a contradiction. Similarly, (27) is impossible. Therefore, we need to consider only (29) and (30).
We identify the underivable RRVs induced by (29) and (30) in the following. Subtracting (29) from (32) and (33), we have $n_4 = n_6 = 0$. Substituting zeros for $n_4$ and $n_6$ in (22)-(25), we have
$n_1 + n_3 \le w$; (35)
$n_2 + n_3 \le w$; (36)
$n_1 + n_5 \le w$; (37)
$n_2 + n_5 \le w$. (38)
By (29) and (35)-(38), we have $n_1 + n_3 = n_2 + n_3 = n_1 + n_5 = n_2 + n_5 = w$. Therefore, $n_1 = n_2$ and $n_3 = n_5$. Since $\vec{n}' \notin A$ ($\vec{n}' \in B - A$), $n_1 = n'_1$, and $n_2 = n'_2$, we have $n_1 \le 1$ and $n_2 \le 1$. Hence, $\vec{n} = (0, 0, w, 0, w, 0)$ or $(1, 1, w - 1, 0, w - 1, 0)$ satisfies (29). Similarly, by (30), $\vec{n} = (0, 0, 0, w, 0, w)$ or $(1, 1, 0, w - 1, 0, w - 1)$. Substituting $\vec{n}'$ for $\vec{n}$, we conclude that $\vec{n}' = (0, 0, w + 1, 1, w + 1, 1)$, $(1, 1, w, 1, w, 1)$, $(0, 0, 1, w + 1, 1, w + 1)$, and $(1, 1, 1, w, 1, w)$ are underivable in $B - A$. Let the equivalence class induced by a set $x$ be $[x]$; that is, we represent an equivalence class by the symbol $[\ ]$. Since $(0, 0, w + 1, 1, w + 1, 1) \sim (0, 0, 1, w + 1, 1, w + 1)$ and $(1, 1, w, 1, w, 1) \sim (1, 1, 1, w, 1, w)$, we have $[B - A] = \{(0, 0, w + 1, 1, w + 1, 1), (1, 1, w, 1, w, 1)\}$.
Applying similar techniques, we can obtain all underivable RRVs in $\Omega_{w+2} - \Omega_w$, listed as follows:
$[E] = \{(0, 0, w + 2, 0, w + 1, 0), (1, 0, w + 1, 0, w + 1, 0)\}$ /* in Classes 3 and 5 */
$[B - A] = \{(0, 0, w + 1, 1, w + 1, 1), (1, 1, w, 1, w, 1)\}$ /* Class 4 */
$[C - (A \cup B)] = \{(1, 1, w, 0, w + 1, 0), (2, 1, w, 1, w - 1, 0), (2, 1, w, 0, w, 0)\}$ /* in Classes 2, 3, and 5 */
$[D - (A \cup B \cup C)] = \{(1, 0, 1, w + 1, 0, w)\}$ /* in Class 1 */.
Table 1 summarizes those underivable RRVs into five equivalence classes. Note that $(1, 0, 1, w + 1, 0, w)$ is in Class 1 since it can be obtained by performing the $r_v(\vec{n}')$ operation on $(1, 0, w + 1, 1, w, 0)$ (see Section 4.2 for the equivalence operations). (Similarly, $(1, 1, w, 0, w + 1, 0)$ belongs to Class 3.) □
TABLE 1
TABLE 2
Capacity Comparison of the Universal Switch Modules and the Diagonal Switch Matrices
Fig. 11. The graph modeling. (a) A symmetrical-array-based FPGA architecture. (b) Switches in the connection module and the switch matrix. (c) The graph topology.
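The notion of underivability used in Lemma 7 can be explored by brute force for small sizes. The sketch below is an illustration with hypothetical names (`INCREMENTS`, `in_feasible_set`, `is_underivable`, `underivable_rrvs`): it treats an RRV that is feasible for size $w + 2$ as underivable when subtracting each of the ten increments of Lemmas 3-6 never yields an RRV that is feasible for size $w$. It enumerates all such RRVs, not only the maximal ones summarized in Table 1.

```python
from itertools import product

# The ten increment vectors of Lemmas 3-6.
INCREMENTS = [
    (2, 2, 0, 0, 0, 0), (0, 0, 1, 1, 1, 1),
    (1, 1, 1, 0, 0, 0), (1, 1, 0, 1, 0, 0), (1, 1, 0, 0, 1, 0), (1, 1, 0, 0, 0, 1),
    (1, 0, 1, 1, 0, 0), (0, 1, 1, 0, 0, 1), (1, 0, 0, 0, 1, 1), (0, 1, 0, 1, 1, 0),
]

def in_feasible_set(n, w):
    """Lemma 1 conditions for size w: n = (w, w, 0, 0, 0, 0) or (5)-(9) hold."""
    if tuple(n) == (w, w, 0, 0, 0, 0):
        return True
    n1, n2, n3, n4, n5, n6 = n
    return (n1 + n3 + n6 <= w and n2 + n3 + n4 <= w and
            n1 + n4 + n5 <= w and n2 + n5 + n6 <= w and
            n1 + n2 + max(n3 + n5, n4 + n6) <= 2 * w - 1)

def is_underivable(n_prime, w):
    """True if no increment of Lemmas 3-6 maps a size-w feasible RRV onto n_prime."""
    for inc in INCREMENTS:
        pred = tuple(a - b for a, b in zip(n_prime, inc))
        if min(pred) >= 0 and in_feasible_set(pred, w):
            return False
    return True

def underivable_rrvs(w):
    """RRVs feasible for w + 2, not feasible for w, and underivable from size w."""
    return [n for n in product(range(w + 3), repeat=6)
            if in_feasible_set(n, w + 2) and not in_feasible_set(n, w)
            and is_underivable(n, w)]
```

For example, with $w = 2$ the list contains $(0, 0, 3, 1, 3, 1)$, the instance of $(0, 0, w + 1, 1, w + 1, 1)$ discussed in the text, because its only candidate predecessor $(0, 0, 2, 0, 2, 0)$ violates (9).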
Lemma 8. Algorithms 1-5 (Figs. 14, 15, 16, 17, 18) give respective routing solutions for the five classes of underivable RRVs listed in Table 1.
Based on Lemmas 3-8, we have the following theorem.
Theorem 2. The diagonal switch matrices are quasi-universal.
Proof. By definition, we shall show that all nontrivial RRVs are routable on a diagonal switch matrix $D_w$. By Lemma 2, we only need to show that there exists one RRV in each maximal, nontrivial equivalence class that is routable on $D_w$. We proceed by induction on the size $w$ of a diagonal switch matrix. The claim trivially holds for $w \le 2$ since it is easy to enumerate all RRVs and check whether they are routable on the corresponding diagonal switch matrices. Assume that all diagonal switch matrices are quasi-universal for $w \le m$. Consider the case where $w = m + 2$. Lemmas 3-6 and Algorithms 1-5 (Figs. 14, 15, 16, 17, 18) give constructive routings for all maximally routable RRVs in $\Omega_{m+2} - \Omega_m$ by applying the routings for the cases where $w = m$ and $w = m - 2$. (See Algorithms 1-5 (Figs. 14, 15, 16, 17, 18) for the routings for RRVs in the five equivalence classes (listed in Table 1) on the diagonal switch matrix.) Hence, by induction, the diagonal switch matrices are quasi-universal. □
Thus, we have shown that the diagonal switch matrices are quasi-universal and that they therefore have the maximum routing capacity among all switch matrices of the same size. It is easy to see that each diagonal switch matrix of size $w$ contains $6w - 8$ ($6w - 9$) crossing switches if $w$ is even (odd), and $8w - 12$ separating switches, $w > 1$. In particular, these numbers of switches are also the minimum requirement for a switch matrix to be quasi-universal.
Fig. 12. Algorithm for finding the set of mutually exclusive switch pairs on a switch matrix.
TABLE 3
Theorem 3. No switch matrix with fewer than $6w - 8$ ($6w - 9$ if $w$ is odd) crossing switches and $8w - 12$ separating switches can be quasi-universal, $w > 1$.
Proof. Consider the four RRVs $(0, 0, w, 0, w - 1, 0)$, $(0, 0, w - 1, 0, w, 0)$, $(0, 0, 0, w, 0, w - 1)$, and $(0, 0, 0, w - 1, 0, w)$. Since they all satisfy (5)-(9), they must be routable on a quasi-universal switch matrix. The set of switches needed to route the four RRVs is equivalent to the "union" of the switches in the four corresponding routing topologies shown in Fig. 5 (Fig. 5 shows an example for the case where $w = 6$). By counting the number of switches in the "union" set, it is thus easy to see that $6w - 8$ ($6w - 9$ if $w$ is odd) crossing switches and $8w - 12$ separating switches, $w > 1$, are the minimum requirement for a switch matrix to be quasi-universal. □
Hence, the diagonal switch matrices are the "cheapest" quasi-universal switch matrices. Note that the number of switches required for a diagonal switch matrix is very small compared to a fully populated switch matrix, which has $w^2$ crossing switches and $2w^2 - 2w$ separating switches.
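The switch counts quoted above are simple to tabulate. The sketch below only evaluates the formulas stated in the text; the function names `diagonal_switch_counts` and `full_switch_counts` are hypothetical.

```python
def diagonal_switch_counts(w):
    """(crossing, separating) switch counts of a diagonal switch matrix, w > 1."""
    crossing = 6 * w - 8 if w % 2 == 0 else 6 * w - 9
    separating = 8 * w - 12
    return crossing, separating

def full_switch_counts(w):
    """(crossing, separating) switch counts of a fully populated switch matrix."""
    return w * w, 2 * w * w - 2 * w

def savings_table(sizes):
    """Rows of (w, diagonal total, fully populated total) for the given sizes."""
    return [(w, sum(diagonal_switch_counts(w)), sum(full_switch_counts(w)))
            for w in sizes]
```

The totals are $14w - 20$ ($14w - 21$) for even (odd) $w$ versus $3w^2 - 2w$. According to these formulas the two totals coincide for $w = 2$ and $w = 3$, and the diagonal matrix is strictly smaller from $w = 4$ onward; for $w = 6$ the totals are 64 and 96.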
4.3 Routing-Capacity Analysis
Let $D_w$ and $U_w$ be a diagonal switch matrix and a universal switch module of size $w$, respectively. Let $F_{D_w}$ be the feasible set for $D_w$; that is, $F_{D_w} = \{\vec{n} \mid \vec{n} \propto D_w\}$. $F_{U_w}$ is similarly defined. We have the following theorem.
TABLE 4
Number of Tracks Needed for Detailed-Routing Completion for the CGE (Top 5) and SEGA (Bottom 9) Benchmark Circuits
The three schemes for net order are 1) original net order as given in the benchmark circuits, 2) shortest net first (nondecreasing order of net lengths), and 3) longest net first (nonincreasing order of net lengths).
TABLE 5
Comparison of the Area Performance by Using the Diagonal Switch Matrices and Randomly Generated Switch Matrices for Various Connection Densities, Based on a 15 × 15 FPGA
Fig. 13. Comparison of the area performance by using the diagonal switch matrices and randomly generated switch matrices for different numbers of connections on a 15 × 15 FPGA.
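Theorem 4. $|F_{D_w}| = |F_{U_w}| - 2w = \left\lfloor \frac{5}{6!}\left(2w^6 + 24w^5 + 119w^4 + 312w^3 + 464w^2 + 96w + 144\right) \right\rfloor$.
Proof. By Theorem 2 and Definition 2,
$F_{D_w} = \{\vec{n} \mid n_1 + n_3 + n_6 \le w;\ n_2 + n_3 + n_4 \le w;\ n_1 + n_4 + n_5 \le w;\ n_2 + n_5 + n_6 \le w;\ n_1 + n_2 + \max\{n_3 + n_5, n_4 + n_6\} \le 2w - 1\} \cup \{(w, w, 0, 0, 0, 0)\}$.
The closed form for the cardinality of $F_{D_w}$, $w > 0$, can be obtained as follows:
$|F_{D_w}| = |\{\vec{n} \mid n_1 + n_3 + n_6 \le w;\ n_2 + n_3 + n_4 \le w;\ n_1 + n_4 + n_5 \le w;\ n_2 + n_5 + n_6 \le w\}| - \left|\bigcup_{i=0}^{w-1} \{(i, i, 0, w - i, 0, w - i)\}\right| - \left|\bigcup_{i=0}^{w-1} \{(i, i, w - i, 0, w - i, 0)\}\right|$
$= \left\lfloor \frac{5}{6!}\left(2w^6 + 24w^5 + 119w^4 + 312w^3 + 464w^2 + 384w + 144\right) \right\rfloor - 2w$
$= \left\lfloor \frac{5}{6!}\left(2w^6 + 24w^5 + 119w^4 + 312w^3 + 464w^2 + 96w + 144\right) \right\rfloor$.
Note that the identity
$|F_{U_w}| = |\{\vec{n} \mid n_1 + n_3 + n_6 \le w;\ n_2 + n_3 + n_4 \le w;\ n_1 + n_4 + n_5 \le w;\ n_2 + n_5 + n_6 \le w\}| = \left\lfloor \frac{5}{6!}\left(2w^6 + 24w^5 + 119w^4 + 312w^3 + 464w^2 + 384w + 144\right) \right\rfloor$
is given by [8]. □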
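The closed forms in Theorem 4 can be compared against direct enumeration for small $w$. The sketch below is an illustration with hypothetical function names; it assumes that the bracket in the closed form denotes a floor, and it counts the feasible sets by brute force over all six-tuples with $0 \le n_i \le w$.

```python
from itertools import product

def closed_form_universal(w):
    """Closed form for |F_U_w| attributed to [8], read with a floor."""
    poly = (2 * w**6 + 24 * w**5 + 119 * w**4 + 312 * w**3
            + 464 * w**2 + 384 * w + 144)
    return (5 * poly) // 720  # 6! = 720

def closed_form_diagonal(w):
    """Closed form for |F_D_w| from Theorem 4 (the 96w variant of the polynomial)."""
    poly = (2 * w**6 + 24 * w**5 + 119 * w**4 + 312 * w**3
            + 464 * w**2 + 96 * w + 144)
    return (5 * poly) // 720

def side_constraints_hold(n, w):
    n1, n2, n3, n4, n5, n6 = n
    return (n1 + n3 + n6 <= w and n2 + n3 + n4 <= w and
            n1 + n4 + n5 <= w and n2 + n5 + n6 <= w)

def brute_force_universal(w):
    """Number of RRVs satisfying the four side constraints."""
    return sum(1 for n in product(range(w + 1), repeat=6)
               if side_constraints_hold(n, w))

def brute_force_diagonal(w):
    """Number of RRVs satisfying (5)-(9), plus the special RRV (w, w, 0, 0, 0, 0)."""
    count = 0
    for n in product(range(w + 1), repeat=6):
        n1, n2, n3, n4, n5, n6 = n
        nine = n1 + n2 + max(n3 + n5, n4 + n6) <= 2 * w - 1
        if n == (w, w, 0, 0, 0, 0) or (side_constraints_hold(n, w) and nine):
            count += 1
    return count
```

For $w = 1$ the enumeration gives 10 RRVs for the universal module and 8 for the diagonal switch matrix, a difference of $2w = 2$.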
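Based on the above theorem, it is simple to verify the following theorem.
Theorem 5 (Capacity ratio).
1. $|F_{D_w}| / |F_{U_w}|$ is a strictly increasing function of $w$, $w > 0$;
2. $\lim_{w \to \infty} |F_{D_w}| / |F_{U_w}| = 1$.
Proof.
1. For $w > 0$,
$\frac{|F_{D_{w+1}}|}{|F_{U_{w+1}}|} - \frac{|F_{D_w}|}{|F_{U_w}|} = \frac{|F_{U_w}||F_{D_{w+1}}| - |F_{D_w}||F_{U_{w+1}}|}{|F_{U_{w+1}}||F_{U_w}|}$
$= \frac{|F_{D_{w+1}}|(|F_{D_w}| + 2w) - |F_{D_w}|(|F_{D_{w+1}}| + 2w + 2)}{|F_{U_{w+1}}||F_{U_w}|}$
$= \frac{|F_{D_{w+1}}||F_{D_w}| + 2w|F_{D_{w+1}}| - |F_{D_{w+1}}||F_{D_w}| - 2w|F_{D_w}| - 2|F_{D_w}|}{|F_{U_{w+1}}||F_{U_w}|}$
$= \frac{2w|F_{D_{w+1}}| - 2w|F_{D_w}| - 2|F_{D_w}|}{|F_{U_{w+1}}||F_{U_w}|}$
$= \frac{\vec{A} \cdot \vec{w}}{72\,|F_{U_{w+1}}||F_{U_w}|} > 0$,
where $\vec{A} = (10, 126, 637, 1608, 2008, 777, -144)$ and $\vec{w} = (w^6, w^5, w^4, w^3, w^2, w, 1)$. Because $|F_{D_w}| / |F_{U_w}| < |F_{D_{w+1}}| / |F_{U_{w+1}}|$, $|F_{D_w}| / |F_{U_w}|$ is a strictly increasing function of $w$, $w > 0$.
2. $\lim_{w \to \infty} \frac{|F_{D_w}|}{|F_{U_w}|} = \lim_{w \to \infty} \frac{|F_{D_w}|}{|F_{D_w}| + 2w} = 1$. □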
Therefore, the routing capacity of a diagonal switch matrix converges to that of a universal switch module of the same size. Table 2 summarizes the routing capacities for the universal switch module and the diagonal switch matrices and their capacity ratios.
Fig. 15. Algorithm for routing the maximal underivable Equivalence Class 2 on the diagonal switch matrix $D_{w+2}$.
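The behavior stated in Theorem 5 can be observed numerically from the closed forms of Theorem 4. The sketch below uses exact rational arithmetic; the function names are hypothetical, and the floor in the closed form is an assumption about the original notation.

```python
from fractions import Fraction

def universal_capacity(w):
    """Closed form for |F_U_w| (read with a floor; 6! = 720)."""
    poly = (2 * w**6 + 24 * w**5 + 119 * w**4 + 312 * w**3
            + 464 * w**2 + 384 * w + 144)
    return (5 * poly) // 720

def diagonal_capacity(w):
    """|F_D_w| = |F_U_w| - 2w by Theorem 4."""
    return universal_capacity(w) - 2 * w

def capacity_ratio(w):
    """Exact ratio |F_D_w| / |F_U_w|."""
    return Fraction(diagonal_capacity(w), universal_capacity(w))

def ratios_up_to(max_w):
    """List of (w, ratio as float) for w = 1..max_w."""
    return [(w, float(capacity_ratio(w))) for w in range(1, max_w + 1)]
```

The ratio starts at $8/10$ for $w = 1$ and $52/56$ for $w = 2$, and the gap to 1 shrinks roughly like $144/w^5$ because $|F_{U_w}|$ grows as $w^6/72$.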
5 GRAPH MODELING FOR DETAILED ROUTING
In the previous section, we showed theoretically that the diagonal switch matrices have high routing capacities. To explore the effects of switch-matrix architectures on chip-level routing, we test the area performance of a router on an FPD chip using benchmark circuits. To develop a router for experimentation, we model an FPD as a graph and apply a graph-search technique to FPD routing. In this section, we use a symmetrical-array-based FPGA as an example to demonstrate the graph modeling.
Given an FPGA architecture, we use a vertex to represent a wire segment or a logic-module pin and an edge to represent a connection that can be established by programming a switch or by using a track in the switch matrix. See Fig. 11 for an illustration. Fig. 11b shows a logic module with two pins on one side. We introduce two vertices $p_1$ and $p_2$ for the two pins shown in the figure. There are two horizontal (vertical) routing tracks partitioned into four wire segments (for the portion considered here), two on each of the left and right (the top and bottom) sides of the switch matrix. We introduce a vertex for each wire segment ($i_1, i_2, \ldots, i_8$ in Fig. 11c) and an edge between the two vertices associated with each pair of wire segments abutted on the switch matrix (edges $(i_1, i_3)$, $(i_6, i_8)$, etc.). If there is a crossing switch connected to any pin or wire segment, then we introduce an edge between the two corresponding vertices. (Note that the switches in the connection module can be viewed as crossing switches. A connection module is, in fact, a switch module with no separating switches.) For instance, since pin $p_1$ can connect to wire segment $i_1$ ($i_2$), an edge $e_1$ ($e_2$) between vertices $p_1$ and $i_1$ ($i_2$) is created. For each crossing switch, we create an additional four edges for its incident wire segments (thus, a clique for the four vertices associated with those segments is formed). The graph modeling is thus done. In addition to the graph modeling, however, we need two data structures to cope with the problems of connection conflicts. The conflicts arise in two forms:
• Mutually exclusive crossing switches: Two crossing switches with no separating switch between them are mutually exclusive. For example, in Fig. 11, crossing switches s1 and s2 cannot be used for different connections at the same time since there is no separating switch between them. In this case, we say that s1 and s2 are in the same exclusion set. The algorithm in Fig. 12 provides a method to find the exclusion sets for a switch matrix.
• Mutually exclusive connections: Two connections are mutually exclusive if they do not belong to the same net and are incident on the same crossing switch. For example, in Fig. 11, edges (i1, i5) and (i3, i7) cannot be used for different connections at the same time since the connections are mutually exclusive. It is easy to derive an algorithm similar to the one in Fig. 12 to identify all mutually exclusive connections.
With these data structures, we can incorporate the detection of illegal connections and exclusion sets into a router.
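As an illustration of how exclusion sets might be computed, the following Python sketch groups crossing switches with a union-find structure. It is not the algorithm of Fig. 12, which is not reproduced here. The grid encoding is an assumption made for this example only: crossing switches are given as (row, col) positions, `h_seps` and `v_seps` mark separating switches on horizontal and vertical tracks, and an exclusion set is taken to be the transitive closure of the "no separating switch between them" relation. The function name `exclusion_sets` is hypothetical.

```python
from collections import defaultdict


def exclusion_sets(crossing, h_seps, v_seps):
    """Group crossing switches joined by an unbroken piece of track.

    Assumed encoding (not from the paper):
      crossing: iterable of (row, col) crossing-switch positions.
      h_seps:   set of (row, col); a separating switch on horizontal
                track `row` between columns col and col + 1.
      v_seps:   set of (row, col); a separating switch on vertical
                track `col` between rows row and row + 1.
    """
    crossing = sorted(set(crossing))
    parent = {s: s for s in crossing}

    def find(s):
        # Find the set representative, halving the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    def union(a, b):
        parent[find(a)] = find(b)

    by_row, by_col = defaultdict(list), defaultdict(list)
    for r, c in crossing:
        by_row[r].append(c)
        by_col[c].append(r)

    # Neighbouring switches on the same horizontal track are merged
    # unless a separating switch lies between them.
    for r, cols in by_row.items():
        cols.sort()
        for c1, c2 in zip(cols, cols[1:]):
            if not any((r, c) in h_seps for c in range(c1, c2)):
                union((r, c1), (r, c2))

    # Same rule along each vertical track.
    for c, rows in by_col.items():
        rows.sort()
        for r1, r2 in zip(rows, rows[1:]):
            if not any((r, c) in v_seps for r in range(r1, r2)):
                union((r1, c), (r2, c))

    groups = defaultdict(list)
    for s in crossing:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    switches = [(0, 0), (0, 1), (1, 1), (2, 2)]
    print(exclusion_sets(switches, set(), {(0, 1)}))
```

A router could consult the resulting groups before committing a connection: once one crossing switch of a group is used by a net, the remaining switches of that group are unavailable to other nets.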
Based on the graph modeling, we may formulate the routing problem as finding a set of disjoint trees (a forest), one tree per net, with each tree connecting all terminals of its net. Any graph-search-based algorithm, such as a maze router, can be used for detailed routing.
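The forest formulation can be sketched as a sequential breadth-first (maze-style) router over the vertex/edge model, in the spirit of the wave expansion of [13]. This is a minimal sketch and not the router used in the experiments. The names `build_adj`, `route_net`, and `route_all` and the toy graph are hypothetical, vertices claimed by one net are simply blocked for later nets, and the exclusion-set and mutually-exclusive-connection checks described above are omitted.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def route_net(adj, terminals, blocked):
    """Grow one tree connecting `terminals`; return its vertices or None."""
    tree = {terminals[0]}
    remaining = set(terminals[1:])
    while remaining:
        # Breadth-first expansion starting from the whole partial tree.
        prev = {v: None for v in tree}
        queue = deque(sorted(tree))
        hit = None
        while queue:
            u = queue.popleft()
            if u in remaining:
                hit = u
                break
            for v in sorted(adj.get(u, ())):
                if v not in prev and v not in blocked:
                    prev[v] = u
                    queue.append(v)
        if hit is None:
            return None  # some terminal is unreachable
        remaining.discard(hit)
        while hit is not None:  # trace the path back into the tree
            tree.add(hit)
            hit = prev[hit]
    return tree


def route_all(adj, nets):
    """Route nets in the given order; return (routed trees, failed names)."""
    used = set()
    pins = {t for terms in nets.values() for t in terms}
    routed, failed = {}, []
    for name, terms in nets.items():
        # Other nets' pins and already used vertices are off limits.
        blocked = (used | pins) - set(terms)
        tree = route_net(adj, terms, blocked)
        if tree is None:
            failed.append(name)
        else:
            routed[name] = tree
            used |= tree
    return routed, failed


if __name__ == "__main__":
    # Toy graph for illustration only; it is not the graph of Fig. 11.
    edges = [("p1", "i1"), ("p1", "i2"), ("p2", "i1"), ("p2", "i2"),
             ("i1", "i3"), ("i2", "i4"), ("i3", "q1"), ("i4", "q2")]
    print(route_all(build_adj(edges), {"a": ["p1", "q1"], "b": ["p2", "q2"]}))
```

Because nets are routed one at a time and never ripped up, the outcome depends on the order of the entries in `nets`, which is why net ordering matters for a router of this kind.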
6 EXPERIMENTAL RESULTS
To explore the effects of switch-matrix architectures on routing, we implemented a maze router in the C language, based on the graph modeling described in the preceding section, and ran it on a Sun Ultra workstation. We tested the area performance of the router on the CGE [15] and SEGA [14] benchmark circuits. Table 3 gives the names of the circuits, the numbers of logic modules in the FPGAs, and the numbers of nets and connections in the circuits. A logic-module pin could be connected to any of the w tracks in the adjacent routing channel. The switch-matrix architectures used were the diagonal switch matrices, randomly generated switch matrices with the same numbers of switches as those in the diagonal switch matrices, and the switch matrices designed in [19].
The quality of a switch matrix was evaluated by the area performance of the detailed router. Table 4 shows the results. For the results listed in this table, we determined the minimum number of tracks w required for 100 percent routing completion for each circuit, using the three kinds of switch matrices. Because net ordering often affects the performance of a maze router, we routed the benchmark circuits using the following three net-ordering schemes to avoid possible biases: 1) net order as given in the original benchmark circuits, 2) shortest net first (nondecreasing order of net lengths), and 3) longest net first (nonincreasing order of net lengths). Also, since our main goal is to make fair comparisons among switch-matrix architectures, no rip-up-and-reroute phase was incorporated in the maze router (optimization might bias the comparison). The running times ranged from 3 sec for the smallest circuit (9symml) to 160 sec for the largest one (Z03). Our results show that, among the three kinds of switch matrices, the diagonal switch matrices usually needed the smallest w for 100 percent routing completion, regardless of the net order used. The results thus show that our diagonal switch matrices can improve routability at the chip level.
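The evaluation procedure described above (find the smallest w that gives complete routing, repeated for three net orders) can be sketched as follows. This is only an outline under stated assumptions: `routes_completely(w, order)` is a hypothetical callback standing in for a full routing run, `length` is a caller-supplied net-length estimate, and `w_max` is an arbitrary search bound. The numbers in the demo come from a made-up oracle and are not results from the paper.

```python
def net_orderings(nets, length):
    """Yield the three net orders: as given, shortest first, longest first."""
    yield "as given", list(nets)
    yield "shortest first", sorted(nets, key=length)
    yield "longest first", sorted(nets, key=length, reverse=True)


def min_tracks(nets, length, routes_completely, w_max=64):
    """Smallest w (up to w_max) with complete routing, per net order.

    A linear scan is used so that no monotonicity in w is assumed.
    The value is None when no w in 1..w_max succeeds.
    """
    result = {}
    for label, order in net_orderings(nets, length):
        result[label] = next(
            (w for w in range(1, w_max + 1) if routes_completely(w, order)),
            None,
        )
    return result


if __name__ == "__main__":
    # Made-up oracle for demonstration: routing "succeeds" sooner when
    # the longest net is handled early. It does not model a real router.
    lengths = {"n1": 5, "n2": 2, "n3": 9}

    def toy_oracle(w, order):
        return w >= 3 + order.index("n3")

    print(min_tracks(["n1", "n2", "n3"], lengths.get, toy_oracle))
```

Reporting one minimum w per ordering, rather than a single best value, keeps the ordering effect visible when architectures are compared.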
It should be noted that the design in [19] is based on a switch-matrix routing model different from ours. In [19], only one crossing or separating switch can be used for routing a connection on a switch matrix, and at most 2w separating switches can be placed on a switch matrix, whereas our model allows multiswitch routing and imposes no such upper bound of 2w. Therefore, it is impossible to make a completely fair comparison with [19]. (This is why we also compared our designs with randomly generated switch matrices.)
We also performed experiments to explore the effects of net density on the area performance of switch matrices. We randomly generated connections on a 15 × 15 (number of logic modules) FPGA. For this purpose, we assumed that the number of pins on each logic module is unlimited (so that we could test denser circuits). As shown in Table 5 and Fig. 13, the denser the circuit, the larger the advantage of the diagonal switch matrices over the randomly generated switch matrices. This indicates that the routability of a single switch matrix plays a more important role when 1) the connection density on a chip increases and 2) the switch matrices become larger. Notice that denser applications and larger chips are the trends in commercial applications and products. Therefore, we expect that switch-matrix architectures will have an even greater impact on FPD chip routability than they do now.
7 CONCLUDING REMARKS
We have presented a class of quasi-universal switch matrices and shown theoretically and experimentally that they result in better area performance in routing. Our research also confirms the findings of [6], [8], [15] that switch modules with larger routing capacities often result in better routing solutions. Also, our study has shown that the routability of a single switch matrix plays a more important role when 1) the net density on a chip increases and 2) the switch matrices become larger. Since denser applications and larger chips are the trends in commercial applications and products, switch-matrix architectures are expected to have an even greater impact on FPD chip-level routability than they do now.
To explore the effects of FPD switch-matrix architectures on routing, we adopted a bottom-up approach, optimizing a single switch matrix first (future work shall extend to the case of multiple switch matrices in series). The methodology is mainly motivated by the golden rule "optimize the common cases" [10], which is the key to contemporary computer designs. For real applications, most connections are short (the common cases); for example, about 60 percent (90 percent) of connections in the CGE [15] and SEGA [14] benchmark circuits are routed through no more than two (five) switch modules, independent of the sizes of the FPGAs. Therefore, the architecture of a single switch module is of particular importance. In contrast, though theoretically sound and interesting, the worst-case scenario (emphasizing the worst-case routing instance) for exploring the architectural effects is often pathologically pessimistic and rarely corresponds to practical applications. (See [12], the Turing Award lecture by R. M. Karp.) We believe that the average/common-case scenario is a superior alternative for architectural design.
ACKNOWLEDGMENTS
This work was partially supported by the National Science Council of Taiwan under Grant No. NSC-87-2215-E-009-041.
REFERENCES
[1] Actel Corp., FPGA Data Book and Design Guide, 1996.
[2] Altera Corp., FLEX 10K Handbook, 1996.
[3] Aptix Inc., FPIC AX1024D, Preliminary Data Sheet, Aug. 1992.
[4] AT&T Microelectronics, AT&T Field-Programmable Gate Arrays Data Book, Apr. 1995.
[5] N. Bhat and D. Hill, "Routable Technology Mapping for LUT FPGAs," Proc. IEEE Int'l Conf. Computer Design, pp. 95-98, 1992.
[6] S.D. Brown, J. Rose, and Z.G. Vranesic, "A Stochastic Model to Predict the Routability of Field-Programmable Gate Arrays," IEEE Trans. Computer-Aided Design, vol. 12, no. 12, pp. 1,827-1,838, Dec. 1993.
[7] Y.-W. Chang, D.F. Wong, and C.K. Wong, "Design and Analysis of FPGA/FPIC Switch Modules," Proc. IEEE Int'l Conf. Computer Design, pp. 394-401, Austin, Tex., Oct. 1995.
[8] Y.-W. Chang, D.F. Wong, and C.K. Wong, "Universal Switch Modules for FPGA Design," ACM Trans. Design Automation of Electronic Systems, vol. 1, no. 1, pp. 80-101, Jan. 1996.
[9] A. El Gamal et al., "An Architecture for Electrically Configurable Gate Arrays," IEEE J. Solid-State Circuits, vol. 24, no. 2, pp. 394-398, Apr. 1989.
[10] J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach, second ed. Morgan Kaufmann, 1996.
[11] H.C. Hsieh et al., "Third-Generation Architecture Boosts Speed and Density of Field-Programmable Gate Arrays," Proc. IEEE Custom Integrated Circuits Conf., pp. 31.2.1-31.2.7, May 1990.
[12] R.M. Karp, "Combinatorics, Complexity, and Randomness," Comm. ACM, vol. 29, no. 2, pp. 98-109, 1986.
[13] C.Y. Lee, "An Algorithm for Path Connections and Its Applications," IRE Trans. Electronic Computers, vol. 10, pp. 346-365, Sept. 1961.
[14] G.G. Lemieux and S.D. Brown, "A Detailed Routing Algorithm for Allocating Wire Segments in Field-Programmable Gate Arrays," Proc. ACM/SIGDA Physical Design Workshop, pp. 215-216, Lake Arrowhead, Calif., 1993.
[15] J. Rose and S. Brown, "Flexibility of Interconnection Structures for Field-Programmable Gate Arrays," IEEE J. Solid State Circuits, vol. 26, no. 3, pp. 277-282, Mar. 1991.
[16] Y. Sun, T.-C. Wang, C.K. Wong, and C.L. Liu, "Routing for Symmetric FPGAs and FPICs," Proc. IEEE/ACM Int'l Conf. Computer-Aided Design, pp. 486-490, Santa Clara, Calif., Nov. 1993.
[17] S. Trimberger and M. Chene, "Placement-Based Partitioning for Lookup-Table-Based FPGA," Proc. IEEE Int'l Conf. Computer Design, pp. 91-94, 1992.
[18] Xilinx Inc., The Programmable Logic Data Book, 1996.
[19] K. Zhu, D.F. Wong, and Y.-W. Chang, "Switch Module Design with Application to Two-Dimensional Segmentation Design," Proc. IEEE/ACM Int'l Conf. Computer-Aided Design, pp. 481-486, Santa Clara, Calif., Nov. 1993.
Guang-Ming Wu received the BS degree in information and computer engineering from Chung Yuan Christian University, Chun-Li, Taiwan, Republic of China, in 1990, and the MS degree in computer science from Chiao Tung University, Hsinchu, Taiwan, Republic of China, in 1994. He is currently working toward the PhD degree in computer science at Chiao Tung University. His research interests include design automation and optimization in VLSI design.
Yao-Wen Chang (S'94-M'96) received the BS degree in computer science and information engineering from National Taiwan University in 1988, and the MS and the PhD degrees in computer science from the University of Texas at Austin in 1993 and 1996, respectively.
He was with the IBM T.J. Watson Research Center, Yorktown Heights, New York, in the VLSI group during the summer of 1994. Since 1996, he has been an associate professor in the Department of Computer and Information Science at National Chiao Tung University, Hsinchu, Taiwan. His current research interests lie in design automation, architectures, and systems for VLSI and combinatorial optimization.
Dr. Chang received the Best Paper Award at the 1995 IEEE International Conference on Computer Design (ICCD-95) for his work on FPGA routing and the MS Thesis Supervision Award from the Institute of Electrical Engineering, Taiwan, in 1998. He is a member of the IEEE, the IEEE Circuits and Systems Society, the ACM, and ACM/SIGDA.