Problem formulation of partitioning

Chapter 3 Problem Formulation

3.1 Problem formulation of partitioning

This stage divides the input QCA circuit into levels that satisfy the scheduling constraint while reducing the maximum level height of the entire circuit.

3.2 Problem formulation of placement

This stage rearranges the logic devices within their assigned levels so that the total number of edge crossings between adjacent levels is minimized.

3.3 Problem formulation of pin assignment

This stage assigns the actual pin positions of the logic devices on every level. The assignment must be legal for the later channel routing stage and, at the same time, minimize the wire length of the pin-to-pin connections.

3.4 Problem formulation of channel routing

This stage completes the connections among the pins of every two adjacent levels so that the total number of wire crossings is minimized.

Chapter 4 Algorithm

This chapter describes the overall flow of our QCA layout synthesis. First, we partition the input combinational circuit into several levels. Then we transform the circuit into a k-layer bigraph representation and minimize the total number of edge crossings in this graph. After that, we convert the graph into a physical circuit representation by pin assignment. Finally, we realize the QCA physical layout through the channel routing step.

4.1 Partitioning

This section presents our partitioning algorithm, which consists of three steps: wire-block insertion, wire-block fan-out sharing, and level folding. To satisfy the scheduling constraint, wire-block insertion first places each logic device into a level according to the topological ordering of the devices and then inserts wire blocks into every path that is shorter than the longest reconvergent path. Two paths are reconvergent if they have the same starting device and the same destination device. After insertion, the circuit usually contains many identical wire blocks. Wire-block fan-out sharing removes them by merging the wire blocks that carry the same input signal, which maximizes the sharing among the fan-outs of a logic device output.
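As a rough illustration of fan-out sharing, the sketch below merges wire blocks that sit on the same level and carry the same input signal. The tuple layout `(level, driver, sink)` and the function name `share_fanout` are assumptions made for this example only; they are not necessarily the data structures of our implementation.

```python
from collections import defaultdict

def share_fanout(wire_blocks):
    """Merge wire blocks on the same level that carry the same input signal.

    wire_blocks: list of (level, driver, sink) tuples (hypothetical layout).
    Returns a dict mapping (level, driver) to the sorted list of sinks, i.e.
    one shared wire block per level and driving signal.
    """
    shared = defaultdict(set)
    for level, driver, sink in wire_blocks:
        shared[(level, driver)].add(sink)
    return {key: sorted(sinks) for key, sinks in shared.items()}

# Two wire blocks on level 2 are both driven by E, so they are merged.
blocks = [(2, "E", "G"), (2, "E", "H"), (2, "F", "H")]
merged = share_fanout(blocks)
```

In this toy input the three wire blocks collapse into two, one for signal E and one for signal F.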

Figure 4-1(a) shows the initial circuit partition with valid scheduling. Figure 4-1(b) shows the result of wire-block insertion, where the wire blocks are the circles filled with blue. As shown in Figure 4-1(c), the two wire blocks driven by E are combined into one.

Figure 4-1. (a) An example circuit partition with valid scheduling; (b) the circuit partition after wire-block insertion; (c) the circuit partition after wire-block fan-out sharing.

The wire-block insertion algorithm was proposed in [11]; we apply it between different levels.

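Algorithm: Wire-block Insertion

Input: A directed graph G(V, E), where V is the set of logic devices and E denotes the data dependencies between devices.

Begin

while E is not empty do

    n = E.pop();

    S denotes the source vertex of n;

    T denotes the destination vertex of n;

    D = level(T) - level(S);

    if D > 1 then insert D - 1 wire blocks between S and T, one on each intermediate level;

End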

The algorithm traverses the edges of the graph one by one. If the two endpoints of an edge are not on adjacent levels, a series of new wire blocks is added between them. These wire blocks form the connection between the two terminals of the edge.
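The edge traversal described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the circuit is given as a list of `(source, target)` edges plus a `level` dictionary, every edge goes from a lower to a higher level, and the wire-block names `w0`, `w1`, ... are hypothetical.

```python
def insert_wire_blocks(edges, level):
    """Split every edge that spans more than one level into unit-level edges.

    edges: list of (source, target) pairs; level: dict device -> level index.
    Returns (new_edges, new_level); the inputs are left unchanged.
    """
    new_edges = []
    new_level = dict(level)
    counter = 0
    for s, t in edges:
        d = new_level[t] - new_level[s]
        prev = s
        for step in range(1, d):          # one wire block per skipped level
            w = f"w{counter}"             # hypothetical wire-block name
            counter += 1
            new_level[w] = new_level[s] + step
            new_edges.append((prev, w))
            prev = w
        new_edges.append((prev, t))
    return new_edges, new_level

edges = [("A", "B"), ("A", "C")]
level = {"A": 0, "B": 1, "C": 3}
new_edges, new_level = insert_wire_blocks(edges, level)
```

After the call, the edge from A to C is replaced by a chain through two wire blocks, so every remaining edge connects adjacent levels.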

After wire blocks are inserted and fan-out sharing is finished, we calculate the height of each level. Level folding then evens out the heights of all levels. We first calculate the average height of all levels and set the maximum allowed height to a value slightly higher than this average. The height of some levels may exceed this maximum. We reduce the height of such a level by folding it into two or more levels until the maximum height constraint is satisfied. This is done by inserting wire blocks in place of logic devices and moving these devices into the next level. A logic device is moved into the new level only if replacing it with wire blocks decreases the level height. Figure 4-2 shows an example in which moving a logic device to a new level decreases the level height from 12 to 8. After level folding is completed, we perform fan-out sharing again to guarantee that no identical wire blocks remain.

Figure 4-2. An example of level folding
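The selection of levels to fold can be sketched as below. The `slack` parameter, which stands for "a value slightly higher than the average", and its default of 20% are assumptions for illustration; the text does not fix a particular margin.

```python
import math

def levels_to_fold(heights, slack=0.2):
    """Return the height limit and the indices of levels that exceed it.

    heights: list of level heights; slack: assumed margin above the average.
    """
    average = sum(heights) / len(heights)
    max_height = math.ceil(average * (1 + slack))
    too_tall = [i for i, h in enumerate(heights) if h > max_height]
    return max_height, too_tall

# Hypothetical level heights; only the second level exceeds the limit.
max_height, too_tall = levels_to_fold([4, 12, 5, 7])
```

Only the levels returned in `too_tall` would then be considered for folding.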

4.2 Placement

This stage reorders the logic devices level by level to minimize the wire crossings between logic devices. The placement algorithm consists of multilevel guided breadth-first search and adaptive insertion. First, multilevel guided breadth-first search is performed to obtain an initial placement. Then the placement is refined by performing adaptive insertion on each pair of adjacent levels, from the last level to the first. The result of adaptive insertion is tentative, and the reduction in the number of crossings achieved on each level is stored. Scanning from the rightmost level toward the left, we select the series of levels whose placements are actually realized such that the total crossing reduction of this series is maximized.

4.2.1 Guided breadth-first search

Guided breadth-first search was first proposed in [19]. The main breadth-first search is preceded by another breadth-first search whose purpose is to find the longest path in the graph. After the longest path is identified, the main search begins at one end of this path and successively attaches all other, shorter paths at their branch points.

In the first search, we calculate height[v], the distance from the root, for each node v, and also record depth[v], the maximum value of height[u] achieved by any descendant u of v in the breadth-first search tree. In the main search, the node s for which depth[s] is maximum is the starting node. The reason for selecting node s as the starting node of the main search is illustrated in Figure 4-3. In Figure 4-3(b), a BFS starting from an end point of a path results in no edge crossing in the graph, which is better than the result shown in Figure 4-3(a).

When a node k has two or more children, we traverse its children k_i in increasing order of depth[k_i], and ties are broken by traversing the node with larger height[k_i] first. This is illustrated in Figure 4-4. In Figure 4-4(a), the BFS traversal visits children with smaller depth[v] first, whereas in Figure 4-4(b) it visits children with larger depth[v] first. As the figure shows, traversing children with smaller depth[v] first results in fewer edge crossings. When the main search is finished, each node is assigned a number based on the order of visitation in the main search, and we use these numbers to rearrange the nodes on each level of the graph.

Figure 4-3. (a) BFS starting from a node that is not an end point; (b) BFS starting from an end point.

Figure 4-4. (a) BFS traversal order with smaller depth[v] first; (b) BFS traversal order with larger depth[v] first.
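The two searches of guided breadth-first search can be sketched as follows. This is a sketch under assumptions, not the implementation of [19]: the graph is a connected, undirected adjacency dictionary, the function names are hypothetical, and the main search is started at the node farthest from the root, which is one of the nodes that attain the maximum depth value.

```python
from collections import deque

def first_search(adj, root):
    """BFS from root; returns height[v] and depth[v] as defined in the text."""
    height = {root: 0}
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in height:
                height[u] = height[v] + 1
                parent[u] = v
                queue.append(u)
    depth = dict(height)              # a leaf's depth is its own height
    for v in reversed(order):         # propagate maxima up the BFS tree
        if parent[v] is not None:
            depth[parent[v]] = max(depth[parent[v]], depth[v])
    return height, depth

def guided_bfs(adj, root):
    """Return the visitation number of every node in the main search."""
    height, depth = first_search(adj, root)
    start = max(height, key=height.get)   # far end of the longest BFS path
    visit = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        children = [u for u in adj[v] if u not in visit]
        # Smaller depth first; ties broken by larger height first.
        children.sort(key=lambda u: (depth[u], -height[u]))
        for u in children:
            visit[u] = len(visit)
            queue.append(u)
    return visit

adj = {"s": ["m"], "m": ["s", "x", "y"], "x": ["m"],
       "y": ["m", "z"], "z": ["y"]}
visit = guided_bfs(adj, "s")
```

In this small graph the main search starts at z, the far end of the path s-m-y-z. At node m it visits x before s because depth[x] is smaller than depth[s].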

4.2.2 Multilevel guided breadth-first search

In our heuristic, we would like to minimize the total offset between levels. We therefore apply guided breadth-first search to the entire circuit, i.e., all logic devices are traversed in this search. In the first search, the currently traversed node collects its neighbor nodes in the previous level first and then its neighbor nodes in the next level. This method provides an initial placement in which the longest path (or largest component) of the graph is extracted from the circuit first, and shorter paths (or smaller components) are then attached to the largest component. If the graph contains several disjoint components, these components are separated from each other. Figures 4-5(a) to 4-5(d) illustrate multilevel guided breadth-first search.

Figure 4-5(a) shows a 4-level bigraph in its initial placement. We select node A as the seed of the first breadth-first search; the result is shown in Figure 4-5(b), where the depth number is listed beside each node. For instance, the depth number of node A is seven. Figure 4-5(c) shows the result of the main breadth-first search. In this graph, node L has three children, T, Q, and G. Because depth(T) < depth(Q) < depth(G), we traverse these nodes in the order T, Q, G. Figure 4-5(d) shows the placement produced by multilevel guided breadth-first search: the nodes on each level are sorted in increasing order of the numbers listed in Figure 4-5(c).

In this example, the initial placement contains 18 wire crossings. After multilevel guided breadth-first search is applied, only 8 wire crossings remain; in other words, 10 wire crossings are eliminated.

Figure 4-5. (a) An example of the initial placement of a 4-level bigraph; (b) the result of the first breadth-first search on (a), where the number beside each node is its depth; (c) the result of the main breadth-first search on (a), where the number beside each node is its visitation order in the main search; (d) the placement after multilevel guided breadth-first search, where the nodes in each level are sorted by the numbers in (c).
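The final step, sorting the nodes of each level by their visitation numbers, can be sketched as below. The node names and visitation numbers are made up for illustration and are not those of Figure 4-5.

```python
def place_levels(levels, visit):
    """Sort the nodes of every level by their main-search visitation number."""
    return [sorted(nodes, key=visit.__getitem__) for nodes in levels]

# Hypothetical two-level example.
levels = [["A", "B"], ["C", "D", "E"]]
visit = {"A": 3, "B": 0, "C": 4, "D": 1, "E": 2}
placed = place_levels(levels, visit)
```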

4.2.3 Adaptive Insertion method

Local search [20] is a popular way to improve solutions of the bigraph crossing problem: a simple operation is repeated on the current ordering until no instance of the operation would further reduce the number of crossings. An example of such an operation is neighbor swapping, which swaps the nodes at positions i and i + 1 on level l. The operation is repeated until no choice of i decreases the number of crossings.
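A minimal sketch of neighbor swapping between one fixed level and one movable level is given below. The brute-force crossing counter and the function names are assumptions for illustration; they only show the local search loop itself.

```python
def count_crossings(fixed, order, edges):
    """Count crossings between a fixed level and an ordered level.

    edges: list of (fixed_node, ordered_node) pairs.
    """
    pos_f = {v: i for i, v in enumerate(fixed)}
    pos_o = {v: i for i, v in enumerate(order)}
    pts = [(pos_f[a], pos_o[b]) for a, b in edges]
    return sum(1 for k, (a, b) in enumerate(pts)
               for c, d in pts[k + 1:] if (a - c) * (b - d) < 0)

def neighbor_swap(fixed, order, edges):
    """Swap adjacent nodes while a swap strictly reduces the crossings."""
    order = list(order)
    best = count_crossings(fixed, order, edges)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            order[i], order[i + 1] = order[i + 1], order[i]
            cost = count_crossings(fixed, order, edges)
            if cost < best:
                best, improved = cost, True
            else:                         # undo a swap that does not help
                order[i], order[i + 1] = order[i + 1], order[i]
    return order, best

fixed = ["x", "y", "z"]
edges = [("x", "a"), ("y", "b"), ("z", "c")]
result, crossings = neighbor_swap(fixed, ["c", "b", "a"], edges)
```

Starting from the fully reversed ordering with three crossings, repeated neighbor swaps reach the crossing-free ordering in this example.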

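Adaptive insertion is a kind of local search based on neighbor swapping: each operation inserts a node at any position among the other nodes on its level, and each node is inserted at most once during a pass. Assume node i is inserted before node j, where j < i. The effect of the insertion is that of a succession of swaps of node i with nodes i - 1, ..., j + 1, j, so the resulting cost change is

$$D_l(i,j) = \sum_{k=j}^{i-1} \bigl( c_l(i,k) - c_l(k,i) \bigr),$$

where c_l(a,b) denotes the number of crossings between the edges incident to nodes a and b on level l when a is placed before b. The case in which node i is inserted after node j, where j > i, is similar, with the resulting cost change

$$D_l(i,j) = \sum_{k=i+1}^{j} \bigl( c_l(k,i) - c_l(i,k) \bigr).$$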

In our heuristic, one pass of adaptive insertion performs a bottom-to-top sweep of the logic devices on a level l. A device is not allowed to stay in place, even if no insertion would decrease the number of crossings. Once device i has been inserted in a previous operation, it is marked and is not selected again during the remainder of the current pass.

To illustrate one pass of adaptive insertion, we perform a series of node insertions starting from Figure 4-6(a). The first node selected for insertion is node a. Figure 4-6(b) shows that the best position for node a is above node d, which decreases the number of crossings by two. The next unmarked node is b. Node b is forced to move, and its best position is below node d, as shown in Figure 4-6(c). Moving node b increases the total number of crossings by one. Next, we select node c. Node c is placed above node b, which reduces the number of crossings by one, as shown in Figure 4-6(d). The next operation swaps node a and node d with no change in the number of crossings. Finally, node e is placed at the uppermost position, as shown in Figure 4-6(f), giving a total of eight wire crossings.

Figure 4-6. Adaptive insertion on a simple example
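One pass of adaptive insertion can be sketched as follows. The sketch assumes that `order` lists the devices from bottom to top, uses a brute-force crossing counter instead of the incremental method, and returns a trace of the tentative orderings; all names are hypothetical.

```python
def count_crossings(fixed, order, edges):
    """Brute-force crossing count between a fixed and an ordered level."""
    pos_f = {v: i for i, v in enumerate(fixed)}
    pos_o = {v: i for i, v in enumerate(order)}
    pts = [(pos_f[a], pos_o[b]) for a, b in edges]
    return sum(1 for k, (a, b) in enumerate(pts)
               for c, d in pts[k + 1:] if (a - c) * (b - d) < 0)

def adaptive_insertion_pass(fixed, order, edges):
    """Run one pass; return [(ordering, crossings)] after each insertion."""
    order = list(order)
    marked = set()
    trace = []
    while len(marked) < len(order):
        node = next(v for v in order if v not in marked)  # lowest unmarked
        src = order.index(node)
        rest = order[:src] + order[src + 1:]
        candidates = []
        for pos in range(len(order)):
            if pos == src:
                continue                  # staying in place is not allowed
            cand = rest[:pos] + [node] + rest[pos:]
            candidates.append((count_crossings(fixed, cand, edges), cand))
        if candidates:
            cost, order = min(candidates, key=lambda c: c[0])
            trace.append((list(order), cost))
        marked.add(node)                  # each node is inserted at most once
    return trace

fixed = ["x", "y"]
edges = [("x", "a"), ("y", "b")]
trace = adaptive_insertion_pass(fixed, ["b", "a"], edges)
```

In this two-node example the first insertion removes the single crossing, and the second, forced insertion brings it back. This is why the result of a pass is treated as tentative and the crossing numbers are recorded after each operation.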

Adaptive insertion is time-consuming because the change in the number of wire crossings must be computed after every node insertion. We therefore use the adjacency matrix to compute the number of wire crossings and save computing time.

In a bipartite graph, there is a wire crossing between two layers x and y if x_i connects to y_m, x_j connects to y_n, and i < j while m > n, where i, j, m, and n denote the relative positions of the nodes. In terms of the adjacency matrix, this means that entry (j, n) lies in the lower-left sub-matrix of entry (i, m), or vice versa. The total number of crossings is therefore computed by summing, over all matrix entries, the product of the entry and the sum of the entries in its lower-left sub-matrix. Because this is computationally expensive, we implement it with the incremental wire crossing method proposed in [13] instead of computing it from the matrix directly. Figure 4-7 shows an instance of the wire crossing computation. In this method, we first calculate the row-wise sums of all entries, as in Figure 4-7(c). Then we compute the column-wise sums of this row-wise sum matrix, as in Figure 4-7(d). Finally, we multiply each entry (r, c) of the original matrix by entry (r + 1, c - 1) of the column-wise sum matrix and add up these products to obtain the total number of wire crossings.

The incremental wire crossing method enables us to perform node insertion without recomputing every wire crossing individually. Instead, we only update the values of the affected rows after each operation to obtain the total number of wire crossings.

Figure 4-7. Illustration of wire crossing computation: (a) given graph; (b) initial adjacency matrix; (c) row-wise sum; (d) column-wise sum.
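The matrix-based crossing count can be sketched as below. It is a sketch under assumptions: `m[r][c]` is 1 when node r of layer x connects to node c of layer y, the row-wise sum runs from left to right, and the column-wise sum runs from the bottom row upward. The incremental update of [13] is not reproduced here.

```python
def crossings_from_matrix(m):
    """Count wire crossings from a 0/1 adjacency matrix via cumulative sums."""
    rows, cols = len(m), len(m[0])
    # Row-wise running sum: row_sum[r][c] = m[r][0] + ... + m[r][c].
    row_sum = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        running = 0
        for c in range(cols):
            running += m[r][c]
            row_sum[r][c] = running
    # Column-wise running sum of row_sum, accumulated from the bottom row up,
    # so col_sum[r][c] is the sum of all entries in rows >= r and columns <= c.
    col_sum = [row[:] for row in row_sum]
    for r in range(rows - 2, -1, -1):
        for c in range(cols):
            col_sum[r][c] += col_sum[r + 1][c]
    # Each entry (r, c) crosses every edge in its lower-left sub-matrix.
    return sum(m[r][c] * col_sum[r + 1][c - 1]
               for r in range(rows - 1) for c in range(1, cols))

single = crossings_from_matrix([[0, 1], [1, 0]])
reversed3 = crossings_from_matrix([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
```

The first matrix describes two edges that cross once, and the second describes three edges in fully reversed order, which cross pairwise.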

4.3 Pin assignment

This section presents our pin assignment algorithm, which consists of greedy pin assignment and pseudo routing steps.

4.3.1 Greedy pin assignment

In this step, we assign the input pin positions of all device blocks, from the device at the topmost position of the level to the device at the bottommost position. If some empty rows remain below all device blocks in this level, we then move the device blocks, from bottom to top, to their best positions. The best position of a device block is the position that minimizes the wire length of the pin-to-pin connections of the device. Sometimes the best position of a device block is not unique. In that case, we shift the device block to the bottommost best position in order to preserve the most room for moving the other device blocks. If the best position of a device block is already occupied by other devices, the device block is placed at the unoccupied position closest to its best position. Figures 4-8(a) to 4-8(g) illustrate this step.

Figure 4-8. An example of greedy pin assignment

In Figure 4-8(a), the left side is the fixed level, whose pins are already assigned, and the right side is the device set of the variable level. Figure 4-8(b) shows the initial pin assignment of the variable level. Because three empty rows remain in this level, we shift the devices, from E to A, to their best positions. In Figure 4-8(c), device E is shifted to its best position. The best position of device D is then already occupied by device E, so we place D at the position closest to its best position, as shown in Figure 4-8(d). The other devices above device D can be shifted to their best positions; Figures 4-8(e) to 4-8(g) show the results.
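The bottom-to-top shifting can be sketched in a strongly simplified form. The sketch assumes that every device block occupies a single row, that the vertical order of the devices is preserved, that row 0 is the topmost row, and that `best_row` already holds the bottommost best position of each device. The row numbers are made up and are not those of Figure 4-8.

```python
def greedy_shift(order, best_row, num_rows):
    """Shift devices toward their best rows, processing the bottom device first.

    order: devices from top to bottom; best_row: device -> preferred row.
    Returns a dict device -> assigned row.
    """
    rows = {}
    lowest_free = num_rows - 1
    for idx in range(len(order) - 1, -1, -1):
        device = order[idx]
        # If the best row is already taken, use the closest free row above it.
        row = min(best_row[device], lowest_free)
        # Keep enough rows for the devices that are still above this one.
        row = max(row, idx)
        rows[device] = row
        lowest_free = row - 1
    return rows

order = ["A", "B", "C", "D", "E"]
best_row = {"A": 0, "B": 2, "C": 3, "D": 6, "E": 6}
rows = greedy_shift(order, best_row, num_rows=8)
```

In this toy input, devices D and E share the same best row. E is processed first and takes it, so D is placed in the closest free row directly above.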

4.3.2 Pseudo routing

After greedy pin assignment, some nets may still be unroutable. A net is unroutable if it forms a cycle in the vertical constraint graph (VCG) and this cycle cannot be resolved by doglegging. We check whether such a case exists by pseudo routing. When an unroutable net is found, we insert a new row into the level or slightly shift the positions of the pins near the pin of the unroutable net to make the net routable. Pseudo routing is repeated until no unroutable net is found in the pin assignment result.

4.4 Channel Routing

This stage completes the wire routing inside every level. Although a level is a column in our synthesis model, in this section we lay all levels down horizontally as in Figure 4-9, because this is the well-known form of channel routing.

4.4.1 Overview

We complete the wire routing of each level in five steps. First, the level pins are scanned from left to right and the VCG is constructed (step 1). Since the VCG must not contain any cycle, cycles in the VCG are removed by doglegging (step 2). Once the VCG is acyclic, we add crossing edges to the VCG to reduce wire crossings (step 3). If this introduces cycles into the VCG, a minimum-weight set of crossing edges is removed to make the VCG acyclic again (step 4). Finally, we apply the LEF algorithm to assign a track to each net and finish the routing of this level (step 5). The channel routing process is applied to every level to implement the entire circuit.

4.4.2 Doglegging

Doglegging splits the horizontal segment of a net. It is used not only to remove cycles in the VCG but also to minimize the number of horizontal tracks. We apply DFS to determine whether the VCG contains cycles. Once a cycle is found, the net in this cycle is divided into several subnets and a vertical dogleg is inserted. The subnets are then added to the VCG so that the cycle is removed.
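The DFS-based cycle check can be sketched as follows. The channel is assumed to be given as two lists of net identifiers, one for the top pins and one for the bottom pins of each column, with 0 standing for an empty pin; these conventions and the function names are assumptions for illustration.

```python
def build_vcg(top, bottom):
    """Build the vertical constraint graph from the pins of each column.

    An edge a -> b means net a must be routed above net b.
    """
    vcg = {}
    for t, b in zip(top, bottom):
        for net in (t, b):
            if net:
                vcg.setdefault(net, set())
        if t and b and t != b:
            vcg[t].add(b)
    return vcg

def has_cycle(vcg):
    """Detect a directed cycle with a three-color depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in vcg}

    def visit(v):
        color[v] = GRAY
        for u in vcg[v]:
            if color[u] == GRAY:          # back edge closes a cycle
                return True
            if color[u] == WHITE and visit(u):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in vcg)

cyclic = has_cycle(build_vcg(top=[1, 2], bottom=[2, 1]))
acyclic = has_cycle(build_vcg(top=[1, 0, 2], bottom=[2, 3, 3]))
```

In the first channel, net 1 must lie above net 2 and net 2 above net 1, so a dogleg would be needed. The second channel has no such conflict.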

4.4.3 Crossing edge insertion

Clearly, wire crossings can occur only between nets that overlap horizontally, and the number of wire crossings between any pair of horizontally overlapping nets is strongly influenced by their vertical ordering.

Therefore, to reduce crossings, we use the notion of “crossing edges”, first proposed in [14], between nets in the VCG. Crossing edges drive horizontally overlapping nets toward the vertical relationship that results in the minimum number of crossings between them. Each crossing edge is a directed edge with a weight equal to the number of wire crossings saved by placing the net at its source above the net at its destination.

For example, consider Net 1 and Net 2, which overlap horizontally in Figure 4-9. Figure 4-9(a) shows that if Net 1 is placed above Net 2, they cross three times. If Net 1 is placed below Net 2, as shown in Figure 4-9(c), there is only one crossing between them. Therefore, we modify the VCG shown in Figure 4-9(b) into the one shown in Figure 4-9(d): the crossing edge starts from Net 2, points to Net 1, and has a weight of two. Figure 4-9(c) is the channel routing result for the VCG in Figure 4-9(d) and has the fewest wire crossings.

Figure 4-9. (a-b) A channel and VCG with minimum channel width; (c) the optimum solution for minimizing wire crossings; (d) the modified VCG after crossing edge insertion.
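The weight of a crossing edge can be sketched with a simple geometric model. The model is an assumption: a net is one horizontal trunk spanning its pin columns, and a crossing occurs wherever a vertical wire of one net passes through the trunk of the other. The pin columns below are made up so that the counts match the numbers in the example (three crossings versus one); they are not read from Figure 4-9.

```python
def span(net):
    """Leftmost and rightmost pin column of a net."""
    cols = net["top"] + net["bottom"]
    return min(cols), max(cols)

def crossings_if_above(upper, lower):
    """Crossings between two nets when `upper` is routed above `lower`."""
    lo_u, hi_u = span(upper)
    lo_l, hi_l = span(lower)
    # Top-pin wires of the lower net pass through the upper trunk.
    from_lower = sum(1 for c in lower["top"] if lo_u < c < hi_u)
    # Bottom-pin wires of the upper net pass through the lower trunk.
    from_upper = sum(1 for c in upper["bottom"] if lo_l < c < hi_l)
    return from_lower + from_upper

def crossing_edge(name_a, net_a, name_b, net_b):
    """Return (source, destination, weight), or None if the order is neutral."""
    a_above = crossings_if_above(net_a, net_b)
    b_above = crossings_if_above(net_b, net_a)
    if a_above == b_above:
        return None
    if a_above < b_above:
        return (name_a, name_b, b_above - a_above)
    return (name_b, name_a, a_above - b_above)

net1 = {"top": [1, 5], "bottom": []}          # hypothetical pin columns
net2 = {"top": [2, 3, 4], "bottom": [6]}
edge = crossing_edge("Net 1", net1, "Net 2", net2)
```

With these columns, placing Net 1 above Net 2 gives three crossings and the opposite order gives one, so the sketch produces a crossing edge from Net 2 to Net 1 with weight two.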

4.4.4 Cycle break

As stated in the previous section, the weight of a crossing edge is the number of wire crossings it saves. Hence, if the VCG is acyclic after crossing edge insertion, we preserve all crossing edges to obtain a result with minimum wire crossings. Otherwise, the cycles must be removed before track assignment begins. Because preserving a set of crossing edges with higher total weight yields a channel routing result with fewer wire crossings, our goal is to find a set of crossing edges A ⊂ E with minimum total weight such that G − A is acyclic.

Since an acyclic directed graph has a topological ordering, we develop our cycle break algorithm as shown in Figure 4-11. The main idea of the algorithm is to enforce a topological ordering on the VCG and remove all edges that violate this ordering, i.e., edges that go from a vertex with a larger order to a vertex with a smaller order. Our goal is therefore to find a topological ordering of the VCG whose set of violating edges has minimum total weight.
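The notion of violating edges can be sketched as below. Given a candidate vertex ordering, the function collects the crossing edges that point from a later vertex to an earlier one, together with their total weight. The tuple layout `(source, destination, weight)` and the net numbers are assumptions for illustration; the search for the best ordering itself is not shown.

```python
def violating_edges(order, crossing_edges):
    """Return the crossing edges that violate `order` and their total weight.

    order: list of vertices, front of the topological ordering first.
    crossing_edges: list of (source, destination, weight) tuples.
    """
    pos = {v: i for i, v in enumerate(order)}
    removed = [(s, t, w) for s, t, w in crossing_edges if pos[s] > pos[t]]
    return removed, sum(w for _, _, w in removed)

# Three crossing edges that form a cycle 1 -> 2 -> 3 -> 1.
edges = [(1, 2, 2), (3, 1, 1), (2, 3, 4)]
removed_a, lost_a = violating_edges([1, 2, 3], edges)
removed_b, lost_b = violating_edges([3, 1, 2], edges)
```

Both orderings break the cycle, but the first one gives up only a crossing reduction of one, whereas the second gives up four, so the first ordering is preferable.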

In our algorithm, we first prune the vertices that have no outward edges, because these vertices do not introduce any violating edges if we place them at the tail of the topological ordering list. Then we calculate the cost of each candidate vertex, i.e., each vertex that has no inward vertical constraint edge. The cost of a candidate vertex indicates how much crossing reduction we would lose by breaking a cycle at this vertex. We therefore place the vertex with the lowest cost at the front of the topological ordering list and update the VCG. After the update, if any vertices without outward edges exist, we prune them for the reason stated above. When the topological ordering list is completely formed, we start to
