Chapter 3 Optimization During Detailed Placement and Post Placement Phase

Although Agrawal et al. [18] suggested that scan chain reordering be performed only after the place-and-route procedure, that suggestion is motivated by reducing scan chain routing overhead, relieving scan chain routing congestion, and fixing timing violations, among other goals. The dynamic power formula shows that shortening connection length has an additional benefit: it lowers the load capacitance and thus reduces switching power. To further reduce the routing length of the low power scan chain (LPSC), we implement two methods, one that shortens the LPSC routing length during the placement phase and one that optimizes the LPSC after the placement phase. Comparing them shows how much routing cost must be paid to improve the LPSC routing length. The following subsections describe how our detailed placer improves the LPSC routing cost and how we optimize the LPSC routing length in the post-placement phase. In both procedures, the LPSC order has already been decided from the known test vectors, and the LPSC connections are never broken.
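The dynamic power formula referred to above is, in one common form, the standard CMOS switching power expression. Splitting the load capacitance into a pin term and a wire term proportional to net length ℓ (with c_w an assumed per-unit-length wire capacitance) is our illustrative simplification, not a model taken from this section.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f,
\qquad
C_L = C_{\mathrm{pin}} + C_{\mathrm{wire}} \approx C_{\mathrm{pin}} + c_w \, \ell
```

Here α is the switching activity, C_L the load capacitance of a net, V_DD the supply voltage, and f the (shift) clock frequency. For a fixed scan order and test set, α, V_DD, f and the pin load are fixed, so P_dyn decreases linearly as ℓ shrinks; this is the sense in which shorter LPSC connections reduce switching power.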

3.1 Optimizing Low Power Scan Chain during Detailed Placement Procedure

In cell-based VLSI physical design, many efficient placement algorithms have been developed, such as simulated annealing-based placement [21], partition-based placement [8], and performance-driven placement [15]. To build a detailed placer that can keep track of specific cells, such as flip-flops, during the placement process, we adopt the greedy detailed placement used by the Dragon standard-cell placer [16] [20]. To reduce implementation cost, we assume that each standard cell occupies one regular bin in a grid graph. Our greedy detailed placement consists of three steps. First, we partition the circuit with hMetis [1], a multilevel hypergraph partitioning scheme; this step groups cells that are strongly connected to one another. Second, we construct a weighted graph whose vertices represent groups and whose edges represent the interconnections between pairs of groups. A greedy TSP heuristic finds a group order with low inter-group interconnection cost, and this group order is used as the row order to generate an initial placement. Third, we randomly choose a base cell and then randomly choose a target cell within a window region around the base cell's location. The window size is controlled by height and width parameters. The search direction is adjustable and takes one of two types, vertical or horizontal; based on the experiments in [17], a vertical-to-horizontal search ratio of 1/5 is recommended. We use direct wire length as the cost: we estimate the total wire length that would result from swapping the base and target cells, and if the total wire length decreases, the two cells are swapped; otherwise, no swap occurs. Figure 3.1 shows the flow of this greedy placement procedure.
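The third step (random base/target selection inside a window, kept only when wire length drops) can be sketched as follows. This is a minimal illustration under stated assumptions, not Dragon's implementation: nets are assumed to be already decomposed into two-pin connections measured by Manhattan distance, only occupied bins are modeled, and the names `greedy_swap`, `win_w` and `win_h` are hypothetical. The 1/5 vertical-to-horizontal ratio from [17] is approximated, as an assumption, by passing a window five times wider than it is tall.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Pos = Tuple[int, int]  # (column, row) of a bin in the grid graph


def conn_length(pos: Dict[str, Pos], edges: List[Tuple[str, str]], cells: Set[str]) -> int:
    """Manhattan length of every two-pin connection touching a cell in `cells`.

    Assumption: nets are two-pin connections and wire length is Manhattan distance.
    """
    total = 0
    for a, b in edges:
        if a in cells or b in cells:
            (xa, ya), (xb, yb) = pos[a], pos[b]
            total += abs(xa - xb) + abs(ya - yb)
    return total


def greedy_swap(pos: Dict[str, Pos], edges: List[Tuple[str, str]], iterations: int,
                win_w: int, win_h: int, rng: Optional[random.Random] = None) -> int:
    """Randomized greedy swapping; updates `pos` in place, returns accepted swaps."""
    rng = rng if rng is not None else random.Random(0)
    cells = list(pos)
    accepted = 0
    for _ in range(iterations):
        base = rng.choice(cells)
        bx, by = pos[base]
        # Target candidates: cells whose bins lie inside the window around the base cell.
        window = [c for c in cells if c != base
                  and abs(pos[c][0] - bx) <= win_w and abs(pos[c][1] - by) <= win_h]
        if not window:
            continue
        target = rng.choice(window)
        touched = {base, target}
        before = conn_length(pos, edges, touched)   # only these connections can change
        pos[base], pos[target] = pos[target], pos[base]
        if conn_length(pos, edges, touched) < before:
            accepted += 1                            # keep the improving swap
        else:
            pos[base], pos[target] = pos[target], pos[base]  # undo
    return accepted
```

For example, `greedy_swap(pos, edges, iterations=200, win_w=5, win_h=1)` runs 200 trials with such a window. Only the connections touching the two swapped cells are re-measured, since no other connection changes length.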

Figure 3.1: Experimental flow of LPSC optimization during placement.
(A) The benchmark circuits are given in ISCAS'89 [9] format; we developed a script to translate them into the ISPD98 net format.
(B) We use hMetis to partition the benchmark circuit. The number of partitions is determined by the square root of the gate count.
(C) After the circuit is partitioned, we build a weighted graph whose vertices represent groups and whose edges represent the interconnections between two groups.
(D) From the graph in (C), we use a heuristic TSP to find a feasible group order.
(E) Once the group order is decided, cells are randomly placed into their corresponding rows.
(F) Two cells, a base cell and a target cell, are chosen randomly. If swapping them improves the routing cost, the swap occurs.
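Steps (B) and (D) can be sketched in Python as below. The partition count is taken here as the rounded square root of the gate count, and the TSP heuristic is a simple nearest-neighbour walk that always moves to the unvisited group sharing the most nets with the current one; both are assumptions standing in for unspecified details, and `num_partitions` and `greedy_group_order` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def num_partitions(gate_count: int) -> int:
    """Assumed partition count: square root of the gate count, rounded, at least 1."""
    return max(1, round(math.sqrt(gate_count)))


def greedy_group_order(weights: Dict[Tuple[int, int], int], n_groups: int,
                       start: int = 0) -> List[int]:
    """Nearest-neighbour TSP heuristic over the weighted group graph.

    weights[(i, j)] counts nets crossing between groups i and j; store each
    pair once, in either key order. Each step appends the unvisited group most
    strongly connected to the current one, so heavily connected groups end up
    in adjacent rows.
    """
    def w(i: int, j: int) -> int:
        return weights.get((i, j), 0) + weights.get((j, i), 0)

    order = [start]
    unvisited = set(range(n_groups)) - {start}
    while unvisited:
        cur = order[-1]
        # max() keeps the first maximum, so ties go to the smallest group index.
        nxt = max(sorted(unvisited), key=lambda g: w(cur, g))
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```

With four groups and crossing counts `{(0, 1): 1, (0, 2): 5, (2, 1): 4, (1, 3): 6}`, the walk starting at group 0 yields the row order `[0, 2, 1, 3]`.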

Figure 3.2: Strategy for optimizing the LPSC during the placement phase. If the chosen base cell is a scan cell, the target cell is selected only from the quadrant in which the preceding scan cell is located.

To improve the LPSC routing cost, we modify the greedy placement strategy as follows. Instead of breaking the scan chain connections, we keep the scan chain in its low power order and optimize its routing length during the detailed placement phase.

During detailed placement, if the chosen base cell is a scan cell, the target cell is selected only from the quadrant, relative to the base cell, in which the preceding scan cell is located. If the base cell is not a scan flip-flop, the placer applies the normal greedy swapping strategy. Figure 3.2 shows how the target cell search range is defined.
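The quadrant restriction can be sketched as a filter applied to the window candidates before a target is drawn. This is a minimal sketch, assuming the quadrant is centred on the base cell's bin, that candidates lying on a quadrant axis are allowed, and that the head of the chain (which has no predecessor) is treated like a non-scan cell; `target_candidates` and `scan_pred` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Pos = Tuple[int, int]


def in_predecessor_quadrant(base: Pos, pred: Pos, cand: Pos) -> bool:
    """True if `cand` lies in the quadrant, centred on the base cell, that holds `pred`.

    A candidate on a quadrant axis is kept, and a zero predecessor offset along
    an axis places no constraint on that axis (the product is then zero).
    """
    dx, dy = pred[0] - base[0], pred[1] - base[1]
    cx, cy = cand[0] - base[0], cand[1] - base[1]
    return cx * dx >= 0 and cy * dy >= 0


def target_candidates(base_cell: str, pos: Dict[str, Pos], window: Iterable[str],
                      scan_pred: Dict[str, Optional[str]]) -> List[str]:
    """Restrict the target search range when the base cell is a scan cell.

    `scan_pred` maps each scan cell to its predecessor in the fixed low power
    order; the chain head maps to None and non-scan cells are absent.
    """
    pred = scan_pred.get(base_cell)
    if pred is None:                 # non-scan cell (or chain head): normal greedy swap
        return list(window)
    base = pos[base_cell]
    return [c for c in window if in_predecessor_quadrant(base, pos[pred], pos[c])]
```

Because the low power order is fixed, the predecessor lookup is a plain dictionary; the filtered list then replaces the full window in the random target selection.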

3.2 Optimizing Low Power Scan Chain during Post Placement Phase

Optimizing the LPSC during the placement process may prevent some gates from finding feasible positions. Our experimental results show that pulling scan cells closer together shortens the scan chain routing length by about 30%, but the overall routing cost of the design degrades, and the degradation grows as the design gets larger. In our view, a better LPSC routing length is not worth sacrificing the total routing cost of the design. Although test power is a critical issue in modern VLSI design, physical design challenges also become more important as process technology shrinks, so solving the low power scan cell wire length problem at the expense of overall routing may not be a good trade-off. To eliminate this impact on physical design, we do not optimize the LPSC during placement but refine it in the post-placement phase instead.

After the placement procedure, the physical location of every scan flip-flop is known. The optimization then checks feasible positions for each scan cell. For example, for Scan Cell_j we search for a suitable location in the region between Scan Cell_j and its predecessor Scan Cell_{j−1}, estimating the swapping cost for each cell in that region. If several locations reduce both the chip routing cost and the scan chain interconnection length, the most feasible cell, namely the one closest to Scan Cell_{j−1} that does not disturb the previously optimized placement, is selected and swapped with Scan Cell_j. Figure 3.3 shows an example. The experimental results show that this refinement reduces the scan chain routing length by 1% to 3% without affecting the previously optimized placement.

Figure 3.3: For a given scan cell, we search for feasible locations in the region between the cell and its predecessor and compare the swapping cost of each location in this region. If a feasible location exists, the swap is performed.
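The post-placement refinement can be sketched as one pass over the fixed scan order. This is a minimal sketch under assumptions not stated in the text: "the region between itself and its predecessor" is taken as the bounding box of the two cells, functional nets are two-pin connections measured by Manhattan distance, and "no impact on the previously optimized placement" is approximated by accepting only swaps that strictly lower the functional routing cost. `refine_scan_chain`, `chip_cost` and `chain_length` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]
Edge = Tuple[str, str]


def manhattan(p: Pos, q: Pos) -> int:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chip_cost(pos: Dict[str, Pos], edges: List[Edge]) -> int:
    """Functional (non-scan) routing cost as total two-pin Manhattan length (assumed)."""
    return sum(manhattan(pos[a], pos[b]) for a, b in edges)


def chain_length(pos: Dict[str, Pos], chain: List[str]) -> int:
    """Routing length of the scan chain in its fixed low power order."""
    return sum(manhattan(pos[a], pos[b]) for a, b in zip(chain, chain[1:]))


def refine_scan_chain(pos: Dict[str, Pos], edges: List[Edge], chain: List[str]) -> int:
    """Post-placement refinement; updates `pos` in place and returns the swap count."""
    swaps = 0
    for j in range(1, len(chain)):
        cell, pred = chain[j], chain[j - 1]
        (x0, y0), (x1, y1) = pos[cell], pos[pred]
        # Search region (assumed): bounding box spanned by Scan Cell_j and its predecessor.
        lo_x, hi_x, lo_y, hi_y = min(x0, x1), max(x0, x1), min(y0, y1), max(y0, y1)
        base_chip, base_chain = chip_cost(pos, edges), chain_length(pos, chain)
        best: Optional[Tuple[int, str]] = None   # (distance to predecessor, candidate)
        for cand, (cx, cy) in list(pos.items()):
            if cand in (cell, pred) or not (lo_x <= cx <= hi_x and lo_y <= cy <= hi_y):
                continue
            pos[cell], pos[cand] = pos[cand], pos[cell]      # tentative swap
            improves = (chip_cost(pos, edges) < base_chip
                        and chain_length(pos, chain) < base_chain)
            dist = manhattan(pos[cell], pos[pred])
            pos[cell], pos[cand] = pos[cand], pos[cell]      # undo
            if improves and (best is None or dist < best[0]):
                best = (dist, cand)
        if best is not None:                                 # closest improving candidate
            pos[cell], pos[best[1]] = pos[best[1]], pos[cell]
            swaps += 1
    return swaps
```

Recomputing the full costs for every candidate keeps the sketch short; a real implementation would evaluate only the connections of the two swapped cells, as in the greedy placer.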

Chapter 4 Scan Chain Reordering with

Source document: 低功率循序串列於繞線端最佳化方法研究 (A Study of Routing-Stage Optimization Methods for Low Power Scan Chains), pp. 25-30