Chapter 4 Implementation Results
4.1 Implementation Environment Setup
Throughout this research, the TSMC 0.18 um CMOS mixed-signal 1P6M process and the Artisan standard cell library are adopted. The benchmark circuit for the proposed design flow is a Single-Instruction-Multiple-Data Multiply and Accumulate unit (SIMD MAC). Behavioral coding is done in Verilog HDL with Cadence Verilog-XL. Logic synthesis is performed with Synopsys Design Vision. Circuit verification and debugging use SpringSoft Debussy. Automatic placement and routing (APR) for physical design is performed with Synopsys Astro. Physical verification uses Cadence Virtuoso-XL and Mentor Calibre.
The SIMD MAC architecture is shown in Figure 4.1. A recursive architecture is chosen to implement the SIMD MAC. It builds wider vector elements out of several narrower vector elements and adds the multiple results together. Results are determined iteratively by passing the data back through the unit over more than one cycle. Radix-4 modified Booth encoding and a Wallace tree are used to speed up the accumulation. Hot-one coding compensates for the two's complement increment bit of each partial product generated by the Booth encoding. Table 1 summarizes the input and output ports.
The input and output widths are 32 and 40 bits, respectively. The wider output results from the iterative accumulation and keeps results from saturating easily. The Tag file in Table 1 determines whether the accumulated data come from the accumulator within the SIMD MAC or are loaded from external accumulators. Table 2 describes the operation of each SIMD instruction. Multiply with implicit accumulate (MIA) features are also included in this SIMD MAC.
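As a rough illustration of the radix-4 modified Booth encoding mentioned above, the following Python sketch recodes a two's-complement multiplier into digits in {-2, -1, 0, 1, 2} and sums the resulting partial products. It is a behavioral model under stated assumptions, not the RTL of this design: the function names and the 16-bit default width are hypothetical, a plain sum stands in for the Wallace tree, and Python's signed integers absorb the hot-one correction that the hardware applies when a digit is negative.

```python
def booth_radix4_digits(multiplier, width=16):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns width/2 digits, least significant first, each in {-2,-1,0,1,2}.
    The function name and default width are illustrative assumptions.
    """
    if width % 2:
        raise ValueError("width must be even")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    padded = bits << 1                      # append the implicit y[-1] = 0
    digits = []
    for i in range(0, width, 2):
        group = (padded >> i) & 0b111       # bits y[i+1], y[i], y[i-1]
        y_prev = group & 1
        y_cur = (group >> 1) & 1
        y_next = (group >> 2) & 1
        digits.append(y_prev + y_cur - 2 * y_next)
    return digits


def booth_multiply(multiplicand, multiplier, width=16):
    """Sum the Booth partial products (a plain sum replaces the Wallace tree)."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * (4 ** k) for k, d in enumerate(digits))


print(booth_multiply(123, -45))  # -5535
```

With this recoding, a 16-bit multiplier yields eight partial products instead of sixteen, which is what shortens the accumulation.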
Figure 4.1 SIMD MAC architecture. Labeled blocks include the shifter and mux, the vector mask and Booth encoder, and the multiplicand.
Table 1 SIMD MAC input & output summary
Table 2 SIMD MAC instruction specification.
Note: x and y can each be the top-half [31:16] or bottom-half [15:0] operand.
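The top-half and bottom-half operand selection fits the recursive architecture described in Section 4.1, where a wider result is built from narrower products. The sketch below shows the arithmetic idea for unsigned operands only: a 32 x 32 product assembled from four 16 x 16 products. The helper names are hypothetical, and signed operation, the instruction set of Table 2, and the cycle-by-cycle scheduling of the actual unit are not modeled.

```python
MASK16 = 0xFFFF


def top_half(word):
    """Bits [31:16] of a 32-bit word."""
    return (word >> 16) & MASK16


def bottom_half(word):
    """Bits [15:0] of a 32-bit word."""
    return word & MASK16


def mul32_from_16(x, y):
    """Unsigned 32x32 product built from four 16x16 products.

    Illustrative only: in hardware the four narrow products would be
    produced over several passes through the unit and then accumulated.
    """
    xt, xb = top_half(x), bottom_half(x)
    yt, yb = top_half(y), bottom_half(y)
    high = (xt * yt) << 32
    middle = ((xt * yb) + (xb * yt)) << 16
    low = xb * yb
    return high + middle + low
```

The result can be checked against Python's built-in multiplication, for example `mul32_from_16(0x12345678, 0x9ABCDEF0) == 0x12345678 * 0x9ABCDEF0`.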
4.2 Physical Design Flow for Voltage Separation
Voltage separation allows different groups of the design to operate at different voltage levels in order to optimize overall chip power consumption. Moreover, a group can have its supply shut off while it is idle. However, voltage separation makes the design process more complicated with respect to static timing, power routing, floorplanning, and so on. Figure 4.2 shows an example problem associated with voltage separation. The cells belonging to each island can be placed only in the area indicated in the figure. The limited placement area restricts the freedom to move cells, which might otherwise be used to meet timing requirements and relieve wiring congestion. A core area that is too large wastes area, while one that is too small causes wiring congestion. Therefore, deciding an appropriate core area size and placing each group at an appropriate location are important issues for voltage separation.
Figure 4.2 Example of limited placement area. The cells belonging to each group are placed within the indicated area (a limited placement range inside the core area).
A design using voltage separation needs to group together the blocks that are powered by the same voltage source. In addition, the groups need to be placed close to the power pads in order to minimize power routing complexity and IR drop. Since each group requires its own power grid, some overhead in area and delay is unavoidable. Additional area overhead may arise from dead space if two or more power rings cannot be packed perfectly. Figure 4.3 shows dead space caused by imperfectly packed power rings. Because each group has its own power grid, the metal line width of each power grid can be adjusted to fit the design demand. Therefore, the area overhead is not a critical issue if the power rings are packed perfectly. Figure 4.4 shows a layout diagram with four SIMD MACs, each of which has its own power rings. The area overhead created by voltage separation is shown in Table 3. Voltage separation incurs only about 1% area overhead.
Figure 4.3 Example of dead space caused by imperfectly packed power rings.
Figure 4.4 Example of voltage separation. Each group has its own power rings.
Table 3 Area overhead of voltage separation
Type                      Area             Normalized Area
Separated power rings     1.375 x 1.179    1
Unseparated power rings   1.390 x 1.179    1.011
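The normalized-area column of Table 3 can be reproduced from the listed dimensions. The short sketch below does the arithmetic. The variable names follow the row labels of the table and are otherwise hypothetical, and since the table gives no unit for the dimensions, only the ratio is computed.

```python
def rectangle_area(width, height):
    """Area of a rectangular layout; the unit is whatever the inputs use."""
    return width * height


# Dimensions copied from Table 3 (the table does not state a unit).
separated_rings = rectangle_area(1.375, 1.179)
unseparated_rings = rectangle_area(1.390, 1.179)

# Ratio of the two rows, matching the normalized-area column.
normalized = unseparated_rings / separated_rings
difference_percent = (normalized - 1.0) * 100.0

print(f"normalized area: {normalized:.3f}")
print(f"difference: {difference_percent:.1f} %")
```

Because both layouts share the same height, the ratio reduces to 1.390 / 1.375, which is about 1.011 and agrees with the roughly 1% figure quoted in the text.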
4.3 Body Bias for Cell-Based Design Flow
Section 4.3.1 discusses the implementation results using dual-supply cells. Because a dual-supply cell separates the body and source terminals, the extra gap and layout design rules increase the cell height. Section 4.3.2 discusses body bias implementation with conventional cells, which has to leave an appropriate interval for strap insertion. This interval incurs core area overhead, and a detailed estimate is given in that section.
4.3.1 Body Bias with Dual-Supply Standard Cell
In this section, a dual-supply standard cell library is proposed for body bias within the cell-based design flow. Figure 4.5 shows the outlines of the general and dual-supply standard cells. The general standard cell is provided by Artisan for the TSMC 0.18 um process. Its cell height is 5.84 um, and its power and ground rails are each 0.8 um wide. When the dual-supply standard cell is adopted for body bias, one rail is powered from VDD (VSS) and another rail is powered from VDDB (VSSB). Because each rail of the dual-supply cell sinks less current than a rail of the general cell, which is powered from a single VDD (VSS), the width of the power rails can be reduced. Meanwhile, the VDDB rail of the dual-supply standard cell is further narrowed to 0.3 um because the body terminal current is smaller than the source terminal current. The cell height of the dual-supply standard cell becomes 6.3 um, which amounts to a 7.9% area overhead compared with the general cell.
For body bias implementation with dual-supply standard cells, the advantage is that routing can be performed directly after placement. The existing APR tool can be used to automate placement and routing of the cells efficiently. Figure 4.6 shows the power rail connection architecture, in which four power rings are created, one for each power rail connection.
Figure 4.5 Outline of cells. (a) General cell: height 5.84 um, VDD and VSS rails 0.8 um each. (b) Dual-supply cell: height 6.3 um, VDD and VSS rails 0.5 um each, VDDB and VSSB rails 0.3 um each.
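The 7.9% figure follows directly from the two cell heights. The sketch below recomputes it and also totals the rail widths of each cell. The dictionary layout and function names are hypothetical, and the 0.5 um VDD and VSS widths of the dual-supply cell are taken from the labels of Figure 4.5 rather than from the text.

```python
# Dimensions in micrometers, from Section 4.3.1 and the Figure 4.5 labels.
GENERAL_CELL = {
    "height_um": 5.84,
    "rails_um": {"VDD": 0.8, "VSS": 0.8},
}
DUAL_SUPPLY_CELL = {
    "height_um": 6.3,
    "rails_um": {"VDD": 0.5, "VSS": 0.5, "VDDB": 0.3, "VSSB": 0.3},
}


def height_overhead(base_cell, new_cell):
    """Fractional increase in cell height (equal to the area increase at fixed width)."""
    return new_cell["height_um"] / base_cell["height_um"] - 1.0


def total_rail_width(cell):
    """Sum of all rail widths in one cell, in micrometers."""
    return sum(cell["rails_um"].values())


overhead = height_overhead(GENERAL_CELL, DUAL_SUPPLY_CELL)
print(f"cell height overhead: {overhead * 100:.1f} %")  # 7.9 %
print(f"general rails: {total_rail_width(GENERAL_CELL):.1f} um")
print(f"dual-supply rails: {total_rail_width(DUAL_SUPPLY_CELL):.1f} um")
```

Under this reading of the figure, both cells carry the same 1.6 um of total rail width, so the 0.46 um height increase would come from the extra gaps and design rule spacing noted in Section 4.3, not from wider rail metal.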
Figure 4.6 Power rail connection architecture. Labels: VDDB, VDD, VSS, and VSSB power rings, core area, dual-supply cell.
4.3.2 Body Bias with General Standard Cell
This section introduces body bias implementation with the general standard cell. In order to add straps between standard cells, an interval is left for strap and well pattern insertion. Figure 4.7 shows the outline of cell placement. The cell placement of the general design flow is shown in Figure 4.7 (a). The cells are placed side by side and share the VSS and VDD power rails with other cells. This type of placement increases the utilization of the core area. Figure 4.7 (b) shows the cell placement for body bias. When cells are placed, an appropriate interval is left between cells to realize body bias. The width of the interval is controlled by the design parameter “Row/Core Ratio”. Therefore, the area overhead depends on how much of the entire core the interval area occupies. Table 4 shows the area overhead created by body bias with the general cell. According to the SIMD MAC experimental results, the core area increases from 692 um x 692 um to 750 um x 750 um, which amounts to a 17% area overhead compared with the SIMD MAC without body bias. The layout diagram for body bias with the general cell is shown in Figure 4.8. The white blocks in the figure indicate straps inserted into the intervals between cells. The extra power rings supply the VSSB power line for body bias.
Figure 4.7 Cell placement. (a) Cell placement for the general design flow. (b) Cell placement for body bias; the straps and wells are added within the interval.
Table 4 Area overhead for body bias with general cell
                               Interval Space   Area              Normalized Area
SIMD MAC (without body bias)   0                692 um x 692 um   1
SIMD MAC (with body bias)      2 um             750 um x 750 um   1.174
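The core-area figures in Table 4 can be cross-checked with a few lines of arithmetic. The sketch below computes the area ratio and the growth per core side. The function and variable names are hypothetical, and the calculation assumes the square cores listed in the table.

```python
def square_core_area(side_um):
    """Area of a square core in square micrometers."""
    return side_um ** 2


SIDE_WITHOUT_BIAS_UM = 692.0  # Table 4, SIMD MAC without body bias
SIDE_WITH_BIAS_UM = 750.0     # Table 4, SIMD MAC with body bias (2 um interval)

area_ratio = square_core_area(SIDE_WITH_BIAS_UM) / square_core_area(SIDE_WITHOUT_BIAS_UM)
side_ratio = SIDE_WITH_BIAS_UM / SIDE_WITHOUT_BIAS_UM
extra_area_um2 = square_core_area(SIDE_WITH_BIAS_UM) - square_core_area(SIDE_WITHOUT_BIAS_UM)

print(f"area ratio: {area_ratio:.2f}")
print(f"side ratio: {side_ratio:.3f}")
print(f"extra core area: {extra_area_um2:.0f} um^2")
```

The area ratio comes out at roughly 1.17, in line with the normalized area in Table 4, while each core side grows by only about 8%.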
Figure 4.8 Layout diagram for body bias with general cell.
4.4 Power Switch Implementation
In the cell-based design flow, the APR tool determines most of the cell placement and the power grid architecture. Figure 4.9 shows the outline of the cell placement and power grid architecture. The cells are placed within the core area according to timing analysis. Two power rings, VSS and VDD, surround the core area and deliver power evenly to the cells. If a designer wants to add power switches to the design, the power grid architecture has to be taken into consideration. A careful implementation of the power switches can usually reduce the area overhead without complicated modification.
In our research, NMOSFET power switches are added to the SIMD MAC. The concept of adding power switches to the cell-based design flow is shown in Figure 4.10. The power switches are inserted into the power delivery grid between metal-1 and metal-2. They are distributed through the layout in two columns in order to reduce current crowding in the power delivery grid. As described in Section 3.4, the addition of power switches is not performed by the APR tool, so the routing of the cut-off signal that drives the gates of the switches has to be done by the designer. Because the power switches are placed below the power grid, the area overhead they create is nearly 0%. The layout of the power switch placement is shown in Figure 4.11.
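To give a feel for why the switches are distributed rather than lumped, the sketch below uses a first-order model in which N identical NMOS switches conduct in parallel between virtual VSS and VSS. Everything in it is an assumption made for illustration: the function name, the on-resistance, the current, and the switch counts are hypothetical and are not measurements from this design. The model also ignores the resistance of the grid itself, so it does not capture current crowding directly.

```python
def virtual_vss_rise(total_current_a, r_on_ohm, n_switches):
    """Voltage rise on virtual VSS for n identical switches in parallel.

    First-order model: each switch is a resistor r_on_ohm and the total
    current divides equally among them. All inputs are hypothetical.
    """
    if n_switches <= 0:
        raise ValueError("at least one power switch is required")
    return total_current_a * r_on_ohm / n_switches


# Hypothetical numbers: 50 mA block current, 20 ohm per switch.
for count in (1, 10, 40):
    rise_mv = virtual_vss_rise(0.05, 20.0, count) * 1000.0
    print(f"{count:3d} switches: {rise_mv:.0f} mV rise on virtual VSS")
```

In this simplified picture, the rise on virtual VSS falls in proportion to the number of switches sharing the current, so more distributed switches mean a smaller drop across each one.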
Figure 4.9 Cell placement and power grid architecture for the cell-based design flow. Labels: VDD and VSS power rings, core area, standard cell.
Figure 4.10 Concept of adding a power switch into cell-based design circuitry. Labels: metal 2, metal 1, VSS, virtual VSS, power switch, cell.
Figure 4.11 Power switches placement.