
Chapter 3 Implementation

3.4 Power Switch Implementation

This section introduces the implementation of the power switch. The entire power switch design was carried out in Cadence Virtuoso-XL and verified with Mentor Calibre.

The NMOSFET power switch implementation is shown in Figure 3.12. The power switches are placed below the VSS net and distributed throughout the layout in two main VSS columns to reduce current crowding. Inserting the power switches requires some layout modification of the VSS delivery grid.

The layout of the NMOSFET power switch is shown in Figure 3.13. The NMOSFET uses a multi-finger architecture, which provides a larger channel width within a limited area. This makes it suitable for cell-based design, where it is inserted below the power delivery grid. To fit the power switch, the general power delivery grid must be modified, as shown in Figure 3.14 and Figure 3.15, which give the cross-sections of the general VSS delivery grid and the modified VSS delivery grid, respectively. Comparing the two figures, the contact between metal 1 and metal 2 is removed and an NMOSFET power switch is inserted in its place. The gate terminal of the NMOSFET is driven by the sleep signal. Figure 3.16 shows the location of the power switches in the entire core.
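The benefit of the multi-finger layout can be illustrated with a small calculation: the fingers act in parallel, so the effective channel width is the finger width times the finger count. The sketch below is illustrative only. The function names and all numbers are hypothetical and are not taken from the thesis layout.

```python
import math

def total_channel_width(finger_width_um: float, fingers: int) -> float:
    """Effective channel width of a multi-finger MOSFET (fingers in parallel)."""
    return finger_width_um * fingers

def fingers_needed(target_width_um: float, max_finger_width_um: float) -> int:
    """Smallest finger count that reaches the target width when each finger
    is limited to max_finger_width_um by the available layout height."""
    return math.ceil(target_width_um / max_finger_width_um)

# Hypothetical numbers, chosen only to show the calculation.
target_um = 100.0      # desired total channel width
max_finger_um = 4.0    # widest finger assumed to fit below the VSS strap
n = fingers_needed(target_um, max_finger_um)
print(n, total_channel_width(max_finger_um, n))
```

A single-finger device of the same total width would need one dimension equal to the full target width, which is why folding the device into fingers suits the narrow strip below the power grid.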

Figure 3.12 Diagram of power switch implementation. (Figure labels: Metal 2, Metal 1, VSS, Virtual VSS, Power Switch, Cell.)

Figure 3.13 Layout of multi-finger type NMOSFET power switch. (Figure labels: Drain, Source, Gate.)

Figure 3.14 Profile of the general VSS delivery grid.

Figure 3.15 Profile of the VSS power delivery grid modified for the power switch, indicating the insertion of the power switch and the removal of the contact. (Cross-section labels: P-substrate, P-well, P+, N+, PO, CO, V1, Metal 1, Metal 2, VSS, Power Switch.)

Figure 3.16 Entire layout view of power switch implementation.

Chapter 4

Implementation Results

In this chapter, the EDA tools and design environment used in this thesis are specified in Section 4.1. The area penalty caused by the physical design flow for voltage separation is shown in Section 4.2. The design requirements and implementation considerations for body bias are described in Section 4.3. In Section 4.4, the extra area and routing resources required by the power switch are introduced. Finally, the advantages and drawbacks are discussed in Section 4.5.

4.1 Implementation Environment Setup

Throughout this research, the TSMC 0.18 um CMOS Mixed-Signal 1P6M process and the Artisan standard cell library are adopted. The benchmark circuit for the proposed design flow is a Single-Instruction-Multiple-Data Multiply and Accumulate Unit (SIMD MAC). The Hardware Description Language (HDL) tool for behavioral coding is Cadence Verilog-XL. The EDA tool for logic synthesis is Synopsys Design Vision. The circuit verification and debug tool is SpringSoft Debussy. The APR tool for physical design is Synopsys Astro. The tools for physical verification are Cadence Virtuoso-XL and Mentor Calibre.

The SIMD MAC architecture is shown in Figure 4.1. A recursive architecture is chosen to implement the SIMD MAC. It builds wider vector elements out of several narrower vector elements and adds the multiple results together. Results can be determined iteratively by passing the data back through the unit over more than one cycle. Radix-4 modified Booth encoding and a Wallace tree are used to speed up the accumulation. Hot-one coding compensates for the two’s complement increment bit of each partial product created by Booth encoding. Table 1 summarizes the input and output ports.
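As an illustration of the radix-4 modified Booth encoding mentioned above, the following Python sketch recodes a signed multiplier into digits from {-2, -1, 0, 1, 2}, so an n-bit multiplier yields n/2 partial products. It is a behavioral model written for this explanation, not the thesis RTL, and the function names are assumptions.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Digit i has weight 4**i and lies in {-2, -1, 0, 1, 2}. bits must be even.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit zero to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # digit = -2*b[i+1] + b[i] + b[i-1]
        prev = b1
    return digits

def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Sum the Booth partial products; each is 0, +/-1x or +/-2x the multiplicand."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4**i for i, d in enumerate(digits))

print(booth_radix4_digits(-3, 4))
print(booth_multiply(123, -45, 8))
```

The negative digits are where the two's complement increment bits come from in hardware: negating a partial product means inverting it and adding one, and that added one is what the compensation logic has to account for.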

The input and output widths are 32 and 40 bits, respectively. The wider output accommodates iterative accumulation and keeps results from saturating easily. The Tag field in Table 1 selects whether the accumulated data comes from the accumulator within the SIMD MAC or is loaded from external accumulators. Table 2 describes the operation of each SIMD instruction. Multiply with implicit accumulate (MIA) features are also included in this SIMD MAC.
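To see why a 40-bit output helps against saturation, the sketch below models a saturating multiply-accumulate in Python. It assumes signed 16-bit operand halves and clamping on overflow. Both are assumptions made for illustration, since the exact arithmetic of the unit is not specified here, and the function names are hypothetical.

```python
ACC_BITS = 40                        # output width stated in the text
ACC_MAX = (1 << (ACC_BITS - 1)) - 1  # assumes a signed accumulator
ACC_MIN = -(1 << (ACC_BITS - 1))

def saturating_mac(acc: int, x: int, y: int) -> int:
    """Accumulate x*y into acc, clamping to the signed 40-bit range."""
    return max(ACC_MIN, min(ACC_MAX, acc + x * y))

def worst_case_headroom(operand_bits: int = 16) -> int:
    """Number of largest-magnitude signed products that fit without clamping."""
    largest_product = (1 << (operand_bits - 1)) ** 2  # (-2**15)**2 = 2**30
    return ACC_MAX // largest_product

print(worst_case_headroom())
```

Under the same assumptions, a 32-bit accumulator would hold only one such worst-case product before clamping, which is the motivation for the extra output bits.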

Figure 4.1 SIMD MAC architecture. (Figure labels: Shifter & Mux; Vector Mask & Booth Encoder; Multiplicand.)

Table 1 SIMD MAC input & output summary

Table 2 SIMD MAC instruction specification.

Note: x and y can be the top-half [31:16] or bottom-half [15:0] operand.

4.2 Physical Design Flow for Voltage Separation

Voltage separation permits different groups of the design to operate at different voltage levels in order to optimize overall chip power consumption. Moreover, a group can even shut off its voltage supply when it is idle. However, voltage separation makes the design process more complicated with respect to static timing, power routing, floorplanning, and so on. Figure 4.2 shows an example of a problem associated with voltage separation. The cells belonging to each island can only be placed in the area indicated in the figure. The limited placement area restricts the freedom to move cells, which might otherwise be used to meet timing requirements and relieve wiring congestion. A core area that is too large wastes area, while one that is too small causes wiring congestion.

Therefore, deciding the appropriate size of the core area and placing each group in an appropriate location are important issues for voltage separation.
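The trade-off above is commonly tracked through core utilization, the ratio of total cell area to core area. The following Python sketch is illustrative only: the function name and every number in it are assumptions, not values from the thesis.

```python
def core_utilization(total_cell_area_um2: float,
                     core_width_um: float,
                     core_height_um: float) -> float:
    """Fraction of the core area occupied by standard cells."""
    return total_cell_area_um2 / (core_width_um * core_height_um)

# Hypothetical island: the same cells placed in three candidate core sizes.
cell_area = 360000.0  # um^2, made-up total cell area
for side in (650.0, 750.0, 900.0):
    u = core_utilization(cell_area, side, side)
    print(f"core {side:.0f} um x {side:.0f} um -> utilization {u:.2f}")
```

A high ratio leaves little room for routing and cell movement, while a low ratio means wasted silicon. The appropriate target depends on the design and is not something this sketch can decide.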

Figure 4.2 Example of limited placement area. The cells belonging to each group are placed in the indicated area. (Figure labels: Limited Place Range, Core Area.)

A design using voltage separation needs to group together the blocks that are powered by the same voltage source. In addition, the groups need to be placed close to the power pads in order to minimize power routing complexity and IR drop.

Since each group requires its own power grid, some overhead in area and delay is unavoidable. Additional area overhead may arise from dead space if two or more power rings cannot be packed perfectly. Figure 4.3 shows dead space caused by imperfectly packed power rings. Because each group has its own power grid, the metal line width of each grid can be adjusted to fit the demand of its group. Therefore, the area overhead is not a critical issue if the power rings are packed perfectly. Figure 4.4 shows a layout diagram with four SIMD MACs, each of which has its own power rings. The area overhead created by voltage separation is shown in Table 3. Voltage separation incurs only about 1% area overhead.

Figure 4.3 Example of dead space. The dead space is caused by imperfect packing of the power rings.

Figure 4.4 Example of voltage separation. Each group has its own power rings.

Table 3 Area overhead of voltage separation

Type                      Area            Normalized Area
Separated power rings     1.375 x 1.179   1
Unseparated power rings   1.390 x 1.179   1.011
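The normalized area in Table 3 can be reproduced from the listed dimensions. The short Python sketch below does this. The helper names are hypothetical, and the dimensions are used exactly as printed, since the table gives no unit.

```python
def parse_dims(text: str) -> tuple[float, float]:
    """Split a 'width X height' string such as '1.375X1.179' into two floats."""
    width, height = text.upper().split("X")
    return float(width), float(height)

def normalized_area(dims: str, reference_dims: str) -> float:
    """Area of dims divided by the area of reference_dims."""
    w, h = parse_dims(dims)
    ref_w, ref_h = parse_dims(reference_dims)
    return (w * h) / (ref_w * ref_h)

ratio = normalized_area("1.390X1.179", "1.375X1.179")
print(f"normalized area: {ratio:.3f}, difference: {(ratio - 1) * 100:.1f}%")
```

Because both layouts share the same height, the ratio reduces to 1.390 / 1.375, which matches the roughly 1% figure quoted in the text.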

4.3 Body Bias for Cell-Based Design Flow

Section 4.3.1 discusses the implementation result using the dual-supply cell. Because the dual-supply cell separates the body and source terminals, the extra gap and layout design rules increase the cell height. Section 4.3.2 discusses body bias implementation with the conventional cell, which has to leave an appropriate interval for strap insertion. This interval incurs core area overhead, and the detailed estimate is given in that section.

4.3.1 Body Bias with Dual-Supply Standard Cell

In this section, a dual-supply standard cell library is proposed for body bias within the cell-based design flow. Figure 4.5 shows the outlines of the general and dual-supply standard cells. The general standard cell is provided by Artisan for the TSMC 0.18 um process. Its cell height is 5.84 um, and its power and ground rails are each 0.8 um wide. When the dual-supply standard cell is adopted for body bias, one rail is powered from VDD (VSS) and another rail is powered from VDDB (VSSB). Because each rail of the dual-supply cell sinks less current than a rail of the general cell, which is powered from a single VDD (VSS), the width of the power rails can be reduced. Moreover, the VDDB rail of the dual-supply standard cell is further scaled to 0.3 um because the current at the body terminal is smaller than at the source terminal. The cell height of the dual-supply standard cell becomes 6.3 um, a 7.9% area overhead compared to the general cell.
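The 7.9% figure follows directly from the two cell heights. The Python sketch below reproduces it and also totals the rail widths, reading the dual-supply rail widths (0.5 um for VDD and VSS, 0.3 um for VDDB and VSSB) from the labels of Figure 4.5. That reading of the figure labels and the data layout are assumptions of this sketch.

```python
# Cell heights from the text; rail widths as labelled in Figure 4.5 (assumed reading).
GENERAL_CELL = {"height_um": 5.84, "rails_um": {"VDD": 0.8, "VSS": 0.8}}
DUAL_SUPPLY_CELL = {
    "height_um": 6.3,
    "rails_um": {"VDD": 0.5, "VSS": 0.5, "VDDB": 0.3, "VSSB": 0.3},
}

def rail_total_um(cell: dict) -> float:
    """Sum of all rail widths in one cell."""
    return sum(cell["rails_um"].values())

def height_overhead(new_cell: dict, old_cell: dict) -> float:
    """Relative increase in cell height, which equals the per-cell area
    increase when the cell width is unchanged."""
    return new_cell["height_um"] / old_cell["height_um"] - 1.0

print(f"rail metal: {rail_total_um(GENERAL_CELL):.1f} um vs "
      f"{rail_total_um(DUAL_SUPPLY_CELL):.1f} um")
print(f"height overhead: {height_overhead(DUAL_SUPPLY_CELL, GENERAL_CELL) * 100:.1f}%")
```

Under this reading, the total rail width is the same in both cells, so the added 0.46 um of height would come from the spacing needed between the separated rails rather than from extra rail metal.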

The advantage of implementing body bias with the dual-supply standard cell is that routing can be performed directly after placement. The existing APR tool can automate the placement and routing of the cells efficiently. Figure 4.6 shows the architecture of the power rail connection. There are four power rings, one created for each power rail connection.

Figure 4.5 Outline of cells. (a) General cell. (b) Dual-supply cell. (Figure labels: (a) VDD 0.8 um, VSS 0.8 um, cell height 5.84 um; (b) VDDB 0.3 um, VDD 0.5 um, VSS 0.5 um, VSSB 0.3 um, cell height 6.3 um.)

Figure 4.6 Power rail connection architecture. (Figure labels: VDDB, VDD, VSS, VSSB, Core Area, Dual-Supply Cell.)

4.3.2 Body Bias with General Standard Cell

This section introduces the body bias implementation with the general standard cell. To add straps between standard cells, an interval is left for strap and well pattern insertion. Figure 4.7 shows the outline of the cell placement. The cell placement of the general design flow is shown in Figure 4.7 (a). The cells are placed side by side and share the VSS and VDD power rails with other cells, which uses the core area efficiently. Figure 4.7 (b) shows the cell placement for body bias. When cells are placed, an appropriate interval is left between cells to realize body bias. The width of the interval can be controlled by the design parameter “Row/Core Ratio”. The area overhead therefore depends on how much of the entire core the interval area occupies. Table 4 shows the area overhead created by body bias with the general cell. In the SIMD MAC experiment, the core area increases from 692 um x 692 um to 750 um x 750 um, a 17% area overhead compared with the SIMD MAC without body bias. The layout diagram for body bias with the general cell is shown in Figure 4.8. The white blocks in the figure show straps inserted into the intervals between cells. The extra power rings supply the VSSB power line for body bias.
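The 17% overhead can be checked from the two core sizes quoted above. The following Python sketch is a simple arithmetic check with hypothetical function names. It also shows that the side length grows far less than the area does.

```python
def square_core_area(side_um: float) -> float:
    """Area of a square core in um^2."""
    return side_um * side_um

def area_overhead(base_side_um: float, new_side_um: float) -> float:
    """Relative area increase when the core side grows from base to new."""
    return square_core_area(new_side_um) / square_core_area(base_side_um) - 1.0

base_side = 692.0   # um, SIMD MAC without body bias
bias_side = 750.0   # um, SIMD MAC with body bias
print(f"side growth: {(bias_side / base_side - 1) * 100:.1f}%")
print(f"area overhead: {area_overhead(base_side, bias_side) * 100:.1f}%")
```

Because area scales with the square of the side length, a side increase of roughly 8% is enough to produce the area overhead of about 17% reported for the core.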

Figure 4.7 Cell placement. (a) Cell placement for the general design flow. (b) Cell placement for body bias, with straps and wells added within the interval.

Table 4 Area overhead for body bias with general cell

Design                         Interval Space   Area              Normalized Area
SIMD MAC (without body bias)   0                692 um x 692 um   1
SIMD MAC (with body bias)      2 um             750 um x 750 um   1.174

Figure 4.8 Layout diagram for body bias with general cell.

4.4 Power Switch Implementation

In the cell-based design flow, the APR tool determines most of the cell placement and the power grid architecture. Figure 4.9 shows the outline of the cell placement and power grid architecture. The cells are placed within the core area according to timing analysis. Two power rings, VSS and VDD, surround the core area and deliver power evenly to the cells. If the designer wants to add power switches to the circuit, the architecture of the power grid has to be taken into consideration. A careful power switch implementation can usually reduce the area overhead without complicated modification.

In our research, the NMOSFET power switch is added to the SIMD MAC. The concept of adding power switches in the cell-based design flow is shown in Figure 4.10. The power switches are inserted into the power delivery grid between metal 1 and metal 2. They are distributed through the layout in two columns in order to reduce current crowding in the power delivery grid. As described in Section 3.4, the addition of the power switches is not implemented by the APR tool, so the routing of the cut-off signal that drives the gates of the switches has to be done by the designer. Because the power switches are placed below the power grid, the area overhead they create is nearly 0%. The layout of the power switch placement is shown in Figure 4.11.

Figure 4.9 Cell placement and power grid architecture for the cell-based design flow. (Figure labels: VDD, VSS, Core Area, Standard Cell.)

Figure 4.10 Concept of adding power switch into cell-based design circuitry. (Figure labels: Metal 2, Metal 1, VSS, Virtual VSS, Power Switch, Cell.)

Figure 4.11 Power switches placement.

4.5 Summary

In this section, the features of the low power techniques are summarized in Table 5. Because the dual-supply standard cell is not available in our research, its area overhead is marked with a star, and the number represents the percentage increase per cell compared with the general cell. Apart from the method using the dual-supply cell for body bias, the other design methods need extra patterns to realize the techniques. Semi-automation means that some design phases are carried out by the designer rather than by the EDA tool.

All techniques introduced in our research, namely voltage separation, body bias and power switch, are implemented on the SIMD MAC. In fact, all of them can be applied to any circuit that adopts the cell-based design flow. Figure 4.12 shows a layout diagram of streaming clusters with the voltage separation technique. Streaming architecture has been suggested as an efficient architecture for both media applications and baseband architectures for software defined radios.

Table 5 Summary of low power techniques

Technique type                 Area overhead   Extra pattern    Implementation style
Voltage Separation             1.1%            DNW              Semi-automation
Body Bias (dual-supply cell)   *7.9%           none             Fully-automation
Body Bias (general cell)       17%             Contact / DIFF   Semi-automation
Power Switch                   --              none             Semi-automation

* The cell height of the dual-supply cell accounts for a 7.9% area overhead compared to the general cell.

Figure 4.12 Layout diagram of streaming clusters with voltage separation. Each cluster has its own power ring. (Figure labels: Cluster 1, Cluster 2, Cluster 3, Cluster 4, Memory, Power Ring.)

Chapter 5

Conclusion and Future Work

5.1 Conclusion

As the power consumption of VLSI designs increases from one generation to the next, it is becoming more important to control power dissipation even when the circuit is in idle mode. To meet the power requirements of advanced VLSI design, several simple yet effective physical design flows using existing commercial EDA tools have been presented.

By using the voltage separation technique, the circuit can be partitioned into several islands, making it possible to supply each with the minimum voltage to reduce power. At the same time, a deep n-well (DNW) is added to reduce noise coupling through the common substrate. Moreover, a cell layout style with built-in dual supply rails is proposed. With this cell layout, body bias can be embedded directly in the typical cell-based design flow. The extra power grid creation and port connection are also presented. With the conventional cell library, body bias can also be added to the cell-based design flow via a simple contact modification and well pattern insertion. Finally, a power switch suitable for the cell-based design flow is shown. With careful design, the power switches are inserted between the metal-1 and metal-2 layers and incur little area overhead.

By embedding low power techniques into the physical design flow, a circuit with low power features becomes available. This thesis therefore provides an opportunity to realize several low power techniques based on the cell-based method.

Although some simple low power techniques are realized, several enhancements, such as island partitioning, power switch sizing and physical design considerations, are still needed in the EDA tools. Further, creating an industry-wide design flow with robust capability is essential. This includes functional partitioning, synthesis, timing analysis, power analysis, test, simulation and physical design.

5.2 Future Work

Although several low power techniques have been added to the cell-based design flow, a comprehensive power management unit is still essential for a real SoC system. This power management unit not only handles performance coherence between functional blocks, power sequencing and communication issues, but also determines the minimum voltage level for each functional block or provides an optimized body bias voltage on the fly. This information may differ according to the process technology used, such as a low power process or a high speed process. Further, some side effects, e.g. the leakage current of the NMOS increasing dramatically at small channel widths, will probably reduce the effectiveness of the low power techniques in the 0.13 um generation and beyond. Therefore, a robust block-level simulation of leakage efficiency is needed.
