Thesis Organization - Design of a Four-Read Four-Write Multi-Thread Register File Cluster Operable at Low Voltage in a 40 nm Process

The main contents of this thesis are organized as follows. Chapter 2 reviews recent work on low-power SRAM design. The basic operation and stability of the conventional 6T SRAM are introduced first. Low-power SRAM assist circuits, multi-port SRAM, and register file designs are then discussed step by step. Chapter 3 describes the operation conflict problem and the read disturb issue of the conventional dual-port SRAM, and proposes a new 2R2W multi-port SRAM structure. Its shared write bitline and X and Y cut control lines support a bit-interleaving structure without requiring any additional peripheral circuit.

Chapter 4 begins by introducing the register file with multi-thread and double-pump techniques. A new "Data Slot Switch" technique and a conflict detection circuit are then presented to avoid the disturb issue. Chapter 5 proposes a new shared read bitline that reduces the dummy reads of the bit-interleaving structure. Active power is saved by sharing the RBL, and leakage power is reduced by keeping the RWL high in standby mode. Finally, Chapter 6 concludes this thesis.

Chapter 2 Previous Low-Power SRAM Designs

2.1 Introduction

In recent microprocessors, the capacity of on-chip memory is increasing rapidly to improve overall performance. According to the 2002 ITRS roadmap [2.1] [2.2], memory was projected to occupy 90% of chip area by 2013. In such a memory-rich chip, the leakage current of the SRAM, which comprises the vast majority of on-chip transistors, dominates the standby current because leakage power is proportional to the number of transistors. It is therefore important to focus on reducing SRAM standby leakage current for ultra-low-power applications.

SRAM with low power, a minimum transistor count, and fast access is essential for embedded multimedia and communication applications realized in system-on-a-chip technology. Hence, multi-port SRAM bit cells with simultaneous or parallel read/write (R/W) access are widely employed in such embedded systems. Multi-port SRAM has advantages such as high performance and high bandwidth, but it also consumes a larger share of power and area.

This chapter begins in Section 2.2 with an analysis of the power dissipation of SRAM circuits and techniques for leakage reduction. Section 2.3 defines the stability issues of the SRAM cell, including hold stability, read stability, and write ability, and presents the impact of variation on SRAM at low voltage. Sections 2.4, 2.5, and 2.6 present the conventional dual-port SRAM and multi-port SRAM cells. Finally, Section 2.7 describes previous multi-port register file cell designs and peripheral circuit techniques.

2.2 Power Dissipation

This section begins with an analysis of the power dissipation of CMOS circuits and the circuit techniques that address it. Total power dissipation is the sum of dynamic power ($P_{dynamic}$), leakage power ($P_{leakage}$), and short-circuit power ($P_{short-circuit}$):

$P_{total} = P_{dynamic} + P_{leakage} + P_{short-circuit}$ (2.1)

where $P_{dynamic} = \alpha C_L V_{DD}^2 f$, $P_{leakage} = V_{DD} I_{leakage}$, and $P_{short-circuit} = I_{mean} V_{DD}$.

2.2.1 Dynamic Power

Fig. 2.1 shows a CMOS inverter. Its average dynamic power dissipation is obtained by summing the average dynamic power of the NMOS and PMOS transistors. Dynamic power is caused by logic transitions of the CMOS circuit, which charge or discharge the load and parasitic capacitance ($C_L$). As the terms of (2.1) show, dynamic power dissipation is directly proportional to the switching activity factor ($\alpha$), the load capacitance ($C_L$), the square of the supply voltage ($V_{DD}^2$), and the operating frequency ($f$).

Fig. 2.1 Circuit diagram of inverter (labels: VDD, GND, VIN, VOUT, iN, iP)
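The terms of (2.1) can be evaluated numerically to see how strongly supply voltage scaling affects dynamic power. The following is a minimal sketch; the function names and all numeric values are illustrative assumptions and are not taken from this thesis.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """P_dynamic = alpha * C_L * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq


def leakage_power(vdd, i_leakage):
    """P_leakage = VDD * I_leakage, in watts."""
    return vdd * i_leakage


def short_circuit_power(vdd, i_mean):
    """P_short-circuit = I_mean * VDD, in watts."""
    return i_mean * vdd


def total_power(alpha, c_load, vdd, freq, i_leakage, i_mean):
    """Sum of the three components, as in (2.1)."""
    return (dynamic_power(alpha, c_load, vdd, freq)
            + leakage_power(vdd, i_leakage)
            + short_circuit_power(vdd, i_mean))


# Hypothetical operating point: alpha = 0.1, C_L = 10 fF, f = 100 MHz.
p_nominal = dynamic_power(0.1, 10e-15, 1.0, 100e6)
p_low_vdd = dynamic_power(0.1, 10e-15, 0.5, 100e6)
# Halving VDD cuts dynamic power by a factor of four (quadratic dependence).
print(p_nominal / p_low_vdd)
```

Because dynamic power falls quadratically with VDD while leakage power falls only linearly for a fixed leakage current, the leakage share of total power grows as the supply is lowered.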

2.2.2 Leakage Power

Fig. 2.2 Leakage current of deep-submicron transistors

In advanced CMOS technologies, embedded SRAM leakage current becomes dominant compared to the dynamic current. The majority of SRAM macro leakage current is from its bit cell array [2.3]. Leakage current is composed of reverse-biased junction leakage current (I_REV), gate induced drain leakage (I_GIDL), gate direct-tunneling leakage (IG), and sub-threshold leakage (ISUB) in a CMOS transistor [2.4] [2.5].

Fig. 2.2 shows reverse-biased junction leakage, sub-threshold leakage, gate direct-tunneling leakage, injection of hot carriers from substrate to gate oxide, gate induced drain leakage, and punch through leakage in the deep scaling transistor.

Junction Leakage

In Fig. 2.3, leakage in reverse biased transistors and diodes includes the effects of carrier generation, related to residual damage density and location relative to the junction boundary, as well as structure and bias dependent effects of gate oxide leakage, band-to-band tunneling at the drain junction and thermionic emission from metal contacts. All of these effects depend on process conditions, through dependence on dopant activation and profile shape, junction location and local electric fields.

Gate Direct-Tunneling Leakage

Fig. 2.3 Gate leakage current paths in a NMOS transistor

In the steady-state ON region both the gate and drain of the device are held at high with the source being grounded. In this state a well-formed channel exists and three separate components of the gate tunneling current Igs, Igcs and Igcd are active. The component from gate to drain overlap (Igd) is absent due to the almost zero electric field in that region of the oxide. The overall current flow is from gate to source and channel, opposite to the flow in the OFF state. In the steady-state OFF region both gate and source are at ground while the drain is at high (VDD) voltage. Since no channel is formed in this condition, the only active component is Igd [2.6].

Gate-induced drain leakage (GIDL):

As the electric field in and around the gated p-n junction is increased by the applied gate voltage, all the high-field effects, such as avalanche multiplication and band-to-band tunneling, can increase very dramatically (Fig. 2.4). Thus, the leakage current of a reverse-biased gated diode can increase dramatically when the gate voltage begins to cause field crowding in and around the junction region.

Fig. 2.4 Leakage current of deep-submicron transistors

Sub-threshold Leakage

When the gate voltage is below the threshold voltage, sub-threshold leakage, or weak-inversion current, flows between source and drain. For example, in an off-state inverter, although the Vgs of the NMOS is 0 V, a small leakage current still flows from drain to source because the voltage V_DD appears across V_ds [2.7].

Sub-threshold behavior can be modeled physically as follows [2.8]:

$I_{ds} = \mu \frac{W}{L} \left(\frac{kT}{q}\right)^2 C_{sth} \, e^{\frac{V_g - V_T + \eta V_{ds}}{m kT/q}} \left(1 - e^{-\frac{V_{ds}}{kT/q}}\right), \quad m = 1 + \frac{C_{sth}}{C_{ox}}$ (2.2)

where W and L denote the transistor width and length, μ denotes the carrier mobility, Csth = Cdep + Cit denotes the sum of the depletion-region capacitance and the interface-trap capacitance, both per unit area of the MOS gate, η is the drain-induced barrier lowering (DIBL) coefficient, and Cox denotes the gate input capacitance per unit area of the MOS gate.

Sub-threshold leakage increases exponentially as the threshold voltage is reduced, and DIBL lowers the threshold further, making the leakage even worse. Conversely, sub-threshold leakage drops as the threshold voltage increases. In low-power designs, high-V_th transistors can therefore be used to reduce sub-threshold leakage in the off state.
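The exponential dependence on threshold voltage in (2.2) can be checked with a short numerical sketch. The function and parameter names below are assumptions introduced for illustration, and the device values are placeholders rather than data from a real process.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_current(vg, vds, vt, *, mu, w_over_l, c_sth, c_ox, eta,
                         temp_k=300.0):
    """Evaluate Eq. (2.2) for the drain-source current in weak inversion."""
    v_thermal = thermal_voltage(temp_k)
    m = 1.0 + c_sth / c_ox
    prefactor = mu * w_over_l * v_thermal ** 2 * c_sth
    gate_term = math.exp((vg - vt + eta * vds) / (m * v_thermal))
    drain_term = 1.0 - math.exp(-vds / v_thermal)
    return prefactor * gate_term * drain_term


# Placeholder device parameters (assumed, SI units).
device = dict(mu=0.03, w_over_l=2.0, c_sth=0.005, c_ox=0.01, eta=0.05)

# Off-state leakage (Vg = 0) for a low-Vt and a high-Vt transistor.
i_low_vt = subthreshold_current(0.0, 1.0, 0.3, **device)
i_high_vt = subthreshold_current(0.0, 1.0, 0.4, **device)
print(i_low_vt / i_high_vt)  # leakage reduction from 100 mV of extra Vt
```

With m = 1.5, each additional m·(kT/q)·ln(10) of threshold voltage lowers the off-state current by one decade, which is why a high-V_th device is effective against standby leakage.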

High-K Metal Gate

To reduce gate leakage, a new material is used to replace the conventional SiO2. Silicon dioxide has been used as a gate oxide material for decades. As transistors have decreased in size, the thickness of the silicon dioxide gate dielectric has steadily decreased to increase the gate capacitance and thereby drive current, raising device performance. As the thickness scales below 2 nm, leakage currents due to tunneling increase drastically, leading to high power consumption and reduced device reliability (Fig. 2.5). Replacing the silicon dioxide gate dielectric with a high-κ material allows increased gate capacitance without the associated leakage effects. Equation (2.3) shows that a high-κ material allows a thicker dielectric to provide the same capacitance. With the thicker dielectric, the leakage problem is reduced significantly [2.9].

Fig. 2.5 Conventional silicon dioxide gate dielectric structure compared to a potential high-k dielectric structure

$C = \frac{\kappa \varepsilon_0 A}{t}$ (2.3)

• A is the capacitor area

• κ is the relative dielectric constant of the material (3.9 for silicon dioxide)

• ε₀ is the permittivity of free space

• t is the thickness of the capacitor oxide insulator
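As a worked illustration of (2.3), the sketch below computes the dielectric thickness at which a high-κ film matches the capacitance of a thin silicon dioxide layer. The helper names are hypothetical, and the high-κ value of 20 and the 1.2 nm oxide thickness are assumed example numbers, not figures from this thesis.

```python
EPSILON_0 = 8.8541878128e-12  # permittivity of free space, F/m


def gate_capacitance(kappa, area, thickness):
    """Parallel-plate capacitance C = kappa * eps0 * A / t, Eq. (2.3)."""
    return kappa * EPSILON_0 * area / thickness


def equal_capacitance_thickness(kappa_new, kappa_ref, t_ref):
    """Thickness of the new dielectric giving the same C as the reference."""
    return t_ref * kappa_new / kappa_ref


KAPPA_SIO2 = 3.9     # from the definition list above
KAPPA_HIGH_K = 20.0  # assumed example value for a high-k material
T_SIO2 = 1.2e-9      # assumed 1.2 nm oxide
AREA = 1e-14         # assumed gate area, m^2

t_high_k = equal_capacitance_thickness(KAPPA_HIGH_K, KAPPA_SIO2, T_SIO2)
c_sio2 = gate_capacitance(KAPPA_SIO2, AREA, T_SIO2)
c_high_k = gate_capacitance(KAPPA_HIGH_K, AREA, t_high_k)
print(t_high_k, c_sio2, c_high_k)
```

The physically thicker film keeps the gate capacitance unchanged while widening the tunneling barrier, which is the mechanism behind the leakage reduction described above.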

FinFET Structure

Fig. 2.6 shows the FinFET device, which offers notably faster switching times and higher current density. Unlike the conventional MOS structure, this device, developed by IBM, provides better gate control: the vertical gate covers more of the channel area, so stronger control over the channel is achieved. Owing to its superior gate control, electrostatic integrity, and variability, the FinFET has demonstrated satisfactory scalability and feasibility for mass production at post-22-nm technology nodes [2.10] [2.11].

Fig. 2.6 FinFET structure

Punch-through Leakage

Finally, in short-channel devices, due to the proximity of the drain and the source, the depletion regions at the drain-substrate and source-substrate junctions extend into the channel. As the channel length is reduced, if the doping is kept constant, the separation between the depletion region boundaries decreases. An increase in the reverse bias across the junctions (with increase in V_DS) also pushes the junctions nearer to each other. As the combination of channel length and reverse bias leads to the merging of the depletion regions, punch through leakage occurs.

Punch-through produces a large current that can cause the device to fail. The large current also causes heating and extra power dissipation, so designers must guard carefully against this effect.

2.2.3 Short Circuit Power

When a CMOS gate switches, a direct path from VDD to GND conducts briefly while both the NMOS and the PMOS are on. This DC path causes extra power consumption. Short-circuit power can be expressed as Eq. (2.4), where Imean is the mean value of the short-circuit current [2.12].

At the circuit level, a number of articles have described short-circuit power. The following expressions are taken from the articles by Veendrick [2.13] and by Hedenstierna and Jeppson [2.14].

$P_{short-circuit} = I_{mean} \times V_{DD}$ (2.4)

$P_{short-circuit} = \frac{\beta}{12} (V_{DD} - 2V_t)^3 \tau f$ (2.5)

P: The device transistor conductance

τ: The ramp time

β: The gain factor of a transistor

f: The operating frequency
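Equation (2.5) can be evaluated directly to see how quickly short-circuit power vanishes as the supply approaches twice the threshold voltage. This is a minimal sketch under the usual assumptions of the Veendrick model (a symmetric inverter with equal NMOS and PMOS threshold magnitudes); the function name and the numeric values are illustrative assumptions.

```python
def veendrick_short_circuit_power(beta, vdd, vt, tau, freq):
    """Eq. (2.5): P = (beta / 12) * (VDD - 2*Vt)^3 * tau * f.

    Assumes a symmetric inverter. When VDD <= 2*Vt the two transistors are
    never on at the same time, so no short-circuit current flows.
    """
    overdrive = vdd - 2.0 * vt
    if overdrive <= 0.0:
        return 0.0
    return beta / 12.0 * overdrive ** 3 * tau * freq


# Assumed example values: beta = 100 uA/V^2, Vt = 0.3 V,
# 100 ps input ramp time, 100 MHz operating frequency.
p_nominal = veendrick_short_circuit_power(1e-4, 1.0, 0.3, 100e-12, 100e6)
p_low_vdd = veendrick_short_circuit_power(1e-4, 0.5, 0.3, 100e-12, 100e6)
print(p_nominal, p_low_vdd)
```

The cubic dependence on (VDD - 2Vt) means that short-circuit power becomes negligible in the low-voltage operating region, leaving dynamic and leakage power as the main concerns.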

2.3 SRAM Bit-cell Stability and Write-ability

As CMOS process technology scales down, process variation becomes more and more important. PVT variation, including both global and local variation, has the major effect on cell stability. It is therefore very important to use simulation to estimate the true threshold accurately. The worst case must be considered, and Monte Carlo simulation is usually used to find it. The rest of this section states the most widely adopted definitions of SRAM cell stability.

2.3.1 Static Noise Margin (SNM)

The most common way to measure the stability of cross-coupled inverters is the static noise margin (SNM). The hold static noise margin is defined as the maximum static DC voltage noise that the SRAM bit-cell can tolerate without flipping the storage node while the word-line is off. Fig. 2.7 shows the standard hold SNM simulation setup for a 6T SRAM cell. Two noise sources are applied at Q and Qb, and the maximum noise voltage at which the SRAM still retains its stored data is found. In this case, WL is at zero and both bit-lines are kept high [2.15].

Fig. 2.8 shows the standard setup for modeling the read SNM. Compared with the hold SNM (HSNM) setup, WL is turned on to simulate a read operation. The node storing "0" rises slightly because of the voltage division between the pass transistor and the pull-down transistor. Once this disturb voltage rises close to the trip point of the inverter, the data is flipped. The read SNM is smaller than the HSNM because read disturb significantly reduces node stability. Fig. 2.7 and Fig. 2.8 also show example butterfly curves during hold and read, revealing the degradation in SNM during read.

Fig. 2.7 Standard setup for finding Hold SNM (setup labels: WL=0, BL=VDD, BLb=VDD; plot axes: VL (V), VR (V))

Fig. 2.8 Standard setup for finding Read SNM
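The SNM is commonly extracted from a butterfly curve as the side of the largest square that fits inside one lobe. The sketch below illustrates that graphical procedure with the usual 45-degree rotation of the axes. It assumes two identical inverters and uses a hypothetical tanh-shaped transfer curve in place of simulated data; in practice the curve would come from the setups of Fig. 2.7 or Fig. 2.8.

```python
import math
from bisect import bisect_left


def tanh_vtc(vin, vdd=1.0, gain=10.0):
    """Hypothetical inverter transfer curve with slope -gain at VDD/2."""
    return 0.5 * vdd * (1.0 - math.tanh(2.0 * gain * (vin - 0.5 * vdd) / vdd))


def static_noise_margin(vtc, vdd=1.0, points=2001):
    """Largest-square SNM of a cell built from two identical inverters.

    The curve (x, vtc(x)) is expressed in axes rotated by 45 degrees,
    u = (x - y)/sqrt(2) and v = (x + y)/sqrt(2). The mirrored curve of the
    second inverter is then v(-u), and the largest separation along v is
    the diagonal of the largest inscribed square.
    """
    root2 = math.sqrt(2.0)
    xs = [vdd * i / (points - 1) for i in range(points)]
    u = [(x - vtc(x)) / root2 for x in xs]  # strictly increasing in x
    v = [(x + vtc(x)) / root2 for x in xs]

    def v_at(u_query):
        # Linear interpolation of v(u), clamped at both ends.
        if u_query <= u[0]:
            return v[0]
        if u_query >= u[-1]:
            return v[-1]
        j = bisect_left(u, u_query)
        frac = (u_query - u[j - 1]) / (u[j] - u[j - 1])
        return v[j - 1] + frac * (v[j] - v[j - 1])

    diagonal = max(abs(vi - v_at(-ui)) for ui, vi in zip(u, v))
    return diagonal / root2


snm_high_gain = static_noise_margin(lambda x: tanh_vtc(x, gain=20.0))
snm_low_gain = static_noise_margin(lambda x: tanh_vtc(x, gain=5.0))
print(snm_high_gain, snm_low_gain)
```

A flatter transfer curve, such as the one seen during a read when the access transistor lifts the "0" node, shrinks the lobes and therefore the SNM; an ideal step-shaped curve would approach the upper bound of VDD/2.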

2.3.2 Write Margin (WM)

There are many ways to measure the write ability of an SRAM bit-cell; the simplest is to find the write trip point (WTP). The write margin is defined as $V_{DD} - MIN[V(WWL)]$, where $MIN[V(WWL)]$ is the minimum write-word-line voltage required to flip the bit-cell. In this write margin test, the WL voltage is swept from VDD to zero. The higher the write margin, the more easily data is written into the bit-cell. Fig. 2.9 shows a corresponding example of finding the write margin, which is defined as the VDD - VWL value at the point where VR and VL flip. The write margin value and its variation are functions of the cell design, the SRAM array size, and process variation. A cell is considered not writable if the worst-case write margin falls below the ground potential.
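The word-line sweep described above can be sketched as a simple search loop. The predicate `cell_flips` is a hypothetical stand-in for a circuit simulation of the bit-cell, and the 620 mV trip voltage of the toy cell is an assumed number used only to exercise the code. Voltages are handled in integer millivolts to avoid floating-point comparison issues.

```python
def write_margin_mv(cell_flips, vdd_mv, step_mv=1):
    """Sweep V(WWL) from VDD toward 0 and return VDD - MIN[V(WWL)] in mV.

    cell_flips(v_wwl_mv) must return True when a write at that word-line
    voltage flips the cell. Returns None if the cell does not flip even at
    V(WWL) = VDD, i.e. the cell is not writable within the sweep range.
    """
    min_flip_mv = None
    for v_wwl_mv in range(vdd_mv, -1, -step_mv):
        if not cell_flips(v_wwl_mv):
            break
        min_flip_mv = v_wwl_mv
    if min_flip_mv is None:
        return None
    return vdd_mv - min_flip_mv


def toy_cell(v_wwl_mv):
    """Toy cell model (assumed): flips once the word-line reaches 620 mV."""
    return v_wwl_mv >= 620


def unwritable_cell(v_wwl_mv):
    """Toy cell that would need more than VDD on the word-line to flip."""
    return v_wwl_mv >= 1100


print(write_margin_mv(toy_cell, 1000))         # 380
print(write_margin_mv(unwritable_cell, 1000))  # None
```

In a variation-aware flow, the same sweep would be repeated per Monte Carlo sample and the worst-case margin compared against zero.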

Fig. 2.9 Write margin of a SRAM bit-cell (plot axis: VWL (V))

2.3.3 Impact of Variation on SRAM in Low Voltage

Differential 6T SRAM

The 6T bit-cell does not scale well with process technology and is also unsuitable for low-voltage operation. For a 6T cell to operate in an advanced technology, the NMOS and PMOS areas have to be enlarged to gain more write/read ability. The 6T cell is very sensitive to process effects such as random dopant fluctuation (RDF) and line edge roughness (LER), which may result in threshold voltage mismatch between adjacent transistors in the memory cell [2.16] [2.17].

Half-Select Disturb Failure

As nanoscale devices are scaled down, threshold voltage variation becomes larger. Because of process variation, the NMOS Vt is no longer a constant value; if the disturb voltage exceeds the bit-cell trip voltage, the data flips and an error occurs. A conventional 6T array with a bit-interleaving structure suffers from the half-select problem. Fig. 2.10 shows the half-select disturb failure and its waveform when the pull-down NMOS Vt is too high and the access NMOS Vt is low. Charge then builds up on Qb, and the stored data may flip through this current path.

Fig. 2.10 The read-disturb of 6T SRAM in different process [2.17]

Read/Write conflict issues

In this configuration, read and write accesses impose opposing requirements, making it very difficult to overcome the severe effects of variation and manufacturing defects. Fig. 2.11 shows the β ratios of the 6T SRAM bit-cell; the conflict among these β ratios is described below.

During a read access, the cell must remain bi-stable so that both logic values can be held and read without being upset by the read disturb that occurs at the internal nodes. To facilitate reads and minimize read disturb, the β2 ratio should be made small enough by using a strong PD NMOS and a weak AX NMOS. During a write access, the cell should be made mono-stable so that the desired data can be written. To improve writability, the β3 ratio must be made large by using a strong AX NMOS and a weak PUP PMOS.

To improve writability and minimize read disturb simultaneously, the transistors can be sized as PD > AX > PUP. However, this degrades the β1 ratio and hence V_TRIP, resulting in poor read SNM. These three β ratios therefore conflict with one another, and simple sizing cannot solve the 6T SRAM failures.
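The sizing conflict can be made concrete with a small numerical sketch. The exact definitions of the three β ratios are given in Fig. 2.11 and are not reproduced in the text, so the definitions below (β1 = PUP/PD, β2 = AX/PD, β3 = AX/PUP, using transistor widths at equal channel length as a strength proxy) are assumptions chosen only to be consistent with the description above.

```python
def beta_ratios(w_pd, w_ax, w_pup):
    """Assumed strength ratios of the 6T cell (widths at equal length).

    beta1 = PUP / PD : sets the inverter trip point (assumed definition)
    beta2 = AX  / PD : should be small for low read disturb
    beta3 = AX  / PUP: should be large for good writability
    """
    return {
        "beta1": w_pup / w_pd,
        "beta2": w_ax / w_pd,
        "beta3": w_ax / w_pup,
    }


# Hypothetical sizing that follows PD > AX > PUP (arbitrary width units).
sized = beta_ratios(w_pd=3.0, w_ax=2.0, w_pup=1.0)
# Equal-width cell for comparison.
equal = beta_ratios(w_pd=1.0, w_ax=1.0, w_pup=1.0)

print(sized)
print(equal)
```

Compared with the equal-width cell, the PD > AX > PUP sizing lowers β2 and raises β3 as desired, but it also lowers β1, which is the degradation of the trip point noted above.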

Hold and Read Failure

A hold failure occurs when the cell content is destroyed in standby mode at a low supply voltage. A higher trip point of the back-to-back inverters makes the cell easier to flip, thereby increasing the hold failure probability. As shown in Fig. , the hold SNM is preserved down to very low voltages and forms the basis for several of the ultra-low-voltage bit-cell designs described in Sections 2.5 and 2.6.

Fig. 2.12 6T SRAM SNM loss at low voltages [2.18]

If the data stored in an SRAM cell flips during a read, a read failure occurs. If the voltage at the node storing "0" rises above the trip point of the back-to-back inverters, the data stored in the cell flips. Fig. 2.12 shows that the 6T SRAM bit-cell fails to operate at low voltages because of reduced signal levels and increased variation. At low voltages, the read SNM is negative, indicating loss of stability.

Write Failure

If the data stored in an SRAM cell cannot be flipped during a write, a write failure occurs. When writing "0" to a node storing "1", the node voltage must be discharged below the trip point of the back-to-back inverters. As shown in Fig. 2.13, the write margin is also lost at low voltage, where a positive value, in this case, indicates write failures.

Fig. 2.13 6T SRAM write margin [2.18]

Access Failure

If the voltage difference between the two bit-lines (dual-end) or the voltage drop of the single bit-line (single-end) can’t be sensed by the sense amplifier during the access time, there is an access failure. The cause of access failure can be ascribed to read-current degradation and data-dependent bit-line leakage.

The cell read-current, I_READ, is the current sunk from the pre-charged bit-lines during a read access when the access devices are enabled. At ultra-low voltages, we expect a significantly reduced read-current because of the lower gate-drive voltage.

However, the increased effect of threshold voltage variation severely degrades the weak cell read-current even further. Fig. 2.14 normalizes the read-current distribution by the mean read-current to highlight just the further degradation due to variation.

Fig. 2.14 Read-current distribution [2.18]

Fig. 2.15 I_READ is less than I_leakage from un-accessed cells at low voltage [2.19]

An implied consequence of the reduced read-current is that the aggregate leakage current from the un-accessed cells on the same bit-lines can make conventional data sensing impossible. Because of the reduced ION-to-IOFF ratio and the severe degradation from read-current variation, this leakage can exceed the actual read-current of the accessed cell. Fig. 2.15 shows I_READ/I_LEAK,TOT for a 256-row SRAM array, illustrating the loss of functionality at low voltages. At ultra-low voltage the bit-line leakage exceeds the read signal, making the accessed data indecipherable.
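The sensing condition can be expressed as a simple ratio between the read-current of the accessed cell and the summed leakage of the other cells on the bit-line. The sketch below assumes the worst case, in which every un-accessed cell leaks at the same per-cell value; the function name and the current values are illustrative assumptions rather than measured data.

```python
def read_to_leakage_ratio(i_read, i_leak_per_cell, rows=256):
    """I_READ divided by the total leakage of the (rows - 1) idle cells."""
    return i_read / ((rows - 1) * i_leak_per_cell)


# Assumed nominal-voltage case: 10 uA read-current, 10 nA leakage per cell.
ratio_nominal = read_to_leakage_ratio(10e-6, 10e-9)
# Assumed ultra-low-voltage case: read-current collapses faster than leakage.
ratio_low_vdd = read_to_leakage_ratio(1e-6, 5e-9)

print(ratio_nominal, ratio_low_vdd)
```

A ratio below one means the bit-line leakage exceeds the read signal, which corresponds to the loss of functionality shown in Fig. 2.15. Shortening the bit-line (fewer rows) raises the ratio at the cost of array efficiency.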

2.4 Previous Read/Write Assist Peripheral Circuit

2.4.1 Keeper Tracking Circuit Assist for SRAM Design

Wide-OR structures are typically used in the read path of register files, L1 caches, match lines of TCAMs, flash memories, and PLAs. In most applications, the worst-case requirement is to sense the difference between the leakage state, where all the pull-down legs are leaky, and the ON state, where only one of the legs is ON. The increase in the variability and magnitude of the leakage current has become a major bottleneck in realizing such wide OR gates [2.21] [2.22].

In the conventional design, the keeper is a PMOS and does not track the leakage currents in the pull-down NMOS logic at the FNSP and SNFP corners. This results in performance degradation, higher short-circuit power dissipation, and a limit on the number of pull-down legs.

An ideal keeper is expected to have minimum contention, good noise robustness, good process tracking, less power and area overhead and should support wide fan-in gates.

Fig. 2.16 A conditional keeper with INV chain [2.20]

Fig. 2.17 A current mirror keeper [2.22]

Conditional keeper (CKP)

A weak keeper holds the state of the dynamic node during the transition window, and a strong keeper is conditionally activated, based on the state of the dynamic node, after a certain delay (Fig. 2.16). This reduces contention during the evaluation period, thereby enabling high speed and reducing the short-circuit power dissipation.

Current mirror keeper (LCR)

The current mirror-based keeper technique (Fig. 2.17) was proposed for better process tracking. This technique provides excellent tracking of the delay, but the contention is still high because the keeper is strongly ON at the beginning of the evaluation phase. Furthermore, the replica transistor does not track the leakage due to noise (as Vgs=0) and DIBL (as the drain voltage of the replica NMOS varies across process corners) in the pull-down NMOS logic.

Fig. 2.18 Cross couple keeper with INV chain (left)

Fig. 2.19 Rate sensing keeper with INV chain (right)

Cross couple keeper (CSK)
