Chapter 2 Previous Low-Power SRAM Designs
2.7 Summary
At the beginning of this chapter, I introduced the power consumption model and device geometry effects. Leakage power now dominates total chip power consumption, so reducing power dissipation is an important issue. Standby power and leakage current are discussed in Section 2.1, followed by CMOS device design and new technologies such as FinFET and high-k metal gate. After that, the basic operation of the conventional 6T SRAM is described, together with the basic concepts and measurement of stability and write ability in the SRAM bit-cell. As process technology scales down, process variation significantly degrades SRAM cell stability. Global and local variation are discussed in Section 2.3. Then, some new assist techniques that support SRAM design or improve SNM are introduced, such as boosting circuits, keeper designs, and negative BL. Finally, new cell structures and shared-WWL structures for low power are discussed. In addition, new register bit-cell designs and concepts are listed in Section 2.6.
Chapter 3
Low Power 2R2W Multi-Port 8Kb SRAM Design
3.1 Introduction
In this chapter, a new low-power 13T 2-read 2-write (2R2W) multi-port SRAM bit-cell is proposed. Combining wide-range operation with the advantages of multi-port access, it is well suited to portable devices and mobile phones. A new shared-WBL structure with crossed Y_Cut and X_Cut control makes the cell more robust, improves write ability, and reduces WBL driver power. A negative-VVSS technique is embedded to ensure successful writes at low voltage. With this technique, a shorter write-“1” time is achieved.
To obtain higher bandwidth, multi-port design is becoming more important in media applications. Unlike a conventional single-port design, a multi-port SRAM can operate synchronously or asynchronously because it has two independent ports. Parallel operation provides more bandwidth in the same time, but new conflict issues must be handled.
First, I discuss the problems of the conventional design in Section 3.2, where the conflict problem is introduced in detail. Section 3.4 describes the two techniques used in this design to improve write-“1” ability. Single-ended write reduces power, but its write ability is lower than that of a conventional differential write. Using negative VVSS and cutting off the feedback loop improves write strength. Section 3.5 shows the post-layout simulation, performance, and power analysis. The TSMC 40nm general purpose 2R2W multi-port 8K chip was taped out through CIC on Aug. 22.
3.2 Conventional Dual-Port 8T SRAM
3.2.1 Two Kinds of Access Mode in DP-SRAM
Fig. 3.1 shows the conventional dual-port SRAM bit-cell, which has two ports that can read or write at the same time. Compared with a conventional single-port design, the dual-port structure gives the designer more control flexibility. Dual-port SRAM provides high bandwidth and allows asynchronous CLK timing between the ports. The conflict problem is a major concern in dual-port designs, and many techniques have been proposed to improve the Vmin of DP-SRAM under disturb conditions [3.1] [3.2].
Fig. 3.1 Conventional 8T dual-port SRAM cell
Fig. 3.2 Different-row access mode [3.2]
Fig. 3.3 Access in the same row [3.2]
3.2.2 Write and Read Disturb Issue in 8T DP-SRAM
There are two access modes in two-port operation. In the first, the two ports access different rows, which causes no disturb problem [3.3]. In the second, the two ports access the same row simultaneously (Fig. 3.2 and Fig. 3.3). Case 1 (Fig. 3.4): one port writes the left cell while the other port reads the right cell in the same row. A dummy read then occurs on the left cell, which is referred to as “write disturb”. The dummy read operation hinders the internal “1” node from being flipped by BLA, so the write ability of the left memory cell is degraded.
Fig. 3.4 Write operation disturbed by dummy read in the same row [3.3]
Case 2 (Fig. 3.5): one port reads the left cell while the other read port points to the right cell in the same row. A dummy read operation occurs on the left cell. The internal “0” node is pulled up through BLA, which reduces the cell current. Consequently, the reduced cell current leads to a read failure due to insufficient BL swing (read disturb).
Fig. 3.5 Read operation disturbed by dummy read in the same row [3.3]
Timing control under CLK skew in dual-port designs is also discussed in [3.4]. Timing variation depends on the wire routing across the whole chip; if positive or negative skew occurs, a write or read disturb may cause a functional failure.
3.2.3 Read/Write Conflict of Dual-port
Fig. 3.6 shows a technique that solves the conflict problem by time sharing [3.5] [3.6]. Normally, simultaneous read and write operations are forbidden. In a conventional design, if a read and a write point to the same bit, the conflict consumes a large amount of power and a wider WL pulse is needed to finish the operation.
Fig. 3.6 2P-SRAM for image processing unit [3.6]
Fig. 3.7 Delay conflict waveform scheme [3.6]
In this time-sharing design, if a read and a write occur at the same time, one of the two WL pulses is turned on later, which reduces the conflict. Consequently, power and delay are smaller than in the conventional design. In addition to power consumption, cell stability is improved because the disturb time is shorter. Fig. 3.7 compares the operation waveforms of the conventional design and the work in [3.6].
3.3 A New 2R2W Bit-cell
3.3.1 Bit-cell Schematic and Layout View
A new cell with a shared-WBL structure and a half-select-disturb-free characteristic is proposed. Fig. 3.8 shows the new 2R2W multi-port bit-cell; the NMOS drawn in green is shared with the neighboring column. For area reduction, the cell uses single-ended write and a conventional 8T read buffer. Both area and power are lower than in a conventional dual-port SRAM. In write mode, Xsel and Ysel form a cross-point selection, which makes the structure suitable for bit-interleaving. With this design, no additional technique such as read and write-back is needed in a bit-interleaved architecture. The cell has 13 control lines: 5 are row-based and the others are column-based.
All transistors in the bit-cell are regular-Vt NMOS/PMOS devices. To improve read and write current, the channel-length dependence of the device (short-channel effect) is exploited: instead of the minimum length of 40nm, a longer channel is used to obtain more on-current (Ion). MN1, MN2, MN3, MN4, MN5, MN7, and MN8 therefore use a 70nm channel length instead of the minimum. For cell robustness, the cross-coupled inverters do not use the minimum length either.
Fig. 3.8 2R2W multi-port SRAM bit-cell
The 2R2W SRAM bit-cell layout uses M1 to M3. M1 and M3 are row-based and M2 is column-based. The metal layer organization and bit-cell layout schematic are shown in Fig. 3.9 and Fig. 3.10; each metal track is 90nm wide. In the TSMC 40G process, dummy poly is required between poly lines. For this reason, the cell layout is larger than it would be in a regular 65nm or other process. For area reduction, the dummy poly in this design is shared with the neighboring cell. Fig. 3.11 shows the 2R2W multi-port SRAM layout view; the cell size is 3.095um x 0.9um = 2.785um2.
Fig. 3.9 2R2W metal layer organization
Fig. 3.10 2R2W multi-port SRAM bit-cell layout schematic
Fig. 3.11 Two 2R2W multi-port SRAM bit-cell layout view
The cell adopts a thin-cell layout, and the left and right cells share the same WBL. The thin-cell layout has been popular in recent years and has minimal area consumption.
3.3.2 Share WBL Structure
For low-power design, a new shared-WBL structure is proposed. A single write operation is shown in Fig. 3.12: data from the shared WBL passes through two NMOS transistors and is written into Q. In write mode, the X and Y cut-off devices are turned off to improve write ability. Fig. 3.13 shows two ports accessing the same row at the same time. Because each side has one NMOS that isolates the other WBL signal, there is no disturb problem.
Fig. 3.12 One port writes with no disturb issues
Fig. 3.13 Two ports write the same row with no disturb issues
3.3.3 Bit-interleaving (8 to 1)
SRAM arrays become less robust as process technology scales down. Short-channel effects and soft errors are increasingly common. Soft errors are caused by radiation from energetic particles, thermal neutrons, random noise, or signal integrity problems. A soft error is a signal or datum that is wrong but does not imply a permanent fault or physical damage. Since contiguous bit-cells can be corrupted by a single radiation strike, the interleaving scheme has the benefit that the affected bits belong to different logical words.
A common 8-to-1 bit-interleaved SRAM array is illustrated in Fig. 3.14. In each row, the bit-cells of words A, B, C, D, ... and H are interleaved and share one word-line. During a read or write operation, the column multiplexers select the bit-lines of the accessed columns among words A, B, C, D, ... and H.
Fig. 3.14 8 to 1 Bit-interleaved SRAM array
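The benefit of 8-to-1 interleaving can be sketched as a simple index mapping. The Python fragment below is only an illustration: the function names are hypothetical, and the column order A0 B0 ... H0 A1 B1 ... H1 is an assumption about the array organization, not a detail confirmed by the design.

```python
INTERLEAVE = 8  # words A..H share one word-line (8-to-1 interleaving)


def physical_column(word_index: int, bit_index: int) -> int:
    """Column of one bit of one word within a row.

    Assumed column order: A0 B0 ... H0 A1 B1 ... H1 ..., so the bits of a
    single word sit 8 columns apart.
    """
    if not 0 <= word_index < INTERLEAVE:
        raise ValueError("word_index must be in 0..7")
    if bit_index < 0:
        raise ValueError("bit_index must be non-negative")
    return bit_index * INTERLEAVE + word_index


def words_hit(first_column: int, width: int) -> dict:
    """Count corrupted bits per word when `width` adjacent columns are upset."""
    hits = {}
    for col in range(first_column, first_column + width):
        word = col % INTERLEAVE
        hits[word] = hits.get(word, 0) + 1
    return hits
```

Under this assumed mapping, an upset that spans up to eight adjacent columns corrupts at most one bit in any logical word, which is the case that simple per-word error correction can handle.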
3.4 Write Assist Technology
3.4.1 Negative VVSS
Writing “1” is the worst case in this single-ended-write bit-cell. The two stacked NMOS transistors weaken the write-“1” ability because the passed voltage drops from full VDD to VDD - Vtn. For this reason, without an assist circuit for the write-“1” operation, the write fails at 0.7V.
The negative-VVSS circuit solves this problem; with it, the cell supply voltage can be lowered to 0.5V. It is important to size the capacitance and set a suitable negative level for the write-“1” operation: if the negative level goes beyond the trip point, it flips the data in half-selected cells on the same column. Fig. 3.15 shows the 8-to-1 bit-interleaving scheme with the negative-VVSS circuit output.
Fig. 3.15 Bit-interleaving select and Negative VVSS control
Fig. 3.16 shows the negative-VVSS control circuit, which uses two PMOS capacitors to generate a negative-level pulse. This circuit operates only when data “1” is written into the bit-cell; in all other cases VVSS is connected to ground. Fig. 3.17 shows the generated negative level at different supply voltages and process corners.
Fig. 3.16 Negative VVSS control circuit
Fig. 3.17 Negative level (a) Different supply voltage (b) Different corner
3.4.2 Inverter Feedback Loop Cut-off
In addition to the negative write-assist technique, feedback-path cut-off is also used in this 2R2W multi-port SRAM design. The NMOS and PMOS in the feedback path are turned off during a write operation; they are controlled separately by the Y_Cut and X_Cut signals. Fig. 3.18 shows the equivalent circuit when data is written from the shared WBL to Qc. A conventional PMOS cut-off is not suitable for a bit-interleaved structure because it floats the internal nodes of neighboring cells during a write operation (Fig. 3.19). In my design, an additional column-based NMOS selects which cell is cut off for writing. In this way, a more robust structure with no floating-node issue is achieved.
Fig. 3.18 Data write cut off scheme
Fig. 3.19 Conventional PMOS cut off structure
3.5 2R2W Dual-port 8Kb SRAM Design
3.5.1 2R2W Multi-port SRAM Schematic
Fig. 3.20 shows the schematic of the 2R2W 8Kb multi-port SRAM design. The chip contains two banks, each of 4Kb (64 bits x 64 bits). The write IO is placed at the top of each bank and the read IO at the bottom. Each bank has its own replica circuit, which controls the RWL/WWL pulse width. The detailed specification is shown in Table 3.1; for low power consumption, the chip operates at a VDD of 0.6V. Power and performance analysis is presented in the next section.
Fig. 3.20 Schematic of 8Kb 2R2W SRAM Chip
2R2W Multi-port SRAM, TSMC 40nm General Purpose
Memory size                    8Kbit (4K x 2)
Data width                     16 bit
Read address (each port)       8 bit
Write address (each port)      8 bit
Bit-interleaving               8 bit
Read / write ports             2 Read / 2 Write
Metal layers                   M1~M5 (1P9M)
Voltage range                  0.5V~1.2V
Cell size (Bank)               3.095um x 0.9um
Access time @ 0.6V, TT, 25C    5.93 ns
Cycle time @ 0.6V, TT, 25C     6.1 ns (160 MHz)
Read power                     0.0692uW/t (per bit-cell)
Write power                    0.115uW/t (per bit-cell)
Table 3.1 Summary of the 8kb 2R2W multi-port spec
3.5.2 Data Transmission Path
To eliminate the conflict problem of conventional multi-port operation, a new conflict-detect circuit is included in this design. A read operation does not destroy the data stored in the cell, whereas a write may flip it.
I therefore give read priority over write when the read and write addresses are the same: only the read operation is performed, and the write operation is stalled.
Because of this conflict detection, the WEN signal needs additional time for the detection. After the detection finishes, an internal WEN signal is sent to the replica circuit and triggers the next stage. Fig. 3.21 shows the data transmission path in this 2R2W multi-port chip.
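The read-priority rule can be expressed as a small behavioral model. The Python sketch below is an illustration only: `PortRequest` and `arbitrate_writes` are hypothetical names, and the model covers only the read-versus-write address comparison described in this section, not the timing of the actual detect circuit.

```python
from typing import List, NamedTuple


class PortRequest(NamedTuple):
    """One port request in a cycle (hypothetical behavioral model)."""
    enable: bool
    address: int


def arbitrate_writes(reads: List[PortRequest],
                     writes: List[PortRequest]) -> List[bool]:
    """Return the internal write-enable flag for each write port.

    A write is stalled when its address equals the address of any enabled
    read port, because read has priority over write.
    """
    active_read_addrs = {r.address for r in reads if r.enable}
    return [w.enable and w.address not in active_read_addrs for w in writes]
```

For example, with read port A enabled at address 5 and both write ports enabled at addresses 5 and 9, the model stalls the first write and lets the second proceed.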
Fig. 3.21 Data transmission path in 2R2W SRAM chip
3.5.3 New Techniques Adopted in the 2R2W SRAM Design
Fig. 3.22 shows the whole-chip layout view of the proposed 2R2W multi-port 8K SRAM. The proposed 2R2W multi-port SRAM is fabricated in the TSMC 40nm general purpose process. The bit-cell layout area is 3.095um x 1.8um = 5.571um2, and the whole chip size is 621.55um x 152.67um = 94,892um2 (about 0.095 mm2). The techniques adopted in this design are listed in Table 3.2.
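The area figures quoted in this chapter can be cross-checked with a few lines of Python. This is only an arithmetic check on the dimensions given in the text; the helper and variable names are assumptions made for the example.

```python
def area_um2(width_um: float, height_um: float) -> float:
    """Rectangle area in square micrometres."""
    return width_um * height_um


# Dimensions as quoted in Sections 3.3.1 and 3.5.3.
cell_area_sec_3_3 = area_um2(3.095, 0.9)
cell_area_sec_3_5 = area_um2(3.095, 1.8)
chip_area_um2 = area_um2(621.55, 152.67)

# 1 mm2 = 1e6 um2, so the chip area in mm2 is the um2 value times 1e-6.
chip_area_mm2 = chip_area_um2 * 1e-6
```

The chip product comes to roughly 9.5e4 um2, which is on the order of 0.1 mm2.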
Fig. 3.22 2R2W 8kb SRAM array layout view
No. Technology
1. Wide range operation 0.4~1.2V
2. Share WBL structure
3. Power gating sensing circuit
4. Negative VVSS for write assist
5. Transmission gate cut off write assist
6. Y_Cut NMOS for floating issue free
7. Bit-interleaving structure
Table 3.2 Characteristic of 2R2W 8kb SRAM
3.5.4 Test Pattern and Simulation Waveform
To test all functions and worst cases of this design, the test pattern in Fig. 3.23 is used. There are 7 CLK cycles, and each cycle tests a different pattern, such as 1W, 1R, 1W1R, etc.; the following cycle tests the conflict-detect circuit. The write data pattern targets first the nearest and then the furthest bit-cell, in order to find the critical path of the chip. Fig. 3.24 shows the post-layout simulation write waveform and Fig. 3.25 shows the post-layout simulation read waveform.
Fig. 3.23 Test pattern for 2R2W multi-port SRAM Chip
Fig. 3.24 Write test function for 2R2W multi-port SRAM chip
Fig. 3.25 Read test function for 2R2W multiport SRAM chip
3.6 Post-layout Simulation
3.6.1 Performance
Fig. 3.26 Read “0” speed with different voltage
Fig. 3.26 shows the read-“0” performance of this chip; read time dominates the cycle time in this 2R2W multi-port SRAM. The read buffer is the same as the conventional 8T SRAM read buffer, and its two stacked NMOS transistors reduce the read current. Unlike the write path, which has the two assist techniques discussed earlier, the read NMOS relies only on the longer channel length (short-channel effect) to gain more Ion. Fig. 3.27 and Fig. 3.28 show write-“0” and write-“1” performance at different supply voltages.
Fig. 3.27 Write “0” speed with different voltage
Fig. 3.28 Write “1” speed with different voltage
Fig. 3.29 (a) shows the delay of the write-address conflict-detect circuit; the delay rises significantly at low supply voltage. Fig. 3.29 (b) compares the read time with the worst-case write time (write operation time plus WEN conflict-detect delay). The read-“0” time is nearly double the worst-case write time. Therefore, when balancing write and read performance, particular care must be taken with the read-“0” operation.
Fig. 3.29 (a) Write conflict detect delay
Fig. 3.30 (b) Read “0” performance compare with write worst case
Fig. 3.31 shows read/write performance across process corners at a supply voltage of 0.6V. FF is the fastest corner and SS is the slowest. The SS corner shows the most pronounced change, while SF and FS performance falls between SS and FF. Fig. 3.32 shows the temperature effect on read/write operation; post-layout simulation shows that at 0.6V the temperature effect is smaller than the corner effect.
Fig. 3.31 Performance variation for different corner
Fig. 3.32 Write “1” speed with different voltage
3.6.2 Power Consumption
Fig. 3.34 shows the power dissipation of A/B port write and read operations. Unlike the conventional design, the shared-WBL structure cuts the write drivers to one half. Power is saved not only by the shared-WBL structure but also by the single-ended bit-cell structure. The new bit-cell structure supports bit-interleaving with no half-select issue when writing and reading on the same row. A shorter access time reduces active power consumption and allows a higher operating frequency. Compared with conventional 8T cell operation, power is reduced by more than 30% in active mode (Fig. 3.33).
Fig. 3.33 Active power consumption
Fig. 3.34 Power consumption with different voltage
To find the minimum-energy point, the power-delay product (P x T) is shown in Fig. 3.35. Although this cell can operate at a supply voltage as low as 0.5V, the energy there is not the lowest. Delay rises significantly at low supply voltage, which makes the energy at 0.5V higher than at 0.7V. The read power-delay product is larger than the write one because the longer read delay increases it. Operating at a supply voltage of 0.7V is the best choice for low energy dissipation.
Fig. 3.35 Power delay product with different voltage
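The minimum-energy search amounts to evaluating E = P x T at each supply voltage and taking the smallest value. The Python sketch below illustrates this; the function names are hypothetical, and the sample numbers are invented placeholders chosen only to show the shape of the trade-off. They are not simulation results from this chip.

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float, float]  # (vdd in V, power in W, delay in s)


def energy_per_op(power_w: float, delay_s: float) -> float:
    """Power-delay product, i.e. energy per operation in joules."""
    return power_w * delay_s


def min_energy_point(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Return (vdd, energy) of the sample with the lowest P x T."""
    best = min(samples, key=lambda s: energy_per_op(s[1], s[2]))
    return best[0], energy_per_op(best[1], best[2])


# Placeholder values for illustration only (NOT measured or simulated data):
# power falls with VDD, but delay grows faster at the lowest voltage.
placeholder_samples = [
    (0.5, 1.0e-5, 3.0e-8),
    (0.7, 3.0e-5, 7.0e-9),
    (0.9, 9.0e-5, 3.0e-9),
]
best_vdd, best_energy = min_energy_point(placeholder_samples)
```

With these placeholder numbers the minimum falls at the middle voltage, which mirrors the qualitative behavior described for Fig. 3.35: lowering VDD further saves power but costs more in delay than it saves.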
The following page shows the taped-out chip view and pin names. The test chip has 68 pins and a total area of 890um x 1090um. The chip was taped out through CIC in the TSMC 40nm general purpose process. The pads are stacked to reduce area, so adjacent pins are placed very close together.
Fig. 3.36 2R2W multi-port SRAM pin name
Fig. 3.37 2R2W multi-port 8K SRAM test chip
Pin assignment of Fig. 3.36 (chip outline 890 um x 1090 um):
Top side, pins 51 to 35: Q_A_SE, Q_A_R, Q_A_CKSEL, WADD_B_SN, WADD_B_R, WADD_B_D, VDDPST, VSSPST, AVDD, VDD, VSS, WADD_B_CKSEL, WADD_A_SN, WADD_A_R, WADD_A_D, WADD_A_CKSEL, RADD_B_SN
Left side, pins 52 to 68: Q_A_SI, Q_A_SN, Q_B_CKSEL, Q_OUT_A<0>, Q_OUT_A_<15>, QOUT_B_<0>, VDDPST, VSSPST, AVDD, VDD, VSS, QOUT_B_<15>, VSS, VDD, AVDD, AVDD, AVDD
Right side, pins 34 to 18: RADD_B_R, RADD_B_D, RADD_B_CKSEL, RADD_A_SN, RADD_A_SI, RADD_A_R, VDDPST, VSSPST, AVDD, VDD, VSS, RADD_A_D, RADD_A_CKSEL, DIN_B_SN, DIN_B_R, DIN_B_D, DIN_B_CKSEL
Bottom side, in figure order: ADD_SE, CEN, CLK, CLKSEL, CM, REN_A, VDDPST, VSSPST, AVDD, REN_B, TDELAY, WEN_A, WEN_B, DIN_A_CKSEL, DIN_A_D, DIN_A_R, DIN_A_SN
3.7 Summary
An 8K 2R2W 13T SRAM array design is presented in this chapter. The new cell structure supports bit-interleaving and has no disturb issues compared with the conventional dual-port SRAM cell. Wide-range operation from a VDD of 1.2V down to 0.4V gives the user more flexibility. Several low-power techniques, such as shared WBL, CLK gating, and power gating, are used in this design. The new shared-WBL structure reduces active power by about 60% compared with the conventional design.
The chip was taped out through CIC on Sep. 1. The cell operates over a wide supply voltage range (VDD=1.4V~0.5V) that covers all process and temperature variation. According to post-layout simulation, this 8K 2R2W multi-port SRAM operates at 475 MHz at VDD=0.9V, TT corner, 25C, and at 150 MHz at VDD=0.6V, TT corner, 25C. The power consumption of the read and write operations at VDD=0.9V, TT corner, 25C is 0.115 uW/t per bit-cell and 0.0692 uW/t per bit-cell, respectively. The VDDmin is 0.5V at the TT corner and 25C.
Chapter 4
Low-Power Register File Designs and New Bit-Cell Structure
4.1 Introduction
Operating at a lower VDDmin can reduce power consumption by orders of magnitude compared with conventional super-threshold operation. In recent years, the near-threshold region has emerged as a better region for energy reduction than the sub-threshold region. Although sub-threshold operation reduces power by many orders of magnitude relative to super-threshold operation, it requires a long operation time. For this reason, the long delay offsets the power saving, and the total energy consumption in the sub-threshold region is only average.
Nowadays there are applications, such as medical devices, portable devices, sensor networks, and wireless body area networks (WBAN), where performance is not tightly constrained. The register file plays a very important role in many processors and SoC applications. Unlike SRAM, a register file needs high bandwidth, high operating speed, and high robustness. To gain more bandwidth, a multi-port structure is added. Increasing the number of ports provides more bandwidth, but area and power overhead increase at the same time. These kinds