
CHAPTER 1 INTRODUCTION

1.3 Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 discusses the basic operation of the traditional 6T SRAM and its design issues, and compares non-pipeline and pipeline SRAM designs. It also covers reliability issues and some design methodologies. Chapter 3 presents the design of a “40nm 1.0Mb high performance 6T Pipeline SRAM with Three Step-Up Word-Line (TSUWL) and Bit-Line Under-Drive (BLUD) and Adaptive Voltage Detector (AVD)”. In that chapter, a variation-tolerant TSUWL is proposed to improve the read and write stability of the 6T SRAM, a variation-tolerant Bit-Line Under-Drive scheme is proposed to enhance SRAM stability in low voltage operation, and a variation-tolerant boost control scheme using an Adaptive Voltage Detector circuit is proposed to mitigate gate dielectric overstress.

The design issues, test flow and chip measurement results are also discussed in Chapter 3. Finally, Chapter 4 concludes the thesis.

Chapter 2

Overview of the design of 6T SRAM

2.1 Memory Family

Memory often occupies over 90% of the area of a current System on Chip (SOC). As a result, memory tends to dominate the overall performance of the system. Random Access Memory (RAM) is the usual way to store data in an integrated circuit (IC). The memory family can be divided into two basic categories: volatile memory and nonvolatile memory. RAM is usually volatile memory, which loses its stored data when the power is turned off. In contrast, nonvolatile memory keeps its stored data when the power is off.

RAM is widely used in most embedded systems because its access speed is higher than that of other memory families. Volatile memory can in turn be divided into two categories: Dynamic RAM (DRAM) and Static RAM (SRAM).

DRAM is denser than SRAM because a DRAM cell can be built from one transistor and one capacitor. Over the past decade, the conventional 6T SRAM has been the mainstream choice for cache memories in high performance systems because it operates faster than DRAM, reaching several hundred megahertz or even gigahertz. DRAM, by contrast, is currently the major storage device of most SOCs because of its higher density.

However, as process technology moves into the deep sub-micron regime, the design of the 6T SRAM faces several challenges. Process variation and leakage must be considered because they seriously degrade the read and write ability of the cell. To reduce the impact of process variation and leakage, we must understand the basic operation of the 6T SRAM and design efficient circuit techniques.

2.2 6T SRAM

2.2.1 Structure of 6T SRAM

Fig. 2-1 Traditional 6T SRAM cell (schematic labels: BL, BLB, WL, VDD, storage nodes Q and QB, transistors M1 to M6)

Fig. 2-1 shows the widely used traditional 6T SRAM cell. The cell has three control signals and six transistors. The control signals are the Word-Line (WL) and one pair of Bit-Lines (BL and BLB). The six transistors are two pass-gate n-type transistors (M3 and M6), two pull-up p-type transistors (M1 and M4) and two pull-down n-type transistors (M2 and M5), hence the name 6“T” (Transistor) cell. Two inverters (M1-M2 and M4-M5) are combined to form a cross-coupled latch. The latch can hold a logic “1” or “0” because the voltage transfer curves (VTC) of the cross-coupled inverters have only two stable points. There is in fact also one metastable point between them, but the cell does not easily remain there (shown in Fig. 2-2). The Bit-Line pair is connected to the source nodes of the pass-gate n-type transistors, which act as the ports for accessing the data stored in the cell. “Q” and “QB” are the storage nodes. The Word-Line signal enables the cell so that data can be passed in or out through the Bit-Line pair.

Fig. 2-2 Voltage Transfer Curve (VTC) of CMOS inverter [2-1]

2.2.2 Read Operation and Read Disturb of 6T SRAM

Before a read operation, all Bit-Line pairs of the 6T SRAM array are pre-charged to a high voltage (VDD) in standby mode. For the read, assume the cell stores “0” (GND) on node “Q” and “1” (VDD) on node “QB” (Fig. 2-3). Once the Word-Line signal goes high, the two pass-gate n-type transistors (M3 and M6) turn on to access the stored data. The side storing “0” discharges its Bit-Line toward ground (through M3 and M2). On the other side, the node storing “1” stays high because that node and BLB are both at the same high level. In this way the stored data is passed to the Bit-Lines. An additional peripheral read circuit is then needed to transfer the data from the Bit-Lines to the output pin.

Fig. 2-3 Read operation

In read mode, the cell has a thorny problem that can corrupt the stored data. As Fig. 2-3 shows, the pass-gate n-type transistor (M3) and the pull-down n-type transistor (M2) form a voltage divider. Assume node “Q” is “0” (GND). When the Word-Line signal goes high, node “Q” rises to a voltage above ground. This voltage is called the Read-Disturb voltage, and it seriously degrades the read stability. When the Read-Disturb voltage exceeds the trip voltage of the opposite inverter, the storage node holding “1” flips to “0”.
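The voltage-divider view of the Read-Disturb voltage can be illustrated numerically. The following is only a minimal sketch: it treats M3 and M2 as linear resistors, which real transistors are not, and the resistance values, the trip voltage and the function names are hypothetical choices made for illustration, not values from this thesis.

```python
def read_disturb_voltage(vdd, r_pass_gate, r_pull_down):
    """Voltage at the "0" node while WL is high and the Bit-Line sits at VDD.

    Assumes M3 (pass-gate) and M2 (pull-down) act as linear resistors in
    series between the Bit-Line and ground; this is a simplification.
    """
    return vdd * r_pull_down / (r_pass_gate + r_pull_down)


def read_is_destructive(v_read, v_trip):
    """The cell flips when the disturb voltage exceeds the inverter trip voltage."""
    return v_read > v_trip


VDD = 1.0      # supply voltage in volts (assumed)
V_TRIP = 0.45  # trip voltage of the opposite inverter in volts (assumed)

# Pass-gate weaker (higher resistance) than the pull-down: small disturb.
v_safe = read_disturb_voltage(VDD, r_pass_gate=30e3, r_pull_down=10e3)
# Pass-gate as strong as the pull-down: large disturb.
v_risky = read_disturb_voltage(VDD, r_pass_gate=10e3, r_pull_down=10e3)

print(f"{v_safe:.2f} V, destructive: {read_is_destructive(v_safe, V_TRIP)}")
print(f"{v_risky:.2f} V, destructive: {read_is_destructive(v_risky, V_TRIP)}")
```

Under these assumed numbers, weakening the pass-gate relative to the pull-down lowers the disturb voltage and keeps it below the trip voltage.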

Fig. 2-4 shows the stability ratios. At the 90 nm technology node, the cell switch point and the read down-level begin to overlap, which makes it harder to design a high-yield SRAM in advanced technology nodes.

Fig. 2-4 Stability ratio [2-2]

2.2.3 Hold Static Noise Margin and Read Static Noise Margin

The Static Noise Margin (SNM) is an important indicator for evaluating the read ability of the 6T SRAM. First, the butterfly curve is obtained by plotting the Voltage Transfer Curves (VTC) of the two inverters together, with the axes of one of them swapped. The SNM is then read from the butterfly curve. Fig. 2-5 shows the butterfly curve of the Hold Static Noise Margin (HSNM). The HSNM curve is obtained in standby mode, because in standby the 6T SRAM is exactly a cross-coupled latch. The curve has two “wings”. The largest square that fits inside each wing is found, and the side of the smaller of the two squares defines the HSNM.
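The largest-square definition can be sketched in code. The example below is an illustration only: the tanh-shaped inverter curve, its gain parameter and all function names are assumptions, not a device model or the procedure used in this thesis. It also assumes that both transfer curves are decreasing and that the latch is bistable.

```python
import math


def inverter_vtc(v_in, vdd=1.0, v_m=0.5, gain=10.0):
    """Idealised inverter transfer curve (assumed tanh model, not a device model)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (v_in - v_m)))


def largest_square_in_wing(f_a, f_b, vdd, steps=400):
    """Side of the largest square between the curves y = f_a(x) and x = f_b(y).

    The lower-left corner of the square is placed on x = f_b(y) and the
    square is grown until its upper-right corner reaches y = f_a(x).
    """
    best = 0.0
    for i in range(steps + 1):
        y0 = vdd * i / steps
        x0 = f_b(y0)
        if f_a(x0) - y0 <= 0.0:
            continue  # this corner does not lie inside the wing
        lo, hi = 0.0, vdd
        for _ in range(50):  # bisection on the side length
            mid = 0.5 * (lo + hi)
            if f_a(x0 + mid) - (y0 + mid) >= 0.0:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best


def static_noise_margin(f_left, f_right, vdd=1.0):
    """SNM is the side of the smaller of the two largest squares."""
    wing_1 = largest_square_in_wing(f_left, f_right, vdd)
    wing_2 = largest_square_in_wing(f_right, f_left, vdd)
    return min(wing_1, wing_2)


hold = lambda v: inverter_vtc(v, gain=10.0)
snm = static_noise_margin(hold, hold)
print(f"SNM = {snm:.3f} V")
```

With two identical inverters the two wings are equal. Passing two different transfer curves, for example curves taken with the Word-Line high, would give a read-mode margin under the same assumptions.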

Fig. 2-5 Butterfly curve of HSNM

Fig. 2-6 shows the difference between the butterfly curves of the Hold Static Noise Margin (HSNM) and the Read Static Noise Margin (RSNM). In a read operation, the Word-Line signal goes high and the two pass-gate n-type transistors turn on to pass the storage node data. The butterfly curve in standby mode is wider than in read mode, so the largest square in either wing is larger for the HSNM (standby mode) than for the RSNM (read mode). The minimum RSNM can also be defined directly as the voltage difference between the trip voltage of the inverter and the Read-Disturb voltage. If the RSNM of either wing falls to zero or below, the read becomes destructive and a read failure occurs.

Fig. 2-6 The HSNM and RSNM butterfly curve (axes VL and VR, each from 0 to VDD)

2.2.4 Write Operation and Half-Selected Read Disturb of 6T SRAM

Before the Word-Line signal goes high, the write data must be ready on the Bit-Line pair. Fig. 2-7 shows a write example in which storage node “Q” is “0”, storage node “QB” is “1”, and we want to write “0” to node “QB”. In this case, the Bit-Line on the “QB” side is driven to “0”. When the Word-Line signal goes high, node “QB” is discharged toward ground through the pass-gate n-type transistor (M6), which has to overpower the pull-up p-type transistor (M4). The write succeeds if node “QB” falls far enough, but a write failure is still possible: if node “QB” does not fall below the trip voltage of the opposite inverter, the cell does not flip and the write fails.

Fig. 2-7 Write operation

However, the most common problem in a 6T SRAM array is the Half-Selected Read Disturb (Fig. 2-8). Fig. 2-9 shows an example during a write. When WL1 goes high, Column 1 (COL1) is selected for the write operation while Column 0 (COL0) is not selected, so the COL0 cell on WL1 experiences a read-like condition. Under this condition the COL0 cell can suffer the Half-Selected Read Disturb. This disturb is unwanted because it can corrupt the data stored in cells that are not being written.

Fig. 2-8 Half-Selected Read Disturb Voltage of 6T SRAM cell [2-2]

Fig. 2-9 Half-Selected Disturb of the 6T SRAM array (For Write operation) [2-2]

2.2.5 Write Static Noise Margin and Write Margin and AC Write Margin

Fig. 2-10 shows the butterfly curve of a write operation. In a successful write, the butterfly curve must open up so that the two curves have only one intersection point. The curve resembles a combination of the RSNM and HSNM curves, and the Write Static Noise Margin (WSNM) is found as the largest square that fits between the curves, as for the RSNM and HSNM. A write can also fail: if the WSNM falls to zero or below, or if the curves have more than one intersection point, the write operation fails.

The Write Margin (WM) [2-3, 2-4] is another important indicator of write performance. Before the write operation, both BL and BLB are set high (logic one). During the write operation, the Bit-Line voltage on the side of the cell storing “1” is swept from the high level down to ground. The Bit-Line voltage at the moment the storage node “1” flips is defined as the Write Margin (WM) (Fig. 2-11).

The AC Write Margin (ACWM) is measured in the same way but with a fixed Word-Line pulse width: the Bit-Line voltage on the side storing “1” is lowered from the high level toward ground, and the Bit-Line voltage at which the storage node “1” flips is defined as the ACWM.
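The sweep that defines the Write Margin can be written as a short search loop. This is a hedged sketch: `cell_flips` is a hypothetical callback standing in for a circuit simulation of the cell, and the step size and the toy flip threshold of 0.30 V are assumptions used only to exercise the loop.

```python
def write_margin(cell_flips, vdd=1.0, step=0.005):
    """Sweep the Bit-Line on the "1" side from VDD down to ground.

    `cell_flips(v_bl)` is a hypothetical stand-in for a circuit simulation;
    it returns True when the cell flips at Bit-Line voltage v_bl.
    Returns the first Bit-Line voltage at which the cell flips, or None if
    the cell never flips (a write failure).
    """
    n_steps = int(round(vdd / step))
    for i in range(n_steps + 1):
        v_bl = vdd - i * step
        if cell_flips(v_bl):
            return v_bl
    return None


# Toy stand-in: the cell flips once the Bit-Line is at or below 0.30 V.
toy_cell = lambda v_bl: v_bl <= 0.30 + 1e-9
wm = write_margin(toy_cell)
print(f"WM = {wm:.3f} V")
```

For an ACWM estimate, the same loop could be reused with a callback that simulates the cell under one fixed Word-Line pulse width.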

Fig. 2-10 Butterfly curve of WSNM (axes VL and VR, each from 0 to VDD)

Fig. 2-11 The definition of the Write Margin (WM) [2-3]

2.2.6 The Size and Layout of 6T SRAM

To maintain read stability, VREAD (the Read-Disturb voltage) must be small, so the pass-gate n-type transistor should be weaker than the pull-down n-type transistor (Fig. 2-12). To maintain write ability, the pass-gate n-type transistor should be stronger than the pull-up p-type transistor (Fig. 2-12). In addition, to keep the cell stable in standby mode, the pull-down n-type transistor must not be too much stronger than the pull-up p-type transistor (Fig. 2-12). The size of each transistor in the 6T SRAM cell is therefore chosen specifically to maximize read stability, hold stability and write ability.
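The three sizing constraints can be collected into a simple check. The sketch below uses transistor width as a rough proxy for strength, assuming equal channel lengths and ignoring the mobility difference between n-type and p-type devices. The names "cell ratio" and "pull-up ratio" are the commonly used terms for these ratios, while the example widths and the `max_pd_over_pu` limit are hypothetical values, not data from this thesis.

```python
def check_cell_sizing(w_pull_down, w_pass_gate, w_pull_up, max_pd_over_pu=4.0):
    """Check the strength orderings of a 6T cell using widths only.

    Width is only a rough proxy for drive strength (equal lengths assumed,
    n-type versus p-type mobility ignored). max_pd_over_pu is an assumed
    limit for "pull-down not too much stronger than pull-up".
    """
    return {
        "cell_ratio": w_pull_down / w_pass_gate,    # pull-down over pass-gate
        "pull_up_ratio": w_pull_up / w_pass_gate,   # pull-up over pass-gate
        "read_stable": w_pass_gate < w_pull_down,   # pass-gate weaker than pull-down
        "writable": w_pass_gate > w_pull_up,        # pass-gate stronger than pull-up
        "hold_balanced": w_pull_down / w_pull_up <= max_pd_over_pu,
    }


# Example widths in nanometres (assumed).
sizing = check_cell_sizing(w_pull_down=200.0, w_pass_gate=140.0, w_pull_up=90.0)
print(sizing)
```

A check like this only captures the ordering of the strengths. Choosing the actual ratios requires simulation of the noise margins discussed above.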

Starting around the 90nm node [2-2], the Thin-Cell layout of the 6T SRAM cell became the mainstream because it reduces BL loading, which improves performance and noise immunity. Fig. 2-13 shows a 6T SRAM layout that uses poly-silicon running in a single direction to improve manufacturability and yield.

Fig. 2-12 Transistor size ratio in 6T SRAM [2-2]

Fig. 2-13 The layout of the 6T SRAM in advanced process technology

2.3 SRAM Array Architecture

2.3.1 Memory Array

In a current System on Chip (SOC), memory is always built into the integrated circuit (IC) as the storage medium. The cells are usually arranged in an array to improve area efficiency and ease of access. In a traditional SRAM architecture, all the SRAM cells are grouped together, and peripheral circuits such as the Row/Column decoders and Sense Amplifiers (SA) are placed next to the array to control read and write operations (Fig. 2-14). The Word-Line (WL) runs in the row direction so that it controls the pass-gates of all cells in a row. The Bit-Line (BL) runs in the column direction and passes data into or out of the SRAM cell. To select one cell for a read or write, both its Word-Line (WL) and its Bit-Line (BL) must be activated. Once the cell of interest is selected, the write enable signal determines whether a read or a write is performed. However, as the number of cells in a row or column increases, the total capacitance and resistance on the Word-Line (WL) and Bit-Line (BL) increase, which lengthens the transient response. The Hierarchical Word-Line and Hierarchical Bit-Line techniques address this. They reduce not only the Word-Line and Bit-Line loading but also the charge injected into the SRAM cell and the transient response time, and thus improve performance, power and noise margin [2-2].
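The row and column selection described above can be sketched as an address split followed by one-hot decoding. This is an illustration under assumptions: the mapping of upper address bits to the row and lower bits to the column, and the function names, are hypothetical, and real decoders may map the bits differently.

```python
def decode_address(address, row_bits, col_bits):
    """Split a flat address into (row, column) for a 2**row_bits by 2**col_bits array.

    Assumes the upper bits select the Word-Line (row) and the lower bits
    select the Bit-Line pair (column).
    """
    if not 0 <= address < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    row = address >> col_bits
    col = address & ((1 << col_bits) - 1)
    return row, col


def one_hot(index, width):
    """Decoder output in which exactly one line is active."""
    return [1 if i == index else 0 for i in range(width)]


# An 8-row by 4-column array addressed with 5 bits.
row, col = decode_address(0b10110, row_bits=3, col_bits=2)
word_lines = one_hot(row, 1 << 3)   # only the selected Word-Line is high
column_select = one_hot(col, 1 << 2)
print(row, col, word_lines, column_select)
```

Only the cell at the intersection of the active Word-Line and the selected column is accessed, which matches the selection rule in the text.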

Fig. 2-14 Array architecture of a 2^N x 2^M memory array [2-1]

Fig. 2-15 SRAM critical path [2-5]

2.3.2 Differential Sensing and Large Signal Sensing Scheme

To get the stored data onto the output pin, an additional peripheral read circuit is needed to sense the data on the Bit-Lines. Sensing schemes can be divided into two basic categories: differential sensing and large signal sensing. Differential sensing, the conventional scheme, is also called small signal sensing. Its basic idea is to sense the voltage difference between BL and BLB and amplify it with a differential sense amplifier into a full logic “0” or “1”.

A conventional differential sense amplifier combines a cross-coupled latch (Q1, Q2, Q3 and Q4) with two access transistors (Q5 and Q6) (Fig. 2-16). It resembles a 6T SRAM cell, but its sizing and design are different. During a read operation, after the Word-Line signal is enabled and the storage node data is passed to the Bit-Lines, one of the Bit-Lines begins to go low. Once the voltage difference between BL and BLB is large enough for the differential sense amplifier to resolve, the sense amplifier enable (SAE) signal goes high and activates the sense amplifier. The Bit-Line pair is then fully separated, giving a full logic “0” or “1”. The differential sensing scheme is usually combined with a long Bit-Line structure, in which many cells (usually more than a hundred) share one Bit-Line. In a long Bit-Line structure the Bit-Line loading is very heavy, so the time needed to pass the storage node data to the Bit-Line degrades seriously. The long Bit-Line structure must therefore be combined with the differential sensing scheme, which needs only a small Bit-Line swing, to improve the read time.

Fig. 2-16 Differential sense amplifier [2-1]

As process technology moves into the deep sub-micron regime, leakage becomes critical because, in a long Bit-Line structure, the leakage charge flowing into a cell becomes very large. Even when the Word-Line is off, a standby cell can be flipped by a large leakage current, causing a retention failure. The leakage current can also cause the differential sensing scheme to fail to sense the Bit-Line signal. For this reason the Bit-Line length must be reduced, leading to the short Bit-Line structure. In a short Bit-Line structure, the most common lengths are about 8, 16 or 32 cells per Bit-Line, which reduces both the total leakage current and the read time. The short Bit-Line structure is combined with the large signal sensing scheme, which has been used to sense the data on the local Bit-Line [2-10, 2-11, 2-12 and 2-13]. This scheme most commonly uses a single transistor or inverter to detect the signal on the Bit-Line. When the Bit-Line voltage falls below the switching threshold of the sensing transistor or sensing inverter, the Bit-Line data is passed on, for example to the Global Bit-Line and the next-stage circuit.

In addition, the short Bit-Line structure reduces the Bit-Line loading, so the storage node data is passed to the Bit-Line faster. Because large signal sensing is single-ended, its leakage is half that of differential sensing, where both BL and BLB must be connected to the differential sense amplifier. The large signal scheme is also easier to implement than the differential scheme, whose gain is harder to optimize. The disadvantage of the short Bit-Line structure with large signal sensing is area overhead: because each column needs many sensing transistors or sensing inverters, the area overhead is larger than in the long Bit-Line structure. Fig. 2-17 and Fig. 2-18 show typical large signal sensing schemes.
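The trade-off between Bit-Line length and required swing can be illustrated with the first-order relation t = C * dV / I. The numbers below are assumed placeholders (per-cell capacitance, fixed capacitance, cell read current, required swings), not measurements from this thesis, and the constant-current model is a simplification.

```python
def bitline_discharge_time(n_cells, delta_v, i_cell,
                           c_per_cell=0.2e-15, c_fixed=1e-15):
    """First-order time for one cell to pull the Bit-Line down by delta_v.

    Uses t = C * dV / I with a constant cell current. The per-cell and
    fixed capacitances are assumed placeholder values.
    """
    c_bitline = n_cells * c_per_cell + c_fixed
    return c_bitline * delta_v / i_cell


I_CELL = 20e-6  # assumed cell read current in amperes
VDD = 1.0       # assumed supply voltage in volts

# Long Bit-Line with differential sensing: only a small swing is needed.
t_long_small = bitline_discharge_time(256, delta_v=0.1, i_cell=I_CELL)
# Long Bit-Line without a sense amplifier would need a large swing.
t_long_large = bitline_discharge_time(256, delta_v=0.5 * VDD, i_cell=I_CELL)
# Short Bit-Line with large signal sensing.
t_short_large = bitline_discharge_time(16, delta_v=0.5 * VDD, i_cell=I_CELL)

for name, t in [("long, small swing", t_long_small),
                ("long, large swing", t_long_large),
                ("short, large swing", t_short_large)]:
    print(f"{name}: {t * 1e12:.0f} ps")
```

Under these assumptions, a long Bit-Line is only practical with a small sensing swing, while a short Bit-Line can afford a large swing, which is consistent with the pairing of structures and sensing schemes described above.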

Fig. 2-17 Large signal sensing scheme of IBM Cell processor [2-12, 2-13]

Fig. 2-18 Large signal sensing scheme [2-11]

2.3.3 Non-Pipeline SRAM Design

Fig. 2-19 shows the operation diagram of a Non-Pipeline SRAM. At the rising edge of the clock, the input address is launched and decoded. Next, the Finite State Machine (FSM) asserts the WLE signal to enable the Word-Line for a read or write operation. While WLE is asserted, the pre-charge circuit is turned off. In a Non-Pipeline SRAM design, the WLE signal can be regarded as an internal clock. For a read operation, the Local Bit-Line (LBL) and Global Bit-Line (GBL) receive the read data from the storage node of the 6T SRAM cell. A replica control circuit then performs a dummy read or write operation to make the WLE signal go low, which turns off the Word-Line and enables the Global Bit-Line (GBL)/Output latch. When the WLE signal goes low, the pre-charge circuit is turned on again. This completes the read procedure in a Non-Pipeline SRAM.

Fig. 2-19 Non-Pipeline SRAM operation diagram

2.3.4 Pipeline SRAM Design

Ultra high performance systems use Pipeline SRAM to enhance performance. Fig. 2-20 [2-6] shows that the best achievable cycle time is 11 Fan-Out-Of-4 (FO4) delays, where one FO4 is the delay of an inverter driving four identical copies of itself. Only 5 to 8 FO4 are available for logic function and signal distribution, because the output delay of L2 and the setup time of L1 must be subtracted from the cycle. Hence, balancing the operating time in every cycle is an important issue in Pipeline SRAM design.
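The cycle budget can be checked with simple arithmetic. In the sketch below, the 11 FO4 cycle and the 5 to 8 FO4 usable range come from the text, while the latch overheads of 3 and 6 FO4 are inferred from that range and the 20 ps FO4 delay is an assumed value used only to convert the cycle into a frequency.

```python
def usable_fo4(cycle_fo4, latch_overhead_fo4):
    """FO4 delays left for function and distribution after latch overhead."""
    usable = cycle_fo4 - latch_overhead_fo4
    if usable <= 0:
        raise ValueError("latch overhead consumes the whole cycle")
    return usable


def cycle_frequency_ghz(cycle_fo4, fo4_delay_ps):
    """Clock frequency in GHz for a cycle of cycle_fo4 FO4 delays."""
    return 1000.0 / (cycle_fo4 * fo4_delay_ps)


CYCLE_FO4 = 11  # cycle time quoted in the text

# Overheads inferred from the 5 to 8 FO4 range in the text.
best_case = usable_fo4(CYCLE_FO4, latch_overhead_fo4=3)
worst_case = usable_fo4(CYCLE_FO4, latch_overhead_fo4=6)

# Assumed FO4 delay of 20 ps, for illustration only.
freq = cycle_frequency_ghz(CYCLE_FO4, fo4_delay_ps=20.0)
print(best_case, worst_case, f"{freq:.2f} GHz")
```

The small usable budget per cycle is why each pipeline stage of the SRAM has to be balanced carefully.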


Fig. 2-20 The design of 11FO4 cycle time between cycle boundary [2-6]

Fig. 2-21 [2-6] shows the macro of the IBM fully pipelined embedded SRAM in the streaming processor of the Cell processor. The SRAM operates from the 3rd cycle to the 5th cycle. In the 3rd cycle, one of the Local Word-Line (LWL) signals is decoded and latched in the Word-Line driver. In the 4th cycle, the Local Word-Line (LWL) is enabled to perform the read or write operation, and a write operation finishes within this cycle. In the read operation, the Bit-Line (BL) data utilizes the sense
