Department of Electronics Engineering & Institute of Electronics

Low VDDMIN 4R4W Multi-Thread Register File Design and Implementation in 40nm CMOS Process

Student: Hon-Jarn Lin (林弘璋)

Advisors: Prof. Wei Hwang (黃威), Prof. Ching-Te Chuang (莊景德)

National Chiao Tung University

Department of Electronics Engineering & Institute of Electronics

Master's Thesis

A Thesis

Submitted to the Department of Electronics Engineering & Institute of Electronics, College of Electrical Engineering and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Electronics Engineering

June 2012

Hsinchu, Taiwan

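Abstract (translated from the Chinese)

With the increasingly widespread use of portable electronic products such as mobile phones, notebook computers, video communication devices, and many other computer products, a low-power memory that supports parallel processing in SoC chips has become a very important topic. This thesis addresses two topics: the first is a 2R2W 8Kb SRAM, and the second is a 4R4W 2Kb multi-thread register file. Both are implemented in the TSMC 40nm process. A conventional single-port SRAM cannot deliver sufficient efficiency for high bandwidth and high performance, so we propose a 2R2W multi-port SRAM. The design not only resolves the disturbance caused by simultaneous row selection but also supports a bit-interleaving structure. Other techniques, such as write modules shared between adjacent cells, CLK gating, and gating of the voltage detector (sense amplifier), save additional energy. An 8Kb test chip was designed and implemented in the TSMC 40nm process; post-layout simulation shows operation at 475 MHz at 0.9 V. The second design is a 4R4W multi-thread register file. Its new techniques include two writes/reads per cycle, support for four concurrent threads, data slot switching, and a shared read module. The design supports a wide supply-voltage range, from 0.4 V to 1.2 V, which gives users more flexibility. Its low-energy techniques include eliminating dummy read operations, halving the read circuitry, and holding the word-line at a high level. With these techniques, dynamic and static energy consumption are reduced substantially, by 50% and 25% respectively. Post-layout simulation shows operation at 238 MHz at 0.9 V.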

ABSTRACT

Portable mobile devices (PMDs) such as cell phones, notebooks, video products, and many other types of computers are in wide use in today's market. Energy efficiency, low power consumption, and parallel memory design have therefore become crucial in system-on-chip (SoC) design. This thesis presents two topics. The first is a low-power 2R2W 8Kb multi-port SRAM design; the second is a low-power 4R4W 2Kb multi-thread register file design. Both are implemented in TSMC 40nm CMOS technology. A conventional single-port SRAM is not efficient enough to deliver high bandwidth and high performance. We therefore propose a new 2R2W multi-port bit-cell structure that not only eliminates the half-select disturb problem but also supports a bit-interleaving structure. Low-power techniques such as a shared WBL structure, CLK gating, and SA power gating are included. An 8Kb test chip is designed and implemented in the TSMC 40nm general-purpose CMOS process. Post-layout simulation results demonstrate an operating frequency of 475 MHz at 0.9 V. The second work is a 4R4W multi-thread register file design, which proposes double pumping, four threads, data slot switch control, and a shared RBL structure. It operates over a wide supply-voltage range, from 0.4 V to 1.2 V, which gives designers more flexibility. For low power, it eliminates dummy read operations, reduces the RBLs to 1/2, and keeps RWL at VVSS. In this work, active power reduction is more than 50% and standby power reduction is less than 25%. Post-layout simulation results demonstrate an operating frequency of 238 MHz at 0.9 V.

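Acknowledgments

Many people deserve my thanks for the successful completion of this thesis. First, I thank my two advisors, Prof. Wei Hwang and Prof. Ching-Te Chuang, for their many valuable suggestions and generous research resources, which allowed me to devote myself fully to my research without worry. With their long experience, they not only guided us through research problems but also often taught us lessons about life, from which we benefited greatly every time. Next, I thank the senior students who worked alongside me: 王道平, 張銘宏, 黃柏蒼, and 楊皓義. The road of research is rough, and even more so for a student who transferred from another group. I thank them for sparing no effort in teaching me and for offering ideas at the right moments, which helped me get through one obstacle after another. I thank my classmates in LPMD, who added a great deal of color to an otherwise dull research life, through late nights and good times alike. I am also sincerely grateful to everyone in the Digital VLSI Lab for supporting one another and growing together along the way. Finally, my sincere thanks go to the junior schoolmates from engineering (工科) who so often gave me strong support and listened to my complaints.

Last, I thank my family for their encouragement and support. They were the greatest driving force behind this thesis and my strongest backing, allowing me to concentrate fully on my research. I offer them my boundless gratitude.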

Contents

Chapter 1 Introduction

1.1 Background ... 1

1.2 Challenges ... 2

1.3 Motivation ... 3

1.4 Thesis Organization ... 4

Chapter 2 Previous Low-Power SRAM Designs

2.1 Introduction ... 5

2.2 Power Dissipation ... 6

2.2.1 Dynamic Power ... 6

2.2.2 Leakage Power ... 7

2.2.3 Short Circuit Power ... 12

2.3 SRAM Bit-cell Stability and Write-ability ... 12

2.3.1 Static Noise Margin (SNM) ... 13

2.3.2 Write Margin (WM) ... 14

2.3.3 Impact of Variation on SRAM in Low Voltage Differential 6T SRAM ... 15

2.4 Previous Read/Write Assist Peripheral Circuit ... 19

2.4.1 Keeper Tracking Circuit Assist for SRAM Design ... 19

2.4.2 Charge Pump Circuit Design ... 22

2.5 Previous Low Voltage SRAM Design ... 25

2.5.1 SRAM Bit-cell ... 25

2.6.1 Register File Bit-cell ... 32

2.7 Summary ... 35

Chapter 3 Low Power 2R2W Multi-Port 8Kb SRAM Design

3.1 Introduction ... 37

3.2 Conventional Dual-Port 8T SRAM ... 38

3.2.1 Two Kinds of Access Mode in DP-SRAM ... 38

3.2.2 Write and Read Disturb Issue in 8T DP-SRAM ... 39

3.2.3 Read/Write Conflict of Dual-port ... 40

3.3 A New 2R2W Bit-cell ... 42

3.3.1 Bit-cell Schematic and Layout View ... 42

3.3.2 Share WBL Structure ... 44

3.3.3 Bit-interleaving (8 to 1) ... 45

3.4 Write Assist Technology ... 46

3.4.1 Negative VVSS ... 46

3.4.2 Inverter Feedback Loop Cut-off ... 48

3.5 2R2W Dual-port 8Kb SRAM Design ... 49

3.5.1 2R2W Multi-port SRAM Schematic ... 49

3.5.2 Data Transmission Path ... 50

3.5.3 New Technology Adaptive in 2R2W SRAM Design ... 51

3.5.4 Test Pattern and Simulation Waveform ... 52

3.6 Post-layout Simulation ... 54

3.6.1 Performance ... 54

3.6.2 Power Consumption ... 58

Chapter 4 Low-Power Register File Designs and New Bit-Cell Structure

4.1 Introduction ... 62

4.2 Previous Low Power Register-File Design ... 63

4.2.1 Power Reduction ... 63

4.2.2 Banked Register File Architecture ... 64

4.2.3 Tri-state Register File Design ... 67

4.3 Multi-thread Register File Design ... 68

4.3.1 Multi-thread Application Design ... 69

4.3.2 The Parity Protected Multi-thread Register File ... 70

4.3.3 Thread Switching ... 72

4.4 Timing Sharing Technology ... 74

4.4.1 Previous Work ... 74

4.4.2 Conflict Issues ... 76

4.5 This Work ... 76

4.5.1 Bank Structure ... 76

4.5.2 Read/Write Slot Controller ... 82

4.5.3 Switch Data Circuit ... 84

4.5.4 Thread Switch Control ... 85

4.5.5 Double Pump Operation ... 87

4.6 Summary ... 89

Chapter 5 Low VDDMIN Multi-thread 4R4W Register File Design in TSMC 40nm CMOS Process

5.1 Introduction ... 90

5.2 4R4W Register File Structure ... 90

5.2.1 2R2W Register File Unit Cell & Layout View ... 90

5.2.2 Share WBL Structure ... 93

5.2.3 Share RBL Structure ... 93

5.3 Register File Assist Technology ... 95

5.3.1 Negative VVSS Design ... 95

5.3.2 Single-end Write Cut-off & Y_Cut for Floating Issues Free ... 96

5.4 Implementation of Multi-thread 4R4W RF ... 97

5.4.1 4R4W Register File Floor Plan ... 97

5.4.2 Design Implementation & Test-flow of Proposed 4R4W Register File ... 99

5.5 Post-layout Simulation Result ... 102

5.5.1 Performance ... 103

5.5.2 Power Consumption ... 107

5.5.3 Iso-Area SNM Simulation and Comparison ... 110

5.6 Summary ... 112

Chapter 6 Conclusion & Future Work

6.1 Conclusions ... 114

List of Figures

Fig. 1.1 Voltage scaling and energy dissipation [1.4] ... 1

Fig. 1.2 Conventional dual-port 8T SRAM bit-cell ... 2

Fig. 2.1 Circuit diagram of inverter ... 6

Fig. 2.2 Leakage current of deep-submicron transistors ... 7

Fig. 2.3 Gate leakage current paths in a NMOS transistor ... 8

Fig. 2.4 Leakage current of deep-submicron transistors ... 9

Fig. 2.5 Conventional silicon dioxide gate dielectric structure compared to a potential high-k dielectric structure ... 10

Fig. 2.6 Fin-FET structure ... 11

Fig. 2.7 Standard setup for finding Hold SNM ... 13

Fig. 2.8 Standard setup for finding Read SNM ... 14

Fig. 2.9 Write margin of a SRAM bit-cell ... 14

Fig. 2.10 The read-disturb of 6T SRAM in different process [2.17] ... 15

Fig. 2.11 The β ratio of 6T SRAM bit-cell ... 16

Fig. 2.12 6T SRAM SNM loss at low voltages [2.18] ... 17

Fig. 2.13 6T SRAM write margin [2.18] ... 18

Fig. 2.14 Read-current distribution [2.18] ... 18

Fig. 2.15 IREAD is less than Ileakage from un-accessed cells at low voltage [2.19] ... 19

Fig. 2.16 A conditional keeper with INV chain [2.20] ... 20

Fig. 2.17 A current mirror keeper [2.22] ... 20

Fig. 2.18 Cross couple keeper with INV chain (left) ... 21

Fig. 2.20 Replica bias generator for RSK circuit ... 22

Fig. 2.21 The variation of the rates across different process corners ... 22

Fig. 2.22 8T SRAM cell with on-die RWL and WWL boosting [2.23] ... 23

Fig. 2.23 2SLS can effectively improve the boost ratio [2.23] ... 23

Fig. 2.24 Different boost frequency effect [2.23] ... 23

Fig. 2.25 2-step level-shifter reduce ILOAD [2.23] ... 24

Fig. 2.26 Dickson charge pump circuits ... 24

Fig. 2.27 Ker proposed CP circuit and waveform with four pumping stages ... 25

Fig. 2.28 (a) The standard 2-port 8T SRAM bit-cell with non-isolated read-port [2.28] (b) An isolated read-port 7T SRAM bit-cell [2.28] ... 26

Fig. 2.29 Schematic diagram of the DCO 8T SRAM cell with dual VDD [2.29] ... 27

Fig. 2.30 DCO 8T cell shows 2x lower leakage at the same read current at 0.9V comparing to SCO 8T cell [2.29] ... 27

Fig. 2.31 Comparison of leakage components between DCO 8T cell and SCO 8T cell at 0.9V (Q = 1) [2.29] ... 27

Fig. 2.32 (a) Schematic diagram of the proposed 2-port 6T SRAM bit-cell with shared read and write assist transistors per word [2.30] ... 28

(b) The VTC and SNM obtained from butterfly curve for the standard ST, 7T and proposed 6T SRAM bit-cells [2.30] ... 28

Fig. 2.33 A 32-bit word organization of the proposed 2-port 6T SRAM bit-cell to eliminate simultaneous read/write disturbance problem [2.30] ... 29

Fig. 2.34 Simultaneous R/W access issues in word-oriented array [2.30] ... 29

Fig. 2.35 Schematic of the proposed Z8T SRAM [2.31] ... 30

Fig. 2.36 Schematic of the proposed 10T SRAM [2.33] ... 31

Fig. 2.37 Schematic of the proposed 9T SRAM [2.34] ... 31

Fig. 2.39 Cell number on one bit-line is small [2.36] ... 33

Fig. 2.40 Two bit-cells share one bit-line [2.36] ... 33

Fig. 2.41 Cell number on one bit-line is small [2.37] ... 33

Fig. 2.42 IRF integrated memory cell circuit diagram [2.39] ... 34

Fig. 2.43 Standard 4R2W split to 2 copies of a 2R1W cell [2.41] ... 35

Fig. 3.1 Conventional 8T dual-port SRAM cell ... 38

Fig. 3.2 Different-row access mode [3.2] ... 38

Fig. 3.3 Access in the same row [3.2] ... 39

Fig. 3.4 Write operation disturbed by dummy read in the same row [3.3] ... 39

Fig. 3.5 Read operation disturbed by dummy read in the same row [3.3] ... 40

Fig. 3.6 2P-SRAM for image processing unit [3.6] ... 41

Fig. 3.7 Delay conflict waveform scheme [3.6]... 41

Fig. 3.8 2R2W multi-port SRAM bit-cell ... 42

Fig. 3.9 2R2W metal layer organization ... 43

Fig. 3.10 2R2W multi-port SRAM bit-cell layout schematic ... 43

Fig. 3.11 Two 2R2W multi-port SRAM bit-cell layout view ... 44

Fig. 3.12 One port writes with no disturb issues ... 44

Fig. 3.13 Two write the same row with no disturb issues ... 45

Fig. 3.14 8 to 1 Bit-interleaved SRAM array... 46

Fig. 3.15 Bit-interleaving select and Negative VVSS control ... 46

Fig. 3.16 Negative VVSS control circuit ... 47

Fig. 3.17 Negative level (a) Different supply voltage (b) Different corner ... 47

Fig. 3.19 Conventional PMOS cut off structure ... 48

Fig. 3.20 Schematic of 8kb 2R2W SRAM Chip ... 49

Fig. 3.21 Data transmission path in 2R2W SRAM chip ... 51

Fig. 3.22 2R2W 8kb SRAM array layout view ... 52

Fig. 3.23 Test pattern for 2R2W multi-port SRAM Chip ... 53

Fig. 3.24 Write test function for 2R2W multi-port SRAM chip ... 53

Fig. 3.25 Read test function for 2R2W multiport SRAM chip ... 54

Fig. 3.26 Read “0” speed with different voltage ... 54

Fig. 3.27 Write “0” speed with different voltage ... 55

Fig. 3.28 Write “1” speed with different voltage ... 55

Fig. 3.29 (a) Write conflict detect delay ... 56

Fig. 3.30 (b) Read “0” performance compare with write worst case ... 56

Fig. 3.31 Performance variation for different corner ... 57

Fig. 3.32 Write “1” speed with different voltage ... 57

Fig. 3.33 Active power consumption ... 58

Fig. 3.34 Power consumption with different voltage ... 58

Fig. 3.35 Power delay product with different voltage... 59

Fig. 3.36 2R2W multi-port SRAM pin name ... 60

Fig. 3.37 2R2W multi-port 8K SRAM test chip ... 60

Fig. 4.1 Efficiency of power gating circuits... 63

Fig. 4.2 Supply switching with ground collapse ... 64

Fig. 4.3 16-port SRAM architecture with 2-port banks and distributed crossbar ... 65

Fig. 4.5 The bank-internal 1-to-8 read-port converter ... 67

Fig. 4.6 Schematic design of novel register files ... 67

Fig. 4.7 CMT pipeline efficiency with 32 threads ... 69

Fig. 4.8 Representation of pipeline and enhancements for multithreading ... 70

Fig. 4.9 FRF double-pumped pulse clock generator circuit diagram ... 71

Fig. 4.10 SPICE simulations of both writes “0” and “1” ... 71

Fig. 4.11 Schematic of two threads switch cell ... 72

Fig. 4.12 The vector register file cell unit ... 73

Fig. 4.13 VRF cell capable of holding 2 bits and inter-bit thread decoder ... 74

Fig. 4.14 (a) Write replica circuit (b) Signal waveforms of this circuit ... 75

Fig. 4.15 Circuit cross-section of double-pumped write path ... 76

Fig. 4.16 4R4W Multi-Bank structure ... 77

Fig. 4.17 1st Data In/Out conflict detector ... 78

Fig. 4.18 1st Conflict WEN/REN detect waveform ... 79

Fig. 4.19 4R4W Multi-Bank structure ... 80

Fig. 4.20 Conflict finite state machine ... 81

Fig. 4.21 S0/S1 switch control circuit ... 84

Fig. 4.22 Power7 thread decoder control circuit ... 85

Fig. 4.23 Bit cell read out architecture ... 86

Fig. 4.24 A/B port threads switch control ... 86

Fig. 4.25 Slot detected circuit ... 87

Fig. 4.27 Register files replica circuit ... 88

Fig. 5.1 The 2R2W register file unit cell ... 91

Fig. 5.2 The metal layer implement of 2R2W register file unit cell ... 92

Fig. 5.3 Layout view of 2R2W register file ... 92

Fig. 5.4 Conventional dummy read operation in half select cell ... 93

Fig. 5.5 Share RBL structure in 4R4W register file ... 94

Fig. 5.6 Share RBL structure no dummy read power consumption ... 94

Fig. 5.7 Share RBL structure in 4R4W register file ... 95

Fig. 5.8 (a) Negative level in different voltage (b) Negative level in different corner ... 96

Fig. 5.9 4R4W 2kb multi-thread register file floor plan ... 97

Fig. 5.10 Test pattern and simulation waveform result ... 99

Fig. 5.11 Write post-layout simulation waveform result ... 100

Fig. 5.12 Read post-layout simulation waveform result ... 100

Fig. 5.13 Data transmission path in this 4R4W multi-threading register file ... 101

Fig. 5.14 4R4W multi-thread register file bank layout view ... 103

Fig. 5.15 Improve technologies of 4R4W register file design ... 103

Fig. 5.16 Read “0” performance for post-layout simulation ... 104

Fig. 5.17 Write “0” performance for post-layout simulation ... 104

Fig. 5.18 Write “1” performance for post-layout simulation ... 105

Fig. 5.19 (a) Address conflict detect circuit delay under wide range ... 106

(b) Delay of write worst case compare with read “0” access time. ... 106

Fig. 5.20 Power consumption with different supply voltage ... 107

Fig. 5.21 Energy consumption with different supply voltage ... 108

Fig. 5.23 Leakage power saving different voltage ... 109

Fig. 5.24 Hold static noise margin simulation ... 111

Fig. 5.25 Write trip point simulation ... 111

List of Tables

Table 3.1 Summary of the 8kb 2R2W multi-port spec ... 50

Table 3.2 Characteristic of 2R2W 8kb SRAM ... 52

Table 5.1 Negative VVSS circuit function ... 96

Table 5.2 The specification of proposed ... 98

Table 5.3 Improve technology of 4R4W register file design ... 102

Table 5.4 These works compare with conventional SRAM bit-cell ... 110

Chapter 1

Introduction

1.1 Background

Low-power design for portable devices such as cell phones, wireless devices, and notebooks has grown rapidly in recent years. A simple and effective way to reduce energy is to scale down the supply voltage. Reducing energy consumption is desirable in microprocessors because it enables longer battery life and adequate heat dissipation. With voltage scaling, the active power saving is quadratic and the leakage power reduction is linear [1.1]–[1.3]. The total power consumption is given in (1.1):

$$P_{total} = P_{dynamic} + P_{leakage} + P_{sc} \qquad (1.1)$$

where $P_{dynamic} = f C V_{DD}^{2}$, $P_{leakage} = V_{DD} I_{leakage}$, and the short-circuit term is $P_{sc} = V_{DD} I_{mean}$.
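The scaling behavior stated above can be checked numerically. The following is a minimal sketch of (1.1), assuming illustrative component values that are not taken from the thesis and assuming the leakage and mean short-circuit currents stay fixed while the supply voltage changes (in a real device both also depend on the supply voltage).

```python
def total_power(f_hz, c_farad, vdd, i_leak, i_mean):
    """Return (P_dynamic, P_leakage, P_sc, P_total) following (1.1)."""
    p_dynamic = f_hz * c_farad * vdd ** 2   # P_dynamic = f * C * VDD^2
    p_leakage = vdd * i_leak                # P_leakage = VDD * I_leakage
    p_sc = vdd * i_mean                     # P_sc = VDD * I_mean
    return p_dynamic, p_leakage, p_sc, p_dynamic + p_leakage + p_sc


# Illustrative values only (assumed, not measured data from the thesis).
nominal = total_power(f_hz=100e6, c_farad=1e-12, vdd=1.0, i_leak=1e-6, i_mean=1e-6)
scaled = total_power(f_hz=100e6, c_farad=1e-12, vdd=0.5, i_leak=1e-6, i_mean=1e-6)

dynamic_ratio = scaled[0] / nominal[0]   # quadratic in VDD
leakage_ratio = scaled[1] / nominal[1]   # linear in VDD for a fixed leakage current
print(dynamic_ratio, leakage_ratio)
```

Under these assumptions, halving the supply voltage cuts the dynamic term to one quarter and the leakage term to one half, which is the quadratic versus linear behavior described in the text.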

Fig. 1.1 shows that the minimum-energy point of an SRAM lies not in the sub-threshold region but in the near-threshold region. In the sub-threshold region, power falls only linearly while delay rises significantly, so the power-delay product is no longer small. This shifts the minimum-energy point to the near-threshold region.

1.2 Challenges

Conventional dual-port (Fig. 1.2) and multi-port designs are power hungry because many ports are accessed in parallel. High-frequency, multi-port reads from the register file often dominate the power consumption of the whole chip. Such designs are nevertheless indispensable for achieving high data-transmission bandwidth. For portable devices, low-power design is therefore very important: it reduces power dissipation noticeably and enables longer battery life.

Fig. 1.2 Conventional dual-port 8T SRAM bit-cell (labeled signals: read-word-line, Read BL, Read BLb, write-word-line, Write BL, Write BLb)

Voltage scaling is difficult in the conventional dual-port 8T design because the read-disturb problem and read/write conflicts degrade cell stability. Besides cell stability, write-ability is another major challenge for low-voltage operation. As the supply voltage scales down, the dual-port 8T cell must be enlarged to retain stability. In addition, as the supply voltage is reduced, the Ion/Ioff ratio becomes smaller than in the super-threshold region. Driver current normally degrades

As CMOS process technology scales down, more solid-state physical effects appear in the devices. Moreover, minimum-geometry transistors are vulnerable to inter-die as well as intra-die process variations. Intra-die process variation includes random dopant fluctuation (RDF) and line edge roughness (LER). These may lead to threshold-voltage mismatch between adjacent transistors in a memory cell, giving it asymmetrical characteristics [1.5] [1.6].

1.3 Motivation

The goal of this work is a low-power, high-retention SRAM cell [1.7] that is free of conflict and disturb problems. The conventional dual-port cell is no longer suited to advanced process technologies, so a new structure must be proposed to solve the disturb problem and the conflict problem. The cell operates at near-threshold voltage to approach the minimum energy consumption, and it targets low-power devices such as wireless sensors or mobile phones. In addition to low power and high reliability, high bandwidth is another primary design concern. Multi-thread and multi-bank structures may be included in this work to improve performance.

Besides scaling down the supply voltage, the peripheral circuits use low-power techniques such as power gating and CLK gating, and DVS may be added, to reduce active or standby power [1.8]. A boost or negative-voltage circuit can improve write-ability under low-voltage operation. Previous techniques, such as cutting off the feedback loop in a single-ended design, also improve write-ability. In this design, power reduction is more important than high-speed operation.

For robustness, a bit-interleaving structure can mitigate soft-error damage to the SRAM bit-cells. Bit-line (BL) leakage cannot be ignored in the low-voltage operating region. If the leakage current is too large for reliable operation, the read operation will

1.4 Thesis Organization

The main contents of this thesis are as follows. Chapter 2 reviews recent work on low-power SRAM design. The basic operation and stability of the conventional 6T SRAM are introduced first; low-power SRAM assist circuits, multi-port SRAM, and register file designs are then discussed in turn. Chapter 3 presents the operation-conflict and read-disturb problems of the conventional dual-port cell and proposes a new 2R2W multi-port SRAM structure. Its shared write bit-line and its X and Y cut control lines support a bit-interleaving structure without any additional peripheral circuits. Chapter 4 first introduces register files with multi-thread and double-pump techniques. The new "Data Slot Switch" technique and a conflict-detection circuit then help avoid disturb issues. Chapter 5 proposes a new shared read bit-line that removes the dummy reads of the bit-interleaving structure. Active power is saved by the shared RBL, and leakage power is reduced by keeping RWL high in standby mode. Finally, Chapter 6 concludes

Chapter 2

Previous Low-Power SRAM Designs

2.1 Introduction

In recent microprocessors, on-chip memory capacity is increasing rapidly to improve overall performance. According to the 2002 ITRS roadmap [2.1] [2.2], memory will occupy 90% of chip area in 2013. In such a memory-rich chip, the leakage current of the SRAM, which comprises the vast majority of on-chip transistors, dominates the standby current because leakage power is proportional to the number of transistors. It is therefore important to focus on reducing SRAM standby leakage current for ultra-low-power applications.

A low-power, fast-access SRAM with a minimum transistor count is essential for embedded multimedia and communication applications realized with system-on-a-chip technology. Hence, multi-port SRAM bit-cells with simultaneous or parallel read/write (R/W) access are widely employed in such embedded systems. Multi-port cells offer advantages such as high performance and high bandwidth, but they also consume a larger share of power and area.

This chapter begins with an analysis of the power dissipation of SRAM circuits; techniques for leakage reduction are shown in section 2.2. Section 2.3 defines the stability issues of the SRAM cell, including hold stability, read stability, and write-ability, and presents the impact of variation on SRAM at low voltage. Sections 2.4, 2.5 and 2.6 show the conventional dual-port SRAM and multi-port SRAM cells. Finally, in 2.7 the previous multi-port register file cell design and peripheral

2.2 Power Dissipation

This section begins with an analysis of the power dissipation of CMOS circuits and of circuit techniques for reducing it. Power dissipation consists of dynamic power ($P_{dynamic}$), leakage power ($P_{leakage}$), and short-circuit power ($P_{short-circuit}$). The total power can be expressed as (2.1), where $P_{dynamic} = \alpha C_L V_{DD}^{2} f$, $P_{leakage} = V_{DD} I_{leakage}$, and $P_{short-circuit} = I_{mean} V_{DD}$.

$$P_{total} = P_{dynamic} + P_{leakage} + P_{short-circuit} \qquad (2.1)$$

2.2.1 Dynamic Power

Fig. 2.1 shows a CMOS inverter. Its average dynamic power dissipation is obtained by summing the average dynamic power of the NMOS and PMOS transistors. Dynamic power is caused by logic transitions of the CMOS circuit, which charge or discharge its load and parasitic capacitance (CL). As the dynamic term of (2.1) shows, dynamic power dissipation is directly proportional to the switching activity factor (α), the load capacitance (CL), the square of the supply voltage (VDD²), and the operating frequency (f).

Fig. 2.1 Circuit diagram of inverter (labels: VDD, GND, VIN, VOUT, iN, iP, CL)

2.2.2 Leakage Power

Fig. 2.2 Leakage current of deep-submicron transistors

In advanced CMOS technologies, embedded SRAM leakage current becomes

dominant compared to the dynamic current. The majority of SRAM macro leakage

current is from its bit cell array [2.3]. Leakage current is composed of reverse-biased

junction leakage current (IREV), gate induced drain leakage (IGIDL), gate

direct-tunneling leakage (IG), and sub-threshold leakage (ISUB) in a CMOS transistor

[2.4] [2.5].

Fig. 2.2 shows reverse-biased junction leakage, sub-threshold leakage, gate direct-tunneling leakage, injection of hot carriers from the substrate into the gate oxide, gate-induced drain leakage, and punch-through leakage in a deeply scaled transistor.

Junction Leakage

In Fig. 2.3, leakage in reverse biased transistors and diodes includes the effects of

carrier generation, related to residual damage density and location relative to the

junction boundary, as well as structure and bias dependent effects of gate oxide

leakage, band-to-band tunneling at the drain junction and thermionic emission from

metal contacts. All of these effects depend on process conditions, through dependence

on dopant activation and profile shape, junction location and local electric fields.

Fig. 2.3 Gate leakage current paths in a NMOS transistor

In the steady-state ON region, both the gate and the drain of the device are held high and the source is grounded. In this state a well-formed channel exists and three separate components of the gate tunneling current, Igs, Igcs and Igcd, are active. The component through the gate-to-drain overlap (Igd) is absent because the electric field in that region of the oxide is almost zero. The overall current flows from the gate to the source and channel, opposite to the flow in the OFF state. In the steady-state OFF region, both the gate and the source are at ground while the drain is at the high (VDD) voltage. Since no channel is formed in this condition, the only active component is Igd [2.6].

Gate-induced drain leakage (GIDL):

As the electric field in and around the gated p-n junction is increased by the

applied gate voltage, all the high-field effects, such as avalanche multiplication and

band-to-band tunneling, can increase very dramatically (Fig. 2.4). Thus, the leakage

current of a reverse-biased gated diode can increase dramatically when the gate voltage

Fig. 2.4 Leakage current of deep-submicron transistors

Sub-threshold Leakage

When the gate voltage is below the threshold voltage, sub-threshold leakage, or weak-inversion current, flows between source and drain. For example, in an off-state inverter, although the Vgs of the NMOS is 0 V, a slight leakage current flows from drain to source because the voltage VDD appears across Vds [2.7].

Sub-threshold behavior can be modeled physically as follows [2.8]:

$$I_{ds} = \mu \frac{W}{L} \left(\frac{kT}{q}\right)^{2} C_{sth}\, e^{\frac{V_g - V_T + \eta V_{ds}}{m\,kT/q}} \left(1 - e^{-\frac{q V_{ds}}{kT}}\right), \qquad m = 1 + \frac{C_{sth}}{C_{ox}} \qquad (2.2)$$

where W and L denote the transistor width and length, μ denotes the carrier mobility, kT/q is the thermal voltage, $C_{sth} = C_{dep} + C_{it}$ denotes the sum of the depletion-region capacitance and the interface-trap capacitance, both per unit area of the MOS gate, η is the drain-induced barrier lowering (DIBL) coefficient, and $C_{ox}$ denotes the gate input capacitance per unit area of the MOS gate.
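The exponential dependence on threshold voltage in (2.2) can be made concrete with a short numerical sketch. The device parameters below are hypothetical values chosen only for illustration (SI units, capacitances per unit area), and the helper names are not from the thesis. The sketch also derives the gate-voltage change per decade of current that (2.2) implies, namely m·(kT/q)·ln 10.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(vg, vds, vt, *, mu, w_over_l, c_sth, c_ox, eta, temp_k=300.0):
    """Evaluate (2.2); c_sth and c_ox are capacitances per unit area."""
    v_th = thermal_voltage(temp_k)
    m = 1.0 + c_sth / c_ox
    prefactor = mu * w_over_l * v_th ** 2 * c_sth
    return (prefactor
            * math.exp((vg - vt + eta * vds) / (m * v_th))
            * (1.0 - math.exp(-vds / v_th)))


def subthreshold_swing(c_sth, c_ox, temp_k=300.0):
    """Gate-voltage change per decade of current implied by (2.2), in volts."""
    return (1.0 + c_sth / c_ox) * thermal_voltage(temp_k) * math.log(10.0)


# Hypothetical device parameters, for illustration only.
params = dict(mu=0.03, w_over_l=2.0, c_sth=5e-3, c_ox=1.5e-2, eta=0.1)
swing = subthreshold_swing(params["c_sth"], params["c_ox"])
i_high_vt = subthreshold_current(0.0, 0.9, 0.45, **params)
i_low_vt = subthreshold_current(0.0, 0.9, 0.45 - swing, **params)
print(swing, i_low_vt / i_high_vt)
```

With these assumed values, lowering the threshold voltage by one swing (roughly 80 mV here) raises the off-state current by a factor of ten, which is why a higher threshold voltage is an effective lever against sub-threshold leakage.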

Sub-threshold leakage increases exponentially as the threshold voltage is reduced, and DIBL lowers the threshold further, making the leakage even worse. Conversely, sub-threshold leakage can be reduced by raising the threshold voltage. In low-power technology, high-Vth transistors can be used to reduce sub-threshold leakage in

High-K Metal Gate

To reduce gate leakage, a new material is used to replace the conventional

SiO2. Silicon dioxide has been used as a gate oxide material for decades. As transistors

have decreased in size, the thickness of the silicon dioxide gate dielectric has steadily

decreased to increase the gate capacitance and thereby drive current, raising device

performance. As the thickness scales below 2 nm, leakage currents due to tunneling

increase drastically, leading to high power consumption and reduced device reliability

(Fig. 2.5). Replacing the silicon dioxide gate dielectric with a high-κ material allows

increased gate capacitance without the associated leakage effects. Equation (2.3) shows that a high-κ material allows a thicker dielectric for the same capacitance. With the thicker dielectric, the leakage problem is reduced significantly [2.9].

Fig. 2.5 Conventional silicon dioxide gate dielectric structure compared to a

potential high-k dielectric structure

$$C = \frac{\kappa \varepsilon_0 A}{t} \qquad (2.3)$$

- A is the capacitor area
- κ is the relative dielectric constant of the material (3.9 for silicon dioxide)
- ε0 is the permittivity of free space
- t is the thickness of the dielectric
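As a rough illustration of (2.3), the sketch below computes how thick a high-κ film can be while matching the capacitance of a thin SiO2 layer. The high-κ dielectric constant of 20 and the 1.2 nm SiO2 thickness are assumed example values, not figures from the thesis, and the function names are hypothetical.

```python
EPS0 = 8.8541878128e-12  # permittivity of free space, F/m
K_SIO2 = 3.9             # relative dielectric constant of SiO2 (value given in the text)


def capacitance(kappa, area_m2, thickness_m):
    """Parallel-plate capacitance from (2.3)."""
    return kappa * EPS0 * area_m2 / thickness_m


def equal_capacitance_thickness(t_sio2_m, kappa_high):
    """Thickness of a high-k film with the same capacitance as t_sio2_m of SiO2."""
    return t_sio2_m * kappa_high / K_SIO2


KAPPA_HIGH = 20.0   # assumed dielectric constant of the high-k material
T_SIO2 = 1.2e-9     # assumed SiO2 thickness, in meters

t_high = equal_capacitance_thickness(T_SIO2, KAPPA_HIGH)
c_sio2 = capacitance(K_SIO2, 1e-12, T_SIO2)
c_high = capacitance(KAPPA_HIGH, 1e-12, t_high)
print(t_high, c_sio2, c_high)
```

Under these assumptions the high-κ film is about five times thicker than the SiO2 layer at equal capacitance, which is the mechanism by which tunneling leakage is reduced.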

Fin FET Structure

Fig. 2.6 shows the Fin FET device, which offers notably faster switching times and higher current density. Unlike the conventional MOS structure, this device, developed by IBM, provides better gate control: the vertical gate covers more of the channel area, so control over the channel improves. Owing to its superior gate control, electrostatic integrity, and variability, the Fin FET has demonstrated satisfactory scalability and feasibility for mass production at post-22-nm technology nodes [2.10] [2.11].

Fig. 2.6 Fin-FET structure

Punch-through Leakage

Finally, in short-channel devices, due to the proximity of the drain and the source,

the depletion regions at the drain-substrate and source-substrate junctions extend into

the channel. As the channel length is reduced, if the doping is kept constant, the

separation between the depletion region boundaries decreases. An increase in the

reverse bias across the junctions (with increase in VDS) also pushes the junctions nearer

to each other. As the combination of channel length and reverse bias leads to the

merging of the depletion regions, punch through leakage occurs.

Punch-through causes a high current and makes the device break down. Hot and

2.2.3 Short Circuit Power

When a CMOS gate switches, both transistors conduct briefly and a direct path from VDD to GND is formed. This DC path causes extra power consumption. Short-circuit power can be expressed as (2.4), where Imean is the mean value of the short-circuit current [2.12].

At the circuit level, a number of articles describe short-circuit power. The power dissipation expressions below follow the articles by Veendrick [2.13] and by Hedenstierna and Jeppson [2.14].

$$P_{short-circuit} = I_{mean} \times V_{DD} \qquad (2.4)$$

$$P_{short-circuit} = \frac{\beta}{12}\,(V_{DD} - 2V_t)^{3}\,\tau f \qquad (2.5)$$

P: The device transistor conductance τ: The ramp time

β: The gain factor of a transistor, f: The operating frequency
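A small numerical sketch of (2.5) follows. The parameter values are assumed for illustration only. The clamp to zero when VDD is not above 2Vt is an added modeling assumption rather than part of (2.5): in this simple model the two transistors of a symmetric inverter cannot conduct at the same time below that supply voltage, and sub-threshold conduction is ignored.

```python
def short_circuit_power(beta, vdd, vt, tau, f_hz):
    """Evaluate (2.5): (beta / 12) * (VDD - 2*Vt)^3 * tau * f.

    Returns 0.0 when VDD <= 2*Vt (added assumption, see the lead-in).
    """
    overdrive = vdd - 2.0 * vt
    if overdrive <= 0.0:
        return 0.0
    return beta / 12.0 * overdrive ** 3 * tau * f_hz


# Assumed example values: gain factor in A/V^2, voltages in V, ramp time in s.
p_nominal = short_circuit_power(beta=1e-4, vdd=0.9, vt=0.3, tau=1e-10, f_hz=1e8)
p_low_vdd = short_circuit_power(beta=1e-4, vdd=0.5, vt=0.3, tau=1e-10, f_hz=1e8)
print(p_nominal, p_low_vdd)
```

The cubic dependence on (VDD − 2Vt) means that short-circuit power falls off quickly as the supply voltage approaches twice the threshold voltage.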

2.3 SRAM Bit-cell Stability and Write-ability

As CMOS process technology scales down, process variation becomes more and more important. PVT variation, both global and local, is the major influence on cell stability. It is therefore very important to use simulation to estimate the true threshold accurately. The worst case must be considered, and Monte Carlo simulation is usually used to find it. The following of this section will

2.3.1 Static Noise Margin (SNM)

The most common way to measure the stability of cross-coupled inverters is the static noise margin (SNM). Hold SNM is defined as the maximum static DC voltage noise that the SRAM bit-cell can tolerate without flipping the storage node while the word-line is off. Fig. 2.7 shows the usual Hold SNM simulation setup for a 6T SRAM cell. Two noise sources are inserted at Q and Qb, and the maximum noise voltage at which the SRAM still retains its stored data is found. In this case, WL is zero and both BLs are kept high [2.15].

Fig. 2.8 shows the standard setup for modeling Read SNM. Compared with the HSNM setup, WL is turned on to simulate a read operation. The node storing "0" rises slightly because of the voltage divider formed by the pass transistor and the pull-down transistor. Once this disturb voltage rises close to the trip point of the inverter, the data flips. The butterfly curve is smaller than for HSNM because of the read disturb, which reduces node stability significantly. Fig. 2.7 and Fig. 2.8 also show example butterfly curves during hold and read, revealing the degradation in SNM during read.

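[Figure: Hold SNM test setup with WL=0, BL=VDD, BLb=VDD and noise sources VN; butterfly curve of VR (V) versus VL (V)]

[Figure: Read SNM test setup with WL=VDD, BL=VDD, BLb=VDD and noise sources VN; butterfly curve of VR (V) versus VL (V)]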

Fig. 2.8 Standard setup for finding Read SNM
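The butterfly-curve measurement described above can be sketched numerically. The example below is only an illustration under stated assumptions: the inverter transfer curve is a hypothetical tanh-shaped model (not extracted from any simulation in this work), both inverters are assumed identical, and the read disturb is mimicked by raising the low output level through a hypothetical `v_low` parameter. The SNM is taken as the side of the largest square that fits inside one lobe of the butterfly curve.

```python
import math


def inverter_vtc(vin, vdd, gain=10.0, v_low=0.0):
    """Hypothetical smooth inverter transfer curve.

    v_low > 0 mimics the raised "0" level caused by the voltage divider
    between the access and pull-down transistors during a read.
    """
    swing = vdd - v_low
    return v_low + 0.5 * swing * (1.0 - math.tanh(gain * (vin - 0.5 * vdd) / vdd))


def _solve_x(target, vdd, vtc):
    """Bisection for x in [0, vdd] with x - vtc(x) = target (increasing in x)."""
    lo, hi = 0.0, vdd
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid - vtc(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def static_noise_margin(vdd, vtc, steps=2000):
    """Side of the largest square inside one butterfly lobe (identical inverters)."""
    c_min, c_max = -vtc(0.0), vdd - vtc(vdd)
    best = 0.0
    for i in range(steps + 1):
        x1 = vdd * i / steps
        c = x1 - vtc(x1)              # point (x1, vtc(x1)) lies on the line x - y = c
        if not (c_min <= -c <= c_max):
            continue
        x2 = _solve_x(-c, vdd, vtc)   # mirrored curve point (vtc(x2), x2) on the same line
        # The two points are separated along the 45-degree diagonal of the square.
        side = ((x1 + vtc(x1)) - (x2 + vtc(x2))) / 2.0
        best = max(best, side)
    return best


hold_snm = static_noise_margin(1.0, lambda v: inverter_vtc(v, 1.0))
read_snm = static_noise_margin(1.0, lambda v: inverter_vtc(v, 1.0, v_low=0.15))
print(f"hold SNM = {hold_snm:.3f} V, read SNM = {read_snm:.3f} V")
```

With this toy model the read lobes come out smaller than the hold lobes, which is the qualitative degradation the two figures illustrate.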

2.3.2 Write Margin (WM)

There are many ways to measure the write ability of an SRAM bit-cell; the simplest is to find the write trip point (WTP). Write margin is defined as VDD − MIN[V(WWL)], where MIN[V(WWL)] is the minimum write-word-line voltage required for flipping the bit-cell. In this write margin test mode, the WL voltage is swept from VDD to zero. The higher the write margin, the more easily data is written into the bit-cell. Fig. 2.9 shows a corresponding example of finding the write margin: it is the VDD − VWL value at the point where VR and VL flip. The write margin value and its variation are functions of the cell design, SRAM array size, and process variation. A cell is considered not writeable if the worst-case write margin becomes lower than the ground potential.
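The write trip point search can be illustrated with a short sketch. It assumes the DC sweep results are already available as three equal-length lists (word-line voltage and the two internal node voltages); the function name and the sample data are hypothetical. The function simply walks the sweep in the order given and reports VDD − VWL at the first point where VL and VR have swapped.

```python
def write_margin(vdd, vwl_sweep, vl, vr):
    """Return VDD - VWL at the first sweep point where VL and VR swap order.

    vwl_sweep, vl, vr are equal-length lists from a DC sweep (hypothetical data).
    Returns None if the nodes never flip, i.e. the cell is not writable.
    """
    initially_left_high = vl[0] > vr[0]
    for vwl, left, right in zip(vwl_sweep, vl, vr):
        if (left > right) != initially_left_high:
            return vdd - vwl
    return None


# Hypothetical sweep: the cell flips once the word-line reaches 0.6 V.
vwl = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
vl = [0.9] * 6 + [0.0] * 4
vr = [0.0] * 6 + [0.9] * 4
print(write_margin(0.9, vwl, vl, vr))
```

A return value of None corresponds to the "not writeable" case in the text; in a Monte Carlo run this check would be applied to every sample and the worst case kept.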

[Figure: write margin test setup with WL=VWL, BL=0, BLb=VDD; internal node voltages VL and VR versus VWL (V), with the write trip point marked]

2.3.3 Impact of Variation on SRAM in Low Voltage

Differential 6T SRAM

The 6T bit-cell does not scale well with process technology and is also not suitable for low-voltage operation. If the 6T cell is to operate in an advanced technology, the area of its NMOS/PMOS transistors has to be enlarged to gain more read/write ability. The 6T cell is very sensitive to process problems such as random dopant fluctuation (RDF) and line edge roughness (LER). These may result in threshold voltage mismatch between adjacent transistors in the memory cell [2.16] [2.17].

Half-select Disturb Failure

As nano-scale devices scale down, threshold voltage variation becomes larger. Because of process variation, the NMOS Vt is no longer a constant value; if the disturb voltage is larger than the bit-cell trip voltage, the data flips and an error occurs. A conventional 6T array with a bit-interleaving structure has the half-select problem. Fig. 2.10 shows the half-select disturb failure and its waveform. If the pull-down NMOS Vt is too high and the access NMOS Vt is low, charge accumulates on Qb, and the data may flip through this current path.


Read/Write conflict issues

In this configuration, read and write accesses impose opposite requirements, making it highly difficult to overcome the severe effect of variation and manufacturing defects. Fig. 2.11 shows the β ratios of the 6T SRAM bit-cell; the β ratio conflict is described afterward.

$$\frac{I_{PUP}}{I_{PD}} \sim \beta_1 = \frac{\mu_{PUP}(W/L)_{PUP}}{\mu_{PD}(W/L)_{PD}} \rightarrow V_{TRIP}$$

$$\frac{I_{AX}}{I_{PD}} \sim \beta_2 = \frac{\mu_{AX}(W/L)_{AX}}{\mu_{PD}(W/L)_{PD}} \rightarrow V_{READ}$$

$$\frac{I_{AX}}{I_{PUP}} \sim \beta_3 = \frac{\mu_{AX}(W/L)_{AX}}{\mu_{PUP}(W/L)_{PUP}} \rightarrow V_{write}$$

Fig. 2.11 The β ratio of 6T SRAM bit-cell

During a read access the cell must remain bi-stable to ensure that both logic values can be held and read without being upset by the read disturb that occurs at the internal nodes. To facilitate reads and minimize read disturb, the β2 ratio should be small enough, which calls for a strong PD NMOS and a weak AX NMOS. During a write access the cell should be made mono-stable to write the desired data. To improve writability, the β3 ratio must be large, which calls for a strong AX NMOS and a weak PUP PMOS.

To improve writability and minimize read disturb simultaneously, the transistors can be sized as PD > AX > PUP. However, this degrades the β1 ratio and hence VTRIP, resulting in poor read SNM. Therefore, these three β ratios conflict with each other, and simple sizing cannot solve the 6T SRAM failures.
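The coupling between the three ratios can be made concrete with a small sketch. It computes β1, β2 and β3 as ratios of μ·(W/L) products, following the definitions given with Fig. 2.11. The function name, the relative mobility values and the W/L values are hypothetical and only serve the illustration.

```python
def beta_ratios(mobility, w_over_l):
    """Compute the three 6T sizing ratios from mu * (W/L) of PUP, PD and AX.

    mobility and w_over_l are dicts keyed by "PUP", "PD", "AX"
    (relative units; the numbers below are illustrative only).
    """
    strength = {name: mobility[name] * w_over_l[name] for name in ("PUP", "PD", "AX")}
    return {
        "beta1": strength["PUP"] / strength["PD"],   # sets V_TRIP
        "beta2": strength["AX"] / strength["PD"],    # sets V_READ (want small)
        "beta3": strength["AX"] / strength["PUP"],   # sets write ability (want large)
    }


ratios = beta_ratios(
    mobility={"PUP": 1.0, "PD": 2.5, "AX": 2.5},
    w_over_l={"PUP": 1.0, "PD": 2.0, "AX": 1.5},
)
print(ratios)
# beta3 equals beta2 / beta1 by construction, so the three ratios
# cannot be tuned independently of one another.
print(ratios["beta2"] / ratios["beta1"])
```

Because β3 = β2/β1 holds identically, a large β3 together with a small β2 forces β1 down, which is the sizing conflict stated above.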

Hold and Read Failure

A hold failure happens when the cell content is destroyed in standby mode at a low supply voltage. Therefore, a higher trip point of the back-to-back inverters makes the cell easier to


very low voltages and will form the basis for several of the ultra-low voltage bit-cell designs described in Sections 2.5 and 2.6.

Fig. 2.12 6T SRAM SNM loss at low voltages [2.18]

If the data stored in an SRAM cell flips during reading, there is a read failure. If the voltage at the node storing “0” rises above the trip point of the back-to-back inverters, the data stored in the cell flips. Fig. 2.12 shows that the 6T SRAM bit-cell fails to operate at low voltages because of reduced signal levels and increased variation. At low voltages, the read SNM is negative, indicating loss of stability.

Write Failure

If the data stored in an SRAM cell cannot be flipped during writing, there is a write failure. While writing “0” to a node storing “1”, the voltage at the node needs to be discharged below the trip point of the back-to-back inverters. As shown in Fig. 2.13, it is


Fig. 2.13 6T SRAM write margin [2.18]

Access Failure

If the voltage difference between the two bit-lines (dual-end) or the voltage drop

of the single bit-line (single-end) can’t be sensed by the sense amplifier during the

access time, there is an access failure. The cause of access failure can be ascribed to

read-current degradation and data-dependent bit-line leakage.

The cell read-current, IREAD, is the current sunk from the pre-charged bit-lines

during a read access when the access devices are enabled. At ultra-low voltages, we expect a significantly reduced read-current because of the lower gate-drive voltage. However, the increased effect of threshold voltage variation severely degrades the weak cell read-current even further. Fig. 2.14 normalizes the read-current distribution by the mean read-current to highlight just the further degradation due to variation.


Fig. 2.15 IREAD is less than Ileakage from un-accessed cells at low voltage [2.19]

An implied consequence of the reduced read-current is that the aggregate leakage currents from the un-accessed cells on the same bit-lines can make conventional data sensing impossible. Because of the reduced ION-to-IOFF ratio and the severe degradation from read-current variation, these leakage currents can exceed the actual read-current of the accessed cell. Fig. 2.15 shows how the IREAD/ILEAK,TOT ratio of a 256-row SRAM array leads to loss of functionality at low voltages. At ultra-low voltage the bit-line leakage exceeds the read signal, making the accessed data indecipherable.
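The comparison between the read-current and the aggregate bit-line leakage can be sketched as a simple ratio. The sketch assumes the worst case in which every un-accessed cell on the bit-line leaks in the direction that opposes sensing; the function name and the current values are hypothetical.

```python
def read_to_leakage_ratio(i_read, i_leak_per_cell, rows):
    """I_READ / I_LEAK,TOT for one bit-line.

    Worst-case assumption: all (rows - 1) un-accessed cells leak onto the
    same bit-line with the same per-cell leakage current.
    """
    i_leak_total = i_leak_per_cell * (rows - 1)
    return i_read / i_leak_total


# Hypothetical ultra-low-voltage numbers for a 256-row array (amperes).
ratio = read_to_leakage_ratio(i_read=1e-6, i_leak_per_cell=1e-8, rows=256)
print(f"I_READ / I_LEAK,TOT = {ratio:.3f}")
if ratio < 1.0:
    print("aggregate leakage exceeds the read current: sensing is unreliable")
```

A ratio below one corresponds to the situation in the text where the leakage of the un-accessed cells masks the read signal; reducing the number of cells per bit-line raises the ratio.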

2.4 Previous Read/Write Assist Peripheral Circuit

2.4.1 Keeper Tracking Circuit Assist for SRAM Design

Wide OR structures are typically used in the read path of register files, L1 caches,

match lines of TCAMs, flash memories and PLAs. In most of the applications the

worst case requirement would be to sense the difference between the leakage state

where all the pull-down legs are leaky and the ON state where only one of the legs is

ON. The increase in the variability and magnitude of the leakage current has become

a major bottleneck in realizing such wide OR gates [2.21] [2.22].


leakage currents in the pull-down NMOS logic for the FNSP and SNFP corners.

This results in performance degradation and higher short-circuit power dissipation, and it limits the number of pull-down legs.

An ideal keeper is expected to have minimum contention, good noise robustness,

good process tracking, less power and area overhead and should support wide fan-in

gates.

Fig. 2.16 A conditional keeper with INV chain [2.20]

Fig. 2.17 A current mirror keeper [2.22]

Conditional keeper (CKP)

A weak keeper holds the state of the dynamic node during the transition window, and a strong keeper is conditionally activated based on the state of the dynamic node after a certain delay (Fig. 2.16). This reduces contention during the evaluation period, thereby enabling high speed and reducing the short-circuit power dissipation.

Current mirror keeper (LCR)

A current-mirror-based keeper technique (Fig. 2.17) was proposed for better process tracking. This technique provides excellent tracking of the delay, but the contention is still high because the keeper is strongly ON during the beginning of the evaluation phase. Furthermore, the replica transistor does not track the leakage due to noise (as Vgs=0) and DIBL (as the drain voltage of the replica NMOS varies across process corners) in


Fig. 2.18 Cross couple keeper with INV chain (left)

Fig. 2.19 Rate sensing keeper with INV chain (right)

Cross couple keeper (CSK)

The keeper in Fig. 2.18 is based on a cross-coupled structure and has two switching steps. It uses a cross-coupled structure based on SCL and feedback PMOS transistors to provide additional noise immunity to the dynamic node without much performance degradation.

Rate sensing keeper (RSK)

Fig. 2.19 shows the rate sensing keeper technique, which works based on the difference in the

rate of change of voltage at the dynamic node of the gate during the ON (Rdynon) and

the leakage (Rdyoff) condition. A reference rate (Rref), which is the average of the two

rates, is used to control the state of the keeper. The fact that the keeper is OFF during

the start of the evaluation phase and the adaptive control of the keeper strength

based on the process corner helps RSK to achieve higher speed and better tracking,

respectively.
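The decision rule behind the rate sensing keeper can be sketched in a few lines. The reference rate as the average of the ON rate and the leakage rate follows the description above; the direction of the comparison (keeper turned on when the dynamic node falls more slowly than the reference) is an assumption made for this illustration, and the function names and rate values are hypothetical.

```python
def reference_rate(rate_on, rate_off):
    """Rref: the average of the ON-state and leakage-state discharge rates."""
    return 0.5 * (rate_on + rate_off)


def keeper_should_turn_on(observed_rate, rate_on, rate_off):
    """Assumed rule: a node discharging slower than Rref is only leaking,
    so the keeper is enabled to hold it; a faster node is a real evaluation."""
    return observed_rate < reference_rate(rate_on, rate_off)


# Hypothetical discharge rates in V/s at one process corner.
R_DYN_ON, R_DYN_OFF = 2.0e9, 2.0e8
print(reference_rate(R_DYN_ON, R_DYN_OFF))
print(keeper_should_turn_on(3.0e8, R_DYN_ON, R_DYN_OFF))   # leakage-like slope
print(keeper_should_turn_on(1.8e9, R_DYN_ON, R_DYN_OFF))   # evaluation-like slope
```

Since both rates shift with the process corner, recomputing the reference from them is what gives the adaptive behaviour mentioned in the text.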


Fig. 2.20 Replica bias generator for RSK circuit

Fig. 2.21 The variation of the rates across different process corners

(2.7)

2.4.2 Charge Pump Circuit Design

This section introduces the boosting method for SRAM assist. It is very important to achieve high efficiency while avoiding extra power dissipation. The demand for aggressive low-power operation is ever increasing, which in turn demands a lower VMIN of the SRAM cell. Previously, SRAM cells were upsized in order to scale down VMIN, but only a 10% total VMIN reduction is attained, at a cost of ~25% increase in array area. Therefore, RD and WR circuit assists that can achieve VMIN reduction at a minimal area impact are necessary.

Boosting RWL enables larger read “ON” current without forcing a larger PMOS keeper. Boosting WWL helps WR VMIN for 2 reasons – improving contention without


the other side. At iso-array area, on-die boosting achieves twice as much VMIN reduction as simple cell upsizing (Fig. 2.21).

Fig. 2.22 8T SRAM cell with on-die RWL and WWL boosting [2.23]

Fig. 2.23 2SLS can Effective promotion boost ratio [2.23]

Fig. 2.24 Different boost frequency effect [2.23]

Boost Ratio Optimum

The ideal boosting ratio (BR = VBOOST/VCC) under no load current (ILOAD) is 2, i.e., VBOOST = 2VCC. The actual BR is lower, however, as determined by ILOAD from all active and inactive level-shifters, the boosting clock frequency (FBCLK) (Fig. 2.22, 2.23), and the boosting capacitance (CCP). At a given phase of BCLK, one of the two CP paths alternately supplies charge to the VBOOST rail. In order to maintain gate oxide and junction reliability of devices connected to VBOOST, the CP is enabled (i.e., BCLK is toggling) only if [BR x VCC < VMAX] is met. Otherwise the CP is turned off, with transistor MX turned on to short VBOOST to VCC [2.23].
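The enable condition for the charge pump can be expressed as a small sketch. It follows the rule BR x VCC < VMAX from the paragraph above and models the disabled case as VBOOST shorted to VCC; the function names and the numeric values of BR and VMAX are hypothetical.

```python
def charge_pump_enabled(boost_ratio, vcc, vmax):
    """The charge pump toggles only while BR * VCC stays below VMAX."""
    return boost_ratio * vcc < vmax


def vboost_rail(boost_ratio, vcc, vmax):
    """Rail voltage: boosted when the pump is enabled, otherwise shorted to VCC."""
    if charge_pump_enabled(boost_ratio, vcc, vmax):
        return boost_ratio * vcc
    return vcc


# Hypothetical values: actual boost ratio 1.5, reliability limit 1.4 V.
for vcc in (0.6, 0.8, 1.0):
    print(vcc, charge_pump_enabled(1.5, vcc, 1.4), vboost_rail(1.5, vcc, 1.4))
```

Boosting is therefore only applied in the low-voltage range, where the assist is needed and the boosted rail stays within the device reliability limit.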

The 2SLS minimizes the dynamic ILOAD current that needs to be supplied by the CP. Unlike a conventional (DCVS) LS, where a “0”-to-VBOOST transition is supplied entirely by the VBOOST rail, the 2SLS performs this transition in 2 steps (Fig. 2.24). In the first step, “0”-to-VCC is supplied by MP1, at which point MP2 kicks in to supply the remaining VCC-to-VBOOST. The circuit is proposed in [2.24].


Fig. 2.25 2-step level-shifter reduce ILOAD [2.23]

Charge Pump Circuit

Fig. 2.26 shows the four-stage Dickson charge pump circuit, where the

diode-connected MOSFETs are used to transfer the charges from the present stage to

the next stage [2.25] [2.26]. The voltage difference between the drain terminal and the

source terminal of the diode connected MOSFET is the threshold voltage when the

diode-connected MOSFET is turned on. Therefore, the output voltage of the

four-stage Dickson charge pump circuit has been derived as

(2.7)

Fig.2.26 Dickson charge pump circuits
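The output-voltage expression itself is not reproduced in this text, so the sketch below uses the commonly quoted first-order Dickson relation instead, which may differ in form from the expression referred to above: each stage adds a voltage step ΔV, and each diode-connected MOSFET (one more than the number of stages) drops one threshold voltage. The function name, the parameters and all values are hypothetical; the thresholds are passed per diode so that they may differ from stage to stage.

```python
def dickson_output_voltage(vdd, vclk, vt_per_diode, c_pump,
                           c_stray=0.0, i_out=0.0, freq=1.0):
    """First-order Dickson charge pump estimate (textbook approximation).

    vt_per_diode lists the threshold of every diode-connected MOSFET,
    i.e. (number of stages + 1) entries.
    """
    stages = len(vt_per_diode) - 1
    # Voltage step per stage: capacitive division minus the load-current droop.
    delta_v = (vclk * c_pump / (c_pump + c_stray)
               - i_out / (freq * (c_pump + c_stray)))
    return vdd + stages * delta_v - sum(vt_per_diode)


# Four stages, no load, no stray capacitance; thresholds grow along the chain.
vout = dickson_output_voltage(
    vdd=0.9, vclk=0.9,
    vt_per_diode=[0.30, 0.32, 0.34, 0.36, 0.38],
    c_pump=1e-12,
)
print(f"Vout = {vout:.2f} V")
```

Every increase in a diode threshold subtracts directly from the output, so higher thresholds in the later stages lower the achievable output voltage.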

The threshold voltage (Vt) of the diode-connected MOSFET becomes larger due to

the body effect when the voltage on each pumping node is pumped higher. Therefore,

the pumping efficiency of the Dickson charge pump circuit is degraded by the body


Fig. 2.27 Ker proposed CP circuit and waveform with four pumping stages

The circuit and waveform of the new proposed charge pump circuit with four

stages are shown in Fig. 2.27 [2.27]. To avoid the body effect, the bulks of the devices

in the proposed charge pump circuit are recommended to be connected to their

sources respectively if the given process provides the deep n-well layer. Clock signals

CLK and CLKB are out-of-phase but with the amplitudes of VDD.

2.5 Previous Low Voltage SRAM Design

2.5.1 SRAM Bit-cell

Differential VSSM 7T SRAM Bit-cell

The standard 2-port 8T SRAM bit-cell with non-isolated read and write ports is shown in


isolated read-port comprising two transistors, M1R and M2R, and a single read bit-line to directly sense the data from node Q. By separating out a write port that consists of a single-ended write bit-line and a write word-line, this design offers a static-noise-margin-free read operation, since it isolates the read current path (shown dotted) from the data storage nodes (Q or QB).

Because the R/W ports are separated, the isolated read port provides a read SNM more than 2 times better, close to the HSNM, which cannot be achieved in a standard 6T bit-cell.

Fig. 2.28 (a) The standard 2-port 8T SRAM bit-cell with non-isolated read-port [2.28]

(b) An isolated read-port 7T SRAM bit-cell [2.28]

DCO 8T SRAM Cell Design

In [2.29], the authors use two kinds of core oxide structures for power reduction and low VCCMIN. A new 8T cell structure with dual core oxide (DCO) in a 45LPG triple gate oxide CMOS process is proposed for high performance, low leakage mobile applications. The DCO 8T SRAM operates under dual voltage supplies with write assist. Compared to the traditional single-ended 8T cell, the DCO 8T SRAM shows the same performance with only half the standby leakage and a lower VCCMIN.

The DCO 8T cell is designed in a 45nm LPG CMOS process, as shown in Fig.


normally at 0.9V and 1.1V, respectively. For example, during low voltage operation,

only VddM will be lowered to 0.6V while VddM1 remains unchanged.

Fig. 2.29 Schematic diagram of the DCO 8T SRAM cell with dual VDD [2.29]

Fig. 2.30 DCO 8T cell shows 2x lower leakage at the same read current at 0.9V

comparing to SCO 8T cell [2.29]

Fig. 2.31 Comparison of leakage components between DCO 8T cell and SCO 8T

cell at 0.9V (Q = 1) [2.29]

Fig. 2.30 shows read current vs. standby leakage comparison between these two

cells across process corners. DCO SRAM standby leakage at 0.9V is only 3nA, which

is half of the SCO cell at the same 98uA Iread performance. This is mainly due to the


port. Silicon data showed that the DCO cell read BL leakage (sub-threshold leakage) and read pull-down gate leakage are the dominant leakage sources, as shown in Fig. 2.31.

A new 2-port SRAM bit-cell

In [2.30], a new dual-port 6T cell structure is proposed. As shown in Fig. 2.32, it is combined with assist MOS transistors and one additional global WL. Compared with conventional dual-port designs, it has the three merits listed below.

1. A new 2-port 6T memory bit-cell and its word-oriented array organization is

proposed to eliminate simultaneous read and write access disturbances due to

column select functionality in neighboring bit-cells or words.

2. The poor read-noise margin and conflicting read-write problems are handled

by isolating the read and write-ports to achieve higher stability margins.

3. The process variation sensitivity analysis shows that the proposed design has significantly lower process variation sensitivity than existing ones, hence a better parametric yield.

Fig. 2.32 (a) Schematic diagram of the proposed 2-port 6T SRAM bit-cell with

shared read and write assist transistors per word [2.30]

(b) The VTC and SNM obtained from butterfly curve for the standard ST,

7T and proposed 6T SRAM bit-cells [2.30]

Fig. 2.33 shows a 32-bit word-oriented SRAM array organization of the


is divided into many blocks to support this feature. Each word also has a sub-wordline driver to activate the local wordlines, and a set of read and write-assist transistors. In a word-oriented SRAM array organization, all the bit-cells of a word are kept together, which facilitates the sharing of read and write-assist transistors.

In addition to the reasons above, multi-divided wordline and bitline techniques are commonly used to reduce the charging and discharging capacitance of wordlines and bitlines, or in other words to minimize the read/write delay and improve the array performance.

Fig. 2.33 A 32-bit word organization of the proposed 2-port 6T SRAM bit-cell to

eliminate simultaneous read/write disturbance problem [2.30]

Fig 2.34 Simultaneous R/W access issues in word-oriented array [2.30]

Fig. 2.34 shows the schematic diagram of a 2-port 6T SRAM bit-cell memory

module, with word-oriented array organization having four n-bit words (A, B, C and D)

arranged in 2-rows and 2-columns. By this way, the cell array can do simultaneous


Zigzag 8T-SRAM

Previous 8T/10T cells have the obvious drawbacks of slower write-back or wasteful layout in their implementation schemes, even though they are much better than the 6T cell. A decoupled single-ended 8T (DS8T) cell [2.32] suffers from slower read and write-back (WB) due to its single-ended sensing. The CP10T cell [2.33] has a larger area penalty because it uses a 5-poly pitch layout, and suffers degraded write ability due to its serial access-gates. Decoupled differential 9T (D9T) and 10T cells improve read speed but require large area. Cells with poorer area efficiency lead to an increased σVTH because they can resort to transistor upsizing only to a limited extent (Fig. 2.37 & Fig. 2.36).

The work in [2.31] demonstrates for the first time the quantitative performance advantages of a zigzag 8T-SRAM (Z8T) cell, shown in Fig. 2.35, over the decoupled single-ended sensing 8T-SRAM (DS8T) with write-back schemes, which was previously recognized as the most area-efficient cell under large σVTH/VDD conditions. Since Z8T uses only 1T for each decoupled read-port, faster 2T differential sensing (D2S) can be implemented within the same area as the single-ended DS8T. Thanks to D2S, the Z8T cell enables much faster R/W speed at VDDmin than DS8T. For the same VDDmin/speed, Z8T saves 15% of the cell area. Compared with the conventional DS8T, its area is 14% smaller and its read is 53% faster. In this work, VDDmin can go down to 250mV.


9T SRAM

Fig. 2.36 Schematic of the proposed 10T SRAM [2.33]

Fig. 2.37 Schematic of the proposed 9T SRAM [2.34]

A New Low Leakage 8T-SRAM [2.35]

Figure 2.38 shows the architecture of the new 8T SRAM cell. It contains two extra transistors, MNLL and MNWL, compared to the conventional 6T SRAM cell. Transistor MNLL is used to reduce gate leakage, while transistor MNWL is used to make the cell SNM free in the zero state. This cell design has three characteristics. First, a novel read-“0” static-noise-margin-free eight-transistor SRAM cell is proposed that reduces gate leakage power in the zero state. Second, this new high-VT 8T SRAM cell reduces total leakage by 60% in the zero state at the highest temperature. Finally, the new cell improves SNM by 2.2 times compared to the conventional 6T SRAM cell in read operation and standby mode when the cell stores logic ‘1’.


2.6 Previous Low Power Register File Design

2.6.1 Register File Bit-cell

Fig. 2.39 shows an RF bit-cell that can work in the sub-threshold region [2.36]. Its disadvantage is that the read port limits the number of cells on a bit-line because of the small fan-in/out. The author replaced the conventional cell with the bottom-right cell in Fig. 2.40, which provides a solution to reduce the capacitance. However, the speed degrades and the array area becomes large. In this mux cell design, selecting one cell out of two takes extra time.

A similar design in Fig. 2.41 also uses the same combinational circuit to reduce the loading on RBL [2.37]. That paper uses the Double-DICE storage element, which reduces charge sharing and collection between the sensitive nodes of sensitive pairs in a Dual Interlocked Cell (DICE) storage cell. If a radiation particle strikes a sensitive node (the drain of an NMOS or a PMOS in off mode) and the node loses its charge, the redundant nodes restore the state of the affected node and prevent an upset in the storage cell logic. The DICE design provides excellent protection against SEU for sub-micron technologies, where a single radiation strike results in charge collection at only one node. The author therefore combines this cell with the fan-in reduction technique, using two NAND-OR gates, to obtain a low-capacitance design.

In [2.37], the author proposed a Dual-DICE design, which interleaves two DICE storage cells to make them more resistant to upsets caused by charge sharing and the creation of lateral parasitic bipolar transistors in multiple PMOS devices in deep submicron technologies. The design provides an area savings as compared to the alternative


Fig. 2.39 Cell number on one bit-line is small [2.36]

Fig. 2.40 Two bit-cells share one bit-line [2.36]

Fig. 2.41 Cell number on one bit-line is small [2.37]

The IRF design presented several challenges with the large number of multi-ported

registers required to support the four threads in the core [2.38] [2.39]. The design goal

was to satisfy the performance needs with competitive area and power consumption.

Performance-wise, the pipeline requires a read access immediately after a restore


The IRF in this design supplies a maximum of three operands per instruction for the

single active thread. Therefore, the read ports for all four threads are merged into a

compact structure with shared read bitlines (Fig. 2.42) to reduce area and power. The 32

entries are folded into two columns with only 16 read cell pull downs on the bitline for

optimal performance and array aspect ratio [2.40].

Fig. 2.42 IRF integrated memory cell circuit diagram [2.39]

Multi-port separation

Due to more efficient wiring and contact sharing, a 2R1W register file cell is ~3 to 4×

smaller than a 4R2W cell (Fig. 2.43), which reduces cell dimensions and thus both

wordline and bitline lengths by nearly a factor of two [2.41] [2.42].

Instead, the 2R1W cell subarray is replicated (with common write operations


achieved while still maintaining low word and bitline capacitances. Even with subarray

duplication, the 3 to 4× smaller cell size achieves a near 2× macro-level area reduction

over a traditional 4R2W design. This area reduction also results in a corresponding

decrease in leakage power. Due to reduced read bitline capacitance and smaller drivers,

read power and read bitline latency can both be improved by ~2×. Write power is not

dramatically affected as reduced write bitline capacitance balances subarray

duplication [2.43].
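The area argument can be checked with first-order arithmetic. The sketch below only compares cell-array area and ignores decoders, drivers and other peripheral circuits, which is an assumption of this illustration; the function name is hypothetical.

```python
def macro_area_ratio(cell_shrink, copies=2):
    """Array area of `copies` small-cell subarrays relative to one large-cell array.

    cell_shrink is how many times smaller the 2R1W cell is than the 4R2W cell.
    Peripheral circuits are ignored (first-order estimate only).
    """
    return copies / cell_shrink


for shrink in (3.0, 4.0):
    ratio = macro_area_ratio(shrink)
    print(f"cell {shrink:.0f}x smaller -> duplicated array uses {ratio:.2f} of the "
          f"original area ({1.0 / ratio:.1f}x reduction)")
```

With a 3 to 4× smaller cell, two copies still occupy only about one half to two thirds of the original array area, which is in line with the near 2× macro-level reduction quoted above.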

Fig. 2.43 Standard 4R2W split to 2 copies of a 2R1W cell [2.41]

2.7 Summary

At the beginning of this chapter, I introduced the power consumption model and the device geometry effects. Nowadays, leakage power dominates the whole-chip power consumption, and reducing power dissipation is a very important issue. Standby power and leakage current are discussed in Section 2.1, and CMOS device design and new technologies such as FinFET and high-k metal gate are also introduced. After that, the basic operation of the conventional 6T SRAM is presented, together with the basic concepts and measurement of stability and write ability in an SRAM bit-cell. With process technology scaling down, process variation already damages the SRAM cell stability


we introduce some new assist technologies for SRAM design or SNM improvement, such as boosting circuits, keeper design, and negative BL. Finally, new cells and shared-WWL structures for low power purposes are discussed. Besides, new register design


Chapter 3

Low Power 2R2W Multi-Port 8Kb SRAM Design

3.1 Introduction

In this chapter, a new low power 13T 2-Read 2-Write (2R2W) multi-port SRAM bit-cell is proposed. Combining wide-range operation with the advantages of multiple ports, it is very suitable for portable devices and mobile phones. A new shared-WBL structure with cross Y_Cut and X_Cut makes the cell more robust, improves write ability, and reduces WBL driver power. A negative VVSS technique is embedded for successful writes at low voltage. Using this technique, a shorter write-“1” time is achieved.

In order to gain higher bandwidth, multi-port design is becoming more important in media applications. Unlike a conventional single-port design, a multi-port SRAM can perform synchronous or asynchronous operations because it has two independent ports. Parallel operation provides more bandwidth at the same time, but new conflict issues must be taken care of.

First, I discuss the conventional problems in Section 3.2, where the conflict problem is introduced specifically. In Section 3.4, two techniques are used to improve the write-“1” ability in this design. Single-ended write reduces power, but its write ability drops compared with conventional differential write. Using negative VVSS and cutting off the feedback loop can improve write strength. Section 3.5 shows the post-layout simulation, performance, and power analysis. The TSMC 40nm general purpose 2R2W


3.2 Conventional Dual-Port 8T SRAM

3.2.1 Two Kinds of Access Mode in DP-SRAM

Fig. 3.1 shows the conventional dual-port SRAM bit-cell, which has two ports that can read/write at the same time. Compared with the conventional single-port design, the dual-port structure gives designers more control flexibility. Dual-port SRAM provides high bandwidth and asynchronous CLK timing control. The conflict problem is very important in dual-port designs, and there are many techniques to improve the Vmin of DP-SRAM against a disturb condition [3.1] [3.2].

Fig. 3.1 Conventional 8T dual-port SRAM cell

Fig. 3.2 Different-row access mode [3.2]


Fig. 3.3 Access in the same row [3.2]

3.2.2 Write and Read Disturb Issue in 8T DP-SRAM

There are two access modes in two-port operation. The first is access to different rows, which causes no disturb problem [3.3]. The second is when the two ports access the same row simultaneously (Fig. 3.2 & Fig. 3.3). Case 1 (Fig. 3.4): a write is performed on the left cell and a read on the right cell in the same row. A dummy read then happens on the left side, which is referred to as “write disturb”. The dummy read operation prevents the internal “1” node from being flipped by BLA, so the write-ability for the left memory


Case 2 (Fig. 3.5): one port reads the left cell, and another read port points to the right cell in the same row. A dummy read operation occurs for the left cell. The internal “0” node is ramped up through BLA, causing a reduction of the cell current. Consequently, the reduction in the cell current leads to a read failure due to lack of BL swing (read disturb).
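The access cases discussed in this section can be summarized in a small classification sketch. Each port access is represented as a hypothetical (operation, row, column) tuple; the function name and the returned labels are illustrative and not taken from the cited works. A same-row access that involves a write is labeled as write disturb, a same-row read/read access as read disturb, and an access to the very same cell is reported separately.

```python
def classify_dual_port_access(port_a, port_b):
    """Classify a simultaneous dual-port access.

    Each port is a tuple (operation, row, column) with operation "R" or "W".
    """
    op_a, row_a, col_a = port_a
    op_b, row_b, col_b = port_b
    if row_a != row_b:
        return "no disturb"          # different-row access mode
    if col_a == col_b:
        return "same-cell conflict"  # both ports target the same bit
    if "W" in (op_a, op_b):
        return "write disturb"       # dummy read on the cell being written
    return "read disturb"            # dummy read reduces the cell read current


print(classify_dual_port_access(("W", 3, 0), ("R", 5, 1)))
print(classify_dual_port_access(("W", 3, 0), ("R", 3, 1)))
print(classify_dual_port_access(("R", 3, 0), ("R", 3, 1)))
```

Such a check could be used in a behavioral model or testbench to decide which access patterns need the worst-case disturb simulation.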

Fig. 3.5 Read operation disturbed by dummy read in the same row [3.3]

Timing control under CLK skew disturb in dual-port designs is also discussed in [3.4]. Timing variation is related to the wire lines across the whole chip; if positive or negative skew happens, the write/read disturb may cause a functional failure.

3.2.3 Read/Write Conflict of Dual-port

Fig. 3.6 shows new techniques that can solve the conflict problem by using time sharing [3.5] [3.6]. Normally, simultaneous read and write operation is forbidden. In a conventional design, if a read and a write point to the same bit, the conflict causes large power consumption and a wider WL pulse is needed to finish the operation.
