Low Power and High Speed SRAM with Current-Mode Techniques


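National Chiao-Tung University, Department of Electronics Engineering and Institute of Electronics. Doctoral Dissertation. Low Power and High Speed SRAM with Current-Mode Techniques. Student: Shang-Ming Wang. Advisor: Dr. Ching-Yuan Wu. June 2004.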

Low Power and High Speed SRAM with Current-Mode Techniques. Student: Shang-Ming Wang. Advisor: Dr. Ching-Yuan Wu. A Thesis Submitted to the Institute of Electronics, College of Electrical Engineering and Computer Science, National Chiao-Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electronic Engineering. June 2004, Hsinchu, Taiwan, Republic of China.

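Low Power and High Speed SRAM with Current-Mode Techniques

Student: Shang-Ming Wang. Advisor: Ching-Yuan Wu. Institute of Electronics, National Chiao-Tung University.

Abstract (Chinese)

This thesis addresses the design and analysis of low-power static random access memories (SRAMs). The SRAM access path can be divided into three parts: the write path, from the address input to the word line; the read path, from the word line to the data output; and the memory cell. The Lambda bipolar transistor (LBT) is a voltage-controlled negative-differential-resistance device formed by merging a MOSFET with its parasitic bipolar transistor, and it can be used as a memory element. This thesis proposes a new LBT structure and examines its operating principle using a simple circuit model and device physics. Using the proposed LBT, a new single-sided read/write memory cell is designed.

The design of a low-power, high-performance SRAM usually concentrates on reducing the power dissipated during operation and the DC current and leakage current in the standby state. To reduce the power consumed by read and write operations, we propose current-mode read/write mechanisms to replace the conventional voltage-mode ones. A current-mode sense amplifier is proposed that reads successfully with only a small change in the bit-line voltage and that also reduces noise. In addition, a current-mode write driver is proposed that requires only a small change in the bit-line voltage during a write, which both lowers power consumption and speeds up the write. With the current-mode read and write techniques, the read speed and the write pulse width are almost independent of the capacitive loading of the bit-lines and data-lines. Based on this current-mode operation, a memory cell capable of high-speed, low-power operation is proposed. Its access transistors and inverter transistors are almost the same size, and it can be toggled by a small bit-line voltage difference.

To evaluate the current-mode techniques, a 32Kx8 SRAM was fabricated in a 0.35 µm single-poly, double-metal process. At a 3 V supply its access time is 9 ns, and its active current is 28 mA when operating at 100 MHz.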

Low Power and High Speed SRAM with Current-Mode Techniques

Student: Shang-Ming Wang. Advisor: Dr. Ching-Yuan Wu. Institute of Electronics, National Chiao-Tung University.

Abstract

This thesis explores the design and analysis of static random access memories (SRAMs), with a focus on low-power operation. The SRAM access path is split into three portions: from the address input to the word-line rise (the write path), from the word-line rise to the data output (the read path), and the memory cell. Techniques to optimize each of these portions are investigated.

The Lambda bipolar transistor (LBT), a voltage-controlled negative-differential-resistance device formed by merging an n-channel MOSFET with a parasitic NPN bipolar transistor, is known for its memory applications. In this thesis, a new LBT structure is developed and its characteristics are derived from a simple circuit model and device physics. A novel single-sided memory cell based on the proposed LBT is presented.

High-performance, low-power SRAM design focuses on reducing dynamic power dissipation in the operating state and on decreasing DC current and leakage current in the standby state. To reduce operating power without decreasing read/write speed, we propose current-mode read/write mechanisms in place of conventional voltage-mode circuits. A new current-mode sense amplifier is proposed that senses the bit-line signal even when the bit-line voltage swing is small, and its non-floating design reduces the noise produced during sensing in the standby mode. The current-mode write driver reduces the bit-line swing when data are written, which both decreases power consumption and shortens the write access time. With the new current-mode techniques for read and write operations, the sensing speed and the write pulse width are insensitive to the bit-line and data-line capacitances, and a separated positive-feedback technique enables the circuit to operate at high speed and low power. These techniques keep the voltage swing of the bit-lines and data-lines quite small. Based on current-mode operation, a memory cell that operates in a low-power current mode is developed. The memory cell has almost equally sized access and inverter transistors and can be toggled using a small differential bit-line voltage.

The presented techniques were evaluated on an experimental 32Kx8 CMOS SRAM chip fabricated in a 0.35 µm 1P2M CMOS process. The chip achieves a 9 ns access time at a supply voltage of 3 V, and its active current is 28 mA at 100 MHz and 25 °C.

Acknowledgements

I thank my advisor, Prof. Ching-Yuan Wu, for his invaluable guidance throughout the course of this thesis. His insights and wisdom have been a great source of inspiration for this work. He not only encouraged me to pursue this degree but also provided me with the best research environment and generous support. Special thanks are given to Silicon-Based Technology Corp. for financial support, especially for chip implementation. Thanks to my many friends on campus and outside, I have had many memorable moments outside my work. I would especially like to thank Prof. Bill Tai and the members of Silicon-Based Technology Corp. Furthermore, I would like to thank the members of the Advanced Semiconductor Device Lab. for their valuable suggestions, interesting discussions, and friendship. I am grateful to my sisters and brother for being such a wonderful family. I thank my wife, Hsiao-Mei, for her love and patience during the final stages of my dissertation. Last but not least, I am eternally indebted to my parents for their love, encouragement and support. This dissertation is dedicated to them.

Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
Figure Captions
Table Captions
Chapter 1 Introduction
Chapter 2 Overview of CMOS SRAM
  2.1 SRAM Partitioning
  2.2 Circuit Techniques in SRAMs
Chapter 3 Lambda Bipolar Transistor Memory Cell
  3.1 New Lambda Bipolar Transistor
  3.2 Description of New Memory Cell
  3.3 Performance of New Memory Cell
    3.3.1 Write Operation
    3.3.2 Read Operation
    3.3.3 Comparisons
Chapter 4 New Current-Mode Sense Amplifier
  4.1 Introduction
  4.2 Voltage Sensing and Current Sensing
    4.2.1 Theoretical Model
    4.2.2 Voltage-Mode and Current-Mode Signal Delay
  4.3 Voltage-Mode Sense Amplifier
  4.4 Clamped Bit-Line Sense Amplifier
  4.5 New Current-Mode Sense Amplifier
    4.5.1 Circuit Description and Operation
    4.5.2 Simulation Results
Chapter 5 New Current-Mode Write Driver
  5.1 Conventional Voltage Writing Mechanism
  5.2 Current Writing with Equalization Transistor
  5.3 New Current Writing Mechanism
    5.3.1 Current-Mode Write Driver
    5.3.2 New Memory Cell for Current-Mode Operation
    5.3.3 Simulation Results and Comparisons
Chapter 6 Low Power and High Speed SRAM
  6.1 Low Power SRAM Architecture
  6.2 Cell Design and Layout
  6.3 Process Variation Effects on Current-Mode Circuit
  6.4 Experimental Results
Chapter 7 Conclusions
References

List of Figures

Fig. 1. Elementary SRAM structure
Fig. 2. Divided Word Line (DWL) Architecture
Fig. 3. Schematic of a two-level 8 to 256 decoder
Fig. 4. a) Conventional static NAND gate b) Nakamura's NAND gate [35]
Fig. 5. Skewed NAND gate
Fig. 6. Bit-line mux hierarchies in a 512 row block
Fig. 7. Two common types of sense amplifiers
Fig. 8. A vertical Lambda bipolar transistor structure
Fig. 9. An equivalent circuit of a vertical Lambda bipolar transistor
Fig. 10. The I-V characteristics of a vertical Lambda bipolar transistor
Fig. 11. General configuration of a new memory cell
Fig. 12. The I-V characteristics of a new memory cell with current and resistive load
Fig. 13. A new SRAM memory cell circuit
Fig. 14. The static transfer characteristics of the memory cells
Fig. 15. Write "0" operation
Fig. 16. Write "1" operation
Fig. 17. Read "0" operation
Fig. 18. Read "1" operation
Fig. 19. Sensing delay versus bit-line capacitance
Fig. 20. Typical use of a sense amplifier
Fig. 21. Theoretical voltage-mode signal model
Fig. 22. Theoretical current-mode signal model
Fig. 23. CMOS representation for a voltage-mode signal model
Fig. 24. CMOS representation for a current-mode signal model
Fig. 25. A long interconnect model
Fig. 26. Comparison of voltage sensing and current sensing
Fig. 27. Comparison of voltage sensing and current sensing with different values of load resistance
Fig. 28. Comparison of voltage-sensing and current-sensing with approximations
Fig. 29. Simple differential couple schematic
Fig. 30. Full complementary positive feedback amplifier schematic
Fig. 31. Clamped bit-line sense amplifier
Fig. 32. A current-mode sense amplifier and a simplified data path circuit
Fig. 33. Simulated current waveforms of the new current-sensing data path circuit
Fig. 34. Simulated waveforms of the new current-sensing data path circuit
Fig. 35. Sensing delay and average power dissipation versus bit-lines capacitance
Fig. 36. Sensing delay and average power dissipation versus data-lines capacitance
Fig. 37. Bit-line model during write access cycle
Fig. 38. 7T-memory cell
Fig. 39. A current-mode write driver and a simplified data path circuit
Fig. 40. Schematic of the memory cell
Fig. 41. The static transfer characteristics of the memory cell
Fig. 42. Simulated waveforms of the new current-writing data path circuit
Fig. 43. Write pulse width and average power dissipation versus data-lines capacitance
Fig. 44. Architecture of low power memory chip
Fig. 45. The layout of memory cell
Fig. 46. Layout placement of same-size transistor
Fig. 47. Sensing delay and average power dissipation with process variations versus bit-lines capacitance
Fig. 48. Write pulse width and average power dissipation with process variations versus data-lines capacitance
Fig. 49. A photomicrograph of 32Kx8 SRAM
Fig. 50. Typical address and output waveforms
Fig. 51. Shmoo plot of address time versus power supply voltage

List of Tables

Table 1. Comparison to conventional SRAM cell
Table 2. Process and SRAM characteristics

Chapter 1 Introduction

High-speed and low-power SRAMs have become a critical component of many VLSI chips. This is especially true for microprocessors, where on-chip cache sizes grow with each generation to bridge the increasing divergence between the speeds of the processor and the main memory [1-2]. At the same time, power dissipation has become an important consideration, both because of increased integration and operating speeds and because of the explosive growth of battery-operated appliances [3]. This thesis explores the design of SRAMs and focuses on reducing the operating power. While process scaling [4-5] remains the biggest driver of low-power design, this thesis investigates circuit techniques that can be used in conjunction with scaling to achieve low-power operation.

Conceptually, an SRAM has the structure shown in Fig.1. It consists of a matrix of 2^m rows by 2^n columns of memory cells. Each memory cell contains a pair of cross-coupled inverters that form a bi-stable element. These inverters are connected to a pair of bit-lines through NMOS pass transistors, which provide differential read and write access. An SRAM also contains column and row circuitry to access these cells. The m+n bits of address input, which identify the cell to be accessed, are split into m row address bits and n column address bits. The row decoder activates one of the 2^m word lines, which connects the memory cells of that row to their respective bit-lines. The column decoder sets a pair of column switches, which connects one of the 2^n bit-line pairs to the peripheral circuits.

Fig.1 Elementary SRAM structure.

In a read operation, the bit-lines start precharged to a reference voltage, usually close to the positive supply. When the word line goes high, the access NMOS connected to the cell node storing a '0' starts discharging its bit-line, while the complementary bit-line remains in its precharged state, so that a differential voltage develops across the bit-line pair. Each SRAM cell is optimized to minimize cell area, and hence its cell current is very small, resulting in a slow bit-line discharge rate. To speed up the RAM access, a sense amplifier is used to amplify the small bit-line signal and eventually drive the signal to the external world.

During a write operation, the write data is transferred to the desired columns by driving it onto the bit-line pairs, grounding either the bit-line or its complement. If the cell data differs from the write data, the '1' node is discharged when the access NMOS connects it to the discharged bit-line, thus causing the cell to be written with the bit-line value.

The next chapter introduces the various techniques used in practical SRAMs. For the purpose of design and optimization, the access path can be divided into two portions: the read path, from the memory cell ports to the SRAM I/O ports, and the write path, from the I/O ports to the memory cell.

In most SRAM cell designs, the basic flip-flop circuit structure is the most frequently used. However, a full CMOS cell usually occupies twice the area of a high-resistance poly load cell or a poly-PMOS TFT load cell. On the other hand, a high-resistance poly load cell consumes relatively high standby power. Therefore, several earlier works [6-11] on single-sided memory cells have been conducted for both power and area reduction. In this thesis, we propose new single-sided memory cells based on a new Lambda bipolar transistor (LBT). Chapter 3 reports the new LBT, which is derived from the original LBT structure with a modification aimed at low power. The operating principle of the device is derived from a simple circuit model and device physics. Chapter 3 also presents the new single-sided memory cells based on our LBT, and some comparisons between the reported memory cells and the single-sided CMOS cell are made.

For many years, the design of SRAM circuits has focused on improving the operating speed. For example, the capacity of SRAM quadruples every three years, and various voltage-mode sense amplifiers have been used in many generations of SRAM. As the bit-line and data-line capacitances grow with each generation, the memory access time obtained with a voltage-mode sense amplifier becomes quite long. Meanwhile, the power supply voltage must be reduced in future VLSI designs for the sake of power reduction and device reliability. To overcome these problems, several papers [12-17] have proposed the use of current-mode sense amplifiers. However, these designs do not eliminate the DC current through the sense amplifier. To remove this DC power consumption, a new current-mode sense amplifier is proposed. The new structure not only reduces the DC power consumption of the sense amplifier but also senses the bit-line differential signal in a very short time. The new n-type separated flip-flop current-mode sense amplifier is described in Chapter 4.

Chapter 5 discusses the new current-mode write driver circuit. The power consumed in writing data into the memory accounts for a large percentage of the whole-chip power during the write access cycle. In the past, voltage-mode writing circuits were used. With this mechanism, the bit-line needs an almost full supply-voltage swing, so the dynamic power consumed at the bit-line grows with the bit-line voltage swing. This large voltage swing not only consumes considerable power when writing data but also increases the memory cycle time. The cycle time is long because the bit-line level must be pulled back up to the supply voltage after a write operation in preparation for the next read or write operation. The operating speed of an SRAM is therefore determined not only by the access time but also by the cycle time. Some design concepts [18] using the current-mode technique in the write operation have been proposed to reduce the large voltage swing at the bit-line. However, these methods increase the number of transistors in the memory cell, making the memory larger. Moreover, the control-signal decoder and the timing control become more complicated. The new current writing mechanism is proposed to reduce the large voltage swing at the bit-line without increasing the number of transistors in the memory cell. The decoder architecture and timing control signals are as simple as in the conventional technique.

In Chapter 6, a 32Kx8 SRAM chip is implemented. At the architecture level, the key goals are to localize signal activity so as to reduce the switched capacitance, to reduce signal swings, and to eliminate any DC power consumption in the system. We finally summarize the main conclusions of this thesis in Chapter 7.
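The dependence of write power on bit-line swing discussed in this chapter can be illustrated with a rough calculation. The following Python sketch assumes the common first-order estimate P = C * V_DD * dV * f for the supply power needed to restore a bit-line of capacitance C that swings by dV once per cycle. The 1 pF bit-line capacitance and the 0.1 V reduced swing are illustrative assumptions, not values taken from this thesis; the 3 V supply and 100 MHz rate match the experimental chip.

```python
def bitline_power(c_bitline, v_supply, v_swing, freq):
    """First-order average supply power (W) to restore a bit-line.

    Each cycle the bit-line is discharged by v_swing and then recharged
    from v_supply, drawing a charge of c_bitline * v_swing from the supply.
    """
    return c_bitline * v_supply * v_swing * freq


C_BL = 1e-12      # assumed bit-line capacitance (F), illustrative only
VDD = 3.0         # supply voltage (V)
FREQ = 100e6      # operating frequency (Hz)
SMALL_SWING = 0.1  # assumed reduced swing for current-mode operation (V)

full_swing_power = bitline_power(C_BL, VDD, VDD, FREQ)
small_swing_power = bitline_power(C_BL, VDD, SMALL_SWING, FREQ)

print(f"full swing : {full_swing_power * 1e3:.3f} mW per bit-line")
print(f"small swing: {small_swing_power * 1e3:.3f} mW per bit-line")
print(f"ratio      : {full_swing_power / small_swing_power:.1f}")
```

Under these assumptions the power scales linearly with the swing, so a full 3 V swing costs 30 times the power of a 0.1 V swing on the same bit-line.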

Chapter 2 Overview of CMOS SRAM

The delay and power of practical SRAMs have been reduced over the years through innovations in array organization and circuit design. This chapter discusses both topics and highlights the issues addressed by this thesis. Section 2.1 explores the various partitioning strategies, and Section 2.2 points out the main circuit techniques presented in the literature to improve speed and power.

2.1 SRAM Partitioning

For large SRAMs, significant improvements in delay and power can be achieved by partitioning the cell array into smaller subarrays rather than using a single monolithic array as shown in Fig.1. Typically, a large array is partitioned into a number of identically sized subarrays (commonly referred to as macros), each of which stores a part of the accessed word, called the subword, and all of which are activated simultaneously to access the complete word [19-21]. The macros can be thought of as independent RAMs, except that they might share parts of the decoder. Each macro conceptually looks like the basic structure shown in Fig.1. During an access to a certain row, the word line activates all the cells in that row and the desired subword is accessed via the column multiplexers. This arrangement has two drawbacks for macros that have a very large number of columns: the word line RC delay grows as the square of the number of cells in the row, and the bit-line power grows linearly with the number of columns. Both drawbacks can be overcome by further subdividing the macros into smaller blocks of cells using the Divided Word Line (DWL) technique, first proposed in [22]. In the DWL technique, the long word line of a conventional array is broken up into k sections, each of which is activated independently, thus reducing the word line length by a factor of k and hence its RC delay by a factor of k^2.

Fig.2 Divided Word Line (DWL) Architecture

Fig.2 shows the DWL architecture, where a macro of 256 columns is partitioned into 4 blocks of only 64 columns each. The row selection is now done in two stages: first a global word line is activated, which is then transmitted into the desired block by a block select signal to activate the desired local word line. Since the local word line is shorter (only 64 columns wide), it has a lower RC delay. Though the global word line is nearly as long as the width of the macro, it has a lower RC delay than a full-length word line because its capacitive loading is smaller. It sees only the input loading of the four word-line drivers instead of the loading of all 256 cells. In addition, its resistance can be lowered by using wider wires on a higher-level metal layer. The word line RC delay is reduced by another factor of four by placing the word-line drivers in the center of the word line segments, thus halving the length of each segment. Since 64 cells within the block are activated, as opposed to all 256 cells in the undivided array, the column current is also reduced by a factor of 4. The concept of dividing the word line can be applied recursively to the global word line (and the block select line) for larger RAMs, and is then called the Hierarchical Word Decoding (HWD) technique [23]. Partitioning can also be done to reduce the bit-line height.

Partitioning of the RAM incurs area overhead at the boundaries of the partitions. For example, a partition that dissects the word lines requires word-line drivers at the boundary. Since the RAM area determines the lengths of the global wires in the decoder and the data path, it directly influences their delay and energy.

2.2 Circuit Techniques in SRAMs

The SRAM access path can be broken down into two components: the decoder and the data path. The decoder encompasses the circuits from the address input to the word line. The data path encompasses the circuits from the cells to the I/O ports.

The logical function of the decoder is equivalent to 2^n n-input AND gates, where the large fan-in AND operation is implemented in a hierarchical structure. The schematic of a two-level 8 to 256 decoder is shown in Fig.3. The first level is the predecoder, where each of two groups of four address inputs and their complements (A0, A0', A1, A1', ..., where a prime denotes the complement) is decoded to activate one of its 16 predecoder output wires, forming the partially decoded products (four-input ANDs of true or complemented inputs, such as A0A1A2A3). The predecoder outputs are combined at the next level to activate the word line.

Fig.3 Schematic of a two-level 8 to 256 decoder

The decoder delay consists of the gate delay in the critical path and the interconnect delay of the predecoder and word line wires. As the wire RC delay grows as the square of the wire length, the wire delay within the decoder structure, especially that of the word line, becomes significant in large SRAMs. Sizing of the gates in the decoder allows a trade-off between delay and power. Transistor sizing has been studied by a number of researchers for both high speed [24-26] and low power [27-28]. The decoder sizing problem is complicated slightly by the presence of intermediate interconnect from the predecoder wires.

Fig.4 a) Conventional static NAND gate b) Nakamura's NAND gate [35]

The decoder delay can be greatly improved by optimizing the circuit style used to construct the decoder gates. Older designs implemented the decoder logic function in a simple combinational style using static CMOS circuits (Fig.4a) [29-31]. In such a design, one of the 2^m word lines is active at any time. If, in any access, the new row address differs from the previous one, the old word line is deasserted and the new word line is asserted. Thus, the decoder gate delay in such a design is the maximum of the delay to deassert the old word line and the delay to assert the new word line, and it is minimized when each gate in the decode path is designed to have equal rising and falling delays. The decoder gate delay can be significantly reduced by using pulsed circuit techniques [32-34], where the word line is not a combinational signal but a pulse that stays active for a certain minimum duration and then shuts off. Thus, before any access all the word lines are off, and the decoder only needs to activate the word line for the new row address. Since only one kind of transition needs to propagate through the decoder logic chain, the transistor sizes in the gates can be skewed to speed up this transition and minimize the decoder delay. Fig.4b shows an instance of this technique [35], where the PMOS transistors in the NAND gates are sized to be half those in a regular NAND structure. In the pulsed design, the PMOS sizes can be reduced by a factor of two and still give the same rising delay, since it is guaranteed that both inputs will deassert; this reduces the loading on the previous stage and hence the overall decoder delay. This concept is extended further in [32], where the deassertion of the gate is completely decoupled from its assertion. Fig.5 shows an example of such a gate, where the transistor sizes in the logic chain are skewed heavily to speed up the output assertion once the inputs are activated. The gate is then reset by some additional devices and made ready for the next access. By decoupling the assert and deassert paths, the former can be optimized to reduce the decoder delay.

Fig.5 Skewed NAND gate.
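The two-level 8 to 256 decoding described earlier in this section can be sketched behaviorally in Python. This is only a logical model under stated assumptions: the function names are hypothetical, and the assignment of the low four address bits to one predecoder group and the high four bits to the other is an assumption made for illustration, since the text does not specify the grouping.

```python
def predecode(group_value):
    """One-hot decode a 4-bit address group into 16 predecoder lines."""
    return [1 if line == group_value else 0 for line in range(16)]


def decode_word_lines(row_address):
    """Model a two-level 8-to-256 decoder and return the 256 word-line values.

    Level 1: two 4-bit groups are each predecoded into 16 one-hot lines.
    Level 2: each word line is the AND of one line from each group.
    """
    if not 0 <= row_address < 256:
        raise ValueError("row address must fit in 8 bits")
    low_lines = predecode(row_address & 0xF)           # assumed group: A0..A3
    high_lines = predecode((row_address >> 4) & 0xF)   # assumed group: A4..A7
    return [high_lines[w >> 4] & low_lines[w & 0xF] for w in range(256)]


word_lines = decode_word_lines(0xA7)
print("active word line:", word_lines.index(1))
print("number of active word lines:", sum(word_lines))
```

In this model the second level needs only 2-input ANDs, which reflects why a hierarchical structure is preferred over 256 separate 8-input gates.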

The SRAM data path logically implements a multiplexer for reads (and a demultiplexer for writes). In the simplest implementation, the multiplexer has only two levels: at the lowest level, the memory cells in a column are all connected together to a bit line, and at the next level, a small number of these bit lines are multiplexed together through column pass transistors (Fig.1). When the bit-line height is very large, it can be further partitioned to form multi-level bit-line hierarchies by using additional layers of metal [36]. In general, the multiplexer hierarchy can be constructed in a large number of ways (2^(r-1) * 2^c mux designs are possible for a 2^r x 2^(c+k) block with 2^r rows, 2^c columns and an access width of 2^k bits). Fig.6 shows two possible designs for a block with 512 rows. The schematic shows only the NMOS pass gates for a single-ended bit line to reduce the clutter in the figure, while the real multiplexer would use CMOS pass gates for differential bit-lines to allow for reads and writes. Fig.6a shows the single-level mux design, where two adjacent columns with 512 cells are multiplexed into a single sense amplifier. Fig.6b shows a two-level structure in which the first level multiplexes two 256-high columns, the outputs of which are multiplexed in the second level to form the global bit lines feeding into the sense amplifiers. Similarly, hierarchical muxing can also be done in the I/O lines that connect the outputs of all the sense amplifiers to the I/O ports [37].

Because of its small size, a memory cell is very weak and limits the bit-line slew rate during reads. Hence sense amplifiers are used to amplify the bit-line signal, so that signals as small as 100 mV can be detected. In a conventional design, even after the sense amplifier senses the bit lines, they continue to slew and eventually develop a large voltage differential. This wastes a significant amount of power, since the bit lines have a large capacitance. By limiting the word-line pulse width, we can control the amount of charge pulled off the bit lines and hence limit power dissipation [38-41]. In this thesis, we use a scheme that controls the word-line pulse width to be just wide enough, over a wide range of operating conditions, for the sense amplifiers to sense reliably, and that prevents the bit lines from slewing further.

Fig.6 Bit-line mux hierarchies in a 512 row block

A number of different sense amplifier circuits have been proposed in the past, and they essentially fall into two categories: the linear amplifier type [42-43] and the latch type [19-21]. Fig.7 illustrates a simple prototype of each type. In the linear amplifier type (Fig.7a), the amplifier needs a DC bias current to set it up in the high-gain region prior to the arrival of the bit-line signal. To convert the small-swing bit-line signal into a full-swing CMOS signal, a number of stages of amplification are required. These kinds of amplifiers are typically used in very high performance designs. Because they consume biasing power and operate over a limited supply voltage range, they are not preferred for low-power and low-voltage designs. In such designs, latch-type amplifiers are used (Fig.7b). They consist of a pair of cross-coupled gain stages, which are turned on with the aid of a sense clock when an adequate input differential has been set up. The positive feedback in the latch amplifies the input signal to a full digital level. While this type consumes the least power because it needs no biasing power, it can be slower, since a timing margin is needed in the generation of the sense clock. If the sense clock arrives before enough input differential has been set up, the output value can be wrong. Typically, the sense clock timing needs to be adjusted for the worst-case operating and process conditions, which in turn slows the amplifier down under typical conditions because of the excess timing margin. In this thesis, we will look at some timing circuits that track the bit-line delay and are used to generate a sense clock with a reduced timing overhead.

In large SRAMs, another level is added to the data path hierarchy by connecting the outputs of the sense amplifiers to the I/O lines (Fig.2). The I/O lines transport the signal between the RAM I/O ports and the memory blocks. In SRAMs with a large access width, the power dissipation of these lines can also be significant, and hence the signaling on them also uses small swings [44]. In Chapter 4, we will apply the low-swing bit-line technique to the I/O lines as well, to reduce the I/O line power.
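The link between a weak cell, the bit-line slew rate and the required word-line pulse width can be made concrete with a small calculation. The Python sketch below assumes that the cell discharges the bit-line as a constant current source, so that the time to develop a signal dV on capacitance C is t = C * dV / I. The 1 pF capacitance and 100 µA cell current are illustrative assumptions, not values from this thesis; the 100 mV sense threshold and the 3 V supply are taken from the text.

```python
def signal_develop_time(c_bitline, delta_v, i_cell):
    """Time (s) for a constant cell current to move the bit-line by delta_v."""
    return c_bitline * delta_v / i_cell


C_BL = 1e-12      # assumed bit-line capacitance (F), illustrative only
I_CELL = 100e-6   # assumed cell read current (A), illustrative only
V_SENSE = 0.1     # bit-line signal the sense amplifier can detect (V)
VDD = 3.0         # supply voltage (V)

t_sense = signal_develop_time(C_BL, V_SENSE, I_CELL)
t_full = signal_develop_time(C_BL, VDD, I_CELL)

print(f"time to develop 100 mV   : {t_sense * 1e9:.2f} ns")
print(f"time to slew a full 3 V  : {t_full * 1e9:.2f} ns")
print(f"charge saved by stopping : {C_BL * (VDD - V_SENSE) * 1e12:.2f} pC")
```

With these assumed values the sense amplifier can fire after about 1 ns, whereas letting the bit-line slew fully would take about 30 ns and remove far more charge, which is the motivation for limiting the word-line pulse width.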

Fig.7 Two common types of sense amplifiers: a) current mirror amplifier, b) latch-type amplifier.
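As a numerical check on the divided word line argument of Section 2.1, the following Python sketch models the word line as a uniform distributed RC line and uses the Elmore approximation, delay = 0.5 * R_total * C_total. The per-cell resistance and capacitance are hypothetical placeholder values chosen only for illustration; the ratios, which are what matter here, do not depend on them.

```python
def wordline_delay(n_cells, r_per_cell, c_per_cell):
    """Elmore delay (s) of a distributed RC word line spanning n_cells."""
    r_total = n_cells * r_per_cell
    c_total = n_cells * c_per_cell
    return 0.5 * r_total * c_total


R_CELL = 1.0     # assumed word-line resistance per cell (ohm), placeholder
C_CELL = 1e-15   # assumed word-line capacitance per cell (F), placeholder

undivided = wordline_delay(256, R_CELL, C_CELL)      # one 256-column word line
local = wordline_delay(64, R_CELL, C_CELL)           # k = 4 sections of 64 columns
center_driven = wordline_delay(32, R_CELL, C_CELL)   # driver in the segment center

print(f"k = 4 division      : {undivided / local:.0f}x lower delay")
print(f"plus center driving : {undivided / center_driven:.0f}x lower delay")
```

Dividing the line into k = 4 sections lowers the delay by k^2 = 16, and driving each 64-column segment from its center halves the driven length again for a further factor of four.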

(29) Chapter 3 Lambda Bipolar Transistor Memory Cell Negative differential resistance semiconductor devices have been known for their memory application. The negative differential resistance or the folded I-V characteristics of devices makes it possible to have the multiple stable states with good margins in a simple circuit consisting of just a few devices. This fact was recognized by several researchers and several compact multiple-valued storage functions had been described in the literature. For example, Thomas et al. [45] had described a voltage-controlled negative differential resistance device (NEGIT) made by a bipolar transistor and an extended field plate over the emitter-base junction. The operation of NEGIT depends on some uncontrollable parameters such as surface recombination velocity and surface state. Wu et al. [46~47] had presented another voltage-controlled negative differential resistance device, called Lambda bipolar transistor (LBT), which merges a NMOS with a bipolar transistor. The LBT’s, in particular, had shown clear voltage-controlled negative differential resistance characteristics, which is. advantageous. for. functional. circuit. applications.. Moreover,. planar-structure LBT’s has also been realized to meet the demand for high-level integration. In recent years, quantum devices with carrier transport of resonant tunneling (RT) had been developed, several attempts were made on the RT structures to obtain multiple negative differential resistance characteristics [48]. Based on the multiple negative differential 16.

(30) resistance concepts, many resonant tunneling devices had been developed and fabricated to implement a memory cell [49-51]. However, these structures need the external bias source to separate the peaks and are difficult to incorporate into a bipolar transistor to exploit the additional advantages of high gain and good input-output isolation. Also, they will be difficult to fabricate over million-transistor circuit in III-V technology, and one of the shortcomings is the absence of a reasonable density low-power on-chip memory. In the past, several earlier works on single-side memory cells had been conducted for both power and area reduction. Among these, Takagi et al. [6] proposed Dual Depletion CMOS memory cell; Schrader et al. [7] proposed a static memory based Schmitt trigger circuit; Elmasry et al. [8] proposed double-Lambda diode (DOL) memory cells, and they also proposed SDW MOSFET memory cell [9] by using single–device well MOSFET’s. To our knowledge, none of them has been implemented so far for practical applications. Besides, as the advance of submicrometer device fabrication technology, the area of memory cells is continuously scaled down, leading to fine metal bit-line problems. The fine metal bit-lines in a high-resistance poly load or ploy-PMOS load cell will induce large signal delays, or high current density could cause reliability degradation. For these reasons, several new works on single-bit line SRAM’s were proposed. Sasaki et al. [10] had proposed a high-density 16-Mb SRAM, and a CMOS flip-flop circuit is acted as the storage element with only one access transistor. However, the noise margin is small based on this structure. Ukita et al. [11] had proposed an ultra-low power SRAM. They used the same CMOS flip-flop as the 17.

storage element but with two series-connected access transistors on one side: one for X address selection and the other for Y address selection. This structure can achieve a very low power requirement. However, as the driver-to-load ratio is small, the delay time becomes significant.

Fig.8 A vertical Lambda bipolar transistor structure.

The proposed new memory cell is based on Wu's Lambda bipolar transistor (LBT) developed in the 1980's. The LBT is a highly integrated device characterized by its voltage-controlled negative differential resistance, and has been used successfully in many applications [52-53]. If the LBT is used in a static random access memory cell, however, the standby current in one of its storage states is relatively high. For this reason, a new LBT is proposed for low power applications.

3.1 New Lambda Bipolar Transistor

The basic structure of the new Lambda bipolar transistor and its electrical equivalent circuit are shown in Fig.8 and Fig.9, respectively. As shown in Fig.8, n-channel enhancement-mode MOSFET's are fabricated on the base region of a vertical NPN bipolar transistor; the resulting device is called the vertical Lambda bipolar transistor (VLBT). The source of one of the MOSFET's (labeled E') is used as the emitter of the vertical NPN bipolar transistor, while the p-type diffusion well and the n-type epi-layer act as the base and the collector, respectively. The equivalent circuit in Fig.9 shows the interconnections clearly. Note that, in ordinary circuit applications, E' is biased at a voltage level lower than B' and C'. Therefore, the junction at E' is the only PN junction that can be turned on, i.e., the other three source/drain regions have no chance to act as the emitter of the vertical bipolar transistor.

Fig.9 An equivalent circuit of a vertical Lambda bipolar transistor.

The vertical Lambda bipolar transistor is operated in the same way as

the conventional bipolar transistor with a fixed external base current. From the terminal characteristics of the separate devices, the general equations for the proposed VLBT, according to the circuit model of Fig.9, can be written as

$$I_{B'} = \begin{cases} K_1\left[(V_{C'E'} - V_{T1})\,V_{B'E'} - \dfrac{V_{B'E'}^2}{2}\right] & \text{if } (V_{C'E'} - V_{T1}) > V_{B'E'} \\[2ex] \dfrac{K_1}{2}\,(V_{C'E'} - V_{T1})^2 & \text{if } (V_{C'E'} - V_{T1}) < V_{B'E'} \end{cases} \qquad (1)$$

where $K_1 = C_{ox}\mu_n W_1/L_1$ and $V_{T1}$ is the threshold voltage of M1,

$$I_{C'} = I_C + I_B \qquad (2)$$

$$I_B = \begin{cases} K_2\left[(V_{B'E'} - V_{T'})(V_{C'E'} - V_{BE'}) - \dfrac{(V_{C'E'} - V_{BE'})^2}{2}\right] & \text{if } (V_{B'E'} - V_{T2}) > V_{C'E'} \\[2ex] \dfrac{K_2}{2}\,(V_{B'E'} - V_{T'})^2 & \text{if } (V_{B'E'} - V_{T2}) < V_{C'E'} \end{cases} \qquad (3)$$

where $K_2 = C_{ox}\mu_n W_2/L_2$, $V_{T2}$ is the threshold voltage of M2, and $V_{T'} = V_{T2} + V_{BE'}$, and

$$I_C = \beta I_B + I_{CO}(1 + \beta) \qquad (4)$$

where $\beta$ is the dc common-emitter current gain of the NPN bipolar transistor, and $I_{CEO} = I_{CO}(1+\beta)$ is the common-emitter collector reverse saturation current. A current source load M3 operated in the saturation region is chosen for the derivation. Its current equation can be written as

$$I_{B'} = \frac{K_3}{2}\,(V_{GG} - V_{B'E'} - V_{T3})^2 \quad \text{for } V_{GG} > (V_{B'E'} + V_{T3}) \qquad (5)$$

where $K_3 = C_{ox}\mu_n W_3/L_3$, $V_{T3}$ is the threshold voltage of M3, and $V_{GG}$ is the power supply connected to the drain of M3. In order to get analytical expressions, the body effects are assumed to be negligible. To show quantitatively the operating principles of the VLBT characteristic in Fig.10, a six-region analysis is given as follows:

Region I: If $V_{C'E'} < V_{BE'(on)} < V_{T1}$, M1 is off, M2 is operated in the linear region, and Q1 is off. In this region $I_B = 0$, thus $I_{C'} = I_{CO}(1+\beta)$.

Region II: If $V_{BE'(on)} < V_{C'E'} < V_{T1}$ and $V_{B'E'} = V_{GG} - V_{T3} - K\left[\sqrt{2\phi_{fp} + V_{B'E'}} - \sqrt{2\phi_{fp}}\right] > V_{T'}$ (where $K$ is the substrate body-effect factor), M1 is off, M2 is kept in the linear region, and Q1 is operated in the forward-active region. Solving for $V_{B'E'}$, i.e.

$$V_{B'E'} = V_{GG} - V_{T3} + K\sqrt{2\phi_{fp}} + \frac{K^2}{2} - K\left(V_{GG} - V_{T3} + K\sqrt{2\phi_{fp}} + \frac{K^2}{4} + 2\phi_{fp}\right)^{1/2} \qquad (6)$$

we get the output current in this region:

$$I_{C'} = (1+\beta)\left\{K_2\left[\left(V_{GG} - V_{T3} + K\sqrt{2\phi_{fp}} + \frac{K^2}{2} - K\left(V_{GG} - V_{T3} + K\sqrt{2\phi_{fp}} + \frac{K^2}{4} + 2\phi_{fp}\right)^{1/2} - V_{T'}\right)(V_{C'E'} - V_{BE'}) - \frac{(V_{C'E'} - V_{BE'})^2}{2}\right] + I_{CO}\right\} \qquad (7)$$

Region III: If $V_{T1} < V_{C'E'} < V_{B'E'} - V_{T2}$ and assuming that $V_{B'E'} > V_{T'}$, M1 is operated in the saturation region, M2 is kept in the linear region, and Q1 is still in

the forward-active region. Solving for $V_{B'E'}$ by equating (1) and (5), we obtain

$$V_{B'E'} = V_{GG} - \sqrt{\frac{K_1}{K_3}}\,(V_{C'E'} - V_{T1}) - V_{T3} \qquad (8)$$

and the output current in this region is

$$I_{C'} = (1+\beta)\left\{K_2\left[\left(V_{GG} - \sqrt{\frac{K_1}{K_3}}\,(V_{C'E'} - V_{T1}) - V_{T3} - V_{T'}\right)(V_{C'E'} - V_{BE'}) - \frac{(V_{C'E'} - V_{BE'})^2}{2}\right] + I_{CO}\right\} \qquad (9)$$

The peak current is $I_P = I_{C'}\big|_{V_{C'E'} = V_P}$, where the peak voltage $V_P$ can be derived by letting $\partial I_{C'}/\partial V_{C'E'} = 0$, i.e.

$$V_P = \frac{K_3\,(V_{GG} - V_{T2} - V_{T3}) + \sqrt{K_1 K_3}\,(V_{BE'} + V_{T1})}{K_3 + 2\sqrt{K_1 K_3}} \qquad (10)$$

Region IV: If $V_{B'E'} - V_{T2} < V_{C'E'} < V_{B'E'} + V_{T1}$ and $V_{B'E'} > V_{T'}$, M1 and M2 are both operated in the saturation region, and Q1 is operated in the forward-active region. Using equations (1), (2), (4), and (8), we obtain

$$I_{C'} = (1+\beta)\left\{\frac{K_2}{2}\left[V_{GG} - \sqrt{\frac{K_1}{K_3}}\,(V_{C'E'} - V_{T1}) - V_{T3} - V_{T'}\right]^2 + I_{CO}\right\} \qquad (11)$$

By differentiating (11) with respect to $V_{C'E'}$, the output resistance in this region can be written as

$$R_O = \frac{-1}{(1+\beta)\,K_2\sqrt{\dfrac{K_1}{K_3}}\left[V_{GG} - \sqrt{\dfrac{K_1}{K_3}}\,(V_{C'E'} - V_{T1}) - V_{T3} - V_{T'}\right]} \qquad (12)$$

Region V: When $V_{C'E'} > V_{B'E'} + V_{T1}$, M1 is operated in the linear region and M2 is operated in the saturation region.

Assuming $V_{B'E'} > V_{T'}$, Q1 is operated in the forward-active region. The output current is $I_{C'} = (1+\beta)\left[\dfrac{K_2}{2}(V_{B'E'} - V_{T'})^2 + I_{CO}\right]$. Solving (1) and (5) gives

$$V_{B'E'} = \frac{K_1(V_{C'E'} - V_{T1}) + K_3(V_{GG} - V_{T3})}{K_1 + K_3} - \frac{\left\{\left[K_1(V_{C'E'} - V_{T1}) + K_3(V_{GG} - V_{T3})\right]^2 - K_3(K_1 + K_3)(V_{GG} - V_{T3})^2\right\}^{1/2}}{K_1 + K_3} \qquad (13)$$

By the chain rule, we have the output resistance

$$R_O = \frac{1}{\dfrac{\partial I_{C'}}{\partial V_{B'E'}}\,\dfrac{\partial V_{B'E'}}{\partial V_{C'E'}}} = \frac{K_1 + K_3}{(1+\beta)\,K_2\,(V_{B'E'} - V_{T'})}\left[K_1 - \frac{2K_1\left[K_1(V_{C'E'} - V_{T1}) + K_3(V_{GG} - V_{T3})\right]}{2\left\{\left[K_1(V_{C'E'} - V_{T1}) + K_3(V_{GG} - V_{T3})\right]^2 - K_3(K_1 + K_3)(V_{GG} - V_{T3})^2\right\}^{1/2}}\right]^{-1} \qquad (14)$$

The valley voltage $V_V$ can be obtained by solving $V_{B'E'}(V_{C'E'}) = V_{T'}$, i.e.

$$V_V = \frac{K_3\,(V_{GG} - V_{T'} - V_{T3})^2 + K_1\,(V_{T'}^2 + 2V_{T'}V_{T1})}{2K_1 V_{T'}} \qquad (15)$$

Region VI: When $V_{C'E'} > V_V$, M1 is still operated in the linear region, and M2 and Q1 are both off. Thus, $I_B = 0$ and $I_{C'} = I_{CO}(1+\beta)$. The output dc characteristic of the new VLBT is shown in Fig.10.

3.2 Description of the New Memory Cell

The performance of an SRAM strongly depends on the design of its memory cell. Generally, a full CMOS cell is suitable for low power design with acceptable speed. However, it has a significant area penalty over a high-resistance poly load or poly-PMOS load cell. On the contrary,

the fine metal bit-lines in a high-resistance poly load or poly-PMOS load cell will induce large signal delay or high current density, causing reliability degradation. In this thesis, a new single-sided memory cell is proposed to solve these problems.

Fig.10 The I-V characteristics of a vertical Lambda bipolar transistor.

The general configuration of the proposed static random access memory cell is shown in Fig.11; it consists of a VLBT, a load element, a current source device, and an access transistor. Owing to the negative differential resistance of the VLBT, the storage node SN has two dc static points (see Fig.12). Two kinds of load elements, current source-like and resistance-like, can be selected for different applications. For a current source-like load, the current flow at the static points SC1 and SC2 can both be small if the circuit is well configured. On the contrary, a

resistance-like load memory cell generally suffers dc current flow at the lower static state SR2. However, it occupies a relatively smaller area than a cell with a current source-like load.

Fig.11 General configuration of a new memory cell.

Fig.12 The I-V characteristics of a new memory cell with current and resistive load
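The negative differential resistance that gives the storage node its two static points can be checked numerically from the Section 3.1 expressions. The following is a minimal sketch, not the thesis's simulation setup: it evaluates Eq. (8) for V_B'E' and the region III and region IV collector currents of Eqs. (9) and (11). All parameter values in `PARAMS`, the function names, and the helper `region34_boundary` are assumptions chosen for illustration only.

```python
import math

# Hypothetical device parameters (illustrative only, not taken from the thesis).
PARAMS = dict(vgg=5.0, k1=1e-4, k2=1e-4, k3=1e-4,
              vt1=0.7, vt2=0.7, vt3=0.7, vbe=0.7,
              beta=100.0, ico=1e-12)


def vbe_prime(vce, p):
    """Eq. (8): V_B'E' when M1 and the load M3 are both saturated."""
    s = math.sqrt(p["k1"] / p["k3"])
    return p["vgg"] - s * (vce - p["vt1"]) - p["vt3"]


def ic_region3(vce, p):
    """Eq. (9): collector current with M2 in the linear region."""
    vt_eff = p["vt2"] + p["vbe"]            # V_T' = V_T2 + V_BE'
    vds2 = vce - p["vbe"]                   # drain-source voltage of M2
    ib = p["k2"] * ((vbe_prime(vce, p) - vt_eff) * vds2 - vds2 ** 2 / 2)
    return (1 + p["beta"]) * (ib + p["ico"])


def ic_region4(vce, p):
    """Eq. (11): collector current with M2 saturated."""
    vt_eff = p["vt2"] + p["vbe"]
    ib = p["k2"] / 2 * (vbe_prime(vce, p) - vt_eff) ** 2
    return (1 + p["beta"]) * (ib + p["ico"])


def region34_boundary(p):
    """V_C'E' at which V_C'E' = V_B'E' - V_T2 (edge between regions III and IV)."""
    s = math.sqrt(p["k1"] / p["k3"])
    return (p["vgg"] + s * p["vt1"] - p["vt2"] - p["vt3"]) / (1 + s)


if __name__ == "__main__":
    vb = region34_boundary(PARAMS)
    print("boundary V_C'E' [V]:", vb)
    print("I_C' just inside region IV [A]:", ic_region4(vb + 0.1, PARAMS))
```

With these assumed values the two expressions agree at the region boundary, and the region IV current decreases as V_C'E' increases, which is the negative-slope segment of the Lambda-shaped characteristic.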

The new memory cell based on the proposed VLBT is presented in Fig.13. In the memory cell configuration, M1, M2 and Q1 operate as a VLBT storage element, M3 acts as a current source, M4 acts as the load element, and M5 is the access transistor. When Vx is at the low stable state SC1, any positive noise causes Vx to increase slightly. At this moment, IC' is larger than IDS4, so that Vx is discharged back to SC1. If any negative noise causes Vx to decrease slightly, IDS4 is larger than IC', so that Vx is charged back to SC1. This demonstrates why this state is stable. The same argument applies to the state SC2 to verify that it is also stable.

Fig.13 A new SRAM memory cell circuit.

If any positive noise is introduced while Vx is at the switching state SW, the positive differential current IDS4-IC' will charge the node X to the high stable state SC2, and the memory cell no longer stays in the state SW. On the other hand, if negative noise is introduced while the memory cell is in the state SW, the negative differential current IDS4-IC' causes the node X to be discharged to SC1. Both types of noise (positive and negative) cause a

transition from the state SW to either the stable state SC1 or SC2. The stored voltage levels are CMOS-like, i.e., a full swing between ground and the supply voltage is obtained. Fig.14 compares the static noise margin (SNM) of the new memory cell with that of the cell proposed in [48], which is referred to as the LBT configuration. In the LBT configuration, the voltage of the storage node at any instant is the base-emitter voltage, and hence is always less than 1V. The SNM of the new memory cell (VLBT) and of the LBT configuration are about 1.2V and 0.4V, respectively, so the new memory cell has the larger SNM. Fig.14 also shows that the LBT configuration requires an adequate circuit to sense the state of the cell, because its switching point is less than 1V.

Fig.14 The static transfer characteristics of the memory cells.
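The noise argument used for SC1, SC2, and SW can be sketched as a small sign test on the net current IDS4-IC' flowing into node X. This is an illustrative sketch only: the cubic `toy_net_current` and the node voltages `SC1`, `SW`, `SC2` are hypothetical stand-ins for the real load and VLBT curves, chosen so that the net current has three zero crossings.

```python
# Hypothetical static points of node X in volts (illustrative values only).
SC1, SW, SC2 = 0.2, 2.5, 4.8


def toy_net_current(vx):
    """Hypothetical net current I_DS4 - I_C' into node X, with zeros at SC1, SW, SC2."""
    return -1e-6 * (vx - SC1) * (vx - SW) * (vx - SC2)


def classify(vx, i_net, dv=1e-3):
    """Classify an equilibrium of node X by the sign of the net current around it.

    Stable:   a small rise in Vx gives a discharging (negative) net current and
              a small drop gives a charging (positive) one, so Vx is restored.
    Unstable: the signs are reversed, so any disturbance grows.
    """
    up = i_net(vx + dv)
    down = i_net(vx - dv)
    if up < 0 < down:
        return "stable"
    if down < 0 < up:
        return "unstable"
    return "indeterminate"


if __name__ == "__main__":
    for name, v in (("SC1", SC1), ("SW", SW), ("SC2", SC2)):
        print(name, classify(v, toy_net_current))
```

Under these assumptions the two outer points are restored after a disturbance, while the middle point SW moves toward SC2 for positive noise and toward SC1 for negative noise, matching the description above.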

3.3 Performance of the New Memory Cell

Extensive circuit simulations have been carried out to verify the circuit operation and its performance characteristics. The performance of the proposed circuit is evaluated based on a 5V, 0.5um BiCMOS technology. The simulation results are based on 1ns rise and fall times.

3.3.1 Write Operation

In the static memory cell, the write operation is performed by forcing a high or low voltage onto the bit-line. The operation cycle starts at 3ns by turning on the access transistor M5 with a word-line pulse of 1ns rise time.

Fig.15 Write "0" operation

When changing the binary state of the memory circuit from 1 to 0, the voltage level of node X rapidly decreases. Because the transistor M1

is turned off and the transistors M2 and Q1 are turned on, the internal capacitor of node Y is charged very fast via the transistor M3. The simulation result is shown in Fig.15. Changing the binary state from 1 to 0 takes only about 0.5ns. When changing the binary state from 0 to 1, the voltage level of node X initially increases very fast because the current through the transistor M4 is increased. But as the node voltage Vx increases, the access transistor as well as the transistor M3 is turned off. The internal capacitor of node X is then charged more slowly via the load transistor M4. The simulation result is shown in Fig.16. Changing the binary state from 0 to 1 takes about 1.5ns.

Fig.16 Write "1" operation.

3.3.2 Read Operation

The stored data of a memory cell selected by the word-line and the

column decoder has to be read nondestructively. For the read operation, the bit-line capacitor CBL is precharged to the reference voltage level Vref and is then left floating. The bit-line voltage versus time during the read cycle is calculated by assuming that the memory cell has to drive a bit-line capacitance CBL of 1pF. When reading a binary 0, the bit-line capacitor has to be discharged via the transistor Q1. The current flowing from the bit-line into the circuit should be low enough that the voltage level of node X does not cross the switching point SW. This current increases with increasing precharge voltage level on the bit-line, which means that the precharge voltage has an upper limit. For a precharge voltage higher than this upper limit, the circuit becomes unstable and switches into the opposite binary state, and the information in the memory cell is destroyed during readout. The read "0" operation is shown in Fig.17.

Fig.17 Read "0" operation

When reading a binary 1, the bit-line capacitor is charged via the transistor M4, and the load current will cause the node voltage Vx to fall. To prevent the voltage level of node X from crossing the switching point SW, the load current level should be higher. Therefore, when reading a binary 1, the precharge voltage level has a lower limit. The read "1" operation is shown in Fig.18.

Fig.18 Read "1" operation.

3.3.3 Comparisons

Fig.19 compares the transient analysis of the read "0" operation for different load capacitances on the bit-line. Since the bipolar transistor has to go from cut-in to the forward-active region, the delay time of the proposed memory cell does not differ much from that of a conventional single-sided CMOS memory cell for a small

bit-line capacitance. However, for a large bit-line capacitance, the proposed memory cell is superior to the conventional one because it has a large cell current. Note that for a heavily loaded bit-line, the conventional memory cell is read destructively, i.e., its storage state is changed from "0" to "1" after the read operation. In contrast, the proposed memory cell maintains its delay trend as the bit-line capacitance increases, because the charge required to change the state of the proposed cell from "0" to "1" is relatively large compared with the conventional one, which is important for nondestructive read operation.

Fig.19 Sensing delay versus bit-line capacitance.

Chapter 4 New Current-Mode Sense Amplifier

During the read access cycle, the sense amplifier is one of the most critical elements of the memory circuit. The conventional sense amplifier is based on the voltage-mode technique, but its sensing time increases as the bit-line capacitance increases and its AC operating power consumption is very large. Several design techniques have been proposed in the past to reduce the power dissipation of static RAM [54]. In addition, several current-mode sensing circuits [55-57] have been proposed to overcome the possible speed degradation due to larger bit-line or data-line capacitances.

4.1 Introduction

Due to their great importance in memory performance, sense amplifiers have become a very large class of circuits. Their main function is to sense or detect the stored data from a read-selected memory cell. Fig.20 shows a typical use of a sense amplifier.

Fig.20 Typical use of a sense amplifier

The memory cell being read produces a current "IDATA" that removes some of the charge (dQ) stored on the pre-charged bit-lines. Since the bit-lines are very long and are shared by other similar cells, the parasitic resistance "RBL" and capacitance "CBL" are large. Thus, the resulting bit-line voltage swing (dVBL) caused by the removal of "dQ" from the bit-line is very small, i.e., dVBL=dQ/CBL. Sense amplifiers are used to translate this small voltage signal to a full logic signal that can be further used by digital logic. The need for increased memory capacity, higher speed, and lower power consumption has defined a new operating environment for future sense amplifiers. Below are some of the effects of increased memory capacity and decreased supply voltage:

1) Increasing the number of memory cells per bit-line increases CBL, while an increase in the length of the bit-line increases RBL.
2) Decreasing the memory cell area to integrate more memory cells in a single chip reduces the current IDATA that drives the heavily loaded bit-line. This, coupled with increased CBL, causes an even smaller voltage swing on the bit-line.
3) Decreasing the supply voltage results in smaller noise margins, which in turn affect sense amplifier reliability.

In this chapter, new current-mode sense amplifiers are presented and their ability to deal with these newly imposed operating conditions is examined.
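The relation dVBL=dQ/CBL can be made concrete with a short calculation. This is a minimal sketch under a simplifying assumption that the cell current IDATA is constant over the sensing interval, so that dQ = IDATA * t; the numerical values and the function name `bitline_swing` are hypothetical and serve only to show how the swing shrinks as CBL grows.

```python
def bitline_swing(i_data, t_sense, c_bl):
    """Bit-line voltage swing dV_BL = dQ / C_BL, with dQ = I_DATA * t_sense.

    Assumes a constant cell current over the sensing interval (illustrative).
    i_data in amperes, t_sense in seconds, c_bl in farads; returns volts.
    """
    return i_data * t_sense / c_bl


if __name__ == "__main__":
    # Hypothetical example: 100 uA cell current sensed for 1 ns.
    for c_bl in (1e-12, 2e-12, 4e-12):
        swing = bitline_swing(100e-6, 1e-9, c_bl)
        print(f"C_BL = {c_bl * 1e12:.0f} pF -> dV_BL = {swing * 1e3:.1f} mV")
```

For the assumed current and time, a 1 pF bit-line develops a 100 mV swing, and quadrupling the capacitance cuts the swing to a quarter, which is the trend described in items 1) and 2).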

4.2 Voltage Sensing and Current Sensing

Current sensing, or current-mode sensing, as the name suggests, is the sensing technique that determines the logic value present on a wire based on the current through the wire. The difference between current sensing and voltage sensing is very subtle for conventional CMOS. MOS transistors have a voltage threshold but no current threshold, and hence they conventionally determine the signal state by sensing the voltage.

4.2.1 Theoretical Model

Theoretically, voltage-mode signaling can be modeled as shown in Fig.21. In the voltage mode, the driver drives the interconnect, which is terminated with an open circuit (RL ≈ ∞). This allows the voltage at the destination to change based on the input voltage. The sensing circuit at the destination then has to determine the signal state from this voltage value.

Fig.21 Theoretical voltage-mode signal model.

However, in the case of current sensing, the signal is transmitted by a current pulse. The theoretical representation for current sensing is shown in Fig.22. In current sensing, the driver drives a line that is terminated by a short (RL ≈ 0). Hence, there exists a path for the current to flow, and the sensing circuit at the end of the line has to detect this current to determine the signal value.

Fig.22 Theoretical current-mode signal model.

As shown in Fig.23, the conventional way of signaling is voltage-mode. An inverter acting as a driver drives the interconnect, which builds up a voltage at the end of the line. Since the line ends at the gates of transistors, RL ≈ ∞. The voltage sensing circuit is another inverter and, since MOS transistors have voltage thresholds that turn them on or off, the output of the inverter depends on the voltage at its gate. The biggest challenge in current-mode signaling is to design efficient sensing circuitry that detects the change in current. A normal driver can be used to drive the interconnect and to drive current instead of voltage,

and the end of the line should provide a path to ground. Thus, a current-mode sensing setup would look like the one in Fig.24.

Fig.23 CMOS representation for a voltage-mode signal model.

Fig.24 CMOS representation for a current-mode signal model.

The main difference between current-mode and voltage-mode signaling is the termination of the interconnect. In the case of current-mode, the termination resistance is very small, while in the case of voltage-mode it is very large. Since current is used as the means of signaling in current-mode and there must be a path to ground from the driver, static power dissipation is expected in current-mode signaling. Also, the receiving (sensing) circuit is complex in current-mode as MOS

transistors do not have a current threshold. Also, since there is a low-impedance path to ground at the end of the line, the capacitance of the interconnect is not charged to Vdd but to an intermediate value. Since sensing current with MOS transistors is not trivial, most current-mode sensing is done differentially. This may require some synchronizing (precharging or pre-equalizing) signal.

4.2.2 Voltage-Mode and Current-Mode Signal Delay

The use of current sensing amplifiers has a number of benefits over voltage sensing amplifiers. The most important ones are significant reductions in bit-line voltage swing and major reductions in sensing delay [58]. These benefits translate to lower dynamic power consumption and increased sensing speed. The key to these improvements lies in the low input resistance of the current sensing amplifier. This becomes evident when examining the equivalent sensing circuit in Fig.25.

Fig.25 A long interconnect model.

In this model, we assume that the output current is a linear-ramp signal as shown in Eq. (4-1), i.e.,

$$i_o = p_o\,(t - \delta t) \qquad (4\text{-}1)$$

where $i_o$ is the output current, $p_o$ is the constant slope, and $\delta t$ is the delay. The analysis shows that the delay of a line is given by the following equation:

$$\delta t = \frac{R_T C_T}{2}\left(\frac{R_B + \dfrac{R_T}{3} + R_L}{R_B + R_T + R_L}\right) + R_B C_T\left(\frac{R_L}{R_B + R_T + R_L}\right) \qquad (4\text{-}2)$$

where $R_T$ and $C_T$ are the total bit-line resistance and capacitance. In a voltage-mode signal path, the RC line modeled in the above circuit is terminated by an open circuit, which means that the resistor $R_L$ is extremely large. When $R_L \gg R_B$, it can be assumed to be infinite in the above equation. Therefore, the time constant is given by

$$\delta t = \frac{R_T C_T}{2}\left(1 + \frac{2R_B}{R_T}\right) \qquad (4\text{-}3)$$

When we consider the behavior of a current-mode signal path, the output loading of the long interconnect line is always a low resistance (ideally zero). Therefore, the $R_L$ in Eq. (4-2) can be ignored, so the time constant is given by

$$\delta t = \frac{R_T C_T}{2}\left(\frac{R_B + \dfrac{R_T}{3}}{R_B + R_T}\right) \qquad (4\text{-}4)$$

Fig.26 shows a comparison of voltage sensing and current sensing, Eq. (4-3) and Eq. (4-4). The figure shows that current sensing has less delay than voltage sensing. Actually, the load resistance for

current sensing is not zero, and so the effect of a nonzero load resistance should be studied. Fig.27 shows a comparison of current sensing with different load resistances.

Fig.26 Comparison of voltage sensing and current sensing.

Fig.27 Comparison of voltage sensing and current sensing with different values of load resistance.
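The delay expressions of Eqs. (4-2) to (4-4) are simple enough to evaluate directly. The sketch below is an illustration rather than the thesis's own analysis code: it implements the general line delay with a finite load resistance and its two limiting cases, using the symbols RB (driver resistance), RT and CT (total line resistance and capacitance), and RL (load resistance). The function names and the example values are assumptions.

```python
def line_delay(r_b, r_t, c_t, r_l):
    """Eq. (4-2): delay of an RC line driven through r_b and terminated by r_l."""
    total = r_b + r_t + r_l
    return (r_t * c_t / 2) * (r_b + r_t / 3 + r_l) / total + r_b * c_t * r_l / total


def voltage_mode_delay(r_b, r_t, c_t):
    """Eq. (4-3): limit of Eq. (4-2) for an open-circuit termination (r_l -> infinity)."""
    return (r_t * c_t / 2) * (1 + 2 * r_b / r_t)


def current_mode_delay(r_b, r_t, c_t):
    """Eq. (4-4): limit of Eq. (4-2) for a short-circuit termination (r_l = 0)."""
    return (r_t * c_t / 2) * (r_b + r_t / 3) / (r_b + r_t)


if __name__ == "__main__":
    # Example values (illustrative): 1 kOhm driver, 100 Ohm line, 1 pF line capacitance.
    r_b, r_t, c_t = 1e3, 100.0, 1e-12
    print("voltage mode [ns]:", voltage_mode_delay(r_b, r_t, c_t) * 1e9)
    print("current mode [ns]:", current_mode_delay(r_b, r_t, c_t) * 1e9)
    for r_l in (0.0, 10.0, 100.0, 1e3):
        print(f"R_L = {r_l:g} Ohm -> {line_delay(r_b, r_t, c_t, r_l) * 1e9:.4f} ns")
```

With these example values the voltage-mode delay evaluates to 1.05 ns and the current-mode delay to about 0.047 ns, and sweeping `r_l` shows how a nonzero load resistance moves the delay between the two limits.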

As expected, an increase in load resistance increases the delay of the current-sensing technique, but the increase is not very significant for a low interconnect resistance and/or a low driver resistance. The plots show that the delay in both the current-sensing and the voltage-sensing technique increases quadratically with the length of the line (represented by the resistance of the line in the plots). To estimate the signal path delay of a long interconnect line, we assume that the source resistance is 1kΩ, the total capacitance distributed along the line is 1pF, and the total resistance of the line is 100Ω. The time constant of the voltage-mode signal path is then 1.05ns. Under the same assumptions, the time constant of the current-mode signal path is estimated to be 0.047ns. We can make a further approximation: since RB>>RT, the delay for voltage-mode can be approximated as RBCT, and the delay for current-mode can be approximated as RTCT/2, so that with RB>>RT current-mode is faster than voltage-mode. A plot comparing the approximate voltage-mode and current-mode delays is shown in Fig.28. Based on the above analysis, for a given capacitive loading, the time constant of a long interconnect line can be reduced by reducing the load resistance RL. When the next stage is a voltage-mode circuit, it always acts as a capacitive load, so the load resistance is much larger than the line resistance, which results in a long signal propagation delay. In order to shorten the time constant of a long interconnect line, we can make

the next stage a current-mode circuit. In this way, the load RL can be reduced below the line resistance. Even when the two are of the same order, the delay of a long line can be shortened by about an order of magnitude or more. Hence, a low-resistance input node in the next stage speeds up signals that pass through long interconnect lines.

Fig.28 Comparison of voltage-sensing and current-sensing with approximations

4.3 Voltage-Mode Sense Amplifier

Voltage-mode sense amplifiers have been known for a long time; the simplest voltage sensing amplifier is the differential couple [2]. Fig.29 shows a schematic diagram of a simple differential couple with its inputs and outputs labeled. During a read, the input nodes (VIN+ and VIN-) are pre-charged to VPRE, causing the output nodes (VOUT+ and VOUT-) to

stay at the same level. The read-selected cell is then asserted and a small voltage swing appears on the bit-lines. This small voltage swing is amplified by the differential couple and later used to drive digital logic.

Fig.29 Simple differential couple schematic.

Another version of the voltage sense amplifier that has enjoyed wide usage is the full complementary positive feedback differential sense amplifier. This voltage sense amplifier has a very large differential gain and the added ability to automatically rewrite destructively read data [59]. Fig.30 shows the schematic diagram of the full complementary positive feedback amplifier. The positive feedback amplifier has two data nodes, VIN/OUT1 and VIN/OUT2, and three control nodes, SANEN, SAPEN and PRE. Nodes VIN/OUT1 and VIN/OUT2 act as both input and output of the sense amplifier. Its operation is as follows: 1) the data nodes are equalized using PRE; 2) the memory cell being read is asserted and a small voltage difference forms on nodes VIN/OUT1 and VIN/OUT2; 3) while MN1 and MN2 are biased to be operated

in the saturation region, MN6 is turned on by SANEN; 4) as both VIN/OUT1 and VIN/OUT2 decrease in voltage, so does the difference between them; 5) one of them decreases much faster than the other and causes MN(1 or 2) to enter cutoff while the other starts operating in triode; 6) at this point MP5 is turned on by SAPEN, which pulls the signals rapidly apart; 7) at this point, since VIN/OUT1 and VIN/OUT2 are directly connected to the bit-lines, the data is automatically written back to the destructively read memory cell.

Fig.30 Full complementary positive feedback amplifier schematic.

Due to its positive feedback, this voltage sensing amplifier achieves a very high differential gain. This high gain minimizes sensing time by making it possible to sense small voltage swings on the bit-line. However, since the bit-line capacitance is growing along with memory capacity, the bit-line voltage swing is becoming smaller and more power-expensive to produce. There also exists a practical limit to this decreasing voltage swing. When the bit-line voltage swing reaches the same magnitude as the bit-line noise, the voltage sense amplifier becomes unusable.

Therefore, to achieve the stated objectives of large memory capacity, high speed, and low power, a new type of sense amplifier is needed.

4.4 Clamped Bit-Line Sense Amplifier

A commonly used current-mode sensing amplifier is the clamped bit-line sense amplifier [58] shown in Fig.31. By clamping the voltage on the bit-line to a stable voltage (VREF), the signal current produced by the cell can be transferred to an internal sense amplifier node without charging or discharging the large bit-line capacitance. As a result, both sensing delay and dynamic power consumption are significantly decreased.

Fig.31 Clamped bit-line sense amplifier.

This sense amplifier uses three pre-charge and equalization transistors (M7, M8 and M9), two current sensing transistors (M5 and M6)

and four transistors in a back-to-back inverter configuration for the voltage output stage (M1, M2, M3, M4). Its operation follows two stages, pre-charge/equalization and sensing. The timing schedule is as follows: 1) transistors M7, M8, and M9 are turned on to pre-charge and equalize the sensing nodes; 2) transistors M7 and M8 are turned off and the memory cell is accessed; 3) the current from the cell starts being sourced by one of the transistors M1 and M2 and a voltage difference starts forming on one of the output nodes; 4) this voltage is further amplified by the positive feedback amplifier until it reaches the latched state. It has been shown that the time response of a latch formed by cross-coupled inverters is directly related to the AC small-signal gain-bandwidth (GBW) product of the inverters. Maximizing the GBW product maximizes the speed of the sense amplifier. By examining the small-signal models of both the positive feedback cross-coupled voltage sense amplifier and the clamped bit-line current sensing amplifier, we can derive the following GBWs for (a) voltage sensing and (b) current sensing:

$$\text{a)}\quad GBW_{VS} = \frac{g_m}{C_{BL}} \qquad\qquad \text{b)}\quad GBW_{CS} = \frac{g_m}{C_d}$$

Since Cd << CBL, it can easily be seen that the current-mode sense amplifier enjoys a much higher speed. Another observation is that this amplifier is insensitive to the bit-line capacitance, maintaining a constant speed as the bit-line capacitance increases. To recognize the power savings associated with the switch to current sensing amplifiers, we need to examine the dynamic power dissipation of the voltage sensing amplifier. In voltage sensing, the bit-lines are

discharged and charged by dVBL (close to 400mV) for every read operation. When this dVBL is combined with both an increasingly large bit-line capacitance CBL and the read frequency "fread", the power given by the following equation becomes large [60]:

$$P = f_{read} \cdot C_{BL} \cdot V_{BL}^2$$

The current sensing amplifier, on the other hand, has a negligible voltage swing, thus nearly eliminating dynamic power dissipation. Furthermore, this bit-line voltage inactivity significantly decreases cross talk between bit-lines and the supply voltage drop associated with bit-line charge-up.

4.5 New Current-Mode Sense Amplifier

The sensing speed of the current-mode sense amplifier is faster than that of the conventional voltage sense amplifier and is independent of the bit-line and data-line capacitances. In a conventional sense amplifier, because the input nodes are connected to the bit-line or data-line, the read access speed always depends on the bit-line and data-line capacitances. This problem is difficult to solve, because more and more cells are connected in parallel to each bit-line, which always results in a large bit-line loading. As the storage capacity of the memory grows, an increase in the number of cells cannot be avoided. If the data of a cell must be read in a short time, the number of switches that select the current column cannot be increased; this means that the number of columns must be reduced, so the number of cells connected in parallel to each bit-line increases. A large bit-line capacitance makes the RC time constant delay much larger, and the speed of the sense

amplifier drops as the capacitance increases. Due to the low input impedance of the current-mode sense amplifier, the signal from the memory cell can be injected into the sense amplifier with only minimal charging or discharging of the bit-line capacitance. As a consequence, the voltage change on the bit-line during the sense portion of a cell read access is extremely low, eliminating the source of most voltage noise coupling problems and minimizing power supply bounce during sensing.

4.5.1 Circuit Description and Operation

In this section, the new current-mode sense amplifier is proposed. Its operating power during the read access cycle is less than that of the conventional current-mode sense amplifier, and its speed is extremely high. Fig.32 presents the read data path of an n-type separated flip-flop current-mode sense amplifier. N5-N6 and P1-P2 are connected in a manner similar to positive feedback latches. N1 and N2 connect to the input nodes and pull the data-lines down close to the ground level. The transistors N7 and N8 are the separating transistors, and the transistors N3 and N4 are the equalization transistors. The bit-line and data-line capacitances are represented by CBL and CDL, respectively, and WL and CL are the word-line and column-line selector signals, respectively. The inputs to the current-mode cross-coupled latch are at the sources of the transistors N5 and N6. The low impedance at the input nodes causes the current signals on the data-lines to be injected into the cross-coupled latch without charging or discharging the data-line capacitances. Hence, the sensing speed is insensitive to both the bit-line and the data-line capacitances.
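The capacitance insensitivity described above can be illustrated with the expressions quoted in Section 4.4, GBW = gm/CBL for voltage sensing, GBW = gm/Cd for current sensing, and P = fread * CBL * VBL^2 for the read power. The sketch below simply evaluates them; the transconductance, the internal node capacitance Cd, and the bit-line capacitances are hypothetical values chosen for illustration, and the function names are not from the thesis.

```python
def gbw(g_m, c_node):
    """Gain-bandwidth product g_m / C of a cross-coupled stage (rad/s)."""
    return g_m / c_node


def read_power(f_read, c_bl, v_bl):
    """Dynamic read power P = f_read * C_BL * V_BL^2 (watts)."""
    return f_read * c_bl * v_bl ** 2


if __name__ == "__main__":
    g_m = 1e-3      # hypothetical transconductance, 1 mS
    c_d = 50e-15    # hypothetical internal sense-node capacitance, 50 fF
    for c_bl in (1e-12, 2e-12, 4e-12):
        print(f"C_BL = {c_bl * 1e12:.0f} pF: "
              f"GBW_VS = {gbw(g_m, c_bl):.3e} rad/s, "
              f"GBW_CS = {gbw(g_m, c_d):.3e} rad/s, "
              f"P(400 mV swing, 100 MHz) = {read_power(100e6, c_bl, 0.4) * 1e6:.1f} uW")
```

Under these assumptions the voltage-sensing GBW halves each time CBL doubles, while the current-sensing GBW stays fixed because it depends only on Cd; the read power of the voltage-mode bit-line grows linearly with CBL.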

(62) Fig.32 A current-mode sense amplifier and a simplified data path circuit.

Before the sensing operation begins, as in the conventional sense amplifier, the bit-lines need to be equalized to the same voltage level. In this design, the bit-line voltage is pulled down to the ground level, which differs from conventional SRAMs. When the sense amplifier is in the standby state, the signal “SENB” is at high level and the signal “SEN” is at low level. Under this condition, N3 and N4 are on, so they pull the drains of N5 and N6 down to low level. Hence, N5 and N6 are in the cut-off state, and P1 and P2 operate in the linear region, since their gate voltages are at low level. Because “SEN” is at low level, N7 and N8 are in the cut-off state and no current flows through them. At this time, the voltages at the output nodes of the sense amplifier (node A and node B) are equal to the power supply voltage, the input nodes are at zero volts, and the latch nodes (the

(63) drains of N5 and N6) are discharged to low level. Hence, in the standby state, no DC current flows in the sense amplifier.

During the read operation, both the WL and CL lines are activated. “SENB” goes to low level and turns off N3 and N4, while “SEN” goes to high level and turns on the separated flip-flop. When a particular memory cell is accessed, a differential current signal appears on the common data-lines DL and DLB. N5 has a lower VGS than N6, so the voltage at node A exceeds the voltage at node B. Moreover, the cross-coupled configuration of the amplifier implies that the source-to-gate voltage of P2 is less than that of P1. The current that flows into node A is therefore much higher than the current that flows into node B. The voltage at node A then increases further and the voltage at node B decreases. The separated flip-flop forms a positive feedback loop, which regenerates the voltage to full swing and latches it. The response time of the flip-flop is very short, since the capacitance of the output node is very small.

The differences from the conventional current sense amplifier are the equalization transistors N3 and N4 and the special positive-feedback latch structure with the separating transistors N7 and N8. A conventional design, for example the CBLSA or the hybrid-mode sense amplifier, uses a single NMOS transistor connecting the two output nodes as the equalization transistor. With this method, when the equalizing signal goes high, the NMOS turns on and equalizes the charge between the two output nodes. Assume that the initial voltage levels of the output nodes are the supply voltage and the ground level. After equalization, the voltage of both nodes is half of the supply voltage. Under this condition, the transistors that form the positive feedback latch are always on, because their gate-source voltages are larger than their threshold voltages. Hence, a static current flows from the power supply node to ground.
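The charge-sharing argument above can be checked with a small sketch. The 3 V supply is the value quoted in the thesis abstract; the 0.7 V threshold voltage, the equal node capacitances, and the function names are assumptions for illustration. The grounded case assumes that the gates of N5 and N6 are cross-coupled to the latch nodes, which N3 and N4 hold at ground, with the sources at the grounded input nodes.

```python
def equalized_voltage(v_a, v_b, c_a=1.0, c_b=1.0):
    """Common voltage after two nodes share charge through an ideal switch."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def nmos_conducts(v_gate, v_source, v_th):
    """True when the gate-source voltage exceeds the threshold voltage."""
    return (v_gate - v_source) > v_th


VDD = 3.0   # supply voltage quoted in the thesis abstract
VTH = 0.7   # ASSUMED NMOS threshold voltage

# Conventional scheme: one NMOS shorts the two output nodes (VDD and ground).
v_eq = equalized_voltage(VDD, 0.0)
conventional_on = nmos_conducts(v_eq, 0.0, VTH)

# Scheme described in the text: the latch nodes are pulled to ground instead.
grounded_on = nmos_conducts(0.0, 0.0, VTH)

print(f"equalized node voltage: {v_eq:.2f} V")
print(f"latch NMOS conducting after conventional equalization: {conventional_on}")
print(f"latch NMOS conducting with grounded latch nodes: {grounded_on}")
```

With equal node capacitances the equalized level is VDD/2, which is above the assumed threshold, so the latch transistors conduct; with the latch nodes grounded, the gate-source voltage is zero and they stay off.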
This static current makes