National Chiao Tung University
Institute of Electrical Control Engineering
Doctoral Dissertation

Bootstrapped Circuit Techniques for Near-threshold On-chip Data Link

Student: Ying-Chieh Ho (何盈杰)
Advisor: Prof. Chau-Chin Su (蘇朝琴)

A Dissertation Submitted to the Institute of Electrical Control Engineering, College of Electrical Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical Control Engineering

June 2012
Hsinchu, Taiwan, Republic of China
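Abstract (Chinese)
In recent years, environmental protection, green energy, and sustainability have become priorities across many fields. For electronic products, the battery is the main energy source, and extending battery life reduces battery consumption; low-power design, in turn, lets circuits dissipate less power and run longer on a battery. According to P = fCV^2, lowering the operating voltage and reducing the capacitive load together can reduce dynamic power by several orders of magnitude. Lowering the operating voltage is the most intuitive and effective way to reach low power, and many studies operate circuits near the threshold voltage (near-threshold) or directly in the sub-threshold region. Nanometer technology is already widely used in low-power applications, including RF, analog, AD/DA, and MPU designs, and, at even lower power, in designs for physiological signal monitoring. These designs exploit the reduced device loading of nanometer technology and push the limits of sub-threshold current.
Near-threshold circuit design operates devices in the near-threshold region in order to cut power substantially and achieve energy-efficient operation. It has several main bottlenecks, however. First, operation is slow, so it is mostly applied to biomedical chips or other slow systems. Second, static leakage power becomes more severe in the near-threshold region. Third, severe process variation affects yield and production cost.
In this dissertation, we propose data link circuit designs for near-threshold-voltage systems on chip (SoC), together with a series of new bootstrap techniques that address the problems of near-threshold circuit design. The main idea of the proposed bootstrap techniques is to provide bidirectional boosting, that is, boosting applied to both P-type and N-type devices at the same time, which greatly increases the driving capability while suppressing static leakage. Compared with conventional circuits operated in the near-threshold region, an improvement of two orders of magnitude is obtained. Another advantage is that the bootstrap technique lets circuits supplied at sub-threshold voltages operate in the ordinary triode region, so that the circuit model becomes more accurate. Monte Carlo analysis of the circuits shows clearly that process variation is greatly reduced as a result.
We present four related circuits. (1) A bootstrapped inverter with active leakage reduction for clock networks. Operating at 0.2 V, it provides a stable 10 MHz clock even for a clock tree with 1 cm of on-chip wiring, and it suppresses the severe static leakage current of low-voltage operation. The design also uses gate boosting so that most devices operate in the conducting region, which greatly reduces the effect of process variation. (2) A bootstrapped repeater for on-chip buses that effectively suppresses inter-symbol interference (ISI). At VDD = 0.3 V, a single channel reaches a data rate of up to 100 Mbps (using a 2^10-1 PRBS), and even at VDD = 0.1 V it still reaches 0.8 Mbps. (3) Next, seeking the most energy-efficient design, we propose high-boosting repeaters whose pre-drivers provide 3x and 4x boosting to obtain the best energy efficiency, without [...] a data rate of up to 5 Mbps, with an energy consumption of only 35.2 fJ per bit. (4) Finally, we propose a bootstrapped ring oscillator and complete an all-digital phase-locked loop (ADPLL) that operates at near-threshold voltage. At 0.5 V, the ADPLL provides an output frequency of 480 MHz while consuming only 78 μW, and at 0.25 V it still provides 44.8 MHz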
ABSTRACT
For sustainable electronic devices, ultra-low-power design is essential to prolong battery life. According to P = fCV^2, scaling down the supply voltage is the most effective way to reduce power consumption. According to the forecast of the International Technology Roadmap for Semiconductors (ITRS), the supply voltage will be scaled to 0.5 V for low-power applications within the next generation. Scaling the supply voltage to near the threshold voltage is the most favorable solution for low-power designs. On the other hand, nano-scaled devices push the speed limit in the near-threshold region thanks to their small device loading. Nano-scaled processes are broadly applied to ultra-low-power designs, including RF, AD/DA, and MPU circuits, and especially to biomedical applications. Emerging embedded biomedical applications have once more pushed low-power design to a further extreme.
To achieve energy-efficient operation, designs are operated from a near-threshold supply. However, near-threshold circuit design is challenging. First, the driving capability (Ion) is limited, which restricts such designs to slow systems. Second, the static leakage power becomes severe and decreases the Ion/Ioff ratio. Moreover, process variations worsen significantly, affecting circuit performance, power efficiency, and fabrication yield.
In this dissertation, we propose circuit designs for an on-chip data link system using a near-threshold supply. To address the design issues in the near-threshold region, we have developed several bootstrapped circuits. The main contribution of the proposed bootstrapped techniques is to boost the gate voltage on both sides, that is, to boost the gate voltages of the PMOS and the NMOS at the same time. The proposed circuits both increase the driving capability, by boosting signals into the super-threshold region, and reduce the leakage current. When the circuit is operated in the sub-threshold region, an improvement of two orders of magnitude is achieved. In addition, the bootstrapped circuits operate in the triode region even with a near-threshold supply, which explains why process variation affects the proposed design scheme to a lesser extent. This is verified by Monte Carlo simulations.
Four building blocks using bootstrapped circuits in an on-chip data link are proposed. The first is a bootstrapped CMOS inverter applied to an on-chip clock network. In addition to improving the driving capability, a large gate voltage swing from -VDD to 2VDD suppresses the sub-threshold leakage current. The test chip achieves 10 MHz operation at a VDD of 200 mV with a power consumption of 1.01 μW. Monte Carlo analysis indicates that the sigma of the delay time is only 2.9 ns at 0.2 V operation. The second is an ISI-suppressed bootstrapped repeater applied to an on-chip bus. The bootstrapped CMOS repeaters are inserted to drive a 10 mm on-chip bus. Additionally, a precharge enhancement scheme increases the speed of data transmission, and a leakage current reduction technique suppresses ISI jitter. The measured results demonstrate that, for a 10 mm on-chip bus, it achieves a 100 Mbps data rate at 0.3 V and still 0.8 Mbps at 0.1 V. The third part investigates the performance of interconnects with repeater insertion in the sub-threshold region. A 3X CMOS pre-driver and a 4X one are proposed to enhance the driving capability. Compared with the conventional repeater, the proposed ones have higher energy efficiency. The measured results show that the 3X (4X) pre-driver achieves a 5 Mbps (1.5 Mbps) data rate at 0.15 V with an energy efficiency of 35.2 fJ (32.8 fJ). In the last part, we present a near-threshold ADPLL with a bootstrapped digitally-controlled ring oscillator (BDCO) that allows the ADPLL to operate from a near-threshold supply. The BDCO is composed of a bootstrapped ring oscillator (BTRO) and a weighted thermometer-controlled resistance network (WTRN). The proposed bootstrapped delay cell generates a large gate voltage swing that improves the driving capability significantly. The boosted output swing keeps the transistors in the linear region, so that the output frequency is highly linear as a function of VDD even with a near-threshold supply. Based on the transfer characteristic of the BTRO, the WTRN provides linear control while the supply voltage is swept. The proposed ADPLL oscillates from 36.8 to 480 MHz with a power consumption of 2.4 to 78 μW under a supply voltage of 0.25 to 0.5 V.
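Acknowledgements
Time flies. In the blink of an eye, six years have passed since I left industry and returned to school. More than two thousand days went by in a flash, and what remains in my mind is deep gratitude. The road was full of setbacks, and the challenges came wave after wave. More than once I doubted whether I could finish this degree, but at this moment I have completed a major stage of my career plan.
Many people helped and accompanied me along the way and made me who I am today. Besides thanking heaven, there are truly too many people to thank. I know in my heart that had even one of these benefactors been missing, just one, I might not have completed my degree. In the days to come I will continue to create my own future and value, but before that, I thank everyone around me who has accompanied, encouraged, and supported me.
I thank my advisor, Prof. 蘇朝琴 (Chau-Chin Su), for his guidance over the years. Both his rigorous way of thinking in academic research and his tact and tolerance in dealing with people have benefited me greatly. In these years I changed my former attitude to learning: instead of learning by rote, I opened my mind and met the endless sea of knowledge with humility, imagination, and enthusiasm, and was richly rewarded.
I thank my master's advisor, Prof. 吳安宇. Although circumstances kept me from staying in your group, every time we met you reminded me that, besides concentrating on research, I should attend to future planning and writing skills, and I have kept this in mind. I thank Prof. 洪浩喬, a mentor and a friend, for sharing so much of your experience in academic research in addition to your teaching in class. When I was rushing about blindly in the vast sea of academia, having a senior at my side to guide me filled me with confidence. I thank Prof. 莊景德 and Prof. 李鎮宜 for the chip area and tape-out opportunities provided in their project. Without these chips, our ideas would have been empty talk, and these papers would not exist. I thank Prof. 周世傑 for introducing me to scholars from around the world at the conference in Paris, France, which broadened my horizons.
I also thank Dr. 曾煜輝 and Dr. 徐仁乾 for their comradeship. I will never forget the days we worked hard together, and I hope that this hard work brings rewards for all of us. I thank 小馬 for firing the first shot in the on-chip bus research, and I thank the other co-authors of this paper, 家齊 and 于昇. It was an honor to discuss and grow together with you on this topic, and now the whole world has seen our results. I thank the friends who shared life in the big family of 918: 丸子, 教主, 楙軒, 小潘潘, 方董, and all the junior students of the past six or seven years, for your help and tolerance. I also thank the assistants who worked alongside us on the project over the years: 雅雯, 上容, 俊秀, 豐文, 伉佑, and 美玲.
I thank as well my friends in other research groups: Prof. 李淑敏, Prof. 蕭志龍, Dr. 盧台祐, Dr. 楊皓義, Dr. 杜明賢, Dr. 胡璧合, Dr. 蔡玉章, Dr. 陳嘉怡, Dr. 范銘隆, Dr. 洪紹峰, Dr. 許書餘, and the junior students 劉小胖, 致煌, 勖哲, 柏鈞, and 瑋庭. Thank you for lending a hand at the right moments and making my research go more smoothly.
Finally, I thank my beloved family: my parents, my sister, and my brother. I can never repay the upbringing and earnest hopes you have given me. I thank my wife, 佳慧. With your support I could pursue my studies without worries; without your love there would be no doctoral degree. And to my precious daughter 苡瑄: Daddy thanks you too. Because of you, Daddy has more courage for the future; with you, Daddy's life has more meaning.
Dedicated to my family.
何盈杰 (Ying-Chieh Ho), at 電資 303, National Chiao Tung University, 2012/6/27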
Table of Contents
Abstract (Chinese)
ABSTRACT
Table of Contents
Chapter 1 Introduction
1.1. Challenges in Nano-Scaled Near-threshold Design
1.2. Near-threshold On-chip Data Link
1.3. Organization of the Dissertation
Chapter 2 Background Review
2.1. Effects in Nano-scaled Process [6]
2.1.1. Short-Channel Effect
2.1.2. Narrow-Width Effect
2.1.3. Sub-threshold Leakage [6, 9]
2.1.4. Drain-Induced Barrier Lowering [6]
2.1.5. Gate-Induced Drain Leakage [6, 10]
2.1.6. Gate Leakage [11]
2.2. Challenges in Ultra Low-voltage Designs
2.2.1. Degradation of Driving Capability
2.2.2. Leakage Power and Ion-to-Ioff Ratio [8, 12]
2.2.3. Process, Voltage and Temperature Variation
2.3. Low-voltage Design Techniques
2.3.1. Bootstrap Techniques
2.3.2. Dynamic Voltage and Frequency Scaling
2.3.3. Multi-threshold MOS Control
2.3.4. Bulk-driven Technique
2.4. Summary
Chapter 3 Near-threshold Clock Network
3.1. Overview of On-chip Interconnect
3.1.1. RC-Interconnect with Repeater Insertion
3.1.2. Time Constant, Power Dissipation and FoM
3.2. Proposed Active Leakage Reduction Bootstrapped Inverter
3.3. Detailed Evaluation and Discussion
3.3.1. Boosting Efficiency
3.3.2. Reduction of Leakage Current
3.3.3. Delay Time Analysis
3.3.4. Delay Time Analysis of Process Variation
3.4.1. Implementation of the Bootstrap Capacitor
3.4.2. Chip Implementation and Measurement
3.5. Summary
Chapter 4 Near-threshold On-chip Bus
4.1. Proposed On-chip Bus Architecture
4.2. ISI-suppressed Bootstrapped Driver
4.3. Detailed Evaluation and Comparisons
4.3.1. Boosting Efficiency
4.3.2. Leakage Current Reduction
4.3.3. Leakage Power Analysis
4.3.4. ISI Suppression
4.3.5. Energy Efficiency
4.3.6. Monte Carlo Simulations
4.4. Experimental Setup and Measurement
4.4.1. Chip Implementation
4.4.2. Measured Waveforms
4.4.3. Leakage Power Measurement
4.5. Summary
Chapter 5 High-boosting Pre-driver
5.1. Proposed High-boosting Pre-driver
5.2. High-boosting Pre-driver in Long Interconnects
5.2.1. Leakage Current Reduction
5.2.2. Energy Efficiency
5.2.3. Boosting Efficiency
5.2.4. Monte Carlo Simulations
5.3. Experiment and Measurement Results
5.3.1. Chip Implementation
5.3.2. Measured Waveforms
5.4. Summary
Chapter 6 Near-threshold ADPLL
6.1. Architecture of Proposed All-Digital PLL
6.1.1. PFD, PS and TDC
6.1.2. DLF
6.1.3. Bootstrapped Digitally-Controlled Oscillator
6.1.3.1. Bootstrapped Ring Oscillator
6.1.3.2. Weighted-Thermometer Code Control
6.1.4. SDM
6.2.1. Power Analysis of BTRO
6.2.2. Linearity Analysis of BTRO
6.3. Experimental Results and Comparisons
6.3.1. Chip Implementation
6.3.2. Measured Results
6.3.3. Comparisons
6.4. Conclusions
Chapter 7 Conclusions
References
VITA
Publication List
Chapter 1
Introduction
In the past few years, low-voltage and low-power designs have attracted significant attention because of the popularity of portable devices. Emerging embedded biomedical applications have once more pushed low-power design to a further extreme.
According to P = fCV^2, scaling the supply voltage to near the threshold voltage is the most favorable solution for low-power designs. A 180 mV, 1024-point FFT processor was a pioneering sub-threshold design [1] and was followed by [2]. Sub-threshold SRAM is another important category [3]. Other designs include a 6-bit flash ADC operating at 0.2–0.9 V and a 14-tap, 8-bit finite impulse response (FIR) filter running at 20 MHz under 0.27 V [4-5].
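As a rough illustration of why supply scaling is so attractive, the following sketch evaluates P = fCV^2 at two supply voltages. The clock frequency and switched capacitance are hypothetical values chosen only for the arithmetic, and the activity factor is assumed to be 1.

```python
def dynamic_power(f_hz: float, c_farad: float, vdd: float) -> float:
    """Dynamic switching power P = f * C * V^2 (activity factor assumed to be 1)."""
    return f_hz * c_farad * vdd ** 2

# Hypothetical operating point: 10 MHz clock, 1 pF switched capacitance.
p_nominal = dynamic_power(10e6, 1e-12, 1.0)    # 1.0 V supply
p_near_vth = dynamic_power(10e6, 1e-12, 0.3)   # 0.3 V supply
power_ratio = p_near_vth / p_nominal           # equals (0.3 / 1.0)^2

print(f"P(1.0 V) = {p_nominal:.3e} W")
print(f"P(0.3 V) = {p_near_vth:.3e} W")
print(f"ratio    = {power_ratio:.2f}")
```

At the same frequency and load, the 0.3 V supply dissipates about 9% of the 1.0 V dynamic power. In practice the achievable frequency also falls at low voltage, which is the speed penalty discussed in this chapter.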
“Sustainability” was the theme of ASSCC 2011 and ISSCC 2012, which focused on design techniques for energy-efficient, low-voltage circuits and for improving battery lifetime. A panel discussion on 0.5 V systems was also held during ASSCC 2011 and pointed out the challenges of this new trend. However, energy-efficient designs under a low-voltage supply usually suffer from speed degradation, so a new circuit design strategy should strike a good trade-off between energy efficiency and speed. In addition, nano-scale effects, the Ion/Ioff ratio, and process variations all worsen significantly, affecting circuit performance, power efficiency (leakage power), and fabrication yield.
1.1. Challenges in Nano-Scaled Near-threshold Design
As technology continues to scale down, the performance of nano-scaled devices is influenced by many factors, such as threshold voltage, physical channel dimensions, doping concentration, gate-oxide thickness, and supply voltage. Fluctuations in these factors give rise to the short-channel effect (SCE), the narrow-width effect, drain-induced barrier lowering (DIBL), gate-induced drain leakage (GIDL), and gate leakage. These effects become a critical bottleneck in the trade-off among speed, power, and cost requirements.
Near-threshold circuit design is affected significantly by the degradation of the driving capability and the Ion/Ioff ratio, and by variations. Although circuits operated from a near-threshold supply can achieve ultra-low power consumption, the weak driving capability of CMOS devices requires a large area to compensate. A conventional CMOS circuit also incurs a severe Ioff problem in nanometer processes. In addition, near-threshold circuits suffer from serious process, voltage, and temperature (PVT) variations, which can amount to a variation of several times.
1.2. Near-threshold On-chip Data Link
Fig. 1-1 shows a block diagram of an on-chip data link system. Depending on the system requirements, a serializer/deserializer might be needed. Apart from the serializer/deserializer, the on-chip bus and the local oscillator are the most important macros in the system.
On-chip interconnects become a bottleneck with respect to speed, power, cost, and noise as technology scales to the nanometer regime. Among on-chip bus design approaches, repeater insertion is a popular method for interconnects. In this dissertation, we discuss the challenges and design issues for a near-threshold clock buffer and a nano-scaled near-threshold data link circuit. To solve these problems, we propose a new on-chip clock network and data bus using several bootstrapped techniques.
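To see why repeater insertion helps on a long wire, the sketch below uses a simplified textbook model that is an assumption here, not the analysis of this dissertation: a distributed RC segment has a 50% delay of about 0.38·R·C, so the unrepeated delay grows with the square of the length, while splitting the wire into N segments trades wire delay against N repeater delays. All numeric values (wire resistance and capacitance per millimeter, repeater delay) are hypothetical.

```python
def repeated_wire_delay(n_segments: int, length_mm: float, r_per_mm: float,
                        c_per_mm: float, t_repeater: float) -> float:
    """Delay of a wire split into n equal segments, each driven by one repeater.

    Uses the 0.38*R*C approximation for the 50% delay of a distributed RC line
    and a fixed delay per repeater (both simplifying assumptions).
    """
    seg = length_mm / n_segments
    seg_delay = 0.38 * (r_per_mm * seg) * (c_per_mm * seg)
    return n_segments * (seg_delay + t_repeater)

# Hypothetical 10 mm wire and repeater parameters.
LENGTH_MM = 10.0
R_PER_MM = 200.0      # ohm per mm
C_PER_MM = 0.2e-12    # farad per mm
T_REPEATER = 0.1e-9   # seconds per repeater

delays = {n: repeated_wire_delay(n, LENGTH_MM, R_PER_MM, C_PER_MM, T_REPEATER)
          for n in range(1, 11)}
best_n = min(delays, key=delays.get)

for n, d in delays.items():
    print(f"N = {n:2d}: {d * 1e9:.3f} ns")
print(f"best N = {best_n}")
```

With these made-up numbers the total delay is minimized at a moderate number of segments; adding more repeaters beyond that point costs more in repeater delay than it saves in wire delay.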
Fig. 1-1 Basic function blocks of on-chip data link.
Phase-locked loops (PLLs) often serve as the local oscillator. In this dissertation, we develop a bootstrapped ring oscillator (BTRO) that operates from a 0.2–0.6 V supply. Owing to the bootstrapped technique, its output frequency is highly linear as a function of the supply voltage. Based on this feature, a new ADPLL using the BTRO is also proposed. It achieves 480 MHz while consuming only 78 μW.
1.3. Organization of the Dissertation
The rest of the dissertation is organized as follows. Chapter 2 reviews the background of this dissertation. First, several effects in nano-scaled devices are introduced. Challenges in low-voltage circuit design are discussed as well, and some reported low-voltage techniques are reviewed. Chapter 3 introduces the repeated-RC on-chip interconnect architecture and develops a bootstrapped inverter applied to a 0.2 V clock network. It also features an active leakage current reduction technique to save leakage power. Chapter 4 introduces a low-voltage on-chip bus with an ISI-suppressed bootstrapped repeater. To achieve high energy efficiency, Chapter 5 introduces high-boosting bootstrapped repeaters. In Chapter 6, we present a near-threshold ADPLL using a bootstrapped digitally-controlled oscillator (DCO). Finally, Chapter 7 draws conclusions and discusses future work.
Chapter 2
Background Review
In the past few decades, the scaling of CMOS technologies has been the major driving force behind Moore’s Law. With scaling to nanometer technology, the process parameters no longer follow a single scaling factor, because carrier velocity saturation and the increasing sub-threshold leakage current become serious. As the channel length and the gate-oxide thickness continue to shrink, non-ideal effects begin to affect circuits. Additionally, lowering the supply of nano-scaled designs to the near-threshold region has several detrimental impacts. In this chapter, the effects in nano-scaled near-threshold design are briefly reviewed, and popular low-voltage design techniques are then introduced.
2.1. Effects in Nano-scaled Process [6]
2.1.1. Short-Channel Effect
The short-channel effect (SCE) occurs in a MOSFET whose channel length is of the same order of magnitude as the depletion-layer widths of the source and drain junctions. The SCE is often modeled as charge sharing, in which the source and drain depletion regions hold part of the charge under the gate. The threshold voltage Vth of a MOSFET can be represented using the depletion approximation as

$$V_{th} = V_{fb} + 2\Phi_f + \frac{Q_B}{C_{OX}} \qquad (2.1)$$

where $V_{fb}$ is the flat-band voltage, $\Phi_f$ is the Fermi potential, $Q_B$ is the charge of the channel, and $C_{OX}$ is the oxide capacitance. As the channel length shrinks, the charge stored in the doped area is reduced significantly. As a result, the threshold voltage decreases as the channel length is reduced.
Fig. 2-1. Threshold voltage with change in channel length due to SCE [6].
Halo doping, a non-uniform channel doping used in modern processes to adjust the threshold voltage, causes the so-called reverse short-channel effect (RSCE). The increase in threshold voltage comes from the extra doping charge near the source and drain regions. As the device length is reduced, the threshold voltage of the device increases, which is the opposite of what is expected from the SCE [7-8].
2.1.2. Narrow-Width Effect
The narrow-width effect (NWE) occurs when the threshold voltage Vth of a nano-scaled MOSFET is modulated by the gate width, so that the device width modulates the drain current. According to Eq. (2.1), the NWE has two main causes. First, the charge in the gate-induced depletion region increases the threshold voltage. Second, the channel doping is higher along the width dimension: because dopants encroach under the gate, a higher voltage is needed to invert the channel. Fig. 2-2 shows the NWE as a function of channel width.
[Fig. 2-2 plot: ID (nA) and Vth (mV) versus channel width, 300 nm to 900 nm]
2.1.3. Sub-threshold Leakage [6, 9]
In a nano-scaled device, the sub-threshold (weak-inversion conduction) current Isub flows when the gate-source voltage is below the threshold voltage Vth. It can be expressed as

$$I_{sub} = \mu C_{dep} \frac{W}{L} V_T^2 \exp\left(\frac{V_{GS} - V_{th}}{n V_T}\right) \left(1 - \exp\left(\frac{-V_{DS}}{V_T}\right)\right) \qquad (2.2)$$

where $\mu$ is the effective mobility; $C_{dep}$ is the depletion capacitance; $W$ and $L$ are the width and length of the device; $V_T$ is the thermal voltage; $V_{GS}$ is the gate-to-source voltage; $n$ is the sub-threshold slope factor; and $V_{DS}$ is the drain-to-source voltage.
Unlike in the strong-inversion region, the sub-threshold current is dominated by diffusion current. Carrier movement by diffusion resembles the charge flow in BJTs. However, the sub-threshold current is also affected by other phenomena, such as drain-induced barrier lowering (DIBL) and gate-induced drain leakage (GIDL), which are introduced in the following sections.
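The exponential dependence in the sub-threshold current expression (mobility times depletion capacitance times W/L times the squared thermal voltage, multiplied by the two exponential terms) can be checked numerically. The sketch below is only an illustration: the values of the slope factor n, the threshold voltage, W/L, and the product of mobility and depletion capacitance are hypothetical, not taken from any process.

```python
import math

V_T = 0.02585  # thermal voltage near room temperature, in volts

def subthreshold_current(vgs: float, vds: float, vth: float, w_over_l: float,
                         mu_cdep: float, n: float = 1.5) -> float:
    """I_sub = mu*C_dep*(W/L)*V_T^2 * exp((VGS - Vth)/(n*V_T)) * (1 - exp(-VDS/V_T))."""
    return (mu_cdep * w_over_l * V_T ** 2
            * math.exp((vgs - vth) / (n * V_T))
            * (1.0 - math.exp(-vds / V_T)))

# Hypothetical device: Vth = 0.3 V, W/L = 5, mu*C_dep = 1e-4 A/V^2, n = 1.5.
N_FACTOR = 1.5
swing = N_FACTOR * V_T * math.log(10)   # gate voltage needed per decade of current

i_low = subthreshold_current(0.10, 0.3, 0.3, 5.0, 1e-4, N_FACTOR)
i_high = subthreshold_current(0.10 + swing, 0.3, 0.3, 5.0, 1e-4, N_FACTOR)
decade_ratio = i_high / i_low

print(f"sub-threshold swing = {swing * 1e3:.1f} mV/decade")
print(f"current ratio for one swing step = {decade_ratio:.3f}")
```

Raising VGS by one swing step (n·VT·ln 10) multiplies the current by ten, which is why small changes in gate drive or threshold voltage translate into large current changes in this region.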
2.1.4. Drain-Induced Barrier Lowering [6]
[Fig. 2-3 plot: drain current (A, log scale) versus gate voltage (V) for VDS = 0.1 V, 0.2 V, and 0.3 V at 25 °C, TT corner; annotations mark the conventional Ioff at VG = 0 V, the DIBL region as VD increases, and the GIDL region at VD = VDD, VG = -VDD]
Fig. 2-3. Drain current of an NMOS device vs. VG in the near-threshold region.
In micron-scale devices, the source and drain are far enough apart that their depletion regions do not interact. In such a case, the drain current is nearly independent of the channel length and the drain bias. In the off state, the potential barrier between the source and the drain prevents electrons from flowing to the drain. In a short-channel device, Vth varies with channel length according to the SCE. In addition, the DIBL effect lowers the energy barrier as the drain voltage increases [6]. When a short-channel device operates at a higher drain voltage, the energy barrier is lowered further, which increases the drain current. Fig. 2-3 depicts ID as a function of VG and illustrates the DIBL effect as the drain voltage increases. As shown in Fig. 2-1, the DIBL effect lowers the threshold voltage but preserves the slope in the near-threshold region.
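A common first-order way to capture DIBL, used here as an assumption rather than as the model of this dissertation, is to let the threshold voltage fall linearly with the drain voltage, Vth,eff = Vth0 - η·VDS. The sketch below shows how such a shift raises the off-state current at VGS = 0 without changing the sub-threshold slope. The coefficient η, the slope factor n, and Vth0 are hypothetical, and the (1 - exp(-VDS/VT)) term is ignored.

```python
import math

V_T = 0.02585  # thermal voltage near room temperature, in volts

def effective_vth(vth0: float, vds: float, eta: float) -> float:
    """First-order DIBL model (assumption): Vth drops linearly with VDS."""
    return vth0 - eta * vds

def relative_current(vgs: float, vth0: float, vds: float, eta: float,
                     n: float = 1.5) -> float:
    """Sub-threshold current normalized to its prefactor."""
    return math.exp((vgs - effective_vth(vth0, vds, eta)) / (n * V_T))

# Hypothetical device: Vth0 = 0.3 V, eta = 0.1 V/V.
ioff_low_vds = relative_current(0.0, 0.3, 0.1, 0.1)    # VDS = 0.1 V
ioff_high_vds = relative_current(0.0, 0.3, 0.3, 0.1)   # VDS = 0.3 V
dibl_ratio = ioff_high_vds / ioff_low_vds

# The slope is unchanged: the same VGS step gives the same current ratio at both VDS.
step_ratio_low = relative_current(0.1, 0.3, 0.1, 0.1) / ioff_low_vds
step_ratio_high = relative_current(0.1, 0.3, 0.3, 0.1) / ioff_high_vds

print(f"Ioff increase from VDS = 0.1 V to 0.3 V: x{dibl_ratio:.2f}")
print(f"current ratio for a 0.1 V gate step: {step_ratio_low:.2f} vs {step_ratio_high:.2f}")
```

In this simple model the ID-VG curve shifts sideways as VDS rises, so the off current grows while the slope of the curve stays the same.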
2.1.5. Gate-Induced Drain Leakage [6, 10]
Gate-induced drain leakage (GIDL) arises from the high-field effect in the drain junction of a MOSFET. It usually occurs when the applied gate voltage strengthens the electric field in or around the gated PN junction, so that high-field effects such as avalanche multiplication and band-to-band tunneling (BTBT) become severe. Thus, the leakage current of a reverse-biased gated diode may increase dramatically when a negative gate voltage begins to cause field crowding and a peak field. GIDL can be suppressed with a thicker oxide and a lower electric field. Very high drain doping can also be considered for minimizing GIDL. Fig. 2-3 also shows GIDL in the drain current characteristics of an NMOS device at different drain voltages.
2.1.6. Gate Leakage [11]
In nanometer technology, the gate-oxide thickness TOX has been scaled to values in the range of 12–22 Å. As mentioned, DIBL also occurs in the presence of a large gate tunneling leakage current Igate. Igate arises from the finite probability of an electron tunneling directly through the SiO2 layer. This probability is a strong exponential function of TOX: thinning TOX by only 2 Å may increase Igate by an order of magnitude. TOX is therefore the most sensitive of the physical dimensions. Typically, Igate is much smaller than the sub-threshold leakage current Isub when TOX is larger than 20 Å. At the simulation level, the BSIM4 model (level = 54) includes nano-scale effects such as GIDL and DIBL, and Igate is taken into account as well. For fast simulation and reliable
2.2. Challenges in Ultra Low-voltage Designs
2.2.1. Degradation of Driving Capability
When a MOSFET is operated in the super-Vth region, the drain current in the saturation region is a function of the gate voltage and can be represented as

$$I_{D,Sat} = \mu C_{ox} \frac{W}{L} (V_{GS} - V_{th})^2 (1 + \lambda V_{DS}) \qquad (2.3)$$

where $C_{ox}$ is the gate-oxide capacitance per unit area and $\lambda$ is the channel-length modulation factor. According to Eq. (2.3), the drain current ID,Sat decreases quadratically as the gate voltage is lowered. When the gate voltage drops further into the sub-threshold region, the drain current starts to decrease exponentially, as shown in Eq. (2.2). That is, when a design operates in the near-threshold region, poor driving capability is the first design issue. In normal 1 V designs, transistor sizing is commonly used to increase the driving capability. However, the gate capacitance of a MOS device drops only slightly when the gate drive is lowered to near the threshold voltage. As a result, enlarging the device to enhance the driving capability is not an effective approach in the near-threshold region.
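The quadratic loss of drive current in saturation can be illustrated with a short calculation based on the square-law expression above (prefactor times the squared overdrive times the channel-length modulation term). The lumped prefactor k, the threshold voltage, and the bias points are hypothetical values chosen for the example.

```python
def saturation_current(vgs: float, vds: float, vth: float, k: float,
                       lam: float = 0.0) -> float:
    """I_D,Sat = k * (VGS - Vth)^2 * (1 + lambda*VDS), with k = mu*C_ox*W/L."""
    if vgs <= vth:
        raise ValueError("square-law model only applies above threshold")
    return k * (vgs - vth) ** 2 * (1.0 + lam * vds)

# Hypothetical device: Vth = 0.3 V, k = 1 mA/V^2, lambda neglected.
i_full = saturation_current(vgs=1.00, vds=1.00, vth=0.3, k=1e-3)   # overdrive 0.70 V
i_half = saturation_current(vgs=0.65, vds=0.65, vth=0.3, k=1e-3)   # overdrive 0.35 V
drive_ratio = i_full / i_half

print(f"I(VGS = 1.00 V) = {i_full * 1e6:.1f} uA")
print(f"I(VGS = 0.65 V) = {i_half * 1e6:.1f} uA")
print(f"halving the overdrive divides the current by {drive_ratio:.2f}")
```

Halving the gate overdrive cuts the saturation current to a quarter; below threshold the square law no longer applies and the decline becomes exponential.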
2.2.2. Leakage Power and Ion-to-Ioff Ratio [8, 12]
The Ion-to-Ioff ratio becomes a critical factor in sub-threshold and near-threshold digital circuits. The inherently small Ion-to-Ioff ratio determines how many transistors can be connected to each node. As reported in [12], the Ion-to-Ioff ratio degrades from approximately 10^7 to 10^4, which implies a strong interaction between the ON and OFF devices in the sub-threshold region when it comes to setting the voltage level of critical signals. Unfortunately, this causes a relevant failure mechanism in circuit operation. As illustrated in Fig. 2-4, an inverter serves as a driver with a capacitive load of 200 fF while VDD is swept from 0.1 V to 0.3 V. The circuit is operated at its speed limit. Obviously, the leakage power becomes a greater portion of the
[Fig. 2-4 plot: leakage power Pleakage (W) and Pleakage/PT (%) of a conventional repeater versus supply voltage (0.10 V to 0.30 V) at 25 °C, TT corner; load 0.2 pF; NMOS W/L = 320 nm/60 nm, PMOS W/L = 460 nm/60 nm, m = 50]
Fig. 2-4. Leakage power on a repeater at subthreshold supply.
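A back-of-the-envelope estimate shows how quickly the on/off ratio collapses with the supply. Assuming a purely exponential sub-threshold characteristic, a gate swing from 0 to VDD, and no DIBL (all simplifying assumptions, with a hypothetical slope factor n), the ratio is roughly exp(VDD / (n·VT)).

```python
import math

V_T = 0.02585  # thermal voltage near room temperature, in volts

def on_off_ratio(vdd: float, n: float = 1.5) -> float:
    """Idealized sub-threshold Ion/Ioff for a gate swing of 0..VDD (no DIBL)."""
    return math.exp(vdd / (n * V_T))

ratios = {vdd: on_off_ratio(vdd) for vdd in (0.1, 0.2, 0.3)}
for vdd, ratio in ratios.items():
    print(f"VDD = {vdd:.1f} V: Ion/Ioff ~ {ratio:.3g} ({math.log10(ratio):.2f} decades)")
```

Each 0.1 V of supply adds only a little more than one decade of on/off ratio under these assumptions, so leakage through OFF devices becomes comparable to the ON current much sooner than at nominal supply.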
2.2.3. Process, Voltage and Temperature Variation
Performance variation induced by process, voltage, and temperature (PVT) corners makes circuit design in the near-threshold region tremendously challenging. First, process variability affects the current through process parameters such as mobility and threshold voltage. Even a small variation may lead to an exponential mismatch. Process variation is divided into two major categories [13] and is further classified into more specific categories according to physical range on a wafer or on a die [14]. Fig. 2-5 depicts ID as a function of gate voltage in the near-threshold region and illustrates the effects of process and voltage at room temperature. It shows that the variation of ID due to process and voltage fluctuations worsens as the supply voltage is lowered.
Apart from the static process variation of a fabricated die, supply voltage variation is related to fluctuations during circuit operation. Real-time fluctuations caused by voltage drop or by inductive effects in wires may result in functional failure [14-15]. Temperature is another important factor for variation and reliability in a nano-scaled chip, especially when the supply voltage is lowered to the near-threshold region. The sub-threshold current depends strongly on temperature through the parameter VT. In contrast to the current in the super-threshold region, ID increases as the temperature rises. The measured temperature
[Fig. 2-5 plot: drain current (A, log scale) versus gate voltage (V) for the FF, TT, and SS corners at 25 °C, with IDmin marked for each corner]
Fig. 2-5. Drain current in different corners in the near-threshold region.
2.3. Low-voltage Design Techniques
As mentioned, circuit design in the near-threshold region has many challenges. Several techniques have been reported to solve the problems or improve energy efficiency. They are briefly reviewed in the following sections.
2.3.1. Bootstrap Techniques
Bootstrapping is an effective means of raising the driving capability and thereby the speed. A previous work developed a bootstrapped CMOS driver for large capacitive loads, shown in Fig. 2-6 [16]. According to [16], the bootstrapped driver consists of a pull-up and pull-down control pair that drive the PMOS and NMOS transistors, respectively. The gate voltages of the PMOS and NMOS driver transistors are kept at VDD and 0 in the cut-off phase. In the driving phase, the gates are driven to -VDD and 2VDD to increase the current density. When the input Vin is at 0 V, Va is at VDD and the output of the inverter is at VDD. Moreover, MN2 and MN1b are off; MP2 and MP1b are on. Therefore, V2P is pre-charged to 0 V by MN2b, and the bootstrap capacitor Cbp stores a potential of VDD. When Vin transits from 0 V to VDD (from L to H), V2P is boosted from 0 V to -VDD. Then, the potential of -VDD is passed from V2P to V1P. Consequently, a potential of -VDD is at the gate of the driver MP2, which drives Vout by VSG
Fig. 2-6. Reported bootstrapped driver in [16].
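The ideal boosted levels of -VDD and 2VDD assume that the bootstrap capacitor drives no parasitic load. A simple charge-conservation sketch, which is an illustrative assumption and not the boosting-efficiency analysis of [16] or of this dissertation, shows how a parasitic capacitance at the boosted node reduces the swing. The capacitor values below are hypothetical.

```python
def boosted_levels(vdd: float, c_boot: float, c_par: float):
    """Boosted gate levels when a bootstrap capacitor drives a floating node.

    A step of VDD on one plate of c_boot couples onto the floating node through
    the divider c_boot / (c_boot + c_par), where c_par is the parasitic
    capacitance from that node to ground (assumed model).
    """
    efficiency = c_boot / (c_boot + c_par)
    v_nmos_gate = vdd + efficiency * vdd    # ideal value: 2*VDD
    v_pmos_gate = 0.0 - efficiency * vdd    # ideal value: -VDD
    return v_nmos_gate, v_pmos_gate, efficiency

# Hypothetical values: VDD = 0.2 V, 1 pF bootstrap capacitor, 0.25 pF parasitic.
v_n, v_p, eff = boosted_levels(0.2, 1e-12, 0.25e-12)
print(f"boosting efficiency = {eff:.2f}")
print(f"NMOS gate high level = {v_n:.3f} V (ideal {2 * 0.2:.3f} V)")
print(f"PMOS gate low level  = {v_p:.3f} V (ideal {-0.2:.3f} V)")
```

In this model the bootstrap capacitor has to be large compared with the parasitic capacitance of the boosted node for the gate swing to approach the ideal -VDD to 2VDD range, which is one reason capacitor cost matters in bootstrapped drivers.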
The driver in [16] successfully enhances the driving capability by boosting the gate voltage, which also makes it suitable for a near-threshold supply. However, it has several drawbacks, such as reverse leakage current and non-ideal transition edges. Several improvements based on [16] have been proposed. Among them, Kil et al. proposed a sub-threshold bootstrapped repeater for a 9 MHz distributed clock network at 0.4 V [17]. The sub-threshold bootstrapped repeater, depicted in Fig. 2-7, is composed of two bootstrap circuits: one for pre-boosting and the other for driving. The pre-boosting circuit enhances the pre-charge current to increase the speed. In addition, MPS2 and MNS2 are switches that feed the boosted signal back to eliminate the reverse current. However, when this approach is applied to a data link, the kick-back disturbance through the boosting capacitors causes large timing jitter. Furthermore, it consumes large static power and incurs a high capacitor cost.
Fig. 2-7 Reported bootstrapped driver in [17].
2.3.2. Dynamic Voltage and Frequency Scaling
Dynamic voltage and frequency scaling (DVFS) is a popular power-saving scheme that is broadly used in microprocessors and DSP ASICs [18]. Since different functions need different execution times, a DVFS system can dynamically change the supply voltage or the data rate to meet the specification requirements; hence, the power consumption can be optimized for each computational task.
DVFS is also applied to lower the operating frequency in portable products when the battery runs low, keeping the basic functions of the system working in order to extend battery lifetime or stand-by time. DVFS is applied to compensate for PVT variation as well [19]. In fact, such designs often leave a large redundant margin in practical chips. DVFS determines the supply voltage or frequency for each task appropriately and dynamically, and therefore achieves higher power efficiency.
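The selection step of a DVFS policy can be sketched as a table lookup: among the available voltage/frequency pairs, pick the lowest supply that still meets the task deadline. The operating-point table, the function name, and the task parameters below are hypothetical and serve only to illustrate the idea.

```python
from typing import NamedTuple, Optional, Sequence

class OperatingPoint(NamedTuple):
    vdd: float       # supply voltage in volts
    freq_hz: float   # maximum clock frequency at that supply

def pick_operating_point(points: Sequence[OperatingPoint], cycles: float,
                         deadline_s: float) -> Optional[OperatingPoint]:
    """Return the lowest-voltage point that finishes `cycles` within `deadline_s`."""
    feasible = [p for p in points if cycles / p.freq_hz <= deadline_s]
    return min(feasible, key=lambda p: p.vdd) if feasible else None

# Hypothetical operating-point table.
TABLE = [
    OperatingPoint(0.3, 5e6),
    OperatingPoint(0.5, 50e6),
    OperatingPoint(0.8, 200e6),
    OperatingPoint(1.0, 400e6),
]

relaxed = pick_operating_point(TABLE, cycles=1e6, deadline_s=50e-3)
urgent = pick_operating_point(TABLE, cycles=1e6, deadline_s=4e-3)
impossible = pick_operating_point(TABLE, cycles=1e6, deadline_s=1e-3)
print(relaxed, urgent, impossible)
```

A relaxed deadline lets the task run at a lower supply, while a tight one forces a higher supply; if no point meets the deadline, the sketch returns None and the caller has to decide how to degrade.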
Critical path monitors (CPMs) [18, 20-21] eliminate a subset of these worst-case margins by using a delay chain that is a replica of the critical path of the actual design. The propagation delay through this replica path is monitored, and the voltage and frequency are scaled until the replica path just meets timing. The replica path tracks the critical-path delay across inter-die process variations and global fluctuations in supply voltage and temperature, thereby eliminating the margins due to global PVT variations.
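The replica-path loop described above can be mimicked in a few lines: lower the supply step by step while the monitored replica delay still fits in the clock period. The delay model below (an alpha-power-law style expression with made-up constants) is only a stand-in for a real monitor reading and is not taken from [18, 20-21]; the function names are hypothetical.

```python
from typing import Optional

def replica_delay(vdd: float, vth: float = 0.3, k: float = 1e-9,
                  alpha: float = 1.5) -> float:
    """Hypothetical replica-path delay model: k * VDD / (VDD - Vth)^alpha."""
    return k * vdd / (vdd - vth) ** alpha

def lowest_passing_vdd_mv(period_s: float, vdd_max_mv: int = 1000,
                          vdd_min_mv: int = 400, step_mv: int = 10) -> Optional[int]:
    """Step the supply down until the replica path just meets timing.

    Voltages are tracked as integer millivolts to avoid floating-point drift.
    Returns None if timing fails even at the maximum supply.
    """
    if replica_delay(vdd_max_mv / 1000) > period_s:
        return None
    vdd_mv = vdd_max_mv
    while (vdd_mv - step_mv >= vdd_min_mv
           and replica_delay((vdd_mv - step_mv) / 1000) <= period_s):
        vdd_mv -= step_mv
    return vdd_mv

best_mv = lowest_passing_vdd_mv(period_s=5e-9)
print(f"lowest supply meeting a 5 ns period: {best_mv} mV")
```

In a real system the delay would come from the on-chip monitor rather than from a formula, and some guard band would normally be kept for local variations that a replica path cannot track.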
2.3.3. Multi-threshold MOS Control
Since the circuits operate in the near-threshold region, lowering the supply voltage decreases ID according to Eqs. (2.2) and (2.3), which results in a drastic rise in gate delay. One way to overcome this speed degradation is to reduce the Vth of the MOSFET devices [22-23]. As Vth is reduced, however, another significant problem arises: the stand-by current increases rapidly because of the sub-threshold leakage current, which damages the power performance. To save stand-by power in sleep mode, a power management scheme that combines a small embedded processor with multi-threshold sleep control is reported in [24]. It utilizes high-Vth MOSFET devices, resulting in low standby and dynamic power.
2.3.4. Bulk-driven Technique
Similar to multi-threshold MOS control, the bulk-driven technique uses circuit techniques to shift Vth lower or higher by biasing the bulk voltage. The bulk-driven technique is sometimes also called “adaptive body biasing” [25]. Some works based on the bulk-driven technique are reported in [26-27]. The threshold voltage can be expressed as in Eq. (2.4) [28].
$$V_{th} = V_{th0} - \gamma \left[ \sqrt{2\phi_F - V_{SB}} - \sqrt{2\phi_F} \right] \qquad (2.4)$$
This is the well-known equation relating the body voltage to the threshold voltage, where γ is the body-effect coefficient. The bulk-driven technique has several important features. The obvious one is that it enhances the driving capability by modulating Vth. The most important one is that it allows zero, negative, and even small positive bias voltages to achieve the desired DC currents, which makes it a good alternative for increasing the input common-mode voltage range. In normal circuit design, the bulk terminal of a PMOS (NMOS) device is always connected to the highest (lowest) potential to avoid latch-up caused by forward biasing of the bulk-source junction.
2.4. Summary
In this chapter, the background of the dissertation has been briefly reviewed. Owing to non-ideal effects caused by the shrinking channel length and gate-oxide thickness, current variation caused by the environment makes circuit design more challenging. Additionally, nano-scaled circuit design using a near-threshold supply has several detrimental impacts. The trade-off between performance and energy efficiency should be carefully dealt with. In the last part of this chapter, some popular low-voltage design techniques have been introduced as well. Based on the concept of the bootstrap technique, we will develop several bootstrap circuits in the following chapters.
Chapter 3
Near-threshold Clock Network
A driver with strong driving current and little skew is needed in a clock network. As shown in Fig. 3-1(a), the conventional bootstrapped driver consists of a pull-up and pull-down control pair to drive the PMOS and NMOS transistors, respectively. As mentioned in Chapter 2, the gate voltages of the PMOS and NMOS driver transistors are kept at VDD and 0 in the cut-off phase; they are fed -VDD and 2VDD to increase the current density in the driving phase. Despite a previous effort [35] to increase the boosting efficiency by rearranging the timing of the switching and boosting signals, reverse leakage current remains the main drawback of conventional bootstrapped drivers. Among other bootstrapped circuits, single-capacitor ones reduce the hardware overhead [36-37]. However, their complex circuitry seriously worsens charge sharing at the capacitor node. Moreover, the leakage current is problematic as well.
Fig. 3-1. (a) Conventional bootstrapped circuit. (b) Proposed bootstrapped circuit.
In this chapter, we present a sub-threshold clock network with a bootstrapped CMOS inverter operated at a sub-threshold power supply. The bootstrapped CMOS inverter is introduced to achieve high boosting efficiency and improve the speed. It both increases the driving ability by boosting signals into the super-threshold region and reduces the leakage current. Fig. 3-1(b) illustrates the circuit diagram. Theoretically, the PN bootstrap circuit produces an output swing of -VDD to 2VDD. 2VDD (-VDD) enhances the driving capability of the NMOS (PMOS) driver and suppresses the leakage of the PMOS (NMOS). The negative VSG (VGS) = -VDD produced by the PN bootstrap suppresses the leakage current while the PMOS (NMOS) driver is turned off. Moreover, compared to previous works, the proposed design scheme has fewer devices in the sub-threshold region, which explains why process variation affects it to a lesser extent.
3.1. Overview of On-chip Interconnect
Before introducing the proposed bootstrapped CMOS inverter, the fundamentals of on-chip interconnect are briefly reviewed. First, a linear model of the interconnect and repeaters is adopted according to VLSI parameter scaling. In addition, the definitions of speed and power consumption of on-chip interconnect circuits are described. All these parameters derived from the linear models are used to define the figure of merit (FoM), the index for optimal global on-chip interconnect design.
3.1.1. RC-Interconnect with Repeater Insertion
Fig. 3-2. Cross section of interconnect configurations. (Figure labels: Top Metal, Bottom Metal.)
In general, a global interconnect is assumed to be placed between two adjacent orthogonal
metal layers and two coplanar wires, as shown in Fig. 3-2, where W and S are the interconnect
width and spacing; T is the interconnect thickness and H is the dielectric height; Cf is the
fringing-field capacitance; Ca is the parallel plate capacitance to the top and bottom layers of
metal; Cc is the coupling capacitance between the neighboring interconnects. The interconnect
resistance per unit length is denoted as (3-1).
$$r_w = \frac{\rho}{W \cdot T}. \tag{3-1}$$
With technology scaling and increasing global interconnect length, repeater insertion is broadly used to reduce delay and power consumption. Several studies have addressed the optimization of global interconnect design with repeater insertion [29-33]. Since the interconnect parameters can be determined by the width W, the spacing S, and so on, on-chip interconnects with repeater insertion can be analyzed with the Elmore RC delay model. According to the Elmore delay model, the time constant τ of the whole interconnect can be obtained from the model described in [29-31].
When we separate a global interconnect into several segments, the small delay penalty of the repeaters can be tolerated on these critical segments, and the time constant τ is dominated by the interconnect segment. However, if the segments of the global interconnect are made too short, the driving capability of the repeaters decreases severely. Consequently, there is a trade-off between the time constant τ and power consumption.
3.1.2. Time Constant, Power Dissipation and Figure of Merit
The data rate is related to the time constant. The rise time and fall time can be estimated from the step response. The output rise time is defined from the 20% transition edge to the 80% transition edge, as shown in Eq.(3-2).
$$t_r = t_{80\%} - t_{20\%} \cong 1.386\tau. \tag{3-2}$$
The minimum rise time is specified as 0.125 unit interval (UI) in the SATA standard, where t80% and t20% are the times when the output voltage exceeds 80% VDD and 20% VDD, respectively, during the rising edge [34].
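The factor 1.386 in Eq.(3-2) can be checked with a short derivation, assuming the output follows a first-order (single time constant) step response:

```latex
v_{out}(t) = V_{DD}\left(1 - e^{-t/\tau}\right)
\;\Rightarrow\; t_{x} = -\tau \ln(1 - x), \\
t_{20\%} = \tau \ln\frac{1}{0.8} \approx 0.223\,\tau, \qquad
t_{80\%} = \tau \ln\frac{1}{0.2} \approx 1.609\,\tau, \\
t_r = t_{80\%} - t_{20\%} = \tau \ln\frac{0.8}{0.2} = \tau \ln 4 \approx 1.386\,\tau .
```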
Besides speed, which is one of the most important factors in on-chip interconnect design, power consumption is another basic consideration. The total power consumption includes not only the switching power but also the short-circuit power and the leakage power, which are expressed as PSW, PSC and PLeakage, respectively. The detailed expressions and discussions are reported in [29-31]. The total power dissipation of each interconnect is written as in Eq.(3-3).
$$P_T = \left(\frac{L}{h}\right) \times \left(P_{SW} + P_{SC} + P_{Leakage}\right), \tag{3-3}$$
where L is the total length of the interconnect and h is the segment length. Since the switching power dissipation is a large portion of the total power, PSW can be expressed as in Eq.(3-4).
$$P_{SW} = \alpha \cdot f \cdot \left[\frac{mL}{h}\left(c_{gs} + c_{db}\right) + c_{Wire}\right] \cdot V_{DD}^2, \tag{3-4}$$
where (cgs+cdb) is the parasitic capacitance of the repeater.
The performance of an interconnect is affected by many design parameters, most of which are discussed in the literature [32-33]. The FoM is used to compare the performance. Here, FoM1 in Eq.(3-5) is defined as the total energy per bit to express the energy efficiency.
$$\mathrm{FoM}_1 = E_T = \frac{P_T}{f} \approx \alpha C_{Total} V_{DD}^2. \tag{3-5}$$
Here, ET represents the total energy. Fig. 3-3 shows the energy per bit ET as a function of the segment length h and the repeater finger number m, where the total length L is 10 mm. The design is more energy-efficient as h becomes longer and m is set to the minimum, m=1. Since the supply voltage VDD is assigned by the system requirement, the only way to improve the energy efficiency is to use a long segment length h. However, this incurs a large speed penalty. Given this limit, the best energy efficiency occurs with the maximum h and the minimum driver sizing, so the choice becomes a trade-off that depends on the requirement.
Fig. 3-3. Effect of segment length and fingers of repeaters on the energy per bit. (Axes: segment length (um), finger m, energy per bit (pJ).)
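The trend in Fig. 3-3 can be reproduced qualitatively with a minimal sketch of the energy-per-bit estimate of Eq.(3-5). The capacitance values, the activity factor, and the 1 V supply below are assumptions chosen for illustration only; they are not the parameters behind Fig. 3-3.

```python
def energy_per_bit_pj(h_um, m, length_um=10_000.0, vdd=1.0, alpha=0.5,
                      c_rep_ff=2.0, c_wire_ff_per_um=0.2):
    """Energy per bit (pJ) from E ~ alpha * C_total * VDD^2.

    c_rep_ff (gate plus drain capacitance of one repeater finger) and
    c_wire_ff_per_um are assumed values, not taken from the text.
    """
    n_repeaters = length_um / h_um                 # L / h segments
    c_total_ff = n_repeaters * m * c_rep_ff + c_wire_ff_per_um * length_um
    return alpha * c_total_ff * vdd ** 2 * 1e-3    # fF * V^2 = fJ, then pJ


for h in (200, 400, 800, 1200):
    row = [round(energy_per_bit_pj(h, m), 3) for m in (1, 10, 100)]
    print(f"h = {h:4d} um -> E/bit (pJ) for m = 1, 10, 100: {row}")
```

Under these assumptions the repeater capacitance term shrinks as h grows and as m decreases, so the energy per bit approaches the wire-only limit, which matches the observation that long segments with m=1 are the most energy-efficient.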
3.2. Active Leakage Reduction Bootstrapped Inverter
Fig. 3-4 schematically depicts the proposed active leakage reduction bootstrapped inverter
(ALBI), where CBP and CBN are the bootstrap capacitors; MP1 and MN1 are the transistors for CBP
pre-charge and CBN pre-discharge; INV refers to the inverter to control MP2 and MN2; MPD and
MND are the output drivers for CL; NP and NN are the boosted nodes. The node NB is boosted
operations with the input switching from H to L and from L to H respectively. Fig. 3-7 shows the ALBI simulated transient waveforms with an output load of 0.5pF under a power supply of
200mV. According to this figure, before Vin transitions from H to L, node NN has an initial voltage of 0V. After the H-to-L transition, NN is boosted below ground to -188mV. Meanwhile, MP2 is turned off and MN2 is turned on. Therefore, the boosted signal at NN passes through MN1 to NB to drive MPD in order to pull up the capacitive load CL. At this moment, MP1 is turned on to pre-charge NP to VDD (0.2V). However, MN1 is reversely turned on, causing a reverse current to flow and charge NN. At the end of the period while Vin is L, NN still holds -90mV. When Vin goes from L to H, the operation is similar to that of the H-to-L transition. NP is boosted above VDD to 389mV and is discharged to 303mV at the end of the period while Vin is H.
Fig. 3-4. Proposed bootstrapped inverter.
Fig. 3-6. Proposed bootstrapped inverter operations (input L-to-H).
Fig. 3-7. Simulated timing waveforms at 5 MHz at 200 mV VDD. (Voltage (V) versus time (sec) for Vin, NP, NN, and Vout.)
3.3. Detail Evaluation and Discussion
The proposed ALBI is superior to previous designs in terms of leakage power and switching speed. In a low-voltage circuit design, the decreasing Ion/Ioff ratio degrades the noise margin. In the proposed design, the boosted voltage is used in both the driving phase and the cut-off phase. Additionally, the proposed design improves the Ion/Ioff ratio by using the active bootstrapped leakage reduction method. Moreover, the smaller number of components increases the speed of the bootstrapped circuit. Owing to the fewer components operating in the sub-threshold region, the proposed design scheme performs better than previous works in Monte Carlo analysis.
To compare the performance of the proposed scheme and conventional ones more fairly, this work re-designed the conventional inverter and the reported bootstrapped drivers in the 90nm process. The sizes of the conventional inverter and the bootstrapped drivers are designed to obtain the same rise/fall transient output waveforms. Their device sizes are listed in TABLE 3-1. A 30fF boost capacitor is used to ensure that the boosting efficiency exceeds 80%. These features are evaluated in detail as follows.
TABLE 3-1 Device Sizing
Driver topology                  Sub-circuit   NMOS W/L (nm/nm)   mn   PMOS W/L (nm/nm)   mp
Conventional INV                 inverter      420 / 80           30   440 / 80           30
Proposed bootstrapped inverter   inverter      400 / 80           4    200 / 80           4
                                 MP1, MN1      200 / 80           1    200 / 80           1
                                 MP2, MN2      200 / 160          1    200 / 160          1
                                 driver        285 / 80           1    340 / 80           2
Bootstrapped driver [16]         inverter      400 / 80           4    200 / 80           4
                                 switch        200 / 80           3    200 / 80           3
                                 driver        250 / 80           1    340 / 80           2
Bootstrapped driver [17]         inverter      400 / 80           4    200 / 80           4
                                 switch        200 / 80           4    200 / 80           4
                                 driver        260 / 80           1    300 / 80           2
3.3.1. Boosting Efficiency
Ideally, the boosted node NB generates a voltage swing from 2VDD to –VDD. However, the
parasitic capacitance at node NB exhibits the charge-sharing effect with the bootstrap capacitance
[17]. For example, when NB transitions above VDD, consider the equivalent circuit of the upper
side shown in Fig. 3-4. VBP and CPTP are the voltage and the total parasitic capacitance at NB,
respectively. Ideally, VBP transits from –VDD to 2VDD. Thus,
$$V_{BP} = 2V_{DD} \cdot \frac{C_{BP}}{C_{BP} + C_{PTP}} - V_{DD} \cdot \frac{C_{PTP}}{C_{BP} + C_{PTP}}. \tag{3-6}$$
To increase driving capability, the bootstrap capacitance is designed to be significantly larger than the parasitic capacitance at the node. As a result, (3-6) can be rewritten as (3-7),
$$V_{BP} \approx 2 \cdot \frac{C_{BP}}{C_{BP} + C_{PTP}} \cdot V_{DD} = 2\beta_P V_{DD}, \tag{3-7}$$
where βP is the boosting efficiency. Similarly, when NB transitions from VDD to below ground, the estimated VBN is
$$V_{BN} \approx \frac{C_{BN}}{C_{BN} + C_{PTN}} \cdot \left(-V_{DD}\right) = \beta_N \cdot \left(-V_{DD}\right). \tag{3-8}$$
The larger the bootstrap capacitance, the better the boosting efficiency. In order to observe the leakage power and delay time in a more ideal case, we used a 100fF bootstrap capacitor. In our test chip, based on a trade-off between cost and performance, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80%. As shown in Fig. 3-8, the boosting efficiency is 88% when using a 30fF bootstrap capacitor.
Fig. 3-8. Boosting efficiency vs. bootstrap capacitor. (Axes: boosting efficiency (%) versus bootstrap capacitor (fF).)
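The charge-sharing relations of Eqs.(3-6) to (3-8) can be explored with a minimal sketch. It assumes the boosting efficiency is the capacitive-divider ratio C_B / (C_B + C_PT) and that the parasitic capacitance stays constant; the parasitic value is inferred from the reported 88% at 30fF and is an assumption, not a figure given in the text.

```python
def boosting_efficiency(c_boot_ff, c_par_ff):
    """beta = C_B / (C_B + C_PT), the capacitive divider of Eq. (3-7)."""
    return c_boot_ff / (c_boot_ff + c_par_ff)


def boosted_high(vdd, c_boot_ff, c_par_ff):
    """Boosted-high node voltage including charge sharing, as in Eq. (3-6)."""
    total = c_boot_ff + c_par_ff
    return 2 * vdd * c_boot_ff / total - vdd * c_par_ff / total


def boosted_low(vdd, c_boot_ff, c_par_ff):
    """Boosted-low node voltage, as in Eq. (3-8)."""
    return -vdd * boosting_efficiency(c_boot_ff, c_par_ff)


# Assumption: parasitic capacitance inferred from "88% at 30 fF".
C_PAR_FF = 30.0 * (1.0 / 0.88 - 1.0)

for c_boot in (10.0, 30.0, 100.0):
    beta = boosting_efficiency(c_boot, C_PAR_FF)
    print(f"C_B = {c_boot:5.1f} fF: beta = {beta:.2f}, "
          f"V_high = {boosted_high(0.2, c_boot, C_PAR_FF) * 1e3:.0f} mV, "
          f"V_low = {boosted_low(0.2, c_boot, C_PAR_FF) * 1e3:.0f} mV")
```

With a fixed parasitic load, the efficiency rises monotonically with the bootstrap capacitance, which is the saturating trend plotted in Fig. 3-8.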
3.3.2. Reduction of Leakage Current
In the proposed design scheme, the boosted high level (2VDD) at NB enhances the driving capability of MND and suppresses the leakage current of MPD. Similarly, the boosted low level (-VDD) at NB enhances the driving capability of MPD and reduces the leakage of MND.
The Ioff current is primarily formed by the sub-threshold leakage current [38-39]. Hence, scaling the supply voltage lowers the Ion/Ioff ratio. In the previous literature, bootstrapped drivers improve the Ion/Ioff ratio only by enhancing Ion, that is, unidirectionally. The proposed design effectively suppresses the leakage current of the PMOS (NMOS) by applying a potential of -VDD to VSG (VGS). According to the I-V formula in the sub-threshold region, our design reduces the leakage current exponentially.
Measuring the leakage power under dynamic operations is difficult. The leakage power of a periodic waveform can be estimated by separating it from the average total power. The total energy ET of a period T is
$$E_T = P_T \cdot T \approx \left(P_{SW} + P_{SC} + P_{Leakage}\right) \cdot T = E_{SW} + E_{SC} + P_{Leakage} \cdot T, \tag{3-9}$$
where ET, ESW, ESC and ELeakage represent the total energy, the switching energy, the short-circuit energy, and the leakage energy, respectively. The switching energy, short-circuit energy and leakage current are assumed to remain constant under the same power supply. A long wire can be regarded as a large capacitive load in the pF range. When a CMOS driver drives heavy capacitive loads, the energy contribution of the short-circuit current can be ignored. ELeakage is proportional to T; Erep is the total energy of the repeaters. Thus, we can rewrite Eq.(3-9) as
$$E_T \approx \left(E_{rep} + \frac{\alpha}{2} C_{wire} V_{DD}^2\right) + P_{Leakage} \cdot T. \tag{3-10}$$
For two identical signals with different periods T1 and T2, the leakage power PLeakage is derived as
$$P_{Leakage} = \frac{P_{T1} \cdot T_1 - P_{T2} \cdot T_2}{T_1 - T_2}. \tag{3-11}$$
Fig. 3-9 compares the leakage power as a function of frequency with a 0.2pF capacitive load in different temperature and process corners. The ratio of leakage power to total power is also shown in Fig. 3-9. Owing to the negative VGS control, the leakage power of the proposed bootstrapped inverter at 10MHz under 0.2V is 2pW. The leakage power is 3.9nW for a conventional inverter, 0.15nW for [16], and 39nW for [17]. Although the PMOS (NMOS) transistor is turned off with the positive voltage VSG (VGS) = VDD in [17], the leakage power in [17] is more than three orders of magnitude higher than in the proposed design scheme. When the operating frequency goes from 10MHz to 100kHz, the potential of the boost node becomes lower due to node leakage, which degrades the leakage performance. The potential of the boost node even returns to
Fig. 3-9. Leakage power as a function of frequency from 10 MHz to 100 kHz in corners: (a) TT, 25°C; (b) SS, -40°C; (c) FF, 125°C, all at VDD = 0.2V. (Axes: leakage power (W) and PLeak/Ptotal (%) versus clock frequency (Hz); curves: conventional, proposed, JSSC1997 [16], TVLSI2008 [17].)
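The two-period estimate of Eq.(3-11) can be checked numerically with a short sketch. The per-cycle dynamic energy and the leakage value used to build the synthetic inputs are hypothetical; only the formula itself comes from the text.

```python
def leakage_power(p_t1, t1, p_t2, t2):
    """Eq. (3-11): P_leak = (P_T1 * T1 - P_T2 * T2) / (T1 - T2)."""
    if t1 == t2:
        raise ValueError("the two periods must differ")
    return (p_t1 * t1 - p_t2 * t2) / (t1 - t2)


# Synthetic check: build two average powers from a known model,
# E_T = E_dynamic + P_leak * T, then recover P_leak from them.
E_DYNAMIC_J = 90e-15      # hypothetical energy per cycle
P_LEAK_TRUE_W = 100e-9    # hypothetical leakage power
T1_S, T2_S = 100e-9, 105e-9

p1 = E_DYNAMIC_J / T1_S + P_LEAK_TRUE_W
p2 = E_DYNAMIC_J / T2_S + P_LEAK_TRUE_W
estimate = leakage_power(p1, T1_S, p2, T2_S)
print(f"recovered leakage power: {estimate * 1e9:.1f} nW")
```

Because the dynamic energy per cycle cancels in the numerator, the estimate depends only on the assumption that the per-cycle dynamic energy is the same for both periods.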
3.3.3. Delay Time Analysis
Delay time is another important feature of bootstrapped circuits. Although the driving transistors operate in the triode region under the sub-threshold supply, other devices remain in the sub-threshold region. The total delay time is thus the sum of the propagation delays of the INV and the driver, which is denoted as
$$t_{P,BI} = t_{P,INV} + t_{P,Driver}, \tag{3-12}$$
where tP,BI, tP,INV, and tP,Driver are the delays of the bootstrapped inverter, the INV, and the driver, respectively.
Assume that the boost efficiency is the same for all bootstrapped drivers. The delay time of the INV then becomes the dominant factor. The sub-threshold logic delay is derived in [9] as
$$t_P = \frac{k_f \cdot C_L \cdot V_{DD}}{\mu C_{dep} \dfrac{W}{L} V_T^2 \exp\left(\dfrac{V_{DD} - V_{th}}{n V_T}\right)}, \tag{3-13}$$
where kf is a fitting parameter. However, circuit delay time is related to the RC loading effects.
The ALBI has the shortest delay time among the other bootstrapped circuits since the loading of
Fig. 3-10 summarizes the comparison results for the delay time (from H to L) and the power
consumption as a function of CL at 10 MHz with a supply of 200 mV. The proposed design is the
lowest in power consumption and delay time.
Fig. 3-10. Delay time and power consumption versus capacitive loads at 10 MHz. (Axes: delay time (sec) and power (W) versus capacitive loading (pF); VDD = 0.2V, 25°C, TT corner; curves: proposed, JSSC1997 [16], TVLSI2008 [17].)
The potential of the boost node returning to VDD or 0 indeed degrades the leakage performance at low frequencies or in the fast process/temperature corners. On the other hand, the potential of the other boost node can easily be pre-charged to VDD or 0. As shown in Fig. 3-11, whether in the nominal 25°C, TT corner, the -40°C, SS corner, or the 125°C, FF corner, the delay times of all designs are almost the same at frequencies from 1 MHz to 100 kHz.
Fig. 3-11. (Axes: delay time (sec) versus clock frequency (Hz) at VDD = 0.2V; curves: proposed, JSSC1997 [16], TVLSI2008 [17] at TT, 25°C; SS, -40°C; FF, 125°C.)
3.3.4. Delay Time Analysis of Process Variation
Sub-threshold operation limits the yield due to its serious process variations. Although the boosted control signal pushes the driver transistors into the triode region, the remaining circuit devices still suffer from the same serious variation. With fewer devices in the sub-threshold region, the proposed design is less affected by process variation.
The delay time variability analysis is performed based on Monte Carlo simulations. Device mismatch, threshold voltage Vth and process corner variation are assumed to follow Gaussian random distributions. In order to cover the most critical process and temperature corners, Monte Carlo simulations are run under 3σ process variation at 25°C, 125°C and -40°C, as shown in Fig. 3-12. The supply voltage is 200mV and the clock rate is 1MHz. The number of samples for each temperature corner is 1500, and the total number of samples is 4500. For the worst case at -40°C, a conventional inverter has an average delay of 15.1ns and a standard deviation of 26.4ns. The proposed design reduces not only the average delay to 6.9ns but also the standard deviation to 6.3ns, which is much better than [16] and [17]. The ALBI thus has higher immunity to process and temperature variation.
Fig. 3-12. (Histograms of the number of samples versus delay time (sec) at VDD = 0.2V for 25°C, 125°C, and -40°C. Conventional: μ = 15.1 ns, σ = 26.4 ns; proposed: μ = 6.9 ns, σ = 6.3 ns; JSSC1997 [16]: μ = 15.1 ns, σ = 22.3 ns; TVLSI2008 [17]: μ = 13.9 ns, σ = 17.3 ns.)
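Why sub-threshold devices dominate the delay spread can be illustrated with a small Monte Carlo sketch. It uses only the exponential dependence of Eq.(3-13) on Vth with a Gaussian Vth, as assumed in the text; the slope factor `n`, the lumped constant `k`, and the two Vth standard deviations are hypothetical, and the sketch is not a substitute for the circuit-level simulations reported here.

```python
import math
import random
import statistics


def subthreshold_delay(vth, vdd=0.2, n=1.5, v_t=0.0259, k=1e-9):
    """Delay ~ k * VDD / exp((VDD - Vth) / (n * V_T)), the form of Eq. (3-13).

    k lumps k_f, C_L and the device constants; n and k are assumed values.
    """
    return k * vdd / math.exp((vdd - vth) / (n * v_t))


def delay_statistics(sigma_vth, samples=1500, vth0=0.24, seed=1):
    """Mean and standard deviation of delay for Gaussian Vth variation."""
    rng = random.Random(seed)
    delays = [subthreshold_delay(rng.gauss(vth0, sigma_vth))
              for _ in range(samples)]
    return statistics.mean(delays), statistics.stdev(delays)


for sigma in (0.01, 0.03):   # assumed Vth standard deviations in volts
    mean, std = delay_statistics(sigma)
    print(f"sigma_Vth = {sigma * 1e3:.0f} mV: sigma/mu of delay = {std / mean:.2f}")
```

Since the delay is exponential in Vth, a Gaussian Vth produces a long-tailed delay distribution, and the relative spread grows quickly with the Vth variation.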
3.4. Implementation and Experimental Results
3.4.1. Implementation of the Bootstrap Capacitor
We can choose the value of the boost capacitor to adjust the boosting efficiency. A large boost capacitor achieves high boosting efficiency. In addition, a larger boost capacitor stores more charge to hold the node voltage against leakage even at low speed. However, area cost and power consumption are the design trade-offs. In our test chip, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80% without occupying too much area. The MOSFET capacitor, the MOM capacitor, and the MIM capacitor are three types of capacitors in CMOS technology. Among them, the MOSFET capacitor has the highest capacitance per area. However, the MOSFET capacitor also has several drawbacks. First, when the MOSFET capacitor operates in the sub-threshold region, the capacitance changes abruptly with the control voltage, as shown in Fig. 3-13. Second, the leakage current of the nano-scaled device becomes more serious.
Third, the MOSFET capacitor has a large parasitic capacitance from the Vctrl nodes to the bulk as compared to the other capacitors. The large parasitic capacitance needs more power budget in the driver. The MIM capacitor has the least parasitic capacitance but the largest area. A 30fF MIM capacitor occupies 5.1um x 8.5um. Besides, the MIM capacitor needs an extra mask, which means extra cost. As a result, we use a MOM capacitor as the boost capacitor, which requires no extra mask. A 30fF MOM capacitor occupies 3.7um x 8.6um and has a 1fF parasitic capacitance load at both nodes.
Fig. 3-13. (Axes: MOSFET capacitance (fF) versus Vctrl (V), annotated where the capacitance decreases abruptly.)
3.4.2. Chip Implementation and Measurement
A test chip of bootstrapped CMOS inverters is implemented in a 90nm 1P9M SPRVT process to demonstrate the effectiveness of the proposed design scheme. The test circuits include the reported bootstrapped circuits of [16] and [17] and the proposed design. The circuits also contain test keys to verify the interconnection model. Each bootstrapped circuit is implemented as a 10-stage cascade driver chain. In each stage, two 30fF MOM capacitors serve as bootstrap capacitors and a 200fF MOM capacitor serves as CL. Level shifters are used to boost the 200mV internal signal to a 500mV chip I/O signal for the measurement. The total area is 958μm × 776μm, and the core area is 566μm × 102μm. Fig. 3-14 shows the die photograph. The layout area of the proposed bootstrapped inverter cell is 25.8μm × 4.1μm.
Fig. 3-14. Die photograph and cell layout. (Labeled regions: test keys, bootstrapped test circuits, decoupling capacitors, and the proposed bootstrapped inverter cell.)
Fig. 3-15 shows a photograph of our experimental environment. Fig. 3-16 shows the measured waveform. The cumulative clock peak-to-peak and RMS jitters are 3.6ns and 504ps, respectively. The measured average total power is 1.01μW. With the leakage power estimation based on Eq. (3-10), the derived leakage power is 107nW using periods of 100ns and 105ns. TABLE 3-2 lists the summary of the chip. Since the threshold voltages Vthn and |Vthp| are 240mV and 180mV, respectively, we target operation at 10MHz at 0.2V. TABLE 3-3 lists the comparisons of measured results with other works at 0.2V. For a ten-stage driver chain operating at 10MHz, the ALBI has a delay time of 30.1μs, an energy efficiency of 0.1 pJ/cycle, and a leakage power of 107nW, which are the best as compared to [16] and [17].
Fig. 3-16. Measured waveform at 0.2V core VDD (0.5V I/O VDD). (Scale labels: 20ns, 100mV.)
TABLE 3-2 Chip Summary
Item                                      Specification (unit)
Process                                   90nm SPRVT Low-K CMOS Process
Supply Voltage                            0.2V: Bootstrapped Circuits
                                          0.2V, 0.5V: Level Shift Buffer
                                          0.5V: Digital Circuits
Power Dissipation @ 10 MHz (10 stages)    Total Power: 1.01uW (measured), 1.13uW (post-sim, FF corner)
                                          Leakage Power: 107nW (measured), 133nW (post-sim, FF corner)
Layout Area                               Whole Chip: 958μm × 776μm
                                          Bootstrapped Circuits: 566μm × 102μm
                                          Interconnect Test Circuits: 575μm × 307μm
TABLE 3-3 Comparisons
                          Proposed   T.VLSI2008 [17]   JSSC1997 [16]
Supply voltage (V)        0.2        0.2               0.2
Max frequency (MHz)       10         5                 4
Delay time (us)           30.1       48.2              47.3
Total Power (uW)          1.01       1.71              0.74
Leakage Power (nW)        107        833               276
Energy per cycle (pJ)     0.10       0.34              0.19
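The energy-per-cycle row of TABLE 3-3 follows from dividing the total power by the maximum frequency, and the variation improvement follows from the Monte Carlo standard deviations quoted in Section 3.3.4. The short check below uses the values as read from the table and the text; the function and variable names are illustrative only.

```python
def energy_per_cycle_pj(total_power_w, frequency_hz):
    """Energy per cycle in pJ: total power divided by clock frequency."""
    return total_power_w / frequency_hz * 1e12


# (total power in W, max frequency in Hz) as read from TABLE 3-3.
MEASURED = {
    "Proposed": (1.01e-6, 10e6),
    "T.VLSI2008 [17]": (1.71e-6, 5e6),
    "JSSC1997 [16]": (0.74e-6, 4e6),
}

for name, (power_w, freq_hz) in MEASURED.items():
    print(f"{name:16s}: {energy_per_cycle_pj(power_w, freq_hz):.3f} pJ/cycle")

# Standard-deviation reduction versus the conventional inverter at -40 C.
sigma_reduction = (26.4 - 6.3) / 26.4
print(f"delay sigma reduction: {sigma_reduction * 100:.0f}%")
```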
3.5. Summary
This chapter describes an ALBI applied to a sub-threshold supply clock network. Based on 4500 Monte Carlo simulation runs, the average delay time of the proposed design with a 200fF CL is 6.9ns with a standard deviation of 6.3ns, a 76% reduction in standard deviation relative to the conventional inverter. Measured results verify that the test chip can achieve a clock rate of 10MHz at 200mV VDD. Due to the negative VGS suppression, the measured leakage power is improved by more than 50% over the previously reported bootstrapped drivers. The power consumption is 1.01μW, the leakage power is 107nW, and the energy efficiency is 0.1pJ/cycle.
Chapter 4
Near-threshold On-chip Bus
In data communication, inter-symbol interference (ISI) critically limits the data rate. In this chapter, an on-chip bus design with an ISI-suppressed bootstrapped near-threshold repeater is proposed. Operating at the near-threshold supply voltage is the most effective means of power reduction. To overcome the poor driving capability, the bootstrap technique is used. In addition, a pre-charge enhancement scheme and a leakage current reduction scheme are adopted. They achieve a beneficial speed-energy tradeoff. Furthermore, the proposed repeater suppresses ISI noise in data link applications.
4.1. Proposed On-chip Bus Architecture
Fig. 4-1. Proposed on-chip bus architecture with new bootstrapped repeater insertion. (Figure label: Metal 5.)
Fig. 4-1 shows the proposed 4-bit on-chip bus for data communication under the near-threshold power supply. A bus is divided into several segments, each of which is driven by a bootstrapped repeater. Ground shielding is used to eliminate the effective-loading uncertainty and decouple the noise from adjacent channels. The staggered repeaters on adjacent channels are misaligned to reduce the coupling noise and simultaneous switching noise (SSN).