Bootstrapped Circuit Techniques for Near-threshold On-chip Data Link


National Chiao Tung University

Institute of Electrical Control Engineering

Doctoral Dissertation

Bootstrapped Circuit Techniques for Near-threshold

On-chip Data Link

Student: 何盈杰

Advisor: Prof. 蘇朝琴


Student: Ying-Chieh Ho (何盈杰)

Advisor: Chau-Chin Su (蘇朝琴)

A Dissertation Submitted to the Institute of Electrical Control Engineering, College of Electrical Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical Control Engineering

June 2012

Hsinchu, Taiwan, Republic of China


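Chinese Abstract

In recent years, "green energy and sustainability" has been a focus of development in every field. For electronic products, batteries are the main energy source, and extending battery life reduces battery consumption; low-power design, in turn, lowers circuit power consumption and extends battery life. According to P = fCV², lowering the operating voltage while also reducing the capacitive load can cut dynamic power by several orders of magnitude. Lowering the operating voltage is the most direct and effective way to achieve low power, and many studies even operate circuits in the near-threshold region or directly in the sub-threshold region. Nanometer technologies are widely used in low-power applications, including RF, analog, AD/DA, and MPU designs, and, at even lower power, in designs for physiological signal monitoring. These designs exploit the reduced device loading of nanometer technologies and the limits of sub-threshold current. Near-threshold circuit design operates devices in the near-threshold region to cut power consumption substantially and achieve energy-efficient operation. It has, however, several major bottlenecks. First, the operating speed is low, so it is mostly applied to biomedical chips or other slow systems. Second, static leakage power becomes more severe in the near-threshold region. Third, severe process variation affects yield and mass-production cost.

In this dissertation, we propose data link circuit designs for a near-threshold system on chip (SoC), together with a series of new bootstrap techniques that address the problems of near-threshold circuit design. The main idea of the proposed bootstrap techniques is to provide bidirectional boosting, that is, to act on both P-type and N-type devices at the same time, which greatly increases the driving capability while suppressing static leakage. Compared with conventional circuits operated in the near-threshold region, an improvement of two orders of magnitude is obtained. Another advantage is that the bootstrap techniques let circuits supplied at sub-threshold voltages operate in the ordinary triode region, which makes the circuit model more accurate. Monte Carlo analysis of the circuits clearly shows that process variation is greatly reduced as a result. We present four related circuits. (1) A bootstrapped inverter with active leakage reduction for clock networks. At 0.2 V it provides a stable 10 MHz clock even for a clock tree with 1 cm of on-chip wiring, and it suppresses the severe static leakage current of low-voltage operation. The design also uses gate boosting so that most devices operate in the conducting region, which greatly reduces process variation. (2) A bootstrapped repeater for on-chip buses that effectively suppresses inter-symbol interference (ISI). At VDD = 0.3 V, a single channel transmits up to 100 Mbps (using a 2^10−1 PRBS), and even at VDD = 0.1 V it still achieves 0.8 Mbps. (3) Next, in pursuit of the best energy efficiency, we propose high-boosting repeaters with pre-drivers that provide three-times and four-times boosting, without … reaching a data rate of up to 5 Mbps with an energy consumption of only 35.2 fJ per bit. (4) Finally, we propose a bootstrapped ring oscillator and complete an all-digital phase-locked loop (ADPLL) that operates at near-threshold voltage. At 0.5 V, the ADPLL provides a 480 MHz output frequency while consuming only 78 μW, and at 0.25 V it still provides 44.8 MHz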


Student：Ying-Chieh Ho Advisor：Chau-Chin Su

Institute of Electrical Control Engineering National Chiao Tung University

ABSTRACT

For sustainable electronic devices, ultra-low-power design is essential to prolong battery life. According to P = fCV², scaling down the supply voltage is the most effective way to reduce power consumption. According to the forecast of the International Technology Roadmap for Semiconductors (ITRS), the supply voltage will be scaled to 0.5 V for low-power applications within the next generation. Scaling the supply voltage to near the threshold voltage is the most favorable solution for low-power designs. In addition, nano-scaled devices raise the speed limit in the near-threshold region because of their small device loading. Nano-scaled processes are widely applied to ultra-low-power designs, including RF, AD/DA, and MPU circuits, and especially biomedical applications. Emerging embedded biomedical applications have pushed low-power design to a further extreme.

To achieve energy-efficient operation, circuits are designed to work from a near-threshold supply. However, near-threshold circuit design is challenging. First, the driving capability (Ion) is low, which limits such circuits to slow systems. Second, the static leakage power becomes severe and the Ion/Ioff ratio decreases. Moreover, process variations become significantly worse, affecting circuit performance, power efficiency, and fabrication yield.

In this dissertation, we propose circuit designs for an on-chip data link system operating from a near-threshold supply. To address the design issues of the near-threshold region, we have developed several bootstrapped circuits. The main contribution of the proposed bootstrapped techniques is boosting the gate voltage on both sides, that is, boosting the gate voltages of the PMOS and NMOS devices at the same time. The proposed circuits both increase the driving capability, by boosting signals into the super-threshold region, and reduce the leakage current. When the circuit is operated in the sub-threshold region, an improvement of two orders of magnitude is achieved. In addition, the bootstrapped circuits operate in the triode region even with a near-threshold supply, which explains why process variation affects the proposed design scheme to a lesser extent. This is verified with Monte Carlo simulations.

Four building blocks using bootstrapped circuits in an on-chip data link are proposed. The first is a bootstrapped CMOS inverter for the on-chip clock network. In addition to improving the driving capability, a large gate voltage swing from −VDD to 2VDD suppresses the sub-threshold leakage current. The test chip achieves 10 MHz operation at a VDD of 200 mV with a power consumption of 1.01 μW. Monte Carlo analysis indicates that the standard deviation (sigma) of the delay time is only 2.9 ns at 0.2 V. The second is an ISI-suppressed bootstrapped repeater for the on-chip bus. Bootstrapped CMOS repeaters are inserted to drive a 10 mm on-chip bus. A precharge enhancement scheme increases the data transmission speed, and a leakage current reduction technique suppresses ISI jitter. Measured results demonstrate that a 10 mm on-chip bus achieves a 100 Mbps data rate at 0.3 V, and still 0.8 Mbps at 0.1 V. The third part investigates the performance of interconnects with repeater insertion in the sub-threshold region. A 3X CMOS pre-driver and a 4X pre-driver are proposed to enhance the driving capability. Compared with the conventional repeater, the proposed ones have higher energy efficiency. Measured results show that the 3X (4X) pre-driver achieves a 5 Mbps (1.5 Mbps) data rate at 0.15 V with an efficiency of 35.2 fJ (32.8 fJ). In the last part, we present a bootstrapped digitally-controlled ring oscillator (BDCO) that allows an ADPLL to operate from a near-threshold supply. The BDCO is composed of a bootstrapped ring oscillator (BTRO) and a weighted thermometer-controlled resistance network (WTRN). The proposed bootstrapped delay cell generates a large gate voltage swing that improves the driving capability significantly. The boosted output swing keeps the transistors in the linear region, so the output frequency is a highly linear function of VDD even with a near-threshold supply. Matched to the transfer characteristic of the BTRO, the WTRN provides linear control as the supply voltage is swept. The proposed ADPLL oscillates from 36.8 to 480 MHz with a power consumption of 2.4 to 78 μW under a supply voltage of 0.25 to 0.5 V.


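Acknowledgments

Time flies. In the blink of an eye, six years have passed since I left industry and returned to school. More than two thousand days have gone by, and what remains in my mind is deep gratitude. The road was full of setbacks, and the challenges came in wave after wave. More than once I doubted whether I could finish this degree, but now I have completed a major stage of my career plan. Many people helped and accompanied me along the way and made me who I am today. Besides thanking heaven, there are truly too many people to thank. I know in my heart that had even one of these benefactors been missing, just one, my degree might never have been completed. In the days ahead I will keep building my future and my worth, but before that, I thank everyone who has stood by me, encouraged me, and supported me.

I thank my advisor, Prof. Chau-Chin Su (蘇朝琴), for his guidance over the years. His rigorous and meticulous thinking in academic research and his tolerant, well-rounded way of dealing with people have both benefited me greatly. During these years I changed my approach to learning: instead of memorizing, I opened my mind and met the endless sea of knowledge with humility, imagination, and enthusiasm, and I gained a great deal as a result. I thank my master's advisor, Prof. 吳安宇. Although circumstances did not allow me to stay in your group, every time we met you reminded me to attend to future planning and writing skills in addition to research, and I have kept this in mind. I thank Prof. 洪浩喬, a mentor and a friend, for sharing so much of your research experience beyond the classroom. When I was rushing about blindly in the vast sea of research, having a senior colleague to point the way filled me with confidence. I thank Prof. 莊景德 and Prof. 李鎮宜 for providing chip area and tape-out opportunities in their project. Without these chips, our ideas would have been empty talk and none of these papers would exist. I thank Prof. 周世傑 for introducing me to scholars from around the world at the conference in Paris, France, which broadened my horizons. I also thank Dr. 曾煜輝 and Dr. 徐仁乾 for their comradeship. I will never forget the days we worked hard together, and I hope our efforts will bear fruit for all of us. I thank 小馬 for firing the first shot in the on-chip bus research, and I thank the other co-authors of this work, 家齊 and 于昇. It was an honor to discuss and grow with you on this topic, and now the whole world has seen our results. I thank the friends who lived together in the big family of 918: 丸子, 教主, 楙軒, 小潘潘, 方董, and all the junior students of the past six or seven years, for your help and tolerance. I also thank the assistants who worked alongside us in the project over the years: 雅雯, 上容, 俊秀, 豐文, 伉佑, and 美玲. I thank the friends in other research groups, Prof. 李淑敏, Prof. 蕭志龍, Dr. 盧台祐, Dr. 楊皓義, Dr. 杜明賢, Dr. 胡璧合, Dr. 蔡玉章, Dr. 陳嘉怡, Dr. 范銘隆, Dr. 洪紹峰, Dr. 許書餘, and the junior students 劉小胖, 致煌, 勖哲, 柏鈞, and 瑋庭, for lending a hand at the right moments and making my research go more smoothly.

Finally, I thank my beloved family. To my parents, my sister, and my brother: I can never repay your nurturing and your earnest hopes for me. To my wife, 佳慧: with your support I could devote myself to my studies without worry, and without your love there would be no doctorate. To my darling daughter 苡瑄: Daddy thanks you too. Because of you, Daddy faces the future with more courage, and because of you, Daddy's life has more meaning. This dissertation is dedicated to my family.

何盈杰, 交大電資303 (National Chiao Tung University), 2012/6/27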

Table of Contents

Chinese Abstract

ABSTRACT

Chapter 1 Introduction

1.1. Challenges in Nano-Scaled Near-threshold Design

1.2. Near-threshold On-chip Data Link

1.3. Organization of the Dissertation

Chapter 2 Background Review

2.1. Effects in Nano-scaled Process [6]

2.1.1. Short-Channel Effect

2.1.2. Narrow-Width Effect

2.1.3. Sub-threshold Leakage [6, 9]

2.1.4. Drain-Induced Barrier Lowering [6]

2.1.5. Gate-Induced Drain Leakage [6, 10]

2.1.6. Gate Leakage [11]

2.2. Challenges in Ultra Low-voltage Designs

2.2.1. Degradation of Driving Capability

2.2.2. Leakage Power and Ion-to-Ioff Ratio [8, 12]

2.2.3. Process, Voltage and Temperature Variation

2.3. Low-voltage Design Techniques

2.3.1. Bootstrap Techniques

2.3.2. Dynamic Voltage and Frequency Scaling

2.3.3. Multi-threshold MOS Control

2.3.4. Bulk-driven Technique

2.4. Summary

Chapter 3 Near-threshold Clock Network

3.1. Overview of On-chip Interconnect

3.1.1. RC-Interconnect with Repeater Insertion

3.1.2. Time Constant, Power Dissipation and FoM

3.2. Proposed Active Leakage Reduction Bootstrapped Inverter

3.3. Detail Evaluation and Discussion

3.3.1. Boosting Efficiency

3.3.2. Reduction of Leakage Current

3.3.3. Delay Time Analysis

3.3.4. Delay Time Analysis of Process Variation

3.4.1. Implementation of the Bootstrap Capacitor

3.4.2. Chip Implementation and Measurement

3.5. Summary

Chapter 4 Near-threshold On-chip Bus

4.1. Proposed On-chip Bus Architecture

4.2. ISI-suppressed Bootstrapped Driver

4.3. Detailed Evaluation and Comparisons

4.3.2. Leakage Current Reduction

4.3.3. Leakage Power Analysis

4.3.4. ISI Suppression

4.3.5. Energy Efficiency

4.3.6. Monte Carlo Simulations

4.4. Experimental Setup and Measurement

4.4.1. Chip Implementation

4.4.2. Measured Waveforms

4.4.3. Leakage Power Measurement

4.5. Summary

Chapter 5 High-boosting Pre-driver

5.1. Proposed High-boosting Pre-driver

5.2. High-boosting Pre-driver in Long Interconnects

5.2.1. Leakage Current Reduction

5.2.2. Energy Efficiency

5.2.4. Monte Carlo Simulations

5.3. Experiment and Measurement Results

5.3.1. Chip Implementation

5.3.2. Measured Waveforms

5.4. Summary

Chapter 6 Near-threshold ADPLL

6.1. Architecture of Proposed All-Digital PLL

6.1.1. PFD, PS and TDC

6.1.2. DLF

6.1.3. Bootstrapped Digitally-Controlled Oscillator

6.1.3.1. Bootstrapped Ring Oscillator

6.1.3.2. Weighted-Thermometer Code Control

6.1.4. SDM

6.2.1. Power Analysis of BTRO

6.2.2. Linearity Analysis of BTRO

6.3. Experimental Results and Comparisons

6.3.1. Chip Implementation

6.3.2. Measured Results

6.3.3. Comparisons

6.4. Conclusions

Chapter 7 Conclusions

References

VITA

Publication List


Chapter 1

Introduction

In the past few years, low-voltage and low-power designs have attracted significant attention because of the popularity of portable devices. Emerging embedded biomedical applications have pushed low-power design to a further extreme. According to P = fCV², scaling the supply voltage to near the threshold voltage is the most favorable solution for low-power designs. A 180 mV, 1024-point FFT processor was a pioneering sub-threshold design [1], followed by [2]. Sub-threshold SRAM is another important category [3]. Other designs include a 6-bit flash ADC operating at 0.2–0.9 V and a 14-tap, 8-bit finite impulse response (FIR) filter running at 20 MHz under 0.27 V [4-5].
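To make the quadratic dependence in P = fCV² concrete, the following minimal sketch compares the dynamic power at two supply voltages. The frequency, capacitance, and voltage values are hypothetical and chosen only for illustration; they are not taken from the designs cited above.

```python
def dynamic_power(f_hz, c_farad, vdd):
    """Dynamic switching power P = f * C * VDD^2 (activity factor taken as 1)."""
    return f_hz * c_farad * vdd ** 2


# Hypothetical operating points, chosen only for illustration.
p_nominal = dynamic_power(f_hz=100e6, c_farad=1e-12, vdd=1.0)
p_near_vth = dynamic_power(f_hz=100e6, c_farad=1e-12, vdd=0.3)

# With f and C held fixed, the ratio depends only on (0.3 / 1.0)^2.
ratio = p_near_vth / p_nominal
print(f"power at 0.3 V relative to 1.0 V: {ratio:.2f}")
```

With the frequency and load held constant, the ratio depends only on the square of the voltage ratio. In practice the achievable frequency also drops at low supply, which is the speed trade-off discussed in this chapter.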

“Sustainability” was the theme of ASSCC 2011 and ISSCC 2012, which focused on design techniques for energy-efficient, low-voltage circuits and for improving battery lifetime. A panel discussion on 0.5 V systems held during ASSCC 2011 pointed out the challenges of this new trend. Energy-efficient designs under a low-voltage supply usually suffer speed degradation, so a new circuit design strategy should strike a good trade-off between energy efficiency and speed. In addition, nano-scale effects, the Ion/Ioff ratio, and process variations degrade significantly, affecting circuit performance, power efficiency (leakage power), and fabrication yield.

1.1. Challenges in Nano-Scaled Near-threshold Design

As technology continues to scale down, the performance of nano-scaled devices is influenced by many factors, such as threshold voltage, physical channel dimensions, doping concentration, gate oxide thickness, and supply voltage. Fluctuations in these factors give rise to the short-channel effect (SCE), the narrow-width effect, drain-induced barrier lowering (DIBL), gate-induced drain leakage (GIDL), and gate leakage. These effects become a critical bottleneck in the trade-off among speed, power, and cost requirements.

Near-threshold circuit design is affected significantly by the degradation of the driving capability, the Ion/Ioff ratio, and variations. Although circuits operated down to a near-threshold supply can achieve ultra-low power consumption, the weak driving capability of CMOS devices requires a large area to compensate. A conventional CMOS circuit also incurs a severe Ioff problem in nanometer processes. In addition, near-threshold circuits suffer serious process, voltage, and temperature (PVT) variations, which can change performance by several times.

1.2. Near-threshold On-chip Data Link

Fig. 1-1 shows a block diagram of the on-chip data link system. Depending on the system requirements, a serializer/de-serializer might be needed. Apart from the serializer/de-serializer, the on-chip bus and the local oscillator are the most important macros in the system.

On-chip interconnects become a bottleneck with respect to speed, power, cost, and noise as technology scales to the nanometer regime. Among on-chip bus design approaches, repeater insertion is a popular method for interconnects. In this dissertation, we discuss the challenges and design issues of a near-threshold clock buffer and a nano-scaled near-threshold data link circuit. To solve these problems, we propose a new on-chip clock network and data bus based on several bootstrapped techniques.

Fig. 1-1 Basic function blocks of on-chip data link.

Phase-locked loops (PLLs) often serve as the local oscillator. In this dissertation, we develop a bootstrapped ring oscillator (BTRO) that operates at a supply voltage of 0.2–0.6 V. Owing to the bootstrapped technique, its output frequency is a highly linear function of the supply voltage. Based on this feature, a new ADPLL with the BTRO is proposed as well. It achieves 480 MHz while consuming only 78 μW.


1.3. Organization of the Dissertation

The rest of the dissertation is organized as follows. Chapter 2 reviews the background of this dissertation. First, several effects in nano-scaled devices are introduced. Challenges in low-voltage circuit design are discussed as well, and some reported low-voltage techniques are reviewed. Chapter 3 introduces the repeated-RC on-chip interconnect architecture and develops a bootstrapped inverter for a 0.2 V clock network. The inverter also features an active leakage current reduction technique to save leakage power. Chapter 4 introduces a low-voltage on-chip bus with an ISI-suppressed bootstrapped repeater. To achieve high energy efficiency, Chapter 5 introduces high-boosting bootstrapped repeaters. In Chapter 6, we present a near-threshold ADPLL using a bootstrapped digitally-controlled oscillator (DCO). Finally, Chapter 7 draws conclusions and discusses future work.


Chapter 2 Background Review

In the past few decades, the scaling of CMOS technologies has been the major driving force behind Moore’s Law. In nanometer technologies, the process parameters are no longer scaled by a single scaling factor, because carrier velocity saturation and the increasing sub-threshold leakage current become serious. With the continued shrinking of the channel length and the gate-oxide thickness, several non-ideal effects appear and affect circuits. Additionally, lowering the supply of nano-scaled designs to the near-threshold region has several detrimental impacts. In this chapter, the effects in nano-scaled near-threshold design are briefly reviewed, and popular low-voltage design techniques are then introduced.

2.1. Effects in Nano-scaled Process [6]

2.1.1. Short-Channel Effect

The short-channel effect (SCE) occurs in a MOSFET whose channel length is of the same order of magnitude as the depletion-layer widths of the source and drain junctions. The SCE is often modeled by charge sharing, in which the source and drain depletion regions hold part of the charge under the gate. Using the depletion approximation, the threshold voltage Vth of a MOSFET can be represented as

$$V_{th} = V_{fb} + 2\Phi_f + \frac{Q_B}{C_{OX}} \tag{2.1}$$

where V_fb is the flat-band voltage, Φ_f is the Fermi potential, Q_B is the channel charge, and C_OX is the oxide capacitance. As the channel length shrinks, the charge stored in the doped area is reduced significantly. As a result, the threshold voltage decreases as the channel length is reduced.


Fig. 2-1. Threshold voltage with change in channel length due to SCE [6].

Halo doping, a non-uniform channel doping used in modern processes to adjust the threshold voltage, causes the so-called reverse short-channel effect (RSCE). The increase in threshold voltage comes from the extra doping charge near the source and drain regions. As the device length is reduced, the threshold voltage of the device increases, which is the opposite of what is expected from the SCE [7-8].

2.1.2. Narrow-Width Effect

The narrow-width effect (NWE) occurs when the threshold voltage Vth of a nano-scaled MOSFET is modulated by the gate width, so that the device width modulates the drain current. According to Eq. (2.1), the NWE has two main causes. First, the charge in the gate-induced depletion region results in an increase of the threshold voltage. Second, the channel doping is higher along the width dimension. Because dopants encroach under the gate, a higher voltage is necessary to invert the channel. Fig. 2-2 shows the NWE as a function of channel width.

[Fig. 2-2: Vth (mV) and ID (nA) versus channel width, 300 nm to 900 nm.]

2.1.3. Sub-threshold Leakage [6, 9]

In a nano-scaled device, the sub-threshold (weak-inversion conduction) current Isub flows when the gate-source voltage is below the threshold voltage Vth. It can be expressed as

$$I_{sub} = \mu C_{dep} \frac{W}{L} V_T^2 \exp\left(\frac{V_{GS} - V_{th}}{n V_T}\right) \left(1 - \exp\left(-\frac{V_{DS}}{V_T}\right)\right) \tag{2.2}$$

where μ is the effective mobility; C_dep is the depletion capacitance; W and L are the width and length of the device; V_T is the thermal voltage; V_GS is the gate-to-source voltage; n is the sub-threshold slope factor; and V_DS is the drain-to-source voltage.
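The exponential gate-voltage dependence of Eq. (2.2) can be checked numerically. The sketch below evaluates the expression and confirms that raising VGS by n·VT·ln(10) multiplies the current by ten. All parameter values (n, mobility, depletion capacitance, device size, threshold voltage) are assumed placeholders for illustration, not values from this dissertation's process.

```python
import math


def subthreshold_current(vgs, vds, vth, n=1.5, vt=0.02585,
                         mu=0.03, c_dep=5e-3, w=320e-9, l=60e-9):
    """Eq. (2.2): mu*C_dep*(W/L)*VT^2 * exp((VGS-Vth)/(n*VT)) * (1-exp(-VDS/VT)).

    Every default value here is a hypothetical placeholder.
    """
    prefactor = mu * c_dep * (w / l) * vt ** 2
    gate_term = math.exp((vgs - vth) / (n * vt))
    drain_term = 1.0 - math.exp(-vds / vt)
    return prefactor * gate_term * drain_term


def subthreshold_swing(n=1.5, vt=0.02585):
    """Gate-voltage change (V) that changes I_sub by one decade: n*VT*ln(10)."""
    return n * vt * math.log(10)


swing = subthreshold_swing()
i_a = subthreshold_current(vgs=0.20, vds=0.3, vth=0.35)
i_b = subthreshold_current(vgs=0.20 + swing, vds=0.3, vth=0.35)
print(f"swing = {swing * 1e3:.1f} mV/decade, current ratio = {i_b / i_a:.2f}")
```

The drain term saturates once VDS exceeds a few VT, so in this model the current is nearly independent of VDS beyond roughly 0.1 V; DIBL, discussed later, is what reintroduces a drain-voltage dependence.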

In contrast to the strong-inversion region, the sub-threshold current is dominated by diffusion current. This diffusion-driven carrier movement resembles charge flow in BJTs. The sub-threshold current is also affected by other phenomena, such as drain-induced barrier lowering (DIBL) and gate-induced drain leakage (GIDL), which are introduced in the following sections.

2.1.4. Drain-Induced Barrier Lowering [6]

[Fig. 2-3: drain current (A, 10 pA to 100 μA) versus gate voltage (−0.4 V to 0.6 V) for VDS = 0.1 V, 0.2 V, and 0.3 V at 25 °C, TT corner. Annotations mark the conventional Ioff at VG = 0 V, the DIBL region as VD increases, and the GIDL region at VD = VDD, VG = −VDD.]

Fig. 2-3. Drain current of an NMOS device vs. VG in the near-threshold region.

In micron-scaled devices, the source and drain are far enough apart that their depletion regions do not interact. In such a case, the drain current is nearly independent of the channel length and the drain bias. In the off state, the potential barrier between the source and drain prevents electrons from flowing to the drain. In a short-channel device, Vth varies with the channel length according to the SCE. In addition, DIBL lowers the energy barrier as the drain voltage increases [6]. When a short-channel device is operated at a higher drain voltage, the energy barrier is lowered further, which further increases the drain current. Fig. 2-3 depicts ID as a function of VG and illustrates the DIBL effect as the drain voltage increases. As shown in Fig. 2-1, DIBL lowers the threshold voltage but leaves the slope in the near-threshold region unchanged.

2.1.5. Gate-Induced Drain Leakage [6, 10]

Gate-induced drain leakage (GIDL) arises from high-field effects in the drain junction of a MOSFET. It usually occurs when the electric field in or around the gated PN junction becomes stronger with the applied gate voltage. High-field effects such as avalanche multiplication and band-to-band tunneling (BTBT) then become severe. Thus, the leakage current of a reverse-biased gated diode may increase dramatically when a negative gate voltage begins to cause field crowding and a peak field. To suppress GIDL, a thicker oxide and a lower electric field can be used. Very high drain doping can also be considered for minimizing GIDL. Fig. 2-3 also shows GIDL in the drain current characteristics of an NMOS device at different drain voltages.

2.1.6. Gate Leakage [11]

In nanometer technologies, the gate oxide thickness TOX has been scaled to values in the range of 12–22 Å. Besides DIBL, this leads to a large gate tunneling leakage current Igate. Igate arises from the finite probability of an electron tunneling directly through the SiO2 layer. The probability is a strong exponential function of TOX: a TOX only 2 Å thinner may increase Igate by an order of magnitude. TOX therefore becomes the most sensitive of all physical dimensions. Typically, Igate is much smaller than the sub-threshold leakage current Isub when TOX is larger than 20 Å. At the simulation level, the BSIM4 model (level = 54) includes nano-scale effects such as GIDL and DIBL, and Igate is taken into account as well. For fast simulation and reliable

2.2. Challenges in Ultra Low-voltage Designs

2.2.1. Degradation of Driving Capability

When a MOSFET is operated in the super-Vth region, its drain current in the saturation region is a function of the gate voltage, as given by Eq. (2.3).

$$I_{D,Sat} = \mu C_{ox} \frac{W}{L} (V_{GS} - V_{th})^2 (1 + \lambda V_{DS}) \tag{2.3}$$

where C_ox is the gate oxide capacitance per unit area and λ is the channel-length modulation factor. According to Eq. (2.3), the drain current I_D,Sat decreases quadratically as the gate voltage is lowered. When the gate voltage goes further down into the sub-threshold region, the drain current starts to decrease exponentially, as shown in Eq. (2.2). That is, when a design is operated in the near-threshold region, poor driving capability is the first design issue. In normal 1 V designs, device sizing is often used to increase driving capability. However, the gate capacitance of a MOS device drops only slightly when the gate drive is lowered to near the threshold voltage. As a result, enlarging the device to enhance driving capability is not a good approach in the near-threshold region.
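The quadratic loss of drive current described by Eq. (2.3) can be illustrated with a short calculation. In this sketch the constant k stands in for the product μ·Cox·(W/L), and k, λ, and Vth are assumed values used only to show the trend.

```python
def saturation_current(vgs, vth, vds, k=1e-4, lam=0.1):
    """Eq. (2.3) with k standing in for mu*Cox*(W/L).

    I = k * (VGS - Vth)^2 * (1 + lam * VDS); k and lam are hypothetical.
    """
    if vgs <= vth:
        raise ValueError("Eq. (2.3) only applies above threshold (VGS > Vth)")
    return k * (vgs - vth) ** 2 * (1.0 + lam * vds)


# Same device and drain bias, gate drive lowered from 1.0 V to 0.5 V.
i_full = saturation_current(vgs=1.0, vth=0.35, vds=1.0)
i_low = saturation_current(vgs=0.5, vth=0.35, vds=1.0)
drive_ratio = i_low / i_full
print(f"drive current at VGS = 0.5 V is {drive_ratio:.3f} of its 1.0 V value")
```

Halving the gate voltage here leaves only about one twentieth of the drive current because the overdrive VGS − Vth shrinks from 0.65 V to 0.15 V. Below threshold the function deliberately refuses to evaluate, since Eq. (2.2) governs that region instead.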

2.2.2. Leakage Power and Ion-to-Ioff Ratio [8, 12]

The Ion-to-Ioff ratio becomes a critical factor in near-threshold digital circuits. The inherently small Ion-to-Ioff ratio limits how many transistors can be connected per node. As reported in [12], the Ion-to-Ioff ratio degrades from approximately 10^7 to 10^4, which implies a strong interaction between the ON and OFF devices in the sub-threshold region when setting the voltage level of critical signals. Unfortunately, this causes a relevant failure mechanism in circuit operation. As illustrated in Fig. 2-4, an inverter serves as a driver with a capacitive load of 200 fF while VDD is swept from 0.1 V to 0.3 V. The circuit is operated at its speed limit. Clearly, the leakage power becomes a greater portion of the

[Fig. 2-4: leakage power Pleakage (W) and its share of total power Pleakage/PT (%) versus supply voltage (0.10 V to 0.30 V) for the conventional repeater at 25 °C, TT corner; load 0.2 pF, (W/L)n = 320 nm/60 nm, (W/L)p = 460 nm/60 nm, m = 50.]

Fig. 2-4. Leakage power on a repeater at subthreshold supply.

2.2.3. Process, Voltage and Temperature Variation

Performance variation induced by process, voltage, and temperature (PVT) corners makes circuit design in the near-threshold region tremendously challenging. First, process variability affects the current through process parameters such as mobility and threshold voltage. Even a small variation may lead to an exponentially large mismatch. Process variation is divided into two major categories [13], and is further classified into more specific categories according to its physical range on a wafer or on a die [14]. Fig. 2-5 depicts ID as a function of gate voltage in the near-threshold region, illustrating process and voltage effects at room temperature. It shows that the variation of ID due to process and voltage fluctuations becomes worse as the supply voltage is lowered.
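The exponential sensitivity to threshold-voltage variation can be illustrated with a small Monte Carlo sketch based on the exponential term of Eq. (2.2). It assumes a Gaussian threshold-voltage shift and looks only at the resulting relative current spread. The sigma values, slope factor n, sample count, and seed are assumptions for illustration and are unrelated to the Monte Carlo results reported in later chapters.

```python
import math
import random
import statistics


def current_spread(sigma_vth, n=1.5, vt=0.02585, samples=20000, seed=1):
    """Relative spread (std/mean) of exp(-dVth/(n*VT)) for Gaussian dVth.

    dVth is a random threshold shift with standard deviation sigma_vth (V).
    All defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    currents = [
        math.exp(-rng.gauss(0.0, sigma_vth) / (n * vt)) for _ in range(samples)
    ]
    return statistics.pstdev(currents) / statistics.fmean(currents)


for sigma in (0.010, 0.030):
    spread = current_spread(sigma)
    print(f"sigma(Vth) = {sigma * 1e3:.0f} mV -> sigma(I)/mean(I) = {spread:.2f}")
```

Because the current is log-normally distributed in this model, tripling the threshold spread increases the relative current spread by more than a factor of three.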

Apart from process variation, which is static once a die is fabricated, supply voltage variation is related to fluctuations during circuit operation. Real-time fluctuations caused by voltage drop or inductive effects in the wires may result in functional failure [14-15]. Temperature is another important factor for variation and reliability in a nano-scaled chip, especially when the supply voltage is lowered to the near-threshold region. The sub-threshold current depends strongly on temperature through the parameter VT. In contrast to the current in the super-threshold region, ID increases as the temperature rises. The measured temperature

[Fig. 2-5: drain current (A, 1 pA to 100 μA) versus gate voltage (−0.4 V to 1.0 V) for the FF, TT, and SS corners at 25 °C, with the minimum current IDmin marked for each corner.]

Fig. 2-5. Drain current in different corners in the near-threshold region.

2.3. Low-voltage Design Techniques

As mentioned, circuit design in the near-threshold region has many challenges. Several techniques have been reported to solve the problems or improve energy efficiency. They are briefly reviewed in the following sections.

2.3.1. Bootstrap Techniques

Bootstrapping is an effective means of raising driving efficiency and thus speed. A previous work developed a bootstrapped CMOS driver for large capacitive loads, shown in Fig. 2-6 [16]. The bootstrapped driver consists of a pull-up and a pull-down control pair that drive the PMOS and NMOS transistors, respectively. The gate voltages of the PMOS and NMOS driver transistors are kept at VDD and 0 in the cut-off phase. In the driving phase, the gate voltages are driven to −VDD and 2VDD to increase the current density. When the input Vin is at 0 V, Va is at VDD and the output of the inverter is at VDD. Moreover, MN2 and MN1b are off, and MP2 and MP1b are on. Therefore, V2P is pre-charged to 0 V by MN2b, and the bootstrap capacitor Cbp stores a potential of VDD. When Vin transitions from 0 V to VDD (low to high), V2P is boosted from 0 V to −VDD. The potential of −VDD is then passed from V2P to V1P. Consequently, a potential of −VDD appears at the gate of the driver MP2, which drives Vout by VSG

Fig.2-6 Reported bootstrapped driver in [16].

The driver in [16] successfully enhances the driving capability by boosting the gate voltage, which makes it suitable for a near-threshold supply as well. However, it has several drawbacks, such as reverse leakage current and non-ideal transition edges. Several improvements based on [16] have been proposed. Among them, Kil et al. proposed a sub-threshold bootstrapped repeater for a 9 MHz distributed clock network at 0.4 V [17]. The sub-threshold bootstrapped repeater, depicted in Fig. 2-7, is composed of two bootstrap circuits: one for pre-boosting and the other for driving. The pre-boosting circuit enhances the pre-charge current to increase the speed. In addition, MPS2 and MNS2 are switches that feed the boosted signal back to eliminate the reverse current. However, when this approach is applied to a data link, the kick-back disturbance through the boosting capacitors causes large timing jitter. Furthermore, it consumes large static power and incurs a high capacitor cost.


Fig. 2-7 Reported bootstrapped driver in [17].
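As a first-order illustration of the gate boosting described in this section, the following sketch estimates the boosted gate levels from charge conservation on a floating node. The parasitic capacitance c_par, the capacitor values, and the function names are assumptions made for illustration and do not come from [16] or [17]. With no parasitic load the levels reduce to the ideal 2VDD and −VDD.

```python
def boosted_high_level(vdd, c_boot, c_par):
    """NMOS-side gate level after the bottom plate of c_boot steps up by VDD.

    The node is pre-charged to VDD and then left floating; the step couples
    through the capacitive divider c_boot / (c_boot + c_par).
    """
    return vdd + vdd * c_boot / (c_boot + c_par)


def boosted_low_level(vdd, c_boot, c_par):
    """PMOS-side gate level: pre-charged to 0 V, then pushed toward -VDD."""
    return -vdd * c_boot / (c_boot + c_par)


# Hypothetical values: 0.2 V supply, 100 fF bootstrap capacitor.
ideal_high = boosted_high_level(0.2, c_boot=100e-15, c_par=0.0)
real_high = boosted_high_level(0.2, c_boot=100e-15, c_par=25e-15)
real_low = boosted_low_level(0.2, c_boot=100e-15, c_par=25e-15)
print(f"ideal 2VDD = {ideal_high:.2f} V, with parasitics = {real_high:.2f} V")
print(f"PMOS gate with parasitics = {real_low:.2f} V")
```

In this simple model the parasitic capacitance at the boosted node directly reduces the achievable swing, which is one reason the size of the bootstrap capacitor matters for cost.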

2.3.2. Dynamic Voltage and Frequency Scaling

Dynamic voltage and frequency scaling (DVFS) is a popular power-saving scheme that is widely used in microprocessors and DSP ASICs [18]. Since different functions need different execution times, a DVFS system can change the supply voltage or the data rate dynamically to meet the specification requirements; the power consumption can thus be optimized for each computational task.

DVFS is also applied to lower the operating frequency in portable products when the battery runs low. It keeps the system working on basic functions in order to extend the battery lifetime or stand-by time. DVFS is applied to compensate for PVT variation as well [19]. In fact, conventional designs often leave a large redundant margin in practical chips. DVFS determines the supply voltage or the frequency for each task appropriately and dynamically, and therefore achieves higher power efficiency.

Critical path monitors (CPMs) [18, 20-21] remove part of these worst-case margins by using a delay chain that is a replica of the critical path of the actual design. The propagation delay through this replica path is monitored, and voltage and frequency are scaled until the replica path just meets timing. The replica path tracks the critical-path delay across inter-die process variations and global fluctuations in supply voltage and temperature, thereby eliminating margins due to global PVT variations.
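The replica-path loop described above can be sketched as a simple search: lower the supply step by step while the monitored replica delay still fits in the clock period. The delay model, its constants, and the function names below are hypothetical stand-ins for a real critical path monitor and are not taken from [18, 20-21].

```python
def replica_delay(vdd, vth=0.35, k=2e-9):
    """Hypothetical replica-path delay model (alpha-power-law style, alpha = 2).

    Only meaningful for vdd > vth; k and vth are illustration values.
    """
    return k * vdd / (vdd - vth) ** 2


def lowest_passing_vdd(clock_period, v_start=1.0, v_min=0.4, step=0.01):
    """Step VDD down while the replica path still meets the clock period."""
    vdd = v_start
    if replica_delay(vdd) > clock_period:
        raise ValueError("replica path fails timing even at the starting voltage")
    # Keep lowering the supply as long as the next step still meets timing.
    while vdd - step >= v_min and replica_delay(vdd - step) <= clock_period:
        vdd -= step
    return round(vdd, 2)


chosen = lowest_passing_vdd(clock_period=10e-9)
print(f"lowest supply meeting a 10 ns period: {chosen:.2f} V")
```

A real controller would re-run this adjustment as temperature and supply noise change, and would keep a small guard band because the replica only tracks global variation, not local mismatch on the actual critical path.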

2.3.3. Multi-threshold MOS Control

For circuits operating in the near-threshold region, lowering the supply voltage decreases ID according to Eqs. (2.2) and (2.3), which results in a drastic rise in gate delay. One way to overcome this speed degradation is to reduce the Vth of the MOSFET devices [22-23]. As Vth is reduced, however, another significant problem arises: the stand-by current increases rapidly because of the higher sub-threshold leakage current, which damages the power performance. To save stand-by power in sleep mode, a power management scheme combining a small embedded processor with multi-threshold sleep control is reported in [24]. It utilizes high-Vth MOSFET devices, resulting in low stand-by and dynamic power.

2.3.4. Bulk-driven Technique

Similar to multi-threshold MOS control, the bulk-driven technique is using circuit

techniques to shift Vth lower or higher by biasing bulk voltage. Sometime, the bulk-driven

technique is called “adaptive body-biasing” as well [25]. Some contributed works based on the bulk-driven technique are reported in [26-27]. The threshold voltage can be expressed as in Eq.(2.4) [28].

$$ V_{th} = V_{th0} - \gamma \left[ \sqrt{2\phi_F - V_{SB}} - \sqrt{2\phi_F} \right] \tag{2.4} $$

This is the well-known equation relating the body voltage to the threshold voltage, where γ is the body-effect coefficient. The bulk-driven technique has several important features. The obvious one is that it enhances the driving capability by modulating Vth. The most important one is that it allows zero, negative, and even small positive bias voltages to achieve the desired DC currents, which makes it a good way to increase the input common-mode voltage range. In normal circuit design, the bulk terminal of a PMOS (NMOS) device is always connected to the highest (lowest) potential to avoid the latch-up problem caused by forward biasing of the bulk-source junction.

2.4. Summary

In this chapter, the background of the dissertation has been briefly reviewed. Because of the non-ideal effects caused by the shrinking channel length and gate-oxide thickness, current variation caused by the environment makes circuit design more challenging. Additionally, nano-scaled circuit design with a near-threshold supply has several detrimental impacts, and the trade-off between performance and energy efficiency should be handled carefully. In the last part of this chapter, some popular low-voltage design techniques have been introduced as well. Based on the concept of the bootstrap technique, we will develop several bootstrapped circuits in the following chapters.


Chapter 3

Near-threshold Clock Network

A clock network needs a driver with strong driving current and little skew. As shown in Fig. 3-1(a), the conventional bootstrapped driver consists of a pull-up and a pull-down control pair to drive the PMOS and NMOS transistors, respectively. As mentioned in Chapter 2, the gate voltages of the PMOS and NMOS driver transistors are kept at VDD and 0 in the cut-off phase, and they are driven to -VDD and 2VDD to increase the current density in the driving phase. Despite a previous effort [35] to increase the boosting efficiency by rearranging the timing of the switching and boosting signals, reverse leakage current remains the main drawback of conventional bootstrapped drivers. Among other bootstrapped circuits, single-capacitor ones reduce the hardware overhead [36-37]. However, their complex circuitry suffers from serious charge sharing at the capacitor node. Moreover, the leakage current is problematic as well.

Fig. 3-1. (a) Conventional bootstrapped circuit. (b) Proposed bootstrapped circuit.

In this chapter, we present a sub-threshold clock network with a bootstrapped CMOS inverter operated at a sub-threshold power supply. The bootstrapped CMOS inverter is introduced to achieve high boosting efficiency and improve the speed. It both increases the driving ability, by boosting signals into the super-threshold region, and reduces the leakage current. Fig. 3-1(b) illustrates the circuit diagram. Theoretically, the PN bootstrap circuit produces an output swing of -VDD to 2VDD. 2VDD (-VDD) enhances the driving capability of the NMOS (PMOS) driver and suppresses the leakage of the PMOS (NMOS) driver: the negative VSG (VGS) = -VDD suppresses the leakage current while the PMOS (NMOS) driver is turned off. Moreover, compared with previous works, the proposed design scheme has fewer devices in the sub-threshold region, which explains why process variation affects it to a lesser extent.

3.1. Overview of On-chip Interconnect

Before introducing the proposed bootstrapped CMOS inverter, the fundamentals of on-chip interconnect are briefly reviewed. First, linear models of the interconnect and the repeater are adopted according to VLSI parameter scaling. In addition, the speed and power consumption of on-chip interconnect circuits are defined. The parameters derived from the linear models are then used to define the figure of merit (FoM), the index for optimal global on-chip interconnect design.

3.1.1. RC-Interconnect with Repeater Insertion

Fig. 3-2. Cross section of interconnect configurations (between the top and bottom metal layers).

In general, a global interconnect is assumed to be placed between two adjacent orthogonal metal layers and between two coplanar wires, as shown in Fig. 3-2, where W and S are the interconnect width and spacing; T is the interconnect thickness and H is the dielectric height; Cf is the fringing-field capacitance; Ca is the parallel-plate capacitance to the top and bottom metal layers; and Cc is the coupling capacitance between neighboring interconnects. The interconnect resistance per unit length is given by (3-1).

$$ r_w = \frac{\rho}{W \cdot T} \tag{3-1} $$


With technology scaling and growing global interconnects, repeater insertion is broadly used to reduce delay and power consumption. Several works have addressed the optimization of global interconnect design with repeater insertion [29-33]. Since the interconnect parameters are determined by the width W, the spacing S, and so on, on-chip interconnects with repeater insertion can be analyzed with the Elmore RC delay model. According to the Elmore delay model, the time constant τ of the whole interconnect can be obtained from the model described in [29-31].

When a global interconnect is separated into several segments, the small delay penalty of the repeaters can be tolerated on these critical segments, and the time constant τ is dominated by the interconnect segment. However, if the segments are made too short, the driving capability of the repeaters decreases severely. Consequently, there is a trade-off between the time constant τ and power consumption.

3.1.2. Time Constant, Power Dissipation and Figure of Merit

The data rate is related to the time constant. The rise time and fall time can be estimated from the step response. The output rise time is defined from the 20% transition edge to the 80% transition edge, as shown in Eq.(3-2).

$$ t_r = t_{80\%} - t_{20\%} \cong 1.386\tau \tag{3-2} $$

Here, t80% and t20% are the times at which the output voltage exceeds 80% VDD and 20% VDD, respectively, during the rising edge. The minimum rise time is specified as 0.125 unit interval (UI) in the SATA standard [34].
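The factor 1.386 in Eq.(3-2) is ln(4), which follows from the step response of a single-pole RC network. The short sketch below reproduces it; the 1 ns time constant is a hypothetical value used only for illustration.

```python
import math


def rise_time_20_80(tau):
    """20%-to-80% rise time of a single-pole RC step response."""
    t20 = -tau * math.log(1.0 - 0.2)  # time to reach 20% of VDD
    t80 = -tau * math.log(1.0 - 0.8)  # time to reach 80% of VDD
    return t80 - t20


if __name__ == "__main__":
    tau = 1e-9  # hypothetical time constant (1 ns)
    t_r = rise_time_20_80(tau)
    print(f"t_r / tau = {t_r / tau:.3f}")
```

The ratio t_r/τ is independent of τ, so the same factor applies to every interconnect segment that is well approximated by a single time constant.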

Besides speed, one of the most important factors in on-chip interconnect design, power consumption is another basic consideration. The total power consumption includes not only the switching power but also the short-circuit power and the leakage power, which are expressed as PSW, PSC and PLeakage, respectively. Detailed expressions and discussions are reported in [29-31]. The total power dissipation of each interconnect is written as in Eq.(3-3).

$$ P_T = \left( \frac{L}{h} \right) \times \left( P_{SW} + P_{SC} + P_{Leakage} \right) \tag{3-3} $$

Here, L is the total length of the interconnect and h is the length of each segment. Since switching power dissipation accounts for a large portion of the total power, PSW is expressed as in Eq.(3-4).

$$ P_{SW} = \alpha f \cdot \left[ \frac{mL}{h} \left( c_{gs} + c_{db} \right) + c_{Wire} \right] \cdot V_{DD}^2 \tag{3-4} $$


Here, (cgs + cdb) is the parasitic capacitance of the repeater.

The performance of an interconnect is affected by many design parameters, most of which are discussed in the literature [32-33]. The FoM is used to compare performance. Here, FoM1 in Eq.(3-5) is defined as the total energy per bit to express the energy efficiency.

$$ \mathrm{FoM}_1 = E_T = \frac{P_T}{f} \approx \alpha C_{Total} V_{DD}^2 \tag{3-5} $$

Here, ET represents the total energy. Fig. 3-3 shows the energy per bit ET as a function of the segment length h and the repeater finger number m for a total length L of 10 mm. The design becomes more energy-efficient as h gets longer and m approaches its minimum, m = 1. Since the supply voltage VDD is fixed by the system requirement, the only way to improve the energy efficiency is to use a long segment length h. However, this incurs a great speed penalty. Given this limit, the best energy efficiency occurs with the maximum h and the minimum driver size, so the choice becomes a trade-off that depends on the requirement.

Fig. 3-3. Effect of segment length and fingers of repeaters on the energy per bit.
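The trend in Fig. 3-3 can be reproduced qualitatively from Eq.(3-5) with a simple capacitance budget. The sketch below assumes the total capacitance is the wire capacitance plus the parasitics of L/h repeaters with m fingers each. The per-finger capacitance, the wire capacitance per micrometre, and the switching activity are hypothetical placeholders, not values from this work.

```python
def energy_per_bit(h_um, m, vdd=0.2, length_um=10_000.0, alpha=0.5,
                   c_finger=1e-15, c_wire_per_um=0.2e-15):
    """Energy per bit E ~ alpha * C_total * VDD^2, following Eq.(3-5).

    c_finger (F per repeater finger) and c_wire_per_um (F/um) are
    hypothetical values used only to show the trend.
    """
    n_repeaters = length_um / h_um
    c_total = n_repeaters * m * c_finger + c_wire_per_um * length_um
    return alpha * c_total * vdd ** 2


if __name__ == "__main__":
    for h in (100.0, 400.0, 1200.0):
        for m in (1, 10, 100):
            e = energy_per_bit(h, m)
            print(f"h = {h:6.0f} um, m = {m:3d}: E = {e * 1e15:.2f} fJ/bit")
```

Under these assumptions the wire term is fixed by L, so the energy falls only as the repeater term shrinks, that is, with longer segments and fewer fingers. The model says nothing about speed, which is where the penalty of long segments appears.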

3.2. Active Leakage Reduction Bootstrapped Inverter

Fig. 3-4 schematically depicts the proposed active leakage reduction bootstrapped inverter (ALBI), where CBP and CBN are the bootstrap capacitors; MP1 and MN1 are the transistors for CBP pre-charge and CBN pre-discharge; INV refers to the inverter that controls MP2 and MN2; MPD and MND are the output drivers for CL; and NP and NN are the boosted nodes. The node NB is boosted


operations with the input switching from H to L and from L to H, respectively. Fig. 3-7 shows the simulated transient waveforms of the ALBI with an output load of 0.5pF under a power supply of 200mV. According to this figure, before Vin transits from H to L, node NN has an initial voltage of 0V. After the H-to-L transition, NN is boosted below ground to -188mV. Meanwhile, MP2 is turned off and MN2 is turned on. Therefore, the boosted signal at NN passes through MN2 to NB to drive MPD, which pulls up the capacitive load CL. At this moment, MP1 is turned on to pre-charge NP to VDD (0.2V). However, MN1 is turned on reversely, causing a reverse current that charges NN. At the end of the period in which Vin is L, NN still holds -90mV. When Vin goes from L to H, the operation is similar to the H-to-L case. NP is boosted above VDD to 389mV and discharges to 303mV by the end of the period in which Vin is H.

Fig. 3-4. Proposed bootstrapped inverter.


Fig. 3-6. Proposed bootstrapped inverter operations (input L-to-H).

Fig. 3-7. Simulated timing waveforms at 5 MHz at 200 mV VDD.

3.3. Detail Evaluation and Discussion

The proposed ALBI is superior to previous designs in terms of leakage power and switching speed. In low-voltage circuit design, the decreasing Ion/Ioff ratio degrades the noise margin. In the proposed design, the boosted voltage is used in both the driving phase and the cut-off phase, and the active bootstrapped leakage reduction method improves the Ion/Ioff ratio. Moreover, the smaller number of components increases the speed of the bootstrapped circuit. Because fewer components operate in the sub-threshold region, the proposed design scheme performs better than previous works in Monte Carlo analysis.


To compare the proposed scheme with conventional ones fairly, this work re-designed the conventional inverter and the reported bootstrapped drivers in the 90nm process. The conventional inverter and the bootstrapped drivers are sized to obtain the same rise/fall transient output waveforms. Their device sizes are listed in TABLE 3-1. A 30fF boost capacitor is used to ensure that the boosting efficiency exceeds 80%. These features are evaluated in detail as follows.

TABLE 3-1 Device Sizing

Driver topology                  Sub-circuit   NMOS W/L (nm/nm)   m_n   PMOS W/L (nm/nm)   m_p
Conventional INV                 inverter      420 / 80           30    440 / 80           30
Proposed bootstrapped inverter   inverter      400 / 80           4     200 / 80           4
                                 M_P1, M_N1    200 / 80           1     200 / 80           1
                                 M_P2, M_N2    200 / 160          1     200 / 160          1
                                 driver        285 / 80           1     340 / 80           2
Bootstrapped driver [16]         inverter      400 / 80           4     200 / 80           4
                                 switch        200 / 80           3     200 / 80           3
                                 driver        250 / 80           1     340 / 80           2
Bootstrapped driver [17]         inverter      400 / 80           4     200 / 80           4
                                 switch        200 / 80           4     200 / 80           4
                                 driver        260 / 80           1     300 / 80           2

3.3.1. Boosting Efficiency

Ideally, the boosted node NB generates a voltage swing from 2VDD to -VDD. However, the parasitic capacitance at node NB shares charge with the bootstrap capacitance [17]. For example, when NB transitions above VDD, consider the equivalent circuit of the upper side shown in Fig. 3-4, where VBP and CPTP are the voltage and the total parasitic capacitance at NB, respectively. Ideally, VBP transits from -VDD to 2VDD. Thus,

$$ V_{BP} = \frac{C_{BP}}{C_{BP} + C_{PTP}} \cdot 2V_{DD} - \frac{C_{PTP}}{C_{BP} + C_{PTP}} \cdot V_{DD} \tag{3-6} $$

To increase the driving capability, the bootstrap capacitance is designed to be significantly larger than the parasitic capacitance at the node. As a result, (3-6) can be rewritten as (3-7),

$$ V_{BP} \approx \frac{C_{BP}}{C_{BP} + C_{PTP}} \cdot 2V_{DD} = \beta_P \cdot 2V_{DD} \tag{3-7} $$

where βP is the boosting efficiency. Similarly, when NB transitions from VDD to below ground, the estimated VBN is

$$ V_{BN} \approx \frac{C_{BN}}{C_{BN} + C_{PTN}} \cdot \left( -V_{DD} \right) = \beta_N \cdot \left( -V_{DD} \right) \tag{3-8} $$

The larger the bootstrap capacitance, the better the boosting efficiency. To observe the leakage power and delay time in a more ideal case, we used a 100fF bootstrap capacitor. In our test chip, based on a trade-off between cost and performance, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80%. As shown in Fig. 3-8, the boosting efficiency is 88% with a 30fF bootstrap capacitor.

Fig. 3-8. Boosting efficiency (%) vs. bootstrap capacitor (fF).
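The charge-sharing relation of Eq.(3-7) can be checked numerically. The sketch below treats the boosting efficiency as the capacitive divider CB/(CB + CPT) and back-computes the parasitic capacitance implied by the reported 88% at 30fF. Holding that parasitic constant across capacitor sizes is a simplifying assumption, so the other efficiencies are only indicative.

```python
def boosting_efficiency(c_boot, c_par):
    """Boosting efficiency beta = C_B / (C_B + C_PT), as in Eq.(3-7)."""
    return c_boot / (c_boot + c_par)


def implied_parasitic(c_boot, beta):
    """Parasitic capacitance that yields efficiency beta for a given C_B."""
    return c_boot * (1.0 - beta) / beta


# Parasitic implied by the reported 88% efficiency with a 30 fF capacitor.
c_par = implied_parasitic(30e-15, 0.88)

if __name__ == "__main__":
    print(f"implied parasitic: {c_par * 1e15:.2f} fF")
    for c_boot in (10e-15, 30e-15, 100e-15):
        beta = boosting_efficiency(c_boot, c_par)
        print(f"C_B = {c_boot * 1e15:5.0f} fF -> beta = {beta * 100:.1f} %")
```

The implied parasitic is about 4fF, which shows why the efficiency saturates: beyond a few tens of femtofarads, extra bootstrap capacitance buys little additional boost for the area it costs.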

3.3.2. Reduction of Leakage Current

In the proposed design scheme, the boosted high level (2VDD) at NB enhances the driving capability of MND and suppresses the leakage current of MPD. Similarly, the boosted low level (-VDD) at NB enhances the driving capability of MPD and reduces the leakage of MND.

The Ioff current is primarily formed by the sub-threshold leakage current [38-39]. Hence, scaling the supply voltage lowers the Ion/Ioff ratio. In the previous literature, bootstrapped drivers improve the Ion/Ioff ratio only unidirectionally, by enhancing Ion. The proposed design effectively suppresses the leakage current of the PMOS (NMOS) by applying a potential of -VDD to VSG (VGS). According to the I-V formula in the sub-threshold region, our design reduces the leakage current exponentially.
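The exponential benefit of a negative gate drive can be estimated from the standard sub-threshold relation, in which the off current scales as exp(VGS / (n·VT)). The sketch below compares a device turned off at VGS = -VDD with one turned off at VGS = 0. The slope factor n = 1.5 and the 300 K temperature are assumed values for illustration, not parameters extracted from the 90nm process.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage VT = kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def leakage_reduction(v_neg, n=1.5, temp_k=300.0):
    """Ratio I_off(VGS = 0) / I_off(VGS = -v_neg) for sub-threshold conduction.

    n is the sub-threshold slope factor (assumed value).
    """
    return math.exp(v_neg / (n * thermal_voltage(temp_k)))


if __name__ == "__main__":
    factor = leakage_reduction(0.2)  # VDD = 0.2 V
    print(f"leakage reduced by a factor of about {factor:.0f}")
```

With these assumptions the off current drops by roughly two orders of magnitude at VDD = 0.2V. The estimate ignores the decay of the boosted node over time, which reduces the benefit at low clock frequencies.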


Measuring the leakage power under dynamic operation is difficult. The leakage power of a periodic waveform can be estimated by separating it from the average total power. The total energy ET over a period T is

$$ E_T = P_T \cdot T \approx \left( P_{SW} + P_{SC} + P_{Leakage} \right) \cdot T = E_{SW} + E_{SC} + P_{Leakage} \cdot T \tag{3-9} $$

where ET, ESW, ESC and ELeakage represent the total energy, the switching energy, the short-circuit energy, and the leakage energy, respectively. The switching energy, the short-circuit energy and the leakage current are assumed to remain constant under the same power supply. A long wire can be regarded as a large capacitive load in the pF range. When a CMOS driver drives heavy capacitive loads, the energy contribution of the short-circuit current can be ignored. ELeakage is proportional to T, and Erep is the total energy of the repeaters. Thus, we can rewrite Eq.(3-9) as

$$ E_T \approx \left( E_{rep} + \frac{\alpha}{2} C_{wire} V_{DD}^2 \right) + P_{Leakage} \cdot T \tag{3-10} $$

For two identical signals with different periods T1 and T2, the leakage power PLeakage is derived as

$$ P_{Leakage} = \frac{P_{T1} \cdot T_1 - P_{T2} \cdot T_2}{T_1 - T_2} \tag{3-11} $$
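Equation (3-11) translates directly into a small helper. In the sketch below the two total-power readings are synthesized from a hypothetical dynamic energy and leakage power, so that the function can be seen to recover the leakage. None of these numbers are measurements.

```python
def leakage_from_two_periods(p_t1, t1, p_t2, t2):
    """Leakage power from the average total power at two clock periods, Eq.(3-11)."""
    return (p_t1 * t1 - p_t2 * t2) / (t1 - t2)


def total_power(e_dynamic, p_leak, period):
    """Average total power when a fixed dynamic energy is spent once per period."""
    return e_dynamic / period + p_leak


if __name__ == "__main__":
    e_dyn, p_leak = 90e-15, 100e-9  # hypothetical: 90 fJ per cycle, 100 nW leakage
    t1, t2 = 100e-9, 105e-9         # two clock periods in seconds
    p1 = total_power(e_dyn, p_leak, t1)
    p2 = total_power(e_dyn, p_leak, t2)
    est = leakage_from_two_periods(p1, t1, p2, t2)
    print(f"recovered leakage power: {est * 1e9:.1f} nW")
```

The method works because the dynamic energy per cycle cancels in the difference of the two energies, leaving only the term that grows with the period. When the two periods are close, the result becomes sensitive to measurement noise in the power readings.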

Fig. 3-9 compares the leakage power as a function of frequency with a 0.2pF capacitive load at different temperature and process corners. The ratio of leakage power to total power is also shown in Fig. 3-9. Owing to the negative VGS control, the leakage power of the proposed bootstrapped inverter at 10MHz under 0.2V is 2pW. The leakage power is 3.9nW for a conventional inverter, 0.15nW for [16], and 39nW for [17]. Although the PMOS (NMOS) transistor is turned off with the positive voltage VSG (VGS) = VDD in [17], the leakage power in [17] is more than three orders of magnitude higher than in the proposed design scheme. When the operating frequency goes from 10MHz down to 100kHz, the potential of the boost node decays because of node leakage, which degrades the leakage performance. The potential of the boost node even returns to

Fig. 3-9. Leakage power as a function of frequency from 10 MHz to 100 kHz in corners at VDD = 0.2V: (a) TT, 25°C; (b) SS, -40°C; (c) FF, 125°C. Each panel also plots the PLeak/Ptotal ratio (%).

3.3.3. Delay Time Analysis

Delay time is another important feature of bootstrapped circuits. Although the driving transistors operate in the triode region under the sub-threshold supply, the other devices remain in the sub-threshold region. The total delay time is thus the sum of the propagation delays of the INV and the driver, which is denoted as

$$ t_{P,BI} = t_{P,INV} + t_{P,Driver} \tag{3-12} $$

where tP,BI, tP,INV, and tP,Driver are the delays of the bootstrapped inverter, the INV, and the driver, respectively.

Assume that the boosting efficiency is the same for all bootstrapped drivers. The delay time of the INV then becomes the dominant factor. The sub-threshold logic delay is derived in [9] as

$$ t_P = \frac{k_f \cdot C_L \cdot V_{DD}}{\mu C_{dep} \frac{W}{L} V_T^2 \exp\left( \frac{V_{DD} - V_{th}}{n V_T} \right)} \tag{3-13} $$

where kf is a fitting parameter. However, the circuit delay time is also related to RC loading effects. The ALBI has the shortest delay time among the bootstrapped circuits since the loading of

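The supply sensitivity implied by the sub-threshold delay expression of Eq.(3-13) can be illustrated by taking the ratio of the delays at two supply voltages, which cancels the device-dependent prefactor. The sketch assumes the delay scales as VDD·exp(-(VDD - Vth)/(n·VT)) and uses an assumed slope factor n = 1.5 and a thermal voltage of 25.9 mV. Both supplies must lie in the sub-threshold region for the model to apply.

```python
import math


def subthreshold_delay_ratio(vdd_a, vdd_b, n=1.5, v_t=0.0259):
    """Ratio t_P(vdd_a) / t_P(vdd_b) for t_P ~ VDD * exp(-(VDD - Vth) / (n * VT)).

    Vth and the prefactor cancel in the ratio; n and v_t are assumed values.
    """
    return (vdd_a / vdd_b) * math.exp((vdd_b - vdd_a) / (n * v_t))


if __name__ == "__main__":
    ratio = subthreshold_delay_ratio(0.1, 0.2)
    print(f"delay at 0.1 V is about {ratio:.1f}x the delay at 0.2 V")
```

The exponential term dominates the linear VDD factor, which is why the devices left in the sub-threshold region, here the INV, set the delay of a bootstrapped stage.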

Fig. 3-10 summarizes the delay time (from H to L) and the power consumption as functions of CL at 10 MHz with a supply of 200 mV. The proposed design has the lowest power consumption and delay time.

Fig. 3-10. Delay time and power consumption versus capacitive loads at 10 MHz (VDD = 0.2V, 25°C, TT corner).

The return of the boost-node potential to VDD or 0 indeed degrades the leakage performance at low frequencies or in the fast process/temperature corners. In contrast, the other boost node can easily be pre-charged to VDD or 0. As shown in Fig. 3-11, whether in the nominal 25°C, TT corner, the -40°C, SS corner, or the 125°C, FF corner, the delay times of all designs are almost the same at frequencies from 1 MHz to 100 kHz.

Fig. 3-11. Delay time versus clock frequency at VDD = 0.2V in the TT 25°C, SS -40°C, and FF 125°C corners.


3.3.4. Delay Time Analysis of Process Variation

Sub-threshold operation limits the yield because of its serious process variations. Although the boosted control signal pushes the driver transistors into the triode region, the remaining circuit devices still suffer from the same serious variation. With fewer devices in the sub-threshold region, the proposed design is less affected by process variation.

The delay-time variability analysis is based on Monte Carlo simulations. Device mismatch, threshold voltage Vth and process corner variation are assumed to follow Gaussian random distributions. To cover the most critical process and temperature corners, the Monte Carlo simulations are run under 3σ process variation at 25°C, 125°C and -40°C, as shown in Fig. 3-12. The supply voltage is 200mV and the clock rate is 1MHz. The number of samples for each temperature corner is 1500, and the total number of samples is 4500. In the worst case at -40°C, a conventional inverter has an average delay of 15.1ns with a standard deviation of 26.4ns. The proposed design reduces not only the average delay to 6.9ns but also the standard deviation to 6.3ns, which is much better than [16] and [17]. Clearly, the ALBI has higher immunity to process and temperature variation.
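The improvement quoted above can be expressed as relative reductions. The short calculation below uses only the worst-case (-40°C) means and standard deviations stated in the text.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` with respect to `baseline`."""
    return 1.0 - improved / baseline


# Worst-case (-40 C) Monte Carlo results quoted in the text, in nanoseconds.
conventional = {"mean": 15.1, "std": 26.4}
proposed = {"mean": 6.9, "std": 6.3}

mean_reduction = relative_reduction(conventional["mean"], proposed["mean"])
std_reduction = relative_reduction(conventional["std"], proposed["std"])

if __name__ == "__main__":
    print(f"mean delay reduced by {mean_reduction * 100:.0f} %")
    print(f"standard deviation reduced by {std_reduction * 100:.0f} %")
```

The standard deviation falls by about 76% and the mean by about 54%, so the spread of the delay shrinks more than the delay itself.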

Fig. 3-12. Monte Carlo histograms of delay time (number of samples versus delay time) at VDD = 0.2V for the conventional inverter, the proposed design, [16] (J1997), and [17] (T2008) at 25°C, 125°C and -40°C. Annotated values, in panel order: μ = 15.1 ns, σ = 26.4 ns; μ = 6.9 ns, σ = 6.3 ns; μ = 15.1 ns, σ = 22.3 ns; μ = 13.9 ns, σ = 17.3 ns.


3.4. Implementation and Experimental Results

3.4.1. Implementation of the Bootstrap Capacitor

The value of the boost capacitor can be chosen to adjust the boosting efficiency. A large boost capacitor achieves high boosting efficiency. In addition, a larger boost capacitor stores more charge to hold the node voltage against leakage even at low speed. However, area cost and power consumption are the design trade-off. In our test chip, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80% without occupying too much area. MOSFET, MOM, and MIM capacitors are the three types of capacitors in CMOS technology. Among them, the MOSFET capacitor has the highest capacitance per area. However, the MOSFET capacitor also has several drawbacks. First, when the MOSFET capacitor operates in the sub-threshold region, its capacitance changes abruptly with the control voltage, as shown in Fig. 3-13. Second, the leakage current of the nano-scaled device becomes more serious.

Third, the MOSFET capacitor has a large parasitic capacitance from the Vctrl nodes to the bulk compared with the other capacitors, and this large parasitic capacitance requires more power budget in the driver. The MIM capacitor has the least parasitic capacitance but the largest area: a 30fF MIM capacitor occupies 5.1um x 8.5um. Besides, the MIM capacitor needs an extra mask, which means extra cost. As a result, we use a MOM capacitor, which needs no extra mask, as the boost capacitor. A 30fF MOM capacitor occupies 3.7um x 8.6um and has a 1fF parasitic capacitance load at both nodes.
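The area figures above can be turned into capacitance densities for a direct comparison. The calculation below uses only the 30fF footprints quoted in the text.

```python
def capacitance_density(cap_ff, width_um, height_um):
    """Capacitance per unit area in fF/um^2."""
    return cap_ff / (width_um * height_um)


mim_density = capacitance_density(30.0, 5.1, 8.5)  # MIM: 5.1 um x 8.5 um
mom_density = capacitance_density(30.0, 3.7, 8.6)  # MOM: 3.7 um x 8.6 um

if __name__ == "__main__":
    print(f"MIM: {mim_density:.2f} fF/um^2")
    print(f"MOM: {mom_density:.2f} fF/um^2")
    saving = 1.0 - (3.7 * 8.6) / (5.1 * 8.5)
    print(f"MOM area saving: {saving * 100:.0f} %")
```

For the same 30fF, the MOM footprint is roughly a quarter smaller than the MIM footprint, in addition to avoiding the extra mask.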

Fig. 3-13. MOSFET capacitance (fF) versus control voltage Vctrl (V), annotated where the capacitance decreases abruptly.


3.4.2. Chip Implementation and Measurement

A test chip of bootstrapped CMOS inverters is implemented in a 90nm 1P9M SPRVT process to demonstrate the effectiveness of the proposed design scheme. The test circuits include the reported bootstrapped circuits of [16] and [17] and the proposed design. The chip also contains test keys to verify the interconnection model. Each bootstrapped circuit is implemented as a 10-stage cascaded driver chain. In each stage, two 30fF MOM capacitors serve as the bootstrap capacitors and a 200fF MOM capacitor serves as CL. Level shifters boost the 200mV internal signal to a 500mV chip I/O signal for measurement. The total area is 958μm × 776μm, and the core area is 566μm × 102μm. Fig. 3-14 shows the die photograph. The layout area of the proposed bootstrapped inverter cell is 25.8μm × 4.1μm.

Fig. 3-14. Die photograph and cell layout (labels: test keys, bootstrapped test circuits, decoupling capacitors, proposed bootstrapped inverter cell).


Fig. 3-15 shows a photograph of our experimental environment. Fig. 3-16 shows the measured waveform. The cumulative clock peak-to-peak and RMS jitters are 3.6ns and 504ps, respectively. The measured average total power is 1.01μW. Using the leakage estimation of Eq. (3-10) and Eq. (3-11) with periods of 100ns and 105ns, the derived leakage power is 107nW. TABLE 3-2 summarizes the chip. Since the threshold voltages Vthn and |Vthp| are 240mV and 180mV, respectively, we target 10MHz operation at 0.2V. TABLE 3-3 compares the measured results with other works at 0.2V. For a ten-stage driver chain operating at 10MHz, the ALBI has a delay time of 30.1μs, an energy efficiency of 0.1 pJ/cycle, and a leakage power of 107nW, which are the best among the compared designs [16] and [17].

Fig. 3-16. Measured waveform at 0.2V core VDD (0.5V I/O VDD); scale markers: 20ns, 100mV.

TABLE 3-2 Chip Summary

Item                                     Specification (unit)
Process                                  90nm SPRVT Low-K CMOS Process
Supply Voltage                           0.2V (Bootstrapped Circuits)
                                         0.2V, 0.5V (Level Shift Buffer)
                                         0.5V (Digital Circuits)
Power Dissipation @ 10 MHz (10 stages)   Total Power: 1.13uW (Post-sim, FF Corner), 1.01uW (Measured)
                                         Leakage Power: 133nW (Post-sim, FF Corner), 107nW (Measured)
Layout Area                              Interconnect Test Circuits: 575μm × 307μm
                                         Bootstrapped Circuits: 566μm × 102μm
                                         Whole Chip: 958μm × 776μm


TABLE 3-3 Comparisons

                        JSSC1997 [16]   T.VLSI2008 [17]   Proposed
Supply voltage (V)      0.2             0.2               0.2
Max frequency (MHz)     4               5                 10
Delay time (us)         47.3            48.2              30.1
Total Power (uW)        0.74            1.71              1.01
Leakage Power (nW)      276             833               107
Energy per cycle (pJ)   0.19            0.34              0.10

3.5. Summary

This chapter describes an ALBI applied to a sub-threshold supply clock network. Based on 4500 Monte Carlo simulation runs, the average delay time of the proposed design with a 200fF CL is 6.9ns with a standard deviation of 6.3ns, a 76% reduction in standard deviation relative to the conventional inverter. Measured results verify that the test chip achieves a clock rate of 10MHz at 200mV VDD. Owing to the negative VGS suppression, the measured leakage power improves by more than 50% over the previously reported bootstrapped drivers. The power consumption is 1.01μW, the leakage power is 107nW, and the energy efficiency is 0.1pJ/cycle.


Chapter 4

Near-threshold On-chip Bus

In data communication, inter-symbol interference (ISI) critically limits the data rate. In this chapter, an on-chip bus design with an ISI-suppressed bootstrapped near-threshold repeater is proposed. Operating at a near-threshold supply voltage is the most effective means of power reduction. To overcome the resulting poor driving capability, the bootstrap technique is used. In addition, a pre-charge enhancement scheme and a leakage current reduction scheme are adopted to achieve a beneficial speed-energy trade-off. Furthermore, the proposed repeater suppresses ISI noise in data link applications.

4.1. Proposed On-chip Bus Architecture

Fig. 4-1. Proposed on-chip bus architecture with new bootstrapped repeater insertion.

Fig. 4-1 shows the proposed 4-bit on-chip bus for data communication under the near-threshold power supply. A bus is divided into several segments, each of which is driven by a bootstrapped repeater. Ground shielding is used to eliminate the effective-loading uncertainty and to decouple the noise from adjacent channels. The repeaters on adjacent channels are staggered to reduce the coupling noise and simultaneous switching noise (SSN).
