Variation-Aware Ultra-Low Voltage Design for Energy Efficient Chips

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Doctoral Dissertation

Student: Ming-Hung Chang

Advisor: Prof. Wei Hwang


A Dissertation Submitted to the Department of Electronics Engineering and Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electronics Engineering

June 2012

Hsinchu, Taiwan, Republic of China


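Abstract (Chinese)

This dissertation presents a dynamic voltage frequency scaling (DVFS) platform built from energy-efficient designs. These designs comprise an ultra-low voltage temperature sensor, a variation-aware clock generator, and highly reliable ultra-low voltage SRAM and first-in first-out (FIFO) memories. Using the FIFO memory as the verification circuit, a highly robust DVFS system design is realized.

The ultra-low voltage, fully on-chip, frequency-domain temperature sensor operates at 0.4V over a 0˚C~100˚C temperature range and performs 45k conversions per second. With one-point calibration, the temperature error is only -1.81˚C~+1.52˚C. It is implemented in a TSMC 65nm process and occupies an area of 990μm².

Logical effort is a technique commonly used by digital designers, but conventional logical effort does not account for CMOS operating in different regions, nor for the effects of temperature and process. This dissertation proposes a unified logical effort that is applicable from 0.1V to 1V and reduces the delay estimation error caused by temperature and process variations. Based on this unified logical effort, an ultra-low voltage clock generator is designed whose built-in sensor provides information for dynamically self-adjusting the lock-in range error. Implemented in a UMC 65nm process, it generates maximum output frequencies of 625kHz at 0.2V and 5MHz at 0.5V while consuming only 0.18μW and 5.17μW, respectively. The clock generator can also synthesize outputs from 1/8 to 4 times the reference frequency.

A 9T SRAM is designed that improves write ability by breaking the positive feedback loop of the cross-coupled pair. It also includes a read buffer to improve read reliability and reduce leakage current, and a bit-interleaving structure can be combined with it to improve soft-error tolerance. Implemented in a UMC 65nm process, the SRAM operates at 0.3V and 909kHz while consuming only 3.51μW at its minimum energy point. To provide a good storage unit for wireless body area network systems, a 10T-SRAM-based FIFO memory is designed. Implemented in a UMC 90nm process, it operates at 0.4V, consuming only 2.09μW for writes at 50kHz and 2.25μW for reads at 625kHz.

Finally, the DVFS platform uses an 8T-SRAM-based FIFO memory as the demonstration circuit and provides two operating modes: low voltage (0.3V) and high performance (0.5V). It saves 69.5% of the power consumption when kept in the low-voltage mode. The platform is suitable for highly robust wireless body area network applications.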


Variation-Aware Ultra-Low Voltage Design

for Energy-Efficient Chips

Student : Ming-Hung Chang

Advisor : Prof. Wei Hwang

Department of Electronics Engineering & Institute of Electronics

National Chiao-Tung University

Abstract

Energy-efficient design is a key focus in emerging energy-constrained platforms. This thesis presents a dynamic voltage frequency scaling (DVFS) platform built from energy-efficient designs. An ultra-low voltage temperature sensor and a variation-aware clock generator are implemented to enable the DVFS platform. Robust near-/sub-threshold SRAM/FIFO memories are designed as the test vehicle of the DVFS platform.

An ultra-low voltage, fully integrated, frequency-domain smart temperature sensor is presented. With one-point calibration, an inaccuracy of -1.81˚C~+1.52˚C over a 0˚C~100˚C operating range has been measured for 12 test chips. At a conversion rate of 45k samples/s, the proposed temperature sensor consumes an average power of 520nW and achieves 0.49˚C/LSB at 11-bit output resolution. It occupies only 990μm² in a TSMC 65-nm general purpose bulk CMOS process.

The voltage-/temperature-induced delay estimation error of conventional logical effort is much more severe in the near-/sub-threshold region. Super-/near-/sub-threshold logical effort models are presented to eliminate the delay estimation error caused by voltage and temperature variations. A near-/sub-threshold programmable clock generator is also presented in this thesis. The major challenge of ultra-low voltage (ULV) circuits is that the lock-in range of the delay line is easily affected by environmental variations. The proposed clock generator contains a PVT compensation unit consisting of a set of delay lines and a PVT detector. The unit adjusts the lock-in range of the clock generator to guarantee a successful clock lock. In addition, it can generate an output clock with a frequency from 1/8 to 4 times that of the reference clock. The clock generator has been designed in UMC 65nm CMOS technology. The reference clock frequencies are 625kHz at 0.2V and 5MHz at 0.5V. The power consumptions are 0.18μW and 5.17μW at 0.2V and 0.5V, respectively. The core area of this clock generator is 0.01mm².

A 9T SRAM bit-cell is presented that enhances write ability by cutting off the positive feedback loop of the SRAM cross-coupled inverter pair. In read mode, an access buffer isolates the storage node from the read path for better read robustness and leakage reduction. A bit-interleaving scheme is enabled by incorporating additional write-wordlines (WWL/WWLb) into the proposed 9T SRAM bit-cell for soft error tolerance. A 1Kbit 9T 4-to-1 bit-interleaved SRAM is implemented in 65nm bulk CMOS technology. The experimental results demonstrate that the minimum energy point of the test chip occurs at a 0.3V supply voltage, where it achieves an operating frequency of 909kHz with 3.51μW active power consumption.

An ultra-low power (ULP) 16Kbit SRAM-based first-in first-out (FIFO) memory is also presented for wireless body area networks (WBANs). The proposed FIFO memory is capable of operating in the ultra-low voltage (ULV) regime with high variation immunity. A ULP near-/sub-threshold 10-transistor (10T) SRAM bit-cell is proposed as the storage element to improve write variation in the ULV regime and eliminate the data-dependent bit-line leakage. The proposed SRAM-based FIFO memory also features an adaptive power control circuit, counter-based pointers, and a smart replica read/write control unit. The proposed FIFO achieves a minimum operating voltage of 400mV in UMC 90nm CMOS technology. The write power is 2.09μW at 50kHz and the read power is 2.25μW at 625kHz.

Finally, a 512-word by 16-bit (8kb) subthreshold asynchronous first-in first-out (FIFO) memory is presented for wireless body area networks (WBANs). Meanwhile, a 1kb dynamic voltage scaling (DVS) 8T SRAM-based FIFO memory is implemented in UMC 65nm technology to operate between 0.5V (near-threshold) and 0.3V (subthreshold), with power consumptions of 0.535μW at 625kHz and 0.163μW at 20kHz, respectively. The proposed DVS FIFO memory can provide up to 69.5% power savings when the low-power mode is always engaged, and there is no power overhead if the period of the low-power mode is longer than 48.66μs. It is suitable for healthcare applications equipped with DVFS capability.


Acknowledgements

I would like to thank my parents and brother for all the support they have given me. Thank you for raising me and guiding me to be the person I am.

I am extremely grateful to my advisor, Prof. Wei Hwang, for providing me with a good research environment and giving me the maximum freedom of research. Thank you for all the constructive comments and suggestions on my research.

I would also like to thank all the laboratory fellows and schoolmates, graduated or still in school. Thank you for making school life more delightful. Special thanks to the MOEA u-PHI project and ITRI project teams, who have been a great help in my research.


List of Tables

3.1 The Performance Comparison of Recent Temperature Sensors

3.2 Functions of A(T) for super-threshold unified logical effort model considering supply voltage and temperature

3.3 Functions of B(T), C(T), and D(T) for near-threshold unified logical effort model, and Functions of E(T) and F(T) for sub-threshold unified logical effort model considering supply voltage and temperature

3.4 Specifications of the proposed DLL-based clock generator

4.1 Proposed 9T Bit-Cell Basic Operations Truth Table

4.2 Vmin Comparison of Various Bit-Cell Topologies

4.3 Test chips measurement summary and comparison

4.4 Iso-area calculation considering subarray efficiency

4.5 Device sizing for various bit-cell topologies

4.6 Vmin Comparison of Various Bit-Cell Topologies

4.7 Vmin Proposed 10T SRAM-based FIFO memory

5.1 Comparison of various SRAM bit-cells


List of Figures

1.1 Thirty-five years of semiconductor technology scaling [1.2].

2.1 Main leakage current components in an NMOS transistor [2.28].

2.2 Tradeoff between frequency loss, leakage reduction, and area overhead [2.32].

2.3 Energy and delay in different supply voltage operating regions [2.47].

2.4 Memory occupied up to 69% chip power as the emerging applications having more critical energy constraints [2.48].

2.5 Classification of variations [2.78].

2.6 Minimum reported supply voltage for recent ultra-low voltage designs, highlighting limitation posed by SRAMs compared with logic [2.14].

2.7 Sensor power budgets with common power sources [2.2].

3.1 (a) Temperature-to-propagation-delay-difference generator. (b) Temperature-to-frequency-difference generator.

3.2 The linearity of temperature sensitive delay line (TSDL) in super-/sub-threshold region.

3.3 The proposed ultra-low voltage frequency-domain temperature sensor.

3.4 Timing diagram of the proposed fixed pulse width generator.

3.5 Inverter used in sub-threshold temperature sensitive ring oscillator.

3.6 The proposed frequency-domain temperature sensor under (a) process variation, and (b) voltage variation.

3.7 Block diagram of the proposed ultra-low voltage frequency-domain […]

3.8 The effect of process variation on the proposed process invariant temperature sensor.

3.9 The implementation of the proposed process invariant temperature sensor.

3.10 Timing diagram of the proposed process invariant temperature sensor.

3.11 Microphotograph of the proposed process invariant temperature sensor.

3.12 Measurement environment for the test chips.

3.13 Bare die of the test chip on PCB board.

3.14 Measured error curves for 12 test chips.

3.15 Measurement results for 12 test chips.

3.16 Measurement error curves for supply voltage variations.

3.17 Concept diagram of PVT compensation.

3.18 Two cascaded FO1 inverters.

3.19 Proposed clock generator for near-/sub-threshold DVFS system.

3.20 Proposed finite state machine (FSM).

3.21 Timing diagram of our FSM operating from Reset to Lock state.

3.22 Lock-in delay line (lattice delay line [3.28]) used in our proposed clock generator.

3.23 PVT compensation delay line used in our proposed clock generator.

3.24 Proposed PVT detector.

3.25 Monte Carlo simulations for periods of ring oscillators (composed of FO1-INV and FO2-NAND) (a) 0.2V supply voltage, and (b) 0.5V supply voltage.

3.26 Control unit including (a) lock-in delay line controller, and (b) SEL generator.

3.27 (a) Phase detector, and (b) RSTPD generator.

3.28 PVT compensation for locking range of proposed generator at (a) 0.2V, TT, w/o compensation, (b) 0.2V, TT, with compensation, (c) 0.2V, FF, w/o compensation, (d) 0.2V, FF, with compensation, (e) 0.5V, TT, w/o compensation, (f) 0.5V, TT, with compensation, (g) 0.5V, FF, w/o compensation, (h) 0.5V, FF, with compensation.

3.29 Layout view of our DLL-based clock generator under UMC 65nm bulk CMOS technology.

4.1 Wireless sensor node block diagram for the WBAN system.

4.2 Block diagram of the proposed 9T bit-cell. The relative threshold voltage ratio of high Vt MOSFET to regular Vt one is 1.3 to 1.

4.3 (a) Proposed 9T bit-cell in hold operation, and (b) HSNM performance comparison.

4.4 (a) Proposed 9T bit-cell in read operation, and (b) RSNM performance comparison.

4.5 (a) Proposed 9T bit-cell in write operation, and (b) write margin performance comparison.

4.6 Layout view of the proposed 9T bit-cell. Its size is 1.92× larger than 6T mincell.

4.7 Hold-failure probability comparison.

4.8 Read-failure probability comparison.

4.9 Write-failure probability comparison.

4.10 Standard 4-to-1 bit-interleaved SRAM array.

4.11 Schematic illustration of the proposed 9T bit-cells free of write-half-select problem.

4.12 HSNM distributions of write-half-selected 9T/8T bit-cells.

4.13 Block diagram of 1Kbit 9T bit-interleaved SRAM.

4.14 Read replica column and read pulse control circuit.

4.15 Write pulse control circuit.

4.16 Die photo and layout view for 1Kbit 9T SRAM test chip fabricated in 65nm bulk CMOS process.

4.17 Measured power of 1Kbit 9T SRAM versus VDD.

4.18 Standard FIFO memory and its power consumption ratio.

4.19 Conventional dual-port 8T bit-cell.

4.20 Proposed dual-port 10T bit-cell.

4.21 Layout view of the proposed dual-port 10T bit-cell in UMC 90nm CMOS technology.

4.23 (a) Proposed 10T bit-cell in read operation, and (b) Read SNM comparison in read mode.

4.24 Read SNM distributions of Monte Carlo simulations (100,000 times).

4.25 Proposed 10T bit-cell in write operation.

4.26 Write margin distributions of Monte Carlo simulations (100,000 times).

4.27 (a) Proposed 10T bit-cell in hold operation, and (b) Hold SNM comparison in hold mode.

4.28 Data-independent bitline leakage reduction scheme.

4.29 Sensing margin comparisons under the worst case scenario.

4.30 Thin-cell layout style (a) conventional DP 8T mincell, and (b) SE 8T mincell.

4.31 Thin-cell layout style (a) conventional DP 8T iso-area bit-cell, and (b) SE 8T iso-area bit-cell.

4.32 Hold-failure probability comparison.

4.33 Read-failure probability comparison.

4.36 Block diagram of the proposed 16Kbit SRAM-based FIFO memory.

4.37 FIFO memory operation example.

4.38 (a) Adaptive power control finite state machine, and (b) (i + 1)th word of storage element.

4.39 Block diagram of the proposed counter-based pointer.

4.40 The synchronous counter-based pointer (a) schematic view, and (b) power consumption comparisons.

4.41 SRAM write delay in different process corner and temperature.

4.42 Proposed smart replica read/write control units.

4.43 (a) Floorplan and layout views of our 16Kbit 10T SRAM-based FIFO memory, and (b) power reduction ratio by the proposed energy-efficient techniques.

5.1 Micro-watt wireless wearable healthcare ECG microsystem block diagram

5.2 A wireless sensor node with two operating modes: Low-power Mode and High-performance Mode.

5.3 Proposed 8T SRAM bit-cell

5.4 Vt, Ion-Ioff ratio, and delay versus channel length of proposed 8T SRAM bit-cell

5.5 Hold mode of proposed 8T SRAM bit-cell

5.6 Read mode and butterfly curve of proposed 8T SRAM bit-cell

5.7 The distributions of read SNM of Monte Carlo simulation

5.8 Read-bitline leakage reduced by read-buffer-footers

5.9 (a) Hierarchical read-bitline scheme with footer in global read-bitline, and (b) Iread-Ileakage ratio of 512-bit (dot-line)/32-bit (solid-line) per read-bitline with/without RSCE and read-buffer-footer

5.10 Equivalent circuit of the proposed 8T SRAM bit-cell in write operation

5.11 (a) The distributions of write margin performing Monte Carlo simulation, and (b) write delay performance comparison

5.12 Layout view of the proposed 8T SRAM bit-cell

5.13 Block diagram of proposed asynchronous 8T-SRAM-based FIFO

5.14 (a) The adaptive power control system (b) ith word of storage element

5.15 The replica column for read operation and read pulse control circuit

5.16 The replica column for write operation and write pulse control circuit

5.17 Block diagram of the proposed dynamic voltage frequency scaling 8T-SRAM-based FIFO as a demonstration DVFS platform.

5.18 Switched capacitor DC-DC converter.

5.19 DVFS controller and its timing diagram.

5.20 Layout view and die photo of 1Kbit asynchronous DVFS 8T-SRAM-based FIFO.

5.21 Energy consumption comparisons of 1Kbit 8T-SRAM-based FIFO with DVFS and without DVFS.

6.1 Proposed power management system architecture.

6.2 PVT-aware ultra-low voltage DVFS FIFO system.


Chapter 1 Introduction

Driven by the growing demand for battery-operated or self-powered mobile applications, high energy efficiency has become the driving force for digital circuit design. In most scenarios, energy harvested from the ambient is on the order of microwatts, which requires circuit implementations to be very efficient in terms of energy consumption [1.1]. Therefore, ultra-low power designs for wireless devices have three primary concerns: small form factor, long lifetime, and low cost. To fulfill those requirements, emerging digital circuit design targets area efficiency, energy efficiency, and robustness.

Figure 1.1: Thirty-five years of semiconductor technology scaling [1.2].

Advances in sub-threshold circuit design have recently demonstrated capabilities compatible with aggressive energy consumption reduction. However, sub-threshold design has drawbacks: leakage increases dramatically, the $I_{ON}$-$I_{OFF}$ ratio decreases, and the increased energy efficiency comes at the cost of performance loss. As shown in Fig. 1.1, technology scaling shrinks feature size by 70% every generation. However, power density doubles, leakage current increases by 25%, and the $I_{ON}$-$I_{OFF}$ ratio degrades by 60%. For short-channel devices, parameter variations affect design performance more strongly and result in larger threshold voltage variation. On the other hand, dynamic voltage frequency scaling (DVFS) is a popular solution for achieving energy efficiency and performance concurrently. In other words, if the throughput constraint cycles between different operating modes, adjusting the supply voltage to the requirements of each mode can provide significant energy savings.

An overview of this work is as follows. Chapter 2 introduces previous work and basic energy-efficient techniques. Wireless body area sensor networks (WBANs) are also discussed to introduce the biomedical device standard. An ultra-low voltage temperature sensor with high process variation immunity is first presented in Chapter 3. A unified logical effort model is also presented to speed up ultra-low voltage circuit conceptual design. Based on the proposed model, a near-/sub-threshold DLL-based clock generator with PVT-aware locking range compensation is presented. A 9T subthreshold SRAM design with a bit-interleaving scheme is presented in Chapter 4, together with an energy-efficient 10T SRAM-based FIFO memory design. As the test vehicle of the proposed dynamic voltage frequency scaling (DVFS) platform, an 8T-SRAM-based FIFO design in 65nm CMOS is presented in Chapter 5. Finally, conclusions and possible future research directions are discussed in Chapter 6.


Chapter 2 Prior Works Review

For emerging battery-powered/energy-harvested portable electronic devices, there are three major design requirements: long lifetime, low cost, and tiny form factor [2.1–2.5]. To meet these requirements, the development of digital system design has concentrated on finding ultra-low-power, robust, and area-efficient solutions.

Power consumption is the sum of dynamic power and leakage power.

$$P_{\mathrm{active}} = P_{\mathrm{dynamic}} + P_{\mathrm{leakage}} = f C V_{DD}^{2} + I_{\mathrm{leakage}} V_{DD} \tag{2.1}$$

Firstly, lowering supply voltage is an effective strategy to achieve long lifetime since dynamic energy consumption has a square dependence on the supply voltage [2.6, 2.7].

$$P_{\mathrm{dynamic}} = f C V_{DD}^{2} \tag{2.2}$$

where $f$ is the switching frequency, $C$ is the effective switched capacitance of the circuit, and $V_{DD}$ is the supply voltage. Secondly, leakage current becomes a critical issue in nanometer regimes since subthreshold leakage currents vary exponentially with threshold voltage [2.8].

$$I_{\mathrm{leakage}} = \frac{W}{W_0} I_0 \cdot 10^{(V_{GS} - V_{th})/S} \tag{2.3}$$

where $U_T$ is the thermal voltage, $W$ is the device width, and $S = n U_T \ln 10$ is the sub-threshold slope. Leakage power makes up a much larger share of the total power consumption when the switching activity is low. Leakage current reduction techniques are therefore a necessary requirement for energy-efficient chips.
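As a rough numerical illustration of Eqs. (2.1) to (2.3), the following Python sketch evaluates the dynamic and leakage terms. Every numeric value in it (frequency, capacitance, I0, Vth, the slope factor n, and the temperature) is a hypothetical placeholder chosen for illustration and is not taken from this work. The function names are likewise illustrative.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage U_T = kT/q, in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_slope(n, temp_kelvin):
    """Sub-threshold slope S = n * U_T * ln(10), in volts per decade."""
    return n * thermal_voltage(temp_kelvin) * math.log(10)


def dynamic_power(freq_hz, cap_farad, vdd):
    """Eq. (2.2): P_dynamic = f * C * VDD^2."""
    return freq_hz * cap_farad * vdd ** 2


def leakage_current(width, width0, i0, vgs, vth, slope):
    """Eq. (2.3): I_leakage = (W / W0) * I0 * 10^((VGS - Vth) / S)."""
    return (width / width0) * i0 * 10 ** ((vgs - vth) / slope)


def active_power(freq_hz, cap_farad, vdd, i_leak):
    """Eq. (2.1): P_active = f * C * VDD^2 + I_leakage * VDD."""
    return dynamic_power(freq_hz, cap_farad, vdd) + i_leak * vdd


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    slope = subthreshold_slope(n=1.5, temp_kelvin=300.0)
    i_leak = leakage_current(width=1.0, width0=1.0, i0=1e-7,
                             vgs=0.0, vth=0.4, slope=slope)
    for vdd in (1.0, 0.5, 0.3):
        p_dyn = dynamic_power(freq_hz=1e6, cap_farad=1e-12, vdd=vdd)
        p_tot = active_power(1e6, 1e-12, vdd, i_leak)
        print(f"VDD={vdd:.1f} V  P_dynamic={p_dyn:.3e} W  P_active={p_tot:.3e} W")
```

In this sketch, halving the supply voltage cuts the dynamic term by a factor of four, while raising the threshold voltage by one sub-threshold slope S cuts the leakage current by a factor of ten.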


Ultra-low voltage operation is being examined as a way to consume orders of magnitude less power than standard 1V operation. Meanwhile, the minimum energy operating points of logic and memory usually occur in the subthreshold and near-threshold regions [2.6, 2.7, 2.9, 2.10]. Successful energy-efficient techniques for both the subthreshold and near-threshold regions are discussed in Sec. 2.1. State-of-the-art ultra-low voltage SRAM designs, including new bit-cells, novel sensing schemes, and read/write assist circuits, are introduced in Sec. 2.2.

However, performance loss and reliability degradation are two major problems for ultra-low voltage design. To retain or improve performance, the threshold voltage must be reduced as well, which increases the subthreshold leakage exponentially. On the other hand, global systematic and local random variations in process, supply voltage, and temperature (PVT) pose a major challenge to future nanometer circuit design [2.11, 2.12]. In addition, aging variations degrade device robustness and strength when a device is used for a long period of time. Therefore, monitoring of subthreshold leakage, PVT variations, and aging variations, together with smart variability-resistant designs, is necessary. Related research on variation-aware circuits is discussed in Sec. 2.3.

To retain the excellent energy efficiency while reducing performance loss, dynamic voltage frequency scaling (DVFS) [2.13] is an effective means of handling time-varying workloads in wireless devices. It reduces the supply voltage to extend battery lifetime and provides maximum performance only when required. For applications with a wide spread of workload intensity, the DVFS technique is the key to building an optimum energy-saving system. Recently, the ultra-dynamic voltage scaling (UDVS) technique [2.14, 2.15], in which the supply voltage is reduced below the threshold voltage, was presented. Many successful designs based on DVFS concepts are surveyed in Sec. 2.4.
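To make the benefit of switching between operating modes concrete, the following Python sketch computes the power saving of a two-mode system. The two power figures are the ones quoted in the abstract for the DVS 8T SRAM-based FIFO (0.535μW at 0.5V and 0.163μW at 0.3V). The time-weighted average model and the function names are assumptions of this sketch, and mode-transition overhead is ignored.

```python
def power_saving(p_high_w, p_low_w):
    """Fractional power saving of the low-power mode relative to the high mode."""
    return 1.0 - p_low_w / p_high_w


def average_power(p_high_w, p_low_w, low_fraction):
    """Time-averaged power when low_fraction of the time is spent in low-power mode.

    Mode-transition overhead is ignored (an assumption of this sketch).
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    return low_fraction * p_low_w + (1.0 - low_fraction) * p_high_w


if __name__ == "__main__":
    P_HIGH = 0.535e-6  # W at 0.5V, 625kHz (figure quoted in the abstract)
    P_LOW = 0.163e-6   # W at 0.3V, 20kHz (figure quoted in the abstract)
    print(f"saving when always in low-power mode: {power_saving(P_HIGH, P_LOW):.1%}")
    for frac in (0.0, 0.5, 0.9, 1.0):
        avg_uw = average_power(P_HIGH, P_LOW, frac) * 1e6
        print(f"low-power fraction {frac:.1f}: average power {avg_uw:.3f} uW")
```

With the low-power mode always engaged, the ratio of the two quoted power figures reproduces the 69.5% saving stated in the abstract.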

One popular energy-limited application with time-varying throughput is healthcare monitoring with wearable body area sensor networks (WBANs). The WBAN standard is under development by IEEE 802.15 TG6 [2.16] for low-power devices operating on, in, or around the human body. Typical WBANs consist of sensor nodes, which are recognized as an enabling technology for continuous and noninvasive measurement of vital signs such as body temperature, heart rate, and electrocardiogram (ECG/EKG). However, the wearable nature of the sensor nodes constrains the form factor and energy budget because battery replacement may be difficult or impossible. Sec. 2.5 reviews related work on energy-efficient circuit designs for WBANs.

2.1 Energy Efficient Techniques for Ultra-Low Voltage Designs

Until the early 2000s, high-performance design was the major trend in digital circuits. However, cost-effective cooling solutions can only handle around 100W of power consumption. Meanwhile, power-limited portable devices have grown rapidly over the last 20 years. Traditional low-power techniques, including switching activity reduction, pipelining, all-level parallelism, and interconnect/logic optimization, are no longer sufficient for micro-power microsystems. Several effective ideas have drawn attention. Digital-assisted analog design for signal calibration and variation compensation became popular [2.17] as technology scaled down to the nanometer range. A new FDSOI process technology [2.18] and a novel 3-D IC package technique [2.19–2.21] are also primary focuses for providing optimum operation while maintaining energy efficiency. Recently, the primary focus for achieving energy-efficient digital designs has been ultra-low voltage operation [2.3, 2.5, 2.8, 2.22, 2.23].

Ultra-low power circuits demonstrate huge potential for extending the lifetime of portable/bio-medical applications, because reducing the supply voltage reduces dynamic energy consumption quadratically. However, leakage in the subthreshold region increases dramatically and drain current decreases exponentially, both of which degrade the $I_{ON}$-$I_{OFF}$ ratio. This can significantly degrade device performance and reliability. To aid in the selection of gate size for leakage reduction in ultra-low voltage designs, the widely used logical effort method [2.24] must be modified. Logical effort is defined as the ratio of the input capacitance of a gate to that of an inverter delivering the same amount of output current. It is used to quickly estimate the optimal delay time and to optimize super-threshold logic paths. Previous research on subthreshold logical effort for maximum drive current was presented by Keane [2.25], whose framework chooses the optimal transistor stack sizing factor for best performance. Later, an ultra-low voltage sizing method was proposed to minimize OFF leakage current and maximize ON active current at the same time [2.26]. The logical effort models extend the original high-performance-oriented design in the super-threshold region to energy-efficiency-oriented design in the near-threshold and subthreshold regions. Meanwhile, supply voltage and temperature variations are both taken into account.
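For readers unfamiliar with the conventional method, the following Python sketch applies the classical super-threshold logical effort calculation to a short logic path. It is not the unified model proposed in this work. The gate table uses the usual textbook values for static CMOS with a 2:1 PMOS/NMOS width ratio, and the example path and load are hypothetical.

```python
from math import prod

# Textbook logical effort g and parasitic delay p (in units of the inverter
# parasitic delay) for static CMOS gates with a 2:1 PMOS/NMOS width ratio.
# These are classical super-threshold values, used here as assumptions.
GATES = {
    "inv": (1.0, 1.0),
    "nand2": (4.0 / 3.0, 2.0),
    "nor2": (5.0 / 3.0, 2.0),
}


def path_delay(gate_names, electrical_effort, branching_effort=1.0):
    """Return (minimum normalized path delay, optimal stage effort).

    Path effort F = G * B * H, optimal stage effort f = F**(1/N),
    and minimum delay D = N * f + P.
    """
    n_stages = len(gate_names)
    path_logical_effort = prod(GATES[name][0] for name in gate_names)
    path_parasitic = sum(GATES[name][1] for name in gate_names)
    path_effort = path_logical_effort * branching_effort * electrical_effort
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages * stage_effort + path_parasitic, stage_effort


if __name__ == "__main__":
    # Hypothetical path: inverter -> NAND2 -> NOR2 driving 8x its input capacitance.
    delay, stage_effort = path_delay(["inv", "nand2", "nor2"], electrical_effort=8.0)
    print(f"optimal stage effort = {stage_effort:.3f}")
    print(f"minimum path delay   = {delay:.3f} (normalized units)")
```

The constant g and p values are exactly what stops holding at ultra-low voltage, where drive current depends strongly on supply voltage and temperature. That dependence is the motivation for the modified models discussed above.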

2.1.1 Subthreshold Regimes

Lowering the supply voltage toward the subthreshold region helps keep the power budget of portable devices under control. However, the penalties of working in this region are slower speed, a reduced $I_{ON}$-$I_{OFF}$ ratio, and increased sensitivity to variations [2.27]. Generally, energy-stringent applications such as wireless sensor nodes, biomedical sensors, and battery-free electronics tend to have fairly low speed requirements. Although leakage current decreases as the supply voltage scales down, the $I_{ON}$-$I_{OFF}$ ratio in the subthreshold region is reduced to only 160X (versus 7,000X in the super-threshold region), as stated in [2.10]. That is because the subthreshold conduction drain current is far smaller than the super-threshold one. Therefore, it is essential to identify which leakage current components need to be reduced.
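The collapse of the on/off ratio can be seen directly from the exponential form of Eq. (2.3). If both the ON state (VGS = VDD) and the OFF state (VGS = 0) are assumed to follow that equation with the same threshold voltage, the ratio reduces to 10^(VDD/S). The Python sketch below evaluates this simplified expression. The slope factor, temperature, and supply voltages are hypothetical, and effects such as drain-induced barrier lowering are ignored.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def subthreshold_slope(n, temp_kelvin):
    """Sub-threshold slope S = n * (kT/q) * ln(10), in volts per decade."""
    return n * (BOLTZMANN * temp_kelvin / ELECTRON_CHARGE) * math.log(10)


def on_off_ratio(vdd, slope):
    """Simplified subthreshold on/off ratio.

    Assumes Eq. (2.3) holds for VGS = VDD and VGS = 0 with the same Vth,
    so I_ON / I_OFF = 10 ** (VDD / S).
    """
    return 10 ** (vdd / slope)


if __name__ == "__main__":
    slope = subthreshold_slope(n=1.5, temp_kelvin=300.0)  # hypothetical device
    print(f"S = {slope * 1e3:.1f} mV/decade")
    for vdd in (0.15, 0.20, 0.25, 0.30):  # hypothetical supply voltages
        print(f"VDD = {vdd:.2f} V -> on/off ratio ~ {on_off_ratio(vdd, slope):.0f}")
```

Under these assumptions each additional S of supply voltage buys only one decade of on/off ratio, which is why the ratio stays in the hundreds at supply voltages of a few hundred millivolts.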

Figure 2.1: Main leakage current components in an NMOS transistor [2.28].

There are four major short-channel leakage mechanisms, as illustrated in Fig. 2.1. They are reverse-biased junction leakage current ($I_{REV}$), gate induced drain leakage ($I_{GIDL}$), gate leakage ($I_{Gate}$), and subthreshold leakage ($I_{SUB}$).

For the leakage current of an OFF transistor, $I_{OFF}$,

$$I_{OFF} = I_{REV} + I_{GIDL} + I_{SUB} \tag{2.4}$$

Note that $I_{Gate}$ is not included because the transistor gate is not at a high potential. Because of the low threshold voltage in nanometer technology, $I_{SUB}$ typically dominates $I_{OFF}$.

There are plenty of successful energy-efficient techniques to reduce leakage current in the subthreshold region: power gating through the use of sleep transistors [2.31–2.37], multiple threshold voltage CMOS (MTCMOS) [2.38, 2.39], and body bias control [2.3, 2.22, 2.30, 2.40–2.45].

The power gating technique adds a header and/or footer (called a sleep transistor) between the actual power/ground rail and the virtual power/ground. It turns off the leakage current path during standby mode. Designing power gating devices efficiently involves three main challenges: power gating structure, sleep transistor sizing, and supply noise minimization. A power gating structure that supports both a cutoff mode and an intermediate power-saving and data-retaining mode was presented in [2.35]. In [2.33], an algorithm for estimating the voltage drop and minimizing the size was presented. Moreover, an optimal sizing scheme using an explicit noise and impedance model was developed in [2.32] for supply noise minimization. As shown in Fig. 2.2, the width of the power gating device is a tradeoff between frequency loss, leakage reduction, and area overhead.

Most advanced technologies provide the MTCMOS technique to meet high-performance and low-power demands. High-$V_{th}$ devices reduce leakage current by sacrificing speed. On the other hand, low-$V_{th}$ devices operate faster than normal ones with low leakage overhead. In [2.38], a series-connected low-$V_{th}$ power gating structure with two virtual ground ports was presented to reduce $I_{Gate}$, wake-up time, and rush current. Meanwhile, a design methodology that enables local insertion of sleep devices for sequential and combinational circuits was presented in [2.39]. It also prevented most sneak leakage paths.

Utilizing the body effect, the device threshold voltage can be controlled by the substrate bias. It can provide a high-$V_{th}$ characteristic in standby mode and a low-$V_{th}$ one in active mode [2.30, 2.44, 2.45]. However, it may increase the depletion width of the MOSFET parasitic junction diode and rapidly increase the band-to-band tunneling (BTBT) current between the substrate and source/drain, especially with halo implants. In [2.40], optimum body bias voltages were generated adaptively for different temperature and process conditions based on PVT monitoring and control systems. Power supply variations were also compensated based on the propagation delay change of the inverter chain.

Figure 2.2: Tradeoff between frequency loss, leakage reduction, and area overhead [2.32].

2.1.2 Near-threshold Regimes

Minimum energy operation for logic usually occurs in the subthreshold region. However, it was reported in [2.46] that a 20% increase in energy from the minimum energy point yields a tenfold gain in performance. Therefore, near-threshold operation can be more energy efficient than subthreshold operation from an energy-delay-product (EDP) point of view. For a broad range of power-constrained computing segments, from sensors to high-performance servers, near-threshold operation is preferred because it is more robust than subthreshold operation and more energy efficient than super-threshold operation, as shown in Fig. 2.3 [2.47].
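The EDP argument can be checked with a few lines of arithmetic. The Python sketch below takes the two figures quoted from [2.46] (20% more energy, ten times the performance) and, as an assumption of this sketch, interprets the performance gain as a tenfold reduction in delay. The function name is illustrative.

```python
def relative_edp(energy_ratio, delay_ratio):
    """Energy-delay product relative to a reference operating point."""
    return energy_ratio * delay_ratio


if __name__ == "__main__":
    # Reference point: the minimum energy point (normalized energy 1, delay 1).
    energy_ratio = 1.20       # 20% more energy, as quoted from [2.46]
    delay_ratio = 1.0 / 10.0  # tenfold performance read as one tenth the delay
    edp = relative_edp(energy_ratio, delay_ratio)
    print(f"relative EDP = {edp:.2f} of the minimum-energy-point EDP")
    print(f"EDP improvement factor = {1.0 / edp:.1f}x")
```

Under this interpretation the near-threshold point has roughly one eighth of the EDP of the minimum energy point, even though it consumes more energy per operation.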

2.2 Ultra-Low Voltage Memories

In highly energy-constrained applications, the memory power consumption drives the need for ultra-low voltage operation, as shown in Fig. 2.4 [2.48]. A traditional 6T bit-cell cannot survive in the subthreshold region without a large area overhead because of its read-disturb nature. Meanwhile, bitline leakage in 6T SRAMs limits the number of bit-cells on a bitline to 16 [2.49]. To overcome the challenges of performing robust ultra-low voltage read/write/hold operations, several successful ultra-low voltage memory designs [2.10, 2.37, 2.48–2.74] were presented. Some of them presented novel bit-cells to avoid disturbances. Others presented read-/write-assist techniques at the architecture level.

Figure 2.3: Energy and delay in different supply voltage operating regions [2.47].

Figure 2.4: Memory occupied up to 69% chip power as the emerging applications having more critical energy constraints [2.48].

Among the novel bit-cells, a 5T bit-cell [2.57] used sizing asymmetry to improve read stability. Another 5T bit-cell [2.71] exploited dynamic read stability. A read-static-noise-margin-free 7T bit-cell [2.72] was presented to overcome the speed limits of 6T SRAM at a 0.5V supply voltage. Also, a 9T bit-cell [2.50] with a bit-interleaving scheme enhances write ability by cutting off the positive feedback loop of the inverter pair. The 9T bit-cell can operate reliably at its minimum energy point of 0.3V. Meanwhile, 8T [2.53, 2.59, 2.61, 2.63–2.65, 2.67, 2.75] and 10T [2.49, 2.51, 2.66, 2.68, 2.70] bit-cells have various structure settings. In [2.67], a read buffer was used to ensure read stability, achieving a minimum operating voltage of 350mV. Utilization of the reverse short channel effect in an 8T bit-cell [2.65] improved its write margin and read performance without the aid of peripheral circuits. An asymmetrical write-assist 8T bit-cell [2.59] with a virtual ground biasing scheme was presented to achieve a 0.2V supply voltage. In [2.53], a fully differential 8T bit-cell that allows bit-interleaving was presented to achieve soft-error tolerance. The 10T bit-cell in [2.49] used four extra transistors to implement a read buffer. The buffer solved the read disturbance of the 6T bit-cell and relaxed the bitline integration limitation. A Schmitt trigger (ST) based differential 10T bit-cell [2.70] was presented to achieve a 1.56× higher read static noise margin than the 6T bit-cell at a 0.4V supply voltage. It can operate at a supply voltage of 160mV. Then, another ST-based 10T bit-cell [2.76] was presented to achieve soft-error tolerance by providing a bit-interleaving structure. A detailed iso-area analysis was also reported in [2.51].

Other than novel bit-cell structures, peripheral assist techniques for stability improvement were presented in dynamic and static ways [2.56]. The dynamic ones include vertically routed VDD/VSS, horizontally routed VDD/VSS, Vwordline adjustment, Vbitline


supply voltage [2.37] or voltage scalable [2.58, 2.60, 2.65], and adaptive body bias control [2.65]. Meanwhile, reducing Vwordline and/or increasing bit-cell VDD can increase read stability. During write operations, reducing bit-cell VDD [2.69] and/or employing negative Vbitline [2.52, 2.77] can improve write margin [2.54].

2.3 Variation-Aware Circuits

Process variations [2.78] can be divided into two parts: global die-to-die (D2D) and local within-die (WID) variations. Global D2D process variations come from different runs, lots, and wafers. Local WID process variations are due to fundamental physical or process control limitations. The major sources of WID process variations include random dopant fluctuation, channel length variation, line edge roughness, oxide charge variation, mobility fluctuation, gate oxide thickness variation, and channel width variation [2.79, 2.80]. The first two variations are the dominant sources of WID process variations in current technology. The other critical global variation is lifetime aging, which includes negative bias temperature instability (NBTI) [2.81], positive bias temperature instability (PBTI), hot carrier injection (HCI), time-dependent dielectric breakdown (TDDB), and electromigration. Two successful on-chip aging sensors [2.82, 2.83] were presented to monitor the performance degradation. In [2.82], the sensor achieved a direct correlation between the threshold voltage degradation and the phase difference.

Figure 2.5: Classification of variations [2.78].

Fig. 2.5 classifies the various sources of variation according to their spatial and temporal rates of change [2.78]. Other than process variations, voltage and temperature


variations also need to be reduced. The impact of variations leads to lower noise margins, reliability degradation, large power consumption, and temporal degradation. Many previous studies address variation-aware logic [2.1, 2.42, 2.43, 2.79, 2.80, 2.84–2.92] and memories [2.9, 2.11, 2.48, 2.93–2.95] to maintain digital circuit performance and yield.

In [2.84, 2.85], the impact of local spatial variations on digital circuit performance was studied by measuring their effect on FET current on-chip. A gated oscillator was presented as an all-digital measurement circuit for dynamic supply noise waveforms. By taking WID process variations into account, a variation-aware optimal supply voltage scaling mechanism was presented in [2.89, 2.90]. To ensure the logic functionality, the voltage transfer characteristic can be an indicator [2.9]. Meanwhile, soft error models accounting for D2D and WID process variations in subthreshold SRAM bit-cells were presented in [2.93]. Because SRAMs generally need to retain data, the low-leakage data-retention techniques in the presence of variations were analyzed in [2.48].

In order to monitor real-time on-chip environmental status, process, voltage, and temperature sensors are essential for variation-aware circuits. One major focus is smart temperature sensors [2.96–2.107]. Recently, process and voltage sensors [2.108] and threshold voltage sensors [2.81] have also required close attention.

2.4 Dynamic Voltage Frequency Scaling

Emerging applications like implantable/wearable medical devices, wireless sensor networks, and handheld electronics are battery-powered or even battery-free. However, the demand for diverse functionalities to be integrated in these applications creates a serious power management bottleneck. Power management techniques [2.109–2.118] are paramount for energy efficient chips. A 90-nm Itanium family processor [2.117] uses on-chip sensors and an embedded microcontroller to measure power and temperature status and to modulate both voltage and frequency to maximize performance. Also, a multidimensional adaptive power management approach [2.116] optimally trades off power and performance by concurrently tuning supply voltage in RF and digital baseband components. In [2.114], an online-learning algorithm for system-level


power management was presented that is extremely lightweight and has negligible overhead.

Figure 2.6: Minimum reported supply voltage for recent ultra-low voltage designs, highlighting limitation posed by SRAMs compared with logic [2.14].

Energy consumption is the sum of leakage energy and switching energy. In Sec. 2.1, techniques for leakage energy reduction are discussed. For switching energy reduction, dynamic voltage frequency scaling (DVFS) [2.7, 2.9, 2.13–2.15, 2.19, 2.27, 2.60, 2.61, 2.64, 2.89, 2.90, 2.119–2.127] serves as an energy-effective solution in response to varying performance requirements. It was reported in [2.7] that the minimum energy point (MEP) occurs in the sub-threshold region. The MEP depends heavily on leakage current, which itself depends on supply voltage [2.124]. A circuit was presented to determine an optimal low-activity supply voltage for energy-efficient DVFS. The reported minimum operational supply voltage has two different trends for SRAMs and logic as shown in Fig. 2.6. SRAMs pose a critical limitation in DVFS systems because they have far lower activity factors and are more sensitive to leakage than logic.

For a DVFS platform, highly efficient power conversion by DC-DC converters [2.2, 2.128–2.133] is needed not only in sleep mode under very light load but also in high-speed mode under very heavy load. Meanwhile, generating the clock frequencies [2.134–2.141] of a DVFS platform is another critical challenge. Level converters [2.142, 2.143] capable of converting voltage from subthreshold to super-threshold regions are also


essential. According to [2.48], SRAMs occupy up to 69% of chip power as emerging applications face more critical energy constraints. Several successful DVFS SRAM [2.37, 2.48, 2.58, 2.60, 2.61, 2.64, 2.74, 2.119] implementations have also drawn much attention. The state-of-the-art energy-efficient chips [2.23, 2.144, 2.145] were usually operated in ultra-low voltage domains and utilized the DVFS technique. A 180-mV subthreshold FFT processor using a minimum energy design methodology was presented in [2.6]. In [2.146], a fully integrated power management unit was implemented for GSM baseband-radios. A 167-processor computational platform with per-processor DVFS circuits was presented in [2.147]. Its DVFS controller provides three methods for voltage and frequency setting: 1) static, 2) dynamic runtime through software, and 3) dynamic runtime through local hardware. A near-/sub-threshold multi-standard JPEG co-processor was presented in [2.148]. It adopts a configurable Vth balancing scheme to enable ultra-wide-range VDD scaling. Twenty-five power domain control was used in an H.264 Full-HD decoding application processor [2.19].

2.5 Wireless Body Area Sensor Networks

Wireless body area sensor networks (WBANs) [2.2, 2.4, 2.149–2.151] are driven by the growing aging population worldwide [2.152]. WBANs follow IEEE 802.15 TG6 [2.16] for low power devices operating on, in, or around the human body. One well-known recent application is wearable medical microsystems that measure human vital signs, e.g. electrocardiogram (ECG), electroencephalography (EEG), heart rate (HR), and blood pressure (BP). Most wearable medical devices include sensors, an analog front end, a digital baseband and signal processing unit, a battery, a reference oscillator, and an RF transceiver. To ease the burden on the wearer, the form factor should be tiny. The volume should be less than 1 cm³, and the weight should be lighter than 100g [2.153]. As for the common power sources of sensors, the small form factor also imposes tight power budgets, as shown in Fig. 2.7.

In order to solve the major energy-limited constraint, a 0.5V to 1.0V 16-bit biomedical signal processing platform in [2.154] can achieve 10.2× and 11.5× energy reduction when running complete EEG and EKG applications respectively. Voltage scaling and


Figure 2.7: Sensor power budgets with common power sources [2.2].

block-level power gating optimize energy efficiency under applications of varying complexity. In [2.155], an EEG acquisition SoC with an integrated feature extraction processor was presented for chronic seizure detection. It only consumed 9µJ per feature vector by reducing the rate of wireless EEG data transmission. Meanwhile, using multi-tone code division multiple access (MT-CDMA) and orthogonal frequency division multiple access (OFDM), a 0.5V dual-mode baseband transceiver [2.156] can support the coexistence of up to 8 users. This chipset can achieve 4.85Mbps with a power consumption of 5.52µW. Two successful general purpose subthreshold sensor processors [2.3, 2.157] were also presented with excellent energy efficiency of 2.6pJ and 3.5pJ per instruction respectively. Some other energy efficient techniques for WBANs were also presented in [2.158–2.160] for a frequency tracking loop, a signal component separator, and a digitally controlled oscillator.


Chapter 3 Ultra-Low Voltage Temperature Sensor and Clock Generator Design

Thermal and power management are major challenges in emerging energy-constrained applications with lifetimes of months to years. A fully integrated high-resolution, small-size, and ultra-low power temperature sensor is the key to providing vital environmental data for enhancing the efficiency of management units. On the other hand, pursuing longer operational lifetimes of portable platforms has driven integrated circuit design into the ultra-low voltage regime, where process, voltage, and temperature (PVT) variations are much more severe than in conventional super-threshold design [3.1–3.3]. In this regime, threshold voltage shifts caused by local variation exponentially exacerbate the weak $I_{ON}$-to-$I_{OFF}$ ratio. Ensuring functionality in the presence of PVT variations motivates the design of variation-aware near-/sub-threshold circuits [3.4]. Some energy-limited miniature devices are powered by energy harvested from the environment to increase their lifetime. The supply voltage generated in this way is usually no larger than 0.5V. Therefore, a temperature sensor capable of ultra-low voltage operation is essential. Moreover, a new class of package technologies, three-dimensional integrated circuits (3D-IC) [3.5, 3.6], which aims at multi-function integration, improved system speed, and reduced power consumption, makes the on-die hot-spot problem even worse because of increasing power density and unbalanced thermal stress distribution. Temperature variations over time induced by those stacking structures in 3D-IC require a fast and area-efficient temperature


sensor to enable real-time multiple-location hot-spot detection.

With the evolution of CMOS process technology, the number of transistors in a digital core doubles about every two years. The increases in transistor density and operating frequency have shortened battery life. For some applications such as wireless body area network (WBAN) sensors, the critical consideration is lifetime instead of operating frequency. The WBAN system provides body signal collection and reliable physical monitoring. It has many wireless sensor nodes (WSNs) attached on or implanted inside the human body. How to perform an ultra-low voltage (ULV) design and simultaneously conform to the performance and reliability requirements is an important issue. Despite the degradation in speed and the increased susceptibility to parameter variations, power dissipation can be reduced by operating digital circuits at scaled supply voltages. The operating voltage is scaled down to the near-threshold (e.g. 0.5V) or sub-threshold (e.g. 0.2V) region depending on the power and speed requirements of the target systems.

The dynamic-voltage-and-frequency-scaling (DVFS) technique is widely used to save power. Besides, advances in ULV circuit design have demonstrated capabilities to reduce power consumption. The mix of DVFS and ULV design techniques has great potential for ultra-low power demands. In a DVFS system, clock generation and transmission are realized by the clock generator and the clock tree. The main potential problems in the clock system are clock jitter and skew. Jitter comes from the clock generator, and skew comes from the clock tree. They may cause functional errors in digital circuits and become more serious in the ULV region because of environmental variations. The environmental variations include process, voltage, and temperature (PVT); they should be considered carefully when designing ULV clock generators.

3.1 Ultra-Low Voltage Process-Invariant Frequency-Domain Smart Temperature Sensor Design

Thermistors and platinum resistors are the two most popular conventional temperature sensors with high temperature detection accuracy. However, they need additional readout circuitry to produce temperature readings. To overcome this, analog-to-digital


converters (ADCs) were integrated into the so-called smart temperature sensors [3.7, 3.8] for easily accessible results in digital format. Most high-accuracy and high-resolution temperature sensors are based on the temperature characteristics of parasitic bipolar transistors. The inaccuracy of the state-of-the-art smart voltage-domain temperature sensors was ±0.1°C (3σ) with resolutions of 25mK [3.9] and 10mK [3.10]. Their digital output resolution can be no less than 0.025°C. Those were achieved by using dynamic element matching, a combination of correlated double-sampling and system-level chopping for offset cancellation, precision mismatch-elimination layout, and individual trimming at room temperature after packaging. In [3.11], an energy-efficient "zoom-ADC" architecture was presented to maintain the resolution and accuracy of ∆Σ-ADCs. An inaccuracy of 0.2°C (3σ) with a resolution of 15mK at a conversion rate of 10 samples/s was achieved. However, it is hard to operate these analog voltage-domain temperature sensors in the ultra-low voltage regime.

Recently, a time-to-digital-converter-based (TDC-based) CMOS smart temperature sensor [3.12] without a voltage/current ADC or bandgap reference was presented. The time-domain sensor utilized a temperature-dependent delay line to generate a pulse with a width proportional to the test temperature. Then, a cyclic TDC was implemented to convert the pulse into a corresponding digital code. Later, a version [3.13] that improved the slow conversion rate was presented with curvature compensation to achieve better accuracy than other time-domain sensors. With two-point calibration, it realized a -0.4°C to +0.6°C inaccuracy (3σ) over the 0°C to 90°C range. Furthermore, process variation becomes a major challenge as technology scales down aggressively. To remove the effect of process variation and reduce the high-volume production cost of two-point calibration, a dual-DLL-based time-domain temperature sensor was presented in [3.14]. Initially, one DLL was in a closed loop while the other one was in an open loop to perform the calibration mode of the sensor. It provided the required process corner data for the measurement mode to remove the effect of process variation. The use of DLLs yielded a high measurement bandwidth of 5k samples/s at 7b resolution. However, hundreds of inverters were required in these time-domain sensors to obtain enough pulse delay for sufficient temperature resolution.


In this section, an on-chip 0.4V area-efficient frequency-domain smart temperature sensor with enhanced process variation immunity is developed in TSMC 65nm general purpose CMOS technology. The rest of this section is organized as follows. Two related state-of-the-art temperature sensors are discussed in Sec. 3.1.1. In Sec. 3.1.2, a frequency-domain temperature sensor for ultra-low voltage operation is proposed. The process variation immunity enhancement of the proposed smart temperature sensor is described in Sec. 3.1.3. Sec. 3.1.4 provides the proposed 0.4V frequency-domain temperature sensor test chips and silicon measurement results. The summary is given in Sec. 3.1.5.

3.1.1 Previous Work

Figure 3.1: (a) Temperature-to-propagation-delay-difference generator. (b) Temperature-to-frequency-difference generator.

Figure 3.2: The linearity of temperature sensitive delay line (TSDL) in super-/sub-threshold region.


A temperature-to-propagation-delay-difference generator [3.12] was designed to produce an output pulse with a width linearly proportional to the measured temperature. As shown in Fig. 3.1(a), the START signal went through two different delay lines. One was temperature sensitive, and the other was temperature insensitive. The difference in propagation delay between those two delay lines, $T_{d1} - T_{d2}$, was generated by the XOR gate to form the temperature-dependent output pulse width. Note that the second delay line with low thermal sensitivity was inserted to avoid a large DC offset. However, the characteristics of the temperature sensitive delay line (TSDL) become very different as the supply voltage scales down. There are three operation regions of the MOSFETs: the super-, near-, and sub-threshold regions. The corresponding current equations are listed as follows.

Super-threshold region ($V_{GS} \gg V_{th}$):

$$I_{D,sp} = \frac{1}{2}\mu^{*} C_{OX}\frac{W}{L}\left(V_{GS}-V_{th}\right)^{2}\left(1+\lambda V_{DS}\right). \tag{3.1}$$

Near-threshold region ($V_{GS} \sim V_{th}$):

$$I_{D,near} = \mu^{*} C_{OX}\frac{W}{L}\,V_{DS}\left(V_{GS}-V_{th}-\frac{1}{2}V_{DS}\right). \tag{3.2}$$

Sub-threshold region ($V_{GS} < V_{th}$):

$$I_{D,sb} = \mu^{*} C_{OX}\frac{W}{L}\left(m-1\right)U_T^{2}\exp\left(\frac{V_{GS}-V_{th}}{mU_T}\right), \tag{3.3}$$

where $V_{th}$ denotes the threshold voltage and $\mu^{*}$ denotes the effective channel mobility. The thermal voltage is represented by $U_T$. These three parameters are temperature related.

Considering the transistor figure of merit for temperature sensing, the temperature coefficient of current (TCC) [3.15] was used. For a long channel transistor, the TCC in the super-threshold region of operation based on (3.1) is given by

$$TCC_{sp} = \frac{1}{I_{D,sp}}\left(\frac{dI_{D,sp}}{dT}\right) = \frac{1}{\mu^{*}}\frac{d\mu^{*}}{dT} - \frac{2}{V_{GS}-V_{th}}\frac{dV_{th}}{dT}. \tag{3.4}$$

The relative change of $TCC_{sp}$ is a negative few thousandths per degree because the


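(assuming $V_{DS}$ is much larger than $U_T$) is given by

$$TCC_{sb} = \frac{1}{I_{D,sb}}\left(\frac{dI_{D,sb}}{dT}\right) = \frac{1}{\mu^{*}}\frac{d\mu^{*}}{dT} + \frac{2}{T} - \frac{1}{mU_T}\left[\frac{dV_{th}}{dT} + \frac{V_{GS}-V_{th}}{T}\right]. \tag{3.5}$$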

The relative change of $TCC_{sb}$ is now positive because the negative threshold voltage sensitivity dominates in the sub-threshold region due to the exponential dependence upon it. As the transistor goes deeper into weak inversion, $TCC_{sb}$ reaches 6% per degree and more. Based on (3.4) and (3.5), the relationship of the TSDL propagation delay versus temperature in the super-/sub-threshold region is shown in Fig. 3.2. The TSDL propagation delay in the super-threshold region increases with temperature whereas that in the sub-threshold region decreases with temperature. However, the linearity of the TSDL propagation delay in the sub-threshold region is much worse, as shown in Fig. 3.2. Therefore, the characteristics of the TSDL in the sub-threshold region are not suitable for ultra-low voltage temperature measurement.
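As a rough numerical illustration of (3.4) and (3.5), the following Python sketch evaluates both coefficients. It assumes a power-law mobility model in which the mobility scales as $T^a$, so that $(1/\mu^{*})\,d\mu^{*}/dT = a/T$, and a constant $dV_{th}/dT$. The function names and all bias and device values are hypothetical choices for illustration and are not taken from [3.12] or [3.15].

```python
# Illustrative evaluation of the TCC expressions (3.4) and (3.5).
# Assumptions (not from the source): mobility ~ T**a, constant dVth/dT,
# and the example bias/device values used in the __main__ section.

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, V/K


def tcc_super(temp_k, vgs, vth, a, dvth_dt):
    """Eq. (3.4): TCC in the super-threshold region, in 1/K."""
    return a / temp_k - 2.0 / (vgs - vth) * dvth_dt


def tcc_sub(temp_k, vgs, vth, a, dvth_dt, m):
    """Eq. (3.5): TCC in the sub-threshold region, in 1/K."""
    u_t = K_B_OVER_Q * temp_k  # thermal voltage U_T
    bracket = dvth_dt + (vgs - vth) / temp_k
    return a / temp_k + 2.0 / temp_k - bracket / (m * u_t)


if __name__ == "__main__":
    # Hypothetical device: Vth = 0.4 V, a = -1.5, dVth/dT = -1 mV/K, m = 1.5.
    sp = tcc_super(300.0, vgs=1.0, vth=0.4, a=-1.5, dvth_dt=-1e-3)
    sb = tcc_sub(300.0, vgs=0.2, vth=0.4, a=-1.5, dvth_dt=-1e-3, m=1.5)
    print(f"TCC super-threshold: {sp:+.4%} per K")
    print(f"TCC sub-threshold:   {sb:+.4%} per K")
```

With these assumed values the super-threshold coefficient comes out negative and small, while the sub-threshold coefficient is positive and roughly an order of magnitude larger, which is the sign change the text describes.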

On the other hand, the temperature insensitive delay line (TIDL) in [3.12] is also hard to implement when the supply voltage is lowered to the near-/sub-threshold region. The design principle of the TIDL was setting $\partial I_D/\partial T = 0$ to yield a thermally independent conduction current. The first challenge is that the conduction current equation in the super-threshold region is very different from that in the sub-threshold region, especially the power of the $V_{th}$ term. The second one is that the relative change of TCC in the sub-threshold region is several positive hundredths per degree while the relative change of TCC in the super-threshold region is a negative few thousandths per degree. The third one is that the conduction current equation of the sub-threshold region shown in (3.3) is affected by the thermal voltage to the power of 2, $U_T^2$.

In [3.16], a temperature-to-frequency-difference generator was designed with the temperature sensitive ring oscillator (TSRO) as the clock source for up-counting and the temperature insensitive ring oscillator (TIRO) as the clock source for down-counting. With the same counting period, the output of the up-down counter was equal to the frequency difference of the two oscillators, $f_{o1} - f_{o2}$, as shown in Fig. 3.1(b). The counter output, $f_{o1} - f_{o2}$, was designed to be linearly proportional to the measured


366k samples/sec with only 400µW power consumption. It adopted a modified TIRO to solve the voltage headroom problem. However, the implementation of the TIRO was still based on setting $\partial I_{D,sp}/\partial T = 0$ to acquire the minimum thermal sensitivity. Adopting the TIRO in the ultra-low voltage region encounters the same difficulty as the TIDL in [3.12].

3.1.2 Subthreshold Frequency-Domain Temperature Sensor Design

The previous super-threshold temperature sensors in Sec. 3.1.1, which rely on a propagation-delay or frequency difference proportional to temperature, are both no longer suitable for ultra-low voltage temperature measurement. This is because the sub-threshold device conduction current now changes exponentially based on (3.3). Also, the relative change of TCC is now a positive few hundredths per degree in the weak inversion region. It becomes more sensitive as the transistor goes deeper into weak inversion.

Figure 3.3: The proposed ultra-low voltage frequency-domain temperature sensor.

Figure 3.4: Timing diagram of the proposed fixed pulse width generator.


temperature measurement. It is composed of a sub-threshold temperature sensitive ring oscillator (SB-TSRO), a fixed pulse width generator, a 2-input AND gate, and an S-bit counter. The proposed sensor is designed so that the frequency ratio between the SB-TSRO and the clock source, CLK, of the fixed pulse width generator is proportional to the test temperature. Thus, the proposed temperature sensor can be regarded as a temperature-to-frequency-ratio generator. An N-bit counter and a D flip-flop construct the fixed pulse width generator. The CLK for the N-bit counter is created from the divided system clock, and its frequency equals $f_{o1}$. Using the most significant bit of the N-bit counter, $C_{msb}$, to reset the D flip-flop produces the desired pulse width without a comparator. Once START is asserted, enabling CLK to trigger the N-bit counter, $C_{msb}$ becomes 1 after $2^{N-1}$ positive edges of CLK. It then resets the output of the D flip-flop, Q, and the N-bit counter. The desired pulse width is generated from the D flip-flop output, Q. The fixed pulse width period equals $2^{N-1}/f_{o1}$. The timing diagram of the proposed fixed pulse width generator is shown in Fig. 3.4. Note that the difference between the D flip-flop delay for Q changing from 0 to 1, $T_{d1}$, and from 1 to 0, $T_{d2}$, is negligible since the pulse width, $W$, is long enough. Also, the fixed pulse width can remove some of the fast voltage fluctuations when the period of the voltage variation is much shorter than the fixed pulse width period. Moreover, the SB-TSRO is designed to generate a frequency, $f_{o2}$, linearly proportional to the measured temperature. Using the 2-input AND gate, the clock output of the SB-TSRO can only trigger the S-bit counter within the pulse width period, $W$. Therefore, the digital output of the S-bit counter is equal to $2^{N-1} f_{o2}/f_{o1}$.
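The counting relationship described above can be sketched behaviorally in Python. This is only an idealized model under stated assumptions: edges are counted with integer arithmetic, flip-flop delays are ignored, and the S-bit counter is assumed to saturate at full scale (the text does not say whether the real counter saturates or wraps). The function names and example frequencies are hypothetical.

```python
# Behavioral sketch of the temperature-to-frequency-ratio readout.
# Assumptions (not from the source): ideal edge counting, no flip-flop
# delay, and a counter that saturates instead of wrapping on overflow.


def window_clk_cycles(n_bits: int) -> int:
    """CLK edges until the MSB of the N-bit counter becomes 1."""
    return 2 ** (n_bits - 1)


def sensor_code(f_o2_hz: int, f_o1_hz: int, n_bits: int, s_bits: int) -> int:
    """Count SB-TSRO edges (f_o2) inside a window of 2**(N-1) / f_o1 seconds."""
    if f_o1_hz <= 0 or f_o2_hz < 0:
        raise ValueError("frequencies must be positive")
    # Integer form of floor(2**(N-1) * f_o2 / f_o1) to avoid float rounding.
    edges = (window_clk_cycles(n_bits) * f_o2_hz) // f_o1_hz
    full_scale = 2 ** s_bits - 1
    return min(edges, full_scale)  # assumed saturating behavior


if __name__ == "__main__":
    # Hypothetical numbers: 1 MHz CLK, 2.5 MHz SB-TSRO, N = 4, S = 11.
    print(sensor_code(2_500_000, 1_000_000, n_bits=4, s_bits=11))
```

Because the output depends only on the ratio $f_{o2}/f_{o1}$ and on N, a larger N lengthens the window and raises the resolution at the cost of a lower conversion rate.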

3.1.2.1 Design Principles

One of the key components of the proposed sensor is the sub-threshold temperature sensitive ring oscillator (SB-TSRO). It should produce an output clock with a frequency as linearly proportional to the measured temperature as possible. The frequency of the SB-TSRO constructed by the inverters is proportional to the conduction current since $f = I_D/(V_{DD} \times C_{eq})$.


Note that the supply voltage, $V_{DD}$, and the equivalent capacitance of an inverter, $C_{eq}$, are assumed to be temperature independent. The inversion layer effective mobility depends on temperature according to [3.17]

$$\mu^{*} = \mu_0\left(\frac{T}{T_0}\right)^{a}, \tag{3.7}$$

where $a$ is typically between -1 and -2. Also, the thermal voltage, $U_T$, is equal to

$$U_T = \frac{k_B T}{q}. \tag{3.8}$$

By substituting (3.7) and (3.8) into (3.3), the equation becomes

$$I_{D,sb} = \mu_0 C_{OX}\frac{W}{L}\left(m-1\right)\left(\frac{T}{T_0}\right)^{a}\left(\frac{k_B T}{q}\right)^{2}\exp\left\{\frac{q\left[V_{GS}-V_{th}(T)\right]}{m k_B T}\right\}. \tag{3.9}$$

Using the Taylor series expansion of the exponential function, the equation becomes

$$I_{D,sb} \cong \mu_0 C_{OX}\frac{W}{L}\left(m-1\right)\left(\frac{T}{T_0}\right)^{a}\left(\frac{k_B T}{q}\right)^{2}\left\{1+\frac{q\left[V_{GS}-V_{th}(T)\right]}{m k_B T}\right\}. \tag{3.10}$$

After simplification,

$$I_{D,sb} \cong X_A T^{2+a}\left\{1+\frac{q\left[V_{GS}-V_{th}(T)\right]}{m k_B T}\right\} \approx X_A T^{2+a}\left\{\frac{q\left[V_{GS}-V_{th}(T)\right]}{m k_B T}\right\}, \tag{3.11}$$

where $X_A = \mu_0 C_{OX}\frac{W}{L}\left(m-1\right)\frac{k_B^{2}}{q^{2}T_0^{a}}$. It is not temperature related. Note that the second term within the curly brackets is much larger than 1.

Based on [3.18], the threshold voltage, $V_{th}$, can be expressed as

$$V_{th}(T) = V_{th}(T_0) + \alpha\left(T - T_0\right), \tag{3.12}$$

where $\alpha$ is a negative coefficient. Thus, the term within the curly brackets of (3.11) is related to the threshold voltage, $V_{th}$, and the thermal voltage, $U_T$. It is proportional to temperature. It also means the frequency of the SB-TSRO is proportional to temperature based on (3.6). Note that (3.11) is proportional to $T^{1\sim2}$ since the coefficient $a$ is typically between -1 and -2. The accuracy of this temperature sensor is slightly degraded because the SB-TSRO is not strictly linear.

In order to ensure proposed SB-TSRO operates in sub-threshold region, the design principle of the proposed SB-TSRO device threshold voltage is


Figure 3.5: Inverter used in sub-threshold temperature sensitive ring oscillator.

where the supply voltage, $V_{DD}$, is equal to $V_{GS}$. $T_{MAX}$ represents the maximum temperature of the operation range of the sensor. The inverter with enable function used in the proposed SB-TSRO is shown in Fig. 3.5(a). The threshold voltage behavior can be adjusted by using different multi-threshold CMOS (MTCMOS) settings or by increasing the effective channel length. Based on (3.13), the threshold voltage of the MOSFETs within the proposed SB-TSRO at 125°C is implemented to be $V_{DD}$ for design convenience. The relationship of the SB-TSRO output clock frequency versus temperature is an approximately linear function, as shown in Fig. 3.5(b).

On the other hand, the fixed pulse width generator in Fig. 3.3 requires CLK to create a fixed, temperature insensitive pulse width. The CLK can be easily synthesized from the system clock using a simple frequency divider. The frequency of the CLK, $f_{o1}$, equals the system clock frequency divided by M. The value of M depends on the frequency generated by the SB-TSRO and the required digital output resolution of the proposed sensor. The immunity of the CLK to variations relies on the external system clock generator. Meanwhile, the temperature sensitivity of CLK is not required to be exactly zero. The sensor functions correctly only if the approximation line of CLK frequency versus temperature is not parallel to the SB-TSRO approximation line.

3.1.2.2 Simulation Results

An 11-bit frequency-domain temperature sensor is simulated at 0.4V supply voltage using TSMC 65nm CMOS technology. The SB-TSRO uses regular threshold voltage (RVT) CMOS. The device effective length of the RVT CMOS is adjusted so that its threshold voltage satisfies (3.13). The temperature digital output inaccuracy is -3.0°C to +3.0°C


Figure 3.6: The proposed frequency-domain temperature sensor under (a) process variation, and (b) voltage variation.

(without process/voltage variations) over the 0°C to 100°C temperature range after one-point calibration. The conversion rate of the proposed temperature sensor can be as fast as 50k samples/sec.

The effects of process/voltage variations on the proposed ultra-low voltage temperature sensor are shown in Fig. 3.6. The major source of voltage variation is the supply voltage bouncing caused by digital circuit switching. However, the bouncing noise is averaged since the frequency of the proposed sensor is much lower than the system clock. As a result, the effect of process variation is worse than that of voltage variation. The process-variation-induced inaccuracy of the frequency-domain temperature sensor is ±48°C, while the voltage-variation-induced inaccuracy is -15.5°C to 5.1°C.

3.1.3 Ultra-Low Voltage Frequency-Domain Temperature Sensor with Process Variation Immunity Enhancement

In order to remove the effect of process variation, the CLK provided by the system clock divided by M is replaced by a near-threshold temperature sensitive ring oscillator (Near-TSRO), as shown in Fig. 3.7. The frequency of the Near-TSRO is $f_{o3}$. The S-bit counter is still triggered by the SB-TSRO with frequency $f_{o2}$. Hence, the output pulse width of the fixed pulse width generator becomes $2^{N-1}/f_{o3}$. The corresponding digital output of S-bit


Figure 3.7: Block diagram of the proposed ultra-low voltage frequency-domain temperature sensor with process variation immunity enhancement.

3.1.3.1 Design Principles

There are two temperature sensitive ring oscillators (TSROs) in the modified frequency-domain temperature sensor for process variation immunity enhancement. One is operated in the sub-threshold region, named SB-TSRO, and its frequency is proportional to the conduction current, $I_{D,sb}$. The other one is operated in the near-threshold region, named Near-TSRO, and its frequency is proportional to the conduction current, $I_{D,near}$, based on $f = I_D/(V_{DD} \times C_{eq})$:

$$f_{Near\text{-}TSRO} \propto I_{D,near}. \tag{3.14}$$

Based on (3.6) and (3.14), the digital output of the S-bit counter can be represented by

$$2^{N-1} f_{o2}/f_{o3} \propto 2^{N-1} I_{D,sb}/I_{D,near}. \tag{3.15}$$

Considering (3.2) and (3.3), $I_{D,sb}/I_{D,near}$ becomes

$$\frac{I_{D,sb}}{I_{D,near}} = \frac{\left(m-1\right)U_T^{2}\exp\left(\frac{V_{GS}-V_{th2}}{mU_T}\right)}{V_{DS}\left(V_{GS}-V_{th3}-\frac{1}{2}V_{DS}\right)}, \tag{3.16}$$

where $V_{th2}$ is the device threshold voltage of the SB-TSRO, and $V_{th3}$ is the device threshold voltage of the Near-TSRO. Note that the $\mu^{*}C_{OX}\frac{W}{L}$ term is cancelled. Given $V_{GS} = V_{DS} = V_{DD}$, the above equation can be simplified as

$$\frac{I_{D,sb}}{I_{D,near}} = \frac{\left(m-1\right)\left(\frac{k_B T}{q}\right)^{2}\exp\left\{\frac{q\left[V_{DD}-V_{th2}(T)\right]}{m k_B T}\right\}}{V_{DD}\left[\frac{1}{2}V_{DD}-V_{th3}(T)\right]}, \tag{3.17}$$


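where $U_T = k_B T/q$. Using the Taylor series expansion of the exponential function, the equation becomes

$$\begin{aligned}\frac{I_{D,sb}}{I_{D,near}} &= \frac{\left(m-1\right)\left(\frac{k_B T}{q}\right)^{2}\left\{1+\frac{q\left[V_{DD}-V_{th2}(T)\right]}{m k_B T}\right\}}{V_{DD}\left[\frac{1}{2}V_{DD}-V_{th3}(T)\right]} \\ &\approx \frac{\left(m-1\right)\left(\frac{k_B T}{q}\right)^{2}\left\{\frac{q\left[V_{DD}-V_{th2}(T)\right]}{m k_B T}\right\}}{V_{DD}\left[\frac{1}{2}V_{DD}-V_{th3}(T)\right]}\end{aligned} \tag{3.18}$$

Note that the second term within the curly brackets of the numerator is much larger than 1.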
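The cancellation of the common $\mu^{*}C_{OX}W/L$ factor in (3.16) can be checked numerically with a short Python sketch. It evaluates the current models (3.2) and (3.3) directly and assumes, as the derivation does, that both oscillators share the same prefactor (called `beta` here). The function names and every numeric value are hypothetical and serve only to illustrate the ratio structure.

```python
import math

# Numerical check that the current ratio in (3.16)/(3.17) does not depend
# on the shared prefactor beta = mu* * C_OX * W / L.
# Assumptions (not from the source): both devices share beta, and all
# example voltages and the slope factor m are illustrative.

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, V/K


def i_sub(beta, m, temp_k, vgs, vth):
    """Sub-threshold current, Eq. (3.3)."""
    u_t = K_B_OVER_Q * temp_k
    return beta * (m - 1.0) * u_t ** 2 * math.exp((vgs - vth) / (m * u_t))


def i_near(beta, vgs, vds, vth):
    """Near-threshold current, Eq. (3.2)."""
    return beta * vds * (vgs - vth - 0.5 * vds)


def current_ratio(beta, m, temp_k, vdd, vth2, vth3):
    """I_D,sb / I_D,near with V_GS = V_DS = V_DD, as in Eq. (3.17)."""
    return i_sub(beta, m, temp_k, vdd, vth2) / i_near(beta, vdd, vdd, vth3)


if __name__ == "__main__":
    args = dict(m=1.5, temp_k=300.0, vdd=0.4, vth2=0.45, vth3=0.15)
    slow = current_ratio(beta=1e-4, **args)  # hypothetical slow corner
    fast = current_ratio(beta=3e-4, **args)  # hypothetical fast corner
    print(math.isclose(slow, fast, rel_tol=1e-12))
```

Only the prefactor cancels in this ratio; process-induced shifts in $V_{th2}$ and $V_{th3}$ still change the result, so the sketch illustrates a partial immunity rather than a complete one.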

The numerator in (3.18) is proportional to temperature when the supply voltage equals the SB-TSRO threshold voltage ($V_{DD} = V_{th2}$). Meanwhile, the denominator of (3.18) is approximately proportional to T. Therefore, the output of the proposed temperature sensor with enhanced process variation immunity is approximately proportional to T. However, the device threshold voltage of the SB-TSRO decreases as temperature increases. Equation (3.18) is going to be proportional to $T^{1\sim2}$ when the term within the curly brackets of the numerator is approximately proportional to T. It is important to point out that the SB-TSRO threshold voltage is not required to be exactly equal to the supply voltage. Only if the approximation line of SB-TSRO frequency versus temperature is not parallel to the Near-TSRO approximation line will the proposed frequency-domain temperature sensor function correctly.

Equation (3.18) is only valid provided that $f_{o2}$ is generated in the sub-threshold region whereas $f_{o3}$ is generated in the near-threshold region. In order to ensure the SB-TSRO ($f_{o2}$) and the Near-TSRO ($f_{o3}$) operate in the sub-threshold and near-threshold regions, respectively, the design principles of the device threshold voltage within the two TSROs for the proposed temperature sensor with enhanced process variation immunity are

$$V_{th2}(T) = V_{DD}, \quad T > T_{MAX} \tag{3.19}$$

$$V_{th3}(T) = \frac{1}{2}V_{DD}, \quad T < T_{MIN}, \tag{3.20}$$

where $T_{MAX}$ and $T_{MIN}$ represent the maximum and minimum temperatures of the operation range of the sensor, respectively.
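One way to read (3.19) and (3.20) together with the linear threshold model (3.12) is sketched below in Python: the SB-TSRO threshold should cross $V_{DD}$ only above $T_{MAX}$, and the Near-TSRO threshold should cross $\frac{1}{2}V_{DD}$ only below $T_{MIN}$. This interpretation, the helper names, the shared coefficient $\alpha$, the 25°C reference temperature, and the example threshold values are all assumptions made for illustration.

```python
# Sketch of a check for the design principles (3.19) and (3.20), using the
# linear threshold model (3.12): Vth(T) = Vth(T0) + alpha * (T - T0).
# Assumptions (not from the source): both devices share alpha < 0, the
# reference temperature is 25 degC, and the example values are illustrative.


def vth_linear(temp_c, vth_ref, alpha, temp_ref_c=25.0):
    """Threshold voltage at temp_c according to Eq. (3.12)."""
    return vth_ref + alpha * (temp_c - temp_ref_c)


def crossing_temperature(v_target, vth_ref, alpha, temp_ref_c=25.0):
    """Temperature at which the linear Vth(T) equals v_target."""
    return temp_ref_c + (v_target - vth_ref) / alpha


def meets_design_principles(vdd, t_min_c, t_max_c, sb_vth_ref, near_vth_ref,
                            alpha, temp_ref_c=25.0):
    """True if Vth2 = VDD occurs above T_MAX and Vth3 = VDD/2 below T_MIN."""
    if alpha >= 0:
        raise ValueError("alpha is expected to be negative, see Eq. (3.12)")
    t_sb = crossing_temperature(vdd, sb_vth_ref, alpha, temp_ref_c)
    t_near = crossing_temperature(0.5 * vdd, near_vth_ref, alpha, temp_ref_c)
    return t_sb > t_max_c and t_near < t_min_c


if __name__ == "__main__":
    # Hypothetical devices at VDD = 0.4 V over a 0 degC to 100 degC range.
    ok = meets_design_principles(vdd=0.4, t_min_c=0.0, t_max_c=100.0,
                                 sb_vth_ref=0.5, near_vth_ref=0.15,
                                 alpha=-1e-3)
    print(ok)
```

With the assumed SB-TSRO values the threshold reaches $V_{DD}$ at about 125°C, so the device stays in the sub-threshold region over the whole assumed range.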

On the other hand, the enhanced process variation immunity is achieved by the temperature-to-frequency-ratio structure. Some process parameters of ID sb are cancelled
