
A Tunable All-Digital Clock Generator for Wireless Body Area Network Applications


Academic year: 2021


(1) National Chiao Tung University, Department of Electronics Engineering and Institute of Electronics. Master's Thesis. A Tunable All-Digital Clock Generator for Wireless Body Area Network Applications. Student: Juinn-Ting Chen (陳俊廷). Advisor: Dr. Chen-Yi Lee (李鎮宜). August 2008.

(2) A Tunable All-Digital Clock Generator for Wireless Body Area Network Applications. Student: Juinn-Ting Chen (陳俊廷). Advisor: Prof. Chen-Yi Lee (李鎮宜). Master's Program, Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University. Master's Thesis. A Thesis Submitted to Department of Electronics Engineering & Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Electronics Engineering. August 2008, Hsinchu, Taiwan, Republic of China.

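(3) A Tunable All-Digital Clock Generator for Wireless Body Area Network Applications. Student: Juinn-Ting Chen. Advisor: Chen-Yi Lee. Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University.

CHINESE ABSTRACT

For wireless body area networks, which are receiving growing attention, the demands for high reliability, portability and low manufacturing cost have become some of the most important research topics in recent years. The main requirements of a wireless body area network are to detect biomedical information from the human body accurately and to transmit the signals with limited power consumption. System integration on a tiny sensor patch is a considerable challenge.

This thesis introduces clock generators for wireless body area networks and designs a tunable all-digital clock generator from the viewpoint of high reliability, low power and small area. By combining a phase- and frequency-tunable clock generator with dynamic sampling phase and frequency adjustment techniques, the overall performance loss is only 0.25 dB at a packet error rate of 1 %, while the power consumption of the analog-to-digital converter in the receiver is reduced by 46 %. To further reduce the power consumption of this clock generator, the thesis also proposes a method of designing a digitally controlled oscillator with hysteresis circuits. The resulting 5 MHz digitally controlled oscillator achieves a minimum resolution of 0.78 ps and consumes only 2.6 µW. Finally, the thesis presents an all-digital clock generator that tolerates process, voltage and temperature variations and serves as the signal source of the tunable all-digital clock generator. It consumes 343 µW, has a maximum frequency error of 0.002 %, and can be combined with a frequency control circuit to replace the conventional quartz crystal oscillator. Together, these designs reduce the power and area of the wireless sensor node by 89.8 % and 88.1 %, respectively, meeting the requirements of high reliability, portability and low manufacturing cost in wireless body area networks.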

(4) A Tunable All-Digital Clock Generator for Wireless Body Area Network Applications. Student: Juinn-Ting Chen. Advisor: Chen-Yi Lee. Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University.

ABSTRACT

For wireless body area network applications, reliability, portability and cost have been significant research topics in recent years. System reliability is a challenge because biomedical signals must be monitored accurately and without interference. For battery-limited applications, low power consumption is required whether the system is operating or in standby. The demand for small sensor tags makes system integration more difficult, especially with a commonly used quartz crystal oscillator. In this thesis, we propose an all-digital tunable clock generator for wireless body area network applications. A phase-frequency tunable clock generator is applied together with dynamic phase-frequency recovery techniques to achieve a 46 % ADC power reduction with only 0.25 dB SNR loss at PER = 1 %. To reduce the power consumption of the always-on clock generator, a hysteresis-delay-cell-based digitally controlled oscillator is introduced, which has 0.78 ps delay resolution and consumes 2.6 µW at 5 MHz. Finally, an all-digital, cell-based PVT tolerance clock generator is described for replacing the reference quartz crystal oscillator. It consumes 343 µW and achieves a 0.002 % maximum frequency offset through its frequency tuning capability. Together, the designs reduce power and area in wireless sensor nodes by 89.8 % and 88.1 %, respectively.

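(5) ACKNOWLEDGEMENTS

Since I was admitted to the graduate institute by recommendation in my senior year, I have spent two and a half years in the SI2 laboratory. During this precious time I gained a great deal of professional knowledge from the laboratory, which made it possible for me to complete this thesis. I am very grateful to my advisor, Prof. Chen-Yi Lee, whose rich knowledge and way of approaching problems benefited me greatly, and whose well-equipped research environment allowed my research to proceed smoothly. I thank my senior labmate 鍾菁哲 for solving problems in the chip design flow so that our chip could be taped out smoothly. I especially thank my senior labmate 游瑞元 for tirelessly guiding my research direction so that I could complete my master's degree. Finally, I thank the members of the oral defense committee for their guidance and valuable comments.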

(6) CONTENTS

CHAPTER 1 Introduction
  1.1 Motivation
  1.2 Organization
CHAPTER 2 Phase-Frequency Tunable Clock Generator
  2.1 System Overview
  2.2 Architecture
  2.3 Circuit Designs
    2.3.1 Phase Frequency Detector
    2.3.2 Digitally Controlled Oscillator
    2.3.3 Glitch-Free Clock Multiplexer
  2.4 Simulation Result
  2.5 Implementation and Measurement Result
  2.6 Summary
CHAPTER 3 Hysteresis-Delay-Cell-Based Digitally Controlled Oscillator
  3.1 Hysteresis Delay Cell
    3.1.1 Rabaey Architecture
    3.1.2 Dokic Architecture
    3.1.3 Sarawi Architecture
    3.1.4 Comparison
  3.2 Proposed Hysteresis Delay Cell
    3.2.1 Formulation
    3.2.2 Delay Tunable Hysteresis Delay Cell
  3.3 Proposed HDC-Based Digitally Controlled Oscillator
  3.4 Simulation Result
  3.5 Summary
CHAPTER 4 PVT Tolerance Clock Generator
  4.1 Architecture

(7)
  4.2 Circuit Designs
    4.2.1 PVT Detector
    4.2.2 Mapper
    4.2.3 Clock Oscillator
  4.3 Simulation Result
  4.4 Implementation
  4.5 Summary
CHAPTER 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Reference

(8) LIST of FIGURES

Fig. 1.1. WSN (a) power (b) area
Fig. 1.2. PER without and with PFTCG
Fig. 1.3. Power consumption without and with PFTCG
Fig. 1.4. PVT variations of ring oscillator under 90 nm process
Fig. 2.1. Block diagram of the system operation with all-digital PFTCG
Fig. 2.2. Architecture of the proposed all-digital PFTCG
Fig. 2.3. Control mechanism of the proposed all-digital PFTCG
Fig. 2.4. Schematic of PFD
Fig. 2.5. Block diagram of DCO
Fig. 2.6. Architecture of 1st tuning stage of DCO
Fig. 2.7. Proposed delay cell in 3rd tuning stage
Fig. 2.8. Schematic of 2-to-1 glitch-free clock MUX
Fig. 2.9. Simulated waveforms of the 2-to-1 glitch-free MUX
Fig. 2.10. Proposed 8-to-1 glitch-free clock MUX for DPR
Fig. 2.11. Simulated waveforms of PFTCG operation scenario
Fig. 2.12. Simulated multi-phase waveforms of PFTCG
Fig. 2.13. Area distribution of all-digital PFTCG
Fig. 2.14. Layout of the proposed PFTCG
Fig. 2.15. Micro chip photo (a) WSN (b) CPN
Fig. 2.16. Measurement result of PFTCG
Fig. 2.17. ADC power comparison
Fig. 3.1. Repeating switching through cascading inverter
Fig. 3.2. Output signals through inverter and HDC
Fig. 3.3. Transfer function of HDC
Fig. 3.4. Rabaey HDC (a) Circuits (b) Schematic
Fig. 3.5. Dokic HDC
Fig. 3.6. Sarawi HDC

(9)
Fig. 3.7. Normalization of area efficiency with standard cells and HDCs
Fig. 3.8. Normalization of energy efficiency with standard cells and HDCs
Fig. 3.9. Transition response of Sarawi HDC
Fig. 3.10. Proposed delay tunable HDC
Fig. 3.11. Delay of the proposed delay tunable HDC
Fig. 3.12. Power of the proposed delay tunable HDC
Fig. 3.13. Cascading BUF and DCV
Fig. 3.14. Architecture of the proposed HDC-based DCO
Fig. 3.15. Delay element of the 1st tuning stage
Fig. 3.16. PFTCG comparison (a) power (b) area
Fig. 4.1. Architecture of the proposed PVT tolerance clock generator
Fig. 4.2. Delay ratio of ND4M0H_L to BUFM8H
Fig. 4.3. Second order modeling curve of delay value
Fig. 4.4. Second order modeling error
Fig. 4.5. Architecture of the proposed PVT detector
Fig. 4.6. Second order partition curve of delay value
Fig. 4.7. Second order partition error
Fig. 4.8. Second order mapping curve of oscillator codeword
Fig. 4.9. Second order mapping error
Fig. 4.10. Architecture of the proposed clock oscillator
Fig. 4.11. Area distribution of PVT tolerance clock generator
Fig. 4.12. Layout of the proposed PVT tolerance clock generator
Fig. 4.13. PVT tolerance clock generator comparison (a) power (b) area
Fig. 5.1. Overall comparison (a) power (b) area

(10) LIST of TABLES

Table 2.1. Specification of PFTCG
Table 2.2. Controllable range and delay resolution of DCO in PFTCG
Table 2.3. The proposed PFTCG hardware profile
Table 3.1. Delay and power of standard cells in 90 nm technology
Table 3.2. Performance comparison with standard cells and HDCs
Table 3.3. Transistor size of proposed delay tunable HDC
Table 3.4. Comparison of cascading BUF and DCV to delay tunable HDC
Table 3.5. Controllable range and delay resolution of 5 MHz HDC-based DCO
Table 3.6. Controllable range and delay resolution of 200 MHz HDC-based DCO
Table 3.7. Performance comparison of DCO
Table 4.1. Performance comparison of clock sources

(11) CHAPTER 1 Introduction

1.1 Motivation

Ubiquitous personal healthcare inspection (uPHI) in wireless body area network (WBAN) applications especially requires high reliability, low power consumption and low cost. Several wireless sensor nodes (WSN) are placed on or in the human body to monitor biomedical signals, and central processing nodes (CPN) collect the signals transmitted by the WSNs. Power and cost are emphasized for the WSN because of long-term monitoring and portability. However, the clock generators in present systems, such as ZigBee, Bluetooth, UWB and WiBoC [1], have performance, power, area and cost problems. The sampling clock offset (SCO) degrades system performance [2]. The analog-to-digital converter (ADC) circuits double the receiver power at a 2-times sampling rate [2]. The always-on clock generator dissipates much more power than the baseband. The quartz crystal oscillator, which cannot be integrated on chip, occupies a large area, consumes considerable power and needs extra board components that increase the manufacturing cost. The area and power comparison is shown in Fig. 1.1.

(12) Fig. 1.1. WSN (a) power (b) area.

Dynamic phase recovery (DPR) [2] and dynamic frequency recovery (DFR) [2] have been proposed for ADC power reduction and performance improvement with the aid of a phase-frequency tunable clock generator (PFTCG) [2-3] for WBAN applications. To save ADC power, DPR searches for the best sampling phase in the received signal and reduces the sampling rate from the Nyquist rate to the symbol rate. DFR recovers the received data and also tunes the ADC sampling frequency offset, resulting in less-interfered acquired data [2]. Fig. 1.2 and Fig. 1.3 [4] show the packet error rate (PER) and the power comparison, respectively, when the PFTCG is used with both the DPR and DFR methods under SCO = 50 ppm. There is only 0.25 dB SNR loss at PER = 1 % and a 47.7 % ADC power reduction in a standard 90 nm CMOS process [4]. The PFTCG is therefore required in WBAN applications to maintain system performance and reduce power dissipation.

(13) [Plot: PER (log scale, 10^0 down to 10^-3) versus SNR (1 to 7 dB) for the ideal case, with PFTCG and without PFTCG; the PER = 1 % level is marked.]

Fig. 1.2. PER without and with PFTCG [4].

Fig. 1.3. Power consumption without and with PFTCG [4].

All-digital clock generators have become increasingly attractive for system integration and system-on-chip (SoC) applications [5-8]. Instead of using the passive components of voltage-controlled oscillators (VCO) in phase-locked loops (PLL), the all-digital PFTCG approach can minimize power consumption and reduce the system turnaround time. Nevertheless, the power of the always-on PFTCG should be reduced further for battery-limited devices in WBAN systems. Digitally

(14) controlled oscillator (DCO) is the main module of all-digital clock generators and accounts for over 50 % of their power dissipation [6]. State-of-the-art DCO designs still consume considerable power when the operating frequency decreases [5-11]. Thus, this thesis proposes a low-power, delay-tunable hysteresis delay element, which is the key component of a sub-10 µW, high-resolution and wide-range DCO design.

For synchronization, the PFTCG in WBAN applications needs a reference clock source. The most common clock source is the quartz crystal oscillator, which provides frequency stability regardless of process, voltage and temperature (PVT) variations. However, the quartz crystal oscillator is difficult to integrate and unsuitable for the small size, low cost and low power requirements of portable devices. Silicon micro-electro-mechanical systems (MEMS) [12] have been proposed because of their lower power consumption, but they require extra CMOS processes, wafer-level packaging technologies and a long manufacturing time. A ring-oscillator-based clock generator is proposed in [13], which makes use of a band-gap voltage regulator, temperature and process compensation circuits and a comparator. It meets the low cost demand and overcomes PVT variations, but power dissipation remains a problem because of the operational amplifier in the band-gap regulator and comparator. Moreover, PVT variation has a greater effect on stability and reliability when the process shrinks to the nanometer scale instead of the 0.25 µm process used in [13]. Fig. 1.4 shows the frequency variation of a 5 MHz standard-cell-based ring oscillator under different PVT conditions in 90 nm CMOS technology. In the worst case, the frequency varies by almost ±60 % across the PVT corners. In this thesis, we propose a low-cost, low-power and portable PVT tolerance clock generator with frequency tuning capability in a deep sub-micron CMOS process for frequency stability.

(15) [Plot: ring oscillator frequency (MHz) versus supply voltage (0.90 V to 1.10 V) for the FF, TT and SS corners over 0 °C to 125 °C.]

Fig. 1.4. PVT variations of ring oscillator under 90 nm process.

1.2 Organization

The rest of this thesis is organized as follows. First, the all-digital PFTCG is described in Chapter 2. Then, we propose a low-power delay-tunable hysteresis delay cell (HDC) for DCO design in Chapter 3. Chapter 4 presents a PVT tolerance clock generator. Finally, Chapter 5 summarizes our work and discusses future design topics.

(16) CHAPTER 2 Phase-Frequency Tunable Clock Generator

As described in Chapter 1, the PFTCG is used by DPR and DFR [2] to change the phase and frequency of the generated clock within a certain response time. Traditional PLLs, designed with analog approaches, are composed of a phase frequency detector (PFD), charge pump (CP) circuits, a loop filter (LF), a VCO and a frequency divider. In more advanced process technologies, analog PLLs face harder tradeoffs among gain, supply voltage and frequency range in VCO design. The large capacitance of the LF increases chip area, while an off-chip capacitor consumes much power. Furthermore, the serious leakage current of CP circuits in deep sub-micron technology also dominates overall power dissipation. In contrast, all-digital approaches, such as the all-digital phase-locked loop (ADPLL) [5-8], the all-digital delay-locked loop (ADDLL) [14] and the all-digital multi-phase clock generator (ADMCG) [15], offer short lock-in time, low design complexity for voltage scaling and power minimization, and easy integration in SoC applications. Therefore, the PFTCG is proposed as an all-digital scheme for

(17) power reduction and performance improvement through clock phase and frequency adjustments.

2.1 System Overview

The overall dynamic phase-frequency recovery [2] block diagram with the proposed all-digital PFTCG is shown in Fig. 2.1 [3]. The signals are transmitted through a noisy channel and down-converted at the receiver. The received signals are then sampled at the symbol period with an initial timing offset ε. After timing synchronization, which consists of packet detection and boundary detection blocks, the timing error detector (TED) starts the maximum absolute-squared-sum (MASS) search [2] on the initial preamble. The TED calculates the absolute-squared-sum for each sampling phase ε̂ provided by the PFTCG, and the PFTCG then selects the optimal sampling phase, which is the one that yields the MASS. Although the TED adjusts the sampling clock phase, the drift caused by the sampling clock frequency offset ξ still accumulates. The frequency error detector (FED) estimates the sampling clock frequency offset after the fast Fourier transform (FFT) using the least squares (LS) algorithm [16]. The estimated sampling clock frequency offset ξ̂ is also sent to the PFTCG for tuning the sampling frequency. To summarize, the ADC sampling clock is controlled by the PFTCG with the estimated sampling phase offset ε̂ and sampling frequency offset ξ̂.

The phase-selection capability of the PFTCG enables the receiver to sample incoming signals at better instants without increasing the sampling frequency, and the frequency fine-tuning capability reduces the SCO between the transmitter and

(18) receiver for better PER performance. The design specification of the PFTCG is listed in Table 2.1, including a 5 MHz reference clock source and a 5 MHz target output with 8 phases and a ±150 ppm frequency tuning range centered at 5 MHz.

Fig. 2.1. Block diagram of the system operation with all-digital PFTCG.

Table 2.1. Specification of PFTCG.
Reference Clock Source     5 MHz
Output Clock               5 MHz
Phase Number               8
Frequency Tuning Range     ±150 ppm (@ 5 MHz)
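The MASS search described in Section 2.1 can be illustrated with a small numerical sketch. It assumes a preamble that is available at eight samples per symbol, one sample per PFTCG phase. The function names `mass_search` and `make_preamble`, the cosine pulse shape and the noise level are illustrative assumptions and are not taken from the thesis.

```python
import math
import random


def mass_search(samples, num_phases=8):
    """Return (best_phase, sums): the sampling phase whose symbol-rate
    samples give the maximum absolute-squared-sum (MASS)."""
    sums = []
    for phase in range(num_phases):
        picked = samples[phase::num_phases]          # one sample per symbol
        sums.append(sum(abs(x) ** 2 for x in picked))
    best_phase = max(range(num_phases), key=lambda p: sums[p])
    return best_phase, sums


def make_preamble(num_symbols=64, num_phases=8, peak_phase=5, noise=0.05, seed=1):
    """Hypothetical test signal: +/-1 symbols shaped by a cosine pulse whose
    peak falls on `peak_phase`, plus a little Gaussian noise."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_symbols):
        symbol = rng.choice((-1.0, 1.0))
        for k in range(num_phases):
            # Pulse magnitude is largest when k == peak_phase.
            shape = math.cos(math.pi * (k - peak_phase) / num_phases)
            samples.append(symbol * shape + rng.gauss(0.0, noise))
    return samples


if __name__ == "__main__":
    best, sums = mass_search(make_preamble(peak_phase=5))
    print("selected phase:", best)
```

Once the best phase is known, the receiver only needs the samples at that phase, which is why the sampling rate can drop to the symbol rate.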

(19) 2.2 Architecture

The proposed all-digital, cell-based PFTCG architecture is shown in Fig. 2.2. The PFTCG has four major blocks: the phase frequency detector (PFD), the multi-phase digitally controlled oscillator (DCO), the PFTCG controller and the glitch-free clock multiplexer (GFCMUX). The 5 MHz reference clock (REF_CLK) is generated by the small, highly integrated circuits described in Chapter 4. In the locking loop, the PFD detects the frequency and phase difference between the reference clock (REF_CLK) and the DCO output (PHASE0). It then generates an up (UP) or down (DOWN) signal to tell the controller to adjust the DCO control code (DCO_CODE) to speed up or slow down the DCO, respectively. With the updated control code, the multi-phase DCO generates eight equally spaced clock phases (PHASE0 to PHASE7) from taps extracted along the DCO delay path. The glitch-free clock multiplexer receives the estimated sampling phase offset ε̂ from the TED and selects the optimal sampling phase from PHASE0 to PHASE7. The FED delivers the estimated sampling frequency offset ξ̂ to the PFTCG controller, which slightly tunes the sampling frequency through DCO_CODE.

Following the algorithm developed in [5], the operation of the whole all-digital PFTCG is illustrated in Fig. 2.3. After system reset, the all-digital PFTCG enters the phase and frequency tracking state. The controller sets the DCO to the middle of its delay path. The initial DCO search step is n/4, where n is the number of frequencies provided by the DCO. Whenever the PFD output changes from lead to lag, or vice versa, the search step is divided by two [5]. When a new DCO code is calculated, the

(20) present DCO and PFD control signals are first cleared and then updated with the latest DCO code. Clearing the DCO prevents the glitches that would result from updating the DCO codeword directly. Clearing the PFD keeps the coarse-tuning loop from diverging in frequency and phase. When the search step is reduced to one, the frequency of the DCO output clock is acquired [5]. The DCO control code is then averaged over the following cycles to track the DCO output clock frequency. Then the lock signal (LOCK) is asserted and the DCO codeword locks the output clock frequency to the desired 5 MHz. Afterwards, the phase selection state switches among the phases and searches for the optimal sampling phase with the aid of the TED. Finally, the FED sends the estimated clock frequency offset ξ̂ to the PFTCG, resulting in less-interfered data before system signal processing.

Fig. 2.2. Architecture of the proposed all-digital PFTCG.
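The adaptive search step of the tracking state can be sketched as a binary-search-like loop. This is a simplified behavioral model and not the controller of [5]: it assumes the DCO period is monotonic in the code, replaces the PFD with a comparison against a hypothetical `target_code` (the code whose period would match REF_CLK), and ignores the clearing and averaging steps. The function name `acquire_code` is illustrative.

```python
def acquire_code(target_code, num_codes=2 ** 16):
    """Model of frequency acquisition with an adaptive search step.

    Returns (final_code, iterations). `target_code` stands in for the
    unknown DCO code at which the PFD would see no lead or lag.
    """
    code = num_codes // 2          # start at the middle of the delay path
    step = num_codes // 4          # initial search step n/4
    last_direction = 0
    iterations = 0
    while code != target_code:
        # Modelled PFD decision: +1 asks for a larger code, -1 for a smaller one.
        direction = 1 if code < target_code else -1
        if last_direction != 0 and direction != last_direction:
            step = max(step // 2, 1)   # polarity changed: halve the step
        # Apply the step and keep the code inside the valid range.
        code = min(max(code + direction * step, 0), num_codes - 1)
        last_direction = direction
        iterations += 1
        if step == 1:
            break                      # step reduced to one: frequency acquired
    return code, iterations


if __name__ == "__main__":
    print(acquire_code(40000))
```

In this model each step size is used for at most two moves before the polarity flips, so the number of updates grows with the number of code bits rather than with the number of codes.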

(21) Fig. 2.3. Control mechanism of the proposed all-digital PFTCG.

2.3 Circuit Designs

2.3.1 Phase Frequency Detector

The PFD design follows the circuit topology proposed in [5] and is built from the standard cell library; its block diagram is shown in Fig. 2.4. When the feedback clock (PHASE0) generated by the DCO leads the reference clock (REF_CLK), the signal QD generates a high pulse until REF_CLK arrives at the D flip-flop (DFF) and triggers QU. The generated signal QU first goes back to the reset branch of the DFF and then clears QU and QD. At the same time, OUTU produces a low pulse and OUTD remains high. Finally, the flags UP and DOWN will be triggered by these

(22) signals and sent to the PFTCG controller to slow down the DCO. On the other hand, when PHASE0 lags REF_CLK, DOWN becomes high and UP remains low.

The dead zone is a well-known problem in PFDs, caused by the limited response time of transistors. When the pulse width of QU or QD is not long enough to turn on the following circuits, the characteristic of the PFD becomes discontinuous. To minimize the dead zone, a digital pulse amplifier [5] is used, as shown in Fig. 2.4. It uses cascaded two-input AND gates to enlarge the pulse width of OUTU and OUTD. Another method to eliminate the dead zone is to insert a delay buffer in the feedback path of the reset branch. The increased response time of the DFF generates a pulse wide enough to minimize the dead zone of the PFD, so that the following D flip-flops can detect it. When the phase error between REF_CLK and PHASE0 is less than 5 ps, both UP and DOWN remain high, and no trigger signal is sent to the PFTCG controller.

Fig. 2.4. Schematic of PFD.
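The decision the controller receives from the PFD can be summarized with a small behavioral sketch. It abstracts away the UP/DOWN signal polarities and the flip-flop timing and only keeps the three outcomes described above; the function name `pfd_decision`, the string labels and the use of edge times in picoseconds are assumptions made for illustration.

```python
def pfd_decision(ref_edge_ps, fb_edge_ps, dead_zone_ps=5.0):
    """Classify one comparison of the REF_CLK and PHASE0 edge times (in ps).

    Returns "slow_down" when PHASE0 leads, "speed_up" when it lags, and
    "no_trigger" when the phase error is inside the dead zone.
    """
    error_ps = ref_edge_ps - fb_edge_ps   # > 0: PHASE0 arrived first (leads)
    if abs(error_ps) < dead_zone_ps:
        return "no_trigger"
    return "slow_down" if error_ps > 0 else "speed_up"


if __name__ == "__main__":
    print(pfd_decision(100.0, 90.0))    # PHASE0 leads by 10 ps
    print(pfd_decision(100.0, 103.0))   # inside the 5 ps dead zone
```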

(23) 2.3.2 Digitally Controlled Oscillator

The proposed cell-based, 8-phase 5 MHz DCO is shown in Fig. 2.5. To preserve the DCO control code resolution and a wide operating range under PVT variations, from several tens of nanoseconds down to the ten-picosecond scale, the proposed DCO is separated into three tuning stages. To provide 8 phases from the generated 5 MHz clock, the buffers in the 1st tuning stage divide the total delay into delay segments that are each a multiple of 50 ns and connect to 4 multiplexer groups. The signals OUT0 to OUT3 are extracted from the delay chain by the multiplexer groups with equal spacing. They are then fine-tuned individually by the following 2nd and 3rd stages, and the 8 phase clock signals are generated by inverters (INV) and buffers (BUF).

The proposed 1st tuning stage employs a cascading structure [17] with a 16-to-1 path selector, as shown in Fig. 2.6, to maintain delay linearity and extend the operating range easily. The 16-to-1 path selector is controlled by a 4-bit 1st tuning control code. The delay difference between two neighboring paths is determined by one 1st tuning delay cell, which consists of one buffer (BUF) and one multiplexer (MUX), as shown in Fig. 2.6. Compared with the tri-state buffer architecture [5] [10-11] for the path selector, the multiplexers can increase the controllable range. The sum of the low-to-high propagation delay (TPLH) and the high-to-low propagation delay (TPHL) of one 1st tuning delay cell is about 30.27 ns under the PVT condition (TT, 0.8 V, 25 °C). Thus, the delay resolution of the outputs (OUT0 to OUT3) is 30.27 ns when the 1st tuning control code changes by one.

(24) Fig. 2.5. Block diagram of DCO.

Fig. 2.6. Architecture of 1st tuning stage of DCO.

Fig. 2.7. Proposed delay cell in 3rd tuning stage [10].

The 2nd and 3rd tuning stages follow the 1st tuning stage to achieve better delay resolution in the proposed DCO. The circuit topology of the 2nd tuning stage follows that of the 1st stage, except that the minimum delay resolution is 1.06 ns with a 5-bit control code. To track the reference clock without false lock in the PFTCG, the controllable range of the 2nd tuning stage has to cover the delay resolution of the 1st tuning stage. The design principle of the 3rd tuning stage is the same as that of the 2nd tuning stage.
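The relationship between the three tuning stages can be checked numerically. The sketch below uses the code lengths and resolutions listed in Table 2.2 for (TT, 0.8 V, 25 °C) and the 200 ns nominal period of the 5 MHz clock. It assumes each stage is perfectly linear in its code and omits the fixed delay of the oscillator loop; the names `STAGES`, `stage_range_ns`, `tunable_delay_ns`, `stages_overlap` and `required_fine_range_ps` are illustrative.

```python
# (stage name, control code bits, delay step per code in ns), from Table 2.2.
STAGES = (
    ("1st", 4, 30.27),
    ("2nd", 5, 1.0618),
    ("3rd", 7, 0.0086),
)


def stage_range_ns(bits, step_ns):
    """Controllable range of one stage: (2^bits - 1) steps."""
    return (2 ** bits - 1) * step_ns


def tunable_delay_ns(code1, code2, code3):
    """Code-dependent part of the DCO delay under the linear assumption."""
    total = 0.0
    for (name, bits, step_ns), code in zip(STAGES, (code1, code2, code3)):
        if not 0 <= code < 2 ** bits:
            raise ValueError(f"{name} stage code out of range: {code}")
        total += code * step_ns
    return total


def stages_overlap(stages=STAGES):
    """True if each finer stage's range exceeds the step of the stage before it."""
    return all(
        stage_range_ns(bits, step_ns) > prev_step_ns
        for (_, _, prev_step_ns), (_, bits, step_ns) in zip(stages, stages[1:])
    )


def required_fine_range_ps(period_ns=200.0, tuning_ppm=150.0):
    """Period span needed to cover +/- tuning_ppm around the nominal period."""
    return 2 * period_ns * 1e3 * tuning_ppm / 1e6


if __name__ == "__main__":
    for name, bits, step_ns in STAGES:
        print(name, "stage range (ns):", round(stage_range_ns(bits, step_ns), 4))
    print("stages overlap:", stages_overlap())
    print("required 3rd-stage range (ps):", required_fine_range_ps())
```

The overlap check is the numerical form of the false-lock condition: a coarse step must always be recoverable by the next finer stage.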

(25) The least significant bit (LSB) resolution of the DCO is improved to about 8.6 ps by adding the 3rd tuning delay cell. The 3rd tuning stage applies digitally controlled varactors (DCV) [10] from the cell library to achieve the highest resolution and linearity. As shown in Fig. 2.7, an intrinsic capacitance (CI) is in parallel with a differential capacitance (∆C) at the output node (OUT). The gate capacitance of a 3-input NAND gate is controlled by the digital code (ON3). The other input pin is tied to zero. This 3-input NAND is chosen, with one input pin tied to zero, to cut off the paths of the NMOS and PMOS transistors from ground and the supply voltage, respectively [3]. The on-off behavior of ON3 then decides whether the additional loading capacitance (∆C) appears at the output node of the delay cell, which changes the charge and discharge time by the desired delay resolution [3].

For the ±150 ppm frequency tuning range in the design specification, the controllable range of the 3rd tuning stage has to be larger than 60 ps (= 2 × 200 ns × 150 ppm). In the proposed DCO design, the range of the 3rd tuning stage is at least 428.8 ps (= ±1072 ppm) under any PVT variation. The 3rd tuning stage has a 7-bit digital control code. Thus, the proposed DCO has 16 (= 4 + 5 + 7) bits for tuning.

Based on all standard cells, the delay resolution and controllable range of the three proposed tuning stages under the PVT condition (TT, 0.8 V, 25 °C) are listed in Table 2.2. It shows that the controllable range of each stage is larger than the step of the previous stage. In HSPICE simulation, the proposed DCO tolerates a maximum output frequency of 6.03 MHz (165.9 ns) and a minimum output frequency of 4.48 MHz (223.0 ns) across the PVT corners (SS, 0.72 V, 125 °C) to (FF, 1.1 V, 0 °C). As a result, the total power consumption of the proposed DCO is 90.3 µW and 53.7

(26) µW under 1.0 V and scaled 0.8 V supply voltages, respectively, in the UMC 90 nm CMOS process.

Table 2.2. Controllable range and delay resolution of DCO in PFTCG.
                       1st Tuning Stage   2nd Tuning Stage   3rd Tuning Stage
Code Length (bits)     4                  5                  7
Range (ns)             454.05             32.9158            1.0922
Resolution (ns)        30.27              1.0618             0.0086

2.3.3 Glitch-Free Clock Multiplexer

As described above, the proposed DCO generates 8 phase clock signals for DPR. One of these 8 sources is then selected using the glitch-free technique [18-19]. In general, a simple multiplexer is used to perform the selection. However, different arrival times of the switching signals at a conventional multiplexer result in glitches. The problem with the conventional multiplexer is that the control signal may change at any time with respect to the source clocks, which can chop the output clock or create a glitch at the multiplexer output [19]. Such glitches on the clock line make sampling data synchronization and DPR difficult [2].

Fig. 2.8 depicts a 2-to-1 clock switching circuit [19] that provides either of two clock signals, CLK0 and CLK1, on a clock-distribution output OUT_CLK without switching glitches. To protect the high pulse of OUT_CLK against interruption, two negative-edge-triggered DFFs are used. As shown in Fig. 2.9, the selection signal (SELECT) first switches to zero, and d0 turns to zero immediately. Then, at the following falling edge of CLK0, the upper DFF is triggered

and qb0 feeds back to d1. At the same time, OUT_CLK stops propagating CLK0. Finally, the lower DFF is triggered at the following negative edge of CLK1, and OUT_CLK switches to CLK1 without glitches. This circuit also ensures that the second positive edge of the output signal (OUT_CLK) after the selection signal changes comes from the new clock (CLK1). We can extend this 2-to-1 clock switching MUX to switch among 8 clock sources, in which case each select signal has to be fed back to all sources [19]. However, the DPR method switches the 8 phase clocks in order and chooses the optimal phase by the MASS search algorithm. We can therefore modify the extended architecture to remove redundant circuits. The proposed 8-to-1 glitch-free clock MUX does not connect all feedback signals of the DFF outputs (qb[0] ~ qb[7]) to the select signals (SELECTION[0] ~ SELECTION[7]), as shown in Fig. 2.10. The phase selection signal (P[0:2]) controlled by the TED is translated into SELECTION[0:7] by a decoder.

Fig. 2.8. Schematic of 2-to-1 glitch-free clock MUX [19].
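The switching sequence described above can be sketched as a small behavioral simulation. This is only an illustrative model under stated assumptions, not the gate-level circuit of Fig. 2.8: the clocks are sampled at discrete time steps, the two negative-edge flip-flops are cross-coupled through `d0 = SELECT AND qb1` and `d1 = (NOT SELECT) AND qb0` (a polarity chosen so that SELECT = 0 hands over to CLK1, as in the text), and all function names are hypothetical.

```python
def square_wave(n, period, offset=0):
    """Return n samples of a 50 % duty-cycle clock (1 = high)."""
    half = period // 2
    return [1 if (t - offset) % period < half else 0 for t in range(n)]


def naive_mux(clk0, clk1, select):
    """Plain multiplexer: select = 1 passes CLK0, select = 0 passes CLK1."""
    return [c0 if s else c1 for c0, c1, s in zip(clk0, clk1, select)]


def glitch_free_mux(clk0, clk1, select):
    """Two negative-edge DFFs with cross-coupled feedback gate the clocks."""
    q0, q1 = 1, 0                      # assumed start state: CLK0 selected
    prev0, prev1 = clk0[0], clk1[0]
    out = []
    for c0, c1, s in zip(clk0, clk1, select):
        d0 = s & (1 - q1)              # CLK0 may be enabled only after CLK1 is released
        d1 = (1 - s) & (1 - q0)        # CLK1 may be enabled only after CLK0 is released
        if prev0 == 1 and c0 == 0:     # falling edge of CLK0 clocks the upper DFF
            q0 = d0
        if prev1 == 1 and c1 == 0:     # falling edge of CLK1 clocks the lower DFF
            q1 = d1
        out.append((q0 & c0) | (q1 & c1))
        prev0, prev1 = c0, c1
    return out


def high_pulse_widths(signal):
    """Widths (in samples) of all high pulses that end with a low sample."""
    widths, run = [], 0
    for v in signal:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


N, PERIOD = 64, 8
clk0 = square_wave(N, PERIOD)
clk1 = square_wave(N, PERIOD, offset=3)
# SELECT drops in the middle of a CLK0 high pulse.
select = [1 if t < 18 else 0 for t in range(N)]

print("naive MUX pulse widths:      ", high_pulse_widths(naive_mux(clk0, clk1, select)))
print("glitch-free MUX pulse widths:", high_pulse_widths(glitch_free_mux(clk0, clk1, select)))
```

In this model the naive multiplexer chops the CLK0 pulse that is high when SELECT changes, whereas the gated version only ever releases or enables a clock while that clock is low, so every output pulse keeps the full half-period width.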

Fig. 2.9. Simulated waveforms of the 2-to-1 glitch-free MUX.

Fig. 2.10. Proposed 8-to-1 glitch-free clock MUX for DPR.

2.4 Simulation Result

Fig. 2.11 shows the transient response of the proposed PFTCG in an operation scenario where the reference clock (REF_CLK) is 5 MHz. When RESET is triggered, the PFTCG starts to track the frequency and phase of the reference clock. The DCO control codeword (DCO_CODE[15:0]) converges toward the desired 5 MHz until the LOCK signal is asserted. By using an adaptive search step in frequency acquisition, as described in Section 2.2, the PFTCG finishes the tracking state within 128 (= 4 × 2 × log2(2^16)) reference clock cycles in the worst case. During this tracking time, the CLEAR_DCO signal is sent frequently to switch the DCO loop to a new delay path while avoiding glitches in the loop. Then, when the TED requires the PFTCG to change the output clock to the optimal phase, the phase selection signal P[2:0] controls the glitch-free clock multiplexer to switch the output phase (PHASE[7:0]) in order. After the phase search, the sampling phase of the output clock (OUT) is fixed according to the estimated sampling phase offset ε̂. The frequency tuning control signal ξ̂ (corresponding to the signals TUNE_VALID and TUNE_CODE) from the FED is sent to fine-tune the frequency of the output clock. In Fig. 2.12, there are 8 evenly spaced clock waveforms in the phase switching state. The phase of the output clock is switched from PHASE7 to PHASE2. Adjacent waveforms are separated by about 25 ns. The phase slots occupy 10 %, 10 %, 10 %, 12 %, 14 %, 14 %, 14 % and 15 % of one period from PHASE0 to PHASE7, respectively.
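As a quick arithmetic check of the worst-case tracking time and the nominal phase spacing quoted above, the numbers can be worked out as follows. The factors 4 and 2 are taken as given in the text (their origin lies in the search procedure of Section 2.2, which is not reproduced here), and the conversion to seconds assumes the 200 ns period of the 5 MHz reference clock.

```latex
\begin{aligned}
N_{\text{lock}} &= 4 \times 2 \times \log_2\!\left(2^{16}\right) = 4 \times 2 \times 16 = 128 \ \text{reference cycles},\\
T_{\text{lock}} &= N_{\text{lock}} \times T_{\text{REF}} = 128 \times 200\ \text{ns} = 25.6\ \mu\text{s},\\
T_{\text{phase}} &= \frac{T_{\text{REF}}}{8} = \frac{200\ \text{ns}}{8} = 25\ \text{ns}.
\end{aligned}
```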

Fig. 2.11. Simulated waveforms of PFTCG operation scenario.

Fig. 2.12. Simulated multi-phase waveforms of PFTCG.

2.5 Implementation and Measurement Result

The PFTCG hardware information is summarized in Table 2.3. The PFTCG is an always-on building block that continuously consumes both dynamic and static power. Therefore, it is implemented in the UMC 90 nm standard-process high-threshold-voltage (SPHVT) CMOS technology to save static current. The frequency of the reference clock source is 5 MHz. The generated phase-frequency tunable output clock has 8 phases at 5 MHz. The delay cell resolutions of the 1st to 3rd tuning stages in the DCO are 30.27 ns, 1.06 ns and 8.6 ps, respectively. Fig. 2.13 shows the area distribution of the all-digital PFTCG. The DCO and the controller occupy almost the entire area.

Table 2.3. The proposed PFTCG hardware profile.

Technology                     Standard 90 nm SPHVT CMOS
Target Frequency               5 MHz
Phase Number                   8
1st Tuning Stage Resolution    30.27 ns
2nd Tuning Stage Resolution    1.06 ns
3rd Tuning Stage Resolution    8.6 ps
Freq. Tuning Range             ±1072 ppm (@ 5 MHz)
Core Area                      125 µm × 252 µm

The layout of the designed PFTCG is shown in Fig. 2.14. Most of the PFTCG area is taken by the DCO, whose delay cells build up the 25 ns delay of each phase. The rest of the area comes mainly from the control circuits, because the long delay line has many delay stages to control and it

requires many circuits to decode the control signals. This PFTCG is integrated in a test system [21], a dual-mode (MT-CDMA & OFDM) baseband transceiver, for system verification. The PFTCG area is 125 µm × 252 µm, and the chip microphotograph and layout of the PFTCG are shown in Fig. 2.15.

Fig. 2.13. Area distribution of all-digital PFTCG.

Fig. 2.14. Layout of the proposed PFTCG.

Fig. 2.15. Chip microphotograph (a) WSN (b) CPN [21].

Fig. 2.16 shows the measured output waveform of the PFTCG using a LeCroy LC584A. There are four phase outputs (PHASE0, PHASE2, PHASE4 and PHASE6) on Channels 1, 2, 3 and 4. The peak-to-peak phase jitter and the maximum root-mean-square (RMS) jitter at 5 MHz are 287 and 640 ps over 15032 sweeps, respectively. Using a current meter with 100 pA resolution at 1 V/25 ℃ (the supply of the I/O pads is 2.5 V), the measured power consumption is 145.8 µW and 95.4 µW at 5 MHz with 1.0 V and 0.8 V supply voltages, respectively.

Fig. 2.16. Measurement result of PFTCG.

2.6 Summary

An all-digital, cell-based clock generator is designed to tune the clock phase and frequency dynamically while the wireless communication system is in operation. The proposed all-digital PFTCG provides 8 clock phases for selection and allows the ADC to sample at a lower frequency and with a better sampling phase, resulting in lower power dissipation. The PFTCG also achieves a ±1072 ppm frequency tuning range centered at 5 MHz under any PVT variations, leading to high robustness against SCO. Compared with the case of no sampling offset, there is only 0.25 dB SNR loss at PER = 1 %, as shown in Fig. 1.2 [4]. The measured hardware consumes 145.8 µW and 95.4 µW at 5 MHz with 1.0 V and 0.8 V supplies in the standard-process 90 nm CMOS technology. The overall power comparison is shown in Fig. 2.17. There is a 46.1 % ADC power reduction, including the overhead of DPR, DFR and the PFTCG. Therefore, the proposed PFTCG enables robust, high-performance SoC design for WBAN applications.

Fig. 2.17. ADC power comparison.
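The relation between the delay span of the fine tuning stage and the ppm figures quoted in this chapter can be checked with a few lines of Python. This is a minimal sketch that assumes the first-order relation between a relative period change and a relative frequency change (|Δf/f| ≈ |ΔT/T|) and a nominal 200 ns period; the helper names are hypothetical.

```python
def delay_span_for_ppm(period_s, ppm):
    """Delay span needed to cover a +/-ppm offset around one clock period."""
    return 2 * period_s * ppm * 1e-6


def ppm_for_delay_span(period_s, span_s):
    """Symmetric +/-ppm tuning range offered by a given delay span."""
    return span_s / period_s / 2 * 1e6


PERIOD = 200e-9                                   # 5 MHz target clock
required = delay_span_for_ppm(PERIOD, 150)        # specification: +/-150 ppm
achieved = ppm_for_delay_span(PERIOD, 428.8e-12)  # worst-case 3rd-stage range

print(f"required 3rd-stage span: {required * 1e12:.1f} ps")
print(f"achieved tuning range:   +/-{achieved:.0f} ppm")
```

Under these assumptions the script reproduces the 60 ps requirement and the ±1072 ppm range stated for the 3rd tuning stage.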

CHAPTER 3
Hysteresis-Delay-Cell-Based Digitally Controlled Oscillator

To serve power-critical or battery-less systems in WBAN applications, a low-power DCO is required in the always-on clock generator. However, most state-of-the-art DCOs for ADPLL [5-8], ADDLL [14] or ADMCG [15] circuits do not consider low power and fine delay resolution together for low-frequency applications. Common techniques for low-frequency operation use either frequency dividers or long delay lines in the DCO. In the frequency divider approach, however, the delay resolution of the divided signal is degraded by the divider. Although a fine delay resolution can be achieved with a long delay line in the DCO, the area and power dissipation also increase because of the cascaded buffers in the long delay line [3]. Power consumption and delay resolution are always a trade-off in DCO design. The DCO proposed in Chapter 2 accounts for 75 % of the power consumption of the all-digital PFTCG at 1.0 V. This DCO power is dominated by the cascaded buffers (BUF) [17] and the DCV [10], as shown in Fig. 2.6 and Fig. 2.7,

respectively. Each BUF is composed of multiple inverters to achieve the 200 ns (5 MHz) delay. However, the long cascaded inverter chains waste much power in the switching transistors to obtain the desired long delay, as shown in Fig. 3.1. The poor energy and area efficiency of the cascaded inverter chains is the major drawback for low-frequency DCO design. State-of-the-art DCOs have been proposed in several architectures. As a low-power scheme, a 140 µW DCO has been proposed in [11]. When the DCO delay line selects a shorter delay path to provide a higher operating frequency, some of the remaining delay cells are not used, yet these unused delay cells still consume extra power in the DCO [11]. To save this power, the redundant delay cells are disabled by isolating them from the delay loop [11]. The DCO power is then related only to the characteristics of the working cells. For further power reduction, however, the design challenge is to decrease the power consumption of the cascaded standard cells themselves. Table 3.1 shows the delay and power consumption of UMC 90 nm SPHVT standard cells. The cell delay is given by

$$ T_D = T_{PHL} + T_{PLH} \tag{3-1} $$

where TPHL and TPLH are the high-to-low and low-to-high propagation delays of each cell, respectively. The simulation is under the PVT condition (TT, 1.0 V, 25 ℃). As the operating frequency becomes lower, the growing power of the cascaded cells takes up a higher share of the DCO power.

Fig. 3.1. Repeating switching through cascading inverter.

Table 3.1. Delay and power of standard cells in 90 nm technology.

              BUFM2H   BUFM4H   BUFM8H   DEL1M1H   DEL1M4H   DEL2M1H
Delay (ns)    0.100    0.095    0.090    0.223     0.199     0.344
Power (µW)    57.01    111.44   225.79   40.23     85.65     30.59

Furthermore, the techniques for improving the DCO resolution [5-11] [22] also affect the overall power consumption. For example, by controlling the number of enabled tri-state buffers or tri-state inverters in a bank, the driving capability modulation (DCM) technique changes the transistor driving capability on a fixed capacitive load [6]. Nevertheless, DCM has the disadvantages of poor delay resolution, nonlinearity, large power dissipation and large area. Although the digitally controlled LC oscillator provides a high tuning range and good stability [22], it is built around a parasitic capacitance tank, requires dedicated layout design, and consumes large power and area. Additionally, the DCO with current-starved delay elements [9] can change the delay value through the controlling current and achieve high resolution, but the static current source consumes much static power. In contrast with [9], the delay cell is constructed from transmission gates by the

equivalent channel resistance in the charge and discharge path [8]. It achieves high delay resolution, but the power dissipation is still unacceptable. Another technique for improving delay resolution uses different input codes to control the charge path of an or-and-inverter (OAI) cell shunted with tri-state inverters [5]. However, this approach also has nonlinear delay steps. Other techniques [10] [20] use shunt capacitor circuits to fine-tune the capacitive load and improve delay resolution and linearity. Unfortunately, DCV cells perform poorly in power consumption and area when an acceptable operating range has to be maintained. The hysteresis delay cell (HDC) and DCV were used together in [7] [11], which was the first use of an HDC in a DCO design. The HDC can replace many DCV cells and save some power, but it does not offer better power characteristics than an inverter. Thus, a new HDC is proposed in the following sections to generate, with a simple technique, a wide delay range equal to that of many inverters, instead of cascading many buffers or inverters. The proposed HDC not only overcomes the design challenge of DCO power reduction with the least area, but also achieves high delay resolution, especially in sub-100 MHz DCO designs.

3.1 Hysteresis Delay Cell

HDCs [23-25], also known as Schmitt triggers, are widely used in digital and analog circuits for waveform shaping in noisy environments. As shown in Fig. 3.2, the switching point of a CMOS inverter is fixed at the average of the high-level and low-level voltages, because the PMOS and NMOS transistors are both in the saturation region. In contrast, the output signal of an HDC is filtered by the high-level and low-level threshold voltages, denoted as V+ and V−, respectively. Because of the hysteresis phenomenon, there is an extra delay between the output of the inverter and that of the HDC. Fig. 3.3 describes the transfer function of the HDC. The Boolean logic function of the HDC in Fig. 3.3 is the same as that of an inverter. In the forward switching path, the output voltage (VOUT) remains at the high level until the input voltage (VIN) increases to V+; the output then goes to the low level. Conversely, when VIN decreases to V−, VOUT switches to the high level. The hysteresis voltage width of the HDC is defined as

$$ V_{hw} = V_{+} - V_{-} \tag{3-2} $$

The hysteresis width protects the output from cross-talk noise and supply noise on the clock and power supply, and it also increases the response time of the HDC. This insensitivity to the input can therefore provide a long delay in place of many cascaded inverters. Three common HDCs are described in the following sections: the Rabaey [23], Dokic [24] and Sarawi [25] architectures. We analyze their power consumption and compare them with the standard cells in UMC 90 nm CMOS technology.
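The transfer behavior described above can be illustrated with an idealized behavioral model. The sketch below is not a circuit simulation: it assumes an ideal inverting cell with instantaneous rail-to-rail switching, and the thresholds V+ = 2·VDD/3 and V− = VDD/3 as well as the function name are illustrative choices.

```python
def inverting_hdc(vin_samples, v_plus, v_minus, vdd=1.0):
    """Ideal inverting hysteresis cell: output falls at V+, rises again at V-."""
    out_high = vin_samples[0] < v_plus          # assumed initial state
    out = []
    for vin in vin_samples:
        if out_high and vin >= v_plus:          # forward switching point
            out_high = False
        elif not out_high and vin <= v_minus:   # reverse switching point
            out_high = True
        out.append(vdd if out_high else 0.0)
    return out


VDD = 1.0
V_PLUS, V_MINUS = 2 * VDD / 3, VDD / 3          # illustrative thresholds
ramp_up = [i / 10 for i in range(11)]           # 0.0 V ... 1.0 V
ramp = ramp_up + ramp_up[-2::-1]                # then back down to 0.0 V
vout = inverting_hdc(ramp, V_PLUS, V_MINUS, VDD)

for vin, v in zip(ramp, vout):
    print(f"VIN = {vin:.1f} V  ->  VOUT = {v:.1f} V")
print(f"hysteresis width Vhw = {V_PLUS - V_MINUS:.3f} V")
```

On the rising ramp the output stays high until VIN passes V+, and on the falling ramp it stays low until VIN drops below V−, so the two switching events are separated by the hysteresis width of (3-2).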

Fig. 3.2. Output signals through inverter and HDC.

Fig. 3.3. Transfer function of HDC.

3.1.1 Rabaey Architecture

The HDC with the Rabaey architecture is shown in Fig. 3.4 [23]. There are three inverters in this architecture. The transfer function of Rabaey's HDC differs from that in Fig. 3.3: the Boolean logic of the Rabaey architecture is the same as that of a buffer cell. Its static behavior is as follows. Initially, we assume the input voltage VIN is at the high level VDD and the output voltage is tied low. When VIN decreases to a certain voltage V−, mp3 and mn4 invert the output voltage to VDD. The output is then fed back to mp3 and mn4 to speed up the transition and produce a clean output signal [23]. The low-level switching point V− is determined by the transistors mp1, mn1 and mn2. The analysis of forward switching is similar. However, the Rabaey HDC dissipates large power because of its short-circuit current path.

Fig. 3.4. Rabaey HDC (a) Circuits (b) Schematic.

3.1.2 Dokic Architecture

The Dokic architecture of the HDC is shown in Fig. 3.5 [24]. Its transfer function is the same as in Fig. 3.3. It can be extended to NOR-type and NAND-type HDCs. When the input voltage VIN is equal to VDD, mp1 and mp2 are in the cut-off region, and mn1 and mn2 are turned on. The output voltage VOUT is therefore equal to ground, which puts mn3 in the cut-off region and mp3 in the saturation region. When VIN decreases to V−, mp1 and mp3 act as a saturated enhancement-mode inverter. Transistor mp2 turns on as well, providing a charging path from VDD to the output. Conversely, if VIN increases to V+, mn1, mn2 and mn3 are on, and there is a discharging path from the output to ground. These short-circuit current paths account for most of the power consumption in the Dokic HDC.

Fig. 3.5. Dokic HDC.

3.1.3 Sarawi Architecture

Fig. 3.6 illustrates the Sarawi HDC [25], which is built from an inverter chain internally cascaded with a footer and a header. Fig. 3.3 depicts its transfer function. The operation of this HDC can be described as follows. First, suppose the initial input voltage VIN is VDD, so that mn2 is on and mp2 is in the cut-off region, which implies that mn3 is turned off, mp3 is turned on, mn1 is on and mp1 is off. Transistor mn2 remains on and mp2 remains off until VIN decreases to a certain voltage V−, at which point the output VOUT switches from low to high. Forward switching with mp2, mn2 and mn1 behaves similarly. When a low-level voltage is applied to VIN, VOUT goes to VDD. VOUT does not switch from VDD to ground until VIN rises to the high-level threshold voltage V+ and triggers the pull-down network. Because there is no direct short-circuit current path, a longer delay and lower power consumption can be expected from the Sarawi HDC.

Fig. 3.6. Sarawi HDC.

3.1.4 Comparison

Table 3.2 compares the performance of the above HDCs with the standard cells in UMC SPHVT 90 nm CMOS technology, including BUFM2H, BUFM4H, BUFM8H, DEL1M1H, DEL1M4H and DEL2M1H. The simulation is performed at the typical corner with a 1.0 V supply voltage.

Table 3.2. Performance comparison with standard cells and HDCs.

Cell       Delay   Area     Power    Area Efficiency   Area Efficiency     Energy Efficiency   Energy Efficiency
           (ns)    (µm²)    (µW)     (ns/µm²)          Normalization (%)   (s/µJ)              Normalization (%)
BUFM2H     0.100   0.154    57.01    0.648             6.74                0.018               3.95
BUFM4H     0.095   0.307    111.44   0.308             3.20                0.009               2.02
BUFM8H     0.090   0.614    225.79   0.146             1.52                0.004               1.00
DEL1M1H    0.223   0.170    40.23    1.314             13.67               0.025               5.59
DEL1M4H    0.199   0.421    85.65    0.474             4.93                0.012               2.63
DEL2M1H    0.344   0.170    30.59    2.030             21.12               0.033               7.36
Rabaey     0.177   0.154    72.34    1.152             11.98               0.014               3.11
Dokic      0.212   0.115    31.53    1.842             19.16               0.032               7.14
Sarawi     1.384   0.144    2.25     9.612             100                 0.444               100

The cell delay is the sum of the high-to-low and low-to-high propagation delays, as defined in equation (3-1). The area efficiency, defined in (3-3), is an index of cost: it compares the delay obtained within the same area. The energy efficiency, defined in (3-4), is the inverse of the transition power. These two parameters can be regarded as figures of merit for evaluating the performance of delay cells.

$$ \text{Area Efficiency} = \frac{\text{Delay}}{\text{Area}} \tag{3-3} $$

$$ \text{Energy Efficiency} = \frac{\text{Delay}}{\text{Energy}} \tag{3-4} $$

The normalized area and energy efficiencies are shown in Fig. 3.7 and Fig. 3.8, respectively. The simulation results show that the Rabaey and Dokic HDCs have area and energy efficiencies similar to those of the standard cells, whereas the Sarawi architecture performs best in both. This implies that the Sarawi HDC can achieve the same delay with the least area and energy among the compared delay cells. We therefore re-analyze the Sarawi HDC in the following section and propose a new delay-tunable, low-power HDC for DCO resolution improvement.

[Bar chart: area efficiency normalization (%), in descending order: Sarawi, DEL2M1H, Dokic, DEL1M1H, Rabaey, BUFM2H, DEL1M4H, BUFM4H, BUFM8H.]

Fig. 3.7. Normalization of area efficiency with standard cells and HDCs.
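The two figures of merit can be recomputed directly from the delay, area and power columns of Table 3.2. The sketch below assumes that the energy in (3-4) is the product of power and cell delay, so that the energy efficiency reduces to the inverse of the power (in s/µJ); this assumption reproduces the tabulated values up to rounding. Function and variable names are hypothetical.

```python
# name: (delay in ns, area in um^2, power in uW), copied from Table 3.2
CELLS = {
    "BUFM2H":  (0.100, 0.154, 57.01),
    "BUFM4H":  (0.095, 0.307, 111.44),
    "BUFM8H":  (0.090, 0.614, 225.79),
    "DEL1M1H": (0.223, 0.170, 40.23),
    "DEL1M4H": (0.199, 0.421, 85.65),
    "DEL2M1H": (0.344, 0.170, 30.59),
    "Rabaey":  (0.177, 0.154, 72.34),
    "Dokic":   (0.212, 0.115, 31.53),
    "Sarawi":  (1.384, 0.144, 2.25),
}


def area_efficiency(delay_ns, area_um2):
    """Equation (3-3): delay per unit area, in ns/um^2."""
    return delay_ns / area_um2


def energy_efficiency(delay_ns, power_uw):
    """Equation (3-4) with energy = power * delay (assumption), in s/uJ."""
    energy = power_uw * delay_ns
    return delay_ns / energy


def normalize(values):
    """Express each value as a percentage of the best (largest) one."""
    best = max(values.values())
    return {name: 100.0 * v / best for name, v in values.items()}


area_eff = {n: area_efficiency(d, a) for n, (d, a, p) in CELLS.items()}
energy_eff = {n: energy_efficiency(d, p) for n, (d, a, p) in CELLS.items()}
area_norm = normalize(area_eff)
energy_norm = normalize(energy_eff)

for name in CELLS:
    print(f"{name:8s} area eff {area_eff[name]:6.3f} ({area_norm[name]:6.2f} %)  "
          f"energy eff {energy_eff[name]:5.3f} ({energy_norm[name]:6.2f} %)")
```

Small differences in the last digit relative to Table 3.2 are expected, because the inputs used here are the already rounded table entries.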

[Bar chart: energy efficiency normalization (%), in descending order: Sarawi, DEL2M1H, Dokic, DEL1M1H, BUFM2H, Rabaey, DEL1M4H, BUFM4H, BUFM8H.]

Fig. 3.8. Normalization of energy efficiency with standard cells and HDCs.

3.2 Proposed Hysteresis Delay Cell

3.2.1 Formulation

The Sarawi HDC has the longest delay and the best area and energy efficiency because of its wide hysteresis voltage width and the slow, creeping rise and fall times of its output. As shown in Fig. 3.6, when the input voltage of the HDC decreases from VDD to V− in the reverse switching path, the currents of the transistors mp2, mn2 and mp1 are the same [25]:

$$ I_{mp2} = I_{mn2} = I_{mp1} \tag{3-5} $$

Thus, we may rewrite (3-5) as

$$ \tfrac{1}{2}\beta_{p2}\,(V_{-} - V_{S1} - V_{tp})^{2} = \tfrac{1}{2}\beta_{n2}\,(V_{-} - V_{tn})^{2} = \tfrac{1}{2}\beta_{p1}\,(V_{S1} - V_{DD} - V_{tp})^{2} \tag{3-6} $$

where βm is the transconductance of transistor m labeled in Fig. 3.6, and Vtn and Vtp are the threshold voltages of the NMOS and PMOS transistors, respectively. Based on the left-hand equality of (3-6), we have

$$ V_{-} = \frac{V_{S1} + V_{tp} + \sqrt{\beta_{n2}/\beta_{p2}}\;V_{tn}}{1 + \sqrt{\beta_{n2}/\beta_{p2}}} \tag{3-7} $$

where VS1 is the voltage at node S1 [25]. According to the right-hand equality of (3-6), VS1 is expressed as

$$ V_{S1} = \frac{V_{-} - V_{tp} + \sqrt{\beta_{p1}/\beta_{p2}}\;(V_{DD} + V_{tp})}{1 + \sqrt{\beta_{p1}/\beta_{p2}}} \tag{3-8} $$

Substituting (3-8) into (3-7), we obtain

$$ V_{-} = \frac{R_{p}V_{DD} + 2R_{p}V_{tp} + R\,(1 + R_{p})\,V_{tn}}{R R_{p} + R_{p} + R} \tag{3-9} $$

where $R_{p} = \sqrt{\beta_{p1}/\beta_{p2}}$ and $R = \sqrt{\beta_{n2}/\beta_{p2}}$. If Vtp = −Vtn, we may rewrite (3-9) as

$$ V_{-} = \frac{R_{p}V_{DD} + (R R_{p} - 2R_{p} + R)\,V_{tn}}{R R_{p} + R_{p} + R} \tag{3-10} $$

The same analysis applies to forward switching with mp2, mn2 and mn1 [25]. When a low-level signal is applied to VIN, VOUT goes high. VOUT does not switch from high to low until VIN increases to V+. In the forward switching path, the relationship between the currents of transistors mp2, mn2 and mn1 can be written as

$$ I_{mn2} = I_{mp2} = I_{mn1} \tag{3-11} $$

$$ \tfrac{1}{2}\beta_{n2}\,(V_{+} - V_{S2} - V_{tn})^{2} = \tfrac{1}{2}\beta_{p2}\,(V_{+} - V_{DD} - V_{tp})^{2} = \tfrac{1}{2}\beta_{n1}\,(V_{S2} - V_{tn})^{2} \tag{3-12} $$

From (3-12), we obtain the forward switching point in a form similar to (3-9):

$$ V_{+} = \frac{(R_{n} + 1)V_{DD} + (R_{n} + 1)V_{tp} + 2 R R_{n} V_{tn}}{R R_{n} + R_{n} + 1} \tag{3-13} $$

where $R_{n} = \sqrt{\beta_{n1}/\beta_{n2}}$. When Vtp = −Vtn, (3-13) is expressed as

$$ V_{+} = \frac{(R_{n} + 1)V_{DD} + (2 R R_{n} - R_{n} - 1)\,V_{tn}}{R R_{n} + R_{n} + 1} \tag{3-14} $$

The switching points V+ and V− of the Sarawi HDC can be calculated from (3-10) and (3-14) with R = Rp = Rn = 1, leading to V− = VDD/3, V+ = 2·VDD/3 and Vhw = VDD/3 [25]. Substituting only R = 1 into (3-10) and (3-14), the switching points V− and V+ can be rewritten as

$$ V_{-} = \frac{V_{tn} + R_{p}\,(V_{DD} - V_{tn})}{2R_{p} + 1} \tag{3-15} $$

$$ V_{+} = \frac{V_{DD} - V_{tn} + R_{n}\,(V_{DD} + V_{tn})}{2R_{n} + 1} \tag{3-16} $$
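The switching-point expressions can be evaluated numerically to see how the hysteresis window moves with the ratio parameters. The sketch below takes R, Rp and Rn as plain inputs (as defined in the text) and uses equations (3-10) and (3-14), which assume Vtp = −Vtn; the supply and threshold values are hypothetical illustration values, not device data from the thesis.

```python
def v_minus(vdd, vtn, r, rp):
    """Reverse switching point, equation (3-10), assuming Vtp = -Vtn."""
    return (rp * vdd + (r * rp - 2 * rp + r) * vtn) / (r * rp + rp + r)


def v_plus(vdd, vtn, r, rn):
    """Forward switching point, equation (3-14), assuming Vtp = -Vtn."""
    return ((rn + 1) * vdd + (2 * r * rn - rn - 1) * vtn) / (r * rn + rn + 1)


VDD, VTN = 1.0, 0.3   # hypothetical values for illustration only

# Symmetric case R = Rp = Rn = 1: thresholds at VDD/3 and 2*VDD/3.
lo, hi = v_minus(VDD, VTN, 1, 1), v_plus(VDD, VTN, 1, 1)
print(f"R=Rp=Rn=1: V- = {lo:.4f} V, V+ = {hi:.4f} V, Vhw = {hi - lo:.4f} V")

# With R = 1, sweep Rp and Rn together to see the window change.
for ratio in (0.5, 1.0, 2.0, 4.0):
    lo, hi = v_minus(VDD, VTN, 1, ratio), v_plus(VDD, VTN, 1, ratio)
    print(f"Rp=Rn={ratio:3.1f}: V- = {lo:.4f} V, V+ = {hi:.4f} V, Vhw = {hi - lo:.4f} V")
```

For R = 1 the same numbers follow from the simplified forms (3-15) and (3-16), which is a convenient cross-check of the algebra.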

Fig. 3.9. Transition response of Sarawi HDC.

The rise time TRISE and fall time TFALL of the HDC contribute most of the delay in the hysteresis phenomenon. As shown in Fig. 3.9, the transition time of the HDC dominates the overall propagation delay. Assume that the output capacitance COUT is voltage independent. The fall time TFALL consists of three intervals. The first part, tf1, is the interval in which VOUT falls from 0.9·VDD to (VDD − Vtn) while mn2 operates in the saturation region, which raises the voltage of node S2. The model is expressed as

$$ C_{OUT}\,\frac{dV_{OUT}}{dt} = -\frac{\beta_{n2}}{2}\,(V_{DD} - V_{tn})^{2} \tag{3-17} $$

Integrating both sides, we obtain

$$ C_{OUT}\int_{0.9V_{DD}}^{V_{DD} - V_{tn}} dV_{OUT} = -\frac{\beta_{n2}}{2}\,(V_{DD} - V_{tn})^{2}\int_{0}^{t_{f1}} dt \tag{3-18} $$

Therefore, tf1 is obtained as

$$ t_{f1} = \frac{2C_{OUT}\,(V_{tn} - 0.1V_{DD})}{\beta_{n2}\,(V_{DD} - V_{tn})^{2}} \tag{3-19} $$

The second part, tf2, is the interval in which mn2 is in the linear region. In this interval, VOUT drops from (VDD − Vtn) to 0.5·VDD and then turns on mn1, which is described by

$$ C_{OUT}\,\frac{dV_{OUT}}{dt} = -\frac{\beta_{n2}}{2}\left(2(V_{DD} - V_{tn})V_{OUT} - V_{OUT}^{2}\right) \tag{3-20} $$

$$ C_{OUT}\int_{V_{DD} - V_{tn}}^{0.5V_{DD}} \frac{dV_{OUT}}{2(V_{DD} - V_{tn})V_{OUT} - V_{OUT}^{2}} = -\frac{\beta_{n2}}{2}\int_{0}^{t_{f2}} dt \tag{3-21} $$

$$ t_{f2} = \frac{C_{OUT}}{\beta_{n2}\,(V_{DD} - V_{tn})}\,\ln\frac{3V_{DD} - 4V_{tn}}{V_{DD}} \tag{3-22} $$

The last part, tf3, is the interval in which mn1 and mn2 are both in the linear region. In this discharging interval, VOUT drops from 0.5·VDD to 0.1·VDD through mn1 and mn2. We find that

$$ C_{OUT}\,\frac{dV_{OUT}}{dt} = -\frac{\beta_{N}}{2}\left(2(V_{DD} - V_{tn})V_{OUT} - V_{OUT}^{2}\right) \tag{3-23} $$

$$ C_{OUT}\int_{0.5V_{DD}}^{0.1V_{DD}} \frac{dV_{OUT}}{2(V_{DD} - V_{tn})V_{OUT} - V_{OUT}^{2}} = -\frac{\beta_{N}}{2}\int_{0}^{t_{f3}} dt \tag{3-24} $$

$$ t_{f3} = \frac{C_{OUT}}{\beta_{N}\,(V_{DD} - V_{tn})}\,\ln\frac{19V_{DD} - 20V_{tn}}{3V_{DD} - 4V_{tn}} \tag{3-25} $$

where βN is the equivalent transconductance of the series combination of mn1 and mn2:

$$ \beta_{N} = \frac{1}{\dfrac{1}{\beta_{n1}} + \dfrac{1}{\beta_{n2}}} \tag{3-26} $$

Consequently, the fall time TFALL is the sum of tf1, tf2 and tf3. From (3-19), (3-22) and (3-25), we may rewrite the expression as

$$ T_{FALL} = \frac{C_{OUT}}{\beta_{n2}\,(V_{DD} - V_{tn})}\left(\frac{2V_{tn} - 0.2V_{DD}}{V_{DD} - V_{tn}} + \ln\frac{3V_{DD} - 4V_{tn}}{V_{DD}} + \frac{\beta_{n2}}{\beta_{N}}\,\ln\frac{19V_{DD} - 20V_{tn}}{3V_{DD} - 4V_{tn}}\right) \tag{3-27} $$

By a similar analysis, the rise time TRISE is obtained as

$$ T_{RISE} = \frac{C_{OUT}}{\beta_{p2}\,(V_{DD} + V_{tp})}\left(\frac{-2V_{tp} - 0.2V_{DD}}{V_{DD} + V_{tp}} + \ln\frac{3V_{DD} + 4V_{tp}}{V_{DD}} + \frac{\beta_{p2}}{\beta_{P}}\,\ln\frac{19V_{DD} + 20V_{tp}}{3V_{DD} + 4V_{tp}}\right) \tag{3-28} $$

where βP is the equivalent transconductance of mp1 and mp2. According to (3-27) and (3-28), the rise time TRISE and fall time TFALL are inversely proportional to the transconductances βn2, βN, βp2 and βP. Thus, we can control the propagation delay of the HDC by choosing different values of βn2, βN, βp2 and βP.

3.2.2 Delay Tunable Hysteresis Delay Cell

Based on the previous analysis, we propose a seven-stage delay-tunable HDC built on the original low-power HDC architecture, as shown in Fig. 3.10. The transistor sizes in Fig. 3.10 are listed in Table 3.3. The delay stages control the fall time TFALL through the discharge transconductance of the proposed HDC. With different codewords, the proposed circuit produces different propagation delays. The simulated delay and power consumption are shown in Fig. 3.11 and Fig. 3.12, respectively. The proposed HDC achieves a 0.78 ps delay resolution, and its delay ranges from 1.643 ns to 1.742 ns with good linearity, which guarantees monotonic delay behavior as the control word increases. The delay is several hundred times the cell delay of a minimum-size inverter, and the power consumption is below 2.2 µW for each codeword.
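To make the dependence of the fall time on the discharge transconductance concrete, the three intervals (3-19), (3-22) and (3-25) can be evaluated and compared with the closed form (3-27). This is a minimal numerical sketch: the capacitance, supply, threshold and transconductance values are hypothetical and serve only to exercise the formulas (they must satisfy Vtn > 0.1·VDD and 3·VDD > 4·Vtn for the logarithms and tf1 to be positive).

```python
import math


def fall_time_parts(c_out, vdd, vtn, beta_n1, beta_n2):
    """Return (tf1, tf2, tf3) from equations (3-19), (3-22) and (3-25)."""
    beta_series = 1.0 / (1.0 / beta_n1 + 1.0 / beta_n2)        # (3-26)
    head = vdd - vtn
    tf1 = 2.0 * c_out * (vtn - 0.1 * vdd) / (beta_n2 * head ** 2)
    tf2 = c_out / (beta_n2 * head) * math.log((3 * vdd - 4 * vtn) / vdd)
    tf3 = c_out / (beta_series * head) * math.log(
        (19 * vdd - 20 * vtn) / (3 * vdd - 4 * vtn))
    return tf1, tf2, tf3


def fall_time_closed_form(c_out, vdd, vtn, beta_n1, beta_n2):
    """Total fall time from the closed-form expression (3-27)."""
    beta_series = 1.0 / (1.0 / beta_n1 + 1.0 / beta_n2)
    head = vdd - vtn
    bracket = ((2 * vtn - 0.2 * vdd) / head
               + math.log((3 * vdd - 4 * vtn) / vdd)
               + beta_n2 / beta_series
               * math.log((19 * vdd - 20 * vtn) / (3 * vdd - 4 * vtn)))
    return c_out / (beta_n2 * head) * bracket


# Hypothetical illustration values (not taken from the thesis).
C_OUT, VDD, VTN, BETA_N2 = 1e-15, 1.0, 0.3, 1e-4

for beta_n1 in (0.5e-4, 1e-4, 2e-4):
    parts = fall_time_parts(C_OUT, VDD, VTN, beta_n1, BETA_N2)
    total = fall_time_closed_form(C_OUT, VDD, VTN, beta_n1, BETA_N2)
    print(f"beta_n1 = {beta_n1:.1e}: tf1+tf2+tf3 = {sum(parts):.3e} s, "
          f"(3-27) = {total:.3e} s")
```

A weaker footer transistor (smaller beta_n1) lowers the series transconductance and lengthens tf3, which is the mechanism the delay-tunable stages exploit.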

Fig. 3.10. Proposed delay tunable HDC.

Table 3.3. Transistor size of proposed delay tunable HDC.

Transistor   W/L (µm/µm)      Transistor   W/L (µm/µm)
mp1          0.36/0.08        mr04         0.14/0.16
mp2          0.36/0.08        mr05         0.16/0.24
mp3          0.36/0.08        mr06         0.11/0.24
mn1          0.36/0.08        mr07         0.12/0.08
mn2          0.24/0.08        mr08         0.12/0.08
mn3          0.24/0.08        mr09         0.12/0.08
mr00         0.23/0.16        mr10         0.12/0.08
mr01         0.22/0.16        mr11         0.12/0.08
mr02         0.20/0.16        mr12         0.12/0.08
mr03         0.17/0.16        mr13         0.12/0.08

[Plot: delay (ns), from about 1.64 to 1.74, versus codeword, from 0 to about 120.]

Fig. 3.11. Delay of the proposed delay tunable HDC.

[Plot: power (µW), axis from 1.9 to 2.4, versus codeword, from 0 to about 120.]

Fig. 3.12. Power of the proposed delay tunable HDC.

Fig. 3.13. Cascading BUF and DCV.

Table 3.4 compares the proposed delay-tunable HDC with the commonly used cascaded BUF and DCV approach [10], which is depicted in Fig. 3.13. With a similar propagation delay and controllable range, the proposed delay-tunable HDC performs better in resolution, power and area. The finer delay resolution improves the DCO frequency tuning step and covers every desired delay value in the DCO. The 98.4 % power reduction and 92.8 % area reduction imply both dynamic and static power savings, resulting in better area and energy efficiency for the clock generator.
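The reduction figures and the LSB resolution quoted above can be recomputed from the entries of Table 3.4. The sketch below assumes that the delay-tunable HDC has 128 codes (a 7-bit control word, inferred from the codeword axis of Fig. 3.11), so that the controllable range spans 127 steps; dictionary and function names are hypothetical.

```python
# Values from Table 3.4.
baseline = {"range_ps": 67.7, "resolution_ps": 2.26, "power_uw": 133.0, "area_um2": 6.048}
proposed = {"range_ps": 99.4, "resolution_ps": 0.78, "power_uw": 2.2, "area_um2": 0.437}


def reduction_percent(old, new):
    """Relative saving of `new` with respect to `old`, in percent."""
    return (1.0 - new / old) * 100.0


power_saving = reduction_percent(baseline["power_uw"], proposed["power_uw"])
area_saving = reduction_percent(baseline["area_um2"], proposed["area_um2"])

CONTROL_BITS = 7                                   # assumption: 128 codes per HDC
lsb_ps = proposed["range_ps"] / (2 ** CONTROL_BITS - 1)

print(f"power reduction: {power_saving:.2f} %")
print(f"area reduction:  {area_saving:.2f} %")
print(f"LSB step from range / (2^7 - 1): {lsb_ps:.3f} ps")
```

The results agree with the quoted 98.4 %, 92.8 % and 0.78 ps to within the rounding of the table entries.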

Table 3.4. Comparison of cascading BUF and DCV to delay tunable HDC.

                              Delay   Controllable   Resolution   Power   Area
                              (ns)    Range (ps)     (ps)         (µW)    (µm²)
Cascading BUF & DCV           1.86    67.7           2.26         133     6.048
Proposed Delay Tunable HDC    1.64    99.4           0.78         2.2     0.437

3.3 Proposed HDC-Based Digitally Controlled Oscillator

Using the proposed delay-tunable HDC, we design a 5 MHz low-power all-HDC-based DCO, as shown in Fig. 3.14. The proposed DCO is partitioned into two tuning stages. The 1st tuning stage, composed of HDC1 cells, extends the controllable range of the DCO. The 2nd tuning stage, built from cascaded HDC2 cells, improves the delay resolution. Because the target frequency is 5 MHz, the total delay of the HDCs in the 2nd stage must be less than 200 ns under any PVT condition. Furthermore, the controllable delay range of the 2nd tuning stage must cover the delay resolution of the 1st tuning stage to avoid false lock in PFTCG, ADPLL [5-8] or ADDLL [14] applications. The delay resolution of the 1st tuning stage is the sum of the low-to-high propagation delay (TPLH) and the high-to-low propagation delay (TPHL) of HDC1. The architecture of HDC1 is illustrated in Fig. 3.15. Based on the Sarawi HDC, we add an extra transistor mp4 as the header. The ENABLE signal isolates the redundant delay elements from the closed loop to save power. In general, the dynamic power Pdym in the 1st tuning stage is expressed as

Fig. 3.14. Architecture of the proposed HDC-based DCO.

Fig. 3.15. Delay element of the 1st tuning stage.

$$ P_{dym} = C_{L} V_{DD}^{2} f \tag{3-29} $$

where CL is the overall load capacitance and f is the circuit operating frequency. When the redundant delay elements outside the closed loop in the 1st tuning stage of the DCO are not disabled, the power becomes

$$ P_{dym} = C_{L} V_{DD}^{2} f = (M \cdot C_{cell})\,V_{DD}^{2}\,\frac{1}{N \cdot T_{D}} = \frac{M}{N}\cdot\frac{C_{cell} V_{DD}^{2}}{T_{D}} \tag{3-30} $$

where M is the total number of HDC1 cells, N is the number of HDC1 cells in the closed loop, and Ccell and TD are the capacitance and delay of one HDC1 cell, respectively. When the ENABLE signal turns off the redundant delay elements, the dynamic power is written as

$$ P_{dym} = (N \cdot C_{cell})\,V_{DD}^{2}\,\frac{1}{N \cdot T_{D}} = \frac{C_{cell} V_{DD}^{2}}{T_{D}} \tag{3-31} $$

The dynamic power with the redundant elements disabled is N/M times the power with the elements left enabled. In other words, the power consumption with the redundant elements disabled is independent of N. This also means that the power consumption of the 1st tuning stage is fixed, as given by (3-31), whatever the DCO operating frequency is. Consequently, the power and delay characteristics of HDC1 determine the overall power performance of the 1st tuning stage.

The 2nd-stage delay element HDC2 is the same as the delay-tunable HDC proposed in Section 3.2, which provides both delay resolution and delay offset. To cover the delay resolution of the 1st tuning stage, the 2nd tuning stage must have a sufficient controllable range. Thus, the number of elements in the 2nd tuning stage is increased to 64. Table 3.5 summarizes the control code length, controllable range and delay resolution of the 5 MHz all-HDC-based DCO.
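The effect of disabling the out-of-loop delay cells can be illustrated by evaluating (3-30) and (3-31) side by side. The sketch below uses hypothetical values for the cell capacitance, cell delay and cell count; only the structure of the two formulas is taken from the text.

```python
def power_all_cells_toggling(m_total, n_loop, c_cell, vdd, t_d):
    """Equation (3-30): all M cells switch at the loop frequency 1/(N*TD)."""
    return m_total * c_cell * vdd ** 2 / (n_loop * t_d)


def power_redundant_cells_disabled(n_loop, c_cell, vdd, t_d):
    """Equation (3-31): only the N in-loop cells switch."""
    return n_loop * c_cell * vdd ** 2 / (n_loop * t_d)


# Hypothetical illustration values (not taken from the thesis).
M_TOTAL = 128        # total number of HDC1 cells
C_CELL = 1e-15       # switched capacitance per cell, in farads
T_D = 3e-9           # delay of one HDC1 cell, in seconds
VDD = 1.0

for n_loop in (16, 64, 128):
    p_all = power_all_cells_toggling(M_TOTAL, n_loop, C_CELL, VDD, T_D)
    p_gated = power_redundant_cells_disabled(n_loop, C_CELL, VDD, T_D)
    print(f"N = {n_loop:3d}: all cells {p_all * 1e6:.3f} uW, "
          f"gated {p_gated * 1e6:.3f} uW, ratio {p_gated / p_all:.3f}")
```

The gated power stays the same for every N, while the ungated power grows as the loop gets shorter, and their ratio equals N/M.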

For DCO applications in the hundreds of MHz, the same DCO architecture shown in Fig. 3.14 can be applied. The HDC1 cells in the 1st tuning stage can be replaced with cells of small delay, such as AND gates, to reduce the delay and raise the operating frequency. The 2nd tuning stage still uses cascaded delay-tunable HDCs to preserve the resolution. Table 3.6 shows the simulation results of the modified 200 MHz HDC-based DCO. The code length is 14 bits, with 6 bits in the 1st tuning stage and 8 bits in the 2nd tuning stage. The LSB delay resolution is still 0.78 ps.

Table 3.5. Controllable range and delay resolution of 5 MHz HDC-based DCO.

                      1st Tuning Stage   2nd Tuning Stage
Code Length (bits)    7                  13
Range (ns)            412.201            6.362
Resolution (ps)       3246               0.78

Table 3.6. Controllable range and delay resolution of 200 MHz HDC-based DCO.

                      1st Tuning Stage   2nd Tuning Stage
Code Length (bits)    6                  8
Range (ns)            10.09              0.199
Resolution (ps)       160                0.78
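The requirement that the 2nd tuning stage cover one step of the 1st tuning stage can be checked against Tables 3.5 and 3.6. The sketch below additionally assumes that each delay-tunable HDC contributes 128 codes and the 99.4 ps controllable range of Table 3.4, and uses this to estimate the number of cascaded HDC2 cells from the code length; this decomposition is an inference for illustration, not a statement from the thesis.

```python
HDC2_RANGE_PS = 99.4      # controllable range of one delay-tunable HDC (Table 3.4)
HDC2_CODES = 128          # assumption: 7-bit control per HDC

designs = {
    # Values from Table 3.5 and Table 3.6.
    "5 MHz":   {"stage1_step_ps": 3246.0, "stage2_range_ps": 6362.0, "stage2_bits": 13},
    "200 MHz": {"stage1_step_ps": 160.0,  "stage2_range_ps": 199.0,  "stage2_bits": 8},
}

results = {}
for name, d in designs.items():
    covers = d["stage2_range_ps"] > d["stage1_step_ps"]
    n_cells = 2 ** d["stage2_bits"] // HDC2_CODES
    estimated_range = n_cells * HDC2_RANGE_PS
    results[name] = (covers, n_cells, estimated_range)
    print(f"{name}: 2nd stage covers 1st-stage step: {covers}; "
          f"estimated HDC2 cells: {n_cells}; "
          f"estimated range: {estimated_range:.1f} ps "
          f"(table: {d['stage2_range_ps']:.0f} ps)")
```

Under these assumptions the 5 MHz design corresponds to 64 cascaded HDC2 cells, which matches the element count stated in the text, and both designs satisfy the coverage condition.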

3.4 Simulation Result

To meet the low-power and high-delay-resolution requirements of WBAN applications [3], the proposed HDC-based DCO is verified and implemented in the standard-process 90 nm high-threshold-voltage (SPHVT) CMOS technology at both 5 MHz and 200 MHz operating frequencies. The power consumption is 2.6 µW at 5 MHz and 14.3 µW at 200 MHz under a 1.0 V supply voltage. The power at 200 MHz is larger than at 5 MHz because of the poor energy efficiency of the standard AND gates in the 1st tuning stage. The LSB resolution of both DCOs with delay-tunable HDCs is 0.78 ps. The 5 MHz DCO requires a 20-bit control word, and its operating frequency ranges from 1.9 MHz (526.3 ns) to 9.4 MHz (106.8 ns). The other DCO, operating at 200 MHz, has a 14-bit codeword, and its operating range is from 69.8 MHz (14.3 ns) to 249.4 MHz (4.01 ns) with 0.78 ps delay resolution. Note that the controllable range of these DCOs is easily extended by changing the number of HDCs or by using other small-delay cells in the 1st tuning stage. To fine-tune the delay resolution, the equivalent transconductance of the delay-tunable HDCs in the 2nd tuning stage can be easily adjusted as well. Table 3.7 compares the proposed designs with state-of-the-art DCOs. The proposed DCO has the lowest power dissipation among the compared designs and also achieves high delay resolution. As a result, the proposed HDC-based DCO offers better resolution, operating range and delay linearity for low-power applications.

Table 3.7. Performance comparison of DCO.

Performance Indices        Proposed      Proposed      TCAS2'07      JSSC'05         ISQED'02      ISCAS'06      JSSC'04
                           DCO Ⅰ         DCO Ⅱ         [11]          [9]             [20]          [8]           [6]
Process                    90 nm         90 nm         90 nm         0.18 µm         0.13 µm       0.13 µm       0.35 µm
Supply Voltage (V)         1             1             1             1.8             1.65          1.2           3
DCO Control Word Length    20            14            15            5               8             8             7
Operation Range (MHz)      1.9~9.4       70~249        191~952       413~485         150           200~750       152~366
LSB Resolution (ps)        0.78          0.78          1.47          2               40            N/A           10~150
Power Consumption          2.6 µW        14.3 µW       140 µW        170~340 µW      *0.5 mW       *0.85 mW      *12 mW
                           (@5 MHz)      (@200 MHz)    (@200 MHz)    (Static only)   (@150 MHz)    (@560 MHz)    (@366 MHz)

*Power consumption estimated as 50 % of the ADPLL power.

3.5 Summary

In this chapter, we introduce a low-power, small-area, high-delay-resolution DCO based on the HDC. Compared with the standard cells, the proposed HDC not only has low power and small area, but also achieves high delay resolution with good linearity. With the aid of the proposed HDC, the 5 MHz DCO has a 0.78 ps LSB delay resolution and consumes only 2.6 µW at 1.0 V. The other proposed design, a 200 MHz DCO, provides 0.78 ps resolution and consumes 14.3 µW at a 1.0 V supply voltage in the standard-process 90 nm CMOS technology, which is the lowest power dissipation among the compared state-of-the-art DCOs.

As a result, this work achieves a 97.6 % power reduction and a 99.6 % area reduction compared with the previous DCO in Chapter 2 under 1.0 V. For the all-digital PFTCG as a whole, the overall power and area reductions are 73.0 % and 47.6 %, respectively.

Fig. 3.16. PFTCG comparison: (a) power, (b) area.

CHAPTER 4 PVT Tolerance Clock Generator

In general, the quartz crystal oscillator is the familiar solution for the reference clock source in communication systems. For WBAN applications, the clock source must be low power, low cost and small in area, especially in the WSN. Although the quartz crystal oscillator provides good stability under different PVT variations, its milliwatt-level power consumption [26], large area and extra board component bonding are fatal disadvantages. The additional board components also complicate system integration and increase the manufacturing cost. Silicon micro-electro-mechanical systems (MEMS) [12] have been proposed to replace the quartz crystal oscillator, but the extra CMOS processes, wafer-level packaging technologies and long manufacturing duration increase the cost and the time to market (TTM) as well. Furthermore, in both the quartz crystal oscillator and MEMS approaches, the frequency accuracy degrades as the operating time increases, and the generated clock cannot be calibrated by the system.

At the system level, the ring oscillator seems to be a better solution in terms of chip integration, power budget and area. A 7 MHz ring oscillator [13] has been proposed that adds a band-gap voltage regulator, temperature and process compensation circuits and a comparator. However, the operational amplifiers in the voltage regulator and comparator consume large power. Moreover, as CMOS technology scales to more advanced generations, the PVT variations become worse, as shown in Fig. 1.4. The frequency variation of a 5 MHz ring oscillator is about 60 % under the worst-case PVT corners. For low power and integrable clock source applications such as WBAN, the design challenge is to withstand these serious PVT variations.

In the following sections, we describe a new methodology to generate a stable and low power clock source under PVT variations. The proposed design also has frequency tuning capability to fine-tune the clock frequency and avoid frequency drift over a long service life. The design specification is 5 MHz, which is the same as the PFTCG reference clock source.

Fig. 4.1. Architecture of the proposed PVT tolerance clock generator.

4.1 Architecture

The proposed PVT tolerance clock generator is shown in Fig. 4.1. It is composed of three blocks: the PVT detector, the mapper and the clock oscillator. The PVT detector extracts the delay information under different PVT conditions. The mapper transfers this information to a digital codeword for calibrating the PVT variations. The clock oscillator receives the digital codeword from the mapper and generates a clock at the target frequency. The process parameters are provided by the standard chip testing procedure and stored in one-time programmable (OTP) devices to calibrate the process variation. The process calibration can be executed on the mapper, or on both the PVT detector and the mapper. The frequency tuning command is fed back from the system frequency recovery loop to fine-tune the clock frequency [2].

4.2 Circuit Designs

4.2.1 PVT Detector

The PVT detector senses the PVT conditions and converts the resulting delay information into a digital code. The delay information is extracted by delay cells that have different PVT sensitivities. Suppose there are two different kinds of delay cells in the PVT detector, namely reference delay cells and variable delay cells. Because of their different PVT sensitivities, these two kinds of cells exhibit different delay variation rates under different PVT conditions. The PVT detector can observe the relative delay variation between the reference cells and the variable cells through the delay ratio

$$R(P,V,T) = \frac{D_{VAR}(P,V,T)}{D_{REF}(P,V,T)} \tag{4-1}$$

where $D_{VAR}(P,V,T)$ is the delay of a variable cell and $D_{REF}(P,V,T)$ is the delay of a reference cell. Both delay values depend on the PVT conditions, so the delay ratio $R(P,V,T)$ is a function of PVT as well.

Fig. 4.2 plots the absolute delay against the delay ratio of the variable cell (ND4M0H_L) to the reference cell (BUFM8H) under different PVT corners. ND4M0H_L is an ND4M0H cell loaded by another ND4M0H at its output. Both ND4M0H and BUFM8H are standard cells in the UMC 90 nm SPHVT technology. The cell delay is the response to a step input. In Fig. 4.2, the x-axis is the delay ratio $R(P,V,T)$ and the y-axis is the absolute delay value. The three groups of data are the simulation results at the three process corners (FF, TT, SS). Each group covers supply voltages from 0.9 V to 1.1 V and temperatures from 0 ℃ to 125 ℃.

Fig. 4.2. Delay ratio of ND4M0H_L to BUFM8H (x-axis: R; y-axis: delay $D_{REF}$ in ns; data groups: SS, TT, FF).
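As a minimal sketch of the delay ratio in equation (4-1), the snippet below computes R for two operating conditions. The function name `delay_ratio` and all delay values are hypothetical and chosen only so that R falls in the range plotted in Fig. 4.2; they are not simulation data from this work.

```python
def delay_ratio(d_var_ns: float, d_ref_ns: float) -> float:
    """Delay ratio R = D_VAR / D_REF, following equation (4-1)."""
    if d_ref_ns <= 0:
        raise ValueError("reference delay must be positive")
    return d_var_ns / d_ref_ns


# Hypothetical (D_VAR, D_REF) pairs in ns for two operating conditions.
# These numbers are illustrative only and are not simulation results.
conditions = {
    "faster condition": (0.042, 0.150),
    "slower condition": (0.066, 0.300),
}

for name, (d_var, d_ref) in conditions.items():
    print(f"{name}: R = {delay_ratio(d_var, d_ref):.3f}")
```

Because the two kinds of cells respond differently to PVT changes, the ratio moves with the operating condition, and this is what lets the detector infer the absolute delay from a relative measurement.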

The relation between the delay ratio and the absolute delay of the reference cell can be modeled as a one-to-one mapping function for a fixed process condition. The delay variation under a given process condition is therefore approximated by a second-order curve, written as

$$D_{Model}(P,V,T) = \hat{D}_{REF}(P,V,T) = a\,R(P,V,T)^2 + b\,R(P,V,T) + c \tag{4-2}$$

where a, b and c are the process variation coefficients. These coefficients can be obtained from the chip testing procedure and stored. The second-order modeling error is then expressed as

$$E_{Model} = \left| \frac{D_{Model}(P,V,T) - D_{REF}(P,V,T)}{D_{REF}(P,V,T)} \right| \tag{4-3}$$

The simulation results of the second-order modeling curves are shown in Fig. 4.3 (a). The modeling error is limited by the PVT sensitivities of the reference cells and variable cells, which show up as the fluctuations of the curves in Fig. 4.3 (b). The maximum modeling error is about 2.05 %, as shown in Fig. 4.4.

Fig. 4.3. Second order modeling curve of delay value: (a) full range, (b) zoomed-in view (x-axis: R; y-axis: delay in ns; curves: $D_{REF}$ and $D_{Model}$).
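Equations (4-2) and (4-3) can be sketched in a few lines. The text only states that a, b and c come from the chip testing procedure; deriving them from three calibration points, as done below, is an assumption made for illustration, and the calibration points and function names are hypothetical.

```python
def quadratic_through_points(p0, p1, p2):
    """Return (a, b, c) of d = a*r**2 + b*r + c passing through three (r, d) points.

    Assumption: three calibration points are available; the thesis does not
    specify how the coefficients are extracted during chip testing.
    """
    (r0, d0), (r1, d1), (r2, d2) = p0, p1, p2
    slope01 = (d1 - d0) / (r1 - r0)      # first divided differences
    slope12 = (d2 - d1) / (r2 - r1)
    a = (slope12 - slope01) / (r2 - r0)  # second divided difference
    b = slope01 - a * (r0 + r1)
    c = d0 - a * r0 * r0 - b * r0
    return a, b, c


def model_delay(a, b, c, r):
    """Second-order delay model, equation (4-2)."""
    return a * r * r + b * r + c


def model_error(d_model, d_ref):
    """Relative modeling error, equation (4-3)."""
    return abs((d_model - d_ref) / d_ref)


# Hypothetical calibration points (R, D_REF in ns) for one process corner.
a, b, c = quadratic_through_points((0.22, 0.304), (0.25, 0.220), (0.28, 0.154))

# Hypothetical reference delay observed at R = 0.25 under another V/T condition.
d_ref = 0.225
error = model_error(model_delay(a, b, c, 0.25), d_ref)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, modeling error = {error * 100:.2f} %")
```

One set of coefficients per process corner matches the description above, where the mapping from R to delay is one-to-one only for a fixed process condition.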

Fig. 4.4. Second order modeling error (x-axis: R; y-axis: $E_{Model}$ in %; data groups: SS, TT, FF; maximum modeling error = 2.0483 %).

For implementation, the delay ratio has to be partitioned into several intervals, and the digital codeword of the delay ratio has to be mapped to a real delay value. In the i-th delay ratio region, all delay ratios are mapped to a single delay value $D_{Partition,i}$:

$$D_{Model,i,MAX} = a\,R_{i,MIN}^2 + b\,R_{i,MIN} + c \tag{4-4}$$

$$D_{Model,i,MIN} = a\,R_{i,MAX}^2 + b\,R_{i,MAX} + c \tag{4-5}$$

$$D_{Partition,i} = \frac{D_{Model,i,MAX} + D_{Model,i,MIN}}{2} = a\,R_{Partition,i}^2 + b\,R_{Partition,i} + c \tag{4-6}$$

where $R_{i,MIN}$ is the minimum delay ratio in the i-th region and $R_{i,MAX}$ is the maximum delay ratio in the i-th region. $D_{Model,i,MAX}$ and $D_{Model,i,MIN}$ are the maximum and minimum modeling delays in the i-th delay ratio region, respectively. $R_{Partition,i}$ is the corresponding delay ratio in the i-th region. The frequency error after partitioning is written as

$$E_{Partition,i} = \left| \frac{D_{Partition,i} - D_{REF}(P,V,T)}{D_{REF}(P,V,T)} \right| \tag{4-7}$$
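The partitioning in equations (4-4) to (4-7) can be sketched as a small lookup table. The equal-width regions, the coefficient values, the region count and all function names below are assumptions for illustration; the text does not specify how the regions are chosen.

```python
def model_delay(a, b, c, r):
    """Second-order delay model, equation (4-2)."""
    return a * r * r + b * r + c


def build_partition_table(a, b, c, r_lo, r_hi, n_regions):
    """Return D_Partition,i for n_regions equal-width delay ratio regions.

    Assumption: regions are equal in width and the model delay decreases
    with R, so the maximum delay of a region occurs at its minimum ratio.
    """
    width = (r_hi - r_lo) / n_regions
    table = []
    for i in range(n_regions):
        r_i_min = r_lo + i * width
        r_i_max = r_lo + (i + 1) * width
        d_max = model_delay(a, b, c, r_i_min)  # equation (4-4)
        d_min = model_delay(a, b, c, r_i_max)  # equation (4-5)
        table.append((d_max + d_min) / 2.0)    # equation (4-6)
    return table


def lookup_delay(table, r_lo, r_hi, r):
    """Map a measured delay ratio to the delay value of its region."""
    n = len(table)
    index = int((r - r_lo) / (r_hi - r_lo) * n)
    index = min(max(index, 0), n - 1)          # clamp out-of-range ratios
    return table[index]


def partition_error(d_partition, d_ref):
    """Relative error after partitioning, equation (4-7)."""
    return abs((d_partition - d_ref) / d_ref)


# Hypothetical coefficients and ratio range, for illustration only.
A, B, C = 10.0, -7.5, 1.47
table = build_partition_table(A, B, C, r_lo=0.22, r_hi=0.28, n_regions=6)

d_est = lookup_delay(table, 0.22, 0.28, r=0.225)
d_ref = model_delay(A, B, C, 0.22)  # worst case at the edge of region 0
print(f"estimated delay = {d_est:.4f} ns, "
      f"error = {partition_error(d_est, d_ref) * 100:.2f} %")
```

Using more regions narrows each interval and therefore lowers the partition error, at the cost of a larger lookup table in the mapper.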

