
IEEE Asian Solid-State Circuits Conference November 16-18, 2009 / Taipei, Taiwan

978-1-4244-4434-2/09/$25.00 ©2009 IEEE

3-2

A Sub-100µW Area-Efficient Digitally-Controlled Oscillator Based on Hysteresis Delay Cell Topologies

Man-Chia Chen, Jui-Yuan Yu, and Chen-Yi Lee
Department of Electronics Engineering
National Chiao-Tung University, Hsinchu, Taiwan

{mandy, blues, cylee}@si2lab.org

Abstract—This work presents an all-digital digitally-controlled oscillator (DCO) design built on three newly proposed hysteresis delay cells (HDCs). According to their circuit topologies, the three HDCs are classified as on-off, cascaded, and nested HDCs, each providing a different propagation delay. These HDCs make up a power-of-two delay stage DCO (P2DCO) architecture, in which every delay stage provides half the delay of the previous one in descending order, resulting in low power and low cost. A self-calibration method maintains the monotonicity of the P2DCO under PVT variations. The P2DCO is verified in a 90nm CMOS technology. The LSB of the control word provides a 2.04ps delay resolution. Post-layout simulations show that the dynamic power is 75.9µW at 239.2MHz and 5.2µW at 3.89MHz. The area of the P2DCO is 60×20µm².

I. INTRODUCTION

The digitally-controlled oscillator (DCO), a key module in digitally-controlled frequency synthesis applications [1], has several advantages over the conventional voltage-controlled oscillator (VCO). A DCO is easier to port between different processes and voltage scalings, and it minimizes control and integration effort. However, it is reported [2] that 50%~70% of the power of a clocking circuit is consumed by the DCO, which makes it the major bottleneck in total power reduction. In other words, power reduction in a DCO design effectively cuts down the overall system power, especially in low-power SoC applications.

DCOs have been proposed in several architectures. The current-starved DCO provides high delay resolution but has high static power consumption. The standard-cell-based DCO, built from straightforward delay elements (buffers/inverters and or-and-inverter logic cells), consumes power on the order of 100µW and shows poor linearity with insufficient delay resolution, whereas the digitally-controlled varactor (DCV) improves delay resolution at a similar power level. A hysteresis delay cell (HDC) was therefore proposed as a tradeoff between power and delay resolution [3][4]. However, the resulting power saving is limited, especially at low operating frequencies, since state-of-the-art DCO designs all require a large-area delay line to reach sufficient delay combinations. Consequently, this work addresses (i) a set of HDCs with novel structures (on-off, cascaded, and nested) that are power efficient, especially at low operating frequencies, and (ii) a power-of-two (P2) delay stage architecture that largely reduces the DCO area with the proposed HDC set. Because the delay variation of the proposed HDCs is large under different process, voltage, and temperature (PVT) conditions, a binary recovery self-calibration (BRSC) algorithm is also proposed to improve the monotonicity of the proposed DCO. These three features together overcome the challenge of DCO power reduction and make the proposed DCO a preferred choice in power-thirsty or battery-less systems, especially in sub-100MHz designs.

II. SYSTEM OVERVIEW

The system includes a power-of-two DCO (P2DCO) and the BRSC block, as shown in Fig. 1. The P2DCO achieves low power and small area by using the novel HDCs. The BRSC block, which includes a distortion estimator and a codeword mapper, compensates for the non-monotonic behavior of the P2DCO under PVT variations. The proposed P2DCO includes three tuning stages: a coarse tuning stage and two fine tuning stages, as illustrated in Fig. 2. In each tuning stage, the delay segments controlled by the control code bits are organized in a P2 structure, where each delay segment produces half the delay of the one placed before it in the delay line. This architecture removes the need for a binary to decimal converter, which is typical in state-of-the-art DCO designs. In the coarse tuning stage, multiplexers select paths with different propagation delays. In the fine tuning stages, the control bits drive the delay cells directly.
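As a rough illustration of the P2 arrangement described above, the following Python sketch sums the delay of every segment whose control bit is set. It is not taken from the original design: the function name, the 4-bit width, and the 1 ns LSB delay are assumptions chosen for the example. With ideal segments, bit n contributes 2^n LSB delays, so the total delay tracks the binary codeword directly.

```python
def p2_total_delay(codeword, segment_delays):
    """Sum the delays of the segments enabled by a binary codeword.

    segment_delays[n] is the delay of the segment driven by bit n
    (LSB first). Hypothetical helper, for illustration only.
    """
    return sum(d for n, d in enumerate(segment_delays) if (codeword >> n) & 1)


# Ideal P2 ladder with an assumed 1 ns LSB delay: 1, 2, 4, 8 ns.
ideal_segments = [1 << n for n in range(4)]

# With ideal segments the total delay equals codeword * LSB delay,
# so the binary control word can drive the segments without any decoder.
ideal_delays = [p2_total_delay(c, ideal_segments) for c in range(16)]
```

In this idealized model every codeword step adds exactly one LSB delay; real segments deviate from the 2^n weights, which is what the calibration block has to handle.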

Each stage uses different types of HDCs as its delay cells. The coarse tuning stage applies cascaded HDCs (CHDC) and nested HDCs (NHDC), while the 1st fine tuning stage uses on-off HDCs (OHDC). In the 2nd fine tuning stage, MOS gate capacitance (MGC) is applied to generate varying combinations of output loading for fine delay tuning. The gate capacitance is also arranged in the P2 ordering style by combining a number of controlled transistors.

Fig. 1. System block diagram of the P2DCO and the BRSC algorithm. [Blocks: P2DCO with output OUT_DCO; Binary Recovery Self-Calibration block containing the distortion estimator, the codeword mapper, and a mode-selecting MUX. Signals: C_user, C_P2DCO, and the compensation codewords Δc_n, with C = {C_N, C_C, C_O, C_M}.]

The non-monotonic behavior of a P2DCO design comes from inconsistent delay variations among the different HDC structures under PVT variations. The BRSC block is therefore proposed to maintain monotonicity through a codeword transformation. It also eases the design effort of the precise tuning otherwise needed to fulfill the P2 requirement. In the BRSC testing mode, the distortion estimator computes the compensation codewords ($\Delta c_n$) automatically. Afterwards, in the normal mode, the codeword mapper uses $\Delta c_n$ to generate $C_{P2DCO}$ and passes it to the P2DCO.
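To make the non-monotonic behavior concrete, here is a minimal Python sketch under assumed numbers. The segment delays (in arbitrary units) and both function names are hypothetical; the sketch only shows that when one segment is shorter than its ideal P2 weight, the delay drops at the codeword where that bit first turns on.

```python
def total_delay(codeword, segment_delays):
    """Delay selected by a binary codeword (segment_delays is LSB first)."""
    return sum(d for n, d in enumerate(segment_delays) if (codeword >> n) & 1)


def non_monotonic_codewords(segment_delays):
    """Return codewords whose delay does not exceed that of the previous codeword."""
    k = len(segment_delays)
    delays = [total_delay(c, segment_delays) for c in range(1 << k)]
    return [c for c in range(1, 1 << k) if delays[c] <= delays[c - 1]]


ideal = [1, 2, 4, 8]    # exact power-of-two weights
skewed = [1, 2, 4, 6]   # assumption: the MSB segment came out too short

ideal_violations = non_monotonic_codewords(ideal)
skewed_violations = non_monotonic_codewords(skewed)
```

In the skewed case the only violation is at codeword 8, where the short MSB segment replaces the three lower segments.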

III. PROPOSED HYSTERESIS DELAY CELLS

The feasibility of Schmitt trigger circuits in low power operation has been discussed in [5][6]. This work further exploits this concept of the hysteresis phenomena and proposes three topologies for HDC designs that are able to generate delays in a wide range and possess better power efficiency compared with the state-of-the-art delay cells.

A. On-off HDC

An on-off HDC (OHDC) is designed on the basis of a Schmitt-trigger circuit, as shown in Fig. 3(a)(b). By adding two controlling transistors, MPc4 and MNc4, the delay cell can be set to operate as either a normal inverter or a hysteresis inverter. A delay difference is thus obtained between these two modes, which gives finer resolution than simply using the hysteresis cell as a delay element.

The concept of maintaining or destroying the hysteresis property can be applied to various forms of Schmitt-trigger circuits. This work demonstrates only two possible circuit types, chosen with power consumption and delay range in mind. The extra controlling transistors MPc5 and MNc5 in Fig. 3(b) are used to prevent potential short-circuit current paths, which may cause extra power consumption and unbalanced fall-time and rise-time currents, resulting in poor jitter performance.

B. Cascaded HDC

A general form of the cascaded HDC (CHDC) is shown in Fig. 3(c). It can be viewed as an on-off HDC in the Schmitt-trigger mode with a longer internal inverter chain. The header transistor MPh1 and the footer transistor MNf1 act as voltage gating cells that scale down the actual supply voltage of the internal inverter chain and confine the short-circuit current generated at internal nodes during voltage transitions. As a result, the power of this cascaded HDC is much lower than that of a normal inverter chain.

C. Nested HDC

A general form of nested HDCs (NHDC) is shown in Fig. 3(d). It can be viewed as a cascaded HDC with its internal delay chains composed of cascaded HDCs. A signal transition at the input propagates through all the inverter chains in each separate level from lower layers to higher layers. Here a two-layer nested HDC is demonstrated. It can be nested deeper as long as the threshold voltage is small enough. However, the footer and header transistors should be designed to reach the same status (on or off) when in steady state. This guarantees full-swing in each output node of internal delay chains so that each delay block has balanced delay time. The power-saving mechanism of nested HDCs is similar to that described for cascaded HDCs only with a more evident improvement since the supply voltage for the internal delay chain propagation is scaled down to a greater degree.

IV. BINARY RECOVERY SELF-CALIBRATION ALGORITHM

The non-monotonic behavior of a P2DCO is demonstrated in Fig. 4, where some delay segments violate the P2 ordering because the delay times of the segments change unevenly under PVT variations. Therefore, when a user-defined codeword ($C_{user}$) is directly input to the P2DCO, the output period may show an offset due to varying PVT conditions. To solve this issue, a mapping from $C_{user}$ to a new codeword $C_{P2DCO}$, which leads to the desired output period, is required. The mapping from $C_{user}$ to $C_{P2DCO}$ can be expressed as (1).

$$C_{P2DCO} = C_{user} + \Delta C_{user} \qquad (1)$$

Fig. 2. Architecture of the P2DCO. [Coarse tuning stage: nested HDCs NHDC_{m-1} to NHDC_0 (control bits C_{N,m-1} to C_{N,0}) and cascaded HDCs CHDC_{n-1} to CHDC_0 (C_{C,n-1} to C_{C,0}), each delay segment followed by a MUX. 1st fine tuning stage: on-off HDCs OHDC_{p-1} to OHDC_0 (C_{O,p-1} to C_{O,0}). 2nd fine tuning stage: MOS gate capacitance (MGC) (C_{M,r-1} to C_{M,0}). Output: OUT_DCO.]

Fig. 3. [Schematics of the proposed HDCs: (a)(b) on-off HDCs, (c) cascaded HDC with an internally cascaded delay chain, (d) nested HDC with an internally nested delay chain.]

The BRSC algorithm reconstructs the P2 relation among the delay segments by filling the insufficient delay time of each delay segment with the delay time of the others. This is done by representing the insufficient delay time ($\Delta t_n$) of the (n+1)-th delay segment as a compensation codeword ($\Delta c_n$) that generates an equivalent delay time, as shown in Fig. 4. Whenever a gap is crossed, the corresponding compensation codeword is added to the original codeword to maintain monotonic behavior. Fig. 4 shows the curve only for the six LSBs of the control codeword; the scheme can be extended to the full length of the control codeword. As a result, $\Delta C_{user}$ is the sum of the compensation codewords corresponding to all the gaps included in $C_{user}$:

$$\Delta C_{user} = \sum_{n=0}^{k-1} i_{C_{user},n}\,\Delta c_n \qquad (2)$$

where $k$ is the codeword length and $i_{C_{user},n}$ is the number of gaps caused by the (n+1)-th delay segment in $C_{user}$. Owing to the P2 structure, $i_{C_{user},n}$ is expressed as (3).

$$i_{C_{user},n} = \begin{cases} \left\lfloor (C_{user} - 2^{n}) / 2^{n+1} \right\rfloor + 1, & C_{user} - 2^{n} \ge 0 \\ 0, & C_{user} - 2^{n} < 0 \end{cases} \qquad (3)$$
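The mapping in (1) to (3) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the compensation codewords in `delta_c` are made-up values for a 3-bit example, and the gap count assumes that the (n+1)-th delay segment is driven by bit n, so its gaps fall at codewords 2^n, 3·2^n, 5·2^n, and so on.

```python
def gap_count(c_user, n):
    """Equation (3): gaps caused by the (n+1)-th delay segment up to c_user."""
    if c_user - (1 << n) < 0:
        return 0
    return (c_user - (1 << n)) // (1 << (n + 1)) + 1


def map_codeword(c_user, delta_c):
    """Equations (1) and (2): C_P2DCO = C_user + sum_n i_{C_user,n} * delta_c[n]."""
    delta_c_user = sum(gap_count(c_user, n) * dc for n, dc in enumerate(delta_c))
    return c_user + delta_c_user


# Assumed compensation codewords: only the segment on bit 2 needs one extra LSB.
delta_c = [0, 0, 1]
mapped = [map_codeword(c, delta_c) for c in range(8)]
```

With these assumed values, user codewords below 4 pass through unchanged, and every codeword from 4 upward is shifted by one to step over the gap.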

During the testing mode, the distortion estimator computes the key parameters $\Delta c_n$ by using (4) and (5), which are similar in concept to (1) and (2). They state that the difference between a pair of codewords $(C_\alpha, C_\beta)$ equals the sum of the compensation codewords needed to fill the gaps between them.

$$C_\beta = C_\alpha + \Delta C_{\beta-\alpha} + 1 \qquad (4)$$

$$\Delta C_{\beta-\alpha} = \sum_{n=0}^{k-1} \left( i_{C_\beta,n} - i_{C_\alpha,n} \right) \Delta c_n \qquad (5)$$

where $C_\beta > C_\alpha$ and $T(C_\beta) > T(C_\alpha) \ge T(C_\beta - 1)$. $T(C)$ represents the output period corresponding to a codeword $C$. Therefore, the compensation codeword for each delay segment, $\Delta c_n$, can be solved from the simultaneous equations formed on the basis of (4) with $k$ pairs of properly chosen codewords $(C_{\alpha,n}, C_{\beta,n})$. $C_{\alpha,n}$ is set to $2^{n} - 1$, while $C_{\beta,n}$ is derived by directly measuring the P2DCO. This method is guaranteed to include the information needed to solve for $\Delta c_n$. Fig. 4 shows an example of a codeword pair $(C_{\alpha,5}, C_{\beta,5})$ generated by the above method.
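One way to solve the simultaneous equations built from (4) and (5) is forward substitution, sketched below in Python. This is an assumption about how the system could be solved, not a description of the paper's distortion estimator. The function names and the measured values in the example are hypothetical, and the sketch assumes each measured partner codeword for index n lies in the range 2^n to 2^(n+1) - 1, which makes the system lower triangular with a unit diagonal.

```python
def gap_count(c, n):
    """Equation (3): gaps caused by the (n+1)-th delay segment up to codeword c."""
    if c - (1 << n) < 0:
        return 0
    return (c - (1 << n)) // (1 << (n + 1)) + 1


def estimate_compensation(c_beta):
    """Solve (4) and (5) for the compensation codewords by forward substitution.

    c_beta[n] is the measured partner of c_alpha[n] = 2**n - 1.
    Assumes 2**n <= c_beta[n] < 2**(n + 1) for every n.
    """
    delta_c = []
    for n, beta in enumerate(c_beta):
        alpha = (1 << n) - 1
        # Contribution of the already-solved lower segments to (5).
        lower = sum(
            (gap_count(beta, m) - gap_count(alpha, m)) * delta_c[m]
            for m in range(n)
        )
        # Rearranged (4): delta_c[n] = C_beta - C_alpha - 1 - lower terms.
        delta_c.append(beta - alpha - 1 - lower)
    return delta_c


# Hypothetical measurements for a 3-bit example.
ideal_case = estimate_compensation([1, 2, 4])
one_gap = estimate_compensation([1, 2, 5])
two_gaps = estimate_compensation([1, 3, 6])
```

In the ideal case each partner codeword is simply the next codeword, so all compensation codewords come out as zero.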

V. SIMULATION RESULTS

The proposed design, a P2DCO with the BRSC algorithm, is simulated in a 90nm 1P9M CMOS process operating at 14MHz. Table I summarizes the HDCs used in this design based on post-layout simulation results (partially from measurements). The simulation and measurement results of the delay cells used in [3] are listed for comparison. The coarse tuning stage includes eight delay segments producing delay steps from 0.5ns to 64ns. Each delay segment is mainly constructed with one HDC. In the 1st fine tuning stage, four OHDCs are combined to produce delay segments from 32ps to 0.5ns, while the 2nd fine tuning stage produces delay segments from 1ps to 32ps with MGCs. Each delay segment is tuned only to an approximate value, since the BRSC algorithm is able to compensate for imperfections in the P2 structure. Fig. 5 compares the performance of the different HDCs, normalized to the performance of the standard cells used in [3] at each tuning stage. The x-axis and y-axis indicate the area and power savings obtained by replacing the standard cells with the proposed HDCs at the same operating frequency. The power is reduced to a minimum of 2%, 28%, and 89% of the original power in the coarse tuning stage and the two fine tuning stages, respectively. Although some of the 1st fine tuning delay cells occupy a larger area, power reduction is the major concern.
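The approximate P2 tuning mentioned above can be checked against the coarse-stage delay values listed in Table I. The short Python sketch below uses those values as given; the variable names are assumptions for the example. It computes the ratio between each pair of adjacent coarse segments, which would be exactly 2 in an ideal P2 ladder.

```python
# Coarse-stage delay per cell in ns, taken from Table I (shortest segment first).
coarse_delay_ns = {
    "CHDC0": 0.522,
    "CHDC1": 1.040,
    "NHDC0": 2.049,
    "NHDC1": 4.003,
    "NHDC2": 8.120,
    "NHDC3": 16.069,
    "NHDC4": 30.979,
    "NHDC5": 65.016,
}

names = list(coarse_delay_ns)

# Ratio of each segment to the next shorter one; 2.0 would be a perfect P2 step.
ratios = {
    f"{names[i + 1]}/{names[i]}": coarse_delay_ns[names[i + 1]] / coarse_delay_ns[names[i]]
    for i in range(len(names) - 1)
}

worst_deviation = max(abs(r - 2.0) for r in ratios.values())
```

All seven ratios stay within about 5% of 2, which is consistent with the statement that the segments are only approximately tuned and the remaining error is left to the BRSC algorithm.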

The power performance of a DCO design largely depends on its delay cells: the delay cell contributing the most delay time in an output period dominates the total power. Therefore, the total power consumption declines as the output period increases, as shown in Fig. 6(a), since the proposed HDCs are most power-efficient in the coarse tuning stage. A P2DCO constructed with inverters is simulated for contrast. Its power varies only slightly with output period, since all its delay cells have the same delay time and power consumption.

Fig. 6(b) shows the post-layout simulation result that illustrates the improvement from the BRSC algorithm. The BRSC algorithm is applied at the coarse tuning stage, where the delay variation is most serious under different PVT conditions. This prototype is well-tuned under a 1V supply voltage and a temperature of 25°C. Therefore, the curve in this condition is monotonic and overlaps the resultant curve after the BRSC algorithm. The simulation results under the other two PVT conditions show non-monotonic behavior, which is labeled with arrows. With the aid of the BRSC algorithm, the monotonic property is recovered.

Fig. 4. A model for BRSC algorithm. [Axes: codeword and period. Curves: ideal case and P2DCO with non-monotonic behavior. Annotations: insufficient delay times Δt_n and compensation codewords Δc_n (n = 0...k-1), the codeword pair (C_α,5, C_β,5), and C_user, C_P2DCO, ΔC_user.]

Fig. 5. [Normalized power consumption versus normalized area/delay ratio of the delay cells. Notation: cell name (norm. area/delay, norm. power). Reference cell: (100%, 100%).
Coarse tuning cells (ref.: [3] AND): NHDC5 (2%, 2%), NHDC4 (4%, 2%), NHDC3 (5%, 3%), NHDC2 (8%, 4%), NHDC1 (12%, 5%), NHDC0 (23%, 11%), CHDC1 (32%, 15%), CHDC0 (46%, 21%).
1st fine tuning cells (ref.: [3] HDC): OHDC3 (69%, 28%), OHDC2 (110%, 41%), OHDC1 (237%, 36%), OHDC0 (377%, 65%).
2nd fine tuning cell (ref.: [3] DCV-LD): MGC (41%, 89%).]

TABLE II. COMPARISON WITH THE STATE-OF-THE-ART OSCILLATOR DESIGNS

| | Proposed (simulation) | TCAS2'07 [3] (measurement) | TCAS2'05 [7] (measurement) | ISSCC'08 [8] (measurement) |
| Process | 90nm CMOS | 90nm CMOS | 0.35µm CMOS | 65nm CMOS |
| Approach | Digitally-Controlled Ring Oscillator | Digitally-Controlled Ring Oscillator | Digitally-Controlled Ring Oscillator | Voltage-Controlled Relaxation Oscillator |
| Supply Voltage (V) | 1 (0.9~1.1) | 1 | 3.3 | 1.2 (1.1~1.3) |
| Operation Range (MHz) | 3.8~239.2 | 191~952 | 18~214 | 12 |
| LSB Resolution (ps) | 2.04 | 1.47 | 1.55 | NA |
| Jitter | 2.16ps (239MHz, p-p); 446fs (239MHz, rms) | 49.05ps (417MHz, p-p); 8.18ps (417MHz, rms) | NA | -161.7dB@290K |
| Power Consumption | 75.9µW (239.2MHz); 5.2µW (3.89MHz) | 140µW (200MHz) | 18mW (200MHz) | 90µW (12MHz) |
| Area | 1200µm² | NA | 40000µm² | 24000µm² |

This work was supported by MOEA of Taiwan, R.O.C., under Grant 97-EC-17_A-03-S1-0005.

The layout of the test chip is shown in Fig. 7. The area of the P2DCO is 20×60µm². The rest of the layout consists of the BRSC algorithm and some testing circuits. Table II shows the overall comparison of the P2DCO with state-of-the-art oscillator designs. The P2DCO has the lowest power consumption (5.2µW at 3.89MHz, 75.9µW at 239.2MHz) and the smallest area among the compared designs. Moreover, the proposed HDCs are compatible with automated CAD tools and therefore save design effort in system integration.
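Because the power figures in Table II are quoted at different operating frequencies, a per-cycle view can make them easier to compare. The Python sketch below is an added illustration rather than an analysis from the paper: it divides each reported power by its reported frequency, using the fact that 1 µW/MHz equals 1 pJ per cycle. The dictionary keys and variable names are assumptions.

```python
# (power in µW, frequency in MHz) as reported in Table II.
designs = {
    "Proposed, 239.2 MHz": (75.9, 239.2),
    "Proposed, 3.89 MHz": (5.2, 3.89),
    "[3], 200 MHz": (140.0, 200.0),
    "[7], 200 MHz": (18000.0, 200.0),   # 18 mW expressed in µW
    "[8], 12 MHz": (90.0, 12.0),
}

# µW / MHz = 1e-6 W / 1e6 Hz = 1e-12 J, i.e. picojoules per output cycle.
energy_pj = {name: power / freq for name, (power, freq) in designs.items()}

lowest = min(energy_pj, key=energy_pj.get)
```

This normalization ignores differences in process, supply voltage, and oscillator type, so it should be read only as a rough aid alongside the table.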

VI. CONCLUSION

This work proposes three structures (on-off, cascaded, and nested) for hysteresis-based delay cell design. Combined with the all-digital power-of-two delay stage architecture, they enable the use of delay cells with largely improved power/delay and area/delay density, resulting in the lowest dynamic and static power consumption. Moreover, monotonicity is well maintained by the self-calibration scheme under different PVT variations. As a result, this work provides the most economical design approach, in terms of power and area, among state-of-the-art all-digital DCOs, making it a suitable choice for low-power SoC applications.

REFERENCES

[1] Kwang-Jin Lee, Uk-Rae Cho, et al., “A Digitally Controlled Oscillator for Low Jitter All Digital Phase Locked Loops,” IEEE ASSCC, pp. 365-368, Nov. 2005.

[2] Jui-Yuan Yu, et al., “An All-Digital Phase-Frequency Tunable Clock Generator for Wireless OFDM Communications,” in Proc. IEEE Int. Conf. SoC, pp. 305-308, Sep. 2007.

[3] Duo Sheng, et al., “An Ultra-Low-Power and Portable Digitally Controlled Oscillator for SoC Applications,” IEEE Trans. Circuits and Systems II, vol. 54, no. 11, pp. 954-958, Nov. 2007.

[4] Jun Zhao and Yong-Bin Kim, “A 12-bit Digitally Controlled Oscillator with Low Power Consumption and Low Jitter,” IEEE 51st Midwest Symposium on Circuits and Systems, pp. 370-373, Aug. 2008.

[5] S.F. Al-Sarawi, “Low power Schmitt trigger circuits,” Electron. Lett., vol. 38, no. 18, pp. 1009-1010, Aug. 2002.

[6] B.L. Dokic, “CMOS NAND and NOR Schmitt circuits,” Microelectronics J., vol. 27, no. 8, pp. 757-765, 1996.

[7] P.-L. Chen, C.-C. Chung, and C.-Y. Lee, “A portable digitally controlled oscillator using novel varactors,” IEEE Trans. Circuits and Systems II, Express Briefs, vol. 52, no. 5, pp. 233-237, May 2005.

[8] Paul F.J. Geraedts, Ed van Trijl, Eric A.M. Klumperink, Gerard J.M. Wienk, and Bram Nauta, “A 90uW 12MHz Relaxation Oscillator with a -162dB FOM,” IEEE ISSCC Dig. Tech. Papers, pp. 348-350, Feb. 2008.

Fig. 6. Post-layout simulation result of (a) power consumption and (b) the BRSC algorithm. [(a) Power (µW) versus period (ns) for the inverter-based DCO and this work. (b) Period (ns) versus codeword at TT/1.0V/25°C, SS/0.9V/125°C, and FF/1.1V/-25°C, each with and without BRSC.]

TABLE I. SIMULATION SUMMARIES OF DELAY CELLS

| Tuning Stage | Delay Cell | Delay Res. (ns) | Power (µW) |
| Coarse tuning | AND [3] | 0.052 (0.052*) | 103.93 (123.43*) |
| | NHDC5 | 65.016 | 1.74 |
| | NHDC4 | 30.979 | 2.33 |
| | NHDC3 | 16.069 | 3.04 |
| | NHDC2 | 8.120 | 4.25 |
| | NHDC1 | 4.003 | 5.54 |
| | NHDC0 | 2.049 | 11.02 |
| | CHDC1 | 1.040 | 15.93 |
| | CHDC0 | 0.522 | 21.94 |
| 1st stage fine tuning | HDC [3] | 0.083 (0.116*) | 151.5 (160.84*) |
| | OHDC3 | 0.343 | 43.19 |
| | OHDC2 | 0.173 | 62.47 |
| | OHDC1 | 0.131 | 53.93 |
| | OHDC0 | 0.066 | 98.80 |
| 2nd stage fine tuning | DCV-LD [3] | 0.738ps (1.21ps*) | 153.95 (208.46*) |
| | DCV-SD [3] | 0.61ps (0.85ps*) | 159.95 (190.44*) |
| | MGC | 1.02ps (1.22ps*) | 137.02 (164.5*) |

*Measurement result

Fig. 7. [Layout of the test chip, showing the P2DCO (60µm × 20µm) and the BRSC and testing circuit.]
