
Chapter 2 Low-Power Digitally Controlled Oscillator Design with Hysteresis


with other DCO designs. Furthermore, the proposed low-power solution does not incur any performance loss. Additionally, since the proposed DCO can be implemented with standard cells, it has good portability. As a result, the proposed DCO offers better resolution, operating range, linearity, and portability.

2.6 Summary

In this chapter, we have proposed a hysteresis delay cell (HDC) and an ultra-low-power DCO with cell-based design for SoC applications. The proposed HDC can be used not only in the low-power DCO but also to reduce DCDL power consumption. With the proposed segmental tuning structure and HDC, the power consumption of the coarse-tuning and fine-tuning stages is further reduced by 70% and 86.2%, respectively, compared with conventional designs. Measurement results show that the proposed DCO achieves 1.47ps resolution and 140µW power consumption at 200MHz, which is more than an order-of-magnitude power reduction compared with conventional works. As a result, our proposal achieves not only lower power consumption but also better LSB resolution and delay linearity. Moreover, because the proposed DCO is highly portable as a soft intellectual property (IP), it is well suited to SoC applications as well as system-level integration.


Chapter 3 Fast Lock-In All-Digital Phase-Locked Loop Design

3.1 Introduction

In this chapter, a fast lock-in all-digital phase-locked loop design is presented.

As mentioned in Chapter 1, many applications such as microprocessors, communication baseband processors, and multimedia systems require a clock synthesizer or clock multiplier. Hence, the PLL has become an essential component in SoC design. To reduce the overall power consumption of an SoC, especially in portable and mobile applications, systems commonly use power management to avoid redundant power dissipation. To support this low-power technique, the PLL should provide fast entry into and exit from power-saving modes [10]. As a result, the locking time of the PLL is a very important design specification for low-power SoC applications. In addition, for fast-locking frequency synthesizer applications, such as frequency-hopping multiple-access systems, locking time is also the most critical design issue.

For fast acquisition, the traditional analog PLL requires either tuning the voltage-controlled oscillator (VCO) free-running frequency close to the desired frequency in advance or increasing the loop bandwidth. However, exact VCO tuning is difficult due to process, voltage, and temperature variations (PVT variations), and increasing the loop bandwidth degrades jitter performance [20]. Many researchers have focused on overcoming this structural handicap. A digital frequency-difference detector (DFDD) is proposed in [20] to convert the frequency difference directly into a digital code and then adaptively control the VCO gain. An adaptive loop bandwidth scheme is proposed in [21] to reduce the locking time. However, the adaptive loop bandwidth architecture increases circuit complexity.

In contrast to analog approaches, all-digital phase-locked loops (ADPLLs) using a binary search algorithm have been proposed that achieve lock within 50 [10] and 46 [11] cycles, respectively. The binary search ADPLL not only achieves fast lock but also performs well compared with the analog PLL. To further reduce locking time, a time-to-digital converter (TDC) based ADPLL is proposed in [22].

This ADPLL uses a TDC to quantize the reference clock period into multiples of the inverter delay time. Because the TDC and the DCO are affected by the same PVT variations, the code measured by the TDC is more accurate and can cope with PVT variations. However, the TDC digital processing unit increases power consumption and design complexity.

Therefore, the goal of the proposed ADPLL is to achieve fast lock-in using a TDC with a small hardware penalty. In addition to locking time, power consumption is another important design specification of an ADPLL. Thus, the proposed fast lock-in ADPLL employs the high-resolution, low-power DCO described in Chapter 2 to reduce overall power and enhance performance.


This chapter is organized as follows. Section 3.2 describes the proposed binary search ADPLL. The proposed TDC-based ADPLL for fast locking is described in Section 3.3. Section 3.4 reviews previous TDC work and presents the proposed low-complexity 2-level flash TDC. In Section 3.5, the simulation results of the proposed ADPLLs are presented and discussed. Finally, a brief summary is given in Section 3.6.

3.2 Binary Search ADPLL Overview

3.2.1 Binary Search ADPLL Architecture

Fig. 3.1 illustrates the proposed binary search ADPLL architecture. It consists of seven major functional blocks: a phase/frequency detector (PFD), two digitally controlled oscillators (DCOs) (the tracking DCO and the average DCO), an ADPLL controller, and three frequency dividers (pre-divider, DCO divider, and output divider). N, M, and K are the programming inputs of the pre-divider, DCO divider, and output divider, respectively. Of the two DCOs, the tracking DCO tracks the reference clock, and the average DCO generates a low-jitter output clock through the averaging mechanism.

Fig. 3.1: Binary search ADPLL architecture.

The PFD detects the frequency difference and phase error between the divided reference clock (Ref_N) and the divided DCO output clock (DCO_M), and it generates LEAD/LAG signals to slow down or speed up the DCO output frequency, respectively.

When the controller receives LEAD from the PFD, it increases the DCO control code (DCO code [16:0]) to decrease the output frequency of the tracking DCO. Conversely, when the controller receives LAG from the PFD, it decreases the DCO control code to increase the output frequency of the tracking DCO. These blocks form a closed loop to achieve the “phase-lock” function. For frequency synthesis applications, the controller filters the DCO control code variation and controls the average DCO to provide a low-jitter clock output (OUTPUT CLK). For clock multiplier applications, the in-phase clock is generated directly by the tracking DCO.

Fig. 3.2: Binary search algorithm.

3.2.2 Binary Search Algorithm

The locking procedure of the binary search ADPLL can be divided into two modes: frequency acquisition and phase tracking. Phase locking starts in frequency acquisition mode, which employs a binary search algorithm to find the target frequency of the input clock. Fig. 3.2 illustrates the binary search algorithm for frequency acquisition. Initially, the DCO oscillates at the middle of the DCO frequency band, and the search step is one fourth of the DCO frequency band. If the output frequency is higher than the target frequency, the ADPLL controller adds the current search step to the DCO control code to lower the output frequency. Conversely, if the output frequency is lower than the target frequency, the ADPLL controller subtracts the current search step from the DCO control code to raise the output frequency. Whenever the PFD output changes from LAG to LEAD or vice versa, the search step is divided by 2. When the search step has been reduced to 1, frequency acquisition is complete.

Fig. 3.3: Flowchart of phase tracking mode.
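To make the acquisition rule concrete, the following Python sketch models it under simplifying assumptions of our own. The DCO is represented only by its control code, a larger code gives a lower frequency (as stated above), and the PFD is assumed to report LEAD exactly when the current code is below a hypothetical `target_code`. The function name, the clamping at the band edges, and the `max_cycles` guard are illustrative additions, not part of the original design.

```python
def binary_search_acquire(target_code: int, code_bits: int = 17, max_cycles: int = 10_000) -> tuple:
    """Frequency-acquisition sketch: returns (final_code, cycles_used).

    Hypothetical model: a larger control code means a lower DCO frequency,
    and the DCO matches the target frequency when code == target_code.
    """
    band = 1 << code_bits        # number of control codes in the DCO band
    code = band // 2             # start at the middle of the band
    step = band // 4             # initial step: one fourth of the band
    prev_lead = None
    cycles = 0
    while step > 1 and cycles < max_cycles:
        lead = code < target_code          # DCO too fast -> PFD reports LEAD
        if prev_lead is not None and lead != prev_lead:
            step //= 2                     # LEAD/LAG polarity flipped: halve the step
        # LEAD: add the step (slow down); LAG: subtract it (speed up)
        code += step if lead else -step
        code = max(0, min(band - 1, code))  # assumed clamp at the band edges
        prev_lead = lead
        cycles += 1
    return code, cycles
```

In this model the returned code lands within one LSB of `target_code` for any target from 1 to 2^code_bits - 1; the remaining offset is what phase tracking mode has to absorb.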

After frequency acquisition completes, the locking procedure enters phase tracking mode. Fig. 3.3 shows the flowchart of phase tracking mode. At the beginning of this mode, the speed-up count (SPEEDUP_COUNT) is set to zero. When the PFD output changes from LAG to LEAD or vice versa, meaning that the phase polarity has changed, the search step is halved. If the search direction remains the same, the speed-up count is incremented by one. When the speed-up count reaches the boundary value, the search step is doubled to accelerate phase tracking. If the boundary value is too large, the PLL may fail to track the input phase; conversely, a boundary value that is too small can make the loop unstable. Based on simulation, the boundary value is set to eight.
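The speed-up rule can be sketched as a single update function. This is an illustrative Python model, not the controller RTL. The name `phase_tracking_step`, the floor of one LSB on the step, and resetting the count to zero after every halving or doubling are our assumptions, since the flowchart of Fig. 3.3 is not reproduced here. Only the halving on a polarity change, the increment when the direction is unchanged, and the boundary value of eight come from the text.

```python
def phase_tracking_step(step: int, speedup_count: int, polarity_changed: bool,
                        boundary: int = 8) -> tuple:
    """One phase-tracking update; returns (new_step, new_speedup_count).

    Resetting the count to zero after halving or doubling is an assumption.
    """
    if polarity_changed:
        # LAG->LEAD or LEAD->LAG: halve the step (never below one LSB)
        return max(1, step // 2), 0
    speedup_count += 1                 # same search direction again
    if speedup_count == boundary:
        return step * 2, 0             # boundary reached: double the step
    return step, speedup_count
```

Starting from a step of 1, seven same-direction updates only raise the count. The eighth update doubles the step to 2.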

Due to the PFD dead zone and reference clock noise, the DCO control code still varies slightly even after frequency and phase have been locked. To reduce jitter, the proposed ADPLL uses an averaging mechanism to eliminate these non-ideal effects. First, the ADPLL controller detects the maximum and minimum of the DCO control code within 256 reference clock cycles and then takes the average of these two values. This average becomes the average DCO control code (avg_code [16:0]) for the average DCO. Free of tracking noise, the ADPLL generates a more stable, low-jitter output clock.

Fig. 3.4: TDC-based ADPLL architecture.
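The averaging step described above reduces to a midpoint of the extreme codes seen in the observation window. The following minimal sketch rests on our own assumptions: the window is the first 256 samples handed to it, and the division by two is an integer right shift that truncates, as a hardware implementation typically would. The function name is hypothetical.

```python
def average_dco_code(codes, window: int = 256) -> int:
    """Midpoint of the max and min control codes over the first `window` cycles."""
    if len(codes) < window:
        raise ValueError("need at least `window` control-code samples")
    recent = codes[:window]
    # (max + min) / 2 as a right shift; truncates for non-negative codes
    return (max(recent) + min(recent)) >> 1
```

For a tracking code that wanders between 99 and 102, the average DCO receives code 100.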

3.3 The Proposed TDC-Based ADPLL

Locking time is the most critical design specification for fast-locking applications. To achieve fast locking, a TDC-based ADPLL is proposed in this section. In the proposed architecture, the locking procedure is divided into two modes: coarse locking and fine locking. Phase locking starts in coarse locking mode, in which the TDC quickly calculates the DCO control code closest to the desired frequency. Because the TDC converts the input clock period into multiples of the delay-cell delay time, the ADPLL controller can use this period information to jump quickly to the desired frequency. After coarse locking is completed, the ADPLL enters fine locking mode and removes the residual frequency and phase error with the binary search algorithm described in the previous section. As a result, adding the TDC module significantly reduces the overall lock-in time.

Fig. 3.5: Counter-based TDC.

Fig. 3.4 illustrates the proposed TDC-based ADPLL architecture. It consists of several functional blocks: a TDC, a phase/frequency detector (PFD), an ADPLL controller, a DCO, and two frequency dividers (pre-divider and DCO divider). The signal DCO_M is the DCO output divided by M in the DCO divider, and Ref_N is the reference clock divided by N. Once the ADPLL is enabled, the TDC provides the coarse DCO control code (TDC_code [5:0]) to the ADPLL controller after two reference clock cycles, and the DCO then generates the desired frequency from this coarse control code. After the TDC operation is completed, the PFD generates a “lead” or “lag” signal depending on the phase and frequency difference between Ref_N and DCO_M. If DCO_M leads Ref_N, the PFD generates a “lead” signal to slow down the DCO. Conversely, when DCO_M lags Ref_N, the PFD generates a “lag” signal to speed up the DCO. When the ADPLL controller receives “lead” or “lag” from the PFD, it updates the DCO control code (DCO_code [13:0]), which in turn controls the DCO to generate the output clock (DCO_CLK). These blocks form a closed loop to achieve the “phase-locked” function.

Because the proposed TDC-based ADPLL uses the novel 2-level flash TDC, coarse locking takes only two input clock cycles. In fine locking mode, the worst-case lock time of the binary search algorithm [11], in terms of input clock cycles, is

$$T_L = 2 \times \log_2\left(2^N\right) - 1 = 2N - 1 \qquad (3.1)$$

where N is the number of bits of the DCO control code. In the proposed ADPLL, the DCO control code is 14 bits. As a result, the entire phase locking procedure takes 29 clock cycles: 2 cycles of TDC operation and 27 cycles (N = 14) of fine-tuning phase locking.

Fig. 3.6: (a) Single delay chain flash TDC. (b) Operation of single delay chain flash TDC.
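The cycle budget above is plain arithmetic on Eq. (3.1), and it can be checked in a few lines. The function names below are illustrative only, and the 2-cycle TDC cost is taken from the text as a default parameter.

```python
def fine_lock_cycles(code_bits: int) -> int:
    """Eq. (3.1): worst-case binary-search lock time, 2*log2(2^N) - 1 = 2N - 1."""
    return 2 * code_bits - 1


def worst_case_lock_cycles(code_bits: int, tdc_cycles: int = 2) -> int:
    """Coarse TDC cycles plus worst-case fine-locking cycles."""
    return tdc_cycles + fine_lock_cycles(code_bits)


print(fine_lock_cycles(14), worst_case_lock_cycles(14))  # prints: 27 29
```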

3.4 Time-to-Digital Converter

3.4.1 TDC Overview

Time-to-digital converters have been widely used in measurement systems, temperature sensors, and communication systems [23]-[25]. Because a TDC converts time information into a digital code, it is an essential component at the interface between analog and digital signals. Many approaches have been proposed to implement a TDC [1], [23]-[25]. The counter-based TDC uses a high-frequency clock or a multi-phase clock to sample the time interval and converts it into multiples of the period of the high-frequency sampling clock, as shown in Fig. 3.5 [1]. The design concept of the counter-based TDC is very straightforward, but its power consumption is very high due to the high-frequency counter.
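As an illustration of the counter-based principle, the following idealized Python model counts the rising edges of the sampling clock that fall inside the measured interval. The function names and the `clk_phase_ps` parameter are our own. The model ignores metastability and counter timing, and it also shows the well-known one-count uncertainty caused by the unknown phase of the sampling clock.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)


def counter_tdc(start_ps: int, stop_ps: int, clk_period_ps: int, clk_phase_ps: int = 0) -> int:
    """Count sampling-clock rising edges (at clk_phase_ps + k*clk_period_ps) in [start_ps, stop_ps)."""
    return (ceil_div(stop_ps - clk_phase_ps, clk_period_ps)
            - ceil_div(start_ps - clk_phase_ps, clk_period_ps))


# Same 1000 ps interval, 300 ps sampling clock, two different clock phases
print(counter_tdc(0, 1000, 300), counter_tdc(0, 1000, 300, clk_phase_ps=100))  # prints: 4 3
```

Finer resolution requires a shorter sampling period. That faster counter is the source of the high power consumption noted above.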

Another approach is the flash TDC, which is analogous to the flash analog-to-digital converter used for voltage amplitude encoding. It operates by comparing a signal edge to various reference edges, all displaced in time [23], [24]. The elements that compare the input signal to the references are usually flip-flops. In the single delay chain flash TDC shown in Fig. 3.6 (a), each buffer produces a delay equal to t. Suppose the period of the input clock is to be determined with the eight-buffer converter in Fig. 3.6 (b). Each flip-flop compares the time displacement of a delayed copy of the first rising edge with the first falling edge of the input clock. The thermometer-coded output gives the measured interval as a number of buffer delays, assuming the flip-flops are given sufficient time to resolve. The drawback of this implementation is that the resolution cannot be finer than a single gate delay. In addition, when the input clock frequency is low, many flip-flops and buffers are required to cover the long clock period, which results in large power consumption and hardware cost.
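The operation of Fig. 3.6 (b) can be modeled with ideal flip-flops. In the following sketch, times are integers in arbitrary units, and flip-flop k samples the output of the k-th buffer at the end of the interval (the falling edge). Metastability is ignored, and the function name is hypothetical.

```python
def flash_tdc(interval: int, buffer_delay: int, stages: int = 8) -> tuple:
    """Ideal single-delay-chain flash TDC.

    Flip-flop k (1-based) samples the k-th buffer output at the end of the
    interval, so it reads 1 when the delayed edge has already arrived.
    """
    thermometer = [1 if k * buffer_delay <= interval else 0 for k in range(1, stages + 1)]
    return thermometer, sum(thermometer)   # sum = thermometer-to-binary result


print(flash_tdc(35, 10))   # prints: ([1, 1, 1, 0, 0, 0, 0, 0], 3)
```

The result saturates at `stages`. Covering a longer interval with the same resolution therefore needs proportionally more buffers and flip-flops, which is the cost problem described above.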

To enhance resolution, the flash converter can be constructed with a Vernier delay line, as shown in Fig. 3.7 [25]. This architecture achieves a resolution of t1 - t2, where t1 > t2. However, the power and area issues remain when the sampled clock has a low frequency.
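A Vernier line reaches the finer resolution t1 - t2 because the stop edge gains t1 - t2 on the start edge at every stage. Fig. 3.7 is not reproduced here, so the following sketch assumes the textbook arrangement: the start edge travels the slow line and the stop edge travels the fast line. The flip-flops and integer time units are idealized, and the function name is hypothetical.

```python
def vernier_tdc(interval: int, t1: int, t2: int, stages: int = 8) -> tuple:
    """Ideal Vernier delay-line TDC with per-stage delays t1 > t2.

    The start edge travels the slow line (t1 per stage), the stop edge the fast
    line (t2 per stage). Flip-flop k reads 1 while the start edge still arrives
    no later than the stop edge, i.e. k*(t1 - t2) <= interval.
    """
    if t1 <= t2:
        raise ValueError("Vernier TDC needs t1 > t2")
    thermometer = [1 if k * t1 <= interval + k * t2 else 0 for k in range(1, stages + 1)]
    return thermometer, sum(thermometer)


# 7 time units measured with t1 = 12 and t2 = 10: resolution t1 - t2 = 2
print(vernier_tdc(7, 12, 10))   # prints: ([1, 1, 1, 0, 0, 0, 0, 0], 3)
```

Each stage covers only t1 - t2 of the interval, so the number of stages grows even faster for long, low-frequency periods. This matches the power and area concern noted above.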

Because the proposed TDC-based ADPLL uses the TDC only to lock the input clock frequency coarsely, high resolution is not the design target of the TDC. Instead, reducing the power and circuit complexity of the TDC is the more important design issue for the fast lock-in ADPLL application.

3.4.2 The Proposed 2-Level Flash TDC

As mentioned in the previous subsection, the single-level flash TDC needs a large number of flip-flops, which increases power consumption and design cost. In contrast, the proposed 2-level flash TDC uses only 12 D-flip-flops (8 + 4), as shown in Fig. 3.8, and thus has lower hardware complexity and power consumption.

The TDC consists of several functional blocks, namely a 1st level flash TDC, a 2nd level flash TDC, a delay selection multiplexer, and a period calculator. The 1st level flash TDC consists of 4 D-flip-flops and 4 large delay cells, each with a delay equal to eight small delay cells (8t). The 2nd level flash TDC, by contrast, has 8 small delay cells and 8 D-flip-flops. The small delay cells used in the 1st and 2nd level flash TDCs are the same as those in the DCO coarse-tuning stage.

Fig. 3.8: The proposed 2-level flash TDC architecture.

When the TDC is enabled, Ref_N is sent to the 1st level flash TDC, and the input signal propagates through the 4 large delay cells. When the first falling edge of Ref_N arrives, the D-flip-flops sample the outputs of the large delay cells, and one of the large delay cell outputs is selected for the 2nd level flash TDC. All D-flip-flop outputs (Q1 [3:0]) are also sent to the thermometer-to-binary converter to generate the 1st level flash TDC output (L1_SEL). The 2nd level flash TDC then generates the delay selection signal (L2_SEL) from its sampled delay outputs (Q2 [7:0]). Because the outputs of the 1st and 2nd level flash TDC sections are thermometer-coded, the selection signals are easy to generate. After both L1_SEL and L2_SEL have been generated, the period calculator estimates the period of Ref_N from these values. The conversion equation is given by

$$T_r = \left(\mathrm{L1\_SEL} \times 8 + \mathrm{L2\_SEL}\right) \times 2 \qquad (3.2)$$

where Tr is the period of Ref_N in units of the small delay cell delay. For example, as shown in Fig. 3.9, if the period equals 36 delay cell delays, L1_SEL and L2_SEL are both 2, since (2 × 8 + 2) × 2 = 36. To reduce lock-in time, the TDC measures only half the period of Ref_N. The calculated value is therefore shifted left by one bit (the factor of 2 in Eq. (3.2)) to obtain the full period of Ref_N.
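The two-level conversion can be checked against the Fig. 3.9 example with an idealized model. In the following sketch, `half_period` is the high time of Ref_N in units of the small delay t. When L1_SEL is 0, the multiplexer is assumed to pass the undelayed Ref_N to the 2nd level. The function name and the default parameters (4 large cells of 8t and 8 small cells, as in Fig. 3.8) are illustrative, and the flip-flops are ideal.

```python
def two_level_flash_tdc(half_period: int, coarse_cells: int = 4,
                        fine_cells: int = 8, ratio: int = 8) -> tuple:
    """Ideal model of the 2-level flash TDC; returns (L1_SEL, L2_SEL, Tr).

    half_period is the high time of Ref_N in units of the small delay t;
    each large delay cell equals `ratio` small cells (8t).
    """
    # 1st level: flip-flop k reads 1 if k large cells fit in the half period
    q1 = [1 if k * ratio <= half_period else 0 for k in range(1, coarse_cells + 1)]
    l1_sel = sum(q1)                       # thermometer-to-binary
    base = l1_sel * ratio                  # delay of the selected large-cell output (assumed 0 = input)
    # 2nd level: small cells continue from the selected tap
    q2 = [1 if base + j <= half_period else 0 for j in range(1, fine_cells + 1)]
    l2_sel = sum(q2)
    tr = (l1_sel * ratio + l2_sel) << 1    # Eq. (3.2): shift left to get the full period
    return l1_sel, l2_sel, tr


print(two_level_flash_tdc(18))   # prints: (2, 2, 36)
```

A half period of 18t reproduces the example in the text: L1_SEL = 2, L2_SEL = 2, and Tr = 36.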

The TDC takes only two reference clock cycles to complete the coarse lock-in operation. In simulations with a 0.13µm CMOS standard cell library, the TDC resolution equals the delay time of one delay cell (165ps), and the frequency error in the lock-in state is 3.3% at 200MHz.

Fig. 3.9: Simulation of 2-level flash TDC.
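As a quick consistency check (our own arithmetic, not a result reported above), dividing one 165ps resolution step by the 5ns period of a 200MHz clock gives the same figure as the reported frequency error. This suggests that the residual error corresponds to roughly one delay-cell step:

$$\frac{165\,\mathrm{ps}}{1/(200\,\mathrm{MHz})} = \frac{165\,\mathrm{ps}}{5000\,\mathrm{ps}} = 0.033 = 3.3\%$$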

In the proposed TDC-based ADPLL architecture, once frequency lock is achieved, the frequency of Ref_N equals the frequency of the DCO output divided by M (DCO_M). The delay time of the coarse-tuning stage in the DCO therefore equals Tr divided by M. To reduce the hardware complexity of this division, we propose a simple method that approximates its result in two steps.

First, if the division ratio M is a power of two, the division is simply a right shift. If not, we extract MS, the bit position of the MSB of M (so that 2^MS ≤ M < 2^(MS+1)), and ML = MS + 1. Second, Tr is shifted right by MS and by ML, and the TDC output is the average of these two shifted values (TL and TS). For example, if M = 6, MS and ML are 2 and 3, respectively. With Tr = 36 from the earlier example, the shifted values are 9 and 4, and their average, truncated to an integer, is 6, which equals 36/6. As a result, the division can be approximated at a small hardware cost.

Fig. 3.10: Transient response of binary search ADPLL.
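The shift-and-average approximation can be sketched as follows. The function name is hypothetical, and truncating the final average with a right shift is our assumption, chosen because it reproduces the M = 6 example (9 and 4 average to 6).

```python
def approx_divide(tr: int, m: int) -> int:
    """Approximate tr / m with shifts, following the two-step method above."""
    if m <= 0:
        raise ValueError("division ratio must be positive")
    ms = m.bit_length() - 1                # MS: position of the MSB of M
    if m & (m - 1) == 0:                   # M is a power of two: exact right shift
        return tr >> ms
    ml = ms + 1                            # ML = MS + 1
    # average of the two shifted values; the final >> 1 truncates
    return ((tr >> ms) + (tr >> ml)) >> 1


print(approx_divide(36, 6))   # prints: 6
```

For the M = 10 used in Section 3.5, the same rule gives (Tr/8 + Tr/16)/2 = 3Tr/32, roughly Tr/10.7. The coarse code is therefore only approximate, and in the described architecture fine locking then removes the residual frequency error.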

3.5 Experimental Results

The proposed ADPLLs are designed and implemented with a 0.13µm CMOS standard cell library and a cell-based design flow. The architecture is modeled in a hardware description language (HDL) and functionally verified with the NC-Verilog simulator. In addition, the transistor-level simulator HSPICE is used to verify the performance of the timing-critical circuits, including the DCO, PFD, and TDC.

To achieve high performance and low power, the proposed binary search ADPLL and TDC-based ADPLL both use the ultra-low-power DCO described in Chapter 2.

The simulation of the binary search ADPLL is shown in Fig. 3.10. The reference clock frequency is 20MHz and the division ratio is 10, so the ADPLL output clock frequency is 200MHz (= 20MHz × 10). When the ADPLL controller receives the “lead” or “lag” signal from the PFD, the DCO control code is increased or decreased, respectively, and the DCO frequency changes accordingly.

Fig. 3.11: Transient response of TDC-based ADPLL.

Fig. 3.10 shows that both the tracking DCO control code and the average DCO control code converge to stable values, completing the lock function.

Fig. 3.11 shows the transient response of the proposed TDC-based ADPLL, where the reference clock is 20MHz and the division ratio (M) is 10, so the output frequency is 200MHz (= 20MHz × 10). The TDC takes 2 reference clock cycles to complete the coarse lock-in operation, and the fine locking mode then takes 27 cycles to align the phase. After the TDC operation is completed, the DCO control code is adjusted according to the PFD output to generate the desired DCO output frequency. As shown in Fig. 3.11, the DCO control code converges to a stable value, completing the lock function. The simulation results show that the power consumption is 250µW at 200MHz and 1.2V.

3.6 Summary

In this chapter, the binary search algorithm and the proposed TDC-based ADPLL have been presented. Because the novel 2-level flash TDC reduces the locking time of the TDC-based ADPLL to 29 input clock cycles, the ADPLL is very suitable for fast lock-in applications. The 2-level architecture also significantly reduces the hardware cost of the proposed TDC. In addition, since all designs of the proposed ADPLL

From the document: Low-Power All-Digital Clock Generator for System-on-Chip Applications