Chapter 5 All-Digital Delay-Locked Loop Design

5.3 ADDLL Circuit Design

5.3.1 Digitally Controlled Delay Line

According to its requirements, the ADDLL has to provide a 4-phase clock signal with equal delay spacing within a single input clock cycle. Thus, the design challenge of the delay line in the ADDLL is to achieve high delay resolution and high speed at the same time [37]. The proposed DCDL has four duplicated delay stages, each of which has one coarse-delay stage (CDS) and one fine-delay stage (FDS), as shown in Fig. 5.5(a). The minimum delay of each delay stage should be shorter than 1/4 of the clock period to provide a 90° phase-shifted signal within the same clock cycle. The proposed DCDL employs this cascade-stage structure to achieve high delay resolution and high speed at the same time [34]. Each CDS has 16 coarse-delay cells (CDCs), each consisting of one buffer and one multiplexer, and the coarse-tuning control code (C[15:0]) selects the propagation path through the CDCs [41]. The intrinsic delay of the CDS is only the gate delay of one multiplexer plus the interconnect delay, as shown in Fig. 5.5(b).

Fig. 5.5: (a) Proposed DCDL. (b) Coarse-tuning stage. (c) Fine-tuning stage.

In order to achieve better delay resolution, a hysteresis delay cell (HDC) and 16 digitally controlled varactors (DCVs) are added, as shown in Fig. 5.5(c). When the tri-state inverter of the HDC is enabled (F[0] is high), its output exhibits hysteresis during signal transitions, which produces a different delay time. The gate capacitance of each DCV can be changed slightly by the fine-tuning control code (F[16:1]) to obtain high delay resolution in the FDS. Because the tri-state holder cell provides a larger delay than a DCV, it can replace many DCVs, which reduces the power consumption and the intrinsic delay while ensuring that the delay range of the FDS covers the minimum delay step of a CDC, so that the dead zone stays smaller than the delay resolution of the FDS. As a result, the overall intrinsic delay of the DCDL is reduced by the CDC and the tri-state holder. The simulation results show that the minimum delay resolution of one FDS is 4 ps; hence, with four cascaded delay stages, the total delay resolution of the DCDL is 16 ps. In order to enlarge the phase-shift range of the DCPS, the gain of its control code is four; thus the minimum tuning delay of the DCPS is 16 ps.
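The resolution arithmetic in this paragraph can be sketched in a few lines of Python. This is only an illustration, not the author's model: the function names are hypothetical, and it assumes that all four stages share one fine-tuning code (so one code step moves the total delay by four FDS steps) and that the DCPS step is simply the control-code gain times the FDS step.

```python
# Values quoted in the text.
FDS_STEP_PS = 4       # minimum delay step of one fine-delay stage (FDS)
NUM_STAGES = 4        # duplicated CDS + FDS stages in the DCDL
DCPS_CODE_GAIN = 4    # gain applied to the DCPS control code


def dcdl_step_ps(fds_step_ps=FDS_STEP_PS, num_stages=NUM_STAGES):
    """Total DCDL delay step, assuming every stage shares one fine code."""
    return fds_step_ps * num_stages


def dcps_step_ps(fds_step_ps=FDS_STEP_PS, gain=DCPS_CODE_GAIN):
    """Minimum DCPS tuning step, assuming step = gain * FDS step."""
    return fds_step_ps * gain


def phase_tap_delays_ps(clock_period_ps, num_stages=NUM_STAGES):
    """Ideal delays of P90..P360 when the line is locked to one period."""
    stage_delay = clock_period_ps / num_stages
    return [stage_delay * (k + 1) for k in range(num_stages)]


if __name__ == "__main__":
    print(dcdl_step_ps())               # 16
    print(dcps_step_ps())               # 16
    print(phase_tap_delays_ps(2500.0))  # [625.0, 1250.0, 1875.0, 2500.0]
```

At 400 MHz (a 2500 ps period) each stage must settle at 625 ps, which is why the minimum delay of a stage has to stay below a quarter of the clock period.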

Fig. 5.7: Layout of ADDLL and DCPS.


5.3.2 Time-to-Digital Converter

Fig. 5.6(a) illustrates the architecture of the proposed TDC. The period of the input clock is quantized by 4 CDCs and converted to the TDC control code (TDC_CODE), as shown in Fig. 5.6(b). Pulse_Start and Pulse_End rise at the first and second rising edges of the input clock, respectively. The dummy intrinsic delay chain, which contains 4 FDSs set to minimum delay and one multiplexer, is the same as the minimum-delay path of the DCDL. Because the total delay of the DCDL consists of the intrinsic delay and the tunable delay of the delay cells, Pulse_Start first passes through the dummy intrinsic delay chain in front of the CDC chain, and the delay between the delayed Pulse_Start (Pulse_Start_D) and Pulse_End is then quantized by 4 CDCs and converted to the TDC control code. As a result, the effect of the intrinsic delay is removed, which improves the precision of quantization and conversion. Additionally, Pulse_Start and Pulse_End toggle only once after the system is reset.

Fig. 5.8: (a) Transient response of ADDLL. (b) ADDLL at steady state.
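The quantization described above can be sketched as follows. This is a behavioral illustration under stated assumptions, not the actual TDC circuit: the function name, the 50 ps CDC delay, the 500 ps intrinsic delay, and the clamp at code 16 (taken from the 16 CDCs per CDS) are all hypothetical, and "quantized by 4 CDCs" is read here as a quantization step of four CDC delays.

```python
def tdc_code(clock_period_ps, intrinsic_delay_ps, cdc_delay_ps,
             cdcs_per_step=4, max_code=16):
    """Quantize the tunable part of one clock period.

    The dummy intrinsic delay chain is modeled by subtracting
    intrinsic_delay_ps before quantization. The step size is assumed
    to be cdcs_per_step * cdc_delay_ps.
    """
    if cdc_delay_ps <= 0:
        raise ValueError("cdc_delay_ps must be positive")
    tunable_ps = clock_period_ps - intrinsic_delay_ps
    if tunable_ps <= 0:
        return 0
    code = int(tunable_ps // (cdcs_per_step * cdc_delay_ps))
    return min(code, max_code)


if __name__ == "__main__":
    # Hypothetical numbers: 400 MHz clock, 500 ps intrinsic delay, 50 ps CDC.
    print(tdc_code(2500.0, 500.0, 50.0))  # 10
    # Ignoring the intrinsic delay would overestimate the code.
    print(tdc_code(2500.0, 0.0, 50.0))    # 12
```

The second call shows why the dummy chain matters in this model: without it, the code would include delay that the DCDL already contributes at its minimum setting.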

5.4 Experimental Results and Comparisons

The proposed design is implemented with a 0.13µm CMOS standard cell library. The layout of the ADDLL and DCPS is shown in Fig. 5.7, and the areas of the ADDLL and DCPS are 0.026 mm² and 0.01 mm², respectively. The proposed ADDLL and DCPS are designed and implemented with a cell-based design flow; the proposed architecture and lock-in algorithm are therefore modeled in a hardware description language (HDL) and functionally verified with the NC-Verilog simulator. Fig. 5.8(a) shows the locking procedure of the ADDLL after the system is reset. The entire phase-locking procedure takes 13 clock cycles. Fig. 5.8(b) shows the proposed ADDLL at steady state. When the ADDLL is locked, the generated 4-phase clock signals are equally spaced within one input clock period. Thus the phase shift between P90 and P360 is 1/4 clock period.

Fig. 5.9: Tunable signal phase scheme in read operation when (a) DQS leads DQ. (b) DQS lags DQ.

The proposed designs have been verified by HSPICE post-layout simulation at 1.2 V. The simulation results of the proposed tunable phase-shift scheme show that the delayed DQS (DQS_D) can be adjusted toward the center of the DQ period whether DQS leads or lags DQ; as a result, the scheme can eliminate the delay mismatch caused by multi-chip interconnections, as shown in Fig. 5.9. The tunable range of the phase shift is from -600 ps to +400 ps. For DDR2 400/800 applications, the operation range of the proposed ADDLL is from 200 MHz to 400 MHz, and the simulation results show that the total power consumption is 5.5 mW and the peak-to-peak period jitter is 20 ps at 400 MHz. The phase difference between CLOCK1 (P360) and CLOCK2 (P90) is 634 ps at 400 MHz; hence the phase-shift error is 1.3° (compared with 90°), as shown in Fig. 5.10. Fig. 5.11 shows the phase shift and peak-to-peak period jitter of the ADDLL under different PVT conditions and input clock frequencies. Table 5.1 compares the proposed design with state-of-the-art ADDLLs for clock generation in DDR controller applications. The proposed ADDLL has the shortest locking time, the smallest phase-shift error, and the lowest power consumption compared with the other ADDLL designs. Furthermore, the proposed ADDLL not only has good portability, but also provides the 90° phase-shift clock within the same clock cycle.

Fig. 5.10: Phase shift between CLOCK1 and CLOCK2 at 400MHz.
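The 1.3° figure quoted above follows directly from the measured 634 ps delay and the 2500 ps period at 400 MHz. The short Python sketch below reproduces that arithmetic; the function name is hypothetical and is used only for illustration.

```python
def phase_shift_error_deg(measured_delay_ps, clock_mhz, target_deg=90.0):
    """Convert a measured delay to degrees and subtract the target phase."""
    period_ps = 1e6 / clock_mhz                      # 400 MHz -> 2500 ps
    measured_deg = measured_delay_ps / period_ps * 360.0
    return measured_deg - target_deg


if __name__ == "__main__":
    err = phase_shift_error_deg(634.0, 400.0)
    print(f"{err:.2f}")  # 1.30
```

Equivalently, the ideal 90° delay is 625 ps, so the 9 ps excess corresponds to 9 / 2500 × 360 ≈ 1.3°.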

5.5 Summary

In this chapter, a tunable phase-shift scheme based on a fast-lock portable ADDLL and a tunable DCPS is presented for the timing block of a DDR interface. The proposed ADDLL, which employs the high-performance DCDL and TDC, achieves fast phase lock and keeps the phase-shift error small compared with other ADDLLs. The proposed phase-shift scheme provides an all-digital solution for eliminating the non-ideal effects of data transmission across multi-chip interconnections, especially in high-data-rate interconnection applications.

Fig. 5.11: Jitter and phase shift of ADDLL under different PVT.

Table 5.1: ADDLL Performance Comparisons

Performance Indices               Proposed ADDLL [40]   VLSI-DAT'06 [37]   CICC'07 [38]    E.LETTERS'08 [39]
Process                           0.13µm CMOS           0.13µm CMOS        0.13µm CMOS     0.18µm CMOS
Supply Voltage (V)                1.2                   1.2                1.2             1.8
Lock Time (clock cycles)          13                    NA                 40              < 80
Operation Range (MHz)             200 ~ 400             100 ~ 200          333.5 ~ 800     510 ~ 1100
P2P Jitter (ps)                   20 @400MHz            950 @100MHz        40 @800MHz      20.4 @800MHz
Phase Error (degrees)             1.3                   5.47 (7.6%)        2               NA
Power Consumption (mW)            5.5 @400MHz           9 @200MHz          19.2 @800MHz    12 @800MHz
Phase Shift within Single Cycle   Yes                   No                 Yes             Yes
Portability                       Yes                   Yes                No              No

Chapter 6

All Digital Synchronous Mirror Delay Design

6.1 Introduction

As the operating frequency of electronic systems increases, clock de-skew circuits have been widely used for clock synchronization in System-on-Chip (SoC) applications. A synchronous mirror delay (SMD) is composed of a clock driver, which drives the large clock load on the chip, and a skew-compensation circuit, which compensates for the clock skew induced by the clock driver. In contrast to the phase-locked loop (PLL) and the delay-locked loop (DLL), the SMD is more suitable for applications that require fast locking and low power consumption because of its simple circuit structure [9], [42]-[52]. However, the static phase error between the input and output clocks is hard to reduce in the conventional SMD, owing to its low delay resolution.

Many SMDs have been proposed to reduce the static phase error. The interleaved SMD uses an interleaving scheme to reduce the static phase error, but at the cost of increased circuit complexity and power consumption [44], [45]. The successive approximation register (SAR) SMD uses a phase blender to improve the delay resolution; however, it takes a long lock-in time [47]. Besides, the conventional SMD accepts only a pulsed clock signal to ensure functionality, implying that the input clock needs to be modulated if its duty cycle is not suitable. An arbitrary-duty-cycle SMD [48], [49] can accept a wide input duty cycle range, but signal conflicts may occur when a high-frequency clock propagates through the long delay line. A brief summary of the different SMD approaches is given in Table 6.1.

In this chapter, the proposed all-digital SMD (ADSMD) uses an edge-trigger mirror delay cell (EMDC) and a blocking edge-trigger scheme to widen the input duty cycle range and avoid signal conflicts. Furthermore, the proposed fine-tuning delay line (FTDL) and delay-matching structure reduce the overall static phase error. As a result, the proposed ADSMD achieves a wide input duty cycle range while keeping the phase error small [52].

This chapter is organized as follows. Section 6.2 describes the basic concept and operation of the conventional SMD. The proposed ADSMD architecture and circuit design, including the delay-matching structure, blocking edge-trigger scheme, EMDC, and FTDL, are described in Section 6.3. In Section 6.4, the experimental results of the proposed design are presented. Finally, a brief summary is given in Section 6.5.

Table 6.1: Comparisons of Different SMD Approaches

Performance Indices   Interleaved SMD [44], [45]   SAR SMD [47]   Arbitrary Duty Cycle SMD [48], [49]
Static Phase Error    Large                        Small          Large
Lock-In Time          Short                        Long           Short
Duty Cycle Range      Narrow                       Narrow         Wide
Power/Complexity      High                         Medium         Medium

6.2 SMD Overview

The schematic diagram of the conventional SMD is shown in Fig. 6.1. It consists of an input buffer (IB) with delay $T_{d1}$, a clock driver (CD) with delay $T_{d2}$, a forward delay line (FDL), a backward delay line (BDL), and a mirror control circuit (MCC). A pulsed clock propagates forward through the FDL for a time of $T_{ck} - T_{d1} - T_{d2}$, and then propagates backward through the BDL in the opposite direction, where $T_{ck}$ is the input clock cycle time. As a result, the total delay time is

$$T_{d1} + (T_{d1} + T_{d2}) + (T_{ck} - T_{d1} - T_{d2}) + (T_{ck} - T_{d1} - T_{d2}) + T_{d2} = 2T_{ck}.$$

For the NAND-type mirror delay cell (MDC) in the MCC to operate correctly, the input clock should be modulated into a narrow-pulse clock so that the two inputs of the MDC do not overlap at logic high within the first input clock cycle. The accuracy of the phase alignment of the SMD is dominated by the delay resolution of the delay cells in the FDL and BDL. Besides, because the gate delay of the MDC is neglected in the delay formula, it further increases the phase error of the SMD after two clock cycles.

Fig. 6.1: Architecture of the conventional SMD.
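The delay budget of the conventional SMD can be checked numerically. The sketch below adds up the terms of the total-delay formula in the order given in the text. The function name, the example delays, and the `t_mdc` parameter (a hypothetical stand-in for the MDC gate delay that the formula neglects) are assumptions made for illustration only.

```python
def smd_total_delay(tck, td1, td2, t_mdc=0.0):
    """Sum the delay terms of the conventional SMD.

    tck: input clock period; td1: input buffer delay; td2: clock driver
    delay. t_mdc models the MDC gate delay, which the formula in the
    text treats as zero.
    """
    fdl = tck - td1 - td2   # forward propagation time in the FDL
    bdl = fdl               # the BDL mirrors the FDL propagation time
    return td1 + (td1 + td2) + fdl + t_mdc + bdl + td2


if __name__ == "__main__":
    # Hypothetical delays in picoseconds at a 400 MHz input clock.
    print(smd_total_delay(2500.0, 200.0, 300.0))              # 5000.0
    # A non-zero MDC delay appears directly as phase error.
    print(smd_total_delay(2500.0, 200.0, 300.0, 80.0) - 5000.0)  # 80.0
```

With the MDC delay ignored, the terms cancel to exactly two clock periods; any delay left out of the formula shows up one-for-one as static phase error in this model.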

Fig. 6.2: (a) Architecture of the proposed SMD (b) Circuit of EMDC. (EMDC: Edge-Trigger Mirror Delay Cell; FTC: Fine-Tuning Control Code.)

6.3 The Proposed ADSMD Design

Fig. 6.2(a) illustrates the architecture of the proposed ADSMD, which consists of several major functional blocks: a dummy delay line (DDL), an FDL, an MCC, a BDL, an FTDL, a phase detector, and a timing controller. The circuit of the EMDC is shown in Fig. 6.2(b) [52]. Compared with the conventional SMD, the DDL of the proposed delay-matching structure also contains an EMDC and an FTDL to compensate for the delays of the EMDC and FTDL. As a result, the total delay time is

$$T_{d1} + (T_{d1} + T_{d2} + T_{d3} + T_{d4}) + (T_{ck} - T_{d1} - T_{d2} - T_{d3} - T_{d4}) + T_{d2} + (T_{ck} - T_{d1} - T_{d2} - T_{d3} - T_{d4}) + T_{d3} + T_{d4} = 2T_{ck},$$

where the additional terms $T_{d3}$ and $T_{d4}$ account for the delays of the EMDC and the FTDL. The locking procedure is divided into coarse and fine locking. The coarse locking takes two clock cycles, the same as the conventional design, and its maximum phase error is the delay resolution of the FDL and BDL. The remaining phase error is further reduced by the FTDL, which is controlled by a 3-bit fine-tuning control code (FTC). In the fine locking, the timing controller changes the FTC every two clock cycles, based on the UP/DN signal from the phase detector, to adjust the delay of the FTDL and align the phase of the external clock (EXT_CLK) and the internal clock (INT_CLK). As a result, the entire locking procedure takes 10 clock cycles (2 + 2 × 4).

Fig. 6.3: Block diagram and equivalent circuit of DCV.
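The delay-matching identity and the lock-time count above can be sketched in Python. This is a behavioral illustration only: the function names are hypothetical, the example delays are made up, and the fine-locking loop assumes a simple one-LSB-per-update up/down search starting from FTC = 0 with a 10 ps step, which the text does not specify.

```python
def adsmd_total_delay(tck, td1, td2, td3, td4):
    """Sum the ADSMD delay terms in the order given in the text."""
    ddl = td1 + td2 + td3 + td4   # dummy delay line
    fdl = tck - ddl               # forward propagation time
    bdl = fdl                     # mirrored backward propagation time
    return td1 + ddl + fdl + td2 + bdl + td3 + td4


def lock_cycles(coarse_cycles=2, cycles_per_update=2, fine_updates=4):
    """Coarse lock plus four FTC updates, one every two clock cycles."""
    return coarse_cycles + cycles_per_update * fine_updates


def fine_lock(residual_ps, step_ps=10.0, ftc_bits=3, fine_updates=4):
    """Assumed up/down search: move the FTC by one LSB per update."""
    max_code = 2 ** ftc_bits - 1
    ftc = 0
    for _ in range(fine_updates):
        error = residual_ps - ftc * step_ps
        if error > 0:      # UP: internal clock still early, add delay
            ftc = min(ftc + 1, max_code)
        elif error < 0:    # DN: too much delay, remove one step
            ftc = max(ftc - 1, 0)
    return ftc, residual_ps - ftc * step_ps


if __name__ == "__main__":
    print(adsmd_total_delay(2500.0, 200.0, 300.0, 100.0, 50.0))  # 5000.0
    print(lock_cycles())                                         # 10
    print(fine_lock(35.0))                                       # (4, -5.0)
```

In this model a 35 ps residual left by coarse locking is reduced to within one fine step after four updates, which matches the 2 + 2 × 4 cycle count in the text.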

Typically, the delay resolution of the FDL is one AND-gate delay, which is about several hundred picoseconds depending on the technology. In order to achieve high delay resolution, the proposed FTDL employs a digitally controlled varactor (DCV) whose gate capacitance can be changed slightly by the FTC, which changes the output loading of the driving buffer and hence the delay of the FTDL, as shown in Fig. 6.3 [34]. As a result, the overall delay resolution of the SMD is improved from several hundred picoseconds to ten picoseconds.

Fig. 6.4: Timing waveform (a) without blocking scheme (b) with blocking scheme.

To increase the input duty cycle range, the proposed SMD uses the EMDC to detect level changes at the outputs of successive delay cells in the FDL [49]. However, depending on the system requirements, the FDL and BDL may need to be lengthened to cover a wide operating frequency range. In that case, more than one EMDC output can be at logic low when a high-frequency clock propagates through the long FDL, making the SMD operation unstable, as shown in Fig. 6.4(a). The proposed blocking edge-trigger scheme uses a blocking signal (BLK), which is set low at the second rising edge of IB_OUT, to block clock propagation in the FDL, thereby avoiding signal conflicts in the MCC and ensuring correct SMD operation, as shown in Fig. 6.4(b).

Fig. 6.5: Microphotograph of SMD test chip.

6.4 Experimental Results

A test chip of the proposed SMD has been fabricated in a 0.18µm CMOS process; the chip microphotograph is shown in Fig. 6.5. The proposed design is verified by post-layout simulation using HSPICE. Fig. 6.6(a) shows that the entire locking process takes ten clock cycles and that the total propagation delay of the SMD is adjusted by the FTC every two clock cycles, reducing the phase error to 15 ps at 400 MHz. Table 6.2 lists the verification results of the phase error under different PVT conditions and input clock frequencies. The proposed SMD accepts a wide input duty cycle, from 20% to 80%, at different input clock frequencies, as shown in Fig. 6.6(b). The performance characteristics of the proposed SMD are summarized in Table 6.3.

Fig. 6.6: (a) Timing diagram of the proposed SMD (b) Acceptable Input duty cycle under different frequencies.

6.5 Summary

The performance and application scope of the conventional SMD are limited by its low-accuracy phase alignment and its need for a narrow-pulse clock. In this chapter, three important design concepts for the SMD are proposed: a high-resolution delay line, a delay-matching structure, and a blocking edge-trigger scheme. The proposed high-resolution delay line and delay-matching structure reduce the phase error between the external and internal clocks, and the proposed blocking edge-trigger scheme extends the input duty cycle range without restricting the delay line length. As a result, the proposed SMD achieves a wide duty cycle range and keeps the static phase error small compared with conventional designs, making it suitable for clock synchronization in SoC applications.

Table 6.2: Phase Error Under Different PVT Conditions

          SS, 1.62V, 125°   TT, 1.8V, 25°   FF, 1.98V, -40°
200MHz    6ps               11ps            16ps
400MHz    16ps              15ps            18ps

Table 6.3: ADSMD Performance Summary

Process                       0.18µm CMOS
Supply Voltage (V)            1.8
Operation Range (MHz)         200 ~ 400
Input Duty Cycle Range (%)    20 ~ 80
Delay Resolution (ps)         10
Phase Error (ps)              18
Lock Time (clock cycles)      10
Power Consumption (mW)        8.7 @400MHz
Area (mm²)                    0.08


Chapter 7

Conclusions and Future Works

7.1 Conclusions

In this dissertation, a systematic all-digital design approach for implementing various high-performance, low-power clock generators, including the ADPLL, ADSSCG, ADDLL, and ADSMD, for SoC applications has been presented. The proposed DCO, which is the kernel module of the all-digital clock generators, employs a cascadable structure with coarse- and fine-tuning stages to achieve high resolution and a wide frequency range at the same time. The coarse-tuning stage uses a segmental delay line (SDL) to reduce redundant power, and the proposed hysteresis delay cell (HDC) reduces the circuit complexity and loading of the fine-tuning stage to further lower the power consumption.

For power management system applications, the proposed PLL employs a novel 2-level flash TDC to reduce the lock-in time at low hardware cost. Besides, in consumer electronics, microprocessor (µP) based systems, and data transmission circuits, reducing electromagnetic interference (EMI) is an important design topic. Based on the proposed RDTM, the spreading ratio of the proposed ADSSCG can be specified flexibly according to application demands while keeping the phase-tracking capability. With the proposed low-power DCO and auto-adjustment algorithm, the overall power consumption is reduced while a monotonic delay characteristic is maintained.

Double data rate (DDR) memories have been widely used in high-performance systems in modern SoC designs to meet the required data bandwidth. Because a DDR memory controller needs specific clock and control signals to ensure the functionality and performance of data accesses, a tunable phase-shift scheme based on an all-digital delay-locked loop (ADDLL) and a digital control phase shifter (DCPS) has been proposed in this work to solve the delay mismatch issue. In addition, memory designs use the synchronous mirror delay (SMD) to eliminate the clock skew caused by wire delay mismatch. The proposed all-digital SMD (ADSMD) uses edge-trigger mirror delay cells to enlarge the input duty cycle range and fine-tuning delay lines with high-resolution delay cells to reduce the static phase error.

The proposed all-digital clock generators not only use the proposed DCO/delay cell and several design techniques to enhance performance and reduce power consumption, but can also be realized with standard cells in standard CMOS processes, making them easily portable to different processes as soft intellectual property (IP). As a result, the proposed all-digital clock generators are very suitable for SoC applications as well as system-level integration.

7.2 Future Works


The proposed DCO employs a cascadable structure with coarse- and fine-tuning stages to achieve high resolution and a wide frequency range at the same time. However, this structure has several drawbacks. First, the controllable range of each stage should be larger than the delay step of the previous stage to ensure that there is no dead zone larger than the LSB resolution of the DCO. Meeting this design constraint requires over-design, which increases power and area. Second, a non-monotonic response occurs when the DCO control code crosses from one tuning stage to another. This non-monotonicity may cause stability issues and large jitter.
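Both drawbacks can be seen in a simple linear model of a two-stage delay line. The sketch below is an illustration under stated assumptions rather than a model of the actual DCO: delay is taken as coarse_code × coarse_step + fine_code × fine_step, and the function names and example numbers are hypothetical.

```python
def crossover_step_ps(coarse_step_ps, fine_step_ps, fine_codes):
    """Delay change when the coarse code increments and the fine code
    wraps from its maximum back to zero."""
    fine_range_ps = fine_step_ps * (fine_codes - 1)
    return coarse_step_ps - fine_range_ps


def classify(coarse_step_ps, fine_step_ps, fine_codes):
    """Label the stage crossover for one set of step sizes."""
    step = crossover_step_ps(coarse_step_ps, fine_step_ps, fine_codes)
    if step > fine_step_ps:
        return "dead zone"        # gap larger than the LSB resolution
    if step < 0:
        return "non-monotonic"    # delay decreases as the code increases
    return "ok"


if __name__ == "__main__":
    # Hypothetical numbers: 4 ps fine step, 17 fine codes (64 ps range).
    print(classify(80.0, 4.0, 17))  # dead zone
    print(classify(60.0, 4.0, 17))  # non-monotonic
    print(classify(66.0, 4.0, 17))  # ok
```

Only a narrow band of coarse steps avoids both problems in this model, so a fine range sized with margin to rule out dead zones across PVT corners tends to make the crossover non-monotonic.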

Recently, many researchers have proposed the phase interpolation approach to implement a monotonic DCO design [53]-[56]. However, with a phase interpolator it is hard to obtain precise timing, and its power consumption is large. As a result, a new DCO structure should be proposed to overcome these design issues.

Furthermore, as the operating frequency of the clock generator increases, more attention should be paid to several design considerations to ensure performance and functionality. First, because the tolerance to duty cycle variation becomes small, the clock generator should embed a duty cycle corrector (DCC) to maintain the duty cycle of its output. Second, in order to achieve a high operating frequency, the clock generator may use an advanced process to implement the high-performance design. It will then encounter many non-ideal design issues, such as large leakage current and heavy wire loading as the chip area increases. Thus, designing a nanometer clock generator will be a great challenge. Third, because SoC designs are becoming more complex, the clock generator needs high immunity to PVT variations to ensure performance and functionality. The previous work only proposed a compensation solution for supply voltage variation [53]. To obtain a more robust clock generator for high-frequency SoC applications, increasing the immunity to PVT variations is an important research topic for the future. In addition to these design