

Chapter 4 All-Digital Spread Spectrum Clock Generator Design

4.3 DCO Design

4.3.2 Auto-Adjustment Algorithm for Monotonic DCO

As mentioned in the previous section, the DCO control code is changed to obtain different output periods in spread-spectrum applications; thus, the monotonic characteristic of the DCO is very important. Because the controllable delay range of each stage must be larger than the finest delay step of the previous stage, a non-monotonic response occurs when the DCO code switches across the boundary between tuning stages. To eliminate this non-ideal effect, an auto-adjustment algorithm for boundary code switching is proposed. Fig. 4.4 is the flowchart of the proposed algorithm. When the DCO code crosses the boundary between tuning stages, the ADSSCG controller automatically adds an extra compensation code to, or subtracts it from, the original code to reduce the delay difference caused by the stage switching, which eliminates the non-monotonic behavior. According to

Fig. 4.4: Flowchart of auto-adjustment algorithm.

Fig. 4.5: Comparison between original and adjusted timing. (Horizontal axis: DCO control code; vertical axis: delay in ps.)


the simulation results of our proposed DCO under different process-voltage-temperature (PVT) conditions, the extra compensation codes for crossing the coarse/1st fine, 1st/2nd fine, and 2nd/3rd fine-tuning stage boundaries can be defined as 320, 48, and 4, respectively. For example, when the last four bits of the DCO code (one bit for the 2nd fine-tuning stage and the last three bits for the 3rd fine-tuning stage) change from $(0111)_2$ to $(1000)_2$, the delay should ideally increase by 1.1 ps, but it instead decreases by 3.78 ps (from 7.7 ps to 3.92 ps, which is the delay of one 2nd fine-tuning cell). Based on the auto-adjustment algorithm, the code is adjusted from $(1000)_2$ to $(1100)_2$. As a result, the delay increases by 0.62 ps, so the DCO operates monotonically, as shown in Fig. 4.5.
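The boundary compensation in the example above can be illustrated with a minimal Python sketch. It covers only the 2nd/3rd fine-tuning boundary, for which the text gives both the field width (three least significant bits) and the compensation code (4). The function name `adjust_code`, its parameters, and the clamping at zero are hypothetical; the field widths needed for the other two boundaries (compensation codes 320 and 48) are not stated in this section, so they are left out.

```python
# Minimal sketch of boundary compensation (not the ADSSCG controller itself).
# From the text: the 3rd fine-tuning stage uses the 3 LSBs, and the
# 2nd/3rd fine boundary uses a compensation code of 4.
THIRD_FINE_BITS = 3      # width of the 3rd fine-tuning field (from the text)
COMP_2ND_3RD_FINE = 4    # compensation code for this boundary (from the text)


def adjust_code(prev_code: int, next_code: int,
                field_bits: int = THIRD_FINE_BITS,
                comp: int = COMP_2ND_3RD_FINE) -> int:
    """Return next_code, compensated if the step crosses the stage boundary.

    A crossing is detected when the bits above the lower field differ
    between the previous and the requested code (assumed detection rule).
    """
    crossed = (prev_code >> field_bits) != (next_code >> field_bits)
    if not crossed:
        return next_code
    if next_code > prev_code:
        # Upward crossing: add the compensation code.
        return next_code + comp
    # Downward crossing: subtract it, clamped at zero (assumption).
    return max(next_code - comp, 0)


if __name__ == "__main__":
    adjusted = adjust_code(0b0111, 0b1000)
    print(format(adjusted, "04b"))
```

With the codes from the example, a step from 0111 to 1000 is compensated to 1100, while a step that stays inside the 3rd fine-tuning field is passed through unchanged.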

4.4 Experimental Results and Comparisons

Based on the operating frequency required by an in-house µP-based system and LCD controller [35] applications, the proposed ADSSCG should generate an output clock ranging from 27 MHz to 54 MHz. The proposed ADSSCG is designed and

Fig. 4.6: Microphotograph of ADSSCG test chip.


implemented with a cell-based design flow; thus, the proposed architecture and spread-spectrum algorithm are modeled in a hardware description language (HDL) and functionally verified using the NC-Verilog simulator. Moreover, we use the transistor-level simulator HSPICE to verify the DCO performance. Because the

Fig. 4.7: Measured spectrum at 54 MHz: (a) without frequency spreading; (b) with 1% frequency spreading.

Fig. 4.8: Measured spectrum at 27 MHz: (a) without frequency spreading; (b) with 10% frequency spreading.


proposed ADSSCG is implemented with standard cells, the physical layout is generated by an automatic placement and routing (APR) tool.

A test chip has been fabricated in a 0.18 µm 1P6M CMOS process with an area of 0.156 mm², and the chip microphotograph is shown in Fig. 4.6. The ADSSCG output signal is measured with an Agilent E4440A spectrum analyzer at 1.8 V/25°C. The input clock frequency ranges from 13.5 MHz to 27 MHz. The total current consumption is 0.69 mA at 54 MHz. Fig. 4.7 shows that the peak power is reduced by 9.5 dB at 54 MHz with a 1% spreading ratio, and Fig. 4.8 shows that it is reduced by 15 dB at 27 MHz with a 10% spreading ratio. Figs. 4.7 and 4.8 thus show that EMI can be reduced at the maximum and minimum operating frequencies of the proposed design, respectively. Because RDTM is a kind of triangular modulation, some peaks appear in the spectrum [27]. Because of the complex digital logic in our system chip, the ADSSCG operates in a noisy power supply environment in the spread-spectrum mode, which raises the noise floor as shown in Figs. 4.7(b) and 4.8(b); the measured rms jitter is 94 ps at 54 MHz with frequency spreading. In addition, because the discrete modulation has a wide frequency distribution, it also induces large jitter and a high noise floor.

There are several ways to reduce the noise floor and jitter. First, in system integration, the power supply of the ADSSCG should be separated from that of the other modules to maintain a clean environment for the timing-critical circuits; the ADSSCG itself should also have higher immunity to supply noise. Second, the modulation algorithm should change the frequency smoothly to avoid large frequency jumps and to provide a spectrally pure clock output. Third, the resolution and monotonicity of the DCO should be further improved to enhance performance and reduce jitter.


Table 4.3 compares the proposed design with state-of-the-art SSCGs for clock generation applications. The power index comparison shows that the proposed ADSSCG provides a better power-to-frequency ratio, implying that it is more effective in saving power for a given operating frequency.

In addition, since the proposed architecture is very simple and uses no passive components, it achieves low complexity and small area compared with other SSCG designs. Although [32] occupies a smaller area, it needs an extra PLL to provide the frequency multiplication function, and it can only provide a fixed frequency spreading ratio. Furthermore, since the proposed ADSSCG can be implemented with standard cells, it has good portability and is very suitable for SoC integration compared with [28]-[30]. As a result, the proposed ADSSCG offers advantages in power consumption, programmable spreading ratio, area, and portability.

Table 4.3: SSCG Performance Comparisons

| Performance Indices | Proposed | JSSC’03 [28] | TCASI’08 [29] | ISSCC’05 [30] | JSSC’07 [32] |
|---|---|---|---|---|---|
| Process | 0.18 µm CMOS | 0.35 µm CMOS | 0.35 µm CMOS | 0.18 µm CMOS | 0.15 µm CMOS |
| Design Approach | All-Digital | Analog | Analog | Analog | All-Digital |
| Modulation Type | Modulation on DCO | Modulation on VCO | Modulation on VCO | Modulation on Divider | Delay Line (2) |
| Application | µP-based system / LCD Controller | µP-based system | µP-based system | SATA I | DVD Player |
| Output Frequency (MHz) | 27 ~ 54 | 66/133/266 | 50 ~ 480 | 1500 | 27 |
| Power Consumption (mW) | 1.2 (@54 MHz) | 300 (@266 MHz) | 27.5 (@400 MHz) | 77 (@1.5 GHz) | 7.1 (@27 MHz) |
| Power Index (µW/MHz) | 22.2 | 1127.8 | 68.8 | 51.3 | 263 |
| Area (mm²) | 0.156 | 2.01 (Excluding loop filter) | 0.66 | 0.31 | 0.06 (Excluding PLL) |
| Portability | Yes | No | No | No | No |

(1) Based on timing constraint of system application. (2) Needs an extra PLL.
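The power index row of Table 4.3 can be checked with a short Python sketch, assuming the index is simply the power consumption divided by the operating frequency at which it was measured. The function name `power_index_uw_per_mhz` and the dictionary layout are illustrative only; the numbers are taken from the table and from the measured 0.69 mA at 1.8 V.

```python
# Sketch: reproduce the "Power Index (µW/MHz)" row of Table 4.3,
# assuming index = power / frequency (the definition is inferred, not quoted).

def power_index_uw_per_mhz(power_mw: float, freq_mhz: float) -> float:
    """Power-to-frequency ratio in microwatts per megahertz."""
    return power_mw * 1000.0 / freq_mhz


# (power in mW, frequency in MHz) as listed in Table 4.3
designs = {
    "Proposed": (1.2, 54.0),
    "JSSC'03 [28]": (300.0, 266.0),
    "TCASI'08 [29]": (27.5, 400.0),
    "ISSCC'05 [30]": (77.0, 1500.0),
    "JSSC'07 [32]": (7.1, 27.0),
}

# Measured supply current and voltage of the test chip at 54 MHz
measured_power_mw = 0.69 * 1.8  # mA * V = mW

if __name__ == "__main__":
    print(f"measured power: {measured_power_mw:.2f} mW")
    for name, (p_mw, f_mhz) in designs.items():
        print(f"{name}: {power_index_uw_per_mhz(p_mw, f_mhz):.1f} uW/MHz")
```

Under this assumption the computed ratios agree with the tabulated values to the printed precision, and the product of the measured current and supply voltage is consistent with the 1.2 mW entry.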


4.5 Summary

In this chapter, we proposed a portable, low-power, and area-efficient ADSSCG with a programmable spreading ratio for SoC applications. Based on the proposed RDTM, the spreading ratio can be specified flexibly according to application demands while keeping the phase tracking capability. The proposed low-power DCO reduces the overall power consumption, and the proposed auto-adjustment algorithm maintains the monotonic characteristic of the DCO. Measurement results show that the proposed ADSSCG achieves a 9.5 dB EMI reduction with a 1% frequency-spreading ratio and consumes 1.2 mW at 54 MHz. As a result, our proposal achieves lower power consumption and smaller area with competitive EMI reduction. Moreover, because the proposed ADSSCG has good portability as a soft intellectual property (IP), it is very suitable for SoC applications as well as system-level integration.


Chapter 5

All Digital Delay-Locked Loop Design

5.1 Introduction

In this chapter, a fast-lock and portable all-digital delay-locked loop (ADDLL) with 90° phase shift and a digitally-controlled phase shifter (DCPS) for DDR interface applications are presented. As the operating frequency of electronic systems increases, double data rate (DDR) memories have been widely used to enhance memory performance and to provide high-speed data transmission between microprocessors and memory devices. Fig. 5.1(a) illustrates the interconnection of the DDR memory and the core system. Data transfers are based on a bidirectional differential or single-ended data strobe (DQS) that is transmitted along with the data (DQ) for capture [7]. In the read operation, the DDR memory transmits DQS edge-aligned with DQ, and the DDR controller then delays DQS by a 90° phase shift to the center of the data period to enlarge the effective data capture window. However, the effective data valid window is reduced by the delay mismatch between DQS and DQ introduced by the multi-chip interconnection, as shown in Fig. 5.1(b). In the write operation, by contrast, the controller transmits DQS center-aligned with DQ to the memory. Even though DQS has been delayed by a 90° phase shift in the controller before transmission, the delay mismatch from the multi-chip interconnection still reduces the effective data valid window and further limits the maximum attainable frequency, as shown in Fig. 5.1(c). As a result, the phase shift applied to DQS by the DDR controller should be a suitable value rather than a fixed 90°, so that DQS reaches the center of the DQ period in both read and write operations. Thus, the DDR controller should have a tunable phase-shift capability to eliminate the non-ideal effects of data transmission across multi-chip interconnections, especially in high-data-rate applications.

Fig. 5.1: (a) Interconnection of DDR memory and core system. (b) Waveform of read operation. (c) Waveform of write operation. (Waveform labels: DQ in controller, data valid windows, delay mismatching, capture window.)


Many delay-locked loops (DLLs) and phase shifters have been proposed as clock generators that provide the fixed 90° phase-shifted clock or control signal required to transfer data correctly in high-speed DDR memory controllers [36]-[40]. The DLL generates an output clock aligned with the input clock and provides the control signal for the phase shifter of DQS. In the physical implementation, the phase shifters may be located far from the DLL. A digitally-controlled phase shifter (DCPS), which is controlled by a digital control signal, is more suitable for high-performance DDR controller applications because a digital control signal is more robust when it propagates over a long path. Thus, many all-digital DLLs (ADDLLs) that provide a digital control code for the DCPS have been proposed [37]-[39]. However, the phases of these DCPS outputs are not tuned once the ADDLL is locked. These designs therefore have low immunity to the non-ideal effects of data transmission across multi-chip interconnections. In addition, these ADDLLs take a long time to lock, which makes them unsuitable for low-power DDR controllers whose clock signals must be generated quickly when the controller switches from power-down to active mode. Besides, because of the speed limitation of the delay line, a multi-cycle shifting scheme has been proposed [37] to generate the phase-shifted clock signal; however, it is not suitable for the non-periodic DQS.

In this chapter, a tunable phase shift scheme based on a fast lock-in ADDLL and a tunable digitally-controlled phase shifter (DCPS) for high-data-rate interconnection applications is presented. The proposed ADDLL uses the reference clock to establish the timing information, and the DCPSs provide suitable phase adjustment of the non-periodic control signals to obtain a large data capture window. The proposed ADDLL utilizes a time-to-digital converter (TDC) to reduce the locking time and avoid the harmonic lock problem. A high-performance digitally-controlled delay line (DCDL) is also included to achieve high speed while keeping high delay resolution, so that the 90° phase-shifted clock signal is generated with a small phase-shift error and a single-cycle shifting scheme. The proposed DCPS provides tunable phase adjustment of DQS for the DDR interface, where precise control is the key to reliable high-performance operation. Furthermore, the proposed ADDLL and DCPS use a cell-based design approach, so they can easily be integrated into digital systems and ported to different processes as a soft IP.

This chapter is organized as follows. Section 5.2 describes the proposed tunable phase shift scheme based on a fast-lock and portable ADDLL and a DCPS for DDR interface applications. Section 5.3 focuses on the proposed DCDL and TDC circuit design. In Section 5.4, the experimental results and performance comparisons of the proposed design are presented. Finally, a brief summary is given in Section 5.5.

5.2 The Proposed Clock Generator Architecture

5.2.1 Tunable Phase Shift Scheme

Fig. 5.2 illustrates the architecture of the proposed tunable phase shift scheme for the DDR controller, which consists of four major functional blocks: a phase controller, an ADDLL, and two DCPSs. After the ADDLL is locked, it provides two clock signals, CLOCK1 (phase-aligned with the input clock) and CLOCK2 (delayed by 90° with respect to the input clock), as well as the DLL control code (DLL_CTRL) for the phase controller [40]. If a DCPS uses DLL_CTRL without any adjustment, it generates a delayed DQS with a 90° phase shift, which is the same as CLOCK2 in the ADDLL. At the beginning of the tunable phase scheme, the phase adjustment codes of the read/write DQS (DQS_R_ADJ/DQS_W_ADJ) are set to zero, implying that the phase shift of DQS is 90°. The core system then enters the test mode and accesses the DDR memory through the DDR controller to verify the functionality and performance of the clock and signal generators in the DDR controller. If the core system detects that the DDR memory system fails to meet the performance specification, the control codes of the read/write DQS (DQS_R_CTRL/DQS_W_CTRL) are increased or decreased sequentially by the phase adjustment codes to generate a suitable phase shift of the delayed read/write DQS (DQSD_R/DQSD_W), which compensates for the delay mismatch caused by the interconnection between the DDR memory and the core system. The flowchart of the tunable phase shift scheme is shown in Fig. 5.3.
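The tuning procedure described above can be sketched in Python for a single DQS path. This is a hedged illustration, not the phase controller itself: the callback `meets_spec`, the search order (0, +1, -1, +2, -2, ...), and the step limit `max_step` are assumptions, and the control code is assumed to be the sum of DLL_CTRL and the adjustment code.

```python
# Sketch of the tunable phase shift flow for one DQS path (read or write).
# Assumed relation: DQS_x_CTRL = DLL_CTRL + DQS_x_ADJ.
from typing import Callable, Iterator, Optional


def candidate_adjustments(max_step: int) -> Iterator[int]:
    """Yield 0 first, then alternately increased and decreased offsets."""
    yield 0
    for step in range(1, max_step + 1):
        yield step
        yield -step


def tune_dqs_adj(dll_ctrl: int,
                 meets_spec: Callable[[int], bool],
                 max_step: int = 8) -> Optional[int]:
    """Return the first adjustment code whose control code passes the test.

    meets_spec stands in for the core system's test-mode memory access;
    it receives the candidate control code and reports pass or fail.
    """
    for adj in candidate_adjustments(max_step):
        ctrl = dll_ctrl + adj
        if ctrl < 0:
            continue  # skip codes below the control range
        if meets_spec(ctrl):
            return adj  # fix the adjustment code and stop
    return None  # no setting within the searched range met the spec


if __name__ == "__main__":
    # Toy example: the link only works when the control code is 18.
    print(tune_dqs_adj(20, lambda ctrl: ctrl == 18))
```

Starting from an adjustment of zero keeps the default 90° phase shift whenever the unadjusted setting already meets the specification.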

Fig. 5.2: Architecture of the proposed tunable phase shift scheme for DDR controller.


The proposed ADDLL consists of five major functional blocks: a TDC, a DCDL, a phase detector (PD), an ADDLL controller, and a control code decoder, as shown in Fig. 5.4(a). The locking procedure is divided into two steps: coarse locking by the TDC and fine locking by the binary search algorithm. In the beginning, the ADDLL resets and the TDC takes four clock cycles to generate the TDC control code, which determines the coarse control code of the DCDL so that the output clock signal (P360) is delayed by approximately one clock period. After coarse locking, the ADDLL controller fine-tunes the DCDL control code based on the UP/DN signals from the PD to align the phase of P360 with CLK_IN. The worst-case lock time of the binary search algorithm [11], in terms of input clock cycles, is

$$T_F = (2 \times \log_2 2^N) - 1 \qquad (5.1)$$

where $T_F$ is the lock time of fine tuning and $N$ is the number of bits of the binary search control code. Because the total number of bits of the fine-tuning control code is 5, the

Fig. 5.3: Flowchart of the proposed tunable phase shift scheme. (Steps: Start; set DQS_R_ADJ = 0 and DQS_W_ADJ = 0; check whether the specification is met; if not, modify DQS_R_ADJ and DQS_W_ADJ and check again; if so, fix DQS_R_ADJ and DQS_W_ADJ; End.)


entire phase locking procedure takes 13 clock cycles: 4 cycles for ADDLL reset and TDC operation, and 9 cycles (N = 5) for fine-tuning phase locking. In addition, the control code decoder converts the DCDL control code from binary to thermometer-code format, as required by the high-resolution DCDL structure.
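The cycle count can be verified with a few lines of Python. The sketch evaluates Eq. (5.1), which simplifies to 2N - 1, adds the four cycles of reset and TDC operation, and also shows a binary-to-thermometer conversion, assuming the decoder's output format is the conventional thermometer code. All function names are illustrative.

```python
# Sketch: worst-case lock time from Eq. (5.1) and a thermometer-code decoder.

def fine_lock_cycles(n_bits: int) -> int:
    """Worst-case binary-search lock time: 2*log2(2**N) - 1 = 2N - 1."""
    return 2 * n_bits - 1


def total_lock_cycles(n_bits: int = 5, coarse_cycles: int = 4) -> int:
    """Reset/TDC cycles (4 in the text) plus the fine-tuning cycles."""
    return coarse_cycles + fine_lock_cycles(n_bits)


def binary_to_thermometer(value: int, n_bits: int) -> int:
    """Return an integer whose `value` lowest bits are set to 1.

    The thermometer word is 2**n_bits - 1 bits wide (assumed convention).
    """
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    return (1 << value) - 1


if __name__ == "__main__":
    print(fine_lock_cycles(5), total_lock_cycles())
    print(format(binary_to_thermometer(3, 3), "07b"))
```

For N = 5 the fine-tuning term is 9 cycles and the total is 13 cycles, matching the figures quoted in the text.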

Fig. 5.4(b) illustrates the structure of the proposed DCPS, which includes one decoder and one DCDL identical to those in the ADDLL. Because the delay of the proposed DCPS is tunable with a high-resolution delay step, it can delay its input by more or less than 90°, depending on the phase adjustment setting required by the system timing.

*: TDC_CODE[3:0]


5.3 ADDLL Circuit Design

5.3.1 Digitally Controlled Delay Line

According to the requirements of the ADDLL, it has to provide a 4-phase clock signal with equal delay spacing within a single input cycle. Thus, the design challenge of the

Fig. 5.5: (a) Proposed DCDL. (b) Coarse-tuning stage. (c) Fine-tuning stage. (Labels in (a): CLK_IN, four CDS/FDS pairs, outputs P90, P180, P270, and P360; labels in (c): C_OUT, F_OUT, F[0], F[1] to F[16], HDC, DCV.)


delay line in the ADDLL is to achieve high delay resolution and high speed at the same time [37]. The proposed DCDL has four duplicated delay stages, each of which has one coarse-delay stage (CDS) and one fine-delay stage (FDS), as shown in Fig. 5.5(a). The minimum delay of each delay stage should be shorter than 1/4 of the clock period to provide the 90° phase-shifted signal within the same clock cycle. The proposed DCDL employs this cascade-stage structure to achieve high delay resolution and high speed at the same time [34]. Each CDS has 16 coarse-delay cells (CDCs), each consisting of one buffer and one multiplexer, and the coarse-tuning control code (C[15:0])

[Figure labels: PULSE_START, PULSE_END, Dummy Intrinsic Delay Chain (four FDSs), CDC]


selects the propagation path through the CDCs [41]. The intrinsic delay of the CDS is only the gate delay of one multiplexer plus the interconnect delay, as shown in Fig. 5.5(b).

In order to achieve better delay resolution, a hysteresis delay cell (HDC) and 16 digitally controlled varactors (DCVs) are added, as shown in Fig. 5.5(c). When the tri-state inverter of the HDC is enabled (F[0] is high), its output exhibits hysteresis during transitions and thus produces a different delay. The gate capacitance of each DCV can be changed slightly by the fine-tuning control code (F[16:1]) to obtain high delay resolution in the FDS. Because a tri-state holder cell provides a larger delay than a DCV, it can replace many DCVs to reduce power consumption and intrinsic delay, while ensuring that the delay range of the FDS covers the minimum delay time of a CDC so that the dead zone remains smaller than the delay resolution of the FDS. As a result, the overall intrinsic delay of the DCDL is reduced by the CDC and the tri-state holder. The simulation results show that the minimum delay resolution of one FDS is 4 ps; because the DCDL cascades four delay stages, the total delay resolution of the DCDL is 16 ps. In order to enlarge the phase-shift range of the DCPS, the gain of the DCPS control code is four; thus, the minimum tuning delay of the DCPS is 16 ps.
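The relation between the 4 ps FDS step and the 16 ps DCDL resolution can be illustrated with a simple behavioral model in Python. It assumes that all four delay stages share the same control code and that the fine delay grows linearly with the fine setting, which ignores the larger step of the HDC. The intrinsic delay and CDC step values are placeholders chosen for illustration, not figures from the design; only the stage count and the 4 ps FDS step come from the text.

```python
# Behavioral sketch of the four-stage DCDL (linear model, shared control code).
from typing import List

NUM_STAGES = 4     # four duplicated CDS + FDS stages (from the text)
FDS_STEP_PS = 4.0  # minimum delay resolution of one FDS (from the text)


def tap_delays_ps(coarse_sel: int, fine_sel: int,
                  intrinsic_ps: float = 100.0,       # placeholder value
                  cdc_step_ps: float = 60.0) -> List[float]:  # placeholder value
    """Return the delays of P90, P180, P270, and P360 relative to CLK_IN."""
    stage_delay = intrinsic_ps + coarse_sel * cdc_step_ps + fine_sel * FDS_STEP_PS
    return [stage_delay * (i + 1) for i in range(NUM_STAGES)]


if __name__ == "__main__":
    before = tap_delays_ps(coarse_sel=2, fine_sel=2)
    after = tap_delays_ps(coarse_sel=2, fine_sel=3)
    # One fine step moves P90 by one FDS step and P360 by four FDS steps.
    print(after[0] - before[0], after[3] - before[3])
```

In this model one fine step shifts P90 by 4 ps and P360 by 16 ps, and the four taps stay equally spaced because every stage contributes the same delay.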

Fig. 5.7: Layout of ADDLL and DCPS.


5.3.2 Time-to-Digital Converter

Fig. 5.6(a) illustrates the architecture of the proposed TDC. The period of the input clock is quantized by 4 CDCs and converted to the TDC control code (TDC_CODE), as shown in Fig. 5.6(b). Pulse_Start and Pulse_End rise at the first and second rising edges of the input clock, respectively. The dummy intrinsic delay chain, which contains 4 FDSs set to minimum delay and one multiplexer, replicates the minimum delay path of the DCDL. Because the total delay of the DCDL consists of the intrinsic delay and the tunable delay, Pulse_Start first passes through the dummy intrinsic delay chain placed in front of the CDC chain; the delay between the delayed Pulse_Start (Pulse_Start_D) and Pulse_End is then quantized by 4 CDCs and converted to the TDC

Fig. 5.8: (a) Transient response of ADDLL. (b) ADDLL at steady state. (Annotation in (b): 0.25 input clock period.)


control code. As a result, the effect of the intrinsic delay is removed, which improves the precision of quantization and conversion. Additionally, Pulse_Start and Pulse_End toggle only once after the system is reset.
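The coarse quantization can be sketched in Python under one reading of the description: the dummy chain removes the intrinsic delay, and the remaining part of the clock period is counted in units of four CDC delays, one per DCDL stage. The function name, the example delay values, and the saturation at 15 (taken from the 16 CDCs per stage) are assumptions for illustration.

```python
# Sketch of TDC quantization with intrinsic-delay removal (integer picoseconds).

def tdc_code(period_ps: int, intrinsic_ps: int, cdc_delay_ps: int,
             stages: int = 4, max_code: int = 15) -> int:
    """Quantize (period - intrinsic delay) in units of `stages` CDC delays."""
    if cdc_delay_ps <= 0:
        raise ValueError("cdc_delay_ps must be positive")
    remaining = period_ps - intrinsic_ps  # delay between Pulse_Start_D and Pulse_End
    if remaining <= 0:
        return 0  # period is already covered by the intrinsic delay
    code = remaining // (stages * cdc_delay_ps)
    return min(code, max_code)  # saturate at the assumed code range


if __name__ == "__main__":
    # Placeholder numbers: 1000 ps period, 200 ps intrinsic, 50 ps per CDC.
    print(tdc_code(1000, 200, 50))
```

Subtracting the intrinsic delay before counting is what lets the resulting code be applied directly as the coarse setting of the DCDL.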

5.4 Experimental Results and Comparisons

The proposed design is implemented with a 0.13 µm CMOS standard cell library; the layout of the ADDLL and DCPS is shown in Fig. 5.7, and the area of the ADDLL and DCPS

Fig. 5.9: Tunable signal phase scheme in read operation when (a) DQS leads DQ; (b) DQS lags DQ. (Signals shown: DQ, DQS, DQSD, DQS_R_ADJ, DLL_CTRL, DQS_R_CTRL.)


is 0.026 mm² and 0.01 mm², respectively. The proposed ADDLL and DCPS are designed and implemented with a cell-based design flow; thus, the proposed architecture and lock-in algorithm are modeled in a hardware description language (HDL) and functionally verified using the NC-Verilog simulator. Fig. 5.8(a) shows the locking procedure of the ADDLL after the system is reset. The entire phase locking procedure takes 13 clock cycles. Fig. 5.8(b) shows the proposed ADDLL at steady state. When the ADDLL is locked, the generated 4-phase clock signals are equally spaced within one input clock period. Thus, the phase shift between P90 and P360 is 1/4 clock period.

The proposed designs have been verified by HSPICE post-layout simulation at 1.2 V. The simulation results of the proposed tunable phase shift scheme show that the delayed DQS (DQS_D) can be adjusted to approach the center of the DQ period when DQS leads or lags DQ; as a result, it can eliminate the mismatching delay from
