
Department of Electronics Engineering and Institute of Electronics, Master's Program

An All-Digital Fast-Lock Self-Calibrated Multiphase DLL

Student: Li-Pu Chuang

Advisor: Prof. Wei Hwang

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

A Thesis

Submitted to the Department of Electronics Engineering & Institute of Electronics, College of Electrical Engineering and Computer Engineering,

National Chiao Tung University, in Partial Fulfillment of the Requirements

for the Degree of Master

in

Electronics Engineering

July 2008

Hsinchu, Taiwan, Republic of China


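Abstract (Chinese)

This thesis proposes an all-digital fast-lock multiphase delay-locked loop with self-calibration. The proposed rapid self-calibration algorithm reduces the phase error of the output signals caused by process mismatch or by differing output loads. In addition, to achieve fast locking and widen the operating frequency range while avoiding harmonic lock, an unbalance binary search algorithm is proposed; its distinguishing feature is that it provides different initial delay times to achieve these goals. An unbalance binary search controller is implemented in UMC 90nm CMOS technology. Simulation results show that, when the delay-locked loop operates from 100MHz to 500MHz (a 5X range), it locks within 22 reference clock cycles in the worst case.

A 300MHz to 1.08GHz all-digital delay-locked loop with precise multiphase outputs is implemented in UMC 90nm CMOS technology. A new digitally controlled linearly approximate delay element provides a linearly increasing delay time and robustness to environmental variation. A digital calibration unit is designed and implemented according to the proposed rapid self-calibration algorithm, so that the phase error of the multiphase output signals is self-calibrated. After the calibration procedure finishes, the calibration unit shuts off automatically to reduce power consumption. At an operating frequency of 500MHz, the maximum phase error is reduced from 20.9ps to 4.5ps. The maximum total power consumption is 2.16mW when operating at 1GHz. The proposed delay-locked loop can be used robustly in a variety of embedded memory applications.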

An All-Digital Fast-Lock Self-Calibrated Multiphase DLL

Student : Li-Pu Chuang

Advisor : Prof. Wei Hwang

Department of Electronics Engineering & Institute of Electronics

National Chiao-Tung University

ABSTRACT

An all-digital fast-lock self-calibrated DLL is proposed in this thesis. Based on the proposed rapid self-calibration (RSC) algorithm, the timing error caused by process mismatch and varying output loading can be effectively self-calibrated. In addition, an unbalance binary search (UBS) algorithm is proposed to extend the locking range and avoid harmonic lock at the same time. A UBS-based controller is implemented in UMC 90nm CMOS technology. The simulation results show that the operating frequency ranges from 100MHz to 500MHz (a 5X range) and that the lock-in time is at most 22 reference clock cycles in the worst case.

A 300MHz-1.08GHz all-digital multiphase delay-locked loop with precise multiphase outputs has been designed in UMC 90nm CMOS technology. The proposed linearly approximate delay element is linear and insensitive to PVT variation, which makes it well suited to a digitally controlled delay line. In addition, a digital calibration unit is designed based on the RSC algorithm, so that the phase error among the multiple outputs can be self-calibrated. The entire calibration unit can be turned off after the calibration procedure is complete to reduce power consumption. The simulation results show that the DLL exhibits a lock range from 300MHz to 1.08GHz. The maximum phase error is reduced from 20.9ps to 4.5ps when the DLL operates at 500MHz. The total power dissipation of the all-digital self-calibrated multiphase delay-locked loop is 2.16mW at 1GHz with a 1V power supply. The presented DLL can be robustly used in embedded memory applications.

Contents

CHAPTER 1 INTRODUCTION
1.1 BACKGROUND
1.2 MOTIVATION
1.3 ORGANIZATION

CHAPTER 2 AN OVERVIEW OF DELAY-LOCKED LOOP
2.1 BASIC CONCEPTS OF DELAY-LOCKED LOOP
2.2 CLASSIFICATIONS OF DELAY-LOCKED LOOP
2.3 DESIGN OF ANALOG DELAY-LOCKED LOOP
2.3.1 Voltage-controlled Delay Line
2.3.2 Phase Detector
2.3.3 Charge Pump and Loop Filter
2.3.4 Stability Analysis of Delay-Locked Loop
2.4 DESIGN OF CONVENTIONAL DIGITAL DLL
2.4.1 Register-controlled DLL [15]
2.4.2 Counter-controlled DLL [48]
2.4.3 Successive Approximation Register-controlled DLL [4]
2.4.4 Time measurement controlled DLL
2.4.5 Digitally Controlled Delay Line
2.4.5.1 Shunt Capacitor based DCDE
2.4.5.2 Standard Cell based DCDE
2.4.5.3 Current-starved based DCDE
2.5 COMPARISON OF DIGITAL DLL AND ANALOG DLL

CHAPTER 3 MULTIPHASE DELAY-LOCKED LOOP WITH SELF-CALIBRATION
3.1 INTRODUCTION OF MULTIPHASE DLL
3.2 APPLICATION OF MULTIPHASE DLL
3.3 SELF-CALIBRATION TECHNIQUES
3.3.1 A Self-Calibration Delay-Locked Delay Line
3.3.2 Sequential Phase Adjustment Calibration Technique
3.3.3 Parallel Phase Adjustment Calibration Technique
3.3.4 A PLL based Self-calibrated algorithm

CHAPTER 4 A WIDE-RANGE AND FAST-LOCK ALL-DIGITAL DELAY-LOCKED LOOP
4.1 INTRODUCTION OF WIDE-RANGE DLL
4.2 PREVIOUS RESEARCH OF WIDE-RANGE DLL
4.3 UNBALANCE BINARY SEARCH ALGORITHM
4.4 CIRCUIT DESCRIPTION
4.4.1 Step Controller
4.4.2 Binary Controller
4.4.3 Digitally Controlled Delay Line
4.5 SIMULATION RESULTS

CHAPTER 5 IMPLEMENTATION OF ALL-DIGITAL FAST-LOCK SELF-CALIBRATED MULTIPHASE DLL
5.1 INTRODUCTION
5.1 SYSTEM ARCHITECTURE
5.2 CIRCUIT DESCRIPTION
5.2.1 Phase Detector
5.2.2 Linearly Approximant Delay Element
5.2.3 Digitally Controlled Delay Line
5.2.4 Lock-in unit
5.2.5 Calibration Unit
5.2.5.1 Digital Relative Phase Detector
5.2.5.2 Interpolator
5.2.5.3 Lock Detect Unit
5.3 DESIGN IMPLEMENTATION
5.4 SIMULATION RESULT

CHAPTER 6 CONCLUSION AND FUTURE WORK
6.1 CONCLUSION
6.2 FUTURE WORK

List of Tables

TABLE 1 COMPARISON OF DIFFERENT TYPE DCDE
TABLE 2 COMPARISON OF ANALOG DLL AND DIGITAL DLL
TABLE 3 SUMMARY OF THE ADSCM-DLL

List of Figures

FIGURE 1 THE ARCHITECTURE OF CONVENTIONAL DLL
FIGURE 2 BLOCK DIAGRAM OF ANALOG DLL
FIGURE 3 THE RCCDL (A) DELAY ELEMENT (B) DELAY LINE
FIGURE 4 THE CSCDL (A) DELAY ELEMENT (B) DELAY LINE
FIGURE 5 THREE-STATE PHASE DETECTOR
FIGURE 6 PD RESPONSES WITH (A) REFERENCE SIGNAL LAG FEEDBACK SIGNAL (B) REFERENCE SIGNAL LEAD FEEDBACK
FIGURE 7 PD STATE DIAGRAM
FIGURE 8 SIMPLE MODEL OF CHARGE PUMP AND LOOP FILTER
FIGURE 9 LOOP FILTER
FIGURE 10 SMALL SIGNAL AC MODEL OF THE CONVENTIONAL ANALOG DLL
FIGURE 11 BLOCK DIAGRAM OF DIGITAL DLL
FIGURE 12 REGISTER CONTROLLED DLL
FIGURE 13 COUNTER-CONTROLLED DLL
FIGURE 14 FLOWCHART OF 3-BIT BINARY SEARCH ALGORITHM
FIGURE 15 SARDLL
FIGURE 16 TMDLL
FIGURE 17 DCDL REALIZED BY A PATH-SELECTION METHOD
FIGURE 18 LDU AND LDL
FIGURE 19 SHUNT-CAPACITOR BASED DCDE
FIGURE 20 PARALLEL TRI-STATE INVERTER BASED DCDE
FIGURE 21 AOI-OAI PARALLEL BASED DCDE
FIGURE 22 CURRENT STARVED BASED DCDE
FIGURE 23 THE BLOCK DIAGRAM OF CONVENTIONAL DLL-BASED FREQUENCY SYNTHESIZER
FIGURE 24 THE OPERATION OF DVFS SCHEME
FIGURE 25 7:1 DATA CHANNEL COMPRESSION TRANSCEIVER. (A) TRANSMITTER CIRCUIT. (B) RECEIVER CIRCUIT.
FIGURE 26 READ OPERATION TIMING BUDGET
FIGURE 27 THE BLOCK DIAGRAM OF MULTIPHASE DLL FOR DDR SDRAM APPLICATION
FIGURE 28 DELAY TIME MISMATCH DUE TO THE DELAY CELL WITH THE THRESHOLD VOLTAGE MISMATCH OF 15 MV
FIGURE 29 DELAY TIME MISMATCH DUE TO THE DELAY CELL WITH THE CHANNEL LENGTH MISMATCH OF 10%
FIGURE 31 SEQUENTIAL PHASE ADJUSTMENT CALIBRATION ALGORITHM
FIGURE 32 (A) RELATIVE PHASE DETECTOR (B) RELATIVE COMPARISON METHOD
FIGURE 33 PARALLEL PHASE ADJUSTMENT CALIBRATION ALGORITHM
FIGURE 34 (A) DELAY SENSING CIRCUIT. (B) CALIBRATION LOOP CHARGE PUMP
FIGURE 35 THE IDEA OF PROPOSED RSC ALGORITHM
FIGURE 36 STEP SUMMARY OF RSC ALGORITHM
FIGURE 37 THE CALIBRATION PROCEDURE COMPARISON WITH (A) IQ-STYLE (B) BUFFER-STYLE (C) RSC-BASED
FIGURE 38 CALIBRATION CYCLES COMPARISON
FIGURE 39 HARMONIC LOCKING PROBLEMS
FIGURE 40 (A) FALSE-LOCK CAPABILITY PHASE DETECTOR AND ITS (B) TIMING DIAGRAM
FIGURE 41 CONVENTIONAL BINARY SEARCH BASED CONTROLLER
FIGURE 42 3-BIT UNBALANCE BINARY SEARCH ALGORITHM.
FIGURE 43 COMPARISON WITH CONVENTIONAL BS AND UBS ALGORITHM
FIGURE 44 SIMULATION LOCK TIME VERSUS THE OPERATION RANGE.
FIGURE 45 9-BIT STEP CONTROLLER
FIGURE 46 (A) 9-BIT BINARY CONTROLLER (B) SBG
FIGURE 47 THE OPERATION OF THE UBS CONTROLLER.
FIGURE 48 THE ARCHITECTURE OF DCDL AND BWDC.
FIGURE 49 THE SIMULATION RESULTS OF (A) DELAY TIME VERSUS INPUT VECTOR (B) POWER CONSUMPTION VERSUS DELAY TIME.
FIGURE 50 LOCK PROCESS WHEN THE OPERATING FREQUENCY AT 100MHZ
FIGURE 51 LOCK PROCESS WHEN THE OPERATING FREQUENCY AT 125MHZ
FIGURE 52 LOCK PROCESS WHEN THE OPERATING FREQUENCY AT 250MHZ
FIGURE 53 LOCK PROCESS WHEN THE OPERATING FREQUENCY AT 500MHZ
FIGURE 54 THE PROPOSED ADSCM-DLL ARCHITECTURE.
FIGURE 55 (A) THE BLOCK DIAGRAM OF PHASE DETECTOR (B) TSPC
FIGURE 56 THE OPERATION OF CONVENTIONAL PD.
FIGURE 57 THE MODIFIED TSPC DFF
FIGURE 58 THE OPERATION OF PROPOSED PD
FIGURE 59 LADE
FIGURE 60 THE ARCHITECTURE OF PROPOSED DCDL.
FIGURE 61 DEALT OF DCDL V.S. INPUT VECTOR
FIGURE 62 5-BIT LOCK-IN UNIT
FIGURE 63 THE OPERATION OF PROPOSED LOCK-IN UNIT
FIGURE 64 THE BLOCK DIAGRAM OF THE CALIBRATION UNIT.
FIGURE 65 OPERATION OF DRPD [3]
FIGURE 67 THE PROPOSED INTERPOLATOR
FIGURE 68 THE LOCK DETECT UNIT
FIGURE 69 THE POWER COMPARISON OF WITH/WITHOUT LDU
FIGURE 70 LAYOUT VIEW OF THE ADSCM-DLL
FIGURE 71 LAYOUT VIEW OF THE TEST CHIP
FIGURE 72 THE OPERATION OF LOCK-IN STAGE
FIGURE 73 THE OPERATION OF CALIBRATION STAGE
FIGURE 74 THE PHASE ERROR OF EACH DELAY STAGE (A) 90NM (B) 130NM

CHAPTER 1

INTRODUCTION

1.1 BACKGROUND

With the advance of CMOS process technology, the complexity and operating frequency of VLSI systems have grown exponentially. The design trend is moving toward system-level integration and single-chip solutions. In System-on-Chip (SoC) design, reusable modules shorten the design cycle and improve process portability. As a result, the quality of the synchronous clock signals between modules becomes more important, and eliminating clock skew becomes an important issue for high-performance VLSI systems and SoC applications.

Phase-locked loops (PLLs) and delay-locked loops (DLLs) are widely used to solve the clock synchronization problem. However, the DLL is more suitable than the PLL for clock de-skew because of its lower design effort and its inherent characteristics. In addition, the DLL provides better jitter performance because jitter does not accumulate in a voltage-controlled delay line (VCDL) or digitally controlled delay line (DCDL). As a consequence, the DLL is frequently used for clock synchronization.

1.2 MOTIVATION

The application of the DLL is not limited to clock synchronization; it is also used in clock/data recovery (CDR) circuits [45], double data rate (DDR) SDRAM [9], [10] and frequency multipliers [43], [44], [1]. The multiphase output of a VCDL or DCDL is typically used to implement these circuit functions. However, the edges of the multiphase output signals are not equally spaced because of delay mismatches. For CDR circuits using multiphase sampling schemes [45], the phase offset corrupts the signal constellation and raises the bit error rate. Similarly, for frequency multipliers using edge combiner schemes [43], [44], the static phase error among the delay stages induces fixed pattern jitter at the multiplied clock output. Therefore, a DLL with precise multiphase outputs is necessary. Moreover, conventional DLLs may suffer from harmonic lock over a wide operating frequency range. Various wide-range DLL architectures have been developed to solve the false locking problem. A DLL with multiple VCDLs that overcomes the problem of a limited delay range is proposed in [14]. In [6], an all-analog DLL improves the locking range by using a replica delay line. However, it is not well suited to process portability and noise immunity requirements. Digital DLLs have therefore been developed to address this problem.

To address the issues above, this thesis focuses on a search algorithm for the DLL that eliminates the false locking problem and on a calibration mechanism for the multiphase outputs that compensates for the delay mismatch along the delay line.

1.3 ORGANIZATION

The thesis is organized as follows:

Chapter 2 gives an overview of the DLL, including the analog DLL and the digital DLL. A comparison of the two is also given in this chapter.

Chapter 3 describes the fundamentals of calibration schemes for the multiphase DLL and presents a novel rapid self-calibrated (RSC) algorithm. Based on the RSC algorithm, the multiphase DLL can adjust the phase difference in a digital manner and eliminate the phase error of the multiple outputs. A comparison with other calibration schemes is also presented.

Chapter 4 presents a modified binary search algorithm that extends the locking range to the full delay line and avoids harmonic locking at the same time. The circuit design and detailed operation flow of the DLL based on the modified binary search algorithm are also described.

Chapter 5 presents a multiphase DLL with precise multiple outputs. The system architecture and circuit design, which are based on the RSC algorithm, are presented, followed by the layout implementation, simulation results and performance summary.

CHAPTER 2

AN OVERVIEW OF DELAY-LOCKED LOOP

2.1 BASIC CONCEPTS OF DELAY-LOCKED LOOP

Figure 1 The architecture of conventional DLL (blocks: delay line (Td), phase detector, DLL controller, clock buffer (Tcb); signals: reference clock, output clock, feedback clock)

The basic architecture of a conventional DLL is shown in Figure 1. A DLL consists of a phase detector (PD), a variable delay line, and a DLL controller that converts the PD's output signal into digital or analog control signals for the delay line. It automatically tunes the delay time of the delay line and inserts an optimal delay time (Td) to compensate for the phase error between the reference clock and the output clock. After the DLL is locked, equation (2.1) is satisfied, where K is an integer, Tref is the period of the reference clock, and Td and Tcb denote the delay times of the delay line and the clock buffer, respectively.

$$K \times T_{ref} = T_d + T_{cb} \tag{2.1}$$

... between the reference clock and the output clock (also called the feedback clock). At the same time, the output clock is synchronized with the reference clock, and the clock buffer delay can be ignored.
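As a worked illustration of equation (2.1), the sketch below picks the smallest integer K for which the required delay Td = K × Tref − Tcb falls inside the tuning range of the delay line. The function name, the delay-line limits and the numbers are assumptions made for illustration only; they are not taken from the thesis.

```python
import math


def required_delay(t_ref, t_cb, td_min, td_max):
    """Return (K, Td) with the smallest integer K >= 1 such that
    Td = K * t_ref - t_cb lies inside [td_min, td_max], or None."""
    # Smallest K that makes Td >= td_min.
    k = max(1, math.ceil((td_min + t_cb) / t_ref))
    td = k * t_ref - t_cb
    if td > td_max:
        return None  # the delay line cannot reach any lock point
    return k, td


# Hypothetical numbers in picoseconds: 500 MHz reference (2000 ps period),
# 300 ps clock buffer, delay line tunable from 500 ps to 4000 ps.
print(required_delay(2000, 300, 500, 4000))
```

With a longer buffer delay the same function may return K = 2, which corresponds to locking one reference period later.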

When the delay line is adjusted in an analog manner, the continuous tuning results in a higher delay resolution than a digital one. An analog DLL can also achieve better jitter performance and a smaller chip area. However, it is not suitable for future low-voltage applications because it cannot provide enough delay range under a low supply voltage [4]. Moreover, its process-sensitive characteristics make it difficult to port to advanced technologies and give it less noise immunity in a System-on-Chip (SoC) environment. In contrast, a digital DLL is more robust against process, voltage, and temperature (PVT) variations and exhibits a shorter lock time and better noise immunity than an analog one.

The design challenge of a DLL is to overcome PVT variations while balancing clock jitter, power consumption, area cost, portability, and lock time. Different approaches have been proposed to reach this objective. The next section introduces the classifications of delay-locked loops.

2.2 CLASSIFICATIONS OF DELAY-LOCKED LOOP

We can classify delay-locked loops into open loop type and closed loop type according to their locking mechanisms.

I. Open loop type Delay-Locked Loop

The synchronous mirror delay (SMD) is the most typical open loop type circuit. Its main advantage is fast locking: it recovers from power-down or standby mode within a few cycles of the system clock. Despite this fast locking, the phase error between the reference clock signal and the output signal cannot be controlled as accurately as in a closed loop type DLL [46]. The analog synchronous mirror delay (ASMD) [49] was therefore proposed to enhance the phase acquisition performance.

II. Closed loop type Delay-Locked Loop

The register-controlled DLL [15], [29] and the counter-controlled DLL [48] are the most typical examples of closed loop type design. Their main advantage is that they reduce the clock skew caused by environmental variation in open loop type delay-locked loops, and they also achieve a smaller static phase error and lower clock jitter. However, because the loop must synchronize the reference clock signal and the output clock signal, the lock time of the closed loop type is longer than that of the open loop type, and the lock mechanism also consumes more power. The successive approximation register-controlled DLL [4], which uses a binary search, and the time measurement controlled DLL have been proposed to reduce the lock time and power consumption.

Besides the classifications mentioned above, DLLs can also be distinguished by how the circuit is implemented. The classifications of DLL circuits are defined as follows:

(1) Analog DLL: Each block processes analog signals. The advantages are low output jitter and higher delay resolution. The disadvantages are lower noise immunity and a longer design cycle.

(2) All-digital DLL (ADDLL): Each block processes digital signals. Higher noise immunity and portability are the advantages of the ADDLL. However, its delay resolution and jitter performance are generally worse.

(3) Mixed DLL: Digital blocks perform fast coarse tuning, and the phase error is fine-tuned in an analog manner. The advantage is that it can reach high delay resolution and a fast lock time; the drawback is that it is hard to integrate digital and analog blocks together.

2.3 DESIGN OF ANALOG DELAY-LOCKED LOOP

Figure 2 Block diagram of analog DLL

Figure 2 illustrates the block diagram of an analog DLL, which contains a voltage-controlled delay line (VCDL), a phase detector, a charge pump, and a first-order loop filter. The reference clock signal propagates through the VCDL, which consists of cascaded variable delay stages. The phase detector compares the phase of the reference clock with that of the output clock, which is the reference clock delayed by the VCDL, and produces an up/down signal. The charge pump integrates the phase detector output signal, and the loop filter produces a control voltage, Vctrl, that sets the delay of the delay line.

2.3.1 Voltage-controlled Delay Line

Delay elements are widely used in digital systems and are essential for clocking in high-speed VLSI applications. Because they are simple and easy to design, the RC delay and the inverter chain have been the most common delay elements in those applications. However, the characteristics of these delay elements are sensitive to supply noise and PVT variations.

This section introduces two distinct VCDL approaches: the RC-time-constant Controlled Delay Line (RCCDL) and the Current-Starved Controlled Delay Line (CSCDL).

1. RC-time-constant Controlled Delay Line

The basic delay line built from RC-time-constant controlled delay elements is shown in Figure 3(b). It is obtained by cascading an even number of identical delay elements. In Figure 3(a), the control voltage (Vctrl) controls the charging current. The transistor Mn1 in essence controls the amount of effective load capacitance “seen” by the driving gate. A large value of Vctrl decreases the resistance of the transistor Mn1, so the effective capacitance at the logic gate output increases, producing a large delay.

Figure 3 The RCCDL (a) delay element (b) delay line

2. Current-Starved Controlled Delay Line

A basic delay element of the CSCDL is shown in Figure 4(a). A simple current mirror can be used to generate the two bias voltages. The control voltage Vctrl is applied to a series-connected element that can “current starve” an inverter. Vctrl modulates the ON resistance of the pull-down transistor Mn1 and, through a current mirror, that of the pull-up transistor Mp1. These variable resistances control the current available to charge or discharge the load capacitance. Large values of Vctrl allow a large current to flow, producing a small delay.

Figure 4 The CSCDL (a) delay element (b) delay line

2.3.2 Phase Detector

A phase detector is a circuit whose output responds to the phase relationship between the reference and feedback signals. Figure 5 shows a three-state phase detector circuit, and Figure 6 shows its waveforms under two conditions. Unlike multipliers and XOR gates, the three-state PD generates two outputs that are not complementary. When the feedback signal is high and the reference signal is low, the PD produces a positive pulse on the down signal while the up signal remains at zero.

Conversely, if the reference signal is high and the feedback signal is low, positive pulses appear on the up signal while the down signal is zero. Note that, in principle, up and down are never high together. The average value of (up − down) indicates the phase difference between the reference and feedback clocks.

Figure 5 Three-state phase detector

Figure 6 PD responses with (a) reference signal lagging feedback signal (b) reference signal leading feedback signal (traces in each panel: reference signal, feedback signal, UP, DOWN)

Figure 7 PD state diagram

Figure 7 shows the behavior of the PD circuit. It has three states: UP=1, DOWN=0 (state 1); UP=0, DOWN=0 (state 0); and UP=0, DOWN=1 (state 2). Because the PD is built from two edge-triggered sequential circuits, its output does not depend on the duty cycle of the inputs.

Suppose the circuit is initially in state 0. A rising edge on the reference signal then takes the circuit to state 1, where UP=1 and DOWN=0. Once state 1 is reached, further rising edges on the reference signal cause no state change. The circuit remains in this state until a transition occurs on the feedback signal, upon which the PD returns to state 0. The switching sequence between state 0 and state 2 is similar.
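The state transitions described above can be sketched as a small behavioral model. This is only an illustration: the function name and the event encoding ("ref" and "fb" for rising edges on the reference and feedback signals) are assumptions, and the model ignores pulse widths and reset delay.

```python
def three_state_pd(events):
    """Simulate the three-state phase detector on a sequence of rising edges.

    events: iterable of "ref" or "fb" in time order.
    Returns the list of (UP, DOWN) outputs after each edge.
    States: 0 -> (0, 0), 1 -> (1, 0), 2 -> (0, 1).
    """
    outputs = {0: (0, 0), 1: (1, 0), 2: (0, 1)}
    state = 0
    trace = []
    for edge in events:
        if edge == "ref":
            # A reference edge moves toward state 1 and is ignored once there.
            state = {0: 1, 1: 1, 2: 0}[state]
        elif edge == "fb":
            # A feedback edge moves toward state 2 and is ignored once there.
            state = {0: 2, 1: 0, 2: 2}[state]
        else:
            raise ValueError(f"unknown edge: {edge!r}")
        trace.append(outputs[state])
    return trace


print(three_state_pd(["ref", "ref", "fb", "fb", "ref"]))
```

In this model UP and DOWN are never 1 at the same time, which matches the three states listed in the text.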

The three-state PD can nominally detect a full range of phase difference, i.e., from −2π to +2π. A phase difference larger than 2π is wrapped modulo 2π. The output of the PD drives the charge pump to produce a control voltage for the delay line. The charge pump and loop filter are discussed next.

2.3.3 Charge Pump and Loop Filter

A simple model of the charge pump and loop filter is shown in Figure 8. The charge pump consists of two matched current sources that are switched by the up and down signals. When the up signal is high, the upper switch turns on and charges the output node Vctrl. When the down signal is high, the lower switch turns on and discharges the output node Vctrl. If both the up and down signals are low, the net current is zero and the output node Vctrl holds its voltage.
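The charging and discharging behavior can be sketched numerically. The model below is a minimal sketch under stated assumptions: ideal switched current sources of equal value and a single capacitor as the loop filter, so that the control voltage changes by I × Δt / C per step. The function name and all parameter values are hypothetical.

```python
def charge_pump_step(v_ctrl, up, down, i_p, c, dt):
    """Advance the control voltage by one time step dt.

    up/down are the phase detector outputs (0 or 1), i_p is the current of
    each matched source in amperes, c is the filter capacitance in farads.
    """
    # Net current into the capacitor: +i_p for UP, -i_p for DOWN, 0 otherwise.
    i_net = i_p * (int(up) - int(down))
    return v_ctrl + i_net * dt / c


# Hypothetical values: 10 uA sources, 1 pF capacitor, 1 ns UP pulse.
print(charge_pump_step(0.5, 1, 0, 10e-6, 1e-12, 1e-9))
```

With both inputs low the function returns the input voltage unchanged, which corresponds to the hold condition described above.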

... simple to design and has better noise performance. The passive filter is shown in Figure 9; it may be a first-order, second-order, or higher-order structure. High-order filters have the advantage of rejecting out-of-band noise, whereas low-order filters result in more stable operation. The choice between them depends on the application and on the need to keep the DLL from becoming unstable.

Figure 8 Simple model of charge pump and loop filter

Figure 9 Loop filter (labels: IP, Vctrl, R1, R2, C1, C2, C3)

2.3.4 Stability Analysis of Delay-Locked Loop

Before the stability analysis of the analog DLL, the small-signal AC model is introduced. It is shown in Figure 10, where the summer stands for the phase detector, $I_{CP}$ is the charge pump current, $T_{REF}$ is the period of the input reference clock, $C$ is the capacitance of the loop filter, and $K_{VCDL}$ is the gain of the VCDL. When the loop is in the steady-state locked condition, the s-domain transfer function from input to output is

$$\frac{D_O(s)}{D_I(s)} = \frac{1}{1 + s/\omega_N} \tag{2.2}$$

where $\omega_N$ is the loop bandwidth and

$$\omega_{REF} = \frac{2\pi}{T_{REF}} \tag{2.3}$$

From Eq. (2.2), we can see that the DLL is a first-order system, which is inherently stable. In contrast, the small-signal AC model of a typical PLL requires at least a second-order transfer function.

Since the transfer function is inherently stable, a wider loop bandwidth can be used. This allows a fast acquisition time as well as the use of small loop filter capacitors, which facilitates integration. However, the small-signal AC model is only valid when the loop bandwidth $\omega_N$ is much smaller than the phase detector comparison frequency (generally by a ratio of 10:1). Therefore, the following condition should be satisfied for stability:

$$\frac{\omega_N}{\omega_{REF}} = \frac{I_{CP} \times K_{VCDL}}{2\pi C} \le \frac{1}{10} \tag{2.4}$$

where

$$\omega_{REF} = \frac{2\pi}{T_{REF}} \tag{2.5}$$
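The 10:1 bandwidth margin can be checked numerically. The sketch below assumes that the criterion reads ω_N / ω_REF = I_CP × K_VCDL / (2πC) ≤ 1/10, with K_VCDL expressed in seconds per volt so that the ratio is dimensionless. The function names and component values are hypothetical and are not taken from the thesis.

```python
import math


def bandwidth_ratio(i_cp, k_vcdl, c):
    """Return omega_N / omega_REF = I_CP * K_VCDL / (2 * pi * C).

    i_cp in amperes, k_vcdl in seconds per volt, c in farads.
    """
    return i_cp * k_vcdl / (2.0 * math.pi * c)


def satisfies_margin(i_cp, k_vcdl, c, limit=0.1):
    """True when the loop bandwidth is at most `limit` of omega_REF."""
    return bandwidth_ratio(i_cp, k_vcdl, c) <= limit


# Hypothetical values: 10 uA pump current, 1 ns/V delay-line gain, 10 pF.
ratio = bandwidth_ratio(10e-6, 1e-9, 10e-12)
print(f"omega_N / omega_REF = {ratio:.2e}", satisfies_margin(10e-6, 1e-9, 10e-12))
```

Under this assumed form the ratio does not depend on the reference frequency, so a choice of pump current and capacitance that meets the margin at one frequency meets it across the lock range as long as the delay-line gain stays constant.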

2.4 DESIGN OF CONVENTIONAL DIGITAL DLL

Figure 11 Block diagram of digital DLL

The conventional digital DLL architecture is shown in Figure 11. It consists of three major blocks that form a closed loop: the phase detector (PD), the control unit (CU), and the digitally controlled delay line (DCDL).

The input of the DLL is the external clock (Ext_clk), and the feedback signal is the internal clock (Int_clk), which is a delayed version of the external clock signal. The PD detects the phase error between the input clock signal and the feedback signal, and the CU generates digital signals that control the delay time. If Ext_clk leads Int_clk, the CU adjusts the digital signals to increase the delay time of the DCDL. Conversely, the CU decreases the delay time to compensate for the phase error until Int_clk is synchronized with Ext_clk.

According to the implementation of the control unit, conventional DLLs can be classified as register-controlled, counter-controlled, successive approximation register-controlled, and time measurement controlled. The following sections describe each in detail.

2.4.1 Register-controlled DLL [15]

Figure 12 shows the block diagram of the register-controlled DLL. An n-bit shift register, controlled by the output of the phase detector, generates the control signals for the digitally controlled delay line. At any time, only one bit of the shift register is active, selecting a specific delay time of the delay line. The phase detector detects the relation between the input clock and the output clock and generates the left and right signals that tell the shift register how to adjust the delay time. The Enable signal controls whether the shift register is allowed to shift.

Figure 12 Register controlled DLL

When the output clock leads the input clock, the phase detector sends the left signal to the shift register, and the high bit in the shift register is shifted left to increase the delay time and compensate for the delay mismatch. Similarly, when the right signal is active, the high bit is shifted right to decrease the delay time. When Enable is active, the phase error between the input clock and the output clock is within one unit delay, and the data in the shift register is held. Under this mechanism, the loop is locked and the phase error does not exceed the unit delay.

Although the control mechanism is quite simple, widening the operating range requires additional delay stages in the delay line, which increases the chip area. In addition, the control proceeds one stage at a time, so more delay stages need more shift register bits and also a longer locking time. In the worst case, an n-bit shift register needs n/2 locking cycles.
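The one-stage-at-a-time search can be sketched as follows. The sketch assumes that the single active bit starts in the middle of the register, which is one way to arrive at the n/2 worst case quoted above; the starting position, the function name and the stage indexing are assumptions for illustration.

```python
def rdll_lock_cycles(n_stages, target_stage, start_stage=None):
    """Count shift operations until the one-hot bit reaches target_stage.

    The register holds a single active bit; each comparison cycle moves it
    one position toward the stage that gives the required delay.
    """
    if not 0 <= target_stage < n_stages:
        raise ValueError("target_stage outside the delay line")
    # Assumption: the active bit starts in the middle of the register.
    pos = n_stages // 2 if start_stage is None else start_stage
    cycles = 0
    while pos != target_stage:
        pos += 1 if pos < target_stage else -1  # one shift per cycle
        cycles += 1
    return cycles


# Worst case over all targets for a 128-bit register.
print(max(rdll_lock_cycles(128, t) for t in range(128)))
```

Doubling the number of stages doubles the worst-case count in this model, which is the linear growth in locking time that the text attributes to the register-controlled scheme.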

2.4.2 Counter-controlled DLL [48]

The operating principle of the counter-controlled DLL is similar to that of the register-controlled DLL, except that an up/down counter replaces the shift register to control the delay line. In addition, a binary-weighted delay line is adopted, which no longer consists of delay stages with equal delay time. The linearity of the binary-weighted delay line is an important issue that is discussed in Section 2.4.5. Here, we focus on the characteristics of the counter-controlled DLL (CDLL).

Figure 13 Counter-controlled DLL (blocks: phase detector, N-bit up/down counter, binary-weighted delay line built from delay cells; labels: input clock, output clock, UP/DOWN, LOCK, d1, d2 (dummy), In, Out)

Figure 13 shows the block diagram of the counter-controlled DLL. The up/down counter operates according to the output of the phase detector. Each bit of the n-bit control word determines whether the input signal goes through the corresponding delay path or bypasses it. The main difference between the register-controlled DLL (RDLL) and the counter-controlled DLL (CDLL) is the area requirement. For example, if 128 delay stages are required in an RDLL, only 7 delay stages are required in a CDLL, and the 128-bit shift register of the RDLL can be replaced by a 7-bit up/down counter. For the same operating range and delay resolution, the delay line of the RDLL has a larger offset delay time and occupies a larger chip area than that of the CDLL. By using a CDLL, the chip area can be reduced while maintaining the same operating range as an RDLL. However, the CDLL still uses a linear approach to track the input clock, so its locking time is not improved over the RDLL. In the worst case, with an n-bit binary-weighted delay line, the locking time remains n/2 locking cycles.
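The area argument rests on the fact that 7 binary-weighted stages cover the same 128 delay settings as 128 equal stages. A minimal sketch of the delay selected by a control word is given below; the function name and the unit delay are assumptions, and the offset delay of the bypass paths is ignored.

```python
def binary_weighted_delay(word, n_bits, unit_delay):
    """Delay added by an n-bit binary-weighted delay line.

    Bit i of `word` inserts a stage of 2**i unit delays; a cleared bit
    bypasses that stage.
    """
    if not 0 <= word < 2 ** n_bits:
        raise ValueError("control word out of range")
    return sum(((word >> i) & 1) * (2 ** i) * unit_delay for i in range(n_bits))


# 7 stages with a hypothetical 10 ps unit delay give 128 distinct settings.
settings = {binary_weighted_delay(w, 7, 10) for w in range(2 ** 7)}
print(len(settings), min(settings), max(settings))
```

An up/down counter that increments or decrements `word` by one per cycle steps through these settings one unit delay at a time, which is why the search remains linear.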

2.4.3 Successive Approximation Register-controlled DLL [4]

As mentioned above, the locking time is an important performance parameter for a digital DLL, especially in high-speed memory applications. Both DLLs described above are based on a linear search and exhibit the same lock time. The linear search algorithm lengthens the locking time needed to find the optimal delay to insert between the input clock and the output clock. A binary search algorithm can be applied to reduce the locking time. First, the most significant bit (MSB) of the control word is set to 1, and all other bits are set to 0. The phase detector judges whether the output clock leads the input clock. If the output clock leads the input clock, the MSB is set low. If the output clock lags the input clock, the MSB remains high and is held constant. In this way, the MSB is determined. The procedure is repeated for each following bit until the least significant bit (LSB) is determined.

Figure 14 shows an example of the 3-bit binary search algorithm. Assume that the final control word is “001” and the initial control word is “100”. In this example, the output clock leads the input clock in step 1 and step 2, and lags the input clock in step 3. The binary search thus finds the correct control word “001”.

Figure 14 Flowchart of 3-bit binary search algorithm (tree of control words starting from 100, with lead/lag branches at Step 0, Step 1 and Step 2)
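The 3-bit example can be reproduced with a short sketch of the bit-by-bit search. The callback `output_leads` stands in for the phase detector and is an assumption: here it reports "lead" whenever the trial word exceeds the target word, which matches the lead/lag sequence of the example in the text.

```python
def sar_search(n_bits, output_leads):
    """Binary search for the control word, MSB first.

    output_leads(word) stands in for the phase detector: it returns True
    when the output clock leads the input clock for that trial word, in
    which case the bit under test is cleared.
    Returns the final word and the list of trial words.
    """
    word = 0
    trials = []
    for bit in range(n_bits - 1, -1, -1):
        trial = word | (1 << bit)      # tentatively set the current bit
        trials.append(trial)
        if not output_leads(trial):    # lag: keep the bit
            word = trial
    return word, trials


# Reproduce the 3-bit example: the correct word is 0b001.
target = 0b001
word, trials = sar_search(3, lambda w: w > target)
print(format(word, "03b"), [format(t, "03b") for t in trials])
```

Each bit is decided once and never revisited, so the search needs one comparison per bit regardless of the target word.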

The successive approximation register (SAR) DLL changes the search mechanism to a binary search and adopts a binary-weighted delay line. It not only reduces the chip area but also shortens the locking time. In the worst case, with an n-bit delay line, the locking time of the SAR-DLL is log2(2^(n-1)). Unfortunately, the SAR controller determines the value of each bit of the word sequentially and irreversibly. The DLL therefore becomes an open-loop circuit after lock-in and cannot track PVT variations. An improved SAR DLL [41] was proposed to solve this problem by using a counter-controlled control word instead of a SAR-controlled one. The initial control word of the counter is loaded from the SAR controller, and a counter-controlled DLL then takes over to track environmental variation.

Figure 15 SARDLL

2.4.4 Time measurement controlled DLL

Another mechanism to reduce the locking time was proposed in [49] [32]. The time measurement controlled (TM) DLL divides the locking procedure into two stages, coarse tuning and phase tracing. The coarse tuning stage is based on a time-to-digital converter (TDC) circuit. In the RDLL and CDLL, the narrow tuning step causes the long locking time. The TDC measures the input clock period and converts it into a digital code within two clock cycles, then transfers the digital control word to the control block; the tuning step is therefore large. After the coarse tuning stage, the phase tracing stage is activated to fine-tune the delay of the delay line. Usually only a few control bits need to be determined in the phase tracing stage, so a counter-based control block is preferred. Comparing the TM-DLL with the SAR-DLL, there is no difference in locking time in the phase tracing stage; the main distinction lies in the coarse tuning stage. The locking time of the SAR-DLL in the coarse tuning stage depends on how many control bits need to be determined, whereas the TM-DLL completes coarse tuning within only a few cycles. In the worst case, assuming m fine-tuning bits, the locking time of the TM-DLL is (m/2+2) cycles. Although the search time of the TM-DLL is short, its drawback is still the area requirement.


Figure 16 TM DLL
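The two-stage locking idea can be illustrated with a small behavioural sketch. Everything here is an assumption made for illustration: the function name `tm_dll_lock`, the step sizes, and the simple model in which the TDC quantizes the period in coarse steps within two cycles and an up-counter then adds one fine step per cycle. It does not reproduce the circuits of [49] [32].

```python
def tm_dll_lock(period_ps, coarse_step_ps, fine_step_ps, tdc_cycles=2):
    """Two-stage lock: TDC-based coarse tuning, then counter-based tracing.

    Returns (coarse_code, fine_code, cycles). Step sizes are illustrative.
    """
    # Coarse stage: the TDC quantizes the clock period in coarse delay steps.
    coarse_code = period_ps // coarse_step_ps
    cycles = tdc_cycles
    # Phase tracing: an up-counter adds one fine step per cycle until the
    # delay line covers the full clock period.
    fine_code = 0
    while coarse_code * coarse_step_ps + fine_code * fine_step_ps < period_ps:
        fine_code += 1
        cycles += 1
    return coarse_code, fine_code, cycles


print(tm_dll_lock(2000, 160, 20))   # prints: (12, 4, 6)
```

For a 2000 ps period, the assumed 160 ps coarse step leaves only 80 ps for the counter to cover, so the model locks in 6 cycles, whereas a purely linear search with the same 20 ps step would need 100 steps.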

2.4.5 Digitally Controlled Delay Line

The digitally controlled delay line (DCDL) is the key component of the ADDLL. Like most voltage controlled delay lines (VCDL), the DCDL consists of several digitally controlled delay elements (DCDE). There are two main parameters for adjusting the delay time of a DCDL. One is the total number of delay elements, which is usually used for coarse tuning, and the other is the propagation delay time of the delay elements (i.e. inverters), which is usually used for fine tuning. The first parameter is usually realized by a path-selection approach, and Figure 17 shows an example [16]. In this example, 2^n delay buffers are connected in series. A decoder decodes an n-bit control word D into 2^n control lines. Hence, if the propagation delay time of each buffer stage is Tbuffer, then the time resolution is 2*Tbuffer.


Another example of the path-selection method is shown in Figure 18. The lattice delay line (LDL) [5] cascades several lattice delay units (LDU). The digital control word T determines the propagation path of the clock signal (CLKIN). In a conventional digitally controlled delay element, which selects between two different delays with a multiplexer, increasing the tuning range also increases the intrinsic delay. In the LDL, the minimum delay does not change when the tuning range increases. Both the intrinsic delay and the delay step of an LDL are the delay of two NAND gates. As the operating frequency increases, the number of activated delay units is reduced and the power consumption remains the same.

Figure 18 LDU and LDL

Several different architectures have been used to implement a DCDE. They can generally be classified into shunt capacitor based, parallel-inverter based and current-starved based delay elements. The following sections introduce these kinds of DCDE.

2.4.5.1 Shunt Capacitor based DCDE

Figure 19 shows the basic circuit of a shunt capacitor based DCDE [50]. In this circuit, MC1~MCn act as shunt capacitors. Transistors M1~Mn control the charging and discharging current to MC1~MCn. The operation is similar to that of the RCCDL, with Vctrl replaced by the n-bit digital control word D, which controls the equivalent capacitance on the output node. As a consequence, the delay time of the shunt capacitor based method can be controlled in a binary-weighted manner. The drawback of the shunt capacitor based DCDE is its sensitivity to power supply noise and PVT variation.


Figure 19 shunt-capacitor based DCDE
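As a rough illustration of binary-weighted delay control, the sketch below uses a first-order RC model in which bit i of the control word D connects a capacitor of 2^i unit capacitances to the output node. The function name `shunt_cap_delay_ps` and all component values (intrinsic delay, driver resistance, unit capacitance) are assumptions chosen for readability, not values from [50].

```python
def shunt_cap_delay_ps(word, n_bits, t0_ps=50.0, r_kohm=2.0, c_unit_ff=1.0):
    """First-order RC delay of a shunt-capacitor DCDE (illustrative values).

    Bit i of `word` connects a capacitor of 2**i * c_unit_ff to the output.
    """
    if not 0 <= word < (1 << n_bits):
        raise ValueError("control word out of range")
    # Equivalent shunt capacitance selected by the control word.
    c_eq_ff = sum(((word >> i) & 1) * (1 << i) * c_unit_ff
                  for i in range(n_bits))
    return t0_ps + r_kohm * c_eq_ff   # kOhm * fF = ps


for d in (0b0000, 0b0001, 0b0101, 0b1111):
    print(format(d, "04b"), shunt_cap_delay_ps(d, 4))
```

In this idealized model the delay grows linearly with the control word; the supply-noise and PVT sensitivity mentioned above is not modelled.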

2.4.5.2 Standard Cell based DCDE

One simple example of a standard cell based DCDE was proposed in [20] [51], as shown in Figure 20. The delay element cascades six inverters in the first row, and additional tri-state inverters, each with its own control bit, are added in every column. The delay time of the DCDE is controlled by the number of enabled tri-state inverters. It is simple and easy to implement. However, the fine tuning required in a DCDL design costs large area and high power dissipation. Besides, it is hard to make the resolution uniform.

Figure 20 parallel tri-state inverter based DCDE

In the other example, proposed in [48] and shown in Figure 21, the DCDE is implemented with an and-or-inverter (AOI) cell and an or-and-inverter (OAI) cell with two parallel tri-state inverters. The basic method is to adjust the driving capability through resistance control. The advantage is that this fine-tuning DCDE needs less area and power than [20] [51]. However, since the delay resolution is changed through the AOI-OAI cell, the resolution step is also hard to make uniform and is sensitive to power-supply variation. Besides, it requires an additional decoder to map the control input of the AOI-OAI cell.

Figure 21 AOI-OAI parallel based DCDE

2.4.5.3 Current-starved based DCDE

The current-starved based DCDE was proposed in [26]. As Figure 22 shows, the charging and discharging currents of the inverter composed of M1 and M2 are controlled by two sets of current-controlling nMOS (Mn0, Mn1, …) and pMOS (Mp0, Mp1, …) transistors at the sources of M1 and M2, respectively. The current-controlling transistors are sized in a binary fashion, which makes binary incremental delays possible. Applying a specific binary vector to the controlling transistors turns on a combination of transistors at the sources of M1 and M2. Such an arrangement controls the rise time and fall time of the output voltage of the inverter.


Figure 22 current starved based DCDE

However, one of the problems with current-starved based DCDE architectures is the non-monotonic delay behavior with an ascending binary input vector. As can be seen in the circuit of Figure 22, the input vector changes the effective resistance of the transistors placed at the source of the nMOS or pMOS transistor of the inverter. This changes not only the resistance at the source of M1 or M2, but also the parasitic capacitance associated with the transistors at these nodes, because the parasitic capacitance at the drain of a MOSFET differs between the ON and OFF states.

According to [8], two factors that depend on the input vector affect the delay:

(1) The resistance of the controlling transistors:

The circuit delay increases or decreases as the effective ON resistance of the controlling transistors at the source of M1 increases or decreases.

(2) The capacitance of the controlling transistors:

The charge sharing effect causes the output capacitance to be discharged faster, so the overall delay decreases as the effective capacitance of the controlling transistors at the source of M1 increases.

A larger resistance increases the delay, whereas a larger parasitic capacitance decreases it. The effective capacitance seen at the source of M1 depends on which controlling transistors are on, because the capacitance between the drain of a MOSFET and ground differs between the ON and OFF states. Therefore, a monotonic delay characteristic of the DCDE cannot be ensured for an ascending input vector. This situation is further complicated as the number of delay controlling transistors increases. Table 1 compares the different types of DCDE.

Table 1 Comparison of different types of DCDE

Delay cell type | Drawbacks | Circuit structure
Shunt-capacitor based | Sensitive to power supply noise; process mismatch | Figure 19
Standard-cell based | Larger area and power dissipation; non-uniform delay resolution; different coarse delay and fine delay cells | Figures 20 and 21
Current-starved based | Poor linearity; sensitive to PVT variation | Figure 22

2.5 Comparison of Digital DLL and Analog DLL

The main advantages of the analog approach are a smaller static phase error, good jitter performance and fine resolution, because the delay is varied continuously. In addition, the analog DLL achieves small chip area and low power consumption. However, it suffers from slow locking and from performance degradation due to its sensitivity to process and temperature variations. Although the digital DLL requires more chip area and power dissipation, it is more robust against process, voltage and temperature (PVT) variations. Besides, the digital DLL provides a fast lock time and is easy to design. However, the quantization error of the digital DLL is unavoidable because the delay is adjusted in discrete steps.

Nevertheless, the digital DLL remains attractive for its shorter lock time and easier integration compared with the analog approach. Table 2 compares the analog DLL and the digital DLL.

Table 2 Comparison of analog DLL and digital DLL

Parameter | Analog | Digital
Phase error | Smaller | Larger
Lock range | Smaller | Larger
Lock time | Longer | Shorter


CHAPTER 3

MULTIPHASE DELAY-LOCKED LOOP

WITH SELF-CALIBRATION

This chapter introduces the multiphase DLL with self-calibration. The conventional multiphase DLL architecture and its design considerations are described in Section 3.1. The applications of the multiphase DLL are detailed in Section 3.2, and self-calibration schemes are described in Section 3.3. Finally, Section 3.4 gives an introduction to the proposed Rapid Self-Calibration (RSC) algorithm.

3.1 Introduction of Multiphase DLL

A ring-oscillator-based phase-locked loop (PLL) or a delay-line-based delay-locked loop (DLL) has been widely used because of its ability to generate multiphase clock signals. The multiphase clock signals can be used in various applications. Time-interleaved architectures, such as transmitters and receivers, employ multiple signal processing paths in parallel to achieve high overall speed while the speed of each channel remains standard [48] [24]. In wireless communication systems, the multiphase clock signals are easily converted into the in-phase and quadrature (I/Q) signals with the π/2 radian difference essential for the down-conversion mixer [25]. In frequency synthesizers, the multiphase signals are used to generate a high-frequency signal [1].

In most of these systems, the ring oscillator or delay line, which consists of several identical delay elements, is inserted into a negative feedback loop. When the PLL or DLL enters the locked state, the reference clock period is split into several identical parts and the delay time of each delay stage is equal. Unfortunately, even if all the delay stages are designed to be identical, each delay stage introduces a different delay due to mismatch after fabrication, not to mention temperature and supply voltage variations.

3.2 Application of Multiphase DLL

In this section, we introduce the applications of the multiphase DLL in detail. In these applications the multiphase DLL is used to replace the PLL. The DLL is chosen over the PLL because it does not exhibit jitter accumulation and because some applications do not need frequency multiplication.

I. Frequency Synthesizer

Figure 23 The block diagram of conventional DLL-based frequency synthesizer.

A DLL can operate like a PLL by using a delay line in place of the VCO. Figure 23 shows the simplified block diagram of a DLL-based frequency synthesizer. When the loop is locked, the output phases of the delay stages are evenly spaced over one reference clock period Tref. The phase difference between two adjacent delay stages corresponds to a delay of Tref/N, and the edge combiner generates a transition for each phase output transition; hence the output frequency is N times the reference frequency 1/Tref.

A multiplying DLL overcomes the drawbacks of the PLL, such as jitter accumulation and high sensitivity to supply and substrate noise. For this reason, it achieves good phase noise performance.


II. Dynamic Frequency Scaling

In recent years, power and energy consumption has become a critical design issue in embedded systems, especially in mobile and portable systems. Dynamic Voltage Frequency Scaling (DVFS) has become increasingly important for saving energy in mobile embedded systems. Figure 24 illustrates the voltage/frequency transition proposed in [36]. In this example, the voltage changes from high to low and then back to high. In conventional frequency scaling, the clock must be stopped during the voltage transition, so the frequency scaling incurs a performance overhead. In the frequency scaling proposed in [36], voltage/frequency selectors are introduced to avoid this performance overhead, as indicated in the third line of the figure.

Figure 24 The operation of DVFS scheme

Note that two issues need to be considered when changing the frequency without stopping the running programs. First, the data transfer between modules operating at different frequencies must be handled by the main bus. Second, the transition in supply voltage skews the clock tree.

A DVFS scheme is also proposed in [35]. A frequency adjuster circuit calculates the optimum clock frequency based on the activity value derived from the activity monitor, so as to reserve the required number of inactive margin cycles within the monitoring period, and indicates the next clock frequency to the clock generator. The dynamic frequency is selected by the clock thinning circuit, which collects several different frequency inputs. Therefore, it can operate continuously without PLL relock or system.


To avoid performance overhead, the relock time is an important issue for DFS. All the previously mentioned DVFS schemes use multiple existing frequencies to generate the desired frequency. However, the frequencies that are not in use still add to the consumption. A multiphase DLL based clock generator for dynamic frequency scaling was proposed in [31]. With plain digital logic for frequency adjustment, the multiplication factor can be changed with a fast lock time. In a specific case, it takes only one cycle to lock during frequency scaling.

III. Transmitter [48]

In digital communication applications, the multiphase DLL can be applied to a data channel compression transceiver. The architecture of the transceiver is shown in Figure 25. The transmitter’s outputs, TX_DATA and TX_CLK, are sent to the receiver’s inputs, RX_DATA and RX_CLK, respectively. In the transmitter, the generated seven-phase clock signals are used to transfer the 7-bit data (DATA [6:0]) into one data channel (TX_DATA), and TX_CLK is also sent to the receiver. The receiver shown in Figure 25(b) recovers the received data stream (RX_DATA) back to the original 7-bit data (DATA_OUT [6:0]). The two-phase ADMCG shown in Figure 25(b) is used to estimate the accurate delay of TREF/14. It aligns two adjacent phases of the seven-phase DLL outputs (i.e., P6 and P0) to measure the delay, and the received data stream is first delayed by this amount and then sampled by the seven-phase clock signals. Thus, the multiphase clock signals sample the received data stream at the center of each bit symbol, which maximizes the timing margin of the receiver circuit.


Figure 25 7:1 Data channel compression transceiver. (a) Transmitter circuit. (b) Receiver circuit.

IV. DDR SDRAM controller application [10]

In [10], the timing budget calculations show that the optimal value for tSD is approximately 20 percent of an input clock period, as shown in Figure 26. Since the input clock frequency ranges from 100 MHz to 200 MHz (DDR-200/266/333/400), the tSD value varies from 2 ns (= 10 ns × 0.2) to 1 ns (= 5 ns × 0.2). Therefore, a five-phase all-digital DLL was proposed in [10] to generate the desired tSD delay for the DQS signal.

Figure 26 Read operation timing budget

The block diagram of the five-phase all-digital DLL for the DDR SDRAM controller application is shown in Figure 27. Like most DLL-based multiphase clock generators, the DLL has a multi-stage delay line whose stages share the same control word to generate equally spaced multiphase clock outputs. It uses the time-to-digital converter (TDC) scheme to lock the whole loop. One design consideration is that it is sometimes difficult to meet the minimum delay constraint when standard cells are used to build a high-resolution delay cell. Therefore, the DLL in this design is locked to two periods of the reference clock by using the TDC scheme. After the DLL is locked, the phase spacing of each delay stage should be 2*TFREF/5, where TFREF is the period of the reference clock. Hence the minimum delay constraint for each delay stage is relaxed to twice the original value. The total delay from DQS to DQSD becomes 1.2xTFREF, which means the phase shift between DQS and DQSD is still 0.2xTFREF. As a result, the desired tSD delay can be generated by the multiphase DLL.

Figure 27 The block diagram of multiphase DLL for DDR SDRAM application
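The timing arithmetic of this application can be checked with a short sketch. The function names `dqs_phase_shift` and `tsd_ns` are hypothetical, and the use of three delay stages between DQS and DQSD is an inference from the stated total delay (1.2 × TFREF divided by the 0.4 × TFREF stage delay), not a detail taken from [10].

```python
from fractions import Fraction


def dqs_phase_shift(n_stages=5, lock_periods=2, tap=3):
    """Return (stage_delay, total_delay, phase_shift) in units of T_FREF."""
    stage_delay = Fraction(lock_periods, n_stages)   # 2/5 of T_FREF per stage
    total_delay = tap * stage_delay                  # delay from DQS to DQSD
    phase_shift = total_delay % 1                    # delay modulo one period
    return stage_delay, total_delay, phase_shift


def tsd_ns(clock_mhz):
    """tSD in nanoseconds for a given reference clock frequency in MHz."""
    period_ns = Fraction(1000, clock_mhz)
    return dqs_phase_shift()[2] * period_ns


print(dqs_phase_shift())          # stage 2/5, total 6/5, shift 1/5 of T_FREF
print(tsd_ns(100), tsd_ns(200))   # prints: 2 1
```

Locking to two periods doubles the per-stage delay to 0.4 × TFREF while the resulting phase shift stays at 0.2 × TFREF, which gives 2 ns at 100 MHz and 1 ns at 200 MHz as stated in the text.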

3.3 Self-Calibration Techniques

As mentioned above, multiphase clocks are useful in many applications. The feedback loop guarantees that the whole loop holds the locked state. However, each delay cell may introduce a different delay time due to process variations or wiring mismatch. Without a calibration scheme, it is impossible to equalize the phase differences between the output signals. Figure 28 shows 1000-point Monte-Carlo simulation results for the static timing errors among five delay cells, where the designed delay time is 1 ns and a threshold voltage mismatch of 15 mV is added to the delay cells. In our 90 nm CMOS technology, a threshold voltage mismatch of 15 mV in the delay cells causes a maximum delay time mismatch of 100 ps (around 10% delay mismatch). Similarly, Figure 29 shows the Monte-Carlo simulation results for a 10% channel length mismatch. The simulation results indicate that the channel length mismatch causes a maximum delay time mismatch of 40 ps (around 4% delay mismatch) for the delay cells.

Figure 28 Delay time mismatch due to the delay cell with the threshold voltage mismatch of 15 mV

Figure 29 Delay time mismatch due to the delay cell with the channel length mismatch of 10%

Since the delay cells introduce different delay times due to process mismatch, an additional calibration mechanism is needed to compensate for the mismatches among the delay cells in the DLL or PLL. One solution to reduce the mismatch is to increase the transistor size. Starting from a circuit that has been optimized with respect to specifications other than noise and mismatch, one can scale the width of every component of that circuit by a certain factor α. For a delay cell, the implication of this impedance level scaling is that increasing the power by a factor α yields a stochastic jitter reduction of α. The mismatch of the delay between different cells also improves by a factor α [1].

However, impedance level scaling [1] increases parasitic capacitance, power and area. When the clocking speed increases, delay cells with minimum channel length may be chosen for the sake of higher speed [7]. Such delay cells suffer from poor matching, which may induce significant timing errors. Thus, an extra self-calibration algorithm and its circuits are necessary for a precise multiphase DLL or PLL.

3.3.1 A Self-Calibration Delay-Locked Delay Line

One self-calibration algorithm was proposed in [26]; its basic concept is shown in Figure 30, where DNLi is the differential non-linearity of the i-th delay cell, Ri is the content of the i-th register, and R0i is its value at the beginning of the non-linearity test. A complete code-density test is performed with the balanced mean method [26], and the correction is done by comparing the register content with two thresholds defined as the ±1% non-linearity values. If an arithmetic overflow (or underflow) of the register is detected during the test, an interrupt is raised for the delay cell and the cell controller ignores further hits.

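[Figure 30: register content Ri against DNLi. The register codes 111..11, 110..00, 100..00, 001..11 and 000..00 correspond to DNLi values of +2%, +1%, 0%, -1% and -2%; DOWN=1 is marked in the upper region and UP=1 in the lower region.]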

Figure 30 Self-calibration delay locked delay line scheme

The calibration procedure depends on the two most significant bits of the register content. When these two bits are ‘00’ or ‘11’, the relevant threshold has been exceeded and the cell controller should decrease or increase the delay time of the delay cell by adjusting the calibration control word. Thus the comparator can be reduced to a very simple structure that consists of two logic gates and drives a four-bit up/down counter, which generates the calibration control word. According to the test results, the delay mismatch of each delay cell can be pushed below 1%.
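The two-gate comparator and the four-bit up/down counter can be mimicked in software as follows. This is a hypothetical sketch: the function name `calibration_step`, the 8-bit register width in the usage lines, the saturating counter, and in particular the mapping of ‘11’ to DOWN and ‘00’ to UP (read from the labels of Figure 30) are assumptions rather than details confirmed by [26].

```python
def calibration_step(register, reg_bits, cal_word, cal_bits=4):
    """Update the calibration word from the two MSBs of the test register.

    Assumed mapping: '11' -> DOWN (count down), '00' -> UP (count up);
    any other pattern leaves the calibration word unchanged.
    """
    msb2 = (register >> (reg_bits - 2)) & 0b11
    down = msb2 == 0b11            # both MSBs set (AND gate)
    up = msb2 == 0b00              # both MSBs clear (NOR gate)
    if down and cal_word > 0:
        cal_word -= 1              # saturate at 0 (assumption)
    elif up and cal_word < (1 << cal_bits) - 1:
        cal_word += 1              # saturate at the maximum (assumption)
    return cal_word


print(calibration_step(0b11000000, 8, 8))   # prints: 7
print(calibration_step(0b00111111, 8, 8))   # prints: 9
print(calibration_step(0b10000000, 8, 8))   # prints: 8
```

Only the two most significant bits are inspected, so no full-width magnitude comparator is needed.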


This calibration algorithm uses a time-measurement method to realize a self-calibrating delay-locked delay line. However, it has a restriction: if the initial non-linearity of a delay cell is outside the allowable correction range, the calibration mechanism must halt.

3.3.2 Sequential Phase Adjustment Calibration Technique

Another method, proposed in [3] [7], avoids the problem mentioned above. In [3], the operation of self-calibration is as shown in Figure 31. Assume the initial phase differences between out1 and out2, out2 and out3, out3 and out4, and out4 and out1 are θ1, θ2, θ3, and θ4, respectively, and the target phase differences are all 90 degrees. The calibration procedure is controlled by the loop enable signal loop_enabli. First, the loop_signal1 signal becomes high and selects three output signals, out1, out2, and out3. The selected signals are fed into the relative phase detector shown in Figure 32, which generates the control signals vcon1 and vcon2 to adjust the phase of out2 by changing the capacitance of two adjacent capacitors. With this relative phase comparison method, the phase differences between out1 and out2 and between out2 and out3 become the same, as shown in Figure 31 (b). Similarly, when the next loop enable signal, loop_signal2, is high, the phase differences between out2 and out3 and between out3 and out4 become the same, as shown in Figure 31 (c). By continuing this procedure, the phase differences between out1 and out2, out2 and out3, out3 and out4, and out4 and out1 finally all become the same. Figure 31 (f) shows the final state of the DLL loops.
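The sequential procedure can be illustrated with an idealized numerical model. In the sketch below, each enabled loop moves one output exactly to the midpoint of its two neighbours, which is what equalizing the two adjacent phase differences amounts to; the function name `sequential_calibration`, the starting phases and the assumption of ideal, noise-free adjustment are all illustrative and not taken from [3].

```python
def sequential_calibration(phases_deg, sweeps=30):
    """Equalize output phase spacing by repeated midpoint adjustment.

    phases_deg[0] is held by the main loop; every other output is moved in
    turn to the midpoint of its two neighbours (ideal model).
    """
    p = list(phases_deg)
    n = len(p)
    for _ in range(sweeps):
        for i in range(1, n):
            left = p[i - 1]
            # The neighbour after the last output is out1 one period later.
            right = p[i + 1] if i + 1 < n else p[0] + 360.0
            p[i] = (left + right) / 2.0
    return p


result = sequential_calibration([0.0, 80.0, 185.0, 265.0])
print([round(x, 3) for x in result])   # prints: [0.0, 90.0, 180.0, 270.0]
```

Because each adjustment disturbs the spacing that was equalized in the previous step, several passes are needed before all four phase differences settle at 90 degrees.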


Figure 31 Sequential phase adjustment calibration algorithm

Figure 32 (a) Relative phase detector (b) Relative comparison method

3.3.3 Parallel Phase Adjustment Calibration Technique

In order to adjust every output phase independently without interfering with the main loop, a method that places additional delay adjustment outside the ring oscillator or the delay line was proposed in [2]. Figure 33 shows the parallel phase adjustment calibration algorithm. Since φ1 tracks the reference clock through the main loop, φ5 can be calibrated by comparing △td15 and △td51, with φ1 used as the reference signal of φ5. Once φ1 and φ5 are established, φ3 can be calibrated by comparing △td13 and △td35, and φ7 by comparing △td57 and △td71. This process is repeated for the other phases until the phase differences of all delay cells are calibrated.

Figure 33 Parallel phase adjustment calibration algorithm
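The calibration order described above is a bisection of the clock period, which the following sketch reproduces for an assumed eight-phase clock. The function name `parallel_calibration` is hypothetical, and the model is ideal: each calibrated phase lands exactly midway between its two already established references.

```python
def parallel_calibration(n_phases=8, period=1.0):
    """Return (edge times, calibration order) for a bisection-style calibration.

    Index 0 is phi_1, held by the main loop; n_phases must be a power of two.
    """
    if n_phases < 1 or n_phases & (n_phases - 1):
        raise ValueError("n_phases must be a power of two")
    t = [None] * (n_phases + 1)
    t[0], t[n_phases] = 0.0, period        # phi_1 and its next-cycle edge
    order = []
    span = n_phases
    while span > 1:
        half = span // 2
        for start in range(0, n_phases, span):
            mid = start + half
            # Equalize the two delays on either side of the middle phase.
            t[mid] = (t[start] + t[start + span]) / 2.0
            order.append(mid + 1)          # 1-based phase index
        span = half
    return t[:n_phases], order


times, order = parallel_calibration()
print(order)   # prints: [5, 3, 7, 2, 4, 6, 8]
print(times)
```

Phases that belong to the same bisection level, such as φ3 and φ7, depend only on references that are already established, so they can be calibrated at the same time.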

Figure 34 shows the circuit design that implements the parallel phase adjustment calibration algorithm described above. The delay sensing circuit, shown in Figure 34 (a), produces a pulse whose width is the time delay △tdij. The calibration loop charge pump, shown in Figure 34 (b), is a simple current-steering structure, and the capacitor is implemented with a pMOS transistor.


3.3.4 A PLL Based Self-Calibrated Algorithm

The self-calibrated algorithm proposed in [25] is based on the innate ability of the PLL. Assume the fractional-N frequency synthesizer is based on a PLL capable of generating 8 different phase clock signals. Each edge of the clock signals is used to synthesize the output signal. Consequently, the division ratio becomes (M+1/8). When the PLL is locked, the sum of the phase errors $\Delta t_i$ caused by the delay mismatches becomes zero, where i denotes the i-th delay cell. In other words,

$$\Delta t_1 + \Delta t_2 + \cdots + \Delta t_8 = 0 \tag{3.1}$$

Assume the phase offset of the 1st delay cell is changed by $\Delta_1^1$. Then, after the PLL is locked again, the resulting phase offsets become

$$\Delta t_1^1 = \Delta t_1 - \Delta_1^1 + \frac{\Delta_1^1}{8}, \quad \Delta t_2^1 = \Delta t_2 + \frac{\Delta_1^1}{8}, \quad \ldots, \quad \Delta t_8^1 = \Delta t_8 + \frac{\Delta_1^1}{8} \tag{3.2}$$

where $\Delta t_N^k$ is the phase error due to the N-th delay cell after k cycles of calibration, and $\Delta_N^m$ is the amount of calibration at the m-th iteration.

The above step is repeated for each delay cell one by one until

$$\sum_m \Delta_N^m = \Delta t_N \tag{3.3}$$

is satisfied for all delay cells. The final value of the phase error due to the 1st delay cell then becomes

$$\Delta t_1^{final} = \Delta t_1 - \sum_k \Delta_1^k + \frac{1}{8}\left(\sum_k \Delta_1^k + \sum_k \Delta_2^k + \cdots + \sum_k \Delta_8^k\right) = \Delta t_1 - \Delta t_1 + \frac{1}{8}\sum_{n=1}^{8} \Delta t_n = 0 \tag{3.4}$$

Similarly,

$$\Delta t_1^{final} = \Delta t_2^{final} = \Delta t_3^{final} = \cdots = \Delta t_8^{final} = 0 \tag{3.5}$$

Therefore, all the phase errors due to the delay mismatches are reduced to zero when the compensation algorithm is finished.
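The redistribution behaviour of equation (3.2) can be explored numerically. The sketch below is an illustrative model, not the circuit of [25]: the function name `pll_self_calibration` is hypothetical, the initial errors are invented values that sum to zero, and each cell is assumed to be corrected by exactly its currently measured error.

```python
def pll_self_calibration(errors, sweeps=30):
    """Iteratively cancel per-cell phase errors using the update of (3.2).

    Correcting cell i by delta changes its own error by -delta + delta/n and
    every other error by +delta/n, because the locked loop keeps the sum of
    all phase errors at zero.
    """
    e = list(errors)
    n = len(e)
    for _ in range(sweeps):
        for i in range(n):
            delta = e[i]                     # assumed: correct by measured error
            e = [x + delta / n for x in e]   # relock spreads delta/n over all cells
            e[i] -= delta                    # cell i itself is corrected by delta
    return e


initial = [4.0, -3.0, 2.0, -1.0, 1.0, -2.0, 3.0, -4.0]   # ps, sums to zero
final = pll_self_calibration(initial)
print(max(abs(x) for x in final) < 1e-6)   # prints: True
```

Each correction leaves one eighth of the applied amount on every cell, so a single pass is not enough; repeating the step cell by cell drives all eight errors toward zero while their sum stays at zero.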


3.4 Rapid Self-Calibration Algorithm

In [26] [3] [7] [2] [25], calibration algorithms have been adopted to overcome PVT variations. The self-calibration algorithm of [3], [7] requires an additional timing control circuit and a large number of calibration cycles. A novel rapid self-calibration (RSC) algorithm is proposed to reduce the number of calibration cycles without any extra timing control circuit.

Figure 35 The idea of the proposed RSC algorithm

Figure 35 shows the idea of the proposed RSC algorithm. Assume the multiphase DLL consists of k identical digitally controlled delay elements and the multiphase outputs are generated from each delay element. Unlike a conventional multiphase DLL, in which the delay elements are controlled by a global signal, each delay element in the proposed RSC algorithm is controlled by two sets of control words: the lock-in control word and calibration control word_i, where i denotes the i-th delay stage. The lock-in control word is connected to all delay elements, and each delay element has a distinct calibration control word_i, as shown in Figure 35.

In the beginning, the total delay of the DCDL locks to a multiple of the period of the reference clock, Ref_clk, by changing the lock-in control word. After the DLL is locked, the phase difference between Ref_clk and the last output signal, Pk, is equal to one reference clock period. The RSC algorithm first considers three signals: Ref_clk, P1, and P2. A relative comparison method [3] is adopted to adjust θ1 to (θ1+θ2)/2 by changing calibration control word_1, where θi denotes the phase difference between Pi and Pi-1. Similarly, the calibration unit then considers the next three signals: P1, P2, and P3. It adjusts θ2 to (θ2+θ3)/2 by changing calibration control word_2. Unlike [3], the modified θ1 does not affect θ2. This allows sequential adjustment in the same reference cycle. Finally, the adjustment of θk is based on Pk and Ref_clk, which guarantees that the whole DLL remains locked. The lock-in control word remains unchanged during the calibration process to ensure successful RSC operation. As a result, the final output phase difference of each delay stage is one fifth of the reference clock period (for k = 5).
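The averaging step of the RSC algorithm can be simulated with a short sketch for the five-stage case. The function name `rsc_calibrate` and the initial phase differences are hypothetical, the adjustments are assumed to be ideal, and the last stage is modelled as absorbing whatever remains of the 360 degrees at the end of every cycle, which is a simplification of how the locked loop constrains θ5.

```python
def rsc_calibrate(theta_deg, cycles=100):
    """Iterate the RSC averaging step on the stage phase differences.

    theta_deg[i] is the phase difference of stage i+1; the values must sum
    to 360 degrees (locked DLL). The last stage takes the remainder so that
    the loop stays locked (modelling assumption).
    """
    theta = list(theta_deg)
    k = len(theta)
    for _ in range(cycles):
        # Every average uses the values of the previous cycle, so adjusting
        # one stage does not disturb the comparison of the next stage.
        new = [(theta[i] + theta[i + 1]) / 2.0 for i in range(k - 1)]
        new.append(360.0 - sum(new))
        theta = new
    return theta


start = [80.0, 60.0, 70.0, 75.0, 75.0]
print(rsc_calibrate(start, cycles=2)[:2])   # prints: [67.5, 68.75]
print([round(x, 6) for x in rsc_calibrate(start)])
```

After two cycles the first two values equal (θ1+2θ2+θ3)/4 and (θ2+2θ3+θ4)/4, and with further cycles every phase difference in this model approaches 72 degrees, one fifth of the reference period.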

Figure 36 shows the RSC algorithm expressed in mathematical equations when the number of delay stages is five. Assume the DLL is initially in the locked state and fulfills equation (3.6).

$$\theta_1 + \theta_2 + \theta_3 + \theta_4 + \theta_5 = 360^\circ \tag{3.6}$$

In the first calibration cycle, i.e. n=1, θ1 becomes the arithmetic mean of θ1 and θ2, which is expressed as

$$\theta_1^1 = \frac{\theta_1 + \theta_2}{2} \tag{3.7}$$

where $\theta_j^i$ denotes the phase difference of the j-th delay cell after the i-th calibration cycle, and θ1 and θ2 are the initial phase differences between Ref_clk and P1 and between P1 and P2, respectively. In the same calibration cycle, the phase difference between P1 and P2, θ2, becomes

$$\theta_2^1 = \frac{\theta_2 + \theta_3}{2} \tag{3.8}$$

The phase adjustment of θ3 and θ4 in the first calibration cycle is similar to that of θ1 and θ2. Because the DLL is held in the locked state, θ5 does not change in the first calibration cycle. Next, when n=2, the phase differences θ1 and θ2 become equation (3.9) and equation (3.10), respectively.

$$\theta_1^2 = \frac{\theta_1 + 2\theta_2 + \theta_3}{4} \tag{3.9}$$

$$\theta_2^2 = \frac{\theta_2 + 2\theta_3 + \theta_4}{4} \tag{3.10}$$

Note that after the first calibration cycle, equation (3.6) may no longer hold. In order to guarantee that the whole DLL remains locked, θ5 becomes
