This thesis introduces the basics of the DLL and the idea of clock frequency multiplication.
A DLL-based clock generator for dynamic frequency scaling is then implemented in detail.
The thesis is organized as follows:
Chapter 2 begins with a brief system-level comparison of PLL-based and DLL-based clock generation. We also discuss applications of DLL-based clock generation and introduce the basic building blocks of the clock generator.
In Chapter 3, we describe the mathematical equations and the linear model of the DLL. We discuss the open-loop and closed-loop characteristics used to determine the system parameters, and we build a behavior model. Jitter analysis is then discussed. Finally, the implementation of the DLL is presented.
In Chapter 4, we discuss the multiplication method and describe a 150MHz~1200MHz clock generator for dynamic frequency scaling. Finally, the simulation results are shown.
Chapter 5 begins with the I/O considerations. The layout considerations for minimizing mismatch are then discussed, and the experimental results and the test setup are shown.
Finally, we summarize the reasons for the chip failure and propose solutions.
Chapter 6 concludes this work, in which the DLL-based clock generator is designed. Suggestions for future work are given at the end of the thesis.
Chapter 2
Clock Generator Overview
2.1 Clock Generation Specification Parameters
This section introduces the specification parameters of the clock generator.
◆ Timing jitter: A measure of the clock generator performance. It is the departure of the rising and falling clock edges from their ideal positions.
◆ Multiplication factor: The frequency multiplying capability of the clock multiplier.
◆ Tuning range: A wireless transceiver uses predefined frequency bands to transmit and receive. The tuning range is a measure of the frequency synthesis capability.
◆ Integratability: A measure of how well the circuit can be integrated. The bottlenecks in high-quality PLL/DLL synthesizer design are the high-Q inductor and the loop filter capacitance.
◆ Power consumption: The power consumption should be as low as possible, especially for portable telecommunication systems.
2.2 Clock Generation
2.2.1 PLL-based Clock Generation
The general architecture of a PLL is illustrated in Fig.2.2.1. The reference signal is compared with the frequency-divided output of the VCO.
Fig.2.2.1 PLL
2.2.2 DLL-based Clock Generation
DLLs can generally be categorized into two types according to their jitter transfer characteristics [5] (see Fig.2.2.2). In a Type I DLL, the signal is compared with its own delayed version. This architecture is widely used in DLL-based frequency synthesizers, multi-phase clock generators, and clock deskewing circuits. In a Type II DLL, the signal is compared with the delayed version of another clock source. This architecture is widely used in DLL-based clock recovery circuits.
Fig.2.2.2 Type I and Type II DLLs.
2.2.3 Comparison of PLL/DLL-based Clock Generation
The PLL and DLL are compared below:
◆ PLL
- VCO has jitter accumulation.
- Higher order system, can be unstable and hard to design.
- Costly to integrate loop filter.
- Performance is less reference signal dependent.
- Easy frequency multiplication.
◆ DLL
- VCDL has no jitter accumulation.
- 1st order system, always stable and easier to design. (In general case)
- Easier to integrate loop filter.
- Performance is reference signal dependent.
- Difficult frequency multiplication.
- Limited locking range. (Harmonic locking)
2.3 DLL-based Clock Generator in Different Applications
PLL-based clock generation is not the only solution for a low-cost, high-performance LO in communication systems. The DLL-based clock generator was developed to achieve better phase noise. In addition, a DLL-based clock generator for frequency scaling improves energy efficiency: the frequency scaling scheme is applied to the different operating modes for more power-efficient management. This section introduces recent applications of DLL-based clock generation.
2.3.1 A 900-MHz local oscillator using a DLL-based frequency multiplier technique for PCS applications [2]
carrier frequency for a monolithic CMOS local oscillator is proposed. An edge combination technique is first introduced in the DLL. The experimental results show that the phase noise performance of the DLL-based approach is sufficient for the demanding AMPS/TDMA standard.
Fig.2.3.1 DLL-based frequency multiplier for PCS applications
2.3.2 A 0.2-2 GHz 12 mW multiplying DLL for low-jitter clock synthesis in highly-integrated data communication chips [6]
Fig.2.3.2 Basic components of a high-speed serial I/O.
High-speed serial I/O links are replacing traditional parallel buses as the bandwidth demand of computer and digital communication components continues to grow. The jitter performance of the clock generator directly influences the data transmission. A multiplying DLL architecture with programmable frequency multiplication is first proposed. The experimental results show excellent jitter performance in a quad SerDes block.
2.3.3 A 2.4-Gsample/s DVFS FFT Processor for MIMO OFDM Communication Systems [4]
From a system point of view, the full set of MIMO streams is not always needed, depending on the channel condition and data rate, so the FFT processor for MIMO OFDM systems should optimize the power consumption for all operation modes. Dynamic voltage and frequency scaling (DVFS) is an effective technique that scales both voltage and frequency to optimal values depending on the processing needs. Instead of a PLL-based clock generation scheme, divider-based clock generation is used, mainly to satisfy the fast response time requirement when the operation mode changes. In this work, fclk
8 frequency scaling that achieves the requirement of response time, frequency multiplication, and better jitter performance will be a solution.
Fig.2.3.3 Block diagram of multi-mode MIMO OFDM receiver.
2.4 Introduction to Clock Generator Building Blocks
2.4.1 Phase Detector
The phase detector generates a signal that represents the phase difference between its input signals.
Basically, three flavors of phase detector exist. The first is the analog phase detector, a multiplier that performs a mixing operation on its input signals; the resulting DC output is a measure of the phase error. The second type is the digital phase detector, which is implemented using XOR gates or flip-flops in a sequential way. The third type is the phase-frequency detector, which provides both phase and frequency difference information.
2.4.1.1 Analog Phase Detectors
As shown in Fig.2.4.1, the two sinusoidal inputs, A1‧cos(ω1t+θ1(t)) and a second sinusoid, are multiplied, and the unwanted sum-component is filtered out by the low-pass response.
The analog phase detector is especially useful in applications where the reference frequency is too high for other circuits and where the loop bandwidth is sufficiently narrow to effectively suppress the unwanted signals.
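The mixing operation can be made explicit with the product-to-sum identity. The following is only a sketch: the second input A2‧cos(ω2t+θ2(t)) and the multiplier gain K_m are assumed names that do not appear in the original text.

```latex
\begin{align*}
v_{out}(t) &= K_m \, A_1\cos\big(\omega_1 t+\theta_1(t)\big)\cdot A_2\cos\big(\omega_2 t+\theta_2(t)\big) \\
           &= \frac{K_m A_1 A_2}{2}\Big[\cos\big((\omega_1-\omega_2)t+\theta_1(t)-\theta_2(t)\big)
              +\cos\big((\omega_1+\omega_2)t+\theta_1(t)+\theta_2(t)\big)\Big]
\end{align*}
```

With ω1 = ω2, the low-pass response removes the sum-frequency term and leaves (K_m A1 A2 / 2)‧cos(θ1(t) − θ2(t)), a DC term that depends only on the phase difference.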
Fig.2.4.1 Analog Phase Detector
2.4.1.2 XOR Phase Detector
An XOR gate can be used as a phase detector as shown in Fig.2.4.2, where Yeff denotes the average value of the output, which is proportional to the phase difference θd and can be written as
$$Y_{eff} = K_d\,\theta_d \qquad (2.4.2)$$
$$K_d = \frac{Y}{\pi} \qquad (2.4.3)$$
where Kd is the PD gain and Y is the supply voltage.
The linear operating range is π radians. A problem arises when the inputs are asymmetrical (see Fig.2.4.3): in that case the output signal is clipped around –π/2 and π/2, which reduces the loop gain of the PLL or DLL and thus its locking capability. The narrow phase detection range also limits the applications.
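The XOR relation Yeff = Kd‧θd with Kd = Y/π can be checked numerically. The sketch below assumes ideal 50%-duty square waves and a normalized supply of 1; the function name and the sample count are illustrative choices, not part of the original design.

```python
import math

def xor_pd_average(theta_d, supply=1.0, samples=20000):
    """Average XOR output for two 50%-duty square waves offset by theta_d radians."""
    high = 0
    for k in range(samples):
        # Sample at bin centers so that no sample falls exactly on an edge.
        phase = 2.0 * math.pi * (k + 0.5) / samples
        a = math.sin(phase) >= 0.0             # reference square wave
        b = math.sin(phase - theta_d) >= 0.0   # delayed square wave
        high += a != b                         # XOR of the two logic levels
    return supply * high / samples

if __name__ == "__main__":
    for theta in (0.0, math.pi / 4, math.pi / 2, math.pi):
        expected = theta / math.pi             # Kd * theta_d with Kd = Y / pi, Y = 1
        print(f"theta_d={theta:.3f}  simulated={xor_pd_average(theta):.4f}  Kd*theta={expected:.4f}")
```

Within 0 ≤ θd ≤ π the simulated average follows the straight line Y‧θd/π; beyond π the curve folds back, which is the π-radian linear range stated above.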
Fig.2.4.2 XOR PD
Fig.2.4.3 XOR PD with asymmetric input & Transfer Curve
2.4.1.3 Flip-flop Phase Detector
A phase detector using a JK flip-flop is shown in Fig.2.4.4. J sets Q high, and K resets Q low. The linear range of this phase detector is a full reference cycle and is centered around ±π radians, which is double that of the XOR PD. An SR flip-flop gives the same phase characteristic. The flip-flops are sensitive to reference spurs.
$$Y_{eff} = K_d\,\theta_d \qquad (2.4.4)$$
$$K_d = \frac{Y}{2\pi} \qquad (2.4.5)$$
Fig.2.4.4 (a) JK-FF. (b) θd=0. (c) θd>0. (d) Transfer curve.
2.4.1.4 Phase-Frequency Detector
Typically, the phase-frequency detector is a sequential logic circuit with both phase and frequency detection abilities. The operation of the PFD is shown in Fig.2.4.5. The relation of Yeff to θd is given by Eq. (2.4.6) and Eq. (2.4.7). Up and Dn control the charge pump operation. The linear operating range of this PFD is 4π radians. However, the PFD suffers from a dead zone near zero phase difference, which causes it to malfunction. A delay buffer is inserted in the reset signal path to eliminate the dead zone, at the cost of a reduced operating range.
$$Y_{eff} = K_d\,\theta_d \qquad (2.4.6)$$
$$K_d = \frac{Y}{4\pi} \qquad (2.4.7)$$
Fig.2.4.5 (a) PFD. (b) State diagram. (c) Transfer curve.
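The three-state behavior of the PFD can be sketched as a small event-driven model. This is an idealized illustration with zero reset delay (so no dead zone and no delay buffer); the function name, the normalized period, and the ±1 output levels are assumptions made for the example.

```python
import math

def pfd_average(theta_d, cycles=200):
    """Average of (UP - DN) for an ideal tri-state PFD.

    theta_d is the lag of the feedback clock behind the reference in radians,
    valid for -2*pi < theta_d < 2*pi. The reference period is normalized to 1.
    """
    period = 1.0
    lag = theta_d / (2.0 * math.pi) * period
    events = [(k * period, "ref") for k in range(cycles)]
    events += [(k * period + lag, "fb") for k in range(cycles)]
    events.sort()

    up = dn = False
    last_t = events[0][0]
    integral = 0.0
    for t, src in events:
        integral += (int(up) - int(dn)) * (t - last_t)  # accumulate output since last edge
        last_t = t
        if src == "ref":
            up = True        # reference rising edge sets UP
        else:
            dn = True        # feedback rising edge sets DN
        if up and dn:
            up = dn = False  # both high: immediate reset (ideal, no delay buffer)
    return integral / (cycles * period)

if __name__ == "__main__":
    for theta in (-math.pi, -math.pi / 2, 0.0, math.pi / 2, math.pi):
        print(f"theta_d={theta:+.3f}  average(UP-DN)={pfd_average(theta):+.4f}")
```

The average is θd/(2π) over the whole −2π to 2π interval, that is, a total output swing of 2 spread over a 4π linear range, which matches the form Kd = Y/(4π) when Y is read as the full swing.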
2.4.2 Charge Pump
The basic concept of the charge pump is to charge or discharge its output load (see Fig.2.4.6). Current mismatch is unavoidable. It is induced by charge sharing caused by the parasitic capacitance at nodes p and n, by leakage current, by the variation of Ids with Vds, and by process variation.
Fig.2.4.6 Charge pump
2.4.3 The Loop Filter
The order of the loop filter is chosen to suppress the noise on the control signal and to ensure loop stability. For a 1st-order DLL, a simple on-chip passive capacitor is used as the loop filter.
2.4.4 The Voltage Controlled Delay Line
The voltage controlled delay line (VCDL) provides the delay range required for different clock frequencies. Unlike a ring VCO, the VCDL does not accumulate jitter, since there is no feedback path to recirculate it. Multiple identical stages are usually used for multi-phase generation, and the delay mismatch between stages degrades the phase precision.
The differential delay cell (see Fig.2.4.7) in [7] has high linearity, and its swing is well controlled by a replica-bias control scheme while achieving process-independent bandwidth tracking. However, this differential delay cell consumes static power.
Fig.2.4.7 differential delay cell
2.4.5 The Edge Combiner
The edge combiner is triggered by the multi-phase signals and generates a signal whose frequency is a multiple of the reference clock frequency. Large internal parasitic capacitance limits the speed of the edge combiner, which in turn limits the maximum operation frequency of the clock generator.
Chapter 3
An Adaptive-bandwidth Mixed-mode Delay-Locked Loop
3.1 Delay-Locked Loop Fundamentals
To observe the loop dynamics, the DLL can be analyzed with mathematical equations, a linear model, and a behavior model.
3.1.1 Mathematical Basis and Linear Model
Fig.3.1.1 Basic DLL block diagram.
The 1st order DLL block diagram is illustrated in Fig.3.1.1. Based on control theory, the DLL can be described by eq. (3.1). The loop is a 1st-order system and is thus always stable.
3.1.2 Behavior Model
Building a behavior model in Matlab Simulink (see Fig.3.1.2) helps us handle the real behavior of the circuits and their non-idealities, such as PD switching timing mismatch, CP current mismatch, delay cell mismatch, and so on.
Fig.3.1.2 1st order DLL behavior model.
The simulation results show that the loop is stable in the time domain at the target reference clock frequency of 300MHz.
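The first-order locking behavior can also be illustrated with a few lines of discrete-time code. This is a minimal sketch and not the Simulink model of Fig.3.1.2: the linear VCDL law and every component value (d0, k_vcdl, i_cp, c_lf, v_init) are hypothetical numbers chosen only so that the loop locks at 300MHz, and harmonic locking and all non-idealities are ignored.

```python
def simulate_dll(t_ref=1 / 300e6, d0=10e-9, k_vcdl=3e-9, i_cp=20e-6,
                 c_lf=5e-12, v_init=1.4, cycles=1000):
    """Cycle-by-cycle model of a 1st-order DLL; returns the VCDL delay per cycle.

    Assumed (hypothetical) VCDL law: delay = d0 - k_vcdl * v_ctrl, so a higher
    control voltage gives a shorter delay.
    """
    v_ctrl = v_init
    history = []
    for _ in range(cycles):
        delay = d0 - k_vcdl * v_ctrl
        history.append(delay)
        err = delay - t_ref               # positive: delay line is too slow
        v_ctrl += i_cp * err / c_lf       # charge pump integrates the error on C
    return history

if __name__ == "__main__":
    delays = simulate_dll()
    for n in (0, 100, 300, 999):
        print(f"cycle {n:4d}: delay = {delays[n] * 1e9:.4f} ns")
```

Each cycle multiplies the delay error by (1 − k_vcdl‧i_cp/c_lf), so the error decays geometrically without overshoot as long as that factor lies between 0 and 1, which is the single-pole behavior expected of a 1st-order loop.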
3.1.3 Jitter Analysis
◆ DLL output jitter due to delay cell mismatch [8]
The delay of the individual stage is given by:
rinsic reasonable since any common change of delay in the cells is removed by the loop.
When the DLL is in the lock state the delay of delay cell number i is given by:
where Tref is the reference clock period, and M denotes the total number of delay stages.
The systematic jitter on the mth tap can be expressed by:
)
The variance is given by:
The error is zero on the first and last taps because of the loop control. Therefore, the highest uncertainty appears in the middle stage of the VCDL (see Fig.3.1.4(a)).
◆ DLL output jitter due to delay cell noise
The jitter due to delay cell noise is a random distribution with an arbitrary variance of $\sigma_{\Delta t_d}$. The DLL output jitter due to delay cell noise is approximately equal to the stochastic jitter of the uncontrolled VCDL [9]. The jitter variance on the mth tap is given by:
$$\sigma_{\Delta t_m} \approx m\,\sigma_{\Delta t_d} \qquad (3.6)$$
The result shows that delay cell noise is highest on the last tap of VCDL (see Fig.3.1.4(b)).
Fig.3.1.4 The histogram of DLL output jitter induced by (a) delay cell mismatch, (b) delay cell noise.
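The two trends in Fig.3.1.4 can be reproduced with a short Monte Carlo experiment. The sketch below rests on stated assumptions rather than on the equations of [8] and [9]: each cell gets an independent zero-mean Gaussian error, the locked loop is modeled as removing the common (mean) delay error, and the noise case is modeled as an uncontrolled line. The stage count, sigma, trial count, and seed are illustrative.

```python
import random
import statistics

def tap_jitter(stages=8, sigma=1.0, trials=4000, seed=1):
    """Return (mismatch_std, noise_std): per-tap jitter standard deviations."""
    rng = random.Random(seed)
    mismatch = [[] for _ in range(stages)]
    noise = [[] for _ in range(stages)]
    for _ in range(trials):
        d = [rng.gauss(0.0, sigma) for _ in range(stages)]
        mean_d = sum(d) / stages
        acc_mismatch = 0.0
        acc_noise = 0.0
        for m in range(stages):
            acc_mismatch += d[m] - mean_d       # loop removes the common delay change
            acc_noise += rng.gauss(0.0, sigma)  # open-loop noise simply accumulates
            mismatch[m].append(acc_mismatch)
            noise[m].append(acc_noise)
    return ([statistics.pstdev(x) for x in mismatch],
            [statistics.pstdev(x) for x in noise])

if __name__ == "__main__":
    mismatch_std, noise_std = tap_jitter()
    for m, (a, b) in enumerate(zip(mismatch_std, noise_std), start=1):
        print(f"tap {m}: mismatch std = {a:.3f}   noise std = {b:.3f}")
```

Under these assumptions the mismatch-induced error vanishes on the last tap and is largest near the middle of the line, while the noise-induced variance grows in proportion to the tap index and is largest on the last tap.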
3.2 Implementation
In this section we discuss the implementation of DLL building blocks.
3.2.1 Phase Detector
As shown in Fig.3.2.1, this is a conventional PFD, but with modified pre-charge circuitry the linear range is extended from 2π to 3π. This linear range is smaller than that of the PFD introduced in the previous chapter, so why choose this PFD? Note that the phase of CLKREF always leads CLKVCDL in a DLL, so operation in the negative linear range of the transfer curve never occurs, and the effective linear region of the PFD in Fig.2.4.5 is reduced from 4π to 2π, which is smaller than that of the modified conventional PFD. H and L represent the logic levels to which a given node is initialized. The PMOS transistors pre-charge nodes to “High”, while the NMOS transistors pre-discharge nodes to “Low”.
To minimize the phase offset of the PD, the pre-charged and pre-discharged nodes must be
switches. Similarly, dummy PMOS transistors are added for the NMOS switches to keep the nodes symmetric and hence minimize the phase offset. The layout should also be symmetrical to minimize the phase offset.
Fig.3.2.1 (a) Modified conventional PFD. (b) Transfer curve.
3.2.2 Charge Pump
The charge pump (see Fig.3.2.2) is similar to the circuit proposed in [10]. M5 and M8 form a current mirror that replicates the current Idn in M8. Since Iup is designed to equal Idn, the current mismatch is minimized. The voltage difference between vcp and n1 still introduces some current mismatch, which is reduced by using long-channel NMOS devices. The simulation results show that the current mismatch is small enough to be neglected. To compensate for the change of KVCDL, a programmable current mirror is used, and the complete charge pump circuit is shown in Fig.3.2.3.
Fig.3.2.2 Charge pump circuit
Fig.3.2.3 Complete charge pump circuit
3.2.3 Loop Filter
The loop filter is implemented with a MOS capacitor. A 3-bit digital code is implemented to adapt the loop filter to changes in VCDL gain and CP current, so that the loop bandwidth can be optimized.
3.2.4 Voltage Controlled Delay Line
An inverter-based delay cell is chosen for the implementation because it is robust, operates at high speed, and adapts to low supply voltages. The inverter-based
The delay range of the implementation is designed to cover the target reference clock period (see Table 3.1, Table 3.2). The control voltage range is chosen as 1.4~3.3V. In the worst case, the maximum operation frequency is determined by the SS corner when Vctrl=3.3V.
In the best case, the minimum operation frequency is determined by the TT corner when Vctrl=1.4V.
Table 3.1 Delay range (B5~B8)
Final feedback stage | Delay range (Vctrl=3.3V) | Delay range (Vctrl=1.4V) | Delay range (Vctrl=1V)
B8 | 2.11ns | 7.64ns | 28ns
B7 | 1.85ns | 6.69ns | 24.5ns
B6 | 1.58ns | 5.73ns | 21ns
B5 | 1.32ns | 4.78ns | 17.5ns
Table 3.2 Delay range (B8 versus corner) Delay range
A voltage buffer in unity-gain configuration (see Fig.3.2.4) isolates the controlled supply node of the VCDL from the loop filter. The bandwidth of the buffer must be higher than the DLL loop bandwidth to avoid stability problems.
Fig.3.2.4 Voltage Buffer
3.2.6 Level Shifter
A conventional low-to-high level shifter is implemented. As shown in Fig.3.2.5, the cross-coupled configuration enhances the level conversion speed.
Fig.3.2.5 Level Shifter
3.2.7 Multiplexer
The widely used transmission-gate multiplexer is shown in Fig.3.2.6. The time constant of the multiplexing node is determined by the equivalent resistance of the transmission gates, RTG, and the capacitance at the multiplexing node. However, it was pointed out in [11] that transmission-gate multiplexers are not suitable for applications where the symbol time is less than 4FO4. Multiplexing speed can be improved by using pseudo-nMOS based multiplexers. A 4:1 serializer (see Fig.3.2.7) is implemented to multiplex the feedback clocks D5 to D8.
Fig.3.2.6 Multiplexer
Fig.3.2.7 A 4:1 serializer
Chapter 4
A 150MHz~1200MHz Clock Generator For Dynamic Frequency Scaling
4.1 Frequency multiplication
Fig.4.1.1 shows a method for high-speed clock generation. The frequency multiplier (see Fig.4.1.2) proposed in [13] is programmable, and it operates as follows. When the signal Qb=“1”, node Y is discharged to “Low” through NMOS transistor M3, and node X keeps its previous “High” value. At the rising edge of the A1 signal, both transistors M1 and M2 are turned on for a short duration tp1 and transfer data between nodes X and Y. When Qb is “High”, node X is discharged to “gnd” through transistors M1-M3. With X “Low”, MP2 charges the output node Q to “High”. After three inverter delays (Inv4-Inv6), Qb becomes “Low”, node X is charged to “High” through MP1, and node Y keeps its previous “Low” value. At the rising edge of the A2 signal, the data transfer from node X to node Y can be explained in a similar manner. After the data transfer, node Y drives M4 to discharge the output node Q to “Low”. Thus, the output clock signal toggles at every rising edge of the “Ai” signals. The multiplication factor can be made programmable with MUXs.
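Because the output toggles once per rising edge of the “Ai” signals, the output frequency follows directly from the number of edges per reference period. The sketch below is a behavioral illustration only: it assumes the selected taps are evenly spaced over one reference period and uses a 300MHz reference as an example value; the function names are hypothetical.

```python
def edge_combined_frequency(f_ref, n_taps):
    """Output frequency when the output toggles on each of n_taps rising edges per period."""
    return f_ref * n_taps / 2.0   # two toggles make one output cycle

def output_waveform(n_taps, periods=2):
    """Return (time, level) pairs; time is in units of the reference period."""
    level = 0
    wave = []
    for k in range(n_taps * periods):
        level ^= 1                 # toggle on the rising edge of tap k
        wave.append((k / n_taps, level))
    return wave

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        f_out = edge_combined_frequency(300e6, n)
        print(f"{n} tap edges per period -> {f_out / 1e6:.0f} MHz")
```

With these assumptions, one to eight tap edges per 300MHz reference period span 150MHz to 1200MHz.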
Fig.4.1.1 High-speed clock generation
Fig.4.1.2 A Programmable Frequency Multiplier [13]
4.2 Implementation
4.2.1 Frequency Multiplier
The frequency multiplier in the last section suffers from large parasitic capacitance on the internal nodes X and Y as the multiplication factor increases. Another frequency multiplier architecture, illustrated in Fig.4.2.1, overcomes this drawback. The blocks of the frequency multiplier are described as follows.
Fig.4.2.1 A Programmable Frequency Multiplier
◆ Transition detector
A single transition detector cell consists of a 3-input NAND and an inverter. Eight cells are chosen for programmable frequency multiplication that is controlled by the select signal Si.
◆ AND logic
The AND logic ANDs eight signals. Using a fan-out of eight logic gate to implement
against the pull-up path (i.e. PMOS in parallel) is not a wise choice. Besides, an improper arrangement of the signals affects the AND result (see Fig.4.2.2). 2-input symmetric NAND gates, together with a proper arrangement of the signals, ensure correct function.
Fig.4.2.2 (a) AND logic with straightforward signal arrangement. (b) AND logic with alternative signal arrangement. (c) The schematic of signal propagation failure [14].
◆ Toggle-pulsed latch (TPL)
The operation of the TPL (see Fig.4.2.3) is as follows. The short pulse signal A is fed to the TPL, and Ab is the inverse of A. M1 prevents node X from suffering a Vth voltage drop and improves the setup time when node X changes from “low” to “high”. The maximum speed of the TPL is limited by the total delay of the three inverters (Inv1-Inv3). The delay of the signal path can be characterized by a logical effort equation (eq. (4.1)). For simplicity, the maximum operation frequency can be roughly estimated by eq. (4.2): the delay of M2 is neglected, and Inv1~Inv3 are treated as a ring oscillator, assuming M2 is always turned on.
$$d = g\,b + p \qquad (4.1)$$
where d is the total delay, g is the logical effort, b is the number of branches, p is the parasitic delay, N is the number of stages, and td is the delay time of an inverter.
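The two estimates can be put into numbers with a short sketch. It reads eq. (4.1) as d = g‧b + p and assumes that the rough speed estimate takes the textbook ring-oscillator form f = 1/(2‧N‧td); that form is an assumption here, since the original expression for eq. (4.2) is not reproduced in the text. The 50ps inverter delay is a hypothetical value.

```python
def stage_delay(g, b, p):
    """Normalized delay in the form of eq. (4.1): effort term plus parasitic delay."""
    return g * b + p

def tpl_max_frequency(t_d, stages=3):
    """Ring-oscillator estimate: an N-stage inverter loop oscillates at 1 / (2 * N * t_d).

    Assumed form for the rough estimate; M2 is taken as always on and its delay ignored.
    """
    return 1.0 / (2.0 * stages * t_d)

if __name__ == "__main__":
    print("normalized delay for g=1, b=4, p=1:", stage_delay(1.0, 4.0, 1.0))
    f_max = tpl_max_frequency(50e-12)   # hypothetical 50 ps per inverter
    print(f"estimated maximum toggle frequency: {f_max / 1e9:.2f} GHz")
```

The estimate only bounds the latch speed from above; the neglected delay of M2 and the loading of node X both reduce the achievable frequency.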
2 charge pump bias circuits, transition detector are implemented using combinational logic circuitry. Table 4.1 lists the final results.
Table 4.1 Encoding table of multiplication factor Multiplication
4.2.3 Timing match
The ideal multi-phase distribution is illustrated in Fig.4.2.4(a). The total delay of the delay path is given by eq. (4.3).
$$t_{tot} = N \cdot t_{VCDL} = t_{CLK} \qquad (4.3)$$
The final feedback stage of the VCDL changes with the frequency multiplication factor.
Therefore, a MUX is added in the delay path, and the total delay is given by eq. (4.4), which results in a timing offset every N cycles (see Fig.4.2.4(b)).
$$t_{tot} = N \cdot t_{VCDL} + t_{MUX} = t_{CLK} \qquad (4.4)$$
An extra delay cell and MUX are added to cancel the timing offset (see Fig.4.2.4(c)).
Fig.4.2.4 (a) Ideal (b) Mux is added (c) Timing offset is cancelled.
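The origin of the timing offset can be seen numerically. The sketch below assumes that lock forces N‧t_VCDL + t_MUX = t_CLK, with t_VCDL read as the per-stage delay, and compares the spacing between the last tap of one cycle and the first tap of the next against the ideal spacing t_CLK/N. The function names and the 100ps MUX delay are hypothetical.

```python
def tap_edge_times(t_clk, n_stages, t_mux=0.0):
    """Tap edge times within one reference period, measured from the reference edge."""
    t_stage = (t_clk - t_mux) / n_stages   # lock forces N * t_stage + t_mux = t_clk
    return [k * t_stage for k in range(1, n_stages + 1)]

def wrap_spacing_error(t_clk, n_stages, t_mux=0.0):
    """Spacing from the last tap to the first tap of the next cycle, minus the ideal spacing."""
    edges = tap_edge_times(t_clk, n_stages, t_mux)
    ideal = t_clk / n_stages
    wrap = (t_clk + edges[0]) - edges[-1]
    return wrap - ideal

if __name__ == "__main__":
    t_clk = 1 / 300e6
    print(f"no MUX     : {wrap_spacing_error(t_clk, 8) * 1e12:.2f} ps offset")
    print(f"100 ps MUX : {wrap_spacing_error(t_clk, 8, 100e-12) * 1e12:.2f} ps offset")
```

In this model the stage-to-stage spacing shrinks by t_MUX/N while the wrap-around spacing grows by t_MUX‧(1 − 1/N), so one interval in every N is stretched, which corresponds to the periodic offset sketched in Fig.4.2.4(b).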
4.3 Simulation results
◆ Fig.4.3.1 PFD operation when the reference signal and feedback signal are in phase
◆ Fig.4.3.2 PFD operation when the reference signal leads feedback signal @ 300MHz
◆ Fig.4.3.3 PFD operation when the reference signal lags feedback signal @ 300MHz
◆ Fig.4.3.4 Behavior of charge pump
◆ Fig.4.3.5 Charge pump current versus output voltage
◆ Fig.4.3.6 VCDL delay time versus control voltage sweep
◆ Fig.4.3.7 Corner simulations of the VCDL B8 delay time versus control voltage sweep
◆ Fig.4.3.8(a) Behavior of DLL locking process
- Fig.4.3.8(b) Locking process overview
- Fig.4.3.9 Initial state