
Chapter 2 Overview on DLL-based Frequency Multiplier, Duty Cycle Corrector,

2.4 An Overview on Level Converting Flip-Flop

2.4.4 Pulsed-Triggered Level Converting Flip-Flop

A pulsed-triggered flip-flop is composed of a pulse generator and a latch, and the level converter is implemented in the latch. Pulsed-triggered flip-flops offer an attractive way to meet delay and energy requirements. This kind of flip-flop inherently has zero or negative setup time, so it can absorb clock skew and jitter in the timing budget of the critical path. In addition, the pulsed-triggered LCFF provides a small D-Q delay and has low logic complexity.

Pulsed-triggered LCFFs can be classified as single-edge or dual-edge triggered, and as implicit-pulsed or explicit-pulsed triggered.

2.4.4.1 Single-Edged/Dual-Edged

Depending on the number of triggering clock edges, pulsed LCFFs can be divided into single-edge triggered and dual-edge triggered types. A single-edge triggered LCFF samples the input data on only one clock edge, whereas a dual-edge triggered LCFF captures the input data on both the rising and the falling clock edges. A dual-edge triggered LCFF therefore maintains the same throughput as a single-edge triggered LCFF at half the clock frequency, which greatly reduces the power consumption of the clock tree. However, a dual-edge triggered LCFF must account for additional timing constraints, such as duty cycle variations [2.53]. The self-precharge flip-flop [2.51] and the dual-pass-transistor flip-flop [2.54] are examples of single-edge triggered LCFFs. In [2.54], the pulse generator produces a pulse only at the rising clock edge, turning on N1 and N2 to pass the signal, as shown in Fig. 2.27. Recently, dual-edge triggered LCFFs have become a promising way to reduce the delay and power overhead of level conversion in multiple-supply-voltage systems [2.55]-[2.60]. In [2.57], the blocks colored in green are supplied by the low voltage, thick lines denote high-threshold-voltage transistors, and thin lines denote low-threshold-voltage transistors. Pulse 1 is produced at the positive clock edge to turn on M3 and M5, and Pulse 2 is generated at the negative clock edge to turn on M2 and M4. An additional function retains the data even when the flip-flop is in sleep mode.
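The throughput argument can be checked with a small behavioral sketch (not from the original work). It lists the sampling instants of an ideal single-edge flip-flop and an ideal dual-edge flip-flop, assuming a 50% duty cycle clock whose first rising edge is at t = 0; the function name `capture_times` and the example frequencies are illustrative only.

```python
def capture_times(freq_hz: float, duration_s: float, dual_edge: bool) -> list[float]:
    """Times at which an ideal flip-flop samples D.

    Assumes a 50% duty cycle clock with a rising edge at t = 0.
    """
    period = 1.0 / freq_hz
    n_cycles = round(duration_s * freq_hz)
    times = []
    for k in range(n_cycles):
        times.append(k * period)                   # rising edge
        if dual_edge:
            times.append(k * period + period / 2)  # falling edge
    return times

# Same observation window: single-edge at 1 GHz vs dual-edge at 500 MHz.
single = capture_times(1e9, 100e-9, dual_edge=False)
dual = capture_times(500e6, 100e-9, dual_edge=True)
print(len(single), len(dual))
```

Both lists contain 100 sampling instants, so the dual-edge flip-flop clocked at 500 MHz matches the throughput of the single-edge flip-flop at 1 GHz. Because the dynamic power of the clock network scales roughly as αCV²f, halving the clock frequency roughly halves that component, ignoring the extra circuitry inside the dual-edge flip-flop.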

Figure 2.27. Single-edged triggered flip-flop [2.55]

Figure 2.28. Dual-edged triggered flip-flop [2.58]: (a) Pulsed-triggered LCFF (b) Dual-edge pulse trigger circuit

2.4.4.2 Implicit-Triggered/Explicit-Triggered

Pulsed-triggered LCFFs can also be categorized by whether they have a separate pulse generator. If the pulse generator is merged into the latch, the flip-flop is called an implicit-pulsed triggered LCFF [2.55]-[2.56].

In [2.55], the four inverters in the dotted-line box form a pulse generator, as shown in Fig. 2.29. At the positive clock edge, N3, N7, N8, and N10 are turned on to sample the input data; the capture window is about three inverter delays wide. At the negative clock edge, N2, N4, N9, and N11 are turned on to capture the input data.

This architecture employs the conditional discharge technique to avoid redundant internal power consumption. If the pulse generator is outside the latch, the flip-flop is called an explicit-pulsed triggered LCFF [2.57]-[2.61]. In [2.58], a 4T-XOR logic gate is proposed to generate a pulse at both the rising and the falling clock edges, as shown in Fig. 2.30. Because of its separate pulse generator, the explicit-pulsed triggered LCFF has a higher power overhead than the implicit-pulsed triggered LCFF. However, the explicit-pulsed triggered LCFF can share a common pulse generator among several latches, which reduces the power and area overhead.
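As a rough logic-level illustration of how an XOR gate yields a pulse at both clock edges, the discrete-time sketch below XORs a clock with a delayed copy of itself. This is an assumption about the general principle only: the transistor-level 4T-XOR of [2.58] is not modeled, and the 3-step delay and 10-step half period are hypothetical. The pulse width equals the delay of the delay element.

```python
def square_wave(half_period: int, n_steps: int) -> list[int]:
    """50% duty cycle clock sampled once per time step, starting low."""
    return [(t // half_period) % 2 for t in range(n_steps)]

def xor_pulse(clk: list[int], delay: int) -> list[int]:
    """Pulse = CLK XOR delayed CLK (delayed copy holds the initial level at start)."""
    delayed = [clk[max(t - delay, 0)] for t in range(len(clk))]
    return [a ^ b for a, b in zip(clk, delayed)]

def pulse_starts(pulse: list[int]) -> list[int]:
    """Time steps at which the pulse signal rises from 0 to 1."""
    return [t for t in range(1, len(pulse)) if pulse[t - 1] == 0 and pulse[t] == 1]

clk = square_wave(half_period=10, n_steps=40)  # clock edges at t = 10, 20, 30
p = xor_pulse(clk, delay=3)                    # hypothetical 3-step delay element
print(pulse_starts(p))  # [10, 20, 30]
```

A pulse starts at every clock edge, rising or falling, which is the behavior a dual-edge explicit-pulsed LCFF needs from its shared pulse generator.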

Figure 2.29. Implicit-pulsed triggered LCFF [2.56]

Figure 2.30. Explicit-pulsed triggered LCFF [2.59]: (a) Pulsed-triggered LCFF (b) 4T-XOR pulse generator

Chapter 3 A Wide Range DLL-based Multiphase Clock Generator with Duty Cycle Correction in 65nm CMOS

To increase the data rate of high-speed systems, multiphase clocks have been exploited. In this work, a wide-range DLL-based multiphase clock generator is proposed, in which one clock cycle is divided into eight phases. Two control modes close the loop. The first is the successive approximation register (SAR) mode, which uses a binary search algorithm to accelerate locking. Once the output clock is locked, the second mode, the counter mode, increments or decrements the control word of the digital delay blocks by 1. In addition, a harmonic detector is proposed to avoid harmonic locking. The proposed multiphase clock generator operates from 80 MHz to 500 MHz and consumes 0.29 mW at a 1.0 V supply and 500 MHz.

The clock signal is distributed through the clock tree. Because of mismatched clock drivers, the clock duty cycle deviates from 50%. A PVT-robust all-digital duty cycle corrector (DCC) based on the SARDLL is proposed. A PVT detector is adopted in this work to reduce the output duty cycle error. At a 0.5 V supply and a 167 MHz input frequency, the proposed duty cycle corrector consumes 26.30 μW.

Section 3.1 introduces the DLL-based frequency multiplier. Multiphase clock applications are discussed in Section 3.2. The implementation of the multiphase clock generator is given in Sections 3.3 and 3.4, and the implementation of the duty cycle corrector is described in Section 3.5. Finally, Section 3.6 concludes this work.

3.1 Introduction

Phase-locked loops (PLLs) and delay-locked loops (DLLs) have been widely used to eliminate clock skew and jitter in high-speed microprocessors, memory interfaces, and communication integrated circuits (ICs). In addition, they can produce multiphase clock signals. Many clock multiplication schemes have been proposed. PLLs are commonly used as clock generators, but locking takes hundreds of reference clock cycles. To enhance the flexibility of the clock generator, an all-digital clock generator was presented in [3.1], which generates the output clock by dynamically delaying the reference clock according to a frequency control code. However, its output frequency can only be a fraction of the reference clock frequency. A DLL [3.2] was presented for DVFS systems, but it cannot generate fractional clocks. A cyclic clock multiplier (CCM) has been presented for DVFS applications [3.3]; it can create both fractional and multiplied clocks. However, the CCM uses a TDC for phase error detection, which consumes considerable area and power. In general, a DLL has better jitter performance than a PLL because jitter does not accumulate in a DLL.

In high-speed systems, data can be sampled at both the positive and the negative clock edges, doubling the number of samples per clock cycle. In low-power systems, the same throughput can instead be maintained at half the clock frequency, and a lower clock frequency means the clock network consumes less power. A clock signal with a 50% duty cycle is therefore critical for these applications, and any duty cycle distortion may degrade performance. However, the duty cycle of an off-chip clock tends to deviate from 50% at high operating frequencies. Moreover, even if the clock generator produces a 50% duty cycle clock, the duty cycle may still deviate because the clock drivers have mismatched rising and falling edges. To solve this problem, duty cycle correctors (DCCs) have been widely used to adjust the duty cycle as close to 50% as possible.
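Why the duty cycle matters for dual-edge sampling can be shown with a short calculation: each half period becomes a separate timing window, so a duty cycle D leaves windows of D·T and (1 - D)·T, and the shorter window limits the design. The sketch below computes that worst-case window; the function names and the 500 MHz example period are illustrative assumptions.

```python
def half_cycle_windows(period_ps: float, duty: float) -> tuple[float, float]:
    """Return (high_time, low_time) in ps for a clock with duty cycle `duty`."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must lie strictly between 0 and 1")
    return duty * period_ps, (1.0 - duty) * period_ps

def worst_window(period_ps: float, duty: float) -> float:
    """The shorter half cycle bounds the timing budget of dual-edge sampling."""
    return min(half_cycle_windows(period_ps, duty))

T_PS = 2000.0  # example: 500 MHz clock period
for duty in (0.50, 0.45, 0.40):
    window = worst_window(T_PS, duty)
    loss = 1.0 - window / (T_PS / 2)  # fraction of the ideal half-cycle lost
    print(f"duty={duty:.2f}  worst window={window:.0f} ps  loss={loss:.0%}")
```

A 45%/55% clock already loses 10% of the ideal half-cycle window, which is the margin a DCC recovers.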

3.2 Multiphase clock applications

3.2.1 Frequency synthesizer [3.4]

A DLL can operate like a PLL by replacing the VCO with a delay line. Fig. 3.1 shows a simplified block diagram of a DLL-based frequency synthesizer. When the loop is locked, the outputs of the N delay stages are evenly spaced over one reference clock period Tref, so adjacent phases are separated by Tref/N. The edge combiner generates an output transition for each phase transition; hence the output frequency is N times the reference frequency fref. A multiplying DLL overcomes drawbacks of the PLL such as jitter accumulation and high sensitivity to supply and substrate noise, and therefore offers good phase noise performance.
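The locked-loop timing can be sketched numerically. The sketch assumes an ideal locked DLL whose N taps are spaced by Tref/N and an idealized edge combiner that emits one output rising edge per tap rising edge (the behavior implied by fout = N·fref); the 100 MHz reference, N = 8, and the function names are example choices only.

```python
def tap_edges(t_ref: float, n: int, n_periods: int) -> list[list[float]]:
    """Rising-edge times of each delay-line tap in a locked DLL.

    Locked means the n stages span exactly one reference period,
    so tap k is delayed by k * t_ref / n.
    """
    return [[m * t_ref + k * t_ref / n for m in range(n_periods)] for k in range(n)]

def edge_combiner(edges: list[list[float]]) -> list[float]:
    """Idealized combiner: one output rising edge per tap rising edge."""
    return sorted(t for tap in edges for t in tap)

T_REF_NS = 10.0  # example: f_ref = 100 MHz
N = 8            # example number of delay stages
out = edge_combiner(tap_edges(T_REF_NS, N, n_periods=3))
spacings = {round(b - a, 9) for a, b in zip(out, out[1:])}
assert len(spacings) == 1               # output edges are evenly spaced
f_out_mhz = 1e3 / next(iter(spacings))  # 1 / (T_ref / N), in MHz
print(f_out_mhz)  # 800.0, i.e. N * f_ref
```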

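Figure 3.1. Frequency synthesizer: a delay line (VCDL for the analog version, DCDL for the digital version) driven by f_ref, a phase detector (PD) comparing f_ref with the fed-back f_back, a control block (CP and LF in the analog version; CU and CW in the digital version), and an edge combiner producing f_out = N·f_ref.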

3.2.2 Clock and data recovery [3.5]

A block diagram is shown in Fig. 3.2. The CDR has two main components: an analog PLL (which could be replaced by a digital DLL providing multiphase clocks to sample the data) and a digital CDR. The PLL generates evenly spaced multiphase clocks that drive the receiver samplers. There are eight clock phases and eight samplers: four for clock recovery and four for data recovery. A bang-bang phase detector generates 3-level phase error information by performing early/late detection and a simple majority vote on the 32 incoming samples. This phase error is filtered by a digital loop filter consisting of a proportional path and an integral path to produce a 14-bit filter output. Given the difficulty of implementing a 14-bit phase interpolator with good linearity, a fully digital CDR controller that takes advantage of the phase filtering characteristics of the PLL is employed.
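A minimal sketch of this phase-error path follows, assuming each of the 32 samples yields an early/late vote of -1, 0, or +1 (0 when no data transition is seen) and that the filter output is saturated to a signed 14-bit range. The gains `KP` and `KI`, the sign convention, and the saturation are hypothetical; only the structure (majority vote, then proportional plus integral paths) follows the text.

```python
KP = 4      # proportional gain (hypothetical)
KI = 1      # integral gain (hypothetical)
BITS = 14   # filter output width from the text
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1  # assumed signed range

def majority_vote(votes: list[int]) -> int:
    """Collapse per-sample early/late votes into a 3-level error (-1, 0, +1)."""
    total = sum(votes)
    return (total > 0) - (total < 0)

def saturate(x: int) -> int:
    return max(LO, min(HI, x))

class PILoopFilter:
    """Digital loop filter with a proportional path and an integral path."""

    def __init__(self) -> None:
        self.integ = 0  # integral-path accumulator

    def step(self, err: int) -> int:
        self.integ = saturate(self.integ + KI * err)  # integral path, clamped
        return saturate(KP * err + self.integ)        # proportional + integral

votes = [+1] * 20 + [-1] * 5 + [0] * 7  # 32 samples, mostly "late" (assumed +1)
f = PILoopFilter()
print([f.step(majority_vote(votes)) for _ in range(3)])  # [5, 6, 7]
```

The majority vote throws away the magnitude of the error, so the proportional path only ever adds ±KP while the integral path slowly accumulates a frequency offset.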

Figure 3.2. Clock and data recovery [3.5].

3.2.3 DRAM interface [3.6]

Timing budget calculations show that the optimal value of tSD is approximately 20% of the input clock period. Since the input clock frequency ranges from 100 MHz to 200 MHz (DDR-200/266/333/400), tSD varies from 2 ns (= 10 ns × 0.2) to 1 ns (= 5 ns × 0.2). Therefore, a five-phase all-digital DLL was proposed in [3.6] to generate the desired tSD delay for the DQS signal. The block diagram of the five-phase all-digital DLL for the DDR SDRAM controller application is shown in Fig. 3.3. Like most DLL-based multiphase clock generators, this DLL has a multistage delay line whose stages share the same control word, producing equally spaced multiphase clock outputs. It uses a time-to-digital converter (TDC) scheme to lock the loop.

One design consideration is that it is sometimes difficult to meet the minimum delay constraint when a high-resolution delay cell is built from standard cells. Therefore, the DLL in this design is locked to two reference clock periods using the TDC scheme. After the DLL is locked, the phase spacing of each delay stage is 2·T_FREF/5, where T_FREF is the reference clock period. Hence, the delay each stage must provide is doubled, which relaxes the minimum delay constraint. The total delay from DQS to DQSD becomes 1.2·T_FREF, so the phase shift between DQS and DQSD is still 0.2·T_FREF. As a result, the multiphase DLL can generate the desired tSD delay.
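The delay arithmetic above can be checked with a few lines of Python. The function names are illustrative, and the tap index 3 is inferred from the text (1.2·T_FREF divided by a stage spacing of 0.4·T_FREF) rather than stated in [3.6].

```python
def tsd_target_ns(f_in_mhz: float) -> float:
    """Desired DQS-to-DQSD delay: about 20% of the input clock period."""
    period_ns = 1e3 / f_in_mhz
    return 0.2 * period_ns

def dqsd_phase_ns(t_ref_ns: float, n_stages: int = 5,
                  lock_periods: int = 2, tap: int = 3) -> float:
    """Phase shift at `tap` when n_stages span lock_periods reference periods."""
    spacing = lock_periods * t_ref_ns / n_stages  # 2*T/5 = 0.4*T per stage
    total = tap * spacing                         # 3 stages -> 1.2*T (inferred tap)
    return total % t_ref_ns                       # only the phase modulo T matters

for f in (100.0, 200.0):  # DDR-200 and DDR-400 input clocks
    t = 1e3 / f
    print(f, tsd_target_ns(f), dqsd_phase_ns(t))
```

At both ends of the DDR range the selected tap lands on 0.2·T, while each stage only has to provide 0.4·T instead of 0.2·T.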

Figure 3.3. Multiphase DLL used in DRAM interface [3.6].

3.3 System architecture

The architecture of the proposed all-digital DLL-based multiphase clock generator is shown in Fig. 3.4.

It consists of four major blocks: eight digitally controlled delay blocks, a phase detector (PD), a delay block controller, and an anti-harmonic detector. When the Reset signal is high, the eight delay blocks are cleared; when Reset is low, the CLK_ref signal passes through the eight delay blocks. The operation is divided into four steps, and the finite state machine is shown in Fig. 3.5. First, the clock generator performs anti-harmonic detection. Because our design provides a wide operating range, it is susceptible to harmonic locking. Ideally, the eight phases are spread over one clock period. With a wide delay range, however, the clock generator may lock to two clock periods, spreading the eight phases over two clock periods and reducing the data sampling rate. After anti-harmonic detection finishes, the next step is the SAR mode, in which the delay blocks are controlled by a digital code produced by the SAR controller using a binary search algorithm.

After the SAR mode, the multiphase clock generator enters the lock state. Because of the nature of SAR control, the clock generator becomes an open loop once it enters the lock state. An open loop is easily affected by environmental variations, so the multiphase clock generator may drift out of lock. Therefore, once the clock generator is locked, the counter mode is triggered: the counter keeps tracking by adding or subtracting 1 at a time to the digital delay block control code. With the counter mode, the whole clock generator always remains a closed loop, so it stays locked to the reference clock even in the presence of environmental variations.
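The two control modes can be illustrated with a behavioral sketch. It assumes a linear delay model (loop delay = intrinsic delay + code × step), an ideal phase detector with a symmetric detection window, and hypothetical values for the code width, step size, and window; the anti-harmonic detection step is not modeled. SAR mode decides one bit per cycle from MSB to LSB, after which counter mode moves the code by at most one LSB per cycle to follow a drift in the intrinsic delay.

```python
from dataclasses import dataclass

N_BITS = 8        # hypothetical control-word width
STEP_PS = 20.0    # hypothetical loop-delay change per LSB
WINDOW_PS = 20.0  # hypothetical detection-window width

@dataclass
class DelayLine:
    intrinsic_ps: float  # loop delay at code 0; drifts with PVT

    def delay(self, code: int) -> float:
        return self.intrinsic_ps + code * STEP_PS

def phase_detect(delay_ps: float, t_ref_ps: float) -> int:
    """+1: more delay needed, -1: less delay needed, 0: inside the window (lock)."""
    if delay_ps < t_ref_ps - WINDOW_PS / 2:
        return +1
    if delay_ps > t_ref_ps + WINDOW_PS / 2:
        return -1
    return 0

def sar_search(line: DelayLine, t_ref_ps: float) -> int:
    """SAR mode: fix one bit per reference cycle, MSB first (binary search)."""
    code = 0
    for bit in reversed(range(N_BITS)):
        trial = code | (1 << bit)
        if line.delay(trial) <= t_ref_ps:  # keep the bit unless the delay is too long
            code = trial
    return code

def counter_step(line: DelayLine, code: int, t_ref_ps: float) -> int:
    """Counter mode: move the code by at most one LSB per reference cycle."""
    direction = phase_detect(line.delay(code), t_ref_ps)
    return max(0, min((1 << N_BITS) - 1, code + direction))

T_REF_PS = 2000.0                  # example: 500 MHz reference
line = DelayLine(intrinsic_ps=400.0)
code = sar_search(line, T_REF_PS)  # takes N_BITS cycles
print(code)  # 80
line.intrinsic_ps = 460.0          # environmental drift lengthens the loop
for _ in range(10):                # counter mode keeps the loop closed
    code = counter_step(line, code, T_REF_PS)
print(code)  # 77
```

Without the counter loop the code would stay at 80 after the drift, leaving the loop delay 60 ps too long; with it, the code walks down three LSBs and the phase detector reports lock again.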

Figure 3.4. Proposed DLL-based multiphase clock generator with a wide operating range

3.4 Circuit description

3.4.1 Delay blocks

The CLK_ref signal goes through the eight delay blocks, and our target is a wide operating range. Each delay block includes a coarse-tune delay line and a fine-tune delay line, as shown in Fig. 3.6. The coarse-tune delay line enlarges the delay range and the delay step, which accelerates the search. The fine-tune delay line increases the delay resolution, and a higher delay resolution helps to reduce clock jitter.

Figure 3.6. Delay blocks (each delay block: nest-lattice coarse-tune delay line and current-starved fine-tune delay line)

3.4.1.1 Coarse tune-Nest-lattice

For the coarse-tune delay line, the nest-lattice structure [3.7] is adopted in our work. The nest-lattice delay line is composed of cascaded lattice delay units. In a conventional delay line, increasing the tunable delay range by cascading delay units also increases the intrinsic delay, which limits the maximum operating frequency. The nest-lattice delay line avoids this problem: its intrinsic delay is four NAND gate delays, and each delay step is two NAND gate delays. The relationship between the input vector and the delay is shown in Fig. 3.8. The delay resolution is about 55 ps.

Figure 3.7. Coarse-tune delay line: nest-lattice structure.

Figure 3.8. The relationship between the digital control code and the coarse delay.

3.4.1.2 Fine tune-Current-starved

The fine-tune delay unit employs current-starved inverters, as shown in Fig. 3.9, with two inverters cascaded to form a buffer. Each coarse delay step is two NAND gate delays, so across the eight delay blocks each coarse control step adds sixteen NAND gate delays. We therefore use the fine-tune delay line to increase the delay resolution. The 3-bit digital control word b[2:0] is fed into the fine-tune delay line, which also includes a 3-bit binary-to-thermometer decoder that produces the current-starved inverter control word f[6:0]. The delay of the current-starved inverter is controlled by its conduction current: the more gates are turned on, the smaller the delay. Fig. 3.10 shows the relationship between the input vector and the delay. While the coarse-tune resolution is about 55 ps, the fine-tune resolution is about 6 ps.
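A small numerical model ties these figures together. It uses the 55 ps coarse step and the roughly 6 ps fine step quoted above, derives the intrinsic delay as two coarse steps (four NAND delays when one step is two NAND delays), and assumes that the fine delay grows linearly with b[2:0] and that all eight blocks share one control word. The function names and the linear model are illustrative, not taken from the design.

```python
COARSE_STEP_PS = 55.0              # coarse resolution quoted in the text
FINE_STEP_PS = 6.0                 # fine resolution quoted in the text
INTRINSIC_PS = 2 * COARSE_STEP_PS  # 4 NAND delays = 2 coarse steps (derived)
N_BLOCKS = 8                       # all blocks share one control word (assumed)

def block_delay_ps(coarse: int, fine: int) -> float:
    """Delay of one delay block; `fine` is the 3-bit code b[2:0].

    Assumes the fine delay grows linearly with b, i.e. fewer current-starved
    gates are turned on for larger b.
    """
    if not 0 <= fine <= 7:
        raise ValueError("fine code must fit in 3 bits")
    return INTRINSIC_PS + coarse * COARSE_STEP_PS + fine * FINE_STEP_PS

def loop_delay_ps(coarse: int, fine: int) -> float:
    """Total delay of the eight cascaded blocks."""
    return N_BLOCKS * block_delay_ps(coarse, fine)

def codes_for_period(period_ps: float) -> tuple[int, int]:
    """Largest (coarse, fine) pair whose loop delay does not exceed the period."""
    per_block = period_ps / N_BLOCKS - INTRINSIC_PS
    if per_block < 0:
        raise ValueError("period shorter than the intrinsic loop delay")
    coarse = int(per_block // COARSE_STEP_PS)
    fine = min(7, int((per_block - coarse * COARSE_STEP_PS) // FINE_STEP_PS))
    return coarse, fine

print(codes_for_period(2000.0))                   # 500 MHz -> (2, 5)
print(loop_delay_ps(1, 0) - loop_delay_ps(0, 0))  # 440.0 ps = 16 NAND delays
```

The second line reproduces the sixteen-NAND-delay observation: one coarse LSB changes the loop delay by 8 × 55 ps, which is why the fine-tune line is needed to reach a useful resolution.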

Figure 3.9. Fine-tune delay line: current-starved inverter

Figure 3.10. The relationship between the digital control code and the fine-tune delay

3.4.1.3 Binary-to-thermometer decoder

A binary-to-thermometer decoder is adopted in the delay block because the thermometer code is monotonic: for example, the current-starved inverter turns on more gates as the control word decreases. Between two adjacent binary values, the thermometer code changes by only one bit, so the thermometer scheme also reduces glitches compared with a binary scheme. Fig. 3.11(a) shows an N-bit binary-to-thermometer decoder architecture.

Fig. 3.11(b) illustrates the 2-bit binary-to-thermometer decoder. To make the signal paths as equal as possible, a NAND gate is used as an inverter. Fig. 3.11(c) presents the 3-bit binary-to-thermometer decoder, which reuses the 2-bit decoder. This kind of decoder therefore follows a simple rule, and applying it repeatedly yields the N-bit decoder shown in Fig. 3.11(a).
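The recursive rule can be written compactly. The sketch below is a behavioral model assuming the standard recursive construction: the lower half of the N-bit code is the (N-1)-bit code ORed with the MSB, the middle bit is the MSB, and the upper half is the (N-1)-bit code ANDed with the MSB. The gate-level details of Fig. 3.11, such as the NAND gates used as inverters for path balancing, are not modeled, and `bin2therm` is a hypothetical name.

```python
def bin2therm(value: int, n_bits: int) -> list[int]:
    """(2**n_bits - 1)-bit thermometer code of `value`, index 0 = f[0].

    An N-bit decoder is built from an (N-1)-bit decoder on the lower bits
    combined with the MSB.
    """
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value out of range")
    if n_bits == 1:
        return [value]
    msb = value >> (n_bits - 1)
    lower = bin2therm(value & ((1 << (n_bits - 1)) - 1), n_bits - 1)
    low_half = [msb | t for t in lower]   # full when MSB = 1, else follows sub-decoder
    high_half = [msb & t for t in lower]  # only active when MSB = 1
    return low_half + [msb] + high_half

print(bin2therm(5, 3))  # [1, 1, 1, 1, 1, 0, 0]
```

For the 3-bit case the output is the 7-bit word f[6:0], and stepping b by one always flips exactly one output bit, which is the glitch-reduction property described above.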

Figure 3.11. (a) N-bit binary-to-thermometer decoder. (b) 2-bit binary-to-thermometer decoder. (c) 3-bit binary-to-thermometer decoder.

3.4.2 Phase detector

The phase detector is composed of two D flip-flops, two XOR logic gates, and a delay cell, as shown in Fig. 3.12(a). The eighth-phase clock is fed back into the phase detector as the clock of the first D flip-flop, and the CLK_ref signal is used as the data input of the D flip-flops. The eighth-phase clock, delayed by two NAND gate delays, clocks the second D flip-flop, and these two NAND gate delays form the detection window. The phase detector has two outputs, Comp and Lock. Fig. 3.12(b) shows how the outputs are determined. If the CLK_ref edge falls inside the detection window, the Lock signal is pulled high. When the CLK_ref edge appears after the detection window, the delay blocks should provide a longer delay in the next clock cycle, and vice versa. The truth table of the phase detector is shown in Table 3.1.

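Figure 3.12. (a) Phase detector circuit block: CLK_ref is sampled by two D flip-flops (outputs Q1 and Q2/Qb), the second clocked through a two-NAND-gate delay, producing Comp and Lock. (b) Operation diagram: lead, lock, and lag cases relative to the detection window.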

Table 3.1. The truth table of the phase detector (X: don't care).

Q1  Q2  Comp  Lock
0   0   1     0
1   1   0     0
0   1   X     1
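The truth table can be reproduced with a timing-level sketch. It assumes an ideal rising edge of CLK_ref at time `ref_edge`, a feedback (eighth-phase) edge at `fb_edge`, both flip-flops sampling CLK_ref, and a window of two NAND delays taken as about 55 ps (the coarse step quoted in Section 3.4.1.1). It also assumes Comp = 1 means the delay should be increased, consistent with the description above. The case Q1 = 1, Q2 = 0 cannot occur for a positive window and is not in the table.

```python
WINDOW_PS = 55.0  # two NAND delays; assumed equal to the 55 ps coarse step

def sample(ref_edge: float, t: float) -> int:
    """Level of CLK_ref at time t (edge at ref_edge; t assumed within half a period)."""
    return 1 if t >= ref_edge else 0

def phase_detector(ref_edge: float, fb_edge: float, window: float = WINDOW_PS):
    """Return (Q1, Q2, Comp, Lock); Comp is None for the don't-care case."""
    q1 = sample(ref_edge, fb_edge)           # DFF1 clocked by the eighth phase
    q2 = sample(ref_edge, fb_edge + window)  # DFF2 clocked by the delayed eighth phase
    if (q1, q2) == (0, 1):
        return q1, q2, None, 1               # CLK_ref edge inside the window: lock
    if (q1, q2) == (0, 0):
        return q1, q2, 1, 0                  # CLK_ref edge after the window: more delay
    return q1, q2, 0, 0                      # CLK_ref edge before the window: less delay

print(phase_detector(100.0, 0.0))  # (0, 0, 1, 0)
print(phase_detector(30.0, 0.0))   # (0, 1, None, 1)
print(phase_detector(-10.0, 0.0))  # (1, 1, 0, 0)
```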

3.4.3 Delay block controller

Digital DLLs fall into four categories. The first type is the register-controlled DLL (RDLL) [3.8], in which an n-bit shift register controlled by the phase detector output generates the control signals for the digitally controlled delay line.

When the operating range is increased, additional delay stages must be added to the delay line, which increases the chip area. Because the control mechanism moves one stage at a time, more delay stages require more shift registers, which also increases the locking time. In the worst case, an n-bit shift register needs n/2 locking cycles.

The second type is the counter-controlled DLL (CDLL) [3.9]. Its operating principle is similar to that of the RDLL, except that an up/down counter replaces the shift register to control the delay line. The n-bit control word determines whether the input signal goes through each delay path or bypasses it.

The main difference between the RDLL and the CDLL is the area requirement; in the worst case, with an n-bit binary-weighted delay line, the locking time remains n/2 locking cycles. The third type is the time-measurement DLL (TMDLL) [3.10]. It measures the input clock period, converts it into a digital word within two clock cycles, and then transfers the digital control word to the control block. The TMDLL searches quickly but has an area overhead. The final type is the successive approximation register-controlled DLL (SARDLL) [3.11]. The SARDLL uses a binary search algorithm together with a binary-weighted delay line, which not only reduces the chip area but also shortens the locking time. The SAR controller determines the value of each bit of the control word sequentially and irreversibly. Therefore, it becomes an open-loop circuit after lock-in and never

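In the document: Multiphase Clock Design and Voltage Level Conversion Design for Low-Voltage Dynamic Voltage and Frequency Scaling Systems (pp. 41-0)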