Organization - 可用於低電壓動態電壓與頻率調節系統之多相時脈設計與電壓準位轉換設計

Chapter 1 Introduction

1.3 Organization

The thesis includes six chapters which focus on the level conversion for the multiple supply voltage domain and a wide range operation DLL-based multiphase clock generator.

Chapter 2 gives an overview of the DLL-based frequency multiplier, duty cycle corrector, level converter, and level converting flip-flop.

Chapter3 describes the proposed wide range DLL-based multiphase clock generator with a anti-harmonic detection. Also, a DLL-based duty cycle corrector is proposed.

Chapter4 presents the proposed energy-efficient level converter for sub-threshold to super-threshold operation.

Chapter5 demonstrates the proposed level converting flip-flop. The level converter from Chapter4 is utilized in the flip-flop. For the level converting flip-flop, the data can be latched and level converting at the same time.

Chapter6 gives the conclusion of this thesis and the future work.

Chapter 2 Overview on DLL-based Frequency Multiplier, Duty Cycle Corrector, and Level Conversion

2.1 An Overview on DLL-based frequency multiplier

As the advance of the CMOS technologies, IC performances have improved very fast. Many high-speed applications, such as microprocessor, memory IC and communication IC, require high frequency clocks in the circuits. In order to increase the bandwidth of the data rate, the clock frequency can be improved by the on-chip frequency multipliers. The output high frequency clock signal should be also synchronized with the reference clock. Thus, the synchronization problem has become a critical design issue in the clock generation field. The clock generation and synchronization can be solved by the phase-locked loop (PLL) [2.1]-[2.3] or the delay-locked loop (DLL) [2.4]-[2.6].

PLL-based clock generators require voltage-controlled oscillator (VCO), which is difficult to design and prone to the process, voltage, and temperature (PVT) variations [2.7]-[2.8]. DLL-based clock generator produces the clocks by replacing the VCOs with the delay block. Therefore, the DLLs are much simpler to implement and more immune to the PVT variations. Additionally, DLLs merely add a controllable phase delay to the input clock signal and produce an output clock signal, jitter that is present in the input clock signal is passed directly to the output clock signal. In contrast, the jitter of the input clock signal can be better filtered out by DLL.

Although the DLLs are more stable than the PLLs, there is a difficulty in designing the frequency multiplier using voltage-controlled delay or digitally-controlled delay line. The frequency multiplier schemes have been presented [2.9]-[2.21]. One is that the multipliers have a fixed multiplication factor [2.9]-[2.10]. The other is that the programmable multiplication factor frequency multiplier circuits have been proposed [2.11]-[2.13].

Two kinds of DLL-based multiphase clock schemes have been presented. One is using many delay blocks to produce many consecutive delay phases and then utilizing the edge combiner [2.14]-[2.15], [2.20]-[2.21] to achieve the frequency multipliers.

The multiplication ratio is usually correlated with the number of the delay cells in the delay lines, meaning higher factor the larger areas. The other is using delay cells as a ring oscillator generating the cyclic waves [2.11]-[2.12], [2.16]-[2.19] to multiply the clock frequency. The cyclic scheme has a locking initial constraint that it has to operate from the shortest delay line. Thus, this kind of the clock generation cannot switch the clock frequency from low to high.

2.1.1 Frequency multiplier scheme

Advances in the VLSI fabrication process have led to an increase in the clock frequencies of circuits. Hundreds of megahertz frequencies is easy to reach for nowadays technology. At such high frequencies, how to distribute the clock signal through an entire system has become a problem. An external clock cannot be used, thus an on-chip clock multiplier is essential for the high speed products. There are two ways to produce the multiple frequency of the reference clock. One is cyclic circulating wave, the other is delaying the reference clock to produce the multiphase clocks.

2.1.1.1 Cyclic circulating scheme

All-digital clock generator for dynamic frequency scaling[2.11]

In this case, an all-digital clock generator using a cyclic clock multiplier (CCM) was proposed for the dynamic frequency scaling applications. It only takes the four reference clock cycles to lock the clock signal. Besides, the cyclic jitter cased by the mismatch of the delay cell can be reduced because the output clock passes through the same delay line. Particularly, it can realize a fractional or multiplied clock.

Fig. 2.1 shows the proposed multiple frequency clock generator. It includes a CCM, a finite state machine (FSM), a conventional time-to-digital converter (TDC), a counter_K, a programmable divider and two multiplexers (MUXs). The divider varies from 2 to 8 by N[2:0]. Take the five times of reference clock as an example (factor ratio, M=5) and a typical timing diagram is shown in Fig. 2.2. The operation is divided into four steps which takes one reference clock for each step. To preset C[4:0]

as M=5 and CCM measures the period of the reference clock. Five unit delay cells are selected to circulate a pulse. Counter_K counts this multiplied clock within one reference clock. Next, the counted value is stored in K[4:0]=3. The FSM changes the number of the delay cells of CCM, from 5 to 3. After that, CCM produces 5 pulses by 3 units. Finally, TDC measures the phase error between the multiplied signal PG and the reference clock. And then the delay of the unit delay cell in the CCM is adjusted by the output of TDC.

7 Figure 2.1. An all-digital clock generator for DVFS [11].

Figure 2.2 Timing diagram [2.11].

Figure 2.3. Cyclic clock multiplier [2.11].

All-digital fast-locking programmable DLL-based clock generator[2.12]

For the cyclic clock multiplier, there exists the initial delay constraint. A new locking method was proposed to fix this problem. Moreover, the modified successive approximation register-controlled (MSAR) circuit was utilized to shorten the locking time and tracks the environmental variations.

Fig. 2.4 shows the proposed DLL-based clock generator. It is made up of the MSAR circuit, a timing control circuit, a digital phase-frequency detector (PFD), and a digital-controlled delay line. This clock generator provides two operation modes, binary-search mode and sequential-search mode. Each mode has two execution cycles alternatively, refresh cycle and compare cycle. Differing from the conventional cyclic circulating scheme, refreshed every reference cycle. It refreshes the output clock to eliminate the initial constraint by two execution cycles. The two-cycle refreshing technique helps to solve the initial delay constraint and achieve fast-locking time.

However, this architecture provides the more accumulated jitter and a half-loop bandwidth, compared with the conventional architecture having the same loop parameters. When the binary-search mode is finished, the frequency acquisition is also carried out. The clock generator enters the sequential -search mode. The MSAR circuit becomes a counter. The clock generator operates in a closed loop to track the PVT variations and compensates for the phase error. Once the clock generator enters the sequential-search mode, it will not go back to the binary-search mode until the system is reset. Fig. 5 shows the timing diagram of the clock generator.

9 Figure 2.4. A fast-locking programmable DLL-based clock generator [2.12].

Figure 2.5. Timing diagram [2.12].

2.1.1.2 Multiphase clocks with an edge combiner

A low power and wide range programmable clock generator with a high multiplication factor[2.4]

In this case, the clock generator consists of a DLL, a pulse generator, and a pulse combiner. Each pulse is generated from one corresponding unit delay cell. A high multiplication factor can be achieved with a fewer number of delay cells stages. In addition, power dissipation is reduced because the pulse generator consists of D flip-flop and inverters which operated only when trigged in their turn. An additional pulse selection process has been eliminated because the required sub-pulses are produced from the pulse generator for a target output signal frequency. A

saturated-type differential delay cell is utilized so that the clock generator can be operated in a low frequency without the area overhead.

Fig. 2.6 shows the overall block diagram of the DLL-based clock generator. The clock generator produces 24 differential phase-shifted signals. The pulse generator detects the rising edge of the selected phase-shifted signals according to the programmed 2-bit signal. The details of the pulse generator and the pulse combiner are illustrated in the Fig. 2.7. The phase shifted signal from the VCDL, k , triggers the corresponding kth DFF. Since the reset process takes two-inverter delay time, a short pulse of duration, Δτ, is generated at Qk. Finally, the pulse combiner collects these pulses. In order to create the required pulses, each S is set to either 0 or 1 according to the programmed 2-bit signals, C0 and C1. Thus, one of four multiplication factors-4,8,12,24 can be chosen.

Figure 2.6.A clock generator with a high multiplication factor [2.4].

11 Figure 2.7. Pulse generator and edge combiner [2.4].

Process variations tolerant all-digital multiphase DLL for DDR3 interface[2.21]

In this case, the clock generator uses the four phase shifter to produce the multiplied frequency. The conventional digital DLL may suffer from a harmonic locking problem and area overhead of the delay line control logic. A time to digital converter (TDC) was used to prevent the harmonic locking but have an area overhead.

Therefore, a ring oscillator and a counter to resolve the harmonic problem was used in this case. The maximum operating frequency of a digital DLL is determined by the minimum delay of a delay line. In this architecture the minimum delay is four times larger than the conventional digital DLL. The fine-tune delay line is needed to solve this problem.

Fig. 2.8 shows the proposed DLL which is composed of four 90 degree phase shift blocks, a global delay line controller, a phase selector, and an edge combiner. To eliminate the delay mismatch among the delay lines, the operation mode of the DLL is divided into the calibration mode and the locking mode. Each 90 degree delay block has its own controller which avoids interfering with other delay blocks. During the calibration mode, each 90 degree block calibrates 90 degree delay of its own delay line. An area efficient binary to thermometer (BTC) is used to control the digital

coarse delay line (DCDL). After the calibration mode, the operation mode changes into the locking mode. During the locking mode, the four delay lines are operated not as a ring oscillator but as a delay line controlled by the global delay controller.

Finally, an edge combiner collects a 90 degree shifted clock output with 2x multiplication. Fig. 2.9 shows the 90 degree phase shift block structure.

Figure2.8. Process variation tolerant multiphase DLL [2.20].

Figure 2.9. 90 degree phase shift block structure [2.20].

2.2 An Overview on Duty Cycle Corrector

The clock signal is described by some of parameters such as, clock frequency, clock period, clock duty cycle. The duty cycle is defined by the ratio of the on-time period in a clock cycle period. A clock with a 50% duty cycle plays an important role in some applications, such as double-data-rate (DDR) SDRAM, a clock and data recovery circuit (CDR), an analog-to-digital converter (ADC), an dual-edged triggered flip-flop. In the double-rate systems, the data is sampled at both of the clock

edge, positive clock edge and negative clock edge. For the high-speed systems, the data is sampled by the dual clock edge so that the throughput is dramatically increased if comparing to only using the single clock edge. For the low power systems, if maintaining the same throughput, the clock frequency can be decreased to a half of clock frequency. Once the clock frequency is reduced, the clock network can consume less power. Also, a clock signal with 50% duty cycle is a critical key for the dynamic logic family. There are two phase-precharge phase when clock signal is at the low level and evaluation phase when the clock signal stay at the high level in the dynamic logic family. If there is a duty cycle distortion, it may cause the degradation of the performance. However, a duty cycle of the clock signal from the off-chip is prone to deviate from 50% while operated in a high frequency. In addition, even the clock generator produces a 50% duty cycle clock signal, there is probably a deviation in the duty cycle because of the unmatched clock driver in the rising edge and falling edge.

In order to solve this problem, the duty cycle corrector (DCC) have been widely used [2.22]-[2.33] to adjust the duty cycle as close to 50% as possible. DCC can be classified into two types: digital type [2.22]-[2.29], analog type[2.33]-[2.34], and mixed mode [2.30]-[2.31]. The digital DCCs are separated into the feedback type [2.22]-[2.23] and non-feedback type[2.24]-[2.29]. The analog DCCs are usually implemented as a feedback type [2.32]-[2.33] to get a better accuracy at the expense of long locking time.

2.2.1 Digital

2.2.1.1 Feedback [2.22]

The proposed structure is shown in Fig. 2.10. It provides a clock synchronization with a deskew buffer and a duty cycle correction. They are composed of three half

delay lines (HDLs), an edge combiner, an interpolator, two phase detectors, and the circuit controllers. The three half delay lines are used to reduce the mismatch between the half delay lines. The architecture is based on a cyclic time-to-digital converter to shorten the locking time. Two phase detectors are employed to give the leading-lagging information during duty cycle correction (DCC) and deskew operations. Fig 2.11 shows the signal paths in the DCC phase and the deskew phase.

From Fig. 2.11(a), CLK_IN passes through PATH 1 to produce CLK_DL whose period is equal to one input clock period. PD1 is used in PATH 1. In PATH 2 , with the fixed HDL1, HDL3 is duplicated as HDL2 by using PD2. Finally, CLK_IN travels through three HDLs to start the deskew phase, as Fig. 2.11(b) shown. CLKDL is like a set signal of the edge combiner. CLK_HDL and CLK_RHDLcan be interpolated by averaging both as a half of the input clock period. The newly interpolated signal is used like a reset signal of the edge combiner. The digital feedback DCC can provide a short duty cycle correction time when comparing with the analog method. However, the digital feedback DCC needs a more complicated duty cycle detectors structures such as TDC.

Figure 2.10. Proposed all-digital with feedback loop DCC [2.22].

15 (a)

(b)

Figure 2.11. Signal paths [2.22]. (a) DCC path (b) deskew path

2.2.1.2 Non-feedback [2.26]

The half cycle delay line (HCDL) is a key component in the proposed DCC, as Fig. 2.12(a) shown. CKB passes through HCDL to produce CKR which is delayed by a half of the clock period. CKR is like a reset signal of the SR latch. CKB travels through matching delay line (MDL) to generate CKS which is slightly delayed after CKB. The correcting precision of a digital DCC is depended on the delay time of the delay unit so there exists a quantization error. MDL is used to compensate this inherent delay of HCDL. The timing diagram is drawn in Fig. 2.12(b). The digital non-feedback DCC has a fast duty cycle correction procedure. However, the characteristics of the open loop is that it can't track the process, voltage, and temperature (PVT) variations. It is not suitable to be operated in the low voltage design.

16 (a)

(b)

Figure 2.12. Without a feedback loop DCC [2.26]. (a) Proposed topology (b) Timing diagram.

2.2.2 Analog

The analog DCCs are usually implemented as feedback type to achieve a higher duty cycle accuracy at the expense of a long duty cycle correction time. Besides, it needs a more complex designs to maintain stable operation. In [2.33], DCC is along with a negative feedback loop and the pulse shrinking/stretching mechanism is utilized to adjust the input clock duty cycle, as Fig. 2.13 shown. The differential low-pass filter is used as an integrator to generate a feedback voltage Vbias which is used to adjust the delay line. In [2.33], the pulse control loop is utilized to control the duty cycle of the input clock, as Fig. 2.14 shown. Using the current ratio of charge pump, the DCC can generate a programmable duty cycle of the output clock

17 Figure 2.13. Analog DCC with an integrator [2.33].

Figure 2.14. Analog DCC with a charge pump [2.33].

2.2.3 Mixed mode [2.30]

The other type of DCCS is the mixed mode. The duty cycle detectors are implemented by the analog circuits such as an amplifier, integrators, or comparators.

In [2.30], it used a comparator as a duty cycle detector and a digital SAR-controller to control the duty-cycle adjuster. The duty-cycle adjuster is employed the phase mixers to achieve a better duty-correction resolution. The analog comparator detect the clock duty-error precisely. The other blocks are implemented with digital circuits to reduce duty cycle correction time.

Figure 2.15. Mixed mode DCC [2.30].

2.3 An Overview on Level Converter

With the development of the portable devices, the power consumption becomes a critical issue. Applying a voltage scaling technique that changes the supply voltage of a gates to a lower value in CMOS circuits is an effective way of reducing power consumption. Ultra-low voltage logics have been exploited[2.34]. However, supplying an ultra-low supply voltage may cause the degradation of the performance.

In [2.35], a clustered voltage scaling scheme was developed, in which, a critical path is still supplied by a high voltage to meet the performance demanding and a non-critical path is provided by a low supply voltage to reduce the power consumption. When a gate in a low voltage drives the a gate in the high voltage, the high output of the low-voltage gate is unable to fully turn off the high-voltage gate.

This results in a DC leakage path from the power source to the ground. To overcome this problem, a level converter is implanted at the interface between two different voltage domains. The conventional level converter are separated into three types, as Fig. 2.26 shown. Unfortunately, they were only suitable for converting a low super-threshold input into a high super-threshold output. When operated in the sub-threshold region, those conventional level converter fail to work correctly.

Recently, there had been proposed many methodologies for successfully converting the a sub-threshold input to a super-threshold output, [2.37]-[2.49].

Vin_low

Figure 2..16. Conventional level converters. (a) Cross-coupled type (b) Current mirror type (c) Dynamic type

2.3.1 Cross-coupled type

The cross-coupled type level converter is drawn in Fig. 2.16(a). Two cross-coupled PMOS transistors form a positive feedback loop. When operated in ultra-low voltage, the pull-down devices (NMOS) is too weak to compete with the pull-up devices (PMOS). To have a balance driving ability can solve this problem.

2.3.1.1 Voltage doubler [2.37]

The voltage doubler is inserted before the input signal is applied to the level converter. The voltage doubler can level up the input voltage to enhance the pull down strength, as Fig. 2.17 shown. However, there are two larger MOS capacitors to incur an area overhead. In addition, the they are susceptible to noise because one of the two capacitors has one node floating. Therefore, the capacitors lose the charges over a period of time.

Figure 2.17. Voltage doubler [2.37]

2.3.1.2 Cascade level converter [2.38]

Fig. 2.18 presents a cascade level converter methodology. The difference between the output voltage and the input voltage is reduce so that the pull up driving ability and pull down driving ability is almost equally. There is no imbalanced driving strength problem. For this method, the system should provide the three intermediate voltages, 300mV, 400mV, and 600mV. This results in the power management overhead.

Figure 2.18.Cascade level converter [2.38]

2.3.1.3 Reduced swing inverter (RSI) [2.39]

In the cross-coupled latch, two reduced swing inverter are inserted in the positive loop, as Fig. 2.19(a) shown. The method is to reduce the pull up driving ability. A reduced swing inverter is presented in Fig. 2.19(b). When the input signal IN is "0", MP3 is turned on and charges the node OUT. If the input signal IN is changed to "1",

在文檔中可用於低電壓動態電壓與頻率調節系統之多相時脈設計與電壓準位轉換設計 (頁 16-0)