Organization - Tunable All-Digital Clock Generator for Wireless Body Area Networks

CHAPTER 1 Introduction

1.2 Organization

The rest of this thesis is organized as follows. First, the all-digital PFTCG is described in Chapter 2. Then, in Chapter 3, we propose a low-power, delay-tunable hysteresis delay cell (HDC) for DCO design. Chapter 4 presents a PVT-tolerant clock generator. Finally, Chapter 5 summarizes our work and discusses future design topics.

CHAPTER 2 Phase-Frequency Tunable Clock Generator

As described in Chapter 1, the PFTCG is used for DPR and DFR [2] to change the phase and frequency of the generated clock within a given response time. Traditional PLLs, designed with analog approaches, are composed of a phase frequency detector (PFD), a charge pump (CP), a loop filter (LF), a VCO, and a frequency divider. In more advanced process technologies, analog PLLs face increasingly difficult tradeoffs among gain, supply voltage, and frequency range in the VCO design. The large LF capacitance increases chip area when integrated on chip, whereas an off-chip capacitor consumes considerable power. Furthermore, the serious leakage current of CP circuits in deep sub-micron technology also dominates the overall power dissipation.

In contrast, all-digital approaches such as the all-digital phase-locked loop (ADPLL) [5-8], the all-digital delay-locked loop (ADDLL) [14], and the all-digital multi-phase clock generator (ADMCG) [15] offer short lock-in time, low design complexity for voltage scaling and power minimization, and easy integration in SoC applications. Therefore, the PFTCG is proposed as an all-digital scheme that achieves power reduction and performance improvement through clock phase and frequency adjustments.

2.1 System Overview

The overall block diagram of dynamic phase-frequency recovery [2] with the proposed all-digital PFTCG is shown in Fig. 2.1 [3]. The signals are transmitted through a noisy channel and down-converted at the receiver side. The received signals are then sampled at the symbol period with an initial timing offset ε. After timing synchronization, which consists of the packet detection and boundary detection blocks, the timing error detector (TED) starts the maximum absolute-squared-sum (MASS) search [2] on the initial preamble. The TED calculates the absolute-squared-sum for each sampling phase ε̂ provided by the PFTCG, and the PFTCG then selects the optimal sampling phase, i.e., the one that yields the MASS.
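The MASS search described above can be sketched as follows. This is a minimal illustration rather than the TED implementation of [2]: the function name `mass_search`, the toy preamble, and the cosine-shaped amplitude model (which places the best sampling instant at phase 3) are all assumptions made for the example.

```python
import math

def mass_search(samples_by_phase):
    """Return (best_phase, metrics): the phase whose samples give the
    maximum absolute-squared-sum, and the metric of every phase."""
    metrics = [sum(abs(x) ** 2 for x in samples) for samples in samples_by_phase]
    best_phase = max(range(len(metrics)), key=lambda k: metrics[k])
    return best_phase, metrics

# Toy preamble of +/-1 symbols (hypothetical data).
preamble = [1, -1, 1, 1, -1, -1, 1, -1]

# Assumed pulse model: sampling at phase k scales every symbol by
# cos(pi * (k - 3) / 8), so phase 3 hits the symbol peak.
NUM_PHASES = 8
samples_by_phase = [
    [s * math.cos(math.pi * (k - 3) / NUM_PHASES) for s in preamble]
    for k in range(NUM_PHASES)
]

best_phase, metrics = mass_search(samples_by_phase)
print(best_phase)
```

In the receiver, each entry of `samples_by_phase` would come from sampling the preamble with one of the 8 PFTCG output phases.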

Although the TED adjusts the sampling clock phase, the drift caused by the sampling clock frequency offset ξ continues to accumulate. The frequency error detector (FED) estimates the sampling clock frequency offset after the fast Fourier transform (FFT) using a least squares (LS) algorithm [16]. The estimated sampling clock frequency offset ξ̂ is also sent to the PFTCG to tune the sampling frequency. To summarize, the ADC sampling clock is controlled by the PFTCG using the estimated sampling phase offset ε̂ and the estimated sampling frequency offset ξ̂.

The phase-selection capability of the PFTCG enables the receiver to sample incoming signals at better instants without increasing the sampling frequency, and the frequency fine-tuning capability reduces the SCO between the transmitter and receiver for better PER performance. The design specification of the PFTCG is listed in Table 2.1: a 5 MHz reference clock source, a 5 MHz target output with 8 phases, and a ±150 ppm frequency tuning range centered at 5 MHz.

Fig. 2.1. Block diagram of the system operation with all-digital PFTCG.

Table 2.1. Specification of PFTCG.

Parameter                 Value
Reference Clock Source    5 MHz
Output Clock              5 MHz
Phase Number              8
Frequency Tuning Range    ±150 ppm (@ 5 MHz)
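For reference, the timing quantities implied by Table 2.1 can be worked out directly. The short sketch below only restates the specification numerically; the variable names are illustrative and are not taken from the design.

```python
F_REF_HZ = 5e6      # reference and output clock frequency (Table 2.1)
NUM_PHASES = 8      # number of output phases (Table 2.1)
TUNE_PPM = 150      # one-sided frequency tuning range (Table 2.1)

period_ns = 1e9 / F_REF_HZ                    # clock period in ns
phase_step_ns = period_ns / NUM_PHASES        # ideal spacing between adjacent phases
freq_offset_hz = F_REF_HZ * TUNE_PPM * 1e-6   # frequency offset at the range limit

print(f"period        = {period_ns:.1f} ns")
print(f"phase spacing = {phase_step_ns:.1f} ns")
print(f"tuning range  = +/-{freq_offset_hz:.0f} Hz")
```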

2.2 Architecture

The proposed all-digital and cell-based PFTCG architecture is shown in Fig. 2.2.

There are four major blocks in the PFTCG, namely phase frequency detector (PFD), multi-phase digitally controlled oscillator (DCO), PFTCG controller, and glitch-free clock multiplexer (GFCMUX).

The reference clock (REF_CLK) is generated at 5 MHz by the small, highly integrated circuits described in Chapter 4. In the locking loop, the PFD detects the frequency and phase difference between the reference clock (REF_CLK) and the DCO output (PHASE0). It then generates an up (UP) or down (DOWN) signal that directs the controller to adjust the DCO control code (DCO_CODE) to speed up or slow down the DCO, respectively. With the updated control code, the multi-phase DCO generates eight equally spaced clock phases (PHASE0 to PHASE7) extracted from the DCO delay path. The glitch-free clock multiplexer receives the estimated sampling phase offset ε̂ from the TED and selects the optimal sampling phase from PHASE0 ~ PHASE7. The FED delivers the estimated sampling frequency offset ξ̂ to the PFTCG controller, which slightly tunes the sampling frequency through DCO_CODE.

The operation of the all-digital PFTCG, based on the algorithm developed in [5], is illustrated in Fig. 2.3. After system reset, the all-digital PFTCG enters the phase and frequency tracking state. The controller sets the DCO to the middle of the delay path. The initial DCO search step is n/4, where n is the number of frequencies provided by the DCO. Whenever the PFD output changes from lead to lag, or vice versa, the search step is divided by two [5]. When a new DCO code is calculated, the present DCO and PFD control signals are first cleared, and then the DCO is updated to the latest code. Clearing the DCO prevents the glitches that would result from directly updating the DCO codeword. Clearing the PFD keeps the coarse-tuning loop from diverging in frequency and phase.

When the search step reduces to one, the frequency of the DCO output clock is acquired [5]. The DCO control code is then averaged over the following cycles to track the DCO output clock frequency. After that, the lock signal (LOCK) is asserted and the DCO codeword locks the output clock frequency to the desired 5 MHz.
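The adaptive search can be sketched in a few lines. This is a simplified behavioral model under stated assumptions, not the controller of [5]: the PFD decision is replaced by a comparison against a hypothetical `target_code` (the code at which the DCO would match the reference), and the clear/update handshake and the subsequent code averaging are omitted.

```python
def acquire_code(target_code, n_codes=2 ** 16, max_updates=1000):
    """Adaptive-step search: start mid-range with step n/4 and halve the
    step whenever the lead/lag decision changes polarity.
    Returns (final_code, number_of_code_updates)."""
    code = n_codes // 2           # DCO starts at the middle of the delay path
    step = n_codes // 4           # initial search step
    prev_dir = None
    for updates in range(1, max_updates + 1):
        # Stand-in for the PFD: +1 means "raise the code", -1 means "lower it".
        direction = 1 if code < target_code else -1
        if prev_dir is not None and direction != prev_dir:
            step = max(step // 2, 1)          # polarity change: halve the step
        code = min(max(code + direction * step, 0), n_codes - 1)
        prev_dir = direction
        if step == 1:                         # step of one: frequency acquired
            return code, updates
    return code, max_updates                  # safety cap (e.g. target at a range limit)

code, updates = acquire_code(40000)
print(code, updates)
```

In this model the search ends within one code of `target_code`, which is consistent with averaging the control code over the following cycles before LOCK is asserted.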

Afterwards, the phase selection state switches among the phases and searches for the optimal sampling phase with the aid of the TED. Finally, the FED sends the estimated clock frequency offset ξ̂ to the PFTCG, which results in less-interfered data before system signal processing.

Fig. 2.2. Architecture of the proposed all-digital PFTCG.

Fig. 2.3. Control mechanism of the proposed all-digital PFTCG.

2.3 Circuit Designs

2.3.1 Phase Frequency Detector

The PFD design follows the circuit topology proposed in [5] and is built from the standard cell library; its block diagram is shown in Fig. 2.4. When the feedback clock (PHASE0) generated by the DCO leads the reference clock source (REF_CLK), the signal QD produces a high pulse until REF_CLK arrives at the D flip-flop (DFF) and triggers QU. The generated signal QU first propagates back through the reset branch of the DFF and then clears QU and QD. At the same time, OUTU produces a low pulse and OUTD remains high. Finally, the flags UP and DOWN are triggered by these signals and sent to the PFTCG controller to slow down the DCO. On the other hand, when PHASE0 lags REF_CLK, DOWN becomes high and UP remains low.

The dead zone is a well-known problem in PFDs, caused by the limited response time of transistors. When the pulse width of QU or QD is not long enough to turn on the following circuits, the characteristic of the PFD becomes discontinuous. To minimize the dead zone, a digital pulse amplifier [5] is used, as shown in Fig. 2.4. It uses cascaded two-input AND gates to enlarge the pulse width of OUTU and OUTD. Another method to eliminate the dead zone is to insert a delay buffer in the feedback path of the reset branch. The increased response time of the DFF effectively generates a pulse wide enough to minimize the dead zone of the PFD, so that the following D flip-flops can detect it.

When the phase error between REF_CLK and PHASE0 is less than 5 ps, both UP and DOWN remain high, and no trigger signal is sent to the PFTCG controller.

Fig. 2.4. Schematic of PFD.

2.3.2 Digitally Controlled Oscillator

The proposed cell-based, 8-phase, 5 MHz DCO is shown in Fig. 2.5. To preserve the DCO control code resolution and a wide operating range under PVT variations, with delays spanning from several tens of nanoseconds down to about ten picoseconds, the proposed DCO is separated into three tuning stages.

To provide 8 phases from the generated 5 MHz clock source, the buffers in the 1st tuning stage divide the total delay into delay segments, each a multiple of 50 ns, and connect to 4 multiplexer groups. The signals OUT0 to OUT3 are extracted from the delay chain by the multiplexer groups with equal spacing. They are then fine-tuned individually by the following 2nd and 3rd stages, and the 8 phase clock signals are generated by inverters (INV) and buffers (BUF).

The proposed 1st tuning stage employs a cascading structure [17] with a 16-to-1 path selector, as shown in Fig. 2.6, to maintain delay linearity and to extend the operating range easily. The 16-to-1 path selector is controlled by a 4-bit 1st tuning control code.

The delay difference between two neighboring paths is determined by one 1st tuning delay cell, which consists of one buffer (BUF) and one multiplexer (MUX), as shown in Fig. 2.6. Compared with the tri-state buffer architecture [5] [10-11] for the path selector, the multiplexers can increase the controllable range. The sum of the low-to-high propagation delay (TPLH) and the high-to-low propagation delay (TPHL) of one 1st tuning delay cell is about 30.27 ns under the PVT condition (TT, 0.8 V, 25℃). Therefore, the delay of the outputs (OUT0 ~ OUT3) changes by 30.27 ns when the 1st tuning control code changes by one.

Fig. 2.5. Block diagram of DCO.

Fig. 2.6. Architecture of 1st tuning stage of DCO.

Fig. 2.7. Proposed delay cell in 3rd tuning stage [10].

Moreover, the 2nd and 3rd tuning stages follow the 1st tuning stage to achieve a finer delay resolution for the proposed DCO. The circuit topology of the 2nd tuning stage follows that of the 1st stage, except that the minimum delay resolution is 1.06 ns with a 5-bit control code. To track the reference clock without false lock in the PFTCG, the controllable range of the 2nd tuning stage has to cover the delay resolution of the 1st tuning stage. The 3rd tuning stage is designed on the same principle as the 2nd tuning stage.

The least significant bit (LSB) resolution of the DCO is improved to about 8.6 ps by adding the 3rd tuning delay cell. The 3rd tuning stage uses digitally controlled varactors (DCV) [10] from the cell library to achieve the highest resolution and linearity. As shown in Fig. 2.7, an intrinsic capacitance (CI) is in parallel with a differential capacitance (∆C) at the output node (OUT). The gate capacitance of the 3-input NAND gate is controlled by the digital code (ON3), while another input pin is tied to zero. Tying one input of the 3-input NAND to zero cuts off the paths of the NMOS and PMOS transistors from ground and the supply voltage, respectively [3]. The on-off state of ON3 then determines whether the additional loading capacitance (∆C) appears at the output node of the delay cell, which changes the charging and discharging time by the desired delay resolution [3].

To meet the ±150 ppm frequency tuning range of the design specification, the controllable range of the 3rd tuning stage has to be larger than 60 ps (= 2 × 200 ns × 150 ppm). In the proposed DCO design, the range of the 3rd tuning stage is at least 428.8 ps (= ±1072 ppm) under any PVT variation. The 3rd tuning stage has a 7-bit digital control code, so the proposed DCO has 16 (= 4 + 5 + 7) tuning bits in total. Based on all standard cells, the delay resolution and controllable range of the three proposed tuning stages under the PVT condition (TT, 0.8 V, 25℃) are listed in Table 2.2. The table shows that the controllable range of each stage is larger than the step of the previous stage.
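The two conversions between delay and ppm used in this paragraph can be checked numerically. The sketch below uses only the numbers quoted in the text; the variable names are illustrative.

```python
PERIOD_NS = 200.0       # period of the 5 MHz clock
SPEC_PPM = 150          # required one-sided tuning range
RANGE_3RD_PS = 428.8    # worst-case 3rd-stage controllable range quoted in the text

period_ps = PERIOD_NS * 1e3

# Required span: from -150 ppm to +150 ppm of the period.
required_ps = 2 * period_ps * SPEC_PPM * 1e-6

# One-sided tuning range offered by the 3rd stage, in ppm of the period.
achieved_ppm = (RANGE_3RD_PS / 2) / period_ps * 1e6

print(f"required span  = {required_ps:.1f} ps")
print(f"achieved range = +/-{achieved_ppm:.0f} ppm")
```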

According to HSPICE simulation, the maximum output frequency of the proposed DCO is 6.03 MHz (165.9 ns) and the minimum output frequency is 4.48 MHz (223.0 ns) across the PVT corners from (SS, 0.72 V, 125℃) to (FF, 1.1 V, 0℃). The total power consumption of the proposed DCO is 90.3 µW at 1.0 V and 53.7 µW at the scaled 0.8 V supply voltage in the UMC 90 nm CMOS process.

Table 2.2. Controllable range and delay resolution of DCO in PFTCG.

                      1st Tuning Stage    2nd Tuning Stage    3rd Tuning Stage
Code Length (bits)    4                   5                   7
Range (ns)            454.05              32.9158             1.0922
Resolution (ns)       30.27               1.0618              0.0086
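The entries of Table 2.2 are mutually consistent, which the following sketch verifies. It assumes that each stage is perfectly linear in its control code and models only the tunable part of the delay, since the fixed intrinsic delay of the DCO is not listed in the table; the helper `tunable_delay_ns` is hypothetical.

```python
STAGES = [
    # (name, control bits, resolution in ns, range in ns) from Table 2.2
    ("1st", 4, 30.27, 454.05),
    ("2nd", 5, 1.0618, 32.9158),
    ("3rd", 7, 0.0086, 1.0922),
]

def tunable_delay_ns(codes):
    """Delay added by the three stages for a (c1, c2, c3) code tuple,
    assuming each stage is linear in its code."""
    return sum(code * res for code, (_, _, res, _) in zip(codes, STAGES))

for name, bits, res, rng in STAGES:
    # The range equals (2^bits - 1) steps of the stage resolution.
    assert abs((2 ** bits - 1) * res - rng) < 1e-6, name

for coarse, fine in zip(STAGES, STAGES[1:]):
    # A finer stage must span at least one step of the coarser stage.
    assert fine[3] > coarse[2]

print(f"{tunable_delay_ns((15, 31, 127)):.4f} ns")
```

The second check corresponds to the false-lock condition stated in the text: the 2nd stage covers one 1st-stage step, and the 3rd stage covers one 2nd-stage step.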

2.3.3 Glitch-Free Clock Multiplexer

As described above, the proposed DCO generates 8 phase clock signals for DPR. One of these 8 sources is then selected using a glitch-free technique [18-19]. In general, a simple multiplexer is used to perform the selection. However, the different arrival times of the switching signals at a conventional multiplexer result in glitches. The problem is that the control signal may change at any time with respect to the source clocks, which can chop the output clock or create a glitch at the multiplexer output [19]. Such glitches on the clock line make sampling data synchronization and DPR difficult [2].

Fig. 2.8 depicts a 2-to-1 clock switching circuit [19] that provides either of two clock signals, CLK0 and CLK1, on a clock-distribution output OUT_CLK without switching glitches. To protect the high pulse of OUT_CLK against interruption, two negative-edge-triggered DFFs are used. As shown in Fig. 2.9, the selection signal (SELECT) first switches to zero, and d0 turns to zero immediately. Then, at the following falling edge of CLK0, the upper DFF is triggered and qb0 is fed back to d1. At the same time, OUT_CLK stops propagating CLK0. Finally, the lower DFF is triggered at the following negative edge of CLK1, and OUT_CLK switches to CLK1 without glitches. This circuit also ensures that the second positive edge of the output signal (OUT_CLK) after the selection signal changes comes from the new clock (CLK1).
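The switching sequence can be reproduced with a small behavioral simulation on an integer time grid. The model below is a sketch under assumptions, not a netlist of the circuit in [19]: the gate equations d0 = SELECT AND qb1 and d1 = (NOT SELECT) AND qb0, the output OUT_CLK = (CLK0 AND q0) OR (CLK1 AND q1), the clock period, and the phase offset between the clocks are chosen for illustration. A plain multiplexer is included for comparison.

```python
def clock(t, period, offset=0):
    """50 % duty-cycle square wave on an integer time grid."""
    return 1 if (t - offset) % period < period // 2 else 0

def simulate(select_at, t_end=100, period=20, offset1=6, glitch_free=True):
    """Return OUT_CLK samples. SELECT is 1 (CLK0) before select_at, then 0 (CLK1)."""
    q0, q1 = 1, 0                              # flop outputs: CLK0 initially enabled
    prev0, prev1 = clock(0, period), clock(0, period, offset1)
    out = []
    for t in range(t_end):
        select = 1 if t < select_at else 0
        c0, c1 = clock(t, period), clock(t, period, offset1)
        if glitch_free:
            d0 = select & (1 - q1)             # enable CLK0 only after CLK1 is released
            d1 = (1 - select) & (1 - q0)       # enable CLK1 only after CLK0 is released
            fall0 = prev0 == 1 and c0 == 0     # negative edge of CLK0
            fall1 = prev1 == 1 and c1 == 0     # negative edge of CLK1
            q0, q1 = (d0 if fall0 else q0), (d1 if fall1 else q1)
            out.append((c0 & q0) | (c1 & q1))
        else:
            out.append(c0 if select else c1)   # plain multiplexer
        prev0, prev1 = c0, c1
    return out

def high_pulse_widths(wave):
    """Widths of the high pulses that finish before the end of the trace."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths

naive = high_pulse_widths(simulate(25, glitch_free=False))
safe = high_pulse_widths(simulate(25, glitch_free=True))
print(min(naive), min(safe))
```

With SELECT changing in the middle of a CLK0 high phase, the plain multiplexer chops the output pulse, whereas the gated version only ever passes complete high pulses, because each flop changes state while its own clock is low.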

This 2-to-1 clock switching MUX can be extended to switch among 8 clock sources, in which case each select signal has to be fed back to all sources [19]. However, the DPR method switches through the 8 phase clocks in order and chooses the optimal phase by the MASS search algorithm. The extended architecture can therefore be modified to remove redundant circuits. As shown in Fig. 2.10, the proposed 8-to-1 glitch-free clock MUX does not connect all feedback signals of the DFF outputs (qb[0] ~ qb[7]) to the select signals (SELECTION[0] ~ SELECTION[7]). The phase selection signal (P[2:0]), controlled by the TED, is converted to SELECTION[0:7] by a decoder.
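The decoder and the ordered switching can be illustrated as follows. Both functions are hypothetical helpers: a one-hot encoding of SELECTION and an ascending, wrap-around visiting order are assumed here, since the text states only that a decoder is used and that the phases are switched in order.

```python
def decode_phase(p):
    """One-hot decode a 3-bit phase index into SELECTION[0..7]."""
    if not 0 <= p <= 7:
        raise ValueError("phase index must fit in 3 bits")
    return [1 if i == p else 0 for i in range(8)]

def ordered_switch(start, target):
    """Phase indices visited when stepping one phase at a time (wrapping at 8)."""
    path = [start]
    while path[-1] != target:
        path.append((path[-1] + 1) % 8)
    return path

print(decode_phase(2))
print(ordered_switch(7, 2))
```

Because only neighboring phases are switched under this assumption, each clock gate needs feedback from far fewer flops than in a fully connected 8-to-1 extension, which is the redundancy the proposed MUX removes.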

Fig. 2.8. Schematic of 2-to-1 glitch-free clock MUX [19].

Fig. 2.9. Simulated waveforms of the 2-to-1 glitch-free MUX.

Fig. 2.10. Proposed 8-to-1 glitch-free clock MUX for DPR.

2.4 Simulation Result

Fig. 2.11 shows the transient response of the proposed PFTCG operation scenario, where the reference clock (REF_CLK) is 5 MHz. When RESET is triggered, the PFTCG starts to track the frequency and phase of the reference clock. The DCO control codeword (DCO_CODE[15:0]) converges toward the desired 5 MHz until the LOCK signal is triggered. By using the adaptive search step for frequency acquisition described in Section 2.2, the PFTCG finishes the tracking state within 128 (= 4 × 2 × log₂(2¹⁶)) reference clock cycles in the worst case. During this tracking time, the CLEAR_DCO signal is sent frequently so that the DCO loop is updated to a new delay path without glitches in the loop.
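The worst-case tracking length can be converted into time as a quick check. The factors 4 and 2 are taken from the expression in the text as given; the conversion to microseconds assumes the 200 ns period of the 5 MHz reference clock.

```python
import math

CODE_BITS = 16           # width of DCO_CODE[15:0]
REF_PERIOD_NS = 200.0    # 5 MHz reference clock

# Worst-case tracking length as given in the text: 4 * 2 * log2(2^16).
worst_case_cycles = 4 * 2 * math.log2(2 ** CODE_BITS)
worst_case_time_us = worst_case_cycles * REF_PERIOD_NS / 1e3

print(f"{worst_case_cycles:.0f} cycles, {worst_case_time_us:.1f} us")
```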

Then, when the TED requires the PFTCG to change the output clock phase to the optimal phase, the phase selection signal P[2:0] controls the glitch-free clock multiplexer to switch the output phase (PHASE[7:0]) in order. After the phase search, the sampling phase of the output clock (OUT) is fixed according to the estimated sampling phase offset ε̂. The frequency tuning control signal ξ̂ (corresponding to the signals TUNE_VALID and TUNE_CODE) from the FED is then sent to fine-tune the frequency of the output clock.

Fig. 2.12 shows the 8 nominally evenly spaced clock waveforms in the phase switching state. The phase of the output clock is switched from PHASE7 to PHASE2. Adjacent waveforms are separated by a delay of about 25 ns. The phase slots from PHASE0 to PHASE7 occupy 10 %, 10 %, 10 %, 12 %, 14 %, 14 %, 14 % and 15 % of one period, respectively.
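The slot percentages can be compared with the ideal spacing of one eighth of the period. The sketch below uses the percentages quoted above and the 200 ns period; note that the quoted values sum to 99 %, presumably because of rounding.

```python
PERIOD_NS = 200.0
slot_percent = [10, 10, 10, 12, 14, 14, 14, 15]   # PHASE0..PHASE7, from the text

ideal_ns = PERIOD_NS / len(slot_percent)              # ideal slot width
slot_ns = [p / 100 * PERIOD_NS for p in slot_percent]
deviation_ns = [s - ideal_ns for s in slot_ns]        # signed error per slot
worst_ns = max(abs(d) for d in deviation_ns)

for k, (s, d) in enumerate(zip(slot_ns, deviation_ns)):
    print(f"PHASE{k}: {s:5.1f} ns ({d:+.1f} ns)")
print(f"worst-case deviation: {worst_ns:.1f} ns")
```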

Fig. 2.11. Simulated waveforms of PFTCG operation scenario.

Fig. 2.12. Simulated multi-phase waveforms of PFTCG.

2.5 Implementation and Measurement Result

The PFTCG hardware information is summarized in Table 2.3. The PFTCG is an always-on building block that continuously consumes both dynamic and static power. Therefore, it is implemented in the UMC 90 nm standard process high threshold voltage (SPHVT) CMOS technology to save static current. The frequency of the reference clock source is 5 MHz. The generated phase-frequency tunable output clock has 8 phases at 5 MHz. The delay cell resolutions of the 1st to 3rd tuning stages in the DCO are 30.27 ns, 1.06 ns, and 8.6 ps, respectively. Fig. 2.13 shows the area distribution of the all-digital PFTCG. The DCO and the controller occupy almost the entire area.

Table 2.3. The proposed PFTCG hardware profile.

Technology                     Standard 90 nm SPHVT CMOS
Target Frequency               5 MHz
Phase Number                   8
1st Tuning Stage Resolution    30.27 ns
2nd Tuning Stage Resolution    1.06 ns
3rd Tuning Stage Resolution    8.6 ps
Freq. Tuning Range             ±1072 ppm (@ 5 MHz)
Core Area                      125 µm × 252 µm

The layout of the PFTCG is shown in Fig. 2.14. The largest part of the PFTCG area is the DCO, whose delay cells constitute the 25 ns delay of each phase. Most of the remaining area is occupied by the control circuits, because the long delay line has multiple delay stages to control and requires many circuits to decode the control signals. The PFTCG is integrated in a test system [21], a dual-mode (MT-CDMA & OFDM) baseband transceiver, for system verification, with a PFTCG area of 125 µm × 252 µm. The chip microphotograph and the layout of the PFTCG are shown in Fig. 2.15.

Fig. 2.13. Area distribution of all-digital PFTCG.

Fig. 2.14. Layout of the proposed PFTCG.

Fig. 2.15. Chip microphotograph: (a) WSN, (b) CPN [21].

Fig. 2.16 shows the output waveform of the PFTCG measured with a LeCroy LC584A. Four phase outputs (PHASE0, PHASE2, PHASE4 and PHASE6) are shown on Channels 1, 2, 3 and 4. The peak-to-peak phase jitter and the maximum root-mean-square (RMS) jitter at 5 MHz are 287 ps and 640 ps, respectively, over 15032 sweeps. Using a current meter with 100 pA resolution at 1 V/25℃ (the I/O pad supply is 2.5 V), the measured power consumption at 5 MHz is 145.8 µW and 95.4 µW with 1.0 V and 0.8 V supply voltages, respectively.

Fig. 2.16. Measurement result of PFTCG.

2.6 Summary

An all-digital, cell-based clock generator is designed to enable dynamic tuning of the clock phase and frequency while the wireless communication system is in operation. The proposed all-digital PFTCG provides 8 selectable clock phases and allows the ADC to sample signals at a lower frequency and a better sampling phase, resulting in lower power dissipation. The PFTCG also achieves a ±1072 ppm frequency tuning range centered at 5 MHz under any PVT variation, leading to high performance against SCO. Compared with the case of no sampling offset, there is only 0.25 dB SNR loss at PER = 1 %, as shown in Fig. 1.2 [4]. The measured power consumption is 145.8 µW and 95.4 µW at 5 MHz with 1.0 V and 0.8 V supplies, respectively, in the standard process 90 nm CMOS technology. The overall power comparison is shown in Fig. 2.17. The ADC power is reduced by 46.1 %, including the overhead of DPR, DFR and the PFTCG. Therefore, the proposed PFTCG enables robust, high-performance SoC designs for WBAN applications.

Fig. 2.17. ADC power comparison.

CHAPTER 3 Hysteresis-Delay-Cell-Based Digitally Controlled Oscillator

To serve power-critical or battery-less systems in WBAN applications, a low-power DCO is required in the always-on clock generator. However, most state-of-the-art DCO circuits for ADPLL [5-8], ADDLL [14] or ADMCG [15] designs do not consider low power and fine delay resolution together for low-frequency applications. General techniques have been proposed for low-frequency operation, using either frequency divider circuits or long delay lines in the DCO. In the frequency divider approach, however, the original delay resolution of the divided signal is degraded by the frequency divider. Although a fine delay resolution can be achieved by a long delay line in the DCO, the area and power dissipation also increase
