
CHAPTER 2 Phase-Frequency Tunable Clock Generator

2.6 Summary

An all-digital, cell-based clock generator is designed to tune the clock phase and frequency dynamically while the wireless communication system is in operation. The proposed all-digital PFTCG provides 8 selectable clock phases and lets the ADC sample at a lower frequency with a better sampling phase, which lowers power dissipation. The PFTCG also achieves a ±1072 ppm frequency tuning range centered at 5 MHz under any PVT variations, giving high robustness against SCO. Compared with the case of no sampling offset, the SNR loss is only 0.25 dB at PER = 1 %, as shown in Fig. 1.2 [4]. The measured power is 145.8 µW and 95.4 µW at 5 MHz with 1.0 V and 0.8 V supplies, respectively, in a standard 90 nm CMOS process. The overall power comparison is shown in Fig. 2.17. The ADC power is reduced by 46.1 %, including the overhead of the DPR, DFR and PFTCG. Therefore, the proposed PFTCG enables robust, high-performance SoC design for WBAN applications.

Fig. 2.17. ADC power comparison.

CHAPTER 3

Hysteresis-Delay-Cell-Based Digitally Controlled Oscillator

To serve power-critical or battery-less systems in WBAN applications, the always-on clock generator requires a low-power DCO. However, most state-of-the-art DCO circuits for ADPLL [5-8], ADDLL [14] or ADMCG [15] designs do not consider low power and fine delay resolution together for low-frequency applications. Two general techniques have been proposed for low-frequency operation: frequency divider circuits and long delay lines in the DCO. With the frequency divider approach, however, the original delay resolution of the divided signal is degraded by the divider. With a long delay line, fine delay resolution can be achieved, but the area and power dissipation increase because of the cascaded buffers in the delay line [3]. Power consumption and delay resolution are therefore always a trade-off in DCO design.

The DCO proposed in Chapter 2 accounts for 75 % of the power consumption of the all-digital PFTCG at 1.0 V. This DCO power is dominated by the cascaded buffers (BUF) [17] and the DCV [10], as shown in Fig. 2.6 and Fig. 2.7, respectively. Each BUF is composed of multiple inverters to achieve the 200 ns (5 MHz) delay. However, the long cascaded inverter chains waste considerable power in switching transistors to obtain the desired long delay, as shown in Fig. 3.1. The poor energy and area efficiency of cascaded inverter chains is the major drawback in DCO design for low-frequency applications.

State-of-the-art DCOs have been proposed in several architectures. As a low-power scheme, a 140 µW DCO has been proposed in [11]. When the DCO delay line selects a shorter delay path to provide a higher operating frequency, the remaining delay cells are not used, yet these unused cells still consume extra power [11]. To remove this power, the redundant delay cells are isolated from the delay loop of the operating DCO [11]. The DCO power then depends only on the characteristics of the working cells.

For further power reduction in the DCO, however, the design challenge is to decrease the power consumption of the cascaded standard cells. Table 3.1 shows the delay and power consumption of UMC 90 nm SPHVT standard cells. The cell delay is given by

$T_D = T_{PLH} + T_{PHL}$ (3-1)

where TPHL and TPLH are the high-to-low and low-to-high propagation delays of each cell, respectively. The simulation is under PVT conditions (TT, 1.0 V, 25 ℃). As the operating frequency becomes lower, the cascaded cells account for a larger share of the DCO power.

Fig. 3.1. Repeating switching through cascading inverter.

Table 3.1. Delay and power of standard cells in 90 nm technology.

Cell         BUFM2H   BUFM4H   BUFM8H   DEL1M1H   DEL1M4H   DEL2M1H
Delay (ns)   0.100    0.095    0.090    0.223     0.199     0.344
Power (µW)   57.01    111.44   225.79   40.23     85.65     30.59
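As a rough illustration of why cascading standard cells is costly at low frequency, the sketch below counts how many identical cells from Table 3.1 would be needed to accumulate a 200 ns (5 MHz) period. It assumes, for illustration only, that the period is simply the sum of the cell delays TD around the loop; the name `cells_for_period` is hypothetical and not part of the original design flow.

```python
import math

# Cell delay T_D = T_PLH + T_PHL in ns, copied from Table 3.1 (TT, 1.0 V, 25 C).
CELL_DELAY_NS = {
    "BUFM2H": 0.100,
    "BUFM4H": 0.095,
    "BUFM8H": 0.090,
    "DEL1M1H": 0.223,
    "DEL1M4H": 0.199,
    "DEL2M1H": 0.344,
}


def cells_for_period(period_ns, cell_delay_ns):
    """Smallest cell count whose summed delay reaches period_ns.

    Assumption (not from the source): period = (number of cells) x T_D.
    """
    # The small epsilon guards against floating-point noise in exact ratios.
    return math.ceil(period_ns / cell_delay_ns - 1e-9)


if __name__ == "__main__":
    target_ns = 200.0  # period of a 5 MHz clock
    for name, delay in CELL_DELAY_NS.items():
        print(f"{name}: {cells_for_period(target_ns, delay)} cells")
```

Under this simplified model even the slowest cell in the table needs several hundred stages, which is the overhead that a long-delay cell is meant to avoid.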

Furthermore, the techniques for improving the DCO resolution [5-11] [22] also affect the overall power consumption. For example, the driving capability modulation (DCM) technique changes the transistor driving capability on a fixed capacitive load by controlling the number of enabled tri-state buffers or tri-state inverters in a bank [6]. Nevertheless, DCM suffers from poor delay resolution, nonlinearity, large power dissipation and large area. The digitally controlled LC oscillator, which is built around a parasitic capacitance tank, provides a wide tuning range and good stability [22], but it requires dedicated circuit layout design and consumes large power and area. Additionally, the DCO with a current-starved delay element [9] can change the delay with different control currents and achieve high resolution, but the static current source consumes considerable static power.

In contrast with [9], the delay cell in [8] is constructed from transmission gates, using the equivalent channel resistance in the charge and discharge paths. It achieves high delay resolution, but the power dissipation is still unacceptable.

Another technique for improving delay resolution uses different input codes to control the charge path of an or-and-inverter (OAI) cell shunted with tri-state inverters [5]. However, this approach also has nonlinear delay steps. Other techniques [10] [20] use shunt capacitor circuits to fine-tune the capacitive load and improve delay resolution and linearity. Unfortunately, the DCV performs poorly in power consumption and area when it must maintain an acceptable operating range. The hysteresis delay cell (HDC) and DCV were proposed together in [7] [11], which was the first use of an HDC in a DCO design. The HDC can replace many DCV cells and reduce some power consumption, but its power characteristics are no better than those of an inverter.

Thus, a new HDC is proposed in the following sections to generate a wide delay range, equal to that of multiple inverters, with a simple circuit instead of cascading many buffers or inverters. The proposed HDC not only overcomes the design challenge of DCO power reduction with the least area, but also achieves high delay resolution, especially in sub-100 MHz DCO designs.

3.1 Hysteresis Delay Cell

HDCs [23-25], also known as Schmitt triggers, are widely used in digital and analog circuits for waveform shaping in noisy environments. As shown in Fig. 3.2, the switching point of a CMOS inverter is fixed at the average of the high-level and low-level voltages because the PMOS and NMOS are both in the saturation region. The output of an HDC, in contrast, is filtered by high-level and low-level threshold voltages, denoted as V+ and V-, respectively. Because of the hysteresis, there is an extra delay between the output of the inverter and that of the HDC.

Fig. 3.3 describes the transfer function of the HDC. The Boolean logic function of the HDC in Fig. 3.3 is the same as that of an inverter. In the forward switching path, the output voltage (VOUT) remains at the high level until the input voltage (VIN) increases to V+; the output then goes to the low level. Conversely, when VIN decreases to V-, VOUT switches to the high level. The hysteresis voltage width of the HDC is defined in equation (3-2).

$V_{hw} = V^{+} - V^{-}$ (3-2)

The hysteresis width protects the output from cross-talk noise and supply noise on the clock and power lines, but it also increases the response time of the HDC. However, this hysteresis, or insensitivity to the input, can provide a long delay in place of many cascaded inverters.
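The hysteresis behavior described above can be sketched as a small behavioral model. This is an idealized, inverting HDC with instantaneous switching, not a circuit simulation; the threshold values used in the demo (0.7 V and 0.3 V) and the function names are hypothetical.

```python
def hdc_output(vin_samples, v_plus, v_minus, vdd=1.0, initial_high=True):
    """Ideal inverting hysteresis cell.

    The output stays high until VIN rises to V+, then stays low
    until VIN falls to V-. Thresholds are assumptions for illustration.
    """
    out_high = initial_high
    outputs = []
    for vin in vin_samples:
        if out_high and vin >= v_plus:
            out_high = False  # forward switching at V+
        elif not out_high and vin <= v_minus:
            out_high = True   # reverse switching at V-
        outputs.append(vdd if out_high else 0.0)
    return outputs


def hysteresis_width(v_plus, v_minus):
    """Hysteresis voltage width, V+ minus V-, as in (3-2)."""
    return v_plus - v_minus


if __name__ == "__main__":
    ramp = [0.0, 0.5, 0.7, 0.5, 0.3]  # VIN rises, then falls
    print(hdc_output(ramp, v_plus=0.7, v_minus=0.3))
    print(hysteresis_width(0.7, 0.3))
```

At VIN = 0.5 V the output depends on the direction of travel: it is still high on the way up and still low on the way down, which is the source of the extra delay.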

Three common HDC architectures are reviewed in the following sections: Rabaey [23], Dokic [24] and Sarawi [25]. We analyze their power consumption and compare them with the standard cells in UMC 90 nm CMOS technology.

Fig. 3.2. Output signals through inverter and HDC.

Fig. 3.3. Transfer function of HDC.

3.1.1 Rabaey Architecture

The HDC with the Rabaey architecture is shown in Fig. 3.4 [23]. There are three inverters in this architecture. The transfer function of Rabaey's HDC differs from that in Fig. 3.3: the Boolean logic of the Rabaey architecture is the same as a buffer cell.

The static behavior of the Rabaey architecture is as follows. Initially, assume the input voltage VIN is at the high level VDD and the output voltage is tied low. When VIN decreases to a certain voltage V-, mp3 and mn4 invert the output voltage to VDD. The output is then fed back to mp3 and mn4 to speed up the transition and produce a clean output signal [23]. The low-level switching point V- is determined by transistors mp1, mn1 and mn2. The analysis of forward switching is similar. However, the Rabaey HDC dissipates large power because of its short-circuit current path.


Fig. 3.4. Rabaey HDC (a) Circuits (b) Schematic.

3.1.2 Dokic Architecture

The Dokic HDC architecture is shown in Fig. 3.5 [24]. Its transfer function is also the same as in Fig. 3.3. It can be extended to NOR-type and NAND-type HDCs. When the input voltage VIN is equal to VDD, mp1 and mp2 are in the cut-off region, and mn1 and mn2 are turned on. The output voltage VOUT is therefore at ground, which puts mn3 in the cut-off region and mp3 in the saturation region. When VIN decreases to V-, mp1 and mp3 act as a saturated enhancement-mode inverter. Transistor mp2 turns on as well, providing a charging path from VDD to the output. Conversely, when VIN increases to V+, mn1, mn2 and mn3 are on, creating a discharging path from the output to ground. These evident short-circuit current paths account for most of the power consumption in the Dokic HDC.

Fig. 3.5. Dokic HDC.

3.1.3 Sarawi Architecture

Fig. 3.6 illustrates the Sarawi HDC [25], which is built from an inverter chain internally cascaded with a footer and a header. Fig. 3.3 depicts its transfer function.

The operation of this HDC is as follows. First, suppose the initial input voltage VIN is VDD, so that mn2 is on and mp2 is in the cut-off region, which implies that mn3 is off, mp3 is on, mn1 is on and mp1 is off. Transistor mn2 remains on and mp2 remains off until VIN decreases to a certain voltage V-, at which point VOUT switches from low to high. Forward switching, which involves mp2, mn2 and mn1, behaves similarly. When a low-level voltage is applied to VIN, VOUT goes to VDD. VOUT does not switch from VDD to ground until VIN rises to the high-level threshold voltage V+ and triggers the pull-down network. Because there is no direct short-circuit current path, a longer delay and lower power consumption can be expected from the Sarawi HDC.

Fig. 3.6. Sarawi HDC.

3.1.4 Comparison

Table 3.2 compares the above HDCs with the standard cells in UMC SPHVT 90 nm CMOS technology, including BUFM2H, BUFM4H, BUFM8H, DEL1M1H, DEL1M4H and DEL2M1H. The simulation is at the typical corner with a 1.0 V supply voltage.

Table 3.2. Performance comparison with standard cells and HDCs.


The cell delay is the sum of the high-to-low and low-to-high propagation delays, as defined in equation (3-1). The area efficiency in (3-3) is a cost index that compares the delay obtained within the same area. The energy efficiency in (3-4) is the inverse of the transition power. These two parameters serve as figures of merit for evaluating delay cells.

$\text{Area Efficiency} = \dfrac{\text{Delay}}{\text{Area}}$ (3-3)

$\text{Energy Efficiency} = \dfrac{\text{Delay}}{\text{Energy}}$ (3-4)

The normalization of area and energy efficiency is shown in Fig. 3.7 and Fig. 3.8, respectively.
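A minimal sketch of the energy-efficiency metric and its normalization is given below, using only the standard-cell delay and power values from Table 3.1. It assumes that the transition energy is the product of the listed power and delay, which makes the metric equal to the inverse of power; the HDC values are not reproduced here because Table 3.2 is not available in this text. Area efficiency would follow the same pattern with area in place of energy.

```python
# (delay in ns, power in uW) per cell, copied from Table 3.1.
CELLS = {
    "BUFM2H": (0.100, 57.01),
    "BUFM4H": (0.095, 111.44),
    "BUFM8H": (0.090, 225.79),
    "DEL1M1H": (0.223, 40.23),
    "DEL1M4H": (0.199, 85.65),
    "DEL2M1H": (0.344, 30.59),
}


def energy_efficiency(delay_ns, power_uw):
    """Delay per unit energy, as in (3-4).

    Assumption: energy = power x delay (uW x ns = fJ).
    """
    energy_fj = power_uw * delay_ns
    return delay_ns / energy_fj


def normalize_percent(values):
    """Scale every value so that the best one becomes 100 %."""
    best = max(values.values())
    return {name: 100.0 * (v / best) for name, v in values.items()}


if __name__ == "__main__":
    raw = {name: energy_efficiency(d, p) for name, (d, p) in CELLS.items()}
    for name, pct in sorted(normalize_percent(raw).items(),
                            key=lambda item: item[1], reverse=True):
        print(f"{name}: {pct:.1f} %")
```

Under this assumption the standard cells rank DEL2M1H, DEL1M1H, BUFM2H, DEL1M4H, BUFM4H, BUFM8H, from most to least energy efficient.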

The simulation results show that the Rabaey and Dokic HDCs have area and energy efficiency similar to the standard cells, whereas the Sarawi architecture shows the best area and energy efficiency. This implies that the Sarawi HDC can achieve the same delay with the least area and energy among the delay cells compared. We therefore re-analyze the Sarawi HDC in the following section and propose a new delay-tunable, low-power HDC for DCO resolution improvement.

[Bar chart: Area Efficiency Normalization (%), scale 0 to 100. Cells in plotted order: Sarawi, DEL2M1H, Dokic, DEL1M1H, Rabaey, BUFM2H, DEL1M4H, BUFM4H, BUFM8H.]

Fig. 3.7. Normalization of area efficiency with standard cells and HDCs.

[Bar chart: Energy Efficiency Normalization (%), scale 0 to 100. Cells in plotted order: Sarawi, DEL2M1H, Dokic, DEL1M1H, BUFM2H, Rabaey, DEL1M4H, BUFM4H, BUFM8H.]

Fig. 3.8. Normalization of energy efficiency with standard cells and HDCs.

3.2 Proposed Hysteresis Delay Cell

3.2.1 Formulation

The Sarawi HDC has the longest delay and the best area and energy efficiency because of its wide hysteresis voltage width and the slow rise/fall time of its output.

As shown in Fig. 3.6, when the input voltage of the HDC decreases from VDD to V- in the reverse switching path, the currents of transistors mp2, mn2 and mp1 are equal [25]:

$I_{mp2} = I_{mn2} = I_{mp1}$ (3-5)

Thus, we may rewrite (3-5) as threshold voltage of NMOS and PMOS, respectively. Based on the left hand side of (3-6), we have

where VS1 is the voltage in node S1 [25]. According to the right hand side in (3-6), the VS1 is expressed as

Substituting this result in (3-8) into (3-7), we summarize the expression as

R

The same analysis applies to forward switching with mp2, mn2 and mn1 [25]. When a low-level signal is applied to VIN, VOUT goes high. VOUT does not switch from high to low until VIN rises to V+. In the forward switching path, the relationship between the currents of transistors mp2, mn2 and mn1 can be written as

1

From (3-12), we obtain the forward switching point in a form similar to (3-9).

1 equations. The switching points V- and V+ can be rewritten as

1

Fig. 3.9. Transition response of Sarawi HDC.

The rise time TRISE and fall time TFALL of the HDC contribute most of the delay caused by hysteresis. As shown in Fig. 3.9, the transition time of the HDC dominates the overall propagation delay.

Assume the output capacitance COUT is voltage independent. The fall time TFALL

consists of three intervals. The first part tf1 is the time interval of VOUT from ) increasing voltage of node S2. The model is expressed as

2 2

Taking the integration, we obtain

Therefore, (3-19) is summarized as

2

The second part tf2 is the time interval when mn2 is in the linear region. In this interval, VOUT drops from (VDD − Vtn) to (0.5*VDD) and then turns on mn1, which is

The third part tf3 is the time interval when mn1 and mn2 are both in the linear region. In this discharging interval, VOUT drops from (0.5*VDD) to (0.1*VDD) through mn1

where βN is the equivalent transconductance of the combination of mn1 and mn2.

2 (3-22) and (3-24), we may we rewrite the expression as

4 )

By the similar analysis, the rise time TRISE can be obtained in (3-28).

4 )

where βP is the equivalent transconductance of mp1 and mp2.

Based on (3-27) and (3-28), the rise time TRISE and fall time TFALL are inversely proportional to the transconductances βn2, βN, βp2 and βP. Thus, we can control the propagation delay of the HDC by choosing different βn2, βN, βp2 and βP.

3.2.2 Delay Tunable Hysteresis Delay Cell

According to the previous analysis, we propose a seven-stage delay-tunable HDC based on the original low-power HDC architecture, as shown in Fig. 3.10. The transistor sizing in Fig. 3.10 is listed in Table 3.3. The delay stages control the fall time TFALL through the discharge transconductance of the proposed HDC. With different codewords, the proposed circuit produces different propagation delays.

The simulation results for delay and power consumption are shown in Fig. 3.11 and Fig. 3.12, respectively. The proposed HDC achieves 0.78 ps delay resolution, and the delay ranges from 1.643 ns to 1.742 ns with good linearity, which guarantees monotonic delay behavior as the control word increases. The delay is several hundred times the cell delay of a minimum-size inverter, and the power consumption is below 2.2 µW for every codeword.
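The reported range and resolution can be cross-checked with a simple linear model of delay versus codeword. The sketch assumes a 7-bit control word (128 codewords, suggested by the seven stages) and a delay that increases with the codeword; both are assumptions for illustration, as are the names used.

```python
T_MIN_NS = 1.643   # minimum delay reported for the proposed HDC
T_MAX_NS = 1.742   # maximum delay reported for the proposed HDC
STEP_PS = 0.78     # reported delay resolution
N_BITS = 7         # assumption: one control bit per delay stage


def hdc_delay_ns(codeword):
    """Ideal linear delay model: T_MIN plus codeword times one step."""
    if not 0 <= codeword < 2 ** N_BITS:
        raise ValueError("codeword out of range")
    return T_MIN_NS + codeword * STEP_PS * 1e-3


if __name__ == "__main__":
    last = 2 ** N_BITS - 1
    print(f"code 0   -> {hdc_delay_ns(0):.3f} ns")
    print(f"code {last} -> {hdc_delay_ns(last):.3f} ns")
```

With 127 steps of 0.78 ps the model spans about 99 ps, which is consistent with the 1.643 ns to 1.742 ns range quoted above.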

Fig. 3.10. Proposed delay tunable HDC.

Table 3.3. Transistor size of proposed delay tunable HDC.

Transistor mp1 mp2 mp3 mn1 mn2 mn3 mr00 mr01 mr02 mr03 W/L

Transistor mr04 mr05 mr06 mr07 mr08 mr09 mr10 mr11 mr12 mr13 W/L

Fig. 3.11. Delay of the proposed delay tunable HDC.

[Plot: power (µW) versus codeword; codeword axis 0 to 120, power axis 1.9 to 2.4 µW.]

Fig. 3.12. Power of the proposed delay tunable HDC.

Fig. 3.13. Cascading BUF and DCV.

Table 3.4 compares the proposed delay-tunable HDC with the commonly used cascaded BUF and DCV approach [10], which is depicted in Fig. 3.13. With similar propagation delay and controllable range, the proposed delay-tunable HDC performs better in resolution, power and area. The finer delay resolution improves the DCO frequency tuning step and covers every desired delay value in the DCO. The 98.4 % power reduction and 92.8 % area reduction imply both dynamic and static power savings, resulting in better area and energy efficiency for the clock generator.

Table 3.4. Comparison of cascading BUF and DCV to delay tunable HDC.

Approach                     Delay (ns)   Controllable Range (ps)   Resolution (ps)   Power (µW)   Area (µm²)
Cascading BUF & DCV          1.86         67.7                      2.26              133          6.048
Proposed Delay Tunable HDC   1.64         99.4                      0.78              2.2          0.437
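The reduction figures can be recomputed from the rounded entries of Table 3.4 with a one-line formula. The helper name `reduction_percent` is hypothetical; the inputs are the table values only.

```python
def reduction_percent(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - proposed / baseline)


if __name__ == "__main__":
    # Power (uW) and area values from Table 3.4.
    power_saving = reduction_percent(133.0, 2.2)
    area_saving = reduction_percent(6.048, 0.437)
    print(f"power reduction: {power_saving:.2f} %")
    print(f"area reduction:  {area_saving:.2f} %")
```

From the rounded table entries this gives roughly 98.3 % for power and 92.8 % for area. The small gap to the quoted 98.4 % power figure presumably comes from rounding of the tabulated power values.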

3.3 Proposed HDC-Based Digitally Controlled Oscillator

Using the proposed delay-tunable HDC, we design a 5 MHz low-power all-HDC-based DCO, as shown in Fig. 3.14. The proposed DCO is partitioned into two tuning stages. The 1st tuning stage, composed of HDC1 cells, extends the controllable range of the DCO. The 2nd tuning stage, which cascades HDC2 cells, improves the delay resolution. Because the target frequency is 5 MHz, the total delay of the HDCs in the 2nd stage must be less than 200 ns under any PVT conditions. Furthermore, the controllable delay range of the 2nd tuning stage must cover the delay resolution of the 1st tuning stage to avoid false lock in PFTCG, ADPLL [5-8] or ADDLL [14] applications.

The delay resolution of the 1st tuning stage is the sum of the low-to-high propagation delay (TPLH) and the high-to-low propagation delay (TPHL) of HDC1. The architecture of HDC1 is illustrated in Fig. 3.15. Based on the Sarawi HDC, we add an extra transistor mp4 as the header. The ENABLE signal isolates the redundant delay elements from the closed loop to save power. In general, the dynamic power Pdym in the 1st tuning stage is expressed as

Fig. 3.14. Architecture of the proposed HDC-based DCO.

Fig. 3.15. Delay element of the 1st tuning stage.

$P_{dym} = C_L V_{DD}^{2} f$ (3-29)

where CL is the overall loading capacitance and f is the circuit operating frequency.

If the redundant delay elements outside the closed loop in the 1st tuning stage of the DCO are not disabled, the power becomes

$P_{dym} = M C_{cell} V_{DD}^{2} \cdot \dfrac{1}{N T_D}$ (3-30)

where M is the total number of HDC1 cells, N is the number of HDC1 cells in the closed loop, and Ccell and TD are the capacitance and delay of one HDC1, respectively. When the ENABLE signal turns off the redundant delay elements, the dynamic power is written as

$P_{dym} = N C_{cell} V_{DD}^{2} \cdot \dfrac{1}{N T_D} = \dfrac{C_{cell} V_{DD}^{2}}{T_D}$ (3-31)

The dynamic power with the redundant elements disabled is N/M times the power with the elements left enabled. In other words, the power consumption with disabled redundant elements is independent of N. This also means that the power consumption of the 1st tuning stage is fixed, as shown in (3-31), whatever the DCO operating frequency is. Consequently, the power and delay characteristics of HDC1 determine the overall power performance of the 1st tuning stage.
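The effect of disabling the redundant elements can be illustrated numerically. The sketch below applies the dynamic power expression of (3-29) under two assumptions made here for illustration: the loop frequency is 1/(N·TD), ignoring the 2nd tuning stage, and the switched capacitance is Ccell per active HDC1. The numerical values of Ccell, TD and M are hypothetical.

```python
import math


def dynamic_power(c_load, vdd, freq):
    """Dynamic power C_L * VDD^2 * f, as in (3-29)."""
    return c_load * vdd ** 2 * freq


def stage1_power(m_total, n_loop, c_cell, t_d, vdd, gated):
    """1st-stage dynamic power with or without ENABLE gating.

    Assumptions: f = 1 / (N * T_D); all M cells switch when not gated,
    only the N cells in the loop switch when gated.
    """
    if not 1 <= n_loop <= m_total:
        raise ValueError("n_loop must be between 1 and m_total")
    freq = 1.0 / (n_loop * t_d)
    switching_cells = n_loop if gated else m_total
    return dynamic_power(switching_cells * c_cell, vdd, freq)


if __name__ == "__main__":
    M, C_CELL, T_D, VDD = 32, 2e-15, 3e-9, 1.0  # hypothetical values
    for n in (4, 16, 32):
        on = stage1_power(M, n, C_CELL, T_D, VDD, gated=True)
        off = stage1_power(M, n, C_CELL, T_D, VDD, gated=False)
        print(f"N={n:2d}: gated {on * 1e6:.3f} uW, ungated {off * 1e6:.3f} uW, "
              f"ratio {on / off:.3f}")
```

In this model the gated power is the same for every N, while the ungated power grows as the loop gets shorter, so the ratio between the two is N/M.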

The 2nd-stage delay element HDC2 is the same as the delay-tunable HDC proposed in Section 3.2, which provides both delay resolution and delay offset. To cover the delay resolution of the 1st tuning stage, the 2nd tuning stage must have a sufficient controllable range. Thus, the number of 2nd tuning stage elements is increased to 64. Table 3.5 summarizes the control code length, controllable range and delay resolution of the 5 MHz all-HDC-based DCO.

For hundred-MHz DCO application, we can apply the same DCO architecture as

