Simulation Results - A Thermally Robust Buffered Clock Tree Using Logical Effort

Chapter 4 A Thermally Robust Buffered Clock Tree Using Logical Effort

4.4 Simulation Results

HSPICE simulations with extracted layout parameters were performed to evaluate the proposed design. The simulations use UMC 65-nm technology and include the layout of the tunable-width buffers and the 7th-layer metal H-tree interconnect. Figure 4.11 shows the tunable-width inverter and the 7th-layer metal with 1-um width. The clock skew is measured between points A and B in Figure 4.1 under various thermal conditions. Table 4.1 lists the clock skew improvements obtained with logical effort compensation at 0.3V (sub-threshold region) and 0.5V (near-threshold region). In Table 4.1, W₁ is set to 128X for 0.5V and 64X for 0.3V, considering the logical effort tuning range. Before compensation, the buffer width stays equal to W₁ under all thermal conditions. With logical effort compensation, the clock buffers produce a constant delay, mitigating temperature-induced clock skew. The clock skew is reduced by up to 97.8%, and by 71.19% on average.

Figure 4.11 Layout of a tunable-width inverter (labels: segments 128X, 64X, 32X, 16X, 8X, 4X, 2X, 1X; Metal 7 interconnect with 1-um width)

Table 4.1 Compensation improvement of clock skew in sub/near-threshold region

Chapter 5

A Programmable Clock Generator for Sub- and Near-Threshold DVFS System

In this chapter, a sub/near-threshold programmable clock generator is presented. It can generate an output clock whose frequency is 1/8 to 4 times that of the reference clock. Variation-aware logic design is applied to the clock generator, which improves its reliability under process variation. The adoption of a pulse-circulating scheme reduces process-induced output clock jitter. In addition, we realize a PVT compensation unit for adjusting the locking range of the clock generator. The clock generator has been designed in UMC 65-nm CMOS technology. The reference clock frequencies are 625 kHz at 0.2V and 5 MHz at 0.5V.

Section 5.1 gives the introduction. Section 5.2 shows the system architecture of the proposed programmable clock generator for sub/near-threshold DVFS systems. Section 5.3 introduces the variation-aware logic design for sub-threshold operation. Section 5.4 demonstrates the proposed PVT compensation technique, which mainly adjusts the locking range of the clock generator. Section 5.5 gives the circuit description of the clock generator. In Section 5.6, the clock tree proposed in Chapter 4 and the programmable clock generator are combined. Section 5.7 shows the design implementation in UMC 65-nm CMOS technology. Finally, the post-layout design is simulated and the results are presented in Section 5.8.

5.1 Introduction

The dynamic-voltage-and-frequency-scaling (DVFS) technique has been adopted in many low-power devices, such as wireless body area network (WBAN) communication systems. A WBAN system collects body signals and provides reliable physical monitoring through many wireless sensor nodes (WSNs) attached to or implanted inside the human body [5.1][5.2]. To meet the low-power requirement, near/sub-threshold operation has been introduced to WBAN systems.

Many clock multiplication schemes have been proposed for DVFS systems in the super-threshold region. Phase-locked loops (PLLs) are usually used as clock generators, but their locking period takes hundreds of reference clock cycles. To enhance the flexibility of the clock generator for DVFS systems, an all-digital clock generator was presented in [5.3], which generates the output clock by dynamically delaying the reference clock according to the frequency control code. However, its output frequency can only be a fraction of the reference clock frequency. A delay-locked loop (DLL) [5.4] was presented for DVFS systems, but it cannot generate a fractional clock. A cyclic clock multiplier (CCM) has been presented for DVFS applications [5.5], and it has the advantage of creating either a fractional or a multiplied clock. However, the cyclic clock multiplier uses a TDC for phase error detection, which consumes much area and power.

In this chapter, a programmable clock generator aimed at the sub- and near-threshold regions is proposed. It adopts the pulse-circulating scheme in [5.5] and offers several advantages. First, the pulse always circulates through the same delay line; thus, compared to DLL-based clock multipliers [5.6][5.7], the process-induced phase error is reduced. Second, the proposed clock generator can compensate its locking range for PVT variations, and this compensation takes only one reference clock cycle. Finally, variation-aware logic design is applied for sub- and near-threshold operation.

5.2 System Architecture

The architecture of the proposed clock generator is shown in Figure 5.1. The clock generator consists of the following main blocks: pulse generators (PG), a phase detector, a counter, a lock-in delay line, a PVT-Comp. (PVT compensation) delay line, a PVT-Comp. block, a control block and a frequency divider.

Figure 5.1 Proposed clock generator for sub- and near-threshold DVFS system

In the proposed clock generator, the CLK_REF signal enters a PG, which produces pulses (P_REF) with a frequency equal to that of CLK_REF. The pulse multiplier generates pulses (P_OUT) at 8 times the frequency of the reference pulses (P_REF). In addition, the divider can divide the input frequency by 2, 4, 6 or 8. Therefore, the proposed clock generator is able to output a clock with a frequency M/N times that of the reference clock, where M ∈ {1, 8} and N ∈ {2, 4, 6, 8} are controlled by the input frequency selection signal FS[2:0]. Table 5.1 shows the frequency selection range.

FS[2:0]   M   N   f_out / f_ref
000       1   8   0.125
001       1   6   0.167
010       1   4   0.250
011       1   2   0.5
100       8   8   1
101       8   6   1.333
110       8   4   2
111       8   2   4

Table 5.1 Frequency selection range, f_out and f_ref are the frequencies of output and reference clocks
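The mapping in Table 5.1 can be expressed compactly in Python. The sketch below is only an illustration of the table, under the assumption (consistent with the table rows) that FS[2] selects M and FS[1:0] selects N; the function names decode_fs and output_ratio are hypothetical and do not come from the design.

```python
from fractions import Fraction


def decode_fs(fs):
    """Return (M, N) for a 3-bit frequency selection code FS[2:0].

    Assumption read off Table 5.1: FS[2] picks M (1 or 8) and
    FS[1:0] picks N (8, 6, 4 or 2).
    """
    if not 0 <= fs <= 0b111:
        raise ValueError("FS[2:0] must be a 3-bit value")
    m = 8 if fs & 0b100 else 1
    n = 8 - 2 * (fs & 0b011)
    return m, n


def output_ratio(fs):
    """Return f_out / f_ref as an exact fraction M/N."""
    m, n = decode_fs(fs)
    return Fraction(m, n)


# Print every row of the selection table.
for fs in range(8):
    m, n = decode_fs(fs)
    print(f"{fs:03b}  M={m}  N={n}  f_out/f_ref={float(output_ratio(fs)):.3f}")
```

Using exact fractions avoids rounding in entries such as 1/6 and 8/6, which the table lists as 0.167 and 1.333.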

To produce P_OUT at 8 times the frequency of P_REF, we adopt a circulating scheme. Each pulse of P_REF enters the circulating path and circulates 8 times. The path is determined by the path selection signal SEL: when SEL = 1, the pulse from P_REF can enter the delay line; otherwise, the circulating path is formed. The counter counts the number of times the pulse has travelled around the circulating path, and it informs the phase detector and the control block through the signal CountE8 when the count equals 8. The phase detector compares the phases of P_OUT and P_REF only when the count equals 8. The control block changes the value of C[5:0] according to the comparison results, LEAD and LAG. Figure 5.2 illustrates the system operation. After the system is reset, the state machine passes through three steps: PVT compensation, SAR (successive approximation register) control and lock.

Figure 5.2 Finite state machine (states: Reset, PVT Comp., SAR, Lock; transition labels: Reset, finishSAR, Out of locking range)

In the first step, the system performs PVT compensation. In the sub- and near-threshold regions, device behavior is affected more seriously by PVT variations than in the super-threshold region. PVT variations cause the delay of the lock-in delay line to vary widely. To compensate for these delay variations, the clock generator uses the PVT-Comp. technique to add adequate delay to that of the lock-in delay line. As a result, the reference clock period falls within the locking range of the lock-in delay line.

After PVT compensation, the system enters the second step, SAR control, which uses a binary search algorithm. In this step, the control block changes the control code C[5:0] according to the comparison results of the phase detector, LEAD and LAG. With SAR control, the lock-in delay line is tuned so that the clock generator locks to the reference clock.
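The binary search of the SAR step can be sketched in Python as follows. This is a behavioural illustration only, not the thesis circuit. The names lidl_delay and sar_search are hypothetical, the linear delay model (4 + 2×C)×D_NAND,FO2 is an assumption chosen to match the 4× to 130× locking range quoted in Section 5.4.2, the PVT-comp. delay line is ignored for brevity, and the comparison stands in for the LEAD/LAG outputs without claiming their polarity.

```python
def lidl_delay(code, d_nand=1.0):
    """Hypothetical lock-in delay model: 4 to 130 unit delays for code 0 to 63."""
    return (4 + 2 * code) * d_nand


def sar_search(t_ref, d_nand=1.0, bits=6, laps=8):
    """Binary-search the control code, deciding one bit per comparison.

    Starting from the MSB, each bit is set tentatively and kept only if
    `laps` passes through the delay line still fit in one reference period.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if laps * lidl_delay(trial, d_nand) <= t_ref:
            code = trial  # total delay not too long: keep this bit
    return code


# Reference period equal to 8 passes through a 78-unit delay line.
locked_code = sar_search(t_ref=8 * 78.0)
print(locked_code)
```

With a 6-bit code the search needs exactly six comparisons, which is why the SAR strategy locks quickly.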

Finally, the clock generator enters the lock state and can output a clock with multiplied or divided frequency. In this step the feedback loop (from the output clock to the phase detector, the control block, and then the control code C[5:0]) is still kept. The control block continues tracking by means of counter control, which increments or decrements C[5:0] by 1 at a time. Keeping the loop closed guarantees that the clock generator stays locked to the reference clock even if the voltage and temperature vary at run time. Figure 5.3 and Figure 5.4 show schematic waveforms. The state and the control code C[5:0] change every two clock cycles.

Figure 5.3 The schematic diagram of waveform from state Reset to SAR control (signals shown: CLK_REF, P_REF, P_OUT, SEL, RST, LEAD, LAG, C5 to C0, STATE)

Figure 5.4 The schematic diagram of waveform from state SAR control to Lock (signals shown: CLK_REF, P_REF, P_OUT, SEL, STATE)

5.3 Variation-Aware Logic Design

5.3.1 Sub-Threshold Logic Design Challenge

Voltage scaling is an effective approach for improving the power efficiency. The power consumed by a digital circuit can be expressed as

$$P = f \, C \, V_{DD}^{2} \tag{5.1}$$

where P is the power consumption, C is the total charged capacitance, V_DD is the supply voltage and f is the operating frequency. Since the power is a quadratic function of V_DD, voltage scaling lowers power consumption efficiently. However, when the supply voltage is scaled down to the sub-threshold region, two critical factors affect functionality [5.8]. First, the ratio of on to off current (I_ON / I_OFF) in logic gates decreases. Second, random dopant fluctuation is a dominant source of local variation in sub-V_t operation [5.9]. These two factors result not only in reduced output swings in CMOS logic gates but also in skewed voltage transfer curves (VTC). Figure 5.5 shows the VTC of an inverter operated at 300 mV [5.8] with global as well as local variations. In some cases, the logic levels of CMOS gates are severely degraded.
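As a rough numerical illustration of the quadratic dependence on V_DD, the sketch below evaluates the dynamic power expression at the two operating points quoted in the chapter introduction (625 kHz at 0.2V and 5 MHz at 0.5V). The capacitance value is a hypothetical placeholder that cancels in the ratio, leakage power is ignored, and the function name dynamic_power is an assumption made for this example.

```python
def dynamic_power(freq_hz, cap_f, vdd_v):
    """Dynamic power P = f * C * V_DD^2 of a digital circuit."""
    return freq_hz * cap_f * vdd_v ** 2


C_SW = 1e-12  # hypothetical switched capacitance in farads; cancels in the ratio

p_near = dynamic_power(5e6, C_SW, 0.5)    # near-threshold operating point
p_sub = dynamic_power(625e3, C_SW, 0.2)   # sub-threshold operating point

# Voltage contributes (0.2 / 0.5)^2 = 0.16 and frequency contributes 1/8.
ratio = p_sub / p_near
print(f"P(0.2V) / P(0.5V) = {ratio:.3f}")
```

Under these assumptions the sub-threshold operating point consumes about 2% of the dynamic power of the near-threshold one.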

Figure 5.5 Effects of variations and reduced I_ON / I_OFF on sub-V_t inverter voltage transfer curve [5.8]

5.3.2 Mitigating Variation by Upsizing Transistors

Upsizing transistors is one technique for mitigating local variation. The research in [5.10] shows that the standard deviation of V_t varies inversely with the square root of the channel area. However, larger transistor lengths and widths increase the total capacitance and therefore the power consumption. The back-to-back configuration proposed in [5.8] is used to deal with this trade-off. In this configuration, NAND and NOR gates are selected to check the worst case. Because a NAND gate has more leakage when its output is logic 1 and a NOR gate has more leakage when its output is logic 0, the worst-case skewed VTC is obtained. To mitigate V_t variation at 0.2V, 3-input gates require a much larger area increase than 2-input gates. In addition, three stacked transistors conduct less current, which lowers the operating speed. In the proposed clock generator, we therefore use only 2-input logic gates for sub-threshold operation.

Figure 5.6 Back-to-back configuration (labels: initial condition V = 0; initial condition V = 0.2; Leakage; Active)

In Figure 5.6, the initial output voltages of the NAND and NOR gates are set to 0V and 0.2V, respectively. If a function error occurs, the output voltages change.

The trend of the failure rate is analyzed in [5.8], which shows that the failure rate decreases exponentially as either V_DD or the device width is increased. In UMC 65-nm CMOS technology at 0.2V, a 10,000-run Monte Carlo simulation shows 51 failures with minimum width and length, and no failures with 125% of the minimum width and 123% of the minimum length. Device-aware sizing is therefore needed in sub-threshold design to avoid function errors.
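The effect of this sizing on V_t variation can be estimated from the inverse-square-root-of-area relation cited from [5.10] in Section 5.3.2. The Python sketch below is only a back-of-the-envelope illustration: the function name sigma_vt_scale is hypothetical, and the estimate covers the local V_t spread alone, so it does not by itself predict the simulated failure counts.

```python
import math


def sigma_vt_scale(width_scale, length_scale):
    """Relative sigma(V_t) after upsizing, assuming sigma ~ 1 / sqrt(W * L)."""
    return 1.0 / math.sqrt(width_scale * length_scale)


# Sizing reported for the 0.2V Monte Carlo run: 125% width, 123% length.
scale = sigma_vt_scale(1.25, 1.23)

# Failure rate observed with minimum-size devices: 51 failures in 10,000 runs.
failure_rate_min = 51 / 10000

print(f"sigma(V_t) scales to {scale:.3f} of its minimum-size value")
print(f"minimum-size failure rate: {failure_rate_min:.2%}")
```

The channel area grows by about 54%, so under this relation the V_t standard deviation drops by roughly 19%.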

5.4 PVT Compensation for Locking Range of Delay Line

In this section, we describe the proposed PVT compensation technique for the locking range of the delay line. As shown above, device behavior is much more sensitive to PVT variations in the sub-threshold region than in the super-threshold region. In the clock generator, these variations shift the delay range of the lock-in delay line, so the clock generator may fail to lock to the reference clock. The PVT compensation technique is proposed to solve this problem.

Figure 5.7 shows the concept of PVT compensation. Under typical conditions, the reference clock period is within the locking range of the lock-in delay line. Under PVT variations, the locking range shifts, so the clock generator cannot lock to the reference clock. With PVT compensation, the locking range can be adjusted.

Figure 5.7 Schematic diagram of PVT compensation (labels: CLK_REF period; locking range of the lock-in delay line; PVT variations; with compensation)

5.4.1 Delay Ratio of FO1-INV to FO2-NAND

In this subsection we examine the delay ratio between an inverter with fanout 1 (FO1-INV) and a NAND gate with fanout 2 (FO2-NAND); this characteristic is used in the PVT compensation for the locking range of the delay line. The FO1-INV is considered because it is the cell of the PVT sensing circuit in the PVT-comp. block. The FO2-NAND is considered because its delay is the unit delay step that can be tuned in the lock-in delay line, whose topology is shown in Figure 5.8.

Figure 5.8 Topology of delay line (lattice delay line [5.13]) used in the proposed clock generator (labels: VIN, VOUT, VDD, select signals S0/S0B to S14/S14B; D: dummy)

Figure 5.9 shows two 11-stage ring oscillators composed of FO1-INV and FO2-NAND cells, respectively. The β ratio of the FO1-INV is 1, that is, the NMOS and PMOS sizes are the same. Figure 5.10 and Figure 5.11 show Monte Carlo simulation results of the oscillators at 0.2V (sub-threshold region) and 0.5V (near-threshold region). The simulations also include temperature effects. At supply voltages of both 0.2V and 0.5V, the delay ratio of FO2-NAND to FO1-INV is 2. This ratio is unchanged under various PVT conditions. In the next subsection, this property is used for PVT compensation of the locking range of the delay line.

Figure 5.9 Ring oscillator using (a) FO1-INV cell, (b) FO2-NAND cell (D: dummy)

Figure 5.10 Periods of ring oscillators (composed of FO1-INV and FO2-NAND) at 0.2V

Figure 5.11 Periods of ring oscillators (composed of FO1-INV and FO2-NAND) at 0.5V

5.4.2 Procedure of PVT Compensation for Locking Range of Delay Line

The PVT-comp. delay line is controlled by D[5:0]. In the PVT compensation state, the PVT-comp. block first senses the environmental conditions, which are recorded as a counted number, count. Then count is decoded into the control code D[5:0] so that the PVT-comp. delay line provides adequate delay.

Figure 5.12 PVT-comp. block and PVT-comp. delay line (the PVT-Comp. block drives the PVT-Comp. delay line through D[5:0])

The PVT-comp. block is shown in Figure 5.13. It consists of a PVT sensing circuit, a counter and a decoder. The PVT sensing circuit uses a ring oscillator that can be switched on or off. When the clock generator is in the PVT-comp. state, the switch signal is turned on for one reference clock cycle, and the counter records the number of oscillation cycles.

Figure 5.13 PVT-comp. block (PVT sensing circuit enabled by Switch for one reference clock cycle = T_D; Counter output count[7:0]; Decoder output D[5:0])

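The ring oscillator is composed of 62 FO1-INV stages and one NAND stage. The FO1-INV is the same as that in Section 5.4.1. According to the simulation results, the period of the ring oscillator output is nearly equal to the delay of 128 FO1-INV stages, 128×D_INV. Since the oscillator runs for one reference clock cycle T_D, the counted number count is equal to:

$$count = \frac{T_D}{128 \times D_{INV}} \tag{5.2}$$

where D_INV is the delay of FO1-INV. The ratio between the delay of FO1-INV and the delay of FO2-NAND was introduced in Subsection 5.4.1:

$$\frac{D_{NAND,FO2}}{D_{INV}} = 2 \tag{5.3}$$

D_NAND,FO2 represents the delay of FO2-NAND. Thus:

$$T_D = 128 \times count \times D_{INV} = 64 \times count \times D_{NAND,FO2} \tag{5.4}$$

To generate the 8-times frequency, the pulse signal propagates through the delay line 8 times. To lock to the reference clock, the target delay of the entire delay line should therefore be equal to T_D/8. So from (5.4):

$$\frac{T_D}{8} = 8 \times count \times D_{NAND,FO2} = 64 \times D_{NAND,FO2} + (8 \times count - 64) \times D_{NAND,FO2} \tag{5.5}$$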

The delay of the entire delay line is divided into two parts: the delay provided by the PVT-comp. delay line and the delay provided by the lock-in delay line. The locking range of the lock-in delay line is from 4×D_NAND,FO2 to 130×D_NAND,FO2, as introduced in the next subsection. The 64×D_NAND,FO2 term in equation (5.5) means that the target delay of the lock-in delay line is set to about the middle of its locking range. The remaining delay is provided by the PVT-comp. delay line.

Figure 5.14 shows the PVT-comp. delay line, which is similar to the nested lattice delay line (NLDL) proposed in [5.13]. The unit delay step of the PVT-comp. delay line is 32×D_NAND,FO2. To calculate the control code D[5:0], we divide the delay provided by the PVT-comp. delay line in (5.5) by 32×D_NAND,FO2:

$$D[5:0] = \frac{(8 \times count - 64) \times D_{NAND,FO2}}{32 \times D_{NAND,FO2}} = \frac{count}{4} - 2 \tag{5.6}$$

If the result is negative, it is set to 0. In Figure 5.13, the decoder realizes equation (5.6). Because the divisor of count is 4, the division can be accomplished by a 2-bit right shift, thereby saving circuit area.
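The decoder behaviour described above (divide count by 4 with a 2-bit shift, clamp negative results to 0) can be sketched in Python. The offset of 2 is an assumption derived from the figures stated in this subsection: a 64×D_NAND,FO2 target for the lock-in delay line divided by the 32×D_NAND,FO2 unit step of the PVT-comp. delay line. The function name pvt_code is hypothetical.

```python
def pvt_code(count):
    """Decode the 8-bit sensed count into the PVT-comp. control code D[5:0].

    Assumed relation: D = count / 4 - 2, clamped at 0. The division is a
    2-bit right shift, as a hardware decoder would implement it.
    """
    if not 0 <= count <= 0xFF:
        raise ValueError("count[7:0] must be an 8-bit value")
    return max((count >> 2) - 2, 0)


# Example: count = 40 gives D = 8, so the PVT-comp. delay line contributes
# 8 * 32 = 256 unit delays and the lock-in delay line targets 64 unit delays,
# together 320 = 8 * count unit delays (one eighth of the reference period).
code = pvt_code(40)
print(code, 32 * code + 64)
```

Because count is 8 bits wide, the largest decoded value is 61, so the result always fits in the 6-bit code without an upper clamp.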

Figure 5.14 PVT-comp. delay line (block delays labeled 30 NANDFO2)

5.5 Circuit Description

5.5.1 Lock-In Delay Line (LIDL) Controller

The lock-in delay line controller combines two locking strategies: SAR (successive approximation register) control [5.11] and counter control [5.12]. The SAR-controlled strategy adopts a binary search algorithm, which achieves a fast lock time and low hardware complexity. Nevertheless, because it is open-loop, it cannot track environmental variations such as temperature and voltage variations. To solve this problem, the counter-controlled strategy is added; being closed-loop, it tracks the environmental variations. When the clock generator starts, it first uses the SAR strategy for fast locking; after the SAR algorithm finishes, it changes to the counter-controlled strategy. Figure 5.15 represents this procedure. C[5:0] is the LIDL control code, and it is fed back to the combinational logic blocks. The multiplexer chooses which lock-in strategy is used. When the clock generator is in the lock state, it chooses the counter-controlled strategy to track environmental variations.
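The counter-controlled tracking used in the lock state can be illustrated with a short Python model. It is a sketch under stated assumptions rather than the actual controller: the names lidl_delay and track_step are hypothetical, the delay model (4 + 2×C)×D_NAND,FO2 is assumed from the 4× to 130× locking range, the PVT-comp. delay line is left out, and the comparison stands in for the LEAD/LAG outputs without claiming their polarity.

```python
def lidl_delay(code, d_nand):
    """Hypothetical lock-in delay model: 4 to 130 unit delays for code 0 to 63."""
    return (4 + 2 * code) * d_nand


def track_step(code, t_ref, d_nand, laps=8, max_code=63):
    """One counter-controlled update: move the code by at most 1, with saturation."""
    total = laps * lidl_delay(code, d_nand)
    if total > t_ref and code > 0:
        return code - 1   # circulating pulses too slow: shorten the delay
    if total < t_ref and code < max_code:
        return code + 1   # circulating pulses too fast: lengthen the delay
    return code


# Start from a code that was locked at d_nand = 1.0, then let the unit
# delay drift by 10% (for example due to a temperature change).
code = 37
for _ in range(10):
    code = track_step(code, t_ref=624.0, d_nand=1.1)
print(code)
```

In this model the code walks down one step per comparison and then alternates between two adjacent values around the new lock point, which is the expected behaviour of a plus-or-minus-one counter loop.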

Figure 5.15 Lock-In Delay Line (LIDL) Controller (counter-controlled and SAR-controlled logic, multiplexer selected by State = Lock, 6-bit register output C[5:0])

5.5.2 Lock-In Delay Line (LIDL)

The LIDL is modified from the nested lattice delay line (NLDL) [5.13]. Compared with the NLDL, the LIDL in Figure 5.16 saves circuit area by using 14-stage FO2-NAND chains instead of lattice delay lines (LDL) as the block delay. It still keeps the advantages of the NLDL. First, the LIDL has equal rising and falling times. Second, as the tuning range increases, the maximum operating frequency stays the same. Finally, the delay variation is only half that of the conventional configuration, as shown in Figure 5.17.

Figure 5.16 Lock-In Delay Line (LIDL) (labels: 14-stage NANDFO2 block delays, C[5:3]; D: dummy)

Figure 5.17 Simulation result of delay variation [5.13]

5.5.3 Pulse Generator

To circulate a pulse in the LIDL, the pulse width and the duty cycle must be designed properly to prevent the pulse from disappearing [5.14]. Figure 5.18 shows the pulse generator, which is composed of a D flip-flop and a delay line. A pulse is generated on each rising edge of V_IN.

Figure 5.18 Pulse generator (D flip-flop with input V_IN, reset r and output V_OUT; waveforms of V_IN and V_OUT)

5.5.4 SEL Generator

In Figure 5.1, the SEL signal selects the source of the pulse, P_OUT or P_REF. If the pulse comes from P_REF, the circulating pulses are readjusted. If the pulse comes from P_OUT, the circulating path is formed. Figure 5.19 shows the SEL generator, which has two different modes for the SAR and Lock states. When the state is SAR, SEL is inverted at every negative edge of P_REF; its waveform is shown in Figure 5.20. When the state is Lock, SEL is determined by P_REF and CountE8; its waveform is shown in Figure 5.21. SEL is high when P_REF is high or when the 8th pulse of P_OUT arrives; the latter condition prevents a 9th pulse from propagating early through the circulating path.

Figure 5.19 SEL generator (D flip-flop, inputs P_REF and CountE8, multiplexer selecting between State = SAR and State = Lock, output SEL)

Figure 5.20 SEL waveform while State = SAR (signals shown: CLK_REF, P_REF, P_OUT, SEL)

Figure 5.21 SEL waveform while State = Lock (signals shown: CLK_REF, P_REF, P_OUT, RST_Counter, Counter, CountE8, SEL)

5.5.5 Phase Detector

In Figure 5.22, the phase detector compares the arrival times of P_REF and the 8th P_OUT pulse. A conventional phase detector uses only two D flip-flops, which is not suitable for the pulse-circulating scheme. Here we add another two flip-flops in front of them, thereby able

Figure 5.23 RSTPD generator

5.5.6 Frequency Divider

The frequency divider shown in Figure 5.24 can divide the frequency of P_DIV by 2, 4, 6 or 8, according to the frequency selection signal FS[2:0]. The division ratio is decided by how many flip-flops are in the loop. For example, when FS[1:0] is equal to 11, the clock loop passes through only one flip-flop, so the output frequency is that of P_DIV divided by 2. This frequency divider produces an output clock with a 50% duty cycle. Table 5.2 lists the relation between FS[1:0] and the frequency division ratio.
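The relation between the number of flip-flops in the loop and the division ratio can be illustrated with a small Python model. The sketch assumes a twisted-ring (Johnson counter) loop in which the inverted output of the last flip-flop feeds the first one, and it takes the output from the last flip-flop; the source only states that the number of flip-flops in the loop sets the ratio, so this structure and the names divider_ratio and simulate_divider are assumptions.

```python
def divider_ratio(fs_low):
    """Division ratio for FS[1:0], assuming 4 - FS[1:0] flip-flops in the loop."""
    if not 0 <= fs_low <= 0b11:
        raise ValueError("FS[1:0] must be a 2-bit value")
    n_ff = 4 - fs_low
    return 2 * n_ff


def simulate_divider(n_ff, n_pulses):
    """Clock an assumed twisted-ring loop of n_ff flip-flops for n_pulses input pulses.

    Returns the value of the last flip-flop after each input pulse.
    """
    state = [0] * n_ff
    out = []
    for _ in range(n_pulses):
        feedback = 1 - state[-1]          # inverted output closes the loop
        state = [feedback] + state[:-1]   # shift by one flip-flop per pulse
        out.append(state[-1])
    return out


# Three flip-flops in the loop: the output repeats every 6 input pulses
# and is high for 3 of them, i.e. divide-by-6 with a 50% duty cycle.
wave = simulate_divider(3, 12)
print(wave)
```

A loop of n flip-flops with inverted feedback repeats every 2n input pulses and spends n of them high, which matches the divide-by-2, 4, 6 and 8 settings and the 50% duty cycle described above.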

Table 5.2 The relation between control signal and output frequency
