Chapter 5 High-boosting Pre-driver
5.3. Experiment and Measurement Results
A test chip has been designed and fabricated in a 65nm 1P10M SPRVT process. The test chip includes four on-chip buses: the 2X, 3X, and 4X pre-driving repeaters and the conventional inverter, as shown in Fig. 5-12. Four-bit pseudo-random bit sequences (PRBS) are generated and passed through an H-to-L level shifter to adjust the voltage swing to 0.1–0.3 V. An extra input I/P is provided to switch between a tunable clock signal and random data. Each on-chip bus has three channels. Each channel is 10 mm long with a wire spacing of 100nm for ground shielding in Metal5. The buses using 2X pre-drivers and the conventional repeater are divided into 10 segments, and the buses with 3X and 4X pre-drivers into 4 and 2 segments, respectively. In each boosting pre-driver, 100fF MIM capacitors serve as the bootstrap capacitors. Level shifters are used for the I/O circuit. Fig. 5-13 shows the photograph of the die. The total area with I/O pads is 1400μm × 1400μm.
Fig. 5-12. Block diagram of test circuits.
[Die photo labels: OP Test, Test CKT, 2X Bus, 3X Bus, 4X Bus, INV Bus; die size 1400μm × 1400μm. Cell layout labels: Buf, Cap.]
Fig. 5-13. Die photo and cell layout.
5.3.2. Measured Waveforms
Fig. 5-14 shows the measured data eye diagrams under a 0.15V supply. A 2^9 − 1 bit PRBS sequence is used as the input random data. Fig. 5-15(a) and (b) show the simulated and measured data rates and energy efficiencies of all the buses. The TT process corner is used in the post-layout simulation to ensure consistency with the measurements. In general, the measured results coincide with the simulated ones. The bus with the 2X boosting pre-driver can operate at a 1.5MHz clock or a 2.5Mbps data rate under 0.15V with an energy efficiency of 32.4fJ/bit. For the bus with the 3X boosting pre-driver, the corresponding values are 3MHz, 5Mbps, and 35.2fJ/bit. For the 4X bus, they are 1.1MHz, 1.5Mbps, and 32.8fJ/bit. According to the interconnect parameters from the datasheet, the energy dissipation of the wires is 20.3fJ/bit (0.5·f·Cwire·VDD²). This shows that the proposed buses achieve good energy efficiency and are close to the limit.
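As a rough arithmetic check on the figures quoted above, the sketch below compares each measured energy per bit with the 20.3fJ/bit wire limit and derives the implied average power as energy per bit multiplied by data rate. The dictionary layout and the function name `summarize` are illustrative choices and are not part of the test setup.

```python
# Measured values quoted in the text at a 0.15 V core supply.
WIRE_LIMIT_FJ = 20.3  # wire energy per bit from the datasheet parameters (fJ/bit)

MEASURED = {
    # bus: (data rate in Mbps, energy per bit in fJ/bit)
    "2X": (2.5, 32.4),
    "3X": (5.0, 35.2),
    "4X": (1.5, 32.8),
}


def summarize(measured, limit_fj):
    """Return {bus: (ratio to the wire limit, average power in nW)}."""
    result = {}
    for bus, (rate_mbps, energy_fj) in measured.items():
        ratio = energy_fj / limit_fj
        # fJ/bit * Mbit/s = 1e-15 J * 1e6 1/s = 1e-9 W, i.e. nW
        power_nw = energy_fj * rate_mbps
        result[bus] = (ratio, power_nw)
    return result


for bus, (ratio, power_nw) in summarize(MEASURED, WIRE_LIMIT_FJ).items():
    print(f"{bus}: {ratio:.2f}x the wire limit, {power_nw:.1f} nW")
```

Under this reading, all three buses stay within a factor of two of the wire-only energy.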
[Eye-diagram panels: BT3X, BT4X, and BT2X at core VDD = 0.15V with data rates of 5Mbps, 1.5Mbps, and 2.5Mbps. Annotated jitter (RMS, P-P): 18.6ns, 96ns; 8.8ns, 50.2ns; 25.4ns, 133.4ns. Scope scales: 80ns, 40ns, and 131ns (time); 200mV (voltage).]
Fig. 5-14. Measured waveform under 0.15 V core VDD (600 mV~ 800 mV I/O VDD).
[Plots: (a) data rate (bps, 100k to 100M) versus VDD (0.10 to 0.30 V); (b) Ebit (pJ, 0.00 to 0.15) versus VDD (0.10 to 0.30 V). Series: Post-sim_2X, Post-sim_3X, Post-sim_4X, Measured_2X, Measured_3X, Measured_4X; 25 °C, TT corner.]
Fig. 5-15. Comparison of measured and post-simulation results.
(a) Data rate at different VDD, (b) energy per bit at different VDD.
TABLE 5-1 summarizes the performance of the test chip, and TABLE 5-2 compares it with previous works. Two figures of merit (FoM) are used for the comparison. FoM1 is defined as the energy per bit. FoM2 is the data rate normalized to the pitch-power product [44]. The proposed design can operate in the sub-threshold region under a supply voltage of 0.15V. The energy per bit is 35.2fJ/bit for the 3X pre-driver and 32.8fJ/bit for the 4X pre-driver. This indicates that the proposed designs are more energy-efficient than the others. The comparison of FoM2 shows that the proposed designs are also more area-efficient than the others.
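The two figures of merit can be written out as a small sketch. The power value below is derived from the quoted 35.2fJ/bit at 5Mbps; the wire pitch of 0.2 μm is a purely hypothetical placeholder, since the pitch is not stated in this section, and the function names are illustrative.

```python
def fom1_energy_per_bit_fj(power_uw, data_rate_mbps):
    """FoM1: energy per bit = power / data rate.

    uW / Mbps equals pJ/bit, so the result is scaled by 1e3 to give fJ/bit.
    """
    return power_uw / data_rate_mbps * 1e3


def fom2_rate_per_power_pitch(data_rate_mbps, power_uw, pitch_um):
    """FoM2: data rate normalized to the pitch-power product, Mbps / (uW * um)."""
    return data_rate_mbps / (power_uw * pitch_um)


# 3X bus at 0.15 V: 35.2 fJ/bit at 5 Mbps (values from the text).
power_uw = 35.2e-3 * 5.0  # pJ/bit * Mbps = uW
pitch_um = 0.2            # hypothetical pitch, for illustration only

print(fom1_energy_per_bit_fj(power_uw, 5.0))
print(fom2_rate_per_power_pitch(5.0, power_uw, pitch_um))
```

A lower FoM1 and a higher FoM2 are better; FoM2 rewards designs that reach a given data rate with both less power and a narrower wire pitch.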
TABLE 5-1. Chip Summary
Parameter        Value
Process          65nm 1P10M SPRVT Low-K CMOS
Vth              NMOS: 230mV; PMOS: –190mV
Energy per bit   32.4 (fJ/Ch)
5.4. Summary
This chapter has explored on-chip bus design under 0.15 V. The proposed 3X and 4X boosting pre-drivers improve the energy efficiency and the data rate simultaneously.
According to Monte Carlo analysis, the proposed design has a smaller peak-to-peak variability under device mismatch and process variation. A test chip in a 65 nm 1P10M SPRVT CMOS process has been designed and fabricated. The measured results verify that the proposed 3X (4X) pre-driver achieves a 3 MHz (1.1 MHz) clock rate and a 5 Mbps (1.5 Mbps) data rate at 0.15V VDD. The energy efficiency is 35.2 fJ/bit (32.8 fJ/bit). In addition, it has the highest data rate, normalized to the power-pitch product, among the compared designs.
TABLE 5-2. Comparisons
* Only shows clock rate.
FoM1 = Power (μW) / Data rate (Mbps) = Energy per bit; FoM2 = Data rate (Mbps) / (Power (μW) × Pitch (μm)).
Chapter 6
Near-threshold ADPLL
For sustainable electronic devices, ultra-low power design is essential to prolong battery life. According to P = fCV², scaling down the supply voltage is the most effective way to reduce the power consumption. According to the forecast from the International Technology Roadmap for Semiconductors (ITRS), the supply voltage will be scaled to 0.5V for low-power applications within the next generation [46]. Recently, some 0.5V biomedical applications have been reported [47-48]. In addition, some important analog building blocks operating at the MHz level have been developed with a 0.5V supply [49-50].
Phase-locked loops (PLLs) are key building blocks in integrated circuits. Several clock circuits scaled to 0.5V have been reported using analog approaches [51-53]. All-digital PLLs (ADPLLs) are a popular alternative to analog PLLs because of their portability and scalability. Additionally, ADPLLs have no DC power dissipation. In a PLL, the oscillator is the most power-hungry building block, even in near-threshold operation. Although LC oscillators have superior phase noise, ring oscillators are often chosen due to power and area considerations. The digitally-controlled oscillator (DCO) presented in [54] is composed of a 12-bit DAC and a current-controlled oscillator using a 260 μA bias current. However, the high-resolution DAC requires extra power and area overhead. In order to enhance the driving capability and linear control range, a 0.5V 8-phase voltage-controlled oscillator (VCO) with a bulk-driven technique is reported in [53]. It successfully modulates the threshold voltage Vth by slightly increasing the leakage current. The design in [55] takes an all-digital approach. It uses a large number of digital delay cells and paths, whose parasitic loads make it difficult to reduce the power. Several DCOs are composed of a supply-regulated ring oscillator and a digitally-controlled resistance network (DRN) [56-57]. Here, linearity and complexity are the major design issues for DRNs.
In this chapter, we present a near-threshold ADPLL with a bootstrapped digitally-controlled ring oscillator (BDCO) that operates at 0.25-0.5V. The BDCO is composed of a bootstrapped ring oscillator (BTRO) and a weighted thermometer-controlled resistance network (WTRN). The proposed bootstrapped delay cell generates a large gate voltage swing to improve the driving capability. The boosted output swing keeps the transistors operating in the linear region, which gives high linearity under a near-threshold supply.
The rest of the chapter is organized as follows. Section 6.1 introduces the proposed ADPLL. The performance evaluation is analyzed in Section 6.2. In Section 6.3, the test chip and the experimental results are given. Finally, the comparisons and the conclusion are drawn in Section 6.4.
6.1. Architecture of Proposed All-Digital PLL
The proposed ADPLL, as shown in Fig. 6-1, consists of a phase frequency detector (PFD) to detect the phase error, a phase selector (PS) to reroute the signal path, a time-to-digital converter (TDC) to convert the phase error into digital code, a digital loop filter (DLF) to filter out the high frequency noise, a DCO to generate the required output frequency, and a divider (DIV) to divide and feed back the output frequency. To improve the resolution of the DCO, a 4-bit sigma-delta modulator (SDM) is used for the dithering.
Fig. 6-1. Block diagram of the proposed ADPLL
6.1.1. PFD, PS and TDC
The PFD, PS, and TDC together can be regarded as a digital phase detector. The PFD produces UP and DN signals to indicate the phase error. The circuit diagram is shown in Fig. 6-2(a). It is designed as a dynamic circuit to operate at high frequency. In order to have the correct phase arrangement for the TDC, the two signals are rerouted by the PS, as illustrated in Fig. 6-2(b) [57].
The TDC is based on a Vernier delay line, as shown in Fig. 6-3 [59]. It requires the proper phase order for the conversion. As the LEAD and LAG signals propagate in their independent delay chains, the timing difference between the two signals decreases by ΔT in each stage, where ΔT is defined as the resolution of the Vernier TDC. In the proposed ADPLL, a 4-bit TDC is designed with 20ps resolution at 0.5V. The phase comparators compare the phases of the delayed LEAD and LAG signals and produce a thermometer code. Each comparator is composed of two cross-coupled latches, as depicted in Fig. 6-3. Finally, a thermometer-to-binary (T2B) decoder converts the thermometer code to a 4-bit binary one.
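The conversion described above can be illustrated with a minimal behavioral sketch. It assumes ideal comparators, a chain of 15 stages for the 4-bit output, and a decision rule in which a comparator outputs 1 while LEAD is still ahead of LAG; the stage count, the decision rule, and the function names are assumptions made for illustration and are not taken from the circuit.

```python
def vernier_tdc(lead_time_ps, resolution_ps=20.0, n_stages=15):
    """Idealized Vernier TDC.

    lead_time_ps: how far the LEAD edge is ahead of the LAG edge at the input.
    Each stage reduces that lead by one resolution step; a comparator
    outputs 1 while LEAD is still ahead (or aligned), otherwise 0.
    """
    thermometer = []
    remaining = lead_time_ps
    for _ in range(n_stages):
        remaining -= resolution_ps
        thermometer.append(1 if remaining >= 0 else 0)
    return thermometer


def thermometer_to_binary(thermometer):
    """T2B decoding: the binary value is the number of ones."""
    return sum(thermometer)


code = thermometer_to_binary(vernier_tdc(70.0))
print(code)  # number of full 20 ps steps contained in a 70 ps phase error
```

In this idealized model the output saturates at 15 for phase errors of 300 ps or more, which corresponds to the 4-bit range at 20ps resolution.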
[Schematic labels: (a) PFD with FREF and FB inputs, CLKHT, RST, and Q pins, and UP and DN outputs; (b) PS with Δt delays.]
Fig. 6-2. Circuit schematics of (a) PFD and (b) PS.
[Schematic labels: delay cells with delays T + ΔT and T.]
Fig. 6-3. Circuit schematic of the TDC.
6.1.2. DLF
The DLF is a 2nd-order digital filter whose parameters are obtained by a bilinear transformation from its analog counterpart, as depicted in Fig. 6-4. It contains two signal paths, the proportional path (K_P) and the integral path (K_I). The transfer function of the analog loop filter is
$$H_{ALF}(s) = \frac{V(s)}{I(s)} = R + \frac{1}{sC}. \tag{6-1}$$
The Z-domain transfer function of the DLF is given in (6-2).
$$H_{DLF}(z) = K_P + \frac{K_I}{1 - z^{-1}}. \tag{6-2}$$
The Z-domain equations can be related to the S-domain equations by the bilinear transformation [58], as written in (6-3).
$$s = \frac{2}{T_S}\cdot\frac{1 - z^{-1}}{1 + z^{-1}}. \tag{6-3}$$
Here TS is the sampling period of the reference clock in the ADPLL. The integrator is expressed as 1/(1 − z^{-1}) in the Z-domain. Thus, by converting to the Z-domain with the bilinear transformation, Eq. (6-1) can be rewritten as
$$H(z) = \left(R - \frac{T_S}{2C}\right) + \frac{T_S}{C}\cdot\frac{1}{1 - z^{-1}}.$$
Following the mentioned steps, we can obtain the design parameters of the DLF in the proposed ADPLL.
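The steps above can be sketched in code. The first function applies the standard bilinear substitution s = (2/TS)(1 − z^-1)/(1 + z^-1) to H(s) = R + 1/(sC) and reads off the proportional and integral gains of a filter of the form KP + KI/(1 − z^-1); the scaling of these gains to the digital code domain is not covered here. The second part is a behavioral proportional-integral filter using the coefficients KP = 2^-1 and KI = 2^-4 listed in Table 6-1. The class and function names are illustrative.

```python
def bilinear_pi_coefficients(r_ohm, c_farad, ts_sec):
    """Map H(s) = R + 1/(sC) to H(z) = KP + KI / (1 - z^-1).

    Substituting s = (2/Ts) * (1 - z^-1) / (1 + z^-1) gives
    H(z) = (R - Ts/(2C)) + (Ts/C) / (1 - z^-1).
    """
    kp = r_ohm - ts_sec / (2.0 * c_farad)
    ki = ts_sec / c_farad
    return kp, ki


class DigitalLoopFilter:
    """Behavioral PI filter: y[n] = KP*e[n] + KI*sum(e[0..n])."""

    def __init__(self, kp=2 ** -1, ki=2 ** -4):
        self.kp = kp
        self.ki = ki
        self.acc = 0.0  # state of the integral path

    def step(self, err):
        # Integral path KI / (1 - z^-1): accumulate including the current sample.
        self.acc += self.ki * err
        # Proportional path added to the accumulated integral term.
        return self.kp * err + self.acc


dlf = DigitalLoopFilter()
print([dlf.step(1.0) for _ in range(3)])
```

Power-of-two coefficients such as 2^-1 and 2^-4 are convenient because they can be realized with bit shifts instead of multipliers.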
Fig. 6-4. Circuit schematic of DLF.
6.1.3. Bootstrapped Digitally-Controlled Oscillator
Based on our previous work [58], the proposed monotonic bootstrapped DCO (BDCO) is composed of a 5-stage BTRO with its supply voltage VC connected to a WTRN, as shown in Fig. 6-5. For near-threshold operation, linearity and variability are two major concerns. The techniques we use to overcome these two problems are detailed as follows.
[Schematic annotations: VGS = 2VC; VSG = -VC.]
Fig. 6-5. Circuit schematics of the BDCO and BTRO.
6.1.3.1. Bootstrapped Ring Oscillator
In order to operate in the near-threshold region, a bootstrapped ring oscillator (BTRO) has been proposed [58], as shown in Fig. 6-5. The bootstrapped delay cell ideally produces an output swing of –VC to 2VC. The transient waveforms are illustrated in Fig. 6-6. When Vin=2VC, NOP=0 and NBP is precharged to VC by MP1. After Vin transits to –VC, NOP rises to VC and boosts NBP to 2VC. The boosted 2VC at NBP is transferred to Vout via MP2. The 2VC (–VC) output voltage pushes the NMOS (PMOS) transistors of the next cell into the super-threshold region and increases their driving capability. It also suppresses the PMOS (NMOS) leakage current exponentially. As a result, we are able to increase the operating frequency by using large transistors without a leakage problem. Since the transistors operate in the super-threshold region, they have better linearity and immunity against process variation.
[Plot: node voltage (V, -0.50 to 0.75) versus time (513.0 ns to 514.8 ns); traces: Vin, NOP, NON, Vout; 25 °C, TT corner.]
Fig. 6-6. Simulated transient waveforms of a five-stage bootstrapped ring oscillator.
6.1.3.2. Weighted-Thermometer Code Control
The proposed WTRN is illustrated in Fig. 6-7. It controls VC for the BTRO. In addition to the full thermometer code in [56], a weighted code is used to obtain better linearity. The resistance network consists of 9-bit PMOS transistor arrays, binary-to-thermometer (B2T) code converters, and an SDM. Full thermometer control occupies a large area with complicated wiring. A hybrid architecture of binary and thermometer control is reported in [57] and costs less chip area. To obtain better linearity, the proposed PMOS arrays are no longer binary weighted; they are arranged in a segmented thermometer code with dedicated transistor sizing.
There are a total of 13 control bits: two for coarse tuning, three for medium tuning, four for fine tuning, and four for dithering by an SDM to further improve the resolution. In order to improve the conductivity at sub-0.5V supplies, only four PMOS transistors are stacked in each column. Figure 6-8 shows the DCO output frequency versus the coarse and medium control codes. Compared with the binary-weighted design, the proposed BDCO has better linearity, with a gain of 563 kHz/code in the TT corner.
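The 13-bit control word and the B2T conversion can be illustrated as follows. The field order (coarse bits in the most significant positions, dithering bits in the least significant ones) and the function names are assumptions for illustration; the text specifies only the field widths.

```python
def binary_to_thermometer(value, n_bits):
    """Convert an n-bit binary value into a (2^n - 1)-bit thermometer code."""
    width = (1 << n_bits) - 1
    if not 0 <= value <= width:
        raise ValueError("value out of range for the given bit width")
    return [1 if i < value else 0 for i in range(width)]


def split_control_word(word):
    """Split a 13-bit word into (coarse, medium, fine, dither) fields.

    Assumed layout, MSB first: 2 coarse, 3 medium, 4 fine, 4 dither bits.
    """
    if not 0 <= word < (1 << 13):
        raise ValueError("control word must fit in 13 bits")
    dither = word & 0xF
    fine = (word >> 4) & 0xF
    medium = (word >> 8) & 0x7
    coarse = (word >> 11) & 0x3
    return coarse, medium, fine, dither


coarse, medium, fine, dither = split_control_word(0b10_101_1001_0011)
print(coarse, medium, fine, dither)
print(binary_to_thermometer(medium, 3))
```

Because a thermometer code changes only one bit between adjacent values, each code step switches a single additional transistor, which is what makes the control monotonic.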
Fig. 6-7. Detail circuit schematic of the BDCO with the WTRN
[Plot: DCO frequency (Hz, 200.0M to 800.0M) versus coarse and medium codes (-4 to 32); series: Binary-weighted, Proposed_TT, Proposed_SS, Proposed_FF; 25 °C.]
Fig. 6-8. DCO output frequency versus coarse and medium codes in different corners
6.1.4. SDM
To improve the resolution of the BDCO, a 4-bit 1st-order SDM is used to dither the least-significant bit (LSB). Figure 6-9 shows its circuit diagram. It consists of a 4-bit adder and a register. With the SDM dithering, the BDCO resolution is effectively improved by 16 times. The parameters of the ADPLL are listed in Table 6-1 with a target of 400 MHz at 0.5V.
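A first-order SDM built from a 4-bit adder and a register behaves like an accumulator whose carry-out dithers the LSB. The following behavioral sketch assumes that the carry-out is the dither bit; the class name and interface are illustrative.

```python
class FirstOrderSDM:
    """Behavioral first-order sigma-delta modulator (adder plus register)."""

    def __init__(self, n_bits=4):
        self.modulus = 1 << n_bits  # 16 for a 4-bit accumulator
        self.acc = 0                # register contents

    def step(self, frac):
        """Add the fractional input; return the carry-out as the dither bit."""
        if not 0 <= frac < self.modulus:
            raise ValueError("fractional input out of range")
        total = self.acc + frac
        carry = total // self.modulus
        self.acc = total % self.modulus
        return carry


sdm = FirstOrderSDM()
bits = [sdm.step(5) for _ in range(16)]
print(bits, sum(bits))
```

Over any 16 consecutive cycles with a constant input k, the carry is asserted k times, so the average LSB value is k/16, which is the source of the 16-fold resolution improvement.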
Fig. 6-9. Block diagram of the SDM.
Table 6-1. Design parameters of the proposed ADPLL
Parameters                         Value
Loop bandwidth                     1.25MHz
DCO gain                           563kHz/code
TDC resolution                     20ps
Divider number                     16
Digital loop filter coefficients   KP = 2^-1; KI = 2^-4
6.2. Detailed Evaluation on BTRO
6.2.1. Power Analysis of BTRO
[Schematic annotations: precharge path and leakage paths.]
Fig. 6-10. Power analysis of the BTRO
In a PLL, the oscillator consumes the most power. Unlike an analog VCO, in which the constant biasing current dominates the power consumption, a DCO consumes no DC current. However, the dynamic power is a major concern, especially for the BTRO due to its large output swing from –βVC to 2βVC, where β is the boosting efficiency factor [15]. As shown in Fig. 6-10, the total capacitance at the node Vout is COP of this stage plus CIP of the next stage. In addition, CINV denotes the total capacitance at the output nodes of INVP and INVN, where the output swings are from GND to VC. As a result, the total dynamic power consumption of the 5-stage BTRO is
$$P_{BTRO} \approx 5f\left[(C_{IP}+C_{OP})(2\beta V_C+\beta V_C)^2 + C_{INV}V_C^2\right] \approx f\left[45\beta^2(C_{IP}+C_{OP}) + 5C_{INV}\right]V_C^2. \tag{6-6}$$
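Equation (6-6) can be evaluated numerically as a sanity check that its two forms agree. The capacitance values, β, and the operating point used below are hypothetical placeholders chosen only to exercise the formula; they are not extracted from the design.

```python
import math


def btro_dynamic_power(f_hz, c_ip, c_op, c_inv, v_c, beta, n_stages=5):
    """First form of (6-6): per-stage switching energy times frequency.

    The output node swings from -beta*Vc to 2*beta*Vc (a 3*beta*Vc swing),
    while the INVP/INVN outputs swing from GND to Vc.
    """
    swing = 2.0 * beta * v_c + beta * v_c
    return n_stages * f_hz * ((c_ip + c_op) * swing ** 2 + c_inv * v_c ** 2)


def btro_dynamic_power_simplified(f_hz, c_ip, c_op, c_inv, v_c, beta):
    """Second form of (6-6), valid for the 5-stage oscillator."""
    return f_hz * (45.0 * beta ** 2 * (c_ip + c_op) + 5.0 * c_inv) * v_c ** 2


# Hypothetical values for illustration only.
params = dict(f_hz=400e6, c_ip=2e-15, c_op=3e-15, c_inv=4e-15, v_c=0.5, beta=0.9)
p_full = btro_dynamic_power(**params)
p_simple = btro_dynamic_power_simplified(**params)
print(p_full, p_simple, math.isclose(p_full, p_simple))
```

The factor 45β² comes from 5 stages times (3β)², which shows how strongly the boosted swing weighs on the dynamic power compared with the unboosted internal nodes.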
There are several leakage current paths in a bootstrapped delay cell. As shown in Fig. 6-10, taking Vin=2βVC as an example, one path is from the pre-charge node NBP to the output through MP2, and another is from the ground to the boosted node through MN1. Since 2βVC is applied to the gate of MP2 and -VC to that of MN1, these transistors are biased with negative VGS. Similarly, the other two paths are in INVP and INVN. As a result, all leakage currents are significantly reduced such that they can be neglected.
6.2.2. Linearity Analysis of BTRO
For a VCO/DCO, the tuning linearity is very important because it affects the tracking and locking behavior as well as the jitter performance. For the proposed 5-stage ring oscillator, the period is 10TD, where TD is the single-stage delay. Assuming that the rise and fall times are not exactly the same, TD can be represented as
$$T_D = 0.5\left(\tau_{PHL\_C} + \tau_{PLH\_C}\right). \tag{6-7}$$
Here τPHL and τPLH are the propagation delays measured from the time of the input change to the time of the corresponding output transition from H to L and from L to H, respectively. The linearity can be analyzed based on τPHL. We take a 5-stage inverter-based VCO as an example. Assume that the characteristics of the PMOS and NMOS are very similar, and let CL be the effective load capacitance at the output node of a single stage. As shown in Fig. 6-11, CL is discharged by the NMOS with VGS = VDD. Since the VCO is operated in the near-threshold region, the maximum VDD is 0.5V. According to the state equation, τPHL_C can be written as the integral in (6-8).
$$\tau_{PHL\_C} = \int_{0.5V_{DD}}^{V_{DD}} \frac{C_L}{I_{DN}}\,dV_{out}. \tag{6-8}$$
Fig. 6-11. Delay time calculation for an inverter-based ring oscillator.
According to the switching characteristics, the switching operation consists of two intervals due to the threshold voltage Vth [61]. The switching operation at near-threshold supply is either in saturation with a VDD above threshold voltage or in sub-threshold with a VDD below threshold voltage. Thus, we can rewrite (6-8) as
$$\tau_{PHL\_C} = \tau_{PHL\_Sat} + \tau_{PHL\_Sub}. \tag{6-9}$$
When the ring oscillator is operated above the threshold voltage, the NMOS has a saturation current, as expressed in (6-10) [62]. Thus, we can derive τPHL_Sat, as in (6-11), according to the I-V equation in the saturation region. The parameters in (6-10) include the width and length of the device, the threshold voltage Vth, and the channel-length modulation factor λ. On the other hand, when the VCO operates below the threshold voltage, τPHL_Sub is rewritten as in (6-13) according to the sub-threshold current in (6-12).
Here Cdep is the depletion capacitance, VT is the thermal voltage, and n is the sub-threshold slope factor. The gate delay characteristics of the inverter-based ring oscillator are thus separated into two different regions. According to (6-11) and (6-13), the delay in neither region is proportional to the reciprocal of VDD. As a result, the inverter VCO is not a linear supply-regulated VCO.
Compared with the inverter VCO, the BTRO features boosted swings from -βVDD to 2βVDD, which push INVP and INVN to operate in the triode region. The driving current is given in (6-14). The propagation delay of the falling edge, τPHL_BT, is illustrated in Fig. 6-12. We can derive τPHL_BT from (6-8) and (6-14), as in (6-15).
[Schematic annotations: VDD ≤ 0.5V (bootstrapped); Vout.]
Fig. 6-12. Delay time calculation for the BTRO.
According to (6-15), the delay of the BTRO is nearly proportional to the reciprocal of (2βVDD − Vth), so the oscillation frequency scales almost linearly with VDD, which is suitable for a supply-regulated VCO in the near-threshold region.
As a design example of a 5-stage supply-regulated VCO, the VCO transfer curves at 25°C in different process corners are shown in Fig. 6-13. Compared with an inverter VCO, the BTRO has higher linearity in the near-threshold region and is less affected by process variation.
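The linearity argument can be illustrated numerically with a simplified sketch. It does not reproduce (6-10) to (6-15); instead it assumes textbook long-channel models: a square-law saturation current 0.5·k·(VDD − Vth)² for the inverter stage, and a triode current k·[(2βVDD − Vth)·V − V²/2] for the bootstrapped stage, each integrated over the output transition from VDD to 0.5VDD. The values of Vth, β, k, and CL are hypothetical, and the nonlinearity metric (maximum deviation from the straight line through the end points, normalized to the frequency span) is one possible choice.

```python
import math


def inverter_freq(vdd, vth, k, c_load, n_stages=5):
    """Ring-oscillator frequency with a square-law saturation current (assumed model)."""
    i_sat = 0.5 * k * (vdd - vth) ** 2
    tau = c_load * 0.5 * vdd / i_sat        # constant-current discharge to VDD/2
    return 1.0 / (2 * n_stages * tau)       # period = 2 * N * stage delay


def bootstrapped_freq(vdd, vth, k, c_load, beta, n_stages=5):
    """Ring-oscillator frequency with a triode current and VGS = 2*beta*VDD (assumed model)."""
    a = 2.0 * beta * vdd - vth              # gate overdrive
    if a <= vdd:
        raise ValueError("triode assumption violated for this VDD")
    # Closed form of C * integral dV / (k*(a*V - V^2/2)) from VDD/2 to VDD.
    tau = c_load / (k * a) * math.log(2.0 * (2.0 * a - 0.5 * vdd) / (2.0 * a - vdd))
    return 1.0 / (2 * n_stages * tau)


def endpoint_nonlinearity(xs, ys):
    """Max deviation from the end-point line, normalized to the output span."""
    x0, x1, y0, y1 = xs[0], xs[-1], ys[0], ys[-1]
    line = [y0 + (y1 - y0) * (x - x0) / (x1 - x0) for x in xs]
    return max(abs(y - l) for y, l in zip(ys, line)) / abs(y1 - y0)


# Hypothetical device parameters, for illustration only.
VTH, BETA, K, C_LOAD = 0.2, 0.9, 1e-3, 5e-15
vdds = [0.30 + 0.02 * i for i in range(11)]  # 0.30 V to 0.50 V

inv_nl = endpoint_nonlinearity(vdds, [inverter_freq(v, VTH, K, C_LOAD) for v in vdds])
bt_nl = endpoint_nonlinearity(vdds, [bootstrapped_freq(v, VTH, K, C_LOAD, BETA) for v in vdds])
print(f"inverter: {inv_nl:.4f}, bootstrapped: {bt_nl:.4f}")
```

Under these assumptions the bootstrapped curve deviates from a straight line far less than the inverter curve over 0.3V to 0.5V, which is in line with the qualitative trend described for Fig. 6-13.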
[Plot: frequency (Hz, 0.0 to 2.0G) versus supply voltage (0.2 to 0.6 V); series: Bootstrap_FF, Bootstrap_TT, Bootstrap_SS, INV_FF, INV_TT, INV_SS; 25 °C.]
Fig. 6-13. Comparisons of the VCOs transfer curve with supply-regulation.
6.3. Experimental Results and Comparisons
6.3.1. Chip Implementation
The proposed ADPLL has been fabricated in a 90nm 1P9M SPRVT CMOS process. The test chip includes two test circuits, the proposed BTRO and the ADPLL. Figure 6-14 shows the block diagram of the test circuits. Multi-stage bootstrapped level shifters with an intermediate supply voltage VM_I/O are used for driving open-drain devices. Figure 6-15 shows the chip micrograph. The overall active areas of the BTRO and the ADPLL are 31.5 μm×61.5 μm and 326 μm×175 μm, respectively. The test chip is mounted on an FR4 test board with SMA connectors, as shown in Fig. 6-16. An Agilent 81130A pulse generator provides the reference clock; an Agilent 54382D is used to measure the output waveforms and their jitter performance. A Keithley 2400 power meter provides DC power and measures the power consumption. Phase noise was measured using an Agilent E4440A spectrum analyzer.
Fig. 6-14. Block diagram of the test circuits.
[Micrograph dimensions: 326 μm × 175 μm.]
Fig. 6-15. Micrograph of the test chip.
[Test board labels: Core VPLL, ADPLL, I/O VDD, Rst, fref, Ph 1, Ph 2, Ph_t, Supply noise, Test CKT, VBTRO.]
Fig. 6-16. Photo of the FR4 test board.
6.3.2. Measured Results
Figures 6-17(a) and 6-17(b) show the measured output waveforms of the BTRO at 0.6V and 0.2V, respectively. The detailed frequency and power versus VDD plots of the BTRO from 0.2V to 0.6V are shown in Fig. 6-18. These measured results match the simulated ones in the TT corner. In terms of oscillation frequency versus supply voltage, the BTRO has a relatively linear behavior in the near-threshold region.
[Oscilloscope scales: (a) 0.5 ns, 100 mV; (b) 8 ns, 50 mV.]
Fig. 6-17. Measured output waveforms of the BTRO at (a) 0.6V VDD ; (b) 0.2V VDD.
[Plot: frequency (Hz, 0.0 to 1.2G) and power (μW, 0.1 to 100) versus supply voltage (100 to 700 mV); series: Measured, Post-sim FF, Post-sim TT, Post-sim SS.]
Fig. 6-18. Comparisons with measured and simulation results.
A locked clock waveform at 400 MHz is illustrated in Fig. 6-19. The measured jitter histogram shows that the output rms jitter and peak-to-peak jitter are 9.37ps and 69.1ps, respectively. The output frequency range of the proposed ADPLL is from 36.8 MHz to 480 MHz under a supply voltage of 0.25 to 0.5V. Figures 6-20 and 6-21 show the measured output spectrum and phase noise at 0.5V and 0.25V VDD, respectively. With a reference of 30 MHz (2.3 MHz), the measured spur at 480 MHz (36.8MHz) under a 0.5V (0.25V) VDD is 42.5dB (39.9dB) below the carrier. The phase noise is -96.2dBc/Hz (-91.6dBc/Hz) at a 1 MHz offset and -79.9dBc/Hz (-78.1dBc/Hz) at a 10kHz offset when the output frequency is 480MHz (36.8MHz).
Table 6-2 summarizes the major characteristics of the test chip.
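The quoted operating points are consistent with the divider number of 16 listed in Table 6-1, as the short check below shows. It also expresses the measured rms jitter as a fraction of the 400 MHz clock period; the function names are illustrative.

```python
DIVIDER = 16  # divider number from Table 6-1


def locked_output_mhz(ref_mhz, divider=DIVIDER):
    """Output frequency of a locked integer-N PLL: f_out = N * f_ref."""
    return ref_mhz * divider


def jitter_fraction_of_period(jitter_ps, freq_mhz):
    """Jitter expressed as a fraction of one clock period."""
    return jitter_ps * 1e-12 * freq_mhz * 1e6


print(locked_output_mhz(30.0))   # reference used at 0.5 V
print(locked_output_mhz(2.3))    # reference used at 0.25 V
print(jitter_fraction_of_period(9.37, 400.0))
```

The 9.37ps rms jitter therefore corresponds to well under 1% of the 2.5 ns period at 400 MHz.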
[Jitter histogram annotations: frequency 400 MHz; RMS jitter 9.37 ps; P-P jitter 69.1 ps; total of 10 khits; scales 200 ps, 100 mV.]