
Chapter 2 Basic concept of DCO Implementations (DCO)

2.2 The basic concept of Digital-Controlled Oscillator (DCO)

2.2.2 Fine-tune unit

The resolution of the Fine-tune unit directly affects the performance of the DCO clock.

Improving the fine-tune resolution is an important way to reduce jitter. It is difficult to meet a low-jitter requirement if the fine-tune resolution is coarser than that requirement.

A well-known Fine-tune unit is shown in Fig. 2.5 [6][15][16][17]. In this unit, two turned-on transistors act as the load of an inverter. Several transistor pairs are controlled by the fine-tune control bits. By setting these bits, we adjust the load of the inverter and obtain different delay times.

For example, suppose 4 bits control the fine-tune delay. Assume the load for the control setting (f1:f4)=(0000) is RL. The load for (f1:f4)=(1111) is then 5 RL. This gives 5 delay steps, for (f1:f4)=(0000), (0001), (0011), (0111) and (1111).
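The 4-bit example can be sketched in a few lines of Python. The sketch assumes that each enabled transistor pair adds one RL of load, which matches the two endpoints given above (RL and 5 RL) but is not stated for the intermediate codes. The function names are hypothetical.

```python
def thermometer_codes(n_bits):
    """Return the n_bits + 1 thermometer codes, from all zeros to all ones."""
    return ["0" * (n_bits - k) + "1" * k for k in range(n_bits + 1)]


def relative_load(code):
    """Load in units of RL, assuming one extra RL per enabled pair."""
    return 1 + code.count("1")


for code in thermometer_codes(4):
    print(f"(f1:f4)=({code})  load = {relative_load(code)} RL")
```

Only codes of this thermometer form are used, so 4 control bits give 5 delay steps rather than 16.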

Another Fine-tune method adds extra capacitors to the delay chain, as shown in Fig. 2.6 [1].

Each capacitor provides an additional capacitance ΔC. We change the number of connected capacitors through the control bits and thereby adjust the delay time. An inverter gate or one pair of transistors (PMOS and NMOS) can serve as the additional capacitor, as shown in Fig. 2.7. NOR and NAND gates are also good solutions for better fine-tune resolution. In the next chapter we introduce OAI and AOI gates as additional capacitors. Their ΔC is much smaller than that of the other gates, so they give the smallest delay step, that is, the best resolution.

The third Fine-tune method is the phase mixer shown in Fig. 2.9 [18][19]. By mixing an early and a late signal with different phases, we obtain different fine-tune steps. The design constraint of the phase-blender circuit is that all paths through the circuit must present precisely the same loading and delay time, so that the phase difference between ψA and ψB is the same as that between ΨA and ΨB. The phase mixer can have multiple cascaded stages for finer resolution and more fine-tune steps.

CHAPTER 3 PROPOSED DIGITAL-CONTROLLED-OSCILLATOR (DCO) AND SIMULATION RESULT

In an analog PLL, a Voltage-Controlled Oscillator (VCO) generates the phase-locked clock. The DCO plays the same role in an All-Digital Phase-Locked Loop (ADPLL), where it generates the clock as shown in Fig. 3.1. The DCO accepts digital signals from the controller and converts them into a time delay.

The delay time determines the phase and frequency of the oscillator. That is, the ADPLL controls the DCO by changing the control words. Arithmetically incrementing or decrementing the DCO control words modulates the DCO frequency and phase.

We also provide a wide-bandwidth DCO. A Digital-to-Time Converter (DTC) sets the frequency over a wide range, so that wide-bandwidth operation is available through digital control. This chapter introduces the overall structure of the Digital-Controlled Oscillator (DCO), the coarse-search cell, the fine-search cell and a new scheme for better fine-search resolution. The next section introduces the Digital-to-Time Converter (DTC) and the wide-bandwidth oscillator scheme. Reducing the jitter of the DCO scheme is another important topic that we discuss.

The DCO block of the ADPLL consists of a Digital-to-Time Converter (DTC), a Coarse-search cell and a Fine-search cell, as shown in Fig. 3.2(a).

The DTC is designed for wide-bandwidth operation. The Coarse-search cell performs the coarse search, in which the search step is large. The Fine-search cell performs the fine search, in which the search step is small. When the clock signal rises, the delay path through the DTC, the Coarse-search cell and the Fine-search cell sets the duration of the “high” part of the clock cycle (TH).

On the other hand, when the clock signal falls, the delay path sets the duration of the “low” part of the clock cycle (TL). The “high” and “low” parts together make up the whole clock cycle, so the clock period T is their sum. As shown in Fig. 3.2(b), T = TL + TH. The frequency is f = 1/T. For example, if TL = 0.5 ns and TH = 0.5 ns, then T = TL + TH = 1 ns and the frequency is 1/(1 ns) = 1 GHz.
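The relation between the two half periods and the clock frequency can be checked with a minimal Python sketch. The function name and the returned duty-cycle value are illustrative additions and are not part of the design.

```python
def clock_from_half_periods(t_high_s, t_low_s):
    """Return (period, frequency, duty cycle) from TH and TL in seconds."""
    period_s = t_high_s + t_low_s      # T = TL + TH
    frequency_hz = 1.0 / period_s      # f = 1 / T
    duty_cycle = t_high_s / period_s   # fraction of the period spent "high"
    return period_s, frequency_hz, duty_cycle


period, freq, duty = clock_from_half_periods(0.5e-9, 0.5e-9)
print(f"T = {period * 1e9:.2f} ns, f = {freq / 1e9:.2f} GHz, duty = {duty:.0%}")
```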

The basic concept of the oscillator is shown in Fig. 3.3(a). Each of m1, m2 and m3 has an intrinsic delay. The intrinsic delay is caused by the resistance of the MOS channel and the capacitance of the load. Fig. 3.3(b) shows the intrinsic delay of an inverter.

When the signal rises, current flows from VDD and charges the load capacitor, giving a time delay τ_rise = Rp · Cg. Conversely, when the signal falls, current discharges from the load capacitor to ground, giving a time delay τ_fall = Rn · Cg.

τ_rise = Rp · Cg (3.1)

τ_fall = Rn · Cg (3.2)

The oscillator consists of 3 (or any odd number of) inverters. We replace one inverter with a NAND cell to implement “Clock_EN”, the signal that enables the oscillator. Suppose “Clock_OUT” rises at time t_r. After the intrinsic delay d1 of m1, “N1” falls to the inverse phase. After the further intrinsic delay d2 of m2, “N2” rises to “H”. m3 contributes a further delay d3. Finally, “Clock_OUT” falls at time t_f = t_r + d1 + d2 + d3. “Clock_OUT” therefore stays at “H” for the time d1 + d2 + d3, which is the “H” part of the clock period. It stays at “L” in the same way, which gives the “L” part of the clock period. The balance between the “H” and “L” parts is very important, because it significantly influences the performance of the DCO.

In this oscillator scheme, the “H” part of the period has two contributions: the fall delays of m1 and m3, and the rise delay of m2. Likewise, the “L” part has two contributions: the rise delays of m1 and m3, and the fall delay of m2. Obtaining the same delay at the rising edge and the falling edge is a central problem in oscillator design.

TH = τ_fall_m1 + τ_rise_m2 + τ_fall_m3 (3.3)
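To see why unequal rise and fall delays unbalance the duty cycle, the following Python sketch evaluates (3.1) to (3.3) for a three-stage ring. The expression for TL follows the description above (rise of m1 and m3, fall of m2). The resistance and capacitance values are hypothetical round numbers chosen only for illustration, not values from this design.

```python
def stage_delays(r_p, r_n, c_g):
    """Return (tau_rise, tau_fall) of one stage, following (3.1) and (3.2)."""
    return r_p * c_g, r_n * c_g


def ring_half_periods(stages):
    """stages holds one (Rp, Rn, Cg) tuple for each of m1, m2 and m3."""
    (rise1, fall1), (rise2, fall2), (rise3, fall3) = [stage_delays(*s) for s in stages]
    t_high = fall1 + rise2 + fall3   # (3.3)
    t_low = rise1 + fall2 + rise3    # rise of m1 and m3, fall of m2
    return t_high, t_low


# Hypothetical values: Rp = 2 kOhm, Rn = 1 kOhm, Cg = 10 fF for every stage.
stages = [(2e3, 1e3, 10e-15)] * 3
t_high, t_low = ring_half_periods(stages)
print(f"TH = {t_high * 1e12:.1f} ps, TL = {t_low * 1e12:.1f} ps")
```

With identical stages, TH and TL match only when Rp equals Rn, which is one way to read the requirement for equal rising and falling delays.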

We can add an additional gate capacitor to the delay chain. For example, we add an NMOS capacitor at net “N1”, as shown in Fig. 3.3(c). Charging and discharging the additional capacitor takes more rise and fall time, so the delay d1 becomes longer than in the original scheme. By adding different capacitors we can build several circuits with different delay times, and therefore different clock frequencies. If the additional capacitance is large, the cell is a Coarse-search cell. If it is small, the cell is a Fine-search cell. The Coarse-search cell is easy to implement, but the Fine-search cell depends on how small the capacitance can be made. The smallest delay step sets the best resolution of the design.

The NMOS capacitance as a function of gate voltage is shown in Fig. 3.4. The plot has three regions.

(1) Accumulation occurs when Vg < Vfb. The negative charge on the gate attracts holes from the substrate to the oxide-semiconductor interface.

(2) Depletion occurs when Vfb < Vg < Vth. The positive charge on the gate pushes the mobile holes into the substrate. The semiconductor is therefore depleted of mobile carriers at the interface, and a negative charge, due to the ionized acceptor ions, is left in the space-charge region.

(3) Inversion occurs when Vg > Vth. A negatively charged inversion layer exists at the oxide-semiconductor interface. The inversion layer is formed by minority carriers, which are attracted to the interface by the positive gate voltage.

From the plot we find that the capacitance differs between VN1 = 0 V and VN1 = 1 V. At VN1 = 0 V the NMOS capacitor is in the depletion region, but at VN1 = 1 V it is in the inversion region. As a result, the rise delay differs from the fall delay.

In circuit design, we usually add a PMOS as well to get better delay performance. In the DCO proposed here, the delay cells for the search steps are the inverter, NAND and OAI. The gate capacitance of each of these cells includes both NMOS and PMOS. NOR and AOI are also good delay cells for the search steps.

3.2 The proposed DCO architecture

There are 28 control bits (setting[27:0]) for DCO frequency adjustment.

Reducing jitter is another important topic. We will discuss a scheme integrated with analog design for reducing jitter later.

As shown in Fig. 3.5, the DCO receives the control bits from the DCO controller and generates the output clock. The frequency of this clock is divided and compared with the reference clock. The result of the comparison produces a new setting of the control bits and hence a new DCO clock, and this repeats until both phase and frequency are locked.
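The compare-and-update loop can be illustrated with a deliberately simplified Python sketch. It is not the controller of this design: it assumes a hypothetical linear DCO model with a single control code, compares periods directly instead of dividing the clock, and moves the code by one step per comparison. The base period, the step size and all names are assumptions made for illustration.

```python
def dco_period_ps(code, base_ps=2000.0, step_ps=30.0):
    """Hypothetical linear DCO model: the period grows by step_ps per code."""
    return base_ps + code * step_ps


def lock(target_period_ps, n_codes=64, step_ps=30.0):
    """Step the control code up or down until the period error is within half a step."""
    code = 0
    for _ in range(n_codes):
        error_ps = dco_period_ps(code, step_ps=step_ps) - target_period_ps
        if abs(error_ps) <= step_ps / 2:
            break                      # locked to within half a step
        if error_ps < 0 and code < n_codes - 1:
            code += 1                  # DCO too fast: lengthen the period
        elif error_ps > 0 and code > 0:
            code -= 1                  # DCO too slow: shorten the period
        else:
            break                      # target is outside the tuning range
    return code


print(lock(2100.0))
```

The loop stops either when the remaining error is below half a step or when the code saturates at the edge of the tuning range.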

The proposed DCO architecture is shown in Fig. 3.6. The scheme has three major blocks. The first block is the Digital-to-Time Converter for wide-bandwidth control. The second block is the Coarse-tune cell, which consists of inverters and NAND gates. Its six control bits are decoded into 64 stages. The stages start at C0[5:0] = [000000] (en[30:0] = [00000…000]) and end at C0[5:0] = [111111] (en[30:0] = [11111…111]). The delay of each stage is 30 ps. The third block is the Fine-tune cell. It includes four sub-cells and is controlled by 13 bits. The first sub-cell is controlled by 3 bits, decoded into 8 stages, from F1[6:0] = [0000000] to F1[6:0] = [1111111]. The delay of each stage is 4 ps. The second sub-cell is controlled by 5 bits, decoded into 32 stages, from F2[30:0] = [00000…000] to F2[30:0] = [11111…111]. The delay of each stage is 0.15 ps. The third sub-cell is controlled by 2 bits, decoded into 4 stages: F3[2:0] = [000], [001], [011] and [111]. The delay of each stage is 0.05 ps. The last sub-cell is controlled by 3 bits, decoded into 8 stages, from F4[6:0] = [0000000] to F4[6:0] = [1111111].

Table 3.1

Cell name      Inverter and NAND   NAND   NAND      OAI       AOI
Control bits   6                   3      5         2         3
Stages         64                  8      32        4         8
Resolution     30 ps               4 ps   0.15 ps   0.05 ps   0.01 ps
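The decoding of the fine-tune control bits and the resulting delay can be sketched in Python as follows. The sketch assumes that every step of a cell adds exactly its nominal resolution and that the contributions of the cells simply add up. Both are idealizations, and the names are hypothetical. The DTC word C1[8:0] is left out.

```python
# Nominal step size per cell in ps, taken from Table 3.1.
RESOLUTION_PS = {"C0": 30.0, "F1": 4.0, "F2": 0.15, "F3": 0.05, "F4": 0.01}


def to_thermometer(value, n_bits):
    """Decode a binary setting into a thermometer code of 2**n_bits - 1 bits."""
    width = 2 ** n_bits - 1
    if not 0 <= value <= width:
        raise ValueError("setting out of range")
    return "0" * (width - value) + "1" * value


def added_delay_ps(settings):
    """Ideal added delay for a dict of step counts per cell."""
    return sum(RESOLUTION_PS[name] * settings.get(name, 0) for name in RESOLUTION_PS)


print("F3 codes:", [to_thermometer(v, 2) for v in range(4)])
print("added delay:", added_delay_ps({"C0": 1, "F1": 2}), "ps")
```

For the 2-bit third sub-cell this reproduces the four codes [000], [001], [011] and [111] listed above.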

3.3 Cell-based Digital Controlled Oscillator

3.3.1 Digital to time converter (DTC)

The DTC consists of an internal CLK generator, a counter and a multiplexer, as shown in Fig. 3.7.

The CLK generator generates the internal clock for the timing counter. The “in” signal comes from “DCOCLK”, as shown in Fig. 3.6. After it has travelled through all the DCO delay cells, this signal is opposite to the signal “out_reg”. At this time the signal “EN” goes “H” and enables the CLK generator, and the counter starts counting. When the counter finishes, at the time that counter_reg[8:0] = C1[8:0], the signal “out_reg” inverts to match “in” through the multiplexer and continues on its next trip. The process described above covers half a clock cycle. C1[8:0] determines the count number and hence the clock period.

Each increment of C1[8:0] lengthens the clock period by a multiple of the internal clock period, 1.513 ns. The minimum clock period is 2.16 ns, when C1[8:0]=0. The maximum clock period is 15148.5 ns, when C1[8:0]=511. That is, the frequency ranges from 0.66 MHz to 460 MHz, with a period step of 3.026 ns (one internal clock period in each half cycle). The simulated clock period as a function of C1[8:0] is shown in Fig. 3.8(a), and the minimum and maximum clock periods are shown in Fig. 3.8(b).
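The dependence of the clock period on C1[8:0] can be approximated with a simple linear model. The Python sketch below assumes that the period is the 2.16 ns minimum plus 3.026 ns per count, i.e. one internal clock period of 1.513 ns in each half cycle. This is an idealized reading of the step size quoted above, not a replacement for the simulated figures, and the names are hypothetical.

```python
T_MIN_NS = 2.16        # clock period at C1[8:0] = 0
T_INTERNAL_NS = 1.513  # internal clock period of the DTC


def dco_period_ns(c1):
    """Idealized clock period: each count adds one internal period per half cycle."""
    if not 0 <= c1 <= 511:
        raise ValueError("C1[8:0] must be in 0..511")
    return T_MIN_NS + 2 * T_INTERNAL_NS * c1


def dco_frequency_mhz(c1):
    """Frequency in MHz for a period given in ns."""
    return 1e3 / dco_period_ns(c1)


for c1 in (0, 1, 511):
    print(f"C1 = {c1:3d}: T = {dco_period_ns(c1):9.3f} ns, f = {dco_frequency_mhz(c1):8.3f} MHz")
```

Under this linear assumption, C1 = 0 gives about 463 MHz and C1 = 511 gives a period of roughly 1548 ns, or about 0.65 MHz, which is close to the frequency range quoted above.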

The minimum clock period of 2.16 ns consists of the intrinsic delays of the DTC, the Coarse-tune cell and the Fine-tune cell when C1[8:0]=0. As C1[8:0] increases, the DCO clock period will [...] resolution. Wide bandwidth and high resolution are our design targets. The advantage of the wide bandwidth of the DTC is that it can overcome PVT variation.

3.3.2 Coarse-tune cell

The Coarse-tune cell is shown in Fig. 3.7. The delay chain consists of inverters and NAND gates. The delay is selected by multiplexers controlled by C0[5:0]. The cell also includes a power-saving scheme: the unused part of the delay chain is disabled by en[30:0]. This reduces transient current, which lowers both the current consumption and the noise that transient current causes. The design is therefore both power-saving and low-jitter.

From Fig. 3.7 we can see that when mux[5:0]=[000000], wire_l1_[0] is selected, and the delay is the intrinsic delay of 6 multiplexers and one inverter. When mux[5:0]=[000001], inv_[1] is selected, and the total delay is one inverter delay and one NAND gate delay plus the intrinsic delay. When mux[5:0]=[000010], wire_l1_[2] is selected, and the delay is two NAND gate delays. The delay from wire_l1_[0] to inv_[1] is 30 ps, and that from inv_[1] to wire_l1_[2] is also 30 ps. That is, each stage adds 30 ps. There are 64 stages in total, controlled by 6 bits. The first stage is the intrinsic delay of 450 ps. The second stage delay is 480 ps, and the last stage delay is 450 ps plus 1.89 ns. The simulation result of the coarse tune is shown in Fig. 3.8. The minimum coarse delay is 450 ps, when mux[5:0]=[000000], and the maximum is 450 ps plus 1.89 ns, when mux[5:0]=[111111].
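The coarse-tune figures above follow from a simple linear relation, which the Python sketch below makes explicit. It assumes a perfectly uniform 30 ps step on top of the 450 ps intrinsic delay. The function name is hypothetical.

```python
INTRINSIC_PS = 450  # delay at mux[5:0] = [000000]
STEP_PS = 30        # delay added per coarse stage


def coarse_delay_ps(mux_bits):
    """Coarse-tune delay in ps for a 6-bit mux setting given as a bit string."""
    if len(mux_bits) != 6 or set(mux_bits) - {"0", "1"}:
        raise ValueError("expected a 6-bit string such as '000001'")
    return INTRINSIC_PS + STEP_PS * int(mux_bits, 2)


for setting in ("000000", "000001", "111111"):
    print(f"mux[5:0]=[{setting}] -> {coarse_delay_ps(setting)} ps")
```

The last setting gives 450 ps + 63 × 30 ps = 450 ps + 1.89 ns, in agreement with the maximum delay stated above.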

3.3.3 Fine-tune cell

We use four Fine-tune cells. The advantage of this scheme is that it needs fewer cell units. Because the Coarse-tune resolution is 30 ps, a single-level design would need 3000 Fine-tune cell units with a resolution of 0.01 ps to span one coarse step. In our scheme, only 48 Fine-tune cell units are needed to meet the target, as shown in Table 3.2, which saves chip area. Using 3000 F4 cell units would require 12 control bits from the DCO controller. Our scheme needs 13 control bits, because each stage has extra tolerance for process, voltage and temperature variation. This penalty is a good trade for the large saving in Fine-tune unit cells.
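The unit and bit counts in this comparison can be reproduced with a short Python sketch. The helper names are hypothetical. The bit count of the single-level design is taken as the number of bits needed to represent settings from 0 up to the unit count.

```python
def flat_design(coarse_step_ps, lsb_ps):
    """Units and control bits if one cell type had to span a whole coarse step."""
    units = round(coarse_step_ps / lsb_ps)
    bits = units.bit_length()  # bits needed to count from 0 to units
    return units, bits


def hierarchical_design(bits_per_sub_cell):
    """Units and control bits when each sub-cell is thermometer-decoded."""
    units = sum(2 ** b - 1 for b in bits_per_sub_cell)
    return units, sum(bits_per_sub_cell)


print("single level:", flat_design(30.0, 0.01))
print("four sub-cells:", hierarchical_design((3, 5, 2, 3)))
```

The four sub-cells need 7 + 31 + 3 + 7 = 48 units at the cost of one extra control bit.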

Table 3.2 Fine-search

                     First sub-cell   Second sub-cell   Third sub-cell   Fourth sub-cell
Cell units number    7                31                3                7
Start control set    F1[0000000]      F2[000…000]       F3[000]          F4[0000000]
Second control set   F1[0000001]      F2[000…001]       F3[001]          F4[0000001]

The Fine-tune cells are 2-NAND (F1), NAND (F2), OAI (F3) and AOI (F4). The capacitance is set by switches, which are controlled by the control bits F1[6:0], F2[30:0], F3[2:0] and F4[6:0]. As described above, a change of capacitance means a change of delay time. By switching the control bits, we adjust the delay time to meet the target frequency and phase. This is the fine tuning of the DCO clock. As shown in Fig. 3.9, the F1, F2, F3 and F4 Fine-tune cells act as additional capacitors. The NAND gates used as the F1 and F2 Fine-tune cells work by adjusting ΔC and ΔR. The OAI used as the F3 Fine-tune cell and the AOI used as the F4 Fine-tune cell work by adjusting ΔC, ΔR and ΔI.

The F1 Fine-tune cell consists of two parallel NAND gates. The first NAND gate is equivalent to a PMOS with W/L = 3.36u/0.08u and an NMOS with W/L = 2.4u/0.08u. The second NAND gate is equivalent to a PMOS with W/L = 1.26u/0.08u and an NMOS with W/L = 0.9u/0.08u. As shown in Fig. 3.10(a), each step of the F1 Fine-tune delay is 4 ps. From F1[6:0]=[0000000] to F1[6:0]=[0000001], the delay increases by 4 ps. From F1[6:0]=[0000000] to F1[6:0]=[1111111], the delay increases by 28 ps. For the next increasing step, the Coarse-tune cell adds one 30 ps step and F1[6:0] resets to [0000000]. Here we keep some tolerance for process, voltage and temperature variation: the F1 Fine-tune cell design can support a Coarse-tune delay step of 32 ps. Considering PVT issues, we keep a tolerance of about 80%~90% between the Coarse-tune cell and the Fine-tune cell.

The F2 Fine-tune cell is a NAND gate with PMOS W/L = 0.36u/0.08u and NMOS W/L = 0.24u/0.08u. The simulation result is shown in Fig. 3.10(b). Each step of the F2 Fine-tune cell is 0.15 ps. The waveform is plotted every 4 steps, that is, every 0.6 ps. From F2[30:0]=[000…000] to F2[30:0]=[111…111] there are 32 stages in total, and the delay increases by 4.65 ps. This is more than the 4 ps step of the F1 Fine-tune cell, which again provides PVT tolerance.

The simulation result of the F3 Fine-tune cell is shown in Fig. 3.12(a). Each step of the F3 Fine-tune cell is 0.05 ps. There are 4 stages in total, from F3[2:0]=[000] to F3[2:0]=[111]. The delay from F3[000] to F3[001] is 0.05 ps, and the delay from F3[000] to F3[111] is 0.15 ps. For the next increasing step, the F2 Fine-tune cell adds one step of 0.15 ps. Here we also consider PVT variation and keep some tolerance.

We use an AOI gate as the F4 Fine-tune cell, as shown in Fig. 3.13. The cell contains 4 PMOS/NMOS pairs. Net “A1” is connected to the delay line, net “B” is forced to “L” and net “C” is forced to “H”. When net “A2” switches from “L” to “H”, the delay increases by 0.01 ps. This is the best resolution we found among the AOI and OAI cells.

The simulation result of the F4 Fine-tune cell is shown in Fig. 3.14. Each step of the delay cell is 0.01 ps. The simulation waveform and measurement result show that the delay increases steadily by 0.01 ps per step, which indicates reliable Fine-tune performance. This also means the jitter of this design can be reduced to under 0.01 ps. From F4[6:0]=[0000000] to [0000001], the delay increases by 0.01 ps. From F4[6:0]=[0000000] to [1111111], the delay increases by 0.07 ps, which is more than the 0.05 ps resolution of the F3 Fine-tune cell. This again provides PVT tolerance.
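The overlap between adjacent tuning stages can be summarized by comparing the full range of each sub-cell with the step of the next coarser stage. The Python sketch below does this with the nominal unit counts and step sizes from Tables 3.1 and 3.2. It assumes ideal, uniform steps, and the names are hypothetical.

```python
# (name, number of cell units, nominal step in ps), from Tables 3.1 and 3.2.
SUB_CELLS = [("F1", 7, 4.0), ("F2", 31, 0.15), ("F3", 3, 0.05), ("F4", 7, 0.01)]
COARSE_STEP_PS = 30.0


def full_range_ps(units, step_ps):
    """Delay added when every unit of a sub-cell is switched on."""
    return units * step_ps


def coverage_ratios():
    """Full range of each sub-cell divided by the step of the next coarser stage."""
    ratios = {}
    coarser_step_ps = COARSE_STEP_PS
    for name, units, step_ps in SUB_CELLS:
        ratios[name] = full_range_ps(units, step_ps) / coarser_step_ps
        coarser_step_ps = step_ps
    return ratios


for name, ratio in coverage_ratios().items():
    print(f"{name}: full range / coarser step = {ratio:.3f}")
```

A ratio of 1 or more means that the sub-cell alone spans one step of the stage above it. A ratio above 1 corresponds to the PVT margin mentioned for F2 and F4.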

Using AOI and OAI gates as Fine-tune cells is an interesting approach. The simulation results in Table 3.3 show that AOI and OAI are reliable as Fine-tune cells.

Table. 3.3 The resolution of AOI/OAI as Fine-tune cell for different switching

(001)--->(101) (000)--->(101) (101--->111) AOI

(A2,B,C) 0.01ps 0.026ps

(A1,B,C) 0.05ps OAI

(A1,A2,C) 0.032ps

3.3.4 Performance comparison with other work

The performance comparison in Table 3.4 [1][8][20][5] shows that our design is much better than the other works in bandwidth and LSB resolution. Among the other works, the widest output range is 45~450 MHz, which is narrower than the 0.66~460 MHz of our design (i.e. 2.16 ns ~ 15148.5 ns).

The best Fine-tune resolution among the other designs is 5 ps, whereas ours is 0.01 ps, which is much better. Fine-tune resolution is related to jitter performance: better resolution gives better jitter performance. Another advantage of our design is low power. The power consumption is 1.15 mW at 460 MHz.

3.4 Current mirror for jitter reduction

We use a model to simulate the jitter induced by ground bounce due to transient current. We assume a 400 MHz transient current drawn by 2000 other inverters of the ADPLL, which produces a bounce noise of 0.44 V. In our experiment, a current mirror sinking 10 mA in the DCO scheme reduces the bounce noise to 0.187 V.

The experiment is shown in Fig. 3.15. We consider the RLC effect of the package. The transient current causes power and ground bounce, which is the major source of jitter.

Table 3.4 Performance comparison with other work

                    This work          TCSII '05        ISQED '02       ISSCC '03         JSSC '03
Process             0.09um @ 1V        0.35um @ 3.3V    0.5um @ 1.65V   0.6um @ 5V        0.35um @ 3.3V
DCO word length     28 bits            15 bits          8 bits          10 bits           12 bits
LSB resolution      0.01 ps            1.55 ps          40 ps           10 ps             5 ps
DCO output range    0.66~460MHz        18~214MHz        150MHz          10~12.5MHz        45~450MHz
Power consumption   1.15mW @ 460 MHz   18mW @ 200 MHz   1mW @ 150 MHz   164mW @ 100 MHz   100mW @ 450 MHz

The simulation result is shown in Fig. 3.16. The power and ground bounce influences the clock period and is the major source of jitter. Applying a current mirror to the DCO reduces the bounce. We find that the internal clock of the DTC improves when the current mirror is applied. The internal clock of DTC is

