Chapter 3 Phase-Locked Loop
3.3 CIRCUIT IMPLEMENTATION
3.3.4 Voltage-Controlled Oscillator
To achieve low output clock jitter, the delay buffer used in the voltage-controlled oscillator (VCO) should have low sensitivity to, and high rejection of, supply and substrate noise. The VCO used in this thesis is based on differential delay stages with symmetric loads and replica-feedback biasing. Its building blocks are a four-stage ring oscillator and a self-biased replica-feedback bias generator. Fig. 3-9 shows the four-stage VCO and its delay cell.
Fig 3-9 Schematic of the four stages VCO and the delay cell
As shown in Fig. 3-9, the buffer stage contains a source-coupled pair whose resistive loads each consist of a diode-connected PMOS device in shunt with an equally sized PMOS device. The control voltage, Vbp, is the bias voltage for the PMOS device. It is also used to generate the bias voltage for the NMOS current source, and it controls the delay of the buffer stage. To provide a bias current that is independent of supply and substrate noise, the bias voltage of the NMOS current source, Vbn, is continuously adjusted. Fig. 3-10 shows the I-V characteristics of the symmetric load.
Fig 3-10 I-V curve of the symmetric load
In principle, to achieve high rejection of supply and substrate noise, the load of the differential pair should have a linear I-V characteristic. In practice, this is difficult to achieve with MOS devices. The symmetric load, however, cancels the first-order coupling of common-mode noise. Therefore, although it is nonlinear, the symmetric load provides high dynamic supply noise immunity. The control voltage, Vbp, is the bias voltage for the PMOS device. To provide a bias current that is independent of static supply noise, the bias voltage of the NMOS current source, Vbn, is continuously adjusted. As the supply voltage changes, the drain voltage of the NMOS current source also changes.
However, the gate bias is adjusted by the replica-feedback bias generator to keep the output current constant. This effectively raises the output resistance of the NMOS current source, and hence the static supply noise rejection is greatly improved.
Based on the analysis of the I-V curve, it can be shown that the effective resistance of a symmetric load, Reff, is directly proportional to the small-signal resistance at the ends of the swing range, which is simply one over the transconductance gm of one of the two equally sized PMOS devices biased at Vctrl. Therefore, the buffer delay is
$$t_d = R_{eff}\,C_{eff} \propto \frac{C_{eff}}{g_m}$$
where Ceff is the effective buffer output capacitance. The drain current for one of the two equally sized devices biased at Vctrl is
$$I_d = \frac{k_p}{2}\left[(V_{DD} - V_{ctrl}) - V_{tp}\right]^2 \tag{3-2}$$
Taking the derivative with respect to Vctrl, the transconductance gm is given by
$$g_m = k_p\left[(V_{DD} - V_{ctrl}) - V_{tp}\right] \tag{3-3}$$
The buffer delay is then given by
$$t_d \propto \frac{C_{eff}}{k_p\left[(V_{DD} - V_{ctrl}) - V_{tp}\right]}$$
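Since the oscillation frequency of the ring is inversely proportional to the buffer delay, the frequency varies linearly with Vctrl. As a result, Kvco is independent of the buffer bias current and the VCO has first-order tuning linearity.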
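The tuning linearity can be checked numerically. The following is a minimal sketch, assuming the usual ring-oscillator relation f = 1/(2·N·td) for N stages and taking td = Ceff/gm with a proportionality constant of one. The values of kp, Ceff and Vtp are hypothetical placeholders, not parameters from this design.

```python
def vco_frequency(vctrl, vdd=3.3, vtp=0.7, kp=200e-6, ceff=50e-15, stages=4):
    """Oscillation frequency of an N-stage ring built from symmetric-load buffers.

    Assumptions (not taken from the thesis): f = 1 / (2 * N * td),
    td = Ceff / gm, and the hypothetical values of vtp, kp and ceff.
    """
    gm = kp * ((vdd - vctrl) - vtp)   # transconductance of one load device
    td = ceff / gm                    # buffer delay
    return 1.0 / (2 * stages * td)


def kvco(v1, v2, **kwargs):
    """Average VCO gain (Hz/V) between two control voltages."""
    return (vco_frequency(v2, **kwargs) - vco_frequency(v1, **kwargs)) / (v2 - v1)


if __name__ == "__main__":
    # The slope is the same in both ranges, i.e. the tuning curve is a straight line.
    print(kvco(0.8, 1.2))
    print(kvco(1.6, 2.4))
```

Under these assumptions the slope equals -kp/(2·N·Ceff), which contains no bias-current term. The negative sign only reflects the (VDD - Vctrl) dependence of the drain current expression.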
The bias generator of the VCO delay cell is shown in Fig. 3-11. It produces the output bias voltages Vbp and Vbn from the input signal Vctrl. Its primary function is to continuously adjust the bias current of the VCO delay buffers so that the buffer stages have the correct lower swing limit, Vctrl. As a result, it establishes a current that is held constant and independent of the supply voltage. The bias generator consists of a PMOS source-coupled differential pair, a half-buffer replica, and a control voltage buffer. The differential amplifier is in effect a unity-gain buffer that forces the voltage of node Va in Fig. 3-11 to equal Vctrl, a condition required for correct symmetric load swing limits, and provides the bias voltage Vbn for the NMOS current source. In addition, the bias voltage Vbn is dynamically adjusted by the differential amplifier to increase the supply noise immunity. With the half-buffer replica, the net result is that the output current of the NMOS current source is established by the load element and is independent of the supply voltage. If the supply voltage changes, the amplifier adjusts to keep the swing and the bias current constant. Because the differential amplifier uses a self-biased architecture, it has two stable states, one of which is unbiased. A start-up circuit is therefore needed to bias the amplifier at power-up.
Fig 3-11 Schematic of self-biased replica-feedback bias generator
Because the differential amplifier and the half-buffer replica form a two-stage negative feedback loop, the frequency response of the loop must be taken into consideration.
There are two poles in the loop: one at the amplifier output and the other at the half-buffer replica output. Since the pole at the amplifier output is the dominant one, the capacitive load of the NMOS current source gates in the VCO buffer chain moves it toward the origin, which increases the phase margin of the loop. Moreover, in order to track any supply and substrate noise that affects the VCO jitter performance, the bandwidth of the self-biased circuit is usually set equal to the operating frequency of the VCO. The bias circuit also provides a buffered version of the control voltage Vctrl through an extra control voltage buffer. This isolates Vctrl from capacitive coupling in the VCO buffer chain.
The PLL used in this thesis needs to generate four phases for the transmitter multiplexer and for the receiver samplers. Therefore, the VCO uses four delay buffer stages with an output frequency of 400 MHz. The simulated transfer curve of the VCO is shown in Fig. 3-12. The supply voltage is 3.3 V. For Vctrl between 0.8 V and 2.4 V, the gain of the VCO is 457 MHz/V.
Fig 3-12 Transfer Curve of the VCO
The differential oscillator output is converted to the 50% duty cycle single-ended signal used as input to the phase-frequency detector with the differential-to-single-ended converter shown in Fig. 3-13 and the feed forward type duty-cycle corrector shown in Fig. 3-14. The two differential amplifiers of the differential-to-single-ended converter use the same current source bias voltage, Vbn, generated by the self-biased replica-feedback bias generator for the VCO. According to Vbn, the circuit corrects the input common-mode voltage level and provides signal amplification.
Fig 3-13 Schematic of differential-to-single-ended converter
Fig. 3-14 Schematic of feed forward type duty-cycle corrector and its timing diagram
The duty-cycle corrector follows the differential-to-single-ended converter to ensure that the duty cycle of the VCO output is 50%. The signal P+, selected from the multiphase signals, turns on M1 and M2 and charges the output node clk+ of the duty-cycle corrector almost instantaneously, because the discharge path of node clk+ has already been turned off by the signal P-. The signal P-, which is also selected from the multiphase signals, is the one whose rising edge is shifted by 180° in phase from that of P+. Similarly, the signal P- rapidly discharges the node clk+ and delivers the desired 50% duty-cycle signal. Since this duty-cycle correction circuit consists of only two transmission gates and two inverters, its area is minimal and its power consumption is negligible. Digital buffers are added at the output to improve the driving ability for the following stages.
3.3.5 Divider
Because the output frequency of the VCO is 400 MHz and the input reference frequency is 100 MHz, a divide-by-four circuit is used. A TSPC D flip-flop with its inverted output connected to its D input is used as a divide-by-two circuit, as shown in Fig. 3-15. In this circuit, the driving capability of the input clock must be checked to ensure correct operation. Two divide-by-two circuits are then cascaded to form a divide-by-four circuit. Unfortunately, an asynchronous counter accumulates jitter stage by stage. A synchronous counter is therefore used at the last stage to re-sample the clock, which eliminates the jitter accumulated in the asynchronous counter, as shown in Fig. 3-16.
Fig. 3-15 Schematic of TSPC Asynchronous Divided-by-two circuit
Fig. 3-16 Divider composed of asynchronous and synchronous counters and its timing diagram
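The divide-by-four behavior of two cascaded toggle stages can be illustrated with a small behavioral model. This is only a logic-level sketch: the class and function names are made up for illustration, and it models neither the TSPC transistor circuit nor the re-sampling synchronous stage.

```python
class ToggleFlipFlop:
    """D flip-flop with its inverted output fed back to D: toggles on each rising edge."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk):
        # Toggle only on a 0 -> 1 transition of the clock input.
        if clk == 1 and self._prev_clk == 0:
            self.q ^= 1
        self._prev_clk = clk
        return self.q


def divide_by_four(clk_samples):
    """Cascade two divide-by-two stages (asynchronous ripple divider)."""
    ff1, ff2 = ToggleFlipFlop(), ToggleFlipFlop()
    out = []
    for clk in clk_samples:
        q1 = ff1.step(clk)   # first stage: clock / 2
        q2 = ff2.step(q1)    # second stage is clocked by the first: clock / 4
        out.append(q2)
    return out


def count_rising_edges(samples):
    """Number of 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


if __name__ == "__main__":
    vco_clk = [0, 1] * 16                 # 16 cycles of the 400 MHz clock
    fb_clk = divide_by_four(vco_clk)      # 100 MHz feedback clock
    print(count_rising_edges(vco_clk), count_rising_edges(fb_clk))
```

Because the second stage is clocked by the output of the first, its edges inherit the delay and jitter of both stages, which is why the text re-samples the divided clock with the input clock at the last stage.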
3.4 PLL Parameter Design
Because the charge pump has switching characteristics, the PLL is inherently a discrete-time system, and continuous-time analysis is not strictly applicable.
However, when the loop bandwidth is much lower than the reference frequency, the s-domain model can still be used to gain a thorough understanding of the negative feedback loop. Fig. 3-17 shows the linear model of the PLL.
Fig. 3-17 Linear Model of PLL
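Assume the PLL is in the locked state. The PFD and CP have a gain of Ip/2π (A/rad), the LF has a transfer function F(s) (V/A), the VCO has a gain of Kvco (Hz/V), and the feedback factor is 1/N. Because phase is the integral of frequency, the conversion gain of the VCO becomes 2πKvco/s (rad/s-V). Based on these definitions and the PLL linear model, the open-loop gain of the PLL can be represented as
$$G(s) = \frac{I_p}{2\pi}\,F(s)\,\frac{2\pi K_{vco}}{s}\,\frac{1}{N} = \frac{I_p K_{vco} F(s)}{N s}$$
The closed-loop transfer function of the PLL is given by
$$H(s) = \frac{\theta_{out}(s)}{\theta_{in}(s)} = \frac{N\,G(s)}{1 + G(s)} = \frac{I_p K_{vco} F(s)}{s + I_p K_{vco} F(s)/N}$$
Therefore, when F(s) is approximately constant over the band of interest, the 3-dB bandwidth is
$$\omega_{-3dB} = \frac{I_p K_{vco} F(s)}{N}$$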
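From the analysis of the LF, we know that the shunt capacitance C2 is typically much smaller than C1. Therefore, we can neglect C2 and use the classical two-pole system and the second-order linear model of the PLL to analyze the transient response. With F(s) = R1 + 1/(sC1), the closed-loop transfer function can be derived as
$$H(s) = \frac{\dfrac{I_p K_{vco}}{C_1}\,(1 + sR_1C_1)}{s^2 + \dfrac{I_p K_{vco} R_1}{N}\,s + \dfrac{I_p K_{vco}}{N C_1}}$$
This equation can be compared to the classical two-pole system transfer function
$$H(s) = \frac{N\,(2\zeta\omega_n s + \omega_n^2)}{s^2 + 2\zeta\omega_n s + \omega_n^2}$$
Therefore, the natural frequency ωn and the damping factor ζ can be derived as
$$\omega_n = \sqrt{\frac{I_p K_{vco}}{N C_1}}, \qquad \zeta = \frac{R_1}{2}\sqrt{\frac{I_p K_{vco} C_1}{N}} = \frac{\omega_n R_1 C_1}{2}$$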
In a PLL design, the frequency noise of the VCO can be the dominant noise source affecting the phase noise performance. As will be seen in a later section, the VCO noise is shaped by a high-pass characteristic. Therefore, a large loop bandwidth is preferable because it improves the tracking ability. The choice of the damping factor ζ is a trade-off between acquisition time and step-response stability. A larger ζ leads to a longer acquisition time, whereas a smaller ζ may cause ringing in the step response or even instability.
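Next, we use the loop bandwidth and the phase margin to determine the component values of the loop filter. Including C2, the loop filter has a zero ωz and a pole ωp, and the open-loop gain becomes
$$G(s) = \frac{I_p K_{vco}}{N s^2 (C_1 + C_2)}\cdot\frac{1 + s/\omega_z}{1 + s/\omega_p}, \qquad \omega_z = \frac{1}{R_1 C_1}, \quad \omega_p = \frac{C_1 + C_2}{R_1 C_1 C_2}$$
The phase term is determined by the pole and zero of the loop filter, so that the phase margin at the crossover frequency ωc is
$$PM = \tan^{-1}\!\left(\frac{\omega_c}{\omega_z}\right) - \tan^{-1}\!\left(\frac{\omega_c}{\omega_p}\right)$$
By setting the derivative of the phase margin to zero, the phase margin is maximized when the loop bandwidth is set to the geometric mean of the pole and zero:
$$\omega_c = \sqrt{\omega_z\,\omega_p}$$
The loop bandwidth (BW) can now be written as
$$BW = \omega_c = \frac{I_p K_{vco} R_1}{N}\left(\frac{C_1}{C_1 + C_2}\right)$$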
The design flow of a third-order PLL can be derived from equations above. The design flow can be summarized as follows:
1. Determine Kvco by measuring VCO test keys, by simulating the VCO used in the design, or by referring to the data sheet of the commercial VCO employed.
2. Depending on the desired noise and transient performance, determine the loop bandwidth BW. Usually, the BW is less than 1/10 of the reference clock frequency.
3. If the filter is off-chip, set Ip to around 100 μA to 1 mA. If an on-chip filter is employed, decrease the value of Ip so that a reasonable trade-off between chip area and charge pump current is reached.
4. Determine the nominal value of N according to the target system.
5. Select the required PM specification.
6. With BW, Ip, PM, N, and Kvco determined, R1 can be calculated.
7. Calculate the value of C1 with C1=1/R1ωz.
8. Calculate the value of C2.
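The design flow above can be sketched as a short calculation. This is a minimal illustration, assuming a passive filter made of R1 in series with C1 and shunted by C2, with ωz = 1/(R1·C1), ωp = ωz·(1 + C1/C2), the crossover placed at sqrt(ωz·ωp), and an open-loop gain of Ip·Kvco·F(s)/(N·s) with Kvco in Hz/V. Kvco = 457 MHz/V, N = 4 and PM = 70° are taken from the text. Ip = 50 μA and BW = 2π·5 MHz are hypothetical values chosen only for the example, and the function names are illustrative.

```python
import cmath
import math


def design_loop_filter(kvco, bw, ip, n, pm_deg):
    """Return (R1, C1, C2) for a given loop bandwidth (rad/s) and phase margin.

    Assumed relations (see lead-in): wz = 1/(R1*C1), wp = wz*(1 + C1/C2),
    crossover at sqrt(wz*wp), open-loop gain Ip*Kvco*F(s)/(N*s).
    """
    pm = math.radians(pm_deg)
    sqrt_b = math.tan(pm) + 1.0 / math.cos(pm)   # sqrt(wp/wz) that yields this PM
    b = sqrt_b ** 2                              # b = wp/wz = 1 + C1/C2
    wz = bw / sqrt_b                             # zero placed below the crossover
    r1 = bw * n / (ip * kvco) * b / (b - 1.0)    # step 6: R1 from BW, Ip, PM, N, Kvco
    c1 = 1.0 / (r1 * wz)                         # step 7: C1 = 1/(R1*wz)
    c2 = c1 / (b - 1.0)                          # step 8: C2 from the pole/zero ratio
    return r1, c1, c2


def open_loop_gain(w, kvco, ip, n, r1, c1, c2):
    """Open-loop gain at angular frequency w for the R1-C1 || C2 filter."""
    s = 1j * w
    f = (1 + s * r1 * c1) / (s * (c1 + c2) * (1 + s * r1 * c1 * c2 / (c1 + c2)))
    return ip * kvco * f / (n * s)


if __name__ == "__main__":
    kvco, n, pm_deg = 457e6, 4, 70.0             # values quoted in the text
    ip, bw = 50e-6, 2 * math.pi * 5e6            # hypothetical example values
    r1, c1, c2 = design_loop_filter(kvco, bw, ip, n, pm_deg)
    g = open_loop_gain(bw, kvco, ip, n, r1, c1, c2)
    print(r1, c1, c2)
    print(abs(g), 180.0 + math.degrees(cmath.phase(g)))   # unity gain and PM at BW
```

Evaluating the open-loop gain at BW is a quick self-check: the magnitude should be one and the phase should sit PM degrees above -180°.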
The parameters used in the PLL are listed in Table 3-1. Fig. 3-18 shows the open-loop frequency response of the PLL. This curve gives a phase margin of approximately 70°. Fig. 3-19 shows the eight even-spaced phases of frequency 400MHz. Fig. 3-20 shows the simulation of the eight output clock signals of the PLL.
Technology TSMC 0.35µm 2P4M CMOS
Table 3-1 Parameters of the PLL
Fig. 3-18 Open Loop gain simulation of the PLL
Fig. 3-19 Simulation of the four output clock signals of the PLL
Fig. 3-20 Simulation of the eight output clock signals of the PLL
3.5 PLL Noise Analysis and Stability
Timing jitter reduces the maximum timing margin of the transmitter and degrades the performance of the high-speed serial link. The output jitter of the PLL is contributed by many different noise sources, as shown in Fig. 3-21, where θin(s) is the reference noise, i_n(s) is the PFD and CP noise, Vn(s) is the LF noise, and θn(s) is the VCO noise.
Fig. 3-21 Linear model of PLL with different noise sources
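These noise sources introduce phase fluctuations, or timing jitter in the time domain. Using closed-loop analysis with the open-loop gain G(s) = Ip·Kvco·F(s)/(N·s), the transfer functions from the different noise sources to the output phase can be derived as
$$H_{in}(s) = \frac{\theta_{out}}{\theta_{in}} = \frac{N\,G(s)}{1 + G(s)}, \qquad H_{pfd\_cp}(s) = \frac{\theta_{out}}{i_n} = \frac{2\pi}{I_p}\cdot\frac{N\,G(s)}{1 + G(s)}$$
$$H_{LF}(s) = \frac{\theta_{out}}{V_n} = \frac{2\pi K_{vco}/s}{1 + G(s)}, \qquad H_{vco}(s) = \frac{\theta_{out}}{\theta_n} = \frac{1}{1 + G(s)}$$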
The noise transfer functions have different characteristics. Hin(s) and Hpfd_cp(s) are low-pass functions, HLF(s) is a band-pass function, and Hvco(s) is a high-pass function. Based on this analysis, the loop bandwidth of the PLL should be maximized so that the high-pass function suppresses as much of the VCO-induced timing jitter as possible. The maximum natural frequency ωn of the PLL is restricted by the input reference clock frequency ωin. From the analysis of the PLL, the stability limit can be derived as
$$\omega_n^2 < \frac{\omega_{in}^2}{\pi\,(R_1 C_1\,\omega_{in} + \pi)}$$
A larger loop bandwidth also means that more phase noise from the input clock is transferred to the output. However, this is not a problem when the input is a clean clock source.
Chapter 4 Transmitter
4.1 Architecture of Transmitter
Fig. 4-1 Block diagram of type 1 transmitter
Fig. 4-2 Block diagram of type 2 transmitter
The data input comes from a PRBS (pseudo random bit sequence) generator. The data process circuit pre-skews the data before feeding them into the multiplexer. The pre-skew of the parallel data is shown in Fig. 4-3. Fig. 4-1 and Fig. 4-2 show the block diagrams of the type 1 and type 2 transmitter architectures. The two types differ in the clock process circuit and the clock process delay circuit. The type 1 transmitter sends a 100 MHz clock to the receiver, as shown in Fig. 4-4. The type 2 transmitter sends an 800 MHz clock to the receiver, as shown in Fig. 4-5.
Fig. 4-3 Pre-skew of parallel data D0 to D3 (labeled intervals: 0.625 ns and 2.5 ns)
Fig. 4-4 Type 1 clock (100 MHz)
Fig. 4-5 Type 2 clock (800 MHz)
The transmitter is composed of a PRBS circuit, a PLL, a 4-to-1 multiplexer, a clock process circuit, and a data and clock driver. The PLL proposed in Chapter 3 produces 400 MHz clock signals with eight evenly spaced phases. Using a 4:1 multiplexer, four low-speed parallel data channels are serialized on four evenly spaced phases of the 400 MHz clock, which gives a bit rate of 1.6 Gbps and reduces the frequency requirement of the timing circuits and the digital logic. Only four of the evenly spaced phases are used by the 4:1 MUX. The other four are used to generate the 800 MHz clock and to pre-skew the data.
For testing, a pseudo random bit sequence (PRBS) generator is used to produce the data pattern. Through the data and clock driver, the data stream is transmitted with a nominal swing of 200 mV. The following sections describe the detailed circuits of the function blocks in the transmitter architecture.
4.2 Pseudo Random Bit Sequence (PRBS)
Fig. 4-6 block diagram of Pseudo Random Bit Sequence (PRBS)
Fig. 4-7 PRBS delay cell circuit
The pseudo random bit sequence (PRBS), whose generator is shown in Fig. 4-6, is widely used for testing communication systems. Fig. 4-7 shows the circuit implementation of the D flip-flop delay cell used in the PRBS circuit. The delay cells are connected in series, so each delay cell drives the next one, and the output of the XOR gate generates the new data bit. The pattern repeats every 2^7 - 1 = 127 clock cycles. Note that if the initial condition is all zeros, the delay cells remain in a degenerate state. Therefore, a SET signal must be used to avoid this state. The XOR logic is the speed-critical part of the circuit. The outputs are then used as the four parallel data inputs of the transmitter.
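The 127-cycle period and the degenerate all-zero state can be reproduced with a behavioral model of a seven-stage shift register. The feedback taps are not stated in the text, so this sketch assumes the common PRBS7 polynomial x^7 + x^6 + 1 (XOR of the last two stages). The function names are illustrative.

```python
def prbs7_step(state):
    """Advance a 7-stage shift register by one clock; return (new_state, new_bit)."""
    # Assumed taps: XOR of stage 7 and stage 6 (polynomial x^7 + x^6 + 1).
    new_bit = ((state >> 6) ^ (state >> 5)) & 1
    new_state = ((state << 1) | new_bit) & 0x7F
    return new_state, new_bit


def prbs7_period(seed):
    """Count the clocks needed for the register to return to the seed state."""
    state, steps = seed, 0
    while True:
        state, _ = prbs7_step(state)
        steps += 1
        if state == seed:
            return steps


def prbs7_bits(seed, n):
    """Generate n output bits starting from the given seed."""
    state, bits = seed, []
    for _ in range(n):
        state, bit = prbs7_step(state)
        bits.append(bit)
    return bits


if __name__ == "__main__":
    print(prbs7_period(0x7F))   # register preset to all ones by SET
    print(prbs7_period(0x00))   # all-zero state never leaves itself
```

Presetting the register to all ones models the SET signal. Starting from all zeros, the XOR output stays at zero, so the register is stuck, which is the degenerate state mentioned above.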
4.3 Four-to-One Multiplexer
Fig. 4-8 Timing diagram of 4:1 multiplexer
The multiplexer is used to serialize the parallel data channels D0~D3. When the transmitter sends the data stream at 1.6 Gbps, the PLL must produce four phases at 400 MHz. It generates the required phases clk0, clk2, clk4, and clk6. The other clock phases are used to generate the 800 MHz clock for the TYPE 2 transmitter. The relationship between the input data D0~D3 and the clocks (clk0, clk2, clk4, and clk6) is shown in Fig. 4-8. For example, in the interval between the rising edge of clk0 and the falling edge of clk6, the input signal D0 drives the multiplexer output.
To implement this timing, the multiplexer shown in Fig. 4-9 is used to serialize the four parallel data channels D0~D3. A high multiplexer fan-in may become the bottleneck, because the achievable speed decreases as the fan-in grows. This speed limitation is not an inherent property of the process technology but of the circuit topology. Therefore, 2-1 MUX stages are used, as shown in Fig. 4-10 and Fig. 4-11. The MUX delay buffer is introduced in Section 4.3.2.
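The selection windows can be illustrated with an idealized phase model. The text only states the window for D0 (clk0 rising to clk6 falling); the windows for D1 to D3 in this sketch are assumed by symmetry, and the clocks are assumed to have 50% duty cycle and 90° spacing. The function names are illustrative.

```python
def clk_high(phase_deg, angle_deg):
    """Ideal 50% duty-cycle clock: high for 180 degrees after its rising edge."""
    return ((angle_deg - phase_deg) % 360) < 180


def mux_select(angle_deg):
    """Index of the data channel driving the output at a given clock angle."""
    clk0, clk2, clk4, clk6 = (clk_high(p, angle_deg) for p in (0, 90, 180, 270))
    if clk0 and clk6:
        return 0   # stated in the text: clk0 rising to clk6 falling
    if clk2 and clk0:
        return 1   # assumed by symmetry
    if clk4 and clk2:
        return 2   # assumed by symmetry
    return 3       # remaining window: clk6 and clk4 both high


def serialize(words):
    """Serialize a list of (D0, D1, D2, D3) tuples, one tuple per 400 MHz cycle."""
    stream = []
    for word in words:
        for angle in (45, 135, 225, 315):   # middle of each quarter period
            stream.append(word[mux_select(angle)])
    return stream


if __name__ == "__main__":
    f_clk = 400e6
    bit_time = 1.0 / (4 * f_clk)            # four bits per clock period
    print(bit_time, 1.0 / bit_time)         # bit time in seconds, bit rate in bit/s
    print(serialize([(1, 0, 1, 1), (0, 0, 1, 0)]))
```

Each channel owns one quarter of the 2.5 ns clock period, which gives a 0.625 ns bit time and the 1.6 Gbps serial rate.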
Fig. 4-9 Block diagram of 4-1 multiplexer
Fig. 4-10 Block of 2-1 MUX (inputs In1, In1b, In2, In2b; clocks clk, clkb)
Fig. 4-11 Schematic of 2-1 MUX (outputs out, outb)
4.3.1 Data Pre-skew to 4:1 Multiplexer
Fig. 4-12 Timing diagram of Pre-skew
To ensure that each first-level multiplexer selects its input data while the data are stable and correct, the parallel data channels D0~D3 are pre-skewed before they reach the multiplexer. If the transition edges of the clock and the input data occur at approximately the same time, the selected data is indeterminate and takes some time to settle. This increases the output data jitter of the transmitter. Fig. 4-12 shows the timing diagram of the pre-skew.
To achieve this, some of the input data must be shifted before being applied to the two-level multiplexer.
4.3.2 Mux Delay
As shown in Fig. 4-1 and Fig. 4-2, the clock is transmitted along with the data pattern. The delays of the data pattern and the clock must be the same for the clock to be reliable.
Two circuits with the same architecture have the same delay. Therefore, the clock path is designed to be as long as the data path. The multiplexer is used to serialize the parallel data channels D0~D3. Because the data pass through a two-level MUX, two stages of MUX delay buffers are added in the clock path, as shown in Fig. 4-13.
Fig. 4-13 schematic of MUX delay
4.4 Clock Process Circuit
Because code modulation is usually applied to the data pattern, the clock usually does not need to be as fast as the data to match the spectrum of the channel. In the TYPE 1 transmitter, a 100 MHz clock is transmitted, as shown in Fig. 4-4. In the critical case, a 1.6 Gbps data pattern alternates between zero and one (0 1 0 1 0 1 ...), which is equivalent to an 800 MHz clock.
Thus 800 MHz is the fastest clock needed to accompany a 1.6 Gbps data pattern. In the TYPE 2 transmitter, an 800 MHz clock is transmitted, as shown in Fig. 4-5.
TYPE 1 Transmitter:
In the TYPE 1 transmitter, a 100 MHz clock is used to convey the phase relationship between clock and data, as shown in Fig. 4-4. Clk0 in Fig. 4-8 is used to generate the 100 MHz clock for the receiver. Because clk0 is 400 MHz and the clock for the receiver is 100 MHz, a