Chapter 2 Background Study
2.1 Technique of CDR
The basic method of CDR is shown in Figure 2.1. Noisy, asynchronous data is received from the channel, so a CDR is needed to recover the clock and resample the data.
The main function of a CDR is to synchronize and reconstruct the data and to reduce the accumulated jitter.
Figure 2.1: Basic method of CDR (blocks: serial data input, CDR circuit, recovered clock, D flip-flop decision circuit, recovered data)
Generally speaking, a CDR has two basic architectures: the PLL based CDR and the oversampling based CDR, which build a CDR on different concepts. We discuss these two types in the following paragraphs.
PLL based CDR
Figure 2.2 shows the basic architecture of the PLL based CDR[2]. It differs from a traditional PLL in two ways: a retiming circuit implemented by a D flip-flop (DFF) is added, and random data instead of a reference clock is used as the input.
The PLL based CDR comprises a phase frequency detector (PFD), a charge pump (CP), a low-pass filter (LPF), a voltage-controlled oscillator (VCO), and a retiming circuit. The PFD detects the timing difference between the input data and the sampling clock. The CP and LPF convert the PFD output into the VCO control voltage and filter out high-frequency noise. Finally, the VCO adjusts the sampling clock according to the control voltage until the sampling clock and the input data have no phase difference.
Figure 2.2: PLL based CDR
There is another similar type of CDR architecture, called the DLL based CDR[2].
It replaces the VCO with a voltage-controlled delay line (VCDL). Unlike the VCO, the VCDL adjusts the phase rather than the frequency.
Oversampling Based CDR
Figure 2.3 shows the block diagram of the oversampling CDR[3]. A multi-phase clock generator produces the multi-phase clock, and a number of parallel samplers sample the input data with these phases. The sampler outputs are stored. The bit boundary detection block locates the data boundary with a majority voter. According to the detected boundary, we obtain the optimal clock phase for sampling the data. The data selector, implemented by a multiplexer, then decides which sampled result is the recovered data.
Figure 2.3: Oversampling based CDR
Figure 2.4 is an example of the oversampling technique. In this example, the data is sampled by three phases in every bit time. Each pair of neighboring samples is exclusive-ORed to detect a data transition. The transitions are accumulated per phase position, and the position with the maximum count is taken as the bit boundary.
In this example, the maximum accumulated transition count is six, which places the transition edge between phase 1 and phase 2. The best phase to sample is therefore phase 3[4].
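The boundary search in this example can be sketched in a few lines of Python. This is only an illustration, not the circuit of [3] or [4]: the function name `pick_sampling_phase`, the sample stream, and the rule of picking the phase two positions after the edge are assumptions made for a three-phase example.

```python
def pick_sampling_phase(samples, phases=3):
    """Return (transition counts, edge index, best phase index), all 0-based.

    Assumption: samples are ordered P1, P2, P3, P1, ... and a transition
    found between sample i and sample i+1 is credited to position i % phases.
    """
    counts = [0] * phases
    for i in range(len(samples) - 1):
        # XOR of neighboring samples is 1 exactly when the data toggled.
        counts[i % phases] += samples[i] ^ samples[i + 1]
    edge = counts.index(max(counts))            # the edge lies just after this phase
    best = (edge + (phases + 1) // 2) % phases  # assumed rule: phase farthest from the edge
    return counts, edge, best


# Hypothetical stream: six bit times, data toggling between P1 and P2.
samples = [0, 1, 1, 1, 0, 0] * 3
counts, edge, best = pick_sampling_phase(samples)
print(counts, "edge after P%d," % (edge + 1), "sample with P%d" % (best + 1))
```

For this hypothetical stream the accumulated counts are 6, 0, 0, so the edge is placed between P1 and P2 and P3 is picked, as in the example of Figure 2.4.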
Figure 2.4: Timing diagram of the oversampling
Comparison
Two different types of CDR architecture were presented in the previous paragraphs. Table 2.1 compares the PLL based CDRs with the oversampling based CDRs. Generally speaking, PLL based CDRs take an analog approach and oversampling based CDRs take a digital approach. Oversampling based CDRs are therefore easier to redesign when the process technology changes, which is one of their important advantages. Table 2.1 compares several features to show the advantages and drawbacks of the two types.
(Figure 2.4 annotations: sampling phases P1, P2, P3 in every bit time; accumulated transitions 6, 0, 0; transition edge judgment P1-P2; phase picked P3)
                    PLL based CDR    Oversampling based CDR
Resolution          High             Low
Locking time        Long             Short
Noise immunity      Poor             Good
Hardware overhead   Small            Large
Table 2.1: Comparison of PLL based CDR and oversampling CDR
2.2 Basics of Spread Spectrum
Many techniques have been proposed to reduce EMI, and spread spectrum is one of them. Spread spectrum uses frequency modulation to distribute the signal power over a band of frequencies. The technique is illustrated in Figure 2.5[1].
Without spreading, the total power is concentrated at a single frequency, which induces large EMI.
Spread spectrum reduces the peak energy while keeping the total energy the same. Only a small frequency variation is needed to obtain several decibels of peak reduction. In short, spread spectrum is a popular, low-cost, and efficient technique to reduce EMI.
Figure 2.5: Comparison of non-Spread spectrum and spread spectrum
Figure 2.6 shows the Serial-ATA II spread spectrum requirement for 3Gbps transceiver systems[5]. The specification uses 5000ppm down spreading with a 30~33kHz triangular profile. Under this requirement, the lowest data rate is 2.985Gbps.
Figure 2.6: Spread spectrum requirement for Serial-ATA II
Down spreading ensures that the highest data rate never exceeds the original rate, 3Gbps. The Serial-ATA specification defines a triangular modulation profile with a rate of 30~33kHz, so the data rate varies periodically with time between 3Gbps and 2.985Gbps.
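As a quick numerical check of this requirement, the following Python sketch evaluates a triangular down-spread profile. The function name `ssc_data_rate`, the 33kHz modulation rate (the upper end of the 30~33kHz range), and the choice to start the triangle at the nominal rate at t = 0 are assumptions for illustration only.

```python
def ssc_data_rate(t, f_nominal=3.0e9, spread_ppm=5000.0, f_mod=33.0e3):
    """Instantaneous data rate (bit/s) of a triangular down-spread profile.

    Assumption: the profile starts at the nominal rate at t = 0, reaches the
    lowest rate after half a modulation period, and then returns.
    """
    period = 1.0 / f_mod
    x = (t % period) / period                       # position in the period, 0..1
    tri = 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)   # triangle 0 -> 1 -> 0
    return f_nominal * (1.0 - spread_ppm * 1e-6 * tri)


lowest = ssc_data_rate(0.5 / 33.0e3)   # half a modulation period after the peak
print("highest: %.4f Gbps" % (ssc_data_rate(0.0) / 1e9))
print("lowest:  %.4f Gbps" % (lowest / 1e9))
```

With 5000ppm down spreading, the lowest rate evaluates to 3Gbps × (1 - 0.005) = 2.985Gbps.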
Chapter 3
Frequency Compensation Technique
Conventional CDRs, whether PLL based or oversampling based, have less than 1000ppm frequency tolerance. However, Serial ATA requires a 5000ppm spread ratio, so a conventional CDR exhibits very large jitter when the input data has a high frequency offset.
Therefore, we propose a frequency compensation technique to enhance the tracking ability. The main contribution of this technique is the reduction of jitter at high frequency offset.
3.1 Frequency Compensation Methodology
In a traditional CDR, it is not easy to track a large frequency variation because the bandwidth is limited. A small bandwidth induces smaller jitter, but the tracking ability is weak. Conversely, a CDR bandwidth that is too large gives good frequency tolerance but large jitter.
In order to resolve this trade-off between bandwidth and jitter, we design a frequency compensation loop. The major purpose of the frequency compensation loop is to increase the CDR bandwidth, because the frequency tolerance increases only when the bandwidth is extended.
The methodology of frequency compensation is shown in Figure 3.1. In Serial ATA, the spreading profile is a triangular waveform with a 33kHz modulation frequency. To follow the frequency changes, we detect the amount of frequency variation in each frequency compensation period, denoted Ts in this thesis. In Figure 3.1, we detect the frequency increment in section A. The frequency compensation loop then provides the same amount of frequency in section B to compensate this increment, and so on. According to this methodology, we compensate the frequency in every Ts.
Figure 3.1: The methodology of frequency compensation
3.2 The Proposed CDR Architecture
Figure 3.2 shows the proposed CDR architecture with frequency compensation.
This architecture can be separated into two parts. The first part is a phase locked loop. It comprises a phase detector, a confidence counter, a phase control, and a phase selector. The second part is the frequency compensation loop. It comprises a pulse counter, a pulse accumulator, and a frequency error compensator (FEC).
In addition, a lock detector controls the confidence counter size.
Figure 3.2: Proposed CDR architecture
Phase locked loop
The phase locked loop can be simplified as shown in Figure 3.3. The phase detector (PD) is a bang-bang phase detector, whose transfer curve is shown in Figure 3.4[7]. The PD detects the phase relationship between the input data and the recovered clock. The PD output “LEAD” means that the input data phase is earlier than the recovered clock, and “LAG” means the opposite.
Figure 3.3: Simplified phase locked loop architecture
Figure 3.4: Transfer curve of bang-bang phase detector
The confidence counter (CC) functions like a loop filter, and its size decides the equivalent bandwidth. In this thesis, the confidence counter size is N. If the accumulated input signal (Lead/Lag/Hold) exceeds N, the confidence counter produces an output, Lead_ov or Lag_ov. Lead_ov and Lag_ov are the inputs of the phase control, which consists of a coarse tune and a fine tune. The coarse tune selects two neighboring phases of the external clock, and the fine tune interpolates between them more precisely. Finally, the phase selector adjusts the phase to track the input data phase.
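A behavioral sketch of the confidence counter is given below in Python. It is a simplified model rather than the implemented circuit: the class name `ConfidenceCounter`, the symmetric thresholds at +N and -N, and the reset to zero after each output are assumptions.

```python
class ConfidenceCounter:
    """Up/down counter that filters the Lead/Lag/Hold decisions of the PD.

    Assumptions: symmetric thresholds at +n and -n, and the count is
    cleared after every Lead_ov or Lag_ov output.
    """

    def __init__(self, n):
        self.n = n
        self.count = 0

    def update(self, pd_out):
        """pd_out is +1 for Lead, -1 for Lag, 0 for Hold."""
        self.count += pd_out
        if self.count >= self.n:
            self.count = 0
            return "Lead_ov"
        if self.count <= -self.n:
            self.count = 0
            return "Lag_ov"
        return None


cc = ConfidenceCounter(n=3)
outputs = [cc.update(x) for x in (+1, +1, -1, +1, 0, +1)]
print(outputs)   # only the last input produces an output
```

In this sketch a single Lag cancels one Lead, so isolated wrong decisions caused by jitter do not move the phase, which is the loop-filter behavior described above.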
To obtain a finer phase resolution, we use the interpolation technique in the phase selector. The advantage of high resolution is jitter suppression: when the system is locked and stable, the recovered clock dithers between two adjacent phases, so the higher the phase resolution, the smaller the induced jitter.
Another advantage of this architecture is that it is implemented in digital circuits.
Frequency Compensation loop
The frequency compensation loop comprises three components. Figure 3.5 shows the block diagram of the frequency compensation loop.
Figure 3.5: Frequency compensation loop
The first component is the pulse counter. It counts the difference between the number of lead pulses and the number of lag pulses. This value represents the frequency offset within a specific time, which is the rate at which the number of compensated pulses is updated. We denote this frequency compensation period as Ts in this thesis. The second component, the pulse accumulator, accumulates the pulses in every Ts. The final component is the FEC. The FEC generates the same number of pulses as the value held in the pulse accumulator, distributed as uniformly as possible. Finally, these pulses are used as the input of the phase control to adjust the phase in the phase selector. Every pulse adjusts the recovered clock by one resolution step.
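The behavior of the three components can be sketched in Python as follows. This is a minimal model under stated assumptions, not the implemented circuit: the function names, the integer-division rule used to spread the pulses, and the clipping to one pulse per clock are all hypothetical choices.

```python
def spread_pulses(n_pulses, clocks_per_ts):
    """FEC model: place |n_pulses| pulses as uniformly as possible in one Ts.

    Returns one entry per clock: +1 (advance), -1 (retard), or 0 (no pulse).
    Assumption: at most one pulse per clock, so the count is clipped.
    """
    sign = 1 if n_pulses >= 0 else -1
    n = min(abs(n_pulses), clocks_per_ts)
    return [
        sign if ((i + 1) * n) // clocks_per_ts > (i * n) // clocks_per_ts else 0
        for i in range(clocks_per_ts)
    ]


def compensation_schedules(pulse_diffs, clocks_per_ts):
    """Pulse counter + accumulator + FEC over several periods Ts.

    pulse_diffs[k] is the (lead - lag) count measured by the pulse counter in
    period k. The schedule built from it is meant for period k + 1.
    """
    accumulated = 0
    schedules = []
    for diff in pulse_diffs:
        accumulated += diff                      # pulse accumulator
        schedules.append(spread_pulses(accumulated, clocks_per_ts))
    return schedules


schedules = compensation_schedules([2, 1], clocks_per_ts=8)
print(schedules[0])   # 2 pulses spread over 8 clocks
print(schedules[1])   # 3 pulses after accumulating the residual offset
```

Accumulating the counts means that a residual offset measured while compensation is already active adds to the existing compensation instead of replacing it.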
3.3 Confidence Counter Analysis
The first key point in this design is to decide N. Intuitively, a smaller N produces an output quickly. That is, a smaller N represents a larger equivalent bandwidth.
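The intuition can be illustrated with a simplified random-walk model in Python. Note that this is not the Markov chain analysis of this section: the model below ignores the Hold state and assumes symmetric thresholds at +N and -N, and the function name `expected_decision_time` is hypothetical. It uses the standard expected absorption time of a biased random walk.

```python
def expected_decision_time(n, p_lead):
    """Expected number of Lead/Lag inputs until the count reaches +n or -n.

    Assumptions: every input is Lead (probability p_lead, 0 < p_lead < 1)
    or Lag, the count starts at 0, and there is no Hold state.
    """
    q = 1.0 - p_lead
    if abs(p_lead - q) < 1e-12:
        return float(n * n)                # unbiased walk
    r = q / p_lead
    return n / (p_lead - q) * (1.0 - r ** n) / (1.0 + r ** n)


for n in (1, 2, 8, 32):
    print(n, round(expected_decision_time(n, p_lead=0.6), 2))
```

In this model the expected time to produce an output grows with N, so the rate at which the loop reacts, and therefore its equivalent bandwidth, falls as N increases.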
In the phase locked loop, we derive the closed-loop transfer function by the following steps:
1. Calculate the equivalent bandwidth for N of 6:
Figure 3.6: Modify the confidence counter as a Markov chain[9]
Using the first passage time, we modify (3-2) as (3-3).
Now we use (3-3) to calculate the combinations of P_ij^(n) for different n. Therefore, we establish the matrix (3-4).
Second, we use the expected value of conditional probability to calculate the time at which the confidence counter output occurs, which gives (3-5).
In (3-5), p is the probability of lead. We assume that the input jitter follows the Gaussian distribution f(x) shown in Figure 3.7. The probability of lead is then the integral of f(x) over the region in which the PD reports lead:
\[ p = \int f(x)\,dx \]
Figure 3.7: Gaussian distribution profile
In (3-5), the parameter ζ denotes the numbers enclosed by the dotted square in (3-4).
We can extend the matrix to a general form by the following derivation, which gives (3-7).
Equation (3-8) gives the expected value of the time for the confidence counter to produce an output. We can derive the equivalent bandwidth of the confidence counter from (3-8); the result is (3-9).
2. Rewrite the equivalent bandwidth in a general form:
Replacing the confidence counter size 6 in (3-8) by N, we obtain the equivalent bandwidth of the confidence counter for a general N, given in (3-10).
According to (3-10), the relationship between confidence counter size N and equivalent bandwidth is plotted in Figure 3.8.
(Figure 3.8 data labels, bandwidth in MHz for confidence counter size N = 1 to 10: 900.00, 281.25, 136.36, 80.36, 53.05, 38.09, 29.47, 24.26, 20.97, 18.73)
Figure 3.8: The relationship between N and equivalent bandwidth
From Figure 3.8, we conclude that the larger the confidence counter size, the smaller the equivalent bandwidth.
3. Phase selector transfer function:
Besides the confidence counter, the phase locked loop contains another component, the phase selector.
The phase selector adjusts the phase when the digital control code changes. Figure 3.9 shows the modeled relationship between the digital code and the output phase.
We linearize this relationship curve, so the transfer function of the phase selector, KPI, is given by (3-11):
(Figure 3.9 axes: digital code versus output phase ∆φ, in steps of 2π/16)
\[ K_{PI} = \frac{2\pi}{16} = \frac{2\pi}{L} \]  (3-11)
where L is the number of interpolation steps.
4. Phase locked loop transfer function:
From (3-10) and (3-11), the phase locked loop transfer function is derived under these assumptions:
Assumption 1: The input jitter is a Gaussian distribution.
Assumption 2: The phase selector transfer curve is linearized.
Assumption 3: Only the input jitter within ±3σ is considered.
Assumption 4: Δφ is half the resolution.
Under these assumptions, the closed-loop transfer function of the phase locked loop is given by (3-12).
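\[ H_{closed}(s) = \frac{K_{PI}}{(K_{PI} + 1) + \dfrac{s}{\omega_{c\_c}}} \]  (3-12)
where
\[ \omega_{c\_c} = 2\pi \times \left\{ \sum_{k=0}^{\infty} (N + 2k) \times \zeta_k^{*} \times \left[ Q\!\left( \frac{R/2}{J_i/6} \right) \right]^{N+k} \times \left[ 1 - Q\!\left( \frac{R/2}{J_i/6} \right) \right]^{k} \times T \right\}^{-1} \]  (3-13)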
According to (3-12) and (3-13), the phase locked loop transfer function depends on four parameters:
N is the confidence counter size.
R is the phase resolution.
J_i is the peak-to-peak input jitter.
T is the clock period.
In addition, P and ζ_k^* are given in (3-5) and (3-7).
Jitter tolerance is an important specification for CDRs. It is defined as the input jitter that a receiver must tolerate without violating the system’s BER requirement. The Serial ATA II jitter tolerance mask is shown in Figure 3.10[5]. The allowed region lies above the jitter tolerance mask. We therefore set our desired curve as the dotted line. In Figure 3.10, a phase locked loop bandwidth of 0.4MHz satisfies the jitter tolerance requirement.
Figure 3.10: Jitter tolerance of Serial ATA II and desired curve
An approximate condition to avoid increasing the BER is[2]
\[ \theta_{in} - \theta_{out} < 0.5\,\mathrm{UI} \;\Rightarrow\; \theta_{in}\,[1 - H(s)] < 0.5\,\mathrm{UI} \]  (3-14)
We can therefore express the jitter tolerance as
\[ G_{JT}(s) = \frac{0.5}{1 - H_{closed}(s)} \]  (3-15)
The relation between the jitter tolerance and the closed-loop transfer function of the phase locked loop is shown in Figure 3.11. Using the closed-loop transfer function (3-12), we calculate the equivalent bandwidth for different N. When N is 28, the equivalent bandwidth is 0.4MHz.
To simplify the circuit design, we choose N=32 as the confidence counter size.
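To show how (3-15) turns a loop bandwidth into a jitter tolerance curve, the following Python sketch evaluates it numerically. The closed-loop response used here is a generic first-order low-pass with a -3dB frequency of 0.4MHz, which is an assumption for illustration and not the transfer function (3-12); the function name `jitter_tolerance_ui` is also hypothetical.

```python
import math


def jitter_tolerance_ui(f_jitter, f_bw):
    """Jitter tolerance in UI from G_JT = 0.5 / |1 - H(j*2*pi*f)|.

    Assumption: H(s) = 1 / (1 + s / w_bw), a first-order low-pass
    closed-loop response with -3dB frequency f_bw.
    """
    s = 1j * 2.0 * math.pi * f_jitter
    w_bw = 2.0 * math.pi * f_bw
    h = 1.0 / (1.0 + s / w_bw)
    return 0.5 / abs(1.0 - h)


for f in (1e4, 1e5, 4e5, 1e6, 1e7):
    print("%.0e Hz: %.3f UI" % (f, jitter_tolerance_ui(f, 0.4e6)))
```

Under this assumed response the tolerance is large for jitter frequencies well below the loop bandwidth, where the loop tracks the jitter, and approaches 0.5UI above it.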
(Figure 3.10 axes: jitter amplitude (UI) versus jitter frequency (Hz); the desired curve is the dotted line)
Figure 3.11: The equivalent frequency response of C.C. by different size
In short, the confidence counter size in our design is 32. As a result, the equivalent bandwidth satisfies the Serial ATA II jitter tolerance requirement.
3.4 Frequency Compensation Period Determination
The second key point in our design is to determine the frequency compensation period. As shown in Figure 3.1, the spread spectrum profile is triangular, so the frequency changes at a fixed rate. In Figure 3.12, we calculate the slope of the frequency offset.
Figure 3.12: Relationship between Ts and frequency offset
According to the Serial ATA II specification[5], the modulation frequency is 33kHz and the spread ratio is 5000ppm. As a result, we can derive the equation
\[ \Delta f = \mathrm{slope} \times T_s = 330 \times T_s \]  (3-16)
where Δf is the frequency offset in ppm and Ts is in microseconds.
The longer Ts is, the larger the frequency offset accumulated within it. However, the longer Ts is, the more precisely the equivalent frequency offset can be calculated. This is a trade-off between Ts and the frequency offset. In this design, we set the maximum error tolerance to two resolution steps in Ts. We represent it as
\[ 2 \times \frac{1}{32} = \Delta f \times T_s \times f_{clk} \;\Rightarrow\; 41.68\,\mathrm{p} = \Delta f \times T_s \]  (3-17)
In this design, the number of resolution steps is 32, and Ts multiplied by fclk is the number of clock cycles in Ts. Equation (3-17) therefore states that, at the maximum frequency offset, the maximum compensation error is two resolution steps.
From (3-16) and (3-17), we can find the optimal Ts and the equivalent frequency offset. Figure 3.13 shows the results from these two equations.
(Figure 3.12 annotations: slope = 5000ppm / (1/2 × 1/33k); label: average frequency offset in previous Ts)
Figure 3.13: Determination of Ts
The optimal Ts is 0.355us. To simplify the circuit design, we choose Ts=0.341us, which corresponds to 512 clock cycles. Therefore, in every Ts, the frequency offset is 112.53ppm.
Ts                     Δf (ppm)
0.355u (533 clocks)    117.15
0.341u (512 clocks)    112.53
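The values above can be reproduced by solving (3-16) and (3-17) together. The Python sketch below does this with the numbers stated in this section; the variable names and the rule of rounding the clock count down to a power of two are assumptions made for illustration.

```python
import math

F_CLK = 1.5e9        # recovered clock frequency (Hz)
F_MOD = 33.0e3       # SSC modulation frequency (Hz)
SPREAD = 5000e-6     # spread ratio as a fraction (5000ppm)
STEPS = 32           # resolution steps per clock period
MAX_ERR_STEPS = 2    # maximum tolerated error in one Ts

# Slope of the triangular profile: full spread over half a modulation period.
# As a fraction per second this is 330, which is 330 ppm per microsecond.
slope = SPREAD / (0.5 / F_MOD)

# (3-17) with (3-16) substituted: slope * Ts^2 * f_clk = 2 / 32.
ts_opt = math.sqrt((MAX_ERR_STEPS / STEPS) / (slope * F_CLK))
clocks_opt = round(ts_opt * F_CLK)

# Assumed simplification: round the clock count down to a power of two.
clocks_pow2 = 1 << (clocks_opt.bit_length() - 1)
ts_pow2 = clocks_pow2 / F_CLK
df_pow2_ppm = slope * ts_pow2 * 1e6

print("optimal Ts = %.3f us (%d clocks)" % (ts_opt * 1e6, clocks_opt))
print("chosen  Ts = %.3f us (%d clocks), offset = %.2f ppm"
      % (ts_pow2 * 1e6, clocks_pow2, df_pow2_ppm))
```

The sketch gives about 112.6ppm for 512 clocks because it uses the unrounded period 512/1.5GHz; the 112.53ppm quoted above corresponds to Ts rounded to 0.341us.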
Chapter 4
Implementation of Clock and Data Recovery
In this chapter, we describe the details of the circuit implementation. The proposed system block diagram is shown in Figure 4.1. We use behavioral simulation to verify the design functionally and circuit-level simulation to predict its performance. Finally, the layout is shown and the test environment setup is proposed.
Figure 4.1: Proposed system block diagram
(Blocks: HRPD, variable-sized C.C., pulse counter, pulse accumulator, FEC, phase control, phase selector, and lock detector. Labeled signals: SSC_Data input at 3Gbps, 1.5G 8-phase clock, Lead_p/Lag_p, Lead_f/Lag_f, Ts, and the recovered clock.)
4.1 Building Blocks
There are eight blocks in the proposed CDR, as shown in Figure 4.1. We describe these blocks in turn.
Half Rate Phase Detector
The half rate phase detector (HRPD) detects whether each of two bit boundaries is lead, lag, or hold in one clock cycle. The HRPD needs a 4-phase clock source to sample the input bit stream. The timing is shown in Figure 4.2.
Figure 4.2: HRPD timing diagram
In Figure 4.2, P0 and P2 detect the boundaries of the input data, and P1 and P3 sample the data. The circuit implementation is shown in Figure 4.3[10]. With this operation, there are two lead, lag, or hold decisions in every clock cycle, which is why the architecture is called half rate.
The output of the HRPD has 4 bits. Lead1 and Lag1 represent the relation between the input data and the recovered clock at the first bit boundary, and Lead2 and Lag2 do the same for the second bit boundary. For example, (Lead1,Lag1,Lead2,Lag2)=1000 means that the first boundary leads the clock and the second boundary has no transition.
Figure 4.3: Half Rate Phase Detector
Next, we need an encoder to transform the HRPD output into 2’s complement format. Table 4.1 is the truth table of this transformation. If Lead1 and Lead2 are both 1, the output is 010 (+2). Conversely, if Lag1 and Lag2 are both 1, the output is 110 (-2). A positive number means lead and a negative number means lag. The boolean function of the encoder is (4-1).
The encoder is implemented in static CMOS logic.
Table 4.1: Encoder truth table
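The encoding can be sketched behaviorally in Python. Only the two cases stated in the text, 010 (+2) and 110 (-2), are taken from this thesis. Treating the output as the sum of the two boundary decisions, so that a single lead gives +1 and a lead followed by a lag gives 0, is an assumption, and the function name `encode_hrpd` is hypothetical.

```python
def encode_hrpd(lead1, lag1, lead2, lag2):
    """Encode the 4-bit HRPD output as a signed value and a 3-bit code.

    Assumption: the value is the sum of the two boundary decisions,
    counting +1 for each lead and -1 for each lag.
    """
    value = (lead1 - lag1) + (lead2 - lag2)
    code = format(value & 0b111, "03b")   # 3-bit 2's complement
    return value, code


print(encode_hrpd(1, 0, 1, 0))   # both boundaries lead
print(encode_hrpd(0, 1, 0, 1))   # both boundaries lag
print(encode_hrpd(1, 0, 0, 0))   # first boundary leads, no second transition
```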
Variable-Sized Confidence Counter
In Chapter 3, we have introduced the function of the confidence counter.
According to the analysis in Chapter 3, the confidence counter size decides the equivalent bandwidth. A smaller N gives a larger bandwidth and better tracking ability. To compensate the initial frequency offset, we set N to 2 initially. Then, N is increased to 8. Finally, N is fixed at the desired value, 32.
By adjusting the size of the confidence counter, we change the equivalent bandwidth.
Therefore, it is called a variable-sized confidence counter. Its circuit is shown in Figure 4.4.
Figure 4.4: Variable-sized confidence counter
In Figure 4.4, a 6-bit adder is designed, and SO[5:0] are its output bits. For N of 2, 8, and 32, we detect a change in SO1, SO3, and SO5, respectively. For example, when N is 2, we detect when SO1 changes its value. In Table 4.2, boldface marks the bit value being changed. When the change occurs, the variable-sized confidence counter generates an output pulse. A lead input counts as a positive value and a lag input as a negative value. When the accumulated value reaches +2, a lead output pulse is generated. When the accumulated value reaches -3, a lag output pulse is generated. The decision is therefore asymmetric.
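The asymmetry between +2 and -3 can be seen from the 2's complement bit patterns. The short Python sketch below prints the 6-bit patterns around zero and the value of SO1; the helper names `adder_bits` and `so_bit` are hypothetical, and the sketch only shows the bit values, not how the circuit detects the change.

```python
def adder_bits(value, width=6):
    """Bit string SO[width-1:0] of a 2's complement value."""
    return format(value & ((1 << width) - 1), "0{}b".format(width))


def so_bit(value, k, width=6):
    """Bit SOk of the 2's complement representation of value."""
    return ((value & ((1 << width) - 1)) >> k) & 1


for v in (2, 1, 0, -1, -2, -3):
    print("%+d" % v, adder_bits(v), "SO1 =", so_bit(v, 1))
```

Counting up, SO1 first becomes 1 at +2 (000010). Counting down, SO1 is 1 at -1 (111111) and -2 (111110) and returns to 0 only at -3 (111101), which matches the +2 and -3 thresholds stated above.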
Table 4.2: 6-bit adder output in variable sized confidence counter (partial)
The lock detector decides when the confidence counter size is adjusted.
We introduce the behavior and circuit of the lock detector later.
With a clock frequency of 1.5GHz, the confidence counter must complete its function within 666.67ps. The 6-bit adder in Figure 4.4 is therefore designed with as small a delay as possible. The resulting 6-bit adder, shown in Figure 4.5, is a carry-look-ahead (CLA) structure[11]. To reduce the propagation delay, we implement its gates in pseudo-NMOS logic rather than static logic. In simulation, the critical path of this 6-bit adder is 460ps.
Figure 4.5: The structure of 6-bit adder
Phase Control
The phase control has two input sources. One is the phase locked loop control, and the other is from the frequency compensation loop. Functionally, the phase