CHAPTER 1 INTRODUCTION
1.2 Thesis Organization
This thesis comprises five chapters. This chapter presents the research motivation.
Chapter 2 describes the background study. We introduce transceivers and their basic concepts. In addition, we compare three different types of CDR and discuss design considerations.
In Chapter 3, we describe the CDR in the receiver. A 2.5 Gbps CDR using an all-digital approach is proposed. The bandwidth of the confidence counter for a frequency offset of 5000 ppm is discussed.
In Chapter 4, we describe the remaining blocks of the transceiver. First, we choose a shift-register-type serializer and deserializer. Second, we choose a modified LVDS driver to meet our design target. Third, we implement the receiver front end using an all-digital approach.
Finally, Chapter 5 concludes this thesis and discusses future development.
Chapter 2
Background Study
2.1 Basic Serial Link
In a point-to-point serial link system, we can classify systems as mesochronous or plesiochronous by how the sampling clocks are derived. In a mesochronous system, the transmitter and receiver operate at the same frequency, but the phase relationship between them is unknown [3]. In a plesiochronous system, by contrast, the transmitter and receiver have similar but slightly different clock frequencies. Usually, the clock recovery mechanism in a mesochronous system is simpler than that in a plesiochronous system because only phase locking is necessary.
A generalized model of a serial link is illustrated in Figure 2-1. It consists of a transmitter, a channel, and a receiver. The transmitter serializes the parallel data and delivers the synchronized data into the channel. The receiver receives the serial data and recovers its timing. Finally, the serial data is converted back into parallel data.
In practice, the transmitter drives a HIGH or LOW analog voltage onto the channel with an output-voltage swing determined by the system specification. The receiver on the other end of the channel restores the signal to the original digital information. To recover the bits from the signal, the analog waveform is amplified and sampled. An additional circuit, the timing-recovery circuit, then places the sampling strobe properly.
Fig. 2-1 A generalized model of a serial link
2.1.1 Introduction of Jitter
Jitter, also called timing jitter, is phase noise expressed in the time domain. It expresses the time difference between the expected transition of the signal and the actual transition.
Jitter can be divided into two types. One is deterministic jitter, the predictable component of jitter. The other is random jitter, which comprises the remaining components. Random Jitter (RJ) is related to thermal, flicker, and shot noise sources. Deterministic Jitter (DJ) is related to crosstalk, ISI, Duty-Cycle Distortion (DCD), word-synchronized distortion due to imperfections within a data serializer, and other bounded jitter sources. In a CDR system, we are usually concerned with two specifications that show the effect of jitter.
2.1.2 Jitter Tolerance
Jitter tolerance is defined as the input jitter a receiver must tolerate without violating the system BER specification. We can test tolerance compliance by adding sinusoidal jitter at various frequencies, with a peak-to-peak amplitude greater than the mask, to the data input and observing the bit error rate. Figure 2-2 shows the jitter tolerance mask for the OC-192 specification.
Fig. 2-2 Jitter tolerance mask.
2.1.3 Jitter Transfer
Jitter transfer is defined as the ratio of the jitter on the output of a device to the jitter applied to its input, as a function of jitter frequency. Jitter transfer is important because it quantifies the jitter accumulation performance of data retiming devices.
We can test jitter transfer by adding sinusoidal jitter at various frequencies and observing the resulting jitter at the CDR output. Figure 2-3 shows the jitter transfer for the OC-192 specification.
Fig. 2-3 Jitter transfer mask
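As a small illustration of the ratio defined in this section, jitter transfer is commonly expressed in decibels as 20·log10 of the output-to-input jitter amplitude. The sketch below assumes that convention; the function name and the measurement values are hypothetical and are not taken from the OC-192 mask.

```python
import math


def jitter_transfer_db(jitter_in_ui, jitter_out_ui):
    """Jitter transfer at one jitter frequency, in dB (20*log10 of the amplitude ratio)."""
    if jitter_in_ui <= 0 or jitter_out_ui <= 0:
        raise ValueError("jitter amplitudes must be positive")
    return 20.0 * math.log10(jitter_out_ui / jitter_in_ui)


# Hypothetical test points: (jitter frequency in Hz, input UIpp, output UIpp).
measurements = [(1e5, 0.40, 0.40), (1e6, 0.40, 0.20), (1e7, 0.40, 0.04)]
for freq_hz, j_in, j_out in measurements:
    print(f"{freq_hz:.0e} Hz: {jitter_transfer_db(j_in, j_out):+.2f} dB")
```

Sweeping the jitter frequency and plotting these values against the mask is the usual way to check compliance.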
2.1.4 Bit Error Rate
In data transmission, the Bit Error Rate (BER) is the ratio of the number of erroneous bits to the total number of bits transmitted:

$$\mathrm{BER} = \frac{\text{number of erroneous bits}}{\text{number of transmitted bits}}$$

BER is a figure of merit for link performance. It also represents the reliability of the link. Too high a BER may indicate that a slower data rate would actually shorten the overall transmission time for a given amount of transmitted data, because fewer bits would need to be resent. As a result, the BER is specified in many industrial specifications such as IEEE 802.x, Gigabit Ethernet, SONET, OC-48, and OC-192.
Several factors increase the BER. First, noise on the transmitted signal and noise induced in the receiver cause errors. Second, ISI also causes errors because of the characteristics of circuit operations.
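The BER definition in this section can be illustrated with a short sketch that compares a transmitted and a received bit sequence. The function name and the two example sequences are hypothetical.

```python
def bit_error_rate(sent, received):
    """Ratio of erroneous bits to transmitted bits for two equal-length bit sequences."""
    if not sent or len(sent) != len(received):
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


# Hypothetical example: 2 of 10 bits are corrupted by the channel.
sent = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
received = [0, 1, 0, 0, 1, 0, 0, 1, 1, 1]
print(bit_error_rate(sent, received))
```

In practice, a meaningful BER estimate requires far more bits than this, since link specifications typically target very small error ratios.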
2.2 Techniques of Timing Recovery
Figure 2-4 shows the functionality of a CDR. A clock recovery circuit in a receiver is used to reconstruct the clock. Because the data received at the receiver are asynchronous and noisy, a clock must be extracted for synchronous operation. The data also need to be retimed so that jitter accumulated during transmission can be removed.
Fig. 2-4 Functionality of clock and data recovery
2.2.1 PLL Based Clock and Data Recovery Circuit
The basic architecture of a PLL-based CDR is shown in Figure 2-5. It is a feedback system that operates like a traditional PLL. Random data, instead of a reference clock, is used as the input signal. A retimer added to the system reconstructs the input data. The output of the PLL-based CDR is directly aligned to the incoming data, with the clock phase located at the center of the input data eye.
Fig. 2-5 PLL-based CDR
A PLL-based CDR consists of a Phase Frequency Detector (PFD), a Charge Pump (CP), a Low-Pass Filter (LPF), a Voltage-Controlled Oscillator (VCO), and a retimer or decision circuit.
The Phase Detector (PD) detects the phase difference between the sampling clock and the input data. It must tolerate missing pulses, since random data does not have a transition in every bit period.
There are two types of PD: Hogge’s PD (linear type) and the Alexander PD (bang-bang type) [4] [5].
Figure 2-6 shows the basic Hogge’s PD and its transfer function. It contains two DFFs and two XOR gates. The output of Hogge’s PD indicates both the amount and the polarity of the phase difference.
Fig. 2-6 The basic Hogge’s PD
Figure 2-7 shows the basic Alexander PD and its transfer function. It contains four DFFs and two XOR gates. The result of phase detection only indicates the polarity, that is, whether the clock is leading or lagging. The amount of phase difference is not detected. Thus, jitter is produced at every phase detection, because a correction is applied even when the phase error is small.
Fig. 2-7 The basic Alexander PD
The phase difference is converted into the VCO control voltage. The high-frequency noise in this control voltage is filtered out by the loop filter. The loop adjusts the frequency of the voltage-controlled oscillator through the control voltage (Vctrl) until its phase is aligned with the input data. When the phases of the data input and the VCO output match, the VCO output provides the sampling phase to the decision circuit for retiming the input data.
For a feedback system, stability is a major concern. The tracking bandwidth is limited by the stability of the feedback system. In general, the loop bandwidth of the PLL is set to 1/20 to 1/40 of the reference clock frequency. This limits the PLL tracking bandwidth and its ability to track clock jitter.
Another similar architecture is the delay-locked loop (DLL) type. It replaces the VCO with a voltage-controlled delay line (VCDL), which is composed of delay cells. The phase of the reference clock is tracked by controlling the delay of the VCDL.
2.2.2 Oversampling Based Clock and Data Recovery Circuit
The oversampling block diagram is shown in Figure 2-8. The input data is sampled within a single bit period by multiple phases generated by the clock generator. At least three samples per bit are required. Therefore, if the input frequency changes, the sampling clock must also change.
Fig. 2-8 Oversampling-based CDR
After oversampling, the transitions in the data must be detected using digital logic. The algorithm employed in this work uses the transition position to determine the bit-window boundary and then picks the optimum sampling point. This method essentially employs a feed-forward loop, so a high tracking rate is possible without the loop bandwidth problem.
For example, data sampled by three phases per bit time is shown in Figure 2-9. The boundary is decided from a portion of the sampled stream. First, neighboring samples are XORed. Then the transitions are accumulated. The position with the highest accumulated count is the data transition edge. We can then choose the phase farthest from that edge, P3 in Figure 2-9, as the sampling phase [6].
Fig. 2-9 Timing diagram of the oversampling (rows: input data, sampling phases P1 to P3, sampled value, indicated transition, accumulated transition 6/0/0, transition edge judgment P1-P2, phase picked P3)
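The XOR-and-accumulate procedure described above can be sketched in a few lines. This is an illustrative model, not the circuit in [6]: the function name is hypothetical, phases are indexed from 0 (P1 is index 0), the picked phase is assumed to be the one in the middle of the bit window, and the sample stream is modeled on the values in Figure 2-9.

```python
def pick_sampling_phase(samples, phases_per_bit=3):
    """Return (transition counts, edge index, picked phase index).

    counts[k] counts transitions seen between phase k and the following phase.
    """
    counts = [0] * phases_per_bit
    for i in range(len(samples) - 1):
        # XOR of neighboring samples marks a transition.
        if samples[i] ^ samples[i + 1]:
            counts[i % phases_per_bit] += 1
    edge = counts.index(max(counts))
    # The bit window starts right after the edge; pick the phase in its middle.
    picked = (edge + 1 + (phases_per_bit - 1) // 2) % phases_per_bit
    return counts, edge, picked


# Stream modeled on Figure 2-9: runs of six equal samples (two bits at 3x oversampling).
samples = [0] + [1] * 6 + [0] * 6 + [1] * 6 + [0] * 6 + [1] * 6 + [0] * 2
counts, edge, picked = pick_sampling_phase(samples)
print(counts, f"edge between P{edge + 1} and P{(edge + 1) % 3 + 1}", f"pick P{picked + 1}")
```

For this stream all six transitions fall between P1 and P2, so the sketch selects P3, the phase farthest from the edge.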
2.2.3 Comparisons
Table 2-1 shows the differences between the PLL-based CDR and the oversampling-based CDR. The PLL type can reach the exact phase, so the output jitter is small, but the tracking ability is limited by the loop bandwidth. In the oversampling type, the output jitter is determined by the resolution of neighboring clock phases. The advantages of the oversampling type are simpler implementation and rejection of very high-frequency jitter, up to a limit set by the clock rate. However, because of the oversampling method, the hardware overhead increases.
              PLL Based CDR                                                            Oversampling CDR
Lock time     Depends on loop bandwidth                                                Within a few data bits
Jitter        Depends on analog block’s noise, loop bandwidth, mismatch & variations   Depends on multiphase clock resolution
Scalability   No                                                                       Yes
Table 2-1 Comparison of different types of CDR
2.3 Techniques of Serializer
A serializer combines many low-speed parallel random data streams into one high-speed serial data stream. This technique effectively reduces the number of buses needed for interconnection and thus alleviates electromagnetic interference.
For example, most transmitters incorporate a 16-to-1 serializer. It allows the 16 inputs to be much slower than the output and hence simplifies the design of the package. This chapter describes the design of an all-digital CMOS data serializer that is capable of running at a 2.5 Gbps data rate and can be adopted in optical transmitters.
2.3.1 Tree-Type Serializer
Figure 2-10 shows the data serializer with tree-type architecture. It is based on a set of 2:1 multiplexer circuits arranged in a binary tree structure. To convert 16 parallel inputs into one serial stream, the control clock of each stage runs at half the rate of the clock used in the next stage. A conventional 2:1 serializer is triggered on every clock transition. The latch in the final stage acts as a retimer, so jitter caused by the previous stages does not accumulate. Therefore, a clean control clock in the tree-type serializer is very important. However, it is difficult to produce the global clock for high-speed operation.
The data serializer can achieve high-speed operation because the RC load of every multiplexer is small [7]. The power consumption is low because only a small number of devices operate at high speed.
Fig. 2-10 Tree-type serializer architecture
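The binary-tree structure can be modeled behaviorally as a recursion over 2:1 multiplexers, each of which interleaves two streams and thereby doubles the bit rate. The sketch below is an idealized software model with hypothetical function names; it ignores clocking, latching, and retiming.

```python
def mux2(a, b):
    """2:1 multiplexer: alternate between two equal-rate streams, doubling the rate."""
    out = []
    for bit_a, bit_b in zip(a, b):
        out.extend((bit_a, bit_b))
    return out


def tree_serialize(streams):
    """Serialize 2**k parallel bit streams with a binary tree of 2:1 multiplexers."""
    if not streams:
        raise ValueError("at least one input stream is required")
    if len(streams) == 1:
        return list(streams[0])
    if len(streams) % 2:
        raise ValueError("number of inputs must be a power of two")
    # Even-indexed and odd-indexed inputs feed the two halves of the tree.
    return mux2(tree_serialize(streams[0::2]), tree_serialize(streams[1::2]))


# Two hypothetical 16-bit words presented on 16 parallel inputs.
word0 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
word1 = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0]
inputs = [[b0, b1] for b0, b1 in zip(word0, word1)]
print(tree_serialize(inputs) == word0 + word1)
```

A 16:1 tree built this way has four stages and fifteen multiplexers, and only the final multiplexer runs at the full output rate.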
2.3.2 Single-Stage Type Serializer
Time-division parallel-to-serial data conversion is achieved by gating each transmission path sequentially. Each branch acts as an AND gate. The control signal of every transmission gate is derived from the phases of the PLL [8]. The single-stage type serializer architecture is shown in Figure 2-11. Each data bit is transmitted in a time slot defined by the overlap of two control phases, e.g. Ck0 and Ck5.
Figure 2-12 shows the timing diagram of the single-stage type serializer.
Fig. 2-11 Single-stage type serializer architecture
The maximum operating speed of this type is limited by the large parasitic capacitance at the output, which grows with the number of parallel branches. Increasing the transistor size does not overcome this problem, because a larger transistor also increases the parasitic capacitance. Hence, this data serializer is commonly applied to systems below 4 Gbps.
Fig. 2-12 Timing diagram of Single-stage type serializer
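The time-slot gating of Section 2.3.2 can be illustrated with a small behavioral model. The sketch assumes eight PLL phases with 50% duty cycle and assumes that branch i is gated by Cki and Ck((i+5) mod 8), which generalizes the Ck0/Ck5 pairing mentioned in the text; these choices and the function names are illustrative assumptions, not the exact circuit.

```python
N_PHASES = 8  # assumed number of PLL phases; one clock period = N_PHASES time slots


def clock(k, t):
    """Phase k of an assumed 50%-duty clock, evaluated in time slot t."""
    return 1 if (t - k) % N_PHASES < N_PHASES // 2 else 0


def gate(i, t):
    """Branch i conducts only while both of its control phases are high."""
    return clock(i, t) & clock((i + 5) % N_PHASES, t)


def serialize_word(word):
    """Combine all gated branches onto the shared output, one time slot per bit."""
    out = []
    for t in range(N_PHASES):
        bit = 0
        for i, d in enumerate(word):
            bit |= d & gate(i, t)  # each branch behaves like an AND gate
        out.append(bit)
    return out


word = [1, 0, 0, 1, 1, 1, 0, 1]
print(serialize_word(word))
```

With this pairing, the two control phases of each branch overlap for exactly one slot per period, so each data bit owns a distinct slot on the shared output node.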
Chapter 3
Clock and Data Recovery
3.1 CDR Tracking Method
In Chapter 2, we described various CDR architectures proposed in recent years. A CDR is used to solve the phase problem in many applications. Synchronizing the data and the clock is essential to decrease the BER.
Here we introduce our CDR tracking method. Our CDR uses a phase-shifting method to track the optimum phase. The phase-change algorithm is a linear search because it is easy to implement. We use the rising edge of the clock as the data sampling edge and the falling edge to decide the bit boundary.
A tracking process is shown in Figure 3-1. The first sampling edge is a rising edge, the second is a falling edge, and the third is the rising edge of the next period. The initial sampled sequence is “001”.
The direction of phase shifting is decided by the output of the PD, which tells whether the clock is leading or lagging the data. The sampling phase therefore shifts right or left until the phase is locked.
In Figure 3-1, the initial sampling phase is P1 and the optimum sampling phase lies between P2 and P3. After three changes of the sampling phase, the optimum sampling phase between P2 and P3 is tracked.
Rule 1 Select the left phase as the sampling phase
if the PD output is “lag=1 lead=0 hold=0”.
Rule 2 Select the right phase as the sampling phase
if the PD output is “lag=0 lead=1 hold=0”.
Rule 3 The optimum sampling phase is found if the phase changes back and forth.
Fig. 3-1 Example of a tracking process
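The three rules can be sketched as a simple linear search. In this illustrative model the phase detector is replaced by a hypothetical comparison against a known optimum position (lead when the sampling phase is left of the optimum, lag otherwise); the function name and the numeric phase positions are assumptions made for the example.

```python
def track_phase(start, optimum, max_steps=32):
    """Linear-search tracking; returns (phase history, locked window or None)."""
    phase = start
    history = [phase]
    last_direction = 0
    for _ in range(max_steps):
        # Hypothetical PD model: lead -> move right (Rule 2), lag -> move left (Rule 1).
        direction = 1 if phase < optimum else -1
        phase += direction
        history.append(phase)
        # Rule 3: a reversal of direction means the optimum lies between the last two phases.
        if last_direction and direction != last_direction:
            return history, tuple(sorted(history[-2:]))
        last_direction = direction
    return history, None


# Example matching Figure 3-1: start at P1, optimum between P2 and P3.
history, window = track_phase(start=1, optimum=2.5)
print(history, window)
```

Starting from P1, the search visits P2 and P3 and then steps back to P2, so the optimum is located between P2 and P3 after three phase changes.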
3.2 CDR Architecture
First, we need a clock generator to generate a 4-phase clock as the clock source. Alexander phase detectors extract the relationship between the input data and the sampling phases. The lead/lag/hold signals are then delivered to the confidence counter, which works as the low-pass filter of the system. If the counter overflows, the lead_ov/lag_ov signal changes the state of the phase-control finite state machine to generate the appropriate phase control signals. The phase interpolator selects two neighboring phases and interpolates them to generate a new phase position near the optimum phase. We can then recover the data directly by using this sampling phase. The CDR is designed to support spread-spectrum clocking (SSC), so it can handle a 5000 ppm frequency offset between the input data and the internal clock.
The architecture of our CDR circuit is shown in Figure 3-2.
Fig. 3-2 Architecture of the proposed CDR (APD: Alexander phase detector; CC: confidence counter; FSM: phase control; PI: phase interpolator)
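To give a feel for the 5000 ppm frequency offset mentioned in this section, the following sketch computes how quickly the data phase drifts relative to the internal clock at 2.5 Gbps. The function names are hypothetical; the 400 ps bit period follows from the 2.5 Gbps data rate.

```python
def drift_per_bit_ps(bit_period_ps, offset_ppm):
    """Phase drift accumulated in one bit period for a given frequency offset."""
    return bit_period_ps * offset_ppm / 1_000_000


def bits_per_ui_slip(offset_ppm):
    """Number of bits after which the accumulated drift equals one unit interval."""
    return 1_000_000 / offset_ppm


bit_period_ps = 400  # one unit interval at 2.5 Gbps
print(drift_per_bit_ps(bit_period_ps, 5000))  # drift per bit in ps
print(bits_per_ui_slip(5000))                 # bits until one full UI of drift
```

At 5000 ppm the phase drifts by 2 ps per bit, that is, one full unit interval every 200 bits, so the loop must update its sampling phase often enough to keep up with this drift.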
3.3 Building Blocks
Each block in Figure 3-2 is described in the following paragraphs.
3.3.1 Multi-Phase Phase-Locked Loop
An 8-phase PLL [9] is used as the frequency source for multi-phase sampling. The PLL is built around a ring oscillator operating at 1.25 GHz. We use XOR gates to generate a 4-phase 2.5 GHz clock.
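One way to see how XOR gates can double the frequency is to XOR two phases that are a quarter period apart. The sketch below assumes ideal 50%-duty phases spaced 100 ps apart (one eighth of the 800 ps period) and assumes that phase k is paired with phase k+2; the pairing actually used in the design is not stated in the text, so this is only an illustration.

```python
SLOTS = 8  # one 1.25 GHz period (800 ps) divided into 100 ps slots


def pll_phase(k, t):
    """Phase k of the assumed 8-phase, 50%-duty 1.25 GHz clock in time slot t."""
    return 1 if (t - k) % SLOTS < SLOTS // 2 else 0


def doubled_clock(k, t):
    """XOR of two phases a quarter period apart gives a 2.5 GHz clock (assumed pairing)."""
    return pll_phase(k, t) ^ pll_phase((k + 2) % SLOTS, t)


for k in range(4):
    waveform = [doubled_clock(k, t) for t in range(SLOTS)]
    print(f"phase {k}: {waveform}")
```

Each output repeats every four slots (400 ps) and consecutive outputs are shifted by one slot (100 ps), which corresponds to four phases of a 2.5 GHz clock.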
3.3.2 Alexander Phase Detector (APD)
In Chapter 2, we compared different PDs. The architecture of the APD is shown in Figure 3-4. The phase detector in the CDR extracts the relationship between the input data transitions and the edges of the sampling phases: whether a data transition is present and whether the clock leads or lags the data [17]. Figure 3-3 illustrates the APD principle, which is also known as the lead-lag detection method. The XORs of three data samples, S1⊕S2 and S2⊕S3, provide the lead or lag information. If lead=1 and lag=0, the clock falling edge leads the data. If lead=0 and lag=1, the clock falling edge lags the data. If hold=1, no data transition is present. The Alexander phase detector is therefore a good approach in this field.
Fig. 3-4 Architecture of APD
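The lead-lag decision can be written as a small truth-table function. In this sketch S1 and S3 are taken to be consecutive data samples (rising edges) and S2 the boundary sample (falling edge) between them. The mapping of S1⊕S2 and S2⊕S3 to lead and lag, and the treatment of the inconsistent patterns 010 and 101 as hold, are assumptions chosen to agree with the “001” example of Section 3.1.

```python
def alexander_pd(s1, s2, s3):
    """Return (lead, lag, hold) from three consecutive samples.

    s1, s3: data samples on rising edges; s2: boundary sample on the falling edge.
    """
    a = s1 ^ s2  # transition between the first data sample and the boundary sample
    b = s2 ^ s3  # transition between the boundary sample and the next data sample
    lead = int(a == 0 and b == 1)  # boundary sample taken before the data transition
    lag = int(a == 1 and b == 0)   # boundary sample taken after the data transition
    hold = int(a == b)             # no transition, or an inconsistent pattern (assumed hold)
    return lead, lag, hold


for pattern in ("001", "011", "111", "010"):
    s1, s2, s3 = (int(c) for c in pattern)
    print(pattern, alexander_pd(s1, s2, s3))
```

Exactly one of the three outputs is asserted for every input pattern, which keeps the interface to the confidence counter simple.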
3.3.3 Confidence Counter
Because the data is easily affected by noise, the PD may report erroneous information about whether the clock is leading or lagging. We therefore use a confidence counter as a digital low-pass filter. It prevents noise effects such as signal distortion or ISI from degrading the system performance [10]. Traditionally, there are two types of confidence counter. One is the continuation type: the confidence counter is activated only when leads or lags occur N times in a row. The other is the accumulation type: the confidence counter is activated when the number of leads or lags accumulates to N.
The state diagram is shown in Figure 3-5. Here we use an accumulation-type counter with a one-hot state design. It is less affected by glitches and is suitable for high-speed operation.
The confidence counter works in a token-transfer manner. When the token moves out of range, it restarts at the middle of the register chain and an overflow message is sent. The sampling phase must then be changed. Such a high-speed counter must be designed carefully. The circuit is shown in Figure 3-6.
Fig. 3-5 State diagram of the confidence counter.
Fig. 3-6 Architecture of the confidence counter.
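The token-transfer behavior of the accumulation-type counter can be modeled as a one-hot register chain in which a single token moves right on a lead and left on a lag. The class below is a behavioral sketch only; the class name, the chain length of 7, and the lead-moves-right convention are hypothetical choices for illustration.

```python
class ConfidenceCounter:
    """Accumulation-type confidence counter modeled as a one-hot token chain."""

    def __init__(self, length=7):
        if length < 3 or length % 2 == 0:
            raise ValueError("length must be an odd number >= 3")
        self.length = length
        self.mid = length // 2
        self.pos = self.mid  # the token starts in the middle of the chain

    def state(self):
        """One-hot register contents."""
        return [1 if i == self.pos else 0 for i in range(self.length)]

    def step(self, lead, lag):
        """Apply one PD decision; return (lead_ov, lag_ov)."""
        if lead == lag:  # hold: the token stays where it is
            return 0, 0
        self.pos += 1 if lead else -1
        if self.pos >= self.length:  # token left the chain on the lead side
            self.pos = self.mid
            return 1, 0
        if self.pos < 0:  # token left the chain on the lag side
            self.pos = self.mid
            return 0, 1
        return 0, 0


cc = ConfidenceCounter(length=7)
print([cc.step(1, 0) for _ in range(4)])  # overflow on the fourth net lead
```

Because leads and lags cancel each other, isolated erroneous decisions move the token back and forth around the middle without producing an overflow, which is the low-pass filtering effect described above.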
3.3.4 Phase Control (Finite State Machine)
A finite state machine is designed as the control logic block. It tracks the present state and determines which output is taken as the recovered clock and the recovered data. When the phase control receives a lead_ov or lag_ov signal from the confidence counter, it directs the phase interpolator to adjust the clock by one phase-resolution step. The state diagram and the phase change behavior are shown in Figure 3-7.
Fig. 3-7 (a) State diagram (b) Phase behavior
The phase control includes two parts: fine tune and coarse tune. The coarse tune selects two neighboring phases, which are 100 ps apart. The fine tune determines the optimum phase between these neighboring phases, so the fine tune must work after the coarse tune. The phase changes linearly, and the control word changes like a thermometer code.
The architecture of the phase control is shown in Figure 3-8. The fine tune is of the shift-register type. The coarse tune is a combinational circuit derived from a Karnaugh map.
Fig. 3-8 The architecture of phase control (a) Fine tune (b) Coarse tune
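A shift-register fine tune with a thermometer-coded control word can be sketched as follows. The 4-bit width, the function name, and the overflow flag (a hypothetical hook for the point at which the coarse tune would have to act) are assumptions made for illustration; only the fine-tune part is modeled.

```python
def fine_tune_step(code, up):
    """Shift a thermometer code by one step; return (new_code, overflow).

    code is a string such as '0011'. Stepping up shifts a 1 in from the right;
    stepping down shifts a 0 in from the left. Exactly one bit changes per step.
    """
    if up:
        if "0" not in code:
            return code, True  # already all ones: the fine range is exhausted
        return code[1:] + "1", False
    if "1" not in code:
        return code, True  # already all zeros: the fine range is exhausted
    return "0" + code[:-1], False


code = "0000"
sequence = [code]
for _ in range(4):
    code, _overflow = fine_tune_step(code, up=True)
    sequence.append(code)
print(sequence)
```

Since only one bit toggles per step, the interpolator weight changes by one unit at a time, which gives the linear phase change described in this section.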
3.3.5 Phase Interpolator
The purpose of the interpolator is to generate one signal whose phase lies between two input phases.
Figure 3-9 shows the architecture of the phase interpolator. The interpolator is composed of two parallel sets of tri-state inverters whose outputs are shorted together. We can generate the phase by adjusting the relative drive strengths of the two sides. Because the basic cell of the interpolator is a tri-state inverter, the number of turned-on inverters can be controlled digitally.
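To first order, the interpolated phase can be viewed as a weighted average of the two input phases, with the weights given by the number of enabled tri-state inverters on each side. The sketch below uses this idealized linear model; the function name and the choice of four inverters per side are hypothetical, and a real interpolator is not perfectly linear.

```python
def interpolated_phase_ps(phase_a_ps, phase_b_ps, on_b, total):
    """Idealized output phase when on_b of total inverters are driven by phase B.

    The remaining (total - on_b) inverters are driven by phase A.
    """
    if total <= 0 or not 0 <= on_b <= total:
        raise ValueError("require total > 0 and 0 <= on_b <= total")
    on_a = total - on_b
    return (on_a * phase_a_ps + on_b * phase_b_ps) / total


# Two neighboring phases 100 ps apart, assumed four inverters per side.
for on_b in range(5):
    print(on_b, interpolated_phase_ps(0, 100, on_b, 4))
```

Under these assumptions each fine step moves the output phase by 25 ps between the two neighboring input phases.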
The architecture of the phase interpolator is shown in Figure 3-10. The coarse tune controls the phases Ph0~Ph3, which are spaced 100 ps apart. The phase can be rotated by changing the coarse tune control bits. The fine tune control varies the contribution of the two input edges. This method generates the new