
Chapter 1 Introduction

1.4 Thesis Organization

The remainder of this thesis is organized as follows:

Chapter 2 introduces modern clock and data recovery circuits, including the design trend from tracking-type CDRs to various oversampling CDRs. The data, timing and spread spectrum specifications are also investigated.

Chapter 3 describes the 2nd-order phase/frequency tracking algorithm for the feed-forward phase adjusted CDR. The algorithm and its theory are analyzed, and the implementation is verified by behavioral and circuit simulations.

Chapter 4 describes the Multiple Alternating Edge Sampling (M-AES) methodology used to improve CDR loop behavior. Theoretical analysis and implementation results are presented and compared across different scenarios.

Chapter 5 presents the experimental results and describes the measurement considerations for the test chip.

Chapter 6 concludes the thesis and discusses future work that may further improve the CDR.

Chapter 2 Oversampling Based Clock Data Recovery

2.1 Introduction to Clock Data Recovery

In multi-gigabit serial link systems, the extremely high data rate makes the bit time small compared with the signal propagation time. It is therefore impractical to transmit the clock on a separate wire, because even the slightest length mismatch between the data and clock lines introduces significant skew. In modern high-speed serial links, the clock is no longer transmitted through the channel; instead, it is extracted from the data by a clock and data recovery (CDR) circuit. The CDR must detect phase and frequency information from the received data transitions and adjust the local clock generator to recover the link clock. The recovered clock is then used to sample the received data stream at the optimal point, i.e., the point that offers the largest timing margin against input jitter and the lowest bit error rate (BER) of the recovered data. The CDR is therefore an important building block of the receiver and is used in many serial protocols, such as Gigabit Ethernet, Serial ATA, PCI Express, HDMI, SONET/SDH, and XAUI.

The development of CDR circuits has produced a variety of architectures targeting different applications. As shown in Fig. 2.1, earlier designs [10]-[13] incorporate a phase-locked loop (PLL) in the CDR loop to track the phase and frequency of the incoming bit stream. The tracking-type CDR is straightforward, but its speed is limited by linear phase detection. In addition, using the PLL directly to recover the clock leads to an undesired bandwidth conflict: in general, the PLL loop and the CDR loop require different bandwidths, considering PLL phase-noise immunity, CDR tracking ability, and the stability of tracking under low input SNR. This bandwidth issue led to the development of oversampling CDRs [14]-[24]. An oversampling CDR does not use the PLL directly to track the phase and frequency of the incoming data. Instead, a separate feedback phase/frequency recovery loop chooses among multiple PLL phases to track the received stream. This dual-loop architecture also has additional benefits, which are explained later.

Fig. 2.1 The Tracking type CDR

2.2 Comparisons of Oversampling CDR Algorithms and Architectures

The oversampling CDR consists of a frequency synthesis loop and a phase/frequency recovery loop. The frequency synthesis loop provides the clock that allows the recovery loop to work under plesiochronous conditions; that is, the clock frequency is very close to that of the received data, so the recovery loop only needs to minimize the small remaining difference. The dual-loop architecture provides an additional advantage for modern high-traffic serial link applications [14], [24]. In modern communication systems, multi-IO systems integrated on a system-on-chip (SoC) are desirable because of high data-rate requirements and the need to reduce area and power. In a multi-IO system, many dual-loop CDRs can share one common frequency synthesis loop that provides the plesiochronous clock, while each recovery loop operates independently of the other IOs. Sharing the PLL yields large power and area savings, since a PLL generally consumes considerable area and power.

Fig. 2.2 Blind Oversampling CDR.

An oversampling CDR takes multiple samples per unit interval (UI) to acquire phase lead/lag information. For example, a 2X-oversampling CDR takes one data sample and one edge sample in every UI, and the samples are compared with each other to determine whether the clock leads or lags the data. This binary phase detection is suitable for high-speed applications, but it may introduce some unfavorable effects, which can be overcome by the M-AES technique proposed in Chapter 4.
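The 2X-oversampling lead/lag decision can be illustrated with a minimal Python model. It assumes the common Alexander-type convention, in which the edge sample is taken at the nominal boundary between two successive data samples; the function names and the +1/-1 encoding are illustrative assumptions, not taken from the design.

```python
def binary_pd(d_prev: int, edge: int, d_curr: int) -> int:
    """Bang-bang lead/lag decision for one UI of a 2X-oversampled stream.

    edge is sampled at the nominal boundary between d_prev and d_curr.
    Returns +1 if the clock is late (the edge sample already shows the new
    bit), -1 if it is early (the edge sample still shows the old bit), and
    0 if there is no transition and hence no timing information.
    """
    if not (d_prev ^ d_curr):     # XOR of successive data samples: no transition
        return 0
    late = edge ^ d_prev          # edge differs from the old bit -> sampled after the transition
    return 1 if late else -1


def phase_decisions(data, edges):
    """Apply binary_pd along a stream; edges[n] lies between data[n-1] and data[n]."""
    return [binary_pd(data[n - 1], edges[n], data[n]) for n in range(1, len(data))]


print(phase_decisions([0, 1, 1, 0], [0, 1, 0, 1]))  # [1, 0, -1]
```

Only UIs that contain a transition produce a decision, which is why the transition density of the data affects the loop gain.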

2.2.1 Blind Oversampling Scheme

As shown in Fig. 2.2, the blind oversampling CDR [14][15][24][27] consists of a multi-phase PLL as the frequency synthesis loop and an all-digital data recovery loop.

As shown in Fig. 2.3, the blind oversampling scheme detects data transitions and chooses among the multiple PLL phases p0, p1 and p2; the output of the phase that best samples the data eye opening is used as the recovered data. Because blind oversampling contains no feedback loop and can be implemented digitally, it is suitable for soft silicon intellectual property (SIP) applications, and the parameters of its digital filter can be adjusted for different specifications. However, the blind oversampling technique lacks frequency tracking ability and requires a very large data buffer when there is a frequency deviation between Tx and Rx. Because the data are recovered without knowledge of the correct frequency, no clock signal is recovered. For modern serial link specifications that require spread spectrum clocks, this design is therefore insufficient.

Fig. 2.3 The Blind Oversampling algorithm using center picking scheme.
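The center-picking idea can be sketched in Python as below. This is a minimal illustration assuming 3X oversampling (phases p0, p1, p2) and a simple transition histogram over a window standing in for the digital filter; the function names are hypothetical, and a real implementation would update the decision continuously and use a buffer to handle phase wrap-around.

```python
def pick_center_phase(samples, n_phases=3):
    """Return the phase index judged closest to the eye center.

    samples: 0/1 values taken at n_phases equally spaced phases per UI,
    ordered in time (p0, p1, p2, p0, p1, p2, ...).
    """
    # Histogram of where data transitions fall within the UI.
    hits = [0] * n_phases
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            hits[i % n_phases] += 1
    edge_phase = max(range(n_phases), key=hits.__getitem__)
    # Choose the phase roughly half a UI away from the dominant edge position.
    return (edge_phase + n_phases // 2) % n_phases


def blind_recover(samples, n_phases=3):
    """Pick the center phase once for the window and decimate to one bit per UI."""
    phase = pick_center_phase(samples, n_phases)
    return phase, samples[phase::n_phases]
```

For example, if every bit of `[0, 1, 1, 0, 1, 0, 0, 1]` is sampled three times starting exactly at the bit boundary, transitions always land just before p0, so p1 is selected and the original bits are returned. No clock is recovered: the selected phase only says which existing PLL phase to trust.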

2.2.2 Feed-Back Phase Adjusted Scheme

Fig. 2.4 shows the feed-back phase adjusted CDR [21]-[23]. The incoming data are sampled by the edge-sampling and data-sampling clocks provided by the multi-phase VCO, and the phase detector extracts lead/lag information. After digital filtering, this information is used to shift the phase of the PLL feedback clock through a phase multiplexer and a phase interpolator; this changes Vctrl and thereby moves the multi-phase VCO clocks to track the incoming data. Adjusting the phase of the feedback clock gives this architecture two advantages. First, the phase discontinuities produced by phase selection are filtered by the loop filter, which reduces the jitter of the sampling clock. Second, all VCO phases are shifted simultaneously to track the incoming data, which greatly reduces the number of phase multiplexers and phase interpolators required, and hence saves considerable power and area.

Fig. 2.4 Feed‐Back Phase Adjusted CDR.

However, because the PLL loop and the clock recovery loop are altered simultaneously, this architecture suffers from the bandwidth conflict mentioned earlier. Moreover, because the PLL loop is no longer independent of the CDR loop, each CDR in a multi-IO system requires its own PLL to provide the plesiochronous clock. This again demands considerable area and power, making the architecture unfavorable for multi-IO applications.

Fig. 2.5 Feed‐Forward Phase Adjusted CDR

2.2.3 Feed-Forward Phase Adjusted Scheme

To overcome these drawbacks of the feed-back phase adjusted CDR, the frequency synthesis loop (PLL) and the clock recovery loop must be independent of each other [16][17][18]. As shown in Fig. 2.5, phase selection is moved from the PLL feedback clock to the outputs of the multi-phase VCO. In our proposed architecture, the bandwidth conflict is thereby avoided and multi-IO applications can be supported.

Because of parallel sampling, multiple phase multiplexers and phase interpolators are used for phase selection. Fig. 2.5 shows the case of five-bit parallel sampling, which requires four additional multiplexer and interpolator blocks; this overhead is traded for application flexibility and much larger power and area savings in multi-IO systems.

2.3 Timing and Data Format Specifications

Our proposed CDR targets modern multi-gigabit serial transmission systems and is applicable to various standards. One suitable standard is Serial Advanced Technology Attachment Generation 3 (SATA-III) [29]. SATA is a high-speed interconnect used in computers and storage devices such as hard disks and optical drives, and is expected to replace the widely used ATA technology. Although SATA-II is already used in modern hard disk drives and can accommodate the foreseeable growth of hard disk transfer rates in the near future, SATA-III is still being developed and will be used in port multipliers, solid-state drives, and the continuing evolution of storage based on historic trends [30]. Table 2.1 shows the generations of SATA.

2.3.1 Data Format

Because the SATA-III specification is still under development, our proposed CDR adopts the known SATA-II specifications while targeting the SATA-III data rate of 6 Gb/s [29]. The receiver should be able to detect a differential NRZ stream whose data rate deviates from the nominal rate by up to ±350 ppm, with an additional 0/-5000 ppm spread spectrum clock (SSC) deviation. The minimum and maximum differential input voltages are 275 mV and 750 mV, respectively.

Table 2.1 Generations of SATA [29]
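The frequency window the receiver must accept follows directly from these numbers. A minimal Python sketch, assuming the ±350 ppm tolerance and the 0/-5000 ppm SSC down-spread add in the worst case; the function name is hypothetical.

```python
def rx_rate_window(nominal_bps=6e9, tol_ppm=350.0, ssc_down_ppm=5000.0):
    """Return (lowest, highest) incoming data rate the receiver must accept."""
    highest = nominal_bps * (1 + tol_ppm * 1e-6)                    # fastest transmitter, no spreading
    lowest = nominal_bps * (1 - (tol_ppm + ssc_down_ppm) * 1e-6)    # slowest transmitter at full down-spread
    return lowest, highest


low, high = rx_rate_window()
print(f"{low / 1e9:.4f} Gb/s to {high / 1e9:.4f} Gb/s")  # 5.9679 Gb/s to 6.0021 Gb/s
```

The total span of 5700 ppm is dominated by the SSC, which is why frequency tracking, and not only phase tracking, is required.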

2.3.2 Timing and Jitter Performance

The timing requirements are specified in terms of the eye diagram and jitter performance.

Although the eye diagram is not specified in the SATA documents, it can be taken from the 3 Gb/s Serial Attached SCSI (SAS) standard, which is capable of interoperating with SATA [31]. Fig. 2.6 shows the eye diagram and Table 2.2 shows its parameters.

The jitter performance is specified in [29] and is divided into two categories. The first is random jitter (RJ), which arises from thermal noise and follows an unbounded Gaussian distribution. It is normally characterized by its standard deviation σ_RJ; as a rule of thumb, over 10^12 transmitted bits the data transition edges spread over about 14 standard deviations peak-to-peak (about ±7σ_RJ around the mean). The second is deterministic jitter (DJ), which consists of duty-cycle distortion, data-dependent jitter (ISI), periodic jitter, and uncorrelated bounded jitter. DJ is bounded and is characterized by its peak-to-peak value.

To ensure a bit error rate (BER) of 10^-12, SATA calculates the total jitter (TJ) as

$$ \mathrm{TJ} = \mathrm{DJ} + 14\,\sigma_{RJ} \tag{2.1} $$

Given TJ = 0.60 UI and DJ = 0.42 UI [29], Eq. (2.1) gives σ_RJ ≤ (0.60 - 0.42)/14 ≈ 0.013 UI, which is our target specification.
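The factor 14 in Eq. (2.1) can be traced back to the Gaussian tail at BER = 10^-12. The sketch below assumes the usual dual-Dirac jitter model, in which TJ = DJ + 2Q·σ_RJ with BER = 0.5·erfc(Q/√2); the helper names are hypothetical. It solves for Q by bisection using only the standard library and then recomputes the σ_RJ budget.

```python
import math


def q_for_ber(ber):
    """Gaussian Q such that 0.5*erfc(Q/sqrt(2)) == ber, solved by bisection."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2)) > ber:
            lo = mid   # tail probability still too large: Q must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_sigma_rj(tj_ui, dj_ui, multiplier):
    """Largest RJ sigma that keeps DJ + multiplier*sigma within the TJ budget."""
    return (tj_ui - dj_ui) / multiplier


q = q_for_ber(1e-12)
print(round(2 * q, 2))                            # 14.07
print(round(max_sigma_rj(0.60, 0.42, 14), 4))     # 0.0129
print(round(max_sigma_rj(0.60, 0.42, 2 * q), 4))  # 0.0128
```

Both the rule-of-thumb 14 and the exact 2Q ≈ 14.07 round to the same 0.013 UI target.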

The jitter tolerance mask is another important measure of a CDR system; it describes how much sinusoidal input phase variation the CDR loop must tolerate as a function of jitter frequency.

The jitter tolerance mask is not clearly specified for SATA, so we adopt the jitter tolerance specification of the synchronous digital hierarchy (SDH) STM-64 interface [32], whose data rate is 10 Gb/s, as our design target. The specification is shown in Fig. 2.7 and Table 2.3. It shows that the CDR is required to track low-frequency jitter of very large amplitude, whereas high-frequency jitter (above 4 MHz, where the mask flattens to 0.15 UI) may pass through without being tracked.

Table 2.2 Eye diagram parameters

Parameter   Description               Value
X1          Half of maximum jitter    0.275 UI
X2          Center                    0.500 UI

Fig. 2.7 The target jitter tolerance mask. [32]

Table 2.3 The requirements of the jitter tolerance mask.

Frequency f (Hz)      Requirement
10 < f ≤ 12.1         2490 UI
12.1 < f ≤ 20 k       3.0 × 10^4 / f UI
20 k < f ≤ 400 k      1.5 UI
400 k < f ≤ 4 M       6.0 × 10^5 / f UI
4 M < f ≤ 80 M        0.15 UI
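For checking simulated jitter tolerance points against the target, the piecewise mask of Table 2.3 can be encoded as a function. This is a minimal sketch with a hypothetical function name; the breakpoint expressions 3.0×10^4/f and 6.0×10^5/f are continuous with the flat segments at 20 kHz, 400 kHz and 4 MHz.

```python
def stm64_jtol_mask_ui(f_hz):
    """Minimum sinusoidal jitter amplitude (UI) to tolerate at jitter frequency f_hz (Table 2.3)."""
    if 10 < f_hz <= 12.1:
        return 2490.0
    if 12.1 < f_hz <= 20e3:
        return 3.0e4 / f_hz          # -20 dB/decade region
    if 20e3 < f_hz <= 400e3:
        return 1.5
    if 400e3 < f_hz <= 4e6:
        return 6.0e5 / f_hz          # -20 dB/decade region
    if 4e6 < f_hz <= 80e6:
        return 0.15
    raise ValueError("frequency outside the mask range (10 Hz, 80 MHz]")


def meets_mask(f_hz, tolerated_ui):
    """True if a measured or simulated tolerance point lies on or above the mask."""
    return tolerated_ui >= stm64_jtol_mask_ui(f_hz)
```

A CDR whose loop bandwidth is too low fails the 400 kHz to 4 MHz slope first, since that is where the required amplitude drops fastest toward the flat 0.15 UI floor.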

2.3.3 Spread Spectrum Clock

In high-speed electronic circuits, rapidly switching voltages and currents generate intense electromagnetic radiation, known as electromagnetic interference (EMI).

This interference seriously threatens the functionality of other electronic modules and must be attenuated or shielded. As a high-speed electrical signal generator, the serial link transmitter is required to adopt an EMI reduction mechanism, and spread spectrum clocking is the most efficient and preferred solution.

Spread spectrum clocking (SSC) is a special application of frequency modulation (FM): the frequency of the EMI-emitting high-speed clock signal is modulated, creating a small deviation from the original frequency. As the frequency deviates, the energy that was concentrated at a single frequency is “spread” over a wider band, so the spectral peak, and hence the peak emitted energy, is reduced.

The waveform used to frequency-modulate the EMI source is called the “modulation profile”, and the frequency of this waveform is called the “modulation frequency”. As shown in Fig. 2.8, the shape of the spread spectrum is mainly determined by the modulation profile. According to [28], the triangular profile spreads the energy more evenly than the sinusoidal profile, and thus gives better overall attenuation, while being much easier to realize than the saw-tooth waveform.

The amount of energy attenuation is determined by the modulation frequency; in general, a higher modulation frequency results in greater energy attenuation. In serial link applications, however, a higher modulation frequency makes the frequency of the data arriving at the receiver change faster, and the CDR loop, whose bandwidth is often very low, must be able to cope with it.

In the SATA specification [29], the modulation profile is a triangular waveform with a modulation frequency of 30 to 33 kHz, and the maximum frequency deviation is -5000 ppm. This is the target SSC specification for our clock recovery circuit, as shown in Fig. 2.9.
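The target SSC profile and what it implies for the CDR can be sketched in Python. This is a minimal model, assuming a symmetric triangle that starts at 0 ppm, uses the 33 kHz upper modulation frequency, and is compared against an unspread 6 Gb/s reference; the function names are hypothetical.

```python
def ssc_offset_ppm(t_s, f_mod_hz=33e3, deviation_ppm=-5000.0):
    """Instantaneous frequency offset (ppm) of a triangular down-spread SSC.

    The offset starts at 0 ppm, ramps linearly to deviation_ppm at half a
    modulation period, and returns to 0 ppm at the end of the period.
    """
    x = (t_s * f_mod_hz) % 1.0               # position within one modulation period
    tri = 2 * x if x < 0.5 else 2 * (1 - x)  # unit triangle, 0 -> 1 -> 0
    return deviation_ppm * tri


def ssc_ramp_rate_ppm_per_us(f_mod_hz=33e3, deviation_ppm=-5000.0):
    """Magnitude of the frequency slope: the full deviation is covered in half a period."""
    return abs(deviation_ppm) * 2 * f_mod_hz * 1e-6


def ssc_phase_drift_ui(f_data_hz=6e9, f_mod_hz=33e3, deviation_ppm=-5000.0):
    """Phase (UI) accumulated over one down-ramp relative to an unspread clock."""
    half_period_s = 0.5 / f_mod_hz
    mean_offset = 0.5 * abs(deviation_ppm) * 1e-6  # mean of a linear 0..|dev| ramp
    return mean_offset * f_data_hz * half_period_s
```

With the default arguments the frequency ramps at 330 ppm/µs and the data phase drifts by roughly 227 UI during each half period, far more than a phase-only loop could absorb; this is the quantitative reason the CDR needs a frequency (integral) path.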

Fig. 2.8 Modulation profiles and their corresponding spectra: (a) sinusoidal, (b) triangle, (c) saw-tooth.

Fig. 2.9 Target SSC specification of our CDR

Chapter 3 A 2nd-Order Phase/Frequency Tracking Algorithm for Feed-Forward Phase Adjusted CDR

3.1 Overview

The oversampling clock and data recovery circuits introduced in Chapter 2 use phase adjustment, rather than voltage-controlled oscillators, to track the phase and frequency deviations of the incoming data. An algorithm is therefore needed to compute the required phase adjustment from the binary phase detector (PD) output. To track both phase and frequency, a 2nd-order algorithm is required; such algorithms have been reported in [17], [33]-[35]. Their theoretical analysis can be found in [25], which is very useful in designing the 2nd-order behavioral model.

Fig. 3.1 (a) The concept of the 2nd-order CDR. (b) The proposed 2nd-order CDR.

The s-domain concept of the 2nd-order CDR is shown in Fig. 3.1(a): the binary phase detector detects the phase difference φe, which is then scaled by a proportional gain GP and integrated with an integral gain GI. The ratio of the phase adjustment from the proportional path to that from the integral path is defined as the stability factor ξ [25]; in Fig. 3.1(a), ξ = GP/GI. For binary phase detection without a dead zone, ξ should be greater than twice the loop latency in UI for the loop to be unconditionally stable [25]. In our design, however, this constraint may be relaxed because of the dead zone introduced by M-AES, as described in Section 4.2.1.

The outputs of the two paths are summed and used to drive the digital phase rotator. In the s-domain, the rotator acts as the VCO, i.e., an integrator of the filter output: it integrates the +/- phase information and adjusts the phase of the sampling edges.
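The loop of Fig. 3.1(a) can be explored with a minimal discrete-time behavioral sketch in Python. The gains gp = 1/32 and gi = 1/1024 (ξ = 32), the constant frequency offset of 0.001 UI per update, and the latency values are illustrative assumptions, not the design values; phases are in UI.

```python
from collections import deque


def simulate_bb_cdr(n_updates, freq_offset_ui, gp, gi, latency=0):
    """Discrete-time model of the 2nd-order bang-bang loop of Fig. 3.1(a).

    Each update the input phase advances by freq_offset_ui; the binary PD
    outputs sign(phase error); after `latency` updates the decision drives a
    proportional step gp and an integrator with gain gi; the phase rotator
    accumulates both. Returns the phase-error and integrator histories.
    """
    phi_in = phi_clk = integ = 0.0
    pipe = deque([0] * latency)             # models loop latency in updates
    errors, integ_hist = [], []
    for _ in range(n_updates):
        phi_in += freq_offset_ui
        err = phi_in - phi_clk
        pipe.append(1 if err > 0 else -1)   # binary (bang-bang) decision
        d = pipe.popleft()                  # decision issued `latency` updates ago
        integ += gi * d                     # integral (frequency) path
        phi_clk += gp * d + integ           # rotator integrates both paths
        errors.append(err)
        integ_hist.append(integ)
    return errors, integ_hist


gp, gi = 1 / 32, 1 / 1024
errors, integ_hist = simulate_bb_cdr(20000, 0.001, gp, gi)
print("stability factor xi =", gp / gi)     # 32.0
print("mean integrator:", sum(integ_hist) / len(integ_hist))
```

With these values the phase error settles into a small bang-bang limit cycle of about one proportional step, and the time-averaged integrator output converges to the 0.001 UI/update offset, i.e., the integral path has learned the frequency.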

In our proposed architecture, however, the arrangement is modified as in Fig. 3.1(b) to reduce the hardware overhead of the integral path while maintaining loop stability. In Fig. 3.1(b), ξ now equals 1/GI. An s-domain analysis shows that if the natural frequency and damping factor of the original 2nd-order CDR are ω_n and ζ (not to be confused with ξ), the proposed 2nd-order CDR has

$$ \omega_n' = \omega_n\sqrt{G_P}, \qquad \zeta' = \frac{\zeta}{\sqrt{G_P}} $$

Because actual phase adjustments are needed far less often than phase detection results are produced, GP is less than 1; the natural frequency is therefore reduced and the damping factor increased, and the parameters can be designed accordingly without changing the loop characteristics. The integral gain GI, which is also less than 1, need not be as small and therefore requires less hardware (as further explained in Section 3.2.3).
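The scaling of ω_n and ζ can be checked with a short linearized derivation. It assumes the binary PD and the phase rotator are represented by an effective linear gain K (a common small-signal approximation that is not part of the original analysis), with the integral path modeled as G_I/s and the rotator as 1/s.

$$
\begin{aligned}
\text{(a)}:&\quad L_a(s) = \frac{K\,(G_P + G_I/s)}{s}
\;\Rightarrow\; s^2 + K G_P\, s + K G_I = 0,
\quad \omega_n = \sqrt{K G_I},\quad \zeta = \frac{K G_P}{2\sqrt{K G_I}},\quad \xi = \frac{G_P}{G_I} \\
\text{(b)}:&\quad L_b(s) = \frac{K\,G_P\,(1 + G_I/s)}{s}
\;\Rightarrow\; s^2 + K G_P\, s + K G_P G_I = 0,
\quad \omega_n' = \sqrt{K G_P G_I} = \omega_n\sqrt{G_P},\quad
\zeta' = \frac{K G_P}{2\sqrt{K G_P G_I}} = \frac{\zeta}{\sqrt{G_P}},\quad \xi' = \frac{1}{G_I}
\end{aligned}
$$

Moving G_P behind the summing node leaves the proportional coefficient unchanged and multiplies the integral coefficient by G_P, which is exactly what produces the √G_P scaling and the new stability factor 1/G_I.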

3.2 Phase/Frequency tracking CDR

The block diagram of the proposed feed-forward phase adjusted CDR is shown in Fig. 3.2. For a data rate of fs = 6 Gb/s, a 100 MHz reference clock is fed to the PLL, which generates a 10-phase clock at 1.2 GHz. The phase selection block, controlled by the digital 2nd-order algorithm, selects five phases for data sampling and five phases for edge sampling, which track the incoming stream with a phase resolution of 1/32 UI. At the sampler, the incoming stream is sampled and synchronized into 5 parallel bits at 1.2 GHz, equivalent to the 6 Gb/s data rate. The phase detector is a binary (bang-bang) detector [25] that extracts the phase lead/lag information from the data and edge samples.
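The clocking numbers in this paragraph fit together as shown in the small Python check below. It assumes the 10 PLL phases are uniformly spaced over one 1.2 GHz period; the function name is hypothetical.

```python
def clocking_plan(data_rate_bps=6e9, pll_hz=1.2e9, n_phases=10, steps_per_ui=32):
    """Derived timing quantities for the multi-phase clocking described above."""
    ui_ps = 1e12 / data_rate_bps
    phase_spacing_ps = 1e12 / (pll_hz * n_phases)   # assumes uniformly spaced phases
    return {
        "ui_ps": ui_ps,                                  # about 166.67 ps
        "phase_spacing_ui": phase_spacing_ps / ui_ps,    # 0.5 UI
        "bits_per_pll_cycle": data_rate_bps / pll_hz,    # 5 parallel bits
        "resolution_ps": ui_ps / steps_per_ui,           # about 5.21 ps per 1/32-UI step
    }


print(clocking_plan())
```

A phase spacing of 0.5 UI means adjacent PLL phases alternate between data-center and edge positions, so the 10 phases split naturally into 5 data clocks and 5 edge clocks, and 6 Gb/s / 1.2 GHz = 5 bits per cycle matches the 5-bit parallel sampling.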

Next, a pre-filter composed of a majority vote and a sliding window averages out the effect of random jitter and balances the loop gain under different data transition densities. The sliding window operates at 600 MHz, and its output is used by the proportional and integral paths.

Fig. 3.2 The block diagram of the proposed CDR.

Together, the proportional and integral paths behave like a 2nd-order digital loop filter that interprets the Up/Down signals as phase and frequency adjustments; the phase rotator and decoder then control the phase selection block.

3.2.1 Pre-Filter

As shown in Fig. 3.3(a) and (b), binary phase detection is performed by XORing the data and edge samples to detect transitions and compare them with the current clock edges. The pre-filter includes a majority vote, which sums the five lead/lag signals and makes a single decision representing the current lead/lag state, as shown in Fig. 3.3(b). The majority vote contributes in two ways. First, the effect of random jitter is averaged, i.e., the randomness is filtered out while the trend of the phase drift is preserved. Second, variations in data transition density often cause large variations in loop gain, resulting in instability or loss of tracking; the majority vote bounds the gain from above when transitions are frequent and guarantees a minimum gain when transitions are rare, hence preserving a reasonable loop gain.

Fig. 3.3 (a) The ideal operation of data and edge sampling. (b) The operation of the binary PD and pre-filter under jittery conditions.

The second part of the pre-filter is a sliding window that halves the rate to 600 MHz to ease subsequent processing; its output is the sum of two successive majority-vote outputs.
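Both pre-filter stages can be sketched in Python as below. This is a minimal behavioral model, assuming each of the five lead/lag signals is encoded as +1 (Up), -1 (Down) or 0 (no transition), that a tied vote outputs 0, and that the window sums non-overlapping pairs so two 1.2 GHz decisions become one 600 MHz output; the function names are hypothetical.

```python
def majority_vote(votes):
    """Collapse the five per-bit lead/lag signals of one 1.2 GHz cycle into one decision."""
    s = sum(votes)
    return (s > 0) - (s < 0)   # +1, -1, or 0 on a tie / no transitions


def sliding_window(decisions):
    """Sum two successive majority-vote outputs, halving the rate (1.2 GHz -> 600 MHz)."""
    return [decisions[k] + decisions[k + 1] for k in range(0, len(decisions) - 1, 2)]


cycles = [[1, 1, -1, 0, 0], [0, 1, 0, 0, 0], [-1, -1, 1, 0, -1], [0, 0, 0, 0, 0]]
print(sliding_window([majority_vote(c) for c in cycles]))  # [2, -1]
```

Whether one or five transitions occur in a cycle, the vote contributes at most one unit, which is how the gain is bounded when transitions are dense; the window output stays within the -2 to +2 range that the proportional path expects.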

3.2.2 Proportional Path

The gain GP in Fig. 3.1(b) is implemented by a modified first-order delta-sigma modulator, in which a sign-bit path is added to handle both positive and negative inputs. The architecture and operation are shown in Fig. 3.4. The input of the proportional path comes from the sliding window, which sums two successive Ups/Downs; it is therefore a 3-bit integer ranging from -2 to +2. These 3 bits are then extended to (N+1) bits for truncation. The proportional gain is set by the truncation depth N, i.e., GP = 2^-N; for example, N = 2 gives GP = 0.25. In our design, N is programmable from 2 to 5.

Fig. 3.4(b) and (c) show that, with GP = 1/4, over six cycles the proportional path generates two positive steps when the input contains eight Up signals, and two negative steps when it contains eight Down signals. This implementation thus realizes a fractional, time-averaged gain, and its output is the phase adjustment step. The phase rotator integrates these steps to track the phase of the incoming data.
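The time-averaged fractional gain can be reproduced with a first-order delta-sigma accumulator. The Python sketch below is reconstructed from the description rather than from the schematic of Fig. 3.4; in particular, keeping the residue in sign-magnitude form (truncating toward zero) is an assumption about how the sign-bit path treats positive and negative inputs symmetrically, and the function name is hypothetical.

```python
def fractional_gain(inputs, n):
    """Delta-sigma gain of 2**-n with a sign path, one output per 600 MHz update.

    inputs: sliding-window outputs in [-2, +2]. Returns phase steps in
    {-1, 0, +1} (valid for n >= 1); their running sum approximates
    2**-n times the running sum of the inputs.
    """
    full_scale = 1 << n
    acc = 0
    steps = []
    for x in inputs:
        acc += x
        step = (abs(acc) >> n) * (1 if acc > 0 else -1)   # truncate toward zero
        acc -= step * full_scale                          # keep the residue
        steps.append(step)
    return steps


print(fractional_gain([2, 1, 2, 1, 1, 1], 2))        # [0, 0, 1, 0, 0, 1]
print(fractional_gain([-2, -1, -2, -1, -1, -1], 2))  # [0, 0, -1, 0, 0, -1]
```

These two sequences mirror the GP = 1/4 case in the text: eight Ups (or Downs) spread over six updates yield two positive (or negative) phase steps.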

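As phase adjustments continue, the proportional path alone also has a very limited frequency tracking capability. For example, in our proposed system with N = 3, the maximum frequency offset the proportional path can tolerate is

$$ \Delta f_{max} = \frac{1}{2^{N}} \times \frac{600\,\text{MHz}}{6\,\text{GHz}} \times \frac{1}{32} = \frac{1}{8} \times \frac{1}{10} \times \frac{1}{32} = 390.625\ \text{ppm} \tag{3.1} $$

where 1/2^N is the time-averaged number of phase steps per 600 MHz update, 600 MHz/6 GHz converts updates into UIs, and 1/32 UI is the size of one phase step.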
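Evaluating Eq. (3.1) over the programmable range of N shows why the proportional path cannot handle SSC on its own. A minimal Python sketch with a hypothetical function name; the 5350 ppm comparison value is the ±350 ppm tolerance plus the 5000 ppm SSC down-spread from Section 2.3.1.

```python
def prop_path_tolerance_ppm(n, f_update_hz=600e6, f_data_hz=6e9, steps_per_ui=32):
    """Frequency offset (ppm) the proportional path alone can follow, per Eq. (3.1)."""
    steps_per_update = 1 / 2 ** n            # time-averaged phase steps per 600 MHz update
    ui_per_update = f_data_hz / f_update_hz  # UIs elapsed per update
    return steps_per_update / steps_per_ui / ui_per_update * 1e6


WORST_CASE_PPM = 5000 + 350   # SSC down-spread plus transmitter tolerance
for n in range(2, 6):
    ppm = prop_path_tolerance_ppm(n)
    print(f"N={n}: {ppm:.3f} ppm, covers worst case: {ppm >= WORST_CASE_PPM}")
```

Even the largest gain (N = 2, about 781 ppm) falls far short of the 5350 ppm worst case, which is why the integral path of Section 3.2.3 is needed to track frequency.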

Fig. 3.4 (a) The architecture of the proportional path. (b) The operation of the proportional path (GP = 1/4). (c) The operation with negative values (GP = 1/4).

Fig. 3.5 The architecture of the integral path.

3.2.3 Integral Path

To track the frequency as well as the phase of the incoming data, the Up/Down information must be integrated to form frequency information. The integrated signal is then passed to a time-averaged gain element similar to the proportional path, as shown in Fig. 3.5. The input is extended to (4+M) bits, of which 4 bits form the integer part and M bits the fractional part. The integer width is determined by the SSC frequency deviation. At the maximum SSC frequency deviation, the maximum phase adjustment required from the integral path is

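$$ \frac{5000\,\text{ppm} \times \dfrac{6\,\text{GHz}}{600\,\text{MHz}}}{\dfrac{1}{32}\,\text{UI} \times C} = \frac{0.05\,\text{UI}}{\frac{1}{32}\,\text{UI} \times 0.5} = 3.2 < 4 \tag{3.2} $$

where C is the counter gain, which is 0.5 in our design; the reason for choosing 0.5 is explained in Section 3.2.4. To represent values from -4 to +4, a 4-bit integer width is needed.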
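The integer-width argument of Eq. (3.2) can be checked numerically. A minimal Python sketch; it assumes the integrator word is stored in two's complement, which the text does not state, and the function name is hypothetical.

```python
import math


def integral_integer_bits(dev_ppm=5000, f_data_hz=6e9, f_update_hz=600e6,
                          steps_per_ui=32, counter_gain=0.5):
    """Peak integrator value from Eq. (3.2) and the signed width needed to hold it."""
    ui_per_update = dev_ppm * 1e-6 * (f_data_hz / f_update_hz)  # phase drift per update
    peak = ui_per_update * steps_per_ui / counter_gain          # integrator value needed
    mag = math.ceil(peak)                                       # round up to an integer range
    bits = 1
    # Two's complement with `bits` bits covers -(2**(bits-1)) .. 2**(bits-1) - 1.
    while not (-(2 ** (bits - 1)) <= -mag and mag <= 2 ** (bits - 1) - 1):
        bits += 1
    return peak, bits


print(integral_integer_bits())   # peak of about 3.2 and a 4-bit integer part
```

With three bits the signed range stops at +3, so representing +4 forces the fourth integer bit, consistent with the (4+M)-bit word described above.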

Unlike in the proportional path, the delay of the integral path is not critical to the high-frequency part of the jitter tolerance mask. Pipelining is therefore suitable for maintaining functionality at 600 MHz: pipeline registers are inserted between the integer integrator, the fractional integrator, and the fractional counter. As the proportional path, the

Multiple Alternating Edge Sampling Technique and Clock and Data Recovery Circuit for Spread Spectrum Clocking