10 Gb/s Clock and Data Recovery Circuit Using an Improved Current-Mode Logic Latch

National Chiao Tung University

Department of Communication Engineering, Institute of Communication Engineering

Master's Thesis

10Gb/s Clock and Data Recovery Circuit with Improved MCML Latch

Student: Chun-Hong Chiu

Advisor: Prof. Chung-Chih Hung

A Thesis

Submitted to Department of Communication Engineering

College of Electrical Engineering and Computer Science

National Chiao-Tung University

In Partial Fulfillment of the Requirements

For the Degree of

Master of Science

In

Communication Engineering

January 2006

Abstract (Chinese)
Abstract (English)
Acknowledgement
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background
1.2 NRZ Data Format
1.3 Optical-Fiber Transceiver
1.4 Eye Diagram Analysis
1.5 Thesis Overview
Chapter 2 Clock and Data Recovery Architectures
2.1 Principles of Operation
2.2 CDR Fundamental
2.2.1 Bang-Bang PD
2.2.2 Voltage-to-Current Converter
2.2.3 Loop Filter
2.2.4 Voltage-Controlled Oscillator
2.3 Analysis of Loop Performance
2.3.1 approximated frequency response with Loop filter
Chapter 3 10Gb/s CDR Design
3.1 Introduction
3.2 Circuit Description
3.2.1 High speed MCML Latch
3.2.2 Improved High speed MCML Latch
3.2.3.1 D_type Flip-Flop
3.2.3.2 XOR gate
3.2.4 V/I converter and Loop filter
3.2.5 VCO
3.2.6 Output Driver
3.3 Simulation Result
Chapter 4 VLSI Implementation
4.1 Layout
4.2 Performance Summary
Chapter 5 Conclusion and Future research
Bibliography

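Abstract (translated from the Chinese)

With the development of CMOS process technology and the rapid growth of processor computing power, broadband data links for transferring information are becoming more and more important. In many applications, such as interfaces inside a computer, between computers, and between computers and peripherals, such a link is usually an essential part and is currently the bottleneck of the overall system operating speed. To overcome the signal integrity problems caused by various noise sources during data transmission, the receiver plays an important role in the performance of the whole high-speed link, and its most complicated part is the design of the clock and data recovery circuit. Traditionally, GaAs MESFET, GaAs HBT, or Si BiCMOS processes were mostly used in order to operate at high frequencies. However, because of the high speed, low cost, low power, and high integration of deep-submicron CMOS processes, they are now also being considered for these high-speed circuits. The subject of this thesis is to implement, in a standard CMOS process, a 10Gb/s clock and data recovery circuit that uses our improved current-mode logic latch, and to compare its performance with that of a clock and data recovery circuit using the conventional current-mode logic latch. The clock and data recovery circuit uses an Alexander digital phase detector, symmetric XOR gates, and an LC voltage-controlled oscillator. The circuit is designed in the TSMC 0.18um process and consumes 130 mW (including the output buffers) from a 1.8V supply. The jitter of the retimed data with the improved current-mode logic latch is 7.5ps, better than the 11.2ps obtained with the original current-mode logic latch.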

Abstract

The scaling of CMOS process technologies and the increasing computational capability of processors make high-bandwidth links for communicating information more and more important. Such high-speed links are necessary parts of many applications, such as intra-computer, computer-to-computer, and computer-to-peripheral interfaces, and they are the bottleneck of the system operating speed. To overcome the signal integrity problems induced by various noise sources during data transmission, the receiver design plays a significant role in the overall performance of high-speed links. The design of the clock and data recovery circuit is the most complicated part of the transceiver implementation. Traditionally, such high-speed circuits used for multi-Gb/s data communication were implemented with either GaAs MESFET, GaAs HBT, or Si BiCMOS technology. However, deep sub-micron CMOS technology is now being considered for these high-speed circuits because of its high speed, low cost, low power dissipation, and high integration capability.

The goal of this work is to use a standard CMOS process to implement a 10Gb/s clock and data recovery (CDR) circuit using the improved MCML latch. A comparison of the CDR performance with the improved MCML latch and with the common MCML latch is provided. This clock and data recovery circuit uses an Alexander bang-bang phase detector, symmetric XOR gates, and an LC-tank Voltage-Controlled Oscillator (VCO). The circuit is designed in TSMC 0.18-um CMOS technology and consumes 130 mW (including the output buffers) from a 1.8 V supply. The peak-to-peak jitter of the retimed data of the CDR using the improved MCML latch is 7.5ps, better than the 11.2ps peak-to-peak jitter of the retimed data of the CDR using the common MCML latch.

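Acknowledgement

First, I would like to express my most sincere gratitude to my advisor, Prof. Chung-Chih Hung, and to the oral defense committee members, Prof. 陳信樹, Prof. 蘇育德, and Prof. 闕河鳴, for the improvements and suggestions they offered on my thesis. I thank my advisor for his careful and patient guidance and reminders during my two years of master's research, and for his encouragement and valuable life experience at the right moments. Next, I would like to express my deep thanks to my senior labmate 天佑, to my classmates 誌倫, 三益, 家瑋, and 峻岳, who accompanied me for two years, and to the junior students of the laboratory, for the knowledge they inspired and for their help with the experiments. I also thank my friends in other laboratories, 明衡 and 啟賓, and my senior Mr. 蕭, who gave me great moral and practical help while I completed this thesis. Finally, I thank my dearest parents and my considerate sisters for their support and care, which gave me the most attentive protection throughout my life and someone to rely on while growing up and studying. I dedicate this thesis to everyone who cares about me. Chun-Hong Chiu, National Chiao Tung University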

List of Tables

2-1 Samples and the clock phase condition
3-1 Post simulation result of the two type D Flip-Flops
3-2 VCO post simulation result

List of Figures

1-1 Non-return-to-zero data format
1-2 The spectrum of NRZ data
1-3 Block diagram of a common optical-fiber transceiver
1-4 Eye diagram
2-1 A simple CDR block diagram
2-2 The Alexander PD principle
2-3 The Alexander phase detector
2-4 The waveforms of Alexander PD when clock lags
2-5 Bang-Bang PD characteristic
2-6 A second-order loop filter
2-7 Illustration of the VCO (a) model of the oscillator, (b) characteristic
2-8 Model of the CDR
2-9 Bode plot of the open-loop transfer function
2-10 The closed-loop frequency response of the CDR
2-11 The closed-loop transient step response of a CDR
3-1 Inverter circuits of the (a) CMOS logic and (b) MCML
3-2 Power consumption of the MCML and CMOS logic
3-3 Circuit schematic of a CMOS CML latch
3-4 Voltage waveforms at the output of a MCML Latch
3-5 The improved MCML latch
3-6 Voltage waveforms at the output of the improved MCML Latch
3-7 The Master-Slave D Flip-Flop
3-8 (a) The Eye diagram of the D Flip-Flop (b) The Eye diagram of the D Flip-Flop with improved MCML latch
3-9 MCML XOR gate
3-10 Symmetric XOR gate
3-11 The transfer function of the Alexander PD
3-14 Structure of varactor (a) pn junction (b) accumulation-mode MOS
3-15 The C-V characteristic of accumulation-mode MOS varactor
3-16 (a) Symmetric spiral inductor (b) circuit model of inductor
3-17 Simulation result of the VCO
3-18 Source follower output buffer
3-19 CDR locking behavior
3-20 Retimed data and clock
3-20 The eye diagram of the retimed data with improved MCML latch
3-21 The eye diagram of the retimed data with common MCML latch
4-1 Chip layout of the CDR circuits
A-1 LM 317 regulator
A-2 Bypass filter at regulator output
A-3 (a) Off-chip bonding wire test (b) The testing PCB
A-4 Experimental test setup
A-5 VCO tuning range experimental result
A-6 VCO signal power

Chapter 1 Introduction

1.1 Background

With the rapidly growing volume of data in telecommunication networks, research on high-speed optical and electronic devices and systems has increased. With the popularization of the Internet and the rise in the speed of microprocessors and memories, the transport of data continues to be the bottleneck, motivating work on faster communication channels.

The idea of using light as a carrier for signals has been around for more than a century, but researchers could not demonstrate the utility of the optical fiber as a medium for light propagation until the mid-1950s. Optical fiber has extremely low loss (0.15 to 0.2 dB/km) and large bandwidth (25 to 50 GHz), so optical-fiber communication systems have been used mainly for high-speed, high-density, long-distance communications[1]. The advantages of optical-fiber transmission in local systems are now being explored in applications such as Local Area Network (LAN) and Wide Area Network (WAN) systems for multimedia, Fiber To The Home (FTTH)[2], and board-to-board interconnections between computers[3].

1.2 NRZ Data Format

The optical-fiber communication system sends information as Non-Return-to-Zero (NRZ) data, a series of optical or electrical pulses. The NRZ data format has the property that the voltage level is constant during a bit interval. For example, a constant positive voltage can be used to represent binary 1, and the absence of the voltage can be used to represent binary 0. An instance of NRZ data is illustrated in Figure 1-1, and Figure 1-2 shows the spectrum of NRZ data. The NRZ data exhibit no spectral line at the frequency equal to the bit rate. This makes the clock frequency hard to detect from NRZ data, but the format uses bandwidth more efficiently.

Figure 1-1 Non-return-to-zero data format (waveform D(t) versus t for the bit sequence 1 0 1 1 0 1 0 0 0 1 1, with bit period Tb)
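As a small illustration of the NRZ format described above, the following Python sketch holds one level for each bit interval and evaluates the textbook sinc-squared envelope of the spectrum of random NRZ data, which has a null at the bit rate. The function names, the four samples per bit, and the 100 ps bit period (10 Gb/s) are assumptions made for this example and are not taken from the thesis.

```python
import math


def nrz_encode(bits, high=1.0, low=0.0, samples_per_bit=4):
    """Hold one constant level for the whole bit interval Tb (NRZ)."""
    wave = []
    for b in bits:
        wave.extend([high if b else low] * samples_per_bit)
    return wave


def nrz_psd_envelope(f, tb):
    """Assumed normalized envelope Tb * sinc^2(f * Tb) of random NRZ data."""
    x = math.pi * f * tb
    if x == 0.0:
        return tb
    return tb * (math.sin(x) / x) ** 2


bits = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1]  # sequence shown in Figure 1-1
wave = nrz_encode(bits)

tb = 100e-12  # assumed bit period: 10 Gb/s -> Tb = 100 ps
ratio = nrz_psd_envelope(1.0 / tb, tb) / nrz_psd_envelope(0.0, tb)
print("relative spectral density at the bit rate:", ratio)
```

The ratio printed at the end is numerically negligible, which is the reason a CDR cannot simply filter the clock out of the NRZ spectrum.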


1.3 Optical-Fiber Transceiver

A common optical-fiber communication system consists of a transmitter and a receiver, as shown in Figure 1-3[1]. At the transmitter side, a multiplexer takes several signals and interleaves them into a high-speed output. The output stream is sent to a laser driver and turned into an optical signal by a laser diode. The optical signal travels through a fiber to the receiver side.

At the receiver side, the received light is sensed by a photodetector (e.g., a photodiode), which converts the light to a weak current signal. The photodiode is followed by a transimpedance amplifier (TIA), which amplifies the photodiode output with low noise and sufficient bandwidth and converts the current signal into a voltage signal. The Clock and Data Recovery (CDR) circuit is the most important block at the receiver end. The CDR block extracts the clock signal from the received random NRZ data to allow synchronous operation in the subsequent blocks, and it retimes the input data to remove the jitter accumulated during transmission, improving the bit error rate (BER) of the receiver. The demultiplexer uses the clock signal from the CDR to demultiplex the signal into several low-speed outputs.

1.4 Eye Diagram Analysis

The data eye at the receiver is used to evaluate how well the CDR circuit synchronizes to the data. The eye diagram of the received data is shown in Figure 1-4. It indicates where the data can be resampled reliably. The best point to sample data is at the center of the eye, which maximizes the timing margin for data reception. The maximum data transfer rate is determined by the following parameters[4]:

[1] TRX_SH: the sample-and-hold window of the sampler, i.e., the time zone around the sampling time during which a changing input signal can result in an undefined receiver output.

[2] TRX_Jitter: the receiver's clock jitter. It is caused by power supply and substrate noise resulting from the switching of digital logic or output buffers, which introduces timing error on the receiver side.

[3] TTX_Jitter: the transmitter's clock jitter. It is introduced by the noise and the clock jitter on the transmitter side.

[4] Toffset: the static sampling error. The static sampling offset typically results from systematic clock skew, which shifts the average position of the sampling points away from the center of the data eye.

[5] TISI: the inter-symbol interference. This is the sum of the rise/fall time of the signal plus the uncertainty in the total signal delay.

Figure 1-4 Eye diagram

The timing margin of the data eye, Tm, can be calculated by:

Tm = Tb – ( TISI + TTX_Jitter ) – ( TRX_SH + TRX_Jitter +Toffset )
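The timing-margin expression above can be evaluated directly. The sketch below is only an illustration: the function name and every number in the budget are hypothetical values chosen for a 10 Gb/s link (Tb = 100 ps) and do not come from the thesis.

```python
def timing_margin(tb, t_isi, t_tx_jitter, t_rx_sh, t_rx_jitter, t_offset):
    """Tm = Tb - (TISI + TTX_Jitter) - (TRX_SH + TRX_Jitter + Toffset)."""
    return tb - (t_isi + t_tx_jitter) - (t_rx_sh + t_rx_jitter + t_offset)


# Hypothetical budget, all values in picoseconds.
tm = timing_margin(
    tb=100.0,          # bit period at 10 Gb/s
    t_isi=30.0,        # assumed inter-symbol interference
    t_tx_jitter=10.0,  # assumed transmitter clock jitter
    t_rx_sh=15.0,      # assumed sample-and-hold window
    t_rx_jitter=10.0,  # assumed receiver clock jitter
    t_offset=5.0,      # assumed static sampling offset
)
print(tm)  # 30.0
```

A positive result means the sampler still sees an open eye; a result at or below zero means the assumed budget does not close at that bit period.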

1.5 Thesis Overview

This thesis comprises five chapters, of which this introduction is the first. Chapter 2 describes the basics of the simple CDR circuit and discusses the trade-offs among several design parameters of the system. Chapter 3 discusses the MOS current-mode logic latch design for high-speed operation and the improved MOS current-mode logic latch, which reduces the output voltage fluctuation. It also details the transistor-level design of the simple CDR. Chapter 4 presents the VLSI implementation and several physical design strategies used to minimize noise coupling and facilitate testing. Finally, Chapter 5 summarizes the simple CDR circuit studied in this thesis.

Chapter 2 Clock and Data Recovery Architectures

The NRZ data stream received and amplified by an optical receiver suffers from both inter-symbol interference (ISI) and noise. For subsequent processing, timing information, i.e., a clock, must be extracted from the data so as to allow synchronous operations. Furthermore, the data must be retimed such that the jitter accumulated during transmission is removed. The task of clock extraction and data retiming is called “clock and data recovery” (CDR).

2.1 Principles of Operation

This chapter discusses the design issues related to simple CDR architectures. A common technique for designing an integrated CDR is to use a phase-locked loop (PLL) to generate the frequency of the received NRZ data while compensating for process and temperature variations[5][6]. PLL-based CDRs can be divided into two groups according to the type of their Phase Detector (PD): the linear proportional Phase Detector[7] and the Bang-Bang Phase Detector. Bang-Bang CDR architectures have recently found wide use in high-speed applications. The most common Bang-Bang CDR is based on the Alexander phase detector[8]. In this work, the simple CDR contains several major building blocks.

(1) Phase detector: A circuit with a two-level output that senses the phase difference between the input data and the recovered clock only on data transitions.

(2) Voltage-to-Current (V/I) converter: It converts the phase detector circuit's digital output voltage to a current signal.

(3) Loop filter: It suppresses the high-frequency components of the PD output and presents the dc level to the oscillator.

(4) Voltage-controlled oscillator (VCO): A local clock generator that is aligned to the incoming NRZ data. The recovered clock from the VCO is used to sample the incoming NRZ data.

Figure 2-1 shows the block diagram of the proposed simple CDR[1][9]. First, the Bang-Bang phase detector compares the incoming NRZ data with the recovered clock. Second, the V/I converter senses the PD output voltage and generates a current that charges or discharges the loop filter. The loop filter in turn provides the control voltage that locks the voltage-controlled oscillator to the frequency of the incoming data. Finally, once the loop is locked, the PD inherently retimes the data.

2.2 CDR Fundamental

Generally, the task of the CDR architecture is to recover the phase and frequency information from the input by extracting the clock from data transitions, and to retime the input data stream. How can the simple CDR circuit provide these functions? In the following subsections 2.2.1 to 2.2.4, we discuss the simple CDR building blocks in detail.

2.2.1 Bang-Bang PD

The phase detector is important in determining the purity of the clock and data recovered from the received NRZ data. It must be capable of dealing with random NRZ data and recovering the clock associated with the data stream. Either a linear phase detector or a digital Bang-Bang phase detector is usually used. A linear phase detector suffers from nonlinearity under non-uniform data patterns. In addition, it is difficult to design and is highly sensitive to mismatch. The Bang-Bang phase detector is less sensitive to data patterns. It is also simpler to design and provides better phase adjustment at high speed, at the cost of higher jitter.

In this work, we use the Alexander phase detector. The Alexander PD uses three data samples, S1 to S3, taken on three consecutive clock edges, to detect whether a data transition is present and whether the clock leads or lags the data. Figure 2-2 illustrates the Alexander PD principle. Figure 2-3 shows the circuit topology. The Alexander PD consists of four D Flip-Flops and two XOR gates. The D Flip-Flop FF1 samples the incoming NRZ data stream on the rising edge of CLK, and D Flip-Flop FF2 delays the result by one clock cycle. The D Flip-Flop FF3 samples the incoming NRZ data stream on the falling edge of CLK, and D Flip-Flop FF4 delays this sample by half a clock cycle.

Figure 2-2 The Alexander PD principle (samples S1, S2, S3 of Din taken on consecutive CLK edges, shown for the clock-lead and clock-lag cases)

Figure 2-3 The Alexander phase detector (D Flip-Flops FF1 to FF4 clocked by CLK and fed by the incoming data stream, with nodes T1 to T4, the lead and lag outputs, and the retimed data output)

The Alexander PD uses these three consecutive samples, S1 to S3, in one data period to determine whether the clock leads or lags the data. If the clock leads, then the last sample, S3, is unequal to the first two. Conversely, if the clock lags, then the first sample, S1, is unequal to the last two. Under these conditions, S1⊕S2 and S2⊕S3 provide the clock lead-lag information:

(a) If S1⊕S2 is low and S2⊕S3 is high, then the clock leads the input data.

(b) If S1⊕S2 is high and S2⊕S3 is low, then the clock lags the input data.

The samples S1 to S3 and the corresponding clock phase condition relative to the input data are shown in Table 2-1.

S1   S2   S3   Phase relation
0    0    0    No transition
1    0    0    Clock lag
1    1    0    Clock lead
1    1    1    No transition
0    1    1    Clock lag
0    0    1    Clock lead

Table 2-1 Samples and the clock phase condition
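The decision rule of Table 2-1 can be written as a few lines of Python. This is a behavioral sketch only: the function name, the string labels, and the choice to name the two XOR outputs `lag` (S1 xor S2) and `lead` (S2 xor S3) are assumptions for illustration. The patterns 0,1,0 and 1,0,1 do not appear in Table 2-1 and are reported here as invalid.

```python
def alexander_pd(s1, s2, s3):
    """Classify three consecutive samples following Table 2-1.

    Returns (decision, lead - lag). The names `lead` and `lag` for the
    XOR outputs are illustrative, not taken from the schematic.
    """
    lag = s1 ^ s2    # high when the first sample differs from the second
    lead = s2 ^ s3   # high when the last sample differs from the second
    if lead and not lag:
        decision = "clock lead"
    elif lag and not lead:
        decision = "clock lag"
    elif not lead and not lag:
        decision = "no transition"
    else:
        # 0,1,0 and 1,0,1 are not listed in Table 2-1.
        decision = "invalid"
    return decision, lead - lag


for samples in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1)]:
    print(samples, alexander_pd(*samples))
```

The second return value is positive when the clock leads and negative when it lags, which is the sign convention that an averaging stage after the detector would consume.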

Let us examine the waveforms at various points in the Alexander PD to get more insight into its operation. As illustrated in Figure 2-4, the first rising edge of CLK makes FF1 sample a high data level and FF2 sample a low data level, which is the prior output of FF1. On the falling edge of CLK, FF3 samples a low level on the input data. The second rising edge of CLK then accomplishes three tasks: it makes FF1 sample a low level on the input data, it produces a delayed version of the first sample at the output of FF2, and it makes FF4 reproduce the FF3 output. The values of S1, S2, and S3 are therefore valid for comparison at t = T1, remaining constant for one clock period. As a result, the XOR gates can generate valid outputs simultaneously and determine the phase difference between the clock and the input data. If there is no data transition, the values of S1, S2, and S3 are the same and

Figure 2-4 The waveforms of Alexander PD when clock lags

Figure 2-5 Bang-Bang PD characteristic (Vout versus ∆θ)

From the discussion above, the PD's behavior is shown in Figure 2-5. ∆θ is the phase difference between Din and the clock signal. If the clock lags (∆θ < 0), the average PD output, (lead – lag)avg, is a large negative value. Conversely, if the clock leads (∆θ > 0), then (lead – lag)avg is a large positive value. The Alexander PD exhibits a very high gain in the vicinity of ∆θ = 0, so the CDR loop locks such that S2 coincides with the data zero crossings and S1 appears in the center of the data eye. So the S1 can

2.2.2 Voltage-to-Current Converter

The Alexander PD outputs drive the voltage-to-current converter. The two output signals (lead, lag) are averaged in the current domain, and the result is applied to the loop filter. Because the high gain of the Alexander PD yields a small phase offset under the locked condition, this simple CDR circuit need not incorporate a charge pump. In the absence of data transitions, the V/I converter generates a zero dc output, leaving the oscillator control undisturbed. As a result, for long data runs, the VCO frequency drifts only due to device electronic noise rather than due to a high or low level on the control line.

2.2.3 Loop Filter

The loop filter sits between the V/I converter and the voltage-controlled oscillator. The transfer function of the loop filter has a large influence on the properties of the CDR loop. Figure 2-6 shows the second-order low-pass filter[10][11]. It consists of a resistor Rp in series with a capacitor Cp, with a capacitor Cs in parallel with this branch. The capacitor Cs provides a higher-frequency pole to reduce the ripple on the VCO control line. The loop filter provides a pole at the origin, which gives infinite DC gain and hence zero static phase error, and a zero, which improves the phase margin and ensures the closed-loop stability of the CDR loop.

The total transfer function of the loop filter is

$$F(s) = \frac{V(s)}{I(s)} = K_h \left( \frac{s + \omega_z}{s \left( 1 + \dfrac{s}{\omega_p} \right)} \right) \tag{2.1}$$

where

$$K_h = \frac{R_p C_p}{C_p + C_s}; \quad \omega_z = \frac{1}{R_p C_p}; \quad \omega_p = \frac{C_p + C_s}{R_p C_p C_s}$$
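As a numerical cross-check of the loop filter transfer function, the sketch below computes the impedance of the network described for Figure 2-6 (Rp in series with Cp, shunted by Cs) and compares it with the closed form K_h (s + ω_z) / (s (1 + s/ω_p)), using K_h = RpCp/(Cp + Cs), ω_z = 1/(RpCp), and ω_p = (Cp + Cs)/(RpCpCs). The component values and function names are hypothetical and are not the values used in the thesis.

```python
import math


def loop_filter_params(rp, cp, cs):
    """Return (K_h, w_z, w_p) for the second-order loop filter."""
    k_h = rp * cp / (cp + cs)
    w_z = 1.0 / (rp * cp)
    w_p = (cp + cs) / (rp * cp * cs)
    return k_h, w_z, w_p


def loop_filter_network(s, rp, cp, cs):
    """F(s) = V(s)/I(s) from the network: Cs in parallel with (Rp + 1/(s Cp))."""
    z_series = rp + 1.0 / (s * cp)
    z_shunt = 1.0 / (s * cs)
    return z_series * z_shunt / (z_series + z_shunt)


def loop_filter_closed_form(s, rp, cp, cs):
    """F(s) = K_h (s + w_z) / (s (1 + s / w_p))."""
    k_h, w_z, w_p = loop_filter_params(rp, cp, cs)
    return k_h * (s + w_z) / (s * (1.0 + s / w_p))


# Hypothetical component values, chosen only so that Cp >> Cs.
rp, cp, cs = 1.0e3, 100e-12, 5e-12
s = 1j * 2.0 * math.pi * 1.0e6  # evaluate at 1 MHz
print(loop_filter_network(s, rp, cp, cs))
print(loop_filter_closed_form(s, rp, cp, cs))
```

Both evaluations agree to floating-point precision, and with Cp much larger than Cs the pole ω_p lies well above the zero ω_z.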

2.2.4 Voltage-Controlled Oscillator

The voltage-controlled oscillator generates an output waveform whose frequency is set by the control voltage, as shown in Figure 2-7(a). Figure 2-7(b) shows the characteristic of the VCO: the VCO frequency ω0 is a linear function of the control voltage Vc. The curve need not be linear, but the design is usually simpler if the slope is the same everywhere. The slope Kvco is the gain of the VCO. Gain and linearity are the most important parameters for CDR systems. Some specifications of the VCO are introduced below [1][9][10]:

Tuning range: The tunable frequency range of the VCO must cover the entire frequency range required by the application.

Tuning linearity: An ideal VCO has a constant VCO gain, Kvco, over the required tuning range.

Power supply sensitivity: In SOC design, the switching noise induced by digital circuits couples to the VDD of a VCO and influences its output waveform. Hence, this sensitivity must be as low as possible to reduce the VCO output jitter.

Phase stability: The ideal spectrum of the VCO output should look like a Dirac impulse. That is to say, the phase noise of the VCO must be as low as possible.

Figure 2-7 Illustration of the VCO (a) model of the oscillator, (b) characteristic: output frequency versus control voltage, with Kvco = (f2 − f1)/(V2 − V1)

2.3 Analysis of Loop Performance

With the Bang-Bang PD characteristic, the clock falling edge must sample the zero-crossing points of the input NRZ data. Even for a slight phase error, the PD generates a large output, driving the loop toward lock. We now consider a more realistic Bang-Bang characteristic, where the gain in the vicinity of ∆θ = 0 is finite. The finite slope arises from the metastability of the D flip-flops[1]: if a flip-flop samples at the zero-crossing point of the input data, its output may not reach the full logic level in one bit period. When the Alexander PD locks, ∆θ approaches zero and the second sample, S2, falls in the vicinity of the data zero crossing, thereby driving FF3 and FF4 into metastability. In metastability, the XOR gates produce small differential outputs, yielding a small average output for the overall PD. Thus, the CDR loop can lock such that the XOR gates experience metastable inputs most of the time. For phase differences small enough to produce an unsaturated output, the PD characteristic is approximately linear[1]. We therefore follow the analysis of a linear PLL-based CDR. The approximate model of the simple CDR with an Alexander Bang-Bang phase detector is shown in Figure 2-8, where Kd is the gain of the phase detector, Kvco is the gain of the VCO, and F(s) is the transfer function of the loop filter. We can observe the model is

Figure 2-8 Model of the CDR

We first consider the open-loop response to obtain insight into the design of the CDR circuit. This response can be derived by breaking the loop at the feedback input of the phase detector. The output phase, $\theta_o(s)$, is related to the input phase, $\theta_i(s)$, by

$$\theta_o(s) = \theta_i(s) \cdot K_d \cdot F(s) \cdot \frac{K_{VCO}}{s} \tag{2.2}$$

The open-loop response, H(s), is then given by

$$H(s) = \frac{\theta_o(s)}{\theta_i(s)} = K_d \cdot F(s) \cdot \frac{K_{VCO}}{s} \tag{2.3}$$

Using the loop filter in Figure 2-6, Eq. (2.3) becomes

$$H(s) = \frac{K_d \cdot K_{VCO}}{s^2 \cdot (C_s + C_p)} \left( \frac{1 + s \cdot R_p \cdot C_p}{1 + s \cdot R_p \cdot \dfrac{C_p \cdot C_s}{C_p + C_s}} \right) = K \cdot \left( \frac{s + \omega_z}{s^2 \left( 1 + \dfrac{s}{\omega_p} \right)} \right) \tag{2.4}$$

where $K = K_d \cdot K_h \cdot K_{VCO}$ is the loop bandwidth of the CDR.

Figure 2-9 shows the Bode plot of the transfer function. The phase of H(s) is −180° at ω = 0; the zero ωz introduces a phase shift of +90° and the pole ωp introduces a phase shift of −90°. The phase margin can be described as follows:

$$PM = \tan^{-1}\left( \frac{K}{\omega_z} \right) - \tan^{-1}\left( \frac{K}{\omega_p} \right) \tag{2.5}$$

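Another way to approximate this parameter is to ignore the shunt capacitor Cs. Since Cp >> Cs, the zero, $\omega_z = \frac{1}{R_p C_p}$, is much smaller than the pole, $\omega_p = \frac{C_s + C_p}{R_p \cdot C_s \cdot C_p}$.

Hence, Eq. (2.4) can be rewritten as

$$H(s) = K_d \cdot K_{VCO} \cdot \frac{1 + s \cdot R_p \cdot C_p}{s^2 \cdot C_p} \tag{2.6}$$

where

$$F(s) = R_p + \frac{1}{s \cdot C_p}$$

Figure 2-9 Bode plot of the open-loop transfer function (magnitude |H(s)| and phase ∠H(s), with ωz, ωp, the 0 dB level, −180°, and the phase margin marked)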

For the CDR to be stable, the following condition should hold:

$$\frac{1}{R_p \cdot C_p} < \sqrt{\frac{K_d \cdot K_{VCO}}{C_p}} \tag{2.7}$$

Following the analysis above, a design that satisfies this condition has a phase margin that is adequate for the loop[10][12].
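The phase margin expression of Eq. (2.5), read here as PM = arctan(K/ω_z) − arctan(K/ω_p), is easy to explore numerically. The sketch below is a hedged illustration: the function name and the values of ω_z, ω_p, and K are hypothetical and are not design values from the thesis.

```python
import math


def phase_margin_deg(k, w_z, w_p):
    """PM = atan(K / w_z) - atan(K / w_p), returned in degrees."""
    return math.degrees(math.atan(k / w_z) - math.atan(k / w_p))


# Hypothetical loop parameters in rad/s.
w_z = 1.0e6
w_p = 16.0e6
for k in (2.0e6, 4.0e6, 8.0e6):
    print(k, round(phase_margin_deg(k, w_z, w_p), 2))
```

With these assumed numbers the margin is largest when K sits at the geometric mean of ω_z and ω_p (4e6 rad/s here), which illustrates why the zero is placed well below the loop bandwidth and the pole well above it.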

2.3.1 approximated frequency response with Loop filter

In contrast to the approximate analysis above, the other popular method of analyzing a CDR uses the closed-loop transfer function, which is written in Eq. (2.8), with the loop filter transfer function approximated as

$$F(s) = R_p + \frac{1}{s \cdot C_p}$$

$$G(s) = \frac{\theta_o(s)}{\theta_i(s)} = \frac{H(s)}{1 + H(s)} = \frac{K_d \cdot F(s) \cdot K_{VCO}}{s + K_d \cdot F(s) \cdot K_{VCO}} \tag{2.8}$$

or, equivalently, by

$$G(s) = \frac{\theta_o(s)}{\theta_i(s)} = \frac{H(s)}{1 + H(s)} = \frac{2 \cdot \xi \cdot \left( \dfrac{s}{\omega_n} \right) + 1}{\left( \dfrac{s}{\omega_n} \right)^2 + 2 \cdot \xi \cdot \left( \dfrac{s}{\omega_n} \right) + 1} \tag{2.9}$$

where ξ, defined as the damping factor, is given by

$$\xi = \frac{1}{2} \sqrt{\frac{K}{\omega_z}} \tag{2.10}$$

and the natural frequency ω_n is given by

$$\omega_n = \sqrt{K \cdot \omega_z} \tag{2.11}$$

The damping factor and natural frequency characterize the closed-loop response. The closed-loop frequency response of the CDR for different values of the damping factor, normalized to the natural frequency, is shown in Figure 2-10. This figure shows that the CDR behaves as a low-pass filter for phase noise, passing components at frequencies below ω_n. For small values of ξ, the curve is sharper than for large values of ξ. In the CDR design, the loop is designed to be over-damped (ξ > 1) to avoid the jitter peaking effect. This also helps increase the phase margin of the open-loop transfer function.

Figure 2-10 The closed-loop frequency response of the CDR (magnitude in dB versus frequency in Hz)
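To make the effect of the damping factor concrete, the sketch below evaluates the magnitude of the normalized second-order response G(s) = (2ξ(s/ω_n) + 1) / ((s/ω_n)² + 2ξ(s/ω_n) + 1) over a frequency sweep and reports the peak in dB. The function names, sweep range, and number of points are assumptions made for this illustration.

```python
import math


def closed_loop_mag_db(w_norm, xi):
    """|G(jw)| in dB, with w_norm = w / w_n."""
    s = 1j * w_norm
    g = (2.0 * xi * s + 1.0) / (s * s + 2.0 * xi * s + 1.0)
    return 20.0 * math.log10(abs(g))


def jitter_peaking_db(xi, points=2000):
    """Largest |G| in dB over a log sweep of w / w_n from 0.01 to 100."""
    sweep = (10.0 ** (-2.0 + 4.0 * i / (points - 1)) for i in range(points))
    return max(closed_loop_mag_db(w, xi) for w in sweep)


for xi in (0.5, 1.0, 2.0):
    print("xi =", xi, "peaking =", round(jitter_peaking_db(xi), 2), "dB")
```

In this model the peaking shrinks steadily as ξ grows but does not vanish entirely, because of the zero in the numerator; a large damping factor keeps it small.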

Figure 2-11 shows the transient step response of the CDR for different values of the damping factor, with time normalized to 1/ω_n. The step response is generated by instantaneously advancing the phase of the input by one radian and observing the output for different damping levels in the time domain. For a damping factor larger than one, i.e., when the system is over-damped, the CDR output initially responds rapidly but takes a long time to reach the steady state. We can find that the rate of the initial

CDR design.

Figure 2-11 The closed-loop transient step response of a CDR (S(t) versus time, axis label: Time (10us))

Chapter 3 10 Gb/s Clock and Data Recovery Circuit Design

3.1 Introduction

This chapter discusses the circuit design in more detail at the transistor level. The design method can be applied to a CDR with an input data rate of 10 Gb/s. The process and models used for the circuit design are those of the TSMC 0.18 µm 1P6M CMOS process. We simulate the CDR with HSPICE to obtain its detailed electrical behavior. Moreover, process variations and temperature effects should be taken into account: the CDR must be simulated at high temperature and at the slow and fast process corners, in addition to the normal temperature and typical process. After that, a post-layout simulation that includes the parasitic resistances and capacitances of the circuit must be run.

3.2 Circuit Description

For the CDR circuit to handle high-frequency signals, the circuit must have a fast switching speed. ICs operating at speeds greater than 10 Gb/s usually use GaAs MESFETs, GaAs HBTs, or Si BiCMOS transistors. The power consumption of these processes, however, is relatively large because their supply voltage is high and their

and highly integrated for low cost. CMOS transistors have the advantages of low power consumption and low cost, but they have still rarely been used in high-speed systems because their operation speed is too low.

3.2.1 High speed MCML Latch

Generally, a conventional CMOS inverter exhibits some drawbacks that prevent it from being widely used in high-speed, low-voltage circuits. First, a CMOS inverter is essentially a single-ended circuit. In the multi-gigahertz frequency range, short on-chip wires act as coupled transmission lines. The electromagnetic coupling can thus cause serious malfunctions, particularly in single-ended circuits. Besides, the pMOS transistor in a static CMOS inverter severely limits the maximum operating frequency of the circuit. For the circuit to operate correctly in the 10 GHz range, we use MOS current-mode logic (MCML) in place of conventional CMOS logic. MCML circuits can operate with a lower signal voltage and a higher operating frequency at a lower supply voltage than static CMOS circuits. MCML has been used extensively to implement ultrahigh-speed buffers [13], [14], latches [14], multiplexers and demultiplexers [15], and frequency dividers [16].

Figure 3-1 shows inverters in CMOS logic and conventional MCML. CMOS logic has the advantage of low power consumption, but its operation is relatively slow. For example, the maximum toggle frequency of a conventional 0.18 μm CMOS inverter is only about 3.5 GHz. The power consumption of CMOS logic is the product of the operation frequency and the energy spent charging and discharging per switching event. On the other hand, the power consumption of MCML is set by the drain current of the current-source transistor MNb. Therefore, the power consumption of MCML is nearly independent of the operation frequency. Because CMOS logic uses power only when charging and discharging, its power consumption is generally smaller than that of MCML. However, in the gigahertz frequency range, the power consumption of CMOS logic becomes larger than that of MCML, as shown in Figure 3-2 [15]. This means that MCML is more suitable for low-power operation in the gigahertz frequency range.

Figure 3-1 Inverter circuits of the (a) CMOS logic and (b) MCML

Figure 3-2 Power consumption of the MCML and CMOS logic
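The crossover between the two logic styles can be estimated with a first-order model. The sketch below assumes the standard dynamic-power estimate C·VDD²·f for a CMOS gate that toggles every cycle and a static power VDD·Iss for an MCML gate. The load capacitance and tail current are hypothetical values, not figures from the thesis or from [15].

```python
def cmos_dynamic_power(c_load, vdd, f):
    """First-order CMOS switching power, assuming one toggle per cycle."""
    return c_load * vdd ** 2 * f


def mcml_static_power(vdd, i_ss):
    """MCML power set by the tail current, independent of frequency."""
    return vdd * i_ss


def crossover_frequency(c_load, vdd, i_ss):
    """Frequency at which the two estimates are equal: Iss / (C * VDD)."""
    return i_ss / (c_load * vdd)


# Hypothetical values: 100 fF load, 1.8 V supply, 1 mA tail current.
c_load, vdd, i_ss = 100e-15, 1.8, 1.0e-3
f_x = crossover_frequency(c_load, vdd, i_ss)
print("crossover at about", round(f_x / 1e9, 2), "GHz")
```

Below the crossover the CMOS estimate is smaller, and above it the frequency-independent MCML estimate is smaller, which is the qualitative behavior described for Figure 3-2.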

MN4, being employed to store the data. Figure 3-3 demonstrates a CMOS CML latch circuit.

Figure 3-3 Circuit schematic of a CMOS CML latch

The track and latch modes are determined by the clock signal applied to a second differential pair, MN5 and MN6. When the signal CK is high, the tail current Iss flows entirely to the tracking circuit, MN1 and MN2, thereby allowing Vout to track Vin. In the latch mode, the signal CK goes low and the tracking stage is disabled, whereas the latch pair is enabled, storing the logic state at the output.
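At the logic level, the track and latch modes described above reduce to a level-sensitive latch. The following behavioral sketch ignores all analog effects (swing, delay, and the output fluctuation discussed later) and uses a hypothetical function name and test sequence.

```python
def mcml_latch(ck_seq, d_seq, q0=0):
    """Level-sensitive latch: follow D while CK is high, hold while CK is low."""
    q = q0
    out = []
    for ck, d in zip(ck_seq, d_seq):
        if ck:
            q = d      # track mode: the output follows the input
        out.append(q)  # latch mode: the stored state is kept
    return out


ck = [1, 1, 0, 0, 1, 0]
d = [1, 0, 1, 1, 1, 0]
print(mcml_latch(ck, d))  # [1, 0, 0, 0, 1, 1]
```

Two such latches clocked on opposite phases form a master-slave flip-flop, which is how the flip-flops of the phase detector are usually built from this cell.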

To achieve the best performance in an MCML latch, complete current switching must take place, and the current produced by the tail current source needs to flow through the ON branch only. The single-ended output voltage swing of the latch is then RD⋅ISS (RD is the equivalent resistance of MP1 or MP2 when it operates in the linear region). The value of the load resistance depends on both the tail current and the voltage swing requirement. Reducing the load resistance RD and increasing the tail current Iss is one way to shorten the transition time without changing the output voltage swing, but increasing the tail current also increases the power consumption. The sizes of the pair transistors (MN1-MN2 and MN3-MN4) increase with the tail current and also increase as the single-ended voltage swing is reduced. The sizes of the other transistors (MN5, MN6 and MNb) depend on the tail current only. For high-speed switching between MN1-MN2 and MN5-MN6, we make the transistors as large as possible. On the other hand, increasing the transistor size increases the parasitic capacitance, which slows the operation of the circuit. We must find the best balance between transistor size and parasitic capacitance for circuit performance.
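The trade-off between tail current, load resistance, and power can be illustrated with a first-order calculation. The sketch below assumes the swing relation RD·Iss stated above and, as an additional simplifying assumption not made in the thesis, models the output transition with a single time constant RD·C. All numerical values and function names are hypothetical.

```python
def load_resistance(v_swing, i_ss):
    """RD needed for a given single-ended swing: RD = Vswing / Iss."""
    return v_swing / i_ss


def static_power(vdd, i_ss):
    """Power drawn by the tail current source from the supply."""
    return vdd * i_ss


def output_time_constant(r_d, c_out):
    """Assumed first-order estimate of the output time constant, RD * C."""
    return r_d * c_out


vdd, v_swing, c_out = 1.8, 0.4, 50e-15  # hypothetical values
for i_ss in (1.0e-3, 2.0e-3):
    r_d = load_resistance(v_swing, i_ss)
    print(
        "Iss =", i_ss,
        "RD =", round(r_d, 1),
        "tau =", output_time_constant(r_d, c_out),
        "P =", static_power(vdd, i_ss),
    )
```

Doubling the tail current at a fixed swing halves RD and the estimated time constant while doubling the static power, which matches the qualitative argument in the paragraph above. The model keeps C fixed, so it does not capture the extra parasitic capacitance of larger transistors.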

Unfortunately, the transient output of the MCML latch exhibits serious level fluctuation, as shown in Figure 3-4 for a 10Gb/s input data stream. The fluctuation comes from the clock transitions. During each transition from the sampling mode (CK is high) to the latching mode (CK is low), the tail current of the cross-coupled pair must first recharge the capacitances of the cross-coupled pair as it starts drawing current from the output nodes and changing the logic state. Consequently, current spikes appear at the output nodes of the MCML latch, producing a large output fluctuation that can cause operation failure in high-speed applications. In latch design, the output fluctuation becomes more serious as the voltage swing is lowered. It is one of the barriers to high-speed gate design[17][18].


3.2.2 Improved High speed MCML Latch

As described in the previous section, the MCML latch has serious output fluctuation. It comes from the variation of the tail current density with level changes of the clock and input data signals, even though the output nodes keep their level unchanged. This type of latch cannot avoid it. To reduce this problem, we propose a simple approach to improve the MCML latch. The proposed circuit is shown in Figure 3-5. We add an nMOS transistor, MN7, as another current source dedicated to the MN3-MN4 cross-coupled pair. The added MN7 keeps the cross-coupled pair always on, so when the latch changes from the sample mode to the latch mode, the tail current Iss does not need to recharge the capacitances of the cross-coupled pair. This reduces the current spiking at the output nodes and hence the fluctuation. Because the cross-coupled pair is always on, the sample pair, MN1-MN2, must carry a larger current than ID to change the state of the cross-coupled pair and track the input data in sample mode. Figure 3-6 shows the simulation results of the improved MCML latch with a 10Gb/s input data stream.


Figure 3-6 Voltage waveforms at the output of the improved MCML Latch

3.2.3 Alexander Phase Detector

The Alexander phase detector is composed of four D_type Flip-Flops and two XOR gates.

3.2.3.1 D_type Flip-Flop

As described in the previous chapter, the Alexander phase detector uses D Flip-Flops to sample the input data or to delay the sampling result by a full or half clock period. For correct operation in the 10GHz range, we use the improved MCML architecture proposed in the last section to realize the high-speed D Flip-Flop. Figure 3-7 shows the Master-Slave D-type Flip-Flop. When the clock is “high”, the master latch holds the sampling result for half a clock period and the slave latch samples the master output. Any change at the input nodes cannot influence the output. When the clock is “low”, the slave latch holds the sampled state for another half clock period. The result sampled at the rising clock edge is therefore held for one clock period. Because the CDR input data is a random binary sequence, we apply a 10Gb/s 2^7-1 PRBS pattern to the D Flip-Flop with a clock signal that is locked to the input pattern. The performance is shown in Figure 3-8(a) and (b). We plot the eye diagram of the D Flip-Flop output to compare the performance of the MCML latch and the improved MCML latch more clearly. Table 3-1 shows the post-simulation results of the two types of D Flip-Flop.

Figure 3-7 The Master-Slave D Flip-Flop

Figure 3-8(a) The Eye diagram of the D Flip-Flop
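For readers who wish to reproduce a comparable stimulus, a 2^7-1 PRBS can be generated with a 7-bit linear feedback shift register. The sketch below assumes the commonly used polynomial x^7 + x^6 + 1; the thesis does not state which polynomial or seed was used, so both are assumptions here.

```python
# PRBS-7 generator sketch. The polynomial x^7 + x^6 + 1 and the seed are
# assumptions; the text only specifies a 2^7-1 pattern.

def prbs7(n_bits, seed=0x7F):
    """Return n_bits of a 2^7-1 pseudo-random binary sequence as a list of 0/1."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of register stages 7 and 6 (bits 6 and 5).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


if __name__ == "__main__":
    pattern = prbs7(127)
    print("".join(str(b) for b in pattern))
```

The sequence repeats every 127 bits, so a simulation that covers at least one full period exercises every 7-bit pattern except all zeros.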


                      MCML D Flip-Flop    Improved MCML D Flip-Flop    Improved performance
Jitter (pk-pk)        3.1 ps              0.7 ps                       343%
Eye opening           368 mV              488 mV                       33%
Output fluctuation    137 mV              71 mV                        93%

Table 3-1 Post simulation result of the two type D Flip-Flops
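Table 3-1 does not state how the "Improved performance" column is defined. The sketch below reproduces the listed percentages under the assumption that quantities where lower is better (jitter, fluctuation) are normalized to the improved value, while quantities where higher is better (eye opening) are normalized to the original value. This definition is inferred from the numbers and is not given in the text.

```python
# Reproducing the "Improved performance" column of Table 3-1 under an
# assumed definition of the improvement percentage.

def improvement_lower_is_better(original, improved):
    """Reduction expressed relative to the improved value, in percent."""
    return (original - improved) / improved * 100.0


def improvement_higher_is_better(original, improved):
    """Increase expressed relative to the original value, in percent."""
    return (improved - original) / original * 100.0


if __name__ == "__main__":
    print(round(improvement_lower_is_better(3.1, 0.7)))     # jitter, ps
    print(round(improvement_higher_is_better(368, 488)))    # eye opening, mV
    print(round(improvement_lower_is_better(137, 71)))      # fluctuation, mV
```

Because the two normalizations differ, the three percentages are not directly comparable with one another.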

3.2.3.2 XOR gate

Figure 3-9 shows the common schematic of the MCML XOR gate[19]. The circuit has a structure similar to that of the latch. However, it has two symmetric differential pairs, MN1-MN2 and MN3-MN4, which are driven by the same input signal pair, A and its complement. When A and B have the same logic level, either both high or both low, either MN1-MN5 or MN3-MN6 turns on. Consequently, the output node Q goes logically low. When the two input signals have different logic levels, one high and the other low, either MN2-MN5 or MN4-MN6 turns on, and the complementary output node, Q̄, goes logically low.


It is important to note that the XOR gates in Figure 2-3 must present a symmetric load to their two inputs. Otherwise, differences in propagation delay result in systematic phase offsets. To reduce the effect of load imbalance, each of the XOR gates is implemented as shown in Figure 3-10[20]. The circuit does not use stacked stages, so it provides perfect symmetry between the two inputs. The output is single-ended, but the single-ended “early” and “late” signals produced by the two XOR gates in the phase detector are sensed with respect to each other, thus acting as a differential drive for the Voltage-to-Current converter. The operation of the XOR circuit is as follows. We set Vref at the output common-mode level of the D Flip-Flop preceding the XOR gate. If the two inputs are identical, one of the tail currents flows through the transistor MP and the output voltage is low. If the two logical inputs are not equal, then one of the input transistors on the left and one of the input transistors on the right turn on, thus turning the transistor MP off, and the output voltage is high.
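The way the two XOR outputs form a differential early/late decision can be sketched behaviorally. The model below is a logic-level illustration only: it assumes three consecutive samples (the previous data bit, the sample taken at the data edge, and the next data bit), and the function name and the sign convention (+1 for a late clock) are choices made for this example rather than definitions from the thesis.

```python
# Behavioral sketch of the Alexander (bang-bang) phase detector decision.
# Names and sign convention are illustrative assumptions.

def alexander_pd(prev_bit, edge_sample, next_bit):
    """Return +1 if the clock is late, -1 if it is early, 0 if undecided."""
    # High when the edge sample already shows the new bit (clock sampled late).
    x = prev_bit ^ edge_sample
    # High when the edge sample still shows the old bit (clock sampled early).
    y = edge_sample ^ next_bit
    # The two XOR outputs are sensed with respect to each other.
    return x - y


if __name__ == "__main__":
    for samples in [(0, 1, 1), (0, 0, 1), (1, 1, 1), (0, 1, 0)]:
        print(samples, alexander_pd(*samples))
```

When there is no data transition, both XOR outputs are equal and the difference is zero, so the loop receives no correction during long runs of identical bits.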


Figure 3-11 shows the post-simulation transfer function of the Alexander phase detector built from the circuits described above. It is obtained by taking the difference between the lead and lag signals at a 10Gb/s input data rate. The PD output crosses zero at a phase difference approximately 6.5ps (0.13π) from the metastable point, indicating that the systematic offset between the data and the clock is very small.

[Plot: PD output voltage (mV) versus phase difference (π)]

Figure 3-11 The transfer function of the Alexander PD
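The relation between the quoted timing offset and its phase value can be checked with a short calculation, assuming that one bit period at 10 Gb/s corresponds to a phase of 2π. The function name is hypothetical.

```python
# Converting a timing offset to phase, assuming one bit period equals 2*pi.

def offset_in_units_of_pi(offset_s, bit_rate_bps):
    """Timing offset expressed as a multiple of pi."""
    bit_period_s = 1.0 / bit_rate_bps
    return 2.0 * offset_s / bit_period_s


if __name__ == "__main__":
    value = offset_in_units_of_pi(6.5e-12, 10e9)
    print(f"{value:.2f} pi")
```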

3.2.4 V/I converter and Loop filter

Figure 3-12 shows the V/I converter and the loop filter. The V/I converter compares the output voltages of the two XOR gates of the Alexander phase detector and converts this voltage difference into an output current. The output current charges or discharges the loop filter to tune the VCO control voltage. Unlike charge pumps, V/I converters need not switch after every phase comparison. Therefore, they do not suffer from the dead-zone issue. Without stacked switch transistors, the V/I converter can provide nearly rail-to-rail voltage swings for the oscillator control line[20][21].


Although the Bang-Bang CDR loop is in general a nonlinear time-variant system, it can be assumed linear if the phase error is small. The design of the loop filter is based on a linear time-invariant model of the loop and is performed in the continuous-time domain.

Figure 3-12 V/I converter and loop filter

3.2.5 VCO

A voltage-controlled oscillator (VCO) is the most sensitive building block in a CDR as far as supply and substrate noise are concerned. Therefore, careful design is needed in order to reduce noise and frequency drift. Although a ring oscillator has a wide tuning range and integrates well with a digital CMOS process, it cannot achieve high-frequency operation, such as 10GHz or higher. To achieve high-frequency operation, we use the LC-tank oscillator. The LC-tank oscillator has excellent phase noise performance with low power consumption because of its relatively high quality factor. This high-speed component was previously realized in expensive technologies such as GaAs, SiGe or bipolar. Now the low cost of CMOS technology, due to its maturity and high integration density, has pushed designers to realize low-noise VCOs for CMOS systems on chip[1][22].

As shown in Figure 3-13[23], the complementary cross-coupled differential LC structure is used to realize a fully integrated low-noise, low-power oscillator in the 10GHz range under a 1.8V supply voltage. Fully differential operation provides complementary outputs. Using both pMOS and nMOS cross-coupled pairs offers higher transconductance for a given current, which results in lower power and faster switching of the cross-coupled differential pair[1][24].

Figure 3-13 The complementary cross-coupled differential LC VCO

The oscillation frequency of LC topologies is equal to

$$f_{osc} = \frac{1}{2\pi\sqrt{LC}},$$

suggesting that only the inductor and capacitor values can be varied to tune the frequency, while other parameters such as bias currents and transistor transconductances have a negligible effect. Since it is difficult to vary the value of monolithic inductors, we simply change the tank capacitance to tune the oscillator. The tunable capacitor is called a “varactor”. In circuit design, a reverse-biased pn junction or a MOSFET can serve as a varactor. The MOSFET varactor suffers from a large source-drain resistance in the vicinity of minimum capacitance due to the low carrier concentration in the channel. An “accumulation-mode MOS varactor” resolves these difficulties. Figure 3-14 shows the structure of the pn junction and accumulation-mode MOS varactors[1].

[Figure: (a) p+ and n+ regions in an n-well on a p-substrate; (b) two n+ regions in an n-well on a p-substrate, with terminals VG and VS]

Figure 3-14 Structure of varactor (a) pn junction (b) accumulation-mode MOS
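The dependence of the oscillation frequency on the tank values can be explored numerically with the ideal expression f_osc = 1/(2π√(LC)). The sketch below is a minimal illustration; the 0.5 nH inductance is an assumed example value and is not the inductor used in this design, and tank losses and parasitics are ignored.

```python
import math

# Ideal LC-tank relations. The inductance value below is an assumption
# chosen only to illustrate the order of magnitude.

def lc_frequency(l_henry, c_farad):
    """Ideal oscillation frequency of a lossless LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def capacitance_for(f_hz, l_henry):
    """Total tank capacitance needed to resonate at f_hz with inductance l_henry."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * l_henry)


if __name__ == "__main__":
    l_tank = 0.5e-9                      # assumed inductance
    c_tank = capacitance_for(10.71e9, l_tank)
    print(f"C = {c_tank * 1e15:.0f} fF for 10.71 GHz with L = 0.5 nH")
    print(f"check: f = {lc_frequency(l_tank, c_tank) / 1e9:.2f} GHz")
```

Since the frequency varies with the inverse square root of the capacitance, a given fractional change in total capacitance produces only about half that fractional change in frequency.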

While used in both bipolar and CMOS technologies, pn junction varactors become less attractive in low-supply-voltage VCO design for two reasons. First, pn junctions suffer from a limited tuning range that trades with nonlinearity in the C-V characteristic. The junction capacitance can be expressed as

$$C_{var} = \frac{C_0}{\left(1 + \dfrac{V_R}{\Phi_B}\right)^{m}} \qquad (3.1)$$

where C0 is the zero-bias value, VR the reverse-bias voltage, ΦB the built-in potential of the junction, and m a value typically between 0.3 and 0.4. At low supply voltages VR has a very limited range, yielding a small range for Cvar and hence for fosc. The capacitance varies slowly under reverse bias and sharply under forward bias, thereby introducing significant nonlinearity in the VCO characteristic. Second, at low supply voltages, it becomes increasingly difficult to select the oscillator common-mode level and signal swings so as to avoid forward-biasing the diodes. The accumulation-mode MOS varactor does not exhibit the above shortcomings. Its C-V characteristic is illustrated in Figure 3-15. The MOS varactor should operate with positive and negative biases so as to provide the maximum dynamic range, Cmax/Cmin, of 2.5 to 3 with −1 V ≤ VG − VS ≤ 1 V. This device comfortably tolerates both positive and negative voltages, allowing large VCO swings.

Figure 3-15 The C-V characteristic of accumulation-mode MOS varactor
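The limited tuning range of the pn junction varactor can be illustrated by evaluating Equation (3.1) over the reverse-bias range available from a 1.8 V supply. In this sketch the built-in potential of 0.7 V and the exponent m = 0.35 are assumed example values (the text only gives the range 0.3 to 0.4 for m), and the capacitance is normalized to C0.

```python
# Evaluating Equation (3.1) with assumed parameters (phi_b and m are examples).

def junction_capacitance(c0, v_reverse, phi_b=0.7, m=0.35):
    """Junction capacitance C_var = C0 / (1 + V_R / phi_B)^m for V_R >= 0."""
    return c0 / (1.0 + v_reverse / phi_b) ** m


if __name__ == "__main__":
    c0 = 1.0  # normalized zero-bias capacitance
    for v_r in (0.0, 0.6, 1.2, 1.8):
        print(f"V_R = {v_r:.1f} V  C_var/C0 = {junction_capacitance(c0, v_r):.3f}")
    ratio = junction_capacitance(c0, 0.0) / junction_capacitance(c0, 1.8)
    print(f"Cmax/Cmin over 0 to 1.8 V: {ratio:.2f}")
```

With these assumed parameters the capacitance ratio over the full 0 to 1.8 V reverse-bias range stays well below the 2.5 to 3 quoted for the accumulation-mode MOS varactor, consistent with the first drawback listed above.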

Monolithic inductors are typically realized as spiral structures. The mutual coupling between every two turns results in a relatively large inductance per unit area. Figure 3-16(a) shows the symmetric spiral structure and figure 3-16(b) shows the equivalent circuit model[25][1].

Figure 3-16 (a) Symmetric spiral inductor (labels: w, S, 2R; N: turn number, N=2) (b) Equivalent circuit model (elements: L, Rdc, Cs, Cox, Rsub, Csub)

The definition of each parameter is listed below:

L : inductance

Rdc : metal series resistance

Cs : overlap capacitance between the spiral and the center tap underpass

Cox : oxide capacitance between the spiral and substrate

Rsub : silicon substrate resistance

Csub : silicon substrate capacitance

For a given inductance, different combinations of line width, number of turns, and outer dimension can be used, leading to a large design space. However, the dc resistance of the inductor, Rdc, often constrains the choice of these values. In particular, the line must be sufficiently wide so that Rdc does not significantly limit the Q. Nevertheless, increasing the line width w yields a greater area and a larger capacitance for the inductor, both of which increase the loss of the inductor. At high frequency, the series resistance of the wire is also heavily influenced by the skin effect. Interestingly, the resulting current distribution also changes the inductance value because the area, and hence the magnetic flux, enclosed by each turn changes[25].
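To give a sense of scale for the skin effect mentioned above, the standard skin-depth expression δ = √(ρ / (π f μ)) can be evaluated at 10 GHz. The sketch assumes the bulk resistivity of aluminum (about 2.65e-8 Ω·m) and a non-magnetic conductor; the actual metal stack and resistivity of the process are not given in the text.

```python
import math

# Skin-depth estimate. The resistivity is an assumed bulk value for aluminum,
# not a parameter of the process used in this work.

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m


def skin_depth(resistivity_ohm_m, freq_hz, mu_r=1.0):
    """Skin depth in metres for a good conductor."""
    return math.sqrt(resistivity_ohm_m / (math.pi * freq_hz * MU_0 * mu_r))


if __name__ == "__main__":
    delta = skin_depth(2.65e-8, 10e9)
    print(f"skin depth at 10 GHz: {delta * 1e6:.2f} um")
```

Under these assumptions the skin depth at 10 GHz is below one micrometre, so widening or thickening the line beyond a few skin depths gives diminishing returns in series resistance.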

At high frequency the passive components are a very important part, because they form the core of the VCO, the resonant tank. High quality factors and low parasitics are necessary for both the inductor and the varactor with respect to phase noise, tuning range and power consumption. Thus, accurate modeling, especially at frequencies above 10GHz, may require electromagnetic field simulations. Because we have no experience in making on-chip passive components, we make use of the TSMC 0.18μm RFIC 1P6M+ process, which provides the electrical characteristics of the spiral inductor and varactor for design reference. The operation speed follows the synchronous optical network (SONET) OC-192 at the forward error-correction (FEC) bit rate of 10.71Gb/s[26]. The post-layout simulation transfer curve of the VCO is shown in Figure 3-17. Table 3-2 collates the simulation results.


[Plot: output frequency (Hz) versus control voltage (V) for the TT, SS and FF corners]

Figure 3-17 Simulation result of the VCO

                                   TT post simulation     FF post simulation     SS post simulation
Power supply                       1.8 V                  1.8 V                  1.8 V
VCO tuning range                   9.74 ~ 11.14 GHz       9.84 ~ 11.06 GHz       9.467 ~ 11.06 GHz
                                   650 MHz/V              1 GHz/V                1.5 GHz/V
VCO output swing (peak-to-peak)    1 V                    1.28 V                 815 mV
VCO power consumption              12.5 mW                15.1 mW                8.65 mW
VCO phase noise                    -103 dBc/Hz @ 1 MHz    -103 dBc/Hz @ 1 MHz    -104 dBc/Hz @ 1 MHz

Table 3-2 VCO post simulation result

3.2.6 Output Driver

A source follower circuit is used as an output driver in the chip for convenience of measurement, as shown in Figure 3-18. The figure also shows the output level shifter, the bond-wire parasitics and the 50Ω load. The level shifter preceding the source follower is used to adjust the output dc level. A source follower provides low output impedance and strong driving capability. The single-ended source follower circuit output resistance is 1 . The output signal could be measured


inductance and the 50Ω loading effect[27].

Figure 3-18 Source follower output buffer

3.3 Simulation Result

The CDR system is simulated in HSPICE with the TSMC 0.18um model. The simple CDR successfully locks to 2^7-1 NRZ PRBS data. The control voltage of the VCO is shown in Figure 3-19. The loop locks to the input data within 5.5us. The control voltage ripple is within 15mV. Figure 3-20 shows the retimed NRZ data and the recovered clock at 10.71GHz. It shows that the loop can tolerate a slight frequency offset, which is much larger than the frequency variation between the VCO output and the input NRZ data, and can lock to a noisy random data stream.


[Waveforms: input data, retimed data, clock]

Figure 3-20 Retimed data and clock

Finally, the retimed data jitter of the CDR with the improved MCML latch is shown in Figure 3-21. The eye diagram of the retimed data shows that the peak-to-peak jitter of the retimed data is 7.5ps.

Figure 3-21 The eye diagram of the retimed data with improved MCML latch

Figure 3-22 shows the retimed data jitter of the CDR with the common MCML latch. The eye diagram of the retimed data shows that the peak-to-peak jitter of the retimed data is 11.2ps. Comparing the two eye diagrams, the improved MCML latch reduces the retimed data jitter and gives a better eye opening than the common MCML latch.
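The two jitter figures can also be expressed as a fraction of the bit period (unit interval, UI), which makes them easier to compare across data rates. The sketch below uses only the values quoted above and the 10.71 Gb/s bit rate; the helper names are hypothetical.

```python
# Expressing peak-to-peak jitter in unit intervals (UI) at the stated bit rate.

def jitter_in_ui(jitter_s, bit_rate_bps):
    """Peak-to-peak jitter as a fraction of one bit period."""
    return jitter_s * bit_rate_bps


def relative_reduction(original, improved):
    """Fractional reduction of a quantity relative to its original value."""
    return (original - improved) / original


if __name__ == "__main__":
    bit_rate = 10.71e9
    improved = jitter_in_ui(7.5e-12, bit_rate)
    common = jitter_in_ui(11.2e-12, bit_rate)
    print(f"improved latch: {improved:.3f} UI, common latch: {common:.3f} UI")
    print(f"reduction: {relative_reduction(11.2, 7.5) * 100:.0f}%")
```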


Chapter 4 VLSI Implementation

4.1 Layout

In a high-speed, mixed-mode circuit design, significant attention must be paid to physical layout both to avoid speed degradation and to minimize noise coupling. In addition, testing issues must be considered in conjunction with layout floor planning and pad placement.

The whole chip layout of the CDR is shown in Figure 4-1. It can be seen that a large area is occupied by the spiral inductor. The spiral inductor is based on the TSMC 0.18um RF inductor model. Above the spiral inductor, the complementary nMOS and pMOS cross-coupled pairs of the VCO use the TSMC 0.18um RF MOS model to provide precise simulation results. The loop filter is placed off chip to reduce chip area. The top circuit block is the Alexander phase detector circuit. Below the PD block, the left block contains the data output buffers and the right block contains the two XOR gates and the V/I converter.

Several physical design strategies used to minimize noise coupling and facilitate testing are listed below:

1, To avoid substrate noise coupling from the RF VCO block to other sensitive circuit blocks, the ground used by the VCO block is separated from the ground of the other circuits.

2, To reduce the noise coupling between building blocks, and to make it easy to monitor the current sunk by each block, there are three sets of supply voltage: one for the output buffer and the level shift circuits, one for the LC-tank VCO, and one for the PD and V/I converter circuits.

3, To reduce noise coupling between circuits, every circuit has its own guard ring.

4, The biasing circuit for each main block is also separated. The allocation of biasing circuits is the same as that of the supply voltages.

Figure 4-1 Whole chip layout of the CDR (blocks: phase detector, XOR and V/I converter, output buffer, level shift, cross-coupled pairs, spiral inductor)

The proposed CDR circuit has been implemented in TSMC 0.18um 1P6M mixed signal CMOS process with a supply voltage of 1.8V. The total area of the chip is 1.05mm × 0.797mm.

4.2 Performance Summary

The performance summary of the proposed CDR with the improved MCML latch is given in Table 4-1. The parameters of the system design are listed. A comparison of the CDR performance with the improved MCML latch and with the common MCML latch is also provided in this table.

Table 4-1 Performance summary

Technology           TSMC 0.18μm RFIC 1P6M+ process
Power supply         1.8 V
Chip size            1.05 mm × 0.797 mm
Power consumption    130 mW (including output buffers)
Input bit rate       10.71 Gb/s (OC-192)
Output bit rate      10.71 Gb/s
Kpd                  75 mV/rad
Kvco                 650 MHz/V
VCO phase noise      -103 dBc/Hz @ 1 MHz
Lock time            < 6 us
Retimed jitter       7.5 ps (peak-to-peak)
CDR Spec
Power                40.7 mW
CDR with common


Chapter 5 Conclusion and Future Research

A simple CDR circuit is presented in this thesis. This is a first attempt by the author and still has much room for improvement. The simple CDR uses the proposed MCML architecture to implement the Alexander phase detector. The proposed MCML architecture reduces the output signal level fluctuation, which may improve the jitter performance. High-performance CDR design remains a challenging and promising task.

In Chapter 2, a simple CDR with an Alexander phase detector is presented. The system behavior and loop performance have been analyzed. An approximate second-order model, which follows the analysis of the linear PLL model, was derived to validate the stability and to assist in the design of the system parameters.

In Chapter 3, a simple 10GHz CMOS CDR circuit has been realized in a 0.18μm standard CMOS process. In order to operate reliably in the Gb/s range, we must use the MCML architecture, which can operate correctly with a smaller input signal voltage swing at high frequency. We proposed an improved MCML circuit which reduces the output signal level fluctuation of the original MCML. We use the full-rate Alexander bang-bang phase detector for phase tracking. The output of the phase detector drives a V/I converter. The V/I converter


change the VCO oscillating frequency. Finally, we compare the system performance of the improved MCML latch system and the common MCML latch system. The proposed improved MCML latch reduces the CDR jitter and gives a more open eye diagram than the common MCML latch.

In Chapter 4, the layout techniques are discussed. The common-centroid layout structure is used to reduce layout mismatch. The CDR circuit presented in this thesis occupies a 1.05mm × 0.797mm chip area in TSMC 0.18um 1P6M technology. The total power consumption of this chip is about 130mW under a 1.8V supply voltage (two output buffers included).

This CDR structure is quite fundamental. With the proposed MCML latch, we can mitigate the drawback of the common MCML circuit and obtain better jitter performance. In future research, a frequency detection circuit can be added to the proposed CDR circuit. The frequency detector drives the VCO frequency toward the desired value through a frequency-locked loop. When the frequency error reaches a sufficiently small value, the PLL takes over and performs phase locking. The frequency detector can overcome the small capture range typical of a PLL, especially when it operates with random data.


Bibliography

[1] B. Razavi, “Design of Integrated Circuits for Optical Communications,” McGraw-Hill, 2003.

[2] N. Miki and K. Okada, “Access flexibility with passive double star system,” IEEE 5th Conf. Opt./Hybrid Access Network Proc., 1993.

[3] K. Yukimatsu and Y. Shimazu, “Optical interconnections in switching system,” IEICE Trans. Electron., vol. E77-C, no. 1, pp.2-8, Jan. 1994.

[4] R Farjad-Rad, “A CMOS 4PAM Multi-Gbps Serial Link Transceiver,” Ph.D. Thesis, Stanford University, 2000.

[5] H. Djahanshahi and C. A. T. Salama, “Differential CMOS Circuits for 622 MHz/933 MHz Clock and Data Recovery Applications,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 6, June 2000.

[6] L. Wu, H. Chen, S. Nagavarapu, R. Geiger, E. Lee, and W. Black, “A monolithic 1.25Gb/s CMOS clock/data recovery circuit for fibre channel transceiver,” in Proc. IEEE ISCAS, Vol.2, pp. 565-568, Orlando, FL, June 1999.

[7] C.R. Hogge, “A Self-Correcting Clock Recovery Circuit,” IEEE J. Lightwave Tech., Vol. 3, pp. 1312-1314, Dec. 1985.

[8] J.D.H. Alexander, “Clock Recovery from Random Binary Data,” Electronics Letters, Vol. 11, pp. 541-542, Oct. 1975.

[9] J. Savoj and B. Razavi, “High-speed CMOS circuits for optical receivers,” Kluwer Academic Publishers, 2001


[11] J. Lee and B. Razavi, “ A 40Gb/s Clock and Data Recovery Circuit in 0.18um CMOS Technology,” IEEE Journal of Solid-State circuits, Vol. 38, No. 12, Dec. 2003

[12] P.K. Hanumolu, M. Brownlee, K. Mayaram and Un-Ku Moon, “ Analysis of charge-pump phase-locked loops,” IEEE Transactions on circuits and systems, Vol. 51, No. 9, Sep. 2004.

[13] K. Iravani, F. Saleh, D. Lee, P. Fung, P. Ta, and G. Miller, “Clock and data recovery for 1.25Gb/s Ethernet transceiver in 0.35um CMOS,” in Proc. IEEE Custom Integrated Circuit Conf., May 2001, pp.261-264.

[14] H. T. Ng and D. J. Allstot, “CMOS current steering logic for low-voltage mixed-signal integrated circuits,” IEEE Trans. VLSI Syst., vol. 5, pp.301-308, Sept. 1997.

[15] A. Tanabe, M. Umetani, I. Fujiwara, K. Kataoka, M. Okihara, H. Sakuraba, T. Endoh, and F. Masuoka, “0.18um CMOS 10Gb/s multiplexer/demultiplexer ICs using current mode logic with tolerance to threshold voltage fluctuation,” IEEE J. Solid-State Circuits, vol. 36, pp. 988-996, June 2001.

[16] H. D. Wohlmuth, D. Kehrer and W. Simburger, “A high sensitivity static 2:1 frequency divider up to 19 GHz in 120 nm CMOS,” in Proc. IEEE Radio Frequency Integrated Circuits (RFIC) Symp., June 2002, pp. 231-234.

[17] J.K Shin, T.W. Yoo and M.S. Lee, “Design of half-rate linear phase detector using MOS current-mode logic gates for 10Gb/s clock and data recovery circuit,” Advanced Communication Technology, 2005, ICACT 2005. The 7th International Conference, Vol. 1, pp.205 – 210, Feb. 2005.

[18] P. Heydari and R. Mohanavelu, “Design of Ultrahigh-speed Low-Voltage CMOS CML Buffers and Latches,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, no. 10, Oct. 2004.


[19] M. Alioto and G. Palumbo, “Modeling and Optimized Design of Current Mode MUX/XOR and D Flip-Flop,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 47, no. 5, May 2000.

[20] J. Savoj, B. Razavi, “A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector,” IEEE Journal of Solid-State Circuits, vol. 36, Issue 5, pp. 761 – 768, May 2001.

[21] J. Lee and B. Razavi, “A 40Gb/s Clock and Data Recovery Circuit in 0.18um CMOS Technology,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 12, Dec. 2003.

[22] D. Baek, T. Song, E. Yoon and S. Hong, “8GHz CMOS Quadrature VCO Using Transformer-Based LC tank,” IEEE microwave and wireless components letters, Vol. 13, No. 10, Oct. 2003.

[23] J. Maget, M. Tiebout, R. Kraus, “MOS varactors with n- and p-type gates and their influence on an LC-VCO in digital CMOS,” IEEE Journal of Solid-State Circuits, vol. 38, Issue 7, pp. 1139 – 1147, July 2003.

[24] Y. H. Kao and M.T. Hsu, “Theoretical analysis of low phase noise design of CMOS VCO,” IEEE microwave and wireless components letters, Vol. 15, No. 1, January 2005.

[25] TSMC 0.18um MIXED signal 1P6M+ SALICIDE 1.8V/3.3V PCM SPEC(T-018-MM-PC-001)

[26] J. Cao, M. Green, A. Momtaz, K. Vakilian, D. Chung, K.C. Jun, M. Caresosa, X. Wang, W.G. Tan, Y. Cai, I. Fujimori, and A. Hairapetian, “OC-192 Transmitter and Receiver in Standard 0.18um CMOS,” IEEE Journal of Solid-State Circuits,


[27] S.W. Yoon, S. Hong and J. Laskar, “ Efficiency enhanced and harmonic suppressed differential VCO with novel buffer scheme using transformers for IEEE 802.11a,” IEEE MTT-S Digest.

[28] National Semiconductor, LM117/LM317A/LM317 3-Terminal Adjustable Regulator Data Sheet, National Semiconductor, Inc., 1997.


Appendix A

Testing Strategies

The chip testing consists of three steps, namely, DC power supply and ground, printed circuit board (PCB) layout, and closed-loop CDR testing.

First, the DC operating point is measured to make sure that all of the bias currents and DC points are in the vicinity of their design values. Since the CDR system has a high-frequency LC-tank VCO, we partition the DC power supply and ground-reference net on the PCB into two parts to isolate the noise coupling. Hence, the ground planes of the VCO and of the other circuits are separated on the PCB and connected by an inductor. This inductor shorts the DC voltage of the VCO ground and the other circuit grounds while blocking high-frequency noise coupling. The power supply and bias voltages are generated by LM 317 adjustable regulators as shown in Figure A-1[28]. The input of the regulator circuit is connected to a 6V battery instead of a general power supply, because the noise of a general power supply is much larger than that of a battery. The regulator circuit is easy to use, and its output voltage can be predicted by Equation A.1:

$$V_{out} = 1.25\left(1 + \frac{R_2}{R_1}\right) + I_{ADJ} \cdot R_2 \qquad (A.1)$$

where I_ADJ is the DC current that flows out of the adjustment terminal ADJ of the regulator. The capacitor C1 can be added to improve transient response at the output.
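Equation A.1 can be used directly to choose the feedback resistors for a target supply voltage. The sketch below is illustrative: R1 = 240 Ω and I_ADJ = 50 µA are assumed typical values for this kind of regulator and are not taken from the test board, and the function names are hypothetical.

```python
# Resistor selection from Equation A.1. R1 and I_ADJ are assumed typical
# values, not measurements from the test board.

V_REF = 1.25  # reference voltage between the output and ADJ terminals, in volts


def regulator_vout(r1_ohm, r2_ohm, i_adj_a=50e-6):
    """Output voltage predicted by Equation A.1."""
    return V_REF * (1.0 + r2_ohm / r1_ohm) + i_adj_a * r2_ohm


def r2_for_vout(vout_v, r1_ohm, i_adj_a=50e-6):
    """Solve Equation A.1 for R2 given a target output voltage."""
    return (vout_v - V_REF) / (V_REF / r1_ohm + i_adj_a)


if __name__ == "__main__":
    r1 = 240.0
    r2 = r2_for_vout(1.8, r1)
    print(f"R2 = {r2:.1f} ohm gives Vout = {regulator_vout(r1, r2):.3f} V")
```

Because the I_ADJ term is small compared with the current through R1, in practice R2 is often implemented as a trimmer and adjusted while monitoring the output.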


10uF, 1uF, 0.1uF and 0.01uF capacitors as shown in Figure A-2.

Figure A-1 LM 317 regulator

Figure A-2 Bypass filter at regulator output

Measurement must be performed with the raw die mounted on the PCB to avoid the parasitic effects of a package, as illustrated in Figure A-3(a); the testing PCB layout is shown in Figure A-3(b). High-frequency signal traces such as NRZ, NRZB, R_CLK, R_CLKB, R_NRZ and R_NRZB are made as short as possible to reduce signal loss, and the lengths of the differential signal traces are made close to each other to reduce the parasitic clock skew. Each high-speed trace uses an SMA (SubMiniature version A) connector. High-speed output lines can easily couple their large output swing onto the sensitive input lines. Another challenge is placing the discrete components and terminations close to the chip to reduce the associated parasitics and signal reflections.


Figure A-3(a) Off-chip bonding wire test

[Figure labels: Loop filter, Bare Die, Retimed Data, Retimed Data_B, NRZ, NRZB, R_CLK, R_CLKB, VCO GND, Circuit GND]

Figure A-3(b) The testing PCB

The testing schematic of the closed-loop CDR is shown in Figure A-4. To prevent substrate noise from coupling from the VCO block to other sensitive circuit blocks and degrading the jitter performance, the VCO ground is separated from the ground of the other circuit blocks. The VCO tuning range can be measured with the spectrum analyzer (Agilent E4440A PSA Series Spectrum Analyzer), with the tuning voltage generated by a DC power supply (Agilent E3610A power supply). The PRBS non-return-to-zero fully differential input data is generated by the Pattern


voltage swing can be set by this instrument. After the loop is locked, the resulting eye diagram is monitored by the Oscilloscope(Tektronix TDS6124C Digital Storage Oscilloscope).

Figure A-4 Experimental test setup

We measure the CDR chip with the setup described above. The first important parameter to test is the tuning range of the VCO. The VCO measurement result is shown in Figure A-5. The measured VCO tuning range is 8.9532GHz to 9.0641GHz, which is outside our required range. We conjecture that the circuit layout has heavy parasitic capacitance. The heavy parasitic capacitance lowers both the VCO oscillation frequency and the ratio of the varactor capacitance to the total capacitance. Thus, the VCO tuning range is much smaller than in simulation. The signal spectrum is shown in Figure A-6. The signal power is about -40dBm and the VCO output spectrum is not pure. The signal power is too small for the CDR system to work correctly. The failure of the experiment reminds us that the layout of the high-speed VCO circuit should be more symmetrical to reduce high-frequency signal coupling. The LC-tank VCO should have an effective guard ring to cut off the noise coupling from the inductor and varactor.

[Plot: measured frequency (GHz) versus control voltage (V)]

Figure A-5 VCO tuning range experimental result
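The parasitic-capacitance conjecture can be checked for plausibility with the ideal tank relation f = 1/(2π√(LC)). If the inductance is assumed constant over the tuning range, the ratio of maximum to minimum total tank capacitance equals (fmax/fmin)². The sketch below compares the TT post-layout simulation range from Table 3-2 with the measured range; the constant-inductance assumption and the function name are simplifications made for this example.

```python
# Total-capacitance ratio implied by a tuning range, assuming constant L
# and an ideal LC tank.

def total_capacitance_ratio(f_max_hz, f_min_hz):
    """Cmax_total / Cmin_total implied by the tuning range."""
    return (f_max_hz / f_min_hz) ** 2


if __name__ == "__main__":
    simulated = total_capacitance_ratio(11.14e9, 9.74e9)     # Table 3-2, TT corner
    measured = total_capacitance_ratio(9.0641e9, 8.9532e9)   # Figure A-5
    print(f"simulated total C ratio: {simulated:.3f}")
    print(f"measured total C ratio:  {measured:.3f}")
```

Under these assumptions the measured tank capacitance varies by only a few percent over the control range, compared with roughly thirty percent in simulation. This is what one would expect if a large fixed parasitic capacitance were added in parallel with the varactor, which also lowers the center frequency.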
