A 1Gbps Serial-Link Transceiver


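National Chiao Tung University
Department of Electronics Engineering and Institute of Electronics
Master's Thesis

A 1Gbps Serial-Link Transceiver

Graduate student: Cheng-Hsien Chou (周政賢)
Advisor: Prof. Jiin-Chuan Wu (吳錦川)

May 2004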


A 1Gbps Serial-Link Transceiver

Student: Cheng-Hsien Chou
Advisor: Prof. Jiin-Chuan Wu

A Thesis Submitted to the Institute of Electronics, College of Electrical Engineering and Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronic Engineering

May 2004
Hsin-Chu, Taiwan, Republic of China


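A 1Gbps Serial-Link Transceiver

Student: Cheng-Hsien Chou. Advisor: Dr. Jiin-Chuan Wu.
Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University

Abstract (Chinese)

As integrated-circuit process technology advances, the demand for high-bandwidth, low-latency data transfer between chips has grown as well. This thesis describes the design of a high-speed serial-link I/O interface. The data rate is set at 1Gbps. The transmitter uses a phase-locked loop as the clock circuit to provide eight phases to the 8-to-1 multiplexer. The input frequency of the phase-locked loop is 31.25MHz and the output frequency is 125MHz. A pre-skew mechanism for the parallel data is used to relax the timing constraints of the multiplexer. The pre-driver between the multiplexer and the data driver uses an active inductive load to increase the bandwidth. The open-drain current-mode output driver uses a pre-emphasis circuit to increase the current supplied during data bit transitions. The receiver uses a comparator with hysteresis to amplify the incoming data into a digital signal. Then, a clock and data recovery circuit operating at half the input data rate uses a dual-tracking-path control mechanism to achieve better clock jitter performance. Finally, the de-multiplexer converts the output of the clock and data recovery circuit into eight parallel data channels. The transmitter is implemented in a 0.35µm 2P4M CMOS process. With the phase-locked loop output clock at 125MHz, the measured RMS and peak-to-peak jitter of the output clock are 11.42ps and 82ps, respectively. The transmitter correctly transmits 1Gbps serial data. With a 3.3V supply, the total power consumption is 141mW.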


A 1Gbps Serial-Link Transceiver

Student: Cheng-Hsien Chou. Advisor: Prof. Jiin-Chuan Wu.
Department of Electronics & Institute of Electronics, National Chiao-Tung University

Abstract

As IC fabrication technology advances, the need for high-bandwidth and low-latency inter-chip data transfer has also increased. This thesis describes the design of a high-speed serial-link I/O interface. The transmission data rate is targeted at 1Gbps. The transmitter uses a phase-locked loop (PLL) as a timing circuit to provide eight phases for the 8-to-1 multiplexer. The input frequency of the PLL is 31.25MHz and the output frequency is 125MHz. A pre-skew mechanism for the parallel data is used to relax the timing constraints of the multiplexer. The pre-driver inserted between the multiplexer and the data driver uses an active inductive peaking load to enhance the bandwidth. The open-drain current-mode data driver uses a pre-emphasis circuit to increase the current during data transitions. The receiver uses a comparator with hysteresis to amplify the incoming data to full swing. Then, the clock and data recovery (CDR) circuit operates at half of the input data rate and uses a dual-tracking path control mechanism to achieve better jitter performance. Finally, the de-multiplexer converts the CDR outputs to eight parallel data channels. The transmitter is fabricated in a TSMC 0.35µm 2P4M process. The measured RMS and peak-to-peak jitter of the 125MHz output clock of the PLL are 11.42ps and 82ps, respectively. The transmitter correctly transmits 1Gbps serial data. Total power consumption is 141mW at a 3.3V supply voltage.


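Acknowledgement

First, I would like to thank my advisor, Prof. Jiin-Chuan Wu, who guided me tirelessly during my two years of master's research. Whether in building professional knowledge, in the attitude toward research, or in the approach to solving problems, I have benefited greatly. I would also like to thank Prof. 陳巍仁, Prof. 呂良鴻, and Prof. 邱煥凱 for taking the time to serve on my oral defense committee and for offering many valuable suggestions.

The completion of this research owes much to the many senior students in Lab 307; thank you for your guidance over these two years. Special thanks go to 阿傑, who gave me tremendous help during chip measurement, and to 范姜 and 伯儒 for their careful teaching, from which I benefited greatly. I sincerely thank you all. I also thank the companions who worked alongside me in 527: 權哲, 棋樺, 阿文, 瑋仁, 秉捷, 如琳, 紀豪, 韋霆, 旻珓, 宗霖, 致遠, and 丁彥. Special thanks to 阿瑞 and 阿嵐, who often joined me in making a racket in the lab; with you, an otherwise plain research life had much more fun. I also thank my roommate 英廷 for the help with measurements, as well as the other seniors, classmates, and juniors. There are many more people to thank, and I thank them all here.

I also thank my parents and family for their support and care, which gave me something to rely on throughout my upbringing and studies. I am especially grateful to my parents: when I was most exhausted, eating the fruit you prepared was the greatest happiness. I also thank all my relatives, who have always helped me along this journey. Finally, I thank my girlfriend 琬鈺, who always gave me the greatest support and encouragement; your smile is the driving force that keeps me going.

This thesis is dedicated to everyone who cares about me.

Cheng-Hsien Chou
National Chiao Tung University
May 2004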


CONTENTS

ABSTRACT (CHINESE)
ABSTRACT (ENGLISH)
ACKNOWLEDGEMENT
CONTENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 INTRODUCTION
  1.1 MOTIVATION
  1.2 THESIS ORGANIZATION

CHAPTER 2 BACKGROUND
  2.1 BASIC SERIAL LINK
  2.2 SIGNALING CIRCUITS
  2.3 TIMING CIRCUITS
  2.4 TIMING RECOVERY ARCHITECTURE
    2.4.1 PLL-BASED ARCHITECTURE
    2.4.2 OVERSAMPLING PHASE-PICKING ARCHITECTURE

CHAPTER 3 TRANSMITTER
  3.1 ARCHITECTURE OF TRANSMITTER WITH PRE-EMPHASIS
  3.2 PHASE-LOCKED LOOP
    3.2.1 INTRODUCTION
    3.2.2 PLL ARCHITECTURE
    3.2.3 CIRCUIT IMPLEMENTATION
      3.2.3.1 PHASE FREQUENCY DETECTOR
      3.2.3.2 CHARGE PUMP
      3.2.3.3 LOOP FILTER
      3.2.3.4 VOLTAGE CONTROLLED OSCILLATOR
      3.2.3.5 DIVIDER
    3.2.4 PLL PARAMETER DESIGN
    3.2.5 PLL NOISE ANALYSIS AND STABILITY
  3.3 MULTIPLEXER AND PRE-DRIVER
  3.4 DATA DRIVER AND PRE-EMPHASIS DRIVER
  3.5 TRANSMITTER SIMULATION RESULTS

CHAPTER 4 RECEIVER
  4.1 ARCHITECTURE OF RECEIVER
  4.2 SLICER
  4.3 CLOCK AND DATA RECOVERY
    4.3.1 INTRODUCTION
    4.3.2 CDR ARCHITECTURE
    4.3.3 CIRCUIT IMPLEMENTATION
      4.3.3.1 HALF-RATE PHASE DETECTOR
      4.3.3.2 HALF-RATE FREQUENCY DETECTOR
      4.3.3.3 VOLTAGE CONTROLLED OSCILLATOR
    4.3.4 CDR PARAMETER DESIGN
  4.4 DE-MULTIPLEXER
  4.5 RECEIVER SIMULATION RESULTS

CHAPTER 5 EXPERIMENTAL RESULTS
  5.1 EXPERIMENTAL SETUP
  5.2 PRINT CIRCUIT BOARD LAYOUT
  5.3 EXPERIMENTAL RESULTS

CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
  6.1 CONCLUSIONS
  6.2 FUTURE WORKS

REFERENCES
VITA


LIST OF TABLES

Table 1-1 Industrial standards for high speed link
Table 2-1 Maximum allowable cable loss (5m) [3]
Table 2-2 Comparison between full-rate and half-rate timing recovery architectures
Table 3-1 Parameters of the PLL
Table 3-2 The operation of pre-emphasis summary
Table 4-1 Parameters of the CDR
Table 5-1 Measured cable loss of 5m USB cable
Table 5-2 Measured results summary of the Transmitter


LIST OF FIGURES

Fig. 2-1 Block diagram of the basic serial link
Fig. 2-2 Maximum allowable cable loss (5m) [3]
Fig. 2-3 Transmitter with different transmitter architectures: voltage-mode (a), current-mode (b), and differential (c)
Fig. 2-4 Timing recovery architecture (a) PLL-based (b) oversampling phase-picking
Fig. 2-5 (a) Full-rate data and clock (b) Half-rate data and clock
Fig. 3-1 Block diagrams of transmitter with pre-emphasis
Fig. 3-2 Pre-skew of parallel data
Fig. 3-3 PLL architecture
Fig. 3-4 PFD implementation
Fig. 3-5 TSPC D flip-flop used in PFD circuit
Fig. 3-6 (a) Simulation result of the PFD (b) The enlargement of the simulation result
Fig. 3-7 Schematic of the charge pump
Fig. 3-8 Schematic of the loop filter
Fig. 3-9 Schematic of the four stages VCO and the delay cell
Fig. 3-10 I-V curve of the symmetric load
Fig. 3-11 Schematic of self-biased replica-feedback bias generator
Fig. 3-12 Frequency response of the self-biased replica-feedback bias generator
Fig. 3-13 Schematic of differential-to-single-ended converter
Fig. 3-14 Schematic of feed forward type duty-cycle corrector and its timing diagram
Fig. 3-15 Transfer curve of the VCO
Fig. 3-16 Schematic of TSPC Asynchronous Divided-by-two circuit
Fig. 3-17 Divider composed of asynchronous and synchronous counters and its timing diagram
Fig. 3-18 Linear model of PLL
Fig. 3-19 Open loop simulation using parameter in Table 3-1
Fig. 3-20 Close loop simulation using parameter in Table 3-1
Fig. 3-21 Control voltage simulation using SPICE
Fig. 3-22 Simulation of eight-phase of the PLL
Fig. 3-23 Linear model of PLL with different noise sources
Fig. 3-24 Timing diagram of an 8 to 1 multiplexer
Fig. 3-25 Schematic of the 8 to 1 multiplexer and pre-driver
Fig. 3-26 Schematic of the 8 to 1 pre-emphasis multiplexer and pre-driver
Fig. 3-27 Implementation of the active inductor peaking
Fig. 3-28 Schematic of the data driver
Fig. 3-29 Schematic of the pre-emphasis driver
Fig. 3-30 Simulation results of (a) the driver outputs without pre-emphasis, (b) the differential output
Fig. 3-31 Simulation results of (a) the driver outputs with pre-emphasis, (b) the differential output
Fig. 3-32 Eye diagram of the signal at transmitting side without pre-emphasis
Fig. 3-33 Eye diagrams of the signal at transmitting side with pre-emphasis
Fig. 4-1 Block diagrams of the receiver
Fig. 4-2 Schematic of slicer
Fig. 4-3 Frequency response of slicer
Fig. 4-4 Hysteresis window of the slicer
Fig. 4-5 The output of slicer when input 500MHz 150mV
Fig. 4-6 Half-rate CDR architecture
Fig. 4-7 Half-rate phase detector
Fig. 4-8 Operation of the half-rate phase detector
Fig. 4-9 Transfer characteristic of PD
Fig. 4-10 Half-rate frequency detector
Fig. 4-11 Timing diagram of the FD (a) Fvco < 1/2 data rate (b) Fvco > 1/2 data rate
Fig. 4-12 Circular phase diagram
Fig. 4-13 Up and down generator
Fig. 4-14 (a) Delay cell (b) Half-circuit of the delay cell for small signal analysis
Fig. 4-15 Schematic of the linearization circuit
Fig. 4-16 Transfer curve of the linear circuit
Fig. 4-17 Transfer curve of the VCO
Fig. 4-18 Open loop simulation using parameter in Table 4-1
Fig. 4-19 Close loop simulation using parameter in Table 4-1
Fig. 4-20 Control voltage of the VCO
Fig. 4-21 Up and downb signals of the frequency detector
Fig. 4-22 Retimed even and odd data and Retimed clock
Fig. 4-23 Asynchronous tree-type 2:8 de-multiplexer
Fig. 4-24 (a) 1:2 DEMUX module and (b) timing diagram
Fig. 4-25 Time domain of the received signal and output of the slicer
Fig. 4-26 CDR in the lock state and retimed clock
Fig. 4-27 Eight parallel data outputs of the de-multiplexer
Fig. 5-1 Transmitter chip micrograph
Fig. 5-2 The experimental setup of the transmitter
Fig. 5-3 The print circuit board for testing
Fig. 5-4 Jitter histograms of the PLL at 125MHz
Fig. 5-5 Measured PLL output waveform
Fig. 5-6 (a) Transmitter output eye mask (b) Receiver input eye mask
Fig. 5-7 Measured cable loss of 5m USB cable
Fig. 5-8 The connector between two 1.8m USB cable
Fig. 5-9 Tx output waveform without pre-emphasis at 1Gbps
Fig. 5-10 Tx output waveform with pre-emphasis at 1Gbps
Fig. 5-11 Rx input waveform through 1.8m cable without Tx pre-emphasis at 1Gbps
Fig. 5-12 Rx input waveform through 1.8m cable with Tx pre-emphasis at 1Gbps
Fig. 5-13 Rx input waveform through 3.6m cable without Tx pre-emphasis at 1Gbps
Fig. 5-14 Rx input waveform through 3.6m cable with Tx pre-emphasis at 1Gbps
Fig. 5-15 Rx input waveform through 5.4m cable without Tx pre-emphasis at 1Gbps
Fig. 5-16 Rx input waveform through 5.4m cable with Tx pre-emphasis at 1Gbps
Fig. 5-17 The relationship between differential output level and pre-emphasis current
Fig. 5-18 The relationship between RMS jitter and pre-emphasis current

Chapter 1 Introduction

1.1 Motivation

Recently, advances in IC fabrication technology, along with aggressive circuit design, have led to exponential growth in the speed and integration level of digital ICs. However, these advances have left some chips limited by chip-to-chip data communication bandwidth. This limitation has motivated research in the area of high-speed links that interconnect chips [1].

Traditionally, system designers have addressed increasing bandwidth demands by increasing the number of pins and wires interconnecting digital ICs. However, this bandwidth improvement does not come for free: more pins, printed-circuit-board (PCB) traces, connectors, and cables drive up the overall system cost. In larger-scale systems, e.g., multiprocessors or communication switches, a more attractive approach is to use point-to-point links. This approach has advantages from both a circuit design and an architectural point of view. From a circuit design perspective, point-to-point transmission lines offer greater flexibility in the physical construction of the system. Moreover, a point-to-point link has the potential for higher communication bandwidth than a bus because it suffers fewer signal integrity problems. From an architectural perspective, the bandwidth demands of high-speed systems make the shared bus medium the main performance bottleneck. Therefore, the architecture of most high-performance communication switches is inherently based on point-to-point interconnections [2]. Popular applications include optical communication, backplane interconnection, USB, IEEE 1394, and TMDS. Some industrial standards for high-speed links are listed in Table 1-1.

Traditionally, high-speed links in the Gb/s range have been implemented in GaAs or bipolar technologies. The primary advantage of those technologies is faster intrinsic device speed (higher fT). However, despite its slower device speed, CMOS technology is more widely available and allows higher integration than other technologies. Because of this availability, high-speed links built in CMOS would appeal to large-volume applications that require such high-performance links. Furthermore, with higher integration, links could be built as a macro-block in a single-chip system, allowing significant cost savings in these applications.

Table 1-1 Industrial standards for high speed link

The goal of this research is to design a CMOS serial-link transceiver with a data rate of 1Gbps.

1.2 Thesis Organization

The thesis is organized into six chapters. Chapter 1 introduces the motivation and the organization of this thesis. Chapter 2 describes the background of this research. It starts with an overview of a basic high-speed link and introduces the signaling and clocking methods used in designing a high-speed serial link. Chapter 3 covers the design of the transmitter. High-speed parallel-to-serial data conversion is achieved by a time-division multiplexer driven by a low-jitter, eight-phase phase-locked loop. Pre-emphasis circuits are added to the output driver not only to compensate for the high-frequency attenuation of the cable, but also to maintain the voltage level at low frequency. Chapter 4 presents the building blocks of the clock and data recovery circuit. An architecture with improved jitter performance is proposed, and the design of the frequency acquisition part is also introduced. Chapter 5 shows the experimental results. Chapter 6 concludes this thesis and discusses future development.
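As a quick check of the transmitter clocking summarized above, the following Python sketch works through the arithmetic using the frequencies quoted in the abstract (31.25MHz reference, 125MHz PLL output, eight phases). The variable names are illustrative, and the interpretation that each clock phase launches one bit is an assumption of this sketch.

```python
# Illustrative arithmetic only; frequencies are those quoted in the abstract.
REF_CLOCK_HZ = 31.25e6   # PLL input (reference) frequency
PLL_OUT_HZ = 125e6       # PLL output frequency
NUM_PHASES = 8           # clock phases feeding the 8-to-1 multiplexer

# Frequency multiplication implied by the two PLL frequencies.
divide_ratio = PLL_OUT_HZ / REF_CLOCK_HZ

# Assumption: one bit is launched per clock phase, so the serial
# line rate is the PLL output frequency times the number of phases.
line_rate_bps = PLL_OUT_HZ * NUM_PHASES

# One bit time (unit interval) at that line rate.
bit_time_s = 1.0 / line_rate_bps

print(f"divide ratio : {divide_ratio:g}")
print(f"line rate    : {line_rate_bps / 1e9:g} Gbps")
print(f"bit time     : {bit_time_s * 1e9:g} ns")
```

Under these assumptions the eight phases of a 125MHz clock are spaced one bit time apart, which is why a 125MHz PLL suffices for a 1Gbps serial stream.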


Chapter 2 Background

2.1 Basic Serial Link

A general serial link is composed of three primary components: a transmitter, a channel, and a receiver, as shown in Fig. 2-1. Before transmission, the data are usually in the form of a parallel data stream in order to increase the bandwidth of the link. Therefore, a PISO (parallel in serial out) circuit is needed ahead of the transmitter driver. The transmitter converts digital bits into a signal stream that propagates on the channel to the receiver.

Fig. 2-1 Block diagram of the basic serial link
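The parallel-to-serial conversion described above, and the matching serial-to-parallel conversion at the receiver, can be illustrated with a minimal Python sketch. The function names `piso` and `sipo`, the bit order (index 0 of each word is sent first), and the ideal error-free channel are assumptions made for illustration only.

```python
from typing import List


def piso(words: List[List[int]]) -> List[int]:
    """Parallel-in serial-out: flatten N-bit words into one bit stream.

    Assumption: bit 0 of each word is transmitted first.
    """
    return [bit for word in words for bit in word]


def sipo(stream: List[int], width: int) -> List[List[int]]:
    """Serial-in parallel-out: regroup the stream into width-bit words."""
    if len(stream) % width != 0:
        raise ValueError("stream length must be a multiple of the word width")
    return [stream[i:i + width] for i in range(0, len(stream), width)]


# Two 8-bit parallel words pass through an ideal (error-free) channel.
words = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]]
serial = piso(words)
recovered = sipo(serial, 8)
print(serial)
print(recovered == words)
```

In hardware the two functions correspond to the multiplexer on the transmitting side and the de-multiplexer on the receiving side; the sketch only captures the data ordering, not the timing.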

The channel on which the signal travels, e.g. coaxial cable or twisted pair, is commonly called the communication channel. In this thesis, a USB cable is used as the serial link channel. The characteristics of a 5m USB cable are shown in Table 2-1 and Fig. 2-2 [3].

Table 2-1 Maximum allowable cable loss (5m) [3]

Fig. 2-2 Maximum allowable cable loss (5m) [3]

The receiver on the other end of the channel recovers the original digital information by amplifying and sampling the signal. The clock recovery circuit embedded in the receiving side adjusts the receiver clock based on the received data so that the sampling point falls at the center of the data eye. Then, a SIPO (serial in parallel out) circuit converts the serial data back to N parallel bits. The termination resistors, which match the impedance of the channel, minimize signal reflection.

The performance of the serial link is mainly characterized by the data bandwidth. Another link performance metric, the bit error rate (BER), measures how often bit errors occur, commonly expressed as the fraction of transmitted bits that are received in error. The maximum data rate of the serial link is usually specified at a specific BER to guarantee the robustness of the overall system. BER is important not only because bit errors reduce the effective system bandwidth, but also because in many systems, applying error correction techniques can prohibitively increase the system cost.

The errors are caused by noise that comes from each part of the system. Intrinsic noise sources are the random fluctuations due to the inherent thermal and shot noise of the passive and active system components. However, especially in VLSI applications, other non-fundamental noise sources can limit the link performance. These noise sources include coupling from other channels, switching activity from other circuits integrated with the link circuitry, and reflections induced by channel imperfections. These noise types typically have a non-white frequency spectrum and exhibit strong data dependencies. Moreover, their overall power is often proportional to the power of the transmitted signals.

Therefore, because of noise considerations, there are two main issues in designing a high-speed serial link interface circuit: signaling and clocking. The signaling issue is how to maximize the voltage margins of the interface so that the receiver has enough voltage margin to recover the data correctly. The clocking issue is how to maximize the timing margins of the interface to transmit and receive data. In many high-speed serial link applications, latency, power, and die area are also critical issues [4].

2.2 Signaling Circuits

The transmitter drives a HIGH or LOW analog voltage onto the channel and is designed for a particular output-voltage swing based on the system specification. The design issues are to maintain small voltage noise and timing noise on the signal. There are two types of output drivers: voltage-mode drivers and current-mode drivers.

Voltage-mode drivers, as shown in Fig. 2-3 (a), are switches that switch the line voltage. Because the switches are implemented with transistors, the driver appears as a switched resistance. To switch the voltage fully, a small resistance is needed, which typically requires a large switching device.

In contrast, current-mode drivers, as illustrated in Fig. 2-3 (b), are switched current sources. The output impedance of the driver is much higher than the line impedance; this is also called high-impedance signaling. Therefore, the transmitter bandwidth is typically not an issue even with significant output capacitance. The voltage transmitted on the line is determined by the switched current and the line impedance or an explicit load resistor. The driver can be implemented simply by biasing a MOS transistor in its saturation region. Current-mode drivers are slightly better in terms of insensitivity to power-supply noise because they have high output impedance, and hence the signal is tightly coupled only to VOH, the signal return path. The output current does not vary with ground noise as long as the current source bias signal is tightly coupled to the ground signal. The disadvantage of current-mode drivers is that, in order to keep the current sources in saturation, the transmitted voltage range must be well above ground, which increases power dissipation.
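The statement that the transmitted voltage is set by the switched current and the load can be made concrete with a short Python sketch. The numeric values (10mA, 50Ω) are hypothetical and not taken from the thesis, and the case with an explicit resistor assumes that the resistor sits at the driver in parallel with the line.

```python
from typing import Optional


def parallel(r1_ohm: float, r2_ohm: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1_ohm * r2_ohm / (r1_ohm + r2_ohm)


def current_mode_swing(i_switch_a: float, z0_ohm: float,
                       r_load_ohm: Optional[float] = None) -> float:
    """Voltage swing launched by an ideal switched current source.

    Without an explicit resistor the current sees only the line impedance.
    Assumption: an explicit resistor at the driver appears in parallel
    with the line impedance.
    """
    load = z0_ohm if r_load_ohm is None else parallel(z0_ohm, r_load_ohm)
    return i_switch_a * load


# Hypothetical example values, chosen only for illustration.
swing_line_only = current_mode_swing(10e-3, 50.0)
swing_with_resistor = current_mode_swing(10e-3, 50.0, r_load_ohm=50.0)
print(f"line only        : {swing_line_only:.3f} V")
print(f"with 50 ohm load : {swing_with_resistor:.3f} V")
```

The sketch treats the current source as ideal (infinite output impedance), which is the limiting case of the high-impedance signaling described above.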

Fig. 2-3 Transmitter with different transmitter architectures: voltage-mode (a), current-mode (b), and differential (c)

For better supply-noise rejection, the differential mode can be adopted, as shown in Fig. 2-3 (c), because the supply noise then appears as common-mode. Since the current remains roughly constant, the transmitter induces less switching noise on the supply voltage, which benefits other transmitted or received signals on the same die.

To reduce reflections at the end of the transmission line, the transmitter needs to be terminated. An off-chip termination resistor could introduce significant impedance mismatches because of the package parasitic components. With current-mode drivers, an explicit on-chip resistor at the driver can act as the termination resistor. If a resistive layer is not available, a transistor in its linear region can be used as the resistor. With voltage-mode drivers, the design is slightly more complex because the switch resistance should match the line impedance Z0. This may be done either through proper sizing of the driver or by over-sizing the driver and compensating with an external series resistor, as shown in Fig. 2-3 (a).
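The effect of termination mismatch can be quantified with the standard transmission-line reflection coefficient, (Z_load - Z0) / (Z_load + Z0). The Python sketch below evaluates it and also computes the series resistor that brings an over-sized voltage-mode driver up to Z0. The function names and the numeric values are hypothetical; the driver is modeled as a pure resistance, which ignores package parasitics.

```python
def reflection_coefficient(z_load_ohm: float, z0_ohm: float) -> float:
    """Fraction of the incident voltage wave reflected at a termination."""
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


def series_resistor_for_match(r_switch_ohm: float, z0_ohm: float) -> float:
    """External series resistance that makes the driver output equal Z0.

    Assumption: the over-sized switch behaves as a pure resistance
    that is smaller than the line impedance.
    """
    if r_switch_ohm > z0_ohm:
        raise ValueError("switch resistance already exceeds the line impedance")
    return z0_ohm - r_switch_ohm


# Hypothetical values for illustration only.
matched = reflection_coefficient(50.0, 50.0)      # perfectly terminated line
mismatched = reflection_coefficient(75.0, 50.0)   # 75 ohm load on a 50 ohm line
r_series = series_resistor_for_match(10.0, 50.0)  # 10 ohm switch, 50 ohm line
print(matched, mismatched, r_series)
```

A coefficient of zero means no reflection; any mismatch sends part of the signal back toward the source, where it can interfere with later bits.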

2.3 Timing Circuits

To properly recover the bit sequence, the receiver's sampling clock phases need to have a stable and pre-determined relationship to the phase of the incoming data, thus maximizing the timing margin. The deterministic phase relationship becomes an even more stringent requirement in higher-bandwidth systems. In these systems, the bit rate is a multiple of the on-chip clock frequency, requiring either an explicitly faster bit clock or multiple phases of lower-frequency clocks with a well-controlled phase relationship between them. The clock position must be determined from the phase and frequency of the incoming data by the timing recovery circuit. Therefore, a reliable and flexible method for dealing with the synchronization problem is to use on-chip active phase-aligning circuits. Generally, these circuits fall in a class of control systems known as phase-locked loops.

The sampling clock quality can be characterized by phase offset and jitter. Phase offset is a static (DC) quantity equal to the difference between the ideal average position of a clock and its actual average position. Jitter is the dynamic (AC) variation of phase and is dominated by on-chip power-supply and substrate noise. Jitter is specified in terms of both short-term and long-term variations. Cycle-to-cycle jitter describes the short-term uncertainty in the period of a clock, while long-term jitter describes the uncertainty in the position of the clock with respect to the system clock source. In conventional digital design the most important requirement is minimizing cycle-to-cycle jitter. In high-speed links, however, both quantities can be equally important. Low-frequency jitter is caused by imperfections in the system clock source and by slow temperature and operating-voltage variations. This type of jitter can be tracked reasonably well by employing a phase-locked loop. Medium-frequency and cycle-to-cycle jitter are caused by on-chip supply and substrate noise and are the major concern.
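The distinction between phase offset, long-term jitter, and short-term period uncertainty can be illustrated numerically. The Python sketch below computes the three quantities from a list of measured clock edge times. The function name `clock_quality`, the choice of RMS (population standard deviation) as the jitter measure, and the convention that the ideal clock places edge k at k times the ideal period are all assumptions of this sketch; other jitter definitions are also in use.

```python
import statistics
from typing import Dict, List


def clock_quality(edges: List[float], ideal_period: float) -> Dict[str, float]:
    """Summarize clock quality from measured rising-edge times.

    Assumption: the ideal clock places edge k at k * ideal_period.
    """
    if len(edges) < 3:
        raise ValueError("need at least three edges")
    # Time-interval error: deviation of each edge from its ideal position.
    tie = [t - k * ideal_period for k, t in enumerate(edges)]
    # Measured periods between consecutive edges.
    periods = [b - a for a, b in zip(edges, edges[1:])]
    return {
        # Static (DC) quantity: average displacement from the ideal position.
        "phase_offset": statistics.fmean(tie),
        # Long-term measure: spread of edge positions versus the ideal clock.
        "long_term_jitter_rms": statistics.pstdev(tie),
        # Short-term measure: spread of the individual clock periods.
        "period_jitter_rms": statistics.pstdev(periods),
    }


# Normalized example: ideal period 1.0, every other edge arrives 0.1 late.
example = clock_quality([0.0, 1.1, 2.0, 3.1, 4.0], ideal_period=1.0)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

In this example the periods alternate between 1.1 and 0.9, so the short-term spread is larger than the long-term spread of the edge positions, showing that the two measures capture different behavior.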

2.4 Timing Recovery Architecture

The task of the timing recovery circuit is to recover the phase and frequency information from the transitions in the received data stream. The optimal sample point is midway between the possible data-transition times. Noise and mismatches inherent to the timing recovery circuit produce jitter in the sampling clocks, which degrades the timing margin. Moreover, transmitter jitter causes uncertainty in the transition points, which makes clock extraction more difficult. As shown in Fig. 2-4, two types of timing recovery architectures have been used in links. One is PLL-based (data-recovery PLL) [5] and the other is oversampling phase-picking [6].

Fig. 2-4 Timing recovery architecture (a) PLL-based (b) oversampling phase-picking
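The idea of recovering phase from data transitions and then sampling midway between them can be sketched in software. The Python example below is a simplified illustration in the spirit of phase picking, not the circuit designed in this thesis: it assumes an ideal, noise-free waveform sampled a fixed integer number of times per bit, and the names `pick_sampling_phase` and `recover_bits` are hypothetical.

```python
from collections import Counter
from typing import List


def pick_sampling_phase(samples: List[int], oversampling: int) -> int:
    """Choose the sample phase that lies midway between data transitions."""
    # Record the phase (index modulo the oversampling ratio) of every transition.
    transitions = Counter(
        i % oversampling
        for i in range(1, len(samples))
        if samples[i] != samples[i - 1]
    )
    if not transitions:
        raise ValueError("no data transitions: phase cannot be recovered")
    # The most frequent transition phase marks the bit boundary.
    edge_phase = transitions.most_common(1)[0][0]
    # Sample half a bit period after the boundary.
    return (edge_phase + oversampling // 2) % oversampling


def recover_bits(samples: List[int], oversampling: int) -> List[int]:
    """Keep one sample per bit, taken at the chosen phase."""
    phase = pick_sampling_phase(samples, oversampling)
    return samples[phase::oversampling]


bits = [1, 0, 1, 1, 0]
OVERSAMPLING = 4
waveform = [b for b in bits for _ in range(OVERSAMPLING)]
samples = waveform[1:]  # receiver starts one sample late (unknown phase)
recovered = recover_bits(samples, OVERSAMPLING)
print(recovered)
```

The sketch also shows why transitions matter: a stream with no transitions carries no phase information, so the function refuses to pick a phase in that case.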

(32) 2.4.1 PLL-based Architecture
In the PLL-based architecture, shown in Fig. 2-4 (a), a negative feedback loop controls the internal phase by adjusting the frequency of the voltage controlled oscillator (VCO) through the Vctrl signal until the frequency matches that of an external reference. A phase detector detects the phase difference between the sampling clock and the external input data signal and adjusts the VCO control voltage. The phase detector generally drives a charge pump that converts the phase difference into a charge. A filtered version of this charge becomes the VCO control voltage. Based on the phase information of the data, the best sample is chosen as the data bit by some decision logic. To maintain a good phase relationship between the sampling clock and the data transitions, the PLL should detect the input phase accurately and track any input jitter with a high loop bandwidth. Unfortunately, stability requirements limit the loop bandwidth of the system. Because the timing information is embedded in the data stream, coding of the data is used to ensure a minimum and maximum transition density. A high transition density in the data stream is preferred since it helps maintain the stability of the system.
PLL-based timing recovery architectures can be categorized into full-rate and half-rate architectures. In a full-rate circuit, the position of the data transition is compared to either the falling edge or the rising edge of the clock, and the clock frequency is equal to the data rate, as shown in Fig. 2-5 (a). A single-edge-triggered flip-flop can be used to retime the data. In a half-rate circuit, on the other hand, the location of the data transition is compared to both the rising and falling edges of the clock, and the clock frequency is equal to one half of the data rate, as shown in Fig. 2-5 (b). Because the clock runs at half the data rate, a double-edge-triggered flip-flop is needed to perform the data retiming. The comparison between the two architectures is listed in Table 2-2 [7] [8].

(33) Fig. 2-5 (a) Full-rate data and clock (b) Half-rate data and clock. The most important advantage of half-rate architectures is the reduction of the circuit speed by a factor of two. This often means the reduction of the total power dissipation. In fact, as the operation speed of circuits approaches the maximum operating frequency of a particular technology, the required power consumption grows exponentially. In addition, the de-multiplexing performed simultaneously by half-rate architecture is another attractive feature that makes them suitable for serial link architecture. It can reduce the complexity, hardware, and power dissipation of the deserializer.. Table. 2-2 Comparison between full-rate and half-rate timing recovery architectures. 13.

(34) Duty cycle mismatch is a major concern when employing a half-rate timing recovery architecture. If the spacing between the rising and falling edges of the clock signal differs from half of the clock period, the width of the data eye sampled by the rising edge differs from that sampled by the falling edge, resulting in bimodal jitter. The duty cycle of the clock signal must therefore be considered carefully in the design of a half-rate timing recovery architecture. The Clock and Data Recovery (CDR) architecture presented in this thesis employs the half-rate architecture. Although the 0.35µm CMOS technology is fast enough to perform full-rate operation (1GHz), the resulting reduction of power consumption makes the half-rate (500MHz) approach a good candidate.
2.4.2 Oversampling Phase-picking Architecture
The second timing recovery scheme is oversampling phase-picking, shown in Fig. 2-4 (b). Instead of using a feedback loop to control the sampling phases, the data stream is sampled at multiple phase positions per bit, creating an oversampled representation of the data stream. It does not require data coding or frequency acquisition since the system clock is readily available through the clock channel. What remains is to adjust the skew between the clock and the received data stream. Transitions in the data can be extracted from the sampled data. Based on the data transitions, the sample position nearest the center of the bit can be chosen as the data bit. The way the data is chosen is determined by a digital algorithm, such as majority voting [9]. The phase-picking architecture has several advantages. First, it replaces the feedback loop with a feed-forward loop, allowing the selected sample to track

(35) phase movements of the data with respect to the clock without an intrinsic bandwidth limitation. The maximum tracking rate is limited only by the transition information present. This fast tracking can potentially track the jitter accumulation of the transmit PLL. A second advantage of the phase-picking architecture is that a long PLL phase-locking time is not needed. Phase decisions are made whenever input transitions are present. The primary disadvantage of the architecture is an inherent static phase error due to the phase quantization. Higher oversampling ratios reduce the static phase error but add significant complexity to the design. Furthermore, inherent sampler uncertainty limits the minimum quantization error. More significantly, the increased number of samplers increases the input capacitance and hence limits the input bandwidth. The architecture therefore trades input bandwidth against static phase offset. For high input bandwidths, the tradeoff favors a low oversampling ratio, with the penalty of a higher static phase offset due to the coarse quantization. In addition, because the mechanism is open loop, errors may occur when a sampling point falls on the data edges, which is a poor sampling position. This condition is usually introduced by the static phase error between clock and signal, i.e. the timing skew. The feed-forward loop offers no mechanism to eliminate the effect of timing skew, which may increase the design complexity of the decision algorithm.
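The phase-picking idea can be sketched in a few lines. The code below is a simplified illustration and not the algorithm of [6] or [9]: it assumes an ideal 3x oversampled stream, votes over the whole window for the sample phase at which transitions occur, and then picks the phase half a bit away from that boundary. The function names are hypothetical.

```python
def pick_phase(samples, osr=3):
    """Vote for the bit-boundary phase, then return the phase nearest the bit center."""
    votes = [0] * osr
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            votes[i % osr] += 1          # a new bit starts at sample phase i % osr
    boundary = votes.index(max(votes))   # majority vote over all observed transitions
    return (boundary + osr // 2) % osr   # move half a bit away from the boundary

def recover(samples, osr=3):
    """Keep one sample per bit at the picked phase."""
    return samples[pick_phase(samples, osr)::osr]

# Hypothetical test pattern: each bit is sampled three times, with an
# arbitrary skew of one sample between the clock and the data.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
oversampled = [b for b in bits for _ in range(3)][1:]
recovered = recover(oversampled)
print(recovered)
```

With only three phases per bit, the picked sample can be up to a third of a bit away from the true eye center, which is the static phase quantization error discussed above.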


(37) Chapter 3 Transmitter. 3.1 Architecture of Transmitter with Pre-emphasis This chapter presents the transmitter design. The purpose of the transmitter is to drive the signal off chip using electrical quantities with the least power, area and noise based on the channel characteristics. Fig. 3-1 shows the block diagrams of the transmitter architecture.. Fig. 3-1 Block diagrams of transmitter with pre-emphasis. 17.

(38) The data input comes from a PRBS (pseudo random bit sequence) generator. The PRBS is a maximal-length sequence with polynomial x^7 + x^6 + 1. The data processing circuit converts the parallel data streams into differential signals and pre-skews the data before feeding them into the multiplexer. The pre-skew of the parallel data is shown in Fig. 3-2 [10].
Fig. 3-2 Pre-skew of parallel data.
An 8:1 input multiplexer serializes the eight low-speed parallel data channels on eight evenly spaced phases of a 125MHz clock, which gives a bit rate of 1Gbps. This reduces the frequency requirement of the timing circuits and the digital logic. The eight evenly spaced 125MHz phases are provided by the PLL. Finally, the serial data are transmitted through the data driver. Furthermore, because ISI (inter-symbol interference) reduces the timing and voltage margins of the transmitted signal, a pre-emphasis circuit is applied to the data driver. The following sections describe the detailed circuits of the function blocks in the transmitter architecture.
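As a behavioral illustration of the data source and the serialization arithmetic, the sketch below generates the PRBS with polynomial x^7 + x^6 + 1 using a 7-bit shift register and groups the bits into 8-bit parallel words. The seed, the function names, and the D0-first word ordering are assumptions made for this example and are not taken from the thesis circuits.

```python
def prbs7(seed=0x7F, n=127):
    """Fibonacci LFSR for x^7 + x^6 + 1; the seed must be non-zero."""
    state = seed & 0x7F
    out = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at stages 7 and 6
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return out

def to_parallel(bits, width=8):
    """Group a bit list into parallel words (assumed order: D0 first)."""
    return [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]

def serialize(words):
    """8:1 multiplexing seen behaviorally: emit D0..D7 of each word in turn."""
    return [b for word in words for b in word]

def bit_rate_bps(phase_clock_hz, phases=8):
    """One bit is launched per clock phase per clock period."""
    return phases * phase_clock_hz

sequence = prbs7()
print(len(sequence), sum(sequence), bit_rate_bps(125e6))
```

A maximal-length sequence of order 7 repeats every 127 bits, and eight phases of a 125MHz clock give the 1Gbps line rate stated above.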

(39) 3.2 Phase-Locked Loop
3.2.1 Introduction
The phase-locked loop (PLL) is an important building block used in many digital, analog, and communication applications. For example, it can be used to recover a clock from data signals, perform synchronization, synthesize frequencies, and generate multiple phases with equal phase resolution. With the growing demand for higher-bandwidth data links, the PLL design plays a key part in the link performance. For the transmitter, we introduce the circuit design of a charge-pump PLL with a reference input clock at 31.25MHz and an output clock at 125MHz. By adopting four differential stages in the voltage controlled oscillator, it generates eight clock phases for use by the eight-to-one multiplexer.
3.2.2 PLL Architecture
A phase-locked loop (PLL) is basically an oscillator whose phase and frequency are locked to those of the input signal. This is done with a negative feedback control loop, shown in Fig. 3-3, which includes a phase/frequency detector (PFD), a charge pump circuit (CP), a loop filter (LF), a voltage controlled oscillator (VCO), and a frequency divider (divide by N). The PFD compares the feedback signal (Fback) from the frequency divider with the reference input signal (Fref) and generates the Up and Downb signals for the following charge pump circuit. Based on the Up and Downb input signals, the charge pump

(40) charges or discharges the loop filter to change the input control voltage (Vctrl) of the VCO, which varies the frequency of the output signal (Clk). The loop filter is basically a low-pass filter that removes the high-frequency components coming from the PFD and charge pump. In this way, the feedback control loop adjusts the frequency of the feedback signal until it equals that of the reference signal. In steady state, the frequency of the output signal is N times that of the input signal. Moreover, the input signal (Fref) and the feedback signal (Fback) are phase-aligned.
Fig. 3-3 PLL architecture.
3.2.3 Circuit Implementation
3.2.3.1. Phase Frequency Detector
The phase frequency detector (PFD) is a digital sequential circuit that employs tri-state operation. It can be implemented simply with two dynamic D flip-flops and one NOR gate, as shown in Fig. 3-4. The TSPC D flip-flop schematic used in the PFD is shown in Fig. 3-5 [11].

(41) Fig. 3-4 PFD implementation. Fig. 3-5 TSPC D flip-flop used in PFD circuit. The PFD is triggered by two positive clock edges of the reference (Fref) and the feedback (Fback) signals. If the reference clock leads the feedback clock, the Up signal will be set from low to high. This will in turn increase the frequency of the voltage controlled oscillator output signal. When the feedback signal’s rising edge arrives, the reset signal will be high to reset the Up signal to low. In contrast, if the reference clock lags the feedback clock, the Down signal will be set to high, until the reference signal triggers the reset signal. This. 21.

(42) Down signal, on the contrary, is used to decrease the frequency of the voltage controlled oscillator output signal. This type of operation has a linear range of ±2π and can act as both a phase detector and a frequency detector. This property greatly enhances the locking range. Ideally, the PFD should be able to distinguish any phase error between the reference and feedback signals. In practice, when the phase error is too small, the reset signal arrives so quickly that the following charge pump circuit is not activated. This results in a dead zone (an undetectable phase difference range). The dead zone is highly undesirable because it allows the VCO to accumulate as much random phase error as the width of the dead zone with respect to the input while receiving no corrective feedback. The dead zone can be eliminated by adding extra delay cells in the reset path, which ensures that when the reference and feedback signals are in phase, equal and non-zero Up and Down pulses appear at the output. Eliminating the dead zone results in an overall linear operating characteristic for the PFD, especially for input signals with a small but finite phase difference. However, inserting the delay cells limits the maximum operating frequency, which is inversely proportional to the total reset path delay [12].
[Fig. 3-6 (a) plot: △Vctrl (mV) per period T versus phase difference (ns), T = 32ns, shifted by +16ns]

(43) [Fig. 3-6 (b) plot: △Vctrl (µV) per period T versus phase difference (ns), T = 32ns, shifted by +16ns]
Fig. 3-6 (a) Simulation result of the PFD (b) The enlargement of the simulation result.
The simulation results of the PFD, followed by a charge pump with a 150µA current and a 70pF load capacitor, are shown in Fig. 3-6, where △Vctrl is the voltage change on the load capacitance. Since mismatch exists between the Up and Down signal paths, the curve shows some offset.
3.2.3.2. Charge Pump
The schematic of the charge pump circuit is shown in Fig. 3-7 [13]. It charges or discharges the loop filter to vary the VCO frequency according to the Up and Downb signals from the PFD. A conventional charge pump circuit has problems such as charge sharing in the high-impedance state, charge injection, and clock feedthrough. Charge injection is produced by the overlap capacitance of the switch devices and the capacitance at the intermediate node between the current source and switch devices. This charge injection results in a phase offset at the input of the PFD when the PLL is locked. To eliminate the charge injection problem, the two

(44) switch devices are separated from the output voltage. Therefore, the output voltage is now isolated from the switching noise resulting from the overlap capacitance of the two switch devices. In addition, the intermediate node between the current source and switch devices will charge to the output voltage only by the gate overdrive of the current source devices, Vgs – Vt, an amount independent of the output voltage. Moreover, since both the NMOS and PMOS current sources always turn on in each cycle, any charge injection will cancel out to first order with equal current source device sizes. The matching between charge and discharge current is improved by balancing the loading on the charge pump control signals, Up and Downb. This is accomplished by the dummy current source path whose control signals are Upb and Down.. Fig. 3-7 Schematic of the charge pump. 24.

(45) 3.2.3.3. Loop Filter
A second-order on-chip loop filter is designed to suppress the reference spurs. The loop filter is a low-pass filter that extracts the average value from the PFD output. As shown in Fig. 3-8, it is composed of a resistor R1 in series with a capacitor C1, with a capacitor C2 in parallel with this series branch.
Fig. 3-8 Schematic of the loop filter.
The loop filter provides a pole at the origin, which gives infinite DC gain and hence zero static phase error, and a zero in the open-loop response, which improves the phase margin and ensures the overall stability of the loop. Capacitor C2 provides higher-order roll-off, which reduces the ripple on the control voltage and mitigates frequency jumps. The total transfer function of the loop filter can be expressed as
$$F(s) = \frac{1}{C_1 + C_2} \cdot \frac{sR_1C_1 + 1}{s\left[\dfrac{sR_1C_1C_2}{C_1 + C_2} + 1\right]}$$ (3-1)
and hence
$$F(s) = \frac{K_h\,(s + \omega_z)}{s\,(1 + s/\omega_p)}$$ (3-2)

(46) where
$$\omega_z = \frac{1}{R_1C_1}, \qquad \omega_p = \omega_z\left(1 + \frac{C_1}{C_2}\right), \qquad K_h = \frac{R_1C_1}{C_1 + C_2}$$
However, adding the capacitor C2 makes the overall PLL a third-order system and affects the stability of the loop. In general, by setting C1 > 20×C2, the third-order loop can be approximated as a second-order loop.
3.2.3.4. Voltage Controlled Oscillator
The building blocks of the VCO include a four-stage ring oscillator and a self-biased replica-feedback bias generator [14] [15]. Fig. 3-9 shows the schematic of the four-stage VCO and the delay cell.
Fig. 3-9 Schematic of the four stages VCO and the delay cell.

(47) To achieve a low-jitter output clock, the delay cell used in the voltage controlled oscillator (VCO) should have low sensitivity to, and high rejection of, supply and substrate noise. The supply noise can be categorized into static and dynamic noise. The VCO architecture used in this thesis greatly improves the immunity to both static and dynamic supply noise [16]. The delay cell of the VCO contains a source-coupled pair with load elements that each consist of a diode-connected PMOS device in shunt with an equally sized PMOS device. They are called symmetric loads because their I-V curve is symmetric about the center of the voltage swing, as shown in Fig. 3-10.
Fig. 3-10 I-V curve of the symmetric load.
Basically, to obtain high rejection of supply and substrate noise, the load of the differential pair should have a linear I-V characteristic. In practice, this is difficult to achieve with MOS devices. The symmetric load, however, cancels the first-order effect of common-mode voltage noise. Therefore, the symmetric load here, though nonlinear, could

(48) be used to achieve high dynamic supply noise immunity. The control voltage, Vbp, is the bias voltage for the PMOS device. In order to provide a bias current that is independent of the static supply noise, the bias voltage of the NMOS current source, Vbn, is continuously adjusted. As the supply voltage changes, the drain voltage of the NMOS current source also changes. However, the gate bias is adjusted by the replica-feedback bias generator to keep the output current constant. This effectively raises the output resistance of the NMOS current source. Hence the immunity to static supply noise is greatly improved. Based on the analysis of the I-V curve, it can be shown that the effective resistance of a symmetric load (Reff) is directly proportional to the small-signal resistance at the ends of the swing range, which is just one over the transconductance (gm) of one of the two equally sized PMOS devices biased at Vctrl. Therefore, the buffer delay is
$$t_d = R_{eff}\,C_{eff} = \frac{1}{g_m}\,C_{eff}$$ (3-3)
where Ceff is the effective buffer output capacitance. The drain current for one of the two equally sized devices biased at Vctrl is
$$I_d = \frac{k_p}{2}\left[(V_{DD} - V_{ctrl}) - V_{tp}\right]^2$$ (3-4)
Taking the derivative with respect to the gate drive (VDD − Vctrl), the transconductance gm is given by
$$g_m = k_p\left[(V_{DD} - V_{ctrl}) - V_{tp}\right]$$ (3-5)
The buffer delay is then given by
$$t_d = \frac{C_{eff}}{k_p\left[(V_{DD} - V_{ctrl}) - V_{tp}\right]}$$ (3-6)
Thus, for N stages of the VCO, the oscillator frequency is given by

(49)
$$f_{osc} = \frac{1}{2Nt_d} = \frac{k_p\left[(V_{DD} - V_{ctrl}) - V_{tp}\right]}{2NC_{eff}}$$ (3-7)
The gain of the VCO is given by
$$K_{vco} = \frac{df_{osc}}{dV_{ctrl}} = \frac{-k_p}{2NC_{eff}}$$ (3-8)
As a result, Kvco is independent of the buffer bias current and the VCO has first-order tuning linearity. The self-biased replica-feedback bias generator of the VCO delay cell is shown in Fig. 3-11. It provides the output bias voltages Vbp and Vbn from the input signal Vctrl. Its primary function is to continuously adjust the VCO delay buffer bias current to provide the correct lower swing limit Vctrl for the VCO delay buffer stages. As a result, it establishes a current that is held constant and independent of the supply voltage.
Fig. 3-11 Schematic of self-biased replica-feedback bias generator.
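Equations (3-7) and (3-8) can be evaluated numerically to see the linear tuning characteristic. The device values below (kp, Vtp, Ceff) are hypothetical and were chosen only so that a four-stage oscillator lands near 125MHz at a 3.3V supply; they are not the thesis design values.

```python
# Hypothetical device parameters, for illustration only.
KP = 1e-3       # A/V^2, PMOS transconductance parameter kp
VTP = 0.8       # V, threshold voltage magnitude
CEFF = 1e-12    # F, effective buffer output capacitance
VDD = 3.3       # V, supply voltage (as in the thesis)
N_STAGES = 4    # four differential stages (as in the thesis)

def f_osc(vctrl):
    """Oscillation frequency from equation (3-7), in Hz."""
    return KP * ((VDD - vctrl) - VTP) / (2 * N_STAGES * CEFF)

def k_vco():
    """VCO gain from equation (3-8), in Hz/V; negative because a higher
    Vctrl reduces the PMOS gate drive and slows the oscillator."""
    return -KP / (2 * N_STAGES * CEFF)

for v in (1.0, 1.5, 2.0):
    print(f"Vctrl = {v:.1f} V -> f_osc = {f_osc(v) / 1e6:.1f} MHz")
print(f"Kvco = {k_vco() / 1e6:.1f} MHz/V")
```

Because the bias current does not appear in (3-8), the slope is the same at every control voltage in this first-order model.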

(50) The self-biased replica-feedback bias generator consists of a PMOS source coupled differential pair, a half-buffer replica, and a control voltage buffer. The differential amplifier is actually a unity-gain buffer which forces the voltage of node Va in Fig. 3-11 equal to Vctrl, a condition required for correct symmetric load swing limits, and provide the bias voltage Vbn for the NMOS current source. Besides, the bias voltage, Vbn, is dynamically adjusted by the differential amplifier to increase the supply noise immunity. With the half-buffer replica, the net result is that the output current of the NMOS current source is established by the load element and is independent of the supply voltage. If the supply voltage changes, the amplifier will adjust to keep the swing and the bias current constant. Because the differential amplifier utilizes the self-biased architecture, there are two stable states, one of which is unbiased. As a result, a start-up circuit is needed to bias the amplifier when power-up. Because the differential amplifier and the half-buffer replica form a two-stage negative feedback loop, frequency response issue must be taken into consideration. Fig.3-12 shows the frequency response of the self-biased replica-feedback bias generator.. Fig. 3-12 Frequency response of the self-biased replica-feedback bias generator. 30.

(51) Basically, there are two poles in the loop. One is at amplifier output, and the other is at the half-buffer replica output. Since the pole at the amplifier output is the dominant one, it can be moved toward origin to increase the phase margin of the loop by the capacitive load Cc of the NMOS current source gates in the VCO buffer chain. Moreover, in order to track any supply and substrate noise that affect the VCO jitter performance, the bandwidth of the self-biased circuit is usually set equal to the operation frequency of the VCO. The bias circuit also provides a buffered version of control voltage Vctrl using an extra control voltage buffer. This can isolate the control voltage Vctrl from capacitive coupling in the VCO buffer chain. The differential oscillator output is converted to the 50% duty cycle single-ended signal used as input to the phase-frequency detector with the differential-to-single-ended converter shown in Fig. 3-13 [15] and the feed forward type duty-cycle corrector shown in Fig. 3-14 [11]. The two differential amplifiers of the differential-to-single-ended converter use the same current source bias voltage, Vbn, generated by the self-biased replica-feedback bias generator for the VCO. According to Vbn, the circuit corrects the input common-mode voltage level and provides signal amplification.. Fig. 3-13 Schematic of differential-to-single-ended converter. 31.

(52) Fig. 3-14 Schematic of feed forward type duty-cycle corrector and its timing diagram.
The duty-cycle corrector is connected after the differential-to-single-ended converter to ensure that the duty cycle of the VCO output is 50%. The signal P+, selected from the multiphase signals, turns on M1 and M2 and charges the output node clk+ of the duty-cycle corrector almost instantaneously, because the discharge path of the node clk+ is already off due to the signal P-. The signal P-, which is also selected from the multiphase signals, is the one whose rising edge is shifted by 180° in phase from that of P+. Similarly, the signal P- rapidly discharges the node clk+ and delivers the desired 50% duty-cycle signal. Since this duty-cycle correction circuit consists of only two transmission gates and two inverters, the area is minimal and the power consumption is negligible. Digital buffers are added at the output to improve the ability to drive the next stages. The PLL used in this thesis needs to generate eight phases for the transmitter multiplexer. Therefore, the VCO uses four delay buffer stages with the output frequency at 125MHz. The simulated transfer curve of the VCO is shown in Fig. 3-15. The supply voltage is 3.3V. For Vctrl between 0.5V and 2.2V, the gain of the VCO is -118.5MHz/V, and the transfer curve is monotonic.

(53) Fig. 3-15 Transfer curve of the VCO.
3.2.3.5. Divider
Because the output frequency of the VCO is 125MHz and the input reference frequency is 31.25MHz, a divide-by-four circuit is used. A TSPC D flip-flop with its inverted output connected to its D input is used as a divide-by-two circuit, as shown in Fig. 3-16 [17]. In this circuit, the driving capability of the input clock must be checked to assure correct operation. Two divide-by-two circuits are then cascaded to obtain a divide-by-four circuit. Unfortunately, an asynchronous counter accumulates jitter stage by stage. A synchronous stage is therefore used at the end to re-sample the divided clock, which eliminates the jitter accumulated in the asynchronous counter, as shown in Fig. 3-17.
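The divide-by-four behavior of two cascaded toggle stages can be checked with a small behavioral model. This is only a logic-level sketch with hypothetical function names: it ignores the TSPC circuit details, the jitter accumulation, and the final synchronous re-sampling stage.

```python
def divide_by_two(clk):
    """D flip-flop with its inverted output fed back to D: toggles on each rising edge."""
    q, prev, out = 0, 0, []
    for level in clk:
        if prev == 0 and level == 1:   # rising edge of the input clock
            q ^= 1
        out.append(q)
        prev = level
    return out

def count_rising(signal):
    """Number of 0 -> 1 transitions, assuming the signal starts low."""
    prev, edges = 0, 0
    for level in signal:
        if prev == 0 and level == 1:
            edges += 1
        prev = level
    return edges

vco_clk = [0, 1] * 8                  # eight cycles of the 125 MHz VCO clock
div2 = divide_by_two(vco_clk)
div4 = divide_by_two(div2)            # cascade of two stages
print(count_rising(vco_clk), count_rising(div2), count_rising(div4))
print(125e6 / 4)                      # feedback frequency in Hz
```

Eight input cycles produce two cycles at the output of the second stage, consistent with a 125MHz VCO and a 31.25MHz reference.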

(54) Fig. 3-16 Schematic of TSPC Asynchronous Divided-by-two circuit. Fig. 3-17 Divider composed of asynchronous and synchronous counters and its timing diagram. 34.

(55) 3.2.4 PLL Parameter Design
Because of the switching behavior of the charge pump, the PLL is inherently a discrete-time system, which is difficult to analyze with continuous-time methods. However, when the loop bandwidth is much lower than the reference frequency, the s-domain model can still be used to gain a thorough understanding of the negative feedback loop. Fig. 3-18 shows the linear model of the PLL.
Fig. 3-18 Linear model of PLL.
Assume the PLL is in the locked state. The PFD and CP have a gain of Ip/2π (A/rad), the LF has a transfer function F(s) (V/A), the VCO has a gain of Kvco (Hz/V), and the feedback factor is 1/N. The conversion gain of the VCO becomes 2πKvco/s (rad/sec-V), because phase is the integral of frequency. Based on the above definitions and the PLL linear model, the open-loop gain of the PLL can be represented as
$$G(s)\,\beta(s) = \frac{\theta_{back}(s)}{\theta_{in}(s)} = \frac{I_P\,K_{vco}\,F(s)}{s\,N}$$ (3-9)
The closed-loop transfer function of the PLL is given by

(56)
$$H(s) = \frac{\theta_{out}(s)}{\theta_{in}(s)} = \frac{G(s)}{1 + G(s)\,\beta(s)} = \frac{N\,G(s)}{N + G(s)} = \frac{N\,K}{s + K}$$ (3-10)
Therefore, the 3-dB bandwidth is
$$\omega_{3dB} = K = \frac{I_P\,K_{vco}\,F(s)}{N}$$ (3-11)
From the analysis of the LF in section 3.2.3.3, we know that the shunt capacitance C2 is typically much smaller than C1. Therefore, we can neglect the capacitor C2 and use the classical two-pole system and the second-order linear model of the PLL to analyze the transient response. With F(s) = R1 + (1/sC1), the closed-loop transfer function can be derived as
$$H(s) = \frac{I_P\,K_{vco}}{C_1} \cdot \frac{1 + sR_1C_1}{s^2 + \dfrac{I_P\,K_{vco}\,R_1}{N}\,s + \dfrac{I_P\,K_{vco}}{N\,C_1}}$$ (3-12)
Equation (3-12) can be compared to the classical two-pole system transfer function
$$H(s) = \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$$ (3-13)
Therefore, the natural frequency ωn and the damping factor ζ can be derived as
$$\omega_n = \sqrt{\frac{I_P\,K_{vco}}{N\,C_1}}$$ (3-14)
$$\zeta = \frac{\omega_n}{2\,\omega_z}$$ (3-15)
In the PLL design, the frequency noise of the VCO can be the dominant noise source influencing the phase noise performance. As will be seen in a later section, the noise of the VCO sees a high-pass characteristic. Therefore, a large loop bandwidth for the PLL feedback system is preferable because it enhances the tracking ability. The choice of the damping factor ζ is a trade-off between acquisition time and step response stability. If a larger ζ

(57) is chosen, the system may have a longer acquisition time. On the other hand, if a smaller ζ is chosen, the step response may ring or the system may become unstable. We then use the loop bandwidth and the phase margin to determine the component values of the loop filter. By substituting equation (3-2) into equation (3-11), we get
$$\text{Loop BW} = \frac{I_P\,K_{vco}}{N} \cdot \frac{R_1C_1}{C_1 + C_2}$$ (3-16)
From equations (3-2) and (3-9), the phase of the open-loop gain is determined by the pole and zero of the loop filter, so the phase margin is calculated as
$$PM = \tan^{-1}\frac{BW}{\omega_z} - \tan^{-1}\frac{BW}{\omega_p}$$ (3-17)
By setting the derivative of the phase margin equal to zero, the phase margin is found to be maximum when the loop bandwidth is set to the geometric mean of the zero and pole frequencies,
$$BW = \sqrt{\omega_z\,\omega_p}$$ (3-18)
We can define a new parameter, γ, as
$$\gamma = \frac{BW}{\omega_z} = \frac{\omega_p}{BW}$$ (3-19)
From equation (3-19) and ωp = ωz×(1 + C1/C2), the capacitance ratio of C1 and C2 can be represented by
$$\frac{C_1}{C_2} = \gamma^2 - 1$$ (3-20)
The loop bandwidth (BW) can now be written as
$$BW = \frac{I_P\,K_{vco}}{N} \cdot R_1\left(1 - \frac{1}{\gamma^2}\right)$$ (3-21)

(58) The design flow of a third-order PLL can be derived from equations (3-19), (3-20), and (3-21). The design flow can be summarized as follows [18]:
(1) Determine Kvco by measuring VCO test keys, by simulating the VCO used in the design, or by referring to the data sheet of the commercial VCO employed.
(2) Depending on the desired noise and transient performance, determine the loop bandwidth BW. Usually, BW is less than 1/10 of the reference clock frequency.
(3) If the filter is off-chip, set Ip to be around 100µA to 1mA. If an on-chip filter is employed, decrease the value of Ip so that a reasonable trade-off between chip area and pump current can be reached.
(4) Determine the nominal value of N according to the target system.
(5) Select the required PM specification. The zero and pole positions are then determined by equation (3-19).
(6) With BW, Ip, PM, N, and Kvco determined, calculate R1 with equation (3-21).
(7) Calculate the value of C1 with C1=1/R1ωz.
(8) Calculate the value of C2 by equation (3-20).
The parameters used in the PLL are listed in Table. 3-1. The MATLAB simulation results based on equations (3-11) and (3-12) are shown in Fig. 3-19 and Fig. 3-20. Fig. 3-21 shows the PLL closed-loop control voltage from the SPICE simulation. Fig. 3-22 shows the eight evenly spaced phases at 125MHz.
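The design flow above can be sketched as a short calculation. This is a minimal illustration under stated assumptions, not the procedure used to produce Table. 3-1: the 1MHz loop bandwidth and 60° phase margin are hypothetical targets, Kvco is the magnitude of the simulated VCO gain, Ip is the 150µA charge-pump current quoted for the PFD simulation, and γ is obtained by solving PM = tan⁻¹(γ) − tan⁻¹(1/γ), which follows from (3-17) and (3-19).

```python
import math

def design_loop_filter(kvco_hz_per_v, ip_a, n_div, bw_rad_s, pm_deg):
    """Return (R1, C1, C2, gamma) from equations (3-19) to (3-21)."""
    # Step 5: tan(PM) = (gamma - 1/gamma) / 2  ->  gamma = tan(PM) + sqrt(tan(PM)^2 + 1)
    t = math.tan(math.radians(pm_deg))
    gamma = t + math.sqrt(t * t + 1.0)
    wz = bw_rad_s / gamma                       # zero position from (3-19)
    # Step 6: solve (3-21) for R1.
    r1 = bw_rad_s * n_div / (ip_a * kvco_hz_per_v * (1.0 - 1.0 / gamma**2))
    # Step 7: C1 = 1 / (R1 * wz).
    c1 = 1.0 / (r1 * wz)
    # Step 8: C2 from (3-20).
    c2 = c1 / (gamma**2 - 1.0)
    return r1, c1, c2, gamma

# Hypothetical targets: BW = 2*pi*1 MHz, PM = 60 degrees.
BW = 2 * math.pi * 1e6
r1, c1, c2, gamma = design_loop_filter(118.5e6, 150e-6, 4, BW, 60.0)
print(f"R1 = {r1:.0f} ohm, C1 = {c1 * 1e12:.0f} pF, C2 = {c2 * 1e12:.1f} pF")
```

Substituting the resulting R1, C1, and C2 back into (3-17) is a quick way to confirm that the requested phase margin is met.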

(59) Table. 3-1 Parameters of the PLL. Fig. 3-19 Open loop simulation using parameter in Table. 3-1. 39.

(60) Fig. 3-20 Close loop simulation using parameter in Table. 3-1. Fig. 3-21 Control voltage simulation using SPICE. Fig. 3-22 Simulation of eight-phase of the PLL. 40.

(61) 3.2.5 PLL Noise Analysis and Stability
As mentioned in chapter 2, timing jitter affects the maximum timing margin of the transceiver and therefore the performance of the serial link. The output clock jitter of the PLL depends on the jitter of the VCO, the input source, and the design of the loop parameters. Several noise sources contribute to the output jitter of the PLL, as depicted in Fig. 3-23, where θin is the reference noise, θpfd is the PFD and CP noise, θlf is the loop filter noise, and θvco is the VCO noise.
Fig. 3-23 Linear model of PLL with different noise sources.
These noise sources introduce phase fluctuations, or timing jitter in the time domain. Using closed-loop analysis, the transfer functions for the different noise sources can be derived as
$$H(s) = \frac{\theta_{out}(s)}{\theta_{in}(s)} = \frac{N\,K}{s + K}$$ (3-22)
$$H_{pfd}(s) = \frac{\theta_{out}(s)}{\theta_{pfd}(s)} = \frac{2\pi N}{I_P} \cdot \frac{K}{s + K}$$ (3-23)

(62)
$$H_{lf}(s) = \frac{\theta_{out}(s)}{\theta_{lf}(s)} = \frac{2\pi K_{vco}}{s + K}$$ (3-24)
$$H_{vco}(s) = \frac{\theta_{out}(s)}{\theta_{vco}(s)} = \frac{s}{s + K} = 1 - \frac{H(s)}{N}$$ (3-25)
where H(s) and K are given in (3-10) and (3-11). Each noise transfer function has its own characteristic. H(s) and Hpfd(s) are low-pass functions, Hlf(s) is a band-pass function, and Hvco(s) is a high-pass function. Based on these different frequency responses, there is a trade-off in choosing a wide or narrow bandwidth. A narrow PLL bandwidth suppresses noise from the input reference source and the PFD, while a wide one suppresses noise from the VCO. Most of the time, the input source of the PLL is a crystal oscillator, which has much lower phase noise than the VCO. Therefore, the input source can be viewed as jitter-free. Based on this analysis, the loop bandwidth of the PLL should be maximized, so that the high-pass function suppresses the VCO noise and the timing jitter is reduced. The maximum natural frequency ωn of the PLL is restricted by the reference clock frequency ωin. Using the analysis from [19] [20], the stability limit can be derived as
$$\omega_n^2 < \frac{\omega_{in}^2}{\pi\,(R_1C_1\,\omega_{in} + \pi)}$$ (3-26)
As a rule of thumb, stability can be assumed by keeping ωn < ωin/10. Choosing a larger loop bandwidth means that more phase noise from the input clock is transferred to the output. However, this does not cause a problem when the input is a clean clock source.
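The low-pass and high-pass behavior of (3-22) and (3-25), and the stability limit (3-26), can be checked numerically. The sketch below treats K as a constant, following the first-order form used in the equations above; the 1MHz loop constant and the R1C1 value are hypothetical, and the function names are made up for this example.

```python
import math

def h_ref(s, k, n_div):
    """Reference-noise transfer function, equation (3-22)."""
    return n_div * k / (s + k)

def h_vco(s, k):
    """VCO-noise transfer function, equation (3-25)."""
    return s / (s + k)

def is_stable(wn, win, r1c1):
    """Stability criterion of equation (3-26); all frequencies in rad/s."""
    return wn**2 < win**2 / (math.pi * (r1c1 * win + math.pi))

K = 2 * math.pi * 1e6            # hypothetical loop constant, rad/s
N = 4
for f_hz in (1e4, 1e6, 1e8):
    s = 1j * 2 * math.pi * f_hz
    print(f"{f_hz:9.0f} Hz  |H|/N = {abs(h_ref(s, K, N)) / N:.3f}"
          f"  |Hvco| = {abs(h_vco(s, K)):.3f}")

W_IN = 2 * math.pi * 31.25e6     # reference frequency from the thesis
TAU = 10 / W_IN                  # hypothetical R1*C1 time constant
print(is_stable(W_IN / 10, W_IN, TAU))
```

Well below K the reference noise passes with gain N while the VCO noise is attenuated; well above K the roles reverse, which is the bandwidth trade-off described in the text.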

(63) 3.3 Multiplexer and Pre-driver The multiplexer is used to serialize the pre-skewed parallel data channels D0~D7. Each multiplexer is switched by two series NMOS transistors that are controlled by two adjacent clock phases. For example, as shown in Fig. 3-24, at the timing interval between the rising edge of clk4 and the falling edge of clk1, the center of the input signal D0+ and D0- starts driving the multiplexer output. The PLL generates the required phases of clk0 through clk7 with 1ns phase resolution to reach the data transfer rate of 1Gbps.. Fig. 3-24 Timing diagram of an 8 to 1 multiplexer. 43.
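The selection window formed by two clock phases can be checked with simple timing arithmetic. The sketch below assumes ideal 50% duty-cycle clocks with clk0 rising at t = 0, and it extrapolates the single example in the text (rising edge of clk4, falling edge of clk1) to the pairing clk k with clk (k−3) mod 8; that pairing pattern and the function names are assumptions of this example, not statements about Fig. 3-24.

```python
PERIOD_PS = 8000            # 125 MHz clock period
STEP_PS = PERIOD_PS // 8    # 1 ns spacing between adjacent phases

def is_high(k, t_ps):
    """True if phase clk<k> is high at time t (ideal 50% duty cycle assumed)."""
    return (t_ps - k * STEP_PS) % PERIOD_PS < PERIOD_PS // 2

def overlap_ps(a, b):
    """Time per period during which clk<a> and clk<b> are both high."""
    return sum(1 for t in range(PERIOD_PS) if is_high(a, t) and is_high(b, t))

def active_windows(t_ps):
    """Indices k whose series switches (clk<k> and clk<(k-3) % 8>) both conduct."""
    return [k for k in range(8)
            if is_high(k, t_ps) and is_high((k - 3) % 8, t_ps)]

print(overlap_ps(4, 1))          # window between clk4 rising and clk1 falling
print(active_windows(4500))      # which window is open at t = 4.5 ns
```

Under these assumptions each pair conducts for exactly one bit time, and exactly one pair conducts at any instant, which is what an 8:1 multiplexer at 1Gbps requires.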

(64) The schematic of the eight-to-one multiplexer is shown in Fig. 3-25 [21]. The speed of the multiplexer circuit is mainly determined by the resistance of the PMOS load and the total capacitance of the output node. Increasing the PMOS size relative to the NMOS size increases the speed while reducing the swing of the output nodes A and B. The ratio of the PMOS and NMOS sizes has to be chosen such that the swing at the multiplexer outputs A and B is large enough to switch the pre-driver in the worst case.
Fig. 3-25 Schematic of the 8 to 1 multiplexer and pre-driver.
To decide whether to apply pre-emphasis, both the previous bit and the current bit must be known in order to control the pre-emphasis driver. The schematic of the pre-emphasis multiplexer is identical to Fig. 3-25 except that its output is delayed by one bit period (1ns), as shown in Fig. 3-26 [22].
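The role of the one-bit-delayed data stream can be illustrated behaviorally. The sketch below assumes the simplest possible rule, boosting the drive whenever the current bit differs from the previous one; the normalized current values and the function names are hypothetical, and the actual driver circuit of the thesis may combine the two streams differently.

```python
def pre_emphasis_flags(bits):
    """True where the current bit differs from the previous (one-bit-delayed) bit."""
    prev = bits[0]                 # assume the line starts at the first bit's level
    flags = []
    for b in bits:
        flags.append(b != prev)
        prev = b
    return flags

def drive_levels(bits, i_main=1.0, i_boost=0.25):
    """Signed, normalized output current: extra current is added on transitions."""
    levels = []
    for b, boost in zip(bits, pre_emphasis_flags(bits)):
        sign = 1.0 if b else -1.0
        levels.append(sign * (i_main + (i_boost if boost else 0.0)))
    return levels

data = [0, 0, 1, 1, 0, 1]
print(pre_emphasis_flags(data))
print(drive_levels(data))
```

Only the bits that follow a transition receive the larger current, which concentrates the extra drive where the channel attenuates the signal most.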

Fig. 3-26 Schematic of the 8 to 1 pre-emphasis multiplexer and pre-driver.

The pre-driver is composed of a source-coupled pair with an active inductive peaking load. It is inserted between the multiplexer and the final output driver to reduce the size of the multiplexer. The active inductive peaking load can substantially enhance the bandwidth of gain stages [23]. The implementation of the active inductor is shown in Fig. 3-27. It consists of a PMOS device and a resistor $R_s$ placed in series with the gate of the PMOS. The PMOS device operates in the saturation region, and the resistor can be realized with an NMOS operating in the triode region. The impedance looking into the source of the PMOS can be approximated by

$$Z_{out} = \frac{1}{g_m} \cdot \frac{1 + s R_s C_{gs}}{1 + s C_{gs} / g_m} \tag{3-27}$$

where $g_m$ is the transconductance of the PMOS.

Fig. 3-27 Implementation of the active inductor peaking.

Therefore, the zero and pole of $Z_{out}$ are given by

$$\text{Zero} = -\frac{1}{R_s C_{gs}} \tag{3-28}$$

$$\text{Pole} = -\frac{g_m}{C_{gs}} \tag{3-29}$$

The additional zero is introduced by the resistor $R_s$. For $Z_{out}$ to behave as an inductor, it is required that

$$\text{Zero} > \text{Pole} \;\Rightarrow\; g_m > \frac{1}{R_s} \tag{3-30}$$

The inductive region ($|\text{Zero}| < \omega < |\text{Pole}|$) and the inductance can be adjusted by tuning the locations of the pole and zero. For good gain and bandwidth boosting, the inductive region should cover the bandwidth of the pre-driver, and the frequency response of the pre-driver should have the optimum group delay.

3.4 Data Driver and Pre-emphasis Driver

The data driver, shown in Fig. 3-28, is an open-drain current-mode driver composed of a differential source-coupled pair with a stable constant current source $I_d$. The input signals out+ and out- come from the pre-driver output described in Section 3.3 and carry the serialized data at 1 Gbps. The outputs of the data driver, D+ and D-, directly drive a differential cable. The data driver provides a balanced AC current drive to the cable, superimposed on the DC current $I_d$, to reach the required output swing.

The main issue of the data driver is control of the settling time, that is, the bandwidth limitation of the driver. When the bit time of the data is shorter than the settling time of the data driver, the previously transmitted values affect the waveform of the current bit. This interference, called inter-symbol interference (ISI), reduces the maximum frequency at which the system can operate. Therefore, as shown in Fig. 3-29, a pre-emphasis driver is applied directly to the output pins to improve the settling of the data driver.
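The effect of a settling time that is long compared with the bit time can be illustrated with a simple model in Python. The sketch assumes a single-pole (first-order) response with time constant `tau_ns`, which is a deliberate simplification of the real driver, package and cable; the time constants used below are hypothetical.

```python
import math

BIT_TIME_NS = 1.0  # 1 Gbps


def settle(bits, tau_ns, bit_time_ns=BIT_TIME_NS):
    """Normalized output level at the end of each bit for a first-order response.

    The output starts at 0 and moves toward the target level (0 or 1)
    exponentially with time constant tau_ns during each bit period.
    """
    alpha = 1.0 - math.exp(-bit_time_ns / tau_ns)
    level = 0.0
    samples = []
    for bit in bits:
        level += alpha * (float(bit) - level)
        samples.append(level)
    return samples


pattern = [0, 0, 0, 1, 0, 1, 1, 1]
slow = settle(pattern, tau_ns=1.0)  # settling comparable to the bit time
fast = settle(pattern, tau_ns=0.1)  # settling much shorter than the bit time

print([round(v, 3) for v in slow])
print([round(v, 3) for v in fast])
```

With the slow time constant, the isolated 1 after a run of 0s reaches a lower level than the last of several consecutive 1s, so the level of a bit depends on the bits before it. That dependence is the ISI described above.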

Fig. 3-28 Schematic of the data driver.

Fig. 3-29 Schematic of the pre-emphasis driver.

Most of the output current comes from the data driver, which is controlled by the data multiplexer. The pre-emphasis multiplexer controls the additional output current from the pre-emphasis driver when the serialized data bit changes from low to high or from high to low. The operation of pre-emphasis is summarized in Table 3-2.

Table 3-2 Summary of the pre-emphasis operation.

3.5 Transmitter Simulation Results

After leaving the transmitter circuit, the signals D+ and D- pass through the internal bonding pad, the bonding wire, and the external package pin. The thin bonding wire can be inductive, and the pad and the pin are inductive and capacitive. After the package, the signals TxD+ and TxD- travel through the cable and arrive at the receiver termination resistor.

Fig. 3-30 (a) shows the simulated waveforms of the proposed transmitter outputs TxD+ and TxD- without pre-emphasis. Fig. 3-30 (b) is the differential output. As can be seen, the high-frequency transmitted outputs TxD+ and TxD- are influenced by the preceding low-frequency outputs. As a result, the high-frequency outputs cannot meet the required output voltage range.

Fig. 3-31 (a) shows the simulated waveforms of the proposed transmitter outputs TxD+ and TxD- with pre-emphasis added. With the pre-emphasis circuit, the data bit transition is faster than without it. Fig. 3-31 (b) is the differential output.

Fig. 3-32 shows the eye diagram of the signal at the transmitting side without pre-emphasis. Fig. 3-33 shows the eye diagram of the signal at the transmitting side with pre-emphasis.

Fig. 3-30 Simulation results of (a) the driver outputs without pre-emphasis, (b) the differential output.

Fig. 3-31 Simulation results of (a) the driver outputs with pre-emphasis, (b) the differential output.

Fig. 3-32 Eye diagram of the signal at the transmitting side without pre-emphasis.

Fig. 3-33 Eye diagram of the signal at the transmitting side with pre-emphasis.

Chapter 4 Receiver

4.1 Architecture of Receiver

This chapter presents the receiver design. The purpose of the receiver is to recover the original data from the received signal by amplifying and sampling it. The clock and data recovery circuit in the receiver adjusts the receiver clock based on the received data so that the sampling point falls at the center of the data eye. The de-multiplexer then converts the recovered serial data into eight parallel data channels. Fig. 4-1 shows the block diagram of the receiver architecture.

Fig. 4-1 Block diagram of the receiver.

4.2 Slicer

When the differential data enter the receiver chip, they are distorted by the resonance between the inductance and capacitance of the bonding wire and pad. Fig. 4-2 shows the schematic of the slicer [24]. The slicer is one of the most important building blocks in the receiver circuit. It is essentially an open-loop comparator. To meet the common-mode voltage range, the circuit is implemented with PMOS input differential pairs biased by a constant current source, with NMOS cross-coupled pairs as the load.

Fig. 4-2 Schematic of slicer.

The slicer must be able to detect received signals that are noisy and swing-limited and amplify them to a nearly full-swing CMOS level at the output. Therefore, the gain and bandwidth of the slicer should be carefully designed to meet this requirement. Moreover, the offset voltage of the slicer also affects the correct operation of the receiver. The offset voltage is due not only to mismatches in the input devices but also to mismatches (both device and capacitance mismatch) within the positive-feedback structure. These errors are referred back to the input as the input-offset voltage. The slicer also includes two on-chip termination resistors that match the characteristic impedance of the channel, which reduces reflections and the parasitic effects caused by the package.

Fig. 4-3 shows the frequency response of the slicer. Up to the full data rate of the transmitted NRZ signal, the slicer still has 33.6 dB of gain. Fig. 4-4 shows the hysteresis window of the slicer. The advantage of this hysteresis comparator is noise immunity. The threshold voltage is determined by the system BER. Fig. 4-5 shows the corresponding output signal of the slicer: the swing-limited received signals of about 150 mV are amplified to full scale. The data stream is then sent to the following clock and data recovery circuit, which recovers the data values.
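The noise immunity provided by hysteresis can be illustrated with a small behavioral model in Python. This is only a sketch: it treats the slicer as an ideal comparator whose output changes only when the differential input leaves a symmetric window around zero. The window width and the sample values (in mV) are hypothetical and are not taken from Fig. 4-4.

```python
def slice_with_hysteresis(samples_mv, window_mv, initial=0):
    """Ideal comparator with a symmetric hysteresis window.

    The output switches to 1 only when the differential input rises above
    +window_mv / 2 and to 0 only when it falls below -window_mv / 2;
    inside the window the previous output is held.
    """
    half = window_mv / 2.0
    state = initial
    output = []
    for v in samples_mv:
        if v > half:
            state = 1
        elif v < -half:
            state = 0
        output.append(state)
    return output


# Hypothetical differential input in mV: large data swings with small noise in between.
received = [75, 10, -10, 5, -75, -5, 8, 75]

print(slice_with_hysteresis(received, window_mv=40))
print(slice_with_hysteresis(received, window_mv=0))
```

With the 40 mV window the small excursions around zero are ignored, while with no window each of them toggles the output. A wider window rejects more noise but also requires a larger input swing, which is the trade-off behind choosing the threshold.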

Fig. 4-3 Frequency response of slicer.

Fig. 4-4 Hysteresis window of the slicer.

Fig. 4-5 Output of the slicer for a 500 MHz, 150 mV input.

4.3 Clock and Data Recovery

4.3.1 Introduction

The data stream received and amplified by the slicer is both noisy and asynchronous. The data must be retimed so that the jitter introduced during transmission is removed. The clock must also be extracted from the random data to allow synchronous operation. Data retiming and clock extraction are performed by the clock and data recovery (CDR) circuit. For the receiver, we present the circuit design of a PLL-based CDR. The main idea of a PLL-based CDR is to detect the position of the data with respect to the clock edge at each data transition. If the data leads the clock, the clock will be sped up. If the data lags the clock, the