A Novel Tree-Type Serializer for 10Gbps Transmitter

Academic year: 2021


National Chiao Tung University

Department of Electrical and Control Engineering

Master's Thesis

A Novel Tree-Type Serializer for 10Gbps Transmitter

Student: Guan Yu Chen

Advisor: Prof. Chau Chin Su

A Novel Tree-Type Serializer for 10Gbps Transmitter

Student: Guan Yu Chen

Advisor: Prof. Chau Chin Su

National Chiao Tung University

Department of Electrical and Control Engineering

Master's Thesis

A Thesis

Submitted to the Department of Electrical and Control Engineering, College of Electrical Engineering and Computer Science,

National Chiao Tung University, in Partial Fulfillment of the Requirements

for the Degree of Master

in

Electrical and Control Engineering

September 2006

Hsinchu, Taiwan, Republic of China

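A Novel Tree-Type Serializer for 10Gbps Transmitter

Student: Guan Yu Chen Advisor: Prof. Chau Chin Su

Department of Electrical and Control Engineering, National Chiao Tung University

Chinese Abstract

This thesis proposes a novel tree-type serializer for 10 Gbps serial I/O. It uses clocks with a 90-degree phase difference as switch controls, which removes the need for the retiming used in conventional designs, so power consumption and circuit area are significantly reduced. Simulation results show that, at 10 Gbps, the proposed serializer consumes 70 percent of the power and occupies 22 percent of the area of a conventional serializer. In this thesis, we design a 10 Gbps transmitter. It is implemented in the TSMC 0.13 μm 2P8M CMOS process, and the transmitter circuit consumes 27 mW from a 1.2 V power supply. In addition, we present an on-chip channel model and the design of a corresponding low-power driver for a 1 cm channel. Keywords: serializer, multiplexer, deserializer, demultiplexer, high-speed serial link, novel tree-type serializer, clocks with 90-degree phase difference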


A Novel Tree-Type Serializer for 10Gbps Transmitter

Student: Guan Yu Chen Advisor: Chau Chin Su

Department of Electrical and Control Engineering

National Chiao Tung University

Abstract

This thesis proposes a novel tree-type serializer for 10 Gbps serial I/O. It uses quadrature clocks as switch controls to eliminate the retiming needed in a conventional design. As a result, power consumption and circuit area are significantly reduced. Simulation results show that, at 10 Gbps, the proposed serializer consumes 70% of the power and occupies 22% of the area of a conventional one.

In this thesis, a 10 Gbps transmitter has been designed. It is implemented in the TSMC 0.13 μm 2P8M CMOS process. The transmitter circuit consumes 27 mW from a 1.2 V power supply.

In addition, we analyze the on-chip channel model and design a low-power driver for a 1 cm channel.

Keywords: Serializer, Multiplexer, Deserializer, Demultiplexer, High-speed serial links, novel tree-type serializer, quadrature clocks

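Acknowledgments

First, I would like to thank my advisor, Professor Chau Chin Su (蘇朝琴), for guiding my research and for teaching me the spirit of doing research.

Next, I thank our most senior labmate 鴻文, who was always patient in guiding his juniors, and 丸子, who kept the lab running so that AOC, er, I mean Layout, would not lag. Thanks also to 仁乾 and 盈杰 for their advice and suggestions. I thank 王照勳 for tirelessly answering my questions about the TSMC 0.13um process; 汝敏 for generous help with the TSMC 0.13um process; 鼎鈞 and 振綱 of Professor 洪's laboratory for discussing process issues with me; and Mr. 張文旭, who is responsible for TSMC 0.13 at CIC, for answering all of my questions about using the process. I must not forget the senior students at 中央, including 顯元, 毅山, and 育凱, who helped me with probing and often spent a whole day with me; I am deeply grateful and also rather embarrassed. Thank you. Thanks also to 煜輝, 阿亮, 阿達, 瑛佑, Cgu, Ku, and 阿銘 for their guidance; to 志龍 and 大姐 for sharing their measurement experience; and to 忠傑, 順閔, TOTORO, and 小朱 for their help.

I also thank 智琦 for his selfless answers and trademark smile. Congratulations, boss, on no longer being single (hmph); good deeds are indeed rewarded. Thanks to 匡良, 楙軒, and 宗諭 for the mutual encouragement and support, and to 祥哥, 教主, 皇如, 村鑫, 小馬, 奶油哥, 方董, 威翔, and 存遠: the times we spent barbecuing, playing ball, and chatting are wonderful memories.

Of course, thanks to the assistants 依萍, 雅雯, and 俊秀 for keeping us informed of news and, well, of meeting notices and sign-in sheets (Bad Dream~~). Thanks also to 士豪, 教練, 螞蟻, 小 z, and my university classmates for their encouragement.

Finally, I thank my family, my father and mother. Without your encouragement and support, I would not be where I am today. Thank you.

Guan Yu Chen (陳冠宇), 2006/7/23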

List of Contents

List of Contents
List of Tables
List of Figures

Chapter 1 Introduction
1.1 CMOS High-Speed Serial Links
1.2 Motivation
1.3 Thesis Organization

Chapter 2 Background Study
2.1 Other Structure of Serializer
2.2 Shift-Register Type Serializer
2.3 Single-Stage Type Serializer
2.4 Conventional Tree-Type Serializer

Chapter 3 The Novel Tree-Type Serializer
3.1 Functional Blocks
3.2 Comparison of Three Structures
3.2.1 Analysis of Novel Tree-Type and Single-Stage Serializer
3.2.2 Compare Three Architectures by HSPICE Simulation
3.2.2.1 Single-Stage Serializer
3.2.2.2 Novel Tree-Type Serializer
3.2.2.3 Conventional Tree-Type Serializer
3.2.3 The Comparison Results as Figures and Tables
3.3 Summary

Chapter 4 Transmitter Circuit Design
4.1 Introduction
4.2 Circuit Design
4.2.1 MUX 8-to-1
4.2.2 MUX 4-to-1
4.2.3 MUX 32-to-1
4.2.4 Driver
4.3 Simulation Result
4.4 Implementations
4.5 Summary

Chapter 5 Measurement
5.1 Off Chip
5.2 On Wafer

Chapter 6 On-Chip Channel Model Analysis and Low Power Driver
6.1 On-Chip Channel Model Analysis
6.2 Low Power Driver

Chapter 7 Conclusion
7.1 Conclusion

List of Tables

Table 1.1 High-Speed Communication Standard
Table 2.1 Comparison of three kinds of MUX
Table 3.1 Rise Time versus size of the MOS Transistors in Single-Stage Serializer
Table 3.2 Rise Time versus size of the MOS Transistors in Novel Tree-Type Serializer
Table 3.3 Rise Time versus size of the MOS Transistors in Conventional Tree-Type Serializer
Table 3.4 Power of three architectures versus Rising Time
Table 3.5 Area of three architectures versus Rising Time
Table 3.6 Power X Area of three architectures versus Rising Time
Table 3.7 Analysis vs. Simulation
Table 3.8 Power and Area Comparisons
Table 3.9 Power and Area Comparisons Standardized by Using Conventional Tree
Table 4.1 The Power Consumption of Each Part of Circuit
Table 5.1 Measurement result
Table 6.1 Characteristic Impedance of M8-MY

List of Figures

Figure 1.1 Conventional Transceiver
Figure 2.1 Diagram of N-to-1 multiplexer
Figure 2.2 Shift-register type serializer
Figure 2.3 Single-stage type serializer
Figure 2.4 Conventional tree-type serializer
Figure 2.5 Serializer of CML
Figure 2.6 Circuit of 4-to-1 Single-Stage Type
Figure 2.7 Circuit of 2-to-1 MUX in tree-type serializer
Figure 3.1 The original and proposed tree-type multiplexers
Figure 3.2 4-to-1 Novel Tree-Type Serializer
Figure 3.3 8-to-1 Novel Tree-Type Serializer
Figure 3.4 Timing Diagram of The Proposed MUX
Figure 3.5 Architecture of The Novel Tree-Type Serializer with $2^N$ to 1
Figure 3.6 8-to-1 Single-Stage Serializer
Figure 3.7 Basic Inverter
Figure 3.8 Half Circuit of Single-Stage and Novel Tree-Type Serializer
Figure 3.9 Equivalent R,C Circuit of Serializer
Figure 3.10 Static DFF
Figure 3.11 DFF of Clock Generator
Figure 3.12 Design steps for single-stage serializer
Figure 3.13 Eye Diagram of Rise Time of 300ps for Single-Stage Serializer
Figure 3.14 Eye Diagram of Rise Time of 150ps for Single-Stage Serializer
Figure 3.15 Steps of Simulation of Novel Tree-Type Serializer
Figure 3.16 Eye Diagram of Rise Time of 300ps for Novel Tree-Type Serializer
Figure 3.17 Eye Diagram of Rise Time of 100ps for Novel Tree-Type Serializer
Figure 3.18 8-to-1 conventional tree-type architecture
Figure 3.19 Eye Diagram of Rise Time of 300ps for Conventional Tree-Type Serializer
Figure 3.20 Eye Diagram of Rise Time of 100ps for Conventional Tree-Type Serializer
Figure 3.21 Area vs. Rising Time
Figure 3.22 Power vs. Rising Time
Figure 3.23 Area vs. Rising Time
Figure 3.25 Power X Area vs. Rising Time of Two Structures
Figure 3.26 Analysis vs. Simulation
Figure 4.1 Whole Architecture of the Chip
Figure 4.2 Clock Diagram of 8-to-1 MUX with Propagation Delay
Figure 4.3 Proposed Novel Tree-Type Serializer
Figure 4.4 Circuit of Differential DFF
Figure 4.5 Structure of 8-to-1 MUX and Clock Gen
Figure 4.6 Data and Clock Diagram of 8-to-1 MUX with Delay
Figure 4.7 (a) CS Circuit with Load C (b) Small Signal Equivalent Circuit of (a) (c) CS Circuit with additional inductor (d) Small Signal Equivalent Circuit of (c)
Figure 4.8 Circuit of 4-to-1 MUX
Figure 4.9 Data and Clock Diagram of 4-to-1 MUX with Delay
Figure 4.10 Architecture of 32-to-1 MUX
Figure 4.11 Architecture of Multi-Stage Driver
Figure 4.12 Simulation Result of The Multi-Phase Generator
Figure 4.13 Simulation Result of 8-to-1 Serializer
Figure 4.14 Simulation Result of 4-to-1 Serializer
Figure 4.15 The Eye Diagram of 10Gbps Transmitter
Figure 4.16 The Effect of Ground Bounce to VDD and GND
Figure 4.17 Layout of MUX2 and DFF
Figure 4.18 Retiming and PRBS DFF
Figure 4.19 8-to-1 MUX
Figure 4.20 (a) Clock Generator for 8-to-1 MUX (b) 4-to-1 MUX (c) 5GHz to four phase 2.5GHz Clock divider
Figure 4.21 32-to-1 Serializer
Figure 4.22 Layout of Whole Chip (Without Dummy)
Figure 4.23 Layout of Whole Chip (With Dummy)
Figure 4.24 The Whole Measurement Environment
Figure 5.1 The Whole Chip Photo
Figure 5.2 The Core Photo
Figure 5.3 Structure of Four layers PCB
Figure 5.4 Photo of Off-Chip Measurement PCB
Figure 5.5 Environment of Off-Chip Measurement
Figure 5.6 (a) 1.25Gbps Data Eye Diagram (b) Reset
Figure 5.7 2.5Gbps Data Eye Diagram
Figure 5.8 Wenworth probe station
Figure 5.10 Probe Photo
Figure 5.11 (a) Eye Diagram 1 (b) Eye Diagram 2
Figure 6.1 (a) Microstrip (b) Stripline
Figure 6.2 On-Chip Channel Model Design Flow
Figure 6.3 (a) Structure of Driver (b) Impedance matching
Figure 6.4 Architecture of Pre-Driver


Chapter 1

Introduction

1.1 CMOS High-Speed Serial Links

High-speed serial links in the Gbps range have usually been implemented in bipolar or GaAs technologies, primarily because of the higher bandwidth of those devices. However, CMOS process technology has improved exponentially in recent years, resulting in a remarkable improvement in operating speed and integration level [1].

Figure 1.1 shows a conventional serial link system. It comprises three primary components: a transmitter, a channel, and a receiver. The high-speed data sent by the transmitter are analog signals. These signals, known as non-return-to-zero (NRZ) signals, use either a HIGH level or a LOW level to represent data bits. For an optical transmission system, these levels are different amounts of optical power. For electrical systems, these levels are different signal voltages or current pulses.


The serializer in the transmitter converts parallel bits into a serial bit stream. The timing information is embedded in this serial data. The output driver drives the signal from the serializer to the channel.

The channel is the medium of the data transmission system. There are many types of channels, such as unshielded twisted pairs, printed-circuit-board (PCB) transmission lines, chip packages, coaxial cables, and optical fibers. High-speed links use two main media: copper cables and optical fibers. The former is used for short-distance transmission and the latter for long-distance transmission. The most significant advantage of optical fibers is high bandwidth over long distances. The drawback is cost, since the optical fiber and the necessary components are expensive. The less expensive alternative for high-speed communication is copper cable, but the cable length limits the transmission bandwidth [2].

The receiver receives the analog signal and converts it back into binary data. It includes a front-end amplifier, a deskew buffer or a clock and data recovery (CDR) circuit, and a deserializer. To recover the signal from the transmitter, the analog waveform is amplified by the front-end amplifier. The data is resampled by the deskew buffer or the CDR. Finally, the serial data is sent into the deserializer, which converts the high-speed serial data into low-speed parallel data.

Figure 1.1 Conventional transceiver

In an advanced design, there are also a Pseudo Random Bit Sequence (PRBS) generator and verifier. Their function is to check the correctness of the data received by the receiver by comparing it with the data sent by the transmitter. This is a built-in self-test (BIST) system. A phase-locked loop (PLL) provides a clock source to both the transmitter and the receiver. CMOS high-speed serial links have been widely used in many applications, such as data transmission between multiple processors, communication within computers, and routers. There are also many standard specifications for CMOS high-speed serial links, such as Gigabit Ethernet, IEEE 1394, SONET, and Fiber Channel. Table 1.1 lists these standards.
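The PRBS-based self test described above can be sketched in a few lines of Python. The text does not state which PRBS polynomial is used, so this sketch assumes the common PRBS7 polynomial x^7 + x^6 + 1; the function names `prbs7` and `count_bit_errors` are illustrative only and do not come from the thesis.

```python
def prbs7(nbits, seed=0x7F):
    """Generate nbits of a PRBS7 sequence (assumed polynomial x^7 + x^6 + 1)."""
    if seed & 0x7F == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    state = seed & 0x7F
    bits = []
    for _ in range(nbits):
        # Feedback is the XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def count_bit_errors(sent, received):
    """Verifier side: compare the received bits with the expected pattern."""
    return sum(s != r for s, r in zip(sent, received))


tx = prbs7(254)   # two full periods of 127 bits
rx = list(tx)
rx[10] ^= 1       # inject a single bit error in the hypothetical channel
print(count_bit_errors(tx, rx))
```

On the receiver side, the verifier regenerates the same sequence and counts mismatches, which is the comparison that the BIST relies on.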

Table 1.1 High-Speed Communication Standard

| Standard | Data Rate |
|---|---|
| OC-12/STM-4 | 622.08Mbps |
| FC1063 | 1.0625Gbps |
| SATA | 1.5Gbps |
| OC-48/STM-16 | 2.48832Gbps |
| PCI-Express | 2.5Gbps |
| SATA2 | 3Gbps |
| XAUI | 3.125Gbps |
| 4G FC | 4.25Gbps |
| 8G FC | 8.5Gbps |
| OC-192 | 9.95328Gbps |
| 10GbE | 10.3125Gbps |
| Fiber Channel | 10.51875Gbps |
| G.709 | 10.66423Gbps |
| G.975 | 10.70923Gbps |
| OC-768 | 39.81Gbps |

1.2 Motivation

Advanced integrated circuit technologies are able to integrate multi-million gates into a single chip, and operating frequency and data throughput have increased significantly. Conventionally, parallel buses and serial links are the two approaches to high-speed signaling. With parallel buses, many bus lines are needed for the total data rate to meet the specification. The drawbacks of wide buses are increased power consumption and circuit area. The number of pads also increases, but the number of I/O pins cannot grow proportionally. As a result, high-speed serial I/O is needed to solve the communication bottleneck. PCI-Express and Serial ATA are two prominent examples. A serial link maximizes the communication bandwidth and distance of a single transmission line. Serial links offer a high-speed, low-cost solution for multi-gigabit-per-second rates over long distances; applications such as computer-to-computer or computer-to-peripheral interconnection can reach several meters. A key component is the serializer, which converts low-speed parallel data into a high-speed serial output stream.

In this thesis, a novel tree-type serializer circuit is proposed. We implement the transmitter architecture using non-return-to-zero (NRZ) signaling. A 10 Gbps novel tree-type serializer with an output driver and a PRBS (Pseudo Random Bit Sequence) generator has been designed. We also analyze the on-chip channel model and design a low-power driver for a channel of 1 cm length.

1.3 Thesis organization

The rest of the thesis is organized as follows.

In Chapter 2, we describe and analyze the conventional serializer structures. In Chapter 3, we introduce the proposed novel tree-type serializer architecture, analyze it, and compare it with the conventional architectures. The simulation results of the comparison are also shown.

In Chapter 4, the chip implementation is presented. We show the full architecture of the transmitter and the detailed circuit of each block. Finally, we present the simulation results, layout, and measurement considerations of the design.

In Chapter 5, the measurement results are presented, including off-chip measurement and on-wafer measurement with a probe station. The results include eye diagrams, peak-to-peak and RMS jitter, and power consumption.

In Chapter 6, we present an on-chip channel model analysis and a low-power driver.


Chapter 2

Background Study

2.1 Other Structure of Serializer

A serializer, also called a multiplexer or MUX, converts low-speed parallel input data into a high-speed serial output data stream. Figure 2.1 shows a conceptual block diagram of a serializer, an N-to-1 multiplexer. Di1 to Din are the n-bit low-speed parallel input data. Selected by ck1 to ckn, Di1, Di2, ..., Din are serialized into the high-speed output, DO, whose data rate is n times that of each Di. In many applications, the number of serializer inputs is a power of two, such as 2, 4, 8, or 16. Some systems, such as PCI-Express, encode the output data, so the number of serializer inputs may change. For example, 8B/10B encoding needs a 10-to-1 multiplexer.
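As a purely behavioral illustration of the N-to-1 multiplexing just described, the following Python sketch flattens n-bit parallel words into one serial stream and computes the resulting serial rate. The function names and the example values are assumptions made for illustration; they model only the bit ordering and the rate relation, not any circuit.

```python
def serialize(words, n):
    """Flatten n-bit parallel words into a serial bit list, input Di1 first."""
    stream = []
    for word in words:
        if len(word) != n:
            raise ValueError("each parallel word must carry exactly n bits")
        stream.extend(word)  # ck1..ckn select Di1..Din in turn
    return stream


def serial_rate(parallel_rate_bps, n):
    """The serial output runs n times faster than each parallel input."""
    return parallel_rate_bps * n


print(serialize([[1, 0, 1, 1], [0, 0, 1, 0]], 4))
print(serial_rate(312.5e6, 32))
```

With 32 inputs at 312.5 Mbps each, the rate relation gives the 10 Gbps line rate targeted in this thesis.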

There are three principal serializer structures: the shift-register type, the single-stage type, and the tree type. Their architectures are shown in Figures 2.2, 2.3, and 2.4. There are also other special architectures, such as the CML (Current Mode Logic) MUX shown in Figure 2.5. We explain these structures in the following sections.

Figure 2.1 Diagram of N-to-1 multiplexer (block diagram with inputs Di1 to Din, clocks ck1 to ckn, and output DO; timing of the parallel input data and clock and of the serial output data)

Figure 2.2 Shift-register type serializer

Figure 2.3 Single-stage type serializer

Figure 2.4 Conventional tree-type serializer (panels: 2 to 1 MUX cell; 2 to 1 MUX cell timing diagram)

Figure 2.5 Serializer of CML

2.2 Shift-Register Type Serializer

Figure 2.2 shows the shift-register type serializer. Its main functions are parallel load and serial shift, which operate at different frequencies. Parallel load works at the low data rate and uses CK2 as its clock: the parallel data inputs are loaded into D flip-flops (DFFs). Serial shift works at the high data rate and uses CK1 as its clock: the high-speed DFFs triggered by CK1 send the data out as a sequential stream. Once the data in the serial shift register has been sent out entirely, CK3 loads the data from the parallel load register into the serial shift register. CK1 has the highest clock rate and is divided to produce CK2 and CK3. The timing diagram of the clocks and data in Figure 2.2 illustrates this operation.

The shift-register type serializer is a straightforward implementation. It can process an arbitrary number of parallel data inputs by increasing the number of DFFs and adjusting the clock rate. The jitter is small with an ideal clock. However, there are several drawbacks. First, the maximum operating speed of this circuit is limited by the device performance [3]. According to [4], only 3 Gbps transmission can be achieved even with a 0.15 μm CMOS technology. Second, it needs an extremely high-speed, low-jitter global clock. The DFFs of the serial shift register work at the highest rate, which causes large power consumption.
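The parallel-load and serial-shift operation can be mimicked with a short behavioral model. This is only a sketch: it assumes a full-rate CK1 (one output bit per CK1 cycle) and a division ratio of n for CK2 and CK3, and the function names are hypothetical.

```python
def shift_register_serialize(words, n):
    """Behavioral model: parallel load at the slow clock, serial shift at CK1."""
    serial_out = []
    for word in words:
        if len(word) != n:
            raise ValueError("each word must have n bits")
        load_register = list(word)            # CK2: parallel load (slow clock)
        shift_register = list(load_register)  # CK3: copy into the shift register
        for _ in range(n):                    # CK1: one shift per serial bit
            serial_out.append(shift_register[0])
            shift_register = shift_register[1:] + [0]
    return serial_out


def clock_rates(data_rate_bps, n):
    """Assumed clocking: CK1 at the serial rate, CK2 and CK3 at CK1 / n."""
    slow = data_rate_bps / n
    return {"CK1": data_rate_bps, "CK2": slow, "CK3": slow}


print(shift_register_serialize([[1, 0, 0, 1, 1, 1, 0, 0]], 8))
print(clock_rates(2.5e9, 8))
```

The model makes the main drawback visible: every serial bit costs one cycle of the fastest clock, so the shift register itself must run at the full data rate.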


2.3 Single-Stage Type Serializer

Figure 2.6 Circuit of 4-to-1 Single-Stage Type

Figure 2.3 shows the structure of a single-stage type serializer, and Figure 2.6 shows its basic circuit diagram. The multiplexer needs input clocks with the same frequency as the parallel input data. As shown in Figure 2.3, a data bit is sent out when two specific clocks with different phases overlap. For example, d0 is transmitted while Φ0 and Φ1b overlap (both are 1). The data period of d0 runs from the positive edge of Φ0 to the negative edge of Φ1b. The other data bits are transmitted by the same rule.
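The overlap rule can be checked with a small Python sketch for the 8-to-1 case. It assumes eight clock phases with 50% duty cycle spaced 45 degrees apart, where phases 4 to 7 stand for the complements Φ0b to Φ3b; this spacing and the helper names are assumptions for illustration, not taken from the circuit in Figure 2.6.

```python
N_PHASES = 8  # Φ0..Φ3 and their complements Φ0b..Φ3b, assumed 45 degrees apart


def phase_high(k, slot):
    """Phase k is high for four consecutive slots (50% duty), starting at slot k."""
    return (slot - k) % N_PHASES < N_PHASES // 2


def selected_inputs(slot):
    """Input i is passed while phase i is high and phase i+1 is still low."""
    return [
        i
        for i in range(N_PHASES)
        if phase_high(i, slot) and not phase_high((i + 1) % N_PHASES, slot)
    ]


for slot in range(N_PHASES):
    print(slot, selected_inputs(slot))
```

Under these assumptions exactly one input is selected in each slot. For example, d0 is selected while Φ0 is high and Φ1 is low, which is the Φ0 and Φ1b overlap mentioned above.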

One more point about Figure 2.6 should be noted. Many papers use only an NMOS transistor as the data input device [5~10]. However, [11] and [12] show that adding an extra PMOS transistor at the data input is beneficial. When the data is low, the PMOS transistor turns on and drives current to precharge the internal node to a high level. In other words, this technique reduces the charge-sharing effect and alleviates data jitter.

To obtain a large output swing, the pull-up PMOS transistors must be sized weakly to reduce their driving capability. This lengthens the low-to-high transition time and unbalances the rising and falling times. To achieve higher speed, the output swing should be reduced. The dependence of output swing and delay time on the pull-up PMOS transistor size is analyzed in [10].

Basically, the single-stage serializer is a multiplexer controlled by the phases of a multi-phase low-speed clock. Its power consumption is small. It can also handle an arbitrary number of parallel data inputs, and it sends out one bit of data in each phase interval. The most significant drawback is the large self-parasitic capacitance at the output, which limits the bandwidth [1][9]. Furthermore, phase imbalance of the clock may also create jitter.

2.4 Conventional Tree-Type Serializer

Figure 2.4 shows an 8-to-1 tree-type serializer for high-speed applications. It is composed of three stages of 2-to-1 multiplexers organized as a tree. A high-speed clock, normally at half the data rate, is divided to control the successive stages. However, because the two inputs of each multiplexer need to be out of phase, a retiming mechanism is required [13~15]. Figure 2.4 shows the 2-to-1 MUX cell in detail. CK/2(0°) clocks the retiming DFFs. D0 is latched by one positive-edge-triggered DFF. D1 is latched by one positive-edge-triggered DFF and one negative-edge-triggered DFF. After retiming, D0 and D1 have a 180-degree phase shift. The two data streams are then sent into a 2-to-1 MUX, and CK/2(90°) selects the data out of the MUX. As the timing diagram of Figure 2.4 shows, using CK/2(90°) to select data from 1/4 to 3/4 of the data period ensures enough setup time and hold time.

The conventional tree-type serializer is able to operate at a high frequency because of its low output parasitic capacitance and its retiming mechanism. This architecture can only serialize a power-of-two number of parallel inputs, such as 2, 4, 8, or 16. It can achieve higher speed than a single-stage serializer. However, its hardware overhead and power consumption are higher.

Figure 2.5 and Figure 2.7 show conventional circuits of the 2-to-1 MUX block. Figure 2.5 shows a CML 2-to-1 MUX. It has a current-source NMOS transistor biased by Vb to supply the bias current. The select signal S and its inverse SN decide whether d1 or d2 is transmitted. As CMOS process technology has scaled rapidly in recent years, the supply voltage has been lowered, and CML is harder to implement because of its three stacked levels of NMOS transistors. The circuit in Figure 2.7 is much like a single-stage serializer and has lower parasitic capacitance at the output node. Figure 2.7(a) is a 2-to-1 single-stage circuit, and Figure 2.7(b) adds a PMOS transistor at the data input to reduce the charge-sharing effect, as described before.


Figure 2.7 Circuit of 2-to-1 MUX in tree-type serializer

Table 2.1 compares the three types of MUX. The advantage of the single-stage structure is that it can work with a ring-oscillator-type phase-locked loop (PLL), so the required clock rate is only 1/N of the transmission data rate.

The tree-type serializer is composed of multiple stages. This reduces the number of inputs in each stage and therefore the parasitic capacitance at the output node. For this reason, the bandwidth of the tree-type serializer is the highest among the three structures. Its shortcoming is the requirement for a higher clock rate.
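The multi-stage composition can be illustrated with a recursive Python sketch in which every 2-to-1 MUX interleaves two equal-rate streams. Timing and retiming are ignored, and the pairing of inputs (even-indexed inputs in one subtree, odd-indexed in the other) is an assumed ordering chosen so that the output comes out as D0, D1, D2, and so on.

```python
def mux2(a, b):
    """2-to-1 MUX: interleave two equal-rate streams, taking stream a first."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


def tree_serialize(lanes):
    """Serialize 2**k input lanes through k stages of 2-to-1 MUXes."""
    if len(lanes) == 1:
        return list(lanes[0])
    if len(lanes) % 2:
        raise ValueError("the number of inputs must be a power of two")
    # Even-indexed inputs feed one subtree, odd-indexed inputs the other.
    return mux2(tree_serialize(lanes[0::2]), tree_serialize(lanes[1::2]))


lanes = [[f"D{i}"] for i in range(8)]
print(tree_serialize(lanes))
```

Each MUX in the sketch has only two inputs regardless of the total width, which mirrors why the output node of the tree structure carries less parasitic capacitance than a single-stage N-to-1 MUX.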

Table 2.1 Comparison of three kinds of MUX

| | Shift-register | Single-stage | Tree |
|---|---|---|---|
| Multiplex number | N | N | $2^N$ |
| Power | High | Medium | Low |
| Bandwidth | Medium | Low | High |
| External clock property | High freq, single phase | Low freq, multi-phase | High freq, single phase |


Chapter 3

The Novel Tree-Type Serializer

3.1 Functional Blocks

In this chapter, we introduce a new serializer structure that consumes less power and area. First, we explain the 2-to-1 MUX and the control clock. Second, we show the configurations of the 4-to-1 and 8-to-1 MUX. Finally, we describe the design issues.

Figure 3.1 shows the conventional and the proposed novel tree-type 4-to-1 serializer (multiplexer) cells. The three retiming D-type flip-flops (DFFs) shown in Figure 2.4 are removed. Instead, quadrature clocks are used for the switch control in the previous stage. The first stage is controlled by the original clock, switching and outputting data at two times the clock rate. The second stage is controlled by two divide-by-two clocks with a phase difference of 90 degrees. As one can see, with quadrature clocks, retiming can be waived. Moreover, data is ready one half period before being switched in. Therefore, there is no data-dependent jitter, and the overall jitter is determined by the output control clock.
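The quadrature-clock idea can be sketched with an idealized 4-to-1 model that ignores propagation delay. The phase alignment used below is an assumption: the two input-side MUXes are switched by CK/2 phases one bit time (90 degrees) apart, chosen so that each input-side MUX changes its output exactly when the output MUX turns away from it. All names are hypothetical.

```python
TICKS_PER_UI = 4  # simulation resolution: four samples per output bit time


def novel_tree_4to1(d, tick):
    """Output of an idealized 4-to-1 quadrature-clocked MUX tree at one tick.

    d holds the four parallel bits [D0, D1, D2, D3]; delays are ignored.
    """
    ui = TICKS_PER_UI
    # Input-side MUX A (D0/D2) and MUX B (D1/D3) use two CK/2 phases that are
    # 90 degrees, i.e. one bit time, apart (assumed alignment).
    a_selects_d0 = (tick + ui) % (4 * ui) < 2 * ui
    b_selects_d1 = tick % (4 * ui) < 2 * ui
    a = d[0] if a_selects_d0 else d[2]
    b = d[1] if b_selects_d1 else d[3]
    # The output MUX is switched by CK, whose period is two bit times.
    ck_high = tick % (2 * ui) < ui
    return a if ck_high else b


data = ["D0", "D1", "D2", "D3"]
stream = [novel_tree_4to1(data, t) for t in range(4 * TICKS_PER_UI)]
print(stream[::TICKS_PER_UI])
```

In this model each input-side MUX output has already been stable for one bit time, which is half a CK period, when the output MUX selects it, so no retiming flip-flop is needed to align the data.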


Figure 3.1 also shows the timing diagram without propagation delay, in which there is no timing variation at the output. Figures 3.2 and 3.3 show the 4-to-1 and 8-to-1 MUX with quadrature clocks. The circuit structure is simple and regular. Without propagation delay, each stage of this serializer has a setup time of half the input clock period and no hold time.

Propagation delay is a design issue in chip implementation. Figure 3.4 shows the case that includes propagation delay. T1 is the delay of the clock divider, T2 is the delay of the MUX, and T3 is one bit time. As one can see, the setup-time margin is T3-T1-T2 and the hold-time margin is T1+T2. In general, they are more than enough for the MUX to operate reliably.
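The two margins follow directly from T1, T2, and T3, as the short sketch below shows. The delay values in the example are hypothetical placeholders chosen only to exercise the formulas; they are not simulation or measurement results from this work.

```python
def timing_margins(t1, t2, t3):
    """Return (setup margin, hold margin) for the proposed MUX.

    t1: clock-divider delay, t2: MUX delay, t3: one bit time (same unit).
    """
    setup_margin = t3 - t1 - t2
    hold_margin = t1 + t2
    return setup_margin, hold_margin


# Hypothetical example: 100 ps bit time (10 Gbps), 20 ps and 30 ps delays.
print(timing_margins(t1=20, t2=30, t3=100))
```

The two margins always add up to T3, so any delay added in the divider or the MUX moves margin from setup to hold rather than removing it.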

Figure 3.1 The original and proposed tree-type multiplexers.

Figure 3.5 shows the architecture of the novel tree-type serializer with $2^N$-to-1 multiplexing. The circuit structure is simple and regular. The novel tree-type serializer embeds data retiming in the previous stage of MUX. As a result, hardware overhead and power consumption are expected to be lower.

(Panels of Figure 3.1: original mux cell and timing; modified mux cell and timing.)

Figure 3.2 4-to-1 Novel Tree-Type Serializer

Figure 3.3 8-to-1 Novel Tree-Type Serializer

Figure 3.4 Timing Diagram of The Proposed MUX.

Figure 3.5 Architecture of The Novel Tree-Type Serializer with $2^N$ to 1.

3.2 Comparison of Three Structures

We compare the novel tree-type serializer with the single-stage and the conventional tree-type serializers in this section. In Section 3.2.1, we analyze the required number of PMOS transistors in the single-stage and novel tree-type serializers. This helps us understand their speed limitations and the difference in size between the two architectures when both work with the same transition time and boundary conditions. In Section 3.2.2, we compare the three architectures by HSPICE simulation. The three architectures are simulated with the same boundary conditions to ensure a fair comparison. In Section 3.2.3, we present the comparison results as figures and tables.

(Legend of Figure 3.4: T1 is the I-Q generator delay, T2 is the MUX delay, and T3 is half the CK period. Tsetup, the setup-time margin, is T3 - T1 - T2. Thold, the hold-time margin, is T3 - Tsetup.)

3.2.1 Analysis of Novel Tree-Type and Single-Stage Serializer

Considering the chip design described in Chapter 4, we need four 2.5 Gbps 8-to-1 serializers and one 10 Gbps 4-to-1 serializer. Therefore, the analysis and simulation focus on the 2.5 Gbps 8-to-1 serializer. Figure 3.6 shows the 8-to-1 single-stage serializer with dummy PMOS transistors, which alleviate the charge-sharing effect.
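The data rates implied by this partitioning can be worked out with a few lines of arithmetic. The sketch below walks backwards from the line rate through the 4-to-1 and 8-to-1 stages; the function name is illustrative only.

```python
def stage_input_rates(line_rate_bps, ratios):
    """Per-input data rate in front of each stage, from the output backwards."""
    rates = []
    rate = line_rate_bps
    for ratio in ratios:
        rate = rate / ratio  # an R-to-1 stage has inputs R times slower
        rates.append(rate)
    return rates


# 10 Gbps output, one 4-to-1 stage fed by four 8-to-1 serializers.
print(stage_input_rates(10e9, [4, 8]))
```

The result is 2.5 Gbps at the outputs of the 8-to-1 serializers and 312.5 Mbps at each of the 32 parallel inputs.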

Figure 3.6 8-to-1 Single-Stage Serializer.

We consider a basic inverter in TSMC 0.13 μm technology. The design rule for the smallest width is 0.3 μm. The basic inverter is shown in Figure 3.7.

Figure 3.7 Basic Inverter.

$$\left(\frac{W}{L}\right)_P = \frac{1.3\,\mu\text{m}}{0.13\,\mu\text{m}}, \qquad \left(\frac{W}{L}\right)_N = \frac{0.3\,\mu\text{m}}{0.13\,\mu\text{m}}$$

The averages of $C_d$ and $C_g$ of the PMOS and NMOS transistors of the inverter are as follows.

$$C_{dP\_avg} = 2.5714\ \text{fF}, \quad C_{dN\_avg} = 0.747\ \text{fF}, \quad C_{gP\_avg} = 1.9294\ \text{fF}, \quad C_{gN\_avg} = 0.486\ \text{fF}$$

Thus,

$$C_{dN} = 1.5\,C_{gN}, \quad C_{gP} = 4\,C_{gN}, \quad C_{dP} = 5.3\,C_{gN} \tag{1}$$

Figure 3.8 Half Circuit of Single-Stage and Novel Tree-Type Serializer (panels: MUX 8:1 using a single stage; MUX 2:1 using a single stage).

Figure 3.8 shows the half circuits of these two architectures. Since three stages of 2-to-1 MUX compose an 8-to-1 novel tree-type serializer, we only need to show the last 2-to-1 MUX, which dominates the output capacitance and bandwidth. The boundary conditions we assume are: (1) for $C_O$, we consider the drain capacitance of the PMOS transistor and the drain capacitance of the upper-level NMOS transistors; (2) the dummy transistor is not considered; (3) the swing in each architecture is from 0.25 V to 1.2 V.

Figure 3.9 Equivalent R,C Circuit of Serializer. In the figure, $R_{O\_P}$ is the output resistance of the PMOS, $R_{O\_N}$ is the output resistance of the NMOS, and $C_O$ is the output parasitic capacitance.

We calculate the delay time from the output resistance and capacitance of the serializer, using the equivalent RC circuit shown in Figure 3.9 to simplify the calculation. The calculation is as follows.

$$
\begin{aligned}
V_o &= V_{DD}\,\frac{\dfrac{R_{O\_N}}{1+sR_{O\_N}C_O}}{R_{O\_P}+\dfrac{R_{O\_N}}{1+sR_{O\_N}C_O}}
= V_{DD}\,\frac{R_{O\_N}}{R_{O\_P}+sR_{O\_N}R_{O\_P}C_O+R_{O\_N}} \\
&= V_{DD}\,\frac{R_{O\_N}}{sR_{O\_N}R_{O\_P}C_O+(R_{O\_N}+R_{O\_P})}
= V_{DD}\,\frac{1}{R_{O\_P}C_O}\cdot\frac{1}{s+\dfrac{R_{O\_N}+R_{O\_P}}{R_{O\_N}R_{O\_P}C_O}} \\
\Rightarrow\ V_o(t) &= V_{DD}\,\frac{1}{R_{O\_P}C_O}\,e^{-\frac{t\,(R_{O\_N}+R_{O\_P})}{R_{O\_N}R_{O\_P}C_O}} \\
\Rightarrow\ \text{Time\_delay} &= \frac{R_{O\_N}R_{O\_P}}{R_{O\_N}+R_{O\_P}}\,C_O
\end{aligned}
\tag{2}
$$

Now we calculate the output resistance and capacitance of each architecture and substitute the results into (2).

Assume:

$R_N$ is the equivalent resistance of the NMOS in a basic inverter
$C_{gN}$ is the equivalent gate capacitance of the NMOS in a basic inverter
$C_{gP}$ is the equivalent gate capacitance of the PMOS in a basic inverter
$C_{dN}$ is the equivalent drain capacitance of the NMOS in a basic inverter
$C_{dP}$ is the equivalent drain capacitance of the PMOS in a basic inverter
$m_{n8}$ is the number of parallel-connected NMOS in the 8-to-1 single-stage MUX
$m_{p8}$ is the number of parallel-connected PMOS in the 8-to-1 single-stage MUX
$m_{n2}$ is the number of parallel-connected NMOS in the 8-to-1 novel tree-type MUX
$m_{p2}$ is the number of parallel-connected PMOS in the 8-to-1 novel tree-type MUX
$C_L = 40$ fF is the output load capacitance. By (1), $C_L = 16(C_{gN}+C_{gP})$, which means a fanout of 16 inverters.

For a single-stage MUX:

$$m_{n8} = k_8\,m_{p8} \tag{3}$$

$$C_{out} = 8k_8\,m_{p8}\,C_{dN} + m_{p8}\,C_{dP} + 16\,(C_{gN}+C_{gP}) \tag{4}$$

$$R_{O\_N} = 3R_N\,\frac{1}{m_{n8}} = \frac{3}{k_8\,m_{p8}}\,R_N, \qquad R_{O\_P} = R_N\,\frac{1}{m_{p8}} \tag{5}$$

$$\text{Delay\_time} = \frac{24k_8}{3+k_8}\,R_N C_{dN} + \frac{3}{3+k_8}\,R_N C_{dP} + \frac{48\,R_N\,(C_{gN}+C_{gP})}{(3+k_8)\,m_{p8}} \tag{6}$$

For a novel tree-type MUX:

$$m_{n2} = k_2\,m_{p2} \tag{7}$$

$$C_{out} = 2k_2\,m_{p2}\,C_{dN} + m_{p2}\,C_{dP} + 16\,(C_{gN}+C_{gP}) \tag{8}$$

$$R_{O\_N} = 2R_N\,\frac{1}{m_{n2}} = \frac{2}{k_2\,m_{p2}}\,R_N, \qquad R_{O\_P} = R_N\,\frac{1}{m_{p2}} \tag{9}$$

$$\text{Delay\_time} = \frac{4k_2}{2+k_2}\,R_N C_{dN} + \frac{2}{2+k_2}\,R_N C_{dP} + \frac{32\,R_N\,(C_{gN}+C_{gP})}{(2+k_2)\,m_{p2}} \tag{10}$$

When (6) is equal to (10):

$$\frac{24k_8}{3+k_8}\,R_N C_{dN} + \frac{3}{3+k_8}\,R_N C_{dP} + \frac{48\,R_N\,(C_{gN}+C_{gP})}{(3+k_8)\,m_{p8}} = \frac{4k_2}{2+k_2}\,R_N C_{dN} + \frac{2}{2+k_2}\,R_N C_{dP} + \frac{32\,R_N\,(C_{gN}+C_{gP})}{(2+k_2)\,m_{p2}} \tag{11}$$

Multiplying both sides of the equal sign by $(3+k_8)\,m_{p8}\,(2+k_2)\,m_{p2}$ gives

$$
\begin{aligned}
&24k_8\,m_{p8}(2+k_2)\,m_{p2}\,R_N C_{dN} + 3\,m_{p8}(2+k_2)\,m_{p2}\,R_N C_{dP} + 48\,(2+k_2)\,m_{p2}\,R_N (C_{gN}+C_{gP}) \\
&\quad = 4k_2\,(3+k_8)\,m_{p8}\,m_{p2}\,R_N C_{dN} + 2\,(3+k_8)\,m_{p8}\,m_{p2}\,R_N C_{dP} + 32\,(3+k_8)\,m_{p8}\,R_N (C_{gN}+C_{gP})
\end{aligned}
\tag{12}
$$

From (1), $C_{dN} = 1.5\,C_{gN}$, $C_{gP} = 4\,C_{gN}$, and $C_{dP} = 5.3\,C_{gN}$. Then (12) becomes

$$
\begin{aligned}
&36k_8\,(2+k_2)\,m_{p8}\,m_{p2} + 15.9\,(2+k_2)\,m_{p8}\,m_{p2} + 240\,(2+k_2)\,m_{p2} \\
&\quad = 6k_2\,(3+k_8)\,m_{p8}\,m_{p2} + 10.6\,(3+k_8)\,m_{p8}\,m_{p2} + 160\,(3+k_8)\,m_{p8}
\end{aligned}
\tag{13}
$$

In (17), when m approximates infinite, p8 m is 1.232. This is because the p2 transition time in single-stage MUX will converge no mater how m increase. So p8

2 p

m converge to the significant calue. In (18), whenmp2 > 1.232 mp8 < 0. This implies if m is large than 1.232, it is impossible to find a solution for p2 m . p8 Because if the transition time of m is too short, the increasing of p2 m can not p8 achieve the same transition time of m . p2

3.2.2 Compare three architectures by HSPICE Simulation

In order to verify the low power and low area overhead advantages over the single-stage and conventional tree-type serializers, we design all three of them. We compare these three architectures in three ways (1) power consumption, (2) area overhead, (3) power area product. For solution by HSPICE, the boundary conditions are 0.25V) ~ (1.2V level input same The (5) . 11 . 3 igure shown in F generator of clock ) The same 4 ( . 10 . 3 in Figure DFF shown data skew ) The same 3 ( g time sin ri ) the same 2 ( fF 40 of C ) The same 1 ( L p2 p2 p2 p2 p8 p8 p8 p8 p2 p8 p8 p2 p2 p8 p8 p2 p8 p2 p8 p2 p2 p8 p2 p8 2 8 m : 1440 1168.95m 1560m m : m 1560 1168.95m 1440m : m m : m 1440m 1560m m 1168.95m 1440m m 95.4m m 243m 1560m m 103.35m m 1404m 4.5, k 6, k swing same the for Now, − − = ∧ + = = + ⇒ + + = + + = = p2 p2 p2 p2 p8 p8 p8 p8 p2 p8 m : 1440 1168.95m 1560m m : m 1560 1168.95m 1440m : m m : m − − = + =


Figure 3.10 Static DFF.

For every architecture, we simulate cases with rise times of 300 ps, 275 ps, 250 ps, 225 ps, 200 ps, 175 ps, 150 ps, 125 ps, and 100 ps. For the single-stage MUX only, we simulate additional cases with rise times of 170 ps, 165 ps, 160 ps, and 155 ps. These extra points make the simulation results more complete.

Figure 3.11 DFF of Clock Generator.

3.2.2.1 Single-Stage Serializer

We design and simulate this architecture, shown in Figure 3.6, according to the steps in Figure 3.12.

Step 1: Choose the size of the MUX to match the rising time spec.
Step 2: Choose the size of the Data Skew DFF to keep the rising time spec.
Step 3: Choose the size of the Clock Gen to keep the rising time spec.

Figure 3.12 Design steps for single-stage serializer.

By following the steps shown in Figure 3.12, we can optimize the simulation and ensure that each block consumes an appropriate amount of power. The sizes chosen in the design steps should conform to the TSMC design rules. In Step 1, we obtain the sizes of the PMOS and NMOS transistors that match the rise time specification by using the HSPICE command ".alter" to increase the size of the MOS transistors step by step. The results are shown in Table 3.1; sizes that match the rise time specification are boldfaced.

Table 3.1 Rise Time versus size of the MOS Transistors in Single-Stage Serializer.

(W/L)_p = 1.3/0.13, (W/L)_n = 0.3/0.13

mp     mn      Tr (ps)   |   mp     mn     Tr (ps)
1.0    6       302       |   12     72     157
1.1    6.6     292       |   13     78     157
1.2    7.2     276       |   14     84     157
1.3    7.8     265       |   15     90     155
1.4    8.4     257       |   16     96     154
1.5    9       250       |   17     102    154
1.6    9.6     245       |   18     108    154
1.7    10.2    240       |   19     114    153
1.8    10.8    234       |   20     120    152
1.9    11.4    228       |   21     126    150
2.0    12      223       |   22     132    149
3      18      198       |   30     180    148
4      24      181       |   40     240    147
5      30      175       |   50     300    146
6      36      169       |   60     360    146
7      42      166       |   70     420    145
8      48      164       |   80     480    145
9      54      162       |   90     540    145
10     60      161       |   100    600    145
11     66      157       |
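The size selection in Step 1 can be sketched as a lookup over the sweep results. The snippet below is a minimal illustration, assuming that "matching" a specification means picking the sweep point whose rise time is closest to the target; the text does not state the exact rule, and the function name `pick_size` and the reduced data set are hypothetical.

```python
# Subset of the Table 3.1 sweep: (m_p, m_n, rise time in ps).
# m_n = 6 * m_p for the single-stage serializer.
SWEEP = [
    (1.0, 6.0, 302), (1.1, 6.6, 292), (1.2, 7.2, 276), (1.3, 7.8, 265),
    (1.4, 8.4, 257), (1.5, 9.0, 250), (1.6, 9.6, 245), (1.7, 10.2, 240),
    (1.8, 10.8, 234), (1.9, 11.4, 228), (2.0, 12.0, 223), (3.0, 18.0, 198),
    (4.0, 24.0, 181), (5.0, 30.0, 175), (21.0, 126.0, 150),
]


def pick_size(target_ps, sweep=SWEEP):
    """Return the (m_p, m_n, Tr) row whose rise time is closest to target_ps.

    Assumed rule (not stated in the text): nearest rise time wins, and a tie
    goes to the smaller device because the sweep is sorted by increasing m_p.
    """
    return min(sweep, key=lambda row: abs(row[2] - target_ps))


for spec in (300, 275, 250, 225, 200, 175, 150):
    m_p, m_n, tr = pick_size(spec)
    print(f"spec {spec} ps -> m_p = {m_p}, m_n = {m_n}, simulated Tr = {tr} ps")
```

With this rule the 300 ps specification selects m_p = 1 and the 150 ps specification selects m_p = 21, which are the sizes used in the two worked cases of this section.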

We describe the details of the cases for 300ps and 150ps.

300 ps case :

Figure 3.13 Eye Diagram of Rise Time of 300ps for Single-Stage Serializer.

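Here, area refers to the total gate area.

$$\left(\frac{W}{L}\right)_P = \frac{1.3\,\mu m}{0.13\,\mu m},\qquad \left(\frac{W}{L}\right)_N = \frac{0.3\,\mu m}{0.13\,\mu m},\qquad m_P = 1,\qquad m_N = 6$$

$$\text{Power} = 1.3163\ \text{mW}$$

$$\text{Area} = (W_P \times \text{NO.} \times m_P + W_N \times \text{NO.} \times m_N)\times L = (1.3\times 2\times 1 + 0.3\times 64\times 6)\times 0.13 = 117.8\times 0.13\ \mu m^2$$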
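The gate-area expression for the single-stage serializer is easy to evaluate for any pair of multipliers. The following sketch simply encodes that expression; the function and constant names are illustrative and not taken from the thesis.

```python
W_P_UM = 1.3    # unit PMOS width in um
W_N_UM = 0.3    # unit NMOS width in um
L_UM = 0.13     # gate length in um
N_PMOS = 2      # number of PMOS devices in the area expression
N_NMOS = 64     # number of NMOS devices in the area expression


def single_stage_gate_area(m_p, m_n):
    """Total gate area in um^2: (W_P*NO.*m_P + W_N*NO.*m_N) * L."""
    width_sum_um = W_P_UM * N_PMOS * m_p + W_N_UM * N_NMOS * m_n
    return width_sum_um * L_UM


# 300 ps sizing (m_P = 1, m_N = 6): the summed width is 117.8 um.
print(single_stage_gate_area(1, 6) / L_UM)
```

Because the NMOS term carries 64 devices, the area grows almost linearly with m_N, which is why the single-stage area rises so steeply once large multipliers are needed.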


150 ps case:

Figure 3.14 Eye Diagram of Rise Time of 150ps for Single-Stage Serializer.

$$\left(\frac{W}{L}\right)_P = \frac{1.3\,\mu m}{0.13\,\mu m},\qquad \left(\frac{W}{L}\right)_N = \frac{0.3\,\mu m}{0.13\,\mu m},\qquad m_P = 21,\qquad m_N = 126$$

$$\text{Power} = 17.614\ \text{mW}$$

$$\text{Area} = (W_P \times \text{NO.} \times m_P + W_N \times \text{NO.} \times m_N)\times L = (1.3\times 2\times 21 + 0.3\times 64\times 126)\times 0.13 = 2473.8\times 0.13\ \mu m^2$$

3.2.2.2 Novel Tree-Type Serializer

We design and simulate this architecture according to Figure 3.15.

Step 1: Choose the size of the first stage MUX to match the rising time spec.
Step 2: Choose the size of the second stage MUX to match the rising time spec.
Step 3: Choose the size of the third stage MUX to match the rising time spec.
Step 4: Choose the size of the Data Skew DFF to keep the rising time spec.
Step 5: Choose the size of the Clock Gen to keep the rising time spec.

Figure 3.15 Design steps for Novel Tree-Type Serializer.

In Step 1, we obtain the size of the first-stage 2-to-1 serializer that matches the rising time specification by using the HSPICE command ".alter" to carefully increase the size of the MOS transistors. The results are shown in Table 3.2; sizes that match the rising time specification are boldfaced.

Table 3.2 Rise Time versus size of the MOS transistors in Novel Tree-Type Serializer.

(W/L)_p = 1.3/0.13, (W/L)_n = 0.3/0.13

mp      mn       Tr (ps)   |   mp     mn      Tr (ps)
0.6     2.7      320       |   2.0    9       130
0.65    2.925    297       |   3.0    13.5    106
0.7     3.15     280       |   3.7    16.65   100
0.75    3.375    261       |   4      18      95.9
0.8     3.6      248       |   5      22.5    90.9
0.85    3.825    235       |   6      27      89.1
0.9     4.05     225       |   7      31.5    85.0
0.95    4.275    216       |   8      36      84.6
1.0     4.5      202       |   9      40.5    82.7
1.05    4.725    190       |   10     45      81.5
1.1     4.95     184       |   11     49.5    81.0
1.2     5.4      172       |   12     54      81.0
1.3     5.85     165       |   13     58.5    80.5
1.4     6.3      161       |   14     63      80.2
1.5     6.75     154       |   15     67.5    79.4
1.6     7.2      149       |   16     72      78.7
1.7     7.65     142       |   17     76.5    78.8
1.8     8.1      139       |   18     81      78.3
1.9     8.55     134       |   19     85.5    78.0

We describe the 300 ps and 100 ps cases in detail as examples.

300 ps case:

$$\left(\frac{W}{L}\right)_P = \frac{1.3\,\mu m}{0.13\,\mu m},\qquad \left(\frac{W}{L}\right)_N = \frac{0.3\,\mu m}{0.13\,\mu m},\qquad m_P = 0.65,\qquad m_N = 2.925$$

$$\text{Power} = 1.3136\ \text{mW}$$

$$\text{Area} = \sum_{mux=1}^{7}(W_P \times \text{NO.} \times m_P + W_N \times \text{NO.} \times m_N)\times L = (1.3\times 2\times 0.65 + 0.3\times 12\times 2.925)\times 0.13 + (0.15\times 2\times 1 + 0.156\times 12\times 1)\times 0.13\times 2 + (0.15\times 2\times 1 + 0.156\times 12\times 1)\times 0.13\times 4 = 25.252\times 0.13\ \mu m^2$$

100 ps case:

Figure 3.17 Eye Diagram of Rise Time of 100ps for Novel Tree-Type Serializer.

$$\left(\frac{W}{L}\right)_P = \frac{1.3\,\mu m}{0.13\,\mu m},\qquad \left(\frac{W}{L}\right)_N = \frac{0.3\,\mu m}{0.13\,\mu m},\qquad m_P = 3.7,\qquad m_N = 16.65$$

$$\text{Power} = 2.7900\ \text{mW}$$

$$\text{Area} = \sum_{mux=1}^{7}(W_P \times \text{NO.} \times m_P + W_N \times \text{NO.} \times m_N)\times L = (1.3\times 2\times 3.7 + 0.3\times 12\times 16.65)\times 0.13 + (0.15\times 2\times 1 + 0.156\times 12\times 1)\times 0.13\times 2 + (0.15\times 2\times 1 + 0.156\times 12\times 1)\times 0.13\times 4 = 82.592\times 0.13\ \mu m^2$$

3.2.2.3 Conventional Tree-Type Serializer

The 8-to-1 conventional tree-type architecture is shown in Figure 3.18. We design and simulate this architecture according to Figure 3.15, following the same steps as for the novel tree-type MUX.

Figure 3.18 8-to-1 conventional tree-type architecture.

As before, we list the rising time versus the size of the MOS transistors in Table 3.3; sizes that match the rising time specification are boldfaced.

Table 3.3 Rise Time versus Size of the MOS Transistors in Conventional Tree-Type Serializer.

(W/L)_p = 1.3/0.13, (W/L)_n = 0.3/0.13

mp     mn       Tr (ps)   |   mp    mn      Tr (ps)
0.6    2.5      308       |   21    87.5    62.7
0.7    2.875    269       |   24    100     62.5
0.8    3.375    240       |   27    112.5   62.3
0.9    3.75     215       |   30    125     62.5
1.0    4.125    201       |   33    137.5   62.2
1.1    4.626    183       |   36    150     62.3
1.2    5.0      171       |   39    162.5   62.7
1.3    5.375    163       |   42    175     62.9
1.4    5.875    153       |   45    187.5   62.7
1.5    6.25     145       |   48    200     62.2
1.8    7.5      130       |   51    212.5   61.8
2.1    8.75     118       |   54    225     62
2.4    10       109       |   57    237.5   63.4
2.7    11.25    102       |   60    250     62.4
3.0    12.5     97.4      |   63    262.5   62.7
6.0    25       74.7      |   66    275     62.8
9.0    37.5     68.4      |   69    287.5   62.8
12     50       65.4      |   72    300     63.4
15     62.5     64        |   75    312.5   63.3
18     75       62.7      |

We describe the 300 ps and 100 ps cases in detail as examples.

300 ps case:

Figure 3.19 Eye Diagram of Rise Time of 300ps for Conventional Tree-Type Serializer.

$$\left(\frac{W}{L}\right)_P = \frac{1.3\,\mu m}{0.13\,\mu m},\qquad \left(\frac{W}{L}\right)_N = \frac{0.3\,\mu m}{0.13\,\mu m},\qquad m_P = 0.6,\qquad m_N = 2.5$$

$$\text{Power} = 3.2329\ \text{mW}$$

$$\text{Area} = \sum_{mux=1}^{7}(W_P \times \text{NO.} \times m_P + W_N \times \text{NO.} \times m_N)_{mux}\times L + 88(W_P + W_N)_{DFF}\times L = (1.3\times 2\times 0.6 + 0.3\times 12\times 2.5)\times 0.13 + (1.3\times 2\times 0.6 + 0.3\times 12\times 2.5)\times 0.13\times 0.2\times 6 + (88\times 1.3 + 88\times 0.3)\times 0.13\times 0.25\times 9 = 340.032\times 0.13\ \mu m^2$$

100 ps case:

Figure 3.20 Eye Diagram of Rise Time of 100ps for Conventional Tree-Type Serializer.

$$\left(\frac{W}{L}\right)_P = \frac{1.3\,\mu m}{0.13\,\mu m},\qquad \left(\frac{W}{L}\right)_N = \frac{0.3\,\mu m}{0.13\,\mu m},\qquad m_P = 2.7,\qquad m_N = 11.25$$

$$\text{Power} = 3.9895\ \text{mW}$$

$$\text{Area} = \sum_{mux=1}^{7}(W_P \times \text{NO.} \times m_P + W_N \times \text{NO.} \times m_N)_{mux}\times L + 88(W_P + W_N)_{DFF}\times L = (1.3\times 2\times 2.7 + 0.3\times 12\times 11.25)\times 0.13 + (1.3\times 2\times 2.7 + 0.3\times 12\times 11.25)\times 0.13\times 0.045\times 6 + (88\times 1.3 + 88\times 0.3)\times 0.13\times 0.25\times 9 = 377.150\times 0.13\ \mu m^2$$
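The area sums for the two tree-type serializers follow the same pattern: one final-stage 2-to-1 MUX sized by the multipliers, six smaller MUXes, and, for the conventional tree, a retiming-DFF term. The sketch below reproduces the arithmetic of the worked cases. Reading the factors 0.2 (or 0.045) times 6 and 0.25 times 9 as per-block scale factors is an interpretation of the printed numbers, and all function names are illustrative.

```python
L_UM = 0.13  # gate length in um


def mux2_width(w_p, w_n, m_p, m_n, n_pmos=2, n_nmos=12):
    """Summed gate width (um) of one 2-to-1 MUX: W_P*NO.*m_P + W_N*NO.*m_N."""
    return w_p * n_pmos * m_p + w_n * n_nmos * m_n


def novel_tree_area(m_p, m_n):
    """Novel tree: sized final-stage MUX plus 2 + 4 small MUXes (um^2)."""
    final = mux2_width(1.3, 0.3, m_p, m_n)
    small = mux2_width(0.15, 0.156, 1, 1)   # fixed small devices, multiplier 1
    return (final + 2 * small + 4 * small) * L_UM


def conventional_tree_area(m_p, m_n, scale):
    """Conventional tree: final MUX, six scaled MUXes, and the DFF term (um^2).

    `scale` is the factor applied to the six remaining MUXes in the worked
    cases (0.2 for 300 ps, 0.045 for 100 ps); treating it as a size ratio is
    an assumption.
    """
    final = mux2_width(1.3, 0.3, m_p, m_n)
    dff = 88 * (1.3 + 0.3) * 0.25 * 9
    return (final + final * scale * 6 + dff) * L_UM


print(novel_tree_area(0.65, 2.925) / L_UM)             # 300 ps, novel tree
print(conventional_tree_area(0.6, 2.5, 0.2) / L_UM)    # 300 ps, conventional tree
```

In the conventional tree the DFF term (316.8 x 0.13 um^2) dominates at both rise times, so removing the retiming flip-flops is what drives the area saving of the novel tree.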


3.2.3 The Comparison Results as Figures and Tables.

Figure 3.21 Area v.s. Rising Time.

Table 3.4 Power of three architectures versus Rising Time.

Rising Time   Single-Stage MUX (mW)   Conventional Tree (mW)   Novel Tree Type (mW)
300ps         1.3163                  3.2329                   1.3136
275ps         1.6726                  3.2618                   1.3328
250ps         1.8246                  3.2955                   1.3706
225ps         2.0895                  3.3722                   1.4089
200ps         2.2480                  3.3743                   1.4479
175ps         3.5854                  3.4486                   1.5257
150ps         17.614                  3.5068                   1.5934
125ps         X                       3.6624                   1.8391
100ps         X                       3.9895                   2.7900

Rising Time             170ps     165ps     160ps     155ps
Single-Stage MUX (mW)   6.0694    7.2889    8.6057    12.326

Table 3.5 Area of three architectures versus Rising Time (all values are multiplied by 0.13 um, giving um^2).

Rising Time   Single-Stage MUX   Conventional Tree   Novel Tree Type
300ps         117.8              340.032             25.252
275ps         141.36             342.051             26.12
250ps         176.7              343.78              28.072
225ps         235.6              345.946             29.88
200ps         353.4              347.872             31.832
175ps         589                350.592             35.52
150ps         2473               354.482             43.112
125ps         X                  361.786             50.56
100ps         X                  377.150             82.592

Rising Time        170ps   165ps   160ps   155ps
Single-Stage MUX

[Plot: Power (mW) versus Rising Time (ps) for Single Stage, Conventional Tree, and Novel Tree]

Figure 3.22 Power v.s. Rising Time.

[Plot: Area versus Rising Time for Single Stage, Conventional Tree, and Novel Tree]

Figure 3.23 Area v.s. Rising Time.

Table 3.6 Power X Area of three architectures versus Rising Time.

Rising Time   Single-Stage MUX   Conventional Tree   Novel Tree Type
300ps         20.1578            142.9076            4.3122
275ps         30.7370            145.0413            4.5257
250ps         41.9129            147.2805            5.0018
225ps         63.9972            151.6579            5.4727
200ps         103.2776           152.5972            5.9916
175ps         274.5341           157.1767            7.0451
150ps         5663.8698          161.6072            8.9303
125ps         X                  172.2507            12.0880
100ps         X                  195.6032            29.9561

Rising Time        170ps      165ps      160ps       155ps
Single-Stage MUX   557.6807   892.9777   1317.8769   2831.4055

[Plot: Power x Area versus Rising Time for Single Stage, Conventional Tree, and Novel Tree]

[Plot: Power x Area versus Rising Time for Conventional Tree and Novel Tree]

Figure 3.25 Power X Area v.s. Rising Time of Two Structures.

We use the data from Table 3.1, Table 3.2, and Table 3.3 to plot Figure 3.21. From this figure, we can identify the rising time limit of each architecture, that is, the point where the rising time no longer improves even though the area increases rapidly. The rising time limit of the single-stage MUX is 150 ps, that of the novel tree-type MUX is 80 ps, and that of the conventional tree-type MUX is 65 ps. These limits also indicate the bandwidth limit of each architecture: the bandwidth of the novel tree-type MUX is larger than that of the single-stage MUX and slightly less than that of the conventional tree-type MUX.

Table 3.4 shows the power versus rising time of the three architectures, and Figure 3.22 is plotted from the data of Table 3.4. Table 3.5 shows the area overhead versus rising time of the three architectures, and Figure 3.23 is plotted from the data of Table 3.5. Table 3.6 shows the power-area product versus rising time of the three architectures, and Figure 3.24 is plotted from the data of Table 3.6. Figure 3.25 shows the power-area comparison of the two tree structures only. Here, area refers to the total gate area.

Figure 3.22 and Figure 3.23 show that single-stage serializers can only go up to 6.5 Gbps. Beyond 5 Gbps, their power and area increase significantly. The conventional tree-type and the proposed tree-type serializers are able to reach 10 Gbps with relatively constant power and area overhead. These comparison results verify the low power and low area overhead advantages of the proposed serializer over the single-stage and conventional tree-type serializers.

[Plot: mp8 v.s. mp2, showing mp2 (Analysis) and mp2 (Simulation) versus mp8, with the analysis curve]

$$m_{p8} : m_{p2} = m_{p8} : \frac{1440\,m_{p8}}{1168.95\,m_{p8} + 1560}$$

Figure 3.26 Analysis vs. Simulation.

Table 3.7 Analysis vs. Simulation.

Rise time    Simulation mp8:mp2   Analysis mp8:mp2
Tr=300ps     1:0.65               1:0.53
Tr=275ps     1.2:0.7              1.2:0.58
Tr=250ps     1.5:0.8              1.5:0.65
Tr=225ps     2:0.9                2:0.74
Tr=200ps     3:1                  3:0.85
Tr=175ps     5:1.2                5:0.97
Tr=150ps     21:1.6               21:1.16

In the following, we compare the simulation results with the analysis of Eq 6. We arrange the results in Table 3.7 and Figure 3.26 and verify that the analysis in Section 3.2.1 matches the simulation.
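The analysis side of this comparison can be recomputed from the delay-matching relation of Section 3.2.1, a * m_p8 * m_p2 + b * m_p2 = c * m_p8. The sketch below rebuilds the three coefficients from k_8 = 6, k_2 = 4.5 and the capacitance ratios relative to C_gN (C_dN = 1.5, C_gP = 4, C_dP = 5.3), then evaluates m_p2 for each m_p8 in Table 3.7. Function names are illustrative.

```python
def coefficients(k8=6.0, k2=4.5, c_dn=1.5, c_gp=4.0, c_dp=5.3):
    """Return (a, b, c) with a*m_p8*m_p2 + b*m_p2 = c*m_p8.

    Capacitances are expressed in units of C_gN; R_N and C_gN cancel out.
    """
    c_gate = 1.0 + c_gp                                  # C_gN + C_gP
    single = (24 * k8 * c_dn + 3 * c_dp) * (2 + k2)      # single-stage m_p8*m_p2 terms
    tree = (4 * k2 * c_dn + 2 * c_dp) * (3 + k8)         # novel-tree m_p8*m_p2 terms
    a = single - tree
    b = 48 * c_gate * (2 + k2)
    c = 32 * c_gate * (3 + k8)
    return a, b, c


def m_p2_for(m_p8):
    """Novel-tree PMOS multiplier giving the same delay as a single-stage m_p8."""
    a, b, c = coefficients()
    return c * m_p8 / (a * m_p8 + b)


def m_p8_for(m_p2):
    """Inverse relation; returns None when no positive solution exists."""
    a, b, c = coefficients()
    denom = c - a * m_p2
    return b * m_p2 / denom if denom > 0 else None


for m_p8 in (1, 1.2, 1.5, 2, 3, 5, 21):
    print(m_p8, round(m_p2_for(m_p8), 2))
```

The coefficients evaluate to approximately (1168.95, 1560, 1440), and `m_p8_for` returns None once m_p2 exceeds 1440 / 1168.95, which is the 1.232 limit discussed in the analysis.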

3.3 Summary

In this chapter, we complete the analysis and comparison of the three architectures.

Table 3.8 shows the numerical power and area comparisons for five different rise times. Table 3.9 normalizes the performance using the conventional tree as the reference. As one can see, the proposed design consumes 0.43 of the power and occupies 0.09 of the area of the conventional tree at 5 Gbps (200 ps rise time). Taken together as a power-area product, it is 25.84 times better than the conventional one. At 10 Gbps (100 ps), the power and area ratios are 0.70 and 0.22, so the proposed design is 6.49 times better.

Table 3.8 Power and Area Comparisons (power in mW, area in um^2).

             Single-Stage MUX     Conventional Tree    Novel Tree
Rise Time    Power     Area       Power     Area       Power     Area
100ps        -         -          3.99      49.03      2.79      10.74
150ps        17.61     321.59     3.51      46.08      1.59      5.60
200ps        2.25      45.94      3.37      45.22      1.45      4.14
250ps        1.82      22.97      3.30      44.69      1.37      3.65
300ps        1.32      15.31      3.23      44.20      1.31      3.28

Table 3.9 Power and Area Comparisons Normalized to the Conventional Tree.

             Single-Stage MUX     Conventional Tree    Novel Tree
Rise Time    Power     Area       Power     Area       Power     Area
100ps        -         -          1         1          0.70      0.22
150ps        5.02      6.98       1         1          0.45      0.12
200ps        0.67      1.02       1         1          0.43      0.09
250ps        0.55      0.51       1         1          0.41      0.08
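The normalized ratios and the "times better" figures quoted in the summary can be recomputed from the Table 3.8 entries. The sketch below assumes that the improvement factor is the reciprocal of the product of the two-decimal ratios, which is the reading that reproduces 25.84 and 6.49; the names used here are illustrative.

```python
# (power in mW, area in um^2) per rise time in ps, copied from Table 3.8.
CONVENTIONAL = {100: (3.99, 49.03), 150: (3.51, 46.08), 200: (3.37, 45.22)}
NOVEL = {100: (2.79, 10.74), 150: (1.59, 5.60), 200: (1.45, 4.14)}


def normalized(rise_ps):
    """Novel-tree power and area relative to the conventional tree, 2 decimals."""
    p_ref, a_ref = CONVENTIONAL[rise_ps]
    p, a = NOVEL[rise_ps]
    return round(p / p_ref, 2), round(a / a_ref, 2)


def improvement(rise_ps):
    """Power-area improvement factor, computed from the rounded ratios."""
    p_ratio, a_ratio = normalized(rise_ps)
    return 1.0 / (p_ratio * a_ratio)


for rise in (200, 100):
    print(rise, normalized(rise), round(improvement(rise), 2))
```

Using the unrounded ratios instead would give slightly different improvement factors, so the rounding step matters when comparing against the quoted numbers.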


Chapter 4

Transmitter Circuit Design

4.1 Introduction

This chapter describes the detailed circuit design of the chip implementation. Note that a 5 GHz VCO (voltage-controlled oscillator) is difficult to implement in 0.13 um technology unless an LC-tank oscillator is used. Without a 5 GHz clock, the final stage is a 4-to-1 multiplexer, as will be shown later. The test chip contains only the serializer and a driver; there is no PLL on chip. Hence, the clock source is a 5 GHz clock, which is divided into a 4-phase 2.5 GHz clock to emulate the 2.5 GHz PLL.

4.2 Circuit Design

Figure 4.1 shows the overall architecture of the chip. It includes four 8-to-1 serializers, one 4-to-1 serializer, and a multi-stage driver. In this section, we describe the design of each block in detail and show its circuit.


Figure 4.1 Whole architecture of the chip

4.2.1 MUX 8-to-1

Figure 3.3 does not account for the propagation delay of each stage. This is, however, a real design issue, and we consider it here. In 0.13 um technology, a simple inverter with a fanout of four (FO4) has a propagation delay of 60 ps. Applying this delay to the 8-to-1 MUX with a 2.5 Gbps data rate, shown in Figure 3.4, gives the timing diagram in Figure 4.2. Pn[1] and Pn[1]b run at 1.25 GHz. Pn[2], Pn[2]b, Pn[3], and Pn[3]b pass through the first stage of the frequency divider and run at 625 MHz. Pn[4], Pn[4]b, Pn[5], Pn[5]b, Pn[6], Pn[6]b, Pn[7], and Pn[7]b pass through the second stage of the frequency divider and run at 312.5 MHz. Each of these clocks carries the 60 ps delay, and so does the serializer.

Figure 4.2 Clock Diagram of 8-to-1 MUX with Propagation Delay

Figure 4.3 shows the structure of the 8-to-1 MUX and the data skew DFFs. Conventionally, the data skew is implemented with a positive-edge-triggered and a negative-edge-triggered DFF, which requires twelve DFFs in an 8-to-1 MUX. However, there is a more efficient implementation. Following [14~15] [17~23], we use a Master-Slave-Master flip-flop (MSM-FF) to replace the positive- and negative-edge-triggered DFFs. The 90° phase shift between the inputs of the serializer is achieved by adding an MSM-FF (extra latch) to one path.

Figure 4.3 Proposed Novel Tree-Type Serializer

Figure 4.4 Circuit of Differential DFF


Figure 4.5 Structure of 8-to-1 MUX and Clock Gen

We use a new differential DFF, shown in Figure 4.4, for our clock generator and data skew DFF. This DFF has higher bandwidth and smaller area overhead than the original one shown in Figure 3.11, because it has fewer MOS transistors and less output node capacitance. The other reason for using this differential DFF is the requirement for 0°, 90°, 180°, and 270° clock phases.

Figure 4.5 shows the structure of the 8-to-1 serializer and the clock generator. The circuit of each 2-to-1 MUX is shown in Figure 2.7(b). The corresponding data and clock diagram of each node is shown in Figure 4.6. Like Figure 4.2, Figure 4.6 includes the propagation delay. The outputs of the third-stage 2-to-1 MUXes of the novel tree-type MUX are net1, net2, net3, and net4, which run at 625 Mbps. net1 multiplexes D1 and D5, net2 multiplexes D2 and D6, net3 multiplexes D3 and D7, and net4 multiplexes D4 and D8. The outputs of the second-stage 2-to-1 MUXes are net5 and net6, which run at 1.25 Gbps. net5 multiplexes net1 and net3, and net6 multiplexes net2 and net4. The output of the first-stage 2-to-1 MUX, out, runs at 2.5 Gbps and multiplexes net5 and net6.
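The pairing of inputs in the tree (D1 with D5, D2 with D6, and so on) is what makes the serial output come out in natural order. A small behavioral sketch shows this. It models each 2-to-1 MUX as an ideal interleaver and ignores clock phases and the 60 ps delay; the assumption that the first listed input of each MUX is selected first is ours, as are the function names.

```python
def mux2(a, b):
    """Ideal 2-to-1 MUX: interleave two equal-length streams a0, b0, a1, b1, ..."""
    out = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out


def tree_serialize_8(words):
    """Serialize a list of 8-element parallel words [D1..D8] through the tree."""
    lanes = [[w[i] for w in words] for i in range(8)]   # lanes[0] is D1 over time
    net1 = mux2(lanes[0], lanes[4])   # D1, D5
    net2 = mux2(lanes[1], lanes[5])   # D2, D6
    net3 = mux2(lanes[2], lanes[6])   # D3, D7
    net4 = mux2(lanes[3], lanes[7])   # D4, D8
    net5 = mux2(net1, net3)
    net6 = mux2(net2, net4)
    return mux2(net5, net6)


word = ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"]
print(tree_serialize_8([word]))
```

Each stage doubles the stream length per word, mirroring the 625 Mbps, 1.25 Gbps, and 2.5 Gbps rates of the three stages.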

Figure 4.6 Data and Clock Diagram of 8-to-1 MUX with Delay

4.2.2 MUX 4-to-1

Since the highest available input clock is the four-phase 2.5 GHz clock, we use a 4-to-1 single-stage serializer to convert the four 2.5 Gbps data streams into one 10 Gbps stream. From [25], we know that adding an inductor to the circuit can increase the bandwidth. This is called inductive peaking. The idea is to make the capacitance that limits the bandwidth resonate with the inductor. We describe the concept in detail as follows. Figure 4.7 shows two common-source stages, without and with inductive peaking. For a step input at Vi, the inductor in Figure 4.7(d) acts as an open circuit for the high-frequency components of the step transition. As a result, all of the current flows through the load C rather than through the resistor R. Thus, the output voltage changes faster in Figure 4.7(c) than in Figure 4.7(a). Applications are shown in [14~15][19][21][26~28].

As described above, inductive peaking can increase the bandwidth substantially, but the area overhead due to the inductor also increases rapidly. Thus, we overcome the low bandwidth of the single-stage type by using active inductive peaking [24].
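The bandwidth gain from inductive peaking can be illustrated with the passive load of Figure 4.7(c), where R in series with L is shunted by C. The sketch below is only a numerical illustration of that passive case, not of the active inductor used on the chip. The normalization x = wRC and m = L/(R^2 C) and the value m = 0.4 are assumptions chosen for the example.

```python
def gain_sq(x, m):
    """|Z(jw)/R|^2 for Z = (R + sL) || 1/(sC).

    x = w*R*C is the normalized frequency and m = L / (R^2 * C).
    Z/R = (1 + j*m*x) / (1 - m*x^2 + j*x).
    """
    return (1 + (m * x) ** 2) / ((1 - m * x * x) ** 2 + x * x)


def bandwidth_3db(m, step=1e-4, x_max=10.0):
    """First normalized frequency at which the response falls below -3 dB."""
    x = 0.0
    while x < x_max:
        if gain_sq(x, m) < 0.5:
            return x
        x += step
    return None


plain = bandwidth_3db(0.0)    # simple RC load
peaked = bandwidth_3db(0.4)   # with series inductor, m = 0.4 (illustrative)
print(f"bandwidth ratio = {peaked / plain:.2f}")
```

With m = 0 the function reduces to the single-pole RC response, so the ratio printed is the bandwidth extension attributable to the inductor alone.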

Figure 4.7 (a) CS Circuit with Load C (b) Small Signal Equivalent Circuit of (a) (c) CS Circuit with additional inductor (d) Small Signal Equivalent Circuit of (c)

Figure 4.8 Circuit of 4-to-1 MUX


Figure 4.9 Data and Clock Diagram of 4-to-1 MUX with Delay

The circuit of this 4-to-1 MUX is shown in Figure 4.8. We add an additional NMOS transistor as a current source at each output node to enhance the inductance of the active inductive peaking. Figure 4.9 shows the data and clock diagram of the 4-to-1 MUX with delay.

4.2.3 MUX 32-to-1

Figure 4.10 Architecture of 32-to-1 MUX

Figure 4.10 shows the overall circuit structure for the proposed 32-to-1 serializer for 10Gbps serial I/O. The module will be integrated into a 0.13um chip with an 8-phase 2.5GHz PLL.
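The data rates through the 32-to-1 hierarchy follow directly from the multiplexing ratios: three 2-to-1 stages inside each 8-to-1 serializer, followed by the 4-to-1 stage. The sketch below is a simple bookkeeping aid; the 312.5 Mbps input lane rate is inferred from the 625 Mbps rate after the first level of multiplexing, and the function name is illustrative.

```python
def serializer_rates(lane_rate_bps=312.5e6, stages=(2, 2, 2, 4)):
    """Return the per-line data rate before and after each multiplexing stage."""
    rates = [lane_rate_bps]
    for ratio in stages:
        rates.append(rates[-1] * ratio)
    return rates


for rate in serializer_rates():
    print(f"{rate / 1e9:.4f} Gbps")
```

The product of the stage ratios is 32, so 32 lanes at 312.5 Mbps map onto a single 10 Gbps output.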

4.2.4 Driver

For measurement purposes, we design a frequency divider that divides the 5 GHz input clock into four-phase 2.5 GHz clocks, and a multi-stage driver that drives the signal from the 32-to-1 serializer. The circuit diagram is shown in Figure 4.11. The current mode logic (CML) driver is a conventional choice in driver design [18] [20] [26~27] [29~30], and the design technique is described in [31]. This architecture has good immunity to simultaneous switching noise (SSN).

Figure 4.11 Architecture of Multi-Stage Driver

4.3 Simulation Result

Figure 4.12 shows the simulation result of the multi-phase generator output. It generates Pn[1] at 1.25 GHz, Pn[2] and Pn[3] at 625 MHz, and Pn[4], Pn[5], Pn[6], and Pn[7] at 312.5 MHz. These clocks drive the 8-to-1 serializer, and the simulation result matches Figure 4.2. Figure 4.13 shows the simulation of the 8-to-1 serializer. It includes the multi-phase clocks; net1, net2, net3, and net4 with 625 Mbps data; net5 and net6 with 1.25 Gbps data; and out with 2.5 Gbps data. This result matches Figure 4.6. Figure 4.14 shows the simulation result of the 4-to-1 serializer, which serializes the four 2.5 Gbps data streams from the four 8-to-1 serializers into one 10 Gbps stream.

Figure 4.15 shows the eye diagram of the data after the 32-to-1 MUX and the multi-stage driver. The data rate is 10 Gbps, the output swing is 300 mV, and the jitter is 3.66 ps. Table 4.1 lists the power consumption of each part of the circuit. The total power consumption is 27.06 mW. Figure 4.16 shows the effect of ground bounce on VDD and GND. The peak-to-peak noise is 40 mV.

Figure 4.12 Simulation Result of The Multi-Phase Generator

Figure 4.13 Simulation Result of 8-to-1 Serializer.

Figure 4.14 Simulation Result of 4-to-1 Serializer.

Figure 4.15 The Eye Diagram of 10Gbps Transmitter

Table 4.1 The Power Consumption of Each Part of Circuit.

Module                           Current (mA)   Power (mW)
8-1 MUX cell (4X) + 4-1 MUX      2.7            3.24
4-phase 2.5GHz                   3.65           4.38
32bit_PRBS + multi-phase Gen     4.9            5.88
Driver                           11.3           13.56
Total                            22.55          27.06
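The power column of Table 4.1 is the supply current of each block multiplied by the 1.2 V supply. A short check of that arithmetic, with illustrative names, is sketched below.

```python
VDD = 1.2  # supply voltage in volts

# Supply current per block in mA, copied from Table 4.1.
CURRENT_MA = {
    "8-1 MUX cell (4X) + 4-1 MUX": 2.7,
    "4-phase 2.5GHz": 3.65,
    "32bit_PRBS + multi-phase Gen": 4.9,
    "Driver": 11.3,
}


def power_mw(current_ma, vdd=VDD):
    """Power in mW for a block drawing current_ma from the supply."""
    return current_ma * vdd


def total_power_mw():
    """Sum of the per-block powers in mW."""
    return sum(power_mw(i) for i in CURRENT_MA.values())


for name, i_ma in CURRENT_MA.items():
    print(f"{name}: {power_mw(i_ma):.2f} mW")
print(f"Total: {total_power_mw():.2f} mW")
```

The driver accounts for about half of the total, so the serializer core itself is a small part of the 27.06 mW budget.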


Figure 4.16 The Effect of Ground Bounce to VDD and GND

4.4 Implementations

The chip has been implemented in the TSMC 0.13 um 2P8M CMOS process. It contains a 32-to-1 serializer, a 10 Gbps driver, and a 32-bit PRBS (pseudo-random bit sequence) generator. The layout diagrams are shown in Figure 4.17 to Figure 4.23. The core area of the serializer is only 200 um x 150 um, the driver area is 360 um x 110 um, and the total chip area is 1.14 mm x 0.99 mm.

Figure 4.17 Layout of MUX2 and DFF


Figure 4.18 Retiming and PRBS DFF

Figure 4.19 8-to-1 MUX

Figure 4.20 (a) Clock Generator for 8-to-1 MUX (b) 4-to-1 MUX (c) 5GHz to four-phase 2.5GHz Clock divider

Figure 4.21 32-to-1 Serializer

Figure 4.22 Layout of Whole Chip (Without Dummy)
