Chapter 1 Introduction
bottleneck. PCI-Express and Serial ATA are two prominent examples. A serial transmission link maximizes the communication bandwidth and distance over a single transmission line. Serial links offer a high-speed, low-cost solution for multi-gigabit-per-second data rates over long distances. Applications such as computer-to-computer or computer-to-peripheral interconnection can reach several meters. A key component is the serializer, which converts low-speed parallel data into a high-speed serial output stream.
In this thesis, a novel tree-type serializer circuit is proposed. We implement the transmitter architecture using non-return-to-zero (NRZ) signaling. A 10 Gbps novel tree-type serializer with an output driver and a PRBS (Pseudo Random Bit Sequence) generator has been designed. We also analyze the on-chip channel model and design a low-power driver for a channel of 1 cm length.
1.3 Thesis organization
The rest of the thesis is organized as follows.
In Chapter 2, we describe and analyze the conventional serializer structures. In Chapter 3, we introduce the proposed novel tree-type serializer architecture and compare it with the conventional architectures. The simulation results of the comparison are also shown.
In Chapter 4, the chip implementation is presented. We show the full architecture of the transmitter and the detailed circuit of each block. Finally, we present the simulation results, layout, and measurement considerations of the design.
In Chapter 5, the measurement results are presented. They include off-chip measurement and on-wafer measurement with a probe station. The results include eye diagrams, jitter (peak-to-peak and RMS), and power consumption.
In Chapter 6, we show an on-chip channel model analysis and a low power driver.
The research is concluded in Chapter 7.
Chapter 2 Background Study
Chapter 2
Background Study
2.1 Other Structure of Serializer
A serializer, also called a multiplexer or MUX, converts parallel low-speed input data into a serial high-speed output data stream. Figure 2.1 shows a conceptual block diagram of a serializer, an N-to-1 multiplexer. Di1 to Din are the n-bit parallel low-speed input data. Selected by ck1 to ckn, Di1 through Din are serialized into the high-speed output DO, whose data rate is n times that of Di. In many applications, the number of serializer inputs is a power of two, such as 2, 4, 8, or 16. Some systems, such as PCI-Express, encode the output data, so the number of serializer inputs may differ. For example, 8B/10B encoding needs a 10-to-1 multiplexer.
There are three principal serializer structures: the shift-register type, the single-stage type, and the tree type. Their architectures are shown in Figures 2.2, 2.3, and 2.4. There are other special architectures, such as the CML (Current Mode Logic) MUX shown in Figure 2.5. We explain the structures in the following sections.
Figure 2.1 Diagram of N-to-1 multiplexer
[Figure residue: tree of 2-to-1 MUX cells with inputs D0 to D7, and the 2-to-1 MUX cell timing diagram]
Figure 2.5 Serializer of CML
2.2 Shift-Register Type Serializer
Figure 2.2 shows the shift-register type serializer. Its main functions are parallel load and serial shift, which operate at different frequencies.
Parallel load operates at the low data rate and uses CK2 as its clock: the parallel input data are loaded into D flip-flops (DFFs). Serial shift operates at the high data rate and uses CK1 as its clock: the high-speed DFFs triggered by CK1 send the data out as a sequential stream. Once the data in the serial shift register have been sent out entirely, CK3 loads the data from the parallel load register into the serial shift register. CK1 has the highest clock rate.
It is divided to produce CK2 and CK3. The timing diagram of the clock and data in Figure 2.2 illustrates this operation.
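The parallel-load and serial-shift behavior described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the function name `shift_register_serializer` is hypothetical, the first element of each word is assumed to leave first, and the three clocks are represented by loop structure rather than by real timing.

```python
from typing import Iterable, Iterator, List


def shift_register_serializer(words: Iterable[List[int]], n: int) -> Iterator[int]:
    """Behavioral model of an n-to-1 shift-register serializer (hypothetical helper).

    Each outer iteration stands for one low-speed load (CK2/CK3 domain);
    each inner iteration stands for one high-speed CK1 cycle.
    """
    for word in words:
        if len(word) != n:
            raise ValueError(f"expected {n} parallel bits, got {len(word)}")
        # CK3 event: copy the parallel load register into the serial shift register.
        shift_reg = list(word)
        # CK1 events: n high-speed shift cycles empty the shift register.
        for _ in range(n):
            # Assumption: the bit at index 0 sits at the output end and leaves first.
            yield shift_reg.pop(0)
```

For example, `list(shift_register_serializer([[1, 0, 1, 1], [0, 0, 1, 0]], 4))` yields the eight bits of the two words back to back, which shows why the serial clock must run n times faster than the load clock.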
The shift-register type serializer is a straightforward implementation. It can process an arbitrary number of parallel data inputs by increasing the number of DFFs and adjusting the clock rate. The jitter is small with an ideal clock. However, there are several drawbacks. First, the maximum operating speed of this circuit is limited by the device performance [3]. According to [4], only 3 Gbps transmission can be achieved even with 0.15 μm CMOS technology. Second, it needs an extremely high-speed, low-jitter global clock. The DFFs of the serial shift register work at the highest rate, which causes large power consumption.
2.3 Single-Stage Type Serializer
Figure 2.6 Circuit of 4-to-1 Single-Stage Type
Figure 2.3 shows the structure of a single-stage type serializer, and Figure 2.6 shows its basic circuit diagram. The multiplexer requires clocks with the same frequency as the parallel input data. As shown in Figure 2.3, data is sent out when two specific clocks with different phases overlap. For example, d0 is transmitted when Φ0 and Φ1b overlap (both are 1). The data period of d0 lasts from the positive edge of Φ0 to the negative edge of Φ1b. The other data are transmitted by the same rule.
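The overlap rule can be checked with a short behavioral sketch. It assumes, beyond what the text states, that each phase Φk is a 50% duty-cycle clock whose period equals n bit slots and that adjacent phases are spaced by exactly one bit slot. The names `phase` and `single_stage_output` are hypothetical.

```python
def phase(k: int, slot: int, n: int) -> int:
    """Phi_k at a given bit slot: assumed 50% duty clock, period n slots, delayed by k slots."""
    return 1 if (slot - k) % n < n // 2 else 0


def single_stage_output(data, slot: int):
    """Return the input selected at this bit slot by the Phi_k AND Phi_(k+1)b rule."""
    n = len(data)
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even number of inputs")
    selected = [
        data[k]
        for k in range(n)
        # d_k is transmitted while Phi_k is 1 and Phi_(k+1) is 0 (Phi_(k+1)b is 1).
        if phase(k, slot, n) and not phase((k + 1) % n, slot, n)
    ]
    if len(selected) != 1:
        raise RuntimeError("phases overlap incorrectly: exactly one input must be selected")
    return selected[0]
```

Under these assumptions exactly one input is enabled in each bit slot, so four inputs stepped over eight slots come out as d0, d1, d2, d3, d0, d1, d2, d3. Any phase imbalance would shrink or stretch individual slots, which is one way to picture the jitter sensitivity of this structure.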
One more point about Figure 2.6 should be noted. Many papers use only an NMOS transistor as the data input device [5~10]. However, [11] and [12] show that adding an extra PMOS transistor at the data input has a benefit. When the data is low, the PMOS transistor turns on and drives current to precharge the internal node to a high level. In other words, this technique reduces the charge sharing effect and alleviates data jitter.
To obtain a large output swing, the pull-up PMOS transistors must be weakly sized to reduce their driving capability. This lengthens the low-to-high transition time and unbalances the rising and falling times. To achieve higher speed, the output swing should be reduced. The dependence of output swing and delay time on the pull-up PMOS transistor size is analyzed in [10].
Basically, it is a multiplexer controlled by the phases of a multi-phase low-speed clock. The power consumption is small. This serializer can also handle an arbitrary number of parallel data inputs. It sends out one bit of data in each phase interval. The most significant drawback is the large parasitic capacitance at the output, which limits the bandwidth [1][9]. Furthermore, phase imbalance of the clock may also create jitter.
2.4 Conventional Tree-Type Serializer
Figure 2.4 shows an 8-to-1 tree-type serializer for high-speed applications. It is composed of three stages of 2-to-1 multiplexers organized as a tree. A high-speed clock, normally at half the data rate, is divided to control the successive stages.
However, because the two inputs need to be out of phase, a retiming mechanism is required [13~15]. Figure 2.4 describes the 2-to-1 MUX in detail. We use CK/2(0) to clock the retiming DFFs. D0 is latched by one positive-edge-triggered DFF. D1 is latched by one positive-edge-triggered DFF and one negative-edge-triggered DFF. After retiming, D0 and D1 have a 180-degree phase shift. The two data streams are then sent into a 2-to-1 MUX, and CK/2(90) selects the data out of the MUX. As the timing diagram of Figure 2.4 shows, using CK/2(90) to select data from 1/4 to 3/4 of the data period ensures enough setup time and hold time.
The conventional tree-type serializer can operate at a high frequency because of its low output parasitic capacitance and its retiming mechanism. This architecture can only serialize a power-of-two number of parallel inputs, such as 2, 4, 8, or 16. It can achieve a higher speed than a single-stage serializer. However, its hardware overhead and power consumption are higher.
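A purely behavioral sketch can illustrate how a tree of 2-to-1 multiplexers interleaves its inputs and why the structure is tied to powers of two. It ignores retiming and all circuit-level timing, and the helper names (`mux2`, `tree_serialize`, `bit_reversed_order`) are hypothetical. One consequence of pairwise interleaving is that the leaves must be wired in bit-reversed order for the bits to leave in natural order; whether the thesis figure wires the inputs this way is not stated in the text, so treat that ordering as an assumption of the sketch.

```python
def mux2(a, b):
    """Interleave two equal-rate streams: a[0], b[0], a[1], b[1], ..."""
    out = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out


def tree_serialize(leaves):
    """Serialize 2**k input streams with k stages of 2-to-1 multiplexers."""
    streams = [list(s) for s in leaves]
    n = len(streams)
    if n < 2 or n & (n - 1):
        raise ValueError("a tree of 2-to-1 MUX cells needs a power-of-two input count")
    while len(streams) > 1:
        # Each stage halves the number of streams and doubles their data rate.
        streams = [mux2(streams[i], streams[i + 1]) for i in range(0, len(streams), 2)]
    return streams[0]


def bit_reversed_order(n: int):
    """Leaf wiring (assumed) that makes the serial output come out as D0, D1, ..., D(n-1)."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
```

With eight inputs, `bit_reversed_order(8)` gives the leaf order D0, D4, D2, D6, D1, D5, D3, D7. Feeding the leaves in that order returns the word in natural order, while feeding them in natural order returns the same permuted sequence at the output.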
Figure 2.5 and Figure 2.7 show conventional circuits of the 2-to-1 MUX block.
Figure 2.5 shows a CML 2-to-1 MUX. It has a current-source NMOS transistor biased by Vb to supply the bias current. The select signal S and its inverse SN decide whether d1 or d2 is transmitted. As CMOS process technology has scaled quickly in recent years, the supply voltage has been lowered. CML is harder to implement because of its three stacked levels of NMOS transistors. The circuit in Figure 2.7 is much like a single-stage serializer and has a lower parasitic capacitance at the output node. Figure 2.7(a) is a 2-to-1 single-stage circuit, and Figure 2.7(b) adds a PMOS transistor at the data input to reduce the charge sharing effect, as described before.
Figure 2.7 Circuit of 2-to-1 MUX in tree-type serializer
Table 2.1 compares the three types of MUX. The advantage of the single-stage type is that it can work with a ring-oscillator-type phase-locked loop (PLL). This means that the required clock rate is 1/N of the transmission data rate.
The tree-type serializer is composed of multiple stages. This reduces the number of inputs in each stage as well as the parasitic capacitance at the output node.
For this reason, the bandwidth of the tree-type serializer is the highest among the three structures. Its shortcoming is the requirement of a higher clock rate.
Table 2.1 Comparison of three kinds of MUX
Property         Shift-register type       Single-stage type       Tree-type
External clock   High freq, single phase   Low freq, multi-phase   High freq, single phase
Bandwidth
Chapter 4 Transmitter Circuit Design
Chapter 3
The Novel Tree-Type Serializer
3.1 Functional Blocks
In this chapter, we introduce a new serializer structure that consumes less power and area. First, we explain the 2-to-1 MUX and the control clock. Second, we show the configuration of the 4-to-1 and 8-to-1 MUX. Finally, we describe the design issues. Figure 3.1 shows the conventional and the proposed novel tree-type 4-to-1 serializer (multiplexer) cells. The three retiming D-type flip-flops (DFFs) shown in Figure 2.4 are removed. Instead, quadrature clocks are used for the switch control in the previous stage. The first stage is controlled by the original clock to switch and output data at two times the clock rate. The second stage is controlled by two divide-by-two clocks with a phase difference of 90 degrees. With quadrature clocks, retiming can be waived. Moreover, data is ready one half period before being switched in. Therefore, there is no data-dependent jitter. The overall jitter is determined by the output control clock.
Figure 3.1 also shows the timing diagram without propagation delay. In this ideal case, there is no timing variation at the output. Figure 3.2 and Figure 3.3 show the 4-to-1 and 8-to-1 MUX with quadrature clocks. The circuit structure is simple and regular.
Without propagation delay, each stage of this serializer has a setup time of half the input clock period and no hold time.
The propagation delay is a design issue in chip implementation. Figure 3.4 shows the case that considers the propagation delay. T1 is the delay of the clock divider (I-Q generator); T2 is the delay of the MUX; T3 is one bit time (half the CK period). The setup time margin is T3-T1-T2, and the hold time margin is T1+T2. In general, they are more than enough for the MUX to operate reliably.
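The two margin expressions are simple enough to evaluate directly. The sketch below only restates them; the function name `timing_margins` and the example delay values are hypothetical and are not taken from the design.

```python
def timing_margins(t1_ps: float, t2_ps: float, t3_ps: float):
    """Return (setup margin, hold margin) in ps.

    t1_ps: clock divider (I-Q generator) delay
    t2_ps: MUX delay
    t3_ps: one bit time (half the CK period)
    """
    setup_margin = t3_ps - t1_ps - t2_ps   # Tsetup = T3 - T1 - T2
    hold_margin = t1_ps + t2_ps            # Thold  = T1 + T2 = T3 - Tsetup
    return setup_margin, hold_margin
```

As an illustration with made-up delays, `timing_margins(20, 30, 100)` returns a 50 ps setup margin and a 50 ps hold margin for a 100 ps bit time. The two margins always sum to T3, so any delay added to the divider or the MUX moves margin from setup to hold.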
Figure 3.1 The original and proposed tree-type multiplexers.
Figure 3.5 shows the architecture of the novel tree-type serializer with 2^N-to-1 multiplexing. The circuit structure is simple and regular. The novel tree-type serializer embeds data retiming in the previous stage of MUX. Because of this, hardware overhead and power consumption are expected to be lower.
Figure 3.2 4-to-1 Novel Tree-Type Serializer
Figure 3.3 8-to-1 Novel Tree-Type Serializer
Figure 3.4 Timing Diagram of The Proposed MUX.
Figure 3.5 Architecture of The Novel Tree-Type Serializer with 2^N to 1.
3.2 Comparison of Three Structures
We compare our novel tree-type serializer with the single-stage and the conventional tree-type serializers in this section. In Section 3.2.1, we analyze the required number of PMOS transistors in the single-stage and novel tree-type serializers. This helps us understand their speed limitations. It also shows the difference in size between the two architectures when both operate with the same transition time and boundary conditions. In Section 3.2.2, we compare the three architectures by HSPICE simulation. The three architectures are simulated under the same boundary conditions to ensure a fair comparison.
In Section 3.2.3, we show the comparison results as figures and tables.
3.2.1 Analysis of Novel Tree-Type and Single-Stage Serializer
Considering the chip design issues described in Chapter 4, we need four 2.5 Gbps 8-to-1 serializers and one 10 Gbps 4-to-1 serializer. Therefore, the analysis and simulation focus on the 2.5 Gbps 8-to-1 serializer. Figure 3.6 shows the 8-to-1 single-stage serializer with dummy PMOS transistors, which alleviate the charge sharing effect.
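The rates quoted above follow from simple division, which the sketch below spells out. It assumes that the four 8-to-1 serializers feed the single 4-to-1 serializer, which is one reading of the sentence rather than something the text states outright. The function name `lane_plan` is hypothetical.

```python
def lane_plan(line_rate_gbps: float, lanes: int, lane_ratio: int) -> dict:
    """Rate bookkeeping for a two-level serializer (assumed structure).

    line_rate_gbps: serial output rate of the final lanes-to-1 serializer
    lanes:          number of inputs of the final serializer
    lane_ratio:     number of inputs of each first-level serializer
    """
    lane_rate_gbps = line_rate_gbps / lanes
    return {
        "lane_rate_gbps": lane_rate_gbps,
        "parallel_inputs": lanes * lane_ratio,
        "input_rate_mbps": lane_rate_gbps * 1000 / lane_ratio,
    }
```

With `lane_plan(10, 4, 8)`, each first-level serializer must deliver 2.5 Gbps, which matches the 2.5 Gbps 8-to-1 block analyzed in this section. Under the assumed structure the transmitter would take 32 parallel inputs at 312.5 Mbps each.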
Figure 3.6 8-to-1 Single-Stage Serializer.
We consider a basic inverter in TSMC 0.13 μm technology. The design rule for the smallest width is 0.3 μm. The basic inverter is shown in Figure 3.7.
Figure 3.7 Basic Inverter.
The average $C_d$ and $C_g$ of the PMOS transistor and NMOS transistor of the inverter are as follows.
(1)
Figure 3.8 Half Circuit of Single-Stage and Novel Tree-Type Serializer.
Figure 3.8 shows the half circuit of these two architectures. Since three stages of 2-to-1 MUX compose an 8-to-1 novel tree-type serializer, we only need to show the last 2-to-1 MUX, which dominates the output capacitance and bandwidth. The boundary conditions we assume are: (1) for the output capacitance, we consider the drain capacitance of the PMOS transistor and the drain capacitance of the upper-level NMOS transistors; (2) the dummy transistor is not considered; (3) the swing in each architecture is from 0.25 V to 1.2 V.
Figure 3.9 Equivalent R,C Circuit of Serializer.
We calculate the delay time from the output resistance and capacitance of the serializer, using the equivalent RC circuit shown in Figure 3.9 to simplify the calculation. The calculation is
(2)
Now we calculate the output resistance and capacitance of each architecture. Then, we substitute the results into (2).
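As a generic illustration of how a transition time follows from an equivalent RC circuit, the sketch below uses the textbook first-order RC response. It is not claimed to be the thesis's equation (2), whose exact form is not reproduced in this text. The function name `rc_transition_time` and the example component values are hypothetical.

```python
import math


def rc_transition_time(r_ohm: float, c_farad: float,
                       v_start: float, v_end: float, v_final: float) -> float:
    """Time for a first-order RC node settling toward v_final to move from v_start to v_end.

    Uses v(t) = v_final + (v_start - v_final) * exp(-t / (R * C)).
    """
    span_start = v_final - v_start
    span_end = v_final - v_end
    if span_end == 0 or span_start / span_end <= 1:
        raise ValueError("v_end must lie strictly between v_start and v_final")
    return r_ohm * c_farad * math.log(span_start / span_end)
```

For a 10% to 90% transition the expression reduces to RC·ln 9, roughly 2.2 RC, so with a hypothetical 1 kΩ and 100 fF the transition takes about 220 ps. The linear dependence on R and C is the reason the analysis concentrates on the output resistance and capacitance of each architecture.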
For a single-stage MUX:
(3)
(4)
(5)
(6)
For a novel tree-type MUX:
(7)
transition time in the single-stage MUX will converge no matter how $m_{p8}$ increases. So $m_{p2}$ converges to a significant value. In (18), when $m_{p2} > 1.232$, $m_{p8} < 0$. This implies that if $m_{p2}$ is larger than 1.232, it is impossible to find a solution for $m_{p8}$, because if the transition time of $m_{p2}$ is too short, increasing $m_{p8}$ cannot achieve the same transition time as $m_{p2}$.
3.2.2 Compare three architectures by HSPICE Simulation
To verify the low-power and low-area-overhead advantages of the novel tree-type serializer over the single-stage and conventional tree-type serializers, we design all three of them. We compare the three architectures in three ways: (1) power consumption, (2) area overhead, and (3) power-area product. For the HSPICE simulation, the boundary conditions are
Figure 3.10 Static DFF.
In every architecture, we simulate the cases with rise times of 300 ps, 275 ps, 250 ps, 225 ps, 200 ps, 175 ps, 150 ps, 125 ps, and 100 ps. For the single-stage MUX only, we simulate additional cases with rise times of 170 ps, 165 ps, 160 ps, and 155 ps. These extra points make the simulation results more complete.
Figure 3.11 DFF of Clock Generator.
3.2.2.1 Single-Stage Serializer
We design and simulate the architecture shown in Figure 3.6 following the steps in Figure 3.12.
Step 1: Choose the size of MUX to match the rising time spec.
Step 2: Choose the size of Data Skew DFF to keep the rising time spec.
Step 3: Choose the size of Clock Gen to keep the rising time spec.
Figure 3.12 Design steps for single-stage serializer.
The steps shown in Figure 3.12 let us optimize the simulation and ensure that each block consumes appropriate power. The sizes chosen in the design steps must conform to the TSMC design rules. In step 1, we obtain the sizes of the PMOS and NMOS transistors that match the rise time specification by using the HSPICE command ".alter" to increase the size of the MOS transistors gradually. The results are shown in Table 3.1; the sizes that match the rise time specification are boldfaced.
Table 3.1 Rise Time versus size of the MOS Transistors in Single-Stage Serializer.
Unit sizes: (W/L)p = 0.3/0.13, (W/L)n = 0.3/0.13. mp and mn are multiples of the unit size; Tr is the rise time in ps.

mp     mn      Tr      mp    mn     Tr
1.0    6       302     12    72     157
1.1    6.6     292     13    78     157
1.2    7.2     276     14    84     157
1.3    7.8     265     15    90     155
1.4    8.4     257     16    96     154
1.5    9       250     17    102    154
1.6    9.6     245     18    108    154
1.7    10.2    240     19    114    153
1.8    10.8    234     20    120    152
1.9    11.4    228     21    126    150
2.0    12      223     22    132    149
3      18      198     30    180    148
4      24      181     40    240    147
5      30      175     50    300    146
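Once the sweep results are tabulated, choosing a size for a given rise time target is a lookup. The sketch below applies one plausible rule, the smallest mp whose simulated rise time does not exceed the target, to the (mp, Tr) pairs of Table 3.1. The rule and the name `smallest_size_for` are assumptions, since the boldfaced choices of the original table are not visible in this text.

```python
# (mp, Tr in ps) pairs copied from Table 3.1 for the single-stage serializer.
TABLE_3_1 = [
    (1.0, 302), (1.1, 292), (1.2, 276), (1.3, 265), (1.4, 257), (1.5, 250),
    (1.6, 245), (1.7, 240), (1.8, 234), (1.9, 228), (2.0, 223), (3, 198),
    (4, 181), (5, 175), (12, 157), (13, 157), (14, 157), (15, 155),
    (16, 154), (17, 154), (18, 154), (19, 153), (20, 152), (21, 150),
    (22, 149), (30, 148), (40, 147), (50, 146),
]


def smallest_size_for(spec_ps: float, table=TABLE_3_1):
    """Return the smallest tabulated mp with Tr <= spec_ps, or None if no entry qualifies.

    Assumed selection rule; the thesis may have accepted near misses instead.
    """
    for mp, tr in sorted(table):
        if tr <= spec_ps:
            return mp
    return None
```

Under this rule a 175 ps target gives mp = 5 and a 150 ps target gives mp = 21, a more than fourfold size increase for 25 ps. A 125 ps target returns `None` because the tabulated rise time flattens out near 146 ps, which illustrates the speed limit of the single-stage structure.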
Figure 3.13 Eye Diagram of Rise Time of 300ps for Single-Stage Serializer.
Here, the area refers to the total gate area.
150 ps case:
Figure 3.14 Eye Diagram of Rise Time of 150ps for Single-Stage Serializer.
3.2.2.2 Novel Tree-Type Serializer
We design and simulate this architecture according to Figure 3.15.
Step 1: Choose the size of the first stage MUX to match the rising time spec.
Step 2: Choose the size of the second stage MUX to match the rising time spec.
Step 3: Choose the size of the third stage MUX to match the rising time spec.
Step 4: Choose the size of Data Skew DFF to keep the rising time spec.
Step 5: Choose the size of Clock Gen to keep the rising time spec.
Figure 3.15 Design steps for Novel Tree-Type Serializer.
In Step 1, we obtain the size of the first-stage 2-to-1 serializer that matches the rise time specification by using the HSPICE command ".alter" to increase the size of the MOS transistors gradually. The results are shown in Table 3.2; the sizes that match the rise time specification are boldfaced.
Table 3.2 Rise Time versus size of the MOS transistors in Novel Tree-Type Serializer.

mp      mn       Tr     mp    mn      Tr
0.9     4.05     225    7     31.5    85.0
0.95    4.275    216    8     36      84.6
1.0     4.5      202    9     40.5    82.7
1.05    4.725    190    10    45      81.5
1.1     4.95     184    11    49.5    81.0
1.2     5.4      172    12    54      81.0
1.3     5.85     165    13    58.5    80.5
1.4     6.3      161    14    63      80.2
1.5     6.75     154    15    67.5    79.4
1.6     7.2      149    16    72      78.7
1.7     7.65     142    17    76.5    78.8
1.8     8.1      139    18    81      78.3
1.9     8.55     134    19    85.5    78.0
We describe the 300 ps and 100 ps cases in detail as examples.
300 ps case:
Figure 3.16 Eye Diagram of Rise Time of 300ps for Novel Tree-Type Serializer.
100 ps case:
Figure 3.17 Eye Diagram of Rise Time of 100ps for Novel Tree-Type Serializer.
3.2.2.3 Conventional Tree-Type Serializer
The 8-to-1 conventional tree-type architecture is shown in Figure 3.18. We design and simulate this architecture according to Figure 3.15, following the same steps as for the novel tree-type MUX.
Figure 3.18 8-to-1 conventional tree-type architecture.
As before, we show the rise time versus the size of the MOS transistors in Table 3.3; the sizes that match the rise time specification are boldfaced.
Table 3.3 Rise Time versus Size of the MOS Transistors in Conventional Tree-Type Serializer.
mp     mn       Tr      mp    mn       Tr
0.9    3.75     215     30    125      62.5
1.0    4.125    201     33    137.5    62.2
1.1    4.626    183     36    150      62.3
1.2    5.0      171     39    162.5    62.7
1.3    5.375    163     42    175      62.9
1.4    5.875    153     45    187.5    62.7
1.5    6.25     145     48    200      62.2
1.8    7.5      130     51    212.5    61.8
2.1    8.75     118     54    225      62
2.4    10       109     57    237.5    63.4
2.7    11.25    102     60    250      62.4
3.0    12.5     97.4    63    262.5    62.7
6.0    25       74.7    66    275      62.8
9.0    37.5     68.4    69    287.5    62.8
12     50       65.4    72    300      63.4
15     62.5     64      75    312.5    63.3
18     75       62.7
We describe the 300 ps and 100 ps cases in detail as examples.
300 ps case:
Figure 3.19 Eye Diagram of Rise Time of 300ps for Conventional Tree-Type Serializer.
100 ps case:
Figure 3.20 Eye Diagram of Rise Time of 100ps for Conventional Tree-Type Serializer.
3.2.3 The Comparison Results as Figures and Tables.
Figure 3.21 Area vs. Rising Time.
Table 3.4 Power of three architectures versus Rising Time.
Rising Time   Single-Stage MUX   Conventional Tree   Novel Tree Type
300ps         1.3163             3.2329              1.3136
275ps         1.6726             3.2618              1.3328
250ps         1.8246             3.2955              1.3706
225ps         2.0895             3.3722              1.4089
200ps         2.2480             3.3743              1.4479
175ps         3.5854             3.4486              1.5257
150ps         17.614             3.5068              1.5934
125ps         X                  3.6624              1.8391
100ps         X                  3.9895              2.7900

Rising Time    170ps    165ps    160ps    155ps
Single-Stage   6.0694   7.2889   8.6057   12.326
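To make the power comparison easier to read, the ratio of novel tree-type power to conventional tree-type power can be computed from the Table 3.4 entries. The sketch below does only that arithmetic. The dictionary layout and the function name `novel_vs_conventional` are hypothetical, and because the table's power unit is not stated here, only unitless ratios are formed.

```python
# Rise time (ps) -> (single-stage, conventional tree, novel tree) power from Table 3.4.
# None marks the "X" entries where the single-stage MUX has no result.
POWER_TABLE = {
    300: (1.3163, 3.2329, 1.3136),
    275: (1.6726, 3.2618, 1.3328),
    250: (1.8246, 3.2955, 1.3706),
    225: (2.0895, 3.3722, 1.4089),
    200: (2.2480, 3.3743, 1.4479),
    175: (3.5854, 3.4486, 1.5257),
    150: (17.614, 3.5068, 1.5934),
    125: (None, 3.6624, 1.8391),
    100: (None, 3.9895, 2.7900),
}


def novel_vs_conventional(rise_ps: int) -> float:
    """Novel tree-type power divided by conventional tree-type power at one rise time."""
    _, conventional, novel = POWER_TABLE[rise_ps]
    return novel / conventional
```

At 300 ps the ratio is about 0.41, and at 100 ps it is about 0.70, so by these table entries the novel structure stays below the conventional tree across the whole sweep while its advantage narrows at the fastest edge rates.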
Table 3.5 Area of three architectures versus Rising Time.