
Time Constant

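After we obtain $C_g$, $C_d$, $R_{tr}$, $C_w$, and $R_w$ from Section 3.3, the time constant $\tau$ of the repeater model is [3]

$$\tau = r_s (c_g + c_d) + \frac{r_s}{s} c_w l + r_w l s c_g + \frac{1}{2} r_w c_w l^2 \quad (3.2)$$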

The data rate that a single interconnect can carry is inversely proportional to the time constant. For the voltage to swing from 5% of $V_{DD}$ to 95% of $V_{DD}$, the bandwidth of a single interconnect $BW_{single}$ is defined as

$$BW_{single} = \frac{1}{3.32 \times \tau} \quad (3.3)$$

The global interconnects with repeater insertion are shown in Figure 3.4, where $L$ is the total interconnect length. The global interconnects that link the blocks of an SoC usually consist of a large number ($n$) of parallel interconnects, so the total bandwidth $BW_{total}$ is

$$BW_{total} = n \times BW_{single} \quad (3.4)$$
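As a quick numerical illustration of the bandwidth definitions, the following Python sketch converts a time constant into a single-line bandwidth using the factor 3.32 that also appears in (3.29), then scales the result by the number of parallel lines. The function names and the example values (a 100 ps time constant and 8 lines) are assumptions chosen for illustration only.

```python
def bw_single(tau):
    """Bandwidth of one interconnect in bit/s, assuming BW_single = 1 / (3.32 * tau)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return 1.0 / (3.32 * tau)


def bw_total(tau, n):
    """Total bandwidth of n parallel interconnects: BW_total = n * BW_single."""
    return n * bw_single(tau)


if __name__ == "__main__":
    tau = 100e-12  # hypothetical time constant of 100 ps
    n = 8          # hypothetical number of parallel interconnects
    print(f"BW_single = {bw_single(tau) / 1e9:.2f} Gbps")
    print(f"BW_total  = {bw_total(tau, n) / 1e9:.2f} Gbps")
```

With these assumed inputs the single-line bandwidth comes out close to 3 Gbps, which shows how directly the time constant limits the achievable data rate.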

Figure 3.4 Global interconnects with repeater insertion

Power

With technology scaling, the total power consumption is not only the switching power. The leakage power increases rapidly and the short-circuit power has also been shown to be a significant fraction (up to 15%) of the total power consumption for low-power and high-speed designs [4]. The three components of the total power are analyzed as follows.

Switching power model

The switching power model of the repeater is shown in Figure 3.5. Switching power is dissipated when the repeater current charges or discharges $C_g$, $C_w$, and $C_d$. The expression of switching power is

Figure 3.5 Switching power model of repeater

Short-circuit power model

The short-circuit power model of the repeater is shown in Figure 3.6(a). Short-circuit power is dissipated during each transition, either high-to-low or low-to-high.

Both the NMOS and PMOS transistors are on for a short period of time, during which a current is drawn from $V_{DD}$ through the two transistors to ground [5]. The input and output voltage and current waveforms are shown in Figure 3.6(b). We denote by $t_r$ the time for the input to rise from $V_{tn}$ to $V_{DD} - |V_{tp}|$. The short-circuit current waveform is approximated by a triangular wave [4]. The expression of short-circuit power is

Figure 3.6 Voltage and current waveforms of a CMOS inverter (signals: Vin, Vout)

Leakage power model

For a long interconnect, we assume that the data contain half ones and half zeros. When the inverter input is one, the NMOS transistor is turned on and the leakage current is determined by the PMOS transistor. When the inverter input is zero, the PMOS transistor is turned on and the leakage current is determined by the NMOS transistor.

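The expression of leakage power is

$$P_{leakage} = V_{DD} I_{leakage} = \frac{1}{2} V_{DD} (W_n I_{n,off} + W_p I_{p,off}) = \frac{1}{2} V_{DD}\, s\, (W_{min,n} I_{n,off} + W_{min,p} I_{p,off}) \quad (3.7)$$

where $I_{n,off}$ and $I_{p,off}$ are the off-state leakage currents per unit width of the NMOS and PMOS transistors, and $W_{min,n}$ and $W_{min,p}$ are the transistor widths of the minimum sized inverter.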

These three types of power constitute the power dissipation in one stage:

$$P_{repeater} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} \quad (3.8)$$

The total power applies to the global interconnects with repeater insertion shown in Figure 3.4. To simplify the analysis, we consider only $P_{switching}$, which accounts for up to 85% of the total power. The expression of total power $P_{total}$ is

$$P_{total} = \frac{1}{2} \times BW_{total} \times energy \quad (3.12)$$

where $BW_{total}$ is the total bandwidth of the global interconnects and $energy$ is the energy dissipation of the single interconnect.

Area

The area of a single interconnect $A_{single}$ is shown in Figure 3.7. Once the width $W$ and spacing $SP$ of the interconnect are known, the area of a single interconnect is

$$A_{single} = (W + SP) \times l \times \frac{L}{l} \quad (3.13)$$

We implement the overall chip as in Figure 3.4 and place the repeaters under the global interconnects. Therefore, we only consider the area of the global interconnects.

The expression of total area $A_{total}$ is

$$A_{total} = n \times (W + SP) \times L \quad (3.14)$$

Figure 3.7 Area of a single interconnect with repeater insertion (dimensions shown: L, SP, W)

Summary of Performance

According to the previous discussion, the bandwidth, power, and area of a single interconnect are all affected by the interconnect width and spacing. Furthermore, $BW_{total}$, $P_{total}$, and $A_{total}$ are proportional to $BW_{single}$, $P_{single}$, and $A_{single}$ respectively. Therefore, we use MATLAB to plot $P_{total}$, $BW_{total}$, and $A_{total}$ as 3D functions of width and spacing. These 3D graphs are shown in Figure 3.8, Figure 3.9, and Figure 3.10 respectively.

Figure 3.8 MATLAB simulation for power vs. width and spacing

Figure 3.9 MATLAB simulation for bandwidth vs. width and spacing

Figure 3.10 MATLAB simulation for area vs. width and spacing

3.5 Figure of Merit for Optimization

The aim of global interconnect design is to obtain large bandwidth, small interconnect area, and low power consumption simultaneously. According to the summary of performance in Section 3.4, a large bandwidth $BW_{single}$ requires small interconnect width and spacing. A small interconnect area $A_{single}$ likewise requires small width and spacing, because $A_{single}$ grows with $W + SP$, but low power consumption $P_{single}$ requires large interconnect spacing, which reduces the capacitance.

The width and spacing of the global interconnects affect the overall chip performance, namely the bandwidth, the power consumption, and the interconnect area, so a tradeoff among the three is needed. Therefore, a figure of merit (FOM) is used for the global interconnects. It accounts for the bandwidth, power consumption, and area simultaneously. The expression of FOM is

$$FOM = \frac{BW_{total}}{P_{total} \times A_{total}} \quad (3.15)$$

The proposed methodology optimizes the global interconnects to obtain the maximal FOM for various technologies. It considers three parts of the global interconnects: 1) the optimal interconnect width and spacing, 2) the optimal repeater size and interconnect length, and 3) the optimal interconnect bandwidth.

Optimal Global Interconnects Width and Spacing

The performance expressions above depend on the interconnect width and spacing, but the optimal width and spacing have not yet been calculated. In this section, we use (3.11) and (3.13) to obtain the product of power and area for a single interconnect. The expression is

$$P_{single} \times A_{single} = \left\{ f \times \left[ c_w \times L + (c_g \times S + c_d \times S) \times \frac{L}{l} \right] \times V_{dd}^2 \right\} \times [(W + SP) \times L] \quad (3.16)$$

Minimum power mode

The power of a single interconnect is minimized as the interconnect spacing tends towards infinity, because the capacitance decreases as the spacing increases. We define infinite interconnect spacing as the spacing at which the parallel-plate capacitance is 10 times the coupling capacitance. Therefore, we substitute the minimum interconnect

Minimum area mode

The area of a single interconnect is minimized when the interconnect width and spacing are both at their smallest values.

Minimum product of power and area mode

We can increase the interconnect spacing to reduce the power of the single interconnect, but this also increases the area. Therefore, the minimum product of power and area is important for the overall performance. In this section, we simplify (3.16) to

$$P_{single} \times A_{single} \propto K \times [c_w \times (W + SP)] \quad (3.19)$$

where

$$K = f \times L^2 \times V_{DD}^2 \quad (3.20)$$

To achieve the minimum product of power and area, the interconnect width must be minimum. On this premise, the optimal interconnect spacing is calculated by setting the derivative of $P_{single} \times A_{single}$ with respect to $SP$ to zero:

$$\frac{\partial (P_{single} \times A_{single})}{\partial SP} = 0 \quad (3.21)$$

We solve (3.21) and the optimal interconnect spacing is

c

Using (3.16) and the technology parameters of TSMC 0.18μm, we use MATLAB to plot $P_{single} \times A_{single}$ versus the interconnect spacing as the 2D graph in Figure 3.11.

Figure 3.11 MATLAB simulation for product of power and area vs. minimal width and spacing
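The minimization of the power-area product over the spacing can be sketched numerically. The Python example below is only an illustration under a loudly stated assumption: the wire capacitance per unit length is modeled as `c_w = c_fixed + k_c / sp`, that is, a spacing-independent part plus a coupling part inversely proportional to the spacing. This model, the function names, and all numerical values are hypothetical and are not taken from the derivation above. Under this assumed model, setting the derivative of (3.19) to zero gives `sp = sqrt(k_c * w / c_fixed)`, which the sketch compares with a brute-force grid search.

```python
import math


def power_area_product(sp, w, c_fixed, k_c, K=1.0):
    """P_single * A_single up to the constant K, as in (3.19).

    Assumed capacitance model (hypothetical): c_w = c_fixed + k_c / sp.
    """
    c_w = c_fixed + k_c / sp
    return K * c_w * (w + sp)


def optimal_spacing(w, c_fixed, k_c):
    """Closed-form minimiser of the product under the assumed model."""
    return math.sqrt(k_c * w / c_fixed)


if __name__ == "__main__":
    # Hypothetical inputs: width in um, capacitance terms in arbitrary units.
    w, c_fixed, k_c = 0.28, 1.0, 1.5
    sp_closed = optimal_spacing(w, c_fixed, k_c)
    grid = [0.05 + 0.001 * i for i in range(3000)]
    sp_grid = min(grid, key=lambda sp: power_area_product(sp, w, c_fixed, k_c))
    print(f"closed form: {sp_closed:.3f} um, grid search: {sp_grid:.3f} um")
```

The grid search is included so that the closed form can be checked independently; with a different capacitance model only `power_area_product` would need to change.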

Optimal Repeater Size and Optimal Interconnect Length

After we obtain the optimal interconnect width and spacing, we substitute (3.4), (3.12), and (3.14) into (3.15). The FOM is written as

$$FOM = \frac{BW_{total}}{P_{total} \times A_{total}} = \frac{BW_{total}}{\left(\frac{1}{2} \times BW_{total} \times energy\right) \times [n \times (W + SP) \times L]} \propto K \times \frac{1}{\tau \times c} \quad (3.23)$$

We observe that the FOM is inversely proportional to the product of the time constant and the capacitance, $\tau \times c$. We solve (3.25) and the optimal repeater size is

$$S_{opt} = \sqrt{\frac{r_s c_w}{r_w c_g}} \quad (3.26)$$

We substitute (3.3), (3.26), and the optimal interconnect width and spacing into HSPICE. The simulations are shown in Figure 3.12 and Figure 3.13. Figure 3.12 plots the interconnect bandwidth against the number of repeaters per cm, which is set by the interconnect length $l$ between repeaters. Figure 3.13 plots the interconnect bandwidth per energy against the same quantity.

Figure 3.12 Variation of bandwidth with number of repeaters (HSPICE simulation; x-axis: number of repeaters per cm, y-axis: bandwidth)

Figure 3.13 Variation of bandwidth per energy with number of repeaters (VDD = 1.8V; x-axis: number of repeaters per cm, 0 to 50; y-axis: bandwidth per bit-energy in bps/pJ)

The optimal interconnect length is obtained by setting the derivative of $\tau \times c$ with respect to $l$ to zero:

$$\frac{\partial (\tau \times c)}{\partial l} = 0 \quad (3.27)$$

We solve (3.27) and the optimal interconnect length is

$$l_{opt} = \sqrt{\frac{0.7\, r_s c_g}{r_w c_w}} \quad (3.28)$$

Therefore, we substitute (3.3), (3.26), (3.28), and the optimal interconnect width and spacing into HSPICE again. The simulation is shown in Figure 3.14. The interconnect bandwidth per energy reaches its maximum value at this design point, so we regard the interconnect circuit as optimized.

Figure 3.14 Variation of bandwidth per energy with number of repeaters (VDD = 1.8V; x-axis: number of repeaters per cm, 0 to 50; y-axis: bandwidth per bit-energy in bps/pJ)

Optimal Interconnect Bandwidth

After we obtain the optimal interconnect width and spacing, the optimal repeater size, and the optimal interconnect length, we substitute them into (3.3) and obtain the optimal interconnect bandwidth. The expression of the optimal interconnect bandwidth is

$$BW_{opt} = \frac{1}{3.32 \times \tau} = \frac{1}{3.32 \times (3 r_s c_g + r_s c_d)} \quad (3.29)$$
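The chain from optimal repeater size and segment length to optimal bandwidth can be checked with a short Python sketch. It assumes the square-root forms $S_{opt} = \sqrt{r_s c_w / (r_w c_g)}$ and $l_{opt} = \sqrt{0.7\, r_s c_g / (r_w c_w)}$, and it assumes a lumped segment time constant of the form `r_s*(c_g + c_d) + (r_s/s)*c_w*l + r_w*l*s*c_g + 0.5*r_w*c_w*l**2`. That form is an assumption, chosen because at the optimum it reproduces the $3 r_s c_g + r_s c_d$ term of (3.29) to within about 1%. All parameter values are hypothetical.

```python
import math


def tau(s, l, r_s, c_g, c_d, r_w, c_w):
    """Assumed time constant of one repeater segment (driver, wire, and load terms)."""
    return (r_s * (c_g + c_d)
            + (r_s / s) * c_w * l
            + r_w * l * s * c_g
            + 0.5 * r_w * c_w * l ** 2)


def s_opt(r_s, c_g, r_w, c_w):
    """Optimal repeater size, assuming the square-root form of (3.26)."""
    return math.sqrt(r_s * c_w / (r_w * c_g))


def l_opt(r_s, c_g, r_w, c_w):
    """Optimal segment length, assuming the square-root form of (3.28)."""
    return math.sqrt(0.7 * r_s * c_g / (r_w * c_w))


def bw_opt(r_s, c_g, c_d):
    """Optimal bandwidth from (3.29)."""
    return 1.0 / (3.32 * (3.0 * r_s * c_g + r_s * c_d))


if __name__ == "__main__":
    # Hypothetical device and wire parameters in SI units.
    r_s, c_g, c_d = 10e3, 2e-15, 2e-15   # ohm, F, F (minimum-sized repeater)
    r_w, c_w = 80e3, 2e-10               # ohm/m, F/m
    s, l = s_opt(r_s, c_g, r_w, c_w), l_opt(r_s, c_g, r_w, c_w)
    t_model = tau(s, l, r_s, c_g, c_d, r_w, c_w)
    t_closed = 3.0 * r_s * c_g + r_s * c_d
    print(f"s_opt = {s:.1f}, l_opt = {l * 1e6:.0f} um")
    print(f"tau (model) = {t_model * 1e12:.2f} ps, tau (3.29) = {t_closed * 1e12:.2f} ps")
    print(f"BW_opt = {bw_opt(r_s, c_g, c_d) / 1e9:.2f} Gbps")
```

The small residual between the two time constants comes from rounding $1 + 2\sqrt{0.7} + 0.35 \approx 3.02$ down to 3.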

3.6 Optimization Flow

The optimization flow for the global interconnects is shown in Figure 3.15. It includes three options: 1) optimization for the minimum product of power and area, 2) optimization for the minimum area, and 3) optimization for the minimum power.

Figure 3.15 Optimization flow for global interconnects (flowchart blocks: begin design; choose process technology; obtain precise Ca, Cf, Cc; obtain precise cw, rw; obtain precise rs, cg; choose the optimization for minimal power, minimal area, or minimal product of power and area; decide the optimal interconnect width (Wopt) and spacing (SPopt); decide the optimal repeater size (Sopt) and interconnect length (lopt); decide the optimal bandwidth (BWopt) of the interconnect; optimization end)

3.7 Optimal Design Parameter

According to the optimization flow, we optimize the minimum product of power and area to obtain the maximum FOM. Table 3.2 shows the optimal design expressions of the global interconnects.

| Parameter | Optimal design |
|---|---|
| Interconnect width ($W_{opt}$) | Minimum width |
| Interconnect spacing ($SP_{opt}$) | Solution of (3.21) |
| Interconnect length ($l_{opt}$) | $\sqrt{0.7\, r_s c_g / (r_w c_w)}$ |

Table 3.2 Optimal design expression

We substitute the model parameters into the previous equations and calculate the optimal design parameters, which are shown in Table 3.3.

| Parameter | Optimal value |
|---|---|
| Supply voltage, repeaters | 1.8V, 13 repeaters/cm |
| Interconnect dimensions | W = 0.28μm, SP = 0.64μm |
| Repeater dimensions | Wn = 9.9μm, Wp = 35.2μm |
| Bandwidth | 3Gbps |
| Total power | 9.2mW |
| Total area | 9200μm² |

Table 3.3 Optimal design value
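The area entry of Table 3.3 can be cross-checked against the interconnect dimensions with a few lines of Python. The sketch assumes the 10000μm total length quoted for the implemented interconnect in Section 4.1 and the area form $(W + SP) \times L$; the function names are illustrative only.

```python
def interconnect_area_um2(w_um, sp_um, length_um):
    """Area of one interconnect, (W + SP) * L, in square micrometres."""
    return (w_um + sp_um) * length_um


def segment_length_um(repeaters_per_cm):
    """Average wire length between repeaters, given the repeater density."""
    return 10_000.0 / repeaters_per_cm  # 1 cm = 10000 um


if __name__ == "__main__":
    area = interconnect_area_um2(0.28, 0.64, 10_000.0)
    seg = segment_length_um(13)
    print(f"area = {area:.0f} um^2, segment length = {seg:.0f} um")
```

The computed area agrees with the 9200μm² listed in the table, and 13 repeaters per cm corresponds to a segment length of roughly 770μm.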

3.8 Summary

In this chapter, we improve the optimization for the global interconnects. The optimal design flow is proposed. We optimize the interconnect width and spacing, the repeater size and interconnect length, and the interconnect bandwidth. Finally, according to the optimal design parameter, we claim that the interconnect circuit is optimized.

Chapter 4

Global Interconnects Circuit Implementation

4.1 Single Interconnect Structure

Figure 4.1 is the typical single interconnect. According to the optimal design value, the data rate is 3Gbps and the interconnect length L is 10000μm. Figure 4.2 shows the pre-simulation results of the last repeater output. Table 4.1 shows the total power consumption and jitter of the single interconnect.

Figure 4.1 Single interconnect with the optimal design (dimensions shown: l, L)

Figure 4.2 Corners of the single interconnect: (a) TT, (b) SS, (c) FF, (d) SF, (e) FS (time axis 0 to 600ps, voltage axis 0 to 1.8V)

| L = 10000μm | TT | SS | FF | SF | FS |
|---|---|---|---|---|---|
| Power | 8.1mW | 7.5mW | 9.1mW | 8mW | 82mW |
| Jitter (p-p) | 43ps | 63ps | 35.6ps | 43.8ps | 44.5ps |

Table 4.1 Power and jitter of the single interconnects

4.2 Global Interconnects Structure

Typical global interconnects implementation

Figure 4.3 shows the typical layout of unidirectional global interconnects. Figure 4.4 shows the coupling effect of crosstalk for a simple case of three parallel lines driven by the optimal repeaters. In general, the coupling capacitance is inversely proportional to the interconnect spacing and proportional to the length over which the interconnects run in parallel. The cross-coupling capacitor $C_c$ lies in the horizontal spacing between the global interconnects.

Figure 4.3 Unidirectional global interconnects

Figure 4.4 Geometrical RC model of the parallel interconnect

On-chip global interconnects implementation

To reduce the impact of capacitive coupling noise, we use the interleaved repeaters described in [11] for the global interconnects. The layout structure is shown in Figure 4.5.

This approach offsets the repeaters in a bus-like structure to minimize the impact of coupling capacitance on delay and crosstalk noise. If the repeaters are offset so that each gate is placed in the middle of its neighboring gates, the effective coupling (switching) factor is limited to one. This is because the potential worst-case simultaneous switching on adjacent wires can be present for only half of the affected line's length, while the other half of the line experiences best-case neighboring switching activity. Figure 4.6 shows the impact of the interleaved repeaters.

Figure 4.5 Layout of on-chip global interconnects

Figure 4.6 Impact of interleaving repeaters

4.3 Generation of Random Data

To test the global interconnects independently, we connect a data generator to them. It is difficult to generate completely random binary data, because a sequence must be very long for its randomness to manifest itself. For this reason, it is common to employ a pseudo-random binary sequence (PRBS). It is "pseudo" because it is deterministic: after $2^n - 1$ elements it starts to repeat itself, unlike a real random sequence.

Because the data rate is in the gigabit-per-second range, we build the PRBS generator from dynamic DFFs. The dynamic DFF is shown in Figure 4.7. In Figure 4.8, twelve resettable dynamic DFFs form a shift register, and an XOR gate feeds its result back to the input of the first DFF.

Figure 4.7 Resettable dynamic DFF (ports: D, Q, Reset)

Figure 4.8 Linear feedback shift registers

A sequence of $2^{12} - 1$ data patterns is generated with twelve registers and an XOR circuit. A property of the PRBS architecture is that it generates all possible patterns except the all-zero vector. The probabilities of 0-to-1 and 1-to-0 transitions are the same, 50%. It is a simple and regular structure. This technique can be extended to an m-bit system to produce a sequence of length $2^m - 1$.
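The behaviour of a twelve-stage linear feedback shift register can be reproduced in software. The Python sketch below is a behavioural model only, not the dynamic-DFF circuit. The feedback taps are not specified in the text, so the tap set (12, 11, 10, 4) used here is an assumption: it is a commonly tabulated maximal-length choice for twelve bits, and it needs more XOR logic than a single two-input gate. The function name and seed are likewise illustrative.

```python
def prbs_bits(nbits=12, taps=(12, 11, 10, 4), seed=1):
    """Return one full period of a Fibonacci LFSR output as a list of 0/1 values.

    Position t (1-based) is the output of the t-th flip-flop. The XOR of the
    tapped positions is fed back to the input of the first flip-flop.
    The default taps are an assumed maximal-length set, not taken from the text.
    """
    mask = (1 << nbits) - 1
    start = seed & mask
    if start == 0:
        raise ValueError("the all-zero state never changes and must be avoided")
    state = start
    bits = []
    while True:
        bits.append((state >> (nbits - 1)) & 1)  # output of the last flip-flop
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
        if state == start:                       # one full period completed
            return bits


if __name__ == "__main__":
    seq = prbs_bits()
    ones = sum(seq)
    print(f"period = {len(seq)}, ones = {ones}, zeros = {len(seq) - ones}")
```

The reset input of the hardware flip-flops plays the role of `seed` here: it places the register in a known non-zero state so that the sequence can start.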

Figure 4.9 shows the HSPICE simulation results and Figure 4.10 shows the eye diagrams of PRBS.

Figure 4.9 Timing diagram of PRBS

Figure 4.10 Eye diagram of PRBS (time axis 0 to 600ps)

4.4 Output Buffer

When the data are driven off the chip, they are distorted because the bonding wire and pad cause a resonance of inductance and capacitance. Therefore, the output buffer plays an important role in transmitting signals. The output data stream usually has large jitter and a small amplitude swing, so the output sensitivity, symmetry, and bandwidth are major concerns.

Figure 4.11 shows the architecture of the output buffer. The proposed architecture is fully digital. It operates fully differentially and amplifies the swing of the output signal stage by stage. We use two cross-coupled inverters, each connecting its output to the other's input, to create hysteresis. This makes the signal transfer symmetric and reduces the effect of noise. The inverter connected with a transmission gate has two advantages.

First, the inverter whose input and output are connected together sets the input common-mode level at 0.5$V_{DD}$, so no common-mode feedback circuit is needed. Second, the transmission gate acts as a resistor and produces an inductive peaking effect.

Although the gain is reduced, the bandwidth of the inverter is extended. To reach a large output swing, more stages are therefore needed.

Figure 4.11 Architecture of output buffer (inputs: in, inb; stage nodes: A, B, C, D)

4.5 Layout and Simulation

The proposed 10mm optimal global interconnects are implemented through the National Chip Implementation Center (CIC) in the TSMC 0.18μm 1P6M CMOS process. The data rate is 3Gbps per channel. The layout of this chip is shown in Figure 4.12. The core area is 0.196mm² (700μm × 280μm) and the total area is 0.6144mm² (960μm × 640μm). The chip includes the 10mm global interconnects, a PRBS generator, and an output buffer. The remaining area is filled with decoupling capacitors to bypass power noise. The chip will be fabricated and sent back in January 2008.

Figure 4.12 Layout of 10mm optimal global interconnects (blocks: 10mm on-chip global interconnect with repeater chain, decoupling capacitors, output buffer, PRBS; chip size 960μm × 640μm)

We apply a 3Gbps PRBS signal to test the 10mm optimal global interconnects.

Figure 4.13 and Figure 4.14 show the five corners of the last repeater outputs for the 5mm optimized global interconnects and the 10mm optimized global interconnects respectively. These are all post-layout simulation results.

Figure 4.13 Corners of the global interconnects for 5000μm: (a) TT, (b) SS, (c) FF, (d) SF, (e) FS (time axis 0 to 600ps, voltage axis 0 to 1.8V)

| L = 5000μm | TT | SS | FF | SF | FS |
|---|---|---|---|---|---|
| Jitter (p-p) | 23.9ps | 24.5ps | 33ps | 25ps | 24.4ps |

Table 4.2 Jitter of the global interconnects for 5000μm

Figure 4.14 Corners of the global interconnects for 10000μm: (a) TT, (b) SS, (c) FF, (d) SF, (e) FS (time axis 0 to 600ps, voltage axis 0 to 1.8V)

| L = 10000μm | TT | SS | FF | SF | FS |
|---|---|---|---|---|---|
| Power | 85.6mW | 82mW | 90mW | 85.5mW | 85.9mW |
| Jitter (p-p) | 41.4ps | 44.1ps | 41.9ps | 41.7ps | 44ps |

Table 4.3 Power and jitter of the global interconnects for 10000μm

Besides, we scale down the power supply to 1V. The data rate is down to 2.2Gbps. Figure 4.15 shows the eye diagram of global interconnects. The jitter is 53.9ps and the power consumption is 2.475mW.

Figure 4.15 The eye diagram of global interconnects at 2.2Gbps (time axis 0 to 800ps, voltage axis 0 to 1V)

We also vary the temperature to test the 10mm optimal global interconnects. Figure 4.16 and Figure 4.17 show the effects of the temperature variation.

Figure 4.16 Temperature = 0°C for the global interconnects (time axis 0 to 600ps, voltage axis 0 to 1.8V)

Figure 4.17 Temperature = 100°C for the global interconnects (time axis 0 to 600ps, voltage axis 0 to 1.8V)

The post-layout simulation results are summarized in Table 4.2 and Table 4.3. The best interconnect has an eye opening of at least 0.87 unit interval (UI) at the output of the last repeater, where one UI is the 333ps bit period. Table 4.4 shows the summary of the 10mm global interconnects.

| Item | Specification |
|---|---|
| Process | TSMC 0.18μm 1P6M |
| Supply Voltage | 1.8V |
| Data Rate | 3Gbps/channel × 8 |
| Link | 10mm on-chip micro-strip line |
| Jitter of received data (pk-to-pk) | 41.4ps (0.124UI) |
| Repeater chain Layout Area | 700μm × 280μm |
| Core Layout Area | 960μm × 640μm |
| Power Consumption: PRBS Generator | 4mW |
| Power Consumption: Repeater Chain | 73.6mW (9.2mW × 8) |
| Power Consumption: Output Buffer | 8mW |
| Power Consumption: Total | 85.6mW |

Table 4.4 Summary of the 10mm global interconnects
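The jitter and power entries of the summary can be reproduced with simple arithmetic, sketched here in Python. The sketch assumes that the horizontal eye opening is approximated as one unit interval minus the peak-to-peak jitter, which is a simplification; the function names are illustrative.

```python
def unit_interval_ps(data_rate_gbps):
    """Bit period in picoseconds for a given data rate in Gbps."""
    return 1000.0 / data_rate_gbps


def jitter_in_ui(jitter_ps, data_rate_gbps):
    """Peak-to-peak jitter expressed as a fraction of the unit interval."""
    return jitter_ps / unit_interval_ps(data_rate_gbps)


def total_power_mw(prbs_mw, per_line_mw, n_lines, buffer_mw):
    """Sum of PRBS generator, repeater chains, and output buffer power."""
    return prbs_mw + per_line_mw * n_lines + buffer_mw


if __name__ == "__main__":
    ui = unit_interval_ps(3.0)
    jit = jitter_in_ui(41.4, 3.0)
    print(f"UI = {ui:.1f} ps, jitter = {jit:.3f} UI, eye opening = {1.0 - jit:.3f} UI")
    print(f"total power = {total_power_mw(4.0, 9.2, 8, 8.0):.1f} mW")
```

Both results are consistent with the tabulated 0.124UI jitter, the stated eye opening of at least 0.87UI, and the 85.6mW total.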

4.6 Performance Comparison

The specifications of the global interconnects are shown in Table 4.5. For the performance of the global interconnects, we are mainly concerned with the bandwidth, the power consumption, the interconnect length, and the area of the global interconnects.

These quantities are substituted into (3.15) to calculate the FOM.

In addition, to allow a comparison at the same supply level, we scale down the power supply to 1V and obtain a power consumption of 2.475mW at 2.2Gbps.

| Reference | Bandwidth | Process | Supply | Power | Link | Area |
|---|---|---|---|---|---|---|
| JSSC'06 [13] | 1Gbps | 0.35μm | 2.5V | 5.8mW | 1.75cm | 0.105mm² |
| TVLSI'05 [14] | 1.47Gbps | 0.18μm | 2V | 14.2mW | 1cm | 0.005mm² |
| ISQED'05 [15] | 1.66Gbps | 0.18μm | 1V | 3.1mW | 1cm | 0.006mm² |
| JSSC'03 [16] | 2Gbps | 0.18μm | 1.8V | 30mW | 2cm | 0.018mm² |
| ASSCC'05 [18] | 2.5Gbps | 0.13μm | 1.2V | 4.6mW | 0.9cm | 0.0108mm² |
| This work | 3Gbps | 0.18μm | 1.8V | 9.2mW | 1cm | 0.0092mm² |
| This work | 2.2Gbps | 0.18μm | 1V | 2.47mW | 1cm | 0.0092mm² |

Table 4.5 Specifications of the global interconnects

A high FOM and a low power consumption per bit are important targets for the global interconnects. Both are shown in Table 4.6. At a 1.8V supply, the proposed architecture achieves an FOM of 35. When the power supply is scaled down to 1V, the FOM rises to 97, the highest among the compared designs.

In high-speed link design, the power consumption per bit is commonly used to judge performance. In Table 4.6, the proposed design reaches 3.06 pJ/bit at 1.8V and 1.13 pJ/bit at 1V, the latter being the lowest value in the comparison.

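| Reference | FOM | Power/bit (pJ/bit) |
|---|---|---|
| JSSC'06 [13] | 1.6 | 5.8 |
| TVLSI'05 [14] | 20 | 6.55 |
| ISQED'05 [15] | 89 | 1.86 |
| JSSC'03 [16] | 3.7 | 8 |
| ASSCC'05 [18] | 50 | 1.84 |
| This work (1.8V) | 35 | 3.06 |
| This work (1V) | 97 | 1.13 |

Table 4.6 Comparisons of the global interconnects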
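The FOM column can be recomputed from the specifications in Table 4.5 by applying (3.15) row by row. The Python sketch below assumes the units Gbps, mW, and mm², so the FOM is expressed in Gbps/(mW·mm²); this unit choice is an assumption that happens to reproduce the tabulated magnitudes. The energy per bit is computed for this work only, as power divided by bandwidth.

```python
# (reference, bandwidth in Gbps, power in mW, area in mm^2), copied from Table 4.5
ROWS = [
    ("JSSC'06 [13]", 1.0, 5.8, 0.105),
    ("TVLSI'05 [14]", 1.47, 14.2, 0.005),
    ("ISQED'05 [15]", 1.66, 3.1, 0.006),
    ("JSSC'03 [16]", 2.0, 30.0, 0.018),
    ("ASSCC'05 [18]", 2.5, 4.6, 0.0108),
    ("This work (1.8V)", 3.0, 9.2, 0.0092),
    ("This work (1V)", 2.2, 2.47, 0.0092),
]


def fom(bw_gbps, power_mw, area_mm2):
    """Figure of merit from (3.15): bandwidth / (power * area)."""
    return bw_gbps / (power_mw * area_mm2)


def energy_per_bit_pj(power_mw, bw_gbps):
    """Energy per bit in pJ: mW divided by Gbps equals pJ/bit."""
    return power_mw / bw_gbps


if __name__ == "__main__":
    for name, bw, p, a in ROWS:
        print(f"{name:18s} FOM = {fom(bw, p, a):6.1f}")
    for name, bw, p, _ in ROWS[-2:]:
        print(f"{name:18s} energy/bit = {energy_per_bit_pj(p, bw):.2f} pJ/bit")
```

Keeping the table as plain data makes it easy to add further reference designs to the comparison.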

4.7 Measurement Considerations

The test configuration is shown in Figure 4.18, and the purpose of each instrument is as follows. The power supply powers the chip. The Agilent N4901B Serial BERT provides input data at up to 3Gbps and the 3GHz differential clock. With a wide-band oscilloscope, we can observe the high-speed performance of the global interconnects.

We expect to obtain a 3Gbps signal at the output of the last repeater, with an eye opening of up to 0.85UI.

In addition, we will vary the power supply voltage to obtain the optimal bandwidth at each voltage. The power is measured with a Keithley 2400 source meter. From the measured bandwidth and power, we also calculate the FOM of this chip.

Figure 4.18 Measurement setup (instruments: Keithley 2400 source meter; N4901B Serial BERT 13.5 Gb/s for BER measurement; HP E3610A DC power supplies for the output buffer and for the PRBS / repeater chain; chip mounted on a PCB)

4.8 Summary

In this chapter, the 10mm on-chip global interconnects are optimized by the proposed methodology. It is implemented by TSMC 0.18μm 1P6M technology. The data rate is 3Gbps. The power consumption is 9.2mW per interconnect. The power per
