Effective Resistance - A Low-Hardware-Cost Bus Interface Circuit Design for On-Chip Single-Channel 3 Gbps Data Transmission

According to the Elmore delay model, a gate with effective resistance R and capacitance C has a propagation delay of RC. A wire with distributed resistance R and capacitance C, treated as a single π-segment, has a propagation delay of RC/2. We review the properties of RC circuits. The lumped RC circuit in Figure 2.2(a) has a unit step response of

$$V_{out}(t) = 1 - e^{-t/(R'C)} \tag{2.2}$$

The propagation delay of this circuit is obtained by solving for t_pd when V_out(t_pd) = 1/2:

$$t_{pd} = R'C \ln 2 \tag{2.3}$$

Figure 2.2 (a) Lumped RC model (b) distributed RC model

The distributed RC circuit in Figure 2.2(b) has no closed-form time-domain response. The capacitance is distributed along the circuit rather than all being at the end, so we expect it to be charged on average through about half the resistance and the propagation delay to be about half as great, as shown in Figure 2.3. A numerical analysis finds that the propagation delay is 0.38R'C.

Figure 2.3 Lumped and distributed RC circuit response
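The two delay figures quoted above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the lumped delay R'C ln 2 in closed form and estimates the distributed delay by stepping an n-segment RC ladder with forward Euler. The function names, the segment count, and the time step are illustrative assumptions.

```python
import math


def lumped_delay(r, c):
    """50% step-response delay of a lumped RC circuit: R'C ln 2."""
    return r * c * math.log(2.0)


def distributed_delay(r, c, n=50):
    """Estimate the 50% delay at the far end of a distributed RC line.

    The line is approximated by n series resistors of r/n with a capacitor
    at every internal node (assumption: pi-style model, so the far-end node
    carries half a segment capacitance). Node 0 is an ideal 1 V step source.
    """
    rs = r / n
    cs = c / n
    caps = [cs] * (n - 1) + [cs / 2.0]
    v = [0.0] * n                 # v[i] is the voltage of node i + 1
    dt = 0.1 * rs * cs            # well inside the forward-Euler stability limit
    t = 0.0
    while v[-1] < 0.5:
        new_v = v[:]
        for i in range(n):
            left = 1.0 if i == 0 else v[i - 1]
            i_in = (left - v[i]) / rs
            i_out = (v[i] - v[i + 1]) / rs if i < n - 1 else 0.0
            new_v[i] = v[i] + dt * (i_in - i_out) / caps[i]
        v = new_v
        t += dt
    return t


if __name__ == "__main__":
    print("lumped      t_pd / R'C =", lumped_delay(1.0, 1.0))
    print("distributed t_pd / R'C =", distributed_delay(1.0, 1.0))
```

With R' = C = 1 the ladder estimate should come out close to the 0.38R'C quoted in the text, against about 0.69R'C for the lumped circuit.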

To reconcile the Elmore model with the true results for a logic gate, we recall that logic gates have complex nonlinear I-V characteristics and are approximated to have an effective resistance. If we characterize that effective resistance as R = R' ln 2, the propagation delay really becomes the product of the effective resistance and capacitance: t_pd = RC. We will calculate this effective resistance by simulating the delay of a gate driving a capacitive load and measuring the propagation delay.

For the distributed circuits, we observe that

$$0.38R'C \approx \frac{1}{2}R'C \ln 2 = \frac{1}{2}RC \tag{2.4}$$

Therefore, the Elmore delay model describes distributed delay well if we use an effective wire resistance equal to 69% of that computed with (2.5).

$$R = R_{\square} \frac{l}{w} \tag{2.5}$$

This is somewhat inconvenient. The effective resistance is further complicated by the effect of nonzero rise time on propagation delay. When the input is a slow ramp, the propagation delay depends on the rise time of the input and approaches RC for lumped models and RC/2 for distributed models.

In summary, it is a reasonable practice to estimate the propagation delay of gates using the Elmore delay model as RC, where R is the effective resistance of the gate. Similarly, we can estimate the flight time along a wire as RC/2, where R is the true resistance of the wire. It is important to use good transistor models and appropriate input slopes to obtain more accurate results.

2.3 Crosstalk Effect

In deep sub-micron technology, signal propagation over long interconnects is a dominant issue in chip design. As device sizes shrink and more circuits are integrated on a chip, the global interconnects are spaced closer and closer together. Signal rise and fall times move into the nanosecond region, and the effect of coupling between interconnects becomes more observable.

Crosstalk affects both data throughput and signal integrity. In closely coupled interconnects, such as long parallel interconnects, the effects of crosstalk include signal speedup or considerable additional delay. Other impacts are shown in Figure 2.4 [2].

Figure 2.4 Crosstalk effects (a) additional delay (b) speedup (c) glitch (d) oscillation

2.4 Optimization for Minimum Delay

In general interconnect design, the repeaters are optimally sized to minimize the interconnect delay. But these optimally sized repeaters are very large [3] (450 times the minimum sized inverter available in the current technology for the global interconnects) and also dissipate a significant amount of power. The total power dissipated by such repeaters in high-performance designs is very high.

However, as shown in Figure 2.5, the interconnect delay is actually quite insensitive to both the repeater size and the interconnect length close to the minimum value [4].

Figure 2.5 Normalized delay per unit length, (τ/l)/(τ/l)_opt, as a function of repeater size S/S_opt and interconnect length l/l_opt

The basic repeater model is shown in Figure 2.6. To obtain the optimal repeater size and the optimal interconnect length, we use the time constant of the repeater from Chapter 3. The delay per unit length of the repeater is given by

$$\frac{\tau}{l} = \frac{r_s}{l}(c_g + c_d) + \frac{r_s}{S} \times c_w + r_w c_g S + \frac{1}{2} \times c_w r_w l \tag{2.6}$$

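Therefore, the delay per unit length is optimized when its partial derivatives with respect to the interconnect length l and the repeater size S are both zero, which gives the optimal interconnect length l_opt and the optimal repeater size S_opt. Furthermore, the optimal delay per unit length (τ/l)_opt is given by substituting l_opt and S_opt back into (2.6).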
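The optimum can be worked out directly from the delay per unit length. The derivation below is a sketch added for clarity, assuming (2.6) has the four-term form written on its first line; the original may present the result in a different but equivalent form.

$$
\begin{aligned}
\frac{\tau}{l} &= \frac{r_s}{l}(c_g + c_d) + \frac{r_s}{S} c_w + r_w c_g S + \frac{1}{2} c_w r_w l \\
\frac{\partial (\tau/l)}{\partial l} &= -\frac{r_s (c_g + c_d)}{l^2} + \frac{1}{2} r_w c_w = 0
\;\Rightarrow\; l_{opt} = \sqrt{\frac{2 r_s (c_g + c_d)}{r_w c_w}} \\
\frac{\partial (\tau/l)}{\partial S} &= -\frac{r_s c_w}{S^2} + r_w c_g = 0
\;\Rightarrow\; S_{opt} = \sqrt{\frac{r_s c_w}{r_w c_g}} \\
\left(\frac{\tau}{l}\right)_{opt} &= 2\sqrt{r_s c_g r_w c_w}\left[1 + \sqrt{\frac{1}{2}\left(1 + \frac{c_d}{c_g}\right)}\right]
\end{aligned}
$$

The last line follows because each pair of terms in the first line (the two that depend on l and the two that depend on S) contributes twice the geometric mean of its coefficients at the optimum.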

Figure 2.6 Basic repeater model

In short, general interconnect design always seeks the optimal repeater size and the optimal interconnect length that minimize the interconnect delay.

2.5 Optimization for Power Dissipation

Because not all global interconnects are on the critical path, a small delay penalty can be tolerated on the non-critical interconnects. There is therefore a potential for large power savings by using smaller repeaters and longer interconnect segments.

In the optimization for power consumption, the methodology is to find the repeater size and interconnect length that minimize the power consumption of the global interconnects for a given delay penalty. According to Figure 2.5, we fix an interconnect delay and obtain Figure 2.7. Figure 2.7 shows that, for a given interconnect delay, there is a repeater size and an interconnect length that give the optimal interconnect power.

The total optimal power is dominated by the power of the many repeaters. Notably, the total repeater power is not only the switching power; it also includes short-circuit power and leakage power. These components are discussed in detail in Chapter 3.

Figure 2.7 Normalized power per unit length as a function of repeater size S/S_opt and interconnect length

The total repeater power is discussed in Chapter 3. The expression is

$$P_{repeater} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} = k_1 [(c_d S + c_g S) + c_w l] + k_2 t_r S + k_3 S$$

Therefore, for a given interconnect delay penalty f, the repeater power is rewritten as

$$P_{repeater} = k_1 \times [(c_d \times S + c_g \times S) + c_w \times l] + k_2 \times S \times (1 + f)\left(\frac{\tau}{l}\right)_{opt} \times l + k_3 \times S \tag{2.12}$$

Then, the repeater power per unit length is given by

$$P'_{repeater} = \frac{P_{repeater}}{l}$$

We set the derivatives of this expression with respect to S and l to zero and solve the resulting equations with the Newton-Raphson method. This yields the optimal repeater size and the optimal interconnect length that minimize the interconnect power consumption.
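The trade of a small delay penalty for a large power saving can be illustrated numerically. The sketch below is not the Newton-Raphson procedure described above: it uses a coarse scan over the repeater size instead, keeps only the switching term of the repeater power (that is, it assumes k_2 = k_3 = 0), and enforces the delay penalty f as a constraint on the delay per unit length. The device values are the 180 nm column of Table 3.1; the wire geometry (W = 0.28 μm, SP = 0.64 μm, t = 1 μm) and all function names are illustrative assumptions.

```python
import math

# 180 nm device values from Table 3.1 (SI units, lengths in micrometres).
R_S = 8.0e3                      # output resistance of a minimum inverter (ohm)
C_G = 1.9e-15                    # input capacitance (F)
C_D = 4.8e-15                    # output capacitance (F)
# Assumed wire geometry: W = 0.28 um, SP = 0.64 um, t = 1 um.
C_W = (0.039 * 0.28 + 0.05 + 0.09 / 0.64) * 1e-15   # wire capacitance (F/um)
R_W = 2.2e-2 / (0.28 * 1.0)                          # wire resistance (ohm/um)


def delay_per_length(s, l):
    """Delay per unit length of one repeater stage (s/um)."""
    return (R_S * (C_G + C_D) / l + R_S * C_W / s
            + R_W * C_G * s + 0.5 * R_W * C_W * l)


def switched_cap_per_length(s, l):
    """Switched capacitance per unit length; switching power is proportional to it."""
    return (C_G + C_D) * s / l + C_W


def longest_length_for(s, target):
    """Largest segment length l whose delay per unit length equals target."""
    a = R_S * (C_G + C_D)
    d = 0.5 * R_W * C_W
    rem = target - (R_S * C_W / s + R_W * C_G * s)
    if rem <= 0.0:
        return None
    disc = rem * rem - 4.0 * a * d
    if disc < 0.0:
        if disc > -1e-9 * rem * rem:
            disc = 0.0           # tolerate rounding at the exact optimum
        else:
            return None
    return (rem + math.sqrt(disc)) / (2.0 * d)


def power_optimal(f, steps=2000):
    """Scan S <= S_opt and keep the point with the least switched capacitance."""
    s_opt = math.sqrt(R_S * C_W / (R_W * C_G))
    l_opt = math.sqrt(2.0 * R_S * (C_G + C_D) / (R_W * C_W))
    target = (1.0 + f) * delay_per_length(s_opt, l_opt)
    best = None
    for k in range(1, steps + 1):
        s = s_opt * k / steps
        l = longest_length_for(s, target)
        if l is None:
            continue
        cap = switched_cap_per_length(s, l)
        if best is None or cap < best[2]:
            best = (s, l, cap)
    return s_opt, l_opt, best


if __name__ == "__main__":
    s_opt, l_opt, (s, l, cap) = power_optimal(0.05)
    base = switched_cap_per_length(s_opt, l_opt)
    print("delay-optimal S, l:", s_opt, l_opt)
    print("power-optimal S, l for a 5% penalty:", s, l)
    print("switched capacitance ratio:", cap / base)
```

Under these assumptions the scan returns a smaller repeater and a longer segment than the delay-optimal point, which is the behaviour the text describes.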

2.6 Summary

In this chapter, we discussed three effects of interconnects and two different optimizations for global interconnects. These effects are taken into account in our analysis in Chapter 3 and Chapter 4. Furthermore, we build on the two optimizations and propose a novel optimization for global interconnects.

Chapter 3 Global Interconnects Circuit Design

3.1 Global Interconnects

Optimal repeater insertion is a good method to reduce power consumption and chip area. In Chapter 2, we described various methodologies to optimize global interconnects. However, these methods are not sufficient to fully optimize performance. In this chapter, we introduce a novel methodology to optimize power and area effectively.

3.2 Model Parameter

Before the optimization, we must acquire the process parameters that affect it. The technology parameters and equivalent circuit parameters are shown in Table 3.1. These parameters are obtained from the TSMC database and the International Technology Roadmap for Semiconductors (ITRS) database. Here t is the interconnect thickness, ε_r is the dielectric constant, ρ is the metal resistivity, and V_DD is the power supply voltage.

Table 3.1 also includes the input capacitance c_g, the output capacitance c_d, and the output resistance r_s for a minimum sized inverter.

Tech. Node (nm)     180     130     90      65      45
t (nm)              1000    670     482     319     236
ε_r                 3.75    3.3     2.8     2.5     2.1
ρ (10^-8 Ω·m)       2.2     2.2     2.2     2.2     2.2
c_a (fF/μm²)        0.039   0.053   0.065   0.057   0.072
c_f (fF/μm)         0.05    0.07    0.058   0.065   0.052
c_c (fF)            0.09    0.046   0.029   0.015   0.01
r_s (kΩ)            8       9.5     10      15.8    12.5
c_g (fF)            1.9     1.33    1.1     1.03    0.9
c_d (fF)            4.8     3.32    2.04    1.22    0.6
I_offn (μA/μm)      0.2     2       3.56    20      35.5
V_DD (V)            1.8     1.2     1       0.7     0.6

Table 3.1 Technology and equivalent circuit parameters

3.3 Model of Global Interconnects

Repeater Model

We use repeaters to relay the signal in the interconnect. The repeater model is presented in Figure 3.1. It consists of two minimum sized inverters and a segment of metal wire. The repeater has an input capacitance of c_g, an output capacitance of c_d, and an output resistance of r_s. Therefore, for a repeater of size S, the total input capacitance is C_g = S × c_g, the total output capacitance is C_d = S × c_d, and the total output resistance is R_tr = r_s/S.

The interconnect is modeled as a distributed RC line with resistance per unit length r_w and capacitance per unit length c_w. For an interconnect of length l, the total resistance is R_w = l × r_w and the total capacitance is C_w = l × c_w.

Figure 3.1 Repeater RC model

On-chip Interconnect Model

The cross section of the global interconnects is shown in Figure 3.2, where W is the width, SP is the spacing, and t is the thickness. c_a is the parallel plate capacitance to the top and bottom metal layers and is proportional to the interconnect width. c_f is the fringing capacitance. c_c is the coupling capacitance between neighboring interconnects and is inversely proportional to the interconnect spacing.

The interconnect resistance per unit length is r_w = ρ/(W × t), where ρ is the metal resistivity.

Figure 3.2 Cross section of global interconnects

According to TSMC 0.18μm technology, we can obtain c_a, c_f, and c_c respectively. The interconnect capacitance per unit length c_w is

$$c_w = c_a \times W + c_f + \frac{c_c}{SP} \tag{3.1}$$

Furthermore, we can use MATLAB to plot the 3D graph of c_w shown in Figure 3.3.

Figure 3.3 Extracted capacitance cw as a function of width and spacing for 180nm technology
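As a small worked example of the wire model, the sketch below evaluates the capacitance and resistance per unit length for the 180 nm column of Table 3.1. It assumes (3.1) reads c_w = c_a × W + c_f + c_c/SP, which is the form consistent with the units in Table 3.1, and r_w = ρ/(W × t). The function names and the sample widths and spacings are illustrative; the original surface in Figure 3.3 was produced with MATLAB, not with this code.

```python
# 180 nm column of Table 3.1.
C_A = 0.039    # parallel plate capacitance (fF/um^2)
C_F = 0.05     # fringing capacitance (fF/um)
C_C = 0.09     # coupling capacitance coefficient (fF)
RHO = 2.2e-2   # metal resistivity: 2.2e-8 ohm*m expressed in ohm*um
T = 1.0        # interconnect thickness (um)


def wire_capacitance(w_um, sp_um):
    """Capacitance per unit length c_w in fF/um for width W and spacing SP."""
    return C_A * w_um + C_F + C_C / sp_um


def wire_resistance(w_um):
    """Resistance per unit length r_w in ohm/um for width W."""
    return RHO / (w_um * T)


def capacitance_grid(widths, spacings):
    """Tabulate c_w over a width-by-spacing grid, as a 3D plot would."""
    return [[wire_capacitance(w, sp) for sp in spacings] for w in widths]


if __name__ == "__main__":
    print("c_w =", wire_capacitance(0.28, 0.64), "fF/um")
    print("r_w =", wire_resistance(0.28), "ohm/um")
    for row in capacitance_grid([0.28, 0.56, 1.12], [0.32, 0.64, 1.28]):
        print(row)
```

The grid shows the two trends stated in the text: c_w grows with the width through c_a and falls with the spacing through c_c.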

3.4 Performance of Global Interconnects

Time Constant

After we obtain C_g, C_d, R_tr, C_w, and R_w from Section 3.3, the time constant τ of the repeater model is [3]

$$\tau = r_s (c_g + c_d) + \frac{r_s}{S} \times c_w l + r_w c_g S l + \frac{1}{2} \times c_w r_w l^2 \tag{3.2}$$

The data rate of a single interconnect, with bandwidth BW_single, is inversely proportional to the time constant. To allow the voltage to swing from 5% of V_DD to 95% of V_DD, the bandwidth of a single interconnect BW_single is defined as

$$BW_{single} = \frac{1}{3.32 \times \tau} \tag{3.3}$$

The global interconnects with repeater insertion are shown in Figure 3.4, where L is the total interconnect length. The global interconnects that link the blocks of an SoC usually consist of a large number (n) of parallel interconnects, and the total bandwidth BW_total is

$$BW_{total} = n \times BW_{single} \tag{3.4}$$

Figure 3.4 Global interconnects with repeater insertion

Power

With technology scaling, switching power is no longer the only significant component of the total power consumption. The leakage power increases rapidly, and the short-circuit power has also been shown to be a significant fraction (up to 15%) of the total power consumption for low-power and high-speed designs [4]. The three components of the total power are analyzed as follows.

Switching power mode

The switching power model of the repeater is shown in Figure 3.5. Switching power is dissipated when the current in the repeater charges or discharges C_g, C_w, and C_d. The expression of switching power is

Figure 3.5 Switching power model of repeater

Short-circuit power mode

The short-circuit power of the repeater is shown in Figure 3.6(a). The short-circuit power occurs during the transition from either high-to-low or low-to-high.

Both the NMOS and PMOS transistors are on for a short period of time, and a current is drawn from V_DD through the two transistors to ground [5]. The input and output voltage and current waveforms are shown in Figure 3.6(b). We denote by t_r the time for the input to rise from V_tn to V_DD − V_tp. The short-circuit current waveform is approximated by a triangular wave [4]. The expression of short-circuit power is

Figure 3.6 Voltage and current waveforms of a CMOS inverter

Leakage power mode

For a long interconnect, we assume that the data consist of half ones and half zeros. When the inverter has an input of one, the NMOS transistor is turned on and the leakage current is determined by the PMOS transistor. When the inverter has an input of zero, the PMOS transistor is turned on and the leakage current is determined by the NMOS transistor.

The expression of leakage power is

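$$P_{leakage} = V_{DD} \times I_{leakage} = V_{DD} \times \frac{1}{2} \left( W_n \times I_{offn} + W_p \times I_{offp} \right)$$

where W_n and W_p are the NMOS and PMOS widths of the repeater, scaled from the widths W_nmin and W_pmin of the minimum sized inverter.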

These three types of power constitute the power dissipation in one stage.

$$P_{repeater} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} \tag{3.8}$$

The total power for the global interconnects with repeater insertion is shown in Figure 3.4. To simplify the analysis of the total power, we consider only P_switching, which accounts for up to 85% of the total power. The expression of total power P_total is

$$P_{total} = \frac{1}{2} \times BW_{total} \times p_1 \tag{3.12}$$

where f = BW/2 is the switching frequency, BW_total is the total bandwidth of the global interconnects, and p_1 is the energy dissipation of the single interconnect.

Area

The area of a single interconnect A_single is shown in Figure 3.7. After we obtain the width and spacing of the interconnect, the area of a single interconnect A_single is

$$A_{single} = (W + SP) \times l \times \frac{L}{l} \tag{3.13}$$

We implement the overall chip as in Figure 3.4 and put the repeaters under the global interconnects. Therefore, we only consider the area of the global interconnects.

The expression of total area A_total is

…

Figure 3.7 Area of a single interconnect with repeater insertion

Summary of Performance

According to the previous discussion, we observe that the bandwidth, power, and area of a single interconnect are affected by the interconnect width and spacing.

Furthermore, BW_total, P_total, and A_total are proportional to BW_single, P_single, and A_single respectively. Therefore, we use MATLAB to plot 3D graphs of P_total, BW_total, and A_total as functions of width and spacing. These 3D graphs are shown in Figure 3.8, Figure 3.9, and Figure 3.10 respectively.

Figure 3.8 MATLAB simulation for power vs. width and spacing

Figure 3.9 MATLAB simulation for bandwidth vs. width and spacing

Figure 3.10 MATLAB simulation for area vs. width and spacing

3.5 Figure of Merit for Optimization

The aim of global interconnect design is to obtain large bandwidth, small global interconnect area, and low power consumption simultaneously. According to the summary of performance discussed in Section 3.4, the large bandwidth BW_single requires small interconnect width and spacing. But the low power consumption P_single and the small interconnect area A_single require large interconnect width and spacing.

The global interconnect width and spacing affect the overall chip performance, namely the bandwidth, the power consumption, and the interconnect area, so a tradeoff among the three is needed. Therefore, a figure of merit (FOM) is used for the global interconnects. It accounts for the bandwidth, power consumption, and area simultaneously. The expression of FOM is

$$FOM = \frac{BW_{total}}{P_{total} \times A_{total}} \tag{3.15}$$

The proposed novel methodology optimizes the global interconnects to obtain the maximal FOM for the various technologies. It considers three parts for the global interconnects: 1) the optimal interconnect width and spacing, 2) the optimal repeater size and interconnect length, 3) the optimal interconnect bandwidth.

Optimal Global Interconnects Width and Spacing

The previous equation depends on the interconnect width and spacing, but the optimal interconnect width and spacing have not yet been calculated. In this section, we use (3.11) and (3.13) to obtain the product of power and area for a single interconnect. The expression is

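$$P_{single} \times A_{single} = \left\{ f \times \left[ c_w \times L + (c_g \times S + c_d \times S) \times \frac{L}{l} \right] \times V_{dd}^2 \right\} \times \left[ (W + SP) \times L \right] \tag{3.16}$$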

Minimum power mode

The power of a single interconnect is minimized as the interconnect spacing tends toward infinity, because the capacitance decreases as the spacing increases. We define the infinite interconnect spacing as the spacing at which the parallel plate capacitance is 10 times the coupling capacitance. Therefore, we substitute the minimum interconnect

Minimum area mode

The area of a single interconnect is minimized when the interconnect width and spacing are both at their smallest values.

Minimum product of power and area mode

We can increase the interconnect spacing to reduce the power of the single interconnect, but this also increases the area. Therefore, minimizing the product of power and area is important for the overall performance. In this section, we simplify (3.16) to

$$P_{single} \times A_{single} \propto K \times [c_w \times (W + SP)] \tag{3.19}$$

$$K = f \times L^2 \times V_{DD}^2 \tag{3.20}$$

To achieve the minimum product of power and area, the interconnect width must be minimum. On this premise, the optimal interconnect spacing is calculated by setting the derivative of P_single × A_single with respect to SP to zero.

$$\frac{\partial (P_{single} \times A_{single})}{\partial SP} = 0 \tag{3.21}$$

We solve (3.21) and the optimal interconnect spacing is

$$SP_{opt} = \sqrt{\frac{c_c \times W}{c_a \times W + c_f}} \tag{3.22}$$
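The optimal spacing can be checked numerically. The sketch below assumes c_w = c_a × W + c_f + c_c/SP, so that the power-area product is proportional to c_w × (W + SP); setting its derivative with respect to SP to zero then gives SP = sqrt(c_c × W / (c_a × W + c_f)). It evaluates this closed form for the 180 nm column of Table 3.1 at W = 0.28 μm and compares it with a brute-force scan. The function names, the scan range, and the step are illustrative assumptions.

```python
import math

# 180 nm column of Table 3.1.
C_A = 0.039   # fF/um^2
C_F = 0.05    # fF/um
C_C = 0.09    # fF


def power_area_product(w, sp):
    """Quantity proportional to P_single x A_single: c_w x (W + SP)."""
    return (C_A * w + C_F + C_C / sp) * (w + sp)


def optimal_spacing(w):
    """Closed-form spacing that zeroes the derivative with respect to SP."""
    return math.sqrt(C_C * w / (C_A * w + C_F))


def scan_spacing(w, lo=0.1, hi=3.0, steps=2900):
    """Brute-force check: spacing on a 1 nm grid with the smallest product."""
    candidates = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(candidates, key=lambda sp: power_area_product(w, sp))


if __name__ == "__main__":
    w = 0.28
    sp = optimal_spacing(w)
    print("closed-form SP_opt (um):", sp)
    print("scanned SP_opt (um):", scan_spacing(w))
    print("area for L = 10000 um (um^2):", (w + round(sp, 2)) * 10000)
```

For W = 0.28 μm the closed form gives a spacing of about 0.64 μm, and the corresponding area for L = 10000 μm is about 9200 μm², in line with the design values reported in Table 3.3.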

Using (3.16) and the technology parameters of TSMC 0.18μm, we use MATLAB to plot P_single × A_single versus spacing in the 2D graph of Figure 3.11.

Figure 3.11 MATLAB simulation for product of power and area vs. minimal width and spacing

Optimal Repeater Size and Optimal Interconnect Length

After we obtain the optimal interconnect width and spacing, we substitute (3.4), (3.12), and (3.14) into (3.15). The FOM is written as

$$FOM = \frac{BW_{total}}{P_{total} \times A_{total}} \propto K \times \frac{1}{\tau \times c} \tag{3.23}$$

We observe that the FOM is inversely proportional to the product of the time constant and the capacitance, τ × c (3.24). The optimal repeater size is obtained by setting the derivative of τ × c with respect to S to zero (3.25). We solve (3.25) and the optimal repeater size is

$$S_{opt} = \sqrt{\frac{r_s c_w}{r_w c_g}} \tag{3.26}$$

We substitute (3.3), (3.26), and the optimal interconnect width and spacing into HSPICE. The simulations are shown in Figure 3.12 and Figure 3.13. Figure 3.12 shows the interconnect bandwidth versus the interconnect length, expressed as the number of repeaters per cm. Figure 3.13 shows the interconnect bandwidth per energy versus the interconnect length, expressed in the same way.

Figure 3.12 Variation of bandwidth with number of repeaters (HSPICE simulation; x-axis: number of repeaters per cm, y-axis: bandwidth)

Figure 3.13 Variation of bandwidth per energy with number of repeaters (VDD = 1.8V; x-axis: number of repeaters per cm, y-axis: bandwidth per bit-energy (bps/pJ))

The optimal interconnect length is obtained by setting the derivative of τ × c with respect to l to zero.

$$\frac{\partial (\tau \times c)}{\partial l} = 0 \tag{3.27}$$

We solve (3.27) and the optimal interconnect length is

$$l_{opt} = \sqrt{\frac{0.7 \, r_s c_g}{r_w c_w}} \tag{3.28}$$

Therefore, we substitute (3.3), (3.26), (3.28), and the optimal interconnect width and spacing into HSPICE again. The simulation is shown in Figure 3.14. We obtain the maximum value of interconnect bandwidth per energy. Therefore, we claim that the interconnect circuit is optimized.

Figure 3.14 Variation of bandwidth per energy with number of repeaters (VDD = 1.8V; x-axis: number of repeaters per cm, y-axis: bandwidth per bit-energy (bps/pJ))

Optimal Interconnect Bandwidth

After we obtain the optimal interconnect width and spacing, the optimal repeater size, and the optimal interconnect length, we substitute them into (3.3) and obtain the optimal interconnect bandwidth. The expression of the optimal interconnect bandwidth is

$$BW_{opt} = \frac{1}{3.32 \times \tau} = \frac{1}{3.32 \times (3 r_s c_g + r_s c_d)} \tag{3.29}$$
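The closed form for the optimal bandwidth can be cross-checked against the full time constant. The sketch below assumes the four-term time constant τ = r_s(c_g + c_d) + (r_s/S) c_w l + r_w c_g S l + (1/2) c_w r_w l², the square-root forms of S_opt and l_opt, and BW = 1/(3.32 τ). It uses the 180 nm column of Table 3.1 with an assumed wire geometry of W = 0.28 μm, SP = 0.64 μm, and t = 1 μm. All function names are illustrative.

```python
import math

# 180 nm device values from Table 3.1 (SI units, lengths in micrometres).
R_S = 8.0e3          # ohm
C_G = 1.9e-15        # F
C_D = 4.8e-15        # F
# Assumed wire geometry: W = 0.28 um, SP = 0.64 um, t = 1 um.
C_W = (0.039 * 0.28 + 0.05 + 0.09 / 0.64) * 1e-15   # F/um
R_W = 2.2e-2 / (0.28 * 1.0)                          # ohm/um


def time_constant(s, l):
    """Time constant of one repeater stage of size s driving length l (s)."""
    return (R_S * (C_G + C_D) + R_S * C_W * l / s
            + R_W * C_G * s * l + 0.5 * R_W * C_W * l * l)


def optimal_point():
    """Repeater size and segment length from the square-root expressions."""
    s_opt = math.sqrt(R_S * C_W / (R_W * C_G))
    l_opt = math.sqrt(0.7 * R_S * C_G / (R_W * C_W))
    return s_opt, l_opt


def bandwidth(tau):
    """Single-interconnect bandwidth for a 5%-95% swing, in bit/s."""
    return 1.0 / (3.32 * tau)


if __name__ == "__main__":
    s_opt, l_opt = optimal_point()
    tau = time_constant(s_opt, l_opt)
    closed_form = 3.0 * R_S * C_G + R_S * C_D
    print("S_opt:", s_opt, " l_opt (um):", l_opt)
    print("tau (s):", tau, " 3*r_s*c_g + r_s*c_d (s):", closed_form)
    print("BW_opt (bit/s):", bandwidth(tau))
```

With these values the full time constant agrees with 3 r_s c_g + r_s c_d to within about half a percent, and the resulting bandwidth is roughly 3.6 Gbps. Note that the wire terms cancel at the optimum, so the result depends only on the device parameters r_s, c_g, and c_d.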

3.6 Optimization Flow

The optimization flow for the global interconnects is shown in Figure 3.15. It includes three methods: 1) the optimization for the minimum product of power and area mode, 2) the optimization for the minimum area mode, 3) the optimization for the minimum power mode.

Figure 3.15 Optimization flow for global interconnects (flowchart steps: begin design; choose the process technology; obtain precise c_a, c_f, c_c, then c_w, r_w, and r_s, c_g; choose the optimization for minimal power, minimal area, or minimal product of power and area; decide the optimal interconnect width (W_opt) and spacing (SP_opt); decide the optimal repeater size (S_opt) and interconnect length (l_opt); decide the optimal bandwidth (BW_opt) of the interconnect; optimization end)

3.7 Optimal Design Parameter

According to optimization flow, we optimize the minimum product of power and area to obtain the maximum FOM. Table 3.2 shows the optimal design expression of the global interconnects.

Parameter                        Optimal design
Interconnect width (W_opt)       Minimum width
Interconnect spacing (SP_opt)    $SP_{opt} = \sqrt{W_{opt} c_c / (c_a W_{opt} + c_f)}$
Interconnect length (l_opt)      $l_{opt} = \sqrt{0.7 \, r_s c_g / (r_w c_w)}$

Table 3.2 Optimal design expression

We substitute the model parameters into the previous equations. The calculated optimal design parameters are shown in Table 3.3.

Supply voltage - repeaters    1.8 V, 13 repeaters/cm
Interconnect dimensions       W = 0.28 μm, SP = 0.64 μm
Repeater dimensions           Wn = 9.9 μm, Wp = 35.2 μm
Bandwidth                     3 Gbps
Total power                   9.2 mW
Total area                    9200 μm²

Table 3.3 Optimal design value

3.8 Summary

In this chapter, we improve the optimization for the global interconnects. The optimal design flow is proposed. We optimize the interconnect width and spacing, the repeater size and interconnect length, and the interconnect bandwidth. Finally, according to the optimal design parameter, we claim that the interconnect circuit is optimized.

Chapter 4 Global Interconnects Circuit Implementation

4.1 Single Interconnect Structure

Figure 4.1 is the typical single interconnect. According to the optimal design value, the data rate is 3Gbps and the interconnect length L is 10000μm. Figure 4.2 shows the pre-simulation results of the last repeater output. Table 4.1 shows the total power consumption and jitter of the single interconnect.

Figure 4.1 Single interconnect with the optimal design

Figure 4.2 Corners of the single interconnect: (a) TT (b) SS (c) FF (d) SF (e) FS (output voltage from 0 to 1.8 V versus time from 0 to 600 ps)

L = 10000 μm    TT        SS        FF        SF        FS
Power           8.1 mW    7.5 mW    9.1 mW    8 mW      82 mW
Jitter (p-p)    43 ps     63 ps     35.6 ps   43.8 ps   44.5 ps

Table 4.1 Power and jitter of the single interconnects

4.2 Global Interconnects Structure

Typical global interconnects implementation

Figure 4.3 is the typical layout of unidirectional global interconnects. Figure 4.4 shows the coupling effect of crosstalk for a simple case of three parallel lines with the optimal repeaters as drivers. In general, the coupling capacitance is inversely proportional to the interconnect spacing and proportional to the length over which the interconnects run in parallel. The cross-coupling capacitor C_c lies in the horizontal spacing between the global interconnects.

Figure 4.3 Unidirectional global interconnects

Figure 4.4 Geometrical RC model of the parallel interconnect

On-chip global interconnects implementation

To reduce the impact of capacitive coupling noise, we use the interleaved repeaters described in [11] for the global interconnects. The layout structure
