According to the Elmore delay model, a gate with effective resistance R and capacitance C has a propagation delay of RC. A wire with distributed resistance R and capacitance C, treated as a single π-segment, has a propagation delay of RC/2. We review the properties of RC circuits. The lumped RC circuit in Figure 2.2(a) has a unit step response of
$V_{out}(t) = 1 - e^{-t/(R'C)}$. (2.2)
The propagation delay of this circuit is obtained by solving for $t_{pd}$ when $V_{out}(t_{pd}) = 1/2$:
$t_{pd} = R'C \ln 2$. (2.3)
Figure 2.2 (a) Lumped RC model (b) distributed RC model
The distributed RC circuit in Figure 2.2(b) has no closed-form time-domain response. Because the capacitance is distributed along the circuit rather than lumped at the end, we expect it to be charged on average through about half the resistance, so the propagation delay should be about half as great. The response is shown in Figure 2.3. A numerical analysis finds that the propagation delay is $0.38R'C$.
Figure 2.3 Lumped and distributed RC circuit response
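The two delays quoted above can be checked numerically. The following is a minimal sketch, not the analysis behind Figure 2.3: it approximates the distributed line by a 50-segment RC ladder and integrates the step response with backward Euler. The segment count, time step, and all function names are assumptions chosen for illustration.

```python
import math


def lumped_delay(r, c):
    """50% delay of a lumped RC stage: solve 1 - exp(-t/(r*c)) = 1/2."""
    return r * c * math.log(2.0)


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are never used."""
    n = len(diag)
    d = list(diag)
    b = list(rhs)
    for i in range(1, n):
        m = lower[i] / d[i - 1]
        d[i] -= m * upper[i - 1]
        b[i] -= m * b[i - 1]
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x


def distributed_delay(r, c, segments=50, dt_fraction=1e-3):
    """50% delay at the far end of an RC ladder driven by a unit step."""
    r_seg = r / segments
    c_seg = c / segments
    dt = dt_fraction * r * c
    a = dt / (r_seg * c_seg)
    lower = [-a] * segments
    upper = [-a] * segments
    diag = [1.0 + 2.0 * a] * segments
    diag[-1] = 1.0 + a              # far-end node has no right-hand neighbour
    v = [0.0] * segments            # all node voltages start at 0 V
    t = 0.0
    while True:
        rhs = list(v)
        rhs[0] += a                 # contribution of the 1 V step source
        v_new = solve_tridiagonal(lower, diag, upper, rhs)
        if v_new[-1] >= 0.5:
            # linear interpolation inside the last time step
            frac = (0.5 - v[-1]) / (v_new[-1] - v[-1])
            return t + frac * dt
        v, t = v_new, t + dt


if __name__ == "__main__":
    print("lumped      t_pd / R'C =", lumped_delay(1.0, 1.0))
    print("distributed t_pd / R'C =", distributed_delay(1.0, 1.0))
```

With R' = C = 1 the lumped result is ln 2, and the ladder result should land close to the 0.38 quoted in the text; a finer ladder and a smaller time step move it closer to the continuous-line value.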
To reconcile the Elmore model with the true results for a logic gate, we recall that logic gates have complex nonlinear I-V characteristics and are approximated to have an effective resistance. If we characterize that effective resistance as $R = R' \ln 2$, the propagation delay becomes the product of the effective resistance and the capacitance: $t_{pd} = RC$. We calculate this effective resistance by simulating a gate driving a capacitive load and measuring the propagation delay.
For the distributed circuits, we observe that
$0.38R'C \approx \frac{1}{2} R'C \ln 2 = \frac{1}{2} RC$. (2.4)
Therefore, the Elmore delay model describes distributed delay well if we use an effective wire resistance equal to 69% of that computed with (2.5).
$R' = R_{\square} \frac{l}{w}$. (2.5)
This is somewhat inconvenient. The effective resistance is further complicated by the effect of nonzero rise time on propagation delay. When the input is a slow ramp, the propagation delay depends on the rise time of the input and approaches RC for lumped models and RC/2 for distributed models.
In summary, it is reasonable practice to estimate the propagation delay of gates using the Elmore delay model as RC, where R is the effective resistance of the gate. Similarly, we can estimate the flight time along a wire as RC/2, where R is the true resistance of the wire. Good transistor models and appropriate input slopes are needed to obtain more accurate results.
2.3 Crosstalk Effect
In deep sub-micron technology, signal propagation over long interconnects is a dominant issue in chip design. As device sizes shrink and more circuits are built on a chip, the global interconnects are spaced closer and closer together. Signal rise and fall times reach the nanosecond range, and coupling between interconnects becomes more pronounced.
Crosstalk affects both data throughput and signal integrity. In closely coupled interconnects, such as long parallel interconnects, its effects include signal speedup and considerable additional delay. These and other effects are shown in Figure 2.4 [2].
Figure 2.4 Crosstalk effects (a) additional delay (b) speedup (c) glitch (d) oscillation
2.4 Optimization for Minimum Delay
In general interconnect design, the repeaters are optimally sized to minimize the interconnect delay. However, these optimally sized repeaters are very large [3] (450 times the minimum sized inverter available in the current technology for the global interconnects) and dissipate a significant amount of power. The total power dissipated by such repeaters in high-performance designs is very high.
However, as shown in Figure 2.5, the interconnect delay is actually quite insensitive to both the repeater size and the interconnect length close to the minimum value [4].
Figure 2.5 Normalized delay per unit length as a function of repeater size and interconnect length (axes: S/Sopt, l/lopt, (τ/l)/(τ/lopt))
The basic repeater model is shown in Figure 2.6. To obtain the optimal repeater size and the optimal interconnect length, we use the time constant of the repeater from Chapter 3. The delay per unit length of the repeater is given by
$\frac{\tau}{l} = \frac{r_s}{l}(c_g + c_d) + \frac{r_s}{S} c_w + r_w c_g S + \frac{1}{2} c_w r_w l$. (2.6)
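Therefore, the delay per unit length is optimized when its partial derivatives with respect to the repeater size S and the interconnect length l are both zero. Furthermore, the optimal delay per unit length is obtained by substituting the resulting optimal values into (2.6).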
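As a sketch of this optimization step, assume the delay per unit length has the form $\tau/l = \frac{r_s}{l}(c_g + c_d) + \frac{r_s}{S} c_w + r_w c_g S + \frac{1}{2} c_w r_w l$, which is our reading of (2.6). Under that assumption, setting the two partial derivatives to zero gives the closed forms below. The names $S_{opt}$ and $l_{opt}$ follow the axis labels of Figure 2.5.

```latex
\begin{align*}
\frac{\partial(\tau/l)}{\partial S} &= -\frac{r_s c_w}{S^2} + r_w c_g = 0
  &&\Rightarrow\quad S_{opt} = \sqrt{\frac{r_s c_w}{r_w c_g}}, \\
\frac{\partial(\tau/l)}{\partial l} &= -\frac{r_s (c_g + c_d)}{l^2} + \frac{1}{2} c_w r_w = 0
  &&\Rightarrow\quad l_{opt} = \sqrt{\frac{2 r_s (c_g + c_d)}{r_w c_w}}, \\
\left(\frac{\tau}{l}\right)_{opt} &= 2\sqrt{r_s c_g r_w c_w}
  \left[1 + \sqrt{\frac{1}{2}\left(1 + \frac{c_d}{c_g}\right)}\right].
\end{align*}
```

The last line follows by substituting $S_{opt}$ and $l_{opt}$ back into the assumed expression: the two S-dependent terms sum to $2\sqrt{r_s c_w r_w c_g}$ and the two l-dependent terms sum to $\sqrt{2 r_s r_w c_w (c_g + c_d)}$.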
Figure 2.6 Basic repeater model
In short, for general interconnect design, we find the optimal repeater size and the optimal interconnect length that minimize the interconnect delay.
2.5 Optimization for Power Dissipation
Because not all global interconnects lie on the critical path, a small delay penalty can be tolerated on the non-critical interconnects. This creates a potential for large power savings by using smaller repeaters and longer interconnect segments.
In the optimization for power consumption, the methodology is to estimate the repeater size and interconnect length that minimize the power consumption of the global interconnects for a given delay penalty. Based on Figure 2.5, we fix an interconnect delay and obtain Figure 2.7. Figure 2.7 shows that, for a given interconnect delay, an optimal repeater size and an optimal interconnect length yield the optimal interconnect power.
The total optimal power consists largely of repeater power. Notably, the total repeater power is not only switching power; it also includes short-circuit power and leakage power. These components are discussed in detail in Chapter 3.
Figure 2.7 Normalized power per unit length as a function of repeater size and interconnect length (axis: S/Sopt)
The total repeater power is discussed in Chapter 3. The expression is
$P_{repeater} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} = k_1[(c_d S + c_g S) + c_w l] + k_2 S t_r + k_3 S$.
Therefore, for a given interconnect delay penalty f, the repeater power is rewritten as
$P_{repeater} = k_1 \times [(c_d \times S + c_g \times S) + c_w \times l] + k_2 \times S \times (1 + f)\left(\frac{\tau}{l}\right)_{opt} \times l + k_3 \times S$. (2.12)
Then, the repeater power per unit length is given by
$P'_{repeater} = \frac{P_{repeater}}{l}$.
We set the partial derivatives of this expression with respect to S and l to zero and solve the resulting equations with the Newton-Raphson method. In this way, we obtain the optimal repeater size and the optimal interconnect length that minimize the interconnect power consumption.
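To illustrate the Newton-Raphson step, the sketch below applies it to a simpler problem than the power expression: the stationarity conditions of the delay per unit length, assumed here to be $\tau/l = \frac{r_s}{l}(c_g + c_d) + \frac{r_s}{S} c_w + r_w c_g S + \frac{1}{2} c_w r_w l$. That case has a closed-form optimum, so the iteration can be checked. The inverter values are the 180 nm entries of Table 3.1; the wire values `R_W` and `C_W` and the starting point are assumptions for illustration only.

```python
import math

R_S = 8.0e3      # ohm, output resistance of a minimum sized inverter (Table 3.1)
C_G = 1.9e-15    # F, input capacitance (Table 3.1)
C_D = 4.8e-15    # F, output capacitance (Table 3.1)
R_W = 7.86e4     # ohm/m, wire resistance per unit length (assumed)
C_W = 2.0e-10    # F/m, wire capacitance per unit length (assumed)


def gradient(s, l):
    """Partial derivatives of the assumed tau/l with respect to S and l."""
    d_s = -R_S * C_W / s**2 + R_W * C_G
    d_l = -R_S * (C_G + C_D) / l**2 + 0.5 * R_W * C_W
    return d_s, d_l


def hessian_diagonal(s, l):
    """Second derivatives; the mixed derivative is zero for this expression."""
    return 2.0 * R_S * C_W / s**3, 2.0 * R_S * (C_G + C_D) / l**3


def newton_raphson(s, l, tol=1e-12, max_iter=100):
    """Drive both partial derivatives to zero, starting from (s, l)."""
    for _ in range(max_iter):
        g_s, g_l = gradient(s, l)
        h_s, h_l = hessian_diagonal(s, l)
        step_s, step_l = g_s / h_s, g_l / h_l
        s, l = s - step_s, l - step_l
        if abs(step_s) <= tol * abs(s) and abs(step_l) <= tol * abs(l):
            return s, l
    raise RuntimeError("Newton-Raphson did not converge")


# Start below the optimum; for this function the iterates then rise monotonically.
s_opt, l_opt = newton_raphson(s=10.0, l=1.0e-4)
s_closed = math.sqrt(R_S * C_W / (R_W * C_G))
l_closed = math.sqrt(2.0 * R_S * (C_G + C_D) / (R_W * C_W))

if __name__ == "__main__":
    print("S_opt:", s_opt, "closed form:", s_closed)
    print("l_opt (m):", l_opt, "closed form:", l_closed)
```

For the power expression the same loop applies once its own gradient and Hessian are supplied; there the two variables are coupled, so a full 2x2 Hessian solve would replace the diagonal division used here.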
2.6 Summary
In this chapter, we discuss three interconnect effects and two different optimizations for global interconnects. These effects are taken into account in our analysis in Chapter 3 and Chapter 4. Furthermore, building on the two optimizations, we propose a novel optimization for global interconnects.
Chapter 3
Global Interconnects Circuit Design
3.1 Global Interconnects
Optimal repeater insertion is a good method for reducing power consumption and chip area. In Chapter 2, we described various methodologies for optimizing global interconnects. However, these methods alone do not fully improve the performance. In this chapter, we introduce a novel methodology that optimizes power and area effectively.
3.2 Model Parameter
Before the optimization, we must acquire the process parameters that affect it. The technology parameters and equivalent circuit parameters are shown in Table 3.1. These parameters are obtained from the TSMC database and the International Technology Roadmap for Semiconductors (ITRS) database, where t is the interconnect thickness, $\varepsilon_r$ is the dielectric constant, ρ is the metal resistivity, and $V_{DD}$ is the power supply voltage.
Table 3.1 also includes the input capacitance $c_g$, the output capacitance $c_d$, and the output resistance $r_s$ for a minimum sized inverter.
Tech. Node (nm)    180     130     90      65      45
t (nm)             1000    670     482     319     236
εr                 3.75    3.3     2.8     2.5     2.1
ρ (10^-8 Ω·m)      2.2     2.2     2.2     2.2     2.2
ca (fF/μm^2)       0.039   0.053   0.065   0.057   0.072
cf (fF/μm)         0.05    0.07    0.058   0.065   0.052
cc (fF)            0.09    0.046   0.029   0.015   0.01
rs (kΩ)            8       9.5     10      15.8    12.5
cg (fF)            1.9     1.33    1.1     1.03    0.9
cd (fF)            4.8     3.32    2.04    1.22    0.6
Ioffn (μA/μm)      0.2     2       3.56    20      35.5
VDD (V)            1.8     1.2     1       0.7     0.6
Table 3.1 Technology and equivalent circuit parameters
3.3 Model of Global Interconnects
Repeater Model
We use repeaters to relay the signal in the interconnect. The repeater model is presented in Figure 3.1. It consists of two minimum sized inverters and a segment of metal wire. The repeater has an input capacitance of $c_g$, an output capacitance of $c_d$, and an output resistance of $r_s$. Therefore, for a repeater of size S, the total input capacitance is $C_g = S \times c_g$, the total output capacitance is $C_d = S \times c_d$, and the total output resistance is $R_{tr} = r_s/S$.
The interconnect is modeled as a distributed RC line with resistance per unit length $r_w$ and capacitance per unit length $c_w$. For an interconnect of length l, the total resistance is $R_w = l \times r_w$, and the total capacitance is $C_w = l \times c_w$.
Figure 3.1 Repeater RC model
On-chip Interconnect Model
The cross section of global interconnects is shown in Figure 3.2, where W is the width, SP is the spacing, and t is the thickness. $c_a$ is the parallel plate capacitance to the top and bottom metal layers and is proportional to the interconnect width. $c_f$ is the fringing capacitance. $c_c$ is the coupling capacitance between neighboring interconnects and is inversely proportional to the interconnect spacing.
The interconnect resistance per unit length is $r_w = \rho/(W t)$, where ρ is the metal resistivity.
Figure 3.2 Cross section of global interconnects
According to TSMC 0.18μm technology, we can obtain $c_a$, $c_f$, and $c_c$ respectively. The interconnect capacitance per unit length $c_w$ is
$c_w = c_a \times W + c_f + \frac{c_c}{SP}$. (3.1)
Furthermore, we can use MATLAB to plot the 3D graph for $c_w$ as shown in Figure 3.3.
Figure 3.3 Extracted capacitance cw as a function of width and spacing for 180nm technology
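As a worked example of the wire model, the sketch below evaluates the capacitance and resistance per unit length for the 180 nm column of Table 3.1. It assumes the capacitance has the form $c_w = c_a W + c_f + c_c/SP$ and uses $r_w = \rho/(W t)$; the function names and the sample width and spacing (taken from Table 3.3) are chosen for illustration.

```python
# 180 nm entries of Table 3.1
C_A = 0.039      # fF/um^2, parallel plate capacitance
C_F = 0.05       # fF/um, fringing capacitance
C_C = 0.09       # fF, coupling capacitance coefficient
RHO = 2.2e-8     # ohm*m, metal resistivity
T_NM = 1000.0    # nm, interconnect thickness


def wire_capacitance_per_um(width_um, spacing_um):
    """c_w in fF/um, assuming c_w = c_a*W + c_f + c_c/SP."""
    return C_A * width_um + C_F + C_C / spacing_um


def wire_resistance_per_um(width_um):
    """r_w in ohm/um from r_w = rho / (W * t)."""
    cross_section_m2 = (width_um * 1e-6) * (T_NM * 1e-9)
    return RHO / cross_section_m2 * 1e-6   # convert ohm/m to ohm/um


if __name__ == "__main__":
    # A small width/spacing sweep, in the spirit of Figure 3.3
    for w in (0.28, 0.56, 1.12):
        row = [round(wire_capacitance_per_um(w, sp), 4) for sp in (0.28, 0.64, 1.28)]
        print("W =", w, "um -> c_w (fF/um) at SP = 0.28, 0.64, 1.28 um:", row)
    print("r_w at W = 0.28 um:", wire_resistance_per_um(0.28), "ohm/um")
```

At W = 0.28 μm and SP = 0.64 μm this gives roughly 0.2 fF/μm and 0.079 Ω/μm.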
3.4 Performance of Global Interconnects
Time Constant
After we obtain $C_g$, $C_d$, $R_{tr}$, $C_w$, and $R_w$ from Section 3.3, the time constant τ of the repeater model is [3]
$\tau = r_s(c_g + c_d) + \frac{r_s}{S} c_w l + r_w c_g S l + \frac{1}{2} r_w c_w l^2$. (3.3)
The data rate of a single interconnect with bandwidth $BW_{single}$ is inversely proportional to the time constant. For a voltage swing from 5% of $V_{DD}$ to 95% of $V_{DD}$, the bandwidth of a single interconnect $BW_{single}$ is defined as
$BW_{single} = \frac{1}{3.32 \times \tau}$.
The global interconnects with repeater insertion are shown in Figure 3.4, where L is the total interconnect length. The global interconnects that link the blocks of an SoC usually consist of a large number (n) of parallel interconnects, and the total bandwidth $BW_{total}$ is
$BW_{total} = n \times BW_{single}$.
Figure 3.4 Global interconnects with repeater insertion
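The relation between time constant and bandwidth can be sketched numerically. The code below assumes the time constant $\tau = r_s(c_g + c_d) + \frac{r_s}{S} c_w l + r_w c_g S l + \frac{1}{2} r_w c_w l^2$ and the definition $BW_{single} = 1/(3.32\tau)$, both as read from this section and from the later expression (3.29). The repeater size, segment length, wire values, and the number of parallel wires are illustrative assumptions.

```python
def time_constant(s, l_um, r_s, c_g, c_d, r_w, c_w):
    """Stage time constant in seconds.

    r_s in ohm, c_g and c_d in fF, r_w in ohm/um, c_w in fF/um, l_um in um.
    """
    tau_fs = (r_s * (c_g + c_d)
              + (r_s / s) * c_w * l_um
              + r_w * c_g * s * l_um
              + 0.5 * r_w * c_w * l_um ** 2)
    return tau_fs * 1e-15            # ohm * fF = 1e-15 s


def single_bandwidth(tau):
    """BW_single in bit/s, assuming BW_single = 1 / (3.32 * tau)."""
    return 1.0 / (3.32 * tau)


def total_bandwidth(n, tau):
    """BW_total for n parallel interconnects."""
    return n * single_bandwidth(tau)


# 180 nm inverter values from Table 3.1; wire values, S, l and n are assumed.
tau = time_constant(s=100.0, l_um=800.0, r_s=8000.0, c_g=1.9, c_d=4.8,
                    r_w=0.0786, c_w=0.2015)
bw_single = single_bandwidth(tau)
bw_total = total_bandwidth(8, tau)

if __name__ == "__main__":
    print("tau (s):", tau)
    print("BW_single (bit/s):", bw_single)
    print("BW_total for n = 8 (bit/s):", bw_total)
```

With these inputs the single-wire bandwidth comes out in the low Gbps range, the same order as the 3 Gbps listed in Table 3.3.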
Power
With technology scaling, switching power is no longer the only component of the total power consumption. The leakage power increases rapidly, and the short-circuit power has also been shown to be a significant fraction (up to 15%) of the total power consumption for low-power and high-speed designs [4]. The three components of the total power are analyzed as follows.
Switching power mode
The switching power model of the repeater is shown in Figure 3.5. Switching power is dissipated when current in the repeater charges or discharges $C_g$, $C_w$, and $C_d$. The expression of switching power is
Figure 3.5 Switching power model of repeater
Short-circuit power mode
The short-circuit power model of the repeater is shown in Figure 3.6(a). Short-circuit power is dissipated during a high-to-low or low-to-high transition. Both the NMOS and PMOS transistors are on for a short period of time, and a current is drawn from $V_{DD}$ through the two transistors to ground [5]. The input and output voltage and current waveforms are shown in Figure 3.6(b). We denote by $t_r$ the time for the input to rise from $V_{tn}$ to $V_{DD} - V_{tp}$. The short-circuit current waveform is approximated by a triangular wave [4]. The expression of short-circuit power is
Figure 3.6 Voltage and current waveforms of a CMOS inverter
Leakage power mode
For a long interconnect, we assume that half of the transmitted bits are ones and half are zeros. When an inverter has an input of one, the NMOS transistor is turned on and the leakage current is determined by the PMOS transistor. When an inverter has an input of zero, the PMOS transistor is turned on and the leakage current is determined by the NMOS transistor.
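The expression of leakage power is
$P_{leakage} = V_{DD} I_{leakage} = \frac{1}{2} V_{DD} (W_n I_{off_n} + W_p I_{off_p}) = \frac{1}{2} V_{DD} (W_{n_{min}} I_{off_n} + W_{p_{min}} I_{off_p}) S$,
where $W_{n_{min}}$ and $W_{p_{min}}$ are the NMOS and PMOS transistor widths of a minimum sized inverter.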
These three types of power constitute the power dissipation in one stage.
$P_{repeater} = P_{switching} + P_{short\text{-}circuit} + P_{leakage}$. (3.8)
The total power is evaluated for the global interconnects with repeater insertion shown in Figure 3.4. To simplify the analysis of the total power, we consider only $P_{switching}$, which accounts for up to 85% of the total power. The expression of total power $P_{total}$ is
$P_{total} = \frac{1}{2} \times BW_{total} \times p_1$,
where $BW_{total}$ is the total bandwidth of the global interconnects and $p_1$ is the energy dissipation of the single interconnect.
Area
The area of a single interconnect Asingle is shown in Figure 3.7. After we obtain the width and spacing of the interconnect, the area of a single interconnect Asingle is
$A_{single} = (W + SP) \times l \times \frac{L}{l}$. (3.13)
We implement the overall chip as in Figure 3.4 and place the repeaters under the global interconnects. Therefore, we only consider the area of the global interconnects.
The expression of total area $A_{total}$ is
$A_{total} = n \times A_{single}$. (3.14)
Figure 3.7 Area of a single interconnect with repeater insertion
Summary of Performance
According to the previous discussion, we observe that the bandwidth, power, and area of a single interconnect are affected by the interconnect width and spacing. Furthermore, $BW_{total}$, $P_{total}$, and $A_{total}$ are proportional to $BW_{single}$, $P_{single}$, and $A_{single}$ respectively. Therefore, we use MATLAB to plot 3D graphs of $P_{total}$, $BW_{total}$, and $A_{total}$ as functions of width and spacing. These 3D graphs are shown in Figure 3.8, Figure 3.9, and Figure 3.10 respectively.
Figure 3.8 MATLAB simulation for power vs. width and spacing
Figure 3.9 MATLAB simulation for bandwidth vs. width and spacing
Figure 3.10 MATLAB simulation for area vs. width and spacing
3.5 Figure of Merit for Optimization
The aim of global interconnect design is to obtain large bandwidth, small global interconnect area, and low power consumption simultaneously. According to the summary of performance in Section 3.4, the large bandwidth BWsingle requires small interconnect width and spacing, but the low power consumption Psingle and the small interconnect area Asingle require large interconnect width and spacing.
The width and spacing of the global interconnects affect overall chip performance, including the bandwidth, the power consumption, and the interconnect area, so a tradeoff among the three is needed. Therefore, a figure of merit (FOM) is used for the global interconnects. It accounts for the bandwidth, power consumption, and area simultaneously. The expression of FOM is
$FOM = \frac{BW_{total}}{P_{total} \times A_{total}}$. (3.15)
The proposed methodology optimizes the global interconnects to obtain the maximal FOM for the various technologies. It considers three parts for the global interconnects: 1) the optimal interconnect width and spacing, 2) the optimal repeater size and interconnect length, and 3) the optimal interconnect bandwidth.
Optimal Global Interconnects Width and Spacing
The FOM in (3.15) depends on the interconnect width and spacing, whose optimal values have not yet been calculated. In this section, we use (3.11) and (3.13) to obtain the product of power and area for a single interconnect. The expression is
$P_{single} \times A_{single} = \{f \times [c_w \times L + (c_g \times S + c_d \times S) \times \frac{L}{l}] \times V_{dd}^2\} \times [(W + SP) \times L]$. (3.16)
Minimum power mode
The minimum power for a single interconnect is reached as the interconnect spacing tends towards infinity: when the spacing increases, the capacitance decreases. We define the infinite interconnect spacing as the spacing at which the parallel plate capacitance is 10 times the coupling capacitance. Therefore, we substitute the minimum interconnect
Minimum area mode
The minimum area for a single interconnect is obtained when the interconnect width and spacing are both at their smallest values.
Minimum product of power and area mode
We can increase the interconnect spacing to reduce the power of a single interconnect, but this also increases the area. Therefore, minimizing the product of power and area is important for the overall performance. In this section, we simplify (3.16) to
$P_{single} \times A_{single} \propto K \times [c_w \times (W + SP)]$ (3.19)
$K = f \times L \times V_{DD}^2$. (3.20)
To achieve the minimum product of power and area, the interconnect width must be minimum. On this premise, the optimal interconnect spacing is calculated by setting the derivative of $P_{single} \times A_{single}$ with respect to SP to zero.
$\frac{\partial(P_{single} \times A_{single})}{\partial SP} = 0$ (3.21)
We solve (3.21) and the optimal interconnect spacing is
$SP_{opt} = \sqrt{\frac{c_c \times W}{c_a \times W + c_f}}$. (3.22)
Using (3.16) and the technology parameters of TSMC 0.18μm, we use MATLAB to plot $P_{single} \times A_{single}$ versus the interconnect spacing, as shown in Figure 3.11.
Figure 3.11 MATLAB simulation for product of power and area vs. minimal width and spacing
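The spacing optimization can be reproduced with a few lines of code. The sketch below assumes $c_w = c_a W + c_f + c_c/SP$ and minimizes $c_w \times (W + SP)$ over SP at fixed width, which gives the closed form $SP = \sqrt{c_c W / (c_a W + c_f)}$. It uses the 180 nm entries of Table 3.1 and the width W = 0.28 μm from Table 3.3; the brute-force grid is only a cross-check and its range is an arbitrary choice.

```python
import math

C_A = 0.039     # fF/um^2 (Table 3.1, 180 nm)
C_F = 0.05      # fF/um   (Table 3.1, 180 nm)
C_C = 0.09      # fF      (Table 3.1, 180 nm)
W_MIN = 0.28    # um, interconnect width listed in Table 3.3


def power_area_metric(width_um, spacing_um):
    """Quantity proportional to P_single * A_single: c_w * (W + SP)."""
    c_w = C_A * width_um + C_F + C_C / spacing_um
    return c_w * (width_um + spacing_um)


def optimal_spacing(width_um):
    """Closed-form minimizer of power_area_metric over the spacing."""
    return math.sqrt(C_C * width_um / (C_A * width_um + C_F))


sp_opt = optimal_spacing(W_MIN)

# Brute-force cross-check on a 1 nm grid from 0.2 um to about 2.2 um
grid = [0.2 + 0.001 * i for i in range(2000)]
sp_grid = min(grid, key=lambda sp: power_area_metric(W_MIN, sp))

if __name__ == "__main__":
    print("closed-form SP_opt (um):", sp_opt)
    print("grid-search SP_opt (um):", sp_grid)
```

Both values come out near 0.64 μm, which agrees with the spacing listed in Table 3.3.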
Optimal Repeater Size and Optimal Interconnect Length
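After we obtain the optimal interconnect width and spacing, we substitute (3.4), (3.12), and (3.14) into (3.15). The FOM is written as
$FOM = \frac{BW_{total}}{P_{total} \times A_{total}} = \frac{BW_{total}}{(\frac{1}{2} \times BW_{total} \times energy) \times [\frac{BW_{total}}{BW_{single}} \times (W + SP) \times L]} = \frac{2}{3.32 \times \tau \times BW_{total} \times V_{dd}^2 \times L^2 \times (W + SP) \times c} \propto \frac{1}{K} \times \frac{1}{\tau \times c}$. (3.23)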
We observe that the FOM is inversely proportional to the product of the time constant and the capacitance, τ×c. The expression is
$\tau \times c = [r_s(c_g + c_d) + \frac{r_s}{S} c_w l + r_w c_g S l + \frac{1}{2} r_w c_w l^2] \times c$. (3.24)
The optimal repeater size is obtained by setting the derivative of τ×c with respect to S to zero.
$\frac{\partial(\tau \times c)}{\partial S} = 0$ (3.25)
We solve (3.25) and the optimal repeater size is
$S_{opt} = \sqrt{\frac{r_s c_w}{r_w c_g}}$. (3.26)
We use (3.3), (3.26), and the optimal interconnect width and spacing in HSPICE simulations. The results are shown in Figure 3.12 and Figure 3.13. Figure 3.12 plots the interconnect bandwidth versus the interconnect length. Figure 3.13 plots the interconnect bandwidth per energy versus the interconnect length.
Figure 3.12 Variation of bandwidth with number of repeaters (HSPICE simulation; x-axis: number of repeaters per cm; y-axis: bandwidth)
Figure 3.13 Variation of bandwidth per energy with number of repeaters (VDD = 1.8V; x-axis: number of repeaters per cm, 0 to 50; y-axis: bandwidth per bit-energy (bps/pJ), 1.00E+07 to 1.50E+08)
The optimal interconnect length is obtained by setting the derivative of τ×c with respect to l to zero.
$\frac{\partial(\tau \times c)}{\partial l} = 0$ (3.27)
We solve (3.27) and the optimal interconnect length is
$l_{opt} = \sqrt{\frac{0.7 r_s c_g}{r_w c_w}}$. (3.28)
Therefore, we use (3.3), (3.26), (3.28), and the optimal interconnect width and spacing in HSPICE again. The simulation is shown in Figure 3.14. We obtain the maximum value of interconnect bandwidth per energy. Therefore, we claim that the interconnect circuit is optimized.
Figure 3.14 Variation of bandwidth per energy with number of repeaters (VDD = 1.8V; x-axis: number of repeaters per cm, 0 to 50; y-axis: bandwidth per bit-energy (bps/pJ), 1.00E+07 to 1.50E+08)
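The optimal repeater size and segment length can be evaluated directly. The sketch below assumes the closed forms $S_{opt} = \sqrt{r_s c_w / (r_w c_g)}$ and $l_{opt} = \sqrt{0.7 r_s c_g / (r_w c_w)}$, as read from (3.26) and (3.28). The inverter values are the 180 nm entries of Table 3.1; the wire values are computed from Table 3.1 at W = 0.28 μm and SP = 0.64 μm, under the assumption $c_w = c_a W + c_f + c_c/SP$.

```python
import math

R_S = 8000.0    # ohm, Table 3.1 (180 nm)
C_G = 1.9       # fF,  Table 3.1 (180 nm)

# Wire parameters at W = 0.28 um, SP = 0.64 um, t = 1000 nm
R_W = 2.2e-8 / (0.28e-6 * 1000e-9) * 1e-6      # ohm/um, from r_w = rho / (W * t)
C_W = 0.039 * 0.28 + 0.05 + 0.09 / 0.64        # fF/um, assumed capacitance model


def optimal_repeater_size(r_s, c_g, r_w, c_w):
    """S_opt as a multiple of the minimum sized inverter."""
    return math.sqrt(r_s * c_w / (r_w * c_g))


def optimal_segment_length_um(r_s, c_g, r_w, c_w):
    """l_opt in um; the factor 0.7 is taken as given from (3.28)."""
    return math.sqrt(0.7 * r_s * c_g / (r_w * c_w))


s_opt = optimal_repeater_size(R_S, C_G, R_W, C_W)
l_opt_um = optimal_segment_length_um(R_S, C_G, R_W, C_W)
repeaters_per_cm = 1.0e4 / l_opt_um

if __name__ == "__main__":
    print("S_opt:", s_opt)
    print("l_opt (um):", l_opt_um)
    print("repeaters per cm:", repeaters_per_cm)
```

The result is a segment length a little above 800 μm, or about 12 repeaters per cm, which is close to the 13 repeaters/cm listed in Table 3.3.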
Optimal Interconnect Bandwidth
After we obtain the optimal interconnect width and spacing, the optimal repeater size, and the optimal interconnect length, we substitute them into (3.3) and obtain the optimal interconnect bandwidth. The expression of the optimal interconnect bandwidth is
$BW_{opt} = \frac{1}{3.32 \times \tau} = \frac{1}{(3 r_s c_g + r_s c_d) \times 3.32}$. (3.29)
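The simplification in (3.29) can be checked by substituting the optimal size and length into the time constant. The sketch below assumes $\tau = r_s(c_g + c_d) + \frac{r_s}{S} c_w l + r_w c_g S l + \frac{1}{2} r_w c_w l^2$ together with $S_{opt} = \sqrt{r_s c_w/(r_w c_g)}$ and $l_{opt} = \sqrt{0.7 r_s c_g/(r_w c_w)}$. The wire values are illustrative assumptions; under these formulas the result does not depend on them.

```python
import math

R_S = 8.0e3       # ohm, Table 3.1 (180 nm)
C_G = 1.9e-15     # F,   Table 3.1 (180 nm)
C_D = 4.8e-15     # F,   Table 3.1 (180 nm)
R_W = 7.86e4      # ohm/m (assumed)
C_W = 2.015e-10   # F/m   (assumed)


def time_constant(s, l):
    """Assumed stage time constant in seconds (SI units throughout)."""
    return (R_S * (C_G + C_D) + (R_S / s) * C_W * l
            + R_W * C_G * s * l + 0.5 * R_W * C_W * l ** 2)


s_opt = math.sqrt(R_S * C_W / (R_W * C_G))
l_opt = math.sqrt(0.7 * R_S * C_G / (R_W * C_W))

tau_full = time_constant(s_opt, l_opt)       # full expression at the optimum
tau_approx = 3.0 * R_S * C_G + R_S * C_D     # simplified form used in (3.29)
bw_opt = 1.0 / (3.32 * tau_approx)

if __name__ == "__main__":
    print("tau (full)  :", tau_full)
    print("tau (approx):", tau_approx)
    print("BW_opt (bit/s):", bw_opt)
```

At the optimum the wire terms collapse to $(2\sqrt{0.7} + 0.35) r_s c_g \approx 2.02 r_s c_g$, so the full time constant is about $3.02 r_s c_g + r_s c_d$, within half a percent of the simplified form. The resulting bandwidth is roughly 3.6 Gbps for the 180 nm values, the same order as the 3 Gbps in Table 3.3.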
3.6 Optimization Flow
The optimization flow for the global interconnects is shown in Figure 3.15. It includes three methods: 1) the optimization for the minimum product of power and area mode, 2) the optimization for the minimum area mode, and 3) the optimization for the minimum power mode.
Figure 3.15 Optimization flow for global interconnects (flowchart boxes: Begin Design; To choose process technology; To choose the optimization for minimal power; To choose the optimization for minimal area; To choose the optimization for minimal product of power and area; Precise Ca, Cf, Cc; Precise cw, rw; Precise rs, cg; Decide the optimal interconnect Width (Wopt) and Spacing (SPopt); Decide the optimal Repeater Size (Sopt) and Interconnect Length (lopt); Decide the optimal bandwidth (BWopt) of the interconnect; Optimization END)
3.7 Optimal Design Parameter
According to the optimization flow, we optimize the minimum product of power and area to obtain the maximum FOM. Table 3.2 shows the optimal design expressions of the global interconnects.
Parameter                        Optimal design
Interconnect width (Wopt)        Minimum width
Interconnect spacing (SPopt)     $SP_{opt} = \sqrt{\frac{c_c \times W}{c_a \times W + c_f}}$
Interconnect length (lopt)       $l_{opt} = \sqrt{\frac{0.7 r_s c_g}{r_w c_w}}$
Table 3.2 Optimal design expression
We substitute the model parameters into the previous equations and calculate the optimal design parameters, which are shown in Table 3.3.
Supply voltage - repeaters    1.8V, 13 repeaters/cm
Interconnect dimensions       W = 0.28μm, SP = 0.64μm
Repeater dimensions           Wn = 9.9μm, Wp = 35.2μm
Bandwidth                     3Gbps
Total power                   9.2mW
Total area                    9200μm^2
Table 3.3 Optimal design value
3.8 Summary
In this chapter, we improve the optimization for the global interconnects. The optimal design flow is proposed. We optimize the interconnect width and spacing, the repeater size and interconnect length, and the interconnect bandwidth. Finally, according to the optimal design parameter, we claim that the interconnect circuit is optimized.
Chapter 4
Global Interconnects Circuit Implementation
4.1 Single Interconnect Structure
Figure 4.1 is the typical single interconnect. According to the optimal design value, the data rate is 3Gbps and the interconnect length L is 10000μm. Figure 4.2 shows the pre-simulation results of the last repeater output. Table 4.1 shows the total power consumption and jitter of the single interconnect.
Figure 4.1 Single interconnect with the optimal design
Figure 4.2 Corners of the single interconnect: (a) TT (b) SS (c) FF (d) SF (e) FS (waveforms from 0 to 600 ps, 0 to 1.8 V)
L = 10000μm    TT       SS       FF       SF       FS
Power          8.1mW    7.5mW    9.1mW    8mW      82mW
Jitter(p-p)    43ps     63ps     35.6ps   43.8ps   44.5ps
Table 4.1 Power and jitter of the single interconnects
4.2 Global Interconnects Structure
Typical global interconnects implementation
Figure 4.3 is the typical layout of unidirectional global interconnects. Figure 4.4 shows the coupling effect of crosstalk for a simple case of three parallel lines with the optimal repeaters as drivers. In general, the coupling capacitance is inversely proportional to the interconnect spacing and proportional to the length over which the interconnects run in parallel. The cross-coupling capacitor $C_c$ lies in the horizontal spacing between the global interconnects.
Figure 4.3 Unidirectional global interconnects
Figure 4.4 Geometrical RC model of the parallel interconnect
On-chip global interconnects implementation
To reduce the impact of capacitive coupling noise, we use the interleaved repeaters for the global interconnects which is described in [11]. The layout structure