RLC coupling-aware simulation and on-chip bus encoding for delay reduction


RLC Coupling-Aware Simulation and On-Chip Bus Encoding for Delay Reduction

Shang-Wei Tu, Yao-Wen Chang, and Jing-Yang Jou

Abstract—This paper shows that the worst case switching pattern that incurs the longest bus delay while considering the RLC effect is quite different from that while considering the RC effect alone. It implies that the existing encoding schemes based on the RC model may not improve or possibly worsen the delay when the inductance effects become dominant. A bus-invert method is also proposed to reduce the on-chip bus delay based on the RLC model. Simulation results show that the proposed encoding scheme signiﬁcantly reduces the worst case coupling delay of the inductance-dominated buses.

Index Terms—Bus-invert method, coupling, inductance, interconnect delay, worst case switching pattern.

I. INTRODUCTION

With aggressive scaling of transistor size, interconnect delay increasingly dominates chip performance in deep-submicrometer designs [17], [18], [20]. Moreover, as process technology advances and clock frequencies rise beyond the gigahertz range, the inductance effects of on-chip interconnect structures have become increasingly significant [7], [18]. On-chip inductance can affect interconnect in high-performance circuit designs in several ways. Circuit performance is reduced by the increase in wire delay [5], [13]. The long-range inductive crosstalk can cause serious signal integrity [...] wire inductance may damage devices. Finally, inductance in power and ground grids can increase the noise in the supply and ground voltages when large currents flow, which is known as the ground-bounce problem. Therefore, inductance effects cannot be neglected in today’s high-performance circuit designs, especially for global interconnects such as clock wires and signal buses.

Most existing works focus on reducing the effects of coupling capacitance on the bus structure. Few works in the literature consider inductance effects on the bus structure when developing encoding schemes to reduce bus delay. Considering only the capacitive coupling effect, Victor and Keutzer [21], Baek et al. [1], Hirose and Yasuura [10], and Sotiriadis and Chandrakasan [19] proposed bus encoding techniques to eliminate crosstalk delay.

Since most previous works consider only capacitance effects on the bus, the worst case switching pattern that incurs the largest delay is taken to be the one in which adjacent wires simultaneously switch in opposite directions. However, with the RLC circuit model for the bus structure, we find that the worst case switching pattern with the largest on-chip bus delay occurs when all wires simultaneously switch in the same direction. This worst case pattern is precisely the best case pattern of the coupled RC model. Further, the best case pattern with the RLC model occurs when the central wire of the bus switches in one direction while all other wires switch in the opposite direction, and this best case pattern is exactly the worst case pattern with the RC model. See Fig. 1 for examples of the worst case and best case switching patterns on a 5-bit bus. The worst case patterns with the maximum on-chip bus delay are therefore completely different for the RC and RLC models. Hence, since inductance cannot be neglected in today’s high-performance circuit design, it is very important to consider RLC effects when developing encoding schemes to reduce the bus delay.

Fig. 1. LC cross-coupled 5-bit bus structure. (a) Switching pattern of the worst case delay in the RLC model. (b) Switching pattern of the best case delay in the RLC model. (↑: switch from “0” to “1.” ↓: switch from “1” to “0.”)

Manuscript received March 29, 2005; revised July 19, 2005 and September 23, 2005. This work was supported by the MediaTek Research Center at NCTU under Grant Q583. This paper was recommended by Associate Editor R. Suaya.

S.-W. Tu and J.-Y. Jou are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: kuma@athena.ee.nctu.edu.tw; jyjou@faculty.nctu.edu.tw).

Y.-W. Chang is with the Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: ywchang@cc.ee.ntu.edu.tw).

Digital Object Identifier 10.1109/TCAD.2005.860956

With these findings on the best case and worst case patterns, we propose a new encoding scheme for on-chip buses that minimizes coupling delay when inductance effects dominate. The key idea is to alleviate inductive coupling effects by transforming the data sequences transmitted through on-chip buses. The encoder and decoder architectures must nevertheless be of low complexity so that the power and delay overheads of the codec circuitry are compensated by the significant reduction in bus delay.

The rest of this paper is organized as follows. Section II describes the parameters and basic assumptions used in our study for the bus structure and then gives the working flow. Section III presents simulations using the RC model, and Section IV presents simulations using the RLC model. The method and circuitry of our encoding (decoding) scheme are described in Section V, and simulation results are shown in Section VI. Finally, Section VII concludes the paper and discusses our future work.

II. PRELIMINARY

In this work, we used the bus structure shown in Fig. 1 to conduct our simulations. We assume that all drivers (receivers) have uniform size and that all signal wires have uniform width, spacing, and length. The length, width, and pitch of the signal wires were 2000, 0.8, and 2 µm, respectively. The width and pitch of the power/ground wires were 2 and 13 µm, respectively. The heights of all wires were set to 2 µm, and the signal rise/fall time was set to 100 ps. With these feasible parameters [7], [8], [18], we used the well-known 3-D field solver FastCap [15] to extract the self and coupling capacitances and FastHenry [14] to extract the resistance, self inductance, and coupling inductance. With these extracted RLC parameters, we constructed the coupled RLC and RC circuit models. Both circuit models were constructed as π segments using series resistance (or series resistance and inductance for the RLC model) and shunt capacitance. Finally, the circuits were simulated using HSPICE. The overall flowchart is illustrated in Fig. 2. In our simulations, we assumed that synchronous latches are located at the transmitter side, so all the signals on the bus switch at the same time.

Fig. 2. Working flow.

TABLE I
SIMULATION RESULTS OF A 5-bit BUS CONSIDERING ONLY RC EFFECTS (0: NO TRANSITION)
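The π-segment construction described in Section II can be sketched as a small netlist generator. This is only an illustration under stated assumptions: the function name `pi_segment_netlist`, the node naming, and every element value below are hypothetical placeholders rather than the values extracted with FastCap and FastHenry, and the power/ground wires are omitted. The output uses generic SPICE-style element cards (R, L, C, and K for mutual inductive coupling).

```python
from itertools import combinations


def pi_segment_netlist(n_wires, n_segments, r, l, c_gnd, c_cpl, k):
    """Return SPICE-style element lines for n_wires coupled RLC lines.

    r, l, c_gnd: per-segment series resistance, series inductance, and
                 wire-to-ground capacitance (hypothetical values).
    c_cpl:       per-segment coupling capacitance between adjacent wires.
    k:           function (i, j) -> inductive coupling coefficient.
    """
    lines = []

    def node(w, s):
        return f"w{w}_n{s}"

    def mid(w, s):
        return f"w{w}_m{s}"

    for s in range(n_segments):
        for w in range(n_wires):
            a, b = node(w, s), node(w, s + 1)
            # Series branch of the pi segment: R followed by L.
            lines.append(f"R{w}_{s} {a} {mid(w, s)} {r}")
            lines.append(f"L{w}_{s} {mid(w, s)} {b} {l}")
            # Shunt branch: half of the ground capacitance at each end.
            lines.append(f"Cga{w}_{s} {a} 0 {c_gnd / 2}")
            lines.append(f"Cgb{w}_{s} {b} 0 {c_gnd / 2}")
        for w in range(n_wires - 1):
            # Capacitive coupling only between adjacent wires.
            lines.append(f"Cca{w}_{s} {node(w, s)} {node(w + 1, s)} {c_cpl / 2}")
            lines.append(f"Ccb{w}_{s} {node(w, s + 1)} {node(w + 1, s + 1)} {c_cpl / 2}")
        for i, j in combinations(range(n_wires), 2):
            # Inductive coupling between every wire pair (long-range effect).
            lines.append(f"K{i}_{j}_{s} L{i}_{s} L{j}_{s} {k(i, j)}")
    return lines


if __name__ == "__main__":
    # Hypothetical values chosen only to make the sketch runnable.
    netlist = pi_segment_netlist(
        n_wires=5, n_segments=2, r=10.0, l=1e-9,
        c_gnd=20e-15, c_cpl=40e-15, k=lambda i, j: 0.5 / abs(i - j),
    )
    print("\n".join(netlist[:6]))
```

Dropping the `L` and `K` lines from the same generator would give the corresponding coupled RC model, which mirrors how the text describes the two models as differing only in the series inductance.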

III. SIMULATIONS WITH THE RC CIRCUIT MODEL

In this section, we simulate all switching patterns on the 5-bit bus structure considering only RC effects. The simulation results are listed in Table I. Note that the total number of switching patterns is $2^5 = 32$ (without considering nontransition cases). However, switching from “0” to “1” is symmetric to switching from “1” to “0” for bus delay computation, so the complete set of switching patterns can be reduced to $2^5/2 = 16$. In addition, the 5-bit bus structure is symmetric with respect to the central signal wire. For example, the switching patterns ↓↓↑↑↑ and ↑↑↑↓↓ have the same delay effect on the central signal wire. Hence, the complete set of switching patterns can be further reduced to ten patterns, as listed in Table I (the first ten patterns).

TABLE II
SIMULATION RESULTS OF A 5-bit BUS CONSIDERING RLC EFFECTS (Vdd = 1.2 V)

From Table I, the three patterns ↓↓↑↓↓, ↑↓↑↓↓, and ↑↓↑↓↑ result in significantly larger delays on the central signal wire. Thus, when we consider only the resistance, self capacitance, and coupling capacitance of interconnects, the worst case switching pattern that incurs the largest delay occurs when adjacent wires simultaneously switch in opposite directions. Therefore, all the previously mentioned encoding schemes [1], [10], [19], [21] can improve the worst case bus delay under the RC model.
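The reduction from 32 switching patterns to 16 and then to ten, described in this section, can be checked by brute-force enumeration. The sketch below is illustrative only; the function name `reduce_patterns` is an assumption, and the code counts patterns without simulating any delay.

```python
from itertools import product

# Translation table that swaps rising and falling transitions.
SWAP = str.maketrans("↑↓", "↓↑")


def reduce_patterns(n_bits=5):
    """Count switching patterns before and after the two symmetry reductions."""
    patterns = ["".join(p) for p in product("↑↓", repeat=n_bits)]
    # Up/down symmetry: a pattern and its opposite give the same delay.
    by_direction = {min(p, p.translate(SWAP)) for p in patterns}
    # Additionally merge patterns mirrored about the central wire.
    by_mirror = {
        min(p, p[::-1], p.translate(SWAP), p[::-1].translate(SWAP))
        for p in patterns
    }
    return len(patterns), len(by_direction), len(by_mirror)


if __name__ == "__main__":
    print(reduce_patterns())  # (32, 16, 10)
```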

IV. SIMULATIONS WITH THE RLC CIRCUIT MODEL

In this section, we first simulate all switching patterns on the 5-bit bus structure considering the RLC effects of bus interconnects, and then increase the wire capacitance to see whether the worst case switching pattern changes as the wire capacitance becomes dominant. The simulation results for the 5-bit bus are shown in Table II. From Table II, we observe that the worst case pattern changes from ↓↓↑↓↓ (in Table I) to ↑↑↑↑↑ and the best case pattern changes from ↑↑↑↑↑ (in Table I) to ↓↓↑↓↓. The worst case and best case switching patterns are thus completely different under RC and RLC effects. Consequently, as the technology advances and the clock frequency continues to increase, it is very important to consider RLC effects on the bus structure when deriving encoding schemes to reduce bus delay. Otherwise, the encoding schemes might not improve, or might even worsen, the on-chip bus delay because of the redundant logic and wires. Further, we also observe that the largest overshoot noise occurs for the pattern ↑↑↑↑↑, as shown in Table II.

Fig. 3. Delays (percentage of that of pattern 00↑00) of the worst case switching pattern with various wire capacitances.

Why does the worst case switching pattern ↑↑↑↑↑ result in the largest bus delay when considering RLC effects on the 5-bit bus? Theoretically speaking, this is mainly due to two factors. 1) Inductance becomes dominant due to higher frequency (which increases the impedance $j\omega L$ of the wire inductance) and longer interconnects (longer return path). 2) Inductance has a long-range effect. From Faraday’s law [2], as shown in (1), the electromotive force induced in a closed circuit is equal to the negative rate of increase of the magnetic flux linking the circuit. We have

$$V_j = -\frac{d\Phi_{ij}}{dt} \quad \text{with} \quad \Phi_{ij} = \int_{S_j} \mathbf{B}_i \cdot d\mathbf{s}_j \tag{1}$$

where $V_j$ is the electromotive force induced in loop $j$ due to the time-varying current $I_i$ in loop $i$. Here, $\Phi_{ij}$ is the magnetic flux in loop $j$ due to the current $I_i$, $\mathbf{B}_i$ is the magnetic flux density arising from current $I_i$, and $S_j$ represents the surface bounded by loop $j$. The orientation of $\mathbf{B}_i$ can be determined from the right-hand rule. Therefore, as shown in Fig. 1(a), the time-varying (increasing) current of the leftmost aggressor wire induces a downward time-varying (increasing) magnetic field on the victim wire. This current thus results in a positive mutual flux $\Phi$ that also increases with time. Finally, from (1), the induced voltage on the victim loop is negative; that is, the induced current on the victim wire flows in the direction opposite to the victim current. Hence, when all neighboring wires simultaneously switch in the same direction as the victim wire, they all induce currents in the opposite direction on the victim wire, as shown in Fig. 1(a). This implies that the charging time (delay) increases due to the long-range coupling. We can conclude that, as inductance effects dominate, the worst case switching pattern with maximum delay occurs when all wires simultaneously switch in the same direction. These patterns also result in the largest mutual noise among the wires.
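As a compact restatement of the argument around (1), the flux linkage can be written through a mutual inductance. The symbol $M_{ij}$ is an assumption introduced here for illustration (it does not appear in the original text), and the sum simply superposes the contributions of all aggressor loops $i$ on the victim loop $j$:

```latex
\Phi_{ij} = M_{ij}\, I_i
\quad\Longrightarrow\quad
V_j = -\sum_{i \neq j} \frac{d\Phi_{ij}}{dt}
    = -\sum_{i \neq j} M_{ij}\, \frac{dI_i}{dt}
```

Under this sketch, when every aggressor current changes in the same direction as the victim current and the $M_{ij}$ are positive, all terms of the sum have the same sign and the opposing voltage on the victim is largest. An aggressor that switches in the opposite direction flips the sign of its term and partly cancels the others, which is consistent with the best case pattern reported for the RLC model.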

Since Cao et al. [3] claimed that the worst case switching pattern for a 5-bit bus should be ↑↓↑↓↑ when considering capacitive and inductive coupling, we also conducted simulations to see whether the worst case switching pattern changes when capacitance effects become dominant. We simulated the extracted RLC circuit model of the 5-bit bus while increasing the wire capacitance step by step. The simulation results are shown in Fig. 3, and the complete switching patterns when capacitance effects dominate (ten times the wire capacitance) are listed in Table III.

From Fig. 3 and Table III, we observe that the worst case switching pattern for the 5-bit bus changes from ↑↑↑↑↑ to ↑↓↑↓↑. Table III indicates that, to identify the worst (best) case switching pattern when wire capacitance dominates, we should first consider the immediate neighbors for the worst (best) case capacitive coupling and then consider the farther neighbors for the worst (best) case inductive coupling.

TABLE III
SIMULATION RESULTS OF THE 5-bit BUS WHEN WIRE CAPACITANCE BECOMES DOMINANT (TEN TIMES WIRE CAPACITANCE)

Fig. 4. Delays (percentage of that of the pattern 00↑00) of the worst case switching pattern with various signal rise times.

To further investigate the change of the worst case switching pattern when capacitance effects dominate, we also conducted simulations with varying signal rise times. As shown in Fig. 4, the worst case switching pattern for the 5-bit bus also changes from ↑↑↑↑↑ to ↑↓↑↓↑ when we increase the signal rise time (i.e., decrease the working frequency). This phenomenon conforms to the trend observed when capacitance effects dominate, since the impedance of the wire capacitance increases as the working frequency decreases. Note that the frequency of interest here is 583.3 MHz when the rise time is set to 600 ps, for which the capacitance effects dominate. (See [12] for the formula to determine whether the inductance effects are significant.)
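The 583.3 MHz figure quoted for a 600 ps rise time is consistent with the common rule of thumb that relates a signal's significant frequency to its rise time, f ≈ 0.35 / t_r. The text does not state which relation was used, so the formula and the function name below are assumptions made for illustration.

```python
def significant_frequency_hz(rise_time_s):
    """Rule-of-thumb frequency of interest for a given rise time (assumed 0.35 / t_r)."""
    return 0.35 / rise_time_s


if __name__ == "__main__":
    for t_r in (100e-12, 600e-12):
        f_mhz = significant_frequency_hz(t_r) / 1e6
        print(f"t_r = {t_r * 1e12:.0f} ps -> {f_mhz:.1f} MHz")
```

Under the same assumption, the 100 ps rise time used in Section II corresponds to about 3.5 GHz, the regime in which the text reports that inductance effects dominate.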

V. BUS-INVERT SCHEME

Inspired by Stan’s low-power bus-invert method [16], which reduces transition activity to lower the bus transition power, we propose a bus-invert method to reduce the on-chip bus delay due to coupling effects when inductance effects dominate. Our bus-invert method inverts the input data when the number of bits switching in the same direction is more than half of the number of signal bits. The remaining problem is how to implement the coding architecture with low complexity. For the implementation, we propose the encoder architecture shown in Fig. 5.

Fig. 5. (a) 4-bit bus encoder for the bus-invert scheme. (b) 5-bit bus encoder for the bus-invert scheme.

There are three types of possible signal transitions: type I: ↑ (switching from “0” to “1”), type II: ↓ (switching from “1” to “0”), and type III: 0 (no switching). If we refer to $x_i(n)$ as an input signal and to $x_i(n-1)$ as its previous input signal, then type I is $(x_i(n), x_i(n-1)) = (1, 0)$, type II is $(x_i(n), x_i(n-1)) = (0, 1)$, and type III is $(x_i(n), x_i(n-1)) = (0, 0)$ or $(1, 1)$. With the inputs $x_i(n)$ and $x_i(n-1)$, the codeword generator generates $(q_L, q_H) = (0, 1)$ for type I, $(1, 0)$ for type II, and $(0, 0)$ for type III. All $q_L$’s are then inputs to the majority voter (L) and all $q_H$’s to the majority voter (H). Finally, from the output of majority voter L or H, we can detect whether the number of type I or type II transitions is more than half of the number of signal bits. If one of the majority voters’ outputs is high, the input signal should be inverted. The majority voters can be implemented by using either a tree of full adders or resistors combined with a voltage comparator [16].

Since the additional invert line also contributes transitions, it should be considered as well. Let $N$ be the total number of signal bits of a bus excluding the invert line. The output of the majority voter is asserted when $(N+1)/2$ inputs are high. If $N$ is odd, the encoder architecture is as shown in Fig. 5(b). Hence, after encoding, the worst case switching pattern occurs when $(N+1)/2$ signal bits switch in the same direction, where $N$ is odd. If $N$ is even, the encoder architecture is somewhat different, as shown in Fig. 5(a). The major differences are that the encoder needs an extra input $\mathrm{INV}(n-1)$ and that $\mathrm{INV}(n)$ equals either $\mathrm{INV}(n-1)$ or its complement $\overline{\mathrm{INV}(n-1)}$, depending on whether $\mathrm{INV}_t$ is high or low. Hence, after encoding, the worst case switching pattern is that $N/2$ signal bits switch in the same direction, where $N$ is even.

The circuitry of the receiver is relatively simple because it only needs to conditionally invert the received data to recover the correct data value. If $N$ is odd, the received data need to be inverted only when the invert line is high. If $N$ is even, the received data need to be inverted only when the invert line has a transition.
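The codeword generator, the two majority voters, and the conditional inversion described in this section can be sketched behaviorally as follows. This is a minimal software model under stated assumptions, not the circuit of Fig. 5: the function names are hypothetical, the invert line uses level signaling as in the odd-$N$ receiver description, and the transition-based invert line of the even-$N$ encoder is not modeled.

```python
def codeword(x_now, x_prev):
    """Return (q_L, q_H) for one bit, following the three transition types."""
    if (x_now, x_prev) == (1, 0):
        return (0, 1)  # type I: rising transition
    if (x_now, x_prev) == (0, 1):
        return (1, 0)  # type II: falling transition
    return (0, 0)      # type III: no transition


def majority(bits):
    """True when more than half of the inputs are high."""
    return 2 * sum(bits) > len(bits)


def encode(word_now, word_prev):
    """Return (bus_word, invert_flag) for the current input word."""
    codes = [codeword(a, b) for a, b in zip(word_now, word_prev)]
    vote_l = majority([q_l for q_l, _ in codes])
    vote_h = majority([q_h for _, q_h in codes])
    invert = int(vote_l or vote_h)
    return [bit ^ invert for bit in word_now], invert


def decode(bus_word, invert):
    """Receiver side: undo the conditional inversion."""
    return [bit ^ invert for bit in bus_word]


if __name__ == "__main__":
    prev = [0, 0, 0, 0, 0]
    now = [1, 1, 1, 1, 1]           # all five bits rise together
    bus, inv = encode(now, prev)
    print(bus, inv)                 # [0, 0, 0, 0, 0] 1
    print(decode(bus, inv) == now)  # True
```

The model compares each word with the previous input word, as the text does; a hardware implementation would also have to account for how the previous word actually appeared on the bus.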

For today’s high-performance circuits, there are typically only 14 to 16 FO4 (fanout-of-four inverter [6]) delays per clock [11]. Hence, the delay overhead introduced by our encoder should be minimized. Let $d_{\mathrm{AND2}}$, $d_{\mathrm{OR2}}$, and $d_{\mathrm{XOR2}}$ be the delays of a two-input AND gate, a two-input OR gate, and a two-input XOR gate, respectively. For an $N$-bit bus, the critical path delay $D(N)$ of our $N$-bit bus encoder is given by

$$D(N) = d_{\text{Codeword Generator}} + d_{(N+1)/2\text{-out-of-}N\ \text{Majority Voter}} + d_{\mathrm{OR2}} + d_{\mathrm{XOR2}} \tag{2}$$

where $d_{\text{Codeword Generator}}$ equals the delay of an inverter $d_{\mathrm{INV}}$ plus $d_{\mathrm{AND2}}$ (i.e., $d_{\text{Codeword Generator}} = d_{\mathrm{INV}} + d_{\mathrm{AND2}}$), and the delay $d_{(N+1)/2\text{-out-of-}N\ \text{Majority Voter}}$ is given by $\log_{3/2} N \cdot d_{\text{full adder}}$ (where $d_{\text{full adder}}$ is the delay of a full adder) since we use a full-adder tree to implement the majority voter. Therefore, for a typical 8-bit bus encoder with optimized logic and the full-adder circuit implemented as a mirror-type adder, the critical path of the encoder has a delay of ten FO4, which is about two thirds of the clock cycle time. This delay overhead is similar to that of the low-power bus-invert method. Nevertheless, this delay overhead is the “worst case” scenario. Since the encoding logic could be fused with the logic of the IP block, this delay overhead could be reduced by optimizing the encoding and IP block logic simultaneously.
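Equation (2) can be evaluated numerically once gate delays are chosen. In the sketch below, the function names and the unit gate delays are hypothetical placeholders (the text gives no per-gate values), and the logarithm is left unrounded because the text does not say whether the number of full-adder levels is rounded up.

```python
import math


def encoder_critical_path(n_bits, d_inv, d_and2, d_or2, d_xor2, d_full_adder):
    """Critical path delay D(N) of the encoder, following (2)."""
    d_codeword = d_inv + d_and2                         # codeword generator
    d_voter = math.log(n_bits, 1.5) * d_full_adder      # full-adder tree
    return d_codeword + d_voter + d_or2 + d_xor2


def clock_fraction(encoder_fo4, clock_fo4):
    """Share of the clock period consumed by the encoder."""
    return encoder_fo4 / clock_fo4


if __name__ == "__main__":
    # Hypothetical unit delays, used only to exercise the formula.
    print(encoder_critical_path(8, 1.0, 1.0, 1.0, 1.0, 1.0))
    # Ten FO4 against the 14 to 16 FO4 clock periods quoted in the text.
    print(clock_fraction(10, 16), clock_fraction(10, 14))
```

With the ten-FO4 encoder delay reported in the text, the fraction falls between 0.625 and about 0.71 of the clock period, which brackets the stated "about two thirds".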

Like the bus-invert method, our method can also reduce the bus transitions. The bus transition count is reduced when $(N+1)/2$ bits switch in the same direction. In this case, our encoder inverts the current data and the transition count is reduced. Take an 8-bit bus as an example. For the transition pattern (00↑↑↑↑↑↑) before encoding, the transition count is six. After encoding, the transition pattern changes to (↑↑000000(↑)) or (↓↓000000(↑)), and the transition count is reduced to three (the additional transition (↑) is due to the signal transition on the invert line). Therefore, our encoding scheme can also reduce the average power consumed by the bus in terms of the average transition count.

However, the peak power dissipation after encoding remains the same. For the transition pattern (↑↓↑↓↑↓↑↓) before encoding, our encoder does not invert the current data; that is, the transition pattern remains the same after encoding, and this transition pattern causes the peak power consumed by the 8-bit bus due to the coupling capacitance between wires. Since oppositely switching signals are good for reducing the inductive coupling delay, our encoder keeps these transitions unless $(N+1)/2$ bits switch in the same direction. However, oppositely switching signals are the worst for power consumption when considering the capacitive coupling effects. Therefore, the peak power remains the same after using our encoding scheme.

Fig. 6. Worst case #-bit bus delay (percentage of the delay of only one transition pattern of the #-bit bus) with # varying from 2 to 11.

Fig. 7. Reduction of worst case delay of #-bit bus by using the bus-invert method with # varying from 2 to 11.

VI. SIMULATION RESULTS

A. Bus Coupling Delay Reduction

With the parameters given in Section II, we conducted our simulations by varying the number of bus signal bits, with and without the proposed bus-invert method. The simulation results are shown in Figs. 6 and 7.

From Fig. 6, we observe that coupling inductance has a greater impact on bus delay as the number of bus bit lines increases. For a tight LC cross-coupled bus, as shown in Fig. 6, the increase (in percent) of the worst case switching delay grows approximately linearly with the number of bus bit lines. Hence, for a high-frequency tight LC cross-coupled bus, the delay due to signals simultaneously switching in the same direction should be considered.

TABLE IV
REDUCTION OF WORST CASE NOISE BY USING THE BUS-INVERT METHOD FOR BUS WIDTHS RANGING FROM 2 TO 11

As shown in Fig. 7, our encoding method can significantly reduce the worst case switching delay; in other words, the bus performance can be improved. Moreover, our encoding method obtains an even better reduction rate as the number of bus bit lines increases. However, since the encoder architectures for even-bit and odd-bit buses are slightly different, the delay reductions also differ slightly. For an $N$-bit bus, if $N$ is odd, the worst case switching pattern after encoding is $(N+1)/2$ signal bits switching in the same direction, including the INV line. If $N$ is even, the worst case pattern after encoding is that only $N/2$ signal bits switch in the same direction, including the INV line. Hence, the reduction curve of even-bit buses lies above that of odd-bit buses when the number of bits is larger than five (see Fig. 7). Note also that, for the 2-bit bus, our encoding method worsens the worst case delay because the additional INV line introduces large additional coupling to the victim line. In other words, the worst case delay after encoding for two bit lines plus one INV line is larger than the worst case delay for only two bit lines.

In addition to reducing the worst case delay, our method has the side effects of decreasing the maximum ground bounce and reducing the maximum inductive noise. For example, as shown in Table IV, the average reduction of the maximum inductive noise is about 17%. Since the ground bounce and the inductive noise are also worst when all signal wires switch in the same direction, our method can reduce these effects as well.

B. Delay Overhead of the Bus Encoder

To investigate the delay introduced by our bus encoder and the coupling delay reduction achieved by our encoding method, we conducted the following simulations, which show the delay reduction net of the encoder’s delay overhead for different technology nodes.

The parameters used are adopted from the 1997 National Technology Roadmap for Semiconductors (NTRS’97) [17] and the simulation results in [4]. These parameters are shown in Table V. We consider a typical 8-bit bus with a total routing length equal to the half perimeter of a chip and with four times the minimum wire width and spacing. The length of each wire segment between two buffers is 3000 µm. In addition, the buffers are sized to maintain equal input and output transition times, which is a classical design criterion for buffer sizing. The simulation results are listed in Table VI. Column 2 shows the half perimeter of a chip according to the chip area reported in Table V, assuming that chips are square and thus the half perimeter of a chip is $2\sqrt{\text{Area}}$. Column 3 lists the number of wire segments required for signals passing through the half perimeter of a chip [i.e., the half perimeter of a chip $2\sqrt{\text{Area}}$ (in millimeters) divided by the length of a wire segment, which is 3 mm]. Column 4 shows the delay overhead induced by our 8-bit bus encoder. The delay gains (the worst case delay of the bus before encoding minus that after encoding) of the signals passing through one wire segment (3000 µm) and through the half perimeter of a chip are shown in Columns 5 and 6, respectively. The overall delay gains [((Column 6 − Column 4)/Column 4) × 100%] are given in Column 7. Finally, noise reduction is shown in Column 8.

TABLE V
INTERCONNECT AND DEVICE PARAMETERS USED

TABLE VI
SIMULATION RESULTS OF THE COUPLING DELAY REDUCTION BY USING OUR ENCODING METHOD AND THE DELAY OVERHEAD OF THE ENCODER FOR DIFFERENT TECHNOLOGY NODES

Columns 2 and 3 in Table VI reveal the increase of the chip size as the technology advances. Since the intrinsic gate delay decreases as the feature size shrinks, the delay overhead of our encoder decreases as well (see Column 4). We report the worst case delay overhead derived in Section V in our simulations. From Columns 5–7, we observe that the overall delay gain tends to increase as the technology advances, although the delay gain of each wire segment decreases. For example, as shown in our simulations, the overall delay gain for signals passing through the half perimeter of a chip is only about 20% for the 0.18-µm process, while this gain increases to about 167% for the 0.07-µm process. The reasons are twofold: 1) the decrease of the intrinsic gate delay and 2) the increase of the chip size. To further improve the overall delay gain by reducing the delay overhead, designers can also use dynamic logic to implement the encoding circuit. In addition to the coupling delay reduction, our method can also reduce the maximum inductive coupling noise for long interconnects by about 30%. The simulation results are shown in Column 8.
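The column definitions of Table VI can be written out as a short calculation. The sketch below follows the formulas stated in the text; the function names and all numeric inputs are hypothetical, since the table values themselves are not reproduced here, and the segment count is left unrounded because the text does not say how fractional segments are handled.

```python
import math


def half_perimeter_mm(chip_area_mm2):
    """Column 2: half perimeter of a square chip, 2 * sqrt(Area)."""
    return 2.0 * math.sqrt(chip_area_mm2)


def num_segments(chip_area_mm2, segment_mm=3.0):
    """Column 3: half perimeter divided by the 3 mm wire segment length."""
    return half_perimeter_mm(chip_area_mm2) / segment_mm


def overall_gain_percent(gain_half_perimeter, encoder_overhead):
    """Column 7: ((Column 6 - Column 4) / Column 4) * 100%."""
    return (gain_half_perimeter - encoder_overhead) / encoder_overhead * 100.0


if __name__ == "__main__":
    area = 400.0  # hypothetical chip area in mm^2
    print(half_perimeter_mm(area), num_segments(area))
    # Hypothetical delays in the same time unit, for illustration only.
    print(overall_gain_percent(gain_half_perimeter=1.2, encoder_overhead=1.0))
```

The last expression shows why the overall gain grows with technology scaling in the text's account: a larger chip raises the Column 6 term through more segments, while a faster gate lowers the Column 4 denominator.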

VII. CONCLUSION AND DISCUSSIONS

In this paper, we have shown that the inductance effect changes the worst case switching pattern with the maximum bus delay. For a 5-bit bus structure, the worst case switching pattern is ∗↓↑↓∗ or ∗↑↓↑∗ considering RC effects, but the worst case pattern changes to ↑↑↑↑↑ or ↓↓↓↓↓ considering RLC effects. Hence, we should consider both the RC and the RLC effects to derive effective encoding schemes for bus delay optimization.

We have also conducted simulations considering RLC effects on the bus structure when the wire capacitance becomes dominant. We have observed that the worst case switching pattern in this case is also different from the one considering RC effects alone. The difference is due to the long-range inductive coupling.

We have also proposed a bus-invert method to reduce the worst case on-chip bus delay when the inductive coupling effect dominates. Simulation results have shown that our encoding method can significantly reduce the worst case coupling delay of a bus. In the future, we intend to develop a more sophisticated bus-invert scheme to further reduce the inductive coupling delay.

Our encoding scheme is recommended for cases in which buses or parallel signal wires are thousands of micrometers long and work above gigahertz frequencies. At such working frequencies, the gate delay overhead of our encoder should be small enough. If we choose the full-adder tree to implement the majority voter, the delay of the majority voter is $O(\log_{1.5} N) \cdot (\text{full-adder delay})$, where $N$ is the total number of signal bits of a bus. In other words, if $N$ is very large, our encoder may cause timing violations. To solve this problem, we can divide the original bus into subbuses by inserting ground wires between subbuses. Hence, the overall problem is a gate-delay-dependent (and thus process-dependent) optimization problem, which we shall address in our future work.

It should be noted that our encoding method is not optimal. However, it is simple yet efficient, and thus the encoder and decoder logic is easy to implement. Therefore, the delay and power overheads of the encoder and decoder logic are minor compared to the delay and power consumption of the bus. Finding a possibly optimal encoding scheme needs further investigation and is a possible direction for our future work. We believe that the resulting “optimal” encoder and decoder would be much more complex than ours and might use more than one pipeline stage to encode/decode data.

The worst case switching pattern, as pointed out in this paper, can vary with the dimensions and the working frequency of a bus. Hence, to develop a flexible encoding scheme that can cope with the varying worst case patterns, one potential method is to conduct complete HSPICE simulations for all switching patterns according to the working frequency and the extracted RLC model of a bus to find the real worst case switching pattern. After simulation, all transition delays between any two data patterns can be measured. We can then develop an appropriate bus encoding method to avoid the patterns that violate the delay constraint. However, conducting complete HSPICE simulations is very time consuming and may suffer from the memory explosion problem for wide buses (for an $n$-bit bus, there are in total $2^n$ data patterns and $4^n/2$ transition patterns). Therefore, efficiently identifying the real worst case switching pattern of a bus is also a desirable research topic before the development of a flexible bus encoding scheme.
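The growth in simulation effort mentioned above can be made concrete by evaluating the two counts from the text for a few bus widths. The function name `pattern_counts` is an assumption used only for this illustration.

```python
def pattern_counts(n_bits):
    """Return (data patterns, transition patterns) for an n-bit bus."""
    data_patterns = 2 ** n_bits
    transition_patterns = 4 ** n_bits // 2
    return data_patterns, transition_patterns


if __name__ == "__main__":
    for n in (5, 8, 16, 32):
        data, transitions = pattern_counts(n)
        print(f"{n:2d}-bit bus: {data} data patterns, {transitions} transition patterns")
```

Each added bit multiplies the number of transition patterns by four, so exhaustive circuit simulation quickly becomes impractical for wide buses.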

REFERENCES

[1] K. H. Baek, K. W. Kim, and S. M. Kang, “A low energy encoding technique for reduction of coupling effects in SOC interconnects,” in Proc. 43rd IEEE Midwest Symp. Circuits and Systems, Lansing, MI, Aug. 2000, pp. 80–83.

[2] D. K. Cheng, Field and Wave Electromagnetics, 2nd ed. Reading, MA: Addison-Wesley, 1989.

[3] Y. Cao, X. Huang, N. H. Chang, S. Lin, O. S. Nakagawa, W. Xie, D. Sylvester, and C. Hu, “Effective on-chip inductance modeling for multiple signal lines and application to repeater insertion,” in Int. Symp. Quality Electronic Design, San Jose, CA, Mar. 2001, pp. 185–190.

[4] J. Cong, “An interconnect-centric design flow for nanometer technologies,” Proc. IEEE, vol. 89, no. 4, pp. 505–528, Apr. 2001.

[5] M. H. Chowdhury, Y. I. Ismail, C. V. Kashyap, and B. L. Krauter, “Performance analysis of deep sub micron VLSI circuits in the presence of self and mutual inductance,” in IEEE Int. Symp. Circuits and Systems, Scottsdale, AZ, 2002, pp. 197–200.

[6] D. Chinnery and K. Keutzer, Closing the Gap Between ASIC and Custom: Tools and Techniques for High-Performance ASIC Design. Boston, MA: Kluwer, 2002.

[7] M. A. Elgamel and M. A. Bayoumi, “Interconnect noise analysis and optimization in deep submicron technology,” IEEE Circuits Syst. Mag., vol. 3, no. 4, pp. 6–17, 2003.

[8] R. Escovar and R. Suaya, “Optimal design of clock trees for multigigahertz applications,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 23, no. 3, pp. 329–345, Mar. 2004.

[9] L. He and K. M. Lepak, “Simultaneous shield insertion and net ordering for capacitive and inductive coupling minimization,” in Int. Symp. Physical Design, San Diego, CA, 2000, pp. 55–60.

[10] K. Hirose and H. Yasuura, “A bus delay reduction technique considering crosstalk,” in Proc. Design Automation and Test Eur. (DATE), Paris, France, Mar. 2000, pp. 441–445.

[11] R. Ho, K. W. Mai, and M. A. Horowitz, “The future of wire,” Proc. IEEE, vol. 89, no. 4, pp. 490–504, Apr. 2001.

[12] Y. I. Ismail, E. G. Friedman, and J. L. Neves, “Figures of merit to characterize the importance of on-chip inductance,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 4, pp. 442–449, Dec. 1999.

[13] Y. I. Ismail, “On-chip inductance cons and pros,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 6, pp. 685–694, Dec. 2002.

[14] M. Kamon, M. J. Tsuk, and J. K. White, “FastHenry: A multipole-accelerated 3D inductance extraction program,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 42, no. 9, pp. 1750–1758, Sep. 1994.

[15] K. Nabors and J. White, “FastCap: A multipole accelerated 3-D capacitance extraction program,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 10, no. 11, pp. 1447–1459, Nov. 1991.

[16] M. R. Stan and W. P. Burleson, “Bus-invert coding for low-power I/O,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 1, pp. 49–58, Mar. 1995.

[17] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1997.

[18] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, 2003.

[19] P. P. Sotiriadis and A. P. Chandrakasan, “Reducing bus delay in submicron technology using coding,” in Proc. Asia and South Pacific Design Automation Conf., Yokohama, Japan, Feb. 2001, pp. 109–114.

[20] L. Trevillyan, D. Kung, R. Puri, L. N. Reddy, and M. A. Kazda, “An integrated environment for technology closure of deep-submicron IC designs,” IEEE Des. Test. Comput., vol. 21, no. 1, pp. 14–22, Jan./Feb. 2004.

[21] B. Victor and K. Keutzer, “Bus encoding to prevent crosstalk delay,” in Int. Conf. Computer-Aided Design, San Jose, CA, Nov. 2001, pp. 57–63.

Modeling the Driver Load in the Presence of Process Variations

Janet M. Wang, Jun Li, Satish Yanamanamanda, Lakshmi K. Vakati, and Kishore K. Muchherla

Abstract—Feature sizes of less than 90 nm and clock frequencies higher than 3 GHz call for fundamental changes in driver-load models. New driver-load models must consider the process variation impact of the manufacturing procedure, the nonlinear behavior of the drivers, the inductance effects of the loads, and the slew rates of the output waveforms. The present deterministic driver-load models use the conventional deterministic driver-delay model with a single $C_{\mathrm{eff}}$ (one ramp) approach. Neither the statistical property of the driver nor the inductance effects of the interconnect are taken into consideration. Therefore, the accuracy of existing models is questionable. This paper introduces a new driver-load model that predicts the driver-delay changes in the presence of process variations and represents the interconnect load as a distributed resistance, inductance and capacitance (RLC) network. The employed orthogonal polynomial-based probabilistic collocation method (PCM) constructs a driver-delay analytical equation from the circuit’s output response. The obtained analytical equation is used to evaluate the driver output delay distribution. In addition, the load is modeled as a two-effective-capacitance in order to capture the nonlinear behavior of the driver. The lossy transmission line approach accounts for the impact of the inductance when modeling the driving-point interconnect load. The new model shows improvements of 9% in the average delay error and 2.2% in the slew rate error over the simulation program with integrated circuit emphasis (SPICE) and the one ramp modeling approaches. Compared with the Monte Carlo method, the proposed model demonstrates a less than 3% error in the expected gate delay value and a less than 5% error in the gate delay variance.

Index Terms—Driver equivalent resistance, inductance effect evaluation criteria, interconnect driving-point admittance, multiple effective capacitance, probability collocation method (PCM), process variation.

I. INTRODUCTION

As technologies advance beyond the deep submicrometer (DSM) regime, design for manufacturability (DFM) issues are moving into the mainstream with unexpectedly low yields starting at the 130-nm process node. At 90 nm and below, DFM issues are the major factors affecting the speed of production ramps and the profitability of semiconductor companies.

Manuscript received June 15, 2004; revised October 24, 2004 and May 2, 2005. This work was supported in part by the National Science Foundation under Grant NSF-345090. This paper was recommended by Associate Editor L. Scheffer.

J. M. Wang is with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721-0104 USA (e-mail: wml@ece.arizona.edu).

J. Li is with Anova Solutions Inc., San Jose, CA 95054 USA (e-mail: junl@anova-solutions.com).

S. Yanamanamanda, L. K. Vakati, and K. K. Muchherla were with the University of Arizona, Tucson, AZ 85721, USA. They are now with Micron Technology, Boise, ID 83716 USA (e-mail: satishy@gmail.com; kalpana@email.arizona.edu; muchherla@gmail.com).
