Chapter 1 Introduction
1.2 Our Approach
As process technology advances and clock frequencies rise into the gigahertz range, inductance effects on on-chip interconnect structures have become increasingly significant [1]. Most existing works focus on reducing the effects of coupling capacitance on bus structures, and few works in the literature consider inductance effects when developing encoding schemes to reduce bus delay. However, when an RLC circuit model is used for the bus structure, we find that when the inductance effect dominates, the worst-case switching pattern (the one with the largest on-chip bus delay) is the one in which all wires switch simultaneously in the same direction [4].
Furthermore, the authors of [4] indicate that when the RLC effects of interconnects are considered, the worst-case switching pattern changes with the interconnect level (local, medium, or global wires) and with the working frequency.
Hence, since inductance cannot be neglected in today's high-performance circuit design, it is very important to consider RLC effects when developing bus encoding schemes.
From [4] and the preceding discussion, the impact of aggressors is highly diverse because capacitive and inductive coupling are mixed. Therefore, the worst-case delay patterns can differ greatly for different bus wire parameters, i.e., for different inductive and capacitive coupling conditions. Moreover, the relative strength of capacitive and inductive coupling varies with the working frequency, since the impedance of a capacitance, 1/(jωC) with ω = 2πf, decreases with frequency while the impedance of an inductance, jωL, increases. Therefore, when RLC effects are considered, it is crucial to take the design parameters into account to derive a better bus encoding scheme.
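To make the frequency dependence concrete, the short Python sketch below evaluates |Z_C| = 1/(ωC) and |Z_L| = ωL at a few frequencies. The coupling values `C_COUPLE` and `L_COUPLE` are hypothetical, chosen only for illustration, and are not extracted from any bus in this thesis.

```python
import math


def coupling_impedances(f_hz: float, c_farad: float, l_henry: float) -> tuple[float, float]:
    """Return (|Z_C|, |Z_L|) in ohms at frequency f_hz, with omega = 2*pi*f."""
    omega = 2 * math.pi * f_hz
    return 1.0 / (omega * c_farad), omega * l_henry


# Hypothetical coupling values, for illustration only.
C_COUPLE = 50e-15  # 50 fF
L_COUPLE = 0.5e-9  # 0.5 nH

for f in (0.1e9, 1e9, 10e9):
    zc, zl = coupling_impedances(f, C_COUPLE, L_COUPLE)
    print(f"{f / 1e9:5.1f} GHz: |Z_C| = {zc:10.1f} ohm, |Z_L| = {zl:7.2f} ohm")
```

As f grows, |Z_C| falls and |Z_L| rises, which is why the balance between capacitive and inductive coupling, and hence the worst-case pattern, depends on the working frequency.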
Thus, based on the observation that the worst-case switching pattern varies with the design parameters when the RLC effects of interconnects are considered, we propose a flexible encoding scheme for on-chip buses with given parameters. The key idea is to alleviate the coupling effect by transforming the data sequences transmitted over on-chip buses. At the same time, the encoder and decoder architectures should have low complexity, so that the power and delay overheads of the codec circuitry are outweighed by the significant reduction in bus delay.
The thesis is organized as follows. Chapter 2 describes our bus encoding flow for reducing the delay time. Furthermore, Chapter 3 proposes a bus encoding flow to lengthen signal propagation. Finally, Chapter 4 concludes this thesis.
Chapter 2
Bus Encoding for Reducing the Delay Time
2.1 Preliminary
2.1.1 Assumption and Problem Input
In this thesis, we build the encoding flow on the coplanar bus structure shown in Figure 9. In this bus structure, we assume that all drivers (receivers) have a uniform size and that all signal wires have uniform width, pitch, length, and height. Given the wire parameters (width, height, and pitch), the delay constraint, the working frequency (or slew rate), and the number of data bits, the encoding flow generates a valid code set that meets the delay constraint while accounting for the LC coupling effects. A valid code set is one in which any transition between two of its patterns is guaranteed to meet the delay constraint. Such a code set can be mapped one-to-one onto the data patterns.
Figure 9: The coplanar bus structure (labels: signal wire; power/ground).
2.1.2 The Bus Structure
Assuming the number of data bits is n, the overall bus structure is shown in Figure 10. The global bus has m wires (m > n), and its valid code set contains only 2^n of the 2^m possible codes. These 2^n codes are selected to minimize the coupling effects between any two of them; in addition, the transition delay between any two of them meets the user-given delay constraint. The encoding and decoding processes are straightforward and are not discussed in this thesis.
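Figure 10: The overall bus structure including the encoder and decoder. n-bit data are encoded onto an m-bit global bus (m > n) on which only 2^n valid codes are allowed to be transmitted (m − n: wire overhead), and decoded back to n-bit data.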
Since transistors operate mainly in the linear region during transitions, all drivers' output resistances are assumed to be linear throughout the simulations. Therefore, the drivers are modeled as simple linear resistances, the receivers are replaced by equivalent gate capacitances, and the wires are replaced by equivalent RLC circuit models. With these models, the circuit model of the coplanar bus structure consists only of linear elements (linear R, L, and C); in other words, it is an LTI (linear time-invariant) system.
In the simulations, we assume that synchronous latches are placed at the transmitting and receiving sides. Hence, all signals on the bus switch at the same time, which is a very common assumption for buses [13].
2.2 Overall Encoding Flow
Figure 11 illustrates our overall bus encoding flow. With the given parameters, the first step is to build the 3D bus structure and extract the resistances, capacitances, and inductances of the bus wires. After extraction, the equivalent RLC circuit is built. Next, the circuit is simulated in HSPICE with the basis vectors, which are defined in Section 2.4. By applying the superposition theorem [14], we can then build the transition graph efficiently.
Figure 11 flow: Given bus parameters and constraints → (1) Build the RLC model → (2) SPICE simulation for basis vectors → (3) Build the transition graph → (4) Find a maximum valid code set (maximum clique in graph) → (5) Does the valid code set cover all data patterns? Yes: output the valid code set. No: add one more bit and return to (1).
Figure 11: The overall encoding flow.
With the transition graph, a greedy algorithm is applied in step (4) to find a valid code set in which every transition between any pair of codes meets the delay constraint. In step (5), we check whether the code set covers all data patterns. If it does, the valid code set is output and mapped to the data patterns. Otherwise, we add one more bit line to the bus structure and repeat the flow from step (1).
In the following, we give an example to demonstrate the delay reduction achieved by our encoding flow with the generated valid code set. Consider a 2-bit bus structure (4 data patterns: 00, 01, 10, 11) with a delay constraint of 30 picoseconds. The transition delays of all transition patterns of the 2-bit bus are listed in Table 4. Since the transition pattern ↑↓ (i.e., 01→10 or 10→01) violates the delay constraint, we use our encoding method to reduce the worst-case switching delay so that the bus meets the delay constraint.
By using our encoding flow, a valid code set containing the 4 patterns 000, 001, 100, and 101 is generated. Figure 12 shows the one-to-one mapping between the data patterns and the valid codes, together with the new bus structure using the generated valid code set. With this encoding, the worst-case switching pattern ↑↓ is eliminated from the bus, since only the valid codes (000, 001, 100, and 101) are allowed to be transmitted and the middle wire therefore never switches. The transition delays of all transition patterns of the encoded 3-bit bus are listed in Table 5. From Table 5, all transitions between any two codes in the valid code set meet the delay constraint.
Table 4: The transition delays of a 2-bit bus without encoding.
Transition pattern    Transition delay (picoseconds)
– ↑                   27.20840
↑ –                   27.19817
↑ ↓                   38.72782
↑ ↑                   11.96549
(–: stable wire; ↑: switches from low to high; ↓: switches from high to low)
Figure 12: The encoded 3-bit bus with the generated valid code set.
Table 5: The transition delays of the generated valid code set.
Transition pattern    Transition delay (picoseconds)
– – ↑                 24.6111
↑ – –                 24.5803
↑ – ↓                 29.2374
↑ – ↑                 19.6909
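The check behind Tables 4 and 5 can be sketched in Python as follows. The delays are copied from the two tables; the helper names are hypothetical. Every unordered pair of codes is mapped to its transition pattern, Property 2 (Section 2.4) is used to fold each pattern onto the tabulated member of its dual pair, and Property 1 lets the lookup ignore the levels of stable wires.

```python
from itertools import combinations

# Transition delays in ps, copied from Table 4 (2-bit, unencoded) and Table 5 (3-bit, encoded).
TABLE4_PS = {"–↑": 27.20840, "↑–": 27.19817, "↑↓": 38.72782, "↑↑": 11.96549}
TABLE5_PS = {"––↑": 24.6111, "↑––": 24.5803, "↑–↓": 29.2374, "↑–↑": 19.6909}


def transition_pattern(a: str, b: str) -> str:
    """Per-wire symbols (– stable, ↑ rise, ↓ fall) for the bus going from code a to code b."""
    return "".join("–" if x == y else ("↑" if y == "1" else "↓") for x, y in zip(a, b))


def canonical(p: str) -> str:
    """Property 2: a pattern and its dual (↑ and ↓ swapped) have the same delay, so map
    each pattern to the member of its dual pair whose first switching wire rises."""
    for s in p:
        if s != "–":
            return p if s == "↑" else p.translate(str.maketrans("↑↓", "↓↑"))
    return p  # no wire switches


def check_code_set(codes, table_ps, limit_ps):
    """Return (all transitions meet limit_ps, worst delay) for a candidate code set."""
    worst = 0.0
    for a, b in combinations(codes, 2):
        # Property 1: the delay depends only on the pattern, not on stable-wire levels.
        # The reverse transition b -> a is the dual pattern, so it has the same delay.
        worst = max(worst, table_ps[canonical(transition_pattern(a, b))])
    return worst <= limit_ps, worst


print(check_code_set(["00", "01", "10", "11"], TABLE4_PS, 30.0))
print(check_code_set(["000", "001", "100", "101"], TABLE5_PS, 30.0))
```

The first call reports a violation with a worst case of 38.72782 ps (the ↑↓ transition). The second passes with a worst case of 29.2374 ps (the ↑–↓ transition between 001 and 100).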
The details of each step in our encoding flow will be described in the following sections.
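Before the step-by-step details, the outer loop of Figure 11 can be sketched in Python. This is a minimal illustration and not our implementation. The delay check of steps (1) to (3) is replaced by a hypothetical rule, `meets_constraint_toy`: a transition fails when two adjacent wires switch in opposite directions, which mimics the ↑↓ violation in Table 4. Step (4) uses an exact brute-force clique search instead of the greedy algorithm of Section 2.6, which is viable only for a few wires.

```python
from itertools import combinations, product


def transition_pattern(a: str, b: str) -> str:
    """Per-wire symbols (– stable, ↑ rise, ↓ fall) for the bus going from a to b."""
    return "".join("–" if x == y else ("↑" if y == "1" else "↓") for x, y in zip(a, b))


def meets_constraint_toy(a: str, b: str) -> bool:
    """HYPOTHETICAL stand-in for the delay check of steps (1) to (3): a transition
    fails if any two adjacent wires switch in opposite directions."""
    p = transition_pattern(a, b)
    return all({p[i], p[i + 1]} != {"↑", "↓"} for i in range(len(p) - 1))


def max_clique_bruteforce(vertices, ok):
    """Exact maximum clique by exhaustive search (only viable for tiny buses)."""
    for size in range(len(vertices), 0, -1):
        for cand in combinations(vertices, size):
            if all(ok(a, b) for a, b in combinations(cand, 2)):
                return list(cand)
    return []


def encoding_flow(n: int, ok=meets_constraint_toy, max_extra_wires: int = 2):
    """Add one wire at a time until a valid code set covers all 2**n data patterns."""
    for m in range(n, n + max_extra_wires + 1):
        codes = ["".join(bits) for bits in product("01", repeat=m)]
        clique = max_clique_bruteforce(codes, ok)  # step (4)
        if len(clique) >= 2 ** n:                  # step (5)
            return m, clique
    raise ValueError("no valid code set within the allowed wire overhead")


m, codes = encoding_flow(2)
print(m, codes)
```

With n = 2, the loop adds one wire and stops at m = 3, as in the example above. The particular codes found may differ from 000, 001, 100, and 101 because the toy rule is not the real delay data.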
2.3 Build the RLC Model
In step (1), with the given feasible parameters, FastCap [15] and FastHenry [16] are used to extract the RLC parameters of the bus and construct the SPICE model. The detailed flow of step (1) is shown in Figure 13. FastCap extracts the self and coupling capacitances of the wires, while FastHenry extracts the resistances and the self and coupling inductances. With these extracted RLC parameters, the equivalent RLC circuit models are constructed.
The circuit models are built from π-segments, each consisting of a series resistance and inductance with shunt capacitances, and are written out as a SPICE file.
Figure 13: Extract RLC and generate the corresponding SPICE file.
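The π-segment construction of step (1) can be sketched as a small netlist generator. This illustration rests on several assumptions. The function name `bus_netlist`, the node naming, and all numeric values are hypothetical. Real FastCap/FastHenry output gives full capacitance and inductance matrices, whereas this sketch couples only adjacent wires, and only corresponding segment inductors, through SPICE K (mutual inductance) elements. Driver resistances and receiver gate capacitances (Section 2.1.2) would attach at the driver-end and receiver-end nodes and are omitted.

```python
def bus_netlist(n_wires: int, n_seg: int, r_tot: float, l_tot: float,
                cg_tot: float, cc_tot: float, k: float) -> list[str]:
    """Return SPICE element lines for a bus of n_wires identical wires.

    Each wire is cut into n_seg pi-segments: a series R and L, with half of the
    segment's ground capacitance at each end. Adjacent wires are coupled by a
    capacitor split the same way and by a K element between corresponding
    segment inductors. Totals are per wire (cc_tot per adjacent pair).
    """
    r_seg, l_seg = r_tot / n_seg, l_tot / n_seg
    cg_seg, cc_seg = cg_tot / n_seg, cc_tot / n_seg

    def node(w: int, s: int) -> str:
        # Node s along wire w; s = 0 is the driver end, s = n_seg the receiver end.
        return f"w{w}n{s}"

    lines = []
    for w in range(n_wires):
        for s in range(n_seg):
            mid = f"w{w}m{s}"  # internal node between the series R and L
            lines.append(f"R{w}_{s} {node(w, s)} {mid} {r_seg:g}")
            lines.append(f"L{w}_{s} {mid} {node(w, s + 1)} {l_seg:g}")
            lines.append(f"CG{w}_{s}a {node(w, s)} 0 {cg_seg / 2:g}")
            lines.append(f"CG{w}_{s}b {node(w, s + 1)} 0 {cg_seg / 2:g}")
    for w in range(n_wires - 1):
        for s in range(n_seg):
            lines.append(f"CC{w}_{s}a {node(w, s)} {node(w + 1, s)} {cc_seg / 2:g}")
            lines.append(f"CC{w}_{s}b {node(w, s + 1)} {node(w + 1, s + 1)} {cc_seg / 2:g}")
            lines.append(f"K{w}_{s} L{w}_{s} L{w + 1}_{s} {k:g}")
    return lines


# Hypothetical values: 3 wires, 4 segments, 100 ohm, 2 nH, 200 fF to ground,
# 100 fF between neighbours, mutual coupling coefficient 0.3.
netlist = bus_netlist(3, 4, 100.0, 2e-9, 200e-15, 100e-15, 0.3)
print("\n".join(netlist[:4]))
```

More segments per wire improve the accuracy of the distributed model at the cost of a larger netlist and longer HSPICE runs.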
2.4 SPICE Simulation for Basis Vectors
After the equivalent RLC circuit model of the bus is built, the transition delay can be obtained directly by HSPICE simulation. However, an n-bit bus has 2^n input patterns and 4^n transition patterns in total (2^n initial patterns × 2^n final patterns). Simulating all transition patterns with HSPICE alone is therefore very time-consuming: the total simulation time is 4^n × (HSPICE simulation time for one transition pattern). We therefore develop a method based on the superposition theorem [14] to reduce the simulation time. The superposition theorem states that, in an LTI circuit, the effects (currents and voltage differences) of the independent current (or voltage) sources can be calculated separately and then summed to obtain the overall result. Based on this idea, we first simulate the basis vectors, which act as independent sources in the RLC circuit. We then obtain the actual waveforms, and hence the delay, of each transition pattern by superposing the results of the basis vectors.
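The superposition argument can be checked numerically on any linear model. The sketch below does not use the extracted bus. It integrates a toy three-wire state-space model dx/dt = A x + B u, where the matrices A and B are arbitrary stable values and forward Euler is used for simplicity. It then compares the response to – ↑ ↑ with the sum of the responses to the basis vectors – – ↑ and – ↑ –.

```python
def simulate(a_mat, b_mat, u, steps, dt):
    """Forward-Euler simulation of dx/dt = A x + B u for a constant input u, x(0) = 0.

    Returns the state trajectory as a list of lists (one entry per time step).
    """
    n = len(a_mat)
    x = [0.0] * n
    traj = [x[:]]
    for _ in range(steps):
        dx = [sum(a_mat[i][j] * x[j] for j in range(n))
              + sum(b_mat[i][j] * u[j] for j in range(len(u)))
              for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
        traj.append(x[:])
    return traj


# Toy 3-wire model (arbitrary stable values, NOT an extracted bus): each wire
# relaxes toward its input and is coupled to its neighbours.
A = [[-1.5, 0.5, 0.0],
     [0.5, -2.0, 0.5],
     [0.0, 0.5, -1.5]]
B = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
VDD = 1.2

full = simulate(A, B, [0.0, VDD, VDD], 500, 0.01)  # pattern – ↑ ↑
bv3 = simulate(A, B, [0.0, 0.0, VDD], 500, 0.01)   # basis vector – – ↑
bv2 = simulate(A, B, [0.0, VDD, 0.0], 500, 0.01)   # basis vector – ↑ –

max_err = max(abs(f - (p + q))
              for row_f, row_p, row_q in zip(full, bv2, bv3)
              for f, p, q in zip(row_f, row_p, row_q))
print(f"max |full - (basis sum)| = {max_err:.2e}")
```

Because both the model and the Euler update are linear in the state and the input, the two results agree up to floating-point rounding.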
What are the basis vectors of a bus? We define them as the transitions in which exactly one bus input switches while all other inputs remain stable. The basis vectors of a 3-bit bus are – – ↑, – ↑ –, ↑ – –, – – ↓, – ↓ – and ↓ – –. Throughout the thesis, “–” stands for a stable input (i.e., 0→0 or 1→1), “↑” for an input changing from low to high, and “↓” for an input changing from high to low. Basis vectors have the following two properties.
Property 1: In the equivalent linear RLC circuit, for a given basis vector (e.g., – – ↑), the transition delay of the switching input is the same whether the other, stable inputs are held at ‘0’ or ‘1’ (e.g., 000→001 and 110→111 have the same transition delay).
Proof: Coupling effects result from voltage changes (I_induce = C_couple × dV/dt) and current changes (V_induce = L_couple × dI/dt). Since a stable input does not change, it contributes no coupling noise to neighboring wires. Therefore, the transition delay of the switching input is the same whether the stable inputs are held at ‘0’ or ‘1’. □
Property 2: In the equivalent linear RLC circuit, (– – ↑, – – ↓), (– ↑ –, – ↓ –), and (↑ – –, ↓ – –) are called dual basis vector pairs. The voltage waveforms produced by the two vectors of a dual pair, such as (– – ↑, – – ↓), measured relative to each wire's initial level, are equal in magnitude but opposite in direction.
Proof: Since the extracted RLC circuit is composed of linear elements, it is an LTI (linear time-invariant) system. Therefore, input signals with equal magnitude but opposite directions produce output waveforms of equal magnitude but opposite direction. □
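Based on these two properties, we define a minimum basis vector set as a set containing one basis vector from each dual pair: the response of every transition pattern can be obtained by superposing the simulated responses of this set, negating a response (Property 2) whenever the other member of a dual pair is needed.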
Note that the dual basis vector pairs of a 3-bit bus are {– – ↑, – – ↓}, {– ↑ –, – ↓ –}, and {↑ – –, ↓ – –}. Therefore, there are 2^3 = eight choices of minimum basis vector set for a 3-bit bus: (– – ↑, – ↑ –, ↑ – –), (– – ↑, – ↑ –, ↓ – –), (– – ↑, – ↓ –, ↑ – –), (– – ↑, – ↓ –, ↓ – –), (– – ↓, – ↑ –, ↑ – –), (– – ↓, – ↑ –, ↓ – –), (– – ↓, – ↓ –, ↑ – –), and (– – ↓, – ↓ –, ↓ – –). Any one of the eight can serve as the minimum basis vector set. Hence, we only need to simulate the basis vectors of the chosen set, and we can then obtain the delays of all possible transition patterns by applying the superposition theorem to the simulation results. The details of step (2) are shown in Figure 14, which demonstrates the simulation flow with the minimum basis vector set of a 3-bit bus. First, we apply one basis vector from the minimum basis vector set as the input transition pattern on the bus. Second, we perform an HSPICE simulation and record the voltage waveform of each signal wire. We then repeat these steps for the next basis vector until the whole minimum basis vector set has been simulated.
Figure 14 steps: (1) Apply one minimum basis vector (one input source) at a time. (2) Run an HSPICE simulation. (3) Record the waveform of each wire. Repeat for the next basis vector.
Figure 14: HSPICE simulations with the minimum basis vectors of the minimum basis vector set.
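Choosing a minimum basis vector set amounts to picking one vector from each dual pair, so an n-bit bus has 2^n choices, each requiring only n simulations. The helper names below are hypothetical. The sketch enumerates the choices for the 3-bit case; within each tuple, vectors are ordered left to right by wire rather than in the order used in the text above.

```python
from itertools import product

STABLE, RISE, FALL = "–", "↑", "↓"


def basis_vector(n: int, wire: int, rising: bool) -> str:
    """Basis vector of an n-bit bus in which only `wire` (0 = leftmost) switches."""
    return "".join((RISE if rising else FALL) if k == wire else STABLE for k in range(n))


def minimum_basis_sets(n: int) -> list[tuple[str, ...]]:
    """Every way to pick one vector from each of the n dual pairs (2**n sets)."""
    return [tuple(basis_vector(n, w, r) for w, r in enumerate(choice))
            for choice in product((True, False), repeat=n)]


for s in minimum_basis_sets(3):
    print(s)
```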
2.5 Build the Transition Graph
In step (3), we apply the superposition theorem to calculate the actual transition delay of each transition pattern from the basis vector simulation results of step (2). Figure 15 illustrates how the delay of a transition pattern is obtained from these results. First, we decompose the transition pattern into its basis vectors. Then, we look up the simulation results of these basis vectors from step (2) and superpose them to obtain the actual voltage waveforms of the transition pattern. Finally, the transition delay is measured from these waveforms.
To verify the accuracy of this method, we ran various simulations and compared the voltage waveforms obtained by the superposition method with those obtained by HSPICE simulation; in every case the two methods produced exactly the same waveforms. Figure 16 gives an example for a 3-bit bus structure. Figure 16 (a) shows the voltage waveform of each signal wire at the receiver side obtained by directly running HSPICE, for the transition pattern – ↑ ↑ with a 1.2 V supply. Figure 16 (b) shows the waveforms obtained by the superposition method, i.e., by superposing the pre-simulated results of the two basis vectors – – ↑ and – ↑ –. The waveforms in Figure 16 (a) and Figure 16 (b) are identical. To show this more clearly, we also mark check points on the waveforms; the voltage values at the check points in Figure 16 (b) are exactly the same as those in Figure 16 (a). We therefore conclude that the superposition method yields voltage waveforms equivalent to those obtained by directly running HSPICE.
Figure 15: An example of superposition.
Figure 16 panels: (a) voltage waveforms of Bits 1, 2, and 3 versus time for – ↑ ↑, by directly conducting HSPICE simulation; (b) the same waveforms by the superposition method (– – ↑ + – ↑ – = – ↑ ↑).
Figure 16: The voltage waveforms of the signal wires obtained (a) by directly conducting HSPICE simulation and (b) by using superposition method.
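The decompose, look up, and superpose procedure of Figure 15 can be sketched as follows. The sketch assumes the all-rising minimum basis set has been simulated and stored as sampled waveforms, expressed as deviations from each wire's initial level. Two further assumptions are ours, not the thesis's: the delay of a pattern is taken as the latest 50% crossing among its switching wires, and the waveforms in `BASIS` are made-up numbers for two wires, not HSPICE results.

```python
STABLE, RISE = "–", "↑"


def superpose(pattern, basis_wave):
    """Waveforms of every wire for `pattern`, built from a minimum basis set.

    basis_wave[i][w] is the sampled waveform of wire w (deviation from its initial
    level) when only wire i rises. A falling wire i reuses that result negated
    (Property 2); stable wires add nothing (Property 1).
    """
    n = len(pattern)
    t_len = len(basis_wave[0][0])
    total = [[0.0] * t_len for _ in range(n)]
    for i, sym in enumerate(pattern):
        if sym == STABLE:
            continue
        sign = 1.0 if sym == RISE else -1.0
        for w in range(n):
            for k in range(t_len):
                total[w][k] += sign * basis_wave[i][w][k]
    return total


def crossing_time(wave, target, dt, rising):
    """First time the sampled waveform reaches `target` (linear interpolation), or None."""
    if (rising and wave[0] >= target) or (not rising and wave[0] <= target):
        return 0.0
    for k in range(1, len(wave)):
        a, b = wave[k - 1], wave[k]
        if (rising and b >= target) or (not rising and b <= target):
            return (k - 1 + (target - a) / (b - a)) * dt
    return None


def pattern_delay(pattern, basis_wave, v_swing, dt):
    """Assumed delay definition: latest 50% crossing among the switching wires."""
    waves = superpose(pattern, basis_wave)
    delays = []
    for w, sym in enumerate(pattern):
        if sym == STABLE:
            continue
        rising = sym == RISE
        t = crossing_time(waves[w], (0.5 if rising else -0.5) * v_swing, dt, rising)
        delays.append(float("inf") if t is None else t)
    return max(delays, default=0.0)


# Made-up basis waveforms for a 2-wire bus (1 V swing, one sample per time unit).
SELF = [0.0, 0.3, 0.6, 0.9, 1.0]
COUPLED = [0.0, 0.1, 0.1, 0.05, 0.0]
BASIS = [[SELF, COUPLED],  # only wire 0 rises
         [COUPLED, SELF]]  # only wire 1 rises

for p in ("–↑", "↑↑", "↑↓"):
    print(p, pattern_delay(p, BASIS, 1.0, 1.0))
```

With these toy numbers, ↑↓ crosses 50% at 2.0 sample intervals versus about 1.33 for ↑↑, i.e., the opposite-direction pattern is slower, as expected when capacitive coupling dominates.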
With the superposition method, we can efficiently calculate the transition delay between any two input patterns by looking up and superposing the pre-simulated results of the minimum basis vector set. In the next step, we build a transition graph that records whether the transition delay between each pair of input patterns meets the delay constraint. Figure 17 shows an example transition graph of a 3-bit bus, in which vertices represent input patterns and an edge between two patterns indicates that the transition delay between them meets the delay constraint.
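For the 2-bit example of Table 4, the transition graph can be built as in the sketch below; the function names are hypothetical. Edges are undirected because a transition and its reverse are dual patterns with equal delay (Property 2), so each unordered pair of patterns is checked once.

```python
from itertools import combinations, product

# Delays (ps) of the unencoded 2-bit bus, copied from Table 4.
TABLE4_PS = {"–↑": 27.20840, "↑–": 27.19817, "↑↓": 38.72782, "↑↑": 11.96549}


def transition_pattern(a: str, b: str) -> str:
    """Per-wire symbols (– stable, ↑ rise, ↓ fall) for the bus going from a to b."""
    return "".join("–" if x == y else ("↑" if y == "1" else "↓") for x, y in zip(a, b))


def canonical(p: str) -> str:
    """Property 2: flip ↑/↓ so that the first switching wire rises."""
    for s in p:
        if s != "–":
            return p if s == "↑" else p.translate(str.maketrans("↑↓", "↓↑"))
    return p


def transition_graph(n: int, delay_ps, limit_ps: float):
    """Vertices: all n-bit patterns. Undirected edge {a, b} if a <-> b meets limit_ps."""
    vertices = ["".join(bits) for bits in product("01", repeat=n)]
    edges = {frozenset((a, b)) for a, b in combinations(vertices, 2)
             if delay_ps(transition_pattern(a, b)) <= limit_ps}
    return vertices, edges


def table4_delay(p: str) -> float:
    return TABLE4_PS[canonical(p)]


vertices, edges = transition_graph(2, table4_delay, 30.0)
print(sorted(tuple(sorted(e)) for e in edges))
```

With the 30 ps constraint, every pair except 01 and 10 is connected, so the graph has five edges.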
Using the superposition theorem on an n-bit bus, the time needed to obtain the delays of all transition patterns is reduced from 4^n × T_HSPICE to n × T_HSPICE + (4^n / 2) × T_sup, where T_HSPICE is the HSPICE simulation time for one transition pattern and T_sup is the time of one superposition. The first term, n × T_HSPICE, accounts for simulating the minimum basis vector set; the second term, (4^n / 2) × T_sup, accounts for the mathematical computations that superpose the simulation results of the minimum basis vectors. The factor 1/2 follows from Property 2: a transition and its reverse (e.g., 001→100 and 100→001) are dual patterns with equal delay, so only one of each pair needs to be computed.
Figure 17: The transition graph.
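To give a sense of scale for the cost expression in this section, the sketch below counts HSPICE runs and superpositions and converts them to time. The per-run times (2 s per HSPICE run, 1 ms per superposition) are hypothetical placeholders, not measurements from this thesis.

```python
def superposition_cost(n: int) -> tuple[int, int, int]:
    """Return (direct HSPICE runs, basis-vector HSPICE runs, superpositions) for an n-bit bus."""
    return 4 ** n, n, 4 ** n // 2


def estimated_seconds(n: int, t_hspice: float, t_sup: float) -> tuple[float, float]:
    """Estimated run time of the direct method and of the superposition method."""
    direct, basis, sups = superposition_cost(n)
    return direct * t_hspice, basis * t_hspice + sups * t_sup


# Hypothetical timings: 2 s per HSPICE run, 1 ms per superposition.
for n in (3, 6, 8):
    direct_s, sup_s = estimated_seconds(n, 2.0, 1e-3)
    print(f"n = {n}: direct {direct_s:10.0f} s, superposition {sup_s:8.1f} s")
```

The number of superpositions still grows as 4^n, but each one is a cheap arithmetic operation, while the number of circuit simulations grows only linearly in n.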
2.6 Find a Maximum Valid Code Set
After building the transition graph, we obtain a maximum valid code set by finding a maximum clique in the graph. We look for a clique because we want a valid code set in which every transition between two codes is guaranteed to meet the delay constraint. We look for a maximum clique because the number of found codes (vertices) must be greater than or equal to the number of required data patterns (e.g., 2^n in Figure 10) under the given wire overhead (e.g., (m − n) in Figure 10 (b)). If the number of found valid codes exceeds the number of required data patterns, fewer extra wires might suffice to generate a valid code set that meets the delay constraint. Therefore, we aim to maximize the number of found codes to minimize the wire overhead. However, since the maximum clique problem is NP-hard, we use a greedy algorithm to solve it within a reasonable time. The pseudocode of the greedy algorithm is shown in Figure 18.
INPUT: A transition graph
OUTPUT: A clique
IF the graph is a clique THEN output the graph and exit
REPEAT
    Find the vertex in the graph with the smallest degree
    Remove that vertex and update the graph
UNTIL all remaining vertices in the graph form a clique
All vertices in the clique form the set CLIQUE
All vertices deleted from the graph form the set DELETE
REPEAT
    Take a vertex from the set DELETE
    IF the vertex and the set CLIQUE can form a larger clique THEN
        Update CLIQUE to the larger clique
UNTIL every vertex in DELETE has been tried
Output the set CLIQUE
Figure 18: Our greedy algorithm to find a maximum clique.
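Figure 18 can be sketched in Python as follows. This is one possible reading, not our implementation. It assumes an undirected graph given as a dict from each vertex to its set of neighbors. Ties in the minimum degree are broken by the smallest vertex label, and DELETE is scanned in deletion order; the pseudocode leaves both choices open.

```python
def is_clique(vertices, adj) -> bool:
    """True if every pair of the given vertices is adjacent in adj."""
    vs = list(vertices)
    return all(b in adj[a] for i, a in enumerate(vs) for b in vs[i + 1:])


def greedy_clique(adj: dict) -> set:
    """Greedy clique search following Figure 18.

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    remaining = set(adj)
    deleted = []
    # First loop: delete the smallest-degree vertex until the rest is a clique
    # (if the graph is already a clique, nothing is deleted).
    while not is_clique(remaining, adj):
        v = min(sorted(remaining), key=lambda x: len(adj[x] & remaining))
        remaining.remove(v)
        deleted.append(v)
    clique = set(remaining)
    # Second loop: re-insert a deleted vertex if it is adjacent to the whole clique.
    for v in deleted:
        if clique <= adj[v]:
            clique.add(v)
    return clique


# 2-bit transition graph from Table 4: every pair except 01-10 meets 30 ps.
ADJ = {"00": {"01", "10", "11"}, "01": {"00", "11"},
       "10": {"00", "11"}, "11": {"00", "01", "10"}}
# Small illustrative graph on which the second loop enlarges the result.
G2 = {"a": {"x", "y"}, "x": {"a", "y", "p"}, "y": {"a", "x", "q"},
      "p": {"x", "q"}, "q": {"y", "p"}}
print(sorted(greedy_clique(ADJ)), sorted(greedy_clique(G2)))
```

On the 2-bit graph the first loop removes 01 and returns {00, 10, 11}. On `G2` the first loop removes a, p, and q and ends with {x, y}; the second loop then re-inserts a, which shows how the second loop can enlarge the greedy result.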
The algorithm first checks whether the graph is a clique; if it is, the graph is output directly. Otherwise, the first loop deletes the vertex with the smallest degree, updates the graph, and repeats until the remaining vertices form a clique. Take Figure 17 as an example: the degrees of its vertices, listed counterclockwise from vertex (000) to vertex (111), are {5, 5, 6, 5, 5, 7, 5, 4}. This graph is not a clique, so in the first loop we first remove (111), which has the smallest degree, and update the graph. Next, (000) is picked as the next vertex to remove, followed by (001) and (011). The removed vertices are stored in the set DELETE for further checking in the second loop of Figure 18.
This loop repeats until the remaining vertices in the graph form a clique. Finally, the remaining vertices (010, 100, 101, 110) form a clique, as shown in Figure 19.
Figure 19: The greedy search result of Figure 17.
Since the first loop of Figure 18 is a simple greedy heuristic, it cannot guarantee an optimal solution. To improve the result, the second loop in Figure 18 performs additional steps.
The second loop tries to add vertices that are stored in the DELETE set back