Chapter 2 Bus Encoding for Reducing the Delay Time
2.10 Performance Comparison with the Shield Insertion Technique
In this subsection, we conduct simulations to compare the improvement achieved by our flow with that of the conventional shield insertion technique. The wire overhead is one (i.e., only one additional wire is allowed). The bus length, width, height, and pitch of the signal wires are 2000 µm, 0.8 µm, 2 µm, and 2 µm, respectively. The number of data bits is set to 6 (i.e., 64 data patterns). Figures 24 (a) and (b) illustrate the buses obtained with our flow and with one inserted shield, respectively.
[Figure 24 diagram: (a) encoded bus with ENCODER and DECODER between Power/Ground wires; (b) signal wires with an inserted Power/Ground shield.]
Figure 24: The bus structure (a) using our encoding method and (b) using the shield insertion technique.
Simulations are conducted under different working frequencies and bus lengths, and the resulting worst-case transition delays are shown in Figure 25, together with the worst-case transition delay of the original 6-bit bus.
Although both our flow and the shield insertion technique reduce the worst-case transition delay of the bus, our flow outperforms the shield insertion technique under every working frequency and bus length shown in Figure 25. Therefore, compared with the conventional shield insertion technique, our flow is more effective in reducing the coupling delay across different working frequencies and bus lengths.
Figure 25: The worst-case transition delay by using our method and the shield insertion technique.
Chapter 3
Bus Encoding to Lengthen Signal Propagation
With increasing chip size and clock frequency, the global interconnect delay is likely to exceed one clock period. Hence, on-chip signals can no longer reach the entire die in one clock cycle [17]. As shown in Figure 26, the percentage of the die area reachable in one clock cycle continuously decreases as the technology advances. When process technology reaches 0.1 µm, only 16 percent of the die is reachable within one clock period (at 1.2 GHz). In other words, signals need eight pipeline stages to propagate across the entire die.
Recently, due to strong noise coupling effects in deep-submicron (DSM) technology, the signal propagation length achievable in one clock cycle has decreased further, especially for global interconnects. Thus, increasing the signal propagation length in one clock cycle has become an important issue in today's high-performance circuit designs.
Figure 26: Trends for the clock locality metric [17].
The conventional bus encoding problem focuses on reducing the delay of a fixed-length bus while considering only RC effects. In our work, we extend it to a new problem: how to increase the signal propagation length of a bus through bus encoding while considering RLC effects under given parameters and constraints. The conventional bus encoding problem therefore becomes a sub-problem of our extended problem. We also propose a flexible maximum signal propagation length estimation flow to solve the extended problem. The proposed flow combines a bus encoding scheme with a curve fitting method. The bus encoding scheme effectively reduces the LC coupling effects on on-chip buses and hence improves the worst-case switching delay, while the curve fitting method keeps the flow efficient by reducing the number of required iterations.
3.1 Problem Formulation
Again, the coplanar bus structure shown in Figure 9 is considered in this chapter. The inputs of the problem are the wire parameters (i.e., wire width, height, and pitch), the working frequency (or slew rate), the delay constraint, the number of data bits, the initial bus length, and the user-required precision. With these inputs, our goal is to generate a valid code set that achieves the maximum propagation length. The definition of the valid code set is the same as that in Chapter 2.
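The problem inputs listed above can be collected in a small record. The sketch below is only an illustration. The class name `BusProblem`, its field names, and the units are hypothetical and do not come from the thesis. The rule that a valid code set needs at least 2^n code words follows directly from the n-bit data width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusProblem:
    """Hypothetical container for the problem inputs of Section 3.1."""
    wire_width_um: float
    wire_height_um: float
    wire_pitch_um: float
    frequency_ghz: float         # working frequency (a slew rate could be given instead)
    delay_constraint_ps: float
    data_bits: int               # n, the number of data bits
    initial_length_um: float     # initial bus length D
    precision_um: float          # user-required precision, Delta D_required

    def __post_init__(self) -> None:
        # Reject inputs that cannot describe a physical bus.
        if self.data_bits < 1:
            raise ValueError("data_bits must be at least 1")
        positives = (self.wire_width_um, self.wire_height_um, self.wire_pitch_um,
                     self.frequency_ghz, self.delay_constraint_ps,
                     self.initial_length_um, self.precision_um)
        if any(x <= 0 for x in positives):
            raise ValueError("geometric, timing, and precision inputs must be positive")

    @property
    def num_patterns(self) -> int:
        # A valid code set must contain at least 2**n code words.
        return 2 ** self.data_bits
```

Take the 4-bit example of Section 3.5: 2 µm width, 2 µm height, 4 µm pitch, 1 GHz, and 20 ps. It would give `num_patterns == 16`. That section does not state an initial length or precision, so any values supplied for those fields are placeholders.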
3.2 Maximum Signal Propagation Length Estimation Flow
[Flowchart: inputs (bus parameters, working frequency, constraints, initial bus length, and user-given precision) → (1) Build the RLC model → (2) SPICE simulation for basis vectors → (3) Build the transition graph → (4) Find a maximum valid code set (maximum clique in graph) → Record worst-case switching delay → Is D the maximum length? YES: output Dmax and the valid code set. NO: predict the next D by the curve-fitting equation and redo the bus encoding scheme.]
Figure 26: The maximum signal propagation length estimation flow.
To achieve the maximum propagation length, we build the maximum signal propagation length estimation flow shown in Figure 26. It mainly comprises two parts: the bus encoding scheme and the curve fitting method. First, the user provides the bus parameters (n-bit data, initial wire length D, wire height, wire width, wire pitch, Power/Ground wire width, and Power/Ground-to-signal pitch), the user-required precision (ΔD_required), the working frequency, and the constraints (delay constraint and wire overhead constraint). With these parameters, we perform our encoding scheme and check whether the delay constraint is still met under the wire overhead constraint (e.g., (m-n) in Figure 1 (b)). If the delay constraint is met, the bus length is increased, and the curve fitting method predicts the length for the next iteration. If it is not met, the length is recorded as invalid, and the curve fitting method again predicts the next length (see Section 3.4). This process is repeated until the maximum signal propagation length is obtained. At that point, the last length that meets the delay constraint is reported as the maximum propagation length Dmax, and a valid code set is generated at the same time. The details of our bus encoding method and curve fitting method are discussed below.
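The control flow described above and the decision policy of Section 3.4 can be sketched together as a single loop. In this minimal sketch, `estimate_max_length`, `code_set_size`, and `predict_next` are hypothetical names. `code_set_size(D)` stands in for the whole encoding scheme (RLC modeling, HSPICE simulation, transition graph, and clique search). `predict_next` stands in for the curve fitting step. The sketch also makes one assumption about bookkeeping: it tracks the longest valid length and the shortest invalid length seen so far. The `bisect_predictor` included for testing is plain bisection, not the curve fitting method of the thesis.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[float, int]]  # (bus length D, size of the valid code set found at D)


def estimate_max_length(
    code_set_size: Callable[[float], int],
    predict_next: Callable[[History, Optional[float], Optional[float]], float],
    num_patterns: int,
    initial_length: float,
    precision: float,
    max_iters: int = 50,
) -> Optional[float]:
    """Hypothetical outer loop of the estimation flow; returns D_max or None."""
    history: History = []
    valid: Optional[float] = None    # longest length whose code set is large enough
    invalid: Optional[float] = None  # shortest length whose code set is too small
    length = initial_length
    for _ in range(max_iters):
        size = code_set_size(length)          # stands in for encoding steps (1)-(4)
        history.append((length, size))
        if size >= num_patterns:              # step (1): valid length
            valid = length if valid is None else max(valid, length)
            # Step (2): stop once the valid/invalid bracket is tighter than the precision.
            if invalid is not None and abs(invalid - valid) < precision:
                return valid
        else:                                 # step (1): invalid length
            invalid = length if invalid is None else min(invalid, length)
        length = predict_next(history, valid, invalid)
    return valid  # iteration budget exhausted: best valid length found so far


def bisect_predictor(history: History, valid: Optional[float],
                     invalid: Optional[float]) -> float:
    """Stand-in predictor for testing only (plain bisection; history is unused)."""
    if invalid is None:
        return 2.0 * valid
    if valid is None:
        return invalid / 2.0
    return (valid + invalid) / 2.0
```

In the flow described in the thesis, a curve fitting predictor takes the place of `bisect_predictor`. The loop itself stays unchanged.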
The bus encoding scheme in the maximum signal propagation length estimation flow is similar to that in Chapter 2, so we give only a brief review here. With user-given bus parameters and constraints, we first extract and build the RLC model. Next, the chosen minimum basis vectors are simulated by HSPICE with the built RLC model. Then, the transition graph is constructed according to the delay constraint. Finally, we apply a greedy algorithm to find a maximum valid code set (a maximum clique in the transition graph).
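The four encoding steps can be illustrated end to end on a toy model. The sketch below rests on several assumptions.

- It does not use an extracted RLC model or HSPICE simulation of basis vectors. Instead, `toy_transition_delay` uses a simple RC crosstalk model. There, a switching wire i has delay 1 + λ·(|Δi - Δi-1| + |Δi - Δi+1|), and the outer neighbors are held quiet like Power/Ground wires.
- `build_transition_graph` connects two code words when the transitions in both directions meet the delay limit, so a valid code set is a clique.
- `greedy_clique` is one plausible greedy heuristic (highest degree first). The thesis does not specify its greedy rule.

All names here are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set


def toy_transition_delay(u: int, v: int, n: int, lam: float = 1.0) -> float:
    """Hypothetical RC crosstalk delay (arbitrary units) of transition u -> v on n wires."""
    # Per-wire change: +1 rising, -1 falling, 0 quiet (bit i drives wire i).
    delta = [((v >> i) & 1) - ((u >> i) & 1) for i in range(n)]
    padded = [0] + delta + [0]  # boundary neighbors (e.g. Power/Ground) never switch
    worst = 0.0
    for i in range(1, n + 1):
        if padded[i] != 0:
            coupling = abs(padded[i] - padded[i - 1]) + abs(padded[i] - padded[i + 1])
            worst = max(worst, 1.0 + lam * coupling)
    return worst


def build_transition_graph(n: int, delay: Callable[[int, int, int], float],
                           limit: float) -> Dict[int, Set[int]]:
    """Edge u-v iff both u -> v and v -> u meet the delay limit."""
    adj: Dict[int, Set[int]] = {x: set() for x in range(2 ** n)}
    for u, v in combinations(range(2 ** n), 2):
        if delay(u, v, n) <= limit and delay(v, u, n) <= limit:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_clique(adj: Dict[int, Set[int]]) -> List[int]:
    """Visit vertices by decreasing degree; keep each one adjacent to all kept so far."""
    clique: List[int] = []
    for x in sorted(adj, key=lambda w: (-len(adj[w]), w)):
        if all(x in adj[c] for c in clique):
            clique.append(x)
    return clique
```

Consider a 5-wire bus carrying 4-bit data. One would call `greedy_clique(build_transition_graph(5, toy_transition_delay, limit))` and accept the result only if it contains at least 16 code words.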
3.3 The Curve Fitting Method
To minimize the runtime of our flow, the number of iterations should be kept as small as possible. Hence, the maximum propagation length should be predicted within a few iterations rather than by incrementally increasing the length at each iteration. However, if we apply a simple binary search to find the maximum length, the number of required iterations depends heavily on the given initial length. Therefore, we adopt a curve fitting method to quickly predict the maximum propagation length and thus reduce the number of required iterations.
To use the curve fitting method, we have to find a suitable fitting equation for the interconnect delay with coupling effects. Ismail and Friedman [21] give a closed-form delay equation for a gate driving an RLC wire segment terminated by a gate capacitance load.
In [21], two extreme cases are considered. In one extreme case, where $L \to 0$, the delay reduces to $0.37RCl^2$, where R is the wire resistance per unit length, C is the wire capacitance per unit length, and l is the wire length. In this case, the wire delay depends quadratically on the wire length. In the other extreme case, where $R \to 0$, the delay reduces to $l\sqrt{LC}$, where L is the wire inductance per unit length. In this case, the wire delay depends linearly on the wire length. Therefore, a quadratic equation is a natural choice for fitting the delay of a single wire when its LC effects are considered. Furthermore, a quadratic equation can also fit the worst-case switching delay of an n-bit parallel coupled bus, because the switching aggressors only change the effective capacitance and inductance of the victim wire. We also fit the worst-case delay curve with respect to the wire length using various fitting functions; the results are shown in Figure 27. From Figure 27, we observe that the quadratic fitting equation matches the wire delay better than the other fitting equations.
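The argument above can be written as a worked equation. The two limits are the ones quoted from [21]. The general quadratic form with a constant term c, and the symbols a, b, c, t_wc and T, are assumptions for illustration, since the thesis does not list its fitted coefficients. Solving the fitted curve at the delay constraint T gives the predicted length. The solution holds when a > 0 and c < T.

$$
t\big|_{L \to 0} \approx 0.37\,R\,C\,l^{2}, \qquad t\big|_{R \to 0} \approx l\sqrt{LC}
$$

$$
t_{\mathrm{wc}}(l) \approx a\,l^{2} + b\,l + c
$$

$$
t_{\mathrm{wc}}(D_{\max}) = T \;\Longrightarrow\; D_{\max} \approx \frac{-b + \sqrt{b^{2} - 4a\,(c - T)}}{2a}
$$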
Therefore, with quadratic curve fitting, the maximum propagation length can be predicted efficiently within a few iterations of our flow.
Figure 27: Curve fitting with different functions.
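The quadratic fitting and prediction step can be sketched in pure Python. This is a minimal illustration built on one assumption: the thesis does not say how the quadratic is fitted, so ordinary least squares over all recorded (length, worst-case delay) samples is used here. `fit_quadratic` and `predict_length` are hypothetical names. Lengths are rescaled before the normal equations are solved, which keeps them well conditioned for lengths in the micrometer range.

```python
import math
from typing import List, Tuple


def fit_quadratic(points: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares fit of t = a*l**2 + b*l + c to (length, delay) samples."""
    if len(points) < 3:
        raise ValueError("need at least three (length, delay) samples")
    scale = max(abs(l) for l, _ in points) or 1.0
    pts = [(l / scale, t) for l, t in points]          # work in u = l / scale
    s = [sum(u ** k for u, _ in pts) for k in range(5)]
    r = [sum(t * u ** k for u, t in pts) for k in range(3)]
    # Augmented normal equations for unknowns [c, b_u, a_u].
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    for col in range(3):                                # elimination with partial pivoting
        piv = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        if m[col][col] == 0:
            raise ValueError("samples must contain at least three distinct lengths")
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= f * m[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                                 # back substitution
        x[i] = (m[i][3] - sum(m[i][k] * x[k] for k in range(i + 1, 3))) / m[i][i]
    c, b_u, a_u = x
    return a_u / scale ** 2, b_u / scale, c             # undo the length scaling


def predict_length(a: float, b: float, c: float, delay_limit: float) -> float:
    """Positive root of a*l**2 + b*l + c = delay_limit (requires a > 0)."""
    disc = b * b - 4.0 * a * (c - delay_limit)
    if a <= 0 or disc < 0:
        raise ValueError("fitted curve does not reach the delay limit")
    return (-b + math.sqrt(disc)) / (2.0 * a)
```

In the flow, each iteration would append the new (D, worst-case delay) sample and refit. The result of `predict_length(a, b, c, T)` would then serve as the next D.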
3.4 The Termination Condition
After the valid code set is generated by the bus encoding scheme, two conditions need to be checked before outputting the final valid code set and the maximum propagation length (Dmax). The two conditions are shown in Figure 28.
In the first step, we check whether the size of the found valid code set is greater than or equal to the number of data patterns. If so, we record the current bus length (D) as a valid length and move to the next step for a further check. Otherwise, the current bus length is recorded as an invalid length, and the flow moves to the curve fitting method to predict the next D. A valid length means that, at this bus length, our flow can generate a valid code set whose size is greater than or equal to the number of data patterns. Conversely, an invalid length means that, at this bus length, our flow cannot generate such a valid code set.
Figure 28: The decision policy of our flow.
If the current bus length (D) is recorded as a valid length, we move to the next step to check whether the user-required precision is met. In step (2), we check whether the distance difference ΔD is smaller than the user-required precision ΔD_required, where ΔD = |invalid length - valid length|. If ΔD is smaller than ΔD_required, the flow terminates and outputs the valid length as the maximum signal propagation length together with the valid code set. Otherwise, the flow moves to the curve fitting method to predict the next D and reruns the bus encoding scheme.
The example in Figure 29 demonstrates the conditions of step (2). Given a bus structure, the user-required precision is set to 10 μm (i.e., ΔD_required = 10 μm) and the number of data patterns is 64. The black points in Figure 29 represent valid lengths and the black triangles represent invalid lengths. In Figure 29 (a), the distance difference ΔD between the current valid length and the invalid length is larger than ΔD_required. In contrast, since ΔD < ΔD_required in Figure 29 (b), the current valid length (1576 μm) is output with a valid code set as the final result.
[Plots: valid code set size versus bus length (1500 to 1600 µm) for panels (a) and (b).]
Figure 29: (a) The distance difference ΔD (|invalid length - valid length|) is larger than the user-required precision ΔD_required. (b) ΔD < ΔD_required.
3.5 Simulation Results
With the user-given parameters, the following simulation results show that our flow can increase the signal propagation length of buses by reducing coupling effects. For example, suppose the delay constraint is 20 ps and the width, height, and pitch of the signal wires are 2 µm, 2 µm, and 4 µm, respectively. The bus working frequency is 1 GHz and the supply voltage is 1.2 V. For a 4-bit data bus (16 data patterns), the signal propagation lengths obtained with different numbers of wires are shown in Figure 30. From Figure 30, we observe that the improvement in the maximum propagation length generally grows as more wires are added to the bus (i.e., with more wire overhead). Figure 31 shows similar results for a 6-bit data bus (64 data patterns). To verify that the estimated maximum propagation length and the valid code set are correct, we run SPICE simulations, which confirm that all transitions meet the given delay constraint. Thus, our flow can increase the signal propagation length by reducing the coupling effects under the given delay constraint.
However, as shown in Figures 30 and 31, the improvement in the maximum propagation length diminishes once a certain number of wires has been added to the bus. If we keep adding wires, the additional wires may even reduce the improvement in the propagation length. This is because the additional wires increase the bus width (i.e., the current loop width of each wire), so the inductance effects on the bus also increase with every additional wire. Hence, there is an optimal number of extra wires for a given bus structure and working frequency. Using our flow, designers can easily obtain the improvement curve of the maximum propagation length with respect to the number of additional wires. With this information, designers can determine the optimal number of additional wires and estimate the maximum propagation length of the bus.
Figure 30: The maximum propagation length vs. different wire overheads for a 4-bit data bus.
[Plot: maximum propagation length (µm) versus number of bits (4 to 11).]
Figure 31: The maximum propagation length vs. different wire overheads for a 6-bit data bus.
Figure 32 presents further simulation results of our flow. With the number of data bits varying from 3 to 10, we use the maximum signal propagation length estimation flow to find the maximum length with a one-bit wire overhead. For example, for 3-bit data, we run our flow on a 3-bit bus and on a 4-bit bus to find the maximum propagation length. The original propagation length in Figure 32 is the maximum propagation length found by our flow without the one-bit wire overhead (e.g., for 3-bit data, it is the maximum length found on the 3-bit bus). The increased propagation length is the additional length gained by our flow after adding the one-bit wire overhead. As shown in Figure 32, our flow increases the propagation length by about 25 percent over the original after adding one more bit as wire overhead. Hence, we conclude that our flow can indeed increase the maximum propagation length by using our encoding scheme.
[Plot: propagation length (µm) versus number of data bits (3 to 10), showing the original propagation length and the increased propagation length.]
Figure 32: The maximum propagation length after adding one more bit as wire overhead.
Chapter 4 Conclusions
4.1 Bus Encoding for Reducing the Delay Time
In this work, we propose an effective on-chip bus encoding flow that considers RLC effects and is applicable at any working frequency. With user-given design constraints and wire parameters, our encoding flow generates a valid code set that meets all constraints. Since every transition between any two patterns in the valid code set meets the design constraint, designers can perform a one-to-one mapping between the valid code set and the data patterns without restrictions.
Simulation results show that our encoding method can significantly reduce the coupling delay of a bus under a given delay constraint and given parameters. In addition, by using the superposition theorem, our method significantly reduces the runtime.
4.2 Bus Encoding to Lengthen Signal Propagation
We also propose a maximum signal propagation length estimation flow that can increase the propagation length on buses under user-given constraints and bus parameters. Our flow combines the proposed bus encoding scheme with a curve fitting method. The proposed bus encoding scheme can effectively reduce the LC coupling effects on on-chip buses and hence improve the worst-case switching delay. In addition, the signal propagation length can be efficiently predicted by the proposed curve fitting method. Therefore, our flow can produce a valid code set for a bus structure that achieves the maximum signal propagation length under the given constraints with reasonable runtime. The simulation results show that the proposed flow significantly increases the propagation length.
Since this work can estimate the propagation length of a bus, we believe it could feasibly be integrated into a floorplanning or routing tool. Furthermore, we intend to combine our work with layout techniques such as shield insertion or buffer insertion, so that the maximum signal propagation length of a bus could be further increased by a proper combination of these techniques.
References
[1] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, 2003.
[2] Mohamed A. Elgamel and Magdy A. Bayoumi, “Interconnect Noise Analysis and Optimization in Deep Submicron Technology,” IEEE Circuits and Systems Magazine, pp. 1540-7977, March 2003.
[3] Joon-Seo Yim and Chong-Min Kyung, “Reducing Cross-Coupling among Interconnect Wires in Deep-Submicron Datapath Design,” Proceedings of the IEEE Design Automation Conference, pp. 21-25, June 1999.
[4] Shang-Wei Tu, Jing-Yang Jou, and Yao-Wen Chang, “RLC Effects on Worst-Case Switching Pattern for On-Chip Buses,” IEEE International Symposium on Circuits and Systems, 2004.
[5] Mohamed A. Elgamel, Ashok Kumar, and Magdy A. Bayoumi, “Efficient Shield Insertion for Inductive Noise Reduction in Nanometer Technologies,” IEEE Transactions on Very Large Scale Integration Systems, vol. 13, no. 3, pp. 401-405, Mar. 2005.
[6] Yu Cao, Chenming Hu, Xuejue Huang, Andrew B. Kahng, Sudhakar Muddu, Dirk Stroobandt, and Dennis Sylvester, “Effects of Global Interconnect Optimizations on Performance Estimation of Deep Submicron Design,” IEEE/ACM International Conference on Computer-Aided Design, pp. 56-61, Nov. 2000.
[7] Kevin M. Lepak, Irwan Luwandi, and Lei He, “Simultaneous Shield Insertion and Net Ordering under Explicit RLC Noise Constraint,” Proceedings of the Design Automation Conference, pp. 199-202, June 2001.
[8] Magdy A. El-Moursy and Eby G. Friedman, “Optimum Wire Sizing of RLC Interconnect with Repeaters,” Proceedings of the IEEE Great Lakes Symposium on VLSI, pp. 27-32, April 2003.
[9] Yehea I. Ismail, Eby G. Friedman, and Jose L. Neves, “Repeater Insertion in Tree Structured Inductive Interconnect,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 5, pp. 471-481, May 2001.
[10] Bret Victor and Kurt Keutzer, “Bus Encoding to Prevent Crosstalk Delay,” International Conference on Computer-Aided Design, pp. 57-63, Nov. 2001.
[11] Kwang-Hyun Baek, Ki-Wook Kim, and Sung-Mo Kang, “A Low Energy Encoding Technique for Reduction of Coupling Effects in SOC Interconnects,” Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems, vol. 1, pp. 80-83, Aug. 2000.
[12] Paul P. Sotiriadis and Anantha Chandrakasan, “Reducing Bus Delay in Submicron Technology Using Coding,” Proceedings of the Asia and South Pacific Design Automation Conference, pp. 109-114, Feb. 2001.
[13] Harish Kriplani, Farid Najm, and Ibrahim Hajj, “Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI Circuits: Algorithms, Signal Correlations, and Their Resolution,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 998-1012, 1995.
[14] L. O. Chua, C. A. Desoer, and E. S. Kuh, Linear and Nonlinear Circuits, McGraw-Hill, 1987.
[15] Keith Nabors and Jacob White, “FastCap: A Multipole Accelerated 3-D Capacitance Extraction Program,” IEEE Transactions on Computer-Aided Design, vol. 10, no. 11, pp. 1447-1459, Nov. 1991.
[16] M. Kamon, M. J. Tsuk, and J. K. White, “FastHenry: A Multipole-Accelerated 3-D Inductance Extraction Program,” IEEE Transactions on Computer-Aided Design, pp. 1750-1758, Sept. 1994.
[17] Doug Matzke, “Will Physical Scalability Sabotage Performance Gain?” IEEE Computer, pp. 37-39, Sept. 1997.
[18] Ankireddy Nalamalpu, Sriram Srinivasan, and Wayne P. Burleson, “Boosters for Driving Long On-Chip Interconnects: Design Issues, Interconnect Synthesis, and Comparison with Repeaters,” IEEE Transactions on Computer-Aided Design, vol. 21, pp. 50-62, Jan. 2002.
[19] Hui Zhang, Varghese George, and Jan M. Rabaey, “Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness,” IEEE Transactions on Very Large Scale Integration Systems, vol. 8, pp. 264-272, June 2000.
[20] Atul Maheshwari and Wayne Burleson, “Differential Current-Sensing for On-Chip Interconnects,” IEEE Transactions on Very Large Scale Integration Systems, vol. 12, pp. 1321-1329, Dec. 2004.
[21] Yehea I. Ismail and Eby G. Friedman, “Effects of Inductance on the Propagation Delay and Repeater Insertion in VLSI Circuits: A Summary,” IEEE Circuits and Systems Magazine, 2003.
Vita
Jiun-Sheng Huang was born in Taitung on December 29, 1980. He received the B.S. degree in Electrical Engineering from National Chiao Tung University in June 2003 and entered the Institute of Electronics, National Chiao Tung University, in September 2003. His major studies were Electronic Design Automation (EDA) and VLSI design. He received the M.S. degree from NCTU in June 2005.