ACCEPTED MANUSCRIPT

signer would adopt without an automatic co-design environment. This result demonstrates the importance of PE selection. We also observe that LTM-PS substantially improves solution quality over baseline SA: when considering dynamic energy consumption only, LTM-PS consumes 13% less energy than baseline SA. Two-Stage SA first uses the Greedy PE-Selection method to prune the solution space, so it can find better solutions than LTM-PS in a shorter time. On average, Two-Stage SA consumes 32.9% less energy than baseline SA when considering dynamic energy consumption only.
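The two-stage idea can be illustrated with a minimal sketch on a hypothetical toy problem. Everything in it is an assumption made for illustration: the PE types and their speed/power numbers, the use of total PE power as the ordering key for the greedy stage, the fixed number of PEs, the deadline penalty, and the names `evaluate`, `greedy_pe_selection`, `anneal_mapping` and `two_stage_sa`. Precedence, communication and NoC routing, which the real co-synthesis problem includes, are ignored. Stage 1 fixes a PE set once; stage 2 anneals only over task mappings, so its search space is smaller.

```python
import math
import random
from itertools import combinations_with_replacement, product

# Hypothetical toy model (an assumption, not the paper's formulation):
# a PE type is (name, speed, power) and a task is a workload. A task takes
# work/speed time and power*work/speed energy on a PE; tasks mapped to the
# same PE run back to back. Precedence and communication are ignored.
PE_TYPES = [("low", 1.0, 1.0), ("mid", 2.0, 3.0), ("high", 4.0, 10.0)]

def evaluate(mapping, pe_set, tasks):
    """Return (energy, makespan) of a task -> PE-slot mapping."""
    busy = [0.0] * len(pe_set)
    energy = 0.0
    for work, slot in zip(tasks, mapping):
        _, speed, power = pe_set[slot]
        busy[slot] += work / speed
        energy += power * work / speed
    return energy, max(busy)

def greedy_pe_selection(tasks, deadline, n_pes=2):
    """Stage 1: try PE sets in non-decreasing order of total power (a stand-in
    for configuration energy) and return the first set for which some mapping
    meets the deadline."""
    candidates = sorted(combinations_with_replacement(PE_TYPES, n_pes),
                        key=lambda pes: sum(p[2] for p in pes))
    for pe_set in candidates:
        for mapping in product(range(n_pes), repeat=len(tasks)):
            if evaluate(mapping, pe_set, tasks)[1] <= deadline:
                return list(pe_set)
    return None

def anneal_mapping(tasks, pe_set, deadline, t0=10.0, alpha=0.95, steps=2000, seed=0):
    """Stage 2: SA over task mappings only; the PE set from stage 1 is fixed."""
    rng = random.Random(seed)

    def cost(m):
        energy, makespan = evaluate(m, pe_set, tasks)
        return energy + 100.0 * max(0.0, makespan - deadline)  # deadline penalty

    current = [rng.randrange(len(pe_set)) for _ in tasks]
    best, temp = list(current), t0
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(len(tasks))] = rng.randrange(len(pe_set))
        delta = cost(candidate) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = list(current)
        temp = max(temp * alpha, 1e-6)  # geometric cooling
    return best, evaluate(best, pe_set, tasks)

def two_stage_sa(tasks, deadline):
    pe_set = greedy_pe_selection(tasks, deadline)  # stage 1 runs once
    if pe_set is None:
        return None
    return pe_set, anneal_mapping(tasks, pe_set, deadline)
```

For example, with workloads [4, 3, 2, 5] and a deadline of 6, the greedy stage rejects two `low` PEs (a total work of 14 cannot fit into two slots of length 6) and keeps a `low` + `mid` pair; the annealing stage then searches only the 16 mappings onto that pair.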

When comparing the SA-based algorithms with the branch-and-bound and iterative algorithms, we observe that the proposed SA-based algorithms are able to synthesize good-quality solutions in a reasonable time. The branch-and-bound algorithm always finds the optimal solution: when considering dynamic energy only, branch-and-bound consumes 52.5% less energy than baseline SA. However, its execution time is extremely long. In this set of experiments, branch-and-bound produced synthesis results only for task graphs g1–g5, each of which has at most 13 tasks. Comparing the SA-based algorithms with branch-and-bound, the solution quality of the configurations synthesized by Two-Stage SA is close to the optimum: when considering dynamic energy only, for the five task graphs that branch-and-bound is able to synthesize, Two-Stage SA consumes at most 5.9% more energy than branch-and-bound. We also observe that the iterative algorithm, which explores only a subset of feasible solutions in each iteration, performs worse than baseline SA, LTM-PS, and Two-Stage SA in all cases. However, the iterative algorithm outperforms the Greedy PE-Selection method in some cases. As described earlier, when a task set needs PEs with high computation power to meet its deadline, the Greedy PE-Selection method tends to select a set of PEs with high computation power and high energy consumption; in such cases, Greedy PE-Selection tends to perform worse than the iterative algorithm.
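The optimality versus run-time trade-off of branch-and-bound can be made concrete with a minimal sketch on a hypothetical toy problem: independent tasks mapped onto a fixed set of PEs under a deadline, with precedence and communication ignored. The names `branch_and_bound` and `EPS` and the cost model are assumptions for illustration, not the implementation evaluated here. The search is exact, but in the worst case it visits on the order of |PEs|^|tasks| nodes, which is why this kind of search stops scaling beyond small task graphs.

```python
import math

EPS = 1e-9  # tolerance for floating-point deadline comparisons

def branch_and_bound(tasks, pes, deadline):
    """Exact minimum-energy task -> PE mapping whose makespan meets the deadline.

    tasks: list of workloads; pes: list of (speed, power) pairs (hypothetical model).
    Tasks on the same PE run back to back; energy of a task = power * work / speed.
    """
    # Cheapest energy each task could have (ignoring the deadline); summed over
    # the unassigned tasks it gives an optimistic lower bound for a partial mapping.
    cheapest = [min(power * work / speed for speed, power in pes) for work in tasks]
    remaining = [0.0] * (len(tasks) + 1)
    for i in range(len(tasks) - 1, -1, -1):
        remaining[i] = remaining[i + 1] + cheapest[i]

    best = {"energy": math.inf, "mapping": None, "nodes": 0}
    busy = [0.0] * len(pes)
    mapping = []

    def search(i, energy):
        best["nodes"] += 1
        if energy + remaining[i] >= best["energy"]:
            return  # bound: this branch cannot beat the incumbent
        if i == len(tasks):
            best["energy"], best["mapping"] = energy, list(mapping)
            return
        for k, (speed, power) in enumerate(pes):
            t = tasks[i] / speed
            if busy[k] + t > deadline + EPS:
                continue  # prune: this PE would miss the deadline
            busy[k] += t
            mapping.append(k)
            search(i + 1, energy + power * t)
            mapping.pop()
            busy[k] -= t

    search(0, 0.0)
    return best
```

With two PEs (speed 1, power 1) and (speed 2, power 3), workloads [4, 3, 2, 5] and a deadline of 6, the search returns an energy of 18 with mapping [0, 1, 0, 1]; if no mapping meets the deadline, `mapping` stays `None`.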

Comparing Figure 11 and Figure 12, we observe that the gap between the SA-based algorithms and the iterative algorithm narrows when static power consumption is considered. The SA-based methods tend to select PEs with lower voltage levels as long as the timing constraints are met. These PEs also lead to longer execution times and thus incur more leakage energy. Nevertheless, even when static power consumption is considered, all SA-based methods except Greedy PE-Selection still outperform the iterative algorithm, and Two-Stage SA still performs best in all cases.
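Why the gap narrows can be seen with a back-of-the-envelope energy model. The sketch below is a simplification and every constant in it is an assumption: dynamic energy per cycle scales as C * V^2, clock frequency scales linearly with V, and leakage power `P_LEAK` is treated as independent of V, so static energy is simply `P_LEAK` times execution time. The names `task_energy` and `savings` are hypothetical.

```python
# First-order model; every constant is an illustrative assumption:
#   dynamic energy per cycle = C * V^2
#   clock frequency          = K * V
#   leakage power            = P_LEAK (treated as independent of V)
C, K, P_LEAK = 1.0, 1.0, 0.25

def task_energy(cycles, volts):
    """Return (dynamic, static) energy for running `cycles` cycles at `volts`."""
    exec_time = cycles / (K * volts)   # lower voltage -> slower clock -> longer run
    dynamic = cycles * C * volts ** 2  # switching energy
    static = P_LEAK * exec_time        # leakage accumulates over the whole run
    return dynamic, static

def savings(cycles, v_high, v_low):
    """Fractional saving of v_low over v_high: (dynamic only, dynamic + static)."""
    dyn_hi, st_hi = task_energy(cycles, v_high)
    dyn_lo, st_lo = task_energy(cycles, v_low)
    dynamic_only = 1.0 - dyn_lo / dyn_hi
    with_static = 1.0 - (dyn_lo + st_lo) / (dyn_hi + st_hi)
    return dynamic_only, with_static
```

With 100 cycles, lowering the voltage from 1.0 to 0.5 cuts dynamic energy by 75%, but because the task runs twice as long its leakage energy doubles, so the saving on dynamic plus static energy is only 40%.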

Fig. 13. Dynamic Energy Consumption of MPEG2 Encoder System Synthesized by Various Co-Synthesis Algorithms (Greedy, LTM-PS, Two-Stage SA, Iterative)

Fig. 14. Dynamic and Static Energy Consumption of MPEG2 Encoder System Synthesized by Various Co-Synthesis Algorithms (Greedy, LTM-PS, Two-Stage SA, Iterative)

Figure 13 and Figure 14 show the dynamic and the dynamic+static energy consumption of the MPEG2 encoder system, respectively. The energy consumption is also normalized to that of baseline SA. We do not show results for the branch-and-bound algorithm in this set of experiments, since the communication pattern of the MPEG2 encoder is complex and branch-and-bound cannot synthesize a result in a reasonable time. Here, Two-Stage SA consumes 6.5% less energy than baseline SA, and the Greedy PE-Selection method still performs worst. However, the results synthesized by the iterative algorithm are almost the same as those of Two-Stage SA. The PE library used in this set of experiments has little variation in PE performance and energy consumption, so the co-synthesis algorithms have little room to trade off performance against energy when choosing PEs, and the iterative algorithm can easily find a good configuration in its initial solution. When system leakage energy consumption is considered (Figure 14), the results are similar to those for dynamic energy consumption only.
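Because all results are reported relative to baseline SA, a statement such as "6.5% less energy than baseline SA" corresponds to a normalized value of 0.935. A small sketch of that arithmetic follows. The function names and the raw energies in the example are made up for illustration and are not measurements from these experiments; the use of an arithmetic mean over task graphs is also an assumption, since the text does not say how averages are computed.

```python
def normalize_to_baseline(energies, baseline="Baseline SA"):
    """Divide every scheme's energy by the baseline scheme's energy."""
    base = energies[baseline]
    return {name: value / base for name, value in energies.items()}

def percent_less_than_baseline(normalized):
    """Convert normalized energies into 'X% less energy than baseline'."""
    return {name: 100.0 * (1.0 - value) for name, value in normalized.items()}

def average_reduction(normalized_per_graph):
    """Percent reduction from the arithmetic mean of per-task-graph normalized
    energies (the averaging method is an assumption)."""
    mean = sum(normalized_per_graph) / len(normalized_per_graph)
    return 100.0 * (1.0 - mean)

# Made-up raw energies, chosen only to illustrate the arithmetic.
raw = {"Baseline SA": 200.0, "Two-Stage SA": 187.0}
normalized = normalize_to_baseline(raw)
reduction = percent_less_than_baseline(normalized)
```

Averaging normalized values (rather than raw energies) keeps large task graphs from dominating the mean, which is one reason to report results this way.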

7.2 Comparison of Algorithm Efficiency

Another important metric for evaluating hardware-software co-synthesis algorithms is how quickly they find a synthesis result. Table 2 lists the execution time of the various schemes, normalized to that of baseline SA.

Table 2
Execution Time Evaluation

Scheme                     Running Time (normalized to baseline SA)
Baseline SA                1
Greedy SA                  1.40
LTM-PS                     2.20
Two-Stage SA               2.07
Iterative Algorithm        0.35
Branch and Bound Method    6368.8

The results show that Two-Stage SA derives better solutions than LTM-PS without requiring a longer execution time. In Two-Stage SA, the first stage is invoked only once, and the second stage converges faster than LTM-PS because the PE search space has already been reduced. LTM-PS runs longer than baseline SA because a low-temperature SA is performed after each PS perturbation. The experimental results also show that the iterative algorithm is the fastest of all the evaluated algorithms: it explores a smaller solution space than the SA-based algorithms, trading solution quality for execution efficiency. The branch-and-bound algorithm is the slowest of all the evaluated algorithms, since in the worst case it must explore the design space exhaustively.
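The cost structure of LTM-PS can be sketched as nested annealing loops. This is a minimal sketch under stated assumptions (a toy model with a fixed number of PEs, a penalty term for deadline misses, made-up parameters, and hypothetical names such as `ltm_ps`, `low_temp_mapping_sa` and `STATS`), not the paper's implementation. Counting cost evaluations shows why LTM-PS is slower: with S PS perturbations and L inner steps it performs (S + 1)(L + 1) evaluations regardless of how the moves turn out, whereas a two-stage scheme selects PEs once and then runs a single annealing pass.

```python
import math
import random

# Hypothetical toy model (an assumption for illustration): a PE type is
# (speed, power); a task is a workload; tasks mapped to the same PE run back
# to back; precedence and communication are ignored. Deadline misses are
# penalized in the cost rather than forbidden.
PE_TYPES = [(1.0, 1.0), (2.0, 3.0), (4.0, 10.0)]
TASKS = [4.0, 3.0, 2.0, 5.0]
DEADLINE = 6.0
STATS = {"evaluations": 0}  # counts cost evaluations, a proxy for run time

def cost(pe_set, mapping):
    STATS["evaluations"] += 1
    busy = [0.0] * len(pe_set)
    energy = 0.0
    for work, slot in zip(TASKS, mapping):
        speed, power = pe_set[slot]
        busy[slot] += work / speed
        energy += power * work / speed
    return energy + 100.0 * max(0.0, max(busy) - DEADLINE)

def low_temp_mapping_sa(pe_set, mapping, rng, steps=50, temp=0.05):
    """Inner loop (LTM): a short low-temperature SA over task mappings for a fixed PE set."""
    current, current_cost = list(mapping), cost(pe_set, mapping)
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(len(TASKS))] = rng.randrange(len(pe_set))
        c = cost(pe_set, cand)
        if c <= current_cost or rng.random() < math.exp((current_cost - c) / temp):
            current, current_cost = cand, c
    return current, current_cost

def ltm_ps(outer_steps=40, t0=5.0, alpha=0.9, seed=1):
    """Outer loop (PS): SA over the PE selection; every PS perturbation is followed
    by a full low-temperature mapping SA before the move is accepted or rejected."""
    rng = random.Random(seed)
    pe_set = [PE_TYPES[2], PE_TYPES[2]]  # start from two of the fastest PEs
    mapping, cur = low_temp_mapping_sa(pe_set, [0, 1, 0, 1], rng)
    best = (cur, list(pe_set), list(mapping))
    temp = t0
    for _ in range(outer_steps):
        cand_set = list(pe_set)
        cand_set[rng.randrange(len(cand_set))] = rng.choice(PE_TYPES)  # PS move
        cand_map, c = low_temp_mapping_sa(cand_set, mapping, rng)       # LTM
        if c <= cur or rng.random() < math.exp((cur - c) / temp):
            pe_set, mapping, cur = cand_set, cand_map, c
            if cur < best[0]:
                best = (cur, list(pe_set), list(mapping))
        temp *= alpha
    return best
```

With the defaults (40 outer moves, 50 inner steps), `STATS["evaluations"]` ends at 41 × 51 = 2091, independent of which moves are accepted.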

8 Conclusion

In this paper, we propose an energy-aware architectural co-synthesis algorithm for Network-on-Chip (NoC) system design that simultaneously optimizes both the software and hardware architectures to meet a tight performance constraint. We propose four SA-based co-synthesis algorithms. Baseline SA treats each co-design step as a perturbation; LTM-PS performs a low-temperature SA after each PS perturbation; the Greedy PE-Selection method tries PE configurations in non-decreasing order of their energy consumption; and Two-Stage SA first uses the Greedy PE-Selection method to prune the design space and then invokes a complete SA to derive the final hardware and software architecture. To evaluate the efficiency of the proposed SA-based algorithms, we also implement the branch-and-bound and iterative algorithms for the NoC co-synthesis problem. Our experimental results show that the Two-Stage SA algorithm achieves the best solution quality in a reasonable execution time. When considering synthetic task sets and dynamic energy only, Two-Stage SA consumes 32.9% less energy than baseline SA on average.



Yi-Jung Chen received the B.S. and M.S. degrees from the Department of Computer Science and Information Engineering at National Chi Nan University, Nantou, Taiwan, in 2000 and 2002, respectively. She is currently working toward the Ph.D. degree in the Department of Computer Science and Information Engineering at National Taiwan University, Taipei, Taiwan.

Her research interests include high-level synthesis, Network-on-Chip design and memory hierarchy design.

Chia-Lin Yang received the B.S. degree from the National Taiwan Normal University, Taiwan, R.O.C., in 1989, the M.S. degree from the University of Texas at Austin in 1992, and the Ph.D. degree from the Department of Computer Science, Duke University, Durham, NC, in 2001.

In 1993, she joined VLSI Technology Inc. (now Philips Semiconductors) as a Software Engineer. She is currently an Associate Professor in the Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan. Her research interests include energy-efficient microarchitectures, memory hierarchy design, and multimedia workload characterization.

Dr. Yang is the recipient of a 2000–2001 Intel Foundation Graduate Fellowship Award and a 2005 IBM Faculty Award.

Yen-Sheng Chang received the B.S. degree in computer science and engineering from National Dong Hwa University, Hualien, Taiwan, in 2003, and the M.S. degree in computer science and engineering from National Taiwan University, Taipei, Taiwan, in 2005. His research interests include hardware-software co-design and Network-on-Chip design.


In document: An Architectural Co-Synthesis Algorithm for Energy-Aware Network-on-Chip Design (pages 21–30)