Analysis - Experimental Results - A Fast Task Scheduling Method Based on Genetic Algorithms for Heterogeneous Network-on-Chip Systems

Chapter 4 Experimental Results

4.2 Analysis

1. 2-step VS. 1-step

We compare three crossover schemes here. The curve “Shape” is the crossover scheme presented in [17]. The curve “Shape + Partition” uses partition crossover throughout the evolution process, and the curve “2-step” is our crossover scheme. As Figure 4.3 shows, changing the crossover method at the 200th generation yields better performance in later generations. In this case, the 2-step method achieves a 4.6% throughput improvement over “Shape + Partition” at the 1,000th generation, and its throughput is higher than that of the shape method at every sampled generation.

Table 4.2：2-step method (Task 300/Ratio 2)

              Throughput
Generation    2-step    Shape + Partition    Shape
100           479       479                  435
200           536       536                  489
600           645       616                  615
1000          697       666                  674
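The percentages quoted for Table 4.2 can be checked with a few lines of arithmetic. The sketch below is only a reading aid and not part of the original experiments; the names `throughput` and `improvement_pct` are introduced here for illustration.

```python
# Throughput values copied from Table 4.2 (Task 300 / Ratio 2).
generations = [100, 200, 600, 1000]
throughput = {
    "2-step": [479, 536, 645, 697],
    "Shape + Partition": [479, 536, 616, 666],
    "Shape": [435, 489, 615, 674],
}


def improvement_pct(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


for i, gen in enumerate(generations):
    two_step = throughput["2-step"][i]
    for name in ("Shape + Partition", "Shape"):
        gain = improvement_pct(two_step, throughput[name][i])
        print(f"generation {gen:4d}: 2-step vs {name}: {gain:+.1f}%")
```

At the 1,000th generation the gain over “Shape + Partition” is (697 - 666) / 666, about 4.65%, which corresponds to the 4.6% quoted in the text. The gain over “Shape” at the same generation is about 3.4%.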

Figure 4.3：Improvement of 2-step method (Task 300/Ratio 2). Axes: Generation (x) and Throughput (y). Curves: 2-step, Shape+Partition, Shape.
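The idea of changing the crossover operator at a fixed generation can be sketched as follows. This is a minimal illustration under several assumptions, not the implementation used in this work. The two operators are generic stand-ins (a block-wise swap and a one-point crossover) rather than the partition and shape crossovers of the thesis or of [17]. The chromosome encoding (one processor index per task), the toy fitness function `balance`, and all parameter names are hypothetical. The order of the two phases (partition crossover first) is inferred from Table 4.2, where the “2-step” and “Shape + Partition” values coincide up to the 200th generation.

```python
import random


def partition_crossover(a, b, rng, parts=4):
    """Stand-in for a partition-based crossover: swap whole blocks of genes."""
    size = max(1, len(a) // parts)
    child = list(a)
    for start in range(0, len(a), size):
        if rng.random() < 0.5:
            child[start:start + size] = b[start:start + size]
    return child


def one_point_crossover(a, b, rng):
    """Stand-in for the second-phase operator: classic one-point crossover."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def pick_crossover(generation, switch_generation=200):
    """Return the crossover operator to use in the given generation."""
    if generation < switch_generation:
        return partition_crossover
    return one_point_crossover


def evolve(fitness, n_genes, n_values, generations=300, pop_size=20, seed=0):
    """Tiny GA loop whose only purpose is to show the operator switch."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_values) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for gen in range(generations):
        crossover = pick_crossover(gen)
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]      # keep the better half unchanged
        children = [crossover(*rng.sample(parents, 2), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)


def balance(chromosome, n_values=4):
    """Toy fitness: prefer mappings that spread tasks evenly over processors."""
    loads = [chromosome.count(p) for p in range(n_values)]
    return -max(loads)


best = evolve(balance, n_genes=16, n_values=4)
print(best, balance(best))
```

Only the control flow in `pick_crossover` reflects the text. A real scheduler would replace both operators and the fitness function with ones that account for task dependencies and communication cost.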

2. Partition Method

As shown in Figure 4.4, the three curves give the throughput at the 100th, 200th and 1,000th generations. Because we use partition crossover while controlling communication overhead, throughput tends to rise as the partition number increases. However, the partition number also affects the run time of the system, so we pick a suitable partition number from the improvement curve. Taking Figure 4.4 as an example, once the partition number exceeds 6, the throughput improvement curve flattens.

Table 4.3：Partition (Task 200/Ratio 2)

             Throughput
Partition    100 gen    200 gen    1000 gen
1            621        679.3      783.1
2            663.1      740.7      845
3            671.8      749.7      852.8
4            671.4      746.1      857.5
5            682.6      749.4      840
6            685        755.8      863.9
7            679.6      760.3      873.4
8            694        767.6      884.2
9            691.7      767.8      871.6
10           700.8      766        890.3

Figure 4.4：Partition (Task 200/Ratio 2). Axes: Partition number (x) and Throughput (y). Curves: 1000 gen, 200 gen, 100 gen.
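One way to read the trade-off between partition number and throughput is to express every entry of Table 4.3 as a gain over the single-partition case. The sketch below does only that. The helper name `gain_over_single_partition` is introduced here, and the run-time cost of more partitions is not modeled because the text gives no figures for it.

```python
# Throughput values copied from Table 4.3 (Task 200 / Ratio 2).
partitions = list(range(1, 11))
throughput = {
    100: [621, 663.1, 671.8, 671.4, 682.6, 685, 679.6, 694, 691.7, 700.8],
    200: [679.3, 740.7, 749.7, 746.1, 749.4, 755.8, 760.3, 767.6, 767.8, 766],
    1000: [783.1, 845, 852.8, 857.5, 840, 863.9, 873.4, 884.2, 871.6, 890.3],
}


def gain_over_single_partition(series):
    """Percent gain of each partition count relative to partition number 1."""
    base = series[0]
    return [(value - base) / base * 100.0 for value in series]


for gen, series in throughput.items():
    gains = gain_over_single_partition(series)
    row = "  ".join(f"{p}:{g:5.1f}%" for p, g in zip(partitions, gains))
    print(f"{gen:4d} gen  {row}")
```

At the 1,000th generation, going from 1 to 2 partitions already gives about 7.9%, while partition numbers 6 and 10 reach about 10.3% and 13.7%. Most of the gain therefore comes from the first few partitions.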

3. Comparison under Different Throughput Demand

We compare three crossover schemes (two-point crossover, shape crossover [17] and our crossover method) under three task ranges. The results demonstrate the advantage of our algorithm. Because throughput improves greatly at the beginning of the evolution, our method reaches a given throughput demand much sooner than the other crossover schemes. The time saved is especially significant for the task graphs of 300 and 400 tasks.

Table 4.4：Comparison under different throughput demand (Task 200/Ratio 1)

Task200

Throughput    250    350    450
2 point       29     243    837
Shape         13     233    631
2-step        16     146    533

Figure 4.5：Throughput curves in 200 tasks (Ratio 1). Axes: Throughput (x) and Generation (y). Curves: 2 point, Shape, 2-step.

Table 4.5：Comparison under different throughput demand (Task 300/Ratio 1)

Task300

Throughput    200    250    300
2 point       105    364    868
Shape         88     259    672
2-step        34     157    447

Figure 4.6：Throughput curves in 300 tasks (Ratio 1). Axes: Throughput (x) and Generation (y). Curves: 2 point, Shape, 2-step.

Table 4.6：Comparison under different throughput demand (Task 400/Ratio 1)

Task400

Throughput    160    200    240
PP            12     150    713
Shape         5      161    469
PGA           4      51     278

Figure 4.7：Throughput curves in 400 tasks (Ratio 1). Axes: Throughput (x) and Generation (y). Curves: 2 point, Shape, 2-step.
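The time saved under each throughput demand can be quantified from Tables 4.4 and 4.5. The sketch below assumes the table entries are the generations needed to reach each throughput, as the axes of Figures 4.5 and 4.6 suggest. Table 4.6 is left out because its row labels differ from those of the other two tables. The names `generations_needed` and `saving_pct` are introduced here.

```python
# Generations needed to reach each throughput demand, copied from
# Table 4.4 (Task 200 / Ratio 1) and Table 4.5 (Task 300 / Ratio 1).
generations_needed = {
    "Task200": {
        "demand": [250, 350, 450],
        "2 point": [29, 243, 837],
        "Shape": [13, 233, 631],
        "2-step": [16, 146, 533],
    },
    "Task300": {
        "demand": [200, 250, 300],
        "2 point": [105, 364, 868],
        "Shape": [88, 259, 672],
        "2-step": [34, 157, 447],
    },
}


def saving_pct(ours, other):
    """Percent of generations saved by `ours` relative to `other`."""
    return (other - ours) / other * 100.0


for case, table in generations_needed.items():
    for i, demand in enumerate(table["demand"]):
        ours = table["2-step"][i]
        vs_shape = saving_pct(ours, table["Shape"][i])
        vs_two_point = saving_pct(ours, table["2 point"][i])
        print(f"{case} demand {demand}: "
              f"{vs_shape:+.1f}% vs Shape, {vs_two_point:+.1f}% vs 2 point")
```

For 300 tasks at a throughput demand of 300, the 2-step method needs about 33.5% fewer generations than shape crossover and about 48.5% fewer than two-point crossover. For 200 tasks at the lowest demand (250), shape crossover reaches the target slightly earlier (13 versus 16 generations), so the saving there is negative.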

4. Throughput Comparison

In this section, we compare four crossover methods: random, two-point crossover, shape crossover [17] and our method. Throughput is normalized to that of the random method. In all cases, our method achieves around a 10% improvement at the 100th generation. A large improvement at the start of the evolution saves considerable time in reaching a desired throughput.

In most cases, our algorithm attains higher throughput throughout the evolution process, especially when the task number is above 300, where a 5% improvement is attained at the 1,000th generation.

Task200

Figure 4.8：Improvement of 4 crossover schemes (Task 200/Ratio 1). Horizontal axis: Generation.

Figure 4.9：Improvement of 4 crossover schemes (Task 200/Ratio 2). Horizontal axis: Generation.

Figure 4.10：Improvement of 4 crossover schemes (Task 200/Ratio 3). Horizontal axis: Generation.

Figure 4.11：Improvement of 4 crossover schemes (Task 200/Ratio 4). Horizontal axis: Generation.

These four cases use task graphs containing 170~230 tasks under four ratios of computation time to communication time. Our algorithm obtains an 8%~10% improvement at the 100th generation, but the difference decreases in later generations.

Task300

Figure 4.12：Improvement of 4 crossover schemes (Task 300/Ratio 1). Axes: Generation (x) and Throughput (y). Curves: 2-step, Shape, 2 point, Random. Annotated improvement: 12.9%.

Figure 4.13：Improvement of 4 crossover schemes (Task 300/Ratio 2). Horizontal axis: Generation.

Figure 4.14：Improvement of 4 crossover schemes (Task 300/Ratio 3). Horizontal axis: Generation.

Figure 4.15：Improvement of 4 crossover schemes (Task 300/Ratio 4). Axes: Generation (x) and Throughput (y). Curves: 2-step, Shape, 2 point, Random.

These four cases use task graphs containing 270~330 tasks under four ratios of computation time to communication time. Our algorithm obtains a 12% improvement at the 100th generation and still retains an improvement above 4% at the 1,000th generation.

Task400

Figure 4.16：Improvement of 4 crossover schemes (Task 400/Ratio 1). Horizontal axis: Generation.

Figure 4.17：Improvement of 4 crossover schemes (Task 400/Ratio 2). Horizontal axis: Generation.

Figure 4.18：Improvement of 4 crossover schemes (Task 400/Ratio 3). Horizontal axis: Generation.

Figure 4.19：Improvement of 4 crossover schemes (Task 400/Ratio 4). Horizontal axis: Generation.

These four cases use task graphs containing 370~430 tasks under four ratios of computation time to communication time. Our algorithm attains an improvement above 12% at the 100th generation and still retains an improvement above 5% at the 1,000th generation.

5. Mutation Rate VS. Throughput

Figure 4.20：Mutation rate VS. Throughput (Task 300/Ratio 2). Legend: 0, 10%, 20%.

In our algorithm, a suitable mutation rate can raise throughput. However, we believe that the crossover method dominates the throughput performance of the genetic algorithm, so we focus on improving the crossover method.
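To make the role of the mutation rate concrete, the sketch below applies a simple per-gene mutation to a task-to-processor mapping. It is an illustration under assumptions: the text does not state how mutation is defined in this work, so the encoding (one processor index per task), the per-gene interpretation of the rate, and the function name `mutate` are all hypothetical. The rates 0, 10% and 20% are taken from the legend of Figure 4.20.

```python
import random


def mutate(chromosome, n_processors, rate, rng):
    """Reassign each task to a random processor with probability `rate`."""
    return [rng.randrange(n_processors) if rng.random() < rate else gene
            for gene in chromosome]


rng = random.Random(42)
mapping = [0, 1, 2, 3] * 5            # 20 tasks on 4 processors (toy example)
for rate in (0.0, 0.1, 0.2):
    mutated = mutate(mapping, 4, rate, rng)
    changed = sum(a != b for a, b in zip(mapping, mutated))
    print(f"rate {rate:.0%}: {changed} of {len(mapping)} genes changed")
```

With a rate of 0 the mapping is returned unchanged, so the search relies entirely on crossover. Higher rates inject more random reassignments per generation.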

Chapter 5 Conclusions and Future Work

5.1 Conclusions

Our method obtains a substantial improvement in the scheduling process, and its results can be used according to the requirement at hand. For instance, if a given throughput must be reached in a short time, our method attains around a 10% improvement at the 100th generation; if the requirement is met at that point, considerable scheduling time is saved. On the other hand, if a higher throughput is wanted, our method also provides better throughput in later generations. In particular, for task numbers above 300, our method still obtains a 5% improvement at the 1,000th generation.

Traditional methods cannot handle the task graph well, and the situation worsens as the task graph becomes larger. We therefore combine the partition scheme with graphic-based crossover, and improve the partition method by finding a suitable partition size and adjusting the partition boundaries. Experimental results show a clear improvement in throughput and a considerable saving in scheduling time.

As applications become more complex, we expect our scheduling method to handle them well and to deliver high system performance.

5.2 Future work

The partition method for the task graph is important for partition-based genetic algorithms, especially in complex applications. With a better partition method for the task graph, we believe the scheduling program could be sped up and better system throughput obtained. To obtain a better partition algorithm, the partition size, communication amount and task topology can be considered further.

Different crossover methods produce different improvement curves. For example, our method obtains a large improvement at the start of the evolution, but there is still room for further improvement in later generations. The other, traditional methods have their own curves and trends. A detailed analysis of all the crossover methods might lead to a better crossover method that combines their advantages.

References

[1] R. Ho, K. Mai, and M. Horowitz, "The future of wires," Proceedings of the IEEE, vol. 89, no. 4, pp. 490-504, April 2001.

[2] William J. Dally and J. Poulton, "Digital Systems Engineering," Cambridge University Press, 1998.

[3] Cesar Albenes Zeferino and Altamiro Amadeu Susin, "SoCIN: A Parametric and Scalable Network-on-Chip," 16th Symposium on Integrated Circuits and Systems Design, pp. 169-174, Sep. 2003.

[4] Axel Jantsch and Hannu Tenhunen, "Networks on Chip," Kluwer Academic Publishers, 2003.

[5] Adrijean Andriahantenaina, Hervé Charlery, Alain Greiner, Laurent Mortiez and Cesar Albenes Zeferino, "SPIN: a scalable, packet switched, on-chip micro-network," Design, Automation and Test in Europe Conference and Exhibition, supplements 70-73, 2003.

[6] Luca Benini and Giovanni De Micheli, "Networks on Chips: a New SoC Paradigm," Computer, Volume 35, Issue 1, pp. 70-78, Jan. 2002.

[7] William J. Dally and Brian Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," Design Automation Conference, pp. 684-689, June 2001.

[8] Pierre Guerrier and Alain Greiner, "A Generic Architecture for On-Chip Packet-Switched Interconnections," Design, automation and test in Europe, pp. 250-256, 2000.

[9] Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen, Martti Forsell, Mikael Millberg, Johny Öberg, Kari Tiensyrjä and Ahmed Hemani, "A Network on Chip Architecture and Design Methodology," IEEE Computer Society Annual Symposium on VLSI, pp. 105-112, April 2002.

[10] Daniel Wiklund and Dake Liu, "SoCBUS: Switched Network on Chip for Hard Real Time Embedded Systems," Parallel and Distributed Processing Symposium, April 2003.

[11] Doris Ching, Patrick Schaumont and Ingrid Verbauwhede, "Integrated modeling and Generation of a Reconfigurable Network-on-chip," 18th International Parallel and Distributed Processing Symposium, pp. 139-145, 2004.

[12] Davide Bertozzi and Luca Benini, "Xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip," IEEE Circuits and Systems Magazine, Volume 4, Issue 2, pp. 18-31, 2004.

[13] Srinivasan Murali and Giovanni De Micheli, "Bandwidth-Constrained Mapping of Cores onto NOC Architectures," Design, Automation and Test in Europe Conference and Exhibition, volume. 2, pp. 896-901, Feb. 2004.

[14] Tang Lei and Shashi Kumar, "A Two-Step Genetic Algorithm for Mapping Task Graphs to a Network on Chip Architecture," Euromicro Symposium on Digital System Design, pp. 180-187, Sep. 2003.

[15] Edwin S.H. Hou and Nirwan Ansari, "A Genetic Algorithm for Multiprocessor Scheduling," IEEE Transactions on Parallel and Distributed Systems, Volume 5, pp.113-120, 1994.

[16] Yi-Hsuan Lee and Cheng Chen, "A Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems," The Ninth Workshop on Compiler Techniques for High-performance Computing, 2003.

[17] Wan-Hsi Hsieh, "GA-Based Task Scheduling for Heterogeneous Network-on-Chip," National Chiao Tung University, Master Thesis, 2005.

[18] Liang-Yu Lin, Cheng-Yeh Wang, Pao-Jui Huang, Chih-Chieh Chou and Jing-Yang Jou, "Communication-driven Task Binding for Multiprocessor with Latency Insensitive Network-on-Chip," Asia and South Pacific Design Automation Conference, Jan. 2005.

[19] Jingcao Hu and Radu Marculescu, "Energy-Aware Mapping for Tile-based NoC Architectures Under Performance Constraints," Asia & South Pacific Design Automation Conference, pp. 233-239, Jan. 2003.

[20] Jingcao Hu and Radu Marculescu, "Energy- and Performance-Aware Mapping of Regular NoC Architectures," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 24, Issue 4, pp. 551-562, April 2005.

[21] Kenjiro Taura and Andrew Chien, "A Heuristic Algorithm for Mapping Communicating Tasks on Heterogeneous Resources," 9th Heterogeneous Computing Workshop, pp. 102-115, May 2000.

[22] David E. Goldberg, "Genetic Algorithms in Search, Optimization & Machine Learning," Addison-Wesley Publishers, 1989.

[23] Baxter, M. J., Tokhi, M. O. and Fleming, P. J. "An investigation of the heterogeneous mapping problem using genetic algorithms," CONTROL '96, UKACC.

[24] R.J.H. Hoes, "Predictable Dynamic Behavior in NoC-based Multiprocessor System-on-Chip," M.Sc. Thesis, TUE, Eindhoven, Dec. 2004.

[25] Robert P. Dick, David L. Rhodes, and Wayne Wolf, "TGFF: task graphs for free," 6th International Workshop on Hardware/Software Codesign, pp. 97-101, 1998.

Vita

Yan-Ting Mi was born in Taipei on June 4, 1983. He received the B.S. degree in Electronics Engineering from National Chiao Tung University in June 2006. From September 2006 to August 2008, he was a graduate student of Professor Jing-Yang Jou in the Institute of Electronics, National Chiao Tung University. His research was related to Electronic Design Automation (EDA). He received the M.S. degree in Electronics Engineering from National Chiao Tung University in August 2008.
