Chapter 3 Task Scheduling
3.7 Termination
GAs terminate when the fitness value of the best chromosome saturates during the evolution process or when the generation number reaches a pre-defined limit.
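The two stopping conditions can be sketched as a small predicate. This is only an illustrative sketch, not the thesis implementation: the function name `should_terminate`, the `patience` window used to decide that the fitness is "saturated", and the tolerance `eps` are all assumptions, since the text does not state how saturation is measured.

```python
from typing import Sequence


def should_terminate(best_fitness_history: Sequence[float],
                     max_generations: int,
                     patience: int = 50,
                     eps: float = 1e-9) -> bool:
    """Return True when the GA should stop.

    best_fitness_history[i] is the best fitness observed in generation i.
    `patience` and `eps` are hypothetical knobs: the fitness is treated as
    saturated if the best value varied by at most `eps` over the last
    `patience` generations.
    """
    generation = len(best_fitness_history)
    # Condition 1: the generation count reached the pre-defined limit.
    if generation >= max_generations:
        return True
    # Condition 2: the best fitness has saturated.
    if generation > patience:
        window = best_fitness_history[-(patience + 1):]
        if max(window) - min(window) <= eps:
            return True
    return False
```

A GA main loop would call this once per generation, after appending the current best fitness to the history.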
Chapter 4 Experimental Results
4.1 Experimental flow
Figure 28 shows our experimental flow. First, we use Task Graphs for Free (TGFF) [20], a user-controllable, general-purpose, pseudorandom task graph generator, to generate random test cases. The generated task graphs are then scheduled by our task scheduling tool. Finally, we analyze the experimental results.
Figure 28 : Experimental flow (blocks: TGFF, Task Graph, Task Analysis, Scheduling)
We use TGFF to generate many task graphs. Each task graph consists of a netlist and tables of related information. For example, Figure 29 (a) is a netlist, where “TASK” declares a task and is followed by the task name and its computation information, which is specified in Figure 29 (c) (the processing element database). Likewise, “ARC” declares a data transmission from the former task to the latter, and the corresponding communication amount is given in Figure 29 (d). In the computation table, “uP” and “fpga” are the computation times on the processor and the FPGA, respectively, and “memory” and “capacity” are the memory usage on the processor and the capacity usage on the FPGA. The task graph corresponding to the TGFF output file is shown in Figure 29 (b).
Figure 29 : (a) Netlist (@TASK_GRAPH 0 { PERIOD 1659), (b) Task graph, (c) Computation table (@computation 0 {), (d) Communication table (@communication 0 {)
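As a rough illustration of how such a netlist can be read into a task graph, the sketch below parses “TASK” and “ARC” entries. It assumes each entry has the layout `TASK <name> TYPE <n>` and `ARC <name> FROM <src> TO <dst> TYPE <n>`, where the `TYPE` number indexes a row of the computation or communication table. That layout, the function name `parse_netlist`, and the sample text are assumptions for illustration; the sample is not the netlist of Figure 29.

```python
from typing import Dict, List, Tuple


def parse_netlist(text: str) -> Tuple[Dict[str, int], List[Tuple[str, str, int]]]:
    """Extract tasks and arcs from a TGFF-style netlist (assumed layout).

    Returns (tasks, arcs) where tasks maps a task name to its computation
    table row, and each arc is (source task, destination task,
    communication table row).
    """
    tasks: Dict[str, int] = {}
    arcs: List[Tuple[str, str, int]] = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "TASK":
            # TASK <name> TYPE <computation table row>
            tasks[tokens[1]] = int(tokens[3])
        elif tokens[0] == "ARC":
            # ARC <name> FROM <src> TO <dst> TYPE <communication table row>
            arcs.append((tokens[3], tokens[5], int(tokens[7])))
    return tasks, arcs


# Hypothetical two-task example, not taken from the thesis.
SAMPLE = """\
@TASK_GRAPH 0 {
  PERIOD 1659
  TASK t0_0 TYPE 2
  TASK t0_1 TYPE 0
  ARC a0_0 FROM t0_0 TO t0_1 TYPE 1
}
"""
tasks, arcs = parse_netlist(SAMPLE)
```

Lines that start with anything other than `TASK` or `ARC` (block headers, `PERIOD`, closing braces) are skipped.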
4.2 Analysis of performance of GAs
In this experiment, we compare the traditional mating schemes with our mating schemes. The parameters of the GAs are shown in Table 2.
Parameter        Value
Max generation   1000
Population       400
Mutation rate    20%
Cross rate       40%
Table 2 : The parameters of GAs
Cross rate means that 40% of the population is selected to mate. Mutation rate means that every newly generated chromosome mutates with a probability of 20%. The population is set to 400 chromosomes. The algorithm terminates when the performance of the best chromosome saturates or when it reaches the maximum generation.
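To make the arithmetic behind these parameters concrete, the following sketch stores the Table 2 values and derives the number of chromosomes that mate in each generation. The names `GAParams`, `num_mating`, and `should_mutate` are hypothetical, and how the thesis selects the mating chromosomes is not shown here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GAParams:
    """GA parameters from Table 2."""
    max_generation: int = 1000
    population: int = 400
    mutation_rate: float = 0.20
    cross_rate: float = 0.40


def num_mating(params: GAParams) -> int:
    """Number of chromosomes selected to mate in one generation."""
    return round(params.population * params.cross_rate)


def should_mutate(params: GAParams, rng: random.Random) -> bool:
    """Decide whether one newly generated chromosome is mutated."""
    return rng.random() < params.mutation_rate


params = GAParams()
mating = num_mating(params)  # 40% of 400 chromosomes
```

With the values of Table 2, `mating` is 160 chromosomes per generation, and on average one in five newly generated chromosomes is mutated.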
We generate 20 random task graphs, each containing 270 ~ 330 tasks. The computation time of each task is set to 150 ~ 200 time units on an FPGA and 50 ~ 67 time units on a processor. When two or more tasks are mapped onto the same processor, the processor must schedule them. Therefore, we set the computation time of each task on a processor to 1/3 of that on an FPGA, so that the total computation time of tasks on processors and on FPGAs is more balanced. The communication amount is 150 ~ 200 data units, and the maximum fan-in/fan-out of each task is 6. The memory and capacity usage of each task
The communication time is the communication amount divided by the channel bandwidth, assuming no contention. Here, the channel bandwidth is set to 1 ~ 4 (data units / time unit), so that the ratio of computation time to contention-free communication time is 1 ~ 4.
When the ratio is low (e.g., 1), communication time is comparable to computation time, so the system is communication intensive. When the ratio is high (e.g., 4), computation time dominates, so the system is computation intensive.
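The relation between bandwidth and the ratio can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the ratio is taken against the FPGA computation time, whose 150 ~ 200 range matches the range of the communication amount.

```python
def communication_time(amount: float, bandwidth: float) -> float:
    """Contention-free communication time: amount / bandwidth."""
    return amount / bandwidth


def comp_to_comm_ratio(computation_time: float,
                       amount: float,
                       bandwidth: float) -> float:
    """Ratio of computation time to contention-free communication time."""
    return computation_time / communication_time(amount, bandwidth)


# A task with 180 time units of computation that sends 180 data units.
ratio_at_bw1 = comp_to_comm_ratio(180, 180, bandwidth=1)
ratio_at_bw4 = comp_to_comm_ratio(180, 180, bandwidth=4)
```

Because the computation time and the communication amount are drawn from the same numeric range, the ratio is approximately equal to the bandwidth: a higher bandwidth shortens the communication time and raises the ratio.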
The resource location of our platform is shown in Figure 30. The topology resembles a chessboard, and the mesh size is 7 × 7. The memory of each processor and the capacity of each FPGA are set to 1800. The buffer size of each PE is 12000 (data units).
Figure 30 : Resource location (legend: Processor, FPGA)
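One possible reading of the chessboard layout is that processors and FPGAs alternate on the mesh. The sketch below builds such a 7 × 7 grid. The alternation rule, the choice of a processor at position (0, 0), and the name `chessboard_mesh` are assumptions, since the exact placement is defined only by the figure.

```python
from collections import Counter
from typing import List


def chessboard_mesh(size: int = 7,
                    first: str = "Processor",
                    second: str = "FPGA") -> List[List[str]]:
    """Build a size x size mesh whose PE types alternate like a chessboard.

    `first` is placed at (0, 0); which type occupies that corner is an
    assumption made for this sketch.
    """
    return [[first if (row + col) % 2 == 0 else second
             for col in range(size)]
            for row in range(size)]


mesh = chessboard_mesh()
counts = Counter(pe for row in mesh for pe in row)
```

Under these assumptions a 7 × 7 mesh holds 25 PEs of the corner type and 24 of the other, and every PE has only neighbors of the opposite type.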
The system performance improvement rates of the four mating schemes at four ratios (1 ~ 4) are shown in Figure 31. Because sub-graph crossover considers the dependency of tasks, GAs using sub-graph crossover achieve a higher improvement rate than those using the traditional single-point and two-point crossover. This implies that mating schemes should consider the dependency of tasks. In addition, shape crossover not only inherits the
shape crossover outperforms all other mating schemes. In Figure 32, the saturation time of shape crossover is shorter than that of the others. Shape crossover thus provides better results in less computation time than the other mating schemes.
Figure 31 : The improvement of 4 mating schemes (average of 20 cases; Improve rate = [...]; Ratio = comp. time / commu. time)
Figure 32 : Saturation time of 4 mating schemes (saturation time in generations; average of 20 cases)
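For reference, the two traditional mating schemes used as baselines in this comparison can be sketched as follows. The chromosome encoding (gene i holds the index of the PE assigned to task i) is an assumption made for illustration, and the function names are hypothetical. The sub-graph and shape crossover operators are not reproduced here.

```python
import random
from typing import List, Tuple

Chromosome = List[int]  # assumed encoding: gene i = PE index of task i


def single_point_crossover(a: Chromosome, b: Chromosome,
                           rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after one random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def two_point_crossover(a: Chromosome, b: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the middle segment between two random cut points.

    Requires chromosomes of length 3 or more.
    """
    lo, hi = sorted(rng.sample(range(1, len(a)), 2))
    return (a[:lo] + b[lo:hi] + a[hi:],
            b[:lo] + a[lo:hi] + b[hi:])


rng = random.Random(1)
parent_a, parent_b = [0] * 6, [1] * 6
sp_child1, sp_child2 = single_point_crossover(parent_a, parent_b, rng)
tp_child1, tp_child2 = two_point_crossover(parent_a, parent_b, rng)
```

Both operators cut the chromosome by gene position alone, so tasks that are adjacent in the task graph may be separated when their genes fall on different sides of a cut.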
Chapter 5 Conclusions and Future Works
5.1 Conclusions
In this thesis, we address the multi-constraint task scheduling problem. By mapping the task scheduling problem to the GA domain, we solve it efficiently. Since the traditional mating schemes in GAs do not consider the dependencies in the task graph, we propose sub-graph crossover and shape crossover to overcome this issue. We also construct a high-level simulator to evaluate our solutions; it is both fast and accurate. The experimental results show that our mating schemes provide better performance and require less computation time than the traditional ones.
5.2 Future works
We find that the buffer size of each task input/output has a great impact on system performance. If the buffer size were unlimited, every data transmission could always be accepted, the utilization of communication resources would be maximized, and system performance would improve. However, because on-chip memory is limited, an unlimited buffer size is impossible. An algorithm must therefore be developed to optimize the buffer length of each input/output, instead of distributing buffer space equally, so that the system performs well with a limited buffer size.
Resource location is also important. If we do not consider the relationship between the topology of the resource location and the application, the system may not perform well. Consequently, given a specific application and several platforms with different topologies, an algorithm must be developed to find the most suitable platform for the application.
References
[1] Axel Jantsch and Hannu Tenhunen, Networks on Chip, Kluwer Academic Publishers, 2003.
[2] Luca Benini and Giovanni De Micheli, “Networks On Chips: A New SoC Paradigm,” in Computer, Jan. 2002, Volume 35, Issue 1, pp. 70-78.
[3] Davide Bertozzi and Luca Benini, “Xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip,” in Circuits and Systems Magazine, 2004, Volume 4, Issue 2, pp. 18-31.
[4] Cesar Albenes Zeferino and Altamiro Amadeu Susin, “SoCIN: A Parametric and Scalable Network-on-Chip,” in Proceedings of the 16th Symposium on Integrated Circuits and Systems Design, Sep. 2003, pp. 169-174.
[5] Pierre Guerrier and Alain Greiner, “A Generic Architecture for On-Chip Packet-Switched Interconnections,” in Proceedings of the Conference on Design, Automation and Test in Europe, 2000, pp. 250-256.
[6] Alan Allan, Don Edenfeld, William J. Joyner, Jr., Andrew B. Kahng, Mike Rodgers and Yervant Zorian, “2001 Technology Roadmap for Semiconductors,” in IEEE Computer, Jan. 2002, pp. 42-53.
[7] William J. Dally and Brian Towles, “Route Packets, Not Wires: On-Chip Interconnection Networks,” in Proceedings of the Design Automation Conference, June 2001, pp. 684-689.
[8] Jingcao Hu and Radu Marculescu, “Energy- and Performance-Aware Mapping of
Integrated Circuits and Systems, April 2005, Volume 24, Issue 4, pp.551-562.
[9] Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen, Martti Forsell, Mikael Millberg, Johny Öberg, Kari Tiensyrjä and Ahmed Hemani, “A Network on Chip Architecture and Design Methodology,” in Proceedings of IEEE Computer Society Annual Symposium on VLSI, April 2002, pp. 105-112.
[10] Daniel Wiklund and Dake Liu, “SoCBUS: Switched Network on Chip for Hard Real Time Embedded Systems,” in Proceedings of the Parallel and Distributed Processing Symposium, April 2003.
[11] Jingcao Hu and Radu Marculescu, “Energy-Aware Mapping for Tile-based NoC Architectures Under Performance Constraints,” in Proceedings of Asia & South Pacific Design Automation Conference, Jan. 2003, pp. 233-239.
[12] Srinivasan Murali and Giovanni De Micheli, “Bandwidth-Constrained Mapping of Cores onto NOC Architectures,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, Feb. 2004, volume. 2, pp. 896-901.
[13] Tang Lei and Shashi Kumar, “A Two-Step Genetic Algorithm for Mapping Task Graphs to a Network on Chip Architecture,” in Proceedings of Euromicro Symposium on Digital System Design, Sep. 2003, pp. 180-187.
[14] Liang-Yu Lin, Cheng-Yeh Wang, Pao-Jui Huang, Chih-Chieh Chou and Jing-Yang Jou, “Communication-driven Task Binding for Multiprocessor with Latency Insensitive Network-on-Chip,” Asia and South Pacific Design Automation Conference, Jan. 2005.
[15] R.J.H. Hoes, “Predictable Dynamic Behavior in NoC-based Multiprocessor System-on-Chip,” M.Sc. Thesis, TUE, Eindhoven, Dec. 2004.
Real-Time DSP,” in Global Telecommunication Conference and Exhibitions, Nov.
1989, Volume 2, pp. 1279-1283.
[17] Kenjiro Taura and Andrew Chien, “A Heuristic Algorithm for Mapping Communicating Tasks on Heterogeneous Resources,” in Proceedings of 9th Heterogeneous Computing Workshop, May 2000, pp. 102-115.
[18] David E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley Publishers, 1989.
[19] Baxter, M. J., Tokhi, M. O. and Fleming, P. J. “An Investigation of the Heterogeneous Mapping Problem Using Genetic Algorithms,” on CONTROL '96, UKACC International Conference, Sep. 1996, Volume 1, pp. 448-453.
[20] Robert P. Dick, David L. Rhodes and Wayne Wolf, “TGFF: Task Graphs for Free,” in Proceedings of the 6th International Workshop on Hardware/Software Codesign, 1998, pp. 97-101.
Vita
Wan-His Hsieh was born in Taoyuan, Taiwan on August 6, 1981. He received the B.S. degree in Electrical Engineering from National Central University in June 2003 and entered the Institute of Electronics, National Chiao Tung University in September 2003. His research interests include electronic design automation (EDA) and VLSI design. He received the M.S. degree from National Chiao Tung University in June 2005.