
Chapter 5 Conclusions and Future Work

5.2 Future work

In addition to the work presented in this thesis, several issues remain worthy of further investigation.

(1) In our task model, we assume that tasks do not share any data. In practice, however, tasks may share data with one another through a number of common memory blocks.

Concurrently scheduling tasks that share data may improve performance, because such tasks can prefetch data into the cache for one another. To account for data sharing among tasks, the hint generation and hint evaluation phases must be modified to predict sharing behavior. In addition, the task scheduling phase needs a new gang grouping mechanism that attempts to co-schedule tasks that share data.
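One possible shape for such a mechanism is a greedy pass that seeds each gang with a task and then fills the remaining cores with the tasks whose memory-block footprints overlap the gang's the most. This is only a minimal sketch under assumed interfaces: the task names, block identifiers, and the representation of tasks as sets of shared block IDs are all hypothetical, not part of the thesis's actual scheduler.

```python
def group_by_sharing(tasks, cores):
    """Greedy sharing-aware gang grouping (illustrative sketch).

    tasks: dict mapping a task name to the set of memory block IDs it
    touches (hypothetical representation). cores: gang size. Each gang
    is filled with the remaining task that shares the most blocks with
    the blocks already touched by the gang.
    """
    remaining = dict(tasks)
    gangs = []
    while remaining:
        # Seed a new gang with an arbitrary remaining task.
        name, blocks = remaining.popitem()
        gang, footprint = [name], set(blocks)
        while len(gang) < cores and remaining:
            # Pick the task overlapping the gang's footprint the most.
            best = max(remaining, key=lambda t: len(remaining[t] & footprint))
            footprint |= remaining.pop(best)
            gang.append(best)
        gangs.append(gang)
    return gangs
```

A real implementation would also need a tie-breaking policy and would have to weigh sharing benefits against the cache-contention predictions produced by the hint evaluation phase.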

(2) We assume that all tasks are equally important. However, it would be more desirable to allow tasks with different levels of importance. The task scheduling phase would then need a new, priority-aware gang grouping mechanism, under which more critical tasks should encounter less cache contention even if some cores must be left idle. Measuring performance would also require a more sophisticated metric, because raw IPC does not reflect the importance of individual tasks.
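One natural candidate for such a metric is an importance-weighted average of per-task IPC, so that a critical task's throughput counts more than a background task's. The sketch below is an assumption of ours, not a metric defined in the thesis; the weight values are hypothetical.

```python
def weighted_ipc(ipcs, weights):
    """Importance-weighted mean IPC (illustrative sketch).

    Each task's IPC contribution is scaled by its importance weight;
    with equal weights this reduces to the plain mean IPC.
    """
    assert len(ipcs) == len(weights) and sum(weights) > 0
    return sum(i * w for i, w in zip(ipcs, weights)) / sum(weights)

# Hypothetical example: one critical task (weight 3) at 1.2 IPC,
# two background tasks (weight 1) at 0.8 IPC each.
print(round(weighted_ipc([1.2, 0.8, 0.8], [3, 1, 1]), 2))  # prints 1.04
```

Note that a plain (unweighted) mean of the same IPCs would be about 0.93, so the weighting visibly rewards a schedule that favors the critical task.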

(3) Simultaneous Multithreaded (SMT) processors [30, 31] are another architecture in which on-chip resources are shared. In such processors, execution resources such as the ALU and FPU, in addition to the L2 cache, are shared among concurrently running threads. This sharing may introduce resource contention that degrades overall system performance [32-34]. In the future, we could adapt our method to this architecture and take these additional types of resource contention into account. The hint generation and hint evaluation phases would need to be modified to predict such contention. Furthermore, because different types of resource contention may incur different latencies, the task scheduling mechanism may need to consider these differences to schedule tasks efficiently.

(4) We assume that our scheduler runs on a dedicated system processor and that the scheduling overhead can be ignored. However, the scheduler may be idle much of the time when the task load is low, which is uneconomical for a cost-sensitive system.

The scheduling processor could instead be used for computation while idle, in addition to scheduling tasks. In that case, the scheduling overhead must be taken into account for tasks scheduled on the system processor. Defining and quantifying this overhead is nontrivial and is one possible extension of this thesis.
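One simple way to quantify this, under our own assumption rather than a definition from the thesis, is to charge the scheduler's cycles against the runtime of tasks placed on the system processor, yielding an "effective" IPC. The cycle counts below are hypothetical.

```python
def effective_ipc(instructions, task_cycles, sched_cycles):
    """IPC of a task on the system processor once the cycles spent
    running the scheduler itself are charged to its runtime
    (illustrative sketch; inputs are hypothetical counts)."""
    return instructions / (task_cycles + sched_cycles)

# A task retiring 1M instructions in 800K cycles has IPC 1.25 on a
# dedicated core, but only 1.0 if 200K scheduler cycles are charged.
raw = effective_ipc(1_000_000, 800_000, 0)            # 1.25
charged = effective_ipc(1_000_000, 800_000, 200_000)  # 1.0
```

A fuller treatment would also need to decide how to apportion scheduler cycles among co-scheduled tasks and whether cache pollution caused by the scheduler should be counted as part of the overhead.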

Bibliography

[1] Y. N. Patt, S. J. Patel, M. Evers, D. H. Friendly, J. Stark, "One Billion Transistors, One Uniprocessor, One Chip", Computer, Volume 30, Issue 9, pp.51-57, 1997.

[2] W. J. Dally, S. Lacy, "VLSI Architecture: Past, Present, and Future", Proc. of 20th Anniversary Conference on Advanced Research in VLSI, pp.232-241, 1999.

[3] R. Nair, "Effect of Increasing Chip Density on The Evolution of Computer Architectures", IBM Journal of Research and Development, Volume 46, Number 2, pp.223-234, 2002.

[4] D. Burger, J. R. Goodman, "Billion-Transistor Architectures", Computer, Volume 30, Issue 9, pp.46-48, 1997.

[5] K. Olukotun and L. Hammond, "The Future of Microprocessors", ACM Queue, Volume 3, Issue 7, pp.26-29, 2005.

[6] L. Hammond, B. A. Nayfeh, and K. Olukotun, "A Single-Chip Multiprocessor", Computer, Volume 30, Issue 9, pp.79-85, 1997.

[7] L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen and K. Olukotun, "The Stanford Hydra CMP", IEEE Micro, Volume 20, Issue 2, pp.71-84, 2000.

[8] J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le and B. Sinharoy, "POWER4 System Microarchitecture", IBM Journal of Research and Development, Volume 46, Number 1, pp.5-25, 2002.

[9] B. A. Nayfeh and K. Olukotun, "Exploring the Design Space for A Shared-Cache Multiprocessor", Proc. of 21st Annual International Symposium on Computer Architecture, pp.166-175, 1994.

[10] G. E. Suh, L. Rudolph, and S. Devadas, "Dynamic Cache Partitioning for CMP/SMT Systems", The Journal of Supercomputing, Volume 28, Issue 1, pp.7-26, 2004.

[11] S. Kim, D. Chandra, and Y. Solihin, "Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture", Proc. of 13th International Conference on Parallel Architectures and Compilation Techniques, pp.111-122, 2004.

[12] D. Chandra, F. Guo, S. Kim, and Y. Solihin, "Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture", Proc. of 11th International Symposium on High-Performance Computer Architecture, pp.340-351, 2005.

[13] S. Hily, A. Seznec, "Contention on 2nd Level Cache May Limit the Effectiveness of Simultaneous Multithreading", Tech. Report PI-1086, IRISA, 1997.

[14] A. Settle, J. Kihm, A. Janiszewski, and D. A. Connors, "Architectural Support for Enhanced SMT Job Scheduling", Proc. of 13th International Conference on Parallel Architectures and Compilation Techniques, pp.63-73, 2004.

[15] A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum, "Throughput-Oriented Scheduling On Chip Multithreading Systems", Tech. Report TR-17-04, Harvard, 2004.

[16] J. L. Henning, "SPEC CPU2000: Measuring CPU Performance in the New Millennium", Computer, Volume 33, Issue 7, pp.28-35, 2000.

[17] C. A. R. Hoare, "Monitors: An Operating System Structuring Concept", Communications of the ACM, Volume 17, Number 10, pp.549-557, 1974.

[18] A. Silberschatz, P. B. Galvin and G. Gagne, "Operating System Concepts", Wiley, 2002.

[19] D. G. Feitelson and M. A. Jette, "Improved Utilization and Responsiveness with Gang Scheduling", Proceedings of the Job Scheduling Strategies for Parallel Processing, pp.238-261, 1997.

[20] T. Sherwood, E. Perelman, and B. Calder, "Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications", Proc. of 10th International Conference on Parallel Architectures and Compilation Techniques, pp.3-14, 2001.

[21] P. J. Denning, "Thrashing: Its causes and prevention", Proc. of American Federation of Information Processing Societies Fall Joint Computer Conference, pp.915-922, 1968.

[22] E. Berg and E. Hagersten, "StatCache: A Probabilistic Approach to Efficient and Accurate Data Locality Analysis", Proc. of 4th International Symposium on Performance Analysis of Systems and Software, pp.20-27, 2004.

[23] R. L. Mattson, J. Gecsei, D. R. Slutz and I. L. Traiger, "Evaluation Techniques for Storage Hierarchies", IBM Systems Journal, Volume 9, Number 2, pp.78-117, 1970.

[24] K. D. Cooper and L. Torczon, "Engineering a Compiler", Morgan Kaufmann, 2004.

[25] A. V. Aho, R. Sethi and J. D. Ullman, "Compilers: Principles, Techniques and Tools", Addison-Wesley, 1985.

[26] D. Burger and T. M. Austin, "The SimpleScalar Tool Set, Version 2.0", ACM SIGARCH Computer Architecture News, Volume 25, Issue 3, pp.13-25, 1997.

[27] K. C. Yeager, "The MIPS R10000 Superscalar Microprocessor", IEEE Micro, Volume 16, Number 2, pp.28-40, 1996.

[28] J. Laudon and D. Lenoski, "System Overview of the SGI Origin 200/2000", Proceedings of COMPCON 97, p.150, 1997.

[29] M. D. Hill and A. J. Smith, "Evaluating Associativity in CPU Caches", IEEE Transactions on Computers, Volume 38, Issue 12, pp.1612-1630, 1989.

[30] H. Hirata, K. Kimura, S. Nagamine, Y. Mochizuki, A. Nishimura, Y. Nakase and T. Nishizawa, "An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads", Proc. of the 19th Annual International Symposium on Computer Architecture, pp.136-145, 1992.

[31] J. L. Lo, J. S. Emer, H. M. Levy, R. L. Stamm, D. M. Tullsen and S. J. Eggers, "Converting Thread-Level Parallelism to Instruction-Level Parallelism via Simultaneous Multithreading", ACM Transactions on Computer Systems, Volume 15, Issue 3, pp.322-354, 1997.

[32] S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, R. L. Stamm and D. M. Tullsen, "Simultaneous Multi-Threading: A Platform for Next-Generation Processors", IEEE Micro, Volume 17, Issue 5, pp.12-19, 1997.

[33] S. E. Raasch and S. K. Reinhardt, "The Impact of Resource Partitioning on SMT Processors", Proc. of 12th International Conference on Parallel Architectures and Compilation Techniques, pp.15-25, 2003.

[34] L. K. McDowell, S. J. Eggers and S. D. Gribble, "Improving Server Software Support for Simultaneous Multithreaded Processors", Proc. of the 9th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.37-48, 2003.
