Conclusion and Future Work - Instruction Scheduling on VLIW Digital Signal Processors for Reducing Energy Consumption

In this thesis, we propose a method named Greedy Switching Activities Scheduling (GSAS), which comprises two phases. Phase one schedules the DAG, and phase two re-assigns the registers to reduce the switching activities. The experimental results show the effectiveness of the method. This chapter concludes the thesis and proposes directions for future work.

5.1 Conclusion

Portable devices such as cellular phones, digital cameras, and PDAs have become popular and are widely used around the world. Hence, power reduction in VLIW DSPs is an increasingly important problem. Because buses account for a significant fraction of the total power dissipation in a processor, we propose a method, GSAS, to reduce the switching transitions on the instruction bus. In summary, we draw the following conclusions:

(a) Phase one of GSAS uses a greedy method to schedule the DAG and reduce the switching activities. According to the experimental results, the more power each bit switch consumes, the more energy our method saves. That is, the larger the power coefficient α, which represents the power consumed per transition, the more power we save in switching activities.

(b) The time complexity of phase one of GSAS is O(|V|*(|V|*N)), where |V| is the number of sub-instructions and N is the number of functional units. It does not need to find a min-cost maximal-weight bipartite matching; it selects only one node per iteration, which takes at most O(|V|*N) time. In contrast, the complexity of MSAS is O(|V|*(N+|V|)³). Hence, phase one of GSAS takes less time than MSAS.

(c) Phase two of GSAS improves the results of phase one by re-assigning the registers. According to the experimental results, combining phase one with phase two saves more power than using phase one alone. This shows that register assignment is an important factor affecting the total switching activities. Phase two uses a greedy method to re-assign the registers, which reduces the total switching activities of the schedule created by phase one.
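The greedy idea summarized in conclusion (a) can be illustrated with a minimal sketch. It is not the GSAS algorithm itself: it ignores the DAG dependencies, the functional units, and the register assignment, and only shows how switching activities between consecutive instruction words can be counted and greedily reduced. The function names and the integer encoding of instruction words are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def switching_activities(prev_word: int, next_word: int) -> int:
    """Count the bit positions that differ between two instruction words."""
    return bin(prev_word ^ next_word).count("1")


def total_switching(start_word: int, words: Sequence[int]) -> int:
    """Sum the bit flips seen on the bus when `words` are issued in order."""
    total = 0
    current = start_word
    for word in words:
        total += switching_activities(current, word)
        current = word
    return total


def greedy_order(start_word: int, candidates: Sequence[int]) -> List[int]:
    """Greedy sketch (assumption: no dependencies between candidates).

    At every step, pick the remaining word that flips the fewest bits
    relative to the word currently on the bus.
    """
    remaining = list(candidates)
    order: List[int] = []
    current = start_word
    while remaining:
        best = min(
            range(len(remaining)),
            key=lambda i: switching_activities(current, remaining[i]),
        )
        current = remaining.pop(best)
        order.append(current)
    return order


if __name__ == "__main__":
    words = [0b1111, 0b0001, 0b0011]
    print(total_switching(0b0000, words))
    print(total_switching(0b0000, greedy_order(0b0000, words)))
```

With the start word 0b0000 and the three candidate words above, the original order causes 8 bit flips, while the greedy order (0b0001, 0b0011, 0b1111) causes 4. Under the power model of conclusion (a), the saved energy would scale with the coefficient α.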

5.2 Future Work

Several directions remain for future work.

(a) In our experiments, we use only a simplified machine model of the TI TMS320C6000. In the future, we can run our experiments on different machine architectures to see whether our method works on them as well.

(b) Phase two of GSAS can currently be combined only with phase one of GSAS. In the future, we will try to find a better way to re-assign the registers to reduce the switching activities, and make it work with other scheduling algorithms as well.

(c) Our method is not designed specifically for loop applications, and we do not optimize the organization of the loop body. In the future, we can focus on scheduling for loop applications to reduce both the schedule length and the switching activities of a loop.

(d) Our method considers only self-transitions. Some research has tried to reduce coupling transitions [24-25]. In the future, we can consider both self-transitions and coupling transitions to reduce power further.
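To make the distinction in item (d) concrete, the following sketch counts both kinds of transitions between two consecutive bus words. The coupling model is an assumption made only for illustration: a coupling transition is counted here when two adjacent bus lines toggle in opposite directions. The models used in [24-25] may differ, and the function names and the `width` parameter are hypothetical.

```python
def self_transitions(prev_word: int, next_word: int) -> int:
    """Count bus lines whose own value changes between two words."""
    return bin(prev_word ^ next_word).count("1")


def opposite_coupling_transitions(prev_word: int, next_word: int, width: int) -> int:
    """Count adjacent line pairs that toggle in opposite directions.

    Assumed simplified model: only pairs (i, i+1) where both lines
    switch and end up at different values are counted.
    """
    count = 0
    for i in range(width - 1):
        a_prev, a_next = (prev_word >> i) & 1, (next_word >> i) & 1
        b_prev, b_next = (prev_word >> (i + 1)) & 1, (next_word >> (i + 1)) & 1
        both_toggle = a_prev != a_next and b_prev != b_next
        if both_toggle and a_next != b_next:
            count += 1
    return count


if __name__ == "__main__":
    # Both pairs of words flip two lines, but only the first pair
    # flips them in opposite directions.
    print(self_transitions(0b01, 0b10), opposite_coupling_transitions(0b01, 0b10, 2))
    print(self_transitions(0b00, 0b11), opposite_coupling_transitions(0b00, 0b11, 2))
```

Two schedules with the same number of self-transitions can therefore differ in coupling transitions, which is why a scheduler that takes both into account might reduce power further.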

Bibliography

[1] V. Tiwari, S. Malik, and M. Fujita, “Power analysis of embedded software: A first step towards software power minimization,” in Proceedings of the IEEE/ACM International Conference on Computer Aided Design, Nov. 1994, pp. 110-115.

[2] N. Chang, K. Kim, and H. G. Lee, “Cycle-accurate energy measurement and characterization with a case study of the ARM7TDMI,” IEEE Trans. on VLSI Systems, vol. 10, no. 2, pp. 146-154, Apr. 2002.

[3] L.-F. Chao, Andrea LaPaugh, and Edwin H.-M. Sha, “Rotation Scheduling: A Loop Pipelining Algorithm”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 16, Issue 3, pp. 229-239, March 1997.

[4] Nelson L. Passos and Edwin H.-M. Sha, “Achieving Full Parallelism using Multidimensional Retiming”, IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 11, pp. 1150-1163, Nov. 1996.

[5] Nelson L. Passos and Edwin H.-M. Sha, “Scheduling of Uniform Multidimensional Systems under Resource Constraints”, IEEE Transactions on Very Large Scale Integration Systems, Vol. 6, Issue 4, pp. 719-730, Dec. 1998.

[6] Mike Tien-Chien Lee, Vivek Tiwari, Sharad Malik, and Masahiro Fujita, “Power Analysis and Minimization Techniques for Embedded DSP Software”, IEEE Transactions on VLSI Systems, Vol. 5, No. 1, pp. 123-133, March 1997.

[7] M. J. Irwin. Tutorial: Power reduction techniques in SoC bus interconnects. In 1999 IEEE International ASIC/SOC Conference, 1999.

[8] Texas Instruments, Inc. TMS320C6000 CPU and Instruction Set Reference Guide, 2000.

[9] Texas Instruments, Inc. TMS320C6000 Peripherals Reference Guide.

[10] Zili Shao, Qingfeng Zhuge, Youtao Zhang, and Edwin H.-M. Sha, “Algorithms and Analysis of Scheduling for Low-power High-performance DSP on VLIW Processors”, accepted in International Journal of High Performance Computing and Networking.

[11] Zili Shao, Qingfeng Zhuge, Edwin H.-M. Sha, and Chantana Chantrapornchai, “Loop Scheduling for Minimizing Schedule Length and Switching Activities”, Proc. of International Symposium on Circuits and Systems, Vol. 5, pp. 109-112, May 2003.

[12] Zili Shao, Qingfeng Zhuge, Edwin H.-M. Sha, and Chantana Chantrapornchai, “Analysis and Algorithms for Scheduling with Minimal Switching Activities”, Proc. of 45th Midwest Symposium on Circuits and Systems, Vol. 1, pp. 372-375, Aug. 2002.

[13] C. Lee, J.-K. Lee, and T. Hwang, “Compiler Optimization on Instruction Scheduling for Low Power”, Proc. of International Symposium on System Synthesis, pp. 55-60, Sep. 2000.

[14] K. Choi and A. Chatterjee, “Efficient Instruction-level Optimization Methodology for Low-power Embedded Systems”, Proc. of International Symposium on System Synthesis, pp. 147-152, Oct. 2001.

[15] Markus Lorenz, Rainer Leupers, Peter Marwedel, Thorsten Drager, and Gerhard Fettweis, “Low-energy DSP Code Generation using a Genetic Algorithm”, Proc. of International Conference on Computer Design, pp. 431-437, Sep. 2001.

[16] E. Musoll and J. Cortadella, “Scheduling and Resource Binding for Low Power”, Proc. of International Symposium on System Synthesis, pp. 104-109, April 1995.

[17] Suvodeep Gupta and Srinivas Katkoori, “Force-directed Scheduling for Dynamic Power Optimization”, Proc. of IEEE Computer Society Annual Symposium on VLSI, pp. 68-73, April 2002.

[18] Daehong Kim, Dongwan Shin, and Kiyoung Choi, “Low Power Pipelining of Linear Systems: A Common Operand Centric Approach”, Proc. of International Symposium on Low Power Electronics and Designs, pp. 225-230, Aug. 2001.

[19] Zili Shao, Qingfeng Zhuge, Edwin H.-M. Sha, Meilin Li, and Bin Xiao, “Switching-Activity Minimization on Instruction-level Loop Scheduling for VLIW DSP Applications”, Proc. of 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 224 – 23, Sept. 2004.

[20] H. Saip and C. L. Lucchesi, “Matching algorithm for bipartite graphs”, Tech. Rep. DCC-93-03 (Departamento de Ciência da Computação, Universidade Estadual de Campinas), March 1994.

[21] C. E. Leiserson and J. B. Saxe, “Retiming synchronous circuitry”, Algorithmica, 6:5-35, 1991.

[22] M. J. Irwin. Tutorial: Power reduction techniques in SoC bus interconnects. In 1999 IEEE International ASIC/SOC Conference, 1999.

[23] http://www.ert.rwth-aachen.de/Projekte/Tools/DSPSTONE/dspstone.html

[24] Chun-Gi Lyuh, Taewhan Kim, and Ki-Wook Kim, “Coupling-Aware High-level Interconnect Synthesis for Low Power”, Proc. of the 2002 IEEE/ACM International Conference on Computer-Aided Design, pp. 609-613, Nov. 2002.

[25] Yan Zhang, John Lach, Kevin Skadron, and Mircea R. Stan, “Odd/Even Bus Invert with Two-Phase Transfer for Buses with Coupling”, Proc. of the International Symposium on Low Power Electronics and Design, pp. 80-83, Aug. 2002.
