
4.7 Power Aware Architecture

4.7.1 Power Model

Before depicting the simulation results, this subsection presents the power model that will be used to analyze performance throughout the thesis. In a VLSI system designed in CMOS technology, the dominant (dynamic) power consumption of a CMOS gate i can be modeled as (4-15), where C_i is the output capacitance, f_i is the operating frequency, and r_i(0 ↔ 1) is the switching activity of gate i; α and κ are constants.

\[
P_{gate_i} = \alpha \cdot C_i \cdot f_i \cdot V_{DD}^{2} = \kappa \cdot C_i \cdot r_i(0 \leftrightarrow 1). \tag{4-15}
\]

For an execution unit EU_j in such a VLSI system, the power consumption is given by (4-16), where G_j is the gate count of EU_j.

\[
P_{EU_j} = \sum_{i=1}^{G_j} \kappa \cdot C_{ij} \cdot r_{ij}(0 \leftrightarrow 1). \tag{4-16}
\]

After considering the activity of the execution units, the total power consumption can be expressed as (4-17) and approximated as (4-19) by assuming that the switching activities are uniform within an execution unit, that is, r_{ki}(0 ↔ 1) = r_k(0 ↔ 1) for all i. Since the average output capacitance of each execution unit (C_{avg_k}) is nearly the same as the average output capacitance of the whole system (C_{avg}), the total power consumption can be further approximated as (4-22). We therefore obtain the approximate power-estimation model shown in (4-23), where ε_gp is defined as the gate power coefficient. In this thesis, the gate power coefficient is used as the unit for estimating power dissipation.

\[
P_{total} = \sum_{k} P_{EU_k} = \sum_{k} \sum_{i=1}^{G_k} \kappa \cdot C_{ki} \cdot r_{ki}(0 \leftrightarrow 1). \tag{4-17}
\]
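To make the approximation chain concrete, the following sketch spells out the steps described above, under the stated assumptions of uniform switching activity within each execution unit and C_{avg_k} ≈ C_{avg}; it is an illustrative reconstruction of the intermediate steps referenced as (4-19), (4-22), and (4-23), not a verbatim copy of them.

\begin{align*}
P_{total} &\approx \sum_{k} \kappa \, r_k(0 \leftrightarrow 1) \sum_{i=1}^{G_k} C_{ki}
           = \kappa \sum_{k} r_k(0 \leftrightarrow 1) \, G_k \, C_{avg_k} \\
          &\approx \kappa \, C_{avg} \sum_{k} G_k \, r_k(0 \leftrightarrow 1)
           = \varepsilon_{gp} \sum_{k} G_k \, r_k(0 \leftrightarrow 1),
           \qquad \varepsilon_{gp} \triangleq \kappa \cdot C_{avg}.
\end{align*}

In this form, each entry P_i in Table 4.VII is simply G_i · r_i(0 ↔ 1) expressed in units of ε_gp.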

Table 4.VII: Power analysis of the power-aware architecture

EU_i                    Gate count G_i   r_i(0 ↔ 1)        P_i (in ε_gp)
PE array (AD + Adder)   117,760          4p^2 · R_s^{-1}   1.21e8 · R_s^{-1}
PE array (Others)       58,708           4p^2              6.01e7
SRA                     44,640           4p^2              4.57e7
PAT+MVS                 1,800            4p^2              1.84e6
EXU                     15,393           N^2               3.94e6
P_total (in ε_gp)                                          1.21e8 · R_s^{-1} + 1.12e8

N = 16 and p = 16. Cell library: TSMC 0.35 µm process.

4.7.2 Results

Table 4.VII shows the synthesis results using the TSMC 1P4M 0.35 µm cell library, where R_s denotes the content-based subsample rate and ε_gp is the gate power coefficient defined in (4-23). Compared with the general semi-systolic architecture [11], the edge extraction unit (EXU) of the proposed architecture is the major overhead of the power-aware function. As mentioned above, this thesis implements the EXU with one of three gradient filters. According to the synthesis results, the gate counts of the three gradient filters are 499.33, 697.77, and 631.63, respectively. The variation among these values is negligible compared with the overall gate count of the EXU; for instance, the gate count of the EXU with the high-pass filter is 15,393, which is far larger than that variation. This means that the choice of gradient filter barely affects the overhead estimation, so the high-pass filter is used to estimate the performance overhead caused by the EXU.

From Table 4.VII, we can see that the area overhead of the EXU is 6.46%, while the worst-case power overhead is only 2.8% when the subsample rate is 4-to-1 with N = 16 and p = 16.
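As an illustration only, and not part of the thesis's synthesis flow, the following Python sketch recomputes these two overhead figures from the Table 4.VII entries; the unit labels follow the table, and power is expressed in units of ε_gp.

```python
# Hedged sketch (not from the thesis): recompute the EXU area and power
# overheads quoted above from the Table 4.VII entries. Gate counts are in
# equivalent gates; power is expressed in units of the gate power
# coefficient epsilon_gp, i.e. P_i = G_i * r_i.

N, p = 16, 16
Rs = 4  # 4-to-1 content-based subsample rate (worst case considered above)

# Gate counts G_i (Table 4.VII)
gates = {
    "PE array (AD + Adder)": 117_760,
    "PE array (Others)": 58_708,
    "SRA": 44_640,
    "PAT+MVS": 1_800,
    "EXU": 15_393,
}

# Switching activities r_i(0 <-> 1) (Table 4.VII)
activity = {
    "PE array (AD + Adder)": 4 * p**2 / Rs,
    "PE array (Others)": 4 * p**2,
    "SRA": 4 * p**2,
    "PAT+MVS": 4 * p**2,
    "EXU": N**2,
}

power = {name: gates[name] * activity[name] for name in gates}

area_overhead = gates["EXU"] / sum(gates.values())    # ~6.46 %
power_overhead = power["EXU"] / sum(power.values())   # ~2.8 % at Rs = 4

print(f"EXU area overhead:  {area_overhead:.2%}")
print(f"EXU power overhead: {power_overhead:.2%}")
```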

Figure 4-21 shows examples of switching the power-consumption mode for four video clips. The target subsample pixel count is reduced by 48 every 40 frames, and the control parameter K_p is set to 0.2. The results show that the adaptive control mechanism can make the power consumption reach the target level within 10 frames and keep the stationary error under 5%. According to the battery properties described in Section 4.2, the curves show that the power-aware architecture can extend the battery lifetime by slowly and gradually degrading the quality.

The marks A, B, C, and D correspond to the switching points in Fig. 4-2(b).
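The control law itself is not listed in this chapter; as a minimal sketch, assuming a simple proportional update driven by the gap between the measured and target subsample pixel counts (with the gain K_p = 0.2 mentioned above), the per-frame threshold adjustment could look like the following. The function name and sign convention are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of a closed-loop (proportional) threshold update for the
# content-based subsample algorithm. Assumption: the floating threshold is
# nudged in proportion to the difference between the number of pixels kept
# in the current frame and the target pixel count of the active power mode.

def update_threshold(threshold: float,
                     kept_pixels: int,
                     target_pixels: int,
                     k_p: float = 0.2) -> float:
    """Return the floating threshold to use for the next frame."""
    error = kept_pixels - target_pixels
    # Assumed sign convention: raising the threshold classifies fewer pixels
    # as edges, so more pixels are subsampled and fewer PEs stay enabled.
    return threshold + k_p * error

# Per-frame usage (edge extraction, masking, and the ME itself omitted):
# threshold = update_threshold(threshold, kept_pixels_last_frame,
#                              target_pixels_for_power_mode)
```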

4.8 Summary

Motivated by battery properties and the power-aware paradigm, this chapter has presented an architecture-level power-aware technique based on a novel adaptive content-based subsample algorithm. When the battery is at full capacity, the proposed ME architecture turns on all the PEs to provide the best compression quality. In contrast, when the battery capacity is insufficient for full operation, instead of exhibiting an all-or-none behavior, the proposed architecture shifts to a lower power-consumption mode by disabling some PEs, extending the battery lifetime with little quality degradation.

As the simulation results show, the CSA can significantly reduce the computational complexity with little quality degradation. However, the CSA exhibits a non-stationary problem when used to implement the power-aware architecture if the designer uses constant threshold parameters m_t1 and statically sets the floating threshold for a given power mode. Since different video clips with the same threshold parameters will have different subsample rates, setting the threshold value without considering the content variation of the video clips makes the subsample rate non-stationary; that is, the power consumption will not converge within a narrow range for a given power mode.

Figure 4-21: The power switching curves (switch mode versus frame number) of four clips. (a) The Dancer clip. (b) The News clip. (c) The Paris clip. (d) The Waterfall clip.

The divergence of the power consumption results in poor power awareness. To solve this non-stationary problem, this thesis uses an adaptive control mechanism that adjusts the threshold parameters so that the subsample rate remains stationary. The adaptive control mechanism used in this work is a run-time process that adjusts the threshold parameters according to the difference between the current subsample rate and the desired (target) subsample rate.

Built on the content-based algorithm, the power-aware architecture can dynamically operate at different power-consumption modes with little quality degradation, according to the remaining capacity of the battery pack, and thereby achieve a better battery discharging property. The control mechanism, in turn, keeps each power-consumption mode in an acceptably stationary state.

Chapter 5 Conclusion

This thesis has presented content-based motion estimation algorithms and architectures to address the heavy computational load and the power-awareness requirements of portable multimedia applications. The two-phase ME algorithm and the content-based power-aware ME algorithm are proposed to achieve this goal. The former uses an edge-matching approach as the matching criterion of the first phase to reduce the computational complexity. The latter applies a content-based subsample algorithm with an adaptive control mechanism to realize the power-aware function with graceful quality degradation when the architecture operates in a lower power-consumption mode.

By employing the edge-matching criterion and the scan direction in the first phase, the edge-driven two-phase algorithm uses the quantized pixel values more efficiently and thus reduces the computational complexity; reducing the computational complexity in turn reduces the power consumption. As the simulation results show, the proposed two-phase algorithm significantly reduces the computational load compared with the full-search algorithm and is still more efficient than the existing two-phase algorithm.


The content-based power-aware algorithm realizes the power-aware function by disabling or enabling processing elements according to the subsample mask. The approach extracts the edge pixels of a macroblock and subsamples only the non-edge pixels to keep the quality at an acceptable level. Since the power consumption is proportional to the subsample rate, the content-based algorithm adopts a closed-loop control mechanism to keep the subsample rate in a stationary state. Built on the content-based algorithm, the power-aware architecture can dynamically operate at different power-consumption modes with little quality degradation, according to the remaining capacity of the battery pack, to achieve a better battery discharging property.

Owing to the content-based methodology, the proposed algorithms and architectures are well suited to battery-powered portable multimedia devices. The two-phase architecture targets low-complexity applications, while the power-aware architecture extends the battery lifetime with a better quality trade-off. As the simulation results show, the proposed content-based ME algorithms and architectures achieve better power and quality performance for portable multimedia applications.

Bibliography

[1] Information Technology–Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s–Part 2: Video, ISO/IEC 11172-2 (MPEG-1 Video), 1993.

[2] Information Technology–Generic Coding of Moving Pictures and Associated Audio Information: Video, ISO/IEC 13818-2 (MPEG-2 Video), ITU-T H.262, 1996.

[3] MPEG-4 Video Group, Information Technology– Coding of Audio-Visual Objects– Part 2: Visual, ISO/IEC 14496-2, 1999.

[4] P. Kuhn, Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation, Kluwer Academic Publishers, 1999.

[5] ITU-T Recommendation H.261, Video Codec for Audiovisual Services at p × 64 kbit/s, Dec. 1990.

[6] ITU-T Recommendation H.263, Video Codec for Low Bitrate Communication, May 1996.

[7] V. Bhaskaran, Image and Video Compression Standards–Algorithms and Architectures, Kluwer Academic Publishers, 1997.


[8] K.-M. Yang, M.-T. Sun, and L. Wu, “A family of VLSI designs for the motion compensation block-matching algorithm,” IEEE Trans. Circuits Syst., vol. 36, no. 10, pp. 1317–1325, Oct. 1989.

[9] H. Yeo and Y.-H. Hu, “A novel modular systolic array architecture for full-search block matching motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 5, pp. 407–416, Oct. 1995.

[10] S.-B. Pan, S.-S. Chae, and R.-H. Park, “VLSI architectures for block matching algorithms using systolic arrays,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 1, pp. 67–73, Feb. 1996.

[11] C.-H. Hsieh and T.-P. Lin, “VLSI architecture for block-matching motion estimation algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol. 2, no. 2, pp. 169–175, Jun. 1992.

[12] Y.-K. Lai and L.-G. Chen, “A data-interlacing architecture with two-dimensional data-reuse for full-search block-matching algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 2, pp. 124–127, Apr. 1998.

[13] V.-L. Do and K.-Y. Yun, “A low-power VLSI architecture for full-search block-matching motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 4, pp. 393–398, Aug. 1998.

[14] J.-F. Shen, T.-C. Wang, and L.-G. Chen, “A novel low-power full-search block-matching motion-estimation design for H.263+,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 7, pp. 890–897, Jul. 2001.

[15] J.-C. Tuan, T.-S. Chang, and C.-W. Jen, “On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 61–72, Jan. 2002.

[16] J.-N. Kim and T.-S. Choi, “Adaptive matching scan algorithm based on gradient magnitude for fast full search in motion estimation,” IEEE Trans. Consumer Electronics, vol. 45, no. 3, pp. 762–772, Aug. 1999.

[17] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion-compensated interframe coding for video conferencing,” in Proc. National Telecommunications Conf., New Orleans, LA, Nov. 1981, pp. G5.3.1–G5.3.5.

[18] R. Li, B. Zeng, and M.-L. Liou, “A new three-step search algorithm for block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 4, pp. 438–442, Aug. 1994.

[19] M.-J. Chen, L.-G. Chen, and T.-D. Chiueh, “One-dimensional full search motion estimation algorithm for video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 5, pp. 504–509, Oct. 1994.

[20] L.-M. Po and W.-C. Ma, “A novel four-step search algorithm for fast block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 313–317, 1996.

[21] J.-Y. Tham, S. Ranganath, M. Ranganath, and A.-A. Kassim, “A novel unrestricted center-biased diamond search algorithm for block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 4, pp. 369–377, Aug. 1998.

[22] A.-M. Tourapis, O.-C. Au, M.-L. Liou, G. Shen, and I. Ahmad, “Optimizing the MPEG-4 encoder–advanced diamond zonal search,” in Proc. 2000 Int. Symp. Circuits and Systems, Geneva, Switzerland, May 2000, pp. 674–677.

[23] S. Zhu and K.-K. Ma, “A new diamond search algorithm for fast block-matching motion estimation,” IEEE Trans. Image Processing, vol. 9, no. 2, pp. 287–290, Feb. 2000.

[24] B. Liu and A. Zaccarin, “New fast algorithms for the estimation of block motion vectors,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 2, pp. 148–157, Apr. 1993.

[25] B. Natarajan, V. Bhaskaran, and K. Konstantinides, “Low-complexity block-based motion estimation via one-bit transforms,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 4, pp. 702–706, Aug. 1997.

[26] S. Cucchi and D. Grechi, “A new features-based fast algorithm for motion estimation: Decimated integral projection (D.I.P.),” in Proc. 1997 Int. Conf. Information, Communications, and Signal Processing, Singapore, Sep. 1997, pp. 297–300.

[27] J.-S. Kim and R.-H. Park, “Feature-based block matching algorithm using integral projections,” IEE Electronics Letters, vol. 25, pp. 29–30, Jan. 1989.

[28] H. Gharavi and M. Mills, “Blockmatching motion estimation algorithms–new results,” IEEE Trans. Circuits and Systems, vol. 37, no. 5, pp. 649–651, May 1990.

[29] J.-S. Kim and R.-H. Park, “A fast feature-based block matching algorithm using integral projections,” IEEE Journal Select. Areas Commun., vol. 10, no. 5, pp. 968–971, Jun. 1992.

[30] K. Sauer and B. Schwartz, “Efficient block motion estimation using integral projections,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 5, pp. 513–518, Oct. 1996.

[31] S. Lee and S.-I. Chae, “Motion estimation algorithm using low-resolution quantization,” IEE Electronics Letters, vol. 32, no. 7, pp. 647–648, Mar. 1996.

[32] H.-W. Cheng and L.-R. Dung, “EFBLA: A two-phase matching algorithm for fast motion estimation,” in Advances in Multimedia Information Processing – PCM 2002, Dec. 2002, vol. 2532, pp. 112–119.

[33] H.-W. Cheng and L.-R. Dung, “EFBLA: A two-phase matching algorithm for fast motion estimation,” IEE Electronics Letters, vol. 40, no. 11, pp. 660–661, May 2004.

[34] W. Li and E. Salari, “Successive elimination algorithm for motion estimation,” IEEE Trans. Image Processing, vol. 4, no. 1, pp. 105–107, Jan. 1995.

[35] S. Lee, J.-M. Kim, and S.-I. Chae, “New motion estimation using low-resolution quantization for MPEG2 video encoding,” in Workshop on VLSI Signal Processing, IX, Oct. 1996, pp. 428–437.

[36] S. Lee and S.-I. Chae, “Two-step motion estimation algorithm using low-resolution quantization,” in International Conference on Image Processing, Sep. 1996, vol. 3, pp. 795–798.

[37] R.-C. Gonzalez and R.-E. Woods, Digital Image Processing, Addison-Wesley, Sep. 1993.

[38] http://www.m4if.org/resources.php.

[39] H.-W. Cheng and L.-R. Dung, “A vario-power ME architecture using content-based subsample algorithm,” IEEE Trans. Consumer Electronics, vol. 50, no. 1, pp. 349–354, Feb. 2004.

[40] H.-W. Cheng and L.-R. Dung, “A power-aware motion estimation architec-ture using content-based subsampling,” Journal of Information Science and Engineering, (accepted).

[41] H.-W. Cheng and L.-R. Dung, “A content-based methodology for power-aware motion estimation architecture,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, (accepted).

[42] M. Bhardwaj, R. Min, and A.-P. Chandrakasan, “Quantifying and enhancing power awareness of VLSI systems,” IEEE Trans. VLSI Systems, vol. 9, no. 6, pp. 757–772, Dec. 2001.

[43] J.-R. Jain and A.-K. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Trans. Commun., vol. COM-29, pp. 1799–1808, Dec. 1981.

[44] C. Zhu, X. Lin, and L.-P. Chau, “Hexagon-based search pattern for fast block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 5, pp. 349–355, May 2002.

[45] J.-H. Luo, C.-N. Wang, and T. Chiang, “A novel all-binary motion estimation (ABME) with optimized hardware architectures,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 8, pp. 700–712, Aug. 2002.

[46] C.-L. Su and C.-W. Jen, “Motion estimation using MSD-first processing,” IEE Proc.-Circuits, Devices and Systems, vol. 150, no. 2, pp. 124–133, Apr. 2003.

[47] O.-S. Unsal and I. Koren, “System-level power-aware design techniques in real-time systems,” Proceedings of the IEEE, vol. 91, no. 7, pp. 1055–1069, Jul. 2003.

[48] C.-K. Cheung and L.-M. Po, “Normalized partial distortion search algorithm for block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 3, pp. 417–422, Apr. 2000.

[49] D. Linden, Handbook of Batteries, Second Edition, McGraw-Hill, Inc., 1995.

[50] Mobile Pentium® III Processor in BGA2 and Micro-PGA2 Packages Datasheet, p. 55, Intel Corporation.

[51] Y.-L. Chan and W.-C. Siu, “New adaptive pixel decimation for block motion vector estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 113–118, Feb. 1996.

[52] P. Raghavan and C. Chakrabarti, “Battery-friendly design of signal processing algorithms,” in IEEE Workshop on Signal Processing Systems, Aug. 2003, pp. 304–309.

[53] S.-C. Cheng and H.-M. Hang, “A comparison of block-matching algorithms mapped to systolic-array implementation,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 5, pp. 741–757, Oct. 1997.

[54] H.-W. Cheng and L.-R. Dung, “A novel vario-power architecture of motion estimation using a content-based subsample algorithm,” in IEEE Workshop on Signal Processing Systems, Aug. 2003, pp. 201–206.

[55] H.-W. Cheng and L.-R. Dung, “A power-aware architecture for motion estimation,” in The 14th VLSI Design/CAD Symposium, Taiwan, Aug. 2003, pp. 133–136.

[56] H.-W. Cheng and L.-R. Dung, “A power-aware ME architecture using subsample algorithm,” in IEEE International Symposium on Circuits and Systems, May 2004, vol. 3, pp. III-821–824.

[57] M.-B. Ahmad and T.-S. Choi, “Edge detection-based block motion estimation,” IEE Electronics Letters, vol. 37, no. 1, pp. 17–18, Jan. 2001.

[58] L. De Vos and M. Stegherr, “Parameterizable VLSI design for the motion compensation block-matching algorithm,” IEEE Trans. Circuits Syst., vol. 36, no. 10, pp. 1317–1325, Oct. 1989.

[59] Y.-K. Lai, Y.-L. Lai, Y.-C. Liu, P.-C. Wu, and L.-G. Chen, “VLSI imple-mentation of the motion estimator with two-dimensional data-reuse,” IEEE Trans. Consumer Electronics, vol. 44, no. 3, pp. 623–629, Aug. 1998.

[60] J.-C. Tuan and C.-W. Jen, “An architecture of full-search block matching for minimum memory bandwidth requirement,” in Proc. IEEE 8th Great Lakes Symp. VLSI, 1998, pp. 152–156.

[61] S. Chang, J.-H. Hwang, and C.-W. Jen, “Scalable array architecture design for full search block matching,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 4, pp. 332–343, Aug. 1995.

[62] Y.-S. Jehng, L.-G. Chen, and T.-D. Chiueh, “An efficient and simple VLSI tree architecture for motion estimation algorithms,” IEEE Trans. on Signal Processing, vol. 41, no. 2, pp. 889–900, Feb. 1993.

[63] J.-N. Kim and T.-S. Choi, “A fast full-search motion-estimation algorithm using representative pixels and adaptive matching scan,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 7, pp. 1040–1048, Oct. 2000.

[64] T. Komarek and P. Pirsch, “Array architectures for block matching algorithms,” IEEE Trans. Circuits Syst., vol. 36, no. 10, pp. 1302–1308, 1989.

[65] B.-S. Kim and J.-D. Cho, “VLSI architecture for low power motion estimation using high data access reuse,” in Proc. First IEEE Asia Pacific Conf. ASICs (AP-ASIC '99), Aug. 1999, pp. 162–165.

[66] X. Lu and O.-C. Au, “Improved fast motion estimation using integral projection features for hardware implementation,” in IEEE Int. Symp. Circuits and Syst., Hong Kong, Jun. 1997, pp. 1337–1340.

[67] J.-N. Kim, S.-C. Byun, Y.-H. Kim, and B.-H. Ahn, “Fast full search motion estimation algorithm using early detection of impossible candidate vectors,” IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2355–2365, Sep. 2002.

[68] C.-L. Su and C.-W. Jen, “MSD-first on-line arithmetic progressive processing implementation for motion estimation,” IEICE Trans. Inf. and Syst., vol. E86-D, no. 11, pp. 2433–2443, Nov. 2003.

[69] B. Natarajan, V. Bhaskaran, and K. Konstantinides, “Low-complexity algorithm and architecture for block-based motion estimation via one-bit transforms,” in Proc. 1996 IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP-96), May 1996, pp. 3244–3247.

[70] G. Privat and M. Renaudin, “Motion estimation VLSI architecture for image coding,” in Proc. Int. Conf. Comput. Design: VLSI Comput. Processors, 1989, pp. 78–81.

[71] W. Zhang, R. Zhou, and T. Kondo, “Low-power motion-estimation architecture based on a novel early-jump-out technique,” in IEEE International Symposium on Circuits and Systems, vol. 5, pp. 187–190, 2001.
