In this thesis, we proposed a video encoder bit allocation scheme based on a model of human visual perception. A visual complexity measure is introduced into the proposed rate control algorithm. A modified contrast sensitivity function, suited to visual complexity estimation, is designed on the basis of prior research on visual models. Applying the proposed visual complexity measure yields a visual complexity map that represents the visual distortion sensitivity of each macroblock.
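The modified contrast sensitivity function designed in this thesis is not reproduced in this section. As a point of reference only, the Python sketch below evaluates the classic CSF of Mannos and Sakrison [1], A(f) = 2.6(0.0192 + 0.114f)exp(-(0.114f)^1.1), with f in cycles per degree. It then uses that curve to weight per-frequency energies of a macroblock into a single score. The function `csf_weighted_energy` and its input format are hypothetical illustrations, not the visual complexity measure proposed here.

```python
import math

def mannos_sakrison_csf(f: float) -> float:
    """Contrast sensitivity at spatial frequency f (cycles/degree), per Mannos and Sakrison [1]."""
    if f < 0:
        raise ValueError("spatial frequency must be non-negative")
    x = 0.114 * f
    return 2.6 * (0.0192 + x) * math.exp(-(x ** 1.1))

def csf_weighted_energy(bands):
    """Hypothetical macroblock score: band energies weighted by the CSF.

    `bands` is a list of (frequency_cpd, energy) pairs; this input format is
    an assumption made for illustration, not the thesis's measure.
    """
    return sum(mannos_sakrison_csf(f) * e for f, e in bands)

if __name__ == "__main__":
    # Locate the CSF peak on a 0.1 cycles/degree grid.
    grid = [i / 10 for i in range(1, 601)]
    peak = max(grid, key=mannos_sakrison_csf)
    print(f"CSF peak near {peak:.1f} cycles/degree")
```

The curve is band-pass: sensitivity falls off at both very low and very high spatial frequencies, which is why a CSF-based weighting treats fine texture differently from mid-frequency structure.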
Experiments show that the results of our distortion sensitivity analysis are largely consistent with the perception of the human visual system for all of the test sequences used.
The visual analysis directs the rate control algorithm to assign more bits to regions of higher visual importance and fewer bits to regions that can tolerate larger distortion. The coding performance of the proposed method is compared with that of the H.264 JM7.6 encoder using its reference rate control algorithm. Since PSNR does not fully agree with the perceptual quality judged by human observers, we use SSIM to assess quality in our experiments. In all test cases, the proposed method achieves higher SSIM values at a lower bitrate. Moreover, at low target bitrates the proposed method tends to improve visual quality more, whereas at high target bitrates it tends to reduce the bitrate instead. This behavior is consistent with the properties of the human visual system.
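The allocation principle described above, more bits where distortion is more visible, can be illustrated with a simple proportional split. The Python sketch below divides a frame's bit budget among macroblocks in proportion to per-macroblock sensitivity weights, such as values read from a visual complexity map, and uses largest-remainder rounding so that the integer allocations sum exactly to the budget. The proportional rule and the names `allocate_bits` and `weights` are hypothetical; the actual rate control algorithm of this thesis is not reproduced here.

```python
def allocate_bits(frame_budget: int, weights: list[float]) -> list[int]:
    """Hypothetical proportional split of `frame_budget` bits across macroblocks.

    A higher weight (higher visual distortion sensitivity) receives more bits.
    Largest-remainder rounding keeps the total exactly equal to the budget.
    """
    if frame_budget < 0 or not weights or any(w < 0 for w in weights):
        raise ValueError("need a non-negative budget and non-negative weights")
    total = sum(weights)
    if total == 0:
        # No sensitivity information: fall back to a uniform split.
        weights = [1.0] * len(weights)
        total = float(len(weights))
    shares = [frame_budget * w / total for w in weights]
    bits = [int(s) for s in shares]  # floor, since every share is non-negative
    leftover = frame_budget - sum(bits)
    # Hand the remaining bits to the macroblocks with the largest fractional parts.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - bits[i], reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits

# Example: four macroblocks, the second one judged most sensitive.
print(allocate_bits(1000, [1.0, 3.0, 0.5, 0.5]))  # -> [200, 600, 100, 100]
```

A real encoder would also reserve header bits and respect buffer constraints; the sketch only shows how a sensitivity map can steer the texture bit budget.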
Although the proposed bit allocation algorithm performs well, there is still room for further improvement. For example, the quadratic rate-distortion model is sometimes inaccurate in estimating the quantization step size; that is, encoding with the estimated QP does not always produce the target bitrate. A flaw in the derivation of the quadratic rate control function was pointed out in this thesis. Developing a more reliable rate-distortion function is therefore an area for future research.
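For reference, the quadratic model in the form commonly used by MPEG-4 and H.264 JM rate control relates the texture bits R of a coding unit to its mean absolute difference (MAD) and quantization step Q_step by R = X1·MAD/Q_step + X2·MAD/Q_step². The Python sketch below solves this equation for Q_step and maps the result to an H.264 QP through the approximation Q_step ≈ 0.625·2^(QP/6), which is exact at every sixth QP. The coefficients in the test values are made up; in practice X1 and X2 are re-estimated by regression after each coded unit, and the sketch does not reproduce this thesis's analysis of the derivation flaw.

```python
import math

def qstep_from_quadratic(target_bits: float, mad: float, x1: float, x2: float) -> float:
    """Solve target_bits = x1*mad/q + x2*mad/q**2 for a positive step size q."""
    if target_bits <= 0 or mad <= 0:
        raise ValueError("target_bits and mad must be positive")
    if x2 != 0:
        # Multiply through by q**2: target_bits*q**2 - x1*mad*q - x2*mad = 0
        a, b, c = target_bits, -x1 * mad, -x2 * mad
        disc = b * b - 4 * a * c
        if disc >= 0:
            q = (-b + math.sqrt(disc)) / (2 * a)  # larger root, as in JM-style solvers
            if q > 0:
                return q
    # First-order fallback when x2 == 0 or the quadratic has no usable root.
    q = x1 * mad / target_bits
    if q <= 0:
        raise ValueError("model coefficients give no positive step size")
    return q

def qp_from_qstep(qstep: float) -> int:
    """Approximate H.264 QP for a step size, using Qstep ~ 0.625 * 2**(QP/6), clipped to [0, 51]."""
    if qstep <= 0:
        raise ValueError("qstep must be positive")
    qp = round(6 * math.log2(qstep / 0.625))
    return max(0, min(51, qp))
```

Because QP is an integer, the rounding in `qp_from_qstep` alone introduces a bitrate mismatch of up to about half a QP step even when the model itself is exact.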
In our method, the contrast sensitivity function (CSF) is introduced into the analysis of visual complexity. Contrast sensitivity is an important feature of the human visual system (HVS), but other visual models, such as luminance masking, can also be used for video content analysis. The luminance masking effect states that the visibility threshold of the HVS depends strongly on the surrounding background luminance.
Therefore, the noise sensitivity of a video region should also take its surrounding luminance level into account. Another important visual cue is how easily a moving object can be tracked by human eye movements. This cue may be computed from optical flow information. An area with randomly oriented motion might not be tracked easily; in contrast, an area with consistent motion might be tracked more easily and therefore be more sensitive to distortion.
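Both cues above could be computed cheaply per macroblock. The Python sketch below shows one possibility for each, under stated assumptions. For luminance masking it uses the background-luminance visibility threshold of Chou and Li [14] (T0 = 17, γ = 3/128, 8-bit luminance), which is one published model and not something evaluated in this thesis. For motion trackability it uses a hypothetical coherence score, the length of the mean unit motion vector, which is close to 1 when motion in a region is consistently oriented and close to 0 when orientations are random. The motion vectors would come from an optical flow estimator such as that of Horn and Schunck [25], which is not implemented here.

```python
import math

def luminance_visibility_threshold(bg: float) -> float:
    """Visibility threshold vs. background luminance bg in [0, 255] (Chou and Li model)."""
    if not 0 <= bg <= 255:
        raise ValueError("bg must be an 8-bit luminance value")
    t0, gamma = 17.0, 3.0 / 128.0
    if bg <= 127:
        return t0 * (1.0 - math.sqrt(bg / 127.0)) + 3.0
    return gamma * (bg - 127.0) + 3.0

def motion_coherence(vectors) -> float:
    """Hypothetical trackability cue: length of the mean unit motion vector.

    Returns a value in [0, 1]; zero-length vectors are ignored.
    """
    units = []
    for dx, dy in vectors:
        norm = math.hypot(dx, dy)
        if norm > 0:
            units.append((dx / norm, dy / norm))
    if not units:
        return 0.0
    mx = sum(u[0] for u in units) / len(units)
    my = sum(u[1] for u in units) / len(units)
    return math.hypot(mx, my)
```

A higher luminance threshold means more distortion can be hidden, so regions over very dark backgrounds could receive lower sensitivity weights; a higher coherence score would mark a region as more trackable and hence more sensitive.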
Even though we have used SSIM for objective visual quality assessment in this thesis, it might not be a perfect measure. The ITU-T Video Quality Experts Group (VQEG) conducted a call for proposals for an objective measure that closely resembles the HVS; however, none of the proposed models was distinctly better than the others [35]. Simply put, designing a good objective visual quality measure is still an open problem. The analysis of the relationship between video content and the response of the human visual system initiated in this thesis might provide useful information for developing a more practical objective measure.
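For readers unfamiliar with SSIM, the Python sketch below computes the SSIM index between two equally sized grayscale patches using a single window that covers the whole patch, with the usual constants C1 = (0.01L)² and C2 = (0.03L)² for dynamic range L = 255. The standard SSIM index instead slides a local window (typically an 11×11 Gaussian) over the image and averages the local values, and [24] extends the idea to video; this global version is a simplification for illustration only, and the sample pixel values are made up.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Simplified SSIM between two equal-length pixel sequences (one global window)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    # Unbiased (n - 1) estimates of variance and covariance.
    vx = sum((a - mx) * (a - mx) for a in x) / (n - 1)
    vy = sum((b - my) * (b - my) for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    num = (2 * mx * my + c1) * (2 * cxy + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den

ref = [52, 55, 61, 66, 70, 61, 64, 73]
noisy = [50, 58, 60, 69, 68, 63, 61, 75]
print(global_ssim(ref, ref))    # identical patches give 1.0
print(global_ssim(ref, noisy))  # below 1.0
```

The first factor compares mean luminance and the second combines contrast and structure, which is why SSIM can penalize structural changes that leave PSNR almost unchanged.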
In summary, these efforts can be expected to bring further improvements.
7. References
[1] J. L. Mannos and D. J. Sakrison, “The Effects of a Visual Fidelity Criterion on the Encoding of Images,” IEEE Trans. on Information Theory, Vol. 20, No. 4, Jul. 1974, pp. 525-536.
[2] S. Daly, “The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity,” in Digital Images and Human Vision, A. B. Watson, editor, MIT Press, Cambridge, Massachusetts, 1993.
[3] H. Rushmeier, G. Ward, C. Piatko, P. Sanders and B. Rust, “Comparing Real and Synthetic Images: Some Ideas About Metrics,” Proceedings of the Sixth Eurographics Workshop on Rendering, Dublin, Ireland, 1995, pp. 82-91.
[4] B. Tao, B. W. Dickinson, and H. A. Peterson, “Adaptive Model-Driven Bit Allocation for MPEG Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, Feb. 2000.
[5] C. W. Tang, C. H. Chen, Y. H. Yu, and C. J. Tsai, “A Novel Visual Distortion Sensitivity Analysis for Video Encoder Bit Allocation,” Proc. IEEE Intern. Conference on Image Processing, Singapore, October 2004.
[6] C.-W. Tang, C.-H. Chen, Y.-H. Yu, and C.-J. Tsai, “Visual Sensitivity Guided Bit Allocation for Video Coding,” IEEE Trans. on Multimedia, Vol. 8, No. 1, Feb. 2006, pp. 11-18.
[7] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression. New York, NY: Plenum, 1988.
[8] F. W. Campbell and D. G. Green, “Optical and Retinal Factors Affecting Visual Resolution,” J. Physiol., Vol. 181, 1965, pp. 576-593.
[9] A. B. Watson and A. J. Ahumada, Jr., “A Standard Model for Foveal Detection of Spatial Contrast,” Journal of Vision, Vol. 5, No. 9, 2005, pp. 717-740.
[10] H. Zhou, M. Chen and M. F. Webster, “Comparative Evaluation of Visualization and Experimental Results Using Image Comparison Metrics,” Proc. IEEE Visualization, Boston, 2002, pp. 315-322.
[11] P. Moon and D. E. Spencer, “The Visual Effect of Nonuniform Surrounds,” Journal of the Optical Society of America, Vol. 35, March 1945, pp. 233-248.
[12] P. Mertz, “Perception of Television Random Noise,” J. of the Society of Motion Picture & Television Engineers, Vol. 54, Jan. 1950, pp. 8-34.
[13] R. J. Safranek and J. D. Johnston, “A Perceptually Tuned Sub-band Image Coder with Image Dependent Quantization and Post-quantization Data Compression,” Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Vol. 3, 1989, pp. 1945-1948.
[14] C.-H. Chou and Y.-C. Li, “A Perceptually Tuned Subband Image Coder Based on the Measure of Just-Noticeable-Distortion Profile,” IEEE Trans. Circuits & Systems for Video Technology, Vol. 5, No. 6, Dec. 1995, pp. 467-476.
[15] Z. Wang and A. C. Bovik, “A Human Visual System-Based Objective Video Distortion Measurement System,” International Conference on Multimedia Processing and Systems, Aug. 2000.
[16] ISO/IEC JTC1/SC29 WG11, “Test Model 5,” MPEG Document N0400, Sydney, April 1993.
[17] H.-J. Lee, T. Chiang and Y.-Q. Zhang, “Scalable Rate Control for MPEG-4 Video,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 6, Sept. 2000, pp. 878-894.
[18] ISO/IEC JTC1/SC29 WG11, “Annex-L Rate Control,” Information Technology – Coding of Audio Visual Objects – Part 2: Visual, ISO/IEC 14496-2:2003, 3rd Ed., 2003.
[19] Z. He and S. K. Mitra, “Optimum Bit Allocation and Accurate Rate Control for Video Coding via rho-Domain Source Modeling,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 12, No. 10, Oct. 2002, pp. 840-849.
[20] S. Lee, M. S. Pattichis, and A. C. Bovik, “Foveated Video Compression with Optimal Rate Control,” IEEE Transactions on Image Processing, Vol. 10, No. 7, July 2001, pp. 977-992.
[21] M. R. Pickering and J. F. Arnold, “A Perceptually Efficient VBR Rate Control Algorithm,” IEEE Transactions on Image Processing, Vol. 3, No. 5, September 1994, pp. 527-532.
[22] L. Itti, C. Koch, and E. Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, November 1998, pp. 1254-1259.
[23] A. J. Ahumada Jr., “Simplified Vision Model for Image Quality Assessment,” SID International Symposium Digest of Technical Papers, Vol. 27, 1996, pp. 397-400.
[24] Z. Wang, L. Lu, and A. C. Bovik, “Video Quality Assessment Based on Structural Distortion Measurement,” Signal Processing: Image Communication, Vol. 19, No. 2, Jan. 2004.
[25] B. K. P. Horn and B. G. Schunck, “Determining Optical Flow,” Artificial Intelligence, 1981, pp. 185-203.
[26] D. H. Kelly, “Visual Processing of Moving Stimuli,” Journal of the Optical Society of America A, Vol. 2, 1985, pp. 216-225.
[27] B. Girod, “Eye Movements and Coding of Video Sequences,” SPIE Visual Communications and Image Processing, ed. T. R. Hsing, pp. 398-405, 1988.
[28] T. Painter and A. Spanias, “Perceptual Coding of Digital Audio,” Proc. of the IEEE, Vol. 88, No. 4, 2000, pp. 451-515.
[29] R. Talluri, K. Oehler, T. Bannon, J. D. Courtney, A. Das and J. Liao, “A Robust, Scalable, Object-Based Video Compression Technique for Very Low Bit-Rate Coding,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 1, Feb. 1997, pp. 221-233.
[30] Z. Wang, A. C. Bovik and L. Lu, “Why Is Image Quality Assessment So Difficult?” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2002.
[31] P. Salembier, L. Torres, F. Meyer and C. Gu, “Region-Based Video Coding Using Mathematical Morphology,” Proceedings of the IEEE, Vol. 83, No. 6, Jun. 1995, pp. 843-857.
[32] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice Hall, 1971.
[33] J. J. Gibson, The Perception of the Visual World, Houghton Mifflin, Boston, MA, 1950.
[34] P. G. J. Barten, Contrast Sensitivity of the Human Eye and Its Effects on Image Quality, SPIE-International Society for Optical Engineering, 1999.
[35] VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment,” http://www.vqeg.org, March 2000.
[36] C. E. Shannon, “A Mathematical Theory of Communication,” Bell Syst. Tech. J., Vol. 27, 1948, pp. 379-423 and 623-656.
[37] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding, New York, NY: McGraw-Hill, 1979.
[38] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Second Edition, Prentice-Hall, 2002.