Conclusions and Future Work

As the saying goes, “Know your opponent and know yourself, and you can fight without danger of defeat.” The better we know the opponent, the better our chances. Game study before play is therefore vitally important for coaches and players. However, logging, annotating and analyzing games manually by watching entire sports videos is laborious and time-consuming, so coaches and players keenly desire computer assistance in game study. Meanwhile, audiences and sports fans are no longer satisfied with video viewing systems that offer only quick browsing, indexing and summarization of sports video; they demand informative data that gives further insight into the games. Hence, the research in this thesis focuses on sports video content analysis, understanding and annotation, so as to provide computer-assisted game study and content-based sports information retrieval.

In sports games, significant events are mainly caused by ball-player interaction, and the ball trajectory carries much of the semantic and tactical information that contributes to content understanding. Hence, we propose a physics-based ball tracking scheme to compute the ball trajectory, and further design an approach capable of reconstructing the 3D trajectory from single-camera video. Since the ball is small and usually moves fast, recognizing the ball within a single frame is almost impossible. Rather than deciding which object is the ball in each frame, we identify the ball trajectory by judging whether a candidate trajectory conforms to the characteristics of ball motion. Ball positions missed in individual frames can then be recovered from the obtained trajectory. The 2D-to-3D inference, moreover, is intrinsically challenging because depth information is lost when the picture is captured.
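The idea of accepting a trajectory because it moves like a ball, and then filling in missed positions from it, can be illustrated with a minimal sketch. It assumes a simplified image-plane model in which x is roughly linear and y roughly quadratic in the frame index (perspective effects ignored). This is an illustrative assumption, not necessarily the exact criterion used in this thesis. All function names and the tolerance `tol` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def polyfit(ts, vs, degree):
    """Least-squares coefficients [c0, c1, ...] such that v ~ sum(c_k * t**k)."""
    size = degree + 1
    ata = [[sum(t ** (i + j) for t in ts) for j in range(size)] for i in range(size)]
    atv = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(size)]
    return solve_linear(ata, atv)


def polyval(coeffs, t):
    return sum(c * t ** k for k, c in enumerate(coeffs))


def fit_candidate(track, tol=2.0):
    """track: list of (frame, x, y) candidate positions in pixels.

    Returns (is_ball_like, x_coeffs, y_coeffs). The candidate is accepted when
    every observed position lies within `tol` pixels of the fitted motion model.
    """
    frames = [f for f, _, _ in track]
    x_coeffs = polyfit(frames, [x for _, x, _ in track], 1)  # x linear in time
    y_coeffs = polyfit(frames, [y for _, _, y in track], 2)  # y quadratic in time
    worst = max(
        max(abs(polyval(x_coeffs, f) - x), abs(polyval(y_coeffs, f) - y))
        for f, x, y in track
    )
    return worst <= tol, x_coeffs, y_coeffs


def recover_missing(track, x_coeffs, y_coeffs):
    """Predict positions for frames inside the track's span that have no detection."""
    seen = {f for f, _, _ in track}
    return [
        (f, polyval(x_coeffs, f), polyval(y_coeffs, f))
        for f in range(min(seen), max(seen) + 1)
        if f not in seen
    ]
```

A candidate whose positions alternate up and down from frame to frame fails the test, while a smooth arc with a few undetected frames passes and has its gaps interpolated from the fitted model.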

By incorporating the court specifications for camera calibration and the physical characteristics of ball motion for 3D trajectory modeling, we are able to compute the motion parameters of the modeled 3D trajectory and approximate the depth information. The challenge of 3D trajectory reconstruction from single-camera video is thus overcome. A variety of trajectory-based applications are designed to comprehend the semantic or tactical content, including shooting location estimation in basketball, pitch analysis in baseball, and set type recognition, serve placement estimation and 3D virtual replay in volleyball. These applications offer viewers a new way to watch the game.
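To make the reconstruction idea concrete, the following sketch shows one way the motion parameters can be computed once a 3x4 projection matrix `P` is available from camera calibration. It assumes a ballistic model with gravity as the only force (no air friction or spin), a world Z axis pointing up, and a linear least-squares formulation. The function names, the value of `G` and the formulation itself are assumptions made for illustration, not a transcription of the thesis's implementation.

```python
G = 9.8  # assumed gravitational acceleration (m/s^2) along the world Z axis


def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ball_position(params, t):
    """3D position at time t for params = [X0, Vx, Y0, Vy, Z0, Vz]."""
    x0, vx, y0, vy, z0, vz = params
    return (x0 + vx * t, y0 + vy * t, z0 + vz * t - 0.5 * G * t * t)


def project(P, point):
    """Project a 3D world point to image coordinates (u, v) with matrix P."""
    X, Y, Z = point
    h = [row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in P]
    return h[0] / h[2], h[1] / h[2]


def reconstruct_3d(P, observations):
    """Estimate [X0, Vx, Y0, Vy, Z0, Vz] from (t, u, v) image observations.

    Each observation gives two equations that are linear in the six unknowns:
    (P[r] - img * P[2]) . [X(t), Y(t), Z(t), 1] = 0, for r = 0 (u) and r = 1 (v).
    The known gravity term is moved to the right-hand side; it is what fixes
    the scale that a single view would otherwise leave ambiguous.
    """
    rows, rhs = [], []
    for t, u, v in observations:
        for img, r in ((u, 0), (v, 1)):
            a = [P[r][k] - img * P[2][k] for k in range(4)]
            rows.append([a[0], a[0] * t, a[1], a[1] * t, a[2], a[2] * t])
            rhs.append(0.5 * G * t * t * a[2] - a[3])
    # Solve the over-determined system through its normal equations.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(6)] for i in range(6)]
    atb = [sum(row[i] * b for row, b in zip(rows, rhs)) for i in range(6)]
    return solve_linear(ata, atb)
```

With at least three observations of a non-degenerate flight, the six parameters are determined, and `ball_position` then yields the depth at any instant. Normal equations are used here only to keep the sketch dependency-free; an SVD-based least-squares solver would be more robust to noisy detections.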

The strike zone plays a crucial role in every pitch of a baseball game, since it not only supports the strike/ball judgment but also provides the reference for determining the pitch location. Thus, we design a stance-based strike zone shaping scheme that integrates efficient algorithms for home plate detection and for locating object contours and dominant points. Whether the batter is right- or left-handed, the strike zone is shaped adaptively to the batter’s stance. Beyond the confrontation between the pitch and the batter, the ball motion and the defense process after the ball is batted into the field also catch the attention of the audience. Thus, we recognize the spatial patterns in the frames of field shots, classify the play regions (the active regions of event occurrence), and infer the ball routing patterns and defense process from the transitions between play regions. From ball tracking and strike zone shaping to play region classification for ball routing pattern inference, we provide a fairly extensive analysis of baseball video. The resulting annotation and sports information enable sports fans and professionals to go deeper into the game.
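The step from per-frame play region labels to a ball routing pattern can be pictured with a small sketch. It assumes that each frame of a field shot has already been assigned a region label, and that very short runs of a label are misclassifications. The region names, the `min_run` threshold and the function name are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def region_transitions(frame_labels, min_run=3):
    """Collapse per-frame play-region labels into an ordered list of regions.

    Runs shorter than `min_run` frames are treated as noise and dropped.
    """
    stable = [
        label
        for label, run in groupby(frame_labels)
        if len(list(run)) >= min_run
    ]
    # Dropping a noisy run can leave the same region on both sides; merge them.
    return [label for label, _ in groupby(stable)]


labels = ["infield"] * 5 + ["left_field"] + ["infield"] * 4 + ["first_base"] * 6
route = region_transitions(labels)
print(route)
```

The resulting sequence of regions is the kind of compact representation from which a routing pattern or defense process could then be matched or mined.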

Comprehensive experiments on basketball, volleyball and baseball videos have been conducted. The experimental results show that the proposed methods perform well in retrieving game information, and even in reconstructing 3D information from single-camera video, for different kinds of sports. The features and techniques proposed in this thesis lead to satisfactory solutions for content understanding, tactics analysis, sports information retrieval and computer-assisted game study in many kinds of sports videos. Although the 3D trajectory reconstruction method proposed in this thesis yields good results, some deviation remains between the real-world ball trajectory and the reconstructed trajectory. This may result from physical factors that we do not model, such as air friction, ball spin rate and spin axis, and from intrinsic constraints, such as the loss of depth information, lighting conditions, noise, and the capturing angle and frame rate of the capturing device. Hence, one direction of our future work is to include more physical factors in the 3D trajectory model so that the reconstructed trajectory better approximates the real-world ball trajectory. Moreover, single-view video analysis may be limited by incomplete 3D information due to object occlusion and the loss of depth information. Another direction is therefore to extend the proposed approaches to multi-view video analysis. In the future, we will integrate information from multiple cameras to reinforce multimedia content analysis, understanding, indexing and annotation.

In addition, we are currently working on deriving the intrinsic rules of region transitions for different defense patterns in baseball games, using temporal pattern mining based on the play region classification proposed in this thesis. The cooperation of players to achieve a successful defense is exciting and inspiring. Hence, we will utilize the transitions between classified play regions for ball route pattern deduction, content-based defense process recognition and similar-event retrieval with concise content presentation, so that sports fans and professionals will be greatly assisted in studying game strategy, collecting statistics, analyzing tactics and even improving their own skills.

