Multi-level Summarization - Event Detection and Summarization

Chapter 3 The Proposed Method

3.3 Event Detection and Summarization

3.3.3 Multi-level Summarization

In this subsection, we describe how summaries of the baseball video are generated. We propose a multi-level hierarchical framework for selecting the content of the summaries. We can collect all the detected highlight events as the most detailed summary, or we can reduce the number of highlight events to condense the summary. We observe that more important events tend to occur in longer PSUs.

In Eq. 3.30, the computation of the score of $event_i$, $Score(event_i)$, is specified, where $PSU_c$ is the corresponding PSU of $event_i$, and $W_t$ and $W_s$ are adjustable weight parameters. We thus compute the score of each highlight event according to Eq. 3.30. Events in longer PSUs obtain higher scores. After computing the score of each detected event, we can provide multi-level summaries with different numbers of highlight events.

The multi-level summarization framework conforms to the concept of MPEG-7 hierarchical summary DSs. Fig. 3.13 illustrates the concept of multi-level summarization.

Higher levels provide more detailed summaries, while lower levels, which contain only the higher-scoring events, provide coarser summaries. Users can choose a suitable summary according to their needs.

$$Score(event_i) = \mathrm{Ti}(PSU_c) \times W_t + \mathrm{SN}(PSU_c) \times W_s \tag{3.30}$$

Fig. 3.13 Multi-level summaries of the baseball video
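The scoring and level construction described in this subsection can be sketched as follows. This is a minimal illustration rather than the thesis implementation: it assumes that Ti(PSU_c) is the duration of the PSU in seconds and that SN(PSU_c) is its number of shots (neither is defined in this excerpt), and the class name, function names, weights, and event data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HighlightEvent:
    name: str           # hypothetical label, for illustration only
    psu_seconds: float  # assumed meaning of Ti(PSU_c): duration of the event's PSU
    psu_shots: int      # assumed meaning of SN(PSU_c): number of shots in the PSU


def score(event, wt, ws):
    """Eq. 3.30: Score(event_i) = Ti(PSU_c) * Wt + SN(PSU_c) * Ws."""
    return event.psu_seconds * wt + event.psu_shots * ws


def build_levels(events, wt, ws, sizes):
    """Return one summary per requested size, coarsest level first.

    Each level keeps the `size` highest-scoring events, listed in their
    original (chronological) order, so every level contains the previous one.
    """
    ranked = sorted(range(len(events)),
                    key=lambda i: score(events[i], wt, ws),
                    reverse=True)
    levels = []
    for size in sorted(sizes):
        keep = sorted(ranked[:size])  # restore chronological order
        levels.append([events[i] for i in keep])
    return levels


# Hypothetical events and weights.
events = [
    HighlightEvent("hit", 40.0, 6),
    HighlightEvent("bunt", 18.0, 3),
    HighlightEvent("score", 65.0, 9),
    HighlightEvent("steal", 25.0, 4),
]
levels = build_levels(events, wt=1.0, ws=2.0, sizes=[1, 2, 4])
for number, level in enumerate(levels, start=1):
    print(number, [e.name for e in level])
```

With these made-up values the lowest level holds only the top-scoring event and the highest level holds all four, which mirrors the nesting shown in Fig. 3.13.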

Chapter 4 Experimental Results

In this chapter, we present the experimental results of baseball video analysis and summarization. Section 4.1 describes the experimental environment and the test baseball video data, and Section 4.2 discusses the experimental results.

4.1 Experimental Environment and Test Data

Our experiments were run under the Microsoft Windows XP Professional operating system on an IBM-compatible PC with an Intel P4 2.0 GHz CPU and 256 megabytes of RAM. The program was developed in C and compiled with Microsoft Visual C++ 6.0. The MPEG-2 decoder used is the open-source decoder developed by the Berkeley Multimedia Research Center.

We verify our method with experiments on six test videos: two are Taiwan baseball videos, two are Japan baseball videos, and two are American baseball videos. The total length of the videos is about 16 hours. The videos cover 5 different stadiums and come from 3 different channels. All of them are in MPEG-2 format with a resolution of 352*240 at 30 fps. Table 4.1 and Fig. 4.1 give the information of the 6 test videos. Shot changes in all of the videos were detected using the method and program proposed by Lee [17]. Because gradual shot changes always occur in relaxed moments and replays, gradual shot change detection may divide a non-highlight-event PSU into more shots and thereby lower the precision of highlight event detection. Thus, in shot change detection, we consider only the abrupt type. In our experiments, for one hour of video, the preprocessing time is about 37 minutes and the analysis time is about 43 minutes on average.

| Video | Game | Frames | Length |
|---|---|---|---|
| Video1 | 2001 Baseball World Cup Game55 | 237844 | 2:12:08 |
| Video2 | 2001 Baseball World Cup Game64 | 274962 | 2:32:55 |
| Video3 | 2003 Asian Baseball Championship Game4 | 286915 | 2:39:23 |
| Video4 | 2003 NPB Nippon series Game1 | 337255 | 3:07:22 |
| Video5 | 2001 MLB World Series Game3 | 311001 | 2:52:47 |
| Video6 | 2001 MLB World Series Game4 | 257438 | 2:23:08 |

Table 4.1 The information of the test videos

Fig. 4.1 The test baseball videos

4.2 Experimental Results

We use the precision and recall measures to evaluate the experimental results. Precision and recall are defined in Eq. 4.1 and Eq. 4.2, where C is the number of correctly detected relevant events, F is the number of falsely detected events, C+F is the total number of detected events, M is the number of missed relevant events, and C+M is the total number of relevant events.

$$Precision = \frac{C}{C+F} \tag{4.1}$$

$$Recall = \frac{C}{C+M} \tag{4.2}$$

Table 4.2 shows the results of change event detection. In most cases, the results are satisfactory. The results for Video4 are worse than those for the other videos because the luminance of the rear part is unstable. Three of the falsely detected events in Video4 are near the positions of three missed change events. Although these false detections do not give the precise change event positions, they still structure the game into intervals of half innings.

| Video | Correct | Miss | False | Precision | Recall |
|---|---|---|---|---|---|
| Video 1 | 16 | 0 | 2 | 88.89% | 100% |

Table 4.2 The results of change event detection
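As a small worked check of the two measures, the sketch below computes precision as C / (C + F) and recall as C / (C + M) and applies them to the Video 1 row of Table 4.2. The function name is hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something stated in the thesis.

```python
def precision_recall(correct, missed, false_detected):
    """Return (precision, recall) as fractions.

    precision = C / (C + F), recall = C / (C + M). A zero denominator
    yields 0.0 instead of raising (a convention assumed for this sketch).
    """
    detected = correct + false_detected   # C + F
    relevant = correct + missed           # C + M
    precision = correct / detected if detected else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Video 1 row of Table 4.2: 16 correct, 0 missed, 2 false detections.
p, r = precision_recall(correct=16, missed=0, false_detected=2)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=88.89% recall=100.00%
```

The result agrees with the 88.89% precision and 100% recall listed for Video 1.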

Table 4.3 presents the results of highlight event detection. The highlight events in which we are interested are either advantageous to the offense, such as hits, bunts, stolen bases, scores, sacrifice flies, and defensive errors, or otherwise meaningful, such as pause events, dead balls, and game-ending events. The precision and recall of highlight event detection are computed by comparing the detected results with human observation. In our experiments, the average precision over all test videos is about 53%, and the average recall is about 86%. Some falsely detected events that do not belong to the defined highlight events are still meaningful, such as walks, infield outs, outfield outs, and base-advancing events. These events also provide the audience with useful information about the game. Fig. 4.2 compares the original data sizes with the summary sizes of the highlight events in our experiments. The summary sizes are about 7%-13% of the original sizes, and the lengths of the summaries are about 12-24 minutes.

After detecting the highlight events and providing the most detailed summary, we can also condense the summary progressively to provide multi-level summaries. We compute a score for each detected highlight event according to Eq. 3.30. We then reserve the higher-scoring events and filter out the lower-scoring ones. The number of reserved events is determined by the reserve ratio, defined in Eq. 4.3, where RE is the number of reserved events and TH is the total number of detected highlight events. Figs. 4.3-4.8 present the precision, recall, and time ratio of the re-condensed results for Video1-Video6. As the reserve ratio decreases, the precision increases while the recall and the time ratio decrease. In other words, the re-condensed summary has higher precision but coarser content; there is a tradeoff between precision and recall. Some applications, such as delivering sports video over narrow-band networks, browsing video on mobile devices, and quickly learning the result of a game, need short summaries with high precision, so the refining summarization process is a feasible solution for them.

$$\text{Reserve Ratio} = \frac{RE}{TH} \tag{4.3}$$
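The re-condensing step and the tradeoff it produces can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the toy scores and ground-truth labels, and the rule that RE is taken as floor(reserve ratio * TH) with a minimum of one event, since the text does not say how a target ratio is rounded to an event count.

```python
import math


def recondense(events, reserve_ratio):
    """Keep the highest-scoring events for a target reserve ratio RE / TH.

    `events` is a list of (score, is_relevant) pairs. RE is taken as
    floor(reserve_ratio * TH), but at least 1 (an assumed rounding rule).
    """
    total = len(events)                                   # TH
    reserve = max(1, math.floor(reserve_ratio * total))   # RE
    return sorted(events, key=lambda e: e[0], reverse=True)[:reserve]


def evaluate(kept, total_relevant):
    """Precision and recall of a re-condensed summary."""
    correct = sum(1 for _, relevant in kept if relevant)
    return correct / len(kept), correct / total_relevant


# Hypothetical detections: (score, matches a ground-truth highlight?).
detected = [(90, True), (75, True), (60, False), (55, True),
            (40, False), (30, True), (20, False), (10, False)]
total_relevant = 5  # hypothetical: 4 detected highlights plus 1 missed

for ratio in (1.0, 0.5, 0.25):
    kept = recondense(detected, ratio)
    precision, recall = evaluate(kept, total_relevant)
    print(f"ratio={ratio:.2f} events={len(kept)} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the precision rises from 0.50 to 1.00 while the recall falls from 0.80 to 0.40 as the reserve ratio drops. Real data need not behave this cleanly; the pattern only holds to the extent that relevant events tend to receive higher scores.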

| Video | Correct | Miss | False | Precision | Recall |
|---|---|---|---|---|---|

Table 4.3 The results of highlight event detection

[Figure: bar chart of original size and summary size, in seconds, for Video 1 to Video 6. Original sizes: 7928, 9175, 9563, 11245, 10366, 8588. Summary sizes: 934, 667, 958, 1426, 1015, 844.]

Fig. 4.2 Comparison of original data size and summarization data size

Fig. 4.3 Video1 Experimental Results: (a) precision and (c) time ratio of re-condensed summaries

Fig. 4.4 Video2 Experimental Results: (c) time ratio of re-condensed summaries

Fig. 4.5 Video3 Experimental Results: (c) time ratio of re-condensed summaries

Fig. 4.6 Video4 Experimental Results

Fig. 4.7 Video5 Experimental Results: (c) time ratio of re-condensed summaries

Fig. 4.8 Video6 Experimental Results: (c) time ratio of re-condensed summaries

After analyzing the baseball video, we can provide indices for it. There are three types of indices: pitching shots, highlight events, and half innings (according to the change events).

Moreover, we can provide multi-level summaries according to the number of reserved events. Fig. 4.9 gives an example of re-condensed summarization for Video5. The total length of Video5 is 172 minutes 46 seconds. The most detailed summary contains all 30 detected events with a precision of 53%; its length is 16 minutes 55 seconds, a time ratio of 9%. After the re-condensing process, we reserve the top 35% of the highlight events according to their scores. This re-condensed summary contains 10 events with a precision of 70%; its length is 6 minutes 39 seconds, a time ratio of 3%.

Fig. 4.9 An example of refining summarization process
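The figures quoted for the Video5 example can be checked with a few lines of arithmetic. The sketch below assumes that the time ratio is the summary length divided by the video length, which is how the reported values read; the variable and function names are illustrative only.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds duration to seconds."""
    return minutes * 60 + seconds


video_len = to_seconds(172, 46)    # 10366 s, total length of Video5
full_summary = to_seconds(16, 55)  # 1015 s, most detailed summary
condensed = to_seconds(6, 39)      # 399 s, re-condensed summary

full_ratio = full_summary / video_len    # time ratio of the detailed summary
condensed_ratio = condensed / video_len  # time ratio of the re-condensed summary
reserve_ratio = 10 / 30                  # RE / TH for 10 of 30 events

print(f"{full_ratio:.1%} {condensed_ratio:.1%} {reserve_ratio:.1%}")
# 9.8% 3.8% 33.3%
```

Under that assumption the two time ratios come out at about 9.8% and 3.8%, consistent with the reported 9% and 3% if those were truncated to whole percentages, and reserving 10 of 30 events corresponds to a reserve ratio of about 33%, close to the 35% target.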

Chapter 5 Conclusions and Future Works

In this thesis, we proposed a baseball video structure analysis and hierarchical summarization method based on the pitching semantic unit (PSU). We use color features and motion features to efficiently detect the pitching shots and classify shot types. After detecting the pitching shots, we segment the baseball video into a sequence of PSUs. We detect change events and highlight events from the PSUs and compute a score for each highlight event in order to provide multi-level summaries.

Characteristics of our proposed method include the following.

1. We propose a simple framework that uses only a few features to analyze baseball videos efficiently.

2. We provide three types of indices for the baseball video: pitching shots, detected highlight events, and half innings.

3. The multi-level summaries help users comprehend a baseball game quickly, and users can choose the summary that suits their preference. The higher-level summaries provide more detailed content, and the lower-level summaries provide coarser content.

4. The summaries greatly condense the baseball video in time, which broadens its usability in settings such as delivery over narrow-band networks and applications on mobile devices.

In our work, we focus on detecting most highlight events efficiently, but we do not further classify these events into categories. In the future, more characteristics can be extracted from the PSUs and integrated with other features in the video, such as audio, texture, and caption information, to classify the detected events into categories, to obtain more high-level semantic information, and to improve the precision of highlight event detection.

References

[1] N. Day and J. M. Martinez, “Overview of the MPEG-7 Standard (version 4.0),” ISO/IEC JTC1/SC29/WG11 N4675, Jeju, March 2002

[2] A. Ekin, A. M. Tekalp, and R. Mehrotra, “Automatic Soccer Video Analysis and Summarization,” IEEE Trans. Image Processing, Vol. 12, No. 7, pp. 796-807, July 2003

[3] D. Zhong and S. F. Chang, “Structure Analysis of Sports Video Using Domain Models,” IEEE International Conference on Multimedia and Expo, pp. 22-25, August 2001

[4] S. F. Chang, D. Zhong, and R. Kumar, “Real-Time Content-Based Adaptive Streaming of Sports Videos,” IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001), Dec. 2001

[5] T. Kawashima, K. Tateyama, T. Iijima, and Y. Aoki, “Indexing of Baseball Telecast for Content-based Video Retrieval,” IEEE International Conference on Image Processing, Vol. 1, pp. 871-874, Oct. 1998

[6] D. Zhang and S. F. Chang, “Event Detection in Baseball Video Using Superimposed Caption Recognition,” 10th ACM International Conference on Multimedia, pp. 315-318, 2002

[7] C. L. Huang and C. Y. Chang, “Video summarization using Hidden Markov Model,” IEEE International Conference on Information Technology: Coding and Computing, pp. 473-477, 2001

[8] H. C. Shih and C. L. Huang, “Image Analysis and Interpretation for Semantics Categorization in Baseball Video,” IEEE International Conference on Information Technology: Coding and Computing [Computers and Communications], pp. 379-383, 2003

[9] W. Hua, M. Han, and Y. Gong, “Baseball Scene Classification Using Multimedia Features,” IEEE, 2002

[10] P. Chang, M. Han, and Y. Gong, “Extract Highlights From Baseball Game Video with Hidden Markov Models,” IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 821-824, Aug. 2002

[11] M. Han, W. Hua, W. Xu, and Y. Gong, “An integrated Baseball Digest System Using Maximum Entropy Method,” ACM International Conference on Multimedia, pp. 347-350, 2002

[12] R. C. Gonzalez and R. E. Woods, “Digital Image Processing,” Prentice Hall, 2002

[13] D. S. Taubman and M. W. Marcellin, “JPEG2000: image compression fundamentals, standards, and practice,” Kluwer Academic Publishers, 2002

[14] ISO/IEC IS 13818-2, MPEG-2 Video

[15] F. Idris and S. Panchanathan, “Review of Image and Video Indexing Techniques,” Journal of Visual Communication and Image Representation, Vol. 8, No. 2, pp. 146-166, 1997

[16] I. Koprinska and S. Carrato, “Temporal Video Segmentation: A Survey,” Signal Processing: Image Communication, Vol. 16, pp. 477-500, 2001

[17] W. T. Lee, “MPEG Video Analysis – Shot Change Detection and Classification,” Master Thesis, Institute of Computer and Information Science, National Chiao Tung University, 2003

[18] V. Kobla, D. Doermann, K. I. Lin, and C. Faloutsos, “Compressed domain video indexing techniques using DCT and Motion Vector information in MPEG video,” in Proc. of the SPIE Conference on Storage and Retrieval for Still Image and Video Databases V, Vol. 3022, pp. 200-211, 1997

[19] J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. of 5th Berkeley Symp. on Mathematical Statistics and Probability, pp. 281-297, 1967

[20] S. C. Pei and Y. Z. Chou, “Efficient MPEG Compressed Video Analysis Using the Macroblock Type Information,” IEEE Trans. Multimedia, Vol. 1, No. 4, pp. 321-333, 1999

[21] X. D. Zhang, T. Y. Liu, K. T. Lo, and J. Feng, “Dynamic Selection and Effective Compression of Key Frames for video abstraction,” Pattern Recognition Letters, Vol. 24, pp. 1523-1532, 2003

[22] Z. H. Liu, “A Content Retrieval System Based on MPEG-7 Descriptors and JPEG2000 for Mobile Applications,” Master Thesis, Institute of Computer and Information Science, National Chiao Tung University, 2003
