Text/Video Alignment - Semantic Event Extraction from Scoreboard Frames


4.2.2 Semantic Event Extraction from Scoreboard Frames

4.2.2.4 Text/Video Alignment

Based on the recognized game clock, a text/video alignment method is presented for sports video annotation. The alignment consists of two parts. First, using the game clock recognized in the video frames, the target frame corresponding to each event extracted from the webcast text (see Chapter 3) is located; this is called event moment detection. Second, the time period covered by each event is determined; this is called event boundary detection.
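Event moment detection can be illustrated with a minimal sketch. It assumes that clock recognition yields one (frame index, quarter, clock string) tuple per scoreboard frame, and that the target frame of an event is the earliest frame showing the event's clock value. The function names, the tuple layout, and the earliest-frame rule are illustrative assumptions, not details taken from this thesis.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup key: (quarter, seconds remaining on the game clock).
ClockKey = Tuple[int, int]


def clock_to_seconds(clock: str) -> int:
    """Convert a game-clock string such as '11:56' to seconds remaining."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def build_clock_index(recognized: List[Tuple[int, int, str]]) -> Dict[ClockKey, int]:
    """Map each (quarter, seconds remaining) to the earliest frame showing it.

    `recognized` holds (frame_index, quarter, clock_string) tuples, one per
    scoreboard frame whose game clock was recognized (assumed input format).
    """
    index: Dict[ClockKey, int] = {}
    for frame_idx, quarter, clock in recognized:
        key = (quarter, clock_to_seconds(clock))
        # A paused clock repeats the same value; keep the smallest frame index.
        if key not in index or frame_idx < index[key]:
            index[key] = frame_idx
    return index


def locate_event_moment(index: Dict[ClockKey, int], quarter: int, clock: str) -> Optional[int]:
    """Return the target frame of a webcast-text event, or None if the clock was never seen."""
    return index.get((quarter, clock_to_seconds(clock)))
```

An event whose clock value never appears in a scoreboard frame returns None, which corresponds to the case where the game clock is not shown on screen.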

As for event boundary detection, many basketball events have no obvious boundary pattern. Even manual labeling of event boundaries is highly subjective, and the result may vary from person to person. In fact, a sports event should include some redundancy before and after the event moment to show its cause and result.

Therefore, we set a general interval for all kinds of basketball events, for example, from ten seconds before the event moment to five seconds after it.

A basketball event extracted within this interval is treated as a successful text/video alignment. An example of text/video alignment is presented in Fig. 4.4.

Fig. 4.4 An example of text/video alignment.
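The general interval described above (ten seconds before the event moment to five seconds after it) can be sketched as a frame-range computation. The function name, the frame-rate parameter, and the clamping to the video length are assumptions made for illustration.

```python
from typing import Tuple


def event_boundary(
    moment_frame: int,
    fps: float,
    total_frames: int,
    before_s: float = 10.0,  # seconds kept before the event moment (example value from the text)
    after_s: float = 5.0,    # seconds kept after the event moment (example value from the text)
) -> Tuple[int, int]:
    """Return the inclusive (start_frame, end_frame) range around an event moment."""
    start = max(0, moment_frame - round(before_s * fps))
    end = min(total_frames - 1, moment_frame + round(after_s * fps))
    return start, end
```

For a 30 fps video, an event moment at frame 1000 gives the range (700, 1150). Ranges near the start or end of the video are clipped to valid frame indices.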

4.3 Experimental Results

Our experiments are conducted on 11 NBA 2008-2009 postseason games. The basketball videos were captured with a TV card, and the webcast text was acquired from the ESPN website. After all basketball videos were annotated, the results were evaluated by manual inspection. An event counts as a hit if the manually generated event boundary is covered by the boundary produced by the proposed method. As can be seen in Table 4.1, the detection rate of the annotation result without video frames partition is very poor.

In contrast, the detection rate of the proposed method reaches 100%. The main reason is that video frames partition avoids unnecessary game clock recognition on frames without a scoreboard and raises the digit recognition rate. This also addresses the challenge of discontinuity in basketball games.

Table 4.1 Semantic event extraction results of the proposed method, given as correctly detected number / total event number (detection rate).
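The hit criterion and the detection rate reported in Table 4.1 can be expressed as a short sketch. It assumes that boundaries are inclusive (start, end) frame pairs and that manual and detected boundaries are paired event by event. The function names and the sample values in the usage note are hypothetical and are not experimental data.

```python
from typing import List, Tuple

Boundary = Tuple[int, int]  # inclusive (start_frame, end_frame), assumed representation


def is_hit(manual: Boundary, detected: Boundary) -> bool:
    """A hit: the manually generated boundary lies entirely inside the detected one."""
    return detected[0] <= manual[0] and manual[1] <= detected[1]


def detection_rate(manual: List[Boundary], detected: List[Boundary]) -> Tuple[int, int, float]:
    """Return (correctly detected number, total event number, detection rate)."""
    if len(manual) != len(detected):
        raise ValueError("each event needs one manual and one detected boundary")
    hits = sum(1 for m, d in zip(manual, detected) if is_hit(m, d))
    total = len(manual)
    rate = hits / total if total else 0.0
    return hits, total, rate
```

With the made-up boundaries manual = [(710, 1100), (50, 400)] and detected = [(700, 1150), (100, 550)], only the first event is covered, so the function returns (1, 2, 0.5).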

We checked the results and found that the missing events are due to a special circumstance. As can be seen from Fig. 4.5, the game is sometimes in play while the game clock is not shown on the video frames, for example because of 1) a late start after TV commercials; 2) a picture-in-picture interview with a player; 3) other statistics being displayed; or 4) trailers produced by the broadcasting company. Since no game clock is available for recognition in these frames, these misses are acceptable.

(a) Late start of a quarter from 11:56 instead of 12:00.

(b) Picture-in-picture interviewing a player.

Fig. 4.5 Examples of basketball games playing without game clock.

4.4 Summary

Using webcast text as external knowledge in a multimodal fusion framework for sports video annotation has become a recent trend. Semantic text events are extracted in our previous work (see Chapter 3). The remaining challenge is to annotate semantic events through game clock recognition and text/video alignment. Compared with soccer videos, game clock recognition in basketball videos is much harder because the game clock pauses frequently. Event boundaries are also hard to detect automatically. To address these problems, a text/video alignment and event annotation method has been proposed. Video frames partition, the novel component, spares semantic resource extraction from processing a large number of unnecessary frames, so both performance and detection rate increase. Our experiments show that all semantic events are annotated successfully.

CHAPTER 5

A NOVEL METHOD FOR SLOW MOTION REPLAY DETECTION IN BROADCAST BASKETBALL VIDEO

In this chapter, we propose a method to detect slow motion replays in basketball videos. The presence of the scoreboard is used to filter out a large number of non-replay frames, which improves detection accuracy. After video frames partition, every sequence of consecutive non-scoreboard frames bounded by scoreboard frames is considered a non-scoreboard segment. Characteristics of replays and non-replays are observed to create features, which are used to detect replays and prune non-replays from the non-scoreboard segments.

5.1 Introduction

Slow motion replays present the detailed course of sports events, and they are widely used by professionals for athlete performance analysis, professional training, and injury prevention. Slow motion replays also provide resources for sports video analysis such as highlight generation [23], video summarization [24]-[26], and event detection [8]. Therefore, slow motion replay extraction has become a valuable and active research topic.

Several methods [23][12][15]-[18] for slow motion replay detection have been proposed; they can be classified into two categories. The first category [23][12][15] assumes that a replay is sandwiched between either two special digital video effects (SDVEs) or two logo transitions. Based on this assumption, Pan et al. [23] build a hidden Markov model (HMM) to detect slow motion replays. Other methods [12][15] first locate SDVEs or logo transitions and then consider the segments sandwiched between them as slow motion replays. They assume that the two SDVEs or logo transitions before and after a replay are either identical [12] or visually similar [15]. However, these assumptions do not always hold in basketball videos, where production effects are varied and complicated. A basketball replay may begin and end with any of the following combinations: 1) paired visually similar SDVEs; 2) non-paired SDVEs; 3) an SDVE at one end and an abrupt transition at the other. Furthermore, a basketball video segment bounded by paired SDVEs is not always a replay. Therefore, previous work in this category cannot be applied to basketball videos.

The second category [16]-[18] analyzes features of replays to distinguish replay segments from non-replay segments. Farn et al. [16] extracted slow motion replays captured by both standard and high-speed cameras. Their extractor relies on the dominant color of the soccer field; however, it is not applicable to basketball videos, since a basketball court is relatively small and its textures are more complicated. Wang et al. [17] constructed motion-related features and used a support vector machine (SVM) to classify slow motion replays and normal shots. The approach was tested on soccer and basketball videos, but the precision rates on the two basketball videos were only 55.6% and 53.3%, with recall rates of 62.5% and 66.7%, respectively. Han et al. [18] proposed a general framework based on a Bayesian network to make full use of multiple clues, including shot structure, gradual transition pattern, slow motion, and sports scene. Since they used gradual transitions as a feature clue, the method suffers from the inaccuracy of the automatic gradual transition detector they used. Their experiments showed improved replay detection, with a precision rate of 82.9% and a recall rate of 83.2%, but the recall rate is still not high enough for sports highlight generation.

Basketball is one of the most important sports in the world, yet challenges remain in slow motion replay detection for basketball videos. The first-category methods are not applicable to basketball videos because their assumptions do not hold. The second-category methods are applicable, but there is room for improvement in both precision and recall. Moreover, most previous studies analyze every video frame to detect replays, but searching for replays in frames that are certainly non-replay degrades both performance and detection rate.

5.2 Proposed Method

In this study, we propose a novel method to tackle the above-mentioned challenges and detect slow motion replays in basketball videos. First, the video frames partition proposed in Chapter 2 is used to filter out video frames that have no chance of being replays. After filtering, the video frames without a scoreboard, called non-scoreboard frames, are grouped into non-scoreboard segments. Then, characteristics of both replays and non-replays are observed to create features: the former are used for detecting replays and the latter for pruning non-replays.
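The grouping step can be sketched as a run-length scan over per-frame scoreboard flags. The function name and the boolean-list input are assumptions. The sketch also keeps runs that touch the first or last frame of the video, which are bounded by a scoreboard frame on one side only; whether the original method keeps such runs is not stated here.

```python
from typing import List, Optional, Tuple


def non_scoreboard_segments(has_scoreboard: List[bool]) -> List[Tuple[int, int]]:
    """Group consecutive non-scoreboard frames into inclusive (start, end) segments.

    `has_scoreboard[i]` is True when frame i was classified as a scoreboard frame.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(has_scoreboard):
        if not flag and start is None:
            start = i  # a run of non-scoreboard frames begins
        elif flag and start is not None:
            segments.append((start, i - 1))  # the run ends at the previous frame
            start = None
    if start is not None:
        # Run reaching the end of the video (assumption: still kept as a segment).
        segments.append((start, len(has_scoreboard) - 1))
    return segments
```

For the flags [True, False, False, True, True, False, True, False], the result is [(1, 2), (5, 5), (7, 7)]. Each segment is then a candidate for replay detection or non-replay pruning.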

5.2.1 Video Frames Partition

As can be seen from Fig. 2.1, all frames in a basketball video can be broadly classified into two categories: scoreboard frames and non-scoreboard frames. Scoreboard frames show the basketball game with a scoreboard overlaid on them, while non-scoreboard frames show everything else, e.g., sideline interviews and slow motion replays. Since scoreboard frames, which are definitely non-replays, usually occupy nearly half of a broadcast basketball video, it is beneficial to exclude them from slow motion replay detection. An automatic scoreboard template extractor is therefore needed. As shown in Fig. 2.1(a), a scoreboard is a fixed rectangular area whose pixels change infrequently. Based on this fact, our previous work (see Chapter 2) presented an automatic scoreboard template extractor. Here, we adapt this extractor to get the scoreboard template and position. After scoreboard template extraction, video frames partition is done by matching every frame against the scoreboard template at the scoreboard position.
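The matching step can be illustrated with a dependency-free sketch. It assumes grayscale frames stored as rows of 0-255 intensities, a mean absolute difference as the similarity measure, and a hypothetical threshold of 20. The actual matching measure and threshold used in Chapter 2 are not given in this section, so all three are stand-ins for illustration.

```python
from typing import List, Sequence, Tuple

Gray = Sequence[Sequence[int]]  # grayscale image as rows of 0-255 intensities (assumed format)


def mean_abs_diff(frame: Gray, template: Gray, top_left: Tuple[int, int]) -> float:
    """Mean absolute pixel difference between the template and the frame region at top_left."""
    row0, col0 = top_left
    total = 0
    count = 0
    for r, template_row in enumerate(template):
        for c, t_value in enumerate(template_row):
            total += abs(frame[row0 + r][col0 + c] - t_value)
            count += 1
    return total / count


def partition_frames(
    frames: Sequence[Gray],
    template: Gray,
    top_left: Tuple[int, int],
    threshold: float = 20.0,  # hypothetical value; would need tuning on real broadcasts
) -> List[bool]:
    """Return True for each frame whose scoreboard region matches the template."""
    return [mean_abs_diff(f, template, top_left) <= threshold for f in frames]
```

Because the scoreboard position is fixed, only the template-sized region of each frame is compared, so the cost per frame is independent of the frame size. The resulting flags separate scoreboard frames from non-scoreboard frames.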
