
3.2 Proposed Method

3.2.3 Hierarchical Search System

Fig. 3.6 illustrates the concept of the proposed hierarchical search system. A user first queries with one word to obtain coarse results, and can then add more words to descend to deeper levels and obtain finer results. We implement the system by building a pair of index tables and looking them up alternately.

Fig. 3.6 An example to illustrate the concept of the proposed hierarchical search system.

[Figure 3.6: query tree with node labels "Query", "jumper", "makes", "assists", "misses", "dunk", "makes".]

We build a forward index table and an inverted index table. The former records mappings from descriptions to event keywords, and the latter stores mappings from keywords to descriptions. The forward index table is obtained automatically once the unrelated-words filtering procedure has been applied. The inverted index table is then built from the forward index table by sequentially scanning the event keyword set of each description. Fig. 3.7 gives an example. Suppose we have the five descriptions shown in Fig. 3.7(a). Applying the unrelated-words filtering procedure to each description yields Fig. 3.7(b). Each row of Fig. 3.7(b) provides a description index (DI) and the corresponding event keyword set (EKS), and the DI is linked to every keyword in the EKS. After all rows of Fig. 3.7(b) have been scanned sequentially, Fig. 3.7(c) is established. Both tables are used in the hierarchical search system. The inverted index table returns query results by intersecting the description sets mapped by the query keywords. The forward index table, originally just an intermediate product, is reused in our method to provide suggested query keywords, i.e., an auto-complete feature.

Webcast Text

Index of Description | Description
D1 | Peja Stojakovic misses 10-foot two point shot
D2 | David West misses jumper
D3 | Peja Stojakovic makes 19-foot two point shot
D4 | Trevor Ariza makes 19-foot jumper
D5 | David West makes 17-foot jumper (Chris Paul assists)

(a) Descriptions and their indices.

Forward Index

Index of Description | Event Keyword Set
D1 | misses, shot
D2 | misses, jumper
D3 | makes, shot
D4 | makes, jumper
D5 | assists, makes, jumper

(b) Mappings from description indices to event keywords.

Inverted Index

Keyword | Indices of Description Set
assists | D5
jumper | D2, D4, D5
makes | D3, D4, D5
misses | D1, D2
shot | D1, D3

(c) Mappings from event keywords to description indices.

Fig. 3.7 An example to illustrate the data structure for hierarchical search.
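The construction of Fig. 3.7(c) from Fig. 3.7(b) can be sketched as follows. This is a minimal illustration rather than the actual implementation: the dictionary-of-sets representation and the function name `build_inverted_index` are assumptions, and the data are the five rows of Fig. 3.7(b).

```python
from collections import defaultdict

# Forward index from Fig. 3.7(b): description index (DI) -> event keyword set (EKS).
forward_index = {
    "D1": {"misses", "shot"},
    "D2": {"misses", "jumper"},
    "D3": {"makes", "shot"},
    "D4": {"makes", "jumper"},
    "D5": {"assists", "makes", "jumper"},
}


def build_inverted_index(forward):
    """Scan each (DI, EKS) row once and link the DI to every keyword in the EKS.

    The function name and data layout are hypothetical, chosen for illustration.
    """
    inverted = defaultdict(set)
    for di, eks in forward.items():
        for keyword in eks:
            inverted[keyword].add(di)
    return dict(inverted)


inverted_index = build_inverted_index(forward_index)
for keyword in sorted(inverted_index):
    print(keyword, sorted(inverted_index[keyword]))
```

Printing the result in keyword order reproduces the rows of the inverted index in Fig. 3.7(c). Each description is visited once, so the construction cost is proportional to the total number of keywords in the forward index.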

In our system, a query is a set of words. The hierarchical feature means that a user gets more general results by querying with fewer words and more specific results by querying with more words. For example, the results of querying “jumper” are the descriptions containing the keyword “jumper”, and the results of querying “jumper makes” are the descriptions containing both “jumper” and “makes.”

The query result is the intersection of the description sets obtained by looking up each query keyword in the inverted index table. To provide suggested query keywords, the resulting intersection set is then used as a query against the forward index table: the keyword set of each description in the intersection set is extracted, and the union of all extracted keyword sets is returned as the suggested query keywords.

The detailed algorithm of the proposed search system is given below.

Hierarchical Search Algorithm

Step 1: A user types several query words.

Step 2: Look up the inverted index and get the description sets mapped by the query words. Intersect these description sets to obtain the query result.

Step 3: Look up the forward index and get the keyword sets mapped by the query result.

Step 4: Output the union of these keyword sets. The user selects some keywords from the output as query words. Perform Step 2 again and output the query result.

We use Fig. 3.7 as an example. Assume that a user types the query {jumper}. The system looks up the inverted index table and gets the temporary result set {D2, D4, D5}. It then looks up the forward index table and recommends {assists, jumper, makes, misses} to the user, i.e., the union of {jumper, misses}, {jumper, makes}, and {assists, jumper, makes}. If the user changes the query to {jumper, makes}, the system returns {D4, D5}, i.e., the intersection of {D3, D4, D5} and {D2, D4, D5}. In this way, a hierarchical search system with a query recommendation function is built.
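The algorithm and the worked example above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the actual implementation: the two tables are hard-coded from Fig. 3.7, the function names `search` and `suggest` are hypothetical, and returning an empty set for an empty or unknown query is a design choice made here.

```python
# Index tables copied from Fig. 3.7(b) and Fig. 3.7(c).
FORWARD_INDEX = {
    "D1": {"misses", "shot"},
    "D2": {"misses", "jumper"},
    "D3": {"makes", "shot"},
    "D4": {"makes", "jumper"},
    "D5": {"assists", "makes", "jumper"},
}
INVERTED_INDEX = {
    "assists": {"D5"},
    "jumper": {"D2", "D4", "D5"},
    "makes": {"D3", "D4", "D5"},
    "misses": {"D1", "D2"},
    "shot": {"D1", "D3"},
}


def search(query):
    """Step 2: intersect the description sets mapped by the query words."""
    description_sets = [INVERTED_INDEX.get(word, set()) for word in query]
    if not description_sets:
        return set()  # assumption: an empty query matches nothing
    return set.intersection(*description_sets)


def suggest(result):
    """Steps 3 and 4: union of the keyword sets mapped by the query result."""
    suggestions = set()
    for di in result:
        suggestions |= FORWARD_INDEX[di]
    return suggestions


first = search({"jumper"})
print(sorted(first))            # ['D2', 'D4', 'D5']
print(sorted(suggest(first)))   # ['assists', 'jumper', 'makes', 'misses']
print(sorted(search({"jumper", "makes"})))  # ['D4', 'D5']
```

Adding a keyword can only shrink the intersection, which is what makes each additional query word a step to a deeper, more specific level.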

3.3 Experimental Results

In most search systems, statistical analysis such as receiver operating characteristic (ROC) analysis or recall-precision analysis is used to evaluate performance. Such analysis estimates the system degradation caused by misclassification. However, as mentioned in Section 3.2.2, we cluster descriptions with an exact-matching function, so no event is misclassified in our system. This means that the precision and recall rates of the proposed method are both 100%.

Few studies have aimed at detecting text events from webcast text. Xu and Chua [5] modeled webcast text as external knowledge for detecting events in football and soccer. They evaluated the fused video event detection, but not the webcast text analysis alone. Xu et al. [8] proposed a framework that analyzes webcast text and videos independently and aligns them through game time. In this framework, the performance of video event detection depends mainly on webcast text analysis. Here we compare our method with Xu et al.’s work.

Our experiments are conducted on 25 NBA 2009-2010 games and 41 NBA 2008-2009 postseason games. The former are used as the training database, and the latter as the testing database to examine the reliability of the proposed method. We also collect 68 UEFA Champions League 2010-2011 soccer games, of which 20 are used as the training database and the other 48 as the testing database. The webcast text of all 134 games is acquired from the ESPN website. As can be seen in Table 3.1, the hundreds of descriptions in a game are clustered into, on average, 44 semantic event categories for basketball and 20 for soccer.

Table 3.1 Average number of sports event categories in 25 basketball training data and 20 soccer training data.

Sport | Mean | Variance | Standard deviation
Basketball | 44.08 | 9.08 | 3.01
Soccer | 19.85 | 5.40 | 2.32

In Xu et al.’s previous work, which is based on pLSA, the optimal number of event categories is nine for basketball and eight for soccer. The top three keywords of each category are selected by a conditional probability, and the top-ranked keyword is used as the single keyword during event detection. We map the top three pLSA keywords to our multiple-keyword categories in Table 3.2 and Table 3.3. In Table 3.3, because “attempt” is on the black list of the interactive system, we use “shot” as the single-keyword match when mapping soccer events in pLSA to those in the proposed method. The words “missed” and “misses” are forms of the same verb (miss) and have the same meaning in descriptions, so we treat them as the same word and use “missed(misses)” as their common representative.

To achieve fine performance in detecting semantic events, Xu et al. not only detect keywords in description sentences but also analyze the context information in them. For example, in basketball, the top-ranked keyword “jumper” is detected as a “Jumper” event only if the preceding word is “makes,” and other sentences containing the word “jumper,” e.g., “Kenyon Martin misses 22-foot jumper,” are discarded. However, these discarded events are genuine semantic events and can be valuable for further research, e.g., sports posture analysis, injury prevention, and special highlights.

It can be seen from Table 3.2 and Table 3.3 that every pLSA category is mapped to several different semantic events of the proposed method. These events are related but distinct. For example, in basketball, “jumper misses” describes a missed jumper, while “jumper makes” describes a successful one. In soccer, “blocked shot” describes a shot attempt blocked by an opponent, while “missed(misses) shot” describes a shot attempt missed by the kicker himself. Hence, misclassifying or discarding these events decreases the precision and recall rates. In our method, by contrast, the precision and recall rates are both 100%.

With the support of the hierarchical search system, we can query with multiple keywords for more specific events, which is an improvement over pLSA with context information. Table 3.2 and Table 3.3 also show semantic event categories that are unavailable in Xu et al.’s method but can be detected by our method, e.g., steal, timeout, and turnover for basketball and injury, blocked, and penalty for soccer. These semantic events are important for special highlights or injury prevention, and should not be ignored or misclassified. In this respect, the proposed method is superior to pLSA.

Table 3.2 Mappings of basketball event categories from pLSA to the proposed method.

Category (Xu et al.’s method, pLSA) | Ranked keywords (Xu et al.’s method, pLSA) | Categories with multiple keywords (proposed method)
Shot | shot, pass, bad | makes shot, misses shot
Jumper | jumper, foot, misses | jumper misses, jumper makes, assists jumper makes
Layup | layup, driving, blocks | layup makes, layup misses, driving layup makes, assists layup makes
Dunk | dunk, makes, misses | dunk makes, assists dunk makes, dunk makes slam, driving dunk makes, dunk misses
Block | blocks, shot, assists | blocks layup, blocks jumper, blocks driving layup, blocks hook shot, blocks shot, blocks dunk
Rebound | rebound, defensive, offensive | defensive rebound, offensive rebound
Foul | foul, draw, personal | draws foul shooting, draws foul personal, draws foul offensive, ball draws foul loose, foul technical, defense foul illegal person, draws flagrant foul type
Free throw | throw, free, makes | free makes throw, free misses throw
Substitution | enters, game, timeout | enters
N/A | N/A | bad pass, bad pass steals, bad lost steals, full timeout, official timeout, turnover, traveling, ejected, double dribble, defense illegal, clock shot violation

Table 3.3 Mappings of soccer event categories from pLSA to the proposed method.

Category (Xu et al.’s method, pLSA) | Ranked keywords (Xu et al.’s method, pLSA) | Categories with multiple keywords (proposed method)
Corner | corner, conceded, bottom | corner, assisted corner saved shot, corner goal penalty shot, corner saved shot, assisted corner goal, assisted corner goal shot, assisted corner missed(misses), corner goal shot, corner missed(misses) shot, assisted corner missed(misses) shot, corner free kick missed(misses) shot, assisted corner saved, corner free goal kick shot
Shot | attempt, right, footed | blocked shot, assisted missed(misses) shot, assisted blocked shot, assisted goal saved shot, missed(misses) shot, assisted corner saved shot, assisted shot, corner goal penalty shot, corner saved shot, assisted corner goal shot, corner goal shot, corner missed(misses) shot, goal saved shot, free kick shot, assisted goal shot, free kick missed(misses) shot, assisted corner missed(misses) shot, corner free kick missed(misses) shot, goal penalty saved shot, corner free goal kick shot, goal penalty shot
Foul | foul, dangerous, for | foul, card foul yellow, foul penalty, card foul
Card | yellow, shown, card | card foul yellow, card yellow
Free kick | kick, free, wins | free kick, free kick shot, free kick missed(misses) shot, corner free kick missed(misses) shot, corner free goal kick shot
Offside | offside, ball, tries | offside
Substitution | substitution, replaces, lineups | replaces substitution, injury replaces substitution
Goal | goal, shot, box | assisted goal saved shot, corner goal penalty shot, assisted corner goal, assisted corner goal shot, corner goal shot, goal saved shot, assisted goal shot, assisted goal saved, goal penalty saved shot, goal saved, goal, corner free goal kick shot, goal penalty shot, assisted goal
N/A | N/A | injury, assisted missed(misses), assisted blocked, penalty, assisted

Here we examine the reliability of the proposed method. For basketball, 25 NBA 2009-2010 games are taken as training data. After processing all the training data and gathering the extracted semantic events, we take the union of these semantic events as a sample set, which has cardinality 82. We then process the testing data, collected from 41 NBA 2008-2009 postseason games, and check whether every semantic event extracted from the testing data is listed in the sample set. For soccer, we use 20 UEFA Champions League games as training data and 48 UEFA Champions League games as testing data. According to our examination, with sparse exceptions, almost all the semantic events extracted from the testing data can be found in the sample set. Table 3.4 and Table 3.5 list all the exception events, which are quite rare. These exceptions may be caused by different writing styles or by rarely occurring events, and can still be collected in an interactive way if necessary. Therefore, the proposed method is stable.
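The reliability check described above amounts to a set-membership test followed by counting. The sketch below is only an illustration under assumptions: each semantic event is represented as a frozenset of keywords, the function name `exception_report` is hypothetical, and the toy data are invented for the example and are not the experimental data.

```python
from collections import Counter


def exception_report(sample_set, test_events):
    """Count testing events that are absent from the training sample set.

    sample_set: set of frozensets (union of semantic events from training games).
    test_events: list with one frozenset per testing description.
    Returns {event: (occurrences, percentage of all testing descriptions)}.
    """
    counts = Counter(event for event in test_events if event not in sample_set)
    total = len(test_events)
    return {event: (n, 100.0 * n / total) for event, n in counts.items()}


# Toy data for illustration only (not the experimental data).
sample_set = {
    frozenset({"jumper", "makes"}),
    frozenset({"jumper", "misses"}),
}
test_events = [
    frozenset({"jumper", "makes"}),
    frozenset({"jumper", "misses"}),
    frozenset({"jumper", "makes"}),
    frozenset({"blocks", "driving", "dunk"}),
]

for event, (n, pct) in exception_report(sample_set, test_events).items():
    print(" ".join(sorted(event)), n, f"{pct:.2f}%")
```

Using frozensets makes the comparison independent of keyword order, which matches the exact-matching clustering of keyword sets.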

Table 3.4 Occurrences of exception basketball events from 41 testing games.

Exception event | Number (Percentage) of 18679 basketball descriptions
away ball draws foul |
misses pointer |
flagrant free misses throw |
blocks driving dunk |

Table 3.5 Occurrences of exception soccer events from 48 testing games.

Exception event | Number (Percentage) of 5727 soccer descriptions
card |
corner penalty saved shot |
missed(misses) |
goal shot |
assisted corner missed shot |
missed shot |

3.4 Summary

In this chapter, we have proposed an unsupervised approach for semantic event extraction from sports webcast text, with the following contributions: 1) detecting semantic events from webcast text in an unsupervised manner; 2) requiring no additional context information analysis; 3) preserving more significant events in sports games; 4) extracting multiple keywords from event categories to support hierarchical searching; 5) providing an auto-complete feature for finer retrieval.

According to experimental results, the proposed method extracts significant semantic events from basketball and soccer games and preserves those events that are ignored or misclassified by previous work. The extracted significant text events can be used for further video indexing and summarization. Furthermore, the proposed method is reliable.

CHAPTER 4

ANNOTATING WEBCAST TEXT IN BASKETBALL VIDEOS BY GAME CLOCK RECOGNITION AND TEXT/VIDEO ALIGNMENT

In this chapter, we propose a text/video alignment and event annotation method. As mentioned in Chapter 2, semantic events appear in scoreboard frames only. Thus, the proposed semantic event extraction method focuses on analyzing scoreboard frames. For each scoreboard frame, the position of each clock digit is first located. A digit template collection scheme is provided to collect digit character templates. With the clock digit locations and digit templates, a two-step strategy is proposed to recognize game clocks on the semi-transparent scoreboard in scoreboard frames. With the game clock recognized from the sports video, alignment is done by finding every match for the game clock extracted from webcast text and annotating the corresponding event description on the video frames.

4.1 Introduction

A substantial number of sports videos are produced and broadcast worldwide through television programs or Internet streaming, and it is nearly impossible to watch them all. Most of the time, fans prefer to watch highlights or retrieve only the video segments they are interested in. Therefore, sports video summarization and retrieval have become valuable and active research topics. In these topics, automatic semantic event detection and video annotation are essential tasks.

Most existing studies [1]-[3] use video content as the knowledge resource. However, schemes relying on video content encounter a challenge called the semantic gap. Recently, some studies [4]-[9] have used a multimodal fusion of video content and external resource knowledge to bridge the semantic gap. The multimodal fusion scheme, which analyzes webcast text and video content separately and then performs text/video alignment to complete sports video annotation or summarization, has been used in American football [4], soccer [6]-[8], and basketball [7]-[8].

In this scheme, text/video alignment, which consists of event moment detection and event boundary detection, has a great impact on performance. It can be achieved through scoreboard recognition. As can be seen in Fig. 4.1, a scoreboard is usually overlaid on sports videos to present game-related information to the audience (e.g., score, game status, game clock), which can be recognized and aligned with the text results.

For sports with a game clock (e.g., basketball and soccer), event moment detection can be performed through video game clock recognition. Xu et al. [6]-[8] used a Temporal Neighboring Pattern Similarity (TNPS) measure to locate the game clock and recognize each of its digits. They proposed a detection-verification-redetection mechanism to handle the temporarily disappearing clock region in basketball videos. However, recognizing the game clock in a frame that has no game clock is unnecessary, and the cost of verification and redetection could be avoided. Moreover, the clock digit characters cannot be located on a semi-transparent scoreboard.

(a) Transparent scoreboard.

(b) Non-transparent scoreboard.

Fig. 4.1 Two examples of overlaid scoreboard with game clock in basketball video.

According to our observation, the two main problems in detecting the game clock in basketball videos are the temporary disappearance and the temporary pause of the game clock. The temporary disappearance may be caused by slow-motion replays, shot transition effects, TV commercials, etc. The temporary pause may be due to basketball events such as timeouts, substitutions, and fouls. These two problems make game clock recognition much harder in basketball videos than in soccer videos. Furthermore, so that the scoreboard does not cover details of the video frames, more and more sports videos use a transparent scoreboard overlay. The transparency of the scoreboard is another serious problem for game clock location and recognition.

As for event boundary detection, some researchers have used a hidden Markov model (HMM) [7] and a conditional random field model (CRFM) [8]. However, not all events have obvious temporal patterns at their start and end boundaries, owing to the complicated camera motions and playground textures of sports videos. In Xu et al.’s experiments [8], the boundary detection accuracy (BDA) is relatively low for foul and substitution events in basketball, because a foul event is short and is followed by other events (e.g., free throw, throw-in) without obvious temporal transition patterns, and a substitution event is loosely structured. Even if boundaries are labeled manually, the results may still be subjective.

To address the above-mentioned problems, we present a text/video alignment and event annotation method based on the sports video analysis framework proposed in Chapter 2.

4.2 Proposed Method

In the proposed method, the video frame partition method (see Chapter 2) is used to divide frames into scoreboard frames and non-scoreboard frames. For each scoreboard frame, the position of each clock digit is first located. A digit template collection scheme is provided to collect digit character templates. With the clock digit locations and digit templates, a two-step strategy is proposed to recognize game clocks on the semi-transparent scoreboard in scoreboard frames. With the game clock recognized from the sports video, alignment is done by finding every match for the game clock extracted from webcast text (see Chapter 3) and annotating the corresponding event description on the video frames.
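The final alignment step can be pictured as a join on the game clock. The sketch below is a simplified illustration and not the method's actual implementation: it assumes that clock recognition has already produced (frame index, period, clock) triples, that each webcast event carries a period and a clock string, and that the (period, clock) pair is a suitable matching key. The function name `align` and the toy data are hypothetical.

```python
from collections import defaultdict


def align(webcast_events, frame_clocks):
    """Annotate video frames with webcast descriptions that share a game clock.

    webcast_events: list of (period, clock, description) tuples from webcast text.
    frame_clocks: list of (frame_index, period, clock) tuples recognized from
        scoreboard frames (assumed input format).
    Returns {frame_index: [description, ...]}.
    """
    frames_by_time = defaultdict(list)
    for frame_index, period, clock in frame_clocks:
        frames_by_time[(period, clock)].append(frame_index)

    annotations = defaultdict(list)
    for period, clock, description in webcast_events:
        # Every frame showing this clock value is a match for the event.
        for frame_index in frames_by_time.get((period, clock), []):
            annotations[frame_index].append(description)
    return dict(annotations)


# Toy data for illustration only.
frame_clocks = [(100, 1, "11:42"), (101, 1, "11:42"), (130, 1, "11:41")]
webcast_events = [
    (1, "11:42", "David West misses jumper"),
    (1, "10:05", "Trevor Ariza makes 19-foot jumper"),
]
print(align(webcast_events, frame_clocks))
```

In this toy run, frames 100 and 101 both receive the first description, while the second event finds no frame with a matching clock and is left unannotated.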

4.2.1 Video Frames Partition

As can be seen from Fig. 2.1, all frames in basketball videos can be broadly classified into two categories: scoreboard frames and non-scoreboard frames. Scoreboard frames present the basketball game with the scoreboard overlaid on them, while non-scoreboard frames present the rest, e.g., sideline interviews and slow-motion replays. Since semantic events appear only in scoreboard frames, it is beneficial to filter out frames that need no processing in each semantic resource extraction step. So, an automatic scoreboard template extractor is needed.
