Advances in video production technology and growing consumer demand have led to an ever-increasing volume of multimedia data. The rapid evolution of digital equipment allows general users to archive multimedia data much more easily. The explosive proliferation of multimedia data in education, entertainment, sports and other applications makes manual indexing and annotation no longer practical. The development of practical systems and tools for multimedia content analysis, understanding, indexing and retrieval is therefore compelling [1, 4, 6, 72, 73, 74].

A large number of content retrieval approaches have been proposed on the basis of low-level features. However, humans interpret video in terms of semantics rather than low-level features. Automatic video understanding and interpretation therefore requires mid-level representations that map low-level features to high-level semantics, such as shot class, camera motion pattern, color layout, object shape and object trajectory.

In particular, object trajectory is one of the most informative representations, and one that humans frequently use to analyze events. Hence, trajectory-based approaches have been gaining popularity [8, 15, 34, 35].

As important multimedia content, sports video has attracted considerable research effort owing to its commercial benefits, entertainment value and large audience base [1, 2, 22, 25, 30, 35, 74]. In this thesis, we take sports video as the source material for our research.

Techniques for event detection, content understanding and sports information retrieval are proposed for automatic annotation and enriched visual presentation. Most viewers prefer retrieving designated events, scenes and players to watching a whole game sequentially. Therefore, various algorithms have been developed for shot classification, highlight extraction and semantic annotation based on the fusion of audiovisual features and game-specific rules.

In this thesis, we focus on feature integration and algorithm development for sports video content analysis and understanding. Sports information retrieval, tactics analysis and enriched visual presentation can give the audience and professionals further insight into the games. Fig. 1-1 depicts the overview of our research work. We first extract low-level features adapted to the different event detection tasks so as to infer high-level semantic information. Then, various mid-level representations are computed to bridge the gap between low-level features and semantic content. Since significant events are mainly caused by the interaction of moving objects, object trajectories carry much semantic information that contributes to content understanding. Thus, we design several trajectory-based algorithms for sports video content analysis and understanding.

Fig. 1-1. Overview of our research work.

For semantic and tactical content analysis in sports video, we first propose an effective and efficient ball tracking algorithm. In video processing, object tracking usually serves as the means of converting low-level features into high-level events. Object tracking remains an arduous problem despite its long research history, and ball tracking is an even more challenging task because the ball is small and moves fast. It is almost impossible to distinguish the ball within a single frame, so information over successive frames, e.g. motion information, is required to discriminate the ball from other objects. In several kinds of ball games, the ball moves according to the physical characteristic that its trajectory forms a (near-) parabolic curve. Examples include the ball shot toward the basket in a basketball game, the ball passed between players in a volleyball game, the ball moving between players in a tennis game and the pitched ball in a baseball game. Utilizing the physical characteristics of ball motion, we present a physics-based ball tracking method to compute the 2D ball trajectory in different kinds of single camera sports videos.
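The (near-) parabolic property can be illustrated with a minimal sketch. It assumes that, over a short flight, the horizontal image coordinate of the ball is roughly linear in time and the vertical image coordinate is roughly quadratic in time, which ignores perspective distortion. The function names `fit_ball_trajectory` and `trajectory_residual` are hypothetical, and this least-squares fit is only one possible way to score a chain of ball candidates, not necessarily the procedure developed in the later chapters.

```python
import numpy as np


def fit_ball_trajectory(t, x, y):
    """Least-squares fit of x(t) = a*t + b and y(t) = c*t^2 + d*t + e.

    t, x, y are equal-length sequences: frame times and the image
    coordinates of the ball candidates in those frames.
    """
    t = np.asarray(t, dtype=float)
    x_coef = np.polyfit(t, np.asarray(x, dtype=float), 1)  # [a, b]
    y_coef = np.polyfit(t, np.asarray(y, dtype=float), 2)  # [c, d, e]
    return x_coef, y_coef


def trajectory_residual(t, x, y, x_coef, y_coef):
    """Root-mean-square distance (pixels) between candidates and the fit."""
    t = np.asarray(t, dtype=float)
    dx = np.polyval(x_coef, t) - np.asarray(x, dtype=float)
    dy = np.polyval(y_coef, t) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean(dx ** 2 + dy ** 2)))
```

A candidate chain with a small residual is consistent with ball-like motion, while a chain with a large residual more likely belongs to another moving object or to noise. Any acceptance threshold would have to be tuned per video.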

To gain further insight into the games and retrieve more detailed sports information, we propose an innovative approach capable of reconstructing the 3D ball trajectory from single camera video for court sports. The 2D-to-3D inference is intrinsically challenging because 3D information is lost in the projection to 2D frames. For court sports, the court lines and feature objects are captured in the video frames. We utilize the domain knowledge of the court specifications to compute the transformation between 3D real world positions and 2D frame coordinates for camera calibration. By incorporating the physical characteristics of ball motion, we are able to recover the 3D information and reconstruct the 3D ball trajectory.
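One way to see why a physical motion model can compensate for the lost depth is the following sketch. It rests on several assumptions that are not stated in this section: a known 3x4 camera projection matrix `P` obtained from calibration, world units in metres with the Z axis pointing up, no air resistance, and a ball in free flight under gravity `g`. Under these assumptions each 2D observation yields two linear equations in the six unknown motion parameters, which can be solved by least squares. All function names are hypothetical.

```python
import numpy as np

G = 9.8  # gravitational acceleration in m/s^2 (assumed world units: metres)


def ball_positions(params, t, g=G):
    """3D positions of a ball in free flight; params = (X0, Vx, Y0, Vy, Z0, Vz)."""
    x0, vx, y0, vy, z0, vz = params
    t = np.asarray(t, dtype=float)
    return np.column_stack([x0 + vx * t,
                            y0 + vy * t,
                            z0 + vz * t - 0.5 * g * t ** 2])


def project(P, points):
    """Project (N, 3) world points to (N, 2) pixel coordinates with a 3x4 matrix P."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    proj = pts_h @ np.asarray(P, dtype=float).T
    return proj[:, :2] / proj[:, 2:3]


def reconstruct_3d_trajectory(P, t, uv, g=G):
    """Estimate (X0, Vx, Y0, Vy, Z0, Vz) from 2D observations uv at times t.

    From u = (P[0] . Xh) / (P[2] . Xh) it follows that (P[0] - u * P[2]) . Xh = 0,
    and likewise for v with P[1]. Substituting the motion model makes every such
    equation linear in the six unknowns.
    """
    P = np.asarray(P, dtype=float)
    rows, rhs = [], []
    for ti, (u, v) in zip(np.asarray(t, dtype=float), np.asarray(uv, dtype=float)):
        for coord, k in ((u, 0), (v, 1)):
            r = P[k] - coord * P[2]
            rows.append([r[0], r[0] * ti, r[1], r[1] * ti, r[2], r[2] * ti])
            # The known gravity term and the constant term move to the right-hand side.
            rhs.append(0.5 * g * ti ** 2 * r[2] - r[3])
    params, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return params
```

The known magnitude of gravity is what fixes the scale along the viewing ray in this sketch. With noisy detections the estimate degrades, so real footage would call for more observations and some form of outlier rejection.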

The obtained 2D trajectory and the reconstructed 3D trajectory enable manifold applications in sports information retrieval and computer-assisted game study. In basketball games, the shooting location (the location of a player shooting the ball) is one of the important game statistics, providing abundant information about the shooting tendency of a team. A statistical graph of shooting locations allows the coach to view the distribution of shooting locations at a glance and to quickly comprehend where the players are more likely to shoot. The players and the coach can then infer the offense tactics of an opposing team and adapt their own defense strategy accordingly. Presently, most shooting location logging is done manually. It is time-consuming and inefficient to watch a whole long video, take records and gather statistics. Thus, we propose a scheme to extract the shooting trajectory in basketball video, reconstruct the 3D trajectory and estimate the shooting location.

In volleyball games, players are not allowed to hold the ball. Hence, we detect ball-player interaction events by utilizing the positions and the times at which the trajectory changes direction. Moreover, the reconstructed 3D trajectory can provide sports information that the audience or professionals would like to know, such as set type, attack height, serve speed and serve placement. Much of the informative game statistical data that cannot be directly perceived by the human eye can now be derived from the 2D trajectory and the reconstructed 3D trajectory. Furthermore, the 3D virtual replay gives an exciting and practical visualization that enables watching the ball motion from any viewpoint.
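The idea of locating ball-player interactions from direction changes can be sketched as follows. This is an illustrative simplification rather than the detection rule used in the thesis: it flags a trajectory point when the angle between the incoming and outgoing displacement vectors exceeds a threshold. The function name `detect_direction_changes` and the 45 degree default are assumptions chosen for the example.

```python
import numpy as np


def detect_direction_changes(points, angle_threshold_deg=45.0):
    """Return indices of trajectory points where the motion direction turns sharply.

    points is a sequence of (x, y) ball positions in consecutive frames.
    Index i is reported when the angle between (p[i] - p[i-1]) and
    (p[i+1] - p[i]) exceeds angle_threshold_deg.
    """
    pts = np.asarray(points, dtype=float)
    steps = np.diff(pts, axis=0)  # steps[i] = pts[i + 1] - pts[i]
    events = []
    for i in range(1, len(steps)):
        incoming, outgoing = steps[i - 1], steps[i]
        n_in, n_out = np.linalg.norm(incoming), np.linalg.norm(outgoing)
        if n_in == 0.0 or n_out == 0.0:
            continue  # a stationary sample carries no direction information
        cos_angle = np.clip(np.dot(incoming, outgoing) / (n_in * n_out), -1.0, 1.0)
        if np.degrees(np.arccos(cos_angle)) > angle_threshold_deg:
            events.append(i)
    return events
```

During free flight gravity bends the path only gradually between consecutive frames, so a smooth parabola produces no events under this rule, whereas a touch by a player typically produces an abrupt turn. The frame index of each event gives the occurring time and the corresponding point gives the position.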

In baseball games, the ball speed and the curvature of the ball trajectory are two main factors in determining how difficult a pitched ball is to hit. Hence, we track the pitched ball and extract the ball trajectory, from which the ball speed and trajectory curvature can be computed for pitch analysis. Due to the capturing viewpoint and the frame rate constraint, the computed ball speed and trajectory curvature might not be very precise. The proposed pitch analysis is therefore intended not for grading, but for entertainment effects, enriched visual presentation and sports information retrieval. In addition to ball speed and trajectory curvature, the pitch location (the relative location of the ball in or around the strike zone when the ball passes by the batter) also strongly influences the direction in which the ball is batted. For example, a batter who swings at a lower pitch has a good chance of hitting a ground ball, while a batter who swings at a higher pitch has a great chance of hitting the ball in the air. Since the strike zone provides the reference for determining the pitch location, we propose a contour-based method to shape the strike zone according to the batter's stance. The strike/ball judgment can also be visualized on the video frames using the shaped strike zone. Besides the pitcher/batter confrontation, the ball motion and the defense process after the ball is batted into the field are another focus of attention. With the field specifications, we design algorithms to recognize the spatial patterns (field lines and field objects) in frames. Then, the active regions of event occurrence in the field are classified by the spatial patterns. We can infer the ball routing patterns and the defense process from the transitions of the active regions captured in the video. Content understanding and annotation can thus be achieved, providing rich information about the games.

Comprehensive experiments on basketball, volleyball and baseball videos show encouraging results. The proposed methods perform well in 2D ball tracking and 3D trajectory reconstruction from single camera video for different kinds of sports. It is our belief that the coach and players will be greatly assisted in game study with the semantic and tactical information derived from our proposed methods. Also, the audience can have a professional insight into the game.

In the following chapters, we explain the proposed methods and techniques in detail. The rest of the thesis is organized as follows. Chapter 2 explains physics-based ball tracking and 3D trajectory reconstruction with applications to shooting location estimation in basketball video. Chapter 3 describes ball tracking and 3D trajectory approximation with applications to tactics analysis in volleyball video. Chapter 4 elaborates on ball tracking, strike zone shaping and play region classification in baseball video. Finally, Chapter 5 concludes this thesis.

Chapter 2. Physics-Based Ball Tracking and 3D Trajectory Reconstruction
