Ball tracking and 3D trajectory approximation with applications to tactics analysis from single-camera volleyball sequences

Hua-Tsung Chen · Wen-Jiin Tsai · Suh-Yin Lee · Jen-Yu Yu

Published online: 21 June 2011

© Springer Science+Business Media, LLC 2011

Abstract Providing computer-assisted tactics analysis in sports is a growing trend. This paper presents an automatic system for ball tracking and 3D trajectory approximation from single-camera volleyball sequences and demonstrates several applications to tactics analysis. Ball tracking in volleyball video is highly complex because of the high density of players on the court and the frequent, complicated ball-player overlaps. The 2D-to-3D inference is intrinsically challenging due to the loss of 3D information in projection to 2D frames. To overcome these challenges, we propose a two-phase ball tracking algorithm in which we first detect ball candidates in each frame and then use them to compute the ball trajectories. With the aid of camera calibration, we exploit the physical characteristics of ball motion to approximate the 3D ball trajectory from the 2D trajectory. The visualization of the 3D trajectory and the applications to trajectory-based tactics analysis not only assist coaches and players in game study but also make game watching a whole new experience. The experiments on international volleyball games show encouraging results. We believe that the proposed framework can be extended and applied to various kinds of sports games.

Keywords Object tracking · Sports video analysis · Content-based multimedia analysis · Camera calibration · 3D trajectory approximation

DOI 10.1007/s11042-011-0833-y

H.-T. Chen (*)

Information and Communications Technology Lab, National Chiao Tung University, Hsinchu 300, Taiwan

e-mail: huatsung@cs.nctu.edu.tw

W.-J. Tsai · S.-Y. Lee

Department of Computer Science, National Chiao Tung University, Hsinchu 300, Taiwan

J.-Y. Yu

1 Introduction

The proliferation of multimedia data makes manual annotation of huge video databases no longer practical. This trend drives the development of automatic systems and tools for content-based multimedia information retrieval. Recently, sports video has been attracting considerable attention due to its potential commercial benefits and entertainment value. As the pace of life in the information society accelerates, most viewers want to retrieve the significant events or designated scenes and players rather than watch a whole game sequentially. Various algorithms for shot classification [7,8,16,23], highlight extraction [2,9,32] and semantic annotation [1,3,18] in sports video have been developed based on the combination of low-level visual/auditory features and game-specific rules. Furthermore, semantic content analysis of sports video requires ball/player tracking [4,5,22,27–29,31,33] to acquire the ball-player interaction and camera calibration [10,11,24,29,30] to obtain the ball/player positions in real-world coordinates.

Most existing work in sports video analysis is audience-oriented. However, the coaches and sports professionals desire to acquire tactical, statistical and professional information in game watching. Traditional interactive video viewing systems which provide quick browsing, indexing and summarization of sports video no longer fulfill their requirements. The professionals prefer better understanding of the tactic patterns and statistical data so that they are able to improve performance and better adapt the operational policy during the game. To achieve this purpose, the current trend is to employ some personnel for game annotation, match recording, tactics analysis and statistics collection. However, it is obviously time-consuming and labor-intensive. Hence, automatic tactics analysis and statistics collection in sports games are undoubtedly compelling.

Although more and more research in sports video processing concentrates on ball tracking and trajectory-based tactics analysis, the majority of existing work focuses on tennis and soccer video [27–29,33]. Little work has been done on volleyball video, even though volleyball games attract a large audience. Besides, it is very challenging to track the ball in volleyball video due to the high density of players on the court and the frequent ball-player overlaps. Hence, ball tracking and 3D trajectory approximation in volleyball video are worth in-depth investigation. In this paper, we develop an automatic system called VIA (Volleyball Intelligence Agent), which performs 2D ball tracking and 3D trajectory approximation from single-view video sequences, captured by a fixed camera located behind the court, for tactics analysis in volleyball games.

Generally speaking, not all sports games are broadcast on TV. With the rapid evolution of digital equipment, general users are able to capture multimedia data more easily. It is common nowadays for sports professionals to set up a camera to capture video sequences of the games they are interested in for game strategy study. Visual content analysis is no longer confined to broadcast video, and content analysis of user-generated multimedia data has become another burgeoning and critical issue [12, 17, 19]. This trend necessitates the development of computer-assisted game study systems for user-captured sports video.

The rest of the paper is organized as follows. Section 2 introduces the related work on sports video analysis. Section 3 gives an overview of the proposed VIA system. The processes of audio event detection, camera calibration and 2D ball trajectory extraction are explained in Sections 4, 5 and 6, respectively. Section 7 elaborates on 3D trajectory approximation. Section 8 presents the trajectory-based applications to tactics analysis. Section 9 reports and discusses the experimental results. Finally, Section 10 concludes this paper.

2 Related work

2.1 Related work on camera calibration

Semantic analysis of sport video requires camera calibration to convert 2D positions in the video frame to 3D real world coordinates or vice versa. Various camera calibration methods are based on planar reference objects [10,11,24]. These plane-based calibration techniques require feature points on a plane appearing in different views. Farin et al. [10,11] propose a camera calibration algorithm for court sports. They start with identifying the court-line pixels by exploiting the constraints of color and local texture, and then detect the court lines by the Hough transform. The intersection points of the court lines are extracted as the feature points to compute the camera projection matrix via solving a set of linear equations. For the subsequent frames, a model tracking mechanism is used to predict the camera parameters from the previous frame. Watanabe et al. [24] propose a soccer field tracking method, which extracts the field lines, defines a wire frame model based on the official layout of the soccer field lines, and finally tracks where the field area corresponds to in the soccer field by utilizing the camera parameters computed via matching the wire frame model with the extracted field lines.

Yu et al. [29,30] propose a non-plane-based camera calibration method for tennis video. They approximate the projection geometry by a perspective projection model mapping from the 3D world to the 2D image. Two techniques are used: frame grouping and Hough-like search. The grouping technique clusters frames according to camera viewpoint. Then, a group-wise data analysis is used to obtain stable camera parameters. However, some of the parameters vary even among frames with similar camera viewpoints, so a Hough-like search is used to tune these parameters.

2.2 Related work on ball/player tracking

Since significant events are mainly caused by ball-player and player-player interactions, balls and players are the most frequently tracked objects in sports video. Yu et al. [27,28,31] present a trajectory-based algorithm for ball detection and tracking in soccer video. The ball size is first estimated from feature objects (the goalmouth and ellipse) to detect ball candidates. Potential trajectories are generated from ball candidates by a Kalman-filter-based verification procedure. Camera motion recovery helps in obtaining better candidates and forming longer ball trajectories. The true ball trajectories are finally selected from the potential trajectories according to a confidence index, which indicates the likelihood that a potential trajectory is a ball trajectory. Zhu et al. [33] analyze the temporal-spatial interaction among the ball and players to construct a tactic representation called aggregate trajectory based on multiple trajectories. The interactive relationship with play region information and hypothesis testing for trajectory temporal-spatial distribution are exploited to analyze the tactic patterns. Our previous work [4,5] performs physics-based ball tracking in broadcast sports video and provides trajectory-based applications, such as pitching evaluation in baseball and shooting location estimation in basketball.

Some work focuses on 3D trajectory reconstruction based on multiple cameras located at specific positions [13,20,21]. The Hawk-Eye system [20] produces computer-generated replays that can be viewed through 360°. 2D ball tracking is first performed on each of the specifically located cameras. These 2D trajectories are then sent to a 3D reconstruction module to construct the 3D trajectories, and impact points between separate trajectories (which can occur at a bounce or a strike) are determined. Finally, the complete track is visualized. The ESPN K-Zone system [13] extracts the trajectory of each pitch and uses computer-generated graphics to outline the strike zone boundaries. Two cameras linked to two PCs are used to observe the ball, and each PC extracts a 2D trajectory. The two pitch-tracking computers combine two 2D positions which correspond to the same time code into a 3D position. Then, the successive 3D positions are fed into a Kalman filter to determine the final trajectory. UIS (Umpire Information System) [21] uses multiple cameras to track each pitch and measure the batter's strike zone so as to support the strike/ball judgment. Although these systems perform well in ball tracking and 3D trajectory reconstruction, they are strongly limited in view angle and require multiple costly high-speed cameras. Moreover, the strict requirements on camera installation locations and visible areas confine these systems to studio-like sports fields. They are not practicable for general users.

Ball tracking in sports video confronts many difficulties [28]: the small ball size in frames, the varied ball appearance (shape, color, size), the presence of many ball-like objects, the occlusion of the ball (by a player) and the merging of the ball with lines or players. For soccer video, some previous algorithms [27,28,31] estimate the ball size, detect ball candidates by appearance features and extract the ball trajectory based on a Kalman filter. In tennis and baseball videos, there are fewer ball-player occlusions but the ball moves very fast. To achieve high accuracy, multiple high-speed cameras are required to track the ball [13,20,21]. In volleyball video, the high density of players on the court and the frequent ball-player occlusions make ball tracking much more challenging. In this paper, we utilize the characteristic that the volleyball moves nearly parabolically to model the ball motion. The positions of the ball occluded by players can thus be recovered, and the volleyball trajectory can be extracted accurately.

3 System overview

To achieve automatic tactics analysis in volleyball games, a system called VIA (Volleyball Intelligence Agent) is designed in this paper. Based on our previous work [26], which extracts 2D trajectories for set type recognition, VIA approximates and visualizes 3D ball trajectories, so that not only can trajectory-based game study be presented but game watching also becomes an entirely novel experience. The system framework is illustrated in Fig. 1.

Whistles directly determine the start and end of each play in volleyball games. Thus, VIA starts with whistle detection to segment the game into plays. Moreover, VIA also detects the sounds of attacks for event indexing. For video frames, VIA first performs camera calibration by finding non-coplanar feature points to compute the projection matrix mapping 3D real-world coordinates to 2D image positions. Since we use a fixed camera for video capturing and there is no camera motion, VIA performs camera calibration and projection matrix computation once per game. For 2D ball trajectory extraction, ball candidates are detected in each frame by the constraints of size, shape and compactness. However, it is almost impossible to distinguish the ball within a single frame, so VIA correlates information on the ball candidates over a sequence of frames, explores potential trajectories and identifies the true ball trajectories. To approximate the 3D trajectory, we model it with two sets of parameters, the velocities and the initial positions, based on the physical characteristics of ball motion, so that the 3D ball positions over frames can be represented by equations. The projection matrix computed in camera calibration is then used to map the equation-represented 3D ball positions to the 2D ball coordinates in frames. With the 2D coordinates of the extracted ball candidates being known, we can compute the parameters of the 3D motion equations and approximate the 3D ball trajectory. Finally, VIA is able to present trajectory-based applications to tactics analysis.

Fig. 1 Block diagram of the proposed VIA system. Blocks: Camera (audio stream, video frames); 1. Audio Event Detection (whistle, attack; play boundary, event index); 2. Camera Calibration (feature point finding, projection matrix computation); 3. 2D Ball Trajectory Extraction (ball candidate detection, trajectory generation); 4. 3D Trajectory Approximation; Applications. Labeled data: ball diameter, ball positions over frames.

The novelty and contribution of this paper are summarized as follows. The problem of 2D-to-3D inference is intrinsically challenging due to the loss of 3D information in projection to 2D frames during image capture. Incorporating the physical motion information and domain knowledge, we propose the first approach capable of 3D trajectory approximation without the need for multiple cameras, based on the 2D ball tracking method of our previous work [26]. Moreover, based on the obtained 2D and 3D ball trajectories, we design several novel applications which meet professionals' demands, including action detection, set type recognition, 3D virtual replay and serve placement estimation. Trajectory-based tactics analysis and statistics collection can be accomplished without manual effort to support coaches and professionals in game study and performance evaluation. The variety of trajectory-based game analyses also gives the audience a brand-new experience of game watching.

4 Audio event detection

Several significant events, such as whistles and attacks, are difficult to detect from visual features but can be directly traced through audio features. ZCR (Zero Crossing Rate), which counts the number of times that the signal crosses the zero axis, is a simple measurement of the frequency content of a signal. Since a whistle has a higher frequency than other sounds in the game, ZCR is a distinguishing and easy-to-compute feature for whistle detection [25,26]. Thus, we perform whistle detection by picking the ZCR peaks which are greater than a threshold $Z_{ZCR}$.

Attack detection plays an important role in event indexing. The sound of an attack is a transient signal of very short duration. By analyzing the STE (Short-Time Energy) [34], a peak can be observed when the attack occurs. Thus, we perform attack detection by picking the STE peaks which are greater than a threshold $Z_{STE}$.
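The two audio features can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the frame length, the non-overlapping framing, the local-maximum peak rule and all function names are assumptions introduced here, and the thresholds would have to be tuned on real recordings.

```python
from typing import List, Sequence


def split_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Cut the signal into consecutive, non-overlapping analysis frames (assumed framing)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def zero_crossing_rate(frame: Sequence[float]) -> int:
    """Count sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared sample amplitudes of one frame."""
    return sum(s * s for s in frame)


def peaks_above(values: Sequence[float], threshold: float) -> List[int]:
    """Indices of local maxima whose value exceeds the threshold (assumed peak rule)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > threshold
            and values[i] >= values[i - 1]
            and values[i] > values[i + 1]]
```

Whistle frames would then be `peaks_above(zcr_per_frame, z_zcr)` and attack frames `peaks_above(ste_per_frame, z_ste)`, where `z_zcr` and `z_ste` stand for the two thresholds named in the text.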

5 Camera calibration

In the pinhole camera model, a camera is a mapping from the 3D real world to the 2D image space [15]. The real-world point $(X, Y, Z)^T$ in 3D space is represented as a homogeneous 4-vector $\mathbf{W} = (X, Y, Z, 1)^T$ by adding a final coordinate of 1, the image point $(x, y)^T$ in 2D space is represented as a homogeneous 3-vector $\mathbf{m} = (x, y, 1)^T$ in the same way, and $P$ represents the 3×4 homogeneous camera projection matrix (for the homogeneous representation of points, see p. 27 in [15]). Since $\mathbf{m}$ and $P\mathbf{W}$ are equal up to scale, the mapping from the 3D real world to the 2D image is written compactly as

$$\mathbf{m} \times P\mathbf{W} = \mathbf{0}. \qquad (1)$$

To compute the camera projection matrix, we need a set of corresponding points, i.e., points whose coordinates are known both in the 3D real world and in the 2D image. We first segment the court region consisting of the court lines $L_1$ to $L_7$ (see Fig. 2) using the dominant color feature computed via the color histogram. The court lines are detected by the Hough transform. Then, we obtain the coordinates of the ground feature points $g_1$ to $g_{10}$ by computing the intersections of the court lines. In addition to the ground feature points, the computation of the camera matrix requires non-coplanar feature points. Thus, we trace vertically from the ground points $g_5$ and $g_6$ in the image and search for the two vertical borders of the net using the Hough transform. The endpoints of the vertical borders of the net ($g_{11}$ to $g_{14}$), together with the ground feature points, form a non-coplanar feature point set.

Here, we briefly describe the methods of solving for the camera projection matrix P; the details of these methods are available in [15]. For each correspondence W↔ m we derive a relationship from Eq. (1):
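In the standard DLT derivation of [15], and consistently with the form of Eq. (9), this relationship can be written as below. The notation is an assumption made here: $P^i$ denotes the i-th row of $P$ written as a column 4-vector, and $\mathbf{m} = (x, y, 1)^T$.

$$
\begin{bmatrix}
\mathbf{0}^T & -\mathbf{W}^T & y\,\mathbf{W}^T \\
\mathbf{W}^T & \mathbf{0}^T & -x\,\mathbf{W}^T \\
-y\,\mathbf{W}^T & x\,\mathbf{W}^T & \mathbf{0}^T
\end{bmatrix}
\begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = \mathbf{0}
$$

Only two of the three rows are linearly independent (the third is $-x$ times the first minus $y$ times the second), so each correspondence contributes two equations:

$$
\begin{bmatrix}
\mathbf{0}^T & -\mathbf{W}^T & y\,\mathbf{W}^T \\
\mathbf{W}^T & \mathbf{0}^T & -x\,\mathbf{W}^T
\end{bmatrix}
\begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = \mathbf{0}
$$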

[Fig. 2: court lines L1 to L7, feature points g1 to g14 and real-world axes]

The matrix P has 12 entries, and (ignoring scale) 11 degrees of freedom, so it is necessary to have at least 11 equations to solve for P. With the 14 non-coplanar point correspondences obtained, we can solve for P using the direct linear transform (DLT) algorithm (see p109, p178–184 in [15]).
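A basic DLT solve can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the calibration code of the system: it omits the data normalization recommended in [15], the function name is hypothetical, and the result is rescaled so that $p_{34} = 1$, which assumes $p_{34} \neq 0$.

```python
import numpy as np


def dlt_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix from >= 6 non-coplanar 3D-2D correspondences.

    world_pts: iterable of (X, Y, Z); image_pts: iterable of (x, y).
    Each correspondence contributes the two independent rows of m x PW = 0.
    """
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        W = [X, Y, Z, 1.0]
        zero = [0.0] * 4
        rows.append(zero + [-w for w in W] + [y * w for w in W])
        rows.append(W + zero + [-x * w for w in W])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    return P / P[2, 3]  # fix the free scale (assumes p34 != 0)
```

With the 14 feature points of the text, `world_pts` would hold the court and net coordinates taken from the volleyball rules and `image_pts` the detected positions of $g_1$ to $g_{14}$.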

6 2D ball trajectory extraction

It is a challenging task to identify the ball in frames due to its small size and fast movement. We have proposed an effective method of 2D ball trajectory extraction in [6]. In this section, we summarize the processes of 2D trajectory extraction.

6.1 Ball candidate detection

Since we focus on analyzing video sequences captured by a fixed camera and there is no camera motion, it is sufficient to detect moving pixels by differencing successive frames for moving object segmentation. The morphological opening operation (erosion followed by dilation) with a 3×3 square structuring element is performed to remove noise. Please refer to Sections 9.2 and 9.3 in [14] for more details about morphological operations. Then, moving objects are formed by region growing, which iterates the following procedure until all moving pixels are assigned to regions: start with a seed (a moving pixel which is not yet assigned to a region) and grow the region by appending the neighboring moving pixels to the seed. However, the ball is not the only moving object; the audience and the players move as well. Therefore, we design the following sieves to prune non-ball objects so as to improve the efficiency of the subsequent ball tracking step. The remaining objects which satisfy the constraints of size, shape and compactness are considered as the ball candidates.
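The frame differencing and region growing steps can be sketched as follows. The sketch is pure Python on small nested lists for clarity; the difference threshold, the 8-connected neighborhood and the function names are assumptions made for illustration, and the morphological opening step is omitted.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def motion_mask(prev: List[List[int]], curr: List[List[int]], thresh: int) -> List[List[int]]:
    """Mark a pixel as moving (1) when its gray-level change exceeds thresh (assumed rule)."""
    return [[1 if abs(c - p) > thresh else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def grow_regions(mask: List[List[int]]) -> List[Set[Pixel]]:
    """Group moving pixels into regions: pick an unassigned seed, then append its neighbors."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    assigned = [[False] * w for _ in range(h)]
    regions: List[Set[Pixel]] = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or assigned[r][c]:
                continue
            region: Set[Pixel] = set()
            queue = deque([(r, c)])
            assigned[r][c] = True
            while queue:
                pr, pc = queue.popleft()
                region.add((pr, pc))
                for dr in (-1, 0, 1):          # 8-connected neighborhood (assumption)
                    for dc in (-1, 0, 1):
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and mask[nr][nc] and not assigned[nr][nc]):
                            assigned[nr][nc] = True
                            queue.append((nr, nc))
            regions.append(region)
    return regions
```

Each returned region is a set of pixel coordinates from which the area and bounding box needed by the sieves can be derived.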

1) Size sieve: The in-frame ball diameter $D_{frm}$ can be proportionally estimated from the length between the court line intersections by the pinhole camera imaging principle:

$$D_{frm}/D_{real} = d/D, \qquad D_{frm} = D_{real}\,(d/D) \qquad (4)$$

where $D_{real}$ is the diameter of a real volleyball (≈ 21 cm), and $d$ and $D$ are the in-frame length and the real-world length of a corresponding line segment, respectively. To compute the ratio $d/D$, we select the two court line intersections closest to the frame center and calculate their in-frame distance $d$. Since the distance $D$ between the two points on the real court is specified in the volleyball rules, the ratio $d/D$ can be computed. Thus, the planar ball size in the frame can be estimated as $\pi (D_{frm}/2)^2$. The size sieve filters out the objects whose sizes are not within the range $[\pi (D_{frm}/2)^2 - \Delta,\ \pi (D_{frm}/2)^2 + \Delta]$, where $\Delta$ is the tolerance for processing faults. In our experiments, the video resolution is 352×240, and we set $\Delta = 10$ empirically.

2) Shape sieve: The ball in frames may have a shape different from a circle, but its height-to-width ratio $R_{h-w}$ and width-to-height ratio $R_{w-h}$ should be close to 1. Let $R_a = \max(R_{h-w}, R_{w-h})$. An object is removed if its $R_a$ is greater than a threshold $Z_{Ra}$.

3) Compactness sieve: The compactness sieve filters out the objects whose compactness degree CD, as defined in Eq. (5), is less than a threshold $Z_{CD}$.
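The three sieves can be sketched as a single predicate. Several parts are assumptions: the `Blob` type and function names are hypothetical, the default thresholds `max_ratio` and `min_compactness` are placeholder values, and the compactness measure used here (region area divided by bounding-box area) is only a stand-in, since Eq. (5) is not reproduced in this text.

```python
import math
from dataclasses import dataclass


@dataclass
class Blob:
    area: int    # number of pixels in the moving region
    width: int   # bounding-box width in pixels (>= 1)
    height: int  # bounding-box height in pixels (>= 1)


def in_frame_ball_diameter(d_real: float, d: float, D: float) -> float:
    """Eq. (4): D_frm = D_real * (d / D)."""
    return d_real * (d / D)


def passes_sieves(blob: Blob, d_frm: float, delta: float = 10.0,
                  max_ratio: float = 1.5, min_compactness: float = 0.5) -> bool:
    """Return True if the blob survives the size, shape and compactness sieves."""
    expected_area = math.pi * (d_frm / 2.0) ** 2
    if abs(blob.area - expected_area) > delta:             # size sieve
        return False
    r_a = max(blob.height / blob.width, blob.width / blob.height)
    if r_a > max_ratio:                                     # shape sieve
        return False
    # ASSUMED compactness: fill ratio of the bounding box, not the paper's Eq. (5).
    compactness = blob.area / (blob.width * blob.height)
    return compactness >= min_compactness                   # compactness sieve
```

For example, with a reference segment of 900 cm that spans 300 pixels, `in_frame_ball_diameter(21.0, 300.0, 900.0)` gives an in-frame diameter of about 7 pixels.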

The ball is at a distance from other moving objects in most frames. Thus, the ball candidates close to other moving objects might be over-segmented regions of players. To improve the accuracy of ball tracking, ball candidates are classified as isolated or contacted candidates according to their nearest objects. A candidate is classified as isolated if there is no neighboring object within a distance of $D_{frm}$ (the in-frame ball diameter), and as contacted otherwise.
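The isolated/contacted rule can be sketched as below. The text does not say how the distance between objects is measured, so this sketch assumes distances between object centroids; the function name and the convention that the object list may include the candidate itself are also assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) centroid in pixels


def classify_candidates(candidates: Sequence[Point],
                        moving_objects: Sequence[Point],
                        d_frm: float) -> List[str]:
    """Label each candidate 'isolated' or 'contacted' by centroid distance (assumed metric)."""
    labels = []
    for c in candidates:
        # Objects at distance 0 are treated as the candidate itself and skipped.
        near = any(0.0 < math.dist(c, o) <= d_frm for o in moving_objects)
        labels.append("contacted" if near else "isolated")
    return labels
```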

6.2 Potential trajectory exploration

It is very difficult to identify the ball from the ball candidates within a single frame. Therefore, motion information over successive frames is required to discriminate the ball from other moving objects. To visualize the motion of ball candidates, we plot the y- and x-coordinates of the ball candidates over time (indexed by the frame serial number n), called the y-time plot (YTP) and the x-time plot (XTP), respectively. An example of YTP and XTP is shown in Fig. 3a, where black dots and green crosses represent the detected isolated candidates and contacted candidates, respectively.

To acquire the characteristics of ball motion, we make YTPs and XTPs for 30 volleyball clips by manually locating the ball positions over frames. Three examples are shown in Fig. 4, where the vertical lines indicate the ball-player interactions. We observe that the ball moves nearly parabolically in the y-direction and nearly in a straight line in the x-direction between a pair of successive ball-player interactions. Therefore, we design an algorithm to explore a sequence of ball candidates which simultaneously form a near-parabolic curve in the YTP and a near-straight line in the XTP as a potential trajectory.

Fig. 3 y-time plot (YTP) and x-time plot (XTP). (a) Plotting the y- and x-coordinates of the ball candidates over time (indexed by the frame serial number n). Black dots represent isolated candidates and green crosses represent contacted candidates. (b) Potential trajectories: the sequences of the linked ball candidates in YTP and XTP. (c) Integrated trajectory.

Figure 5 shows the algorithm of potential trajectory exploration. Initially, each ball candidate c is linked to the nearest candidate c' in the previous frame if the distance between c and c' is smaller than $D_{frm}$ (the in-frame ball diameter). A growing trajectory is formed when the number of linked ball candidates reaches three (three points determine a parabolic curve), and the prediction functions are defined as Eq. (6) and Eq. (7).

$$y = a_1 n^2 + a_2 n + a_3, \quad a_1 < 0, \quad n:\ \text{frame serial number} \qquad (6)$$

$$x = b_1 n + b_2 \qquad (7)$$

Then, the algorithm verifies the ball position prediction of each growing trajectory. The prediction is considered matched if the distance between a ball candidate and the predicted position is smaller than $D_{frm}$. A growing trajectory T extends by adding the ball candidate which matches the predicted position, and the prediction functions are updated by re-computing the best-fitting functions for the coordinates of the ball candidates in T using least squares fitting. For a growing trajectory, if no candidate matches the predicted position, the ball is considered missed and the predicted position is taken as the ball position. A growing trajectory T is finalized as a potential trajectory if the ball is missed for $Z_f$ consecutive frames. A ball candidate added to no growing trajectory is linked to the nearest candidate in the previous frame to form a new growing trajectory. The potential trajectories produced by this procedure are shown as the sequences of the linked ball candidates in YTP and XTP in Fig. 3b.
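The prediction and matching step can be sketched with NumPy's polynomial fitting. This is a minimal sketch assuming `numpy.polyfit` as the least-squares fitter; the function names are hypothetical, and the constraint $a_1 < 0$ of Eq. (6) is not enforced here.

```python
import math
import numpy as np


def fit_prediction(frames, xs, ys):
    """Least-squares fit of Eq. (6) and Eq. (7) to the candidates of a growing trajectory.

    Returns (a, b) with a = [a1, a2, a3] for y(n) and b = [b1, b2] for x(n).
    """
    a = np.polyfit(frames, ys, 2)  # parabola in the y-time plot
    b = np.polyfit(frames, xs, 1)  # straight line in the x-time plot
    return a, b


def predict(a, b, n):
    """Predicted (x, y) ball position at frame n."""
    return float(np.polyval(b, n)), float(np.polyval(a, n))


def matches(candidate, predicted, d_frm):
    """A candidate matches when it lies closer than D_frm to the predicted position."""
    return math.hypot(candidate[0] - predicted[0], candidate[1] - predicted[1]) < d_frm
```

After each matched candidate is appended, calling `fit_prediction` again on the extended trajectory corresponds to the update of the prediction functions described above.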

6.3 Trajectory identification and integration

For each potential trajectory T, we define its confidence points to measure how likely T is to be a ball trajectory. A potential trajectory T gains one confidence point for each of the following criteria it satisfies: 1) the trajectory length is greater than a threshold $Z_{TL}$, 2) the percentage of isolated candidates in the trajectory is greater than a threshold $Z_{IC}$, and 3) the prediction error (defined as the average of the distances from the ball candidate positions to the predicted positions) is less than a threshold $Z_{PE}$. Obviously, we want ball trajectories to have high confidence points. However, we cannot simply keep selecting the potential trajectory $T_h$ with the highest confidence points and discarding the trajectories which overlap with $T_h$, because a ball trajectory may be removed wrongly if the adjacent trajectory is over-extended due to a nearby spurious candidate. Hence, we first select the potential trajectories with 3 confidence points as the true trajectories (we say that these trajectories are identified). For two overlapping trajectories, we compute the trajectory intersection, trim the portion after the intersection in the former trajectory and trim the portion before the intersection in the latter. Then, we iteratively select the longest 2-point potential trajectories which do not overlap with identified trajectories until all 2-point potential trajectories are processed. Finally, the gaps between two successive identified trajectories can be patched by extending these two trajectories based on their respective prediction functions, as shown in Fig. 3c. Thus, the ball positions can be estimated even though the ball is temporarily occluded.

[Fig. 4: YTPs (y versus n) and XTPs (x versus n) of manually located ball positions. (a) Example 1 (b) Example 2 (c) Example 3]

Fig. 5 Algorithm of potential trajectory exploration

Input: BC, the set of detected ball candidates
Output: S, the set of potential trajectories

Initialization for growing trajectories;
for each frame f {
    for each ball candidate c in f {
        for each growing trajectory T {
            if (c matches the ball position predicted by T) {
                Add c to T;
                Update the prediction functions of T;
            }
        }
        if (c is not added to any growing trajectory and f is not the first frame) {
            Find in the previous frame the ball candidate c' closest to c;
            if (distance(c, c') < Dfrm) {
                Link c to c';
                if (# of linked ball candidates reaches 3) {
                    if (the 3 linked ball candidates form a line in XTP) {
                        Initialize a growing trajectory T;
                        Initialize the prediction functions of T;
                    }
                    else
                        Remove the first ball candidate from the link;
                }
            }
        }
    }
    for each growing trajectory T {
        if (no ball candidate matches the predicted position) {
            The ball is considered missed;
            The predicted position is taken as the ball position;
            Set Zf = half the number of the ball candidates in T;
            if (the ball is missed for Zf consecutive frames)
                Move T into S;
        }
    }
}
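The three-criterion scoring can be sketched as a small function. The `PotentialTrajectory` type and the function name are hypothetical, and the threshold values are parameters because the text does not state them at this point.

```python
from dataclasses import dataclass


@dataclass
class PotentialTrajectory:
    length: int               # number of frames covered by the trajectory
    isolated_fraction: float  # share of isolated candidates, in [0, 1]
    prediction_error: float   # mean distance between candidates and predictions (pixels)


def confidence_points(t: PotentialTrajectory, z_tl: float, z_ic: float, z_pe: float) -> int:
    """One point per satisfied criterion: length, isolated share, prediction error."""
    return (int(t.length > z_tl)
            + int(t.isolated_fraction > z_ic)
            + int(t.prediction_error < z_pe))
```

Trajectories scoring 3 would be identified first, and the 2-point trajectories would then be considered from longest to shortest, as described above.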

7 3D trajectory approximation

In volleyball games, the ball trajectory comprises a sequence of near-parabolic curves, even though many factors affect the ball motion, such as velocity, gravitational constant, spin axis, spin rate and air friction. We call each near-parabolic curve in the ball trajectory a sub-trajectory and roughly model a 3D sub-trajectory as:

$$
\begin{aligned}
X_t &= X_0 + V_X t \\
Y_t &= Y_0 + V_Y t \\
Z_t &= Z_0 + V_Z t + g t^2/2
\end{aligned}
\qquad (8)
$$

where $(X_t, Y_t, Z_t)$ is the 3D ball coordinate at time t, $(X_0, Y_0, Z_0)$ is the 3D ball coordinate of the starting position of the sub-trajectory, $(V_X, V_Y, V_Z)$ is the 3D ball velocity and g is the gravitational constant.

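We use $\mathbf{W}_t = (X_t, Y_t, Z_t, 1)^T = (X_0 + V_X t,\ Y_0 + V_Y t,\ Z_0 + V_Z t + g t^2/2,\ 1)^T$ and $\mathbf{m}_t = (x_t, y_t, 1)^T$. From Eq. (3), each point correspondence gives two equations as

$$
\begin{bmatrix}
\mathbf{0}^T & -\mathbf{W}_t^T & y_t \mathbf{W}_t^T \\
\mathbf{W}_t^T & \mathbf{0}^T & -x_t \mathbf{W}_t^T
\end{bmatrix}
\begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = \mathbf{0}. \qquad (9)
$$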

Given N detected ball candidates, we obtain 2N equations. Since the entries of P (denoted as $p_{ij}$), the 2D image coordinate $(x_t, y_t)$ and the occurring time t of each ball candidate are known, we can set up a linear system, Eq. (10), to compute the six unknowns $X_0$, $Y_0$, $Z_0$, $V_X$, $V_Y$ and $V_Z$. Here P is scaled so that $p_{34} = 1$, which is why 1 appears in place of $p_{34}$ on the right-hand side. We solve for $(X_0, V_X, Y_0, V_Y, Z_0, V_Z)^T$ using the direct linear transform algorithm (see p. 109 in [15]). Thus, each 3D sub-trajectory can be approximated.

$$
\begin{bmatrix}
p_{11}-x_1p_{31} & p_{11}t_1-x_1p_{31}t_1 & p_{12}-x_1p_{32} & p_{12}t_1-x_1p_{32}t_1 & p_{13}-x_1p_{33} & p_{13}t_1-x_1p_{33}t_1 \\
p_{21}-y_1p_{31} & p_{21}t_1-y_1p_{31}t_1 & p_{22}-y_1p_{32} & p_{22}t_1-y_1p_{32}t_1 & p_{23}-y_1p_{33} & p_{23}t_1-y_1p_{33}t_1 \\
p_{11}-x_2p_{31} & p_{11}t_2-x_2p_{31}t_2 & p_{12}-x_2p_{32} & p_{12}t_2-x_2p_{32}t_2 & p_{13}-x_2p_{33} & p_{13}t_2-x_2p_{33}t_2 \\
p_{21}-y_2p_{31} & p_{21}t_2-y_2p_{31}t_2 & p_{22}-y_2p_{32} & p_{22}t_2-y_2p_{32}t_2 & p_{23}-y_2p_{33} & p_{23}t_2-y_2p_{33}t_2 \\
\vdots & & & & & \\
p_{11}-x_Np_{31} & p_{11}t_N-x_Np_{31}t_N & p_{12}-x_Np_{32} & p_{12}t_N-x_Np_{32}t_N & p_{13}-x_Np_{33} & p_{13}t_N-x_Np_{33}t_N \\
p_{21}-y_Np_{31} & p_{21}t_N-y_Np_{31}t_N & p_{22}-y_Np_{32} & p_{22}t_N-y_Np_{32}t_N & p_{23}-y_Np_{33} & p_{23}t_N-y_Np_{33}t_N
\end{bmatrix}
\begin{bmatrix} X_0 \\ V_X \\ Y_0 \\ V_Y \\ Z_0 \\ V_Z \end{bmatrix}
=
\begin{bmatrix}
x_1(p_{33}gt_1^2/2+1)-(p_{13}gt_1^2/2+p_{14}) \\
y_1(p_{33}gt_1^2/2+1)-(p_{23}gt_1^2/2+p_{24}) \\
x_2(p_{33}gt_2^2/2+1)-(p_{13}gt_2^2/2+p_{14}) \\
y_2(p_{33}gt_2^2/2+1)-(p_{23}gt_2^2/2+p_{24}) \\
\vdots \\
x_N(p_{33}gt_N^2/2+1)-(p_{13}gt_N^2/2+p_{14}) \\
y_N(p_{33}gt_N^2/2+1)-(p_{23}gt_N^2/2+p_{24})
\end{bmatrix}
\qquad (10)
$$

However, a problem arises. Since each 3D sub-trajectory is approximated independently, the 3D coordinate of the transition point between a pair of adjacent sub-trajectories computed from the preceding sub-trajectory is not always consistent with the one computed from the succeeding sub-trajectory. To overcome this problem, we enhance the algorithm by taking two adjacent sub-trajectories into consideration simultaneously.
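The independent solve of a single sub-trajectory, Eq. (10), can be sketched as follows. The sketch makes several assumptions: it uses NumPy's least-squares solver in place of the direct linear transform solution cited in the text, the value and sign of `G` (Z axis pointing up, metres and seconds) are placeholders, and the function name is hypothetical.

```python
import numpy as np

G = -9.8  # ASSUMED gravitational acceleration (m/s^2) with the Z axis pointing up


def solve_sub_trajectory(P, observations, g=G):
    """Estimate (X0, VX, Y0, VY, Z0, VZ) of one sub-trajectory from 2D ball positions.

    P: 3x4 projection matrix (rescaled internally so that p34 = 1).
    observations: iterable of (t, x, y), with t the time since the sub-trajectory start.
    """
    P = np.asarray(P, dtype=float)
    P = P / P[2, 3]
    A, b = [], []
    for t, x, y in observations:
        half = g * t * t / 2.0
        for r, u in ((0, x), (1, y)):           # x-equation, then y-equation
            c = P[r, :3] - u * P[2, :3]         # p_rj - u * p_3j for j = 1..3
            A.append([c[0], c[0] * t, c[1], c[1] * t, c[2], c[2] * t])
            b.append(u * (P[2, 2] * half + 1.0) - (P[r, 2] * half + P[r, 3]))
    solution, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return solution
```

At least three ball candidates are needed for the six unknowns; in practice all candidates of the sub-trajectory would be used so that detection noise is averaged out.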

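Figure 6 illustrates the procedure of 3D trajectory approximation with a sample ball trajectory. As shown in Fig. 6a, the ball trajectory contains three sub-trajectories $S_0$, $S_1$ and $S_2$. Let $P_1$ be the transition point between $S_0$ and $S_1$, and $P_2$ the transition point between $S_1$ and $S_2$. Let $(V_{Xi}, V_{Yi}, V_{Zi})$ be the 3D ball velocity of the sub-trajectory $S_i$, where i is the index. As shown in Fig. 6b, we consider the two adjacent sub-trajectories $S_0$ and $S_1$ to derive $(V_{X0}, V_{Y0}, V_{Z0}, X_1, Y_1, Z_1, V_{X1}, V_{Y1}, V_{Z1})$. Taking $P_1$ as the initial point, the 3D sub-trajectories of $S_0$ and $S_1$ are expressed as Eq. (11) and Eq. (12), respectively, where $t \ge 0$ is the time measured backward from $P_1$ for $S_0$ and forward from $P_1$ for $S_1$. Reversing time flips the sign of the velocity terms but leaves the quadratic gravity term of Eq. (8) unchanged.

$$
\begin{aligned}
X_t &= X_1 - V_{X0} t \\
Y_t &= Y_1 - V_{Y0} t \\
Z_t &= Z_1 - V_{Z0} t + g t^2/2
\end{aligned}
\qquad (11)
$$

$$
\begin{aligned}
X_t &= X_1 + V_{X1} t \\
Y_t &= Y_1 + V_{Y1} t \\
Z_t &= Z_1 + V_{Z1} t + g t^2/2
\end{aligned}
\qquad (12)
$$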

Using Wt¼ Xð t; Yt; Zt; 1ÞT¼ ðX1 VX 0t; Y1 VY 0t; Z1 VZ 0t gt2=2Þ for S0 and

Wt¼ Xð t; Yt; Zt; 1ÞT¼ ðX1þ VX 1t; Y1þ VY 1t; Z1þ VZ1tþ gt2=2Þ for S1, we obtain two

equations for each ball candidate, as Eq. (9). The 2N equations produced from N ball

Process: S0 P1 S1 Compute : (VX0, VY0, VZ0, X1, Y1, Z1, VX1, VY1, VZ1) Process: S1 P2 S2 Compute: (VX1, VY1, VZ1, X2, Y2, Z2, VX2, VY2, VZ2) Process: P1 S1 P2 Known: (X1, Y1, Z1) and (X2, Y2, Z2) Refine: (VX1, VY1, VZ1) P1(X1, Y1, Z1) P2(X2, Y2, Z2) S0 S1 S2 (VX0, VY0 VZ0) (VX2, VY2 VZ2) (VX1, VY1 VZ1)

A ball trajectory containing three

sub-trajectories: S0, S1and S2 S0 S1 (VX0, VX0 VZ0) (VX1, VY1 VZ1) S1 _S 2 (VX2, VY2 VZ2) (VX1, VY1 VZ1) S1 (VX1, VY1 VZ1) (a) (b) (c) (d) P1(X1, Y1, Z1) P1(X1, Y1, Z1) P2(X2, Y2, Z2) P2(X2, Y2, Z2)

(13)

candidates form a linear system, as Eq. (13). Assume that the S0and S1consist of k and N-k

ball candidates, respectively.

p11 x1p31 p11t1þ x1p31t1 0 p13 x1p33 p13t1þ x1p33t1 0 p21 y1p31 p21t1þ y1p31t1 0 p23 y1p33 p23t1þ y1p33t1 0 .. . p11 xkp31 p11tkþ xkp31tk 0 p13 xkp33 p13tkþ xkp33tk 0 p21 ykp31 p21tkþ ykp31tk 0 p23 ykp33 p23tkþ ykp33tk 0 p11 xkþ1p31 0 p11tkþ1 xkþ1p31tkþ1 p13 xkþ1p33 0 p13tkþ1þ xkþ1p33tkþ1 p21 ykþ1p31 0 p21tkþ1 ykþ1p31tkþ1 p23 ykþ1p33 0 p23tkþ1þ ykþ1p33tkþ1 .. . p11 xNp31 0 p11tN xNp31tN p13 xNp33 0 p13tN xNp33tN p21 yNp31 0 p21tN yNp31tN p23 yNp33 0 p23tN yNp33tN 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 X1 VX 0 VX 1 Y1 VY 0 VY 1 Z1 VZ0 VZ1 2 6 6 6 6 6 6 6 6 6 6 6 6 4 3 7 7 7 7 7 7 7 7 7 7 7 7 5 ¼ x1ðp33 gt12=2 þ 1Þ þ ðp13 gt21=2 þ p14Þ y1ðp33 gt12=2 þ 1Þ þ ðp23 gt21=2 þ p24Þ .. . xkðp33 gtk2=2 þ 1Þ þ ðp13 gt2k=2 þ p14Þ ykðp33 gtk2=2 þ 1Þ þ ðp23 gt2k=2 þ p24Þ xkþ1ðp33 gt2kþ1=2 þ 1Þ þ ðp13 gtkþ12 =2 þ p14Þ ykþ1ðp33 gt2kþ1=2 þ 1Þ þ ðp23 gtkþ12 =2 þ p24Þ .. . xNðp33 gt2N=2 þ 1Þ ðp13 gtN2=2 þ p14Þ yNðp33 gt2N=2 þ 1Þ ðp23 gtN2=2 þ p24Þ 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 ð13Þ

We solve for (X1, VX0, VX1, Y1, VY0, VY1, Z1, VZ0, VZ1)^T using the direct linear transform algorithm. Thus, the coordinate of P1 (X1, Y1, Z1) is obtained. In the same way, the nine parameters (VX1, VY1, VZ1, X2, Y2, Z2, VX2, VY2, VZ2) can be computed by processing S1 and S2 simultaneously, as shown in Fig. 6c.
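As an illustration of how the linear system of Eq. (13) can be assembled and solved, the following is a minimal sketch rather than the implementation used in VIA. It assumes a 3×4 projection matrix P normalized so that its last entry is 1, follows the sign conventions of Eqs. (11) and (12) as printed, and takes g as a parameter. In place of the direct linear transform it uses an ordinary least-squares solve via modified Gram-Schmidt QR; the function names `build_system` and `lstsq_qr` are hypothetical.

```python
import math

def build_system(P, s0, s1, g):
    """Assemble A u = b for u = (X1, VX0, VX1, Y1, VY0, VY1, Z1, VZ0, VZ1).

    P      : 3x4 projection matrix (assumption: P[2][3] == 1)
    s0, s1 : lists of (x, y, t) ball candidates on S0 and S1
    g      : gravity term used in Eqs. (11) and (12)
    """
    A, b = [], []
    for cands, sign in ((s0, -1.0), (s1, 1.0)):   # Eq. (11): -V*t, Eq. (12): +V*t
        for x, y, t in cands:
            for img, r in ((x, 0), (y, 1)):       # one row for x, one for y
                row = []
                for j in range(3):                # world axes X, Y, Z
                    c = P[r][j] - img * P[2][j]
                    v0 = sign * c * t if sign < 0 else 0.0  # V_0 column (S0 rows only)
                    v1 = sign * c * t if sign > 0 else 0.0  # V_1 column (S1 rows only)
                    row += [c, v0, v1]
                cz = P[r][2] - img * P[2][2]
                A.append(row)
                # The gravity part of Z_t is known, so it moves to the right-hand side.
                b.append(img * P[2][3] - P[r][3] - sign * cz * g * t * t / 2.0)
    return A, b

def lstsq_qr(A, b):
    """Least-squares solution of A u = b by modified Gram-Schmidt QR."""
    n = len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(Q):
            R[i][j] = sum(a * c for a, c in zip(q, v))
            v = [c - R[i][j] * a for a, c in zip(q, v)]
        R[j][j] = math.sqrt(sum(c * c for c in v))
        Q.append([c / R[j][j] for c in v])
    resid, qtb = b[:], []
    for q in Q:                                   # project b onto each q in turn
        coef = sum(a * c for a, c in zip(q, resid))
        qtb.append(coef)
        resid = [c - coef * a for a, c in zip(q, resid)]
    u = [0.0] * n
    for j in reversed(range(n)):                  # back substitution on R u = Q^T b
        u[j] = (qtb[j] - sum(R[j][k] * u[k] for k in range(j + 1, n))) / R[j][j]
    return u
```

Calling `lstsq_qr(*build_system(P, s0, s1, g))` returns the nine parameters in the order of the unknown vector of Eq. (13). The system becomes ill-conditioned when the plane of a sub-trajectory passes close to the camera center, so a practical implementation would likely need to check the conditioning.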

For the sub-trajectory S1 between two transition points P1 and P2, its 3D velocity (VX1, VY1, VZ1) is computed twice: once when processing S0–P1–S1 and once when processing S1–P2–S2. Thus, we refine (VX1, VY1, VZ1) by solving Eq. (12) with (Xt, Yt, Zt) = (X2, Y2, Z2). Finally, the 3D trajectory can be approximated using the 3D ball velocity (VXi, VYi, VZi) on each sub-trajectory Si and the coordinate (Xj, Yj, Zj) of each transition point Pj.
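The refinement step can be written in closed form. The sketch below assumes that T denotes the elapsed time between the transition points P1 and P2 (for example, the frame difference divided by the frame rate) and that g is the same gravity term as in Eq. (12); the function name `refine_velocity` is hypothetical.

```python
def refine_velocity(p1, p2, T, g):
    """Velocity on S1 at P1 such that Eq. (12) passes through P2 after T seconds.

    p1, p2 : (X, Y, Z) coordinates of the transition points P1 and P2
    T      : elapsed time between P1 and P2 (assumption: known, T > 0)
    g      : gravity term of Eq. (12)
    """
    (x1, y1, z1), (x2, y2, z2) = p1, p2
    vx = (x2 - x1) / T
    vy = (y2 - y1) / T
    vz = (z2 - z1 - g * T * T / 2.0) / T   # remove the known gravity contribution
    return vx, vy, vz
```

For example, with P1 = (5, 9, 2.5), T = 0.8 s and g = −9.8 m/s², a ball launched with velocity (−2, 5, 6) reaches P2 = (3.4, 13, 4.164), and `refine_velocity` recovers that velocity from the two points.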

8 Trajectory-based applications

This section presents several applications based on the acquired 2D and 3D trajectories to demonstrate the utility of the proposed 2D ball tracking and 3D trajectory approximation scheme. The trajectory-based applications to tactics analysis greatly assist the coaches and players in game strategy study.

8.1 Action detection and set type recognition using 2D trajectory

In volleyball games, a play begins with a serve followed by the iterative actions: reception, set and attack. By rules, players are not allowed to hold the ball during a play. Thus, the ball changes its motion only when interacting with a player. The turning points of the ball trajectory can be detected and recognized as serve, reception, set and attack in order.

Since in volleyball games the set type carries the most tactical information and typically determines whether a team can score, we focus on the set action and further recognize the set type. Figure 7 illustrates ten common set types. A set type is determined by its direction (forward or backward) and by the horizontal and vertical displacements of the ball.


We define the discriminants as in Table 1. Sets Qa, Qb, Qc and Qd are quick sets, in which players try to hit the ball as soon as possible. Sets #2 and #3 are short sets next to the setter, while sets #1, #4, #5 and #6 are long sets toward the two sides of the net. A set type can be recognized by classifying the set curve (the sub-trajectory after the set action) into one of the ten types by the discriminants, where a2 and b1 are coefficients in Eq. (6) and Eq. (7), and γ1–γ3 are thresholds. We use 200 set curves (20 curves per set type) as training data and manually label the set types. The thresholds γ1–γ3 are determined by seeking the values which best classify the set curves in the training data.
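The discriminants of Table 1 amount to a small decision procedure, sketched below under stated assumptions. Table 1 does not cover the boundary cases b1 = 0 and |b1| = γ1, so the sketch arbitrarily treats them as b1 > 0 and as a long set, respectively, and it assumes a2 > 0. The function name and the threshold values in the usage example are hypothetical, not the values trained in this paper.

```python
def classify_set(a2, b1, gamma1, gamma2, gamma3):
    """Return the set type label of Table 1 for a set curve.

    a2, b1         : coefficients of the set curve (Eqs. (6) and (7)); assumes a2 > 0
    gamma1..gamma3 : thresholds, with gamma2 < gamma3
    """
    ratio = abs(b1) / a2
    negative = b1 < 0            # boundary b1 == 0 is treated as b1 > 0 (assumption)
    if ratio > gamma3:           # Qb / Qd do not depend on |b1| versus gamma1
        return "Qb" if negative else "Qd"
    small = abs(b1) < gamma1     # boundary |b1| == gamma1 is treated as long (assumption)
    if ratio <= gamma2:
        if small:
            return "#2" if negative else "#3"
        return "#1" if negative else "#4"
    # remaining case: gamma2 < ratio <= gamma3
    if small:
        return "Qa" if negative else "Qc"
    return "#5" if negative else "#6"
```

With the illustrative thresholds γ1 = 1, γ2 = 2 and γ3 = 5, a curve with a2 = 1 and b1 = −1.5 is labeled #1, and a curve with a2 = 0.1 and b1 = 0.9 is labeled Qd.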

8.2 3D virtual replay and serve placement estimation using 3D trajectory

3D trajectory approximation facilitates the enriched visual presentation of 3D virtual replays. The ball movement can be watched on a virtual court from any viewpoint. This visualization is both engaging and practical: the viewpoint can be switched among the receiver, the setter, the attacker or the players opposite the net, views which cannot be captured by any camera on the court.

Serve placement (landing position) offers a valuable insight into the game strategy because the serve-reception directs the first attack in a play. With the 3D trajectory approximated, we extend the trajectory of a serve until it reaches the ground (the z-coordinate of the ball equals zero) and take that point as the estimated serve placement.
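The landing point follows from setting Zt = 0 in Eq. (12) and solving the resulting quadratic for t. The sketch below assumes that Z points upward, that g is negative (−9.8 m/s² by default) and that the starting point lies at or above the ground; the function name `serve_placement` is hypothetical.

```python
import math

def serve_placement(p, v, g=-9.8):
    """Extend a serve along Eq. (12) until Z_t = 0.

    p : (X, Y, Z) position on the serve sub-trajectory at t = 0
    v : (VX, VY, VZ) velocity at that position
    g : gravity term (assumption: negative, Z axis pointing up)
    Returns (X_land, Y_land, t_land).
    """
    x, y, z = p
    vx, vy, vz = v
    disc = vz * vz - 2.0 * g * z          # discriminant of (g/2) t^2 + VZ t + Z = 0
    if g >= 0 or disc < 0:
        raise ValueError("trajectory does not reach the ground")
    t = (-vz - math.sqrt(disc)) / g       # the non-negative root when g < 0 and Z >= 0
    return x + vx * t, y + vy * t, t
```

For instance, a ball at height 4.9 m with no vertical velocity lands after 1 s, so with horizontal velocity (3, −1) from (1, 2) the estimated placement is (4, 1).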

Table 1 Discriminants of ten common set types

Set          Discriminant                          Set          Discriminant
#1 (long)    b1<0, |b1|>γ1, |b1|/a2≤γ2             Qa (quick)   b1<0, |b1|<γ1, γ2<|b1|/a2≤γ3
#2 (short)   b1<0, |b1|<γ1, |b1|/a2≤γ2             Qb (quick)   b1<0, |b1|/a2>γ3
#3 (short)   b1>0, |b1|<γ1, |b1|/a2≤γ2             Qc (quick)   b1>0, |b1|<γ1, γ2<|b1|/a2≤γ3
#4 (long)    b1>0, |b1|>γ1, |b1|/a2≤γ2             Qd (quick)   b1>0, |b1|/a2>γ3
#5 (long)    b1<0, |b1|>γ1, γ2<|b1|/a2≤γ3          #6 (long)    b1>0, |b1|>γ1, γ2<|b1|/a2≤γ3

Fig. 7 Illustration of set type

9 Experimental results and discussion

The proposed algorithms of audio event detection, 2D ball tracking and 3D trajectory approximation are implemented in Borland C++ Builder 6.0. For performance evaluation, the proposed system is tested on the volleyball video sequences (MPEG-1, 352×240, 29.97 fps; audio: 44.1 kHz, 16 bits, stereo) which we captured in the 2005 Asia Men's Volleyball Challenge Cup: 1) Taiwan vs. Korea, 2) China vs. Japan, and 3) Japan vs. Korea.

9.1 Parameter setting

In ball candidate detection, an object will be removed by the shape sieve or the compactness sieve if its Ra (the higher value between AR and 1/AR, where AR is the aspect ratio) is greater than a threshold ZRa or its compactness degree CD is less than ZCD, respectively. To determine the values of ZRa and ZCD, we use a training data set containing 300 ball candidates which are manually recognized as the true ball. We compute the Ra and CD of the 300 ball candidates, and construct the relative cumulative frequency distributions, as shown in Fig. 8. For high recall rate, the thresholds ZRa and ZCD are determined in such a way that no more than 5% of the ball candidates in the training data set will be removed by the shape and compactness sieves.
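Reading a threshold off a cumulative frequency distribution is equivalent to picking an order statistic of the training values. The sketch below is one way to do this, assuming the 5% budget is applied to each sieve separately (the text can also be read as a joint budget); the function names are hypothetical.

```python
import math

def upper_threshold(values, max_removed=0.05):
    """Threshold z for a sieve that removes objects with value > z (e.g. Ra > ZRa),
    chosen so that at most max_removed of the training values exceed z."""
    s = sorted(values)
    allowed = math.floor(max_removed * len(s))   # how many true balls may be removed
    return s[len(s) - 1 - allowed]

def lower_threshold(values, max_removed=0.05):
    """Threshold z for a sieve that removes objects with value < z (e.g. CD < ZCD),
    chosen so that at most max_removed of the training values fall below z."""
    s = sorted(values)
    allowed = math.floor(max_removed * len(s))
    return s[allowed]
```

With 300 training candidates and a 5% budget, each threshold is placed so that at most 15 true balls fall on the rejected side of the corresponding sieve.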

In potential trajectory exploration, a growing trajectory T is finalized as a potential trajectory if the ball is missed for Zf consecutive frames. We set Zf to half the number of the ball candidates in T, since a potential trajectory with more than half the ball positions missed is not reliable. In trajectory identification, we compute the trajectory lengths, the percentages of isolated candidates in the trajectory and the prediction errors (in pixels) of a training data set containing 100 true ball trajectories and 100 false trajectories to construct the relative frequency distributions, as shown in Fig. 9, where the solid lines are for true ball trajectories and the dotted lines are for false trajectories. The thresholds ZTL, ZIC and ZPE are determined at the intersections of the solid lines and the dotted lines.

9.2 Results of 2D trajectory extraction

[Fig. 8 Relative cumulative frequency distributions of (a) Ra and (b) CD (%) for the training ball candidates, with the thresholds ZRa and ZCD marked; vertical axes: frequency (%)]

Whistle detection achieves an overall precision of 96.5% and an overall recall of 98.57%. The experiment of 2D trajectory extraction is conducted on the shots which are correctly segmented by whistle detection. The following conventions and notations are used in presenting the results. For each ball frame (a frame in which the ball is within the frame region, even if it is occluded), the ground truth of ball position is obtained by manual inspection. The system is said to correctly identify a frame f if: 1) it concludes the ball position within a distance of Dfrm (the in-frame ball diameter) from the ground truth when f is a ball frame or 2) it concludes that there is no ball when f is a no-ball frame. The system is said to give a false alarm if it concludes the incorrect ball position in a ball frame or it detects a ball in a no-ball frame. Let #frm and #ball-frm denote the numbers of frames and ball frames in the sequence, respectively. Let #correct be the number of frames which the system identifies correctly and #false be the number of false alarms.
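These conventions can be expressed as a short counting routine. The sketch below assumes that each frame carries either an (x, y) position or None (no ball reported, or no ball in the ground truth), and that a ball frame in which the system reports nothing counts as a miss, neither correct nor a false alarm; the function name `evaluate` is hypothetical.

```python
import math

def evaluate(pred, truth, d_frm):
    """Count #correct and #false over a sequence.

    pred, truth : per-frame (x, y) tuples, or None when no ball is reported/present
    d_frm       : in-frame ball diameter in pixels
    """
    correct = false_alarms = 0
    for p, t in zip(pred, truth):
        if t is None:                       # no-ball frame
            if p is None:
                correct += 1
            else:
                false_alarms += 1           # ball detected in a no-ball frame
        elif p is not None:                 # ball frame with a reported position
            if math.hypot(p[0] - t[0], p[1] - t[1]) <= d_frm:
                correct += 1
            else:
                false_alarms += 1           # incorrect position in a ball frame
        # ball frame with nothing reported: a miss (assumption), counted in neither
    return correct, false_alarms
```

The accuracy used in Table 2 is then `correct / len(truth)`, that is, #correct/#frm.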

The results of ball detection and tracking are presented in Table 2. A ball is said to be detected correctly if it matches a ball candidate. A ball is said to be tracked if the system can conclude the correct position of the ball on the derived trajectory. An example is given in Fig. 10. Figure 10a shows the original frame. In Fig. 10b, the ball is missed when the ball is occluded by or close to the player(s). However, the system can still compute the ball trajectory and track the ball positions, as shown in Fig. 10c. We achieve the accuracy (#correct/#frm) of 71.85% on average in ball detection. By inspecting the error cases, we observe that the ball might be missed before serving in some plays, because the player who is serving does not toss up the ball high enough. Consequently, the ball which is too close to or occluded by the player is hard to detect. On the other hand, the tracking might fail if too many ball candidates are missed and not enough ball candidates are detected. However, the proposed ball tracking method is able to correct most errors and raises the final accuracy to 87.12% on average. Besides, the rate of false alarm (#false/#frm) is very low, 2.65% on average, so false alarms take up a very small portion of the trajectory. Hence, the high reliability of the extracted trajectories significantly promotes the feasibility of the subsequent trajectory-based applications to tactics analysis and 3D trajectory approximation.

Fig. 9 Relative frequency distributions of (a) trajectory length, (b) percentage of isolated candidates in the trajectory, (c) prediction errors (in pixels); the thresholds ZTL, ZIC and ZPE are marked on the respective horizontal axes

Table 2 Performance of ball detection and tracking (accuracy = #correct/#frm)

            Ground truth         Detection result               Tracking result
Sequence    #frm    #ball-frm    #correct  #false  accuracy     #correct  #false  accuracy
TWN-KOR     15824   11508        11643     426     73.58%       13632     403     86.15%
CHN-JPN     14835   10520        10712     422     72.21%       12877     411     86.80%
JPN-KOR     19241   13147        13499     518     70.16%       16964     506     88.17%
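The averages quoted above are consistent with pooling the frames of all three sequences in Table 2, rather than averaging the three per-sequence percentages. The following check reproduces them from the table entries; the variable and function names are hypothetical.

```python
# (#frm, detection #correct, detection #false, tracking #correct, tracking #false), from Table 2
TABLE2 = {
    "TWN-KOR": (15824, 11643, 426, 13632, 403),
    "CHN-JPN": (14835, 10712, 422, 12877, 411),
    "JPN-KOR": (19241, 13499, 518, 16964, 506),
}

def pooled_rates(rows):
    """Rates computed over the pooled frames of all sequences."""
    frm = sum(r[0] for r in rows)
    return {
        "detection_accuracy": sum(r[1] for r in rows) / frm,
        "tracking_accuracy": sum(r[3] for r in rows) / frm,
        "tracking_false_alarm": sum(r[4] for r in rows) / frm,
    }

rates = pooled_rates(list(TABLE2.values()))
for name, value in rates.items():
    print(f"{name}: {100 * value:.2f}%")
```

Pooling gives 35854/49900 for detection accuracy, 43473/49900 for tracking accuracy and 1320/49900 for the tracking false alarm rate.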

Table 3 presents the performance of action detection. Most actions, except serve, are detected well and the accuracies are about 90%. The misses of serve detection are mainly caused by the failures in ball tracking before a serve (as mentioned in the previous paragraph). The 6th column “attack(+audio)” reports the result of attack detection using both the trajectory and audio information. A peak in STE (Short-Time Energy) after the set action is recognized as an attack action. Combination of the trajectory and audio information improves the accuracy of attack detection in two ways: 1) the peaks in STE before the set action should be false alarms and can be eliminated, and 2) some misses in trajectory-based attack detection due to the tracking error can be recovered by STE.
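The audio cue can be sketched as follows. This is a minimal illustration, not the implementation used in VIA: the window length, hop size and energy threshold are hypothetical parameters, STE is taken here as the sum of squared samples in each window, and a peak is taken as a local maximum above the threshold.

```python
def short_time_energy(samples, frame_len, hop):
    """Short-time energy: sum of squared samples in each analysis window."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]

def first_peak_after(ste, start, threshold):
    """Index of the first local maximum at or after `start` whose energy
    exceeds `threshold`, or None if there is none.

    Peaks before `start` (the window of the set action) are ignored, which
    mirrors the elimination of STE peaks that occur before the set.
    """
    for i in range(max(start, 1), len(ste) - 1):
        if ste[i] > threshold and ste[i] >= ste[i - 1] and ste[i] > ste[i + 1]:
            return i
    return None
```

In use, `start` would be the STE window corresponding to the detected set action, and the returned window index would be mapped back to a video frame to mark the attack.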

Figure 11 demonstrates 2D ball trajectory extraction and action detection. The detected actions (serve, reception, set and attack) are shown in Fig. 11a–d, respectively. Set is one of the actions and the set type is further recognized. In each of Fig. 11a–d, the left image displays the frame at the moment when the action is detected, with the trajectory superimposed on the frame. The right image shows the automatically generated close-up for the detected action.

Fig. 10 Illustration of ball detection and ball tracking: (a) original frame, (b) ball detection (the ball is missed), (c) ball tracking (the ball is recovered)

Table 3 Performance of action detection

Action      Serve   Reception   Set   Attack   Attack(+audio)
#action     133     133         130   125      125
#correct    110     119         120   112      115

Fig. 11 Ball trajectory extraction and action detection: (a) serve, (b) reception, (c) set (type = #5) and (d) attack. In each panel, the left image shows the extracted trajectory and the right image shows the detected action

9.3 Simulation results of 3D trajectory approximation

The estimation of 3D ball positions highly relies on the 2D ball positions extracted. Owing to the high accuracy of the proposed 2D ball tracking scheme, VIA is able to approximate the 3D trajectory well. Sample simulation results are demonstrated in Figs. 12, 13 and 14. Take Fig. 12 for explanation. Figure 12a displays the frame at the moment when a serve is occurring. The frame is enriched by superimposing the extracted ball trajectory on the frame and projecting the 3D trajectory on the court plane. Similarly, the enriched frames for reception, set and attack are shown in Fig. 12b–d, respectively. It can be observed that the transition positions of the 3D trajectory are almost the locations where the actions occur, which supports the feasibility of the proposed 3D trajectory approximation method. The trajectory projected on the court model, as shown in Fig. 12e, enables the audience or professionals to comprehend the transition of ball motion much more easily. Figure 12f displays the serve placement estimation. Furthermore, virtual replays can be provided and the ball trajectory in each play can be viewed from any viewpoint, as presented in Fig. 12g–h.

In the play of Fig. 12, the receiver passes the ball close to the net and the setter sets the ball (type = #5) along the net, which can be seen in Fig. 12e. In Fig. 13, the receiver passes the ball to the setter about two meters away from the net (see Fig. 13e), and the set type is #4. In Fig. 14, the receiver passes the ball to the setter about three meters away from the net (see Fig. 14e), and the setter sets the ball (type = #5) toward the net so that the attacker can hit the ball at a position close to the net.

Fig. 12 3D trajectory approximation of sample 1 (set type = #5 and the set action is close to the net): (a)–(d) The enriched frames for serve, reception, set and attack, respectively (e) Ball trajectory projected on the court model (f) Serve placement estimation (g)–(h) 3D virtual replays from different viewpoints

Fig. 13 3D trajectory approximation of sample 2 (set type = #4 and the set action is far from the net): (a)–(d) The enriched frames for serve, reception, set and attack, respectively (e) Ball trajectory projected on the court model (f) Serve placement estimation (g)–(h) 3D virtual replays from different viewpoints

Fig. 14 3D trajectory approximation of sample 3 (set type = #5 and the set action is far from the net): (a)–(d) The enriched frames for serve, reception, set and attack, respectively (e) Ball trajectory projected on the court model (f) Serve placement estimation (g)–(h) 3D virtual replays from different viewpoints

Inspecting error cases, we find that improper segmentation of the ball might lead to the deviation of the 2D ball candidate coordinate. If there are not enough ball candidates detected to rectify the deviation, the system might misjudge a far-to-near trajectory as a near-to-far one, and vice versa, as shown in Fig. 15. Thus, the 3D ball positions would not be estimated correctly. Strictly speaking, there may be some deviation between the actual ball trajectory and the approximated 3D trajectory, due to the effects of the physical factors we do not involve, such as air friction, ball spin rate and spin axis. However, our experimental results show that the proposed physics-based method is able to approximate the 3D ball trajectory well enough for tactics analysis.

Since the 3D ball coordinates in the real world cannot be obtained, we prepare synthetic data where the ground truth is known and use them to perform quantitative analysis. Table 4 shows the number of ball frames (#ball-frm), the 2D ball tracking accuracy, and the average and the maximum 3D distances between the ground truth and the computed 3D ball positions. The average 3D distances are about 0.14 m and 0.18 m. The maximum 3D distances are no more than 0.4 m. To further evaluate the robustness of the proposed 3D trajectory approximation approach against 2D ball tracking error, we manually reduce the tracking accuracy by discarding some tracked 2D ball positions and then compute the average and the maximum 3D distances, as shown in Table 5 and Fig. 16. It can be observed that the average and the maximum distances increase dramatically as the 2D ball tracking accuracy decreases. That is, the proposed 3D trajectory approximation approach highly relies on the tracked 2D ball positions.

Fig. 15 Example of false 3D ball trajectory approximation: (a) 2D ball trajectory superimposed on the frame, (b) ball trajectory projected on the court model, (c) and (d) 3D virtual replays from different viewpoints

Table 4 Quantitative analysis of 3D trajectory approximation

              #ball-frm   2D ball tracking accuracy   Avg. 3D distance   Max. 3D distance
Synthesis 1   1409        91.41%                      0.14 m             0.31 m

9.4 Discussion and comparison

The experiments are conducted on an IBM ThinkPad X60 notebook computer (CPU: Intel Core Duo T2400 1.83 GHz, RAM: 1 GB). Table 6 presents the computing time of each process stage: τ1 for object segmentation, τ2 for ball candidate detection, τ3 for ball tracking and τ4 for 3D trajectory approximation. The percentage of the computing time of each process stage to the total computing time τall (τall = τ1 + τ2 + τ3 + τ4) is given in the parentheses. The last two rows present the number of frames in the sequence (#frm) and the average computing time for each frame (τall/#frm).

Table 6 shows that the computation of 3D trajectory approximation is very efficient, taking only about 1.3% of the total computing time. The computational cost comes mainly from ball tracking (about 70–73%). In our statistics, 6.89 ball candidates are detected from 59.43 objects produced in each frame on average, as presented in Table 7. The effectiveness is defined as Eq. (14):

$$\text{Effectiveness} = \frac{N_b - N_a}{N_b} \tag{14}$$

where Nb and Na are the object numbers before and after applying a sieve to remove non-ball objects, respectively. The most effective sieve is the size sieve, which is able to remove 68.16% of non-ball objects. The shape and compactness sieves can remove 34.07% and 45.89%, respectively. The computational efficiency of ball tracking can be improved if we tighten the sieve constraints to remove more non-ball objects. However, more misses will occur accordingly. Overall, we achieve an average computing time of about 25 ms/frame, which is below the 33.4 ms frame interval of the 29.97 fps input; that is, the proposed VIA system is able to extract 2D trajectories and approximate 3D trajectories in real time.
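As a worked check of Eq. (14) and of the real-time claim, the short script below recomputes the sieve effectiveness from the per-frame object counts of Table 7 and compares the per-frame computing times of Table 6 with the frame interval of the 29.97 fps input. Because it starts from the rounded averages printed in the tables, the last digit may differ slightly from the reported percentages (for example, about 34.06% rather than 34.07% for the shape sieve). The names used are hypothetical.

```python
def effectiveness(n_before, n_after):
    """Eq. (14): fraction of objects removed by a sieve."""
    return (n_before - n_after) / n_before

INITIAL = 59.43                                   # objects per frame before any sieve (Table 7)
AFTER = {"size": 18.92, "shape": 39.19, "compactness": 32.16}
sieve_eff = {name: effectiveness(INITIAL, n) for name, n in AFTER.items()}

# Total computing time (ms) and number of frames per sequence, from Table 6
TIMING = {"TWN-KOR": (373748, 15824), "CHN-JPN": (400809, 14835), "JPN-KOR": (484934, 19241)}
per_frame_ms = {seq: total / frames for seq, (total, frames) in TIMING.items()}
frame_interval_ms = 1000.0 / 29.97                # about 33.4 ms between input frames
real_time = all(ms < frame_interval_ms for ms in per_frame_ms.values())
```

All three sequences are processed in less time per frame than the frame interval, which is the sense in which the system runs in real time on the tested hardware.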

Table 5 3D trajectory approximation performance on different tracking accuracies

               2D ball tracking accuracy (manually reduced)   Avg. 3D distance   Max. 3D distance
Synthesis 1    91.41%                                         0.14 m             0.31 m
Synthesis R1   73.10%                                         0.28 m             0.63 m
Synthesis R2   60.89%                                         0.48 m             0.97 m
Synthesis R3   45.71%                                         0.91 m             1.76 m

Fig. 16 3D trajectory approximation performance on different tracking accuracies (horizontal axis: 2D ball tracking accuracy, 0% to 100%; vertical axis: average and maximum 3D difference, 0 to 2)


For performance comparison, we implement another ball tracking algorithm based on the Kalman filter, which is widely used in object tracking [13, 27, 28]. To compare the effectiveness and efficiency of the Kalman filter-based algorithm (KF) with those of our algorithm, we use #correct (the number of the frames in which the system correctly identifies the ball), #false (the number of false alarms), accuracy (#correct/#frm) and CT (the computing time) as criteria, as reported in Table 8. The comparison shows that our algorithm performs better in eliminating false alarms. Consequently, our algorithm achieves a higher accuracy of about 87%, compared to about 80% for the KF algorithm. Moreover, our algorithm requires less computing time, because the use of the ball motion characteristics avoids exploring many false trajectories. In conclusion, our algorithm outperforms the KF algorithm in both effectiveness and efficiency.
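For orientation, a generic constant-velocity Kalman filter for one image coordinate can be sketched as below. This is not the KF baseline implemented for Table 8, whose details are not given here; the noise parameters q and r, the initial covariance and the function name are assumptions chosen for illustration. A missed detection (None) is handled by prediction only.

```python
def kalman_cv_1d(measurements, dt=1.0, q=1e-3, r=1.0):
    """Constant-velocity Kalman filter on one coordinate.

    measurements : observed positions per frame; None marks a missed detection
                   (assumption: the first entry is not None)
    dt           : time step between frames
    q, r         : process and measurement noise variances (illustrative values)
    Returns a list of (position, velocity) estimates, one per frame.
    """
    p, v = measurements[0], 0.0                 # state: position and velocity
    P = [[100.0, 0.0], [0.0, 100.0]]            # large initial uncertainty
    out = [(p, v)]
    for z in measurements[1:]:
        # Predict with F = [[1, dt], [0, 1]]: x <- F x, P <- F P F^T + q I
        p = p + dt * v
        a, b, c, d = P[0][0], P[0][1], P[1][0], P[1][1]
        P = [[a + dt * (b + c) + dt * dt * d + q, b + dt * d],
             [c + dt * d, d + q]]
        if z is not None:
            # Update with H = [1, 0]: only the position is observed
            s = P[0][0] + r
            k0, k1 = P[0][0] / s, P[1][0] / s
            y = z - p
            p, v = p + k0 * y, v + k1 * y
            P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                 [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        out.append((p, v))
    return out
```

Running one such filter per image axis gives a simple 2D tracker. Unlike the physics-based approach, this motion model has no notion of the parabolic ball motion, which is one plausible reason a tracker of this kind accepts more false trajectories.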

As to 3D trajectory approximation, Table 9 shows the comparison between well-known existing systems [13, 20, 21] and our proposed VIA system. To the best of our knowledge, there is little research on 3D trajectory approximation from single-camera video sequences. The 2D-to-3D inference is also one of the main contributions and novelties of this paper. Most of the existing 3D trajectory reconstruction systems work on multiple cameras located at specific positions. The Hawk-Eye system [20] made its debut at the Wimbledon Championships 2007 and has been applied to official games for years. It is claimed that the Hawk-Eye system can provide instant replay within 2–3 seconds and shows an average error of only 0.36 cm. The ESPN K-Zone system [13] officially debuted in 2001 and is claimed to be accurate to within four-tenths of an inch (1.016 cm). The UIS (Umpire Information System) [21] was first seen on air during the 1997 Baseball World Series. The UIS claims that each pitch can be tracked and recorded within a half-inch (1.27 cm) of its actual location. These existing systems have outstanding performance in 3D trajectory approximation. However, they require the high cost of multiple high-speed cameras and are strongly limited in view angles. In this respect, our proposed VIA system better meets practical requirements and general users' needs.

Table 6 Computing time of each process stage (τ1: object segmentation, τ2: ball candidate detection, τ3: ball tracking, τ4: 3D trajectory approximation, τall = τ1 + τ2 + τ3 + τ4)

Computing time   TWN-KOR               CHN-JPN               JPN-KOR
τ1 (τ1/τall)     81353 ms (21.77%)     78593 ms (19.61%)     96649 ms (19.93%)
τ2 (τ2/τall)     23729 ms (6.35%)      24485 ms (6.11%)      37799 ms (7.79%)
τ3 (τ3/τall)     264010 ms (70.63%)    292547 ms (72.99%)    343944 ms (70.93%)
τ4 (τ4/τall)     4656 ms (1.25%)       5184 ms (1.29%)       6542 ms (1.35%)
τall             373748 ms             400809 ms             484934 ms
#frm             15824                 14835                 19241
τall/#frm        23.62 ms              27.02 ms              25.2 ms

Table 7 Effectiveness of the size, shape and compactness sieves

Process stage                       #object (per frame)   Effectiveness
Initial                             59.43
Apply the size sieve only           18.92                 68.16%
Apply the shape sieve only          39.19                 34.07%
Apply the compactness sieve only    32.16                 45.89%

10 Conclusions and future work

The more you know the opponents, the better chance you stand of winning. Therefore, game strategy study before the play is of vital importance for the coaches and players. To assist game strategy study and extract tactic information, we design a physics-based system VIA (Volleyball Intelligence Agent) for ball tracking, 3D trajectory approximation and applications to tactics analysis based on the 2D and 3D trajectories. The problem of 2D-to-3D inference is intrinsically challenging due to the loss of 3D information in projection to 2D frames. One significant contribution is the integrated scheme which utilizes the domain knowledge of court specification for camera calibration and encapsulates physical characteristics of ball motion into object tracking to achieve 3D trajectory approximation from single-view video sequences. Moreover, the VIA system has illustrated some of the numerous trajectory-based applications made possible by this scheme, including action detection, set type recognition, 3D virtual replays and serve placement estimation. These applications give the coaches, players and the audience a novel insight into the game.

Every sport has its respective game rules, domain knowledge, court model and tactics. Therefore, tactics analysis is specially designed for a specific sport. The trajectory-based applications to tactics analysis presented in this paper cannot be directly applied to other sports. However, we propose a generalized approach for 2D ball tracking and 3D trajectory approximation. With domain knowledge-based adaptation, the proposed approach can be readily applied to sports which have sufficient court information captured in the video and require ball trajectory extraction for tactics analysis, such as basketball, soccer, tennis and table tennis. In addition, applying the proposed system to video of higher resolution is part of our future work, and we believe that better performance can be achieved.

Table 8 Comparison between the Kalman filter-based algorithm and our algorithm (#false: number of false positives, CT: computing time)

                KF algorithm                                Our algorithm
Ball tracking   #correct  #false  accuracy  CT (ms)        #correct  #false  accuracy  CT (ms)
TWN-KOR         12890     775     81.46%    340308         13632     403     86.15%    264010
CHN-JPN         11818     701     79.66%    392012         12877     411     86.80%    292547
JPN-KOR         15427     1019    80.18%    449190         16964     506     88.17%    343944
Total           40135     2495    80.43%    1181510        43473     1320    87.12%    900501

Table 9 Comparison between the famous existing systems [13, 20, 21] and our proposed VIA system on 3D trajectory approximation

                 ESPN K-Zone System [13]   Hawk-Eye System [20]   QuesTec UIS [21]   Our proposed VIA system
Avg. 3D error    1.016 cm                  0.36 cm                1.27 cm            14 cm
# of camera(s)   2                         6                      4                  1


Generally speaking, not all sports games are broadcast on TV. It is a growing trend that the coaches and players set up a camera to capture the game they want to analyze. This trend necessitates the development of computer-assisted game study systems for captured sports video, like the proposed VIA system. The main difference between the user-captured video and the broadcast video is that the former has no camera motion while the latter has. To avoid the effect of camera motion, in this paper we use user-captured video to verify the proposed 3D trajectory approximation approach. The experiments show convincing results.

The limitation of our proposed system is that only the video sequences captured by a fixed camera are analyzed. For broadcast video sequences, camera motion increases the complexity in moving object segmentation. The frame differencing method is no longer adequate to segment objects when the background keeps changing. In addition, the current system considers only the physical effect of gravitational acceleration and models 3D ball trajectories as parabolic curves. However, there are still other factors affecting the ball motion, such as air friction, ball spin axis and ball spin rate. In the future, we will take camera motion into consideration for ball tracking in broadcast video sequences. Moreover, we will involve more physical factors to model the 3D ball trajectory more precisely.

Acknowledgement The research is partially supported by the National Science Council of Taiwan, R.O.C, under the grant No. NSC 95-2221-E-009-076-MY3 and partially supported by Lee and MTI center for Networking Research at National Chiao Tung University, Taiwan.

References

1. Assfalg J, Bertini M, Colombo C, Bimbo AD, Nunziati W (2003) Semantic annotation of soccer videos: automatic highlights identification. Comput Vis Image Underst 92(23):285–305

2. Cheng CC, Hsu CT (2006) Fusion of audio and motion information on HMM-based highlight extraction for baseball games. IEEE Trans Multimed 8(3):585–599

3. Chen HT, Hsiao MH, Chen HS, Tsai WJ, Lee SY (2008) A baseball exploration system using spatial pattern recognition. In: Proc IEEE Int Symp Circuits and Systems 2008:3522–3525

4. Chen HT, Chen HS, Hsiao MH, Tsai WJ, Lee SY (2008) A trajectory-based ball tracking framework with enrichment for broadcast baseball videos. J Inf Sci Eng 24(1):143–157

5. Chen HT, Tien MC, Chen YW, Tsai WJ, Lee SY (2009) Physics-based ball tracking and 3D trajectory reconstruction with applications to shooting location estimation in basketball video. J Vis Commun Image Representation 20(3):204–216

6. Chen HT, Chen HS, Lee SY (2007) Physics-based ball tracking in volleyball videos with its applications to set type recognition and action detection. In: Proc. IEEE Int. Conf. on Acoustic Speech Signal Process 2007, pp. I-1097–1100

7. Duan LY, Xu M, Tian Q (2005) A unified framework for semantic shot classification in sports video. IEEE Trans Multimed 7(6):1066–1083

8. Duan LY, Xu M, Chua TS, Tian Q, Xu CS (2003) A mid-level representation framework for semantic sports video analysis. In: Proc. 11th ACM Int. Conf. Multimedia, pp.33–44

9. Ekin A, Tekalp AM, Mehrotra R (2003) Automatic soccer video analysis and summarization. IEEE Trans Image Process 12(7):796–807

10. Farin D, Krabbe S, Peter HN, Effelsberg W (2004) Robust camera calibration for sport videos using court models. SPIE Storage Retrieval Meth Appl Multimedia 5307:80–91

11. Farin D, Han J, Peter HN (2005) Fast camera calibration for the analysis of sport sequences. In: Proc IEEE Int Conf Multimedia and Expo 2005:482–485

12. Forlines C, Peker KA, Divakaran A (2006) Subjective assessment of consumer video summarization. In: Proc. SPIE Int. Soc. Opt. Eng. (6073), pp. 170–177

13. Gueziec A (2002) Tracking pitches for broadcast television. Computer 35:38–43

14. Gonzalez RC, Woods RE (2002) Digital image processing, 2nd edn. Prentice Hall

15. Hartley R, Zisserman A (2003) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, UK


16. Lu H, Tan YP (2003) Unsupervised clustering of dominant scenes in sports video. Pattern Recogn Lett 24(15):2651–2662

17. Loui A, Luo J, Chang S, Ellis D, Jiang W, Kennedy L, Lee K, Yanagawa A (2007) Kodak 's consumer video benchmark data set: concept definition and annotation. In: Proc Int Workshop Multimedia Inform Retrieval 2007:245–254

18. Mei T, Hua XS (2008) Structure and event mining in sports video with efficient mosaic. Multimedia Tools Appl 40(1):89–110

19. Oami R, Benitez AB, Chang SF, Dimitrova N (2004) Understanding and modeling user interests in consumer videos. In: Proc IEEE Int Conf Multimedia and Expo 2004:1475–1478

20. Owens N, Harris C, Stennett C (2003) Hawk-eye tennis system. In: Proc Inf Conf Visual Information Engineering 2003:182–185

21. QuesTec-Umpire Information System. [Online]. Available: http://www.questec.com/q2001/prod_uis.htm

22. Seo Y, Choi S, Kim H, Hong KS (1997) Where are the ball and players? Soccer game analysis with color-based tracking and image mosaick In: Proc Image Analysis and Processing 1997(1331):196–203

23. Tien MC, Chen HT, Chen YW, Hsiao MH, Lee SY (2007) Shot classification of basketball videos and its applications in shooting position extraction. In: Proc. IEEE Int. Conf. Acoust. Speech Signal Process 2007, pp. I-1085–1088

24. Watanabe T, Haseyama M, Kitajima H (2004) A soccer field tracking method with wire frame model from TV images. In: Proc IEEE Int Conf Image Process 2004:1633–1636

25. Xu M, Maddage NC, Xu C, Kankanhalli M, Tian Q (2003) Creating audio keywords for event detection in soccer video. In: Proc. IEEE Int. Conf. Multimedia and Expo 2003, pp. II-281–284

26. Xu M, Duan L, Chia L, Xu C (2004), Audio keyword generation for sports video analysis. In: Proc. 12th Annual ACM Int. Conf. Multimedia, pp.758–759

27. Yu X, Xu C, Leong HW, Tian Q, Tang Q, Wan KW (2003) Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video. In: Proc. 11th ACM Int. Conf. Multimedia, pp.11–20

28. Yu X, Leong HW, Xu C, Tian Q (2006) Trajectory-based ball detection and tracking in broadcast soccer video. IEEE Trans Multimed 8(6):1164–1178

29. Yu X, Jiang N, Cheong LF, Leong HW, Yan X (2008) Automatic camera calibration of broadcast tennis video with applications to 3 d virtual content insertion and ball detection and tracking. Comput Vis Image Underst 113(5):643–652

30. Yu X, Jiang N, Cheong LF (2007) Accurate and stable camera calibration of broadcast tennis video. In: Proc Int IEEE Conf Image Process 2007:93–96

31. Yu X, Tu X, Ang EL (2007) Trajectory-based ball detection and tracking in broadcast soccer video with the aid of camera motion recovery. In: Proc IEEE ICME 2007:1543–1546

32. Zhu G, Huang Q, Xu C, Xing L, Gao W, Yao H (2007) Human behavior analysis for highlight ranking in broadcast racket sports video. IEEE Trans Multimed 9(6):1167–1182

33. Zhu G, Huang Q, Xu C, Rui Y, Jiang S, Gao W, Yao H (2007) Trajectory based event tactics analysis in broadcast sports video. In: Proc. 15th ACM Int. Conf. Multimedia, pp.58–67

34. Zhang T, Kuo J (2001) Audio content analysis for online audiovisual data segmentation and classification. IEEE Trans Speech Audio Process 9(4):441–457

Hua-Tsung Chen received the B.S., M.S. and Ph.D. degrees in Computer Science and Information Engineering from National Chiao Tung University, Hsinchu, Taiwan in 2001, 2003 and 2009, respectively. He is currently an Assistant Research Fellow with the Information and Communications Technology Lab, National Chiao Tung University, Hsinchu, Taiwan. His research interests include computer vision, video signal processing, content-based video indexing and retrieval, multimedia information system and music signal processing.

Wen-Jiin Tsai received the B.S., M.S. and Ph.D. degrees in computer science and information engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1992, 1993 and 1997, respectively. She was a software manager at the DTV R&D Department of Zinwell Corporation, Hsinchu, Taiwan, during 1999-2005. She has been an Assistant Professor at the Department of Computer Science, National Chiao-Tung University, Hsinchu, Taiwan, since February 2005. Her research interests include video compression, video transmission, digital TV and content-based video retrieval.

Suh-Yin Lee received the B.S. degree in electrical engineering from National Chiao Tung University, Taiwan, in 1972, and the M.S. degree in computer science from University of Washington, Seattle, U.S.A., in 1975, and the Ph.D. degree in computer science from Institute of Electronics, National Chiao Tung University. Her research interests include content-based indexing and retrieval, distributed multimedia information system, mobile computing and data mining.


Jen-Yu Yu received the B.S. and M.S. degrees in Computer Science and Information Engineering from National Chiao Tung University in 2000 and 2002, respectively. He joined the Information and Communications Research Laboratories of the Industrial Technology Research Institute in 2003, where he is currently the deputy division director of division for Network-based Services Technology. His major research areas include mobile video streaming system, distributed multimedia system, peer-to-peer system, content-based retrieval and video compression.