

3.5 Camera Calibration

For semantic analysis of sport videos, camera calibration parameters are required to convert the positions of the ball and players in the video frame to 3D real-world coordinates, or vice versa. Fig. 3-18 shows the correspondence between a 2D court image and a 3D court model.

Fig. 3-18 Correspondence between 2D court image and 3D court model.

As mentioned in section 2.4.1:

To calculate the eleven camera parameters $c_{ij}$, we need at least six non-coplanar points whose 2D and 3D coordinates are both known. In court sports such as basketball, the marker lines on the court and the backboard boundary can be used to determine the calibration parameters, since both the color and the length of the marker lines and the backboard boundary are specified by the official rules. Fig. 3-19 shows the line correspondences between the image and the basketball court model. If we can find the white lines in the image, the crossing or boundary points of these lines can be used to calculate the transformation between the image and the real court. After that, the positions of the ball and players on the court can be estimated by detecting the ball's center point and the players' footing points, respectively.
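For illustration, this calibration can be cast as a direct linear transform (DLT): under the projective model of Eq. (11) below, with $c_{34}$ normalized to 1, each 2D-3D correspondence contributes two linear equations in the eleven unknowns. The following Python sketch is one such least-squares estimation, not the thesis's actual implementation:

```python
import numpy as np

def calibrate_dlt(points_3d, points_2d):
    """Estimate the eleven camera parameters c_ij from at least six
    non-coplanar 3D-2D point correspondences (a DLT sketch)."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # From u = (c11 X + c12 Y + c13 Z + c14) / (c31 X + c32 Y + c33 Z + 1)
        # and the analogous equation for v, both made linear in c_ij:
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b += [u, v]
    c, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Append the normalized c34 = 1 to obtain the 3x4 projection matrix.
    return np.append(c, 1.0).reshape(3, 4)
```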

Fig. 3-19 Line correspondences between image and basketball court model.

Fig. 3-20 is the flow chart of camera calibration. For each frame, image pixels are classified as court-line and backboard-boundary pixels by color and local-texture constraints. Hough Transform and Court Model Fitting are applied on the first frame to extract line candidates and to initialize the court and backboard locations. In subsequent frames, we perform a fast local search for the new camera parameters, starting from the previous approximate court and backboard locations, rather than performing the Hough Transform and Court Model Fitting again. We explain each step in the following sections.

(Flow chart: Frame 1: White Line Pixel Detection → Hough Transform → Court Model Fitting → Court Parameter Refinement; Frames 2, 3, ...: White Line Pixel Detection → Camera Parameter Refinement.)

Fig. 3-20 The flow chart of camera calibration.

3.5.1 White Pixel Detection

The color of the court lines is white by the official rules. However, there are other white objects in an image, such as advertisement logos, parts of the stadium, the spectators, or players dressed in white clothes. These incorrectly detected white pixels result in too many line candidates after the subsequent Hough line-detection step, and make the fitting of the court model time-consuming and unreliable. We therefore use additional criteria to constrain the set of court line pixels. As illustrated in Fig. 3-21, assume that the court line has a width of τ pixels; the pixels marked O and X lie τ pixels away from the current pixel in the vertical and horizontal directions, respectively. We check whether the pixels at O or X are darker than the candidate pixel.

Fig. 3-21 Part of the image containing a white line pixel.

We identify a pixel as a court line pixel or not according to Eq. (10):

$$\mathrm{LinePixel}(x,y)=\begin{cases}1, & Y(x,y)\ge a \,\wedge\, \big[\big(Y(x,y)-Y(x-\tau,y)\ge b \,\wedge\, Y(x,y)-Y(x+\tau,y)\ge b\big)\\ & \quad\ \vee\ \big(Y(x,y)-Y(x,y-\tau)\ge b \,\wedge\, Y(x,y)-Y(x,y+\tau)\ge b\big)\big]\\ 0, & \text{otherwise}\end{cases}\tag{10}$$

where Y(x, y) is the luminance value in YCbCr space. Fig. 3-22 is an example of applying Eq. (10) to detect possible white line pixels: (a) is the original image, (b) shows the detected white line pixels as red points, and (c) extracts the white line pixels as black points.
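As an illustration of Eq. (10), the following sketch marks candidate line pixels in a luminance image; the brightness threshold a, contrast threshold b, and line width τ are placeholder values, since the thesis does not list them:

```python
import numpy as np

def line_pixel_mask(Y, tau=4, a=160, b=20):
    """Apply the Eq. (10) test: a pixel is a line-pixel candidate if it
    is bright and brighter by at least b than the pixels tau away in
    the horizontal or in the vertical direction."""
    H, W = Y.shape
    mask = np.zeros((H, W), dtype=bool)
    Yi = Y.astype(np.int32)  # avoid unsigned underflow in differences
    for y in range(tau, H - tau):
        for x in range(tau, W - tau):
            if Yi[y, x] < a:
                continue
            horiz = (Yi[y, x] - Yi[y, x - tau] >= b and
                     Yi[y, x] - Yi[y, x + tau] >= b)
            vert = (Yi[y, x] - Yi[y - tau, x] >= b and
                    Yi[y, x] - Yi[y + tau, x] >= b)
            mask[y, x] = horiz or vert
    return mask
```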

Since pixels in finely textured areas, such as small white letters in logos, white areas in the stadium, or spectators wearing white clothes, still pass the above white-line test, the result contains many noise pixels. Therefore, we exclude white pixels that lie in textured regions to prevent excessive false detections in the line-extraction step.

Textured regions are recognized by observing the two eigenvalues of the structure matrix S, computed over a small window of size 2b + 1 around each candidate pixel (p_x, p_y). The structure matrix is defined in [21]:

$$S=\sum_{x=p_x-b}^{p_x+b}\;\sum_{y=p_y-b}^{p_y+b}\big(\nabla Y(x,y)\big)\big(\nabla Y(x,y)\big)^{T}$$

If both eigenvalues of the matrix S, denoted λ1 and λ2 (λ1 ≥ λ2), are large, the pixel lies in a two-dimensional texture area. If one eigenvalue is large and the other is small, the image gradients are oriented along a common axis, which is the case on straight court lines. The latter case is used to define an additional rule that retains a white pixel only if

$$\lambda_1 \ge c\cdot\lambda_2.$$
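A minimal sketch of this test is given below; the window half-size b and the ratio constant c are illustrative values, not taken from the thesis:

```python
import numpy as np

def is_line_structured(Y, px, py, b=3, c=4.0):
    """Keep a white pixel only if the structure matrix S over a
    (2b+1)x(2b+1) window has one dominant gradient direction,
    i.e. lambda1 >= c * lambda2 (line-like rather than 2D texture).
    Assumes the window lies inside the image."""
    gy, gx = np.gradient(Y.astype(np.float64))
    S = np.zeros((2, 2))
    for y in range(py - b, py + b + 1):
        for x in range(px - b, px + b + 1):
            g = np.array([gx[y, x], gy[y, x]])
            S += np.outer(g, g)  # sum of gradient outer products
    lam2, lam1 = np.linalg.eigvalsh(S)  # eigenvalues in ascending order
    return lam1 >= c * lam2
```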

(a) The original image. (b) White line pixels shown by red points.

(c) Extracted white line pixels.

Fig. 3-22 White line pixel detection.

Results of the proposed structure constraint are shown in Fig. 3-23. (a) shows the white line pixels without the line-structure constraint as red points; (b) shows the white line pixels with the line-structure constraint as red points; (c) extracts the white line pixels without the line-structure constraint as black points; (d) extracts the white line pixels with the line-structure constraint as black points. We can observe that many noise pixels in the areas of small white letters in logos, white areas in the stadium, and spectators wearing white clothes are removed after applying the line-structure constraint.

(a) White line pixels without the line-structure constraint shown by red points. (b) White line pixels with the line-structure constraint shown by red points.

(c) Extracted white line pixels without the line-structure constraint. (d) Extracted white line pixels with the line-structure constraint.

Fig. 3-23 Applying line-structure constraint.

3.5.2 Court Line and Backboard Line Candidate Detection

After obtaining the white pixels, the system has to identify the court lines and the top boundary of the backboard. A standard Hough transform on the set of previously detected white pixels is used to detect these line candidates. As depicted in Fig. 3-24, the parameter space used to represent the lines is (θ, d), where θ is the angle between the line normal and the horizontal axis, and d is the distance of the line to the origin. We construct an accumulator matrix over all (θ, d), sampled at a resolution of one degree for θ and one pixel for d. As Fig. 3-25 shows, since a line in (x, y) space corresponds to a point in (θ, d) space, line candidates are determined by extracting the local maxima of the accumulator array.

Fig. 3-24 Hough transform for straight lines. (A line L in the image is parameterized as x·cos θ + y·sin θ = d.)
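For illustration, the accumulator construction and local-maximum extraction described above could look like the following sketch; the vote threshold is a placeholder value:

```python
import numpy as np

def hough_lines(white_pixels, height, width, min_votes=50):
    """Vote in (theta, d) space at one-degree / one-pixel resolution
    and return the local maxima as line candidates (theta, d)."""
    d_max = int(np.hypot(height, width))
    acc = np.zeros((180, 2 * d_max + 1), dtype=np.int32)
    thetas = np.deg2rad(np.arange(180))
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in white_pixels:
        # Each pixel votes for one d per theta; shift d to a non-negative index.
        d = np.round(x * cos_t + y * sin_t).astype(int) + d_max
        acc[np.arange(180), d] += 1
    candidates = []
    for i in range(1, 179):
        for j in range(1, 2 * d_max):
            v = acc[i, j]
            if v >= min_votes and v == acc[i-1:i+2, j-1:j+2].max():
                candidates.append((np.deg2rad(i), j - d_max))
    return candidates
```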

The Hough transform has the disadvantage that thick lines in the input image usually result in a bundle of detected lines that all lie close together. Another disadvantage is that the accuracy of the determined line parameters depends on the resolution of the accumulator matrix. This problem cannot be solved simply by increasing the resolution of the accumulator matrix, since that also causes the inexact parameter samples of an input line to spread over a larger area of the accumulator. We solve both problems by introducing a further step after the Hough transform that improves the accuracy of the detected line parameters by computing the best-fit line to the input data. Furthermore, lines whose parameters are nearly equal are considered duplicates, and all but one of them are removed.
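One way to realize this refinement is a total-least-squares refit of each Hough candidate to the white pixels lying near it; the tolerance below is an illustrative value:

```python
import numpy as np

def refine_line(points, theta0, d0, tol=2.0):
    """Refine a Hough candidate (theta0, d0) by fitting the best line
    through the white pixels within tol pixels of the candidate."""
    pts = np.asarray(points, dtype=np.float64)
    n0 = np.array([np.cos(theta0), np.sin(theta0)])
    inliers = pts[np.abs(pts @ n0 - d0) < tol]
    if len(inliers) < 2:
        return theta0, d0  # nothing to refine
    # The best-fit normal is the eigenvector of the scatter matrix with
    # the smallest eigenvalue; the line passes through the centroid.
    centroid = inliers.mean(axis=0)
    w, V = np.linalg.eigh(np.cov((inliers - centroid).T))
    n = V[:, 0]
    theta, d = np.arctan2(n[1], n[0]), centroid @ n
    if d < 0:  # keep the d >= 0 convention
        theta, d = theta + np.pi, -d
    return theta, d
```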

With all the line candidates, we can obtain six intersections of the court lines, as indicated in Fig. 3-26. However, we need two more points on the backboard to calculate the camera parameters.

Fig. 3-26 Six intersections of the court line candidates.

As we can see in Fig. 3-26, the lighting conditions and the material of the backboard usually make the white pixels distinguishable only at the top of the backboard. If we can obtain the start point and end point of the backboard top-line, we can calculate the camera parameters. Unfortunately, the white top-line of the backboard is short in comparison with the court lines, which results in its elimination from the line candidates during the Hough Transform step. To solve this problem, we use only the pixels in the top quarter of the frame to detect the backboard line, and then compute the line segment boundaries to find where the line starts and ends. The algorithm for finding the line segment boundaries is described as follows.

Fig. 3-27 Detection of line-segment boundaries.

Fig. 3-28 Boundaries of the backboard top-line.

Scanning along the detected line, a sequence of white (top-line) pixels and black (non-top-line) pixels is obtained. Because of classification errors and occlusions, this sequence is noisy. Suppose the line segment starts at position start and ends at position end; we define the number of errors as the number of black pixels inside the range [start, end] plus the number of white pixels outside that range. Using this error definition, we place the line segment boundaries such that the number of errors is minimized. This optimization has linear time complexity, and the result is shown in Fig. 3-28.
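Minimizing this error count is equivalent to a maximum-subarray problem, which is why the optimization runs in linear time. A sketch of one such linear-time search (not necessarily the thesis's exact procedure):

```python
def best_segment(pixels):
    """Place [start, end] on a scan line of booleans (True = white) so
    that the error count (black inside + white outside) is minimal.
    Scoring +1 for white and -1 for black inside the segment turns the
    problem into maximum-subarray, solved by Kadane's algorithm."""
    best_sum = cur_sum = 0
    best, cur_start = (0, -1), 0  # (0, -1) means an empty segment
    for i, is_white in enumerate(pixels):
        cur_sum += 1 if is_white else -1
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i)
        if cur_sum < 0:  # restart after a hopeless prefix
            cur_sum, cur_start = 0, i + 1
    return best
```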

3.5.3 Model Fitting

With the intersections of the court lines and the boundaries of the backboard top-line found in the first frame, we can match the eight points to the court model and calculate the camera parameters, as Fig. 3-29 shows.

Fig. 3-29 Match the eight points to the court model and calculate the camera parameters.

3.5.4 Court Parameter Refinement

The previous calibration algorithm needs to be applied only in the bootstrapping process, when the first frame of a new shot is processed. For subsequent frames, we can assume that the acceleration of the camera motion is small. This enables prediction of the camera parameters for the next frame. Since the prediction provides a good first estimate of the camera parameters, a simplified version of the above algorithm can be applied.

Fig. 3-30 Camera parameter prediction.

As Fig. 3-30 shows, $H_t$ denotes the camera parameters for frame t. If we know the camera parameters for frames t and t − 1, we can predict the camera parameters $\hat{H}_{t+1}$ for frame t + 1 by

$$\hat{H}_{t+1}=H_t H_{t-1}^{-1}H_t.$$

The non-linear Levenberg-Marquardt minimization algorithm can then be used to find the new camera parameters [21].
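In code, the prediction itself is a single matrix expression (a sketch; the Levenberg-Marquardt refinement is omitted here):

```python
import numpy as np

def predict_camera(H_prev, H_curr):
    """Constant-velocity prediction of the next frame's parameters:
    H_hat(t+1) = H(t) * inv(H(t-1)) * H(t). The result serves as the
    starting point for Levenberg-Marquardt refinement."""
    return H_curr @ np.linalg.inv(H_prev) @ H_curr
```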

However, the court lines are too complex and varied when the basketball video contains camera motion, which causes difficulty in camera parameter refinement. Since tracking the camera parameters is a bottleneck, we only analyze clips without camera motion to estimate the 3D trajectory. From the 2D trajectories obtained in the ball-tracking step, we keep those that pass through the backboard. As represented in Fig. 3-31, the four 2D image points of the backboard are marked as A, B, C and D; they can be derived from the 3D real-world locations. If the parabola of the 2D trajectory passes through the minimum bounding rectangle of the backboard, it is a possible shooting trajectory.
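A simple version of this filter, assuming the 2D trajectory has been fitted as a parabola v = a·u² + b·u + c in image coordinates (a representation the thesis does not spell out), is sketched below:

```python
def passes_backboard(parabola, rect):
    """Check whether the fitted 2D parabola v = a*u^2 + b*u + c enters
    the minimum bounding rectangle of the backboard points A, B, C, D.
    rect is ((u_min, v_min), (u_max, v_max)) in image coordinates."""
    a, b, c = parabola
    (u_min, v_min), (u_max, v_max) = rect
    for u in range(int(u_min), int(u_max) + 1):
        v = a * u * u + b * u + c
        if v_min <= v <= v_max:
            return True
    return False
```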

Fig. 3-31 Extract possible 2D shooting trajectory.

The relationship between each pair of corresponding points in the 2D and 3D space is:

$$u=\frac{c_{11}X_C+c_{12}Y_C+c_{13}Z_C+c_{14}}{c_{31}X_C+c_{32}Y_C+c_{33}Z_C+1},\qquad v=\frac{c_{21}X_C+c_{22}Y_C+c_{23}Z_C+c_{24}}{c_{31}X_C+c_{32}Y_C+c_{33}Z_C+1}\tag{11}$$

where (u, v) is in the 2D image coordinates and (X_C, Y_C, Z_C) is in the 3D real-world coordinates. Moreover, the 3D ball trajectory should fit the physical model:

$$x(t)=x_0+V_x t,\qquad y(t)=y_0+V_y t,\qquad z(t)=z_0+V_z t-\tfrac{1}{2}g t^{2}\tag{12}$$

where (x_0, y_0, z_0) is the initial position of the ball in the 3D coordinates, (V_x, V_y, V_z) is the velocity of the ball in the 3D coordinates, g is the acceleration of gravity, and t is the current time.

Since the eleven camera calibration parameters and the time of each point on the trajectory are known, we can calculate the six unknowns (x_0, y_0, z_0, V_x, V_y, V_z) of the parabola from three or more points on the 2D trajectory. Fig. 3-32 indicates the three points that we choose to calculate (x_0, y_0, z_0, V_x, V_y, V_z). With the camera parameter matrix C and the six physical parameters, we can extract the 3D trajectory and take the starting point of the 3D trajectory as the shot position.
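Because the ball position in Eq. (12) is linear in the six unknowns, substituting it into Eq. (11) yields two linear equations per observed point, so three points give a solvable system. A numpy sketch, assuming the vertical axis is Z and the camera matrix C uses the same units as g:

```python
import numpy as np

G = 9.8  # acceleration of gravity (m/s^2)

def solve_trajectory(C, observations):
    """Solve (x0, y0, z0, Vx, Vy, Vz) from three or more observed 2D
    trajectory points, given the 3x4 camera matrix C of Eq. (11).
    observations is a list of (t, u, v) with t the time of each point;
    Z(t) = z0 + Vz*t - 0.5*g*t^2 follows Eq. (12)."""
    A, b = [], []
    for t, u, v in observations:
        known_z = -0.5 * G * t * t  # gravity term, independent of unknowns
        for row, w in ((C[0], u), (C[1], v)):
            # From w * (C[2] . P) = row . P with P = (X, Y, Z, 1), where
            # (X, Y, Z) is linear in the six unknowns:
            coef = row[:3] - w * C[2, :3]
            A.append(np.concatenate([coef, coef * t]))
            b.append((w * C[2, 3] - row[3]) - coef[2] * known_z)
    sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return sol  # (x0, y0, z0, Vx, Vy, Vz)
```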

Fig. 3-32 Choose three points on the 2D trajectory.

Chapter 4

Experiment

In this chapter, we present the experimental results of the proposed system. We detect scene changes of MPEG testing sequences in the compressed domain. For the shot classification and tactic analysis steps, we use AVI sequences and implement the analysis in the pixel domain. The resolution of all sequences is 360×240. Section 4.1 shows the results of scene change detection and shot classification. In sections 4.2 and 4.3, the outcomes of 2D ball trajectory extraction and camera calibration are illustrated, respectively. Finally, the 3D shooting position is presented.

4.1 Experimental Result of Scene Change Detection and Shot Classification

We use two basketball videos of the HBL (High-school Basketball League) to test the scene change detection and shot classification algorithms. The first video is a 15-minute basketball video containing 96 shots (37 close-up view shots, 27 medium view shots, and 32 full-court view shots), and the other is 10 minutes long and contains 71 shots (26 close-up view shots, 24 medium view shots, and 21 full-court view shots). Table 1 shows the classification results.

From Table 1, the accuracy of our shot classification algorithm is about 95.2% (the number of correctly classified shots divided by the total number of shots). The misses and false detections may be caused by the angle of view. For instance, if a real full-court view shot contains a large portion of spectators, the ratio of the court's dominant color will be lower, which results in a wrong classification.

               Close-up          Medium           Full-court
               Seq. 1  Seq. 2    Seq. 1  Seq. 2   Seq. 1  Seq. 2
Ground Truth     37      26        27      24       32      21
No. of Miss       1       2         2       2        0       1
No. of False      0       1         1       3        2       1

Table 1 Shot classification results of the two testing sequences. Sequence 1 is a 15-minute basketball video containing 96 shots, and sequence 2 is a 10-minute basketball video containing 71 shots.

4.2 Experimental Result of Tracking the Ball

Using the proposed ball candidate search and tracking methods, we can obtain 2D trajectories from the full-court view shots. Fig. 4-1 is the tracking result of a shot without camera motion, and Fig. 4-2 is the tracking result of a shot with camera motion. Whether or not the video is shot by a stationary camera, we can obtain its possible 2D trajectories.

Fig. 4-2 The tracking result of a shot with camera motion.

4.3 Experimental Result of Camera Calibration and Shooting Position

In this section, we use only the clips without camera motion to test the camera calibration algorithm. As Fig. 4-3 shows, the locations of the points for camera calibration and the backboard position can be derived from the image. Therefore, the real shooting trajectory, presented by solid circles, can be identified as shown in Fig. 4-4. Using the transformation from 2D coordinates to 3D coordinates, we can obtain the shot position. Fig. 4-5 indicates the 3D shooting position by a red point.

Fig. 4-3 The 2D location of the points for camera calibration and the backboard position.

Fig. 4-4 The real 2D ball trajectory.

Fig. 4-5 The obtained shooting position in 3D court model.

Chapter 5

Conclusion and Future Work

Sport event detection has been proposed in previous research. However, those events only provide the audience with a more efficient way to browse sport videos.

We propose a system that can automatically detect the scene changes of a basketball video and classify the clips into three kinds of shots. From the full-court-view shots, we can track the ball, detect the court-line and backboard positions, and define the transformation from the 2D image to the 3D real-world court model. After mapping the position of the ball from the images to the court model, the system infers the possible shooting positions.

Analyzing tactics in basketball video is difficult due to the variation of view angles, the complexity of the background, and the intricacy of the court lines. Our ball tracking method can be used for any full-court view shot, with or without camera motion. However, the camera calibration algorithm can only be applied to clips without camera motion.

Since the camera is not fixed, the resulting shooting positions might not be accurate enough. Future work can concentrate on videos shot by a stationary camera so that the system will be more reliable. Tracking players in the video is difficult because occlusion occurs when players get close to each other. If we can propose a more effective and efficient tracking algorithm, we could gather more statistics to analyze the behavior of the players in the games. Furthermore, we could derive useful knowledge, such as defense rankings and offense tactics, for professional basketball players and coaches who need more detailed information about the game.

Bibliography

[1] G. Lu, "Communication and Computing for Distributed Multimedia Systems," Artech House, Norwood, MA, 1996.

[2] A. Puri, R. L. Schmidt, and B. G. Haskell, "Overview of the MPEG Standards," edited by A. Puri and T. Chen, Marcel Dekker Inc., New York/Basel, 2000.

[3] Y. Gong, L. T. Sin, C. H. Chuan, H. Zhang, and M. Sakauchi, "Automatic Parsing of TV Soccer Programs," IEEE International Conference on Multimedia Computing and Systems, pp. 167-174, 1995.

[4] Y. P. Tan, D. D. Saur, S. R. Kulkarni, and P. J. Ramadge, "Rapid Estimation of Camera Motion from Compressed Video with Application to Video Annotation," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, Issue 1, pp. 133-146, 2000.

[5] D. Zhong and S. F. Chang, "Structure Analysis of Sports Video Using Domain Models," IEEE International Conference on Multimedia and Expo, pp. 713-716, 2001.

[6] L. Xie, S. F. Chang, A. Divakaran, and H. Sun, "Structure Analysis of Soccer Video with Hidden Markov Models," International Conference on Acoustic, Speech, and Signal Processing, Vol. 4, pp. 4096-4099, 2002.

[7] G. Sudhir, J. C. M. Lee, and A. K. Jain, "Automatic Classification of Tennis Video for High-Level Content-Based Retrieval," IEEE International Workshop on Content-Based Access of Image and Video Databases, pp. 81-90, 1998.

[8] W. Hua, M. Han, and Y. Gong, "Baseball Scene Classification Using Multimedia Features," IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 821-824, 2002.

[9] J. Assfalg, M. Bertini, A. D. Bimbo, W. Nunziati, and P. Pala, "Soccer Highlights Detection and Recognition Using HMMs," IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 825-828, 2002.

[10] C. W. Ngo, T. C. Pong, and H. J. Zhang, "On Clustering and Retrieval of Video Shots," ACM Multimedia, pp. 51-60, 2001.

[11] J. Assfalg, M. Bertini, C. Colombo, and A. D. Bimbo, "Semantic Annotation of Sports Videos," IEEE Multimedia, Vol. 9, Issue 2, pp. 52-60, 2002.

[12] Y. Gong, C. Hock-Chuan, and L. T. Sin, "An Automatic Video Parser for TV Soccer Games," The 2nd Asian Conference on Computer Vision, Vol. 2, pp. 509-513, 1995.

[13] D. Yow, B. L. Yeo, M. Yeung, and B. Liu, "Analysis and Presentation of Soccer Highlights from Digital Video," The 2nd Asian Conference on Computer Vision, pp. 499-503, 1995.

[14] T. Taki, J. Hasegawa, and T. Fukumura, "Development of Motion Analysis System for Quantitative Evaluation of Teamwork in Soccer Games," International Conference on Image Processing, Vol. 3, pp. 815-818, 1996.

[15] Y. Seo, S. Choi, H. Kim, and K. S. Hong, "Where Are the Ball and Players? Soccer Game Analysis with Color-Based Tracking and Image Mosaick," The 9th International Conference on Image Analysis and Processing, Vol. 2, pp. 196-203, 1997.

[16] Y. Ohno, J. Miura, and Y. Shirai, "Tracking Players and a Ball in Soccer Games," IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 147-152, 1999.

[17] Y. Ohno, J. Miura, and Y. Shirai, "Tracking Players and Estimation of the 3D Position of a Ball in Soccer Games," IEEE International Conference on Pattern Recognition, Vol. 1, pp. 145-148, 2000.

[18] H. Kim and K. Hong, "Robust Image Mosaicing of Soccer Videos Using Self-calibration and Line Tracking," Pattern Analysis & Applications, Vol. 4, pp. 9-19, 2001.

[19] T. Watanabe, M. Haseyama, and H. Kitajima, "A Soccer Field Tracking Method With Wire Frame Model From TV Images," IEEE International Conference on Image Processing, Vol. 3, pp. 1633-1636, 2004.

[20] C. Calvo, A. Micarelli, and E. Sangineto, "Automatic Annotation of Tennis Video Sequences," The 24th DAGM Symposium on Pattern Recognition, Vol. 2449, pp. 540-547, Springer, 2002.

[21] D. Farin, S. Krabbe, P. H. N. de With, and W. Effelsberg, "Robust Camera Calibration for Sport Videos Using Court Models," SPIE Storage and Retrieval Methods and Applications for Multimedia, Vol. 5307, pp. 80-91, 2004.

[22] M. Xu, L. Y. Duan, C. Xu, M. Kankanhalli, and Q. Tian, "Event Detection in Basketball Video Using Multiple Modalities," IEEE Joint Conference of the Fourth International Conference on Information, Communications, and Signal Processing, Vol. 3, pp. 1526-1530, 2003.

[23] A. Ekin and A. M. Tekalp, "Generic Play-break Event Detection for Summarization and Hierarchical Sports Video Analysis," IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 169-172, 2003.

[24] A. Ekin and A. M. Tekalp, "Shot Type Classification by Dominant Color for Sports Video Segmentation and Summarization," IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 173-176, 2003.

[25] G. Xu, Y. F. Ma, H. J. Zhang, and S. Yang, "An HMM Based Semantic Analysis Framework for Sports Game Event Detection," IEEE International Conference on Image Processing, Vol. 1, pp. 25-28, 2003.

[26] C. Y. Wu, "Video Content Representation and Indexing Using Hierarchical Structure," Master thesis, National Chiao Tung University, Dept. of CSIE, 2000.

[27] B. L. Yeo and B. Liu, "Rapid Scene Analysis on Compressed Video," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, Issue 6, pp. 533-544, 1995.

[28] A. Ekin and A. M. Tekalp, "Robust Dominant Color Region Detection and Color-based Applications for Sports Video," IEEE International Conference on Image Processing, 2003.
