A spatial-color mean-shift object tracking algorithm with scale and orientation estimation
Chung-Wei Juan, Jyun-Ji Wang
Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, ROC
Article info

Received 30 October 2007; received in revised form 26 May 2008; available online 22 August 2008. Communicated by Y. Ma.

Keywords: Mean-shift; Object tracking; Principal component analysis; Object deformation

Abstract
In this paper, an enhanced mean-shift tracking algorithm using a joint spatial-color feature and a novel similarity measure function is proposed. The target image is modeled with kernel density estimation, and new similarity measure functions are developed using the expectation of the estimated kernel density. With these new similarity measure functions, two similarity-based mean-shift tracking algorithms are derived. To enhance robustness, weighted-background information is added into the proposed tracking algorithm. Further, to cope with the object deformation problem, the principal components of the variance matrix are computed to update the orientation of the tracking object, and the corresponding eigenvalues are used to monitor the scale of the object. The experimental results show that the new similarity-based tracking algorithms can be implemented in real-time and are able to track the moving object with an automatic update of the orientation and scale changes.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction

In visual tracking, object representation is an important issue because it describes the correlation between the appearance and the state of the object. An appropriate object representation makes the target model more distinguishable from the background and achieves a better tracking result. Comaniciu et al. (2003) used spatial kernels weighted by a radially symmetric normalized distance from the object center to represent blob objects. This representation makes mean-shift tracking more efficient. The radially symmetric kernel preserves the representation of a pixel's distance from the center even when the object undergoes a large set of transformations, but this approach only contains the color information of the target; the spatial information is discarded. Parameswaran et al. (2006) proposed tunable kernels for tracking, which simultaneously encode appearance and geometry and enable the use of mean-shift iterations. A method was presented to modulate the feature histogram of the target using a set of spatial kernels with different bandwidths to encode the spatial information. Under certain conditions, this approach can distinguish blocks that have similar color distributions but different spatial configurations.
Another problem in visual tracking is how to track the scale of the object. In the work by Comaniciu et al. (2003), the mean-shift algorithm is run several times, once for each different window size, and the Bhattacharyya coefficient similarity measure is computed for comparison. The window size yielding the largest Bhattacharyya coefficient, i.e. the most similar distribution, is chosen as the updated scale. Parameswaran et al. (2006), Birchfield and Rangarajan (2005) and Porikli and Tuzel (2005) use a similar variation method to solve the scale problem. However, this method is not always stable and easily makes the tracker lose the target. Collins (2003) extended the mean-shift tracker by adapting Lindeberg's theory (Lindeberg, 1998) of feature scale selection based on local maxima of differential scale-space filters. It uses blob tracking and a scale kernel to accurately capture the target's variation in scale, but the detailed iteration method was not described in the paper. An EM-like algorithm (Zivkovic and Krose, 2004) was proposed to simultaneously estimate the position of the local mode, using the covariance matrix to describe the approximate shape of the object. However, implementation details, such as deciding the scale size from the covariance matrix, were not given. Other attempts were made to study different representation methods. Zhang et al. (2004) represented the object by a kernel-based model, which offers a more accurate spatial-spectral description than general blob models. Later, they extended the work to cope with the scaling and rotation problem under the assumption of affine transformation (Zhang et al., 2005). Zhao and Tao (2005) proposed the color correlogram, which uses the correlation of colors, to solve the related problem. However, these methods did not consider the influence of complex backgrounds.
This work extends the traditional mean-shift tracking algorithm to improve the performance of arbitrary object tracking. At the same time, the proposed method estimates the scale and orientation of the target. This idea is similar to the CAMSHIFT
* Corresponding author. Tel.: +886 3 5712121x54318; fax: +886 3 5715998. E-mail address: email@example.com (J.-S. Hu).
Pattern Recognition Letters
algorithm (Bradski, 1998), except that spatial probability information as well as background influence are considered. The subject of this paper is divided into two parts. The first part develops the new spatial-color mean-shift trackers for the purpose of capturing the target more accurately than the traditional mean-shift tracker. The second part develops a method for solving the scale and orientation problem mentioned above. The solution, though it seems straightforward, has never been proposed in the literature. The effectiveness proved by experiments shows a further enhancement of the mean-shift algorithm.
2. Model deﬁnition
Birchfield and Rangarajan (2005) proposed the concept of the spatial histogram, or spatiogram, in which each histogram bin contains the mean and covariance information of the locations of the pixels belonging to that bin. This idea involves the spatially weighted mean and covariance of the pixel locations. The spatiogram captures the spatial information of the general histogram bins. However, as shown in Fig. 1, if cyan and blue belong to the same bin, these two blocks have the same spatiogram, even though they have different color patterns.
Let the image of interest have M pixels, and let the associated color space be classified into B bins. For example, in RGB color space, if each color channel is divided into 8 intervals, the total number of bins is 512. The image can be described as $I_x = \{x_i, c_{x_i}, b_{x_i}\}_{i=1,\ldots,M}$, where $x_i$ is the location of pixel i with color feature vector $c_{x_i}$, which belongs to the $b_{x_i}$th bin. The color feature vector has dimension d, the number of color channels of the pixel (for example, in RGB color space, d = 3 and $c_{x_i} = (R_{x_i}, G_{x_i}, B_{x_i})$). To keep the robustness of the color description of the spatiogram, we extend the spatiogram and define a new joint spatial-color model of the image $I_x$ as

$$h_{I_x}(b) = \left\langle n_b,\; \mu_{P,b},\; \Sigma_{P,b},\; \mu_{C,b},\; \Sigma_{C,b} \right\rangle, \quad b = 1, \ldots, B \qquad (1)$$
$n_b$, $\mu_{P,b}$, and $\Sigma_{P,b}$ are the same as in the spatiogram proposed by Birchfield and Rangarajan (2005). Namely, $n_b$ is the number of pixels, $\mu_{P,b}$ the mean vector of pixel locations, and $\Sigma_{P,b}$ the covariance matrix of pixel locations belonging to the bth bin. In (1), we add two additional elements: $\mu_{C,b}$ is the mean vector of the color feature vectors and $\Sigma_{C,b}$ is the associated covariance matrix.
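To make the model concrete, the following sketch builds the joint spatial-color model of Eq. (1) with NumPy. The function name, the dictionary layout, and the uniform 8-interval RGB quantization are illustrative assumptions, not code from the paper.

```python
import numpy as np

def build_model(pixels_xy, colors, n_intervals=8):
    """Sketch of the joint spatial-color model h_Ix(b) of Eq. (1): for each
    occupied color bin b, store the pixel count n_b, the spatial mean and
    covariance (mu_P, Sigma_P), and the color mean and covariance
    (mu_C, Sigma_C).  All names are illustrative."""
    step = 256 // n_intervals
    q = colors // step                               # quantized RGB channels
    bin_idx = (q[:, 0] * n_intervals + q[:, 1]) * n_intervals + q[:, 2]

    model = {}
    for b in np.unique(bin_idx):
        mask = bin_idx == b
        xy = pixels_xy[mask].astype(float)
        c = colors[mask].astype(float)
        model[int(b)] = {
            "n": int(mask.sum()),
            "mu_P": xy.mean(axis=0),
            "Sigma_P": np.cov(xy.T, bias=True),      # spatial covariance
            "mu_C": c.mean(axis=0),
            "Sigma_C": np.cov(c.T, bias=True),       # color covariance
        }
    return model
```

Only occupied bins are stored, which keeps the model sparse even though B = 512.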
3. Spatial-color mean-shift object tracking algorithm
Using the spatial-color feature and the concept of expectation (Yang et al., 2005), two different tracking algorithms are proposed as follows.
3.1. Spatial-color mean-shift tracking algorithm (tracker 1)
The p.d.f. of a selected pixel $(x, c_x, b_x)$ in the image model $h_{I_x}(b)$ (see (1)) can be estimated using a kernel density function:

$$p(x, c_x, b_x) = \frac{1}{B} \sum_{b=1}^{B} K_P\!\left(x - \mu_{P,b};\, \Sigma_{P,b}\right) K_C\!\left(c_x - \mu_{C,b};\, \Sigma_{C,b}\right) \delta(b_x - b) \qquad (2)$$

where $K_P$ and $K_C$ are multivariate Gaussian kernel functions and can be regarded as the spatially weighted and color-feature weighted functions, respectively. It is also possible to use another smooth kernel (Yang et al., 2005). Using the concept of the expectation of the estimated kernel density, we can define a new similarity measure function between the model $h_{I_x}(b)$ and a candidate $I_y = \{y_j, c_{y_j}, b_{y_j}\}_{j=1,\ldots,N}$ as

$$J(I_x, I_y) = J(y) = \frac{1}{N} \sum_{j=1}^{N} p(y_j, c_{y_j}, b_{y_j}) = \frac{1}{NB} \sum_{j=1}^{N} \sum_{b=1}^{B} K_P\!\left(y_j - \mu_{P,b};\, \Sigma_{P,b}\right) K_C\!\left(c_{y_j} - \mu_{C,b};\, \Sigma_{C,b}\right) \delta(b_{y_j} - b) \qquad (3)$$
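As a sketch, the expected-kernel-density similarity of Eq. (3) can be evaluated directly. The Gaussian helper, the triple-list data layout, and the small ridge added to keep singular bin covariances invertible are our assumptions.

```python
import numpy as np

def gauss(d, cov):
    """Multivariate Gaussian kernel evaluated at offset d with covariance cov.
    The small ridge guards against singular bin covariances (our addition)."""
    cov = np.asarray(cov, float) + 1e-6 * np.eye(len(d))
    norm = 1.0 / ((2.0 * np.pi) ** (len(d) / 2.0) * np.sqrt(np.linalg.det(cov)))
    return norm * float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

def similarity(candidate, model, B):
    """Eq. (3): J(y) = (1/NB) sum_j K_P(y_j - mu_P,b) K_C(c_j - mu_C,b),
    with b = b_{y_j}.  `candidate` is a list of (location, color, bin)
    triples; `model` maps a bin to (mu_P, Sigma_P, mu_C, Sigma_C)."""
    total = 0.0
    for yj, cj, bj in candidate:
        if bj in model:                  # delta(b_yj - b) selects one bin
            mu_P, S_P, mu_C, S_C = model[bj]
            total += gauss(np.asarray(yj, float) - mu_P, S_P) * \
                     gauss(np.asarray(cj, float) - mu_C, S_C)
    return total / (len(candidate) * B)
```

Candidates whose pixels sit near the stored per-bin spatial and color means score higher, as the measure intends.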
The spatial-color model in (2) under measures like (3) might be sensitive to small spatial changes. This problem was discussed by O'Conaire et al. (2007) and Birchfield and Rangarajan (2007). However, this model also offers the advantage of orientation estimation. As shown in Fig. 2, if there is no deformation between candidate and target, and the distance of motion between two adjacent frames is not excessively large, we can consider the motion of the object between the two frames as a pure translation. Under these assumptions, each pixel's offset from the target center $x_0$ is preserved relative to the candidate center y in the candidate image. As a result, we can normalize the pixel locations and obtain the new similarity measure function:

$$J(y) = \frac{1}{NB} \sum_{j=1}^{N} \sum_{b=1}^{B} K_P\!\left(y_j - y - (\mu_{P,b} - x_0);\, \Sigma_{P,b}\right) K_C\!\left(c_{y_j} - \mu_{C,b};\, \Sigma_{C,b}\right) \delta(b_{y_j} - b) \qquad (4)$$
The best candidate for matching can be found by computing the maximum value of the similarity measure. Setting the gradient of the similarity function with respect to the vector y equal to 0, i.e., $\nabla J(y) = 0$, we can obtain the new position $y_{new}$ of the target:

$$\nabla J(y) = 0 \;\Rightarrow\; \frac{1}{NB} \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} \left(y_j - y - \mu_{P,b} + x_0\right) K_P K_C\, \delta(b_{y_j} - b) = 0$$

$$\Rightarrow\; \left\{ \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} K_P K_C\, \delta(b_{y_j} - b) \right\} (y - x_0) = \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} \left(y_j - \mu_{P,b}\right) K_P K_C\, \delta(b_{y_j} - b)$$
Fig. 1. Illustration of the same spatial information with different color distribution for one bin.
Fig. 2. Pure translation between the target center x_0 and the candidate center y (bin means μ_{P,b}).
$$y - x_0 = \left\{ \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} K_P K_C\, \delta(b_{y_j} - b) \right\}^{-1} \left\{ \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} \left(y_j - \mu_{P,b}\right) K_P K_C\, \delta(b_{y_j} - b) \right\} \qquad (5)$$

As a result, the new position $y_{new}$ is described as (6):

$$y_{new} = \left\{ \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} K_P K_C\, \delta(b_{y_j} - b) \right\}^{-1} \left\{ \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} \left(y_j - \mu_{P,b}\right) K_P K_C\, \delta(b_{y_j} - b) \right\} + x_0 \qquad (6)$$

where

$$K_P = K_P\!\left(y_j - y_{old} - \mu_{P,b} + x_0;\, \Sigma_{P,b}\right) = \frac{1}{2\pi \left|\Sigma_{P,b}\right|^{1/2}}\, e^{-\frac{1}{2}\left(y_j - y_{old} - \mu_{P,b} + x_0\right)^{T} \Sigma_{P,b}^{-1} \left(y_j - y_{old} - \mu_{P,b} + x_0\right)} \qquad (7)$$

and

$$K_C = K_C\!\left(c_{y_j} - \mu_{C,b};\, \Sigma_{C,b}\right) = \frac{1}{(2\pi)^{3/2} \left|\Sigma_{C,b}\right|^{1/2}}\, e^{-\frac{1}{2}\left(c_{y_j} - \mu_{C,b}\right)^{T} \Sigma_{C,b}^{-1} \left(c_{y_j} - \mu_{C,b}\right)} \qquad (8)$$

Eq. (6) is the mean-shift vector as well as an iterative function with respect to y. In the sequel, we define $y_{old}$ as the current position.
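A minimal sketch of one tracker-1 iteration, Eqs. (6)-(8): both kernel weights are evaluated at the current position y_old, and the linear system from Eq. (5) is solved for the shift. The data layout, helper names, and covariance regularization are assumptions.

```python
import numpy as np

def gauss(d, cov):
    """Gaussian kernel of Eqs. (7)/(8) at offset d (ridge added for safety)."""
    cov = np.asarray(cov, float) + 1e-6 * np.eye(len(d))
    norm = 1.0 / ((2.0 * np.pi) ** (len(d) / 2.0) * np.sqrt(np.linalg.det(cov)))
    return norm * float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

def tracker1_step(y_old, x0, candidate, model):
    """One mean-shift update of Eq. (6).  `candidate` is a list of
    (y_j, c_j, b_j) triples; `model` maps bin b to
    (mu_P, Sigma_P, mu_C, Sigma_C); x0 is the target-model center."""
    A = np.zeros((2, 2))          # sum of Sigma_P^-1 * K_P * K_C
    v = np.zeros(2)               # sum of Sigma_P^-1 (y_j - mu_P) * K_P * K_C
    for yj, cj, bj in candidate:
        if bj not in model:
            continue
        mu_P, S_P, mu_C, S_C = model[bj]
        S_inv = np.linalg.inv(np.asarray(S_P, float) + 1e-6 * np.eye(2))
        w = gauss(yj - y_old - mu_P + x0, S_P) * gauss(cj - mu_C, S_C)  # (7)*(8)
        A += w * S_inv
        v += w * (S_inv @ (yj - mu_P))
    return np.linalg.solve(A, v) + x0           # Eq. (6)
```

Iterating `tracker1_step` until the displacement is small moves y toward the best-matching window center.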
3.2. Tracking algorithm with reduced complexity (tracker 2)
Based on the definition of the p.d.f. in (2), the kernel density functions of (7) and (8) have to be computed in each iterative step during tracking. It is therefore possible to reduce the computational complexity when the variation of the target image is small. Rewrite (2) as

$$p(b) = \frac{1}{M} \sum_{i=1}^{M} K_P\!\left(x_i - \mu_{P,b(i)};\, \Sigma_{P,b(i)}\right) K_C\!\left(c_{x_i} - \mu_{C,b(i)};\, \Sigma_{C,b(i)}\right) \delta(b - b(i)) \qquad (9)$$

where b(i) is the color bin to which pixel i belongs. We can derive new kernel density estimation functions as

$$K_P\!\left(x_i - \mu_{P,b(i)};\, \Sigma_{P,b(i)}\right) = \frac{1}{2\pi \left|\Sigma_{P,b(i)}\right|^{1/2}}\, e^{-\frac{1}{2}\left(x_i - \mu_{P,b(i)}\right)^{T} \Sigma_{P,b(i)}^{-1} \left(x_i - \mu_{P,b(i)}\right)} \qquad (10)$$

$$K_C\!\left(c_{x_i} - \mu_{C,b(i)};\, \Sigma_{C,b(i)}\right) = \frac{1}{(2\pi)^{3/2} \left|\Sigma_{C,b(i)}\right|^{1/2}}\, e^{-\frac{1}{2}\left(c_{x_i} - \mu_{C,b(i)}\right)^{T} \Sigma_{C,b(i)}^{-1} \left(c_{x_i} - \mu_{C,b(i)}\right)} \qquad (11)$$
$K_P$ and $K_C$ are again the spatially weighted and color-feature weighted functions, and they depend only on the image model. Using the same concept of the expectation of the estimated kernel density, another similarity measure function between the model and a candidate $I_y = \{y_j, c_{y_j}, b_{y_j}\}_{j=1,\ldots,N}$ can be defined as

$$J(I_x, I_y) = J(y) = \frac{1}{N} \sum_{j=1}^{N} G(e_{y_j})\, p(b_{y_j}) \qquad (12)$$

where $e_{y_j} = \frac{1}{a}(y - y_j)$, a is a scale (bandwidth) parameter, and y is the center of the candidate image. $G(e_{y_j})$ is a spatially weighted function depending on the candidate pixel locations $\{y_j\}_{j=1,\ldots,N}$ and the center y.
The best candidate is obtained by ﬁnding the maximum value of the similarity measure, i.e.,
$$\nabla J(y) = 0 \;\Rightarrow\; \frac{1}{aN} \sum_{j=1}^{N} (y - y_j)\, G'(e_{y_j})\, p(b_{y_j}) = 0 \;\Rightarrow\; y \sum_{j=1}^{N} G'(e_{y_j})\, p(b_{y_j}) = \sum_{j=1}^{N} y_j\, G'(e_{y_j})\, p(b_{y_j}) \;\Rightarrow\; y = \frac{\sum_{j=1}^{N} y_j\, G'(e_{y_j})\, p(b_{y_j})}{\sum_{j=1}^{N} G'(e_{y_j})\, p(b_{y_j})} \qquad (13)$$
The spatially weighted term $G'(e_{y_j})$ can be derived by choosing the function G as the Epanechnikov kernel function:

$$K(x) = \begin{cases} \frac{1}{2} C_d^{-1} (d+2) \left(1 - \|x\|^2\right), & \text{if } \|x\| < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (14)$$

where d is the dimension of the space and $C_d$ is the volume of the unit d-dimensional sphere. Letting $K(x) = k(\|x\|^2)$, we obtain

$$k(x) = \begin{cases} \frac{1}{2} C_d^{-1} (d+2) (1 - x), & \text{if } x < 1 \\ 0, & \text{otherwise} \end{cases}$$

For two-dimensional image processing applications (d = 2 and $C_d = \pi$), the kernel function reduces to

$$k(x) = \begin{cases} \frac{1}{2\pi}(2+2)(1 - x) = \frac{2}{\pi}(1 - x), & \text{if } x < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (15)$$
Assigning G(x) = k(x), the derivative of G becomes a constant:

$$G'(x) = k'(x) = -\frac{2}{\pi} \qquad (16)$$

The result is simple and easy to compute. Finally, by substituting (16) into (13), we obtain the second similarity-based mean-shift algorithm:

$$y_{new} = \frac{\sum_{j=1}^{N} y_j\, p(b_{y_j})}{\sum_{j=1}^{N} p(b_{y_j})} \qquad (17)$$

3.3. Weighted-background information
Like most tracking algorithms, the proposed method may arrive at an incorrect result if the background contains information similar to the foreground. The problem becomes more serious when the scale and orientation of the target have to be followed. One way to reduce the influence of the background is to apply a weighting function to the image surrounding the target. The combination of this weighting function with the spatial-color mean-shift tracking algorithms proposed above is explained below.
Let $N_{F,b}$ be the normalized histogram of the foreground for the bth bin $\left(\sum_b N_{F,b} = 1\right)$, and $N_{O,b}$ the normalized histogram of the background for the bth bin $\left(\sum_b N_{O,b} = 1\right)$. The background is computed in the region around the foreground (target); its size is two times the target size and its area is three times the target area. Define the background influence factor of the bth bin as $N_{F,b}/N_{O,b}$ for $N_{O,b} \neq 0$, and let the maximum value of the factor over all bins be $\beta = \max_{b=1,\ldots,B} N_{F,b}/N_{O,b}$. Bins whose influence factor is small relative to β contain relatively more background features than the corresponding foreground bins, and should therefore receive a small background weighting factor. Note that bins with $N_{O,b} = 0$ are excluded in computing β. Based on this analysis, the background weighting factor can be defined as

$$W_b = \begin{cases} 1 - e^{-b_0\, \beta^{-1}\, N_{F,b}/N_{O,b}}, & \text{if } N_{O,b} \neq 0 \\ 1, & \text{if } N_{F,b} \neq 0 \text{ and } N_{O,b} = 0 \\ 0, & \text{otherwise} \end{cases} \qquad (18)$$

where $b_0$ is a constant. Note that when $N_{F,b} \neq 0$ and $N_{O,b} = 0$, the background has no influence on the foreground at the bth bin; therefore, that case is given the largest weighting in (18). The weighted-background information is added into the mean-shift tracking algorithms developed in (6) and (17), and the algorithms are derived again as

$$y_{new} = \left\{ \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1}\, W_{b_{y_j}} K_P K_C\, \delta[b_{y_j} - b] \right\}^{-1} \left\{ \sum_{j=1}^{N} \sum_{b=1}^{B} \Sigma_{P,b}^{-1} \left(y_j - \mu_{P,b}\right) W_{b_{y_j}} K_P K_C\, \delta[b_{y_j} - b] \right\} + x_0 \qquad (19)$$

and

$$y_{new} = \frac{\sum_{j=1}^{N} y_j\, W_{b_{y_j}}\, p(b_{y_j})}{\sum_{j=1}^{N} W_{b_{y_j}}\, p(b_{y_j})} \qquad (20)$$
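The following sketch implements the background weighting and the weighted tracker-2 update of Eq. (20). Since Eq. (18) is typeset ambiguously in the source, the exact exponent form used here (the influence ratio normalized by β) is a reconstruction and should be treated as an assumption; the function names are ours.

```python
import numpy as np

def background_weights(N_F, N_O, b0=2.0):
    """Per-bin background weighting W_b in the spirit of Eq. (18): bins
    dominated by background get small weight, bins absent from the
    background get 1, and bins absent from the foreground get 0.
    The exponent form is a reconstruction (assumption)."""
    N_F, N_O = np.asarray(N_F, float), np.asarray(N_O, float)
    W = np.zeros_like(N_F)
    valid = N_O > 0
    beta = (N_F[valid] / N_O[valid]).max()       # beta = max_b N_F,b / N_O,b
    W[valid] = 1.0 - np.exp(-b0 * (N_F[valid] / N_O[valid]) / beta)
    W[(~valid) & (N_F > 0)] = 1.0                # no background influence
    return W

def tracker2_weighted_step(candidate, p, W):
    """Eq. (20): y_new = sum_j y_j W_b p(b) / sum_j W_b p(b), b = b_{y_j}."""
    num, den = np.zeros(2), 0.0
    for yj, bj in candidate:
        w = W[bj] * p[bj]
        num += w * np.asarray(yj, float)
        den += w
    return num / den
```

Note that the bin with the largest foreground-to-background ratio still receives a weight below 1, so the background-free case in Eq. (18) is strictly the largest, as the text requires.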
3.4. Update of scale and orientation
The characteristic values of the covariance matrix of the spatial-color distribution can be utilized to represent the scale and orientation of the target. A similar idea was proposed in the CAMSHIFT algorithm (Bradski, 1998). As the experimental results shown later indicate, this simple calculation provides fairly robust scale and orientation tracking, which greatly enhances the capability of the mean-shift algorithm.
We define several new elements as follows: $\mu_T$ is the total mean vector of the locations of all pixels in the target; $\Sigma'_W$ is the within-class covariance matrix of the B bins with the background weighting added; $\Sigma'_B$ is the between-class covariance matrix of the B bins with the background weighting added; and $\Sigma'_T$ is the total covariance matrix of the locations of all data with the background weighting added:

$$\mu_T = \frac{1}{M} \sum_{b=1}^{B} n_b\, \mu_{P,b} \qquad (21)$$

$$\Sigma'_W = \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(x_i - \mu_{P,b}\right)\left(x_i - \mu_{P,b}\right)^{T} \delta[b_{x_i} - b] \qquad (22)$$

$$\Sigma'_B = \sum_{b=1}^{B} W_b\, n_b \left(\mu_{P,b} - \mu_T\right)\left(\mu_{P,b} - \mu_T\right)^{T} \qquad (23)$$

$$\Sigma'_T = \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(x_i - \mu_T\right)\left(x_i - \mu_T\right)^{T} \delta[b_{x_i} - b] \qquad (24)$$
It can first be shown that $\Sigma'_T$ can be computed from the between-class and within-class covariance matrices:

$$\begin{aligned} \Sigma'_T &= \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(x_i - \mu_T\right)\left(x_i - \mu_T\right)^{T} \delta[b_{x_i} - b] \\ &= \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(x_i - \mu_{P,b} + \mu_{P,b} - \mu_T\right)\left(x_i - \mu_{P,b} + \mu_{P,b} - \mu_T\right)^{T} \delta[b_{x_i} - b] \\ &= \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(x_i - \mu_{P,b}\right)\left(x_i - \mu_{P,b}\right)^{T} \delta[b_{x_i} - b] + \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(x_i - \mu_{P,b}\right)\left(\mu_{P,b} - \mu_T\right)^{T} \delta[b_{x_i} - b] \\ &\quad + \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(\mu_{P,b} - \mu_T\right)\left(x_i - \mu_{P,b}\right)^{T} \delta[b_{x_i} - b] + \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(\mu_{P,b} - \mu_T\right)\left(\mu_{P,b} - \mu_T\right)^{T} \delta[b_{x_i} - b] \end{aligned} \qquad (25)$$

Because

$$\sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(\mu_{P,b} - \mu_T\right)\left(x_i - \mu_{P,b}\right)^{T} \delta[b_{x_i} - b] = \sum_{b=1}^{B} W_b \left(\mu_{P,b} - \mu_T\right)\left(n_b \mu_{P,b} - n_b \mu_{P,b}\right)^{T} = 0$$

(and likewise for its transpose) and

$$\sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(\mu_{P,b} - \mu_T\right)\left(\mu_{P,b} - \mu_T\right)^{T} \delta[b_{x_i} - b] = \sum_{b=1}^{B} W_b\, n_b \left(\mu_{P,b} - \mu_T\right)\left(\mu_{P,b} - \mu_T\right)^{T},$$

we obtain

$$\Sigma'_T = \sum_{b=1}^{B} \sum_{i=1}^{M} W_b \left(x_i - \mu_{P,b}\right)\left(x_i - \mu_{P,b}\right)^{T} \delta[b_{x_i} - b] + \sum_{b=1}^{B} W_b\, n_b \left(\mu_{P,b} - \mu_T\right)\left(\mu_{P,b} - \mu_T\right)^{T} = \Sigma'_W + \Sigma'_B \qquad (26)$$

Based on (26), we can compute $\Sigma'_T$ from the information of the bins. This means that even when the target does not contain the model completely, we can still obtain an approximate $\Sigma'_T$ from the bins in the target. Suppose the data dimension is limited to 2. Using the principal component analysis method to solve the eigen-equation (Hastie et al., 2001), we have

$$\Sigma'_T\, v = \lambda\, v \qquad (27)$$

The corresponding eigenvectors $v_1$ and $v_2$ are the directions of the long axis and short axis of the data distribution, and $\lambda_1$ and $\lambda_2$ are the largest and smallest eigenvalues. Further, suppose the data is uniformly distributed in an ellipse. The principal direction of the ellipse is $v_1$, and the lengths of the long and short semi-axes are $2\sqrt{\lambda_1}$ and $2\sqrt{\lambda_2}$, respectively. Thus $v_1$, $2\sqrt{\lambda_1}$ and $2\sqrt{\lambda_2}$ give the orientation and scale of the target.
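The decomposition (26) and the PCA step (27) can be checked numerically. The synthetic data, per-bin weights, eigenvalue normalization, and variable names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 3                                            # number of occupied bins
W = np.array([1.0, 0.8, 0.3])                    # background weights W_b
# Synthetic per-bin pixel locations, spread mostly along the x-axis.
pts = [rng.normal([4.0 * b, 0.0], [3.0, 1.0], size=(200, 2)) for b in range(B)]

n = np.array([len(p) for p in pts])
mu = [p.mean(axis=0) for p in pts]               # per-bin means mu_P,b
mu_T = sum(n_b * m for n_b, m in zip(n, mu)) / n.sum()                    # Eq. (21)

S_W = sum(W[b] * (pts[b] - mu[b]).T @ (pts[b] - mu[b]) for b in range(B))        # Eq. (22)
S_B = sum(W[b] * n[b] * np.outer(mu[b] - mu_T, mu[b] - mu_T) for b in range(B))  # Eq. (23)
S_T = sum(W[b] * (pts[b] - mu_T).T @ (pts[b] - mu_T) for b in range(B))          # Eq. (24)

assert np.allclose(S_T, S_W + S_B)               # the identity of Eq. (26)

# Eq. (27): the eigen-decomposition gives orientation and axis lengths.
lam, vecs = np.linalg.eigh(S_T / (W @ n))        # normalize to covariance scale
v1 = vecs[:, np.argmax(lam)]                     # long-axis direction
axes = 2.0 * np.sqrt(np.sort(lam)[::-1])         # semi-axis lengths 2*sqrt(lambda)
```

The identity in Eq. (26) holds exactly for any choice of the weights, because the cross terms vanish bin by bin.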
4. Experiment results
The proposed spatial-color mean-shift tracking algorithms were implemented in C and tested on a 2.8 GHz Pentium 4 PC with 1 GB of memory. We use the normalized color variables r and g as the feature space:

$$r = \frac{\mathrm{Red}}{\mathrm{Red} + \mathrm{Green} + \mathrm{Blue}}, \qquad g = \frac{\mathrm{Green}}{\mathrm{Red} + \mathrm{Green} + \mathrm{Blue}}$$
The color histograms are divided into 512 bins, i.e. the value of B in (1) is equal to 512. Three video clips are used in the first experiment for fixed scale and orientation cases: the face sequence for face tracking; the cup sequence with a complex appearance in a complex background; and the walking girl sequence, obtained from Adam et al. (2006), with partial occlusions. To demonstrate the scale and orientation tracking ability, two video clips are tested in the second experiment. The first sequence shows a person walking away from and toward the camera with a large variation in scale. The second sequence, obtained from the CAVIAR database (2004), illustrates the problem of large deformation. The image size of the face, cup, walking girl, and walking person sequences is 320 × 240, and the image size of the CAVIAR database is 352 × 288. The tracking window sizes of the face, cup, and walking girl sequences are 59 × 82, 50 × 65, and 27 × 98, respectively. In the following figures, tracker 1 denotes the algorithm of (19) and its extension, while tracker 2 denotes the algorithm of (20) and its extension. To compute the tracking error, the ground-truth object location is marked visually every 10 frames and the error is measured by Euclidean distance.
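The normalized (r, g) feature computation is a one-liner; the guard for black pixels is our addition.

```python
import numpy as np

def normalized_rg(rgb):
    """Chromaticity features r = R/(R+G+B), g = G/(R+G+B) for an array of
    RGB values (last axis = channels).  Black pixels are mapped to (0, 0)
    to avoid division by zero (our convention)."""
    rgb = np.asarray(rgb, float)
    s = rgb.sum(axis=-1, keepdims=True)
    s[s == 0.0] = 1.0                    # guard: R + G + B = 0
    return rgb[..., :2] / s              # columns: r, g
```

Because r + g + b = 1, the third chromaticity channel is redundant and only (r, g) is kept, which makes the feature largely invariant to brightness changes.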
Fig. 3. Face sequence of frames 33, 93, 117, 126, 183, 256, 271, and 455. (Red: tracker 1, blue: tracker 2, green: traditional mean-shift tracker). (For interpretation of the references to color in this ﬁgure legend, the reader is referred to the web version of this article.)
Fig. 4. Distance error of the face sequence. (*Note: we only consider distance errors smaller than 50 pixels.)
Fig. 5. Cup sequence of frames 4, 45, 63, 69, 81, 105, 166, and 243. (Red: tracker 1, blue: tracker 2, green: traditional mean-shift tracker). (For interpretation of the references to color in this ﬁgure legend, the reader is referred to the web version of this article.)
4.1. Spatial-color mean-shift trackers with single scale tracking

Part of the frames in the face tracking experiment is shown in Fig. 3, and Fig. 4 shows the tracking error. For comparison purposes, the traditional mean-shift algorithm is also implemented (marked as tMS in the figures). This example shows that under a simple background all of these methods have similar performance, but on average the proposed methods outperform the traditional one.

Similarly, the tracking of a cup with complex features in a complex background is shown in Figs. 5 and 6. As shown in Fig. 6, the traditional mean-shift algorithm has a larger tracking error and sometimes loses the target.

In the case of partial occlusion (Fig. 7), tracker 1 always captures the target under variation of illumination and partial occlusion, but tracker 2 and the traditional mean-shift tracker fail in the tracking process.
4.2. Spatial-color mean-shift trackers with scale and orientation
Figs. 8 and 9 show the results of the spatial-color mean-shift trackers with the PCA method. In the video clip, the target person
Fig. 6. Distance error of the cup sequence. (*Note: we only consider distance errors smaller than 50 pixels.)
Fig. 7. Walking girl sequence of frames 28, 111, 124, 130, 153, 166, 196, and 220. (Red: tracker 1, blue: tracker 2, green: traditional mean-shift tracker). (For interpretation of the references to color in this ﬁgure legend, the reader is referred to the web version of this article.)
walks away from and toward the camera, and the two trackers capture the target at all times. This shows that both trackers are capable of tracking the size of the target. In the surveillance sequence obtained from the CAVIAR database (2004), a person walks, lies down, and finally stands up and resumes walking. These different actions produce significant deformation of the target. Figs. 10 and 11 show that the trackers proposed in this paper always track the target with the corresponding scale, orientation, and shape.
4.3. Performance analysis
The performance of the proposed algorithms is analyzed in two different aspects: the preprocessing time of building the model and the computational time of each iterative step. The face and cup sequences are used to test the performance of the proposed trackers. The models are built from the first image of these two sequences, and the preprocessing procedure is executed five times to obtain the average computing time. Table 1 shows the preprocessing time of both trackers.

Fig. 9. Walking person tracking results of spatial-color mean-shift tracker 2 with the PCA scale method (frames 83, 358, 494, 598, 689, 733, 914, and 1000).

Fig. 10. Surveillance tracking results of spatial-color mean-shift tracker 1 with the PCA scale method (frames 3, 24, 32, 51, 59, 286, 318, and 353).

Fig. 11. Surveillance tracking results of spatial-color mean-shift tracker 2 with the PCA scale method (frames 3, 24, 32, 51, 59, 286, 318, and 353).

Table 1
The preprocessing time of both trackers (in seconds)

                  Tracker 1    Tracker 2
Face sequence     0.027858     0.031299
Figs. 12 and 13 show the iteration time of the first 200 frames of the face and cup sequences for tracker 1. For the face sequence, the average time over all frames (about 2300 frames) is 0.035855 s (about 28 frames/s); the average iteration time over all frames of the cup sequence (about 1900 frames) is 0.017854 s (about 56 frames/s). Figs. 14 and 15 show the results for tracker 2: the average time over all frames of the face sequence is 0.020670 s (about 48 frames/s), and the average iteration time over all frames of the cup sequence is 0.006608 s (about 151 frames/s).

Fig. 12. The computing time of tracker 1 for the first 200 frames of the face sequence.

Fig. 13. The computing time of tracker 1 for the first 200 frames of the cup sequence.

Fig. 14. The computing time of tracker 2 for the first 200 frames of the face sequence.

The tracking time of tracker 2 is smaller than that of tracker 1 because tracker 2 computes $K_P$ and $K_C$ at the preprocessing stage instead of at each iteration. Nevertheless, both trackers satisfy the real-time requirement of the image sampling rate of most current cameras (30 frames/s).
5. Conclusion

A spatial-color mean-shift object tracking algorithm is proposed in this paper. Combining spatial information with color features makes the model more robust in tracking applications. New tracking algorithms are proposed based on the proposed similarity measures, using the concept of the expectation of the estimated kernel density. Moreover, principal component analysis is applied to the covariance matrix of the spatial-color distribution to assess the scale and orientation of the target. The experimental results show the effectiveness and real-time capability of the proposed algorithms. The update of scale and orientation in this paper is based on the image tracked by the proposed algorithm; this information is not fed back into the tracking of the next step, which is an interesting topic for further study. To do so, a modified model, like the affine transformation in Zhang et al. (2005), might have to be considered. However, this would impose certain scale and orientation restrictions on the target image. These aspects will be investigated in our future research.
Acknowledgments

This work was supported in part by the National Science Council of Taiwan, ROC under Grant No. NSC 95-2218-E-009-064 and the Ministry of Economic Affairs under Grant No. 95-EC-17-A-04-S1-054.
References

Adam, A., Rivlin, E., Shimshoni, I., 2006. Robust fragments-based tracking using the integral histogram. In: Proc. 2006 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition.
Birchfield, S., Rangarajan, S., 2005. Spatiograms versus histograms for region-based tracking. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'05), vol. 2, pp. 1158–1163.
Birchfield, S., Rangarajan, S., 2007. Spatial histograms for region-based tracking. ETRI J. 29 (5), 697–699.
Bradski, G.R., 1998. Computer vision face tracking for use in a perceptual user interface. Intel Technol. J. 2 (2), 1–15.
CAVIAR database, 2004. <http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/>.
Collins, R., 2003. Mean-shift blob tracking through scale space. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'03), vol. 2, p. 234.
Comaniciu, D., Ramesh, V., Meer, P., 2003. Kernel-based object tracking. IEEE Trans. Pattern Anal. Machine Intell. 25 (5), 564–577.
Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning, first ed. Springer.
Lindeberg, T., 1998. Feature detection with automatic scale selection. Internat. J. Comput. Vision 30 (2), 79–116.
O'Conaire, C., O'Connor, N.E., Smeaton, A.F., 2007. An improved spatiogram similarity measure for robust object localization. In: Proc. ICASSP 2007.
Parameswaran, V., Ramesh, V., Zoghlami, I., 2006. Tunable kernels for tracking. In: Proc. 2006 IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 2179–2186.
Porikli, F., Tuzel, O., 2005. Object tracking in low-frame-rate video. In: Proc. SPIE, vol. 5685, pp. 72–79.
Yang, C., Duraiswami, R., Davis, L., 2005. Efficient mean-shift tracking via a new similarity measure. In: IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 176–183.
Zhang, H., Huang, Z., Huang, W., Li, L., 2004. Kernel-based method for tracking objects with rotation and translation. In: Proc. 17th Internat. Conf. on Pattern Recognition (ICPR'04), vol. 2, pp. 728–731.
Zhang, H., Huang, W., Huang, Z., Li, L., 2005. Affine object tracking with kernel-based spatial-color representation. In: IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 293–300.
Zhao, Q., Tao, H., 2005. Object tracking using color correlogram. In: Second Joint IEEE Internat. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 263–270.
Zivkovic, Z., Krose, B., 2004. An EM-like algorithm for color-histogram-based object tracking. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'04), vol. 1, pp. 798–803.