
Chapter 3. Spatial-Color Mean-Shift Object Tracking Algorithm

3.4 Spatial-Color Mean-Shift Object Tracking Algorithm

3.4.2 Similarity Measure Function

Similar to the concept of the expectation of the estimated kernel density in (3-5), we can derive a new similarity measure function between the model $I_x = \{x_i, u_i\}_{i=1,\dots,M}$ and the candidate, given in (3-7).

As shown in Figure 3.4-1, if there is no deformation between the candidate and the target, and the motion between frames is not large, we can treat the motion of the object across the two frames as a pure translation. Under these assumptions, the offset of the target center from the mean location of the b-th bin in the model equals the offset of the candidate center from the mean location of the b-th bin in the candidate image. So we obtain

$\boldsymbol{\mu}_{P,b(i)} - \mathbf{x} = \boldsymbol{\mu}_{P,b(j)} - \mathbf{y}$    (3-8)

where $b(i)$ denotes the bin to which pixel $i$ belongs.

Figure 3.4-1 : Illustration of pure translation.

Substituting (3-9) into (3-7), we obtain the new similarity measure function.

3.4.3 Spatial-Color Mean-Shift Tracker

As with the traditional Bhattacharyya-coefficient method, we want to find the maximum value of the similarity measure to get the best candidate, so we set the gradient of the similarity function with respect to the vector y equal to 0.

Equation (3-11) is the mean-shift vector and also an iterative function with respect to y, and we rewrite (3-11) as (3-12).

3.4.4 Another Derivation of the New Mean-Shift Tracker

We now use another method to derive the second similarity-based mean-shift tracker. Starting from the kernel density estimation model (3-6) defined in 3.4.1, if we replace x by $x_i$ and $c_x$ by $c_i$ in (3-6), we get a new kernel density estimate.

$K_P$ and $K_C$ are again the spatial and color-feature weighting functions, but here both weighting functions depend on the image model.

Using the same concept of the expectation of the estimated kernel density as in 3.4.2, we define another new similarity measure function between the model and the candidate, in which the spatial weighting $\{K_{P_i}\}_{i=1,\dots,M}$ depends on the candidate image. Equation (3-18) is the second new similarity measure function that we propose.

Again we set the gradient of the similarity function with respect to the vector y equal to 0 to find the maximum of the similarity measure and obtain the best candidate.

So we obtain (3-20), which is another iterative mean-shift vector, and we rewrite (3-20) as (3-21). We choose the weighting function G to be the Epanechnikov kernel, whose profile is linear and therefore has a constant derivative; the corresponding terms in (3-21) thus reduce to a constant. The result is simple and easy to compute, and this is the reason we choose the weighting function G as the Epanechnikov kernel. Finally, by substituting (3-25) into (3-21), we get the second similarity-based mean-shift tracker, (3-26). Equation (3-26) shows that the object tracking algorithm is an iterative procedure that moves from the current position y_old to the new position y_new.
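Although the exact weight expressions of (3-12) and (3-26) do not survive in this text, the iterative structure they share can be sketched as follows. This is a minimal illustration, not the thesis implementation; the per-pixel weight function `weight` is a placeholder standing in for the spatial-color weights derived above, and all names are illustrative.

```python
import numpy as np

def mean_shift_track(weight, pixels, colors, y0, eps=0.5, max_iter=20):
    """Generic mean-shift iteration y_old -> y_new.

    weight : callable(y, pixels, colors) -> (M,) per-pixel weights,
             standing in for the spatial-color weights of (3-12)/(3-26).
    pixels : (M, 2) pixel coordinates of the candidate region.
    colors : (M, 3) color features of those pixels.
    y0     : (2,) initial estimate of the target center.
    """
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        w = weight(y, pixels, colors)                    # per-pixel weights
        y_new = (w[:, None] * pixels).sum(0) / w.sum()   # weighted mean of locations
        if np.linalg.norm(y_new - y) < eps:              # movement below eps: converged
            return y_new
        y = y_new
    return y
```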

3.4.5 Spatial-Color Mean-Shift Tracking Procedure

Having derived the new spatial-color mean-shift tracking algorithms, we can summarize single-object tracking as in Figure 3.4-2 and Figure 3.4-3.

[Flowchart: initialization computes the target model $\{n_b, \boldsymbol{\mu}_{P,b}, \Sigma_{P,b}, \boldsymbol{\mu}_{C,b}, \Sigma_{C,b}\}_{b=1,\dots,B}$ from the model region according to the definition (3-2), where $n_b$ is the number of pixels in the b-th bin, $\boldsymbol{\mu}_{P,b}$ and $\Sigma_{P,b}$ are the mean vector and covariance matrix of location in the b-th bin, and $\boldsymbol{\mu}_{C,b}$ and $\Sigma_{C,b}$ are the mean vector and covariance matrix of the RGB feature in the b-th bin. In each frame, the location of the target is initialized, the terms given by (3-13) and (3-14) are computed and substituted into (3-12) to find the next location of the target candidate, and this step repeats until the location converges, finishing one iteration.]
Figure 3.4-2 : Spatial-color mean-shift tracking procedure of the first tracker.

[Flowchart: initialization computes the target model $\{n_b, \boldsymbol{\mu}_{P,b}, \Sigma_{P,b}, \boldsymbol{\mu}_{C,b}, \Sigma_{C,b}\}_{b=1,\dots,B}$ from the model region according to the definition (3-2), with elements as in Figure 3.4-2. In each frame, the location of the target is initialized, the weights $\{K_{P_i}\}_{i=1,\dots,M}$ and $\{K_{C_i}\}_{i=1,\dots,M}$ are computed according to (3-16) and (3-17), the next location of the target candidate is found according to (3-26), and this step repeats until the location converges, finishing one iteration.]

Figure 3.4-3 : Spatial-color mean-shift tracking procedure of the second tracker.

3.5 Choice of the Color Feature Space

In 3.2.2, we chose the color space (R, G, B) as our color feature, so $\boldsymbol{\mu}_{C,b}$ is the 3-dimensional mean vector of the (R, G, B) values and $\Sigma_{C,b}$ is the covariance matrix of (R, G, B). However, the (R, G, B) color space is easily influenced by illumination, which greatly affects our tracking results. So we consider the normalized color space (r, g, b), defined as

$r = \dfrac{R}{R+G+B}$, $\quad g = \dfrac{G}{R+G+B}$, $\quad b = \dfrac{B}{R+G+B}$

Since $r + g + b = 1$, one component is redundant, and the pair (r, g) suffices as the color feature.
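A minimal sketch of this normalized-rg feature computation, assuming the standard definition above; the small constant guarding against division by zero for black pixels is an implementation detail added here, not from the thesis.

```python
import numpy as np

def rg_features(img):
    """Convert an (H, W, 3) RGB image to the normalized (r, g) feature.

    r = R / (R + G + B), g = G / (R + G + B); b = 1 - r - g is redundant,
    so only (r, g) is kept as the color feature.
    """
    rgb = img.astype(float)
    s = rgb.sum(axis=2, keepdims=True) + 1e-9   # avoid division by zero
    r = rgb[..., 0:1] / s
    g = rgb[..., 1:2] / s
    return np.concatenate([r, g], axis=2)       # shape (H, W, 2)
```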

3.6 Background-Weighted Information

In many tracking applications, background information is an important issue.

Representing the target model exactly is difficult, and the system is easily confused between foreground and background features because the foreground region always contains some background information. The proposed tracking method is based on the similarity between the target and the candidate; therefore, how the foreground model is represented is very important. Furthermore, an improper representation of the foreground may affect the scale and orientation selection algorithm and yield an inappropriate scale. In this section, we derive a simple weighted-background representation and add this approach to the spatial-color mean-shift trackers proposed above.

Let $N_{F,b}$ be the normalized histogram of the foreground (target) for the b-th bin, and let $N_{B,b}$ be the normalized histogram of the background, computed in the region around the foreground. From these two histograms we define weights $\{w_b\}_{b=1,2,\dots,B}$. This weight transformation diminishes the effect of features that contribute more to the background than to the foreground.
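The exact weight formula is not recoverable from this text; the sketch below assumes the common min-ratio form used in background-weighted histogram methods, which matches the stated behavior: bins dominated by the background receive weights below 1. The function name and the assumed formula are illustrative.

```python
import numpy as np

def background_weights(hist_fg, hist_bg, eps=1e-9):
    """Per-bin weights w_b from normalized foreground/background histograms.

    Assumed form (not recoverable from the text): w_b = min(N_F,b / N_B,b, 1),
    so a bin that appears more in the background than in the foreground
    is down-weighted toward zero.
    """
    hist_fg = np.asarray(hist_fg, dtype=float)
    hist_bg = np.asarray(hist_bg, dtype=float)
    return np.minimum(hist_fg / (hist_bg + eps), 1.0)
```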

Now we add the weighted-background information to the mean-shift trackers developed in 3.4 and re-derive the revised weighted spatial-color mean-shift trackers as follows.

We add the weights to (3-7) and (3-18), and obtain (3-29) and (3-30). By a derivation similar to that in 3.4, we obtain from (3-29) and (3-30) the final spatial-color mean-shift tracker functions, which contain the weighted-background information.

3.7 Update of Scale and Orientation

In computer vision and image processing, an object changes its scale as it moves away from or toward the camera. When the camera zooms in or out, the size of the object likewise differs between image frames. As shown in Figure 3.7-1, if the object is smaller than the tracking window, the window contains many background pixels as well as foreground object pixels. This causes wrong tracking results from noisy background pixels when a histogram computed within the window is compared to a model histogram describing the appearance of the foreground object. If the object is larger than the tracking window, the tracker becomes more easily distracted by background clutter.

Figure 3.7-1 : Illustration of scale problem. (The figure is obtained from [4])

The orientation problem is similar to the scale problem. A fixed window may not contain the whole tracked object when its orientation varies, which results in tracking failure. In the following sections, we use part of the principal component analysis method to solve these two problems.

3.7.1 Introduction of Principal Component Analysis

Principal component analysis (PCA) is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on.

Assume the sample covariance matrix of the standardized data matrix $X \in \mathbb{R}^{N \times P}$ to be

$R = \frac{1}{N} X^T X$    (3-33)

The principal component analysis problem can then be reduced to the eigen-equation problem [13]:

$R\mathbf{v} = \lambda \mathbf{v}$    (3-34)

By solving this eigen-equation, we obtain the eigenvalues $\{\lambda_i\}_{i=1,\dots,P}$ and eigenvectors $\{\mathbf{v}_i\}_{i=1,\dots,P}$, respectively. The eigenvector with the largest eigenvalue is the largest principal component, the direction that maximizes the variance of the projected data; the smallest principal component is the direction that minimizes that variance, as shown in Figure 3.7-2. For 2-dimensional image data, this method yields two eigenvalues and two eigenvectors, which represent the orthogonal axes of the data.
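A minimal numpy sketch of (3-33) and (3-34): eigen-decomposition of the sample covariance of standardized data, with the eigenvectors sorted so the first column is the largest principal component. The function name is illustrative.

```python
import numpy as np

def principal_components(X):
    """PCA of a standardized data matrix X (N samples x P variables).

    Returns eigenvalues and eigenvectors of R = (1/N) X^T X sorted in
    descending order, so columns of V run from v_1 (largest) to v_P (smallest).
    """
    N = X.shape[0]
    R = X.T @ X / N                  # sample covariance, eq. (3-33)
    lam, V = np.linalg.eigh(R)       # eigh: R is symmetric, eq. (3-34)
    order = np.argsort(lam)[::-1]    # sort by descending variance
    return lam[order], V[:, order]
```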

Figure 3.7-2 : Illustration of principal component analysis.

3.7.2 Orientation Selection by Principal Component Analysis

We can get the orientation of the total sample data by the PCA method introduced in the previous section. Because we have already computed some information about the locations of the image data, we can use these data to obtain the total covariance matrix of all data with reduced computation. In this section, we derive the covariance matrix of the total image data from the elements defined in 3.2.2. First, we review the definitions of some elements of the model defined in (3-2): $\boldsymbol{\mu}_{P,b}$ is the mean vector of the locations of the pixels belonging to the b-th bin, $\Sigma_{P,b}$ is the covariance matrix of the locations of the pixels in the b-th bin, and B is the number of bins.

We also define several new elements: $\boldsymbol{\mu}_T$ is the total mean vector of the locations of all pixels in the target, $\Sigma_W$ is the within-class covariance matrix of the B bins, $\Sigma_B$ is the between-class covariance matrix of the B bins, and $\Sigma_T$ is the total covariance matrix of the locations of all data.

With $n_b$ pixels in the b-th bin and $N = \sum_{b=1}^{B} n_b$ pixels in total, these elements can be written as

$\boldsymbol{\mu}_T = \frac{1}{N}\sum_{b=1}^{B} n_b\, \boldsymbol{\mu}_{P,b}$, $\quad \Sigma_W = \frac{1}{N}\sum_{b=1}^{B} n_b\, \Sigma_{P,b}$, $\quad \Sigma_B = \frac{1}{N}\sum_{b=1}^{B} n_b\, (\boldsymbol{\mu}_{P,b} - \boldsymbol{\mu}_T)(\boldsymbol{\mu}_{P,b} - \boldsymbol{\mu}_T)^T$

Decomposing $\Sigma_T$ around the bin means, the cross terms vanish because the deviations of the pixel locations from their own bin mean sum to zero within each bin, and we obtain

$\Sigma_T = \Sigma_W + \Sigma_B$    (3-41)

Therefore, we can get the total covariance matrix of all image data from the elements of the model which we have defined, and we substitute $\Sigma_T$ for $R$ in (3-34) as

$\Sigma_T \mathbf{v} = \lambda \mathbf{v}$    (3-42)

By solving this eigen-equation, we get two eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ corresponding to the largest and smallest principal components, respectively. If we use an ellipse as the region of the target, the largest principal component represents the long axis and the smallest principal component represents the short axis, as shown in Figure 3.7-2.
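A sketch of this orientation computation under the decomposition above: the total location covariance is assembled from the per-bin counts, means, and covariances of (3-2) as in (3-41), then eigen-decomposed as in (3-42). Function and variable names are illustrative.

```python
import numpy as np

def orientation_from_bins(n, mu, sigma):
    """Long/short axis directions from per-bin location statistics.

    n     : (B,)      pixel counts n_b
    mu    : (B, 2)    per-bin location means mu_P,b
    sigma : (B, 2, 2) per-bin location covariances Sigma_P,b
    """
    N = n.sum()
    mu_T = (n[:, None] * mu).sum(0) / N            # total mean of locations
    S_W = (n[:, None, None] * sigma).sum(0) / N    # within-class covariance
    d = mu - mu_T                                  # deviations of bin means
    S_B = (n[:, None, None] * d[:, :, None] * d[:, None, :]).sum(0) / N
    S_T = S_W + S_B                                # eq. (3-41)
    lam, V = np.linalg.eigh(S_T)                   # eq. (3-42), ascending order
    return V[:, 1], V[:, 0]                        # (long axis, short axis)
```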

3.7.3 Adding Weighted-Background Information

In 3.6, we discussed the influence of background information on scale and orientation selection. To improve the robustness and accuracy of scale and orientation selection, we add the weighted-background information to (3-41) and obtain the total covariance matrix $\Sigma'_T$ with weighted-background information, which gives a more accurate direction for the long axis and the short axis.

3.7.4 Scale Selection

From the total covariance matrix we can get the orientation of the distribution of the target image data through the axes of the ellipse, but we cannot obtain the lengths of the axes. We now derive the relation between the total covariance matrix and the two axes.

We consider a uniform distribution over an ellipse with semi-axes $a$ and $b$, so that the density inside the ellipse is $\frac{1}{\pi a b}$. Computing the variances along the two axes gives

$\sigma'_{xx} = \dfrac{a^2}{4}$, $\quad \sigma'_{yy} = \dfrac{b^2}{4}$

where $\sigma'_{xx}$ and $\sigma'_{yy}$ are the diagonal elements of the total covariance matrix $\Sigma'_T$ expressed in the principal-axis frame. Hence

$a = 2\sqrt{\sigma'_{xx}}$, $\quad b = 2\sqrt{\sigma'_{yy}}$    (3-46)

that is, the lengths of the two semi-axes are about twice the standard deviations along the long axis and the short axis.
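For completeness, a short verification of the variance along the long axis (the short axis is analogous), using the substitution $x = ar\cos\theta$, $y = br\sin\theta$ with $dA = ab\, r\, dr\, d\theta$:

$\sigma'_{xx} = \displaystyle\iint_{x^2/a^2 + y^2/b^2 \le 1} x^2 \,\frac{1}{\pi a b}\, dA = \frac{1}{\pi a b}\int_{0}^{2\pi}\!\!\int_{0}^{1} (a r \cos\theta)^2 \, a b \, r \, dr \, d\theta = \frac{a^2}{\pi}\int_{0}^{2\pi}\cos^2\theta \, d\theta \int_{0}^{1} r^3 \, dr = \frac{a^2}{\pi}\cdot \pi \cdot \frac{1}{4} = \frac{a^2}{4}$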

3.8 Summary

In the previous sections, we obtained the spatial-color mean-shift trackers; we now summarize these concepts in Figure 3.8-1 and Figure 3.8-2.

[Flowchart: as in Figure 3.4-2 but with the rg feature ($\boldsymbol{\mu}_{C,b}$ and $\Sigma_{C,b}$ are now the mean vector and covariance matrix of the rg feature in the b-th bin): the terms given by (3-13) and (3-14) are substituted into (3-31) to find the next location of the target candidate; after convergence, the total covariance matrix is computed to get the eigenvectors as the orientation according to (3-42), and the scale is computed according to (3-46).]
Figure 3.8-1 : Complete spatial-color mean-shift tracking procedure of the first tracker.

[Flowchart: as in Figure 3.4-3 but with the rg feature: the weights $\{K_{P_i}\}_{i=1,\dots,M}$ and $\{K_{C_i}\}_{i=1,\dots,M}$ are computed according to (3-16) and (3-17), and the next location of the target candidate is found according to (3-32); after convergence, the total covariance matrix is computed to get the eigenvectors as the orientation according to (3-42), and the scale is computed according to (3-46).]

Figure 3.8-2 : Complete spatial-color mean-shift tracking procedure of the second tracker.

Chapter 4. Experiment Results

4.1 Experiment Illustration

The proposed spatial-color mean-shift tracking algorithms have been implemented in C and tested on a 2.8 GHz Pentium 4 PC with 1 GB of memory. We divide the color histograms into 512 bins, i.e., B in (3-2) is equal to 512. In the first part, we show experimental results that follow, in order, the steps by which we developed our final trackers in Chapter 3, and we present the tracking results at a single, fixed scale. We use the face sequence for face tracking, the cup sequence with a complex appearance in a complex background, and the walking girl sequence, obtained from [14], with partial occlusions. In the second part, we present the experimental results for the walking boy sequence and the surveillance sequence. The first shows a person moving away from and toward the camera with a large variation in scale. The second, obtained from the CAVIAR database [15], illustrates the problem of severe deformation. The image size of the face, cup, walking girl, and walking boy sequences is 320×240, and the image size of the surveillance sequence is 352×288. The tracking window sizes of the face, cup, and walking girl sequences are 59×82, 50×65, and 27×98, respectively.
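With B = 512 = 8×8×8, each color channel is presumably quantized to 8 levels. A minimal sketch of that bin-index computation follows (illustrative only; the thesis implementation is in C, and the exact quantization scheme is an assumption):

```python
import numpy as np

def bin_index(img):
    """Map each pixel of an (H, W, 3) uint8 RGB image to one of 512 bins.

    Assumes each channel is quantized to 8 levels (512 = 8*8*8), so the
    bin index is b = 64*qR + 8*qG + qB with q = channel // 32.
    """
    q = img.astype(np.int32) // 32                      # 256 values -> 8 levels
    return 64 * q[..., 0] + 8 * q[..., 1] + q[..., 2]   # index in {0, ..., 511}
```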

We define (3-12) and its extension as tracker 1, and (3-26) and its extension as tracker 2. In the following sections, we compare the proposed tracker 1 and tracker 2 with the traditional mean-shift tracker, i.e., (2-16), and with the general scale adaptation method of plus or minus 10 percent [1].
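For reference, the plus-or-minus-10-percent scale adaptation of [1] can be sketched as follows: the tracker is run at three window scales and the scale whose converged candidate scores highest is kept. This is a simplified sketch; `track_at_scale` and `similarity` are assumed helper functions, not part of the thesis code.

```python
def adapt_scale(track_at_scale, similarity, h_prev, gamma=0.1):
    """Plus/minus 10 percent scale selection in the spirit of [1] (simplified).

    Runs the tracker with window scales h_prev*(1-gamma), h_prev, and
    h_prev*(1+gamma), and keeps the scale whose converged candidate
    maximizes the similarity (e.g. Bhattacharyya coefficient) to the model.
    """
    best_h, best_score = h_prev, -float("inf")
    for h in (h_prev * (1 - gamma), h_prev, h_prev * (1 + gamma)):
        y = track_at_scale(h)          # converged location at this scale
        s = similarity(y, h)           # score of the converged candidate
        if s > best_score:
            best_h, best_score = h, s
    return best_h
```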

For the experimental results, we show part of each real tracking sequence, the distance-error figure, and the iteration-number figure. We define the correct location of the object every 10 frames by hand in advance, and use these data to analyze the tracking results. We discard any distance error larger than 50 pixels, which indicates that the tracker has lost the target. The iteration number is the number of iterations the tracker needs to find the target in a frame during the iterative procedure. Finally, the computation time of the proposed trackers is discussed.
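As a concrete reading of this evaluation protocol, a minimal sketch under the stated 50-pixel gating; ground-truth centers are assumed to be available at the hand-labeled frames, and all names are illustrative.

```python
import numpy as np

def distance_errors(tracked, truth, gate=50.0):
    """Per-frame Euclidean distance between tracked and hand-labeled centers.

    tracked, truth : (K, 2) arrays of centers at the labeled frames.
    Errors above `gate` pixels mean the tracker lost the target and are
    discarded (returned as NaN), as in the analysis above.
    """
    err = np.linalg.norm(np.asarray(tracked) - np.asarray(truth), axis=1)
    return np.where(err <= gate, err, np.nan)
```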

4.2 Spatial-Color Mean-Shift Trackers with RGB Feature

In this section, we present the experimental results of the trackers with the RGB color feature proposed in 3.4; tracker 1 is (3-12) and tracker 2 is (3-26).

4.2.1 Face Sequence

Figure 4.2-1 : Face tracking results of spatial-color mean-shift trackers proposed in 3.4. Shown are frames 33, 93, 117, 126, 183, 256, 271, 455, 766. (red: tracker 1, blue: tracker 2, green: traditional mean-shift tracker)

Figure 4.2-2 : Distance error of the face sequence for spatial-color mean-shift tracker 1 and tracker 2 proposed in 3.4, compared with the traditional mean-shift tracker. (*note: we only consider distance errors smaller than 50 pixels)

Figure 4.2-3 : Iteration number of the face tracking sequence. (left: tracker 1, middle: tracker 2, right: traditional mean-shift tracker)

At about the 120th frame, tracker 1 loses the face, and it captures the target again at about the 950th frame. While all three trackers have the face captured, the distance errors of tracker 1 and tracker 2 are always smaller than those of the traditional mean-shift tracker. The average iteration number of the traditional mean-shift tracker is smaller than those of the other trackers. So far, tracker 2 is less robust and more unstable than the traditional mean-shift tracker, but tracker 1 tracks more accurately.

4.2.2 Cup Sequence

Figure 4.2-4 : Cup tracking results of spatial-color mean-shift trackers proposed in 3.4. Shown are frames 4, 45, 63, 69, 81, 105, 166, 243, 364. (red: tracker 1, blue: tracker 2, green: traditional mean-shift tracker)

Figure 4.2-5 : Distance error of the cup sequence for spatial-color mean-shift tracker 1 and tracker 2 proposed in 3.4, compared with the traditional mean-shift tracker. (*note: we only consider distance errors smaller than 50 pixels)

Figure 4.2-6 : Iteration number of the cup sequence. (left: tracker 1, middle: tracker 2, right: traditional mean-shift tracker)

In most frames, tracker 1 and tracker 2 lose the target, and the mean-shift tracker captures it only weakly. Because the background of this scene is very complex and the appearance of the cup we want to track is also complex, the trackers easily drift to background objects. Tracker 1 and tracker 2 contain spatial information, so they easily latch onto background regions with similar spatial information when the cup is swayed. The traditional mean-shift tracker only contains color distribution information, so it is easily affected by the complex background and cannot accurately track the target.

4.2.3 Walking Girl Sequence

Figure 4.2-7 : Walking girl tracking results of the spatial-color mean-shift trackers proposed in 3.4. Shown are frames 28, 106, 111, 124, 130, 153, 166, 196, 220. (red: tracker 1, blue: tracker 2, green: traditional mean-shift tracker)

Figure 4.2-8 : Iteration numbers of the walking girl sequence. (left: tracker 1, middle: tracker 2, right: traditional mean-shift tracker)

The walking girl sequence contains both illumination variation and partial occlusion. The illumination varies from dark to bright, and none of the trackers is robust to this situation. At the 111th frame, part of the girl is covered by the car; tracker 1 and the traditional mean-shift tracker still track her, but tracker 2 loses her. The trackers are not yet good enough.

4.3 Spatial-Color Mean-Shift Trackers with Normalized Feature

To reduce the influence of slight illumination variation, the normalized rg feature introduced in 3.5 is used. As in 4.2, tracker 1 is defined as (3-12) with the rg feature and tracker 2 is defined as (3-26) with the rg feature.

4.3.1 Face Sequence

Tracker 1

Figure 4.3-1 : Face tracking results of spatial-color mean-shift tracker 1 with the rg feature in 3.5. Shown are frames 33, 93, 117, 126, 183, 256, 271, 455, 766. (red: tracker 1 with rg feature, blue: tracker 1 with RGB feature)

Figure 4.3-2 : Distance error of the face sequence for spatial-color mean-shift tracker 1 with the rg feature, compared with the RGB feature in 3.5. (*note: we only consider distance errors smaller than 50 pixels)

Figure 4.3-3 : Iteration numbers of the face sequence. (left: tracker 1 with rg feature, right: tracker 1 with RGB feature)

As shown in Figure 4.3-2, tracker 1 with the rg feature loses the target at about the 1190th frame because there is a box at the top left of the scene whose appearance is similar to the face. While the face is being tracked, the distance errors of tracker 1 with the rg feature are smaller than those of tracker 1 with the RGB feature, and the average iteration number of tracker 1 with the rg feature is also smaller. Changing to the normalized feature space thus speeds up the tracker.

Tracker 2

Figure 4.3-4 : Face tracking results of spatial-color mean-shift tracker 2 with the rg feature in 3.5. Shown are frames 33, 93, 117, 126, 183, 256, 271, 455, 766. (red: tracker 2 with rg feature, blue: tracker 2 with RGB feature)

Figure 4.3-5 : Distance error of the face sequence for spatial-color mean-shift tracker 2 with the rg feature, compared with the RGB feature in 3.5. (*note: we only consider distance errors smaller than 50 pixels)

Figure 4.3-6 : Iteration numbers of the face sequence. (left: tracker 2 with rg feature, right: tracker 2 with RGB feature)

Figure 4.3-5 shows that tracker 2 with the rg feature makes tracking more 'workable' than tracker 2 with the RGB feature. The performance of tracker 2 with the rg feature is even better than that of tracker 1 with the RGB feature.

4.3.2 Cup Sequence

Tracker 1

Figure 4.3-7 : Cup tracking results of spatial-color mean-shift tracker 1 with rg feature in 3.5. Shown are frames 4, 45, 63, 69, 81, 105, 166, 243, 364. (red: tracker 1 with rg feature, blue: tracker 1 with RGB feature)

Figure 4.3-8 : Distance error of the cup sequence for spatial-color mean-shift tracker 1 with the rg feature, compared with the RGB feature in 3.5. (*note: we only consider distance errors smaller than 50 pixels)
