
National Chiao Tung University

Institute of Electrical and Control Engineering

Master's Thesis

A New Spatial-Color Mean-Shift Object Tracking Algorithm

with Scale and Orientation Estimation

Student: Chung-Wei Juan

Advisor: Dr. Jwu-Sheng Hu


A New Spatial-Color Mean-Shift Object Tracking Algorithm

with Scale and Orientation Estimation

Student: Chung-Wei Juan  Advisor: Dr. Jwu-Sheng Hu

Institute of Electrical and Control Engineering, National Chiao Tung University

Abstract (Chinese)

This thesis develops a mean-shift tracking algorithm based on spatial and color features. The object model is defined by the relative spatial information of the color distribution together with color features, and a new mean-shift tracking algorithm is derived from a new similarity measure function for object tracking. To make tracking more robust, experiments with different features were performed to select the color feature that gives the best tracking result, and weights built from background information were then added to the algorithm to improve its stability. To solve the scale and orientation problems commonly encountered in object tracking, principal component analysis is used to estimate the orientation of the object, and an algorithm extended from principal component analysis estimates its scale; this method indeed updates the scale and orientation of the object automatically. The final experiments show that the proposed tracking algorithm can handle partial occlusion and object deformation, and retains good real-time tracking performance against complex backgrounds.


A New Spatial-Color Mean-Shift Object Tracking Algorithm

with Scale and Orientation Estimation

Student: Chung-Wei Juan Advisor: Prof. Jwu-Sheng Hu

Institute of Electrical and Control Engineering National Chiao-Tung University

ABSTRACT

In this thesis, we propose new mean-shift tracking algorithms based on a new similarity measure function. The joint spatial-color feature is used as the basic model element. The target image is modeled with kernel density estimation, and the expectation of the estimated kernel density is used to develop the new similarity measure functions. With these new similarity measure functions, two new similarity-based mean-shift tracking algorithms are derived. To enhance robustness, weighted-background information is added to the proposed mean-shift tracking algorithm. To solve the deformation problem, the principal component analysis method is used to update the orientation of the tracked object, and a simple method is elaborated to monitor the scale of the object. The experiment results show that the new similarity-based tracking algorithms run in real time, track the moving object correctly, and update the orientation and scale of the object automatically.


Acknowledgments

First, I thank Professor 胡竹生 for two years of dedicated guidance. In his laboratory I encountered a wide variety of topics, learned a great deal of professional knowledge, and completed my master's degree. I thank Professor 黃育綸 for her guidance during the TI DSP contest, and for her frequent care and her encouragement and advice on life. I thank the kind Professor 蔡中庸; I served two years as her teaching assistant for the introductory computer science course and shared many good vegetarian meals with her. I also thank my oral defense committee members, Professors 莊仁輝, 周志成, and 張文中, whose comments helped me understand what a good thesis still requires.

I thank the members of the laboratory: 永融, who can only ever tell the truth, for his constant advice on life and study; my senior 宗敏 for his guidance on image processing; 興哥 for his encouragement; and my fellow students 楷祥, 弘齡, 鳥蕙, 恒嘉, 螞蟻, 佩靜, 朱木, 鏗元, 劉大人, 啟揚, 趴趴, 俊宇的肚子, 瓊文, 阿吉, and 槓, with whom I worked hard together.

I thank my roommate and good friend of many years 爽比油 (俊瑋) for his graduation gift, and my best friend 姚明 (宇晨). I thank 廖公 (子揚) for frequently answering my image-processing questions, 吁吁吁 (志軒) for the weekly One Piece, and 賓拉登 for often taking me out for late-night snacks. Thanks also to 老涂, 天婦羅, and the NND of their laboratory, and to seniors 信元, 嘉豪, and 賓哥 for their encouragement and guidance; may you escape the sea of Ph.D. suffering soon.

I thank the partners with whom I worked as guides in Kenting the year I finished college: 欣民, 孝旭, 舒帆, 怡伶, 雅瑜, 婕瑜, 妍伶, 彥廷, 佩君, 嘉玲, 建炘, 志原, 紅生, 致融, 彭大哥, 秋樺姐, 怡慧姐, 婷葳姐, 華珍姐, and 翠玲姐. The two months leading tours with you were probably the happiest memory of my six years at NCTU.

Finally, I thank my family for their support, and the somewhat battered piano in the public practice room of the student activity center: in the many sad and frustrating days late in my graduate studies, playing you often let me forget everything for a while. Thanks also to the electric fan that accompanied me through six hot summers; I hope you remain as you always were at 永融's home.


Contents

Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of Tables
List of Figures
Chapter 1. Introduction
    1.1 Motivation and Objective
    1.2 Literature Review
    1.3 Thesis Subject and Contribution
    1.4 Outline of Thesis
Chapter 2. Traditional Mean-Shift Tracking Algorithm
    2.1 Introduction
    2.2 Target Representation
        2.2.1 Model Representation
        2.2.2 Candidate Representation
    2.3 Similarity Based on Bhattacharyya Coefficient
    2.4 Traditional Mean-Shift Tracker
    2.5 Mean-Shift Tracking Algorithm Procedure
Chapter 3. Spatial-Color Mean-Shift Object Tracking Algorithm
    3.1 Introduction
    3.2 Model Definition
        3.2.1 Paper Survey about Spatiogram
        3.2.2 A Joint Spatial-Color Feature Model
    3.3 Paper Survey about New Similarity Measure
    3.4 Spatial-Color Mean-Shift Object Tracking Algorithm
        3.4.1 Kernel Density Estimation of the Model Image
        3.4.2 Similarity Measure Function
        3.4.3 Spatial-Color Mean-Shift Tracker
        3.4.4 Another Derivation of the New Mean-Shift Tracker
    3.5 Choice of the Color Feature Space
    3.6 Background-Weighted Information
    3.7 Update of Scale and Orientation
        3.7.1 Introduction of Principal Component Analysis
        3.7.2 Orientation Selection by Principal Component Analysis
        3.7.3 Adding Weighted-Background Information
        3.7.4 Scale Selection
    3.8 Summary
Chapter 4. Experiment Results
    4.1 Experiment Illustration
    4.2 Spatial-Color Mean-Shift Trackers with RGB Feature
        4.2.1 Face Sequence
        4.2.2 Cup Sequence
        4.2.3 Walking Girl Sequence
    4.3 Spatial-Color Mean-Shift Trackers with Normalized Feature
        4.3.1 Face Sequence
        4.3.2 Cup Sequence
        4.3.3 Walking Girl Sequences
    4.4 Spatial-Color Mean-Shift Trackers with Normalized Feature and Weighted Information
        4.4.1 Face Sequence
        4.4.2 Cup Sequence
        4.4.3 Walking Girl Sequence
    4.5 Spatial-Color Mean-Shift Trackers with Scale and Orientation
        4.5.1 Walking Person Sequence
        4.5.2 Surveillance Sequences
    4.6 Performance Analysis
Chapter 5. Conclusion and Future Work


List of Tables

Table 2.2-1: Two weight kernel functions.
Table 4.6-1: The preprocessing time of tracker 1 according to the procedure shown in Figure 3.4-2.
Table 4.6-2: The preprocessing time of tracker 1 according to the procedure shown in Figure 3.8-1.
Table 4.6-3: The preprocessing time of tracker 2 according to the procedure shown in Figure 3.4-3.
Table 4.6-4: The preprocessing time of tracker 2 according to the procedure shown in Figure 3.8-2.


List of Figures

Figure 1.2-1: Similar color distribution blocks tracking sequence.
Figure 2.3-1: Illustration of classification of two distributions.
Figure 2.5-1: Traditional mean-shift tracking algorithm procedure.
Figure 3.2-1: Three different poses of a person's head (top), images generated from the computed histogram (middle), and images generated from the computed spatiogram (bottom).
Figure 3.2-2: Illustration of the same spatial information with different color distributions for one bin.
Figure 3.4-1: Illustration of pure translation.
Figure 3.4-2: Spatial-color mean-shift tracking procedure of the first tracker.
Figure 3.4-3: Spatial-color mean-shift tracking procedure of the second tracker.
Figure 3.7-1: Illustration of the scale problem.
Figure 3.7-2: Illustration of principal component analysis.
Figure 3.8-1: Complete spatial-color mean-shift tracking procedure of the first tracker.
Figure 3.8-2: Complete spatial-color mean-shift tracking procedure of the second tracker.
Figure 4.2-1: Face tracking results of the spatial-color mean-shift trackers proposed in 3.4.
Figure 4.2-2: Distance error of the face sequence for spatial-color mean-shift tracker 1 and tracker 2 proposed in 3.4, compared with the traditional mean-shift tracker.
Figure 4.2-3: Iteration number of the face tracking sequence.
Figure 4.2-4: Cup tracking results of the spatial-color mean-shift trackers proposed in 3.4.
Figure 4.2-5: Distance error of the cup sequence for spatial-color mean-shift tracker 1 and tracker 2 proposed in 3.4, compared with the traditional mean-shift tracker.
Figure 4.2-6: Iteration number of the cup sequence.
Figure 4.2-7: Walking girl tracking results of the spatial-color mean-shift trackers proposed in 3.4.
Figure 4.2-8: Iteration numbers of the walking girl sequence.
Figure 4.3-1: Face tracking results of spatial-color mean-shift tracker 1 with the RG feature in 3.5.
Figure 4.3-2: Distance error of the face sequence for spatial-color mean-shift tracker 1 with the RG feature, compared with the RGB feature in 3.5.
Figure 4.3-3: Iteration numbers of the face sequence.
Figure 4.3-4: Face tracking results of spatial-color mean-shift tracker 2 with the RG feature in 3.5.
Figure 4.3-5: Distance error of the face sequence for spatial-color mean-shift tracker 2 with the RG feature, compared with the RGB feature in 3.5.
Figure 4.3-6: Iteration numbers of the face sequence.
Figure 4.3-7: Cup tracking results of spatial-color mean-shift tracker 1 with the RG feature in 3.5.
Figure 4.3-8: Distance error of the cup sequence for spatial-color mean-shift tracker 1 with the RG feature, compared with the RGB feature in 3.5.
Figure 4.3-9: Iteration number of the cup sequence.
Figure 4.3-10: Cup tracking results of spatial-color mean-shift tracker 2 with the RG feature in 3.5.
Figure 4.3-11: Distance error of the cup sequence for spatial-color mean-shift tracker 2 with the RG feature, compared with the RGB feature in 3.5.
Figure 4.3-12: Iteration numbers of the cup sequence.
Figure 4.3-13: Walking girl tracking results of spatial-color mean-shift tracker 1 with the RG feature in 3.5.
Figure 4.3-14: Iteration numbers of the walking girl sequence.
Figure 4.3-15: Walking girl tracking results of spatial-color mean-shift tracker 2 with the RG feature in 3.5.
Figure 4.3-16: Iteration numbers of the walking girl sequence.
Figure 4.4-1: Face tracking results of spatial-color mean-shift tracker 1 with the RG feature and weighted-background information in 3.6.
Figure 4.4-2: Distance error of the face sequence for spatial-color mean-shift tracker 1 with the RG feature and weighted-background information in 3.6, compared with the RG feature only.
Figure 4.4-3: Iteration number of the face sequence.
Figure 4.4-4: Face tracking results of spatial-color mean-shift tracker 2 with the RG feature and weighted-background information in 3.6.
Figure 4.4-5: Distance error of the face tracking sequence for spatial-color mean-shift tracker 2 with the RG feature and weighted-background information in 3.6, compared with the RG feature only.
Figure 4.4-6: Iteration numbers of the face sequence.
Figure 4.4-7: Face tracking results of the spatial-color mean-shift trackers with the RG feature and weighted-background information in 3.6.
Figure 4.4-8: Distance error of the face sequence for the spatial-color mean-shift trackers with the RG feature and weighted-background information in 3.6.
Figure 4.4-9: Iteration number of the face sequence.
Figure 4.4-10: Cup tracking results of spatial-color mean-shift tracker 1 with the RG feature and weighted-background information in 3.6.
Figure 4.4-11: Distance error of the cup sequence for the spatial-color mean-shift trackers with the RG feature and weighted-background information in 3.6.
Figure 4.4-12: Iteration number of the cup sequence.
Figure 4.4-13: Cup tracking results of spatial-color mean-shift tracker 2 with the RG feature and weighted-background information in 3.6.
Figure 4.4-14: Distance error of the cup sequence for the spatial-color mean-shift tracker with the RG feature and weighted-background information in 3.6.
Figure 4.4-16: Cup tracking results of the spatial-color mean-shift trackers with the RG feature and weighted-background information in 3.6.
Figure 4.4-17: Distance error of the cup sequence for the spatial-color mean-shift trackers with the RG feature and weighted-background information in 3.6.
Figure 4.4-18: Iteration number of the cup sequence.
Figure 4.4-19: Walking girl tracking results of spatial-color mean-shift tracker 1 with the RG feature and weighted background in 3.6.
Figure 4.4-20: Iteration numbers of the walking girl sequence.
Figure 4.4-21: Walking girl tracking results of spatial-color mean-shift tracker 2 with the RG feature and weighted background in 3.6.
Figure 4.4-22: Iteration numbers of the walking girl sequence.
Figure 4.4-23: Walking girl tracking results of the spatial-color mean-shift trackers with the RG feature and weighted background in 3.6.
Figure 4.4-24: Iteration numbers of the walking girl sequence.
Figure 4.5-1: Walking person tracking results of spatial-color mean-shift tracker 1 with the PCA scale method.
Figure 4.5-2: Walking person tracking results of spatial-color mean-shift tracker 2 with the PCA scale method.
Figure 4.5-3: Walking person tracking results of the traditional mean-shift tracker with the plus-or-minus-10-percent scale adaptation method.
Figure 4.5-4: Iteration number of the walking person sequence.
Figure 4.5-5: Surveillance tracking results of spatial-color mean-shift tracker 1 with the PCA scale method.
Figure 4.5-6: Surveillance tracking results of spatial-color mean-shift tracker 2 with the PCA scale method.
Figure 4.5-7: Surveillance tracking results of the traditional mean-shift tracker with the plus-or-minus-10-percent scale adaptation method.
Figure 4.5-8: Iteration number of the surveillance sequence.
Figure 4.6-1: The tracking time of the first 200 frames of the face sequence of tracker 1.
Figure 4.6-2: The tracking time of the first 200 frames of the cup sequence of tracker 1.
Figure 4.6-3: The tracking time of the first 200 frames of the face sequence of tracker 2.

Chapter 1. Introduction

1.1 Motivation and Objective

Object tracking is an important problem in many computer vision applications. It can be applied to surveillance systems that capture a person of unknown identity and notify the relevant people immediately. Perceptual interfaces also require a tracking system to locate the user. A good tracking system makes driving more secure and assists the driver in navigation. Furthermore, robot systems, augmented reality, the digital home, and object-based video compression all depend on object tracking.

Up to now, there is no robust object tracking system that can be applied under all circumstances; object tracking systems are always developed for specific situations. For example, V. Parameswaran et al. [3] proposed a tunable representation for tracking that encodes appearance and geometry but fails under deformation, and F. Porikli et al. [2] presented a method for low-frame-rate video in which objects have fast motion, but it fails under large variations of illumination.

In general, a tracking system is easily influenced by many factors. An insufficient target representation can easily create confusion between the target and the background. Large variations of illumination make the appearance of the target differ from that of the model. Occlusion results in an incomplete target representation and makes tracking fail. Moreover, the computer cannot automatically recognize the same target at different scales in the scene. Given these problems and the related applications, how to track a moving object robustly is an important and interesting research issue.

1.2 Literature Review

In this thesis, we propose an algorithm based on the mean-shift tracking algorithm of [1]. The advantages of the mean-shift tracker include fast operation, robustness, and invariance to a large class of object deformations. A large body of related research has followed [1], developing aspects such as feature spaces [4] [5], spatial information [6] [7], and shape adaptation [8] [9].

In visual tracking, object representation is an important issue because it describes the correlation between the appearance and the state of the object. An appropriate object representation is more robust, makes the target model more distinguishable from the background, and achieves a better tracking result. In [1], D. Comaniciu et al. used spatial kernels, in which pixels are weighted by a radially symmetric normalized distance from the object center, together with color histograms to represent blob-like color objects; this target representation makes mean-shift tracking more efficient. A radially symmetric kernel preserves the representation of a pixel's distance from the center even when the object undergoes a large set of transformations, but this approach contains only the color information of the target, and the spatial information is discarded. As shown in Figure 1.2-1, the tracker fails because the rectangular block being tracked overlaps another block with the same color distribution but an inverted spatial distribution of colors.

Figure 1.2-1: Similar color distribution blocks tracking sequence. (The figure is obtained from [3].)

Furthermore, V. Parameswaran et al. [3] proposed tunable kernels for tracking, which simultaneously encode appearance and geometry and enable the use of mean-shift iterations for tracking. A method was presented to modulate the feature histogram of the target using a set of spatial kernels with different bandwidths to encode the spatial information. The paper shows how one could learn the optimal set of bandwidths from captured data for the case of pedestrians walking upright. This approach can indeed solve the problem of blocks with similar color distributions but different spatial configurations, but it works only for particular cases, such as walking upright.

Another problem in visual tracking is how to track the scale of the object. In [1], the mean-shift algorithm is run several times, using the current and scaled window sizes. For each window size, the similarity measure (the Bhattacharyya coefficient) is computed, and the window size yielding the largest Bhattacharyya coefficient, i.e. the most similar distribution, is chosen as the new current scale. V. Parameswaran et al. [3], S. Birchfield et al. [7], and F. Porikli et al. [10] use similar variation methods to handle scale, but this method is unstable and easily makes the tracker lose the target.
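The scale search of [1] described above can be sketched in a few lines of Python. This is our own illustration, not code from the thesis: `run_tracker` is a hypothetical callback standing in for a full mean-shift convergence at a given bandwidth, returning the converged candidate histogram.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two discrete distributions."""
    return float(np.sum(np.sqrt(p * q)))

def select_scale(run_tracker, q_model, h_current, step=0.1):
    """Try the current bandwidth and its +/-10% variants; keep the one whose
    converged candidate histogram is most similar to the model.
    `run_tracker(h)` is an assumed callback: it runs mean-shift to
    convergence at bandwidth h and returns the candidate histogram."""
    candidates = [h_current * (1.0 + s) for s in (-step, 0.0, step)]
    scores = [bhattacharyya(run_tracker(h), q_model) for h in candidates]
    return candidates[int(np.argmax(scores))]
```

The instability noted in the text comes from exactly this structure: the three scores are often nearly equal, so noise can flip the argmax and make the window drift.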

R. Collins [4] extended the mean-shift tracker by adapting T. Lindeberg's theory [11] of feature scale selection based on local maxima of differential scale-space filters. This method uses blob tracking and a scale kernel to accurately capture the target's variation in scale, but the paper does not describe the detailed iteration method. Furthermore, an EM-like algorithm [9] was proposed to estimate the shape of the local mode. This approach simultaneously estimates the position of the local mode and uses the covariance matrix to describe the approximate shape of the object, but the paper does not explain how to decide the scale from the covariance matrix, nor other implementation details.


Q. Zhao et al. [6] and H. Zhang et al. [8] proposed methods to handle rotation and translation. H. Zhang et al. [8] represent the object by a kernel-based model, which offers a more accurate spatial-spectral description than general blob models. Q. Zhao et al. [6] proposed a color correlogram method that uses the correlation of colors to solve the related problem. But these methods are not suitable for complex backgrounds.

Most papers in the literature provide methods for specific applications. This thesis extends the traditional mean-shift tracking algorithm, proposes a new mean-shift based method to improve arbitrary object tracking, and tries to estimate the scale and orientation of the target.

1.3 Thesis Subject and Contribution

The subject of this thesis can be divided into two parts. The first part is to develop new spatial-color mean-shift trackers that capture the target more accurately than the traditional mean-shift tracker. The second part is to develop a method for solving the scale and orientation problem that often appears in computer vision.

In the first part, new spatial-color mean-shift object tracking algorithms are presented so that the trackers can track the target consistently. The tracking algorithms combine spatial information and color features to represent the model more robustly, and use new similarity measure functions to obtain the iterative mean-shift procedure. Other extensions, such as a different color feature space and weighted-background information, are used to improve the performance of these new trackers.

In the second part, this thesis uses the principal component analysis method to estimate the scale and orientation of the tracked target. The principal component analysis method can be extended from the tracking algorithms proposed above, because the spatial-color mean-shift object tracking algorithms and the principal component analysis method both use spatial information and weighted-background information.

The proposed spatial-color mean-shift object tracking algorithms are implemented, and the experiment results show that the new methods are more robust than the traditional mean-shift tracking algorithm and alleviate the scale and orientation problems.

1.4 Outline of Thesis

The remainder of this thesis is organized as follows.

Chapter 2 reviews the traditional mean-shift tracking algorithm, including how to represent the target model, the traditional similarity measure (the Bhattacharyya coefficient), how to derive the traditional mean-shift tracker, and a summary of the complete mean-shift tracking procedure.

Chapter 3 first reviews two recent papers, whose shared concept is extended to develop the new spatial-color mean-shift tracking algorithms. To make the new trackers more robust, some extensions of the basic algorithm are discussed and applied. Finally, the algorithm for solving scale and orientation is presented, and the complete algorithm is summarized at the end of the chapter.

Chapter 4 presents the experiment results, following the development steps of the algorithms in Chapter 3. Real image sequences and figures are presented, and the experiment results are discussed.


Chapter 2. Traditional Mean-Shift Tracking Algorithm

2.1 Introduction

The mean-shift tracking algorithm [1] is a template-based image tracking algorithm. Its main concept is to find, by mean-shift iterations, the candidate that is most similar to the target image. The principle is to compare the color distribution of the candidate region with that of the model, compute the similarity measure (the Bhattacharyya coefficient), and observe the gradient of the candidate's similarity to find the mean-shift vector. The mean shift then finds the most similar region, i.e. the most probable area of the candidate. The following sections introduce the derivation and principle of the traditional mean-shift tracking algorithm.

2.2 Target Representation

Mean-shift is a template-based algorithm, so we must find a feature to represent the target model. In general, we choose the color p.d.f. as the reference model. We take the center of the target model as location 0, and the candidate is centered at location y. We define the target model as q̂ and the target candidate as p̂(y). In practice, the image data are quantized into m-bin histograms to reduce the computational complexity. Thus we define the target model and the target candidate as

\[
\hat{\mathbf{q}} = \{\hat{q}_u\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m}\hat{q}_u = 1
\tag{2-1}
\]

\[
\hat{\mathbf{p}}(\mathbf{y}) = \{\hat{p}_u(\mathbf{y})\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m}\hat{p}_u = 1
\tag{2-2}
\]

Although the histogram is not the best nonparametric density estimate [16], it is simple and sufficient for the traditional mean-shift algorithm.
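As a concrete illustration of the m-bin quantization b(x) and the normalized histogram of (2-1), the following Python sketch quantizes 8-bit RGB pixels with 4 bins per channel (m = 64). The function names and the 4-bin choice are our own assumptions, not the thesis's:

```python
import numpy as np

def bin_index(pixels, bins_per_channel=4):
    """Color index b(x): quantize 8-bit RGB pixels (shape (n, 3)) into
    m = bins_per_channel**3 joint bins."""
    q = (pixels // (256 // bins_per_channel)).astype(int)  # per-channel bin
    return q[:, 0] * bins_per_channel**2 + q[:, 1] * bins_per_channel + q[:, 2]

def color_histogram(pixels, bins_per_channel=4):
    """Normalized m-bin color p.d.f. {q_u} with sum_u q_u = 1, as in (2-1)."""
    m = bins_per_channel**3
    hist = np.bincount(bin_index(pixels, bins_per_channel), minlength=m)
    return hist / hist.sum()
```

The kernel weighting of Section 2.2.1 refines this plain histogram by weighting each pixel's vote by its distance from the region center.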

2.2.1 Model Representation

The first step of the mean-shift tracking algorithm is to build a p.d.f. from the target model image. Let {x*_i}, i = 1,…,n, denote the pixel locations of the region we want to track in the target model, and take the center of the target model as location 0. We define the function b: R² → {1,…,m} as the color index, so that b(x*_i) is the index of the bin of pixel x*_i in the quantized feature space. The probability of the feature u = 1,…,m in the target model is then defined as

\[
\hat{q}_u = C\sum_{i=1}^{n} k\!\left(\|\mathbf{x}_i^*\|^2\right)\delta\!\left[b(\mathbf{x}_i^*)-u\right]
\tag{2-3}
\]

where δ is the Kronecker delta function and C is the normalization constant determined by the condition ∑_{u=1}^{m} q̂_u = 1, so that

\[
C = \frac{1}{\sum_{i=1}^{n} k\!\left(\|\mathbf{x}_i^*\|^2\right)}
\tag{2-4}
\]

since the summation of the delta functions over u = 1,…,m equals one. In (2-3) and (2-4), k(‖x*_i‖²) is a convex, monotonically decreasing kernel function that assigns the highest weight to the center and smaller weights to pixels farther from the center. In general, pixels near the center of the target model region are more reliable than the peripheral pixels, which are often affected when the target is covered by some obstacles; the weights improve the robustness of the tracking result because the peripheral pixels are less significant. D.W. Scott [16] and D. Comaniciu et al. [17] mention two functions, the normal (Gaussian) function and the Epanechnikov function, that are suitable kernel functions for the mean-shift tracking algorithm. Some information about these two functions is listed in Table 2.2-1.

Table 2.2-1: Two weight kernel functions.

Normal (Gaussian) function:
\[
K_N(\mathbf{x}) =
\begin{cases}
\dfrac{1}{(2\pi)^{d/2}}\exp\!\left(-\dfrac{1}{2}\|\mathbf{x}\|^2\right), & \text{if } \|\mathbf{x}\| < 1,\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]

Epanechnikov function:
\[
K_E(\mathbf{x}) =
\begin{cases}
\dfrac{1}{2}C_d^{-1}(d+2)\left(1-\|\mathbf{x}\|^2\right), & \text{if } \|\mathbf{x}\| < 1,\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]

d: dimension of the space (in our 2D image case, d = 2)

C_d: the volume of the unit d-dimensional sphere (in our 2D image case, C_d = π)

2.2.2 Candidate Representation

Now we define the p.d.f. of the candidate in the mean-shift tracking algorithm. Let {x_i}, i = 1,…,n_h, denote the pixel locations of the target candidate region, centered at y in the current frame. As in 2.2.1, we define b(x_i) as the color index of the bin of pixel x_i in the quantized feature space. The probability of the feature u = 1,…,m in the target candidate is then defined as

\[
\hat{p}_u(\mathbf{y}) = C_h\sum_{i=1}^{n_h} k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)\delta\!\left[b(\mathbf{x}_i)-u\right]
\tag{2-5}
\]

where k(x) is the same kernel function as for the target model, h is the bandwidth defining the size of the candidate region, δ is the Kronecker delta function, and C_h is the normalization constant determined by the condition ∑_{u=1}^{m} p̂_u = 1, so that

\[
C_h = \frac{1}{\sum_{i=1}^{n_h} k\!\left(\left\|\dfrac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)}
\tag{2-6}
\]

Note that C_h does not depend on y, since the pixel locations x_i are organized in a regular lattice and y is one of the lattice nodes [1]. Therefore, C_h can be pre-calculated for a given kernel and different values of h.
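Equations (2-5) and (2-6) can be sketched together in Python. This is our own illustration: it uses the Epanechnikov profile k(r) = 1 − r for r < 1 (the kernel's constant factors cancel after the C_h normalization of (2-6)), and the helper name is hypothetical:

```python
import numpy as np

def candidate_histogram(y, pixels_xy, bin_ids, m, h):
    """Kernel-weighted m-bin histogram p_u(y) of (2-5).

    y         : (2,) candidate center
    pixels_xy : (n_h, 2) pixel locations x_i
    bin_ids   : (n_h,) quantized color index b(x_i) of each pixel
    m, h      : number of bins, bandwidth
    """
    r = np.sum(((y - pixels_xy) / h) ** 2, axis=1)  # ||(y - x_i)/h||^2
    k = np.where(r < 1.0, 1.0 - r, 0.0)             # kernel weight per pixel
    p = np.bincount(bin_ids, weights=k, minlength=m)
    return p / k.sum()                              # C_h normalization (2-6)
```

The same function also computes the model histogram (2-3) by passing the model center as y with h equal to the model radius.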

2.3 Similarity Based on Bhattacharyya Coefficient

The similarity measure function compares the target candidate with the target model to find the most similar region. Various similarity measure functions can be used for different target representations. A differentiable kernel function yields a differentiable similarity function, so efficient gradient-based optimization procedures can be used to find its local maximum, which is the most probable region that we want to track.

In the traditional mean-shift tracking algorithm, the Bhattacharyya coefficient is used as the similarity measure function. First, the similarity is expressed as a distance between the model and the candidate; the distance between the two discrete distributions is defined as

\[
d(\mathbf{y}) = \sqrt{1-\rho\!\left[\hat{\mathbf{p}}(\mathbf{y}),\hat{\mathbf{q}}\right]}
\tag{2-7}
\]

and ρ is chosen as the Bhattacharyya coefficient between the candidate p̂ and the model q̂:

\[
\hat{\rho}(\mathbf{y}) \equiv \rho\!\left[\hat{\mathbf{p}}(\mathbf{y}),\hat{\mathbf{q}}\right]
= \sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y})\,\hat{q}_u}
\tag{2-8}
\]

Geometrically, the Bhattacharyya coefficient is the cosine of the angle between the m-dimensional unit vectors (√p̂_1, …, √p̂_m)ᵀ and (√q̂_1, …, √q̂_m)ᵀ, and it is an efficient divergence-type statistical measure.

From a different point of view, the Bhattacharyya coefficient can be considered in terms of the classification error between two distributions. Figure 2.3-1 shows that the best classification is obtained along the vertical line through point A, where the smallest error is the yellow region. A larger classification error results in a larger Bhattacharyya coefficient and represents high similarity, while a smaller error results in a smaller Bhattacharyya coefficient and represents low similarity.

Figure 2.3-1: Illustration of classification of two distributions.
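Equations (2-7) and (2-8) amount to a few lines of Python (our own sketch; the small clamp guarding the square root against floating-point drift is an implementation choice, not part of the thesis):

```python
import numpy as np

def bhattacharyya(p, q):
    """rho[p, q] = sum_u sqrt(p_u * q_u), eq. (2-8)."""
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """d(y) = sqrt(1 - rho), eq. (2-7); clamped so rounding cannot
    push the argument of sqrt below zero."""
    return float(np.sqrt(max(0.0, 1.0 - bhattacharyya(p, q))))
```

For identical distributions rho is 1 and the distance is 0; for distributions with disjoint support rho is 0 and the distance is 1.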

2.4 Traditional Mean-Shift Tracker

The traditional mean-shift tracker is derived by maximizing the Bhattacharyya coefficient (2-8). We use a Taylor expansion of (2-8) around the values p̂_u(ŷ_0), where ŷ_0 is the location of the target in the previous frame, to find the new target location in the current frame. The linear approximation is obtained as

\[
\rho\!\left[\hat{\mathbf{p}}(\mathbf{y}),\hat{\mathbf{q}}\right]
= \sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y})\,\hat{q}_u}
\approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{\mathbf{y}}_0)\,\hat{q}_u}
+ \frac{1}{2}\sum_{u=1}^{m}\hat{p}_u(\mathbf{y})\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{\mathbf{y}}_0)}}
\tag{2-9}
\]

The approximation holds under the assumption that the target does not move drastically from the previous location ŷ_0 to the current location y, a condition that is almost always satisfied between consecutive image frames. Substituting (2-5) into (2-9), we obtain

\[
\rho\!\left[\hat{\mathbf{p}}(\mathbf{y}),\hat{\mathbf{q}}\right]
\approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{\mathbf{y}}_0)\,\hat{q}_u}
+ \frac{C_h}{2}\sum_{i=1}^{n_h} w_i\, k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)
\tag{2-10}
\]

where

\[
w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{\mathbf{y}}_0)}}\;\delta\!\left[b(\mathbf{x}_i)-u\right]
\tag{2-11}
\]

The objective is to find the maximum of ρ(y). Because the first term of (2-10) is independent of y, it does not affect the maximization, and ρ(y) is influenced only by

\[
f(\mathbf{y}) = \frac{C_h}{2}\sum_{i=1}^{n_h} w_i\, k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)
\tag{2-12}
\]

Applying a gradient-based optimization procedure to (2-12), we obtain

\[
\nabla f(\mathbf{y}) = \frac{C_h}{h^2}\sum_{i=1}^{n_h} w_i\,(\mathbf{y}-\mathbf{x}_i)\,
k'\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)
\tag{2-13}
\]

Letting g(x) = −k′(x), we obtain

\[
\nabla f(\mathbf{y}) = \frac{C_h}{h^2}\sum_{i=1}^{n_h} w_i\,(\mathbf{x}_i-\mathbf{y})\,
g\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)
= \frac{C_h}{h^2}\left[\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)\right]
\left[\frac{\displaystyle\sum_{i=1}^{n_h}\mathbf{x}_i\, w_i\, g\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)}
{\displaystyle\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)} - \mathbf{y}\right]
\tag{2-14}
\]

We can separate (2-14) into two factors. The first factor is proportional to the density estimate at y, and the sum of the g terms is assumed to be a positive number [18]. Setting

\[
\nabla f(\mathbf{y}) = 0
\tag{2-15}
\]

the second factor yields the mean-shift vector, and the new location is obtained as

\[
\hat{\mathbf{y}}_1 =
\frac{\displaystyle\sum_{i=1}^{n_h}\mathbf{x}_i\, w_i\, g\!\left(\left\|\frac{\hat{\mathbf{y}}_0-\mathbf{x}_i}{h}\right\|^2\right)}
{\displaystyle\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\hat{\mathbf{y}}_0-\mathbf{x}_i}{h}\right\|^2\right)}
\tag{2-16}
\]

The mean-shift vector always points toward the direction of maximum increase in the density. With this procedure we find a local maximum of the density via (2-16), and the kernel region moves recursively from the current location ŷ_0 to the new location ŷ_1.
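A single iteration of (2-11) and (2-16) can be sketched in Python (our own illustration, assuming the Epanechnikov profile, for which g = −k′ is constant inside the unit ball; the small epsilon guarding against empty candidate bins is an implementation choice):

```python
import numpy as np

def mean_shift_step(y0, pixels_xy, bin_ids, q_model, p_cand, h, eps=1e-10):
    """One tracker iteration: the weights w_i of (2-11) followed by the
    location update of (2-16).

    y0        : (2,) current location estimate
    pixels_xy : (n_h, 2) pixel locations x_i
    bin_ids   : (n_h,) quantized color index b(x_i)
    q_model   : (m,) model histogram, p_cand : (m,) candidate histogram at y0
    """
    w = np.sqrt(q_model[bin_ids] / (p_cand[bin_ids] + eps))   # (2-11)
    r = np.sum(((y0 - pixels_xy) / h) ** 2, axis=1)
    g = (r < 1.0).astype(float)            # g(||.||^2), constant inside ball
    wg = w * g
    return (wg[:, None] * pixels_xy).sum(axis=0) / wg.sum()   # (2-16)
```

With uniform weights the update simply moves to the centroid of the pixels inside the kernel window; the weights w_i shift that centroid toward pixels whose colors are over-represented in the model relative to the current candidate.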

2.5 Mean-Shift Tracking Algorithm Procedure

The complete traditional mean-shift tracking algorithm is presented in Figure 2.5-1.

Figure 2.5-1: Traditional mean-shift tracking algorithm procedure. (The flowchart initializes at ŷ_0, computes the candidate histogram {p̂_u(ŷ_0)}_{u=1,…,m} and the weights {w_i}_{i=1,…,n_h} from the model {q̂_u}_{u=1,…,m} via (2-11), applies the mean-shift update (2-16) to obtain ŷ_1, and stops when ‖ŷ_1 − ŷ_0‖ < ε; otherwise it sets ŷ_0 ← ŷ_1 and repeats.)
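The procedure of Figure 2.5-1 can be sketched as a self-contained loop. This is our own Python sketch under stated assumptions: the Epanechnikov profile is used (so g = −k′ = 1 inside the unit ball), and `frame_xy`/`frame_bins` are hypothetical names for the pixel coordinates and quantized color indices of the current frame:

```python
import numpy as np

def track(y0, frame_xy, frame_bins, q_model, m, h, eps=0.5, max_iter=20):
    """Traditional mean-shift loop of Figure 2.5-1 (sketch)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        r = np.sum(((y - frame_xy) / h) ** 2, axis=1)
        k = np.where(r < 1.0, 1.0 - r, 0.0)          # Epanechnikov profile
        p = np.bincount(frame_bins, weights=k, minlength=m)
        p = p / max(k.sum(), 1e-12)                  # p_u(y), eqs. (2-5)/(2-6)
        w = np.sqrt(q_model[frame_bins] / (p[frame_bins] + 1e-12))  # (2-11)
        g = (r < 1.0).astype(float)                  # g = -k' inside the ball
        wg = w * g
        y_new = (wg[:, None] * frame_xy).sum(axis=0) / wg.sum()     # (2-16)
        if np.linalg.norm(y_new - y) < eps:          # ||y1 - y0|| < epsilon
            return y_new
        y = y_new
    return y
```

In practice [1] also caps the iteration count (here `max_iter`), since the convergence test alone can oscillate between neighboring lattice nodes.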


Chapter 3. Spatial-Color Mean-Shift Object Tracking

Algorithm

3.1 Introduction

In this chapter, we introduce two papers [7] [12] concerning the spatiogram and a new similarity measure. Using the spatiogram of [7], we extend the original model to a new model with spatial and color feature information, and then use a method resembling that of [12] to obtain two different similarity measures. We derive the iterative mean-shift tracking algorithms from these similarity measure functions. We then discuss two different color features and select the better color feature space by experiment. To improve robustness, we take the background information into account and add a background-weighted parameter to the new mean-shift algorithms. Finally, we discuss the scale problem and use the principal component analysis method to solve it. In conclusion, we give a summary and list the complete new mean-shift algorithm procedures.

3.2 Model Definition

In the traditional mean-shift tracking algorithm, the color histogram is used as the target representation. The color histogram discards all spatial information and represents the target purely by its color distribution. This foundational technique has been used to develop several tracking systems [2] [4] [10], which show that the color histogram is robust to deformation of the tracked object. In some circumstances, however, spatial information is important and advantageous against various kinds of interference. In this chapter, we want to build the target model with both color and spatial information.

3.2.1 Paper Survey about Spatiogram

Recently, S. Birchfield et al. [7] proposed the concept that each histogram bin contains the mean and covariance of the locations of the pixels belonging to that bin. This idea spatially weights each bin by the mean and covariance of its pixel locations, rather than using only the color information as in the traditional method. They call this concept a spatial histogram, or spatiogram. The spatiogram of an image I can be represented as follows:

$$h_I(b) = \langle n_b, \mu_b, \Sigma_b \rangle, \quad b = 1,\ldots,B \tag{3-1}$$

where $n_b$ is the number of pixels whose values belong to the b-th bin, $\mu_b$ is the mean vector of the locations (the 2D image coordinates) of all pixels belonging to the b-th bin, and $\Sigma_b$ is the covariance matrix (a symmetric matrix) of the locations of all pixels belonging to the b-th bin; the pixels in the image are classified into B bins.
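A minimal sketch of computing a spatiogram from a gray-level image: the array layout, the 8-bit binning rule, and the biased covariance estimate below are our own assumptions, not details taken from [7].

```python
import numpy as np

def spatiogram(image, B=8):
    """Spatiogram h_I(b) = <n_b, mu_b, Sigma_b>: per-bin pixel count,
    mean location, and (biased) location covariance.
    Returns a dict mapping each non-empty bin index to its triple."""
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    loc = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    bins = image.ravel().astype(int) * B // 256   # 8-bit values -> B bins
    model = {}
    for b in np.unique(bins):
        pts = loc[bins == b]
        mu = pts.mean(axis=0)
        cov = np.cov(pts.T, bias=True) if len(pts) > 1 else np.zeros((2, 2))
        model[int(b)] = (len(pts), mu, cov)
    return model
```

Each bin thus records where its pixels sit in the image, which is exactly the information the plain histogram throws away.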

The spatiogram captures the spatial information of the histogram bins, whereas the traditional color histogram only captures the color distribution. Figure 3.2-1 illustrates the difference between the spatiogram and the traditional histogram. The first row shows three different poses of a person's head. If we compute the histogram of each head and then try to rebuild the original image from it, we only obtain the disordered images of the second row, which barely contain more than the color information. However, the images rebuilt from the computed spatiogram reveal the spatial relationships of the colors, as shown in the third row. The paper uses the spatiogram and the general Bhattacharyya coefficient to derive a mean shift procedure and improves the tracking result compared with the histogram method. The experiment results demonstrate that the spatial information provides a firmer description of the target and improves robustness in tracking.

Figure 3.2-1 : Three different poses of a person’s head (top), images generated from the computed histogram (middle), images generated from the computed spatiogram (bottom). (The figure is obtained from [7].)

3.2.2 A Joint Spatial-Color Feature Model

As shown in Figure 3.2-2, if cyan and blue belong to the same bin, the two blocks have the same spatiogram although they have different color patterns. To keep the robustness of the color description of the spatiogram, we extend the spatiogram and define our joint spatial-color model as

$$h_I(b) = \langle n_b, \mu_{P,b}, \Sigma_{P,b}, \mu_{C,b}, \Sigma_{C,b} \rangle, \quad b = 1,\ldots,B \tag{3-2}$$

where $n_b$, $\mu_{P,b}$, $\Sigma_{P,b}$ are the same as in the spatiogram proposed by S. Birchfield et al., i.e. respectively the number of pixels, the mean vector of the locations, and the covariance matrix of the locations of the pixels which belong to the b-th bin. In (3-2), we add two elements: $\mu_{C,b}$ is the mean vector of the color feature with d color channels (for example, in RGB color space, d = 3) of all pixels which belong to the b-th bin, and $\Sigma_{C,b}$ is the covariance matrix of the color features of all pixels which belong to the b-th bin; the pixels in the image are classified into B bins. We first choose the RGB channels as the color feature, so $\mu_{C,b}$ is a 3D vector and $\Sigma_{C,b}$ is a symmetric matrix; a more robust color feature will be discussed in 3.5.
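Extending the spatiogram computation to the joint model of (3-2), each bin additionally stores the mean and covariance of its pixel colors. In this sketch, binning by the 8-bit R channel alone is a simplification of ours for brevity (the thesis quantizes the color space into B bins):

```python
import numpy as np

def spatial_color_model(image_rgb, B=8):
    """Joint spatial-color model (3-2):
    bin -> <n_b, mu_Pb, Sigma_Pb, mu_Cb, Sigma_Cb>.
    Binning uses only the 8-bit R channel here for brevity."""
    H, W, _ = image_rgb.shape
    ys, xs = np.mgrid[0:H, 0:W]
    loc = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    col = image_rgb.reshape(-1, 3).astype(float)
    bins = image_rgb[..., 0].ravel().astype(int) * B // 256
    model = {}
    for b in np.unique(bins):
        idx = bins == b
        P, C = loc[idx], col[idx]
        model[int(b)] = (int(idx.sum()),
                         P.mean(axis=0), np.cov(P.T, bias=True),
                         C.mean(axis=0), np.cov(C.T, bias=True))
    return model
```

The cyan/blue example of Figure 3.2-2 would now be distinguished: the two blocks share $\mu_{P,b}$ and $\Sigma_{P,b}$ but differ in $\mu_{C,b}$.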

Figure 3.2-2 : Illustration of the same spatial information with different color distribution for one bin.

3.3 Paper Survey about New Similarity Measure

The essence of the mean shift algorithm is to find the local maximum of the similarity measure between the image model and the candidate. In general, the similarity measure can be used to derive a mean shift algorithm, and the iterative result tracks the candidate location. The similarity measures most commonly used in image tracking are the Bhattacharyya coefficient and the Kullback-Leibler divergence. For the spatial-color model, we want to find a simple similarity measure function from which to obtain the mean-shift algorithm.

Lately, C. Yang et al. [12] proposed a new simple symmetric similarity function between kernel density estimates of the template and candidate distributions in a joint spatial-feature space, and then presented an iterative tracking algorithm. The paper denotes the model image as $I_x = \{\mathbf{x}_i, u_i\}_{i=1,\ldots,M}$ and the candidate image as $I_y = \{\mathbf{y}_j, v_j\}_{j=1,\ldots,N}$, where $\mathbf{x}_i$ and $\mathbf{y}_j$ are locations of pixels, and $u_i$ and $v_j$ belong to the feature space. The paper describes the target feature distribution in the joint spatial-feature space, and uses an estimated kernel density function to model the p.d.f. of the object in the model image as

$$\hat{p}_x(\mathbf{x}, u) = \frac{1}{M}\sum_{i=1}^{M} w_i\, k\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{\sigma}\right\|^2\right) k\!\left(\left\|\frac{u-u_i}{h}\right\|^2\right) \tag{3-3}$$

where σ and h are the bandwidths in the spatial and feature spaces. We can also regard (3-3) as a spatial weighting function w combined with a Gaussian mixture model of the feature space through k.

Finally, the paper uses the expectation of the estimated kernel density function between the model $I_x$ and the candidate $I_y$ in the joint feature-spatial space as the similarity measure:

$$J(I_x, I_y) = \frac{1}{N}\sum_{j=1}^{N} \hat{p}_x(\mathbf{y}_j, v_j) \tag{3-4}$$

$$= \frac{1}{MN}\sum_{j=1}^{N}\sum_{i=1}^{M} w_i\, k\!\left(\left\|\frac{\mathbf{x}_i-\mathbf{y}_j}{\sigma}\right\|^2\right) k\!\left(\left\|\frac{u_i-v_j}{h}\right\|^2\right) \tag{3-5}$$

The paper then uses (3-5) to derive a similarity-based mean-shift tracking algorithm. The experiment results show that it is more accurate and requires fewer iterations than the traditional Bhattacharyya coefficient method. The main concept of the new similarity function is the expectation over all pixels of the model and the candidate.
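A direct evaluation of an expectation-based similarity of the form (3-5) can be vectorized over all model-candidate pixel pairs. The Gaussian profile k(t) = exp(−t/2) and the function names below are illustrative assumptions:

```python
import numpy as np

def similarity(X, U, Y, V, w, sigma, h):
    """J = 1/(MN) * sum_{j,i} w_i k(||x_i-y_j||^2/sigma^2) k(||u_i-v_j||^2/h^2)
    with the Gaussian profile k(t) = exp(-t/2).
    X: (M,2) model locations, U: (M,d) model features,
    Y: (N,2) candidate locations, V: (N,d) candidate features, w: (M,)."""
    ds = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2) / sigma ** 2
    df = np.sum((U[:, None, :] - V[None, :, :]) ** 2, axis=2) / h ** 2
    K = np.exp(-0.5 * (ds + df))               # (M, N) pairwise kernel values
    return float((w[:, None] * K).mean())      # average over all M*N pairs
```

A candidate identical to the model maximizes this expectation, and the value decays as the candidate drifts away in space or in feature.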

3.4 Spatial-Color Mean-Shift Object Tracking Algorithm

With the spatial-color feature and the concept of expectation, we develop two different tracking algorithms. The detailed derivations are as follows.

3.4.1 Kernel Density Estimation of the Model Image

We take the joint spatial-color model (3-2) and set the image model as the estimated kernel density function

$$\hat{p}_x(\mathbf{x}, u) = \frac{1}{M}\sum_{i=1}^{M} K_P\!\left(\mathbf{x}-\mu_{P,b(i)},\, \Sigma_{P,b(i)}\right) K_C\!\left(\mathbf{c}_{\mathbf{x}}-\mu_{C,b(i)},\, \Sigma_{C,b(i)}\right) \delta[u-b(i)] \tag{3-6}$$

where $b(i)$ is the color bin to which pixel i belongs, $\mathbf{c}_{\mathbf{x}}$ is the color vector at location $\mathbf{x}$, and $K_P$ and $K_C$ are multivariate Gaussian kernel functions,

$$K_P(\mathbf{z},\Sigma)=\frac{1}{2\pi|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}\mathbf{z}^T\Sigma^{-1}\mathbf{z}\right), \qquad K_C(\mathbf{z},\Sigma)=\frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}\mathbf{z}^T\Sigma^{-1}\mathbf{z}\right)$$

We use the delta function in the role that the Gaussian function plays in (3-3); the difference between these two choices is that the Gaussian function contains a smoothing component while the delta function does not. We can also regard $K_P$ and $K_C$ as the spatially weighted and color-feature weighted functions.

3.4.2 Similarity Measure Function

Similar to the concept of the expectation of the estimated kernel density in (3-5), we can get a new similarity measure function between the model $I_x = \{\mathbf{x}_i, u_i\}_{i=1,\ldots,M}$ and the candidate $I_y = \{\mathbf{y}_j, v_j\}_{j=1,\ldots,N}$ as

$$J(\mathbf{y}) = \frac{1}{N}\sum_{j=1}^{N} \hat{p}_x(\mathbf{y}_j, v_j) = \frac{1}{NM}\sum_{j=1}^{N}\sum_{i=1}^{M} K_P\!\left(\mathbf{y}_j-\mu_{P,b(i)},\, \Sigma_{P,b(i)}\right) K_C\!\left(\mathbf{c}_{\mathbf{y}_j}-\mu_{C,b(i)},\, \Sigma_{C,b(i)}\right) \delta[v_j-b(i)] \tag{3-7}$$

As shown in Figure 3.4-1, if there is no deformation between the candidate and the target, and the motion between frames is not large, we can consider the motion of the object between two frames as a pure translation. Under these assumptions, the position of the model center $\mathbf{x}$ relative to the mean location of the b-th bin in the model equals the position of the candidate center $\mathbf{y}$ relative to the mean location of the b-th bin in the candidate image. So we can obtain

$$\mathbf{x} - \mu_{P,b(i)} = \mathbf{y} - \mu^{\mathbf{y}}_{P,b(j)} \tag{3-8}$$

$$\mu_{P,b(i)} = \mu^{\mathbf{y}}_{P,b(j)} - \mathbf{y} + \mathbf{x} \tag{3-9}$$

Figure 3.4-1 : Illustration of pure translation (model centered at $\mathbf{x}$ with bin mean $\mu_{P,b(i)}$; target centered at $\mathbf{y}$ with bin mean $\mu^{\mathbf{y}}_{P,b(j)}$).

Substituting (3-9) into (3-7), we obtain the new similarity measure function as follows:

$$J(\mathbf{y}) = \frac{1}{NM}\sum_{j=1}^{N}\sum_{i=1}^{M} K_P\!\left(\mathbf{y}_j-\mu^{\mathbf{y}}_{P,b(j)}+\mathbf{y}-\mathbf{x},\, \Sigma_{P,b(i)}\right) K_C\!\left(\mathbf{c}_{\mathbf{y}_j}-\mu_{C,b(i)},\, \Sigma_{C,b(i)}\right) \delta[v_j-b(i)] \tag{3-10}$$

3.4.3 Spatial-Color Mean-Shift Tracker

As with the traditional Bhattacharyya coefficient method, we want to find the maximum value of the similarity measure to get the best candidate, so we let the gradient of the similarity function with respect to the vector $\mathbf{y}$ be equal to 0:

$$\nabla J(\mathbf{y}) = \mathbf{0}$$

$$\Rightarrow \frac{1}{NM}\sum_{j=1}^{N}\sum_{i=1}^{M} \Sigma_{P,b(i)}^{-1}\left(\mathbf{y}_j-\mu^{\mathbf{y}}_{P,b(j)}+\mathbf{y}-\mathbf{x}\right) K_P K_C\, \delta[v_j-b(i)] = \mathbf{0}$$

$$\Rightarrow \left\{\sum_{j=1}^{N}\sum_{i=1}^{M} \Sigma_{P,b(i)}^{-1} K_P K_C\, \delta[v_j-b(i)]\right\}(\mathbf{y}-\mathbf{x}) = \sum_{j=1}^{N}\sum_{i=1}^{M} \Sigma_{P,b(i)}^{-1}\left(\mu^{\mathbf{y}}_{P,b(j)}-\mathbf{y}_j\right) K_P K_C\, \delta[v_j-b(i)]$$

$$\mathbf{y} = \mathbf{x} + \left\{\sum_{j=1}^{N}\sum_{i=1}^{M} \Sigma_{P,b(i)}^{-1} K_P K_C\, \delta[v_j-b(i)]\right\}^{-1} \left\{\sum_{j=1}^{N}\sum_{i=1}^{M} \Sigma_{P,b(i)}^{-1}\left(\mu^{\mathbf{y}}_{P,b(j)}-\mathbf{y}_j\right) K_P K_C\, \delta[v_j-b(i)]\right\} \tag{3-11}$$

(3-11) is the mean shift vector and is also an iterative function with respect to $\mathbf{y}$, and we rewrite (3-11) as

$$\mathbf{y}_{new} = \mathbf{x} + \left\{\sum_{j=1}^{N}\sum_{i=1}^{M} \Sigma_{P,b(i)}^{-1} K_P K_C\, \delta[v_j-b(i)]\right\}^{-1} \left\{\sum_{j=1}^{N}\sum_{i=1}^{M} \Sigma_{P,b(i)}^{-1}\left(\mu^{\mathbf{y}}_{P,b(j)}-\mathbf{y}_j\right) K_P K_C\, \delta[v_j-b(i)]\right\} \tag{3-12}$$

where

$$K_P = \frac{1}{2\pi\left|\Sigma_{P,b(i)}\right|^{1/2}} \exp\!\left(-\frac{1}{2}\left(\mathbf{y}_j-\mu^{\mathbf{y}}_{P,b(j)}+\mathbf{y}_{old}-\mathbf{x}\right)^T \Sigma_{P,b(i)}^{-1} \left(\mathbf{y}_j-\mu^{\mathbf{y}}_{P,b(j)}+\mathbf{y}_{old}-\mathbf{x}\right)\right) \tag{3-13}$$

$$K_C = \frac{1}{(2\pi)^{3/2}\left|\Sigma_{C,b(i)}\right|^{1/2}} \exp\!\left(-\frac{1}{2}\left(\mathbf{c}_{\mathbf{y}_j}-\mu_{C,b(i)}\right)^T \Sigma_{C,b(i)}^{-1} \left(\mathbf{c}_{\mathbf{y}_j}-\mu_{C,b(i)}\right)\right) \tag{3-14}$$

and $\mathbf{y}_{new}$ is the new position of the target which we want to track, while $\mathbf{y}_{old}$ is the current position.
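One update of the first tracker has the structure $\mathbf{y}_{new} = \mathbf{x} + A^{-1}\mathbf{b}$ of (3-12). The sketch below is a heavily simplified assumed implementation: the delta function restricts the double sum to pairs with $v_j = b(i)$, so we loop over candidate pixels and use each pixel's own bin, and we assume the scalar kernel weights $K_P K_C$ have already been evaluated per candidate pixel. All names are ours.

```python
import numpy as np

def tracker1_step(x_center, Yp, Vbins, mu_Py, Sinv_by_bin, kw):
    """y_new = x + A^{-1} b, with
    A   = sum_j kw_j * Sigma_{P,b}^{-1}
    b   = sum_j kw_j * Sigma_{P,b}^{-1} (mu^y_{P,b} - y_j).
    Yp: (N,2) candidate pixel locations; Vbins: their bins;
    mu_Py: bin -> mean location in the candidate image;
    Sinv_by_bin: bin -> inverse spatial covariance of the model;
    kw: (N,) precomputed K_P * K_C weights."""
    A = np.zeros((2, 2))
    rhs = np.zeros(2)
    for yj, b, w in zip(Yp, Vbins, kw):
        if b not in Sinv_by_bin:
            continue                      # bin absent from the model
        Sinv = Sinv_by_bin[b]
        A += w * Sinv
        rhs += w * (Sinv @ (mu_Py[b] - yj))
    return x_center + np.linalg.solve(A, rhs)
```

When the candidate bin means already match the translated model (rhs sums to zero), the update leaves the position at the model center, as expected at convergence.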

3.4.4 Another Derivation of the New Mean-Shift Tracker

Now we use another method to derive the second similarity-based mean-shift tracker. Starting from the kernel density estimation model (3-6) defined in 3.4.1, if we replace $\mathbf{x}$ by $\mathbf{x}_i$ and $\mathbf{c}_{\mathbf{x}}$ by $\mathbf{c}_i$ in (3-6), we get a new kernel density estimation function

$$\hat{p}(u) = \frac{1}{M}\sum_{i=1}^{M} K_P\!\left(\mathbf{x}_i-\mu_{P,b(i)},\, \Sigma_{P,b(i)}\right) K_C\!\left(\mathbf{c}_i-\mu_{C,b(i)},\, \Sigma_{C,b(i)}\right) \delta[u-b(i)] \tag{3-15}$$

where

$$K_P = \frac{1}{2\pi\left|\Sigma_{P,b(i)}\right|^{1/2}} \exp\!\left(-\frac{1}{2}\left(\mathbf{x}_i-\mu_{P,b(i)}\right)^T \Sigma_{P,b(i)}^{-1} \left(\mathbf{x}_i-\mu_{P,b(i)}\right)\right) \tag{3-16}$$

$$K_C = \frac{1}{(2\pi)^{3/2}\left|\Sigma_{C,b(i)}\right|^{1/2}} \exp\!\left(-\frac{1}{2}\left(\mathbf{c}_i-\mu_{C,b(i)}\right)^T \Sigma_{C,b(i)}^{-1} \left(\mathbf{c}_i-\mu_{C,b(i)}\right)\right) \tag{3-17}$$

$K_P$ and $K_C$ are again the spatially weighted and color-feature weighted functions, but these two weighting functions now depend only on the image model.

With a concept similar to the expectation of the estimated kernel density used in 3.4.2, we define another new similarity measure function between the model $I_x = \{\mathbf{x}_i, u_i\}_{i=1,\ldots,M}$ and the candidate $I_y = \{\mathbf{y}_j, v_j\}_{j=1,\ldots,N}$ as

$$J(\mathbf{y}) = \frac{1}{N}\sum_{j=1}^{N} G(\mathbf{y}-\mathbf{y}_j)\, \hat{p}(v_j) = \frac{1}{NM}\sum_{j=1}^{N}\sum_{i=1}^{M} G(\mathbf{y}-\mathbf{y}_j)\, K_P K_C\, \delta[v_j-b(i)] \tag{3-18}$$

where $\mathbf{y}$ is the center of the candidate image and $G(\mathbf{y}-\mathbf{y}_j)$ is a spatial weighting function that depends on the candidate image. (3-18) is the other new similarity measure function that we propose.

Now we let the gradient of the similarity function with respect to the vector y be equal to 0 to find the maximum value of the similarity measure to obtain the best candidate.

$$\nabla J(\mathbf{y}) = \mathbf{0} \tag{3-19}$$

$$\Rightarrow \frac{1}{NM}\sum_{j=1}^{N}\sum_{i=1}^{M} G'(\mathbf{y}-\mathbf{y}_j)\,(\mathbf{y}-\mathbf{y}_j)\, K_P K_C\, \delta[v_j-b(i)] = \mathbf{0}$$

$$\Rightarrow \mathbf{y}\sum_{j=1}^{N}\sum_{i=1}^{M} G'(\mathbf{y}-\mathbf{y}_j)\, K_P K_C\, \delta[v_j-b(i)] = \sum_{j=1}^{N}\sum_{i=1}^{M} G'(\mathbf{y}-\mathbf{y}_j)\, \mathbf{y}_j\, K_P K_C\, \delta[v_j-b(i)]$$

$$\Rightarrow \mathbf{y} = \frac{\sum_{j=1}^{N}\sum_{i=1}^{M} \mathbf{y}_j\, G'(\mathbf{y}-\mathbf{y}_j)\, K_P K_C\, \delta[v_j-b(i)]}{\sum_{j=1}^{N}\sum_{i=1}^{M} G'(\mathbf{y}-\mathbf{y}_j)\, K_P K_C\, \delta[v_j-b(i)]} \tag{3-20}$$

So (3-20) is another iterative mean shift vector, which we rewrite as

$$\mathbf{y}_{new} = \frac{\sum_{j=1}^{N}\sum_{i=1}^{M} \mathbf{y}_j\, G'(\mathbf{y}_{old}-\mathbf{y}_j)\, K_P K_C\, \delta[v_j-b(i)]}{\sum_{j=1}^{N}\sum_{i=1}^{M} G'(\mathbf{y}_{old}-\mathbf{y}_j)\, K_P K_C\, \delta[v_j-b(i)]} \tag{3-21}$$

where $\mathbf{y}_{new}$ is the new position of the target and $\mathbf{y}_{old}$ is the current position. (3-21) contains the spatial weighting term $G'(\mathbf{y}_{old}-\mathbf{y}_j)$, and we choose the function G as the Epanechnikov kernel

$$K(\mathbf{x}) = \begin{cases} \dfrac{1}{2}C_d^{-1}(d+2)\left(1-\|\mathbf{x}\|^2\right), & \text{if } \|\mathbf{x}\| < 1 \\ 0, & \text{otherwise} \end{cases} \tag{3-22}$$

where d is the dimension of the space and $C_d$ is the volume of the unit d-dimensional sphere. Letting $K(\mathbf{x}) = k(\|\mathbf{x}\|^2)$, we obtain the profile

$$k(x) = \begin{cases} \dfrac{1}{2}C_d^{-1}(d+2)(1-x), & \text{if } x < 1 \\ 0, & \text{otherwise} \end{cases} \tag{3-23}$$

In the image case, d = 2, so $C_d = \pi$ and we obtain

$$k(x) = \begin{cases} \dfrac{1}{2\pi}(2+2)(1-x) = \dfrac{2}{\pi}(1-x), & \text{if } x < 1 \\ 0, & \text{otherwise} \end{cases} \tag{3-24}$$

Letting $G(x) = k(x)$, we obtain

$$G'(x) = k'(x) = -\frac{2}{\pi} \tag{3-25}$$

which is a constant. This makes the result simple to compute, which is why we choose the weighting function G as the Epanechnikov kernel. Finally, substituting (3-25) into (3-21), the constant cancels between numerator and denominator, and we get the second similarity-based mean-shift algorithm:

$$\mathbf{y}_{new} = \frac{\sum_{j=1}^{N}\sum_{i=1}^{M} \mathbf{y}_j\, K_P K_C\, \delta[v_j-b(i)]}{\sum_{j=1}^{N}\sum_{i=1}^{M} K_P K_C\, \delta[v_j-b(i)]} \tag{3-26}$$

(3-26) says that the object tracking algorithm is an iterative procedure that moves from the current position $\mathbf{y}_{old}$ to the new position $\mathbf{y}_{new}$.
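Because the Epanechnikov choice makes G′ a constant that cancels, one update per (3-26) reduces to a weighted average of candidate pixel locations. A minimal sketch, assuming the per-bin model weight $\sum_{i:\,b(i)=b} K_P K_C$ from (3-16) and (3-17) has been precomputed into a dict (an arrangement of ours):

```python
import numpy as np

def tracker2_step(Yp, Vbins, w_model):
    """y_new = (sum_j w_{v_j} * y_j) / (sum_j w_{v_j}), where
    w_b = sum over model pixels i with b(i)=b of K_P * K_C is a
    precomputed model weight. Yp: (N,2) locations; Vbins: their bins."""
    num, den = np.zeros(2), 0.0
    for yj, b in zip(Yp, Vbins):
        w = w_model.get(b, 0.0)    # bins unseen in the model contribute 0
        num += w * np.asarray(yj, dtype=float)
        den += w
    return num / den if den > 0 else None
```

Candidate pixels whose bins never appear in the model are simply ignored, so the update is pulled toward regions whose colors match the target.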

3.4.5 Spatial-Color Mean-Shift Tracking Procedure

Having derived the new spatial-color mean-shift tracking algorithms, the single-object tracking procedures can be summarized as in Figure 3.4-2 and Figure 3.4-3.

Figure 3.4-2 : Spatial-color mean-shift tracking procedure of the first tracker: compute the target model $h_I(b) = \langle n_b, \mu_{P,b}, \Sigma_{P,b}, \mu_{C,b}, \Sigma_{C,b}\rangle$ according to (3-2); initialize the location $\mathbf{y}_{old}$ of the target in the current frame; compute $K_P$ and $K_C$ according to (3-13) and (3-14) and substitute them into (3-12) to find the next location $\mathbf{y}_{new}$ of the target candidate; if $\|\mathbf{y}_{new}-\mathbf{y}_{old}\| < \varepsilon$, finish one iteration, otherwise set $\mathbf{y}_{old} \leftarrow \mathbf{y}_{new}$ and repeat.

The procedure of the second tracker, summarized in Figure 3.4-3, is analogous: compute the target model according to (3-2); initialize $\mathbf{y}_{old}$; compute $\{K_P^i\}_{i=1,\ldots,M}$ and $\{K_C^i\}_{i=1,\ldots,M}$ according to (3-16) and (3-17); find the next location $\mathbf{y}_{new}$ according to (3-26); iterate until $\|\mathbf{y}_{new}-\mathbf{y}_{old}\| < \varepsilon$.

Figure 3.4-3 : Spatial-color mean-shift tracking procedure of the second tracker.

3.5 Choice of the Color Feature Space

In 3.2.2, we chose the color space (R, G, B) as our color feature, so $\mu_{C,b}$ is the 3-dimensional mean vector of the (R, G, B) values and $\Sigma_{C,b}$ is the covariance matrix of (R, G, B). The color space (R, G, B) is easily influenced by illumination, which greatly affects our tracking results. So we consider the normalized color space (r, g, b), defined as

$$r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad b = \frac{B}{R+G+B} \tag{3-27}$$

The covariance matrix of the normalized color space (r, g, b) is nearly singular because of the definition (3-27), so we choose (r, g) as the color feature space. Chapter 4 will show that the experiment results of (r, g) are more robust to the variation of illumination than those of (R, G, B).
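Computing the (r, g) feature of (3-27) is a one-liner per pixel; the guard for pure black pixels below is our own addition (the definition is 0/0 there):

```python
import numpy as np

def rg_feature(rgb):
    """Map RGB values to the normalized chromaticity (r, g) of (3-27).
    The b component (= 1 - r - g) is dropped since the full (r, g, b)
    covariance is near singular. Works on one pixel or an (..., 3) array."""
    rgb = np.asarray(rgb, dtype=float)
    s = rgb.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0            # avoid 0/0 on pure black pixels (our guard)
    return (rgb / s)[..., :2]
```

Note that (r, g) is invariant to uniform intensity scaling: doubling (R, G, B) leaves it unchanged, which is the illumination robustness exploited here.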

3.6 Background-Weighted Information

In many tracking applications, background information is an important issue. Representing the target model exactly is difficult, and the system is easily confused between foreground and background features because the foreground region always contains some background information. The proposed tracking method is based on the similarity between the target and the candidate; therefore, how the foreground model is represented is very important. Furthermore, an improper representation of the foreground may affect the scale and orientation selection algorithm and yield an inappropriate scale. In this section, we derive a simple background-weighted representation and add this approach to the spatial-color mean-shift trackers proposed above.

Let $N_{F,b}$ be the normalized histogram value of the foreground for the b-th bin ($\sum_b N_{F,b} = 1$), and $N_{O,b}$ be the normalized histogram value of the background for the b-th bin ($\sum_b N_{O,b} = 1$). The histogram of the background is computed in the region around the foreground (target). We define the weights as

$$W_b = \begin{cases} \dfrac{N_{F,b}\,/\max(N_{F,1}, N_{F,2}, \ldots, N_{F,B})}{N_{O,b}\,/\max(N_{O,1}, N_{O,2}, \ldots, N_{O,B})}, & \text{if } N_{O,b} \neq 0 \\[2mm] 1, & \text{if } N_{F,b} \neq 0 \text{ and } N_{O,b} = 0 \\[1mm] 0, & \text{otherwise} \end{cases} \tag{3-28}$$

This weight transformation diminishes the effect of features that contribute more to the background than to the foreground.
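Background weighting of this kind can be computed in one vectorized pass: normalize each histogram by its own maximum and take the ratio per bin, with special cases for bins seen only in the foreground or only in the background. The exact formula below is an assumed reading of (3-28), whose typeset form is ambiguous in this copy:

```python
import numpy as np

def background_weights(N_F, N_O):
    """Per-bin weights: foreground histogram over background histogram,
    each first normalized by its own maximum. Bins seen only in the
    foreground get weight 1; bins absent from the foreground get 0.
    The exact formula is an assumption, not taken verbatim from (3-28)."""
    N_F = np.asarray(N_F, dtype=float)
    N_O = np.asarray(N_O, dtype=float)
    f = N_F / N_F.max()
    o = N_O / N_O.max()
    W = np.zeros_like(f)
    both = N_O > 0
    W[both] = f[both] / o[both]
    W[(N_F > 0) & (N_O == 0)] = 1.0
    return W
```

Bins dominated by the background end up with small weights, so they contribute little to the weighted mean-shift updates.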

Now we add the weighted-background information to the mean-shift trackers developed in 3.4 and re-derive the revised weighted spatial-color mean-shift trackers as follows. Adding the weights to (3-7) and (3-18), we obtain

$$J(\mathbf{y}) = \frac{1}{N}\sum_{j=1}^{N} W_{b(j)}\, \hat{p}_x(\mathbf{y}_j, v_j) \tag{3-29}$$

$$J(\mathbf{y}) = \frac{1}{N}\sum_{j=1}^{N} W_{b(j)}\, G(\mathbf{y}-\mathbf{y}_j)\, \hat{p}(v_j) \tag{3-30}$$

where $b(j)$ is the bin of candidate pixel j. By derivations similar to those in 3.4, we obtain from (3-29) and (3-30) the final spatial-color mean-shift tracker functions containing the weighted-background information:

$$\mathbf{y}_{new} = \mathbf{x} + \left\{\sum_{j=1}^{N}\sum_{i=1}^{M} W_{b(j)}\, \Sigma_{P,b(i)}^{-1} K_P K_C\, \delta[v_j-b(i)]\right\}^{-1} \left\{\sum_{j=1}^{N}\sum_{i=1}^{M} W_{b(j)}\, \Sigma_{P,b(i)}^{-1}\left(\mu^{\mathbf{y}}_{P,b(j)}-\mathbf{y}_j\right) K_P K_C\, \delta[v_j-b(i)]\right\} \tag{3-31}$$

$$\mathbf{y}_{new} = \frac{\sum_{j=1}^{N}\sum_{i=1}^{M} W_{b(j)}\, \mathbf{y}_j\, K_P K_C\, \delta[v_j-b(i)]}{\sum_{j=1}^{N}\sum_{i=1}^{M} W_{b(j)}\, K_P K_C\, \delta[v_j-b(i)]} \tag{3-32}$$

3.7 Update of Scale and Orientation

In computer vision and image processing, an object often changes its scale as it moves away from or toward the camera. When the camera zooms in or out, the size of the object body also differs between image frames. As

