Dissertation Organization - Robust 3D Object Recognition Based on 2D Images and an Incremental Similarity-Based Appearance Graph Method

Chapter 1 Introduction

1.5 Dissertation Organization

This chapter provides a brief introduction to the background subtraction system and 3D object recognition, including rigid object recognition, human posture recognition and scene recognition. It also briefly discusses the two main components of the proposed 3D object recognition framework. The remainder of this dissertation is organized as follows. Chapter 2 presents the proposed background subtraction algorithm (BSHSR), including descriptions of the CBM, STCBM, LTCBM, GBM, and CSIM. Chapter 3 describes the ISAG and the proposed hierarchical matching structure. Chapter 4 presents experimental results that demonstrate the performance of the proposed method on 3D rigid object, human posture and scene recognition. Finally, concluding remarks and directions for future research are given in Chapter 5.

Chapter 2 Background Subtraction

2.1 Introduction

To extract foreground objects precisely, both environmental changes and shadow/highlight effects must be considered. Despite the abundance of research on individual techniques, as described in Chapter 1, few efforts have been made to handle environmental changes and shadow/highlight effects together. This work proposes a scheme that combines the color-based background model (CBM), the gradient-based background model (GBM) and the cone-shape illumination model (CSIM) to address this issue in practice.

The remainder of this chapter is organized as follows. Section 2.2 describes the system architecture and the corresponding dataflow. Section 2.3 describes the statistical learning method used in the probabilistic modeling and defines the STCBM and LTCBM. Section 2.4 then proposes the CSIM, which uses the STCBM and LTCBM to classify shadows and highlights efficiently. A hierarchical background subtraction framework that combines color-based subtraction, gradient-based subtraction, and shadow and highlight removal is then described to extract the real foreground of an image. Finally, Section 2.6 presents discussions and conclusions.

2.2 System Architecture

Figure 2-1 illustrates the block diagram of the BSHSR. The BSHSR comprises three main models: the CBM, the GBM and the CSIM. The CBM comprises the LTCBM and the STCBM, where the LTCBM records the background changes over a long period and the STCBM records the background changes over a short period. Moreover, the STCBM and LTCBM are used to determine the parameters of the GBM and CSIM with a selection rule. The BSHSR involves four stages. First, color-based background subtraction is performed on the input image via the LTCBM to extract the foreground candidates. Next, shadow and highlight removal is performed on the foreground candidates via the CSIM to classify their pixels as real foreground, shadow or highlight. To eliminate false foreground regions, gradient-based background subtraction is performed on the input image via the GBM. Finally, a hierarchical background subtraction is performed to combine the results from the CSIM and GBM.

Figure 2-1 The block diagram of the BSHSR.

2.3 Background Modeling

Our previous investigation [56] studied a CBM that records the activity history of a pixel via a Gaussian mixture model (GMM). However, the foreground regions generally suffer from rapid intensity changes and require a period of time to recover when objects leave the background. In this work, the STCBM and LTCBM are defined and applied to improve the flexibility of the gradient-based subtraction proposed by Javed et al. [57]. The image features used in this work include pixel color and gradient information. This study assumes that the density functions of the color features and the gradient features are both Gaussian.

2.3.1 Color-Based Background Modeling

First, each pixel x is defined as a 3-dimensional vector (R, G, B) at time t. N Gaussian distributions are used to construct the GMM of each pixel, which is described as Eq. (2-1).

where λ represents the parameters of the GMM and X = {x_1, ..., x_m} is the training feature vector containing the m pixel values collected from a pixel over a period of m image frames. The next step is to calculate the parameter λ of the GMM of each pixel so that the GMM matches the distribution of X with minimal error. A common method for calculating λ is maximum likelihood (ML) estimation, which finds the model parameters by maximizing the GMM likelihood function. The ML parameters can be obtained iteratively using the expectation maximization (EM) algorithm [58], and the ML estimate of λ is defined as Eq. (2-2).

The EM algorithm involves two steps; the parameters of the GMM are derived by iterating the expectation step (E step) and the maximization step (M step), as in Eqs. (2-3) and (2-4).

Expectation step (E step): β_ji denotes the posterior probability that the feature x_j belongs to the i-th Gaussian component distribution.

Maximization step (M step): the parameters of each Gaussian component are re-estimated using β_ji.

The termination criteria of the EM algorithm are as follows:

1. The increment between the new log-likelihood value and the last log-likelihood value is below a minimum increment threshold.

2. The iterative count exceeds a maximum iterative count threshold.
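The E step, M step and the two termination criteria above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes scalar (gray-level) samples instead of RGB vectors, uses the textbook EM updates for a GMM, and the names `em_gmm`, `min_increment` and `max_iter`, as well as the small variance floor, are hypothetical choices made for the example.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm(xs, weights, means, variances, min_increment=1e-6, max_iter=200):
    """Fit a 1-D GMM to xs; returns (weights, means, variances, iterations)."""
    weights, means, variances = list(weights), list(means), list(variances)
    m, n = len(xs), len(weights)
    last_ll = -math.inf
    iteration = 0
    # Criterion 2: the loop bound caps the iteration count at max_iter.
    for iteration in range(1, max_iter + 1):
        # E step: beta[j][i] is the posterior that sample x_j belongs to component i.
        beta, ll = [], 0.0
        for x in xs:
            joint = [weights[i] * gaussian(x, means[i], variances[i]) for i in range(n)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            ll += math.log(total)
            beta.append([p / total for p in joint])
        # M step: re-estimate the weight, mean and variance of every component.
        for i in range(n):
            resp = sum(beta[j][i] for j in range(m)) or 1e-300
            weights[i] = resp / m
            means[i] = sum(beta[j][i] * xs[j] for j in range(m)) / resp
            spread = sum(beta[j][i] * (xs[j] - means[i]) ** 2 for j in range(m)) / resp
            variances[i] = max(spread, 1e-6)  # variance floor (assumption)
        # Criterion 1: stop when the log-likelihood increment is below the threshold.
        if ll - last_ll < min_increment:
            break
        last_ll = ll
    return weights, means, variances, iteration
```

In practice the initial `weights`, `means` and `variances` would come from a clustering step rather than from fixed guesses, which reduces the number of iterations needed.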

Suppose an image contains S = W × H pixels, where W is the image width and H is the image height. A total of S GMMs must then be calculated by the EM algorithm, one from the collected training feature vector of each pixel.

Moreover, this study applies the K-means algorithm [59], an unsupervised data clustering method, before the EM iterations to accelerate convergence.

First, N random values are chosen from X and assigned as the center of each class. Then the following steps are applied to cluster the m values of the training feature vector X.

1. Calculate the 1-norm distances between the m values and the N center values. Each value of X is assigned to the class at minimum distance from it.

2. After clustering all the values of X, recalculate each class center as the mean of the values in that class.

3. Calculate the 1-norm distances between the m values and the N new center values, and assign each value of X to the class at minimum distance from it. If the new clustering result is the same as the result before the class centers were recalculated, stop; otherwise return to the previous step to calculate the N new center values.

4. After the K-means algorithm has clustered the values of X, the mean of each class is assigned as the initial value of μ_i, the maximum distance among the points of each class is assigned as the initial value of Σ_i, and w_i is initialized as 1/N.
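The four steps above can be sketched as a short routine. It is an illustration only: the function name `kmeans_init`, the `seed` and `max_rounds` parameters are hypothetical, and "the maximum distance among the points of each class" is read here as the largest 1-norm distance from a class member to its class center, which is an assumption.

```python
import random

def l1(a, b):
    """1-norm distance between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def kmeans_init(xs, n, seed=0, max_rounds=100):
    """Cluster xs into n classes; returns (centers, spreads, weights)."""
    rng = random.Random(seed)
    centers = rng.sample(xs, n)  # N random values of X as the initial centers
    labels = None
    for _ in range(max_rounds):  # safety cap on the number of rounds (assumption)
        # Steps 1 and 3: assign every value to the nearest center (1-norm).
        new_labels = [min(range(n), key=lambda i: l1(x, centers[i])) for x in xs]
        if new_labels == labels:  # clustering unchanged: stop
            break
        labels = new_labels
        # Step 2: each center becomes the mean of the values in its class.
        for i in range(n):
            members = [x for x, lab in zip(xs, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    # Step 4: initial spread per class and uniform weights 1/N.
    spreads = [
        max((l1(x, centers[i]) for x, lab in zip(xs, labels) if lab == i), default=0.0)
        for i in range(n)
    ]
    weights = [1.0 / n] * n
    return centers, spreads, weights
```

The returned `centers`, `spreads` and `weights` play the role of the initial μ_i, Σ_i and w_i handed to the EM iterations.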

2.3.2 Model Maintenance of the LTCBM and STCBM

Following the above sections, an initial color-based probabilistic background model is created from the training feature vector set X with N Gaussian distributions, where N is usually set to 3 to 5 based on observation over a short period of m frames. However, as background changes are recorded over time, distributions that differ from the original N distributions may be observed. If the GMM of each pixel contains only N Gaussian distributions, only N background distributions are retained and the other collected background information is lost, so a model limited to N Gaussian distributions is not flexible enough to represent the background.

To maintain a representative background model while improving its flexibility, an initial LTCBM is defined as the combination of the initial color-based probabilistic background model and N extra new Gaussian distributions (2N distributions in total), an arrangement inspired by the work of [60]. Kaew et al. [49] proposed a method of sorting the Gaussian distributions by the fitness value w_i/σ_i (Σ_i = σ_i² I) and extracting a representative model with a threshold value B₀.

After the first N Gaussian distributions are sorted by fitness value, b (b ≤ N) Gaussian distributions are extracted with Eq. (2-5).

$B = \arg\min_{b} \left( \sum_{j=1}^{b} w_j > B_0 \right)$ (2-5)

The first b Gaussian distributions are defined as the elected color-based background model (ECBM), which serves as the criterion for determining the background.

Meanwhile, the remaining (2N − b) Gaussian distributions are defined as the candidate color-based background model (CCBM) for dealing with background changes. Finally, the LTCBM is defined as the combination of the ECBM and CCBM. Figure 2-2 shows a block diagram of the process of building the initial LTCBM, ECBM and CCBM.
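The split into ECBM and CCBM can be sketched as below. This is a simplified illustration: it ranks whatever components it is given by the fitness value w/σ and keeps the shortest prefix whose cumulative weight exceeds the threshold. The function name `split_ecbm_ccbm` and the dictionary layout of a component are hypothetical.

```python
def split_ecbm_ccbm(components, b0):
    """Split components into (ecbm, ccbm) using the fitness w / sigma.

    components: list of dicts with keys 'w' (weight) and 'sigma' (std. deviation).
    b0: threshold on the cumulative weight of the elected components.
    """
    ranked = sorted(components, key=lambda c: c["w"] / c["sigma"], reverse=True)
    total = 0.0
    b = len(ranked)  # fall back to all components if the threshold is never exceeded
    for count, comp in enumerate(ranked, start=1):
        total += comp["w"]
        if total > b0:  # smallest b whose cumulative weight exceeds B0
            b = count
            break
    return ranked[:b], ranked[b:]
```

For example, with weights 0.6, 0.3 and 0.1 and a threshold of 0.7, the two fittest components form the ECBM and the last one is left in the CCBM.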

Figure 2-2 Block diagram showing the process of building the initial LTCBM, ECBM and CCBM. (Blocks: training vector set X, EM algorithm, initial color-based probabilistic background model.)

The Gaussian distributions of the ECBM are the characteristic distributions of the “background”. Therefore, if a new pixel value belongs to any of the Gaussian distributions of the ECBM, the new pixel is regarded as having the property of background and is classified as “background”. In this work, a new pixel value is considered background when it lies within 2.5 standard deviations of the mean of any Gaussian distribution in the ECBM. If none of the b Gaussian distributions match the new pixel value, a further test is conducted by checking the new pixel value against the Gaussian distributions in the CCBM. The parameters of the Gaussian distributions are updated via Eq. (2-6). ρ and α are termed the learning rates and determine the update speed of the LTCBM. Moreover, $\hat{p}(w_i^t \mid X_i^{t+1})$ results from background subtraction and is set to 1 if a new pixel value belongs to the i-th Gaussian distribution. If a new incoming pixel value does not belong to any of the Gaussian distributions in the CBM and the number of Gaussian components in the CCBM is below (2N − b), a new Gaussian distribution is added to preserve the new background information with three parameters: the current pixel value as the mean, a large predefined value as the initial variance, and a low predefined value as the weight. Otherwise, the (2N − b)-th Gaussian distribution in the CCBM is replaced by the new one. After the parameters of the Gaussian components are updated, all Gaussian distributions in the CBM are re-sorted by recalculating the fitness values.
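The maintenance procedure described above can be sketched as follows. Because Eq. (2-6) is not reproduced in this text, the sketch assumes the running-average update commonly used for adaptive Gaussian mixture background models; it also works on scalar gray values, treats the ECBM and CCBM as one fitness-ordered list with a single `capacity`, and uses hypothetical names and default values (`update_ltcbm`, `alpha`, `rho`, `init_var`, `init_w`).

```python
import math

def update_ltcbm(model, x, capacity, alpha=0.01, rho=0.05, init_var=900.0, init_w=0.01):
    """Update a fitness-ordered list of components with pixel value x.

    model: list of dicts with keys 'mu', 'var', 'w', ordered by fitness w / sigma.
    Returns True when x matched an existing component.
    """
    matched = None
    for comp in model:  # fittest components (the ECBM) are tested first
        if abs(x - comp["mu"]) <= 2.5 * math.sqrt(comp["var"]):
            matched = comp
            break
    # Weight update: the matched component is reinforced, the others decay.
    for comp in model:
        hit = 1.0 if comp is matched else 0.0
        comp["w"] = (1.0 - alpha) * comp["w"] + alpha * hit
    if matched is not None:
        # Assumed running-average update of the mean and variance.
        matched["mu"] = (1.0 - rho) * matched["mu"] + rho * x
        matched["var"] = (1.0 - rho) * matched["var"] + rho * (x - matched["mu"]) ** 2
    else:
        # New component: current value as mean, large variance, low weight.
        new = {"mu": float(x), "var": init_var, "w": init_w}
        if len(model) < capacity:
            model.append(new)
        else:
            model[-1] = new  # replace the least fit component
    # Re-sort all components by the recalculated fitness value.
    model.sort(key=lambda c: c["w"] / math.sqrt(c["var"]), reverse=True)
    return matched is not None
```

The weights are not renormalized in this sketch; a fuller implementation would typically rescale them to sum to one after each update.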

Unlike the LTCBM, the STCBM is defined to record the background changes during a short period. Suppose B₁ frames are collected during a short period; the B₁ new incoming pixel values at each pixel location are then collected and defined as a test pixel set P = {p_1, p_2, ..., p_q, ..., p_B₁}, where p_q is the new incoming pixel at time q.

The test pixel set P is used for calculating the STCBM. A result set S is then calculated by comparing P with the LTCBM, as described in Eq. (2-7), where I_q is the result of background subtraction, that is, the index of the Gaussian distribution of the initial LTCBM; R_q is the index of the re-sorting result for each Gaussian distribution after each update; and F_q is the reset flag of each Gaussian distribution.

$S = \{S_1, S_2, \ldots, S_q, \ldots, S_{B_1}\}$ and $S_q = (I_q, R_q(i), F_q(i))$ (2-7)

where $1 \le I_q \le 2N$, $1 \le R_q(i) \le 2N$, $F_q(i) \in \{0, 1\}$ and $1 \le i \le 2N$. The histogram of CG is then given by Eq. (2-8).

$H_{CG}(k) = \left[ \sum_{q} \delta\big(k - (I_q + R_q(I_q))\big) + F_q \cdot \sum_{q'} \delta\big(k - (I_{q'} + R_{q'}(I_{q'}))\big) \right] / B_1$ (2-8)

where $1 \le k \le 2N$, $1 \le q \le B_1$ and $1 \le q' < q$.

In brief, four Gaussian distributions are used to explain how Eqs. (2-7) and (2-8) work, and the corresponding example is listed in Table 2-1. Initially, the CBM contains four Gaussian distributions (2N = 4), and the index of each Gaussian distribution in the initial CBM is fixed (1, 2, 3, 4). At the first time step, a new incoming pixel that belongs to the second Gaussian distribution is compared with the CBM, so the result of background subtraction is I_q = 2. The CBM is then updated with Eq. (2-6), and the order of the Gaussian distributions in the CBM changes. When the order of the first and second Gaussian distributions is exchanged, R_q(i) records the change; for example, R_q(1) = 1 means the first Gaussian distribution has moved forward to the second position, and R_q(2) = −1 means the second Gaussian distribution has moved backward to the first position. At the second time step, a new incoming pixel that belongs to the second Gaussian distribution of the initial CBM is classified as the first Gaussian distribution (I_q = 1) based on the latest order of the CBM. However, the CG histogram can still be calculated according to the original index of the initial CBM from the latest order of the CBM and R_q(i), such that H_CG(I_q + R_q(I_q) = 2) is incremented by one. Moreover, R_q(i) changes whenever the order of the Gaussian distributions changes. For example, at the fifth time step in Table 2-1, the order of the CBM changes from (2, 1, 3, 4) to (1, 2, 3, 4); then R_q(1) = 1 − 1 = 0 means the first Gaussian distribution of the initial CBM has moved back to the first position of the latest CBM, and R_q(2) = −1 + 1 = 0 means the second Gaussian distribution has moved back to the second position of the latest CBM.

Table 2-1 An example of calculating the CG histogram.

If a new incoming pixel p_q matches the i-th Gaussian distribution that has the least fitness value, the i-th Gaussian distribution is replaced with a new one and the flag F_q(i) is set to 1 to reset the accumulated value of H_CG(i). Figure 2-3 shows a block diagram of the process of calculating H_CG.

Figure 2-3 Block diagram showing the process of calculating H_CG. (Blocks: test pixel p_q; color-based background subtraction against the LTCBM; result structure S_q of the background subtraction; record S_q into the result structure S; re-sorting of the Gaussian distributions of the LTCBM; test q = B₁, with q = q + 1 on “No” and calculation of H_CG on “Yes”.)

After all test pixels have been matched to their corresponding Gaussian distributions, the result set S can be used to calculate H_CG using I_q and F_q. With the reset flag F_q, the STCBM can be built up rapidly based on a simple idea: thresholding the occurrence frequency of each Gaussian distribution. That is, the short-term tendency of background changes is apparent if an element of H_CG(k) is above a threshold value B₂ during a period of B₁ frames. In this work, B₁ is set to 300 frames and B₂ is set to 0.8. Therefore, the representative background component in the short-term tendency is determined to be k if the value of H_CG(k) exceeds 0.8; otherwise, the STCBM provides no further information on background model selection.
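The thresholding idea can be sketched as below. This is a simplified reading rather than a literal transcription of Eq. (2-8): each test frame is assumed to be already reduced to the original index of the matched Gaussian distribution plus its reset flag, and a set flag clears the count accumulated for that index before the current frame is counted. The function name `short_term_component` and the input layout are hypothetical.

```python
def short_term_component(results, num_components, b2=0.8):
    """Return the short-term representative component, or None.

    results: one (orig_index, reset_flag) pair per test frame, where orig_index
             is the 1-based index of the matched distribution in the initial
             ordering and reset_flag is 1 when that distribution was replaced.
    num_components: number of Gaussian distributions (2N).
    b2: threshold on the normalized occurrence frequency.
    """
    b1 = len(results)  # number of frames in the short period
    counts = [0] * (num_components + 1)  # slot 0 unused, indices are 1-based
    for orig_index, reset_flag in results:
        if reset_flag:
            counts[orig_index] = 0  # replaced component: forget its history
        counts[orig_index] += 1
    hist = [c / b1 for c in counts]  # normalized histogram
    best = max(range(1, num_components + 1), key=lambda k: hist[k])
    return best if hist[best] > b2 else None
```

With 300 frames and a threshold of 0.8, a component is selected only if it explains more than 240 of the frames since its last reset.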

2.3.3 Gradient-Based Background Modeling

Javed et al. [57] developed a hierarchical approach that combines color and gradient information to solve the problem of rapid intensity changes. They adopted the k-th, highest weighted Gaussian component of the GMM at each pixel to obtain the gradient information for building the gradient-based background model. The choice of k in [57] is similar to selecting k based only on the ECBM defined in this work.

However, choosing the highest weighted Gaussian component of the GMM loses the short-term tendencies of background changes. Whenever a new Gaussian distribution is added to the background model, it is not selected for a long period of time owing to its low weight. Consequently, the accuracy of the gradient-based background model is reduced because the gradient information in the model does not represent the current gradient information.

To solve this problem, both the STCBM and the LTCBM are considered in selecting the value of k, which yields a more robust gradient-based background model that remains sensitive to short-term changes. When the STCBM provides a representative background component (say, the k_s-th bin in the STCBM), k is set to k_s rather than to the highest weighted Gaussian distribution.

Let x^t_{i,j} = [R, G, B] be the latest color value that matched the k_s-th distribution of the LTCBM at pixel location (i, j); the gray value of x^t_{i,j} is then used in the gradient-based background subtraction. Suppose the gray value of x^t_{i,j} is calculated as Eq. (2-9); then g^t_{i,j} is distributed as Eq. (2-10), based on independence among the RGB color channels.

After that, the gradients along the x axis and y axis can be defined, with the distributions given in Eqs. (2-11) and (2-12).

Δ_m denotes the magnitude of the gradient and Δ_d is defined as its direction (the angle with respect to the horizontal axis), and Δ = [Δ_m, Δ_d] is defined as the feature vector for modeling the gradient-based background model. The gradient-based background model based on the feature vector Δ = [Δ_m, Δ_d] can then be defined as Eq. (2-13).
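The gradient feature Δ = [Δ_m, Δ_d] can be illustrated with the sketch below. Since Eqs. (2-9) to (2-12) are not reproduced in this text, the sketch makes several assumptions: the gray value is a weighted sum of R, G and B with the common luma coefficients, the gradients are forward differences between neighboring pixels, and the image is indexed as `image[row][column]`. The function names are hypothetical.

```python
import math

LUMA = (0.299, 0.587, 0.114)  # assumed gray-conversion coefficients

def gray(pixel, coeffs=LUMA):
    """Gray value as a weighted sum of the R, G and B channels."""
    return sum(c * v for c, v in zip(coeffs, pixel))

def gray_distribution(mean_rgb, var_rgb, coeffs=LUMA):
    """Mean and variance of the gray value when R, G and B are independent."""
    mean = sum(c * m for c, m in zip(coeffs, mean_rgb))
    var = sum(c * c * v for c, v in zip(coeffs, var_rgb))
    return mean, var

def gradient_feature(image, row, col):
    """Return (magnitude, direction) of the gray-level gradient at a pixel."""
    g = gray(image[row][col])
    fx = gray(image[row][col + 1]) - g  # forward difference along the x axis
    fy = gray(image[row + 1][col]) - g  # forward difference along the y axis
    magnitude = math.hypot(fx, fy)
    direction = math.atan2(fy, fx)  # angle with respect to the horizontal axis
    return magnitude, direction
```

`gray_distribution` shows why independence among the channels matters: the variance of the weighted sum is then simply the sum of the squared coefficients times the channel variances.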

2.4 Background Subtraction with Shadow Removal

2.4.1 Shadow and Highlight Removal

Besides foreground and background, shadows and highlights are two important phenomena that should be considered in most cases. Shadows and highlights result from changes in illumination. Compared with the original pixel value, a shadow has similar chromaticity but lower brightness, and a highlight has similar chromaticity but higher brightness. The regions influenced by illumination changes are classified as foreground if shadow and highlight removal is not performed after background subtraction.

Horprasert et al. [60] proposed a method of detecting highlights and shadows by gathering statistics from N color background images. Brightness and chromaticity distortion are used with four threshold values to classify pixels into four classes. Because the method in [60] uses the mean value as the reference image, it is not suitable for dynamic backgrounds. Furthermore, the threshold values are estimated from the histograms of brightness distortion and chromaticity distortion with a given detection rate, and are applied to all pixels regardless of their values. Therefore, it is possible to classify a darker pixel value as shadow. Finally, the method cannot record the history of background information.

This work proposes a 3D cone model that is similar to the pillar model proposed by Horprasert et al. [60] and combines the LTCBM and STCBM to solve the above problems. The parameters of the 3D cone model are determined efficiently from the proposed LTCBM and STCBM. In the RGB space, a Gaussian distribution of the LTCBM becomes an ellipsoid whose center is the mean of the Gaussian component, and the length of each principal axis equals 2.5 standard deviations of the Gaussian component. A new pixel I(R, G, B) is considered to belong to the background if it is located inside the ellipsoid. The chromaticities of the pixels located outside the ellipsoid but inside the cone (formed by the ellipsoid and the origin) resemble the chromaticity of the background. The brightness difference is then applied to classify the pixel as either highlight or shadow. Figure 2-4 illustrates the 3D cone model in the RGB color space.
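The background test described above, a pixel lying inside the ellipsoid of a Gaussian component, can be sketched in a few lines. The sketch assumes an axis-aligned ellipsoid, that is, a diagonal covariance with one standard deviation per color channel; the function name `inside_background_ellipsoid` is hypothetical.

```python
def inside_background_ellipsoid(pixel, mean, std, k=2.5):
    """True when pixel lies inside the ellipsoid with semi-axes k * std around mean."""
    return sum(((p - m) / (k * s)) ** 2 for p, m, s in zip(pixel, mean, std)) <= 1.0
```

For a component with mean (100, 100, 100) and standard deviation 4 per channel, the pixel (105, 100, 100) is inside the ellipsoid, while (120, 100, 100) is outside and must be examined further with the cone.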

Figure 2-4 The proposed 3D cone model in the RGB color space.

The threshold values τ_low and τ_high are applied to avoid classifying the darker pixel value as shadow or the brighter value as highlight, and can be selected based on the standard deviation of the corresponding Gaussian distribution in the CBM.

Because the standard deviations of the R, G and B color axes are different, the angles between the curved surface and the ellipsoid center are also different. It is difficult to classify the pixel using the angles in the 3D space. The 3D cone is projected onto the 2D space to classify a pixel using the slope and the point of tangency. Figure 2-5 illustrates the projection of the 3D cone model onto the RG 2D space.

Figure 2-5 2D projection of the 3D cone model from RGB space onto the RG space.

Let a and b denote the lengths of the major and minor axes of the ellipse, where a = 2.5σ_R and b = 2.5σ_G. The center of the ellipse is (μ_R, μ_G), and the equation of the ellipse is given as Eq. (2-14).

$\frac{(R - \mu_R)^2}{a^2} + \frac{(G - \mu_G)^2}{b^2} = 1$ (2-14)

A matching result set is given by F_b = {f_bi, i = 1, 2, 3}, where f_bi is the matching result of a specific 2D space. A pixel vector I = [I_R, I_G, I_B] is then projected onto the 2D spaces of R-G, G-B, and B-R. The pixel matching result is set to 1 when the slope of the projected pixel vector is between m₁ and m₂. Meanwhile, if the background mean vector is E = [μ_R, μ_G, μ_B], the brightness distortion α_b can be calculated.

The image pixel is classified as highlight, shadow or foreground using the matching result set F_b, the brightness distortion α_b and Eq. (2-17).
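The projection test and the final classification can be sketched as follows. Several parts are assumptions, because the corresponding equations are not reproduced in this text: m₁ and m₂ are taken to be the slopes of the two lines through the origin that are tangent to the projected ellipse (which requires the ellipse to lie strictly to the right of the vertical axis), the brightness distortion is taken as the projection of I onto E, and the decision rule is one plausible reading of Eq. (2-17). The pixel is assumed to be a foreground candidate already, and all function names are hypothetical.

```python
import math

def tangent_slopes(mu_x, mu_y, a, b):
    """Slopes (m1, m2), m1 <= m2, of the origin lines tangent to the ellipse.

    The line y = m x touches ((x - mu_x) / a)^2 + ((y - mu_y) / b)^2 = 1 when
    (mu_x^2 - a^2) m^2 - 2 mu_x mu_y m + (mu_y^2 - b^2) = 0. Requires mu_x > a.
    """
    qa = mu_x ** 2 - a ** 2
    qb = -2.0 * mu_x * mu_y
    qc = mu_y ** 2 - b ** 2
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    m1, m2 = (-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)
    return min(m1, m2), max(m1, m2)

def classify(pixel, mean, std, tau_low, tau_high, k=2.5):
    """Classify a foreground candidate as 'shadow', 'highlight' or 'foreground'."""
    planes = [(0, 1), (1, 2), (2, 0)]  # R-G, G-B and B-R projections
    matches = []
    for x, y in planes:
        m1, m2 = tangent_slopes(mean[x], mean[y], k * std[x], k * std[y])
        slope = pixel[y] / pixel[x] if pixel[x] else math.inf
        matches.append(1 if m1 <= slope <= m2 else 0)
    # Assumed brightness distortion: scale that best maps E onto I.
    alpha_b = sum(p * e for p, e in zip(pixel, mean)) / sum(e * e for e in mean)
    if all(matches):  # chromaticity resembles the background in every plane
        if tau_low < alpha_b < 1.0:
            return "shadow"
        if 1.0 < alpha_b < tau_high:
            return "highlight"
    return "foreground"
```

Working in three 2D projections avoids reasoning about angles on the curved cone surface directly: each projection reduces the test to a comparison between one slope and two tangent slopes.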

When a pixel is many standard deviations away from the mean of a Gaussian distribution, the probability of the pixel under that distribution is approximately zero, which means that the pixel does not belong to the Gaussian distribution. Using this simple concept, τ_high and τ_low can be chosen using N_G standard deviations of the corresponding Gaussian distribution in the CBM, as described in Eq. (2-18).

2.4.2 Background Subtraction

A hierarchical approach combining color-based background subtraction and
