
Chapter 3 Incremental Similarity-Based Aspect-Graph 3D Object Recognition

3.3 Object Representation

In this work, shape and color features are used to measure the similarity between two object views. To extract shape information, a robust background subtraction framework from previous works [65-66] is used to extract foreground regions while accounting for shadows and highlights. Foreground detection provides flexibility when constructing the object database, even in an uncontrolled environment. Canny edge detection [67] is then applied to extract the shape edges, and the Gradient Vector Flow (GVF) snake [68] is applied to extract the contour information. Assume that the contour information is contained in a set $Z$ composed of $N$ points $z_i$, where $z_i$ is a complex number given by Eq. (3-1). Two kinds of shape features, the Fourier descriptor (FD) [69] and the point-to-point length (PPL), are extracted from $Z$.

$$Z = \{z(i)\} = \{x_i + j y_i\}, \quad 0 \le i < N \tag{3-1}$$

3.3.1 Shape Features

The points in the set $Z$ are re-sampled using Eq. (3-2) to eliminate variations in shift and scale.

$$Z = \{z(i)\} = \{L_c[(x_i - x_c) + j(y_i - y_c)]/L\} \tag{3-2}$$

where $0 \le i < N$; $L$ denotes the contour length of $Z$, $L_c$ is the expected contour length, and $(x_c, y_c)$ is the location of the contour center of $Z$. Then, the Fourier transform is applied to $Z$ to compute FD using Eq. (3-3).

$$FD(k) = \sum_{n=0}^{N-1} z(n)\exp(-j2\pi kn/N), \quad 0 \le k < N \tag{3-3}$$
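As a minimal sketch of Eqs. (3-2) and (3-3), the following Python code normalizes a contour stored as complex numbers and evaluates its Fourier descriptor directly. The function names are hypothetical, and two details are assumptions rather than statements from the text: the contour center is taken as the mean of the contour points, and the contour length as the perimeter of the closed polygon through them.

```python
import cmath
import math


def normalize_contour(points, expected_length):
    """Eq. (3-2): shift the contour to its centre and rescale it to length L_c."""
    n = len(points)
    # Assumption: the contour centre (x_c, y_c) is the mean of the contour points.
    center = sum(points) / n
    # Assumption: the contour length L is the perimeter of the closed polygon.
    length = sum(abs(points[(i + 1) % n] - points[i]) for i in range(n))
    return [expected_length * (z - center) / length for z in points]


def fourier_descriptor(z):
    """Eq. (3-3): direct O(N^2) evaluation of the discrete Fourier transform."""
    n = len(z)
    return [
        sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


# A unit square and a shifted, enlarged copy of the same square.
square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
moved = [3 * z + (5 + 2j) for z in square]

fd_square = fourier_descriptor(normalize_contour(square, expected_length=8.0))
fd_moved = fourier_descriptor(normalize_contour(moved, expected_length=8.0))
```

Under these assumptions both contours yield the same descriptor, which illustrates the shift and scale invariance that the normalization is meant to provide. A practical implementation would likely use a fast Fourier transform instead of the direct sum.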

The low-frequency parts of FD are extracted to reduce the influence of high-frequency noise, and are defined as MAG. Notably, MAG is composed of 2T magnitude values selected from the frequency components of FD. The method for extracting MAG is given by Eq. (3-4).

$$MAG = \{|FD(k)|,\ |FD(N-k)|,\ 1 \le k \le T\} \tag{3-4}$$

Intuitively speaking, MAG characterizes only the shape and not the orientation of human posture. Therefore, MAG cannot discriminate between similar shapes that are oriented differently. To solve this problem, the phase information of FD must be used when memorizing an object. The work in [70] proposes that memorizing the phase value at low frequency is sufficient. Let the phase information be $\theta_z$; then $\theta_z$ can be calculated from $FD(1)$ and $FD(N-1)$, as described in Eqs. (3-5) and (3-6).

$$FD(1) = |FD(1)|\exp(j\theta_1) = R_1 + jI_1 \tag{3-5}$$

$$FD(N-1) = |FD(N-1)|\exp(j\theta_{N-1}) = R_{N-1} + jI_{N-1} \tag{3-6}$$

Furthermore, $\theta_z$ can be calculated using Eq. (3-7).

$$\theta_z = (\theta_1 + \theta_{N-1})/2 = (\arctan(I_1/R_1) + \arctan(I_{N-1}/R_{N-1}))/2 \tag{3-7}$$

where $R_1$ and $R_{N-1}$ denote the real parts of $FD(1)$ and $FD(N-1)$, $I_1$ and $I_{N-1}$ denote the imaginary parts of $FD(1)$ and $FD(N-1)$, and $\theta_1$ and $\theta_{N-1}$ are the phases of $FD(1)$ and $FD(N-1)$.
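The magnitude and phase features of Eqs. (3-4) and (3-7) can be sketched as follows. The function names are hypothetical. One substitution is made and should be read as an assumption: `math.atan2` is used in place of $\arctan(I/R)$ so that $R = 0$ and $R < 0$ are handled; the two agree whenever $R > 0$. The example contour is a sampled ellipse and a copy rotated by 0.3 rad.

```python
import cmath
import math


def dft(z):
    """Eq. (3-3): direct evaluation of the discrete Fourier transform."""
    n = len(z)
    return [
        sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def low_frequency_magnitudes(fd, t):
    """Eq. (3-4): |FD(k)| and |FD(N-k)| for 1 <= k <= T, giving 2T values."""
    n = len(fd)
    mag = []
    for k in range(1, t + 1):
        mag.append(abs(fd[k]))
        mag.append(abs(fd[n - k]))
    return mag


def phase_feature(fd):
    """Eq. (3-7): mean of the phases of FD(1) and FD(N-1)."""
    # Assumption: atan2 replaces arctan(I/R) so that R <= 0 is handled.
    theta_1 = math.atan2(fd[1].imag, fd[1].real)
    theta_n1 = math.atan2(fd[-1].imag, fd[-1].real)
    return (theta_1 + theta_n1) / 2


# An ellipse with semi-axes 2 and 1, and the same ellipse rotated by 0.3 rad.
n_points = 16
ellipse = [
    complex(2 * math.cos(2 * math.pi * i / n_points),
            math.sin(2 * math.pi * i / n_points))
    for i in range(n_points)
]
rotated = [z * cmath.exp(0.3j) for z in ellipse]

mag_a = low_frequency_magnitudes(dft(ellipse), 2)
mag_b = low_frequency_magnitudes(dft(rotated), 2)
theta_a = phase_feature(dft(ellipse))
theta_b = phase_feature(dft(rotated))
```

In this example the two magnitude vectors coincide while the phase feature differs by the rotation angle, which is the behavior the text attributes to MAG and $\theta_z$.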

Moreover, the lengths between each pair of points in $Z$ are defined as PPL. PPL is suitable for describing shape details. Calculating PPL is time-consuming because each point is considered as a start point. Equations (3-8) and (3-9) describe the

Numerous features, such as edges, corners, texture, color, and shape, have been used to extract useful information from an image. Among these features, color carries intuitive information for representing the conceptual idea of an image.

Therefore, pixel color and pixel position are used in this work to extract the conceptual idea of an image. The color space used in this work is the RGB color space, a format common to most video devices. To enhance the regional information of an image, the position feature (x, y) is combined with the RGB color information to form the feature vector. That is, each pixel has a 5D feature vector (R, G, B, x, y), as shown in Fig. 3-4.

Figure 3-4 5D feature vector construction.
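The construction of the 5D feature vectors can be illustrated with a short Python sketch. The function name and the image layout (a list of rows, each a list of (R, G, B) tuples) are assumptions made for the example, as is the convention that x is the column index and y is the row index.

```python
def pixel_features(image):
    """Return one (R, G, B, x, y) tuple per pixel of a row-major RGB image."""
    features = []
    for y, row in enumerate(image):          # y: row index (assumed convention)
        for x, (r, g, b) in enumerate(row):  # x: column index (assumed convention)
            features.append((r, g, b, x, y))
    return features


# A 2x2 toy image; a 320x240 frame would give 76,800 feature vectors.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
features = pixel_features(tiny_image)
```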

This work applies a Gaussian mixture model (GMM) to model the region information in a scene image as a blob model, denoted BM, using the 5D feature vectors (R, G, B, x, y). We assume that the density functions of the color and position features are Gaussian. First, each pixel x is defined as a 5-dimensional vector at time t. Moreover, N Gaussian distributions are used to construct the GMM, as described in Eq. (3-10).

λ represents the parameters of GMM,

Next, the parameters λ of the GMM are calculated so that the GMM matches the distribution of the feature vectors with the least error. The most common method for calculating λ is maximum likelihood (ML) estimation. The objective of ML estimation is to identify the model parameters that maximize the likelihood function of the GMM given the training feature vectors X. The ML parameters are derived iteratively using the expectation-maximization (EM) algorithm. Suppose there are s feature vectors $x_1, x_2, \ldots, x_s$ (in this work, s is defined as the image size, 320×240 = 76,800); then the ML estimate of λ can be calculated using Eq. (3-11).

Furthermore, unsupervised data clustering is used before the EM iterations to accelerate convergence. This study uses the K-means algorithm [59] for clustering. The number of clusters is defined first, and the initial center of each cluster is then chosen randomly. The appropriate center and variance of each cluster are estimated iteratively using the K-means algorithm and applied as the initial mean and variance of each Gaussian component of the GMM.
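The K-means initialization followed by EM iterations can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the function names are hypothetical, each Gaussian component is assumed to have a diagonal covariance (one variance per dimension), every cluster is assumed to be non-empty, and the toy data set of 100 synthetic 5D vectors stands in for the 76,800 pixel vectors of a real frame.

```python
import math
import random


def kmeans(data, k, iterations=10, seed=0):
    """Plain K-means with randomly chosen initial centres; returns labels."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment: each vector joins the nearest centre.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            for x in data
        ]
        # Update: each centre moves to the mean of its members.
        for c in range(k):
            members = [x for x, label in zip(data, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def initial_parameters(data, k, floor=1e-6):
    """Weights, means and per-dimension variances of the K-means clusters."""
    labels = kmeans(data, k)
    weights, means, variances = [], [], []
    for c in range(k):
        # Assumption: every cluster has at least one member.
        members = [x for x, label in zip(data, labels) if label == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        var = [max(sum((v - m) ** 2 for v in col) / len(members), floor)
               for col, m in zip(zip(*members), mean)]
        weights.append(len(members) / len(data))
        means.append(mean)
        variances.append(var)
    return weights, means, variances


def log_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumption)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))


def em_step(data, weights, means, variances, floor=1e-6):
    """One EM iteration; also returns the log-likelihood of the input parameters."""
    k, d = len(weights), len(data[0])
    resp, log_lik = [], 0.0
    for x in data:  # E-step: responsibilities via log-sum-exp
        logs = [math.log(weights[c]) + log_gauss(x, means[c], variances[c])
                for c in range(k)]
        top = max(logs)
        norm = top + math.log(sum(math.exp(v - top) for v in logs))
        log_lik += norm
        resp.append([math.exp(v - norm) for v in logs])
    new_w, new_m, new_v = [], [], []
    for c in range(k):  # M-step: weighted re-estimation of each component
        nc = sum(r[c] for r in resp)
        mean = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc for j in range(d)]
        var = [max(sum(r[c] * (x[j] - mean[j]) ** 2
                       for r, x in zip(resp, data)) / nc, floor) for j in range(d)]
        new_w.append(nc / len(data))
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v, log_lik


# Toy data: two synthetic blobs of (R, G, B, x, y) vectors.
rng = random.Random(1)
true_centers = [(200, 30, 30, 10, 10), (20, 20, 220, 50, 40)]
data = [tuple(rng.gauss(m, 5.0) for m in center)
        for center in true_centers for _ in range(50)]

weights, means, variances = initial_parameters(data, k=2)
log_likelihoods = []
for _ in range(5):
    weights, means, variances, ll = em_step(data, weights, means, variances)
    log_likelihoods.append(ll)
```

Starting EM from the K-means clusters places the initial means close to the blob centers, so only a few iterations are needed in this toy case; the recorded log-likelihood values should not decrease from one iteration to the next.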

3.3.3 Similarity Functions

To determine the similarity between two objects when building databases and recognizing objects, a similarity measurement $D(U, V)$ is applied to the extracted features. We assume that the features extracted from two contours are $U = \{u_0, \ldots, u_i, \ldots, u_{I-1}\}$ and $V = \{v_0, \ldots, v_i, \ldots, v_{I-1}\}$, respectively, where $I$ denotes the feature size. Two similarity measures are applied, using the 1-norm distance (Eq. (3-12)) and the K-L distance [71] (Eq. (3-13)), where $c$ denotes the number of points on an extracted contour and $s$ denotes the image size. In this work, $c$ is defined as 256 and $s$ is defined as 76,800.
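The two distance measures can be sketched as follows. Because the exact forms of Eqs. (3-12) and (3-13) are not reproduced here, the code uses the textbook definitions and should be read as an assumption: the 1-norm distance is the sum of absolute differences, and the K-L distance is the Kullback-Leibler divergence between the two feature vectors after each is normalized to sum to one. Any scaling by $c$ or $s$ is omitted, the function names are hypothetical, and the small constant `eps` is an added guard against division by zero.

```python
import math


def one_norm_distance(u, v):
    """Sum of absolute differences between two equally sized feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def kl_distance(u, v, eps=1e-12):
    """Kullback-Leibler divergence of u from v, both normalized to sum to one.

    Assumes non-negative feature values; eps avoids division by zero.
    """
    total_u, total_v = sum(u), sum(v)
    p = [a / total_u for a in u]
    q = [b / total_v for b in v]
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


d1 = one_norm_distance([1, 2, 3], [1, 0, 5])
d_same = kl_distance([1, 1], [1, 1])
d_diff = kl_distance([1, 1], [1, 3])
```

Note that the K-L divergence in this form is not symmetric in its arguments, so the order of $U$ and $V$ matters.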


... represents the $m$-th characteristic view of the $n$-th object. Moreover, $A_{m_{\min}}$ denotes the aspect that has the minimum distance from $V^n_{new}$, and $C_{m_{\min}}$ represents the minimal distance, where $m_{\min}$ is the index of $A_{m_{\min}}$. $C^n_{m_{\min}-1}$ and $C^n_{m_{\min}+1}$ denote the neighboring views of ...

... (Eq. (3-17)) denotes the similarity measure using $\theta_z$ and the 1-norm distance.

