
Chapter 1 Introduction

1.4 System architecture

Fig. 1-1: System Architecture

Figure 1-1 shows an overview of our system. First, a camera captures an image of the scene. The face database is used to train the face detector, which picks out hypothesized faces from the image. Then, the facial feature extractor extracts the significant features from the hypothesized faces. The face identifier, trained on the client database, identifies the hypothesized faces by their significant features. Finally, the decision of the face identifier indicates whether they are clients or impostors.


Chapter 2

Face Detection

In this chapter, a feature-based system for face detection is introduced. The face detection technique employed in this system is based on the AdaBoost algorithm introduced in [4]. First, a histogram fitting method is applied for lighting normalization as a pre-processing step in front of the AdaBoost detector. Then, details of the Haar-like features are described, followed by the AdaBoost algorithm for combining classifiers in a “cascade.” Finally, the principle of a region-based clustering method is described.

Figure 2-1 shows the flow chart of the face detection process. At the beginning of this architecture, searching windows of different scales are used to extract blocks from the images captured by a camera. The size of the searching window starts at a resolution of 24 X 24 pixels. Images of 384 X 288 pixels are scanned by 12 scales of searching windows with a scaling factor of 1.25. Then, the size and luminance of each extracted block are normalized. A face detector trained on the face database determines whether each normalized block contains a face. Finally, a region-based clustering method is proposed to precisely locate the face regions in the image.
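To make the scanning procedure concrete, the following is a minimal sketch of the multi-scale searching-window enumeration described above. The window sizes follow the text (24 X 24 upward for 12 scales with a factor of 1.25); the stride between neighboring windows is our assumption, since the text does not specify it.

```python
def search_windows(img_w=384, img_h=288, base=24, scale=1.25, n_scales=12):
    """Enumerate the positions and sizes of the searching windows.

    Yields (x, y, size) for every candidate block to be normalized
    and passed to the face detector.
    """
    for s in range(n_scales):
        size = int(round(base * scale ** s))
        step = max(1, size // 8)  # hypothetical stride; not given in the text
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
```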


Fig. 2-1: The flow chart of Face Detection


2.1 Lighting normalization

Before extracting facial features, a lighting normalization method using histogram fitting [20] is applied. In this process, a target histogram function G(l), l = 0, 1, 2, ..., 255, where l is the discrete gray-scale intensity level, is chosen as the histogram of the image closest to the mean of the face database. The primary task of histogram fitting is to transform the original histogram of each extracted block, described by a histogram function H(l), into the target histogram G(l).

The details of histogram fitting are as follows. First, we find the functions M_{H→U}(l) (Eq. (2.1)) and M_{G→U}(l) (Eq. (2.2)) that map the histograms H(l) and G(l) onto a uniform distribution. Figures 2-2 and 2-3 show the histograms mapped by Eq. (2.1) and Eq. (2.2). The desired mapping function, which transforms H(l) into G(l), can then be expressed as Eq. (2.3); Fig. 2-4 shows the transformation process of this mapping.

\[ M_{H \to G}(l) = M_{G \to U}^{-1}\big(M_{H \to U}(l)\big), \quad l = 0, 1, \ldots, L-1 \tag{2.3} \]

Fig. 2-2: The histogram mapped by Eq. (2.1)

Fig. 2-3: The histogram mapped by Eq. (2.2)

Fig. 2-4: The transformation process of the desired mapping function in Eq. (2.3)


Figure 2-5(a) shows the chosen target image and Fig. 2-5(b) its histogram. Figure 2-6(a) shows the original histograms of the transformed images, and Fig. 2-6(b) shows their histograms after histogram fitting.

(a) (b)

Fig. 2-5: (a) The chosen target image (b) the histogram of the chosen target image

(a)

(b)

Fig. 2-6: The histograms of transformed images (a) before and (b) after the histogram fitting
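As an illustration of the fitting step, here is a minimal sketch of histogram fitting for 8-bit grayscale blocks. It builds the cumulative mappings of Eqs. (2.1) and (2.2) from the normalized histograms and inverts M_{G→U} by searching the target cumulative histogram, per Eq. (2.3); the function and variable names are ours.

```python
import numpy as np

def histogram_fitting(src, target_hist):
    """Map the gray levels of `src` so its histogram approximates `target_hist`.

    src: 2-D uint8 array (an extracted block).
    target_hist: length-256 array, the histogram G(l) of the chosen target image.
    """
    L = 256
    src_hist = np.bincount(src.ravel(), minlength=L).astype(np.float64)  # H(l)
    src_cdf = np.cumsum(src_hist) / src_hist.sum()          # M_{H->U}, Eq. (2.1)
    tgt_cdf = np.cumsum(target_hist) / np.sum(target_hist)  # M_{G->U}, Eq. (2.2)
    # Invert M_{G->U} by searching the target cumulative histogram: Eq. (2.3).
    mapping = np.searchsorted(tgt_cdf, src_cdf).clip(0, L - 1).astype(np.uint8)
    return mapping[src]
```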


2.2 Features

The features used in this work are reminiscent of the Haar basis functions used by Papageorgiou [3]. Compositions of rectangles of different brightness can represent the light and dark regions of an image. If the set of rectangle features that represents the target object is known, the object can be detected by comparing unknown objects against these rectangle features. A characteristic of faces, the difference in intensity between the eye region and a region across the upper cheeks, is shown in Fig. 2-7.

Fig. 2-7: Two of the rectangle features that represent the face

We use four kinds of rectangle features which are shown in Fig. 2-8.

Fig. 2-8: The four kinds of rectangle features

\[ value_{subtracted} = f(x, y, w, h, Type) \tag{2.4} \]

Eq. (2.4) is the definition of a rectangle feature. (x, y) is the origin of the feature's relative coordinates within the searching window, which is used to find blocks of the image that contain a face. w and h denote the relative width and height of the rectangle feature, and Type indicates which of the four kinds it is. value_subtracted is the sum of the pixels in the white rectangle subtracted from the sum of the pixels in the dark rectangle.
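Rectangle features of this form can be evaluated in constant time with an integral image (summed-area table), as in the original Viola-Jones work [4]. The sketch below shows the lookup and one horizontal two-rectangle feature of the kind in Fig. 2-8; the function names are ours.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0..y, 0..x]."""
    return np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y), width w, height h."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

def two_rect_feature(ii, x, y, w, h):
    """Eq. (2.4) for one horizontal two-rectangle feature:
    white (left) half subtracted from dark (right) half."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    dark = rect_sum(ii, x + half, y, half, h)
    return dark - white  # value_subtracted
```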

A single rectangle feature that best separates the face and non-face samples can be considered a weak classifier. That is, for each rectangle feature, the weak classifier determines the optimal threshold classification function such that the minimum number of examples are misclassified.

The threshold selection for rectangle features is described below. Figure 2-9 is the flow chart of threshold selection for rectangle features.

Fig. 2-9: The flow chart of selecting threshold for rectangle features

Fig. 2-10: The positive database


Fig. 2-11: The negative database

The threshold of a rectangle feature is trained on the lighting-normalized face database, which consists of 4,000 face images and 59,000 non-face images. Figures 2-10 and 2-11 show the positive and negative databases. In this procedure, we collect the distribution of the values produced by the rectangle feature over the database. Then we find the threshold that discriminates the two classes with the highest detection rate. Eq. (2.5) defines a weak classifier h(x, f, p, θ), which consists of a feature f(x, y, w, h, type), a threshold θ, and a polarity p indicating the direction of the inequality. x denotes a 24 X 24 pixel sub-window of an image.

\[ h(x, f, p, \theta) = \begin{cases} 1, & \text{if } pf(x) < p\theta \\ 0, & \text{otherwise} \end{cases} \tag{2.5} \]
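To make the threshold selection concrete, the following is a minimal sketch that picks the optimal threshold and polarity for one rectangle feature by a single sorted sweep over the feature values, minimizing the weighted classification error; the names and structure are ours, not taken from the thesis.

```python
import numpy as np

def train_stump(feature_values, labels, weights):
    """Choose (theta, p) for the weak classifier of Eq. (2.5).

    feature_values: (N,) value f(x) of this feature for each training sample.
    labels:         (N,) 1 for faces, 0 for non-faces.
    weights:        (N,) sample weights (reweighted by AdaBoost across rounds).
    """
    order = np.argsort(feature_values)
    fv, y, w = feature_values[order], labels[order], weights[order]
    total_pos, total_neg = w[y == 1].sum(), w[y == 0].sum()
    cum_pos = np.cumsum(w * (y == 1))  # positive weight at or below each value
    cum_neg = np.cumsum(w * (y == 0))  # negative weight at or below each value
    # Error when samples below the threshold are labeled faces (p = +1),
    # and when samples above the threshold are labeled faces (p = -1).
    err_pos = cum_neg + (total_pos - cum_pos)
    err_neg = cum_pos + (total_neg - cum_neg)
    errors = np.minimum(err_pos, err_neg)
    best = int(np.argmin(errors))
    polarity = 1 if err_pos[best] <= err_neg[best] else -1
    return fv[best], polarity, errors[best]
```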


2.3 Training of detector

For the minimum detector resolution of 24 X 24, the exhaustive set of rectangle features is quite large: roughly 160,000. Even though each rectangle feature can be computed very efficiently, computing the complete set is prohibitively expensive.

Viola [4] presents a variant of AdaBoost that is used both to select the rectangle features and to train the classifier. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple learning algorithm by combining a collection of weak classification functions into a stronger classifier.

In their results, a strong classifier consisting of 200 rectangle features provides initial evidence that a boosted classifier constructed from rectangle features is an effective technique for face detection. However, the computation time of such a strong classifier is too high for many real-world tasks.

A structure of cascaded classifiers that achieves increased detection performance while radically reducing computation time is proposed by Viola [4]. Related research on extended cascade structures is introduced later. In this thesis, we train the classifier following this cascaded structure.

The overall classifier, shown in Fig. 2-12, is composed of a sequence of stage classifiers. Stages in the cascade are constructed by training classifiers using AdaBoost.

In stage 1, an object extracted by the searching window that is classified as a face is allowed to enter stage 2; otherwise it is rejected. Likewise, an object must pass stage 2 before it reaches stage 3. In brief, an object labeled as a face passes through the entire series of stages, whereas a rejected object is discarded at whichever stage rejects it, even if that is the last stage.


Fig. 2-12: The overall classifier

The cascade design is driven by a set of detection and performance goals. The number of cascade stages must be sufficient to achieve an excellent detection rate while minimizing computation. For example, if each stage has a detection rate of 0.99, an overall detection rate of 0.9 can be achieved by a 10-stage classifier (since 0.99^10 ≈ 0.9).
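The early-rejection behavior of the cascade can be sketched in a few lines. Each stage is a boosted classifier; a window that falls below any stage threshold is rejected immediately, so most non-face windows cost only a stage or two of computation. The representation below (lists of weighted weak classifiers) is our assumption.

```python
def cascade_classify(window, stages):
    """Pass a candidate window through the cascade of Fig. 2-12.

    stages: list of (weak_classifiers, stage_threshold) pairs, where
    weak_classifiers is a list of (h, alpha): h(window) returns 0 or 1
    and alpha is the weight assigned to h by AdaBoost.
    """
    for weak_classifiers, stage_threshold in stages:
        score = sum(alpha * h(window) for h, alpha in weak_classifiers)
        if score < stage_threshold:
            return False  # rejected here; later stages are never evaluated
    return True  # survived every stage: labeled as a face
```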


Fig. 2-13: The flow chart of training of cascaded classification

The flow chart of training the cascaded classifier is shown in Fig. 2-13. The value f is the maximum acceptable false positive rate per stage, d is the minimum acceptable detection rate per stage, F_target is the overall false positive rate, P is the set of face samples, and N is the set of non-face samples. i denotes the stage index of the cascaded classifier, and n_i is the number of weak classifiers in stage i. The overall false positive rate must be smaller than F_target, and each stage has to satisfy the inequality F_i ≤ f × F_{i−1}.

Fig. 2-14: The flow chart of training classification for each stage

The classifiers for the stages in the cascade are constructed by training classifiers using AdaBoost. The procedure is shown in Fig. 2-14. m and l are the numbers of non-face and face samples respectively, and j indexes the combined set of non-face and face samples. First, we initialize the weights

\[ w_{1,j} = \frac{1}{2m}, \; \frac{1}{2l} \quad \text{for } y_j = 0, 1 \text{ respectively}, \]

and normalize the weights by Eq. (2.6). Then, according to Eq. (2.7), we select the best weak classifier with respect to the weighted error.
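Since Eqs. (2.6) and (2.7) are not reproduced in this extract, the sketch below follows the standard AdaBoost round of Viola and Jones [4]: initialize the weights as above, normalize them each round, pick the weak classifier with the lowest weighted error, and down-weight the correctly classified samples.

```python
import numpy as np

def adaboost_stage(X, y, candidate_stumps, T):
    """Train one cascade stage with T weak classifiers (cf. Fig. 2-14).

    X: list of training windows; y: (N,) labels, 1 = face, 0 = non-face.
    candidate_stumps: callables h(x) returning 0 or 1, one per feature/threshold.
    """
    y = np.asarray(y, dtype=np.float64)
    m, l = np.sum(y == 0), np.sum(y == 1)
    # w_{1,j} = 1/(2m) for non-faces, 1/(2l) for faces.
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))
    strong = []
    for _ in range(T):
        w = w / w.sum()  # weight normalization, cf. Eq. (2.6)
        best_h, best_eps, best_preds = None, np.inf, None
        for h in candidate_stumps:
            preds = np.array([h(x) for x in X], dtype=np.float64)
            eps = np.sum(w * np.abs(preds - y))  # weighted error, cf. Eq. (2.7)
            if eps < best_eps:
                best_h, best_eps, best_preds = h, eps, preds
        beta = max(best_eps, 1e-10) / (1.0 - best_eps)  # guard against zero error
        w = w * np.power(beta, 1.0 - np.abs(best_preds - y))  # keep wrong samples heavy
        strong.append((best_h, np.log(1.0 / beta)))  # alpha = log(1/beta)
    return strong
```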


2.4 Post-process: A region-based clustering method

The face detector finds many candidates around each face in a scanned image, as shown in Fig. 2-15.

Evidently, we need to deal with the problem that multiple blocks are classified as faces around a single face. A region-based clustering method is proposed to solve this problem. The method consists of two levels of clustering: local scale clustering and global scale clustering. Local scale clustering groups blocks of the same scale, and a simple filter judges the number of blocks in each cluster: when a cluster contains more than one block, it is preserved as a possible face candidate; otherwise, it is discarded. Global scale clustering then works on the clusters that survive local scale clustering. In the end, we take the average of the corners of each global scale cluster to label the face.

Fig. 2-15: The image after face detection

Eq. (2.11), Eq. (2.12) and Eq. (2.13) formulate the decision rules of the proposed method. cluster(x, y) = 1 means that blocks x and y are in the same cluster and their bounding regions overlap. overlap_rate(x, y) is the percentage of overlapped region for x and y, and distance(x, y) is the distance between the centers of x and y. Figure 2-16(a) and (b) show the overlapped region and the center distance of two blocks in local scale clustering and global scale clustering, respectively. In Fig. 2-16(a), the two blocks are resolved as the same cluster. In Fig. 2-16(b), the two blocks are resolved as different clusters, because the distance between their centers does not satisfy Eq. (2.13).

(a)

(b)

Fig. 2-16: The chart of the overlapped region and the distance of a center of two blocks in (a) the local scale clustering and (b) the global scale clustering


The two blocks in Fig. 2-16(b) are not in the same cluster. In the special case shown in Fig. 2-17, the four blocks fall into different clusters. Therefore, they are all considered faces and located in the image, although most of them are false accepts. For this reason, we choose one of the blocks to replace the others if they satisfy Eq. (2.12).

Fig. 2-17: A special case in clustering

Fig. 2-18(a) shows the example of Fig. 2-15 after local scale clustering, and Fig. 2-18(b) shows the result of applying global scale clustering to Fig. 2-18(a).

(a) (b)

Fig. 2-18: (a) The results of the local scale clustering (b) the results of the global scale clustering
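The pairwise decision at the heart of both clustering levels can be sketched as follows. The overlap and distance computations are standard; the thresholds are illustrative stand-ins, since the actual values in Eqs. (2.11)-(2.13) are not reproduced here.

```python
def overlap_rate(a, b):
    """Percentage of overlapped region for blocks a and b, each (x, y, w, h)."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def center_distance(a, b):
    """Distance between the centers of blocks a and b."""
    ax, ay = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bx, by = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def same_cluster(a, b, min_overlap=0.5, max_dist=12.0):
    """cluster(a, b) = 1 when the regions overlap enough and the centers are
    close; min_overlap and max_dist are hypothetical thresholds."""
    return overlap_rate(a, b) >= min_overlap and center_distance(a, b) <= max_dist
```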


Chapter 3

Face Identification

After a face is extracted from the captured image, its information can be used to identify the person through the face identification system. The two major parts of face identification in this work are eigenface extraction and the adaptive probabilistic model (APM). First, we describe the details of eigenface extraction.

Then, the proposed adaptive probabilistic model (APM) used for modeling a client’s face is presented.

The flow chart of the face identification process is shown in Fig. 3-1. First, the facial feature extractor extracts facial features from the faces received from the face detector described in the previous chapter. The facial feature extractor is built on principal component analysis (PCA) [17], which projects the image space into a low-dimensional feature space. According to the extracted facial features, the faces are judged as either clients or impostors by the face identifier, which is formed from the adaptive probabilistic model (APM) and the client database. Details of the proposed methods are introduced in the following sections.

Fig. 3-1: The flow chart of face identification


3.1 Features

The eigenface technique, based on principal component analysis, has been widely used for pattern recognition as well as in the field of biometrics, and it is the most popular feature extraction method employed by face identification techniques [17]. Principal component analysis (PCA), also known as the Karhunen-Loeve method, chooses a dimensionality-reducing projection that maximizes the scatter of all projected samples. Eigenface feature extraction based on PCA is used to obtain the most important features from the face images in our system. These features are obtained by projecting the original images into the corresponding subspace.

To begin with, we have a training set of N images, each consisting of n elements. For example, we use N = 4000 images in our database to compute the eigenfaces, and each image has n = 24 X 24 = 576 elements. Figure 3-2 shows the rearrangement of a 24 X 24 pixel image into a 576 X 1 vector.

Fig. 3-2: Rearranging a 24 X 24 pixel image into a 576 X 1 vector

The process of obtaining the subspace consists of finding the covariance matrix C of the training set and computing its eigenvectors v_k, k = 1, 2, ..., n. The eigenvectors v_k corresponding to the largest eigenvalues λ_k span the basis of the sought subspace. Each original image can be projected into the subspace as in Eq. (3.1).


\[ \eta_k = v_k^T \cdot \Phi_s, \quad k = 1, 2, \ldots, m \tag{3.1} \]

where m (m < n) is the chosen dimensionality of the image subspace and Φ_s = Γ_s − Ψ, in which Γ_s is an original image from the set of images to be projected and Ψ is the average image of the training set. Fig. 3-3 presents the average image obtained from our training set. The coordinates of the projected images in the subspace, η_k, k = 1, 2, ..., m, can be used as a feature vector for the matching procedure.

Fig. 3-3: The average face image from our database
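The eigenface computation and the projection of Eq. (3.1) can be sketched directly with a dense eigendecomposition, which is feasible at n = 576; variable names follow the text where possible.

```python
import numpy as np

def train_eigenfaces(images, m=50):
    """Compute the top-m eigenfaces from an (N, n) matrix of flattened faces."""
    psi = images.mean(axis=0)              # average face Psi (Fig. 3-3)
    phi = images - psi                     # mean-centered images
    C = np.cov(phi, rowvar=False)          # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]  # keep the m largest eigenvalues
    V = eigvecs[:, order]                  # (n, m) basis v_k of the subspace
    return psi, V

def project(image, psi, V):
    """Eq. (3.1): eta_k = v_k^T (Gamma_s - Psi), the m-dimensional feature vector."""
    return V.T @ (image - psi)
```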

Selecting the dimensionality of the image subspace is an important issue. The closer m is to n, the more precise the face identification, but projecting the original images into the corresponding subspace takes more computation time. Hence, we have to choose the dimensionality of the image subspace as a trade-off between precision and computation time.

The amount of pattern information captured with respect to the number of eigenvectors is shown in Fig. 3-4. The more eigenvectors are used, the more pattern information can be expressed: forty eigenvectors express about 77 percent of the pattern information, fifty express about 81 percent, and sixty express about 84 percent.

Figure 3-5 shows the detection rate corresponding to each number of eigenvectors. When the number of selected eigenvectors is greater than twenty, the detection rate shows no obvious improvement; moreover, when it is greater than fifty, the detection rate decreases progressively. The reason is that the pattern information includes both significant information and noise: the more eigenvectors are extracted, the more noise is extracted as well, so the detection rate degrades under the influence of the noise.

Considering both the computational load and the detection rate, we choose fifty eigenvectors as the image subspace used for face identification.

Fig. 3-4: The content of pattern information with respect to the number of eigenvectors

Fig. 3-5: The detection rate corresponding to each number of eigenvectors


3.2 Adaptive probabilistic model (APM)

The adaptive probabilistic model (APM) is proposed to achieve a fast and functional face identification technique. The APM is constructed as a weighted combination of simple probabilistic functions, so its design is light enough for real-time tasks. Furthermore, the proposed APM can register new clients and update clients' information online.

The capability of registering new clients online enhances the practicality of the proposed system. The detection rate of identification can also be improved by updating clients' information over long-term usage of the proposed system.

The primary concept of the APM architecture is based on view-independent face identification. The view-independent model is constructed from five different head orientations of each person (Ebrahimpour et al. [21] propose a face recognition model in which the face space, spanning from the right profile to the left profile along the horizontal plane, is divided into five views). The view-independent model is more robust than a single-view model, because a person's head orientation varies in the real world.

Our model achieves view-independent face identification with a mixture of views of the face, each modeled by a probabilistic function. The model is constructed from five different head orientations of each client, as shown in Fig. 3-6.


Fig. 3-6: An example of five different head orientations of a client

3.2.1 Similarity measure

The APM follows a probabilistic constraint; that is, the similarity measures of the APM are designed to model likelihood functions, and classification relies on the degree of likelihood. For example, the similarity between a testing sample x and each registered client is computed with that client's likelihood function. Then the testing sample x is classified as the client with the highest similarity. Eq. (3.2) shows the likelihood function. In our system, n denotes the label of each client, k is one of the five head orientations, and t denotes the number of times the client's information has been updated. w_{n,k,t} is the weight of each probabilistic function; the constraint on w_{n,k,t} is shown in Eq. (3.3). The value of w_{n,k,1} is initialized by Eq. (3.4).


Eq. (3.5) gives the original probabilistic functions. d is the dimension of the input vectors, μ_{n,k,t} is the mean vector, and σ_{n,k,t} is the covariance matrix. Under the assumption of Eq. (3.6) (where I is the identity matrix), the probabilistic functions in Eq. (3.5) can be simplified as Eq. (3.7). Figure 3-7 shows the initialization of the mean vector μ_{n,k,1}.


Fig. 3-7: Initialization of the mean vector μ_{n,k,1}
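As a concrete illustration of the similarity measure, the sketch below assumes that the probabilistic function of Eq. (3.5) is a multivariate Gaussian and that Eq. (3.6) reduces its covariance to the isotropic form σ²I, so the likelihood of Eq. (3.2) becomes a weighted sum of five Gaussians, one per head orientation. The decision threshold in `identify` is hypothetical.

```python
import numpy as np

def apm_similarity(x, weights, means, sigmas):
    """Likelihood of feature vector x under one client's APM (cf. Eq. (3.2)).

    weights: (5,) mixture weights w_{n,k,t}, summing to one (Eq. (3.3)).
    means:   (5, d) mean vectors mu_{n,k,t}, one per head orientation.
    sigmas:  (5,) standard deviations; covariance assumed to be sigma^2 I.
    """
    d = x.shape[0]
    likelihood = 0.0
    for w, mu, s in zip(weights, means, sigmas):
        norm = (2.0 * np.pi * s ** 2) ** (-d / 2.0)  # Gaussian normalizer
        dist2 = np.sum((x - mu) ** 2)                # squared distance to the view
        likelihood += w * norm * np.exp(-dist2 / (2.0 * s ** 2))
    return likelihood

def identify(x, clients, threshold=1e-12):
    """Classify x as the most likely client, or as an impostor below threshold."""
    scores = {n: apm_similarity(x, *params) for n, params in clients.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "impostor"
```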

3.2.2 Parameter tuning

The magnitude of the covariance matrix σ_{n,k,t} can affect the performance of the APM. For this reason, we design an experiment to find the best value of the covariance matrix σ_{n,k,t} from different coefficients of the covariance matrix.

A face database containing images of 10 persons is used for the experiment. For each person, the database has images of ten different head orientations. We choose five of the ten head-orientation images of each person as training data; the other five images are used as testing data.

The covariance matrix σ_{n,k,0} is initialized with the variance of the training data, because each person has too few images to compute the variance reliably from his or her own images. After obtaining the initialized covariance matrix σ_{n,k,0}, we adjust its coefficient to obtain σ_{n,k,1}:

\[ \sigma_{n,k,1} = k \times \sigma_{n,k,0} \tag{3.8} \]

The covariance matrix σ_{n,k,1} is adjusted by Eq. (3.8). The detection rate with respect to different values of the parameter k of the covariance matrix σ_{n,k,1} is shown in Fig. 3-8. When the parameter k is larger than four and smaller than forty-three, the detection rate is clearly improved. Therefore, we choose k = 5 to obtain a suitable covariance matrix σ_{n,k,1} for the APM employed in this work.

Fig. 3-8: The detection rate for different values of the parameter k of the covariance matrix

3.2.3 Adaptive updating

This section introduces the updating functions of the APM. The design of adaptive updating improves the detection rate of face identification.

As the number of updates increases, the functions of the APM become more robust, and the model matches the head orientations of the actual person more precisely. When a client is identified correctly, the corresponding APM function is updated immediately. The improvement in detection rate achieved by adaptive updating is presented in Chapter 4.
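The exact updating equations are not reproduced in this extract, so the sketch below is only a rough illustration of the idea: when a client is identified correctly, shift the mean of the matched view toward the new sample and re-normalize the view weights so that Eq. (3.3) still holds. The moving-average form and the rate alpha are our assumptions.

```python
import numpy as np

def update_client(apm, view_k, x, alpha=0.05):
    """Update the matched view of a correctly identified client's APM.

    apm: dict with 'weights' (5,), 'means' (5, d), 'sigmas' (5,) arrays.
    view_k: index of the head orientation that matched the sample x.
    alpha: hypothetical learning rate for the moving-average update.
    """
    mu = apm['means'][view_k]
    apm['means'][view_k] = (1.0 - alpha) * mu + alpha * x  # shift mean toward x
    apm['weights'][view_k] += alpha                        # favor the matched view
    apm['weights'] /= apm['weights'].sum()                 # re-normalize (Eq. (3.3))
```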

