Improving Face Recognition Performance Using Similarity Feature-based Selection and Classification Algorithm


Ubiquitous International Volume 6, Number 1, January 2015


Chi-Kien Tran1, Tsair-Fwu Lee, Ph.D.1,∗

1 Department of Electronics Engineering,

National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan, ROC

Pei-Ju Chao, Ph.D.2,∗

2 Department of Radiation Oncology,

Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan, ROC

∗ Corresponding authors: [email protected]; [email protected]; Received November, 2013; revised July, 2014

Abstract. In this paper, we propose an effective similarity feature-based selection and classification algorithm that selects similarity features from the training images and classifies face images in a face recognition system. The experiments were conducted on the ORL Database of Faces, which consists of 400 images of 40 individuals, and the Yale Face Database, which consists of 11 images for each of 15 classes. Three face recognition systems were developed: the first based on the histogram-based feature, the second based on the mean of pixel values in 4×4 windows, and the third based on the local directional pattern feature. Euclidean distance, Manhattan distance and Chi-square distance were used as distance metrics for the classification method. The results indicated that the proposed algorithms not only reduced the dimension of the feature space but also achieved a mean recognition accuracy 1.55% to 11.31% higher than that of conventional algorithms.

Keywords: Face recognition, similarity feature, histogram, pixel values, local directional pattern.

1. Introduction. Face recognition has a wide variety of applications, such as identity authentication, access control and surveillance [1]. Engineers started to show interest in face recognition in the 1960s; one of the first researchers on this subject was Woodrow W. Bledsoe [2]. Since Bledsoe, a great deal of research has addressed different aspects of this field. Despite these achievements, face recognition remains a challenge in computer vision research [1, 3, 4]. One of the challenges is how to extract features from face images. These features are important in the later step of identifying the subject with an acceptable error rate. Feature extraction involves several steps: dimensionality reduction, feature extraction and feature selection. Among these steps, selecting a subset of the extracted features is important because a well-chosen subset can minimize the classification error.

Feature selection transforms or combines the data in order to select a proper subspace in the original feature space. In other words, a feature selection algorithm selects the best subset of the input feature set. Feature selection is an important stage of training and is one of two ways of avoiding the curse of dimensionality (the other is feature extraction). There are two approaches to feature selection, known as "forward selection" and "backward selection". Forward selection starts with no features and adds them one by one, at each step adding the one that decreases the error most, until any further addition does not significantly decrease the error. Backward selection starts with all the features and removes them one by one, at each step removing the one that decreases the error most (or increases it only slightly), until any further removal increases the error significantly. From the perspective of selection strategy, feature selection algorithms broadly fall into three models: filter, wrapper and embedded [5]. The filter model evaluates features without involving any learning algorithm. The wrapper model requires a learning algorithm and uses its performance to evaluate the goodness of features. The embedded model incorporates feature selection as part of the learning process and uses the objective function of the learning model, such as a decision tree or an artificial neural network, to guide the search for relevant features.
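The greedy forward selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `forward_selection`, `error_of` and `min_gain` are hypothetical, and `error_of` stands for any user-supplied function that returns the classification error of a feature subset (for example, a wrapper around a classifier).

```python
def forward_selection(n_features, error_of, min_gain=1e-9):
    """Greedy forward selection over feature indices 0..n_features-1.

    error_of(subset) is an assumed callback returning the error obtained
    with the given list of feature indices; min_gain is the smallest
    decrease in error that still counts as significant.
    """
    selected = []
    best_error = error_of(selected)
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Try each remaining feature and keep the one giving the lowest error.
        trial = min(candidates, key=lambda f: error_of(selected + [f]))
        trial_error = error_of(selected + [trial])
        if best_error - trial_error <= min_gain:
            break  # no remaining feature decreases the error significantly
        selected.append(trial)
        best_error = trial_error
    return selected
```

Backward selection follows the same pattern in reverse: start from the full set and repeatedly drop the feature whose removal yields the lowest error.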

In recent studies, many researchers have worked on feature selection and have presented multiple class separability criteria and algorithms that are essentially based on the concept of 'Similarity Preserving Feature Selection'. These feature selection criteria and algorithms include Relief [6, 7] and ReliefF [7], Laplacian Score [8], Fisher Score [9], SPEC [10], HSIC [11] and Trace Ratio [12]. Among them, Fisher Score and ReliefF were designed to select features that assign similar values to samples from the same class and different values to samples from different classes, Laplacian Score was designed to retain sample locality, and HSIC was designed to maximize feature-class dependency. However, these algorithms share the drawback of being unable to handle feature redundancy; as a result, they waste considerable computation time and do not achieve high accuracy in face recognition applications [13].

Face images of the same person in a class show small changes in translation, rotation and illumination. Based on these characteristics, our idea is to keep the similarity features in the set of training images of the same person, so we propose a feature selection algorithm that selects the subset of the extracted features that causes the smallest classification error. Three face recognition systems were developed: the first based on the histogram-based feature, the second based on the mean of pixel values in 4×4 windows (M4×4), and the third based on the local directional pattern feature [14]. Euclidean distance, Manhattan distance and Chi-square distance were used as distance metrics for the classification method [15, 16]. We also compared the proposed algorithms, which use the similarity features, with the conventional algorithms, which do not. The proposed algorithms showed improved recognition accuracy over the conventional algorithms.

2. Materials and Methods.

2.1. Input data. The algorithms were implemented in Visual C# and then tested on two face databases: the ORL Database of Faces [17] and the Yale Face Database (cropped images from the MIT Media Lab), which is publicly available for research purposes at the URL http://vismod.media.mit.edu/vismod/classes/mas622-00/datasets/. The ORL Database of Faces consists of 400 images of 40 individuals. The images contain a high degree of variability in expression, pose and facial details, and are stored as 112×92 pixel arrays with 256 gray levels (see Figure 1). The Yale Face Database is made up of 11 images for each of 15 classes (165 images in total). The images are gray scale and are cropped to a resolution of 231×195 pixels (see Figure 2).


2.2. Features used. There are many approaches to extracting facial features for face recognition, such as local binary patterns (LBP) [18], local Gabor binary pattern histogram sequence (LGBPHS) [19], local phase quantization (LPQ) [20], and local directional pattern (LDP) [14]. In this paper, we chose the bin-based histogram feature, the mean of pixel values in 4×4 windows, and the LDP-based feature in order to illustrate the potential of the proposed algorithms.

Figure 1. Example images of the ORL Database of Faces.

Figure 2. Example cropped images of the Yale Face Database.

2.2.1. Histogram. A histogram is a type of graph that has wide applications in statistics. The horizontal axis depicts the range and scale of the observations involved, and the vertical axis shows the number of data points in the various intervals, i.e. the frequency of observations in each interval. The histogram allows numerical data to be visualized by indicating the number of data points that lie within a range of values, called a class or a bin. The frequency of the data that fall in each class is depicted by a bar. Histograms are invariant to image manipulations such as rotation and translation, and they change only slightly with a change in scale, angle of view or occlusion. Despite these advantages, histograms perform poorly under different imaging or lighting conditions. They are also ineffective in distinguishing different images that have similar color distributions, and they suffer from inefficient computation due to their dimensionality. Some histogram-based face recognition systems have been introduced in [21-25].

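Given an image with 256 gray levels, let H be its histogram feature vector. H is built by applying the following update for every pixel,

$$H[I(x, y) \times N_{bin} \,\mathrm{div}\, 256] = H[I(x, y) \times N_{bin} \,\mathrm{div}\, 256] + 1, \qquad (1)$$

where $I(x, y)$ is the value of the pixel at coordinate $(x, y)$, $N_{bin}$ is the number of bins, and div denotes integer division.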
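A minimal Python sketch of Equation (1) follows. It assumes the image is given as a list of rows of integer gray levels in the range 0 to 255 and that H starts as all zeros; the function name `histogram_feature` is hypothetical, and the code is an illustration rather than the authors' Visual C# implementation.

```python
def histogram_feature(image, n_bins):
    """Return the n_bins-bin gray-level histogram of a 256-level image."""
    hist = [0] * n_bins  # H is assumed to start at zero
    for row in image:
        for value in row:
            # Equation (1): the bin index is I(x, y) * Nbin div 256
            hist[value * n_bins // 256] += 1
    return hist
```

With `n_bins = 256` every gray level gets its own bin; smaller values merge neighbouring gray levels and shorten the feature vector.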

2.2.2. The mean of pixel values in window with size of 4×4 (M4×4). In this paper, we use a simple feature, the mean of pixel values in a window of size 4×4, to test the proposed algorithms. It is obtained by dividing a face image into non-overlapping windows (regions) of size 4×4 and computing the mean of the pixel values in each window,

$$m = \frac{1}{16} \sum_{(x, y) \in W} I(x, y), \qquad (2)$$

where $W$ is a 4×4 window and $I(x, y)$ is the pixel value at coordinate $(x, y)$.
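The M4×4 feature described above can be sketched as follows. The function name `block_means` is hypothetical, the image is assumed to be a list of rows of gray levels, and dropping trailing rows or columns that do not fill a whole window is an assumption, since the text does not say how image sizes that are not multiples of 4 are handled.

```python
def block_means(image, size=4):
    """Mean pixel value of each non-overlapping size x size window,
    scanned left to right, top to bottom."""
    rows, cols = len(image), len(image[0])
    features = []
    # Assumption: incomplete windows at the right and bottom edges are skipped.
    for r in range(0, rows - rows % size, size):
        for c in range(0, cols - cols % size, size):
            total = sum(image[r + i][c + j]
                        for i in range(size) for j in range(size))
            features.append(total / (size * size))
    return features
```

For a 112×92 ORL image this yields 28 × 23 = 644 features, one per window.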

2.2.3. Local directional pattern (LDP). LDP [14] is a gray-scale texture pattern which characterizes the spatial structure of a local image texture. An LDP operator computes the edge response values in all eight directions at each pixel position and generates a code from the relative strength magnitude. Given a central pixel in the image, applying the Kirsch compass edge detector, we obtain eight edge response values $m_0, m_1, \ldots, m_7$, each representing the edge significance in its respective direction (see Figure 3). We find the top $k$ values $|m_i|$ and set the corresponding bits to 1. The remaining $(8-k)$ bits of the 8-bit LDP pattern are set to 0. Finally, the LDP code is calculated as follows,

$$\mathrm{LDP}_k = \sum_{i=0}^{7} s(m_i - m_k)\, 2^i, \qquad (3)$$

where $m_k$ is the $k$-th most significant directional response and the step function $s(x)$ is defined in Equation (4). Figure 4 shows an example of LDP code with $k=3$.

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0. \end{cases} \qquad (4)$$
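The LDP code of a single pixel can be sketched in Python as below. Several choices are assumptions: the bit order (bit 0 for the east neighbour, then counter-clockwise) is a guess because Figure 3 is not reproduced here, responses are ranked by absolute value as the text describes, ties are broken in favour of the lower bit index so that exactly k bits are set, and the names `RING`, `BASE`, `kirsch_responses` and `ldp_code` are hypothetical.

```python
# Border positions (row, col) of a 3x3 patch in the assumed bit order:
# E, NE, N, NW, W, SW, S, SE.
RING = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
# Kirsch east mask read along RING: weight 5 on E, NE, SE and -3 elsewhere.
BASE = [5, 5, -3, -3, -3, -3, -3, 5]

def kirsch_responses(patch):
    """Eight Kirsch compass responses m_0..m_7 for the centre of a 3x3 patch."""
    ring = [patch[r][c] for r, c in RING]
    responses = []
    for i in range(8):
        # Rotating BASE by i puts the three 5-weights at ring positions i-1, i, i+1.
        mask = [BASE[(j - i) % 8] for j in range(8)]
        responses.append(sum(w * v for w, v in zip(mask, ring)))
    return responses

def ldp_code(patch, k=3):
    """8-bit LDP code: bits of the k strongest |m_i| are 1, the rest 0."""
    magnitudes = [abs(m) for m in kirsch_responses(patch)]
    # sorted() is stable, so ties keep the lower index first.
    order = sorted(range(8), key=lambda i: -magnitudes[i])
    return sum(1 << i for i in order[:k])
```

Applying `ldp_code` to the 3×3 neighbourhood of every interior pixel gives the code map from which block histograms are later built.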

Figure 3. Edge response and LDP binary bit positions.

Figure 4. Generating LDP code with k=3.

2.3. Conventional algorithms.

2.3.1. Algorithm 1. In the training phase, features are extracted from the training images and stored in vectors for further processing. The mean of the features in the stored vectors is then calculated and stored in another vector for later use in the classification phase. This mean vector is used to calculate the absolute differences between the mean of the training images and the test image.

The first step of classification is the same as in training: features are extracted from the test image. In the second step, the minimum distance between the feature vector of the test image and the mean feature vectors is calculated to find the class that matches the test image.

Training algorithm Let the training set of face images be $\{I_1, I_2, \ldots, I_m\}$ and let $f_{ij}$ denote the $j$th feature of the $i$th image $I_i$, $i = 1, \ldots, m$; $j = 1, \ldots, n$.

The mean of the $j$th feature is defined by,

$$\Psi_j = \frac{1}{m} \sum_{i=1}^{m} f_{ij}. \qquad (5)$$

Classification algorithm Let $Y$ be a feature vector of a test image. Calculate the distance between $Y$ and the mean feature vectors of $p$ classes $\{\Psi_1, \Psi_2, \ldots, \Psi_p\}$,

$$d_i(\Psi_i, Y) = L(\Psi_i, Y), \qquad (6)$$

where $L$ is a dissimilarity measure such as Manhattan distance, Euclidean distance, histogram intersection, Chi-square statistics or another distance measure.

Find the minimum distance between $Y$ and $\{\Psi_1, \Psi_2, \ldots, \Psi_p\}$,

$$s = \arg\min_i (d_i), \qquad (7)$$

and we say that the face with feature vector $Y$ belongs to class $s$.
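Algorithm 1 can be sketched in Python as a nearest-class-mean classifier. The function names and the dictionary layout are hypothetical. The Chi-square form used here, the sum of (a − b)² / (a + b) over features, is a commonly used definition and is an assumption, because the text does not spell out its formula.

```python
import math

def mean_vector(vectors):
    """Per-feature mean of a list of equal-length feature vectors."""
    m = len(vectors)
    return [sum(column) / m for column in zip(*vectors)]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chi_square(a, b):
    # Assumed form; terms with a zero denominator are skipped.
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def train(samples_by_class):
    """Map each class label to the mean of its training feature vectors."""
    return {label: mean_vector(vecs) for label, vecs in samples_by_class.items()}

def classify(y, class_means, distance):
    """Equations (6) and (7): return the label whose mean is closest to y."""
    return min(class_means, key=lambda label: distance(class_means[label], y))
```

A typical call is `classify(y, train(samples), euclidean)`, where `samples` maps each person to the feature vectors of that person's training images.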

2.3.2. Algorithm 2. Each image is divided into blocks, and a histogram is extracted from each block. These histograms are concatenated into a spatially combined histogram, which serves as a global face feature for the given face image. Recognition is performed using a nearest neighbor classifier with Chi-square statistics as the dissimilarity measure. This algorithm follows the design described in detail in [14], except that it does not use weights for regions. Figure 5 shows the block diagram of the recognition system based on the LDP descriptor.

Figure 5. Block diagram of the recognition system based on LDP descriptor.
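The block-histogram pipeline of Algorithm 2 can be sketched as follows, assuming the image has already been converted to a map of integer codes (for example, LDP codes). The grid size, the use of one bin per possible code value, the Chi-square form and the names `spatial_histogram`, `chi_square` and `nearest_neighbor` are all assumptions made for illustration.

```python
def spatial_histogram(code_map, grid=(2, 2), n_bins=256):
    """Concatenate the histograms of grid[0] x grid[1] equal-sized blocks."""
    rows, cols = len(code_map), len(code_map[0])
    block_h, block_w = rows // grid[0], cols // grid[1]
    feature = []
    for br in range(grid[0]):
        for bc in range(grid[1]):
            hist = [0] * n_bins
            for r in range(br * block_h, (br + 1) * block_h):
                for c in range(bc * block_w, (bc + 1) * block_w):
                    hist[code_map[r][c]] += 1
            feature.extend(hist)  # blocks are appended in row-major order
    return feature

def chi_square(a, b):
    # Assumed form of the Chi-square statistic; empty bin pairs are skipped.
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def nearest_neighbor(query, gallery):
    """gallery is a list of (label, feature) pairs; return the closest label."""
    return min(gallery, key=lambda item: chi_square(query, item[1]))[0]
```

Because the histograms are concatenated block by block, the feature keeps coarse spatial layout that a single whole-image histogram would discard.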

2.4. Proposed similarity feature selection and classification algorithms. Face images of the same person in a class have small changes in translation, rotation and illumination. Based on these characteristics, this paper proposes an algorithm to retain similarity features that have discriminative power and stability, minimizing within-class differences while maximizing between-class differences.

In the training phase, firstly, features are extracted from the training images of the same person and stored in vectors for further processing. See Figure 6 for an illustration of the histogram-based feature extraction. Secondly, the mean of the features in the vectors stored in the previous step is calculated and stored in a vector for later use in the next step (see Figure 7(a)). Thirdly, the variance of the features is calculated and stored in a vector. Fourthly, the mean vector and the variance vector are used to keep the features that have little variance (the so-called similarity features). It means that the features which


Table 10. Performance of the LDP-based system when the Chi-square distance is used for the classification method.

threshold values for the two face databases are in [0.09, 0.5]. However, if other systems use other features or classifiers, a suitable threshold value has to be found, because the results depend on four factors: feature type, the number of training images, threshold value and distance measure.

Tables 1 to 10 show that the results of the proposed algorithms were outstanding: they not only reduced the dimension of the feature space but also achieved a mean recognition accuracy 1.55% to 11.31% higher than that of the conventional algorithms. The proposed algorithms could perform better than the conventional ones because they kept essential information from the training images and so enhanced the power of discrimination among different classes. Thanks to these advantages, the storage, performance and communication of face recognition systems can be improved.

4. Conclusion and future works. In this paper, we proposed a similarity feature-based selection and classification algorithm. Three face recognition systems, the first based on the histogram-based feature, the second based on the mean of pixel values in 4×4 windows (M4×4), and the third based on the LDP feature, were developed to show that the proposed algorithms outperform the conventional algorithms. The results showed that our algorithms are a valuable tool for improving the performance of face recognition systems.

Although the proposed methods achieved a higher recognition rate, some issues should be addressed further, such as finding the optimal threshold for each database automatically or applying the proposed algorithm to other features, with the aim of improving the recognition rate.

Competing interests: Part of this study was presented at The Second International Conference on Robot, Vision and Signal Processing (RVSP-2013).

Acknowledgment. This study was supported financially, in part, by grants from NSC-101-2221-E-151-007-MY3, and NSC-102-2221-E-182A-002.

References

[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, Face recognition: A literature survey, ACM Comput. Surv., vol. 35, pp. 399–458, 2003.

[2] M. Ballantyne, R. S. Boyer, and L. Hines, Woody Bledsoe: His life and legacy, The AI Magazine, vol. 17, pp. 7–20, 1996.


[3] A. K. Jain, A. Ross, and S. Prabhakar, An introduction to biometric recognition, Circuits and Systems for Video Technology, IEEE Transactions on, vol. 14, pp. 4–20, 2004.

[4] Y. Ming-Hsuan, J. K. David, and A. Narendra, Detecting Faces in Images: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 34–58, 2002.

[5] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, The Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.

[6] K. Kira and L. A. Rendell, A practical approach to feature selection, The proceedings of the ninth international workshop on Machine learning, Aberdeen, Scotland, United Kingdom, 1992.

[7] M. Robnik-Šikonja and I. Kononenko, Theoretical and Empirical Analysis of ReliefF and RReliefF, Machine Learning, vol. 53, pp. 23–69, 2003.

[8] X. He, D. Cai, and P. Niyogi, Laplacian score for feature selection, Advances in Neural Information Processing Systems, vol. 17, 2005.

[9] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: John Wiley & Sons, 2001.

[10] Z. Zhao and H. Liu, Spectral feature selection for supervised and unsupervised learning, The proceedings of the 24th international conference on Machine learning, Corvallis, Oregon, 2007.

[11] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt, Feature selection via dependence maximization, J. Mach. Learn. Res., vol. 13, pp. 1393–1434, 2012.

[12] F. Nie, S. Xiang, Y. Jia, C. Zhang, and S. Yan, Trace ratio criterion for feature selection, presented at the Proceedings of the 23rd national conference on Artificial intelligence - Volume 2, Chicago, Illinois, 2008.

[13] D. Zhang, S. Chen, and Z.-H. Zhou, Constraint Score: A new filter method for feature selection with pairwise constraints, Pattern Recognition, vol. 41, pp. 1440–1451, 2008.

[14] T. Jabid, M. H. Kabir, and O. Chae, Local Directional Pattern (LDP) for face recognition, in Consumer Electronics (ICCE), 2010 Digest of Technical Papers International Conference on, 2010, pp. 329–330.

[15] V. Perlibakas, Distance measures for PCA-based face recognition, Pattern Recognition Letters, vol. 25, pp. 711–724, 2004.

[16] T. Ahonen, A. Hadid, and M. Pietikäinen, Face Recognition with Local Binary Patterns, in Computer Vision - ECCV 2004, vol. 3021, T. Pajdla and J. Matas, Eds. Springer Berlin Heidelberg, 2004, pp. 469–481.

[17] The ORL Database of Faces, AT&T Laboratories Cambridge. Available: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

[18] T. Ojala, M. Pietikäinen, and D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognition, vol. 29, pp. 51–59, 1996.

[19] Z. Wenchao, S. Shiguang, G. Wen, C. Xilin, and Z. Hongming, Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition, in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 1, pp. 786–791, 2005.

[20] V. Ojansivu and J. Heikkilä, Blur Insensitive Texture Classification Using Local Phase Quantization, in Image and Signal Processing, A. Elmoataz, O. Lezoray, F. Nouboud, and D. Mammass, Eds. Springer Berlin Heidelberg, vol. 5099, pp. 236–243, 2008.

[21] B. Fazl e, M. Y. Javed, and U. Qayyum, Face Recognition using Processed Histogram and Phase-Only Correlation (POC), in Emerging Technologies, 2007. ICET 2007. International Conference on, 2007, pp. 238–242.

[22] H. Gulati, D. Aggarwal, A. Verma, and D. P. S. Sandhu, Face Recognition using Hybrid Histogram & Eigen value Approach, International Journal of Research in Engineering and Technology (IJRET), vol. 1, pp. 64–68, 2012.

[23] S. Singh, M. Sharma, and N. S. Rao, Robust & Accurate Face Recognition using Histograms, International Journal of Computer Science and Information Security (IJCSIS), vol. 10, pp. 113–122, 2012.

[24] H. Demirel and G. Anbarjafari, A new face recognition system based on color histogram matching, in Signal Processing, Communication and Applications Conference, 2008. SIU 2008. IEEE 16th, 2008, pp. 1–4.

[25] H. Demirel and G. Anbarjafari, High Performance Pose Invariant Face Recognition, INSTICC - Institute for Systems and Technologies of Information, Control and Communication, pp. 282–285, 2008.