National Taiwan University
Graduate Institute of Computer Science and Information Engineering

Master's Thesis

Basis-emphasized Non-negative Matrix Factorization with Wavelet Transform and Its Application to Face Recognition

Face Recognition Using

Basis-emphasized Non-negative Matrix Factorization with Wavelet Transform

Author: 歐珮珮
Advisor: Dr. 貝蘇章

June 2006

Acknowledgments

I thank Professor 貝蘇章 for taking me into the cheerful 530 Image Processing Laboratory and for guiding me, from nothing, toward these small research results. I was fortunate, during two years of graduate life mixed with laughter and tears, to have a group of lovely second-year classmates for company. Together we went up the mountains (Maokong) and out to sea (Penghu), encouraged one another, and burned the midnight oil in the lab; the scenes of preparing for exams and racing to finish our theses remain unforgettable. I thank the senior students of the lab, who warmly helped me with many questions about coursework and research, and the thoughtful junior students, who made the 530 lab even livelier. Finally, I thank my family and close friends for their support; they often had to endure my rambling and gave me much sincere advice. Thank you all.

Abstract (in Chinese)

The principal problem of the Non-negative Matrix Factorization algorithm is that it cannot guarantee basis components with enhanced localized features, which are important for face recognition. Our goals are to strengthen the localized features of the basis and to impose the orthogonal characteristic of Principal Component Analysis on NMF. To reduce noise disturbances in the original images, such as facial expressions, illumination variations and partial occlusions, the Wavelet Transform is applied before the basis-emphasized NMF. In this thesis we propose a new subspace projection technique, called Basis-emphasized Non-negative Matrix Factorization with Wavelet Transform, to represent face images in the low-frequency sub-band and to yield better face recognition accuracy. Finally, these results are compared with those of the PCA and NMF algorithms.

Abstract

A fundamental problem of Non-negative Matrix Factorization (NMF) is that it does not always extract basis components manifesting the localized features that are essential in face recognition. The aim of our work is to strengthen localized features in the basis images and to impose the orthonormal characteristic of Principal Component Analysis (PCA) on NMF. This improved technique is called Basis-emphasized Non-negative Matrix Factorization (BNMF). In order to reduce noise disturbance in the original image, such as facial expression, illumination variation and partial occlusion, the Wavelet Transform (WT) is applied before the BNMF decomposition. In this thesis, a novel subspace projection technique, called Basis-emphasized Non-negative Matrix Factorization with Wavelet Transform (wBNMF), is proposed to represent human facial images in the low-frequency sub-band; it yields better recognition accuracy. These results are compared with those produced by PCA and NMF.

Table of Contents

1 Introduction
   1.1 Related Work
   1.2 Face Database
       1.2.1 CBCL face database
       1.2.2 ORL face database
       1.2.3 Normalized ORL face database
       1.2.4 AR face database
   1.3 Principal Component Analysis
   1.4 Non-negative Matrix Factorization

2 Basis-emphasized Non-negative Matrix Factorization
   2.1 Drawbacks of NMF
   2.2 Extensions of NMF
   2.3 Algorithm of BNMF
   2.4 Comparison of NMF-related Algorithms
   2.5 Image Training and Testing
   2.6 Metric Determination

3 Wavelet Transform
   3.1 Introduction
   3.2 Two-Dimensional Discrete Wavelet Transform
   3.3 Multi-Resolution Analysis using Filter Banks
   3.4 Wavelet Filter
   3.5 Wavelet Sub-Bands
   3.6 Basis-emphasized Non-negative Matrix Factorization with Wavelet Transform
   3.7 Biometric Recognition System
   3.8 Wavelet Determination of wBNMF

4 Face Recognition
   4.1 Introduction
   4.2 Facial Expression
   4.3 Illumination Variation
   4.4 Occlusion Disturbance
   4.5 Cross Validation

5 Conclusion

List of Figures

1.1 Face examples from the CBCL database.
1.2 Face examples from the ORL database.
1.3 Face examples from the normalized ORL database.
1.4 Face examples from the cropped AR database: neutral (N), smile (F1), anger (F2), scream (F3), left light (L1), right light (L2), both lights (L3), sunglasses (G1), sunglasses & left light (G2), sunglasses & right light (G3), scarf (S1), scarf & left light (S2), scarf & right light (S3).
2.1 Basis images learned from the CBCL (a) and ORL (b) databases using NMF.
2.2 Basis images learned from the normalized ORL database using LNMF (a) and NMFSC (b) respectively.
2.3 Basis images learned from the normalized ORL database using NMF (a) and BNMF (b).
2.4 The objective function history of BNMF in Fig. 2.3(b); it is non-increasing under the corresponding update rules. The converged value in this case is 5.1657e+003.
2.5 Basis images learned by NMF (left) and BNMF (right) from the normalized ORL database. From top to bottom, the dimension of each row is 25, 49 and 81 respectively. Every basis image is of size 48 × 48 and the displayed images are re-sized to fit the page format.
2.6 Recognition rates versus the six different metrics with rank = 25.
2.7 Recognition rates versus the six different metrics with rank = 75.
2.8 Recognition rates versus the six different metrics with rank = 125.
2.9 Recognition rates versus the six different metrics with rank = 175.
2.10 Recognition rates using the Riemannian distance metric with rank = 25, 75, 125 and 175 respectively.
3.1 Signal processing application using the Wavelet Transform.
3.2 The decomposition and reconstruction of the two-dimensional DWT.
3.3 Two-level wavelet decomposition tree.
3.4 Two-level wavelet reconstruction tree.
3.5 Orthogonal wavelets. (a) Haar wavelet, (b) Discrete Meyer wavelet.
3.6 Orthogonal wavelets. (a) Daubechies wavelet of order 5, (b) Daubechies wavelet of order 10.
3.7 Orthogonal wavelets. (a) Symlets wavelet of order 5, (b) Symlets wavelet of order 10.
3.8 Orthogonal wavelets. (a) Coiflets wavelet of order 1, (b) Coiflets wavelet of order 5.
3.9 Two-dimensional discrete wavelet transform on images. (a) original image, (b) the decomposition results of a two-step two-dimensional DWT.
3.10 Face image in wavelet sub-bands. (a) original image, (b) 1-level wavelet decomposition, (c) 2-level wavelet decomposition, (d) 3-level wavelet decomposition.
3.11 The reconstructed images using the two-dimensional IDWT. (a) reconstructed image without discarding any frequency band, (b) reconstructed image discarding the first (lowest) frequency band, (c) reconstructed image discarding the second frequency band, (d) reconstructed image discarding the third (highest) frequency band.
3.12 The difference between the original image and the corresponding reconstructed image. (a) error image between the original image and Fig. 3.11(a), (b) error image between the original image and Fig. 3.11(b), (c) error image between the original image and Fig. 3.11(c), (d) error image between the original image and Fig. 3.11(d).
3.13 Flow chart of generating the wBNMF features.
3.14 Receiver Operating Curve.
3.15 Comparison of BNMF using various basis numbers for the AR database. Since the upper-left point in the original receiver operating curve is not distinct, the region between 70 and 90 of (1-FRR), in percentage, is magnified in the lower figure.
3.16 Comparison of wBNMF using different wavelet filters at various thresholds. Note that the most upper-left point in Fig. 3.16(b) is produced by wBNMF using the Symlets wavelet filter of order 5.
3.17 Comparison of the PCA, NMF and wBNMF approaches at various thresholds. Note that the most upper-left point, the lowest EER, is produced by wBNMF.
4.1 A set of 25 basis images for (a) PCA and (b) NMF using the AR database.
4.2 Basis images for BNMF using the AR database.
4.3 Total success rate of the smile facial images versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.4 Total success rate of the anger facial images versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.5 Total success rate of the scream facial images versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.6 Total success rate versus various facial expressions using PCA, NMF and wBNMF. The number of basis components is 25.
4.7 Total success rate of the left-lighting neutral facial images versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.8 Total success rate of the right-lighting neutral facial images versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.9 Total success rate of the both-lighting neutral facial images versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.10 Total success rate versus various lighting conditions using PCA, NMF and wBNMF. The number of basis components is 175.
4.11 Total success rate of the facial images with sunglasses occlusion versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.12 Total success rate of the facial images with scarf occlusion versus different numbers of basis components (25, 75, 125 and 175) using PCA, NMF and wBNMF.
4.13 Total success rate versus facial images with sunglasses or scarf occlusion using PCA, NMF and wBNMF. The number of basis components is 175.
4.14 Total Success Rate (TSR) of each testing set under four kinds of training set.

List of Tables

2.1 Distance between the original and the reconstructed images under various extensions of NMF and NMF itself. The dimension is set to 25 and the number of iterations to 1000.
2.2 Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 25.
2.3 Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 75.
2.4 Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 125.
2.5 Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 175.
3.1 Error measures of BNMF using different basis numbers for the AR database. The lowest EER indicates the best recognition accuracy.
3.2 Error measures of wBNMF using different wavelet filters at the most appropriate threshold. A number appended to a wavelet filter name is its order.
3.3 Error measures of PCA, NMF and wBNMF using the Symlets wavelet filter of order 5.
4.1 The total success rates of the smile expression produced by PCA, NMF and wBNMF under different feature dimensional spaces, where rank (r) is the number of basis components.
4.2 The total success rates of the anger expression produced by PCA, NMF and wBNMF under different feature dimensional spaces.
4.3 The total success rates of the scream expression produced by PCA, NMF and wBNMF under different feature dimensional spaces.
4.4 The total success rates of neutral facial images with the left light on produced by PCA, NMF and wBNMF under different feature dimensional spaces, where rank (r) is the number of basis components.
4.5 The total success rates of neutral facial images with the right light on produced by PCA, NMF and wBNMF under different feature dimensional spaces.
4.6 The total success rates of neutral facial images with both lights on produced by PCA, NMF and wBNMF under different feature dimensional spaces.
4.7 The total success rates of facial images with sunglasses occlusion produced by PCA, NMF and wBNMF under different feature dimensional spaces, where rank (r) is the number of basis components.
4.8 The total success rates of facial images with scarf occlusion produced by PCA, NMF and wBNMF under different feature dimensional spaces.
4.9 Cross validation result. There are four kinds of training set: smile & anger facial expressions (F1, F2), left & right lights (L1, L2), sunglasses occlusion with left & right light (G2, G3), and scarf occlusion with left & right light (S2, S3). On the other hand, six kinds of testing set are used in the experiment: neutral facial expression (N), scream facial expression (F3), both lights (L3), sunglasses occlusion (G1) and scarf occlusion (S1).

Chapter 1

Introduction

1.1 Related Work

A specific pattern of interest often resides in a low-dimensional sub-manifold of the original input data space, which has an unnecessarily high dimensionality. Subspace analysis is therefore used to reveal low-dimensional structures observed in a high-dimensional space, as in pattern recognition. In fact, the essence of feature extraction in pattern recognition can be considered as discovering and computing a low-dimensional intrinsic pattern from observations. For these reasons, subspace analysis has been a major research issue in appearance-based imaging and vision. For a specific pattern such as the human face, the recognizable facial features occupy only a fraction of the whole image. Subspace analysis has demonstrated its success in numerous visual recognition tasks such as face recognition, face detection and tracking.

Face recognition is one of the most challenging problems in the computer vision community due to the wide variety of illumination conditions, facial expressions and occlusions. It has several potential applications in areas such as Human Computer Interaction (HCI), biometrics and security. Moreover, it is a prototypical pattern recognition problem whose solution is helpful in many other classification problems. Several sophisticated approaches have been developed to obtain better recognition results on particular face databases, but there is no uniform way to establish the best approach, because nearly all of them have been designed to work with faces under specific conditions. In order to obtain comparable results, the three databases used here are the MIT CBCL face database [1], the Cambridge ORL face database [2] and the Aleix-Robert (AR) face database [3].

One effective approach to face recognition is Principal Component Analysis (PCA) [4], which can simplify a dataset by transforming the data into a new coordinate system that captures the greatest variance. PCA learns basis components for subspace representation and achieves dimension reduction by discarding the least significant components. The eigenimage method [5] [6] applies PCA to a set of training images to decorrelate the second-order moments corresponding to the low-frequency property. Each input image can then be represented as a linear combination of these eigenimages. Due to the holistic nature of this method, the resulting components are global interpretations, and thus PCA is unable to extract basis components manifesting localized features, which offer advantages including stability to local deformation, lighting variation, and partial occlusion. Therefore, several methods have been proposed for localized, part-based feature extraction.

Recently, a subspace method called Non-negative Matrix Factorization (NMF) was proposed by Lee and Seung [7] [8] as a way to find a set of basis functions for representing non-negative data; it has been used for image representation, document analysis [7] and clustering [9] [27] [28] because of its parts-based representation property. NMF is akin to previously proposed matrix decompositions such as the Positive Matrix Factorization (PMF) of Juvela, Lehtinen and Paatero [10] [11]. The non-negativity constraints make the representation purely additive, in contrast to many other linear representations such as PCA and Independent Component Analysis (ICA) [12] [13]. However, the additive parts learned by NMF are not necessarily localized for some databases, such as the ORL face database. Experiments also show that directly using the feature vectors learned via NMF under the Euclidean distance does not give a better face recognition rate than traditional PCA. In order to improve the recognition accuracy, Local Non-negative Matrix Factorization (LNMF) [14] was proposed to achieve a more localized NMF algorithm, computing spatially localized bases from a face database by adding three constraints that modify the objective function of the original NMF algorithm.

However, this method learns the bases slowly. Guillamet and Vitrià then adopted a more relevant metric, the Earth Mover's Distance (EMD) [15], for the parts-based representation of NMF, but the computation of the EMD is too time-demanding. Recently, Hoyer incorporated the notion of sparseness to improve the found decompositions, proposing a method called Non-negative Matrix Factorization with Sparseness Constraints (NMFSC) [18] [19]; its recognition accuracy, however, is not better than that of PCA.

In this thesis, a novel subspace method called Basis-emphasized Non-negative Matrix Factorization with Wavelet Transform (wBNMF) is proposed for learning an intuitive parts-based representation of visual patterns with noise reduction [29] [30]. Inspired by previous work, our aim is to impose the orthonormal characteristic on the basis components and to make the representation more suitable for tasks where feature localization is important. This thesis also investigates how to improve face recognition accuracy based on wBNMF [20] [21] [22] [23] [24] [25] [26]. For better performance, we adopt the Riemannian distance metric for the learned feature vectors instead of the Euclidean distance [16] [17]. Experiments on the widely used AR face database demonstrate that the proposed method can improve recognition accuracy and even outperform PCA.

1.2 Face Database

1.2.1 CBCL face database

There are 2429 faces and 4548 non-faces in the training set. The testing set consists of 472 faces and 23,573 non-faces. Each image is a 19 × 19 grayscale image in PGM format. Figure 1.1 shows some sample images from the database.

1.2.2 ORL face database

There are 400 face images of 40 persons, 10 images per person. The images are taken at different times, under slightly varying lighting, with varying facial expressions (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). All the images are taken against a dark homogeneous background.

Figure 1.1: Face examples from the CBCL database.

Figure 1.2: Face examples from the ORL database.

The faces are upright and in frontal view, with slight left-right out-of-plane rotation. Each image is linearly stretched to the full pixel-value range [0, 255].

1.2.3 Normalized ORL face database

The ORL faces are re-aligned to the center of the original images. In addition, the redundant non-facial region of each image is eliminated to avoid undesired noise in face image analysis. The dimension of each normalized ORL face image is 48 × 48.


Figure 1.3: Face examples from the normalized ORL database.

1.2.4 AR face database

The AR color face database contains images of 126 individuals (70 males and 56 females). The original images are 768 × 576 pixels in size with 24-bit color resolution. A total of 13 photos are taken of each individual, one per condition: neutral, smile, anger, scream, left light on, right light on, both lights on, wearing sunglasses, wearing sunglasses with left light on, wearing sunglasses with right light on, wearing a scarf, wearing a scarf with left light on, and wearing a scarf with right light on. The same shots are taken again after a two-week interval in a second session.

For our experiments, only 200 face images (50 males and 50 females) of the 13 shots in both sessions of the original AR database were randomly extracted. Each original facial image has been aligned with respect to its upper left corner; we aligned these images because this preprocessing step is critical to achieving good classifier performance. In order to avoid the external influence of the background, the realigned images are cropped and down-sampled so that the final image size is 120 × 120 pixels. Moreover, we reduced them to dimensions of 60 × 60 and 30 × 30, where our representation of the main facial features becomes more manageable and conspicuous. To evaluate the ability of NMF to extract and analyze low-level image information, the experiments were carried out with non-normalized images.

1.3 Principal Component Analysis

Principal Component Analysis (PCA) is a linear dimensionality reduction technique that is optimal with respect to the Mean Square Error (MSE) of the reconstruction. For a set of N training vectors X = {x_1, ..., x_N}, the mean µ = (1/N) Σ_{i=1}^{N} x_i and the covariance matrix P = (1/N) Σ_{i=1}^{N} (x_i − µ)(x_i − µ)^T can be calculated. Defining a projection matrix E composed of the K eigenvectors of P with the highest eigenvalues, the K-dimensional representation of an original n-dimensional vector x is given by the projection y = E^T(x − µ). Therefore, in the matrix factorization V ≈ WH of PCA, each column of the matrix W represents an eigenvector and the matrix factor H represents the eigenprojection.

In PCA, any image can be represented as a linear combination of a set of orthogonal bases which form an optimal transform in the sense of reconstruction error. However, due to the global characteristic of the bases, PCA cannot extract basis components manifesting localized features, and its two extensions, Independent Component Analysis (ICA) and Kernel Principal Component Analysis (KPCA), have the same problem. The difference between PCA and NMF arises from the constraints imposed on the matrix factors W and H: PCA constrains the columns of W to be orthonormal and the rows of H to be orthogonal to each other, while NMF only requires that each element of V, W and H is non-negative.
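The PCA projection described above can be sketched in a few lines of NumPy. This is a minimal illustration of the eigendecomposition route (mean, covariance, top-K eigenvectors, projection y = E^T(x − µ)); the function and variable names are our own, not from the thesis:

```python
import numpy as np

def pca_basis(X, K):
    """Learn a K-dimensional PCA basis.

    X: (n, N) array whose columns are the N training vectors x_i.
    Returns (mu, E): the mean vector and the n x K matrix of the
    K leading eigenvectors of the covariance matrix P.
    """
    mu = X.mean(axis=1, keepdims=True)       # sample mean
    Xc = X - mu                              # centered data
    P = (Xc @ Xc.T) / X.shape[1]             # covariance matrix (1/N) sum (x-mu)(x-mu)^T
    vals, vecs = np.linalg.eigh(P)           # eigenvalues in ascending order
    E = vecs[:, np.argsort(vals)[::-1][:K]]  # keep the top-K eigenvectors
    return mu, E

def pca_project(x, mu, E):
    """y = E^T (x - mu): the K-dimensional representation of x."""
    return E.T @ (x - mu)
```

Because the columns of E are orthonormal, reconstruction is simply x ≈ E y + µ, which is the eigenimage representation used in the text.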


Figure 1.4: Face examples from the cropped AR database: neutral (N), smile (F1), anger (F2), scream (F3), left light (L1), right light (L2), both lights (L3), sunglasses (G1), sunglasses & left light (G2), sunglasses & right light (G3), scarf (S1), scarf & left light (S2), scarf & right light (S3).


1.4 Non-negative Matrix Factorization

According to psychological and physiological evidence for parts-based representations in the brain, perception of the whole is based on perception of its parts. For that reason, a new subspace method called Non-negative Matrix Factorization (NMF) was recently proposed to learn the parts of images. NMF imposes non-negativity constraints on its bases and coefficients, allowing only additive combinations and no subtractive cancellations. The non-negativity constraint is consistent with the neuro-physiological fact that neural firing rates are non-negative, and it leads to parts-based learning of information; these representations have the intuitive meaning of adding parts to form a whole. Due to this parts-based representation property, NMF and its extensions have been used for various applications: image classification, face detection, and face and object recognition.

There exist two efficient algorithms with multiplicative update rules that implement such a non-negative matrix factorization; see Lee and Seung (2001) for more detailed information. Here we adopt the update rules based on the Euclidean distance.

The matrix factorization of NMF is

    V_ij = Σ_{k=1}^{r} W_ik H_kj    (1.1)

We implement NMF with the following update rules for W and H:

    W_ik ← W_ik Σ_{j=1}^{m} [V_ij / (WH)_ij] H_kj    (1.2)

    W_ik ← W_ik / Σ_{p=1}^{n} W_pk    (1.3)

    H_kj ← H_kj Σ_{i=1}^{n} W_ik [V_ij / (WH)_ij]    (1.4)

Iterating these update rules converges to a local minimum of the objective function

    F = Σ_{i=1}^{n} Σ_{j=1}^{m} (V_ij − (WH)_ij)^2    (1.5)

For example, we use NMF to represent a set of facial images. The image dataset is regarded as an n × m matrix V, each column of which represents one of the m facial images and contains n non-negative pixel values. We then obtain a matrix factorization of the form V ≈ WH, where the matrix factors W and H have dimensions n × r and r × m respectively. The r columns of W are basis images; the NMF basis images are usually localized features that correspond to the intuitive notion of a human face. Each column of H is an encoding of the corresponding facial image; an encoding consists of the coefficients by which a face is represented as a linear combination of the basis images.

Denoting the n-dimensional measurement vectors by v_s (s = 1, ..., m), a linear approximation of the data is given by v_s ≈ Σ_{k=1}^{r} w_k h_ks. Note that each measurement vector is written in terms of the same basis vectors. Usually the rank r is chosen to be smaller than n or m, so that W and H are smaller than the original matrix V. This results in a compressed version of the original data matrix. Since relatively few basis vectors are used to represent many data vectors, NMF can discover latent structure in the original data while achieving a good approximation.
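As a concrete illustration, the multiplicative updates (1.2)-(1.4) can be sketched in NumPy; the names V, W, H and the rank r follow the notation above, while the function itself and the small epsilon guard against division by zero are our own additions, not the exact code used in the thesis:

```python
import numpy as np

def nmf(V, r, n_iter=1000, eps=1e-9):
    """Multiplicative NMF updates following Eqs. (1.2)-(1.4).

    V: (n, m) non-negative data matrix, one image per column.
    r: number of basis components (the rank).
    Returns W (n, r) basis images and H (r, m) encodings.
    """
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        W *= (V / (W @ H + eps)) @ H.T   # Eq. (1.2): W_ik *= sum_j [V/(WH)]_ij H_kj
        W /= W.sum(axis=0)               # Eq. (1.3): normalize each column of W
        H *= W.T @ (V / (W @ H + eps))   # Eq. (1.4): H_kj *= sum_i W_ik [V/(WH)]_ij
    return W, H
```

After convergence, each column of V is approximated by W times the corresponding column of H, as in Eq. (1.1).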


Chapter 2

Basis-emphasized Non-negative Matrix Factorization

2.1 Drawbacks of NMF

One noticeable property of NMF is that it usually produces a naturally sparse representation of the input data. Such a representation encodes much of the input data using relatively few active bases, which makes the encoding easy to interpret. Lee and Seung (1999) originally showed that NMF learned a parts-based representation when trained on the CBCL database. Despite this success, when applied to the ORL database, in which the images are not as well aligned, a global decomposition emerges. The difference in results has therefore been attributed mainly to how well the images were hand-aligned (Li et al., 2001). In Fig. 2.1, NMF is applied to two face databases, the CBCL and ORL databases.

Figure 2.1: Basis images learned from the CBCL (a) and ORL (b) databases using NMF.

The representation of basis images learned from the CBCL database is apparently composed of intuitive facial features, but the representation of basis images learned from the ORL database is global rather than local.

2.2 Extensions of NMF

Sparseness in both the bases and the encodings is crucial for a parts-based representation. For this reason, many studies incorporating the notion of sparseness have been developed to improve the found decompositions. One useful approach is Non-negative Matrix Factorization with Sparseness Constraints (NMFSC), the aim of which is to constrain NMF to find decompositions with a desired degree of sparseness.

Figure 2.2: Basis images learned from the normalized ORL database using LNMF (a) and NMFSC (b) respectively.

From another point of view, the manifestation of localized features is significant, and an improved method, Local Non-negative Matrix Factorization (LNMF), has been proposed; LNMF defines a novel objective function with an additional localization constraint. However, many of the basis images learned from the normalized ORL database using LNMF and NMFSC lack intuitive facial features, as shown in Fig. 2.2; most of them are merely non-meaningful fragments. In addition, the sparseness of the basis images in NMFSC is fixed at 0.75, and higher pixel values are shown in darker color. In the following cases, the normalized ORL face database is chosen to avoid undesired noise.


2.3 Algorithm of BNMF

The basis images we desire are non-global and contain several versions of mouths, noses and other intuitive facial features in different locations and forms. To match the human intuitive notion of an individual face, some symmetrical facial features, such as eyes and eyebrows, should preferably come out in pairs. These considerations lead us to design an improved NMF which can learn basis images with a weakened holistic contour and emphasized local features. The accomplished result is shown in Fig. 2.3, and it is noticeable that our method learns the desired facial features, in contrast to the original NMF.

Since the original NMF does not impose any constraints on spatial localization, minimizing its objective function hardly yields a factorization which reveals local features in the basis images. Therefore, BNMF is introduced to impose more spatial constraints on the cost function. A new objective function is defined to learn intuitive parts-based components.

Letting B = [b_ij] = W^T W, BNMF can learn local features by imposing the following constraints on the bases and encodings.

(1) Given the existing constraint Σ_i w_ij = 1 for each j, we wish each basis to be as orthogonal as possible, so as to minimize the redundancy between different bases. This can be imposed on the objective function by

    Σ_ij b_ij = min    (2.1)

(2) Taking advantage of an essential characteristic of PCA, we make each basis image have constant energy. This can be achieved by

    W_ik ← W_ik / sqrt(Σ_{p=1}^{n} W_pk^2)    (2.2)

(3) To avoid the degradation of energy in the coefficients, we renormalize each row of H to maintain constant energy. This can be done by

    H_kj ← H_kj / sqrt(Σ_{p=1}^{m} H_kp^2)    (2.3)

The incorporation of the above constraints leads to the following objective function for BNMF:

    F = Σ_{i=1}^{n} Σ_{j=1}^{m} (V_ij − (WH)_ij)^2 + Σ_ij b_ij    (2.4)

Subject to the non-negativity of all matrix factors, a local solution to the minimization of this constrained objective function can be found under the update rules:

    W_ik ← W_ik Σ_{j=1}^{m} [V_ij / (WH)_ij] H_kj    (2.5)

    W_ik ← W_ik / Σ_{p=1}^{n} W_pk    (2.6)

    W_ik ← W_ik / sqrt(Σ_{p=1}^{n} W_pk^2)    (2.7)

    H_kj ← H_kj Σ_{i=1}^{n} W_ik [V_ij / (WH)_ij]    (2.8)

    H_kj ← H_kj / sqrt(Σ_{p=1}^{m} H_kp^2)    (2.9)


Figure 2.3: Basis images learned from the normalized ORL database using NMF (a) and BNMF (b).

[Plot: objective function value (0 to 2 × 10^5) versus iteration number (0 to 100).]

Figure 2.4: The objective function history of BNMF in Fig. 2.3(b); it is non-increasing under the corresponding update rules. The converged value in this case is 5.1657e+003.


Algorithm   Euclidean distance   Divergence distance
NMF         5.2398e+003          5.5763e+003
LNMF        2.9566e+005          6.6422e+005
NMFSC       1.9498e+004          1.9731e+004
BNMF        5.1654e+003          5.4661e+003

Table 2.1: Distance between the original and the reconstructed images under various extensions of NMF and NMF itself. The dimension is set to 25 and the number of iterations to 1000.

2.4 Comparison of NMF-related Algorithms

BNMF adds the non-negativity constraint and the orthonormal characteristic of PCA to obtain intuitive parts-based features; the bases should therefore be as orthogonal as possible so as to minimize the redundancy between them. To show the advantage of BNMF, we compare it with the other extensions of NMF. The Euclidean distance and the divergence distance between the original and the reconstructed images are computed to evaluate the efficiency of each algorithm: the smaller the distance, the better the matrix factorization. Furthermore, the influence of various dimensions (numbers of basis components) on the resulting basis images is surveyed.

As Fig. 2.3 shows, NMF and BNMF learn basis images which contain dark intuitive part-based facial features and light global facial contour. Higher contrast between the holistic contour and the local feature emerges in BNMF.

Nevertheless, some of the basis images learned from LNMF and NMFSC in Fig. 2.2 are no more than non-meaningful fragments. In Table 2.1, we intentionally adopt two different kinds of distance metrics, the Euclidean distance and the divergence distance. So we can see more deeply that the


smallest distance value occurs in BNMF.

The BNMF procedure learns basis components through additive-only computation and the manifestation of local features. As the dimension increases, the features learned by BNMF become more local and apparent.

In Fig. 2.5, the BNMF representation is both parts-based and local, whereas the NMF representation is parts-based but holistic.

2.5 Image Training and Testing

The data matrix F = [f_1 f_2 ... f_m] is constructed such that the training face images occupy the columns of F. The remaining samples are used for testing. Learning is then done using a matrix factorization algorithm such as NMF, BNMF or PCA to decompose F into two factor matrices, W and H. Let the basis images be W = [w_1 w_2 ... w_r] and the encodings be H = [h_1 h_2 ... h_m]. Each face f_i in F can be approximately reconstructed by a linear combination of the basis images W with the corresponding encoding coefficients h_i = (h_{1i} h_{2i} ... h_{ri})^T. Hence, a face can be modeled as a linear superposition of basis functions together with encodings as follows:

fi = W hi (2.10)

In order to calculate the corresponding encodings of facial images, each face f in the training set and testing set is multiplied by the inverse or pseudo-inverse (if the inverse does not exist) of the basis component matrix W. The basis components in such a factor matrix W are generated from certain matrix factorization algorithm using the set of training faces. As a


Figure 2.5: Basis images learned by NMF (left) and BNMF (right) from the normalized ORL database. From top to bottom, the dimension of each row is 25, 49 and 81 respectively. Every basis image is of size 48 × 48 and the displayed images are re-sized to fit the paper format.


result, the corresponding encoding of each face is given by

\[ h_i = W^{+} f_i \tag{2.11} \]

where W^{+} is the computed pseudo-inverse of the basis matrix W. Once trained, the facial image set is represented by a set of encodings {h_1, h_2, ..., h_m} with a reduced dimension of rank r.
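The projection step can be sketched as follows; the dimensions (48 × 48 images, rank 25) and the random full-column-rank W are illustrative assumptions, not the learned BNMF basis:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((2304, 25))        # basis matrix: 48x48 images, rank 25 (toy)
h_true = rng.random(25)
f = W @ h_true                    # a face modeled as f = W h       (Eq. 2.10)

W_pinv = np.linalg.pinv(W)        # pseudo-inverse, since W is not square
h = W_pinv @ f                    # recovered encoding              (Eq. 2.11)
print(np.allclose(h, h_true))     # True: f lies exactly in the span of W here
```

Real faces are only approximately in the span of W, so the recovered h is the least-squares encoding rather than an exact inverse.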

2.6 Metric Determination

Since the positive space learned by NMF and its extensions lacks a suitable metric, it is not directly adequate for further analysis such as object recognition using the nearest neighbor classifier. For this reason, a distance metric must be determined to work with the positive projected vectors in an optimal manner. In order to improve the face recognition accuracy, we evaluate various distance metrics for the learned feature vectors, trying to determine the best one for this specific problem.

We assume that x and y are two d-dimensional vectors. The aim is to calculate different distance metrics between a test feature vector and a prototype one, and to measure how close these vectors are to each other. For this, six commonly used distance measures are tested in the current work:

(1) The L1, Manhattan or Cityblock metric is defined as:

\[ L_1(x, y) = \sum_{i=1}^{d} |x_i - y_i| \tag{2.12} \]

(2) The Euclidean or L2 metric is defined as:

\[ \mathrm{Euc}(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2} \tag{2.13} \]


(3) The Correlation metric is defined as:

\[ \mathrm{Cor}(x, y) = \frac{\sum_{i=1}^{d} (x_i - x_{mean})(y_i - y_{mean})}{\sqrt{\sum_{i=1}^{d} (x_i - x_{mean})^2 \sum_{i=1}^{d} (y_i - y_{mean})^2}} \tag{2.14} \]

(4) The Angular metric is defined as:

\[ \mathrm{Ang}(x, y) = \frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2 \sum_{i=1}^{d} y_i^2}} \tag{2.15} \]

which is the cosine of the angle between the two observation vectors measured from zero and takes values from -1 to 1.

(5) The Mahalanobis metric is defined as:

\[ \mathrm{Mah}(x, y) = \sqrt{(x - y)^T \, \mathrm{Cov}(D)^{-1} \, (x - y)} \tag{2.16} \]

where Cov(D) is the covariance matrix of the data.

(6) The Riemannian metric is defined as:

\[ \mathrm{Rie}(x, y) = (x - y)^T G (x - y) \tag{2.17} \]

where G is a similarity matrix defined as G = (G_{ij}) = (\langle B_i, B_j \rangle) = B^T B, B_i is the i-th learned basis, and \langle x, y \rangle denotes the inner product of x and y.
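A minimal pure-Python sketch of several of these metrics is given below; the Mahalanobis metric is omitted since it additionally requires the data covariance Cov(D), and the similarity matrix G is passed in explicitly (here the identity, purely for illustration):

```python
import math

def l1(x, y):
    """Cityblock distance, Eq. (2.12)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euc(x, y):
    """Euclidean distance, Eq. (2.13)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def ang(x, y):
    """Angular (cosine) similarity, Eq. (2.15)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den

def cor(x, y):
    """Correlation, Eq. (2.14): cosine of the mean-centered vectors."""
    xm = sum(x) / len(x); ym = sum(y) / len(y)
    return ang([a - xm for a in x], [b - ym for b in y])

def rie(x, y, G):
    """Riemannian form (x-y)^T G (x-y), Eq. (2.17), G given explicitly."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * G[i][j] * d[j] for i in range(n) for j in range(n))

x, y = [1.0, 2.0, 3.0], [2.0, 2.0, 1.0]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(l1(x, y))        # 3.0
print(rie(x, y, I))    # 5.0 -- squared Euclidean when G is the identity
```

With G = B^T B built from a learned basis B, `rie` becomes the metric used in the experiments below.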

We have selected the AR color face database because it is a well-known database with a large number of color facial images under various conditions.

Six metrics have been tested with this database, and most of the experiments are based on preprocessing the input images in order to reduce some distortion effects. In this case, we use the 120 × 120 grayscale images of the


method-rank   L1      Euc     Cor     Ang     Mah     Rie
NMF-25        0.500   0.540   0.540   0.545   0.430   0.685
LNMF-25       0.670   0.620   0.480   0.515   0.265   0.560
NMFSC-25      0.520   0.580   0.485   0.555   0.205   0.525
BNMF-25       0.505   0.490   0.485   0.510   0.340   0.690

Table 2.2: Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 25.

method-rank   L1      Euc     Cor     Ang     Mah     Rie
NMF-75        0.620   0.575   0.645   0.655   0.620   0.730
LNMF-75       0.605   0.525   0.440   0.495   0.340   0.640
NMFSC-75      0.700   0.645   0.565   0.640   0.465   0.690
BNMF-75       0.630   0.590   0.635   0.665   0.615   0.745

Table 2.3: Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 75.

cropped AR database without any other modification. Training images consist of two neutral poses of each individual in both sessions. Images labeled as F1 are used as a testing set because they contain a smile expression under normal conditions. The following part presents experimental evaluations of several traditional distance measures in the context of face recognition when using NMF and extensions of NMF.

Our current work compares the performance obtained by NMF and its extensions. Furthermore, we have also analyzed how different distance metrics affect the classification result in the projected space. Faces are projected


method-rank   L1      Euc     Cor     Ang     Mah     Rie
NMF-125       0.630   0.570   0.685   0.685   0.680   0.735
LNMF-125      0.635   0.490   0.455   0.445   0.365   0.690
NMFSC-125     0.710   0.630   0.660   0.680   0.615   0.735
BNMF-125      0.590   0.575   0.680   0.680   0.665   0.750

Table 2.4: Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 125.

method-rank   L1      Euc     Cor     Ang     Mah     Rie
NMF-175       0.655   0.630   0.705   0.730   0.720   0.735
LNMF-175      0.590   0.455   0.450   0.425   0.350   0.735
NMFSC-175     0.715   0.640   0.720   0.695   0.715   0.745
BNMF-175      0.645   0.585   0.755   0.760   0.745   0.770

Table 2.5: Recognition rates with the NMF, LNMF, NMFSC and BNMF techniques using six different distance metrics. The number of basis components, also called rank, is 175.

under the modification of feature space dimensionality. Each specific situation is analyzed when all NMF-related algorithms are used for classification. Thus, we can learn when the performance of the BNMF technique is better and understand when it can be used for a further classification task.

By selecting a suitable metric for BNMF, we can improve on the NMF recognition accuracy with the same training set. Because BNMF adds more constraints on the bases to minimize its objective function, the learned bases can present non-negligible correlations and other higher-order effects. Therefore, we prefer a distance metric that gives emphasized consideration to the learned bases, and our experimental result agrees with



Figure 2.6: Recognition rates versus the six different metrics with rank = 25.

the above idea. In Table 2.2, Table 2.3, Table 2.4 and Table 2.5, we notice that among the six different metrics, the Riemannian metric produces the best recognition rate for BNMF. Moreover, under each analyzed rank, the recognition rate produced by BNMF using the Riemannian metric generally outperforms that produced by the other NMF-related methods using the same metric. In particular, under the highest rank of 175, BNMF achieves the best recognition rate of 0.77. This is due to the fact that BNMF learns more essential information about the faces, which helps BNMF generate a good estimation when trying to recover the original image.

The first impression is that the Euclidean distance is not the most suitable metric when working with BNMF, but the Riemannian distance metric



Figure 2.7: Recognition rates versus the six different metrics with rank = 75.


Figure 2.8: Recognition rates versus the six different metrics with rank = 125.



Figure 2.9: Recognition rates versus the six different metrics with rank = 175.


Figure 2.10: Recognition rates using the Riemannian distance metric with rank = 25, 75, 125 and 175 respectively.


techniques. Now we can show that adopting this Riemannian distance metric is more suitable than any other distance metric for face classification when using the nearest neighbor classifier. Let f_1, f_2 denote two facial vectors in the original n-dimensional space, and let h_1, h_2 be the corresponding learned coefficients in the lower r-dimensional space. To some extent, we can say f_1 = W h_1 and f_2 = W h_2, where W is the learned basis matrix. Then we get

\[ \mathrm{Rie}(f_1, f_2) = (f_1 - f_2)^T (f_1 - f_2) = (h_1 - h_2)^T W^T W (h_1 - h_2) = (h_1 - h_2)^T G (h_1 - h_2) \neq (h_1 - h_2)^T (h_1 - h_2) \tag{2.18} \]

This indicates that the Riemannian metric can preserve the neighborhood of the original samples for classification. The recognition accuracy for the AR database using the Riemannian metric is represented in Fig. 2.10 as a function of the number of basis components. All these techniques are able to improve recognition accuracy when a higher rank is used. Note that the highest recognition rate is achieved by BNMF with 175 basis components.
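The identity in Eq. (2.18) can be checked numerically; the random basis and encodings below are illustrative assumptions standing in for a learned W:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.random((100, 5))             # stand-in for a learned basis matrix
h1, h2 = rng.random(5), rng.random(5)
f1, f2 = W @ h1, W @ h2              # faces reconstructed from encodings

G = W.T @ W                          # similarity matrix of the bases
lhs = (f1 - f2) @ (f1 - f2)          # squared distance in the image space
rhs = (h1 - h2) @ G @ (h1 - h2)      # Riemannian distance in encoding space
print(np.isclose(lhs, rhs))          # True: the two sides of Eq. (2.18) agree
```

The agreement holds for any W, which is why nearest-neighbor search with the Riemannian metric on encodings mimics nearest-neighbor search on the reconstructed faces.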

Finally, we can claim that the Riemannian metric is the most suitable one for the problem of metric determination. BNMF clearly outperforms NMF and its other extensions using the Riemannian metric for each number of basis components.


Chapter 3

Wavelet Transform

3.1 Introduction

The Wavelet Transform (WT) has been evolving for some time; mathematicians theorized its use in the early 1900s. While the Fourier transform deals with transforming time-domain components to the frequency domain for frequency analysis, the wavelet transform deals with scale analysis, creating mathematical structures that provide varying time/frequency/amplitude slices for analysis. A wavelet is a portion of a complete waveform, hence the term. The wavelet transform has the ability to identify frequency components together with their location in time. Additionally, its computations are directly proportional to the length of the input signal.

In wavelet analysis, the scale that one uses in looking at data plays a


Figure 3.1: Signal processing application using Wavelet Transform.

special role. Wavelet algorithms process data at different scales or resolutions. If we look at a signal with a large "window", we notice gross features; if we look at it with a small "window", we notice small discontinuities. As a matter of fact, the low-frequency components contribute to the global description, while the high-frequency components contribute to the finer details required in the identification task.

There is a wide range of applications for Wavelet Transforms, especially in signal processing. Fig. 3.1 shows the general steps followed in a signal processing application. Processing may involve de-noising, edge detection, feature extraction and others. And the processed signal is either stored or transmitted. The popularity of Wavelet Transform is growing because of its ability to reduce distortion in the reconstructed signal while retaining all the significant features present in the signal.

3.2 Two-Dimensional Discrete Wavelet Transform

Since the information the continuous wavelet transform (CWT) provides is highly redundant as far as the reconstruction of the signal is concerned, the discrete wavelet transform (DWT) is needed to provide sufficient information for both analysis and synthesis of the original signal, with a significant reduction in computation time. The DWT employs two sets of functions, called scaling functions and wavelet functions, obtained by successive high-pass and low-pass filtering of the time-domain signal. The DWT is given by:

\[ (W_\psi x)(m, n) = \frac{1}{\sqrt{a_0^{m}}} \sum_{k} x[k] \, \psi[a_0^{-m} n - k] \tag{3.1} \]

where m is a scaling integer variable, n is a shifting integer variable, x[k] is a digital signal with sample index k, and ψ is the mother wavelet.

For images, a two-dimensional DWT is needed to decompose the approximation coefficients at level j − 1 into four components at level j: the approximation (LL_j) and the details in three orientations, horizontal (LH_j), vertical (HL_j) and diagonal (HH_j). Conversely, the two-dimensional inverse discrete wavelet transform (IDWT) is used to reconstruct the original image.

Fig. 3.2 describes the basic decomposition and reconstruction steps.
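One level of this 2-D decomposition can be sketched in pure Python with the Haar wavelet; note it uses the averaging convention (division by 2) rather than the orthonormal 1/√2 normalization, a simplifying assumption:

```python
def haar2d(img):
    """One level of the 2-D Haar DWT into LL, LH, HL, HH sub-bands."""
    def step(rows):
        # One 1-D Haar pass over each row: averages (low) then differences (high).
        out = []
        for r in rows:
            lo = [(r[2*i] + r[2*i+1]) / 2 for i in range(len(r) // 2)]
            hi = [(r[2*i] - r[2*i+1]) / 2 for i in range(len(r) // 2)]
            out.append(lo + hi)
        return out

    t = step(img)                                                   # filter rows
    t = [list(c) for c in zip(*step([list(c) for c in zip(*t)]))]   # filter cols
    n, m = len(img) // 2, len(img[0]) // 2
    LL = [row[:m] for row in t[:n]]; LH = [row[m:] for row in t[:n]]
    HL = [row[:m] for row in t[n:]]; HH = [row[m:] for row in t[n:]]
    return LL, LH, HL, HH

img = [[8.0] * 4 for _ in range(4)]        # a flat 4x4 "image"
LL, LH, HL, HH = haar2d(img)
print(LL)   # [[8.0, 8.0], [8.0, 8.0]] -- the approximation keeps the flat image
print(HH)   # [[0.0, 0.0], [0.0, 0.0]] -- no diagonal detail in a flat image
```

A flat image has no detail, so all three detail bands come out zero; real face images would concentrate edges in LH, HL and HH.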

3.3 Multi-Resolution Analysis using Filter Banks

Wavelets can be realized by iteration of filters with rescaling. The resolution of the signal, which is a measure of the amount of detail information in the signal, is determined by the filtering operations, and the scale is determined by up-sampling and down-sampling operations.

Figure 3.2: The decomposition and reconstruction of two-dimensional DWT.

Figure 3.3: Two-level wavelet decomposition tree.

The DWT is computed by successive low-pass and high-pass filtering of the discrete time-domain signal, as shown in Fig. 3.3. This is called the Mallat algorithm or Mallat-tree decomposition; its significance is that it connects the continuous-time multi-resolution analysis to discrete-time filters. At each decomposition level, the half-band filters produce signals spanning only half of the frequency band. This doubles the frequency resolution, as the uncertainty in frequency is reduced by half. The filtering and decimation process is continued until the desired level is reached. The maximum number of levels depends on the length of the signal.

The DWT of the original signal is obtained by concatenating all the coefficients starting from the last level of decomposition. Basically, the reconstruction is the reverse process of decomposition. The approximation and detail coefficients at every level are up-sampled by two, passed through the low-pass and high-pass synthesis filters, and then added. In Fig. 3.4, this process is continued through the same number of levels as in the decomposition process to obtain the original signal.
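The two-level analysis/synthesis cascade can be sketched with the orthonormal Haar filter pair; this is a hand-rolled illustration, not a production filter bank:

```python
def analyze(x):
    """One analysis stage: half-band filtering + downsampling (orthonormal Haar)."""
    s = 2 ** 0.5
    approx = [(x[2*i] + x[2*i+1]) / s for i in range(len(x) // 2)]
    detail = [(x[2*i] - x[2*i+1]) / s for i in range(len(x) // 2)]
    return approx, detail

def synthesize(approx, detail):
    """One synthesis stage: upsample by two, filter, and add."""
    s = 2 ** 0.5
    x = []
    for a, d in zip(approx, detail):
        x += [(a + d) / s, (a - d) / s]
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a1, d1 = analyze(x)          # level 1
a2, d2 = analyze(a1)         # level 2: decompose the approximation again
dwt = a2 + d2 + d1           # concatenated DWT coefficients, coarsest first
x_rec = synthesize(synthesize(a2, d2), d1)
print(x_rec)                 # reconstructs x up to floating-point rounding
```

Reversing the two stages recovers the signal exactly (up to rounding), illustrating the perfect-reconstruction property of the filter bank.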


Figure 3.4: Two-level wavelet reconstruction tree.

3.4 Wavelet Filter

Wavelet functions can be classified into two classes: orthogonal and non-orthogonal wavelets. For this application, only orthogonal wavelet functions are used. The coefficients of orthogonal filters are real numbers; the filters are of the same length and are not symmetric. The low-pass filter Lo and the high-pass filter Hi are related to each other by

\[ Hi(z) = z^{-N} Lo(-z^{-1}) \tag{3.2} \]

The two filters are alternating flips of each other. The alternating flip automatically gives double-shift orthogonality between the low-pass and high-pass filters. Filters that satisfy the above equation are known as Conjugate Mirror Filters (CMF). Perfect reconstruction is possible with the alternating flip. For perfect reconstruction, the synthesis filters are identical to the analysis filters


moments. This property is useful in many signal and image processing applications. They have a regular structure, which leads to easy implementation and a scalable architecture.
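In the time domain, Eq. (3.2) corresponds (up to a sign convention) to the alternating flip hi[k] = (−1)^k · lo[N − k]; a sketch using the normalized Haar low-pass filter as the example:

```python
def alternating_flip(lo):
    """High-pass filter from a low-pass one: hi[k] = (-1)**k * lo[N - k].

    This is the time-domain counterpart of Hi(z) = z^{-N} Lo(-z^{-1}),
    up to an overall sign that depends on the convention used.
    """
    N = len(lo) - 1
    return [((-1) ** k) * lo[N - k] for k in range(len(lo))]

lo = [0.7071, 0.7071]                 # (approximately) normalized Haar low-pass
hi = alternating_flip(lo)
print(hi)                             # [0.7071, -0.7071]

# The flip gives orthogonality between the two filters:
print(sum(l * h for l, h in zip(lo, hi)))   # 0.0
```

Applying the same flip to a longer Daubechies low-pass filter yields its high-pass partner in the same way.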

In Fig. 3.5, Fig. 3.6, Fig. 3.7 and Fig. 3.8, we illustrate some of the commonly used orthogonal wavelet functions. The Haar wavelet is one of the oldest and simplest wavelets; therefore, any discussion of wavelets starts with it. The Daubechies wavelets are the most popular wavelets.

They represent the foundations of wavelet signal processing and are used in numerous applications. The Haar, Daubechies, Symlets and Coiflets are compactly supported orthogonal wavelets. These wavelets along with the Discrete Meyer wavelets are capable of perfect reconstruction. The wavelets are chosen based on their ability to analyze the signal in a particular appli- cation.

3.5 Wavelet Sub-Bands

The Wavelet Transform can decompose a signal into components of different frequencies. Given an image f_1^0(x, y), the result of decomposing it through a two-step two-dimensional DWT is shown in Fig. 3.9(b), in which f_1^2(x, y) represents the lowest-frequency components and f_4^1(x, y) represents the highest-frequency components. Besides, if the Wavelet Transform is executed p times, f_2^p(x, y), f_3^p(x, y) and f_4^p(x, y) can be regarded as the influence of the wavelet function on the horizontal, vertical and diagonal orientations respectively.

Two-dimensional discrete wavelet transform can be used to decompose the facial images into a multi-resolution representation in order to keep the


Figure 3.5: Orthogonal wavelets. (a) Haar wavelet, (b) Discrete Meyer wavelet.


Figure 3.6: Orthogonal wavelets. (a) Daubechies wavelet of order 5, (b) Daubechies wavelet of order 10.


Figure 3.7: Orthogonal wavelets. (a) Symlets wavelet of order 5, (b) Symlets wavelet of order 10.


Figure 3.8: Orthogonal wavelets. (a) Coiflets wavelet of order 1, (b) Coiflets wavelet of order 5.


Figure 3.9: Two-dimensional discrete wavelet transform on images. (a) orig- inal image, (b) the decomposing results through two-step two-dimensional DWT.


least coefficients possible without losing useful image information. Fig. 3.10 depicts the decomposition process obtained by applying the two-dimensional Haar wavelet transform to a face image, showing successive levels of wavelet decomposition obtained by applying the Haar wavelet transform to the low-frequency band sequentially. Note that the highest wavelet sub-band contains mostly noise, and the contour of the decomposed facial image becomes clearer toward the top-left direction.

In this paper, the Wavelet Transform is used to reconstruct a better representation in the low spatial frequency bands by discarding the highest frequency spectrum of each level, as shown in Fig. 3.11(d). Hence, it makes the facial images insensitive to facial expression, illumination variation and occlusion.

During reconstruction, these discarded coefficients are replaced with zeros.

We also compute the difference between the reconstructed and the original images and the error result is shown in Fig. 3.12, in which the larger the error, the lighter the color for each element.
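The effect of replacing discarded coefficients with zeros can be illustrated on a one-level Haar decomposition of a toy 1-D signal (averaging convention, an illustrative simplification of the image case):

```python
# One Haar level (averaging convention); the detail band is discarded,
# i.e. replaced with zeros, before reconstruction.
x = [10.0, 10.0, 10.0, 30.0, 10.0, 10.0, 10.0, 10.0]   # a noisy spike at index 3
approx = [(x[2*i] + x[2*i+1]) / 2 for i in range(len(x) // 2)]
detail = [0.0] * (len(x) // 2)       # discarded highest-frequency coefficients
x_rec = []
for a, d in zip(approx, detail):
    x_rec += [a + d, a - d]
print(x_rec)   # [10.0, 10.0, 20.0, 20.0, 10.0, 10.0, 10.0, 10.0]
```

The spike is smoothed and the reconstruction error concentrates exactly where the high-frequency content was, mirroring the error images of Fig. 3.12.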

3.6 Basis-emphasized Non-negative Matrix Factorization with Wavelet Transform

This section introduces a novel subspace projection technique via BNMF to represent human facial image in low frequency sub-band, which is able to realize through the Wavelet Transform. After wavelet decomposition and reconstruction, BNMF is performed to produce part-based representations of the images. The simulation results on the AR database show that the hybrid of BNMF and the best wavelet filter will yield better recognition rate and shorter training time. In order to achieve an excellent verification rate


Figure 3.10: Face image in wavelet sub-bands. (a) original image, (b) 1-level wavelet decomposition, (c) 2-level wavelet decomposition, (d) 3-level wavelet decomposition.


Figure 3.11: The reconstructed images using two-dimensional IDWT. (a) re- constructed image without discarding any frequency band, (b) reconstructed image discarding the first (lowest) frequency band, (c) reconstructed image discarding the second frequency band, (d) reconstructed image discarding the third (highest) frequency band.


Figure 3.12: The difference between the original image and the correspond- ing reconstructed image. (a) error image between the original image and Fig. 3.11(a), (b) error image between the original image and Fig. 3.11(b), (c) error image between the original image and Fig. 3.11(c), (d) error image between the original image and Fig. 3.11(d).


Figure 3.13: Flow chart of generating the wBNMF features.

when identifying the faces, we investigate the performance obtained by the integration of WT and BNMF to take advantage of both methods.

These results are compared with those learned by PCA and the original NMF techniques later.

In face recognition, dimensionality reduction is very important to project the facial images from a high-dimensional space onto a lower-dimensional space. The wavelet transform reduces the resolution of the images and decreases the computational load of feature generation. With the adoption of the wavelet transform, the training time can also be reduced significantly. In this paper, three levels of wavelet decomposition are performed on the face images. The reconstructed face image, with the highest-frequency sub-band discarded, is then subjected to BNMF. The integrated framework of Wavelet Transform and Basis-emphasized Non-negative Matrix Factorization is abbreviated as wBNMF. The flow chart of wBNMF is illustrated in Fig. 3.13.
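A highly simplified, assumption-laden sketch of this pipeline follows: the wavelet stage keeps one Haar LL band (averaging convention), and the factorization stage uses plain Lee-Seung multiplicative updates rather than the thesis's BNMF update rules; the toy random "faces" are stand-ins for the AR images:

```python
import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    """Plain NMF via Lee-Seung multiplicative updates (not the BNMF variant)."""
    rng = np.random.default_rng(3)
    n, m = V.shape
    W, H = rng.random((n, r)), rng.random((r, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update encodings
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis images
    return W, H

def haar_lowpass(img):
    """Keep only the LL band of one Haar level (averaging convention)."""
    rows = (img[:, 0::2] + img[:, 1::2]) / 2
    return (rows[0::2, :] + rows[1::2, :]) / 2

rng = np.random.default_rng(4)
faces = rng.random((10, 16, 16))                    # toy stand-in "face" images
low = np.stack([haar_lowpass(f) for f in faces])    # wavelet stage
V = low.reshape(10, -1).T                           # columns are images
W, H = nmf(V, r=4)                                  # factorization stage
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err < 0.5)                                    # coarse approximation holds
```

In the real pipeline the wavelet stage would use three decomposition levels with a chosen filter, and the factorization stage the BNMF updates; this sketch only shows how the two stages compose.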


3.7 Biometric Recognition System

In order to evaluate the performance of a biometric system, we use two types of error measures: the False Acceptance Rate (FAR) and the False Rejection Rate (FRR), as defined in Eqn. (3.3) and Eqn. (3.4). FAR, also called type I error, is the rate at which impostors are verified by the system; it is considered the most serious biometric security error because it gives unauthorized users access to systems. FRR, also referred to as type II error, is the rate at which legitimate users are wrongly identified as impostors by the system. Note that a false rejection does not necessarily indicate a flaw in the biometric system; for example, in a fingerprint-based system, dirt on the scanner can cause a misreading of the fingerprint, resulting in a false rejection of an authorized user.

\[ \mathrm{FAR} = \frac{\text{number of accepted impostors}}{\text{total number of impostor accesses}} \times 100\% \tag{3.3} \]

\[ \mathrm{FRR} = \frac{\text{number of rejected genuine users}}{\text{total number of genuine accesses}} \times 100\% \tag{3.4} \]

Another useful measure can be obtained by combining the above two error rates into the Total Success Rate (TSR) as defined in Eqn. (3.5). A biometric security system predetermines the threshold values for its FAR and FRR, and when the rates are equal, the common value is referred to as the Equal Error Rate (EER) as defined in Eqn. (3.6). EER, also referred to as Crossover Error Rate (CER), is the point at which the proportion of false acceptances is equal to the proportion of false rejections. The lower the equal


error rate value, the higher the accuracy of the biometric system.

\[ \mathrm{TSR} = \left(1 - \frac{\text{number of accepted impostors} + \text{number of rejected genuine users}}{\text{total number of accesses}}\right) \times 100\% \tag{3.5} \]

\[ \mathrm{EER} = \frac{\mathrm{FAR} + \mathrm{FRR}}{2} \tag{3.6} \]
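These error rates can be computed from genuine and impostor score lists by sweeping a decision threshold; the distance scores below are made-up illustrations:

```python
def rates(genuine, impostor, threshold):
    """FAR/FRR in % for a distance-based verifier: accept when score <= threshold."""
    far = 100.0 * sum(s <= threshold for s in impostor) / len(impostor)
    frr = 100.0 * sum(s > threshold for s in genuine) / len(genuine)
    return far, frr

genuine  = [0.10, 0.22, 0.31, 0.40, 0.55]   # distances for true matches
impostor = [0.45, 0.60, 0.70, 0.85, 0.90]   # distances for impostor attempts

# Sweep thresholds and keep the one where FAR and FRR are closest (EER point).
best = min((abs(rates(genuine, impostor, t)[0] - rates(genuine, impostor, t)[1]), t)
           for t in genuine + impostor)
far, frr = rates(genuine, impostor, best[1])
print(far, frr, (far + frr) / 2)   # FAR, FRR and the EER estimate of Eq. (3.6)
```

On this toy data the sweep lands at a threshold where FAR = FRR = 20%, so the EER estimate is 20%; on the real verification scores the same sweep produces the ROC points discussed below.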

In signal detection theory, a Receiver Operating Curve (ROC) is a graphical plot of sensitivity versus (1 − specificity) for a binary classifier system as its discrimination threshold is varied. It can equivalently be represented by plotting (1 − FRR) versus FAR. The best possible prediction method would yield a point in the upper-left corner of the ROC space. The smaller the tradeoff between sensitivity and specificity, the better the performance; equivalently, the closer the area under the ROC is to one, the better the performance. Fig. 3.14 shows two typical ROC curves: curve B has an area under the curve closer to one than curve A, so curve B represents the better performance.

3.8 Wavelet Determination of wBNMF

In the field of face recognition using the AR database, we try to determine the best wavelet filter for wBNMF. Given a digitized image containing the neutral facial expression of a person, the facial features in the image can be extracted and then used to identify the facial image of the same person from a testing set of smiling facial expression images. The main steps involve projecting the facial images into a low-dimensional feature space through the basis matrix W; the Riemannian metric is used to define the similarity measure between two faces.


Figure 3.14: Receiver Operating Curve.

First, an experiment is carried out to determine the most appropriate basis number for BNMF. For computational efficiency, we take the well-known AR database with a reduced dimension of 60 × 60, and 1000 iterations are used to update W and H. In addition, the distances between the same person's facial images in the training and testing sets are measured to learn the possible thresholds. Different pairs of False Acceptance Rate (FAR) and False Rejection Rate (FRR) at various ascending thresholds are created to draw a Receiver Operating Curve (ROC), and the best threshold, with the lowest Equal Error Rate (EER), can be detected. The resulting receiver operating curves with different basis numbers, r = 25, 75, 125, 175, are drawn in Fig. 3.15. Note that the closer the receiver operating curve


basis number (r)   threshold      FAR (%)    FRR (%)    EER (%)
25                 2.3242e+006    12.9697    15.2500    14.1098
75                 2.9913e+006     9.6843    20.5000    15.0922
125                3.5046e+006    12.8131    18.0000    15.4066
175                3.6686e+006    12.8131    18.2500    15.5316

Table 3.1: Error measures of BNMF using different basis numbers for the AR database. The lowest EER means the best recognition accuracy.

is to the upper-left corner, the better the performance. For each receiver operating curve, the most upper-left point, containing the best threshold, produces the best face recognition rate.

According to Table 3.1, the number of basis components is chosen to verify the best performance with the corresponding threshold. The optimum verification rate for the AR database using BNMF is EER = 14.1098 with FAR = 12.9697 and FRR = 15.2500, when r = 25 at threshold = 2.3242e+006.

It can be observed that the optimum r is not the biggest one. This result indicates that a moderate r is sufficient for BNMF to discriminate different faces. To sum up, the performance of a facial recognition system is measured by the previously mentioned error rates. Good performance balances the tradeoff between the verification rate and the false acceptance rate depending on application needs. An ideal system would have a verification rate of 100% and a false acceptance rate of 0%, but such systems do not exist.

Another experiment is accomplished by using a similar setting of r = 25 to determine the optimum verification rate when wBNMF is integrated with multiple wavelet filters using decomposition level 3. In addition, we adopt five types of orthogonal wavelet filters: Haar, Discrete Meyer, Daubechies, Symlets and Coiflets. The last three have two different orders respectively.


Figure 3.15: Comparison of BNMF using various basis numbers for the AR database. Since the upper left point in the original receiver operating curve is not distinct, we magnify the region between 70 and 90 of (1-FRR) in the form of percentage as shown in the lower figure.
