Face Liveness Detection Based on Skin Blood Flow Analysis


Article


Shun-Yi Wang 1, Shih-Hung Yang 2,*, Yon-Ping Chen 1 and Jyun-We Huang 2

1 Department of Electrical Engineering, National Chiao Tung University, Hsinchu 30010, Taiwan; [email protected] (S.-Y.W.); [email protected] (Y.-P.C.)

2 Department of Mechanical and Computer-Aided Engineering, Feng Chia University, Taichung 40724, Taiwan; [email protected]

* Correspondence: [email protected]; Tel.: +886-4-2451-7250 (ext. 3527)

Received: 6 November 2017; Accepted: 1 December 2017; Published: 7 December 2017

Abstract: Face recognition systems have been widely adopted for user authentication in security systems due to their simplicity and effectiveness. However, spoofing attacks, including printed photos, displayed photos, and replayed video attacks, are critical challenges to authentication, and these spoofing attacks allow malicious invaders to gain access to the system. This paper proposes two novel features for face liveness detection systems to protect against printed photo attacks and replayed attacks for biometric authentication systems. The first feature obtains the texture difference between red and green channels of face images inspired by the observation that skin blood flow in the face has properties that enable distinction between live and spoofing face images. The second feature estimates the color distribution in the local regions of face images, instead of whole images, because image quality might be more discriminative in small areas of face images. These two features are concatenated together, along with a multi-scale local binary pattern feature, and a support vector machine classifier is trained to discriminate between live and spoofing face images. The experimental results show that the performance of the proposed method for face spoof detection is promising when compared with that of previously published methods. Furthermore, the proposed system can be implemented in real time, which is valuable for mobile applications.

Keywords: spoof detection; skin blood flow; block-based color moment; public domain database

1. Introduction

To protect personal privacy, biometric authentication systems, such as face and fingerprint recognition systems, have gained considerable attention for their ability to confirm user identity. Thus, face and fingerprint recognition systems [1,2] have been extensively researched and implemented in various security systems. In recent decades, human face recognition systems have been widely studied due to their simplicity and effectiveness for performing user authentication in security systems. One of the most popular mobile operating systems, Android, even allows users to unlock their smartphones through face recognition. As the need for face-recognition-based unlocking techniques increases, determining how to deal with spoofing attacks becomes a critical authentication challenge [3]. Spoofing attacks launched against an authentication system may allow malicious invaders to gain access to the system and can therefore lead to the leakage of private data [4]. A face recognition system mainly conducts face representation and face matching when a face is detected by a face detection algorithm. For face representation, most methods extract facial landmarks by geometrical descriptors for both 2D and 3D faces, and are robust in dealing with expression and occlusion [5–7]. For face matching, multi-class classifiers are usually adopted, such as support vector machine (SVM) and Bayesian classifiers. These face recognition systems have achieved satisfactory performance in security and forensic applications. However, a recent study showed that state-of-the-art face recognition systems that use commercial software are vulnerable to spoofing attacks using face images [8]. The reason for this is that live and spoofing face images of the same user may be similar in the feature space when a high-resolution spoofing face image is provided. Even the human eye cannot distinguish a live face image from a spoofing face image at first glance [9]. Such attacks on a secure system are a substantial problem because acquiring face images or video from a camera or social media is easier than acquiring other biometric traits, such as fingerprints. Therefore, detection of face liveness is a difficult problem for the face recognition system. It is important to design face liveness detection algorithms to discriminate between live and spoofing face images.

Symmetry 2017, 9, 305; doi:10.3390/sym9120305 www.mdpi.com/journal/symmetry

With the rapid development of multimedia technology, malicious invaders can easily collect photographs or video of a targeted person from the Internet. Figure 1 shows samples of a live face image, a printed face image, and a displayed face image. Because printed photos and video replay attacks are more easily launched than 3D mask attacks, this study focused its examination on printed photos, displayed photos, and replayed video attacks. The purpose of this paper is to develop a face liveness detection algorithm which protects the biometric system from printed photos, displayed photos, and replayed video attacks.


Figure 1. Examples of (a) a live face image, (b) a printed photo, and (c) a photo displayed on a mobile phone.

2. Related Work

Various studies have proposed several “face liveness” detection methods to protect against printed photo attacks and replayed attacks. These methods are based on motion, image quality, texture, and depth, and are as follows:

1. Motion-Based Methods: Motion-based methods aim to detect the natural responses of live faces, such as eye blinking [10,11], head rotation [12], and mouth movements [13]. Although these methods can successfully detect printed photo attacks, they are ineffective at identifying replayed video attacks, which present natural responses. Furthermore, they require multiple frames (usually >3 s) to estimate facial motions restricted by the human physiological rhythm [14].

2. Image Quality Analysis-Based Methods: Image quality analysis-based methods [15,16] capture the image quality differences between live and spoofing face images. Image quality degradations, which are caused by spoofing mediums (e.g., paper and screen), usually appear in spoofing face images, and printed photos and replayed videos displayed on a monitor can be detected using color space analysis [17]. Thus, these methods extract chromatic moment features to distinguish a live face image from a spoofing face image. These methods usually assess image quality by using whole images and are highly generalizable. However, image quality might be more discriminative in small and local areas of face images.

3. Texture-Based Methods: Texture-based methods [9,18] assume that the use of various spoofing mediums would result in distinct surface reflection and shape deformation, which lead to texture differences between live and spoofing face images. These methods are used to perform face spoof detection by extracting texture features from a single face image and can thus provide a quick response. However, the texture features may lack good generalizability to various facial expressions, poses, and spoofing schemes when the training data are collected from few subjects and under limited conditions. Therefore, combining texture features and image quality features may improve the performance of face spoof detection.


4. Depth-Based Methods: Depth-based methods [12,19] estimate the depth information of a face to discriminate a live 3D face from a spoofing face presented on 2D planar media. The defocusing technique [20], near-infrared sensors [21], and light field cameras [22] are representative examples of these methods. Depth features can be used to effectively detect printed photos and video replay attacks. On the other hand, few studies have developed 3D depth analysis methods to estimate the 3D depth information of a face. An optical flow field-based approach was proposed to analyze the difference in the optical flow field between a planar object and a 3D face [12]. Another study exploited geometric invariants according to a set of facial landmarks for detecting replay attacks [19]. However, to estimate the depth information, these methods generally require multiple frames or a depth-measuring device, which might increase the cost of the systems.

To address the problems identified in the aforementioned methods, this study proposes a new framework, including two new features inspired by the texture-based method [18] and image distortion analysis [16], for face spoof detection. The experimental results showed that the proposed framework is competitive with state-of-the-art approaches, and the key contributions of this framework can be summarized as follows:

• The first feature highlights the distinct properties in red and green channels between live and spoofing face images. This feature can reveal skin blood flow differences between live and spoofing face images. This skin-related texture feature is extracted by the local binary pattern (LBP) operator in red and green channels and can detect both shape and color distortion. In other words, it combines the advantages of texture- and image quality analysis-based methods.

• The second feature is a block-based color moment that estimates the color distribution in the local regions of face images. This feature can preserve the local color distribution of face images and, further, provides more spatial information than does the color moment determined from a whole image. The local information helps discriminate between live and spoofing face images.

The proposed features were concatenated, along with a multi-scale local binary pattern (MLBP) feature, to construct a feature vector from a single image for providing a quick response. The feature vector was fed into an SVM to discriminate between live and spoofing face images. Four public domain databases, namely NUAA Photograph Imposter Database [23], CASIA Face Anti-Spoofing Database [24], Idiap Replay-Attack [9], and MSU Mobile Face Spoofing Database [16], were used to evaluate the performance of the proposed method. The experimental results demonstrated that the performance of the proposed method for face spoof detection is promising when compared with that of previously published methods. Furthermore, the proposed system requires less computational time (54.6 ms) and can thus be performed in real-time.

The remainder of this paper is organized as follows: Section 3 describes the proposed face spoof detection method in detail, Section 4 outlines the experimental results based on the public domain databases, and the final section presents a conclusion.

3. Face Liveness Detection

This section describes the individual steps of the proposed face liveness detection system, which are outlined in Figure 2. Faces were detected using the Viola–Jones face detection algorithm [25] when the coordinates were not available in the databases; thereafter, they were normalized into a 64 × 64 pixel image. Distortions in the specular reflection components and color distribution usually appear in spoofing face images due to the spoofing mediums. In this study, three features were set to extract discriminative information for the live and spoofing face images according to skin texture and color distortion analysis. Subsequently, the features were concatenated to create a feature vector, which was fed into an SVM for classification. In the subsequent subsections, these features are explained in greater detail.


Figure 2. Proposed face liveness detection system.


3.1. Multi-Scale Local Binary Pattern

As noted, live face images possess distinct surface reflection properties that distinguish them from 2D spoofing face images captured from printed photos and video replays. This differentiation is mainly due to specular and diffuse components. Thus, MLBP generalized from an LBP [26] was used as an image descriptor to extract the texture features related to reflection properties, which have been shown to have good discriminative ability for face spoof detection [18]. Furthermore, a uniform LBP operator [27] was employed to keep, at most, two bitwise transitions between 1 and 0 and to accumulate the other patterns in another bin. The uniform LBP operator for a pixel with value $g_c$ surrounded by P neighborhood pixels in radius R is defined as

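$$\mathrm{LBP}_{P,R}^{u2} = \begin{cases} \sum_{p=0}^{P-1} s(g_p - g_c)\,2^p, & \text{if } U(\mathrm{LBP}_{P,R}) \le 2 \\ P + 1, & \text{otherwise} \end{cases} \tag{1}$$

$$U(\mathrm{LBP}_{P,R}) = \left| s(g_{P-1} - g_c) - s(g_0 - g_c) \right| + \sum_{p=1}^{P-1} \left| s(g_p - g_c) - s(g_{p-1} - g_c) \right| \tag{2}$$

where u2 denotes a uniform pattern and $g_p$ denotes the pth neighborhood pixel value. A feature vector was then constructed by concatenating a uniform LBP histogram from the whole image.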
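Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the step function s(x) is assumed to be 1 for x ≥ 0 and 0 otherwise, which the text does not state explicitly.

```python
def lbp_bits(neighbors, center):
    """Threshold each neighbor against the center pixel: s(g_p - g_c).

    Assumption: s(x) = 1 if x >= 0 else 0.
    """
    return [1 if g >= center else 0 for g in neighbors]


def uniformity(bits):
    """U(LBP_{P,R}) from Eq. (2): 0/1 transitions around the circular pattern.

    For p = 0, bits[p - 1] wraps to bits[P - 1], which gives the first
    (wrap-around) term of Eq. (2); the remaining p give the sum.
    """
    return sum(abs(bits[p] - bits[p - 1]) for p in range(len(bits)))


def lbp_u2(neighbors, center):
    """Uniform LBP code from Eq. (1)."""
    bits = lbp_bits(neighbors, center)
    if uniformity(bits) <= 2:
        # Uniform pattern: weighted sum of the thresholded neighbors.
        return sum(b << p for p, b in enumerate(bits))
    # Every nonuniform pattern shares the single label P + 1.
    return len(bits) + 1


def count_uniform_patterns(P):
    """Count the P-bit patterns with at most two transitions."""
    total = 0
    for code in range(2 ** P):
        bits = [(code >> p) & 1 for p in range(P)]
        if uniformity(bits) <= 2:
            total += 1
    return total
```

For P = 8, enumerating all 256 patterns with `count_uniform_patterns` yields 58 uniform patterns, which together with the shared nonuniform bin gives the 59-bin histogram used later in the paper.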

It has been found that the pixel intensity of the red channel of human skin is usually higher than that of either the blue or green channel due to skin blood flow. Additionally, the reflectance characteristic of the red channel in live face images may be different from that in spoofing face images. Therefore, this study examined MLBP features in the red channel to determine facial texture according to three scales of LBP operators: $\mathrm{LBP}_{8,1}^{u2}$, $\mathrm{LBP}_{8,2}^{u2}$, and $\mathrm{LBP}_{16,2}^{u2}$. The feature vector was a concatenation of the $\mathrm{LBP}_{8,1}^{u2}$ histograms of nine overlapping image blocks and of the $\mathrm{LBP}_{8,2}^{u2}$ and $\mathrm{LBP}_{16,2}^{u2}$ histograms over the whole image. Each image block was divided from a normalized 64 × 64 face image with a 16-pixel overlap to highlight the central regions of an image, which may provide key facial details. Thus, the dimensionality of the MLBP feature vector is 531 + 243 + 59 = 833, where 531 = 9 × 59 comes from the nine block histograms, 243 from $\mathrm{LBP}_{16,2}^{u2}$, and 59 from $\mathrm{LBP}_{8,2}^{u2}$. Notably, the parameters of MLBP, such as P, R, and the number of overlapping image blocks, were designed according to the suggestion of [18] due to their satisfactory performance.
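The dimensionality quoted above can be checked with a short calculation. The sketch below assumes that the nine overlapping blocks are 32 × 32 windows placed every 16 pixels on the 64 × 64 face; the text gives only the overlap and the number of blocks, so the block size and the helper names are assumptions made for illustration.

```python
def uniform_bins(P):
    """Histogram length for a uniform LBP with P neighbors.

    There are P * (P - 1) + 2 uniform patterns, plus one shared bin
    for all nonuniform patterns.
    """
    return P * (P - 1) + 2 + 1


def block_origins(size=64, block=32, step=16):
    """Top-left corners of overlapping blocks (assumed 32 x 32, stride 16)."""
    starts = range(0, size - block + 1, step)
    return [(row, col) for row in starts for col in starts]


def mlbp_dimensionality():
    """Total MLBP length: blockwise LBP_{8,1} plus whole-image LBP_{8,2} and LBP_{16,2}."""
    blockwise = len(block_origins()) * uniform_bins(8)  # nine block histograms
    whole_8_2 = uniform_bins(8)                         # whole-image LBP_{8,2}
    whole_16_2 = uniform_bins(16)                       # whole-image LBP_{16,2}
    return blockwise + whole_8_2 + whole_16_2
```

Under these assumptions the three terms evaluate to 531, 59, and 243, matching the total of 833 stated in the text.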

3.2. Red–Green Deviated Texture

It is known that skin blood flow in the face enables a live face to reflect red light and absorb green light [28]. A live face image therefore consists of a wider variety of intensity values and more detailed texture in the red channel than in the green channel. By contrast, a spoofing face image generated by a printer or displayed on a screen usually possesses a monotonic color distribution in both the red and green channels due to the imperfect color reproduction property of printing or display devices. The difference between red and green channels may help distinguish between live and spoofing face images. Furthermore, the specular and diffuse components in red and green channels of a live face image are different from those in a spoofing face image. This study therefore proposed a new feature, called the red–green (R–G) deviated texture, which is a dual-channel extraction based on the LBP operator for identifying the texture difference between the red and green channels.

The R–G deviated texture is histogram-generated from a whole face image and is defined as

$$H_{R-G} = \left( H_{R-G}^{1}, H_{R-G}^{2}, \ldots, H_{R-G}^{59} \right) \quad \text{with} \quad H_{R-G}^{i} = H_{LBP\_R}^{i} - H_{LBP\_G}^{i}, \quad i = 1, 2, \ldots, 59, \tag{3}$$

where $H_{LBP\_R}^{i}$ and $H_{LBP\_G}^{i}$ denote the ith bin of the LBP histogram using $\mathrm{LBP}_{8,1}^{u2}$ in the red and green channels, respectively. Notably, a uniform pattern [29] was adopted to implement a simple rotation invariant descriptor, which consists of at most two 1–0 or 0–1 transitions. Therefore, the LBP histogram used in this study is a 59-dimensional feature vector including 58 separate bins for uniform patterns and a single bin for all 198 nonuniform patterns.

Because the R–G deviated texture may contain discriminative information in small and local areas of an image, the normalized 64 × 64 face images in this study were divided into 3 × 3 blocks with a 16-pixel overlap. The R–G deviated texture was formed by concatenating the block-wise histograms; it has a dimensionality of 9 × 59 = 531. This study analyzed the influence of the color channel on textures due to the skin blood flow differences between live and spoofing face images.
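The block-wise R–G deviated texture of Equation (3) might be computed as in the following sketch. Several details are assumptions that the text does not specify: the blocks are taken as 32 × 32 windows with a stride of 16 pixels, the radius-1 neighbors are read from the square 8-neighborhood without interpolation, each histogram is normalized to sum to one, and all function names are hypothetical.

```python
# Neighbor offsets (row, col) in circular order around the center pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def _is_uniform(code, P=8):
    """True if the P-bit circular pattern has at most two 0/1 transitions."""
    bits = [(code >> p) & 1 for p in range(P)]
    return sum(bits[p] != bits[p - 1] for p in range(P)) <= 2


# The 58 uniform codes get bins 0..57; every nonuniform code shares bin 58.
UNIFORM_BIN = {code: i for i, code in
               enumerate(c for c in range(256) if _is_uniform(c))}


def lbp_histogram(channel):
    """Normalized 59-bin uniform LBP_{8,1} histogram of a 2D list of intensities."""
    hist = [0.0] * 59
    count = 0
    for r in range(1, len(channel) - 1):
        for c in range(1, len(channel[0]) - 1):
            center = channel[r][c]
            code = sum((channel[r + dr][c + dc] >= center) << p
                       for p, (dr, dc) in enumerate(OFFSETS))
            hist[UNIFORM_BIN.get(code, 58)] += 1
            count += 1
    return [h / count for h in hist]


def rg_deviated_texture(red, green):
    """Eq. (3): bin-wise difference of the red and green LBP histograms."""
    return [hr - hg for hr, hg in zip(lbp_histogram(red), lbp_histogram(green))]


def crop(channel, top, left, block):
    """Extract a block x block window from a 2D list."""
    return [row[left:left + block] for row in channel[top:top + block]]


def blockwise_rg_texture(red, green, block=32, step=16):
    """Concatenate Eq. (3) over overlapping blocks (assumed 32 x 32, stride 16)."""
    size = len(red)
    feature = []
    for top in range(0, size - block + 1, step):
        for left in range(0, size - block + 1, step):
            feature.extend(rg_deviated_texture(crop(red, top, left, block),
                                               crop(green, top, left, block)))
    return feature
```

For a 64 × 64 face, `blockwise_rg_texture` returns nine 59-bin differences, that is, a 531-dimensional vector. Identical red and green channels give an all-zero vector, which is the limiting case of the "monotonic" spoof behavior described above.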

Figure 3 presents a graphical representation of the R–G deviated textures in live and spoofing face images, where the y axis denotes the percentage of deviation between the red and green channels. As revealed by the figure, the texture difference between red and green channels in the live face image is larger than that in the spoofing face image. In other words, spoofing face images present a different texture distribution compared with that in live face images, which suggests that the R–G deviated texture may be a feature with the ability to differentiate between live and spoofing face images.


Figure 3. R–G deviated texture in (a) a live face image and (b) a spoofing face image; (c,d) Graphical representation of the R–G deviated texture in the live and spoofing images.

3.3. Block-Based Color Moment

Image chromaticity and contrast distortion are the major distortions that occur in a spoofing face image captured from printed photo and replayed video [16]. These distortions lead to color distribution differences between live and spoofing face images due to the imperfect color reproduction property of spoofing media, such as printers and screens.

Figure 4 shows the color distributions in the hue, saturation, and value (HSV) space of a live face image, a printed photo, and an on-screen photo. In general, printed photos tend to have less color contrast and saturation than do live face images. By contrast, on-screen photos tend to have more contrast and brightness than do live face images.


Figure 4. Examples of a live face image (first row), spoofing face image used in a printed photo attack (second row), and spoofing face image used in a video replay attack (third row). Images were retrieved from the CASIA database. (a) Face image; (b) Histogram of the hue component; (c) Histogram of the saturation component; (d) Histogram of the value component.


In this study, the color distribution of an image was estimated to elucidate the chromatic differences between live and spoofing face images. First, the face image was converted from the RGB space to the HSV space. Subsequently, the mean, standard deviation, and skewness of the color distribution of an image were computed in the ith channel as follows:

$$E_i = \frac{1}{N} \sum_{j=1}^{N} p_{ij}, \tag{4}$$

$$\sigma_i = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left( p_{ij} - E_i \right)^2}, \tag{5}$$

$$s_i = \sqrt[3]{\frac{1}{N} \sum_{j=1}^{N} \left( p_{ij} - E_i \right)^3}, \tag{6}$$

where $p_{ij}$ denotes the jth image pixel value in the ith color channel, and N is the total number of pixels. These three statistical moments of each channel are also known as color moment features [30]. Therefore, the dimensionality of the color moment feature vector is 3 × 3 = 9.

In [30], the color moment features were extracted from a whole face image. However, the color moment features can reveal distinct properties in small and local areas of face images. The local regions of face images may show larger color distribution differences between live and spoofing face images than do entire face images. This study therefore proposed a block-based color moment, which is a concatenation of color moment features calculated from the local regions of a face image. Face images were first divided into 2×2 blocks without overlapping; subsequently, the color moment features from each of the four blocks were extracted for each color channel. By concatenating all of the color moment features from the four blocks, a block-based color moment with 36 dimensions was constructed.
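The block-based color moment can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the image is a square 2D list of 8-bit (r, g, b) tuples whose side is divisible by the grid size, the HSV conversion uses Python's standard `colorsys` module (all channels in [0, 1]), and the cube root in Equation (6) is taken with its sign preserved so that negative third moments are handled.

```python
import colorsys
import math


def color_moments(values):
    """Mean, standard deviation, and cube-root skewness (Eqs. (4)-(6))."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    std = math.sqrt(second)
    # Signed cube root: the third central moment may be negative.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, std, skew


def block_color_moments(rgb, grid=2):
    """Concatenate HSV color moments over a grid x grid set of blocks.

    rgb: square 2D list of (r, g, b) tuples in 0..255 (assumed layout).
    Returns grid * grid * 3 channels * 3 moments values (36 for grid=2).
    """
    size = len(rgb)
    step = size // grid
    hsv = [[colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            for (r, g, b) in row] for row in rgb]
    feature = []
    for block_row in range(grid):
        for block_col in range(grid):
            pixels = [hsv[y][x]
                      for y in range(block_row * step, (block_row + 1) * step)
                      for x in range(block_col * step, (block_col + 1) * step)]
            for channel in range(3):  # hue, saturation, value
                feature.extend(color_moments([px[channel] for px in pixels]))
    return feature
```

With the default 2 × 2 grid, each of the four blocks contributes 3 × 3 = 9 values, giving the 36-dimensional feature described above; setting `grid=1` reduces to the whole-image color moment of [30].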

Finally, the MLBP features, block-based color moment, and R–G deviated texture feature were concatenated to create a feature vector whose dimensionality was 833 + 531 + 36 = 1400. An SVM classifier [31] was then trained to discriminate between live and spoofing face images by using the LibSVM library [32]. The objective of the SVM is to search for an optimal hyperplane that separates the face images into live and spoofing face images with a maximum margin. A linear SVM was implemented for the NUAA Photograph Imposter Database, CASIA Face Anti-Spoofing Database, and Idiap Replay-Attack, while a nonlinear SVM with the radial basis function kernel was implemented for the MSU Mobile Face Spoofing Database in order to compare the proposed method with that developed by Wen et al. [16]. Notably, the linear SVM was trained with the default parameters due to their reliable performance. A parameter optimization was performed for the nonlinear SVM by cross-validation to ensure a fair comparison. Furthermore, the attributes were scaled to avoid numerical difficulties [33]. Because the R–G deviated texture and block-based color moment were extracted as the features, the proposed system required color images.
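The concatenation and attribute scaling steps can be illustrated with a small sketch. The target range of [-1, 1] is an assumption (the text only states that attributes were scaled), and the helper names `concatenate` and `fit_scaler` are hypothetical. The scaling bounds are learned from the training vectors only and then reused for test vectors, so that both sets are scaled consistently.

```python
def concatenate(mlbp, rg_texture, block_moments):
    """Join the three feature vectors into one list."""
    return list(mlbp) + list(rg_texture) + list(block_moments)


def fit_scaler(train_vectors, lo=-1.0, hi=1.0):
    """Learn per-attribute min/max on the training set; return a scaling function.

    The [lo, hi] range is an assumed choice, not a value given in the paper.
    """
    dims = len(train_vectors[0])
    mins = [min(v[d] for v in train_vectors) for d in range(dims)]
    maxs = [max(v[d] for v in train_vectors) for d in range(dims)]

    def scale(vector):
        out = []
        for x, mn, mx in zip(vector, mins, maxs):
            if mx == mn:
                out.append(0.0)  # constant attribute: map to a fixed value
            else:
                out.append(lo + (hi - lo) * (x - mn) / (mx - mn))
        return out

    return scale
```

A typical use would be `scale = fit_scaler(train_features)` followed by `scale(v)` for every training and test vector before passing them to the classifier.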

4. Empirical Work

The proposed face liveness detection system was evaluated using four public domain databases according to the training and testing protocols from [9,23,24]. This section first introduces the four public domain databases: NUAA Photograph Imposter Database [23], CASIA Face Anti-Spoofing Database [24], Idiap Replay-Attack [9], and MSU Mobile Face Spoofing Database [16]. Then, the empirical results including the effects of the color channel and individual features on the performance of the proposed method were demonstrated.

4.1. Face Spoofing Database

Four public domain databases containing images of various 2D face spoof attacks were used to evaluate the performance of the proposed face liveness detection system. The system detected the face using the Viola–Jones face detection algorithm [25] and normalized the face image into a 64×64 pixel image. The properties of the four public domain databases are summarized as follows.

4.1.1. NUAA Photograph Imposter Database

The NUAA Photograph Imposter Database [23] was created in 2010 and is currently one of the most widely used benchmark databases. This database contains 5105 live client and 7509 printed photo attack images from 15 Asian subjects in various environments and under different illumination conditions. The live client images were captured using a webcam (20 fps, 640×480 pixels), whereas the printed photo attack images were captured using a Canon camera (Canon, Inc., Lake Success, NY, USA) and were then printed on both A4 paper and photographic paper. Figure 5 shows a few samples of the live client and printed photo attack images available in the NUAA database.

Symmetry 2017, 9, 305 7 of 18


Figure 5. Samples of live face images (top row) and spoofing face images (bottom row) in the NUAA database.

4.1.2. CASIA Face Anti-Spoofing Database

The CASIA Face Anti-Spoofing Database [24] was launched in 2012 and contains images of three types of spoofing attacks: printed photo attack, printed photos with perforated eye regions, and video replay attacks. This database contains 150 live and 450 spoofing videos collected from 50 Asian subjects, which were captured in triplicate using a low-quality camera (640×480), a normal quality camera (480×640), and a high-quality Sony NEX-5 camera (Sony, Tokyo, Japan) (1920×1080). Figure 6 shows examples of each of the three types of spoofing attacks.


Figure 6. Samples of spoofing attack images in the CASIA database. (a) Printed photo attack; (b) Printed photo with perforated eye regions; (c) Video replay attack.

4.1.3. Idiap Replay-Attack

Idiap Replay-Attack [9] emerged in 2012 and consists of three types of spoofing attack videos: printed photos, mobile phone attacks, and tablet attacks. This database contains 200 live and 1000 spoofing attack videos collected from 50 subjects who are identified as Caucasian, Asian, or African. The live face videos were collected using a MacBook Webcam (Apple Inc., Cupertino, CA, USA) (320×240 pixels), whereas the spoof face videos were collected using a Canon PowerShot SX150 IS camera (Canon, Inc., Lake Success, NY, USA) (1280×720 pixels). Additionally, the videos are captured under two types of stationary conditions: with a fluorescent lamp against a uniform background or in daylight against a nonuniform background. Furthermore, each attack video is captured in both hand-based and fixed-support modes. Figure 7 shows the face samples of the live and spoofing face images. This study followed the protocols specified in [9] and, thus, adopted all frames in the training set to train the classifier and those in the developing set to determine the threshold value. The classifier was then tested using all of the frames in the testing set of Idiap Replay-Attack.


Figure 7. Face samples in Idiap Replay-Attack. (a) Live face image; (b) Spoofing face image used in printed photo attacks; (c) Spoofing face image used in mobile phone attacks; (d) Spoofing face image used in tablet attacks.

4.1.4. MSU Mobile Face Spoofing Database

The MSU Mobile Face Spoofing Database [16] was launched in 2015 and contains both printed photos and replayed video attacks. In total, this database contains 110 live videos and 330 spoofing attack videos collected from 35 subjects who are identified as Caucasian (70%), Asian (28%), or African (2%). Similar to Idiap Replay-Attack, live face videos were captured using a MacBook Air laptop camera (Apple Inc., Cupertino, CA, USA) (640×480 pixels) and Google Nexus 5 mobile phone camera (Google, Mountain View, CA, USA) (720×480 pixels), whereas the spoof face videos were captured using a Canon 550D SLR camera (Canon, Inc., Lake Success, NY, USA) (1920×1088 pixels) and iPhone 5s camera (Apple Inc., Cupertino, CA, USA) (1920×1080 pixels). Each video is at least 9 s long, with 30 fps. Because the face images are captured on a mobile phone, the MSU database can simulate mobile phone unlocking applications. Furthermore, the printed photos are of higher quality than those from other databases due to the use of a state-of-the-art color printer (HP Color Laserjet CP6015xh). The videos are replayed on two attack media (iPad Air and iPhone 5S screens). Figure 8 shows samples of some live and spoofing face images found in the database.


Figure 8. Face samples in the MSU database that were captured using the cameras in a Google Nexus 5 mobile phone (top row) and MacBook Air (bottom row). (a) Live face images; (b) Spoofing face images replayed on an iPad Air screen; (c) Spoofing face images replayed on an iPhone 5S screen; (d) Spoofing face images used in a printed photo attack.

4.2. Effects of Different Color Channels

The MLBP used in this study could extract texture features from a specific color channel of a facial image. In this section, the influence of various color channels (i.e., red, green, and blue channels in the RGB space; the luminance channel in the YUV space where Y is the luminance, and U and V are the chrominance; and the luminance channel in the HSV space) on the proposed face liveness detection system in the four public domain databases is reviewed. Notably, only the MLBP was extracted as the feature vector, which was then fed into the SVM classifier. Figure 9a–d present the receiver operating characteristic (ROC) curves of various color channels in the NUAA, CASIA, Idiap, and MSU databases, respectively. The Grey and Value lines represent the luminance channels in the YUV and HSV spaces, respectively. As described in the earlier text, the red channel provided the best performance among all color channels. This finding indicated that the texture features in the red channel offer information that helps discriminate between live and spoofing face images.
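As a rough illustration of the five single-channel inputs compared here, the sketch below derives them from RGB pixels. The function name `split_channels` is hypothetical, and the luminance weights are the common BT.601 values, which is an assumption because the text does not state which RGB-to-YUV conversion was used. The HSV value channel is the maximum of the three RGB components by definition.

```python
def split_channels(pixels):
    """Derive five single-channel pixel lists from RGB pixels.

    pixels: list of (r, g, b) tuples with components in 0..255.
    """
    return {
        "red": [r for r, g, b in pixels],
        "green": [g for r, g, b in pixels],
        "blue": [b for r, g, b in pixels],
        # Y of YUV with BT.601 weights (assumed; not specified in the text).
        "grey": [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels],
        # V of HSV is the largest of the three RGB components.
        "value": [max(r, g, b) for r, g, b in pixels],
    }
```

Each of the five lists would then be reshaped to the face image size and passed to the texture descriptor separately, so that only the input channel changes between runs.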

Figure 9. Face spoofing detection performance on the (a) NUAA; (b) CASIA; (c) Idiap; and (d) MSU databases, using the MLBP to extract features from various color channels.

4.3. Effects of Different Features

The proposed face liveness detection system utilizes a combination of three features: the MLBP, R–G deviated texture, and the block-based color moment. The effects of individual features and various combinations of the features were analyzed, and the results are listed in Table 1. For all combinations, the face detection and classifier were identical to the proposed face liveness detection system. The performance of the system using various features from each database was evaluated by calculating the accuracy rate, as follows:

$$\text{Accuracy rate} = \frac{TP + TN}{N} \times 100\ (\%), \tag{7}$$

where N denotes the total number of face images (including live and spoofing face images), and TP and TN indicate the numbers of correctly identified live and spoofing face images, respectively. In addition to the accuracy rate, ROC curve data were collected, and the area under the ROC curve (AUC) was calculated as another performance index of the proposed face spoof detection system. Notably, methods that have larger AUCs are generally considered to be more accurate methods. The training and testing protocols for the four public domain databases were identical to those used in [9,23,24].
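Both performance indices can be sketched in plain Python. The function names are illustrative, labels are assumed to be 1 for live and 0 for spoofing, and the AUC is computed through its rank interpretation (the fraction of live/spoof pairs in which the live sample receives the higher score, with ties counted as one half) rather than by integrating a plotted ROC curve; the two are equivalent.

```python
def accuracy_rate(labels, predictions):
    """Accuracy in percent. labels/predictions: 1 = live, 0 = spoof."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return 100.0 * (tp + tn) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count one half).

    scores: higher means more likely to be a live face.
    """
    live = [s for y, s in zip(labels, scores) if y == 1]
    spoof = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for a in live:
        for b in spoof:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(live) * len(spoof))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based rank computation would be preferable for full frame-level test sets.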

Table 1. Face spoofing detection performance (%) regarding various combinations of features in images from the four public domain databases where values in bold indicate the best results among the features.

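| Feature | NUAA Accuracy | NUAA AUC | CASIA Accuracy | CASIA AUC | Idiap Accuracy | Idiap AUC | MSU Accuracy | MSU AUC |
|---|---|---|---|---|---|---|---|---|
| (i) | 80.04 | 91.68 | 82.90 | 90.52 | 86.04 | 90.12 | 75.99 | 86.82 |
| (ii) | 85.60 | 92.46 | 80.72 | 86.08 | 87.13 | 92.79 | 85.15 | 91.17 |
| (iii) | 72.48 | 90.97 | 85.66 | 89.27 | 84.76 | 92.28 | 78.19 | 86.64 |
| (iv) | 73.60 | 91.34 | 89.03 | 91.85 | 91.56 | 93.78 | 79.19 | 87.72 |
| (v) | 95.45 | 99.29 | 90.72 | 95.13 | 93.74 | 97.46 | 82.13 | 90.44 |
| (vi) | 95.52 | 99.34 | 91.70 | 95.35 | 95.52 | 98.73 | 88.45 | 93.89 |
| (vii) | **98.56** | 99.85 | 88.59 | 94.00 | 92.59 | 97.13 | 86.31 | 92.57 |
| (viii) | 92.16 | 99.43 | 90.02 | 94.27 | 92.01 | 97.08 | 88.68 | 94.47 |
| (ix) | 96.69 | **99.96** | **93.24** | **96.57** | **96.55** | **99.34** | **90.06** | **95.71** |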

i–ix: MLBP, R–G deviated texture, color moment, block-based color moment, MLBP + color moment, MLBP + block-based color moment, MLBP + R–G deviated texture, R–G deviated texture + block-based color moment, and proposed feature, respectively.

Notably, the color moment in [16] was calculated from a whole image (i.e., without dividing the image into blocks), whereas the block-based color moment used in the present study was calculated from the four individual blocks of an image. We assumed that the color moments may be more discriminative in small and local areas of the image than the moments calculated from a whole image. Therefore, the images were divided into 2×2 blocks without overlap for examination. The feature vector of the block-based color moment was formed by concatenating the color moments calculated from each individual block. As shown in Table 1, the block-based color moment (iv) achieved better performance than did a single color moment calculated from a whole image (iii), in terms of both the accuracy rate and AUC, in all of the databases. In other words, the color moments in the local areas of an image provided more spatial information about the face and were more discriminative than was the color moment calculated from a whole image.

Table 1 also reveals that the R–G deviated feature achieves better performance than do the other individual features in the NUAA and MSU databases, but not in the Idiap or CASIA database. This feature also achieved the lowest AUC (86.08%) among all individual features in the CASIA database. The result indicated that the difference between the red and green channels in live face images is distinct from those in the spoof face images in the NUAA and MSU databases, but the same is not true in the Idiap or CASIA database. By contrast, the block-based color moment achieved better performance than the other individual features in both the Idiap and CASIA databases, but not in the NUAA or MSU databases. Furthermore, this feature achieved the highest AUC (93.78%) among all individual features in the Idiap database. This result showed that spoof attack images possess imperfect color reproduction properties, which lead to a color distribution that is distinct from that in the live face images in both the Idiap and CASIA databases. Therefore, this color-based feature can discriminate between live and spoofing face images in these two databases.

Most of the combinations of the features (v–viii) achieved better performance than did the individual features (i–iv). For example, the R–G deviated feature improved when it was combined with the MLBP in the NUAA database: combining these two features enhanced the AUC by 7.39% and generated the highest AUC of 99.85%. The block-based color moment also improved when it was combined with the MLBP in the CASIA, Idiap, and MSU databases. Combining these two features enhanced the AUC by 3.5%, 4.95%, and 2.72% in the CASIA, Idiap, and MSU databases, respectively, and generated the highest AUCs of 95.35% and 98.73% in the CASIA and Idiap databases, respectively. Both the R–G deviated feature alone and combined with the MLBP achieved the optimum performance in the NUAA database. Furthermore, both the block-based color moment alone and combined with the MLBP achieved the optimum performance in the CASIA and Idiap databases. The MLBP was helpful in differentiating reflectance between live and spoofing face images. However, none of these combinations achieved the best performance (regarding either the accuracy rate or AUC) in all databases. By contrast, the proposed face liveness detection system combined all three features to further improve spoofing image identification. Specifically, the AUC results were 96.57%, 99.34%, and 95.71% in the CASIA, Idiap, and MSU databases, respectively. Although the proposed method had a lower accuracy rate than did the method combining the MLBP and R–G deviated texture, the proposed method achieved the highest AUC (99.96%) in the NUAA database. This finding confirmed that the proposed method has an excellent ability to discriminate between live and spoofing face images and can accurately identify spoofing face images in all of the databases.

5. Performance Evaluation

In this section, the proposed method is compared with previously published methods from [16,18,23,34,35], which adopted various preprocessing techniques, features, and classifiers. First, the performance indices that are used to measure the performance of a face liveness detection system are described. Subsequently, the performance of different face spoofing detection methods is compared.

5.1. Performance Index

The performance of a face liveness detection system was determined based on its accuracy rate (in the NUAA database), equal error rate (EER) (in the CASIA and MSU databases), and half total error rate (HTER) (in the Idiap database). Face liveness detection system errors can be divided into false acceptance (wherein a spoofing face image is classified as a live face image) and false rejection (wherein a live face image is classified as a spoofing face image). The HTER is defined as half of the sum of the false acceptance rate (FAR) and false rejection rate (FRR) and is calculated as

$$\mathrm{HTER}(\tau) = \frac{\mathrm{FAR}(\tau) + \mathrm{FRR}(\tau)}{2}, \tag{8}$$

where τ denotes the threshold of a classifier. FAR and FRR are the ratios of incorrectly classified spoofing face images and live face images, respectively, and are defined as follows:

$$\mathrm{FAR}(\tau) = \frac{\#\ \text{of false acceptance}}{\#\ \text{of spoofing faces}}, \tag{9}$$

$$\mathrm{FRR}(\tau) = \frac{\#\ \text{of false rejection}}{\#\ \text{of real faces}}. \tag{10}$$

Typically, when the FAR increases, the FRR decreases; the lower the HTER, the better the method is. Additionally, the EER is defined as a point in an ROC curve where the FAR equals the FRR; the lower an EER value, the better the classification ability of a detection system is.
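The error rates in Equations (8)–(10) and the EER can be sketched as follows. The function names are hypothetical, a sample is assumed to be accepted as live when its classifier score is at least the threshold τ, and the EER is approximated by scanning the observed scores as candidate thresholds and taking the one where FAR and FRR are closest, since on finite data the two rates rarely coincide exactly.

```python
def far_frr(labels, scores, tau):
    """Return (FAR, FRR) at threshold tau.

    labels: 1 = live, 0 = spoof. A sample is accepted as live when score >= tau.
    """
    spoof = [s for y, s in zip(labels, scores) if y == 0]
    live = [s for y, s in zip(labels, scores) if y == 1]
    far = sum(1 for s in spoof if s >= tau) / len(spoof)  # Eq. (9)
    frr = sum(1 for s in live if s < tau) / len(live)     # Eq. (10)
    return far, frr


def hter(labels, scores, tau):
    """Half total error rate at threshold tau, Eq. (8)."""
    far, frr = far_frr(labels, scores, tau)
    return (far + frr) / 2.0


def eer(labels, scores):
    """Approximate EER and the threshold at which |FAR - FRR| is smallest."""
    def gap(tau):
        far, frr = far_frr(labels, scores, tau)
        return abs(far - frr)

    best_tau = min(sorted(set(scores)), key=gap)
    far, frr = far_frr(labels, scores, best_tau)
    return (far + frr) / 2.0, best_tau
```

Under the Idiap protocol described earlier, the threshold would be chosen on the development set (for example with `eer`) and then passed unchanged to `hter` on the test set.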

Table 2 reveals the abilities of different face liveness detection systems that were applied to identify live and spoofing face images in four public domain databases.


Table 2. Performance of various spoofing detection methods for images in the four public domain databases. N/A means not applicable and values in bold indicate the best results among the methods.

| Method | Classifier | NUAA Accuracy | NUAA AUC | CASIA EER | Idiap HTER | MSU EER |
|---|---|---|---|---|---|---|
| Määttä et al. [18] | Nonlinear SVM | 92.70% | 99.00% | N/A | N/A | N/A |
| Tan et al. [23] | Sparse nonlinear logistic regression | 84.50% | 95.00% | N/A | N/A | N/A |
| Kim et al. [35] | Linear SVM | **98.45%** | N/A | N/A | 12.50% | N/A |
| Pinto et al. [34] | Linear SVM | N/A | N/A | 14.00% | N/A | N/A |
| Wen et al. [16] | Ensemble SVM | N/A | N/A | 12.90% | 7.41% | 8.58% |
| Proposed method | Linear SVM | 96.69% | **99.96%** | **7.01%** | **4.92%** | 10.20% |
| Proposed method ¹ | Nonlinear SVM | N/A | N/A | N/A | N/A | **7.23%** |

¹ A nonlinear support vector machine (SVM) was adopted to compare the proposed method with that developed by Wen et al. [16] in the MSU database.

5.2. Comparison with Other Methods in NUAA Database

For the NUAA database, our proposed method achieved an accuracy rate of 96.69%. This is slightly worse than the method proposed by Kim et al. [35] but outperforms the other two methods [18,23]. Furthermore, our proposed method achieved an AUC of 99.96%, which exceeds those achieved by [18] (AUC = 99%) and [23] (AUC = 95%). Kim et al. [35] did not include AUC results in their study. This finding indicated that our proposed method can effectively classify live and spoofing face images in the NUAA database. Figure10shows ten examples of correctly classified images. Because the live face images are captured under adequate illumination and lighting conditions, the texture-based feature can capture facial textures from live face images. Furthermore, the proposed method also captured the color distortion, shape deformation, and surface reflection from the spoofing face images. Figure11presents five examples of misclassified spoofing face images in the NUAA database. Notably, there were no false rejection results (i.e., no live face images were misclassified). This result implied that the spoofing face images that were dimly lit or too bright or those that had an unknown light reflection were often misclassified by the proposed method. Although some correctly classified images were recorded under bright conditions (e.g., the last two images of the first row in Figure10), the misclassified spoofing face images (e.g., the third image in Figure11) were recorded under a “too bright” condition, which created an unclear texture. One possible reason for such misclassification is the lack of detailed texture information due to the dim light and too bright conditions. Additionally, there is a large portion of the background shown in the cropped images (e.g., the fourth image in Figure11). Thus, the face detector might affect the performance of the proposed face spoofing detection method.

Table 2. Performance of various spoofing detection methods for images in the four public domain databases. N/A means not applicable and values in bold indicate the best results among the methods.

| Method | Classifier | NUAA Accuracy | NUAA AUC | CASIA EER | Idiap HTER | MSU EER |
|---|---|---|---|---|---|---|
| Määttä et al. [18] | Nonlinear SVM | 92.70% | 99.00% | N/A | N/A | N/A |
| Tan et al. [23] | Sparse nonlinear logistic regression | 84.50% | 95.00% | N/A | N/A | N/A |
| Kim et al. [35] | Linear SVM | **98.45%** | N/A | N/A | 12.50% | N/A |
| Pinto et al. [34] | Linear SVM | N/A | N/A | 14.00% | N/A | N/A |
| Wen et al. [16] | Ensemble SVM | N/A | N/A | 12.90% | 7.41% | 8.58% |
| Proposed method | Linear SVM | 96.69% | **99.96%** | **7.01%** | **4.92%** | 10.20% |
| Proposed method ¹ | Nonlinear SVM | N/A | N/A | N/A | N/A | **7.23%** |

¹ A nonlinear support vector machine (SVM) was adopted to compare the proposed method with that developed by Wen et al. [16] in the MSU database.

5.2. Comparison with Other Methods in NUAA Database

For the NUAA database, our proposed method achieved an accuracy rate of 96.69%. This is slightly lower than that of the method proposed by Kim et al. [35] (98.45%) but higher than those of the other two methods [18,23]. Furthermore, our proposed method achieved an AUC of 99.96%, which exceeds those achieved by [18] (AUC = 99%) and [23] (AUC = 95%); Kim et al. [35] did not include AUC results in their study. These findings indicate that our proposed method can effectively classify live and spoofing face images in the NUAA database. Figure 10 shows ten examples of correctly classified images. Because the live face images were captured under adequate lighting conditions, the texture-based feature could capture facial textures from them. The proposed method also captured the color distortion, shape deformation, and surface reflection in the spoofing face images. Figure 11 presents five examples of misclassified spoofing face images in the NUAA database. Notably, there were no false rejections (i.e., no live face images were misclassified). These examples suggest that spoofing face images that were dimly lit, too bright, or affected by an unknown light reflection were often misclassified by the proposed method. Although some correctly classified images were recorded under bright conditions (e.g., the last two images of the first row in Figure 10), the misclassified spoofing face images (e.g., the third image in Figure 11) were recorded under a “too bright” condition, which produced an unclear texture. One possible reason for such misclassification is the lack of detailed texture information under dim or overly bright lighting. Additionally, a large portion of the background appears in some cropped images (e.g., the fourth image in Figure 11). Thus, the face detector might also affect the performance of the proposed face spoofing detection method.
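The two NUAA metrics compared above, accuracy and AUC, can be computed from classifier scores as in the following sketch. It is an illustration under stated assumptions rather than the evaluation code of this study: the function names and toy scores are hypothetical, a score greater than or equal to the threshold is taken to mean "live", and the AUC is computed through its rank interpretation, namely the probability that a randomly chosen live image receives a higher score than a randomly chosen spoofing image (ties count as one half).

```python
from typing import Sequence


def accuracy(scores: Sequence[float], is_live: Sequence[bool], tau: float) -> float:
    """Fraction of samples classified correctly at threshold tau.

    Assumed convention: a sample whose score is >= tau is classified as live.
    """
    correct = sum(1 for s, live in zip(scores, is_live) if (s >= tau) == live)
    return correct / len(scores)


def auc(scores: Sequence[float], is_live: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of live and spoof scores."""
    live_scores = [s for s, live in zip(scores, is_live) if live]
    spoof_scores = [s for s, live in zip(scores, is_live) if not live]
    wins = 0.0
    for l in live_scores:
        for s in spoof_scores:
            if l > s:
                wins += 1.0   # live image ranked above spoofing image
            elif l == s:
                wins += 0.5   # tie counts as half
    return wins / (len(live_scores) * len(spoof_scores))


if __name__ == "__main__":
    # Hypothetical scores for four live and four spoofing face images.
    toy_scores = [0.9, 0.8, 0.7, 0.35, 0.6, 0.3, 0.2, 0.1]
    toy_is_live = [True, True, True, True, False, False, False, False]
    print(accuracy(toy_scores, toy_is_live, 0.5))
    print(auc(toy_scores, toy_is_live))
```

Unlike accuracy, the AUC does not depend on a particular threshold, which is one reason a method can have a very high AUC while its accuracy at a fixed operating point is lower than that of a competing method.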

Figure 10. Ten examples of correctly classified images in the NUAA database by using our proposed method.