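National Science Council, Executive Yuan: Research Project Report

An Efficient Three Step Approach for Face Detection

Project type: Individual project

Project number: NSC94-2213-E-151-012-

Project period: August 1, 2005 to August 31, 2006 (ROC 94/08/01 to 95/08/31)

Executing institution: Institute of Photonics and Communications, National Kaohsiung University of Applied Sciences

Principal investigator: Jing-Wein Wang (王敬文)

Project participants: 許志偉, 許俊彥

Report type: Condensed report

Report attachments: Report on attending an international conference, and the published paper

Availability: This project is open to public inquiry

September 12, 2006 (ROC year 95)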

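National Science Council, Executive Yuan: Subsidized Research Project Report

An Efficient Three Step Approach for Face Detection

Project type: ■ Individual project □ Integrated project

Project number: NSC 94-2213-E-151-012

Project period: August 1, 2005 to July 31, 2006 (ROC 94/8/1 to 95/7/31)

Principal investigator: Jing-Wein Wang, Institute of Photonics and Communications, National Kaohsiung University of Applied Sciences

Project participants: 許志偉 and 許俊彥, master's program, Institute of Photonics and Communications, National Kaohsiung University of Applied Sciences

Report type (submitted as required by the approved budget list): Condensed report

This report includes the following required attachments:

□ One report on overseas travel or study

□ One report on travel or study in mainland China

■ One report on attending an international academic conference, and one copy of the published paper

□ One overseas research report for an international cooperative research project

Executing institution: Institute of Photonics and Communications, National Kaohsiung University of Applied Sciences

September 12, 2006 (ROC year 95)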

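Data Sheet of R&D Results Available for Promotion

■ Patentable

■ Available for technology transfer

Date: September 12, 2006 (ROC year 95)

NSC-subsidized project

Project title: An Efficient Three Step Approach for Face Detection

Principal investigator: Jing-Wein Wang

Project number: NSC 94-2213-E-151-012 Discipline: Information Science II

Technology/creation name: Robust face detection system

Inventor/creator: Jing-Wein Wang

Technical description (translated from the Chinese): We propose a novel algorithm for extracting facial components, including eye detection, mouth-width measurement, and pose estimation. In this algorithm, we design a wavelet-frame feature template and use a support vector machine and a wavelet entropy filter to search for and segment the T-shaped region of the face. This T-shaped region, which normally contains the eye and mouth blocks, is used to select the corresponding view-based classifier for face recognition. Experimental results show that our method performs very well against complex backgrounds.

Technical description (English): In this project, we present an efficient algorithm for facial component extraction, comprising eye detection, mouth-width measurement, and pose estimation. The algorithm combines a novel overcomplete wavelet feature template, a support vector machine (SVM) classifier, and wavelet entropy filtering to robustly detect and segment the T-shaped face region. The segmented T-shaped face region, which is the smallest area enclosed by the face ellipse that includes the eyes and mouth, is used to select the corresponding view-based classifier for face recognition. Experimental results show that the proposed method is robust in complex scenes.

Applicable industries and potential products

Applicable industries: information, electronics, biomedical engineering, electromechanical, national defense, security

Applicable products: access control, attendance management, face tracking, video conferencing, robots, assistive systems for people with disabilities, data compression, data retrieval, military applications, and so on.

Technical features

Face detection: Unaffected by the number of people, the background, or the light source, so it can be used in a wide range of environments. With a suitable lens, the detection distance can reach 10 meters.

Real-time performance: On a 3.0 GHz CPU, detecting and recognizing a single face takes 0.5 seconds on average. The result is uploaded immediately to a preset station, and an ON/OFF control signal is output through a USB I/O interface at the same time.

Value for promotion and application

With the arrival of the 3G and 4G era of personal communications, the technology applies to any multimedia video processing that involves people, and it is highly practical for both daily life and security.

Appendix 2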

Efficient Facial Component Extraction for Detection and Recognition

Jing-Wein Wang

Institute of Photonics and Communications

National Kaohsiung University of Applied Sciences

E-mail: [email protected]

Abstract

In this paper, we present an efficient algorithm for facial component extraction, comprising eye detection, mouth-width measurement, and pose estimation. The algorithm combines a novel overcomplete wavelet feature template, a support vector machine (SVM) classifier, and wavelet entropy filtering to robustly detect and segment the T-shaped face region. The segmented T-shaped face region, which is the smallest area enclosed by the face ellipse that includes the eyes and mouth, is used to select the corresponding view-based classifier for face recognition. Experimental results show that the proposed method is robust in complex scenes.

1. Introduction

Automatic face segmentation, the task of finding and locating faces in images, is a key issue for face biometrics. It subsumes face detection because it not only locates human faces but also segments the actual shape of each complete face, and it is especially useful as a preceding step for face recognition. Greenspan et al. [1] demonstrated that a mixture-of-Gaussians model of color space with a robust representation can accommodate large color variations as well as highlights and shadows, which allows for face-color modeling and segmentation. Chen and Lu [2] developed a fuzzy clustering algorithm that does not require knowledge of the number of color clusters to be generated at each stage; the resolution of the color regions is controlled by the cluster radius. They tested the segmentation system on two face images by setting the cluster radius parameter at various scales. Liu et al. [3] proposed a face segmentation method based on an improved Binary Partition Tree algorithm [4] that also used the valley energy as a face feature. Based on the distance measured between the connected component object and the best-fit ellipse [5], Haddadnia et al. [6] tried to cut out a pure face portion by using a five-parameter ellipse model whose shape resembles that of a face. A pixel projection segmentation approach that addresses preprocessing and the use of horizontal and vertical projections was studied by Baskan et al. [7]. However, all these methods include data that are irrelevant to the facial portion, such as hair, neck, shoulders, and background. They are therefore not well suited to a complete face recognition system, since the non-face portion may contribute to the determination of the decision boundary and thus affect the recognition results [8]-[10].

In this paper, we propose a robust wavelet-based technique that combines the merits of image-based and feature-based methods to precisely extract the eyes and locate the mouth together with its width. This enables precise face segmentation, which finds the best-fit ellipse enclosing the facial region of the human face in various views under complex scenes. The major advantages of our algorithm are that the segmented face contains no redundant data beyond the elliptic region circumscribing the eye-mouth triangle, and that the resulting estimate of the view angle provides a basis for multiview face recognition. Our method uses the T-shaped face region for recognition, which is the smallest area enclosed by the face ellipse that includes the eyes and mouth. The rest of this paper is organized as follows. Section 2 introduces the proposed face detection and segmentation approach. Section 3 presents experimental results that corroborate the approach. Section 4 concludes the paper.

2. Face segmentation and pose estimation

The precise face segmentation and pose estimation algorithm proceeds as follows:

Step 1: Perform skin color detection on the input image (Fig. 1(a)) to locate the region of interest (ROI), which is obtained by enlarging the skin region by a factor of 1.2.

Step 2: Transform the ROI into overcomplete wavelet coefficients by using the D4 basis.

Step 3: Set the initial feature template size to 24 × 24 pixels and exhaustively scan each ROI at different scales, starting from the top left corner and sliding the template in two-pixel increments horizontally and vertically until all ROIs have been covered. The ROI is scanned at 10 scales, each a factor of 1.25 larger than the last.

Step 4: To locate potential candidates, the rectangle feature of W × H pixels in the feature template is computed as

$$[i] = \sum_{0 < x \le W,\; 0 < y \le H} \left| d^{DWF}_{LH}(x, y) \right|, \tag{1}$$

where $d^{DWF}_{LH}$ denotes the coefficients of the nonsubsampled version of the discrete wavelet transform (DWF, discrete wavelet frames) in the LH horizontal subband, $i \in \{1, \ldots, I\}$, and $I = 9$ is the number of rectangles in the proposed feature template. At each location within the ROI, the decision rule of equation (2) is checked to determine whether the face template is satisfied.

[Figure: feature template with rectangles numbered 1 to 9]

( [1] > [4] ) && ( [3] > [6] ) && ( [1] > [2] || [3] > [2] ) && ( [5] > [4] ) && ( [5] > [6] ) && ( [8] > [7] ) && ( [8] > [9] ) (2)

Step 5: Merge templates that enclose the same face, since it is common to have multiple detections with small horizontal and/or vertical displacements.

Step 6: Normalize all the candidate regions by resizing them to 24 × 24 pixels and applying histogram equalization.

Step 7: Classify each face candidate using the SVM classifier. If the class corresponds to a face, draw a rectangle around the face in the output image.

Step 8: Normalize the detected face rectangle to 40 × 60 pixels. An entropy-based smoothing filter is then moved from pixel to pixel across the rectangle to remove the coefficients located outside the facial component regions. This continues until all pixel locations have been covered, yielding the facial component objects to be extracted. The entropy filtering of the pixels in the 3 × 3 (N = 3) neighborhood defined by the mask is given by

$$\mathrm{Wavelet\_entropy} = -\frac{1}{N^2} \sum_{x, y = 0}^{N-1} d^{DWF}_{LH}(x, y)\, \log d^{DWF}_{LH}(x, y). \tag{3}$$

According to anthropometry, we perform the inter-orientation projection along the horizontal and vertical axes, respectively, to locate the eyes and mouth. To find the centroid of each facial component region, we segment the region by finding a meaningful boundary with a point aggregation procedure. The center pixel of the component region is a natural starting point; neighboring points are grouped under 4-connectivity to form the region of interest, and growing stops when no more pixels qualify for inclusion. After growing, the region centroid is relocated, and the eyes, mouth, and face size are therefore known. Finally, an adaptation step marks the eye and mouth positions and refines the bounding rectangle into an ellipse that fits the oval shape of the face (Fig. 1(b)).
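Steps 3 and 4 can be illustrated with a short Python sketch. It lists the template sizes for the 10-scale scan, enumerates window positions at a two-pixel stride, sums absolute LH coefficients over a rectangle as in equation (1), and applies the decision rule of equation (2). The function names are hypothetical, and the 3 × 3 row-major placement of the nine rectangles is an assumption made only for illustration, since the text does not give the template geometry. Rounding each scaled size to a whole pixel is likewise an assumption.

```python
def window_sizes(base=24, factor=1.25, n_scales=10):
    """Template side lengths for the scan; rounding to whole pixels is an assumption."""
    return [round(base * factor ** k) for k in range(n_scales)]


def window_positions(roi_w, roi_h, size, step=2):
    """Top-left corners of a size x size window slid in `step`-pixel increments."""
    return [(x, y)
            for y in range(0, roi_h - size + 1, step)
            for x in range(0, roi_w - size + 1, step)]


def rect_sum(lh, x0, y0, w, h):
    """Equation (1): sum of |LH coefficient| over a w x h rectangle.

    `lh` is a 2-D list of LH-subband coefficients indexed as lh[y][x].
    """
    return sum(abs(lh[y][x])
               for y in range(y0, y0 + h)
               for x in range(x0, x0 + w))


def template_features(lh, x0, y0, size):
    """Nine rectangle sums for one window.

    ASSUMPTION: the rectangles form a 3 x 3 grid numbered 1..9 in row-major
    order; the actual template layout is not specified in the text.
    """
    cell = size // 3
    return {3 * r + c + 1: rect_sum(lh, x0 + c * cell, y0 + r * cell, cell, cell)
            for r in range(3) for c in range(3)}


def satisfies_template(f):
    """Decision rule of equation (2); `f` maps rectangle index 1..9 to its sum."""
    return (f[1] > f[4] and f[3] > f[6]
            and (f[1] > f[2] or f[3] > f[2])
            and f[5] > f[4] and f[5] > f[6]
            and f[8] > f[7] and f[8] > f[9])
```

For a 28 × 28 ROI, `window_positions(28, 28, 24)` yields nine windows at the base scale. In this sketch, a window would go on to the later steps only if `satisfies_template` returns True for its nine sums.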

Note that because the final objective of facial component extraction is face recognition, the view-space separation of the human face must be carried out as the next step.
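The entropy measure of Step 8 can be sketched as follows, reading equation (3) as the negative mean of d·log d over the N × N neighborhood. Several choices here are assumptions rather than details from the text: coefficient magnitudes are used so that the logarithm is defined, zero coefficients contribute nothing, the logarithm is natural, and border pixels whose neighborhood would leave the rectangle are left at 0. How the filter output is thresholded is not specified, so the sketch only computes the entropy map. The function names are hypothetical.

```python
import math


def wavelet_entropy(lh, cx, cy, n=3):
    """Entropy of the n x n neighborhood centered at (cx, cy), per equation (3).

    ASSUMPTIONS: |d| replaces d so that log is defined, 0 * log(0) is taken
    as 0, and the natural logarithm is used.
    """
    half = n // 2
    total = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            d = abs(lh[y][x])
            if d > 0.0:
                total += d * math.log(d)
    return -total / (n * n)


def entropy_map(lh, n=3):
    """Slide the mask over every interior pixel; border pixels stay 0.0 (assumption)."""
    h, w = len(lh), len(lh[0])
    half = n // 2
    out = [[0.0] * w for _ in range(h)]
    for cy in range(half, h - half):
        for cx in range(half, w - half):
            out[cy][cx] = wavelet_entropy(lh, cx, cy, n)
    return out
```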

Step 9: The view-space separation procedure presented in Fig. 2 is defined in the image plane. The points $(L_x, L_y)$ and $(R_x, R_y)$ in the illustration are the coordinates of the left and right eyes, respectively, and the point $(M_x, M_y)$ is the center of the mouth. We define the pan θ of equation (4) as the angle between the line passing through the right eye and the center of the mouth and the line passing through the left eye and the center of the mouth. A face image with in-plane rotation θ can be rotated back to the upright position (orientation alignment), and the left view can then be separated from the right view based on the skin color ratio between the left cheek and the right cheek of the segmented face. A color ratio in the range (0.8, 1.2) is assigned to frontal-view face recognition, corresponding to a pan rotation in (-20°, 20°). A color ratio less than 0.8 defines the left view, with rotation angles in (-60°, -20°), and a color ratio larger than 1.2 defines the right view, with rotation angles in (20°, 60°).

$$\theta = \tan^{-1}\!\left( \frac{(L_x + R_x)/2 - M_x}{(L_y + R_y)/2 - M_y} \right) \times \frac{180}{\pi}. \tag{4}$$

[Illustration: eye-mouth triangle with vertices $(L_x, L_y)$, $(R_x, R_y)$, and $(M_x, M_y)$, and angle θ]
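The angle computation and view selection of Step 9 can be sketched in Python. The sketch reads equation (4) as the arctangent of the horizontal offset over the vertical offset between the eye midpoint and the mouth center, converted to degrees. That reading, the function names, and the treatment of the boundary ratios 0.8 and 1.2 as frontal are assumptions.

```python
import math


def rotation_angle_deg(left_eye, right_eye, mouth):
    """Angle in degrees under the assumed reading of equation (4).

    Each argument is an (x, y) pair in image coordinates.
    """
    lx, ly = left_eye
    rx, ry = right_eye
    mx, my = mouth
    dx = (lx + rx) / 2 - mx  # horizontal offset: eye midpoint minus mouth center
    dy = (ly + ry) / 2 - my  # vertical offset: eye midpoint minus mouth center
    if dy == 0:
        raise ValueError("eye midpoint and mouth center lie on the same row")
    return math.degrees(math.atan(dx / dy))


def select_view(color_ratio, low=0.8, high=1.2):
    """Choose the view-based classifier from the left/right cheek skin color ratio.

    Ratios below `low` map to the left view and ratios above `high` to the
    right view; values exactly at a boundary are treated as frontal (assumption).
    """
    if color_ratio < low:
        return "left"
    if color_ratio > high:
        return "right"
    return "frontal"
```

With eyes at (10, 10) and (30, 10) and the mouth at (20, 30), the eye midpoint sits directly above the mouth and the angle is zero, so no alignment rotation would be needed.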


3. Experimental results

3.1. Live and photo detections

Our training set contains 16,000 faces and 22,000 non-faces, which were extracted semi-automatically by our algorithm from various sources, including the Internet and pictures taken by ourselves. The precise face detector is tested on face images from photos that differ from those in the detector's training database. To define a correctly detected face in the case of precise face segmentation, we introduce the FDM (face detection measuring) criterion: a detected face is considered valid if the area of the circle circumscribing the detected eye-mouth triangle is within ±10 percent of the real face area and contains both the eye pair and the mouth. The detection rate is the ratio between the number of successful detections and the number of labeled faces in the test set. The false alarm rate is the ratio between the number of false positive detections and the number of detected windows.
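The FDM criterion and the two rates defined above reduce to simple arithmetic, sketched here in Python. The function and parameter names are hypothetical, and the areas and containment flags are assumed to come from a separate annotation step that the text does not describe.

```python
def is_valid_detection(detected_area, true_area, has_eye_pair, has_mouth, tol=0.10):
    """FDM check: area within +/-10 percent of the real face area, with the
    eye pair and the mouth both inside the detected region."""
    within_tolerance = abs(detected_area - true_area) <= tol * true_area
    return within_tolerance and has_eye_pair and has_mouth


def detection_rate(n_successful, n_labeled_faces):
    """Successful detections divided by labeled faces in the test set."""
    if n_labeled_faces <= 0:
        raise ValueError("the test set must contain at least one labeled face")
    return n_successful / n_labeled_faces


def false_alarm_rate(n_false_positives, n_detected_windows):
    """False positive detections divided by all detected windows."""
    if n_detected_windows <= 0:
        raise ValueError("at least one detected window is required")
    return n_false_positives / n_detected_windows
```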

Fig. 3 shows example results obtained by our face detector on the test sets, which are drawn from various sources. In the experiments, we obtain high detection rates with low false alarm rates: 93.69% with precision 84.63%, 90.38% with precision 88.68%, and 99.0% with precision 98.02% for the BioID (1521 images) [11], the Internet (100 images), and the Caltech (450 images) [12] test sets, respectively.

3.2. Live detection and recognition

As a case study, we constructed an integrated system for multiview face detection and recognition (MFDR) in complex scenes. The MFDR system comprises a real-time face detector followed by an SVM recognizer. The SVM recognizer was trained and tested on different sets of the live image database, with 1000 positive examples (registered persons) and 1000 negative examples (intruders), 100 of each person, none of which were used for the face detector. All the faces were collected in frontal view or near-frontal view. The one-level decomposed wavelet subbands of each example image are all input to the SVM recognizer for face identification. We performed frontal-view and near-frontal-view recognition based on the results of detection. In this scenario, the system processes a sequence of images to recognize the person and recalls his or her name if he or she is recognized as a database subject; otherwise, the person is rejected as unknown. We performed live face detection and recognition from video input based on the rectangle face that prevails in published works and on the ellipse face, respectively. Overall, the latter, with a 94% recognition rate and no false acceptance, achieves better performance than the former. In terms of speed, the execution time of the proposed live detector and recognizer is directly related to the size and complexity of the images. For example, our system operates at an average processing time of less than 1.0 sec per live image with a single face on a 3.0 GHz Pentium PC. The results show that the performance of face recognition on this small-scale problem (10 persons) is acceptable; the system may therefore have potential applications such as door access control in a home security system.

4. Conclusions

We have presented a framework for precise face segmentation based on a wavelet detector, without making any assumptions about which areas of the face pattern to analyze. The robustness of our system to varying poses and facial expressions, as well as to lighting variations, was evaluated using real sets of difficult images. Satisfactory recognition results were obtained with the proposed ellipse face segmentation.

Acknowledgement

This work has been partly supported under grant No. NSC 94-2213-E-151-012.

References

[1] M. Greenspan, J. Goldberger, and I. Eshet, “Mixture model for face color modeling and segmentation,” Pattern Recognition Letters, vol. 22, pp. 1525-1536, 2001.

[2] T. Q. Chen and Y. Lu, “Color image segmentation-an innovative approach,” Pattern Recognition, vol. 35, pp. 395-405, 2002.

[3] Z. Liu, J. Yang, and N. S. Peng, “An efficient face segmentation algorithm based on binary partition tree,” Signal Processing: Image Communication, vol. 20, pp. 295-314, 2005.

[4] P. Salembier and L. Garrido, “Binary partition tree as an efficient representation for image processing, segmentation, and information retrieval,” IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 561-576, April 2000.

[5] J. Wang and T. Tan, “A new face detection method based on shape information,” Pattern Recognition Letters, vol. 21, pp. 463-471, 2000.

[6] J. Haddadnia, K. Faez, and M. Ahmadi, “An efficient human face recognition system using pseudo zernike moment invariant and radial basis function neural network,” International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), vol. 17, no. 1, pp. 41-62, 2003.

[7] S. Baskan, M. M. Bulut, and V. Atalay, “Projection based method for segmentation of human face and its evaluation,” Pattern Recognition Letters, vol. 23, pp. 1623-1629, 2002.

[8] A. Martinez, “Recognizing imprecisely localized, partially occluded and expression variant faces from a single sample per class,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 748-763, 2002.

[9] L-F. Chen, H-Y. Liao, J-C. Lin, and C-C. Han, “Why recognition in a statistics-based face recognition system should be based on the pure face portion: a probabilistic decision-based proof,” Pattern Recognition, vol. 34, no. 7, pp. 1393-1403, 2001.

[10] K. Messer, J. Kittler, M. Sadeghi, M. Hamouz, A. Kostyn, S. Marcel, S. Bengio, F. Cardinaux, C. Sanderson, N. Poh, Y. Rodriguez, K. Kryszczuk, J. Czyz, L. Vandendorpe, J. Ng, H. Cheung, and B. Tang, “Face authentication competition on the BANCA database,” ICBA, Lecture Notes in Computer Science, vol. 3072, pp. 8-15, 2004.

[11] http://www.humanscan.de/.

[12] http://www.vision.caltech.edu/html-iles/archive.html.

Fig. 1. Live face detection: (a) live input image; (b) live face detection. [Block diagram labels: skin color detection, wavelet frames, face localization, face classification, facial component extraction.]

Fig. 2. View space separation. [Diagram labels: detected face, segmentation, segmented face, orientation alignment, pan rotation angle estimation.]

Fig. 3. Live detection: (a) dim lighting, (b) bright lighting, (c) in crowds; and (d) photo detection.
