人臉偵測之研究

(1)

國

立

交

通

大

學

資訊科學與工程研究所

博士論文

人臉偵測之研究

A Study on Face Detection

研究生: 井民全

指導教授: 陳玲慧教授

(2)

人臉偵測之研究

A Study on Face Detection

研究生：井民全

Student: Min-Quan Jing

指導教授：陳玲慧

Advisor: Dr. Ling-Hwei Chen

國立交通大學

資訊科學與工程研究所

博士論文

A Dissertation

Submitted to Institute of Computer Science and Engineering College of Computer Science

National Chiao Tung University in partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

in

Computer Science

June 2009

Hsinchu, Taiwan, Republic of China

中華民國九十八年七月

(3)

人臉偵測之研究

研究生:井民全指導教授:陳玲慧博士

國立交通大學資訊學院資訊工程系

摘要

近年來

，

數位安全監控技術的普遍發展

，

使得完全自動化的人臉

系統 (fully automated face system) 越來越受到重視。人臉偵測是自動

化人臉系統的基石，任何人臉應用系統都需要有一個強固的人臉偵測

方法，才能達到預期的效果，因此人臉偵測的研究與分析便顯得重要

而且迫切。雖然人臉偵測方法的研究已經超過 20 年，也有許多成果

已經發表。可是對於多變化環境下的人臉偵測，特別是針對不同人臉

角度、不同光源方向與有顏色光源的人臉位置偵測方法，依然是一

項十分具挑戰性的研究領域。

一般人臉偵測方法的主要架構大致是由下列幾項要素組成: 人臉

候選位置的判定、臉部特徵的擷取與人臉位置的偵測。可是由於多變

化的環境，導致人臉偵測的困難。例如在研判人臉可能出現的位置

時，就可能受取像時的環境條件所影響，有顏色的光源與不同角度的

側面光源，如不加以考量就會影響人臉候選位置的判定。另外由於攝

影機與人臉之間的取像角度，使得人臉在影像中所呈現的姿勢可能為

(4)

正面、接近正面、半側面、側面等，一些重要的臉部特徵如眼睛，卻

可能會因為影子、眼鏡和人臉姿勢的關係導致偵測錯誤。而不同的臉

部表情也會產生不同的臉部呈現樣式。這些因素增加了人臉偵測的困

難性。本論文之主要目的在於提出一個解決上述問題的人臉偵測方

法。

在本論文中，我們首先將會提出一個在多變化環境下的眼睛水平

線偵測方法，利用膚色資訊將人臉可能所在的區域標記出來，然後根

據人類眼睛擁有的非膚色特性與高亮度(gray-level)變化特性，將眼睛

的位置標示出來，最後利用人臉的幾何特性將眼睛水平線標記定位。

這個方法主要的貢獻在於提出一個有效的方法，取得多變化環境下的

眼睛候選位置與水平線資訊。接下來，利用得到眼睛水平線位置與眼

睛可能位置座標，發展一個在多變化環境下不同姿勢的人臉偵測方

法。首先將膚色資訊所標記出來的人臉區域進行肩膀區域的判定與刪

除，接著利用定義好的側面人臉特徵幾何規則將人臉區域分成側面人

臉或非側面人臉區域。對於非側面的人臉候選區域，利用前述方法取

得眼睛水平線與眼睛可能位置座標。最後針對不同人臉姿勢與人臉亮

度分佈的特性，標記出真正的眼睛位置並藉此將人臉位置標示出來。

本論文所提出的方法，經過標準人臉資料庫實驗證實，可以解決人臉

偵測在多變化環境下所遭遇的問題。

(5)

A Study on Face Detection

Student: Min-Quan Jing Advisor: Dr. Ling-Hwei Chen

Institute of Computer Science and Engineering National Chiao Tung University

ABSTRACT

While digital surveillance systems are receiving increasing concern in

modern society, developing a fully automated face system is getting more and more

attentions than before. However, a robust face detector is a foundation of building an

automated face system. Face detection method insures a face system realizable. Thus,

the demand for an efficient method to automatically detect face becomes urgent.

Although face detection has been studied for more than 20 years, developing a

human face detector under various environments is still a challenging task. An

automatical face-detection job would include three steps: face candidate location,

facial feature extraction and face detection. Some factors make face location difficult.

One is the variety of colored lighting sources, another is that facial features such as

eyes may be partially or wholly occluded by shadow generated by a bias lighting

direction; others are races and different face poses with/without glasses. These factors

make face detection difficult. The aim of the research is to provide a solution to these

(6)

of a face under various environments. Based on the facts that the eye color is very

different from skin color and the gray level variance of an eye is high, some eye-like

regions are located. Then the horizontal eye line is extracted based on the located

eye-like regions and some geometric properties in a face. The contribution of this

method is providing information of eye-like region positions and human horizontal

eye line under various environments. Next, a method based on the extracted eye-like

region position and horizontal eye line information will be proposed to detect a face

with different poses under various environments. The basic idea is, first, skin regions

are extracted from an input image using skin color information and then the shoulder

part is determined and cut out by using shape information. The remained head part is

identified as a face candidate. For a face candidate, we apply a set of geometric

features to determine if it is a profile face. If the face candidate is a non-profile face,

then a set of eye-like rectangles extracted from the face candidate and the lighting

distribution are used to determine. Solving the poses problem and detecting face

location under various environments are the main contributions of this method.

Experimental results show that the proposed method is robust under a wide range of

(7)

誌謝

首先要對我的指導教授陳玲慧博士獻上我最誠摯的感謝。在她細心與熱心的指導下，讓我能夠完成博士論文。從她身上我學習到豐富的知識與嚴謹的態度，更重要的是，我從她身上學習到真誠待人的處事原則。她是幫助我最多的人，在我的人生中，我很幸運能夠遇到這麼一位這麼好的老師。接著我要特別感謝的人是李建興學長，他在我最需要幫忙的時候伸出援手並且給我溫暖的建議與打氣。林瑞祥學長，在我困難的時候幫我排除萬難解決問題。我由衷地感謝你們，你們是我心中學長的典範。我要感謝自動化資訊處理實驗室的朋友們，惠龍我從你身上學到堅持到底的精神，我記得每個晚上你鼓勵我的話語、文超很感謝你為我做的事情、芳如我永遠記得妳下雨天幫我買便當的樣子、盈如謝謝妳幫我送論文給教授們，我知道妳一個一個去敲門、Sasami、俊旻、阿和、小胖、阿鴻、阿達、Gavin、小貝，實驗室有你們在讓我覺得很開心與溫暖。宜軒謝謝妳來實驗室看我幫我打氣，我收到了妳的祝福、阿雪謝謝妳不辭辛勞的來實驗室修改論文。合吉、朝君、小邵、致生、崇荏、佳峰，謝謝你們在我困難時陪我，你們的友誼是我重要的支柱。最後我要感謝我的父母，在我徬徨時多次給我堅定的力量、讓我能夠勇往直前。貞如我的妻子謝謝妳多年的忍讓，使我沒有後顧之憂地追求我的夢想。這些年來辛苦妳了。我僅以最誠摯的心將本論文獻給我的父母、妻子與曾經在我生命中給予我鼓勵及協助的每一個人，謝謝你們。

(8)

LIST OF TABLES

TABLE 2.1 THE TEMPLATES FOR SIX EYE CLASSES ………….……. 24 TABLE 3.1 DETECTION RESULTS ON HHI DATABASE (IMAGE SIZE

640 x 480) ………... 61 TABLE 3.2 DETECTION RESULTS ON CHAMPION DATABASE

(IMAGE SIZE ~ 150 x 220) ………... 62 TABLE 3.3 DETECTION RESULTS ON NON-PROFILE FACES OF HHI

(11)

LIST OF FIGURES

Fig. 1.1 The block diagram of the proposed face detector. 6 Fig. 2.1 Some results of applying skin region extractor on different

races under various lighting environments. (a) Four images taken under normal, bias colored and high lighting environments. (b) The extracted candidate skin regions using threshold 142. (c) The extracted candidate skin regions using threshold 132……… 12 Fig. 2.2 Some examples for illustrating the skin region filter. (a) A face

image. (b) Candidate skin regions using 132 as threshold. (c) The result of removing regions of small sizes and improper shapes in (b). (d) Candidate skin regions using 142 as threshold. (e) The result of removing small regions in (d). (f) The result of merging (c) and (e) and imposed on (a). (g) The result of removing uniform content regions and cross over regions in (f). 14 Fig. 2.3 The block diagram of the eye-like region detector……….. 15 Fig. 2.4 Subjects taken under various lighting environments. (a) The

normal lighting condition. (b) Top lighting source. (c) Colored and left bias lighting source. (d) High lighting source…………. 16 Fig. 2.5 The flowchart of the intensity based eye-like region detector…. 17 Fig. 2.6 An example of the bi-level thresholding results. (a) A face

region. (b) The histogram of detected gray pixels within the skin region. (c) Bi-level thresholding results of (a) using different thresholds………... 18 Fig. 2.7 An example of the horizontal gray value projection for an eye.

(a) The projection region extended from eye-like region. (b) The projected histogram of the projection region in (a)………... 19 Fig. 2.8 The results of applying the intensity based algorithm. (a) The

result of a normal lighting source face. (b) The result of a colored and left bias lighting source face………. 19 Fig. 2.9 The result of applying the proposed intensity and non-skin

color based eye detectors. (a) The eye-like regions located based on non-skin color information. (b) The combined result of Fig. 2.8(b) and (a)………. 20 Fig. 2.10 Five false region classes. (a) An example to show four classes

(12)

of false eye-like regions. (b) Another example to show forehead - hair class ……… 21 Fig. 2.11 The result of applying the false eye-like removing procedure.

(a) A bounding rectangle of a candidate skin region. (b) The horizontal projection histogram of the bounding rectangle in (a). (c) The detected upper and bottom lines. (d) The result after applying the removing procedure presented in Section 2.2.3.1 to (a)……….. 22 Fig. 2.12 The result of applying the eye position refining procedure. (a)

An example to show an eye not located at the center of the eye-like region. (b) The result of applying the eye position refining procedure to (a)………... 25 Fig. 2.13 The result of removing isolated false regions. (a) An example to

show isolated regions. (b) The result of removing the isolated regions in (a)………. 25 Fig. 2.14 An example to illustrate the bounding rectangle of a face. (a)

The minimum rectangle containing all eye-like regions. (b) The expanding result of (a)……….. 26 Fig. 2.15 Two examples to show different models of gray level

histograms. (a) A bias lighting face and its corresponding histogram. (b) A normal lighting face and its corresponding histogram……….. 27 Fig. 2.16 Two examples to show the over-segmentation and

under-segmentation. (a) Original image. (b) An over-segmentation obtained after applying the threshold process to (a). (c) Original image. (d) An under-segmentation obtained after applying the threshold process to (c)……… 28 Fig. 2.17 The result of applying the forehead-hair false region removing

procedure. (a) The eye-like regions before applying the removing procedure. (b) The result of applying the bi-level threshold to (a). (c) The result of removing the forehead - hair false regions in (b)……… 30 Fig. 2.18 An example to explain the eye-like region resuming procedure.

(a) The bounded ranges for resuming the removed eye-like regions in the shadow area of a face. (b) Three eye-like regions (marked by white rectangles) got back………. 30

(13)

Fig. 2.19 Some examples for the horizontal projection. (a) Only one valley case. (b) Two valleys case. (c) D1s1D2 with two eye-like regions near the second valley line. (d) D1s2D2 with two eye-like regions near the first valley line……….. 33 Fig. 2.20 A part of images from HHI face database with different poses

and eye-glasses………. 34 Fig. 2.21 A part of images from Champion face database with different

skin colors………. 34 Fig. 2.22 The detection results for persons with shadows and various

lighting sources on their faces……….. 34 Fig. 2.23 The detection results of face images collected from the Internet,

MPEG7 video clips and PIE database……….. 35 Fig. 2.24 Some error/missing examples………... 35 Fig. 3.1 The block diagram of the proposed face detector……… 38 Fig. 3.2 Some results of applying Jing-Chen skin region extractor. (a)

Three images taken under various lighting environments. (b) The extracted skin regions (white pixels) using Jing-Chen skin region detector……….. 39 Fig. 3.3 An example showing two men wearing clothes with/without

skin-like color. (a) A man wearing a T-shirt of skin-like color. (b) A man wearing a T-shirt of non skin-like color. (c) The detected skin region of (a). (d) The detected skin region of (b). (e) The horizontal projection of (c). (f) The horizontal projection of (d)……… 40 Fig. 3.4 An example for removing shoulder from a skin region. (a) The

Top-Down line for a projection histogram. (b) The neck point and the cutting line located. (c) The located cutting line on Fig. 3.3(c). (d) The result of removing shoulder from (c)…………... 42 Fig. 3.5 An example for illustrating the profile features. (a) Some

feature points on a right profile. (b) The extracted nose peak. (c) The extracted chin scan line and chin point. (d) The extracted nose bottom scan line and nose bottom. (e) The extracted nose top………. 43 Fig. 3.6 An example to show the defined profile variables and the face

rectangle. (a) The defined variables for a skin region. (b) The defined profile face rectangle………... 45

(14)

Fig. 3.7 An example for simple face model. (a) A face image. (b) The located horizontal eye line and eye-like rectangles. (c) The defined face rectangle………... 47 Fig. 3.8 Two examples for the complexity face model. (a) A man with

glasses. (b) A man under bias lighting source……….. 48 Fig. 3.9 The vertical histogram projection patterns and detected vertical

symmetric lines. (a) A half profile pattern. (b) A bias lighting pattern. (c) A near frontal pattern with left-side pose. (d) A normal pattern………... 49 Fig. 3.10 An example to illustrate the eye-like boundary for a skin

region……… 52 Fig. 3.11 Two examples for true eye and face location. (a) A case

satisfying Rule 1. (b) The detected face location based on the eye locations from (a). (c) A case satisfying Rule 2. (d) The detected face location based on the eye locations of (c)………... 54 Fig. 3.12 Two examples for the two kinds of eye patterns. (a) The

two-layered pattern. (b) The three-layered pattern………... 56 Fig. 3.13 Some examples for true eye and face rectangle location. (a)

Two-layered patterns at each side of the symmetric line. (b) The detected face location based on the true eye locations on (a). (c) One two-layered pattern detected. (d) The detected face location based on the eye locations on (c). (e) Another example for only one two-layered pattern detected at one side. (f) The detected face location based on the eye locations on (e). (g) One three-layered pattern detected. (h) The detected face location based on the eye locations on (g). (i) An example of other patterns detected. (j) The detected face location based on the eye locations on (i)………... 56 Fig. 3.14 A part of images from HHI face database……… 58 Fig. 3.15 A part of images from Champion face database………... 59 Fig. 3.16 The detection results for persons with different skin colors……. 59 Fig. 3.17 The detection results for persons with different poses…………. 60 Fig. 3.18 The ROC curves for our face detector on HHI and Champion

databases………... 62 Fig. 3.19 The detection results of face images collected from our

laboratory, the Internet, and MPEG7 video clips………. 63 Fig. 3.20 The detection results for multiple faces………... 64

(15)

ABBREVIATION

SVM Support Vector Machine

HHI Heinrich Hertz Institute

PIE The CMU pose, illumination, and expression database

(16)

CHAPTER 1 INTRODUCTION

1.1 MOTIVATION AND APPLICATIONS

human face detector under various environments is still a challenging task. Some

factors make face location difficult. One is the variety of colored lighting sources,

another is that facial features such as eyes may be partially or wholly occluded by

shadow generated by a bias lighting direction; others are races and different face

poses with/without glasses. In a drowsy driving warning system, driver’s mental

condition can be revealed by measuring the eye movement from the eye location [1].

How to locate the eyes of a driver under extreme lighting condition is an important

issue for a successful intelligent transportation system. Detecting the eye line can help

locate eyes. Moreover, the detected eye line can be used to do face detection. There

are several methods are proposed to overcome face detection problems, Hjelmas and

Low [2] have presented an extensive survey of feature-based and image-based

algorithms for face detection. In feature-based methods, the most widely used

technique is skin color detection based on one of the color models. Most image-based

(17)

faces at all scales, making them computationally expensive. Also, Yang [3] classifies

the approaches into four categories: knowledge-based, feature invariant, template

matching, and appearance-based and address one of the difficulty of face detection is

lighting source varies significantly. Several methods [4-18] had been proposed by now.

Some [4-6] use eigenface, neural network and support vector machine (SVM) to

detect faces of restricted poses under normal lighting condition. Chow [7] proposed an

approach to locate facial features (such as eyes, eyebrows) by using an eigenmask

under different lighting conditions. However, the method fails to locate the facial

features for some kinds of poses (such as near-profile and profile face). Shin and

Chuang [8] uses shape and contour information to locate face in a plain background

and normal lighting condition. However, it is difficult to detect the predefined shape

of a face in a complex background and various lighting conditions. Wu et al. [9]

proposed an algorithm to locate face candidates using eye-analogue segments

information. The algorithm would fail when a person wears glasses. Some methods

[10-12] use mosaic, edge and mathematical morphology to detect eyes, however,

these methods would fail to locate eyes when a face is under a poor lighting condition

(such as bias-light source). Wang [13] proposed a boosted based algorithm to detect a

face. But the training phase of the method is time consuming and it is only designed

(18)

performance in terms of detection rate. However, the detection would fail under

extreme lighting condition.

As mentioned above, extreme lighting conditions (such as colored lighting

source and bias lighting direction), different subject poses, glasses, races, and

complex background are factors to make face detection difficult. To solve the problem

caused by these factors, we will propose a method to extract the horizontal eye line of

a face and some eye-like regions under various environments. The extracted eye line

and eye-like regions can be further used to help extract the true positions of eyes and

face. The proposed method contains several steps. First, we will use skin colors to

extract candidate skin regions. Next, an eye-like region detector based on intensity

and color information will be provided to explore all possible eye-like regions in each

candidate skin region. A lighting distribution based algorithm will then be presented to

remove some false eye regions. Finally, based on the extracted eye-like regions, the

horizontal eye line for a candidate skin region can be located using the gray level

distribution of the skin region. The eye line is used to detect a face with/without

(19)

1.2 THE MAIN PROBLEMS AND CURRENT STATE

In this dissertation, we will provide methods to deal with the problems of face

detection including: facial feature extraction, poses identification and face detection.

These three problems are defined as follows:

(1) Facial feature extraction: given an image, to develop a method extracting facial

features. It is an important process for face detection since different lighting

environments will impact the performance of face detection.

(2) Pose identification: given a face candidate, to develop a method to automatically

identify what pose of the face.

(3) Face detection under various lighting environment: given an image, to develop a

method to locate the face under various lighting environment.

Actually, the three problems are logically sequenced. The solutions to these three

problems can be used as the basic components for developing a fully automatically

face system. As mentioned above, most of research efforts have been spent on face

(20)

1.2.1 The Main Problem of Face Detection

human face detector under various environments is still a challenging task. Some

factors make face location difficult. One is the variety of colored lighting sources,

another is that facial features such as eyes may be partially or wholly occluded by

shadow generated by a bias lighting direction; others are races and different face

poses with/without glasses. As mentioned above, extreme lighting conditions (such as

colored lighting source and bias lighting direction), different subject poses, glasses,

races, and complex background are factors to make face detection difficult. To solve

these problems, we propose a method to detect a face with/without glasses under

various poses and environments. Fig. 1.1 shows the block diagram of the proposed

face detector. First, we use skin colors to extract candidate skin regions. Next, shape

information is used to judge whether a skin region contains a shoulder part. If yes, a

skin projection based method will be applied to remove it and the head part is labeled

as a face candidate for further processing. For each face candidate, a set of geometric

constrains are firstly applied to judge if the face candidate is a profile face. If no,

some eye-like rectangles are first extracted. Then, based on these eye-like rectangles,

a horizontal eye line and a vertical symmetric line are located to judge if the face

(21)

A color image Skin region detection

Is a profile face ? Face location Eye-like rectangle location Vertical symmetric line detection End Non-profile face detection Shoulder removing Is a face found No face True eye labeling Horizontal eye line detection no yes no yes ?

Fig. 1.1. The block diagram of the proposed face detector.

1.3 SYNOPSIS OF THE DISSERTATION

The rest of the dissertation is organized as follows. Chapter 2 describes the

proposed horizontal eye line detection and eye-like region detection method. The

face-pose identification method and face detection method with different poses under

vary environment are proposed in Chapter 3. Some conclusions and future research

(22)

CHAPTER 2 A NOVEL METHOD FOR

HORIZONTAL EYE LINE DETECTION

UNDER VARIOUS ENVIRONMENTS

In this chapter, we will propose a method to extract the horizontal eye line of a

face and some eye-like regions under various environments. It contains several steps.

First, we will use skin colors to extract candidate skin regions. Next, an eye-like

region detector based on intensity and color information will be provided to explore

all possible eye-like regions in each candidate skin region. A lighting distribution

based algorithm will then be presented to remove some false eye regions. Finally,

based on the extracted eye-like regions, the horizontal eye line for a candidate skin

region can be located using the gray level distribution of the skin region.

2.1 INTRODUCTION

In a drowsy driving warning system, driver’s mental condition can be revealed

by measuring the eye movement from the eye location [1]. How to locate the eyes of a

driver under extreme lighting condition is an important issue for a successful

intelligent transportation system. Detecting the eye line can help locate eyes.

(23)

detection has been studied for more than 20 years, developing a human face detector

under various environments is still a challenging task. Some factors make face

location difficult. One is the variety of colored lighting sources, another is that facial

features such as eyes may be partially or wholly occluded by shadow generated by a

bias lighting direction; others are races and different face poses with/without glasses.

In order to overcome this problem, Hjelmas and Low [2] have presented an extensive

survey of feature-based and image-based algorithms for face detection. In

feature-based methods, the most widely used technique is skin color detection based

on one of the color models. Most Image-based approaches are based on

multiresolution window scanning to detect faces at all scales, making them

computationally expensive. Also, Yang [3] classifies the approaches into four

categories: knowledge-based, feature invariant, template matching, and

appearance-based and address one of the difficulty of face detection is lighting source

varies significantly. Several methods [4-18] had been proposed by now. Some [4-6]

use eigenface, neural network and support vector machine (SVM) to detect faces of

restricted poses under normal lighting condition. Chow [7] proposed an approach to

locate facial features (such as eyes, eyebrows) by using an eigenmask under different

lighting conditions. However, the method fails to locate the facial features for some

(24)

and contour information to locate face in a plain background and normal lighting

condition. However, it is difficult to detect the predefined shape of a face in a complex

background and various lighting conditions. Wu et al. [9] proposed an algorithm to

locate face candidates using eye-analogue segments information. The algorithm

would fail when a person wears glasses. Some methods [10-12] use mosaic, edge and

mathematical morphology to detect eyes, however, these methods would fail to locate

eyes when a face is under a poor lighting condition (such as bias-light source). Wang

[13] proposed a boosted based algorithm to detect a face. But the training phase of the

method is time consuming and it is only designed for frontal face. Shih [14] proposed

a color based method shows reasonable performance in terms of detection rate.

However, the detection would fail under extreme lighting condition.

As mentioned above, extreme lighting conditions (such as colored lighting

source and bias lighting direction), different subject poses, glasses, races, and

complex background are factors to make face detection difficult. To solve the problem

caused by these factors, we will propose a method to extract the horizontal eye line of

a face and some eye-like regions under various environments. The extracted eye line

and eye-like regions can be further used to help extract the true positions of eyes and

(25)

2.2 THE PROPOSED METHOD

The proposed method contains several steps. First, we will use skin colors to

extract candidate skin regions. Next, an eye-like region detector based on intensity

and color information will be provided to explore all possible eye-like regions in each

candidate skin region. A lighting distribution based algorithm will then be presented to

remove some false eye regions. Finally, based on the extracted eye-like regions, the

horizontal eye line for a candidate skin region can be located using the gray level

distribution of the skin region.

2.2.1 The Proposed Skin Region Detector

In several face detection systems [15], skin color plays a major role for

segmenting face regions from an image. There are several color models utilized to

label pixels as skin, these include RGB, HSV and YCrCb color spaces [16-18]. Since

a face image is usually taken under unpredicted lighting conditions, RGB color-model

is sensitive to light and unsuitable for representing skin color. In YCbCr model, Cr

represents the red degree [15]. Since human skin color tends to red, we will adopt

(26)

2.2.1.1 Skin Region Extraction

Some skin detectors [14, 15] work on skin-color distribution modeling. Different

illumination levels such as indoor, outdoor, highlight and shadows will change the

color of skin. However, the size of the training set and variety of samples used in the

training may be impact on the skin detector performance [15]. That means the trained

skin model may be required extensive training and fine tuning to archive an optimal

performance for different training samples. In this dissertation, the skin detector

works on explicit skin-color space thresholding.

As mentioned above, human skin color tends to red despite the race, we will use Cr

value to determine whether a pixel is skin or not. If a pixel’s Cr value is larger than a

predefined threshold, then it will be considered as a skin pixel. In this dissertation, a

threshold Crth1 is adopted to bound skin color. On the other hand, the Cr value for skin

pixels will be lowered under a high lighting with blue color biased source

environment, another lower threshold Crth2 will also be used. That is, for an input

image, we first use Crth1 (here, 142 is used) as threshold to get candidate skin regions.

Then, for the same input image, we use Crth2 (here, 132 is used) as threshold to get

another candidate skin regions. Fig. 2.1 shows some results of applying the skin

region extractor on different racial faces. From this figure, we can see that whatever

(27)

pixels). After skin pixels are extracted, those connected skin pixels are grouped to

form a set of candidate skin regions, each region is bounded by a red rectangle (see

Figs. 2.1(b) and 2.1(c)). From Fig. 2.1, we can see that some extracted candidate skin

regions are not true skin. In the next section, we will present a filter to remove some

false skin regions.

(a) Four images taken under normal, bias colored and high lighting environments.

(b) The extracted candidate skin regions using threshold 142.

(c) The extracted candidate skin regions using threshold 132.

Fig. 2.1. Some results of applying skin region extractor on different races under various lighting environments.

2.2.1.2 Skin Region Filter

The skin region filter is designed to remove false skin regions based on four

(28)

used to remove several false regions. The detail is described as follows.

(1) Size: A candidate skin region will be removed if the width or height of its bounding rectangle is less than a threshold Sth. In this dissertation, we set Sth to be

15 (see Figs. 2.2(b) and 2.2(d)). On the other hand, if the width of its bounding rectangle is larger than a half of the input image width, the region will also be removed.

(2) Shape: In general, the shape of a face should be like an ellipse. Thus, if a region looks like a horizontal or vertical thin bar, it should be removed (see Fig. 2.2(b)). Note that if a region with Hs/Ws  Vth, it is considered as a vertical thin bar. On

the other hand, if a region with Ws/ Hs  Hth, it is considered as a horizontal thin

bar. Here, Hs and Ws are the height and the width of a region, and in the

dissertation, we set Vth to be 2.5 and Hth to be 0.5.

(3) Uniform-Content: A uniform content region should not be a face due to that the eyes and eyebrows are in face. Thus, if the standard derivation of the gray values in the bounding rectangle of a candidate skin region is less than a predefined threshold Uth (10), the region will be removed (see Fig. 2.2(f)).

(4) Geometric relation: If two candidate skin regions crossover and the small one is less than a half of the big one, then the small one is removed (see Fig. 2.2(g)). On the other hand, if a small region is totally covered by another big region, then both regions are preserved (see Fig. 2.2(g)). For this case, in the later process, the small one is first used to detect the eye line. If an eye line is found, the detection for the big one will be skipped.

(29)

Vertical shape region

Small size regions

(a) A face image. (b) Candidate skin regions using 132 as threshold.

Small size regions

(c) The result of removing regions of small sizes and improper shapes in (b).

(d) Candidate skin regions using 142 as threshold.

(e) The result of removing small regions in (d).

(f) The result of merging (c) and (e) and imposed on (a).

(g) The result of removing uniform content regions and cross over regions in (f). Fig. 2.2. Some examples for illustrating the skin region filter.

(30)

2.2.2 The Proposed Eye-Like Region Detector

In this section, we will provide a method to extract eye-like regions from those

remaining candidate skin regions. Fig. 2.3 shows the block diagram of the proposed

eye-like region detector. First, based on intensity and skin color information, some

eye-like regions are extracted. Next, some false regions are removed. Since the true

eye may not be located in the center of the eye-like region, a refining algorithm is

presented to adjust the eye-like region. Finally, some false eye-like regions appearing

in hair or shadow part will be removed, and isolated eye-like regions are also

removed. A set of eye-like regions The bounding rectangle of a candidate skin region Intensity based eye detection Non-skin color based eye detection Eye-like region merging and false region removing Eye position refining

Hair / Shadow and isolated region

removing

Fig. 2.3. The block diagram of the eye-like region detector.

In [7], luminance contrast and eye shape are used to detect an eye. However, under

various lighting environments, unpredicted shadow appearing on face makes eye

detection difficult. Fig. 2.4 shows some face images taken under various lighting

environments. To treat this problem, we use two fundamental eye properties to extract

eye-like regions. The first property is that the pupil and iris’s gray values are lower

(31)

provided to locate eye-like regions in the bounding rectangle of each candidate skin

region. The second property is that under non-colored lighting source condition, the

eye color will be very different from skin color. Based on this property, we consider

those small non-skin color regions in the bounding rectangle of a candidate skin

region as eye-like regions.

(a) The normal lighting condition. (b) Top lighting source.

(c) Colored and left bias lighting source.

(d) High lighting source.

Fig. 2.4. Subjects taken under various lighting environments.

2.2.2.1 Intensity Based Eye-Like Region Extraction

After extracting candidate skin regions, an intensity based detector will be first

provided to extract eye-like regions from the bounding rectangle of each candidate

(32)

The bounding rectangle of a skin region t1=t1_MAX Bi-level thresholding using t1 Candidate eye-like region finding False region removing Reserve the remaing eye-like regions t1=t1-10 t1 < t1_MIN A set of eye-like regions Yes No

Fig. 2.5. The flowchart of the intensity based eye-like region detector.

Based on the fact that the pupil’s gray value are lower than skin gray values, the

detector adopts a bi-level thresholding to get the eye-like regions. First, for a

candidate skin region, several different threshold values, t1, are used for bi-level

thresholding, the maximum and minimum of t1 are named as t1_MAX and t1_MIN,

respectively. t1_MAX is set as the average gray value of 10% pixels with maximum

gray values in the candidate skin region and t1_MIN is set to be the average gray

value of 5% pixels with minimum gray values in the candidate skin region. At

beginning, a bi-level threshold operator with t1_MAX as the initial threshold is

applied to extract eye-like regions, and then the operator is applied literately with t1

reduced by a value Ith (here, 10 is used) each time until t1 reaches t1_MIN. Fig. 2.6

shows an example for evaluating t1_MAX and t1_MIN, and the result of applying the

(33)

t1_Min t1_Max t1=146 t1=86 t1=56

(a) A face region.

(b) The histogram of detected gray pixels within the skin region.

(c) Bi-level thresholding results of (a) using different thresholds.

Fig. 2.6. An example of the bi-level thresholding results.

After applying a bi-level thresholding operator at each time, every resulting black

region is considered as a candidate eye-like region. However, some of these black

regions are false ones. Here, we provide three filters to remove these false eye-like

regions. One is geometric filter, which is used to remove too small or too large regions.

A black region with size less than Sth2 (here, Sth2 is set as 5x5) or its width (height)

larger than Sth3 (here, we set Sth3 to be 1/4 skin region width) will be removed. One is

statistical filter, if the standard derivation of the gray values of an eye-like region is

lower than a predefined threshold, the region is considered as a false eye-like region

and then removed. Here the threshold we chose is 10. From the structure of an eye, we

can see that the iris and pupil’s gray levels are lower than skin. Based on this property,

the third filter called projection filter is provided to remove each black region with its

average gray value larger than that of the upper or lower neighboring areas. A

(34)

of the highest peak above or below the position, we removed the region. Fig. 2.8

shows the final result of applying the intensity based eye-like region detector to a face

image, each extracted eye-like region is enclosed by a red rectangle.

Eye-like region

Projection region

a

(a) The projection region extended from eye-like region.

(b) The projected histogram of the projection region in (a).

Fig. 2.7. An example of the horizontal gray value projection for an eye.

(a) The result of a normal lighting source face.

(b) The result of a colored and left bias lighting source face.

Fig. 2.8. The results of applying the intensity based algorithm.

2.2.2.2 Non-Skin Color Based Eye Detector

Under a very low lighting environment, an eye-like region extracted by intensity

(35)

information to locate eyes. First, non-skin color regions within a candidate skin region

are considered as eye-like regions. The three false eye-like region filters described in

Section 3.1 are then used to remove some false regions. Fig. 2.9(a) shows the result of

applying the non-skin color based eye detector. Each eye-like region is enclosed by a

green rectangle. From this figure, we can see that the left eye was precisely located.

Fig. 2.9(b) shows the combined result of Fig. 2.8(b) and Fig. 2.9(a).

(a) The eye-like regions located based on non-skin color information.

(b) The combined result of Fig. 2.8(b) and (a).

Fig. 2.9. The result of applying the proposed intensity and non-skin color based eye detectors.

2.2.3 False Eye-Like Region Removing

The obtained eye-like regions still contain some false regions; we can classify these

regions into five classes: overlapped, hair reflecting, beard/clothes, isolated, and

forehead-hair (see Fig. 2.10). In the following, we will provide some procedures to

(36)

Hair reflecting Overlappe d Isolated / overlapped Beard / clothes Hair reflecting Forehead -hair Overlapped Beard / clothes

(a) An example to show four classes of false eye-like regions.

(b) Another example to show forehead - hair class.

Fig. 2.10. Five false region classes.

2.2.3.1 Overlapping/ Hair Reflecting/ Beard/ Clothes False Region Removing

For the overlapping class, if one eye-like region totally covers a small eye-like

region, the covering region is removed. To remove hair reflection regions, first, a

horizontal gray level projection histogram (see Fig. 2.11(b)) is created. Then, the first

peak location from the top of the projection histogram is defined as an upper line

which usually indicates the forehead. Each eye-like region intersecting the upper line

is removed. For the beard/clothes class, we first define a bottom line with distance h/2

from the upper line, where h is the distance between the upper line and the bottom of

the skin region. Then all eye-like regions below the line are removed. Fig. 2.11(d)

(37)

The first peak

(a) A bounding rectangle of a candidate skin region.

(b) The horizontal projection histogram of the bounding rectangle in (a).

h/2

Upper line

Bottom line h

(c) The detected upper and bottom lines.

(d) The result after applying the removing procedure presented in Section 2.2.3.1 to (a).

Fig. 2.11. The result of applying the false eye-like removing procedure.

2.2.3.2 Eye Position Refining And Isolated False Region Removing

For a non-rotated human face, an eye pair should be located at a near-horizontal

line. Thus for an eye-like region, if we can not find another region at its left or right,

then this region is called isolated region and should be removed. However, under

various environments, a true eye may not be located at the center of the extracted

eye-like region (see Fig. 2.12(a)), we are then unable to know where the eye is. In

order to treat this problem, we provide an eye position refining method to locate the

(38)

into three classes according to the lighting condition: normal, high light and low light.

Next, one image is selected randomly from each class. Considering the effect of

glasses, the normal class has two images selected with/without wearing glasses. For

each image, three templates are extracted (see Table 2.1). An eye area including eyelid

is taken as the first template. In the first template, an area only containing eye is then

taken as the second template. In the second template, an area only containing eye ball

is taken as the third template. After obtaining the twelve templates, we use each

template to mask each eye-like region by sliding the template pixel by pixel in the

region to find the best matched area for each class. Covariance is used to measure the

similarity. If an eye-like region with the located eye center does not lie in the true eye,

we take three new templates from the image containing the region using the previous

method. The procedure is repeated until all eye-like regions are processed. After the

procedure is finished, we find that nine new templates (see Table 2.1) belonging to

three new classes are extracted. These three new classes are left-side (the eye sees the

left side), right-side (the eye sees the right side) and bias lighting. Since the images in

HHI have different face sizes, the extracted templates will also have different sizes.

After we make the eye templates, a mentioned similarity matching procedure is

adopted to find the best matched area for testing image. We use the similarity function

(39)

n B b A a B A n i mean i mean i



    1 ) )( ( ) , cov( , (2-1)

where A[a₁,a₂,...,a_n] and B[b₁,b₂,...,b_n] are two areas, and A_mean and B_mean

are the mean values for each area. Thus, we can obtain six best matched areas for each

eye-like region. Then the center of the best matched area with the highest Cb/Cr value

(the most possible non-skin area) is considered as the best eye center in the eye-like

region (see the red points in Fig. 2.12(b)). Finally, all eye-like regions are refined to

be a wh region centering at the best eye center. In the dissertation, we set w to

30, h to 10 (see the green rectangles in Fig. 2.12(b)). After an eye-like region is

refined, we judge if it is an isolated region. If yes, the region will be removed. Fig.

2.13(b) shows the result of removing those refined isolated false regions.

Table 2.1 The templates for six eye classes.

Eye Type Template Images

Normal High light Low light Left-side Right-side Eye ball

(40)

(a) An example to show an eye not located at the center of the eye-like region.

(b) The result of applying the eye position refining procedure to (a).

Fig 2.12. The result of applying the eye position refining procedure.

(a) An example to show isolated regions.

(b) The result of removing the isolated regions in (a).

Fig. 2.13. The result of removing isolated false regions.

2.2.3.3 Forehead-hair false regions removing

Before describing the proposed forehead-hair false region removing algorithm, we

will define the bounding rectangle for a face. Let R be the minimum rectangle

containing all eye-like regions, then R is expanded in the vertical direction to form a

square rectangle (see Fig. 2.14). The square rectangle is defined as the bounding

(41)

(a) The minimum rectangle containing all eye-like regions.

(b) The expanding result of (a).

Fig. 2.14. An example to illustrate the bounding rectangle of a face.

In general, pixels in hair or shadow regions caused by a bias lighting have lower gray

levels than pixels in a face. Based on this property, we find that if the bounding

rectangle of a face contains a part of lower gray region (hair or side shadow), its

gray-level histogram, h(x), will be a bi-model distribution; else the histogram will be a

uni-model distribution (see Fig. 2.15). To distinguish these two models, an average

probability horizontal line (APL) Y = T with



  255 0 ) ( 256 1 T x x

h is first defined. If the

number of the intersection line segments between the APL and the histogram is two, a

bi-model distribution is identified (see Fig. 2.15(a)). Otherwise, if only one line

segment exists, a uni-model distribution is identified (see Fig. 2.15(b)). For a face

bounding rectangle, those eye-like regions appearing in the hair area should be

removed. Based on the above discussion, a procedure is proposed to remove

forehead-hair false regions. To identify the hair part, we set the initial bi-level

threshold t0 as average gray value of the face rectangle for a bi-model histogram. If

(42)

       



 255 0 ) ( % 5 ) ( min x x h t ACC t with



  t x x h t ACC 0 ) ( )

( . The pixels with gray levels

less than t0 are labeled as black pixels. However, the threshold may be improper such

that over-segmentation or under-segmentation (see Fig. 2.16) will occur. To treat this

problem, the threshold t0 will be adjusted according to the following procedure. After

applying the bi-level thresholding, if no two eye-like regions contain over Oth (30%)

black pixels, an over-segmentation is detected (see Fig. 2.16(b)). The t0 should be

adjusted to be larger. The adjustment rule is to take the nearest right valley to the

current t0 on the histogram as a new threshold t0. If there is no valley is found, then

the t0 will be adjusted by increasing a threshold value (Nth), here Nth is set to be 5.

Bi-model (a) A bias lighting face and its corresponding histogram.

Uni-model (b) A normal lighting face and its corresponding histogram.

(43)

(a) Original image. (b) An over-segmentation obtained after applying the threshold process to (a).

(c) Original image. (d) An under-segmentation obtained after applying the threshold process to (c). Fig. 2.16. Two examples to show the over-segmentation and under-segmentation.

On the other hand, if all eye-like regions are located in a big black area, an

under-segmentation is detected (see Fig. 2.16(d)), t0 will be refined to smaller. The

adjustment rule is to take the nearest left valley to the current t0 on the histogram as a

new threshold t0. If there is no valley was found, then the t0 will be decreased by Nth.

The above procedure is repeated until no over or under segmentation occurs. In order

to prevent a dangling situation, any used valley will be removed to avoid reusing.

Then the bi-level thresholding operator with the final threshold is applied to identify

the dark regions in the face bounding rectangle (see Fig. 2.17(b)). If a dark area is hair,

it should be large in the face bounding rectangle. If a dark area is iris, it should be

small. So, for a dark region with area larger than

4 1

(44)

consider it as a hair region, and all eye-like regions in the hair region are removed. In

addition, if a dark region has the ratio of the height over the width larger than a

threshold 2.5, we will remove the region. On the other hand, since the iris and eyelids

have lower gray level than the other part of an eye, if an eye-like region contains less

than 3% dark pixels, the region is also removed. Note that any eye-like region

locating at the shadow part will also be removed by the above procedure (see Fig.

2.17(c)). To resume these regions, we use a previously mentioned fact if an eye is

found, then the other eye should be found at its right side or left side. For each

remaining eye-like region, R, we first draw three lines, L1, L2 and L3, passing

through the center of R. L1 is a horizontal line, the angles between L1 and L2 (or L3)

is a predefined angle Ath (here, 10 is used) (see Fig. 2.18(a)) for a bi-model

histogram face. Then a bounded range, BR, is defined to be the area bounded by L2

and L3. Based on the bounded range, all eye-like regions in the bounded range

removed by the forehead-hair false region removing procedure will be got back (see

Fig. 2.18(b)). Now the remaining eye-like regions are used to form a new bounding

rectangle (see the dotted rectangle in Fig. 2.18(b)) which will be used to determine the

(45)

(a) The eye-like regions before applying the removing procedure.

(b) The result of applying the bi-level threshold to (a).

(c) The result of removing the forehead - hair false regions in (b).

Fig. 2.17. The result of applying the forehead-hair false region removing procedure.

0 10  0 10  L2 L1 L3 R w w/4

(a) The bounded ranges for resuming the removed eye-like regions in the shadow area of a face.

(b) Three eye-like regions (marked by white rectangles) got back.

Fig. 2.18. An example to explain the eye-like region resuming procedure.

2.2.4 The Proposed Horizontal Eye Line Detector

In this section, we will propose a novel eye line detection algorithm. Because

eyes and eyebrows are regions with low gray-level, their locations will correspond to

(46)

algorithm can be reduced as a valley finding procedure on the horizontal gray value

projection histogram. In the valley finding procedure, all valleys are taken first, then

the small valleys with value smaller than a threshold Dth (here, we set Dth to be 3) will

be removed. If some valleys are located at similar locations with distance between

two neighboring ones smaller than a threshold Bth (here, Bth is set to be 3), the deepest

valley will be kept, others are discarded.

The detail is described as follows. For a given bounding rectangle of a face, if there

is only one valley in the projection histogram and there is a pair of eye-like regions

near the valley (see Fig. 2.19(a)), then the horizontal line (the green line in Fig.

2.19(a)) is defined as the line passing through the vertical middle location between the

two eye-like regions. If there are two valleys, and if the distance between these two

valleys is less than a threshold and there are at least one pair of eye-like regions near

the two horizontal lines passing through these two valleys, respectively, then the

upper one (the red line in Fig. 2.19(b)) is considered as the eyebrow line and the other

(the green line in Fig. 2.19(b)) is the eye line. If there are 3 or 4 valleys, we will

execute the following procedure (called procedure A). First, we define the distance

between the first and second valleys as D1 and between the second and third valleys

(47)

Condition 1: D1s1D2. Condition 2: D1s2D2.

Note that s1 and s2 are set as 1.2 and 3.0 in the dissertation.

If valleys satisfy Condition 1 and there exist two eye-like regions near the second

valley line, then the second valley line (the green line in Fig. 2.19(c)) is considered as

eye line. On the other hand, if Condition 2 is satisfied and there exist two horizontal

eye-like regions near the first valley line, then the first valley line (the green line in

Fig. 2.19(d)) is considered as the eye line. If both conditions are not satisfied, we will

apply procedure A described above to the next three valleys (the second, third and

fourth). If no eye line is found, we conclude that the region is not a face. If an eye line

is found, any eye-like region which distance from the eye line larger then 1.5 height of

the eye-region will be removed. Then, the remaining regions will be used to outline a

face rectangle where the left/right position is the most left/right region. The top

position of the rectangle is defined as the upper location of the most top region from 3

height of an eye-like region. And the height of the rectangle is defined as 1.3 width of

(48)

(a) Only one valley case. (b) Two valleys case.

(c) D1s1D2 with two eye-like regions near the second valley line.

(d) D1s2D2 with two eye-like regions near the first valley line. Fig. 2.19. Some examples for the horizontal projection.

2.3 EXPERIMENTAL RESULTS

In order to show the effectiveness of the proposed method, we apply the

proposed method to HHI face database [19] of 206 images (see Fig. 2.20) and

Champion face database [20] of 227 images (see Fig. 2.21). We also collect some

other images from Internet, MPEG7 video clips, PIE database [21] to evaluate the

performance. These contain face images of different racial persons under different

kinds of lighting conditions (such as overhead, side and color lightings). Since one of

our main applications of the method is for a drowsy driving warning system and in

most cases, a driver’s head position is usually frontal or near profile. Since the head is

rarely in-plane rotated when people are drivning, in dissertation only detect the

(49)

implemented by Java language on a laptop with Pentium(R) M processor 1.4 G Hz

CPU. In Fig. 2.21, some successful results of applying our method to some faces with

different skin colors. The successful results for faces with different poses and eye

glasses are shown in Fig. 2.20. Even there is shadow on a face; the eye line can also

be detected (see Fig. 2.22).

Fig. 2.20. A part of images from HHI face database with different poses and eye-glasses.

Fig. 2.21. A part of images from Champion face database with different skin colors.

Fig. 2.22. The detection results for persons with shadows and various lighting sources on their faces.

(50)

If the detected eye line is located under eye brow and near eyes, we consider it to be a correct detection; otherwise it is an error detection. The tolerance value is the height of an eye-like region. If no horizontal eye line is detected for a face image, we call it a missing detection. Fig. 2.23 shows the successful detection results for a set of images from the Internet, MPEG7 video clips and PIE database images. For HHI database, the correct detection rate is 95.63% and the correct detection rate on Champion database is 94.27%. Fig. 2.24 shows the error and missing detection results.

Fig. 2.23. The detection results of face images collected from the Internet, MPEG7 video clips and PIE database.

(51)

CHAPTER 3 A NOVEL FACE DETECTION METHOD

UNDER VARIOUS ENVIRONMENTS

In this chapter, we will propose a method to detect a face with different poses

under various environments. It consists of three phases: profile feature extraction,

non-profile face detection and the true eye locating.

3.1 INTRODUCTION

Detecting a face from an image is the first step for face applications. Even it has

been studied for many years; detecting a face under various environments is still a

challenge work. Some factors make face detection difficult. One is the variety of

colored lighting sources, another is that facial features such as eyes may be partially

or wholly occluded by shadow generated by a bias lighting direction; others are races

and different face poses with/without glasses. Several methods [4-18] had been

proposed to detect faces. Some [4-6] use eigenface, neural network and support

vector machine (SVM) to detect faces of restricted poses under normal lighting

(52)

properties are used to extract the facial features and applied to verify a face. Chow et

al. [7] proposed an approach to locate a face candidate under different lighting

conditions and then applied an eigenmask based method to verify the face. However,

these methods fail to locate faces with some kinds of poses (such as near-profile and

profile face). Shin and Chuang [8] uses shape and contour information to locate face

in a plain background and normal lighting condition. However, it is difficult to detect

a face in a complex background and various lighting conditions. Wu and Zhou [9]

proposed an algorithm to locate face candidates using eye-analogue segment

information. The algorithm would fail when a person wears glasses. Some methods

[10-12] use mosaic, edge and mathematical morphology to detect eyes, however,

these methods would fail to locate eyes when a face is under a poor lighting

condition (such as bias-light source).

3.2 THE PROPOSED METHOD

As mentioned above, various lighting conditions, different head poses, glasses,

races, and complex background are factors to make face detection difficult. To solve

these problems, we propose a method to detect a face with/without glasses under

various poses and environments. Fig. 3.1 shows the block diagram of the proposed

(53)

information is used to judge whether a skin region contains a shoulder part. If yes, a

skin projection based method will be applied to remove it and the head part is

labeled as a face candidate for further processing. For each face candidate, a set of

geometric constrains are firstly applied to judge if the face candidate is a profile face.

If no, some eye-like rectangles are first extracted. Then, based on these eye-like

rectangles, a horizontal eye line and a vertical symmetric line are located to judge if

the face candidate is a non-profile face.

A color image Skin region detection

Is a profile face ? Face location Eye-like rectangle location Vertical symmetric line detection End Non-profile face detection Shoulder removing Is a face found No face True eye labeling Horizontal eye line detection no yes no yes ?

Fig. 3.1. The block diagram of the proposed face detector.

The remaining of this chapter is described as follows. In Section 3.2.1, a shape

based algorithm is proposed to identify a profile face. Section 3.2.2 presents an

algorithm to detect a non-profile face. Section 3.2.3 provides the true eye locating

(54)

3.2.1 Profile Detection

In this section, we will provide a method to determine if an image contains a

profile face. First, the skin region detector proposed by Jing and Chen [22] is

adopted to extract skin regions from an input image. Fig. 3.2 shows the extracted

results from some face images with different poses taken under various lighting

environments. Note that some pixels in clothes with color similar to skin are also

labeled as skin. In the following, we will provide a method to remove these pixels.

(a) Three images taken under various lighting environments.

(b) The extracted skin regions (white pixels) using Jing-Chen skin region detector. Fig. 3.2. Some results of applying Jing-Chen skin region extractor.

3.2.1.1 Shoulder Removing Procedure

Fig. 3.3 shows an example of two men, one in Fig. 3.3(a) wears a T-shirt of

(55)

3.3(c) and 3.3(d) show the results of applying Jing-Chen method to Figs. 3.3(a) and

3.3(b), respectively. In Fig. 3.3(c), pixels on shoulder are labeled as skin pixels. Here,

a projection-based procedure is provided to remove the shoulder part.

(a) A man wearing a T-shirt of skin-like color.

(b) A man wearing a T-shirt of non skin-like color.

(c) The detected skin region of (a). (d) The detected skin region of (b).

Neck point

(e) The horizontal projection of (c). (f) The horizontal projection of (d). Fig. 3.3. An example showing two men wearing clothes with/without skin-like color.

First, we need to determine whether a skin region includes a shoulder part. In

人臉偵測之研究

國

立

交

通

大

學

資訊科學與工程研究所

博 士 論 文

人臉偵測之研究

A Study on Face Detection

研 究 生: 井 民 全

指導教授: 陳 玲 慧 教授

人臉偵測之研究

A Study on Face Detection

研 究 生：井民全

Student: Min-Quan Jing

指導教授：陳玲慧

Advisor: Dr. Ling-Hwei Chen

國 立 交 通 大 學

資 訊 科 學 與 工 程 研 究 所

博 士 論 文

中華民國九十八年七月

人臉偵測之研究

研究生:井民全 指導教授:陳玲慧博士

國立交通大學資訊學院 資訊工程系

摘要

近年來

數位安全監控技術的普遍發展

使得完全自動化的人臉

系統 (fully automated face system) 越來越受到重視。人臉偵測是自動

化人臉系統的基石，任何人臉應用系統都需要有一個強固的人臉偵測

方法，才能達到預期的效果，因此人臉偵測的研究與分析便顯得重要

而且迫切。雖然人臉偵測方法的研究已經超過 20 年，也有許多成果

已經發表。可是對於多變化環境下的人臉偵測，特別是針對不同人臉

角度 、不同光源方向與有顏色光源的人臉位置偵測方法，依然是一

項十分具挑戰性的研究領域。

一般人臉偵測方法的主要架構大致是由下列幾項要素組成: 人臉

候選位置的判定、臉部特徵的擷取與人臉位置的偵測。可是由於多變

化的環境，導致人臉偵測的困難。例如在研判人臉可能出現的位置

時，就可能受取像時的環境條件所影響，有顏色的光源與不同角度的

側面光源，如不加以考量就會影響人臉候選位置的判定。另外由於攝

影機與人臉之間的取像角度，使得人臉在影像中所呈現的姿勢可能為

正面、接近正面、半側面、側面等，一些重要的臉部特徵如眼睛，卻

可能會因為影子、眼鏡和人臉姿勢的關係導致偵測錯誤。而不同的臉

部表情也會產生不同的臉部呈現樣式。這些因素增加了人臉偵測的困

難性。本論文之主要目的在於提出一個解決上述問題的人臉偵測方

法。

在本論文中，我們首先將會提出一個在多變化環境下的眼睛水平

線偵測方法，利用膚色資訊將人臉可能所在的區域標記出來，然後根

據人類眼睛擁有的非膚色特性與高亮度(gray-level)變化特性，將眼睛

的位置標示出來，最後利用人臉的幾何特性將眼睛水平線標記定位。

這個方法主要的貢獻在於提出一個有效的方法，取得多變化環境下的

眼睛候選位置與水平線資訊。接下來，利用得到眼睛水平線位置與眼

睛可能位置座標，發展一個在多變化環境下不同姿勢的人臉偵測方

法。首先將膚色資訊所標記出來的人臉區域進行肩膀區域的判定與刪

除，接著利用定義好的側面人臉特徵幾何規則將人臉區域分成側面人

臉或非側面人臉區域。對於非側面的人臉候選區域，利用前述方法取

得眼睛水平線與眼睛可能位置座標。最後針對不同人臉姿勢與人臉亮

度分佈的特性，標記出真正的眼睛位置並藉此將人臉位置標示出來。

本論文所提出的方法，經過標準人臉資料庫實驗證實，可以解決人臉

偵測在多變化環境下所遭遇的問題。

A Study on Face Detection

ABSTRACT

誌 謝

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

ABBREVIATION

CHAPTER 1

INTRODUCTION

CHAPTER 2

A NOVEL METHOD FOR

HORIZONTAL EYE LINE DETECTION

UNDER VARIOUS ENVIRONMENTS









CHAPTER 3

博士論文

研究生: 井民全

指導教授: 陳玲慧教授

研究生：井民全

國立交通大學

資訊科學與工程研究所

博士論文

研究生:井民全指導教授:陳玲慧博士

國立交通大學資訊學院資訊工程系

角度、不同光源方向與有顏色光源的人臉位置偵測方法，依然是一

誌謝