Nighttime Face Detection Using Multi-Intensity IR Illuminator
(利用多重亮度紅外線打光器之夜間人臉辨識)

National Chiao Tung University

Institute of Multimedia Engineering

Master's Thesis

Student: Po-An Yao (姚柏安)

Advisor: Prof. Jen-Hui Chuang (莊仁輝)

Nighttime Face Detection Using Multi-Intensity IR Illuminator

Student: Po-An Yao

Advisor: Jen-Hui Chuang

National Chiao Tung University

Institute of Multimedia Engineering

Master's Thesis

A Thesis

Submitted to Institute of Multimedia Engineering

College of Computer Science

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Master

in

Computer Science

July 2012

Hsinchu, Taiwan, Republic of China

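Nighttime Face Detection Using Multi-Intensity IR Illuminator

Student: Po-An Yao Advisor: Dr. Jen-Hui Chuang

National Chiao Tung University

Institute of Multimedia Engineering, Master's Program

Abstract (Chinese)

In typical nighttime surveillance environments, insufficient lighting often makes it impossible to capture clear images of people at a distance. On the other hand, even when lighting is sufficient, a person who is too close to the camera may be overexposed, so that the facial features cannot be recognized. To address this, we use a multi-intensity infrared illuminator that periodically changes its light intensity to assist night-vision imaging and obtain images at multiple brightness levels, and we analyze this new kind of night-vision imagery, aiming to achieve both correct detection of distant people and clear recognition of nearby people. Because the multi-intensity illuminator produces faces of different brightness, we propose a brightness detection method to single out high-quality face images. Experimental results show that good face images can be detected and selected at different distances.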

Nighttime Face Detection Using Multi-Intensity IR Illuminator

Student：Po-An Yao Advisor：Dr. Jen-Hui Chuang

Institute of Multimedia Engineering

National Chiao Tung University

Abstract

In nighttime video surveillance, the image details of distant objects are often hard to identify because of poor illumination, while the image regions of nearby objects may be washed out by overexposure. To alleviate both problems simultaneously, we adopt a new multi-intensity infrared illuminator as a supporting light source that provides multiple illumination levels periodically. With multiple levels of illumination power, both far and near objects can be captured clearly. In this thesis, an effective algorithm is developed to pick out high-quality human faces from nighttime video sequences. Experimental results show that well-exposed face images can be detected and selected at different distances.

Acknowledgement

I would like to express my great appreciation to Dr. Jen-Hui Chuang for his patient guidance, enthusiastic encouragement, and useful suggestions on this thesis. His willingness to give his time so generously has been very much appreciated. I would also like to thank the members of my dissertation committee, Dr. Hsu-chun Yen, Dr. Fei-pei Lai, and Dr. Sheng-Jyh Wang, for their thoughtful insights and advice, which helped me develop this research.

I wish to acknowledge the help provided by all the members of the Intelligent System

Laboratory at National Chiao Tung University during the planning and development of this

research work. Special thanks also to all of those who supported me in any respect in

completing this thesis.

Lastly, I would also like to extend my thanks especially to my dear parents and girlfriend


List of Figures

Figure 1.1: Illustration of the use of integral image.
Figure 1.2: Haar-like rectangle features.
Figure 1.3: The rotated rectangle features and center-surround features.
Figure 1.4: Illustration of the rotated integral image.
Figure 1.5: The two basic types of diagonal filters [4].
Figure 1.6: Haar wavelet like features with flexible sizes and distances [6].
Figure 1.7: Some examples of images with "bright pupil" effect.
Figure 1.8: Periodic changes of the illumination intensities of a MIIR-illuminator.
Figure 1.9: Schematic diagram of the framework from [12].
Figure 1.10: Flowchart of this thesis.
Figure 2.1: Camera captures different ranges of surveillance depth.
Figure 2.2: Intensity waveform of MIIR-illuminator detected by photodiode which is displayed on an oscilloscope.
Figure 2.3: Distribution of average intensity of images.
Figure 2.4: The brightness of captured images changes periodically.
Figure 2.5: (a) The distribution of intensity is non-uniform in a captured image. (b) The human face near the central region of an image is brighter than the others, even though another person is nearer to the camera.
Figure 3.1: Two cycles of images captured with MIIR-illuminator.
Figure 3.2: A cascade of classifiers with N stages. Each classifier is trained to achieve a hit rate of h and a false alarm rate of f.
Figure 3.3 (a)–(d): The results of using different classifiers. On the left, red rectangles mark faces detected by the frontal face classifier described in [3]. On the right, red rectangles mark faces detected by our frontal face classifier.
Figure 3.4: Examples of overexposed and blurred faces.
Figure 3.5: The red rectangle regions are the results of the first layer classifier and the green rectangle regions are the results of different second layer classifiers. The latter (from left to right): eye pair classifier, eye classifier, nose classifier, and mouth classifier.
Figure 3.6: Human faces at different distances can be captured by adopting different illumination levels.
Figure 4.1: Human faces detected by the two-layer classifier.
Figure 4.2: Some examples of different brightness of face. (a) High brightness. (b) Medium brightness. (c) Low brightness.
Figure 4.3: (a) Original faces which are detected by the two-layer classifier in one illumination cycle. (b) The colors represent different ranges of gray values of a face.
Figure 4.4: The cheek region (green rectangle) used to represent the brightness of human face.
Figure 4.5: Some examples of intensity histograms of cheek regions.
Figure 4.6: The variation of brightness of the cheek region (see text).
Figure 4.7: (a) Histograms of HQ face images. (b) Histograms of cheek regions.
Figure 4.8: The selection of faces with acceptable quality in one illumination cycle.
Figure 4.9: The (a) size and (b) brightness adjustments of each image group: HQ (top), darker (middle), and lighter (bottom).
Figure 5.1: (a) Experimental environment. (b) Close-up view of IR camera and MIIR-illuminator.
Figure 5.2 (a)–(c): Face images captured using the MIIR-illuminator (left column) and a traditional IR illuminator (right column).
Figure 5.3: Results of face detection and face image selection under simple background environments.
Figure 5.4: Results under multi-person condition. (a) Results of face detection. (b) Results of face image selection.
Figure 5.5: (a) Face detection results under simple background and complex background environments obtained by using one-layer classifier. (b) Similar results obtained by using two-layer classifier.
Figure 5.6: Results of face detection and face image selection for one person under complex background environments. (a) The human face images are captured under six different illumination levels. (b) The results of grouping the detected face images.
Figure 5.7: Results of face detection and face image selection for multi-person under complex background environments. (a) The human face images are detected under six different illumination levels. (b) The results of grouping the detected face images.

List of Tables

Table 3.1: The efficiency improvement from pre-processing.
Table 3.2: The accuracies of different second layer classifiers.
Table 4.1: Histograms of HQ faces and their cheek.

Chapter 1. Introduction

1.1 Motivation

Video surveillance plays a vital role in crime investigation and prevention, and the recent growth in the number of surveillance cameras calls for automatic and semi-automatic security surveillance systems. Although many surveillance systems rely on visible light in the daytime, many more crimes are committed at night than during the day. One of the most important tasks of a surveillance system is to capture a clear image of a criminal's face, because it allows the police to catch the criminal quickly. However, nighttime videos often have low quality because of poor lighting conditions. For instance, faces of intruders far from a camera are usually hard to identify because of the limited power of the camera's infrared illuminator (IR-illuminator). On the other hand, faces of people close to a camera may become unclear because the image is overexposed under the strong light from an IR-illuminator. In fact, it is practically impossible to obtain good-quality images of all human faces at different distances with traditional IR illuminators. Hence, the goal of this thesis is to improve the quality of face images and to resolve the problems of underexposure and overexposure in nighttime video surveillance.

1.2 Review of related work

Face detection has been studied for a long time, and many techniques have been developed for this popular topic. According to the environments in which they are applied, these techniques can be roughly classified into two types: daytime and nighttime face detection.

1.2.1 Face detection in daytime

In recent years, the face detection technique developed by Viola and Jones [1] has enabled many successful face detectors. It uses the integral image for rapid computation of Haar-like features and trains the classifier with the AdaBoost algorithm. The integral image at point $(x, y)$ contains the sum of the pixels above and to the left, i.e.,

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y') \qquad (1)$$

where $ii(x, y)$ is the integral image and $i(x', y')$ is the original image. By using the integral

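image, one can efficiently compute the sum of pixel values in a rectangle, as shown in Figure 1.1, where the sum of pixel values in D can be calculated as:

$$ii(x_4, y_4) + ii(x_1, y_1) - ii(x_2, y_2) - ii(x_3, y_3) \qquad (2)$$

regardless of the size of D. Here, following the labeling of [1], $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$ are the integral-image reference points at the top-left, top-right, bottom-left, and bottom-right corners of D, respectively.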
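As a minimal sketch of Equations (1) and (2), the following pure-Python code builds an integral image and sums an arbitrary rectangle with four lookups. The function names `integral_image` and `rect_sum` and the inclusive-corner convention are assumptions made for this illustration; they are not taken from the thesis.

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img[y'][x'] over y' <= y and x' <= x."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the inclusive rectangle [x0, x1] x [y0, y1]."""
    def at(x, y):
        # Reference points that fall outside the image contribute zero.
        return ii[y][x] if x >= 0 and y >= 0 else 0
    return at(x1, y1) + at(x0 - 1, y0 - 1) - at(x1, y0 - 1) - at(x0 - 1, y1)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The cost of `rect_sum` is the same four lookups whether the rectangle covers four pixels or the whole image, which is the property the text refers to.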

Figure 1.1: Illustration of the use of integral image.

Figure 1.2: Haar-like rectangle features.

A simple Haar-like rectangle feature, as shown in Figure 1.2, can therefore be computed from the integral image. The feature value is the difference between the pixel sums of the black and white rectangles.
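To make the feature value concrete, here is a small sketch of one two-rectangle feature evaluated through a zero-padded integral image. The layout (white rectangle on the left, black on the right), the sign convention (black minus white), and the function names are assumptions for illustration only.

```python
def integral_image(img):
    """Zero-padded integral image: ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Black (right half) minus white (left half) of a w-by-h window; w must be even."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return black - white


# A dark left half next to a bright right half gives a strong response.
image = [[10, 10, 200, 200],
         [10, 10, 200, 200]]
ii = integral_image(image)
print(two_rect_feature(ii, 0, 0, 4, 2))  # 800 - 40 = 760
```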

Subsequently, many works have improved on the Haar-like rectangle features. In [2] and [3], the authors rotate these features by 45 degrees and add center-surround features, as shown in Figure 1.3. The integral image rotated by 45 degrees in [2] and [3] is calculated as follows:

$$ii_r(x, y) = \sum_{x' \le x,\; x' \le x - |y - y'|} i(x', y') \qquad (3)$$

where $ii_r(x, y)$ denotes the rotated integral image. Using the rotated integral image, we can compute the sum of pixel values in any rotated

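rectangular area, as shown in Figure 1.4. For a rotated rectangle D with its top corner at $(x, y)$ and side lengths $w$ and $h$, as parameterized in [2] and [3], the sum of pixel values in D can be calculated as:

$$ii_r(x + w, y + w) + ii_r(x - h, y + h) - ii_r(x, y) - ii_r(x + w - h, y + w + h) \qquad (4)$$

where $ii_r$ denotes the rotated integral image. Rotated Haar-like rectangle features such as those in Figure 1.3 can therefore be computed easily from the rotated integral image. The authors of [2] report that using rotated rectangle features improves performance by about 10%.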

Figure 1.4: Illustration of the rotated integral image.

Jones and Viola [4] build diagonal filters which focus on diagonal structures in the image.

These diagonal filters are shown in Figure 1.5. The sum of the pixel values in the dark gray

shaded region is subtracted from the sum calculated from the light gray shaded region. To

compute the diagonal filters, they just use 16 pixels in the corners. These diagonal features are


Figure 1.5: The two basic types of diagonal filters [4].

Three types of Haar wavelet like features are defined in the detection sub-window in [5], as shown in Figure 1.6. The rectangles are of flexible size and are at distances of $(dx, dy)$ apart. These features can be non-symmetrical to cater to the non-symmetrical characteristics of non-frontal faces.

Figure 1.6: Haar wavelet like features with flexible sizes and distances [6].

Besides designing different features, another way to improve detection performance is to use a better boosting learning algorithm. In [1], Viola and Jones use the AdaBoost learning algorithm, which selects a small number of Haar-like features from a larger set and yields extremely efficient classifiers. Based on [2], Li et al. [5] use FloatBoost, which incorporates the idea of Floating Search [6] into AdaBoost to overcome the non-monotonicity problems associated with AdaBoost. According to [6], FloatBoost only needs

1.2.2 Face detection in nighttime

Unlike in the daytime, face detection at night is limited by the lack of light, which can make it impossible to locate faces in the images. To overcome this lighting problem, most researchers use near-infrared (NIR) cameras to capture images or videos.

In [7], Zhao and Grigat present an automatic face recognition system in the near-IR spectrum. They use simple, low-cost hardware to obtain the "bright pupil" effect, which is used to localize the eyes, as shown in Figure 1.7. Discrete Cosine Transform (DCT) coefficients are then selected as features, and Support Vector Machines (SVM) are employed to identify faces.

Figure 1.7: Some examples of images with “Bright pupil” effect.

J. Dowdall et al. [8] use NIR skin detection to extract the skin regions of a person,

and then use two distinct modes for face detection. In the first mode, they use correlated

multiband integral projections to detect the eyes and the eyebrows. Their system enters the

second mode if face detection fails in the first mode. Facial feature detection in the second

mode is based on a dynamic thresholding model and template matching.

In [9], the authors present an active NIR imaging system that is capable of producing face images of good condition regardless of visible lighting. Two statistical learning algorithms, one based on LDA and the other on AdaBoost, are used to build face recognition

1.2.3 Multi-intensity IR illuminator

In [10], a new multi-intensity IR illuminator (MIIR-illuminator) is developed to reduce blurred imaging results in nighttime surveillance. As shown in Figure 1.8, the illumination intensity changes periodically from the brightest level to the darkest one. In [11], two methods of foreground object detection are developed for video sequences captured with the MIIR-illuminator. The first method, background modeling based on illumination clustering, classifies image frames across different periods into clusters of similar illumination levels and then applies background subtraction to extract foreground objects in each illumination cluster. The second method, called periodic min-max modeling, requires less computation for its background models than the first because it considers only the extreme values of each image pixel.

Figure 1.8: Periodic changes of the illumination intensities of a MIIR-illuminator (axes: time, intensity).

In [12], the authors use the MIIR-illuminator to detect license plates in nighttime scenes. Figure 1.9 illustrates the framework. Potential locations of the license plates are estimated using gradient and edge features and then verified based on the stroke width of the license ID. Finally, the license plate detection results obtained from images captured at different illumination levels are integrated into a synthesized image of high dynamic range for better visualization.

Figure 1.9: Schematic diagram of the framework from [12].

With the MIIR-illuminator, human faces far away from the camera can be observed at high illumination levels, while faces close to the camera are not overexposed at low illumination levels. We use the MIIR-illuminator to capture surveillance video and design an effective algorithm to pick out the human faces in these images.

1.3 Organization of thesis

In nighttime surveillance, it is important to capture the people in video sequences, especially their faces. Because the MIIR-illuminator emits infrared light of different intensities periodically, it lets us obtain more information from the nighttime environment. We therefore apply the face detection techniques of Viola and Jones to images captured with the MIIR-illuminator. In addition, selecting high quality (HQ) faces is also important in nighttime surveillance. We define HQ faces as faces that can be seen clearly by human eyes, being neither overexposed nor underexposed, and we design a method for selecting such face images.

The flowchart of this thesis is shown in Figure 1.10. In the remainder of this thesis, the MIIR-illuminator is reviewed in detail in Chapter 2. In Chapter 3, we elaborate on face detection for images of different brightness. In Chapter 4, the method of face image selection is described. In Chapter 5, experimental results under different environments are presented in detail. Finally, we give conclusions and discuss future work in Chapter 6.

Figure 1.10: Flowchart of this thesis. Face Detection stage: input images, pre-processing, face classifiers, eye classifiers, faces. Face Image Selection stage: find cheeks, intensity histogram, regrouping, the HQ face images, output images. The flowchart also contains Yes/No decision branches.

Chapter 2. The Multi-Intensity IR Illuminator

Currently, nighttime surveillance is generally performed with an IR illuminator of constant intensity, so only a few locations can be well illuminated. People far from the camera may be hard to recognize because of the limited power of the IR-illuminator. On the other hand, people close to the camera may become unclear because the image is overexposed under the high intensity of the IR-illuminator. These problems lower the quality of nighttime video. To solve them, we use the MIIR-illuminator to expand the range of nighttime surveillance. This chapter describes the device together with some special features of the images captured with it.

2.1 MIIR-illuminator device

The MIIR-illuminator emits IR light at multiple intensities by changing the intensity periodically. This lets the camera capture more information over a range of surveillance depths and widths, as shown in Figure 2.1. In Figure 2.1(a), captured at the brightest illumination level, we can see detailed information about the refrigerator and the whiteboard, which are far from the camera. In Figure 2.1(b), we can see the human face, which is close to the

Figure 2.1: (a), (b) Camera captures different ranges of surveillance depth.

The authors of [10] control the light emitted by the MIIR-illuminator so that it changes periodically from the brightest intensity to the darkest intensity. Figure 1.8 shows the periodic changes of the illumination intensities of a MIIR-illuminator, and Figure 2.2 verifies these periodic changes with an oscilloscope. The frequency is 15 hertz, and each cycle has six illumination levels.

Figure 2.2: Intensity waveform of MIIR-illuminator detected by photodiode which is displayed on an oscilloscope.


2.2 Features of images captured with MIIR-illuminator

Images captured using the MIIR-illuminator look quite different from those obtained by

traditional IR illuminators because the brightness of the former changes periodically. Some

features of images captured with MIIR-illuminator will be described in this section.

2.2.1 Brightness change is not perfectly periodic

The multi-intensity illumination levels in each cycle are not exactly the same, as shown in Figure 2.3, which plots the distribution of the average intensity of images whose only content is a white-wall background. In this thesis, average intensity is defined as the spatial average over the whole exposed image:

$$\bar{I} = \frac{1}{H \times W} \sum_{y=1}^{H} \sum_{x=1}^{W} I(x, y) \qquad (5)$$

where $\bar{I}$ is the average intensity, $H$ is the image height, $W$ is the image width, and $I(x, y)$ is the pixel value. In Figure 2.3, there are six illumination cycles, but their shapes are not exactly the same. In addition, identical images may be captured within a cycle when there is not enough time to transfer images to storage. These redundant images should therefore be deleted to reduce computation before face detection and the subsequent selection of high quality human faces.
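The average intensity of Equation (5) can be sketched in a few lines, assuming a grayscale frame stored as a nested list of pixel values; the function name `average_intensity` is chosen here for illustration.

```python
def average_intensity(frame):
    """Spatial mean of all pixel values in a grayscale frame (list of rows)."""
    height = len(frame)
    width = len(frame[0])
    total = sum(sum(row) for row in frame)
    return total / (height * width)


frame = [[100, 120],
         [140, 160]]
print(average_intensity(frame))  # (100 + 120 + 140 + 160) / 4 = 130.0
```

Evaluating this value for every frame of a sequence gives a per-frame brightness curve of the kind the text describes for Figure 2.3.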

Figure 2.3: Distribution of average intensity of images (axes: frame number, intensity).

Figure 2.4 illustrates this situation with two cycles of images captured with the MIIR-illuminator; cycle 1 is marked by a green block and cycle 2 by a blue block. The illumination level changes periodically from the brightest intensity to the darkest one. If the light instead changed from the darkest intensity to the brightest one, the scene would initially be too dark for the camera to focus on objects, resulting in poor image quality. The number of images in the two cycles is not the same, and within each cycle the number of images at each illumination level also differs. In addition, the illumination level must change from the darkest level of the current cycle to the brightest level of the next cycle, so some transitive images appear between the two cycles; they are marked with a red rectangle in Figure 2.4.

From these observations, we conclude that the period of the brightness change is unstable and that the redundant and transitive images are not needed. The solution of this

Figure 2.4: The brightness of captured images changes periodically.

2.2.2 Illumination within an image frame is non-uniform

The main feature of an image captured with an IR illuminator is that its peripheral region is darker than its central region, as shown in Figure 2.5(a). A human face in the central region may therefore appear brighter than one in the peripheral region even if it is farther from the camera, as shown in Figure 2.5(b), where the man standing in the middle has the brightest face of the three men. Because IR illumination is non-uniform, the brightest object is not necessarily the nearest one. Moreover, people may go undetected when they pass through a corner of the scene, because the peripheral region of the image is not bright enough.


Figure 2.5: (a) The distribution of intensity is non-uniform in a captured image. (b) The human face near the central region of an image is brighter than the others, even though another person is nearer to the camera. (Annotations in the figure: Intensity=197, Intensity=120.)

Chapter 3. Face Detection for Different Image Brightness

This chapter describes how face detection is performed on images captured with the MIIR-illuminator so that clear faces, which may be critical in law enforcement applications, can be acquired. In Section 3.1, we first develop the pre-processing that deletes the previously mentioned redundant and transitive images. In Section 3.2, we describe the principle of the face detection approach implemented in OpenCV. Finally, in Section 3.3, we introduce the classifiers used in this thesis and show how they work.

3.1 Pre-processing

Because image sequences captured using the MIIR-illuminator contain many redundant images, we need to delete them to avoid unnecessary computation. As shown in Figure 3.1, the images crossed by red lines are redundant, which reveals that almost half of the images are not

Redundant images can be removed by evaluating the temporal difference between two consecutive images (frames $t$ and $t-1$), computing the sum (diff) of the pixel-wise absolute differences as

$$\mathit{diff} = \sum_{i} \left| p_t(i) - p_{t-1}(i) \right| \qquad (6)$$

where $p_t(i)$ ($p_{t-1}(i)$) is the value of pixel $i$ of frame $t$ ($t-1$). If the sum is zero, the two consecutive images are exactly the same, and we delete one of them.
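The duplicate-frame test of Equation (6) might be sketched as follows, assuming grayscale frames stored as nested lists of integers and a sum of absolute differences. The names `frame_difference` and `drop_redundant` are hypothetical, and the transitive-image test is deliberately left out.

```python
def frame_difference(curr, prev):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(abs(c - p)
               for row_c, row_p in zip(curr, prev)
               for c, p in zip(row_c, row_p))


def drop_redundant(frames):
    """Keep a frame only if it differs from the most recently kept frame."""
    kept = []
    for frame in frames:
        if kept and frame_difference(frame, kept[-1]) == 0:
            continue  # exact duplicate of the previous frame: skip it
        kept.append(frame)
    return kept


a = [[10, 10], [10, 10]]
b = [[40, 40], [40, 40]]
c = [[90, 90], [90, 95]]
sequence = [a, a, b, b, b, c]
print(len(drop_redundant(sequence)))  # 3 distinct frames remain
```

For exact duplicates, comparing against the last kept frame gives the same result as comparing against frame t-1, because a skipped frame is identical to the one kept before it.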

The transitive images, which are marked with a green rectangle in Figure 3.1, should also be deleted because they do not belong to any cycle. To that end, we evaluate the total gray value $G_t$ of frame $t$ as

$$G_t = \sum_{i} p_t(i) \qquad (7)$$

where $p_t(i)$ is the value of pixel $i$ of frame $t$. Frames identified as transitive on the basis of $G_t$ are deleted. As shown in Table 3.1, pre-processing, which deletes the redundant and transitive images, removes more than half of the image frames and saves up to 58% of the computation time.

Table 3.1: The efficiency improvement from pre-processing.

         Before pre-processing            After pre-processing             Improvement in
Data     Frames    Computation time (s)   Frames    Computation time (s)   computation time
Data1    26        23.914                 12        9.994                  58%
Data2    372       278.801                174       141.58                 49.2%
Data3    439       425.586                205       213.791                49.7%
Data4    662       687.793                312       329.686                52%
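The improvement column of Table 3.1 appears to be the relative reduction in computation time, (before - after) / before. This formula is inferred from the numbers rather than stated in the text; the short check below recomputes it from the table entries under that assumption.

```python
# (frames before, time before in s, frames after, time after in s) from Table 3.1
table_3_1 = {
    "Data1": (26, 23.914, 12, 9.994),
    "Data2": (372, 278.801, 174, 141.58),
    "Data3": (439, 425.586, 205, 213.791),
    "Data4": (662, 687.793, 312, 329.686),
}


def improvement_percent(before, after):
    """Relative saving in percent when a cost drops from `before` to `after`."""
    return (before - after) / before * 100.0


for name, (frames_before, time_before, frames_after, time_after) in table_3_1.items():
    saved_time = improvement_percent(time_before, time_after)
    saved_frames = improvement_percent(frames_before, frames_after)
    print(f"{name}: {saved_frames:.1f}% fewer frames, {saved_time:.1f}% less time")
```

Under this reading, the recomputed time savings agree with the tabulated percentages to within rounding, and every data set loses more than half of its frames.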


3.2 Review of face detection

This section describes face detection in OpenCV, which includes (1) Haar-like features and the integral image, (2) the AdaBoost algorithm for training classifiers, and (3) the cascade of classifiers. The approach is based on [1] by Viola and Jones.

Haar-like features are shown in Figures 1.2 and 1.3 and can be computed very rapidly using the integral image. The formulas for the simple and rotated Haar-like features are described in Section 1.2.1. The main advantage of the integral image is that any rectangular sum can be computed with four array references. Two-rectangle features can be computed with six references because the two rectangles are adjacent. Similarly, three- and four-rectangle features need only eight and nine references, respectively.

AdaBoost is an efficient boosting algorithm for training classifiers. It selects good features from a large feature set and constructs weak classifiers, each based on one of the features. AdaBoost then boosts the weak classifiers (via weighted summation) into a stronger classifier. A cascade of classifiers achieves higher detection performance with less computation time. Connecting many classifiers in series creates a cascade that separates positive from negative data, as shown in Figure 3.2. Simpler classifiers at the earlier stages reject the majority of sub-windows before more complex classifiers are applied.

Figure 3.2: A cascade of classifiers with N stages. Each classifier is trained to achieve a hit rate of h and a false alarm rate of f. (Diagram labels: Sub-windows; stages 1, 2, 3, ..., N; h; 1-f; Classified faces.)
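The early-rejection behavior of a cascade can be sketched as below. The stage functions are toy thresholds on a single number, not trained classifiers, and the overall rates h^N and f^N hold only under the simplifying assumption that the stages behave independently; all names and numeric values are illustrative.

```python
def cascade_classify(window, stages):
    """Accept a sub-window only if every stage accepts it."""
    for stage in stages:
        if not stage(window):
            return False  # rejected here; later stages are never evaluated
    return True


def cascade_rates(h, f, n):
    """Overall hit rate and false alarm rate of n stages with per-stage rates h and f."""
    return h ** n, f ** n


# Toy stages: each one is a stricter threshold on a scalar "window score".
stages = [lambda v: v > 10, lambda v: v > 50, lambda v: v > 90]
print(cascade_classify(100, stages))  # True: passes all three stages
print(cascade_classify(60, stages))   # False: rejected by the third stage

hit_rate, false_alarm_rate = cascade_rates(0.999, 0.5, 20)
print(hit_rate, false_alarm_rate)
```

With these illustrative per-stage values, twenty stages keep the overall hit rate near 0.98 while pushing the false alarm rate below one in a million, which is why each stage can afford to be simple.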


3.3 Face detection using two-layer classifiers

As described in Section 3.2, frontal face detection by Viola and Jones [1] is simple and effective. R. Lienhart et al. [3] then use modified Haar-like features and the Gentle AdaBoost algorithm to improve the hit rate of face detection, raising it to 82.7% at a rescale factor of 1.1. Because this is a good approach for frontal face detection, we adopt their classifier as our first-layer classifier. We also tried to train our own classifier, but its results are worse than theirs, as shown in Figure 3.3. In Figure 3.3 (a) – (d), both the results of

Figure 3.3 (a) – (d): The results of using different classifiers. On the left, red rectangles mark faces detected by the frontal face classifier described in [3]. On the right, red rectangles mark faces detected by our frontal face classifier.

The purpose of this thesis is to capture HQ human faces. According to our observations, some of the faces detected by the frontal face classifier are not clear enough: the facial features may be overexposed or blurred, as shown in Figure 3.4. To remove blurred faces and false detections, we use two layers of classifiers to double-check for clear human faces.

Figure 3.4: Examples of overexposed and blurred faces.

To that end, in addition to the first-layer classifier, which is the frontal face classifier mentioned above, we use a second-layer classifier to process the candidate regions identified by the first layer. We tried the nose, mouth, single-eye, and eye pair classifiers in OpenCV as the second-layer classifier. The results in Table 3.2 show that the eye pair classifier yields the fewest false alarms, so we choose the eye pair classifier trained by M. Castrillón et al. [13] as the second-layer classifier. The miss rate of the eye pair classifier is 35.2%, which is also the lowest.
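The two-layer check can be sketched independently of any particular detector: a face candidate is kept only if the second-layer detector finds something inside it. In the sketch below both detectors are passed in as plain functions returning (x, y, w, h) boxes. With OpenCV, each could wrap the `detectMultiScale` method of a `cv2.CascadeClassifier`, but which cascade files and parameters the thesis used is not stated here, so stub detectors are used instead. All names are hypothetical.

```python
def crop(image, box):
    """Return the sub-image covered by box = (x, y, w, h)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def two_layer_detect(image, detect_faces, detect_eye_pairs):
    """Keep only face candidates in which the second-layer detector fires."""
    confirmed = []
    for box in detect_faces(image):
        if detect_eye_pairs(crop(image, box)):
            confirmed.append(box)
    return confirmed


# Stub detectors standing in for trained cascade classifiers.
def fake_faces(img):
    return [(0, 0, 2, 2), (2, 2, 2, 2)]


def fake_eye_pairs(region):
    # Pretend an eye pair is present when the top-left pixel is bright.
    return [(0, 0, 1, 1)] if region[0][0] == 255 else []


image = [[0] * 4 for _ in range(4)]
image[2][2] = 255
print(two_layer_detect(image, fake_faces, fake_eye_pairs))  # [(2, 2, 2, 2)]
```

Passing the detectors in as arguments keeps the confirmation logic testable without image files, and the second layer can be swapped (eye pair, eye, nose, mouth) without touching the rest.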

Table 3.2: The accuracies of different second layer classifiers.

Classifier   Detected regions   TP    FN    FP   TN   Miss rate
Eye pair     344                191   104   0    49   35.2%
Eye          344                97    198   5    44   67.1%
Nose         344                158   137   9    40   46.4%
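The miss rates in Table 3.2 are consistent with the usual definition FN / (TP + FN). That definition is inferred from the numbers and is not spelled out in the text; the sketch below recomputes the column from the rows that are present.

```python
# (TP, FN, FP, TN) for the rows of Table 3.2
table_3_2 = {
    "Eye pair": (191, 104, 0, 49),
    "Eye": (97, 198, 5, 44),
    "Nose": (158, 137, 9, 40),
}


def miss_rate_percent(tp, fn):
    """Share of true faces that the second-layer classifier fails to confirm."""
    return fn / (tp + fn) * 100.0


for name, (tp, fn, fp, tn) in table_3_2.items():
    total = tp + fn + fp + tn  # should equal the 344 detected regions
    print(f"{name}: {total} regions, miss rate {miss_rate_percent(tp, fn):.2f}%")
```

Each row sums to the 344 detected regions, and the recomputed miss rates match the table to within rounding.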

Figure 3.5 shows the results of facial feature detection obtained with different classifiers. The red rectangles are regions detected as human faces by the frontal face classifier, and the green rectangles are regions detected as facial features by the second-layer classifier. It is

Figure 3.5 (a) – (f): The red rectangle regions are the results of the first layer classifier and the green rectangle regions are the results of different second layer classifiers. The latter (from left to right): eye pair classifier, eye classifier, nose classifier, and mouth classifier.

The human face images captured above with multi-intensity illumination are clear at different distances, and correct human faces can often be detected in them by the two-layer classifiers. As shown in Figure 3.6(a), when a man walks from far to near, his face can be captured at different distances by adopting different illumination levels. Figure 3.6(b) shows four people at different distances, with their faces captured at different illumination levels within one illumination cycle. The person standing farthest from the camera can be captured at the first or second brightness level, at which the people standing closer to the camera are overexposed. When the intensity is lower, a man standing closer to the camera appears clear and his face can be captured easily. The man standing closest to the camera is detected easily in images at the darker levels, in which the others are underexposed.

Figure 3.6: (a), (b) Human faces at different distances can be captured by adopting different illumination levels. (Panel labels: Level 1 to Level 6.)

After the above face detection procedure, we can get multiple images of human face for

each person. A method to select human face images of good quality for further processing

will be described in detail in the next chapter.


Chapter 4. Face Image Selection by Using Intensity Histogram

For the human faces detected by the two-layer classifier described in the previous chapter, as shown in Figure 4.1 (see footnote 1), we need to select the high quality (HQ) ones for better visualization. In Section 4.1, we explain why the cheek region of a human face is chosen for this selection by examining the gray value distribution of a face. Accordingly, in Section 4.2, the intensity histogram of the cheek region is calculated to obtain a representative intensity for each human face. Finally, the method of regrouping human faces of different brightness for better visualization is described in Section 4.3.

Figure 4.1: Human faces detected by the two-layer classifier.

Footnote 1: Figure 3.6(a) shows selected images of three illumination cycles of the same image sequence.

4.1 Using cheek portion of human faces

One possible way of selecting HQ faces from the human faces detected by the two-layer classifier described in the previous chapter is to analyze the brightness of each face. As shown in Figure 4.2(a), if the brightness of a face is too high, the facial features are unclear because the image is overexposed. A similar problem occurs when the image is underexposed, as shown in Figure 4.2(c). On the other hand, the brightness of the face image shown in Figure 4.2(b) is close to that desired for an HQ face, in that all the facial features have good contrast and can be identified easily.

Figure 4.2: Some examples of different brightness of face. (a) High brightness. (b) Medium brightness. (c) Low brightness.

In order to establish an appropriate and consistent measure of the brightness of a human face, we first analyze the brightness distribution of the detected face images. For easier observation, we divide the gray values between 0 and 255 into 8 ranges and use 8 different colors to represent them. For example, Figure 4.3(a) shows face images detected by the two-layer classifier, and Figure 4.3(b) shows their gray values rendered with these colors. In Figure 4.3(b), one can see that the human cheek has a nearly uniform color distribution, because it is a spatially smooth surface with uniform texture, and that it occupies a large portion of the face. Therefore, we choose the brightness of the cheek to represent the brightness of the face. A simple way of cutting out the cheek region, which is adopted in this thesis, is shown in Figure 4.4. We discard the top half, the bottom one-sixth, and the left and right one-sixths of the face image, so that the region inside the green rectangle is regarded as the cheek portion of the face. Then, we use this cheek region to represent the brightness of the face.


Figure 4.3: (a) Original faces which are detected by the two-layer classifier in one illumination cycle. (b) The colors represent different ranges of gray values of a face.
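The color coding used for Figure 4.3(b) can be sketched as follows. This is a minimal sketch, assuming eight equal-width ranges of 32 gray levels each; the thesis does not state the range boundaries or the colors, so the function names and the `palette` argument are hypothetical.

```python
import numpy as np

def gray_to_bands(gray, n_bands=8):
    """Map 8-bit gray values (0..255) to band indices 0..n_bands-1.

    Assumption: the bands have equal width (32 gray levels for 8 bands).
    """
    gray = np.asarray(gray).astype(np.int64)
    return ((gray * n_bands) // 256).astype(np.uint8)

def colorize_bands(bands, palette):
    """Replace every band index by an RGB color.

    `palette` is a hypothetical (n_bands, 3) array of colors; the colors
    actually used in the thesis are not given in the text.
    """
    return np.asarray(palette, dtype=np.uint8)[bands]
```

A nearly uniform cheek then shows up as a single-colored patch, which is the observation used to justify the choice of the cheek region.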

Figure 4.4: The cheek region (green rectangle) used to represent the brightness of human face.
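The cropping rule of Figure 4.4 can be written down in a few lines. The sketch below assumes the detected face is given as a NumPy array whose first two axes are height and width; the function name `crop_cheek` and the use of integer division for the fractions are assumptions, not details taken from the thesis.

```python
import numpy as np

def crop_cheek(face):
    """Return the cheek region of a detected face image.

    Discards the top half, the bottom one-sixth, and the left and right
    one-sixths of the face, as described for Figure 4.4.
    """
    face = np.asarray(face)
    h, w = face.shape[:2]
    top = h // 2          # skip the top half
    bottom = h - h // 6   # skip the bottom one-sixth
    left = w // 6         # skip the left one-sixth
    right = w - w // 6    # skip the right one-sixth
    return face[top:bottom, left:right]
```

For a 60x60 face this keeps rows 30 to 49 and columns 10 to 49, that is, a region 20 pixels high and 40 pixels wide.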

4.2 Calculating the brightness of cheeks

For the cheek region defined above, we now use its intensity histogram, such as those shown in Figure 4.5, to obtain a numeric value that represents the brightness of the human face. In particular, for the intensity histogram of each cheek region, we first identify the gray values of the top ten bins, and then calculate their mean value to represent the brightness of the cheek numerically.


Figure 4.5: Some examples of intensity histograms of cheek regions.
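One possible reading of this brightness measure is sketched below. It assumes a 256-bin histogram, takes "top ten bins" to mean the ten most populated bins, and uses the unweighted mean of their gray values; the thesis does not spell out these details, and the name `cheek_brightness` is hypothetical.

```python
import numpy as np

def cheek_brightness(cheek, top_k=10):
    """Mean gray value of the top_k most populated histogram bins.

    Assumptions: 8-bit input, one bin per gray value, unweighted mean.
    Empty bins are ignored, so a cheek with fewer than top_k distinct
    gray values is averaged over the values that actually occur.
    """
    pixels = np.asarray(cheek, dtype=np.uint8).ravel()
    hist = np.bincount(pixels, minlength=256)
    occupied = np.flatnonzero(hist)                       # gray values that occur
    order = occupied[np.argsort(-hist[occupied], kind="stable")]
    return float(order[:top_k].mean())
```

Using only the most populated bins makes the measure less sensitive to a few very dark or very bright pixels inside the cheek rectangle than a plain mean over all pixels would be.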

Consider a person who is located at different distances from the camera and illuminated by the multi-intensity IR illuminator, as shown in Figure 3.6(a). The brightness of the human face depends on the distance as well as on the light intensity. Figure 4.6 shows the variation of the brightness of the cheek region obtained for such an image sequence, in which a man walks from a far distance to a nearer one. At first, the brightness of the cheek is lower than 150 even at the brightest illumination levels. Then, the brightness at the maximum brightness level increases gradually, while the difference between the maximum and minimum brightness levels also becomes larger. Finally, in the third row of Figure 3.6(a), the man is so near the MIIR-illuminator that the maximum brightness of the face is close to 250 and the face is overexposed.

Figure 4.6: The variation of brightness of the cheek region (see text). Axis label: Brightness.


4.3 Regrouping human faces with different brightness

Since the brightness of the cheek changes with the illumination level and with the distance, we cannot select the HQ faces from a single illumination level. Thus, we need to analyze the brightness of all the faces and define the ranges of brightness that can be used to select the HQ faces.

For the image sequence shown in Figure 3.6(a), 63 human faces, which have different brightness and different scales, are detected from the 174 images. We then select the clear faces by hand; Figure 4.1 shows part of the 63 human faces, with red rectangles indicating the selected ones. Figure 4.7(a) and (b) show the intensity histograms of these face images and of the corresponding cheek regions, respectively. The black background affects the determination of the brightness of the face images, as shown in Figure 4.7(a), whereas the histograms of the cheek regions are more similar to one another; the latter are therefore adopted to calculate the brightness of these face images. For the HQ faces shown in Figure 4.1, which are selected manually, the calculated brightness values lie between 170 and 210, as shown in Table 4.1.

Figure 4.7: (a) Histograms of HQ face images. (b) Histograms of cheek regions.

Table 4.1: Histograms of HQ faces and their cheeks.

Face    (1)      (2)      (3)      (4)      (5)      (6)
Size    56x56    57x57    62x62    62x62    66x66    70x70


In addition, we also select human faces with good contrast that are a little brighter than the HQ faces (called Lighter faces) and those that are a little darker than the HQ faces (called Darker faces). Figure 4.8 shows the HQ face, the Lighter face, and the Darker face, all selected manually.

Figure 4.8: The selection of faces with acceptable quality in one illumination cycle.

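Like the brightness range obtained for the HQ faces, similar ranges can also be obtained for the Lighter faces and the Darker ones, so that a face with cheek brightness $b$ is assigned to a group as

$$
G(b) = \begin{cases} \text{Lighter}, & \ldots \\ \text{HQ}, & 170 \le b \le 210 \\ \text{Darker}, & \ldots \end{cases} \tag{8}
$$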
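The grouping by brightness range can be sketched as a small function. Only the HQ range of 170 to 210 is taken from the text; the outer limits `darker_min` and `lighter_max` are hypothetical parameters that the caller must supply, and the treatment of the range boundaries is an assumption.

```python
def group_face(brightness, darker_min, lighter_max, hq_range=(170, 210)):
    """Assign a face to a group from its cheek brightness.

    hq_range comes from the text (170 to 210). darker_min and lighter_max
    are hypothetical outer limits of the Darker and Lighter ranges; they
    have no defaults because their values are not given here.
    """
    hq_low, hq_high = hq_range
    if hq_low <= brightness <= hq_high:
        return "HQ"
    if hq_high < brightness <= lighter_max:
        return "Lighter"
    if darker_min <= brightness < hq_low:
        return "Darker"
    return "Other"  # too dark or too bright to be useful
```

For example, with purely illustrative limits `darker_min=140` and `lighter_max=240`, a face of brightness 180 falls into the HQ group and a face of brightness 250 into the Other group.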

Figure 4.9(a) shows the three groups of face images selected according to (8), wherein the faces are resized to 50x50 for convenient observation. Moreover, for each group of images, the brightness of the faces is adjusted so that the mean value of the image brightness is equal to the mid-point of the corresponding range in (8). As shown in Figure 4.9(b), the brightness of all faces in each group becomes more uniform, and the faces look better, after these processing steps. It is not easy to see that the human faces in each group, with the same size and similar brightness, are actually obtained with different illumination levels and for people located at different distances.

Image labels (1 to 6) with intensity: 248, 248, 238, 180, 113, 45.


Figure 4.9: The (a) Size and (b) brightness adjustments of each image group: HQ (top), darker (middle), and lighter (bottom).
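The two adjustments of Figure 4.9 can be illustrated as follows. This is a sketch under stated assumptions: the thesis does not say how the mean is moved to the mid-point or which interpolation is used for resizing, so an additive shift and nearest-neighbor sampling are assumed here, and both function names are hypothetical.

```python
import numpy as np

def resize_nearest(face, size=50):
    """Resize a face to size x size by nearest-neighbor sampling (assumed)."""
    face = np.asarray(face)
    h, w = face.shape[:2]
    rows = np.arange(size) * h // size   # source row for every target row
    cols = np.arange(size) * w // size   # source column for every target column
    return face[rows[:, None], cols]

def adjust_brightness(face, target_mean):
    """Shift all gray values so that the image mean equals target_mean.

    Assumption: an additive shift; values are clipped to 0..255, so the
    resulting mean can deviate slightly when clipping occurs.
    """
    face = np.asarray(face, dtype=np.float64)
    shifted = face + (target_mean - face.mean())
    return np.clip(np.rint(shifted), 0, 255).astype(np.uint8)
```

For the HQ group, the mid-point of the range 170 to 210 is 190, so a call such as `adjust_brightness(resize_nearest(face), 190)` would produce a 50x50 face whose mean brightness is about 190.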


Chapter 5. Experimental Results

5.1 Environment settings

Our system includes an IR camera and the MIIR-illuminator, which periodically emits six levels of infrared light. The IR camera captures image frames at a resolution of 640x480. These devices are located about 2 or 2.5 meters above the ground level, as shown in Figure 5.1.


Figure 5.1: (a) Experimental environment. (b) Close-up view of IR camera and MIIR-illuminator.


5.2 Experiment 1: MIIR-illuminator vs. traditional IR illuminator

The purpose of this experiment is to compare the performance of face detection using the MIIR-illuminator with that using a traditional IR illuminator. Figure 5.2 shows face images captured using the MIIR-illuminator (left column) and a traditional IR illuminator (right column). In Figure 5.2(a), the man is about 4 meters away from the camera. The first illumination level of the MIIR-illuminator is chosen because it gives the clearest image of the human face. However, we cannot detect the human face in the image illuminated by the traditional IR illuminator at the same distance, because the face image is underexposed when the person is far away from the illuminator. In Figure 5.2(b), the images are captured at a distance of 3 meters from the camera, and faces can be detected in both images. In Figure 5.2(c), when the person is 1 meter away from the camera, a good image (under illumination level 5) can be selected for the MIIR-illuminator, as shown on the left-hand side of Figure 5.2(c); on the right-hand side, the image obtained with the traditional IR illuminator is overexposed because the person is too close to the camera.

Based on these results, one can see that the MIIR-illuminator has a better chance of providing high quality images for nighttime surveillance of a large area. In particular, we can capture human faces of better quality for people located at different distances by using the MIIR-illuminator.


Figure 5.2 (a) – (c): Face images captured using the MIIR-illuminator (left column) and a traditional IR illuminator (right column).


5.3 Experiment 2: Face detection by using MIIR-illuminator under simple background environments

This section presents some experimental results of face detection by using

MIIR-illuminator under simple background environments. The goal of these experiments is to

show that human faces at different distances can be detected correctly from images

illuminated by the MIIR-illuminator before the HQ face images can be selected from them.

The experiments can be divided into (i) single person case and (ii) multi-person case.

5.3.1 Single person under simple background environments

The camera is installed 2 meters above the ground for this experiment, in which a man walks slowly from far to near, as shown from left to right and from top to bottom in Figure 5.3(a). Images obtained with the six different illumination levels within one illumination cycle are displayed in each row. In the first row of Figure 5.3(a), the faces can only be detected under the brightest illumination levels (levels 1 and 2) and are underexposed under the other illumination levels. In the second row, the faces can only be detected under the medium illumination levels (levels 2 to 5) and are either overexposed or underexposed under the other illumination levels. In the third row, the faces can only be detected under the darkest illumination levels (levels 3 to 6). As shown in Figure 5.3(b), the faces of better quality can be further divided into three groups, namely the Best group, the Lighter group, and the Darker group, while the remaining faces form the Other group. The human faces in the first row correspond to the HQ face images, with the highest quality and good contrast. For each group, after size and brightness adjustments, the face images captured for the person walking at different distances are very similar to one another. While the Lighter and Darker groups also have acceptable quality of facial features, the Other group are

Figure 5.3: Results of face detection and face image selection under simple background environments. (a) The human faces are captured under six different illumination levels. (b) The faces of better quality can be further divided into three groups. Group labels in (b): Best group, Lighter group, Darker group, Other group.

5.3.2 Multi-person under simple background environments

The camera is installed 2.5 meters above the ground for this experiment, in which two men walk one behind the other, as shown in Figure 5.4(a). Images obtained with the six different illumination levels within one illumination cycle are displayed in each row. In the first row of Figure 5.4(a), faces can only be detected under the highest illumination levels (levels 1 and 2) and are underexposed under the other illumination levels. In the second row of Figure 5.4(a), the closer person is close enough to be detected under five different illumination levels. In the third row of Figure 5.4(a), the closer person is too close and is overexposed under the brighter illumination levels, but we can still detect his face under the lower illumination levels (levels 3, 4, and 5). In the same row, the distant person can only be detected under the highest illumination level, under which the closer person is overexposed.

Figure 5.4(b) shows results of face image regrouping, similar to that shown in Figure

5.3(b). One can see that multiple face images of seemingly equal quality can be obtained for

each group of images, and it requires careful examination to tell that these images are actually

obtained for people located at different distances from the camera.

According to this experimental result, our methods of face detection and face image

Figure 5.4: Results under multi-person condition. (a) Results of face detection. (b) Results of face image selection. Group labels in (b): Best group, Lighter group, Darker group, Other group.

5.4 Experiment 3: Face detection by using MIIR-illuminator under complex background environments

This section presents some experimental results of face detection by using the MIIR-illuminator under complex background environments. The goal of these experiments is to show that human faces at different distances can be detected correctly even under complex background environments, before the HQ face images are selected from them. The experiments can be divided into (i) the single person case and (ii) the multi-person case.

Face detection under complex background environments produces many false positive regions if only a one-layer classifier is applied, as shown in Figure 5.5(a), which may decrease the accuracy of face detection. Our method, which uses two-layer classifiers, can filter out false positive regions.


Figure 5.5: (a) Face detection results under simple background and complex background environments obtained by using one-layer classifier. (b) Similar results obtained by using two-layer classifier.


5.4.1 Single person under complex background environments

The camera is installed 2 meters above the ground for this experiment. As shown in Figure 5.6(a), images obtained with the six different illumination levels within one illumination cycle are displayed in each row. When the man stands far from the camera, his face can only be detected under the higher illumination levels and is underexposed under the other illumination levels. On the other hand, as the man walks towards the camera, the face image is overexposed under the brightest illumination level, but we can detect it by selecting the image

Figure 5.6: Results of face detection and face image selection for one person under complex background environments. (a) The human face images are captured under six different illumination levels. (b) The results of grouping the detected face images. Group labels in (b): Best group, Lighter group, Darker group.

5.4.2 Multi-person under complex background environments

The camera is installed 2 meters above the ground for this experiment. Two persons walk towards the camera, one behind the other, under complex background environments. As shown in Figure 5.7(a), images obtained with the six different illumination levels within one illumination cycle are displayed in each row. When the two persons are far from the camera, the face of the closer person can be detected under the brighter illumination levels. As the two persons walk towards the camera, the face of the closer person can be detected under the darker illumination level, and the face of the distant person can be detected under the brighter illumination level. According to these experimental results, the performance of face detection under complex background

Figure 5.7: Results of face detection and face image selection for multi-person under complex background environments. (a) The human face images are detected under six different illumination levels. (b) The results of grouping the detected face images. Group labels in (b): Best group, Lighter group, Darker group.

Chapter 6. Conclusions and Future Works

6.1 Conclusions

In this thesis, we use the MIIR-illuminator for nighttime face detection. The advantage of using the MIIR-illuminator is that it emits infrared light of different intensities periodically, so that more information can be acquired from the environment. Such an approach seems to reasonably solve the problem of capturing poor quality images of human faces at nighttime.

Our method is divided into two parts: face detection and face image selection. The two-layer classifiers used in face detection improve the detection performance by decreasing false alarms. The face image selection scheme then selects the HQ faces from the detected human faces, with minor adjustments for better visualization. Experimental results show that HQ face images can be detected and selected for persons at different distances from the camera. Even under a complex background environment, our method can still detect and select HQ faces effectively and correctly, which is helpful for crime investigation and prevention.

6.2 Future works

Our face detection method uses the whole image to detect face regions. In the future, we hope to incorporate the foreground detection algorithm [11] as a preprocessing stage of face detection. It may identify the foreground regions quickly before human faces are detected within these regions. Hopefully, such an approach will decrease the computation time and filter out some false positive regions in the background. In addition, we hope to establish an objective standard for HQ face images. Finally, we hope that MIIR-illuminator


References

[1] P. Viola and M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple

Features,” Computer Vision and Pattern Recognition, vol.1, pp. 511-518, 2001.

[2] R. Lienhart and J. Maydt, “An Extended Set of Haar-like Features for Rapid Object

Detection,” International Conference on Image Processing, vol. 1, pp. 900-903, 2002.

[3] R. Lienhart, A. Kuranov, and V. Pisarevsky, “Empirical Analysis of Detection Cascades

of Boosted Classifiers for Rapid Object Detection,” Pattern Recognition, pp. 297-304,

2003.

[4] M. Jones and P. Viola, “Fast Multi-view Face Detection,” Technical Report 096,

Mitsubishi Electric Research Laboratories, 2003.

[5] S. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum, “Statistical Learning of

Multi-View Face Detection,” European Conference on Computer Vision, pp. 117-121,

2002.

[6] P. Pudil, J. Novovicova, and J. Kittler, “Floating Search Methods in Feature Selection,” Pattern Recognition Letters, vol. 15, pp. 1119-1125, 1994.

[7] S. Zhao and R. R. Grigat, “An Automatic Face Recognition System in the Near Infrared

Spectrum,” Machine Learning and Data Mining in Pattern Recognition, pp. 437-444,

2005.

[8] J. Dowdall, I. Pavlidis, and G. Bebis, “Face Detection in the Near-IR Spectrum,” Image and Vision Computing, vol. 21, pp. 565-578, 2003.

[9] S. Z. Li, R. Chu, S. Liao, and L. Zhang, “Illumination Invariant Face Recognition using

Near-Infrared Images,” Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp.

627–639, 2007.

[10] W. C. Teng, “A New Design of IR Illuminator for Nighttime Surveillance,” MS Thesis,

National Chiao Tung Univ., 2010.


Multi-Intensity Infrared Illuminator,” International Conference on Signal Processing

and Imaging Engineering, 2011.

[12] Y. T. Chen, J. H. Chuang, H. H. Lin, and H. T. Chen, “Robust License Plate Detection in

Nighttime Scenes using Multiple Intensity IR-Illuminator,” International Symposium on

Industrial Electronics, 2012.

[13] M. Castrillón, O. Déniz, C. Guerra, and M. Hernández, “ENCARA2: Real-Time Detection of Multiple Faces at Different Resolutions in Video Streams,” Journal of
