
Chung Hua University Master's Thesis

Title: An Image-Based Facial Expression Recognition System Using Expression Transition Matrices

A New Appearance-Based Facial Expression Recognition System with Expression Transition Matrices

Department: Master's Program, Department of Computer Science and Information Engineering
Student ID and Name: M09402054, 林靈逸
Advisor: Dr. 連振昌

January 2008 (Republic of China year 97)

Abstract

In this thesis, we propose a method based on expression transition to recognize six different facial expressions in low-resolution images, including anger, fear, happiness, neutral, sadness, and surprise. When a facial expression recognition system is applied in daily life, the lighting changes constantly, and these variations often cause recognition errors, so illumination compensation must be applied to every image. The effective facial region is then located on the compensated face, and direct mapping is used to compute the transition matrices between the effective regions of different facial expressions, which are stored in a database. When a facial image is input, face detection, template matching, and the Hough transform are used to crop the effective facial region that characterizes the facial expression; the expression transition matrices then transform it into the designated expressions, and the facial expression of the input image is identified by matching. Experimental results show that our facial expression recognition system can recognize 20 video clips captured in real environments effectively and accurately. Capturing one image from the camera and recognizing its expression takes about 0.24 seconds.

Acknowledgements

I was very fortunate to have the opportunity to study in the master's program of the Department of Computer Science and Information Engineering at Chung Hua University, to spend two memorable years of research life, and to obtain my master's degree. First of all, I would like to thank my advisor, Professor 連振昌, who during these two and a half years not only helped and instructed me in professional knowledge, but also taught me a great deal about attitudes toward life and how to conduct oneself. It is thanks to his patient guidance over these two years that I was able to complete my graduate studies.

Next, I would like to thank the many companions in the Intelligent Multimedia Laboratory with whom I spent my days. I especially thank my seniors 致傑, 仲平, 士棻, 嘉宏, 志強, 郁婷, 忠茂, 炳佑, and 揚凱 for their great guidance and help with my thesis research, my classmates 家銘, 建程, 清乾, 正達, and 昭偉 for their mutual support and care, and my juniors 雅麟, 佐民, 銘輝, 岳珉, 懷三, 偉欣, 怡婷, 筱萱, 明修, and 正倫 for their assistance and companionship, which made my two years of graduate life colorful.

Finally, I thank my parents and family for their constant care and concern, which allowed me to finish my master's thesis on schedule.

Contents

Abstract
Acknowledgements
Contents
Chapter 1 Introduction
Chapter 2 Preprocessing
Chapter 3 Expression Transition Matrix
Chapter 4 Facial Expression Recognition
Chapter 5 Experimental Results
Chapter 6 Conclusion

Chapter 1 Introduction

In recent years, more and more research on facial expression recognition has been applied to human-computer interaction, deceit detection, and the monitoring of depressed patients. In general, most facial expressions can be classified into six categories, including anger, fear, happiness, neutral, sadness, and surprise, and most facial expression recognition systems can be roughly divided into three kinds of approaches: feature-based, image-based, and model-based. However, most facial expression recognition methods suffer from drawbacks such as difficulty in extracting expression features, overly complex algorithms, long execution times, and inability to handle low-resolution images. In this thesis we therefore propose a facial expression recognition method based on expression transition to improve on these problems, using direct mapping to compute the transition matrices between different expressions. Chapter 2 introduces the preprocessing that must be performed before expression recognition, Chapter 3 describes in detail how direct mapping is used to compute the transition matrices between different expressions, Chapter 4 explains how the expression of a face is recognized, Chapter 5 presents the experimental results, and Chapter 6 gives the conclusion.

Chapter 2 Preprocessing

Before expression recognition, the region of the image containing the face must first be located. In real life, lighting changes with time and environment, and these variations cause the expression recognition system to make mistakes, so illumination compensation must be applied to the detected face region. On the illumination-compensated face image, template matching and the Hough transform are used to find the exact positions of the left and right eyes, and color template matching together with some properties of the mouth is used to find the exact position of the top of the mouth. Finally, inclination correction and size scaling are used to crop the effective facial region that characterizes the facial expression, and this effective region is then used for expression recognition.

Chapter 3 Expression Transition Matrix

In the training phase, we use expression transition matrices to record the transformation information between all facial expressions. We first select 240 facial images from the Cohn-Kanade expression database as training images. After manual rotation, cropping, size scaling, and brightness normalization, the effective facial region that characterizes the facial expression is located, direct mapping is used to compute all the expression transition matrices, and the matrices are finally stored in a database.

Chapter 4 Facial Expression Recognition

In the recognition phase, the expression transition matrices computed in the previous chapter are used not only to transform a facial image of a known expression into other designated expressions, but also to recognize six facial expressions, including anger, fear, happiness, neutral, sadness, and surprise. When a facial image of unknown expression is input, the method described in Chapter 2 is used to extract the effective facial region. Except for the neutral expression, every expression can occur with different intensities; for example, happiness may be a hearty laugh or just a smile. We therefore train two transitions of different intensities for each of the five expressions anger, fear, happiness, sadness, and surprise. Finally, a set of expression transition matrices (Tneutral-anger, Tneutral-weak anger, Tneutral-fear, Tneutral-weak fear, Tneutral-happiness, Tneutral-weak happiness, Tneutral-neutral, Tneutral-sadness, Tneutral-weak sadness, Tneutral-surprise, and Tneutral-weak surprise) is applied to the subject's neutral-expression image to obtain the different expressions, and correlation matching followed by some computation identifies the facial expression of the input image.

Chapter 5 Experimental Results

In the training phase, the effective facial regions are normalized to a size of 30×30, and direct mapping is used to compute the transition matrices from the neutral expression to each of the other expressions. In the recognition phase, we automatically recognize 20 video clips that we recorded in real environments: face detection, template matching, and some preprocessing are first used to obtain 30×30 effective facial regions, and the expression transition matrices are then used to recognize the facial expressions. Capturing one image from the camera and recognizing its expression takes about 0.24 seconds. The experimental results show that the facial expression recognition system achieves a good recognition rate (86%).

Chapter 6 Conclusion

In this thesis, we use the expression transition method to recognize six facial expressions, including anger, fear, happiness, neutral, sadness, and surprise. In the facial expression recognition system, direct mapping is used to compute the transition matrices between different facial expressions. The experimental results show that the proposed facial expression recognition system can accurately and effectively recognize 20 video clips captured in real environments. In the future, we will look for a larger database to build more accurate expression transition matrices and will attempt to recognize facial expressions from different view angles.

English Appendix

A New Appearance-Based Facial Expression Recognition System with Expression Transition Matrices

Prepared by Lin-Lin Yi

Directed by Dr. Cheng-Chang Lien

Computer Science and Information Engineering Chung-Hua University

Hsin-Chu, Taiwan, R.O.C.

February, 2008


Abstract

In this thesis, we propose a novel image-based facial expression recognition method called “expression transition” to identify six kinds of facial expressions (anger, fear, happiness, neutral, sadness, and surprise) in low-resolution images. Boosted tree classifiers and template matching are used to locate and crop the effective face region that may characterize facial expressions. Furthermore, illumination compensation, template matching, and the Hough transform are used to locate the positions of the eyes and mouth accurately. Then, the images transformed via a set of expression transition matrices are matched with the real facial images to identify the facial expressions. The proposed system can recognize the facial expressions at a speed of 0.24 seconds per frame with accuracy above 86%.


Contents

Abstract
Contents
Chapter 1 Introduction
Chapter 2 Image preprocessing
  2.1 Face detection and face region extraction
  2.2 Illumination compensation
  2.3 Locating of eyes and mouth
  2.4 Effective facial region
Chapter 3 Expression transition matrix
  3.1 Direct mapping
  3.2 Transform expression by transition matrix
Chapter 4 Facial expression recognition
Chapter 5 Experimental result
Chapter 6 Conclusion
References

Chapter 1

Introduction

Recently, a large amount of research [1-9] has addressed automatic facial expression recognition, which is widely applied to human-computer interaction [10], deceit detection [11], and the monitoring of depressed patients [12].

Generally, facial expressions are classified into six categories (anger, fear, happiness, neutral, sadness, and surprise) or described by using the action units (AUs) in the facial action coding system (FACS) [13].

Most facial expression recognition systems may be roughly categorized into feature-based methods [1-3], image-based methods [4, 5], and model-based methods [6]. The feature-based methods extract the shapes and locations of the eyebrows, eyes, nose, and mouth to form the expression feature vectors and identify the facial expressions. Kotsia et al. [1] manually place some of the Candide grid points around the regions of the eyes, eyebrows, and mouth; a grid adaptation process is applied to generate the deformed Candide grid, and the facial expression is identified by using the changing information of each Candide grid point in the video sequence. Cohn et al. [2] develop an automatic AU classification system based on feature point tracking, in which the displacements of 36 manually located feature points are estimated by using the optical flow method. Trujillo et al. [3] propose a novel interest point detection method to localize the facial features; the interest point operator works directly on the intensity variations of images in the OTCBVS IRIS thermal image database [14].

In the image-based methods, holistic and local spatial analyses of the facial appearance are used for recognition. Lu et al. [4] use the discrete wavelet transform (DWT) to extract feature vectors from the low-frequency subband to the high-frequency subbands. Donato et al. [5] explore many new approaches for the facial image representation to recognize AUs automatically, which include some holistic and local spatial analyses, e.g., principal component analysis (PCA), independent component analysis (ICA), local feature analysis (LFA), linear discriminant analysis (LDA), and Gabor wavelet decomposition.

In the model-based methods, the statistical models are constructed from training images and used to recognize the facial expressions. Huang et al. [6] apply the point distribution model and the gray-level model to find the facial features and track their positions when the variations of designated points on these facial features occur.

In total, ten action parameters (APs) are employed to identify the facial expressions.

To recognize the facial expressions accurately, many novel pattern recognition methods are integrated into the facial expression recognition systems. Kotsia et al. [1] apply support vector machines (SVM) to classify the six basic facial expressions and eight AUs. Lu et al. [4] utilize a self-organizing map (SOM) and a multi-layer perceptron (MLP) neural network to reduce the dimension of the facial features and to classify the facial expressions, respectively. In [7], several machine learning methods, including AdaBoost, linear discriminant analysis (LDA), and support vector machines (SVM), are applied and their facial expression recognition performance is compared. However, the above-mentioned studies have two significant drawbacks. Firstly, most facial feature extraction methods may not extract the facial features (shape, color, and position) robustly because of hair or glasses occlusion, lighting variation, and wrinkles. Secondly, the computation of extracting facial features is complex and costly.

Furthermore, most current research [1-7] addresses facial expression recognition in high-resolution images. However, in indoor/outdoor surveillance applications, facial images are often captured at low resolution, so applying the above-mentioned methods in real environments makes facial expression recognition inaccurate. Shan et al. [8] propose a novel low-computation discriminative feature called local binary patterns (LBP) for facial expression recognition in low-resolution images; however, the recognition accuracy is unsatisfactory when the resolution of the facial image is less than 36×48. Wang et al. [9] identify facial expressions in low-resolution images by using an expression classifier learned by boosting Haar features. Although the accuracy and processing time of the boosting method outperform the other methods, the training of the classifier often needs a long time.

In this thesis, a novel method called “expression transition” is developed to reduce the high training cost and to recognize facial expressions in low-resolution images. Based on the statistical view-transition method [15], an effective facial expression recognition method is proposed. In addition, the illumination compensation projection method, template matching, and the Hough transform [27] are used to improve the locating of the eyes and mouth. Based on the positions of the eyes and mouth, the effective facial region can be extracted for the expression transformation process. The system block diagram, including the training phase and the recognition phase, is shown in Fig. 1. In the training phase, a large number of training facial images are categorized into six categories (anger, fear, happiness, neutral, sadness, and surprise). The expression transition matrices that are used to model the transformation between two facial expressions are constructed by direct mapping. In the recognition phase, a learning-based face detection technique from Intel's open source computer vision library [16] is used to detect the face region, and the expression transition matrices obtained in the training phase are applied to recognize the facial expressions.

Fig. 1 Block diagram of the proposed facial expression recognition system.

Chapter 2

Image Preprocessing

Before recognizing the facial expressions, some image preprocessing steps (face detection, geometrical correction, illumination compensation, locating the positions of the eyes and mouth, and cropping of the effective face region) are required to generate an effective facial image. The flowchart of the image preprocessing is shown in Fig. 2.

Fig. 2 Image preprocessing steps for extracting the effective face region.

2.1 Face detection and face region extraction

To recognize facial expressions efficiently, a robust and fast face detection technique is required. Viola et al. [21] apply a cascade of boosted tree classifiers to detect faces; the classifiers are trained on face samples of fixed size 24×24 and are built from the extended set of Haar-like features shown in Fig. 3.

Fig. 3 Extended set of 14 Haar-like features.

In our facial expression recognition system, we apply the face detection algorithm in Intel's open source computer vision library [16] to detect the face region. Some results of face detection are shown in Fig. 4. Then, the face region is scaled to a size of 100×100.

(a) frame 0 (b) frame 1 (c) frame 2 (d) frame 3 (e) frame 4 (f) frame 5 (g) frame 6 (h) frame 7

Fig. 4 Faces detected within a video sequence (from neutral expression to happiness, anger, and surprise expressions).
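The thesis gives no source code for this step; the following Python/OpenCV sketch is only an illustration of boosted-cascade face detection followed by rescaling to 100×100. The cascade file, parameter values, and function name are assumptions rather than the settings actually used in the thesis.

```python
import cv2

# A stock boosted Haar cascade shipped with OpenCV (assumed stand-in for the
# detector used in the thesis).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

def detect_face_region(frame_bgr):
    """Return the largest detected face region rescaled to 100x100, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5, minSize=(40, 40))
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda r: r[2] * r[3])  # keep the largest face
    return cv2.resize(frame_bgr[y:y + h, x:x + w], (100, 100))
```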

2.2 Illumination compensation

Because non-uniform illumination can make the appearance-based facial expression recognition inaccurate, illumination compensation should be applied to cope with this problem. Here, the illumination compensation based on the multiple regression model (ICR) [25] is adopted to compensate for the non-uniform illumination.

The illumination compensation based on the multiple regression model [25] can be written as:

Y = XB + ε,    (1)

where X is the image coordinate matrix, Y is the image intensity, B is the matrix of regression parameters, and ε is the error, which is assumed to be normally distributed with mean 0 and variance σ². By using the least squares estimator, the regression parameter matrix B can be derived by the following formula:

B = (X^T X)^-1 X^T Y.    (2)

Then, the best-fit plane Z can be written as:

Z = XB.    (3)

Finally, the illumination compensated plane Z′ can be derived as:

Z′ = [zc − z1, zc − z2, zc − z3, ..., zc − zq]^T,    (4)

where zc = (max(Z) − min(Z))/2 and q is the total number of pixels. Some examples of illumination compensation are shown in Fig. 5.
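As a rough numpy sketch of Eqs. (1)-(4), the plane fitting can be done with an ordinary least-squares solve; the way the compensated image is finally assembled (zc plus the residual Y − Z) is my reading of the ICR method, not code taken from the thesis.

```python
import numpy as np

def illumination_compensate(gray):
    """Fit a best-fit intensity plane Z = XB (Eqs. 1-3) and return a
    compensated image built from zc and the residual Y - Z (Eq. 4)."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    X = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # coordinates
    Y = gray.astype(np.float64).ravel()                            # intensities
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least-squares estimate (Eq. 2)
    Z = X @ B                                   # best-fit plane (Eq. 3)
    zc = (Z.max() - Z.min()) / 2.0              # reference level from the text
    compensated = zc + (Y - Z)                  # remove the illumination trend
    return np.clip(compensated, 0, 255).reshape(h, w).astype(np.uint8)
```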


2.3 Locating of eyes and mouth

Because the pupil is especially conspicuous in the red channel, as shown in Fig. 6, a 5×5 template shown in Fig. 7(a) is used to find the position of the pupil. However, the pupil position obtained from the template matching shown in Fig. 8 can only serve as a candidate position of the pupil. Hence, the Hough transform is applied to find the center position of the pupil from the edge information extracted by the Canny edge detection [16]. Based on the candidate position, different radii are used to find the precise position of the pupil. Some examples of extracting the precise position of the pupil are shown in Fig. 10.

(a) (b) (c)

Fig. 6 Pupil image in (a) red, (b) green, and (c) blue channel respectively.
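A sketch of this coarse-to-fine pupil search in Python/OpenCV is given below; the red-channel template, the search-window size, and the Hough parameters are placeholders I have assumed, not the thesis's values.

```python
import cv2

def locate_pupil(face_bgr, eye_template_red, win=15):
    """Template matching on the red channel gives a candidate pupil position;
    a circular Hough transform around that candidate refines the center."""
    red = face_bgr[:, :, 2]  # pupils are most conspicuous in the red channel
    scores = cv2.matchTemplate(red, eye_template_red, cv2.TM_CCOEFF_NORMED)
    _, _, _, (mx, my) = cv2.minMaxLoc(scores)
    th, tw = eye_template_red.shape[:2]
    cx, cy = mx + tw // 2, my + th // 2          # candidate pupil center
    x0, y0 = max(cx - win, 0), max(cy - win, 0)
    roi = red[y0:cy + win, x0:cx + win]
    # HoughCircles applies Canny internally (param1 is its upper threshold).
    circles = cv2.HoughCircles(roi, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                               param1=100, param2=10, minRadius=2, maxRadius=8)
    if circles is None:
        return cx, cy                            # fall back to the candidate
    px, py, _ = circles[0, 0]
    return int(x0 + px), int(y0 + py)
```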

To find the position of the mouth, the color template matching and edge searching are applied again. The algorithm of locating the mouth is described in the following steps.

1. Based on the anthropometry and center of pupils, we can determine a searching line segment shown in Fig. 9 (a).

2. Compute the edge density along the searching line segment. If the edge density is larger than a predefined threshold, then this point is categorized into the candidate mouth positions.

3. Apply the template matching to find the precise position of mouth shown in Fig. 9 (b). The mouth template is shown in Fig. 7 (b).

Some examples of extracting the position of mouth are shown in Fig. 11.
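A sketch of the three-step mouth search described above follows; the anthropometric offsets, the edge-density threshold, and the mouth template are assumed values for illustration.

```python
import cv2
import numpy as np

def locate_mouth(face_bgr, left_eye, right_eye, mouth_template, density_thr=0.15):
    """Step 1: build a vertical search segment below the eye midpoint.
    Step 2: keep rows whose local edge density exceeds a threshold.
    Step 3: confirm the best candidate by color template matching."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = int(np.hypot(rx - lx, ry - ly))          # inter-eye distance
    mx, top = (lx + rx) // 2, min(ly, ry)
    th, tw = mouth_template.shape[:2]
    best_score, best_pos = -1.0, None
    for y in range(top + int(0.8 * d), min(top + int(1.6 * d), gray.shape[0] - th)):
        window = edges[y - 2:y + 3, mx - d // 2:mx + d // 2]
        if window.size == 0 or np.count_nonzero(window) / window.size < density_thr:
            continue                              # step 2: not enough edges here
        roi = face_bgr[y:y + th, max(mx - tw, 0):mx + tw]
        if roi.shape[0] < th or roi.shape[1] < tw:
            continue
        score = cv2.matchTemplate(roi, mouth_template, cv2.TM_CCOEFF_NORMED).max()
        if score > best_score:
            best_score, best_pos = score, (mx, y)  # step 3: best template match
    return best_pos
```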


(a) (b)

Fig. 7 (a) Eye template. (b) Mouth template.

(a) (b)

Fig. 8 The candidate pupil position is found by template matching and Hough transform for the expression of (a) neutral and (b) surprise.

(a) (b)

Fig. 9 (a) The blue line segment for searching the position of mouth. (b) The red point is the detected mouth position.

Fig. 10 The detected positions of eyes/mouth and radius of pupil.


2.4 Effective facial region

With the coordinates of the eyes and mouth, inclination correction and size scaling are performed to locate the effective face region. In the inclination correction process, the inclination correction angle θ for a facial image is calculated from the positions of the left eye (xl, yl) and the right eye (xr, yr):

θ = tan⁻¹( (yr − yl) / (xr − xl) ).    (8)

Based on the inclination correction angle θ, the position (x, y) in the facial image can be transformed to the new position (x′, y′):

x′ = cos θ (x − xc) − sin θ (y − yc) + xc,
y′ = sin θ (x − xc) + cos θ (y − yc) + yc,    (9)

where (xc, yc) is the center point of the facial image.

After rotating the facial image, the effective face region can be obtained by scaling and cropping the facial image according to a fixed length ratio among the positions of the left eye, right eye, and top of the mouth. Once the effective face region is reduced to size n×n, the expression feature vector a can be formed. Fig. 11 shows examples of the extracted effective face region for the six facial expressions.

neutral anger fear sadness surprise happiness

(a)

(b)

Fig. 11 The effective face regions extracted from (a) the Cohn-Kanade database and (b) online captured face images.

The effective face region that may characterize facial expressions encloses only the eyebrows, eyes, nose, and mouth. Using the effective face region removes the hair occlusion problem and makes different persons have similar appearances for the same facial expression.
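The rotation in Eqs. (8)-(9) is equivalent to an affine warp about the image center, so a hedged Python/OpenCV sketch of this step might look as follows; the crop margins expressed as fractions of the eye distance are illustrative assumptions, not the fixed ratios used in the thesis.

```python
import cv2
import numpy as np

def effective_face_region(face_img, left_eye, right_eye, mouth_top, n=30):
    """Rotate the face so the eyes are horizontal (Eqs. 8-9), then crop a
    region bounded by the eyes and the top of the mouth, scaled to n x n."""
    (xl, yl), (xr, yr) = left_eye, right_eye
    theta = np.degrees(np.arctan2(yr - yl, xr - xl))   # Eq. (8), in degrees
    h, w = face_img.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta, 1.0)  # Eq. (9)
    rotated = cv2.warpAffine(face_img, rot, (w, h))
    # Rotate the landmark coordinates with the same affine matrix.
    pts = np.array([[xl, yl, 1], [xr, yr, 1], [mouth_top[0], mouth_top[1], 1]], float)
    (xl, yl), (xr, yr), (xm, ym) = (rot @ pts.T).T
    eye_dist = xr - xl
    # Crop margins proportional to the eye distance (assumed ratios).
    x0, x1 = int(xl - 0.4 * eye_dist), int(xr + 0.4 * eye_dist)
    y0, y1 = int(min(yl, yr) - 0.6 * eye_dist), int(ym + 0.3 * eye_dist)
    crop = rotated[max(y0, 0):y1, max(x0, 0):x1]
    return cv2.resize(crop, (n, n))
```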


Chapter 3

Expression Transition Matrix

To develop a fast facial expression recognition system for low-resolution images and to avoid the unreliability of facial feature extraction in the feature-based methods, we propose an image-based facial expression recognition method called “expression transition” to identify six kinds of facial expressions with high efficiency and accuracy.

3.1 Direct mapping

To reduce the computation cost, the cropped effective facial region of size m×m is normalized into a lower-resolution image of size n×n (n < m). For each normalized image, an expression feature vector a is constructed by scanning the image from left-top to right-bottom and written as

a = [a0,0, a0,1, ..., an,n]^T,    (10)

where ax,y denotes the pixel value at position (x, y). The expression transition matrix is used to model the transformation between two facial expression images and is calculated by examining the changes in geometrical pixel value distributions [19], as shown in Fig. 12. We employ the direct mapping approach to calculate the expression transition matrices. The direct mapping approach transforms between two facial expressions directly on the low-resolution facial images. However, the computation cost of the expression transition matrices with this approach increases significantly at higher image resolutions. We can control both the accuracy and the cost of the expression transition matrix computation with the size of the expression feature vector.

Fig. 12 Facial expression transformation from expression indices i to j.

Given two subsets of feature vectors with facial expression indices i and j, the transformation between the two facial expressions may be modeled as

[a1^j, a2^j, ..., ak^j] = Tij [a1^i, a2^i, ..., ak^i],    (11)

where k is the number of expression feature vectors formed by individual persons and Tij represents the expression transition matrix from facial expression index i to j. Let Bj = [a1^j, a2^j, ..., ak^j] and Bi = [a1^i, a2^i, ..., ak^i]; then Eq. (11) can be rewritten as

Bj = Tij Bi.    (12)

We may apply the pseudo-inverse matrix to find the solution of Tij as

Tij = Bj Bi^T (Bi Bi^T)^-1.    (13)

Fig. 13 illustrates the facial expression transformation using the expression transition matrix Tij constructed by the approach of direct mapping.
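To make Eqs. (10)-(13) concrete, a small numpy sketch is given below: it stacks the raster-scanned effective face regions of a source and a target expression as columns of Bi and Bj and solves for the transition matrix. The function and variable names are mine, and np.linalg.pinv is used because Bi Bi^T can be rank-deficient when the number of training vectors is smaller than n².

```python
import numpy as np

def expression_transition_matrix(source_faces, target_faces):
    """source_faces / target_faces: matched lists of k effective face regions
    (n x n arrays) with expression i and expression j for the same persons.
    Returns Tij such that Bj ~ Tij Bi (Eqs. 11-13)."""
    # Eq. (10): raster-scan each n x n image into one column feature vector.
    Bi = np.column_stack([f.astype(np.float64).ravel() for f in source_faces])
    Bj = np.column_stack([f.astype(np.float64).ravel() for f in target_faces])
    # Eq. (13): Tij = Bj Bi^T (Bi Bi^T)^-1, computed with a pseudo-inverse
    # because the Gram matrix is usually singular when k < n*n.
    return Bj @ Bi.T @ np.linalg.pinv(Bi @ Bi.T)
```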


Fig. 13 Facial expression transition using direct mapping.

3.2 Transform expression by transition matrix

Any two different expressions are related by a transition matrix, so a known expression can be transformed into another expression. This can be written as:

aj = Tij ai,    (14)

where ai is the feature vector of expression index i, Tij is the expression transition matrix from facial expression index i to j, and aj is the transformed expression feature vector.

Fig. 15 shows the facial images of neutral expression and the transformed facial images with expression transition matrices Tneutral-anger, Tneutral-fear, Tneutral-happiness, Tneutral-neutral, Tneutral-sadness, and Tneutral-surprise respectively. Therefore, all the facial images of desired expression can be obtained by using the expression transition process and such an expression transition may be applied to develop the facial expression recognition system. Fig. 14 illustrates some expression transformed images.

(a)

(b)

Fig. 14 (a) Expression transitions for the facial images in the Cohn-Kanade database. (b) Expression transitions for the captured facial images.

Here, we use the expression transition matrix to model the expression transition between two facial expression images. Using these expression transition matrices, we can transform a facial image with a known expression into a facial image of any desired expression. In addition, they can also be applied to recognize the six kinds of facial expressions.
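Applying a stored matrix then reduces to the single product of Eq. (14); the clipping of the result back to the 0-255 range before reshaping is an implementation detail I have assumed.

```python
import numpy as np

def transform_expression(neutral_face, Tij, n=30):
    """Synthesize expression j from an n x n neutral face via aj = Tij ai."""
    ai = neutral_face.astype(np.float64).ravel()      # Eq. (10)
    aj = Tij @ ai                                     # Eq. (14)
    return np.clip(aj, 0, 255).reshape(n, n).astype(np.uint8)
```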


Chapter 4

Facial Expression Recognition

In the training phase, the effective face region cropping shown in Fig. 15 is performed manually according to the positions of left-eye, right-eye, and top of the mouth and the histogram equalization [17] is applied to remove the effect of brightness variations. The Cohn-Kanade facial expression database [18] is used to construct the expression transition matrices. This database consists of approximately 2000 facial image sequences captured from more than 200 persons and classified into six categories including anger, fear, happiness, neutral, sadness, and surprise.

Moreover, these facial images are acquired under different lighting conditions and from persons of different ages, sexes, races, and appearances.

Fig. 15 Effective face region is cropped manually according to the positions of left-eye, right-eye, and top of the mouth in the training phase.

In Chapter 3, the expression transition matrices are calculated by direct mapping. These matrices may not only be applied to transform a facial image of a known expression into a facial image of any desired expression, but can also be employed to recognize six different facial expressions, including anger, fear, happiness, neutral, sadness, and surprise. However, images with the same facial expression can have different degrees of facial action; for example, happiness may be a hearty laugh or a smile.

Hence, two transition matrices for the expression transition from neutral to happiness, anger, fear, sadness, or surprise are generated to simulate expression transitions of different degrees. In total, eleven transition matrices are computed to describe the expression transitions. Each facial expression is recognized by fusing the two correlations obtained by matching the input facial image with the simulated facial images of different degrees. The fusion rule is defined in Eq. (15).

Fu = e × e′,    (15)

where e is the correlation of the happiness, anger, surprise, fear, or sadness expression, e′ is the correlation of the weak version of the same expression, and Fu is the fused correlation. Fig. 16 shows the block diagram of the proposed facial expression recognition system.

Fig. 16 Block diagram of the proposed facial expression recognition system: the neutral (registered) image is transformed by the transition matrices for happiness, weak happiness, anger, weak anger, surprise, weak surprise, fear, weak fear, sadness, weak sadness, and neutral; each transformed image is compared with the real facial image by correlation matching, the strong and weak correlations of each expression are fused, and the recognized expression is output.

Given an input facial image of neutral expression, we may transform the facial image with a set of expression transition matrices T = {Tneutral-anger, Tneutral-weak anger, Tneutral-fear, Tneutral-weak fear, Tneutral-happiness, Tneutral-weak happiness, Tneutral-neutral, Tneutral-sadness, Tneutral-weak sadness, Tneutral-surprise, and Tneutral-weak surprise} to obtain the transformed facial images respectively. Next, the method of correlation matching [24] defined in Eq. (16) is applied to calculate the correlation coefficients between the transformed facial images and the real facial images.

γ = Σx Σy [f̂(x, y) − f̂m][f(x, y) − fm] / { Σx Σy [f̂(x, y) − f̂m]² Σx Σy [f(x, y) − fm]² }^(1/2),    (16)

where f̂ is the transformed facial image, f is the registered facial image, and f̂m and fm are the average pixel values of the transformed facial image and the registered facial image respectively. The facial expression is identified by choosing the expression with the maximum correlation coefficient.
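A sketch of Eqs. (15)-(16) in numpy follows: the correlation below is computed globally over the whole 30×30 region (the zero-shift case of Eq. (16)), and the dictionary layout pairing each expression with its strong and weak transition matrices is an assumption made for illustration.

```python
import numpy as np

def correlation(f_hat, f):
    """Normalized correlation coefficient between two images (Eq. 16)."""
    a = f_hat.astype(np.float64) - f_hat.mean()
    b = f.astype(np.float64) - f.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def recognize_expression(neutral_face, real_face, transition_matrices, n=30):
    """transition_matrices: dict such as {'happiness': (T_strong, T_weak), ...,
    'neutral': (T_neutral, None)}. Returns the expression whose (fused)
    correlation with the real facial image is maximal."""
    ai = neutral_face.astype(np.float64).ravel()
    scores = {}
    for name, (T_strong, T_weak) in transition_matrices.items():
        strong = correlation((T_strong @ ai).reshape(n, n), real_face)
        if T_weak is None:                      # neutral has no weak variant
            scores[name] = strong
        else:
            weak = correlation((T_weak @ ai).reshape(n, n), real_face)
            scores[name] = strong * weak        # fusion rule of Eq. (15)
    return max(scores, key=scores.get)
```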


Chapter 5

Experimental Results

The proposed facial expression recognition system consists of two phases: the training phase and the recognition phase. In the training phase, 240 facial images captured from 104 persons in the Cohn-Kanade facial expression database are used to construct the expression transition matrices. All the training facial images are classified into six kinds of expressions (anger, fear, happiness, neutral, sadness, and surprise), and the effective face region is extracted from each facial image by the method described in Chapter 2. Each effective face region is normalized to a low-resolution image of size 30×30, and direct mapping is applied to construct a total of 11 expression transition matrices.

In the recognition phase, 20 video clips from 8 different persons are used as the test data to evaluate the proposed facial expression recognition system. We employ the face detection technique in Intel's open source computer vision library to detect the face region at a rate of about 10 frames per second. Next, template matching and the Hough transform are applied to search for the positions of the left and right eyes, and color template matching and edge searching are applied to find the position of the mouth. Based on the positions of the eyes and mouth, the effective face region can be extracted and normalized to size 30×30.

The ground truth for the accuracy analysis is constructed by human observation of the expressions in the test video clips. The facial expressions in each test video are recognized frame by frame at a rate of about 0.24 seconds per frame. The accuracy analyses are performed both with and without temporal filtering. The objective of temporal filtering is to remove abrupt changes of the recognized expressions that are abnormal during the facial expression recognition process. Hence, in the temporal filtering process, a sliding window of suitable size is applied to filter out the abrupt changes of the recognized expressions. Table 1 shows the accuracy analysis of the facial expression recognition system without temporal filtering and Table 2 shows the accuracy analysis with temporal filtering. Because the facial expressions are recognized continuously within the test videos, many expression transitions occur, and this influences the recognition accuracy. Furthermore, many testers cannot express the fear and sad expressions. Hence, we do not recognize the fear and sad expressions in the real testing of facial expression recognition.
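The thesis does not spell out the temporal filter; one plausible minimal version, assumed here, is a sliding-window majority vote over the per-frame labels.

```python
from collections import Counter

def temporal_filter(labels, window=5):
    """Smooth per-frame expression labels with a sliding-window majority vote
    to suppress abrupt single-frame changes (the window size is assumed)."""
    half = window // 2
    return [Counter(labels[max(0, t - half):t + half + 1]).most_common(1)[0][0]
            for t in range(len(labels))]
```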

Table 1: The facial expression recognition results without temporal filter

video clip number | Neutral: right wrong rate | Happy: right wrong rate | Anger: right wrong rate | Surprise: right wrong rate
video 1 35 1 97% 15 1 94% 11 0 100% 14 1 93%

video 2 62 0 100% 22 7 76% 20 2 91% 14 4 78%

video 3 78 0 100% 20 3 87% 18 7 72% 14 5 74%

video 4 73 1 99% 14 2 88% 29 4 88% 20 1 95%

video 5 27 0 100% 19 6 76% 20 3 87% 16 0 100%

video 6 28 6 82% 17 3 85% 14 11 56% 26 1 96%

video 7 37 2 95% 24 0 100% 17 2 89% 18 1 95%

video 8 30 0 100% 10 16 38% 18 1 95% 25 0 100%

video 9 29 3 91% 19 5 79% 19 2 90% 25 0 100%

video 10 34 0 100% 20 3 87% 32 2 94% 19 1 95%

video 11 31 2 94% 24 3 89% 17 7 71% 27 0 100%

video 12 30 0 100% 14 4 78% 19 8 70% 24 1 96%

video 13 40 0 100% 19 4 83% 18 2 90% 22 2 92%

video 14 13 0 100% 19 1 95% 26 0 100% 31 1 97%

video 15 56 2 97% 11 1 92% 11 1 92% 13 4 76%

video 16 22 16 58% 27 3 90% 21 1 95% 23 3 88%

video 17 33 5 87% 26 11 70% 17 2 89% 26 1 96%


video 19 38 0 100% 12 10 55% 10 8 56% 11 6 65%

video 20 55 3 95% 24 4 86% 14 2 88% 43 11 80%

total 819 43 95% 366 102 78% 378 73 84% 429 48 90%

Table 2: The facial expression recognition results with temporal filter

video clip number | Neutral: right wrong rate | Happy: right wrong rate | Anger: right wrong rate | Surprise: right wrong rate

video 1 32 4 89% 15 1 94% 10 1 91% 13 2 87%

video 2 58 4 94% 25 4 86% 20 2 91% 16 2 89%

video 3 75 3 96% 20 3 87% 21 4 84% 15 4 79%

video 4 69 5 93% 14 2 88% 31 2 94% 19 2 90%

video 5 26 1 96% 22 3 88% 20 3 87% 15 1 94%

video 6 25 9 74% 18 2 90% 20 5 80% 25 2 93%

video 7 36 3 92% 23 1 96% 17 2 89% 17 2 89%

video 8 30 0 100% 14 12 54% 17 2 89% 25 0 100%

video 9 28 4 88% 23 1 96% 18 3 86% 24 1 96%

video 10 33 1 97% 21 2 91% 30 4 88% 18 2 90%

video 11 33 0 100% 25 2 93% 22 2 92% 26 1 96%

video 12 28 2 93% 15 3 83% 24 3 89% 23 2 92%

video 13 39 1 98% 21 2 91% 18 2 90% 22 2 92%

video 14 13 0 100% 18 2 90% 25 1 96% 31 1 97%

video 15 55 3 95% 10 2 83% 11 1 92% 16 1 94%

video 16 24 14 63% 30 0 100% 21 1 95% 25 1 96%

video 17 30 8 79% 34 3 92% 18 1 95% 25 2 93%

video 18 69 1 99% 14 11 56% 30 5 86% 20 3 87%

video 19 36 2 95% 17 5 77% 14 4 78% 13 4 76%

video 20 56 2 97% 24 4 86% 15 1 94% 43 11 80%

total 795 67 92% 403 65 86% 402 49 89% 431 46 90%

Because the facial expression (happiness) in video clips 8 and 18 is not obvious, it is often recognized as the neutral expression, as shown in Fig. 17. In addition, some images with eye closing, shown in Fig. 18, also make the expression recognition inaccurate.


Fig. 17 The facial expression (happiness) in video clips 9 and 18 is not obvious and is often recognized as the neutral expression.

Fig. 18 Eye closing makes the expression recognition inaccurate.


Chapter 6

Conclusion

In this thesis, we propose an image-based facial expression recognition method called “expression transition” to identify six kinds of facial expressions including anger, fear, happiness, neutral, sadness, and surprise. The expression transition matrices are calculated by direct mapping and are used to develop our facial expression recognition system. The experimental results show that the proposed facial expression recognition system can recognize the facial expressions at a speed of 0.24 seconds per frame with accuracy above 86%. Future work will focus on refining the expression transition matrices with a larger facial expression database and on recognizing facial images with various view angles.


References

[1] I. Kotsia and I. Pitas, “Real time facial expression recognition from image sequences using support vector machines,” IEEE International Conference on Image Processing, vol. 2, pp. 966-969, 2005.

[2] J. Cohn, A. Zlochower, J. J. Lien, Y. T. Wu, and T. Kanade, “Automated face coding: a computer-vision based method of facial expression analysis,” 7th European Conference on Facial Expression Measurement and Meaning, pp.

329-333, 1997.

[3] L. Trujillo, G. Olague, R. Hammoud, and B. Harnandez, “Automatic feature localization in thermal images for facial expression recognition,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol.

3, pp. 14, 2005.

[4] Y. Z. Lu and Z. Y. Wei, “Facial expression recognition based on wavelet transform and MLP neural network,” Proceedings of 7th International Conference on Signal Processing, vol. 2, pp. 1340-1343, 2004.

[5] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski,

“Classifying facial actions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 974-989, 1999.

[6] C. L. Huang and Y. M. Huang, “Facial expression recognition using model-based feature extraction and action parameters classification,” Journal of Visual Communication and Image Representation, vol. 8, pp. 278-290, 1997.

[7] M. S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, and J. Movellan,

“Recognizing facial expression: machine learning and application to spontaneous behavior,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 568-573, 2005.

[8] C. Shan, S. Gong, and P. W. McOwan, “Recognizing facial expressions at low resolution,” IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 330-335, 2005.

[9] Y. Wang, H. Ai, B. Wu, and C. Huang, “Real time facial expression recognition with adaboost,” Proceedings of the 17th International Conference on Pattern Recognition, vol. 3, pp. 926-929, 2004.


[10] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor, “Emotion recognition in human-computer interaction,” IEEE Signal Processing Magazine, vol. 18, pp. 32-80, 2001.

[11] P. Ekman, Telling lies: clues to deceit in the marketplace, politics, and marriage, ed. first, W. W. Norton & Company, New York, 1985.

[12] M. Heller and V. Haynal, “The faces of suicidal depression (translation les visages de la depression de suicide),” Kahiers Psychiatriques Genevois (Medicine et Hygiene Editors), vol. 16, pp. 107-117, 1994.

[13] P. Ekman and W. Friesen, “Facial action coding system: a technique for the measurement of facial movement,” Palo Alto, Calif.: Consulting Psychologists Press, 1978.

[14] OTCBVS database, http://www.cse.ohio-state.edu/otcbvs-bench.

[15] A. Utsumi and N. Tetsutani, “Adaptation of appearance model for human tracking using geometrical pixel value distributions,” Proceedings of Sixth Asian Conference on Computer Vision, vol. 2, pp. 794-799, 2004.

[16] Intel’s open source computer vision library, http://www.intel.com/technology/

computing/opencv/index.htm.

[17] R. C. Gonzalez and R. E. Woods, Digital Image Processing, ed. second, Prentice Hall, New Jersey, 2002.

[18] Cohn-Kanade facial expression database, http://vasc.ri.cmu.edu/idb/html/face/

facial_expression/index.html.

[19] A. Utsumi and N. Tetsutani, “Human detection using geometrical pixel value structures,” Proceedings of Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 34-39, 2002.

[20] S. J. Leon, Linear Algebra with Applications, ed. fifth, Prentice Hall, New Jersey, 1998.

[21] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 511-518, 2001.

[22] H. Jee, K. Lee, and S. Pan, “Eye and face detection using SVM,” Proceedings of the 2004 Conference on Intelligent Sensors, Sensor Networks and Information Processing, pp. 577-580, 2004.

[23] R. C. Gonzalez and R. E. Woods, Digital Image Processing, ed. second, Prentice Hall, New Jersey, 2002.


[24] C. Shan, S. Gong, and P. W. McOwan, “Robust facial expression recognition using local binary patterns,” IEEE International Conference on Image Processing, vol. 2, pp. II-370-3, 2005.

[25] J. Ko, E. Kim, and H. Byun, “A simple illumination normalization algorithm for face recognition,” PRICAI 2002, LNAI 2417, Springer-Verlag Berlin Heidelberg, pp. 532-541, 2002.

[26] Z. H. Zhou and X. Geng, “Projection functions for eye detection,” Pattern Recognition, vol. 37, pp. 1049-1056, 2004.

[27] Z. Xu and P. Shi, “A robust and accurate method for pupil features extraction,” Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), vol. 1, pp. 437-440, 2006.
