IdenNet: Identity-Aware Facial Action Unit Detection

Academic year: 2022

(1)

IdenNet: Identity-Aware Facial Action Unit Detection

Cheng-Hao Tu, Chih-Yuan Yang, Jane Yung-jen Hsu

Intelligent Agent Lab (iAgents Lab)

Computer Science and Information Engineering

(2)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(3)

Facial Action Units (AUs)

● Facial expression is a fast and non-verbal channel conveying our emotions and intentions.

● Facial Action Units (AUs) indicate fundamental actions of individual muscles or groups of muscles, which describe more than 7,000 observable facial expressions.

(4)

Problem: Identity-based Differences

(5)

Motivation - Limited Subjects of AU-annotated Datasets

● Labeling AUs takes time and requires expert knowledge. Therefore, existing AU-annotated datasets contain only a limited number of subjects.

Dataset         Labels          Number of Subjects   Number of Samples
BP4D            AUs             41                   328 videos (148,562 frames)
DISFA           AUs             27                   27 videos (4 min, ~129,600 frames)
UNBC-McMaster   AUs, Pain       25                   200 videos (48,398 frames)
AMFED           AUs, Interest   <= 242               242 videos (1 min)

(6)

Idea - Utilizing ID-annotated Datasets

Dataset     Number of Subjects   Number of Samples
LFW         5,749                13,233
WDRef       2,995                99,773
CelebA      10,177               202,599
VGGFace     2,622                2.6M
VGGFace2    9,131                3.3M

● We aim to utilize identity-rich face datasets to improve AU detection.

(7)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(8)

SVTPT: Support Vector-based Transductive Parameter Transfer (ICMI 2014)

This method learns multiple linear classifiers (support vector machines) from labeled source data. For an unlabeled test image, it transfers those classifier parameters to create a new linear classifier.

This method runs slowly because it needs to compare a test image with all training images.

(9)

DRML: Deep Region and Multi-label Learning for Facial Action Unit Detection (CVPR 2016)

This method develops a region layer to simultaneously learn regional information and AU co-occurrence.

This method does not use any additional dataset to train the model.

(10)

ROINet: Action Unit Detection with Region Adaptation, Multi-labeling Learning and Optimal Temporal Fusing (CVPR 2017)

This method selects 20 regions of interest to detect AUs, and fuses temporal information when the input is a sequence of frames.

This method does not use additional datasets.

The network size is large because it uses VGG conv1 to conv12 to extract features.

(11)

E-Net+ATF: Identity-based Adversarial Training of Deep CNNs for Facial Action Unit Recognition (BMVC 2018)

E-Net is a VGG19-based network with enhancing layers, extended from ROINet.

ATF (adversarial training framework) is the parameter-updating strategy for its three sub-networks: E-Net, AU, and ID.

It shares our goal of reducing identity-caused differences.

Differences: it uses VGG (large, generic), whereas ours uses LightCNN (small, face-specific).

Ours uses an additional identity-rich dataset.

(12)

LightCNN: A light CNN for Deep Face Representation with Noisy Labels (IEEE TIFS 2018)

This method develops MFM (Max-Feature-Map), a variation of the maxout activation function, to handle noisy data and reduce model size and computational load.

MFM-based small networks work well for face identification problems.
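
As a rough illustration of the MFM idea, the sketch below splits a feature vector into two halves and keeps the element-wise maximum, which halves the number of channels. It assumes the two-to-one MFM variant applied to a flat list of channel values. The function name `mfm` is hypothetical, and a real network would apply the same operation at every spatial location of a convolutional feature map.

```python
def mfm(channels):
    """Max-Feature-Map sketch: element-wise max of the two channel halves."""
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    # Each output channel keeps the stronger of two competing responses.
    return [max(a, b) for a, b in zip(first, second)]


features = [0.2, -1.0, 3.5, 0.7, 0.4, -2.0]
print(mfm(features))  # six input channels become three output channels
```

Because the output has half as many channels as the input, stacking such layers keeps the network small, which matches the model-size argument above.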

(13)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(14)

Ideas

● To remove identity-caused differences

We subtract identity-based features from AU-based features.

● To make the network small and robust

We adopt LightCNN and its pre-trained weights as our feature extractor.

We use ID-annotated, noisy images to train the network.
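
The subtraction idea can be sketched as follows, under the assumption that the AU branch and the identity branch produce feature vectors of the same length. The names `remove_identity`, `au_features`, and `id_features` are illustrative and do not come from the slides, and the toy numbers are made up to show the effect.

```python
def remove_identity(au_features, id_features):
    """Subtract identity-based features from AU-based features."""
    if len(au_features) != len(id_features):
        raise ValueError("feature vectors must have the same length")
    return [a - i for a, i in zip(au_features, id_features)]


# Toy example: two subjects show the same expression, but their raw AU
# features differ because each carries a subject-specific identity offset.
expression = [0.5, 0.25]
id_a, id_b = [1.0, 2.0], [3.0, -1.0]
au_a = [e + i for e, i in zip(expression, id_a)]
au_b = [e + i for e, i in zip(expression, id_b)]

print(remove_identity(au_a, id_a))
print(remove_identity(au_b, id_b))
```

In this idealized case both subjects end up with the same identity-free features, which is the behavior the subtraction is meant to encourage.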

(15)

Face Clustering Task

● We train our network to “learn a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.” (Schroff et al. 2015)

● We use the same triplet loss proposed by FaceNet.

[Figure: triplet loss. Training pulls the anchor sample (given identity) close to the positive sample (same identity) and pushes it far from the negative sample (different identity).]
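
A minimal sketch of a FaceNet-style triplet loss, assuming squared Euclidean distances between embedding vectors. The margin value 0.2 and the function names are assumptions made for illustration; the slides do not state the margin that was used.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther than the positive by the margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


anchor = [0.0, 0.0]
positive = [0.5, 0.0]       # same identity, close to the anchor
easy_negative = [2.0, 0.0]  # different identity, already far away
hard_negative = [0.5, 0.0]  # different identity, as close as the positive

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

Only the hard triplet yields a non-zero loss, so the gradient concentrates on identities that the embedding still confuses.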

(16)

AU Detection Task

● We adopt cross-entropy loss for AU detection, which is widely used by existing AU detection methods.

t-SNE visualization of 6,000 frames from 6 identities (1,000 frames per identity)

Black: AU12 labeled. Gray: AU12 not labeled.
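
For a multi-label problem such as AU detection, a cross-entropy loss is commonly computed per AU and averaged. The sketch below assumes one predicted probability per AU (for example from a sigmoid output) and binary labels; the function name and the clipping constant `eps` are illustrative choices rather than details from the slides.

```python
import math


def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean binary cross-entropy over AUs for a single frame."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [1, 0, 1]                 # which AUs are active in the frame
confident = [0.9, 0.1, 0.8]        # predictions that agree with the labels
uninformative = [0.5, 0.5, 0.5]    # predictions that carry no information

print(binary_cross_entropy(confident, labels))
print(binary_cross_entropy(uninformative, labels))
```

Predictions that agree with the labels give a lower loss than uninformative ones, which is the signal used to train the AU detection layers.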

(17)

Optimization

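● We combine AU- and ID-annotated datasets in our end-to-end training procedure with a total loss that is a weighted sum of the face clustering loss and the AU detection loss.

● For ID-annotated batches, we set the weight of the face clustering loss to 1 because AU labels are unavailable.

● For AU/ID-annotated batches, we set it to 0.5.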
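
A minimal sketch of how the two batch types could be weighted, assuming the total loss is a convex combination of the form weight × clustering loss + (1 − weight) × AU loss. That form is an assumption consistent with the two weight settings above, and the function names `loss_weight` and `total_loss` are hypothetical.

```python
def loss_weight(has_au_labels):
    """Weight on the face clustering loss for a given batch type."""
    return 0.5 if has_au_labels else 1.0


def total_loss(clustering_loss, au_loss, has_au_labels):
    """Assumed convex combination of the clustering and AU detection losses."""
    weight = loss_weight(has_au_labels)
    # Without AU labels the AU term is undefined, so it contributes nothing.
    au_term = au_loss if has_au_labels else 0.0
    return weight * clustering_loss + (1.0 - weight) * au_term


print(total_loss(0.8, 0.4, has_au_labels=True))    # AU/ID-annotated batch
print(total_loss(0.8, None, has_au_labels=False))  # ID-annotated batch
```

With this weighting, ID-only batches update the network through the clustering loss alone, while AU/ID batches balance both objectives equally.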

(18)

Network Architecture

● We adopt 4 convolutional layers from LightCNN and use its weights pretrained on CASIA-WebFace.

● We randomly initialize our Face Clustering and AU Detection layers and use CelebA as our ID-annotated dataset to train the network.

(19)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(20)

Datasets

Dataset                   BP4D      DISFA      UNBC-McMaster   CelebA
Annotation                AU/ID     AU/ID      AU/ID           ID
Year of release           2014      2013       2011            2015
Number of identities      41        27         25              10,177
Number of frames          148,572   ~129,600   48,398          202,599
Number of annotated AUs   27        12         10              none

(21)

Metrics

F1-measure, the harmonic mean of recall and precision
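
As a worked example of the metric, the sketch below computes the F1-measure for one AU from binary ground-truth and predicted labels. The function name `f1_measure` and the toy labels are illustrative, and returning 0.0 when there are no true positives is a convention chosen here.

```python
def f1_measure(true_labels, predicted_labels):
    """Harmonic mean of precision and recall for binary labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


true_labels = [1, 1, 1, 0, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0]
print(f1_measure(true_labels, predicted_labels))
```

Here there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and the F1-measure is also 2/3.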

(22)

BP4D within-dataset

(23)

DISFA within-dataset

UNBC-McMaster within-dataset

(24)

Trained on BP4D / Tested on DISFA

Trained on BP4D / Tested on UNBC-McMaster

(25)

Summary

● We propose a lightweight, effective network to address the AU detection problem.

● It builds on two existing methods, LightCNN for feature extraction and FaceNet's triplet loss for face clustering, and adds a subtraction to reduce identity-caused differences for AU detection.

● It achieves state-of-the-art performance in both within-dataset and cross-dataset scenarios.
