IdenNet: Identity-Aware Facial Action Unit Detection

(1)

IdenNet: Identity-Aware Facial Action Unit Detection

Cheng-Hao Tu, Chih-Yuan Yang, Jane Yung-jen Hsu

Intelligent Agents (iAgents) Lab, Computer Science and Information Engineering

(2)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(3)

Facial Action Units (AUs)

● Facial expression is a fast, non-verbal channel that conveys our emotions and intentions.

● Facial Action Units (AUs) denote fundamental actions of individual muscles or groups of muscles, whose combinations describe more than 7,000 observable facial expressions.

(4)

Problem: Identity-based Differences

(5)

Motivation - Limited Subjects of AU-annotated Datasets

● Labeling AUs takes time and requires expert knowledge. Therefore, existing AU-annotated datasets contain only a limited number of subjects.

Dataset          Labels          Number of Subjects   Number of Samples
BP4D             AUs             41                   328 videos (148,562 frames)
DISFA            AUs             27                   27 videos (4 mins, ~129,600 frames)
UNBC-McMaster    AUs, Pain       25                   200 videos (48,398 frames)
AMFED            AUs, Interest   <= 242               242 videos (1 min each)

(6)

Idea - Utilizing ID-annotated Datasets

Dataset      Number of Subjects   Number of Samples
LFW          5,749                13,233
WDRef        2,995                99,773
CelebA       10,177               202,599
VGG FACE     2,622                2.6M
VGG FACE2    9,131                3.3M

● We aim to utilize identity-rich face datasets to improve AU detection.

(7)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(8)

SVTPT: Support Vector-based Transductive Parameter Transfer (ICMI 2014)

This method learns multiple linear classifiers (Support Vector Machines) from labeled source data. For an unlabeled test image, it transfers those classifier parameters to create a new linear classifier.

This method runs slowly because it needs to compare a test image with all training images.

(9)

DRML: Deep Region and Multi-label Learning for Facial Action Unit Detection (CVPR 2016)

This method develops a region layer to learn regional information and AU co-occurrences simultaneously.

This method does not use any additional dataset to train the model.

(10)

ROINet: Action Unit Detection with Region Adaptation, Multi-labeling Learning and Optimal Temporal Fusing (CVPR 2017)

This method selects 20 regions of interest to detect AUs and fuses temporal information when the input is a frame sequence.

This method does not use additional datasets.

The network size is large because it uses VGG conv1 to conv12 to extract features.

(11)

E-Net+ATF: Identity-based Adversarial Training of Deep CNNs for Facial Action Unit Recognition (BMVC 2018)

E-Net is a VGG19-based network containing enhancing layers, extended from ROINet.

ATF (adversarial training framework) is the parameter-updating strategy for its three sub-networks: E-Net, AU, and ID.

It shares a similar idea with ours: reducing identity-caused differences.

Differences: it uses VGG (large, generic), while ours uses LightCNN (small, face-specific), and ours uses an additional identity-rich dataset.

(12)

LightCNN: A Light CNN for Deep Face Representation with Noisy Labels (IEEE TIFS 2018)

This method develops MFM (Max-Feature-Map), a variation of the maxout activation function, to handle noisy data and reduce model size and computational load.

MFM-based small networks work well for face identification problems.
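The MFM operation itself is simple: split the feature channels in half and take the element-wise maximum of the two halves, so the output has half as many channels as the input. A minimal PyTorch sketch (the module name `MFM` is ours, not the paper's code):

```python
import torch
import torch.nn as nn

class MFM(nn.Module):
    """Max-Feature-Map: split channels in half, keep the element-wise max."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.chunk(x, 2, dim=1)  # two halves along the channel axis
        return torch.max(a, b)           # output has half the input channels

# A 64-channel feature map becomes a 32-channel one.
x = torch.randn(8, 64, 28, 28)
print(MFM()(x).shape)  # torch.Size([8, 32, 28, 28])
```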

(13)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(14)

Ideas

● To remove identity-caused differences

We subtract identity-based features from AU-based features (see the sketch after this list).

● To make the network small and robust

We adopt LightCNN and its pre-trained weights as our feature extractor.

We use ID-annotated, noisy images to train the network.
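A minimal sketch of the subtraction idea, assuming a shared feature extractor feeding an identity branch and an AU branch; the class and layer names (`IdenNetSketch`, `id_head`, `au_head`) are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class IdenNetSketch(nn.Module):
    """Illustrative only: subtract identity features from AU features."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_aus: int):
        super().__init__()
        self.backbone = backbone                        # shared extractor
        self.id_head = nn.Linear(feat_dim, feat_dim)    # identity branch
        self.au_head = nn.Linear(feat_dim, feat_dim)    # AU branch
        self.classifier = nn.Linear(feat_dim, num_aus)  # one logit per AU

    def forward(self, x):
        f = self.backbone(x)
        id_feat = self.id_head(f)
        au_feat = self.au_head(f) - id_feat  # remove identity-caused differences
        return self.classifier(au_feat), id_feat

# Stand-in extractor just to make the sketch runnable.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
model = IdenNetSketch(backbone, feat_dim=256, num_aus=12)
au_logits, id_feat = model(torch.randn(4, 3, 32, 32))
```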

(15)

Face Clustering Task

● We train our network to "learn a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity" (Schroff et al. 2015).

● We use the same triplet loss proposed by FaceNet (a minimal sketch follows the figure below).

[Figure: the triplet loss pulls the anchor sample (given identity) close to the positive sample (same identity) and pushes it far from the negative sample (different identity).]
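FaceNet's triplet loss has the standard form L = max(0, ‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + α). PyTorch ships a close equivalent in `nn.TripletMarginLoss` (it uses unsquared Euclidean distances by default, unlike FaceNet's squared distances); a minimal sketch:

```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=0.2)  # margin alpha is a hyperparameter

anchor   = torch.randn(32, 128)  # embeddings of anchor faces
positive = torch.randn(32, 128)  # same identity as the anchor
negative = torch.randn(32, 128)  # different identity
loss = triplet_loss(anchor, positive, negative)  # pulls positives in, pushes negatives out
```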

(16)

AU Detection Task

● We adopt the cross-entropy loss for AU detection, which is widely used by existing AU detection methods (a multi-label sketch follows the figure below).

[Figure: t-SNE visualization of 6,000 frames from 6 identities (1,000 frames per identity); black: AU12 labeled, gray: AU12 not labeled.]
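Because several AUs can be active in one face, AU detection is a multi-label problem, and the cross-entropy is typically applied per AU as a binary cross-entropy. A minimal sketch with `nn.BCEWithLogitsLoss`, assuming one logit per AU:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy, averaged over batch and AUs

logits = torch.randn(32, 12)                    # 12 AU logits per face
labels = torch.randint(0, 2, (32, 12)).float()  # 1 = AU present, 0 = absent
loss = bce(logits, labels)
```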

(17)

Optimization

● We combine AU- and ID-annotated datasets in our end-to-end training procedure with the total loss L_total = λ · L_cluster + (1 − λ) · L_AU, where λ weights the face-clustering loss against the AU-detection loss.

● For ID-annotated batches, we set λ to 1 because AU labels are unavailable.

● For AU/ID-annotated batches, we set λ to 0.5.
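The weighting above can be written as a tiny helper. The convex-combination form L = λ · L_cluster + (1 − λ) · L_AU is inferred from the slide's two settings of λ, and the function and argument names are illustrative:

```python
def total_loss(cluster_loss, au_loss, batch_has_au_labels: bool):
    """lambda = 1 for ID-only batches (no AU labels), 0.5 for AU/ID batches."""
    lam = 0.5 if batch_has_au_labels else 1.0
    if not batch_has_au_labels:
        return lam * cluster_loss  # the AU term cannot be computed without labels
    return lam * cluster_loss + (1.0 - lam) * au_loss
```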

(18)

Network Architecture

● We adopt 4 convolutional layers from LightCNN and use its weights pretrained on CASIA-WebFace.

● We randomly initialize our Face Clustering and AU Detection layers and use CelebA as our ID-annotated dataset to train the network.
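A minimal wiring sketch of this setup; the layer shapes and the checkpoint path are placeholders, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# Stand-in trunk for the first 4 LightCNN conv blocks.
backbone = nn.Sequential(
    nn.Conv2d(1, 48, 5, padding=2), nn.MaxPool2d(2),
    nn.Conv2d(48, 96, 3, padding=1), nn.MaxPool2d(2),
    nn.Conv2d(96, 192, 3, padding=1), nn.MaxPool2d(2),
    nn.Conv2d(192, 128, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# In the real setup these weights would come from a CASIA-WebFace checkpoint,
# e.g. backbone.load_state_dict(torch.load("lightcnn_casia.pth")).

face_clustering_head = nn.Linear(128, 128)  # triplet-loss embedding
au_detection_head    = nn.Linear(128, 12)   # one logit per AU
# nn.Linear layers are randomly initialized on construction, matching the
# "randomly initialize our Face Clustering and AU Detection layers" step.
```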

(19)

Outline

➢ Introduction and Motivation

➢ Related Work

➢ Proposed Method

➢ Experiments

➢ Conclusion

(20)

Datasets

                          AU/ID-annotated                       ID-annotated
Dataset                   BP4D       DISFA      UNBC-McMaster   CelebA
Year of release           2014       2013       2011            2015
Number of identities      41         27         25              10,177
Number of frames          148,572    ~129,600   48,398          202,599
Number of annotated AUs   27         12         10              none

(21)

Metrics

F1-measure, the harmonic mean of precision and recall: F1 = 2 · precision · recall / (precision + recall)
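For reference, per-AU F1 can be computed from raw counts; a minimal sketch:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(tp=80, fp=20, fn=40), 3))  # 0.727 with these example counts
```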

(22)

BP4D within-dataset

(23)

DISFA within-dataset

UNBC-McMaster within-dataset

(24)

Trained on BP4D / Test on DISFA

Trained on BP4D / Test on UNBC-McMaster

(25)

Summary

● We propose a lightweight, effective network to address the AU detection problem.

● It adopts two existing methods, LightCNN and FaceNet's triplet loss for face clustering, together with a feature subtraction that reduces identity-caused differences for AU detection.

● It achieves state-of-the-art performance in both within-dataset and cross-dataset scenarios.
