Human Age Estimation from Face Images - Understanding Social Interactions Using Images and Deep Learning

國立政治大學 ‧ National Chengchi University

2.2 Human Age Estimation from Face Images

Current age estimation systems based on face images usually comprise two concatenated stages: aging-related facial feature extraction and age estimation. We therefore review the literature on age estimation according to these two stages.

2.2.1 Aging-Related Facial Feature Extraction

Most previous studies on facial age estimation focused on the extraction and fusion of different types of facial features. For example, Choi et al. [64] compared the performance of various methods for extracting local features that can be used for detailed age estimation, including the Sobel filter, the difference image between the original and smoothed images, the ideal high-pass filter (IHPF), the Gaussian high-pass filter (GHPF), and the Haar and Daubechies discrete wavelet transforms (DWT). Both [65] and [66] combined global and local features (e.g., active appearance models (AAM), Gabor filters, local binary patterns (LBP), Gabor wavelets (GW), and local phase quantization (LPQ)) to form hybrid features that better represent facial aging. Furthermore, Huerta et al. [67] fused textural and local appearance-based descriptors to achieve faster and more accurate results. Guo et al. [68] proposed using canonical correlation analysis (CCA) to estimate age jointly with other facial information such as gender.

Meanwhile, other studies concentrated on extracting new features specially designed for age estimation [69, 70]. For example, Guo et al. [69] proposed estimating human age from faces using biologically inspired features (BIF), which are generated from a pyramid of Gabor filters. Geng et al. [70] introduced an approach named AGing pattErn Subspace (AGES). In AGES, the aging process is modeled by an aging pattern, defined as a sequence of face images of the same person at different ages, sorted in time order. To encode the face images, the AAM [71, 72] is used to extract the feature vectors that represent the face images in an aging pattern, so the extracted features combine both the shape and the intensity of the face images. In [73], the aging face was represented by integrating AAM, LBP, and Gabor features extracted from the face image. Furthermore, Suo et al. [74] designed four graphical facial features (topology, geometry, photometry, and configuration) based on their recently developed multi-resolution hierarchical face model [75]. Guided by this hierarchical model, instead of densely pursuing all filters over the image lattice, they applied particular filters to various parameters at different levels to extract these four types of features for age estimation.

The authors in [76] represented the aging of the face image by combining LBP histogram features with the main components of BIF, the shape and texture features of AAM, and the projection of the original image pixels onto a principal component analysis (PCA) subspace. Both [77] and [67] used the histogram of oriented gradients (HOG) [78] to represent facial features. Recent studies also proposed using high-level age-related visual features extracted with deep learning techniques such as CNNs for automatic age estimation [32, 79, 80]. These studies demonstrated that high-level semantic features learned by deep neural network architectures usually perform better than hand-crafted features.
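To make the notion of a hand-crafted texture descriptor concrete, the following is a minimal sketch of a basic 8-neighbour LBP histogram. It is an illustration only, not the descriptor configuration used in any of the cited works; the function name `lbp_histogram`, the clockwise bit order, and the plain nested-list image format are assumptions made here, and the image is assumed to be at least 3x3 pixels.

```python
def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.

    `gray` is a 2-D list of intensities (at least 3x3); border pixels are skipped.
    """
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[y + dy][x + dx] >= gray[y][x]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist]
```

In practice such histograms are usually computed per image block and concatenated, so that the descriptor retains some spatial layout of the face.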

2.2.2 Age Estimation Techniques

After the aging features have been extracted and represented, the next step is to estimate the age. Age estimation can be considered a particular pattern recognition task. It can be approached as a multi-class classification problem when each age label is viewed as a class [45, 81, 82]. Alternatively, it can be treated as a regression problem when the age labels are viewed as an ordered chronological sequence [24, 83, 84]. Age estimation techniques are therefore divided into two categories: a) classification-based methods and b) regression-based methods.

Regarding classification-based methods, existing studies have introduced several types of classification models. For instance, Gao and Ai [48] used a fuzzy version of linear discriminant analysis (LDA) [85] to classify a face image into one of four coarse categories: baby, child, adult, and old. Lanitis et al. [45] presented a quantitative evaluation of different classifiers for automatic age estimation, including the nearest neighbor classifier, artificial neural networks (ANN), and a quadratic function classifier. Ueki et al. [81] formulated age estimation as an 11-class classification problem on the Waseda human-computer Interaction Technology-DataBase (WIT-DB), in which 11 age groups are registered. They first built one Gaussian model for each of the 11 age groups in a low-dimensional 2DLDA+LDA feature space using the expectation-maximization (EM) algorithm. The age group is then determined by fitting the test image to each Gaussian model and comparing the likelihoods. Han and Jain [86] used three different support vector machines (SVMs) to predict the age group (or exact age), gender, and race of a subject.
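The likelihood-comparison step used for age-group classification can be sketched as follows. This is a simplified illustration, not the procedure of [81]: it assumes one diagonal-covariance Gaussian per age group fitted by plain maximum likelihood (no EM and no 2DLDA+LDA projection), and the function names, group labels, and toy 2-D features are all hypothetical.

```python
import math

def fit_gaussian(samples):
    """Per-dimension mean and variance (diagonal covariance) of a list of feature vectors."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(x[d] for x in samples) / n for d in range(dim)]
    # A small floor keeps every variance strictly positive.
    var = [max(sum((x[d] - mean[d]) ** 2 for x in samples) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(x, model):
    """Log-density of x under a diagonal-covariance Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def classify_age_group(x, models):
    """Return the age-group label whose Gaussian gives x the highest likelihood."""
    return max(models, key=lambda group: log_likelihood(x, models[group]))

# Toy 2-D features for two hypothetical age groups.
train = {
    "0-9": [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1]],
    "60-69": [[2.0, 1.9], [2.2, 2.1], [1.9, 2.0]],
}
models = {group: fit_gaussian(samples) for group, samples in train.items()}
```

Calling `classify_age_group([0.1, 0.1], models)` evaluates the test vector under every group model and returns the label with the largest log-likelihood.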

Treating age estimation as a regression problem, Lanitis et al. [29] examined three formulations of the aging function (linear, quadratic, and cubic) using 50 raw model parameters. A genetic algorithm was employed to learn the optimal model parameters from training face images of various ages. Guo et al. [18, 19] applied support vector regression (SVR) to an age manifold learned with the orthogonal locality preserving projections (OLPP) method. To fit an aging manifold learned with the conformal embedding analysis (CEA) method, Fu et al. [24, 83] used a multiple linear regression function [87], which attained considerable improvements over some existing methods. Yan et al. [88] used a semidefinite programming (SDP) formulation to solve the regression problem for age estimation, in which the regressor is learned from uncertain nonnegative labels. The authors demonstrated that the SDP formulation provides much better results than the quadratic regression function and multilayer perceptrons. However, SDP is computationally very expensive, particularly when the training set is large.
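For orientation, the three aging-function forms mentioned above can be written in a common notation. The symbols are assumptions introduced here rather than taken from [29]: b denotes the vector of raw model parameters, c an offset, w_1 to w_3 weight vectors, and powers of b are taken element-wise; the exact parameterization in [29] may differ.

```latex
\begin{aligned}
\text{linear:}\quad    & \mathrm{age} = c + \mathbf{w}_1^{\top}\mathbf{b} \\
\text{quadratic:}\quad & \mathrm{age} = c + \mathbf{w}_1^{\top}\mathbf{b} + \mathbf{w}_2^{\top}\mathbf{b}^{2} \\
\text{cubic:}\quad     & \mathrm{age} = c + \mathbf{w}_1^{\top}\mathbf{b} + \mathbf{w}_2^{\top}\mathbf{b}^{2} + \mathbf{w}_3^{\top}\mathbf{b}^{3}
\end{aligned}
```

Each form is linear in its unknown weights, so it could also be fitted by least squares; the genetic algorithm is the optimizer reported in the text.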

Recently, several deep learning-based techniques have been used for facial age estimation. For example, Takimoto et al. [89] integrated a multilayered neural network with an adapted retinal sampling mechanism to estimate facial age. Geng et al. [90] proposed a constructive probabilistic neural network for facial age estimation based on learning from label distributions. CNNs have also been used in several recent studies on age estimation [91–94]. Niu et al. [33] used ordinal regression and a multiple-output CNN for age estimation. The ordinal regression problem was transformed into a series of binary classification sub-problems, which were solved jointly by the CNN, with each output layer corresponding to one sub-problem. Chen et al. [95] proposed a cascaded classification-regression framework for estimating apparent age from unconstrained face images using deep convolutional neural networks (DCNNs). An error-correcting mechanism is also used to correct erroneous age predictions.
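The decomposition of ordinal regression into binary sub-problems can be illustrated with a small label-encoding sketch. It shows only the target encoding and the decoding rule, under the common convention that the k-th sub-problem asks whether the age exceeds the k-th threshold; the network itself is omitted, and the function names and the 0.5 cutoff are assumptions made here rather than details from [33].

```python
def encode_ordinal(age, min_age, max_age):
    """Binary targets: the k-th entry answers 'is age > min_age + k?'."""
    return [1 if age > threshold else 0 for threshold in range(min_age, max_age)]

def decode_ordinal(probabilities, min_age, cutoff=0.5):
    """Predicted age = min_age + number of sub-problems answered 'yes'."""
    return min_age + sum(1 for p in probabilities if p > cutoff)
```

For ages in the range 0 to 5, an age of 3 is encoded as `[1, 1, 1, 0, 0]`, and sub-problem outputs of `[0.9, 0.8, 0.7, 0.2, 0.1]` decode back to 3.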

In summary, all the previous works followed the conventional paradigm for facial age estimation, i.e., learning direct mappings between the extracted facial features and the associated age labels. This observation motivated us to develop a comparative approach to facial age estimation based on deep learning.

Motivated by human cognitive processes [37], it is arguable that a more robust way to estimate facial age is to do so comparatively, that is, by learning from a number of comparative relations (a given face is younger or older than another face of known age). The development of our approach was also inspired by other ranking-based methods, such as Ranking SVM [96], RankBoost [97], and RankNet [98]. Ranking SVM [96] formulates learning to rank as the problem of classifying instance pairs into two categories: correctly ranked and incorrectly ranked. Experimental results demonstrated that the algorithm performs well in practice, successfully adapting the retrieval function of a meta-search engine to the preferences of a group of users. Nevertheless, it assigns the same loss (penalty) to an incorrect ranking between a higher rank and a lower rank as to an incorrect ranking among lower ranks. This uniform penalty causes problems for facial age estimation, because the youngest and oldest persons have entirely different facial information.
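The pairwise view behind Ranking SVM can be sketched as a data transformation: every pair of samples with different ages becomes one difference vector with a binary label, on which any linear classifier could then be trained. This is a generic illustration of the pairwise reduction, not the exact formulation of [96]; the function name and the +1/-1 label convention are assumptions made here.

```python
def pairwise_transform(features, ages):
    """Turn (feature vector, age) samples into difference vectors with +1/-1 labels.

    Label +1 means the first sample of the pair is older, -1 that it is younger.
    """
    diffs, labels = [], []
    for i in range(len(ages)):
        for j in range(i + 1, len(ages)):
            if ages[i] == ages[j]:
                continue  # ties carry no ordering information
            diffs.append([a - b for a, b in zip(features[i], features[j])])
            labels.append(1 if ages[i] > ages[j] else -1)
    return diffs, labels
```

Every pair contributes equally to the training set regardless of how far apart the two ages are, which is the uniform-penalty property criticized above.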

RankBoost [97] is another ranking algorithm that is trained on pairs; it is similar to our work in that it attempts to solve the preference learning problem directly rather than solving an ordinal regression problem. Its results were reported using decision stumps as weak learners. The RankNet algorithm [98] uses a neural network formulation, with a probabilistic cost for learning ranking functions from pairs of training examples; it is easy to train and performs well on a real-world ranking problem with large amounts of data. In this study, a novel ranking approach is presented through the proposed comparative framework for facial age estimation. First, a set of selected references, i.e., baseline samples, is introduced into the framework to make each rank more robust. Second, the proposed age estimation model is built with deep learning, which provides effective features for ranking each age based on facial information. Finally, the younger/older comparison helps provide a robust ranking by learning similar facial information to estimate similar ranks; thus, the ranking is better structured.
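As an illustration of a probabilistic pairwise cost of the kind RankNet uses, the sketch below computes the cross-entropy between a target probability that item i ranks above item j and the probability modelled by a logistic function of the score difference. It is a minimal sketch of that cost as it is commonly written, not the framework proposed in this study; the function name and arguments are assumptions made here, and the scores would normally come from a neural network.

```python
import math

def ranknet_cost(score_i, score_j, target_prob):
    """Pairwise cross-entropy cost for 'item i should rank above item j'.

    The modelled probability is P = 1 / (1 + exp(-(score_i - score_j))), and the
    cost is -target*log(P) - (1 - target)*log(1 - P), which simplifies to
    log(1 + exp(diff)) - target * diff with diff = score_i - score_j.
    """
    diff = score_i - score_j
    # Numerically stable evaluation of log(1 + exp(diff)).
    softplus = max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return softplus - target_prob * diff
```

The cost is small when the score difference agrees with the target ordering and grows roughly linearly with the size of the disagreement.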

From the document: Understanding Social Interactions Using Images and Deep Learning - NCCU Academic Hub (pp. 42-47)