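This work evaluated several methods for computing the distance between face image sets and ran experiments on four different datasets. Among these methods, "LAHISD" performed best in both recognition rate and computation time. Proposed in 2010, it is one of the more recent methods for computing image-set similarity. Although it performs very well in a simple environment such as the "Honda/UCSD" dataset, its performance on the "Sinica Face Video Dataset" is very poor, which shows that image-set similarity algorithms for face recognition still have considerable room for improvement.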
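To make the notion of a distance between image sets concrete, the following is a minimal sketch of the simplest possible set-to-set baseline: the smallest Euclidean distance between any pair of feature vectors, one drawn from each set. It is not LAHISD and not any of the methods evaluated here; the function names, the two-dimensional toy features, and the labels are assumptions chosen only for illustration.

```python
import math

def set_distance(set_a, set_b):
    """Smallest Euclidean distance between any pair of feature vectors,
    one taken from each image set (a naive baseline, not LAHISD)."""
    return min(math.dist(a, b) for a in set_a for b in set_b)

def classify(query, gallery):
    """Return the gallery label whose image set is closest to the query set."""
    return min(gallery, key=lambda label: set_distance(query, gallery[label]))

# Hypothetical toy data: each face image is reduced to a 2-D feature vector.
gallery = {
    "person_a": [(0.0, 0.0), (1.0, 0.0)],
    "person_b": [(5.0, 5.0), (6.0, 5.0)],
}
query = [(0.5, 0.2), (0.9, 0.1)]
label = classify(query, gallery)
```

In practice each face would be represented by a much higher-dimensional descriptor, and the methods compared in this work replace the pairwise minimum with a model of each set.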
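The experiments in this work point to several factors that affect face recognition. A random sample of about 10 to 20 face images per image set keeps the effect on recognition rate within a reasonable range. Recognition from video achieves a higher recognition rate than recognition from single images alone. Training and test data should not overlap, nor should they come from similar or even identical video clips. A small number of falsely detected faces has little effect on the overall recognition rate. In addition, in the experiment on the effect of face detection accuracy, with random sample sizes of 50 and 100, recognition using automatically detected faces actually achieved a higher recognition rate than recognition using manually annotated faces. This could be investigated further by repeating the experiment with only those manually annotated faces that were also found by the automatic detector, which might remove the effect of the less frontal faces that manual annotation includes.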
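The follow-up experiment suggested above could be set up roughly as follows. This is a sketch under stated assumptions, not the procedure used in this work: boxes are assumed to be `(x1, y1, x2, y2)` tuples grouped by frame index, a manual annotation counts as "also detected" when its intersection over union with some automatic detection in the same frame reaches a hypothetical threshold of 0.5, and the helper names are invented for illustration.

```python
import random

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def keep_detected(manual, detected, threshold=0.5):
    """Keep manually annotated boxes that overlap an automatic detection
    in the same frame. The threshold is an assumed value."""
    kept = {}
    for frame, boxes in manual.items():
        hits = [m for m in boxes
                if any(iou(m, d) >= threshold for d in detected.get(frame, []))]
        if hits:
            kept[frame] = hits
    return kept

def sample_faces(faces, k, seed=None):
    """Randomly draw at most k faces from one image set."""
    rng = random.Random(seed)
    return rng.sample(faces, min(k, len(faces)))

# Hypothetical annotations: frame 1 has a manual box the detector missed.
manual = {0: [(10, 10, 50, 50)], 1: [(100, 100, 140, 140)]}
detected = {0: [(12, 12, 52, 52)]}
filtered = keep_detected(manual, detected)
```

Sampling from the filtered manual annotations and from the automatic detections with the same `k` would then compare the two sources on the same set of faces.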
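This work also described the collection and annotation process of a new video dataset, the "Sinica Face Video Dataset". As a step toward further face recognition tasks, this dataset allows research on gender classification and adult/child classification to be carried out on challenging real-life videos. From the experimental results presented in this work, [...] tracking algorithms. In addition, the manual annotation program can be used on other types of video datasets to build ground truths for person tracking.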
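The new dataset brings new challenges and related research. In this work, the ground truths for face regions were annotated manually; in the future, facial features such as the contours of the facial parts could also be annotated manually as ground truths. This would allow facial feature labeling and tracking algorithms for real-life video to be evaluated and studied in the same way as face detection and tracking algorithms, and it would also make it possible to study the effect of facial image alignment on recognition performance.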