5.5 Applications and Experiments
5.5.4 Automatic Commenting Assistant
We propose a novel application: given an image, automatically recommend comments that contain the VACs most likely to be evoked, as predicted from the image content. Automatic commenting is an emerging function in social media², which aims to generate comments for a given post, e.g., a tweet or a blog entry, by observing the topics and opinions that appear in the content. However, commenting on images has not been addressed, because of the difficulty of understanding visual semantics and visual affect. Intuitively, commenting behavior is strongly influenced by viewer affect concepts. This motivates us to study automatic image commenting based on the proposed viewer affect concept prediction.
² More details on commenting bots can be found at http://en.wikipedia.org/wiki/Twitterbot
The proposed method (Corr) considers both the PACs detected from the visual content and the PAC-VAC correlations captured by the Bayesian probabilistic model described in Section 5.4.2. First, we detect the PACs in the test image and construct a candidate comment pool by extracting the comments of training-set images whose visual content contains similar PACs (the top 3 detected PACs with the highest P(p_k | d_i)). Each comment is represented as a bag of viewer affect concepts, i.e., a vector C_l indicating the presence of each VAC in that comment. Meanwhile, the test image d_i is represented by a vector V_i consisting of the posterior probability P(v_j | d_i) (cf. Eq. 5.3) of each VAC given the image. The relevance between a comment and the test image is measured by their inner product, s_li = C_l · V_i. Finally, we select the comment with the highest relevance score s_li from the candidate pool as the automatic comment. Note that the images used to extract comments for the candidate pool do not overlap with the test image set. We compare our method with two baselines: (1) PAC-only, which selects one of the comments associated with the other image whose PACs are most similar to those of the test image, and (2) Random, which randomly selects a comment from the comments of the training images.
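The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function names, the toy vocabulary, the posterior values and the comments are all hypothetical, the pool rule (a training image qualifies if it shares at least one of the test image's top 3 PACs) is only one assumed reading of "similar PACs", and VAC presence is tested by simple word matching.

```python
import re
from typing import List, Sequence, Set, Tuple


def candidate_pool(top_pacs: Set[str],
                   training: Sequence[Tuple[Set[str], List[str]]]) -> List[str]:
    """Collect comments of training images that share a PAC with the test image.

    Assumption: "similar PACs" is read as "at least one PAC in common".
    """
    pool: List[str] = []
    for pacs, comments in training:
        if pacs & top_pacs:
            pool.extend(comments)
    return pool


def bag_of_vacs(comment: str, vocabulary: Sequence[str]) -> List[int]:
    """Binary vector C_l: 1 if the VAC occurs as a word in the comment, else 0."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    return [1 if vac in words else 0 for vac in vocabulary]


def relevance(comment_vec: Sequence[int], vac_posterior: Sequence[float]) -> float:
    """Inner product s_li = C_l . V_i."""
    return sum(c * v for c, v in zip(comment_vec, vac_posterior))


def select_comment(pool: Sequence[str],
                   vocabulary: Sequence[str],
                   vac_posterior: Sequence[float]) -> Tuple[str, float]:
    """Return the pooled comment with the highest relevance score."""
    if not pool:
        raise ValueError("candidate pool is empty")
    scored = [(relevance(bag_of_vacs(c, vocabulary), vac_posterior), c) for c in pool]
    best_score, best_comment = max(scored, key=lambda pair: pair[0])
    return best_comment, best_score


# Hypothetical toy data (not taken from the experiments).
vocabulary = ["cute", "beautiful", "peaceful"]   # VAC vocabulary
vac_posterior = [0.6, 0.3, 0.1]                   # V_i: P(v_j | d_i) for each VAC
training = [
    ({"cute dog", "happy kid"}, ["So cute and so peaceful!", "Nice capture."]),
    ({"misty morning"}, ["What a beautiful place to be!"]),
    ({"busy street"}, ["Beautiful and cute!"]),   # shares no PAC, so it is excluded
]
top_pacs = {"cute dog", "misty morning", "sunny beach"}  # top 3 PACs of the test image

pool = candidate_pool(top_pacs, training)
best_comment, best_score = select_comment(pool, vocabulary, vac_posterior)
print(best_comment, round(best_score, 2))
```

In this toy run the comment from the third training image would score highest on its own, but it never enters the pool because that image shares no PAC with the test image, which shows how the pool restricts selection to visually related images.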
We conduct a user study to evaluate the quality of automatic commenting in terms of (1) plausibility, (2) specificity to the image content and (3) whether the comment is liked by users. In total, 30 users take part in this experiment. Each automatic comment is evaluated by three different users to reduce potential user bias. Each user is asked to evaluate 40 automatic comments, each generated for a test image, and to rate every comment on three dimensions (with a score from 1 to 3 on each). Plausibility: how plausible the comment is given the specific image content; Specificity: how specific the comment is to the image content; Like: how much the user likes the comment. In total, 400 image-comment pairs are included in this investigation.
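The bookkeeping of this study design can be checked with a short sketch. The helper names and the three example ratings below are hypothetical and only illustrate the arithmetic and the per-dimension averaging; they are not data from the study.

```python
from statistics import mean
from typing import Dict, List

DIMENSIONS = ("plausibility", "specificity", "like")


def num_rated_pairs(num_users: int, comments_per_user: int, raters_per_comment: int) -> int:
    """Number of distinct image-comment pairs covered by the study design."""
    total_ratings = num_users * comments_per_user
    if total_ratings % raters_per_comment != 0:
        raise ValueError("ratings cannot be split evenly across comments")
    return total_ratings // raters_per_comment


def mean_scores(ratings: List[Dict[str, int]]) -> Dict[str, float]:
    """Average the 1-to-3 scores of one comment over its raters, per dimension."""
    for rating in ratings:
        for dim in DIMENSIONS:
            if not 1 <= rating[dim] <= 3:
                raise ValueError(f"{dim} score must be between 1 and 3")
    return {dim: float(mean(r[dim] for r in ratings)) for dim in DIMENSIONS}


# 30 users x 40 comments each, 3 raters per comment.
print(num_rated_pairs(30, 40, 3))  # prints 400

# Hypothetical ratings of one comment by its three raters.
example = [
    {"plausibility": 3, "specificity": 2, "like": 2},
    {"plausibility": 2, "specificity": 2, "like": 1},
    {"plausibility": 1, "specificity": 2, "like": 3},
]
print(mean_scores(example))
```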
[Figure 5.3 image omitted: four example images, panels (a)-(d), shown with their comments. Comment text recoverable from the figure: "gorgeous composition! what a beautiful place to be!"; "great super moon shot and nice tutorial too!"; "I was in my car screaming that I didn't have my camera. Beautifully done!"; "This beautiful photo is reminiscent of a Maxfield Parrish painting, at least to my eyes."; "Cool shot with the haze in the background. Must be early morning late spring?"; "lovely moody shot - so peaceful!"; "they are so cute when they curl up like this to sleep..nice capture"; "Epaulettes might make it look a little long-necked..."]
Figure 5.3: Example results of VAC prediction and automatic comment selection.
As shown in Figure 5.4, the largest gain appears in plausibility, where our method significantly outperforms the two baselines, PAC-only and Random, by 35% and 56% (relative improvement), respectively. The proposed approach also clearly improves the specificity of the generated comments to the visual content of the image. For example, comments containing the affect concept “cute” are selected by our method for images containing “dog” or “kid.” Our method (Corr) also produces comments that are better liked by users. There are two potential reasons: (1) our method tends to include viewer affect concepts that comprise more emotional words and thus evoke stronger responses from the subjects; (2) our method uses the correlation model, which tries to learn the popular commenting behavior discovered from real comments in social multimedia, as described in Section 5.4.2. Overall, the comments selected by our method are the closest in quality to the original real comments. Figure 5.3 (a) and (b) show a few plausible and content-relevant fake comments (dashed) automatically generated by the proposed commenting robot. One additional finding is that if the selected comments mention incorrect objects (“moon” in (c)) or actions (“sleep” in (d)) for the given image, users can easily distinguish them from the real ones. This suggests an interesting future refinement: incorporating object detection into the automatic commenting process.
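For reference, a relative improvement compares the gain with the baseline score rather than reporting the absolute difference. A minimal sketch follows; the function name and the two example scores are hypothetical and are not read from Figure 5.4.

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Gain of `ours` over `baseline`, expressed as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (ours - baseline) / baseline


# Hypothetical scores: 0.54 for the proposed method against 0.40 for a baseline.
gain = relative_improvement(0.54, 0.40)
print(f"{gain:.0%}")  # prints 35%
```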
[Figure 5.4 image omitted: bar chart with a vertical axis from 0 to 0.8; groups: plausibility, specificity, like; series: real, random, PAC-only [3], Corr.]
Figure 5.4: Subjective quality evaluation of automatic commenting for image content.
In another evaluation scheme, we focus on the plausibility of the fake comments. Each test includes an image, one original comment, and the fake comments selected by the proposed method and the baseline (Random). The user is asked to decide which of the four comments is the most plausible given the specific image. Comments generated by the content-aware method confuse the users 28% of the time, while the real comment is considered the most plausible 61% of the time. This is quite encouraging given that our method is completely content-based: the prediction relies purely on analysis of the image content and the affect concept correlation model, and no textual metadata of the image is used. It is also interesting that 11% of the randomly selected comments are judged to be more plausible than the original real comment. However, as discussed earlier, such random comments tend to have poor quality in terms of content specificity.
5.6 Remarks
In this chapter, we study visual affect concepts from two explicit perspectives, publisher affect concepts and viewer affect concepts, and aim to analyze their correlations: which viewer affect concepts are evoked when a specific publisher affect concept is expressed in the image content. For this purpose, we propose to discover hundreds of viewer affect concepts from million-scale comment sets crawled from social multimedia. Furthermore, we predict the viewer affect concepts by detecting the publisher affect concepts in the image content and applying the probabilistic correlations between such publisher affect concepts and viewer affect concepts mined from social multimedia. Extensive experiments confirm the utility of the proposed methods in three applications: image recommendation, viewer affect concept prediction, and the image commenting robot. Future directions include incorporating viewer profiles when predicting the likely response affects and extending the methods to other domains.
Chapter 6
Conclusions and Future Work
In summary, we address human-centric data analytics from three perspectives: (1) people-centric visual search, (2) demographic data mining and (3) viewer affective comment prediction. We propose a framework for learning facial attributes by crowdsourcing weakly labeled data from social multimedia. Based on these automatically detected attributes, we demonstrate their effectiveness in retrieving images and mining user preferences. Beyond profiling people in visual content, we further propose to analyze the viewer affective feedback elicited by social multimedia. The proposed methodologies are beneficial for cross-disciplinary research in computational sociology and cognitive psychology. Furthermore, the mined knowledge is essential for advertising, personalization services and other human-centric applications, which draw great industry attention in mobile, search, cloud computing and online advertising technologies. We believe these strong links will create more opportunities for collaboration and development between academia and industry.
Bibliography
[1] Available at http://www.flickr.com/photos/spencerfinnley/5377578656/, http://www.flickr.com/photos/spolyak/1031569673/.
[2] Internet world stats: The latest internet indicators, usage, penetration rates, population, country size and iso 3316 symbol. http://www.internetworldstats.com/.
[3] I. Arapakis, J. M. Jose, and P. D. Gray. Affective feedback: An investigation into the role of emotions in the information seeking process. In ACM SIGIR Conference, 2008.
[4] M. Argyle and J. Dean. Eye-contact, distance and affiliation. In Sociometry, 1965.
[5] S. Baluja and H. A. Rowley. Boosting sex identification performance. In International Journal of Computer Vision, 2007.
[6] T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White, Y.-W. Teh, E. Learned-Miller, and D. Forsyth. Names and faces in the news. In IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[7] T. L. Berg, A. C. Berg, and J. Shih. Automatic attribute discovery and characterization from noisy web data. In European Conference on Computer Vision, 2010.
[8] S. Bird, E. Loper, and E. Klein. Natural language processing with Python. 2009.
[9] D. Black. The theory of committees and elections. Cambridge University Press, London, 1958, 2nd ed., 1963.
[10] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang. Large-scale visual sentiment ontology and detectors using adjective noun pairs. In ACM International Conference on Multimedia, 2013.
[11] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In International Conference on Image and Video Retrieval, 2007.
[12] R. F. Bruce and J. M. Wiebe. Recognizing subjectivity: A case study of manual tagging. Natural Language Engineering, 1999.
[13] J. D. Burger, J. Henderson, G. Kim, and G. Zarrella. Discriminating gender on twitter. In International AAAI Conference on Weblogs and Social Media, 2011.
[14] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[15] B.-C. Chen, Y.-H. Kuo, Y.-Y. Chen, K.-Y. Chu, and W. H. Hsu. Semi-supervised face image retrieval using sparse coding with identity constraint. In ACM International Conference on Multimedia, 2011.
[16] H. Chen, A. Gallagher, and B. Girod. Describing clothing by semantic attributes. In European Conference on Computer Vision, 2012.
[17] Y.-Y. Chen, A.-J. Cheng, and W. H. Hsu. Personalized travel recommendation by mining people attributes and social group types from community-contributed photos. In IEEE Transactions on Multimedia, 2013.
[18] Y.-Y. Chen, W. H. Hsu, and H.-Y. M. Liao. Learning facial attributes by crowdsourcing in social media. In International Conference on World Wide Web, 2011.
[19] Y.-Y. Chen, W. H. Hsu, and H.-Y. M. Liao. Discovering informative social subgraphs and predicting pairwise relationships from group photos. In ACM International Conference on Multimedia, 2012.
[20] Y.-Y. Chen, W. H. Hsu, and H.-Y. M. Liao. Automatic training image acquisition and effective feature selection from community-contributed photos for facial attribute detection. In IEEE Transactions on Multimedia, 2013.
[21] A.-J. Cheng, Y.-Y. Chen, Y.-T. Huang, W. H. Hsu, and H.-Y. M. Liao. Personalized travel recommendation by mining people attributes from community-contributed photos. In ACM International Conference on Multimedia, 2011.
[22] D. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the world’s photos. In International Conference on World Wide Web, 2009.
[23] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[24] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Studying aesthetics in photographic images using a computational approach. In European Conference on Computer Vision, 2006.
[25] M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent sub-structure-based approaches for classifying chemical compounds. In IEEE Transactions on Knowledge and Data Engineering, 2005.
[26] A. Esuli and F. Sebastiani. Sentiwordnet: A publicly available lexical resource for opinion mining. In International Conference on Language Resources and Evaluation, 2006.
[27] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning object categories from google’s image search. In IEEE International Conference on Computer Vision, 2005.
[28] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 1987.
[29] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting image databases from the web. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.
[30] K. A. Frank. Identifying cohesive subgroups. In Social Networks, 1995.
[31] K. A. Frank and J. Y. Yasumoto. Linking action to social structure within a system: Social capital within and between subgroups. In American Journal of Sociology, 1998.
[32] A. C. Gallagher and T. Chen. Understanding images of groups of people. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[33] X. Geng, T.-Y. Liu, T. Qin, and H. Li. Feature selection for ranking. In International ACM SIGIR conference on Research and development in information retrieval, 2007.
[34] G. Guo, G. Mu, Y. Fu, and T. S. Huang. Human age estimation using bio-inspired features. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[35] E. T. Hall. The hidden dimension. In Culture, 1966.
[36] A. Hanjalic. Extracting moods from pictures and sounds: Towards truly personalized tv. IEEE Signal Processing Magazine, 2006.
[37] M. Hu and B. Liu. Mining opinion features in customer reviews. In AAAI Conference on Artificial Intelligence, 2004.
[38] P. Isola, J. Xiao, A. Torralba, and A. Oliva. What makes an image memorable? In IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[39] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In European Machine Learning and Data Mining Conference, 1998.
[40] T. Kudo, E. Maeda, and Y. Matsumoto. An application of boosting to graph classification. In Conference on Neural Information Processing Systems, 2004.
[41] N. Kumar, P. Belhumeur, and S. Nayar. Facetracer: A search engine for large collections of images with faces. In European Conference on Computer Vision, 2008.
[42] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar. Attribute and simile classifiers for face verification. In IEEE International Conference on Computer Vision, 2009.
[43] P. Lang, M. Bradley, and B. Cuthbert. International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-8. University of Florida, Gainesville, FL, 2008.
[44] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Conference on Neural Information Processing Systems, 2001.
[45] Y.-H. Lei, Y.-Y. Chen, B.-C. Chen, L. Iida, and W. H. Hsu. Where is who: Large-scale photo retrieval by facial attributes and canvas layout. In ACM SIGIR Conference, 2012.
[46] B. Li, A. Ghose, and P. G. Ipeirotis. Towards a theory model for product search. In International Conference on World Wide Web, 2011.
[47] C. Li, A. Gallagher, A. C. Loui, and T. Chen. Aesthetic quality assessment of consumer photos with faces. In International Conference on Image Processing, 2010.
[48] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1998.
[49] J. Machajdik and A. Hanbury. Affective image classification using features inspired by psychology and art theory. In ACM International Conference on Multimedia, 2010.
[50] A. McCallum and K. Nigam. A comparison of event models for naive bayes text classification. In AAAI Workshop on Learning for Text Categorization, 1998.
[51] T. Mei, W. H. Hsu, and J. Luo. Knowledge discovery from community-contributed multimedia. IEEE Multimedia Magazine, 2010.
[52] T. Mensink and J. Verbeek. Improving people search using query expansions: how friends help to find people. In European Conference on Computer Vision, 2008.
[53] T. M. Mitchell. Machine Learning. 1998.
[54] B. Moghaddam and M.-H. Yang. Learning gender with support faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
[55] B. Ni, Z. Song, and S. Yan. Web image mining towards universal age estimator. In ACM International Conference on Multimedia, 2009.
[56] S. Nowozin and K. Tsuda. Weighted substructure mining for image analysis. In IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[57] T. Ojala, M. Pietikainen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 1996.
[58] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. In Conference on Empirical Methods in Natural Language Processing, 2002.
[59] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The Adaptive Web: Methods and Strategies of Web Personalization. Volume 4321 of Lecture Notes in Computer Science, 2007.
[60] M. Pennacchiotti and A.-M. Popescu. A machine learning approach to twitter user classification. In International AAAI Conference on Weblogs and Social Media, 2011.
[61] P. J. Phillips, H. Moon, P. Rauss, and S. A. Rizvi. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
[62] R. Plutchik. Emotion: A psychoevolutionary synthesis. Harper & Row, Publishers, 1980.
[63] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting image databases from the web. In IEEE International Conference on Computer Vision, 2007.
[64] D. W. Scott. Multivariate density estimation: Theory, practice, and visualization. John Wiley & Sons, Inc., Hoboken, NJ, USA. doi 10.1002/9780470316849.fmatter, 2008.
[65] N. Sebe, I. Cohen, T. Gevers, and T. S. Huang. Emotion recognition based on joint visual and audio cues. In International Conference on Pattern Recognition, 2006.
[66] S. Siersdorfer, S. Chelaru, W. Nejdl, and J. San Pedro. How useful are your comments?: Analyzing and predicting youtube comments and comment ratings. In International Conference on World Wide Web, 2010.
[67] P. Singla, H. Kautz, A. Gallagher, and J. Luo. Discovery of social relationships in consumer photo collections using markov logic. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2008.
[68] J. Sivic and A. Zisserman. Video google: A text retrieval approach to object matching in videos. In IEEE International Conference on Computer Vision, 2003.
[69] R. Sommer. Further studies of small group ecology. In Sociometry, 1965.
[70] B. Taneva, M. Kacimi, and G. Weikum. Gathering and ranking photos of named entities with high precision, high recall, and diversity. In ACM International Conference on Web Search and Data Mining, 2010.
[71] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2002.
[72] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Conference on Computer Vision and Pattern Recognition, 2001.
[73] P. Viola, J. C. Platt, and C. Zhang. Multiple instance boosting for object detection. In Neural Information Processing Systems, 2006.
[74] G. Wang and D. Forsyth. Joint learning of visual attributes, object classes and visual saliency. In IEEE International Conference on Computer Vision, 2009.
[75] G. Wang, A. Gallagher, J. Luo, and D. Forsyth. Seeing people in social context: Recognizing people and social relationships. In European Conference on Computer Vision, 2010.
[76] M. Wang, K. Yang, X.-S. Hua, and H.-J. Zhang. Towards a relevant and diverse search of social images. IEEE Transactions on Multimedia, 2010.
[77] S.-Y. Wang, W.-S. Liao, L.-C. Hsieh, Y.-Y. Chen, and W. H. Hsu. Learning by expansion: Exploiting social media for image classification with few training examples. Neurocomputing, 2012.
[78] W. Wang and Q. He. A survey on emotional semantic image retrieval. In IEEE International Conference on Image Processing, 2008.
[79] J. M. Wiebe, R. F. Bruce, and T. P. O’Hara. Development and use of a gold-standard data set for subjectivity classifications. In Conference of the Association for Computational Linguistics, 1999.
[80] C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu, and Z. Zhang. Automatic eyeglasses removal from face images. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[81] P. Wu, W. Ding, Z. Mao, and D. Tretter. Close & closer: Discover social relationship from photo collections. In IEEE International Conference on Multimedia and Expo, 2009.
[82] R. Yan, A. Natsev, and M. Campbell. A learning-based hybrid tagging and browsing approach for efficient manual image annotation. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[83] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. S. Huang. Regression from patch-kernel. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[84] X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In IEEE International Conference on Data Mining, 2002.
[85] J. Yang, Y.-G. Jiang, A. G. Hauptmann, and C.-W. Ngo. Evaluating bag-of-visual-words representations in scene classification. In ACM International Conference on Multimedia Information Retrieval, 2007.
[86] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[87] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In International Conference on Machine Learning, 1997.
[88] J. Yuan, Q. You, S. McDonough, and J. Luo. Sentribute: Image sentiment analysis from a mid-level perspective. In Workshop on Sentiment Discovery and Opinion Mining, 2013.
[89] T. Zhang, H. Chao, C. Willis, and D. Tretter. Consumer image retrieval by estimating relation tree from family photo collection. In International Conference on Image and Video Retrieval, 2010.
[90] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locations and travel sequences from gps trajectories. In International Conference on World Wide Web, 2009.
[91] M. Zhou and H. Wei. Face verification using Gabor wavelets and AdaBoost. In International Conference on Pattern Recognition, 2006.
[92] L. Zhuang, F. Jing, and X.-Y. Zhu. Movie review mining and summarization. In ACM International Conference on Information and Knowledge Management, 2006.