Winston H. Hsu (徐宏民)
National Taiwan University, Taipei
From Media Retrieval to Data Analytics –
Research Highlights
Office: R512, CSIE Building
Communication and Multimedia Lab (通訊與多媒體實驗室) http://www.csie.ntu.edu.tw/~winston
November 11, 2014
@NTU, November 2014 – Winston Hsu
Dr. Winston Hsu (徐宏民) – Short Bio
§ Associate Professor in NTU CSIE and GINM, since Feb. 1, 2007
– Affiliated with the Communication and Multimedia Lab (CMLab, 台大通訊與多媒體實驗室)
§ PhD from Columbia University, New York, 2007
§ 4 years at CyberLink (訊連科技) during its startup period
– Founding Engineer, Project Leader, and R&D Manager
§ Honors & Awards
– 2600+ Google Scholar citations to date
– Editorial Board for IEEE Multimedia Mag., Organizing Committee for ACM
Multimedia 2010/2013, IEEE/ACM Senior Member, MSR Visiting Researcher
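– Awards: National Science Council Ta-You Wu Memorial Award (2011); FIRST PRIZE in ACM Multimedia Grand Challenge 2011; FIRST PLACE in MSR-Bing Image Retrieval Challenge 2013; Microsoft Research Award in Multimedia Search 2009/2012; 2013 IT Month Outstanding IT Elite Award; NTU College of Electrical Engineering and Computer Science Academic Contribution Award (top 3%); etc.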
Ongoing Research Projects (Selected) –
More Details and Demos at http://www.csie.ntu.edu.tw/~winston/
§ facial/clothing attribute detection/search
§ web-scale indexing & feature learning
§ large-scale photo/video recognition
§ web-scale facial image retrieval
§ mobile visual recognition
§ multimodal deep neural network
§ social media mining
§ big data analytics and visualization
§ first-person/wearable cameras
§ consumer photo retrieval
Exponentially Growing Photos (Billions)
§ Why?
– Mobile phones!
– Sharing in social media: for organization and (social) communication [Ames, et al., CHI’07]
§ Evidenced by the numbers
– 880 billion photographs will be taken in 2015 [Yahoo!]
– 500 million: average number of tweets sent per day
– 350 million: average daily uploaded Facebook photos
– 250 billion: total number of uploaded Facebook photos
– 20 billion: total number of photos shared in Instagram
Source: http://expandedramblings.com; photo from NBC News (St. Peter's Square, Vatican)
(1/4) National Security –
Why? Strong Social and Industrial Needs
§ Searching for suspects across thousands of video cameras or millions of community-contributed photos
§ For example, “Boston Marathon Bombing 2013”
[Lei&Hsu, ACMMM’11, SIGIR’12; Lin, TCSVT’13, ICCV’13]
(2/4) Identifying Geographic Information and Objects –
Why? Strong Social and Industrial Needs
§ Identifying locations from photos and searching for suspicious objects in satellite images
§ For example,
– IM2GPS [Hays & Efros, CVPR’08] over million/billion-scale geo-referenced photos
– “Malaysia Airlines Flight 370” search over high-resolution satellite images
[Kuo & Hsu, ACMMM’09, ’10, CVPR’11, IEEE TMM’12]
(3/4) Augmented Reality –
Why? Strong Social and Industrial Needs
§ Recognizing and matching the real world in real time to help those in need, e.g., the elderly, surgeons, and tourists
§ Sample projects:
– Wearable devices (e.g., Google Glass)
– MIT SixthSense Project (Pranav Mistry and Pattie Maes)
http://www.healthcare.philips.com/main/about/future-of-healthcare/
[Wu’13 IEEE MM, ACMMM Su’13, Wu’12, Cheng’10]
(4/4) Culture & Commerce –
Why? Strong Social and Industrial Needs
[Hou&Hsu, 2014, Cheng, MM’11, Chen MM’12, TMM’13]
Photos and Videos are Essential, Ever-Growing, and Challenging Big Data!!
Our Research Paradigm –
Mobile and Cloud-based Media Computing
• Goal – exploiting mobile- and cloud-based capabilities to improve user experience, visual analytics, and search
Billion-scale photos, videos, and (noisy) metadata
• Users (Social)
• Contents
• Services
• Context
• Interactions
• Camera
• Sensors
Automatic Voice-based Facial Image Annotation and Retrieval
§ Novel and intuitive scenario – voice tagging when taking photos
– Propagating proper names to each candidate face; learning from the uncertainty
– Accepted for ACM Multimedia 2014 (and ECCV 2014)
[Figure: example voice tags on faces: Jennifer, Angelina, Brad]
Large-Scale Attribute-based People Search – Search by Impression
§ Search by impression – searching people-related photos by graphically describing the search intentions
§ FIRST PRIZE in ACM Multimedia Grand Challenge 2011
[Lei&Hsu, ACM MM 2011]
[Lei&Hsu, SIGIR 2012]
Scalable Face Image/Video Retrieval
(Sub-second Response in Million Dataset)
[Chen & Hsu, ACMMM’11, IEEE TMM’13, TCSVT’14, ECCV’14]
Age-invariant
Me-link: Link Me to the Media –
Fusing Audio and Visual Cues for Robust and Efficient Mobile Media Interaction
http://vimeo.com/82499464
[Yeh&Hsu, WWW 2014]
Our Work in Deep Learning (DNN) for Video Recognition –
Solving Scarce Training Data Problem
[Figure: three bar charts with axis ranges 0 to 1, 0 to 0.6, and 0 to 0.4. Chart 1 labels: WeddingReception, Graduation, MusicPerformance, NonMusicPerformance, Birthday. Chart 2 labels: Bird, PlayGround, Dog, Baseball, Soccer. Chart 3 labels: Swimming, IceSkating, Biking, Dog, Beach.]
[Su&Hsu, 2014]
Mobile Visual Recognition –
Preserving Accuracy + Reducing Model Size
§ Limited storage, memory, power in mobile devices
§ Reducing model size by approximating the high-dimensional kernel
– Efficient computation – linear projection
– Small model – sparse projection matrix
[Figure labels: # of feature dim., # of classifiers]
Su et al., Scalable Mobile Visual Classification by Kernel Preserving Projection Over High-Dimensional Features. IEEE Transactions on Multimedia, 2014
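The trade-off on this slide (a linear projection for cheap computation, a sparse projection matrix for a small model) can be illustrated with a minimal sketch. It uses a generic sparse random projection as a stand-in, not the kernel-preserving projection of Su et al.; the dimensions, the density, and the names `sparse_projection`, `project`, and `nonzeros` are assumptions made for illustration.

```python
import random

def sparse_projection(in_dim, out_dim, density=0.1, seed=0):
    """Return a sparse random projection as rows of (column, weight) pairs.

    Illustrative stand-in only: each entry is +/-scale with probability
    `density` and zero otherwise, so inner products between projected
    vectors match the original inner products in expectation.
    """
    rng = random.Random(seed)
    scale = (1.0 / (density * out_dim)) ** 0.5
    rows = []
    for _ in range(out_dim):
        row = []
        for col in range(in_dim):
            if rng.random() < density:
                sign = 1.0 if rng.random() < 0.5 else -1.0
                row.append((col, sign * scale))
        rows.append(row)
    return rows

def project(rows, x):
    """Apply the projection using only the stored nonzero entries."""
    return [sum(w * x[col] for col, w in row) for row in rows]

def nonzeros(rows):
    """Count stored weights, i.e. the model size of the projection."""
    return sum(len(row) for row in rows)

if __name__ == "__main__":
    in_dim, out_dim = 2000, 64  # hypothetical feature sizes
    P = sparse_projection(in_dim, out_dim)
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(in_dim)]
    z = project(P, x)
    print("projected dim:", len(z))
    print("stored weights:", nonzeros(P), "of", in_dim * out_dim)
```

Storing only the nonzero entries is what keeps both the memory footprint and the per-image cost proportional to the matrix density rather than to the full feature dimension.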
Our Projects on Very Large Visual Collections and Logs
§ Music service visualization and mining (~1B logs, 0.2M active users)
§ Microsoft Bing search logs (21M users, 41M queries)
– Improving image search: FIRST PLACE in MSR-Bing Image Retrieval Challenge 2013
– Personalized image search [Wu et al., SIGIR’14]
§ Social media mining for the city
(million-scale geo-referenced photos, tweets, checkins, bikes, etc.)
[Figure: personalized image search example. Search log of user 1125 with queries "world news", "tennis", and "atp ranking"; panels (A) and (B) labeled "Personalized Suggestion with Trending-Aware Image Selection" and "General Suggestion"; topic labels Baseball, Tennis, Global Issue; entity labels Alex Rodriguez, Pablo Sandoval, Clayton Kershaw, Masahiro Tanaka, Ukraine, South Sudan, Rafael Nadal, Roger Federer; search count plotted over 1/19, 1/21, 1/23; news headline "Yankees Land Masahiro Tanaka" (ESPN)]
[Figure: social media mining for the city. (a) Large-Scale Community-Contributed Photos; (b) People Attribute & Group Type Detection; (c) Spatio-Temporal Activity Mining; (d) Personalized Mobile Recommendation (Travel Planning, Hotel Reservation, Dining Guide…); example places: Statue of Liberty, Central Park, Hilton Hotel]
Example – App for Flower Recognition in the Apple App Store
§ Collaborating with the Taiwanese environmental protection NGO 荒野保護協會 (The Society of Wilderness) and sponsored by Taiwan Mobile
§ The first flower recognition app – k-NN classification on the server side
§ Recognizing 50+ flowers now; scalable to hundreds and more
§ Very challenging feedback → motivating new research directions (context, fine-grained recognition, web crowdsourcing, mobile hashing, etc.)
* PR event and news on 04/09/2013
[Wu&Hsu, IEEE MM’13]
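As a rough illustration of server-side k-NN recognition, the sketch below labels a query feature by majority vote among its nearest labeled gallery features. It is a generic k-NN classifier, not the app's actual pipeline; the 2-D features, the flower names, and the function name `knn_classify` are hypothetical.

```python
from collections import Counter
import math

def knn_classify(query, gallery, k=3):
    """Label `query` by majority vote over its k nearest gallery features."""
    ranked = sorted(gallery, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D image features with flower labels (illustration only).
GALLERY = [
    ((0.90, 0.10), "rose"), ((0.80, 0.20), "rose"), ((0.85, 0.15), "rose"),
    ((0.10, 0.90), "lily"), ((0.20, 0.80), "lily"), ((0.15, 0.95), "lily"),
]

if __name__ == "__main__":
    # A feature extracted from an uploaded photo would be the query here.
    print(knn_classify((0.82, 0.18), GALLERY))
```

Because k-NN needs no retraining, adding a new flower only requires adding labeled features to the gallery, which fits the slide's point about scaling to hundreds of classes; the linear scan used here would need an index at that scale.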
Discovering the City by Mining Diverse and Multimodal Data Streams – IBM Grand Challenge: New York City 360
§ Exploring and Integrating Multiple Contents and Sources for NYC Life
§ ACM Multimedia 2014 Grand Challenge Multimodal Award
Diverse Data Collection
– 5.3M geotagged tweets
– 1.6M check-in photos
– 12.4K attractions & restaurants
– 6.5B subway logs
[Figure: affinity matrix (scale 0 to 1) showing the similarity of top-ranked POIs; labels: Text, Image, Venue, Traffic Data (Subway)]
Complementary and diverse media data
Rich and informative attributes / facets
State-of-the-art and novel methods
Promising and emerging utilities / applications
[Figure labels: Events, Landmark, Clothing, Transportation, Food, People, Human Attributes, Traffic Pattern, Spatiotemporal Awareness, Trend & Event Detection, Content-Based Visual Analysis, Sentiment Analysis; Posts, Images, Videos, Likes, Contextual Information, Meta-Data]
[Figure examples: Brooklyn Bridge Park, Manhattan Skyline Sunset, Lincoln Center, Mercedes-Benz Fashion Week, Food]
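Recent Student Awards (selected) –
Working on Essential and Emerging Problems
§ ACM Multimedia 2014 Grand Challenge Multimodal Award
§ FIRST PLACE in MSR-Bing Image Retrieval Challenge 2013
§ First Prize for ACM Multimedia Grand Challenge 2011
§ ACM Multimedia 2013 Grand Challenge Multimodal Award
§ 陳殷盈: ACM Multimedia 2012 Doctoral Symposium Best Paper Award
§ 郭盈希: Microsoft Research Asia Fellowship 2012
§ 朱冠宇: Third Place, Chinese Institute of Electrical Engineering Youth Paper Award, 2013
§ PhD students 陳冠婷 (2013), 陳殷盈 (2012), and 林彥良 (2012): grants from the Graduate Students Study Abroad Program (千里馬) for doctoral research overseas
§ 陳柏村: Master's Thesis Award, Taiwanese Association for Artificial Intelligence (中華民國人工智慧學會), 2012
§ Runner-up, Cloud Application Campus Division, Chunghwa Telecom 2011 Telecom Innovation Application Contest
§ 鄭安容: Second Place, Chinese Institute of Electrical Engineering Youth Paper Award, 2011
§ 李文瑜: SIGIR 2011 Google Fellowship for Women
§ 陳殷盈: WWW 2011 Google Fellowship for Women
§ 郭盈希: Second Place, Chinese Institute of Electrical Engineering Youth Paper Award, 2010
§ Students won First Place in the Flora Expo Application Division, Chunghwa Telecom 2010 Telecom Oscar contest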
Thanks and Comments!
Winston H. Hsu (徐宏民)
National Taiwan University, Taipei
Office: R512, CSIE Building
Communication and Multimedia Lab (通訊與多媒體實驗室) http://www.csie.ntu.edu.tw/~winston