Winston H. Hsu (徐宏民)
National Taiwan University, Taipei
Machine Intelligence for Large-Scale Image/Video Data Streams
– Advancing Deep Neural Networks for Emerging Applications
Office: R512, CSIE Building
November, 2016
Dr. Winston Hsu (徐宏民) – Short Bio
§ Professor in NTU CSIE and GINM, since Feb. 1, 2007
– Affiliated with Communication and Multimedia Lab (CMLab)
§ PhD from Columbia University, New York, 2007
§ 4 years at CyberLink Corp. (訊連科技) during its startup period
– Founding Engineer, Project Leader, and R&D Manager
§ Recognitions & Awards
– 3500+ Google citations; H-index: 27; i10-index: 51
– Director of NVIDIA AI Lab (NTU); Associate Editor for IEEE Trans. on Multimedia and IEEE Multimedia Mag.; Organizing Committee for ACM Multimedia 2010/2013/2015/2016; IEEE/ACM Senior Member; MSR Visiting Researcher (2014); Visiting Researcher, IBM Watson (2016)
– Awards: 2011 Ta-You Wu Memorial Award (Young Researcher); FIRST PRIZE in ACM Multimedia Grand Challenge 2011; FIRST PLACE in MSR-Bing Image Retrieval Challenge 2013; Microsoft Research Award 2009/2012/2014/2015; 2013 National Outstanding IT Elite Award; 2012 NTU EECS Academic Contribution Award (top 3%); etc.
Globally Competitive for Our Research Team
§ Recent report by Wealth Magazine (財訊雙週刊)
§ Research developments in AI (data learning in large-scale multimodal data streams)
§ How we have worked to keep our group competitive in the global research community
§ Our PhD alumni have received offers from US-based research labs
Awarded “NVIDIA AI LAB” – The 1st in Asia, the 4th in the World (GTC Taipei, September 21, 2016)
§ Video announcement by NVIDIA CEO/Co-Founder Jen-Hsun Huang
– https://www.youtube.com/watch?v=yjhj7bAj9hs#t=57m16s
§ For the project “DeepTutor” – question answering over large-scale multimodal data streams
§ The 4th NVIDIA AI Lab in the world, right after Stanford, Berkeley, and OpenAI
Motivations – Numerous Cameras in Different Forms, and the Number Keeps Growing …
Enterprises/Governments
Ongoing Research Projects (Selected) – More Details and Demos at http://www.csie.ntu.edu.tw/~winston/
§ facial/clothing attribute detection/search
§ web-scale indexing & feature learning
§ large-scale photo/video recognition
§ web-scale facial image retrieval
§ mobile visual recognition
§ multimodal deep neural network
§ social media mining
§ big data analytics and visualization
§ first-person/wearable cameras
§ consumer photo retrieval
Image Search by Semantic Understanding – First Place in MSR-Bing Image Retrieval Challenge 2013
§ Task: online system (response in < 12 seconds) that scores each image-query pair by how well the query describes the given image
§ Hosted by Microsoft Research (Redmond) and Bing
§ Dataset: 23M click-log entries (query, image, #click) for training and 77,450 image-query pairs for the online test
http://web-ngram.research.microsoft.com/GrandChallenge
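To make the task format concrete, the following is a minimal sketch of a click-count baseline over (query, image, #click) logs. It is not the method used in the winning entry; the function names, the smoothing scheme, and the toy log entries are all assumptions made for illustration.

```python
from collections import defaultdict


def build_click_table(click_logs):
    """Aggregate (query, image_id, clicks) triples into per-query click counts."""
    table = defaultdict(lambda: defaultdict(int))
    for query, image_id, clicks in click_logs:
        # Normalize the query text so that case variants share one entry.
        table[query.strip().lower()][image_id] += clicks
    return table


def relevance(table, query, image_id, smoothing=1.0):
    """Smoothed share of a query's clicks that went to this image.

    Returns 0.0 when the query never appears in the logs (no evidence).
    """
    per_image = table.get(query.strip().lower(), {})
    if not per_image:
        return 0.0
    total = sum(per_image.values())
    # Additive smoothing: one extra slot is reserved for unseen images.
    denominator = total + smoothing * (len(per_image) + 1)
    return (per_image.get(image_id, 0) + smoothing) / denominator


# Hypothetical toy logs; the image ids and click counts are made up.
toy_logs = [
    ("dollar bill", "img1", 30),
    ("dollar bill", "img2", 10),
    ("Dollar Bill", "img1", 10),
    ("drones", "img3", 5),
]
toy_table = build_click_table(toy_logs)
score = relevance(toy_table, "dollar bill", "img1")
```

A baseline like this can only score pairs whose query appears in the logs, which is one reason a semantic model of queries and images is needed for unseen pairs.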
Example queries (relevance to the paired image to be scored): dollar bill; suri and katie cruise; drones
Product Inquiry/Recommendation by Mobiles (2009)
§ Product price/information inquiry by mobile phones
§ Experienced with indexing high-dimensional & large-scale data
demo
Amazon Flow, Google Goggles, Pinterest Visual Search, Alibaba Pailitao
[Lin et al., ICIP’09, Chen et al., JVCI’10]
Large-Scale Attribute-based People Search – Search by Impression
§ Search by impression
– searching people-related photos by graphically describing the search intentions
§ FIRST PRIZE in ACM Multimedia Grand Challenge 2011
demo
[Lei&Hsu, ACM MM 2011]
[Lei&Hsu, SIGIR 2012]
Ongoing Projects in Image/Video Analytics with Deep Convolutional Neural Networks
§ Goal – Devise effective and efficient learning methods for scalable visual analytic platforms, applicable to emerging industry applications
[Figure: example tasks. Scene categorization (playground); photo annotation; object detection (person, bottle, dog); clothing, facial, and vehicle attributes; video events; fine-grained recognition (Corgi, Pembroke Welsh Corgi); drone AR; automatic training data for CNN]
[Figure labels: Supervised QA, Proactive QA, Self-taught QA, Deep Tutoring (diverse media streams). Application areas: Travel, Sentiment, Shopping, Smart City, …, Education, Healthcare, Surveillance, Automobile, Robotics]
DeepTutor for Multimodal Question and Answering
Scalable Deep Learning Framework
• Multimodal and joint
• Semi-/unsupervised
• Video learning
• Transfer learning
• Scalable platform
• ….
• Hashing for memory networks
• Multimodal memory networks
• Captioning
• Zero-shot query
• Auto. training data acquisition
• …
QA Interface (Reinforced + Augmented) Multimedia QA Engine
Efficient, Large-Scale, and Multimodal Memory Representations
• Memory networks
• Reinforcement
• Attention model for AR
• Deep segmentation
• Deep user modeling
• …
Visiting Scientist – Cognitive Computation for IBM Watson AI (New York, USA)
§ The first movie trailer generated by an AI system (Watson)
– One of the three researchers on the team
§ Demo video: https://www.youtube.com/watch?v=gJEzuYynaiw
§ News
– “Watson helped make a trailer for a horror movie about AI,” Engadget
• https://www.engadget.com/2016/09/01/ibm-watson-movie-trailer-morgan/
– “A computer built this trailer for a horror movie about an evil AI,” Mashable
• http://mashable.com/2016/09/01/morgan-watson-ai-trailer
– ….
Image/Video Cognition (Machine Perception)
§ Problem definition: Given a video (image), describe it in natural language
§ Motivations
– Understanding high-level semantics and intention from video collections
– Leveraging multiple modalities such as video, time, text, etc., in a unified deep learning framework
– Enabling technology for video event detection, surveillance, live content filtering, robotics, social media mining, HCI, question answering, etc.
A man and a woman are
Image/Video Cognition (Machine Perception) – Tentative Results
A woman is pouring a bowl of dough and another woman is making something <eos>
Image-based 3D Model Retrieval – Retrieving Semantically Related 3D Models by Image
§ Novel proposal – End-to-end deep neural networks for cross-domain and cross-view learning and ranking
§ Impacts: a brand-new problem; significantly outperforms prior neural networks
[Lee et al., submitted, 2016]
[Figure: retrieval pipeline. Labels: Query Image, Image-CNN, Adaptation Layer, Image representation; 3D Shapes, Rendered Views, View-CNN, Cross-View Convolution, Shape representation; Rank by L2 distance; Top Ranked 3D Shapes]
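The final "Rank by L2 distance" step can be sketched as follows. This is only an illustration of ranking by Euclidean distance in a shared embedding space: in the actual system the vectors would come from the Image-CNN and View-CNN branches, whereas here the function name, shape ids, and two-dimensional vectors are hypothetical stand-ins.

```python
import math


def rank_shapes(image_embedding, shape_embeddings, top_k=3):
    """Return (shape_id, distance) pairs sorted by ascending L2 distance.

    image_embedding: sequence of floats for the query image.
    shape_embeddings: dict mapping shape_id -> sequence of floats of the
    same length (assumed to live in the same embedding space).
    """
    scored = [
        (shape_id, math.dist(image_embedding, embedding))
        for shape_id, embedding in shape_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = better match
    return scored[:top_k]


# Toy vectors standing in for learned representations (made up).
query_vec = [0.0, 1.0]
shape_vecs = {
    "chair_01": [0.0, 0.9],
    "table_07": [1.0, 1.0],
    "lamp_03": [3.0, 5.0],
}
top_shapes = rank_shapes(query_vec, shape_vecs, top_k=2)
```

For a large shape collection, the exhaustive scan above would normally be replaced by an approximate nearest-neighbor index; the ranking criterion stays the same.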
3D Medical Segmentation by Deep Neural Networks
§ Novel proposal – Utilizing cross-modal learning with sequential and convolutional neural networks
§ Impacts: significantly outperforms prior work (e.g., U-Net) on open benchmarks
[Tseng et al., submitted, 2016]
Social Media Mining – Huge Numbers of Photos/Videos Shared About Human Activities
Discovering the City by Mining Diverse and Multimodal Data Streams – IBM Grand Challenge: New York City 360
§ Exploring and integrating multiple contents and sources for NYC life
§ ACM Multimedia 2014 Grand Challenge Multimodal Award
[Kuo, ACM MM’14]
Understand Human Activities from Social Media (e.g., Instagram): Time + Photos + Tags
[Figure labels: convolutional neural network; sequential neural network (RNN, LSTM)]
§ Why: Huge needs in location-based services: advertisement, location understanding, recommendation, city planning, etc.
§ Problem Definition: Location classification, given a collection of photos and associated metadata
§ Location Categories (10): Arts & Entertainment, College & University, Event, Food,
Fashion Mining from Social Media by Clothing Attributes – Huge Interest from Fashion Industry
§ Confirmed the influence of fashion shows in daily life
– 60 clothing attributes
§ Widely discussed in social media and news media (NY Post, MIT Tech. Review, Science News, etc.)
[Chen et al., ACM MM’15]
08/28/2015
Drone AR – Understanding the Context from Drone Views (Ongoing Project)
@NTU, November 2016 – Winston Hsu
Recent Student Awards (selected) – Working on Essential and Emerging Problems
§ FIRST PLACE in MSR-Bing Image Retrieval Challenge 2013
§ First Prize for ACM Multimedia Grand Challenge 2011
§ ACM Multimedia 2013 Grand Challenge Multimodal Award
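§ 陳殷盈: ACM Multimedia 2012 Doctoral Symposium Best Paper Award
§ 郭盈希: Microsoft Research Asia Fellowship 2012
§ 朱冠宇: Third Place, Chinese Institute of Electrical Engineering Youth Paper Award, 2013
§ PhD students 陳冠婷 (2013), 陳殷盈 (2012), and 林彥良 (2012) received grants under the “Subsidy for PhD Students to Conduct Research Abroad” (千里馬) program
§ 陳柏村: Master's Thesis Award, Taiwanese Association for Artificial Intelligence, 2012
§ Runner-up, Cloud Applications Campus Division, Chunghwa Telecom 2011 Telecom Innovation Application Contest
§ 鄭安容: Second Place, Chinese Institute of Electrical Engineering Youth Paper Award, 2011
§ 李文瑜: Google Fellowship for Women at SIGIR 2011, a top international conference
§ 陳殷盈: Google Fellowship for Women at WWW 2011, a top international conference
§ 郭盈希: Second Place, Chinese Institute of Electrical Engineering Youth Paper Award, 2010
§ Students won the championship in the Flora Expo Applications Division of the Chunghwa Telecom 2010 “Telecom Oscars”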
Hearty Contributions from Our Research Members
Acknowledgements for Research Sponsors