Slides credited to Prof. Hung-Yi Lee
What is Machine Learning?
What Can Computers Do?
Programs can do the things you ask them to do
Program for Solving Tasks
Task: predicting positive or negative given a product review
“I love this product!” → + (positive)
“It claims too much.” → − (negative)
“It’s a little expensive.” → ?
“台灣第一波上市!” ("First launch in Taiwan!") → 推 (upvote)
“規格好雞肋…” ("The specs are underwhelming…") → 噓 (downvote)
“樓下買了我才考慮” ("I'll only consider it after the commenter below buys one") → ?

program.py:
    if input contains “love”, “like”, etc. → output = positive
    if input contains “too much”, “bad”, etc. → output = negative

Some tasks are complex, and we don’t know how to write a program to solve them.
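A minimal runnable version of such a hand-written rule program (the keyword lists are illustrative):

    POSITIVE_WORDS = ["love", "like"]
    NEGATIVE_WORDS = ["too much", "bad"]

    def classify(review: str) -> str:
        """Rule-based sentiment: look for hand-picked keywords."""
        text = review.lower()
        if any(w in text for w in POSITIVE_WORDS):
            return "+"
        if any(w in text for w in NEGATIVE_WORDS):
            return "-"
        return "?"  # the hand-written rules cannot cover every review

    print(classify("I love this product!"))      # +
    print(classify("It claims too much."))       # -
    print(classify("It's a little expensive."))  # ?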
Learning ≈ Looking for a Function
Task: predicting positive or negative given a product review
f(“I love this product!”) = +
f(“It claims too much.”) = −
f(“It’s a little expensive.”) = ?
f(“台灣第一波上市!”) = 推
f(“規格好雞肋…”) = 噓
f(“樓下買了我才考慮”) = ?

Given a large amount of data, the machine learns what the function f should be.
Learning ≈ Looking for a Function
Speech Recognition: f( audio signal ) = “你好” ("Hello")
Handwritten Recognition: f( handwritten image ) = “2”
Weather Forecast: f( Thursday ) = “Saturday”
Playing Video Games: f( game screen ) = “move left”
Machine Learning Framework
Training is to pick the best function given the observed data; testing is to predict the label using the learned function.

Model (hypothesis function set): { f1, f2, … }
Training Data: pairs of function input and output, (x^1, ŷ^1), (x^2, ŷ^2), …
    e.g., x: “It claims too much.”, ŷ: − (negative)
Training: pick the “best” function f*
Testing: given new input x, predict y = f*(x)
Testing Data: (x, ?), …
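A toy sketch of this framework (the hypothesis set and data are illustrative, not from the slides):

    # Model: a hypothesis set of candidate functions.
    hypotheses = [lambda x: x > 0, lambda x: x > 1, lambda x: x > 2]

    # Training data: (input, label) pairs.
    train = [(0.5, False), (1.5, True), (2.5, True)]

    def error(f):
        """Number of training examples the function gets wrong."""
        return sum(f(x) != y for x, y in train)

    f_star = min(hypotheses, key=error)  # training: pick the best function
    print(f_star(1.2))                   # testing: predict the label for a new input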
What is Deep Learning?
A subfield of machine learning
Stacked Functions Learned by Machine
Production line (生產線):

“台灣第一波上市!” → [ Simple Function f1 ] → [ Simple Function f2 ] → [ Simple Function f3 ] → 推

Deep Learning Model: the composed pipeline f is a very complex function.
End-to-end training: what each function should do is learned automatically.
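A minimal sketch of stacked functions in code (the toy stages are illustrative): composing simple functions yields one complex function.

    def f1(x):
        return 2.0 * x + 1.0  # each stage is simple on its own

    def f2(x):
        return max(0.0, x)    # e.g., a ReLU-like nonlinearity

    def f3(x):
        return "推" if x > 0.5 else "噓"

    def f(x):
        """The deep model: f = f3 ∘ f2 ∘ f1."""
        return f3(f2(f1(x)))

    print(f(0.3))  # 推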
Deep learning usually refers to neural-network-based models.
Stacked Functions Learned by Machine
[Diagram: Input Layer → Hidden Layers (Layer 1, Layer 2, …, Layer L) → Output Layer]
Input: vector x = (x1, x2, …, xN), e.g., “台灣第一波上市!”
Output: label y, e.g., 推
Hidden layers: features / representations
Representation Learning attempts to learn good features/representations
Deep Learning attempts to learn (multiple levels of) representations and an output
Deep vs. Shallow – Speech Recognition
Shallow Model

Waveform → DFT → spectrogram → filter bank → log → DCT → MFCC → GMM → “Hello”

Each box is a simple function in the production line; the feature-extraction steps (up to MFCC) are hand-crafted, and only the GMM is learned from data.
Deep vs. Shallow – Speech Recognition
Deep Model

audio signal → f1 → f2 → f3 → f4 → f5 → “Hello”

All functions are learned from data: less engineering labor, but the machine learns more.
“Bye bye, MFCC” – Li Deng, Interspeech 2014
Deep vs. Shallow – Image Recognition
Shallow Model
The feature-extraction pipeline is hand-crafted; only the final classifier is learned from data.
http://www.robots.ox.ac.uk/~vgg/research/encoding_eval/
Deep vs. Shallow – Image Recognition
Deep Model

image → f1 → f2 → f3 → f4 → “monkey”

All functions are learned from data; the intermediate layers learn features / representations.
Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014 (pp. 818-833).
Machine Learning vs. Deep Learning
Machine Learning:
◦ Features: describing your data with features a computer can understand (hand-crafted, domain-specific knowledge)
◦ Model: a learning algorithm that optimizes the weights on the features
(Credit: Dr. Socher)
Machine Learning vs. Deep Learning
Deep Learning:
◦ Representations learned by the machine (automatically learned internal knowledge)
◦ Model: a learning algorithm that optimizes the weights on the features
Deep learning usually refers to neural-network-based models.
Inspired by Human Brain
A Single Neuron

z = w1·x1 + w2·x2 + … + wN·xN + b    (b is the bias)
y = σ(z), where σ(z) = 1 / (1 + e^(−z)) is the sigmoid activation function.

Each neuron is a very simple function.
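A minimal sketch of a single neuron in code (numpy; the weight values are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))  # squashes z into (0, 1)

    def neuron(x, w, b):
        z = np.dot(w, x) + b             # weighted sum plus bias
        return sigmoid(z)                # apply the activation function

    x = np.array([1.0, 2.0, 3.0])
    w = np.array([0.5, -0.2, 0.1])
    print(neuron(x, w, b=0.05))          # ≈ 0.61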
Deep Neural Network
Cascading the neurons forms a neural network:

[Diagram: input x = (x1, …, xN) → Layer 1 → Layer 2 → … → Layer L → output y = (y1, …, yM)]

A neural network is a complex function f: R^N → R^M.
Each layer is a simple function in the production line.
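A minimal sketch of the cascaded forward pass (numpy; the layer sizes and random weights are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Each layer computes sigmoid(W x + b); stacking three layers maps R^4 -> R^2.
    layers = [(rng.normal(size=(3, 4)), np.zeros(3)),
              (rng.normal(size=(3, 3)), np.zeros(3)),
              (rng.normal(size=(2, 3)), np.zeros(2))]

    def network(x):
        for W, b in layers:         # each layer: a simple function in the production line
            x = sigmoid(W @ x + b)
        return x                    # the whole network: one complex function

    print(network(np.array([1.0, 0.5, -0.3, 2.0])))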
History of Deep Learning
◦ 1960s: Perceptron (single-layer neural network)
◦ 1969: The perceptron's limitations are shown
◦ 1980s: Multi-layer perceptron
◦ 1986: Backpropagation
◦ 1989: One hidden layer is “good enough”; why go deep?
◦ 2006: RBM initialization (breakthrough)
◦ 2009: GPU computing
◦ 2010: Breakthrough in speech recognition (Dahl et al., 2010)
◦ 2012: Breakthrough on ImageNet (Krizhevsky et al., 2012)
◦ 2015: “Superhuman” results in image and speech recognition
Deep Learning Breakthrough
First: Speech Recognition
Second: Computer Vision

Acoustic Model          WER on RT03S FSH    WER on Hub5 SWB
Traditional Features    27.4%               23.6%
Deep Learning           18.5% (-33%)        16.1% (-32%)
History of Deep Learning
Why did deep learning achieve breakthroughs in applications only after 2010?
Reasons Why Deep Learning Works
Big data + GPU computing
Why Adopt GPUs for Deep Learning?
A GPU is like a brain: human brains form graphical imagery for mental reasoning, e.g., picturing “台灣好吃牛肉麵” ("delicious Taiwanese beef noodles").
Why Does Speed Matter?
Training time
◦ Big data increases the training time
◦ An excessively long training time is not practical
Inference time
◦ Users are not patient enough to wait for responses

[Bar chart: inference time in ms (0 to 300) for NVIDIA P40, P4, and CPU]

GPU computational power enables real-world applications.
Why Is Deeper Better?
Deeper → more parameters

[Diagram: a shallow (wide) network vs. a deep network over the same inputs x1 … xN]
Universality Theorem
Any continuous function f: R^N → R^M can be realized by a network with only one hidden layer.
http://neuralnetworksanddeeplearning.com/chap4.html
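A minimal numeric illustration of the theorem (numpy; random hidden weights, with only the output layer fit by least squares): one hidden layer of sigmoid units approximates a continuous target such as sin(2x).

    import numpy as np

    rng = np.random.default_rng(0)

    X = np.linspace(-3.0, 3.0, 200)[:, None]   # inputs
    y = np.sin(2.0 * X[:, 0])                  # a continuous target function

    # One hidden layer of 50 sigmoid units with random weights and biases.
    W = rng.normal(size=(1, 50))
    b = rng.normal(size=50)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # hidden activations

    v, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit the output weights
    print(np.abs(H @ v - y).max())             # small maximum approximation error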
Why “deep” not “fat”?
Fat + Shallow vs. Thin + Deep
Two networks with the same number of parameters:

[Diagram: a fat, shallow network vs. a thin, deep network over the same inputs x1 … xN]
Fat + Shallow vs. Thin + Deep
Hand-Written Digit Classification
The deeper model uses fewer parameters to achieve the same performance.
Fat + Shallow vs. Thin + Deep
Two networks with the same number of parameters:

[Diagram: representing the same function may take a fat, shallow network with O(2^d) units, but only O(d^2) units in a thin, deep network of d layers]
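A quick arithmetic check of the O(2^d) vs. O(d^2) gap (illustrative values of d):

    # Unit counts for a fat, shallow network (~2^d) vs. a thin, deep one (~d^2).
    for d in (4, 8, 16):
        print(f"d={d}: shallow ~ {2 ** d} units, deep ~ {d ** 2} units")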
How to Apply?
How to Frame the Learning Problem?
The learning algorithm f: X → Y maps the input domain X into the output domain Y.
◦ Input domain: word, word sequence, audio signal, click logs
◦ Output domain: single label, sequence tags, tree structure, probability distribution
Output Domain – Classification
Sentiment Analysis: “這規格有誠意!” ("These specs show real effort!") → + ; “太爛了吧~” ("This is terrible~") → −
Speech Phoneme Recognition: audio frame → /h/
Handwritten Recognition: character image → “2”
Output Domain – Sequence Prediction
POS Tagging: “推薦我台大後門的餐廳” ("Recommend me a restaurant near NTU's back gate") → 推薦/VV 我/PN 台大/NR 後門/NN 的/DEG 餐廳/NN
Speech Recognition: audio → “大家好” ("Hello, everyone")
Machine Translation: “How are you doing today?” → “你好嗎?”

The learning task is determined by the output domain.
Input Domain – How to Aggregate Information
Input: word sequence, image pixels, audio signal, click logs
Properties: continuity, temporal structure, importance distribution
Examples (see the sketch below):
◦ CNN (convolutional neural network): local connections, shared weights, pooling
  ◦ AlexNet, VGGNet, etc.
◦ RNN (recurrent neural network): temporal information

Network architectures should reflect the properties of the input domain.
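A minimal numpy sketch of the CNN ingredients named above (local connections, shared weights, pooling); the filter values are illustrative:

    import numpy as np

    def conv1d(x, w, b=0.0):
        """Slide one shared filter across the input: local connections + shared weights."""
        k = len(w)
        return np.array([x[i:i + k] @ w + b for i in range(len(x) - k + 1)])

    def max_pool(z, size=2):
        """Pooling: keep the strongest response in each window."""
        return np.array([z[i:i + size].max() for i in range(0, len(z) - size + 1, size)])

    x = np.array([0.1, 0.8, 0.3, 0.9, 0.2, 0.7])  # a toy 1-D input
    w = np.array([0.5, -0.5])                     # one shared filter
    print(max_pool(conv1d(x, w)))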
How to Frame the Learning Problem?
The learning algorithm f: X → Y maps the input domain X into the output domain Y.
Network design should leverage input and output domain properties
“Applied” Deep Learning
Deep Learning: representations learned by the machine (automatically learned internal knowledge); a learning algorithm optimizes the weights on the features.
The applied question: how to frame a task as a learning problem and design the corresponding model.
Core Factors for Applied Deep Learning
1. Data: big data
2. Hardware: GPU computing
3. Talent: designing algorithms that make networks work for specific problems
Concluding Remarks
Training and inference are the two phases of a deep learning system.
Main focus: how to apply deep learning to real-world problems.
References
Reading Materials
◦ Academic papers will be posted on the course website.
Deep Learning
◦ Goodfellow, Bengio, and Courville, “Deep Learning,” 2016. http://www.deeplearningbook.org
◦ Michael Nielsen, “Neural Networks and Deep Learning.” http://neuralnetworksanddeeplearning.com