Modern Machine Learning Models Random Forest
A Complicated Data Set
[Figure: $g_t$ ($N' = N/2$) and $G$ with first $t$ trees]
‘easy yet robust’ nonlinear model
Modern Machine Learning Models :: Adaptive (or Gradient) Boosting
Apple Recognition Problem
• is this a picture of an apple?
• say, want to teach a class of 6-year-olds
• gather photos under CC-BY-2.0 license on Flickr (thanks to the authors below!) (APAL stands for Apple and Pear Australia Ltd)
Photo credits:
Dan Foy: https://flic.kr/p/jNQ55
APAL: https://flic.kr/p/jzP1VB
adrianbartel: https://flic.kr/p/bdy2hZ
ANdrzej cH.: https://flic.kr/p/51DKA8
Stuart Webster: https://flic.kr/p/9C3Ybd
nachans: https://flic.kr/p/9XD7Ag
APAL: https://flic.kr/p/jzRe4u
Jo Jakeman: https://flic.kr/p/7jwtGp
APAL: https://flic.kr/p/jzPYNr
APAL: https://flic.kr/p/jzScif
Apple Recognition Problem
Photo credits (second set of photos):
Mr. Roboto.: https://flic.kr/p/i5BN85
Richard North: https://flic.kr/p/bHhPkB
Richard North: https://flic.kr/p/d8tGou
Emilian Robert Vicol: https://flic.kr/p/bpmGXW
Nathaniel Mc-Queen: https://flic.kr/p/pZv1Mf
Crystal: https://flic.kr/p/kaPYp
jfh686: https://flic.kr/p/6vjRFH
skyseeker: https://flic.kr/p/2MynV
Janet Hudson: https://flic.kr/p/7QDBbm
Rennett Stowe: https://flic.kr/p/agmnrk
Our Fruit Class Begins
• Teacher: Please look at the pictures of apples and non-apples below. Based on those pictures, how would you describe an apple? Michael?
• Michael: I think apples are circular.
(Class): Apples are circular.
Our Fruit Class Continues
• Teacher: Being circular is a good feature for the apples. However, if you only say circular, you could make several mistakes. What else can we say for an apple? Tina?
• Tina: It looks like apples are red.
(Class): Apples are somewhat circular and somewhat red.
Our Fruit Class Continues More
• Teacher: Yes. Many apples are red. However, you could still make mistakes based on circular and red. Do you have any other suggestions, Joey?
• Joey: Apples could also be green.
(Class): Apples are somewhat circular and somewhat red and possibly green.
Our Fruit Class Ends
• Teacher: Yes. It seems that apples might be circular, red, green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica?
• Jessica: Apples have stems at the top.
(Class): Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top.
Motivation
• students: simple hypotheses $g_t$ (like vertical/horizontal lines)
• (Class): sophisticated hypothesis $G$ (like black curve)
• Teacher: a tactical learning algorithm that directs the students to focus on key examples
next: demo of such an algorithm
A Simple Data Set
‘Teacher’-like algorithm works!
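The slides do not spell out the teacher's rule, so the following is only a minimal sketch under an assumption: that the ‘Teacher’ is standard AdaBoost with decision stumps as the ‘students’. The names `train_stump`, `adaboost`, `predict`, `X_demo`, and `y_demo` are hypothetical, and the six-point data set is made up for illustration. The teacher keeps a weight per example and raises the weights of misclassified examples, so that the next stump focuses on them.

```python
import math

def stump_predict(stump, x):
    """Decision stump: sign * (+1 if x[feature] > threshold else -1)."""
    feature, threshold, sign = stump
    return sign * (1 if x[feature] > threshold else -1)

def train_stump(X, y, u):
    """Return (weighted_error, stump) for the stump with the smallest weighted error."""
    best = None
    for feature in range(len(X[0])):
        values = sorted(set(x[feature] for x in X))
        # Candidate thresholds: one below all values, plus midpoints between neighbors.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for threshold in thresholds:
            for sign in (+1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for x, yn, w in zip(X, y, u) if stump_predict(stump, x) != yn)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best

def adaboost(X, y, T):
    """Assumed teacher rule (standard AdaBoost): re-weight examples after each round."""
    n = len(X)
    u = [1.0 / n] * n                      # example weights, start uniform
    ensemble = []
    for _ in range(T):
        err, stump = train_stump(X, y, u)
        eps = err / sum(u)                 # weighted error rate of this student
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Misclassified examples get heavier, correct ones lighter.
        u = [w * math.exp(-alpha * yn * stump_predict(stump, x))
             for x, yn, w in zip(X, y, u)]
        total = sum(u)
        u = [w / total for w in u]
    return ensemble

def predict(ensemble, x):
    """The class hypothesis G: a weighted vote of the students."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

# Hypothetical 1-D data: positives sit in the middle, so no single stump is enough.
X_demo = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
y_demo = [-1, -1, 1, 1, -1, -1]
```

On this toy data a single stump must misclassify at least two points, while `adaboost(X_demo, y_demo, 3)` combines three stumps into a vote that separates the middle interval from both ends.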
Putting Everything Together
Gradient Boosted Decision Tree (GBDT)
$s_1 = s_2 = \ldots = s_N = 0$
for $t = 1, 2, \ldots, T$
  1. obtain $g_t$ by $\mathcal{A}(\{(\mathbf{x}_n, y_n - s_n)\})$, where $\mathcal{A}$ is a (squared-error) regression algorithm, such as ‘weak’ C&RT?
  2. compute $\alpha_t = \text{OneVarLinearRegression}(\{(g_t(\mathbf{x}_n), y_n - s_n)\})$
  3. update $s_n \leftarrow s_n + \alpha_t g_t(\mathbf{x}_n)$
return $G(\mathbf{x}) = \sum_{t=1}^{T} \alpha_t g_t(\mathbf{x})$
GBDT: ‘regression sibling’ of AdaBoost + decision tree, very popular in practice
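The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: the ‘weak’ C&RT is replaced by a one-split regression stump, OneVarLinearRegression is taken to be least squares through the origin, and all names (`fit_regression_stump`, `one_var_linear_regression`, `gbdt`, `training_mse`, `X_demo`, `y_demo`) and the toy data are hypothetical.

```python
def fit_regression_stump(X, r):
    """Stand-in for 'weak' C&RT: one split, predict the mean residual on each side."""
    best = None
    for feature in range(len(X[0])):
        values = sorted(set(x[feature] for x in X))
        for a, b in zip(values, values[1:]):
            theta = (a + b) / 2
            left = [rn for x, rn in zip(X, r) if x[feature] <= theta]
            right = [rn for x, rn in zip(X, r) if x[feature] > theta]
            m_left, m_right = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - m_left) ** 2 for v in left)
                   + sum((v - m_right) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, feature, theta, m_left, m_right)
    if best is None:                       # no split possible: constant predictor
        mean = sum(r) / len(r)
        return lambda x: mean
    _, feature, theta, m_left, m_right = best
    return lambda x: m_left if x[feature] <= theta else m_right

def one_var_linear_regression(g_values, r):
    """Least-squares alpha minimizing sum((r_n - alpha * g_n)^2)."""
    denom = sum(g * g for g in g_values)
    return sum(g * rn for g, rn in zip(g_values, r)) / denom if denom > 0 else 0.0

def gbdt(X, y, T):
    s = [0.0] * len(X)                     # s_1 = ... = s_N = 0
    model = []
    for _ in range(T):
        residual = [yn - sn for yn, sn in zip(y, s)]
        g = fit_regression_stump(X, residual)                  # step 1
        g_values = [g(x) for x in X]
        alpha = one_var_linear_regression(g_values, residual)  # step 2
        s = [sn + alpha * gv for sn, gv in zip(s, g_values)]   # step 3
        model.append((alpha, g))
    return lambda x: sum(alpha * g(x) for alpha, g in model)   # G(x)

def training_mse(G, X, y):
    return sum((G(x) - yn) ** 2 for x, yn in zip(X, y)) / len(X)

# Hypothetical toy regression data: y = x^2 on six points.
X_demo = [(float(v),) for v in range(6)]
y_demo = [float(v * v) for v in range(6)]
```

Calling `training_mse(gbdt(X_demo, y_demo, T), X_demo, y_demo)` for increasing `T` shows the training error shrinking as more stumps are added, since each round fits whatever residual the previous rounds left behind.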
Modern Machine Learning Models :: Deep Learning
Physical Interpretation of Neural Network
[Figure: neural network with inputs $x_0 = 1, x_1, x_2, \ldots, x_d$, constant $+1$ nodes, tanh neurons, and layer weights $w_{ij}^{(1)}, w_{jk}^{(2)}, w_{kq}^{(3)}$; inside one neuron, the score $s_3^{(2)}$ passes through tanh to give $x_3^{(2)}$]
• each layer: pattern feature extracted from data, remember? :-)
• how many neurons? how many layers? more generally, what structure?
  • subjectively, your design!
  • objectively, validation, maybe?
structural decisions: key issue for applying NNet
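To make the layer-by-layer picture concrete, here is a minimal forward-pass sketch. It assumes the convention suggested by the figure: each layer prepends a constant $+1$ input, computes weighted scores, and applies tanh, including at the output. The function name `forward` and the weight layout `weights[l][i][j]` are hypothetical choices for illustration, and the example weights are arbitrary.

```python
import math

def forward(x, weights):
    """Forward pass of a tanh network.

    weights[l][i][j] is the weight from unit i of layer l (i = 0 is the
    constant +1 unit) to unit j of layer l + 1.
    Returns the list of outputs of the last layer.
    """
    activations = list(x)
    for W in weights:
        inputs = [1.0] + activations       # prepend the constant +1 unit
        n_out = len(W[0])
        scores = [sum(inputs[i] * W[i][j] for i in range(len(inputs)))
                  for j in range(n_out)]   # s_j^(l)
        activations = [math.tanh(s) for s in scores]   # x_j^(l) = tanh(s_j^(l))
    return activations

# A hypothetical 2-3-1 structure: 2 inputs, 3 hidden neurons, 1 output.
W1 = [[0.1, -0.2, 0.0],    # weights from the +1 unit
      [0.5, 0.3, -0.4],    # weights from x_1
      [-0.6, 0.2, 0.7]]    # weights from x_2
W2 = [[0.05], [0.8], [-0.5], [0.3]]
output = forward([1.0, -1.0], [W1, W2])
```

Changing the number of rows and columns in `W1` and `W2`, or adding more weight matrices, is exactly the structural decision the slide refers to.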
Shallow versus Deep Neural Networks
shallow: few (hidden) layers; deep: many layers
Shallow NNet
• more efficient to train (✓)
• simpler structural decisions (✓)
• theoretically powerful enough (✓)
Deep NNet
• challenging to train (×)
• sophisticated structural decisions (×)
• ‘arbitrarily’ powerful (✓)
• more ‘meaningful’? (see next slide)
deep NNet (deep learning) gaining attention in recent years
Meaningfulness of Deep Learning
[Figure: features $\phi_1, \ldots, \phi_6$ connect through positive and negative weights to $z_1$ (is it a ‘1’?) and $z_5$ (is it a ‘5’?)]
• ‘less burden’ for each layer: simple to complex features
• natural for difficult learning task with raw features, like vision
deep NNet: currently popular in vision/speech/...
Challenges and Key Techniques for Deep Learning
• difficult structural decisions:
  • subjective with domain knowledge: like convolutional NNet for images
• high model complexity:
  • no big worries if big enough data
  • regularization towards noise-tolerant: like
    • dropout (tolerant when network corrupted)
    • denoising (tolerant when input corrupted)
• hard optimization problem:
  • careful initialization to avoid bad local minimum: called pre-training
• huge computational complexity (worsen with big data):
  • novel hardware/architecture: like mini-batch with GPU
IMHO, careful regularization and initialization are key techniques
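The two noise-tolerance ideas listed above can be illustrated with a small sketch. The slides give no formulas, so the details here are assumptions: dropout is shown in its common ‘inverted’ form (surviving activations are rescaled by 1/(1 - drop probability)), and denoising is shown as masking noise applied to the input while the training target stays clean. The names `dropout` and `corrupt_input` are hypothetical.

```python
import random

def dropout(activations, drop_prob, rng):
    """Corrupt the network: zero each activation with probability drop_prob.

    Assumed 'inverted dropout' variant: survivors are scaled by 1 / (1 - drop_prob)
    so the expected activation is unchanged.
    """
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def corrupt_input(x, noise_prob, rng):
    """Corrupt the input: zero each feature with probability noise_prob.

    In denoising training, the model sees this corrupted copy but is asked
    to reproduce (or predict from) the clean x.
    """
    return [0.0 if rng.random() < noise_prob else v for v in x]

rng = random.Random(0)                     # fixed seed for repeatability
hidden = [0.2, -0.7, 0.5, 0.9]
noisy_hidden = dropout(hidden, 0.5, rng)   # used only during training
noisy_input = corrupt_input([1.0, 2.0, 3.0], 0.3, rng)
```

Both functions would be applied only while training; at prediction time the network and the input are used uncorrupted.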
A Two-Step Deep Learning Framework
Simple Deep Learning
  1. for $\ell = 1, \ldots, L$, pre-train $\{w_{ij}^{(\ell)}\}$ assuming $w_*^{(1)}, \ldots, w_*^{(\ell-1)}$ fixed
     [Figure: panels (a), (b), (c), (d)]
  2. train with backprop on pre-trained NNet to fine-tune all $\{w_{ij}^{(\ell)}\}$
different deep learning models deal with the steps somewhat differently