Modern Machine Learning Models Random Forest

A Complicated Data Set

[Figure: g_t (N′ = N/2); G with first t trees]

‘easy yet robust’ nonlinear model
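The random forest on this slide can be sketched in plain Python. The sketch rests on several assumptions of ours:

- labels are ±1;
- each g_t is a depth-limited C&RT-style tree that uses Gini impurity;
- each split inspects a random subset of `n_feats` features, which is one common way to add randomness and not necessarily the variant behind the figure;
- each g_t is trained on a bootstrap sample of N′ = N/2 examples;
- G is a uniform vote over the trees.

All function names are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of ±1 labels."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == 1) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    return 1 if sum(labels) >= 0 else -1


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a C&RT-style tree; each split only looks at n_feats random features."""
    if depth == max_depth or gini(y) == 0.0:
        return ("leaf", majority(y))
    best = None  # (weighted impurity, feature, threshold)
    for i in rng.sample(range(len(X[0])), n_feats):
        for theta in sorted(set(x[i] for x in X)):
            left = [yn for x, yn in zip(X, y) if x[i] <= theta]
            right = [yn for x, yn in zip(X, y) if x[i] > theta]
            if not right:
                continue
            impurity = len(left) * gini(left) + len(right) * gini(right)
            if best is None or impurity < best[0]:
                best = (impurity, i, theta)
    if best is None:  # no valid split: make a leaf
        return ("leaf", majority(y))
    _, i, theta = best
    go_left = [x[i] <= theta for x in X]
    XL = [x for x, g in zip(X, go_left) if g]
    yL = [yn for yn, g in zip(y, go_left) if g]
    XR = [x for x, g in zip(X, go_left) if not g]
    yR = [yn for yn, g in zip(y, go_left) if not g]
    return ("node", i, theta,
            build_tree(XL, yL, depth + 1, max_depth, n_feats, rng),
            build_tree(XR, yR, depth + 1, max_depth, n_feats, rng))


def predict_tree(tree, x):
    while tree[0] == "node":
        _, i, theta, left, right = tree
        tree = left if x[i] <= theta else right
    return tree[1]


def random_forest(X, y, T, max_depth=5, n_feats=1, seed=0):
    rng = random.Random(seed)
    N = len(X)
    trees = []
    for _ in range(T):
        # bagging: g_t sees N' = N/2 examples drawn with replacement
        idx = [rng.randrange(N) for _ in range(N // 2)]
        trees.append(build_tree([X[j] for j in idx], [y[j] for j in idx],
                                0, max_depth, n_feats, rng))
    return trees


def predict_forest(trees, x):
    # G: uniform vote over all g_t
    return 1 if sum(predict_tree(t, x) for t in trees) >= 0 else -1
```

When `n_feats` equals the number of features, the trees differ only through bootstrapping. Smaller values make the g_t more diverse.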


Modern Machine Learning Models Adaptive (or Gradient) Boosting


Apple Recognition Problem

• is this a picture of an apple?
• say, we want to teach a class of 6-year-olds
• gather photos under the CC-BY-2.0 license on Flickr (thanks to the authors below!)

(APAL stands for Apple and Pear Australia Ltd)

• https://flic.kr/p/jNQ55 (Dan Foy)
• https://flic.kr/p/jzP1VB (APAL)
• https://flic.kr/p/bdy2hZ (adrianbartel)
• https://flic.kr/p/51DKA8 (ANdrzej cH.)
• https://flic.kr/p/9C3Ybd (Stuart Webster)
• https://flic.kr/p/9XD7Ag (nachans)
• https://flic.kr/p/jzRe4u (APAL)
• https://flic.kr/p/7jwtGp (Jo Jakeman)
• https://flic.kr/p/jzPYNr (APAL)
• https://flic.kr/p/jzScif (APAL)


• https://flic.kr/p/i5BN85 (Mr. Roboto.)
• https://flic.kr/p/bHhPkB (Richard North)
• https://flic.kr/p/d8tGou (Richard North)
• https://flic.kr/p/bpmGXW (Emilian Robert Vicol)
• https://flic.kr/p/pZv1Mf (Nathaniel Mc-Queen)
• https://flic.kr/p/kaPYp (Crystal)
• https://flic.kr/p/6vjRFH (jfh686)
• https://flic.kr/p/2MynV (skyseeker)
• https://flic.kr/p/7QDBbm (Janet Hudson)
• https://flic.kr/p/agmnrk (Rennett Stowe)


Our Fruit Class Begins

• Teacher: Please look at the pictures of apples and non-apples below. Based on those pictures, how would you describe an apple? Michael?
• Michael: I think apples are circular.

(Class): Apples are circular.


Our Fruit Class Continues

• Teacher: Being circular is a good feature for apples. However, if you only say circular, you could make several mistakes. What else can we say about an apple? Tina?
• Tina: It looks like apples are red.

(Class): Apples are somewhat circular and somewhat red.


Our Fruit Class Continues More

• Teacher: Yes. Many apples are red. However, you could still make mistakes based on circular and red alone. Do you have any other suggestions, Joey?
• Joey: Apples could also be green.

(Class): Apples are somewhat circular, somewhat red, and possibly green.


Our Fruit Class Ends

• Teacher: Yes. It seems that apples might be circular, red, or green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica?
• Jessica: Apples have stems at the top.

(Class): Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top.


Motivation

• students: simple hypotheses g_t (like vertical/horizontal lines)
• (Class): sophisticated hypothesis G (like the black curve)
• Teacher: a tactical learning algorithm that directs the students to focus on key examples

next: demo of such an algorithm


A Simple Data Set

‘Teacher’-like algorithm works!
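Below is a minimal sketch of such a teacher-like algorithm. It assumes standard AdaBoost, with decision stumps (vertical/horizontal lines) playing the students. After each round, misclassified examples are multiplied by scale = sqrt((1 − ε_t)/ε_t) and correctly classified ones are divided by it. Each g_t then votes with α_t = ln(scale). Names such as `train_stump` are hypothetical.

```python
import math


def stump(x, i, theta, s):
    """Decision stump: s if x[i] > theta, else -s (a vertical/horizontal line)."""
    return s if x[i] > theta else -s


def train_stump(X, y, u):
    """Return (weighted error, i, theta, s) of the stump with least weighted error."""
    best = None
    for i in range(len(X[0])):
        values = sorted(set(x[i] for x in X))
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for theta in thresholds:
            for s in (1, -1):
                err = sum(un for x, yn, un in zip(X, y, u) if stump(x, i, theta, s) != yn)
                if best is None or err < best[0]:
                    best = (err, i, theta, s)
    return best


def adaboost(X, y, T):
    N = len(X)
    u = [1.0 / N] * N  # example weights: the teacher's emphasis
    ensemble = []
    for _ in range(T):
        err, i, theta, s = train_stump(X, y, u)
        # weighted error rate, clipped into (0, 1) to avoid division by zero
        eps = min(max(err / sum(u), 1e-10), 1 - 1e-10)
        scale = math.sqrt((1 - eps) / eps)
        # focus on key examples: scale up mistakes, scale down correct ones
        u = [un * scale if stump(x, i, theta, s) != yn else un / scale
             for x, yn, un in zip(X, y, u)]
        ensemble.append((math.log(scale), i, theta, s))
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump(x, i, theta, s) for alpha, i, theta, s in ensemble)
    return 1 if score >= 0 else -1
```

Take six points x = 1 to 6 labelled +1, +1, −1, −1, +1, +1. No single stump classifies all six correctly. Three rounds of reweighting, however, combine three stumps into a G that does.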


Putting Everything Together

Gradient Boosted Decision Tree (GBDT)

$s_1 = s_2 = \dots = s_N = 0$
for $t = 1, 2, \dots, T$
1. obtain $g_t$ by $\mathcal{A}(\{(x_n, y_n - s_n)\})$, where $\mathcal{A}$ is a (squared-error) regression algorithm, such as a ‘weak’ C&RT
2. compute $\alpha_t = \text{OneVarLinearRegression}(\{(g_t(x_n), y_n - s_n)\})$
3. update $s_n \leftarrow s_n + \alpha_t g_t(x_n)$
return $G(x) = \sum_{t=1}^{T} \alpha_t g_t(x)$

GBDT: ‘regression sibling’ of AdaBoost + decision tree, very popular in practice
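The GBDT loop above translates almost line by line into Python. Two choices here are assumptions:

- The ‘weak’ C&RT is a single-split regression stump whose leaves predict the mean residual.
- OneVarLinearRegression is least squares through the origin: α_t = Σ_n g_t(x_n) r_n / Σ_n g_t(x_n)², where r_n = y_n − s_n.

Helper names are hypothetical.

```python
def fit_regression_stump(X, r):
    """A 'weak' C&RT: one split minimizing squared error; each leaf predicts its mean."""
    best = None
    for i in range(len(X[0])):
        for theta in sorted(set(x[i] for x in X)):
            left = [rn for x, rn in zip(X, r) if x[i] <= theta]
            right = [rn for x, rn in zip(X, r) if x[i] > theta]
            if not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, i, theta, ml, mr)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(r) / len(r)
        return lambda x: mean
    _, i, theta, ml, mr = best
    return lambda x: ml if x[i] <= theta else mr


def one_var_linear_regression(g, r):
    """alpha minimizing sum_n (r_n - alpha * g_n)^2, i.e. regression through the origin."""
    denom = sum(gn * gn for gn in g)
    return sum(gn * rn for gn, rn in zip(g, r)) / denom if denom > 0 else 0.0


def gbdt(X, y, T):
    s = [0.0] * len(X)  # s_1 = ... = s_N = 0
    ensemble = []
    for _ in range(T):
        residual = [yn - sn for yn, sn in zip(y, s)]
        g = fit_regression_stump(X, residual)            # step 1
        gx = [g(x) for x in X]
        alpha = one_var_linear_regression(gx, residual)  # step 2
        s = [sn + alpha * gn for sn, gn in zip(s, gx)]   # step 3
        ensemble.append((alpha, g))
    return ensemble


def gbdt_predict(ensemble, x):
    # G(x) = sum_t alpha_t g_t(x)
    return sum(alpha * g(x) for alpha, g in ensemble)
```

With mean-valued leaves, α_t comes out as exactly 1 in exact arithmetic. Step 2 therefore matters more for base learners whose outputs are not already least-squares fits of the residuals.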

Modern Machine Learning Models Deep Learning


Physical Interpretation of Neural Network

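[Figure: network with inputs $x_0 = 1, x_1, x_2, \dots, x_d$, +1 bias units, tanh neurons, and weights $w_{ij}^{(1)}, w_{jk}^{(2)}, w_{kq}^{(3)}$; e.g. neuron 3 of layer 2 outputs $x_3^{(2)} = \tanh(s_3^{(2)})$]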

• each layer: pattern feature extracted from data, remember? :-)
• how many neurons? how many layers? More generally, what structure?
  • subjectively, your design!
  • objectively, validation, maybe?

structural decisions: key issue for applying NNet
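The sketch below shows a forward pass through a network like the one in the figure. Following the figure, it assumes that every neuron (including the output) applies tanh and that x_0 = 1 is the bias input of every layer. It stores w_ij^(ℓ+1) in `weights[l][i][j]`. Each layer then computes s_j^(ℓ) = Σ_i w_ij^(ℓ) x_i^(ℓ−1) and x_j^(ℓ) = tanh(s_j^(ℓ)). The weight values in the example are arbitrary.

```python
import math


def forward(weights, x):
    """Forward pass; weights[l][i][j] holds w_ij^(l+1), row i = 0 multiplies the bias x_0 = 1."""
    layer = list(x)
    for W in weights:
        inputs = [1.0] + layer  # prepend x_0 = 1
        assert len(W) == len(inputs), "layer sizes do not match"
        # x_j = tanh(s_j), with s_j = sum_i w_ij * x_i
        layer = [math.tanh(sum(W[i][j] * inputs[i] for i in range(len(inputs))))
                 for j in range(len(W[0]))]
    return layer


# A 2-3-1 network: 3 hidden tanh neurons, 1 output neuron (arbitrary weights).
W1 = [[0.1, 0.0, -0.1],   # bias row
      [0.5, -0.3, 0.2],
      [0.4, 0.1, -0.6]]
W2 = [[0.0], [1.0], [-1.0], [0.5]]
print(forward([W1, W2], [1.0, 2.0]))
```

The structural decisions above correspond to the number of matrices in `weights` and their shapes.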


Shallow versus Deep Neural Networks

shallow: few (hidden) layers; deep: many layers

Shallow NNet
• efficient to train (✓)
• simpler structural decisions (✓)
• theoretically powerful enough (✓)

Deep NNet
• challenging to train (×)
• sophisticated structural decisions (×)
• ‘arbitrarily’ powerful (✓)
• ‘meaningful’? (see next slide)

deep NNet (deep learning) gaining attention in recent years


Meaningfulness of Deep Learning

[Figure: digit-recognition network with outputs ‘is it a 1?’ and ‘is it a 5?’; edges marked as positive or negative weights]

• ‘less burden’ for each layer: from simple to complex features
• natural for difficult learning tasks with raw features, like vision

deep NNet: currently popular in vision/speech/. . .


Challenges and Key Techniques for Deep Learning

• difficult structural decisions:
  • subjective with domain knowledge: like convolutional NNet for images
• high model complexity:
  • no big worries if the data is big enough
  • regularization toward noise tolerance, like
    • dropout (tolerant when the network is corrupted)
    • denoising (tolerant when the input is corrupted)
• hard optimization problem:
  • careful initialization to avoid bad local minima, called pre-training
• huge computational complexity (worsens with big data):
  • novel hardware/architecture, like mini-batch with GPU

IMHO, careful regularization and initialization are key techniques
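The two noise-tolerance regularizers can be sketched as the corruption steps they apply during training. Both choices below are assumptions:

- Dropout here is the common ‘inverted’ variant. It zeroes each unit with probability p and scales survivors by 1/(1 − p), so each unit's expected value is unchanged.
- Denoising adds Gaussian noise of standard deviation `sigma` to the inputs, one of several possible corruptions. The network would then be trained to produce the clean target from the corrupted input.

Function names are hypothetical.

```python
import random


def dropout(activations, p, rng):
    """Inverted dropout (training only): drop each unit with probability p,
    rescale survivors by 1/(1 - p) so each unit's expected value is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def corrupt(x, sigma, rng):
    """Denoising-style corruption: add N(0, sigma^2) noise to each input component.
    The network is then trained to output the clean target from the corrupted x."""
    return [xn + rng.gauss(0.0, sigma) for xn in x]


rng = random.Random(0)
hidden = [0.8, -0.2, 0.5, 0.1]
print(dropout(hidden, 0.5, rng))  # some units zeroed, the rest doubled
print(corrupt([1.0, 0.0, 1.0], 0.1, rng))
```

At prediction time, neither corruption is applied. The inverted scaling is what allows the network to be used unchanged at that stage.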


A Two-Step Deep Learning Framework

Simple Deep Learning
1. for $\ell = 1, \dots, L$, pre-train $\{w_{ij}^{(\ell)}\}$ assuming $w_*^{(1)}, \dots, w_*^{(\ell-1)}$ fixed
[Figure: panels (a), (b), (c), (d)]
2. train with backprop on the pre-trained NNet to fine-tune all $\{w_{ij}^{(\ell)}\}$

different deep learning models deal with the steps somewhat differently
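Step 1 can be sketched with autoencoders, one common choice of pre-training that we assume here. Each layer is trained as a tanh encoder with a linear decoder, using stochastic gradient descent on squared error to reconstruct its own input. Its encoder weights are then frozen as w^(ℓ), and the encoded data becomes the input for layer ℓ + 1. Step 2 (backprop fine-tuning of all weights) is left out for brevity. All names and hyperparameters are hypothetical.

```python
import math
import random


def encode(W, x):
    """tanh layer: x_j = tanh(sum_i W[i][j] * [1, x]_i)."""
    xin = [1.0] + list(x)
    return [math.tanh(sum(W[i][j] * xin[i] for i in range(len(xin))))
            for j in range(len(W[0]))]


def pretrain_layer(data, d_out, epochs, eta, rng):
    """Train a tanh-encoder / linear-decoder autoencoder by SGD on
    0.5 * ||x_hat - x||^2 and return the encoder weights ((d_in + 1) x d_out)."""
    d_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in + 1)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out + 1)]
    for _ in range(epochs):
        for x in data:
            xin = [1.0] + list(x)
            h = encode(W, x)
            hin = [1.0] + h
            x_hat = [sum(V[j][k] * hin[j] for j in range(d_out + 1)) for k in range(d_in)]
            err = [x_hat[k] - x[k] for k in range(d_in)]
            # backprop to the hidden layer using V before it is updated
            delta = [(1.0 - h[j] ** 2) * sum(V[j + 1][k] * err[k] for k in range(d_in))
                     for j in range(d_out)]
            for j in range(d_out + 1):
                for k in range(d_in):
                    V[j][k] -= eta * hin[j] * err[k]
            for i in range(d_in + 1):
                for j in range(d_out):
                    W[i][j] -= eta * xin[i] * delta[j]
    return W


def pretrain(data, layer_sizes, epochs=50, eta=0.05, seed=0):
    """Step 1: pre-train w^(1), ..., w^(L) one layer at a time."""
    rng = random.Random(seed)
    weights = []
    for d_out in layer_sizes:
        W = pretrain_layer(data, d_out, epochs, eta, rng)
        weights.append(W)                    # w^(l) is now fixed
        data = [encode(W, x) for x in data]  # its outputs feed layer l + 1
    return weights
```

The returned list has the same layout as the network's weights. It can therefore serve as the initialization that step 2 fine-tunes with backprop.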


Mini-Summary

Modern Machine Learning Models Support Vector Machine

large-margin boundary ranging from linear to non-linear

From the document Quick Tour of Machine Learning (機器學習速遊), pages 136-163