Machine Learning Techniques (機器學習技巧)
Lecture 14: Miscellaneous Models
Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering, National Taiwan University
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/23
Miscellaneous Models
Agenda
Lecture 14: Miscellaneous Models
• Matrix Factorization
• Gradient Boosted Decision Tree
• Naive Bayes
• Bayesian Learning
Miscellaneous Models Matrix Factorization
Recommender System Revisited
data → ML → skill
• data: how 'many users' have rated 'some movies'
• skill: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• 10% improvement = 1 million dollar prize
• data D_j for the j-th movie: {(x_n = (i), y_n = r_ij)}_{n=1}^{N_j}
  — abstract features x_n = (i)
how to learn our preferences from all D_j?
Linear Model for Recommender System
consider one linear model for each D_j = {(x_n = (i), y_n = r_ij)}_{n=1}^{N_j}, with a shared transform Φ:

  y ≈ h_j(x) = w_j^T Φ(x) for the j-th movie

• Φ(i): named v_i, to be learned from data, like NNet/RBF Net
• then, r_ij = y_n ≈ w_j^T v_i
• overall E_in with squared error:

  E_in({w_j}, {v_i}) = (Σ_j N_j E_in^(j)(w_j, {v_i})) / (Σ_j N_j)
                     = (1/N) Σ_{known (i,j)} (r_ij − w_j^T v_i)²

how to minimize? SGD by sampling known (i, j)
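The slide's recipe (sample one known (i, j), then nudge w_j and v_i along the negative gradient of the squared error) can be sketched in NumPy. The toy ratings, factor dimension, step size, and iteration count below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy known ratings: (user i, movie j, rating r_ij); values are made up
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
I, J, d = 3, 3, 2                        # users, movies, latent factors

V = rng.normal(scale=0.1, size=(I, d))   # v_i: viewer factors
W = rng.normal(scale=0.1, size=(J, d))   # w_j: movie factors
eta = 0.05                               # learning rate

for step in range(5000):
    i, j, r = ratings[rng.integers(len(ratings))]  # sample one known (i, j)
    err = r - V[i] @ W[j]                          # residual r_ij - v_i^T w_j
    # SGD step on (r_ij - v_i^T w_j)^2 with respect to v_i and w_j
    V[i], W[j] = V[i] + eta * err * W[j], W[j] + eta * err * V[i]

for i, j, r in ratings:
    print(f"r_{i}{j}: true {r}, predicted {V[i] @ W[j]:.2f}")
```

Because the update is bilinear, both factor vectors must be updated from their old values, hence the simultaneous tuple assignment.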
Matrix Factorization
  r_ij ≈ w_j^T v_i = v_i^T w_j

  R        movie 1   movie 2   ···   movie J
  user 1     100        ?      ···      −
  user 2      −        70      ···      −
  ···        ···       ···     ···     ···
  user I      ?         −      ···      0

  R ≈ V W, where V has rows v_1^T, v_2^T, ···, v_I^T and W has columns w_1, w_2, ···, w_J

[figure: "Match movie and viewer factors" — movie factors (comedy content, action content, blockbuster?, Tom Cruise in it?) matched against viewer factors (likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?); the predicted rating adds contributions from each factor]

Matrix Factorization Model
learning: known ratings → learned factors w_j and v_i → unknown rating predictions
similar modeling can be used for abstract features
Fun Time
Miscellaneous Models Gradient Boosted Decision Tree
Coordinate Descent for Linear Blending
Consider a linear blending problem: for G = {g_ℓ},

  min_β (1/N) Σ_{n=1}^N exp(−y_n Σ_{ℓ=1}^L β_ℓ g_ℓ(x_n))

• why exponential error exp(−y G(x)): a convex upper bound on err_0/1, used as the surrogate error
• how to minimize? — GD, SGD, ... if few {g_ℓ}
• what if lots of, or infinitely many, g_ℓ?
  — pick one good g_i, and update its β_i only

coordinate descent: in each iteration
• pick a good coordinate i (the best one for the next step)
• minimize by setting β_i^new ← β_i^old + Δ
Coordinate Descent View of AdaBoost
Consider the same linear blending problem: for G = {g_ℓ},

  min_β (1/N) Σ_{n=1}^N exp(−y_n Σ_{ℓ=1}^L β_ℓ g_ℓ(x_n))

coordinate descent: in each iteration
• pick a good coordinate i (the best one for the next step)
• minimize by setting β_i^new ← β_i^old + Δ

AdaBoost: in each iteration
• pick a good hypothesis g_t
• set α_t^new ← 0 + (1/2) ln((1 − ε_t)/ε_t)

after some derivations (ML2012Fall HW7.5):
AdaBoost = coordinate descent + exponential error
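The equivalence can be checked numerically. In this sketch (toy data and stump pool are made up), each iteration picks the coordinate whose weighted error ε_i is farthest from 1/2 and applies the exact line minimizer Δ = (1/2) ln((1 − ε_i)/ε_i), which is exactly the AdaBoost step:

```python
import numpy as np

# toy 1-D data and a small pool G = {g_l} of decision stumps g_l(x) = sign(x - theta)
X = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([-1, -1, 1, -1, 1, 1])
G = np.array([np.where(X > th, 1, -1) for th in (-1.5, -0.75, 0.0, 0.75, 1.5)])

beta = np.zeros(len(G))
exp_err = lambda b: np.mean(np.exp(-y * (b @ G)))   # the exponential error

losses = [exp_err(beta)]
for t in range(10):
    u = np.exp(-y * (beta @ G))                           # AdaBoost-style example weights
    eps = np.array([u[g != y].sum() / u.sum() for g in G])  # weighted error of each g_l
    i = np.argmax(np.abs(eps - 0.5))                      # best coordinate for the next step
    beta[i] += 0.5 * np.log((1 - eps[i]) / eps[i])        # exact minimizer along coordinate i
    losses.append(exp_err(beta))

print([round(l, 3) for l in losses])  # exponential error never increases
```

Unlike vanilla AdaBoost, coordinate descent may revisit a coordinate and further adjust its β_i; each closed-form step still exactly minimizes the loss along that coordinate.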
Gradient Boosted Decision Tree
Consider another linear blending problem:

  min_β (1/N) Σ_{n=1}^N (y_n − Σ_{ℓ=1}^L β_ℓ g_ℓ(x_n))²

• best coordinate at the t-th iteration (under assumptions):

  min_{g_ℓ} (1/N) Σ_{n=1}^N ((y_n − G_{t−1}(x_n)) − g_ℓ(x_n))²

  — best hypothesis on {(x_n, residual_n)}
• best β_ℓ^new: one-dimensional linear regression

gradient boosted decision tree (GBDT): the above + find the best g_ℓ by decision tree
(a 'regression' extension of AdaBoost)
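A minimal sketch of the GBDT loop, with hand-rolled one-split regression stumps standing in for full decision trees (the data, number of rounds, and stump learner are illustrative choices, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, size=80))
y = np.sin(x) + 0.1 * rng.normal(size=80)

def fit_stump(x, r):
    """Least-squares one-split regression stump on residuals r."""
    best_sse, best = np.inf, None
    for s in x[:-1]:                     # candidate split points
        left, right = r[x <= s], r[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (s, left.mean(), right.mean())
    return best

predict = lambda stump, x: np.where(x <= stump[0], stump[1], stump[2])

G = np.zeros_like(y)                          # current ensemble G_{t-1}(x_n)
for t in range(50):
    residual = y - G
    g = fit_stump(x, residual)                # best g_l on {(x_n, residual_n)}
    pred = predict(g, x)
    beta = (pred @ residual) / (pred @ pred)  # one-dimensional linear regression
    G += beta * pred

print(np.mean((y - G) ** 2))                  # training error after 50 rounds
```

Note that because a least-squares stump already matches the residual's scale, β comes out as 1 here; the one-dimensional regression step matters when g_ℓ is not itself least-squares-fitted to the residuals.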
Fun Time
Miscellaneous Models Naive Bayes
Naive Bayes Model
want: getting P(y|x) (e.g. logistic regression) for classification
• Bayes rule: P(y|x) ∝ P(x|y) P(y)
• estimating P(y): frequency of y_n = y in D (easy!)
• joint distribution P(x|y): easier if P(x|y) = P(x_1|y) P(x_2|y) ··· P(x_d|y)
  — conditional independence
• marginal distributions P(x_i|y): piece-wise discrete, Gaussian, etc.

Naive Bayes model: h(x) = P(x_1|y) P(x_2|y) ··· P(x_d|y) P(y)
with your choice of distribution families
More about Naive Bayes
find g(x) = P(x_1|y) ··· P(x_d|y) P(y) by 'good estimate' of all RHS terms

for binary classification:

  g(x) = sign( [P(x_1|+1) ··· P(x_d|+1) P(+1)] / [P(x_1|−1) ··· P(x_d|−1) P(−1)] − 1 )
       = sign( (P(+1)/P(−1)) · Π_{i=1}^d P(x_i|+1)/P(x_i|−1) − 1 )
       = sign( log(P(+1)/P(−1)) + Σ_{i=1}^d log(P(x_i|+1)/P(x_i|−1)) )    (since sign(A − 1) = sign(log A))
       = sign( w_0 + Σ_{i=1}^d 1 · φ_i(x) )

with w_0 = log(P(+1)/P(−1)) and φ_i(x) = log(P(x_i|+1)/P(x_i|−1))
— also a naive linear model with 'heuristic/learned' transform and bias

a simple (heuristic) model, usually super fast
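With Gaussian marginals P(x_i|y), 'learning' is just class frequencies plus per-dimension means and standard deviations. A sketch on made-up 2-D data (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
# toy 2-D data: class +1 centered at (2, 2), class -1 at (-2, -2)
Xp = rng.normal(loc=2.0, size=(50, 2))
Xn = rng.normal(loc=-2.0, size=(50, 2))
X = np.vstack([Xp, Xn])
y = np.array([1] * 50 + [-1] * 50)

# estimate P(y) by frequency and each P(x_i|y) by a Gaussian fit
stats = {}
for c in (+1, -1):
    Xc = X[y == c]
    stats[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.std(axis=0))

def log_score(x, c):
    prior, mu, sigma = stats[c]
    # log P(y) + sum_i log N(x_i; mu_i, sigma_i), dropping the constant
    # 0.5*log(2*pi) per dimension since it cancels between classes
    return np.log(prior) - np.sum(np.log(sigma) + 0.5 * ((x - mu) / sigma) ** 2)

def h(x):
    return 1 if log_score(x, +1) > log_score(x, -1) else -1

print(h(np.array([1.5, 2.5])), h(np.array([-3.0, -1.0])))   # → 1 -1
```

Everything is computed in a single pass over the data, which is where the 'usually super fast' remark comes from.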
ICDM 2006 Top 10 Data Mining Algorithms
1. C4.5: decision tree
2. K-means: clustering, taught with RBF Network
3. SVM: large-margin/kernel
4. Apriori: for frequent itemset mining
5. EM: the 'gradient descent' in Bayesian learning
6. PageRank: for link analysis, similar to matrix factorization
7. AdaBoost: aggregation
8. k-NN: taught very shortly within RBF Network
9. Naive Bayes: linear model with heuristic transform
10. CART: decision tree

personal view of four missing ML competitors: LinReg, LogReg, Random Forest, NNet
Fun Time
Miscellaneous Models Bayesian Learning
Disclaimer
Part of the following lecture borrows
Prof. Yaser S. Abu-Mostafa’s slides with permission.
The prior
P(h = f | D) requires an additional probability distribution:

  P(h = f | D) = P(D | h = f) P(h = f) / P(D) ∝ P(D | h = f) P(h = f)

P(h = f) is the prior
P(h = f | D) is the posterior
Given the prior, we have the full distribution
Learning From Data - Lecture 18
Example of a prior
Consider a perceptron: h is determined by w = w_0, w_1, ···, w_d
A possible prior on w: each w_i is independent, uniform over [−1, 1]
This determines the prior over h: P(h = f)
Given D, we can compute P(D | h = f)
Putting them together, we get P(h = f | D) ∝ P(h = f) P(D | h = f)
A prior is an assumption
Even the most neutral prior:

  [figure: "x is unknown", x somewhere in [−1, 1], versus "x is random" with P(x) uniform over [−1, 1]]

The true equivalent would be:

  [figure: "x is unknown", x somewhere in [−1, 1], versus "x is random" with P(x) = δ(x − a) for some unknown a in [−1, 1]]
If we knew the prior . . .
we could compute P(h = f | D) for every h ∈ H
⇒ we can find the most probable h given the data
  we can derive E(h(x)) for every x
  we can derive the error bar for every x
  we can derive everything in a principled way
One Instance of Using Posterior
• logistic regression: know how to calculate the likelihood P(D | w = w_f)
• define a Gaussian prior P(w = w_f) = N(0, σ²I)
• posterior ∝ Gaussian · (logistic likelihood)
• maximize posterior = maximize [log Gaussian + log logistic likelihood]
  = regularized logistic regression

regularized logistic regression
= min augmented error (with iid assumption + effective d_VC heuristic + surrogate error)
= max prior · likelihood (with iid assumption + prior/likelihood assumptions)
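The last equivalence can be written out. With the logistic likelihood θ(s) = 1/(1 + e^{−s}) and the Gaussian prior above (a standard derivation; the regularization weight plays the role of 1/(2σ²)):

```latex
\begin{aligned}
\max_{\mathbf{w}}\ \underbrace{P(\mathbf{w})}_{\text{prior}}\,
  \underbrace{\textstyle\prod_{n=1}^{N}\theta\!\left(y_n\mathbf{w}^{T}\mathbf{x}_n\right)}_{\text{likelihood}}
&= \max_{\mathbf{w}}\ \log P(\mathbf{w})+\sum_{n=1}^{N}\log\theta\!\left(y_n\mathbf{w}^{T}\mathbf{x}_n\right)\\
&= \min_{\mathbf{w}}\ \frac{1}{2\sigma^{2}}\,\|\mathbf{w}\|^{2}
   +\sum_{n=1}^{N}\ln\!\left(1+\exp\!\left(-y_n\mathbf{w}^{T}\mathbf{x}_n\right)\right),
\end{aligned}
```

which is regularized logistic regression with regularization strength proportional to 1/σ²: a wider prior (large σ) means weaker regularization.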
When is Bayesian learning justified?
1. The prior is valid
   — trumps all other methods
2. The prior is irrelevant
   — just a computational catalyst
My Biased View
in reality: