Machine Learning Foundations (機器學習基石)

Lecture 7: The VC Dimension

Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering,
National Taiwan University (國立台灣大學資訊工程系)
The VC Dimension

Roadmap

1 When Can Machines Learn?
2 Why Can Machines Learn?

   Lecture 6: Theory of Generalization
   E_out ≈ E_in is possible if m_H(N) breaks somewhere and N is large enough

   Lecture 7: The VC Dimension
   • Definition of VC Dimension
   • VC Dimension of Perceptrons
   • Physical Intuition of VC Dimension
   • Interpreting VC Dimension

3 How Can Machines Learn?
4 How Can Machines Learn Better?
The VC Dimension / Definition of VC Dimension

Recap: More on Growth Function

For a hypothesis set with break point k,

$$m_{\mathcal{H}}(N) \le B(N, k) = \underbrace{\sum_{i=0}^{k-1} \binom{N}{i}}_{\text{highest term } N^{k-1}}$$

B(N, k):
          k=1   k=2   k=3   k=4   k=5
  N=1      1     2     2     2     2
  N=2      1     3     4     4     4
  N=3      1     4     7     8     8
  N=4      1     5    11    15    16
  N=5      1     6    16    26    31
  N=6      1     7    22    42    57

N^{k-1}:
          k=1   k=2   k=3   k=4   k=5
  N=1      1     1     1     1     1
  N=2      1     2     4     8    16
  N=3      1     3     9    27    81
  N=4      1     4    16    64   256
  N=5      1     5    25   125   625
  N=6      1     6    36   216  1296

provably & loosely, for N ≥ 2, k ≥ 3:

$$m_{\mathcal{H}}(N) \le B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i} \le N^{k-1}$$
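As a sanity check, the table above can be regenerated from the binomial sum directly. A minimal sketch in Python (the helper name bound_B is ours, not from the course):

```python
from math import comb

def bound_B(N, k):
    """The bound sum_{i=0}^{k-1} C(N, i) on B(N, k) from the slide."""
    return sum(comb(N, i) for i in range(k))

# reproduce the B(N, k) table for N = 1..6, k = 1..5
for N in range(1, 7):
    print([bound_B(N, k) for k in range(1, 6)])

# the loose polynomial bound: for N >= 2 and k >= 3, the sum is at most N^(k-1)
assert all(bound_B(N, k) <= N ** (k - 1)
           for N in range(2, 50) for k in range(3, 10))
```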
The VC Dimension / Definition of VC Dimension

Recap: More on Vapnik-Chervonenkis (VC) Bound

For any g = A(D) ∈ H and 'statistical' large D, for N ≥ 2, k ≥ 3:

$$\mathbb{P}_{\mathcal{D}}\left[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\right] \;\le\; \mathbb{P}_{\mathcal{D}}\left[\,\exists\, h \in \mathcal{H} \text{ s.t. } |E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,\right]$$
$$\le\; 4\, m_{\mathcal{H}}(2N)\, \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right) \;\le\; 4\,(2N)^{k-1} \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right) \quad \text{(if break point } k \text{ exists)}$$

if (1) m_H(N) breaks at k (good H), and
if (2) N is large enough (good D)
  ⇒ probably generalized: 'E_out ≈ E_in', and
if (3) A picks a g with small E_in (good A)
  ⇒ probably learned! (:-) good luck)
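To get a feel for why 'N large enough' wins against the polynomial factor, the last bound can be evaluated numerically. A small sketch of ours (not part of the lecture), using break point k = 4, as for 2D perceptrons, and ε = 0.1:

```python
import math

def vc_bound(N, epsilon, k):
    """Evaluate 4 * (2N)^(k-1) * exp(-epsilon^2 * N / 8)."""
    return 4 * (2 * N) ** (k - 1) * math.exp(-0.125 * epsilon ** 2 * N)

for N in (1_000, 10_000, 100_000, 1_000_000):
    print(f"N = {N:>9,d}: bound <= {vc_bound(N, 0.1, 4):.3g}")
# the polynomial grows, but the exponential eventually crushes it:
# the bound is vacuous (> 1) for small N, yet tiny by N around 10^5
```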
The VC Dimension / Definition of VC Dimension

VC Dimension

the formal name of the maximum non-break point

Definition: the VC dimension of H, denoted d_VC(H), is the largest N for which m_H(N) = 2^N

• the most inputs that H can shatter
• d_VC = 'minimum k' − 1
• N ≤ d_VC ⇒ H can shatter some N inputs
• k > d_VC ⇒ k is a break point for H

if N ≥ 2 and d_VC ≥ 2, then m_H(N) ≤ N^{d_VC}
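The definition can be checked mechanically for a simple hypothesis set. A sketch (our own brute-force helper, not course code) for positive rays h(x) = sign(x − a) on the real line:

```python
def ray_dichotomies(points):
    """All dichotomies that positive rays sign(x - a) can produce on the points."""
    points = sorted(points)
    # thresholds between neighbors (and beyond both ends) realize every behavior
    cuts = ([points[0] - 1]
            + [(p + q) / 2 for p, q in zip(points, points[1:])]
            + [points[-1] + 1])
    return {tuple(1 if x > a else -1 for x in points) for a in cuts}

for N in range(1, 5):
    m = len(ray_dichotomies(range(N)))   # any N distinct points behave alike here
    print(N, m, "shattered" if m == 2 ** N else "not shattered")
# m_H(N) = N + 1, so only N = 1 is shattered: d_VC = 1 for positive rays
```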
The VC Dimension / Definition of VC Dimension

The Four VC Dimensions

• positive rays:       m_H(N) = N + 1                  ⇒ d_VC = 1
• positive intervals:  m_H(N) = (1/2)N² + (1/2)N + 1   ⇒ d_VC = 2
• convex sets:         m_H(N) = 2^N                    ⇒ d_VC = ∞
• 2D perceptrons:      m_H(N) ≤ N³ for N ≥ 2           ⇒ d_VC = 3

good: finite d_VC
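For instance, plugging small N into the positive-interval formula confirms the claimed d_VC:

$$m_{\mathcal{H}}(2) = \tfrac{1}{2}\cdot 4 + \tfrac{1}{2}\cdot 2 + 1 = 4 = 2^2, \qquad m_{\mathcal{H}}(3) = \tfrac{1}{2}\cdot 9 + \tfrac{1}{2}\cdot 3 + 1 = 7 < 2^3,$$

so some 2 inputs are shattered but no 3 inputs are: d_VC = 2.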
The VC Dimension / Definition of VC Dimension

VC Dimension and Learning

finite d_VC ⇒ g 'will' generalize (E_out(g) ≈ E_in(g)):
• regardless of learning algorithm A
• regardless of input distribution P
• regardless of target function f

[learning-flow diagram: an unknown target function f: X → Y (ideal credit approval formula) and an unknown P on X generate training examples D: (x_1, y_1), ..., (x_N, y_N) (historical records in bank); learning algorithm A searches hypothesis set H (set of candidate formulas) and outputs final hypothesis g ≈ f ('learned' formula to be used)]

a 'worst case' guarantee on generalization
The VC Dimension / Definition of VC Dimension

Fun Time

Suppose there is a set of N inputs that cannot be shattered by H. Based only on this information, what can we conclude about d_VC(H)?

1 d_VC(H) > N
2 d_VC(H) = N
3 d_VC(H) < N
4 no conclusion can be made

Reference Answer: 4

It is possible that there is another set of N inputs that can be shattered, which would mean d_VC ≥ N. It is also possible that no set of N inputs can be shattered, which would mean d_VC < N. Neither case can be ruled out by one non-shattering set.
The VC Dimension / VC Dimension of Perceptrons

2D PLA Revisited

[flow of the argument:]
linearly separable D with x_n ∼ P and y_n = f(x_n)
  → PLA can converge (T large) ⇒ E_in(g) = 0
  → P[|E_in(g) − E_out(g)| > ε] ≤ ... by d_VC = 3 (N large) ⇒ E_out(g) ≈ E_in(g)
  → together: E_out(g) ≈ 0 :-)

general PLA for x with more than 2 features?
The VC Dimension / VC Dimension of Perceptrons

VC Dimension of Perceptrons

• 1D perceptron (pos/neg rays): d_VC = 2
• 2D perceptrons: d_VC = 3
  • d_VC ≥ 3: some set of 3 inputs can be shattered
  • d_VC ≤ 3: no 4 inputs can be shattered (e.g. the × ◦ / ◦ × pattern on a 2×2 grid is impossible)
• d-D perceptrons: d_VC = d + 1?

two steps, numerically probed in the sketch below:
• d_VC ≥ d + 1
• d_VC ≤ d + 1
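Both directions can be explored numerically before the proofs: a dichotomy y is realizable by a perceptron iff some w satisfies y_n w^T x_n ≥ 1 for all n (rescaling any strict separator gives the margin). A sketch of ours using a linear-program feasibility check, with the three-point and four-point input sets as assumptions:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def realizable(X, y):
    """Is there a w with y_n * (w^T x_n) >= 1 for all n?  (x includes the constant 1)"""
    A_ub = -np.asarray(y)[:, None] * X           # encodes -y_n x_n^T w <= -1
    res = linprog(c=np.zeros(X.shape[1]), A_ub=A_ub, b_ub=-np.ones(len(y)),
                  bounds=[(None, None)] * X.shape[1])
    return res.status == 0                       # 0 = feasible solution found

def count_dichotomies(X):
    return sum(realizable(X, y) for y in itertools.product([-1, 1], repeat=len(X)))

tri  = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1]])             # 3 non-collinear points
quad = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]])  # 4 points on a grid
print(count_dichotomies(tri))    # 8 = 2^3: these 3 inputs are shattered, so d_VC >= 3
print(count_dichotomies(quad))   # 14 < 2^4: the two XOR-like dichotomies fail
```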
The VC Dimension / VC Dimension of Perceptrons

Extra Fun Time

Which statement below shows that d_VC ≥ d + 1?

1 There are some d + 1 inputs we can shatter.
2 We can shatter any set of d + 1 inputs.
3 There are some d + 2 inputs we cannot shatter.
4 We cannot shatter any set of d + 2 inputs.

Reference Answer: 1

d_VC is the largest N with m_H(N) = 2^N, and m_H(N) is the largest number of dichotomies on N inputs. So if we can find 2^{d+1} dichotomies on some d + 1 inputs, then m_H(d + 1) = 2^{d+1} and hence d_VC ≥ d + 1.
The VC Dimension / VC Dimension of Perceptrons

d_VC ≥ d + 1

There are some d + 1 inputs we can shatter. Take some 'trivial' inputs (each x_n includes the constant x_0 = 1):

$$X = \begin{bmatrix} \text{---}\,x_1^T\,\text{---} \\ \text{---}\,x_2^T\,\text{---} \\ \text{---}\,x_3^T\,\text{---} \\ \vdots \\ \text{---}\,x_{d+1}^T\,\text{---} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & & 0 \\ \vdots & & & \ddots & \\ 1 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

(visually in 2D: the points (0, 0), (1, 0), (0, 1))

note: X is invertible!
The VC Dimension / VC Dimension of Perceptrons

Can We Shatter X?

$$X = \begin{bmatrix} \text{---}\,x_1^T\,\text{---} \\ \text{---}\,x_2^T\,\text{---} \\ \vdots \\ \text{---}\,x_{d+1}^T\,\text{---} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \\ 1 & 0 & \cdots & 0 & 1 \end{bmatrix} \quad \text{(invertible)}$$

to shatter: for any y = (y_1, ..., y_{d+1})^T, find w such that

$$\text{sign}(Xw) = y \;\Longleftarrow\; Xw = y \;\Longleftrightarrow\; w = X^{-1}y \quad (X \text{ invertible!})$$

'special' X can be shattered ⇒ d_VC ≥ d + 1
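This construction runs as written: build the special X and solve w = X^{-1}y for each of the 2^{d+1} target dichotomies. A minimal sketch (d is a free parameter):

```python
import itertools
import numpy as np

d = 4
# rows: (1, 0, ..., 0) and then (1, e_n) for each standard basis vector e_n
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros(d), np.eye(d)])])

for y in itertools.product([-1.0, 1.0], repeat=d + 1):
    w = np.linalg.solve(X, np.array(y))        # w = X^{-1} y, since X is invertible
    assert np.array_equal(np.sign(X @ w), np.array(y))
print(f"all {2 ** (d + 1)} dichotomies realized on {d + 1} inputs: d_VC >= d + 1")
```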
The VC Dimension / VC Dimension of Perceptrons

Extra Fun Time

Which statement below shows that d_VC ≤ d + 1?

1 There are some d + 1 inputs we can shatter.
2 We can shatter any set of d + 1 inputs.
3 There are some d + 2 inputs we cannot shatter.
4 We cannot shatter any set of d + 2 inputs.

Reference Answer: 4

d_VC is the largest N with m_H(N) = 2^N, and m_H(N) is the largest number of dichotomies on N inputs. So if we cannot find 2^{d+2} dichotomies on any d + 2 inputs (i.e. d + 2 is a break point), then m_H(d + 2) < 2^{d+2} and hence d_VC < d + 2; that is, d_VC ≤ d + 1.
The VC Dimension / VC Dimension of Perceptrons

d_VC ≤ d + 1 (1/2): A 2D Special Case

$$X = \begin{bmatrix} \text{---}\,x_1^T\,\text{---} \\ \text{---}\,x_2^T\,\text{---} \\ \text{---}\,x_3^T\,\text{---} \\ \text{---}\,x_4^T\,\text{---} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

for the dichotomy (x_1, x_2, x_3) = (×, ◦, ◦), the label of x_4 cannot be ×: since x_4 = x_2 + x_3 − x_1,

$$w^T x_4 = \underbrace{w^T x_2}_{\circ,\,>0} + \underbrace{w^T x_3}_{\circ,\,>0} - \underbrace{w^T x_1}_{\times,\,<0} > 0$$

linear dependence restricts the achievable dichotomies
The VC Dimension / VC Dimension of Perceptrons

d_VC ≤ d + 1 (2/2): d-D General Case

$$X = \begin{bmatrix} \text{---}\,x_1^T\,\text{---} \\ \text{---}\,x_2^T\,\text{---} \\ \vdots \\ \text{---}\,x_{d+1}^T\,\text{---} \\ \text{---}\,x_{d+2}^T\,\text{---} \end{bmatrix}$$

more rows (d + 2) than columns (d + 1) forces linear dependence (with some a_i non-zero):

$$x_{d+2} = a_1 x_1 + a_2 x_2 + \dots + a_{d+1} x_{d+1}$$

• can you generate the dichotomy (sign(a_1), sign(a_2), ..., sign(a_{d+1}), ×)? if so, with what w? Any such w has sign(w^T x_n) = sign(a_n), so every term a_n w^T x_n below is positive (where a_n ≠ 0), and

$$w^T x_{d+2} = \underbrace{a_1 w^T x_1}_{>0} + \underbrace{a_2 w^T x_2}_{>0} + \dots + \underbrace{a_{d+1} w^T x_{d+1}}_{>0} > 0$$

so x_{d+2} is forced to be ◦, never × (contradiction!)

'general' X cannot be shattered ⇒ d_VC ≤ d + 1
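The same contradiction can be watched numerically: recover the dependence coefficients a, pick a w that matches sign(a_n) on the first d + 1 points, and w^T x_{d+2} = Σ a_n w^T x_n comes out positive automatically. A sketch of ours, assuming random points (so all a_n are non-zero almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
X = np.hstack([np.ones((d + 2, 1)), rng.standard_normal((d + 2, d))])  # d+2 rows, d+1 cols

# linear dependence: x_{d+2} = sum_n a_n x_n (exact: the first d+1 rows form a square basis)
a, *_ = np.linalg.lstsq(X[:d + 1].T, X[d + 1], rcond=None)

# one w realizing sign(w^T x_n) = sign(a_n) on the first d+1 points
w = np.linalg.solve(X[:d + 1], np.sign(a))     # w^T x_n = sign(a_n) exactly

# so x_{d+2} is forced to the label +1, never x: that dichotomy is unrealizable
print(w @ X[d + 1], "=", np.abs(a).sum())      # both equal sum_n |a_n| > 0
```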
The VC Dimension / VC Dimension of Perceptrons

Fun Time

Based on the proof above, what is d_VC of 1126-D perceptrons?

1 1024
2 1126
3 1127
4 6211

Reference Answer: 3

Well, too much fun for this section! :-)
The VC Dimension / Physical Intuition of VC Dimension

Degrees of Freedom

[figure: rows of dials with positions 0-18, illustrating freely adjustable 'knobs'; modified from the work of Hugues Vermeiren on http://www.texample.net]

• hypothesis parameters w = (w_0, w_1, ..., w_d): create degrees of freedom
• hypothesis quantity M = |H|: 'analog' degrees of freedom
• hypothesis 'power' d_VC = d + 1: effective 'binary' degrees of freedom

d_VC(H): 'powerfulness' of H