
The VC Dimension: Interpreting VC Dimension

VC Bound Rephrase: Penalty for Model Complexity

For any g = A(D) ∈ H and 'statistically' large D, for N ≥ 2 and d_VC ≥ 2:

\[
\underbrace{\mathbb{P}_{\mathcal{D}}\Big[\,\big|E_{\text{in}}(g)-E_{\text{out}}(g)\big| > \epsilon\,\Big]}_{\text{BAD}}
\;\overset{\text{if }k\text{ exists}}{\le}\;
\underbrace{4(2N)^{d_{\text{VC}}}\exp\!\left(-\tfrac{1}{8}\epsilon^{2}N\right)}_{\delta}
\]

Rephrase: with probability ≥ 1 − δ, GOOD: |E_in(g) − E_out(g)| ≤ ε. To find ε, set

\[
\delta = 4(2N)^{d_{\text{VC}}}\exp\!\left(-\tfrac{1}{8}\epsilon^{2}N\right)
\;\Longleftrightarrow\;
\frac{\delta}{4(2N)^{d_{\text{VC}}}} = \exp\!\left(-\tfrac{1}{8}\epsilon^{2}N\right)
\;\Longleftrightarrow\;
\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta} = \tfrac{1}{8}\epsilon^{2}N,
\]

which solves to

\[
\epsilon = \sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}}.
\]

So with probability ≥ 1 − δ, GOOD! The generalization error satisfies

\[
\big|E_{\text{in}}(g)-E_{\text{out}}(g)\big| \le \sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}},
\]

that is,

\[
E_{\text{in}}(g)-\sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}}
\;\le\; E_{\text{out}}(g) \;\le\;
E_{\text{in}}(g)+\sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}},
\]

where \(\sqrt{\cdots} = \Omega(N,\mathcal{H},\delta)\) is the penalty for model complexity.
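As a quick numerical check of this rephrasing (a minimal sketch, not from the slides: the helper name `vc_penalty` and the example values N = 1000, d_VC = 3, δ = 0.1, E_in = 0.05 are assumptions for illustration), Ω(N, H, δ) and the resulting interval for E_out can be computed directly:

```python
import math

def vc_penalty(N: int, d_vc: int, delta: float) -> float:
    """Omega(N, H, delta) = sqrt((8/N) * ln(4 * (2N)^{d_VC} / delta)).

    Evaluated via log algebra, ln(4/delta) + d_VC * ln(2N), so that a
    large d_VC cannot overflow the floating-point power."""
    log_term = math.log(4.0 / delta) + d_vc * math.log(2.0 * N)
    return math.sqrt(8.0 / N * log_term)

# Assumed illustrative values, not from the slides.
N, d_vc, delta = 1000, 3, 0.1
e_in = 0.05  # hypothetical in-sample error
pen = vc_penalty(N, d_vc, delta)
print(f"Omega(N, H, delta) = {pen:.3f}")
print(f"with prob >= {1 - delta:.1f}: "
      f"{e_in - pen:.3f} <= E_out(g) <= {e_in + pen:.3f}")
```

Even at N = 1000 the penalty comes out around 0.46, already hinting at the looseness discussed later in this section.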

The VC Dimension: Interpreting VC Dimension

THE VC Message

With high probability,

\[
E_{\text{out}}(g) \;\le\; E_{\text{in}}(g) +
\underbrace{\sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}}}_{\Omega(N,\mathcal{H},\delta)}
\]

[Figure: in-sample error, model complexity, and out-of-sample error plotted against the VC dimension d_VC; E_in decreases as d_VC grows, Ω increases, and the out-of-sample error curve is U-shaped with its minimum at a moderate d_VC.]

- d_VC ↑: E_in ↓ but Ω ↑
- d_VC ↓: Ω ↓ but E_in ↑
- best d_VC: in the middle

A powerful H is not always good!
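The rising half of this trade-off can be tabulated directly (a small sketch reusing the `vc_penalty` helper from above; N = 10,000 and δ = 0.05 are assumed values, and only Ω is shown, since E_in depends on the actual data and model):

```python
import math

def vc_penalty(N: int, d_vc: int, delta: float) -> float:
    # Omega(N, H, delta), computed in log space to avoid overflow.
    log_term = math.log(4.0 / delta) + d_vc * math.log(2.0 * N)
    return math.sqrt(8.0 / N * log_term)

N, delta = 10_000, 0.05  # assumed illustrative values
for d_vc in (1, 3, 10, 30, 100):
    print(f"d_VC = {d_vc:3d}: Omega = {vc_penalty(N, d_vc, delta):.3f}")
```

At fixed N, Ω grows roughly like the square root of d_VC; the falling half of the trade-off, the drop in E_in, comes from the richer hypothesis set fitting the data better.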

The VC Dimension: Interpreting VC Dimension

VC Bound Rephrase: Sample Complexity

For any g = A(D) ∈ H and 'statistically' large D, for N ≥ 2 and d_VC ≥ 2:

\[
\underbrace{\mathbb{P}_{\mathcal{D}}\Big[\,\big|E_{\text{in}}(g)-E_{\text{out}}(g)\big| > \epsilon\,\Big]}_{\text{BAD}}
\;\overset{\text{if }k\text{ exists}}{\le}\;
\underbrace{4(2N)^{d_{\text{VC}}}\exp\!\left(-\tfrac{1}{8}\epsilon^{2}N\right)}_{\delta}
\]

Given specs ε = 0.1, δ = 0.1, d_VC = 3, we want

\[
4(2N)^{d_{\text{VC}}}\exp\!\left(-\tfrac{1}{8}\epsilon^{2}N\right) \le \delta.
\]

    N         bound
    100       2.82 × 10^7
    1,000     9.17 × 10^9
    10,000    1.19 × 10^8
    100,000   1.65 × 10^{−38}
    29,300    9.99 × 10^{−2}

Sample complexity: need N ≈ 10,000 · d_VC in theory; practical rule of thumb: N ≈ 10 · d_VC is often enough!
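The table above can be reproduced, and the crossover point located, by evaluating the bound directly (a sketch: the bound formula is the one from the slide, while the function name, the log-space evaluation, and the step size of the search loop are my own choices):

```python
import math

def vc_bound(N: int, eps: float, d_vc: int) -> float:
    """4 * (2N)^{d_VC} * exp(-(1/8) * eps^2 * N), evaluated in log space."""
    log_b = math.log(4.0) + d_vc * math.log(2.0 * N) - eps * eps * N / 8.0
    return math.exp(log_b)

eps, delta, d_vc = 0.1, 0.1, 3  # the specs given on the slide
for N in (100, 1_000, 10_000, 100_000, 29_300):
    print(f"N = {N:>7,}: bound = {vc_bound(N, eps, d_vc):.2e}")

# Smallest N (to the nearest 100) with bound <= delta.
N = 100
while vc_bound(N, eps, d_vc) > delta:
    N += 100
print(f"bound first drops below {delta} near N = {N:,}")
```

Running this reproduces the table entries and the search stops near N = 29,300, i.e., roughly 10,000 · d_VC.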

The VC Dimension: Interpreting VC Dimension

Looseness of VC Bound

\[
\mathbb{P}_{\mathcal{D}}\Big[\,\big|E_{\text{in}}(g)-E_{\text{out}}(g)\big| > \epsilon\,\Big]
\;\overset{\text{if }k\text{ exists}}{\le}\;
4(2N)^{d_{\text{VC}}}\exp\!\left(-\tfrac{1}{8}\epsilon^{2}N\right)
\]

Theory says N ≈ 10,000 · d_VC; practice says N ≈ 10 · d_VC. Why is the bound so loose?

- Hoeffding for unknown E_out: holds for any distribution and any target
- m_H(N) instead of |H(x_1, ..., x_N)|: holds for 'any' data
- N^{d_VC} instead of m_H(N): holds for 'any' H of the same d_VC
- union bound on worst cases: holds for any choice made by A

But it is hard to do much better, and the bound is 'similarly loose for all models'; the philosophical message of the VC bound is what matters for improving ML.
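One of these slack sources is concrete enough to quantify: replacing the growth function by the polynomial N^{d_VC}. A quick comparison against the Sauer-Shelah bound B(N, k) = Σ_{i=0}^{d_VC} C(N, i) (a sketch; d_VC = 3 and the N values are assumed picks for illustration) shows the gap:

```python
import math

def sauer_bound(N: int, d_vc: int) -> int:
    # Sauer-Shelah bound on the growth function: sum_{i=0}^{d_VC} C(N, i).
    return sum(math.comb(N, i) for i in range(d_vc + 1))

d_vc = 3
for N in (10, 100, 1_000, 10_000):
    poly = float(N) ** d_vc
    print(f"N = {N:>6}: B(N, k) = {sauer_bound(N, d_vc):.3e}, "
          f"N^d_VC = {poly:.3e}")
```

For large N this particular step only costs about a factor of d_VC! (here 3! = 6), suggesting most of the looseness comes from the other, harder-to-quantify sources in the list.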

The VC Dimension: Interpreting VC Dimension

Fun Time

Consider the VC bound below. How can we decrease the probability of getting BAD data?

\[
\mathbb{P}_{\mathcal{D}}\Big[\,\big|E_{\text{in}}(g)-E_{\text{out}}(g)\big| > \epsilon\,\Big]
\;\overset{\text{if }k\text{ exists}}{\le}\;
4(2N)^{d_{\text{VC}}}\exp\!\left(-\tfrac{1}{8}\epsilon^{2}N\right)
\]

1. decrease model complexity d_VC
2. increase data size N a lot
3. increase generalization error tolerance ε
4. all of the above

Reference Answer: 4
