The VC Dimension: Interpreting VC Dimension

VC Bound Rephrase: Penalty for Model Complexity

For any g = A(D) ∈ H and 'statistically' large D, for N ≥ 2 and d_VC ≥ 2,

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)
            (the BAD event)                  (call the right-hand side δ)

Rephrase: with probability ≥ 1 − δ, GOOD: |E_in(g) − E_out(g)| ≤ ε. Setting

    δ = 4(2N)^{d_VC} exp(−(1/8) ε² N)

and solving for ε:

    δ / (4(2N)^{d_VC}) = exp(−(1/8) ε² N)
    ln( 4(2N)^{d_VC} / δ ) = (1/8) ε² N
    ε = √( (8/N) ln( 4(2N)^{d_VC} / δ ) )

So, with probability ≥ 1 − δ, GOOD! The generalization error satisfies

    |E_in(g) − E_out(g)| ≤ √( (8/N) ln( 4(2N)^{d_VC} / δ ) ),

that is,

    E_in(g) − √( (8/N) ln( 4(2N)^{d_VC} / δ ) )  ≤  E_out(g)  ≤  E_in(g) + √( (8/N) ln( 4(2N)^{d_VC} / δ ) )

The square-root term is Ω(N, H, δ): the penalty for model complexity.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 21/26
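The inversion above can be checked numerically. A minimal sketch in Python (the helper names `omega` and `vc_delta` are illustrative, not from the lecture); plugging ε = Ω(N, H, δ) back into the right-hand side of the bound recovers δ exactly:

```python
import math

def omega(N, d_vc, delta):
    """Penalty Omega(N, H, delta) = sqrt((8/N) ln(4 (2N)^{d_vc} / delta)).

    Evaluated in log space to avoid overflow for large d_vc or N.
    """
    log_term = math.log(4.0 / delta) + d_vc * math.log(2.0 * N)
    return math.sqrt(8.0 / N * log_term)

def vc_delta(N, d_vc, eps):
    """Right-hand side of the VC bound: 4 (2N)^{d_vc} exp(-(1/8) eps^2 N)."""
    log_delta = math.log(4.0) + d_vc * math.log(2.0 * N) - eps ** 2 * N / 8.0
    return math.exp(log_delta)

# Round trip: delta -> eps -> delta
N, d_vc, delta = 10_000, 3, 0.1
eps = omega(N, d_vc, delta)
assert abs(vc_delta(N, d_vc, eps) - delta) < 1e-9
```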
The VC Dimension: Interpreting VC Dimension

THE VC Message

With a high probability,

    E_out(g)  ≤  E_in(g) + √( (8/N) ln( 4(2N)^{d_VC} / δ ) )
              =  E_in(g) + Ω(N, H, δ)

that is, out-of-sample error ≤ in-sample error + model complexity.

[figure: Error versus VC dimension d_VC — in-sample error E_in decreases, model complexity Ω increases, and out-of-sample error E_out is minimized at some d_VC* in between]

• d_VC ↑: E_in ↓ but Ω ↑
• d_VC ↓: Ω ↓ but E_in ↑
• best d_VC* in the middle

powerful H not always good!

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 22/26
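The "Ω ↑ with d_VC" half of the tradeoff is easy to see numerically: at fixed N and δ, the penalty term grows with the VC dimension. A small illustration (the helper name `omega` is illustrative, not from the lecture; the E_in half of the tradeoff depends on the data, so only the penalty is shown):

```python
import math

def omega(N, d_vc, delta):
    # Omega(N, H, delta) = sqrt((8/N) ln(4 (2N)^{d_vc} / delta)), in log space
    return math.sqrt(8.0 / N * (math.log(4.0 / delta) + d_vc * math.log(2.0 * N)))

N, delta = 10_000, 0.1
penalties = [omega(N, d, delta) for d in range(1, 11)]
# A more powerful H (larger d_vc) pays a strictly larger complexity penalty.
assert all(a < b for a, b in zip(penalties, penalties[1:]))
```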
The VC Dimension: Interpreting VC Dimension

VC Bound Rephrase: Sample Complexity

For any g = A(D) ∈ H and 'statistically' large D, for N ≥ 2 and d_VC ≥ 2,

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)
            (the BAD event)                  (call the right-hand side δ)

Given specs ε = 0.1, δ = 0.1, d_VC = 3, we want 4(2N)^{d_VC} exp(−(1/8) ε² N) ≤ δ:

    N          bound
    100        2.82 × 10^7
    1,000      9.17 × 10^9
    10,000     1.19 × 10^8
    100,000    1.65 × 10^{−38}
    29,300     9.99 × 10^{−2}

sample complexity: need N ≈ 10,000 d_VC in theory

practical rule of thumb: N ≈ 10 d_VC often enough!

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 23/26
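The table can be reproduced by evaluating the bound in log space (a direct evaluation of (2N)^{d_VC} for large N would be awkward). A sketch, with the helper name `log_vc_bound` being illustrative:

```python
import math

def log_vc_bound(N, eps, d_vc):
    # ln[ 4 (2N)^{d_vc} exp(-(1/8) eps^2 N) ]
    return math.log(4.0) + d_vc * math.log(2.0 * N) - eps ** 2 * N / 8.0

eps, delta, d_vc = 0.1, 0.1, 3
for N in (100, 1_000, 10_000, 100_000, 29_300):
    print(f"N = {N:>7,}: bound = {math.exp(log_vc_bound(N, eps, d_vc)):.3g}")

# The bound first drops below delta = 0.1 near N = 29,300,
# i.e. roughly 10,000 * d_vc samples in theory.
assert math.exp(log_vc_bound(29_300, eps, d_vc)) <= delta
assert math.exp(log_vc_bound(29_000, eps, d_vc)) > delta
```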
The VC Dimension: Interpreting VC Dimension

Looseness of VC Bound

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)

theory: N ≈ 10,000 d_VC; practice: N ≈ 10 d_VC. Why so loose? Each step of the derivation trades slack for generality:

• Hoeffding for unknown E_out: works for any distribution, any target
• m_H(N) instead of |H(x_1, . . . , x_N)|: works for 'any' data
• N^{d_VC} instead of m_H(N): works for 'any' H of the same d_VC
• union bound on worst cases: works for any choice made by A

but hardly better, and 'similarly loose for all models'

philosophical message of VC bound: important for improving ML

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 24/26
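The looseness is easy to see numerically: at the practical sample size N = 10 d_VC, the bound is astronomically larger than 1 and so guarantees nothing, yet learning often succeeds there. A small check, assuming the same ε = 0.1, δ = 0.1, d_VC = 3 specs as on the previous slide (the helper name `vc_bound` is illustrative):

```python
import math

def vc_bound(N, eps, d_vc):
    # 4 (2N)^{d_vc} exp(-(1/8) eps^2 N), evaluated in log space
    return math.exp(math.log(4.0) + d_vc * math.log(2.0 * N) - eps ** 2 * N / 8.0)

eps, d_vc = 0.1, 3
# At the practical rule of thumb N = 10 * d_vc, the "probability" bound is huge,
# hence vacuous (any probability is at most 1):
assert vc_bound(10 * d_vc, eps, d_vc) > 1e5
# Only near N = 10,000 * d_vc does it become a meaningful guarantee:
assert vc_bound(10_000 * d_vc, eps, d_vc) < 0.1
```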
The VC Dimension: Interpreting VC Dimension

Fun Time

Consider the VC Bound below. How can we decrease the probability of getting BAD data?

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)