Training versus Testing: Effective Number of Lines

Effective Number of Lines

maximum kinds of lines with respect to N inputs x_1, x_2, ..., x_N
⇐⇒ effective number of lines

must be ≤ 2^N (why?)

finite ‘grouping’ of infinitely-many lines ∈ H

wish: P[ |E_in(g) − E_out(g)| > ε ] ≤ 2 · effective(N) · exp(−2ε²N)

lines in 2D:

    N    effective(N)
    1    2
    2    4
    3    8
    4    14    (< 2^N)

if (1) effective(N) can replace M, and (2) effective(N) ≪ 2^N,
then learning is possible even with infinitely many lines :-)
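To make effective(N) concrete, here is a minimal brute-force sketch (an illustration, not part of the lecture): for a small 2D point set it counts how many of the 2^N labelings some line can realize, checking the linear separability of each labeling with a tiny feasibility LP. The point coordinates and the use of numpy/scipy are assumptions made only for this example.

```python
# Brute-force count of the effective number of lines on a small 2D point set.
# A labeling is achievable by some line iff the +1 and -1 points are linearly
# separable; we check that with a small feasibility LP (margin-1 separation).
from itertools import product
import numpy as np
from scipy.optimize import linprog

def separable(points, labels):
    """True if some line gives sign(w . x + b) = label on every point."""
    # y_i * (w . x_i + b) >= 1  <=>  [-y*x1, -y*x2, -y] . [w1, w2, b] <= -1
    A = np.array([[-y * x[0], -y * x[1], -y] for x, y in zip(points, labels)])
    b = -np.ones(len(points))
    res = linprog(c=[0.0, 0.0, 0.0], A_ub=A, b_ub=b, bounds=[(None, None)] * 3)
    return res.success

def effective(points):
    """Number of distinct dichotomies that lines can produce on these points."""
    return sum(separable(points, labels)
               for labels in product((-1, +1), repeat=len(points)))

# four inputs in convex position: 14 of the 16 labelings are achievable
pts4 = [(0.0, 0.0), (1.0, 0.1), (0.2, 1.0), (1.1, 1.2)]
print(effective(pts4))   # 14
```

Running it on four points in convex position reproduces the table entry effective(4) = 14; the two missing labelings are the XOR-like diagonal splits.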

Fun Time

What is the effective number of lines for five inputs ∈ R²?

1. 14
2. 16
3. 22
4. 32

Reference Answer: 3

If you put the five inputs roughly around a circle, you can pick any run of consecutive inputs to lie on one side of a line and the remaining inputs on the other side. This procedure leads to effectively 22 kinds of lines, which is much smaller than 2^5 = 32. You will find it difficult to generate more kinds by varying the inputs; a formal proof comes in future lectures.

(figure: five inputs x_1, ..., x_5 placed around a circle)
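As a quick sanity check (again only an illustration, reusing the effective() helper from the sketch above, which is not part of the lecture), five points spaced evenly on a circle indeed give 22:

```python
import numpy as np

# five inputs evenly spaced on a circle; effective() comes from the earlier sketch
pts5 = [(float(np.cos(t)), float(np.sin(t)))
        for t in np.linspace(0.0, 2 * np.pi, 6)[:-1]]
print(effective(pts5))   # 22
```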


Training versus Testing: Effective Number of Hypotheses

Dichotomies: Mini-hypotheses

H = {hypothesis h : X → {×, ◦}}

call h(x_1, x_2, ..., x_N) = (h(x_1), h(x_2), ..., h(x_N)) ∈ {×, ◦}^N a dichotomy: a hypothesis ‘limited’ to the eyes of x_1, x_2, ..., x_N

H(x_1, x_2, ..., x_N): all dichotomies ‘implemented’ by H on x_1, x_2, ..., x_N

             hypotheses H           dichotomies H(x_1, x_2, ..., x_N)
    e.g.     all lines in R²        {◦◦◦◦, ◦◦◦×, ◦◦××, ...}
    size     possibly infinite      upper bounded by 2^N

|H(x_1, x_2, ..., x_N)|: candidate for replacing M
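A small sketch of this ‘infinitely many hypotheses, finitely many dichotomies’ picture (illustration only; the four input points and the random-sampling approach are assumptions of the sketch, not the lecture's): sample many random lines and record the pattern each produces on the fixed inputs. The set of distinct patterns approximates H(x_1, ..., x_N) from below, and it can never exceed 2^N.

```python
# Sample many lines h(x) = sign(w . x + b) and collect the dichotomies they
# implement on four fixed inputs.  A Monte Carlo sketch: it may undercount
# H(x_1, ..., x_N), but it shows infinitely many lines collapsing into at
# most 2^N distinct patterns.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [1.0, 0.1], [0.2, 1.0], [1.1, 1.2]])

dichotomies = set()
for _ in range(100_000):
    w = rng.normal(size=2)
    b = rng.normal()
    dichotomies.add(tuple(np.sign(X @ w + b).astype(int)))

print(len(dichotomies))   # at most 2^4 = 16; typically 14 for these inputs
```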


Growth Function

|H(x_1, x_2, ..., x_N)| depends on the inputs (x_1, x_2, ..., x_N)

growth function: remove the dependence by taking the max over all possible (x_1, x_2, ..., x_N):

m_H(N) = max_{x_1, x_2, ..., x_N ∈ X} |H(x_1, x_2, ..., x_N)|

finite, upper bounded by 2^N

lines in 2D:

    N    m_H(N)
    1    2
    2    4
    3    max(..., 6, 8) = 8
    4    14    (< 2^N)

how to ‘calculate’ the growth function?
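One crude numerical answer (a sketch only; the lecture instead counts m_H(N) exactly for simple hypothesis sets): reuse the effective() helper from the earlier sketch and maximize over many random input placements. This only lower-bounds m_H(N), since the definition maximizes over all placements, but it recovers the 2, 4, 8, 14 entries for lines in 2D. The helper name, the number of trials, and the sampling range are assumptions of the sketch.

```python
# Estimate the growth function for lines in 2D by maximizing the dichotomy
# count over random input placements (a lower bound on m_H(N)).
# Reuses effective() from the earlier LP-based sketch.
import numpy as np

rng = np.random.default_rng(1)

def growth_estimate(N, trials=200):
    best = 0
    for _ in range(trials):
        pts = [tuple(p) for p in rng.uniform(size=(N, 2))]   # N random inputs
        best = max(best, effective(pts))
    return best

for N in range(1, 5):
    print(N, growth_estimate(N))   # expect 2, 4, 8, 14
```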


Growth Function for Positive Rays

(figure: inputs x_1, x_2, x_3, ..., x_N on the real line; a threshold a with h(x) = −1 to its left and h(x) = +1 to its right)

X = R (one dimensional)

H contains h, where each h(x) = sign(x − a) for some threshold a

‘positive half’ of 1D perceptrons

one dichotomy for a in each spot (x_n, x_{n+1}):

m_H(N) = N + 1

(N + 1) ≪ 2^N when N is large!

    x_1  x_2  x_3  x_4
     ◦    ◦    ◦    ◦
     ×    ◦    ◦    ◦
     ×    ×    ◦    ◦
     ×    ×    ×    ◦
     ×    ×    ×    ×
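The N + 1 count can be checked by direct enumeration (a small sketch with made-up inputs, not from the lecture): put one candidate threshold in each of the N + 1 gaps around the sorted inputs and collect the resulting sign patterns.

```python
# Growth function for positive rays h(x) = sign(x - a), checked by enumeration:
# one threshold per gap around the sorted inputs gives exactly N + 1 dichotomies.
def positive_ray_dichotomies(xs):
    xs = sorted(xs)
    # one candidate threshold below all inputs, one in each gap, one above all
    thresholds = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    return {tuple(+1 if x > a else -1 for x in xs) for a in thresholds}

xs = [0.3, 1.2, 2.7, 4.1]
print(len(positive_ray_dichotomies(xs)))   # N + 1 = 5
```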


Growth Function for Positive Intervals

(figure: inputs x_1, x_2, x_3, ..., x_N on the real line; h(x) = +1 inside an interval and h(x) = −1 outside it)

X = R (one dimensional)

H contains h, where each h(x) = +1 iff x ∈ [ℓ, r), −1 otherwise

one dichotomy for each ‘interval kind’:

m_H(N) = (N + 1 choose 2) + 1 = ½N² + ½N + 1

(the first term: the two interval ends fall in two of the N + 1 spots; the +1: the all-× dichotomy)

½N² + ½N + 1 ≪ 2^N when N is large!

    x_1  x_2  x_3  x_4
     ◦    ×    ×    ×
     ◦    ◦    ×    ×
     ◦    ◦    ◦    ×
     ◦    ◦    ◦    ◦
     ×    ◦    ×    ×
     ×    ◦    ◦    ×
     ×    ◦    ◦    ◦
     ×    ×    ◦    ×
     ×    ×    ◦    ◦
     ×    ×    ×    ◦
     ×    ×    ×    ×
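Again checkable by enumeration (a small sketch with made-up inputs, not from the lecture): choose the two interval ends among the N + 1 gaps, add the empty interval, and count the distinct patterns.

```python
# Growth function for positive intervals h(x) = +1 iff x in [l, r), checked by
# enumeration: two interval ends among the N + 1 gaps give C(N+1, 2) dichotomies,
# plus 1 for the all-x labeling.
from itertools import combinations

def positive_interval_dichotomies(xs):
    xs = sorted(xs)
    gaps = ([xs[0] - 1.0]
            + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
            + [xs[-1] + 1.0])
    dichos = {tuple(-1 for _ in xs)}                     # interval covering no input
    for left, right in combinations(gaps, 2):
        dichos.add(tuple(+1 if left <= x < right else -1 for x in xs))
    return dichos

xs = [0.3, 1.2, 2.7, 4.1]
N = len(xs)
print(len(positive_interval_dichotomies(xs)), N * (N + 1) // 2 + 1)   # both 11
```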


Growth Function for Convex Sets (1/2)
