Training versus Testing — Effective Number of Lines

Effective Number of Lines

maximum kinds of lines with respect to N inputs x_1, x_2, ..., x_N
⟺ effective number of lines

• must be ≤ 2^N (why?)
• a finite ‘grouping’ of the infinitely many lines in H
• wish: P[ |E_in(g) − E_out(g)| > ε ] ≤ 2 · effective(N) · exp(−2ε²N)

lines in 2D:

N    effective(N)
1    2
2    4
3    8
4    14    (< 2^N)

if ① effective(N) can replace M, and ② effective(N) ≪ 2^N
⟹
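The values in the table can be checked by brute-force enumeration. The sketch below is not part of the lecture, and `effective_lines` is an illustrative helper name; it relies on the standard observation that, for points in general position, any dichotomy realizable by some line is also realizable by a line pushed up against two of the points, so enumerating lines through point pairs suffices.

```python
from itertools import combinations

def effective_lines(points):
    """Count the dichotomies on 2D `points` (general position: no three
    collinear) that a line can implement.  Any realizable dichotomy can be
    realized by a line pushed against two of the points, so enumerating
    lines through point pairs suffices."""
    n = len(points)
    dichos = {tuple([+1] * n), tuple([-1] * n)}   # a line beyond all points
    for i, j in combinations(range(n), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        nx, ny = y2 - y1, x1 - x2                 # normal of the line through i, j
        lab = [1 if nx * (x - x1) + ny * (y - y1) > 0 else -1
               for (x, y) in points]
        for si in (+1, -1):                       # the two touched points can be
            for sj in (+1, -1):                   # nudged to either side
                lab[i], lab[j] = si, sj
                dichos.add(tuple(lab))
                dichos.add(tuple(-v for v in lab))  # flip line orientation
    return len(dichos)

# matches the table: effective(3) = 8, effective(4) = 14 (< 2^4 = 16)
assert effective_lines([(0, 0), (1, 0), (0, 1)]) == 8
assert effective_lines([(0, 0), (1, 0), (1, 1), (0, 1)]) == 14
```

For four points the two missing dichotomies are the XOR-like labelings, which no single line can implement.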
learning possible with infinite lines :-)
Training versus Testing — Effective Number of Lines

Fun Time

What is the effective number of lines for five inputs ∈ R²?
① 14
② 16
③ 22
④ 32

Reference Answer: ③

If you put the five inputs roughly around a circle, you can pick any run of consecutive inputs to lie on one side of a line and the remaining inputs on the other side. This procedure yields effectively 22 kinds of lines, which is much smaller than 2^5 = 32. You will find it difficult to generate more kinds by varying the inputs; a formal proof comes in future lectures.

(figure: five inputs x_1, x_2, x_3, x_4, x_5 placed around a circle)
Training versus Testing — Effective Number of Hypotheses

Dichotomies: Mini-hypotheses

H = {hypothesis h : X → {×, ◦}}

• call h(x_1, x_2, ..., x_N) = (h(x_1), h(x_2), ..., h(x_N)) ∈ {×, ◦}^N
  a dichotomy: a hypothesis ‘limited’ to the eyes of x_1, x_2, ..., x_N
• H(x_1, x_2, ..., x_N): all dichotomies ‘implemented’ by H on x_1, x_2, ..., x_N

           hypotheses H         dichotomies H(x_1, x_2, ..., x_N)
e.g.       all lines in R²      {◦◦◦◦, ◦◦◦×, ◦◦××, ...}
size       possibly infinite    upper bounded by 2^N

|H(x_1, x_2, ..., x_N)|: a candidate for replacing M
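To make the hypothesis-versus-dichotomy distinction concrete, here is a tiny sketch (not from the lecture) using 1D thresholds h(x) = sign(x − a) as a simpler H than 2D lines: infinitely many hypotheses collapse into finitely many dichotomies once viewed only through a fixed sample.

```python
# Infinitely many hypotheses, finitely many dichotomies (1D thresholds).
xs = [1.0, 2.0, 3.0, 4.0]          # a fixed sample x_1, ..., x_4

def dichotomy(a):
    """View h(x) = sign(x - a) only through the eyes of the sample xs."""
    return tuple('o' if x > a else 'x' for x in xs)

# distinct hypotheses (different thresholds a) implement the same dichotomy ...
assert dichotomy(2.3) == dichotomy(2.7) == ('x', 'x', 'o', 'o')

# ... and the whole continuum of thresholds yields only 5 distinct dichotomies
all_dichos = {dichotomy(a) for a in [0.5, 1.5, 1.7, 2.5, 3.5, 4.5]}
assert len(all_dichos) == 5
```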
Training versus Testing — Effective Number of Hypotheses

Growth Function

• |H(x_1, x_2, ..., x_N)|: depends on the inputs (x_1, x_2, ..., x_N)
• growth function: remove the dependence by taking the max over all possible (x_1, x_2, ..., x_N):

  m_H(N) = max over x_1, x_2, ..., x_N ∈ X of |H(x_1, x_2, ..., x_N)|

• finite, upper-bounded by 2^N

lines in 2D:

N    m_H(N)
1    2
2    4
3    max(..., 6, 8) = 8
4    14    (< 2^N)

how to ‘calculate’ the growth function?
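The max over inputs matters because |H(x_1, ..., x_N)| shrinks for degenerate placements. A sketch (illustrative, not from the lecture) using 1D thresholds h(x) = sign(x − a), where duplicated inputs reduce the count below the max:

```python
def n_dichotomies(xs):
    """|H(x_1, ..., x_N)| for h(x) = sign(x - a): one threshold per 'spot'
    (below all points, in each gap between sorted points, above all)."""
    xs = sorted(xs)
    cuts = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]
    return len({tuple(+1 if x > a else -1 for x in xs) for a in cuts})

# the count depends on where the inputs sit ...
assert n_dichotomies([1.0, 1.0, 2.0]) == 3   # duplicated inputs: fewer dichotomies
# ... and the growth function takes the max over all placements
assert n_dichotomies([1.0, 2.0, 3.0]) == 4   # distinct inputs: N + 1 = 4
```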
Training versus Testing — Effective Number of Hypotheses

Growth Function for Positive Rays

(figure: inputs x_1, x_2, x_3, ..., x_N on a line; h(x) = −1 to the left of threshold a, h(x) = +1 to the right)

• X = R (one dimensional)
• H contains h, where each h(x) = sign(x − a) for some threshold a
• the ‘positive half’ of 1D perceptrons

one dichotomy for a in each spot (x_n, x_{n+1}): m_H(N) = N + 1

(N + 1) ≪ 2^N when N large!

e.g. N = 4:
x_1  x_2  x_3  x_4
◦    ◦    ◦    ◦
×    ◦    ◦    ◦
×    ×    ◦    ◦
×    ×    ×    ◦
×    ×    ×    ×
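The count m_H(N) = N + 1 can be verified by enumerating one representative threshold per spot; a minimal sketch with an illustrative helper name:

```python
def positive_ray_dichotomies(xs):
    """All dichotomies of h(x) = sign(x - a) on distinct 1D inputs xs."""
    xs = sorted(xs)
    # one representative threshold below all points, in each gap, above all
    cuts = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple('o' if x > a else 'x' for x in xs) for a in cuts}

# m_H(N) = N + 1, far below 2^N for large N
for N in range(1, 8):
    assert len(positive_ray_dichotomies(list(range(N)))) == N + 1
```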
Training versus Testing — Effective Number of Hypotheses

Growth Function for Positive Intervals

(figure: inputs x_1, x_2, x_3, ..., x_N on a line; h(x) = −1, then +1 inside the interval, then −1)

• X = R (one dimensional)
• H contains h, where each h(x) = +1 iff x ∈ [ℓ, r), −1 otherwise

one dichotomy for each ‘interval kind’:

m_H(N) = C(N+1, 2) + 1
         (interval ends fall in 2 of the N + 1 spots; plus 1 for the all-× dichotomy)
       = ½N² + ½N + 1

½N² + ½N + 1 ≪ 2^N when N large!

e.g. N = 4:
x_1  x_2  x_3  x_4
◦    ×    ×    ×
◦    ◦    ×    ×
◦    ◦    ◦    ×
◦    ◦    ◦    ◦
×    ◦    ×    ×
×    ◦    ◦    ×
×    ◦    ◦    ◦
×    ×    ◦    ×
×    ×    ◦    ◦
×    ×    ×    ◦
×    ×    ×    ×
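Likewise, the interval count C(N+1, 2) + 1 = ½N² + ½N + 1 can be checked by enumerating both interval ends over the N + 1 spots (a sketch with an illustrative helper name):

```python
from itertools import combinations

def positive_interval_dichotomies(xs):
    """All dichotomies of h(x) = +1 iff x in [l, r) on distinct 1D inputs xs."""
    xs = sorted(xs)
    # the N + 1 'spots' where an interval end can fall
    spots = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]
    dichos = {tuple([-1] * len(xs))}             # empty interval: all ×
    for l, r in combinations(spots, 2):          # choose 2 of the N + 1 spots
        dichos.add(tuple(+1 if l <= x < r else -1 for x in xs))
    return dichos

# m_H(N) = C(N+1, 2) + 1 = N(N+1)/2 + 1; e.g. 11 dichotomies for N = 4
for N in range(1, 8):
    assert len(positive_interval_dichotomies(list(range(N)))) == N * (N + 1) // 2 + 1
```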
Training versus Testing Effective Number of Hypotheses
Growth Function for Convex Sets (1/2)