The VC Dimension: Interpreting VC Dimension

VC Bound Rephrase: Penalty for Model Complexity

For any g = A(D) ∈ H and 'statistically' large D, for N ≥ 2 and d_VC ≥ 2,

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)
            (the BAD event)                  (call the right-hand side δ)

Rephrase: with probability ≥ 1 − δ, GOOD: |E_in(g) − E_out(g)| ≤ ε. Setting

    δ = 4(2N)^{d_VC} exp(−(1/8) ε² N)

and solving for ε:

    δ / (4(2N)^{d_VC}) = exp(−(1/8) ε² N)
    ln( 4(2N)^{d_VC} / δ ) = (1/8) ε² N
    ε = √( (8/N) ln( 4(2N)^{d_VC} / δ ) )

So, with probability ≥ 1 − δ, GOOD! The generalization error satisfies

    |E_in(g) − E_out(g)| ≤ √( (8/N) ln( 4(2N)^{d_VC} / δ ) ),

that is,

    E_in(g) − √( (8/N) ln( 4(2N)^{d_VC} / δ ) )  ≤  E_out(g)  ≤  E_in(g) + √( (8/N) ln( 4(2N)^{d_VC} / δ ) )

The square-root term is Ω(N, H, δ): the penalty for model complexity.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 21/26
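The inversion above can be checked numerically. A minimal sketch in Python (the helper names `omega` and `vc_delta` are illustrative, not from the lecture); plugging ε = Ω(N, H, δ) back into the right-hand side of the bound recovers δ exactly:

```python
import math

def omega(N, d_vc, delta):
    """Penalty Omega(N, H, delta) = sqrt((8/N) ln(4 (2N)^{d_vc} / delta)).

    Evaluated in log space to avoid overflow for large d_vc or N.
    """
    log_term = math.log(4.0 / delta) + d_vc * math.log(2.0 * N)
    return math.sqrt(8.0 / N * log_term)

def vc_delta(N, d_vc, eps):
    """Right-hand side of the VC bound: 4 (2N)^{d_vc} exp(-(1/8) eps^2 N)."""
    log_delta = math.log(4.0) + d_vc * math.log(2.0 * N) - eps ** 2 * N / 8.0
    return math.exp(log_delta)

# Round trip: delta -> eps -> delta
N, d_vc, delta = 10_000, 3, 0.1
eps = omega(N, d_vc, delta)
assert abs(vc_delta(N, d_vc, eps) - delta) < 1e-9
```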
The VC Dimension: Interpreting VC Dimension

THE VC Message

With a high probability,

    E_out(g)  ≤  E_in(g) + √( (8/N) ln( 4(2N)^{d_VC} / δ ) )
              =  E_in(g) + Ω(N, H, δ)

that is, out-of-sample error ≤ in-sample error + model complexity.

[figure: Error versus VC dimension d_VC — in-sample error E_in decreases, model complexity Ω increases, and out-of-sample error E_out is minimized at some d_VC* in between]

• d_VC ↑: E_in ↓ but Ω ↑
• d_VC ↓: Ω ↓ but E_in ↑
• best d_VC* in the middle

powerful H not always good!

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 22/26
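The "Ω ↑ with d_VC" half of the tradeoff is easy to see numerically: at fixed N and δ, the penalty term grows with the VC dimension. A small illustration (the helper name `omega` is illustrative, not from the lecture; the E_in half of the tradeoff depends on the data, so only the penalty is shown):

```python
import math

def omega(N, d_vc, delta):
    # Omega(N, H, delta) = sqrt((8/N) ln(4 (2N)^{d_vc} / delta)), in log space
    return math.sqrt(8.0 / N * (math.log(4.0 / delta) + d_vc * math.log(2.0 * N)))

N, delta = 10_000, 0.1
penalties = [omega(N, d, delta) for d in range(1, 11)]
# A more powerful H (larger d_vc) pays a strictly larger complexity penalty.
assert all(a < b for a, b in zip(penalties, penalties[1:]))
```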
The VC Dimension: Interpreting VC Dimension

VC Bound Rephrase: Sample Complexity

For any g = A(D) ∈ H and 'statistically' large D, for N ≥ 2 and d_VC ≥ 2,

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)
            (the BAD event)                  (call the right-hand side δ)

Given specs ε = 0.1, δ = 0.1, d_VC = 3, we want 4(2N)^{d_VC} exp(−(1/8) ε² N) ≤ δ:

    N          bound
    100        2.82 × 10^7
    1,000      9.17 × 10^9
    10,000     1.19 × 10^8
    100,000    1.65 × 10^{−38}
    29,300     9.99 × 10^{−2}

sample complexity: need N ≈ 10,000 d_VC in theory

practical rule of thumb: N ≈ 10 d_VC often enough!

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 23/26
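The table can be reproduced by evaluating the bound in log space (a direct evaluation of (2N)^{d_VC} for large N would be awkward). A sketch, with the helper name `log_vc_bound` being illustrative:

```python
import math

def log_vc_bound(N, eps, d_vc):
    # ln[ 4 (2N)^{d_vc} exp(-(1/8) eps^2 N) ]
    return math.log(4.0) + d_vc * math.log(2.0 * N) - eps ** 2 * N / 8.0

eps, delta, d_vc = 0.1, 0.1, 3
for N in (100, 1_000, 10_000, 100_000, 29_300):
    print(f"N = {N:>7,}: bound = {math.exp(log_vc_bound(N, eps, d_vc)):.3g}")

# The bound first drops below delta = 0.1 near N = 29,300,
# i.e. roughly 10,000 * d_vc samples in theory.
assert math.exp(log_vc_bound(29_300, eps, d_vc)) <= delta
assert math.exp(log_vc_bound(29_000, eps, d_vc)) > delta
```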
The VC Dimension: Interpreting VC Dimension

Looseness of VC Bound

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)

theory: N ≈ 10,000 d_VC; practice: N ≈ 10 d_VC. Why so loose? Each step of the derivation trades slack for generality:

• Hoeffding for unknown E_out: works for any distribution, any target
• m_H(N) instead of |H(x_1, . . . , x_N)|: works for 'any' data
• N^{d_VC} instead of m_H(N): works for 'any' H of the same d_VC
• union bound on worst cases: works for any choice made by A

but hardly better, and 'similarly loose for all models'

philosophical message of VC bound: important for improving ML

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 24/26
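The looseness is easy to see numerically: at the practical sample size N = 10 d_VC, the bound is astronomically larger than 1 and so guarantees nothing, yet learning often succeeds there. A small check, assuming the same ε = 0.1, δ = 0.1, d_VC = 3 specs as on the previous slide (the helper name `vc_bound` is illustrative):

```python
import math

def vc_bound(N, eps, d_vc):
    # 4 (2N)^{d_vc} exp(-(1/8) eps^2 N), evaluated in log space
    return math.exp(math.log(4.0) + d_vc * math.log(2.0 * N) - eps ** 2 * N / 8.0)

eps, d_vc = 0.1, 3
# At the practical rule of thumb N = 10 * d_vc, the "probability" bound is huge,
# hence vacuous (any probability is at most 1):
assert vc_bound(10 * d_vc, eps, d_vc) > 1e5
# Only near N = 10,000 * d_vc does it become a meaningful guarantee:
assert vc_bound(10_000 * d_vc, eps, d_vc) < 0.1
```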
The VC Dimension: Interpreting VC Dimension

Fun Time

Consider the VC Bound below. How can we decrease the probability of getting BAD data?

    P_D[ |E_in(g) − E_out(g)| > ε ]  ≤  4(2N)^{d_VC} exp(−(1/8) ε² N)    (if a break point k exists)