Feasibility of Learning Connection to Real Learning
Coin Game
[Figure: many coins, each with a top side and a bottom side]
Q: If everyone in a size-150 NTU ML class flips a coin 5 times, and one of the students gets 5 heads for her coin ‘g’, is ‘g’ really magical?
A: No. Even if all coins are fair, the probability that at least one of the coins results in 5 heads is $1 - \left(\frac{31}{32}\right)^{150} > 99\%$.
BAD sample: $E_{\text{in}}$ and $E_{\text{out}}$ far away; this can get worse when ‘choice’ is involved.
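As a quick check of the arithmetic above, the sketch below computes $1 - (31/32)^{150}$ directly and also estimates it by simulating a class of 150 students, each flipping a fair coin 5 times. The function names, the trial count and the seed are illustrative choices, not part of the original material.

```python
import random


def prob_someone_gets_all_heads(num_students: int, num_flips: int) -> float:
    """Exact probability that at least one fair coin out of `num_students`
    shows heads on all of its `num_flips` flips."""
    p_all_heads = 0.5 ** num_flips                    # 1/32 for 5 flips
    return 1.0 - (1.0 - p_all_heads) ** num_students  # complement of "nobody does"


def simulate(num_students: int, num_flips: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One run of the class: does any student get all heads?
        if any(all(rng.random() < 0.5 for _ in range(num_flips))
               for _ in range(num_students)):
            hits += 1
    return hits / trials


exact = prob_someone_gets_all_heads(150, 5)
estimate = simulate(150, 5, trials=2000)
print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

The exact value comes from the complement rule: the only way to avoid a 5-heads coin is for every one of the 150 independent coins to miss it.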
BAD Sample and BAD Data
BAD Sample
e.g., $E_{\text{out}} = \frac{1}{2}$, but getting all heads ($E_{\text{in}} = 0$)!

BAD Data for One h
$E_{\text{out}}(h)$ and $E_{\text{in}}(h)$ far away:
e.g., $E_{\text{out}}$ big (far from f), but $E_{\text{in}}$ small (correct on most examples)

   | D_1 | D_2 | ... | D_1126 | ... | D_5678 | ... | Hoeffding
h  | BAD |     |     |        |     | BAD    |     | P_D[BAD D for h] ≤ ...

Hoeffding: small
$\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D}] = \sum_{\text{all possible } \mathcal{D}} \mathbb{P}(\mathcal{D}) \cdot [\![\text{BAD } \mathcal{D}]\!]$
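For a single fixed h, the sum over all possible D can be evaluated exactly once data sets are grouped by how many mistakes h makes on them. The sketch below assumes, for illustration only, that $E_{\text{out}}(h) = \mu$ is known (in real learning it is not) and that the N examples are drawn i.i.d., so the number of mistakes follows a Binomial(N, μ) distribution. It then compares the exact $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D}]$ with the Hoeffding bound $2\exp(-2\epsilon^2 N)$; the helper names are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, mu: float) -> float:
    """P(k mistakes out of n) when each mistake happens independently with prob mu.
    Computed in log space so that large n does not overflow."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(mu) + (n - k) * math.log1p(-mu))
    return math.exp(log_p)


def prob_bad_data(mu: float, n: int, eps: float) -> float:
    """P_D[BAD D] for one fixed h with E_out(h) = mu (assumed known, 0 < mu < 1).

    Every data set D on which h makes k mistakes has E_in(h) = k / n, so summing
    P(D) * [[BAD D]] over all D reduces to summing Binomial(n, mu) masses
    over the k values with |k/n - mu| > eps.
    """
    if not 0 < mu < 1:
        raise ValueError("mu must lie strictly between 0 and 1")
    return sum(binom_pmf(k, n, mu) for k in range(n + 1) if abs(k / n - mu) > eps)


def hoeffding_bound(n: int, eps: float) -> float:
    """Hoeffding: P_D[|E_in(h) - E_out(h)| > eps] <= 2 exp(-2 eps^2 n)."""
    return 2 * math.exp(-2 * eps**2 * n)


for n in (10, 100, 1000):
    print(f"N={n:4d}: exact={prob_bad_data(0.5, n, 0.1):.3e}  "
          f"Hoeffding={hoeffding_bound(n, 0.1):.3e}")
```

The exact value is usually much smaller than the bound; the point of Hoeffding is that it holds without knowing μ.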
BAD Data for Many h
BAD data for many h
⟺ no ‘freedom of choice’ by A
⟺ there exists some h such that $E_{\text{out}}(h)$ and $E_{\text{in}}(h)$ far away

      | D_1 | D_2 | ... | D_1126 | ... | D_5678 | Hoeffding
h_1   | BAD |     |     |        |     | BAD    | P_D[BAD D for h_1] ≤ ...
h_2   |     | BAD |     |        |     |        | P_D[BAD D for h_2] ≤ ...
h_3   | BAD | BAD |     |        |     | BAD    | P_D[BAD D for h_3] ≤ ...
...   |     |     |     |        |     |        | ...
h_M   | BAD |     |     |        |     | BAD    | P_D[BAD D for h_M] ≤ ...
all   | BAD | BAD |     |        |     | BAD    | ?

For M hypotheses, what is the bound of $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D}]$?
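To see why many hypotheses make BAD data more likely, the sketch below revisits the coin game. It treats each of M hypotheses as an independent fair coin with $E_{\text{out}} = 0.5$, which is an illustrative assumption (real hypotheses evaluated on the same D are generally correlated), and estimates how often D is BAD for at least one of them, i.e. how often the ‘all’ row is BAD. The function names are hypothetical.

```python
import math
import random


def exact_prob_bad_for_some_h(m: int, n: int, eps: float) -> float:
    """1 - (1 - p)^M, where p = P[|E_in - 0.5| > eps] for one fair-coin h."""
    p_one = sum(math.comb(n, k) for k in range(n + 1) if abs(k / n - 0.5) > eps) / 2**n
    return 1 - (1 - p_one) ** m


def estimate_prob_bad_for_some_h(m: int, n: int, eps: float,
                                 trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P_D[BAD D for some h] with |H| = m."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        for _ in range(m):
            # E_in of one hypothesis: fraction of its n coin flips that are 'mistakes'
            e_in = sum(rng.random() < 0.5 for _ in range(n)) / n
            if abs(e_in - 0.5) > eps:
                bad += 1
                break  # D already counts as BAD in the 'all' row
    return bad / trials


for m in (1, 10, 100):
    print(f"M={m:3d}: exact={exact_prob_bad_for_some_h(m, 50, 0.1):.3f}  "
          f"simulated={estimate_prob_bad_for_some_h(m, 50, 0.1, trials=1000):.3f}")
```

With N = 50 and ε = 0.1, a single h has BAD data roughly 12% of the time, but with 100 hypotheses some h almost always does.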
Bound of BAD Data
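$$
\begin{aligned}
\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D}]
&= \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_1 \text{ or BAD } \mathcal{D} \text{ for } h_2 \text{ or } \dots \text{ or BAD } \mathcal{D} \text{ for } h_M] \\
&\le \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_1] + \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_2] + \dots + \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_M] \quad \text{(union bound)} \\
&\le 2\exp\left(-2\epsilon^2 N\right) + 2\exp\left(-2\epsilon^2 N\right) + \dots + 2\exp\left(-2\epsilon^2 N\right) \\
&= 2M\exp\left(-2\epsilon^2 N\right)
\end{aligned}
$$

• finite-bin version of Hoeffding, valid for all M, N and $\epsilon$
• does not depend on any $E_{\text{out}}(h_m)$, so there is no need to ‘know’ $E_{\text{out}}(h_m)$: f and P can stay unknown
• ‘$E_{\text{in}}(g) = E_{\text{out}}(g)$’ is PAC, regardless of A

‘most reasonable’ A (like PLA/pocket): pick the $h_m$ with the lowest $E_{\text{in}}(h_m)$ as g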
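Setting $2M\exp(-2\epsilon^2 N) \le \delta$ and solving for N gives $N \ge \frac{\ln(2M/\delta)}{2\epsilon^2}$, so the required sample size grows only logarithmically in M. The sketch below evaluates this; the helper names and the example values ε = 0.1 and δ = 0.05 are hypothetical.

```python
import math


def union_hoeffding_bound(m: int, n: int, eps: float) -> float:
    """2 M exp(-2 eps^2 N): upper bound on P_D[BAD D] for a finite H with |H| = M."""
    return 2 * m * math.exp(-2 * eps**2 * n)


def samples_needed(m: int, eps: float, delta: float) -> int:
    """Smallest integer N with 2 M exp(-2 eps^2 N) <= delta,
    i.e. N >= ln(2M / delta) / (2 eps^2)."""
    return math.ceil(math.log(2 * m / delta) / (2 * eps**2))


for m in (1, 100, 10_000):
    n = samples_needed(m, eps=0.1, delta=0.05)
    print(f"M={m:6d}: N={n}, bound={union_hoeffding_bound(m, n, 0.1):.4f}")
```

Going from 1 to 10,000 hypotheses raises N from 185 to 645 here, far less than a 10,000-fold increase.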
The ‘Statistical’ Learning Flow
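If $|\mathcal{H}| = M$ is finite and N is large enough, then for whatever g is picked by A, $E_{\text{out}}(g) \approx E_{\text{in}}(g)$.
If A finds one g with $E_{\text{in}}(g) \approx 0$, the PAC guarantee gives $E_{\text{out}}(g) \approx 0$ $\Longrightarrow$ learning possible :-)

Learning flow (figure):
• unknown target function $f: \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
• unknown P on $\mathcal{X}$, which generates $x_1, x_2, \dots, x_N$ as well as x
• training examples $\mathcal{D}: (x_1, y_1), \dots, (x_N, y_N)$ (historical records in bank)
• hypothesis set $\mathcal{H}$ (set of candidate formulas)
• learning algorithm A
• final hypothesis $g \approx f$ (‘learned’ formula to be used)

M = ∞? (like perceptrons): see you in the next lectures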
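A toy instance of this flow, under assumptions that are purely illustrative: $\mathcal{X} = [0, 1]$ with P uniform, a hypothetical target $f(x) = \text{sign}(x - 0.3)$, and a finite $\mathcal{H}$ of M evenly spaced threshold classifiers $h_t(x) = \text{sign}(x - t)$. As with the ‘most reasonable’ A above, the sketch picks the hypothesis with the lowest $E_{\text{in}}$. Under uniform P, $E_{\text{out}}(h_t) = |t - 0.3|$ exactly, so it can be compared with the ε obtained by solving $2M\exp(-2\epsilon^2 N) = \delta$.

```python
import math
import random


def sign(v: float) -> int:
    return 1 if v >= 0 else -1


def learn_threshold(m: int = 101, n: int = 500, delta: float = 0.05, seed: int = 1):
    """Toy run of the statistical learning flow with a finite H (|H| = m).

    Hypothetical setup: X = [0, 1], P uniform, target f(x) = sign(x - 0.3),
    H = {h_t(x) = sign(x - t)} for m evenly spaced thresholds t in [0, 1].
    """
    rng = random.Random(seed)
    target_t = 0.3
    thresholds = [i / (m - 1) for i in range(m)]
    xs = [rng.random() for _ in range(n)]        # x_1, ..., x_N drawn from P
    ys = [sign(x - target_t) for x in xs]        # y_n = f(x_n), noiseless

    def e_in(t: float) -> float:
        return sum(sign(x - t) != y for x, y in zip(xs, ys)) / n

    g_t = min(thresholds, key=e_in)              # A: lowest E_in wins (first on ties)
    e_out = abs(g_t - target_t)                  # h_t and f disagree only between t and 0.3
    eps = math.sqrt(math.log(2 * m / delta) / (2 * n))  # solves 2M exp(-2 eps^2 N) = delta
    return g_t, e_in(g_t), e_out, eps


g_t, ein, eout, eps = learn_threshold()
print(f"g: t={g_t:.2f}  E_in={ein:.3f}  E_out={eout:.3f}  eps={eps:.3f}")
```

Because the bound covers every hypothesis in $\mathcal{H}$ at once, it still applies even though A chose g after looking at the data.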
Fun Time
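Consider 4 hypotheses:
$h_1(x) = \text{sign}(x_1)$, $h_2(x) = \text{sign}(x_2)$, $h_3(x) = \text{sign}(-x_1)$, $h_4(x) = \text{sign}(-x_2)$.
For any N and $\epsilon$, which of the following statements is not true?
1. the BAD data of $h_1$ and the BAD data of $h_2$ are exactly the same
2. the BAD data of $h_1$ and the BAD data of $h_3$ are exactly the same
3. $\mathbb{P}_{\mathcal{D}}[\text{BAD for some } h_k] \le 8\exp(-2\epsilon^2 N)$
4. $\mathbb{P}_{\mathcal{D}}[\text{BAD for some } h_k] \le 4\exp(-2\epsilon^2 N)$

Reference Answer: 1
The important thing is to note that 2 is true: since $h_3 = -h_1$, we have $E_{\text{in}}(h_3) = 1 - E_{\text{in}}(h_1)$ and $E_{\text{out}}(h_3) = 1 - E_{\text{out}}(h_1)$, so $|E_{\text{in}}(h_3) - E_{\text{out}}(h_3)| = |E_{\text{in}}(h_1) - E_{\text{out}}(h_1)|$. The same holds for $h_2$ and $h_4$, so only two distinct BAD events remain, and revisiting the union bound with them shows that 4 is also true. Similar ideas will be used to conquer the $M = \infty$ case.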
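The key fact behind the answer can be checked numerically. The sketch below assumes, hypothetically, that P is uniform over a finite population of 2000 labeled 2-D points with arbitrary ±1 target labels (so $E_{\text{out}}$ can be computed exactly), and uses exact fractions to avoid rounding. It counts how often $h_1$ and $h_3$ (and, for contrast, $h_1$ and $h_2$) agree on whether a sampled D is BAD.

```python
import random
from fractions import Fraction


def sign(v: float) -> int:
    return 1 if v >= 0 else -1


# The four hypotheses from the quiz; x = (x1, x2).
H = {
    "h1": lambda x: sign(x[0]),
    "h2": lambda x: sign(x[1]),
    "h3": lambda x: sign(-x[0]),
    "h4": lambda x: sign(-x[1]),
}


def error(h, examples) -> Fraction:
    """Exact fraction of (x, y) pairs with h(x) != y."""
    return Fraction(sum(h(x) != y for x, y in examples), len(examples))


rng = random.Random(0)
# Hypothetical P: uniform over 2000 points with arbitrary (+1/-1) target labels.
population = [((rng.uniform(-1, 1), rng.uniform(-1, 1)), rng.choice([-1, 1]))
              for _ in range(2000)]
e_out = {name: error(h, population) for name, h in H.items()}
eps = Fraction(1, 10)


def is_bad(name: str, sample) -> bool:
    """BAD data for hypothesis `name`: |E_in - E_out| > eps."""
    return abs(error(H[name], sample) - e_out[name]) > eps


trials, agree_13, agree_12 = 1000, 0, 0
for _ in range(trials):
    sample = rng.choices(population, k=20)       # D: N = 20 i.i.d. draws from P
    agree_13 += is_bad("h1", sample) == is_bad("h3", sample)
    agree_12 += is_bad("h1", sample) == is_bad("h2", sample)

print(f"h1 vs h3 agree on {agree_13}/{trials} data sets; h1 vs h2 on {agree_12}/{trials}")
```

$h_1$ and $h_3$ agree on every sampled D, while $h_1$ and $h_2$ disagree on a sizable fraction, which is why statement 1 is the false one.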