Feasibility of Learning: Connection to Learning

Connection to Learning

bin
• unknown orange probability µ
• marble • ∈ bin
• orange • or green •
• size-N sample from bin of i.i.d. marbles

learning
• fixed hypothesis h(x) =? target f(x)
• x ∈ X
• h is wrong ⇔ h(x) ≠ f(x) (orange •)
• h is right ⇔ h(x) = f(x) (green •)
• check h on D = {(x_n, y_n)}, where y_n = f(x_n), with i.i.d. x_n

if large N & i.i.d. x_n, can probably infer
unknown ⟦h(x) ≠ f(x)⟧ probability
by known ⟦h(x_n) ≠ y_n⟧ fraction

(figure: space X shaded orange where h(x) ≠ f(x), green where h(x) = f(x))

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 13/26
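The bin analogy above can be sketched in a few lines of simulation. This is a minimal illustration, not part of the lecture: µ and N below are assumed values, and drawing a marble is modeled as a Bernoulli(µ) trial. With large N and i.i.d. draws, the sample fraction ν lands close to the unknown µ.

```python
import random

# Sketch of the bin analogy (assumed parameters, not from the lecture):
# mu is the unknown probability of an orange marble; nu is the orange
# fraction in a size-N i.i.d. sample. For large N, nu approximates mu.
random.seed(0)
mu = 0.4    # hypothetical unknown orange probability
N = 10_000  # sample size

sample = [random.random() < mu for _ in range(N)]  # True = orange marble
nu = sum(sample) / N                               # in-sample orange fraction

print(f"mu = {mu}, nu = {nu:.3f}")  # nu should land close to mu
```

In the learning picture, "orange marble" corresponds to an x where h(x) ≠ f(x), so ν plays the role of the known in-sample disagreement fraction.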
Added Components

unknown target function f: X → Y
(ideal credit approval formula)

training examples D: (x_1, y_1), · · · , (x_N, y_N)
(historical records in bank)

learning algorithm A

final hypothesis g ≈ f
('learned' formula to be used)

hypothesis set H
(set of candidate formulas)

added: unknown P on X, generating i.i.d. x_1, x_2, · · · , x_N

for any fixed h, can probably infer
unknown E_out(h) = E_{x∼P} ⟦h(x) ≠ f(x)⟧
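A quick sketch of what E_out(h) and its in-sample counterpart look like for one fixed h. The target f, the hypothesis h, and the distribution P below are hypothetical choices made only so that E_out(h) can be computed exactly and compared against the sample estimate.

```python
import random

# Sketch: for one fixed hypothesis h, compare the known in-sample error
# with the unknown E_out(h) = E_{x~P}[[h(x) != f(x)]].
# f, h, and P below are hypothetical choices for illustration only.
random.seed(1)

f = lambda x: x >= 0.5   # hypothetical target: threshold at 0.5
h = lambda x: x >= 0.6   # one fixed hypothesis: threshold at 0.6
# P: uniform on [0, 1], so h and f disagree exactly on [0.5, 0.6),
# giving E_out(h) = 0.1.

N = 5_000
xs = [random.random() for _ in range(N)]   # i.i.d. x_n ~ P
E_in = sum(h(x) != f(x) for x in xs) / N   # disagreement fraction on D
E_out = 0.1                                # computable here since P is known

print(f"E_in = {E_in:.3f}, E_out = {E_out}")
```

In a real learning problem P and f stay unknown, so E_out(h) cannot be computed this way; only E_in(h) is available, which is exactly why the guarantee on the next slide matters.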
The Formal Guarantee

for any fixed h, in 'big' data (N large),
in-sample error E_in(h) is probably close to
out-of-sample error E_out(h) (within ε):

P[ |E_in(h) − E_out(h)| > ε ] ≤ 2 exp(−2ε²N)

same as the 'bin' analogy . . .
• valid for all N and ε
• does not depend on E_out(h): no need to 'know' E_out(h)
  (f and P can stay unknown)
• 'E_in(h) = E_out(h)' is probably approximately correct (PAC)

if 'E_in(h) ≈ E_out(h)' and 'E_in(h) small'
⇒ E_out(h) small ⇒ h ≈ f with respect to P
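The Hoeffding bound above can be checked empirically by Monte Carlo. This sketch (with assumed values of µ, N, and ε) repeatedly draws size-N samples from the Bernoulli(µ) "bin" and counts how often the sample fraction ν deviates from µ by more than ε; that frequency should sit below 2 exp(−2ε²N).

```python
import math
import random

# Sketch: empirically check the Hoeffding bound
#   P[|E_in(h) - E_out(h)| > eps] <= 2 exp(-2 eps^2 N)
# for one fixed h, modeled as repeated size-N samples from a
# Bernoulli(mu) bin (mu, N, eps, trials are assumed values).
random.seed(2)
mu, N, eps, trials = 0.3, 200, 0.1, 20_000

bad = 0  # samples whose fraction nu deviates from mu by more than eps
for _ in range(trials):
    nu = sum(random.random() < mu for _ in range(N)) / N
    if abs(nu - mu) > eps:
        bad += 1

empirical = bad / trials
bound = 2 * math.exp(-2 * eps**2 * N)  # = 2 exp(-4), about 0.037

print(f"empirical = {empirical:.4f} <= bound = {bound:.4f}")
```

Note the bound holds for all N and ε and is typically loose; the empirical deviation frequency is usually far below it, which is fine, since the guarantee only needs an upper bound that does not depend on the unknown µ (i.e., on E_out(h)).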
The Formal Guarantee
for any fixed h, in ‘big’ data
(N large),
for any fixed h,
in-sample error E
in
(h) is probably close tofor any fixed h,
out-of-sample error E
out
(h)(within )
P
E
in
(h)− Eout
(h)>
≤ 2 exp−2
2 N
same as the ‘bin’ analogy . . .
•
valid for allN
and•
does not depend on Eout
(h),no need to ‘know’ E out (h)
—f and P can stay unknown
•
‘Ein
(h) = Eout
(h)’ isprobably approximately correct (PAC)
=⇒ if
‘E in (h) ≈ E out (h)’
and‘E in (h) small’
=⇒ E
out
(h) small =⇒ h ≈ f with respect to PFeasibility of Learning Connection to Learning
The Formal Guarantee
for any fixed h, in ‘big’ data
(N large),
for any fixed h,
in-sample error E
in
(h) is probably close tofor any fixed h,
out-of-sample error E
out
(h)(within )
P
E
in
(h)− Eout
(h)>
≤ 2 exp−2
2 N
same as the ‘bin’ analogy . . .
•
valid for allN
and•
does not depend on Eout
(h),no need to ‘know’ E out (h)
—f and P can stay unknown
•
‘Ein
(h) = Eout
(h)’ isprobably approximately correct (PAC)
=⇒ if
‘E in (h) ≈ E out (h)’
and‘E in (h) small’
=⇒ E
out
(h) small =⇒ h ≈ f with respect to PFeasibility of Learning Connection to Learning
The Formal Guarantee
for any fixed h, in ‘big’ data
(N large),
for any fixed h,
in-sample error E
in
(h) is probably close tofor any fixed h,
out-of-sample error E
out
(h)(within )
P
E
in
(h)− Eout
(h)>
≤ 2 exp−2
2 N
same as the ‘bin’ analogy . . .
•
valid for allN
and•
does not depend on Eout
(h),no need to ‘know’ E out (h)
—f and P can stay unknown
•
‘Ein
(h) = Eout
(h)’ isprobably approximately correct (PAC)
=⇒ if
‘E in (h) ≈ E out (h)’
and‘E in (h) small’
=⇒ E
out
(h) small =⇒ h ≈ f with respect to PFeasibility of Learning Connection to Learning
The Formal Guarantee
for any fixed h, in ‘big’ data
(N large),
for any fixed h,
in-sample error E
in
(h) is probably close tofor any fixed h,
out-of-sample error E
out
(h)(within )
P
E
in
(h)− Eout
(h)>
≤ 2 exp−2
2 N
same as the ‘bin’ analogy . . .
•
valid for allN
and•
does not depend on Eout
(h),no need to ‘know’ E out (h)
—f and P can stay unknown
•
‘Ein
(h) = Eout
(h)’ isprobably approximately correct (PAC)
=⇒
if
‘E in (h) ≈ E out (h)’
and‘E in (h) small’
=⇒ E
out
(h) small=⇒ h ≈ f with respect to P
Feasibility of Learning Connection to Learning
The Formal Guarantee
for any fixed h, in ‘big’ data
(N large),
for any fixed h,
in-sample error E
in
(h) is probably close tofor any fixed h,
out-of-sample error E
out
(h)(within )
P
E
in
(h)− Eout
(h)>
≤ 2 exp−2
2 N
same as the ‘bin’ analogy . . .
•
valid for allN
and•
does not depend on Eout
(h),no need to ‘know’ E out (h)
—f and P can stay unknown
•
‘Ein
(h) = Eout
(h)’ isprobably approximately correct (PAC)
=⇒
Verification of One h

for any fixed h, when data large enough, E