
Feasibility of Learning / Connection to Real Learning

Coin Game

Q: If everyone in a size-150 NTU ML class flips a coin 5 times, and one of the students gets 5 heads for her coin 'g', is 'g' really magical?

A: No. Even if all coins are fair, the probability that one of the coins results in 5 heads is 1 − (31/32)^150 > 99%.

BAD sample: E_in and E_out far away
—can get worse when involving 'choice'
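The punchline can be checked numerically. The sketch below computes the exact probability 1 − (31/32)^150 and cross-checks it with a quick Monte Carlo simulation of the size-150 class; the class size and flip count come from the slide, everything else is illustrative.

```python
import random

# Exact probability that at least one of 150 fair coins shows 5 heads
# in 5 flips: each coin succeeds with probability (1/2)^5 = 1/32,
# so P[at least one] = 1 - (31/32)^150.
p_exact = 1 - (31 / 32) ** 150
print(f"exact: {p_exact:.4f}")  # > 0.99

# Monte Carlo check: simulate the size-150 class many times.
random.seed(0)

def someone_gets_five_heads(n_students=150, n_flips=5):
    return any(
        all(random.random() < 0.5 for _ in range(n_flips))
        for _ in range(n_students)
    )

trials = 10_000
p_sim = sum(someone_gets_five_heads() for _ in range(trials)) / trials
print(f"simulated: {p_sim:.4f}")
```

So even though any single coin is very unlikely to give 5 heads, the 'choice' among 150 coins makes the event almost certain.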


BAD Sample and BAD Data

BAD Sample:
e.g., E_out(h) = 1/2, but getting all heads (E_in(h) = 0)!

BAD Data for One h: E_out(h) and E_in(h) far away,
e.g., E_out(h) big (far from f), but E_in(h) small (correct on most examples).

        D_1   D_2   ...   D_1126   ...   D_5678   ...   Hoeffding
h       BAD               BAD                           P_D[BAD D for h] ≤ ...

Hoeffding: small P_D[BAD D] = Σ_{all possible D} P(D) · ⟦BAD D⟧
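The Hoeffding statement for one fixed h can be illustrated with a small simulation of the bin model: draw many datasets for a single hypothesis with an assumed E_out, and compare the empirical frequency of BAD data with the bound 2 exp(−2ε²N). The specific values of E_out, N, and ε below are arbitrary choices for illustration.

```python
import math
import random

# For one fixed h, estimate P_D[BAD D] = P[|E_in(h) - E_out(h)| > eps]
# over many sampled datasets, and compare with Hoeffding's 2*exp(-2*eps^2*N).
E_out, N, eps, trials = 0.4, 100, 0.1, 20_000
random.seed(0)

bad = 0
for _ in range(trials):
    # each dataset: N i.i.d. examples, each one misclassified w.p. E_out
    E_in = sum(random.random() < E_out for _ in range(N)) / N
    bad += abs(E_in - E_out) > eps
p_bad = bad / trials

bound = 2 * math.exp(-2 * eps**2 * N)
print(p_bad, "<=", bound)  # empirical BAD rate stays below the bound
```

The bound is loose (here roughly 0.27 against an empirical rate of a few percent), but it holds without knowing E_out, which is the whole point.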


BAD Data for Many h

BAD data for many h
⇐⇒ no 'freedom of choice' by A
⇐⇒ there exists some h such that E_out(h) and E_in(h) far away

        D_1   D_2   ...   D_1126   ...   D_5678   Hoeffding
h_1     BAD               BAD                     P_D[BAD D for h_1] ≤ ...
h_2                                      BAD      P_D[BAD D for h_2] ≤ ...
h_3     BAD   BAD         BAD                     P_D[BAD D for h_3] ≤ ...
...
h_M     BAD               BAD                     P_D[BAD D for h_M] ≤ ...
all     BAD   BAD         BAD            BAD      ?

for M hypotheses, bound of P_D[BAD D]?
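The worsening effect of 'choice' can be seen in a sketch along the lines of the coin game: each of M fair-coin 'hypotheses' has true frequency 1/2, and we estimate how often some hypothesis sees BAD data as M grows. All parameter values here are illustrative assumptions.

```python
import random

# With many hypotheses, 'some h sees BAD data' becomes much more likely
# than for a single fixed h -- the price of choice. Each dataset is N
# flips of a fair coin; BAD means |E_in - 1/2| > eps.
random.seed(1)
N, eps, trials = 50, 0.15, 1_000

def p_some_bad(M):
    """Estimate P[BAD data for at least one of M hypotheses]."""
    bad_events = 0
    for _ in range(trials):
        for _ in range(M):
            E_in = sum(random.random() < 0.5 for _ in range(N)) / N
            if abs(E_in - 0.5) > eps:
                bad_events += 1
                break  # one BAD hypothesis is enough for this dataset
    return bad_events / trials

for M in (1, 10, 100):
    print(M, p_some_bad(M))  # grows steadily with M
```

For a single h the BAD probability is a few percent; by M = 100 it is nearly certain, which is why the per-h Hoeffding bound alone no longer protects a learning algorithm that chooses.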


Bound of BAD Data

P_D[BAD D]
= P_D[BAD D for h_1 or BAD D for h_2 or ... or BAD D for h_M]
≤ P_D[BAD D for h_1] + P_D[BAD D for h_2] + ... + P_D[BAD D for h_M]   (union bound)
≤ 2 exp(−2ε²N) + 2 exp(−2ε²N) + ... + 2 exp(−2ε²N)
= 2M exp(−2ε²N)

finite-bin version of Hoeffding, valid for all M, N and ε
does not depend on any E_out(h_m): no need to 'know' E_out(h_m)
—f and P can stay unknown
'E_in(g) = E_out(g)' is PAC, regardless of A

'most reasonable' A (like PLA/pocket): pick the h_m with lowest E_in(h_m) as g
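The finite-bin bound can be turned around to answer 'how large must N be?': requiring 2M exp(−2ε²N) ≤ δ and solving for N gives N ≥ ln(2M/δ)/(2ε²). A sketch of both directions (the example values M = 100, ε = 0.1, δ = 0.05 are assumptions for illustration):

```python
import math

def bad_data_bound(M, N, eps):
    """Finite-bin Hoeffding: P_D[BAD D] <= 2*M*exp(-2*eps^2*N)."""
    return 2 * M * math.exp(-2 * eps**2 * N)

def samples_needed(M, eps, delta):
    """Smallest integer N making 2*M*exp(-2*eps^2*N) <= delta."""
    return math.ceil(math.log(2 * M / delta) / (2 * eps**2))

N = samples_needed(M=100, eps=0.1, delta=0.05)
print(N, bad_data_bound(100, N, 0.1))  # 415 samples; bound is at most 0.05
```

Note the N needed grows only logarithmically in M, which is why a moderately large finite hypothesis set is still manageable.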


The 'Statistical' Learning Flow

if |H| = M finite, N large enough,
for whatever g picked by A, E_out(g) ≈ E_in(g)
if A finds one g with E_in(g) ≈ 0, PAC guarantee for E_out(g) ≈ 0
=⇒ learning possible :-)

unknown target function f: X → Y (ideal credit approval formula)
unknown P on X, generating x_1, x_2, ..., x_N and future x
training examples D: (x_1, y_1), ..., (x_N, y_N) (historical records in bank)
learning algorithm A
hypothesis set H (set of candidate formulas)
final hypothesis g ≈ f ('learned' formula to be used)

M = ∞? (like perceptrons)
—see you in the next lectures
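The flow can be sketched end to end for a tiny finite H; the target f, the distribution P, and the four threshold hypotheses below are all assumptions made purely for illustration, not part of the lecture.

```python
import random

# A minimal sketch of the 'statistical' learning flow with finite H:
# the algorithm A simply returns the hypothesis with the lowest E_in.
random.seed(3)
sign = lambda v: 1 if v > 0 else -1

f = lambda x: sign(x - 0.3)   # unknown target (assumed here)
# finite hypothesis set, M = 4 threshold classifiers (assumed)
H = [lambda x, t=t: sign(x - t) for t in (0.0, 0.25, 0.5, 0.75)]

# training examples drawn i.i.d. from an unknown P (here: uniform on [0,1])
X = [random.random() for _ in range(200)]
D = [(x, f(x)) for x in X]

def E_in(h):
    return sum(h(x) != y for x, y in D) / len(D)

g = min(H, key=E_in)  # A: pick the h_m with lowest E_in(h_m) as g
print(E_in(g))        # small E_in; with M finite and N large enough,
                      # the PAC guarantee says E_out(g) is small too
```

Since |H| = 4 is finite and N = 200 is reasonably large, the finite-bin Hoeffding bound applies to whichever g this A picks.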


Fun Time

Consider 4 hypotheses:
h_1(x) = sign(x_1), h_2(x) = sign(x_2), h_3(x) = sign(−x_1), h_4(x) = sign(−x_2).
For any N and ε, which of the following statements is not true?

1  the BAD data of h_1 and the BAD data of h_2 are exactly the same
2  the BAD data of h_1 and the BAD data of h_3 are exactly the same
3  P_D[BAD for some h_k] ≤ 8 exp(−2ε²N)
4  P_D[BAD for some h_k] ≤ 4 exp(−2ε²N)

Reference Answer: 1

The important thing is to note that 2 is true, which implies that 4 is true if you revisit the union bound. Similar ideas will be used to conquer the M = ∞ case.
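Why option 2 holds: h_3 = −h_1, so on every input exactly one of the pair disagrees with the ±1-valued target, giving E_in(h_3) = 1 − E_in(h_1) and E_out(h_3) = 1 − E_out(h_1). The deviations |E_in − E_out| therefore coincide, and the two hypotheses share exactly the same BAD datasets, which is what lets the union bound be tightened. A sketch, with an assumed target f and input distribution purely for illustration:

```python
import random

# h_3 is the negation of h_1, so their in-sample errors are exactly
# complementary on any dataset: errors(h_1) + errors(h_3) == N.
random.seed(2)
sign = lambda v: 1 if v > 0 else -1

f = lambda x: sign(x[0] + x[1])   # assumed ±1-valued target
h1 = lambda x: sign(x[0])
h3 = lambda x: -h1(x)

data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]
n1 = sum(h1(x) != f(x) for x in data)  # errors of h_1
n3 = sum(h3(x) != f(x) for x in data)  # errors of h_3
print(n1 / 100, n3 / 100)  # E_in(h_1) + E_in(h_3) = 1 exactly
```

By contrast, h_1 and h_2 look at different coordinates, so their error deviations generally differ, which is why option 1 is the false statement.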


Summary

1 When Can Machines Learn?

Lecture 3: Types of Learning
Lecture 4: Feasibility of Learning

Learning is Impossible?
absolutely no free lunch outside D

Probability to the Rescue
probably approximately correct outside D

Connection to Learning
verification possible if E_in(h) small for fixed h

Connection to Real Learning
learning possible if |H| finite and E_in(g) small