BAD sample: $E_{\text{in}}$ and $E_{\text{out}}$ far away (can get worse when involving ‘choice’)

From the document: Machine Learning Foundations (pages 65-94)

Feasibility of Learning: Connection to Real Learning

Coin Game

Q: If everyone in a size-150 NTU ML class flips a coin 5 times, and one of the students gets 5 heads for her coin ‘g’, is ‘g’ really magical?

A: No. Even if all coins are fair, the probability that one of the coins results in 5 heads is $1 - \left(\frac{31}{32}\right)^{150} > 99\%$.

BAD sample: $E_{\text{in}}$ and $E_{\text{out}}$ far away; this can get worse when involving ‘choice’.
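The coin-game arithmetic can be checked numerically. The sketch below is only an illustration: the function names, the number of simulated trials, and the random seed are assumptions that do not appear in the slides.

```python
import random


def prob_some_coin_all_heads(num_coins: int, num_flips: int) -> float:
    """Exact probability that at least one of num_coins fair coins shows all heads."""
    p_all_heads = 0.5 ** num_flips            # one coin: (1/2)^5 = 1/32
    return 1.0 - (1.0 - p_all_heads) ** num_coins


def simulate(num_coins: int, num_flips: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (seed and trials are arbitrary)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One trial = the whole class flipping; success if any coin is all heads.
        if any(
            all(rng.random() < 0.5 for _ in range(num_flips))
            for _ in range(num_coins)
        ):
            hits += 1
    return hits / trials


exact = prob_some_coin_all_heads(150, 5)      # 1 - (31/32)^150, above 0.99
print(f"exact:     {exact:.4f}")
print(f"simulated: {simulate(150, 5, trials=2000):.4f}")
```

Each single coin is unlikely to be "magical" (probability 1/32), yet with 150 coins to choose from, some all-heads coin almost certainly exists. That is the effect of ‘choice’.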

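BAD Sample and BAD Data

BAD Sample
e.g., $E_{\text{out}} = \frac{1}{2}$, but getting all heads ($E_{\text{in}} = 0$)!

BAD Data for One $h$
$E_{\text{out}}(h)$ and $E_{\text{in}}(h)$ far away: e.g., $E_{\text{out}}$ big (far from $f$), but $E_{\text{in}}$ small (correct on most examples)

Table: one column per possible dataset ($\mathcal{D}_1$, $\mathcal{D}_2$, ..., $\mathcal{D}_{1126}$, ..., $\mathcal{D}_{5678}$, ...); the row for $h$ marks two of the shown datasets as BAD, and the last column is the Hoeffding bound:
$h$: BAD, BAD; $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h] \le \ldots$

Hoeffding: small

$$\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D}] = \sum_{\text{all possible } \mathcal{D}} P(\mathcal{D}) \cdot [\![\text{BAD } \mathcal{D}]\!]$$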
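For a small $N$, the sum over all possible datasets can be evaluated by brute force. The sketch below assumes a fixed $h$ whose error on each i.i.d. example is a coin flip with probability `mu` (playing the role of $E_{\text{out}}(h)$); the function names and the values `mu = 0.5`, `n = 10`, `eps = 0.25` are illustrative choices, not taken from the slides.

```python
import itertools
import math


def prob_bad_data(mu: float, n: int, eps: float) -> float:
    """Sum P(D) * [[D is BAD]] over all 2^n error patterns of one fixed h."""
    total = 0.0
    for pattern in itertools.product((0, 1), repeat=n):   # 1 = h errs on that example
        k = sum(pattern)
        p_d = (mu ** k) * ((1.0 - mu) ** (n - k))         # P(D) for i.i.d. examples
        e_in = k / n
        if abs(e_in - mu) > eps:                          # indicator [[BAD D]]
            total += p_d
    return total


def hoeffding_bound(n: int, eps: float) -> float:
    """Right-hand side of Hoeffding's inequality for one fixed h."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * n)


exact = prob_bad_data(mu=0.5, n=10, eps=0.25)
bound = hoeffding_bound(n=10, eps=0.25)
print(f"exact P[BAD D] = {exact:.6f}, Hoeffding bound = {bound:.6f}")
```

With these values the BAD datasets are those with at most 2 or at least 8 errors, so the exact probability is $112/1024$, comfortably below the Hoeffding bound.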

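BAD Data for Many h

BAD data for many $h$
$\iff$ no ‘freedom of choice’ by $\mathcal{A}$
$\iff$ there exists some $h$ such that $E_{\text{out}}(h)$ and $E_{\text{in}}(h)$ far away

Table: one column per possible dataset ($\mathcal{D}_1$, $\mathcal{D}_2$, ..., $\mathcal{D}_{1126}$, ..., $\mathcal{D}_{5678}$), one row per hypothesis; a BAD mark means the dataset is BAD for that hypothesis, and the last column is the Hoeffding bound for the row.

$h_1$: BAD, BAD; $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_1] \le \ldots$
$h_2$: BAD; $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_2] \le \ldots$
$h_3$: BAD, BAD, BAD; $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_3] \le \ldots$
...
$h_M$: BAD, BAD; $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_M] \le \ldots$
all: BAD, BAD, BAD; ?

For $M$ hypotheses, what is the bound of $\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D}]$?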
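To get a feel for how quickly the ‘all’ row fills up with BAD marks, here is a small sketch. It makes a strong simplifying assumption that is not in the slides: the $M$ hypotheses behave like independent fair coins (as in the coin game). In real learning all hypotheses share the same $\mathcal{D}$, so their BAD events generally overlap. The function names and parameter values are hypothetical.

```python
import math


def prob_bad_single(mu: float, n: int, eps: float) -> float:
    """Exact P_D[BAD D for one h]: binomial mass where |E_in - mu| > eps."""
    return sum(
        math.comb(n, k) * mu ** k * (1.0 - mu) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - mu) > eps
    )


def prob_bad_some(mu: float, n: int, eps: float, m: int) -> float:
    """P[D is BAD for at least one of m hypotheses], ASSUMING independence."""
    p = prob_bad_single(mu, n, eps)
    return 1.0 - (1.0 - p) ** m


# n = 5 flips, eps = 0.4: a coin is BAD only with all heads or all tails (2/32).
for m in (1, 10, 150):
    print(m, round(prob_bad_some(0.5, 5, 0.4, m), 4))
```

Under this assumption a single hypothesis is rarely BAD, but with 150 of them almost every dataset is BAD for at least one, which is why a bound that accounts for $M$ is needed.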

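Bound of BAD Data

$$
\begin{aligned}
\mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D}]
&= \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_1 \text{ or BAD } \mathcal{D} \text{ for } h_2 \text{ or } \ldots \text{ or BAD } \mathcal{D} \text{ for } h_M] \\
&\le \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_1] + \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_2] + \ldots + \mathbb{P}_{\mathcal{D}}[\text{BAD } \mathcal{D} \text{ for } h_M] \quad \text{(union bound)} \\
&\le 2\exp\left(-2\epsilon^2 N\right) + 2\exp\left(-2\epsilon^2 N\right) + \ldots + 2\exp\left(-2\epsilon^2 N\right) \\
&= 2M\exp\left(-2\epsilon^2 N\right)
\end{aligned}
$$

• finite-bin version of Hoeffding, valid for all $M$, $N$ and $\epsilon$
• does not depend on any $E_{\text{out}}(h_m)$, no need to ‘know’ $E_{\text{out}}(h_m)$: $f$ and $P$ can stay unknown
• ‘$E_{\text{in}}(g) = E_{\text{out}}(g)$’ is PAC, regardless of $\mathcal{A}$

‘most reasonable’ $\mathcal{A}$ (like PLA/pocket): pick the $h_m$ with lowest $E_{\text{in}}(h_m)$ as $g$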
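The finite-bin bound is easy to evaluate, and rearranging $2M\exp(-2\epsilon^2 N) \le \delta$ gives $N \ge \frac{1}{2\epsilon^2}\ln\frac{2M}{\delta}$. This rearrangement and the tolerance name $\delta$ are not on the slide; the sketch below, with hypothetical function names and example values, only illustrates how the bound might be used.

```python
import math


def finite_hoeffding_bound(m: int, n: int, eps: float) -> float:
    """Upper bound 2 * M * exp(-2 * eps^2 * N) on P_D[BAD D]."""
    return 2.0 * m * math.exp(-2.0 * eps ** 2 * n)


def samples_needed(m: int, eps: float, delta: float) -> int:
    """Smallest N for which the bound above is at most delta."""
    return math.ceil(math.log(2.0 * m / delta) / (2.0 * eps ** 2))


# Example values (assumed): M = 100 hypotheses, eps = 0.1, delta = 0.05.
n = samples_needed(m=100, eps=0.1, delta=0.05)
print(n, finite_hoeffding_bound(100, n, 0.1))
```

Because $M$ enters only through $\ln(2M)$, multiplying the number of hypotheses by ten adds a fixed number of examples rather than multiplying the requirement.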

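The ‘Statistical’ Learning Flow

If $|\mathcal{H}| = M$ is finite and $N$ is large enough, then for whatever $g$ picked by $\mathcal{A}$, $E_{\text{out}}(g) \approx E_{\text{in}}(g)$.

If $\mathcal{A}$ finds one $g$ with $E_{\text{in}}(g) \approx 0$, there is a PAC guarantee for $E_{\text{out}}(g) \approx 0$
$\Longrightarrow$ learning possible :-)

Learning flow diagram:
• unknown target function $f: \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
• unknown $P$ on $\mathcal{X}$ ($x_1, x_2, \cdots, x_N$; $x$)
• training examples $\mathcal{D}: (x_1, y_1), \cdots, (x_N, y_N)$ (historical records in bank)
• hypothesis set $\mathcal{H}$ (set of candidate formulas)
• learning algorithm $\mathcal{A}$
• final hypothesis $g \approx f$ (‘learned’ formula to be used)

$M = \infty$? (like perceptrons): see you in the next lectures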
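The flow can be played out on a toy problem. Everything specific in this sketch is an assumption made for illustration: a 1-D input with $P$ uniform on $[-1, 1]$, a hypothetical target $f(x) = \text{sign}(x - 0.3)$, and a finite $\mathcal{H}$ of 21 threshold classifiers, with $\mathcal{A}$ simply picking the hypothesis with the lowest $E_{\text{in}}$.

```python
import math
import random


def sign(v: float) -> int:
    return 1 if v > 0 else -1


TARGET_T = 0.3                                   # hypothetical target threshold
THRESHOLDS = [i / 10 for i in range(-10, 11)]    # finite H with M = 21


def f(x: float) -> int:
    return sign(x - TARGET_T)


def e_in(t: float, data) -> float:
    """Fraction of training examples that h_t(x) = sign(x - t) gets wrong."""
    return sum(sign(x - t) != y for x, y in data) / len(data)


def e_out(t: float) -> float:
    """Exact E_out under uniform P on [-1, 1]: h_t and f disagree between t and TARGET_T."""
    return abs(t - TARGET_T) / 2.0


def learn(data) -> float:
    """Algorithm A: return the threshold with the lowest E_in."""
    return min(THRESHOLDS, key=lambda t: e_in(t, data))


rng = random.Random(0)                           # arbitrary seed
N = 2000
data = [(x, f(x)) for x in (rng.uniform(-1.0, 1.0) for _ in range(N))]

g = learn(data)
M, delta = len(THRESHOLDS), 0.05
eps = math.sqrt(math.log(2 * M / delta) / (2 * N))   # solves 2*M*exp(-2*eps^2*N) = delta
print(f"g = {g}, E_in = {e_in(g, data):.4f}, E_out = {e_out(g):.4f}, eps = {eps:.4f}")
```

Here `e_out` is computable only because the toy target is known; in real learning $f$ and $P$ stay unknown, and the bound is what justifies trusting $E_{\text{in}}(g)$.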

Fun Time

Consider 4 hypotheses:
$h_1(x) = \text{sign}(x_1)$, $h_2(x) = \text{sign}(x_2)$, $h_3(x) = \text{sign}(-x_1)$, $h_4(x) = \text{sign}(-x_2)$.

For any $N$ and $\epsilon$, which of the following statements is not true?

1. the BAD data of $h_1$ and the BAD data of $h_2$ are exactly the same
2. the BAD data of $h_1$ and the BAD data of $h_3$ are exactly the same
3. $\mathbb{P}_{\mathcal{D}}[\text{BAD for some } h_k] \le 8\exp\left(-2\epsilon^2 N\right)$
4. $\mathbb{P}_{\mathcal{D}}[\text{BAD for some } h_k] \le 4\exp\left(-2\epsilon^2 N\right)$

Reference Answer: 1

The important thing is to note that 2 is true, which implies that 4 is true if you revisit the union bound. Similar ideas will be used to conquer the $M = \infty$ case.
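A concrete dataset makes the answer tangible. The setup below is hypothetical and not from the slides: target $f(x) = \text{sign}(x_1 + x_2)$ with $P$ uniform on $[-1, 1]^2$, for which a short area calculation gives $E_{\text{out}}(h_1) = E_{\text{out}}(h_2) = 1/4$ and $E_{\text{out}}(h_3) = E_{\text{out}}(h_4) = 3/4$. The four data points and $\epsilon = 0.1$ are also chosen by hand.

```python
def sign(v: float) -> int:
    return 1 if v > 0 else -1


def f(x) -> int:
    """Hypothetical target, assumed only for this illustration."""
    return sign(x[0] + x[1])


# Each entry: (hypothesis, its E_out under the assumed f and uniform P on [-1, 1]^2).
HYPOTHESES = {
    "h1": (lambda x: sign(x[0]), 0.25),
    "h2": (lambda x: sign(x[1]), 0.25),
    "h3": (lambda x: sign(-x[0]), 0.75),
    "h4": (lambda x: sign(-x[1]), 0.75),
}


def is_bad(name: str, data, eps: float) -> bool:
    """True if |E_in - E_out| > eps for the named hypothesis on this dataset."""
    h, e_out = HYPOTHESES[name]
    e_in = sum(h(x) != f(x) for x in data) / len(data)
    return abs(e_in - e_out) > eps


# Hand-picked dataset: h1 is right everywhere, h2 is wrong on exactly one point.
data = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5), (1.0, -0.5)]
eps = 0.1
print({name: is_bad(name, data, eps) for name in HYPOTHESES})
```

This dataset is BAD for $h_1$ but not for $h_2$, so statement 1 fails. Since $h_3 = -h_1$ (away from $x_1 = 0$), both $E_{\text{in}}$ and $E_{\text{out}}$ flip to one minus their values and the gap is unchanged, so $h_1$ and $h_3$ share their BAD data. The union bound then runs over only two distinct events, giving $4\exp(-2\epsilon^2 N)$.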

Summary

1 When Can Machines Learn?

Lecture 3: Types of Learning
Lecture 4: Feasibility of Learning
• Learning is Impossible? absolutely no free lunch outside $\mathcal{D}$
• Probability to the Rescue: probably approximately correct outside $\mathcal{D}$
• Connection to Learning: verification possible if $E_{\text{in}}(h)$ small for fixed $h$
• Connection to Real Learning
