Machine Learning Foundations (機器學習基石)

Lecture 4: Feasibility of Learning

Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering
National Taiwan University (國立台灣大學資訊工程系)

Roadmap

1 When Can Machines Learn?

Lecture 3: Types of Learning
focus: binary classification or regression from a batch of supervised data with concrete features

Lecture 4: Feasibility of Learning
• Learning is Impossible?
• Probability to the Rescue
• Connection to Learning
• Connection to Real Learning

2 Why Can Machines Learn?

3 How Can Machines Learn?

Feasibility of Learning: Learning is Impossible?

A Learning Puzzle

[Figure: six example patterns, labeled y_n = −1 and y_n = +1, plus a new pattern with g(x) = ?]

let's test your 'human learning' with 6 examples :-)

Two Controversial Answers

whatever you say about g(x), . . .

truth f(x) = +1 because . . .
• symmetry ⇔ +1
• (black or white count = 3) or (black count = 4 and middle-top black) ⇔ +1

truth f(x) = −1 because . . .
• left-top black ⇔ −1
• middle column contains at most 1 black and right-top white ⇔ −1

all valid reasons; your adversarial teacher can always call you 'didn't learn'. :-(

A 'Simple' Binary Classification Problem

x_n      y_n = f(x_n)
0 0 0    ◦
0 0 1    ×
0 1 0    ×
0 1 1    ◦
1 0 0    ×

X = {0, 1}^3, Y = {◦, ×}, can enumerate all candidate f as H

pick g ∈ H with all g(x_n) = y_n (like PLA); does g ≈ f?

No Free Lunch

      x      y   g   f1  f2  f3  f4  f5  f6  f7  f8
 D    0 0 0  ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦
 D    0 0 1  ×   ×   ×   ×   ×   ×   ×   ×   ×   ×
 D    0 1 0  ×   ×   ×   ×   ×   ×   ×   ×   ×   ×
 D    0 1 1  ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦
 D    1 0 0  ×   ×   ×   ×   ×   ×   ×   ×   ×   ×
      1 0 1      ?   ◦   ◦   ◦   ◦   ×   ×   ×   ×
      1 1 0      ?   ◦   ◦   ×   ×   ◦   ◦   ×   ×
      1 1 1      ?   ◦   ×   ◦   ×   ◦   ×   ◦   ×

g ≈ f inside D: sure!
g ≈ f outside D: No! (but that's really what we want!)

learning from D (to infer something outside D) is doomed if any 'unknown' f can happen. :-(

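To make the table concrete, here is a minimal Python sketch (not from the lecture) that enumerates every possible target on X = {0,1}^3, keeps the ones consistent with the five examples in D, and looks at the three unseen points. The eight survivors correspond to f1..f8 above, and they split evenly on every point outside D, so no choice of g can be right for all of them.

    from itertools import product

    # Training set D from the slide: five labeled points in X = {0,1}^3.
    D = {(0,0,0): 'o', (0,0,1): 'x', (0,1,0): 'x', (0,1,1): 'o', (1,0,0): 'x'}
    X = list(product([0, 1], repeat=3))          # all 8 points
    outside = [x for x in X if x not in D]       # the 3 unseen points

    # Enumerate all 2^8 = 256 possible targets f: X -> {o, x} and keep
    # the ones that agree with D on every training point.
    consistent = []
    for labels in product('ox', repeat=len(X)):
        f = dict(zip(X, labels))
        if all(f[x] == y for x, y in D.items()):
            consistent.append(f)

    print(len(consistent))                       # 8, the f1..f8 of the table
    for x in outside:
        votes = [f[x] for f in consistent]
        print(x, votes.count('o'), votes.count('x'))   # 4 vs 4 on every unseen x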

Fun Time

This is a popular 'brain-storming' problem, with a claim that 2% of the world's cleverest population can crack its 'hidden pattern':

(5, 3, 2) → 151022, (7, 2, 5) → ?

It is like a 'learning problem' with N = 1, x_1 = (5, 3, 2), y_1 = 151022. Learn a hypothesis from the one example to predict on x = (7, 2, 5). What is your answer?

1  151026
2  143547
3  I need more examples to get the correct answer
4  there is no 'correct' answer

Reference Answer: 4

Following the same nature of the no-free-lunch problems discussed, we cannot hope to be correct under this 'adversarial' setting. BTW, 2 is the designer's answer: the first two digits = x_1 · x_2; the next two digits = x_1 · x_3; the last two digits = (x_1 · x_2 + x_1 · x_3 − x_2).

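As a quick arithmetic check (my own sketch, not part of the slide), the designer's rule can be written out and verified against the given example:

    def designer_rule(x1, x2, x3):
        # digits: x1*x2, then x1*x3, then x1*x2 + x1*x3 - x2 (two digits each)
        return f"{x1 * x2:02d}{x1 * x3:02d}{x1 * x2 + x1 * x3 - x2:02d}"

    print(designer_rule(5, 3, 2))   # '151022', the given example
    print(designer_rule(7, 2, 5))   # '143547', i.e. choice 2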

Feasibility of Learning: Probability to the Rescue

Inferring Something Unknown

difficult to infer unknown target f outside D in learning; can we infer something unknown in other scenarios?

consider a bin of many many orange and green marbles
• do we know the orange portion (probability)? No!
• can you infer the orange probability?

Statistics 101: Inferring Orange Probability

[Figure: a bin of orange and green marbles, and a sample drawn from it]

bin: assume orange probability = µ, green probability = 1 − µ, with µ unknown

sample: N marbles sampled independently, with orange fraction = ν, green fraction = 1 − ν, now ν known

does in-sample ν say anything about out-of-sample µ?

Possible versus Probable

does in-sample ν say anything about out-of-sample µ?

No! possibly not: the sample can be mostly green while the bin is mostly orange

Yes! probably yes: in-sample ν likely close to unknown µ

formally, what does ν say about µ?

Hoeffding's Inequality (1/2)

µ = orange probability in bin
ν = orange fraction in a sample of size N

in a big sample (N large), ν is probably close to µ (within ε):

    P[ |ν − µ| > ε ] ≤ 2 exp(−2ε²N)

called Hoeffding's Inequality, for marbles, coins, polling, . . .

the statement 'ν = µ' is probably approximately correct (PAC)

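A small simulation (my own sketch; the bin probability µ = 0.6, the sample size, and the gap ε are arbitrary choices) illustrates the inequality: the empirical frequency of a large deviation |ν − µ| > ε stays below the Hoeffding bound 2 exp(−2ε²N).

    import math
    import random

    random.seed(0)
    mu, N, eps, trials = 0.6, 100, 0.1, 10000    # assumed bin probability and sample size

    def orange_fraction(mu, N):
        # nu: orange fraction in one i.i.d. sample of N marbles
        return sum(random.random() < mu for _ in range(N)) / N

    deviations = sum(abs(orange_fraction(mu, N) - mu) > eps for _ in range(trials))
    print("empirical P[|nu - mu| > eps]:", deviations / trials)   # typically ~0.03, well below the bound
    print("Hoeffding bound:", 2 * math.exp(-2 * eps**2 * N))      # 2*exp(-2) ~ 0.271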

Hoeffding's Inequality (2/2)

    P[ |ν − µ| > ε ] ≤ 2 exp(−2ε²N)

• valid for all N and ε
• does not depend on µ, no need to 'know' µ
• larger sample size N or looser gap ε =⇒ higher probability for 'ν ≈ µ'

if large N, can probably infer unknown µ by known ν

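The effect of N and ε on the right-hand side can be tabulated directly (the values below are illustrative only; bounds above 1 are vacuous):

    import math

    def hoeffding_bound(eps, N):
        # right-hand side of Hoeffding's inequality
        return 2 * math.exp(-2 * eps**2 * N)

    for N in (10, 100, 1000, 10000):
        print(N, [round(hoeffding_bound(eps, N), 6) for eps in (0.05, 0.1, 0.2)])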

Fun Time

Let µ = 0.4. Use Hoeffding's Inequality P[ |ν − µ| > ε ] ≤ 2 exp(−2ε²N) to bound the probability that a sample of 10 marbles will have ν ≤ 0.1. What bound do you get?

1  0.67
2  0.40
3  0.33
4  0.05

Reference Answer: 3

Set N = 10 and ε = 0.3 and you get the answer. BTW, 4 is the actual probability and Hoeffding gives only an upper bound to that.
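Both numbers in the explanation can be checked directly (a sketch, assuming the 10 marbles are drawn i.i.d. with µ = 0.4):

    import math

    N, mu, eps = 10, 0.4, 0.3

    bound = 2 * math.exp(-2 * eps**2 * N)        # Hoeffding bound with eps = 0.3
    # exact P[nu <= 0.1]: at most one orange marble among 10 i.i.d. draws
    exact = sum(math.comb(N, k) * mu**k * (1 - mu)**(N - k) for k in (0, 1))

    print(round(bound, 2))    # 0.33  -> choice 3
    print(round(exact, 3))    # 0.046 -> roughly the 0.05 of choice 4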

Feasibility of Learning: Connection to Learning

Connection to Learning

the bin analogy ←→ learning:
• unknown orange probability µ  ←→  unknown probability that a fixed hypothesis h(x) disagrees with the target f(x)
• marble ∈ bin  ←→  x ∈ X
• orange marble  ←→  h is wrong: h(x) ≠ f(x)
• green marble  ←→  h is right: h(x) = f(x)
• size-N sample of i.i.d. marbles from the bin  ←→  check h on D = {(x_n, y_n)}, where y_n = f(x_n), with i.i.d. x_n

if large N & i.i.d. x_n, can probably infer unknown ⟦h(x) ≠ f(x)⟧ probability by known ⟦h(x_n) ≠ y_n⟧ fraction

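The analogy can be illustrated with a toy verification experiment (my own sketch; the target f, the fixed hypothesis h, and the uniform P on [0, 1] are all made up for the demo): the in-sample disagreement fraction on i.i.d. data approaches the true disagreement probability.

    import random

    random.seed(1)

    def f(x):              # made-up target (known here only for the demo)
        return +1 if x > 0.5 else -1

    def h(x):              # the single fixed hypothesis being verified
        return +1 if x > 0.3 else -1

    E_out = 0.2            # true P[h(x) != f(x)] under uniform P: the interval (0.3, 0.5]

    N = 1000
    sample = [random.random() for _ in range(N)]
    E_in = sum(h(x) != f(x) for x in sample) / N

    print("E_in :", E_in)  # close to 0.2 for large N
    print("E_out:", E_out)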

Added Components

the familiar learning-flow diagram, with a new component added:

• unknown target function f: X → Y (ideal credit approval formula)
• training examples D: (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
• learning algorithm A
• final hypothesis g ≈ f ('learned' formula to be used)
• hypothesis set H (set of candidate formulas)
• added: unknown P on X, generating x_1, x_2, · · · , x_N and the future x on which 'h ≈ f?' is evaluated for a fixed h

for any fixed h, can probably infer unknown E_out(h) = E_{x∼P} ⟦h(x) ≠ f(x)⟧

The Formal Guarantee

for any fixed h, in 'big' data (N large), in-sample error E_in(h) is probably close to out-of-sample error E_out(h) (within ε):

    P[ |E_in(h) − E_out(h)| > ε ] ≤ 2 exp(−2ε²N)

same as the 'bin' analogy . . .
• valid for all N and ε
• does not depend on E_out(h), no need to 'know' E_out(h): f and P can stay unknown
• 'E_in(h) = E_out(h)' is probably approximately correct (PAC)

if 'E_in(h) ≈ E_out(h)' and 'E_in(h) small'
=⇒ E_out(h) small
=⇒ h ≈ f with respect to P

Verification of One h

for any fixed h, when data large enough, E_in(h) ≈ E_out(h)

Can we claim 'good learning' (g ≈ f)?

Yes! if E_in(h) small for the fixed h and A picks that h as g =⇒ 'g = f' PAC

No! if A forced to pick THE h as g =⇒ E_in(h) almost always not small =⇒ 'g ≠ f' PAC!

real learning: A shall make choices ∈ H (like PLA) rather than being forced to pick one h. :-(

The 'Verification' Flow

• unknown target function f: X → Y (ideal credit approval formula)
• verifying examples D: (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
• one hypothesis h (one candidate formula)
• final hypothesis g ≈ f (given formula to be verified), g = h
• unknown P on X generating x_1, x_2, · · · , x_N and the future x

can now use 'historical records' (data) to verify one candidate formula h

Fun Time

Your friend tells you her secret rule in investing in a particular stock: 'Whenever the stock goes down in the morning, it will go up in the afternoon; vice versa.' To verify the rule, you chose 100 days uniformly at random from the past 10 years of stock data, and found that 80 of them satisfy the rule. What is the best guarantee that you can get from the verification?

1  You'll definitely be rich by exploiting the rule in the next 100 days.
2  You'll likely be rich by exploiting the rule in the next 100 days, if the market behaves similarly to the last 10 years.
3  You'll likely be rich by exploiting the 'best rule' from 20 more friends in the next 100 days.
4  You'd definitely have been rich if you had exploited the rule in the past 10 years.

Reference Answer: 2

1: no free lunch; 3: no 'learning' guarantee in verification; 4: verifying with only 100 days, possible that the rule is mostly wrong for the whole 10 years.

Feasibility of Learning: Connection to Real Learning

Multiple h

[Figure: many bins, one per hypothesis h_1, h_2, . . . , h_M, each with its own E_out(h_1), E_out(h_2), . . . , E_out(h_M) and sample estimate E_in(h_1), E_in(h_2), . . . , E_in(h_M)]

real learning (say like PLA): BINGO when some h_m gets a sample of all green marbles (E_in(h_m) = 0)?

Coin Game

Q: if everyone in a size-150 NTU ML class flips a coin 5 times, and one of the students gets 5 heads for her coin 'g', is 'g' really magical?

A: No. Even if all coins are fair, the probability that one of the coins results in 5 heads is 1 − (31/32)^150 > 99%.

BAD sample: E_in and E_out far away; can get worse when involving 'choice'
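The probability quoted on the slide follows from a one-line calculation:

    p_five_heads = (1 / 2) ** 5               # one fair coin: probability 1/32
    p_none = (1 - p_five_heads) ** 150        # (31/32)^150: no student sees 5 heads
    print(1 - p_none)                         # ~0.9915, i.e. > 99%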

BAD Sample and BAD Data

BAD Sample:
e.g., E_out = 1/2, but getting all heads (E_in = 0)!

BAD Data for One h: E_out(h) and E_in(h) far away,
e.g., E_out big (far from f), but E_in small (correct on most examples)

[Table in the slide: over the possible datasets D_1, D_2, . . . , D_1126, . . . , D_5678, . . . , only a few are BAD for the single h, and Hoeffding bounds P_D[BAD D for h] ≤ . . .]

Hoeffding: P_D[BAD D] = Σ over all possible D of P(D) · ⟦D BAD⟧ is small

BAD Data for Many h

BAD data for many h
⇐⇒ no 'freedom of choice' by A
⇐⇒ there exists some h such that E_out(h) and E_in(h) far away

[Table in the slide: over datasets D_1, D_2, . . . , D_1126, . . . , D_5678, . . . , each hypothesis h_1, h_2, h_3, . . . , h_M has its own BAD datasets, each bounded by Hoeffding: P_D[BAD D for h_m] ≤ . . . ; a dataset is BAD overall if it is BAD for any of the h_m]

for M hypotheses, bound of P_D[BAD D]?

Bound of BAD Data

P_D[BAD D]
  = P_D[BAD D for h_1 or BAD D for h_2 or . . . or BAD D for h_M]
  ≤ P_D[BAD D for h_1] + P_D[BAD D for h_2] + . . . + P_D[BAD D for h_M]   (union bound)
  ≤ 2 exp(−2ε²N) + 2 exp(−2ε²N) + . . . + 2 exp(−2ε²N)
  = 2M exp(−2ε²N)

finite-bin version of Hoeffding, valid for all M, N and ε
• does not depend on any E_out(h_m), no need to 'know' E_out(h_m): f and P can stay unknown
• 'E_in(g) = E_out(g)' is PAC, regardless of A

'most reasonable' A (like PLA/pocket): pick the h_m with lowest E_in(h_m) as g
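A small helper (my own sketch) evaluates the finite-bin bound 2M exp(−2ε²N) and inverts it to find how many examples keep the probability of BAD data below a tolerance δ for a given M; the numbers in the comments are just example values.

    import math

    def bad_data_bound(M, N, eps):
        # union-bound ('finite-bin') version of Hoeffding over M hypotheses
        return 2 * M * math.exp(-2 * eps**2 * N)

    def samples_needed(M, eps, delta):
        # smallest N with 2*M*exp(-2*eps^2*N) <= delta
        return math.ceil(math.log(2 * M / delta) / (2 * eps**2))

    print(bad_data_bound(M=100, N=1000, eps=0.1))      # ~4.1e-07
    print(samples_needed(M=100, eps=0.1, delta=0.05))  # 415 examples suffice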
