Machine Learning Foundations ( 機器學習基石)

Academic year: 2022

(1)

Lecture 1: The Learning Problem

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering

National Taiwan University (國立台灣大學資訊工程系)

(2)

The Learning Problem Course Introduction

Course Design (1/2)

Machine Learning: a mixture of theoretical and practical tools

theory oriented

• derive everything deeply for solid understanding

• less interesting to general audience

techniques oriented

• flash over the sexiest techniques broadly for shiny coverage

• too many techniques, hard to choose, hard to use properly

our approach: foundation oriented

(3)

Course Design (2/2)

Foundation Oriented ML Course

mixture of philosophical illustrations, key theory, core techniques, usage in practice, and hopefully jokes :-)

what every machine learning user should know

story-like:

• When Can Machines Learn? (illustrative + technical)

• Why Can Machines Learn? (theoretical + illustrative)

• How Can Machines Learn? (technical + practical)

• How Can Machines Learn Better? (practical + theoretical)

allows students to learn ‘future/untaught’ techniques or study deeper theory easily

(4)

Course History

NTU Version

• 15-17 weeks (2+ hours)

• highly-praised with English and blackboard teaching

Coursera Version

• 8 weeks of ‘foundation’ (this course) + 7 weeks of ‘techniques’ (coming course)

• Mandarin teaching to reach more audience in need

• slides teaching improved with Coursera’s quiz and homework mechanisms

goal: try making Coursera version even better than NTU version

(5)

Fun Time

Which of the following descriptions of this course is true?

1 the course will be taught in Taiwanese

2 the course will tell me the techniques that create the android Lieutenant Commander Data in Star Trek

3 the course will be 15 weeks long

4 the course will be story-like

Reference Answer: 4

1 no, my Taiwanese is unfortunately not good enough for teaching (yet)

2 no, although what we teach may serve as foundations of those (future) techniques

3 no, unless you choose to join the next course

4 yes, let’s begin the story

(6)

Roadmap

1 When Can Machines Learn?

Lecture 1: The Learning Problem

• Course Introduction

• What is Machine Learning

• Applications of Machine Learning

• Components of Machine Learning

• Machine Learning and Other Fields

2 Why Can Machines Learn?

3 How Can Machines Learn?

4 How Can Machines Learn Better?

(7)

The Learning Problem: What is Machine Learning

From Learning to Machine Learning

learning: acquiring skill with experience accumulated from observations

observations → learning → skill

machine learning: acquiring skill with experience accumulated/computed from data

data → ML → skill

What is skill?

(8)

A More Concrete Definition

skill ⇔ improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

An Application in Computational Finance

stock data → ML → more investment gain

Why use machine learning?

(9)

Yet Another Application: Tree Recognition

• ‘define’ trees and hand-program: difficult

• learn from data (observations) and recognize: a 3-year-old can do so

• ‘ML-based tree recognition system’ can be easier to build than hand-programmed system

ML: an alternative route to build complicated systems

(10)

The Machine Learning Route

ML: an alternative route to build complicated systems

Some Use Scenarios

• when humans cannot program the system manually: navigating on Mars

• when humans cannot ‘define the solution’ easily: speech/visual recognition

• when needing rapid decisions that humans cannot make: high-frequency trading

• when needing to be user-oriented on a massive scale: consumer-targeted marketing

Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)

(11)

Key Essence of Machine Learning

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

1 exists some ‘underlying pattern’ to be learned, so ‘performance measure’ can be improved

2 but no programmable (easy) definition, so ‘ML’ is needed

3 somehow there is data about the pattern, so ML has some ‘inputs’ to learn from

key essence: help decide whether to use ML

(12)

Fun Time

Which of the following is best suited for machine learning?

1 predicting whether the next cry of the baby girl happens at an even-numbered minute or not

2 determining whether a given graph contains a cycle

3 deciding whether to approve credit card to some customer

4 guessing whether the earth will be destroyed by the misuse of nuclear power in the next ten years

Reference Answer: 3

1 no pattern

2 programmable definition

3 pattern: customer behavior; definition: not easily programmable; data: history of bank operation

4 arguably no (or not enough) data yet

(13)

The Learning Problem: Applications of Machine Learning

Daily Needs: Food, Clothing, Housing, Transportation

data → ML → skill

1 Food (Sadilek et al., 2013)

• data: Twitter data (words + location)

• skill: tell food poisoning likelihood of restaurant properly

2 Clothing (Abu-Mostafa, 2012)

• data: sales figures + client surveys

• skill: give good fashion recommendations to clients

3 Housing (Tsanas and Xifara, 2012)

• data: characteristics of buildings and their energy load

• skill: predict energy load of other buildings closely

4 Transportation (Stallkamp et al., 2012)

• data: some traffic sign images and meanings

• skill: recognize traffic signs accurately

ML is everywhere!

(14)

Education

data → ML → skill

• data: students’ records on quizzes on a Math tutoring system

• skill: predict whether a student can give a correct answer to another quiz question

A Possible ML Solution

answer correctly ≈ ⟦recent strength of student > difficulty of question⟧

• give ML 9 million records from 3000 students

• ML determines (reverse-engineers) strength and difficulty automatically

key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
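
The indicator rule on this slide (answer correctly when strength exceeds difficulty) can be sketched in a few lines of Python. This is only a toy illustration: the four records, the names `fit` and `predict_correct`, and the crude "nudge on mistake" fitting loop are all assumptions made here, and the slide does not describe how the KDDCup 2010 system actually estimated strength and difficulty.

```python
from typing import Dict, List, Tuple

# One record: (student id, question id, answered correctly?). Hypothetical format.
Record = Tuple[str, str, bool]


def predict_correct(strength: float, difficulty: float) -> bool:
    """Indicator rule from the slide: correct iff strength > difficulty."""
    return strength > difficulty


def fit(records: List[Record], epochs: int = 20, step: float = 0.1):
    """Crude illustrative fitting loop (an assumption, not the KDDCup method):
    whenever the rule mispredicts a record, nudge that student's strength and
    that question's difficulty in the direction that fixes the mistake."""
    strength: Dict[str, float] = {}
    difficulty: Dict[str, float] = {}
    for _ in range(epochs):
        for student, question, correct in records:
            s = strength.setdefault(student, 0.0)
            d = difficulty.setdefault(question, 0.0)
            if predict_correct(s, d) != correct:
                sign = 1.0 if correct else -1.0
                strength[student] = s + sign * step
                difficulty[question] = d - sign * step
    return strength, difficulty


def accuracy(records: List[Record], strength, difficulty) -> float:
    """Fraction of records on which the rule agrees with the recorded outcome."""
    hits = sum(
        predict_correct(strength[s], difficulty[q]) == c for s, q, c in records
    )
    return hits / len(records)


# Made-up toy data standing in for the 9 million real records.
records = [
    ("alice", "q1", True),
    ("alice", "q2", True),
    ("bob", "q1", True),
    ("bob", "q2", False),
]
strength, difficulty = fit(records)
print(accuracy(records, strength, difficulty))
```

The point of the sketch is the direction of inference: strength and difficulty are never given, they are recovered from the answer records alone.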

(15)

Entertainment: Recommender System (1/2)

data → ML → skill

• data: how users have rated some movies

• skill: predict how a user would rate an unrated movie

A Hot Problem

competition held by Netflix in 2006

• 100,480,507 ratings that 480,189 users gave to 17,770 movies

• 10% improvement = 1 million dollar prize

similar competition (movies → songs) held by Yahoo! in KDDCup 2011

• 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines learn our preferences?

(16)

Entertainment: Recommender System (2/2)

[Figure: match movie and viewer factors, then add contributions from each factor to get the predicted rating]

• viewer factors: likes comedy? likes action? prefers blockbusters? likes Tom Cruise?

• movie factors: comedy content, action content, blockbuster? Tom Cruise in it?

A Possible ML Solution

• pattern: rating ← viewer/movie factors

• learning: known rating → learned factors → unknown rating prediction

key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
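
The "match factors and add contributions" idea can be read as a sum of per-factor products. The sketch below shows only the prediction step, under that assumption. The factor names follow the figure, but every number is made up for illustration; in the slide's setting these factors would be learned from known ratings rather than typed in by hand.

```python
from typing import Dict

# Factor names taken from the figure; the ordering is arbitrary.
FACTORS = ("comedy", "action", "blockbuster", "tom_cruise")


def predict_rating(viewer: Dict[str, float], movie: Dict[str, float]) -> float:
    """Match each viewer factor with the same movie factor and add the
    contributions (a plain inner product)."""
    return sum(viewer[k] * movie[k] for k in FACTORS)


# Hypothetical factor values, chosen only to make the arithmetic easy to follow.
viewer = {"comedy": 1.0, "action": 0.0, "blockbuster": 0.5, "tom_cruise": 1.0}
movie = {"comedy": 2.0, "action": 3.0, "blockbuster": 1.0, "tom_cruise": 1.0}

# 1.0*2.0 + 0.0*3.0 + 0.5*1.0 + 1.0*1.0
print(predict_rating(viewer, movie))  # 3.5
```

A viewer who does not care about action (factor 0.0) gets no contribution from the movie's action content, which is the matching behavior the figure describes.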

(17)

Fun Time

Which of the following fields cannot use machine learning?

1 Finance

2 Medicine

3 Law

4 none of the above

Reference Answer: 4

1 predict stock price from data

2 predict medicine effect from data

3 summarize legal documents from data

4 :-) Welcome to study this hot topic!

(18)

The Learning Problem: Components of Machine Learning

Components of Learning: Metaphor Using Credit Approval

Applicant Information

• age: 23 years

• gender: female

• annual salary: NTD 1,000,000

• year in residence: 1 year

• year in job: 0.5 year

• current debt: 200,000

unknown pattern to be learned: ‘approve credit card good for bank?’

(19)

Formalize the Learning Problem

Basic Notations

• input: x ∈ X (customer application)

• output: y ∈ Y (good/bad after approving credit card)

• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)

• data ⇔ training examples: D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in bank)

• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)

{(x_n, y_n)} from f → ML → g
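
One way to make the notation concrete is to write X, Y, D and a candidate g as Python types. The sketch below is an illustration under stated assumptions: the `Application` fields are a subset of the applicant table, the encoding of Y as +1 (good) and -1 (bad) is a choice made here, the three records and their labels are invented, and `g` is a hand-written stand-in rather than anything learned. The target f does not appear in the code because it is unknown by definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Application:
    """One input x in X (fields are an assumed subset of the applicant table)."""
    age: float
    annual_salary: float  # NTD
    years_in_job: float
    current_debt: float   # NTD


Label = int                                  # y in Y: +1 good, -1 bad (assumed encoding)
Hypothesis = Callable[[Application], Label]  # any formula X -> Y, such as g
Dataset = List[Tuple[Application, Label]]    # D = {(x_n, y_n)}


def g(x: Application) -> Label:
    """A hand-written candidate formula, NOT a learned one:
    approve iff annual salary exceeds current debt."""
    return +1 if x.annual_salary > x.current_debt else -1


# Made-up historical records; the labels are invented for illustration.
D: Dataset = [
    (Application(23, 1_000_000, 0.5, 200_000), +1),
    (Application(35, 300_000, 4.0, 900_000), -1),
    (Application(41, 500_000, 10.0, 600_000), +1),
]


def accuracy(h: Hypothesis, data: Dataset) -> float:
    """Performance measure: fraction of examples where h agrees with the label."""
    return sum(h(x) == y for x, y in data) / len(data)


print(accuracy(g, D))  # g agrees with 2 of the 3 made-up records
```

Here g disagrees with the third record, which matches the slide's caution that a hypothesis has only "hopefully good" performance.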

(20)

Learning Flow for Credit Approval

unknown target function f : X → Y (ideal credit approval formula)

→ training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)

→ learning algorithm A

→ final hypothesis g ≈ f (‘learned’ formula to be used)

• target f unknown (i.e. no programmable definition)

• hypothesis g hopefully ≈ f but possibly different from f (perfection ‘impossible’ when f unknown)

What does g look like?

(21)

The Learning Model

training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)

hypothesis set H (set of candidate formulas)

→ learning algorithm A

→ final hypothesis g ≈ f (‘learned’ formula to be used)

assume g ∈ H = {h_k}, i.e. approving if

• h_1: annual salary > NTD 800,000

• h_2: debt > NTD 100,000 (really?)

• h_3: year in job ≤ 2 (really?)

hypothesis set H:

• can contain good or bad hypotheses

• up to A to pick the ‘best’ one as g

learning model = A and H
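
The split between H and A can be sketched with the three candidate rules above. The slide leaves open what ‘best’ means, so this sketch assumes the simplest reading: A returns the hypothesis with the highest accuracy on D. The four records, their labels, and the dictionary keys are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, float]
Hypothesis = Callable[[Applicant], bool]  # True means "approve"
Dataset = List[Tuple[Applicant, bool]]    # (application, was approving good?)

# Hypothesis set H: the three candidate formulas from the slide.
H: Dict[str, Hypothesis] = {
    "h1": lambda x: x["annual_salary"] > 800_000,
    "h2": lambda x: x["debt"] > 100_000,
    "h3": lambda x: x["years_in_job"] <= 2,
}

# Made-up historical records; the labels are invented for illustration.
D: Dataset = [
    ({"annual_salary": 1_000_000, "debt": 200_000, "years_in_job": 0.5}, True),
    ({"annual_salary": 900_000, "debt": 50_000, "years_in_job": 5.0}, True),
    ({"annual_salary": 400_000, "debt": 300_000, "years_in_job": 1.0}, False),
    ({"annual_salary": 600_000, "debt": 0, "years_in_job": 8.0}, False),
]


def training_accuracy(h: Hypothesis, data: Dataset) -> float:
    """Fraction of training examples on which h agrees with the record."""
    return sum(h(x) == y for x, y in data) / len(data)


def A(data: Dataset, hypotheses: Dict[str, Hypothesis]) -> Tuple[str, Hypothesis]:
    """Learning algorithm (assumed criterion): pick the hypothesis in H
    with the highest training accuracy and return it as g."""
    best = max(hypotheses, key=lambda name: training_accuracy(hypotheses[name], data))
    return best, hypotheses[best]


name, g = A(D, H)
print(name, training_accuracy(g, D))  # h1 1.0
```

On this toy data h_2 and h_3 each agree with only half the records, so the "(really?)" candidates lose to h_1. With different data, the same A over the same H could return a different g.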

(22)

Practical Definition of Machine Learning

unknown target function f : X → Y (ideal credit approval formula)

→ training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)

hypothesis set H (set of candidate formulas)

→ learning algorithm A

→ final hypothesis g ≈ f (‘learned’ formula to be used)

machine learning: use data to compute hypothesis g that approximates target f

(23)

Fun Time

How to use the four sets below to form a learning problem for song recommendation?

S_1 = [0, 100]

S_2 = all possible (userid, songid) pairs

S_3 = all formulas that ‘multiply’ user factors & song factors, indexed by all possible combinations of such factors

S_4 = 1,000,000 pairs of ((userid, songid), rating)

1 S_1 = X, S_2 = Y, S_3 = H, S_4 = D

2 S_1 = Y, S_2 = X, S_3 = H, S_4 = D

3 S_1 = D, S_2 = H, S_3 = Y, S_4 = X

4 S_1 = X, S_2 = D, S_3 = Y, S_4 = H

Reference Answer: 2

S_4 → (A on S_3) → (g : S_2 → S_1)

(24)

The Learning Problem: Machine Learning and Other Fields

Machine Learning and Data Mining

Machine Learning: use data to compute hypothesis g that approximates target f

Data Mining: use (huge) data to find property that is interesting

• if ‘interesting property’ same as ‘hypothesis that approximates target’: ML = DM (usually what KDDCup does)

• if ‘interesting property’ related to ‘hypothesis that approximates target’: DM can help ML, and vice versa (often, but not always)

• traditional DM also focuses on efficient computation in large database

difficult to distinguish ML and DM in reality

(25)

Machine Learning and Artificial Intelligence

Machine Learning: use data to compute hypothesis g that approximates target f

Artificial Intelligence: compute something that shows intelligent behavior

g ≈ f is something that shows intelligent behavior: ML can realize AI, among other routes

e.g. chess playing

• traditional AI: game tree

• ML for AI: ‘learning from board data’

ML is one possible route to realize AI

(26)

Machine Learning and Statistics

Machine Learning: use data to compute hypothesis g that approximates target f

Statistics: use data to make inference about an unknown process

• g is an inference outcome; f is something unknown: statistics can be used to achieve ML

• traditional statistics also focuses on provable results with math assumptions, and cares less about computation

statistics: many useful tools for ML

(27)

Fun Time

Which of the following claims is not totally true?

1 machine learning is a route to realize artificial intelligence

2 machine learning, data mining and statistics all need data

3 data mining is just another name for machine learning

4 statistics can be used for data mining

Reference Answer: 3

While data mining and machine learning do share a huge overlap, they are arguably not equivalent because of the difference in focus.

(28)

Summary

1 When Can Machines Learn?

Lecture 1: The Learning Problem

• Course Introduction: foundation oriented and story-like

• What is Machine Learning: use data to approximate target

• Applications of Machine Learning: almost everywhere

• Components of Machine Learning: A takes D and H to get g

• Machine Learning and Other Fields: related to DM, AI and Stats

next: a simple and yet useful learning model (H and A)

2 Why Can Machines Learn?

3 How Can Machines Learn?

4 How Can Machines Learn Better?
