Basics of Machine Learning

(1)

Basics of Machine Learning

(Introduction to Machine Learning)

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering

National Taiwan University

(2)

The Learning Problem

More about Me

Associate Professor, Dept. CSIE, National Taiwan University

• graduated from ck49th314, 1997
• co-leader of KDDCup world champion teams at NTU: 2010–2013
• Secretary General, Taiwanese Association for Artificial Intelligence
• co-author of bestseller ML textbook “Learning from Data”
• instructor of Mandarin-teaching MOOC of Machine Learning on NTU-Coursera, 2013.11–: https://www.coursera.org/course/ntumlone

(3)

The Learning Problem: What is Machine Learning

From Learning to Machine Learning

learning: acquiring skill with experience accumulated from observations

observations → learning → skill

machine learning: acquiring skill with experience accumulated/computed from data

data → ML → skill

What is skill?

(4)

A More Concrete Definition

skill ⇔ improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

An Application in Computational Finance

stock data → ML → more investment gain

Why use machine learning?

(5)

Yet Another Application: Tree Recognition

• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be easier to build than hand-programmed system

ML: an alternative route to build complicated systems

(6)

The Machine Learning Route

ML: an alternative route to build complicated systems

Some Use Scenarios

• when humans cannot program the system manually: navigating on Mars
• when humans cannot ‘define the solution’ easily: speech/visual recognition
• when needing rapid decisions that humans cannot make: high-frequency trading
• when needing to be user-oriented on a massive scale: consumer-targeted marketing

Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)

(7)

Key Essence of Machine Learning

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

1. there exists some ‘underlying pattern’ to be learned, so the ‘performance measure’ can be improved
2. but there is no programmable (easy) definition, so ‘ML’ is needed
3. somehow there is data about the pattern, so ML has some ‘inputs’ to learn from

key essence: these three conditions help decide whether to use ML

(8)

The Learning Problem: Applications of Machine Learning

Daily Needs: Food, Clothing, Housing, Transportation

data → ML → skill

1. Food (Sadilek et al., 2013)
  • data: Twitter data (words + location)
  • skill: tell properly how likely a restaurant is to cause food poisoning
2. Clothing (Abu-Mostafa, 2012)
  • data: sales figures + client surveys
  • skill: give good fashion recommendations to clients
3. Housing (Tsanas and Xifara, 2012)
  • data: characteristics of buildings and their energy load
  • skill: predict energy load of other buildings closely
4. Transportation (Stallkamp et al., 2012)
  • data: some traffic sign images and meanings
  • skill: recognize traffic signs accurately

ML is everywhere!

(9)

Education

data → ML → skill

• data: students’ records on quizzes on a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question

A Possible ML Solution

$\text{answer correctly} \approx [\![\, \text{recent strength of student} > \text{difficulty of question} \,]\!]$

• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty automatically

key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
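A minimal sketch of how strength and difficulty might be reverse-engineered from quiz records. This is not the KDDCup 2010 system: the logistic relaxation of the comparison, the stochastic gradient updates, the toy records, and all names (`fit_strength_difficulty`, `predict_correct`, `amy`, `q_hard`, ...) are assumptions made for illustration.

```python
import math

def fit_strength_difficulty(records, epochs=200, lr=0.1):
    """Fit one strength per student and one difficulty per question.

    records: list of (student, question, correct) with correct in {0, 1}.
    Assumed model: P(correct) = sigmoid(strength - difficulty).
    """
    strength = {s: 0.0 for s, _, _ in records}
    difficulty = {q: 0.0 for _, q, _ in records}
    for _ in range(epochs):
        for s, q, correct in records:
            p = 1.0 / (1.0 + math.exp(-(strength[s] - difficulty[q])))
            grad = correct - p  # log-likelihood gradient w.r.t. (strength - difficulty)
            strength[s] += lr * grad
            difficulty[q] -= lr * grad
    return strength, difficulty

def predict_correct(strength, difficulty, student, question):
    """The rule from the slide: answer correctly iff strength > difficulty."""
    return strength[student] > difficulty[question]

# Hypothetical toy records, standing in for the 9 million real ones.
records = [
    ("amy", "q_easy", 1), ("amy", "q_hard", 1),
    ("bob", "q_easy", 1), ("bob", "q_hard", 0),
    ("cat", "q_easy", 0), ("cat", "q_hard", 0),
]
strength, difficulty = fit_strength_difficulty(records)
print(predict_correct(strength, difficulty, "bob", "q_hard"))
```

Nobody tells the program who is strong or which question is hard; both are inferred from the right/wrong records alone.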

(10)

Entertainment: Recommender System (1/2)

data → ML → skill

• data: ratings that many users have given to some movies
• skill: predict how a user would rate an unrated movie

A Hot Problem

• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines learn our preferences?

(11)

Entertainment: Recommender System (2/2)

Match movie and viewer factors

[Figure: viewer factors (likes comedy? likes action? prefers blockbusters? likes Tom Cruise?) are matched against movie factors (comedy content, action content, blockbuster? Tom Cruise in it?); add contributions from each factor to get the predicted rating]

A Possible ML Solution

• pattern: rating ← viewer/movie factors
• learning: known rating → learned factors → unknown rating prediction

key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
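A minimal sketch of the ‘known rating → learned factors → unknown rating prediction’ idea, written as a plain matrix factorization trained by stochastic gradient descent. It is not the KDDCup 2011 system: the toy ratings, the number of factors, the learning rate, and all function names are assumptions made for illustration.

```python
import random

def dot(a, b):
    """Add the contributions from each factor."""
    return sum(x * y for x, y in zip(a, b))

def factorize(ratings, n_factors=2, epochs=500, lr=0.02, seed=0):
    """Learn viewer and movie factor vectors from (viewer, movie, rating) triples."""
    rng = random.Random(seed)
    vf = {v: [rng.uniform(-0.5, 0.5) for _ in range(n_factors)]
          for v in sorted({v for v, _, _ in ratings})}
    mf = {m: [rng.uniform(-0.5, 0.5) for _ in range(n_factors)]
          for m in sorted({m for _, m, _ in ratings})}
    for _ in range(epochs):
        for v, m, r in ratings:
            err = r - dot(vf[v], mf[m])
            for k in range(n_factors):
                a, b = vf[v][k], mf[m][k]
                vf[v][k] += lr * err * b  # move viewer factor to reduce squared error
                mf[m][k] += lr * err * a  # move movie factor to reduce squared error
    return vf, mf

def predict_rating(vf, mf, viewer, movie):
    return dot(vf[viewer], mf[movie])

def mean_squared_error(vf, mf, ratings):
    return sum((r - predict_rating(vf, mf, v, m)) ** 2 for v, m, r in ratings) / len(ratings)

# Hypothetical known ratings; ("ann", "m3") is the unknown rating to predict.
ratings = [
    ("ann", "m1", 5), ("ann", "m2", 1),
    ("ben", "m1", 4), ("ben", "m2", 1), ("ben", "m3", 5),
    ("cy", "m1", 1), ("cy", "m2", 5), ("cy", "m3", 2),
]
vf, mf = factorize(ratings)
print(predict_rating(vf, mf, "ann", "m3"))
```

The learned factors have no preassigned meaning such as ‘comedy’ or ‘action’; any interpretation of that kind has to be read off afterwards.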

(12)

The Learning Problem: Components of Machine Learning

Components of Learning: Metaphor Using Credit Approval

Applicant Information

age                 23 years
gender              female
annual salary       NTD 1,000,000
year in residence   1 year
year in job         0.5 year
current debt        200,000

unknown pattern to be learned: ‘approve credit card good for bank?’

(13)

Formalize the Learning Problem

Basic Notations

• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
• data ⇔ training examples: $D = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$ (historical records in bank)
• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)

$\{(x_n, y_n)\}$ from f → ML → g

(14)

Learning Flow for Credit Approval

unknown target function f : X → Y (ideal credit approval formula)

→ training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$ (historical records in bank)

→ learning algorithm A

→ final hypothesis g ≈ f (‘learned’ formula to be used)

• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f but possibly different from f (perfection ‘impossible’ when f unknown)

What does g look like?

(15)

The Learning Model

training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$ (historical records in bank)

→ learning algorithm A, choosing from hypothesis set H (set of candidate formulas)

→ final hypothesis g ≈ f (‘learned’ formula to be used)

• assume $g \in H = \{h_k\}$, i.e. approving if
  • $h_1$: annual salary > NTD 800,000
  • $h_2$: debt > NTD 100,000 (really?)
  • $h_3$: year in job ≤ 2 (really?)
• hypothesis set H:
  • can contain good or bad hypotheses
  • up to A to pick the ‘best’ one as g

learning model = A and H
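The split between H and A can be sketched in a few lines. The three rules are the ones listed on the slide; everything else is an assumption for illustration: the dictionary field names, the ±1 encoding of good/bad, the four training records, and the choice of ‘fewest mistakes on D’ as what ‘best’ means.

```python
# Hypothesis set H: each candidate formula maps an applicant x to +1 (approve) or -1.
H = {
    "h1: annual salary > NTD 800,000": lambda x: +1 if x["annual_salary"] > 800_000 else -1,
    "h2: debt > NTD 100,000": lambda x: +1 if x["debt"] > 100_000 else -1,
    "h3: year in job <= 2": lambda x: +1 if x["year_in_job"] <= 2 else -1,
}

def A(D, H):
    """Learning algorithm: pick the hypothesis with the fewest mistakes on D."""
    def mistakes(h):
        return sum(1 for x, y in D if h(x) != y)
    best_name = min(H, key=lambda name: mistakes(H[name]))
    return best_name, H[best_name]

# Hypothetical historical records: (applicant, +1 if good for the bank else -1).
D = [
    ({"annual_salary": 1_000_000, "debt": 200_000, "year_in_job": 0.5}, +1),
    ({"annual_salary": 500_000, "debt": 300_000, "year_in_job": 1}, -1),
    ({"annual_salary": 900_000, "debt": 50_000, "year_in_job": 5}, +1),
    ({"annual_salary": 600_000, "debt": 0, "year_in_job": 3}, -1),
]

name, g = A(D, H)
print(name)  # the hypothesis chosen as g
print(g({"annual_salary": 850_000, "debt": 10_000, "year_in_job": 4}))
```

With a different D, the same A and H could return a different g; the learning model fixes the candidates and the selection procedure, not the answer.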

(16)

Practical Definition of Machine Learning

unknown target function f : X → Y (ideal credit approval formula)

→ training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$ (historical records in bank)

→ learning algorithm A, choosing from hypothesis set H (set of candidate formulas)

→ final hypothesis g ≈ f (‘learned’ formula to be used)

machine learning: use data to compute hypothesis g that approximates target f

(17)

The Learning Problem: Machine Learning and Other Fields

Machine Learning and Data Mining

Machine Learning: use data to compute hypothesis g that approximates target f

Data Mining: use (huge) data to find property that is interesting

• if ‘interesting property’ is the same as ‘hypothesis that approximates target’: ML = DM (usually what KDDCup does)
• if ‘interesting property’ is related to ‘hypothesis that approximates target’: DM can help ML, and vice versa (often, but not always)
• traditional DM also focuses on efficient computation in large databases

difficult to distinguish ML and DM in reality

(18)

Machine Learning and Artificial Intelligence

Machine Learning: use data to compute hypothesis g that approximates target f

Artificial Intelligence: compute something that shows intelligent behavior

• g ≈ f is something that shows intelligent behavior: ML can realize AI, among other routes
• e.g. chess playing
  • traditional AI: game tree
  • ML for AI: ‘learning from board data’

ML is one possible route to realize AI

(19)

Machine Learning and Statistics

Machine Learning: use data to compute hypothesis g that approximates target f

Statistics: use data to make inference about an unknown process

• g is an inference outcome; f is something unknown: statistics can be used to achieve ML
• traditional statistics also focuses on provable results with math assumptions, and cares less about computation

statistics: many useful tools for ML

(20)

A Learning Puzzle

[Figure: six example patterns, each labeled $y_n = -1$ or $y_n = +1$, and one new pattern x with g(x) = ?]

let’s test your ‘human learning’ with 6 examples :-)

(21)

Two Controversial Answers

whatever you say about g(x), . . .

[Figure: the example patterns labeled $y_n = -1$ or $y_n = +1$ and the query g(x) = ?, shown once for each answer]

truth f(x) = +1 because . . .

• symmetry ⇔ +1
• (black or white count = 3) or (black count = 4 and middle-top black) ⇔ +1

truth f(x) = −1 because . . .

• left-top black ⇔ −1
• middle column contains at most 1 black and right-top white ⇔ −1

all valid reasons, your adversarial teacher can always call you ‘didn’t learn’. :-(

(22)

No Free Lunch Theorem

Without any assumptions on the learning problem on hand, all learning algorithms perform the same.

No algorithm is better for all learning problems
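One informal way to see the statement is to count. The sketch below assumes binary labels and treats every possible labeling of the unseen points as an equally plausible target; under that assumption, any fixed set of predictions gets exactly half of the unseen labels right on average. The function name and the two example predictors are hypothetical, and this is a toy illustration rather than a proof of the theorem.

```python
from itertools import product

def average_unseen_accuracy(predictions):
    """Average accuracy of fixed ±1 predictions over all possible targets.

    predictions: what some learned g outputs on the unseen points.
    Every labeling of those points by the target f is counted once.
    """
    n = len(predictions)
    targets = list(product([-1, +1], repeat=n))
    total_correct = sum(
        1 for f in targets for p, y in zip(predictions, f) if p == y
    )
    return total_correct / (n * len(targets))

# Two different 'algorithms' that disagree on three unseen points.
print(average_unseen_accuracy([+1, +1, +1]))
print(average_unseen_accuracy([-1, +1, -1]))
```

Both calls return the same value, which is why assumptions about the learning problem are needed before one algorithm can be called better than another.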

(23)

Gender Classification Problem

[Figure: a photo to classify (?) as Male or Female]

(24)

Gender Classification: Lesson 1

[Figure: query photo (?) with ten labeled example photos: Male, Female, Female, Male, Male, Female, Female, Male, Female, Male]

(25)

Gender Classification: Lesson 2

[Figure: query photo answered as Male, with ten labeled example photos: Male, Female, Female, Male, Male, Female, Female, Male, Female, Male]

(26)

Gender Classification: Lesson 3

[Figure: query photo answered as Male, with ten labeled example photos: Male, Female, Female, Male, Female, Female, Male, Female, Male, and one further Male]

(27)

Gender Classification: Lesson 4

[Figure: query photo (?) with six labeled example photos: Male, Female, Female, Male, Male, Female]

(28)

Nearest Neighbors

Intuition

• memorize everything
• predict with the closest case

Algorithm

• Training: memorize all examples (picture, label)
• Prediction:
  • find K nearest neighbors
  • let them vote!
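The two steps above can be sketched as follows. The slide works with pictures; here each picture is assumed to be already summarized as a short tuple of numbers, and Euclidean distance is assumed as the notion of ‘closest’. The toy examples and the ±1 labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(examples, query, k=3):
    """Predict a label for query by majority vote of its k nearest examples.

    examples: list of (features, label); 'training' is just keeping this list.
    """
    neighbors = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical memorized examples: (feature vector, label).
examples = [
    ((1.0, 1.0), +1), ((1.2, 0.8), +1), ((0.9, 1.1), +1),
    ((3.0, 3.0), -1), ((3.2, 2.9), -1), ((2.8, 3.1), -1),
]
print(knn_predict(examples, (1.1, 1.0), k=3))
print(knn_predict(examples, (3.0, 2.9), k=3))
```

An odd K avoids tied votes when there are two labels; how to choose K and the distance is left open by the slide.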

(29)

Apple Recognition Problem

• Is this a picture of an apple?
• We want to teach a class of 6 year olds.
• Gather photos from NY Apple Asso. and Google Image.

(30)

Our Fruit Class Begins

Teacher: How would you describe an apple? Michael?

Michael: I think apples are circular.

(Class): Apples are circular.

(31)

Our Fruit Class Continues

Teacher: Being circular is a good feature for the apples. However, if you only say circular, you could make several mistakes. What else can we say for an apple? Tina?

Tina: It looks like apples are red.

(Class): Apples are somewhat circular and somewhat red.

(32)

Our Fruit Class Continues

Teacher: Yes. Many apples are red. However, you could still make mistakes based on circular and red. Do you have any other suggestions, Joey?

Joey: Apples could also be green.

(Class): Apples are somewhat circular and somewhat red and possibly green.

(33)

Our Fruit Class Continues

Teacher: Yes. It seems that apples might be circular, red, green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica?

Jessica: Apples have stems at the top.

(Class): Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top.

(34)

Adaptive Boosting

ML and Life

• combine simple rules to approximate complex function (many heads are better than one)
• emphasize incorrect data for valuable information (again you can learn by correcting mistakes)

AdaBoost Algorithm

• Input: examples $\{(\text{picture } x_n, \text{label } y_n)\}_{n=1}^{N}$.
• For $t = 1, 2, \cdots, T$,
  • learn a simple rule $h_t$ from emphasized examples
  • get the confidence $w_t$ of such rule
  • emphasize the examples that do not agree with $h_t$.
• Output: weighted vote of the rules $\sum_{t=1}^{T} w_t h_t(x)$
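A compact sketch of the loop above. The slide does not say what a ‘simple rule’ is or how the confidence is computed; this sketch assumes threshold rules on a single feature (decision stumps) and the usual AdaBoost choices $w_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and exponential re-weighting. The toy data and all names are hypothetical.

```python
import math

def stump_predict(stump, x):
    """Simple rule: output s if feature i exceeds theta, otherwise -s."""
    i, theta, s = stump
    return s if x[i] > theta else -s

def train_stump(X, y, u):
    """Return (weighted error, stump) for the best stump under example weights u."""
    best = None
    for i in range(len(X[0])):
        values = sorted({x[i] for x in X})
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for theta in thresholds:
            for s in (+1, -1):
                stump = (i, theta, s)
                err = sum(w for x, t, w in zip(X, y, u) if stump_predict(stump, x) != t)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best

def adaboost(X, y, T=3):
    n = len(X)
    u = [1.0 / n] * n  # emphasis on each example, kept normalized
    ensemble = []
    for _ in range(T):
        eps, stump = train_stump(X, y, u)            # learn h_t from emphasized examples
        eps = min(max(eps, 1e-10), 1 - 1e-10)        # guard against log(0)
        w = 0.5 * math.log((1 - eps) / eps)          # confidence w_t of the rule
        ensemble.append((w, stump))
        # emphasize examples that disagree with h_t, de-emphasize the rest
        u = [ui * math.exp(-w * t * stump_predict(stump, x)) for ui, x, t in zip(u, X, y)]
        total = sum(u)
        u = [ui / total for ui in u]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the rules: sign of sum_t w_t * h_t(x)."""
    score = sum(w * stump_predict(stump, x) for w, stump in ensemble)
    return 1 if score > 0 else -1

# Toy 1-D data that no single stump classifies perfectly.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [+1, +1, -1, -1, +1, +1]
ensemble = adaboost(X, y, T=3)
print([predict(ensemble, x) for x in X])
```

On this toy set each individual stump misclassifies at least two points, while the weighted vote of three stumps can represent the middle band of −1 labels.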

(35)

Machine Learning Research

• What can machines learn? (application)
  • concrete applications (and data mining):
  • abstract setups: classification, regression, · · ·
• Why can machines learn? (theory)
  • theoretical paradigms: statistical learning, reinforcement learning, interactive learning, · · ·
  • generalization guarantees
• How can machines learn? (algorithm)
