Machine Learning Overviews and Applications
Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw
Department of Computer Science
& Information Engineering
National Taiwan University
IRTG TAC-ICT Meeting, 01/11/2016
materials mostly taken from my “Learning from Data” book, my “Machine Learning Foundations” free online course, and works from NTU CLLab and NTU KDDCup teams
About Me
Hsuan-Tien Lin
• Associate Professor, Dept. of CSIE, National Taiwan University
• Leader of the Computational Learning Laboratory
• Co-author of the textbook “Learning from Data: A Short Course” (often the best-selling ML book on Amazon)
• Instructor of the NTU-Coursera Mandarin-taught ML Massive Open Online Courses:
• “Machine Learning Foundations”:
www.coursera.org/course/ntumlone
• “Machine Learning Techniques”:
www.coursera.org/course/ntumltwo
What is Machine Learning
From Learning to Machine Learning
learning: acquiring skill with experience accumulated from observations
  observations → learning → skill
machine learning: acquiring skill with experience accumulated/computed from data
  data → ML → skill
What is skill?
What is Machine Learning
A More Concrete Definition
skill ⇔ improving some performance measure (e.g. prediction accuracy)
machine learning: improving some performance measure with experience computed from data
  data → ML → improved performance measure
An Application in Computational Finance
stock data → ML → more investment gain
Why use machine learning?
Yet Another Application: Tree Recognition
• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• an ‘ML-based tree recognition system’ can be easier to build than a hand-programmed system
The Machine Learning Route
ML: an alternative route to build complicated systems
Some Use Scenarios
• when humans cannot program the system manually: navigating on Mars
• when humans cannot ‘define the solution’ easily: speech/visual recognition
• when rapid decisions are needed that humans cannot make: high-frequency trading
• when user orientation at massive scale is needed: consumer-targeted marketing
Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)
What is Machine Learning
Key Essence of Machine Learning
machine learning: improving some performance measure with experience computed from data
  data → ML → improved performance measure
1 exists some ‘underlying pattern’ to be learned, so the ‘performance measure’ can be improved
2 but no programmable (easy) definition, so ‘ML’ is needed
3 somehow there is data about the pattern, so ML has some ‘inputs’ to learn from
key essence: helps decide whether to use ML
Snapshot Applications of Machine Learning
Communication
data → ML → skill
for 4G LTE communication
• data:
  • channel information (the channel matrix representing mutual information)
  • configuration (precoding, modulation, etc.) that reaches the highest throughput
• skill: predict the best configuration for the base station in a new environment
previous work of my student Yi-An Lin as an intern @ MTK
Advertisement
data → ML → skill
for cross-screen ad placement
• data:
  • customer information
  • device information
  • ad information
• skill: predict the best ad to show to the user across devices so that she/he clicks
ongoing work in my collaboration with Appier: http://technews.tw/2015/11/03/appier-asia/
Daily Needs: Food, Clothing, Housing, Transportation
data → ML → skill
1 Food (Sadilek et al., 2013)
  • data: Twitter data (words + location)
  • skill: tell the food-poisoning likeliness of a restaurant properly
2 Clothing (Abu-Mostafa, 2012)
  • data: sales figures + client surveys
  • skill: give good fashion recommendations to clients
3 Housing (Tsanas and Xifara, 2012)
  • data: characteristics of buildings and their energy load
  • skill: predict the energy load of other buildings closely
4 Transportation (Stallkamp et al., 2012)
  • data: some traffic-sign images and meanings
  • skill: recognize traffic signs accurately
ML is everywhere!
Education
data → ML → skill
• data: students’ records on quizzes in a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question
A Possible ML Solution
• answer correctly ≈ ⟦recent strength of student > difficulty of question⟧
• give ML 9 million records from 3,000 students
• ML determines (reverse-engineers) strength and difficulty automatically
key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
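As an illustration, here is a minimal sketch of how such reverse-engineering could work, assuming a simple logistic model in which the probability of a correct answer grows with (strength − difficulty); the toy records, learning rate, and update rule are illustrative assumptions, not the actual KDDCup 2010 system.

```python
import numpy as np

# toy records (student, question, correct?); illustrative data only
records = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1), (2, 0, 0), (2, 1, 0)]
n_students, n_questions = 3, 2

strength = np.zeros(n_students)     # one latent number per student
difficulty = np.zeros(n_questions)  # one latent number per question
lr = 0.1

for epoch in range(200):
    for s, q, y in records:
        # model: P(correct) = sigmoid(strength - difficulty)
        p = 1.0 / (1.0 + np.exp(-(strength[s] - difficulty[q])))
        grad = y - p                # gradient of the log-likelihood
        strength[s] += lr * grad    # correct answers raise strength
        difficulty[q] -= lr * grad  # wrong answers raise difficulty

# a student is predicted to answer correctly when strength > difficulty
print(strength, difficulty)
```

With enough records, the learned numbers order students by strength and questions by difficulty, which is exactly the ‘reverse-engineering’ described above.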
Entertainment: Recommender System (1/2)
data → ML → skill
• data: how many users have rated some movies
• skill: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
How can machines learn our preferences?
Snapshot Applications of Machine Learning
Entertainment: Recommender System (2/2)
[figure: match movie and viewer factors; movie factors (comedy content, action content, blockbuster?, Tom Cruise in it?) are matched against viewer factors (likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?), and the predicted rating adds contributions from each factor]
A Possible ML Solution
• pattern: rating ← viewer/movie factors
• learning: known ratings → learned factors → unknown rating prediction
key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
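A minimal matrix-factorization sketch of the ‘known ratings → learned factors’ step, assuming plain stochastic gradient descent on squared rating error; the toy ratings, factor dimension k, and learning rate are illustrative assumptions, not the actual KDDCup 2011 system.

```python
import numpy as np

# toy ratings (viewer, movie, rating); illustrative data only
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (2, 1, 2.0)]
n_viewers, n_movies, k = 3, 2, 2   # k latent factors per viewer/movie

rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(n_viewers, k))  # viewer factors
M = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors
lr = 0.05

for epoch in range(500):
    for u, m, r in ratings:
        pred = V[u] @ M[m]         # add contributions from each factor
        err = r - pred
        # simultaneous SGD step on both factor vectors
        V[u], M[m] = V[u] + lr * err * M[m], M[m] + lr * err * V[u]

# predict how viewer 2 would rate (unrated) movie 0
print(V[2] @ M[0])
```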
Components of Machine Learning
Components of Learning: Metaphor Using Credit Approval
Applicant Information
  age: 23 years
  gender: female
  annual salary: NTD 1,000,000
  year in residence: 1 year
  year in job: 0.5 year
  current debt: 200,000
unknown pattern to be learned:
‘approve credit card good for bank?’
Formalize the Learning Problem
Basic Notations
• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving the credit card)
• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
• data ⇔ training examples: D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in the bank)
• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)
{(x_n, y_n)} from f → ML → g
Learning Flow for Credit Approval
unknown target function f : X → Y (ideal credit approval formula)
  ↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
  ↓
learning algorithm A
  ↓
final hypothesis g ≈ f (‘learned’ formula to be used)
• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f, but possibly different from f (perfection ‘impossible’ when f is unknown)
What does g look like?
The Learning Model
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
  ↓
learning algorithm A (with hypothesis set H: the set of candidate formulas)
  ↓
final hypothesis g ≈ f (‘learned’ formula to be used)
• assume g ∈ H = {h_k}, i.e. approving if
  • h_1: annual salary > NTD 800,000
  • h_2: debt > NTD 100,000 (really?)
  • h_3: year in job ≤ 2 (really?)
• hypothesis set H:
  • can contain good or bad hypotheses
  • up to A to pick the ‘best’ one as g
learning model = A and H
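A minimal sketch of ‘learning model = A and H’, assuming A simply returns the hypothesis in H with the fewest mistakes on D; the three threshold rules mirror h_1 to h_3 above, and the field names and toy records are illustrative assumptions.

```python
# candidate formulas H: each maps an applicant record to approve (True) or not
H = [
    lambda a: a["salary"] > 800_000,   # h1: annual salary > NTD 800,000
    lambda a: a["debt"] > 100_000,     # h2: debt > NTD 100,000 (really?)
    lambda a: a["year_in_job"] <= 2,   # h3: year in job <= 2 (really?)
]

# training examples D: (applicant x_n, good-for-bank label y_n); toy data
D = [
    ({"salary": 1_000_000, "debt": 200_000, "year_in_job": 0.5}, True),
    ({"salary": 400_000,   "debt": 50_000,  "year_in_job": 3.0}, False),
    ({"salary": 900_000,   "debt": 10_000,  "year_in_job": 1.0}, True),
]

# learning algorithm A: pick the h in H with the fewest mistakes on D
def A(H, D):
    return min(H, key=lambda h: sum(h(x) != y for x, y in D))

g = A(H, D)  # final hypothesis g, hopefully ≈ f
```

Real algorithms search far larger hypothesis sets, but the division of labor is the same: H supplies the candidates, A picks g.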
Practical Definition of Machine Learning
unknown target function f : X → Y (ideal credit approval formula)
  ↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
  ↓
learning algorithm A (with hypothesis set H: the set of candidate formulas)
  ↓
final hypothesis g ≈ f (‘learned’ formula to be used)
machine learning: use data to compute hypothesis g that approximates target f
Machine Learning Research in CLLab
Making Machine Learning Realistic: Now
[figure: oracle (truth f(x) + noise e(x)) → data (instances x_n, labels y_n) → learning algorithm with learning model {h(x)} → good learning system g(x); arrows (1)-(4) mark the places where the limits of ML can be loosened]
CLLab Works: Loosen the Limits of ML
1 cost-sensitive classification: limited protocol (classification) + auxiliary info. (cost)
2 multi-label classification: limited protocol (classification) + structure info. (label relation)
3 active learning: limited protocol (unlabeled data) + requested info. (query)
4 online learning: limited protocol (streaming data) + feedback info. (loss)
next: (1) cost-sensitive classification
Which Digit Did You Write?
[figure: a handwritten digit; is it a one (1), a two (2), or a three (3)?]
a classification problem: grouping “pictures” into different “categories”
Traditional Classification Problem
[figure: oracle (truth f(x) + noise e(x)) → data (instances x_n, labels y_n) → learning algorithm with learning model {g_α(x)} → good learning system g(x) ≈ f(x)]
1 input: a batch of examples (digit x_n, intended label y_n)
2 desired output: some g(x) such that g(x) ≠ y seldom for future examples (x, y)
3 evaluation for some digit (x = ⟨image of a handwritten 2⟩, y = 2): g(x) = 1: wrong; 2: right; 3: wrong
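In code, the evaluation above is the usual 0/1 error; a minimal sketch (the function name is an illustrative choice):

```python
# 0/1 error: every mistake counts the same, whether g says 1 or 3 for a true 2
def zero_one_error(g, examples):
    return sum(g(x) != y for x, y in examples) / len(examples)
```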
Are all the wrongs equally bad?
What is the Status of the Patient?
[figure: a patient; H1N1-infected, cold-infected, or healthy?]
another classification problem: grouping “patients” into different “statuses”
Patient Status Prediction
error measure = societal cost

actual \ predicted      H1N1    cold    healthy
H1N1                       0    1000     100000
cold                     100       0       3000
healthy                  100      30          0
• H1N1 mis-predicted as healthy: very high cost
• cold mis-predicted as healthy: high cost
• cold correctly predicted as cold: no cost
human doctors consider the costs of their decisions; can computer-aided diagnosis do the same?
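One way computer-aided diagnosis could take these costs into account, as a minimal sketch: assume the classifier outputs a probability for each status and predict the status with the lowest expected cost. The probability vector below is an illustrative assumption.

```python
import numpy as np

statuses = ["H1N1", "cold", "healthy"]
# cost[actual][predicted], from the table above
cost = np.array([[0, 1000, 100000],
                 [100, 0, 3000],
                 [100, 30, 0]])

def cost_sensitive_predict(p):
    """p[i] = estimated P(actual status = statuses[i]); pick min expected cost."""
    expected = p @ cost            # expected cost of each possible prediction
    return statuses[int(np.argmin(expected))]

# with p = (0.2, 0.3, 0.5), plain argmax would say 'healthy', but the expected
# costs are (80, 215, 20900), so the cost-sensitive choice is 'H1N1'
print(cost_sensitive_predict(np.array([0.2, 0.3, 0.5])))
```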
Our Contributions
                 binary                   multiclass
regular          well-studied             well-studied
cost-sensitive   known (Zadrozny, 2003)   ongoing (our works)
theoretic, algorithmic and empirical studies of cost-sensitive classification
• ICML 2010: a theoretically supported algorithm with superior experimental results
• BIBM 2011: application to real-world bacteria classification with promising experimental results
• KDD 2012: a cost-sensitive and error-sensitive methodology (achieving both low cost and few wrongs)
Making Machine Learning Realistic: Next
[figure: interactive loop; the learning algorithm with its learning model sends query x(t) & guess ŷ(t) to a teacher, receives cost c(t), and accumulates knowledge]
Interactive Machine Learning
1 environment
2 exploration
3 dynamic
4 partial feedback
let us teach machines as “easily” as teaching students
Case: Interactive Learning for Online Advertisement
Traditional Machine Learning for Online Advertisement
• data gathering: system randomly shows ads to some previous users
• expert building: system analyzes the gathered data to determine the best (fixed) strategy
Interactive Machine Learning for Online Advertisement
• environment: system serves online users with profiles
• exploration: system decides to show an ad to the user
• dynamic: system receives data from real-time user clicks
• partial feedback: system receives a reward only if the user clicks (see the sketch below)
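The sketch referenced above: a minimal ε-greedy bandit loop covering the four ingredients, assuming a fixed set of ads and click-rate estimates updated from partial feedback; the numbers and simulated click are illustrative, not the actual challenge setting.

```python
import random

n_ads = 5
shows = [0] * n_ads    # times each ad was shown
clicks = [0] * n_ads   # times each ad was clicked
epsilon = 0.1          # exploration rate

def choose_ad():
    # exploration: sometimes try a random ad;
    # exploitation: otherwise show the best click rate so far
    if random.random() < epsilon:
        return random.randrange(n_ads)
    return max(range(n_ads),
               key=lambda a: clicks[a] / shows[a] if shows[a] else 1.0)

def update(ad, clicked):
    # partial feedback: we only learn about the ad we actually showed
    shows[ad] += 1
    clicks[ad] += int(clicked)

# one interaction: serve a user, show an ad, observe a (simulated) click
ad = choose_ad()
update(ad, clicked=random.random() < 0.05)  # illustrative 5% click rate
```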
ICML 2012 Exploration & Exploitation Challenge
NTU beats two MIT teams to be the phase 1 winner!
ongoing collaboration with Appier for online advertisement
More on KDDCup
What is KDDCup?
Background
• an annual competition on KDD (knowledge discovery and data mining)
• organized by ACM SIGKDD, starting from 1997; now the most prestigious data mining competition
• usually lasts 3-4 months
• participants include famous research labs (IBM, AT&T) and top universities (Stanford, Berkeley)
Aim
• bridge the gap between theory and practice, such as
  • scalability and efficiency
  • missing data and noise
  • heterogeneous data
  • unbalanced data
• define the state-of-the-art
KDDCups: 2008 to 2015 (1/4)
2008
• organizer: Siemens
• topic: breast cancer prediction (medical)
• data size: 0.2M
• teams: > 200
• NTU: co-champion with IBM
2009
• organizer: Orange
• topic: customer behavior prediction (business)
• data size: 0.1M
• teams: > 400
• NTU: 3rd place of the slow track
KDDCups: 2008 to 2015 (2/4)
2010
• organizer: PSLC Data Shop
• topic: student performance prediction (education)
• data size: 30M
• teams: > 100
• NTU: champion and student-team champion
2011
• organizer: Yahoo!
• topic: music preference prediction (recommendation)
• data size: 300M
• teams: > 1000
• NTU: double champions
KDDCups: 2008 to 2015 (3/4)
2012
• organizer: Tencent
• topic: web user behavior prediction (Internet)
• data size: 150M
• teams: > 800
• NTU: champion of track 2
2013
• organizer: Microsoft Research
• topic: paper-author relationship prediction (academia)
• data size: 600M
• teams: > 500
• NTU: double champions
KDDCups: 2008 to 2015 (4/4)
2014
• organizer: DonorsChoose
• topic: charity proposal recommendation (social work)
• data size: 850M
• teams: > 450
• NTU: top 20
2015
• organizer: XuetangX
• topic: dropout student prediction (online education)
• data size: 100M
• teams: > 800
• NTU: 4th place
Our Systematic Steps in KDDCups
1 data analysis (on part of the data)
  • calculate statistics to identify outliers
  • visualize data to see trends/patterns
2 feature extraction
  • feature design by humans: common encodings, domain knowledge, etc.
  • feature learning by machines: sparse coding, matrix factorization, deep learning, etc.
3 model learning
  • model exploration (trial-and-evaluate) to improve performance
  • model selection to avoid overfitting
4 hypothesis blending (towards a big ensemble; see the sketch at the end)
  • careful non-linear blending to be sophisticated
  • careful linear blending (voting/averaging) to be robust
you can also follow these steps for your own applications, except for maybe the “big ensemble”!
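The blending sketch referenced in step 4: a minimal linear blend that averages (or weight-averages) several hypotheses’ predictions; the toy predictions and uniform weights are illustrative assumptions, not an actual KDDCup recipe.

```python
import numpy as np

def linear_blend(predictions, weights=None):
    """Average the real-valued predictions of several hypotheses.

    predictions: list of arrays, one per hypothesis, same length
    weights: optional blending weights (e.g. tuned on a validation set)
    """
    P = np.stack(predictions)  # shape: (n_hypotheses, n_examples)
    w = np.full(len(P), 1 / len(P)) if weights is None else np.asarray(weights)
    return w @ P               # weighted vote / average

# e.g. blend three models' predicted ratings for four test examples
g1 = np.array([3.9, 1.2, 4.5, 2.0])
g2 = np.array([4.1, 0.8, 4.8, 2.2])
g3 = np.array([3.7, 1.5, 4.2, 1.9])
print(linear_blend([g1, g2, g3]))
```

Uniform averaging is the robust default; the non-linear blending mentioned in step 4 would instead train another model on these predictions.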