(1)

Machine Learning Overviews and Applications

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering

National Taiwan University

IRTG TAC-ICT Meeting, 01/11/2016

materials mostly taken from my “Learning from Data” book, my “Machine Learning Foundations” free online course, and works from the NTU CLLab and NTU KDDCup teams

(2)

About Me

Hsuan-Tien Lin

• Associate Professor, Dept. of CSIE, National Taiwan University

• Leader of the Computational Learning Laboratory

• Co-author of the textbook “Learning from Data: A Short Course” (often an ML best seller on Amazon)

• Instructor of the NTU-Coursera Mandarin-teaching ML Massive Open Online Courses

• “Machine Learning Foundations”: www.coursera.org/course/ntumlone

• “Machine Learning Techniques”: www.coursera.org/course/ntumltwo

(3)

What is Machine Learning

(4)

From Learning to Machine Learning

learning: acquiring skill with experience accumulated from observations

observations → learning → skill

machine learning: acquiring skill with experience accumulated/computed from data

data → ML → skill

What is skill?

(5)

A More Concrete Definition

skill ⇔ improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

An Application in Computational Finance

stock data → ML → more investment gain

Why use machine learning?

(6)

Yet Another Application: Tree Recognition

• ‘define’ trees and hand-program: difficult

• learn from data (observations) and recognize: a 3-year-old can do so

• an ‘ML-based tree recognition system’ can be easier to build than a hand-programmed system

ML: an alternative route to build complicated systems

(7)

The Machine Learning Route

ML: an alternative route to build complicated systems

Some Use Scenarios

• when humans cannot program the system manually: navigating on Mars

• when humans cannot ‘define the solution’ easily: speech/visual recognition

• when rapid decisions are needed that humans cannot make: high-frequency trading

• when needing to be user-oriented at a massive scale: consumer-targeted marketing

Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)

(8)

Key Essence of Machine Learning

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

1. exists some ‘underlying pattern’ to be learned: so ‘performance measure’ can be improved

2. but no programmable (easy) definition: so ‘ML’ is needed

3. somehow there is data about the pattern: so ML has some ‘inputs’ to learn from

key essence: helps decide whether to use ML

(9)

Snapshot Applications of Machine Learning

(10)

Communication

data → ML → skill, for 4G LTE communication

• data:
  • channel information (the channel matrix representing mutual information)
  • the configuration (precoding, modulation, etc.) that reaches the highest throughput

• skill: predict the best configuration for the base station in a new environment

previous work of my student Yi-An Lin as an intern @ MTK

(11)

Advertisement

data → ML → skill, for cross-screen ad placement

• data:
  • customer information
  • device information
  • ad information

• skill: predict the best ad to show to the user across devices so that she/he clicks

ongoing work of my collaboration with Appier
http://technews.tw/2015/11/03/appier-asia/

(12)

Daily Needs: Food, Clothing, Housing, Transportation

data → ML → skill

1. Food (Sadilek et al., 2013)
  • data: Twitter data (words + location)
  • skill: tell the food-poisoning likelihood of a restaurant properly

2. Clothing (Abu-Mostafa, 2012)
  • data: sales figures + client surveys
  • skill: give good fashion recommendations to clients

3. Housing (Tsanas and Xifara, 2012)
  • data: characteristics of buildings and their energy load
  • skill: predict the energy load of other buildings closely

4. Transportation (Stallkamp et al., 2012)
  • data: some traffic sign images and meanings
  • skill: recognize traffic signs accurately

ML is everywhere!

(13)

Education

data → ML → skill

• data: students’ records on quizzes in a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question

A Possible ML Solution

answer correctly ≈ ⟦recent strength of student > difficulty of question⟧

• give ML 9 million records from 3,000 students
• ML determines (reverse-engineers) strength and difficulty automatically

key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
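
The strength-vs-difficulty rule above can be phrased as a one-parameter-per-student, one-parameter-per-question logistic model, in the spirit of item response theory. Below is a minimal sketch, assuming hypothetical arrays student_ids, question_ids, and 0/1 correct for the records; it illustrates the idea only, not the actual champion system:

```python
import numpy as np

def fit_strength_difficulty(student_ids, question_ids, correct,
                            n_students, n_questions, lr=0.1, epochs=10):
    """Fit P(correct) = sigmoid(strength[s] - difficulty[q]) with SGD."""
    strength = np.zeros(n_students)
    difficulty = np.zeros(n_questions)
    for _ in range(epochs):
        for s, q, y in zip(student_ids, question_ids, correct):
            p = 1.0 / (1.0 + np.exp(difficulty[q] - strength[s]))
            grad = p - y                # gradient of the log loss
            strength[s] -= lr * grad    # correct answers raise strength
            difficulty[q] += lr * grad  # correct answers lower difficulty
    return strength, difficulty
```

Predicting ‘answer correctly’ for a new (student, question) pair then matches the slide’s rule: check whether strength[s] > difficulty[q], i.e. a predicted probability above 1/2.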

(14)

Entertainment: Recommender System (1/2)

data → ML → skill

• data: how many users have rated some movies
• skill: predict how a user would rate an unrated movie

A Hot Problem

competition held by Netflix in 2006
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• 10% improvement = 1 million dollar prize

similar competition (movies → songs) held by Yahoo! in KDDCup 2011
• 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines learn our preferences?

(15)

Entertainment: Recommender System (2/2)

Match movie and viewer factors → predicted rating

• movie factors: comedy content, action content, blockbuster?, Tom Cruise in it?
• viewer factors: likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?
• add contributions from each factor

A Possible ML Solution

• pattern: rating ← viewer/movie factors
• learning: known ratings → learned factors → unknown rating prediction

key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
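
‘known ratings → learned factors’ is classic matrix factorization. Here is a minimal SGD sketch under assumed inputs: a list of (user, movie, rating) triples and a chosen number of latent factors k. The actual KDDCup system blended many far more elaborate models:

```python
import numpy as np

def factorize(ratings, n_users, n_movies, k=10, lr=0.01, reg=0.1, epochs=20):
    """Learn factors so that U[u] @ V[m] approximates the known ratings."""
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n_users, k))    # viewer factors
    V = 0.1 * rng.standard_normal((n_movies, k))   # movie factors
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - U[u] @ V[m]          # residual on one known rating
            U[u] += lr * (err * V[m] - reg * U[u])
            V[m] += lr * (err * U[u] - reg * V[m])
    return U, V
```

A predicted rating for an unrated (viewer, movie) pair is then the inner product U[u] @ V[m]: exactly ‘adding contributions from each factor’.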

(16)

Components of Machine Learning

(17)

Components of Learning: Metaphor Using Credit Approval

Applicant Information

age: 23 years
gender: female
annual salary: NTD 1,000,000
year in residence: 1 year
year in job: 0.5 year
current debt: NTD 200,000

unknown pattern to be learned: ‘approve credit card good for bank?’

(18)

Formalize the Learning Problem

Basic Notations

• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
• data ⇔ training examples: D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in bank)
• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)

{(x_n, y_n)} from f → ML → g
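
For readers who think in code, the notation maps onto simple types. A minimal sketch, where the concrete choices of X as feature vectors and Y as ±1 labels are assumptions for illustration only:

```python
from typing import Callable, List, Tuple

X = List[float]                  # input space X: one customer application
Y = int                          # output space Y: +1 (good) / -1 (bad)
Example = Tuple[X, Y]
Dataset = List[Example]          # D = {(x_1, y_1), ..., (x_N, y_N)}
Hypothesis = Callable[[X], Y]    # a candidate formula h: X -> Y

def learning_algorithm(D: Dataset) -> Hypothesis:
    """A learning algorithm computes some g: X -> Y from D."""
    # placeholder: predict the majority label seen in D
    majority = 1 if sum(y for _, y in D) >= 0 else -1
    return lambda x: majority
```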

(19)

Learning Flow for Credit Approval

unknown target function f : X → Y (ideal credit approval formula)
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f, but possibly different from f (perfection ‘impossible’ when f unknown)

What does g look like?

(20)

The Learning Model

training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A, choosing from hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

assume g ∈ H = {h_k}, i.e. approving if
• h_1: annual salary > NTD 800,000
• h_2: debt > NTD 100,000 (really?)
• h_3: year in job ≤ 2 (really?)

hypothesis set H:
• can contain good or bad hypotheses
• up to A to pick the ‘best’ one as g

learning model = A and H
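
One concrete (and deliberately naive) learning algorithm A for a finite H: pick the hypothesis with the fewest mistakes on D. A minimal sketch, with the three thresholds above written as hypothetical formulas over an applicant record:

```python
def algorithm_A(H, D):
    """Return the hypothesis in H with the lowest training error on D."""
    def training_error(h):
        return sum(1 for x, y in D if h(x) != y) / len(D)
    return min(H, key=training_error)

# hypothetical candidate formulas over an applicant record (a dict)
H = [
    lambda x: +1 if x["annual_salary"] > 800_000 else -1,   # h_1
    lambda x: +1 if x["debt"] > 100_000 else -1,            # h_2 (really?)
    lambda x: +1 if x["year_in_job"] <= 2 else -1,          # h_3 (really?)
]

# toy historical records: (applicant, good-for-bank label)
D = [({"annual_salary": 1_000_000, "debt": 200_000, "year_in_job": 0.5}, +1),
     ({"annual_salary":   500_000, "debt": 900_000, "year_in_job": 3.0}, -1)]
g = algorithm_A(H, D)   # A picks the 'best' h_k as g
```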

(21)

Practical Definition of Machine Learning

unknown target function f : X → Y (ideal credit approval formula)
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A, with hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

machine learning: use data to compute hypothesis g that approximates target f

(22)

Machine Learning Research in CLLab

(23)

Making Machine Learning Realistic: Now

[flow diagram: an oracle (truth f(x) + noise e(x)) generates data (instance x_n, label y_n); the learning algorithm, using the learning model {h(x)}, produces a good learning system g(x)]

CLLab Works: Loosen the Limits of ML

1. cost-sensitive classification: limited protocol (classification) + auxiliary info. (cost)
2. multi-label classification: limited protocol (classification) + structure info. (label relation)
3. active learning: limited protocol (unlabeled data) + requested info. (query)
4. online learning: limited protocol (streaming data) + feedback info. (loss)

next: (1) cost-sensitive classification

(24)

Which Digit Did You Write?

one (1), two (2), or three (3)?

a classification problem: grouping “pictures” into different “categories”

(25)

Traditional Classification Problem

[flow diagram: an oracle (truth f(x) + noise e(x)) generates data (instance x_n, label y_n); the learning algorithm, using the learning model {g_α(x)}, produces a good learning system g(x) ≈ f(x)]

1. input: a batch of examples (digit x_n, intended label y_n)
2. desired output: some g(x) such that g(x) ≠ y seldom for future examples (x, y)
3. evaluation for some digit (x = [digit image], y = 2): g(x) = 1 is wrong; g(x) = 2 is right; g(x) = 3 is wrong

Are all the wrongs equally bad?
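
‘g(x) ≠ y seldom’ is measured by the average 0/1 error. A minimal sketch, where g and the example list are whatever classifier and test set are at hand:

```python
def zero_one_error(g, examples):
    """Fraction of (x, y) pairs where the prediction g(x) misses y."""
    return sum(1 for x, y in examples if g(x) != y) / len(examples)
```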

(26)

What is the Status of the Patient?

H1N1-infected, cold-infected, or healthy?

another classification problem: grouping “patients” into different “statuses”

(27)

Patient Status Prediction

error measure = society cost

actual \ predicted    H1N1    cold    healthy
H1N1                     0    1000     100000
cold                   100       0       3000
healthy                100      30          0

• H1N1 mis-predicted as healthy: very high cost
• cold mis-predicted as healthy: high cost
• cold correctly predicted as cold: no cost

human doctors consider the costs of decisions; can computer-aided diagnosis do the same?
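
Given such a cost table and estimated status probabilities, a cost-sensitive predictor picks the status with the lowest expected cost rather than the highest probability. A minimal sketch using the slide’s cost matrix; the probability estimates in the example are a placeholder assumption:

```python
import numpy as np

# COST[actual][predicted], statuses ordered as H1N1, cold, healthy
COST = np.array([[     0,  1000, 100000],
                 [   100,     0,   3000],
                 [   100,    30,      0]])

def cost_sensitive_predict(p):
    """p[i]: estimated probability that the true status is i."""
    expected = p @ COST        # expected cost of each possible prediction
    return int(np.argmin(expected))

# e.g. with estimates (0.2, 0.3, 0.5), predicting 'healthy' risks
# 0.2 * 100000, so the minimum-expected-cost prediction is H1N1 (index 0)
print(cost_sensitive_predict(np.array([0.2, 0.3, 0.5])))
```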

(28)

Our Contributions

                  binary                    multiclass
regular           well-studied              well-studied
cost-sensitive    known (Zadrozny, 2003)    ongoing (our works)

theoretic, algorithmic and empirical studies of cost-sensitive classification

• ICML 2010: a theoretically-supported algorithm with superior experimental results

• BIBM 2011: application to real-world bacteria classification with promising experimental results

• KDD 2012: a cost-sensitive and error-sensitive methodology (achieving both low cost and few wrongs)

(29)

Making Machine Learning Realistic: Next

[flow diagram: a teacher and the learning algorithm interact in rounds; at round t the algorithm issues query x(t) and guess ŷ(t), receives cost c(t), and updates its knowledge using the learning model]

Interactive Machine Learning

1. environment
2. exploration
3. dynamic
4. partial feedback

let us teach machines as “easily” as teaching students

(30)

Case: Interactive Learning for Online Advertisement

Traditional Machine Learning for Online Advertisement

• data gathering: system randomly shows ads to some previous users
• expert building: system analyzes the gathered data to determine the best (fixed) strategy

Interactive Machine Learning for Online Advertisement

• environment: system serves online users with profiles
• exploration: system decides to show an ad to the user
• dynamic: system receives data from real-time user clicks
• partial feedback: system receives a reward only if the user clicks
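
The four properties above are exactly the bandit setting. A minimal ε-greedy sketch for ad selection, a hypothetical illustration with click feedback; real systems such as the one described here are contextual bandits that also condition on the user profile:

```python
import random

class EpsilonGreedyAds:
    """Choose ads online; the reward is observed only for the ad shown."""

    def __init__(self, n_ads, epsilon=0.1):
        self.epsilon = epsilon
        self.shows = [0] * n_ads    # times each ad was shown
        self.clicks = [0] * n_ads   # clicks each ad received

    def choose(self):
        # exploration: occasionally try a random ad
        if random.random() < self.epsilon:
            return random.randrange(len(self.shows))
        # exploitation: otherwise show the ad with the best click rate
        rates = [c / s if s else 0.0
                 for c, s in zip(self.clicks, self.shows)]
        return max(range(len(rates)), key=rates.__getitem__)

    def update(self, ad, clicked):
        # partial feedback: nothing is learned about the ads not shown
        self.shows[ad] += 1
        self.clicks[ad] += int(clicked)
```

Usage per round: ad = bandit.choose(), serve the ad, then bandit.update(ad, clicked) with the real-time click outcome.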

(31)

ICML 2012 Exploration & Exploitation Challenge

Interactive Machine Learning for Online Advertisement

• environment: system serves online users with profiles
• exploration: system decides to show an ad to the user
• dynamic: system receives data from real-time user clicks
• partial feedback: system receives a reward only if the user clicks

NTU beats two MIT teams to be the phase-1 winner!

ongoing collaboration with Appier for online advertisement

(32)

More on KDDCup

(33)

What is KDDCup?

Background

• an annual competition on KDD (knowledge discovery and data mining)
• organized by ACM SIGKDD, starting from 1997; now the most prestigious data mining competition
• usually lasts 3–4 months
• participants include famous research labs (IBM, AT&T) and top universities (Stanford, Berkeley)

Aim

• bridge the gap between theory and practice, such as
  • scalability and efficiency
  • missing data and noise
  • heterogeneous data
  • unbalanced data
• define the state-of-the-art

(34)

KDDCups: 2008 to 2015 (1/4)

2008
• organizer: Siemens
• topic: breast cancer prediction (medical)
• data size: 0.2M
• teams: > 200
• NTU: co-champion with IBM

2009
• organizer: Orange
• topic: customer behavior prediction (business)
• data size: 0.1M
• teams: > 400
• NTU: 3rd place in the slow track

(35)

KDDCups: 2008 to 2015 (2/4)

2010
• organizer: PSLC Data Shop
• topic: student performance prediction (education)
• data size: 30M
• teams: > 100
• NTU: champion and student-team champion

2011
• organizer: Yahoo!
• topic: music preference prediction (recommendation)
• data size: 300M
• teams: > 1000
• NTU: double champions

(36)

KDDCups: 2008 to 2015 (3/4)

2012
• organizer: Tencent
• topic: web user behavior prediction (Internet)
• data size: 150M
• teams: > 800
• NTU: champion of track 2

2013
• organizer: Microsoft Research
• topic: paper-author relationship prediction (academia)
• data size: 600M
• teams: > 500
• NTU: double champions

(37)

KDDCups: 2008 to 2015 (4/4)

2014

organizer: DonorsChoose

topic: charity proposal recommendation (social work)

data size: 850M

teams: > 450

NTU: top 20

2015

organizer: XuetangX

topic: dropout student prediction (online education)

data size: 100M

teams: > 800

NTU:

4th place

(38)

Our Systematic Steps in KDDCups

1. data analysis (on part of the data)
  • calculate statistics to identify outliers
  • visualize data to see trends/patterns

2. feature extraction
  • feature design by humans: common encodings, domain knowledge, etc.
  • feature learning by machines: sparse coding, matrix factorization, deep learning, etc.

3. model learning
  • model exploration (trial-and-evaluate) to improve performance
  • model selection to avoid overfitting

4. hypotheses blending (towards a big ensemble; see the sketch after this list)
  • careful non-linear blending to be sophisticated
  • careful linear blending (voting/averaging) to be robust

you can also follow those steps for your own applications, except for maybe the “big ensemble”!
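
Linear blending in step 4 can be as simple as a weighted average of each model’s predictions, with weights fit on held-out data. A minimal least-squares sketch, assuming a hypothetical matrix P of per-model validation predictions and validation labels y:

```python
import numpy as np

def fit_blend_weights(P, y):
    """P: (n_examples, n_models) validation predictions; y: true labels.
    Least-squares weights for a linear blend of the models."""
    w, *_ = np.linalg.lstsq(P, y, rcond=None)
    return w

def blend(P_new, w):
    """Weighted average of the models' predictions on new data."""
    return P_new @ w
```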

(39)

That’s about all. Thank you!
