(1)

Machine Learning Overview and Applications

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Computational Learning Lab (CLLab) Department of Computer Science

& Information Engineering

National Taiwan University

( 國立台灣大學資訊工程系計算學習實驗室)

materials mostly taken from my “Learning from Data” book, my “Machine Learning Foundations” free online course, and works from NTU CLLab and NTU KDDCup teams

(2)

The Learning Problem What is Machine Learning

What is Machine Learning

(3)

The Learning Problem What is Machine Learning

From Learning to Machine Learning

learning: acquiring skill with experience accumulated from observations

  observations → learning → skill

machine learning: acquiring skill with experience accumulated/computed from data

  data → ML → skill

What is skill?

(4)

The Learning Problem What is Machine Learning

A More Concrete Definition

skill ⇔ improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data

  data → ML → improved performance measure

An Application in Computational Finance

  stock data → ML → more investment gain

Why use machine learning?

(5)

The Learning Problem What is Machine Learning

Yet Another Application: Tree Recognition

• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be easier to build than a hand-programmed system

ML: an alternative route to build complicated systems

(6)

The Learning Problem What is Machine Learning

The Machine Learning Route

ML: an alternative route to build complicated systems

Some Use Scenarios
• when humans cannot program the system manually —navigating on Mars
• when humans cannot ‘define the solution’ easily —speech/visual recognition
• when rapid decisions are needed beyond human speed —high-frequency trading
• when user-oriented service is needed at massive scale —consumer-targeted marketing

Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)

(7)

The Learning Problem What is Machine Learning

Key Essence of Machine Learning

machine learning: improving some performance measure with experience computed from data

  data → ML → improved performance measure

1. exists some ‘underlying pattern’ to be learned —so ‘performance measure’ can be improved
2. but no programmable (easy) definition —so ‘ML’ is needed
3. somehow there is data about the pattern —so ML has some ‘inputs’ to learn from

key essence: helps decide whether to use ML

(8)

The Learning Problem Snapshot Applications of Machine Learning

Snapshot Applications of Machine Learning

(9)

The Learning Problem Snapshot Applications of Machine Learning

Daily Needs: Food, Clothing, Housing, Transportation

  data → ML → skill

1. Food (Sadilek et al., 2013)
   • data: Twitter data (words + location)
   • skill: tell the food-poisoning likelihood of a restaurant properly
2. Clothing (Abu-Mostafa, 2012)
   • data: sales figures + client surveys
   • skill: give good fashion recommendations to clients
3. Housing (Tsanas and Xifara, 2012)
   • data: characteristics of buildings and their energy load
   • skill: predict the energy load of other buildings closely
4. Transportation (Stallkamp et al., 2012)
   • data: some traffic sign images and meanings
   • skill: recognize traffic signs accurately

ML is everywhere!

(10)

The Learning Problem Snapshot Applications of Machine Learning

Education

  data → ML → skill

• data: students’ records on quizzes in a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question

A Possible ML Solution

  answer correctly ≈ ⟦recent strength of student > difficulty of question⟧

• give ML 9 million records from 3,000 students
• ML determines (reverse-engineers) strength and difficulty automatically

key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
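The strength-versus-difficulty idea above can be sketched as a tiny logistic model fit by stochastic gradient descent. This is only a minimal illustration, not the actual KDDCup system; the toy data, learning rate, and exact model form are assumptions.

```python
import math
import random

def fit_strength_difficulty(records, n_students, n_questions,
                            epochs=200, lr=0.5, seed=0):
    """Fit P(correct) = sigmoid(strength[s] - difficulty[q]) by SGD.

    records: list of (student_id, question_id, correct) with correct in {0, 1}.
    Both strength and difficulty are learned automatically from the data.
    """
    rng = random.Random(seed)
    strength = [0.0] * n_students
    difficulty = [0.0] * n_questions
    for _ in range(epochs):
        rng.shuffle(records)
        for s, q, y in records:
            p = 1.0 / (1.0 + math.exp(-(strength[s] - difficulty[q])))
            grad = y - p                 # gradient of the log-likelihood
            strength[s] += lr * grad
            difficulty[q] -= lr * grad
    return strength, difficulty

# Toy data (made up): student 0 answers everything correctly; student 1
# misses question 1, so question 1 should come out as the harder one.
records = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 25
strength, difficulty = fit_strength_difficulty(records, 2, 2)
```

Given (student, question, correct) triples, the two parameter vectors fall out of maximizing the likelihood, mirroring how the slide says strength and difficulty are reverse-engineered from the records.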

(11)

The Learning Problem Snapshot Applications of Machine Learning

Entertainment: Recommender System (1/2)

  data → ML → skill

• data: how many users have rated some movies
• skill: predict how a user would rate an unrated movie

A Hot Problem

• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines learn our preferences?

(12)

The Learning Problem Snapshot Applications of Machine Learning

Entertainment: Recommender System (2/2)

Match movie and viewer factors to get a predicted rating: a movie carries factors (comedy content, action content, blockbuster?, Tom Cruise in it?), a viewer carries matching factors (likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?), and the contributions from each factor are added up.

A Possible ML Solution

• pattern: rating ← viewer/movie factors
• learning: known ratings → learned factors → unknown rating prediction

key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011

(13)

The Learning Problem Components of Machine Learning

Components of Machine Learning

(14)

The Learning Problem Components of Machine Learning

Components of Learning: Metaphor Using Credit Approval

Applicant Information

  age                23 years
  gender             female
  annual salary      NTD 1,000,000
  year in residence  1 year
  year in job        0.5 year
  current debt       200,000

unknown pattern to be learned: ‘approve credit card good for bank?’

(15)

The Learning Problem Components of Machine Learning

Formalize the Learning Problem

Basic Notations

• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
• data ⇔ training examples: D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in bank)
• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)

  {(x_n, y_n)} from f → ML → g

(16)

The Learning Problem Components of Machine Learning

Learning Flow for Credit Approval

  unknown target function f : X → Y (ideal credit approval formula)
        ↓
  training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
        ↓
  learning algorithm A
        ↓
  final hypothesis g ≈ f (‘learned’ formula to be used)

• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f, but possibly different from f (perfection ‘impossible’ when f unknown)

What does g look like?

(17)

The Learning Problem Components of Machine Learning

The Learning Model

  training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
        ↓
  learning algorithm A  ←  hypothesis set H (set of candidate formulas)
        ↓
  final hypothesis g ≈ f (‘learned’ formula to be used)

assume g ∈ H = {h_k}, i.e. approving if
• h_1: annual salary > NTD 800,000
• h_2: debt > NTD 100,000 (really?)
• h_3: year in job ≤ 2 (really?)

hypothesis set H:
• can contain good or bad hypotheses
• up to A to pick the ‘best’ one as g

learning model = A and H
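The slogan "learning model = A and H" can be made concrete in a tiny sketch: H is a handful of candidate approval rules, and the algorithm A simply picks the rule with the fewest mistakes on D. The applicant records and the thresholds are made up for illustration.

```python
def h1(x): return x["salary"] > 800_000       # candidate rules: the set H
def h2(x): return x["debt"] > 100_000
def h3(x): return x["year_in_job"] <= 2

H = [h1, h2, h3]

def algorithm_A(H, D):
    """A: pick the hypothesis in H with the lowest error on training data D."""
    def error(h):
        return sum(1 for x, y in D if h(x) != y)
    return min(H, key=error)

# Hypothetical historical records: (applicant features, good-for-bank?)
D = [({"salary": 900_000,   "debt": 50_000,  "year_in_job": 5}, True),
     ({"salary": 400_000,   "debt": 300_000, "year_in_job": 1}, False),
     ({"salary": 1_200_000, "debt": 0,       "year_in_job": 3}, True),
     ({"salary": 300_000,   "debt": 150_000, "year_in_job": 0}, False)]

g = algorithm_A(H, D)   # the final hypothesis chosen from H by A
```

On this toy D, the salary rule makes zero mistakes while the debt and job-tenure rules do not, so A returns `h1` as g, matching the slide's point that H may contain good or bad hypotheses and A picks the best.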

(18)

The Learning Problem Components of Machine Learning

Practical Definition of Machine Learning

  unknown target function f : X → Y (ideal credit approval formula)
        ↓
  training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
        ↓
  learning algorithm A  ←  hypothesis set H (set of candidate formulas)
        ↓
  final hypothesis g ≈ f (‘learned’ formula to be used)

machine learning: use data to compute hypothesis g that approximates target f

(19)

The Learning Problem Learning with Different Output Space Y

Learning with Different Output Space Y

(20)

The Learning Problem Learning with Different Output Space Y

Credit Approval Problem Revisited

  age 23 years / gender female / annual salary NTD 1,000,000 / year in residence 1 year / year in job 0.5 year / current debt 200,000

  credit? {no(−1), yes(+1)}

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

Y = {−1, +1}: binary classification

(21)

The Learning Problem Learning with Different Output Space Y

More Binary Classification Problems

• credit: approve/disapprove
• email: spam/non-spam
• patient: sick/not sick
• ad: profitable/not profitable
• answer: correct/incorrect (KDDCup 2010)

core and important problem, with many tools serving as building blocks of other tools

(22)

The Learning Problem Learning with Different Output Space Y

Multiclass Classification: Coin Recognition Problem

(figure: US coins 1c, 5c, 10c, 25c plotted by size and mass)

• classify US coins (1c, 5c, 10c, 25c) by (size, mass)
• Y = {1c, 5c, 10c, 25c}, or Y = {1, 2, · · · , K} (abstractly)
• binary classification: special case with K = 2

Other Multiclass Classification Problems
• written digits ⇒ 0, 1, · · · , 9
• pictures ⇒ apple, orange, strawberry
• emails ⇒ spam, primary, social, promotion, update (Google)

many applications in practice, especially for ‘recognition’

(23)

The Learning Problem Learning with Different Output Space Y

Regression: Patient Recovery Prediction Problem

• binary classification: patient features ⇒ sick or not
• multiclass classification: patient features ⇒ which type of cancer
• regression: patient features ⇒ how many days before recovery
• Y = R, or Y = [lower, upper] ⊂ R (bounded regression) —deeply studied in statistics

Other Regression Problems
• company data ⇒ stock price
• climate data ⇒ temperature

also core and important, with many ‘statistical’ tools as building blocks of other tools

(24)

The Learning Problem Learning with Different Output Space Y

Mini Summary

Learning with Different Output Space Y
• binary classification: Y = {−1, +1}
• multiclass classification: Y = {1, 2, · · · , K}
• regression: Y = R
• . . . and a lot more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

core tools: binary classification and regression

(25)

The Learning Problem Learning with Different Data Label yn

Learning with Different Data Label y n

(26)

The Learning Problem Learning with Different Data Label yn

Supervised: Coin Recognition Revisited

(figure: US coins labeled 1c, 5c, 10c, 25c, plotted by size and mass)

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

supervised learning: every x_n comes with corresponding y_n

(27)

The Learning Problem Learning with Different Data Label yn

Unsupervised: Coin Recognition without y_n

(figures: coins by size and mass —labeled ⇒ supervised multiclass classification; unlabeled ⇒ unsupervised multiclass classification ⇐⇒ ‘clustering’)

Other Clustering Problems
• articles ⇒ topics
• consumer profiles ⇒ consumer groups

clustering: a challenging but useful problem
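The clustering idea can be sketched with a minimal k-means loop on made-up (size, mass) coin measurements. For reproducibility the sketch initializes centers from the first k points; real k-means typically uses random restarts.

```python
import math

def kmeans(points, k, iters=50):
    """Group unlabeled points into k clusters (no y_n needed)."""
    centers = [points[j] for j in range(k)]   # deterministic init for the demo
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # update step: move each center to the mean of its cluster
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(coord) / len(cl) for coord in zip(*cl))
    return centers, clusters

# two visibly separated coin types by (size, mass); values are made up
points = [(1.9, 2.5), (2.4, 5.6), (2.0, 2.6),
          (2.5, 5.7), (2.1, 2.4), (2.6, 5.5)]
centers, clusters = kmeans(points, 2)
```

The loop recovers the two coin groups from the unlabeled measurements alone, which is exactly the "unsupervised multiclass classification" reading of clustering above.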


(29)

The Learning Problem Learning with Different Data Label yn

Unsupervised: Learning without y_n

Other Unsupervised Learning Problems
• clustering: {x_n} ⇒ cluster(x) (≈ ‘unsupervised multiclass classification’) —e.g. articles ⇒ topics
• density estimation: {x_n} ⇒ density(x) (≈ ‘unsupervised bounded regression’) —e.g. traffic reports with location ⇒ dangerous areas
• outlier detection: {x_n} ⇒ unusual(x) (≈ extreme ‘unsupervised binary classification’) —e.g. Internet logs ⇒ intrusion alert
• . . . and a lot more!!

unsupervised learning: diverse, with possibly very different performance goals

(30)

The Learning Problem Learning with Different Data Label yn

Semi-supervised: Coin Recognition with Some y_n

(figures: coins by size and mass —fully labeled ⇒ supervised; a few labeled ⇒ semi-supervised; unlabeled ⇒ unsupervised (clustering))

Other Semi-supervised Learning Problems
• face images with a few labeled ⇒ face identifier (Facebook)
• medicine data with a few labeled ⇒ medicine effect predictor

semi-supervised learning: leverage unlabeled data to avoid ‘expensive’ labeling

(31)

The Learning Problem Learning with Different Data Label yn

Reinforcement Learning

a ‘very different’ but natural way of learning

Teach Your Dog: Say ‘Sit Down’
• The dog pees on the ground. BAD DOG. THAT’S A VERY WRONG ACTION.
• cannot easily show the dog that y_n = sit when x_n = ‘sit down’
• but can ‘punish’ to say ỹ_n = pee is wrong

Other Reinforcement Learning Problems Using (x, ỹ, goodness)
• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ black jack agent

reinforcement: learn with ‘partial/implicit information’ (often sequentially)

(32)

The Learning Problem Learning with Different Data Label yn

Reinforcement Learning

a ‘very different’ but natural way of learning

Teach Your Dog: Say ‘Sit Down’
• The dog sits down. Good Dog. Let me give you some cookies.
• still cannot show y_n = sit when x_n = ‘sit down’
• but can ‘reward’ to say ỹ_n = sit is good

Other Reinforcement Learning Problems Using (x, ỹ, goodness)
• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ black jack agent

reinforcement: learn with ‘partial/implicit information’ (often sequentially)

(33)

The Learning Problem Learning with Different Data Label yn

Mini Summary

Learning with Different Data Label y_n
• supervised: all y_n
• unsupervised: no y_n
• semi-supervised: some y_n
• reinforcement: implicit y_n by goodness(ỹ_n)
• . . . and more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

core tool: supervised learning

(34)

The Learning Problem Learning with Different Protocol f ⇒ (xn,yn)

Learning with Different Protocol f ⇒ (x n , y n )

(35)

The Learning Problem Learning with Different Protocol f ⇒ (xn,yn)

Batch Learning: Coin Recognition Revisited

(figure: labeled coins by size and mass)

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

batch supervised multiclass classification: learn from all known data

(36)

The Learning Problem Learning with Different Protocol f ⇒ (xn,yn)

More Batch Learning Problems

(figures: labeled and unlabeled coin data by size and mass)

• batch of (email, spam?) ⇒ spam filter
• batch of (patient, cancer) ⇒ cancer classifier
• batch of patient data ⇒ group of patients

batch learning: a very common protocol

(37)

The Learning Problem Learning with Different Protocol f ⇒ (xn,yn)

Online: Spam Filter that ‘Improves’

batch spam filter: learn with known (email, spam?) pairs, and predict with fixed g

online spam filter, which sequentially:
1. observes an email x_t
2. predicts spam status with current g_t(x_t)
3. receives ‘desired label’ y_t from the user, and then updates g_t with (x_t, y_t)

Connection to What We Have Learned
• PLA can be easily adapted to the online protocol (how?)
• reinforcement learning is often done online (why?)

online: hypothesis ‘improves’ through receiving data instances sequentially
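The three-step online protocol can be sketched with a perceptron-style update (PLA adapted to the online setting, as hinted above). The bag-of-words email features and the stream are made up; y is +1 (spam) / −1 (not spam).

```python
def sign(v):
    return 1 if v > 0 else -1

def online_perceptron(stream, d):
    """Sequentially observe x_t, predict with the current w, receive y_t,
    and update w only when the prediction was wrong."""
    w = [0.0] * d
    mistakes = 0
    for x, y in stream:
        y_hat = sign(sum(wi * xi for wi, xi in zip(w, x)))   # predict with g_t
        if y_hat != y:                                        # desired label y_t arrives
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]         # perceptron update
    return w, mistakes

# toy stream: feature[0] = 'money' count, feature[1] = 'meeting' count, bias term
stream = [([3, 0, 1], 1), ([0, 2, 1], -1), ([4, 1, 1], 1),
          ([0, 3, 1], -1), ([2, 0, 1], 1), ([1, 4, 1], -1)] * 3
w, mistakes = online_perceptron(stream, 3)
```

The hypothesis really does ‘improve’ sequentially: after two early mistakes on this stream the weights separate the two kinds of email and no further updates happen.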

(38)

The Learning Problem Learning with Different Protocol f ⇒ (xn,yn)

Active Learning: Learning by ‘Asking’

Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• online: ‘passive sequential’
• active: ‘question asking’ (sequentially) —query the y_n of a strategically chosen x_n

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

active: improve hypothesis with fewer labels (hopefully) by asking questions strategically

(39)

The Learning Problem Learning with Different Protocol f ⇒ (xn,yn)

Mini Summary

Learning with Different Protocol f ⇒ (x_n, y_n)
• batch: all known data
• online: sequential (passive) data
• active: strategically-observed data
• . . . and more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

core protocol: batch

(40)

The Learning Problem Learning with Different Input Space X

Learning with Different Input Space X

(41)

The Learning Problem Learning with Different Input Space X

Credit Approval Problem Revisited

  age 23 years / gender female / annual salary NTD 1,000,000 / year in residence 1 year / year in job 0.5 year / current debt 200,000

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

concrete features: each dimension of X ⊆ R^d represents ‘sophisticated physical meaning’

(42)

The Learning Problem Learning with Different Input Space X

More on Concrete Features

• (size, mass) for coin classification
• customer info for credit approval
• patient info for cancer diagnosis
• often including ‘human intelligence’ on the learning task

concrete features: the ‘easy’ ones for ML

(43)

The Learning Problem Learning with Different Input Space X

Raw Features: Digit Recognition Problem (1/2)

digit recognition problem: features ⇒ meaning of digit

a typical supervised multiclass classification problem

(44)

The Learning Problem Learning with Different Input Space X

Raw Features: Digit Recognition Problem (2/2)

by Concrete Features: x = (symmetry, density)

by Raw Features: 16 by 16 gray image, x ≡ (0, 0, 0.9, 0.6, · · · ) ∈ R^256

‘simple physical meaning’; thus more difficult for ML than concrete features

Other Problems with Raw Features
• image pixels, speech signal, etc.

raw features: often need humans or machines to convert them to concrete ones
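The raw-to-concrete conversion can be sketched for the (symmetry, density) features mentioned above. The exact definitions here are assumptions for illustration; `img` is a small grayscale grid of values in [0, 1].

```python
def density(img):
    """Concrete feature 1: average ink intensity over all pixels."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))

def symmetry(img):
    """Concrete feature 2 (assumed definition): negative mean |left - mirrored
    right| difference, so 0.0 means perfectly left-right symmetric."""
    diff = sum(abs(row[j] - row[-1 - j])
               for row in img for j in range(len(row) // 2))
    return -diff / (len(img) * (len(img[0]) // 2))

# a tiny 4x4 'image': a perfectly symmetric vertical bar of ink
img = [[0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0]]
x = (symmetry(img), density(img))   # 16 raw numbers -> 2 concrete features
```

This is the sense in which humans (or machines) inject meaning: 256 raw pixels collapse into two dimensions with interpretable physical meaning.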

(45)

The Learning Problem Learning with Different Input Space X

Time Features: Stock Prediction Problem

Stock Prediction Problem
• given previous (time, price) pairs, predict whether the price would go up or down tomorrow
• a ‘binary classification’ problem (or a regression one if predicting the price itself)
• X ⊆ R representing time, Y = R+ representing price

Other Problems with Time Features
• timestamp of student performance in online tutoring system (KDDCup 2010)
• rating time given by user in recommender system (KDDCup 2011)

time features: can carry trend

(46)

The Learning Problem Learning with Different Input Space X

Abstract Features: Rating Prediction Problem

Rating Prediction Problem (KDDCup 2011)
• given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid
• a regression problem with Y ⊆ R as rating and X ⊆ N × N as (userid, itemid)
• ‘no physical meaning’; thus even more difficult for ML

Other Problems with Abstract Features
• student ID in online tutoring system (KDDCup 2010)
• advertisement ID in online ad system

abstract: again need ‘feature conversion/extraction/construction’

(47)

The Learning Problem Learning with Different Input Space X

Mini Summary

Learning with Different Input Space X
• concrete: sophisticated (and related) physical meaning
• raw: simple physical meaning
• time: some trends
• abstract: no (or little) physical meaning
• . . . and more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

‘easy’ input: concrete

(48)

The Learning Problem Machine Learning Research in CLLab

Machine Learning Research in CLLab

(49)

The Learning Problem Machine Learning Research in CLLab

Making Machine Learning Realistic: Now

(figure: learning flow —(1) data (instance x_n, label y_n) drawn from an oracle of truth f(x) + noise e(x) feeds (2) a learning algorithm with learning model {h(x)}, which produces (3) a good learning system g(x), evaluated against (4) the oracle)

CLLab Works: Loosen the Limits of ML
1. cost-sensitive classification: limited protocol (classification) + auxiliary info. (cost)
2. multi-label classification: limited protocol (classification) + structure info. (label relation)
3. active learning: limited protocol (unlabeled data) + requested info. (query)
4. online learning: limited protocol (streaming data) + feedback info. (loss)

next: (1) cost-sensitive classification

(50)

The Learning Problem Machine Learning Research in CLLab

Which Digit Did You Write?

  ? ⇒ one (1), two (2), three (3)

a classification problem —grouping “pictures” into different “categories”

(51)

The Learning Problem Machine Learning Research in CLLab

Traditional Classification Problem

(figure: learning flow —data (instance x_n, label y_n) drawn from an oracle of truth f(x) + noise e(x) feeds a learning algorithm with learning model {g_α(x)}, which produces a good learning system g(x) ≈ f(x))

1. input: a batch of examples (digit x_n, intended label y_n)
2. desired output: some g(x) such that g(x) ≠ y seldom for future examples (x, y)
3. evaluation for some digit (x = [image], y = 2): g(x) = 1: wrong; g(x) = 2: right; g(x) = 3: wrong

Are all the wrongs equally bad?

(52)

The Learning Problem Machine Learning Research in CLLab

What is the Status of the Patient?

  ? ⇒ H1N1-infected, cold-infected, healthy

another classification problem —grouping “patients” into different “status”

(53)

The Learning Problem Machine Learning Research in CLLab

Patient Status Prediction

error measure = society cost

  actual \ predicted |  H1N1 | cold | healthy
  H1N1               |     0 | 1000 |  100000
  cold               |   100 |    0 |    3000
  healthy            |   100 |   30 |       0

• H1N1 mis-predicted as healthy: very high cost
• cold mis-predicted as healthy: high cost
• cold correctly predicted as cold: no cost

human doctors consider costs of decision; can computer-aided diagnosis do the same?
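The cost-aware decision above can be sketched as a minimal rule: given estimated probabilities of each true status, predict the label with the lowest expected societal cost rather than the most probable one. The probabilities below are made up; the cost matrix is the one from the table.

```python
LABELS = ["H1N1", "cold", "healthy"]
# COST[actual][predicted], taken from the society-cost table above
COST = {"H1N1":    {"H1N1": 0,   "cold": 1000, "healthy": 100000},
        "cold":    {"H1N1": 100, "cold": 0,    "healthy": 3000},
        "healthy": {"H1N1": 100, "cold": 30,   "healthy": 0}}

def cost_sensitive_predict(prob):
    """prob: dict mapping each possible actual label to its estimated
    probability; return the prediction minimizing expected cost."""
    def expected_cost(pred):
        return sum(prob[actual] * COST[actual][pred] for actual in LABELS)
    return min(LABELS, key=expected_cost)

# a mostly-healthy-looking patient with a small chance of infection
prob = {"H1N1": 0.05, "cold": 0.15, "healthy": 0.80}
decision = cost_sensitive_predict(prob)
```

A regular classifier would output ‘healthy’ here (probability 0.80), but the expected cost of ‘healthy’ is dominated by the rare-but-catastrophic missed H1N1 case, so the cost-sensitive rule outputs ‘cold’ instead: exactly the doctor-like behavior the slide asks for.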

(54)

The Learning Problem Machine Learning Research in CLLab

Our Contributions

                   binary                    multiclass
  regular          well-studied              well-studied
  cost-sensitive   known (Zadrozny, 2003)    ongoing (our works)

theoretic, algorithmic and empirical studies of cost-sensitive classification
• ICML 2010: a theoretically-supported algorithm with superior experimental results
• BIBM 2011: application to real-world bacteria classification with promising experimental results
• KDD 2012: a cost-sensitive and error-sensitive methodology (achieving both low cost and few wrongs)

(55)

The Learning Problem Machine Learning Research in CLLab

Making Machine Learning Realistic: Next

(figure: interactive loop —the learning algorithm issues query x(t) and guess ŷ(t), a teacher returns cost c(t), and the acquired knowledge flows back into the learning model)

Interactive Machine Learning
1. environment
2. exploration
3. dynamic
4. partial feedback

let us teach machines as “easily” as teaching students

(56)

The Learning Problem Machine Learning Research in CLLab

Case: Interactive Learning for Online Advertisement

Traditional Machine Learning for Online Advertisement
• data gathering: system randomly shows ads to some previous users
• expert building: system analyzes data gathered to determine the best (fixed) strategy

Interactive Machine Learning for Online Advertisement
• environment: system serves online users with profiles
• exploration: system decides to show an ad to the user
• dynamic: system receives data from real-time user clicks
• partial feedback: system receives reward only if clicking
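The interactive ad setting above can be sketched as an epsilon-greedy bandit: the system occasionally explores, observes reward only for the ad it actually showed (partial feedback), and otherwise exploits the best-looking ad. The click probabilities are made-up simulation parameters, not real data.

```python
import random

def epsilon_greedy(click_prob, rounds=5000, eps=0.1, seed=0):
    """Simulate rounds of ad serving with epsilon-greedy exploration."""
    rng = random.Random(seed)
    n_ads = len(click_prob)
    shows = [0] * n_ads
    clicks = [0] * n_ads
    for _ in range(rounds):
        if rng.random() < eps:                      # explore a random ad
            a = rng.randrange(n_ads)
        else:                                       # exploit current estimate
            a = max(range(n_ads),
                    key=lambda i: clicks[i] / shows[i] if shows[i] else float("inf"))
        shows[a] += 1
        # partial feedback: reward is observed only for the chosen ad
        clicks[a] += 1 if rng.random() < click_prob[a] else 0
    return shows, clicks

# three hypothetical ads with different true click-through rates
shows, clicks = epsilon_greedy([0.02, 0.05, 0.10])
```

Over the rounds the system learns from its own real-time feedback and concentrates impressions on the highest-CTR ad, while exploration keeps the estimates of the others from going stale.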

(57)

The Learning Problem Machine Learning Research in CLLab

Preliminary Success: ICML 2012 Exploration & Exploitation Challenge

(interactive machine learning for online advertisement, as on the previous slide: environment / exploration / dynamic / partial feedback)

NTU beats two MIT teams to be the phase 1 winner!

interactive: more challenging than traditional machine learning, but realistic

(58)

The Learning Problem More on KDDCup

More on KDDCup

(59)

The Learning Problem More on KDDCup

What is KDDCup?

Background
• an annual competition on KDD (knowledge discovery and data mining)
• organized by ACM SIGKDD, starting from 1997; now the most prestigious data mining competition
• usually lasts 3-4 months
• participants include famous research labs (IBM, AT&T) and top universities (Stanford, Berkeley)

(60)

The Learning Problem More on KDDCup

Aim of KDDCup

bridge the gap between theory and practice, such as
• scalability and efficiency
• missing data and noise
• heterogeneous data
• unbalanced data
• combination of different models

define the state-of-the-art

(61)

The Learning Problem More on KDDCup

KDDCups: 2008 to 2013 I

2008
• organizer: Siemens
• topic: breast cancer prediction (medical)
• data size: 0.2M
• teams: > 200
• NTU: co-champion with IBM (led by Prof. Shou-de Lin)

2009
• organizer: Orange
• topic: customer behavior prediction (business)
• data size: 0.1M
• teams: > 400
• NTU: 3rd place of slow track

(62)

The Learning Problem More on KDDCup

KDDCups: 2008 to 2013 II

2010
• organizer: PSLC Data Shop
• topic: student performance prediction (education)
• data size: 30M
• teams: > 100
• NTU: champion and student-team champion

2011
• organizer: Yahoo!
• topic: music preference prediction (recommendation)
• data size: 300M
• teams: > 1000
• NTU: double champions

(63)

The Learning Problem More on KDDCup

KDDCups: 2008 to 2013 III

2012
• organizer: Tencent
• topic: web-user behavior prediction (Internet)
• data size: 150M
• teams: > 800
• NTU: champion of track 2

2013
• organizer: Microsoft Research
• topic: paper-author relationship prediction (academia)
• data size: 600M
• teams: > 500
• NTU: double champions

(64)

The Learning Problem More on KDDCup

KDDCup 2011

Music Recommendation Systems
• host: Yahoo!
• 11 years of Yahoo! music data
• 2 tracks of competition
• official dates: March 15 to June 30
• 1878 teams submitted to track 1; 1854 teams submitted to track 2

(65)

The Learning Problem More on KDDCup

NTU Team for KDDCup 2011

• 3 faculty members: Profs. Chih-Jen Lin, Hsuan-Tien Lin and Shou-De Lin
• 1 course (starting in 2010): Data Mining and Machine Learning: Theory and Practice
• 3 TAs and 19 students: most were inexperienced in music recommendation in the beginning
• official classes: April to June; actual classes: December to June
• our motto: study state-of-the-art approaches and then creatively improve them

(66)

The Learning Problem More on KDDCup

Previously: How Much Did You Like These Movies?

http://www.netflix.com (1M-dollar competition between 2007-2009)

goal: use “movies you’ve rated” to automatically predict your preferences on future movies

(67)

The Learning Problem More on KDDCup

The Track 1 Problem (1/2)

Given Data

263M examples (user u, item i, rating r_ui, date t_ui, time τ_ui)

  user | item | rating | date | time
     1 |   21 |     10 |  102 | 23:52
     1 |  213 |     90 | 1032 | 21:01
     4 |   45 |     95 |  768 | 09:15
  · · ·

• u, i: abstract IDs
• r_ui: integer between 0 and 100, mostly multiples of 10

Additional Information: Item Hierarchy
• track (46.85%)
• album (19.01%)
• artist (28.84%)
• genre (5.30%)

(68)

The Learning Problem More on KDDCup

The Track 1 Problem (2/2)

Data Partitioned by Organizers
• training: 253M; validation: 4M; test (w/o rating): 6M
• per user, training < validation < test in time
  • ≥ 20 examples total
  • 4 examples in validation; 6 in test
• fixed random half of test: leaderboard; another half: award decision

Goal

predictions r̂_ui ≈ r_ui on the test set, measured by RMSE = sqrt( average (r̂_ui − r_ui)^2 )
—one submission allowed every eight hours
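The RMSE evaluation above is a one-liner; here is a minimal sketch with made-up predictions and ratings.

```python
import math

def rmse(pred, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# hypothetical predicted vs. actual ratings on four test examples
actual = [10, 90, 95, 40]
pred = [20, 80, 95, 50]
score = rmse(pred, actual)   # sqrt((100 + 100 + 0 + 100) / 4) = sqrt(75)
```

Lower is better; the competition numbers quoted later (22.79 down to 21.01) are values of exactly this measure on the hidden test set.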

(69)

The Learning Problem More on KDDCup

Three Properties of Track 1 Data

  R =        track_1 | track_2 | album_3 | author_4 | · · · | genre_I
  user_1        100  |     80  |     70  |       ?  | · · · |    −
  user_2          −  |      0  |      ?  |      80  | · · · |    −
  · · ·
  user_U          ?  |      −  |     20  |       −  | · · · |    0

similar to Netflix data, but with the following differences...
• scale: larger data —study mature models that are computationally feasible
• taxonomy: relation graph of tracks, albums, authors and genres —include as features for combining models nonlinearly
• time: detailed; training earlier than test —include as features for combining models nonlinearly; respect time-closeness during training

(70)

The Learning Problem More on KDDCup

Framework of Our Solution

System Architecture
1. improve standard models: design variants within 6 families of state-of-the-art models (reaches RMSE 22.7915)
2. blend the models: improve prediction power by blending the variants carefully (reaches RMSE 21.3598)
3. aggregate the blended predictors: construct a linear ensemble with test performance estimators (reaches RMSE 21.0253)
4. post-process the ensemble: add a final touch based on observations from data analysis (reaches RMSE 21.0147)

not only hard work (200+ models included), but also key techniques
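The linear-ensemble idea in step 3 can be sketched for two predictors: learn blending weights by least squares on a held-out validation set. All numbers are made up, and the 2x2 normal equations are solved directly to keep the sketch dependency-free; the actual system combined 200+ models.

```python
def blend_weights(p1, p2, y):
    """Least-squares weights (w1, w2) minimizing sum((w1*p1 + w2*p2 - y)^2),
    via Cramer's rule on the 2x2 normal equations."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

# validation-set predictions of two hypothetical base models, plus the truth
p1 = [50, 70, 20, 90]
p2 = [40, 80, 30, 80]
y  = [45, 75, 25, 85]
w1, w2 = blend_weights(p1, p2, y)
blended = [w1 * a + w2 * b for a, b in zip(p1, p2)]
```

On this toy set the truth happens to be the average of the two models, so the learned weights come out near (0.5, 0.5) and the blended predictor beats either base model alone, which is the whole point of the aggregation step.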

(71)

The Learning Problem That’s about all. Thank you!

That’s about all. Thank you!
