### Machine Learning Overview and Applications

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

### Computational Learning Lab (CLLab) Department of Computer Science

### & Information Engineering

### National Taiwan University

### ( 國立台灣大學資訊工程系計算學習實驗室)

materials mostly taken from my “Learning from Data” book, my “Machine Learning Foundations” free online course, and works from the NTU CLLab and NTU KDDCup teams

The Learning Problem What is Machine Learning

## What is Machine Learning

### From Learning to Machine Learning

- learning: acquiring **skill** with experience accumulated from **observations**
  (observations → learning → skill)
- machine learning: acquiring **skill** with experience accumulated/computed from **data**
  (data → ML → skill)

What is **skill**?

### A More Concrete Definition

- **skill** ⇔ improve some **performance measure** (e.g. prediction accuracy)
- machine learning: improving some **performance measure** with experience **computed** from **data**
  (data → ML → improved performance measure)

### An Application in Computational Finance

stock data → ML → more investment gain

Why use machine learning?

### Yet Another Application: Tree Recognition

- ‘define’ trees and hand-program: **difficult**
- learn from data (observations) and recognize: a **3-year-old can do so**
- an ‘ML-based tree recognition system’ can be **easier to build** than a hand-programmed system

ML: an **alternative route** to build complicated systems
### The Machine Learning Route

ML: an **alternative route** to build complicated systems

### Some Use Scenarios

- when humans cannot program the system manually (navigating on Mars)
- when humans cannot ‘define the solution’ easily (speech/visual recognition)
- when rapid decisions are needed that humans cannot make (high-frequency trading)
- when needing to be user-oriented at a massive scale (consumer-targeted marketing)

Give a **computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)**

### Key Essence of Machine Learning

machine learning: improving some **performance measure** with experience **computed** from **data**
(data → ML → improved performance measure)

1. there **exists some ‘underlying pattern’** to be learned, so the ‘performance measure’ can be improved
2. but there is **no programmable (easy) definition**, so ‘ML’ is needed
3. somehow there is **data** about the pattern, so ML has some ‘inputs’ to learn from

key essence: helps decide whether to use ML

The Learning Problem Snapshot Applications of Machine Learning

## Snapshot Applications of Machine Learning

### Daily Needs: Food, Clothing, Housing, Transportation

data → ML → skill

1. Food (Sadilek et al., 2013)
   - data: Twitter data (words + location)
   - skill: tell the food-poisoning likeliness of restaurants properly
2. Clothing (Abu-Mostafa, 2012)
   - data: sales figures + client surveys
   - skill: give good fashion recommendations to clients
3. Housing (Tsanas and Xifara, 2012)
   - data: characteristics of buildings and their energy load
   - skill: predict the energy load of other buildings closely
4. Transportation (Stallkamp et al., 2012)
   - data: some traffic sign images and meanings
   - skill: recognize traffic signs accurately

**ML** is everywhere!

### Education

data → ML → skill

- data: students’ records on quizzes in a Math tutoring system
- skill: predict whether a student can give a correct answer to another quiz question

### A Possible ML Solution

answer correctly ≈ ⟦recent **strength** of student > **difficulty** of question⟧

- give ML **9 million records** from **3000 students**
- ML determines (reverse-engineers) **strength** and **difficulty** automatically

key part of the **world-champion** system from National Taiwan Univ. in KDDCup 2010
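The strength/difficulty rule above can be sketched with a tiny estimator. The toy records, the averaging-based estimates, and all names below are illustrative assumptions, not the actual KDDCup 2010 system:

```python
# Minimal sketch of the slide's rule: answer correctly ≈ [strength > difficulty].
# Records are (student, question, correct) triples; estimates are plain averages.
from collections import defaultdict

def estimate(records):
    s_tot, s_cnt = defaultdict(float), defaultdict(int)
    q_tot, q_cnt = defaultdict(float), defaultdict(int)
    for student, question, correct in records:
        s_tot[student] += correct; s_cnt[student] += 1
        q_tot[question] += 1 - correct; q_cnt[question] += 1
    strength = {s: s_tot[s] / s_cnt[s] for s in s_cnt}    # fraction answered right
    difficulty = {q: q_tot[q] / q_cnt[q] for q in q_cnt}  # fraction answered wrong
    return strength, difficulty

def predict(strength, difficulty, student, question):
    return strength[student] > difficulty[question]

records = [("amy", "q1", 1), ("amy", "q2", 1), ("bob", "q1", 1), ("bob", "q2", 0)]
strength, difficulty = estimate(records)
print(predict(strength, difficulty, "amy", "q2"))  # amy is strong enough → True
```

A real system would regularize these estimates and weight recent records more, but the prediction rule is exactly the slide's comparison.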

### Entertainment: Recommender System (1/2)

data → ML → skill

- data: how many users have rated some movies
- skill: predict how a user would rate an unrated movie

### A Hot Problem

- competition held by Netflix in 2006
  - 100,480,507 ratings that 480,189 users gave to 17,770 movies
  - 10% improvement = **1 million dollar prize**
- similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  - 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines **learn our preferences?**

### Entertainment: Recommender System (2/2)

(figure: movie factors (comedy content, action content, blockbuster?, Tom Cruise in it?) matched against viewer factors (likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?); adding the contributions from each factor gives the predicted rating)

### A Possible ML Solution

- pattern: rating ← viewer/movie factors
- learning: known ratings → learned **factors** → unknown rating prediction

key part of the **world-champion** (again!) system from National Taiwan Univ. in KDDCup 2011
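The factor idea can be sketched as a tiny matrix factorization trained with stochastic gradient descent; the factor dimension, learning rate, and toy ratings below are assumptions for illustration, not the competition system:

```python
# Minimal matrix-factorization sketch: rating ≈ dot(user_factors, movie_factors).
# Known ratings are used to learn the factors; learned factors predict unknowns.
import random

random.seed(0)
ratings = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 4.0, (1, 1): 1.0, (2, 1): 5.0}
K, n_users, n_movies = 2, 3, 2
P = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_users)]   # user factors
Q = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_movies)]  # movie factors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

lr = 0.02
for _ in range(3000):                       # SGD sweeps over the known ratings
    for (u, m), r in ratings.items():
        err = r - dot(P[u], Q[m])
        for k in range(K):                  # simultaneous update of both factors
            P[u][k], Q[m][k] = (P[u][k] + lr * err * Q[m][k],
                                Q[m][k] + lr * err * P[u][k])

print(round(dot(P[0], Q[0]), 1))  # should be close to the known rating 5.0
```

Real systems add biases and regularization, but this is the core of the 'known rating → learned factors' step.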

The Learning Problem Components of Machine Learning

## Components of Machine Learning

### Components of Learning: Metaphor Using Credit Approval

### Applicant Information

| field | value |
| --- | --- |
| age | 23 years |
| gender | female |
| annual salary | NTD 1,000,000 |
| year in residence | 1 year |
| year in job | 0.5 year |
| current debt | 200,000 |

**unknown** pattern to be learned: ‘is approving the credit card good for the bank?’

### Formalize the Learning Problem

### Basic Notations

- input: **x ∈ X** (customer application)
- output: y ∈ Y (good/bad after approving the credit card)
- unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
- data ⇔ training examples: D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in the bank)
- hypothesis ⇔ skill with hopefully **good performance**: g : X → Y (‘learned’ formula to be used)

{(x_n, y_n)} from f → ML → g

### Learning Flow for Credit Approval

unknown target function f : X → Y (ideal credit approval formula)
→ training examples D: (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)

- target f is **unknown** (i.e. there is no programmable definition)
- hypothesis g is hopefully ≈ f, but possibly **different** from f (perfection ‘impossible’ when f is unknown)

What does g look like?

### The Learning Model

training examples D: (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
→ learning algorithm A, together with the hypothesis set H (set of candidate formulas)
→ final hypothesis g ≈ f (‘learned’ formula to be used)

- assume g ∈ H = {h_k}, i.e. approving if
  - h_1: annual salary > NTD 800,000
  - h_2: debt > NTD 100,000 (really?)
  - h_3: year in job ≤ 2 (really?)
- hypothesis set H:
  - can contain **good or bad hypotheses**
  - up to A to pick the ‘best’ one as g

**learning model** = A and H
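A minimal sketch of ‘learning model = A and H’: the algorithm A scans the candidate hypotheses and returns the best one on D. The three rules mirror the slide, while the toy records and accuracy-based selection are illustrative assumptions:

```python
# learning model = A and H: A scans the hypothesis set H and returns the
# candidate g with the best performance on the training examples D.
H = {
    "h1: salary > 800k": lambda x: x["salary"] > 800_000,
    "h2: debt > 100k":   lambda x: x["debt"] > 100_000,      # (really?)
    "h3: job <= 2y":     lambda x: x["year_in_job"] <= 2,    # (really?)
}

# toy historical records (x_n, y_n): y_n = True means the card turned out well
D = [
    ({"salary": 1_000_000, "debt": 200_000, "year_in_job": 0.5}, True),
    ({"salary": 600_000,   "debt": 50_000,  "year_in_job": 5},   False),
    ({"salary": 900_000,   "debt": 0,       "year_in_job": 3},   True),
]

def A(H, D):
    """Pick the hypothesis with the highest training accuracy."""
    def accuracy(name):
        return sum(H[name](x) == y for x, y in D) / len(D)
    return max(H, key=accuracy)

print(A(H, D))  # h1 matches all three toy records
```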
### Practical Definition of Machine Learning

unknown target function f : X → Y (ideal credit approval formula)
→ training examples D: (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
→ learning algorithm A, with hypothesis set H (set of candidate formulas)
→ final hypothesis g ≈ f (‘learned’ formula to be used)

machine learning: use **data** to compute **hypothesis g** that approximates **target f**

The Learning Problem Learning with Different Output Space Y

## Learning with Different Output Space Y

### Credit Approval Problem Revisited

(applicant information as before: age 23 years, gender female, annual salary NTD 1,000,000, year in residence 1 year, year in job 0.5 year, current debt 200,000)

credit? **{no(−1), yes(+1)}**

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

Y = {−1, +1}: **binary classification**

### More Binary Classification Problems

- credit: approve/disapprove
- email: spam/non-spam
- patient: sick/not sick
- ad: profitable/not profitable
- answer: correct/incorrect (KDDCup 2010)

a core and important problem, with many tools serving as the **building block of other tools**

### Multiclass Classification: Coin Recognition Problem

(figure: US coins (1c, 5c, 10c, 25c) plotted by size and mass)

- classify US coins (1c, 5c, 10c, 25c) by (size, mass)
- Y = {1c, 5c, 10c, 25c}, or **Y = {1, 2, · · · , K} (abstractly)**
- binary classification: special case with K = 2

### Other Multiclass Classification Problems

- written digits ⇒ 0, 1, · · · , 9
- pictures ⇒ apple, orange, strawberry
- emails ⇒ spam, primary, social, promotion, update (Google)

**many applications** in practice, especially for ‘recognition’
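One minimal sketch of a K-class classifier in the (size, mass) plane is nearest-centroid classification; the coin measurements below are made-up numbers, not real coin data:

```python
# Minimal multiclass sketch: classify a coin by the nearest class centroid
# in (size, mass) space; Y = {1, 5, 10, 25} as on the slide.
from math import dist
from statistics import mean

train = {  # label -> list of (size, mass) examples (made-up numbers)
    1:  [(19.0, 2.5), (19.2, 2.4)],
    5:  [(21.2, 5.0), (21.0, 5.1)],
    10: [(17.9, 2.3), (18.0, 2.2)],
    25: [(24.3, 5.7), (24.2, 5.6)],
}
centroids = {y: (mean(s for s, _ in pts), mean(m for _, m in pts))
             for y, pts in train.items()}

def classify(x):
    return min(centroids, key=lambda y: dist(x, centroids[y]))

print(classify((24.0, 5.5)))  # closest to the 25c centroid
```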
### Regression: Patient Recovery Prediction Problem

- binary classification: patient features ⇒ sick or not
- multiclass classification: patient features ⇒ which type of cancer
- regression: patient features ⇒ **how many days before recovery**
- Y = R, or Y = [lower, upper] ⊂ R (bounded regression); deeply studied in statistics

### Other Regression Problems

- company data ⇒ stock price
- climate data ⇒ temperature

also core and important, with many ‘statistical’ tools serving as the **building block of other tools**
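A minimal regression sketch with Y = R: fit a least-squares line from one feature to the target; the severity scores and recovery days below are made-up numbers:

```python
# Least-squares line y = a*x + b, e.g. one patient feature (severity score)
# -> days before recovery; slope/intercept from the classic closed form.
from statistics import mean

xs = [1.0, 2.0, 3.0, 4.0]        # severity score (made-up)
ys = [3.1, 5.0, 6.9, 9.1]        # days before recovery (made-up)

x_bar, y_bar = mean(xs), mean(ys)
a = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
b = y_bar - a * x_bar

predict = lambda x: a * x + b
print(round(predict(5.0), 1))    # extrapolated recovery days
```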

### Mini Summary

### Learning with Different Output Space Y

- **binary classification: Y = {−1, +1}**
- multiclass classification: Y = {1, 2, · · · , K}
- **regression**: Y = R
- . . . and a lot more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

core tools: binary classification and regression

The Learning Problem Learning with Different Data Label y_{n}

## Learning with Different Data Label y_n

### Supervised: Coin Recognition Revisited

(figure: coins plotted by size and mass, each labeled 1c, 5c, 10c or 25c)

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

supervised learning: every **x_n comes with a corresponding y_n**

### Unsupervised: Coin Recognition without y_n

(figures: labeled coin data by size and mass vs. the same points without labels)

- supervised multiclass classification: labeled (size, mass) data
- **unsupervised** multiclass classification ⇐⇒ ‘clustering’

### Other Clustering Problems

- articles ⇒ topics
- consumer profiles ⇒ consumer groups

**clustering: a challenging but useful problem**


### Unsupervised: Learning without y_n

### Other Unsupervised Learning Problems

- **clustering: {x_n} ⇒ cluster(x)** (≈ ‘unsupervised multiclass classification’), e.g. articles ⇒ topics
- **density estimation: {x_n} ⇒ density(x)** (≈ ‘unsupervised bounded regression’), e.g. traffic reports with location ⇒ dangerous areas
- **outlier detection: {x_n} ⇒ unusual(x)** (≈ extreme ‘unsupervised binary classification’), e.g. Internet logs ⇒ intrusion alert
- . . . and a lot more!!

**unsupervised learning: diverse, with possibly very different performance goals**
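Clustering can be sketched with k-means, here in one dimension; the unlabeled masses and initial centers are made-up numbers:

```python
# Minimal clustering sketch: k-means on unlabeled {x_n} (k = 2, 1-D masses).
from statistics import mean

def kmeans(xs, centers, iters=10):
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for x in xs:                        # assign each point to nearest center
            nearest = min(centers, key=lambda c: abs(x - c))
            groups[nearest].append(x)
        centers = [mean(g) if g else c for c, g in groups.items()]
    return sorted(centers)

masses = [2.2, 2.4, 2.5, 5.0, 5.1, 5.7]    # unlabeled coin masses (made-up)
centers_found = kmeans(masses, [2.0, 6.0])
print(centers_found)                       # one center per discovered cluster
```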
### Semi-supervised: Coin Recognition with Some y_n

(figures: fully labeled coin data (supervised) vs. a few labeled points among many unlabeled ones (semi-supervised) vs. fully unlabeled data (unsupervised/clustering))

### Other Semi-supervised Learning Problems

- face images with a few labeled ⇒ face identifier (Facebook)
- medicine data with a few labeled ⇒ medicine effect predictor

**semi-supervised learning: leverage** unlabeled data to avoid ‘expensive’ labeling
### Reinforcement Learning

a ‘very different’ but natural way of learning

### Teach Your Dog: Say ‘Sit Down’

The dog pees on the ground. **BAD DOG. THAT’S A VERY WRONG ACTION.**

- cannot easily show the dog that y_n = sit when x_n = ‘sit down’
- but can ‘punish’ to say that ỹ_n = pee is wrong

### Other Reinforcement Learning Problems Using (x, ỹ, goodness)

- (customer, ad choice, ad click earning) ⇒ ad system
- (cards, strategy, winning amount) ⇒ blackjack agent

reinforcement: learn with **‘partial/implicit information’** (often sequentially)
### Reinforcement Learning

a ‘very different’ but natural way of learning

### Teach Your Dog: Say ‘Sit Down’

The dog sits down. **Good Dog. Let me give you some cookies.**

- still cannot show that y_n = sit when x_n = ‘sit down’
- but can ‘reward’ to say that ỹ_n = sit is good

### Other Reinforcement Learning Problems Using (x, ỹ, goodness)

- (customer, ad choice, ad click earning) ⇒ ad system
- (cards, strategy, winning amount) ⇒ blackjack agent

reinforcement: learn with **‘partial/implicit information’** (often sequentially)
### Mini Summary

### Learning with Different Data Label y_n

- **supervised: all y_n**
- unsupervised: no y_n
- semi-supervised: some y_n
- reinforcement: implicit y_n by goodness(ỹ_n)
- . . . and more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

core tool: supervised learning

The Learning Problem Learning with Different Protocol f ⇒ (x_{n},y_{n})

## Learning with Different Protocol f ⇒ (x_n, y_n)

### Batch Learning: Coin Recognition Revisited

(figure: labeled coins plotted by size and mass)

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

**batch** supervised multiclass classification: learn from **all known** data
### More Batch Learning Problems

(figures: labeled coin data vs. unlabeled coin data)

- a batch of (email, spam?) ⇒ spam filter
- a batch of (patient, cancer) ⇒ cancer classifier
- a batch of patient data ⇒ groups of patients

batch learning: **a very common protocol**

### Online: Spam Filter that ‘Improves’

- batch spam filter: learn with known (email, spam?) pairs, and predict with a fixed g
- **online** spam filter, which **sequentially:**
  1. observes an email x_t
  2. predicts its spam status with the current g_t(x_t)
  3. receives the ‘desired label’ y_t from the user, and then updates g_t with (x_t, y_t)

### Connection to What We Have Learned

- PLA can be easily adapted to the online protocol (how?)
- reinforcement learning is often done online (why?)

online: the hypothesis ‘improves’ through receiving data instances **sequentially**
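The three-step online protocol can be sketched with the perceptron (PLA) update w ← w + y_t x_t on mistakes; the toy spam stream below is an illustrative assumption:

```python
# Online PLA sketch: predict with the current w, then update on a mistake,
# following the protocol step by step (observe, predict, receive label).
def online_pla(stream, dim):
    w = [0.0] * dim
    for x, y in stream:                       # x includes a constant 1 (bias)
        predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
        if predicted != y:                    # 'desired label' arrives; update
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# toy stream: label +1 (spam) roughly when the feature is large (illustrative)
stream = [((1.0, 3.0), 1), ((1.0, 1.0), -1), ((1.0, 4.0), 1), ((1.0, 0.5), -1)]
w = online_pla(stream, 2)
print(1 if w[0] + w[1] * 2.5 > 0 else -1)    # classify a new email
```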

### Active Learning: Learning by ‘Asking’

### Protocol ⇔ Learning Philosophy

- batch: ‘duck feeding’
- online: ‘passive sequential’
- **active: ‘question asking’** (sequentially): query the y_n of the **chosen x_n**

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

active: improve the hypothesis with fewer labels (hopefully) by asking questions **strategically**

### Mini Summary

### Learning with Different Protocol f ⇒ (x_n, y_n)

- **batch: all known data**
- online: sequential (passive) data
- active: strategically-observed data
- . . . and more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

core protocol: batch

The Learning Problem Learning with Different Input Space X

## Learning with Different Input Space X

### Credit Approval Problem Revisited

(applicant information as before: age 23 years, gender female, annual salary NTD 1,000,000, year in residence 1 year, year in job 0.5 year, current debt 200,000; learning flow as before)

**concrete** features: each dimension of X ⊆ R^d represents a ‘sophisticated physical meaning’
### More on Concrete Features

- **(size, mass)** for coin classification
- **customer info** for credit approval
- **patient info** for cancer diagnosis
- often include ‘human intelligence’ on the learning task

(figure: coins plotted by size and mass)

concrete features: the ‘easy’ ones for ML

### Raw Features: Digit Recognition Problem (1/2)

- digit recognition problem: features ⇒ meaning of digit
- a typical supervised multiclass classification problem

### Raw Features: Digit Recognition Problem (2/2)

### by Concrete Features

**x = (symmetry, density)**

### by Raw Features

- a 16-by-16 gray image: **x ≡ (0, 0, 0.9, 0.6, · · · ) ∈ R^256**
- ‘simple physical meaning’; thus more difficult for ML than concrete features

### Other Problems with Raw Features

- image pixels, speech signals, etc.

raw features: often need humans or machines to **convert them to concrete ones**
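A sketch of converting raw pixels into the concrete features above; these particular definitions of density (mean intensity) and symmetry (negative left-right asymmetry) are common illustrative choices, not necessarily the course's exact ones:

```python
# Raw -> concrete sketch: convert a 16x16 gray image (pixels in [0, 1])
# into x = (symmetry, density).
def to_concrete(img):                       # img: 16 rows of 16 floats
    flat = [p for row in img for p in row]
    density = sum(flat) / len(flat)         # mean intensity
    mirrored = [row[::-1] for row in img]   # left-right flip
    asym = sum(abs(a - b) for r, m in zip(img, mirrored)
               for a, b in zip(r, m)) / len(flat)
    return (-asym, density)                 # symmetric images score near 0

blank = [[0.0] * 16 for _ in range(16)]
blank[8][3] = 1.0                           # one bright off-center pixel
symmetry, density = to_concrete(blank)
print(symmetry < 0 and density > 0)
```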

### Time Features: Stock Prediction Problem

### Stock Prediction Problem

- given previous (time, price) pairs, predict whether the price will go up or down tomorrow
- a ‘binary classification’ problem (or a regression one if predicting the price itself)
- X ⊆ R representing time, Y = R^+ representing price

### Other Problems with Time Features

- timestamps of student performance in an online tutoring system (KDDCup 2010)
- rating times given by users in a recommender system (KDDCup 2011)

time features: can carry trends

### Abstract Features: Rating Prediction Problem

### Rating Prediction Problem (KDDCup 2011)

- given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to an itemid
- a regression problem with Y ⊆ R as the rating and **X ⊆ N × N as (userid, itemid)**
- ‘no physical meaning’; thus even more difficult for ML

### Other Problems with Abstract Features

- student IDs in an online tutoring system (KDDCup 2010)
- advertisement IDs in an online ad system

abstract features: again need ‘feature **conversion/extraction/construction’**

### Mini Summary

### Learning with Different Input Space X

- **concrete: sophisticated (and related) physical meaning**
- raw: simple physical meaning
- time: some trends
- abstract: no (or little) physical meaning
- . . . and more!!

(learning flow: unknown target f → training examples D → learning algorithm A with hypothesis set H → final hypothesis g ≈ f)

‘easy’ input: concrete

The Learning Problem Machine Learning Research in CLLab

## Machine Learning Research in CLLab

### Making Machine Learning **Realistic: Now**

(diagram: an oracle of ‘truth f(x) + noise e(x)’ (4) generates data (instance x_n, label y_n) (1), which feeds the learning algorithm (2) working with the learning model {h(x)} to produce (3) a good learning system g(x))

### CLLab Works: **Loosen the Limits of ML**

1. cost-sensitive classification: limited protocol (classification) + **auxiliary info. (cost)**
2. multi-label classification: limited protocol (classification) + **structure info. (label relation)**
3. active learning: limited protocol (unlabeled data) + **requested info. (query)**
4. online learning: limited protocol (streaming data) + **feedback info. (loss)**

next: **(1)** cost-sensitive classification
### Which Digit Did You Write?

one (1)? two (2)? three (3)?

a **classification problem**: grouping “pictures” into different “categories”

### Traditional Classification Problem

(diagram: an oracle of ‘truth f(x) + noise e(x)’ generates data (instance x_n, label y_n), which feeds the learning algorithm working with the learning model {g_α(x)} to produce a good learning system g(x) ≈ f(x))

1. input: a batch of examples (digit x_n, intended label y_n)
2. desired output: some g(x) such that g(x) ≠ y seldom for future examples (x, y)
3. evaluation for some digit (x = an image, y = 2): g(x) = 1: wrong; g(x) = 2: right; g(x) = 3: wrong

Are all the **wrongs equally bad?**

### What is the Status of the Patient?

H1N1-infected? cold-infected? healthy?

another **classification problem**: grouping “patients” into different “statuses”

### Patient Status Prediction

error measure = society cost

| actual \ predicted | H1N1 | cold | healthy |
| --- | --- | --- | --- |
| H1N1 | 0 | 1000 | **100000** |
| cold | 100 | 0 | 3000 |
| healthy | 100 | 30 | 0 |

- H1N1 mis-predicted as healthy: **very high cost**
- cold mis-predicted as healthy: high cost
- cold correctly predicted as cold: no cost

human doctors consider the costs of decisions; **can computer-aided diagnosis do the same?**
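One standard cost-sensitive strategy (a sketch, not the cited algorithms) is to predict the status with the smallest expected cost under the table above; the probability estimates below are made-up:

```python
# Cost-sensitive sketch: instead of the most probable status, predict the
# status with the smallest expected societal cost under the slide's table.
COST = {  # COST[actual][predicted]
    "H1N1":    {"H1N1": 0,   "cold": 1000, "healthy": 100000},
    "cold":    {"H1N1": 100, "cold": 0,    "healthy": 3000},
    "healthy": {"H1N1": 100, "cold": 30,   "healthy": 0},
}

def cheapest(prob):  # prob: estimated P(actual status) for one patient
    def expected_cost(pred):
        return sum(p * COST[actual][pred] for actual, p in prob.items())
    return min(COST, key=expected_cost)

# made-up estimates: 'healthy' is most probable, yet declaring the patient
# healthy is far from the cheapest prediction under the cost table
prob = {"H1N1": 0.05, "cold": 0.15, "healthy": 0.80}
print(cheapest(prob))
```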

### Our Contributions

| | binary | multiclass |
| --- | --- | --- |
| regular | well-studied | well-studied |
| cost-sensitive | known (Zadrozny, 2003) | **ongoing (our works)** |

theoretic, algorithmic and empirical studies of cost-sensitive classification:

- ICML 2010: a theoretically-supported algorithm with **superior experimental results**
- BIBM 2011: application to real-world **bacteria classification with promising experimental results**
- KDD 2012: a cost-sensitive **and error-sensitive** methodology (achieving both low cost and **few wrongs**)

### Making Machine Learning Realistic: **Next**

(diagram: a teacher supplies knowledge and a cost c(t); the learning algorithm, working with the learning model, sends back a query x(t) & a guess ŷ(t))

**Interactive Machine Learning**

1. environment
2. exploration
3. dynamic
4. partial feedback

let us teach machines as “easily” as teaching students

### Case: Interactive Learning for Online Advertisement

### Traditional Machine Learning for Online Advertisement

- data gathering: the system **randomly shows ads to some previous users**
- expert building: the system **analyzes the gathered data to determine the best (fixed) strategy**

**Interactive Machine Learning for Online Advertisement**

- environment: the system serves **online users with profiles**
- exploration: the system **decides to show an ad to the user**
- dynamic: the system receives data from **real-time user clicks**
- partial feedback: the system receives a **reward only upon a click**
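The four ingredients can be sketched with epsilon-greedy ad selection, one simple exploration strategy (not the team's contest algorithm); the hidden click rates below are made-up:

```python
# Interactive sketch: epsilon-greedy ad selection with partial feedback --
# the system only observes the reward (click) of the ad it chose to show.
import random

random.seed(1)
TRUE_CTR = {"ad_a": 0.10, "ad_b": 0.30}       # hidden click rates (made-up)
clicks = {ad: 0 for ad in TRUE_CTR}
shows = {ad: 0 for ad in TRUE_CTR}

def ctr(ad):
    return clicks[ad] / shows[ad] if shows[ad] else 1.0  # optimistic if unseen

def choose(eps=0.1):
    if random.random() < eps:
        return random.choice(list(TRUE_CTR))  # explore
    return max(shows, key=ctr)                # exploit current estimates

for _ in range(5000):                         # serve 5000 users (environment)
    ad = choose()                             # exploration decision
    shows[ad] += 1
    if random.random() < TRUE_CTR[ad]:        # dynamic, partial feedback
        clicks[ad] += 1

best = max(shows, key=ctr)
print(best)                                   # the empirically better ad
```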

### Preliminary Success: ICML 2012 Exploration & Exploitation Challenge

(interactive machine learning for online advertisement, as above: environment, exploration, dynamic, partial feedback)

NTU beat two MIT teams to become the phase 1 winner!

interactive: more challenging than traditional machine learning, but **realistic**

The Learning Problem More on KDDCup

## More on KDDCup

### What is KDDCup?

### Background

- an annual competition on KDD (knowledge discovery and data mining)
- organized by ACM SIGKDD, starting from 1997; now **the most prestigious data mining competition**
- usually lasts 3-4 months
- participants include famous research labs (IBM, AT&T) and top universities (Stanford, Berkeley)

### Aim of KDDCup

- bridge the gap between theory and **practice**, such as
  - scalability and efficiency
  - missing data and noise
  - heterogeneous data
  - unbalanced data
  - combination of different models
- define the **state-of-the-art**

### KDDCups: 2008 to 2013 I

### 2008

- organizer: Siemens
- topic: breast cancer prediction (medical)
- data size: 0.2M
- teams: > 200
- NTU: **co-champion** with IBM (led by Prof. Shou-de Lin)

### 2009

- organizer: Orange
- topic: customer behavior prediction (business)
- data size: 0.1M
- teams: > 400
- NTU: **3rd place** of the slow track
### KDDCups: 2008 to 2013 II

### 2010

- organizer: PSLC Data Shop
- topic: student performance prediction (education)
- data size: 30M
- teams: > 100
- NTU: **champion** and **student-team champion**

### 2011

- organizer: Yahoo!
- topic: music preference prediction (recommendation)
- data size: 300M
- teams: > 1000
- NTU: **double champions**

### KDDCups: 2008 to 2013 III

### 2012

- organizer: Tencent
- topic: web user behavior prediction (Internet)
- data size: 150M
- teams: > 800
- NTU: **champion of track 2**

### 2013

- organizer: Microsoft Research
- topic: paper-author relationship prediction (academia)
- data size: 600M
- teams: > 500
- NTU: **double champions**

### KDDCup 2011

### Music Recommendation Systems

- host: Yahoo!
- **11 years of Yahoo! music data**
- **2 tracks of competition**
- official dates: **March 15 to June 30**
- 1878 teams submitted to track 1; 1854 teams submitted to track 2

### NTU Team for KDDCup 2011

- 3 faculty members: **Profs. Chih-Jen Lin, Hsuan-Tien Lin and Shou-De Lin**
- 1 course (starting in 2010): **Data Mining and Machine Learning: Theory and Practice**
- 3 TAs and 19 students: most were **inexperienced in music recommendation in the beginning**
- official classes: April to June; **actual classes: December to June**

our motto: study state-of-the-art approaches and then **creatively improve them**

### Previously: How Much Did You Like These Movies?

http://www.netflix.com **(1M-dollar competition between 2007-2009)**

goal: use “movies you’ve rated” to automatically predict your **preferences on future movies**

### The Track 1 Problem (1/2)

### Given Data

263M examples (user u, item i, rating r_ui, date t_ui, time τ_ui)

| user | item | rating | date | time |
| --- | --- | --- | --- | --- |
| 1 | 21 | 10 | 102 | 23:52 |
| 1 | 213 | 90 | 1032 | 21:01 |
| 4 | 45 | 95 | 768 | 09:15 |
| · · · | | | | |

- u, i: abstract IDs
- r_ui: an integer between 0 and 100, **mostly multiples of 10**

### Additional Information: Item Hierarchy

- track (46.85%)
- album (19.01%)
- artist (28.84%)
- genre (5.30%)

### The Track 1 Problem (2/2)

### Data Partitioned by Organizers

- training: 253M; validation: 4M; test (w/o ratings): 6M
- per user, **training < validation < test in time**
  - ≥ 20 examples in total
  - 4 examples in validation; 6 in test
- **a fixed random half of the test set: leaderboard; the other half: award decision**

### Goal

predictions r̂_ui ≈ r_ui on the test set, measured by RMSE = √(average of (r̂_ui − r_ui)²)

one submission allowed **every eight hours**
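The evaluation measure in code, with made-up predictions:

```python
# RMSE sketch: square root of the mean squared difference between predicted
# and true ratings, as used to score Track 1 submissions.
from math import sqrt

def rmse(predicted, actual):
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

print(rmse([80.0, 50.0], [90.0, 50.0]))  # sqrt(100 / 2) ≈ 7.07
```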

### Three Properties of Track 1 Data

R =

| | track_1 | track_2 | album_3 | author_4 | · · · | genre_I |
| --- | --- | --- | --- | --- | --- | --- |
| user_1 | 100 | 80 | 70 | **?** | · · · | − |
| user_2 | − | 0 | **?** | 80 | · · · | − |
| · · · | · · · | · · · | · · · | · · · | · · · | · · · |
| user_U | **?** | − | 20 | − | · · · | 0 |

similar to the Netflix data, but with the following differences...

- scale: larger data, so we study mature models that are **computationally feasible**
- taxonomy: a relation graph of tracks, albums, authors and genres, included as features for combining models nonlinearly
- time: detailed, with training earlier than test; included as features for combining models nonlinearly, and we **respect time-closeness** during training

### Framework of Our Solution

### System Architecture

- **improve standard models**: design **variants within 6 families of state-of-the-art models** (reaches RMSE 22.7915)
- **blend the models**: improve prediction power by **blending the variants carefully** (reaches RMSE 21.3598)
- **aggregate the blended predictors**: construct a linear ensemble with **test performance estimators** (reaches RMSE 21.0253)
- **post-process the ensemble**: add a final touch based on **observations from data analysis** (reaches RMSE 21.0147)

not only **hard work (200+ models included)**, but also **key techniques**
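The linear-ensemble step can be sketched as least-squares blending of two models' predictions on a validation set; the predictions and ratings below are made-up, and this is only the textbook form of the idea, not the contest system:

```python
# Linear-ensemble sketch: find weights w1, w2 minimizing the squared error of
# w1*model1 + w2*model2 against validation ratings (2x2 normal equations).
m1 = [70.0, 50.0, 90.0]          # model 1 predictions (made-up)
m2 = [60.0, 55.0, 80.0]          # model 2 predictions (made-up)
y  = [65.0, 52.0, 86.0]          # true validation ratings (made-up)

a11 = sum(p * p for p in m1)
a12 = sum(p * q for p, q in zip(m1, m2))
a22 = sum(q * q for q in m2)
b1 = sum(p * r for p, r in zip(m1, y))
b2 = sum(q * r for q, r in zip(m2, y))
det = a11 * a22 - a12 * a12      # nonzero when the two models differ
w1 = (b1 * a22 - a12 * b2) / det
w2 = (a11 * b2 - a12 * b1) / det

blend = [w1 * p + w2 * q for p, q in zip(m1, m2)]
err = sum((b - r) ** 2 for b, r in zip(blend, y))
print(err <= sum((p - r) ** 2 for p, r in zip(m1, y)))  # blend beats model 1
```

By construction the least-squares blend can never do worse on the validation set than either model alone, since the weights (1, 0) and (0, 1) are among the candidates.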

That’s about all. **Thank you!**