Machine Learning Overview and Applications
Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw
Computational Learning Lab (CLLab) Department of Computer Science
& Information Engineering
National Taiwan University
( 國立台灣大學資訊工程系計算學習實驗室)
materials mostly taken from my “Learning from Data” book, my
“Machine Learning Foundations” free online course, and works from NTU CLLab and NTU KDDCup teams
About Me
Hsuan-Tien Lin
• Associate Professor, Dept. of CSIE, National Taiwan University
• Leader of the Computational Learning Laboratory
• Co-author of the textbook “Learning from Data: A Short Course” (often ML best seller on Amazon)
• Instructor of the NTU-Coursera Mandarin-teaching ML Massive Open Online Courses
• “Machine Learning Foundations”:
www.coursera.org/course/ntumlone
• “Machine Learning Techniques”:
www.coursera.org/course/ntumltwo
What is Machine Learning
From Learning to Machine Learning
learning: acquiring skill with experience accumulated from observations
observations → learning → skill
machine learning: acquiring skill with experience accumulated/computed from data
data → ML → skill
What is skill?
A More Concrete Definition
skill ⇔ improve some performance measure (e.g. prediction accuracy)
machine learning: improving some performance measure with experience computed from data
data → ML → improved performance measure
An Application in Computational Finance
stock data → ML → more investment gain
Why use machine learning?
Yet Another Application: Tree Recognition
• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be easier to build than hand-programmed system
ML: an alternative route to build complicated systems
The Machine Learning Route
ML: an alternative route to build complicated systems
Some Use Scenarios
• when humans cannot program the system manually: navigating on Mars
• when humans cannot ‘define the solution’ easily: speech/visual recognition
• when needing rapid decisions that humans cannot make: high-frequency trading
• when needing to be user-oriented on a massive scale: consumer-targeted marketing
Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)
Key Essence of Machine Learning
machine learning: improving some performance measure with experience computed from data
data → ML → improved performance measure
1 exists some ‘underlying pattern’ to be learned, so ‘performance measure’ can be improved
2 but no programmable (easy) definition, so ‘ML’ is needed
3 somehow there is data about the pattern, so ML has some ‘inputs’ to learn from
key essence: help decide whether to use ML
Snapshot Applications of Machine Learning
Advertisement
data → ML → skill
for 4G LTE communication
• data: customer information and ad information
• skill: predict best ad to show to the user so that she/he clicks
ongoing work of my collaboration with Appier: http://technews.tw/2015/11/03/appier-asia/
Daily Needs: Food, Clothing, Housing, Transportation
data → ML → skill
1 Food (Sadilek et al., 2013)
  • data: Twitter data (words + location)
  • skill: tell the food-poisoning likelihood of a restaurant properly
2 Clothing (Abu-Mostafa, 2012)
  • data: sales figures + client surveys
  • skill: give good fashion recommendations to clients
3 Housing (Tsanas and Xifara, 2012)
  • data: characteristics of buildings and their energy load
  • skill: predict energy load of other buildings closely
4 Transportation (Stallkamp et al., 2012)
  • data: some traffic sign images and meanings
  • skill: recognize traffic signs accurately
ML is everywhere!
Education
data → ML → skill
• data: students’ records on quizzes on a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question
A Possible ML Solution
answer correctly ≈ ⟦recent strength of student > difficulty of question⟧
• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty automatically
key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
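The idea "answer correctly ≈ ⟦strength > difficulty⟧" can be sketched in a few lines. This is only an illustrative toy, not the NTU KDDCup 2010 system: it assumes a logistic model P(correct) = sigmoid(strength − difficulty) fitted by stochastic gradient ascent, and the function names, learning rate and epoch count are hypothetical choices.

```python
import math
import random


def train_strength_difficulty(records, n_students, n_questions,
                              epochs=200, lr=0.1, seed=0):
    """Learn one strength per student and one difficulty per question.

    records: list of (student_id, question_id, correct), correct in {0, 1}.
    Assumed model (not from the slides): P(correct) = sigmoid(strength - difficulty).
    """
    rng = random.Random(seed)
    strength = [0.0] * n_students
    difficulty = [0.0] * n_questions
    records = list(records)
    for _ in range(epochs):
        rng.shuffle(records)
        for s, q, correct in records:
            p = 1.0 / (1.0 + math.exp(-(strength[s] - difficulty[q])))
            grad = correct - p  # gradient of the log-likelihood w.r.t. (strength - difficulty)
            strength[s] += lr * grad
            difficulty[q] -= lr * grad
    return strength, difficulty


def predict_correct(strength, difficulty, s, q):
    """Predict 'answers correctly' exactly when strength exceeds difficulty."""
    return strength[s] > difficulty[q]
```

Given only (student, question, correct) records, the learner recovers both hidden quantities at once; nobody has to label questions as "hard" by hand.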
Entertainment: Recommender System (1/2)
data → ML → skill
• data: ratings that many users have given to some movies
• skill: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
How can machines learn our preferences?
Entertainment: Recommender System (2/2)
[Figure: match movie and viewer factors to get the predicted rating. Movie factors: comedy content, action content, blockbuster?, Tom Cruise in it? Viewer factors: likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise? Add contributions from each factor.]
A Possible ML Solution
• pattern: rating ← viewer/movie factors
• learning: known rating → learned factors → unknown rating prediction
key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
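The flow "known rating → learned factors → unknown rating prediction" can be sketched as a small matrix factorization trained by stochastic gradient descent. This is a generic textbook-style sketch under stated assumptions, not the NTU KDDCup 2011 system: the number of factors `k`, the learning rate, the regularization strength and all function names are hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, k=2, epochs=500,
              lr=0.02, reg=0.02, seed=0):
    """Learn k viewer factors per user and k movie factors per item.

    ratings: list of (user_id, item_id, rating) for the known ratings.
    The predicted rating is the sum of factor-by-factor contributions.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Move both factors to reduce the squared error, with shrinkage.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Add the contributions from each factor."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```

After training on the known ratings, `predict(P, Q, u, i)` can be called for any (user, item) pair, including pairs with no rating in the data.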
Components of Machine Learning
Components of Learning: Metaphor Using Credit Approval
Applicant Information
age: 23 years
gender: female
annual salary: NTD 1,000,000
year in residence: 1 year
year in job: 0.5 year
current debt: 200,000
unknown pattern to be learned: ‘approve credit card good for bank?’
Formalize the Learning Problem
Basic Notations
• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
• data ⇔ training examples: $D = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$ (historical records in bank)
• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)
{(x_n, y_n)} from f → ML → g
Learning Flow for Credit Approval
unknown target function f : X → Y (ideal credit approval formula)
→ training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)
• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f but possibly different from f (perfection ‘impossible’ when f unknown)
What does g look like?
The Learning Model
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas) → learning algorithm A
• assume g ∈ H = {h_k}, i.e. approving if
  • h_1: annual salary > NTD 800,000
  • h_2: debt > NTD 100,000 (really?)
  • h_3: year in job ≤ 2 (really?)
• hypothesis set H:
  • can contain good or bad hypotheses
  • up to A to pick the ‘best’ one as g
learning model = A and H
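As a minimal sketch of "learning model = A and H", the three candidate rules from the slide can be written as functions, with a learning algorithm that simply picks the one making the fewest mistakes on the training examples. The record fields, the selection rule (lowest training error) and the four historical records are assumptions made up for illustration.

```python
# Hypothesis set H: each h maps an application x to approve (True) or not (False).
def h1(x):
    return x["annual_salary"] > 800_000


def h2(x):
    return x["debt"] > 100_000


def h3(x):
    return x["years_in_job"] <= 2


H = {"h1": h1, "h2": h2, "h3": h3}


def learning_algorithm(hypotheses, data):
    """A: return the name and function of the hypothesis with fewest training errors."""
    def n_errors(name):
        return sum(hypotheses[name](x) != y for x, y in data)

    best = min(hypotheses, key=n_errors)
    return best, hypotheses[best]


# Hypothetical historical records D: (application, was approval good for the bank?)
D = [
    ({"annual_salary": 1_000_000, "debt": 200_000, "years_in_job": 0.5}, True),
    ({"annual_salary": 500_000, "debt": 50_000, "years_in_job": 1}, False),
    ({"annual_salary": 900_000, "debt": 0, "years_in_job": 5}, True),
    ({"annual_salary": 300_000, "debt": 300_000, "years_in_job": 0.5}, False),
]

name, g = learning_algorithm(H, D)
```

On these made-up records the salary rule fits best, so A returns it as g. With a different D the same A could pick a different hypothesis.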
Practical Definition of Machine Learning
unknown target function f : X → Y (ideal credit approval formula)
→ training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas) → learning algorithm A
machine learning: use data to compute hypothesis g that approximates target f
Machine Learning Research in CLLab
Making Machine Learning Realistic: Now
[Figure: learning flow. Oracle (truth f(x) + noise e(x)) → data (instance x_n, label y_n) → learning algorithm → good learning system g(x), with the learning model {h(x)} feeding the learning algorithm; the flow carries the tags (1) to (4).]
CLLab Works: Loosen the Limits of ML
1 cost-sensitive classification: limited protocol (classification) + auxiliary info. (cost)
2 multi-label classification: limited protocol (classification) + structure info. (label relation)
3 active learning: limited protocol (unlabeled data) + requested info. (query)
4 online learning: limited protocol (streaming data) + feedback info. (loss)
next: (1) cost-sensitive classification
Which Digit Did You Write?
[Figure: a written digit to be labeled as one (1), two (2) or three (3)]
a classification problem: grouping “pictures” into different “categories”
Traditional Classification Problem
[Figure: learning flow. Oracle (truth f(x) + noise e(x)) → data (instance x_n, label y_n) → learning algorithm → good learning system g(x) ≈ f(x), with the learning model {g_α(x)} feeding the learning algorithm.]
1 input: a batch of examples (digit x_n, intended label y_n)
2 desired output: some g(x) such that g(x) ≠ y seldom for future examples (x, y)
3 evaluation for some digit (x = [digit image], y = 2): g(x) = 1: wrong; g(x) = 2: right; g(x) = 3: wrong
Are all the wrongs equally bad?
What is the Status of the Patient?
[Figure: a patient to be labeled as H1N1-infected, cold-infected or healthy]
another classification problem: grouping “patients” into different “status”
Patient Status Prediction
error measure = society cost
actual \ predicted   H1N1   cold   healthy
H1N1                 0      1000   100000
cold                 100    0      3000
healthy              100    30     0
• H1N1 mis-predicted as healthy: very high cost
• cold mis-predicted as healthy: high cost
• cold correctly predicted as cold: no cost
human doctors consider costs of decision; can computer-aided diagnosis do the same?
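One simple way to let a program "consider costs of decision" is sketched here, using the cost table from this slide. It assumes some classifier already supplies class probabilities for a patient (an assumption, the slides do not say so) and then picks the prediction with the lowest expected cost. This is a generic decision rule, not the specific CLLab algorithms, and the function names are hypothetical.

```python
CLASSES = ["H1N1", "cold", "healthy"]

# COST[actual][predicted], copied from the society-cost table on the slide.
COST = {
    "H1N1": {"H1N1": 0, "cold": 1000, "healthy": 100000},
    "cold": {"H1N1": 100, "cold": 0, "healthy": 3000},
    "healthy": {"H1N1": 100, "cold": 30, "healthy": 0},
}


def total_cost(actual, predicted):
    """Cost-sensitive error measure: sum the cost of every (actual, predicted) pair."""
    return sum(COST[a][p] for a, p in zip(actual, predicted))


def min_expected_cost_prediction(prob):
    """Pick the prediction with the lowest expected cost.

    prob: dict mapping each class to its estimated probability (assumed given).
    """
    def expected_cost(pred):
        return sum(prob[a] * COST[a][pred] for a in CLASSES)

    return min(CLASSES, key=expected_cost)
```

For a patient estimated as 10% H1N1, 30% cold and 60% healthy, the most probable class is "healthy", yet the expected costs are 90 for predicting H1N1, 118 for cold and 10900 for healthy, so the cost-aware choice is "H1N1".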
Our Contributions
                 binary                   multiclass
regular          well-studied             well-studied
cost-sensitive   known (Zadrozny, 2003)   ongoing (our works)
theoretic, algorithmic and empirical studies of cost-sensitive classification
• ICML 2010: a theoretically-supported algorithm with superior experimental results
• BIBM 2011: application to real-world bacteria classification with promising experimental results
• KDD 2012: a cost-sensitive and error-sensitive methodology (achieving both low cost and few wrongs)
Making Machine Learning Realistic: Next
[Figure: interactive learning flow with a teacher, a learning algorithm, a learning model and knowledge; labels on the flow: cost c(t), query x(t) & guess ŷ(t)]
Interactive Machine Learning
1 environment
2 exploration
3 dynamic
4 partial feedback
let us teach machines as “easily” as teaching students
Case: Interactive Learning for Online Advertisement
Traditional Machine Learning for Online Advertisement
• data gathering: system randomly shows ads to some previous users
• expert building: system analyzes data gathered to determine best (fixed) strategy
Interactive Machine Learning for Online Advertisement
• environment: system serves online users with profile
• exploration: system decides to show an ad to the user
• dynamic: system receives data from real-time user click
• partial feedback: system receives reward only if clicking
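The interactive setting can be illustrated with an epsilon-greedy selector, one of the simplest exploration strategies. This is a sketch under strong assumptions, not the method used in the collaboration or the challenge: it ignores the user profile, the click rates in the simulation are invented, and the class and function names are hypothetical.

```python
import random


class EpsilonGreedyAdSelector:
    """Show the ad with the best observed click rate, exploring at random sometimes."""

    def __init__(self, n_ads, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.shows = [0] * n_ads
        self.clicks = [0] * n_ads

    def choose(self):
        # Exploration: with probability epsilon, try a random ad.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        # Exploitation: otherwise show the ad with the highest click rate so far.
        rates = [c / s if s else 0.0 for c, s in zip(self.clicks, self.shows)]
        return max(range(len(rates)), key=lambda ad: rates[ad])

    def update(self, ad, clicked):
        # Partial feedback: only the shown ad's statistics change.
        self.shows[ad] += 1
        self.clicks[ad] += int(clicked)


def simulate(true_ctr, rounds=5000, epsilon=0.1, seed=0):
    """Simulate users who click ad i with (hidden) probability true_ctr[i]."""
    env = random.Random(seed + 1)
    selector = EpsilonGreedyAdSelector(len(true_ctr), epsilon, seed)
    for _ in range(rounds):
        ad = selector.choose()
        clicked = env.random() < true_ctr[ad]
        selector.update(ad, clicked)
    return selector.shows, selector.clicks
```

Unlike the traditional two-stage approach, data gathering and decision making happen in the same loop: each click or non-click immediately changes which ad is shown next.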
ICML 2012 Exploration & Exploitation Challenge
Interactive Machine Learning for Online Advertisement
• environment: system serves online users with profile
• exploration: system decides to show an ad to the user
• dynamic: system receives data from real-time user click
• partial feedback: system receives reward only if clicking
NTU beats two MIT teams to be the phase 1 winner!
ongoing collaboration with Appier for online advertisement
More on KDDCup
What is KDDCup?
Background
• an annual competition on KDD (knowledge discovery and data mining)
• organized by ACM SIGKDD, starting from 1997, now the most prestigious data mining competition
• usually lasts 3-4 months
• participants include famous research labs (IBM, AT&T) and top universities (Stanford, Berkeley)
Aim of KDDCup
Aim
• bridge the gap between theory and practice, such as
  • scalability and efficiency
  • missing data and noise
  • heterogeneous data
  • unbalanced data
  • combination of different models
• define the state-of-the-art
KDDCups: 2008 to 2013 I
2008
• organizer: Siemens
• topic: breast cancer prediction (medical)
• data size: 0.2M
• teams: > 200
• NTU: co-champion with IBM (led by Prof. Shou-de Lin)
2009
• organizer: Orange
• topic: customer behavior prediction (business)
• data size: 0.1M
• teams: > 400
• NTU: 3rd place of slow track
KDDCups: 2008 to 2013 II
2010
• organizer: PSLC Data Shop
• topic: student performance prediction (education)
• data size: 30M
• teams: > 100
• NTU: champion and student-team champion
2011
• organizer: Yahoo!
• topic: music preference prediction (recommendation)
• data size: 300M
• teams: > 1000
• NTU: double champions
KDDCups: 2008 to 2013 III
2012
• organizer: Tencent
• topic: web user behavior prediction (Internet)
• data size: 150M
• teams: > 800
• NTU: champion of track 2
2013
• organizer: Microsoft Research
• topic: paper-author relationship prediction (academia)
• data size: 600M
• teams: > 500
• NTU: double champions
KDDCup 2011
Music Recommendation Systems
• host: Yahoo!
  • 11 years of Yahoo! music data
  • 2 tracks of competition
• official dates: March 15 to June 30
• 1878 teams submitted to track 1; 1854 teams submitted to track 2
NTU Team for KDDCup 2011
• 3 faculty members: Profs. Chih-Jen Lin, Hsuan-Tien Lin and Shou-De Lin
• 1 course (starting in 2010): Data Mining and Machine Learning: Theory and Practice
• 3 TAs and 19 students: most were inexperienced in music recommendation in the beginning
• official classes: April to June; actual classes: December to June
our motto: study state-of-the-art approaches and then creatively improve them
Previously: How Much Did You Like These Movies?
http://www.netflix.com
(1M dollar competition between 2007 and 2009)
goal: use “movies you’ve rated” to automatically predict your preferences on future movies
The Track 1 Problem (1/2)
Given Data
263M examples (user u, item i, rating $r_{ui}$, date $t_{ui}$, time $\tau_{ui}$)
user   item   rating   date   time
1      21     10       102    23:52
1      213    90       1032   21:01
4      45     95       768    09:15
· · ·
• u, i: abstract IDs
• $r_{ui}$: integer between 0 and 100, mostly multiples of 10
Additional Information: Item Hierarchy
• track (46.85%)
• album (19.01%)
• artist (28.84%)
• genre (5.30%)
The Track 1 Problem (2/2)
Data Partitioned by Organizers
• training: 253M; validation: 4M; test (w/o rating): 6M
• per user, training < validation < test in time
  • ≥ 20 examples total
  • 4 examples in validation; 6 in test
  • fixed random half of test: leaderboard; another half: award decision
Goal
predictions $\hat{r}_{ui} \approx r_{ui}$ on the test set, measured by $\mathrm{RMSE} = \sqrt{\mathrm{average}\,(\hat{r}_{ui} - r_{ui})^2}$
(one submission allowed every eight hours)
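For concreteness, the RMSE measure above can be computed as in this small sketch; the function name and the example numbers are illustrative only.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For instance, predicting 90, 50 and 10 when the true ratings are 100, 50 and 0 gives squared errors 100, 0 and 100, so the RMSE is the square root of 200/3, roughly 8.16 on the 0 to 100 rating scale.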
Three Properties of Track 1 Data
R =
         track_1   track_2   album_3   author_4   · · ·   genre_I
user_1   100       80        70        ?          · · ·   −
user_2   −         0         ?         80         · · ·   −
· · ·    · · ·     · · ·     · · ·     · · ·      · · ·   · · ·
user_U   ?         −         20        −          · · ·   0
similar to Netflix data, but with the following differences...
• scale: larger data
—study mature models that are computationally feasible
• taxonomy: relation graph of tracks, albums, authors and genres
—include as features for combining models nonlinearly
• time: detailed; training earlier than test
—include as features for combining models nonlinearly;
respect time-closeness during training
Framework of Our Solution
System Architecture
• improve standard models: design variants within 6 families of state-of-the-art models (reaches RMSE 22.7915)
• blend the models: improve prediction power by blending the variants carefully (reaches RMSE 21.3598)
• aggregate the blended predictors: construct a linear ensemble with test performance estimators (reaches RMSE 21.0253)
• post-process the ensemble: add a final touch based on observations from data analysis (reaches RMSE 21.0147)
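To give a feel for why combining predictors helps, here is a much simplified sketch of a linear ensemble: the weights are fitted by ordinary least squares on held-out ratings. This is an assumption for illustration only; the actual solution used more elaborate blending and test performance estimators, and the function names here are hypothetical.

```python
def blend_weights(model_preds, targets):
    """Least-squares weights w minimizing sum_n (sum_m w[m] * model_preds[m][n] - targets[n])^2.

    model_preds: one list of predictions per model, all on the same held-out examples.
    """
    m = len(model_preds)
    # Normal equations A w = b with A = X^T X and b = X^T y.
    A = [[sum(x * y for x, y in zip(model_preds[i], model_preds[j]))
          for j in range(m)] for i in range(m)]
    b = [sum(x * t for x, t in zip(model_preds[i], targets)) for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting (assumes A is non-singular).
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(m)]


def blend(model_preds, weights):
    """Weighted sum of the models' predictions, example by example."""
    n = len(model_preds[0])
    return [sum(w * p[k] for w, p in zip(weights, model_preds)) for k in range(n)]
```

When two models err in opposite directions, for example one predicting 20, 40, 100, 20 and the other 0, 60, 80, 40 for true ratings 10, 50, 90, 30, the fitted weights are 0.5 each and the blend is better than either model alone.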
not only hard work (200+ models included), but also key techniques
That’s about all. Thank you!