Machine Learning Overviews and Applications
Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw
Department of Computer Science
& Information Engineering
National Taiwan University
IRTG TAC-ICT Meeting, 01/11/2016
materials mostly taken from my “Learning from Data” book, my “Machine Learning Foundations” free online course, and works from NTU CLLab and NTU KDDCup teams
About Me
Hsuan-Tien Lin
• Associate Professor, Dept. of CSIE, National Taiwan University
• Leader of the Computational Learning Laboratory
• Co-author of the textbook “Learning from Data: A Short Course” (often the best-selling ML book on Amazon)
• Instructor of the NTU-Coursera Mandarin-taught ML Massive Open Online Courses:
• “Machine Learning Foundations”:
www.coursera.org/course/ntumlone
• “Machine Learning Techniques”:
www.coursera.org/course/ntumltwo
What is Machine Learning
From Learning to Machine Learning
learning: acquiring skill with experience accumulated from observations
  observations → learning → skill
machine learning: acquiring skill with experience accumulated/computed from data
  data → ML → skill
What is skill?
What is Machine Learning
A More Concrete Definition
skill ⇔ improving some performance measure (e.g. prediction accuracy)
machine learning: improving some performance measure with experience computed from data
  data → ML → improved performance measure
An Application in Computational Finance
stock data → ML → more investment gain
Why use machine learning?
Yet Another Application: Tree Recognition
• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• an ‘ML-based tree recognition system’ can be easier to build than a hand-programmed system
The Machine Learning Route
ML: an alternative route to build complicated systems
Some Use Scenarios
• when humans cannot program the system manually: navigating on Mars
• when humans cannot ‘define the solution’ easily: speech/visual recognition
• when rapid decisions are needed that humans cannot make: high-frequency trading
• when user orientation at massive scale is needed: consumer-targeted marketing
Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)
What is Machine Learning
Key Essence of Machine Learning
machine learning: improving some performance measure with experience computed from data
  data → ML → improved performance measure
1 exists some ‘underlying pattern’ to be learned, so the ‘performance measure’ can be improved
2 but no programmable (easy) definition, so ‘ML’ is needed
3 somehow there is data about the pattern, so ML has some ‘inputs’ to learn from
key essence: helps decide whether to use ML
Snapshot Applications of Machine Learning
Communication
data → ML → skill
for 4G LTE communication
• data:
  • channel information (the channel matrix representing mutual information)
  • configuration (precoding, modulation, etc.) that reaches the highest throughput
• skill: predict the best configuration for the base station in a new environment
previous work of my student Yi-An Lin as an intern @ MTK
Advertisement
data → ML → skill
for cross-screen ad placement
• data:
  • customer information
  • device information
  • ad information
• skill: predict the best ad to show to the user across devices so that she/he clicks
ongoing work in my collaboration with Appier: http://technews.tw/2015/11/03/appier-asia/
Daily Needs: Food, Clothing, Housing, Transportation
data → ML → skill
1 Food (Sadilek et al., 2013)
  • data: Twitter data (words + location)
  • skill: tell the food-poisoning likeliness of a restaurant properly
2 Clothing (Abu-Mostafa, 2012)
  • data: sales figures + client surveys
  • skill: give good fashion recommendations to clients
3 Housing (Tsanas and Xifara, 2012)
  • data: characteristics of buildings and their energy load
  • skill: predict the energy load of other buildings closely
4 Transportation (Stallkamp et al., 2012)
  • data: some traffic-sign images and meanings
  • skill: recognize traffic signs accurately
ML is everywhere!
Education
data → ML → skill
• data: students’ records on quizzes in a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question
A Possible ML Solution
• answer correctly ≈ ⟦recent strength of student > difficulty of question⟧
• give ML 9 million records from 3,000 students
• ML determines (reverse-engineers) strength and difficulty automatically
key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
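As an illustration, here is a minimal sketch of how such reverse-engineering could work, assuming a simple logistic model in which the probability of a correct answer grows with (strength − difficulty); the toy records, learning rate, and update rule are illustrative assumptions, not the actual KDDCup 2010 system.

```python
import numpy as np

# toy records (student, question, correct?); illustrative data only
records = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1), (2, 0, 0), (2, 1, 0)]
n_students, n_questions = 3, 2

strength = np.zeros(n_students)     # one latent number per student
difficulty = np.zeros(n_questions)  # one latent number per question
lr = 0.1

for epoch in range(200):
    for s, q, y in records:
        # model: P(correct) = sigmoid(strength - difficulty)
        p = 1.0 / (1.0 + np.exp(-(strength[s] - difficulty[q])))
        grad = y - p                # gradient of the log-likelihood
        strength[s] += lr * grad    # correct answers raise strength
        difficulty[q] -= lr * grad  # wrong answers raise difficulty

# a student is predicted to answer correctly when strength > difficulty
print(strength, difficulty)
```

With enough records, the learned numbers order students by strength and questions by difficulty, which is exactly the ‘reverse-engineering’ described above.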
Entertainment: Recommender System (1/2)
data → ML → skill
• data: how many users have rated some movies
• skill: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
How can machines learn our preferences?
Snapshot Applications of Machine Learning
Entertainment: Recommender System (2/2)
[figure: match movie and viewer factors; movie factors (comedy content, action content, blockbuster?, Tom Cruise in it?) are matched against viewer factors (likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?), and the predicted rating adds contributions from each factor]
A Possible ML Solution
• pattern: rating ← viewer/movie factors
• learning: known ratings → learned factors → unknown rating prediction
key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
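A minimal matrix-factorization sketch of the ‘known ratings → learned factors’ step, assuming plain stochastic gradient descent on squared rating error; the toy ratings, factor dimension k, and learning rate are illustrative assumptions, not the actual KDDCup 2011 system.

```python
import numpy as np

# toy ratings (viewer, movie, rating); illustrative data only
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (2, 1, 2.0)]
n_viewers, n_movies, k = 3, 2, 2   # k latent factors per viewer/movie

rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(n_viewers, k))  # viewer factors
M = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors
lr = 0.05

for epoch in range(500):
    for u, m, r in ratings:
        pred = V[u] @ M[m]         # add contributions from each factor
        err = r - pred
        # simultaneous SGD step on both factor vectors
        V[u], M[m] = V[u] + lr * err * M[m], M[m] + lr * err * V[u]

# predict how viewer 2 would rate (unrated) movie 0
print(V[2] @ M[0])
```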
Components of Machine Learning
Components of Learning: Metaphor Using Credit Approval
Applicant Information
  age: 23 years
  gender: female
  annual salary: NTD 1,000,000
  year in residence: 1 year
  year in job: 0.5 year
  current debt: 200,000
unknown pattern to be learned:
‘approve credit card good for bank?’
Formalize the Learning Problem
Basic Notations
• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving the credit card)
• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
• data ⇔ training examples: D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in the bank)
• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)
{(x_n, y_n)} from f → ML → g
Learning Flow for Credit Approval
unknown target function f : X → Y (ideal credit approval formula)
  ↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
  ↓
learning algorithm A
  ↓
final hypothesis g ≈ f (‘learned’ formula to be used)
• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f, but possibly different from f (perfection ‘impossible’ when f is unknown)
What does g look like?
The Learning Model
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
  ↓
learning algorithm A (with hypothesis set H: the set of candidate formulas)
  ↓
final hypothesis g ≈ f (‘learned’ formula to be used)
• assume g ∈ H = {h_k}, i.e. approving if
  • h_1: annual salary > NTD 800,000
  • h_2: debt > NTD 100,000 (really?)
  • h_3: year in job ≤ 2 (really?)
• hypothesis set H:
  • can contain good or bad hypotheses
  • up to A to pick the ‘best’ one as g
learning model = A and H
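A minimal sketch of ‘learning model = A and H’, assuming A simply returns the hypothesis in H with the fewest mistakes on D; the three threshold rules mirror h_1 to h_3 above, and the field names and toy records are illustrative assumptions.

```python
# candidate formulas H: each maps an applicant record to approve (True) or not
H = [
    lambda a: a["salary"] > 800_000,   # h1: annual salary > NTD 800,000
    lambda a: a["debt"] > 100_000,     # h2: debt > NTD 100,000 (really?)
    lambda a: a["year_in_job"] <= 2,   # h3: year in job <= 2 (really?)
]

# training examples D: (applicant x_n, good-for-bank label y_n); toy data
D = [
    ({"salary": 1_000_000, "debt": 200_000, "year_in_job": 0.5}, True),
    ({"salary": 400_000,   "debt": 50_000,  "year_in_job": 3.0}, False),
    ({"salary": 900_000,   "debt": 10_000,  "year_in_job": 1.0}, True),
]

# learning algorithm A: pick the h in H with the fewest mistakes on D
def A(H, D):
    return min(H, key=lambda h: sum(h(x) != y for x, y in D))

g = A(H, D)  # final hypothesis g, hopefully ≈ f
```

Real algorithms search far larger hypothesis sets, but the division of labor is the same: H supplies the candidates, A picks g.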
Practical Definition of Machine Learning
unknown target function f : X → Y (ideal credit approval formula)
  ↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in the bank)
  ↓
learning algorithm A (with hypothesis set H: the set of candidate formulas)
  ↓
final hypothesis g ≈ f (‘learned’ formula to be used)
machine learning: use data to compute hypothesis g that approximates target f
Machine Learning Research in CLLab
Making Machine Learning Realistic: Now
[figure: oracle (truth f(x) + noise e(x)) → data (instances x_n, labels y_n) → learning algorithm with learning model {h(x)} → good learning system g(x); arrows (1)-(4) mark the places where the limits of ML can be loosened]
CLLab Works: Loosen the Limits of ML
1 cost-sensitive classification: limited protocol (classification) + auxiliary info. (cost)
2 multi-label classification: limited protocol (classification) + structure info. (label relation)
3 active learning: limited protocol (unlabeled data) + requested info. (query)
4 online learning: limited protocol (streaming data) + feedback info. (loss)
next: (1) cost-sensitive classification
Which Digit Did You Write?
[figure: a handwritten digit; is it a one (1), a two (2), or a three (3)?]
a classification problem: grouping “pictures” into different “categories”
Traditional Classification Problem
[figure: oracle (truth f(x) + noise e(x)) → data (instances x_n, labels y_n) → learning algorithm with learning model {g_α(x)} → good learning system g(x) ≈ f(x)]
1 input: a batch of examples (digit x_n, intended label y_n)
2 desired output: some g(x) such that g(x) ≠ y seldom for future examples (x, y)
3 evaluation for some digit (x = ⟨image of a handwritten 2⟩, y = 2): g(x) = 1: wrong; 2: right; 3: wrong
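In code, the evaluation above is the usual 0/1 error; a minimal sketch (the function name is an illustrative choice):

```python
# 0/1 error: every mistake counts the same, whether g says 1 or 3 for a true 2
def zero_one_error(g, examples):
    return sum(g(x) != y for x, y in examples) / len(examples)
```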
Are all the wrongs equally bad?
What is the Status of the Patient?
[figure: a patient; H1N1-infected, cold-infected, or healthy?]
another classification problem: grouping “patients” into different “statuses”
Patient Status Prediction
error measure = societal cost

actual \ predicted      H1N1    cold    healthy
H1N1                       0    1000     100000
cold                     100       0       3000
healthy                  100      30          0
• H1N1 mis-predicted as healthy: very high cost
• cold mis-predicted as healthy: high cost
• cold correctly predicted as cold: no cost
human doctors consider the costs of their decisions; can computer-aided diagnosis do the same?
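One way computer-aided diagnosis could take these costs into account, as a minimal sketch: assume the classifier outputs a probability for each status and predict the status with the lowest expected cost. The probability vector below is an illustrative assumption.

```python
import numpy as np

statuses = ["H1N1", "cold", "healthy"]
# cost[actual][predicted], from the table above
cost = np.array([[0, 1000, 100000],
                 [100, 0, 3000],
                 [100, 30, 0]])

def cost_sensitive_predict(p):
    """p[i] = estimated P(actual status = statuses[i]); pick min expected cost."""
    expected = p @ cost            # expected cost of each possible prediction
    return statuses[int(np.argmin(expected))]

# with p = (0.2, 0.3, 0.5), plain argmax would say 'healthy', but the expected
# costs are (80, 215, 20900), so the cost-sensitive choice is 'H1N1'
print(cost_sensitive_predict(np.array([0.2, 0.3, 0.5])))
```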
Our Contributions
                 binary                   multiclass
regular          well-studied             well-studied
cost-sensitive   known (Zadrozny, 2003)   ongoing (our works)
theoretic, algorithmic and empirical studies of cost-sensitive classification
• ICML 2010: a theoretically supported algorithm with superior experimental results
• BIBM 2011: application to real-world bacteria classification with promising experimental results
• KDD 2012: a cost-sensitive and error-sensitive methodology (achieving both low cost and few wrongs)
Making Machine Learning Realistic: Next
[figure: interactive loop; the learning algorithm with its learning model sends query x(t) & guess ŷ(t) to a teacher, receives cost c(t), and accumulates knowledge]
Interactive Machine Learning
1 environment
2 exploration
3 dynamic
4 partial feedback
let us teach machines as “easily” as teaching students
Case: Interactive Learning for Online Advertisement
Traditional Machine Learning for Online Advertisement
• data gathering: system randomly shows ads to some previous users
• expert building: system analyzes the gathered data to determine the best (fixed) strategy
Interactive Machine Learning for Online Advertisement
• environment: system serves online users with profiles
• exploration: system decides to show an ad to the user
• dynamic: system receives data from real-time user clicks
• partial feedback: system receives a reward only if the user clicks (see the sketch below)
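The sketch referenced above: a minimal ε-greedy bandit loop covering the four ingredients, assuming a fixed set of ads and click-rate estimates updated from partial feedback; the numbers and simulated click are illustrative, not the actual challenge setting.

```python
import random

n_ads = 5
shows = [0] * n_ads    # times each ad was shown
clicks = [0] * n_ads   # times each ad was clicked
epsilon = 0.1          # exploration rate

def choose_ad():
    # exploration: sometimes try a random ad;
    # exploitation: otherwise show the best click rate so far
    if random.random() < epsilon:
        return random.randrange(n_ads)
    return max(range(n_ads),
               key=lambda a: clicks[a] / shows[a] if shows[a] else 1.0)

def update(ad, clicked):
    # partial feedback: we only learn about the ad we actually showed
    shows[ad] += 1
    clicks[ad] += int(clicked)

# one interaction: serve a user, show an ad, observe a (simulated) click
ad = choose_ad()
update(ad, clicked=random.random() < 0.05)  # illustrative 5% click rate
```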
ICML 2012 Exploration & Exploitation Challenge
NTU beats two MIT teams to be the phase 1 winner!
ongoing collaboration with Appier for online advertisement
More on KDDCup
What is KDDCup?
Background
• an annual competition on KDD (knowledge discovery and data mining)
• organized by ACM SIGKDD, starting from 1997; now the most prestigious data mining competition
• usually lasts 3-4 months
• participants include famous research labs (IBM, AT&T) and top universities (Stanford, Berkeley)
Aim
• bridge the gap between theory and practice, such as
  • scalability and efficiency
  • missing data and noise
  • heterogeneous data
  • unbalanced data
• define the state-of-the-art
KDDCups: 2008 to 2015 (1/4)
2008
• organizer: Siemens
• topic: breast cancer prediction (medical)
• data size: 0.2M
• teams: > 200
• NTU: co-champion with IBM
2009
• organizer: Orange
• topic: customer behavior prediction (business)
• data size: 0.1M
• teams: > 400
• NTU: 3rd place of the slow track
KDDCups: 2008 to 2015 (2/4)
2010
• organizer: PSLC Data Shop
• topic: student performance prediction (education)
• data size: 30M
• teams: > 100
• NTU: champion and student-team champion
2011
• organizer: Yahoo!
• topic: music preference prediction (recommendation)
• data size: 300M
• teams: > 1000
• NTU: double champions
KDDCups: 2008 to 2015 (3/4)
2012
• organizer: Tencent
• topic: web user behavior prediction (Internet)
• data size: 150M
• teams: > 800
• NTU: champion of track 2
2013
• organizer: Microsoft Research
• topic: paper-author relationship prediction (academia)
• data size: 600M
• teams: > 500
• NTU: double champions
KDDCups: 2008 to 2015 (4/4)
2014
• organizer: DonorsChoose
• topic: charity proposal recommendation (social work)
• data size: 850M
• teams: > 450
• NTU: top 20
2015
• organizer: XuetangX
• topic: dropout student prediction (online education)
• data size: 100M
• teams: > 800
• NTU: 4th place
Our Systematic Steps in KDDCups
1 data analysis (on part of the data)
  • calculate statistics to identify outliers
  • visualize data to see trends/patterns
2 feature extraction
  • feature design by humans: common encodings, domain knowledge, etc.
  • feature learning by machines: sparse coding, matrix factorization, deep learning, etc.
3 model learning
  • model exploration (trial-and-evaluate) to improve performance
  • model selection to avoid overfitting
4 hypothesis blending (towards a big ensemble; see the sketch at the end)
  • careful non-linear blending to be sophisticated
  • careful linear blending (voting/averaging) to be robust
you can also follow these steps for your own applications, except for maybe the “big ensemble”!
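The blending sketch referenced in step 4: a minimal linear blend that averages (or weight-averages) several hypotheses’ predictions; the toy predictions and uniform weights are illustrative assumptions, not an actual KDDCup recipe.

```python
import numpy as np

def linear_blend(predictions, weights=None):
    """Average the real-valued predictions of several hypotheses.

    predictions: list of arrays, one per hypothesis, same length
    weights: optional blending weights (e.g. tuned on a validation set)
    """
    P = np.stack(predictions)  # shape: (n_hypotheses, n_examples)
    w = np.full(len(P), 1 / len(P)) if weights is None else np.asarray(weights)
    return w @ P               # weighted vote / average

# e.g. blend three models' predicted ratings for four test examples
g1 = np.array([3.9, 1.2, 4.5, 2.0])
g2 = np.array([4.1, 0.8, 4.8, 2.2])
g3 = np.array([3.7, 1.5, 4.2, 1.9])
print(linear_blend([g1, g2, g3]))
```

Uniform averaging is the robust default; the non-linear blending mentioned in step 4 would instead train another model on these predictions.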