Machine Learning Foundations (機器學習基石)
Lecture 1: The Learning Problem
Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering
National Taiwan University (國立台灣大學資訊工程系)
The Learning Problem Course Introduction
Course Design (1/2)
Machine Learning: a mixture of theoretical and practical tools
• theory oriented
  • derive everything deeply for solid understanding
  • less interesting to general audience
• techniques oriented
  • flash over the sexiest techniques broadly for shiny coverage
  • too many techniques, hard to choose, hard to use properly
our approach: foundation oriented
Course Design (2/2)
Foundation Oriented ML Course
• mixture of philosophical illustrations, key theory, core techniques, usage in practice, and hopefully jokes :-)
  (what every machine learning user should know)
• story-like:
  • When Can Machines Learn? (illustrative + technical)
  • Why Can Machines Learn? (theoretical + illustrative)
  • How Can Machines Learn? (technical + practical)
  • How Can Machines Learn Better? (practical + theoretical)
allows students to learn ‘future/untaught’ techniques or study deeper theory easily
Course History
NTU Version
• 15-17 weeks (2+ hours)
• highly praised, with English and blackboard teaching
Coursera Version
• 8 weeks of ‘foundation’ (this course) + 7 weeks of ‘techniques’ (coming course)
• Mandarin teaching to reach a wider audience in need
• slides teaching, improved with Coursera’s quiz and homework mechanisms
goal: try making the Coursera version even better than the NTU version
Fun Time
Which of the following descriptions of this course is true?
1 the course will be taught in Taiwanese
2 the course will tell me the techniques that create the android Lieutenant Commander Data in Star Trek
3 the course will be 15 weeks long
4 the course will be story-like
Reference Answer: 4
1 no, my Taiwanese is unfortunately not good enough for teaching (yet)
2 no, although what we teach may serve as foundations of those (future) techniques
3 no, unless you choose to join the next course
4 yes, let’s begin the story
Roadmap
1 When Can Machines Learn?
  Lecture 1: The Learning Problem
    Course Introduction
    What is Machine Learning
    Applications of Machine Learning
    Components of Machine Learning
    Machine Learning and Other Fields
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?
The Learning Problem What is Machine Learning
From Learning to Machine Learning
learning: acquiring skill with experience accumulated from observations
observations → learning → skill
machine learning: acquiring skill with experience accumulated/computed from data
data → ML → skill
What is skill?
A More Concrete Definition
skill ⇔ improve some performance measure (e.g. prediction accuracy)
machine learning: improving some performance measure with experience computed from data
data → ML → improved performance measure
An Application in Computational Finance
stock data → ML → more investment gain
Why use machine learning?
Yet Another Application: Tree Recognition
• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be easier to build than hand-programmed system
ML: an alternative route to build complicated systems
The Machine Learning Route
ML: an alternative route to build complicated systems
Some Use Scenarios
• when humans cannot program the system manually: navigating on Mars
• when humans cannot ‘define the solution’ easily: speech/visual recognition
• when needing rapid decisions that humans cannot make: high-frequency trading
• when needing to be user-oriented at a massive scale: consumer-targeted marketing
Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)
Key Essence of Machine Learning
machine learning: improving some performance measure with experience computed from data
data → ML → improved performance measure
1 exists some ‘underlying pattern’ to be learned, so ‘performance measure’ can be improved
2 but no programmable (easy) definition, so ‘ML’ is needed
3 somehow there is data about the pattern, so ML has some ‘inputs’ to learn from
key essence: help decide whether to use ML
Fun Time
Which of the following is best suited for machine learning?
1 predicting whether the next cry of the baby girl happens at an even-numbered minute or not
2 determining whether a given graph contains a cycle
3 deciding whether to approve credit card to some customer
4 guessing whether the earth will be destroyed by the misuse of nuclear power in the next ten years
Reference Answer: 3
1 no pattern
2 programmable definition
3 pattern: customer behavior; definition: not easily programmable; data: history of bank operation
4 arguably no (or not enough) data yet
The Learning Problem Applications of Machine Learning
Daily Needs: Food, Clothing, Housing, Transportation
data → ML → skill
1 Food (Sadilek et al., 2013)
  • data: Twitter data (words + location)
  • skill: tell food poisoning likelihood of restaurant properly
2 Clothing (Abu-Mostafa, 2012)
  • data: sales figures + client surveys
  • skill: give good fashion recommendations to clients
3 Housing (Tsanas and Xifara, 2012)
  • data: characteristics of buildings and their energy load
  • skill: predict energy load of other buildings closely
4 Transportation (Stallkamp et al., 2012)
  • data: some traffic sign images and meanings
  • skill: recognize traffic signs accurately
ML is everywhere!
Education
data → ML → skill
• data: students’ records on quizzes on a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question
A Possible ML Solution
answer correctly ≈ ⟦recent strength of student > difficulty of question⟧
• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty automatically
key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
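The indicator above can be sketched in a few lines of Python. This is only an illustration: it assumes the strengths and difficulties have already been learned, and the function name, student names, question ids, and numeric values are all hypothetical rather than taken from the KDDCup 2010 system.

```python
def predicts_correct(strength: float, difficulty: float) -> bool:
    """Indicator from the slide: predict a correct answer when the
    student's recent strength exceeds the question's difficulty."""
    return strength > difficulty


# Hypothetical learned values, for illustration only.
student_strength = {"alice": 0.8, "bob": 0.3}
question_difficulty = {"q1": 0.5, "q2": 0.9}

# Predict every (student, question) pair with the same rule.
predictions = {
    (s, q): predicts_correct(student_strength[s], question_difficulty[q])
    for s in student_strength
    for q in question_difficulty
}
```

In a real system, ML would estimate the entries of the two tables from the quiz records instead of having them typed in by hand.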
Entertainment: Recommender System (1/2)
data → ML → skill
• data: how users have rated some movies
• skill: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
How can machines learn our preferences?
Entertainment: Recommender System (2/2)
Match movie and viewer factors
[Figure: viewer factors (likes comedy? likes action? prefers blockbusters? likes Tom Cruise?) are matched with movie factors (comedy content, action content, blockbuster? Tom Cruise in it?); contributions from each factor are added to give the predicted rating]
A Possible ML Solution
• pattern: rating ← viewer/movie factors
• learning: known rating → learned factors → unknown rating prediction
key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
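The ‘add contributions from each factor’ step can be sketched as a sum of products of matching viewer and movie factors. This is a minimal illustration under assumed inputs: the factor vectors below are hypothetical hand-written numbers, whereas in the learning setting they would be learned from the known ratings.

```python
def predict_rating(viewer_factors, movie_factors):
    """Add the contribution of each factor: how much the viewer likes it
    times how much of it the movie contains."""
    return sum(v * m for v, m in zip(viewer_factors, movie_factors))


# Factor order (hypothetical): comedy, action, blockbuster, Tom Cruise.
viewer = [1.0, 0.5, 0.0, 2.0]  # viewer's preference for each factor
movie = [0.0, 2.0, 1.0, 1.0]   # movie's amount of each factor

rating = predict_rating(viewer, movie)  # 0.0 + 1.0 + 0.0 + 2.0 = 3.0
```

A viewer who likes Tom Cruise gets a large contribution from a movie with Tom Cruise in it, while factors the viewer does not care about contribute nothing.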
Fun Time
Which of the following fields cannot use machine learning?
1 Finance
2 Medicine
3 Law
4 none of the above
Reference Answer: 4
1 predict stock price from data
2 predict medicine effect from data
3 summarize legal documents from data
4 :-) Welcome to study this hot topic!
The Learning Problem Components of Machine Learning
Components of Learning: Metaphor Using Credit Approval
Applicant Information
age                 23 years
gender              female
annual salary       NTD 1,000,000
year in residence   1 year
year in job         0.5 year
current debt        200,000
unknown pattern to be learned: ‘approve credit card good for bank?’
Formalize the Learning Problem
Basic Notations
• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function: f : X → Y (ideal credit approval formula)
• data ⇔ training examples: D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in bank)
• hypothesis ⇔ skill with hopefully good performance: g : X → Y (‘learned’ formula to be used)
{(x_n, y_n)} from f → ML → g
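One way to make the notation concrete is to map each symbol to a Python object. The sketch below is illustrative only: the two-feature encoding of an applicant, the +1/-1 labels, the three records, and the threshold inside g are assumptions for the example, not part of the lecture.

```python
from typing import List, Tuple

Applicant = Tuple[float, float]  # x in X: (annual salary, current debt), both in NTD
Label = int                      # y in Y: +1 = good, -1 = bad after approval

# D: training examples, i.e. hypothetical historical records in the bank.
D: List[Tuple[Applicant, Label]] = [
    ((1_000_000, 200_000), +1),
    ((400_000, 300_000), -1),
    ((900_000, 50_000), +1),
]


def g(x: Applicant) -> Label:
    """A candidate 'learned' formula g : X -> Y (hypothetical threshold)."""
    salary, _debt = x
    return +1 if salary > 800_000 else -1


def accuracy(h, data) -> float:
    """Fraction of examples in data on which h agrees with the recorded label."""
    return sum(h(x) == y for x, y in data) / len(data)


in_sample_accuracy = accuracy(g, D)
```

The target f never appears in the code: it is unknown, and only its outputs on the recorded applicants are visible through D.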
Learning Flow for Credit Approval
unknown target function f : X → Y (ideal credit approval formula)
→ training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)
• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f but possibly different from f (perfection ‘impossible’ when f unknown)
What does g look like?
The Learning Model
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas) → learning algorithm A
• assume g ∈ H = {h_k}, i.e. approving if
  • h_1: annual salary > NTD 800,000
  • h_2: debt > NTD 100,000 (really?)
  • h_3: year in job ≤ 2 (really?)
• hypothesis set H:
  • can contain good or bad hypotheses
  • up to A to pick the ‘best’ one as g
learning model = A and H
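The split between H and A can be sketched as follows. H holds the three candidate rules listed above; A is a deliberately simple stand-in that picks the candidate with the fewest mistakes on D. The records, the field names, and this particular choice of A are assumptions for illustration, not the algorithm the course goes on to develop.

```python
# D: hypothetical historical records, each (applicant, was approval good?).
records = [
    ({"salary": 1_000_000, "debt": 200_000, "years_in_job": 0.5}, True),
    ({"salary": 500_000, "debt": 150_000, "years_in_job": 1.0}, False),
    ({"salary": 900_000, "debt": 20_000, "years_in_job": 5.0}, True),
    ({"salary": 300_000, "debt": 0, "years_in_job": 3.0}, False),
]

# H: the hypothesis set, here the three candidate rules from the slide.
H = {
    "h1": lambda x: x["salary"] > 800_000,
    "h2": lambda x: x["debt"] > 100_000,
    "h3": lambda x: x["years_in_job"] <= 2,
}


def A(hypotheses, data):
    """A toy learning algorithm: return the name of the hypothesis that
    makes the fewest mistakes on the training examples."""
    def mistakes(h):
        return sum(h(x) != y for x, y in data)

    return min(hypotheses, key=lambda name: mistakes(hypotheses[name]))


best = A(H, records)
g = H[best]  # the final hypothesis g, chosen from H by A
```

Changing either H (which rules are candidates) or A (how the ‘best’ one is chosen) changes the learning model, even when D stays the same.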
Practical Definition of Machine Learning
unknown target function f : X → Y (ideal credit approval formula)
→ training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas) → learning algorithm A
machine learning: use data to compute hypothesis g that approximates target f
Fun Time
How to use the four sets below to form a learning problem for song recommendation?
S_1 = [0, 100]
S_2 = all possible (userid, songid) pairs
S_3 = all formulas that ‘multiply’ user factors & song factors, indexed by all possible combinations of such factors
S_4 = 1,000,000 pairs of ((userid, songid), rating)
1 S_1 = X, S_2 = Y, S_3 = H, S_4 = D
2 S_1 = Y, S_2 = X, S_3 = H, S_4 = D
3 S_1 = D, S_2 = H, S_3 = Y, S_4 = X
4 S_1 = X, S_2 = D, S_3 = Y, S_4 = H
Reference Answer: 2
S_4 → (A on S_3) → (g : S_2 → S_1)
The Learning Problem Machine Learning and Other Fields
Machine Learning and Data Mining
Machine Learning
use data to compute hypothesis g that approximates target f
Data Mining
use
(huge)
data tofind property
that is interesting•
if ‘interesting property’same as
‘hypothesis that approximate target’—ML = DM(usually what KDDCup does)
•
if ‘interesting property’related to
‘hypothesis that approximate target’—DM can help ML, and vice versa(often, but not always)
•
traditional DM also focuses onefficient computation in large database
difficult to distinguish ML and DM in reality
Machine Learning and Artificial Intelligence
Machine Learning
use data to compute hypothesis g that approximates target f
Artificial Intelligence
compute something that shows intelligent behavior
• g ≈ f is something that shows intelligent behavior: ML can realize AI, among other routes
• e.g. chess playing
  • traditional AI: game tree
  • ML for AI: ‘learning from board data’
ML is one possible route to realize AI
Machine Learning and Statistics
Machine Learning
use data to compute hypothesis g that approximates target f
Statistics
use data to make inference about an unknown process
• g is an inference outcome; f is something unknown: statistics can be used to achieve ML
• traditional statistics also focuses on provable results with math assumptions, and cares less about computation
statistics: many useful tools for ML
Fun Time
Which of the following claims is not totally true?
1 machine learning is a route to realize artificial intelligence
2 machine learning, data mining and statistics all need data
3 data mining is just another name for machine learning
4 statistics can be used for data mining
Reference Answer: 3
While data mining and machine learning do share a huge overlap, they are arguably not equivalent because of the difference of focus.