## Basics of Machine Learning

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

### Department of Computer Science & Information Engineering, National Taiwan University

The Learning Problem

## More about Me

### Associate Professor, Dept. CSIE, National Taiwan University

- graduated from ck49th314, 1997
- co-leader of KDDCup world champion teams at NTU: 2010–2013
- Secretary General, Taiwanese Association for Artificial Intelligence
- co-author of bestseller ML textbook “Learning from Data”
- instructor of Mandarin-teaching MOOC of Machine Learning on NTU-Coursera: **2013.11–**

https://www.coursera.org/course/ntumlone

The Learning Problem What is Machine Learning

## From Learning to Machine Learning

**learning**: acquiring skill with experience accumulated from observations

observations → learning → skill

**machine learning**: acquiring skill with experience accumulated/computed from data

data → ML → skill

What is skill?


## A More Concrete Definition

skill ⇔ improve some performance measure (e.g. prediction accuracy)

**machine learning**: improving some performance measure with experience **computed** from data

data → ML → improved performance measure

### An Application in Computational Finance

stock data → ML → more investment gain

Why use machine learning?


## Yet Another Application: Tree Recognition

- ‘define’ trees and hand-program: **difficult**
- learn from data (observations) and recognize: a **3-year-old can do so**
- ‘ML-based tree recognition system’ can be **easier to build** than hand-programmed system

ML: an **alternative route** to build complicated systems

## The Machine Learning Route

ML: an **alternative route** to build complicated systems

### Some Use Scenarios

- when humans cannot program the system manually: navigating on Mars
- when humans cannot ‘define the solution’ easily: speech/visual recognition
- when needing rapid decisions that humans cannot make: high-frequency trading
- when needing to be user-oriented on a massive scale: consumer-targeted marketing

Give a **computer a fish, you feed it for a day;** teach it how to fish, you feed it for a lifetime. **:-)**


## Key Essence of Machine Learning

**machine learning**: improving some performance measure with experience **computed** from data

data → ML → improved performance measure

1. exists some ‘underlying pattern’ to be learned, so ‘performance measure’ can be improved
2. but no programmable (easy) definition, so ‘ML’ is needed
3. somehow there is data about the pattern, so ML has some ‘inputs’ to learn from

key essence: help decide whether to use ML

The Learning Problem Applications of Machine Learning

## Daily Needs: Food, Clothing, Housing, Transportation

data → ML → skill

1. Food (Sadilek et al., 2013)
   - data: Twitter data (words + location)
   - skill: tell food poisoning likeliness of restaurant properly
2. Clothing (Abu-Mostafa, 2012)
   - data: sales figures + client surveys
   - skill: give good fashion recommendations to clients
3. Housing (Tsanas and Xifara, 2012)
   - data: characteristics of buildings and their energy load
   - skill: predict energy load of other buildings closely
4. Transportation (Stallkamp et al., 2012)
   - data: some traffic sign images and meanings
   - skill: recognize traffic signs accurately

ML is everywhere!

## Education

data → ML → skill

- data: students’ records on quizzes on a Math tutoring system
- skill: predict whether a student can give a correct answer to another quiz question

### A Possible ML Solution

$$\text{answer correctly} \approx [\![\,\text{recent strength of student} > \text{difficulty of question}\,]\!]$$

- give ML 9 million records from 3000 students
- ML determines (reverse-engineers) strength and difficulty automatically

key part of the **world-champion** system from National Taiwan Univ. in KDDCup 2010
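The strength-versus-difficulty rule can be sketched in a few lines. This is only an illustration: the logistic (Rasch-style) fitting procedure, the four toy records, and the names `fit_strength_difficulty` and `predict_correct` are assumptions made here, not a description of the KDDCup 2010 system.

```python
import math

# Hypothetical toy records: (student, question, answered_correctly).
records = [
    ("amy", "q_easy", 1),
    ("amy", "q_hard", 1),
    ("bob", "q_easy", 1),
    ("bob", "q_hard", 0),
]

def fit_strength_difficulty(records, lr=0.1, epochs=200):
    """Learn one strength per student and one difficulty per question.

    Assumed model: P(correct) = sigmoid(strength - difficulty),
    fitted by stochastic gradient ascent on the log-likelihood.
    """
    strength = {s: 0.0 for s, _, _ in records}
    difficulty = {q: 0.0 for _, q, _ in records}
    for _ in range(epochs):
        for s, q, correct in records:
            p = 1.0 / (1.0 + math.exp(-(strength[s] - difficulty[q])))
            err = correct - p           # positive if the model was too pessimistic
            strength[s] += lr * err     # raise strength after a surprising success
            difficulty[q] -= lr * err   # lower difficulty after a surprising success
    return strength, difficulty

def predict_correct(strength, difficulty, student, question):
    """The rule from the slide: correct iff strength > difficulty."""
    return strength[student] > difficulty[question]

strength, difficulty = fit_strength_difficulty(records)
print(predict_correct(strength, difficulty, "amy", "q_hard"))
print(predict_correct(strength, difficulty, "bob", "q_hard"))
```

Nobody tells the program how strong a student is or how hard a question is; both numbers are inferred from the answer records alone, which is the ‘reverse-engineering’ step in the list above.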

## Entertainment: Recommender System (1/2)

data → ML → skill

- data: how many users have rated some movies
- skill: predict how a user would rate an unrated movie

### A Hot Problem

- competition held by Netflix in 2006
  - 100,480,507 ratings that 480,189 users gave to 17,770 movies
  - 10% improvement = **1 million dollar prize**
- similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  - 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines **learn our preferences?**


## Entertainment: Recommender System (2/2)

[Figure: match movie and viewer factors, then add contributions from each factor to get the predicted rating. Movie factors: comedy content, action content, blockbuster?, Tom Cruise in it? Viewer factors: likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?]

### A Possible ML Solution

- pattern: rating ← viewer/movie factors
- learning: known rating → learned factors → unknown rating prediction

key part of the **world-champion** (again!) system from National Taiwan Univ. in KDDCup 2011
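A minimal sketch of ‘known rating → learned factors → unknown rating prediction’, assuming a plain matrix-factorization model trained by stochastic gradient descent. The slides only say that factors are learned from known ratings, so the model form, the toy ratings, the hyperparameters, and the names `train_factors` and `predict_rating` are all assumptions for illustration.

```python
import random

# Hypothetical known ratings: (viewer, movie, rating on a 1-5 scale).
ratings = [
    ("ann", "action1", 5), ("ann", "action2", 5), ("ann", "comedy1", 1),
    ("bob", "action1", 5), ("bob", "comedy1", 1), ("bob", "comedy2", 1),
    ("cat", "action1", 1), ("cat", "comedy1", 5), ("cat", "comedy2", 5),
]

def train_factors(ratings, n_factors=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Learn a factor vector per viewer and per movie from known ratings."""
    rng = random.Random(seed)
    viewers = sorted({v for v, _, _ in ratings})
    movies = sorted({m for _, m, _ in ratings})
    V = {v: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for v in viewers}
    M = {m: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for m in movies}
    for _ in range(epochs):
        for v, m, r in ratings:
            # Predicted rating: add contributions from each factor.
            err = r - sum(a * b for a, b in zip(V[v], M[m]))
            for k in range(n_factors):
                vk, mk = V[v][k], M[m][k]
                V[v][k] += lr * (err * mk - reg * vk)
                M[m][k] += lr * (err * vk - reg * mk)
    return V, M

def predict_rating(V, M, viewer, movie):
    """Match viewer and movie factors to predict a rating."""
    return sum(a * b for a, b in zip(V[viewer], M[movie]))

V, M = train_factors(ratings)
# ann never rated comedy2; her learned factors resemble bob's, who disliked it.
print(round(predict_rating(V, M, "ann", "comedy2"), 2))
```

The learned factors have no preassigned meaning such as ‘likes action?’; they are whatever numbers best reproduce the known ratings.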

The Learning Problem Components of Machine Learning

## Components of Learning: Metaphor Using Credit Approval

### Applicant Information

| field | value |
| --- | --- |
| age | 23 years |
| gender | female |
| annual salary | NTD 1,000,000 |
| year in residence | 1 year |
| year in job | 0.5 year |
| current debt | 200,000 |

**unknown** pattern to be learned:

‘approve credit card good for bank?’


## Formalize the Learning Problem

### Basic Notations

- input: $\mathbf{x} \in \mathcal{X}$ (customer application)
- output: $y \in \mathcal{Y}$ (good/bad after approving credit card)
- unknown pattern to be learned ⇔ target function: $f : \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
- data ⇔ training examples: $\mathcal{D} = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \cdots, (\mathbf{x}_N, y_N)\}$ (historical records in bank)
- hypothesis ⇔ skill with hopefully good performance: $g : \mathcal{X} \to \mathcal{Y}$ (‘learned’ formula to be used)

$\{(\mathbf{x}_n, y_n)\}$ from $f$ → ML → $g$


## Learning Flow for Credit Approval

- unknown target function $f : \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
- → training examples $\mathcal{D} : (\mathbf{x}_1, y_1), \cdots, (\mathbf{x}_N, y_N)$ (historical records in bank)
- → learning algorithm $\mathcal{A}$
- → final hypothesis $g \approx f$ (‘learned’ formula to be used)

Notes:

- target $f$ **unknown** (i.e. no programmable definition)
- hypothesis $g$ hopefully $\approx f$ but possibly **different** from $f$ (perfection ‘impossible’ when $f$ unknown)

What does $g$ look like?


## The Learning Model

- training examples $\mathcal{D} : (\mathbf{x}_1, y_1), \cdots, (\mathbf{x}_N, y_N)$ (historical records in bank)
- → learning algorithm $\mathcal{A}$, which also takes the hypothesis set $\mathcal{H}$ (set of candidate formulas)
- → final hypothesis $g \approx f$ (‘learned’ formula to be used)

Notes:

- assume $g \in \mathcal{H} = \{h_k\}$, i.e. approving if
  - $h_1$: annual salary > NTD 800,000
  - $h_2$: debt > NTD 100,000 (really?)
  - $h_3$: year in job ≤ 2 (really?)
- hypothesis set $\mathcal{H}$:
  - can contain **good or bad hypotheses**
  - up to $\mathcal{A}$ to pick the ‘best’ one as $g$

**learning model** = $\mathcal{A}$ and $\mathcal{H}$
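The pairing of a hypothesis set and a learning algorithm can be sketched as follows. The three rules mirror $h_1$, $h_2$, $h_3$ above; the four historical records are made up for illustration, and taking ‘best’ to mean ‘lowest error on the training examples’ is an assumption (the slide leaves the criterion open).

```python
# Hypothesis set H: three candidate approval rules from the slide.
def h1(x):
    return x["annual_salary"] > 800_000

def h2(x):
    return x["debt"] > 100_000

def h3(x):
    return x["years_in_job"] <= 2

H = {"h1": h1, "h2": h2, "h3": h3}

# Hypothetical training examples D: (application, was approval good for the bank?).
D = [
    ({"annual_salary": 1_000_000, "debt": 200_000, "years_in_job": 0.5}, True),
    ({"annual_salary": 500_000, "debt": 50_000, "years_in_job": 5}, False),
    ({"annual_salary": 900_000, "debt": 0, "years_in_job": 3}, True),
    ({"annual_salary": 300_000, "debt": 300_000, "years_in_job": 1}, False),
]

def training_error(h, D):
    """Fraction of training examples on which h disagrees with the label."""
    return sum(h(x) != y for x, y in D) / len(D)

def A(D, H):
    """Learning algorithm: pick the hypothesis with the lowest training error."""
    best_name = min(H, key=lambda name: training_error(H[name], D))
    return best_name, H[best_name]

name, g = A(D, H)
print(name, training_error(g, D))
```

Changing either part changes the learning model: a richer `H` offers more candidate formulas, while a different `A` may prefer a different candidate from the same `H`.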

## Practical Definition of Machine Learning

- unknown target function $f : \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
- → training examples $\mathcal{D} : (\mathbf{x}_1, y_1), \cdots, (\mathbf{x}_N, y_N)$ (historical records in bank)
- → learning algorithm $\mathcal{A}$, which also takes the hypothesis set $\mathcal{H}$ (set of candidate formulas)
- → final hypothesis $g \approx f$ (‘learned’ formula to be used)

**machine learning**: use data to compute hypothesis $g$ that approximates target $f$

The Learning Problem Machine Learning and Other Fields

## Machine Learning and Data Mining

### Machine Learning

use data to compute hypothesis g that approximates target f

### Data Mining

use **(huge)** data to **find property** that is interesting

- if ‘interesting property’ **same as** ‘hypothesis that approximates target’: ML = DM (usually what KDDCup does)
- if ‘interesting property’ **related to** ‘hypothesis that approximates target’: DM can help ML, and vice versa (often, but not always)
- traditional DM also focuses on **efficient computation in large databases**

difficult to distinguish ML and DM in reality


## Machine Learning and Artificial Intelligence

### Machine Learning

use data to compute hypothesis g that approximates target f

### Artificial Intelligence

compute **something that shows intelligent behavior**

- g ≈ f is something that shows intelligent behavior: ML can realize AI, among other routes
- e.g. chess playing
  - traditional AI: game tree
  - ML for AI: ‘learning from board data’

ML is one possible route to realize AI


## Machine Learning and Statistics

### Machine Learning

use data to compute hypothesis g that approximates target f

### Statistics

use data to **make inference about an unknown process**

- g is an inference outcome; f is something unknown: statistics **can be used to achieve ML**
- traditional statistics also focuses on **provable results with math assumptions, and cares less about computation**

statistics: many useful tools for ML


## A Learning Puzzle

[Figure: six labeled example patterns in two groups, $y_n = -1$ and $y_n = +1$, and one new pattern with $g(\mathbf{x}) = ?$]

**let’s test your ‘human learning’ with 6 examples :-)**


## Two Controversial Answers

whatever you say about $g(\mathbf{x})$, . . .

[Figure: the same example patterns labeled $y_n = -1$ and $y_n = +1$, and the new pattern with $g(\mathbf{x}) = ?$]

### truth $f(\mathbf{x}) = +1$ because . . .

- symmetry ⇔ +1
- (black or white count = 3) or (black count = 4 and middle-top black) ⇔ +1

### truth $f(\mathbf{x}) = -1$ because . . .

- left-top black ⇔ −1
- middle column contains at most 1 black and right-top white ⇔ −1

all valid reasons, your **adversarial teacher** can always call you ‘didn’t learn’. **:-(**


## No Free Lunch Theorem

Without any assumptions on the learning problem on hand, all learning algorithms perform the same.

**No algorithm is better for all learning problems**


## Gender Classification Problem

[Figure: one unlabeled picture (**?**) to be classified as Male or Female]


## Gender Classification: Lesson 1

[Figure: the unlabeled picture (**?**) next to ten labeled example pictures: Male, Female, Female, Male, Male, Female, Female, Male, Female, Male]


## Gender Classification: Lesson 2

[Figure: the query picture, now answered **Male**, next to the ten labeled example pictures: Male, Female, Female, Male, Male, Female, Female, Male, Female, Male]


## Gender Classification: Lesson 3

[Figure: the query picture answered **Male**, shown together with the labeled example pictures (Male/Female)]


## Gender Classification: Lesson 4

[Figure: a new unlabeled picture (**?**) next to six labeled example pictures: Male, Female, Female, Male, Male, Female]


## Nearest Neighbors

### Intuition

- memorize everything
- predict with the closest case

### Algorithm

- Training: memorize all examples (picture, label)
- Prediction:
  - find K nearest neighbors
  - let them vote!
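The two steps above can be sketched as below. The 2-D points are hypothetical stand-ins for pictures, and Euclidean distance is an assumption; the slides do not say how ‘closest’ is measured.

```python
import math
from collections import Counter

# Training: memorize all examples (feature vector standing in for a picture, label).
examples = [
    ((1.0, 1.0), "Male"), ((1.2, 0.8), "Male"), ((0.9, 1.3), "Male"),
    ((3.0, 3.2), "Female"), ((3.1, 2.9), "Female"), ((2.8, 3.0), "Female"),
]

def knn_predict(examples, query, k=3):
    """Find the k nearest memorized examples and let them vote."""
    neighbors = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict(examples, (1.1, 1.0)))
print(knn_predict(examples, (3.0, 3.0)))
```

Training costs nothing beyond storing the examples; all the work happens at prediction time, when every memorized example is compared with the query.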


## Apple Recognition Problem

- Is this a picture of an apple?
- We want to teach a class of 6-year-olds.
- Gather photos from NY Apple Asso. and Google Image.

## Our Fruit Class Begins

**Teacher:** How would you describe an apple? Michael?

**Michael:** I think apples are circular.

**(Class):** Apples are circular.

## Our Fruit Class Continues

**Teacher:** Being circular is a good feature for the apples. However, if you only say circular, you could make several mistakes. What else can we say for an apple? Tina?

**Tina:** It looks like apples are red.

**(Class):** Apples are somewhat circular and somewhat red.

## Our Fruit Class Continues

**Teacher:** Yes. Many apples are red. However, you could still make mistakes based on circular and red. Do you have any other suggestions, Joey?

**Joey:** Apples could also be green.

**(Class):** Apples are somewhat circular and somewhat red and possibly green.

## Our Fruit Class Continues

**Teacher:** Yes. It seems that apples might be circular, red, green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica?

**Jessica:** Apples have stems at the top.

**(Class):** Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top.

## Adaptive Boosting

### ML and Life

- combine simple rules to approximate complex function (many heads are better than one)
- emphasize incorrect data for valuable information (again you can learn by correcting mistakes)

### AdaBoost Algorithm

- Input: examples $\{(\text{picture } \mathbf{x}_n, \text{label } y_n)\}_{n=1}^{N}$.
- For $t = 1, 2, \cdots, T$,
  - learn a simple rule $h_t$ from emphasized examples
  - get the confidence $w_t$ of such rule
  - emphasize the examples that do not agree with $h_t$.
- Output: weighted vote of the rules $\sum_{t=1}^{T} w_t h_t(\mathbf{x})$
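The loop above can be sketched on a tiny one-dimensional problem. The slide does not spell out the formulas, so the confidence $w_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and the exponential re-weighting used here are the standard AdaBoost choices, stated as an assumption; the threshold ‘stump’ rules, the toy data, and all function names are illustrative.

```python
import math

def stump(x, theta, sign):
    """Simple rule: vote `sign` if x > theta, otherwise vote `-sign`."""
    return sign * (1 if x > theta else -1)

def learn_stump(xs, ys, weights):
    """Return (weighted_error, theta, sign) of the best stump on emphasized examples."""
    values = sorted(set(xs))
    thetas = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for theta in thetas:
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump(x, theta, sign) != y)
            if best is None or err < best[0]:
                best = (err, theta, sign)
    return best

def adaboost(xs, ys, T):
    n = len(xs)
    weights = [1.0 / n] * n                        # start with equal emphasis
    rules = []
    for _ in range(T):
        err, theta, sign = learn_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)    # guard against log(0)
        w_t = 0.5 * math.log((1.0 - err) / err)    # confidence of this rule
        rules.append((w_t, theta, sign))
        # Emphasize the examples that do not agree with the new rule.
        weights = [w * math.exp(-w_t * y * stump(x, theta, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return rules

def predict(rules, x):
    """Weighted vote of the learned rules."""
    score = sum(w_t * stump(x, theta, sign) for w_t, theta, sign in rules)
    return 1 if score >= 0 else -1

# Toy data that no single threshold rule can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [+1, +1, -1, -1, +1, +1]
rules = adaboost(xs, ys, T=3)
print([predict(rules, x) for x in xs])
```

Each stump alone misclassifies at least two of the six points, yet the weighted vote of three of them labels every training point correctly, much like the class combining ‘circular’, ‘red’, and ‘stem at the top’.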