Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

### Department of Computer Science & Information Engineering

### National Taiwan University (國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 0/99

### Roadmap

• What is Machine Learning
• Perceptron Learning Algorithm
• Types of Learning
• Possibility of Learning
• Linear Regression
• Logistic Regression
• Nonlinear Transform
• Overfitting
• Principles of Learning

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 1/99

### What is Machine Learning

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 2/99


### From Learning to Machine Learning

learning: acquiring skill with experience accumulated from observations

observations → learning → skill

machine learning: acquiring skill with experience accumulated/computed from data

data → ML → skill

what is skill?

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 3/99

### A More Concrete Definition

skill ⇔ improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

### An Application in Computational Finance

stock data → ML → more investment gain

Why use machine learning?

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 4/99

• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• an ‘ML-based tree recognition system’ can be easier to build than a hand-programmed system

ML: an alternative route to build complicated systems
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 5/99

### Some Use Scenarios

• when humans cannot program the system manually—navigating on Mars
• when humans cannot ‘define the solution’ easily—speech/visual recognition
• when rapid decisions are needed beyond human speed—high-frequency trading
• when needing to be user-oriented at a massive scale—consumer-targeted marketing

Give a computer a fish, you feed it for a day;
teach it how to fish, you feed it for a lifetime. :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 6/99

### Key Essence of Machine Learning

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

1. exists some ‘underlying pattern’ to be learned—so ‘performance measure’ can be improved
2. but no programmable (easy) definition—so ‘ML’ is needed
3. somehow there is data about the pattern—so ML has some ‘inputs’ to learn from

key essence: help decide whether to use ML

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 7/99

### Entertainment: Recommender System (1/2)

• data: how many users have rated some movies
• skill: predict how a user would rate an unrated movie

### A Hot Problem

• competition held by Netflix in 2006
  • 100,480,507 ratings that 480,189 users gave to 17,770 movies
  • 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
  • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

how can machines learn our preferences?

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 8/99

### Entertainment: Recommender System (2/2)

[figure: match movie and viewer factors: movie factors (comedy content, action content, blockbuster?, Tom Cruise in it?) against viewer factors (likes comedy?, likes action?, prefers blockbusters?, likes Tom Cruise?); the predicted rating adds contributions from each factor]

### A Possible ML Solution

• pattern: rating ← viewer/movie factors
• learning: known ratings → learned factors → unknown rating prediction

key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 9/99

### Credit Approval Problem

age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000

unknown pattern to be learned:
‘approve credit card good for bank?’

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 10/99

### Formalize the Learning Problem

### Basic Notations

• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function f : X → Y (ideal credit approval formula)
• data ⇔ training examples D = {(x_1, y_1), (x_2, y_2), · · · , (x_N, y_N)} (historical records in bank)
• hypothesis ⇔ skill with hopefully good performance g : X → Y (‘learned’ formula to be used)

{(x_n, y_n)} from f → ML → g
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 11/99
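To make these notations concrete, here is a minimal sketch in Python (NumPy) of the whole setup: an unknown target f generates the training examples D, and a learning algorithm A picks a final hypothesis g from a hypothesis set H. The target, the two-hypothesis H, and the mistake-counting A below are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                       # unknown target f: X -> Y (a made-up linear rule)
    return 1 if x @ np.array([1.0, -0.5]) > 0 else -1

# training examples D = {(x_n, y_n)}, labeled by f
X = rng.normal(size=(20, 2))
y = np.array([f(x) for x in X])

# hypothesis set H: two candidate formulas (deliberately tiny, for illustration)
H = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def A(X, y):                    # learning algorithm A: pick the h in H with fewest mistakes
    mistakes = [np.mean(np.sign(X @ h) != y) for h in H]
    return H[int(np.argmin(mistakes))]

g = A(X, y)                     # final hypothesis g, hopefully close to f
print(np.mean(np.sign(X @ g) != y))   # how often g disagrees with f on D
```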

### Learning Flow for Credit Approval

unknown target function f : X → Y (ideal credit approval formula)
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

• target f unknown (i.e. no programmable definition)
• hypothesis g hopefully ≈ f, but possibly different from f (perfection ‘impossible’ when f unknown)

what does g look like?

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 12/99

### The Learning Model

training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A  ←  hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

• assume g ∈ H = {h_k}, i.e. approving if
  • h_1: annual salary > NTD 800,000
  • h_2: debt > NTD 100,000 (really?)
  • h_3: year in job ≤ 2 (really?)
• hypothesis set H:
  • can contain good or bad hypotheses
  • up to A to pick the ‘best’ one as g

learning model = A and H
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 13/99

### Practical Definition of Machine Learning

unknown target function f : X → Y (ideal credit approval formula)
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A  ←  hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

machine learning: use data to compute hypothesis g that approximates target f
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 14/99

### Machine Learning and Data Mining

### Machine Learning
use data to compute hypothesis g that approximates target f

### Data Mining
use (huge) data to find property that is interesting

• if ‘interesting property’ same as ‘hypothesis that approximates target’—ML = DM (usually what KDDCup does)
• if ‘interesting property’ related to ‘hypothesis that approximates target’—DM can help ML, and vice versa (often, but not always)
• traditional DM also focuses on efficient computation in large databases

difficult to distinguish ML and DM in reality

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 15/99

### Machine Learning and Artificial Intelligence

### Machine Learning
use data to compute hypothesis g that approximates target f

### Artificial Intelligence
compute something that shows intelligent behavior

• g ≈ f is something that shows intelligent behavior—ML can realize AI, among other routes
• e.g. chess playing
  • traditional AI: game tree
  • ML for AI: ‘learning from board data’

ML is one possible route to realize AI

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 16/99

### Machine Learning and Statistics

### Machine Learning
use data to compute hypothesis g that approximates target f

### Statistics
use data to make inference about an unknown process

• g is an inference outcome; f is something unknown—statistics can be used to achieve ML
• traditional statistics also focus on provable results with math assumptions, and care less about computation

statistics: many useful tools for ML

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 17/99

### Perceptron Learning Algorithm

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 18/99

### Credit Approval Problem Revisited

### Applicant Information

age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000

unknown target function f : X → Y (ideal credit approval formula)
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A  ←  hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

what hypothesis set can we use?

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 19/99

### A Simple Hypothesis Set: the ‘Perceptron’

• For x = (x_1, x_2, · · · , x_d) ‘features of customer’, compute a weighted ‘score’ and

    approve credit if  Σ_{i=1}^{d} w_i x_i > threshold
    deny credit if     Σ_{i=1}^{d} w_i x_i < threshold

• Y: {+1 (good), −1 (bad)}; 0 ignored—the linear formulas h ∈ H are

    h(x) = sign( ( Σ_{i=1}^{d} w_i x_i ) − threshold )

called the ‘perceptron’ hypothesis historically

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 20/99

### Vector Form of Perceptron Hypothesis

h(x) = sign( ( Σ_{i=1}^{d} w_i x_i ) − threshold )
     = sign( ( Σ_{i=1}^{d} w_i x_i ) + (−threshold) · (+1) ),   with w_0 = −threshold and x_0 = +1
     = sign( Σ_{i=0}^{d} w_i x_i )
     = sign( w^T x )

• each ‘tall’ w represents a hypothesis h & is multiplied with ‘tall’ x—will use tall versions to simplify notation

what do perceptrons h ‘look like’?
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 21/99
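As a quick illustration, a minimal sketch of the perceptron hypothesis in Python (NumPy), using the ‘tall’ vectors with x_0 = +1 so that w_0 plays the role of −threshold; the weights and features below are made up.

```python
import numpy as np

def perceptron_h(w, x):
    """Perceptron hypothesis h(x) = sign(w^T x), with x_0 = +1 prepended."""
    x = np.concatenate(([1.0], x))     # tall x: (1, x_1, ..., x_d)
    return 1 if w @ x > 0 else -1      # sign(w^T x)

w = np.array([-0.8, 1.0, 0.5])         # tall w: w_0 = -threshold, then feature weights
print(perceptron_h(w, np.array([0.9, 0.2])))   # +1: weighted score exceeds the threshold
```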

### Perceptrons in R^2

• customer features x: points on the plane (or points in R^d)
• labels y: ◦ (+1), × (−1)
• hypothesis h: lines (or hyperplanes in R^d)—positive on one side of a line, negative on the other side
• different lines classify customers differently

perceptrons ⇔ linear (binary) classifiers

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 22/99

### Select g from H

H = all possible perceptrons, g = ?

• want: g ≈ f (hard when f unknown)
• almost necessary: g ≈ f on D, ideally g(x_n) = f(x_n) = y_n
• difficult: H is of infinite size
• idea: start from some g_0, and ‘correct’ its mistakes on D

will represent g_0 by its weight vector w_0
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 23/99

### Perceptron Learning Algorithm

For t = 0, 1, . . .
1. find a mistake of w_t, called (x_{n(t)}, y_{n(t)}), with sign( w_t^T x_{n(t)} ) ≠ y_{n(t)}
2. (try to) correct the mistake by w_{t+1} ← w_t + y_{n(t)} x_{n(t)}

. . . until no more mistakes; return last w (called w_PLA) as g

[figure: the update w + y x rotates w toward a misclassified example with y = +1, and away from one with y = −1]

That’s it!

—A fault confessed is half redressed. :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 24/99
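The next slide claims PLA ‘worked like a charm with < 20 lines’; here is one such sketch in Python (NumPy). It assumes D is linearly separable, so the loop is guaranteed to halt; the dataset is made up for illustration.

```python
import numpy as np

def pla(X, y):
    """Perceptron Learning Algorithm: cycle until no example is misclassified.
    X carries x_0 = 1 in its first column; assumes linearly separable data."""
    w = np.zeros(X.shape[1])                   # start from w_0 = 0
    while True:
        mistakes = np.sign(X @ w) != y         # sign(w^T x_n) != y_n
        if not mistakes.any():
            return w                           # no more mistakes: w_PLA
        n = np.flatnonzero(mistakes)[0]        # pick a mistake (x_n, y_n)
        w = w + y[n] * X[n]                    # correct it: w <- w + y_n x_n

# illustrative separable data: label by the sign of x_1 - x_2
rng = np.random.default_rng(1)
Z = rng.uniform(-1, 1, size=(20, 2))
y = np.sign(Z[:, 0] - Z[:, 1])
X = np.column_stack([np.ones(len(Z)), Z])      # prepend x_0 = 1
w_pla = pla(X, y)
print(np.all(np.sign(X @ w_pla) == y))         # True: g makes no mistakes on D
```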

### Seeing is Believing

[figure: an animated run of PLA over 9 updates: each update finds a misclassified example (x_1, x_9, x_14, x_3, . . .) and rotates w(t) into w(t+1) = w(t) + y x, until the final w_PLA separates all the examples]

worked like a charm with < 20 lines!!

(note: made x_{i,0} = 1 for visual purpose)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 25/99

### Types of Learning

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 26/99

### Credit Approval Problem Revisited

age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000

credit? {no(−1), yes(+1)}

unknown target function f : X → Y (ideal credit approval formula)
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A  ←  hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

Y = {−1, +1}: binary classification

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 27/99

### More Binary Classification Problems

• credit: approve/disapprove
• email: spam/non-spam
• patient: sick/not sick
• ad: profitable/not profitable
• answer: correct/incorrect (KDDCup 2010)

core and important problem with many tools as building block of other tools

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 28/99

### Multiclass Classification: Coin Recognition Problem

[figure: US coins (1c, 5c, 10c, 25c) plotted by (size, mass)]

• classify US coins (1c, 5c, 10c, 25c) by (size, mass)
• Y = {1c, 5c, 10c, 25c}, or Y = {1, 2, · · · , K} (abstractly)
• binary classification: special case with K = 2

### Other Multiclass Classification Problems

• written digits ⇒ 0, 1, · · · , 9
• pictures ⇒ apple, orange, strawberry
• emails ⇒ spam, primary, social, promotion, update (Google)

many applications in practice, especially for ‘recognition’
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 29/99

### Regression: Patient Recovery Prediction Problem

• multiclass classification: patient features ⇒ which type of cancer
• regression: patient features ⇒ how many days before recovery
• Y = R, or Y = [lower, upper] ⊂ R (bounded regression)—deeply studied in statistics

### Other Regression Problems

• company data ⇒ stock price
• climate data ⇒ temperature

also core and important, with many ‘statistical’ tools as building block of other tools

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 30/99

### Mini Summary

### Learning with Different Output Space Y

• binary classification: Y = {−1, +1}
• multiclass classification: Y = {1, 2, · · · , K}
• regression: Y = R
• . . . and a lot more!!

unknown target function f : X → Y → training examples D : (x_1, y_1), · · · , (x_N, y_N) → learning algorithm A (with hypothesis set H) → final hypothesis g ≈ f

core tools: binary classification and regression

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 31/99

### Supervised: Coin Recognition Revisited

[figure: coin data by (size, mass), with every example labeled]

unknown target function f : X → Y → training examples D : (x_1, y_1), · · · , (x_N, y_N) → learning algorithm A (with hypothesis set H) → final hypothesis g ≈ f

supervised learning: every x_n comes with corresponding y_n

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 32/99

### Unsupervised: Coin Recognition without y_n

[figure: coin data by (size, mass); labeled: supervised multiclass classification; unlabeled: unsupervised multiclass classification ⇐⇒ ‘clustering’]

### Other Clustering Problems

• articles ⇒ topics
• consumer profiles ⇒ consumer groups

clustering: a challenging but useful problem

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 33/99

### Unsupervised: Learning without y_n

### Other Unsupervised Learning Problems

• clustering: {x_n} ⇒ cluster(x) (≈ ‘unsupervised multiclass classification’)—e.g. articles ⇒ topics
• density estimation: {x_n} ⇒ density(x) (≈ ‘unsupervised bounded regression’)—e.g. traffic reports with location ⇒ dangerous areas
• outlier detection: {x_n} ⇒ unusual(x) (≈ extreme ‘unsupervised binary classification’)—e.g. Internet logs ⇒ intrusion alert
• . . . and a lot more!!

unsupervised learning: diverse, with possibly very different performance goals
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 34/99

### Semi-supervised: Coin Recognition with Some y_n

[figure: coin data by (size, mass); fully labeled (supervised), only a few labeled (semi-supervised), unlabeled (unsupervised, clustering)]

### Other Semi-supervised Learning Problems

• face images with a few labeled ⇒ face identifier (Facebook)
• medicine data with a few labeled ⇒ medicine effect predictor

semi-supervised learning: leverage unlabeled data to avoid ‘expensive’ labeling
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 35/99

### Reinforcement Learning

a ‘very different’ but natural way of learning

### Teach Your Dog: Say ‘Sit Down’

The dog pees on the ground. BAD DOG. THAT’S A VERY WRONG ACTION.

• cannot easily show the dog that y_n = sit when x_n = ‘sit down’
• but can ‘punish’ to say ỹ_n = pee is wrong

The dog sits down. Good Dog. Let me give you some cookies.

• still cannot show y_n = sit when x_n = ‘sit down’
• but can ‘reward’ to say ỹ_n = sit is good

### Other Reinforcement Learning Problems Using (x, ỹ, goodness)

• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ black jack agent

reinforcement: learn with ‘partial/implicit information’ (often sequentially)
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 36/99

### Learning with Different Data Label y_n

• supervised: all y_n
• unsupervised: no y_n
• semi-supervised: some y_n
• reinforcement: implicit y_n by goodness(ỹ_n)
• . . . and more!!

unknown target function f : X → Y → training examples D : (x_1, y_1), · · · , (x_N, y_N) → learning algorithm A (with hypothesis set H) → final hypothesis g ≈ f

core tool: supervised learning

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 37/99

### Possibility of Learning

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 38/99

[figure: six example patterns, three labeled y_n = −1 and three labeled y_n = +1, plus a new pattern x with g(x) = ?]

let’s test your ‘human learning’ with 6 examples :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 39/99

### Two Controversial Answers

whatever you say about g(x), . . .

[figure: the six labeled patterns (y_n = −1 and y_n = +1) and the new pattern with g(x) = ?]

truth f(x) = +1 because . . .
• symmetry ⇔ +1
• (black or white count = 3) or (black count = 4 and middle-top black) ⇔ +1

truth f(x) = −1 because . . .
• left-top black ⇔ −1
• middle column contains at most 1 black and right-top white ⇔ −1

all valid reasons; your adversarial teacher can always call you ‘didn’t learn’. :-(

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 40/99

### Theoretical Foundation of Statistical Learning

if training and testing come from the same distribution, then with high probability,

E_out(g)  ≤  E_in(g)  +  √( (8/N) ln( 4 (2N)^{d_VC(H)} / δ ) )
(test error)  (training error)  (Ω: price of using H)

[figure: error versus VC dimension d_VC; in-sample error decreases, model complexity Ω increases, and out-of-sample error is minimized at some d*_VC in the middle]

• d_VC(H): VC dimension of H ≈ # of parameters to describe H
• d_VC ↑: E_in ↓ but Ω ↑
• d_VC ↓: Ω ↓ but E_in ↑
• best d*_VC in the middle

powerful H not always good!
not always good!Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 41/99

unknown target distribution P(y|x) containing f(x) + noise (ideal credit approval formula)
↓ x_1, x_2, · · · , x_N drawn from P on X, with labels y_1, y_2, · · · , y_N
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A  ←  hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

if we control the complexity of H properly and minimize E_in, learning is possible :-)
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 42/99

### Linear Regression

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 43/99

### Credit Limit Problem

age 23 years
gender female
annual salary NTD 1,000,000
year in residence 1 year
year in job 0.5 year
current debt 200,000

credit limit? 100,000

unknown target function f : X → Y (ideal credit limit formula)
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
↓
learning algorithm A  ←  hypothesis set H (set of candidate formulas)
↓
final hypothesis g ≈ f (‘learned’ formula to be used)

Y = R: regression
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 44/99

### Linear Regression Hypothesis

age 23 years
annual salary NTD 1,000,000
year in job 0.5 year
current debt 200,000

• For x = (x_0, x_1, x_2, · · · , x_d) ‘features of customer’, approximate the desired credit limit with a weighted sum:

    y ≈ Σ_{i=0}^{d} w_i x_i

• linear regression hypothesis: h(x) = w^T x

h(x): like the perceptron, but without the sign
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 45/99

### Illustration of Linear Regression

[figure: linear regression with x ∈ R (a line fit to (x, y) points) and with x = (x_1, x_2) ∈ R^2 (a plane fit to (x_1, x_2, y) points)]

linear regression: find lines/hyperplanes with small residuals
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 46/99

### The Error Measure

popular/historical error measure: squared error err(ŷ, y) = (ŷ − y)^2

in-sample:
E_in(h_w) = (1/N) Σ_{n=1}^{N} ( h(x_n) − y_n )^2,   with h(x_n) = w^T x_n

out-of-sample:
E_out(w) = E_{(x,y)∼P} ( w^T x − y )^2

next: how to minimize E_in(w)?
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 47/99

### Matrix Form of E_in(w)

E_in(w) = (1/N) Σ_{n=1}^{N} ( w^T x_n − y_n )^2 = (1/N) Σ_{n=1}^{N} ( x_n^T w − y_n )^2

        = (1/N) ∥ [ x_1^T w − y_1 ;  x_2^T w − y_2 ;  . . . ;  x_N^T w − y_N ] ∥^2

        = (1/N) ∥ [ −−x_1^T−− ; −−x_2^T−− ; . . . ; −−x_N^T−− ] w  −  [ y_1 ; y_2 ; . . . ; y_N ] ∥^2

        = (1/N) ∥ X w − y ∥^2,   with X: N×(d+1),  w: (d+1)×1,  y: N×1
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 48/99
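A quick numerical check of the matrix form (a sketch with made-up data): averaging the squared errors example by example matches (1/N)∥Xw − y∥².

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 10, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])  # N x (d+1), with x_0 = 1
y = rng.normal(size=N)                                       # N targets
w = rng.normal(size=d + 1)                                   # (d+1) weights

loop_form   = np.mean([(w @ X[n] - y[n]) ** 2 for n in range(N)])
matrix_form = np.sum((X @ w - y) ** 2) / N                   # (1/N) ||Xw - y||^2
print(np.isclose(loop_form, matrix_form))                    # True
```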

min_w E_in(w) = (1/N) ∥ X w − y ∥^2

[figure: E_in(w) as a smooth convex bowl over w]

• E_in(w): continuous, differentiable, convex
• necessary condition of the ‘best’ w:

    ∇E_in(w) ≡ ( ∂E_in/∂w_0 (w), ∂E_in/∂w_1 (w), . . . , ∂E_in/∂w_d (w) ) = ( 0, 0, . . . , 0 )

  —not possible to ‘roll down’ any further

task: find w_LIN such that ∇E_in(w_LIN) = 0
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 49/99

### The Gradient ∇E_in(w)

E_in(w) = (1/N) ∥ X w − y ∥^2 = (1/N) ( w^T X^T X w − 2 w^T X^T y + y^T y ),   with A = X^T X,  b = X^T y,  c = y^T y

one w only:
E_in(w) = (1/N)( a w^2 − 2 b w + c )
∇E_in(w) = (1/N)( 2 a w − 2 b )   simple! :-)

vector w:
E_in(w) = (1/N)( w^T A w − 2 w^T b + c )
∇E_in(w) = (1/N)( 2 A w − 2 b )   similar (derived by definition)

∇E_in(w) = (2/N) ( X^T X w − X^T y )
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 50/99

task: find w_LIN such that  (2/N) ( X^T X w − X^T y ) = ∇E_in(w) = 0

invertible X^T X:
• easy! unique solution w_LIN = ( X^T X )^{−1} X^T y = X^† y,  with pseudo-inverse X^† = ( X^T X )^{−1} X^T
• often the case, because N ≫ d + 1

singular X^T X:
• many optimal solutions
• one of the solutions: w_LIN = X^† y, by defining X^† in other ways

practical suggestion: use a well-implemented † routine instead of ( X^T X )^{−1} X^T, for numerical stability when X^T X is almost-singular
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 51/99

### Linear Regression Algorithm

1. from D, construct the input matrix X and output vector y:

    X = [ −−x_1^T−− ; −−x_2^T−− ; · · · ; −−x_N^T−− ]   (N×(d+1)),      y = [ y_1 ; y_2 ; · · · ; y_N ]   (N×1)

2. calculate the pseudo-inverse X^†   ((d+1)×N)
3. return w_LIN = X^† y   ((d+1)×1)

simple and efficient with a good † routine
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 52/99
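A minimal sketch of this algorithm in Python (NumPy), using the library's well-implemented pseudo-inverse routine np.linalg.pinv as the ‘good † routine’; the data is made up so the recovered weights are easy to check.

```python
import numpy as np

def linear_regression(X, y):
    """Return w_LIN = X^dagger y via a well-implemented pseudo-inverse routine."""
    return np.linalg.pinv(X) @ y

# illustrative data: y = 3 + 2*x_1 - x_2 plus a little noise
rng = np.random.default_rng(3)
Z = rng.normal(size=(100, 2))
y = 3 + 2 * Z[:, 0] - Z[:, 1] + 0.01 * rng.normal(size=100)
X = np.column_stack([np.ones(100), Z])          # prepend x_0 = 1

w_lin = linear_regression(X, y)
print(np.round(w_lin, 2))                        # approximately [ 3.  2. -1.]
```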

### Is Linear Regression a ‘Learning Algorithm’?

w_LIN = X^† y

### No!
• analytic (closed-form) solution, ‘instantaneous’
• not improving E_in nor E_out iteratively

### Yes!
• good E_in? yes, optimal!
• good E_out? yes, finite d_VC like perceptrons
• improving iteratively? somewhat, within an iterative pseudo-inverse routine

if E_out(w_LIN) is good, learning ‘happened’!
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 53/99

### Logistic Regression

Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 54/99

### Heart Attack Prediction Problem (1/2)

age 40 years
gender male
blood pressure 130/85
cholesterol level 240
weight 70

heart disease? yes

unknown target distribution P(y|x) containing f(x) + noise
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N)
↓
learning algorithm A  ←  hypothesis set H
↓
final hypothesis g ≈ f

error measure: err, êrr

binary classification: ideal f(x) = sign( P(+1|x) − 1/2 ) ∈ {−1, +1}, because of classification err
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 55/99

### Heart Attack Prediction Problem (2/2)

age 40 years
gender male
blood pressure 130/85
cholesterol level 240
weight 70

heart attack? 80% risk

unknown target distribution P(y|x) containing f(x) + noise
↓
training examples D : (x_1, y_1), · · · , (x_N, y_N)
↓
learning algorithm A  ←  hypothesis set H
↓
final hypothesis g ≈ f

error measure: err, êrr

‘soft’ binary classification: f(x) = P(+1|x) ∈ [0, 1]
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 56/99

### Soft Binary Classification

target function f(x) = P(+1|x) ∈ [0, 1]

ideal (noiseless) data:
( x_1, y′_1 = 0.9 = P(+1|x_1) )
( x_2, y′_2 = 0.2 = P(+1|x_2) )
. . .
( x_N, y′_N = 0.6 = P(+1|x_N) )

actual (noisy) data:
( x_1, y_1 = ◦ ) ∼ P(y|x_1)
( x_2, y_2 = × ) ∼ P(y|x_2)
. . .
( x_N, y_N = × ) ∼ P(y|x_N)

same data as hard binary classification, different target function
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 57/99


### Logistic Hypothesis

age 40 years
gender male
blood pressure 130/85
cholesterol level 240

• For x = (x_0, x_1, x_2, · · · , x_d) ‘features of patient’, calculate a weighted ‘risk score’:

    s = Σ_{i=0}^{d} w_i x_i

• convert the score to an estimated probability by the logistic function θ(s)

[figure: the logistic function θ(s), an S-shaped curve rising from 0 to 1 with θ(0) = 1/2]

logistic hypothesis: h(x) = θ(w^T x) = 1 / ( 1 + exp(−w^T x) )
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 58/99
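A small sketch of the logistic hypothesis in Python (NumPy); the weights and features below are illustrative only.

```python
import numpy as np

def theta(s):
    """Logistic function: squashes any score s into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def logistic_h(w, x):
    """h(x) = theta(w^T x): estimated P(+1|x), with x_0 = 1 prepended."""
    x = np.concatenate(([1.0], x))
    return theta(w @ x)

w = np.array([-1.0, 0.02, 0.01])                  # illustrative weights
print(logistic_h(w, np.array([40.0, 70.0])))      # a risk estimate in (0, 1)
```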

### linear classification
h(x) = sign(s),  s = w^T x
plausible err = 0/1 (small flipping noise)

### linear regression
h(x) = s
friendly err = squared (easy to minimize)

### logistic regression
h(x) = θ(s)
err = ?

how to define E_in(w) for logistic regression?
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 59/99

### Likelihood

target function f(x) = P(+1|x)  ⇔  P(y|x) = { f(x) for y = +1;  1 − f(x) for y = −1 }

consider D = { (x_1, ◦), (x_2, ×), . . . , (x_N, ×) }

probability that f generates D:
P(x_1) f(x_1) × P(x_2) (1 − f(x_2)) × . . . × P(x_N) (1 − f(x_N))

likelihood that h generates D:
P(x_1) h(x_1) × P(x_2) (1 − h(x_2)) × . . . × P(x_N) (1 − h(x_N))

• if h ≈ f, then likelihood(h) ≈ probability using f
• probability using f usually large
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 60/99

### Likelihood of Logistic Hypothesis

likelihood(h) ≈ (probability using f) ≈ large

g = argmax_h likelihood(h)

when logistic: h(x) = θ(w^T x), and 1 − h(x) = h(−x) by the symmetry of θ

likelihood(h) = P(x_1) h(+x_1) × P(x_2) h(−x_2) × . . . × P(x_N) h(−x_N)

likelihood(logistic h) ∝ ∏_{n=1}^{N} h(y_n x_n)
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 61/99

### Cross-Entropy Error

max_w  likelihood(logistic h)  ∝  ∏_{n=1}^{N} h(y_n x_n)  =  ∏_{n=1}^{N} θ( y_n w^T x_n )

⇔  max_w  ln ∏_{n=1}^{N} θ( y_n w^T x_n )

⇔  min_w  (1/N) Σ_{n=1}^{N} − ln θ( y_n w^T x_n )

with θ(s) = 1 / ( 1 + exp(−s) ):

min_w  (1/N) Σ_{n=1}^{N} ln( 1 + exp( −y_n w^T x_n ) )

⇒  min_w  (1/N) Σ_{n=1}^{N} err(w, x_n, y_n)  =  E_in(w)

err(w, x, y) = ln( 1 + exp( −y w^T x ) ):  cross-entropy error
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 62/99

### Minimizing E_in(w)

min_w E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + exp( −y_n w^T x_n ) )

[figure: E_in(w) as a smooth convex valley over w]

• E_in(w): continuous, differentiable, twice-differentiable, convex
• how to minimize? locate the valley: want ∇E_in(w) = 0

first: derive ∇E_in(w)
(w)Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 63/99

### The Gradient ∇E_in(w)

E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + exp( □ ) ),   writing □ = −y_n w^T x_n and ○ = 1 + exp(□)

∂E_in(w)/∂w_i = (1/N) Σ_{n=1}^{N} [ ∂ ln(○)/∂○ ] [ ∂(1 + exp(□))/∂□ ] [ ∂( −y_n w^T x_n )/∂w_i ]
              = (1/N) Σ_{n=1}^{N} ( 1/○ ) ( exp(□) ) ( −y_n x_{n,i} )
              = (1/N) Σ_{n=1}^{N} ( exp(□) / (1 + exp(□)) ) ( −y_n x_{n,i} )
              = (1/N) Σ_{n=1}^{N} θ(□) ( −y_n x_{n,i} )

∇E_in(w) = (1/N) Σ_{n=1}^{N} θ( −y_n w^T x_n ) ( −y_n x_n )
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 64/99

want  ∇E_in(w) = (1/N) Σ_{n=1}^{N} θ( −y_n w^T x_n ) ( −y_n x_n ) = 0

a scaled θ-weighted sum of the −y_n x_n:

• all θ(·) = 0: only if y_n w^T x_n ≫ 0 for all n—a linearly separable D
• weighted sum = 0: a non-linear equation of w

closed-form solution? no :-(
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 65/99
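A vectorized sketch of this gradient in Python (NumPy), matching the formula above; it is reused in the gradient-descent sketch at the end of the section.

```python
import numpy as np

def theta(s):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-s))

def gradient_ein(w, X, y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n) (-y_n x_n)."""
    coeff = theta(-y * (X @ w))          # theta(-y_n w^T x_n), one weight per example
    return ((coeff * -y) @ X) / len(y)   # theta-weighted average of the -y_n x_n
```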

### PLA Revisited: Iterative Optimization

PLA: start from some w_0 (say, 0), and ‘correct’ its mistakes on D.

For t = 0, 1, . . .
1. find a mistake of w_t, called (x_{n(t)}, y_{n(t)}), with sign( w_t^T x_{n(t)} ) ≠ y_{n(t)}
2. (try to) correct the mistake by w_{t+1} ← w_t + y_{n(t)} x_{n(t)}

equivalently: pick some n, and update w_t by

    w_{t+1} ← w_t + ⟦ sign( w_t^T x_n ) ≠ y_n ⟧ · y_n x_n

viewing the update as w_{t+1} ← w_t + η v, with step size η = 1 and direction v = ⟦ sign( w_t^T x_n ) ≠ y_n ⟧ y_n x_n;
when stopped, return the last w as g

choice of (η, v) and stopping condition defines an iterative optimization approach
Hsuan-Tien Lin (NTU CSIE) Machine Learning Basics 66/99
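Choosing a fixed step size η and the direction v = −∇E_in(w_t) turns this template into gradient descent, a standard way to minimize the convex logistic-regression E_in when no closed form exists. A self-contained sketch in Python (NumPy); the step size, iteration count, and data are illustrative assumptions, not from the slides.

```python
import numpy as np

def theta(s):
    return 1.0 / (1.0 + np.exp(-s))

def gradient_ein(w, X, y):
    """grad E_in(w) = (1/N) sum_n theta(-y_n w^T x_n)(-y_n x_n), as derived above."""
    return ((theta(-y * (X @ w)) * -y) @ X) / len(y)

def logistic_regression(X, y, eta=0.1, T=2000):
    """Gradient descent: w <- w + eta * v with v = -grad E_in(w)."""
    w = np.zeros(X.shape[1])              # start from w_0 = 0
    for _ in range(T):                    # fixed iteration count as stopping condition
        w -= eta * gradient_ein(w, X, y)  # roll down the convex valley
    return w

# illustrative data: labels from a noisy linear rule
rng = np.random.default_rng(5)
Z = rng.normal(size=(100, 2))
y = np.sign(Z[:, 0] + Z[:, 1] + 0.1 * rng.normal(size=100))
X = np.column_stack([np.ones(100), Z])
w = logistic_regression(X, y)
print(np.mean(np.logaddexp(0.0, -y * (X @ w))))   # E_in(w): well below ln 2, its value at w = 0
```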