© Deng Cai, College of Computer Science, Zhejiang University

Academic year: 2021

(1)

Deng Cai (蔡登)

College of Computer Science, Zhejiang University

[email protected]

Introduction to Data Mining

(2)

Deng Cai (蔡登)

College of Computer Science, Zhejiang University

[email protected]

Introduction to Machine Learning

(3)

Short Bio

Dr. Deng Cai (蔡登)

[email protected], [email protected]

Professor, College of Computer Science (State Key Lab of CAD&CG).

• Office: Room 508, Meng Minwei Building, Zijingang Campus

Research interests:

• Machine learning

• Data mining

• Computer vision

• …

http://dengcai.zjulearning.org:8081/

(4)

Course Information

Web: http://dengcai.zjulearning.org:8081/Courses/DM/

Homework: http://assignment.zjulearning.org:8081/

• Default username and password: your student ID. Change the password after logging in.

Time: 

Monday, 14:05 – 15:35

Thursday, 14:05 – 15:35

Place: Room 504, 7th Teaching Building, Yuquan Campus

QQ group: 397340601 (DM_ZJU); apply with your name and student ID

TAs: 张永辉 (Zhang Yonghui), 胡津铭 (Hu Jinming)

(5)

Course information (Cont’d)

Prerequisites:

• Linear algebra, analysis, probability theory

• Basic programming skills

Course textbook: none required. Papers and other materials are available on the class web page.

Objectives:

• A basic understanding of several important machine learning methods

• The basic ability to apply machine learning techniques to real-world problems

(6)

Reference Books

R. Duda, P. Hart & D. Stork, Pattern Classification (2nd ed.), Wiley, 2000

C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

T. Hastie, R. Tibshirani & J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.), Springer, 2009

K. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012

(7)

Reference Books

You can download all the books from the QQ group

(8)

Evaluation

Quizzes (15%)

Four assignments (10% each)

• Each student must complete every assignment individually

Final exam (45%)

Programming languages:

• Matlab

  Tutorials:

  – http://www.math.ufl.edu/help/matlab-tutorial/

  – http://www.math.mtu.edu/~msgocken/intro/node1.html

• Python

(9)

Course Policies

In class:

• No laptops, no cellphones.

Cheating:

• Not allowed.

Homework:

• You must write your own solutions and programs.

Late policy:

• 0 to 24 hours late: 90%

• 24 to 48 hours late: 50%

• More than 48 hours late: 25%

Questions? 

(10)

Why Take This Course?

It is NOT:

• An easy course with high scores

• A recommendation letter for US school applications

Rank 1st

You should:

• Work hard

• Be honest

(11)

What is machine learning?

Machine learning is the study of computer systems that improve their performance through experience.

• Learn existing, known structures and rules.

• Discover new findings and structures.

Face recognition

News summarization

In machine learning, we study two types of problems.

(12)

The first kind of problems 

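[Figure: face photos labeled 刘德华 (Andy Lau), 章子怡 (Zhang Ziyi), 王俊凯 (Wang Junkai), ……; a new photo labeled 章子怡 (Zhang Ziyi)]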

(13)

The first kind of problems 

[Figure: pairs of face photos labeled "different people", "same person", "same person"]

(14)

The first kind of problems 

[Figure: face photos labeled with ages: 57, 30, 28, 18, ..., 14, and 33 years old]

(15)

The second kind of problems 

(16)

Two kinds of problems

What are the differences?

Supervised learning vs. Unsupervised learning

(17)

Two kinds of problems

What are the differences?

Supervised learning vs. Unsupervised learning

Supervised learning

• Goal: learn a mapping from inputs 𝒙 to outputs 𝑦

• Training data: a labeled set of input-output pairs

• Classification (categorization, decision making, …)

  – 𝑦 is a categorical variable

• Regression

  – 𝑦 is real-valued
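Here is a minimal sketch of the two supervised settings on made-up toy data, written in plain Python with no libraries. The 1-nearest-neighbor classifier and the least-squares line are illustrative choices and are not methods prescribed by these slides. The function names `nearest_neighbor_classify` and `fit_line` are hypothetical.

```python
# Toy supervised learning in plain Python; all data is hypothetical.

def nearest_neighbor_classify(train, x):
    """Classification: return the label of the closest training input (1-NN)."""
    _, closest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return closest_label


def fit_line(train):
    """Regression: fit y = a * x + b by ordinary least squares."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var_x = sum((x - mean_x) ** 2 for x, _ in train)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


# Classification: y is a categorical variable.
labeled = [(1.0, "A"), (1.5, "A"), (4.0, "B"), (4.5, "B")]
print(nearest_neighbor_classify(labeled, 1.2))  # A
print(nearest_neighbor_classify(labeled, 3.9))  # B

# Regression: y is real-valued.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
a, b = fit_line(points)
print(a, b)         # 2.0 1.0
print(a * 4.0 + b)  # 9.0
```

Both functions learn from the same kind of labeled (𝒙, 𝑦) pairs. They differ only in whether 𝑦 is a category or a number.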

(18)

Two kinds of problems

What are the differences?

Supervised learning vs. Unsupervised learning

Unsupervised learning

• We are only given inputs

• Goal: find “interesting patterns”

• A much less well-defined problem

• Discovering clusters: clustering

• Discovering latent factors:

  – Dimensionality reduction, matrix factorization, topic modeling
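The sketch below illustrates clustering with Lloyd's k-means algorithm. The slides do not name a specific method, so k-means is an assumption here. The algorithm receives only the input points and no labels. The toy points and the helper name `kmeans` are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Cluster 2-D points into k groups with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # Update step: move each center to the mean of its points (keep it if the cluster is empty).
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled inputs forming two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

The number of clusters `k` must be chosen by the user. This choice is one reason the unsupervised problem is less well defined than the supervised one.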

(19)

Two kinds of problems

What are the differences?

Supervised learning vs. Unsupervised learning

Reinforcement learning

• It resembles a supervised learning scenario

• No desired category signal is given

• The only teaching feedback is whether the tentative category is right or wrong

• This is useful for learning how to act or behave when given occasional reward or punishment signals
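This is a toy illustration of learning from reward signals alone. It uses an epsilon-greedy multi-armed bandit, a standard textbook example that does not come from these slides. The reward probabilities, `epsilon`, and the function name are hypothetical. The learner is never told which action is correct. It only learns whether each try was rewarded.

```python
import random


def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best using only 0/1 reward feedback."""
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n       # how often each action was tried
    estimates = [0.0] * n  # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit the current best guess
        # The environment only reports reward (1) or no reward (0); the best action is never revealed.
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # incremental mean
    return estimates, counts


estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: estimates[a])
print(best, [round(e, 2) for e in estimates])
```

The `epsilon` parameter trades exploration, which means trying actions that look worse, against exploitation, which means repeating the action that currently looks best.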

(20)

Focus of This Course

What are the typical machine learning problems?

• Supervised learning

  – Classification (decision making)

  – Regression

• Unsupervised learning

  – Cluster analysis

  – Latent factor analysis

What are the basic machine learning tools (methods, algorithms)?

Matlab/Python programming

(21)

Basic Concepts of Supervised Learning

Sample, example, pattern

Features, predictors, independent variables

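$\boldsymbol{x}_1, \boldsymbol{x}_2, \cdots, \boldsymbol{x}_n$

State of nature, labels, pattern class, class, responses, dependent variables

$\omega_1, \omega_2, \cdots, \omega_n$ or $y_1, y_2, \cdots, y_n$ or $z_1, z_2, \cdots, z_n$

Training data

$(\boldsymbol{x}_1, \omega_1), (\boldsymbol{x}_2, \omega_2), \cdots, (\boldsymbol{x}_n, \omega_n)$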

Model, statistical model, pattern class model, classifier

𝑓

Test data

Training error & test error
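These concepts map directly onto code. In this sketch the training data is a list of (𝒙, 𝜔) pairs, and the model 𝑓 is a nearest-class-mean classifier chosen only for illustration. The training error and test error are the fractions of misclassified samples in each set. The toy values and function names are hypothetical.

```python
# Hypothetical training data: pairs (x_i, omega_i), each x_i a 2-D feature vector.
train = [((1.0, 1.0), "w1"), ((1.5, 2.0), "w1"),
         ((3.0, 4.0), "w2"), ((5.0, 7.0), "w2"), ((3.5, 5.0), "w2")]
# Hypothetical test data, never used for fitting.
test = [((1.2, 1.1), "w1"), ((4.0, 5.0), "w2"), ((3.0, 3.5), "w1")]


def fit_nearest_mean(data):
    """Learn a model f by storing one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in data:
        s = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            s[d] += value
        counts[label] = counts.get(label, 0) + 1
    means = {label: [v / counts[label] for v in s] for label, s in sums.items()}

    def f(x):
        # Predict the class whose mean is closest in squared Euclidean distance.
        return min(means, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, means[label])))

    return f


def error_rate(f, data):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(f(x) != label for x, label in data) / len(data)


f = fit_nearest_mean(train)
print(error_rate(f, train), error_rate(f, test))  # 0.0 0.3333333333333333
```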

(22)

Supervised Learning

Learn from experience (training data) and build a model to predict the future.

[Diagram: Training phase: collect training samples → define features (Step 1, representation) → design & train model (Step 2, learning). Test phase: define features → make prediction (?).]

(23)

Supervised Learning

[Diagram: Step 1: define features → Step 2: design & train model]

Which step is more important in building a successful system?

Which one is the focus of this course?

(24)

Why is general classification hard?

Intra-class variability

The letter “T” in different typefaces

The same face under different expressions, poses, and illumination

Step 1 (define features) is not good enough

(25)

Why is general classification hard?

Inter-class similarity

Step 1 (define features) is not good enough

(26)

Semantic Gap

Looks similar, but semantically different

Looks different, but semantically the same

(27)

Representation: Features

Extract features to represent each sample as a feature vector

A good representation has:

• Low intra-class variability

• Low inter-class similarity
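One common way to score a single feature against these two criteria is Fisher's criterion, $J = (m_1 - m_2)^2 / (s_1^2 + s_2^2)$. It divides the squared distance between the two class means by the sum of the within-class variances. The slides do not define this measure, so it is used here only as an illustration. The feature values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values_a, values_b):
    """Between-class separation divided by within-class spread for one feature.

    Larger is better: the class means are far apart (low inter-class similarity)
    while each class is tightly grouped (low intra-class variability).
    """
    between = (mean(values_a) - mean(values_b)) ** 2
    within = pvariance(values_a) + pvariance(values_b)
    return between / within


# Hypothetical measurements of two candidate features for two classes.
feature_1 = ([2.0, 2.2, 1.8], [4.0, 4.1, 3.9])  # tight classes, far apart
feature_2 = ([2.0, 3.5, 0.5], [3.0, 4.5, 1.5])  # spread-out classes, close together
print(fisher_score(*feature_1) > fisher_score(*feature_2))  # True
```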

(28)

Fish Classification: Salmon vs. Sea Bass

Preprocessing involves image enhancement and segmentation: (i) separating touching or occluding fish, and (ii) extracting the fish contour.

(29)

Representation: Fish Length As Feature

How to design a classifier?

(30)

Representation: Fish Length As Feature

Training (design or learning) Samples
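With length as the only feature, the simplest classifier compares each fish against a threshold $l^*$. This sketch assumes that sea bass tend to be longer than salmon. It uses hypothetical training lengths and picks the candidate threshold with the fewest training mistakes. The function name is also hypothetical.

```python
def best_threshold(salmon, sea_bass):
    """Pick the length threshold with the fewest training mistakes.

    Rule: predict "sea bass" if length > threshold, else "salmon".
    Candidate thresholds are the observed lengths themselves.
    """
    candidates = sorted(set(salmon + sea_bass))

    def errors(t):
        # Salmon above the threshold and sea bass at or below it are misclassified.
        return sum(length > t for length in salmon) + sum(length <= t for length in sea_bass)

    best = min(candidates, key=errors)
    return best, errors(best)


# Hypothetical training lengths (arbitrary units); the classes overlap on purpose.
salmon = [4, 5, 6, 7, 9, 10]
sea_bass = [8, 9, 11, 12, 13, 15]
print(best_threshold(salmon, sea_bass))  # (7, 2)
```

Because the two length distributions overlap, no threshold reaches zero training error. This matches the slide's motivation for looking at other features.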

(31)

Probability Densities

(32)

Fish Lightness As Feature

The overlap of these histograms is small compared with that of the length feature.
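One way to measure histogram overlap is histogram intersection: normalize each class histogram, then sum the smaller value in each bin. A result of 0 means the histograms are disjoint, and 1 means they are identical. This is an assumed measure, not the slides' definition. All measurements are hypothetical and were chosen so that lightness separates the classes better than length, as on the slide.

```python
from collections import Counter


def histogram_overlap(a, b, bin_width):
    """Histogram intersection of two samples: sum over bins of min(p_a, p_b)."""
    ha = Counter(int(v // bin_width) for v in a)
    hb = Counter(int(v // bin_width) for v in b)
    bins = set(ha) | set(hb)
    # Counter returns 0 for bins a sample never hits.
    return sum(min(ha[k] / len(a), hb[k] / len(b)) for k in bins)


# Hypothetical measurements for the two classes.
length_salmon, length_bass = [4, 5, 6, 7, 9, 10], [8, 9, 11, 12, 13, 15]
light_salmon, light_bass = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [3.2, 4.0, 4.5, 5.0, 5.5, 6.0]
print(histogram_overlap(length_salmon, length_bass, 2))  # about 0.333
print(histogram_overlap(light_salmon, light_bass, 1))    # about 0.167
```

The overlap value depends on `bin_width`, so comparisons between features are only meaningful when the bins are chosen sensibly for each feature.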

(33)

Two-dimensional Feature Space

Two features together are better than either feature alone

Linear (simple) decision boundary
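A linear decision boundary in a 2-D feature space can be learned with the perceptron algorithm. The slides do not specify how the boundary is found, so the perceptron is used here only as an illustration. The feature pairs, read as (lightness, width), are hypothetical. The labeling +1 = sea bass and -1 = salmon is also an assumption.

```python
def train_perceptron(samples, epochs=1000, lr=1.0):
    """Learn w, b for the linear rule: class +1 if w.x + b > 0 else -1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in samples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # wrong side of (or on) the boundary
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return w, b


# Hypothetical 2-D feature vectors (lightness, width); +1 = sea bass, -1 = salmon.
samples = [((2.0, 3.0), -1), ((3.0, 3.5), -1), ((2.5, 4.0), -1),
           ((6.0, 5.0), +1), ((7.0, 5.5), +1), ((6.5, 4.5), +1)]
w, b = train_perceptron(samples)


def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


print(all(predict(x) == y for x, y in samples))  # True
```

The learned boundary is the straight line $w_1 x_1 + w_2 x_2 + b = 0$. The perceptron is guaranteed to stop with zero training error only when the classes are linearly separable, as they are in this toy set.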

(34)

Complex Decision Boundary

(35)

(36)

Generalization

A generalization of a concept is an extension of the concept to less specific criteria.

Generalization of the classifier (model)

• The performance of the classifier on test data

Training error:

• Simple model → large training error

• Complex model → smaller training error

Test error:

• Simple model → ?

• Complex model → ?
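The answers to the question marks depend on the data. The toy run below shows one possible outcome, using a nearest-class-mean rule as the simple model and 1-nearest-neighbor as the complex model. The data is hypothetical, and the training point 8.5 labeled "A" plays the role of noise. The complex model memorizes that point. It reaches zero training error but a higher test error than the simple model.

```python
def nearest_mean_model(train):
    """Simple model: one mean per class; predict the class with the closest mean."""
    classes = sorted({label for _, label in train})
    means = {}
    for c in classes:
        xs = [x for x, label in train if label == c]
        means[c] = sum(xs) / len(xs)
    return lambda x: min(classes, key=lambda c: abs(x - means[c]))


def one_nn_model(train):
    """Complex model: memorize the data; predict the label of the nearest training point."""
    return lambda x: min(train, key=lambda pair: abs(x - pair[0]))[1]


def error(model, data):
    """Fraction of misclassified samples."""
    return sum(model(x) != y for x, y in data) / len(data)


# Hypothetical 1-D data.
train = [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (8.5, "A"),
         (6, "B"), (7, "B"), (9, "B"), (10, "B")]
test = [(2.5, "A"), (5, "A"), (8.3, "B"), (8.7, "B"), (9.5, "B"), (6.5, "B")]

for name, build in [("simple (nearest mean)", nearest_mean_model), ("complex (1-NN)", one_nn_model)]:
    model = build(train)
    print(f"{name}: training error {error(model, train):.3f}, test error {error(model, test):.3f}")
# simple (nearest mean): training error 0.111, test error 0.000
# complex (1-NN): training error 0.000, test error 0.333
```

On other data the ranking can reverse, for example when the true boundary is too complex for the simple model.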

(37)

Prerequisite Knowledge

Probability:

• Bayes' theorem

Analysis:

• Gradient descent

Linear algebra:

• Linear spaces

• Matrices

  – Rank…

  – Positive definite matrices…

  – Eigenvectors, eigenvalues

  – Singular vectors, singular values
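Here are minimal refreshers for three of these prerequisites, in plain Python. The likelihoods, priors, objective function, and matrix are all hypothetical. Power iteration is one simple way to find a dominant eigenvector and is used here only as an example. The slides do not prescribe it.

```python
import math


# 1) Bayes' theorem: P(w | x) = P(x | w) P(w) / P(x), where P(x) = sum over w of P(x | w) P(w).
def posterior(likelihoods, priors):
    """Return P(class | x) for each class from P(x | class) and P(class)."""
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


# 2) Gradient descent: repeatedly step against the gradient, x <- x - lr * f'(x).
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# 3) Power iteration: repeated multiplication by A converges to the dominant eigenvector.
def power_iteration(A, iters=100):
    v = [1.0] + [0.0] * (len(A) - 1)  # starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in A]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    Av = [sum(a * b for a, b in zip(row, v)) for row in A]
    eigenvalue = sum(vi * avi for vi, avi in zip(v, Av))  # Rayleigh quotient v^T A v (v has unit length)
    return eigenvalue, v


print(posterior([0.6, 0.2], [0.3, 0.7]))                # approximately [0.5625, 0.4375]
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # minimizes (x - 3)^2; close to 3.0
print(power_iteration([[2.0, 1.0], [1.0, 2.0]]))        # eigenvalue close to 3.0, vector close to [0.707, 0.707]
```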
