
A Parallelized Learning Algorithm for Monotonicity Constrained Support Vector Machines


Academic year: 2021


(1)

A Parallelized Learning Algorithm for

Monotonicity Constrained Support

Vector Machines

Authors: H. C. Chuang, C. C. Chen, C. Chou, Y. C. Cheng, S. T. Li

Presenter: Hui-Chi Chuang

(2)


Agenda

Introduction

Literature Review

Research Methodology

Experimental Results and Analysis

Conclusions and Suggestions

(3)

Introduction

(4)

Research Background and Motivation (1/3)

[Figure: computers, servers, mobile phones, surveillance devices, and smart appliances]

(5)

Research Background and Motivation (2/3)

• Support Vector Machines (SVMs) are widely used data mining techniques owing to their excellent abilities in solving both classification and regression problems.

• Data mining techniques enable us to discover hidden patterns and extract valuable knowledge from databases.

• Knowledge-driven: take a priori domain knowledge into account.

• Some monotonic relationships exist between attributes and the output values.

(6)

Research Background and Motivation (3/3)

SVM with Monotonicity Constraints (MCSVM)

• Advantages:

1. More accurate and more meaningful results

2. Reduced interference from noisy data

• Disadvantages:

1. Changes the structure of the quadratic programming problem

2. High computing time, $O(N^2 M)$

(7)

Research Objectives

[Diagram: SVM, Monotonicity Constraints, MCSVM, and Parallel Strategy]

(8)

Literature Review

(9)

Support Vector Machine (1/2)

• Vapnik (1995) proposed the support vector machine (SVM).

• It is one of the data mining methods that have been successively proposed and widely discussed.

• It is sensitive to outliers or noise in the training sample due to overfitting (Guyon, Matic, Vapnik, 1996).

• SVM is designed to minimize the structural risk by minimizing an upper bound of the generalization error rather than the training error.

(10)

Support Vector Machine (2/2)

[Figure: optimal hyperplane separating the O and x classes, with the support vectors marked]

$$\min \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \varepsilon_i$$

subject to

$$y_i\left(w^T \varphi(x_i) + b\right) \ge 1 - \varepsilon_i, \quad i = 1, \ldots, N,$$

$$\varepsilon_i \ge 0, \quad i = 1, \ldots, N$$

$w$: normal vector to the hyperplane

$\varepsilon_i$: distance between the outlier $i$ and the hyperplane

$C$: constant, regarded as a regularization parameter

$\varphi(x)$: feature mapping induced by the kernel function
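To make the roles of $w$, $b$, $C$, and the slack variables concrete, the following is a minimal sketch that evaluates the soft-margin objective for a given hyperplane. It assumes a linear feature mapping, $\varphi(x) = x$, and the function name `primal_objective` and the toy data are illustrative, not taken from the slides.

```python
def primal_objective(w, b, X, y, C):
    """Evaluate 0.5 * ||w||^2 + C * sum(eps_i) for a linear feature map.

    Assumption: phi(x) = x, so the decision score is w . x + b.
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_d * x_d for w_d, x_d in zip(w, x_i)) + b
        margin = y_i * score
        # Smallest eps_i that satisfies both y_i * score >= 1 - eps_i and eps_i >= 0.
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(w_d * w_d for w_d in w) + C * sum(slacks)
    return objective, slacks


# Toy data (hypothetical): the third point lies on the hyperplane and needs slack 1.
X = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
y = [1, -1, -1]
value, slacks = primal_objective([1.0, 0.0], -1.0, X, y, C=10.0)
```

A larger `C` makes each unit of slack more expensive, which is why $C$ acts as the regularization parameter in the trade-off between margin width and training errors.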

(11)

Monotonicity Constraints

• There are two different approaches for dealing with problems that have prior knowledge of monotonic properties.

• One is to apply a relabeling technique to the data that lack monotonicity.

• The other is to add the monotonicity constraints directly to the optimization model.

(12)

Monotonicity Constrained SVMs (1/2)

• Monotonicity is a relationship in which increasing the value of a variable always increases, or always decreases, the likelihood of category membership.

• Define the monotonicity relationship.

• Given a dataset $\Im = \{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$, with $R^n$ denoted as the feature space, and a partial ordering $\le$ defined over this input space.

• A linear ordering $\le$ is defined over the space $Y$ of class values $y_i$.

(13)

Monotonicity Constrained SVMs (2/2)

• Then the classifier $f$ is monotone if the following statement holds:

$$x_i \le x_j \Rightarrow f(x_i) \le f(x_j), \quad \text{for all } i, j$$

• The monotonicity constraints can be expressed as the following inequality:

$$w^T \varphi(x_i) \le w^T \varphi(x_j), \quad \text{for all } x_i \le x_j$$
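The monotonicity condition can be checked directly on a finite set of points. The sketch below is one possible way to do that, assuming the partial ordering is the componentwise order and that every attribute is increasing; the helper names `dominated` and `violating_pairs` and the two scoring functions are hypothetical.

```python
from itertools import permutations


def dominated(a, b):
    """True when a <= b in every coordinate (componentwise partial order)."""
    return all(a_d <= b_d for a_d, b_d in zip(a, b))


def violating_pairs(score, points):
    """Return the comparable pairs (a, b) with a <= b but score(a) > score(b)."""
    return [(a, b) for a, b in permutations(points, 2)
            if dominated(a, b) and score(a) > score(b)]


points = [(1, 1), (2, 1), (1, 3), (2, 3)]

# A linear score with non-negative weights respects the componentwise order.
monotone_score = lambda x: 0.5 * x[0] + 1.0 * x[1]
# A negative weight on the second attribute breaks it.
non_monotone_score = lambda x: 0.5 * x[0] - 1.0 * x[1]

ok = violating_pairs(monotone_score, points)
bad = violating_pairs(non_monotone_score, points)
```

Pairs such as (2, 1) and (1, 3) are incomparable under the partial ordering, so they impose no constraint on the classifier.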

(14)

Parallel Strategy

• The concept of "divide and conquer."

• Collobert et al. (2002): easy to implement in parallel, with each SVM trained on a small subset of the whole dataset.

• Kruger et al. (2006): mixture of SVMs for speech recognition using hidden Markov models.

(15)

Research Methodology

(16)

Construct Monotonicity Constraints (1/2)

Algorithm 1: Monotonicity Constraints – Hierarchy Method (Yang, 2014)

Step 1. Determine the hierarchy partition $\underline{z}_i = s_{i,0}, \ldots, s_{i,v}, \ldots, s_{i,M} = \overline{z}_i$ for the range interval $[\underline{z}_i, \overline{z}_i]$, for $i = 1, 2, \ldots, n$.

Step 2. Determine $\underline{x}_k = (\underline{x}_{k,1}, \underline{x}_{k,2}, \ldots, \underline{x}_{k,n})$ and $\overline{x}_k = (\overline{x}_{k,1}, \overline{x}_{k,2}, \ldots, \overline{x}_{k,n})$ such that

$$\underline{x}_{k,i} = s_{i,k-1} \ (i \in M_{inc}), \quad \underline{x}_{k,i} = s_{i,k} \ (i \in M_{dec}), \quad \underline{x}_{k,i} = \forall s_{v} \ (i \in M_{none}),$$

$$\overline{x}_{k,i} = s_{i,k} \ (i \in M_{inc}), \quad \overline{x}_{k,i} = s_{i,k-1} \ (i \in M_{dec}), \quad \overline{x}_{k,i} = \forall s_{v} \ (i \in M_{none}),$$

for $i = 1, 2, \ldots, n$, $k = 1, 2, \ldots, M$ and $v = 0, 1, 2, \ldots, M$.

Step 3. Add the monotonicity constraint $w^T \varphi(\underline{x}_k) \le w^T \varphi(\overline{x}_k)$.
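The three steps can be sketched in code as follows. This is only an illustration under stated assumptions: every attribute uses the same number of partition points, attribute directions are given as the strings "inc", "dec", and "none", and a non-monotonic attribute takes one fixed partition value shared by both points of a pair. The function name `constraint_pairs` and the parameter `free_value_index` are hypothetical.

```python
def constraint_pairs(partitions, direction, free_value_index=0):
    """Build (lower, upper) constraint points from per-attribute partitions.

    partitions[i] is the sorted list s_{i,0} <= ... <= s_{i,M} (Step 1);
    direction[i] is "inc", "dec", or "none".
    """
    M = len(partitions[0]) - 1
    pairs = []
    for k in range(1, M + 1):  # Step 2: one pair per partition interval
        lower, upper = [], []
        for s_i, d in zip(partitions, direction):
            if d == "inc":
                lower.append(s_i[k - 1])
                upper.append(s_i[k])
            elif d == "dec":
                lower.append(s_i[k])
                upper.append(s_i[k - 1])
            else:
                # Assumption: any partition value may be used; pick one and share it.
                lower.append(s_i[free_value_index])
                upper.append(s_i[free_value_index])
        pairs.append((tuple(lower), tuple(upper)))
    return pairs


# Hypothetical partitions for three attributes: increasing, decreasing, non-monotonic.
partitions = [[0, 5, 10], [1, 2, 3], [7, 8, 9]]
pairs = constraint_pairs(partitions, ["inc", "dec", "none"])
```

Each returned pair `(lower, upper)` then contributes one inequality of the form in Step 3.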

(17)

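Construct Monotonicity Constraints (2/2)

Example with three attributes A1, A2, A3. Each pair lists the Max point $\overline{x}_k$ first and the Min point $\underline{x}_k$ second.

Level 1. Values: A1 = {v11, v15}, A2 = {v21, v25}, A3 = {r3}
  $\overline{x}_1$ = (v15, v21, r3), $\underline{x}_1$ = (v11, v25, r3)

Level 2. Values: A1 = {v11, v13, v15}, A2 = {v21, v23, v25}, A3 = {r3}
  $\overline{x}_2$ = (v15, v23, r3), $\underline{x}_2$ = (v13, v25, r3)
  $\overline{x}_3$ = (v13, v21, r3), $\underline{x}_3$ = (v11, v23, r3)

Level 3. Values: A1 = {v11, v12, v13, v14, v15}, A2 = {v21, v22, v23, v24, v25}, A3 = {r3}
  $\overline{x}_4$ = (v15, v24, r3), $\underline{x}_4$ = (v14, v25, r3)
  $\overline{x}_5$ = (v14, v23, r3), $\underline{x}_5$ = (v13, v24, r3)
  $\overline{x}_6$ = (v13, v22, r3), $\underline{x}_6$ = (v12, v23, r3)
  $\overline{x}_7$ = (v12, v21, r3), $\underline{x}_7$ = (v11, v22, r3)

$$w^T \varphi(\underline{x}_k) \le w^T \varphi(\overline{x}_k)$$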

(18)

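SVM with Monotonicity Constraints (1/3)

Adding the monotonicity constraints to the SVM, the model becomes the following:

$$\min \ J(w, \varepsilon) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \varepsilon_i$$

subject to

$$y_i\left(w^T \varphi(x_i) + b\right) \ge 1 - \varepsilon_i, \quad i = 1, \ldots, N,$$

$$w^T \varphi(\underline{x}_k) \le w^T \varphi(\overline{x}_k), \quad k = 1, \ldots, M,$$

$$\varepsilon_i \ge 0, \quad i = 1, \ldots, N.$$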

(19)

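SVM with Monotonicity Constraints (2/3)

Using Lagrange multipliers, the problem is turned into the following form:

$$\max_{\alpha, \beta} \ \tilde{Q}(\alpha, \beta) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \varphi(x_i)^T \varphi(x_j)$$

$$- \frac{1}{2} \sum_{k=1}^{M} \sum_{l=1}^{M} \beta_k \beta_l \left(\varphi(\overline{x}_k) - \varphi(\underline{x}_k)\right)^T \left(\varphi(\overline{x}_l) - \varphi(\underline{x}_l)\right)$$

$$- \frac{1}{2} \sum_{i=1}^{N} \sum_{k=1}^{M} \alpha_i y_i \beta_k \left(\varphi(\overline{x}_k) - \varphi(\underline{x}_k)\right)^T \varphi(x_i) - \frac{1}{2} \sum_{i=1}^{N} \sum_{k=1}^{M} \alpha_i y_i \beta_k \varphi(x_i)^T \left(\varphi(\overline{x}_k) - \varphi(\underline{x}_k)\right)$$

subject to

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \ i = 1, \ldots, N, \quad \beta_k \ge 0, \ \forall k = 1, \ldots, M.$$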

(20)

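SVM with Monotonicity Constraints (3/3)

Rewrite the problem in matrix form:

$$\min_{\alpha, \beta} \ \hat{Q}(\alpha, \beta) = \frac{1}{2} \begin{bmatrix} \alpha^T & \beta^T \end{bmatrix} G \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - \mathbf{1}^T \alpha$$

subject to

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \ i = 1, \ldots, N, \quad \beta_k \ge 0, \ \forall k = 1, \ldots, M,$$

where

$$G = \begin{bmatrix} G^{11} & G^{12} \\ G^{21} & G^{22} \end{bmatrix},$$

$$G^{11}_{i,j} = y_i y_j \varphi(x_i)^T \varphi(x_j),$$

$$G^{12}_{i,k} = y_i \left(\varphi(\overline{x}_k) - \varphi(\underline{x}_k)\right)^T \varphi(x_i),$$

$$G^{21} = \left(G^{12}\right)^T,$$

$$G^{22}_{k,l} = \left(\varphi(\overline{x}_k) - \varphi(\underline{x}_k)\right)^T \left(\varphi(\overline{x}_l) - \varphi(\underline{x}_l)\right),$$

for $i, j = 1, \ldots, N$ and $k, l = 1, \ldots, M$.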
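Because every entry of $G$ is an inner product of feature vectors, the matrix can be assembled from kernel evaluations alone. The following sketch shows one way to do this with plain Python lists. It assumes the RBF kernel is parameterized as $K(a, b) = \exp(-\|a - b\|^2 / (2\sigma^2))$, which the slides do not state explicitly, and the names `rbf` and `build_G` are illustrative.

```python
import math


def rbf(a, b, sigma):
    """Assumed RBF form: K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq = sum((a_d - b_d) ** 2 for a_d, b_d in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def build_G(X, y, lower, upper, sigma):
    """Assemble the (N + M) x (N + M) matrix G from its four blocks."""
    N, M = len(X), len(lower)
    K = lambda a, b: rbf(a, b, sigma)
    G = [[0.0] * (N + M) for _ in range(N + M)]
    for i in range(N):
        for j in range(N):
            G[i][j] = y[i] * y[j] * K(X[i], X[j])                # block G11
        for k in range(M):
            g = y[i] * (K(upper[k], X[i]) - K(lower[k], X[i]))  # block G12
            G[i][N + k] = g
            G[N + k][i] = g                                      # block G21 = G12^T
    for k in range(M):
        for l in range(M):
            # Expanded inner product of the two feature-space differences (block G22).
            G[N + k][N + l] = (K(upper[k], upper[l]) - K(upper[k], lower[l])
                               - K(lower[k], upper[l]) + K(lower[k], lower[l]))
    return G


# Tiny hypothetical instance: two training points and one constraint pair.
G = build_G(X=[(0.0,), (1.0,)], y=[1, -1],
            lower=[(0.0,)], upper=[(1.0,)], sigma=1.0)
```

The resulting matrix is what a quadratic programming solver would receive together with the linear term $-\mathbf{1}^T \alpha$ and the constraints on $\alpha$ and $\beta$.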

(21)

Parallel Strategy: Mixture

• The problem outputs the following optimal solution:

$$S_j(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + \sum_{k=1}^{M} \beta_k \left[K(\overline{x}_k, x) - K(\underline{x}_k, x)\right] + b$$

• Minimize the cost function:

$$\omega = \sum_{i=1}^{N} \left[f(x_i) - y_i\right]$$

• Mixture function:

$$f(x) = h\left(\sum_{j=1}^{P} \omega_j(x) S_j(x)\right)$$
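The mixture function combines the outputs of $P$ experts, each trained on its own part of the data. The sketch below illustrates only the combination step. The round-robin split, the choice of $h = \tanh$, and the constant gate weights are assumptions made for the example, and `split_into_parts` and `mixture_output` are hypothetical names.

```python
import math


def split_into_parts(samples, P):
    """Deal samples round-robin into P disjoint parts (one part per expert)."""
    return [samples[j::P] for j in range(P)]


def mixture_output(x, experts, gates, h=math.tanh):
    """Compute f(x) = h(sum_j omega_j(x) * S_j(x))."""
    return h(sum(gate(x) * expert(x) for gate, expert in zip(gates, experts)))


parts = split_into_parts(list(range(7)), 3)

# Two stand-in expert scores S_j(x) and constant gate weights omega_j(x) = 0.5.
experts = [lambda x: x[0] - 1.0, lambda x: 2.0 * x[0]]
gates = [lambda x: 0.5, lambda x: 0.5]
f_value = mixture_output((1.0,), experts, gates)
```

In a full implementation each expert would be a trained monotonicity constrained SVM and the gate weights would be fitted to minimize the cost function; both are replaced by stand-ins here.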

(22)

Experimental Results and Analysis

(23)

Experimental Design (1/3)

1. Hardware: Intel Core i7-4770 CPU 3.2 GHz with 16 GB RAM

Software: MATLAB R2015a

2. Kernel function: RBF function

3. Grid search:

C = {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000},

σ = {0.5, 5, 10, 15, 25, 50, 100, 250, 500}.
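The grid above contains 11 values of C and 9 values of σ, so 99 parameter combinations are evaluated. A minimal Python sketch of such a search follows (the experiments themselves used MATLAB). The `evaluate` callback is hypothetical and stands for whatever validation score is used to rank a (C, σ) pair.

```python
from itertools import product

C_VALUES = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]
SIGMA_VALUES = [0.5, 5, 10, 15, 25, 50, 100, 250, 500]


def grid_search(evaluate):
    """Return the (C, sigma) pair with the highest score under `evaluate`."""
    return max(product(C_VALUES, SIGMA_VALUES), key=lambda cs: evaluate(*cs))


grid = list(product(C_VALUES, SIGMA_VALUES))

# Stand-in scoring function that peaks at C = 1 and sigma = 10.
best = grid_search(lambda C, sigma: -abs(C - 1) - abs(sigma - 10))
```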

(24)

Experimental Design (2/3)

[Diagram: the dataset is split into a training dataset (67%) and a testing dataset (33%); RMC-SVM and the mixture RMC-SVM are evaluated; the procedure is repeated 30 times]
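The evaluation protocol can be sketched as a repeated random hold-out. The rounding of the training size, the seeding, and the `evaluate` callback below are assumptions for illustration; the slides only state the 67%/33% split and the 30 repetitions.

```python
import random


def holdout_split(n_samples, train_fraction=0.67, rng=random):
    """Shuffle sample indices and split them into training and testing lists."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)  # assumption: round to nearest
    return indices[:n_train], indices[n_train:]


def repeated_holdout(n_samples, evaluate, repeats=30, seed=0):
    """Average `evaluate(train_idx, test_idx)` over independent random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train_idx, test_idx = holdout_split(n_samples, rng=rng)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


train_idx, test_idx = holdout_split(683, rng=random.Random(1))
# Stand-in evaluation that just reports the test fraction of each split.
mean_score = repeated_holdout(683, lambda tr, te: len(te) / 683)
```

In practice `evaluate` would train the model on `train_idx` and return a performance measure computed on `test_idx`.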

(25)

Experimental Design (3/3)

Experiment settings:

• Parts: Number of parts = {2, 4, 6}

• Constraints: Number of constraints = {66, 121}

(26)

Dataset Collection

Dataset: WDBC

• Source: UCI

• Size: 683 × 9

Attribute                      Type        Monotonicity Relationship
Clump thickness                Numerical   Increasing with Class
Uniformity of cell size        Numerical   Increasing with Class
Uniformity of cell shape       Numerical   Increasing with Class
Marginal adhesion              Numerical   Increasing with Class
Single epithelial cell size    Numerical   Increasing with Class
Bare nuclei                    Numerical   Increasing with Class
Bland chromatin                Numerical   Increasing with Class
Normal nucleoli                Numerical   Increasing with Class
Mitoses                        Numerical   Increasing with Class

Class: value = 4 (malignant): 239 (34.99%); value = 2 (benign): 444 (65.01%)

Table 1. WDBC dataset characteristics and monotonic relationship (http://kdd.ics.uci.edu/)

(27)

Performance Measures (1/2)

$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}$$

$$Recall = \frac{TP}{TP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$

$$F\text{-}measure = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$$

Confusion Matrix     Predicted Positive     Predicted Negative
Actual Positive      True Positive (TP)     False Negative (FN)
Actual Negative      False Positive (FP)    True Negative (TN)

Table 4-3. Confusion matrix
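The four measures follow directly from the confusion matrix counts. The sketch below computes them; the function name and the example counts are made up for illustration and are not results from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, precision, and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=70, fp=10, tn=130, fn=15)
```

The sketch does not guard against empty classes; a zero denominator (for example, no predicted positives) would need separate handling.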

(28)

Performance Measures (2/2)

• Frequency Monotonicity Rate: $FMR = \frac{FM}{P}$

• where P is the number of observed pairs, and

• FM is the number of pairs that do not violate the monotonicity.
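As a rough illustration of the FMR, the sketch below counts pairs of observations and checks each pair against the monotonicity condition. Several details are assumptions, since the slide does not spell them out: P is taken to be the number of unordered pairs of observations, the ordering is componentwise with all attributes increasing, and a pair violates monotonicity only when the points are comparable and their labels are in the opposite order.

```python
from itertools import combinations


def frequency_monotonicity_rate(points, labels):
    """FMR = FM / P, with P = all unordered pairs (assumed reading of the slide)."""
    observed = 0
    monotone = 0
    for (a, ya), (b, yb) in combinations(list(zip(points, labels)), 2):
        observed += 1
        a_le_b = all(u <= v for u, v in zip(a, b))
        b_le_a = all(v <= u for u, v in zip(a, b))
        violates = (a_le_b and ya > yb) or (b_le_a and yb > ya)
        if not violates:
            monotone += 1
    return monotone / observed


# Hypothetical observations with class labels 2 (benign) and 4 (malignant).
points = [(1, 1), (2, 1), (1, 3), (2, 3)]
labels = [2, 4, 2, 2]
fmr = frequency_monotonicity_rate(points, labels)
```

Here the only violating pair is (2, 1) with label 4 against (2, 3) with label 2, so five of the six pairs are monotone.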

(29)


(30)

Conclusions and Suggestions

(31)

Contributions (1/2)

• RMC-SVM vs. the parallelized RMC-SVM

  • The training time of the parallelized RMC-SVM is shorter than that of RMC-SVM when both achieve similar classification results.

• The parallelized RMC-SVM with different numbers of parts dividing the dataset

  • The training time of the parallelized RMC-SVM decreases as the number of parts increases.

(32)

Contributions (2/2)

• Managerial Implications

  • The monotonicity constraints

  • The parallel strategy

• Recommendations for Future Work

  • Try different algorithms to solve the quadratic programming problem more efficiently.

  • Extend the classification problem to multiclass cases.

  • Try more values for the number of parts into which the whole training dataset is divided.

  • Improve the method of constructing monotonicity constraints.

(33)


Q & A
