
Mathematical Problems in Engineering, Volume 2012, Article ID 398232, 17 pages. doi:10.1155/2012/398232

Research Article

An Optimal Classification Method for Biological and Medical Data

Yao-Huei Huang,¹ Yu-Chien Ko,² and Hao-Chun Lu³

¹ Institute of Information Management, National Chiao Tung University, Management Building 2, 1001 Ta-Hsueh Road, Hsinchu 300, Taiwan
² Department of Information Management, Chu-Hua University, No. 707, Section 2, WuFu Road, Hsinchu 300, Taiwan
³ Department of Information Management, College of Management, Fu Jen Catholic University, No. 510, Jhongjheng Road, Sinjhuang, Taipei 242, Taiwan

Correspondence should be addressed to Yao-Huei Huang, yaohuei.huang@gmail.com

Received 25 October 2011; Revised 25 January 2012; Accepted 28 January 2012

Academic Editor: Jung-Fa Tsai

Copyright © 2012 Yao-Huei Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper proposes classifying biological and medical datasets by a union of hyperspheres, formulated as a mixed-integer nonlinear program. The classification program contains nonlinear terms, which are linearized by a piecewise linearization technique so that a global optimum can be obtained. Numerical examples illustrate that the proposed method obtains the global optimum more effectively than current methods.

1. Introduction

Classification techniques have been widely applied in the biological and medical research domains [1–5]. Both object classification and pattern recognition for biological and medical datasets demand optimal accuracy, since patients' lives may depend on the results. However, cancer identification with the supervised learning technique does not take a global view in identifying species or predicting survival. The improvement should cover the whole scope of the data rather than considering only the efficiency of diagnosis. This research aims to extract features from whole datasets in terms of induction rules.

In a given dataset with several objects, where each object has some attributes and belongs to a specific class, classification techniques are used to find a rule of attributes that appropriately describes the features of a specified class. These techniques have been studied over the last four decades and include decision tree-based methods [6–11], hyperplane-based methods [12–14], and machine learning-based methods [14–17].


To assess the effects of these classification techniques, three criteria are used for evaluating the quality of induced rules, based on the study of Li and Chen [3].

(i) Accuracy. A rule fitting a class should not cover the objects of other classes; the higher the accuracy of a rule, the better.

(ii) Support. A good rule fitting a class should be supported by most of the objects of the same class.

(iii) Compactness. A good rule should be expressed in a compact way; that is, the fewer the number of rules, the better.

This study proposes a novel method, based on global optimization techniques, to induce rules with high rates of accuracy, support, and compactness; such techniques have become more and more useful in biological and medical research.

The rest of this paper is organized as follows. Section 2 gives an overview of the related literature. Two types of mathematical models and a classification algorithm are proposed in Section 3. Numerical examples demonstrate the effectiveness of the proposed method in Section 4. Finally, the main conclusions of this study and future work are drawn in Section 5.

2. Literature Review

Currently, two well-known methods are used to induce classification rules. The first is the decision tree-based method, which has been developed over the last few decades [6–10]. It is widely applied to fault isolation of an induction motor [18], classification of normal and tumor tissues [19], skeletal maturity assessment [20], proteomic mass spectra classification [21], and other cases [22, 23]. Because the decision tree-based method assumes that all classes can be separated by linear operations, the induced rules suffer if the boundaries between the classes are nonlinear. In fact, the linearity assumption prohibits practical applications because many biological and medical datasets have complicated nonlinear interactions between attributes and predicted classes.

Consider the classification problem with two attributes shown in Figure 1, where one marker represents a first-class object and "•" represents a second-class object. Figure 1 depicts a situation in which a nonlinear relationship exists between the objects of the two classes. The decision tree method focuses on inducing classification rules for the objects, as shown in Figure 1(b), in which it requires four rectangular regions to classify the objects.

The second is the support vector hyperplane method, which conducts feature selection and rule extraction from the gene expression data of cancer tissue [24]; it is also applied in other applications [12–14, 25]. The technique separates observations of different classes by multiple hyperplanes. Because decision variables are required to express the relationship between each training datum and each hyperplane, and the separating-hyperplane problem is formulated as a nonlinear program, the training speed becomes slow for a large number of training data. Additionally, similar hypersphere support vector methods have been developed by Lin et al. [26], Wang et al. [27], Gu and Wu [28], and Hifi and M'Hallah [29] for classifying objects. These classification algorithms partition the sample space using the sphere-structured support vector machine [14, 30]. However, they must formulate the classification problem as a nonlinear nonconvex program, which makes reaching an optimal solution difficult. Taking Figure 1 as an example, a hyperplane-based method requires four hyperplanes to discriminate the objects, as shown in Figure 2.

Figure 1: Classifying the objects of two classes. (a) Nonlinear dataset. (b) Classification by the decision tree method.

Figure 2: Classification by the hyperplane method.

As previously mentioned, many biological and medical datasets have complicated boundaries between attributes and classes. Both decision tree-based methods and hyperplane-based methods find only rules with high accuracy, which either cover only a narrow part of the objects or require numerous attributes to explain a classification rule. Although these methods are computationally effective for deducing classification rules, they have the following two limitations.

(i) Decision tree-based methods are heuristic approaches that can only induce feasible rules. Moreover, they split the data into hyperrectangular regions using a single variable at a time, which may generate a large number of branches (i.e., low rates of compactness).

(ii) Hyperplane-based methods use numerous hyperplanes to separate objects of different classes and divide the objects in a dataset into indistinct groups. They may generate a large number of hyperplanes and associated rules, with low rates of compactness.

Figure 3: Classification by the hypersphere method.

Therefore, this study proposes a novel hypersphere method to induce classification rules based on a piecewise linearization technique. The technique reformulates the original hypersphere model with numbers of binary variables and constraints proportional to the number of piecewise line segments. As the number of break points used in the linearization process increases, the error in the linear approximation decreases, and an approximately global optimal solution of the hypersphere model can be obtained. That is, the proposed method is an optimization approach that can find optimal rules with high rates of accuracy, support, and compactness. The concept of the hypersphere method is depicted in Figure 3, in which only one circle is required to classify the objects: all objects of class "•" are covered by the circle, and those not covered by it belong to the other class.

3. The Proposed Models and Algorithm

As the classification rules directly affect the rates of accuracy, support, and compactness, we formulate two models to determine the highest accuracy rate and the highest support rate, respectively. To facilitate the discussion, the related notations are introduced first:

$a_{i,j}$: the $j$th attribute value of the $i$th object;

$h_{t,k,j}$: the $j$th center value of the $k$th hypersphere for class $t$;

$r_{t,k}$: the radius of the $k$th hypersphere for class $t$;

$n_t$: the number of objects in class $t$;

$c_i$: the class to which the $i$th object belongs, $c_i \in \{1, 2, \ldots, g\}$;

$m$: the number of attributes;

$R_t$: a rule describing class $t$.

Figure 4: The concept of the hypersphere method. (a) Two dimensions: a circle with center $(h_{t,k,1}, h_{t,k,2})$ and radius $r_{t,k}$. (b) Three dimensions: a sphere with center $(h_{t,k,1}, h_{t,k,2}, h_{t,k,3})$ and radius $r_{t,k}$.

3.1. Two Types of Classification Models

Considering an object $x_i$ and a hypersphere $S_{t,k}$, and normalizing $a_{i,j}$ (i.e., to express its scale easily), we have the following three notations.

Notation 1. Normalization rescales each $a_{i,j}$ as $a'_{i,j}$ by the formula

$$a'_{i,j} = \frac{a_{i,j} - \underline{a}_j}{\overline{a}_j - \underline{a}_j}, \tag{3.1}$$

where $0 \le a'_{i,j} \le 1$, $\overline{a}_j$ is the largest value of attribute $j$, and $\underline{a}_j$ is the smallest value of attribute $j$.
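To make (3.1) concrete, the following sketch (ours, not part of the paper) applies the min-max rescaling columnwise with NumPy; it assumes every attribute has distinct minimum and maximum values, since a constant column would make the denominator zero.

```python
import numpy as np

def normalize(a):
    """Rescale each attribute (column) j by (a_ij - min_j) / (max_j - min_j),
    as in (3.1), so every normalized value lies in [0, 1]."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / (hi - lo)  # assumes hi > lo for every attribute

# The attribute bounds of Table 1 are (6, 33.5) and (3.5, 30), so object
# x1 = (6, 8) maps to (0, (8 - 3.5) / 26.5) = (0, 0.1698...).
print(normalize([[6, 8], [33.5, 7.5], [24.5, 3.5], [6, 30]])[0])
```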

Notation 2. A general form for expressing an object $x_i$ is

$$x_i = \left(a'_{i,1}, a'_{i,2}, \ldots, a'_{i,m}; c_i\right), \tag{3.2}$$

where $c_i$ is the class index of object $x_i$.

Notation 3. A general form for expressing a hypersphere $S_{t,k}$ is

$$S_{t,k} = \left(h_{t,k,1}, h_{t,k,2}, \ldots, h_{t,k,m}; r_{t,k}\right), \tag{3.3}$$

where $S_{t,k}$ is the $k$th hypersphere for class $t$.

We use two and three dimensions (i.e., two and three attributes) to depict a circle and a sphere clearly (Figure 4). Figure 4(a) denotes the centroid of the circle as $(h_{t,k,1}, h_{t,k,2})$ and its radius as $r_{t,k}$; Figure 4(b) extends this to three dimensions, called a sphere. In $m$ dimensions (i.e., $m$ attributes, $m > 3$), these are called hyperspheres.

To find each center and radius of the hyperspheres, the following two nonlinear models are considered. The first model looks for a support rate as high as possible while the accuracy rate is fixed to 1, as shown in Model 1.

Model 1. One has the following:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{i \in I^+} u_{t,i,k} \\
\text{subject to} \quad & \sum_{j=1}^{m} \left(a'_{i,j} - h_{t,k,j}\right)^2 \le \left(1 - u_{t,i,k}\right) M + r_{t,k}^2, \quad \forall i \in I^+, \\
& \sum_{j=1}^{m} \left(a'_{i,j} - h_{t,k,j}\right)^2 > r_{t,k}^2, \quad \forall i \in I^-, \\
& u_{t,i,k} \in \{0, 1\} \ \forall i \in I^+, \quad r_{t,k} \ge 0, \quad M \text{ a big enough constant,}
\end{aligned} \tag{3.4}$$

where $I^+$ and $I^-$ are the two sets of objects expressed, respectively, by

$$I^+ = \{i \mid i = 1, 2, \ldots, n, \text{ where object } i \in \text{class } t\}, \tag{3.5}$$

$$I^- = \{i \mid i = 1, 2, \ldots, n, \text{ where object } i \notin \text{class } t\}. \tag{3.6}$$

Referring to Li and Chen [3], the rates of accuracy and support of $R_t$ in Model 1 can be specified by the following definitions.

Definition 3.1. The accuracy rate of a rule $R_t$ for Model 1 is $AR(R_t) = 1$.

Definition 3.2. The support rate of a rule $R_t$ for Model 1 is specified as follows.

(i) If $\sum_{k \in K} u_{t,i,k} \ge 1$ for an object $i$ belonging to class $t$, then $U_{t,i} = 1$; otherwise $U_{t,i} = 0$, where $K$ indicates the hypersphere set for class $t$.

(ii)

$$SR(R_t) = \frac{\sum_{i \in \text{class } t} U_{t,i}}{n_t}, \tag{3.7}$$

where $n_t$ indicates the number of objects belonging to class $t$.
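To make the role of the binary variables $u_{t,i,k}$ concrete, the sketch below (our illustration, not the paper's code) evaluates a fixed candidate hypersphere against Model 1: the sphere is infeasible if it covers any object of $I^-$, and its objective value is the number of covered objects of $I^+$. An actual solution of Model 1 additionally optimizes over the center and radius with a mixed-integer solver.

```python
def model1_value(center, r_sq, positives, negatives):
    """Objective value of Model 1 for a fixed hypersphere (center, r^2).

    positives / negatives: normalized attribute vectors of I+ / I-.
    Returns None if some object of I- lies inside the sphere (the strict
    constraint in (3.4) is violated); otherwise returns the number of
    covered I+ objects, i.e., sum_i u_{t,i,k}.
    """
    def dist_sq(x):
        return sum((xj - hj) ** 2 for xj, hj in zip(x, center))

    if any(dist_sq(x) <= r_sq for x in negatives):
        return None                                    # infeasible sphere
    return sum(dist_sq(x) <= r_sq for x in positives)  # u_{t,i,k} = 1 iff covered
```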

The second model looks for an accuracy rate as high as possible while the support rate is fixed to 1, as shown in Model 2.

Model 2. One has the following:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{i \in I^-} v_{t,i,k} \\
\text{subject to} \quad & \sum_{j=1}^{m} \left(a'_{i,j} - h_{t,k,j}\right)^2 \le r_{t,k}^2, \quad \forall i \in I^+, \\
& \sum_{j=1}^{m} \left(a'_{i,j} - h_{t,k,j}\right)^2 > \left(v_{t,i,k} - 1\right) M + r_{t,k}^2, \quad \forall i \in I^-, \\
& v_{t,i,k} \in \{0, 1\}, \quad \forall i \in I^-, \quad r_{t,k} \ge 0,
\end{aligned} \tag{3.8}$$

where $I^+$ and $I^-$ are the two sets expressed by (3.5) and (3.6), respectively.

Similarly, the rates of accuracy and support of $R_t$ in Model 2 can be considered as follows.

Definition 3.3. The accuracy rate of a rule $R_t$ of Model 2 is denoted as $AR(R_t)$ and is specified as follows.

(i) If $\sum_{k \in K} v_{t,i,k} = 0$ for an object $i$ not belonging to class $t$, then $V_{t,i} = 1$; otherwise, $V_{t,i} = 0$, where $K$ represents the hypersphere set for class $t$.

(ii)

$$AR(R_t) = \frac{\|R_t\| - \sum_{i \notin \text{class } t} V_{t,i}}{\|R_t\|}, \tag{3.9}$$

where $\|R_t\|$ represents the total number of objects covered by $R_t$.

Definition 3.4. The support rate of a rule $R_t$ of Model 2 is denoted as $SR(R_t)$, and $SR(R_t) = 1$.

Definition 3.5. The compactness rate of a set of rules $R_1, \ldots, R_g$, denoted as $CR(R_1, \ldots, R_g)$, is expressed as follows:

$$CR(R_1, \ldots, R_g) = \frac{g}{\sum_{t=1}^{g} US_t}, \tag{3.10}$$

where $US_t$ means the number of hyperspheres and unions of hyperspheres for class $t$. A union of hyperspheres indicates that an object is covered by different hyperspheres, as shown in Figure 5. Take Figure 5 as an example, in which there are two classes. The objects of the first class are covered by two unions of circles (i.e., $S_{1,1} \cup S_{1,2} \cup S_{1,3}$ and $S_{1,4} \cup S_{1,5}$), and the objects of class "•" are covered by one circle (i.e., $S_{2,1}$). Therefore, $US_1 = 2$, $US_2 = 1$, and $CR(R_1, R_2) = 2/3$.

Moreover, Models 1 and 2 are separable nonlinear programs that can be solved to an optimal solution by linearizing the quadratic terms $h_{t,k,j}^2$: expanding $(a'_{i,j} - h_{t,k,j})^2 = a'^2_{i,j} - 2 a'_{i,j} h_{t,k,j} + h_{t,k,j}^2$ shows that only the $h_{t,k,j}^2$ terms are nonlinear in the decision variables. The piecewise linearization technique is discussed as follows.

Figure 5: Classification by the hypersphere method (hyperspheres $S_{1,1}$, $S_{1,2}$, $S_{1,3}$, $S_{1,4}$, $S_{1,5}$, and $S_{2,1}$).

Proposition 3.6 (referring to Beale and Forrest [31]). Denote by $L(f(x))$ a piecewise linear approximation (i.e., a linear convex combination) of $f(x)$, where $b_l$, $l = 1, 2, \ldots, q$, represents the break points of $L(f(x))$. $L(f(x))$ is expressed as follows:

$$f(x) \cong L(f(x)) = \sum_{l=1}^{q} f(b_l) w_l, \tag{3.11}$$

$$x = \sum_{l=1}^{q} w_l b_l, \tag{3.12}$$

$$\sum_{l=1}^{q} w_l = 1, \tag{3.13}$$

where $w_l \ge 0$ and (3.13) is a special-ordered set of type 2 (SOS2) constraint (see Beale and Forrest [31]).

Note that an SOS2 constraint is a set of variables in which at most two variables may be nonzero; if two variables are nonzero, they must be adjacent in the set.
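As a numeric illustration of (3.11)–(3.13) (ours, not from the paper), the sketch below picks, for a given $x$, the two adjacent break points and weights that an SOS2-constrained solver would select, and returns the resulting approximation of $f$. The equal spacing of break points is our assumption; the paper only fixes their number (eight segments, per Section 4.4).

```python
import numpy as np

def piecewise_approx(f, breakpoints, x):
    """Convex-combination approximation L(f(x)) of Proposition 3.6: at most
    two adjacent weights w_l are nonzero, x = sum_l w_l * b_l,
    sum_l w_l = 1, and L(f(x)) = sum_l f(b_l) * w_l."""
    b = np.asarray(breakpoints, dtype=float)
    l = np.searchsorted(b, x, side="right") - 1    # left break point of x's segment
    l = min(max(l, 0), len(b) - 2)                 # clamp x into [b_1, b_q]
    w_right = (x - b[l]) / (b[l + 1] - b[l])       # weight on the right break point
    w = np.zeros_like(b)
    w[l], w[l + 1] = 1.0 - w_right, w_right        # the two adjacent nonzero weights
    return float(np.dot(f(b), w))

bps = np.linspace(0.0, 1.0, 9)                     # eight equal segments on [0, 1]
print(piecewise_approx(lambda h: h ** 2, bps, 0.3))  # 0.09375, vs. exact 0.09
```

With more break points the gap to the exact value shrinks, which is why the approximation error decreases as the number of line segments grows.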

Notation 4. According to Proposition 3.6, let $f(x) = h_{t,k,j}^2$. Then $f(x)$ is linearized by Proposition 3.6 and is expressed as $L(h_{t,k,j}^2)$.

3.2. Solution Algorithm

An algorithm is also proposed to seek the highest accuracy rate or the highest support rate, as described below.

Algorithm 3.7.

Step 1. Normalize all attributes (i.e., rescale $a'_{i,j} = (a_{i,j} - \underline{a}_j)/(\overline{a}_j - \underline{a}_j)$ so that $0 \le a'_{i,j} \le 1$).

Step 2. Let $t = 1$ and $k = 1$.

Figure 6: Flowchart of the proposed algorithm.

Step 3. Solve Model 1 (or Model 2) to obtain the $k$th hypersphere of class $t$. Remove the objects covered by $S_{t,k}$ from the dataset temporarily.

Step 4. Let $k = k + 1$, and re-solve Model 1 (or Model 2) until all objects in class $t$ are assigned to the hyperspheres of the same class.

Step 5. Let $k = 1$ and $t = t + 1$, and reiterate Step 3 until all classes are processed.

Step 6. Check the independent hyperspheres and unions of hyperspheres $S_{t,k}$ in the same class.

Table 1: Dataset of Example 1.

Object | $a_{i,1}$ | $a_{i,2}$ | $c_i$
x1  | 6    | 8    | 1
x2  | 12   | 20   | 1
x3  | 13   | 8    | 1
x4  | 18   | 12.5 | 1
x5  | 24   | 19   | 1
x6  | 24   | 14.5 | 1
x7  | 17.5 | 17.5 | 2
x8  | 22   | 17   | 2
x9  | 22   | 15   | 2
x10 | 30   | 11   | 2
x11 | 33.5 | 7.5  | 2
x12 | 24.5 | 3.5  | 3
x13 | 26.5 | 8    | 3
x14 | 23.5 | 7.5  | 3
x15 | 6    | 30   | 3

Figure 7: Visualization of Example 1. (a) Normalized data for Example 1. (b) Classification by the hypersphere method (hyperspheres $S_{1,1}$, $S_{1,2}$, $S_{2,1}$, $S_{2,2}$, and $S_{3,1}$).

Step 7. Calculate and record the number of independent hyperspheres and unions of hyperspheres in $US_t$, and iterate $t$ until all classes are done.

According to this algorithm, we can obtain the optimal rules to classify objects most efficiently. The process of the algorithm is depicted in Figure 6.
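The driver loop of Algorithm 3.7 (Steps 2–5) can be sketched as follows. This is our illustration: `solve_model` is a hypothetical placeholder for an exact solve of Model 1 (or Model 2), and it is assumed to return a hypersphere together with a nonempty set of covered class-$t$ indices, so the loop terminates.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Sphere = Tuple[Tuple[float, ...], float]  # (center h_{t,k,.}, squared radius)

def induce_rules(
    data: Sequence[Sequence[float]],
    labels: Sequence[int],
    solve_model: Callable[[Sequence[Sequence[float]], List[int], List[int]],
                          Tuple[Sphere, Set[int]]],
) -> Dict[int, List[Sphere]]:
    """Steps 2-5: for each class t, repeatedly solve the model, record the
    k-th hypersphere S_{t,k}, and temporarily remove the class-t objects it
    covers until every object of class t is assigned to a hypersphere."""
    rules: Dict[int, List[Sphere]] = {}
    for t in sorted(set(labels)):                       # Step 5: next class t
        positives = [i for i, c in enumerate(labels) if c == t]   # I+
        negatives = [i for i, c in enumerate(labels) if c != t]   # I-
        spheres: List[Sphere] = []
        while positives:                                # Step 4: k = k + 1
            sphere, covered = solve_model(data, positives, negatives)
            spheres.append(sphere)                      # Step 3: record S_{t,k}
            positives = [i for i in positives if i not in covered]
        rules[t] = spheres
    return rules
```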

3.3. Operation of a Simple Example

Consider the dataset $T$ in Table 1 as an example, which has objects $i$ with two attributes $(a_{i,1}, a_{i,2})$ and a class index $c_i$ for $i = 1, 2, \ldots, 15$. The dataset $T$ is expressed as $T = \{x_i \mid (a_{i,1}, a_{i,2}; c_i), \ i = 1, 2, \ldots, 15\}$, where $c_i \in \{1, 2, 3\}$. As there are only two attributes, these 15 objects can be plotted in a two-dimensional space after normalization, as shown in Figure 7(a).

This example can be solved by the proposed algorithm as follows.

Step 1. Normalize all attributes (i.e., $a'_{i,1} = (a_{i,1} - 6)/(33.5 - 6)$ and $a'_{i,2} = (a_{i,2} - 3.5)/(30 - 3.5)$).

Table 2: Centroid points for the Iris dataset by the proposed method (columns: $h_{t,k,1}, \ldots, h_{t,k,4}$; last column $r_{t,k}$).

R1 ($S_{1,1}$):
$S_{1,1}$: 0.366, 0.807, 0.000, 0.000; $r_{1,1}$ = 0.6205

R2 ($S_{2,1} \cup S_{2,2} \cup S_{2,3}$):
$S_{2,1}$: 0.557, 0.320, 0.205, 0.460; $r_{2,1}$ = 0.2540
$S_{2,2}$: 0.575, 0.581, 0.626, 0.515; $r_{2,2}$ = 0.0612
$S_{2,3}$: 0.423, 0.261, 0.352, 0.490; $r_{2,3}$ = 0.1388

R3 ($S_{3,1} \cup S_{3,2}$):
$S_{3,1}$: 0.248, 0.000, 2.226, 2.151; $r_{3,1}$ = 4.8087
$S_{3,2}$: 0.329, 0.187, 0.650, 0.613; $r_{3,2}$ = 0.0330

Table 3: Comparing results for the Iris flower dataset ($R_1$, $R_2$, $R_3$).

Items | Proposed method | Decision tree | Hyperplane support vector
$AR(R_1, R_2, R_3)$ | 1, 1, 1 | 1, 0.98, 0.98 | 1, 0.98, 0.96
$SR(R_1, R_2, R_3)$ | 1, 0.98, 0.98 | 1, 0.98, 0.98 | 1, 0.96, 0.98
CR | 1 | 0.5 | 0.1875

Step 3. The classification model (i.e., Model 1) is linearly reformulated as follows:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{i \in I^+} u_{t,i,k} \\
\text{subject to} \quad & \sum_{j=1}^{m} \left(a'^2_{i,j} - 2 a'_{i,j} h_{t,k,j} + L\left(h_{t,k,j}^2\right)\right) \le \left(1 - u_{t,i,k}\right) M + r_{t,k}^2, \quad \forall i \in I^+, \\
& \sum_{j=1}^{m} \left(a'^2_{i,j} - 2 a'_{i,j} h_{t,k,j} + L\left(h_{t,k,j}^2\right)\right) > r_{t,k}^2, \quad \forall i \in I^-, \\
& u_{t,i,k} \in \{0, 1\}, \quad r_{t,k}^2 \ge 0,
\end{aligned} \tag{3.14}$$

where $I^+ = \{x_1, x_2, \ldots, x_6\}$ and $I^- = \{x_7, x_8, \ldots, x_{15}\}$. The optimal solution is $(h_{t,k,1}, h_{t,k,2}, r_{t,k}^2) = (0.047, 0.265, 0.15749)$ for $S_{1,1}$, which covers objects 1–4. We then temporarily remove the objects covered by $S_{1,1}$.
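As a quick check (ours) that the reported solution behaves as stated, the snippet below normalizes Table 1 and tests which objects fall inside $S_{1,1}$, reading the reported 0.15749 as the squared radius $r_{t,k}^2$ (read as a plain radius, it would not cover objects 2–4).

```python
# Table 1 data: object index -> (a_i1, a_i2).
data = {1: (6, 8), 2: (12, 20), 3: (13, 8), 4: (18, 12.5), 5: (24, 19),
        6: (24, 14.5), 7: (17.5, 17.5), 8: (22, 17), 9: (22, 15),
        10: (30, 11), 11: (33.5, 7.5), 12: (24.5, 3.5), 13: (26.5, 8),
        14: (23.5, 7.5), 15: (6, 30)}
# Step 1 normalization with the attribute bounds of Table 1.
norm = {i: ((a1 - 6) / (33.5 - 6), (a2 - 3.5) / (30 - 3.5))
        for i, (a1, a2) in data.items()}
h, r_sq = (0.047, 0.265), 0.15749          # reported solution for S_{1,1}
covered = [i for i, (a1, a2) in norm.items()
           if (a1 - h[0]) ** 2 + (a2 - h[1]) ** 2 <= r_sq]
print(covered)  # -> [1, 2, 3, 4], matching "S_{1,1} covers objects 1-4"
```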

Step 4. Let $k = k + 1$: the optimal solution is $(h_{t,k,1}, h_{t,k,2}, r_{t,k}^2) = (0.736, 0.5, 0.0138)$ for $S_{1,2}$, which covers objects 5 and 6. Class 1 is then done.

Step 5. As $t = t + 1$ and $k = 1$, and Steps 3 and 4 are iterated, we respectively obtain the optimal solutions for $S_{t,k}$ as follows. The results are shown in Figure 7(b).

(i) $(h_{2,1,1}, h_{2,1,2}, r_{2,1}^2) = (0.514, 0.469, 0.0127)$, where $S_{2,1}$ covers objects 7–9.

(ii) $(h_{2,2,1}, h_{2,2,2}, r_{2,2}^2) = (0.929, 0.210, 0.0251)$, where $S_{2,2}$ covers objects 10 and 11.

(iii) $(h_{3,1,1}, h_{3,1,2}, r_{3,1}^2) = (0.583, 0.188, 0.0436)$, where $S_{3,1}$ covers objects 12–14.

Step 6. Check and calculate the unions of hyperspheres $S_{t,k}$ for all $k$ in class $t$ (starting from $t = 1$).


4. Numerical Examples

This section evaluates the performance of the proposed method, including the accuracy, support, and compactness rates, and compares the proposed model with different methods, using CPLEX [32]. All tests were run on a PC equipped with an Intel Pentium D 2.8 GHz CPU and 2 GB RAM. Three datasets were tested in our experiments:

(i) the Iris flower dataset introduced by Sir Ronald Aylmer Fisher (1936);

(ii) the European barn swallow (Hirundo rustica) dataset, obtained by trapping individual swallows in Stirlingshire, Scotland, between May and July 1997 [1, 3];

(iii) the highly selective vagotomy (HSV) patient dataset of F. Raszeja Memorial Hospital in Poland [3, 33, 34].

4.1. Iris Flower Dataset

The Iris flower dataset contains 150 objects. Each object is described by four attributes (i.e., sepal length, sepal width, petal length, and petal width) and is classified by one of three classes (i.e., setosa, versicolor, and virginica). By solving the proposed method, we induced six hyperspheres (i.e., $S_{1,1}$ in class 1; $S_{2,1}$, $S_{2,2}$, $S_{2,3}$ in class 2; and $S_{3,1}$, $S_{3,2}$ in class 3). The induced classification rules are reported in Table 2, which lists one hypersphere and two unions of hyperspheres (i.e., $S_{1,1}$, $S_{2,1} \cup S_{2,2} \cup S_{2,3}$, and $S_{3,1} \cup S_{3,2}$) with centroid points and radii.

Rule $R_1$ in Table 2 contains a hypersphere $S_{1,1}$, which implies that

(i) "if $(a'_{i,1} - 0.366)^2 + (a'_{i,2} - 0.807)^2 + (a'_{i,3} - 0)^2 + (a'_{i,4} - 0)^2 \le 0.6205$, then object $x_i$ belongs to class 1."

Rule $R_2$ in Table 2 contains a union of three hyperspheres (i.e., $S_{2,1} \cup S_{2,2} \cup S_{2,3}$), which implies that

(i) "if $(a'_{i,1} - 0.557)^2 + (a'_{i,2} - 0.32)^2 + (a'_{i,3} - 0.205)^2 + (a'_{i,4} - 0.46)^2 \le 0.254$, then object $x_i$ belongs to class 2," or

(ii) "if $(a'_{i,1} - 0.575)^2 + (a'_{i,2} - 0.581)^2 + (a'_{i,3} - 0.626)^2 + (a'_{i,4} - 0.515)^2 \le 0.0612$, then object $x_i$ belongs to class 2," or

(iii) "if $(a'_{i,1} - 0.423)^2 + (a'_{i,2} - 0.261)^2 + (a'_{i,3} - 0.352)^2 + (a'_{i,4} - 0.49)^2 \le 0.1388$, then object $x_i$ belongs to class 2."

Rule $R_3$ in Table 2 contains a union of two hyperspheres (i.e., $S_{3,1} \cup S_{3,2}$), which implies that

(i) "if $(a'_{i,1} - 0.248)^2 + (a'_{i,2} - 0)^2 + (a'_{i,3} - 2.226)^2 + (a'_{i,4} - 2.151)^2 \le 4.8087$, then object $x_i$ belongs to class 3," or

(ii) "if $(a'_{i,1} - 0.329)^2 + (a'_{i,2} - 0.187)^2 + (a'_{i,3} - 0.65)^2 + (a'_{i,4} - 0.613)^2 \le 0.033$, then object $x_i$ belongs to class 3."
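The rules of Table 2 translate directly into a covering classifier. The sketch below (ours) hard-codes the tabulated centers and radius values and, following the rule text above, compares the squared distance of a normalized object with the tabulated $r_{t,k}$ value; an object is assigned to the first class whose union of hyperspheres covers it.

```python
# Centers and radius values copied from Table 2 (Iris dataset).
RULES = {
    1: [((0.366, 0.807, 0.000, 0.000), 0.6205)],
    2: [((0.557, 0.320, 0.205, 0.460), 0.2540),
        ((0.575, 0.581, 0.626, 0.515), 0.0612),
        ((0.423, 0.261, 0.352, 0.490), 0.1388)],
    3: [((0.248, 0.000, 2.226, 2.151), 4.8087),
        ((0.329, 0.187, 0.650, 0.613), 0.0330)],
}

def classify(x):
    """Return the class whose union of hyperspheres covers the normalized
    object x, or None if no rule covers it."""
    for t, spheres in RULES.items():
        for center, bound in spheres:
            if sum((xj - hj) ** 2 for xj, hj in zip(x, center)) <= bound:
                return t
    return None

print(classify((0.2, 0.6, 0.1, 0.05)))  # a point near the R1 centroid -> 1
```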

Table 3 lists the experimental results of comparing the proposed method with both the decision tree [3] and hyperplane [35] methods in deducing the classification rules for the Iris flower dataset.

Table 4: Centroid points for the Swallow dataset by the proposed method (columns: $h_{t,k,1}, \ldots, h_{t,k,8}$; last column $r_{t,k}$).

R1 ($S_{1,1} \cup S_{1,2} \cup S_{1,3} \cup S_{1,4}$):
$S_{1,1}$: 0.607, 0.110, 0.806, 0.000, 0.000, 0.406, 1.077, 1.163; $r_{1,1}$ = 1.780
$S_{1,2}$: 0.483, 0.000, 0.236, 0.328, 0.000, 0.793, 0.280, 0.931; $r_{1,2}$ = 0.948
$S_{1,3}$: 0.588, 0.000, 1.179, 0.000, 0.000, 0.653, 1.037, 0.982; $r_{1,3}$ = 1.879
$S_{1,4}$: 0.414, 0.000, 0.658, 0.000, 0.227, 0.085, 0.563, 1.224; $r_{1,4}$ = 1.197

R2 ($S_{2,1} \cup S_{2,2} \cup S_{2,3} \cup S_{2,4} \cup S_{2,5}$):
$S_{2,1}$: 0.358, 0.667, 0.371, 1.403, 0.360, 1.530, 0.634, 0.114; $r_{2,1}$ = 2.825
$S_{2,2}$: 0.198, 1.102, 0.697, 0.000, 0.000, 1.734, 0.110, 0.595; $r_{2,2}$ = 2.632
$S_{2,3}$: 0.532, 0.422, 0.323, 1.676, 1.009, 0.000, 0.180, 0.020; $r_{2,3}$ = 2.458
$S_{2,4}$: 0.528, 0.000, 0.430, 0.876, 0.579, 0.461, 0.435, 0.040; $r_{2,4}$ = 0.694
$S_{2,5}$: 0.659, 1.408, 0.516, 0.000, 0.382, 0.000, 0.173, 0.064; $r_{2,5}$ = 1.570

Table 5: Comparing results for the Swallow dataset ($R_1$, $R_2$).

Items | Proposed method | Decision tree | Hyperplane support vector
$AR(R_1, R_2)$ | 1, 1 | 0.97, 1 | 0.97, 1
$SR(R_1, R_2)$ | 1, 0.97 | 0.97, 1 | 0.97, 1
CR | 1 | 0.3 | 0.1

The accuracy rates of $(R_1, R_2, R_3)$ in the proposed method are $(1, 1, 1)$, as Model 1 has been solved. This finding indicates that no objects of class 2 or class 3 are covered by $S_{1,1}$, no objects of classes 1 or 3 are covered by $S_{2,1} \cup S_{2,2} \cup S_{2,3}$, and no objects of classes 1 or 2 are covered by $S_{3,1} \cup S_{3,2}$. The support rates of $(R_1, R_2, R_3)$ in the proposed method are $(1, 0.98, 0.98)$, indicating that all objects in class 1 are covered by $S_{1,1}$, 98% of the objects in class 2 are covered by $S_{2,1}$, $S_{2,2}$, and $S_{2,3}$, and 98% of the objects in class 3 are covered by $S_{3,1}$ and $S_{3,2}$. The compactness rate of rules $R_1$, $R_2$, and $R_3$ is computed as $CR(R_1, R_2, R_3) = 3/3 = 1$. Finally, we determine the following.

(i) Although all three methods perform very well in the rates of accuracy and support, the proposed method has the best performance for the accuracy of classes 2 and 3 (i.e., $R_2$ and $R_3$).

(ii) The proposed method has the best compactness rate.

4.2. Swallow Dataset

The European barn swallow (Hirundo rustica) dataset was obtained by trapping individual swallows in Stirlingshire, Scotland, between May and July 1997. This dataset contains 69 swallows. Each object is described by eight attributes and belongs to one of two classes (i.e., the birds are classified by gender).

Here, we also used Model 1 to induce the classification rules. Table 4 lists the optimal solutions (i.e., centroids and radii) for both rules $R_1$ and $R_2$.

The result of the decision tree method, referred to in Li and Chen [3], is listed in Table 5, where $AR(R_1, R_2) = (0.97, 1)$, $SR(R_1, R_2) = (0.97, 1)$, and $CR = 0.3$.

The result of the hyperplane method, referred to in Chang and Lin [35], is also listed in Table 5, where $AR(R_1, R_2) = (0.97, 1)$, $SR(R_1, R_2) = (0.97, 1)$, and $CR = 0.1$.

Comparing the three methods in Table 5 shows that the proposed method induces rules with better or equivalent values of AR and SR. In fact, the proposed method also has the best compactness rate.

4.3. HSV Dataset

The HSV dataset contains 122 patients classified into four classes, with each patient having 11 preoperative attributes. To maximize the support rate with the proposed method (i.e., Model 1), seven independent hyperspheres and three unions of hyperspheres were generated. The centroids and radii of the hyperspheres are reported in Table 6, and a comparison with other methods is reported in Table 7.

Using the decision tree method on the HSV dataset generates 24 rules, and the hyperplane method deduces 45 hyperplanes. Table 7 also shows that the proposed method finds rules with the highest rates (i.e., AR, SR, and CR) compared with the other two methods.

4.4. Limitation of the Proposed Method

The hypersphere models are solved by CPLEX [32], one of the most powerful mixed-integer programming packages, running on a PC. Based on the optimization technique, the results of the numerical examples illustrate that the proposed method is more useful than the current methods, including the decision tree method and the hyperplane support vector method. As the solving time of the linearized hypersphere model depends mainly on the numbers of binary variables and constraints, solving the reformulated hypersphere model by the proposed algorithm takes about one minute for each dataset (i.e., in Sections 4.1 and 4.3), in which eight piecewise line segments are used to linearize the nonlinear nonconvex terms (i.e., $L(h_{t,k,j}^2)$) of Model 1.

The computing time for solving a linearized hypersphere program grows rapidly as the numbers of binary variables and constraints increase. Consequently, the proposed method is slower than the decision tree method and the hyperplane method, especially for large datasets or a great number of piecewise line segments. In further study, utilizing mainframe-version optimization software [36–38], integrating metaheuristic algorithms, or using distributed computing techniques could enhance the solving speed to overcome this problem.

5. Conclusions and Future Work

This study proposes a novel method for deducing classification rules, which can find the optimal solution based on a hypersphere domain. Results of the numerical examples illustrate that the proposed method is more useful than the current methods, including the decision tree method and the hyperplane method. The proposed method is guaranteed to find an optimal rule, but its computational complexity grows rapidly with increasing problem size. More investigation and research are required to further enhance

Table 6: Centroid points for the HSV data by the proposed method (columns: $h_{t,k,1}, \ldots, h_{t,k,11}$; last column $r_{t,k}$).

R1 ($S_{1,1} \cup S_{1,2} \cup \cdots \cup S_{1,8}$):
$S_{1,1}$: 0.504, 0.528, 0.366, 0.288, 0.850, 0.848, 0.351, -0.282, 0.405, 0.234, 0.313; $r_{1,1}$ = 1.576
$S_{1,2}$: 0.469, 0.395, 0.312, 0.133, -0.274, -0.309, 0.134, 0.590, 0.813, 1.000, -0.847; $r_{1,2}$ = 3.044
$S_{1,3}$: 0.287, 0.867, 0.458, 0.189, 0.977, 0.267, 0.017, -0.846, 0.462, -1.000, 1.000; $r_{1,3}$ = 4.249
$S_{1,4}$: 0.586, -0.400, 0.605, 0.467, -0.167, -0.631, 0.268, 1.000, 0.526, 1.000, 0.120; $r_{1,4}$ = 3.365
$S_{1,5}$: 0.296, -0.511, -1.000, 0.536, -0.263, 0.552, 0.954, 0.635, -0.117, -1.000, 0.600; $r_{1,5}$ = 4.608
$S_{1,6}$: 0.775, 0.678, -0.092, 0.340, 0.366, 0.455, 0.451, 1.000, 0.422, 0.295, 0.470; $r_{1,6}$ = 1.560
$S_{1,7}$: 0.525, 0.296, 0.194, 0.124, -0.945, -0.467, 0.127, 1.000, 0.097, -0.897, 0.654; $r_{1,7}$ = 4.113
$S_{1,8}$: 0.000, 0.140, 0.109, 0.250, 0.193, 0.168, 0.063, 0.142, 0.181, 0.119, 0.082; $r_{1,8}$ = 0.089

R2 (independent hyperspheres $S_{2,1}$, $S_{2,2}$, $S_{2,3}$, $S_{2,4}$):
$S_{2,1}$: 0.513, 0.056, 0.143, 0.078, 0.533, 0.439, 0.180, 0.167, -0.284, -0.464, 0.749; $r_{2,1}$ = 1.450
$S_{2,2}$: 0.533, 0.381, 0.139, 0.568, -0.373, -0.005, 0.080, 0.831, 0.293, 0.215, 0.369; $r_{2,2}$ = 1.293
$S_{2,3}$: 0.000, -0.450, 0.709, 0.553, 0.896, 1.000, 0.157, -0.447, 0.068, -0.580, 1.000; $r_{2,3}$ = 3.579
$S_{2,4}$: 0.000, 0.862, 0.543, -0.066, 0.389, 0.451, 0.106, 0.394, -0.004, 0.042, -0.014; $r_{2,4}$ = 0.409

R3 ($S_{3,1} \cup S_{3,2}$ and the independent $S_{3,3}$):
$S_{3,1}$: 0.624, 0.507, -1.000, 0.147, 0.840, 1.000, 0.827, -0.585, 0.199, 0.831, -0.380; $r_{3,1}$ = 4.277
$S_{3,2}$: 0.534, 0.483, -0.088, 0.365, -0.364, 0.270, 0.437, 0.785, 0.786, 0.682, -1.000; $r_{3,2}$ = 3.003
$S_{3,3}$: 0.000, 0.210, 0.630, 0.750, -1.000, 0.244, 0.388, 1.000, 0.474, -0.604, 0.475; $r_{3,3}$ = 3.162

R4 ($S_{4,1} \cup S_{4,2} \cup S_{4,3}$ and the independent $S_{4,4}$, $S_{4,5}$):
$S_{4,1}$: 0.551, 0.374, 0.637, 0.256, 0.865, 0.944, 0.315, -0.831, -0.485, -0.676, 0.979; $r_{4,1}$ = 4.330
$S_{4,2}$: 0.717, 0.254, 0.548, -0.718, 0.118, 0.730, -0.547, 0.498, -0.580, -0.464, 0.821; $r_{4,2}$ = 3.287
$S_{4,3}$: 0.527, 1.000, 0.533, 0.089, -0.152, -1.000, 0.366, 0.046, 0.547, 0.947, -0.436; $r_{4,3}$ = 2.943
$S_{4,4}$: 0.489, 0.625, -0.209, 0.522, 0.644, 0.582, 0.615, 1.000, 0.306, -0.401, 0.931; $r_{4,4}$ = 2.059
$S_{4,5}$: -0.011, 0.491, -0.409, 0.155, -0.766, -0.227, -0.107, 1.000, 0.303, -0.680, -0.185; $r_{4,5}$ = 2.843

Table 7: Comparing results for the HSV dataset ($R_1$, $R_2$, $R_3$, $R_4$).

Items | Proposed method | Decision tree | Hyperplane support vector
$AR(R_1, R_2, R_3, R_4)$ | 1, 1, 1, 1 | 0.93, 0.81, 0.7, 0.71 | 0.9, 1, 1, 0.9
$SR(R_1, R_2, R_3, R_4)$ | 0.99, 1, 1, 1 | 0.93, 0.72, 0.78, 0.71 | 0.9, 0.72, 0.67, 0.69
CR | 0.4 | 0.17 | 0.09

the computational efficiency of globally solving large-scale classification problems, such as running mainframe-version optimization software, integrating meta-heuristic algorithms, or using distributed computing techniques.

Acknowledgments

The authors wish to thank the editor and the anonymous referees for providing insightful comments and suggestions, which have helped them improve the quality of the paper. This work was supported by the National Science Council of Taiwan under Grants NSC 100-2811-E-009-040-, NSC 99-2221-E-030-005-, and NSC 100-2221-E-030-009-.

References

[1] M. J. Beynon and K. L. Buchanan, "An illustration of variable precision rough set theory: the gender classification of the European barn swallow (Hirundo rustica)," Bulletin of Mathematical Biology, vol. 65, no. 5, pp. 835–858, 2003.

[2] H. L. Li and C. J. Fu, "A linear programming approach for identifying a consensus sequence on DNA sequences," Bioinformatics, vol. 21, no. 9, pp. 1838–1845, 2005.

[3] H. L. Li and M. H. Chen, "Induction of multiple criteria optimal classification rules for biological and medical data," Computers in Biology and Medicine, vol. 38, no. 1, pp. 42–52, 2008.

[4] C. W. Chu, G. S. Liang, and C. T. Liao, "Controlling inventory by combining ABC analysis and fuzzy classification," Computers and Industrial Engineering, vol. 55, no. 4, pp. 841–851, 2008.

[5] J.-X. Chen, "Peer-estimation for multiple criteria ABC inventory classification," Computers & Operations Research, vol. 38, no. 12, pp. 1784–1791, 2011.

[6] E. B. Hunt, J. Marin, and P. J. Stone, Experiments in Induction, Academic Press, New York, NY, USA, 1966.

[7] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth Statistics/Probability Series, Wadsworth Advanced Books and Software, Belmont, Calif, USA, 1984.

[8] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[9] J. R. Quinlan, "Simplifying decision trees," International Journal of Man-Machine Studies, vol. 27, no. 3, pp. 221–234, 1987.

[10] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, Calif, USA, 1993.

[11] H. Kim and W. Y. Loh, "Classification trees with unbiased multiway splits," Journal of the American Statistical Association, vol. 96, no. 454, pp. 589–604, 2001.

[12] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[13] R. M. Rifkin, Everything old is new again: a fresh look at historical approaches in machine learning, Ph.D. thesis, Massachusetts Institute of Technology, Ann Arbor, MI, USA, 2002.

[14] S. Katagiri and S. Abe, "Incremental training of support vector machines using hyperspheres," Pattern Recognition Letters, vol. 27, no. 13, pp. 1495–1507, 2006.

[15] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, 1986.

[16] C. H. Chu and D. Widjaja, "Neural network system for forecasting method selection," Decision Support Systems, vol. 12, no. 1, pp. 13–24, 1994.

[17] C. Giraud-Carrier, R. Vilalta, and P. Brazdil, "Guest editorial: introduction to the special issue on meta-learning," Machine Learning, vol. 54, no. 3, pp. 187–193, 2004.

[18] D. Pomorski and P. B. Perche, "Inductive learning of decision trees: application to fault isolation of an induction motor," Engineering Applications of Artificial Intelligence, vol. 14, no. 2, pp. 155–166, 2001.

[19] H. Zhang, C. N. Yu, B. Singer, and M. Xiong, "Recursive partitioning for tumor classification with gene expression microarray data," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 12, pp. 6730–6735, 2001.

[20] S. Aja-Fernández, R. de Luis-García, M. Á. Martín-Fernández, and C. Alberola-López, "A computational TW3 classifier for skeletal maturity assessment. A Computing with Words approach," Journal of Biomedical Informatics, vol. 37, no. 2, pp. 99–107, 2004.

[21] P. Geurts, M. Fillet, D. de Seny et al., "Proteomic mass spectra classification using decision tree based ensemble methods," Bioinformatics, vol. 21, no. 14, pp. 3138–3145, 2005.

[22] C. Olaru and L. Wehenkel, "A complete fuzzy decision tree technique," Fuzzy Sets and Systems, vol. 138, no. 2, pp. 221–254, 2003.

[23] M. Pal and P. M. Mather, "An assessment of the effectiveness of decision tree methods for land cover classification," Remote Sensing of Environment, vol. 86, no. 4, pp. 554–565, 2003.

[24] Z. Chen, J. Li, and L. Wei, "A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue," Artificial Intelligence in Medicine, vol. 41, no. 2, pp. 161–175, 2007.

[25] H. Nunez, C. Angulo, and A. Catala, "Rule-extraction from support vector machines," in Proceedings of the European Symposium on Artificial Neural Networks, pp. 107–112, 2002.

[26] Y. M. Lin, X. Wang, W. W. Y. Ng, Q. Chang, D. S. Yeung, and X. L. Wang, "Sphere classification for ambiguous data," in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 2571–2574, August 2006.

[27] J. Wang, P. Neskovic, and L. N. Cooper, "Bayes classification based on minimum bounding spheres," Neurocomputing, vol. 70, no. 4–6, pp. 801–808, 2007.

[28] L. Gu and H. Z. Wu, "A kernel-based fuzzy greedy multiple hyperspheres covering algorithm for pattern classification," Neurocomputing, vol. 72, no. 1–3, pp. 313–320, 2008.

[29] M. Hifi and R. M'Hallah, "A literature review on circle and sphere packing problems: models and methodologies," Advances in Operations Research, vol. 2009, Article ID 150624, 22 pages, 2009.

[30] M. Zhu, Y. Wang, S. Chen, and X. Liu, "Sphere-structured support vector machines for multi-class pattern recognition," in Proceedings of the 9th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC '03), vol. 2639 of Lecture Notes in Computer Science, pp. 589–593, May 2003.

[31] E. M. L. Beale and J. J. H. Forrest, "Global optimization using special ordered sets," Mathematical Programming, vol. 10, no. 1, pp. 52–69, 1976.

[32] IBM/ILOG, "CPLEX 12.0 reference manual," 2010, http://www.ilog.com/products/cplex/.

[33] D. C. Dunn, W. E. G. Thomas, and J. O. Hunter, "An evaluation of highly selective vagotomy in the treatment of chronic duodenal ulcer," Surgery Gynecology and Obstetrics, vol. 150, no. 6, pp. 845–849, 1980.

[34] K. Slowinski, "Rough classification of HSV patients," in Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, R. Slowinski, Ed., pp. 77–94, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1992.

[35] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," 2010, http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html.

[36] C. Than, R. Sugino, H. Innan, and L. Nakhleh, "Efficient inference of bacterial strain trees from genome-scale multilocus data," Bioinformatics, vol. 24, no. 13, pp. i123–i131, 2008.

[37] M. Rockville, "Large scale computing and storage requirements for biological and environmental research," Tech. Rep., Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, Calif, USA, 2009.

[38] X. Xie, X. Fang, S. Hu, and D. Wu, "Evolution of supercomputers," Frontiers of Computer Science in China.
