
Gene selection and classification using Taguchi chaotic binary particle swarm optimization

Li-Yeh Chuang (a), Cheng-San Yang (b,*), Kuo-Chuan Wu (c), Cheng-Hong Yang (d,e,*)

(a) Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung 80041, Taiwan
(b) Department of Plastic Surgery, Chia-Yi Christian Hospital, Chiayi 60002, Taiwan
(c) Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 80708, Taiwan
(d) Department of Network Systems, Toko University, Chiayi 61363, Taiwan
(e) Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 80708, Taiwan

* Corresponding authors. Addresses: Department of Plastic Surgery, Chia-Yi Christian Hospital, Chiayi 60002, Taiwan. Tel.: +886 5 276 5041; fax: +886 5 277 4511 (C.-S. Yang). Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 80708, Taiwan. Tel.: +886 7 381 4526x5639; fax: +886 7 383 6844 (C.-H. Yang).
E-mail addresses: chuang@isu.edu.tw (L.-Y. Chuang), p8896117@mail.ncku.edu.tw (C.-S. Yang), 1097308101@cc.kuas.edu.tw (K.-C. Wu), chyang@cc.kuas.edu.tw (C.-H. Yang).

Keywords: Microarray data; Correlation-based feature selection; Taguchi-binary particle swarm optimization; K-nearest neighbor

Abstract

The purpose of gene expression analysis is to discriminate between classes of samples, and to predict the relative importance of each gene for sample classification. Microarray data with reference to gene expression profiles have provided some valuable results related to a variety of problems and contributed to advances in clinical medicine. Microarray data characteristically have a high dimension and a small sample size. This makes it difficult for a general classification method to obtain correct data for classification. However, not every gene is potentially relevant for distinguishing the sample class. Thus, in order to analyze gene expression profiles correctly, feature (gene) selection is crucial for the classification process, and an effective gene extraction method is necessary for eliminating irrelevant genes and decreasing the classification error rate.

In this paper, correlation-based feature selection (CFS) and Taguchi chaotic binary particle swarm optimization (TCBPSO) were combined into a hybrid method. The K-nearest neighbor (K-NN) with leave-one-out cross-validation (LOOCV) served as a classifier for ten gene expression profiles. Experimental results show that this hybrid method effectively simplifies feature selection by reducing the number of features needed. The proposed method achieved the lowest classification error rate on all ten gene expression data set problems tested. For six of the gene expression profile data sets a classification error rate of zero could be reached. The introduced method outperformed five other methods from the literature in terms of classification error rate. It could thus constitute a valuable tool for gene expression analysis in future studies.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Microarray data characteristically have a high dimension and a small sample size, which makes it difficult for a general classification method to obtain correct data for classification. The purpose of classification is to build an efficient model for predicting the class membership of data. This model should produce a correct label on the training data and predict the label of any unknown data accurately. Determining an optimal feature (gene) subset is a very complex task, yet one which proves decisive for the outcome of the classification error rate. Classifying microarray data involves feature selection and classifier design. Feature selection is the process of choosing a subset of features from an original feature set and can thus be viewed as a principal pre-processing tool prior to solving classification problems (Wang, Yang, Teng, Xia, & Jensen, 2007). The goal of feature selection is to reduce the dimensionality of the problem and to retain the characteristics necessary for recognition, classification and/or the data mining process. A reliable selection method that obtains the relevant genes from the sample data is needed in order to decrease the classification error rates and to avoid incomprehensibility.

Performing an exhaustive search over the entire solution space is not practical since this would require a long computing time associated with high cost. To overcome these feature selection problems, irrelevant and redundant features have to be eliminated and only features that are relevant for the classification process should be considered. Deleting irrelevant features significantly




improves the computational efficiency and lowers the classification error rate. As many pattern recognition techniques were originally not designed to deal with large amounts of irrelevant or redundant features, combining feature selection techniques has become a necessity (Guyon & Elisseeff, 2003; Kohavi & John, 1997; Liu & Motoda, 1998). Existing methods for data reduction, or more specifically for feature selection in the context of microarray data analysis (Kim & Cho, 2008; Saeys, Inza, & Larranaga, 2007; Tang, Zhang, Huang, Hu, & Zhao, 2008; Wong & Hsu, 2008), can be classified into two major groups: filter and wrapper approaches. The filter approach separates data before the actual classification process and then calculates feature weight values. Thus, features that represent the original data set better can be identified. However, the filter approach does not account for interactions amongst the features. Methods in this category include correlation-based feature selection (CFS) (Hall, 1999), the t-test, information gain (Quinlan, 1986), mutual information (Battiti, 1994) and entropy-based methods (Liu, Krishnan, & Mondry, 2005). Wrapper models generally focus on improving the classification accuracy of pattern classification problems and typically perform better (i.e., reach higher classification accuracy) than filter models. However, wrapper approaches are computationally more expensive than filter methods (Kohavi & John, 1997; Liu et al., 2005). Several methods have previously been used to perform feature selection of training and testing data, such as genetic algorithms (Raymer, Punch, Goodman, Kuhn, & Jain, 2000), branch and bound algorithms (Narendra & Fukunaga, 1977), sequential search algorithms (Pudil, Novovicov, & Kittler, 1994), tabu search (Zhang & Sun, 2002), binary particle swarm optimization (BPSO) (Chuang, Chang, Tu, & Yang, 2008), and hybrid genetic algorithms (Oh, Lee, & Moon, 2004). Some novel evaluation criteria for wrapper and filter models are currently under development (Zhu, Ong, & Dash, 2007).

Many optimization algorithms produce locally optimal solutions and are thus combined with a local search process to improve the outcome. An example of such a local search process incorporated in a genetic algorithm can be found in Oh et al. (2004). In this study, we used the Taguchi method as a local search method in chaotic binary particle swarm optimization (CBPSO). The Taguchi method uses ideas from statistical experimental design to improve or optimize products, processes and equipment. It uses two major tools: the signal-to-noise ratio (SNR), which measures quality, and orthogonal arrays (OAs), which are used to study many design parameters simultaneously. This method has been successfully applied in machine learning and data mining, e.g., combined data mining and electrical discharge machining (Chang, Tsai, & Ke, 2006). Sohn and Shin used the Taguchi experimental design for the Monte Carlo simulation of classifier combination methods (Sohn & Shin, 2007). Kwak and Choi used the Taguchi method for feature selection for classification problems (Kwak & Choi, 2002). Chen et al. optimized neural network parameters using the Taguchi method (Chen, Tai, Wang, Deng, & Chen, 2008).

This paper presents a hybrid feature selection approach consisting of two stages, namely correlation-based feature selection and Taguchi chaotic binary particle swarm optimization (CFS-TCBPSO). In the first stage, the correlation-based feature selection value of each feature is calculated in order to select those features that can differentiate a variety of classes. This process is considered a filter approach. Taguchi chaotic binary particle swarm optimization, a wrapper approach, is then applied to the features that have been selected during the first stage, and evaluates whether the selected features affect the classification error rate using the K-nearest neighbor (KNN) classifier (Cover & Hart, 1967; Fix & Hodges, 1951; Tan, 2006) with leave-one-out cross-validation (LOOCV) (Cawley & Talbot, 2003; Stone, 1974). Ten classification microarray data sets taken from the literature (Diaz-Uriarte & Alvarez de Andres, 2006) were used for the feature selection and classification. The experimental results show that the proposed method achieved the lowest classification error rates and outperformed the other methods from the literature it was tested against.

2. Material and methods

2.1. Correlation-based feature selection

Correlation-based feature selection (CFS) is a filter feature selection method that uses a heuristic for evaluating the merit of a subset of features. The heuristic takes the usefulness of the individual features for labeling the class and their inter-correlation into account. The hypothesis of CFS is based on the statement "Good feature subsets contain features highly correlated with (i.e., predictive of) the class, yet uncorrelated with (i.e., not predictive of) each other" (Hall, 1999).

This hypothesis is incorporated into the correlation-based heuristic evaluation equation as:

$$\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}} \qquad (1)$$

where Merit_S is the heuristic merit of a feature subset S containing k features, $\overline{r_{cf}}$ is the average feature-class correlation, and $\overline{r_{ff}}$ is the average feature-feature intercorrelation (f ∈ S). Eq. (1) is Pearson's correlation, where all variables have been standardized. General filter methods estimate the significance of a feature individually. CFS is then used to determine the best combination of attribute subsets via score values from the original data sets. The attributes are combined since they would be poor predictors of the class individually. Redundant attributes are discriminated against because they would be highly correlated with one or more of the other attributes (Hall, 1999).
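To make Eq. (1) concrete, the following minimal sketch (ours, not the authors' implementation) scores a candidate feature subset from precomputed correlations; the inputs `r_cf` and `r_ff` are assumed to be supplied by the caller, e.g. as the SU values defined later in this section.

```python
import numpy as np

def cfs_merit(subset, r_cf, r_ff):
    """CFS heuristic merit of a feature subset (Eq. (1)).

    subset : sequence of feature indices (the set S, |S| = k)
    r_cf   : 1-D array, correlation of each feature with the class
    r_ff   : 2-D array, feature-feature inter-correlations
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_rcf = np.mean([abs(r_cf[f]) for f in subset])
    # average pairwise feature-feature correlation within the subset
    pairs = [abs(r_ff[f, g]) for i, f in enumerate(subset)
             for g in subset[i + 1:]]
    mean_rff = np.mean(pairs) if pairs else 0.0
    return (k * mean_rcf) / np.sqrt(k + k * (k - 1) * mean_rff)
```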

Various heuristic search strategies, such as the best-first method (Rich & Knight, 1991), are often applied to search the feature subset space in a reasonable time frame. We applied the best-first method to calculate a matrix of feature-class and feature-feature correlation merits for CFS from the training data. The best-first search starts with an empty set of features and generates all possible single-feature expansions. Given enough time, a best-first search will explore the entire feature subset space, so CFS uses a stopping criterion: the search terminates once a number of consecutive fully expanded subsets show no improvement (Hall, 1999); a sketch of this search follows below.
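A compact sketch of the best-first subset search just described, reusing the `cfs_merit` function above. Treating the stopping criterion as a fixed number of consecutive non-improving expansions is our reading of Hall (1999), and `max_stale` is a hypothetical parameter.

```python
import heapq

def best_first_search(n_features, merit_fn, max_stale=5):
    """Best-first search over feature subsets.

    Starts from the empty set, expands each dequeued subset by one
    feature at a time, and stops after `max_stale` consecutive
    non-improving expansions. `merit_fn` maps a tuple of feature
    indices to a score, e.g. lambda s: cfs_merit(s, r_cf, r_ff).
    """
    start = ()
    open_list = [(-merit_fn(start), start)]      # max-heap via negated merit
    visited = {start}
    best_subset, best_merit, stale = start, merit_fn(start), 0
    while open_list and stale < max_stale:
        _, subset = heapq.heappop(open_list)
        improved = False
        for f in range(n_features):
            if f in subset:
                continue
            child = tuple(sorted(subset + (f,)))  # single-feature expansion
            if child in visited:
                continue
            visited.add(child)
            m = merit_fn(child)
            heapq.heappush(open_list, (-m, child))
            if m > best_merit:
                best_subset, best_merit, improved = child, m, True
        stale = 0 if improved else stale + 1
    return best_subset, best_merit
```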

In order to calculate the merit of a feature set, the correlation between features is computed using symmetrical uncertainty (SU):

$$SU = 2.0 \times \left[\frac{H(Y) + H(X) - H(X,Y)}{H(Y) + H(X)}\right] \qquad (2)$$

where H(Y) and H(X, Y) are defined via the entropy:

$$H(Y) = -\sum_{y \in Y} p(y)\log_2 p(y) \qquad (3)$$

where a probabilistic model of a feature Y can be formed by estimating the individual probabilities of the values y ∈ Y from the training data. If feature Y in the training data is partitioned according to another feature X, then the relationship between features Y and X is given by the conditional entropy:

$$H(Y \mid X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y \mid x)\log_2 p(y \mid x) \qquad (4)$$

SU compensates for the information gain's bias toward attributes with more values; the SU value lies in the range [0, 1].
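Written directly from Eqs. (2) and (3), a sketch of SU for two discrete feature vectors; continuous expression values would first need to be discretized, a preprocessing step not shown here.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """H(Y) = -sum p(y) log2 p(y), estimated from value counts (Eq. (3))."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y):
    """SU = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)]  (Eq. (2))."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    if hx + hy == 0:
        return 0.0                   # both features are constant
    return 2.0 * (hx + hy - hxy) / (hx + hy)
```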

2.2. Taguchi method

The Taguchi method (Taguchi, Chowdhury, & Taguchi, 2000; Tsai, Liu, & Chou, 2004; Wu & Wu, 2000) is a robust design approach that uses ideas from statistical experimental design for estimating and performing improvements in products, processes, and parameter settings. In a robust experimental design, processes or products can be analyzed and improved by altering relevant design factors. Robust experimental design is used to design experiments and analyze the solution efficiently and reliably. The Taguchi method provides two tools, an orthogonal array (OA) and the signal-to-noise ratio (SNR), for analysis and improvement (Taguchi et al., 2000; Tsai et al., 2004; Wu & Wu, 2000). These tools are used to determine the levels of factors and to minimize the sensitivity to noise. A general two-level OA can be defined as L_h(2^d), where d is the number of columns (i.e., the number of design parameters) in the orthogonal matrix, and h = 2^k (h > d, k > log_2(d), k an integer) denotes the number of experimental trials; base 2 denotes the number of levels for each design parameter. Hence, if a particular target (i.e., process or product) has d different design factors, 2^d possible experimental trials would have to be considered in a full factorial experimental design. The OA is principally utilized to decrease the experimental effort associated with the d design parameters. An OA can be considered a fractional factorial experimental design matrix that provides a comprehensive analysis of interactions among all design factors, and fair, balanced and systematic comparisons of the different levels (or options) of each design factor. Table 1 is an example of an L_8(2^7) orthogonal array.

The SNR is then utilized to analyze and optimize design parameters for a particular target. The Taguchi method classifies robust parameter design problems into different categories depending on the target of the problem. Typically, the smaller-the-better and larger-the-better SNR types are utilized (Wu & Wu, 2000). Consider a set of n observations {y_1, y_2, ..., y_n}.

For the smaller-the-better characteristic, the SNR is determined as

$$SNR = -10\log\left(\frac{1}{n}\sum_{t=1}^{n} y_t^2\right) \qquad (5)$$

For the larger-the-better characteristic, the SNR is determined as

$$SNR = -10\log\left(\frac{1}{n}\sum_{t=1}^{n} \frac{1}{y_t^2}\right) \qquad (6)$$

The SNR in the Taguchi method is used to determine the robustness of the levels of each design parameter. A "high quality" result for a particular target can be achieved by setting design parameters at the specific levels with a high SNR.
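Both SNR variants are direct translations of Eqs. (5) and (6); a minimal sketch, assuming the base-10 logarithm of the usual Taguchi convention:

```python
import numpy as np

def snr_smaller_the_better(y):
    """Eq. (5): SNR = -10 log10( (1/n) * sum(y_t^2) )."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(y ** 2))

def snr_larger_the_better(y):
    """Eq. (6): SNR = -10 log10( (1/n) * sum(1 / y_t^2) )."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y ** 2))
```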

2.3. Binary particle swarm optimization

In the original particle swarm optimization (PSO) (Kennedy & Eberhart, 1995), each particle is analogous to an individual "fish" in a school of fish. PSO is a population-based optimization technique, where the population is called a swarm. A swarm consists of N particles moving around in a D-dimensional search space. The position of the ith particle can be represented by x_i = (x_i1, x_i2, ..., x_iD). The velocity of the ith particle can be written as v_i = (v_i1, v_i2, ..., v_iD). The positions and velocities of the particles are confined within [X_min, X_max]^D and [V_min, V_max]^D, respectively. Each particle coexists and evolves simultaneously based on knowledge shared with neighboring particles; it makes use of its own memory and of knowledge gained by the swarm as a whole to find the best solution. The best previously encountered position of the ith particle is denoted its individual best position p_i = (p_i1, p_i2, ..., p_iD), a value called pbest_i. The best of all individual pbest_i values is denoted the global best position g = (g_1, g_2, ..., g_D) and called gbest. The PSO process is initialized with a population of random particles, and the algorithm then executes a search for optimal solutions by continuously updating generations. However, many optimization problems occur in a space featuring discrete, qualitative distinctions between variables and between levels of variables. For this reason, Kennedy and Eberhart (1997) introduced binary PSO (BPSO), which can be applied to discrete binary variables. In a binary space, a particle may move to near corners of a hypercube by flipping various numbers of bits; thus, the overall particle velocity may be described by the number of bits changed per iteration (Kennedy & Eberhart, 1997). In BPSO, each particle is updated based on the following equations:

$$v_{id}^{new} = w \cdot v_{id}^{old} + c_1 \cdot r_1 \cdot (pbest_{id} - x_{id}^{old}) + c_2 \cdot r_2 \cdot (gbest_d - x_{id}^{old}) \qquad (7)$$

$$\text{if } v_{id}^{new} \notin (V_{min}, V_{max}) \text{ then } v_{id}^{new} = \max(\min(V_{max}, v_{id}^{new}), V_{min}) \qquad (8)$$

$$S(v_{id}^{new}) = \frac{1}{1 + e^{-v_{id}^{new}}} \qquad (9)$$

$$\text{if } r_3 < S(v_{id}^{new}) \text{ then } x_{id}^{new} = 1 \text{ else } x_{id}^{new} = 0 \qquad (10)$$

In these equations, w is the inertia weight that controls the impact of the previous velocity of a particle on its current one, r_1, r_2 and r_3 are random numbers between [0, 1], and c_1 and c_2 are acceleration constants that control how far a particle moves in a single generation. The velocities v_id^new and v_id^old denote the velocities of the new and old particle, respectively. x_id^old is the current particle position, and x_id^new is the new, updated particle position. In Eq. (8), particle velocities in each dimension are restricted to a maximum velocity V_max. If the sum of accelerations causes the velocity in that dimension to exceed V_max, then the velocity in that dimension is limited to V_max. V_max and V_min are user-specified parameters (in our case V_max = 6, V_min = −6). The position of a particle after updating is calculated by the function S(v_id^new) (Eq. (9)). If S(v_id^new) is larger than r_3, then its position value is represented by {1} (meaning this position is selected for the next update). If S(v_id^new) is smaller than r_3, then its position value is represented by {0} (meaning this position is not selected for the next update).
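A sketch of one full BPSO generation implementing Eqs. (7)-(10) for the whole swarm at once. The default values for w, c1 and c2 are common settings rather than the paper's verified configuration, while Vmax = 6 follows the text above.

```python
import numpy as np

def bpso_update(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, vmax=6.0):
    """One BPSO iteration for a swarm of binary particles.

    x, v   : (N, D) arrays of current positions (0/1) and velocities
    pbest  : (N, D) array of each particle's best position
    gbest  : (D,) array, global best position
    """
    n, d = x.shape
    r1, r2 = np.random.rand(n, d), np.random.rand(n, d)
    # Eq. (7): velocity update from inertia, cognitive and social terms
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    # Eq. (8): clamp velocities to [-vmax, vmax]
    v_new = np.clip(v_new, -vmax, vmax)
    # Eq. (9): sigmoid transfer function
    s = 1.0 / (1.0 + np.exp(-v_new))
    # Eq. (10): stochastic binarization of the new position
    r3 = np.random.rand(n, d)
    x_new = (r3 < s).astype(int)
    return x_new, v_new
```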

2.4. Chaotic binary particle swarm optimization

Generating random sequences with a long period and good uniformity is very important for a heuristic algorithm. Since chaotic sequences are non-repetitive, they can be embedded in a heuristic algorithm. Chaos can be described as the complex behavior of a nonlinear deterministic system that has ergodic and stochastic properties (Schuster & Just, 2005). It is very sensitive to the initial conditions and the parameters used. In other words, the effects of chaos are not proportional to the small differences in the initial values. In what is called the "butterfly effect", small variations of an initial variable can result in huge differences in the solutions after a certain number of iterations. Optimization algorithms based on the chaos theory
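The particular chaotic map CBPSO substitutes for the random sequence is not shown in this excerpt; purely as an illustration of the properties described above (determinism, non-repetition, sensitivity to initial conditions), the sketch below uses the logistic map, a common choice in chaotic PSO variants.

```python
def logistic_map_sequence(x0, n, mu=4.0):
    """Generate n values of the logistic map x_{t+1} = mu * x_t * (1 - x_t).

    For mu = 4 the map is chaotic on (0, 1): deterministic, non-repetitive,
    and extremely sensitive to the initial value x0 (the "butterfly effect").
    x0 must lie in (0, 1) and avoid the points 0.25, 0.5 and 0.75,
    which fall into fixed points of the map.
    """
    seq, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        seq.append(x)
    return seq

# Two nearby starting points diverge after a few dozen iterations:
a = logistic_map_sequence(0.3000, 50)
b = logistic_map_sequence(0.3001, 50)
print(abs(a[-1] - b[-1]))  # typically O(1), despite a 1e-4 initial gap
```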

Table 1
L_8(2^7) orthogonal array.

Experimental trial | Design factors (features)
                   | A  B  C  D  E  F  G
Column number      | 1  2  3  4  5  6  7
1                  | 1  1  1  1  1  1  1
2                  | 1  1  1  2  2  2  2
3                  | 1  2  2  1  1  2  2
4                  | 1  2  2  2  2  1  1
5                  | 2  1  2  1  2  1  2
6                  | 2  1  2  2  1  2  1
7                  | 2  2  1  1  2  2  1
8                  | 2  2  1  2  1  1  2
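To illustrate how the OA of Table 1 and the SNR are combined, here is a hedged sketch: it runs one trial per OA row through a user-supplied evaluation function and, for each factor, keeps the level whose trials show the higher mean SNR. The larger-the-better interpretation and the `evaluate` hook are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

# L8(2^7) orthogonal array from Table 1 (levels coded 1 and 2).
L8 = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 2, 2, 2, 2],
    [1, 2, 2, 1, 1, 2, 2],
    [1, 2, 2, 2, 2, 1, 1],
    [2, 1, 2, 1, 2, 1, 2],
    [2, 1, 2, 2, 1, 2, 1],
    [2, 2, 1, 1, 2, 2, 1],
    [2, 2, 1, 2, 1, 1, 2],
])

def taguchi_best_levels(evaluate, oa=L8):
    """Run one trial per OA row and choose each factor's best level.

    evaluate : maps a row of factor levels to a quality value y > 0,
               treated here as larger-the-better (Eq. (6) with n = 1).
    Returns the level (1 or 2) chosen for each of the 7 factors.
    """
    y = np.array([evaluate(row) for row in oa], dtype=float)
    snr = -10.0 * np.log10(1.0 / y ** 2)         # per-trial SNR
    best = []
    for col in range(oa.shape[1]):
        snr1 = snr[oa[:, col] == 1].mean()        # mean SNR at level 1
        snr2 = snr[oa[:, col] == 2].mean()        # mean SNR at level 2
        best.append(1 if snr1 >= snr2 else 2)
    return best
```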


set of genes that can provide meaningful diagnostic information for disease prediction without diminishing accuracy. Feature selection uses relatively few features since only selective features need to be used. This does not affect the predictive error rate in a negative way. A general overall feature selection approach can be found in Saeys et al. (2007). Feature selection methods using a wrapper approach are very much dependent on the classifier or the pattern recognition approach used to assign the feature (gene) subset. On the other hand, filter approaches take only intrinsic features of the data into account. Finally, an embedded approach similar to a wrapper approach has the advantage that it includes interactions with the classification model, while at the same time being far less computationally intensive than a wrapper method (Saeys et al., 2007). Wang et al. (2005) indicate that filter approaches can select more relevant feature subsets faster than wrapper approaches. On the other hand, wrapper approaches generally tend to obtain better classification accuracies. Inza, Larranaga, Blanco, and Cerrolaza (2004) and Xiong, Fang, and Zhao (2001) used a wrapper approach to implement feature selection, and selected better feature subsets to boost classification accuracy. Nevertheless, optimal solutions are difficult to find due to the large size of the search space if only a wrapper approach is used. In this paper, we combined a filter and a wrapper approach. CFS is a filter method that searches the entire feature space efficiently, and TCBPSO is a wrapper method that uses an induction algorithm to evaluate the feature subsets directly. As stated above, wrapper methods generally outperform filter methods in terms of prediction error rate. Since the individual advantages of a wrapper and a filter method complement each other well (Zhu et al., 2007), we used a hybrid two-stage strategy to increase the classification accuracy. The Taguchi method implemented within the CBPSO procedure is responsible for the local search. The Taguchi principle is used to improve the quality of a product by minimizing the effect of the causes of variation without eliminating these causes (Tsai et al., 2004). The two-level orthogonal array and the SNR of the Taguchi method are used for exploitation. The optimum particles can easily be found by using both experimental runs and SNRs instead of executing all combinations of factor levels. Consequently, a superior candidate feature subset with high classification performance for the classification task at hand can be obtained in a subsequent iteration.

Many classifiers (e.g., KNN, linear and quadratic discriminant analysis, support vector machines, etc.) show good performance on microarray data. Each approach has its advantages and disadvantages, so no single approach can be considered ideal. As a classifier, KNN performs well for cancer classification compared to more sophisticated classifiers. It is an easily implemented method that has a single parameter (the number of nearest neighbors) to be pre-defined, given that the distance metric is Euclidean (Okun & Priisalu, 2009). Given a fixed dimension, a semi-definite positive norm, and n points in this space, the KNN of every point can be found in O(kn log n) time (Vaidya, 1989). The KNN method is easy to implement by computing the distances from the test sample to all stored vectors, and its time complexity is superior to that of other methods. On the other hand, the KNN parameter K, the best choice of the number of neighbors, depends upon the data. Generally, larger values of K reduce the effect of noise on the classification, but make boundaries between classes less distinct. Ghosh (2006) indicated that "the optimum value K depends on the specific data set and is to be estimated using the available training sample observations". Since the time complexity of KNN is O(kn log n), the parameter K directly influences the performance efficiency. In classification problems, overfitting appears when computationally intensive search algorithms are used. Estimates may be overfitted and yield biased predictions under these circumstances (Reunanen, Guyon, & Elisseeff, 2003). If the training data lie too closely together, the classifier predictions are of poor quality. This occurs when there is insufficient data to train the classifier and the data does not fully cover the concept being learned. This problem is common in many real-world samples where the available data may be rather noisy (Loughrey & Cunningham, 2005). In order to avoid overfitting, additional techniques have been discussed, such as cross-validation, regularization, and early termination or resampling (Schaffer, 1993; Wolpert, 1993). However, the best way to avoid overfitting is to use an abundant amount of training data. In this paper, the microarray data characteristically have a high dimension and small sample size, which is subsequently reduced by a filter feature selection method. After feature reduction, the LOOCV technique enhances the training data for classification in a wrapper-based feature selection method.
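One way to realize the K-NN with LOOCV evaluation described above, using scikit-learn for brevity (an implementation choice of this sketch, not of the paper). The mask `selected` would come from a particle's bit string, and the neighbor count `k` is left as a free parameter since the paper's value is not given in this excerpt.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut

def loocv_error_rate(X, y, selected, k=1):
    """Leave-one-out error of a K-NN classifier on the selected genes.

    X        : (n_samples, n_genes) expression matrix
    y        : (n_samples,) class labels
    selected : boolean mask or index array over genes (a particle's bits)
    """
    Xs = X[:, selected]
    errors = 0
    for train_idx, test_idx in LeaveOneOut().split(Xs):
        clf = KNeighborsClassifier(n_neighbors=k)  # Euclidean by default
        clf.fit(Xs[train_idx], y[train_idx])
        errors += int(clf.predict(Xs[test_idx])[0] != y[test_idx][0])
    return errors / len(y)
```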

The results show that the Taguchi method plays an important role in the local search of a GA. PSO has many advantages over GA (Yang, Huang, Wu, & Chang, 2008). In order to enhance the efficiency of BPSO, we designed the local search in such a way that it occurs when gbest is unchanged k times; a sketch of this trigger follows below. Fig. 3 shows how the proposed Taguchi method helps CBPSO escape the local optimum. In Yang, Chuang, Li, and Yang (2008), we used a fitness design called relative difference fitness (RDF) to determine if a local optimum occurred. In this case, new particles were generated to escape the entrapment in a local optimum. The IBPSO introduced in Chuang et al. (2008) avoids getting trapped in a local optimum by resetting gbest. The concepts behind Yang et al. (2008) and Chuang et al. (2008) are similar with regard to avoiding a local optimum. However, the various search strategies may lead to different results.
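A sketch of the trigger just described: the Taguchi local search fires only after gbest has failed to improve for k consecutive generations. The `step`, `fitness` and `taguchi_local_search` hooks are hypothetical stand-ins, since the OA/SNR refinement details are not shown in this excerpt.

```python
def run_tcbpso(init_swarm, step, fitness, taguchi_local_search,
               max_iter=100, k_stale=3):
    """CBPSO main loop with a stagnation-triggered Taguchi local search.

    step                 : advances the swarm one generation, returns (swarm, gbest)
    fitness              : classification error of gbest (lower is better)
    taguchi_local_search : refines gbest via OA trials and SNR analysis
    """
    swarm, gbest = init_swarm()
    best_fit, stale = fitness(gbest), 0
    for _ in range(max_iter):
        swarm, gbest = step(swarm, gbest)
        f = fitness(gbest)
        if f < best_fit:                # gbest improved: reset the counter
            best_fit, stale = f, 0
        else:
            stale += 1
        if stale >= k_stale:            # gbest unchanged k times: local search
            gbest = taguchi_local_search(gbest)
            best_fit = min(best_fit, fitness(gbest))
            stale = 0
    return gbest, best_fit
```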

6. Conclusions

In this paper, we have described a two-stage feature selection approach: a filter (CFS) and a wrapper (TCBPSO) feature selection method were combined in a hybrid method, and KNN with the LOOCV method served as a classifier for pattern identification in gene expression data. We compared our approach against Random Forest, shrunken centroids and nearest neighbor methods with variable selection that have been used for classification and feature selection of large-dimensional microarray data sets. Experimental results show that this hybrid method effectively simplifies feature selection by reducing the number of features needed. The classification error rate obtained by the proposed method was the lowest for all of the ten gene expression data set problems tested. Six gene expression profile data sets even reached a classification error rate of zero. The proposed method can conceivably be used in other research projects that implement feature selection.

References

Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.
Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., et al. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 96, 6745–6750.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5, 537–550.
Cawley, G. C., & Talbot, N. L. C. (2003). Efficient leave-one-out cross-validation of kernel Fisher discriminant classifiers. Pattern Recognition, 36, 2585–2592.
Chang, T. C., Tsai, F. C., & Ke, J. H. (2006). Data mining and Taguchi method combination applied to the selection of discharge factors and the best interactive factor combination under multiple quality properties. The International Journal of Advanced Manufacturing Technology, 31, 164–174.
Chen, W.-C., Tai, P.-H., Wang, M.-W., Deng, W.-J., & Chen, C.-T. (2008). A neural network-based approach for dynamic quality prediction in a plastic injection molding process. Expert Systems with Applications, 35, 843–849.
Chuang, L.-Y., Chang, H.-W., Tu, C.-J., & Yang, C.-H. (2008). Improved binary PSO for feature selection using gene expression data. Computational Biology and Chemistry, 32, 29–38.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13, 21–27.
Deb, K., & Raji Reddy, A. (2003). Reliable classification of two-class cancer data using evolutionary algorithms. Biosystems, 72, 111–129.
Diaz-Uriarte, R., & Alvarez de Andres, S. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7.
Fix, E., & Hodges, J. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties. Technical report, USAF School of Aviation Medicine, Randolph Field, TX.
Frank, E., Hall, M., Trigg, L., Holmes, G., & Witten, I. H. (2004). Data mining in bioinformatics using Weka. Bioinformatics, 20, 2479–2481.
Gao, H., Zhang, Y., Liang, S., & Li, D. (2006). A new chaotic algorithm for image encryption. Chaos, Solitons & Fractals, 29, 393–399.
Ghosh, A. K. (2006). On optimum choice of k in nearest neighbor classification. Computational Statistics and Data Analysis, 50, 3113–3123.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3, 1157–1182.
Hall, M. A. (1999). Correlation-based feature subset selection for machine learning. PhD thesis, Department of Computer Science, University of Waikato.
Huang, H.-L., Lee, C.-C., & Ho, S.-Y. (2007). Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers. Biosystems, 90, 78–86.
Huerta, E. B., Duval, B., & Hao, J. (2006). A hybrid GA/SVM approach for gene selection and classification of microarray data. Lecture Notes in Computer Science, 3907, 34–44.
Inza, I., Larranaga, P., Blanco, R., & Cerrolaza, A. J. (2004). Filter versus wrapper gene selection approaches in DNA microarray domains. Artificial Intelligence in Medicine, 31, 91–103.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In IEEE international conference on neural networks, Perth, WA (pp. 1942–1948).
Kennedy, J., & Eberhart, R. C. (1997). A discrete binary version of the particle swarm algorithm. In IEEE international conference on systems, man, and cybernetics, Orlando, FL (pp. 4104–4108).
Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., et al. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7, 673–679.
Kim, K.-J., & Cho, S.-B. (2008). An evolutionary algorithm approach to optimal ensemble classifiers for DNA microarray data analysis. IEEE Transactions on Evolutionary Computation, 12, 377–388.
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273–324.
Kuo, D. (2005). Chaos and its computing paradigm. IEEE Potentials, 24, 13–15.
Kwak, N., & Choi, C.-H. (2002). Input feature selection for classification problems. IEEE Transactions on Neural Networks, 13, 143–159.
Liu, X., Krishnan, A., & Mondry, A. (2005). An entropy-based gene selection method for cancer classification using microarray data. BMC Bioinformatics, 6.
Liu, H., & Motoda, H. (1998). Feature selection for knowledge discovery and data mining. Boston: Kluwer Academic Publishers.
Loughrey, J., & Cunningham, P. (2005). Overfitting in wrapper-based feature subset selection: The harder you try the worse it gets. In Research and development in intelligent systems (Vol. XXI, pp. 33–43).
Narendra, P. M., & Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers, C-26, 917–922.
Oh, I.-S., Lee, J.-S., & Moon, B.-R. (2004). Hybrid genetic algorithms for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1424–1437.
Okun, O., & Priisalu, H. (2009). Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors. Artificial Intelligence in Medicine, 45, 151–162.
Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M. E., et al. (2002). Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415, 436–442.
Pudil, P., Novovicov, J., & Kittler, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, 15, 1119–1125.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Ramaswamy, S., Ross, K. N., Lander, E. S., & Golub, T. R. (2002). A molecular signature of metastasis in primary solid tumors. Nature Genetics, 33, 49–54.
Raymer, M. L., Punch, W. F., Goodman, E. D., Kuhn, L. A., & Jain, A. K. (2000). Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4, 164–171.
Reunanen, J., Guyon, I., & Elisseeff, A. (2003). Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research, 3, 1371–1382.
Rich, E., & Knight, K. (1991). Artificial intelligence (2nd ed.). New York: McGraw-Hill.
Ross, D. T., Scherf, U., Eisen, M. B., Perou, C. M., Rees, C., Spellman, P., et al. (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24, 227–235.
Saeys, Y., Inza, I., & Larranaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23, 2507–2517.
Schaffer, C. (1993). Overfitting avoidance as bias. Machine Learning, 10, 153–178.
Schuster, H. G., & Just, W. (2005). Deterministic chaos: An introduction (4th ed.). Weinheim: Wiley-VCH.
Shi, Y., & Eberhart, R. (1998). A modified particle swarm optimizer. In IEEE international conference on evolutionary computation, Anchorage, AK (pp. 69–73).
Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.
Sohn, S. Y., & Shin, H. W. (2007). Experimental study for the comparison of classifier combination methods. Pattern Recognition, 40, 33–40.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B (Methodological), 36, 111–147.
Taguchi, G., Chowdhury, S., & Taguchi, S. (2000). Robust engineering. New York: McGraw-Hill.
Tan, S. (2006). An effective refinement strategy for KNN text classifier. Expert Systems with Applications, 30, 290–298.
Tang, Y., Zhang, Y.-Q., Huang, Z., Hu, X., & Zhao, Y. (2008). Recursive fuzzy granulation for gene subsets extraction and cancer classification. IEEE Transactions on Information Technology in Biomedicine, 12, 723–730.
Trelea, I. C. (2003). The particle swarm optimization algorithm: Convergence analysis and parameter selection. Information Processing Letters, 85, 317–325.
Tsai, J.-T., Liu, T.-K., & Chou, J.-H. (2004). Hybrid Taguchi-genetic algorithm for global numerical optimization. IEEE Transactions on Evolutionary Computation, 8, 365–377.
Vaidya, P. M. (1989). An O(n log n) algorithm for the all-nearest-neighbors problem. Discrete and Computational Geometry, 4, 101–115.
van 't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., et al. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.
Wang, Y., Tetko, I. V., Hall, M. A., Frank, E., Facius, A., Mayer, K. F. X., et al. (2005). Gene selection from microarray data for cancer classification – A machine learning approach. Computational Biology and Chemistry, 29, 37–46.
Wang, X., Yang, J., Teng, X., Xia, W., & Jensen, R. (2007). Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Letters, 28, 459–471.
Wolpert, D. H. (1993). On overfitting avoidance as bias. Santa Fe Institute, Technical Report SFI-TR-92-03-5001.
Wong, T.-T., & Hsu, C.-H. (2008). Two-stage classification methods for microarray data. Expert Systems with Applications, 34, 375–383.
Wu, Y., & Wu, A. (2000). Taguchi methods for robust design. New York: ASME.
Xiong, M., Fang, X., & Zhao, J. (2001). Biomarker identification by feature wrappers. Genome Research, 11, 1878–1887.
Yang, C.-S., Chuang, L.-Y., Li, J.-C., & Yang, C.-H. (2008). A novel BPSO approach for gene selection and classification of microarray data. In IEEE international joint conference on neural networks, Hong Kong (pp. 2147–2152).
Yang, C. H., Huang, C. C., Wu, K. C., & Chang, H. Y. (2008). A novel GA-Taguchi-based feature selection method. In Intelligent data engineering and automated learning, Daejeon, South Korea (pp. 112–119).
Zhang, H., & Sun, G. (2002). Feature selection using tabu search method. Pattern Recognition, 35, 701–711.
Zhu, Z., Ong, Y.-S., & Dash, M. (2007). Wrapper-filter feature selection algorithm using a memetic framework. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37, 70–76.


Fig. 3. Taguchi method effect in microarray data.
