
A Hybrid BPSO-CGA Approach for Gene Selection and Classification of Microarray Data

LI-YEH CHUANG,1 CHENG-HUEI YANG,2 JUNG-CHIKE LI,2 and CHENG-HONG YANG3,4

1 Department of Chemical Engineering, and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan.
2 Department of Electronic Communication Engineering, National Kaohsiung Marine University, Kaohsiung, Taiwan.
3 Department of Network Systems, Toko University, Chiayi, Taiwan.
4 Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan.

Volume 19, Number 1, 2012. © Mary Ann Liebert, Inc. Pp. 68–82. DOI: 10.1089/cmb.2010.0064

ABSTRACT

Microarray analysis promises to detect variations in gene expressions, and changes in the transcription rates of an entire genome in vivo. Microarray gene expression profiles indicate the relative abundance of mRNA corresponding to the genes. The selection of relevant genes from microarray data poses a formidable challenge to researchers due to the high dimensionality of features, the multiclass categories involved, and the usually small sample size. A classification process is often employed to decrease the dimensionality of the microarray data. In order to correctly analyze microarray data, the goal is to find an optimal subset of features (genes) that adequately represents the original set of features. A hybrid method of binary particle swarm optimization (BPSO) and a combat genetic algorithm (CGA) is proposed to perform this gene selection. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as a classifier. The proposed BPSO-CGA approach is evaluated on ten microarray data sets from the literature. The experimental results indicate that the proposed method not only effectively reduces the number of selected genes, but also achieves a low classification error rate.

Key words: feature selection, genetic algorithm, K-nearest neighbor, microarray, particle swarm optimization.

1. INTRODUCTION

Microarray analysis promises to detect variations in gene expressions, and changes in the transcription rates of an entire genome in vivo. A microarray chip, manufactured by high-speed robotics, enables researchers to simultaneously put thousands of samples on a glass slide. The principle behind the microarray is the placement of specific nucleotide sequences in an orderly array, which are then hybridized by fluorescent DNA or RNA markers. The location and intensity of the fluorescent spot on the glass slide reveal the extent of the transcription of a particular gene. Microarrays are not limited to nucleic acid analysis. Protein microarrays and tissue microarrays were originally introduced to study protein-drug interaction, and the relationship between individual genes and diseases.

They have subsequently been applied successfully to the treatment of patients. Experimental data from microarrays is routinely gathered, and the amount of information contained in microarray data is becoming enormous. This abundance of information necessitates very complex biostatistical methods, many of which are still under development, to correctly interpret the information available (Forozan et al., 2000).

Discriminant analysis of microarray data has great potential as a medical diagnosis tool, since results represent the state of a cell at the molecular level. Microarray data is used in medical applications since it is possible to analyze gene expression characteristics and forecast the treatment of human diseases (Chang et al., 2005; Jeffrey et al., 2005; Lonning et al., 2005). Microarray data contains a large number of features with high dimensions, consists of multiclass categories, and is usually of a small sample size, all of which makes testing and training of general classification methods difficult. With the rapid increase of computational power for data analysis, suitable algorithms have become a very important issue. A suitable algorithm selects an optimal subset of genes that may represent a disease gene combination.

The classification of gene expression data samples involves feature selection and classifier design. A reliable selection method for genes relevant for sample classification is needed in order to increase classification accuracies and to avoid incomprehensibility. Feature selection is the process of choosing a subset of features from the original feature set; it can thus be viewed as a principal pre-processing tool when solving classification problems (Choudhary et al., 2006; Wang et al., 2007). Theoretically, feature selection problems are NP-hard. It is impossible to conduct an exhaustive search over the entire solution space since this would take a prohibitive amount of computing time and cost (Cover and Van Campenhout, 1977). The goal is to select a subset of d features (genes) from a set of D features (d < D) in a given gene expression data set (Oh et al., 2004). D comprises all features (genes) in a gene expression data set and may include noisy, redundant, and misleading features. We therefore aimed to delete irrelevant features and keep only features relevant for classification. Deleting irrelevant features improves the computational efficiency. Furthermore, decreasing the number of features results in a reduced classification error rate. An increasing number of evolutionary and optimization algorithms (Deutsch, 2003) have been applied to feature selection problems, such as genetic algorithms (GAs) (Oh et al., 2004), particle swarm optimization (PSO) (Wang et al., 2007), and tabu search (TS) (Tahir et al., 2007).

Identifying an optimal subset of features, i.e., relevant genes, is a very complex task in bioinformatics (Hua et al., 2005). Successful identification usually focuses on the determination of a minimal number of relevant genes combined with a decrease of the classification error rate (Huang and Chang, 2007). Statistical methods applied to microarray data analysis include the linear discriminant analysis (Dudoit et al., 2002; Lee et al., 2005; Li et al., 2004), the K-nearest neighbor method (Cover and Hart, 1967), support vector machines (Furey et al., 2000; Huang and Chang, 2007; Lee and Lee, 2003; Ramaswamy et al., 2001), Random Forest (RF) (Diaz-Uriarte and de Andres, 2006), instance-based methods (Berrar et al., 2006), entropy-based methods (Liu et al., 2005), and shrunken centroids (Tibshirani et al., 2002).

In general, two crucial factors determine the performance of a gene expression classification problem: gene feature selection and classifier design. Hence, in addition to the selected relevant gene subsets, the classification results also depend on the performance of the classifiers. The challenge in classifier design is to extract the proper information from the training samples and to allocate the samples to the previously defined diagnostic classes by measurements of expression of the selected genes (Buturovic, 2006). Choosing the correct category of the selected genes is a complicated assignment. In classifier design, solving a multi-class (class > 2) classification problem is generally more difficult than solving a classification problem with only two classes. The K-nearest neighbor (K-NN) method (Deutsch, 2003; Zhu et al., 2007) and support vector machines (SVMs) (Furey et al., 2000; Huang and Chang, 2007) are two prevalent classifiers for gene classification problems. In our study, we adopted the K-nearest neighbor method. The K-nearest neighbor procedure determines the K nearest neighbors based on the minimum distance from the testing sample to the training samples. For error estimation on the classifier, a widely used approach is cross-validation (CV). The leave-one-out cross-validation (LOOCV) procedure is a straightforward technique and gives an almost unbiased estimator (Jirapech-Umpai and Aitken, 2005). LOOCV can be used on small sample-sized data sets; the error-estimation burden on the classifier is thus reduced for the multi-class categories and small sample sizes of gene microarray data.


Evolutionary algorithms, with their heuristic and stochastic properties, often suffer from getting stuck in a local optimum. These common characteristics led to the development of evolutionary computation as an increasingly important field. A GA is a stochastic search procedure based on the mechanics of natural selection, genetics, and evolution. Since this type of algorithm simultaneously evaluates many points in the search space, it is more likely to find a global solution to a given problem. PSO describes a solution process in which each particle flies through the multidimensional search space. The particle velocity and position are constantly updated according to the best previous performance of the particle or of the particle's neighbors, as well as the best performance of the particles in the entire population. Hybridization of evolutionary algorithms with local search has been investigated in many studies (Kao and Zahara, 2008; Lovbjerg et al., 2001). Such a hybrid is often referred to as a memetic algorithm. Memetic algorithms can be treated as a genetic algorithm coupled with a local search procedure (Sorensen and Sevaux, 2006). In this article, instead of using a memetic algorithm, we combined two global optimization algorithms, i.e., a combat genetic algorithm (CGA) and binary particle swarm optimization (BPSO). We employed the hybrid BPSO-CGA algorithm to implement feature selection. The CGA is embedded in the BPSO and performs the role of a local optimizer for each generation. The K-nearest neighbor method (K-NN) with leave-one-out cross-validation (LOOCV) based on Euclidean distance calculations served as a classifier of the BPSO and CGA on ten microarray data sets taken from the literature (Diaz-Uriarte and de Andres, 2006).

In order to evaluate the performance of BPSO-CGA, we compared our experimental classification results with other results reported in the literature (Diaz-Uriarte and de Andres, 2006). These literature methods consisted of four distinct methods, namely two versions of RFs (s.e. = 0 and s.e. = 1), nearest neighbor with variable selection (NN.vs), and shrunken centroids (SC.s). In addition to these methods from the literature, we also compared our proposed BPSO-CGA with pure BPSO. Experimental results show that BPSO-CGA not only reduces the number of features, but also prevents the BPSO procedure from getting trapped in a local optimum, thereby lowering the classification error rate.

2. METHODS

2.1. Binary particle swarm optimization

Particle Swarm Optimization (PSO) is a population-based evolutionary computation technique developed in 1995 by Kennedy and Eberhart (1995). PSO simulates the social behavior of birds and fish. This behavior can be described as a swarm intelligence system. In PSO, each solution can be considered a particle in a search space, with an individual position and velocity. During movement, each particle adjusts its position by changing its velocity based on its historical experience and the best experience of its neighboring particles, until it reaches an optimum position (Kennedy, 2006). All of the particles have fitness values based on the calculations of a fitness function. Particles are updated by following two parameters called pbest and gbest at each iteration. Each particle is associated with the best solution (fitness) the particle has achieved so far in the search space. This fitness value is stored, and represents the position called pbest. The value gbest is the global optimum value for the whole population. PSO was originally developed to solve real-value optimization problems. However, many optimization problems occur in a space featuring discrete, qualitative distinctions between variables and levels of variables. To extend the real-value version of PSO to a binary/discrete space, Kennedy and Eberhart (1997) proposed a binary PSO (BPSO) method. In a binary search space, a particle may move to near corners of a hypercube by flipping various numbers of bits; thus, the overall particle velocity may be described by the number of bits changed per iteration (Kennedy and Eberhart, 1997).

The position of each particle is represented by Xp = {xp1, xp2, ..., xpd} and the velocity of each particle is represented by Vp = {vp1, vp2, ..., vpd}, where d is the dimension of the search space. In BPSO, once the adaptive values pbest and gbest are obtained, the features of the pbest and gbest particles can be tracked with regard to their position and velocity. Each particle is updated according to the following equations (Kennedy and Eberhart, 1997):

v_id^new = w · v_id^old + c1 · r1 · (pbest_id − x_id^old) + c2 · r2 · (gbest_d − x_id^old)   (1)

if v_id^new ∉ (Vmin, Vmax), then v_id^new = max(min(Vmax, v_id^new), Vmin)   (2)

S(v_id^new) = 1 / (1 + e^(−v_id^new))   (3)

if r3 < S(v_id^new), then x_id^new = 1; else x_id^new = 0   (4)

In Eq. 1, w is the inertia weight, c1 and c2 are acceleration parameters, and r1, r2, and r3 are independent random numbers uniformly distributed over [0, 1]. The velocities v_id^new and v_id^old are those of the updated particle and of the particle before the update, respectively; x_id^old is the original particle position (solution), and x_id^new is the updated particle position (solution). Eq. 2 clamps the particle velocity in each dimension to a maximum velocity Vmax: if the sum of accelerations causes the velocity of a dimension to exceed Vmax, the velocity of that dimension is limited to Vmax. Vmax and Vmin are user-specified parameters (in our case Vmax = 6, Vmin = −6). The updated position is calculated from the function S(v_id^new) (Eq. 3). In Eq. 4, if S(v_id^new) is larger than a randomly produced number within [0.0, 1.0], the position value is set to 1 (meaning this feature is selected as a required feature for the next update); otherwise, the position value is set to 0 (meaning this feature is not selected for the next update).
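To make the update concrete, the following Python sketch applies Eqs. (1)-(4) to one binary particle. The function name `update_particle` and the NumPy representation are our own illustration, not code from the paper; the constants w = 0.9, c1 = c2 = 2, and Vmax = 6 are the parameter values reported in Section 2.4.

```python
import numpy as np

W, C1, C2 = 0.9, 2.0, 2.0    # inertia weight and acceleration parameters (Section 2.4)
V_MAX, V_MIN = 6.0, -6.0     # velocity bounds used in the paper

def update_particle(x, v, pbest, gbest, rng):
    """One BPSO update of a binary position vector x with velocity v (Eqs. 1-4)."""
    r1, r2 = rng.random(x.size), rng.random(x.size)
    # Eq. 1: velocity update driven by the personal and global best positions
    v = W * v + C1 * r1 * (pbest - x) + C2 * r2 * (gbest - x)
    # Eq. 2: clamp each velocity component to (V_MIN, V_MAX)
    v = np.clip(v, V_MIN, V_MAX)
    # Eq. 3: sigmoid maps each velocity component to a selection probability
    s = 1.0 / (1.0 + np.exp(-v))
    # Eq. 4: a bit is set to 1 (feature selected) when a uniform draw falls below s
    x = (rng.random(x.size) < s).astype(int)
    return x, v

# Example: a ten-feature particle, as in the S = 1000100010 illustration of Section 2.4
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10)
v = np.zeros(10)
x, v = update_particle(x, v, x.copy(), x.copy(), rng)
```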

2.2. Combat genetic algorithm

Genetic Algorithms (GAs) were developed by John Holland in 1975 (Holland, 1992). The concept is based on Darwin's theory of evolution and the survival-of-the-fittest competition principle of natural selection. Based on evolutionary theory, the main principle of GAs is to randomly generate a population. GAs contain three evolutionary mechanism operators, namely selection, crossover, and mutation. After the three operators are applied, a new generation of the population is generated. GAs randomly generate a set of chromosomes (solutions) at the same time. The chromosomes with a higher fitness value are kept and evaluated until a fixed number of iterations is reached, and then a final optimal fitness solution is output. GAs may fail to converge at a global optimum point depending upon the selective capacity of the fitness function (large variation of the fitness function for small variation of its input variables). Combat genetic algorithms (CGA), proposed by Eksin and Erol (Eksin and Erol, 2001; Erol and Eksin, 2006), are used to improve on the classic GA shortcoming of premature convergence by focusing on the reproduction aspect of a typical GA. Reproduction is one of the three major mechanisms in a GA that shift the chromosomes towards a local/global optimum point; it usually decreases the diversity of chromosomes though, which can be viewed as a source of premature convergence towards a local optimum point (Erol and Eksin, 2006). The CGA algorithm can be summarized as follows (Erol and Eksin, 2006):

Step 1: Randomly generate an initial population of m chromosomes.
Step 2: Randomly select two distinct chromosomes from the population.
Step 3: Evaluate the fitness values of the two selected chromosomes.
Step 4: Calculate the relative difference value r by using

r = |f1 − f2| / (f1 + f2)   (5)

where f1 and f2 are the fitness values of chromosome 1 and chromosome 2, respectively.

Step 5: In compliance with the relative difference value, if the difference value is large, the partial overwrite operator is adopted, whereas if the difference value is small, the classic uniform crossover operation is chosen. This scenario can be divided into four cases:

(1) If f1 < f2 and R < r, the crossover operation is applied to chromosome 2 and chromosome 1 is left unchanged, where R is a random number selected from [0, 1].
(2) If f1 < f2 and R > r, the normal crossover operation is chosen.
(3) If f1 > f2 and R < r, the crossover operation is applied to chromosome 1 and chromosome 2 is left unchanged.
(4) If f1 > f2 and R > r, the normal crossover operation is chosen.

Step 6: The mutation operation is set to a probability of 1/m (m = number of chromosomes).
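A minimal Python sketch of one combat (Steps 2-6) follows, assuming binary chromosomes and a caller-supplied fitness function that returns a classification error rate (smaller is better). The helper names `combat_step` and `uniform_crossover` are ours, and the partial overwrite operator is sketched as the fitter chromosome's genes overwriting roughly half of the weaker one's, which is one plausible reading of the asymmetric operation described above.

```python
import numpy as np

def uniform_crossover(a, b, rng):
    """Classic uniform crossover: each gene position swaps between a and b with probability 0.5."""
    mask = rng.random(a.size) < 0.5
    return np.where(mask, a, b), np.where(mask, b, a)

def combat_step(pop, fitness, rng, mut_rate=None):
    """One CGA combat (Steps 2-6) on a population of binary chromosomes."""
    m = len(pop)
    mut_rate = 1.0 / m if mut_rate is None else mut_rate    # Step 6 default: 1/m
    i, j = rng.choice(m, size=2, replace=False)             # Step 2: pick two chromosomes
    f1, f2 = fitness(pop[i]), fitness(pop[j])               # Step 3: evaluate both
    r = abs(f1 - f2) / (f1 + f2) if f1 + f2 > 0 else 0.0    # Step 4: Eq. (5)
    R = rng.random()
    if R < r:                                               # Step 5: large difference
        winner, loser = (i, j) if f1 < f2 else (j, i)       # lower error rate wins
        # partial overwrite (our reading): winner genes overwrite ~half the loser's
        mask = rng.random(pop[loser].size) < 0.5
        pop[loser] = np.where(mask, pop[winner], pop[loser])
    else:                                                   # small difference
        pop[i], pop[j] = uniform_crossover(pop[i], pop[j], rng)
    for k in (i, j):                                        # Step 6: bit-flip mutation
        flip = rng.random(pop[k].size) < mut_rate
        pop[k] = np.where(flip, 1 - pop[k], pop[k])
    return pop
```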


2.3. K-nearest neighbor

The K-nearest neighbor (K-NN) method, one of the most popular nonparametric methods (Cover and Hart, 1967; Fix and Hodges, 1989; Hastie et al., 2005), is a supervised learning algorithm introduced by Fix and Hodges in 1951. K-NN classifiers have attracted the interest of many researchers due to their theoretic simplicity and the comparatively high accuracy that can be achieved with them compared to other, more complex methods (AltIncay, 2007). K-NN classifies objects which are represented as points defined in some feature space. The K-NN method is easy to implement since only the parameter K (the number of nearest neighbors) needs to be determined. The parameter K is the most important factor affecting the performance of the classification process. In a multidimensional feature space, the data is divided into testing and training samples. K-NN classifies a new object based on the minimum distance from the testing samples to the training samples; the Euclidean distance was used in this article. An object is assigned to the category most common among its K nearest neighbors. In order to increase the classification accuracy, the parameter K has to be adjusted according to the characteristics of the different data sets.

In K-NN, a large category tends to have high classification accuracy, while minority classes tend to have low classification accuracy (Tan, 2006). In this article, the leave-one-out cross-validation (LOOCV) method was implemented for the microarray data classification. When there are n samples to be classified, the data is divided into one testing sample and n−1 training samples at each generation of the evaluation process. The classifier is then constructed by training on the n−1 training samples. The category of the test sample can be determined by the classifier. We set the parameter K to 1 directly, meaning that 1-NN with LOOCV was used as a classifier to calculate the classification error rates in our study.
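As an illustration, the following sketch computes the LOOCV error rate of a 1-NN classifier over a candidate gene subset. The function name `loocv_error` and the use of NumPy are our own, but the Euclidean-distance 1-NN with K = 1 follows the description above.

```python
import numpy as np

def loocv_error(X, y, mask):
    """LOOCV error rate of 1-NN on the genes selected by the binary mask.

    X: (n_samples, n_genes) expression matrix; y: class labels;
    mask: 0/1 vector marking the selected genes.
    """
    Xs = X[:, mask.astype(bool)]
    n = len(y)
    errors = 0
    for i in range(n):
        # Euclidean distance from the held-out sample to every other sample
        d = np.sqrt(((Xs - Xs[i]) ** 2).sum(axis=1))
        d[i] = np.inf                 # leave sample i out of its own neighborhood
        nearest = np.argmin(d)        # K = 1 nearest neighbor
        errors += y[nearest] != y[i]
    return errors / n
```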

2.4. Hybrid BPSO-CGA procedure

The hybrid BPSO-CGA procedure used in this study combined a BPSO and a CGA for feature selection. The CGA was embedded within the BPSO and served as a local optimizer to improve the BPSO performance at each iteration. The flowchart of BPSO-CGA is shown in Figure 1. Initially, the position of each particle was represented by a binary (0/1) string S = F1F2···FD, where D is the dimension of the microarray data; 1 represents a selected feature, while 0 represents a non-selected feature. For example, if D = 10, we might obtain a random binary string S = 1000100010, in which only features F1, F5, and F9 are selected. The classification error rate of a 1-nearest neighbor (1-NN) classifier, determined by the leave-one-out cross-validation (LOOCV) method, was used to measure the fitness of the chromosomes and each particle. The BPSO-CGA procedure is described below:

Step 1: Randomly generate an initial population for the BPSO.
Step 2: Apply the CGA to the generated particles for reproduction, crossover, and mutation.
Step 3: Evaluate the fitness values of all particles.
Step 4: Check the stopping criteria. If satisfied, go to Step 5. Otherwise go to Step 2.
Step 5: Calculate the pbest and gbest values. Each particle updates its position and velocity via the BPSO update Eqs. (1)-(4).
Step 6: Check the stopping criteria. If satisfied, output the final solution. Otherwise go to Step 2.

The BPSO was configured to contain 20 particles and run for 100 iterations in each trial, or until a stopping criterion was met. The number of particles in the BPSO was equal to the number m of chromosomes in the CGA (m = 20). After each generation of the BPSO, the CGA was run 30 times, for a total of 100 generations. The CGA parameters were taken from Erol and Eksin (Erol and Eksin, 2006). The classical uniform crossover operation applied in the CGA method is of the type proposed in the original CGA literature (Erol and Eksin, 2006). The mutation rate was also chosen based on the original literature: a mutation operator with a probability of 1/m (1/m = 1/20 = 0.05), with m being the population size, was used. We adopted the BPSO parameter values of Shi and Eberhart (1998), which are claimed to be optimized: the acceleration factors c1 and c2 were both set to 2, and the inertia weight w was 0.9.
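Putting the pieces together, a condensed driver for the hybrid loop might look as follows. It reuses the hypothetical `update_particle`, `combat_step`, and `loocv_error` sketches above, with 20 particles, 100 BPSO iterations, and 30 CGA combats per generation as reported in this section; the exact scheduling of the CGA relative to the BPSO updates is our interpretation of the steps, not the authors' code.

```python
import numpy as np

def bpso_cga(X, y, n_particles=20, n_iter=100, cga_runs=30, seed=0):
    """Sketch of the hybrid BPSO-CGA feature-selection loop (Section 2.4)."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    fitness = lambda mask: loocv_error(X, y, mask)        # 1-NN LOOCV error rate
    pop = rng.integers(0, 2, size=(n_particles, D))       # Step 1: random binary strings
    vel = np.zeros((n_particles, D))
    pbest = pop.copy()
    pbest_fit = np.array([fitness(p) for p in pop])
    for _ in range(n_iter):
        for _ in range(cga_runs):                         # Steps 2-4: CGA as local optimizer
            pop = combat_step(pop, fitness, rng)
        fit = np.array([fitness(p) for p in pop])         # Step 5: refresh pbest and gbest
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pop[improved], fit[improved]
        gbest = pbest[np.argmin(pbest_fit)]
        for p in range(n_particles):                      # BPSO update, Eqs. (1)-(4)
            pop[p], vel[p] = update_particle(pop[p], vel[p], pbest[p], gbest, rng)
    return pbest[np.argmin(pbest_fit)], pbest_fit.min()   # best gene subset and its error
```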

3. RESULTS

Due to the peculiar characteristics of microarray data (high number of genes and small sample size), many researchers are currently studying how to select relevant genes effectively before using a classification


REFERENCES

AltIncay, H. 2007. Ensembling evidential k-nearest neighbor classifiers through multi-modal perturbation. Applied Soft Computing 7, 1072–1083.

Berrar, D., Bradbury, I., and Dubitzky, W. 2006. Instance-based concept learning from multiclass DNA microarray data. BMC Bioinformatics 7, 73.

Breiman, L. 1984. Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton, FL.

Buturovic, L.J. 2006. PCP: a program for supervised classification of gene expression profiles. Bioinformatics 22, 245.

Chang, J.C., Hilsenbeck, S.G., and Fuqua, S.A.W. 2005. The promise of microarrays in the management and treatment of breast cancer. Breast Cancer Res. 7, 100.

Choudhary, A., Brun, M., Hua, J., et al. 2006. Genetic test bed for feature selection. Bioinformatics 22, 837.

Cover, T., and Hart, P. 1967. Nearest neighbor pattern classification. IEEE Trans. Information Theory 13, 21–27.

Cover, T.M., and Van Campenhout, J.M. 1977. On the possible orderings in the measurement selection problem. IEEE Trans. Syst. Man Cybernet. 7, 657–661.

Deutsch, J.M. 2003. Evolutionary algorithms for finding optimal gene sets in microarray prediction. Bioinformatics 19, 45.

Diaz-Uriarte, R., and de Andres, A. 2006. Gene selection and classification of microarray data using Random Forest. BMC Bioinformatics 7, 3.

Dudoit, S., Fridlyand, J., and Speed, T.P. 2002. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 97, 77–87.

Eksin, I., and Erol, O.K. 2001. Evolutionary algorithm with modifications in the reproduction phase. IEE Proc. Software 148, 75–80.

Erol, O.K., and Eksin, I. 2006. A new optimization method: big bang–big crunch. Adv. Eng. Software 37, 106–111.

Fix, E., and Hodges, Jr., J.L. 1989. Discriminatory analysis. Nonparametric discrimination: consistency properties. Int. Stat. Rev. 57, 238–247.

Forozan, F., Mahlamaki, E.H., Monni, O., et al. 2000. Comparative genomic hybridization analysis of 38 breast cancer cell lines: a basis for interpreting complementary DNA microarray data. Cancer Res. 60, 4519.

Furey, T.S., Cristianini, N., Duffy, N., et al. 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16, 906.

Hastie, T., Tibshirani, R., Friedman, J., et al. 2005. The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer 27, 83–85.

Holland, J.H. 1992. Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA.

Hua, J., Xiong, Z., Lowey, J., et al. 2005. Optimal number of features as a function of sample size for various classification rules. Bioinformatics 21, 1509.

Huang, H.L., and Chang, F.L. 2007. ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data. Biosystems 90, 516–528.

Jeffrey, S.S., Lonning, P.E., and Hillner, B.E. 2005. Genomics-based prognosis and therapeutic prediction in breast cancer. J. Natl. Comprehensive Cancer Network 3, 291–300.

Jirapech-Umpai, T., and Aitken, S. 2005. Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes. BMC Bioinformatics 6, 148.

Kao, Y.T., and Zahara, E. 2008. A hybrid genetic algorithm and particle swarm optimization for multimodal functions. Appl. Soft Computing 8, 849–857.

Kennedy, J. 2006. Swarm Intelligence. Springer, New York.

Kennedy, J., and Eberhart, R. 1995. Particle swarm optimization. IEEE Int. Conf. Neural Networks 4, 1942–1948.

Kennedy, J., and Eberhart, R.C. 1997. A discrete binary version of the particle swarm algorithm. IEEE Int. Conf. Syst. Man Cybernet. 5, 4104–4108.

Lee, J.W., Lee, J.B., Park, M., et al. 2005. An extensive comparison of recent classification tools applied to microarray data. Comput. Stat. Data Anal. 48, 869–885.

Lee, Y., and Lee, C.K. 2003. Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 19, 1132.

Li, T., Zhang, C., and Ogihara, M. 2004. A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20, 2429.

Liu, X., Krishnan, A., and Mondry, A. 2005. An entropy-based gene selection method for cancer classification using microarray data. BMC Bioinformatics 6, 76.

Lonning, P.E., Sorlie, T., and Borresen-Dale, A.L. 2005. Genomics in breast cancer—therapeutic implications. Nat. Clin. Pract. Oncol. 2, 26–33.

Lovbjerg, M., Rasmussen, T.K., and Krink, T. 2001. Hybrid particle swarm optimiser with breeding and subpopulations. Proc. 3rd Genet. Evol. Comput. Conf. 469–476.


Oh, I.S., Lee, J.S., and Moon, B.R. 2004. Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26, 1424–1437.

Ramaswamy, S., Tamayo, P., Rifkin, R., et al. 2001. Multiclass cancer diagnosis using tumor gene expression sig-natures. Proc. Natl. Acad. Sci. USA 98, 15149.

Ripley, B.D. 2008. Pattern Recognition and Neural Networks. Cambridge University Press, New York.

Sberveglieri, M.P.G. 2008. Random Forests and nearest shrunken centroids for the classification of sensor array data. Sensors Actuators B Chem. 131, 93–99.

Shi, Y., and Eberhart, R. 1998. A modified particle swarm optimizer. Proc. IEEE Int. Conf. Evol. Computation 69–73.

Sorensen, K., and Sevaux, M. 2006. MA|PM: memetic algorithms with population management. Computers Operations Res. 33, 1214–1225.

Tahir, M.A., Bouridane, A., and Kurugollu, F. 2007. Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier. Pattern Recogn. Lett. 28, 438–446.

Tahir, M.A., Bouridane, A., Kurugollu, F., et al. 2005. A novel prostate cancer classification technique using intermediate memory tabu search. EURASIP J. Appl. Signal Processing 14, 2241.

Tan, S. 2006. An effective refinement strategy for KNN text classifier. Expert Syst. Applications 30, 290–298.

Tibshirani, R., Hastie, T., Narasimhan, B., et al. 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 99, 6567.

Wang, X., Yang, J., Teng, X., et al. 2007. Feature selection based on rough sets and particle swarm optimization. Pattern Recogn. Lett. 28, 459–471.

Zhu, Z., Ong, Y., and Dash, M. 2007. Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans. Syst. Man. Cybernet. B 37, 70.

Address correspondence to:
Dr. Cheng-Hong Yang
Department of Electronic Engineering
National Kaohsiung University of Applied Sciences
Kaohsiung, Taiwan 807
E-mail: chyang@cc.kuas.edu.tw
