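National Science Council, Executive Yuan: Research Project Final Report
GA-based Support Vector Machines
Project type: Individual project
Project number: NSC93-2218-E-110-034-
Project period: August 1, 2004 to July 31, 2005 (ROC years 93 to 94)
Executing institution: Department of Electrical Engineering, National Sun Yat-sen University
Principal investigator: 謝哲光
Project participants: 吳旭焜, 林義隆
Report type: Concise report
Handling: This project report is open to public inquiry
September 26, 2005 (ROC year 94)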
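National Science Council, Executive Yuan: Final Report of a Subsidized Research Project
GA-based Support Vector Machines
Project type: ■ Individual project □ Integrated project
Project number: NSC-93-2218-E-110-034
Project period: August 1, 2004 to July 31, 2005 (ROC years 93 to 94)
Principal investigator: 謝哲光, Department of Electrical Engineering, National Sun Yat-sen University
Project participants: 吳旭焜, 林義隆
This final report includes the following required attachments:
□ One report on an overseas business trip or study visit
□ One report on a business trip or study visit to mainland China
□ One report on attending an international academic conference, together with the paper presented
□ One overseas research report for an international cooperative research project
Executing institution: Department of Electrical Engineering, National Sun Yat-sen University
October 15, 2005 (ROC year 94)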
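National Science Council, Executive Yuan: Research Project Final Report
Title: GA-based Support Vector Machines
Project number: NSC-93-2218-E-110-034
Project period: August 1, 2004 to July 31, 2005 (ROC years 93 to 94)
Principal investigator: 謝哲光, Department of Electrical Engineering, National Sun Yat-sen University
Project participants: 吳旭焜, 林義隆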
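I. Abstract (translated from the Chinese)
In any machine learning problem, four important factors must be considered to ensure successful learning: (1) the quality of the experimental data; (2) feature extraction, including feature selection and feature composition; (3) model selection, including the choice of learning machine and the setting of its parameters; (4) the training algorithm for the chosen learning machine. In this research project, we use the support vector machine as the learning machine. We explore two very general frameworks in which the genetic algorithm assists in setting the parameters of the learning machine. Both frameworks take the training error and the testing error into account. Finally, we successfully apply the research results to the problem of nurses’ intention to quit; the results can benefit hospital human resource management.
Keywords: support vector machines, genetic algorithms, nurses’ intention to quit
Abstract
In any machine learning problem, there are four important factors that must be taken into serious account for successful learning: (1) quality of the experimental data; (2) feature extraction, including feature selection and feature composition; (3) model selection, including choice of learning machine and determination of machine parameters; (4) the algorithm utilized for training the learning machine. Two general frameworks are proposed to ease the parameter setting of the support vector machine via genetic algorithms. Both the training errors and testing errors are taken into account in these two frameworks. The results of our research are applied to the problem of nurses’ intention to quit. We sincerely hope that our research results may contribute to this hospital human resource management problem.
Keywords: Support vector machines, Genetic algorithms, Nurses’ intention to quit
II. Introduction and Research Objectives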
The present era of cost-containment pressures means that nursing executives have to ensure that their nurses have a work environment with the work characteristics known to be linked to job satisfaction and good outcomes. Leiter et al. [1] concluded that patients who stayed on wards where nursing staff felt more exhausted or more frequently expressed their intention to quit were less satisfied with their medical care. Several recent studies also suggested that nurses’ job satisfaction contributed to patients’ satisfaction with nursing care, which is one of the most important clinical outcome indicators [2, 3]. As indicated in Alexander et al. [4], nurses’ intention to quit their jobs has a strong effect on actual turnover, which might in turn lower nurses’ job satisfaction. Nurses’ job dissatisfaction might further affect the quality of nursing care delivery and patients’ intention to return to a medical institution for future health care services [2]. In other words, nursing executives need to reduce nurses’ intention to quit and increase their job satisfaction in order to sustain a medical institution’s market share. This study therefore addresses a significant problem in nursing: turnover. The support vector machine is utilized as the learning machine in this study, and the genetic algorithm is used to help determine the parameters of the support vector learning problem, including those of the kernel.
The prediction of nurses’ intention to quit is studied using working motivation, job satisfaction, and stress level. The data used in this study was collected in three hospitals located in southern Taiwan. The target population was all nurses with 389 valid cases [5].
There are 36 input variables and one categorical output variable. The input variables include demographic characteristics, working motivation, job satisfaction and importance subscales, and general perceptual factors. The class labels of the output variable, i.e., target, are 1, 2, 3, 4, and 5. These five classes are interpreted as: 5=often have intention to leave my current job; 4=sometimes have intention to leave my current position; 3=don’t have clear intention to leave or stay at my current job; 2=seldom have intention to leave my current position; 1=rarely have intention to leave my current job. See Tzeng et al. [5] for more details of the variables description.
Clearly, this is a 5-class classification problem. The proposed model of our learning machine is schematically shown in Fig. 1. The proposed learning machine, like the usual logistic regressor, is composed of two parts. The first part is a support vector machine for regression, abbreviated as SVR. The second part is a rounding operator which finds the nearest positive integer among 1 to 5 for the real-valued output from SVR. For instance, 2.8 becomes 3 and 4.2 becomes 4. The support vector machine for regression is briefly reviewed later.
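The rounding operator in the second part of the model can be illustrated with a minimal Python sketch. The function name `round_to_class` is hypothetical, and the tie-breaking rule (round half up) and the clipping of out-of-range outputs are assumptions, since the text only states that the nearest integer among 1 to 5 is taken.

```python
import math


def round_to_class(svr_output: float, lowest: int = 1, highest: int = 5) -> int:
    """Map a real-valued SVR output to the nearest class label in [lowest, highest]."""
    # Round half up. How ties such as 2.5 are broken is not stated in the
    # report, so this rule is an assumption of the sketch.
    nearest = math.floor(svr_output + 0.5)
    # Clip so that outputs below 1 or above 5 still give a valid label.
    return max(lowest, min(highest, nearest))


print(round_to_class(2.8))  # 3
print(round_to_class(4.2))  # 4
```

Clipping matters because a regression output is unbounded, whereas the target takes only the five labels 1 to 5.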
Learning from examples has been one of the most exciting research areas in the engineering and statistics communities. The main goal of learning from examples is to find a general rule that explains experimental data given only a sample of limited size. Popular learning machines include artificial neural networks, generalized radial basis function networks, fuzzy neural networks, and support vector machines. It is our experience that four important factors must be taken into serious account for successful learning: (1) quality of the experimental data; (2) feature extraction, including feature selection and feature composition; (3) model selection, including choice of learning machine and determination of machine parameters; (4) the algorithm utilized for training the learning machine. In this study, the determination of the parameters of the support vector machine is addressed.
It is well known that a small training error of a learning machine does not imply good generalization to previously unseen data. A learning machine with too high a capacity typically leads to the very undesirable effect of over-fitting. On the other hand, a learning machine with too low a capacity typically leads to the equally undesirable effect of under-fitting. To overcome this bias-variance dilemma, the method of cross validation is usually employed [6]. In this method, one first splits the experimental dataset into several, say m, parts of approximately equal size. We then perform m training runs. Each time, one of the m parts is left out and used as an independent validation set for optimizing the parameters. Taking into consideration both the average training and testing error rates, we choose the parameters with acceptable results on average over the m runs. The method of cross validation works well for many practical problems. We believe that the success of cross validation lies in its constructive use of the experimental data. In this study, the genetic algorithm is utilized to help set the parameters of the learning problem.
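The splitting step of cross validation described above can be sketched in Python as follows. The function name `m_fold_splits` and the fixed random seed are assumptions made for illustration; training and scoring a learning machine on each split is left out.

```python
import random


def m_fold_splits(n_samples: int, m: int, seed: int = 0):
    """Return m (training, validation) index pairs for m-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Stride slicing gives m parts whose sizes differ by at most one.
    folds = [indices[k::m] for k in range(m)]
    splits = []
    for k in range(m):
        validation = folds[k]
        # All parts except the k-th form the training set of run k.
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((training, validation))
    return splits


# Example with the dataset size used in this study (389 cases) and m = 4.
for training, validation in m_fold_splits(389, 4):
    print(len(training), len(validation))
```

Each index appears in exactly one validation part, so every experimental datum is used once for validation and m - 1 times for training.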
III. Research Methods
1. Support vector machine
The support vector machine (SVM) is a newly developed learning machine rooted in statistical learning theory [7]. In some learning machines, such as artificial neural networks or fuzzy logic systems, the empirical risk minimization principle is often used for finding the weights of the learning machine. The SVM, by contrast, is an approximate implementation of the structural risk minimization principle. It usually has good generalization ability for previously unseen data and has been successfully applied to various areas in science and engineering. For background material, see, e.g., [8, 9] and the references therein.
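Let $\Re^m$ denote the $m$-dimensional real space and let $\underline{p} := \{1, 2, \ldots, p\}$. Here $x \in \Re^n$ is the input vector and $y$ is the target variable. Let $X \subseteq \Re^n$ and $Y \subseteq \Re$.
Suppose we are given the training dataset
$$S := \{(x_i, y_i)\}_{i=1}^{l} \subseteq X \times Y. \tag{1}$$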
Let $K(x, z)$ be a given kernel on $X \times X$ such that
$$K(x, x') := \langle \phi(x), \phi(x') \rangle, \quad x, x' \in X, \tag{2}$$
where $\phi$ is the feature map from the input space $X$ to the feature space $F$ equipped with the inner product $\langle \cdot, \cdot \rangle$.
Consider the following optimization problem [8]:
maximize
$$\sum_{i=1}^{l} y_i z_i - \varepsilon \sum_{i=1}^{l} |z_i| - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} z_i z_j K(x_i, x_j)$$
subject to
$$\sum_{i=1}^{l} z_i = 0, \quad -C \le z_i \le C, \quad i \in \underline{l}. \tag{3}$$
Here $\varepsilon$ is a small positive number representing the tolerance for regression and $C$ is a positive number representing the tradeoff between the training error and the generalization error.
Suppose $z_i^*$, $i \in \underline{l}$, solve the problem (3). Then the optimal predictive function is given by
$$f^*(x) = \sum_{i=1}^{l} z_i^* K(x_i, x) + b^*, \tag{4}$$
where
$$b^* = y_k - \varepsilon - \sum_{i=1}^{l} z_i^* K(x_i, x_k) \quad \text{for any } 0 < z_k^* < C,$$
or
$$b^* = y_k + \varepsilon - \sum_{i=1}^{l} z_i^* K(x_i, x_k) \quad \text{for any } -C < z_k^* < 0.$$
Due to the different physical dimensions of the input variables, we use in this study the Mahalanobis kernel given by
$$K(x, x') := \exp\left[ -\sigma_1^{-2} (x_1 - x_1')^2 - \cdots - \sigma_n^{-2} (x_n - x_n')^2 \right], \quad x, x' \in X. \tag{5}$$
In the learning problem stated above, there are parameters that must be prescribed in order to have a well-posed formulation. These include the parameters $\varepsilon$, $C$, and the kernel parameters $\sigma_1^2, \ldots, \sigma_n^2$.
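A minimal Python sketch of the kernel and of the predictive function is given here, reading Eq. (5) as $K(x, x') = \exp[-\sum_k (x_k - x_k')^2 / \sigma_k^2]$ and Eq. (4) as a kernel expansion plus an offset. The function names and argument layout are assumptions for illustration; the coefficients `z_star` and the offset `b_star` are presumed to come from a separate solver for problem (3).

```python
import math


def mahalanobis_kernel(x, x_prime, sigma_sq):
    """Kernel with one width per input variable: exp(-sum_k (x_k - x'_k)^2 / sigma_k^2)."""
    exponent = sum((a - b) ** 2 / s for a, b, s in zip(x, x_prime, sigma_sq))
    return math.exp(-exponent)


def predict(x, train_x, z_star, b_star, sigma_sq):
    """Evaluate f(x) = sum_i z_i * K(x_i, x) + b for given coefficients z_i and offset b."""
    return sum(
        z * mahalanobis_kernel(x_i, x, sigma_sq) for x_i, z in zip(train_x, z_star)
    ) + b_star


# Two input variables with different scales, hence different widths.
print(mahalanobis_kernel([0.0, 0.0], [1.0, 2.0], [1.0, 4.0]))  # exp(-2)
```

Giving each input variable its own width lets variables measured in different units contribute comparably to the kernel value, which is the motivation stated in the text.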
Various methods have been proposed for determining the parameters $\varepsilon$ and $C$ of the support vector learning problem (3); see Cherkassky & Ma [10] and the references therein. In particular, as pointed out in Cherkassky & Ma [10], the choice of $\varepsilon$ and $C$ is related to the noise level of the experimental data. However, in many practical problems, like the one considered in this study, it is hard to estimate a reasonable noise level for the experimental data. Furthermore, we believe that the kernel parameters, e.g., $\sigma_1^2, \ldots, \sigma_n^2$, are of equal importance for the successful design of a learning machine. Very often, for support vector learning from examples, we have no grounds for setting these parameters appropriately. In this study, we use the genetic algorithm to help determine appropriate parameters. Hence each chromosome of the population, in the terminology of the genetic algorithm, consists of a real array with $n+2$ parameters, i.e., $\varepsilon$, $C$, $\sigma_1^2, \ldots, \sigma_n^2$. Each chromosome corresponds to one learning machine.
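The chromosome layout described above (a real array holding the tolerance, the tradeoff constant, and one kernel width per input variable) might be sketched in Python as follows. The names `SvrParameters`, `decode`, and `random_chromosome`, as well as the sampling ranges, are hypothetical; the report does not state the search ranges used.

```python
import random
from typing import NamedTuple, Tuple


class SvrParameters(NamedTuple):
    epsilon: float
    c: float
    sigma_sq: Tuple[float, ...]


def decode(chromosome):
    """Split a real array [epsilon, C, sigma_1^2, ..., sigma_n^2] into named parts."""
    return SvrParameters(chromosome[0], chromosome[1], tuple(chromosome[2:]))


def random_chromosome(n_features, rng, eps_range=(0.001, 1.0),
                      c_range=(0.1, 100.0), sigma_sq_range=(0.1, 100.0)):
    """Draw one chromosome uniformly at random; all ranges are assumed, not from the report."""
    genes = [rng.uniform(*eps_range), rng.uniform(*c_range)]
    genes += [rng.uniform(*sigma_sq_range) for _ in range(n_features)]
    return genes


# With the 36 input variables of this study, a chromosome has 36 + 2 = 38 genes.
chromosome = random_chromosome(36, random.Random(0))
print(len(chromosome), len(decode(chromosome).sigma_sq))
```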
2. Genetic algorithm
The genetic algorithm, originally developed by John Holland over the course of the 1960s and 1970s, is a biologically motivated search technique mimicking natural selection and natural genetics [11]. It is a general search method in between exhaustive search and traditional search methods. When the fitness landscape of the problem is unclear or riddled with many local optima, the genetic algorithm usually has good searching capability. It starts with a population of possible solutions, called chromosomes, to the problem. A prescribed fitness function is defined for each chromosome. Then highly fit chromosomes are selected for reproduction. After the genetic operations of crossover and mutation, a new generation of possible solutions is formed. This process is repeated until some stopping criterion is met. The genetic algorithm has successfully been applied to many areas of science and engineering. See, e.g., [12, 13], and the references therein.
The genetic algorithm to be used is briefly described in the following. Let $N_{ip}$ be the initial population size. After the initial random seeding, only $N_p$ highly fit chromosomes, according to their fitness values, are kept for the next generation, i.e., $N_p$ is the population size after the initial generation.
In our genetic algorithm, $N_g$ highly fit chromosomes are kept directly for the next generation. Moreover, these chromosomes form the mating pool for reproduction. The remaining $N_b$ chromosomes, where $N_p = N_g + N_b$, are simply thrown away. The crossover operation is performed on the $N_g$ highly fit chromosomes to generate $N_b$ temporary offspring chromosomes. The mutation operation is then applied to these temporary offspring chromosomes to form the next generation.
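One generation of the scheme above (keep the fittest chromosomes, discard the rest, refill with offspring) can be sketched in Python. The function `next_generation` and the callback `breed` are hypothetical names; `average_breed` is only a toy stand-in for the crossover and mutation operators, which are described separately in the text.

```python
import random


def next_generation(population, fitness, n_g, n_b, breed, rng):
    """Keep the n_g fittest chromosomes and refill the population with n_b offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    mating_pool = ranked[:n_g]  # survivors; the remaining chromosomes are discarded
    offspring = []
    while len(offspring) < n_b:
        offspring.extend(breed(mating_pool, rng))
    return mating_pool + offspring[:n_b]


def average_breed(pool, rng):
    """Toy placeholder operator: two children equal to the mean of two random parents."""
    a, b = rng.sample(pool, 2)
    child = [(u + v) / 2 for u, v in zip(a, b)]
    return [child, list(child)]


population = [[float(v)] for v in range(10)]
new_population = next_generation(
    population, lambda c: c[0], n_g=4, n_b=6, breed=average_breed, rng=random.Random(1)
)
print(len(new_population), new_population[:4])
```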
In our genetic algorithm, tournament selection is utilized for mating. The parameterized uniform crossover [14] is used for reproduction, which is briefly described as follows [13]. Let $\gamma_j$ be the prescribed crossover rate in generation $j$. Suppose the parent chromosomes selected for reproduction are given by
$$z^{(d)} := [z_{d1}\ z_{d2}\ \ldots\ z_{d,l-1}\ z_{dl}], \quad z^{(m)} := [z_{m1}\ z_{m2}\ \ldots\ z_{m,l-1}\ z_{ml}]. \tag{6}$$
Then the offspring chromosomes are given by
$$z^{(b)} := [z_{b1}\ z_{b2}\ \ldots\ z_{b,l-1}\ z_{bl}], \quad z^{(s)} := [z_{s1}\ z_{s2}\ \ldots\ z_{s,l-1}\ z_{sl}], \tag{7}$$
where, for each $i \in \underline{l}$,
$$\text{if } \alpha_i < \gamma_j \text{, then } z_{bi} = z_{mi} - \beta_i (z_{mi} - z_{di}), \quad z_{si} = z_{di} + \beta_i (z_{mi} - z_{di}); \quad \text{else } z_{bi} = z_{di}, \quad z_{si} = z_{mi}, \tag{8}$$
and $\alpha_i$, $\beta_i$ are random numbers in $[0, 1]$. It is interesting to note that if $\alpha_i < \gamma_j$, then $z_{bi} = z_{mi}$ and $z_{si} = z_{di}$ when $\beta_i = 0$; similarly, if $\alpha_i < \gamma_j$, then $z_{bi} = z_{di}$ and $z_{si} = z_{mi}$ when $\beta_i = 1$.
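The crossover rule can be sketched in Python under the reading that, at each locus, a blend of the two parent genes is taken with probability equal to the crossover rate and the genes are copied unchanged otherwise. The function name and the use of Python's `random.Random` as the source of the random numbers are assumptions.

```python
import random


def parameterized_uniform_crossover(z_d, z_m, gamma, rng):
    """Produce two offspring from parents z_d and z_m with crossover rate gamma."""
    z_b, z_s = [], []
    for d, m in zip(z_d, z_m):
        alpha, beta = rng.random(), rng.random()  # fresh random numbers per locus
        if alpha < gamma:
            # Blend the two parent genes at this locus.
            z_b.append(m - beta * (m - d))
            z_s.append(d + beta * (m - d))
        else:
            # Copy the parent genes unchanged.
            z_b.append(d)
            z_s.append(m)
    return z_b, z_s


parent_d = [0.1, 10.0, 5.0]
parent_m = [0.3, 30.0, 9.0]
print(parameterized_uniform_crossover(parent_d, parent_m, 0.6, random.Random(0)))
```

At every locus the two offspring genes sum to the same value as the two parent genes, so the blend redistributes the parents' values without moving outside the interval they span.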
Let $\lambda_j$ be the prescribed mutation rate in generation $j$. The number of mutations is given by $k_j := \lambda_j \cdot N_p \cdot l$. For each mutation, we first randomly select one locus of a randomly chosen chromosome from the $N_b$ offspring chromosomes. Then the parameter at this locus is replaced by a random number.
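The mutation step can be sketched in Python as below. Rounding the mutation count to an integer and drawing the replacement value uniformly from a per-locus range are assumptions; the text says only that the parameter is replaced by a random number. The name `mutate` and the `bounds` argument are hypothetical.

```python
import random


def mutate(offspring, rate, n_p, bounds, rng):
    """Apply round(rate * n_p * l) single-gene mutations, in place, to the offspring."""
    length = len(offspring[0])
    n_mutations = round(rate * n_p * length)  # rounding is an assumption
    for _ in range(n_mutations):
        row = rng.randrange(len(offspring))  # randomly chosen offspring chromosome
        locus = rng.randrange(length)        # randomly chosen locus
        low, high = bounds[locus]
        offspring[row][locus] = rng.uniform(low, high)
    return offspring


# Settings resembling this study: rate 0.01, N_p = 40, chromosome length 38.
children = [[0.0] * 38 for _ in range(20)]
mutate(children, 0.01, 40, [(1.0, 2.0)] * 38, random.Random(0))
print(sum(1 for child in children for gene in child if gene != 0.0))
```

With these settings the count is 0.01 × 40 × 38 = 15.2, so about 15 genes are touched per generation; the same locus may be hit more than once.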
3. Two frameworks
In the following, we propose two general frameworks for determining the parameters of a support vector learning problem. It should be kept in mind that only experimental data are available on hand. Both the training errors and testing errors are taken into account in these two frameworks. The main idea behind these frameworks is the constructive use of the experimental data.
Framework 1, as shown in Fig. 2, may be viewed as a generalization of the method of cross validation. In each generation of Framework 1, training examples of fixed size are randomly selected from the experimental dataset, with the remaining data used as the testing examples. The cost of a chromosome is defined as the number of misclassified experimental data. Equivalently, we may define the fitness value of a chromosome as the number of experimental data that are correctly classified. Because of the random selection of training examples, the fitness value of the same chromosome differs from generation to generation.
It is our experience that the experimental data that are likely to stump the learning machine should be kept in the training dataset for full training of the learning machine. This motivates our Framework 2, as shown in Fig. 3. In each generation of Framework 2, the fitness of each experimental datum is calculated as the total number of misclassifications made by the chromosomes in the current population. The experimental data with higher fitness values are kept in the training dataset for the next generation. The cost or fitness of a chromosome is defined as in Framework 1. The main feature of Framework 2 is that both the chromosomes, representing the possible parameters of the learning problem, and the experimental data are assigned fitness values.
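The data-fitness idea of Framework 2 can be illustrated with a small Python sketch. It assumes the predicted class labels of every chromosome for every experimental datum are already available as a table; the function names and the tie-breaking by original index are assumptions, and the report does not specify how the rest of the training set is filled.

```python
def data_fitness(predictions, labels):
    """Count, for each datum, how many chromosomes misclassify it.

    predictions[c][i] is the label predicted by chromosome c for datum i.
    """
    return [
        sum(1 for row in predictions if row[i] != label)
        for i, label in enumerate(labels)
    ]


def hardest_indices(fitness, training_size):
    """Indices of the training_size data with the highest fitness (hardest to classify)."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    return sorted(order[:training_size])


labels = [1, 2, 3, 4]
predictions = [
    [1, 2, 3, 5],  # chromosome 0 misses datum 3
    [1, 3, 3, 5],  # chromosome 1 misses data 1 and 3
    [2, 3, 3, 4],  # chromosome 2 misses data 0 and 1
]
fitness = data_fitness(predictions, labels)
print(fitness, hardest_indices(fitness, 2))
```

Data that many chromosomes get wrong receive a high fitness and therefore stay in the training set, which is the behaviour the framework is motivated by.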
It is a characteristic of the genetic algorithm that, upon termination, it provides multiple candidates for the solution of the problem. Thus we must devise some way to choose a final solution among those candidates, i.e., the final choice of the parameters of the learning problem. A natural criterion is the life time of a chromosome, i.e., the number of generations that a given chromosome lives. Another reasonable criterion, used in this study, is the life score of a chromosome, which is the average score over generations of a given chromosome. In this approach, scores are assigned in descending order to only the top m, say top 5, chromosomes. For the other chromosomes, the scores are set to zero. The chromosome with the highest life score is chosen as our final solution.
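The life-score criterion can be sketched in Python as follows. Two details are assumptions rather than statements of the report: the top m chromosomes receive the scores m, m - 1, ..., 1, and the average is taken over the generations in which a chromosome is present in the population. The identifiers used for chromosomes are hypothetical.

```python
from collections import defaultdict


def life_scores(ranked_generations, top_m=5):
    """Average per-generation score of each chromosome.

    ranked_generations is a list of rankings, one per generation, each listing
    chromosome identifiers from fittest to least fit.
    """
    totals = defaultdict(int)
    lifetimes = defaultdict(int)
    for ranking in ranked_generations:
        for position, chromosome_id in enumerate(ranking):
            lifetimes[chromosome_id] += 1
            if position < top_m:
                totals[chromosome_id] += top_m - position  # top_m, top_m - 1, ..., 1
    return {cid: totals[cid] / lifetimes[cid] for cid in lifetimes}


history = [["a", "b", "c"], ["b", "a", "c"], ["b", "d", "a"]]
scores = life_scores(history, top_m=2)
print(max(scores, key=scores.get), scores)
```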
IV. Results and Discussion
In our simulations, the efficient sequential minimal optimization (SMO) algorithm [15] is used for solving the support vector regression problem (3). In each iteration of this algorithm, two parameters, say $z_i$ and $z_j$, are selected, very often by heuristics. Then the optimization with respect to these two parameters, with the others unchanged, can be carried out explicitly by some simple formulae.
In the simulations, the number of training examples is kept at 291, which is about 3/4 of the experimental data size. Moreover, we set $N_{ip} = 50$, $N_p = 40$, $N_g = 20$, $N_b = 20$, $\gamma_j = 0.6$, and $\lambda_j = 0.01$. For the calculation of the life scores of chromosomes in the population, scores are assigned in descending order to only the top 5 chromosomes.
The final parameters are obtained as follows.
Framework 1: $\varepsilon = 0.0255$, $C = 39.768$, $\sigma_1^2 = 30.5869$, $\sigma_2^2 = 19.7266$, ..., $\sigma_{35}^2 = 20.3573$, $\sigma_{36}^2 = 2.39012$.
Framework 2: $\varepsilon = 0.0532$, $C = 15.0258$, $\sigma_1^2 = 33.9908$, $\sigma_2^2 = 33.5069$, ..., $\sigma_{35}^2 = 32.0047$, $\sigma_{36}^2 = 2.14179$.
It is noted that the final parameters determined by either Framework 1 or Framework 2 can hardly be guessed by heuristics. In our earlier study [5], the usual cross validation was employed, resulting in $\varepsilon = 0.1$, $C = 8$, and $\sigma_i^2 = 8$ for all $i$. For convenience, this set of parameters is said to be produced by Framework 3. Now we have three sets of parameters for the support vector learning problem under consideration. To test their generalization capability, we randomly select 100 testing examples, with the remaining data as the training examples, and repeat this 10 times. Then we calculate the average number of misclassifications of the testing examples for each parameter set produced by the three frameworks. The result is shown in Table 1. From this table, it is clear that the results produced by Framework 1 and Framework 2 are quite satisfactory.
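The evaluation protocol (repeated random hold-out of 100 testing examples, averaged over 10 repetitions) can be sketched in Python. The function name and the callback `fit_predict`, which stands for training a learning machine with a fixed parameter set on the training indices and returning predicted labels for the testing indices, are hypothetical; no actual support vector training is performed in this sketch.

```python
import random


def average_test_errors(labels, fit_predict, n_test=100, n_repeats=10, seed=0):
    """Average number of misclassified testing examples over repeated random splits."""
    rng = random.Random(seed)
    n_samples = len(labels)
    total_errors = 0
    for _ in range(n_repeats):
        test = rng.sample(range(n_samples), n_test)
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        predicted = fit_predict(train, test)
        total_errors += sum(1 for i, p in zip(test, predicted) if p != labels[i])
    return total_errors / n_repeats


# Toy check with a constant predictor on artificial labels (not the study's data).
toy_labels = [3] * 389
print(average_test_errors(toy_labels, lambda train, test: [3] * len(test)))
```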
Two novel GA-based frameworks have been proposed in this study for the prediction of nurses’ intention to quit. These frameworks are built on a GA-based support vector machine. Simulation results have shown that the GA-based support vector machines can successfully accomplish the task. The results of this study can be used to set up an early warning system for nursing resource administration. With some minor modifications, the two proposed frameworks may be used for other learning problems and for other learning machines, such as artificial neural networks, generalized radial basis function networks, and fuzzy neural networks.
V. References
[1] M.P. Leiter, P. Harvie, C. Frizzell, The correspondence of patient satisfaction and nurse burnout, Social Science & Medicine 47 (10) (1998) 1611-1617.
[2] H.M. Tzeng, S. Ketefian, The relationship between nurses’ job satisfaction and inpatient satisfaction: an exploratory study in a Taiwan’s teaching hospital, Journal of Nursing Care Quality 16 (2) (2002) 39-49.
[3] H.M. Tzeng, S. Ketefian, R.W. Redman, Relationship of staff nurses strength of culture, job satisfaction, and inpatient evaluation with nursing care, International Journal of Nursing Studies 39 (1) (2002) 79-84.
[4] J.A. Alexander, R. Lichtenstein, H.J. Oh, E. Ullman, A causal model of voluntary turnover among nursing personnel in long-term psychiatric setting, Research in Nursing & Health 21 (5) (1998) 415-427.
[5] H.M. Tzeng, J.G. Hsieh, Y.L. Lin, Predicting nurses’ intention to quit with a support vector machine: a new approach to set up an early warning mechanism in human resource management, Computers, Informatics, Nursing 22 (4) (2004) 232-242.
[6] B. Schölkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, and Beyond, MIT Press, Cambridge, MA, 2002.
[7] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: D. Haussler (Ed.), Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, ACM Press, Pittsburgh, PA, 1992, pp. 144-152.
[8] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, United Kingdom, 2000.
[9] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[10] V. Cherkassky, Y. Ma, Practical selection of SVM parameters and noise estimation for SVM regression, Neural Networks 17 (2004) 113-126.
[11] J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Michigan, 1975.
[12] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA, 1996.
[13] R.L. Haupt, S.E. Haupt, Practical Genetic Algorithms, Wiley, New York, 1998.
[14] W.M. Spears, K.A. De Jong, On the virtues of parameterized uniform crossover, in: R.K. Belew, L.B. Booker (Eds.), Proceedings of the 4th International Conference on Genetic Algorithms, Morgan Kaufmann, San Francisco, CA, 1991, pp. 279-286.
[15] J.C. Platt, Fast training of support vector machines using sequential minimal optimization, in: B. Schölkopf, C.J.C. Burges, A.J. Smola (Eds.), Advances in Kernel Methods-Support Vector Learning, MIT Press, Cambridge, MA, 1999, pp. 185-208.
Table 1 Average number of misclassifications of testing examples.
Fig. 1. Proposed model of the learning machine.
Fig. 2. Flowchart of Framework 1.