Experimental Results - Diagnosis of Solitary Pulmonary Nodules on PET/CT Using Genetic Algorithm Variable Selection and an SVM Classifier

Texture features were computed for the suspicious nodules extracted from the PET/CT images. These features were passed to the genetic algorithm (GA) program for feature selection, and the selected features were then given to the SVM classifier.

A set of input variables was first selected at random and evaluated by the GA program. The SVM classifier then searched for the optimal hyperplane for classifying the nodules.

For each set of variables produced by GA screening and SVM classification, we ran 5-fold cross-validation five times. All 26 input variables entered this 5-fold cross-validation procedure. The sensitivity, specificity, and accuracy obtained in each experiment were then used to calculate their means and standard deviations.
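The repeated 5-fold protocol described above can be sketched as follows. This is a minimal illustration, not the thesis code: the function name `repeated_kfold`, the fixed random seed, and the strided fold assignment are assumptions, and the sample count of 65 is the number of nodules reported later in this chapter.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: the name, the seed, and the fold assignment
    scheme are assumptions made for illustration.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    for rep in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        for fold in range(k):
            test_idx = indices[fold::k]  # every k-th shuffled sample forms one fold
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield rep, fold, train_idx, test_idx

splits = list(repeated_kfold(n_samples=65))
print(len(splits))  # 25 train/test splits: 5 repeats x 5 folds
```

Within one repeat, every nodule is held out exactly once, so each repeat yields one sensitivity, specificity, and accuracy estimate over all nodules.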

Sensitivity

= number of true positives / (number of true positives + number of false negatives)

= probability of a positive test, given that the patient is ill

Specificity

= number of true negatives / (number of true negatives + number of false positives)

= probability of a negative test, given that the patient is well

Accuracy rate

= (number of malignant nodules × sensitivity + number of benign nodules × specificity) / total number of nodules
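As a minimal sketch, the three measures can be computed from confusion-matrix counts as shown below. The function name `diagnostic_metrics` and the counts in the usage line are hypothetical and not taken from the experiments. Accuracy is computed here as the fraction of correctly classified nodules, (TP + TN) / total, which is the conventional definition and is an assumption about the intended calculation.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions.

    tp/fn are counts over the ill (positive) cases,
    tn/fp are counts over the well (negative) cases.
    """
    sensitivity = tp / (tp + fn)  # true positives among all ill cases
    specificity = tn / (tn + fp)  # true negatives among all well cases
    # Fraction of all cases classified correctly. This equals the average of
    # sensitivity and specificity weighted by the size of their own class.
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = diagnostic_metrics(tp=8, fn=2, tn=9, fp=1)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```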

Each of the 22 GLCM variables (that is, all variables except the first 4) can be computed for four different distance vectors. If the values for all four distance vectors were pooled, there would be 22 × 4 = 88 GLCM variables, more than the number of nodules in the experiment, and the screening could not reliably find the most appropriate variables.

The experiments were therefore run separately for each direction. For each direction we repeated the experiment five times, recorded the sensitivity, specificity, and accuracy, and calculated their means and standard deviations, as shown in Table 4-1.
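To make the role of a distance vector concrete, the sketch below counts grey-level co-occurrences for a single offset. It is an illustration only: the function name `glcm`, the reading of the vector as a [row offset, column offset] pair, and the tiny two-level image are assumptions, not details taken from the experiments.

```python
def glcm(image, offset, levels):
    """Count co-occurring grey levels for one distance vector.

    `offset` is read as (row offset, column offset); this convention
    is an assumption made for the example.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that leave the image
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

# Hypothetical 3 x 3 image with two grey levels.
image = [
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
for offset in [(0, 1), (-1, -1)]:
    print(offset, glcm(image, offset, levels=2))
```

Each distance vector produces its own matrix, and therefore its own value of every GLCM texture variable, which is why the variable count grows fourfold if the directions are pooled.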

Table 4-1 Sensitivity, specificity, and accuracy of the GA-based classification in each direction over five runs, with their means and standard deviations

Distance vector [0,1]

                   (1)     (2)     (3)     (4)     (5)     Mean    Standard deviation
Sensitivity (%)    79.31   68.97   68.97   68.97   72.41   71.73   4.49
Specificity (%)    77.78   69.44   72.22   69.44   75.00   72.78   3.62
Accuracy (%)       78.63   69.18   70.42   69.18   73.57   72.20   4.02

Distance vector [1,1]

                   (1)     (2)     (3)     (4)     (5)     Mean    Standard deviation
Sensitivity (%)    72.41   72.41   68.97   65.52   65.52   68.97   3.45
Specificity (%)    69.44   72.22   69.44   61.11   72.22   68.89   4.56
Accuracy (%)       71.09   72.33   69.18   63.55   68.51   68.93   3.37

Distance vector [-1,1]

                   (1)     (2)     (3)     (4)     (5)     Mean    Standard deviation
Sensitivity (%)    68.97   55.17   68.97   72.41   58.62   64.83   7.48
Specificity (%)    69.44   55.56   69.44   72.22   58.33   65.00   7.50
Accuracy (%)       69.18   55.34   69.18   72.33   58.49   64.90   7.49

Distance vector [-1,-1]

                   (1)     (2)     (3)     (4)     (5)     Mean    Standard deviation
Sensitivity (%)    65.52   62.07   65.52   68.97   72.41   66.90   3.93
Specificity (%)    63.89   61.11   69.44   66.67   72.22   66.67   4.39
Accuracy (%)       64.79   61.64   67.27   67.94   72.33   66.79   3.96
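The summary statistics in Table 4-1 can be reproduced from the per-run values. The sketch below uses the five sensitivity and specificity values reported for distance vector [0,1]. That the tabulated standard deviations are sample (n − 1) standard deviations is an inference from the numbers, not something the text states.

```python
from statistics import mean, stdev

# Per-run values for distance vector [0,1], copied from Table 4-1.
sensitivity_runs = [79.31, 68.97, 68.97, 68.97, 72.41]
specificity_runs = [77.78, 69.44, 72.22, 69.44, 75.00]

for name, runs in [("Sensitivity", sensitivity_runs),
                   ("Specificity", specificity_runs)]:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{name}: mean={mean(runs):.2f} sd={stdev(runs):.2f}")
```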

The GA screening results were then tallied. In each run, some of the 26 variables were selected and others were not, and the selections were summed per variable. The experiment was run five times for each of the directions [0, 1], [1, 1], [-1, 0], and [-1, -1], and each run involved 5-fold cross-validation with the SVM. Each variable can therefore appear at most 25 times in each direction. Table 4-2 shows the number of occurrences of each variable.

Table 4-2 Number of occurrences of each variable

                                              Occurrences
No.  Variable name                            [0,1]   [1,1]   [-1,0]   [-1,-1]
20   Sum variance                             5       8       4        6
21   Difference variance                      21      20      21       22
22   Difference entropy                       8       7       8        7
23   Information measure of correlation1      16      17      17       13
24   Information measure of correlation2      18      15      16       17
25   Inverse difference normalized (INN)      6       11      8        5
26   Inverse difference moment normalized     25      21      19       23

After counting the occurrences of each variable, we extracted the variables that occurred at least 15, 20, and 22 times. A total of 23 variables occurred at least 15 times, 14 occurred at least 20 times, and 5 occurred at least 22 times. Table 4-3 lists these variables.

Table 4-3 Variables with the highest occurrence frequencies

Occurrences   Variables

≥ 15
  [0,1]       Contrast, Correlation(m), Difference variance, Information measure of correlation1, Information measure of correlation2, Inverse difference moment normalized
  [1,1]       Contrast, Maximum probability, Difference variance, Information measure of correlation1, Information measure of correlation2, Inverse difference moment normalized
  [-1,0]      Homogeneity, Dissimilarity, Difference variance, Information measure of correlation1, Information measure of correlation2, Inverse difference moment normalized
  [-1,-1]     Dissimilarity, Difference variance, Information measure of correlation2, Inverse difference moment normalized

≥ 20
              Max SUV of PET
  [0,1]       Contrast, Difference variance, Inverse difference moment normalized
  [1,1]       Difference variance, Information measure of correlation1, Information measure of correlation2, Inverse difference moment normalized
  [-1,0]      Homogeneity, Dissimilarity, Difference variance, Inverse difference moment normalized
  [-1,-1]     Difference variance, Inverse difference moment normalized

≥ 22 (5 variables)
              Max SUV of PET
  [0,1]       Contrast, Inverse difference moment normalized
  [-1,-1]     Difference variance, Inverse difference moment normalized
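The thresholding step that turns occurrence counts into a variable list can be sketched as below. The counts are the seven rows (variables 20 to 26) shown in Table 4-2, so the result covers only those rows. The function name `select_by_threshold` and the use of an inclusive (≥) comparison, following the table headings, are assumptions.

```python
# Occurrence counts for variables 20-26, copied from Table 4-2.
# Column order: [0,1], [1,1], [-1,0], [-1,-1].
occurrences = {
    "Sum variance": [5, 8, 4, 6],
    "Difference variance": [21, 20, 21, 22],
    "Difference entropy": [8, 7, 8, 7],
    "Information measure of correlation1": [16, 17, 17, 13],
    "Information measure of correlation2": [18, 15, 16, 17],
    "Inverse difference normalized (INN)": [6, 11, 8, 5],
    "Inverse difference moment normalized": [25, 21, 19, 23],
}
directions = ["[0,1]", "[1,1]", "[-1,0]", "[-1,-1]"]

def select_by_threshold(occurrences, directions, threshold):
    """Return {direction: [variable names]} with count >= threshold."""
    selected = {}
    for col, direction in enumerate(directions):
        names = [name for name, counts in occurrences.items()
                 if counts[col] >= threshold]
        if names:  # omit directions in which nothing reaches the threshold
            selected[direction] = names
    return selected

for direction, names in select_by_threshold(occurrences, directions, 22).items():
    print(direction, names)
```

Lowering the threshold admits more variables per direction, which is how the three nested variable sets of different sizes arise.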

These three sets of variables were then used to classify the original 65 nodules. Because no further variable selection was needed, we ran the SVM directly, with 5-fold cross-validation as before. The results are shown in Table 4-4. With the 23 selected variables, the sensitivity was 72.41%, the specificity 72.22%, and the accuracy 72.30%. With the 14 selected variables, the sensitivity was 79.31%, the specificity 72.22%, and the accuracy 75.38%. With the 5 selected variables, the sensitivity, specificity, and accuracy were 79.31%, 80.56%, and 80.00%, respectively.

Table 4-4 Sensitivity, specificity, and accuracy of the three sets of selected variables using SVM only

Occurrences            ≥15     ≥20     ≥22
Number of variables    23      14      5
Sensitivity (%)        72.41   79.31   79.31
Specificity (%)        72.22   72.22   80.56
Accuracy (%)           72.30   75.38   80.00

The variables that occurred at least 22 times gave higher accuracy than those that occurred at least 15 or 20 times. All three sets contain fewer variables than the original 26, which indicates that discriminating power depends not on the number of variables but on the particular combination of variables. A suitable combination can improve the accuracy of distinguishing benign from malignant nodules.
