
Chapter 3 EBUS Images Diagnosis System Using Convolutional Neural Network

3.3. Classification

3.3.1. SVM

Support vector machine (SVM) [31], a supervised learning classifier, was utilized to differentiate between benign and malignant images based on the extracted features; the features from CaffeNet served as the inputs of the SVM model. Suppose a training set S = {(xi, yi)}, i = 1, . . . , l, where each feature vector xi ∈ Rn and each label yi ∈ {1, −1}. The soft-margin SVM finds a hyperplane by solving the following constrained optimization:

$$\min_{\omega,\,b,\,\xi}\ \frac{1}{2}\,\omega^{T}\omega + C\sum_{i=1}^{l}\xi_{i}$$

$$\text{subject to}\quad y_{i}\left(\omega^{T}\phi(x_{i})+b\right) \ge 1-\xi_{i},\qquad \xi_{i} \ge 0,\quad i=1,\ldots,l,$$

where ω is an n-dimensional vector, b is a scalar, and C > 0 is the regularization parameter. Since classifying the EBUS images could be a non-linear problem, the mapping ϕ(xi) was utilized to map the feature vector xi into a higher-dimensional space to achieve better performance; in this study, the radial basis function (RBF) kernel was used. The SVM classifier produced outputs representing the probability of malignancy in the range 0 to 1, and the threshold was set to 0.5. If the predicted probability exceeds 0.5, the sample is predicted to be malignant;

otherwise, benign.
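As a concrete illustration, the following is a minimal sketch of this classifier using scikit-learn (one of the modules listed in Section 4.1); the values of C and gamma are illustrative placeholders, since the tuned hyperparameters are not reported here.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: one 4096-d CaffeNet feature vector per image,
# labels 1 = malignant, 0 = benign (real features come from the fine-tuned CNN).
X_train = np.random.rand(100, 4096)
y_train = np.random.randint(0, 2, size=100)

# Soft-margin SVM with an RBF kernel; probability=True enables Platt
# scaling so the classifier outputs a malignancy probability in [0, 1].
# C=1.0 and gamma='scale' are illustrative defaults, not the tuned values.
clf = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True)
clf.fit(X_train, y_train)

# Apply the 0.5 decision threshold described above.
proba = clf.predict_proba(X_train)[:, 1]   # column 1 = P(malignant)
pred = (proba > 0.5).astype(int)           # 1 = malignant, 0 = benign
```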

Chapter 4

Experiment Results and Discussion

4.1. Experiment Environment

The experiments of the proposed computer-aided diagnosis (CAD) system included data augmentation, feature extraction based on a fine-tuned CNN, and classification.

All the methods were implemented in the Python programming language with Python modules such as numpy, opencv, scikit-learn, and scikit-image, which are widely used in computer vision and machine learning. The convolutional neural network used in the transfer learning was based on the Caffe framework [32], developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC), and the pre-trained models were from the Caffe Model Zoo [24]. The system ran under the Microsoft Windows 10 operating system (Microsoft, Redmond, WA, USA) on an Intel® Core™ i7-4790 3.6 GHz processor with 16 GB RAM, and the transfer learning ran on a GeForce GTX 1070 8 GB GPU.

4.2. Results

The dataset, containing 56 benign cases and 108 malignant cases, was used to measure the performance of the experiments. To evaluate the performance of the CAD system, five-fold cross validation [33] was utilized, and six indicators were calculated: accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The ROC analysis was performed with the ROCKIT software (C. Metz; University of Chicago, Chicago, IL, USA). To examine whether using the pre-trained model improved the performance, the fine-tuned CaffeNet was compared with the CaffeNet trained from scratch, and the results were shown in Table 4-1. With the advantage of the pre-trained model, the accuracy was improved from 62.8% to 81.1% and the sensitivity was increased from 66.7% to 91.7%. Their ROC curves were illustrated in Fig. 4-1. The diagnostic performance of the fine-tuned CaffeNet was statistically significantly better than that of the CaffeNet trained directly from scratch, with a p-value less than 0.05.
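For reference, the following is a minimal sketch of computing these six indicators under five-fold cross validation with scikit-learn; note that the study used ROCKIT for the ROC analysis, so roc_auc_score below is only an approximate stand-in for the AUC.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate_cv(clf, X, y, n_splits=5, seed=0):
    """Five-fold CV returning accuracy, sensitivity, specificity, PPV, NPV, AUC."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_true, y_prob = [], []
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        y_prob.extend(clf.predict_proba(X[test_idx])[:, 1])
        y_true.extend(y[test_idx])
    y_true, y_prob = np.array(y_true), np.array(y_prob)
    y_pred = (y_prob > 0.5).astype(int)   # 0.5 threshold from Section 3.3
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {'accuracy':    (tp + tn) / (tp + tn + fp + fn),
            'sensitivity': tp / (tp + fn),   # recall on malignant cases
            'specificity': tn / (tn + fp),
            'PPV':         tp / (tp + fp),
            'NPV':         tn / (tn + fn),
            'AUC':         roc_auc_score(y_true, y_prob)}
```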

Table 4-1. The comparison between the CaffeNet trained from scratch and the fine-tuned CaffeNet.

The value with “*” means that the p-value of the comparison between it and the first row is < 0.05.

Fig. 4-1. The ROC curves comparing the CaffeNet trained from scratch (AUC = 0.5995) and the fine-tuned CaffeNet (AUC = 0.8495).

To determine whether the fusion of the fine-tuned CaffeNet and SVM had better performance, it was compared with the fine-tuned CaffeNet alone; the results were listed in Table 4-2. The fusion of the fine-tuned CaffeNet and SVM boosted the specificity from 60.7% to 82.1% and the accuracy from 81.4% to 85.4%. The p-value of the AUC comparison being less than 0.05 indicated that the improvement was statistically significant. Their ROC curves were shown in Fig. 4-2.
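A rough sketch of this fusion is given below, assuming the Caffe Python interface; the prototxt and caffemodel file names are placeholders for the fine-tuned model of Chapter 3, and the images are assumed to be already preprocessed (resized and mean-subtracted).

```python
import numpy as np
import caffe
from sklearn.svm import SVC

# Load the fine-tuned CaffeNet (file names are placeholders).
net = caffe.Net('deploy.prototxt', 'finetuned_caffenet.caffemodel', caffe.TEST)

def extract_fc7(images):
    """Extract the 4096-d fc7 feature vector for each preprocessed image."""
    feats = []
    for img in images:
        net.blobs['data'].data[0] = img   # img shaped to the network input
        net.forward()
        feats.append(net.blobs['fc7'].data[0].copy())
    return np.array(feats)

# Placeholder batch standing in for preprocessed EBUS images and labels.
train_images = np.random.rand(10, 3, 227, 227).astype(np.float32)
train_labels = np.random.randint(0, 2, size=10)

# Fusion: fc7 features classified by an RBF-kernel SVM.
svm = SVC(kernel='rbf', probability=True)
svm.fit(extract_fc7(train_images), train_labels)
```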

Table 4-2. The comparison between the fine-tuned CaffeNet alone and classification by SVM with the features from the fine-tuned CaffeNet.

The value with “*” means that the p-value of the comparison between it and the first row is < 0.05.

Fig. 4-2. The ROC curves comparing the fine-tuned CaffeNet (AUC = 0.8495) and the fusion of the fine-tuned CaffeNet and SVM (AUC = 0.8705).

To evaluate whether the proposed CNN method was better than the conventional handcrafted approach, the gray-level co-occurrence matrix (GLCM) [34] method was performed in this experiment to extract second-order statistical texture features from the EBUS images. Six GLCM features, including contrast, correlation, homogeneity, energy, dissimilarity, and angular second moment (ASM), were classified with SVM. In Table 4-3, the results showed that neither the CaffeNet trained from scratch nor the fusion of the CaffeNet trained from scratch and SVM was superior to the handcrafted approach. Nevertheless, the fusion of the fine-tuned CaffeNet and SVM outperformed the handcrafted method with statistical significance. Their ROC curves were illustrated in Fig. 4-3.
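For comparison, a minimal sketch of this GLCM feature extraction using scikit-image is given below; the distance and angle settings are illustrative, since the exact parameters used in the experiment are not restated here (older scikit-image versions name these functions greycomatrix/greycoprops).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# The six GLCM properties used in this experiment.
PROPS = ['contrast', 'correlation', 'homogeneity',
         'energy', 'dissimilarity', 'ASM']

def glcm_features(gray_image, distances=(1,), angles=(0,)):
    """Second-order texture features from an 8-bit grayscale image."""
    glcm = graycomatrix(gray_image, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    # Average each property over all distance/angle pairs.
    return np.array([graycoprops(glcm, p).mean() for p in PROPS])

# Example: a random 8-bit image standing in for an EBUS frame.
img = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)
features = glcm_features(img)   # 6-d feature vector fed to the SVM
```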

Table 4-3. The comparison between the handcrafted method and the CNN methods.


The value with “*” means that the p-value of the comparison between it and the first row is < 0.05.

Fig. 4-3. The ROC curves comparing the handcrafted method and the CNN methods: GLCM-SVM (AUC = 0.6891), CaffeNet trained from scratch (AUC = 0.5989), CaffeNet trained from scratch with SVM (AUC = 0.6265), and the fusion of the fine-tuned CaffeNet and SVM (AUC = 0.8705).

Transfer learning and fine-tuning were also applied to other, deeper neural networks, including the VGGNet with 16 layers [35], the GoogleNet with 22 layers [36], and the ResNet with 50 layers [37], to examine whether features from deeper networks performed better given the limited training data. In Table 4-4, the results showed that the fusion of the fine-tuned CaffeNet with SVM was better than the fusions with the VGGNet with 16 layers, the GoogleNet, and the ResNet with 50 layers. Their ROC curves were illustrated in Fig. 4-4 and their training times were listed in Table 4-5.
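For context, fine-tuning each network in Caffe follows the usual transfer-learning pattern sketched below; the solver configuration and weight file names are placeholders, and the per-layer learning-rate multipliers that control which layers are fine-tuned live in the prototxt files, which are not reproduced here.

```python
import caffe

caffe.set_mode_gpu()   # the experiments ran on a GeForce GTX 1070

# Build the solver from a placeholder configuration, initialize the network
# with the ImageNet pre-trained weights, then fine-tune on the EBUS data.
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')
solver.solve()
```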


Table 4-4. The performance of the fusions of the fine-tuned CaffeNet and the other deeper CNN models with SVM.

The value with “*” means that the p-value of the comparison between it and the first row is < 0.05.

Fig. 4-4. The ROC curves comparing the fusions of the fine-tuned CNN models and SVM: fine-tuned CaffeNet-SVM (AUC = 0.8705), fine-tuned GoogleNet-SVM (AUC = 0.8337), fine-tuned ResNet50-SVM (AUC = 0.8394), and fine-tuned VGG16-SVM (AUC = 0.7683).

Table 4-5. The training time of the fusions of the fine-tuned CNN models and SVM.

Model                        Training Time (5-fold)
Fine-tuned CaffeNet-SVM      3 minutes 40 seconds
Fine-tuned VGG16-SVM         50 minutes
Fine-tuned GoogleNet-SVM     15 minutes
Fine-tuned ResNet50-SVM      1 hour 27 minutes


4.3. Discussion

In this study, fine-tuning based on the concept of transfer learning was performed to overcome the problem of insufficient training data. Table 4-1 showed that the fine-tuned CaffeNet, whose weights were initialized from a model trained on natural images, successfully boosted the performance on classifying EBUS images, even though they are dissimilar to natural images. Directly training from scratch with the limited data was not sufficient to optimize the parameters of the CaffeNet; hence the performance was poor. Consistent with a previous study [38], it was helpful to perform transfer learning from a large-scale annotated natural image dataset (ImageNet). To achieve better performance, the fusion of the fine-tuned CaffeNet and SVM was performed in this study. In Table 4-2, the fusion of the fine-tuned CaffeNet and SVM improved the specificity and the overall performance. This indicated that the features extracted from the fine-tuned CaffeNet were discriminative and that the classification ability of the SVM outperformed direct classification with the CaffeNet's softmax layer. The reason might be that the generalization ability of the SVM was better than that of the softmax layer [39].

Moreover, according to the experimental results shown in Table 4-3, the performance of the handcrafted method (GLCM+SVM) was higher than that of the CaffeNet directly trained from scratch but lower than those of the fine-tuned CaffeNet and the fusion of the fine-tuned CaffeNet and SVM. Without pre-training, the limited training data was not sufficient to optimize the parameters of the CaffeNet; hence, the automatic feature extractor of the CaffeNet could not produce more powerful features than the handcrafted method. Besides the CaffeNet, many deeper networks with better performance in the ImageNet Large Scale Visual Recognition Challenge [40] have recently been proposed, such as the VGGNet, the GoogleNet, and the ResNet. In our experiments, both the performance and the training time of the fusion of the fine-tuned CaffeNet and SVM, which only contains 8 layers, were better than those of the fusions with the other fine-tuned, deeper CNNs. The reason might be that the deeper neural networks have more parameters and hence need more training data and training time to optimize.

Although the proposed system achieved high performance, there were two limitations. First, the quantity of the original dataset was not sufficient for fine-tuning the model to reach the performance of expert diagnosis. Although data augmentation was performed to expand the dataset, the diversity of the data distribution was not enlarged much; to overcome this limitation, it was necessary to acquire more labeled data for fine-tuning. Second, the images of the dataset came from only one type of machine. Therefore, it was unconfirmed whether the proposed system was robust to images from different types of machines, and there was a need to acquire images from different types of machines for fine-tuning the model to confirm its robustness.

Chapter 5

Conclusion and Future Work

In this study, a CAD system classifying lung lesions as benign or malignant was proposed. The system utilized data augmentation to expand the size of the training data.

Then, feature extraction based on the fine-tuned CNN was performed: the CaffeNet was initialized with the weights pre-trained on ImageNet and its layers were then fine-tuned on the EBUS training data. The features were extracted from the fully connected layer 7 (fc7) of the CaffeNet, and the SVM model was applied to these features to differentiate between benign and malignant lesions. According to the experiment results, the accuracy, sensitivity, specificity, PPV, NPV, and AUC of this system achieved 85.4% (140/164), 87.0% (94/108), 82.1% (46/56), 90.4% (94/104), 76.6% (46/60), and 0.8705, respectively. The results showed that the fusion of the fine-tuned CaffeNet and SVM had the potential to assist in detecting lung cancer. In addition, the proposed method outperformed the conventional handcrafted method and was the first to utilize deep learning for diagnosing EBUS images automatically, decreasing the manual operation and the time required for diagnosis. In the future, the dataset should be expanded with equal quantities of benign and malignant lesions to enhance the optimization of the model. In addition, there is a need to evaluate the robustness of the system with images from different types of machines.

References

[1] R. L. Siegel, K. D. Miller, and A. Jemal, "Cancer statistics, 2016," CA: a cancer journal for clinicians, vol. 66, pp. 7-30, 2016.

[2] R. S. Fontana, D. R. Sanderson, W. F. Taylor, L. B. Woolner, W. E. Miller, J. R. Muhm, et al., "Early Lung Cancer Detection: Results of the Initial (Prevalence) Radiologic and Cytologic Screening in the Mayo Clinic Study," American Review of Respiratory Disease, vol. 130, pp. 561-565, 1984.

[3] The National Lung Screening Trial Research Team, "Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening," New England Journal of Medicine, vol. 365, pp. 395-409, 2011.

[4] M. Kaneko, K. Eguchi, H. Ohmatsu, R. Kakinuma, T. Naruke, K. Suemasu, et al., "Peripheral lung cancer: screening and detection with low-dose spiral CT versus radiography," Radiology, vol. 201, pp. 798-802, 1996.

[5] E. A. Kazerooni, F. T. Lim, A. Mikhail, and F. J. Martinez, "Risk of pneumothorax in CT-guided transthoracic needle aspiration biopsy of the lung," Radiology, vol. 198, pp. 371-375, 1996.

[6] T. Balamugesh and F. Herth, "Endobronchial ultrasound: A new innovation in bronchoscopy," Lung India: Official Organ of Indian Chest Society, vol. 26, p. 17, 2009.

[7] K. Yasufuku, T. Nakajima, M. Chiyo, Y. Sekine, K. Shibuya, and T. Fujisawa, "Endobronchial ultrasonography: current status and future directions," Journal of Thoracic Oncology, vol. 2, pp. 970-979, 2007.

[8] H. Wada, T. Nakajima, K. Yasufuku, T. Fujiwara, S. Yoshida, M. Suzuki, et al., "Lymph node staging by endobronchial ultrasound-guided transbronchial needle aspiration in patients with small cell lung cancer," The Annals of Thoracic Surgery, vol. 90, pp. 229-234, 2010.

[9] T.-Y. Chao, C.-H. Lie, Y.-H. Chung, J.-L. Wang, Y.-H. Wang, and M.-C. Lin, "Differentiating peripheral pulmonary lesions based on images of endobronchial ultrasonography," CHEST Journal, vol. 130, pp. 1191-1197, 2006.

[10] C.-H. Lie, T.-Y. Chao, Y.-H. Chung, J.-L. Wang, Y.-H. Wang, and M.-C. Lin, "New image characteristics in endobronchial ultrasonography for differentiating peripheral pulmonary lesions," Ultrasound in Medicine & Biology, vol. 35, pp. 376-381, 2009.

[11] P. Nguyen, F. Bashirzadeh, J. Hundloe, O. Salvado, N. Dowson, R. Ware, et al., "Grey scale texture analysis of endobronchial ultrasound mini probe images for …," 2015.

[12] K. Fukushima and S. Miyake, "Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition," in Competition and cooperation in neural nets, ed: Springer, 1982, pp. 267-285.

[13] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.

[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, pp. 2278-2324, 1998.

[15] C.-K. Shie, C.-H. Chuang, C.-N. Chou, M.-H. Wu, and E. Y. Chang, "Transfer representation learning for medical image analysis," in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, 2015, pp. 711-714.

[16] J.-Z. Cheng, D. Ni, Y.-H. Chou, J. Qin, C.-M. Tiu, Y.-C. Chang, et al., "Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans," Scientific Reports, vol. 6, p. 24454, 2016.

[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.

[18] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 248-255.

[19] Y. Bar, I. Diamant, L. Wolf, and H. Greenspan, "Deep learning with non-medical training used for chest pathology identification," in Proc. SPIE, 2015, p. 94140V.

[20] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE transactions on medical imaging, vol. 35, pp. 1285-1298, 2016.

[21] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580-587.

[22] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "Overfeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint arXiv:1312.6229, 2013.

[23] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2014, pp. 806-813.

[24] C. Cortes and V. Vapnik, "Support-vector networks," Machine learning, vol. 20, pp. 273-297, 1995.

[25] B. Athiwaratkun and K. Kang, "Feature representation in convolutional neural networks," arXiv preprint arXiv:1507.02313, 2015.

[26] J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Processing Letters, vol. 24, pp. 279-283, 2017.

[27] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, et al., "Convolutional neural networks for medical image analysis: Full training or fine tuning?," IEEE transactions on medical imaging, vol. 35, pp. 1299-1312, 2016.

[28] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural networks, vol. 61, pp. 85-117, 2015.

[29] J. Donahue, "Caffenet," ed, 2016.

[30] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European conference on computer vision, 2014, pp. 818-833.

[31] V. Vapnik, The nature of statistical learning theory: Springer science & business media, 2013.

[32] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, et al., "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM international conference on Multimedia, 2014, pp. 675-678.

[33] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Ijcai, 1995, pp. 1137-1145.

[34] R. M. Haralick and K. Shanmugam, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, pp. 610-621, 1973.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[36] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., "Going deeper with convolutions," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1-9.

[37] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.

[38] H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, et al., "Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning," IEEE Transactions on Medical Imaging, vol. 35, pp. 1285-1298, 2016.

[39] D.-X. Xue, R. Zhang, H. Feng, and Y.-L. Wang, "CNN-SVM for microvascular morphological type recognition with data augmentation," Journal of Medical and Biological Engineering, vol. 36, pp. 755-764, 2016.

[40] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, et al., "Imagenet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, pp. 211-252, 2015.
