Computer-Aided Diagnosis Applied to US of Solid Breast Nodules by Using Principal Component Analysis and Image Retrieval
Int. Computer Symposium, Dec. 15-17, 2004, Taipei, Taiwan.

... original textural feature vector is mapped into a principal vector of lower dimension. This transformed vector, the principal vector, is used as a new textural feature to retrieve images from the database, with similarity measured by Euclidean distance. The retrieved images serve as reference cases for identifying benign and malignant lesions in the US image. By applying image retrieval techniques with PCA to textural features, the proposed CAD system achieves good diagnostic performance.

2. Data Acquisition
A total of 1020 sonograms of regions of interest (ROIs) from 255 patients, including 36 cancers, 57 fibrocystic nodules, 120 fibroadenomas, and 33 cysts, were used as case samples. The ultrasonic appearances were correlated with fine-needle aspiration, core biopsy, or surgical findings. The ROIs were manually extracted by physicians. For each tumor, images were captured in the transverse and longitudinal views. The images were collected from June 1, 1998 to April 31, 1999; patient ages ranged from 18 to 64 years, and tumor sizes ranged from 0.8 to 3.6 cm. Sonography was performed using an ALOKA SSD 1200 scanner (Tokyo, Japan) and a 7.5 MHz linear transducer with freeze-frame capability. No acoustic standoff pad was used in any case. During each examination, the analog video signal from the VCR output of the scanner was transmitted to a portable notebook computer and digitized by a Video CATcher frame grabber (Top Solution Technology Co.) connected to the computer's printer port. The capture resolution of the portable computer and external frame grabber is 736 × 566 pixels for an NTSC video frame, and the monochrome ultrasonic image is quantized into 256 gray levels. Figure 1 shows a tumor in different views.

Figure 1. A breast tumor in different views (resolution of 58 × 58 pixels per 1 cm × 1 cm area).

Four rectangular regions from the two sonograms are used to analyze each tumor, as shown in Fig. 2:
(a) two regions, one for each of the transverse and longitudinal views, that extend beyond the lesion margins in all directions by 1-2 mm;
(b) the largest rectangular region that fits inside the lesion, again one for each view.

Figure 2. An example of the four ROI subimages for a tumor: (a) full longitudinal ROI (LA) subimage, (b) inside longitudinal ROI (LB) subimage, (c) full transverse ROI (TA) subimage, and (d) inside transverse ROI (TB) subimage.

3. Image Analysis
An ultrasonic image consists of many points with different gray-level intensities, and different tissues have significantly different textures. Young et al. proposed the block difference of inverse probabilities (BDIP) and the block variation of local correlation coefficients (BVLC) as image features for content-based image retrieval [10]. The proposed CAD system adopts these two textural features, together with the autocovariance matrix described below, to differentiate benign breast tumors from malignant lesions. BDIP is defined as

$$\mathrm{BDIP} = M^2 - \frac{\sum_{(i,j)\in B} I(i,j)}{\max_{(i,j)\in B} I(i,j)},$$

where I(i, j) is the intensity of pixel (i, j) and B is an M × M block. BDIP increases with the variation of intensities within the block. In this paper, M is chosen to be 2.
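The BDIP definition above can be illustrated with a short Python sketch. It is not the authors' implementation. The function name bdip_map, the tiling of the image into non-overlapping M × M blocks, and the choice to assign 0 to an all-zero block (where the ratio is undefined) are assumptions made for illustration. This section does not specify how the per-block values are summarized into features, so the sketch stops at the per-block map.

```python
def bdip_map(image, m=2):
    """BDIP of every non-overlapping m x m block of a gray-level image.

    image: list of equal-length rows of non-negative intensities.
    Returns one BDIP value per block, arranged as a 2-D list.
    (Non-overlapping tiling is an illustrative assumption.)
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(0, rows - m + 1, m):
        row_vals = []
        for c in range(0, cols - m + 1, m):
            block = [image[i][j] for i in range(r, r + m) for j in range(c, c + m)]
            peak = max(block)
            # BDIP = M^2 - sum(I) / max(I); an all-zero block is assigned 0 here.
            row_vals.append(0.0 if peak == 0 else m * m - sum(block) / peak)
        result.append(row_vals)
    return result


img = [[10, 10, 0, 100],
       [10, 10, 50, 100]]
print(bdip_map(img))  # [[0.0, 1.5]]
```

A uniform block gives 0. The block that mixes dark and bright pixels gives 1.5. This matches the statement that BDIP grows with the intensity variation inside a block.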
BVLC is defined through the local correlation coefficients ρ(k, l):

$$\rho(k,l) = \frac{\frac{1}{M^2}\sum_{(i,j)\in B} I(i,j)\, I(i+k,j+l) - \mu_{0,0}\,\mu_{k,l}}{\sigma_{0,0}\,\sigma_{k,l}},$$

$$\mathrm{BVLC} = \max_{(k,l)\in O_4} \rho(k,l) - \min_{(k,l)\in O_4} \rho(k,l), \qquad O_4 = \{(0,1),\,(1,0),\,(1,1),\,(1,-1)\},$$

where μ0,0 and σ0,0 are the local mean and standard deviation of the M × M block, and μk,l and σk,l are the mean and standard deviation of the block shifted by (k, l). The four shifts in O4 cover the horizontal, vertical, and two diagonal directions. A larger BVLC value indicates a rougher texture within the block.

The 2-D normalized autocorrelation coefficients reflect the inter-pixel correlation within an image. They are modified into a mean-removed version so that images with different brightness but similar texture yield similar autocovariance features. For an image f(x, y) of size M × N, the modified autocovariance coefficient between pixel (x, y) and pixel (x + Δm, y + Δn) is defined as

$$\gamma(\Delta m,\Delta n) = \frac{A(\Delta m,\Delta n)}{A(0,0)},$$

$$A(\Delta m,\Delta n) = \frac{1}{(M-\Delta m)(N-\Delta n)} \sum_{x=0}^{M-1-\Delta m} \sum_{y=0}^{N-1-\Delta n} \bigl(f(x,y)-\bar{f}\bigr)\bigl(f(x+\Delta m,\,y+\Delta n)-\bar{f}\bigr),$$

where f̄ is the mean value of f(x, y). The autocovariance matrix can be computed for images of any size. In this study, Δm and Δn each take 7 values (0 to 6), so texture analysis yields a 7 × 7 autocovariance matrix for each image. Because γ(0,0) is always 1, the first element of the matrix is discarded for every US image.

4. Principal Component Analysis
PCA is a well-known statistical technique that reduces redundancy by projecting the original data onto a suitable basis. The result is a more compact vector of lower dimension. The principal components of a training set are determined as follows.

The textural features of an ROI subimage described above are treated as a vector. After the first element is discarded, the remaining 48 elements of each of the four ROI subimages of a case are concatenated into a 192-dimensional feature vector. Suppose there are N feature vectors in the training set. The average feature vector m of the training set is

$$m = \frac{1}{N}\sum_{i=1}^{N} x_i,$$

where x_i is the high-dimensional feature vector of the ith training sample. Linear combinations of the mean-removed training vectors form the basis vectors u_i. Their coefficients ν_ik are derived from the eigenvectors of the training set. These u_i are the principal components that best characterize the variation in the training vectors:

$$u_i = \sum_{k=1}^{N} \nu_{ik}\,(x_k - m), \qquad i = 1, 2, \ldots, N.$$

The basis vectors associated with the largest eigenvalues carry most of the information in the training feature vectors, and the percentage of the total variability explained by each u_i can be computed. In general, the first p principal components, which together explain more than 90% of the total variance, are used to approximate each original feature vector x_k by a p-dimensional vector:

$$x_k \approx m + \sum_{j=1}^{p} w_j\,u_j,$$

where the coefficient vector w_p = (w_1, ..., w_p) is the new feature vector representing x_k. The textural feature vector q_i of a query ROI subimage is approximated with the same linear combination, which yields its coefficients w_q. An analysis of the effect of the new feature vector on the US image database showed that the best value is p = 10. Each original 192-D textural feature vector is therefore reduced by PCA to a 10-D feature vector.

5. Image Retrieval for Breast Cancer Diagnosis
First, the Euclidean distance between the query coefficients w_q and the coefficients w_p of each database image is computed, and similar images are selected from the database according to this distance. The proposed CAD system retrieves the L tumor images with the smallest Euclidean distances. The query image is then diagnosed as benign or malignant according to the DS value of the retrieved images, defined as

$$DS = \sum_{i=1}^{L} \mathrm{Weight}_i \times \mathrm{Tumor\_class}_i.$$
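The feature extraction, PCA reduction, and retrieval steps of Sections 3 to 5 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code:

- All function names are hypothetical.
- The autocovariance lags run from 0 to max_lag - 1.
- PCA uses the standard eigen-decomposition of the sample covariance matrix rather than reproducing the u_i expression above. A textbook cyclic Jacobi solver computes this decomposition, and it is practical only for small dimensions.
- Retrieved indices are returned in order of increasing Euclidean distance.

The rank weights that convert the retrieved cases into a DS score are not part of this sketch.

```python
import math


def autocovariance_features(image, max_lag=7):
    """Normalized autocovariance gamma(dm, dn) for 0 <= dm, dn < max_lag,
    flattened row by row; gamma(0, 0), which is always 1, is dropped.
    Assumes a non-constant image (otherwise A(0, 0) = 0)."""
    rows, cols = len(image), len(image[0])
    f_bar = sum(map(sum, image)) / (rows * cols)

    def acov(dm, dn):
        # A(dm, dn): mean-removed products averaged over the overlapping region.
        total = sum((image[x][y] - f_bar) * (image[x + dm][y + dn] - f_bar)
                    for x in range(rows - dm) for y in range(cols - dn))
        return total / ((rows - dm) * (cols - dn))

    a00 = acov(0, 0)
    return [acov(dm, dn) / a00 for dm in range(max_lag) for dn in range(max_lag)][1:]


def jacobi_eigen(sym, max_sweeps=100):
    """Eigenpairs of a small symmetric matrix by cyclic Jacobi rotations,
    sorted by decreasing eigenvalue (eigenvectors returned as lists)."""
    n = len(sym)
    a = [[float(x) for x in row] for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-20 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J  (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A  (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def fit_pca(samples, var_ratio=0.90):
    """Mean vector and the leading covariance eigenvectors that together
    explain at least var_ratio of the total variance."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    cen = [[s[k] - mean[k] for k in range(d)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in cen) / n for j in range(d)] for i in range(d)]
    eigvals, eigvecs = jacobi_eigen(cov)
    total = sum(max(e, 0.0) for e in eigvals)
    basis, explained = [], 0.0
    for lam, vec in zip(eigvals, eigvecs):
        if explained >= var_ratio * total:
            break
        basis.append(vec)
        explained += lam
    return mean, basis


def project(x, mean, basis):
    """PCA coefficients w of (x - mean) on the orthonormal basis."""
    return [sum(u_k * (x_k - m_k) for u_k, x_k, m_k in zip(u, x, mean)) for u in basis]


def retrieve(query_w, database_w, num_retrieved):
    """Indices of the num_retrieved database vectors closest to query_w."""
    ranked = sorted(range(len(database_w)), key=lambda i: math.dist(query_w, database_w[i]))
    return ranked[:num_retrieved]
```

With max_lag = 7, autocovariance_features returns 48 values per ROI subimage. Concatenating the four subimages of a case therefore gives a 192-D vector. To mirror the fixed choice p = 10 instead of the 90% rule, one could keep only basis[:10]. For real 192-D inputs, a numerical linear-algebra library would be a more practical eigen-solver than this pure-Python loop.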
The weight of each retrieved image is determined by its retrieval order i. The class indicator records whether the retrieved case is malignant:

$$\mathrm{Weight}_i = \frac{L-i+1}{\sum_{j=1}^{L} j},$$

$$\mathrm{Tumor\_class}_i = \begin{cases} 1, & \text{if retrieved image } i \text{ is a malignant case},\\ 0, & \text{if retrieved image } i \text{ is a benign case}.\end{cases}$$

Since $\sum_{i=1}^{L}(L-i+1) = \sum_{j=1}^{L} j$, the weights sum to 1, and DS therefore lies between 0 and 1. A cut-off threshold Th is predefined to separate benign from malignant tumors. If the DS value is greater than Th, the tumor is classified as malignant; otherwise, it is classified as benign.

6. Results
This study classifies benign and malignant tumors based on retrieved US images. The k-fold cross-validation method [11] is used to estimate the performance of the CAD system. The proposed system was trained and tested with k = 10: the 255 cases (1020 US images) in the database were randomly divided into k groups, and each group served once as the test set. Performance was also analyzed with an ROC curve. With a cut-off threshold of 0.15 and seven retrieved images (L = 7), the proposed CAD system correctly identifies 31 of 36 malignant tumors and 188 of 219 benign tumors. The area under the ROC curve (Az) is 0.9253 ± 0.007, as shown in Fig. 3. Table 1 lists the number of misdiagnosed cases at threshold 0.15 for each test set. Table 2 summarizes the accuracy, sensitivity, and specificity of the proposed CAD system.

Table 1. Number of misdiagnosed cases (misdiagnosed/total) of the proposed CAD system for each test set at threshold = 0.15 with seven retrieved images.

Test Set   Malignant Cases   Benign Cases
1          1/4               1/21
2          0/4               2/22
3          0/4               2/22
4          0/4               7/22
5          2/4               1/22
6          0/4               7/22
7          0/3               2/23
8          1/3               3/22
9          1/3               3/22
10         0/3               4/22

7. Discussion
Texture features help distinguish masses from normal tissue on sonography, and the image retrieval technique provides a potentially useful tool for sonographic decision support.
This study proposes an efficient and effective CAD system that uses multiview sonograms to distinguish between benign and malignant tumors. The proposed CAD system diagnoses breast tumors using inter-pixel correlations within the four ROI subimages. The experimental results show that the system performs differential diagnosis well, with an accuracy of 85.9% and an Az of 0.9253. These results confirm that benign and malignant tumors can be classified using texture features in multiview digital US images. Given the sensitivity (86.1%) and specificity (85.8%) obtained, the proposed system is expected to be a useful computer-aided diagnostic tool for differentiating between benign and malignant cases on sonograms. It could help avoid misdiagnosis and reduce the number of unnecessary surgical biopsies.

Table 2. Performance of the proposed CAD system at threshold = 0.15 with seven retrieved images.

                      Benign     Malignant
DS value < 0.15       TN = 188   FN = 5
DS value >= 0.15      FP = 31    TP = 31
Total                 219        36

Note: TN: true negative; FN: false negative; FP: false positive; TP: true positive.
(1) Accuracy = (TP+TN)/(TP+TN+FP+FN) = (31+188)/255 = 85.9%
(2) Sensitivity = TP/(TP+FN) = 31/36 = 86.1%
(3) Specificity = TN/(TN+FP) = 188/219 = 85.8%

[Figure 3: ROC curve plotting TPF against FPF]
Figure 3. ROC curve of the retrieval technique for classifying malignant and benign tumors (Az = 0.9253 ± 0.007).
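The DS decision rule and the summary statistics in Table 2 can be checked with the short Python sketch below. The function names are hypothetical. The sketch follows the Weight_i and Tumor_class_i definitions of Section 5 and uses the ">=" comparison shown in Table 2.

```python
def rank_weights(num_retrieved):
    """Weight_i = (L - i + 1) / (1 + 2 + ... + L) for retrieval ranks i = 1..L."""
    denom = num_retrieved * (num_retrieved + 1) / 2
    return [(num_retrieved - i + 1) / denom for i in range(1, num_retrieved + 1)]


def ds_score(retrieved_classes):
    """DS for tumor classes listed in retrieval order (1 = malignant, 0 = benign)."""
    weights = rank_weights(len(retrieved_classes))
    return sum(w * c for w, c in zip(weights, retrieved_classes))


def classify(retrieved_classes, threshold=0.15):
    """Label a query tumor from the classes of its retrieved cases."""
    return "malignant" if ds_score(retrieved_classes) >= threshold else "benign"


def summary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Seven retrieved cases in which only the third-ranked case is malignant.
print(round(ds_score([0, 0, 1, 0, 0, 0, 0]), 4))  # 0.1786 (= 5/28)
print(classify([0, 0, 1, 0, 0, 0, 0]))            # malignant
print({k: round(100 * v, 1) for k, v in summary_metrics(31, 188, 31, 5).items()})
# {'accuracy': 85.9, 'sensitivity': 86.1, 'specificity': 85.8}
```

When L = 7, the weights are 7/28, 6/28, ..., 1/28, so DS is always a multiple of 1/28 and never equals 0.15 exactly. The "greater than Th" wording of Section 5 and the ">= 0.15" row of Table 2 therefore give identical decisions at this threshold. In practice, a query is labeled malignant if any of its top three retrieved cases is malignant, or if lower-ranked malignant cases together contribute at least 5/28.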
References
[1] "Breast Cancer Facts & Figures 2001-2002," American Cancer Society, 2003.
[2] A.T. Stavros, D. Thickman, C.L. Rapp, M.A. Dennis, S.H. Parker, and G.A. Sisney, "Solid Breast Nodules: Use of Sonography to Distinguish Benign and Malignant Lesions," Radiology, vol. 196, no. 1, pp. 123-134, July 1995.
[3] B.S. Garra, B.H. Krasner, S.C. Horii, S. Ascher, S.K. Mun, and R.K. Zeman, "Improving the Distinction Between Benign and Malignant Breast Lesions: The Value of Sonographic Texture Analysis," Ultrasonic Imaging, vol. 15, no. 4, pp. 267-285, Oct. 1993.
[4] D.R. Chen, R.F. Chang, and Y.L. Huang, "Computer-aided diagnosis applied to US of solid breast nodules by using neural networks," Radiology, vol. 213, no. 2, pp. 407-412, Nov. 1999.
[5] D.R. Chen, R.F. Chang, Y.L. Huang, Y.H. Chou, C.M. Tiu, and P.P. Tsai, "Texture analysis of breast tumors on sonograms," Seminars in Ultrasound, CT and MRI, vol. 21, no. 4, pp. 308-316, Aug. 2000.
[6] D.R. Chen, R.F. Chang, and Y.L. Huang, "Breast cancer diagnosis using self-organizing map for sonography," Ultrasound Med. Biol., vol. 26, no. 3, pp. 405-411, Mar. 2000.
[7] D.R. Chen, R.F. Chang, W.J. Kuo, M.C. Chen, and Y.L. Huang, "Diagnosis of breast tumors with sonographic texture analysis using wavelet transform and neural networks," Ultrasound Med. Biol., vol. 28, no. 10, pp. 1301-1310, Oct. 2002.
[8] I.T. Jolliffe, Principal Component Analysis. New York: Springer-Verlag, 1986.
[9] U. Sinha and H. Kangarloo, "Principal component analysis for content-based image retrieval," Radiographics, vol. 22, no. 5, pp. 1271-1289, Sept. 2002.
[10] D.C. Young and Y.S. Sang, "Image retrieval using BDIP and BVLC moments," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 9, pp. 951-957, Sept. 2003.
[11] S.M. Weiss and I. Kapouleas, "An empirical comparison of pattern recognition, neural nets, and machine learning classification methods," Proc. 11th Int. Joint Conf. Artificial Intelligence, pp. 234-237, 1989.