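The Spotted Microarray Images - Statistical Applications of Maximum Likelihood Estimation and the Expectation-Maximization Algorithm for the Reconstruction and Segmentation of MicroPET and Microarray Images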

2. Methodologies

4.1 The Spotted Microarray Images

The 16 real microarray images used herein comprise eight arrays and their dye-swapped counterparts, in which the Cy3 and Cy5 dyes are exchanged. Each array has 32 blocks and 15488 spots representing 7744 genes. Every gene is spotted twice on each array: the upper 16 blocks are duplicated as the lower 16 blocks, as shown in Fig. 4.1-1. In addition, eight spike genes are placed in each block to evaluate the performance and accuracy of the segmentation methods.

Fig. 4.1-1: An example of a microarray image with 32 blocks, each with 22 columns and 22 rows. One block is enlarged and its eight spike genes are numbered.

A typical spot diameter on each microarray in this study is approximately 160 µm.

Sixteen microarray experiments were conducted in the Genomic Medicine Research Core Laboratory of Chang Gung Memorial Hospital, Taiwan. The details of the microarray experiment procedure and probe information are available on the webpage of the laboratory,

http://www.cgmh.org.tw/intr/intr2/c32a0/chinese/corelab_intro/genetics/files/03OctClone_information_F.zip ,

http://www.cgmh.org.tw/intr/intr2/c32a0/chinese/corelab_intro/genetics/files/MIAME%20(GMRCL%20Human%207K)_ver01.zip,

and in [42]. These eight pairs of swapped microarrays were used for cancer research, and some of the results have been published [43].

Figure 4.1-2 presents one of the results of the adaptive irregular segmentation in GenePix 6.0 for spot images of the Cy3 and Cy5 dyes. The segmented region may be inaccurate, leading to over- or underestimation of the statistics on spot intensities.

Fig. 4.1-2: Typical segmentation of two spot images by the irregular segmentation method of GenePix 6.0. Parts a) and b) present the original images of Cy5 and Cy3 dyes. Parts c) and d) present the segmented region on the images of Cy5 and Cy3 dyes.

Figure 4.1-3 plots the estimated kernel density curves from spot images of the Cy3 and Cy5 dyes, computed using the R 2.4.0 software [41, http://finzi.psych.upenn.edu/R/library/stats/html/density.html and http://www.r-project.org/]. These estimated densities typically show two distributions, one for the foreground region and one for the background region.

Fig. 4.1-3: Two estimated density curves for a spot of the Cy5 (a) and Cy3 (b) dyes. Both the Cy3 and Cy5 images have two intensity distributions, for background and foreground pixels. The local minimum between them is used as the cutting point for segmenting spot pixels.
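The cutting-point idea in Fig. 4.1-3 can be illustrated with a minimal sketch. It assumes a Gaussian kernel with the rule-of-thumb bandwidth that R's density() uses by default, and it takes the density minimum between the two highest modes as the cutting point. The function names, the grid size and the handling of unimodal spots are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def gaussian_kde_curve(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    z = (grid[:, None] - values[None, :]) / bandwidth
    norm = len(values) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def kde_cut_point(values, n_grid=512):
    """Intensity at the density minimum between the two highest modes.

    Returns None when fewer than two modes are found (assumed behaviour).
    """
    values = np.asarray(values, dtype=float)
    # Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5).
    q75, q25 = np.percentile(values, [75, 25])
    spread = min(values.std(ddof=1), (q75 - q25) / 1.34)
    bandwidth = 0.9 * spread * len(values) ** (-0.2)
    grid = np.linspace(values.min(), values.max(), n_grid)
    dens = gaussian_kde_curve(values, grid, bandwidth)
    # Interior local maxima of the estimated density.
    peaks = np.where((dens[1:-1] > dens[:-2]) & (dens[1:-1] >= dens[2:]))[0] + 1
    if len(peaks) < 2:
        return None
    # Keep the two highest peaks and search for the minimum between them.
    lo, hi = np.sort(peaks[np.argsort(dens[peaks])[-2:]])
    return float(grid[lo + np.argmin(dens[lo:hi + 1])])
```

Pixels with intensity above the returned value would then be labelled foreground and the remaining pixels background.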

4.2 Evaluation from Spike Genes

There are 256 spike genes on each array (eight in each of the 32 blocks), with different known Cy3 and Cy5 ratios. These spike genes are used to assess the performance of GKDE, KDE, GMM and GenePix 6.0. Fig. 4.1-1 presents the locations and numbers of the spike spots in one example cDNA microarray image. Tables 4.2-1 and 4.2-2 show that all of the SSREs and SSEs obtained from KDE are smaller than those obtained by the irregular segmentation method in GenePix 6.0, according to a test based on 16 real cDNA microarray images.

The relative improvement of each segmentation method is defined as (GenePix - Method)/GenePix, expressed as a percentage of the evaluation value. Since the first eight arrays are produced according to Table 2.3.4-1, which has varying target ratios, the relative improvements measured by SSRE and SSE differ according to Eqs. (20) and (21). The last eight arrays are produced according to Table 2.3.4-2, which has a constant ratio, so the relative improvements measured by SSRE and SSE are the same according to Eqs. (20) and (21). Tables 4.2-1 and 4.2-2 show that the average relative improvements of GKDE, KDE and GMM over the irregular segmentation method in GenePix 6.0 are (23.5%, 20.9%), (10.5%, 9.2%) and (23.2%, 20.9%), where each pair lists the averages from Table 4.2-1 and Table 4.2-2, respectively. These results reveal that the features estimated by GKDE, KDE and GMM are closer to the designed target ratios for the spike genes than those obtained by the irregular segmentation method in GenePix 6.0.
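The relationship between the two criteria can be checked numerically. Eqs. (20) and (21) are not reproduced in this section, so the sketch below assumes the usual forms: SSE sums the squared differences between estimated and target ratios, and SSRE sums the squared differences after dividing each by its target. All estimates in the usage lines are hypothetical values chosen for illustration.

```python
import numpy as np

def sse(estimated, target):
    """Sum of squared errors (assumed form of the SSE criterion)."""
    estimated = np.asarray(estimated, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sum((estimated - target) ** 2))

def ssre(estimated, target):
    """Sum of squared relative errors (assumed form of the SSRE criterion)."""
    estimated = np.asarray(estimated, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sum(((estimated - target) / target) ** 2))

def relative_improvement(genepix_value, method_value):
    """(GenePix - Method) / GenePix, expressed as a percentage."""
    return 100.0 * (genepix_value - method_value) / genepix_value

# Hypothetical estimates for four spike spots with a constant target ratio.
constant_target = np.full(4, 2.0)
genepix_const = [2.5, 1.4, 2.6, 1.5]
method_const = [2.2, 1.8, 2.3, 1.9]
ri_sse_const = relative_improvement(sse(genepix_const, constant_target),
                                    sse(method_const, constant_target))
ri_ssre_const = relative_improvement(ssre(genepix_const, constant_target),
                                     ssre(method_const, constant_target))

# Hypothetical estimates for four spike spots with varying target ratios.
varying_target = [0.5, 1.0, 2.0, 4.0]
genepix_var = [0.8, 1.4, 2.6, 4.5]
method_var = [0.6, 1.1, 2.3, 4.4]
ri_sse_var = relative_improvement(sse(genepix_var, varying_target),
                                  sse(method_var, varying_target))
ri_ssre_var = relative_improvement(ssre(genepix_var, varying_target),
                                   ssre(method_var, varying_target))
```

Under these assumed forms, a constant target ratio r gives SSRE = SSE / r^2 for every method, so the factor cancels in (GenePix - Method)/GenePix and the two relative improvements coincide. With varying targets the errors are weighted differently and the two improvements generally differ.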

Table 4.2-1: Comparison of the SSEs obtained by the different methods based on spike genes. Array 1s is the array obtained by swapping the dyes of Array 1. Relative improvement is (GenePix - Method)/GenePix, expressed as a percentage.

        Sum of Square of Errors               Relative improvement (%)
Array   GKDE     KDE      GMM      GenePix    GKDE     KDE      GMM
1       160.4    180.2    160.6    185.6      13.57    2.87     13.44
1s      116.3    134.2    117.1    146.3      20.49    8.21     19.96
2       136.7    145.6    136.9    153.1      10.71    4.90     10.56
2s      729.0    878.6    729.9    904.5      19.40    2.86     19.30
3       134.4    148.6    135.3    158.5      15.21    6.23     14.60
3s      405.8    534.9    406.1    691.0      41.28    22.58    41.23
4       51.7     68.3     52.2     83.4       37.98    18.10    37.42
4s      300.0    308.9    300.3    318.1      5.69     2.91     5.62
5       231.6    258.8    232.5    276.2      16.14    6.29     15.82
5s      237.3    299.6    237.6    349.7      32.15    14.32    32.06
6       140.7    166.2    141.4    172.2      18.32    3.49     17.89
6s      146.5    173.3    147.4    185.9      21.18    6.81     20.71
7       127.5    157.2    128.4    175.9      27.51    10.65    27.01
7s      67.3     79.0     68.3     122.4      44.98    35.41    44.22
8       133.5    148.6    133.8    177.9      24.94    16.46    24.79
8s      107.4    137.7    108.3    145.9      26.42    5.63     25.78
Average Relative Performance                  23.50    10.48    23.15

Table 4.2-2: Comparison of the SSREs obtained by the different methods based on spike genes. Array 1s is the array obtained by swapping the dyes of Array 1. Relative improvement is (GenePix - Method)/GenePix, expressed as a percentage.

        Sum of Square of Relative Errors          Relative improvement (%)
Array   GKDE       KDE        GMM        GenePix    GKDE     KDE      GMM
1       8755.8     9624.5     8756.1     9739.2     10.10    1.18     10.09
1s      6146.4     7106.5     6147.2     7662.9     19.79    7.26     19.78
2       6768.2     7227.2     6768.4     7604.2     10.99    4.96     10.99
2s      25079.2    26518.8    25080.1    27604.5    9.15     3.93     9.14
3       6873.6     7595.9     6874.5     7968.9     13.75    4.68     13.73
3s      9923.7     11503.9    9924.0     12979.6    23.54    11.37    23.54
4       2640.0     3645.9     2640.4     4230.4     37.60    13.82    37.58
4s      16359.1    16560.6    16359.3    16621.4    1.58     0.37     1.58
5       5811.8     6470.4     5812.6     6905.1     15.83    6.29     15.82
5s      5939.3     7490.2     5939.6     8742.5     32.06    14.32    32.06
6       3527.9     4155.3     3535.3     4305.7     18.07    3.49     17.89
6s      3684.6     4331.5     3685.4     4648.1     20.73    6.81     20.71
7       3208.8     3929.1     3209.7     4397.2     27.03    10.65    27.01
7s      1705.5     1975.9     1706.4     3059.2     44.25    35.41    44.22
8       3344.0     3714.8     3344.2     4446.6     24.80    16.46    24.79
8s      2707.0     3443.1     2708.0     3648.6     25.81    5.63     25.78
Average Relative Performance                        20.94    9.16     20.92

4.3 Evaluation from Duplicated Genes and Swapped Arrays

Table 4.3-1 shows the numbers of used spots excluding spike spots and bad spots in each array and its swapped array.

Table 4.3-1: Number of spots used in each array, excluding spike spots and bad spots; “x2” indicates the two duplicates on every array.

Array Used Spots Call Rate %

1, 1s 7281x2 94.02%

2, 2s 7306x2 94.34%

3, 3s 7253x2 93.66%

4, 4s 7292x2 94.16%

5, 5s 7292x2 94.16%

6, 6s 7347x2 94.87%

7, 7s 7085x2 91.49%

8, 8s 7280x2 94.01%

The bad spots are defined as spots for which the foreground mean minus the background mean, as provided by GenePix 6.0, is negative. The remaining genes are used to investigate the performance of GKDE, KDE and GMM. Figure 4.3-1 shows agreement scatter plots for the gene expression of the two replicates and of the swapped arrays using GKDE, KDE, GMM and GenePix 6.0, respectively. The KDE has fewer outliers than the GKDE, GMM and GenePix 6.0. In addition, the GKDE and GMM have fewer outliers than GenePix 6.0.

Fig. 4.3-1: The top row shows the four methods evaluated on duplicated spots for the third array (red) and the swapped third array (blue). The x-axis and y-axis represent the average of and the difference between duplicated spots. The bottom row shows the four methods evaluated on the swapped arrays (arrays 3 and 3s). The x-axis and y-axis represent the sum of and the difference between swapped arrays.
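The bad-spot rule, the call rates of Table 4.3-1 and the axes of the duplicate agreement plots can be sketched as follows. The tabulated call rates agree with the number of used spots divided by the 7744 genes on an array, which is how call_rate is written here. The function names are illustrative, and applying the rule to both dye channels at once is an assumption.

```python
import numpy as np

GENES_PER_ARRAY = 7744  # stated in Section 4.1

def used_spot_mask(fg_mean, bg_mean):
    """True where foreground mean minus background mean is not negative."""
    fg_mean = np.asarray(fg_mean, dtype=float)
    bg_mean = np.asarray(bg_mean, dtype=float)
    return (fg_mean - bg_mean) >= 0

def call_rate(n_used_spots, n_genes=GENES_PER_ARRAY):
    """Percentage of genes retained after excluding spike and bad spots."""
    return 100.0 * n_used_spots / n_genes

def agreement_coordinates(expr_a, expr_b):
    """Average (x-axis) and difference (y-axis) of paired expression values."""
    expr_a = np.asarray(expr_a, dtype=float)
    expr_b = np.asarray(expr_b, dtype=float)
    return (expr_a + expr_b) / 2.0, expr_a - expr_b

# Keep a spot only when both channels pass the rule (assumed combination).
# The intensities below are hypothetical.
keep = (used_spot_mask([500.0, 90.0, 300.0], [100.0, 120.0, 80.0])
        & used_spot_mask([400.0, 200.0, 60.0], [90.0, 110.0, 75.0]))
```

For example, call_rate(7281) gives approximately 94.02, matching the first row of Table 4.3-1.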

Figure 4.3-2 shows the concordance correlation coefficients, Pearson’s correlations and standard deviations between the replicated gene expressions on the sixteen arrays and between the eight pairs of swapped arrays.

Fig. 4.3-2: The top and bottom panels show the concordance correlations, Pearson’s correlations and standard deviations between duplicated spots on the sixteen arrays and between the swapped arrays of the eight pairs, using the GKDE, KDE, GMM and GenePix 6.0.

The KDE produces higher correlations and lower standard deviations than the other methods when tested on the sixteen arrays with duplicated genes. The same holds for the swapped arrays: the KDE provides a lower standard deviation and a higher correlation across the eight pairs of swapped arrays tested. In addition, the GKDE and GMM both have higher correlations and lower standard deviations than GenePix 6.0.
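The three agreement measures can be computed as in the following sketch. The concordance correlation coefficient uses Lin's standard definition. Taking the standard deviation of the paired differences is an assumption, since this section does not state which standard deviation is plotted in Fig. 4.3-2.

```python
import numpy as np

def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between paired values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    # Population variances (ddof=0), consistent with cov_xy above.
    return float(2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2))

def pearson_correlation(x, y):
    """Pearson's correlation coefficient between paired values."""
    return float(np.corrcoef(x, y)[0, 1])

def difference_sd(x, y):
    """Sample standard deviation of the paired differences (assumed measure)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.std(diff, ddof=1))
```

Unlike Pearson's correlation, the concordance coefficient is reduced by a systematic shift between the two replicates: a replicate that equals the other plus a constant has a Pearson correlation of 1 but a concordance correlation below 1.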

5. Discussion and Conclusion

The proposed PDEM algorithm for microPET reconstruction with random correction is shown to produce images with a lower noise level, higher spatial resolution and clearer boundaries than FORE+OSEM and FORE+FBP in comparison studies on simulations, phantoms and real mouse microPET data.

Meanwhile, the PDEM method reconstructs images with lower CVs and smaller FWHMs than those generated by the methods built into the microPET R4. In addition, the PDEM method has the same advantages as the MLEM method in PET reconstruction, namely row operation, linear complexity, monotonic convergence, non-negativity and parallelizability.

We have applied the GMM to segment 3D microPET images after the PDEM reconstruction. The GMM can model the segments of 3D microPET images with different distribution parameters. In contrast, the K-means method proposed in the literature for segmenting microPET images assumes a constant variance for all clusters. Hence, the GMM approach is more flexible and accurate than K-means for modeling and segmenting 3D microPET images. The GMM proposed in this study can also perform the segmentation automatically, using initial estimates obtained from the KDE method. By comparison, in the seeded region growing methods proposed in the literature for the segmentation of PET images, the choice of initial seeds is crucial to the segmentation. In those methods, including K-means, the number of clusters is determined by a subjective choice or by sequential search. When the activities of various clusters have different patterns, the slice normalization approach combined with the GMM is useful for segmenting 3D images. It will be of great interest to further evaluate the qualitative and quantitative performance with more phantom and empirical studies, and to compare the results with the judgments of medical experts.
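The contrast with K-means can be made concrete with a minimal one-dimensional EM sketch in which every Gaussian component keeps its own variance. The function name, the init_means argument (standing in for initial estimates such as KDE modes) and the variance floor are illustrative assumptions; the thesis works with 3D images and its own initialization.

```python
import numpy as np

def fit_gmm_1d(values, init_means, n_iter=200, tol=1e-8):
    """Fit a 1D Gaussian mixture by EM with component-specific variances."""
    x = np.asarray(values, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()
    k = len(means)
    variances = np.full(k, x.var())   # start from the pooled variance
    weights = np.full(k, 1.0 / k)
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        dens = (weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
                / np.sqrt(2.0 * np.pi * variances))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: update weights, means and per-component variances.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)  # guard against collapse
        # Log-likelihood under the parameters used in this E-step.
        loglik = float(np.log(total).sum())
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    # Hard labels from the responsibilities of the last E-step.
    labels = resp.argmax(axis=1)
    return weights, means, variances, labels
```

Because the variances are estimated per component, a tight background cluster and a broad foreground cluster can both be modeled, which a constant-variance clustering cannot represent.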

The GMM and KDE methods are also applied to spotted microarray images. The effect of expression profiling on prognostic and predictive testing for cancer has recently been discussed [47]. However, the low reproducibility of microarray experiments [48, 49] impedes the use of microarrays for the prognosis and prediction of cancer outcomes. We combine the GMM and KDE methods to segment spotted cDNA images. The GKDE is intended to fine-tune the GMM and to determine a suitable cutting point for clustering foreground and background using the KDE. The GKDE, KDE and GMM methods can improve the reproducibility in duplicated spots, in swapped arrays and in the spike gene spots. This will be useful for the advanced utilization of microarrays in biology and medicine.

In this study, the GKDE, KDE and GMM were applied to segment cDNA microarray images, and their performance was evaluated. First, spike genes with known contents were designed on real microarrays, and the SSRE and SSE criteria were used to measure accuracy and performance. The GKDE, KDE and GMM estimate the features of spots more accurately than the adaptive region growing segmentation method in GenePix 6.0 does. Secondly, duplicated spots are utilized to examine expression variation on a microarray image. The KDE also has a better average relative performance, as measured by the concordance correlation coefficients, Pearson’s correlation coefficients and standard deviations of the expression values of duplicated spots. Finally, swapped microarray experiments were conducted to study the variation among dyes. The correlation coefficients measure the linear relationship for the selected spots with significantly differentially expressed levels. Again, the KDE is more accurate when tested on eight pairs of real swapped cDNA microarray images.

Sixteen real cDNA microarray images were used to determine the accuracy and performance, by comparison with the adaptive irregular segmentation method in GenePix 6.0. The ratio of means is used to estimate features in segmented spots. Other statistics, as well as other methods for segmenting images, can be studied further [50-52].
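As an illustration of the ratio-of-means feature, the sketch below divides the background-corrected mean foreground intensity of one channel by that of the other. The background correction by the mean of the non-foreground pixels and the Cy5/Cy3 orientation of the ratio are assumptions, and the function name is hypothetical.

```python
import numpy as np

def ratio_of_means(cy5, cy3, foreground_mask):
    """Ratio of background-corrected mean foreground intensities (Cy5 / Cy3)."""
    cy5 = np.asarray(cy5, dtype=float)
    cy3 = np.asarray(cy3, dtype=float)
    fg = np.asarray(foreground_mask, dtype=bool)
    # Mean foreground intensity minus mean background intensity, per channel.
    signal_cy5 = cy5[fg].mean() - cy5[~fg].mean()
    signal_cy3 = cy3[fg].mean() - cy3[~fg].mean()
    return float(signal_cy5 / signal_cy3)
```

The foreground mask would come from whichever segmentation method is under evaluation, so a different mask changes both means and therefore the estimated ratio.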

The GKDE, KDE and GMM programs ran in under one thousand seconds when tested on one real cDNA microarray image on a personal computer with a 2.6 GHz Intel CPU and 2 GB of RAM. In particular, the KDE algorithm is model-free and computationally efficient, and it improves performance in segmenting cDNA microarray images used in biology and medicine. The GKDE method has similar advantages.
