
4.1 Assessment of validity by ROC curve

In practice, we rarely validate more than 100 genes as differentially expressed (Cope et al., 2004). In both the HGU95 and HGU133 datasets, the growth in TP for most combinations has already flattened once FP>100 (Figure 1-1 and Figure 1-3). Furthermore, in these two datasets the true positive rates of the best-performing combinations reach about 95% while FP<100. The true positive rates of the worst-performing combinations keep increasing after FP>100; even so, they cannot catch up with the combinations that already perform well at FP<100 (Figure 1-2 and Figure 1-4). Thus, we focus on the region FP<100 and report the summary statistic AUC up to 100 FP for both the HGU95 and HGU133 datasets.
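As an illustration of the summary statistic, a minimal sketch of an AUC truncated at a fixed number of false positives is given here. The function name `auc_up_to_fp`, its arguments, the normalization to [0, 1] and the toy data are assumptions made for this example; they are not taken from the software used in the study.

```python
def auc_up_to_fp(scores, is_true, max_fp=100):
    """Normalized area under the TP-versus-FP curve for FP in [0, max_fp].

    scores  : one significance score per gene (larger = more significant)
    is_true : one boolean per gene, True for a spiked-in (truly DE) gene
    Assumes no tied scores, at least one true gene, and at least max_fp
    non-DE genes; all three are simplifications made for this sketch.
    """
    n_true = sum(1 for t in is_true if t)
    if n_true == 0:
        raise ValueError("need at least one truly differentially expressed gene")
    # Walk down the gene list from the most to the least significant score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    area = 0
    for i in order:
        if is_true[i]:
            tp += 1          # curve moves up by one true positive
        else:
            fp += 1          # curve moves right by one false positive
            area += tp       # add a unit-wide column of height tp
            if fp == max_fp:
                break
    # Divide by the area of a perfect curve so that the result lies in [0, 1].
    return area / (max_fp * n_true)


# Toy data (made up): five genes, two of them truly differentially expressed.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_truth = [True, False, True, False, False]
print(auc_up_to_fp(toy_scores, toy_truth, max_fp=2))  # prints 0.75
```

With this normalization, a combination that ranks every true gene ahead of the first false positive scores 1, which makes values comparable across combinations within a dataset.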

The other spike-in dataset, the Golden Spike dataset, has nearly 10% of its genes differentially expressed, unlike most microarray experiments, which assume that only a small percentage of genes are differentially expressed. Evaluating performance only at FP<100 is therefore not suitable for this dataset. The patterns of all combinations at false positive rates above 0.1 are similar to the patterns at false positive rates close to 0.1 (Figure 1-5 and Figure 1-6). Thus, we recommend 0.1 as a conservative cutoff for the false positive rate.
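When the cutoff is expressed as a false positive rate rather than a count, the area can be truncated in the same spirit. The sketch below is one possible way to do this; the function name `auc_up_to_fpr`, the clipping of the last column at the cutoff and the normalization by the cutoff are assumptions of this example, not a description of the code used for the reported results.

```python
def auc_up_to_fpr(scores, is_true, max_fpr=0.1):
    """Normalized area under the ROC curve for FPR in [0, max_fpr].

    Assumes no tied scores and at least one gene in each class
    (simplifications made for this sketch).
    """
    n_true = sum(1 for t in is_true if t)
    n_false = len(is_true) - n_true
    if n_true == 0 or n_false == 0:
        raise ValueError("need both true and false genes")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    area = 0.0
    for i in order:
        if is_true[i]:
            tp += 1
            continue
        # Each false positive adds a column spanning [left, right] on the
        # FPR axis; clip the column at the cutoff.
        left = fp / n_false
        fp += 1
        right = min(fp / n_false, max_fpr)
        if right <= left:
            break            # the cutoff has been reached
        area += (tp / n_true) * (right - left)
    # Divide by the cutoff so that a perfect ranking scores 1.
    return area / max_fpr
```

A rate-based cutoff scales with the number of non-differentially expressed genes, so it remains meaningful when a large share of genes is differentially expressed.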

Two of the combinations cannot be executed in R, so we have no information about their performance: HGU133 + dChip(PM-MM) + EBarrays(GG) and HGU133 + PDNN + EBarrays(GG). Thus, there are 35 combinations for each of the HGU95 and Golden Spike datasets, and only 33 combinations for the HGU133 dataset.

We use AUC up to 100 FP for the HGU95 and HGU133 datasets, and AUC up to a false positive rate of 0.1 for the Golden Spike dataset, as summary measures. For each dataset, all combinations are ranked by AUC. Wherever there is a distinct difference in AUC between two consecutively ranked combinations, the ranking is split at that point into separate groups. In this way, the combinations are clustered into four groups for the HGU95 dataset, four groups for the HGU133 dataset, and three groups for the Golden Spike dataset, as shown in Tables 5, 6 and 7, respectively.
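The grouping rule can be sketched as follows. The text does not state how large a "distinct difference" must be, so the `min_gap` threshold, the function name `split_by_gaps` and the AUC values in the usage lines are hypothetical and serve only to illustrate the idea.

```python
def split_by_gaps(auc_by_combination, min_gap):
    """Rank combinations by AUC and start a new group at every large gap.

    auc_by_combination : dict mapping a combination name to its AUC
    min_gap            : hypothetical threshold; a drop in AUC larger than
                         this between neighbours in the ranking opens a
                         new group
    Returns a list of groups (best first), each a list of names.
    """
    ranked = sorted(auc_by_combination.items(),
                    key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    groups = [[ranked[0][0]]]
    for (_, prev_auc), (name, auc) in zip(ranked, ranked[1:]):
        if prev_auc - auc > min_gap:
            groups.append([])        # distinct drop: open a new group
        groups[-1].append(name)
    return groups


# Made-up AUC values, for illustration only.
example = {"A": 0.95, "B": 0.94, "C": 0.80, "D": 0.79, "E": 0.50}
print(split_by_gaps(example, min_gap=0.05))
```

In this sketch the number of groups is not fixed in advance; it follows from where the gaps in the ranked AUC values fall.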

For HGU95 dataset

In the HGU95 dataset, (1) RMA or PDNN combined with most differential expression methods performs excellently, except when the Welch t-test is the differential expression method (Figure 1-2). (2) Conversely, combinations that use MAS 5.0 or dChip(PM-MM) for preprocessing are inferior to the other combinations (Figure 1-2), and the group with the smallest AUC consists entirely of combinations that use MAS 5.0 or dChip(PM-MM) as the preprocessing method (Table 5). (3) Whenever the Welch t-test is used as the differential expression method, performance is not good enough, even when it is combined with RMA or PDNN (Figure 2-1).

(4) For a fixed differential expression method, performance varies widely across preprocessing methods, except for the t-test and the Welch t-test (Figure 3-1). In addition, all combinations using the t-test outperform those using the Welch t-test.

For HGU133 dataset

Results for HGU133 are very similar to those for HGU95. Conclusions (1) to (3) also hold for HGU133 (Figure 1-4, Table 6 and Figure 2-2). The difference is that performance varies widely across preprocessing methods for every differential expression method.

For Golden Spike dataset

Results for the Golden Spike dataset differ from those for the two datasets above. (1) Instead of RMA and PDNN, dChip performs outstandingly on this dataset.

Figure 1-6 shows that all combinations are clearly divided into three groups.

Eleven combinations are classified into the outstanding group, and all of them use dChip; in particular, dChip(PM-only) combined with every differential expression method is included. dChip(PM-MM), however, performs at the extremes. When it is combined with a suitable differential expression method, such as the t-test, Welch t-test, limma or SAM, performance is outstanding; otherwise, it performs disappointingly (Figure 2-3). (2) The following seven combinations are the worst: MAS5.0+SAM, MAS5.0+FC, MAS5.0+EBarrays(GG), MAS5.0+EBarrays(LNN), dChip(PM-MM)+FC, dChip(PM-MM)+EBarrays(GG), and dChip(PM-MM)+EBarrays(LNN) (Table 7). Notice that, in all three datasets, the five combinations MAS5.0+FC, MAS5.0+EBarrays(GG), MAS5.0+EBarrays(LNN), dChip(PM-MM)+FC, and dChip(PM-MM)+EBarrays(LNN) are classified into the worst group by AUC.

4.2 Assessment of reliability by overlap rate

For this dataset, the true number of differentially expressed genes is unknown.

We show the patterns of all combinations on a log scale in Figure 4 and find that the trend of most combinations is similar when fewer than 10000 genes are selected as differentially expressed. Moreover, identifying too many genes as differentially expressed would require a much lower threshold on the significance “score” of the differential expression method, which is not a practical threshold. Thus, our reliability comparison focuses on x-axis values below 10000. Here, the four conditions, each a treated tissue compared with its control, are simply called K_AA, L_AA, L_CFY, and L_RDL.

Low overlap rate for MAS 5.0 and dChip(PM-MM)

For each condition (K_AA, L_AA, L_CFY, and L_RDL), the combinations are divided by preprocessing method into five small graphs, as in Figures 5-1 to 5-4.

Figures 5-1 to 5-3 show that the overlap rates across two sites are lower than 0.6 when using MAS 5.0 or dChip(PM-MM), whereas the three other preprocessing methods, RMA, PDNN and dChip(PM-only), give higher overlap rates. For L_RDL, overlap rates exceed 0.6 even with MAS 5.0 or dChip(PM-MM), but this reflects an overall improvement in overlap rate for L_RDL rather than an improvement specific to MAS 5.0 or dChip(PM-MM) (Figure 5-4).
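For readers who want to reproduce this kind of curve, a minimal sketch of an overlap rate between two sites is given here. It assumes that the overlap rate at a list size n is the fraction of the top-n genes shared by the two sites' ranked lists; this definition, the function name `overlap_rate` and the gene identifiers are assumptions of the example rather than the exact procedure used above.

```python
def overlap_rate(ranked_site1, ranked_site2, n):
    """Fraction of genes shared by the top-n lists of two sites.

    ranked_site1, ranked_site2 : gene identifiers ordered from most to
                                 least significant at each site
    n                          : number of genes selected as
                                 differentially expressed at each site
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top1 = set(ranked_site1[:n])
    top2 = set(ranked_site2[:n])
    return len(top1 & top2) / n


# Made-up rankings of four genes at two sites, for illustration only.
site1 = ["g1", "g2", "g3", "g4"]
site2 = ["g2", "g1", "g5", "g3"]
for n in (2, 3, 4):
    print(n, overlap_rate(site1, site2, n))
```

Evaluating the function over a range of n gives one curve per combination of preprocessing and differential expression method.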

Performances for EBarrays

For each preprocessing method combined with EBarrays, the Gamma-Gamma and Lognormal-Normal models show very similar patterns (Figure 6).

Usually, when EBarrays is used, no genes overlap when few genes are selected as differentially expressed, but the overlap rate increases rapidly once the number of selected genes reaches a certain level. This level varies with the preprocessing method: it is usually lower for MAS 5.0 and dChip(PM-MM) and higher for the others. However, even after the rapid increase, performance is still not good enough compared with the combinations that perform well. These features can be seen in Figures 5-1 to 5-4.

Top 2 combinations

We now assign the same color to combinations that use the same differential expression method in Figure 7; most lines clearly cluster by color. Performance is worse with the t-test (green) or Welch t-test (blue) and better with FC (black). SAM and limma perform well when fewer genes are selected as differentially expressed.

Because of the poor performance of MAS5.0, dChip(PM-MM), the t-test, the Welch t-test and EBarrays, Figure 8 considers only the 9 combinations of RMA, dChip(PM-only) or PDNN as the preprocessing method with FC, SAM or limma as the differential expression method. Figure 8 shows that two combinations, RMA+FC (blue) and PDNN+FC (yellow), have the highest and nearly equal overlap rates. Thus the top two combinations in reliability are RMA+FC and PDNN+FC.
