
Real Examples


National Chengchi University (國立政治大學)

are given in Figure 5.1, in which the density function of the diseased population is in red and that of the non-diseased population is in blue. Consider the following diagnostic rule for a single biomarker: the subject has a positive diagnosis if the observed value of the biomarker exceeds some critical value. In each panel of Figure 5.1, the vertical line x = c is the cutoff point corresponding to the upper limit t = 0.1 of 1 − specificity. Consequently, the marginal pAUC of each biomarker is obtained by integrating the right-tail probabilities of the diseased distribution over the cutoffs at or above c, that is, over false positive rates from 0 to t. Similarly, Table 5.7 and Figure 5.2 describe the distributions of the optimal linear combinations of the reduced biomarker sets obtained by the two biomarker selection methods.

From these results, we find that $\sqrt{Q_0/Q_1}$ is critical in determining the contribution of a biomarker. When the diseased population is relatively more heterogeneous than the non-diseased population, i.e. $\sqrt{Q_0/Q_1} \approx 0$, the integrand of the pAUC is greater and hence the pAUC value is larger. Conversely, a more homogeneous diseased group ($\sqrt{Q_0/Q_1} \approx \infty$) tends to generate a smaller pAUC value. Hence, although the top four biomarkers have clearly distinct characteristics between the two groups and strong associations with the disease, they are insignificant in the test of the marginal pAUC. However, no matter which selection method we use, too small a value of $|(a^{*T}\mu_1 - a^{*T}\mu_0)/\sqrt{Q_1}|$ (for example, for HFS) may lead to insignificance.

Finally, from the bottom of Table 5.7, although it has a smaller $|(a^{*T}\mu_1 - a^{*T}\mu_0)/\sqrt{Q_1}|$, the optimal linear combination via the Forward method has a greater pAUC than that via the Backward method because it has a lower $\sqrt{Q_0/Q_1}$.
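The roles of the two quantities discussed above can be illustrated numerically. The following is a minimal sketch, assuming that both groups are normally distributed and that larger values indicate disease, so that the ROC curve is $\Phi(a + b\,\Phi^{-1}(u))$ with $a$ standing for $(a^{*T}\mu_1 - a^{*T}\mu_0)/\sqrt{Q_1}$ and $b$ for $\sqrt{Q_0/Q_1}$. The function name `binormal_pauc` and the midpoint-rule integration are illustrative choices, not the procedure used in this chapter.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def binormal_pauc(a, b, t, n_steps=10_000):
    """Approximate pAUC(t) = integral over u in (0, t) of Phi(a + b * Phi^{-1}(u)).

    a: standardized mean difference (assumed to stand for the quantity in the text)
    b: ratio of standard deviations, non-diseased over diseased
    t: upper limit of 1 - specificity
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    h = t / n_steps
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * h  # midpoint of the k-th subinterval, never 0 or 1
        total += _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(u))
    return total * h


# For a fixed mean difference, print the pAUC for several values of b.
for b in (0.5, 1.0, 2.0):
    print(b, binormal_pauc(a=1.0, b=b, t=0.1))
```

Under these assumptions, for t = 0.1 the value of $\Phi^{-1}(u)$ is negative over the whole integration range, so a smaller b raises the integrand everywhere, which is the mechanism described in the text.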

5.4 Magic Gamma Telescope Data

The data of the telescope example are generated by a Monte Carlo (MC) program to simulate the registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using an imaging technique. The data description and records are available at http://archive.ics.uci.edu/ml/datasets/MAGIC+GAMMA+Telescope.

The data set contains 12,332 gamma (signal) events and 6,688 hadron (background) events, and has ten attributes that serve as the markers. The data website suggests five specificity lower limits from 0.8 to 0.99.


That is, t ranges from 0.01 to 0.2. Hence, we show the optimal linear combinations obtained by our algorithm and by that of Liu et al. (2005), together with the corresponding pAUC values for the specificity range (1 − t, 1), in the upper panel of Table 5.8. From Table 5.8, when t is less than or equal to 0.1, we observe that the linear combinations are almost the same, and the main contributing variables are the third, fourth and fifth markers. When t is increased to 0.2, the fourth and fifth variables remain important, but the contribution of the third variable is greatly reduced.
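To make the notion of a sample pAUC over the specificity range (1 − t, 1) concrete, the sketch below scores each observation with a linear combination and then counts, among the non-diseased scores that fall above the cutoff, how many are exceeded by each diseased score. This is one common empirical estimator and is given only as an illustration; the estimator used for Table 5.8 may differ, and the names `linear_scores` and `empirical_pauc` are hypothetical.

```python
import math


def linear_scores(coefs, rows):
    """Return the linear combination sum_k coefs[k] * row[k] for every row."""
    return [sum(a * x for a, x in zip(coefs, row)) for row in rows]


def empirical_pauc(scores_d0, scores_d1, t):
    """Empirical pAUC over false positive rates in (0, t).

    scores_d0: scores of the non-diseased (D=0) group
    scores_d1: scores of the diseased (D=1) group
    Only the floor(t * n0) largest D=0 scores lie above the cutoff, so only
    those are compared with the D=1 scores.
    """
    n0, n1 = len(scores_d0), len(scores_d1)
    k = math.floor(t * n0 + 1e-9)  # small offset guards against float rounding
    top_d0 = sorted(scores_d0, reverse=True)[:k]
    count = sum(1 for y in top_d0 for x in scores_d1 if x > y)
    return count / (n0 * n1)


# Toy data (made up for illustration): two markers, coefficients (1, -1).
d0 = linear_scores([1.0, -1.0], [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0]])
d1 = linear_scores([1.0, -1.0], [[6.0, 1.0], [4.5, 1.0]])
print(empirical_pauc(d0, d1, t=0.2))
```

With perfect separation of the two groups this estimator attains its maximum value t, which gives a quick sanity check on any implementation.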

Next, the biomarker selection results for t = 0.1 and 0.2 based on the standardized data are presented in the lower panel of Table 5.8. As in the breast tissue example, after data standardization more variables have coefficients well away from zero, so the number of major contributing variables increases. Before biomarker selection, the eighth and tenth markers have the smallest absolute coefficients in the optimal linear combination of the full set. We find that the two biomarker selection methods select the same significant biomarker set, which excludes the eighth and tenth biomarkers. The stepwise details are reported in Table 5.9. Moreover, the optimal linear combinations of the reduced biomarker sets selected via the LASSO method are also reported in the lower panel of Table 5.8. With λmin, two markers, fConc and fAsym, are discarded; with λ1SE, three markers, fConc, fAsym and fM3Trans, are discarded. In addition, the biomarker sets selected via LASSO with the two values of λ both differ from the biomarker set selected via our methods. In terms of the optimal sample pAUC, when t = 0.1 our two biomarker selection methods are better than the LASSO with either λ, but when t = 0.2 the LASSO with either λ is better than our two methods.
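The effect of standardization on the coefficients can be sketched as follows. A linear combination of the raw markers ranks the observations exactly as the combination of the standardized markers whose k-th coefficient is the raw coefficient multiplied by the k-th standard deviation, so a marker measured on a large scale can have a tiny raw coefficient and still contribute substantially. The sketch assumes standardization by the overall sample mean and standard deviation and rescales the coefficient vector to unit length (the tabulated vectors appear to be normalized this way); both choices, and the function names, are assumptions made for illustration.

```python
import math
import statistics


def standardize(rows):
    """Center and scale each column to mean 0 and sample SD 1; also return the SDs."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    return z, sds


def rescale_coefficients(raw_coefs, sds):
    """Coefficients on the standardized scale giving the same ranking, with unit length."""
    scaled = [a * s for a, s in zip(raw_coefs, sds)]
    norm = math.sqrt(sum(v * v for v in scaled))
    return [v / norm for v in scaled]


# Toy data (made up): the second marker is on a much larger scale than the first.
rows = [[0.1, 100.0], [0.5, 300.0], [0.9, 200.0], [0.3, 50.0]]
raw_coefs = [0.9, -0.004]
z_rows, sds = standardize(rows)
print(rescale_coefficients(raw_coefs, sds))
```

Because the two score vectors differ only by a positive affine transformation, any rank-based quantity such as the sample pAUC is the same for both; differences between raw and standardized fits in practice would then come from the optimization, not from the rescaling itself.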

Figure 5.1: The distributions of I0, P, DR, DA, PA500, AREA, A/DA, MAX IP, and HFS for two groups in the breast tissue example.


[Figure 5.2 panels: "Forward Method" and "Backward Method", each showing the distributions of the D=0 and D=1 groups with the cutoff c marked.]

Figure 5.2: The distributions of the best linear combination obtained via the Forward method and the Backward method for two groups in the breast tissue example.


Table 5.8: The coefficients of the optimal linear combination and the corresponding pAUC value for the specificity range (1−t, 1) in the telescope example.

I. Finding the optimal linear combination

| t | Method | fLength | fWidth | fSize | fConc | fConc1 | fAsym | fM3Long | fM3Trans | fAlpha | fDist | $\widehat{pAUC}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | Multiple-initial | -0.0028 | -0.0065 | 0.5285 | -0.7354 | -0.4240 | 0.0006 | 0.0008 | 0.0003 | -0.0106 | 0.0003 | 0.0009 |
| | Liu et al. (2005) | -0.0006 | -0.0052 | 0.5330 | -0.5124 | 0.6733 | 0.0004 | 0.0006 | 0.0002 | -0.0025 | 0.0002 | 0.0006 |
| 0.02 | Multiple-initial | -0.0029 | -0.0059 | 0.4531 | -0.7391 | -0.4982 | 0.0006 | 0.0008 | 0.0003 | -0.0109 | 0.0002 | 0.0025 |
| | Liu et al. (2005) | -0.0006 | -0.0052 | 0.5330 | -0.5124 | 0.6733 | 0.0004 | 0.0006 | 0.0002 | -0.0025 | 0.0002 | 0.0016 |
| 0.05 | Multiple-initial | -0.0031 | -0.0048 | 0.3287 | -0.7363 | -0.5913 | 0.0005 | 0.0007 | 0.0003 | -0.0112 | 0.0002 | 0.0100 |
| | Liu et al. (2005) | -0.0006 | -0.0052 | 0.5330 | -0.5124 | 0.6733 | 0.0004 | 0.0006 | 0.0002 | -0.0025 | 0.0002 | 0.0064 |
| 0.10 | Multiple-initial | -0.0035 | -0.0040 | 0.2638 | -0.8770 | -0.4014 | 0.0005 | 0.0006 | 0.0002 | -0.0121 | 0.0001 | 0.0284 |
| | Liu et al. (2005) | -0.0006 | -0.0052 | 0.5330 | -0.5124 | 0.6733 | 0.0004 | 0.0006 | 0.0002 | -0.0025 | 0.0002 | 0.0181 |
| 0.20 | Multiple-initial | -0.0007 | 0.0005 | -0.0509 | 0.3478 | -0.9362 | 0.0002 | 0.0003 | 0.0000 | -0.0040 | 0.0000 | 0.0778 |
| | Liu et al. (2005) | -0.0006 | -0.0052 | 0.5330 | -0.5124 | 0.6733 | 0.0004 | 0.0006 | 0.0002 | -0.0025 | 0.0002 | 0.0508 |

II. Biomarker selection

| t | Method | fLength | fWidth | fSize | fConc | fConc1 | fAsym | fM3Long | fM3Trans | fAlpha | fDist | $\widehat{pAUC}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | Full set (raw) | -0.0035 | -0.0040 | 0.2638 | -0.8770 | -0.4014 | 0.0005 | 0.0006 | 0.0002 | -0.0121 | 0.0001 | 0.0284 |
| | Full set (standardized) | -0.3441 | -0.2898 | 0.3126 | -0.2359 | -0.2741 | 0.1079 | 0.0721 | 0.0195 | -0.7427 | -0.0156 | 0.0284 |
| | Forward | -0.3500 | -0.2849 | 0.2969 | -0.2427 | -0.2819 | 0.1124 | 0.0715 | 0.0000 | -0.7430 | 0.0000 | 0.0284 |
| | Backward | -0.3500 | -0.2849 | 0.2969 | -0.2427 | -0.2819 | 0.1124 | 0.0715 | 0.0000 | -0.7430 | 0.0000 | 0.0284 |
| | LASSO(λmin) | -0.3604 | -0.1380 | 0.2110 | 0.0000 | -0.5375 | 0.0000 | 0.0939 | 0.0028 | -0.7133 | -0.0001 | 0.0282 |
| | LASSO(λ1SE) | -0.3408 | -0.1995 | 0.3549 | 0.0000 | -0.4670 | 0.0000 | 0.1052 | 0.0000 | -0.6990 | -0.0178 | 0.0281 |
| 0.2 | Full set (raw) | -0.0007 | 0.0005 | -0.0509 | 0.3478 | -0.9362 | 0.0002 | 0.0003 | 0.0000 | -0.0040 | 0.0000 | 0.0778 |
| | Full set (standardized) | -0.3863 | -0.1445 | 0.1316 | 0.0048 | -0.4854 | 0.0640 | 0.0820 | 0.0152 | -0.7519 | -0.0230 | 0.0814 |
| | Forward | -0.4115 | -0.2147 | 0.2414 | 0.0000 | -0.4343 | 0.1045 | 0.0649 | 0.0000 | -0.7229 | 0.0000 | 0.0810 |
| | Backward | -0.4115 | -0.2147 | 0.2414 | 0.0000 | -0.4343 | 0.1045 | 0.0649 | 0.0000 | -0.7229 | 0.0000 | 0.0810 |
| | LASSO(λmin) | -0.3732 | -0.0860 | 0.0191 | 0.0000 | -0.5657 | 0.0000 | 0.0637 | 0.0130 | -0.7271 | -0.0045 | 0.0814 |


Table 5.9: The Forward and the Backward selections for the specificity range (0.9,1) in the telescope example.

I. Forward selection

| Step | Marker entries | Test statistic | Test value | p-value | Marker selected |
|---|---|---|---|---|---|
| 1 | fAlpha | $\widehat{pAUC}$ | 0.0200 | 0.000 | fAlpha |
| 2 | fLength | $\hat{a}_{\mathrm{fLength}}$ | -0.1614 | 0.000 | fAlpha, fLength |
| 3 | fSize | $\hat{a}_{\mathrm{fSize}}$ | 0.5234 | 0.000 | fAlpha, fLength, fSize |
| 4 | fWidth | $\hat{a}_{\mathrm{fWidth}}$ | -0.2910 | 0.000 | fAlpha, fLength, fSize, fWidth |
| 5 | fConc1 | $\hat{a}_{\mathrm{fConc1}}$ | -0.5064 | 0.000 | fAlpha, fLength, fSize, fWidth, fConc1 |
| 6 | fConc | $\hat{a}_{\mathrm{fConc}}$ | -0.3068 | 0.000 | fAlpha, fLength, fSize, fWidth, fConc1, fConc |
| 7 | fAsym | $\hat{a}_{\mathrm{fAsym}}$ | 0.1141 | 0.000 | fAlpha, fLength, fSize, fWidth, fConc1, fConc, fAsym |
| 8 | fM3Long | $\hat{a}_{\mathrm{fM3Long}}$ | 0.0715 | 0.000 | fAlpha, fLength, fSize, fWidth, fConc1, fConc, fAsym, fM3Long |
| 9 | fM3Trans | $\hat{a}_{\mathrm{fM3Trans}}$ | 0.0188 | 0.364 | fAlpha, fLength, fSize, fWidth, fConc1, fConc, fAsym, fM3Long |
| 10 | fDist | $\hat{a}_{\mathrm{fDist}}$ | -0.0152 | 0.466 | fAlpha, fLength, fSize, fWidth, fConc1, fConc, fAsym, fM3Long |

II. Backward selection

| Step | Marker assessed | Test statistic | Test value | p-value | Marker selected |
|---|---|---|---|---|---|
| 1 | All | $\widehat{pAUC}$ | 0.0284 | 0.000 | fDist, fM3Trans, fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 2 | fDist | $\hat{a}_{\mathrm{fDist}}$ | -0.0157 | 0.360 | fM3Trans, fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 3 | fM3Trans | $\hat{a}_{\mathrm{fM3Trans}}$ | 0.0188 | 0.256 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 4 | fM3Long | $\hat{a}_{\mathrm{fM3Long}}$ | 0.0715 | 0.002 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 5 | fAsym | $\hat{a}_{\mathrm{fAsym}}$ | 0.1124 | 0.000 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 6 | fConc | $\hat{a}_{\mathrm{fConc}}$ | -0.2427 | 0.000 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 7 | fConc1 | $\hat{a}_{\mathrm{fConc1}}$ | -0.2820 | 0.000 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 8 | fWidth | $\hat{a}_{\mathrm{fWidth}}$ | -0.2849 | 0.000 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 9 | fSize | $\hat{a}_{\mathrm{fSize}}$ | 0.2969 | 0.000 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 10 | fLength | $\hat{a}_{\mathrm{fLength}}$ | -0.3500 | 0.000 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |
| 11 | fAlpha | $\hat{a}_{\mathrm{fAlpha}}$ | -0.7430 | 0.000 | fM3Long, fAsym, fConc, fConc1, fWidth, fSize, fLength, fAlpha |

Note: indicates a significance at α = 5%.


Chapter 6
