CHAPTER 6. SIGNATURE VALIDATION

Validation results - Microarray Data Analysis and Prediction of Lung Adenocarcinoma Survival

All testing results are summarized in Table 6.1. For comparison with our signature, Table 6.1 also includes the results of using TNM tumor stage as a classifier and of the best-performing method (method A) from Shedden et al. (2008). Here the TNM tumor stage classifier simply separated the all-stage patients into two groups, stage I and late stage; the stage I patients were further separated into stage IA and stage IB. Method A gave a continuous risk score, constructed by using the average expression profiles of 100 clusters to fit a ridge Cox proportional hazards model. Note that the results of method A were obtained from expression profiles preprocessed with the dChip algorithm applied to the entire training and testing data set.
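The TNM stage classifier described above can be sketched as a simple grouping rule. This is only an illustration: the function names and the example stage labels are hypothetical, and the exact stage coding used in the data sets is not given in the text.

```python
def tnm_group(stage: str) -> str:
    """Collapse a TNM stage label into two groups: stage I vs. late stage."""
    s = stage.strip().upper()
    # Stage I covers "I", "IA" and "IB"; every other stage counts as late stage.
    return "stage I" if s in {"I", "IA", "IB"} else "late stage"


def stage_one_subgroup(stage: str) -> str:
    """Within stage I, separate patients by substage (IA vs. IB)."""
    s = stage.strip().upper()
    if s not in {"IA", "IB"}:
        raise ValueError(f"not a stage I substage: {stage!r}")
    return s


# Hypothetical patient labels, for illustration only.
stages = ["IA", "IB", "IIA", "IIIB", "IV"]
groups = [tnm_group(s) for s in stages]
print(groups)
```

The two-group label would be used for the all-stage comparison and the IA/IB label for the stage I comparison.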

In the two testing sets, CAN/DF and MSK, all four methods performed well in predicting survival for the all-stage patients. Only our gene signature, in both its continuous and categorical forms, had hazard ratios significantly greater than 1 for both all-stage and stage I patients in both data sets. The hazard ratio of method A was not significantly greater than 1 for stage I patients in the CAN/DF data. The hazard ratios for TNM tumor stage IA versus IB were not significantly greater than 1 in either testing set; in the CAN/DF data set the hazard ratio was even smaller than 1. These results do not support using the IA/IB split as a classifier for stage I patients.

In the external validation cohort from Duke University, the hazard ratios of our signature were also significantly greater than 1 for both the all-stage and the early-stage patients. In this data set our signature performed better for stage I patients than for all patients. The Duke data set contained five stage IV and fifteen stage IIIB patients, whereas the training data set contained only 11 stage IIIB patients and no stage IV patients. Furthermore, five of these late-stage patients died within half a year. This might explain why, for all patients, our signature did not perform as well as the TNM tumor stage.
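Whether a hazard ratio is "significantly greater than 1" can be checked against the reported 95% intervals. The sketch below assumes that the intervals are Wald intervals (symmetric on the log scale), which the text does not state; the function name wald_from_ci is hypothetical. It backs out the standard error of the log hazard ratio and the Wald z statistic for the CAN/DF stage I rows of Table 6.1.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover SE(log HR), the Wald z and the geometric midpoint of a 95% interval."""
    # Assumption: the interval is exp(log(HR) +/- z_crit * SE).
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    midpoint = math.sqrt(lower * upper)  # equals HR if the interval is a Wald interval
    return se, z, midpoint


# CAN/DF stage I rows of Table 6.1: (hazard ratio, lower bound, upper bound)
rows = {
    "Risk score": (1.59, 1.01, 2.51),
    "Categorical": (3.78, 1.04, 13.74),
    "TNM stage": (0.55, 0.17, 1.80),
    "Method A": (1.29, 0.84, 1.98),
}

for name, (hr, lo, hi) in rows.items():
    se, z, mid = wald_from_ci(hr, lo, hi)
    # The interval excludes 1 from below exactly when its lower bound exceeds 1.
    print(f"{name:12s} z={z:5.2f}  midpoint={mid:4.2f}  lower bound > 1: {lo > 1.0}")
```

Under this assumption each hazard ratio should sit at the geometric midpoint of its interval, which holds for the four rows above. For the CAN/DF all-stage Method A row, the midpoint of (1.20, 2.60) is about 1.77 whereas the printed value is 0.57; since exp(0.57) is also about 1.77, the printed value may be a log hazard ratio, although the source does not confirm this.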

Kaplan-Meier curves are given to illustrate the difference in survival functions between the high-risk and low-risk groups (Figures 6.1 and 6.2). Kaplan-Meier curves for patients classified by TNM tumor stage are also given (Figures 6.3 and 6.4). Because of its small sample size, high censoring rate and relatively homogeneous samples, the CAN/DF data set was much harder to classify into different risk groups than the other data sets. The Kaplan-Meier curves showed that our signature had reasonably good prediction power even in such a data set. The significant p-value for the Duke validation set showed that our gene signature, although derived from adenocarcinoma patients only, had the potential to predict survival for patients with different tumor types.
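As a minimal sketch of how such curves are computed, the code below implements the Kaplan-Meier product-limit estimator and separates patients into low-score and high-score groups. The median cut-off is an assumption suggested by the near-equal group sizes in the figure legends, not something the text states; the function names and the toy data are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed death time.

    times:  follow-up time per patient (for example, in months)
    events: 1 if death was observed, 0 if the patient was censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk  # product-limit update
        curve.append((t, surv))
    return curve


def split_by_median(scores):
    """Label each risk score as 'high' or 'low' relative to the median (assumed cut-off)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Toy data, for illustration only.
scores = [0.2, 1.5, -0.3, 2.1, 0.9, -1.0]
times = [60, 12, 48, 8, 30, 55]
events = [0, 1, 1, 1, 1, 0]

groups = split_by_median(scores)
for g in ("low", "high"):
    idx = [i for i, label in enumerate(groups) if label == g]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(g, curve)
```

Censored patients stay in the risk set until their last follow-up time, which is why a highly censored data set such as CAN/DF yields few steps in the curve and makes the groups harder to separate.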

We concluded that our gene signature had good prediction power for both all-stage and early-stage non-small cell lung cancer patients.


Table 6.1: Validation results in CAN/DF, MSK and Duke data sets

CAN/DF All stage   Hazard ratio   95% C.I.         p-value   CPE
Risk score         1.65           (1.17, 2.31)     0.002     0.662
Categorical        3.96           (1.68, 9.34)     0.001     0.651
TNM stage          3.25           (1.54, 6.84)     0.002     0.616
Method A           0.57           (1.20, 2.60)     0.003     0.623

CAN/DF stage I     Hazard ratio   95% C.I.         p-value   CPE
Risk score         1.59           (1.01, 2.51)     0.036     0.666
Categorical        3.78           (1.04, 13.74)    0.027     0.648
TNM stage          0.55           (0.17, 1.80)     0.347     0.546
Method A           1.29           (0.84, 1.98)     0.243     0.574

MSK All stage      Hazard ratio   95% C.I.         p-value   CPE
Risk score         1.68           (1.13, 2.51)     0.012     0.614
Categorical        2.65           (1.29, 5.45)     0.006     0.614
TNM stage          3.87           (1.91, 7.85)     0.000     0.642
Method A           1.83           (1.24, 2.70)     0.002     0.627

MSK stage I        Hazard ratio   95% C.I.         p-value   CPE
Risk score         2.23           (1.14, 4.35)     0.023     0.654
Categorical        11.89          (1.53, 92.16)    0.001     0.715
TNM stage          2.60           (0.70, 9.63)     0.127     0.611
Method A           2.10           (1.15, 3.84)     0.014     0.656

Duke All stage     Hazard ratio   95% C.I.         p-value   CPE
Risk score         1.22           (1.02, 1.47)     0.032     0.580
Categorical        1.71           (1.01, 2.87)     0.043     0.566
TNM stage          2.17           (1.29, 3.63)     0.004     0.589

Duke stage I       Hazard ratio   95% C.I.         p-value   CPE
Risk score         1.44           (1.10, 1.87)     0.007     0.635
Categorical        2.93           (1.36, 6.34)     0.005     0.625
TNM stage          1.97           (0.95, 4.10)     0.070     0.580
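The CPE column is not defined in this section. Assuming it denotes the concordance probability estimate of Gönen and Heller for a Cox model, it depends only on the fitted linear predictors: each pair of patients with different predictors contributes 1 / (1 + exp(-|d|)), where d is the difference between their linear predictors. The sketch below follows that formula. The function name cpe and the tie_weight parameter are hypothetical; how the original analysis treated tied predictors, which are common for a categorical classifier, is not stated in the text.

```python
import math
from itertools import combinations


def cpe(linear_predictors, tie_weight=0.0):
    """Concordance probability estimate from Cox linear predictors (beta' x).

    tie_weight is the contribution of a pair with identical predictors:
    0.0 follows the indicator form of the formula, 0.5 is its limit as d -> 0.
    """
    n = len(linear_predictors)
    if n < 2:
        raise ValueError("need at least two patients")
    total = 0.0
    for a, b in combinations(linear_predictors, 2):
        d = abs(a - b)
        if d > 0.0:
            # Model-based probability that the lower-risk patient survives longer.
            total += 1.0 / (1.0 + math.exp(-d))
        else:
            total += tie_weight
    return 2.0 * total / (n * (n - 1))


# Hypothetical linear predictors, for illustration only.
print(cpe([0.1, 0.5, 2.0]))
```

Because the estimate uses no survival times, it is unaffected by the censoring pattern; a value of 0.5 corresponds to no discrimination and values closer to 1 to stronger separation of the risk scores.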


[Figure 6.1 placeholder. Axes: Time (months), 0 to 60, versus Proportion alive, 0.0 to 1.0. Panels and legends:
Training - All stage: Low score (n=128), High score (n=128)
Duke - All stage: Low score (n=55), High score (n=56); Cat. p= 0.04181, CPE= 0.57; Score p= 0.03227, CPE= 0.58
CAN/DF - All stage: Low score (n=41), High score (n=41)
MSK - All stage: Low score (n=52), High score (n=52); Cat. p= 0.00596, CPE= 0.61; Score p= 0.01223, CPE= 0.61]

Figure 6.1: Kaplan-Meier curves for all stage samples separated by gene signature


[Figure 6.2 placeholder. Axes: Time (months), 0 to 60, versus Proportion alive, 0.0 to 1.0. Panels and legends:
Training - Stage I: Low score (n=79), High score (n=80); Cat. p= 0.00095, CPE= 0.61; Score p= 0.00021, CPE= 0.65
Duke - Stage I: Low score (n=33), High score (n=34); Cat. p= 0.00422, CPE= 0.62; Score p= 0.00663, CPE= 0.63
CAN/DF - Stage I: Low score (n=28), High score (n=28); Cat. p= 0.03002, CPE= 0.65; Score p= 0.03591, CPE= 0.67
MSK - Stage I: Low score (n=31), High score (n=32); Cat. p= 0.00246, CPE= 0.71; Score p= 0.02304, CPE= 0.65]

Figure 6.2: Kaplan-Meier curves for stage I samples separated by gene signature


[Figure 6.3 placeholder. Axes: Time (months), 0 to 60, versus Proportion alive, 0.0 to 1.0. Panels and legends:
Training - All stage: Stage I (n=159), Stage II~IV (n=97); TNM stage I vs II~IV p= 0, CPE= 0.63
Duke - All stage: Stage I (n=67), Stage II~IV (n=44); TNM stage I vs II~IV p= 0.00269, CPE= 0.59
CAN/DF - All stage: Stage I (n=56), Stage II~IV (n=26); TNM stage I vs II~IV p= 0.00107, CPE= 0.62
MSK - All stage: Stage I (n=63), Stage II~IV (n=41); TNM stage I vs II~IV p= 6e-05, CPE= 0.64]

Figure 6.3: Kaplan-Meier curves for all stage samples separated by TNM stage


[Figure 6.4 placeholder. Axes: Time (months), 0 to 60, versus Proportion alive, 0.0 to 1.0. Panels and legends:
Training - Stage I: Stage IA (n=77), Stage IB (n=82)
Duke - Stage I: Stage IA (n=40), Stage IB (n=27)
CAN/DF - Stage I: Stage IA (n=11), Stage IB (n=45)
MSK - Stage I: Stage IA (n=27), Stage IB (n=36)
Recovered annotation (panel not identified): TNM stage IA vs IB p= 0.03236, CPE= 0.64]

Figure 6.4: Kaplan-Meier curves for stage I samples separated by TNM stage

Chapter 7 Summary and discussion

7.1 Summary

We reanalyzed a large adenocarcinoma data set from Shedden et al. (2008) and derived a gene signature from seven gene expression profiles. Tested in two independent validation data sets, our gene signature had significant prediction power for survival, both for all-stage patients and for stage I patients only. Furthermore, our signature also had significant prediction power in an external NSCLC data set that contained two different tumor cell types.

Most available analysis procedures comprise three components: initial data filtering, feature selection and signature construction. Our analysis procedure likewise contains three steps: gene filtering, gene selection and signature construction. Compared with other methods, our procedure has several advantages. First, we proposed a new criterion to filter out inconsistent genes. Second, both the gene selection and the signature construction are supervised by patient survival. Third, some non-linear interaction structures are considered in our procedure. Fourth, the gene selection and signature construction parts of our procedure require the fewest model assumptions. Furthermore, the interpretation of our signature is easy and clear.

However, some issues remain unsolved. Although our LA hub gene selection procedure establishes the significance of the selected LA hub gene, it does not establish the significance of its paired genes. A method to screen the selected paired genes is needed; in practice, we may rely on biological knowledge to choose the paired genes. In addition, only one LA pair is used in the signature construction part of our analysis procedure. If more LA pairs are selected, a better model is needed to incorporate them into the dimension reduction model.
