
CHAPTER 7. SUMMARY AND DISCUSSION

Discussion of the selected genes

Our gene signature was derived from the expression profiles of seven genes: five were selected by the correlation method and the other two by the liquid association (LA) method. According to the sign of its coefficient in the signature, we call a selected gene either a protective ("protect") gene or a risk gene. A negative coefficient indicates that increased expression is associated with a good prognosis, and a positive coefficient indicates that increased expression is associated with a poor prognosis. The seven genes include both protective and risk genes.

The three protective genes are TMEM66, CSRP1 and BECN1. BECN1, also known as autophagy-related gene 6, plays a key role in autophagy. It has been reported to be involved in various cancers and can inhibit the growth of colorectal cancer cells. The expression of beclin 1 is associated with a favorable prognosis in stage IIIB colon cancers. CSRP1 is related to gene regulation, cell growth, and differentiation, and it is hypothesized to be a colorectal cancer-related tumor suppressor gene. TMEM66 is a novel gene encoding transmembrane protein 66.

The two risk genes are FOSL2 and ERO1L. FOSL2 belongs to the Fos gene family, which encodes leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. The Fos proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Overexpression of FOSL2 has also been reported to play a major role in CCR4 expression and oncogenesis in adult T-cell leukemia (ATL), to be associated with a more aggressive tumor phenotype, and to be probably involved in breast cancer progression in vivo. ERO1L is an essential oxidoreductase and a source of oxidative stress. However, it was identified as a protective gene in another lung adenocarcinoma prognostic study [30]. Another study [31] suggested that ERO1L plays a key role in inhibiting tumor growth via inhibiting VEGF-driven angiogenesis. These results disagree with our finding.

Therefore, to check this issue, we fitted a univariate Cox proportional hazards model with the expression of ERO1L as the only covariate. In the training data set, the hazard ratio is 1.35, which is significantly greater than 1, with a 95% confidence interval from 1.19 to 1.54 (p-value < $10^{-5}$). In the other validation data sets, the hazard ratios are all greater than 1, although they are not significant. Thus, in this large-sample data set, increased expression of ERO1L was associated with a poor prognosis.
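The reported interval and p-value can be cross-checked against each other. The sketch below assumes a Wald (normal) approximation on the log hazard ratio scale. The text does not state which test produced the p-value, so this is an assumption, and the function name `wald_summary` is hypothetical. Because the inputs are rounded to two decimals, the result is only approximate.

```python
import math


def wald_summary(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the log hazard ratio, its standard error, the z statistic
    and a two-sided p-value from a hazard ratio and its 95% interval.

    Assumes the interval is exp(beta +/- z_crit * se), i.e. a Wald interval.
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Values reported for ERO1L in the training data set.
beta, se, z, p = wald_summary(1.35, 1.19, 1.54)
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Under this approximation the p-value comes out below $10^{-5}$, in line with the reported value.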

The LA hub gene SRP54 encodes the signal recognition particle 54 kDa protein. It binds to the signal sequence of presecretory proteins when they emerge from the ribosomes and transfers them to TRAM. A gene amplification study suggested that SRP54, BAZ1A, NFKBIA, MBIP, HNF3A, and two uncharacterized expressed sequence tags are candidate targets of the amplification mechanism and therefore may be associated, together or separately, with the development and progression of esophageal squamous cell carcinoma (ESC). Its paired gene, PAWR, is a human gene coding for a tumor-suppressor protein that induces apoptosis in cancer cells but not in normal cells, and it is specifically upregulated during apoptosis of prostate cells.

Relationship between our permutation procedure and the Benjamini-Hochberg procedure under the independence assumption

In the gene selection part, we used a permutation procedure to select the cutoffs of the correlation coefficients and the LA scores. The cutoff was decided by the ratio of the expected number to the observed number of correlation coefficients greater than the cutoff. However, this issue can also be viewed as a multiple testing issue. A total of 6,252 null hypotheses, each stating that $X_g$ and $Y$ are uncorrelated, were tested with the test statistics $\mathrm{corr}(N(x_g), N(y))$, for $g = 1, 2, \ldots, G$. Since the normal quantile transformation depends only on the ranks of the variables, the $N(x_g)$'s are identically distributed. Thus, we can apply the permutation test to construct the reference distribution of $\mathrm{corr}(N(x), N(y))$ and get the significance level for each observed $r_g = \mathrm{corr}(N(x_g), N(y))$. The two-sided p-value for a corresponding cutoff $r$ can be evaluated by

$$p(r) = \frac{\#\{x^* \in \mathcal{N} : |\mathrm{corr}(x^*, N(y))| \ge r\}}{\#(\mathcal{N})},$$

where $\mathcal{N}$ is the collection of all the permutations of $N(x)$. Since the number of elements in $\mathcal{N}$ is too large (256!), in practice the p-value can be estimated by

$$\hat{p}(r) = \frac{\#\{x^* \in \mathcal{M} : |\mathrm{corr}(x^*, N(y))| \ge r\}}{\#(\mathcal{M})}, \qquad (7.2)$$

where $\mathcal{M}$ is a large subset of $\mathcal{N}$. With the p-values given by the permutation test, we may implement the Benjamini-Hochberg procedure [17] to address the multiple testing issue. The Benjamini-Hochberg procedure can be summarized as follows. Let $p_{(i)}$ be the ordered p-values, for $i = 1, 2, \ldots, m$.

The null hypotheses $H^{0}_{(1)}, H^{0}_{(2)}, \ldots, H^{0}_{(k)}$ are rejected for $k = \max\{i \mid p_{(i)} \le \frac{i}{m} q\}$. Under the assumption that the tests are independent or positively correlated, this procedure controls the false discovery rate (FDR) at level $\frac{m_0}{m} q \le q$, where $m$ is the total number of hypotheses ($G$) and $m_0$ is the number of true null hypotheses. Furthermore, Benjamini and Yekutieli (2001) [17] concluded that if $q$ is replaced by $q / \sum_{i=1}^{m} \frac{1}{i}$, then the Benjamini-Hochberg procedure still controls the FDR at level $\frac{m_0}{m} q \le q$, whatever the dependency among the tests. Nevertheless, the modified criterion is more conservative than the Bonferroni method for the first few ordered p-values. In our data, no genes could be selected with this modified procedure.
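The two step-up rules described above can be sketched in a few lines. This is a minimal illustration, not the code used in the analysis: the function names and the example p-values are hypothetical. The second function lists the ranks at which the modified threshold is stricter than the Bonferroni threshold $q/m$.

```python
def bh_reject_count(pvalues, q, arbitrary_dependence=False):
    """Return k = max{i : p_(i) <= (i/m) q}, or 0 if nothing is rejected."""
    m = len(pvalues)
    if arbitrary_dependence:
        # Benjamini-Yekutieli modification: replace q by q / sum_{i=1}^m 1/i.
        q = q / sum(1.0 / i for i in range(1, m + 1))
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * q / m:
            k = i  # keep the largest rank that satisfies the criterion
    return k


def ranks_stricter_than_bonferroni(m):
    """Ranks i whose modified threshold i*q/(m*H_m) is below q/m."""
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    return [i for i in range(1, m + 1) if i < harmonic]


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(bh_reject_count(pvals, q=0.05))
print(bh_reject_count(pvals, q=0.05, arbitrary_dependence=True))
print(ranks_stricter_than_bonferroni(6252))
```

With $m = 6{,}252$ the harmonic sum is about 9.3, so the modified thresholds fall below the Bonferroni threshold for the nine smallest p-values.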

The complex correlation structure of thousands of genes is not clear, and neither is the dependency among these tests, so the modified procedure might be too conservative. Here we discuss the relationship between our permutation procedure and the Benjamini-Hochberg procedure under the independence assumption. If we assume that the expressions $x_1, x_2, \ldots, x_G$ are independent, then our procedure constructs a reference distribution of size $1{,}000 \times 6{,}252$. Sort the absolute correlation coefficients in decreasing order, so that the corresponding p-values are increasing. The p-value of the $g$-th ordered absolute correlation coefficient $\hat{r}_{(g)}$ can then be estimated by $\hat{e}_g / G$ from equation (7.2) with $\#(\mathcal{M}) = 1000G$, where $\hat{e}_g$ is the expected number of correlation coefficients exceeding $\hat{r}_{(g)}$ under permutation and $g$ is the observed number. Our selection criterion $j = \max\{i \mid \hat{e}_i / i \le \alpha\}$ is therefore the same as the original criterion of the Benjamini-Hochberg procedure: substituting $p_{(i)} = \hat{e}_i / G$ into $p_{(i)} \le \frac{i}{G} q$ and multiplying both sides by $G / i$ gives $\hat{e}_i / i \le q$, which is our criterion with $\alpha = q$. Thus, under the independence assumption, our selection procedure is the same as the Benjamini-Hochberg procedure with p-values estimated by the permutation test.
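The equivalence can be checked numerically. The sketch below is an illustration under stated assumptions: the statistics are simulated stand-ins rather than real correlation coefficients, the sizes are small, and the function names are hypothetical. Exact fractions are used so that boundary cases are not affected by floating-point rounding.

```python
import random
from fractions import Fraction


def select_by_ratio(observed, null_stats, n_perm, alpha):
    """Largest i with e_i / i <= alpha, where e_i is the expected number of
    null statistics at or above the i-th largest observed statistic."""
    j = 0
    for i, r in enumerate(sorted(observed, reverse=True), start=1):
        count = sum(1 for s in null_stats if s >= r)
        e_i = Fraction(count, n_perm)  # average count per permutation round
        if e_i / i <= alpha:
            j = i
    return j


def select_by_bh(observed, null_stats, q):
    """Largest i with p_(i) <= (i/G) q, with p estimated from the pooled
    null statistics, as in equation (7.2)."""
    G = len(observed)
    k = 0
    for i, r in enumerate(sorted(observed, reverse=True), start=1):
        count = sum(1 for s in null_stats if s >= r)
        p_i = Fraction(count, len(null_stats))
        if p_i <= Fraction(i, G) * q:
            k = i
    return k


random.seed(0)
G, n_perm = 40, 50  # small hypothetical sizes (6,252 and 1,000 in the text)
# Stand-in statistics: mostly noise, with a few inflated "signals".
observed = [abs(random.gauss(0, 1)) for _ in range(G - 5)]
observed += [abs(random.gauss(4, 1)) for _ in range(5)]
null_stats = [abs(random.gauss(0, 1)) for _ in range(n_perm * G)]

for level in (Fraction(1, 100), Fraction(1, 20), Fraction(1, 10), Fraction(1, 5)):
    j = select_by_ratio(observed, null_stats, n_perm, level)
    k = select_by_bh(observed, null_stats, level)
    print(level, j, k)
```

The two selections coincide at every level because the pooled reference set has exactly `n_perm * G` elements, which is the condition $\#(\mathcal{M}) = 1000G$ in the text.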

Gene signature performance under different data preprocessing methods

The expression profiles we analyzed were preprocessed by the MAS 5.0 statistical algorithm from Affymetrix (2001). One reason we used the MAS 5.0 preprocessing method is that it can be applied to each chip separately; preprocessing the training and testing data together is somewhat unrealistic. This reason also motivated us to filter out the genes whose expression profiles were inconsistent across different preprocessing methods. Nevertheless, we also took the expression profiles of our selected genes preprocessed by two other methods, dChip and RMA, and derived the signature by applying the modified SIR to the training data. All the preprocessing methods were implemented separately for each data set. After implementing the modified sliced inverse regression method, the p-values of the large-sample chi-squared test were greater than 0.05 for both preprocessing methods, so we do not suggest using any leading eigenvector as the gene signature. However, as a comparison, we still validated the signatures with the same procedure on the CAN/DF and MSK data sets. The estimated coefficients of the SIR direction for the expression profiles preprocessed by dChip and RMA are given in Table 7.1.

Table 7.1: The estimated coefficients of SIR direction for expression profiles preprocessed by dChip and RMA

| Gene | dChip SIR dir. coefficient | RMA SIR dir. coefficient |
|---|---|---|
| TMEM66 | -0.8949 (Protect) | -0.6386 (Protect) |
| CSRP1 | -0.4813 (Protect) | -0.5222 (Protect) |
| BECN1 | -1.7374 (Protect) | -1.0540 (Protect) |
| FOSL2 | 0.7738 (Risk) | 0.4945 (Risk) |
| ERO1L | 0.6041 (Risk) | 0.5166 (Risk) |
| (SRP54) | -0.5563 ( - ) | -0.5033 ( - ) |
| (PAWR) | 0.0448 ( - ) | 0.0600 ( - ) |
| SRP54*PAWR | -0.3652 (Protect) | -0.7517 (Protect) |
| p-value | 0.10 | 0.10 |

We noted that our cross-platform adjustment could not be applied with these preprocessing methods. Therefore, we did not test the signatures on the Duke data set. The estimated hazard ratios with the corresponding 95% confidence intervals, p-values and CPE are given in Table 7.2. The results show that the continuous risk score was also significant in all testing sets, whereas the categorical classifier was not significant for stage I patients in the CAN/DF data. The Kaplan-Meier curves also illustrate that, for the stage I patients in the CAN/DF data, these signatures performed worse than the original signature derived from data preprocessed by the MAS 5.0 algorithm. One possible reason is the mean shift caused by preprocessing each data set separately. Our dimension reduction model is nonlinear because we added the interaction term, and the mean shift may affect this nonlinear structure. Therefore, these results suggest that the gene signature has potential for lung adenocarcinoma diagnostics, but improvements in array technology and preprocessing are still needed before it can be implemented in practice.
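To make the mean-shift argument concrete, the sketch below uses the dChip coefficients from Table 7.1. It assumes, purely for illustration, that the risk score is the linear combination of the expression values plus the SRP54*PAWR product term. The patient expression values, the shift size, and the function names are hypothetical.

```python
# dChip SIR direction coefficients from Table 7.1.
COEF = {
    "TMEM66": -0.8949, "CSRP1": -0.4813, "BECN1": -1.7374,
    "FOSL2": 0.7738, "ERO1L": 0.6041,
    "SRP54": -0.5563, "PAWR": 0.0448,
}
COEF_INTERACTION = -0.3652  # coefficient of SRP54*PAWR


def risk_score(expr):
    """Assumed form: linear terms plus the SRP54*PAWR product term."""
    linear = sum(COEF[gene] * expr[gene] for gene in COEF)
    return linear + COEF_INTERACTION * expr["SRP54"] * expr["PAWR"]


def shifted(expr, gene, delta):
    """Copy of expr with a constant added to one gene (a mean shift)."""
    out = dict(expr)
    out[gene] += delta
    return out


# Two hypothetical patients (made-up expression values).
a = {"TMEM66": 9.0, "CSRP1": 8.5, "BECN1": 9.2, "FOSL2": 7.8,
     "ERO1L": 8.1, "SRP54": 8.0, "PAWR": 7.5}
b = {"TMEM66": 8.7, "CSRP1": 9.1, "BECN1": 8.8, "FOSL2": 8.4,
     "ERO1L": 8.6, "SRP54": 10.0, "PAWR": 7.2}

for gene in ("FOSL2", "PAWR"):
    change_a = risk_score(shifted(a, gene, 1.0)) - risk_score(a)
    change_b = risk_score(shifted(b, gene, 1.0)) - risk_score(b)
    print(gene, round(change_a, 4), round(change_b, 4))
```

A shift in a gene that enters only linearly, such as FOSL2, moves every score by the same constant and leaves the ordering of patients unchanged. A shift in PAWR changes each score by an amount that depends on that patient's SRP54 value, since $(x + a)(y + b) = xy + bx + ay + ab$, so the ordering of patients and any cutoff-based classification can change.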

Table 7.2: Validation results in CAN/DF and MSK data preprocessed by dChip and RMA

| Data set | Classifier | Hazard ratio | 95% C.I. | p-value | CPE |
|---|---|---|---|---|---|
| CAN/DF, all stage | Risk score - dChip | 2.76 | (1.78, 4.28) | 0.000 | 0.710 |
| CAN/DF, all stage | Risk score - RMA | 2.04 | (1.47, 2.82) | 0.000 | 0.659 |
| CAN/DF, all stage | Categorical - dChip | 4.35 | (1.84, 10.28) | 0.000 | 0.659 |
| CAN/DF, all stage | Categorical - RMA | 3.44 | (1.51, 7.84) | 0.002 | 0.639 |
| CAN/DF, stage I | Risk score - dChip | 2.12 | (1.13, 3.97) | 0.024 | 0.653 |
| CAN/DF, stage I | Risk score - RMA | 1.70 | (1.06, 2.73) | 0.049 | 0.617 |
| CAN/DF, stage I | Categorical - dChip | 2.11 | (0.69, 6.50) | 0.182 | 0.591 |
| CAN/DF, stage I | Categorical - RMA | 1.48 | (0.49, 4.42) | 0.484 | 0.549 |
| MSK, all stage | Risk score - dChip | 1.79 | (1.20, 2.67) | 0.004 | 0.632 |
| MSK, all stage | Risk score - RMA | 1.64 | (1.09, 2.48) | 0.019 | 0.607 |
| MSK, all stage | Categorical - dChip | 2.88 | (1.38, 6.03) | 0.003 | 0.622 |
| MSK, all stage | Categorical - RMA | 3.04 | (1.45, 6.35) | 0.002 | 0.627 |
| MSK, stage I | Risk score - dChip | 3.08 | (1.41, 6.71) | 0.003 | 0.715 |
| MSK, stage I | Risk score - RMA | 2.46 | (1.18, 5.15) | 0.017 | 0.672 |
| MSK, stage I | Categorical - dChip | 9.97 | (1.28, 77.5) | 0.003 | 0.708 |
| MSK, stage I | Categorical - RMA | 9.97 | (1.28, 77.5) | 0.003 | 0.708 |

Figure 7.1: Kaplan-Meier curves for all stage and stage I samples separated by gene signature preprocessed by dChip in CAN/DF and MSK data

[Figure not recoverable from the extracted text. Axes: Time (months) versus Proportion alive. Surviving panel titles: CAN/DF - All stage (low score n=41, high score n=41) and CAN/DF - Stage I (low score n=28, high score n=28). Two further panels survive without titles: one with low score n=52 and high score n=52 (Cat. p = 0.00325, CPE = 0.62; Score p = 0.00428, CPE = 0.63), and one with low score n=31 and high score n=32 (Cat. p = 0.00662, CPE = 0.71; Score p = 0.00311, CPE = 0.71).]

Figure 7.2: Kaplan-Meier curves for all stage and stage I samples separated by gene signature preprocessed by RMA in CAN/DF and MSK data

[Figure not recoverable from the extracted text. Axes: Time (months) versus Proportion alive. Panels: CAN/DF - All stage (low score n=41, high score n=41; Cat. p = 0.00173, CPE = 0.64; Score p = 0.00011, CPE = 0.66); MSK - All stage (low score n=52, high score n=52; Cat. p = 0.00196, CPE = 0.63; Score p = 0.01947, CPE = 0.61); CAN/DF - Stage I (low score n=28, high score n=28; Cat. p = 0.48239, CPE = 0.55; Score p = 0.04892, CPE = 0.62); MSK - Stage I (low score n=31, high score n=32; Cat. p = 0.00662, CPE = 0.71; Score p = 0.01704, CPE = 0.67).]

Bibliography

[1] A. Jemal, R. Siegel, E. Ward, Y. Hao, J. Xu and M. J. Thun, Cancer Statistics, 2009, CA Cancer J. Clin., 59 (2009), 225-249.

[2] D. M. Parkin, F. Bray, J. Ferlay and P. Pisani, Global Cancer Statistics, 2002, CA Cancer J. Clin., 55 (2005), 74-108.

[3] C. M. Booth and F. A. Shepherd, Adjuvant chemotherapy for resected non-small cell lung cancer, J. Thorac. Oncol., 1 (2006), 180-187.

[4] A. Bhattacharjee, W. G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E. J. Mark, E. S. Lander, W. Wong, B. E. Johnson, T. R. Golub, D. J. Sugarbaker and M. Meyerson, Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses, Proc. Natl. Acad. Sci., 98 (2001), 13790-13795.

[5] M. E. Garber, O. G. Troyanskaya, K. Schluens, S. Petersen, Z. Thaesler, M. Pacyna-Gengelbach, M. van de Rijn, G. D. Rosen, C. M. Perou, R. I. Whyte, R. B. Altman, P. O. Brown, D. Botstein and I. Petersen, Diversity of gene expression in adenocarcinoma of the lung, Proc. Natl. Acad. Sci., 98 (2001), 13784-13789.

[6] D. A. Wigle, I. Jurisica, N. Radulovich, M. Pintilie, J. Rossant, N. Liu, C. Lu, J. Woodgett, I. Seiden, M. Johnston, S. Keshavjee, G. Darling, T. Winton, B. J. Breitkreutz, P. Jorgenson, M. Tyers, F. A. Shepherd and M. S. Tsao, Molecular profiling of non-small cell lung cancer and correlation with disease-free survival, Cancer Res., 62 (2002), 3005-3008.

[7] A. Potti, S. Mukherjee, R. Petersen, H. K. Dressman, A. Bild, J. Koontz, R. Kratzke, M. A. Watson, M. Kelley, G. S. Ginsburg, M. West, D. H. Harpole Jr. and J. R. Nevins, A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer, N. Engl. J. Med., 355 (2006), 570-580.

[8] H. Y. Chen, S. L. Yu, C. H. Chen, G. C. Chang, C. Y. Chen, A. Yuan, C. L. Cheng, C. H. Wang, H. J. Terng, S. F. Kao, W. K. Chan, H. N. Li, C. C. Liu, S. Singh, W. J. Chen, J. J. W. Chen and P. C. Yang, A five-gene signature and clinical outcome in non-small-cell lung cancer, N. Engl. J. Med., 356 (2007), 11-20.

[9] Y. Lu, Y. Lemon, P. Y. Liu, Y. Yi, C. Morrison, P. Yang, Z. Sun, J. Szoke, W. L. Gerald, M. Watson, R. Govindan and M. You, A gene expression signature predicts survival of subjects with stage I non-small cell lung cancer, PLoS Med., 12 (2006), e467.

[10] K. Shedden, J. M. G. Taylor, S. A. Enkemann, M. S. Tsao, T. J. Yeatman, W. L. Gerald, S. Eschrich, I. Jurisica, T. J. Giordano, D. E. Misek, A. C. Chang, C. Q. Zhu, D. Strumpf, S. Hanash, F. A. Shepherd, K. Ding, L. Seymour, K. Naoki, N. Pennell, B. Weir, R. Verhaak, C. Ladd-Acosta, T. Golub, M. Gruidl, A. Sharma, J. Szoke, M. Zakowski, V. Rusch, M. Kris, A. Viale, N. Motoi, W. Travis, B. Conley, V. E. Seshan, M. Meyerson, R. Kuick, K. K. Dobbin, T. Lively, J. W. Jacobson and D. G. Beer, Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study, Nat. Med., 14 (2008), 822-827.

[11] T. Wu, W. Sun, S. Yuan, C. H. Chen and K. C. Li, A method for analyzing censored survival phenotype with gene expression data, BMC Bioinformatics, 9 (2008), 417.

[12] A. H. Bild, G. Yao, J. T. Chang, Q. Wang, A. Potti, D. Chasse, M. B. Joshi, D. Harpole, J. M. Lancaster, A. Berchuck, J. A. Olson Jr., J. R. Marks, H. K. Dressman, M. West and J. R. Nevins, Oncogenic pathway signatures in human cancers as a guide to targeted therapies, Nature, 439 (2006), 353-357.

[13] K. C. Li, Genome-wide co-expression dynamics: theory and application, Proc. Natl. Acad. Sci., 99 (2002), 16875-16880.

[14] K. C. Li, A. Palotie, S. Yuan, D. Bronnikov, D. Chen, X. Wei, O. W. Choi, J. Saarela and L. Peltonen, Finding disease candidate genes by liquid association, Genome Biol., 8 (2007), R205.

[15] K. C. Li, Sliced inverse regression for dimension reduction (with discussion), J. Amer. Statist. Assoc., 86 (1991), 316-327.

[16] K. C. Li, J. L. Wang and C. H. Chen, Dimension reduction for censored regression data, The Annals of Statistics, 27 (1999), 1-23.

[17] Y. Benjamini and D. Yekutieli, The control of the false discovery rate in multiple testing under dependency, The Annals of Statistics, 29 (2001), 1165-1188.

[18] Affymetrix, Statistical algorithms reference guide, Technical report, Affymetrix Inc.; (2001).

[19] C. Li and W. H. Wong, Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error applications, Genome Biol., 2 (2001), 1-11.

[20] R. A. Irizarry, B. Hobbs, F. Collin, Y. D. Beazer-Barclay, K. J. Antonellis, U. Scherf and T. P. Speed, Exploration, normalization and summaries of high density oligonucleotide array probe level data, Biostatistics, 4 (2003), 249-264.

[21] D. R. Cox and D. Oakes, Analysis of survival data, London; New York: Chapman and Hall Ltd.; (1984).

[22] J. P. Klein and M. L. Moeschberger, Survival Analysis - Techniques for Censored and Truncated Data, second edition, New York: Springer; (2003).

[23] M. Gönen and G. Heller, Concordance probability and discriminatory power in proportional hazards regression, Biometrika, 92 (2005), 965-970.

[24] L. Pirtoli, G. Cevenini, P. Tini, M. Vannini, G. Oliveri, S. Marsili, V. Mourmouras, G. Rubino and C. Miracco, The prognostic role of Beclin 1 protein expression in high-grade gliomas, Autophagy, 5 (2009), 930-936.

[25] K. Koneri, T. Goi, Y. Hirono, K. Katayama and A. Yamaguchi, Beclin 1 gene inhibits tumor growth in colon cancer cell lines, Anticancer Res., 27 (2007), 1453-1458.

[26] B. X. Li, C. Y. Li, R. Q. Peng, X. J. Wu, H. Y. Wang, D. S. Wan, X. F. Zhu and X. S. Zhang, The expression of beclin 1 is associated with favorable prognosis in stage IIIB colon cancers, Autophagy, 5 (2009), 303-306.

[27] C. Z. Zhou, G. Q. Qiu, X. L. Wang, J. W. Fan, H. M. Tang, Y. H. Sun, Q. Wang, F. Huang, D. W. Yan, D. W. Li and Z. H. Peng, Screening of tumor suppressor genes on 1q31.1-32.1 in Chinese patients with sporadic colorectal cancer, Chin. Med. J. (Engl.), 121 (2008), 2479-2486.

[28] T. Nakayama, K. Hieshima, T. Arao, Z. Jin, D. Nagakubo, A. K. Shirakawa, Y. Yamada, M. Fujii, N. Oiso, A. Kawada, K. Nishio and O. Yoshie, Aberrant expression of Fra-2 promotes CCR4 expression and cell proliferation in adult T-cell leukemia, Oncogene, 27 (2008), 3221-3232.

[29] K. Milde-Langosch, S. Janke, I. Wagner, C. Schröder, T. Streichert, A. M. Bamberger, F. Jänicke and T. Löning, Role of Fra-2 in breast cancer: influence on tumor cell invasion and motility, Breast Cancer Res. Treat., 107 (2008), 337-347.

[30] H. Endoh, S. Tomida, Y. Yatabe, H. Konishi, H. Osada, K. Tajima, H. Kuwano, T. Takahashi and T. Mitsudomi, Prognostic model of pulmonary adenocarcinoma by expression profiling of eight genes as determined by quantitative real-time reverse transcriptase polymerase chain reaction, J. Clin. Oncol., 22 (2004), 881-889.

[31] D. May, A. Itin, O. Gal, H. Kalinski, E. Feinstein and E. Keshet, Ero1-L alpha plays a key role in a HIF-1-mediated pathway to improve disulfide bond formation and VEGF secretion under hypoxia: implication for cancer, Oncogene, 24 (2005), 1011-1020.

[32] K. Yasui, I. Imoto, Y. Fukuda, A. Pimkhaokham, Z. Q. Yang, T. Naruto, Y. Shimada, Y. Nakamura and J. Inazawa, Identification of target genes within an amplicon at 14q12-q13 in esophageal squamous cell carcinoma, Genes, Chromosomes and Cancer, 32 (2001), 112-118.

[33] "Entrez Gene", http://www.ncbi.nlm.nih.gov/gene/.
