The Study on the Optimal Linear Combination of Markers Based on the Partial Area under the ROC Curve - 政大學術集成 (NCCU Academic Hub)


Doctoral dissertation, Graduate Institute of the Department of Statistics, National Chengchi University (國立政治大學).

Advisors: Dr. 薛慧敏 and Dr. 張源俊

多標記接受者操作特徵曲線下部分面積最佳線性組合之研究
The study on the optimal linear combination of markers based on the partial area under the ROC curve

Graduate student: 許嫚荏

April 2013 (Republic of China year 102)

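Acknowledgements

That this dissertation could be completed is due, first of all, to my two advisors, Dr. 薛慧敏 and Dr. 張源俊. I am very grateful to them for the time and effort they spent teaching me how to do research rigorously and how to turn research results into published articles, and for the knowledge in many other areas that I learned from them. For the revision of the dissertation, I especially thank Dr. 劉仁沛, Dr. 黃怡婷, and Dr. 蔡政安. Their valuable comments made this dissertation more complete and gave me further inspiration and insight, for which I am sincerely grateful. I also thank my family and friends for their care and support. Over this long doctoral journey I learned how to face difficulties, and I came to appreciate deeply that human potential is unlimited: once one is determined, the outcome may not be perfect in the end, but at least it leaves no regrets. Once again I thank everyone who has helped me, and I share this joy with you.

嫚荏, April 2013 (Republic of China year 102)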

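Abstract (translated from the Chinese)

The goal of this dissertation is to construct an optimal disease diagnostic tool that combines multiple markers. The evaluation criterion is the partial area under the receiver operating characteristic curve over a specified specificity range (pAUC). Under the normality assumption, we derive the pAUC of a linear combination of multiple markers and a necessary condition for the optimal linear combination. The function itself is so complicated that computation is difficult. In addition, we find that the optimal solution may not be unique and that local extrema may exist. These situations limit the use of existing algorithms, so we propose a multiple-initial-value algorithm. When the population parameters are unknown, we use the maximum likelihood estimators to obtain the sample pAUC and the linear combination that maximizes it, and we prove that the sample optimal linear combination converges consistently to the population optimal linear combination. In further work, we propose statistical tests for the marginal discriminatory power of a single marker, the composite discriminatory power of multiple markers, and the conditional discriminatory power of an individual marker. These tests are applied in two marker selection methods, forward selection and backward elimination, which we use to select the markers that are significantly related to disease detection. Simulation studies are used to validate the proposed algorithm, the statistical tests, and the marker selection methods. The methods are also applied to several real data sets.

Keywords: discriminatory power; disease detection; partial area under ROC curve; biomarker selection; optimal linear combination; receiver operating curve; specificity; sensitivity.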

Abstract

The aim of this work is to construct a composite diagnostic tool based on multiple biomarkers under the criterion of the partial area under a ROC curve (pAUC) for a pre-determined specificity range. Several recent studies have focused on the optimal linear combination that maximizes the whole area under a ROC curve (AUC). In this study, we focus on finding the optimal linear combination by a direct maximization of the pAUC under the normality assumption. In order to find an analytic solution, the first derivative of the pAUC is derived. Its form is so complicated that a further validation on the Hessian matrix is difficult. In addition, we find that the pAUC maximizer may not be unique and that local maximizers sometimes exist. As a result, the existing algorithms, which depend on the initial point, are inadequate for our needs. We propose a new algorithm that adopts several initial points at one time. In addition, when the population parameters are unknown and only a random sample data set is available, the maximizer of the sample version of the pAUC is shown to be a strongly consistent estimator of its theoretical counterpart. We further focus on determining whether a biomarker set, or one specific biomarker, has a significant contribution to the disease diagnosis. We propose three statistical tests for the identification of the discriminatory power. The proposed tests are applied to biomarker selection for reducing the number of variables in advanced analysis. Numerical studies are performed to validate the proposed algorithm and the proposed statistical procedures.

Keywords: Discriminatory power; Hypothesis testing; Optimal linear combination; Partial area under ROC curve; Stepwise biomarker selection; Receiver operating curve; Specificity; Sensitivity.

Contents

1 Introduction
  1.1 Motivation
  1.2 Outline
2 The Linear Combination Achieving the Optimal Partial Area under the ROC Curve
  2.1 Partial Area under the ROC curve (pAUC)
  2.2 Computational Issues
  2.3 Multiple-Initial Algorithm
3 Statistical Inference Related with the pAUC Maximizer
  3.1 Estimating the Linear Combination Maximizing the pAUC
  3.2 Testing the Discriminatory Power
  3.3 Biomarker Selection
4 Simulation Study
  4.1 Multiple-Initial Algorithm
  4.2 Statistical Inference
  4.3 Two-Biomarker Study
5 Real Examples
  5.1 Atherosclerotic Coronary Heart Disease Data
  5.2 Duchenne Muscular Dystrophy (DMD) Data
  5.3 Breast Tissue Data
  5.4 Magic Gamma Telescope Data
6 Conclusions and Future Works
  6.1 Conclusions
  6.2 Future Works
A Proofs
  A.1 Proof of Theorem 1
  A.2 Proof of Corollary 1
  A.3 Lemma 1
  A.4 Lemma 2
  A.5 Proof of Theorem 2
  A.6 Proof of Lemma 1
  A.7 Proof of Lemma 2

List of Tables

4.1 Covariance set up in the simulation study.
4.2 The maximizer(s) and the corresponding maximal pAUC value found by using the grid search and the proposed multiple-initial method.
4.3 The Setting of Populations.
4.4 The true value, empirical mean and standard error of the estimated $\hat{a}_n$ and pAUC($\hat{a}_n$), and the power of the global test over 1000 replications.
4.5 The pAUC and pAUC estimate after the biomarker selection.
4.6 Biomarker Selection: Forward Method vs. Backward Method over 1000 replications.
4.7 The pAUC and pAUC estimate after the biomarker selection for the specificity range (0.9, 1) based on multivariate t distribution with degree freedom 3.
4.8 The true pAUC, the estimated pAUC, and the global test using the full biomarker set; the estimated pAUC by using the reduced biomarker set after biomarker selection based on 1000 replications for three and four dimensions.
4.9 The true value, empirical mean, and standard error of the estimated $\hat{a}_n$ based on 1000 replications.
4.10 The proportion of outcomes from the two biomarker selection methods among 1000 replications for three dimension.
4.11 The proportion of outcomes from the Forward selection method among 1000 replications for four dimension.
4.12 The proportion of outcomes from the Backward selection method among 1000 replications for four dimension.
4.13 The sample size effect of two groups in the power of the global test for unbalanced data over 1000 replications.
4.14 The sample size effect of two groups in pAUC for unbalanced data over 1000 replications.
4.15 The sample size effect of two groups in the power of the global test for balanced data over 1000 replications.
4.16 The critical value of optimal pAUC is the $100 \times (1 - \alpha)$th percentile among $10^5$ replicates.
4.17 The critical values, $d_1$, $d_2$, of the coefficient in the optimal linear combination are the $100 \times (1 - \alpha/2)$th and $100 \times (\alpha/2)$th percentiles among $10^5$ replicates.
5.1 The coefficients of the optimal linear combination and the corresponding pAUC value for the specificity range $(1 - t, 1)$ in the atherosclerotic coronary heart disease example.
5.2 The Forward and Backward selections for the specificity range (0.9, 1) in the atherosclerotic coronary heart disease example with/without standardization.
5.3 The coefficients of the optimal linear combination and the corresponding pAUC value for the specificity range $(1 - t, 1)$ in the DMD example.
5.4 The Forward and Backward selections for the specificity range (0.9, 1) in DMD example with/without standardization.
5.5 The coefficients of the optimal linear combination and the corresponding pAUC value for the specificity range $(1 - t, 1)$ in the breast tissue example.
5.6 The Forward and the Backward selections for the specificity range (0.9, 1) in the breast tissue example.
5.7 The distributions of two populations, their corresponding pAUC for the specificity range (0.9, 1), and two measure indicators in the breast tissue example with standardization.
5.8 The coefficients of the optimal linear combination and the corresponding pAUC value for the specificity range $(1 - t, 1)$ in the telescope example.
5.9 The Forward and the Backward selections for the specificity range (0.9, 1) in the telescope example.

List of Figures

2.1 $\theta$ versus pAUC at $t = 0.1$, $\rho_0 = 0.9$, $\rho_1 = -0.9$ for example 1 (Left), example 2 (Right).
4.1 The distributions of best linear combination, $a^{*T}X$, for Case 4 (Left top), Case 7 (Right top), Case 10 (Left bottom), Case 13 (Right bottom).
4.2 The distributions of best linear combination, $a^{*T}X$, for Case 4 (Left top), Case 16 (Right top), Case 19 (Left bottom), Case 22 (Right bottom).
4.3 The distribution of estimated linear combination for Case 1 (Left: $\hat{a}_1$, Right: $\hat{a}_2$).
4.4 $\theta$ versus pAUC at $t = 0.1$ for Class I.
4.5 $\theta$ versus pAUC at $t = 0.1$ for Class II.
4.6 $\theta$ versus pAUC at $t = 0.1$ for Class III.
4.7 Case versus MSE ($10^{-4}$).
5.1 The distributions of I0, P, DR, DA, PA500, AREA, A/DA, MAX IP, and HFS for two groups in the breast tissue example.
5.2 The distributions of the best linear combination obtained via the Forward method and the Backward method for two groups in the breast tissue example.

Chapter 1 Introduction

1.1 Motivation

In medical practice, investigators mainly use clinical and laboratory data to detect and predict the occurrence of disease. The tissues or serum specimens of subjects from cancer cases and from normal controls are collected to find potential biomarkers of the disease. For example, CA125 and CA19-9 have been identified as biomarkers for pancreatic cancer. In the paper of Liu et al. (2005), lutein, TBARS, HDL cholesterol, and uric acid are believed to be risk factors of atherosclerotic coronary heart disease. However, in an exploratory study, there is usually no prior information about which biomarker is the best among all valid ones for diagnostic purposes. Usually, no single biomarker provides a sufficiently sensitive and specific diagnosis. The use of multiple biomarkers at a time is expected to provide a better diagnosis than the use of individual ones (Bast Jr (1993); Woolas et al. (1995)). The goal of this study is to find a good composite classifier, which combines multiple biomarkers, in terms of having a high accuracy to classify the non-diseased and diseased subjects.

Many researchers have developed various summarizing methods in the literature to combine multiple markers into a single indicator. For example, Marshall (1989) proposed using a Boolean operator to combine the results of two diagnostic tests, which performs better than either single test in accuracy. However, this combination method becomes complicated and inefficient when there are more than two biomarkers. When the biomarkers are

continuous, Pepe and Thompson (2000) considered using a linear combination of markers for disease detection. Furthermore, they showed that their proposed linear combination has a satisfactory accuracy. Due to the advantages of easy interpretation and fast computation, we consider the class of linear combinations to summarize multiple biomarkers in this study.

The statistical measures to assess the accuracy of classifiers have been developed for many years (Pepe (2004)). Sensitivity and specificity are the two most familiar measures to assess a diagnostic tool. The sensitivity is defined as the probability that a diseased subject is correctly diagnosed. Similarly, the specificity is defined as the probability that a non-diseased subject is correctly diagnosed. Generally, it is impossible to find a classification rule which has a high sensitivity and a high specificity at the same time. For example, consider a class of noninformative tests with a constant probability p of a positive finding. Then every test has the sensitivity, p, and the specificity, 1 − p. The sensitivity decreases as the specificity increases. Hence, it is difficult to compare the diagnostic tests in this class. Consequently, the misclassification probability, which summarizes the sensitivity and specificity, is proposed (Pepe (2004)). Given the population prevalence of disease, ρ, the misclassification probability is the sum of two products: ρ·(1 − sensitivity) and (1 − ρ)·(1 − specificity). This measure is commonly used in engineering and computer science applications.

Consider a binary test based on the biomarkers for disease diagnosis. Suppose Y is a continuous-scaled biomarker, and the classification rule is that if Y is larger than some cutting point, c, the subject is diagnosed as diseased. It is seen that the use of different cutting points brings about different testing results and different accuracy.
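The effect of the cutting point on the accuracy measures defined above can be illustrated with a minimal sketch. The function name `classification_summary`, the data values, and the prevalence of 0.2 are hypothetical and chosen only for illustration; the accuracy measures are estimated by simple sample proportions.

```python
def classification_summary(diseased, controls, cutoff, prevalence):
    """Empirical accuracy of the rule 'diagnose as diseased if Y > cutoff'."""
    # Proportion of diseased subjects that the rule flags as diseased.
    sensitivity = sum(y > cutoff for y in diseased) / len(diseased)
    # Proportion of non-diseased subjects that the rule leaves unflagged.
    specificity = sum(y <= cutoff for y in controls) / len(controls)
    # Misclassification probability: rho*(1 - sens) + (1 - rho)*(1 - spec).
    misclassification = (prevalence * (1 - sensitivity)
                         + (1 - prevalence) * (1 - specificity))
    return sensitivity, specificity, misclassification


# Hypothetical biomarker values, invented purely for illustration.
diseased = [2.1, 3.4, 1.8, 2.9, 3.8]
controls = [0.9, 1.5, 2.0, 1.1, 2.5]

for cutoff in (1.0, 2.0, 3.0):
    sens, spec, err = classification_summary(diseased, controls, cutoff, prevalence=0.2)
    print(f"c={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}, error={err:.2f}")
```

Raising the cutoff in this toy data lowers the sensitivity and raises the specificity, which is the trade-off that motivates looking at all cutting points at once.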
In practice, a useful cutting point may be found by the experts of the subject domain. For example, an ankle-brachial index (ABI) < 0.9 has been used in clinical practice and epidemiologic studies as an indicator of peripheral arterial occlusive disease (PAOD) according to previous studies (Su et al. (2004)). Sometimes choosing an appropriate cutoff is not a trivial task. For an exhaustive look at the relation between sensitivity and specificity over all possible cutting points, the ROC curve, defined as a plot of sensitivity versus (1 − specificity), is proposed. The ROC curve was used in signal detection theory during the 1950s,

and was applied to radiology in 1982. Many mathematical properties of the ROC curve are summarized in the book of Pepe (2004). However, although the ROC curve can provide a graphical representation of classification performance, a useful summarization of the plot is always a welcome addition. Especially when comparing several diagnostic tools at a time, a good summarization can negate the need to look into the details of individual curves.

The area under the ROC curve (AUC), proposed by Bamber (1975), integrates the sensitivity values over all cutoffs and thus provides a cutoff-independent assessment of a diagnostic method. Consequently, AUC can be interpreted as an average sensitivity across the whole (1 − specificity) range (Pepe (2004)). According to the definition, the AUC of an uninformative classifier is 0.5 and the AUC of the perfect classifier is 1. In addition, AUC can be shown to equal the probability that the biomarker value of a randomly drawn diseased subject exceeds that of a randomly drawn non-diseased subject. Consequently, AUC can be easily estimated by the Wilcoxon-Mann-Whitney statistic. DeLong et al. (1988) proposed a nonparametric approach that uses the theory of generalized U-statistics to estimate the area under the ROC curve in multivariate tests.

Sometimes, instead of the whole plot, the researchers are interested in some region of the ROC curve. For example, for a diagnosis of a fatal disease, a liberal cutoff may be used under the requirement of a high sensitivity. On the other hand, in a population screening test, high false positive findings result in a dramatic cost increase. The investigators may consider a stringent cutoff for a high specificity and place their emphasis only on the high specificity region of a ROC curve.
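The probabilistic interpretation of the AUC mentioned above can be sketched as follows. This is only an illustration of the Wilcoxon-Mann-Whitney estimate; the function name `wmw_auc` and the data are hypothetical, and counting ties as one half is an assumption that follows the usual convention rather than anything stated in the text.

```python
def wmw_auc(diseased, controls):
    """Wilcoxon-Mann-Whitney estimate of P(Y_diseased > Y_non-diseased).

    Each (diseased, non-diseased) pair contributes 1 if the diseased value is
    larger, 0.5 if the two values are tied, and 0 otherwise.
    """
    wins = 0.0
    for y1 in diseased:
        for y0 in controls:
            if y1 > y0:
                wins += 1.0
            elif y1 == y0:
                wins += 0.5
    return wins / (len(diseased) * len(controls))


# Hypothetical biomarker values, invented purely for illustration.
diseased = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.5]
print(wmw_auc(diseased, controls))  # 8 of the 9 pairs are correctly ordered
```

A perfectly separating marker gives the value 1, and two identical samples give 0.5, matching the two reference values quoted for the AUC.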
Because only a restricted region of cutoffs is clinically relevant and of research interest, the partial area under the ROC curve (pAUC) has been suggested for the evaluation of a diagnostic tool (Thompson and Zucchini (1989); McClish (1989); Li et al. (2008)). For example, Baker and Pinsky (2001) assessed digital and analog mammography for breast cancer screening by using the pAUC. Beam et al. (2003) evaluated the potential effect of proscriptive health care policies directed toward improving screening mammogram interpretation in the United States under the criterion of the pAUC. Hence, our target criterion is the pAUC.

In this study, we focus on finding the linear combination of biomarkers that maximizes

the pAUC over the region with sufficiently high specificity. When the diseased and non-diseased groups both follow normal distributions, Su and Liu (1993) derived the explicit form of the unique best solution that possesses the maximal AUC among all possible linear combinations. Liu et al. (2005) pointed out that the optimal linear combination of Su and Liu (1993) may give an unsatisfactorily low sensitivity on either high or low specificity areas, which is undesirable in practice. Hence, Liu et al. (2005) started by deriving a sufficient condition for one linear combination to dominate another in terms of sensitivity, and proposed a linear combination that has higher sensitivity than the other linear combination in some specificity region. However, the superiority of the developed linear combination is not uniform with respect to the specificity region. That is, the dominance region depends on the linear combination being compared. Hence, it does not guarantee the attainment of the optimal pAUC for the particular clinically relevant specificity range.

In this study, the optimal linear combination is developed based on a direct maximization of the pAUC corresponding to a given specificity region. We propose an algorithm to find the optimal linear combination. In application, the summary score calculated by the optimal linear combination can be used for disease diagnosis. In addition, when the population parameters are unknown and only a random sample data set is available, the maximizer of the sample version of the pAUC is shown to be a strongly consistent estimator of its theoretical counterpart. Furthermore, we focus on determining whether the contribution of a biomarker set, or the contribution of one specific biomarker, is significant to the disease diagnosis. As a result, we propose three statistical tests to determine whether a set of biomarkers or a specified biomarker has significant power to detect the disease. Based on these three tests, we develop two biomarker selection methods to reduce the number of biomarkers in advanced analysis. We can then use fewer biomarkers, with limited power loss, to detect the disease. The reduced biomarker set selected by our selection procedure is said to be statistically significant for disease detection.

1.2 Outline

In Section 2.1, we will present a necessary condition of the optimal solution. The first derivative of the pAUC for a linear combination of multiple biomarkers under the normality assumption is derived. Two examples will be given in Section 2.2 to illustrate the computational issues in finding the pAUC maximizer(s). One example shows the non-uniqueness of the maximizer, and the other shows the existence of local maximizers. Due to these problems, the computation becomes a non-trivial task and the solution produced by a commonly used FORTRAN subroutine may depend on the initial point supplied. An algorithm to obtain the optimal linear combination that maximizes the pAUC will be proposed in Section 2.3.

When population parameters are unknown and a random sample data set is available, the statistical inference related with the pAUC is introduced in Chapter 3. It is known that many methods can be used to estimate the parameters of a normal distribution. Because of the properties of the maximum likelihood estimators (MLEs), we use them to estimate the parameters and the ROC function. Subsequently, the estimated linear combination maximizing the pAUC can be obtained. The strong consistency of the estimated pAUC maximizer is derived in Section 3.1.

In Section 3.2 we propose three related tests on the discriminatory power of a biomarker set or of a single biomarker. The first is to test the global discriminatory power of a biomarker set. The second is to test the marginal discriminatory power of one biomarker. The third is to test the conditional discriminatory power of one specific biomarker given the existence of other biomarkers. Because there is great difficulty in finding the exact null distribution of the test statistic, we propose a parametric bootstrapping method to generate the empirical null distribution for critical values.
Furthermore, we apply the developed statistical tests to biomarker selection in Section 3.3. Simulation results are presented in Chapter 4. In Chapter 5, we apply our methods to four real examples: atherosclerotic coronary heart disease, Duchenne Muscular Dystrophy (DMD), electrical impedance spectroscopy for breast tissue, and magic gamma telescope data. In addition to a conclusion, Chapter 6 gives a description of the future work. Technical details are given in the Appendix.

Chapter 2 The Linear Combination Achieving the Optimal Partial Area under the ROC Curve

2.1 Partial Area under the ROC curve (pAUC)

For a subject, let $X$ be a $p \times 1$ vector of biomarkers related to the disease, and $D$ be the binary disease status. Suppose $D = 0$ if the subject is from the non-diseased group, and $D = 1$ if he/she is from the diseased group. Assume the biomarkers in the non-diseased and the diseased groups come from multivariate normal distributions with mean vectors $\mu_d$ and covariance matrices $\Sigma_d$, for $d = 0, 1$. That is,

$$X \mid D = d \sim N(\mu_d, \Sigma_d), \quad d = 0, 1.$$

Then, given a $p \times 1$ real vector $a$, the linear combinations of the $p$ markers of the two populations, $V_{\bar{D}} = a^T X \mid D = 0$ and $V_D = a^T X \mid D = 1$, have the following distributions:

$$V_{\bar{D}} \sim N(a^T \mu_0, Q_0), \quad V_D \sim N(a^T \mu_1, Q_1),$$

where $Q_d = a^T \Sigma_d a$, for $d = 0, 1$. For the (1 − specificity) at $u \in [0, 1]$, define the value on the ROC curve of the linear combination as

$$\mathrm{ROC}(u) = 1 - F_D\left(F_{\bar{D}}^{-1}(1 - u)\right),$$

where $F_{\bar{D}}$, $F_D$ are the cumulative distribution functions of $V_{\bar{D}}$, $V_D$, respectively. When $F_{\bar{D}}$, $F_D$ are both normal distributions,

$$\mathrm{ROC}(u) = 1 - \Phi\left(\frac{c(u)\sqrt{Q_0} - a^T \Delta_\mu}{\sqrt{Q_1}}\right),$$

where $c(u) = \Phi^{-1}(1 - u)$, $\Delta_\mu = \mu_1 - \mu_0$, and $\Phi(\cdot)$ is the cumulative distribution function of $N(0, 1)$. Then, for a given (1 − specificity) region $(0, t)$, $t \in (0, 1)$, the partial area under the ROC curve (pAUC) of the linear combination is defined as

$$\mathrm{pAUC}(a) = \int_0^t \mathrm{ROC}(u)\,du. \tag{2.1.1}$$

The aim of this study is to find the optimal linear combination that maximizes the pAUC in Equation (2.1.1). In the following, we denote the pAUC maximizer as $a^*$, which satisfies

$$a^* = \arg\max_{a \in \Re^p} \mathrm{pAUC}(a).$$

When the (1 − specificity) range is zero to one, the integration of the ROC function is the area under the ROC curve (AUC). It is a special case of the pAUC. Su and Liu (1993) showed that the best linear combination that maximizes the AUC under normality is proportional to

$$a_0 = (\Sigma_0 + \Sigma_1)^{-1} \Delta_\mu. \tag{2.1.2}$$

This linear combination is the unique maximizer among all possible linear combinations. Regarding the optimization of the pAUC, it is known that the optimal linear combination is a zero root of the first derivative of the pAUC. That is, it is necessary for the solution to satisfy the following equation of the first derivative,

$$\frac{\partial\,\mathrm{pAUC}(a)}{\partial a} = 0. \tag{2.1.3}$$

With intensive derivations, we have the following theorem.

Theorem 1. The coefficient vector of the linear combination that satisfies Equation (2.1.3) is proportional to

$$(w_1 \Sigma_0 + w_2 \Sigma_1)^{-1} \Delta_\mu, \tag{2.1.4}$$

where

$$w_1 = c_1 \frac{a^T \Delta_\mu}{Q_0 + Q_1} + c_2\,(Q_1), \qquad w_2 = c_1 \frac{a^T \Delta_\mu}{Q_0 + Q_1} - c_2\,(Q_0),$$

with

$$c_1 = \sqrt{2\pi}\,\sigma\,\Phi\left(\frac{\nu - c(t)}{\sigma}\right), \qquad c_2 = \frac{\sigma^2}{\sqrt{Q_0}\,(Q_1)} \exp\left(-\frac{(c(t) - \nu)^2}{2\sigma^2}\right),$$

and $\nu = a^T \Delta_\mu \sqrt{Q_0}/(Q_0 + Q_1)$, $\sigma^2 = Q_1/(Q_0 + Q_1)$.

When we consider the whole area under the ROC curve, that is, $t = 1$, Equation (2.1.4) is reduced to Equation (2.1.2) as expected. On the other hand, when $\Sigma_0$ is proportional to $\Sigma_1$, Equation (2.1.4) can be further simplified, as seen in the corollary below. Under this situation, the solution has the same form as the linear combination of Equation (2.1.2) by Su and Liu (1993). In fact, in such cases, Su and Liu (1993) indicated that their solution coincides with Fisher's best linear discriminant.

Corollary 1. When $\Sigma_0$ is proportional to $\Sigma_1$, the coefficient vector of the linear combination that satisfies Equation (2.1.3) is proportional to $\Sigma_0^{-1}(\mu_1 - \mu_0)$.

We find that the pAUC is scale invariant, so the best linear combination maximizing the pAUC is not unique. For simplicity, we add one restriction in the calculation, namely that the solution has unit norm. That is,

$$a^* = \arg\max_{a \in E_p} \mathrm{pAUC}(a),$$

where $E_p = \{a \mid \|a\| = 1,\ a \in \Re^p\}$.

Equation (2.1.4) provides a necessary condition for the linear combination that maximizes the pAUC for a given $t \in (0, 1)$. Solving for the optimal coefficient vector is a fixed-point problem, $a^* = f(a^*)$. However, the explicit form of the corresponding Hessian matrix is still too complicated to obtain. Therefore, one cannot validate that a vector satisfying the equation is the global maximum, as it may be a local minimum. Furthermore, in the next section some examples are given to illustrate possible computational issues which increase the complexity of the calculation.
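The quantities in this section can be checked numerically with a short sketch. It evaluates Equation (2.1.1) for a two-marker linear combination, computes the vector of Equation (2.1.2), and illustrates the scale invariance and the reduction to the AUC at $t = 1$. The function names, the Simpson-rule integration after the substitution $u = 1 - \Phi(z)$, the truncation of the integration range at $\pm 9$, and the parameter values are all assumptions made for illustration; they are not taken from the thesis.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}, pdf = phi


def quad_form(a, S):
    """Return a^T S a for a coefficient list a and a nested-list matrix S."""
    p = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(p) for j in range(p))


def pauc(a, mu0, S0, mu1, S1, t, n=2000):
    """pAUC of the score a^T X over (1 - specificity) in (0, t) under normality.

    Substituting u = 1 - Phi(z) in Equation (2.1.1) gives the integral over
    z >= c(t) of Phi((a^T Delta_mu - z sqrt(Q0)) / sqrt(Q1)) * phi(z), which is
    evaluated with the composite Simpson rule (n must be even).
    """
    delta = sum(ai * (m1 - m0) for ai, m0, m1 in zip(a, mu0, mu1))
    s0, s1 = math.sqrt(quad_form(a, S0)), math.sqrt(quad_form(a, S1))
    lo = STD.inv_cdf(1 - t) if t < 1 else -9.0  # c(t); -9 stands in for -infinity
    hi = 9.0                                     # phi(z) is negligible beyond 9
    h = (hi - lo) / n

    def integrand(z):
        return STD.cdf((delta - z * s0) / s1) * STD.pdf(z)

    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3


def su_liu_2d(mu0, S0, mu1, S1):
    """a0 = (S0 + S1)^{-1} (mu1 - mu0) for two markers, Equation (2.1.2)."""
    m = [[S0[i][j] + S1[i][j] for j in range(2)] for i in range(2)]
    d = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * d[0] - m[0][1] * d[1]) / det,
            (m[0][0] * d[1] - m[1][0] * d[0]) / det]


# Hypothetical population parameters, invented purely for illustration.
mu0, S0 = [0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]]
mu1, S1 = [1.0, 0.5], [[2.0, 0.2], [0.2, 1.5]]
a0 = su_liu_2d(mu0, S0, mu1, S1)

print(pauc(a0, mu0, S0, mu1, S1, t=0.1))
print(pauc([3 * x for x in a0], mu0, S0, mu1, S1, t=0.1))  # rescaling a leaves pAUC unchanged
print(pauc(a0, mu0, S0, mu1, S1, t=1.0))                   # t = 1 gives the AUC
```

Note that `a0` maximizes the AUC but is not claimed to maximize the pAUC at $t = 0.1$; it only serves here as a convenient test vector.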

2.2 Computational Issues

We present two examples with two biomarkers to illustrate that the pAUC can have non-unique maximal points and local extrema. From Equation (2.1.2) and Equation (2.1.4), we can see that the solutions take the form of rotated transforms of $\Delta_\mu$, the mean difference between the non-diseased and diseased groups. Therefore, we consider expressing the pAUC as a function of the mean difference vector between the two distributions. In the following, we reparameterize the problem by expressing any $(a_1, a_2)^T$ as a rotation transform of $\Delta_\mu$ by angle $\theta$, $\theta \in (-\pi, \pi)$,

$$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \Delta_\mu.$$

From the plot of the pAUC as a function of $\theta$ for $\theta \in (-\pi, \pi)$, we learn about the behavior of all possible linear combinations. Numerically, the linear combination that maximizes the pAUC can be found by a grid search.

[Figure 2.1: $\theta$ versus pAUC at $t = 0.1$, $\rho_0 = 0.9$, $\rho_1 = -0.9$ for example 1 (Left), example 2 (Right). Horizontal axis: $\theta$; vertical axis: pAUC; legend: True pAUC, Max1, Max2.]

In example 1, we take $t = 0.1$ as the upper bound for the false positive rate and consider

$$\mu_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \Sigma_0 = \begin{pmatrix} 1 & \rho_0 \\ \rho_0 & 1 \end{pmatrix}, \quad \text{and} \quad \mu_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\ \Sigma_1 = \begin{pmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{pmatrix}.$$

The maximal pAUC 0.0372 is achieved at two different vectors: $(0.8144, -0.5803)^T$ and $(-0.5803, 0.8144)^T$. The left panel in Figure 2.1 shows that the corresponding angles

between the two solutions and the mean difference are −1.4045 and 1.4045, and the two maximal points are denoted as Max1 and Max2. The angles are of the same absolute value, but of opposite sign. The pAUC plot is symmetric about zero.

In example 2, we consider that the first biomarker has less variation in the diseased group,

$$\mu_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \Sigma_0 = \begin{pmatrix} 1 & \rho_0 \\ \rho_0 & 1 \end{pmatrix}, \quad \text{and} \quad \mu_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\ \Sigma_1 = \begin{pmatrix} 0.25 & 0.5\rho_1 \\ 0.5\rho_1 & 1 \end{pmatrix}.$$

Again with $\rho_0 = 0.9$, $\rho_1 = -0.9$ and $t = 0.1$, we find that the maximal pAUC is 0.0342, and the corresponding optimal linear combination vector is $(-0.0554, 0.8322)^T$. Moreover, from the right panel in Figure 2.1 we find that there is another local maximal point at $(0.8, -0.6)^T$ with pAUC value 0.0325. The pAUC values of the two vectors are quite close.

Hence, with the existence of multiple maximums and local extrema, the optimization problem becomes more complicated. When the data dimension is two, the global maximum can be found by using a grid search as in the previous examples. When the data dimension is three or four, the grid search method is still valid and involves an 'n-sphere' approach (Marsaglia (1972); Muller (1959)). However, as the data dimension increases, it becomes more difficult and less efficient to apply the grid search method for the optimal solution. The problems of multiple roots and local extrema add to the complexity and difficulty of computation. The fact that a pAUC often takes values near zero makes the situation even worse. As a consequence, the commonly used subroutines for optimization are likely to produce unsatisfactory results. For example, a local maximum/minimum may be found instead of the global maximum. In the next section, we will propose a strategy to improve the existing algorithm. In addition, in Section 4.3 we will use three examples in a two-biomarker study to investigate which situations give rise to multiple pAUC maximizers and local extrema.
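The grid search over $\theta$ described for example 1 can be sketched as follows. This is a minimal illustration rather than the thesis's FORTRAN implementation: the function names, the grid of 719 angles, and the Simpson-rule integration (after substituting $u = 1 - \Phi(z)$ and truncating the range at $z = 9$) are assumptions made here.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def pauc_example1(a, rho0=0.9, rho1=-0.9, t=0.1, n=400):
    """pAUC of a^T X in example 1: mu0 = (0, 0), mu1 = (1, 1), unit variances."""
    delta = a[0] + a[1]                      # a^T Delta_mu with Delta_mu = (1, 1)
    s0 = math.sqrt(a[0] ** 2 + a[1] ** 2 + 2 * rho0 * a[0] * a[1])  # sqrt(Q0)
    s1 = math.sqrt(a[0] ** 2 + a[1] ** 2 + 2 * rho1 * a[0] * a[1])  # sqrt(Q1)
    lo, hi = STD.inv_cdf(1 - t), 9.0         # integrate over z in (c(t), 9)
    h = (hi - lo) / n

    def integrand(z):
        return STD.cdf((delta - z * s0) / s1) * STD.pdf(z)

    total = integrand(lo) + integrand(hi)
    for k in range(1, n):                    # composite Simpson rule, n even
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3


def rotate_mean_difference(theta):
    """Rotate Delta_mu = (1, 1) by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c - s, s + c)


# Evenly spaced angles strictly inside (-pi, pi).
grid = [-math.pi + k * 2 * math.pi / 720 for k in range(1, 720)]
values = [pauc_example1(rotate_mean_difference(th)) for th in grid]
best = max(range(len(grid)), key=values.__getitem__)
print(grid[best], values[best])
```

Because the pAUC curve of example 1 is symmetric about zero, the angle reported by `max` may be either of the two maximizing angles; plotting `values` against `grid` is a simple way to see both peaks.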
2.3 Multiple-Initial Algorithm

In order to solve the optimization problem, one common and efficient way is to call the subroutine of the IMSL Library in a FORTRAN program. We propose using the UMING subroutine, which is created for optimizing a function of multiple variables and requests a user-supplied gradient. The subroutine mainly employs the quasi-Newton method to find the optimal coefficient vector.

From the examples in the previous section, we learned that the linear combination maximizing pAUC($a$) may not be unique, and that there exist local extrema in this optimization problem. The use of one arbitrary initial point is likely to either produce an incorrect solution, or fail to find all solutions. To minimize the risk, we thus propose an algorithm which considers multiple initial points simultaneously. Intuitively, the standard basis of $\Re^p$ is adopted as initial points. The best linear combination for the AUC, $a_0$, is considered as well. Hence, we input $p + 1$ initial points in total. Once we get $p + 1$ solutions and their corresponding pAUC values from the FORTRAN UMING subroutine for optimization, the proposed algorithm endeavors to determine whether multiple maximums exist, and then provides a complete list of maximizers after the determination. Multiple roots exist if distinct solutions have indistinguishable pAUC values. The algorithm is presented as follows:

Step 0: Calculate the coefficient vector $a_0$ of Equation (2.1.2) by Su and Liu (1993).

Step 1: Use the standard basis of $\Re^p$ and $a_0$ as $p + 1$ initial points and denote them as $a_1^{(0)}, a_2^{(0)}, \ldots, a_p^{(0)}, a_{p+1}^{(0)} = a_0$, respectively.

Step 2: Using the FORTRAN UMING algorithm, we can get $p + 1$ convergent points and their associated pAUCs, which are denoted as $(a_i, \mathrm{pAUC}_i)$, for $i = 1, \ldots, p + 1$.

Step 3: Sort the $p + 1$ pairs of solutions according to their pAUC value in descending order, and denote the ordered pairs as $(a_{(i)}, \mathrm{pAUC}_{(i)})$, for $i = 1, \ldots, p + 1$.

Step 4: Given $\epsilon > 0$, find the set of best solutions, $\{a_{(i)} : \mathrm{pAUC}_{(i)} > \mathrm{pAUC}_{(1)} - \epsilon,\ 1 \le i \le p + 1\}$.

Step 5: If the set contains a single solution, we find the unique maximum, $a_{(1)}$, with maximal pAUC value $\mathrm{pAUC}_{(1)}$.
Step 6: If the set contains multiple solutions, whether any two solutions are distinct is determined by the angle between them. If the angle exceeds a given η > 0, they are 12.

(23) distinct maximums; otherwise, they are regarded as indifferential. Modify the set such that all the solutions are distinct with others. Step 7. Report the set of solutions and the pAUC value pAUC(1) . In Chapter 4, we will display the performance of the proposed Multiple-Initial Algorithm. In next chapter, we will study the statistical inference related with the pAUC maximizer, containing the theoretical property of the sample version of the maximizer, three discriminatory power tests, and biomarker selection methods.. 立. 政治大. ‧. ‧ 國. 學. n. er. io. sit. y. Nat. al. Ch. engchi. 13. i n U. v.
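Steps 0 to 7 of the Multiple-Initial Algorithm can be sketched in Python as below. This is an illustration under stated assumptions, not the FORTRAN implementation: the local optimizer `local_ascent` is a simple gradient-ascent stand-in for the IMSL UMING quasi-Newton routine, $a_0$ is taken to be proportional to $(\Sigma_0 + \Sigma_1)^{-1}\Delta_\mu$ (the commonly cited Su and Liu (1993) form; Equation (2.1.2) itself is not reproduced here), $c(u) = \Phi^{-1}(1-u)$ is assumed, and all function names and the default values of `eps` and `eta` are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def unit(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def make_pauc(delta, S0, S1, t=0.1, n_nodes=100):
    """Return a function a -> pAUC(a) (midpoint rule, assumed c(u) = Phi^{-1}(1 - u))."""
    h = t / n_nodes
    cs = [STD.inv_cdf(1.0 - (k + 0.5) * h) for k in range(n_nodes)]
    p = range(len(delta))

    def pauc(a):
        m = sum(a[i] * delta[i] for i in p)
        s0 = math.sqrt(sum(a[i] * S0[i][j] * a[j] for i in p for j in p))
        s1 = math.sqrt(sum(a[i] * S1[i][j] * a[j] for i in p for j in p))
        return h * sum(STD.cdf((m - c * s0) / s1) for c in cs)
    return pauc


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    A = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def local_ascent(f, a, step=0.5, min_step=1e-7, h=1e-5, max_iter=200):
    """Stand-in local optimizer: gradient ascent on the unit sphere with step halving."""
    a = unit(a)
    fa = f(a)
    for _ in range(max_iter):
        g = []
        for i in range(len(a)):
            up, dn = list(a), list(a)
            up[i] += h
            dn[i] -= h
            g.append((f(up) - f(dn)) / (2 * h))  # central difference
        radial = sum(gi * ai for gi, ai in zip(g, a))
        g = [gi - radial * ai for gi, ai in zip(g, a)]  # keep the tangent part
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if gnorm < 1e-10:
            break
        s, moved = step, False
        while s >= min_step and not moved:
            b = unit([ai + s * gi / gnorm for ai, gi in zip(a, g)])
            fb = f(b)
            if fb > fa:
                a, fa, moved = b, fb, True
            s /= 2
        if not moved:
            break
    return a, fa


def multiple_initial(delta, S0, S1, t=0.1, eps=1e-4, eta=0.05):
    """Return (list of distinct maximizers with their pAUC, maximal pAUC)."""
    p = len(delta)
    f = make_pauc(delta, S0, S1, t)
    pooled = [[S0[i][j] + S1[i][j] for j in range(p)] for i in range(p)]
    a0 = solve(pooled, delta)                                        # Step 0
    starts = [[float(i == k) for i in range(p)] for k in range(p)] + [a0]  # Step 1
    sols = sorted((local_ascent(f, s) for s in starts),
                  key=lambda r: r[1], reverse=True)                  # Steps 2-3
    best = [r for r in sols if r[1] > sols[0][1] - eps]              # Step 4
    distinct = []                                                    # Steps 5-6
    for a, v in best:
        cosines = [max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
                   for b, _ in distinct]
        if all(math.acos(c) > eta for c in cosines):
            distinct.append((a, v))
    return distinct, sols[0][1]                                      # Step 7
```

With $\Sigma_0$ having correlation 0.9, $\Sigma_1$ having correlation −0.9, unit variances and $\Delta_\mu = (1, 1)^T$, the two symmetric starting points $e_1$ and $e_2$ lead to two different solutions, which is the situation Step 6 is designed to detect.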

Chapter 3

Statistical Inference Related with the pAUC Maximizer

3.1 Estimating the Linear Combination Maximizing the pAUC

When the population means and covariance matrices are unknown, we can use their maximum likelihood estimators (MLEs) to compute a sample version of the pAUC. An estimate of the pAUC maximizer can then be obtained with our algorithm. It is known that the MLEs converge strongly to the true values as the sample size grows. We show below that the estimated pAUC maximizer performs adequately for sufficiently large sample sizes.

Assume two independent random samples of subjects are drawn from the non-diseased and the diseased populations. Let $n_0$ be the size of the sample from the non-diseased population, and $n_1$ be the size of the sample from the diseased population. Further, let $n = \min\{n_0, n_1\}$. When the parameters of the two population distributions are unknown, the MLEs, which are known to have good properties such as consistency, are employed. The estimated mean vectors and the estimated covariance matrices are denoted as $\hat\mu_0, \hat\mu_1, \hat\Sigma_0, \hat\Sigma_1$. Plugging the MLEs of the unknown population means and covariance matrices into Equation (2.1.1), we obtain the sample version of the pAUC,

$$\widehat{\mathrm{pAUC}}_n(a) \equiv \int_0^t \left\{ 1 - \Phi\!\left( \frac{c(u)\sqrt{\hat Q_0} - a^T \hat\Delta_\mu}{\sqrt{\hat Q_1}} \right) \right\} du, \tag{3.1.1}$$

where $\hat Q_0 = a^T \hat\Sigma_0 a$, $\hat Q_1 = a^T \hat\Sigma_1 a$, and $\hat\Delta_\mu = \hat\mu_1 - \hat\mu_0$. Further, the coefficient vector $a^*$ of the optimal linear combination maximizing the pAUC is estimated by the maximizer of Equation (3.1.1), that is,

$$\hat a_n = \arg\max_{a \in E_p} \widehat{\mathrm{pAUC}}_n(a).$$

Next we show that the sample pAUC maximizer, $\hat a_n$, is a strongly consistent estimator of $a^*$.

Theorem 2. Suppose that the conditional distribution of $X \mid D = d$ follows $N(\mu_d, \Sigma_d)$ and $\Sigma_d$ is positive definite for $d = 0, 1$. Assume that pAUC(a) in (2.1.1) is a continuous function of $a$ and has a unique maximizer $a^*$ in $E_p$, and that $\hat a_n$ is the maximizer of the sample pAUC, $\widehat{\mathrm{pAUC}}_n(a)$, in (3.1.1). Then $\hat a_n \to a^*$ with probability 1 as $n \to \infty$.

In Theorem 2, the positive-definiteness assumption, under which the eigenvalues of the covariance matrices are bounded away from zero, guarantees that the two matrices are not ill-conditioned. To prove Theorem 2, we first show that on the compact set $E_p$ the sample pAUC, $\widehat{\mathrm{pAUC}}_n(a)$, is a strongly consistent estimator of pAUC(a). We further show that the norm of the first derivative of $\widehat{\mathrm{pAUC}}_n(a)$ is uniformly bounded in $a$ with probability 1. The two facts are summarized as Lemmas 1 and 2 in the Appendix. The proofs of the lemmas and of Theorem 2 are given in the Appendix. In the next section, we propose three discriminatory power tests to find significant biomarkers: the global discriminatory power test, the marginal discriminatory power test, and the conditional discriminatory power test.

3.2 Testing the Discriminatory Power

Once we have found the linear combination estimator maximizing the pAUC of $p$ biomarkers to detect the disease, we would like to know whether the linear combination, or each biomarker in the combination, has significant discriminatory power. Three related hypothesis testing problems are studied. Because the exact null distributions of the proposed test statistics are difficult to derive, we apply the parametric bootstrap to find the critical values.

First, when the dimension of the biomarker set is greater than 1, we are interested in testing the global discriminatory power of the full set. That is,

$H_{0,g}$: The optimal linear combination has no discriminatory power to the disease versus
$H_{1,g}$: The optimal linear combination has discriminatory power to the disease.

Evaluating the global discriminatory power of the biomarker set by the pAUC of the corresponding optimal linear combination, we have

$$H_{0,g}: \mathrm{pAUC}(a^*) \le \frac{t^2}{2} \quad \text{versus} \quad H_{1,g}: \mathrm{pAUC}(a^*) > \frac{t^2}{2}.$$

We propose using the estimated pAUC of the optimal linear combination, $\hat a_n^T X$, as the test statistic,

$$T_g = \max_{a \in E_p} \widehat{\mathrm{pAUC}}_n(a) = \widehat{\mathrm{pAUC}}_n(\hat a_n) = \int_0^t \Phi\!\left( \frac{\hat a_n^T \hat\Delta_\mu - c(u)\sqrt{\hat Q_0}}{\sqrt{\hat Q_1}} \right) du,$$

where $\hat Q_0$ and $\hat Q_1$ are evaluated at $a = \hat a_n$. The rejection region is $T_g \ge c$ for some critical value $c$, which we find by the parametric bootstrap. Under the null hypothesis $H_{0,g}$, the biomarkers $X$ in the non-diseased and the diseased groups both come from the same multivariate normal distribution. The common parameters are estimated by the pooled mean vector and the pooled covariance matrix, denoted as $\tilde\mu_p$ and $\tilde\Sigma_p$, respectively. In every bootstrap replication, we take two independent samples of sizes $n_0$ and $n_1$, respectively, from the estimated null distribution, $N(\tilde\mu_p, \tilde\Sigma_p)$, and use them to compute the test statistic, denoted as $T_g^{(b)}$, $b = 1, \ldots, B$. The bootstrap sampling is repeated $B$ times. The critical value $c$ at the significance level $\alpha$ is then the $100 \times (1 - \alpha)$th percentile of the $T_g^{(b)}$ values.
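The parametric bootstrap for the global test can be sketched as below for two biomarkers. Several choices here are assumptions made only for illustration: the pooled estimates are taken to be the MLEs of the combined sample, the maximization over $E_p$ is replaced by a coarse grid of 72 unit vectors rather than the Multiple-Initial Algorithm, $c(u) = \Phi^{-1}(1-u)$ is assumed, and the percentile is computed as a simple order statistic. The names `t_global`, `draw` and `critical_value` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal
T = 0.1             # upper limit of the false-positive range
H = T / 50          # midpoint-rule step
NODES = [STD.inv_cdf(1.0 - (k + 0.5) * H) for k in range(50)]  # assumed c(u)
ANGLES = [2.0 * math.pi * k / 72 for k in range(72)]           # coarse grid


def mle(xs):
    """MLE mean vector and covariance matrix (divisor n) of 2-D observations."""
    n = len(xs)
    mu = [sum(x[i] for x in xs) / n for i in range(2)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / n
            for j in range(2)] for i in range(2)]
    return mu, cov


def quad(a, S):
    """Return a^T S a."""
    return sum(a[i] * S[i][j] * a[j] for i in range(2) for j in range(2))


def t_global(x0, x1):
    """T_g: plug-in pAUC maximized over the grid of unit vectors."""
    (mu0, S0), (mu1, S1) = mle(x0), mle(x1)
    d = (mu1[0] - mu0[0], mu1[1] - mu0[1])
    best = 0.0
    for th in ANGLES:
        a = (math.cos(th), math.sin(th))
        m = a[0] * d[0] + a[1] * d[1]
        s0, s1 = math.sqrt(quad(a, S0)), math.sqrt(quad(a, S1))
        best = max(best, H * sum(STD.cdf((m - c * s0) / s1) for c in NODES))
    return best


def draw(rng, mu, S, n):
    """n draws from the bivariate normal N(mu, S) via its Cholesky factor."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return out


def critical_value(x0, x1, alpha=0.05, B=200, seed=1):
    """Bootstrap critical value c: the 100(1 - alpha)th percentile of T_g^(b)."""
    rng = random.Random(seed)
    mu_p, S_p = mle(list(x0) + list(x1))  # pooled estimates (assumed form)
    stats = sorted(
        t_global(draw(rng, mu_p, S_p, len(x0)), draw(rng, mu_p, S_p, len(x1)))
        for _ in range(B)
    )
    k = math.ceil(round((1.0 - alpha) * B, 9))
    return stats[k - 1]
```

The global null hypothesis would then be rejected when `t_global(x0, x1) >= critical_value(x0, x1)`. In practice the grid maximizer would be replaced by whichever optimizer is used for the observed data, so that the statistic and its bootstrap replicates are computed in the same way.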
The test proposed above can also be applied to the marginal discriminatory power of one specific biomarker in $X$. Assume $X_i$ is the biomarker of research interest. Consider the following hypotheses,

$$H_{0,m}: \mathrm{pAUC}(1_i) \le \frac{t^2}{2} \quad \text{versus} \quad H_{1,m}: \mathrm{pAUC}(1_i) > \frac{t^2}{2},$$

where the vector $1_i \in \Re^p$ has all components equal to zero, except the component corresponding to $X_i$, which is equal to one. The test statistic $T_m$ is the estimated pAUC of $X_i$,

$$T_m = \widehat{\mathrm{pAUC}}_n(1_i) = \int_0^t \Phi\!\left( \frac{(\hat\mu_{1,i} - \hat\mu_{0,i}) - c(u)\sqrt{\hat\Sigma_{0,ii}}}{\sqrt{\hat\Sigma_{1,ii}}} \right) du,$$

where $\hat\mu_{0,i}, \hat\Sigma_{0,ii}$ and $\hat\mu_{1,i}, \hat\Sigma_{1,ii}$ are the MLEs of the mean and the variance of $X_i$ in the non-diseased and the diseased groups, respectively. The null hypothesis $H_{0,m}$ is rejected if $T_m$ is sufficiently large. As in the global discriminatory power test, the critical value is determined by the parametric bootstrap. Since only one biomarker is involved, the critical value is easier to compute.

Another problem of research interest is to assess the contribution of one specific biomarker given the presence of the other biomarkers. Assume $X^T = (X_i, X_{i-}^T)$, where $X_i$ is the target biomarker and $X_{i-}$ contains the biomarkers in $X$ other than $X_i$. Then the testing problem can be described as follows,

$H_{0,c}$: Given $X_{i-}$, $X_i$ has no discriminatory power to the disease versus
$H_{1,c}$: Given $X_{i-}$, $X_i$ has discriminatory power to the disease.

The coefficients of the optimal linear combination of $X$ are written as $a^{*T} = (a_i^*, a_{i-}^{*T})$, where $a_i^*$ is the coefficient of $X_i$. We propose evaluating the biomarker $X_i$ through $a_i^*$. Intuitively, the null hypothesis is equivalent to $X_i$ having a zero coefficient in the optimal linear combination, i.e. $a_i^* = 0$. Hence, the hypotheses can be expressed as

$$H_{0,c}: a_i^* = 0 \quad \text{versus} \quad H_{1,c}: a_i^* \ne 0.$$

Define $\hat a_{n,i}$ as the estimator of $a_i^*$. We then propose using the estimate as the test statistic, i.e. $T_{c,i} = \hat a_{n,i}$. The null hypothesis $H_{0,c}$ is rejected if either $T_{c,i} \ge d_1$ or $T_{c,i} \le d_2$.
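The two statistics introduced in this passage are simple to compute once the data are in hand. The sketch below is illustrative only: `marginal_stat` evaluates $T_m$ with a midpoint rule and assumes $c(u) = \Phi^{-1}(1-u)$, while `two_sided_reject` applies the two-sided rule for $T_{c,i}$, taking $d_1$ and $d_2$ as empirical upper and lower $\alpha/2$ percentiles of null draws that are assumed to be already available. Both function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def marginal_stat(x0, x1, t=0.1, n_nodes=200):
    """T_m for one biomarker: plug-in pAUC from group means and MLE variances."""
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    s0 = math.sqrt(sum((x - m0) ** 2 for x in x0) / len(x0))
    s1 = math.sqrt(sum((x - m1) ** 2 for x in x1) / len(x1))
    h = t / n_nodes
    total = 0.0
    for k in range(n_nodes):
        c = STD.inv_cdf(1.0 - (k + 0.5) * h)  # assumed c(u) = Phi^{-1}(1 - u)
        total += STD.cdf(((m1 - m0) - c * s0) / s1)
    return total * h


def two_sided_reject(t_obs, null_draws, alpha=0.05):
    """Return (reject?, d1, d2) for the rule: t_obs >= d1 or t_obs <= d2."""
    s = sorted(null_draws)
    B = len(s)
    hi = math.ceil(round((1.0 - alpha / 2.0) * B, 9))
    lo = max(math.ceil(round((alpha / 2.0) * B, 9)), 1)
    d1, d2 = s[hi - 1], s[lo - 1]
    return (t_obs >= d1 or t_obs <= d2), d1, d2
```

When the two groups have identical sample mean and variance, `marginal_stat` returns $t^2/2$, the boundary value in $H_{0,m}$.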

The critical values $d_1$ and $d_2$ are determined from the empirical null distribution of $T_{c,i}$ by the parametric bootstrap. In order to generate the bootstrap samples, we first discuss the null scenario under $H_{0,c}$. Under the normality assumption, for given $D = d$, $d \in \{0, 1\}$,

$$X = \begin{pmatrix} X_i \\ X_{i-} \end{pmatrix} \Big|\, D = d \sim MVN\!\left( \begin{pmatrix} \mu_{d,i} \\ \mu_{d,i-} \end{pmatrix}, \begin{pmatrix} \Sigma_{d,ii} & \Sigma_{d,ii-} \\ \Sigma_{d,ii-}^T & \Sigma_{d,i-i-} \end{pmatrix} \right).$$

Then under $H_{0,c}$, $P(X_i \mid D, X_{i-}) = P(X_i \mid X_{i-})$, which requires that for each realization $X_{i-} = x_{i-}$,

$$\mu_{1,i} + \Sigma_{1,ii-} \Sigma_{1,i-i-}^{-1} (x_{i-} - \mu_{1,i-}) = \mu_{0,i} + \Sigma_{0,ii-} \Sigma_{0,i-i-}^{-1} (x_{i-} - \mu_{0,i-}),$$

$$\Sigma_{1,ii} - \Sigma_{1,ii-} \Sigma_{1,i-i-}^{-1} \Sigma_{1,ii-}^T = \Sigma_{0,ii} - \Sigma_{0,ii-} \Sigma_{0,i-i-}^{-1} \Sigma_{0,ii-}^T.$$

Therefore, estimating the null distribution involves a non-trivial constrained inference. Here, for simplicity, we consider a narrower null scenario, where

$$P(X_i \mid D, X_{i-}) = P(X_i).$$

That is, $X_i$ not only has a common distribution in the two groups, but is also independent of $X_{i-}$. As a consequence, we consider the following model for the bootstrap samples: for $d = 0, 1$,

$$X = \begin{pmatrix} X_i \\ X_{i-} \end{pmatrix} \Big|\, D = d \sim MVN\!\left( \begin{pmatrix} \hat\mu_{p,i} \\ \hat\mu_{d,i-} \end{pmatrix}, \begin{pmatrix} \hat\Sigma_{p,ii} & 0^T \\ 0 & \hat\Sigma_{d,i-i-} \end{pmatrix} \right),$$

where $\hat\mu_{p,i}$ and $\hat\Sigma_{p,ii}$ are the pooled estimates of the mean and the variance of $X_i$, $\hat\mu_{d,i-}$ and $\hat\Sigma_{d,i-i-}$ are the conventional MLEs of the mean and the covariance matrix of $X_{i-}$, respectively, and $0$ is the $(p - 1) \times 1$ zero vector. We repeat the bootstrap sampling $B$ times, find the sample pAUC maximizers of the bootstrap samples, and record the $B$ estimated coefficients $\hat a_{n,i}^{(b)}$ corresponding to $X_i$, $b = 1, \ldots, B$. At the significance level $\alpha$, the critical values $d_1$ and $d_2$ are then the $100 \times (1 - \alpha/2)$th and $100 \times (\alpha/2)$th percentiles of the $B$ coefficients.

Note that this conditional test has less power to detect the significance of $X_i$ when $X_{i-}$ alone is independent of the disease $D$. Under $H_{0,c}$, it is known that

$$P(X_i, X_{i-} \mid D) = P(X_i \mid X_{i-})\, P(X_{i-} \mid D).$$

Combining this with the fact that $P(X_{i-} \mid D) = P(X_{i-})$ leads to the complete null scenario in which all biomarkers are independent of the disease. Under this condition, the estimated coefficients vary greatly, subject to the unit-length requirement in the algorithm. As a consequence, the critical values become so extreme that a significant result is unlikely, even when $X_i$ is in fact strongly correlated with the disease. The simulation results in Section 4.2 further verify this fact.

If a null hypothesis of discriminatory power is not rejected, then we may say that the tested biomarker makes an insignificant contribution to detecting the disease and can be removed from the analysis. Hence, the three proposed discriminatory power tests are useful for identifying the importance of a biomarker and can be embedded in a biomarker selection procedure to reduce the number of variables. This is discussed in the next section.

3.3 Biomarker Selection

Once the optimal linear combination maximizing the pAUC is found, we are interested in determining whether every biomarker in the linear combination contributes significantly to the pAUC, or whether, on the contrary, some biomarkers contribute so little that they can be removed from the analysis for simplicity. In this study, we propose two biomarker selection methods to reduce the number of variables: the Backward selection and the Forward selection. The main idea is similar to variable selection in regression analysis, which seeks a reduced subset of predictors. In addition, all biomarkers should be adequately standardized a priori.

First, the linear combination maximizing the estimated pAUC over the full biomarker set is found. Then every biomarker enters the selection procedure in the order determined by the absolute value of its coefficient in the linear combination.
Let $X$ be the vector of the full biomarker set and $\hat a_n^T = (\hat a_{n1}, \hat a_{n2}, \ldots, \hat a_{np})$ be the estimated optimal linear combination obtained from a random sample. The biomarkers of $X$ are rearranged according to the $|\hat a_{ni}|$'s in ascending order and denoted as $X_{(1)}, X_{(2)}, \ldots, X_{(p)}$. Thus $X_{(1)}$ is the biomarker whose coefficient has the smallest absolute value, and $X_{(p)}$ is the biomarker whose coefficient has the largest absolute value. Afterward, if the discriminatory power of a biomarker, in either the marginal or the conditional sense, is found significant, the tested biomarker is selected. Define $A$ as the set of significant biomarkers at each selection step. The two selection methods have different testing sets and testing objects at each step. In addition, the significance level of every selection step in both approaches is fixed at $\alpha$.

The Forward method starts with an empty set $A$, and the first step is to test the marginal discriminatory power of $X_{(p)}$. If $X_{(p)}$ is significant, add it to $A$. Next, given the result, test the significance of $X_{(p-1)}$, and then $X_{(p-2)}, \ldots, X_{(1)}$. One biomarker is tested at a time. When the marginal discriminatory power of every biomarker is insignificant, no biomarkers are selected in the final step.

On the other hand, the Backward method starts with the full biomarker set, i.e. $A = X$. The very first step confirms that the full biomarker set has significant global discriminatory power. If the global discriminatory power test is not significant, we stop the selection and conclude that none of the biomarkers is significant. Next, given the significance of the full set, the conditional discriminatory power of the biomarker $X_{(1)}$ is tested. If $X_{(1)}$ is insignificant, remove it from $A$. Given the result, test the significance of $X_{(2)}$, then $X_{(3)}$, and so on. The details of the Forward and the Backward methods are described as follows.

Forward Method:

Step 1. Set $A = \emptyset$ and test the marginal discriminatory power of $X_{(p)}$. That is,
$H_{0,(p)}$: Given $A$, $X_{(p)}$ has no discriminatory power.
If $H_{0,(p)}$ is rejected, add $X_{(p)}$ to $A$. Go to the next step.

Step 2. Test the significance of $X_{(p-1)}$ with
$H_{0,(p-1)}$: Given $A$, $X_{(p-1)}$ has no discriminatory power.
If $H_{0,(p-1)}$ is rejected, add $X_{(p-1)}$ to $A$. Go to the next step.

...

Step p. Test the significance of $X_{(1)}$ with
$H_{0,(1)}$: Given $A$, $X_{(1)}$ has no discriminatory power.
If $H_{0,(1)}$ is rejected, add $X_{(1)}$ to $A$. Stop.

Backward Method:

Step 0. Set $A = \{X\}$ and test the global discriminatory power of $A$. That is,
$H_{0,(0)}$: $A$ has no discriminatory power.
If $H_{0,(0)}$ is rejected, go to the next step; otherwise, stop and set $A = \emptyset$.

Step 1. Assess $X_{(1)}$ by removing $X_{(1)}$ from $A$ and testing the hypothesis,
$H_{0,(1)}$: Given $A$, $X_{(1)}$ has no discriminatory power.
If $H_{0,(1)}$ is rejected, add $X_{(1)}$ back to $A$. Go to the next step.

Step 2. Assess $X_{(2)}$ by removing $X_{(2)}$ from $A$ and testing the hypothesis,
$H_{0,(2)}$: Given $A$, $X_{(2)}$ has no discriminatory power.
If $H_{0,(2)}$ is rejected, add $X_{(2)}$ back to $A$. Go to the next step.

...

Step p. Assess the effect of $X_{(p)}$. If, at the beginning of this stage, $A = \{X_{(p)}\}$, stop. Otherwise, remove $X_{(p)}$ from $A$ and test the following null hypothesis,
$H_{0,(p)}$: Given $A$, $X_{(p)}$ has no discriminatory power.
If $H_{0,(p)}$ is rejected, add $X_{(p)}$ back to $A$. Stop.

In our approaches, the estimated best linear combination $\hat a_n$ is used only to determine the order in which the biomarkers are tested. The insignificance of a biomarker with a larger absolute coefficient does not imply the insignificance of the other biomarkers. Similarly, the significance of a biomarker with a smaller absolute coefficient does not imply the significance of the other biomarkers. Hence, all biomarkers are assessed. For a study of $p$ biomarkers, the Forward method needs $p$ tests, while the Backward method needs $p$ or $p + 1$ tests. In the Backward method, after the global discriminatory power is found significant, if the first $p - 1$ biomarkers are tested and found insignificant, it is unnecessary to test the significance of $X_{(p)}$. We directly conclude that $X_{(p)}$ is the only significant biomarker for detecting the disease.

During the selection process, the testing problem with an empty $A$ differs from that with a non-empty $A$. If $A$ is empty, the test assesses the marginal discriminatory power of the target biomarker, as in $H_{0,m}$ of Section 3.2, and the test statistic is the corresponding estimated pAUC, $T_m$. If $A$ is not empty, the test assesses the conditional discriminatory power of the target biomarker given $A$, as in $H_{0,c}$ of Section 3.2, and the test statistic is the corresponding coefficient in the estimated optimal linear combination, $T_{c,i}$. In the next chapter, we present the empirical results on statistical inference for the cases of two, three and four biomarkers.
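The control flow of the two selection methods can be sketched as follows. The sketch is deliberately agnostic about the tests themselves: `marginal_test`, `conditional_test` and `global_test` are hypothetical callables supplied by the user, each returning `True` when the corresponding null hypothesis is rejected at level $\alpha$. Only the ordering and the add/remove logic described in the steps above are illustrated.

```python
def forward_select(order_desc, marginal_test, conditional_test):
    """Forward method.

    order_desc: biomarker labels sorted by |coefficient| in descending order,
                i.e. X_(p), X_(p-1), ..., X_(1).
    marginal_test(x) -> bool: True if H0 is rejected for x alone.
    conditional_test(x, A) -> bool: True if H0 'given A, x has no power' is rejected.
    """
    selected = []
    for x in order_desc:
        if not selected:
            reject = marginal_test(x)               # empty A: marginal test
        else:
            reject = conditional_test(x, list(selected))
        if reject:
            selected.append(x)
    return selected


def backward_select(order_asc, global_test, conditional_test):
    """Backward method.

    order_asc: biomarker labels sorted by |coefficient| in ascending order,
               i.e. X_(1), X_(2), ..., X_(p).
    global_test(A) -> bool: True if 'A has no discriminatory power' is rejected.
    """
    selected = list(order_asc)
    if not global_test(list(selected)):             # Step 0
        return []
    for x in order_asc:                             # Steps 1..p
        if selected == [x]:
            break                                   # last remaining biomarker is kept untested
        selected.remove(x)
        if conditional_test(x, list(selected)):
            selected.append(x)                      # significant: add it back
    return selected
```

With three biomarkers of which only one is significant, the Backward sketch performs the global test plus two conditional tests, which matches the count of $p$ tests stated above for the case where the first $p - 1$ biomarkers are removed.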

Chapter 4

Simulation Study

In our simulation study, we are interested in the specificity range (0.9, 1.0); that is, $t = 0.1$. The simulation study includes three parts. Section 4.1 examines the performance of the proposed Multiple-Initial Algorithm, using four cases in $\Re^2$ and two cases in $\Re^3$. Section 4.2 presents the empirical results on statistical inference for two, three and four biomarkers, with a focus on the two-biomarker study. We generate sample data from several scenarios and use the Multiple-Initial Algorithm to find the optimal linear combination based on the estimated pAUC. The effect of the population parameters of the biomarkers on the pAUC is discussed first. We then report the performance of the estimated optimal linear combination and of the two proposed biomarker selection approaches. In Section 4.3, we present three two-biomarker examples to investigate which parameters make the plot of $\theta$ versus pAUC bimodal, that is, which parameters make the linear combination maximizing the pAUC non-unique or produce a local maximum. We then show results for the two selection methods under three different sample allocations between the two groups, to assess the effect on the pAUC and on the power in the two-biomarker study. Additionally, a power analysis with different sample sizes is performed. Finally, we tabulate the critical values of the three proposed statistical tests under various scenarios.

Table 4.1: Covariance set up in the simulation study. Matrices are written row by row, with rows separated by semicolons.

Case   Σ0                              Σ1
I      [1 0.9; 0.9 1]                  [1 0.9; 0.9 1]
II     [1 0.9; 0.9 1]                  [1 −0.9; −0.9 1]
III    [1 0.9; 0.9 1]                  [0.25 −0.45; −0.45 1]
IV     [1 −0.1; −0.1 1]                [0.25 −0.45; −0.45 1]
V      [1 0.9 0; 0.9 1 0; 0 0 1]       [1 0.9 0; 0.9 1 0; 0 0 1]
VI     [1 0.9 0; 0.9 1 0; 0 0 1]       [1 −0.9 0; −0.9 1 0; 0 0 1]

4.1 Multiple-Initial Algorithm

The biomarkers in the non-diseased and diseased groups have the following distributions,

$$X \mid D = d \sim N(\mu_d, \Sigma_d), \quad d = 0, 1.$$

For simplicity, the non-diseased population mean of each biomarker is set to zero, while the diseased population mean is set to one for all cases in this section. That is, $\mu_0 = 0$ and $\mu_1 = 1$. Different variances, which determine the true effect sizes, and different correlations, which also affect the pAUC, are considered. Table 4.1 gives the details of the two covariance matrices $\Sigma_0$ and $\Sigma_1$ of the non-diseased and diseased populations, respectively.

In Case I, the covariance matrices of the diseased and non-diseased populations are equal. The two biomarkers both have unit effect size and are strongly positively correlated with each other. The resulting pAUC function is unimodal. Case II considers the special case in which the two biomarkers are negatively correlated in the diseased population, while the covariance matrix of the non-diseased population is the same as in Case I. In the following two cases, Case III and Case IV, we enlarge the effect size of the first biomarker by giving it a smaller variance in the diseased population. In the non-diseased population, however, the two biomarkers have a strong positive correlation in Case III and a slight negative correlation in Case IV. Considering all possible linear combinations, the pAUCs of Case I and Case III have a unique maximizer. However, one local maximum other than the global maximum exists in Case III, and there are two optimal linear combinations in Case II, as shown in the previous chapter. Next, two cases of three biomarkers are taken as Case V and Case VI. Case V is a simple extension of Case I, with a unique pAUC maximizer, and Case VI is a simple extension of Case II, with two pAUC maximizers.

Table 4.2: The maximizer(s) and the corresponding maximal pAUC value found by using the grid search and the proposed multiple-initial method.

Case  Method            Maximizer(s)                                           Optimal pAUC
I     Grid Search       (0.7077, 0.7065)                                       0.0253
I     Multiple-initial  (0.7071, 0.7071)                                       0.0253
II    Grid Search       (0.8144, −0.5803), (−0.5803, 0.8144)                   0.0372
II    Multiple-initial  (0.8145, −0.5802), (−0.5802, 0.8145)                   0.0372
III   Grid Search       (−0.5544, 0.8322)                                      0.0342
III   Multiple-initial  (−0.5568, 0.8308)                                      0.0342
IV    Grid Search       (0.6580, 0.7530)                                       0.0365
IV    Multiple-initial  (0.6215, 0.7834)                                       0.0364
V     Grid Search       (0.4255, 0.4184, 0.8024)                               0.0387
V     Multiple-initial  (0.4222, 0.4223, 0.8021)                               0.0387
VI    Grid Search       (−0.4762, 0.8120, 0.3375), (0.8122, −0.4758, 0.3376)   0.0411
VI    Multiple-initial  (−0.4798, 0.8149, 0.3251), (0.8149, −0.4798, 0.3251)   0.0411

In all scenarios, the true optimal linear combinations produced by the grid search serve as the gold standard. All results are listed in Table 4.2. From Table 4.2, we see that the solutions found by our proposed multiple-initial method are all close to the true optimal linear combinations. Moreover, in the cases of multiple maximizers, such as Case II and Case VI, our method successfully finds all the solutions. Overall, the proposed multiple-initial method performs quite satisfactorily in these synthetic examples.

4.2 Statistical Inference
In this section, we use simulation results to validate our proposed methods, including the estimated best linear combination of all biomarkers, the global test of the discriminatory power of a set of biomarkers, and the two biomarker selection methods. We provide simulation results for two, three and four biomarkers.

First, we discuss the cases of two biomarkers. Again, the mean vector in the non-diseased population is fixed as the zero vector, $\mu_0 = 0$. The mean vector in the diseased population then equals the vector of mean differences, $\mu_1 = \Delta_\mu = (\Delta_1, \Delta_2)^T$. Various values of $\Delta_\mu$ are selected. Further, every biomarker has unit variance in both groups. Denote the within-group correlation by $\rho_d$, $d = 0, 1$. Hence, the covariance matrices of the two biomarkers in the two groups are

$$\Sigma_d = \begin{pmatrix} 1 & \rho_d \\ \rho_d & 1 \end{pmatrix}, \quad d = 0, 1.$$

In addition to the independent case, three population correlations, 0.1, 0.5 and 0.9, are considered in the simulation study; see Table 4.3. Given the true values of the parameters, the true best linear combination maximizing the pAUC with $t = 0.1$, denoted as $a^*$, is found by grid search with $10^6$ grids. The distributions of the true best linear combination maximizing the pAUC, $a^{*T}X$, are reported in Table 4.3. The corresponding true pAUC values, $\mathrm{pAUC}(a^*)$, are listed in the last column of Table 4.3.

The first case is a complete null scenario, in which the distributions in the two groups are the same. No linear combination has any discriminatory power for the disease, and every pAUC equals $t^2/2 = 0.005$. We simply define $a^* = (0, 0)^T$ in this case. In Cases 2-31, we simulate the scenario in which the mean difference of the first biomarker is zero, but the second biomarker has a positive mean difference. As a result, the second biomarker is considered the more important biomarker in this situation. In Cases 2-4, the two biomarkers are independent, so the first biomarker is uncorrelated with the disease and the second biomarker is the only contributor. Nevertheless, when the first and second biomarkers are correlated, the first biomarker makes a non-ignorable contribution; see Cases 5-31.
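The null value $t^2/2$ quoted for the complete null scenario can be checked directly. The short derivation below assumes, consistently with the threshold $\Phi^{-1}(1-t)$ used later in this section, that $c(u) = \Phi^{-1}(1-u)$; under the complete null, $\Delta_\mu = 0$ and $\Sigma_0 = \Sigma_1$, so $Q_0 = Q_1$ for every $a$.

```latex
\begin{align*}
\mathrm{pAUC}(a)
  &= \int_0^t \Phi\!\left(\frac{a^T\Delta_\mu - c(u)\sqrt{Q_0}}{\sqrt{Q_1}}\right) du
   = \int_0^t \Phi\bigl(-c(u)\bigr)\, du
   && (\Delta_\mu = 0,\ Q_0 = Q_1) \\
  &= \int_0^t \Bigl\{1 - \Phi\bigl(\Phi^{-1}(1-u)\bigr)\Bigr\}\, du
   = \int_0^t u \, du
   = \frac{t^2}{2},
\end{align*}
% so that t = 0.1 gives pAUC(a) = 0.005 for every direction a.
```

Since the result does not depend on $a$, every linear combination attains the same value, which is why no best linear combination exists in this case.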
Compared with Cases 2-4, the global discriminatory power is improved by the positive correlation between the two biomarkers. To investigate the effect of correlation, we consider various covariance matrices: the two biomarkers are correlated only in the non-diseased group (Cases 5-13), only in the diseased group (Cases 14-22), or equally in the non-diseased and diseased groups (Cases 23-31). Except for Cases 14-22, the best linear combination rotates clockwise as the correlation between the two biomarkers increases. In addition, when the correlation between the two biomarkers is the same in the non-diseased and diseased groups, the best linear combination does not change with the mean difference; see Cases 23-31.

Figure 4.1: The distributions of best linear combination, $a^{*T}X$, for Case 4 (Left top), Case 7 (Right top), Case 10 (Left bottom), Case 13 (Right bottom).

In the integral defining the pAUC, the non-diseased population distribution, which is related to the specificity, determines the threshold range, and the diseased population distribution, which is related to the sensitivity, determines the magnitude of the integrand. From Table 4.3, we can draw some conclusions about the relationship between the distributions of $a^{*T}X$ in the two groups and $\mathrm{pAUC}(a^*)$. A thorough discussion follows.

First, we fix the mean difference $\Delta_\mu = (0, 1)^T$ and assess the effect of the distribution of $a^{*T}X$ in the non-diseased group on $\mathrm{pAUC}(a^*)$. Specifically, we compare Case 4, Case 7, Case 10 and Case 13. Figure 4.1 plots the distributions of $a^{*T}X$ in the two groups for the four cases. When the correlation of the two biomarkers in the non-diseased group, $\rho_0$, increases from 0.1 to 0.9, the variance of the best linear combination in the non-diseased group, $Q_0$, decreases quickly. The mean of the best linear combination in the diseased group, $a^{*T}\mu_1$, decreases too. Consequently, the pAUC in Case 13 is dramatically larger than in Case 4, as the correlation of the two biomarkers in the non-diseased group becomes large. From Figure 4.1, we see that when the variance of the non-diseased population decreases, the cut-off point $c$ moves to the left. Given the cut-off points, the pAUC is the integral of the tail probability, so the integration range becomes larger and the pAUC increases.

Similarly, we fix the mean difference $\Delta_\mu = (0, 1)^T$ and assess the effect of the distribution of $a^{*T}X$ in the diseased group on $\mathrm{pAUC}(a^*)$. Specifically, Case 4, Case 16, Case 19 and Case 22 are compared. When the correlation of the two biomarkers in the diseased group, $\rho_1$, increases from 0.1 to 0.9, the variance of the best linear combination in the diseased group, $Q_1$, increases slowly, and the means are close to the mean in Case 4. Since the non-diseased distributions are all standard normal, as in Case 4, the pAUC of the linear combination $a^{*T}X$ defined in Equation (2.1.1) is

$$\mathrm{pAUC}(a^*) = \int_0^t \Phi\!\left( \frac{a^{*T}\Delta_\mu - c(u)}{\sqrt{Q_1}} \right) du = \int_0^t \left\{ 1 - \Phi\!\left( \frac{c(u) - a^{*T}\Delta_\mu}{\sqrt{Q_1}} \right) \right\} du.$$

From this expression, we have the following findings. First, as $a^{*T}\Delta_\mu$ increases, the pAUC increases. Second, if $a^{*T}\Delta_\mu < \Phi^{-1}(1 - t)$, the pAUC increases as $Q_1$ increases; otherwise, the pAUC decreases as $Q_1$ increases. Hence, the pAUC in Case 22 is larger than the pAUC in Case 4 in Table 4.3. See Figure 4.2 for the distribution plots of these cases.

Comparing Figure 4.1 with Figure 4.2, we find that the change in the distributions brought by the correlation in the non-diseased group is larger than the change brought by the correlation in the diseased group.
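The two findings drawn from the expression above can be made explicit by differentiating under the integral sign. The sketch below writes $m = a^{*T}\Delta_\mu$ and $z_u = (m - c(u))/\sqrt{Q_1}$, lets $\phi$ denote the standard normal density, and assumes $c(u) = \Phi^{-1}(1-u)$, so that $c(u) \ge \Phi^{-1}(1-t)$ on $(0, t]$; these symbols are introduced here only for the derivation.

```latex
\begin{align*}
\frac{\partial\, \mathrm{pAUC}}{\partial m}
  &= \int_0^t \frac{\phi(z_u)}{\sqrt{Q_1}}\, du \;>\; 0, \\[4pt]
\frac{\partial\, \mathrm{pAUC}}{\partial Q_1}
  &= \int_0^t \phi(z_u)\, \frac{c(u) - m}{2\,Q_1^{3/2}}\, du .
\end{align*}
% If m < \Phi^{-1}(1-t), then c(u) - m > 0 for every u in (0, t],
% so the second derivative is positive and pAUC increases with Q_1.
% If m \ge \Phi^{-1}(1-t), the integrand is negative wherever c(u) < m
% (values of u close to t), which is the part of the range that drives
% the decrease stated in the text.
```

The first line gives the monotonicity in $a^{*T}\Delta_\mu$; the second shows that the sign of the effect of $Q_1$ is governed by the comparison between $a^{*T}\Delta_\mu$ and the thresholds $c(u)$.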
Hence, we conclude that the appearance of a positive correlation in the non-diseased group has a greater effect than that in the diseased group. The last three cases simulate the scenarios that two biomarkers have the same mean difference and are independent with each other. Their pAUC increases with the common mean difference. Next, we study the performance of the proposed estimated best linear combination, a ˆn , and the corresponding pAUC, pAUC(ân ). In order to have a balance study, the sample size of two groups both are 100. All empirical mean and standard error of these estimators 28.

(39) 立. 政治大. Figure 4.2: The distributions of best linear combination, a∗T X , for Case 4 (Left top),. ‧ 國. 學. Case 16 (Right top), Case 19 (Left bottom), Case 22 (Right bottom). over 1000 replications are listed in Table 4.4. The mean and standard error are denoted. ‧. as Ave and SE, respectively. Note that the estimated best linear combinations are found. sit. y. Nat. by our multiple-initial algorithm because the best linear combination may not be unique and has local maximum exist, discussed in Section 2.2.. io. n. al. er. In the view of the estimated best linear combination, we find that the estimated best. i n U. v. linear combination tends to give a conservative and toward-zero result. In the Case 1,. Ch. engchi. where no best linear combination exists in theory, we find the least stable estimation. The variance of the coefficient of the estimated best linear combination under the complete null scenario are the largest values among all cases. Figure 4.3 are the density plots of the two coefficients in the estimated optimal linear combination. We can see that they both follow a bimodal distribution, and have a high chance of observing the two boundaries. The variations reduce as the association of the two biomarkers with the disease becomes large. In addition, the corresponding estimated pAUC overestimates the true value, and similarly, the estimation improves as the biomarkers are more correlated with the disease. The last column of Table 4.4 is the empirical power of the proposed. 29.

test for the global discriminatory ability at the significance level α = 0.05. We find that the test not only adequately controls the type I error rate but also performs satisfactorily in the alternative cases. In addition, we try other complete null cases, in which the non-diseased and the diseased distributions are the same, and find that the type I error rate is well controlled in each of them.

Now, we compare the performance of the Forward method and the Backward method in variable selection. For every testing procedure, the significance level α is 0.05 and the bootstrap sample size is 500 in every replication. For a study of two biomarkers, the four possible outcomes (c1, c2) are defined as follows: (1, 1), if both biomarkers are selected; (1, 0), if only the first biomarker is selected; (0, 1), if only the second biomarker is selected; and (0, 0), if neither biomarker is selected. The last case means that no biomarker is significant for detecting the disease, so the final reduced biomarker set is empty and no corresponding pAUC is obtained. Once we obtain a non-empty reduced set, we compute the best linear combination of the reduced set and its corresponding pAUC. Table 4.5 reports the performance of the resulting best linear combination of the non-empty reduced set among the 1000 replications. The proportions of the four possible outcomes of the two methods among the 1000 replications are listed in Table 4.6. In every scenario, the figure in boldface corresponds to the most likely outcome.

From Table 4.5, we find that the Forward method performs better than the Backward method in most cases, except for the complete null case. In addition, when the first biomarker has a non-ignorable contribution that is mainly due to a positive correlation between the two biomarkers, as in Cases 8-16 and Cases 28-31, the Backward approach may perform unsatisfactorily. From Table 4.6, we find that in these cases the proportion of selecting only the first biomarker is quite high, contrary to our expectation that both biomarkers would be selected. This confusing result arises from an improper null distribution generated in these scenarios. When the Backward approach is applied in these scenarios, we usually observe the following. After a significant result is obtained from the global effect test at step 0, the first biomarker is the first variable tested, since it tends to have the smaller absolute coefficient. Because it has a non-ignorable contribution to the pAUC, it is usually found to be a significant biomarker. Next, the conditional discriminatory

power of the second biomarker is assessed by using its corresponding coefficient as the test statistic. In the proposed parametric bootstrap sampling method, the null distribution assumes that the tested biomarker has a common mean and a common variance in the two groups and is uncorrelated with the other biomarker. However, in this case, the significant effect of the first biomarker comes from the correlation, and eliminating the correlation produces a complete null scenario, as in Case 1. As discussed earlier, the estimated coefficient has a large variation in this situation. As a result, it is difficult to obtain significance when testing the conditional discriminatory power of the second biomarker. Finally, we get an inadequate result: the first biomarker is the only selected biomarker.

Figure 4.3: The distribution of the estimated linear combination for Case 1 (left: $\hat{a}_1$; right: $\hat{a}_2$). Both panels are kernel density plots (vertical axis: Density) based on N = 1000 replications, with bandwidths 0.1604 (left) and 0.1593 (right).

On the other hand, in these scenarios, the Forward approach, which starts from testing

the biomarker with the largest absolute coefficient, is not able to take advantage of the correlation. When the mean difference of the second biomarker is small, as in Cases 8-9, Cases 11-12, and Cases 29-30, the Forward approach has a larger chance of selecting no biomarker than the Backward approach and hence is less powerful. But as the mean difference of this biomarker becomes moderate to large, the Forward approach has a greater proportion of selecting both biomarkers; see Cases 10, 13, and 31. Moreover, on average over the 1000 replications, the resulting optimal pAUC of the reduced set selected by the Forward approach is always larger than that of the set selected by the Backward approach in Cases 8-13 and Cases 29-31. Hence, we conclude that when the mean difference is not too small, the Forward method performs better than the Backward method.

Two-biomarker simulations are also generated from multivariate t distributions with 3 degrees of freedom to investigate the robustness of our biomarker selection methods with respect to deviations from the normality assumption. Table 4.7 reports the true maximal pAUC, pAUC($a^*$), which is found under the multivariate t distribution. In addition, the average and the standard error, over 1000 replications, of the estimated maximal pAUC of the reduced biomarker set, which is selected via our biomarker selection methods on the basis of the normality assumption, are also presented in Table 4.7. From this table, we find that our methods give overestimated results in these cases. Thus, we conclude that the proposed optimal pAUC estimation and the biomarker selection methods are sensitive to the normality assumption.

Next, we study the cases consisting of three and four biomarkers, i.e., p = 3 or 4. Again, assume $\mu_0 = 0$ in the non-diseased group and $\mu_1 = \Delta = (\Delta_1, \ldots, \Delta_p)^T$ in the diseased group.
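To make the quantity being estimated concrete, the following is a minimal sketch (not the thesis's own code) of how the population pAUC of a fixed linear combination a could be evaluated under the normality assumption, with group means 0 and ∆ as above. It assumes that larger scores indicate disease and that the pre-determined specificity range corresponds to false-positive rates in (0, fpr_max). The function names, the default fpr_max = 0.1, and the midpoint integration rule are illustrative choices, not taken from the thesis.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def quad_form(a, sigma):
    """Return a' Sigma a for a list vector a and a nested-list matrix sigma."""
    p = len(a)
    return sum(a[i] * sigma[i][j] * a[j] for i in range(p) for j in range(p))


def binormal_pauc(a, delta, sigma0, sigma1, fpr_max=0.1, n_grid=2000):
    """pAUC of the score a'X over false-positive rates in (0, fpr_max).

    Assumed model (hypothetical helper, not the thesis's implementation):
    non-diseased X ~ N(0, sigma0), diseased X ~ N(delta, sigma1),
    and larger scores indicate disease.
    """
    shift = sum(ai * di for ai, di in zip(a, delta))  # a' Delta
    sd0 = quad_form(a, sigma0) ** 0.5                 # sd of a'X, non-diseased
    sd1 = quad_form(a, sigma1) ** 0.5                 # sd of a'X, diseased
    step = fpr_max / n_grid
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) * step  # midpoint of the k-th false-positive-rate cell
        # Binormal ROC curve: sensitivity at false-positive rate t.
        roc = STD_NORMAL.cdf((shift + sd0 * STD_NORMAL.inv_cdf(t)) / sd1)
        total += roc * step
    return total


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Equal-weight combination of two independent markers, mean difference 0.5.
    print(binormal_pauc([1.0, 1.0], [0.5, 0.5], identity, identity))
```

Because the ROC curve is unchanged when a is multiplied by a positive constant, this function returns the same value for a and 2a, so the coefficients are only identified up to a positive scale factor.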
Further, the covariance matrices are of the following form: for d = 0, 1,

\[
\text{if } p = 3,\quad \Sigma_d = \begin{pmatrix} 1 & \rho_d & 0 \\ \rho_d & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad \text{and if } p = 4,\quad \Sigma_d = \begin{pmatrix} 1 & \rho_d & 0 & 0 \\ \rho_d & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

The performance of the estimated pAUC of the best linear combination of the full biomarker set, and that of the reduced biomarker set found by the two biomarker selection approaches, are presented in Table 4.8. We can see that, as in the case of p = 2, the estimated pAUC tends to overestimate the true value. By using the Backward approach, we are less likely to obtain a confusing conclusion than in the case of p = 2. Currently, the