A new approach based on self-consistency algorithm

3. Inference under Right Censoring

3.5 A new approach based on self-consistency algorithm

Portnoy (2003) proposed a method which utilizes the idea of self-consistency based on the Kaplan-Meier estimator. Specifically the Kaplan-Meier based on data {(X_i, ) (_i i1,..., )}n can be written as

Alternatively it also satisfies:

S X . Now we return to the quantile problem. Setting ( ) 1

Figure 3.1 shows the idea of (3.7). For example, there is a single sample with 10 observations at X 1, 2,...,10 with X 5, 7,8 be censored points. functions but the former is decreasing, while the latter is increasing. Notice that the value of empirical quantile  is equivalent to 1 minus the corresponding value of

Y( )

This implies that a censored observation with _i 0 will contribute the weight ( ) 1

 . Note that the proposed algorithm is based on minimizing

 

The idea proposed by Portnoy (2003) is novel, but it is hard to understand the details.

Later work by Neocleous, Branden and Portnoy (2006) and Portnoy and Lin (2010) provide more explanations on the self-consistency algorithm.

Chapter 4 Conclusion

In this thesis, we review literature on quantile regression models. Comparing with other regression models, the quantile regression model is more flexible without imposing strong assumption on the error distribution and can well explain the heterogeneity data. Unlike the squared loss function in the least squared regression method, the loss function for estimating the quantile is not differentiable. Accordingly many nice analytic approaches such as the Newton-Raphson method are no longer applicable. Nevertheless, without censoring, some nice analytic properties such as monotonicity and convexity for the estimating and objective functions still exist. They are also helpful for developing numerical algorithms to implement the estimation.

When censoring is present, how to modify the estimation procedures and how the modifications affect the analytical properties are the main issues. The thesis is just an initial review for these useful but complicated methods. Thorough investigation to understand these methods better deserve future study.

Figure 1.1: Definition of  th quantile.

Figure 1.2: Quantiles for two samples.

Figure 2.1: The loss function.

Figure 2.2: Plots of U_n( ) and R_n( ) .

Figure 2.3: Objective and estimating functions based on LSE with

~ (1 2)

Zi Bernoulli .

Figure 2.4: Plots of ( ( ), ( ),    R_n) and ( ( ), ( ),    U_n^{( )}^k ) (k 1, 2) based on quantile loss function with  0.3. Here Z_i ~Bernoulli(1 2).

Figure 2.5: Plots of ( ( ), ( ),    R_n) and ( ( ), ( ),    U_n^{( )}^k ) (k 1, 2) based on quantile loss function with  0.3. Here Z_i ~Uniform(0,1).

~ (1 2)

Zi Bernoulli :

~ (0,1)

Zi Uniform :

Figure 2.6: Plots of U_n⁽²⁾( ( ))  based on Z_i ~Bernoulli(1 2) and

~ (0,1)

Zi Uniform .

Step 1: initial point Step 2:

Step 3: Step 4:

Step 5: Stpr 6:

Final step:

Figure 2.7: Intermediate steps for the simulated annealing algorithm.

Figure 3.1: Idea of equation (3.7)

Figure 3.2: The survival function S_Y( )y and the empirical quantile  .

References:

[1] Barrodale, I., and Roberts, F. D. K. (1974), “Solution of an Overdetermined System of Equations in the  Norm,” Communications of the ACM, 17, ₁ 319-320.

[2] Honoré, B., Khan, S., and Powell J. L. (2002), “Quantile Regression Under Random Censoring,” Journal of Econometrics, 109, 67-105.

[3] Lin, D.Y., and Geyer, C. J. (1992), “Computational Methods for Semiparametric Linear Regression with Censored Data,” Journal of Computational and

graphical Statistics, 1, 77-90.

[4] Necleous, T., Branden, K. V., and Portnoy, S. (2006), “Correction to Censored Regression Quantiles by S. Portnoy, 98 (2003), 1001-1012,” Journal of the American Statistical Association, 101, 869-861.

[5] Peng, L., and Huang, Y. (2008), “Survival Analysis with Quantile Regression Models,” Journal of the American Statistical Association, 103, 637-649.

[6] Portnoy, S. (2003), “Censored Regression Quantiles,” Journal of the American Statistical Association, 98, 1001-1012; Corr, 101, 860-861.

[7] Portnoy, S., and Lin, G. (2010), “Asymptotics for censored regression quantiles,”

Journal of Nonparametric Statistics, 22, 115-130.

[8] Powell, J. L. (1986), “Censored Regression Quantiles,” Journal of Econometrics, 32, 143-155.

[9] Ying, Z., Jung, S. H., and Wei, L. J. (1995), “Survival Analysis with Median Regression Models,” Journal of the American Statistical Association, 90, 178-184.

[10] Koenker, R., and Geling O. (2001), “Reappraising Medfly Longevity: A Quantile Regression Survival Analysis,” Journal of the American Statistiical Association, 96, 458-468.

[11] Koenker, R., and V. D’Orey (1987), “Computing Regression Quantiles,” Journal of the Royal Statistical Society, 36, 383-393.

[12] Koenker, R. (2005), Quantile Regression, Cambridge: Cambridge University Press.

Appendix

A.1 Data generation from a quantile regression model

Usually to generate Y which has the distributional function F_Y(.), we can monotone increasing function of t , it follows that

Pr(Y_iZ_i^T ( ) |Z_i)Pr(Z_i^T(U_i)Z_i^T ( ))Pr(U_i ) . 

The next question is how to set the distribution of Y as a target one. Consider a _i simple two-sample case with Z_i ~Bernoulli p( ). Notice that

Now let n500, p1 2 and c₀ 1.5. Then we generate data from a quantile regression model by the following codes of R:

n=500; b=1.5; Z=sample(c(0,1), size=n, replace=TRUE);

u=runif(n,0,1); btau=b*u; Y=array(0,n);

Y=qnorm(u)+Z*btau windows(); plot(Z, Y);

Figure A.1 shows that the two groups of generated data (Z Y_i, )_i (i1,...,n) with the

control group (Z_i 0) and the treatment group (Z_i 1). To check the normality for the control group, we can examine Figure A.2 which provides a histogram plot and the Q-Q plot which show that Y Z_i| _i 0 is exactly generated from a standard normal distribution. Notice that

Pr(Y_i  ( ) ( ) |Z_i  1) F1( ( )   ( ))

which means that the slope  ( )F₁^¹( ) F₀^¹( ) is the quantile treatment effect, illustrated in Figure A.3. To check the accuracy of generated data, we plot the empirical quantile functions and the corresponding theory functions of F and ₀ F ₁ in Figure A.4. Note that the empirical functions of F and ₀ F are ₁

From Figure A.4, we see that both empirical quantile functions match the corresponding theory functions very well. That is, our data generation from a quantile regression model is correct.

Because the objective function is not differentiable, to evaluate the estimation procedures, we make 2-D plots of



 ^{( ),}Rn^{( ( ))} 



and



^{ }^^{( ),}^Rⁿ^{( ( ))}^{ }^



for given

0.3, 0.5, 0.7

  by minimizing the objective function

Figure A.5, we see that there is a minimum for the parabolic curve in each panel and these are the estimators of ( )  and ( ) ^ which are marked by blue points. It shows that the estimators of  ( ) and ( ) ^ are close to the true values  ₀( ) and

0( )

 ^ which are marked red dash lines. Figure A.5 is made by the following codes of R:

alpha=seq(-5, 5, by=0.1); beta=seq(-5, 8, by=0.1);

for(k in 2:4) {

tau=0.1*(2*k-1); windows(); par(mfrow=c(1,2));

Ra=array(0, length(alpha)); Rb=array(0, length(beta)) amax=-2^31; amin=2^31; bmax=amax; bmin=amin;

for(i in 1:length(alpha)) { for(j in 1:n) {

if(Y[j]<alpha[i]+Z[j]*b*tau)

Ra[i]=Ra[i]+(Y[j]-alpha[i]-Z[j]*b*tau)*(tau-1) else Ra[i]=Ra[i]+(Y[j]-alpha[i]-Z[j]*b*tau)*tau } if(Ra[i]> amax) amax=Ra[i]

if(Ra[i]< amin) {

amin=Ra[i]; ax=alpha[i] } }

plot(alpha, Ra, type='l', xlab=expression(alpha(tau)),

ylab=expression(paste('R(',alpha(tau),')'))) title(main=substitute(list(tau)==list(a), list(a=tau)))

abline(v=qnorm(tau), col='red', lty=2)

text(qnorm(tau)+0.3, amax-200, substitute(list(alpha[0](tau))==list(a),

list(a=round(qnorm(tau),3))), col='red') points(ax, amin, col='blue')

text(ax, amin+350, substitute(list(hat(alpha)(tau))==list(a),

list(a=round(ax,3))), col='blue') for(i in 1:length(beta)) {

for(j in 1:n) {

if(Y[j]<qnorm(tau)+Z[j]*beta[i])

Rb[i]=Rb[i]+(Y[j]-qnorm(tau)-Z[j]*beta[i])*(tau-1) else Rb[i]=Rb[i]+(Y[j]-qnorm(tau)-Z[j]*beta[i])*tau }

if(Rb[i]> bmax) bmax=Rb[i]

if(Rb[i]< bmin) {

bmin=Rb[i]; bx=beta[i]; } }

plot(beta, Rb, type='l', xlab=expression(beta(tau)),

ylab=expression(paste('R(',beta(tau),')'))) title(main=substitute(list(tau)==list(a), list(a=tau)))

abline(v=b*tau, col='red', lty=2)

text(b*tau+0.2, bmax-100, substitute(list(beta[0](tau))==list(a),

list(a=round(b*tau,3))),col='red') points(bx, bmin,col='blue')

text(bx, bmin+250, substitute(list(hat(beta)(tau))==list(a),

list(a=round(bx,3))), col='blue') } Now we do the data generation from a quantile regression model 100 times. The mean, variance and biased for the estimators of  ( ) and ( ) ^ are presented in the following:

Estimator of  ( ) Estimator of  ^( )

Mean Variance Bias Mean Variance Bias

 0.1 -1.269 0.0076 0.0125 0.164 0.019 0.014

 0.3 -0.524 0.0052 4 10 ^⁴ 0.46 0.01777 0.01

 0.5 2 10 ^¹⁶ 0.0056 2 10 ^¹⁶ 0.747 0.01625 -0.003

 0.7 0.511 0.0056 -0.0134 1.038 0.01874 -0.012

 0.9 1.273 0.0066 -0.0085 1.331 0.01953 -0.019

From the table, we see that the values of bias are very small for all  .That is, the average values of the estimators are closed to the true value  ₀( ) and  ^₀( ). A.2 How to generate the 3-D plots

To show how to generate the 3-D plots in R, we take Figure 2.4, Figure 2.5 and Figure 2.6 as examples. Note that the functions for 3-D plots are in package scatterplot3d and rgl for R. If you are the first time to make a 3-D plot, you must install them and load them into R. The code of 3-D plot for R is as follow:

###Loading these packages

library(scatterplot3d) ; library(rgl);

### Data generate n=10; b=3.5; tau=0.3;

Z=sample(c(0,1),size=n,replace=TRUE); u=runif(n,0,1); btau=b*u;

Y=array(0,n); Y=qnorm(u)+Z*btau;

### Range of alpha and beta

alpha=seq(-4, 4, by=0.5); beta=seq(-4, 4, by=0.5)

a=array(rep(c(alpha), each=length(beta)), length(alpha)*length(beta)) b=array(rep(c(beta), times=length(alpha)), length(alpha)*length(beta))

### 3-D plot for R(.)

R=array(0, length(alpha)*length(beta)) ; NR=array(0, length(alpha)*length(beta));

for(k in 1: length(alpha)) { for(s in 1: length(beta)) {

NR[k+(s-1)*length(alpha)]=R[s+(k-1)*length(beta)] } } col=rainbow(length(alpha)*length(beta))

open3d()

plot3d(a, b, R, pch=176, col=col, type='p', xlab="alpha", ylab="beta",

zlab=" R(tau)", smooth=FALSE) surface3d(alpha, beta, NR, col=col, alpha=0.65)

### 3-D plot for U1(.)

U1=array(0, length(alpha)*length(beta)); NU1=array(0, length(alpha)*length(beta)) for(k in 1: length(alpha)) {

for(s in 1: length(beta)) { NU1[k+(s-1)*length(alpha)]=U1[s+(k-1)*length(beta)] }}

col=rainbow(length(alpha)*length(beta)) open3d()

plot3d(a, b, U1, pch=176, col=col, type= 'p',xlab="alpha", ylab="beta",

zlab=" U1(tau)") surface3d(alpha, beta, NU1, col=col, alpha=0.65)

### 3-D plot for U2(.)

U2=array(0, length(alpha)*length(beta)); NU2=array(0, length(alpha)*length(beta)) for(k in 1: length(alpha)) {

for(s in 1: length(beta)) { NU2[k+(s-1)*length(alpha)]=U2[s+(k-1)*length(beta)] } }

col=rainbow(length(alpha)*length(beta)) open3d()

plot3d(a, b, U2, pch=176, col=col, type= 'p',xlab="alpha", ylab="beta",

zlab=" U2(tau)") surface3d(alpha, beta, NU2, col=col, alpha=0.65)

### 3-D plot for ||U(.)||

U=array(0, length(alpha)*length(beta)); NU=array(0, length(alpha)*length(beta)) for(k in 1: length(alpha)) {

for(s in 1: length(beta)) { for(j in 1: n) {

if(Y[j]< alpha[k]+beta[s]*Z[j])

U[s+(k-1)*length(beta)]=U[s+(k-1)*length(beta)]+Z[j]*(tau-1)/n else

U[s+(k-1)*length(beta)]=U[s+(k-1)*length(beta)]+Z[j]*tau/n } U[s+(k-1)*length(beta)]=abs(U[s+(k-1)*length(beta)])

NU[k+(s-1)*length(alpha)]=U[s+(k-1)*length(beta)] } } col=rainbow(length(alpha)*length(beta))

open3d()

plot3d(a, b, U, col=col, type= 'p',xlab="alpha", ylab="beta", zlab=" U(tau)") surface3d(alpha, beta, NU, col=col, alpha=0.65)

Figure A.1: Scatterplot of generated data from a quantile regression model.

Figure A.2: Histogram plot and Q-Q plot of Y Z_i | _i 0.

Figure A.3: Plot of the quantile treatment effect  ^( ).

Figure A.4: Plots of the empirical quantile functions and the corresponding theory functions of F and ₀ F . ₁

Figure A.5: Plots of



 ^{( ),}Rn^{( ( ))} 



and



^{ }^^{( ),}^Rⁿ^{( ( ))}^{ }^



在文檔中設限資料下分量迴歸模型分析之文獻回顧 (頁 26-0)