
In this section, we report simulation studies of the performance of OGA+HDBIC+Trim. These simulations consider the regression model

$$y^{*}_t = \sum_{j=1}^{p_0} \beta^{*}_j w_{tj} + \sum_{j=p_0+1}^{p} \beta^{*}_j w_{tj} + \xi^{*}_t, \qquad t = 1, 2, \ldots, n, \tag{5.1}$$

where $\beta_{p_0+1} = \beta_{p_0+2} = \cdots = \beta_p = 0$, $p \gg n$, and the $\eta_{tj}$ are i.i.d. $N(0, \sigma_\eta^2)$ for all $t = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$, and are independent of $x_{tj}$. The $\xi_t$ are i.i.d. $N(0, \sigma_\xi^2)$ and are independent of $x_{tj}, \eta_{tj}$. The $\eta_{yt}$ are i.i.d. $N(0, \sigma_{\eta_y}^2)$ and are independent of $x_{tj}, \eta_{tj}, \xi_t$.

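Examples 1 and 2 consider the case

$$x_{tj} = d_{tj} + \tilde\eta\,\tilde x_t, \tag{5.2}$$

in which $\tilde\eta \ge 0$ and $(d_{t1}, d_{t2}, \ldots, d_{tp}, \tilde x_t)^T$, $t = 1, 2, \ldots, n$, are i.i.d. normal with mean $(1, 1, \ldots, 1, 0)^T$ and covariance matrix $I$. We standardize the variance of $x_{tj}$ by replacing $x_{tj}$ with $x_{tj}/\sqrt{1+\tilde\eta^2}$. Since for any $J \subset \{1, 2, \ldots, p\}$ and $1 \le i \le p$ with $i \notin J$,

$$\lambda_{\min}(R(J)) = \frac{1}{1+\tilde\eta^2} + \sigma_\eta^2 > 0 \quad\text{and}\quad \|R^{-1}(J)\gamma_i(J)\|_1 < 1,$$

(3.1) is satisfied. Moreover, $\mathrm{Corr}(w_{ti}, w_{tj}) = \tilde\eta^2/(1+\tilde\eta^2)$ increases as $\tilde\eta$ grows.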
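The two quantities displayed above can be checked numerically with a short sketch. It assumes the usual errors-in-variables form $w_{tj} = x_{tj} + \eta_{tj}$ and that $R(J)$ is the population covariance matrix of $(w_{tj}, j \in J)$; neither definition is restated in this section, so both are assumptions, and the function names are illustrative only.

```python
def cov_w(k, eta_tilde, sigma2_eta):
    """Population covariance of k observed regressors (ASSUMED form w = x + eta)."""
    rho = eta_tilde ** 2 / (1.0 + eta_tilde ** 2)   # part shared through x~_t
    a = 1.0 / (1.0 + eta_tilde ** 2) + sigma2_eta   # idiosyncratic part
    return [[a * (i == j) + rho for j in range(k)] for i in range(k)]


def matvec(m, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


k, eta_tilde, sigma2_eta = 5, 2.0, 0.01
R = cov_w(k, eta_tilde, sigma2_eta)
a = 1.0 / (1.0 + eta_tilde ** 2) + sigma2_eta

# R = a*I + rho*11^T, so any contrast e_1 - e_2 is an eigenvector with
# eigenvalue a, and the all-ones vector has the larger eigenvalue a + k*rho.
contrast = [1.0, -1.0] + [0.0] * (k - 2)
small = matvec(R, contrast)[0] / contrast[0]
large = matvec(R, [1.0] * k)[0]
print("smallest eigenvalue:", small, "largest eigenvalue:", large)

# Correlation between two standardized noise-free regressors x_ti, x_tj
corr_x = eta_tilde ** 2 / (1.0 + eta_tilde ** 2)
print("correlation:", corr_x)
```

Under these assumptions the smallest eigenvalue equals $1/(1+\tilde\eta^2) + \sigma_\eta^2$ for any subset of two or more regressors, and $\tilde\eta = 2$ gives the 80% correlation used in Example 2.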

Example 1. Consider (5.1) with $p_0 = 5$, $(\beta_1, \beta_2, \ldots, \beta_5) = (3, -3.5, 4, -2.8, 3.2)$, $\sigma_\xi^2 = 1$, $\sigma_{\eta_y}^2 = 0.01$, and assume that (5.2) holds. We consider $\tilde\eta = 0$, which means the regressors are uncorrelated, $\sigma_\eta^2 = 0.01, 0.05, 0.1$, and $(n, p) = (50, 1000), (100, 2000), (200, 4000)$. We choose $K_n = \lfloor 5(n/p^{2/q_1})^{1/2} \rfloor$ and allow $q_1$ to vary between 4 and 15. We have also allowed $D$ in $K_n = \lfloor D(n/p^{2/q_1})^{1/2} \rfloor$ to vary between 3 and 10, and the results are similar to those for $D = 5$. We perform 1000 simulations for each case. Define the mean squared prediction error

$$\mathrm{MSPE} = \frac{1}{1000} \sum_{l=1}^{1000} \Big( \sum_{j=1}^{p} \beta^{*}_j w^{(l)}_{n+1,j} - \hat y^{(l)}_{n+1} \Big)^2,$$

in which $x^{(l)}_{n+1,1}, x^{(l)}_{n+1,2}, \ldots, x^{(l)}_{n+1,p}$ are the regressors associated with $y^{(l)}_{n+1}$, the new outcome in the $l$th simulation run, and $\hat y^{(l)}_{n+1}$ denotes the predictor of $y^{(l)}_{n+1}$. Table 1 shows that OGA+HDBIC+Trim is very sensitive to the order of moment bounds $q_1$: it performs well with a proper $q_1$ but poorly with an improper one. If $q_1$ is too small, the penalty for the number of predictor variables in HDBIC is too large, so OGA+HDBIC tends to underfit; if $q_1$ is too large, the penalty is too small, so OGA+HDBIC tends to overfit. With a moderate order of moment bounds ($q_1 = 8, 10$), in the simulations with $n \ge 100$, OGA includes the 5 relevant regressors within $K_n$ iterations in 99.9% or more of the simulations, and HDBIC+Trim identifies the smallest correct model in 98% or more of the simulations.
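The role of $q_1$ in the discussion above can be made concrete with a small sketch. It evaluates the iteration cap $K_n = \lfloor D(n/p^{2/q_1})^{1/2} \rfloor$ and a per-variable penalty of the form $\log(n)\,p^{2/q_1}$; the penalty form is a reading of the text of Example 2 and should be treated as an assumption, as should the function names.

```python
import math


def k_n(n, p, q1, d=5.0):
    """Iteration cap K_n = floor(D * (n / p^(2/q1))^(1/2))."""
    return math.floor(d * math.sqrt(n / p ** (2.0 / q1)))


def penalty_per_variable(n, p, q1):
    """ASSUMED per-variable HDBIC penalty log(n) * p^(2/q1)."""
    return math.log(n) * p ** (2.0 / q1)


# Smaller q1 means a larger penalty (risk of underfitting) and a smaller K_n;
# larger q1 means a smaller penalty (risk of overfitting) and a larger K_n.
for q1 in (4, 6, 8, 10, 15):
    print(q1, k_n(50, 1000, q1), round(penalty_per_variable(50, 1000, q1), 2))
```

Because $p^{2/q_1}$ decreases in $q_1$, the penalty shrinks and $K_n$ grows as $q_1$ increases, which is the mechanism behind the underfitting and overfitting pattern described for Table 1.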

Table 1. Frequency, in 1000 simulations, of including all five relevant variables (Correct), of selecting exactly the relevant variables (E), and of selecting all relevant variables and i irrelevant variables (E+i).

σ_η^2  q1    n     p      E     E+1   E+2   E+3   E+4   E+5   Correct  MSPE
0.01   4     50    1000   0     0     0     0     0     0     0        64.02502
0.01   4     100   2000   0     0     0     0     0     0     0        53.08281
0.01   4     200   4000   0     0     0     0     0     0     0        55.59686
0.01   6     50    1000   623   0     0     0     0     0     623      24.54740
0.01   6     100   2000   1000  0     0     0     0     0     1000     0.15931
0.01   6     200   4000   1000  0     0     0     0     0     1000     0.08096
0.01   8     50    1000   911   18    0     0     0     0     929      4.34789
0.01   8     100   2000   1000  0     0     0     0     0     1000     0.17550
0.01   8     200   4000   1000  0     0     0     0     0     1000     0.08053
0.01   10    50    1000   571   129   43    17    17    7     922      10.29011
0.01   10    100   2000   983   16    1     0     0     0     1000     0.17837
0.01   10    200   4000   999   1     0     0     0     0     1000     0.16207
0.01   15    50    1000   0     0     0     0     0     0     914      14.44628
0.01   15    100   2000   21    12    10    7     3     2     1000     4.70902
0.01   15    200   4000   677   225   75    14    5     2     1000     0.19443
0.05   5     50    1000   0     0     0     0     0     0     0        65.54043
0.05   5     100   2000   0     0     0     0     0     0     0        53.91476
0.05   5     200   4000   0     0     0     0     0     0     0        47.64495
0.05   6     50    1000   2     0     0     0     0     0     2        59.51543
0.05   6     100   2000   689   0     0     0     0     0     689      16.94148
0.05   6     200   4000   1000  0     0     0     0     0     1000     0.18862
0.05   8     50    1000   816   16    2     0     0     0     834      13.14926
0.05   8     100   2000   1000  0     0     0     0     0     1000     0.39365
0.05   8     200   4000   1000  0     0     0     0     0     1000     0.17408
0.05   10    50    1000   522   118   36    21    8     14    861      13.67555
0.05   10    100   2000   983   16    1     0     0     0     1000     0.44005
0.05   10    200   4000   998   2     0     0     0     0     1000     0.17630
0.05   15    50    1000   0     0     0     0     0     0     854      26.64408
0.05   15    100   2000   11    17    12    10    3     0     1000     10.66257
0.05   15    200   4000   683   218   75    19    1     2     1000     0.43310
0.1    5.5   50    1000   0     0     0     0     0     0     0        59.78768
0.1    5.5   100   2000   0     0     0     0     0     0     0        50.27432
0.1    5.5   200   4000   28    0     0     0     0     0     28       45.76851
0.1    6     50    1000   0     0     0     0     0     0     0        59.57889
0.1    6     100   2000   20    0     0     0     0     0     20       46.85445
0.1    6     200   4000   973   0     0     0     0     0     973      1.57907
0.1    8     50    1000   507   9     0     0     0     0     516      28.21477
0.1    8     100   2000   999   0     0     0     0     0     999      0.65213
0.1    8     200   4000   1000  0     0     0     0     0     1000     0.31026
0.1    10    50    1000   493   94    36    12    8     9     744      22.35131
0.1    10    100   2000   987   13    0     0     0     0     1000     0.63598
0.1    10    200   4000   999   1     0     0     0     0     1000     0.29356
0.1    15    50    1000   0     0     0     0     0     0     773      43.94780
0.1    15    100   2000   16    13    8     4     5     0     1000     16.68930
0.1    15    200   4000   684   222   66    20    3     2     1000     0.83435
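The column definitions in the caption of Table 1 can be sketched as a small classifier for one simulation run. The function name, the return format, and the cutoff argument are illustrative assumptions; only the definitions of Correct, E, and E+i come from the caption.

```python
def classify_selection(selected, relevant, max_extra=5):
    """Classify one selected model against the set of relevant variables.

    Returns (correct, label): correct is True when every relevant variable
    is selected; label is "E", "E+i" for 1 <= i <= max_extra, or None.
    """
    selected, relevant = set(selected), set(relevant)
    if not relevant <= selected:
        return False, None                 # a relevant variable was missed
    extra = len(selected - relevant)       # number of irrelevant variables
    if extra == 0:
        return True, "E"
    if extra <= max_extra:
        return True, "E+" + str(extra)
    return True, None                      # correct, but more than max_extra extras


relevant = {1, 2, 3, 4, 5}
print(classify_selection({1, 2, 3, 4, 5}, relevant))
print(classify_selection({1, 2, 3, 4, 5, 17, 901}, relevant))
print(classify_selection({1, 2, 3, 4, 17}, relevant))
```

Under this reading, a run that selects all relevant variables plus more than five irrelevant ones is counted in Correct but in none of the columns E through E+5, which is one way a row can show zeros in those columns together with a large Correct count.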

Example 2. The settings of this example are the same as in Example 1, but we allow $\sigma_\eta^2$ to have a rate of convergence such that

$$\|U\|_1 \le C \sqrt{\frac{p^{2/q_1}}{n}}, \tag{5.3}$$

for some $C$ varying between 0.01 and 45. Two cases are considered here:

Case 1: Consider $\tilde\eta = 0$, which means the regressors are uncorrelated, and let $\sigma_\eta^2 = C\sqrt{p^{2/q_1}/n}$. In this case, the inequality in (5.3) becomes an equality.

Table 2 shows that the behavior of OGA+HDBIC+Trim agrees with the asymptotic theory of Theorem 4. For $n = 50$, $p = 1000$, OGA includes all relevant variables at least 90% of the time if $C \le 1$ ($\sigma_\eta^2 \le 0.02$); furthermore, with a proper order of moment bounds ($q_1 = 8$), OGA+HDBIC+Trim identifies the smallest correct model over 88% of the time. For $n \ge 100$, OGA always includes all relevant variables when $C \le 5$ ($\sigma_\eta^2 \le 0.085$); furthermore, when $q_1 = 8, 10$, HDBIC+Trim identifies the smallest correct model at least 98% of the time. For $n = 200$, $p = 4000$, $q_1 = 12$, even if $\sigma_\eta^2 = 0.625$, which is 62.5% of the variance of the real input variables, OGA+HDBIC+Trim still identifies the smallest correct model 91.5% of the time. Since the penalty per predictor variable in HDBIC is $\log(n)\,p^{2/q_1}$, a small $q_1$ is appropriate when $n$ is small to prevent overfitting; when $n$ is large, a larger $q_1$ can be tolerated without serious overfitting.

Case 2: Consider $\tilde\eta = 2$, which means the regressors are highly correlated (80%), and let $\sigma_\eta^2 = C \frac{1}{\sqrt{p}} \sqrt{p^{2/q_1}/n}$, which implies (5.3). Table 3 shows that for $n = 50$, $p = 1000$, the performance of OGA deteriorates: because the regressors are highly correlated, the frequency of including all relevant variables drops to about 50∼60% when $C = 0.01$ ($\sigma_\eta^2$ is about 0.0001). However, when $n \ge 100$, $q_1 = 10, 12$, and $C \le 5$ ($\sigma_\eta^2 \le 0.024$), OGA includes all relevant variables 98% or more of the time, and HDBIC+Trim identifies the smallest correct model 80% or more of the time. For $n = 200$, $p = 4000$, $q_1 = 10$, $C = 35$ ($\sigma_\eta^2$ is about 0.09), HDBIC+Trim identifies the smallest correct model 85% of the time.
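The noise variance used in Case 2 can be recomputed from $C$, $n$, $p$, and $q_1$. The sketch below assumes the formula $\sigma_\eta^2 = C\,p^{-1/2}\sqrt{p^{2/q_1}/n}$, which is a reconstruction of the garbled expression in the text; it is compared against a few entries of the last column of Table 3 as a sanity check. The function name is illustrative.

```python
import math


def sigma2_eta_case2(c, n, p, q1):
    """ASSUMED Case 2 formula: C * p^(-1/2) * sqrt(p^(2/q1) / n)."""
    return c / math.sqrt(p) * math.sqrt(p ** (2.0 / q1) / n)


# (q1, C, n, p, value reported in the last column of Table 3)
table3_rows = [
    (8, 1, 50, 1000, 0.01061),
    (8, 5, 50, 1000, 0.05303),
    (8, 1, 100, 2000, 0.00578),
    (10, 35, 200, 4000, 0.08969),
    (12, 40, 50, 1000, 0.31811),
]

for q1, c, n, p, reported in table3_rows:
    value = sigma2_eta_case2(c, n, p, q1)
    print(q1, c, n, p, round(value, 5), reported)
```

The recomputed values agree with the tabulated ones to the five decimals reported, which supports this reading of the Case 2 formula.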

Table 2. Case 1 in Example 2, with $\tilde\eta = 0$ and $\sigma_\eta^2 = C\sqrt{p^{2/q_1}/n}$. The other notation is the same as in Table 1.

q1   C      n     p      E     E+1   E+2   E+3   E+4   E+5   Correct  MSPE       σ_η^2
8    0.01   50    1000   913   13    0     0     0     0     926      6.28107    0.00020
8    0.01   100   2000   999   1     0     0     0     0     1000     0.11205    0.00016
8    0.01   200   4000   1000  0     0     0     0     0     1000     0.05414    0.00012
8    1      50    1000   887   13    0     0     0     0     900      7.44817    0.02075
8    1      100   2000   1000  0     0     0     0     0     1000     0.18092    0.01592
8    1      200   4000   1000  0     0     0     0     0     1000     0.08328    0.01223
8    5      50    1000   459   8     1     0     0     0     468      31.75064   0.11312
8    5      100   2000   999   1     0     0     0     0     1000     0.53822    0.08503
8    5      200   4000   1000  0     0     0     0     0     1000     0.20888    0.06431
8    20     50    1000   0     0     0     0     0     0     0        45.31566   0.68492
8    20     100   2000   6     0     0     0     0     0     6        34.84350   0.45657
8    20     200   4000   963   0     0     0     0     0     963      1.03303    0.31875
10   0.01   50    1000   569   125   46    24    20    16    920      9.10107    0.00017
10   0.01   100   2000   984   16    0     0     0     0     1000     0.10932    0.00013
10   0.01   200   4000   1000  0     0     0     0     0     1000     0.05647    0.00010
10   1      50    1000   559   130   59    26    11    9     911      8.49444    0.01740
10   1      100   2000   980   19    1     0     0     0     1000     0.19531    0.01313
10   1      200   4000   997   3     0     0     0     0     1000     0.07708    0.00992
10   5      50    1000   493   101   35    18    10    8     763      18.06490   0.09350
10   5      100   2000   983   16    1     0     0     0     1000     0.50227    0.06929
10   5      200   4000   999   1     0     0     0     0     1000     0.19478    0.05165
10   35     50    1000   0     0     0     0     0     0     0        42.26501   1.49096
10   35     100   2000   13    2     0     0     0     0     15       28.75040   0.83021
10   35     200   4000   901   0     0     0     0     0     901      1.74922    0.52387
12   0.01   50    1000   6     2     1     0     1     1     932      14.44258   0.00015
12   0.01   100   2000   789   151   37    16    4     1     1000     0.21749    0.00011
12   0.01   200   4000   970   28    2     0     0     0     1000     0.05420    0.00009
12   1      50    1000   0     4     1     1     0     0     908      17.25881   0.01548
12   1      100   2000   781   170   35    5     4     1     1000     0.39322    0.01155
12   1      200   4000   973   25    1     1     0     0     1000     0.08424    0.00863
12   5      50    1000   3     3     1     0     2     8     777      41.15342   0.08249
12   5      100   2000   798   147   39    7     3     2     1000     0.81430    0.06055
12   5      200   4000   999   1     0     0     0     0     1000     0.19478    0.05165
12   45     50    1000   0     0     0     0     0     0     0        119.84890  2.18342
12   45     100   2000   14    3     1     0     0     0     18       26.23490   1.05687
12   45     200   4000   915   25    2     0     0     0     942      1.53381    0.62584

Table 3. Case 2 in Example 2, with $\tilde\eta = 2$ and $\sigma_\eta^2 = C \frac{1}{\sqrt{p}} \sqrt{p^{2/q_1}/n}$. The other notation is the same as in Table 1.

q1   C      n     p      E     E+1   E+2   E+3   E+4   E+5   Correct  MSPE       σ_η^2
8    0.01   50    1000   507   3     3     0     0     0     513      4.75287    0.00011
8    0.01   100   2000   1000  0     0     0     0     0     1000     0.04985    0.00006
8    0.01   200   4000   1000  0     0     0     0     0     1000     0.02725    0.00003
8    1      50    1000   214   3     0     0     0     0     217      7.69615    0.01061
8    1      100   2000   994   0     0     0     0     0     994      0.09892    0.00578
8    1      200   4000   1000  0     0     0     0     0     1000     0.03679    0.00315
8    5      50    1000   2     0     0     0     0     0     2        11.83424   0.05303
8    5      100   2000   711   0     0     0     0     0     711      0.45313    0.02891
8    5      200   4000   1000  0     0     0     0     0     1000     0.08263    0.01576
8    15     50    1000   0     0     0     0     0     0     0        15.98425   0.15908
8    15     100   2000   2     0     0     0     0     0     2        9.95848    0.08674
8    15     200   4000   971   0     0     0     0     0     971      0.48686    0.04729
10   0.01   50    1000   466   77    21    21    3     4     610      3.65043    0.00010
10   0.01   100   2000   990   10    0     0     0     0     1000     0.05153    0.00005
10   0.01   200   4000   1000  0     0     0     0     0     1000     0.02699    0.00003
10   1      50    1000   343   43    19    6     2     3     429      5.37950    0.00892
10   1      100   2000   982   17    0     0     0     0     999      0.07926    0.00478
10   1      200   4000   997   3     0     0     0     0     1000     0.02730    0.00256
10   5      50    1000   52    8     2     1     0     0     67       9.36974    0.04462
10   5      100   2000   958   21    1     0     0     0     980      0.33916    0.02391
10   5      200   4000   1000  1     0     0     0     0     1000     0.07646    0.01281
10   35     50    1000   0     0     0     0     0     0     0        20.92983   0.31231
10   35     100   2000   1     0     0     0     0     0     1        13.52782   0.16736
10   35     200   4000   851   8     0     0     0     0     859      1.35893    0.08969
12   0.01   50    1000   64    20    13    7     10    14    573      5.72614    0.00008
12   0.01   100   2000   834   122   32    6     5     0     999      0.09410    0.00004
12   0.01   200   4000   980   19    1     0     0     0     1000     0.02973    0.00002
12   1      50    1000   59    17    14    9     8     1     466      7.44604    0.00795
12   1      100   2000   855   105   28    7     4     0     1000     0.10421    0.00421
12   1      200   4000   978   22    0     0     0     0     1000     0.03700    0.00223
12   5      50    1000   12    10    2     2     3     0     97       12.88040   0.03976
12   5      100   2000   803   137   35    10    4     0     993      0.31848    0.02106
12   5      200   4000   969   30    1     0     0     0     1000     0.07507    0.01116
12   40     50    1000   0     0     0     0     0     0     0        26.24179   0.31811
12   40     100   2000   7     7     0     0     0     0     14       11.19198   0.16851
12   40     200   4000   816   144   2     0     0     0     962      1.10595    0.08927

References

Abhirup Datta and Hui Zou. (2016). CoCoLasso for High-dimensional Error-in-variables Regression. https://arxiv.org/abs/1510.07123.

Alexandre Belloni, Mathieu Rosenbaum and Alexandre B. Tsybakov. (2014). An {ℓ1, ℓ2, ℓ∞}-Regularization Approach to High-Dimensional Errors-in-variables Models. https://arxiv.org/abs/1412.7216.

Alexandre Belloni, Mathieu Rosenbaum and Alexandre B. Tsybakov. (2016). Linear and Conic Programming Estimators in High-Dimensional Errors-in-variables Models. https://arxiv.org/abs/1408.0241.

Ching-Kang Ing and Tze Leung Lai. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statist. Sinica, 1473-1513.

Ching-Kang Ing and Kunling Huang. (2016). Model Selection for High-Dimensional Multivariate Time Dependent Models (Unpublished master's thesis). National Taiwan University, Taipei City.

C. Z. Wei. (1987). Adaptive prediction by least squares predictors in stochastic regression models with applications to time series. Ann. Statist. 15(4):1667-1682.

David F. Findley and Ching-Zong Wei. (1993). Moment bounds for deriving time series CLTs and model selection procedures. Statist. Sinica, 453-480.

Po-Ling Loh and Martin J. Wainwright. (2012). High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Statist. 40(3):1637-1664.

Temlyakov, V. N. (2000). Weak greedy algorithms. Adv. Comput. Math. 12, 213-227.

Appendix

Before we prove (3.4), three lemmas are needed first:

Lemma 1. Assume (C2), max

1≤i,j≤p

Lemma 2. Assume (C1)-(C4), max

1≤i≤p it follows from (C2) that

1≤i,j≤pmax

Similarly,

Proof of Lemma 2. Note that E( max

where the third inequality comes from Lemma 2 in Wei (1987), $K$ is a positive constant that depends only on $q_1$ and $C_2$ in (C1), and the last inequality comes from Jensen's inequality. From (C1), it follows that

E1 = O(

by (C1) and similar arguments in the derivation of (A.4), it follows that E( max

The last equality comes from Lemma 1, (A.5) , (C3) and (C4). By (A.4)-(A.7) and Markov’s inequality, the proof of Lemma 2 is complete.

Proof of Lemma 3. For (A.1), note that max Lemma 1, we have the desired conclusion. By (A7) and (A8) in Ing and Lai (2011) and (A.1), we have (A.2). Furthermore, since max

1≤#(J )≤Kn

so, we have (A.3), and the tools for proving (3.4) are ready.

Proof of (3.4). Since by Lemma 1, n12||wi||−→ σp ii ≥ min

where wti;J = wti− γiT(J )R−1(J )wt(J ). Since and the proof of (3.4) is complete.

Proof of (4.4). By the proof of (3.4), it follows that P (| ˆBn| ≥ θLn, Dn) ≤ P ( max

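the equality comes from (C1), (C7), (C8), (A.8), and Lemmas 1-3. Note that

$$\begin{aligned}
P(\hat A_n < v_n/2,\, D_n) &\le P\big(\lambda_{\min}(\hat R(\hat J_{\tilde k})) < v_n/2,\, D_n\big) \\
&\le P\big(\lambda_{\min}(\hat R(\hat J_{m_0})) < v_n/2\big) \\
&\le P\Big(\max_{1 \le \#(J) \le m_0} \|\hat R(J) - R(J)\| > \delta/2\Big) \\
&= o(1),
\end{aligned} \tag{A.10}$$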

the equality comes from Lemma 3, and P (| ˆEn| ≥ θL2n, Dn)

≤ P (||U e

||1 max

1≤j≤pj?|( max

1≤i,j≤p 1 n|Pn

t=1wtiwtj− σij| + max

1≤i,j≤pij|) ≥ θ2L2n) +P (||U

e

||12(S2,n+ S3,n) ≥ θ2L2n)

= o(1). (A.11)

where S2,n, S3,n are the same as those in the proof of (3.4), the equality comes from (C1), (C3), (C7), Lemma 1 and the proof of (3.4). By (A.8)-(A.11), the proof of (4.4) is complete.

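Proof of (4.7). Since there exist a constant $\tilde\lambda > 0$ and $\zeta_n \to \infty$ such that

$$\frac{n\big(1 - \exp(-n^{-1} w_n (\hat k - \tilde k) p^{2/q_1})\big)}{\hat k - \tilde k} \ge \tilde\lambda \min\{(n p^{2/q_1})^{1/2},\, w_n p^{2/q_1}\} = \zeta_n p^{2/q_1}, \tag{A.12}$$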

it follows from (A.12) that

P ((ˆk − ˜k)(ˆan+ ˆbn) ≥ θn(1 − exp(−n−1wn(ˆk − ˜k)pq12 )), ˆk > ˜k)

≤ P (|| ˆR−1( ˆJKn)|| max

1≤i≤p(n12 Pn

t=1wtiξ?t)2θ2ζnpq12 ) +P (|| ˆR−1( ˆJKn)||||n(S2,n+ S3,n)|| ≥ θ2ζnpq12 )

= o(1), (A.13)

the equality comes from Lemma 2, 3 and proof of (3.4). Note that P ((P

l /∈ ˆJ˜kβl?wl)T(HJˆ

ˆk− HJˆ

˜k)(P

l /∈ ˆJk˜βl?wl)

≥ θn(1 − exp(−n−1wn(ˆk − ˜k)pq12 )), ˆk > ˜k)

≤ P (||U e

||21( max

1≤i,j≤p 1 n|Pn

t=1wtiwtj− σij| + max

1≤i,j≤pij|)

≥ θ min{

r

p

q12

n , n−1wnpq12 }(ˆk − ˜k), ˆk > ˜k)

≤ P (||U e

||21( max

1≤i,j≤p 1 n|Pn

t=1wtiwtj− σij| + max

1≤i,j≤pij|) ≥ θ min{

r

p

q12

n , L4n})

= o(1), (A.14)

and P (|(P

l /∈ ˆJ˜kβl?wl)T(HJˆˆk− HJˆ˜k)ξ e

?| ≥ θn(1 −exp(−n−1wn(ˆk − ˜k)pq12 )), ˆk > ˜k)

≤ P (||U e

||12(S2,n+ S3,n) ≥ θ min{

r

p

q12

n , L4n})

= o(1). (A.15)

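and similar to (A.14) and (A.15), it follows that

$$P\Big(\Big(\sum_{l \notin \hat J_{\tilde k}} \beta^{*}_l w_l\Big)^T (I - H_{\hat J_{\tilde k}}) \Big(\sum_{l \notin \hat J_{\tilde k}} \beta^{*}_l w_l\Big) \ge \theta n,\ \hat k > \tilde k\Big) = o(1), \tag{A.16}$$

$$P\Big(\Big|\Big(\sum_{l \notin \hat J_{\tilde k}} \beta^{*}_l w_l\Big)^T (I - H_{\hat J_{\tilde k}})\, \xi^{*}\Big| \ge \theta n,\ \hat k > \tilde k\Big) = o(1). \tag{A.17}$$

So, by (A.13)-(A.17), the proof of (4.7) is complete.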
