5 Simulation Studies

In this section, we report simulation studies of the performance of OGA+HDBIC+Trim. These simulations consider the regression model

$$y_t^{*} = \sum_{j=1}^{p^{0}} \beta_j^{*} w_{tj} + \sum_{j=p^{0}+1}^{p} \beta_j^{*} w_{tj} + \xi_t^{*}, \quad t = 1, 2, \ldots, n, \qquad (5.1)$$

where $\beta_{p^{0}+1} = \beta_{p^{0}+2} = \cdots = \beta_p = 0$ and $p \gg n$. The $\eta_{tj}$ are i.i.d. $N(0, \sigma_\eta^2)$ for all $t = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p$, and are independent of $x_{tj}$. The $\xi_t$ are i.i.d. $N(0, \sigma_\xi^2)$ and are independent of $x_{tj}$ and $\eta_{tj}$. The $\eta_{yt}$ are i.i.d. $N(0, \sigma_{\eta_y}^2)$ and are independent of $x_{tj}$, $\eta_{tj}$ and $\xi_t$.

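Examples 1 and 2 consider the case

$$x_{tj} = d_{tj} + \tilde{\eta}\,\tilde{x}_t, \qquad (5.2)$$

in which $\tilde{\eta} \ge 0$ and $(d_{t1}, d_{t2}, \ldots, d_{tp}, \tilde{x}_t)^{T}$, $t = 1, 2, \ldots, n$, are i.i.d. normal with mean $(1, 1, \ldots, 1, 0)^{T}$ and covariance matrix $I$. We standardize the variance of $x_{tj}$ by replacing $x_{tj}$ with $x_{tj}/\sqrt{1+\tilde{\eta}^2}$. Since, for any $J \subset \{1, 2, \ldots, p\}$ and $1 \le i \le p$ with $i \notin J$,

$$\lambda_{\min}(R(J)) = \frac{1}{1+\tilde{\eta}^2} + \sigma_\eta^2 > 0 \quad \text{and} \quad \|R^{-1}(J)\gamma_i(J)\|_1 < 1,$$

(3.1) is satisfied. Moreover, $\mathrm{Corr}(w_{ti}, w_{tj}) = \tilde{\eta}^2/(1+\tilde{\eta}^2)$ increases as $\tilde{\eta}$ grows.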
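The eigenvalue and correlation claims above can be checked numerically. The sketch below is illustrative only and rests on assumptions not spelled out in this section: that the observed regressors are $w_{tj} = x_{tj} + \eta_{tj}$, that $R(J)$ is the population covariance matrix of the $w_{tj}$, $j \in J$, and that the correlation is computed for the standardized $x_{tj}$ before measurement noise is added. The function name `noisy_design_covariance` is hypothetical.

```python
import numpy as np

def noisy_design_covariance(p, eta_tilde, sigma2_eta):
    """Population covariance of (w_t1, ..., w_tp), ASSUMING w = x + eta.

    x_tj = (d_tj + eta_tilde * x_tilde_t) / sqrt(1 + eta_tilde**2), with d_tj
    and x_tilde_t independent and of unit variance, as in (5.2).
    """
    scale = 1.0 + eta_tilde ** 2
    cov_x = (np.eye(p) + eta_tilde ** 2 * np.ones((p, p))) / scale
    return cov_x + sigma2_eta * np.eye(p)

p, eta_tilde, sigma2_eta = 6, 2.0, 0.1
gamma = noisy_design_covariance(p, eta_tilde, sigma2_eta)

# Smallest eigenvalue versus the closed form 1/(1 + eta^2) + sigma_eta^2.
lam_min = np.linalg.eigvalsh(gamma).min()
lam_formula = 1.0 / (1.0 + eta_tilde ** 2) + sigma2_eta

# Correlation between two standardized regressors before noise is added.
cov_x = gamma - sigma2_eta * np.eye(p)
corr_x = cov_x[0, 1] / np.sqrt(cov_x[0, 0] * cov_x[1, 1])

print(lam_min, lam_formula, corr_x)
```

With $\tilde{\eta} = 2$ the off-diagonal correlation equals $4/5$, which is the 80% figure quoted for the highly correlated case.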

Example 1. Consider (5.1) with $p^{0} = 5$, $(\beta_1, \beta_2, \ldots, \beta_5) = (3, -3.5, 4, -2.8, 3.2)$, $\sigma_\xi^2 = 1$, $\sigma_{\eta_y}^2 = 0.01$, and assume that (5.2) holds. We consider $\tilde{\eta} = 0$, which means the regressors are uncorrelated, $\sigma_\eta^2 = 0.01, 0.05, 0.1$, and $(n, p) = (50, 1000), (100, 2000), (200, 4000)$. We choose $K_n = \lfloor 5 (n/p^{2/q_1})^{1/2} \rfloor$ and allow $q_1$ to vary between 4 and 15. We have also allowed $D$ in $K_n = \lfloor D (n/p^{2/q_1})^{1/2} \rfloor$ to vary between 3 and 10, and the results are similar to those for $D = 5$. We perform 1000 simulations for each case. Define the mean squared prediction error

$$\mathrm{MSPE} = \frac{1}{1000} \sum_{l=1}^{1000} \Big( \sum_{j=1}^{p} \beta_j^{*} w_{n+1,j}^{(l)} - \hat{y}_{n+1}^{(l)} \Big)^2,$$

in which $x^{(l)}_{n+1,1}, x^{(l)}_{n+1,2}, \ldots, x^{(l)}_{n+1,p}$ are the regressors associated with $y^{(l)}_{n+1}$, the new outcome in the $l$th simulation run, and $\hat{y}^{(l)}_{n+1}$ denotes the predictor of $y^{(l)}_{n+1}$. Table 1 shows that OGA+HDBIC+Trim is very sensitive to the order of the moment bound $q_1$: it performs well with a suitable $q_1$ but poorly otherwise. If $q_1$ is too small, the penalty on the number of predictor variables in HDBIC is too large, so OGA+HDBIC tends to underfit; if $q_1$ is too large, the penalty is too small, so OGA+HDBIC tends to overfit. With moderate orders of the moment bound ($q_1 = 8, 10$) and $n \ge 100$, OGA includes the 5 relevant regressors within $K_n$ iterations in 99.9% or more of the simulations, and HDBIC+Trim identifies the smallest correct model in 98% or more of the simulations.
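As a minimal sketch of how the MSPE defined above could be computed from stored simulation output, the helper below averages the squared gap between the noise-free target $\sum_j \beta_j^{*} w^{(l)}_{n+1,j}$ and the prediction over runs. The function name `mspe` and the array layout (one row per simulation run) are assumptions made for illustration, not part of the original study.

```python
import numpy as np

def mspe(beta, w_new, y_pred):
    """Mean over runs of (sum_j beta_j * w_new[l, j] - y_pred[l]) ** 2.

    beta   : (p,) coefficient vector
    w_new  : (runs, p) regressors of the new observation in each run
    y_pred : (runs,) predictions of the new outcome in each run
    """
    beta = np.asarray(beta, dtype=float)
    w_new = np.asarray(w_new, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    target = w_new @ beta  # noise-free part of the new outcome
    return float(np.mean((target - y_pred) ** 2))

# Toy usage with the five nonzero coefficients of Example 1.
beta = np.array([3.0, -3.5, 4.0, -2.8, 3.2])
rng = np.random.default_rng(0)
w_new = rng.normal(size=(1000, 5))
oracle = w_new @ beta
print(mspe(beta, w_new, oracle))        # an oracle predictor has zero error
print(mspe(beta, w_new, oracle + 1.0))  # a constant bias of 1 is squared
```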

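Table 1. Frequency, in 1000 simulations, of including all five relevant variables (Correct), of selecting exactly the relevant variables (E), and of selecting all relevant variables plus i irrelevant variables (E+i).

| σ²_η | q₁ | n | p | E | E+1 | E+2 | E+3 | E+4 | E+5 | Correct | MSPE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 4 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 64.02502 |
| | | 100 | 2000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 53.08281 |
| | | 200 | 4000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 55.59686 |
| | 6 | 50 | 1000 | 623 | 0 | 0 | 0 | 0 | 0 | 623 | 24.54740 |
| | | 100 | 2000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.15931 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.08096 |
| | 8 | 50 | 1000 | 911 | 18 | 0 | 0 | 0 | 0 | 929 | 4.34789 |
| | | 100 | 2000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.17550 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.08053 |
| | 10 | 50 | 1000 | 571 | 129 | 43 | 17 | 17 | 7 | 922 | 10.29011 |
| | | 100 | 2000 | 983 | 16 | 1 | 0 | 0 | 0 | 1000 | 0.17837 |
| | | 200 | 4000 | 999 | 1 | 0 | 0 | 0 | 0 | 1000 | 0.16207 |
| | 15 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 914 | 14.44628 |
| | | 100 | 2000 | 21 | 12 | 10 | 7 | 3 | 2 | 1000 | 4.70902 |
| | | 200 | 4000 | 677 | 225 | 75 | 14 | 5 | 2 | 1000 | 0.19443 |
| 0.05 | 5 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 65.54043 |
| | | 100 | 2000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 53.91476 |
| | | 200 | 4000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 47.64495 |
| | 6 | 50 | 1000 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 59.51543 |
| | | 100 | 2000 | 689 | 0 | 0 | 0 | 0 | 0 | 689 | 16.94148 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.18862 |
| | 8 | 50 | 1000 | 816 | 16 | 2 | 0 | 0 | 0 | 834 | 13.14926 |
| | | 100 | 2000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.39365 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.17408 |
| | 10 | 50 | 1000 | 522 | 118 | 36 | 21 | 8 | 14 | 861 | 13.67555 |
| | | 100 | 2000 | 983 | 16 | 1 | 0 | 0 | 0 | 1000 | 0.44005 |
| | | 200 | 4000 | 998 | 2 | 0 | 0 | 0 | 0 | 1000 | 0.17630 |
| | 15 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 854 | 26.64408 |
| | | 100 | 2000 | 11 | 17 | 12 | 10 | 3 | 0 | 1000 | 10.66257 |
| | | 200 | 4000 | 683 | 218 | 75 | 19 | 1 | 2 | 1000 | 0.43310 |
| 0.1 | 5.5 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59.78768 |
| | | 100 | 2000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 50.27432 |
| | | 200 | 4000 | 28 | 0 | 0 | 0 | 0 | 0 | 28 | 45.76851 |
| | 6 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59.57889 |
| | | 100 | 2000 | 20 | 0 | 0 | 0 | 0 | 0 | 20 | 46.85445 |
| | | 200 | 4000 | 973 | 0 | 0 | 0 | 0 | 0 | 973 | 1.57907 |
| | 8 | 50 | 1000 | 507 | 9 | 0 | 0 | 0 | 0 | 516 | 28.21477 |
| | | 100 | 2000 | 999 | 0 | 0 | 0 | 0 | 0 | 999 | 0.65213 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.31026 |
| | 10 | 50 | 1000 | 493 | 94 | 36 | 12 | 8 | 9 | 744 | 22.35131 |
| | | 100 | 2000 | 987 | 13 | 0 | 0 | 0 | 0 | 1000 | 0.63598 |
| | | 200 | 4000 | 999 | 1 | 0 | 0 | 0 | 0 | 1000 | 0.29356 |
| | 15 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 773 | 43.94780 |
| | | 100 | 2000 | 16 | 13 | 8 | 4 | 5 | 0 | 1000 | 16.68930 |
| | | 200 | 4000 | 684 | 222 | 66 | 20 | 3 | 2 | 1000 | 0.83435 |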

Example 2. The settings of this example are the same as in Example 1, except that we allow $\sigma_\eta^2$ to have a rate of convergence such that

$$\|\tilde{U}\|_1 \le C \sqrt{\frac{p^{2/q_1}}{n}}, \qquad (5.3)$$

for some $C$ varying between 0.01 and 45. Two cases are considered here:

Case 1: Consider $\tilde{\eta} = 0$, which means the regressors are uncorrelated, and let $\sigma_\eta^2 = C \sqrt{p^{2/q_1}/n}$. In this case, the inequality in (5.3) becomes an equality.

Table 2 shows that OGA+HDBIC+Trim agrees with the asymptotic theory of Theorem 4. When $n = 50$ and $p = 1000$, OGA includes all relevant variables at least 90% of the time if $C \le 1$ ($\sigma_\eta^2 \le 0.02$); furthermore, with a suitable order of the moment bound ($q_1 = 8$), OGA+HDBIC+Trim identifies the smallest correct model over 88% of the time. When $n \ge 100$, OGA always includes all relevant variables when $C \le 5$ ($\sigma_\eta^2 \le 0.085$); furthermore, when $q_1 = 8, 10$, HDBIC+Trim identifies the smallest correct model at least 98% of the time. When $n = 200$, $p = 4000$ and $q_1 = 12$, even if $\sigma_\eta^2 = 0.625$, which is 62.5% of the variance of the true input variables, OGA+HDBIC+Trim still identifies the smallest correct model 91.5% of the time. Since the HDBIC penalty for each additional predictor variable is $\log(n)\,p^{2/q_1}$, a small $q_1$ is appropriate when $n$ is small, to prevent overfitting; when $n$ is large, a larger $q_1$ can be tolerated without serious overfitting.
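To make the role of $q_1$ concrete, the following sketch tabulates the per-variable penalty $\log(n)\,p^{2/q_1}$ and the iteration bound $K_n = \lfloor D (n/p^{2/q_1})^{1/2} \rfloor$ with $D = 5$ for the three $(n, p)$ pairs used in the simulations. The function names `k_n` and `per_variable_penalty` are hypothetical, and the code only evaluates these two formulas; it does not implement OGA or HDBIC.

```python
import math

def k_n(n, p, q1, D=5.0):
    """Iteration bound K_n = floor(D * sqrt(n / p**(2/q1)))."""
    return math.floor(D * math.sqrt(n / p ** (2.0 / q1)))

def per_variable_penalty(n, p, q1):
    """Penalty added per selected variable: log(n) * p**(2/q1)."""
    return math.log(n) * p ** (2.0 / q1)

for n, p in [(50, 1000), (100, 2000), (200, 4000)]:
    for q1 in (4, 6, 8, 10, 15):
        print(n, p, q1, k_n(n, p, q1),
              round(per_variable_penalty(n, p, q1), 2))
```

The penalty shrinks as $q_1$ grows, which matches the pattern in the tables: small $q_1$ pushes the procedure toward underfitting and large $q_1$ toward overfitting.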

Case 2: Consider $\tilde{\eta} = 2$, which means the regressors are highly correlated (80%), and let $\sigma_\eta^2 = \frac{C}{\sqrt{p}} \sqrt{p^{2/q_1}/n}$, which implies (5.3). Table 3 shows that when $n = 50$ and $p = 1000$, the performance of OGA deteriorates: because the regressors are highly correlated, the proportion of runs in which all relevant variables are included drops to about 50% to 60% even when $C = 0.01$ ($\sigma_\eta^2$ is about 0.0001). However, when $n \ge 100$, $q_1 = 10, 12$ and $C \le 5$ ($\sigma_\eta^2 \le 0.024$), OGA includes all relevant variables 98% or more of the time, and HDBIC+Trim identifies the smallest correct model 80% or more of the time. When $n = 200$, $p = 4000$, $q_1 = 10$ and $C = 35$ ($\sigma_\eta^2$ is about 0.09), HDBIC+Trim identifies the smallest correct model 85% of the time.
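The Case 2 noise variance can be recomputed directly. The sketch below reads the formula as $\sigma_\eta^2 = (C/\sqrt{p})\sqrt{p^{2/q_1}/n}$ and compares it with a few $\sigma_\eta^2$ entries reported in Table 3; the function name `sigma2_eta_case2` is hypothetical, and the reference values are copied from the table rather than derived independently.

```python
import math

def sigma2_eta_case2(C, n, p, q1):
    """Case 2 noise variance: (C / sqrt(p)) * sqrt(p**(2/q1) / n)."""
    return C / math.sqrt(p) * math.sqrt(p ** (2.0 / q1) / n)

# (q1, C, n, p, sigma2_eta reported in Table 3)
table3_rows = [
    (8, 1, 50, 1000, 0.01061),
    (8, 5, 100, 2000, 0.02891),
    (10, 5, 100, 2000, 0.02391),
    (10, 35, 200, 4000, 0.08969),
]

for q1, C, n, p, reported in table3_rows:
    computed = sigma2_eta_case2(C, n, p, q1)
    print(q1, C, n, p, round(computed, 5), reported)
```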

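Table 2. Case 1 in Example 2, with $\tilde{\eta} = 0$ and $\sigma_\eta^2 = C \sqrt{p^{2/q_1}/n}$. The other notation is the same as in Table 1.

| q₁ | C | n | p | E | E+1 | E+2 | E+3 | E+4 | E+5 | Correct | MSPE | σ²_η |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 0.01 | 50 | 1000 | 913 | 13 | 0 | 0 | 0 | 0 | 926 | 6.28107 | 0.00020 |
| | | 100 | 2000 | 999 | 1 | 0 | 0 | 0 | 0 | 1000 | 0.11205 | 0.00016 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.05414 | 0.00012 |
| | 1 | 50 | 1000 | 887 | 13 | 0 | 0 | 0 | 0 | 900 | 7.44817 | 0.02075 |
| | | 100 | 2000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.18092 | 0.01592 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.08328 | 0.01223 |
| | 5 | 50 | 1000 | 459 | 8 | 1 | 0 | 0 | 0 | 468 | 31.75064 | 0.11312 |
| | | 100 | 2000 | 999 | 1 | 0 | 0 | 0 | 0 | 1000 | 0.53822 | 0.08503 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.20888 | 0.06431 |
| | 20 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 45.31566 | 0.68492 |
| | | 100 | 2000 | 6 | 0 | 0 | 0 | 0 | 0 | 6 | 34.84350 | 0.45657 |
| | | 200 | 4000 | 963 | 0 | 0 | 0 | 0 | 0 | 963 | 1.03303 | 0.31875 |
| 10 | 0.01 | 50 | 1000 | 569 | 125 | 46 | 24 | 20 | 16 | 920 | 9.10107 | 0.00017 |
| | | 100 | 2000 | 984 | 16 | 0 | 0 | 0 | 0 | 1000 | 0.10932 | 0.00013 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.05647 | 0.00010 |
| | 1 | 50 | 1000 | 559 | 130 | 59 | 26 | 11 | 9 | 911 | 8.49444 | 0.01740 |
| | | 100 | 2000 | 980 | 19 | 1 | 0 | 0 | 0 | 1000 | 0.19531 | 0.01313 |
| | | 200 | 4000 | 997 | 3 | 0 | 0 | 0 | 0 | 1000 | 0.07708 | 0.00992 |
| | 5 | 50 | 1000 | 493 | 101 | 35 | 18 | 10 | 8 | 763 | 18.06490 | 0.09350 |
| | | 100 | 2000 | 983 | 16 | 1 | 0 | 0 | 0 | 1000 | 0.50227 | 0.06929 |
| | | 200 | 4000 | 999 | 1 | 0 | 0 | 0 | 0 | 1000 | 0.19478 | 0.05165 |
| | 35 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 42.26501 | 1.49096 |
| | | 100 | 2000 | 13 | 2 | 0 | 0 | 0 | 0 | 15 | 28.75040 | 0.83021 |
| | | 200 | 4000 | 901 | 0 | 0 | 0 | 0 | 0 | 901 | 1.74922 | 0.52387 |
| 12 | 0.01 | 50 | 1000 | 6 | 2 | 1 | 0 | 1 | 1 | 932 | 14.44258 | 0.00015 |
| | | 100 | 2000 | 789 | 151 | 37 | 16 | 4 | 1 | 1000 | 0.21749 | 0.00011 |
| | | 200 | 4000 | 970 | 28 | 2 | 0 | 0 | 0 | 1000 | 0.05420 | 0.00009 |
| | 1 | 50 | 1000 | 0 | 4 | 1 | 1 | 0 | 0 | 908 | 17.25881 | 0.01548 |
| | | 100 | 2000 | 781 | 170 | 35 | 5 | 4 | 1 | 1000 | 0.39322 | 0.01155 |
| | | 200 | 4000 | 973 | 25 | 1 | 1 | 0 | 0 | 1000 | 0.08424 | 0.00863 |
| | 5 | 50 | 1000 | 3 | 3 | 1 | 0 | 2 | 8 | 777 | 41.15342 | 0.08249 |
| | | 100 | 2000 | 798 | 147 | 39 | 7 | 3 | 2 | 1000 | 0.81430 | 0.06055 |
| | | 200 | 4000 | 999 | 1 | 0 | 0 | 0 | 0 | 1000 | 0.19478 | 0.05165 |
| | 45 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 119.84890 | 2.18342 |
| | | 100 | 2000 | 14 | 3 | 1 | 0 | 0 | 0 | 18 | 26.23490 | 1.05687 |
| | | 200 | 4000 | 915 | 25 | 2 | 0 | 0 | 0 | 942 | 1.53381 | 0.62584 |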

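Table 3. Case 2 in Example 2, with $\tilde{\eta} = 2$ and $\sigma_\eta^2 = \frac{C}{\sqrt{p}} \sqrt{p^{2/q_1}/n}$. The other notation is the same as in Table 1.

| q₁ | C | n | p | E | E+1 | E+2 | E+3 | E+4 | E+5 | Correct | MSPE | σ²_η |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 0.01 | 50 | 1000 | 507 | 3 | 3 | 0 | 0 | 0 | 513 | 4.75287 | 0.00011 |
| | | 100 | 2000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.04985 | 0.00006 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.02725 | 0.00003 |
| | 1 | 50 | 1000 | 214 | 3 | 0 | 0 | 0 | 0 | 217 | 7.69615 | 0.01061 |
| | | 100 | 2000 | 994 | 0 | 0 | 0 | 0 | 0 | 994 | 0.09892 | 0.00578 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.03679 | 0.00315 |
| | 5 | 50 | 1000 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 11.83424 | 0.05303 |
| | | 100 | 2000 | 711 | 0 | 0 | 0 | 0 | 0 | 711 | 0.45313 | 0.02891 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.08263 | 0.01576 |
| | 15 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.98425 | 0.15908 |
| | | 100 | 2000 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 9.95848 | 0.08674 |
| | | 200 | 4000 | 971 | 0 | 0 | 0 | 0 | 0 | 971 | 0.48686 | 0.04729 |
| 10 | 0.01 | 50 | 1000 | 466 | 77 | 21 | 21 | 3 | 4 | 610 | 3.65043 | 0.00010 |
| | | 100 | 2000 | 990 | 10 | 0 | 0 | 0 | 0 | 1000 | 0.05153 | 0.00005 |
| | | 200 | 4000 | 1000 | 0 | 0 | 0 | 0 | 0 | 1000 | 0.02699 | 0.00003 |
| | 1 | 50 | 1000 | 343 | 43 | 19 | 6 | 2 | 3 | 429 | 5.37950 | 0.00892 |
| | | 100 | 2000 | 982 | 17 | 0 | 0 | 0 | 0 | 999 | 0.07926 | 0.00478 |
| | | 200 | 4000 | 997 | 3 | 0 | 0 | 0 | 0 | 1000 | 0.02730 | 0.00256 |
| | 5 | 50 | 1000 | 52 | 8 | 2 | 1 | 0 | 0 | 67 | 9.36974 | 0.04462 |
| | | 100 | 2000 | 958 | 21 | 1 | 0 | 0 | 0 | 980 | 0.33916 | 0.02391 |
| | | 200 | 4000 | 1000 | 1 | 0 | 0 | 0 | 0 | 1000 | 0.07646 | 0.01281 |
| | 35 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.92983 | 0.31231 |
| | | 100 | 2000 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 13.52782 | 0.16736 |
| | | 200 | 4000 | 851 | 8 | 0 | 0 | 0 | 0 | 859 | 1.35893 | 0.08969 |
| 12 | 0.01 | 50 | 1000 | 64 | 20 | 13 | 7 | 10 | 14 | 573 | 5.72614 | 0.00008 |
| | | 100 | 2000 | 834 | 122 | 32 | 6 | 5 | 0 | 999 | 0.09410 | 0.00004 |
| | | 200 | 4000 | 980 | 19 | 1 | 0 | 0 | 0 | 1000 | 0.02973 | 0.00002 |
| | 1 | 50 | 1000 | 59 | 17 | 14 | 9 | 8 | 1 | 466 | 7.44604 | 0.00795 |
| | | 100 | 2000 | 855 | 105 | 28 | 7 | 4 | 0 | 1000 | 0.10421 | 0.00421 |
| | | 200 | 4000 | 978 | 22 | 0 | 0 | 0 | 0 | 1000 | 0.03700 | 0.00223 |
| | 5 | 50 | 1000 | 12 | 10 | 2 | 2 | 3 | 0 | 97 | 12.88040 | 0.03976 |
| | | 100 | 2000 | 803 | 137 | 35 | 10 | 4 | 0 | 993 | 0.31848 | 0.02106 |
| | | 200 | 4000 | 969 | 30 | 1 | 0 | 0 | 0 | 1000 | 0.07507 | 0.01116 |
| | 40 | 50 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 26.24179 | 0.31811 |
| | | 100 | 2000 | 7 | 7 | 0 | 0 | 0 | 0 | 14 | 11.19198 | 0.16851 |
| | | 200 | 4000 | 816 | 144 | 2 | 0 | 0 | 0 | 962 | 1.10595 | 0.08927 |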

References

Abhirup Datta and Hui Zou. (2016). CoCoLasso for High-dimensional Error-in-variables Regression. https://arxiv.org/abs/1510.07123.

Alexandre Belloni, Mathieu Rosenbaum and Alexandre B. Tsybakov. (2014). An {ℓ₁, ℓ₂, ℓ_∞}-Regularization Approach to High-Dimensional Errors-in-variables Models. https://arxiv.org/abs/1412.7216.

Alexandre Belloni, Mathieu Rosenbaum and Alexandre B. Tsybakov. (2016). Linear and Conic Programming Estimators in High-Dimensional Errors-in-variables Models. https://arxiv.org/abs/1408.0241.

Ching-Kang Ing and Tze Leung Lai. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statist. Sinica, 1473-1513.

Ching-Kang Ing and Kunling Huang. (2016). Model Selection for High-Dimensional Multivariate Time Dependent Models (Unpublished master’s thesis). National Taiwan University, Taipei City.

C. Z. Wei. (1987). Adaptive prediction by least squares predictors in stochastic regression models with applications to time series. Ann. Statist. 15(4):1667-1682.

David F. Findley and Ching-Zong Wei. (1993). Moment bounds for deriving time series CLTs and model selection procedures. Statist. Sinica, 453-480.

Po-Ling Loh and Martin J. Wainwright. (2012). High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Statist. 40(3):1637-1664.

Temlyakov, V. N. (2000). Weak greedy algorithms. Adv. Comput. Math. 12, 213-227.

Appendix

Before we prove (3.4), three lemmas are needed first:

Lemma 1. Assume (C2), max

1≤i,j≤p

Lemma 2. Assume (C1)-(C4), max

1≤i≤p it follows from (C2) that

1≤i,j≤pmax

Similarly,

Proof of Lemma 2. Note that E( max

where the third inequality comes from Lemma 2 in Wei (1987), K is a positive constant depending only on q₁ and C₂ in (C1), and the last inequality comes from Jensen’s inequality. From (C1), it follows that

E₁ = O(

by (C1) and similar arguments in the derivation of (A.4), it follows that E( max

The last equality comes from Lemma 1, (A.5) , (C3) and (C4). By (A.4)-(A.7) and Markov’s inequality, the proof of Lemma 2 is complete.

Proof of Lemma 3. For (A.1), note that max Lemma 1, we have the desired conclusion. By (A7) and (A8) in Ing and Lai (2011) and (A.1), we have (A.2). Furthermore, since max

1≤#(J )≤Kn

so, we have (A.3), and the tools for proving (3.4) are ready.

Proof of (3.4). Since by Lemma 1, n¹²||w_i||−→ σ^p _ii ≥ min

where w_ti;J^⊥ = w_ti− γ_i^T(J )R⁻¹(J )w_t(J ). Since and the proof of (3.4) is complete.

Proof of (4.4). By the proof of (3.4), it follows that P (| ˆB_n| ≥ θL_n, D_n) ≤ P ( max

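the equality comes from (C1), (C7), (C8), (A.8) and Lemmas 1-3. Note that

$$\begin{aligned}
P(\hat{A}_n < v_n/2,\, D_n) &\le P\big(\lambda_{\min}(\hat{R}(\hat{J}_{\tilde{k}})) < v_n/2,\, D_n\big) \\
&\le P\big(\lambda_{\min}(\hat{R}(\hat{J}_{m_0})) < v_n/2\big) \\
&\le P\Big(\max_{1 \le \#(J) \le m_0} \|\hat{R}(J) - R(J)\| > \delta/2\Big) \\
&= o(1), \qquad (A.10)
\end{aligned}$$

where the equality comes from Lemma 3, and

$$\begin{aligned}
P(|\hat{E}_n| \ge \theta L_n^2,\, D_n) &\le P\Big(\|\tilde{U}\|_1 \max_{1 \le j \le p} |\beta_j^{*}| \Big(\max_{1 \le i,j \le p} \Big|\frac{1}{n}\sum_{t=1}^{n} w_{ti} w_{tj} - \sigma_{ij}\Big| + \max_{1 \le i,j \le p} |\sigma_{ij}|\Big) \ge \frac{\theta}{2} L_n^2\Big) \\
&\quad + P\Big(\|\tilde{U}\|_1\, 2(S_{2,n} + S_{3,n}) \ge \frac{\theta}{2} L_n^2\Big) \\
&= o(1), \qquad (A.11)
\end{aligned}$$

where $S_{2,n}$ and $S_{3,n}$ are the same as those in the proof of (3.4), and the equality comes from (C1), (C3), (C7), Lemma 1 and the proof of (3.4). By (A.8)-(A.11), the proof of (4.4) is complete.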

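Proof of (4.7). Since there exist a constant $\tilde{\lambda} > 0$ and $\zeta_n \to \infty$ such that

$$\frac{n\big(1 - \exp(-n^{-1} w_n (\hat{k} - \tilde{k}) p^{2/q_1})\big)}{\hat{k} - \tilde{k}} \ge \tilde{\lambda} \min\{(n p^{2/q_1})^{1/2},\, w_n p^{2/q_1}\} = \zeta_n p^{2/q_1}, \qquad (A.12)$$

it follows from (A.12) that

$$\begin{aligned}
&P\big((\hat{k} - \tilde{k})(\hat{a}_n + \hat{b}_n) \ge \theta n (1 - \exp(-n^{-1} w_n (\hat{k} - \tilde{k}) p^{2/q_1})),\, \hat{k} > \tilde{k}\big) \\
&\quad \le P\Big(\|\hat{R}^{-1}(\hat{J}_{K_n})\| \max_{1 \le i \le p} \Big(n^{-1/2} \sum_{t=1}^{n} w_{ti} \xi_t^{*}\Big)^2 \ge \frac{\theta}{2} \zeta_n p^{2/q_1}\Big) \\
&\qquad + P\Big(\|\hat{R}^{-1}(\hat{J}_{K_n})\|\, \|n(S_{2,n} + S_{3,n})\| \ge \frac{\theta}{2} \zeta_n p^{2/q_1}\Big) \\
&\quad = o(1), \qquad (A.13)
\end{aligned}$$

where the equality comes from Lemmas 2 and 3 and the proof of (3.4). Note that

$$\begin{aligned}
&P\Big(\Big(\sum_{l \notin \hat{J}_{\tilde{k}}} \beta_l^{*} w_l\Big)^{T} (H_{\hat{J}_{\hat{k}}} - H_{\hat{J}_{\tilde{k}}}) \Big(\sum_{l \notin \hat{J}_{\tilde{k}}} \beta_l^{*} w_l\Big) \ge \theta n (1 - \exp(-n^{-1} w_n (\hat{k} - \tilde{k}) p^{2/q_1})),\, \hat{k} > \tilde{k}\Big) \\
&\quad \le P\Big(\|\tilde{U}\|_1^2 \Big(\max_{1 \le i,j \le p} \Big|\frac{1}{n}\sum_{t=1}^{n} w_{ti} w_{tj} - \sigma_{ij}\Big| + \max_{1 \le i,j \le p} |\sigma_{ij}|\Big) \ge \theta \min\Big\{\sqrt{\frac{p^{2/q_1}}{n}},\, n^{-1} w_n p^{2/q_1}\Big\} (\hat{k} - \tilde{k}),\, \hat{k} > \tilde{k}\Big) \\
&\quad \le P\Big(\|\tilde{U}\|_1^2 \Big(\max_{1 \le i,j \le p} \Big|\frac{1}{n}\sum_{t=1}^{n} w_{ti} w_{tj} - \sigma_{ij}\Big| + \max_{1 \le i,j \le p} |\sigma_{ij}|\Big) \ge \theta \min\Big\{\sqrt{\frac{p^{2/q_1}}{n}},\, L_n^4\Big\}\Big) \\
&\quad = o(1), \qquad (A.14)
\end{aligned}$$

and

$$\begin{aligned}
&P\Big(\Big|\Big(\sum_{l \notin \hat{J}_{\tilde{k}}} \beta_l^{*} w_l\Big)^{T} (H_{\hat{J}_{\hat{k}}} - H_{\hat{J}_{\tilde{k}}})\, \tilde{\xi}^{*}\Big| \ge \theta n (1 - \exp(-n^{-1} w_n (\hat{k} - \tilde{k}) p^{2/q_1})),\, \hat{k} > \tilde{k}\Big) \\
&\quad \le P\Big(\|\tilde{U}\|_1\, 2(S_{2,n} + S_{3,n}) \ge \theta \min\Big\{\sqrt{\frac{p^{2/q_1}}{n}},\, L_n^4\Big\}\Big) \\
&\quad = o(1). \qquad (A.15)
\end{aligned}$$

Similarly to (A.14) and (A.15), it follows that

$$P\Big(\Big(\sum_{l \notin \hat{J}_{\tilde{k}}} \beta_l^{*} w_l\Big)^{T} (I - H_{\hat{J}_{\tilde{k}}}) \Big(\sum_{l \notin \hat{J}_{\tilde{k}}} \beta_l^{*} w_l\Big) \ge \theta n,\, \hat{k} > \tilde{k}\Big) = o(1), \qquad (A.16)$$

$$P\Big(\Big|\Big(\sum_{l \notin \hat{J}_{\tilde{k}}} \beta_l^{*} w_l\Big)^{T} (I - H_{\hat{J}_{\tilde{k}}})\, \tilde{\xi}^{*}\Big| \ge \theta n,\, \hat{k} > \tilde{k}\Big) = o(1). \qquad (A.17)$$

So, by (A.13)-(A.17), the proof of (4.7) is complete.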
