A Proof of Lemma 3

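Using the notation $\hat v = Y_{n-1}v/\|Y_{n-1}v\|$ and $A_n = x_n x_n^\top$, one can follow the analysis in Balsubramani et al. (2013) to show that $\Phi_n^{(v)} \le \Phi_{n-1}^{(v)} + \beta_n - Z_n$, with

• $\beta_n = 5\gamma_n^2 + 2\gamma_n^3$,

• $Z_n = 2\gamma_n\left(\hat v^\top U U^\top A_n \hat v - \|U^\top \hat v\|^2\, \hat v^\top A_n \hat v\right)$, and

• $\mathbb{E}[Z_n \mid \mathcal{F}_{n-1}] \ge 2\gamma_n(\lambda - \hat\lambda)\,\Phi_{n-1}^{(v)}\left(1 - \Phi_{n-1}^{(v)}\right) \ge 0$.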

We omit the proof here as the adaptation is straightforward.

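It remains to show our better bound on $|Z_n|$. For this, note that

$$|Z_n| \le 2\gamma_n \left\|\hat v^\top U U^\top - \|U^\top \hat v\|^2\, \hat v^\top\right\| \cdot \|A_n \hat v\|,$$

where $\|A_n \hat v\| \le 1$ and

$$\left\|\hat v^\top U U^\top - \|U^\top \hat v\|^2\, \hat v^\top\right\|^2 = \|U^\top \hat v\|^2 - 2\|U^\top \hat v\|^4 + \|U^\top \hat v\|^4 = \|U^\top \hat v\|^2\left(1 - \|U^\top \hat v\|^2\right).$$

As $\|U^\top \hat v\|^2 \le 1$ and $1 - \|U^\top \hat v\|^2 = \Phi_{n-1}^{(v)}$, we have

$$|Z_n| \le 2\gamma_n \sqrt{\Phi_{n-1}^{(v)}}.$$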
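The norm identity used for the bound on $|Z_n|$ can be sanity-checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes, purely for illustration, that $U$ consists of the first `k` standard basis vectors of $\mathbb{R}^d$ (so its columns are orthonormal), and the function names are hypothetical.

```python
import math
import random

def random_unit_vector(d, rng):
    """Draw a uniformly random unit vector in R^d (plays the role of v_hat)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def residual_norm_sq(v_hat, k):
    """Return (||U U^T v_hat - s * v_hat||^2, s) with s = ||U^T v_hat||^2.

    Assumption for illustration: U = first k standard basis vectors,
    so U U^T v_hat keeps the first k coordinates and zeroes the rest.
    """
    s = sum(x * x for x in v_hat[:k])
    proj = v_hat[:k] + [0.0] * (len(v_hat) - k)
    lhs = sum((p - s * x) ** 2 for p, x in zip(proj, v_hat))
    return lhs, s

def max_identity_gap(d=8, k=3, trials=1000, seed=0):
    """Largest observed |lhs - s(1 - s)| over random unit vectors."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        lhs, s = residual_norm_sq(random_unit_vector(d, rng), k)
        gap = max(gap, abs(lhs - s * (1.0 - s)))
    return gap
```

Under these assumptions the gap stays at the level of floating-point rounding, consistent with the squared norm equalling $\|U^\top \hat v\|^2(1 - \|U^\top \hat v\|^2)$.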

B Proof of Lemma 4

Assume that the event $\Gamma_0$ holds and consider any $n \in [n_0, n_1)$. We need the following, which we prove in Appendix B.1.

Proposition 1. For any $n > m$ and any $v \in \mathbb{R}^k$,

$$\frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge \left(\frac{m}{n}\right)^{3c} \cdot \frac{\|U^\top Y_m v\|}{\|Y_m\|}.$$

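From Proposition 1, we know that for any $v \in S$,

$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \ge \frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge \left(\frac{n_0}{n}\right)^{3c} \frac{\|U^\top Y_0 v\|}{\|Y_0\|},$$

where $(n_0/n)^{3c} \ge (n_0/n_1)^{3c} \ge (1/c_1)^{3c}$ for the constant $c_1$ given in Remark 1. As $Y_0 = Q_0$ and $\|Q_0\| = 1 = \|Q_0 v\|$, we obtain

$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \ge \frac{\|U^\top Q_0 v\|}{c_1^{3c}\,\|Q_0 v\|} \ge \frac{\sqrt{1 - \rho_0}}{c_1^{3c}} = \sqrt{\frac{\bar c}{c_1^{6c}\, k d}}.$$

Therefore, assuming $\Gamma_0$, we always have

$$\Phi_n = \max_v \left(1 - \frac{\|U^\top Y_n v\|^2}{\|Y_n v\|^2}\right) \le 1 - \frac{\bar c}{c_1^{6c}\, k d} = \rho_1.$$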

B.1 Proof of Proposition 1

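Recall that for any $n$, $Y_n = Y_{n-1} + \gamma_n x_n x_n^\top Y_{n-1}$ and $\|x_n x_n^\top\| \le 1$. Then for any $v \in \mathbb{R}^k$,

$$\frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge \frac{\|U^\top Y_{n-1} v\| - \gamma_n \|U^\top Y_{n-1} v\|}{\|Y_{n-1}\| + \gamma_n \|Y_{n-1}\|},$$

which is

$$\frac{1 - \gamma_n}{1 + \gamma_n} \cdot \frac{\|U^\top Y_{n-1} v\|}{\|Y_{n-1}\|} \ge e^{-3\gamma_n}\, \frac{\|U^\top Y_{n-1} v\|}{\|Y_{n-1}\|},$$

using the facts that $1 - x \ge e^{-2x}$ for $0 \le x \le 1/2$, that $1 + x \le e^{x}$, and that $\gamma_n \le 1/2$.

Then by induction, we have

$$\frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge e^{-3\sum_{t=m+1}^{n} \gamma_t} \cdot \frac{\|U^\top Y_m v\|}{\|Y_m\|}.$$

The proposition follows as

$$e^{-3\sum_{t=m+1}^{n} \gamma_t} = e^{-3c\sum_{t=m+1}^{n} \frac{1}{t}} \ge \left(\frac{m}{n}\right)^{3c},$$

using the fact that $\sum_{t=m+1}^{n} \frac{1}{t} \le \int_m^n \frac{1}{x}\,dx = \ln\left(\frac{n}{m}\right)$.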
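The two elementary facts behind Proposition 1 (the per-step factor and the harmonic-sum bound) can be checked numerically. This is an illustrative sketch only, assuming the step size $\gamma_t = c/t$ read off from the proof; the function names and the sample values of `m`, `n`, `c` are hypothetical.

```python
import math

def step_factor_ok(gamma):
    """Check (1 - gamma) / (1 + gamma) >= exp(-3 * gamma) for one step size."""
    return (1.0 - gamma) / (1.0 + gamma) >= math.exp(-3.0 * gamma)

def harmonic_tail(m, n):
    """Sum of 1/t over t = m+1, ..., n."""
    return sum(1.0 / t for t in range(m + 1, n + 1))

def product_lower_bound_holds(m, n, c):
    """Check prod_{t=m+1}^{n} (1 - c/t)/(1 + c/t) >= (m/n)^(3c).

    Assumes gamma_t = c / t with c / (m + 1) <= 1/2, as the proof requires.
    """
    prod = 1.0
    for t in range(m + 1, n + 1):
        g = c / t
        prod *= (1.0 - g) / (1.0 + g)
    return prod >= (m / n) ** (3.0 * c)
```

For example, `product_lower_bound_holds(10, 1000, 2.0)` exercises the induction over 990 steps with every step size below 1/2.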

C Proof of Lemma 5

According to Lemma 3, our $\Phi_n^{(v)}$'s satisfy the same recurrence relation as the functions $\Psi_n$'s of Balsubramani et al. (2013). We therefore have the following, which we prove in Appendix C.1.

Lemma 9. Let $\hat\rho_i = \rho_i / \lceil e^{5/c_0} \rceil^{c_0(1 - \rho_i)}$. Then for any $u \in S$ and $\alpha_i \ge 12c^2/n_{i-1}$,

$$\Pr\left[\sup_{n \ge n_i} \Phi_n^{(u)} \ge \hat\rho_i + \alpha_i \,\middle|\, \Gamma_i\right] \le e^{-\Omega\left((\alpha_i^2/(c^2\rho_i))\, n_{i-1}\right)}.$$

Our goal is to bound $\Pr[\neg\Gamma_{i+1} \mid \Gamma_i]$, which is

$$\Pr\left[\exists v \in S : \sup_{n_i \le n < n_{i+1}} \Phi_n^{(v)} \ge \rho_{i+1} \,\middle|\, \Gamma_i\right].$$

As discussed before, we cannot directly apply a union bound on the bound in Lemma 9 as there are infinitely many $v$'s in $S$. Instead, we look for a small "$\epsilon$-net" $D_i$ of $S$, with the property that any $v \in S$ has some $u \in D_i$ with $\|v - u\| \le \epsilon$. Such a $D_i$ with $|D_i| \le (1/\epsilon)^{O(k)}$ is known to exist (see e.g. Milman and Schechtman (1986)).

Then what we need is that when $v$ and $u$ are close, $\Phi_n^{(v)}$ and $\Phi_n^{(u)}$ are close as well. This is guaranteed by the following, which we prove in Appendix C.2.

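Lemma 10. Suppose $\Gamma_i$ happens. Then for any $n \in [n_i, n_{i+1})$, any $\epsilon \le \sqrt{1 - \rho_i}/(2c_1^{6c})$, and any $u, v \in S$ with $\|u - v\| \le \epsilon$, we have

$$\left|\Phi_n^{(v)} - \Phi_n^{(u)}\right| \le \frac{16\, c_1^{6c}\, \epsilon}{\sqrt{1 - \rho_i}}.$$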

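According to this, we can choose $\alpha_i = (\rho_{i+1} - \hat\rho_i)/2$ and $\epsilon = \alpha_i \sqrt{1 - \rho_i}/(16 c_1^{6c})$ so that with $\|u - v\| \le \epsilon$, we have $|\Phi_n^{(v)} - \Phi_n^{(u)}| \le \alpha_i$. This means that given any $v \in S$ with $\Phi_n^{(v)} \ge \rho_{i+1}$, there exists some $u \in D_i$ with $\Phi_n^{(u)} \ge \rho_{i+1} - \alpha_i = \hat\rho_i + \alpha_i$. As a result, we can now apply a union bound over $D_i$ and have

$$\Pr[\neg\Gamma_{i+1} \mid \Gamma_i] \le \sum_{u \in D_i} \Pr\left[\sup_{n \ge n_i} \Phi_n^{(u)} \ge \hat\rho_i + \alpha_i \,\middle|\, \Gamma_i\right]. \tag{7}$$

To bound this further, consider the following two cases.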

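First, for the case of $i < \pi_1$, we have $\rho_i \ge 3/4$ and $\eta_i = 1 - \rho_i \le 1/4$, so that

$$\hat\rho_i \le \rho_i e^{-5(1 - \rho_i)} = (1 - \eta_i) e^{-5\eta_i} \le e^{-6\eta_i} \le 1 - 3\eta_i.$$

Then $\alpha_i \ge ((1 - 2\eta_i) - (1 - 3\eta_i))/2 = \eta_i/2$, which is at least $12c^2/n_{i-1}$, as $\eta_i \ge \eta_1 \ge \bar c/(c_1^{6c} k d)$ and $n_{i-1} \ge n_0 = \hat c^{\,c} k^3 d^2 \log d$ for a large enough constant $\hat c$. Therefore, we can apply Lemma 9 and the bound in (7) becomes

$$\left(\frac{c_1^{c}}{\eta_i}\right)^{O(k)} e^{-\Omega\left((\eta_i^2/c^2)\, n_{i-1}\right)} \le \frac{\delta_0}{2(i+1)^2}.$$

Next, for the case of $i \ge \pi_1$, we have $\rho_i \le 3/4$ so that

$$\hat\rho_i \le \rho_i / \lceil e^{5/c_0} \rceil^{c_0/4} \le \rho_i / \lceil e^{5/c_0} \rceil^{3},$$

as $c_0 \ge 12$ by assumption. Since $\rho_{i+1} \ge \rho_i / \lceil e^{5/c_0} \rceil^{2}$, this gives us $\alpha_i \ge \rho_i\left(\lceil e^{5/c_0} \rceil^{-2} - \lceil e^{5/c_0} \rceil^{-3}\right)/2$, which is at least $12c^2/n_{i-1}$, as $\rho_i$, according to our choice, is about $c_2 (c^3 k \log n_{i-1})/(n_{i-1} + 1)$ for a large enough constant $c_2$. Thus, we can apply Lemma 9 and the bound in (7) becomes

$$\left(\frac{c_1^{c}}{\rho_i}\right)^{O(k)} e^{-\Omega\left((\rho_i/c^2)\, n_{i-1}\right)} \le \frac{\delta_0}{2(i+1)^2}. \tag{8}$$

This completes the proof of Lemma 5.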
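The scalar inequalities used in the two cases can be checked numerically. The sketch below is illustrative only: it reads the garbled bracket in the source as a ceiling, $\lceil e^{5/c_0} \rceil$ (an assumption), and the function names and sample values are hypothetical.

```python
import math

def case_one_chain_holds(eta):
    """Check (1 - eta) e^{-5 eta} <= e^{-6 eta} <= 1 - 3 eta for a given eta."""
    a = (1.0 - eta) * math.exp(-5.0 * eta)
    b = math.exp(-6.0 * eta)
    return a <= b <= 1.0 - 3.0 * eta

def case_two_alpha_lower_bound(rho, c0):
    """Return rho * (q^-2 - q^-3) / 2 with q = ceil(e^{5 / c0}).

    Assumption: the bracketed quantity in the text is a ceiling.
    """
    q = math.ceil(math.exp(5.0 / c0))
    return rho * (q ** -2 - q ** -3) / 2.0
```

With `c0 = 12` the ceiling evaluates to 2, so under this reading the second-case lower bound on $\alpha_i$ is $\rho_i/16$.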

C.1 Proof of Lemma 9

By Lemma 3, the random variables $\Phi_n^{(v)}$'s satisfy the same recurrence relation of Balsubramani et al. (2013) for their random variables $\Phi_n$'s. Thus, we can follow their analysis (in particular, their proofs of Lemma 2.9 and Lemma 2.10), but use our better bound on $|Z_n|$, and have the following.

First, when given $\Gamma_i$, we have $|Z_n| \le 2\gamma_n \sqrt{\rho_i}$ for $n_{i-1} \le n < n_i$. Then one can easily modify the analysis in Balsubramani et al. (2013) to show that for any $t \ge 0$,

$$\mathbb{E}\left[e^{t\Phi_{n_i}^{(v)}} \,\middle|\, \Gamma_i\right] \le \exp\left(t\hat\rho_i + c^2(6t + 2t^2\rho_i)\left(\frac{1}{n_{i-1}} - \frac{1}{n_i}\right)\right),$$

by noting that $(n_i + 1)/(n_{i-1} + 1) = \lceil e^{5/c_0} \rceil$ and $n \ge n_0 = \hat c^{\,c} k^3 d^2 \log d$ according to our choice of parameters.

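Next, following Balsubramani et al. (2013) and applying Doob's martingale inequality, we obtain

$$\begin{aligned}
\Pr\left[\sup_{n \ge n_i} \Phi_n^{(v)} \ge \hat\rho_i + \alpha_i \,\middle|\, \Gamma_i\right]
&\le \mathbb{E}\left[e^{t\Phi_{n_i}^{(v)}} \,\middle|\, \Gamma_i\right] \exp\left(-t(\hat\rho_i + \alpha_i) + \frac{c^2}{n_i}(6t + 2t^2\rho_i)\right) \\
&\le \exp\left(-t\alpha_i + \frac{c^2}{n_{i-1}}(6t + 2t^2\rho_i)\right) \\
&\le \exp\left(-\frac{t\alpha_i}{2} + \frac{2c^2 t^2 \rho_i}{n_{i-1}}\right),
\end{aligned}$$

as $\alpha_i \ge \frac{12c^2}{n_{i-1}}$. Finally, by choosing $t = \frac{\alpha_i n_{i-1}}{8c^2\rho_i}$, we have the lemma.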
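The final choice of $t$ can be checked directly: it minimizes the quadratic exponent $-t\alpha_i/2 + 2c^2t^2\rho_i/n_{i-1}$, and substituting it gives $-\alpha_i^2 n_{i-1}/(32 c^2 \rho_i)$, which matches the $\Omega(\cdot)$ rate in Lemma 9. A minimal sketch, with hypothetical function names and arbitrary illustrative parameter values:

```python
def exponent(t, alpha, c, rho, n_prev):
    """Exponent -t*alpha/2 + 2*c^2*t^2*rho/n_prev from the last display."""
    return -t * alpha / 2.0 + 2.0 * c * c * t * t * rho / n_prev

def t_star(alpha, c, rho, n_prev):
    """The choice t = alpha * n_prev / (8 * c^2 * rho) made in the proof."""
    return alpha * n_prev / (8.0 * c * c * rho)

def closed_form(alpha, c, rho, n_prev):
    """Value of the exponent at t_star: -alpha^2 * n_prev / (32 * c^2 * rho)."""
    return -alpha * alpha * n_prev / (32.0 * c * c * rho)
```

Here `n_prev` stands for $n_{i-1}$; the values used to exercise the functions are not taken from the paper.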

C.2 Proof of Lemma 10

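Assume without loss of generality that $\Phi_n^{(v)} \le \Phi_n^{(u)}$ (otherwise, we switch $v$ and $u$), so that

$$\left|\Phi_n^{(v)} - \Phi_n^{(u)}\right| = \frac{\|U^\top Y_n v\|^2}{\|Y_n v\|^2} - \frac{\|U^\top Y_n u\|^2}{\|Y_n u\|^2}.$$

As $\|v - u\| \le \epsilon$, we have

$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \le \frac{\|U^\top Y_n u\| + \epsilon\|U^\top Y_n\|}{\|Y_n u\| - \epsilon\|Y_n\|}. \tag{9}$$

To relate this to $\frac{\|U^\top Y_n u\|^2}{\|Y_n u\|^2}$, we would like to express $\|U^\top Y_n\|$ in terms of $\|U^\top Y_n u\|$ and $\|Y_n\|$ in terms of $\|Y_n u\|$. For this, note that both $\|U^\top Y_n u\|/\|U^\top Y_n\|$ and $\|Y_n u\|/\|Y_n\|$ are at least $\|U^\top Y_n u\|/\|Y_n\|$, which by Proposition 1 is at least

$$\left(\frac{n_{i-1}}{n}\right)^{3c} \frac{\|U^\top Y_{n_{i-1}} u\|}{\|Y_{n_{i-1}}\|} \ge c_1^{-6c}\, \frac{\|U^\top Y_{n_{i-1}} u\|}{\|Y_{n_{i-1}}\|}, \tag{10}$$

using the fact that $n_{i-1}/n \ge n_{i-1}/n_{i+1} \ge 1/c_1^2$. Then as $Y_{n_{i-1}} = Q_{n_{i-1}}$ and $\|Q_{n_{i-1}}\| = \|Q_{n_{i-1}} u\|$, the right-hand side of (10) becomes

$$c_1^{-6c}\, \frac{\|U^\top Q_{n_{i-1}} u\|}{\|Q_{n_{i-1}} u\|} = c_1^{-6c} \sqrt{1 - \Phi_{n_{i-1}}^{(u)}} \ge c_1^{-6c} \sqrt{1 - \rho_i},$$

given $\Gamma_i$. What we have obtained so far is a lower bound for both $\|U^\top Y_n u\|/\|U^\top Y_n\|$ and $\|Y_n u\|/\|Y_n\|$. Plugging this into (9), with $\hat\epsilon = c_1^{6c}\epsilon/\sqrt{1 - \rho_i}$, we get

$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \le \frac{\|U^\top Y_n u\|(1 + \hat\epsilon)}{\|Y_n u\|(1 - \hat\epsilon)}.$$

As a result, we have

$$\left|\Phi_n^{(v)} - \Phi_n^{(u)}\right| \le \frac{\|U^\top Y_n u\|^2}{\|Y_n u\|^2}\left(\frac{(1 + \hat\epsilon)^2}{(1 - \hat\epsilon)^2} - 1\right) \le 16\hat\epsilon,$$

since $\frac{(1 + \hat\epsilon)^2}{(1 - \hat\epsilon)^2} - 1 \le \frac{4\hat\epsilon}{(1 - \hat\epsilon)^2} \le 16\hat\epsilon$ for $\hat\epsilon \le 1/2$.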
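The closing scalar inequality of this proof is easy to verify on a grid. A small illustrative sketch (the function names are hypothetical, and `e` stands for the rescaled net parameter, which the proof requires to be at most 1/2):

```python
def ratio_gap(e):
    """(1 + e)^2 / (1 - e)^2 - 1, the factor bounding the change in Phi."""
    return (1.0 + e) ** 2 / (1.0 - e) ** 2 - 1.0

def middle_term(e):
    """4e / (1 - e)^2, the intermediate bound quoted in the proof."""
    return 4.0 * e / (1.0 - e) ** 2

def bound_holds(e, tol=1e-12):
    """Check ratio_gap(e) <= middle_term(e) <= 16 e up to rounding."""
    return ratio_gap(e) <= middle_term(e) + tol and middle_term(e) <= 16.0 * e + tol
```

Expanding the square shows the first two quantities coincide, and the last step is tight at `e = 0.5`, where both sides equal 8.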

D Proof of Lemma 7

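As $\cos(U, Q_{i-1})^2 = \frac{1}{1 + \tan(U, Q_{i-1})^2} \ge \frac{1}{1 + \varepsilon_{i-1}^2} \ge \beta_i^2$, we have $\|G_i\| \le 4\beta_i \le 4\cos(U, Q_{i-1})$. Thus, we can apply Lemma 6 and have

$$\tan(U, AQ_{i-1} + G_i) \le \max(\beta_i, \max(\beta_i, \gamma)\,\varepsilon_{i-1}),$$

which is at most $\max(\beta_i, \gamma\varepsilon_{i-1}) \le \gamma\varepsilon_{i-1} = \varepsilon_i$. The lemma follows as $\tan(U, Q_i) = \tan(U, AQ_{i-1} + G_i)$.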

E Proof of Lemma 8

Let $\rho = 4\beta_i$ and note that $\|G_i\| \le \|A - F_i\|$, where $F_i$ is the average of $|I_i|$ i.i.d. random matrices, each with mean $A$. Recall that $\|A\| \le 1$ by Assumption 1. Then from a matrix Chernoff bound, we have

$$\Pr[\|G_i\| > \rho] \le \Pr[\|A - F_i\| > \rho] \le d\, e^{-\Omega(\rho^2 |I_i|)} \le \delta_i,$$

for $|I_i|$ given in (3).

F Proof of Lemma 9

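Let $L$ be the iteration number such that $\varepsilon_{L-1} > \varepsilon$ and $\varepsilon_L \le \varepsilon$. Note that with $\varepsilon_L = \varepsilon_0\gamma^L = \varepsilon_0(1 - (\lambda - \bar\lambda)/\lambda)^{L/4} \le \varepsilon_0 e^{-L(\lambda - \bar\lambda)/(4\lambda)}$, we can have

$$L \le O\left(\frac{\lambda}{\lambda - \bar\lambda} \log\frac{\varepsilon_0}{\varepsilon}\right) \le O\left(\frac{\lambda}{\lambda - \bar\lambda} \log\frac{d}{\varepsilon}\right).$$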

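As the number of samples in iteration $i$ is

$$|I_i| = O\left(\frac{\log(d/\delta_i)}{(\lambda - \bar\lambda)^2 \beta_i^2}\right) \le O\left(\frac{\log(di)}{(\lambda - \bar\lambda)^2 \beta_i^2}\right),$$

the total number of samples needed is

$$\sum_{i=1}^{L} |I_i| \le O\left(\frac{\log(dL)}{(\lambda - \bar\lambda)^2}\right) \cdot \sum_{i=1}^{L} \frac{1}{\beta_i^2}.$$

With $\beta_i = \min\left(\gamma/\sqrt{1 + \varepsilon_{i-1}^2},\ \gamma\varepsilon_{i-1}\right)$, one sees that for some $i_0 \le O(\log d)$, $\beta_i = \gamma/\sqrt{1 + \varepsilon_{i-1}^2}$ when $i \le i_0$ and $\beta_i = \gamma\varepsilon_{i-1} = \varepsilon_i$ when $i > i_0$. This implies that

$$\sum_{i=1}^{L} \frac{1}{\beta_i^2} = \sum_{i=1}^{i_0} \frac{1 + \varepsilon_{i-1}^2}{\gamma^2} + \sum_{i=i_0+1}^{L} \frac{1}{\varepsilon_i^2}, \tag{11}$$

where the first sum in the right-hand side of (11) is

$$\frac{i_0}{\gamma^2} + \sum_{i=1}^{i_0} \varepsilon_0^2 \gamma^{2i-4} \le \frac{O(\log d)}{\gamma^2} + \frac{\varepsilon_0^2}{\gamma^2(1 - \gamma^2)},$$

while the second sum is

$$\sum_{i=i_0+1}^{L} \frac{\gamma^{2(L-i)}}{\varepsilon_L^2} \le \frac{1}{(1 - \gamma^2)\varepsilon_L^2} \le \frac{1}{\gamma^2(1 - \gamma^2)\varepsilon^2},$$

using the fact that $\varepsilon_L = \gamma\varepsilon_{L-1} \ge \gamma\varepsilon$. Since $\gamma^2 = \left(1 - \frac{\lambda - \bar\lambda}{\lambda}\right)^{1/2} \le 1 - \frac{\lambda - \bar\lambda}{2\lambda}$, we have $\frac{1}{1 - \gamma^2} \le \frac{2\lambda}{\lambda - \bar\lambda}$, and since $\lambda \le O(\bar\lambda)$, we also have $\frac{1}{\gamma^2} \le O(1)$. Moreover, as we assume that $\varepsilon \le 1/\sqrt{kd}$, we can conclude that the total number of samples needed is at most

$$\sum_{i=1}^{L} |I_i| \le O\left(\frac{\log(dL)}{(\lambda - \bar\lambda)^2}\right) \cdot O\left(\frac{\lambda}{(\lambda - \bar\lambda)\varepsilon^2}\right) \le O\left(\frac{\lambda \log(dL)}{\varepsilon^2(\lambda - \bar\lambda)^3}\right).$$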
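The iteration count and the geometric-sum bound on the second term of (11) can be illustrated numerically. This is a sketch under stated assumptions, not the paper's procedure: it takes $\gamma = (1 - (\lambda - \bar\lambda)/\lambda)^{1/4}$ and $\varepsilon_i = \varepsilon_0\gamma^i$ from the text, while the function names, the parameter values, and the split index `i0` are hypothetical.

```python
import math

def gamma_from_eigs(lam, lam_bar):
    """gamma = (1 - (lam - lam_bar) / lam)^(1/4), as read off from the proof."""
    return (1.0 - (lam - lam_bar) / lam) ** 0.25

def iterations_needed(eps0, eps, gamma):
    """Smallest L with eps0 * gamma^L <= eps (so eps_{L-1} > eps >= eps_L)."""
    L, e = 0, eps0
    while e > eps:
        e *= gamma
        L += 1
    return L

def second_sum(eps0, gamma, i0, L):
    """Sum over i = i0+1..L of 1 / eps_i^2 with eps_i = eps0 * gamma^i."""
    return sum(1.0 / (eps0 * gamma ** i) ** 2 for i in range(i0 + 1, L + 1))

def second_sum_bounds(eps0, eps, gamma, i0):
    """Return (sum, 1/((1-g^2) eps_L^2), 1/(g^2 (1-g^2) eps^2)) for comparison."""
    L = iterations_needed(eps0, eps, gamma)
    eps_L = eps0 * gamma ** L
    g2 = gamma * gamma
    return (second_sum(eps0, gamma, i0, L),
            1.0 / ((1.0 - g2) * eps_L ** 2),
            1.0 / (g2 * (1.0 - g2) * eps ** 2))
```

The three returned values should appear in non-decreasing order, mirroring the chain of inequalities for the second sum.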

References

Balsubramani, A., Dasgupta, S., and Freund, Y. (2013). The fast convergence of incremental PCA. In Advances in Neural Information Processing Systems.

Milman, V. D. and Schechtman, G. (1986). Asymptotic theory of finite-dimensional normed spaces. Lecture Notes in Mathematics. Springer.