A Proof of Lemma 3
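Using the notation $\hat{v} = Y_{n-1}v/\|Y_{n-1}v\|$ and $A_n = x_n x_n^\top$, one can follow the analysis in Balsubramani et al. (2013) to show that $\Phi_n^{(v)} \le \Phi_{n-1}^{(v)} + \beta_n - Z_n$, with
• $\beta_n = 5\gamma_n^2 + 2\gamma_n^3$,
• $Z_n = 2\gamma_n\left(\hat{v}^\top UU^\top A_n\hat{v} - \|U^\top\hat{v}\|^2\,\hat{v}^\top A_n\hat{v}\right)$, and
• $\mathbb{E}[Z_n \mid \mathcal{F}_{n-1}] \ge 2\gamma_n(\lambda - \hat{\lambda})\,\Phi_{n-1}^{(v)}\left(1 - \Phi_{n-1}^{(v)}\right) \ge 0$.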
We omit the proof here as the adaptation is straightforward.
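It remains to show our better bound on $|Z_n|$. For this, note that
$$|Z_n| \le 2\gamma_n \left\|\hat{v}^\top UU^\top - \|U^\top\hat{v}\|^2\,\hat{v}^\top\right\| \cdot \|A_n\hat{v}\|,$$
where $\|A_n\hat{v}\| \le 1$ and
$$\left\|\hat{v}^\top UU^\top - \|U^\top\hat{v}\|^2\,\hat{v}^\top\right\|^2 = \|U^\top\hat{v}\|^2 - 2\|U^\top\hat{v}\|^4 + \|U^\top\hat{v}\|^4 = \|U^\top\hat{v}\|^2\left(1 - \|U^\top\hat{v}\|^2\right).$$
As $\|U^\top\hat{v}\|^2 \le 1$ and $1 - \|U^\top\hat{v}\|^2 = \Phi_{n-1}^{(v)}$, we have
$$|Z_n| \le 2\gamma_n\sqrt{\Phi_{n-1}^{(v)}}.$$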
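The norm identity used above can be sanity-checked numerically. The following is only a sketch under a simplifying assumption that is not in the text: $U$ is taken to span the first $k$ coordinate directions (so $U^\top\hat{v}$ is just the first $k$ entries of $\hat{v}$), which loses no generality up to a rotation. The function name and the parameters `d`, `k`, `trials`, `seed` are illustrative choices.

```python
import math
import random

def check_identity(d=6, k=2, trials=1000, seed=0):
    """Largest observed gap between ||U U^T v - a v||^2 and a(1 - a),
    where a = ||U^T v||^2 and v is a random unit vector.
    Assumption: U spans the first k coordinate directions."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        v_hat = [x / norm for x in g]              # random unit vector
        a = sum(x * x for x in v_hat[:k])          # a = ||U^T v_hat||^2
        # w = U U^T v_hat - a * v_hat (U U^T keeps only the first k entries)
        w = [(x if j < k else 0.0) - a * x for j, x in enumerate(v_hat)]
        lhs = sum(x * x for x in w)
        worst = max(worst, abs(lhs - a * (1.0 - a)))
    return worst

max_gap = check_identity()
```

Under these assumptions the gap should be at the level of floating-point rounding, matching $\|U^\top\hat{v}\|^2(1 - \|U^\top\hat{v}\|^2)$.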
B Proof of Lemma 4
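Assume that the event $\Gamma_0$ holds and consider any $n \in [n_0, n_1)$. We need the following, which we prove in Appendix B.1.
Proposition 1. For any $n > m$ and any $v \in \mathbb{R}^k$,
$$\frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge \left(\frac{m}{n}\right)^{3c} \cdot \frac{\|U^\top Y_m v\|}{\|Y_m\|}.$$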
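From Proposition 1, we know that for any $v \in S$,
$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \ge \frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge \left(\frac{n_0}{n}\right)^{3c}\frac{\|U^\top Y_0 v\|}{\|Y_0\|},$$
where $(n_0/n)^{3c} \ge (n_0/n_1)^{3c} \ge (1/c_1)^{3c}$ for the constant $c_1$ given in Remark 1. As $Y_0 = Q_0$ and $\|Q_0\| = 1 = \|Q_0 v\|$, we obtain
$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \ge \frac{\|U^\top Q_0 v\|}{c_1^{3c}\,\|Q_0 v\|} \ge \frac{\sqrt{1-\rho_0}}{c_1^{3c}} = \sqrt{\frac{\bar{c}}{c_1^{6c}\,kd}}.$$
Therefore, assuming $\Gamma_0$, we always have
$$\Phi_n = \max_{v}\left(1 - \frac{\|U^\top Y_n v\|^2}{\|Y_n v\|^2}\right) \le 1 - \frac{\bar{c}}{c_1^{6c}\,kd} = \rho_1.$$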
B.1 Proof of Proposition 1
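Recall that for any $n$, $Y_n = Y_{n-1} + \gamma_n x_n x_n^\top Y_{n-1}$ and $\|x_n x_n^\top\| \le 1$. Then for any $v \in \mathbb{R}^k$,
$$\frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge \frac{\|U^\top Y_{n-1} v\| - \gamma_n\|U^\top Y_{n-1} v\|}{\|Y_{n-1}\| + \gamma_n\|Y_{n-1}\|},$$
which is
$$\frac{1-\gamma_n}{1+\gamma_n}\cdot\frac{\|U^\top Y_{n-1} v\|}{\|Y_{n-1}\|} \ge e^{-3\gamma_n}\,\frac{\|U^\top Y_{n-1} v\|}{\|Y_{n-1}\|},$$
using the fact that $1 - x \ge e^{-2x}$ for $0 \le x \le 1/2$, together with $1 + x \le e^{x}$ and $\gamma_n \le 1/2$.
Then by induction, we have
$$\frac{\|U^\top Y_n v\|}{\|Y_n\|} \ge e^{-3\sum_{t>m}^{n}\gamma_t}\cdot\frac{\|U^\top Y_m v\|}{\|Y_m\|}.$$
The Proposition follows as
$$e^{-3\sum_{t>m}^{n}\gamma_t} = e^{-3c\sum_{t>m}^{n}\frac{1}{t}} \ge \left(\frac{m}{n}\right)^{3c},$$
using the fact that $\sum_{t>m}^{n}\frac{1}{t} \le \int_m^n \frac{1}{x}\,dx = \ln\left(\frac{n}{m}\right)$.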
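The two elementary facts behind Proposition 1 can be checked numerically. This is a small sketch, assuming the step size $\gamma_t = c/t$ implied by the last display. The helper names and the particular values of $c$, $m$, $n$ are illustrative and not taken from the paper.

```python
import math

def harmonic_tail(m, n):
    """Sum of 1/t for t = m+1, ..., n."""
    return sum(1.0 / t for t in range(m + 1, n + 1))

def check_proposition_step(c, m, n):
    """Return (sum <= ln(n/m), exp(-3c * sum) >= (m/n)^(3c))."""
    s = harmonic_tail(m, n)
    log_bound_ok = s <= math.log(n / m)
    factor_ok = math.exp(-3.0 * c * s) >= (m / n) ** (3.0 * c)
    return log_bound_ok, factor_ok

# Per-step inequality (1 - x)/(1 + x) >= exp(-3x) on a grid of x in [0, 1/2].
xs = [i / 1000 for i in range(0, 501)]
step_ok = all((1 - x) / (1 + x) >= math.exp(-3 * x) for x in xs)

# Accumulated bound for a few illustrative (c, m, n) triples.
results = [check_proposition_step(c, m, n)
           for c in (0.5, 1.0, 3.0)
           for m, n in ((1, 2), (5, 50), (100, 1000))]
```

Both checks are expected to pass for any $n > m \ge 1$, since the harmonic tail is dominated by the integral of $1/x$.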
C Proof of Lemma 5
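According to Lemma 3, our $\Phi_n^{(v)}$'s satisfy the same recurrence relation as the functions $\Psi_n$'s of Balsubramani et al. (2013). We can therefore have the following, which we prove in Appendix C.1.
Lemma 9. Let $\hat{\rho}_i = \rho_i/\lceil e^{5/c_0}\rceil^{c_0(1-\rho_i)}$. Then for any $u \in S$ and $\alpha_i \ge 12c^2/n_{i-1}$,
$$\Pr\left[\sup_{n \ge n_i}\Phi_n^{(u)} \ge \hat{\rho}_i + \alpha_i \,\middle|\, \Gamma_i\right] \le e^{-\Omega\left((\alpha_i^2/(c^2\rho_i))\,n_{i-1}\right)}.$$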
Our goal is to bound $\Pr[\neg\Gamma_{i+1} \mid \Gamma_i]$, which is
$$\Pr\left[\exists v \in S : \sup_{n_i \le n < n_{i+1}}\Phi_n^{(v)} \ge \rho_{i+1} \,\middle|\, \Gamma_i\right].$$
As discussed before, we cannot directly apply a union bound on the bound in Lemma 9 as there are infinitely many $v$'s in $S$. Instead, we look for a small "$\epsilon$-net" $D_i$ of $S$, with the property that any $v \in S$ has some $u \in D_i$ with $\|v - u\| \le \epsilon$. Such a $D_i$ with $|D_i| \le (1/\epsilon)^{O(k)}$ is known to exist (see e.g. Milman and Schechtman (1986)).
Then what we need is that when $v$ and $u$ are close, $\Phi_n^{(v)}$ and $\Phi_n^{(u)}$ are close as well. This is guaranteed by the following, which we prove in Appendix C.2.
Lemma 10. Suppose $\Gamma_i$ happens. Then for any $n \in [n_i, n_{i+1})$, any $\epsilon \le \sqrt{1-\rho_i}/(2c_1^{6c})$, and any $u, v \in S$ with $\|u - v\| \le \epsilon$, we have
$$\left|\Phi_n^{(v)} - \Phi_n^{(u)}\right| \le 16\,c_1^{6c}\,\epsilon/\sqrt{1-\rho_i}.$$
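According to this, we can choose $\alpha_i = (\rho_{i+1} - \hat{\rho}_i)/2$ and $\epsilon = \alpha_i\sqrt{1-\rho_i}/(16c_1^{6c})$ so that with $\|u - v\| \le \epsilon$, we have $|\Phi_n^{(v)} - \Phi_n^{(u)}| \le \alpha_i$. This means that given any $v \in S$ with $\Phi_n^{(v)} \ge \rho_{i+1}$, there exists some $u \in D_i$ with $\Phi_n^{(u)} \ge \rho_{i+1} - \alpha_i = \hat{\rho}_i + \alpha_i$. As a result, we can now apply a union bound over $D_i$ and have
$$\Pr[\neg\Gamma_{i+1} \mid \Gamma_i] \le \sum_{u \in D_i}\Pr\left[\sup_{n \ge n_i}\Phi_n^{(u)} \ge \hat{\rho}_i + \alpha_i \,\middle|\, \Gamma_i\right]. \qquad (7)$$
To bound this further, consider the following two cases.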
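First, for the case of $i < \pi_1$, we have $\rho_i \ge 3/4$ and $\eta_i = 1 - \rho_i \le 1/4$, so that
$$\hat{\rho}_i \le \rho_i e^{-5(1-\rho_i)} = (1-\eta_i)e^{-5\eta_i} \le e^{-6\eta_i} \le 1 - 3\eta_i.$$
Then $\alpha_i \ge \left((1-2\eta_i) - (1-3\eta_i)\right)/2 = \eta_i/2$, which is at least $12c^2/n_{i-1}$, as $\eta_i \ge \eta_1 \ge \bar{c}/(c_1^{6c}\,kd)$ and $n_{i-1} \ge n_0 = \hat{c}^{c}k^3d^2\log d$ for a large enough constant $\hat{c}$. Therefore, we can apply Lemma 9 and the bound in (7) becomes
$$(c_1^{c}/\eta_i)^{O(k)}\,e^{-\Omega\left((\eta_i^2/c^2)\,n_{i-1}\right)} \le \frac{\delta_0}{2(i+1)^2}.$$
Next, for the case of $i \ge \pi_1$, we have $\rho_i \le 3/4$ so that
$$\hat{\rho}_i \le \rho_i/\lceil e^{5/c_0}\rceil^{c_0/4} \le \rho_i/\lceil e^{5/c_0}\rceil^{3},$$
as $c_0 \ge 12$ by assumption. Since $\rho_{i+1} \ge \rho_i/\lceil e^{5/c_0}\rceil^{2}$, this gives us $\alpha_i \ge \rho_i\left(\lceil e^{5/c_0}\rceil^{-2} - \lceil e^{5/c_0}\rceil^{-3}\right)/2$, which is at least $12c^2/n_{i-1}$, as $\rho_i$, according to our choice, is about $c_2(c^3 k\log n_{i-1})/(n_{i-1}+1)$ for a large enough constant $c_2$. Thus, we can apply Lemma 9 and the bound in (7) becomes
$$(c_1^{c}/\rho_i)^{O(k)}\,e^{-\Omega\left((\rho_i/c^2)\,n_{i-1}\right)} \le \frac{\delta_0}{2(i+1)^2}. \qquad (8)$$
This completes the proof of Lemma 5.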
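The chain of elementary inequalities in the first case of the proof above can be checked on a grid. The sketch below is illustrative only: it fixes $c_0 = 12$ (the smallest value allowed by the stated assumption $c_0 \ge 12$), and the helper names and grid resolution are arbitrary choices, not part of the paper.

```python
import math

def rho_hat(rho, c0):
    """rho / ceil(e^(5/c0))^(c0 * (1 - rho)), as in Lemma 9."""
    return rho / math.ceil(math.exp(5.0 / c0)) ** (c0 * (1.0 - rho))

def chain_holds(eta):
    """(1 - eta) e^(-5 eta) <= e^(-6 eta) <= 1 - 3 eta, up to rounding."""
    a = (1 - eta) * math.exp(-5 * eta)
    b = math.exp(-6 * eta)
    c = 1 - 3 * eta
    return a <= b + 1e-15 and b <= c + 1e-15

# eta ranges over [0, 1/4], i.e. rho = 1 - eta ranges over [3/4, 1].
etas = [i / 4000 for i in range(0, 1001)]
all_ok = all(chain_holds(eta) for eta in etas)

# First step of the chain, rho_hat <= rho * e^(-5(1 - rho)), for c0 = 12.
rhos = [0.75 + i / 4000 for i in range(0, 1001)]
first_step_ok = all(
    rho_hat(rho, 12) <= rho * math.exp(-5 * (1 - rho)) + 1e-15 for rho in rhos
)
```

A grid check is of course not a proof, but it confirms that $\eta_i \le 1/4$ is the range on which $e^{-6\eta_i} \le 1 - 3\eta_i$ is being used.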
C.1 Proof of Lemma 9
By Lemma 3, the random variables $\Phi_n^{(v)}$'s satisfy the same recurrence relation of Balsubramani et al. (2013) for their random variables $\Phi_n$'s. Thus, we can follow their analysis (in particular, their proofs for Lemma 2.9 and Lemma 2.10), but use our better bound on $|Z_n|$, and have the following.
First, when given $\Gamma_i$, we have $|Z_n| \le 2\gamma_n\sqrt{\rho_i}$ for $n_{i-1} \le n < n_i$. Then one can easily modify the analysis in Balsubramani et al. (2013) to show that for any $t \ge 0$,
$$\mathbb{E}\left[e^{t\Phi_{n_i}^{(v)}} \,\middle|\, \Gamma_i\right] \le \exp\left(t\hat{\rho}_i + c^2(6t + 2t^2\rho_i)\left(\frac{1}{n_{i-1}} - \frac{1}{n_i}\right)\right),$$
by noting that $(n_i+1)/(n_{i-1}+1) = \lceil e^{5/c_0}\rceil$ and $n \ge n_0 = \hat{c}^{c}k^3d^2\log d$ according to our choice of parameters.
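Next, following Balsubramani et al. (2013) and applying Doob's martingale inequality, we obtain
$$\Pr\left[\sup_{n \ge n_i}\Phi_n^{(v)} \ge \hat{\rho}_i + \alpha_i \,\middle|\, \Gamma_i\right] \le \mathbb{E}\left[e^{t\Phi_{n_i}^{(v)}} \,\middle|\, \Gamma_i\right]\exp\left(-t(\hat{\rho}_i + \alpha_i) + \frac{c^2}{n_i}(6t + 2t^2\rho_i)\right)$$
$$\le \exp\left(-t\alpha_i + \frac{c^2}{n_{i-1}}(6t + 2t^2\rho_i)\right) \le \exp\left(-\frac{t\alpha_i}{2} + \frac{2c^2t^2\rho_i}{n_{i-1}}\right),$$
as $\alpha_i \ge \frac{12c^2}{n_{i-1}}$. Finally, by choosing $t = \frac{\alpha_i n_{i-1}}{8c^2\rho_i}$, we have the lemma.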
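The last two steps can be verified with exact rational arithmetic. In the sketch below, the numerical values of $\alpha_i$, $\rho_i$, $c$ and $n_{i-1}$ are arbitrary illustrative choices (picked only so that $\alpha_i \ge 12c^2/n_{i-1}$ holds), and the function names are hypothetical. It evaluates the exponent at the chosen $t$ and compares it with $-\alpha_i^2 n_{i-1}/(32c^2\rho_i)$, which is one way to read the $\Omega(\cdot)$ in Lemma 9.

```python
from fractions import Fraction

def exponent(t, alpha, rho, c, n_prev):
    """Final exponent: -t*alpha/2 + 2*c^2*t^2*rho/n_prev."""
    return -t * alpha / 2 + 2 * c**2 * t**2 * rho / n_prev

def chosen_t(alpha, rho, c, n_prev):
    """The choice t = alpha * n_prev / (8 * c^2 * rho) from the proof."""
    return alpha * n_prev / (8 * c**2 * rho)

# Illustrative values only; they satisfy alpha >= 12 c^2 / n_prev.
alpha, rho, c, n_prev = Fraction(1, 10), Fraction(1, 2), Fraction(3), Fraction(10_000)
condition_ok = alpha >= 12 * c**2 / n_prev

t = chosen_t(alpha, rho, c, n_prev)
# Middle expression of the chain: -t*alpha + (c^2/n_prev)(6t + 2 t^2 rho).
middle = -t * alpha + c**2 / n_prev * (6 * t + 2 * t**2 * rho)
value = exponent(t, alpha, rho, c, n_prev)
closed_form = -alpha**2 * n_prev / (32 * c**2 * rho)
```

With this $t$, the exponent equals the closed form exactly, and the middle expression is no larger than it precisely because $6c^2t/n_{i-1} \le t\alpha_i/2$.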
C.2 Proof of Lemma 10
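Assume without loss of generality that $\Phi_n^{(v)} \le \Phi_n^{(u)}$ (otherwise, we switch $v$ and $u$), so that
$$\left|\Phi_n^{(v)} - \Phi_n^{(u)}\right| = \frac{\|U^\top Y_n v\|^2}{\|Y_n v\|^2} - \frac{\|U^\top Y_n u\|^2}{\|Y_n u\|^2}.$$
As $\|v - u\| \le \epsilon$, we have
$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \le \frac{\|U^\top Y_n u\| + \epsilon\|U^\top Y_n\|}{\|Y_n u\| - \epsilon\|Y_n\|}. \qquad (9)$$
To relate this to $\frac{\|U^\top Y_n u\|^2}{\|Y_n u\|^2}$, we would like to express $\|U^\top Y_n\|$ in terms of $\|U^\top Y_n u\|$ and $\|Y_n\|$ in terms of $\|Y_n u\|$. For this, note that both $\|U^\top Y_n u\|/\|U^\top Y_n\|$ and $\|Y_n u\|/\|Y_n\|$ are at least $\|U^\top Y_n u\|/\|Y_n\|$, which by Proposition 1 is at least
$$\left(\frac{n_{i-1}}{n}\right)^{3c}\frac{\|U^\top Y_{n_{i-1}} u\|}{\|Y_{n_{i-1}}\|} \ge c_1^{-6c}\,\frac{\|U^\top Y_{n_{i-1}} u\|}{\|Y_{n_{i-1}}\|}, \qquad (10)$$
using the fact that $n_{i-1}/n \ge n_{i-1}/n_{i+1} \ge 1/c_1^2$. Then as $Y_{n_{i-1}} = Q_{n_{i-1}}$ and $\|Q_{n_{i-1}}\| = \|Q_{n_{i-1}} u\|$, the right-hand side of (10) becomes
$$c_1^{-6c}\,\frac{\|U^\top Q_{n_{i-1}} u\|}{\|Q_{n_{i-1}} u\|} = c_1^{-6c}\sqrt{1 - \Phi_{n_{i-1}}^{(u)}} \ge c_1^{-6c}\sqrt{1-\rho_i},$$
given $\Gamma_i$. What we have obtained so far is a lower bound for both $\|U^\top Y_n u\|/\|U^\top Y_n\|$ and $\|Y_n u\|/\|Y_n\|$. Plugging this into (9), with $\hat{\epsilon} = c_1^{6c}\,\epsilon/\sqrt{1-\rho_i}$, we get
$$\frac{\|U^\top Y_n v\|}{\|Y_n v\|} \le \frac{\|U^\top Y_n u\|(1+\hat{\epsilon})}{\|Y_n u\|(1-\hat{\epsilon})}.$$
As a result, we have
$$\left|\Phi_n^{(v)} - \Phi_n^{(u)}\right| \le \frac{\|U^\top Y_n u\|^2}{\|Y_n u\|^2}\left(\frac{(1+\hat{\epsilon})^2}{(1-\hat{\epsilon})^2} - 1\right) \le 16\hat{\epsilon},$$
since $\frac{(1+\hat{\epsilon})^2}{(1-\hat{\epsilon})^2} - 1 \le \frac{4\hat{\epsilon}}{(1-\hat{\epsilon})^2} \le 16\hat{\epsilon}$ for $\hat{\epsilon} \le 1/2$.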
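The final scalar inequality can be checked on a grid of values of $\hat{\epsilon}$ in $[0, 1/2]$. The sketch below is illustrative (the function name and the grid are arbitrary). It also shows that the first step is in fact an equality, since $(1+\hat{\epsilon})^2 - (1-\hat{\epsilon})^2 = 4\hat{\epsilon}$.

```python
def ratio_gap(e):
    """(1 + e)^2 / (1 - e)^2 - 1 for 0 <= e < 1."""
    return (1 + e) ** 2 / (1 - e) ** 2 - 1

# Grid of e-hat values in [0, 1/2].
grid = [i / 1000 for i in range(0, 501)]

# First step: the gap equals 4e / (1 - e)^2 (up to rounding).
first_ok = all(abs(ratio_gap(e) - 4 * e / (1 - e) ** 2) < 1e-12 for e in grid)

# Second step: 4e / (1 - e)^2 <= 16e, using (1 - e)^2 >= 1/4 on this range.
second_ok = all(4 * e / (1 - e) ** 2 <= 16 * e + 1e-12 for e in grid)
```

The bound is tight at $\hat{\epsilon} = 1/2$, which is consistent with the requirement $\epsilon \le \sqrt{1-\rho_i}/(2c_1^{6c})$ in Lemma 10.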
D Proof of Lemma 7
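As $\cos(U, Q_{i-1})^2 = \frac{1}{1+\tan(U, Q_{i-1})^2} \ge \frac{1}{1+\varepsilon_{i-1}^2} \ge \beta_i^2$, we have $\|G_i\| \le 4\beta_i \le 4\cos(U, Q_{i-1})$. Thus, we can apply Lemma 6 and have
$$\tan(U, AQ_{i-1} + G_i) \le \max\left(\beta_i, \max(\beta_i, \gamma)\,\varepsilon_{i-1}\right),$$
which is at most $\max(\beta_i, \gamma\varepsilon_{i-1}) \le \gamma\varepsilon_{i-1} = \varepsilon_i$. The lemma follows as $\tan(U, Q_i) = \tan(U, AQ_{i-1} + G_i)$.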
E Proof of Lemma 8
Let $\rho = 4\beta_i$ and note that $\|G_i\| \le \|A - F_i\|$, where $F_i$ is the average of $|I_i|$ i.i.d. random matrices, each with mean $A$. Recall that $\|A\| \le 1$ by Assumption 1. Then from a matrix Chernoff bound, we have
$$\Pr[\|G_i\| > \rho] \le \Pr[\|A - F_i\| > \rho] \le d\,e^{-\Omega(\rho^2|I_i|)} \le \delta_i,$$
for $|I_i|$ given in (3).
F Proof of Lemma 9
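Let $L$ be the iteration number such that $\varepsilon_{L-1} > \varepsilon$ and $\varepsilon_L \le \varepsilon$. Note that with $\varepsilon_L = \varepsilon_0\gamma^L = \varepsilon_0\left(1 - (\lambda-\bar{\lambda})/\lambda\right)^{L/4} \le \varepsilon_0 e^{-L(\lambda-\bar{\lambda})/(4\lambda)}$, we can have
$$L \le O\left(\frac{\lambda}{\lambda-\bar{\lambda}}\log\frac{\varepsilon_0}{\varepsilon}\right) \le O\left(\frac{\lambda}{\lambda-\bar{\lambda}}\log\frac{d}{\varepsilon}\right).$$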
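As the number of samples in iteration $i$ is
$$|I_i| = O\left(\frac{\log(d/\delta_i)}{(\lambda-\bar{\lambda})^2\beta_i^2}\right) \le O\left(\frac{\log(di)}{(\lambda-\bar{\lambda})^2\beta_i^2}\right),$$
the total number of samples needed is
$$\sum_{i=1}^{L}|I_i| \le O\left(\frac{\log(dL)}{(\lambda-\bar{\lambda})^2}\right)\cdot\sum_{i=1}^{L}\frac{1}{\beta_i^2}.$$
With $\beta_i = \min\left(\gamma/\sqrt{1+\varepsilon_{i-1}^2},\ \gamma\varepsilon_{i-1}\right)$, one sees that for some $i_0 \le O(\log d)$, $\beta_i = \gamma/\sqrt{1+\varepsilon_{i-1}^2}$ when $i \le i_0$ and $\beta_i = \gamma\varepsilon_{i-1} = \varepsilon_i$ when $i > i_0$. This implies that
$$\sum_{i=1}^{L}\frac{1}{\beta_i^2} = \sum_{i=1}^{i_0}\frac{1+\varepsilon_{i-1}^2}{\gamma^2} + \sum_{i=i_0+1}^{L}\frac{1}{\varepsilon_i^2}, \qquad (11)$$
where the first sum in the right-hand side of (11) is
$$\frac{i_0}{\gamma^2} + \sum_{i=1}^{i_0}\varepsilon_0^2\gamma^{2i-4} \le \frac{O(\log d)}{\gamma^2} + \frac{\varepsilon_0^2}{\gamma^2(1-\gamma^2)},$$
while the second sum is
$$\sum_{i=i_0+1}^{L}\frac{\gamma^{2(L-i)}}{\varepsilon_L^2} \le \frac{1}{(1-\gamma^2)\varepsilon_L^2} \le \frac{1}{\gamma^2(1-\gamma^2)\varepsilon^2},$$
using the fact that $\varepsilon_L = \gamma\varepsilon_{L-1} \ge \gamma\varepsilon$. Since $\gamma^2 = \left(1 - \frac{\lambda-\bar{\lambda}}{\lambda}\right)^{1/2} \le 1 - \frac{\lambda-\bar{\lambda}}{2\lambda}$, we have $\frac{1}{1-\gamma^2} \le \frac{2\lambda}{\lambda-\bar{\lambda}}$, and since $\lambda \le O(\bar{\lambda})$, we also have $\frac{1}{\gamma^2} \le O(1)$. Moreover, as we assume that $\varepsilon \le 1/\sqrt{kd}$, we can conclude that the total number of samples needed is at most
$$\sum_{i=1}^{L}|I_i| \le O\left(\frac{\log(dL)}{(\lambda-\bar{\lambda})^2}\right)\cdot O\left(\frac{\lambda}{(\lambda-\bar{\lambda})\varepsilon^2}\right) \le O\left(\frac{\lambda\log(dL)}{\varepsilon^2(\lambda-\bar{\lambda})^3}\right).$$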
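The geometric-sum bound on the second sum in (11) can be illustrated with a short numerical sketch. The values of $\varepsilon_0$, $\gamma$ and $\varepsilon$ below are arbitrary illustrative choices, not parameters from the paper, and the sum is taken over all $i = 1, \dots, L$ (that is, with $i_0 = 0$), which can only make it larger than the tail starting at $i_0 + 1$.

```python
def schedule(eps0, gamma, eps):
    """Return [eps_0, ..., eps_L] with eps_i = eps0 * gamma**i,
    stopping at the first index L with eps_L <= eps."""
    seq = [eps0]
    while seq[-1] > eps:
        seq.append(seq[-1] * gamma)
    return seq

# Illustrative values only (requires eps0 > eps and 0 < gamma < 1).
eps0, gamma, eps = 10.0, 0.9, 0.01
seq = schedule(eps0, gamma, eps)
L = len(seq) - 1

# Sum of 1/eps_i^2 over i = 1..L, an upper bound on the tail from i0 + 1.
tail = sum(1.0 / seq[i] ** 2 for i in range(1, L + 1))
# Intermediate bound 1 / ((1 - gamma^2) eps_L^2).
mid = 1.0 / ((1 - gamma**2) * seq[L] ** 2)
# Final bound 1 / (gamma^2 (1 - gamma^2) eps^2), using eps_L >= gamma * eps.
bound = 1.0 / (gamma**2 * (1 - gamma**2) * eps**2)
```

The sum is dominated by its last few terms, which is why the total sample count scales as $1/\varepsilon^2$ times $1/(1-\gamma^2)$ rather than picking up an extra factor of $L$.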
References
Balsubramani, A., Dasgupta, S., and Freund, Y. (2013).
The fast convergence of incremental PCA. In Advances in Neural Information Processing Systems.
Milman, V. D. and Schechtman, G. (1986). Asymptotic theory of finite-dimensional normed spaces. Lecture Notes in Mathematics. Springer.