Refining Estimates of Invariant and Deflating Subspaces for Large and Sparse Matrices and Pencils

(1)

(will be inserted by the editor)

Refining Estimates of Invariant and Deflating Subspaces

for Large and Sparse Matrices and Pencils

Hung-Yuan Fan · Peter Chang-Yi Weng · Eric King-wah Chu · Wen-Wei Lin

Abstract We consider the refinement of estimates of invariant (or deflat-ing) subspaces for a large and sparse real matrix A (or pencil A − λB), through some (generalized) nonsymmetric algebraic Riccati equations or their associated (generalized) Sylvester equations via Newton’s method. The crux of the method is the inversion of the unstructured matrix P2>AP2− γIn−m

(or P_l2>(A − γB)Pr2), for some unitary projection P2 (or unitary projections

Pl2, Pr2) from Rn×(n−m), through the efficient and stable inversion of the

struc-tured but near-singular A − γIn (or A − γB). All computations have an O(n)

complexity per iteration, illustrated by some numerical examples.

Keywords deflating subspace, invariant subspace · large-scale problem · Newton’s method · nonsymmetric algebraic Riccati equation · sparse matrix · Sylvester equation

Mathematics Subject Classification (2000) 15A18, 15A22, 15A24, 65F15, 65F50

Version January 15, 2013 Hung-Yuan Fan

Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan E-mail: [email protected]

Peter Chang-Yi Weng

School of Mathematical Sciences, Building 28, Monash University 3800, Australia E-mail: [email protected] (Corresponding author)

Eric King-wah Chu

School of Mathematical Sciences, Building 28, Monash University 3800, Australia E-mail: [email protected]

Wen-Wei Lin

Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan E-mail: [email protected]

(2)

1 Introduction

Consider the eigenvalue problem associated with a large and sparse real matrix A (or pencil A − λB later). From [9] (which unified the approaches in [7, 10, 21]), an estimate of an invariant subspace for A ∈ Rn×n_{can be refined through}

the solution of the nonsymmetric algebraic equation (NARE)

N (R) ≡ A22R − RA11− RA12R + A21= 0, (1)

with Aij = Pi>APj (i, j = 1, 2) and P ≡ [P1, P2] being real unitary. We

shall assume that P is in Householder factors [11, p. 146], so that vector multiplications by P (from the left and the right) can be carried out in O(n) complexity. Here P1 ∈ Rn×m is an accurate estimate to the basis of some

invariant subspace “associated” with A11 (see Theorem 1 below), assuming

that σ(A11) ∩ σ(A22) = ∅; i.e., the subspectra are nonintersecting, thus the

invariant subspace approximated by span(P1) is isolated and well-defined. We

shall assume the conditions for the solvability of (1) (and (25) for matrix pencils in Section 3) are satisfied; see [7, 9, 10, 21], essentially when P1 is an

accurate estimate, and the “residual” A21or the correction R are small, relative

to the gap between A11and A22. For example, we have the following theorem:

Theorem 1 (Stewart [21]) Let P>AP = (Aij) as in (1), with the real unitary

P ≡ [P1, P2]. If δ = sep(A11, A22) ≡ infkXk=1kA11X − XA22k > 0 and

kA21k·kA12k < δ2/4, then there exists a unique matrix R such that the columns

of W = P1+ P2R span an invariant subspace of A and A11+ A12R is the

representation of A in span(W ). The matrix R solves the Riccati equation A22R − RA11− RA12R + A21= 0 in (1) with kRk < 2δ−1kA21k.

Note that a perturbation analysis of (1) was also presented in [21]. We shall also assume that A (as well as B later in Section 3) are large and structured, with vector multiplications by A and (A − γI)−1 (as well as B and (A − γB)−1 in Section 3) being computable in O(n) flops. We assume further that m, the dimension of the invariant subspace, is small relative to n, the dimension of A. Throughout the paper, we put most of our effort on the case when m > 1, as it is relatively simple (c.f. inverse iteration) when m = 1.

We shall attempt the refinement of an estimate for an invariant subspace (REIS) for a real matrix A (or a deflating subspace (REDS) of a real matrix pencil A−λB) by applying Newton’s method to the NARE (1) (or the coupled NARE (25)). The special cases when A and B are symmetric will be discussed briefly, with more structures in the NARE (1) and better efficiency for the associated computations. Note that a symmetric A implies symmetric A11

and A22, and A12 = AT21, giving rise to an interesting NARE (1) with some

symmetry in its defining matrices. Other similar structures, such as skew-symmetry, as well as the more general cases for complex matrices or pencils, can be treated similarly.

The REIS and REDS problems for large and sparse matrices and matrix pencils are important and interesting, because they form a vital part of any eigen-solver; see [1, 20] and the references therein. The REIS problem has also

(3)

been investigated in detail in [2–4, 8], in the context of the continuation of in-variant subspaces (CIS) where parameter-dependent eigenvalue problems have to be solved. We came across the REIS and REDS in [9] when searching for applications of large-scale NAREs. The equation (1) is different from those related to M-matrices, considered in [13, 14]. We are not aware of any substan-tial extension of the work by Demmel and others since 1987. For large-scale sparse matrices and pencils in [5, 20], the NARE (1) (or its equivalence) has been solved via the Arnoldi-Galerkin approach; see also the related work in [2–4, 8, 19] and a comparative operation counts in Section 2.2. We shall follow a basic linear algebraic route, attempting to invert nearly singular matrices after applying Newton’s method to (1) exactly, without any Krylov subspace approximation as in [5, 16, 20]. We believe our new algorithms of O(n) complex-ity per iteration are new and will be useful for refining invariant and deflating subspaces of large and sparse matrices and pencils. Related work for matrix polynomials and periodic matrix pairs will be reported elsewhere.

2 Invariant Subspace of A

Consider the application of Newton’s method on (1) with R0 = 0. We may

solve the REIS problem via Sylvester equations of the form: (for k ≥ 0) S(Rk+1) ≡ eA22Rk+1− Rk+1Ae11= − eA21, (2)

with eA22 ≡ A22− RkA12, eA11 ≡ A11+ A12Rk and eA21 ≡ A21+ RkA12Rk.

On the conditions for the convergence of Newton’s method for (1), please consult [7, 9], which were predated by [22, 23]. Under favourable conditions (such as those in [7, 22, 23]), the iterates Rk will converge to the correction

R quadratically. Note that eA22 is a low-ranked update of A22, for which the

Sherman-Morrison-Woodbury formula (SMWF)

(M + U C−1V>)−1= M−1− M−1U (C + V>M−1U )−1V>M−1 (3) can be applied for its inversion. Also, A11, A12and A21, and thus eA11, eA12and

e

A21 can all be computed efficiently via the products of the structured matrix

A to P1> and P1, and P (in its Householder factors) to vectors, and between

vectors, all in O(n) complexity. The uniquely solvable Sylvester equations (2) are obviously linearizations of the NARE (1).

Remark 1 In the numerical examples in Section 4, we measure the accuracies of the iterates Rk+1≡ Rk+ Ek in terms of the residuals: (for k ≥ 0)

rk+1≡ N (Rk+1) =

h

S(Rk+1) + eA21

i

− EkA12Ek= −EkA12Ek, (4)

with S(Rk+1) + eA21 = 0 because of the Newton step in (2). This implies

that the quadratic convergence of iterates Rk, the errors Ek and the

(4)

raises the possibility of obtaining faster convergence. For example, when A is symmetric, then r0 ≡ A21 = A>12 = O() and E0 = O(), with (4) implying

r1= O(kr0k3) = O(3). We can then perform a “restart” version of Newton’s

method, which runs one Newton iteration, updates the invariant subspace and then repeats the process. The resulting algorithm converges cubically. In gen-eral, we have rk = O(kr0k2

k+1₋₁

) (k ≥ 1), implying quadratic convergence in the limit but better convergence for smaller k’s. For the nonsymmetric case, the “restart” algorithm is “mathematically” equivalent to the asymptotically cubically convergent Algorithm 2.2 in [19, Thereom 2.1], which partly explains the good convergence in the Examples 2 and 4 in Section 4. The same conclu-sions for the REDS can be drawn (in Remark 3).

Remark 2 Newton’s iteration in (2), with R0= 0, is equivalent to

S(Ek) = −rk, Rk+1= Rk+ Ek. (5)

From our numerical experience, (5) yields more accurate iterates and converges faster. We believe that the vastly different sizes of Rk and Ek in Rk+1might

have created unnecessary errors in (2), with round-off errors taking effects after two or three iterations. Although we keep the mathematically equivalent iteration (2) in our discussion, we apply (5) numerically. See also the similar Remark 4 for the REDS below.

2.1 Solution to Sylvester Equations

From Newton’s method for the REIS, we have to solve the Sylvester equation (2). When m = 1, this degenerates to a simple linear equation in Rk+1,

re-quiring the inversion of eA22− eA11I, a low-ranked update of A22− eA11I, as

considered in Section 2.3 later. The solution of (1) degenerates into the inverse iteration for eA22with the shift λ = eA11.

When m > 1 with m n, we shall devise an algorithm to solve the Sylvester equation (2) directly using the structures of A, by the inversion of matrices like A22− γI as in Section 2.3. Consider the small eA11 = QSQ> ∈

Rm×m in Schur decomposition [11, p. 192]. Here Q is unitary, S is upper triangular with sj (j = 1, · · · , m) being the jth diagonal element and ηj ∈

Cj−1 (j = 2, · · · , m; degenerate otherwise) contains elements above sj in S.

We then transform (1) to e

A22Rk+1− Rk+1QSQ>= − eA21

or, with X = Rk+1Q and bA21= eA21Q, to

e

A22X − XS = − bA21. (6)

To solve the Sylvester equation (6), we have, for j = 1, 2, · · · , m:

(5)

where (·)j denotes the jth column, xj= (X)j and X−j = [x1, · · · , xj−1] (j =

2, · · · , m; degenerate otherwise). Note that eA22− sjI = (A22− sjI) − RkA12,

a low-ranked update of A22− sjI, is nonsingular because the spectra of A11

and A22 are nonintersecting and Rk is small. In fact, in many applications,

the spectra are far apart and A22− sjI should be well-conditioned.

2.2 Computational Complexity per Iteration

The solution of the linear systems in (7) can then be realized in O(n) complex-ity through the efficient inversion of A22− sjI discussed in the next section,

with the help of the SMWF. The correction Rk+1 = XQ> can be retrieved

from X in 2m2_{n flops. In terms of other dominant operation counts, there}

will be 2m (or m) linear system solves involving A22− γI in the kth Newton

iteration, when k > 0 (or k = 0). When solving (7) (j = 1, · · · , m), we re-quire M−1U and M−1zj in the SMWF in (3) with M = A22− sjI, U = Rk

(degenerate for k = 0), V> = A12 and C = Im. The evaluation of A22Rk in

N (Rk) for (5) or convergence control will also be of O(n) complexity,

requir-ing efficient vector multiplications by P and A. The counts are similar for the modifications involving real Schur forms in Section 2.4, as well as the REDS in Section 3 later. All these compare favourably with the generalizations of the BEM method [12] in [2, 4], requiring respectively m(2m + 1) solves of n × n systems and 2m + 1 solves of n2× n2_{systems. It is difficult to compare with}

the methods in [5, 20], as the NARE (1) or the associated linearizations (2) are projected by Galerkin-Arnoldi methods, with their accuracy dependent on the quality of the corresponding approximating (rational) Krylov subspaces, and their adequate sizes unknown a priori. It is also unclear why the solutions to (1) or (2) have to be numerically low-ranked, the basis of projection methods. With rational Krylov subspaces generated by (A − γI)−1, similar numbers of solves involving A will obviously be required. In contrast, we approach the ex-act corrections via Newton’s method, without approximating equations, still achieving an efficient O(n) computational complexity per iteration.

For a symmetric A, we have a symmetric A22 and, for k = 0, a diagonal

S, making the solution of (7) more efficient. For k > 0, eA11= A11+ A12Rk is

dominantly symmetric and ηj = O(kRkk), which can be exploited further for

better efficiency. Recall also the comments on convergence in Remark 1.

2.3 Inversion of A22− γI or A22− γB22

The main building block of our algorithms is the inversion of A22− γI (or

A22− γB22 for the REDS), for some γ, through the efficient inversion of the

near-singular A − γI (or A − γB). This is required for the Sylvester equations (2) in Newton’s method.

Note interestingly that near-singular linear systems are routinely modified to well-conditioned ones [6, 12]. We are doing the opposite, solving a smaller

(6)

well-conditioned linear system via a slightly bigger ill-conditioned one, because of the inherent structures in A (and B). This seemingly dangerous practice will be backed up by a error analysis in Section 2.3.1 later.

Ignoring the γ term to simplify the notation and unify the treatment for the REIS and REDS, we assume (with P = Pl= Pr for the REIS for A)

A11A12

A21A22

= P_l>APr. (8)

Again, it is vital that Pl and Pr are stored in Householder factors, so that

the vector multiplication by Pl, Pr can be achieved in O(n) complexity. Let

Cij ≡ Pri>A−1Plj (i, j = 1, 2), we have C11 = (A11− A12A−122A21)−1 or the

Schur complement (P_l>APr)A22 ≡ A11− A12A −1 22A21= C11−1, and P_r>A−1Pl= C ≡ C11C12 C21C22 = A11A12 A21A22 −1 = C11 −C11A12A−122 −A−1₂₂A21C11A−122 + A −1 22A21C11A12A−122 . (9)

Substituting C11, C12and C21 into C22in (9), for some v, we obtain

A−1₂₂v = CC11v ≡ C22v − C21C −1

11C12v. (10)

The products C22v and C12v can be computed through

C11C12 C21C22 0 v = Pr>A−1Pl 0 v ,

which is equivalent to the solution (by, e.g., Gaussian elimination with partial pivoting) of the linear system:

Ay = Pl 0 v , (11) with C12v C22v = Pr>y.

From (10), we also require

C11= (A11− A12A−122A21)−1, C21= −A−122A21C11. (12)

which have only m columns and can be computed efficiently through the linear system (again solvable by Gaussian elimination with partial pivoting):

AZ = Pl Im 0 , (13) with C11 C21 = P_r>Z.

(7)

All the calculations involved in C22v can then be achieved in O(n) flops. With

the invariant subspace being accurate and A21 being small, we have

C₁₁−1= A11+ O(kA21k). (14)

For a symmetric A, we have

C11= C11>, C22= C22>, C12= C21>. (15)

Better efficiency comes from the fact that the symmetric linear equations (11) and (13) can be solved cheaper, and we only need to compute one of C21

and C12. In practice, the symmetry can be utilized to minimize errors and

improve accuracy. For example, after the symmetric C₁₁−1 is calculated, we should symmetrize the result by averaging that with its transpose, allowing some possible error cancellation. See also Remarks 1 and 3.

2.3.1 Error Analysis

In the refinement of invariant and deflating subspaces, A, or respectively A−γI and A − γB with the γ terms, are nearly singular, especially when Newton’s iteration is near its convergence. (It is interesting that [1, 18] contain comments to the effect that inverting nearly singular matrices or solving ill-conditioned linear systems are “essential”, and cannot and should not be avoided, when refining invariant subspaces.) Therefore, we need to analyse the errors and stability associated with the near-singular matrices, similar to that in inverse iteration. Indeed, we shall prove that our computation is backward stable [15], producing an approximation satisfying a nearby problem.

Ignoring the γ terms, consider A in (8) with some small eigenvalues in A11

and σ(A11) ∩ σ(A22) = ∅. From the error analysis of linear equations (see, e.g.,

[15, Section 13.3] or [11, Section 3.4]), the computation in (11) actually yields A11+ δ11A12+ δ12 A21+ δ21A22+ δ22 b y = Pl 0 v , b y1 b y2 ≡ Pr>by ≈ C12v C22v . (16) We shall use the abbreviations ˘Aij ≡ Aij+ δij for i, j = 1, 2, and let ( bCij) ≡

( ˘Aij)−1 (analogous to (9)). The backward errors δij of order cij(n)u, where

cij(n) (i, j = 1, 2) are some low degree polynomials in n and u is the unit

round-off or machine accuracy. For simplicity, let us assume kδijk ≤ cnu for

some small constant cn dependent on n.

We now show that the computed approximation to A−1₂₂ in (10), with bCij

replacing Cij (i, j = 1, 2), is accurate and not affected adversely by the

ill-condition of A or A11. Similar to (10), the definition of bCij leads to

˘

A11Cb11+ ˘A12Cb21= Im, A˘11Cb12+ ˘A12Cb22= 0;

˘

A21Cb11+ ˘A22Cb21= 0, A˘21Cb12+ ˘A22Cb22= In−m.

These then imply b C21= − ˘A−122A˘21Cb11, Cb22= ˘A−122 In−m− ˘A21Cb12 ,

(8)

and subsequently b C₁₁−1 = ˘A11− ˘A12A˘−122A˘21, Cb12= − bC11A˘12A˘−122; b C21 = − ˘A−122A˘21Cb11, Cb22= ˘A−122 + ˘A −1 22A˘21Cb11A˘12A˘−122. (17) Similar to (14), we have b C₁₁−1= A11+ O(kA21k). (18)

For the given v, the computed approximation to A−1₂₂v in (10) is b y2− bC21Cb₁₁−1_by1= bC_C_b₁₁v = ˘A−122v ≈ C22v − C21C11−1C12v, (19) with bC b C11 ≡ bC22− bC21Cb −1

11Cb12, provided that we can compute bC21Cb₁₁−1 in b

C

b

C11v in (19) accurately. Note that ˘A22 is well-conditioned (with a small δ22)

and its inversion should not generate much errors, producing an accurate ap-proximation to A−1₂₂v. The large components in bC12 are cancelled off by the

multiplication by bC₁₁−1 in (18).

Lastly, we shall consider the accurate approximation cW to bC21Cb₁₁−1in (19). From (13), and similar to inverse iteration, the computed solution has the form

b Z = " b C11 b C21 # + Pr ukA−1_kW 11 uW21 ,

where the error mainly lies in span(Pr1) and W11, W21 = O(1). With A−1

replaced by bC11 of similar magnitudes (without modifying the notation for

W11for simplicity), and after pre-multiplication by Pr>, we have the computed

approximation of C21C11−1: c W ≡ ( bC21+ uW21)( bC11+ uk bC11kW11)−1 = ( bC21Cb₁₁−1+ uW21Cb₁₁−1) h I + uk bC11kW11Cb₁₁−1 i−1 = bC21Cb₁₁−1+ ∆21, (20)

where the error term ∆21is dominated by u(W21Cb₁₁−1−k bC11k bC21Cb₁₁−1W11Cb₁₁−1) = O(uκ( bC11)k). We obviously require uκ( bC11)) to be reasonably small relative

to unity, for ∆21to be small or (I + uk bC11kW11Cb₁₁−1)−1to exist. (The effective or structured condition number may be much smaller, with components of varying magnitudes in A or bC11.) This is indeed the case at the beginning of

Newton’s iteration. Near convergence, a possible source of trouble will be from κ( bC11) ≈ κ(A) = O(krkk−1). Then uκ( bC11) = O(1) or krkk = O(u), and there

will be no need for further refinement.

From (17) and (20), A−1₂₂v in (19) now has the computed value

b

y2− cWby1= bC22v − ( bC21Cb

−1

11 + ∆21) bC12v

(9)

where δv ≡ − ˘A22∆21Cb12v. With negligible errors in the subtraction from bC22,

the multiplication by cW (backward stably using inner products [15]), and the multiplications by unitary projections, which modify δv in (21) slightly, the approximation to A−1₂₂v in (21) is exactly the required quantity with A22 and

v perturbed slightly. In other words, except when A11 in (8) being (nearly)

singular at the end of the Newton process, our computation of A−1₂₂v = CC11v

(for some v), with linear equations in (11)–(13) solved by Gaussian elimina-tion with partial pivoting, is backward stable. The above discussion may have suggested higher precision for the solution of (13) but we have found that un-necessary in our numerical experiments. This conclusion is important to our refinement procedure.

Finally, as in (7), we may invert A22− γI in an application of the SMWF

to invert eA22− γI. Since the matrix operators are well-conditioned, the

appli-cation of the SMWF will not create any large numerical errors [24].

2.4 Real Schur Form

When m > 1, a “real” version of the procedure in Section 2.1 is desirable, using the real Schur form of eA11. Instead of the complex conjugate pair of eigenvalues

sj, sj+1= sj and the corresponding eigenvectors xj, xj+1= xj, we have Xj∈

R(n−m)×2, spanning the same column space as [xj, xj+1]. Correspondingly, we

have the real 2 × 2 block Sj in place of sj, sj+1, with ηj, ηj+1∈ Rj−1above the

block Sj. With (·)j1:j2 denoting the j1 to j2 columns and X−j as previously

defined in Section 2.1, (7) becomes e

A22Xj− XjSj = −( bA21)j:j+1+ X−j[ηj, ηj+1]. (22)

Using the Kronecker product, (22) is equivalent to f Mkjv(Xj) = ˜zj, with fMkj≡ I2⊗ eA22− Sj>⊗ In−m, ˜zj ≡ v h −( bA21)j:j+1+ X−j[ηj, ηj+1] i and v[·] stacking columns. The inversion of fMkj, with the similar structures as

e

A22repeated in the two diagonal subblocks, can be realized efficiently in O(n)

complexity. One possibility is to use the diagonal blocks as pivot to eliminate the off-diagonal blocks. Equivalently and more directly, with Sj≡

sj βj1 βj2sj+1 , we have f Mkj= " e A22− sjIn−m −βj2In−m −βj1In−m Ae22− sj+1In−m # (23) = Mkj− Rk 0 0 Rk A12 0 0 A12 , Mkj≡ A22− sjIn−m −βj2In−m −βj1In−m A22− sj+1In−m ,

(10)

where fMkj is a low-ranked update of Mkj and P_l> A − sjIn −βj2In −βj1In A − sj+1In Pr= Pl>NjPr= ∗ ∗ ∗ Mkj , Nj ≡ I2⊗ A − Sj>⊗ In, Pl= Pr= P 0 0 P     Im In−m Im In−m     . (24)

The inversion of fMkj can be realized through the SMWF via the inversion

of Mkj, which can in turn be done by inverting Nj in O(n) computational

complexity, as in Section 2.3. Note that Nj is structured (c.f. A − γIn with

γ = sj in Section 2.3), P is a projection (c.f. P ) and Mkj is the lower-right

corner block in P_l>NjPr (c.f. A22− γI).

3 Deflating Subspaces of A − λB

First, and mathematically, we can always consider the generalized eigenvalue problem (GEP) associated with the matrix pencil A−λB via the matrix (β0A+

α0B)−1(β1A + α1B), after the shift-and-invert strategy for some parameters

αiand βi (i = 0, 1). The techniques for REIS in Section 2 can then be applied,

requiring the inversion of matrices in the form

(β0A + α0B)−1(β2A + α2B), α2= α1− γα0, β2= β1− γβ0,

for some γ as in Section 2.3, still producing algorithms of O(n) computational complexity. In what follows, we continue on a different path, refining the de-flating subspaces directly.

3.1 Generalized NARE and Sylvester Equation

For a large and structured real matrix pencil A − λB, let Pl = [Pl1, Pl2] and

Pr= [Pr1, Pr2] be unitary (in Householder factors), with Pl1, Pr1∈ Rn×m

con-taining the bases of the approximate deflating subspaces, with m n. From [9], with Aij = Pli>APrj and Bij = Pli>BPrj (i, j = 1, 2), the refinement of

estimates of deflating subspaces (REDS) for the GEP, with the exact deflating subspace spanned by the columns of (Pl1+ Pl2L, Pr1+ Pr2R), requires the

solution of the generalized coupled NARE:

e

N (L, R) ≡ A22R − LA11− LA12R + A21 B22R − LB11− LB12R + B21

= 0. (25)

Newton’s method, with L0= R0= 0, yields the coupled Sylvester equations:

e S(Lk+1, Rk+1) ≡ " e A22Rk+1− Lk+1Ae11 e B22Rk+1− Lk+1Be11 # = − " e A21 e B21 # (26)

(11)

where e

A22≡ A22− LkA12, Ae11≡ A11+ A12Rk, Ae21≡ A21+ LkA12Rk;

e

B22≡ B22− LkB12, Be11≡ B11+ B12Rk, Be21≡ B21+ LkB12Rk.

For the conditions of convergence for (26), please consult [9].

Remark 3 Similar to (4), with Rk+1≡ Rk+ Ek and Lk+1≡ Lk+ Fk, we have

the residuals e rk+1= eN (Lk+1, Rk+1) = − FkA12Ek FkB12Ek , r_e0≡ A21 B21 . (27)

For a symmetric pencil A − λB, we have a “restart” version of Newton’s method for the REDS, which runs one Newton step and updates the deflating subspace before repeating. For such an algorithm, we have cubic convergence with _er1 = O(kre0k

3_{). Generally, we have}

e

rk = O(ker0k

2k+1−1_{) (k ≥ 1), giving}

quadratic convergence in the limit but better convergence for smaller ks. Remark 4 Similar to Remark 2, Newton’s iteration in (26), with L0, R0 = 0,

is equivalent to e

S(Fk, Ek) = −erk, (Lk+1, Rk+1) = (Lk+ Fk, Rk+ Ek). (28) From our numerical experience, (28) yields more accurate iterates and con-verges faster. Again, we believe that the vastly different sizes of (Lk, Rk) and

(Fk, Ek) in (Lk+1, Rk+1) might have created unnecessary errors in (26). We

shall keep (26) in our discussion but apply (28) numerically. When m = 1, (26) degenerates to c Mk Rk+1 Lk+1 = − " e A21 e B21 # , Mck ≡ " e A22− eA11I e B22− eB11I # .

The inversion of cMk can be carried out as in that for cMkj in (30) below.

Again, when m = 1, the solution of (26) degenerates into an inverse iteration for β eA22− α eB22 as in Section 2.3, with α = eA11 and β = eB11.

When m > 1, we shall propose a method for the solution of (26) using the structures of A and B, via the inversion of matrices like βA22− αB22, as in

Section 2.3. Consider the (real) generalized Schur decomposition [11, p. 253] for eA11, eB11∈ Rm×m:

e

A11= Q1SAQ>2, Be11= Q1SBQ>2,

with Q1and Q2being unitary, and SAand SBbeing upper triangular with αj

and βjbeing the diagonal elements, respectively. We denote the elements above

αj and βj by ηj and ζj in SA and SB, degenerate when j = 1, respectively.

The coupled Sylvester equation is then transformed to e

(12)

with X ≡ Rk+1Q2, Y ≡ Lk+1Q1, bA21 = eA21Q2 and bB21 = eB21Q2. For

j = 1, 2, · · · , m, with yj = (Y )j and Y−j = [y1, · · · , yj−1] (degenerate when

j = 1), we have c Mkj xj yj = zj1 zj2 ≡ " −( bA21)j+ Y−jηj −( bB21)j+ Y−jζj # , Mckj≡ " e A22−αjI e B22 −βjI # . (29)

If αj = 0, inverting the upper block-triangular cMkj will be trivial, in terms

of the inverse of eA22 (a special case of the inverses in Section 2.3). Assume

without loss of generality that αj6= 0 (note that αjβj6= 0 for a regular matrix

pencil A − λB), we have c Mkj= I 0 βjα−1j I α −1 j I " e A22−αjI b Nj 0 # , Nbj≡ αjBe22− βjAe22. (30)

The inversion of cMkj can again be realised via that of bNj, as in Section 2.3.

Note that b

Nj≡ αjBe22− βjAe22= (αjB22− βjA22) − Lk(αjB12− βjA12)

is a low-ranked update of βjA22− αjB22. Again, the inversion of cMkj and

the evaluation of (A22Rk, B22Rk) in eN (Lk, Rk) for (28) or convergence

con-trol have O(n) computational complexity, from the vector multiplication and inversion of structured matrices. The matrix bNjin (30) is nonsingular because

σ(A11, B11) ∩ σ(A22, B22) = ∅ and Lk is small. From (29) and (30), we have

b

Njxj= −βjzj1+ αjzj2, yj = α−1j ( eA22xj− zj1). (31)

After obtaining (X, Y ), we can retrieve (Rk+1, Lk+1) = (XQ>2, Y Q>1).

For a symmetric pencil, we have the symmetry associated with A and B in (15). Similar to the computations in Section 2 for the REIS, efficiency can be gained from the symmetry of A, B, Aiiand Bii (i = 1, 2), the properties Pli=

Pri(i = 1, 2), Q1= Q2, A21= A12> and B12= B21>, SAand SB being diagonal

(when k = 0), and ηj and ζj being small. For example, bNj in (30) will be

dominantly symmetric. Recall also the remark on convergence in Remark 3.

3.2 Generalized Real Schur Form

When m > 1, a “real” version of Section 3.1 is possible, using the real gener-alized Schur form of ( eA11, eB11), similar to Section 2.4 for the REIS. With the

obvious extension of notation similar to those in Section 2.4, we have e

A22Xj− YjSαj = ˆzj1≡ −( bA21)j:j+1+ Y−j[ηj, ηj+1],

e

(13)

where Sαj, Sβj ∈ R2×2 and Sβj is upper triangular: Sαj= αj11 αj12 αj21 αj22 , Sβj = βj11 βj12 βj21 βj22 .

Applying the Kronecker product, the resulting linear equation has the form

c Mkj v(X) v(Y ) = v(ˆzj1) v(ˆzj2) , Mckj= " I2⊗ eA22−S>αj⊗ In−m I2⊗ eB22−Sβj> ⊗ In−m # . (33)

Note that Sαj and Sβj cannot be singular simultaneously, assuming the

reg-ularity of A − λB. Without loss of generality, let Sαj be nonsingular and we

have, similar to (30), c Mkj= _I ₀ S_βj>S_αj−>⊗ In−m I " I2⊗ eA22−Sαj> ⊗ In−m ˘ Nj 0 # , with ˘Nj ≡ I2 ⊗ eB22− S>βjS −>

αj ⊗ eA22. Again, the inversion of cMkj can be

realised in terms of ˘ Nj = " e B22− τ (j) 11 Ae22 −τ (j) 12Ae22 −τ₂₁(j)Ae22 Be22− τ (j) 22Ae22 # , Sβj>Sαj−>= " τ₁₁(j) τ₁₂(j) τ₂₁(j) τ₂₂(j) # . (34)

If the approximate deflating subspaces are reasonably accurate, thus σ(Sαj, Sβj)

⊆ σ( eA11, eB11) does not intersect σ( eA22, eB22), ˘Nj will be nonsingular and

well-conditioned, as it is the following generalized Sylvester operator in disguise:

e

B22Z − eA22Z(S−1αjSβj) = eB22(ZSαj−1)Sαj− eA22(ZS−1αj)Sβj.

As ˘Nj is nonsingular, no two subblocks on the same row- or column-blocks

can be singular simultaneously, enabling the inversion to be pivoted at one of the subblocks. In turn, all the subblocks are low-ranked updates of matrices of the form A22− γB22, allowing the inversion to benefit from the structures in

A − γB in O(n) computational complexity, as in Section 2.3. Alternatively, ˘Nj

in (34) can also be considered as a generalization of fMkj in (23) and treated

similarly. As in (24), for some orthogonal projections ˘Pl2and ˘Pr2, we have

˘ Nj= I2⊗ eB22− (Sβj>S −> αj ) ⊗ eA22= ˘Pl2> h I2⊗ eB − (Sβj>S −> αj ) ⊗ eAi ˘Pr2.

Similar to (31), (32) or (33) imply that ˘

(14)

4 Numerical Examples 4.1 General Description

All the numerical results have been computed using a HP workstation xw4600 with an Intel Core 2 Quad CPU 2.83 GHz and 8 GB RAM.

The examples have been generated randomly, where A and B have preset sparsity. Subspace iteration is first applied to obtain the invariant or deflating subspaces corresponding to m dominant eigenvalues. These are then artifi-cially perturbed before our refinement procedures are applied. The numerical results are presented in Tables 1–4, where examples for n = 1000, 2000, 4000, 8000, 16000, 32000 and 64000 are presented. For individual examples, the 2-norms of the residuals in (4) or (27) of the NAREs (1) or (25) corresponding to the iterates, and the respective execution time information, are presented in columns 3–6. Here, dt(n)_k denotes the execution time for the kth iteration for the particular value n, t(n)_k =Pk

j=1dt (n)

j is the accumulated execution time

up to and including the kth iteration and the ratios in the 6th column reflect the computational complexity. On the right of the tables, the accuracy of the eigenvalues corresponding to the converged invariant/deflating subspaces are summarized with the eigenvalues and their relative errors. For the nonsym-metric Example 2 for the REIS in Tables 2, the ratios t(4n)_k /t(n)_k are computed in column 6, instead of t(2n)_k /t(n)_k as in other tables. It is the result of the changing structures of the dominant eigenvalues, making some ratios reflect-ing the computational complexity poorly. Consequently, the ratios in brackets, for eigenvalues with different structures, should be considered skeptically. For Example 2 in Table 2, the ratios of t(4n)_k /t(n)_k for n = 8000 compare the exe-cution times for the groups of dominant eigenvalues with one or two groups of complex eigenvalues. The computations in Section 2.4 are different for the two groups of eigenvalues, making the ratio reflecting the computational com-plexity badly. We expect a ratio of 4 to reflect the O(n) comcom-plexity but the two pairs of complex eigenvalues force the ratio higher. The same comments apply to Table 4 for the nonsymmetric Example 4 for the REDS. Lastly, any additional iterations will not improve the accuracy because of round-off errors. As a result, some examples have three iterates presented, others have two.

4.2 Particular Comments

We have Example 1, a symmetric example for REIS, in Table 1. The symmetric matrix A has zero components except on the five diagonals, which are

[10, 20, · · · , 10m, δ1, · · · , δn−m]>∈ Rn, c ∈ Rn−1, d ∈ Rn−2,

where [δ1, · · · , δn−m]>, c and d have components from the standard normal

(15)

The convergence of the invariant subspaces is (at least) quadratic in terms of the residuals rk in (4). The computation for REIS is shown to be efficient,

requiring only 6.13 seconds for the largest example with n = 64000. The O(n) computational complexity is demonstrated in the ratios in column 6, for n ≤ 16000. Somehow, higher ratios have been produced for larger n’s, possibly because of the swapping of pages of virtual memory. Similar observations also hold for other examples. The converged eigenvalues are accurate up to machine accuracy, as shown in the last two columns of Table 1.

For other examples, we have presented Tables 2–4, similar to Table 1. Example 2 for REIS in Table 2 is nonsymmetric, and Example 3 (or 4) is for the REDS for a symmetric (or nonsymmetric) matrix pencil. In Example 2, the nonsymmetric matrix A is similar to that in Example 1 with the same main diagonal, but with sub-diagonals a ∈ Rn−2

and b ∈ Rn−1_{and super-diagonals}

c ∈ Rn−1

and d ∈ Rn−2_{, where the components of a and c are from N (0, 1),}

while those of b and d are from N (0, 10). In Example 3, the symmetric A and B are tridiagonal with nonzero components from N (0, 1), which can be defined by the MATLAB command sprandsym [17]. In Example 4, the nonsymmetric tridiagonal A has nonzero components from N (0, 1) and the tridiagonal B has super- and sub-diagonals [2, 3, · · · , n]> and main diagonal d = [d1, · · · , dn]>

(with di∈ N (0, 1)).

The convergence of the invariant/deflating subspaces is again (at least) quadratic in terms of the residuals (4) or (27), and the computational com-plexity has been shown to be O(n). The converged eigenvalues are accurate up to (near-)machine accuracy. Based on Theorem 1, the eigenvalues are com-puted from eA11≡ A11+ A12Rk for REIS (together with eB11≡ B11+ B12Rk

for the REDS, with the most accurate Rk). Their accuracy is typically better

than that of the corresponding invariant or deflating subspaces. Note that, for a symmetric A from Theorem 1, the compression A11+A12R = A11+O(kA21k2)

is the exact representation of A in span(W ). Consequently, any R with an O() error will show up in the compression as an error of O(2_{). A similar}

conclu-sion can be drawn for symmetric matrix pencils. Consequently, the accuracies of the refined invariant/deflating subspaces do not have to be high for the eigenvalues to attain the machine accuracy.

Lastly, as indicated in Remarks 1 and 3, the residuals associated with the first iterates for symmetric A and A−λB in Tables 1 and 3, respectively, exhibit cubic convergence behaviour. Somehow, the first iterates in the other tables for the nonsymmetric examples also exhibit similar behaviour frequently, es-pecially for small initial residuals r0 and _er0. This is the consequence of the

equivalence of the “restart” versions of our algorithms, apart from the vital tricks in Section 2.3, to [19, Algorithm 2.2], which converges asymptotically cu-bically. This super-convergence behaviour and the “restart” Newton’s method will be investigated and explored further in the future.

(16)

n k krkk2 dt(n)k t (n) k t (2n) k /t (n) k λ |δλ/λ| 0 1.34e−02 - - 60.1460 1.18e−15 1000 1 1.04e−08 6.24e−02 6.24e−02 1.00 49.9933 1.42e−15 2 2.75e−16 6.24e−02 1.25e−01 1.50 40.0267 1.07e−15 30.4158 1.87e−15 19.9696 0.00e+00 9.4998 5.61e−16 0 1.14e−02 - - 60.0849 1.42e−15 2000 1 7.77e−09 6.24e−02 6.24e−02 2.76 50.0014 2.42e−15 2 9.69e−17 1.25e−01 1.87e−01 2.00 40.1318 2.48e−15 29.9690 1.54e−15 19.9836 1.24e−15 9.8558 2.52e−15 0 1.99e−02 - - 60.0588 3.55e−16 4000 1 4.85e−08 1.72e−01 1.72e−01 1.36 50.2018 1.13e−15 2 3.81e−16 2.03e−01 3.74e−01 1.38 39.8994 4.81e−15 29.9927 5.92e−16 20.0590 1.95e−15 9.8870 2.52e−15 0 1.28e−02 - - 60.1135 2.13e−15 8000 1 8.14e−09 2.34e−01 2.34e−01 1.33 50.3596 1.27e−15 2 4.12e−16 2.81e−01 5.15e−01 1.75 39.7705 7.15e−16 29.9563 1.19e−15 19.9333 7.13e−16 9.9772 8.90e−16 0 1.66e−02 - - 60.1032 2.36e−16 16000 1 1.94e−07 3.12e−01 3.12e−01 2.20 50.1944 0.00e+00 2 1.11e−16 6.40e−01 9.52e−01 2.54 39.9966 2.49e−15 30.0106 1.18e−16 19.7865 8.98e−16 9.9230 8.95e−16 0 1.33e−02 - - 60.0279 1.89e−15 32000 1 1.94e−08 6.86e−01 6.86e−01 2.92 50.1242 2.84e−16 2 6.98e−17 1.73e+00 2.42e+00 2.53 40.3072 5.29e−16 30.2661 4.58e−15 19.4404 2.38e−15 9.8401 1.08e−15 0 1.96e−02 - - 60.0890 1.06e−15 64000 1 4.84e−07 2.00e+00 2.00e+00 - 50.0953 2.41e−15 2 2.55e−15 4.13e+00 6.13e+00 - 40.0295 3.55e−16 30.2179 3.17e−15 20.4460 2.09e−15 9.2398 4.04e−15

Table 1 Example 1 (REIS; Symmetric, n = 1000–64000, m = 6)

5 Conclusions

We have presented algorithms for the refinement of invariant and deflating subspaces for large and structured matrices A and matrix pencils A − λB, respectively. Our technique solves the (coupled) NAREs (1) or (25) by the quadratically convergent Newton’s method, inverting the well-conditioned and unstructured matrices A22− γIn−m or A22− γB22 via that of the structured

(17)

n k krkk2 dt(n)_k t(n)_k t(4n)_k /t(n)_k λ |δλ/λ|

0 2.00e−02 - - 40.2220 1.24e−15

1000 1 3.17e−07 3.12e−02 3.12e−02 3.49 28.8732 4.92e−16 2 2.70e−12 3.12e−02 6.24e−02 4.25 −21.7461 7.35e−15 3 2.78e−15 3.12e−02 9.36e−02 - 21.1349 8.40e−16

0 1.91e−02 - - 40.5221 1.05e−15

2000 1 4.97e−09 1.40e−01 1.40e−01 2.79 25.9616 1.81e−15 2 3.57e−15 2.03e−01 3.43e−01 2.73 ±7.8421i

3 21.4961 6.94e−15

0 1.99e−02 - - 43.0885 1.65e−15

4000 1 5.12e−07 1.09e−01 1.09e−01 1.33 29.6436 1.20e−15 2 5.66e−15 1.56e−01 2.65e−01 2.06 24.7172 5.17e−15

3 24.1487 4.71e−15

0 2.00e−02 - - 45.5527 2.03e−15

8000 1 4.56e−09 3.90e−01 3.90e−01 (6.77) 32.3695 3.51e−15 2 5.95e−15 5.46e−01 9.36e−01 (7.88) −11.9007 4.49e−15

3 ±19.6545i

0 1.95e−02 - - 40.9067 3.47e−15

16000 1 4.56e−09 1.72e−01 1.72e−01 4.72 29.0183 1.59e−15 2 1.88e−14 3.74e−01 5.46e−01 4.23 −26.0567 2.86e−15

3 −25.6342 1.11e−15

0 1.98e−02 - - 37.3284 8.48e−16

32000 1 1.38e−06 2.64e+00 2.64e+00 - ±6.1791i

2 8.62e−15 4.74e+00 7.38e+00 - −13.5315 1.46e−15

3 ±25.5664i

0 1.98e−02 - - 41.5012 0.00e+00

64000 1 9.37e−07 8.11e−01 8.11e−01 - 28.5096 3.74e−15 2 5.23e−13 1.50e+00 2.31e+00 - 28.2844 2.89e−15

3 28.1070 6.07e−15

Table 2 Example 2 (REIS; Nonsymmetric, n = 1000–64000, m = 4)

but near-singular A − γIn or A − γB. Interestingly and importantly, our

er-ror analysis shows that any ill-conditioning associated with the near-singular matrices does not affect our computation adversely. This produces algorithms of an efficient O(n) computational complexity per iteration. The theoretical results have been illustrated favourably by numerical examples.

Our refinement procedures will be useful as part of any eigen-solver. In particular, they will be useful for the CIS problem, or when subspace methods are used to compute the initial invariant or deflating subspace. Note that the Ritz refinement by Jia [16] only improves accuracy of invariant or deflating subspaces within some approximating (Krylov) subspaces. To achieve higher accuracy, our methods may be applicable, provided that A − γI or A − γB are “structured” and efficiently vector-multiplicable and invertible.

Acknowledgements

This research work was partially supported by the National Science Council and the NCTS in Taiwan, as well as the CMMSC and the ST Yau Centre at the National Chiao Tung University. The second author has been supported

(18)

n k kerkk2 dt (n) k t (n) k t (2n) k /t (n) k λ |δλ/λ| 0 8.21e−02 - - 2149.6019 3.86e−17 1000 1 2.52e−05 6.24e−02 6.24e−02 1.00 149.9449 1.01e−17 2 4.39e−12 3.12e−02 9.36e−02 1.16 −77.6590 2.36e−18 3 1.26e−15 1.56e−02 1.09e−01 1.58 −67.5203 1.12e−16 0 5.67e−02 - - 640.3876 5.27e−18 2000 1 7.71e−06 6.24e−02 6.24e−02 1.50 496.0150 2.08e−18 2 5.59e−13 4.68e−02 1.09e−01 1.72 262.7633 7.90e−17 3 7.70e−16 6.24e−02 1.72e−01 1.63 196.7227 7.49e−17 0 7.23e−02 - - 21372.4423 2.13e−18 4000 1 7.76e−04 9.36e−02 9.36e−02 2.00 3109.4906 2.17e−17 2 1.69e−07 9.36e−02 1.87e−01 2.09 −590.6424 7.66e−17 3 7.58e−15 9.36e−02 2.81e−01 1.78 −347.5213 3.39e−17 0 8.62e−02 - - −5370.9484 2.63e−17 8000 1 4.78e−05 1.87e−01 1.87e−01 1.17 2392.9336 9.61e−18 2 1.73e−11 2.03e−01 3.90e−01 1.24 761.8216 3.92e−19 3 1.57e−15 1.09e−01 4.99e−01 1.50 −689.7808 1.89e−17 0 9.08e−02 - - −2290.4621 4.87e−17 16000 1 2.72e−04 2.18e−01 2.18e−01 1.72 2204.0092 3.21e−17 2 6.84e−07 2.65e−01 4.84e−01 2.29 1039.3738 3.28e−17 3 3.50e−14 2.65e−01 7.49e−01 2.24 −868.6618 5.21e−17 0 6.94e−02 - - 17153.3032 5.90e−17 32000 1 1.95e−04 3.74e−01 3.74e−01 2.25 15379.2434 1.61e−17 2 1.25e−08 7.33e−01 1.11e+00 2.09 −8084.1829 3.53e−17 3 1.59e−15 5.77e−01 1.68e+00 2.42 −5068.7095 1.31e−16 0 9.73e−02 - - 16248.1401 4.57e−18 64000 1 8.06e−05 8.42e−01 8.42e−01 - −12863.1956 2.76e−18 2 4.81e−11 1.48e+00 2.32e+00 - −6251.9958 1.52e−17 3 6.18e−16 1.75e+00 4.07e+00 - −6024.6360 1.06e−16

Table 3 Example 3 (REDS; Symmetric, n = 1000–64000, m = 4)

by a Monash Graduate Scholarship and a Monash International Postgraduate Research Scholarship. Part of the work was completed when the third author visited the CMMSC and the NCTS at the National Chiao Tung University.

References

1. p.a. absil, r. sepulchre, p. van dooren and r. mahony. Cubically convergent it-erations for invariant subspace computation, SIAM J. Matrix Anal. Appl., 26 (2004) 70–96.

2. w.-j. beyn, w. kless and v. thummler. Continuation of low-dimensional invariant subspaces in dynamical systems of large dimension, in Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, B. Fiedler, ed., Springer, 2001, pp. 47–72. 3. d. bindel, j. demmel and m. friedman. Continuation of invarinat subspaces in large

bifurcation problems, SIAM J. Sci. Comput., 30 (2008) 637–656.

4. j. bosec. Continuation of Invariant Subspaces in Bifurcation Problems, PhD thesis, University of Marburg, 2002.

5. j. brandts. The Riccati algorithm for eigenvalues and invariant subspaces of matrices with inexpensive actions, Lin. Alg. Appl., 358 (2003) 335–365.

6. t.f. chan and d.c. resasco. Generalized deflated block-elimination, SIAM J. Numer. Anal., 23 (1986) 913–924.

(19)

n k k_erkk2 dt(n)k t (n) k t (2n) k /t (n) k λ |δλ/λ| 0 1.10e−02 - - 5.0298 2.53e−15 1000 1 1.98e−05 1.09e−01 1.09e−01 (2.00) 2.2538 1.83e−15 2 4.70e−11 1.25e−01 2.34e−01 (2.20) −1.9776 6.33e−16 3 7.92e−13 1.40e−01 3.74e−01 - −0.9544 3.97e−15

±1.1454i

0 1.96e−02 - - −31.4178 2.88e−16 2000 1 3.03e−06 2.18e−01 2.18e−01 (1.50) −0.6719 1.73e−15

2 6.90e−13 2.96e−01 5.15e−01 (1.36) ±1.3423i

3 0.8249 1.46e−14

±0.6797i

0 1.77e−02 - - 3.8243 5.12e−16 4000 1 5.24e−05 3.28e−01 3.28e−01 (2.05) −1.3641 2.45e−15

2 1.72e−10 3.74e−01 7.02e−01 (2.38) ±1.2597i

3 7.83e−13 4.21e−01 1.12e+00 - −1.4453 1.08e−15 1.0246 3.14e−15 0 8.15e−03 - - −2.6615 4.07e−15 8000 1 1.64e−06 6.71e−01 6.71e−01 2.03 0.9752 1.22e−15

2 3.95e−12 9.98e−01 1.67e+00 2.52 ±1.3836i

3 −1.1421 8.26e−15

±0.8742i

0 1.50e−02 - - −4.2306 7.05e−16 16000 1 1.54e−05 1.36e+00 1.36e+00 3.03 −0.4621 1.07e−15

2 2.14e−11 2.85e+00 4.21e+00 2.59 ±1.3846i

3 0.7509 2.20e−13

±0.6520i

0 1.96e−02 - - 2.0737 6.82e−14 32000 1 2.64e−06 4.12e+00 4.12e+00 1.69 ±1.2427i

2 2.95e−11 6.77e+00 1.09e+01 1.80 −0.2288 5.51e−15

3 ±1.3816i

1.1659 3.43e−14 0 1.71e−02 - - 1.1221 1.70e−15 64000 1 1.69e−04 6.97e+00 6.97e+00 - ±1.0001i

2 1.22e−07 1.27e+01 1.96e+01 - −1.1545 6.55e−14 3 2.21e−11 1.65e+01 3.61e+01 - ±0.4857i

−1.1730 2.10e−13

Table 4 Example 4 (REDS; Nonsymmetric, n = 1000–64000, m = 5)

7. f. chatelin. Simultaneous Newton’s iteration for the eigenproblem, Computing, Suppl. 5 (1984) 67–74.

8. l. dieci and m. friedman. Continuation of invariant subspaces, Numer. Lin. Alg. Ap-plic., 8 (2001) 317–327.

9. j.w. demmel. Three methods for refining estimates of invariant subspaces, Computing, 38 (1987) 43–57.

10. j.j. dongarra, c.b. moler and j.h wilkinson. Improving the accuracy of computed eigenvalues and eigenvectors, SIAM J. Num. Anal., 20 (1983) 46–58.

11. g.h. golub and c.f. van loan. Matrix Computations, 2nd Ed., Johns Hopkins Univer-sity Press, Baltimore, MD, 1989.

12. w. govaerts and j.d. pryce. Mixed block elimination for linear systems with wider borders, IMA J. Numer. Anal., 13 (1993) 161–180.

13. c.-h. guo and n.j. higham. Iterative solution of a nonsymmetric algebraic Riccati equation, SIAM J. Matrix Anal. Appl., 29 (2007) 396–412.

14. x.-x. guo, w.-w. lin and s.-f. xu. A structure-preserving doubling algorithm for non-symmetric algebraic Riccati equations, Numer. Math., 103 (2006) 393–412.

(20)

15. n.j. higham. Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, 2002.

16. z. jia. The convergence of harmonic Ritz values, harmonic Ritz vectors, and refined harmonic Ritz vectors, Math. Comp 74 (2005) 1441–1456.

17. mathworks. MATLAB User’s Guide, 2010.

18. g. peters and j.h. wilkinson. Inverse iteration, ill-conditioned equations and Newton’s method, SIAM Rev., 21 (1979) 339–360.

19. b. philippe and y. saad. On correction equations and domain decomposition for com-puting invariant subspaces, Comput. Methods Appl. Mech. Engrg., 196 (2007) 1471– 1483.

20. v. simoncini and m. sadkane. Arnoldi-Riccati method for large eigenvalue problems, BIT, 36 (1996) 579–594.

21. g.w. stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Rev., 15 (1973) 752–764.

22. j.-g. sun. Invariant subspaces and generalized invariant subspaces I. Existence and uniqueness theorems, Math. Numer. Sinica, 2 (1980) 1–14.

23. j.-g. sun. Invariant subspaces and generalized invariant subspaces II. Perturbation the-orems, Math. Numer. Sinica, 2 (1980) 113–123.

24. e.l. yip. A note on the stability of solving a rank-p modification of a linear system by the Sherman-Morrison-Woodbury formula, SIAM J. Sci. Stat. Comput.. 7 (1986) 507–513.