
ON A NONLINEAR MATRIX EQUATION ARISING IN NANO RESEARCH

CHUN-HUA GUO, YUEH-CHENG KUO, AND WEN-WEI LIN§

Abstract. The matrix equation $X + A^TX^{-1}A = Q$ arises in Green's function calculations in nano research, where $A$ is a real square matrix and $Q$ is a real symmetric matrix dependent on a parameter and is usually indefinite. In practice one is mainly interested in those values of the parameter for which the matrix equation has no stabilizing solutions. The solution of interest in this case is a special weakly stabilizing complex symmetric solution $X_*$, which is the limit of the unique stabilizing solution $X_\eta$ of the perturbed equation $X + A^TX^{-1}A = Q + i\eta I$ as $\eta \to 0^+$. It has been shown that a doubling algorithm can be used to compute $X_\eta$ efficiently even for very small values of $\eta$, thus providing good approximations to $X_*$. It has been observed by nano scientists that a modified fixed-point method can sometimes be quite useful, particularly for computing $X_*$ for many different values of the parameter. We provide a rigorous analysis of this modified fixed-point method and its variant, and of their generalizations. We also show that the imaginary part $X_I$ of the matrix $X_*$ is positive semidefinite, and we determine the rank of $X_I$ in terms of the number of unimodular eigenvalues of the quadratic pencil $\lambda^2 A^T - \lambda Q + A$. Finally, we present a new structure-preserving algorithm that is applied directly to the equation $X + A^TX^{-1}A = Q$. In doing so, we work with real arithmetic most of the time.

Key words. nonlinear matrix equation, complex symmetric solution, weakly stabilizing solution, fixed-point iteration, structure-preserving algorithm, Green’s function

AMS subject classifications. 15A24, 65F30

DOI. 10.1137/100814706

1. Introduction. The nonlinear matrix equation $X + A^TX^{-1}A = Q$, where $A$ is real and $Q$ is real symmetric positive definite, arises in several applications and has been studied in [3, 7, 11, 21, 22, 27], for example.

In this paper we further study the nonlinear matrix equation
$$X + A^TX^{-1}A = Q + i\eta I,$$
where $A \in \mathbb{R}^{n\times n}$, $Q = Q^T \in \mathbb{R}^{n\times n}$, and $\eta \ge 0$, but $Q$ is usually not positive definite. The equation arises from the nonequilibrium Green's function approach for treating quantum transport in nanodevices, where the system Hamiltonian is a semi-infinite or bi-infinite real symmetric matrix with special structures [1, 5, 17, 18, 25]. A first systematic mathematical study of the equation has already been undertaken in [12].

Received by the editors November 12, 2010; accepted for publication (in revised form) December 7, 2011; published electronically March 15, 2012.

http://www.siam.org/journals/simax/33-1/81470.html

Department of Mathematics and Statistics, University of Regina, Regina, SK S4S 0A2, Canada (chguo@math.uregina.ca). The work of this author was supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada.

Department of Applied Mathematics, National University of Kaohsiung, Kaohsiung 811, Taiwan (yckuo@nuk.edu.tw). The work of this author was partially supported by the National Science Council in Taiwan.

§Department of Mathematics, National Taiwan University, Taipei 106, Taiwan, and Center of Mathematical Modelling and Scientific Computing, National Chiao Tung University, Hsinchu 300, Taiwan (wwlin@math.nctu.edu.tw). The work of this author was partially supported by the National Science Council and the National Center for Theoretical Sciences in Taiwan.



For the bi-infinite case, the Green's function corresponding to the scattering region $G_S \in \mathbb{C}^{n_s\times n_s}$, in which the nano scientists are interested, satisfies the relation [4, 17]
$$G_S = \left[(E + i0^+)I - H_S - C_{L,S}^T G_{L,S} C_{L,S} - D_{S,R} G_{S,R} D_{S,R}^T\right]^{-1},$$
where $E$ is energy, a real number that may be negative, $H_S \in \mathbb{R}^{n_s\times n_s}$ is the Hamiltonian for the scattering region, $C_{L,S} \in \mathbb{R}^{n_\ell\times n_s}$ and $D_{S,R} \in \mathbb{R}^{n_s\times n_r}$ represent the coupling with the scattering region for the left lead and the right lead, respectively, and $G_{L,S} \in \mathbb{C}^{n_\ell\times n_\ell}$ and $G_{S,R} \in \mathbb{C}^{n_r\times n_r}$ are special solutions of the matrix equations
$$(1.1)\qquad G_{L,S} = \left[(E + i0^+)I - B_L - A_L^T G_{L,S} A_L\right]^{-1}$$
and
$$(1.2)\qquad G_{S,R} = \left[(E + i0^+)I - B_R - A_R G_{S,R} A_R^T\right]^{-1},$$
with $A_L, B_L = B_L^T \in \mathbb{R}^{n_\ell\times n_\ell}$ and $A_R, B_R = B_R^T \in \mathbb{R}^{n_r\times n_r}$. Since (1.1) and (1.2) are of the same type, we only need to study (1.1), and we simplify the notation $n_\ell$ to $n$. In nano research, one is mainly interested in the values of $E$ for which $G_{L,S}$ in (1.1) has a nonzero imaginary part [18].

For each fixed $E$, we replace $0^+$ in (1.1) by a sufficiently small positive number $\eta$ and consider the matrix equation
$$(1.3)\qquad X = \left[(E + i\eta)I - B_L - A_L^T X A_L\right]^{-1}.$$
It is shown in [12] that the required special solution $G_{L,S}$ of (1.1) is given by $G_{L,S} = \lim_{\eta\to 0^+} G_{L,S}(\eta)$, with $X = G_{L,S}(\eta)$ being the unique complex symmetric solution of (1.3) such that $\rho(G_{L,S}(\eta)A_L) < 1$, where $\rho(\cdot)$ denotes the spectral radius. Thus $G_{L,S}$ is a special complex symmetric solution of $X = \left[EI - B_L - A_L^T X A_L\right]^{-1}$ with $\rho(G_{L,S}A_L) \le 1$.

The question as to when $G_{L,S}$ has a nonzero imaginary part is answered in the following result from [12], where $\mathbb{T}$ denotes the unit circle.

Theorem 1.1. For $\lambda \in \mathbb{T}$, let the eigenvalues of $\psi_L(\lambda) = B_L + \lambda A_L + \lambda^{-1}A_L^T$ be $\mu_{L,1}(\lambda) \le \cdots \le \mu_{L,n}(\lambda)$. Let
$$\Delta_{L,i} = \left[\min_{|\lambda|=1}\mu_{L,i}(\lambda),\ \max_{|\lambda|=1}\mu_{L,i}(\lambda)\right]$$
and $\Delta_L = \bigcup_{i=1}^n \Delta_{L,i}$. Then $G_{L,S}$ is a real symmetric matrix if $E \notin \Delta_L$. When $E \in \Delta_L$, the quadratic pencil $\lambda^2 A_L^T - \lambda(EI - B_L) + A_L$ has eigenvalues on $\mathbb{T}$. If all these eigenvalues on $\mathbb{T}$ are simple (they must then be nonreal, as explained in the proof of Theorem 3.3), then $G_{L,S}$ has a nonzero imaginary part.

By replacing $X$ in (1.3) with $X^{-1}$, we get the equation
$$(1.4)\qquad X + A^T X^{-1} A = Q_\eta,$$
where $A = A_L$ and $Q_\eta = Q + i\eta I$ with $Q = EI - B_L$. So $Q$ is a real symmetric matrix dependent on the parameter $E$ and is usually indefinite. For $\eta > 0$, we need the stabilizing solution $X$ of (1.4), which is the solution with $\rho(X^{-1}A) < 1$, and then $G_{L,S}(\eta) = X^{-1}$. The existence of the stabilizing solution was proved in [12] using advanced tools; an elementary proof has been given recently in [10]. When $\eta = 0$ and $E \in \Delta_L$, it follows from Theorem 1.1 that the required solution $X = G_{L,S}^{-1}$ of (1.4) is only weakly stabilizing, in the sense that $\rho(X^{-1}A) = 1$.

The size of the matrices in (1.4) can be very small or very large, depending on how the system Hamiltonian is obtained. If the Hamiltonian is obtained from layer-based models, as in [18] and [25], then the size of the matrices is just the number of principal layers. In [18] considerable attention is paid to single-layer models and the more realistic double-layer models, which correspond to $n = 1$ and $n = 2$ in (1.4). We can say that (1.4) with $n \le 10$ is already of significant practical interest. On the other hand, if the Hamiltonian is obtained from the discretization of a differential operator, as in [1], then the size of the matrices in (1.4) can be very large if a fine mesh grid is used.

One way to approximate $G_{L,S}$ is to take a very small $\eta > 0$ and compute $G_{L,S}(\eta)$. It is proved in [12] that the sequence $\{X_k\}$ from the basic fixed-point iteration (FPI)
$$(1.5)\qquad X_{k+1} = Q_\eta - A^T X_k^{-1} A,$$
with $X_0 = Q_\eta$, converges to $G_{L,S}(\eta)^{-1}$. It follows that the sequence $\{Y_k\}$ from the basic FPI
$$(1.6)\qquad Y_{k+1} = (Q_\eta - A^T Y_k A)^{-1},$$
with $Y_0 = Q_\eta^{-1}$, converges to $G_{L,S}(\eta)$. However, the convergence is very slow for $E \in \Delta_L$, since $\rho(G_{L,S}(\eta)A) \approx 1$ for $\eta$ close to $0$. It is also shown in [12] that a doubling algorithm (DA) can be used to compute the desired solution $X = G_{L,S}(\eta)^{-1}$ of the equation (1.4) efficiently for each fixed value of $E$. However, in practice the desired solution needs to be computed for many different $E$ values. Since the DA is not a correction method, it cannot use the solution obtained for one $E$ value as an initial approximation for the exact solution at a nearby $E$ value. To compute the solutions corresponding to many $E$ values, it may be more efficient to use a modified FPI together with the DA. Indeed, it is suggested in [25] that the following modified FPI be used to approximate $G_{L,S}(\eta)$:
$$(1.7)\qquad Y_{k+1} = \tfrac{1}{2}Y_k + \tfrac{1}{2}(Q_\eta - A^T Y_k A)^{-1}.$$
A variant of this FPI is given in [12] to approximate $G_{L,S}(\eta)^{-1}$,
$$(1.8)\qquad X_{k+1} = \tfrac{1}{2}X_k + \tfrac{1}{2}(Q_\eta - A^T X_k^{-1} A),$$
which requires less computational work per iteration. However, the convergence analysis of these two modified FPIs has been an open problem, even for the special initial matrices $Y_0 = Q_\eta^{-1}$ and $X_0 = Q_\eta$, respectively.
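The iterations above are each a few lines of NumPy. The following sketch implements the basic FPI (1.5) and a weighted variant that reduces to (1.8) for $c = \frac{1}{2}$; the matrices $A$, $Q$ and the value of $\eta$ are illustrative choices (not from the paper), with $Q$ real symmetric and indefinite as in the applications.

```python
import numpy as np

def basic_fpi(A, Q_eta, iters=2000):
    """Basic FPI (1.5): X_{k+1} = Q_eta - A^T X_k^{-1} A, with X_0 = Q_eta."""
    X = Q_eta.copy()
    for _ in range(iters):
        X = Q_eta - A.T @ np.linalg.solve(X, A)
    return X

def weighted_fpi(A, Q_eta, c=0.5, iters=2000):
    """Weighted FPI: X_{k+1} = (1-c) X_k + c (Q_eta - A^T X_k^{-1} A);
    c = 1/2 gives the modified FPI (1.8)."""
    X = Q_eta.copy()
    for _ in range(iters):
        X = (1 - c) * X + c * (Q_eta - A.T @ np.linalg.solve(X, A))
    return X

# Illustrative data: A real, Q real symmetric indefinite, eta > 0.
A = np.array([[0.3, 0.1], [0.2, 0.4]])
Q = np.diag([0.5, -0.8])
eta = 0.5
Q_eta = Q + 1j * eta * np.eye(2)

X = basic_fpi(A, Q_eta)
Xm = weighted_fpi(A, Q_eta)
residual = np.linalg.norm(X + A.T @ np.linalg.solve(X, A) - Q_eta)
im_min = np.linalg.eigvalsh((X - X.conj().T) / 2j).min()
print(residual, im_min)  # residual ~ 0; im_min > 0, so Im X is positive definite
```

Both runs approach the same limit, consistent with the uniqueness of the stabilizing solution for $\eta > 0$.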

Our first contribution in this paper is a proof of convergence (to the desired solutions) of these two modified FPIs and their generalizations for many choices of initial matrices. These methods can then be used as correction methods. In this process we will show that the unique stabilizing solution $X = G_{L,S}(\eta)^{-1}$ of (1.4) is also the unique solution of (1.4) with a positive definite imaginary part. It follows that the imaginary part $X_I$ of the matrix $G_{L,S}^{-1}$ is positive semidefinite. Our second contribution in this paper is a determination of the rank of $X_I$ in terms of the number of eigenvalues on $\mathbb{T}$ of the quadratic pencil $\lambda^2 A^T - \lambda Q + A$. Our third contribution is a structure-preserving algorithm (SA) that is applied directly to (1.4) with $\eta = 0$. In doing so, we work with real arithmetic most of the time.


2. Convergence analysis of FPIs. In this section we perform convergence analysis for some FPIs, including the modified FPIs (1.7) and (1.8). The main tool we need is the following important result due to Earle and Hamilton [6]. The presentation here follows [14, Theorem 3.1] and its proof.

Theorem 2.1 (Earle–Hamilton theorem). Let $\mathcal{D}$ be a nonempty domain in a complex Banach space $Z$ and let $h : \mathcal{D} \to \mathcal{D}$ be a bounded holomorphic function. If $h(\mathcal{D})$ lies strictly inside $\mathcal{D}$, then $h$ has a unique fixed point in $\mathcal{D}$. Moreover, the sequence $\{z_k\}$ defined by the fixed-point iteration $z_{k+1} = h(z_k)$ converges to this fixed point for any $z_0 \in \mathcal{D}$.

Now let $Z$ be the complex Banach space $\mathbb{C}^{n\times n}$ equipped with the spectral norm. For any $K \in \mathbb{C}^{n\times n}$, its imaginary part is the Hermitian matrix
$$\mathrm{Im}\,K = \frac{1}{2i}(K - K^*).$$
For any Hermitian matrices $X$ and $Y$, $X > Y$ ($X \ge Y$) means that $X - Y$ is positive definite (semidefinite). Let $\mathbb{D}_+ = \{X \in \mathbb{C}^{n\times n} : \mathrm{Im}\,X > 0\}$ and $\mathbb{D}_- = \{X \in \mathbb{C}^{n\times n} : \mathrm{Im}\,X < 0\}$.

We start with a proof of convergence for the basic FPI (1.5) for many different choices of $X_0$, not just for $X_0 = Q_\eta$.

Theorem 2.2. For any $X_0 \in \mathbb{D}_+$, the sequence $\{X_k\}$ produced by the FPI (1.5) converges to the unique fixed point $X_\eta$ in $\mathbb{D}_+$.

Proof. Let $\mathcal{D} = \{X \in \mathbb{C}^{n\times n} : \mathrm{Im}\,X > \frac{\eta}{2}I\}$. Each $X \in \mathcal{D}$ is invertible by Bendixson's theorem (see [26], for example) and we also have $\|X^{-1}\| < \frac{2}{\eta}$ (see [2, Corollary 4] or [13, Lemma 3.1]). Now let
$$f(X) = Q_\eta - A^T X^{-1} A.$$
Then for $X \in \mathcal{D}$,
$$\mathrm{Im}\,f(X) = \mathrm{Im}\,Q_\eta - \frac{1}{2i}\left[A^T X^{-1} A - (A^T X^{-1} A)^*\right] = \eta I + A^T X^{-1}(\mathrm{Im}\,X)(A^T X^{-1})^* \ge \eta I.$$
It follows that $f : \mathcal{D} \to \mathcal{D}$ is a bounded holomorphic function and $f(\mathcal{D})$ lies strictly inside $\mathcal{D}$. By the Earle–Hamilton theorem, $f$ has a unique fixed point $X_\eta$ in $\mathcal{D}$ and $X_k$ converges to $X_\eta$ for any $X_0 \in \mathcal{D}$. The theorem is proved by noting that $X_1 \in \mathcal{D}$ for any $X_0 \in \mathbb{D}_+$, and that any fixed point $X$ in $\mathbb{D}_+$ must be in $\mathcal{D}$ by $X = Q_\eta - A^T X^{-1} A$.

Remark 2.1. Since $\{X_k\}$ converges to $G_{L,S}(\eta)^{-1}$ for $X_0 = Q_\eta \in \mathbb{D}_+$, we know that $X_\eta = G_{L,S}(\eta)^{-1}$ in Theorem 2.2. Thus $X_\eta$ is the unique solution of (1.4) such that $\rho(X_\eta^{-1}A) < 1$, and it is also the unique solution of (1.4) in $\mathbb{D}_+$. If we have obtained a particular solution of (1.4) by some method and would like to know whether it is the required solution, the latter condition is easier to check.

The matrix $G_{L,S}(\eta)$ can also be computed directly by using (1.6) for many different choices of $Y_0$. Note that the FPI (1.6) is $Y_{k+1} = \hat{f}(Y_k)$ with
$$\hat{f}(Y) = (Q_\eta - A^T Y A)^{-1}.$$

Corollary 2.3. For any $Y_0 \in \mathbb{D}_-$, the sequence $\{Y_k\}$ produced by the FPI (1.6) converges to the unique fixed point $Y_\eta = G_{L,S}(\eta)$ in $\mathbb{D}_-$.

Proof. For any $Y_0 \in \mathbb{D}_-$, $Y_0$ is invertible by Bendixson's theorem. We now take $X_0 = Y_0^{-1}$ in (1.5). Then $X_0 \in \mathbb{D}_+$ since $\mathrm{Im}\,X_0 = -Y_0^{-1}(\mathrm{Im}\,Y_0)Y_0^{-*}$. It follows that the sequence $\{Y_k\}$ is well defined and related to the sequence $\{X_k\}$ from (1.5) by $Y_k = X_k^{-1}$. Thus $\{Y_k\}$ converges to $Y_\eta = X_\eta^{-1} \in \mathbb{D}_-$. Since $X_\eta$ is the unique fixed point of $f$ in $\mathbb{D}_+$, $Y_\eta$ is the unique fixed point of $\hat{f}$ in $\mathbb{D}_-$.

For faster convergence, we consider the modified FPI for (1.4)
$$(2.1)\qquad X_{k+1} = (1-c)X_k + c(Q_\eta - A^T X_k^{-1} A), \quad 0 < c < 1,$$
or $X_{k+1} = g(X_k)$ with the function $g$ defined by
$$(2.2)\qquad g(X) = (1-c)X + cf(X).$$
Note that $f(X) = X$ if and only if $g(X) = X$. So $f$ and $g$ have the same fixed points. Note also that the FPI (1.8) is a special case of the FPI (2.1) with $c = \frac{1}{2}$. We can now prove the following general result.

Theorem 2.4. For any $X_0 \in \mathbb{D}_+$, the FPI $X_{k+1} = g(X_k)$ converges to the unique fixed point $X_\eta$ in $\mathbb{D}_+$.

Proof. For any $X_0 \in \mathbb{D}_+$, $X_1$ is well defined and $\mathrm{Im}\,X_1 > c\eta I$. Let $b$ be any number such that $b > \|X_1\|$ and $b > 2\left(\|Q_\eta\| + \frac{1}{c\eta}\|A\|^2\right)$. Let $\mathcal{D} = \{X \in \mathbb{C}^{n\times n} : \mathrm{Im}\,X > c\eta I, \|X\| < b\}$. Thus $X_1 \in \mathcal{D}$. Each $X \in \mathcal{D}$ is invertible with $\|X^{-1}\| < \frac{1}{c\eta}$, as before. Then for $X \in \mathcal{D}$,
$$\mathrm{Im}\,g(X) = (1-c)\mathrm{Im}\,X + c\,\mathrm{Im}\,f(X) > (1-c)c\eta I + c\eta I = (2-c)c\eta I$$
and
$$\|g(X)\| \le (1-c)\|X\| + c\left(\|Q_\eta\| + \frac{1}{c\eta}\|A\|^2\right) < (1-c)b + \frac{cb}{2} = \left(1 - \frac{c}{2}\right)b.$$
It follows that $g : \mathcal{D} \to \mathcal{D}$ is a bounded holomorphic function and $g(\mathcal{D})$ lies strictly inside $\mathcal{D}$. By the Earle–Hamilton theorem, $X_k$ converges to the unique fixed point of $g$ in $\mathcal{D}$, which must be $X_\eta$.

Similarly, we consider the modified FPI
$$(2.3)\qquad Y_{k+1} = (1-c)Y_k + c(Q_\eta - A^T Y_k A)^{-1}, \quad 0 < c < 1,$$
or $Y_{k+1} = \hat{g}(Y_k)$ with the function $\hat{g}$ defined by
$$(2.4)\qquad \hat{g}(Y) = (1-c)Y + c\hat{f}(Y).$$
The FPI (2.3) includes (1.7) as a special case. Note that $\hat{f}(Y) = Y$ if and only if $\hat{g}(Y) = Y$. So $\hat{f}$ and $\hat{g}$ have the same fixed points. However, there are no simple relations between $X_k$ from (2.1) and $Y_k$ from (2.3).

Theorem 2.5. For any $Y_0 \in \mathbb{D}_-$, the FPI $Y_{k+1} = \hat{g}(Y_k)$ converges to the unique fixed point $Y_\eta$ in $\mathbb{D}_-$.

Proof. Take $b > 2/\eta$, and let $\mathcal{D} = \{Y \in \mathbb{C}^{n\times n} : \mathrm{Im}\,Y < 0, \|Y\| < b\}$. For any $Y \in \mathcal{D}$, $\mathrm{Im}(Q_\eta - A^T Y A) \ge \eta I$. So $\hat{g}(Y)$ is well defined and $\|(Q_\eta - A^T Y A)^{-1}\| \le \frac{1}{\eta}$. Thus
$$\|\hat{g}(Y)\| < (1-c)b + c\frac{1}{\eta} < \left(1 - \frac{1}{2}c\right)b.$$
Moreover,
$$\mathrm{Im}(\hat{g}(Y)) < c\,\mathrm{Im}(Q_\eta - A^T Y A)^{-1} = -c(Q_\eta - A^T Y A)^{-1}\,\mathrm{Im}(Q_\eta - A^T Y A)\,(Q_\eta - A^T Y A)^{-*}$$
$$\le -c\eta(Q_\eta - A^T Y A)^{-1}(Q_\eta - A^T Y A)^{-*} \le -c\eta\|Q_\eta - A^T Y A\|^{-2}I \le -\frac{c\eta}{(\|Q_\eta\| + b\|A\|^2)^2}I.$$
It follows that $\hat{g} : \mathcal{D} \to \mathcal{D}$ is a bounded holomorphic function and $\hat{g}(\mathcal{D})$ lies strictly inside $\mathcal{D}$. By the Earle–Hamilton theorem, $Y_k$ converges to $Y_\eta$ for any $Y_0 \in \mathcal{D}$, and hence for any $Y_0 \in \mathbb{D}_-$ since we can take $b > \|Y_0\|$.

We remark that the modified FPI (2.1) is slightly less expensive than the modified FPI (2.3) for each iteration. These two methods make improvements over the basic FPIs (1.5) and (1.6) in the same way, as explained below.

The rate of convergence of each FPI can be determined by computing the Fréchet derivative of the fixed-point mapping, as in [11]. For (1.5) and (1.6), we have
$$\limsup_{k\to\infty} \sqrt[k]{\|X_k - X_\eta\|} \le \left(\rho(X_\eta^{-1}A)\right)^2, \qquad \limsup_{k\to\infty} \sqrt[k]{\|Y_k - Y_\eta\|} \le \left(\rho(Y_\eta A)\right)^2,$$
where equality typically holds. Recall that $Y_\eta = X_\eta^{-1}$. Note also that if $Y_0 = X_0^{-1}$ (with $X_0 \in \mathbb{D}_+$), then
$$\limsup_{k\to\infty} \sqrt[k]{\|Y_k - Y_\eta\|} = \limsup_{k\to\infty} \sqrt[k]{\|X_k - X_\eta\|}.$$
The Fréchet derivative at $X_\eta$ of the function $g$ in (2.2) is given by
$$g'(X_\eta)E = (1-c)E + cA^T X_\eta^{-1} E X_\eta^{-1} A.$$
It follows that for FPI (2.1)
$$\limsup_{k\to\infty} \sqrt[k]{\|X_k - X_\eta\|} \le \rho\left((1-c)I + c(A^T X_\eta^{-1}) \otimes (A^T X_\eta^{-1})\right).$$
Similarly, the Fréchet derivative at $Y_\eta$ of the function $\hat{g}$ in (2.4) is given by
$$\hat{g}'(Y_\eta)E = (1-c)E + cY_\eta A^T E A Y_\eta.$$
It follows that for FPI (2.3)
$$\limsup_{k\to\infty} \sqrt[k]{\|Y_k - Y_\eta\|} \le \rho\left((1-c)I + c(Y_\eta A^T) \otimes (Y_\eta A^T)\right).$$
The rate of convergence of both modified FPIs is then determined by
$$r_\eta = \max_{i,j}\left|1 - c + c\lambda_i(X_\eta^{-1}A)\lambda_j(X_\eta^{-1}A)\right|.$$
The convergence of the modified FPIs is often much faster because $r_\eta$ may be much smaller than $1$ for a proper choice of $c$. An extreme example is the following.


Example 2.1. Consider the scalar equation (1.4) with $A = 1$ and $Q_\eta = \eta i$. It is easy to find that
$$X_\eta = \frac{1}{2}\left(\eta + \sqrt{4 + \eta^2}\right)i.$$
Thus $\rho(X_\eta^{-1}A) \to 1$ as $\eta \to 0^+$, while for $c = \frac{1}{2}$ we have $r_\eta \to 0$ as $\eta \to 0^+$.
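Example 2.1 is easy to reproduce numerically. The sketch below runs the basic FPI (1.5) (recovered as the weighted iteration with $c = 1$) and the modified FPI (1.8) ($c = \frac{1}{2}$) on the scalar equation with $\eta = 10^{-3}$; the modified iteration reaches the fixed point almost immediately, while the basic one barely moves toward it.

```python
import numpy as np

eta = 1e-3
q = 1j * eta                                     # Q_eta for A = 1
x_star = 0.5 * (eta + np.sqrt(4 + eta**2)) * 1j  # closed-form X_eta

def iterate(c, iters):
    """Scalar weighted FPI: x_{k+1} = (1-c) x_k + c (q - 1/x_k), x_0 = q."""
    x = q
    for _ in range(iters):
        x = (1 - c) * x + c * (q - 1 / x)
    return x

err_basic = abs(iterate(1.0, 100) - x_star)     # c = 1 recovers (1.5)
err_modified = abs(iterate(0.5, 100) - x_star)  # c = 1/2 is (1.8)
print(err_basic, err_modified)  # basic still far from x_star; modified converged
```

With $c = \frac{1}{2}$ the scalar map becomes essentially the Babylonian square-root iteration near the fixed point, which explains the rapid convergence.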

Note that for $i, j = 1, \ldots, n$, the $n^2$ numbers $\lambda_i(X_\eta^{-1}A)\lambda_j(X_\eta^{-1}A)$ are inside $\mathbb{T}$ for each $\eta > 0$. In the limit $\eta \to 0^+$, at least one of them is on $\mathbb{T}$ if $E \in \Delta_L$. So each of these numbers has the form $re^{i\theta}$ with $0 \le r \le 1$ and $0 \le \theta < 2\pi$. We may allow $c = 1$ in (2.1) and (2.3). In this case, the basic FPIs (1.5) and (1.6) are recovered. To get some insight, we first consider choosing $c \in (0, 1]$ such that for fixed $(r, \theta)$
$$p(c) = \left|1 - c + cre^{i\theta}\right|$$
is minimized. If $(r, \theta) = (1, 0)$, then $p(c) = 1$ for all $c \in (0, 1]$. So assume $(r, \theta) \ne (1, 0)$. In this case,
$$p(c) = \left|1 - re^{i\theta}\right|\left|\frac{1}{1 - re^{i\theta}} - c\right|$$
is minimized on $(0, 1]$ when
$$c = \min\left\{1,\ \mathrm{Re}\,\frac{1}{1 - re^{i\theta}}\right\} = \min\left\{1,\ \frac{1 - r\cos\theta}{1 + r^2 - 2r\cos\theta}\right\} \ge \frac{1}{2},$$
where we have used $1 - r\cos\theta - \frac{1}{2}(1 + r^2 - 2r\cos\theta) = \frac{1}{2}(1 - r^2) \ge 0$. It follows that $c = 1$ is the best choice when $\frac{1 - r\cos\theta}{1 + r^2 - 2r\cos\theta} \ge 1$ or, in other words, when $z = re^{i\theta}$ is in the disk $\{z \in \mathbb{C} : |z - \frac{1}{2}| \le \frac{1}{2}\}$. Note that $p(1) = r$. It also follows that $c = \frac{1}{2}$ is the best choice when $r = 1$. Note that $p(\frac{1}{2}) = \frac{1}{2}\sqrt{1 + r^2 + 2r\cos\theta} = \frac{1}{2}\sqrt{2(1 + \cos\theta)}$ for $r = 1$.
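The optimizing value of $c$ derived above is easy to sanity-check against a brute-force grid search; the sample $(r, \theta)$ values below are arbitrary illustrative choices.

```python
import numpy as np

def p(c, r, theta):
    """p(c) = |1 - c + c r e^{i theta}|."""
    return np.abs(1 - c + c * r * np.exp(1j * theta))

def c_opt(r, theta):
    """Optimal c in (0, 1] for fixed (r, theta) != (1, 0)."""
    w = 1 - r * np.exp(1j * theta)
    return min(1.0, (1 / w).real)

grid = np.linspace(1e-4, 1.0, 10001)
cases = [(1.0, np.pi / 2), (0.9, 2.0), (0.5, 0.3), (1.0, 3.0)]
results = [(c_opt(r, t), p(c_opt(r, t), r, t)) for r, t in cases]
print(results)  # each p-value matches the grid minimum; each c is >= 1/2
```

For $(r, \theta) = (0.5, 0.3)$ the point $re^{i\theta}$ lies in the disk $|z - \frac{1}{2}| \le \frac{1}{2}$, so the formula returns $c = 1$, in agreement with the discussion above.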

We know from [12] that the eigenvalues of $X_\eta^{-1}A$ are precisely the $n$ eigenvalues inside $\mathbb{T}$ of the quadratic pencil $\lambda^2 A^T - \lambda Q_\eta + A$. We also know from Theorem 1.1 that the quadratic pencil $P(\lambda) = \lambda^2 A^T - \lambda Q + A$ has some eigenvalues on $\mathbb{T}$ when $E \in \Delta_L$. We can then make the following conclusions.

Remark 2.2. If $P(\lambda)$ has some eigenvalues near $1$ or $-1$, the convergence of the FPI (2.1) is expected to be very slow for any choice of the parameter $c$. The DA is recommended for this case. If all eigenvalues of $P(\lambda)$ are clustered around $\pm i$, then the FPI (2.1) with $c = \frac{1}{2}$ is expected to be very efficient. In the general case, the optimal $c$ is somewhere between $\frac{1}{2}$ and $1$. If all eigenvalues of $X_\eta^{-1}A$ are available, we can determine the optimal $c$ to minimize $r_\eta$ using the bisection procedure in [9, section 6] with $[\frac{1}{2}, 1]$ as the initial interval. In practice we would not compute these eigenvalues for every $E$ value. But we may use the DA to compute $X_\eta$ for one $E$ value, determine the optimal $c$ value for this $E$, and use it as a suboptimal $c$ for many nearby $E$ values. If one does not want to compute any eigenvalues when using the FPI (2.1), then $c = \frac{1}{2}$ is recommended, since this $c$ value is best at handling the eigenvalues of $X_\eta^{-1}A$ that are extremely close to $\mathbb{T}$, which are the eigenvalues responsible for the extremely slow convergence of the basic FPIs (1.5) and (1.6).

We note that the approximate solution from the DA or the FPI for any $E$ value is in $\mathbb{D}_+$ and can be used as an initial guess for the exact solution when using the FPI for other $E$ values, with guaranteed convergence. However, even for small problems, the convergence of the FPI (2.1), with $c = \frac{1}{2}$, for example, can still be very slow when $P(\lambda)$ has some eigenvalues near $1$ or $-1$. Moreover, the latter situation will happen for some energy values, since $P(\lambda)$ is $\mathsf{T}$-palindromic and thus, as $E$ varies, eigenvalues of $P(\lambda)$ leave or enter the unit circle typically through the points $\pm 1$. When $n$ is large, there may be some eigenvalues of $P(\lambda)$ near $\pm 1$ for almost all energy values of interest, and thus the convergence of the FPI may be very slow and other methods should be used.

3. Rank of Im(G_{L,S}). The equation (1.4) has a unique stabilizing solution $X_\eta = G_{L,S}(\eta)^{-1}$ for any $\eta > 0$. Thus
$$(3.1)\qquad X_\eta + A^T X_\eta^{-1} A = Q_\eta$$
with $\rho(X_\eta^{-1}A) < 1$. We also know that $X_\eta$ is complex symmetric. Write $X_\eta = X_{\eta,R} + iX_{\eta,I}$ with $X_{\eta,R} = X_{\eta,R}^T$, $X_{\eta,I} = X_{\eta,I}^T \in \mathbb{R}^{n\times n}$. We know from the previous section that $\mathrm{Im}(X_\eta) = X_{\eta,I} > 0$. Let
$$(3.2)\qquad \varphi_\eta(\lambda) = \lambda A^T + \lambda^{-1}A - Q_\eta.$$
By (3.1) the rational matrix-valued function $\varphi_\eta(\lambda)$ has the factorization
$$\varphi_\eta(\lambda) = \left(\lambda^{-1}I - S_\eta^T\right)X_\eta\left(-\lambda I + S_\eta\right),$$
where $S_\eta = X_\eta^{-1}A$. Let $X = \lim_{\eta\to 0^+} X_\eta = G_{L,S}^{-1}$. Then
$$(3.3)\qquad X + A^T X^{-1} A = Q$$
with $\rho(X^{-1}A) \le 1$ and $\mathrm{Im}(X) \ge 0$. Note that $\varphi_0(\lambda) = \lambda A^T + \lambda^{-1}A - Q$ has the factorization
$$(3.4)\qquad \varphi_0(\lambda) = \left(\lambda^{-1}I - S^T\right)X\left(-\lambda I + S\right),$$
where $S = X^{-1}A$. In particular, $\varphi_0(\lambda)$ is regular, i.e., its determinant is not identically zero. In this section we will determine the rank of $\mathrm{Im}(X)$, which is the same as the rank of $\mathrm{Im}(G_{L,S})$ since $\mathrm{Im}(G_{L,S}) = \mathrm{Im}(X^{-1}) = -X^{-1}\,\mathrm{Im}(X)\,X^{-*}$.

Let
$$(3.5)\qquad \mathcal{M} = \begin{bmatrix} A & 0 \\ Q & -I \end{bmatrix}, \qquad \mathcal{L} = \begin{bmatrix} 0 & I \\ A^T & 0 \end{bmatrix}.$$
Then the pencil $\mathcal{M} - \lambda\mathcal{L}$, also denoted by $(\mathcal{M}, \mathcal{L})$, is a linearization of the quadratic matrix polynomial
$$(3.6)\qquad P(\lambda) = \lambda\varphi_0(\lambda) = \lambda^2 A^T - \lambda Q + A.$$
It is easy to check that $y$ and $z$ are the right and left eigenvectors, respectively, corresponding to an eigenvalue $\lambda$ of $P(\lambda)$ if and only if
$$(3.7)\qquad \begin{bmatrix} y \\ Qy - \lambda A^T y \end{bmatrix}, \qquad \begin{bmatrix} z \\ -\bar{\lambda} z \end{bmatrix}$$
are the right and left eigenvectors of $(\mathcal{M}, \mathcal{L})$, respectively.

Theorem 3.1. Suppose that $\lambda_0$ is a semisimple eigenvalue of $\varphi_0(\lambda)$ on $\mathbb{T}$ with multiplicity $m_0$ and $Y \in \mathbb{C}^{n\times m_0}$ forms an orthonormal basis of right eigenvectors corresponding to $\lambda_0$. Then $iY^*(2\lambda_0 A^T - Q)Y$ is a nonsingular Hermitian matrix. Let $d_j$, $j = 1, \ldots, \ell$, be the distinct eigenvalues of $iY^*(2\lambda_0 A^T - Q)Y$ with multiplicities $m_0^j$, and let $\xi_j \in \mathbb{C}^{m_0\times m_0^j}$ form an orthonormal basis of the eigenspace corresponding to $d_j$. Then for $\eta > 0$ sufficiently small and $j = 1, \ldots, \ell$,
$$(3.8)\qquad \lambda_{j,\eta}^{(k)} = \lambda_0 - \frac{\lambda_0}{d_j}\eta + O(\eta^2), \quad k = 1, \ldots, m_0^j, \qquad \text{and} \qquad y_{j,\eta} = Y\xi_j + O(\eta)$$
are perturbed eigenvalues and a basis of the corresponding invariant subspace of $\varphi_\eta(\lambda)$, respectively.

Proof. Since $P(\lambda_0)Y = \lambda_0\varphi_0(\lambda_0)Y = 0$ with $Y^*Y = I_{m_0}$ and $|\lambda_0| = 1$, we have
$$0 = (P(\lambda_0)Y)^* = \frac{1}{\lambda_0^2}\,Y^*(\lambda_0^2 A^T - \lambda_0 Q + A).$$
It follows that $Y$ forms an orthonormal basis for left eigenvectors of $P(\lambda)$ corresponding to $\lambda_0$. From (3.7), we obtain that the column vectors of
$$\mathcal{Y}_R = \begin{bmatrix} Y \\ QY - \lambda_0 A^T Y \end{bmatrix} \qquad \text{and} \qquad \mathcal{Y}_L = \begin{bmatrix} Y \\ -\bar{\lambda}_0 Y \end{bmatrix}$$
form a basis of the right and left eigenspaces of $\mathcal{M} - \lambda\mathcal{L}$ corresponding to $\lambda_0$, respectively. Since $\lambda_0$ is semisimple, the matrix
$$[Y^*, -\lambda_0 Y^*]\,\mathcal{L}\begin{bmatrix} Y \\ QY - \lambda_0 A^T Y \end{bmatrix} = -Y^*(2\lambda_0 A^T - Q)Y = -Y^*P'(\lambda_0)Y$$
is nonsingular. Let $\widetilde{\mathcal{Y}}_R = -\mathcal{Y}_R\left(Y^*P'(\lambda_0)Y\right)^{-1}$ and $\widetilde{\mathcal{Y}}_L = \mathcal{Y}_L$. Then we have
$$(3.9)\qquad \widetilde{\mathcal{Y}}_L^*\,\mathcal{L}\,\widetilde{\mathcal{Y}}_R = I_{m_0} \qquad \text{and} \qquad \widetilde{\mathcal{Y}}_L^*\,\mathcal{M}\,\widetilde{\mathcal{Y}}_R = \lambda_0 I_{m_0}.$$
For $\eta > 0$ sufficiently small, we consider the perturbation of $P(\lambda)$ given by
$$P(\lambda) - \lambda i\eta I = \lambda^2 A^T - \lambda(Q + i\eta I) + A = \lambda\varphi_\eta(\lambda).$$
Let $\mathcal{M}_\eta = \begin{bmatrix} A & 0 \\ Q + i\eta I & -I \end{bmatrix}$. Then $\mathcal{M}_\eta - \lambda\mathcal{L}$ is a linearization of $\lambda\varphi_\eta(\lambda)$. By (3.9) and [24, Chapter VI, Theorem 2.12] there are $\widetilde{\mathcal{Y}}_R'$ and $\widetilde{\mathcal{Y}}_L'$ such that $[\widetilde{\mathcal{Y}}_R\ \widetilde{\mathcal{Y}}_R']$ and $[\widetilde{\mathcal{Y}}_L\ \widetilde{\mathcal{Y}}_L']$ are nonsingular and
$$\begin{bmatrix} \widetilde{\mathcal{Y}}_L^* \\ (\widetilde{\mathcal{Y}}_L')^* \end{bmatrix}\mathcal{M}\begin{bmatrix} \widetilde{\mathcal{Y}}_R & \widetilde{\mathcal{Y}}_R' \end{bmatrix} = \begin{bmatrix} \lambda_0 I_{m_0} & 0 \\ 0 & \mathcal{M}' \end{bmatrix}, \qquad \begin{bmatrix} \widetilde{\mathcal{Y}}_L^* \\ (\widetilde{\mathcal{Y}}_L')^* \end{bmatrix}\mathcal{L}\begin{bmatrix} \widetilde{\mathcal{Y}}_R & \widetilde{\mathcal{Y}}_R' \end{bmatrix} = \begin{bmatrix} I_{m_0} & 0 \\ 0 & \mathcal{L}' \end{bmatrix}.$$
Then, by [24, Chapter VI, Theorem 2.15] there exist matrices $\Delta_1(\eta) = O(\eta)$ and $\Delta_2(\eta) = O(\eta^2)$ such that the column vectors of $\widetilde{\mathcal{Y}}_R + \Delta_1(\eta)$ span the right eigenspace of $(\mathcal{M}_\eta, \mathcal{L})$ corresponding to $(\lambda_0 I_{m_0} + \eta E_{11} + \Delta_2(\eta), I_{m_0})$, where
$$(3.10)\qquad E_{11} = \widetilde{\mathcal{Y}}_L^*\begin{bmatrix} 0 & 0 \\ iI & 0 \end{bmatrix}\widetilde{\mathcal{Y}}_R = \lambda_0 Y^*(iI)Y\left(Y^*P'(\lambda_0)Y\right)^{-1} = -\lambda_0\left(iY^*(2\lambda_0 A^T - Q)Y\right)^{-1}.$$


The matrix $iY^*(2\lambda_0 A^T - Q)Y$ in (3.10) is Hermitian since
$$(3.11)\qquad iY^*(2\lambda_0 A^T - Q)Y = iY^*\varphi_0(\lambda_0)Y + i\lambda_0 Y^*A^T Y - i\bar{\lambda}_0 Y^*AY = i\lambda_0 Y^*A^T Y + \left(i\lambda_0 Y^*A^T Y\right)^*,$$
where we have used $\varphi_0(\lambda_0)Y = 0$. Let $d_j$, $j = 1, \ldots, \ell$, be the distinct eigenvalues of $iY^*(2\lambda_0 A^T - Q)Y$ with multiplicities $m_0^j$, and let $\xi_j \in \mathbb{C}^{m_0\times m_0^j}$ form an orthonormal basis of the eigenspace corresponding to $d_j$. Then we have
$$\Phi^*E_{11}\Phi = \mathrm{diag}\left(\frac{-\lambda_0}{d_1}I_{m_0^1}, \ldots, \frac{-\lambda_0}{d_\ell}I_{m_0^\ell}\right),$$
where $\Phi = [\xi_1, \ldots, \xi_\ell] \in \mathbb{C}^{m_0\times m_0}$. It follows that $\lambda_0 I_{m_0} + \eta E_{11} + \Delta_2(\eta)$ is similar to
$$\lambda_0 I_{m_0} + \mathrm{diag}\left(\frac{-\lambda_0}{d_1}\eta I_{m_0^1}, \ldots, \frac{-\lambda_0}{d_\ell}\eta I_{m_0^\ell}\right) + \Delta_3(\eta)$$
for some $\Delta_3(\eta) = O(\eta^2)$. Then for each $j \in \{1, 2, \ldots, \ell\}$, the perturbed eigenvalues $\lambda_{j,\eta}^{(k)}$, $k = 1, \ldots, m_0^j$, and a basis of the corresponding invariant subspace of $\mathcal{M}_\eta - \lambda\mathcal{L}$ with $\lambda_{j,\eta}^{(k)}\big|_{\eta=0} = \lambda_0$ can be expressed as
$$(3.12a)\qquad \lambda_{j,\eta}^{(k)} = \lambda_0 - \frac{\lambda_0}{d_j}\eta + O(\eta^2), \quad k = 1, \ldots, m_0^j,$$
$$(3.12b)\qquad \zeta_{j,\eta} = \widetilde{\mathcal{Y}}_R\xi_j + O(\eta).$$
The second equation in (3.8) follows from (3.12b).
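The first-order formula (3.8) can be checked on the scalar case $A = 1$, $Q = 0$, where $\varphi_0(\lambda) = \lambda + \lambda^{-1}$ has the simple unimodular eigenvalues $\pm i$. For $\lambda_0 = i$ one may take $Y = 1$, so $iY^*(2\lambda_0 A^T - Q)Y = 2i^2 = -2 = d_1$, and (3.8) predicts $\lambda_\eta = i + \frac{i}{2}\eta + O(\eta^2)$. A numerical check (an illustrative computation, not from the paper):

```python
import numpy as np

eta = 1e-3
# Eigenvalues of lambda * phi_eta(lambda) = lambda^2 - i*eta*lambda + 1 = 0.
roots = np.roots([1.0, -1j * eta, 1.0])
lam = roots[np.abs(roots - 1j).argmin()]  # the perturbed eigenvalue near i
prediction = 1j * (1 + eta / 2)           # lambda_0 - (lambda_0 / d_1) * eta
err = abs(lam - prediction)
print(err)  # O(eta^2)
```

The observed discrepancy is of order $\eta^2$, consistent with the remainder term in (3.8).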

Lemma 3.2. Let $Z_\eta$ be the solution of the equation
$$(3.13)\qquad Z_\eta - R_\eta^* Z_\eta R_\eta = \eta W_\eta \quad \text{for } \eta > 0,$$
where $W_\eta \in \mathbb{C}^{m\times m}$ is positive definite, $R_\eta = e^{i\theta}I_m + \eta E_\eta$ with $\theta \in [0, 2\pi]$ fixed, and $E_\eta \in \mathbb{C}^{m\times m}$ is uniformly bounded such that $\rho(R_\eta) < 1$. Then $Z_\eta$ is positive definite. Furthermore, if $Z_\eta$ converges to $Z_0$ and $W_\eta$ converges to a positive definite matrix $W_0$ as $\eta \to 0^+$, then $Z_0$ is also positive definite.

Proof. Since $\rho(R_\eta) < 1$ and $\eta W_\eta$ is positive definite, it is well known that $Z_\eta$ is uniquely determined by (3.13) and is positive definite.

Since $E_\eta$ is bounded, we have from (3.13) that
$$\eta W_\eta = Z_\eta - \left(e^{-i\theta}I_m + \eta E_\eta^*\right)Z_\eta\left(e^{i\theta}I_m + \eta E_\eta\right) = -\eta e^{i\theta}E_\eta^* Z_\eta - \eta e^{-i\theta}Z_\eta E_\eta + O(\eta^2).$$
This implies that
$$(3.14)\qquad W_\eta = -e^{i\theta}E_\eta^* Z_\eta - e^{-i\theta}Z_\eta E_\eta + O(\eta).$$
If $Z_\eta$ converges to $Z_0$ as $\eta \to 0^+$, then $Z_0$ is positive semidefinite. To prove that $Z_0$ is positive definite, it suffices to show that $Z_0$ is nonsingular. Suppose that $x \in \mathbb{C}^m$ is such that $Z_0 x = 0$. Then we have $Z_\eta x \to 0$ and $x^* Z_\eta \to 0$ as $\eta \to 0^+$. Multiplying (3.14) by $x^*$ and $x$ from the left and right, respectively, we have
$$x^* W_\eta x = -e^{i\theta}x^* E_\eta^* Z_\eta x - e^{-i\theta}x^* Z_\eta E_\eta x + O(\eta) \to 0 \quad \text{as } \eta \to 0^+.$$
Thus $x = 0$ because $W_\eta$ converges to $W_0$ and $W_0$ is positive definite. It follows that $Z_0$ is positive definite.
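Equations of the form (3.13) are Stein (discrete Lyapunov) equations and, for small $m$, can be solved directly via Kronecker products: with column-stacking vec, $\mathrm{vec}(R^*ZR) = (R^T \otimes R^*)\mathrm{vec}(Z)$. A sketch checking the positive definiteness asserted by Lemma 3.2; the data $\theta$, $E_\eta$, $W_\eta$ below are illustrative choices satisfying the hypotheses, not values from the paper.

```python
import numpy as np

def solve_stein(R, C):
    """Solve Z - R^* Z R = C via (I - R^T kron R^*) vec(Z) = vec(C)."""
    m = R.shape[0]
    coeff = np.eye(m * m) - np.kron(R.T, R.conj().T)
    z = np.linalg.solve(coeff, C.reshape(-1, order="F"))
    return z.reshape((m, m), order="F")

m, eta, theta = 2, 0.1, np.pi / 3
E = -np.eye(m)                         # R = e^{i theta} I + eta E, rho(R) < 1
R = np.exp(1j * theta) * np.eye(m) + eta * E
W = np.eye(m)                          # a positive definite W_eta
Z = solve_stein(R, eta * W)
herm_err = np.abs(Z - Z.conj().T).max()
z_min = np.linalg.eigvalsh(Z).min()
print(herm_err, z_min)  # Z is Hermitian and positive definite
```

Since $Z = \sum_{k\ge 0} R^{*k}(\eta W)R^k \succeq \eta W$, the smallest eigenvalue is bounded below by $\eta$ here, which the computation confirms.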

Theorem 3.3. The number of eigenvalues (counting multiplicities) of $\varphi_0(\lambda)$ on $\mathbb{T}$ must be even, say $2m$. Let $X = \lim_{\eta\to 0^+} X_\eta$ be invertible and write $X = X_R + iX_I$ with $X_R = X_R^T$, $X_I = X_I^T \in \mathbb{R}^{n\times n}$. Then
(i) $\mathrm{rank}(X_I) \le m$;
(ii) $\mathrm{rank}(X_I) = m$ if all eigenvalues of $\varphi_0(\lambda)$ on $\mathbb{T}$ are semisimple and $\|S_\eta - S\|_2 = O(\eta)$ for $\eta > 0$ sufficiently small, where $S_\eta = X_\eta^{-1}A$ and $S = X^{-1}A$;
(iii) $\mathrm{rank}(X_I) = m$ if all eigenvalues of $\varphi_0(\lambda)$ on $\mathbb{T}$ are semisimple and each unimodular eigenvalue of multiplicity $m_j$ is perturbed to $m_j$ eigenvalues (of $\varphi_\eta(\lambda)$) inside the unit circle or to $m_j$ eigenvalues outside the unit circle.

Proof. Consider the real quadratic pencil $P(\lambda) = \lambda\varphi_0(\lambda) = \lambda^2 A^T - \lambda Q + A$. So $P(\lambda)$ and $\varphi_0(\lambda)$ have the same eigenvalues on $\mathbb{T}$. If $\lambda_0 \ne \pm 1$ is an eigenvalue of $P(\lambda)$ on $\mathbb{T}$ with multiplicity $m_0$, then so is $\bar{\lambda}_0$. Thus the total number of nonreal eigenvalues of $P(\lambda)$ on $\mathbb{T}$ must be even. Now the quadratic pencil $P_\eta(\lambda) = \lambda^2 A^T - \lambda(Q + i\eta I) + A$ is $\mathsf{T}$-palindromic, and it has no eigenvalues on $\mathbb{T}$ for any $\eta \ne 0$ [12]. If $1$ (or $-1$) is an eigenvalue of $P(\lambda)$ with multiplicity $r$ and $Q$ in $P(\lambda)$ is perturbed to $Q + i\eta I$, then half of these $r$ eigenvalues are perturbed to the inside of $\mathbb{T}$ and the other half are perturbed to the outside of $\mathbb{T}$. This means that $r$ must be even. Thus the total number of eigenvalues of $\varphi_0(\lambda)$ on $\mathbb{T}$ is also even and is denoted by $2m$.

(i) By $X_\eta + A^T X_\eta^{-1} A = Q_\eta$ we have
$$i(Q_\eta^* - Q_\eta) = i(X_\eta^* - X_\eta) - iA^T(X_\eta^{-1} - X_\eta^{-*})A = i(X_\eta^* - X_\eta) - (X_\eta^{-1}A)^*\,i(X_\eta^* - X_\eta)\,(X_\eta^{-1}A).$$
Thus
$$(3.15)\qquad K_\eta - S_\eta^* K_\eta S_\eta = 2\eta I,$$
where $K_\eta = i(X_\eta^* - X_\eta) = 2X_{\eta,I}$. Note that the eigenvalues of $S_\eta = X_\eta^{-1}A$ are the eigenvalues of $P_\eta(\lambda)$ inside $\mathbb{T}$. Since $X = \lim_{\eta\to 0^+} X_\eta$ is invertible, we have $S = X^{-1}A = \lim_{\eta\to 0^+} S_\eta$. Let
$$(3.16)\qquad S = V_0\begin{bmatrix} R_{0,1} & 0 \\ 0 & R_{0,2} \end{bmatrix}V_0^{-1}$$
be a spectral resolution of $S$, where $R_{0,1} \in \mathbb{C}^{m\times m}$ and $R_{0,2} \in \mathbb{C}^{(n-m)\times(n-m)}$ are upper triangular with $\sigma(R_{0,1}) \subseteq \mathbb{T}$ and $\sigma(R_{0,2}) \subseteq \mathbb{D} \equiv \{\lambda \in \mathbb{C} : |\lambda| < 1\}$, and $V_0 = [V_{0,1}, V_{0,2}]$ with $V_{0,1} \in \mathbb{C}^{n\times m}$ and $V_{0,2} \in \mathbb{C}^{n\times(n-m)}$ having unit column vectors. It follows from [24, Chapter V, Theorem 2.8] that there is a nonsingular matrix $V_\eta = [V_{\eta,1}, V_{\eta,2}]$ with $V_{\eta,1} \in \mathbb{C}^{n\times m}$ and $V_{\eta,2} \in \mathbb{C}^{n\times(n-m)}$ such that
$$(3.17)\qquad S_\eta = V_\eta\begin{bmatrix} R_{\eta,1} & 0 \\ 0 & R_{\eta,2} \end{bmatrix}V_\eta^{-1}$$
and $R_{\eta,1} \to R_{0,1}$, $R_{\eta,2} \to R_{0,2}$, and $V_\eta \to V_0$ as $\eta \to 0^+$.

From (3.15) and (3.17) we have
$$(3.18)\qquad V_\eta^* K_\eta V_\eta - \begin{bmatrix} R_{\eta,1}^* & 0 \\ 0 & R_{\eta,2}^* \end{bmatrix}V_\eta^* K_\eta V_\eta\begin{bmatrix} R_{\eta,1} & 0 \\ 0 & R_{\eta,2} \end{bmatrix} = 2\eta V_\eta^* V_\eta.$$
Let
$$(3.19)\qquad H_\eta = V_\eta^* K_\eta V_\eta = \begin{bmatrix} H_{\eta,1} & H_{\eta,3} \\ H_{\eta,3}^* & H_{\eta,2} \end{bmatrix}, \qquad V_\eta^* V_\eta = \begin{bmatrix} W_{\eta,1} & W_{\eta,3} \\ W_{\eta,3}^* & W_{\eta,2} \end{bmatrix}.$$
Then (3.18) becomes
$$(3.20a)\qquad H_{\eta,1} - R_{\eta,1}^* H_{\eta,1} R_{\eta,1} = 2\eta W_{\eta,1},$$
$$(3.20b)\qquad H_{\eta,2} - R_{\eta,2}^* H_{\eta,2} R_{\eta,2} = 2\eta W_{\eta,2},$$
$$(3.20c)\qquad H_{\eta,3} - R_{\eta,1}^* H_{\eta,3} R_{\eta,2} = 2\eta W_{\eta,3}.$$
As $\eta \to 0^+$, $R_{\eta,1} \to R_{0,1}$ with $\rho(R_{0,1}) = 1$, $R_{\eta,2} \to R_{0,2}$ with $\rho(R_{0,2}) < 1$, and $W_{\eta,2}$ and $W_{\eta,3}$ are bounded. So we have $H_{\eta,2} \to 0$ from (3.20b) and $H_{\eta,3} \to 0$ from (3.20c). It follows from (3.19) that $K_\eta = 2X_{\eta,I}$ converges to $K_0 = 2X_I$ with $\mathrm{rank}(X_I) \le m$.

(ii) Suppose that the eigenvalues of $\varphi_0(\lambda)$ on $\mathbb{T}$ are semisimple and $\|S_\eta - S\|_2 = O(\eta)$ for $\eta > 0$ sufficiently small. We will show that $H_{\eta,1}$ in (3.20a) converges to $H_{0,1}$ with $\mathrm{rank}(H_{0,1}) = m$. Let $\lambda_1, \ldots, \lambda_r \in \mathbb{T}$ be the distinct semisimple eigenvalues of $S$ on $\mathbb{T}$ with multiplicities $m_1, \ldots, m_r$, respectively. Then (3.16) can be written as
$$S = V_0\begin{bmatrix} D_{0,1} & 0 \\ 0 & R_{0,2} \end{bmatrix}V_0^{-1},$$
where $D_{0,1} = \mathrm{diag}\{\lambda_1 I_{m_1}, \ldots, \lambda_r I_{m_r}\}$, $V_0 = [V_{0,\lambda_1}, \ldots, V_{0,\lambda_r}, V_{0,2}]$, and $\sum_{i=1}^r m_i = m$. Now $S_\eta = S + (S_\eta - S)$ with $\|S_\eta - S\|_2 = O(\eta)$. By repeated application of [24, Chapter V, Theorem 2.8] there is a nonsingular matrix $V_\eta = [V_{\eta,\lambda_1}, \ldots, V_{\eta,\lambda_r}, V_{\eta,2}] \in \mathbb{C}^{n\times n}$ such that
$$S_\eta = V_\eta\begin{bmatrix} D_{0,1} + \eta E_{\eta,1} & 0 \\ 0 & R_{0,2} + \eta E_{\eta,2} \end{bmatrix}V_\eta^{-1}$$
and $V_\eta \to V_0$ as $\eta \to 0^+$, where $E_{\eta,1} = \mathrm{diag}\{E_{m_1,\eta}, \ldots, E_{m_r,\eta}\}$ with $E_{m_j,\eta} \in \mathbb{C}^{m_j\times m_j}$ and $E_{\eta,2} \in \mathbb{C}^{(n-m)\times(n-m)}$ such that $\|E_{m_j,\eta}\|_2 = O(1)$ for $j = 1, \ldots, r$ and $\|E_{\eta,2}\|_2 = O(1)$.

The equation (3.20a) can then be written as
$$(3.21)\qquad H_{\eta,1} - (D_{0,1} + \eta E_{\eta,1})^* H_{\eta,1}(D_{0,1} + \eta E_{\eta,1}) = 2\eta W_{\eta,1}.$$
Since $D_{0,1} + \eta E_{\eta,1}$ is a block diagonal matrix and all eigenvalues of its $j$th diagonal block converge to $\lambda_j$, with the $\lambda_j$'s distinct numbers on $\mathbb{T}$, we have
$$H_{\eta,1} = \mathrm{diag}\{H_{m_1,\eta}, \ldots, H_{m_r,\eta}\} + O(\eta),$$
where $\mathrm{diag}\{H_{m_1,\eta}, \ldots, H_{m_r,\eta}\}$ is the block diagonal of $H_{\eta,1}$. Then (3.21) gives
$$H_{m_j,\eta} - (\lambda_j I_{m_j} + \eta E_{m_j,\eta})^* H_{m_j,\eta}(\lambda_j I_{m_j} + \eta E_{m_j,\eta}) = 2\eta W_{m_j,\eta} \quad \text{for } j = 1, \ldots, r,$$
where $W_{m_j,\eta}$ is the $j$th diagonal block of $W_{\eta,1}$. Since $W_{\eta,1}$ is positive definite and converges to a positive definite matrix, the $W_{m_j,\eta}$, $j = 1, \ldots, r$, are also positive definite and converge to positive definite matrices. For $\eta > 0$, we have $\rho(\lambda_j I_{m_j} + \eta E_{m_j,\eta}) < 1$ for $j = 1, \ldots, r$, since $\rho(S_\eta) < 1$. By the assumption that $X_\eta$ converges to $X$, we have that $H_{m_j,\eta}$ converges to $H_{m_j,0}$ for $j = 1, \ldots, r$. From Lemma 3.2, we obtain that $H_{m_j,0}$ is positive definite for $j = 1, \ldots, r$. Hence, $H_{\eta,1}$ converges to $H_{0,1}$ with $\mathrm{rank}(H_{0,1}) = m$. It follows from (3.19) that $K_\eta = 2X_{\eta,I}$ converges to $K_0 = 2X_I$ with $\mathrm{rank}(X_I) = m$.

(iii) It suffices to show that $\|S_\eta - S\|_2 = O(\eta)$ for $\eta > 0$ sufficiently small. Since $X$ is a solution of $X + A^T X^{-1} A = Q$, we have
$$\mathcal{M}\begin{bmatrix} I \\ X \end{bmatrix} = \mathcal{L}\begin{bmatrix} I \\ X \end{bmatrix}S,$$
where the pencil $(\mathcal{M}, \mathcal{L})$ is defined in (3.5). Under the condition in (iii), the column space of $\begin{bmatrix} I \\ X \end{bmatrix}$ is a simple eigenspace of $(\mathcal{M}, \mathcal{L})$, in the terminology of [24]. It follows from [24, Chapter VI, Theorems 2.12 and 2.15] that
$$\begin{bmatrix} A & 0 \\ Q + i\eta I & -I \end{bmatrix}\begin{bmatrix} I + \eta F_{\eta,1} \\ X + \eta F_{\eta,2} \end{bmatrix} = \begin{bmatrix} 0 & I \\ A^T & 0 \end{bmatrix}\begin{bmatrix} I + \eta F_{\eta,1} \\ X + \eta F_{\eta,2} \end{bmatrix}(S + \eta E_\eta),$$
where $F_{\eta,1}, F_{\eta,2}, E_\eta \in \mathbb{C}^{n\times n}$ with $\max\{\|F_{\eta,1}\|_2, \|F_{\eta,2}\|_2, \|E_\eta\|_2\} \le c$ for $\eta > 0$ sufficiently small and some constant $c > 0$. It is easily seen that
$$X_\eta = (X + \eta F_{\eta,2})(I + \eta F_{\eta,1})^{-1}, \qquad S_\eta = X_\eta^{-1}A = (I + \eta F_{\eta,1})(S + \eta E_\eta)(I + \eta F_{\eta,1})^{-1}.$$
It follows that $\|S_\eta - S\|_2 = O(\eta)$ for $\eta > 0$ sufficiently small.

Remark 3.1. Without the additional conditions in Theorem 3.3(ii) or (iii), $\mathrm{rank}(X_I)$ could be much smaller than $m$. Consider the example with $A = I_n$ and $Q = 2I_n$. Then $\varphi_0(\lambda)$ has all $2n$ eigenvalues at $1$, with partial multiplicities $2$. Thus $m = n$, but it is easy to see that $\mathrm{rank}(X_I) = 0$. For this example, we have $\|S_\eta - S\|_2 = O(\eta^{1/2})$ for $\eta > 0$ sufficiently small. We also know that the $2n$ eigenvalues of $\varphi_0(\lambda)$ at $1$ are perturbed to $n$ eigenvalues inside the unit circle and $n$ eigenvalues outside the unit circle.
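The scalar case already illustrates both Theorem 3.3 and Remark 3.1. With $A = 1$ and $Q = 0$, $\varphi_0$ has the two simple eigenvalues $\pm i$ on $\mathbb{T}$ ($m = 1$) and $\mathrm{Im}\,X \to 1 > 0$; with $A = 1$ and $Q = 2$ (the Remark 3.1 situation for $n = 1$), $\mathrm{Im}\,X \to 0$ even though $m = 1$. A numerical sketch, where the stabilizing solution is picked as the root of $\lambda^2 - q_\eta\lambda + 1$ of larger modulus:

```python
import numpy as np

def x_eta(q, eta):
    """Stabilizing solution of x + 1/x = q + i*eta (scalar A = 1)."""
    r = np.roots([1.0, -(q + 1j * eta), 1.0])
    return r[np.abs(r).argmax()]   # |x| > 1  <=>  rho(x^{-1} A) < 1

eta = 1e-8
im_q0 = x_eta(0.0, eta).imag   # -> 1:  rank(X_I) = m = 1 (simple +-i on T)
im_q2 = x_eta(2.0, eta).imag   # -> 0:  rank(X_I) = 0 < m = 1 (double root at 1)
print(im_q0, im_q2)
```

In the second case the imaginary part decays like $\eta^{1/2}$ rather than $\eta$, reflecting the $O(\eta^{1/2})$ perturbation of $S_\eta$ noted in the remark.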

Corollary 3.4. If $\varphi_0(\lambda)$ has no eigenvalues on $\mathbb{T}$, then $X$ is real symmetric. Furthermore, $\mathrm{In}(X) = \mathrm{In}(-\varphi_0(1))$, where $\mathrm{In}(W)$ denotes the inertia of a matrix $W$.

Proof. From Theorem 3.3, it is easy to see that $X$ is a real symmetric matrix. Since $X$ is real, $S = X^{-1}A$ is a real matrix. By setting $\lambda = 1$ in (3.4) we get $\varphi_0(1) = -(I - S)^T X(I - S)$. Hence, $\mathrm{In}(X) = \mathrm{In}(-\varphi_0(1))$.

Corollary 3.5. If all eigenvalues of $\varphi_0(\lambda)$ are on $\mathbb{T}$ and are simple, then $X_I$ is positive definite.

Proof. This follows immediately from Theorem 3.3(iii).

4. An SA. As explained in [12] and also in this paper, the required solution $X = G_{L,S}^{-1}$ is a particular weakly stabilizing solution of (3.3) and is given by $X = \lim_{\eta\to 0^+} X_\eta$, where $X_\eta$ is the unique stabilizing solution of (1.4). We will call this particular solution the weakly stabilizing solution of (3.3). It can be approximated by $X_\eta$ for a small $\eta$. For a fixed $\eta > 0$, $X_\eta$ can be computed efficiently by the DA studied in [12] for all energy values.

In this section we will develop an SA that for most cases can find the weakly stabilizing solution of (3.3) more efficiently and more accurately than the DA, by working on the equation (3.3) directly.


Consider the pencil $(\mathcal{M}, \mathcal{L})$ given by (3.5). The simple relation
$$\mathcal{M}\begin{bmatrix} I \\ X \end{bmatrix} = \mathcal{L}\begin{bmatrix} I \\ X \end{bmatrix}X^{-1}A$$
shows that the weakly stabilizing solution of (3.3) is obtained by $X = X_2 X_1^{-1}$, where $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ forms (or, more precisely, the columns of $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ form) a basis for the invariant subspace of $(\mathcal{M}, \mathcal{L})$ corresponding to its eigenvalues inside $\mathbb{T}$ and those of its eigenvalues on $\mathbb{T}$ that would be perturbed to the inside of $\mathbb{T}$ when $Q$ is replaced by $Q_\eta$ with $\eta > 0$.

We now assume that all unimodular eigenvalues $\lambda \ne \pm 1$ of $(\mathcal{M}, \mathcal{L})$ are semisimple and the eigenvalues $\pm 1$ (if they exist) have partial multiplicities $2$. This assumption seems to hold generically. Under this assumption, for computing the weakly stabilizing solution we need to include all linearly independent eigenvectors associated with the eigenvalues $\pm 1$, and use Theorem 3.1 to determine which half of the unimodular eigenvalues $\lambda \ne \pm 1$ should be included to compute the required invariant subspace.

We may use the QZ algorithm to determine this invariant subspace, but it would be better to exploit the structure of the pencil $(\mathcal{M}, \mathcal{L})$. We will use the same approach as in [15] to develop an SA to find a basis for the desired invariant subspace of $(\mathcal{M}, \mathcal{L})$ and then compute the weakly stabilizing solution of (3.3). The algorithm is still based on the $(S + S^{-1})$-transform in [19] and Patel's algorithm in [23], but some new issues need to be addressed here.

It is well known that (M, L) is a symplectic pair, i.e., (M, L) satisfies MJMᵀ = LJLᵀ, where
\[
J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}.
\]
Furthermore, the eigenvalues of (M, L) form reciprocal pairs (ν, 1/ν), where we allow ν = 0, ∞. We define the (S + S⁻¹)-transform [19] of (M, L) by
\[
K := MJL^\top J + LJM^\top J = \begin{bmatrix} Q & A - A^\top \\ A^\top - A & Q \end{bmatrix}, \tag{4.1a}
\]
\[
N := LJL^\top J = \begin{bmatrix} A & 0 \\ 0 & A^\top \end{bmatrix}. \tag{4.1b}
\]
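A quick numerical sanity check of (4.1) and of the symplectic property (a sketch, not from the paper; the block forms of M and L are an assumed standard symplectic linearization of (3.3), which may differ from (3.5) up to block permutation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
Q0 = rng.standard_normal((n, n))
Q = Q0 + Q0.T                              # real symmetric

Z0, I = np.zeros((n, n)), np.eye(n)
M = np.block([[A, Z0], [Q, -I]])           # assumed linearization
L = np.block([[Z0, I], [A.T, Z0]])
J = np.block([[Z0, I], [-I, Z0]])

K = M @ J @ L.T @ J + L @ J @ M.T @ J      # (S + S^{-1})-transform
N = L @ J @ L.T @ J

# (M, L) is a symplectic pair, and K*J, N*J are skew-symmetric
# (the skew-Hamiltonian property):
sympl = np.linalg.norm(M @ J @ M.T - L @ J @ L.T)
skewK = np.linalg.norm(K @ J + (K @ J).T)
skewN = np.linalg.norm(N @ J + (N @ J).T)
```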

Then K and N are both skew-Hamiltonian, i.e., (KJ)ᵀ = −KJ and (NJ)ᵀ = −NJ. The relationship between the eigenvalues of (M, L) and (K, N) and their Kronecker structures has been studied in [19, Theorem 3.2]. We will first extend that result to allow unimodular eigenvalues for (M, L). The following preliminary result is needed.

Lemma 4.1. Let N_r(λ) := λI_r + N_r, where N_r is the nilpotent matrix with N_r(i, i+1) = 1 for i = 1, …, r−1, and zeros elsewhere. Let ∼ denote equivalence between two matrix pairs. (Two matrix pairs (Y₁, Y₂) and (Z₁, Z₂) are called equivalent if there are nonsingular matrices U and V such that UY₁V = Z₁ and UY₂V = Z₂.) Then
(i) for λ ≠ 0, ±1, (N_r(λ) + N_r(λ)⁻¹, I_r) ∼ (N_r(λ + 1/λ), I_r);
(ii) (N_r² + I_r, N_r) ∼ (I_r, N_r).

Proof. (i) Since λ ≠ 0, one can show that N_r(λ)⁻¹ = [t_{j−i}] and N_r(λ) + N_r(λ)⁻¹ = [s_{j−i}] are upper triangular Toeplitz matrices with t_k = (−1)ᵏλ^{−(k+1)} for k = 0, 1, …, r−1, as well as s₀ = λ + 1/λ, s₁ = 1 − λ⁻², and s_k = t_k for k = 2, …, r−1. Since λ ≠ ±1, s₁ = 1 − λ⁻² is nonzero. It follows that (N_r(λ) + N_r(λ)⁻¹, I_r) ∼ (N_r(λ + 1/λ), I_r).
(ii) (I + N_r², N_r) ∼ (I_r, N_r(I + N_r²)⁻¹) ∼ (I_r, N_r − N_r³ + N_r⁵ − ⋯) ∼ (I_r, N_r).
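The explicit Toeplitz form of N_r(λ)⁻¹ used in part (i) is easy to verify numerically (a small check, not part of the paper):

```python
import numpy as np

r, lam = 5, 0.7 + 0.2j                     # any lambda != 0
Nr = np.diag(np.ones(r - 1), 1)            # nilpotent upshift matrix
Nlam = lam * np.eye(r) + Nr                # N_r(lambda)

# (lam*I + N)^{-1} = sum_k (-1)^k lam^{-(k+1)} N^k, so the inverse is
# upper triangular Toeplitz with first row t_k = (-1)^k lam^{-(k+1)}
inv = np.linalg.inv(Nlam)
t = np.array([(-1) ** k * lam ** (-(k + 1)) for k in range(r)])
```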

Theorem 4.2. Suppose that (M, L) has eigenvalues ±1 with partial multiplicities 2. Let γ = λ + 1/λ (λ = 0, ∞ permitted). Then λ and 1/λ are eigenvalues of (M, L) if and only if γ is a double eigenvalue of (K, N). Furthermore, for λ ≠ ±1 (i.e., γ ≠ ±2), γ, λ, and 1/λ have the same size Jordan blocks, i.e., they have the same partial multiplicities; for λ = ±1, γ = ±2 are semisimple eigenvalues of (K, N).


Proof. By the results on the Kronecker canonical form for a symplectic pencil (see [20]) and by our assumption, there are nonsingular matrices 𝒳 and 𝒴 such that
\[
\mathcal{Y} M \mathcal{X} = \begin{bmatrix} J & D \\ 0 & I_n \end{bmatrix}, \qquad
\mathcal{Y} L \mathcal{X} = \begin{bmatrix} I_n & 0 \\ 0 & J^\top \end{bmatrix}, \tag{4.2}
\]
where J = J₁ ⊕ J_s ⊕ J₀, J₁ = I_p ⊕ (−I_q), J_s is the direct sum of Jordan blocks corresponding to nonzero eigenvalues λ_j of (M, L) with |λ_j| < 1 or λ_j = e^{iθ_j} with Im(λ_j) > 0, J₀ is the direct sum of nilpotent blocks corresponding to zero eigenvalues, and D = I_p ⊕ I_q ⊕ 0_{n−r} with r = p + q.

Let
\[
\mathcal{X}^{-1} J \mathcal{X}^{-\top} = \begin{bmatrix} X_1 & X_2 \\ -X_2^\top & X_3 \end{bmatrix},
\]
where X_i ∈ C^{n×n}, i = 1, 2, 3. Thus X₁ = −X₁ᵀ and X₃ = −X₃ᵀ. Using (4.2) in MJMᵀ = LJLᵀ we get
\[
J X_1 J^\top - D X_2^\top J^\top + J X_2 D + D X_3 D = X_1, \tag{4.3a}
\]
\[
J X_2 + D X_3 = X_2 J, \tag{4.3b}
\]
\[
J^\top X_3 J = X_3. \tag{4.3c}
\]

Let J_{s,0} = J_s ⊕ J₀. We partition X₃ and X₂ as
\[
X_3 = \begin{bmatrix} X_{3,1} & X_{3,2} \\ -X_{3,2}^\top & X_{3,3} \end{bmatrix}, \qquad
X_2 = \begin{bmatrix} X_{2,1} & X_{2,2} \\ X_{2,3} & X_{2,4} \end{bmatrix},
\]
respectively, where X_{2,1}, X_{3,1} ∈ C^{r×r}. Comparing the diagonal blocks in (4.3b) we get
\[
J_1 X_{2,1} + X_{3,1} = X_{2,1} J_1, \tag{4.4a}
\]
\[
J_{s,0} X_{2,4} = X_{2,4} J_{s,0}. \tag{4.4b}
\]

From (4.4a) we see that X_{3,1} has the form
\[
X_{3,1} = \begin{bmatrix} 0_p & \omega \\ -\omega^\top & 0_q \end{bmatrix}.
\]
From (4.3c) we have [I_p ⊕ (−I_q)] X_{3,1} [I_p ⊕ (−I_q)] = X_{3,1}. It follows that ω = 0 and thus X_{3,1} = 0. From (4.3c) we also have J₁ X_{3,2} J_{s,0} = X_{3,2} and J_{s,0}ᵀ X_{3,3} J_{s,0} = X_{3,3}, from which we get X_{3,2} = 0 and X_{3,3} = 0. So we have X₃ = 0. Then (4.3b) becomes J X₂ = X₂ J, from which we get
\[
X_{2,1} = \eta_p \oplus \eta_q, \qquad X_{2,2} = X_{2,3} = 0, \tag{4.5}
\]
where η_p ∈ C^{p×p} and η_q ∈ C^{q×q}. Moreover, X_{2,1} and X_{2,4} are nonsingular by the nonsingularity of 𝒳⁻¹J𝒳⁻ᵀ. Substituting (4.3b) into (4.3a) we get
\[
X_1 = J X_1 J^\top + J X_2 D - D X_2^\top J^\top \equiv J X_1 J^\top + V, \tag{4.6}
\]
where
\[
V = J X_2 D - D X_2^\top J^\top = \begin{bmatrix} V_1 & 0 \\ 0 & 0 \end{bmatrix}
\quad\text{with}\quad V_1 = (\eta_p - \eta_p^\top) \oplus (\eta_q^\top - \eta_q)
\]
by (4.5). Partition
\[
X_1 = \begin{bmatrix} X_{1,1} & X_{1,2} \\ -X_{1,2}^\top & X_{1,3} \end{bmatrix}
\]
with X_{1,1} ∈ C^{r×r}. From the equations for the (1, 2) and (2, 2) blocks of (4.6) we get X_{1,2} = 0 and X_{1,3} = 0, respectively. Furthermore, from the equation for the (1, 1) block in (4.6) we get X_{1,1} = ξ_p ⊕ ξ_q with ξ_p ∈ C^{p×p} and ξ_q ∈ C^{q×q}, and we also get η_p = η_pᵀ and η_q = η_qᵀ, i.e., X_{2,1} = X_{2,1}ᵀ.


From (4.2), (4.4), and Lemma 4.1(ii) we now have
\begin{align*}
(K, N) &\sim (MJL^\top + LJM^\top,\ LJL^\top)\\
&\sim \left( \begin{bmatrix} (J_1X_{1,1} + X_{1,1}J_1) \oplus 0_{n-r} & (J_1^2 + I_r)X_{2,1} \oplus (J_{s,0}^2 + I_{n-r})X_{2,4} \\ X_{2,1}^\top(J_1^2 + I_r) \oplus X_{2,4}^\top((J_{s,0}^\top)^2 + I_{n-r}) & 0_n \end{bmatrix},\right.\\
&\qquad\quad \left.\begin{bmatrix} X_{1,1} \oplus 0_{n-r} & J_1X_{2,1} \oplus J_{s,0}X_{2,4} \\ X_{2,1}^\top J_1 \oplus X_{2,4}^\top J_{s,0}^\top & 0_n \end{bmatrix}\right)\\
&\sim \left( \begin{bmatrix} 2\xi_p \oplus (-2\xi_q) & 2I_r \\ 2I_r & 0_r \end{bmatrix} \oplus (J_s + J_s^{-1}) \oplus I \oplus (J_s + J_s^{-1}) \oplus I,\right.\\
&\qquad\quad \left.\begin{bmatrix} \xi_p \oplus \xi_q & I_p \oplus (-I_q) \\ I_p \oplus (-I_q) & 0_r \end{bmatrix} \oplus I \oplus J_0 \oplus I \oplus J_0 \right)\\
&\sim \bigl( 2I_{2p} \oplus (-2I_{2q}) \oplus (J_s + J_s^{-1}) \oplus I \oplus (J_s + J_s^{-1}) \oplus I,\ I_{2p} \oplus I_{2q} \oplus I \oplus J_0 \oplus I \oplus J_0 \bigr).
\end{align*}
The proof is completed by using Lemma 4.1(i).

4.1. Development of SA. It is helpful to keep in mind that the transform λ→ γ achieves the following:

{0, ∞} → ∞, T → [−2, 2], R \ {±1, 0} → R \ [−2, 2], C \ (R ∪ T) → C \ R.

By Theorem 4.2 and our assumption on the unimodular eigenvalues of (M, L), all

eigenvalues of (K, N ) in [−2, 2] are semisimple.

Based on the Patel approach [23] we first reduce (K, N) to a block triangular matrix pair of the form
\[
U^\top K Z = \begin{bmatrix} K_1 & K_2 \\ 0 & K_1^\top \end{bmatrix}, \qquad
U^\top N Z = \begin{bmatrix} N_1 & N_2 \\ 0 & N_1^\top \end{bmatrix}, \tag{4.7}
\]
where K₁ and N₁ ∈ R^{n×n} are upper Hessenberg and upper triangular, respectively, K₂ and N₂ are skew-symmetric, and U, Z ∈ R^{2n×2n} are orthogonal satisfying UᵀJZ = J.

By the QZ algorithm, we have orthogonal matrices Q₁ and Z₁ such that
\[
Q_1^\top K_1 Z_1 = K_{11}, \qquad Q_1^\top N_1 Z_1 = N_{11}, \tag{4.8}
\]
where K₁₁ and N₁₁ are quasi-upper and upper triangular, respectively.

From (4.7) and (4.8) we see that the pair (K₁₁, N₁₁) contains half of the eigenvalues of (K, N). We now reduce (K₁₁, N₁₁) to the quasi-upper and upper block triangular matrix pair
\[
\widetilde{U} K_{11} \widetilde{Z} = \begin{bmatrix}
I_{m_0} & K_{01} & K_{02} & \cdots & K_{0r}\\
& \Gamma_1 & K_{12} & \cdots & K_{1r}\\
& & \Gamma_2 & \ddots & \vdots\\
& & & \ddots & K_{r-1,r}\\
& & & & \Gamma_r
\end{bmatrix}, \qquad
\widetilde{U} N_{11} \widetilde{Z} = \operatorname{diag}\{\Gamma_0, I_{m_1}, I_{m_2}, \ldots, I_{m_r}\},
\]
where m₀ + m₁ + ⋯ + m_r = n, Γ₀ is strictly upper triangular, Γ₁ = diag{g₁, …, g_{m₁}} with g_i ∈ [−2, 2], and σ(Γ_j) = {γ_j} ⊆ R \ [−2, 2] or σ(Γ_j) = {γ_j, γ̄_j} ⊆ C \ R with


σ(Γ_j) ∩ σ(Γ_i) = ∅ for i ≠ j, i, j = 2, …, r. Let
\[
\widetilde{K}_i = [\,K_{i,i+1}, \ldots, K_{i,r}\,], \quad i = 0, \ldots, r-1; \qquad
\widetilde{\Gamma}_j = \begin{bmatrix} \Gamma_j & \widetilde{K}_j \\ 0 & \widetilde{\Gamma}_{j+1} \end{bmatrix}, \quad j = 1, \ldots, r-1, \qquad \widetilde{\Gamma}_r = \Gamma_r.
\]
By solving some Sylvester equations of the forms
\[
\Gamma_0 Y \widetilde{\Gamma}_1 - Y = \widetilde{K}_0, \qquad
Y \widetilde{\Gamma}_{i+1} - \Gamma_i Y = \widetilde{K}_i, \quad i = 1, \ldots, r-1, \tag{4.9}
\]
we can reduce (K₁₁, N₁₁) to the quasi-upper and upper block diagonal matrix pair
\[
\widehat{U} K_{11} \widehat{Z} = \operatorname{diag}\{I_{m_0}, \Gamma_1, \Gamma_2, \ldots, \Gamma_r\}, \qquad
\widehat{U} N_{11} \widehat{Z} = \operatorname{diag}\{\Gamma_0, I_{m_1}, I_{m_2}, \ldots, I_{m_r}\}. \tag{4.10}
\]

The procedure here is very much like the block diagonalization described in section 7.6.3 of [8]. Partition \(\widehat{Z} = [\,\widehat{Z}_0, \widehat{Z}_1, \ldots, \widehat{Z}_r\,]\) with \(\widehat{Z}_i \in \mathbb{R}^{n\times m_i}\) according to the block sizes in (4.10). It holds that
\[
K_{11} \widehat{Z}_0 \Gamma_0 = N_{11} \widehat{Z}_0, \qquad
K_{11} \widehat{Z}_j = N_{11} \widehat{Z}_j \Gamma_j, \quad j = 1, \ldots, r. \tag{4.11}
\]
It follows that, with Z₁ from (4.8), Z(:, 1:n)(Z₁\(\widehat{Z}_j\)) forms a basis for an invariant subspace of (K, N) for j = 0, 1, …, r. In particular, the columns of Z(:, 1:n)(Z₁\(\widehat{Z}_1\)) are real eigenvectors of (K, N) corresponding to real eigenvalues in [−2, 2]. We then need to get a suitable invariant subspace of (M, L) from each of these invariant subspaces of (K, N).

We start with two lemmas about solving the quadratic equation γ = λ + 1/λ in matrix form.

Lemma 4.3. Given a real quasi-upper triangular matrix
\[
\Gamma_s = \begin{bmatrix} \gamma_{11} & \cdots & \gamma_{1m} \\ & \ddots & \vdots \\ 0 & & \gamma_{mm} \end{bmatrix}, \tag{4.12}
\]
where each γ_{ii} is a 1×1 or 2×2 block with σ(γ_{ii}) ⊆ C \ [−2, 2], i = 1, …, m, the quadratic matrix equation
\[
\Lambda_s^2 - \Gamma_s \Lambda_s + I = 0 \tag{4.13}
\]
in Λ_s is uniquely solvable, with Λ_s real quasi-upper triangular with the same block form as Γ_s in (4.12) and σ(Λ_s) ⊆ D ≡ {λ ∈ C : |λ| < 1}.

Proof. Let
\[
\Lambda_s = \begin{bmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ & \ddots & \vdots \\ 0 & & \lambda_{mm} \end{bmatrix}
\]
have the same block form as Γ_s. We first solve the diagonal blocks {λ_{ii}}_{i=1}^{m} of Λ_s from the quadratic equation λ² − γ_{ii}λ + I_{[i]} = 0, where [i] denotes the size of γ_{ii}. Note that the scalar equation
\[
\lambda^2 - \gamma\lambda + 1 = 0 \tag{4.14}
\]
has no solutions on T for γ ∈ C \ [−2, 2]. It always has one solution inside T and the other outside T.
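In floating point it is safer to obtain the root of (4.14) inside T from the larger root, since the two roots of (4.14) multiply to 1 (a sketch; the paper does not prescribe this implementation detail):

```python
import cmath

def stable_root(gamma):
    """Root of l**2 - gamma*l + 1 = 0 with modulus < 1, for gamma not in [-2, 2].

    The two roots are reciprocals of each other, so we compute the larger
    root stably and return its reciprocal; this avoids the cancellation
    that the quadratic formula suffers for the smaller root when |gamma| is large.
    """
    d = cmath.sqrt(gamma * gamma - 4)
    big = (gamma + d) / 2 if abs(gamma + d) >= abs(gamma - d) else (gamma - d) / 2
    return 1 / big
```

For example, `stable_root(3)` returns (3 − √5)/2 ≈ 0.381966, the root inside the unit circle.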

(18)

For i = 1, …, m, if γ_{ii} ∈ R \ [−2, 2], then λ_{ii} ∈ (−1, 1) is uniquely solved from (4.14) with γ = γ_{ii}. If γ_{ii} ∈ R^{2×2} with γ_{ii}z = γz for z ≠ 0 and γ ∈ C \ R, then γ_{ii} = [z, z̄] diag{γ, γ̄} [z, z̄]⁻¹, and the required solution is λ_{ii} = [z, z̄] diag{λ, λ̄} [z, z̄]⁻¹ ∈ R^{2×2}, where λ ∈ D is uniquely solved from (4.14).

For j > i, comparing the (i, j) blocks on both sides of (4.13) and using λ_{ii} − γ_{ii} = −λ_{ii}⁻¹, we get
\[
\lambda_{ij}\lambda_{jj} - \lambda_{ii}^{-1}\lambda_{ij}
= \gamma_{ij}\lambda_{jj} + \sum_{\ell=i+1}^{j-1} (\gamma_{i\ell} - \lambda_{i\ell})\lambda_{\ell j}.
\]
Since σ(λ_{ii}⁻¹) ∩ σ(λ_{jj}) = ∅, i, j = 1, …, m, the strictly upper triangular part of Λ_s can be determined by the following recursive formula:

For d = 1, …, m−1,
  For i = 1, …, m−d, j = i + d,
    A := λ_{jj}ᵀ ⊗ I_{[i]} − I_{[j]} ⊗ λ_{ii}⁻¹,
    b := γ_{ij}λ_{jj} + Σ_{ℓ=i+1}^{j−1} (γ_{iℓ} − λ_{iℓ})λ_{ℓj},
    λ_{ij} := vec⁻¹(A⁻¹vec(b)),
  end i,
end d.

Here ⊗ denotes the Kronecker product, vec is the operation of stacking the columns of a matrix into a vector, and vec⁻¹ is its inverse operation.
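When all diagonal blocks of Γ_s are 1×1 (real eigenvalues off [−2, 2]), the Kronecker solve collapses to a scalar division. The following sketch (my own reduction of the recursion in the proof of Lemma 4.3 to that scalar case) illustrates it:

```python
import numpy as np

def solve_quasi_triangular(Gamma):
    """Solve Lam @ Lam - Gamma @ Lam + I = 0 for upper triangular Gamma whose
    diagonal entries lie in R outside [-2, 2] (1x1 blocks only), following
    the superdiagonal-by-superdiagonal recursion of Lemma 4.3."""
    m = Gamma.shape[0]
    Lam = np.zeros_like(Gamma, dtype=float)
    for i in range(m):
        g = Gamma[i, i]
        big = (g + np.sign(g) * np.sqrt(g * g - 4)) / 2   # root outside T
        Lam[i, i] = 1 / big                               # root inside T
    for d in range(1, m):                  # d-th superdiagonal
        for i in range(m - d):
            j = i + d
            b = Gamma[i, j] * Lam[j, j] + sum(
                (Gamma[i, l] - Lam[i, l]) * Lam[l, j] for l in range(i + 1, j))
            Lam[i, j] = b / (Lam[j, j] - 1 / Lam[i, i])   # scalar "Sylvester" solve
    return Lam
```

The divisor is never zero since |λ_{jj}| < 1 < |λ_{ii}⁻¹|.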

We note that (4.13) is a special case of the palindromic matrix equation studied recently in [16]. The desired solution Λ_s could also be obtained by applying the general formula in [16, Theorem 5], which involves the computation of (Γ_s⁻¹)² and a matrix square root. However, for our special equation, the procedure given in the proof of Lemma 4.3 is more direct and numerically advantageous.

Lemma 4.4. Given a strictly upper triangular matrix Γ₀ = [γ_{ij}] ∈ R^{e×e}, the quadratic matrix equation
\[
\Gamma_0 \Lambda_0^2 - \Lambda_0 + \Gamma_0 = 0 \tag{4.15}
\]
in Λ₀ = [λ_{ij}] ∈ R^{e×e}, with Λ₀ strictly upper triangular, is uniquely solvable.

Proof. From (4.15) the matrix Λ₀ is uniquely determined by λ_{i,i+j} = γ_{i,i+j} for i = 1, …, e−2, j = 1, 2, λ_{e−1,e} = γ_{e−1,e}, and

For j = 3, …, e,
  For i = 1, …, e−j+1,
    (Λ₀²)_{i,i+j−1} = Σ_{ℓ=i+1}^{i+j−2} λ_{i,ℓ} λ_{ℓ,i+j−1},
  end i,
  For i = 1, …, e−j,
    λ_{i,i+j} = γ_{i,i+j} + Σ_{ℓ=i+1}^{i+j−2} γ_{i,ℓ} (Λ₀²)_{ℓ,i+j},
  end i,
end j.
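Since Γ₀ and Λ₀ are strictly upper triangular (hence nilpotent), the entrywise recursion above can also be realized as the fixed-point iteration Λ ← Γ₀Λ² + Γ₀, which settles one more superdiagonal per sweep and therefore terminates exactly within e sweeps (a sketch of mine, not from the paper):

```python
import numpy as np

def solve_nilpotent(Gamma0):
    """Solve Gamma0 @ Lam @ Lam - Lam + Gamma0 = 0 with Lam strictly upper
    triangular. Each sweep of Lam <- Gamma0 @ Lam^2 + Gamma0 fixes further
    superdiagonals, so e sweeps give the exact (unique) solution."""
    e = Gamma0.shape[0]
    Lam = np.zeros_like(Gamma0)
    for _ in range(e):
        Lam = Gamma0 @ Lam @ Lam + Gamma0
    return Lam
```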

Theorem 4.5. Let Z_s form a basis for an invariant subspace of (K, N) corresponding to Γ_s with σ(Γ_s) ⊆ C \ [−2, 2], i.e., K Z_s = N Z_s Γ_s. Suppose that Λ_s solves Γ_s = Λ_s + Λ_s⁻¹ as in Lemma 4.3 with σ(Λ_s) ⊆ D \ {0}. If the columns of J(LᵀJ Z_s Λ_s − MᵀJ Z_s) are linearly independent, then they form a basis for a stable invariant subspace of (M, L) corresponding to Λ_s.

Proof. Since
\[
K Z_s - N Z_s \Gamma_s = (MJL^\top + LJM^\top)J Z_s - LJL^\top J Z_s(\Lambda_s + \Lambda_s^{-1}) = 0,
\]
we have
\[
MJ(L^\top J Z_s \Lambda_s - M^\top J Z_s) = LJ(L^\top J Z_s \Lambda_s - M^\top J Z_s)\Lambda_s.
\]

[Table 5.2: The eigenvalues of X_{η,I}.]
[Fig. 1: Relative residuals of SA, condition numbers of X₁, and numbers of eigenvalues of (M, L) on T that are used in the computation of X (so they are halves of the total numbers).]
[Fig. 2: Relative residuals of QZ and condition numbers of X₁.]
[Fig. 3: Relative residuals of SA, condition numbers of X₁, and numbers of eigenvalues of (M, L) on T.]
