THE MATRIX EQUATION X + A(T)X(-1) A = Q AND ITS APPLICATION IN NANO RESEARCH

(1)

THE MATRIX EQUATION X + ATX−1A = Q AND ITS

APPLICATION IN NANO RESEARCH∗

CHUN-HUA GUO† AND WEN-WEI LIN‡

Abstract. The matrix equation X + ATX−1A = Q has been studied extensively when A and

Q are real square matrices and Q is symmetric positive definite. The equation has positive definite solutions under suitable conditions, and in that case the solution of interest is the maximal positive definite solution. The same matrix equation plays an important role in Green’s function calculations in nano research, but the matrixQ there is usually indefinite (so the matrix equation has no positive definite solutions), and one is interested in the case where the matrix equation has no positive definite solutions even whenQ is positive definite. The solution of interest in this nano application is a special weakly stabilizing complex symmetric solution. In this paper we show how a doubling algorithm can be used to find good approximations to the desired solution efficiently and reliably.

Key words. nonlinear matrix equation, complex symmetric solution, stable solution, ﬁxed-point

iteration, doubling algorithm, Newton’s method, Green’s function

AMS subject classifications. 15A24, 65F30, 65H10 DOI. 10.1137/090758209

1. Introduction. The nonequilibrium Green’s function formalism provides a powerful conceptual and computational framework for treating quantum transport in nanodevices [6]. Typically, the system Hamiltonian H is a bi-infinite or semi-infinite block tridiagonal real symmetric matrix [1, 6, 15, 16, 21]. The semi-infinite case is slightly easier to handle. So we assume H is bi-infinite, and in that case H is usually of the form H = ⎡ ⎣ _HHTL HL,S L,S HS HS,R HS,RT HR ⎤ ⎦ ,

where HS is the Hamiltonian for the scattering region, which is an ns× nssymmetric

matrix; HL is the Hamiltonian for the left lead, given by

HL= ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ . ._. . ._. . ._. _B L AL AT_L BL AL ATL BL ⎤ ⎥ ⎥ ⎥ ⎥ ⎦,

∗_{Received by the editors May 6, 2009; accepted for publication (in revised form) August 30, 2010;}

published electronically October 5, 2010.

http://www.siam.org/journals/sisc/32-5/75820.html

†_{Department of Mathematics and Statistics, University of Regina, Regina, SK S4S 0A2, Canada}

([email protected]). The work of this author was supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada and by the National Center for Theoretical Sciences in Taiwan.

‡_{Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan}

([email protected]). The work of this author was partially supported by the National Science Council and the National Center for Theoretical Sciences in Taiwan.

3020

(2)

where all matrix blocks are nl× nl, AL = 0, and BL is symmetric; and HR is the

Hamiltonian for the right lead, given by

HR= ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ BR AR AT_R BR AR ATR BR . .. . ._. . ._. ⎤ ⎥ ⎥ ⎥ ⎥ ⎦,

where all matrix blocks are nr× nr, AR= 0, and BRis symmetric. The matrix HL,S

represents the coupling between the scattering region and the left lead and is given by HL,S = ⎡ ⎢ ⎢ ⎢ ⎣ .. . 0 0 CL,S ⎤ ⎥ ⎥ ⎥ ⎦,

where CL,S is nl × ns, and the matrix HS,R represents the coupling between the

scattering region and the right lead and is given by

HS,R= [DS,R0 0 . . .] ,

where DS,Ris ns× nr.

The Green’s function G is deﬁned [5, 7] by G = ((E + i0+)I − H)−1= lim

η→0+((E + iη)I − H)

−1_,

whereE is energy, a real number that may be negative, and I is the identity operator.

We will also use Ir (or simply I) to denote the identity matrix of dimension r. Since

H is real symmetric, it is easily seen [5, 7] that G = G1+ iG2, with G1 and G2 real

symmetric (so G is complex symmetric). In practice, one is only interested in GS, the

Green’s function corresponding to the scattering region. It is easily found [5, 15] that GS =(E + i0+_{)I − H}_S− H_L,ST _G_L_H_L,S− H_S,R_G_R_H_S,RT −1

=(E + i0+_{)I − H}_S− C_L,ST _G_L,S_C_L,S− D_S,R_G_S,R_DT_S,R −1_,

where GL = ((E + i0+_{)I − H}_L)−1_{, G}_R = ((E + i0+_{)I − H}_R)−1_{, G}_L,S _{is the n}_l× n_l

matrix in the lower-right corner of GL, and GS,Ris the nr×nrmatrix in the upper-left

corner of GR. Moreover, one is only interested in the values ofE for which G_L,S and

GS,R, respectively, have nonzero imaginary parts [16]. Computing GL,S and GS,Rhas

been a challenging problem for nano scientists.

It is easily seen [15] that GL,S satisﬁes the matrix equation

GL,S = ((E + i0+_{)I − B}_L− AT_L_G_L,S_A_L)−1_,

and GS,Rsatisﬁes the matrix equation

GS,R= ((E + i0+)I − BR− ARGS,RATR)−1.

For each ﬁxedE, we take η > 0 suﬃciently small and compute G_L,S_{(η), the n}_l× n_l

matrix in the lower-right corner of GL(η) = ((E + iη)I − HL)−1. In [1, 15, 21], the

(3)

values of E are between −5 and 5, and the smallest η used is 10−6_{. Now G}_L,S_(η) satisﬁes the matrix equation

(1.1) _{X = ((E + iη)I − B}_L− AT_L_XA_L)−1_,

and GL,S(η) is taken to be an approximation to GL,S. Similarly, GS,Ris approximated

by GS,R(η), which is the nr× nr matrix in the upper-left corner of GR(η) = ((E +

iη)I − HR)−1 and is a particular solution of

(1.2) _{X = ((E + iη)I − B}_R− A_R_XAT_R)−1_.

Since (1.1) and (1.2) are the same type, we only need to study (1.1).

A basic numerical method for ﬁnding GL,S(η) is the ﬁxed-point iteration [21]

(1.3) _X(k+1)= ((E + iη)I − B_L− AT_L_X(k)_A_L)−1

with X(0)= ((E + iη)I − B_L)−1. It has been observed that the iteration converges in

practice and the limit is the required matrix GL,S(η) (rather than a diﬀerent solution

of (1.1)). The convergence of the iteration has been observed to be very slow for η close to 0. Note that we cannot take η = 0 for this iteration, since, otherwise, the

sequence X(k)(even if well-deﬁned) would be real and would not approximate GL,S,

which is to have a nonzero imaginary part. To speed up the convergence of (1.3), the

following strategy is suggested in [21]: once X(k+1) has been computed from X(k),

replace X(k+1) by (X(k+1)+ X(k))/2 and proceed with the iteration. Since GL,S(η)

needs to be computed for many diﬀerent values of E, it is also suggested in [1, 21]

that the GL,S(η) computed at a given energy be used as the initial guess for GL,S(η)

at the next nearby energy point.

Other methods have also been considered. In [15] (1.1) is rewritten as

(1.4) _AT_L_XA_L_{X − ((E + iη)I − B}_L_{)X + I = 0,}

and Newton’s method with exact line searches (as in [14]) is used, possibly preceded

by the ﬁxed-point iteration (1.3). In [15] η = 0+in theory, but η is taken to be 2×10−6

in actual computations. Whether this Newton’s method will always converge to the

solution GL,S(η) is left as an open problem in [15]. Another approach is suggested in

[16]. In that approach one would obtain the equation

(1.5) _AT_L_Y2− ((E + iη)I − B_L_{)Y + A}_L= 0

by postmultiplying (1.4) by AL, where Y = XAL. Equations (1.4) and (1.5) are

equivalent when AL is nonsingular. In [16] η = 0 is assumed. If the right solution Y

of (1.5) is found, then GL,S= Y A−1_L is obtained. To ﬁnd Y the auxiliary matrix

(1.6) 0 _I −(AT L)−1AL (ATL)−1(EI − BL)

is used. Assuming this matrix is diagonalizable, the required solution Y can be found

(see [13]) by selecting nl linearly independent eigenvectors of (1.6). However, there

is no explanation in [16] how these nl eigenvectors can be selected from the 2nl

eigenvectors. Moreover, this approach does not work when ALis singular. Indeed, for

a simple example given in [16], AL is singular but the solution GL,S can be found

analytically.

(4)

In this paper we show how a doubling algorithm can ﬁnd the desired solution

GL,S(η) of (1.1) and the desired solution GS,R(η) of (1.2). Our numerical experiments

indicate the eﬃciency and reliability of the doubling algorithm. The algorithm works very well even when η is taken to be very close to 0.

Our starting point is to rewrite (1.1) as

(1.7) _{X + A}T_X−1_{A = Q,}

where

(1.8) _{A = A}_L_, _{Q = Q}_L_{+ iηI,} _Q_L=EI − B_L_,

and the required solution is X = (GL,S(η))−1. Similarly, we rewrite (1.2) as

(1.9) _{X + AX}−1_AT _{= Q,}

where

(1.10) _{A = A}_R_, _{Q = Q}_R_{+ iηI,} _Q_R=EI − B_R_,

and the required solution is X = (GS,R(η))−1.

We often have AL = AR and BL = BR (in [15], for example). In this case, the

doubling algorithm is able to compute GL,S(η) and GS,R(η) simultaneously.

2. Characterization of GL,S(η) and GS,R(η). The matrices GL,S(η),

GS,R(η) have been uniquely deﬁned through the inverses of semi-inﬁnite block Toeplitz

matrices. They are also particular solutions of the matrix equations (1.1) and (1.2), re-spectively. Those matrix equations may have many other solutions. So we need to give

a characterization for GL,S(η) and GS,R(η), in terms of the matrix equations (rather

than the inﬁnite matrices). We only need to derive results for GL,S(η); analogous

results for GS,R(η) follow readily.

We have rewritten (1.1) as (1.7) by replacing X in (1.1) with X−1. When Q is a

real symmetric positive definite matrix (which is possible here only when η = 0), the matrix equation (1.7) has been studied extensively (see [3, 8, 10, 11, 12, 17, 20, 22]). In all those papers the desired solution is the maximal positive definite solution (when it has at least one positive definite solution). As we will explain later, in the nano

application we are only interested in values of energy E for which (1.7) with Q =

EI − BLhas no positive deﬁnite solutions.

When A is nonsingular, with Y = X−1A, (1.7) is equivalent to the quadratic

matrix equation

ATY2− QY + A = 0

for which we have the associated quadratic eigenvalue problem P (λ)x = 0, x = 0,

where P (λ) = λ2AT− λQ + A.

We will not use this connection to solve (1.7) since the matrix A may be singular in the application we have in mind. However, we will use the quadratic matrix polynomial P (λ) in our discussions even when A is singular.

Theorem 2.1. Let A and Q be as in (1.8) and T be the unit circle. Then the

quadratic P (λ) = λ2AT − λQ + A has no eigenvalues on T for any η = 0. In this

(5)

case it has exactly nl eigenvalues inside T (counting multiplicities) and exactly nl

eigenvalues outsideT (counting multiplicities and including eigenvalues at ∞). Proof. The quadratic P (λ) has eigenvalues on T if and only if det P (λ) = 0 for

some λ ∈ T or, equivalently, det(λAT + λ−1A − QL− iηI) = 0 for some λ ∈ T.

However, for any λ ∈ T the matrix λAT + λ−1A − QL is Hermitian and thus has real

eigenvalues ν1, ν2, . . . , νnl. The eigenvalues of λAT + λ−1A − QL− iηI are then the

nonzero numbers ν1− ηi, ν2− ηi, . . ., νnl− ηi. So det(λAT + λ−1A − QL− iηI) = 0

for any λ ∈ T. Thus the quadratic P (λ) has no eigenvalues on T. The location of the eigenvalues must be as stated since P (λ) is a T -palindromic matrix polynomial (see [18]). More precisely, the number ξ is an eigenvalue of the quadratic P (λ) if and only

if ξ−1is so and they have the same partial multiplicities (see [18, Theorem 2.2]).

Let X be any complex symmetric solution of (1.7) and ρ(·) denote the

spec-tral radius. Then ρ(X−1A) = ρ((X−1A)T) = ρ(ATX−1). We adopt the following

deﬁnition.

Definition 2.2.A complex symmetric solution X of (1.7) is said to be stabilizing

if ρ(X−1A) < 1 and weakly stabilizing if ρ(X−1A) = 1. Let (2.1) _M₀= A 0 Q −I , L0= 0 _I AT 0 .

Then the pencil M0 − λL0 (also denoted by (M0, L0)) is a linearization of the

T -palindromic polynomial λ2AT − λQ + A, and X is a solution of (1.7) if and only if

(2.2) _M₀ I X = L0 I X X−1A.

It follows that each complex symmetric solution of (1.7) is determined by a suitable

de-ﬂating subspace of the pencil M0−λL0. The required matrix GL,S= lim_η→0+G_L,S(η)

is the inverse of a special complex symmetric solution of (1.7) with η = 0, and we

are only interested in the values ofE for which G_L,S has a nonzero imaginary part.

We will see later in this section that for those values ofE the pencil M₀− λL₀ (or,

equivalently, the quadratic λ2AT−λQ+A) must have eigenvalues on T and that GL,S

is the inverse of one of the weakly stable solutions of (1.7) with η = 0. For η > 0,

the pencil M0− λL0 has no eigenvalues on T by Theorem 2.1. This is a necessary

condition for the existence of a (necessarily unique) stabilizing solution of (1.7). By using a deep result on linear operators, we will show that (1.7) with η > 0 always

has a stabilizing solution and that its inverse is precisely the matrix GL,S(η) we are

looking for.

Recall that GL,S(η) is the nl× nlmatrix in the lower-right corner of ((E + iη)I −

HL)−1. Note that (E + iη)I − H_L= ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ . ._. . ._. . ._. _Q _−A −AT _Q _−A −AT _Q ⎤ ⎥ ⎥ ⎥ ⎥ ⎦.

(6)

It is easy to see that GL,S(η) is also the nl× nlmatrix in the upper-left corner of T−1 with (2.3) _{T =} ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ Q −AT −A Q −AT −A Q . .. . ._. . ._. ⎤ ⎥ ⎥ ⎥ ⎥ ⎦.

This change is largely notational and is not needed if we discuss GS,R(η). The symbol

for the block Toeplitz matrix T is φ(λ) = −λA + Q − λ−1AT.

Let 2 be the usual Hilbert space of all square summable sequences of complex

numbers, and let m₂ be the Cartesian product of m copies of 2. The inﬁnite matrix

(2.3) is then seen to be inB(nl

2 ), the set of all bounded linear operators on n2l. It is

clear that T = (ηi)I − W with W a self-adjoint operator in B(nl

2 ). It is well known

that the spectrum of a self-adjoint operator is real. So, for each η > 0, T has an

inverse inB(nl

2 ) since ηi is not in the spectrum of W . Thus, by appealing to a deep

result on linear operators (see [9, Chapter XXIV, Theorem 4.1] and [19]), we know that φ(λ) has a factorization

(2.4) _{φ(λ) = (I − λ}−1_{L)D(I − λU )}

with D invertible, ρ(L) < 1, and ρ(U ) < 1. From (2.4) we see that A = DU, AT = LD, Q = D + LDU.

Thus D +ATD−1A = Q and ρ(D−1A) < 1. In other words, D is the unique stabilizing

solution of (1.7), which must be complex symmetric. By [9, Chapter XXIV, Theorem

4.1], the nl× nlmatrix in the upper-left corner of T−1 is precisely D−1. We thus have

the following characterization of GL,S(η) and GS,R(η).

Theorem 2.3.The matrix G_L,S(η) is the inverse of the unique stabilizing solution

of (1.7), and the matrix GS,R(η) is the inverse of the unique stabilizing solution of

(1.9).

We will show in the next section that a doubling algorithm can compute GL,S(η)

and GS,R(η) eﬃciently, even if η is very close to 0. In the remainder of this section

we answer the following question, For what values ofE will G_L,S = lim_η→0+G_L,S(η)

have a nonzero imaginary part?

In the limiting case η = 0, the matrix Q in (1.7) is real symmetric. For real symmetric (and more generally Hermitian) matrices X and Y , we write X ≥ Y (X > Y ) if X − Y is positive semidefinite (definite). When Q is real symmetric positive definite, necessary and sufficient conditions for the existence of a positive definite solution of (1.7) have been given in [8].

Theorem 2.4. _{Let Q be positive deﬁnite. Then (1.7) has a positive deﬁnite}

so-lution if and only if the rational matrix function ψ(λ) = Q + λA + λ−1AT is regular (i.e., the determinant of ψ(λ) is not identically zero) and ψ(λ) ≥ 0 for all λ on T. In this case (1.7) has a maximal positive deﬁnite solution X+ (i.e., X+≥ X for any

other positive deﬁnite solution).

The next result follows quite quickly from Theorem 2.4.

Theorem 2.5.Let Q be real symmetric, and for each λ on T, let the eigenvalues

of the Hermitian matrix Q + λA + λ−1AT be μ1(λ) ≤ μ2(λ) ≤ · · · ≤ μn(λ). For

i = 1, 2, . . . , n, let ai = min|λ|=1μi(λ) and bi = max|λ|=1μi(λ), and let Δi = [ai, bi].

(7)

Then the matrix equation

(2.5) _{X + A}T_X−1_{A = sI − Q}

has a positive deﬁnite solution for s > bn, has a negative deﬁnite solution for s < a1,

and has no positive or negative deﬁnite solutions for a1 < s < bn. The quadratic

Ps(λ) = λ2AT − λ(sI − Q) + A has eigenvalues on T if and only if s ∈ ∪1≤i≤nΔi.

Proof. For each i = 1, . . . , n, μi(λ) is a continuous function on T. So μi(T) = Δi.

Since μ1(λ) ≤ μ2(λ) ≤ · · · ≤ μn(λ), we have a1 ≤ a2 ≤ · · · ≤ an and b1 ≤ b2 ≤

· · · ≤ bn. Let ψs(λ) = sI − Q − λA − λ−1AT. If s > bn, then ψs(λ) ≥ sI − bnI > 0 for all λ ∈ T. Note that in this case we must have sI − Q > 0. In fact, we have sI − Q − (A + AT) > 0 for λ = 1 and sI − Q + (A + AT) > 0 for λ = −1. Adding these two we get 2(sI − Q) > 0. Now by Theorem 2.4, (2.5) has a positive deﬁnite solution.

If s < a1, then _ψ_s_{(λ) = Q − sI + λA + λ}−1_AT ≥ a₁_{I − sI > 0. So Q − sI > 0, and by}

Theorem 2.4, the equation X + ATX−1A = Q − sI has a positive deﬁnite solution X∗.

So−X_∗is a negative deﬁnite solution of (2.5). We now show that (2.5) has no positive

deﬁnite solutions for s < bn. In fact, the existence would imply ψs(λ) ≥ 0 for all λ ∈ T

and then ψbn(λ) > 0 for all λ ∈ T. This is impossible since ψbn(λ) is singular for some

λ ∈ T. Similarly, (2.5) has no negative deﬁnite solutions for s > a1. The quadratic

Ps(λ) = λ2AT− λ(sI − Q) + A has eigenvalues on T if and only if det Ps(λ) = 0 for

some λ ∈ T or, equivalently, det ψs(λ) = (s − μ1(λ))(s − μ2(λ)) · · · (s − μn(λ)) = 0 for

some λ ∈ T; the latter is equivalent to s ∈ ∪1≤i≤nΔi.

We use one simple example to illustrate the results in Theorem 2.5. Example 2.1. Let n = 2, and

A = 0 0 1 0 , Q = t + 1 t t t + 1 , t ≥ 0.

This example is a special case of an example in [16]. For λ = eiθ, we ﬁnd

μ1(λ) = t + 1 − t2+ 1 + 2t cos θ, μ2(λ) = t + 1 + t2+ 1 + 2t cos θ. We then ﬁnd Δ₁= [0, 2t] , 0 ≤ t ≤ 1, [0, 2] , t ≥ 1, Δ2= [2, 2(t + 1)] , 0≤ t ≤ 1, [2t, 2(t + 1)] , t ≥ 1.

By Theorem 2.5 or direct verification, (2.5) has a positive definite solution for s > 2(t + 1), has a negative definite solution for s < 0, and has no positive or negative

deﬁnite solutions for 0 < s < 2(t + 1). The quadratic Ps(λ) = λ2AT − λ(sI − Q) + A

has eigenvalues on the unit circle if and only if s ∈ Δ1∪ Δ2. In particular, when

t = 0, the quadratic Ps(λ) has eigenvalues on the unit circle only for s = 0, 2; when t = 1, the quadratic Ps(λ) has eigenvalues on the unit circle for all s ∈ [0, 4]. For this

example, the value t = 1 is the only one for which Δ1∪ Δ2is connected.

We can now give the following result about when GL,S has a nonzero imaginary

part. A similar result can be given for GS,R.

Theorem 2.6.For λ ∈ T, let the eigenvalues of ψ_L(λ) = B_L+ λA_L+ λ−1AT

L be

μL,1(λ) ≤ · · · ≤ μL,n(λ), where n = nl. Let

Δ_L,i=

min

|λ|=1μL,i(λ), max|λ|=1μL,i(λ)

(8)

and Δ_L = ∪n_i=1Δ_L,i_{. Then G}_L,S is a real symmetric matrix if E /∈ Δ_L, and it is a complex symmetric matrix if E ∈ Δ_L. When E ∈ Δ_L_{, the quadratic P}_L_{(λ) =} λ2AT_L− λ(EI − BL) + AL has eigenvalues on T. In the generic case that all these

eigenvalues on T are simple and nonreal, G_L,S has a nonzero imaginary part. Proof. By taking Q = BL, A = AL, and s = E in Theorem 2.5, we know that

PL(λ) has eigenvalues on T if and only if E ∈ ΔL. The matrix GL,S is known to be

complex symmetric. IfE /∈ Δ_L_{, then we know by Theorem 2.3 that G}_L,S is determined

by the deﬂating subspace of M0− λL0 corresponding to the n eigenvalues inside T,

where M0 and L0 are given in (2.1) with η = 0. So GL,S is a real matrix since AL

and BL are real. We now assume that E ∈ Δ_L and that the eigenvalues on T are

simple eigenvalues αk± βki (k = 1, . . . , m) with βk> 0. In this case, for each η > 0,

PL,η(λ) = λ2ATL− λ((E + iη)I − BL) + AL has no eigenvalues on T (see Theorem

2.1). After the introduction of iη, one of the pair αk± βki is moved to the inside of

T (the choice is independent of the size of η > 0 by the continuity of eigenvalues),

and the other is moved to the outside ofT (by the property of T -palindromic matrix

polynomials). Therefore, GL,S is determined by the deﬂating subspace of M0− λL0

(with η = 0) corresponding to the n − m eigenvalues inside T together with the m

eigenvalues onT that would be moved to the inside of T with the introduction of iη.

These n eigenvalues of M0− λL0 must be those of GL,SA by (2.2). In this case GL,S

must have a nonzero imaginary part since, otherwise, the eigenvalues of GL,SA would

appear in conjugate pairs.

We remark that the numerical determination of Δ_L in the above theorem may

require the computation of the eigenvalues of ψL(λ) for many ﬁxed values of λ on T.

However, for each ﬁxed λ on T, ψL(λ) is just a Hermitian matrix. The computation

of all its eigenvalues is a relatively easy task. We don’t have to do the computation

for too many points of λ on T. A rough numerical approximation of ΔL would be

suﬃcient. In this way, we can avoid the much more complicated computation of GL,S

for many energy valuesE of no practical interest. Moreover, in many cases, Δ_L,iand

Δ_L,i+1 _{overlap for i = 1, . . . , n − 1, so an accurate approximation of Δ}_L requires only

the computation of the extreme eigenvalues μL,1(λ) and μL,n(λ) for many λ values

onT.

3. Computation of GL,S(η) and GS,R(η). We know that GL,S(η) = Xs−1,

where η > 0 and Xs is the unique stabilizing solution of (1.7). A doubling algorithm

has been studied in [17] for (1.7) with a real symmetric positive deﬁnite Q. In our case, Q is complex symmetric. However, the more general presentation in [3, 4] can be used directly.

Let M0 and L0be as given in (2.1). It is easy to verify that the pencil M0− λL0

is T -symplectic; i.e., M0JM0T = L0JLT0 for J = 0 _I −I 0 .

We can deﬁne the sequences{M_k} and {L_k}, where

(3.1) _M_k = Ak 0 Qk −I , Lk = −Pk I ATk 0 , by the following doubling algorithm [3, 4] if no breakdown occurs.

(9)

Algorithm 3.1. Let A₀= A, Q₀= Q, P₀= 0. For k = 0, 1, . . ., compute Ak+1= Ak(Qk− Pk)−1Ak,

Qk+1= Qk− ATk(Qk− Pk)−1Ak,

Pk+1= Pk+ Ak(Qk− Pk)−1_AT_k_.

We will show shortly that this algorithm will not break down, and Qk converges

to Xs much more quickly than the sequence {X_k} generated by the following basic

ﬁxed-point iteration. Algorithm 3.2.

X0= Q,

Xk+1= Q − ATXk−1A, k = 0, 1, . . . .

We will also show that the sequence{X_k} is well-deﬁned and converges to X_s.

Note that the sequence {X(k)} from (1.3) is then well-deﬁned and given by X(k) =

X_k−1. So we indeed have lim X(k)= GL,S(η), as observed in numerical experiments.

Theorem 3.1. _{Let A and Q be as in (1.8) with η > 0. Let X}_s _{be the stabilizing}

solution of (1.7) and _X_s be the stabilizing solution of the dual equation X + AX−1AT = Q.

(The existence of _X_s _{is also guaranteed by the argument leading to Theorem 2.3.)}

Then the following are true.

(a) The sequences {A_k}, {Q_k}, {P_k} in Algorithm 3.1 are well-deﬁned, and Q_k

and Pk are complex symmetric.

(b) The sequence{X_k} in Algorithm 3.2 is well-deﬁned, and X_k is complex

sym-metric.

(c) Qk= X2k₋₁ for each k ≥ 0.

(d) Qk converges to Xsquadratically, Ak converges to 0 quadratically, and Q−Pk

converges to _X_s quadratically with lim sup

k→∞

2k _Q

k− Xs ≤ (ρ(Xs−1A))2, lim sup k→∞ 2k _A k ≤ ρ(Xs−1A), lim sup k→∞ 2k_{Q − P} k− Xs ≤ (ρ(Xs−1A))2,

where · is any matrix norm.

(e) Xk converges to Xs linearly with

lim sup

k→∞

k

Xk− Xs ≤ (ρ(Xs−1A))2.

Proof. Let Tk be the block k × k matrix given by

Tk = ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ Q −AT −A Q . .. . ._. . ._. _−AT −A Q ⎤ ⎥ ⎥ ⎥ ⎥ ⎦.

(10)

For each k ≥ 1, we can write Tk= Vk+ iηI with Vk real symmetric. So Tk has no zero

eigenvalues and is thus invertible. The sequence{X_k} is obtained by block Gaussian

elimination performed on the matrix (2.3). In fact, X0= Q is the (1, 1) block in (2.3);

when the (1, 1) block is used to eliminate the (2, 1) block, the new (2, 2) block is X1;

when the new (2, 2) block is used to eliminate the (3, 2) block, the new (3, 3) block is

X2; and so on. Since Tk is invertible for each k ≥ 1, Xk is well-deﬁned and invertible

for each k ≥ 0. It is easily seen by induction that Xk is complex symmetric since Q

is complex symmetric. So (b) is proved.

Let Wk= Qk− Pk in Algorithm 3.1. Then the sequence {W_k} satisﬁes

Wk+1= Wk− ATkWk−1Ak− AkWk−1ATk, W0= Q.

It follows from [2, Theorem 13] that Wk is nonsingular for each k ≥ 0. The sequences

{Ak}, {Qk}, {Pk} are then well-deﬁned. Again, Qk and Pk are complex symmetric

since Q is complex symmetric. This proves (a).

The proof of (c) is the same as in the proof of [11, Proposition 5], although Q is complex symmetric here.

To prove (d), we start with M0 I Xs = L0 I Xs Xs−1A,

a special case of (2.2). It follows from [3] that for each k ≥ 0,

(3.2) _M_k I Xs = Lk I Xs (Xs−1A)2 k . Substituting (3.1) into (3.2) yields

(3.3) _A_k _{= (X}_s− P_k_)(X_s−1_A)2k_, _Q_k− X_s_{= A}T_k_(X_s−1_A)2k_. Similarly, we have M0 I Xs = _L₀ I Xs Xs−1AT, where M0= AT 0 Q −I , L0= 0 _I A 0 .

The pencil _M₀−λL₀_{is a linearization of λ}2_A−λQ+AT, which has the same eigenvalues

as λ2AT− λQ + A. It follows that Xs−1AT and Xs−1A have the same eigenvalues, and

in particular, ρ( Xs−1AT) = ρ(Xs−1A). For each k ≥ 0, we now have

Mk I Xs = _L_k I Xs ( _X_s−1_AT)2k_,

where _M_k and _L_k _{are given by (3.1) and Algorithm 3.1 when A}₀ _{= A in Algorithm}

3.1 is replaced by A0= AT. It is easy to see that

Mk= AT_k 0 Qk −I , Lk= − Pk I Ak 0

(11)

with (3.4) _P_k _{= Q − Q}_k_, _Q_k _{= Q − P}_k_. So we now have (3.5) _AT_k = ( _X_s− _P_k)( _X_s−1_AT)2k_, _Q_k− _X_s_{= A}_k( _X_s−1_AT)2k_. By (3.3), (3.5), and (3.4), we have Qk− Xs= ATk(Xs−1A)2 k = ( _X_s− _P_k)( _X_s−1_AT)2k_(X_s−1_A)2k = (Qk− Xs+ (Xs+ Xs− Q))( Xs−1AT)2 k (Xs−1A)2 k . Thus (3.6) (Qk− Xs)(I − ( Xs−1AT)2 k (Xs−1A)2 k ) = (Xs+ Xs− Q)( Xs−1AT)2 k (Xs−1A)2 k . It follows that lim sup k→∞ 2k _Q k− Xs ≤ ρ( Xs−1AT)ρ(Xs−1A) = (ρ(Xs−1A))2< 1.

So Qk converges to Xs quadratically. Then we know { _P_k} is bounded, and we have

by the ﬁrst equation in (3.5) that lim sup

k→∞

2k _A

k ≤ ρ(Xs−1A) < 1.

So Ak converges to 0 quadratically. By the second equations in (3.5) and (3.4), we

have

lim sup

k→∞

2k

(Q − Pk)− Xs ≤ (ρ(Xs−1A))2< 1.

So Q − Pk converges to Xs quadratically. This completes the proof of (d).

Since Qk converges to Xsand X2k₋₁ = Q_k, we know that a subsequence of{X_k}

converges to Xs. Since ρ(Xs−1A) < 1, Xsis an attractive ﬁxed point of the mapping

f deﬁned by f (X) = Q − ATX−1A. It follows that the sequence {Xk} also converges

to Xs, and we have lim sup k→∞ k Xk− Xs ≤ (ρ(Xs−1A))2 as in [12]. So (e) is proved.

Algorithm 3.1 is said to be structure-preserving since for each k ≥ 0, Mk and Lk

have the structures given in (3.1) and the pencil Mk− λLk is T -symplectic.

When AL = AR and BL = BR, the dual equation of (1.7) with (1.8) is

pre-cisely (1.9) with (1.10). In this case, Algorithm 3.1 can ﬁnd GL,S(η) and GS,R(η)

simultaneously.

To get good approximations to GL,S, we need to take η > 0 to be suﬃciently

small. However, we cannot apply Algorithm 3.1 with η = 0 directly. If we take η = 0,

then the sequence{Q_k} is real and will not approximate X_s, which is to be complex

symmetric.

(12)

We are only interested in the values ofE for which lim_η→0+ρ(X_s−1A) = 1. Since we

need to take η suﬃciently small, we always have ρ(Xs−1A) ≈ 1, and the convergence of

Algorithm 3.2 will be very slow in general. The strategy proposed in [21] for improving

the convergence of iteration (1.3) generates a new sequence X(k) as follows.

Algorithm 3.3. Take X(0)= Q−1. For k = 0, 1, . . . , compute

X(k+1)= (Q − ATX(k)A)−1, X(k+1)← (X(k)+ X(k+1))/2.

We now adapt this strategy for Algorithm 3.2 and get the following modiﬁed ﬁxed-point method.

Algorithm 3.4. _{Take X}₀_{= Q. For k = 0, 1, . . . , compute}

Xk+1= Q − ATXk−1A,

Xk+1← (Xk+ Xk+1)/2.

We remark that for {X(k)} from Algorithm 3.3 and {X_k} from Algorithm 3.4,

we no longer have X(k) = X_k−1 in general. To keep this relation, one would have to

replace the last line of Algorithm 3.4 by

Xk+1←(X_k−1+ X_k+1−1 )/2 −1.

This would make the algorithm slightly more expensive. Numerical experiments show that this change does not aﬀect the performance of the algorithm in any signiﬁcant

way. So we prefer to use the update of Xk+1 as given in Algorithm 3.4. Numerical

experiments also show that the convergence of Algorithm 3.4 is often much faster than that of Algorithm 3.2. But a rigorous convergence analysis remains an open problem. In [16], η = 0 is assumed, and a diagonalization procedure is used on the

auxil-iary matrix (1.6) when A = AL is nonsingular. However, that approach will run into

diﬃculties even when A is perfectly conditioned and the auxiliary matrix is diagonal-izable. Example 3.1. Let AL=−I₃_, _B_L= ⎡ ⎣ ₋₁4 −14 −10 0 −1 4 ⎤ ⎦ . It is easy to determine by Theorem 2.6 that

Δ_L,1_{= [0.5858, 4.5858],} Δ_L,2_{= [2.0000, 6.0000],} Δ_L,3_{= [3.4142, 7.4142].}

So Δ_L_{= [0.5858, 7.4142]. For η = 0 and E = 4, the matrix in (1.6) has all 6 eigenvalues}

onT as follows: − √ 2 2 ± √ 2 2 i, √ 2 2 ± √ 2 2 i, ±i.

The diﬃculty with the approach in [16] is that we do not know which 3 eigenvalues

should be used to get GL,S. There are 20 diﬀerent choices. By the proof of Theorem

2.6, we now know that we can pick only 1 eigenvalue from each conjugate pair. This reduces the number of choices to 6. Only by changing η from 0 to a small positive number do we ﬁnd that the 3 eigenvalues with negative imaginary parts are perturbed

(13)

to the inside ofT. So these 3 eigenvalues are to be used to ﬁnd G_L,S. We could have

taken a small η > 0, say η = 10−10, right in the beginning and used Algorithm 3.1 to

compute GL,S(η) as a good approximation to GL,S.

In general, the palindromic matrix polynomial P (λ) = λ2AT − λQ + A, where

A and Q are given in (1.8) with η = 0, can have 2k eigenvalues on T with k being

any integer from 0 to nl. For the above example, we ﬁnd that P (λ) has 0 eigenvalues

onT for E ∈ (−∞, 0.5857] ∪ [7.4143, ∞), has 2 eigenvalues on T for E ∈ [0.5858, 2) ∪

(6, 7.4142], has 4 eigenvalues on T for E ∈ [2, 3.4142]∪[4.5858, 6], and has 6 eigenvalues

onT for E ∈ [3.4143, 4.5857]. For a large problem, the computed eigenvalues of P (λ)

(or the matrix (1.6)) may contain a number of conjugate pairs nearT, and the number

of eigenvalues insideT may be diﬀerent from the number of eigenvalues outside T. In

that case, it is hard to tell which eigenvalues should be used to compute GL,S. For

this reason, the eigenvalue approach will not be considered further in this paper. Newton’s method has been studied in [12] for (1.7) with a Hermitian positive deﬁnite Q. Since Q is non-Hermitian in our case, there is no guarantee of convergence

for Newton’s method with X0= Q. With a given X0, Newton’s iteration for (1.7) is

easily found to be

(3.7) _X_k− AT_X_k−1−1 _X_k_X_k−1−1 _{A = Q − 2A}T_X_k−1−1 _A.

If Xk−1 is a nonsingular complex symmetric matrix with ρ(Xk−1−1 A) < 1, then (3.7)

has a unique solution Xk, which must be complex symmetric. Since ρ(Xs−1A) < 1 for

η > 0, the convergence of Newton’s method is guaranteed if X0is complex symmetric

and suﬃciently close to Xs.

Algorithm 3.5 (Newton’s method for (1.7)). Take X₀ to be complex symmetric

and suﬃciently close to Xs. For k = 1, 2, . . . , compute Lk= Xk−1−1 A and solve

(3.8) _X_k− LT_k_X_k_L_k_{= Q − 2L}T_k_A.

Note that the Stein equation (3.8) is uniquely solvable when ρ(Lk) < 1.

If we are to ﬁnd Xs (and then GL,S(η) = Xs−1) only for one ﬁxed value of E,

there is little hope for Algorithm 3.5 to beat Algorithm 3.1. In practice, we need to

determine GL,S (through Xs and GL,S(η) for a small η > 0) for a range of energy

values [21]. So for Algorithm 3.5, the Xscomputed at a given energy can be used as

an initial guess for Xsat the next nearby energy point. However, there is some danger

associated with this practice. If η is very small, then the convergence of Algorithm

3.5 is guaranteed only when Xs for the previous energy is very close to the Xs for

the current energy. This means that the stepsize for the energyE must be very small.

But it is hard to tell how small the stepsize should be for a given η > 0. On the other hand, Algorithm 3.1 is not a correction method, and thus we cannot use the

Xs computed at a given energy to compute Xsat the next nearby energy point. But

the convergence of Qk to Xs is fast (see Theorem 3.1 and (3.6)) even though we use

Q0= Q for each energy E.

4. Numerical results. In this section we present some numerical results to illustrate the convergence behavior of the algorithms for computing the stabilizing

so-lution Xsof (1.7). We use DA, M FPM, and NM to denote the doubling algorithm

(Al-gorithm 3.1), the modiﬁed ﬁxed-point method (Al(Al-gorithm 3.4), and Newton’s method (Algorithm 3.5), respectively. All computations are performed in MATLAB R2008b

using IEEE double-precision ﬂoating-point arithmetic (eps ≈ 2.2 × 10−16) on the

Linux system.

(14)

Suppose one complex ﬂop is equivalent to four real ﬂops and n is the dimension

of the matrices in (1.7). Then for each iteration, DA requires about (104/3)n3 real

ﬂops, M FPM requires about (38_/3)n3 _{real ﬂops, and NM requires about (398/3)n}3

real ﬂops. We also recall that DA can compute the stabilizing solutions of (1.7) and (1.9) simultaneously if the matrices A and Q in these two equations are the same two matrices. This makes the comparison more favorable for DA.

To measure the accuracy of a computed stabilizing solution X to (1.7), we use the relative residual (r res)

_{X + A}T_X−1_{A − Q}

X + A2_X−1_{+ Q},

where · is the spectral norm.

Example 4.1. We randomly generate a symmetric matrix BL and an arbitrary

matrix AL of dimension 50. By Theorem 2.6, we ﬁnd

Δ_L= [−23.03, −19.94] ∪ [−19.84, 15.83] ∪ [16.12, 17.78] ∪ [18.26, 21.28].

We divide the interval [−23.03, 21.28] (the smallest interval containing Δ_L_{) into P}

subintervals using P + 1 equally spaced nodes Ei=−23.03 + 44.31(i/P ), i = 0, . . . , P .

We now choose P = 1000 and take η = 10−10in (1.8). Let dibe the distance between

T and the set of eigenvalues of the pencil (M0, L0) in (2.1) withE = Ei. We ﬁnd that

di< 10−8 wheneverEi∈ ΔL.

We run DA for eachE_i as well as for a fewE values outside [−23.03, 21.28]. The

algorithm is stopped when Q_k+1− Q_k < 10−8_{, and X = Q}_k+1 is taken to be a

good approximation to the stabilizing solution of (1.7). We also compute the relative residual for X as another check of accuracy. In Figure 4.1, we plot the relative

resid-uals, the number of iterations, and the distance betweenT and the set of eigenvalues

of the pencil (M0, L0). We see that DA converges to the stabilizing solution of (1.7)

in about 40 iterations for E ∈ Δ_L, and that the convergence is very fast for otherE

values. We also see that r res _{< 10}−10 for almost all E values. Generally speaking,

there is no need to look for higher accuracy since we have taken η = 10−10 rather

than η = 0+.

Note that DA computes the stabilizing solution for each E value without using

the stabilizing solution already computed for a nearby E value. For this example,

we have already found by DA the (approximate) stabilizing solution X_DA(i) for each

E = Ei, i = 0, . . . , P . We now examine whether it is possible to use NM to ﬁnd all

those stabilizing solutions more efficiently, with some help from DA. Let q ≥ 2 be a factor of P and P = mq. Suppose we use DA to find the stabilizing solutions only for E = Erq, r = 0, . . . , m − 1. Then we may find the stabilizing solutions for E = Erq+i

(i = 1, . . . , qr, with qr = q − 1 for r < m − 1 and qr = q for r = m − 1) as follows.

Take X0= XDA(rq), apply NM untilXk+1− Xk < 10−8 or k + 1 = 30, and take Xk+1

to be X_NM(rq+1). To get X_NM(rq+i) for i = 2, . . . , qr, we apply NM in the same way with

X0 = XNM(rq+i−1). For r = 0, . . . , m − 1 and i = 1, . . . , qr, we say NM is successful if

X_NM(rq+i)−X_DA(rq+i) < 10−6_{. Suppose NM is successful S times; we deﬁne the eﬃciency}

(or success rate) of NM by

(4.1) Eﬀ = S

m(q − 1) + 1.

For P = 1000, we ﬁnd that NM converges (within 30 iterations) most of the time, with average number of iterations at about 10. However, it often converges to a wrong

(15)

−20 −15 −10 −5 0 5 10 15 20 0 0.5 E Distance −20 −15 −10 −5 0 5 10 15 20 10−15 10−10 Relative Residuals −20 −15 −10 −5 0 5 10 15 20 0 10 20 30 40 Iterations of DA

Fig. 4.1. Relative residuals, the number of iterations for convergence of DA, and the distance betweenT and the spectrum of (M0, L0);P = 1000.

solution. In practice, X_DA(rq+i) is not computed if we choose to use NM to get X_NM(rq+i).

But we could still check whether X_NM(rq+i)is approximating the stabilizing solution by

computing the eigenvalues of (X_NM(rq+i))−1_{A. The eﬃciency of NM is shown in Table 4.1}

for diﬀerent values of q. Note that the success rate of NM is only 58% even for q = 2. Besides, the computational work for 10 NM iterations is not much less than that for 40 DA iterations. Therefore, we would prefer to use DA to compute the stabilizing

solution for eachE value. However, NM may be used to improve the accuracy of the

solution obtained by DA if there is a need to do so. For example, whenE = 5.64, we

ﬁnd that r res = 2× 10−9 after 40 DA iterations, and starting with this approximate

solution, we get r res = 3× 10−13 after 1 NM iteration.

Table 4.1

Eﬃciency of NM forP = 1000.

q 40 20 10 5 2

Eﬀ 15% 27% 33% 48% 58%

Example 4.2. We consider a semi-inﬁnite Hamiltonian operator for a heterostruc-tured semiconductor of the form

(4.2) _{H(ψ,
x) = −∇} 2ε( x)∇ψ + V ( x)ψ, x ≡ x1 x2 ∈ Ω,

(16)

where Ω≡ Ω₁∪ Ω₂ with

Ω₁= ([−9, −1] × (−∞, 0]) ∪ ([1, 9] × (−∞, 0]) , Ω₂= [−1, 1] × (−∞, 0],

is the reduced Planck constant, ψ is the associated wave function, ε( x) is the electron eﬀective mass with

ε( x) =

ε1, x ∈ Ω1,

ε2, x ∈ Ω2,

and V ( x) = ω1x21is the potential energy.

Let Trbe the tridiagonal matrix of dimension r with 4 on the main diagonal and

−1 on the two adjacent diagonals. We use the classical 5-point central ﬁnite diﬀerence method to discretize the Hamiltonian operator (4.2) on uniform grid points in Ω with

mesh size h. Then the corresponding matrices BLand ALin (1.8) are of the following

forms: BL= δ1Tl⊕ (2(δ1+ δ2))⊕ δ2Tm⊕ (2(δ1+ δ2))⊕ δ1Tl − δ1el+1eTl + eleTl+1 − δ2el+2eTl+1+ el+1eTl+2 − δ2en+1eTn+ eneTn+1 − δ1en+2eTn+1+ en+1eTn+2 + ω1h2diag(1− c)2, (2 − c)2, . . . , (N − c)2 and AL =− δ1Il⊕ δ1+ δ2 2 ⊕ δ2Im⊕ δ1+ δ2 2 ⊕ δ1Il ,

where δi=/2h2εi (i = 1, 2), ej denotes the jth vector of the identity matrix, l and

m are the numbers of grid points on the x1-axis in (−9, −1) and (−1, 1), respectively,

n = l + m + 1, N = 2l + m + 2, c = (N + 1)/2, and ⊕ denotes the direct sum of two matrices.

In our tests we take l = 39, m = 9, δ1 = 1, δ2 = 0.1, and ω1 = 5× 10−4.

By Theorem 2.6, we ﬁnd Δ_L _{= [0.00386, 8.0103]. We divide Δ}_L _{into P subintervals}

using P + 1 equally spaced nodes Ei, i = 0, . . . , P . We now choose P = 1000 and take

η = 10−6 in (1.8). Let di be the distance betweenT and the set of eigenvalues of the

pencil (M0, L0) in (2.1) withE = Ei. We ﬁnd that di< 10−5 wheneverEi ∈ ΔL.

We run DA for eachE_ias well as for a fewE values outside Δ_L. The algorithm is

stopped whenQ_k+1− Q_k < 10−8_{, and X = Q}_k+1is taken to be a good

approxima-tion to the stabilizing soluapproxima-tion of (1.7). In Figure 4.2, we plot the relative residuals and the number of iterations. We see that DA converges to the stabilizing solution of

(1.7) in about 26 iterations forE ∈ Δ_L_{. We also see that r res < 10}−9 for almost all

E values.

In this example, we have increased η from 10−10to 10−6, and as a result the usual

number of DA iterations is reduced from 40 to 26. We have also tried η = 10−10 for

this example and found that DA still works very well with the number of iterations

around 40, as in Example 4.1. In the nano literature, the smallest η used is 10−6since

no powerful methods were available at that time. With DA studied in this paper, we

can also take η to be smaller than 10−10 if there is a need to do so.

(17)

0 1 2 3 4 5 6 7 8 0 10 20 30 40 50 E = energy (η = 10−6₎ Iterations of DA 0 1 2 3 4 5 6 7 8 10−16 10−14 10−12 10−10 10−8 Relative Residuals

Fig. 4.2_{. Relative residuals and the number of iterations for convergence of DA;}P = 1000.

Suppose we use DA to ﬁnd the stabilizing solution only forE = E₀ and use NM

(in the same way as in Example 4.1) to ﬁnd the stabilizing solutions for E = E_i,

i = 1, . . . , P . In the notation of Example 4.1, we have q = P , m = 1, and Eff = S/P . We find that Eff = 46.3% and 46.5% for P = 1000 and 2000, respectively. To have Eff = 100%, one would have to take an extremely large P , which is not practical.

Example 4.3. We consider Example 2.1 with t = 1. The matrices A and Q in

(1.8) are now given by A = [0₁ 0₀] and Q = (E + iη)I2− [21 12]. We take η = 10−10. For

0≤ E ≤ 4, the distance between T and the set of eigenvalues of the pencil (M₀_{, L}₀)

in (2.1) is less than 10−8.

We run DA as before. In Figure 4.3 we plot the relative residuals and the number of iterations for convergence of DA. In most cases, 37 iterations are needed. We know from Theorem 3.1(c) that in exact arithmetic the approximate solution obtained after

37 DA iterations is the same as the approximate solution obtained after 237−1 (which

is over 137 billion) iterations of the basic ﬁxed-point method (Algorithm 3.2). So we would not try to use Algorithm 3.2 even for this small problem. But M FPM turns

out to be quite useful. We run M FPM for E = E_i _{= 0.004i, i = 0, 1, . . . , 1000.}

The algorithm is stopped when r res < 10−10 or the number of iterations exceeds

10000. Figure 4.4 plots the relative residuals and the number of iterations. More

speciﬁcally, we ﬁnd that M FPM converges in 37–100 iterations for 536 values of E_i

with i ∈ {37–301, 303–305, 695–697, 699–963}, converges in 0 iterations for E500= 2,

and does not converge in 100 iterations for the other 464 values ofE_i. Note that for

E500 = 2, the number of M FPM iterations is 0 since X0 = Q happens to be the

exact solution when η = 0 and since we take η = 10−10, which is very close to 0.

DA takes 1 iteration for E = 2 only because a diﬀerent stopping criterion is used.

ForE = E_i_{, i = 302, 698, the number of M FPM iterations is 102, exceeding 100 only}

(18)

−0.50 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 0.2 0.4 0.6 0.8 1x 10 −9 Relative Residuals −0.50 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 10 20 30 40 50 E Iterations of DA

Fig. 4.3. Relative residuals and the number of iterations for convergence of DA;η = 10−10.

0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5x 10 −5 Relative Residuals 0 0.5 1 1.5 2 2.5 3 3.5 4 0 5000 10000 E Iterations of M_FPM

Fig. 4.4_{. Relative residuals and the number of iterations for convergence of M FPM;}η = 10−10_.

slightly. We may say that for this example, M FPM is more eﬃcient than DA most of the time. However, there is currently no theory about the convergence of M FPM even for this simple example.

5. Conclusion. We have studied a doubling algorithm for Green’s function cal-culations in nano research. Its convergence to the desired solution is guaranteed. The algorithm has been shown to be efficient and reliable. Newton’s method cannot be used for this purpose by itself, but it may be used as a correction method if very high accuracy is required. A modified fixed-point method may be more efficient than the doubling algorithm in some situations, but its convergence analysis remains an open problem.

(19)

REFERENCES

[1] I. Appelbaum, T. Wang, J. D. Joannopoulos, and V. Narayanamurti, Ballistic hot-electron transport in nanoscale semiconductor heterostructures: Exact self-energy of a three-dimensional periodic tight-binding Hamiltonian, Phys. Rev. B, 69 (2004), Article Number 165301, 6 pp.

[2] D. A. Bini, L. Gemignani, and B. Meini, Computations with inﬁnite Toeplitz matrices and polynomials, Linear Algebra Appl., 343–344 (2002), pp. 21–61.

[3] C.-Y. Chiang, E. K.-W. Chu, C.-H. Guo, T.-M. Huang, W.-W. Lin, and S.-F. Xu, Con-vergence analysis of the doubling algorithm for several nonlinear matrix equations in the critical case, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 227–247.

[4] E. K.-W. Chu, T.-M. Hwang, W.-W. Lin, and C.-T. Wu, Vibration of fast trains, palin-dromic eigenvalue problems and structure-preserving doubling algorithms, J. Comput. Appl. Math., 219 (2008), pp. 237–252.

[5] S. Datta, Electronic Transport in Mesoscopic Systems, Cambridge University Press, Cam-bridge, UK, 1995.

[6] S. Datta, Nanoscale device modeling: The Green’s function method, Superlattices and Mi-crostructures, 28 (2000), pp. 253–278.

[7] E. N. Economou, Green’s Functions in Quantum Physics, 3rd ed., Springer, Berlin, 2006. [8] J. C. Engwerda, A. C. M. Ran, and A. L. Rijkeboer, Necessary and suﬃcient conditions

for the existence of a positive deﬁnite solution of the matrix equationX + A∗X−1A = Q, Linear Algebra Appl., 186 (1993), pp. 255–275.

[9] I. Gohberg, S. Goldberg, and M. A. Kaashoek, Classes of Linear Operators, Vol. II, Oper. Theory Adv. Appl. 63, Birkh¨auser, Basel, 1993.

[10] C.-H. Guo, Convergence rate of an iterative method for a nonlinear matrix equation, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 295–302.

[11] C.-H. Guo, Numerical solution of a quadratic eigenvalue problem, Linear Algebra Appl., 385 (2004), pp. 391–406.

[12] C.-H. Guo and P. Lancaster, Iterative solution of two matrix equations, Math. Comp., 68 (1999), pp. 1589–1603.

[13] N. J. Higham and H.-M. Kim, Numerical analysis of a quadratic matrix equation, IMA J. Numer. Anal., 20 (2000), pp. 499–519.

[14] N. J. Higham and H.-M. Kim, Solving a quadratic matrix equation by Newton’s method with exact line searches, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 303–316.

[15] D. L. John and D. L. Pulfrey, Green’s function calculations for semi-inﬁnite carbon nan-otubes, Physica Status Solidi B, 243 (2006), pp. 442–448.

[16] A. Kletsov, Y. Dahnovsky, and J. V. Ortiz, Surface Green’s function calculations: A non-recursive scheme with an inﬁnite number of principal layers, J. Chem. Phys., 126 (2007), Article Number 134105, 5 pp.

[17] W.-W. Lin and S.-F. Xu, Convergence analysis of structure-preserving doubling algorithms for Riccati-type matrix equations, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 26–39. [18] D. S. Mackey, N. Mackey, C. Mehl, and V. Mehrmann, Structured polynomial eigenvalue

problems: Good vibrations from good linearizations, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 1029–1051.

[19] C. van der Mee, G. Rodriguez, and S. Seatzu,LDU factorization results for bi-infinite and semi-infinite scalar and block Toeplitz matrices, Calcolo, 33 (1996), pp. 307–335. [20] B. Meini, Efficient computation of the extreme solutions of X + A∗X−1A = Q and X −

A∗_X−1_{A = Q, Math. Comp., 71 (2002), pp. 1189–1204.}

[21] J. Tomfohr and O. F. Sankey, Theoretical analysis of electron transport through organic molecules, J. Chem. Phys., 120 (2004), pp. 1542–1554.

[22] X. Zhan, Computing the extremal positive deﬁnite solutions of a matrix equation, SIAM J. Sci. Comput., 17 (1996), pp. 1167–1174.