Solving large-scale nonlinear matrix equations by doubling

(1)

Contents lists available atSciVerse ScienceDirect

Linear Algebra and its Applications

journal homepage:w w w . e l s e v i e r . c o m / l o c a t e / l a a

Solving large-scale nonlinear matrix equations by doubling

Peter Chang-Yi Weng

a

, Eric King-Wah Chu

a,

∗

, Yueh-Cheng Kuo

b

,

Wen-Wei Lin

c

a_{School of Mathematical Sciences, Building 28, Monash University, Vic 3800, Australia} b_{Department of Applied Mathematics, National University of Kaohsiung, Kaohsiung 811, Taiwan} c_{Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan}

A R T I C L E I N F O A B S T R A C T

Article history:

Received 22 November 2011 Accepted 7 August 2012

Available online 19 September 2012 Submitted by P.Semrlˇ AMS classification: 15A24 65F50 Keywords: Doubling algorithm Green’s function Krylov subspace Leaky surface wave Nano research

Nonlinear matrix equation Surface acoustic wave

We consider the solution of the large-scale nonlinear matrix equa-tion X+BX−1A−Q = 0, with A,B,Q,X ∈ Cn×n, and in some applications B = A. (. = or H). The matrix Q is assumed to be nonsingular and sparse with its structure allowing the solu-tion of the corresponding linear system Qv = r in O(n) computa-tional complexity. Furthermore, B and A are respectively of ranks ra,rbn. The type 2 structure-preserving doubling algorithm by Lin and Xu (2006) [24] is adapted, with the appropriate applica-tions of the Sherman–Morrison–Woodbury formula and the low-rank updates of various iterates. Two resulting large-scale doubling algorithms have an O((ra+rb)3)computational complexity per it-eration, after some pre-processing of data in O(n)computational complexity and memory requirement, and converge quadratically. These are illustrated by the numerical examples.

1. Introduction

Consider the nonlinear matrix equation (NME)

R

(

X

) ≡

X

+

BX−1A

−

Q

=

0 (1)

with A

,

B

,

Q

,

X

∈ C

n×n. We assume that Q is nonsingular with structures, like being banded or sparse, allowing the solution of the corresponding linear system Qv

=

r in O

(

n

)

computational complexity.

∗Corresponding author.

E-mail addresses:[email protected](P.C.-Y. Weng),[email protected](E.K.-w. Chu),[email protected](Y.-C. Kuo), [email protected](W.-W. Lin).

(2)

We further assume that A

,

B are respectively of ranks ra

,

rb

n. These NMEs arise in the solution of palindromic eigenvalue problems, with applications in the computation of Green’s function in nano research [12–14,16] and surface acoustic simulations [18,19]; for the individual models, structures of the particular NMEs and their solvability conditions, please consult these references. For the surface acoustic wave application in [18,19], we have Q

=

Q

∈ C

n×n and B

=

A

∈ C

n×n. In some applications as in [18,19], selected eigenvalues from the pencils

λ

X

−

A or

λ

B

−

X are required, after

the solution X to (1) is found.

We shall adapt the structure-preserving doubling algorithm (SDA) of type 2 [13,16,24] for the NME in (1), resulting in an efficient large-scale doubling algorithm (SDA_ls). The original SDA is usually attributed to Anderson [1],1 as an accelerated variant of the direct functional iteration method. It has recently been revitalized and further developed in [5–7], for a great variety of applications [9]. Recently, we have extended the SDA (of type 1) for large-scale algebraic Riccati equations (AREs) [21,22,28] and the associated linear equations [23], with the resulting algorithms possessing an efficient O

(

n

)

computational complexity and memory requirement per iteration. We shall extend these methods to NMEs, based on similar philosophy. Notice the important difference in the large-scale NME, from large-scale AREs, that the solution X is nonsingular and not numerically low-ranked. (We shall see later that X is a numerically low-ranked update of the nonsingular Q .) Interestingly, the essential steps of compression and truncation of Krylov bases for large-scale AREs are not required for NMEs. Also, the SDA_ls for large-scale NMEs is of a more efficient O

((

ra

+

rb

)

3

)

computational complexity per iteration, as compared to O

(

n

)

for AREs. The overall algorithm shares the O

(

n

)

computational complexity and memory requirement because of the pre-processing of data.

Similar techniques in this paper are applicable to the cyclic reduction method in [25] for X

±

A.X−1A

−

Q

=

0 (.

= ,

H; denoting the transpose and the Hermitian).

2. Preliminaries

In this section, we shall introduce some notations, briefly describe the solvability condition for NME (1) and give some preliminary results. Throughout this paper, we denote the unit circle in the complex plane by

T

. For a matrix A

∈ C

n×n,

σ (

A

)

and

ρ(

A

)

denote respectively the spectrum and spectral radius of A, and

σ

max

(

A

)

and

σ

min

(

A

)

are respectively the maximum and minimum singular values of

A. The conjugate transpose and transpose of A are denoted by AHand Arespectively. We can write

A

=

AR

+

iAI, where the Hermitian matrices

AR

=

1 2

(

A

+

A H

_),

_A I

=

1 2i

(

A

−

A H

₎

are called the real part and the imaginary part of A, respectively. For Hermitian matrices A1, A2

∈ C

n×n, we use A1

>

A2(A1

≥

A2) to denote the fact that A1

−

A2is positive definitive (positive semi-definite).

The NME in (1) can be reformulated as

R

(

X

) =

X

+ (

CH

+

iDH

)

X−1

(

C

+

iD

) − (

QR

+

iQI

) =

0

,

where C

≡

1₂

(

A

+

BH

)

, D

≡

_2i1

(

A

−

BH

)

and QR, QIare the real part and the imaginary part of Q , respectively. Let

ψ(

z

) =

zDH

+

QI

+

z−1D (2)

be a rational matrix-valued function. The following is a consequence of the solvability results from [13] and the proof can be found in [14], after superficial modifications.

1 _{In [2, p. 149], we have the quotation “Doubling algorithms have been part of the folklore associated with Riccati equations in}

linear systems problems for some time. We are unable to give any original reference containing material close to that presented here.” This quotation from Anderson, to whom the SDA is widely attributed, indicates the uncertain origin of the method.

(3)

Theorem 2.1. Let A

=

C

+

iD, B

=

CH

+

iDHand Q

=

QR

+

iQIbe n

×

n complex matrices. Suppose

that

ψ(

z

)

defined in (2) is positive definite for each z

∈ T

.

(i) The matrix polynomial P

(

z

) =

z2B

−

zQ

+

A has exactly n eigenvalues each inside and outside

T

. (ii) The NME (1) has a solution X

=

XR

+

iXIwith

ρ(

X−1A

) <

1 and XI

>

0.

Note that if X is solution of NME, then P

(

z

) = (

zBX−1

−

I

)

X

(

zI

−

X−1A

)

. So the eigenvalues of

X−1A are the n eigenvalues of P

(

z

)

, and the eigenvalues of BX−1are the reciprocals of remaining n eigenvalues of P

(

z

)

. A solution X of NME is said to be stabilizing if

ρ(

X−1A

) <

1. From Theorem2.1, if

ψ(

z

) >

0 for each z

∈ T

, then the NME and its dual

X

+

AX−1B

=

Q (3)

have stabilizing solutions.

Remark 2.1. Theorem2.1gives a sufficient condition for the existence of stabilizing solutions of NMEs. For applications in the computation of the surface Greens function in nano research [13,16] , we have

QI

=

I, D

=

0 and C, QRbeing real, and it is easy to check that the sufficient condition holds. For the surface wave application in [18,19], we have C, D, QRand QIbeing real such that

ψ(

z

) >

0 for each

z

∈ T

. For the special case where C

,

QR

=

0, if

ψ(

z

) >

0 for each z

∈ T

, then

X

+ (

iDH

)

X−1

(

iD

) =

iQI (4)

has a stabilizing solution XSwith XS,I

>

0,

ρ(

XS−1

(

iD

)) <

1 and

ρ((

iDH

)

XS−1

) <

1. We shall show that the real part of XSis zero.Consider the nonlinear matrix equation

X

+ (

iDH

)

X−1

(

iD

) = −

iQI

.

(5)

Itis obvious that

−

XSand XSHare solutions with

ρ(−

X−S1

(

iD

)) <

1 and

ρ(

XS−H

(

iD

)) <

1, i.e.,

−

XS and X_SHare stabilizing solutions. Since the stabilizing solution of (5) is unique, X_SH

= −

XS. Hence, XS has only imaginary part, i.e., XS

=

iXS,I. Since XS,I

>

0, substituting XS

=

iXS,Iinto (4) implies that

X

+

DHX−1D

=

QIhas a Hermitian positive definite solution XS,I. This coincides with the result in [10]. Under the assumption that

ψ(

z

) >

0 for each z

∈ T

, we know that the NME and its dual have stabilizing solutions X and X, respectively. The corresponding SDA of type 2 [13,16,24] has the form

A0

=

A

,

B0

=

B

,

Q0

=

Q

,

P0

=

0

;

Ak+1

=

Ak

(

Qk

−

Pk

)

−1Ak

,

Bk+1

=

Bk

(

Qk

−

Pk

)

−1Bk

;

Qk+1

=

Qk

−

Bk

(

Qk

−

Pk

)

−1Ak

,

Pk+1

=

Pk

+

Ak

(

Qk

−

Pk

)

−1Bk

.

(6) From [13, Theorem 3.1], all the iterates are well-defined (i.e., Mk

≡

Qk

−

Pkare always invertible),

Qk

→

X, Q

−

Pk

→

X and A k

,

Bk

→

0, all quadratically as k

→ ∞

. In the following, we shall give an upper bound of

{

M−_k1

2

:

k

=

0

,

1

, . . .}

For the case where B

=

AHand Q

=

QH, an upper bound has been given in [3, Theorem 15].

The following lemma is useful to estimate the upper bound.

Lemma 2.2. Let

=

R

+

iI

∈ C

n×n, whereRandIare the real and imaginary parts of. Suppose

thatIis positive (negative) definite.

(i) is invertible and −1

₌

−1

(4)

with−_I 1R

(

I

+

R−I1R

)

−1being Hermitian and

−(

I

+

R−I 1R

)

−1being Hermitian

negative (positive) definite.

(ii) IfI

I (I

−

I) for some

>

0, then

σ

min

(

)

, i.e.,

−1

2

−1.

Proof.

(

i

)

For eachunitvector x

∈ C

n,sinceR,Iare Hermitian andIis positive (negative) definite,

we have

x

2

= (

R

+

iI

)

x

2

xH

(

R

+

iI

)

x

2

xHIx

2

σ

min

(

I

).

(8) Hence,is invertible. From

(

R

+

iI

)[

I−1R

(

I

+

RI−1R

)

−1

−

i

(

I

+

R−I 1R

)

−1

]

= (

R

+

iI

)(

−I 1R

−

iI

)(

I

+

R−I 1R

)

−1

= (

I

+

R−I 1R

)(

I

+

R−I 1R

)

−1

=

I

,

we obtain (7). SinceIis positive (negative) definite andR

=

HR, it is easy to see that

−(

I

+

R−I 1R

)

−1 is Hermitian negative (positive) definite. Finally, we show that −I 1R

(

I

+

R−I 1R

)

−1is Hermitian. From (7), we have

[

−1

I R

(

I

+

RI−1R

)

−1

−

i

(

I

+

R−I 1R

)

−1

](

R

+

iI

) =

I

.

It follows that−_I 1R

(

I

+

R−I 1R

)

−1I

− (

I

+

R−I 1R

)

−1R

=

0. SinceRandIare Hermitian, we deduce that−_I 1R

(

I

+

R−I 1R

)

−1is Hermitian.

(

ii

)

IfI

I (I

−

I) for some

>

0 then from (8), we have

σ

min

(

) =

min

x =1

x

2

.

Hence,

−1

2

= σ

min

(

)

−1

−1.

Suppose that

ψ(

z

) >

0 for each z

∈ T

. Let

Mk

=

Qk

−

Pk

, φ

k

(

z

) = −

Bkz

+

Mk

−

Akz−1

,

(9) where Ak, Bk, Qk, Pkare generated by SDA (6). From [13], we know that Mkis invertible. Then

φ

k

(

z

)

Mk−1

φ

k

(−

z

) = (−

Bkz

+

Mk

−

Akz−1

)(

Mk−1Bkz

+

I

+

M−k1Akz−1

)

= −

BkM−k1Bkz2

+ (

Mk

−

BkMk−1Ak

−

AkMk−1Bk

) −

AkMk−1Akz−2

= −

Bk+1z2

+

Mk+1

−

Ak+1z−2

= φ

k+1

(

z2

).

(10) Let

φ

0,R

(

z

)

and

φ

0,I

(

z

)

be the real and imaginary parts of

φ

0

(

z

)

, respectively. Since, for z

∈ T

,

φ

0

(

z

) = φ

0,R

(

z

) +

i

φ

0,I

(

z

), φ

0,I

(

z

) = ψ(−

z

) >

0

.

(11) Lemma2.2implies that

φ

0

(

z

)

is invertible for z

∈ T

. It follows from (10) that for k

=

0

,

1

, . . .

,

φ

k

(

z

)

is invertible for z

∈ T

. Let

ϕ

k

(

z

) = φ

k

(

z

)

−1for z

∈ T

and

(5)

where

ϕ

k,R

(

z

)

and

ϕ

k,I

(

z

)

are the real and imaginary parts of

ϕ

k

(

z

)

. Taking the inverse of (10) yields

ϕ

k+1

(

z2

) = φ

k+1

(

z2

)

−1

= φ

k

(−

z

)

−1Mk

φ

k

(

z

)

−1

= φ

k

(−

z

)

−1

φ

k

(

z

) + φ

k

(−

z

)

2

φ

k

(

z

)

−1

=

1 2

[φ

k

(

z

)

−1

_{+ φ}

k

(−

z

)

−1

] =

1 2

[ϕ

k

(

z

) + ϕ

k

(−

z

)].

(13) From (11), (12) and Lemma2.2, we have

ϕ

0,I

(

z

) = −(φ

0,I

(

z

) + φ

0,R

(

z

)φ

0,I

(

z

)

−1

φ

0,R

(

z

))

−1

<

0 for

z

∈ T

. It follows from (13) that for each k,

ϕ

k,I

(

z

) <

0 for z

∈ T

and max

z∈T

σ

max

(ϕ

k,R

(

z

))

maxz∈T

σ

max

(ϕ

0,R

(

z

)) ≡ σ

max,R

,

max

z∈T

σ

max

(ϕ

k,I

(

z

))

maxz∈T

σ

max

(ϕ

0,I

(

z

)) ≡ σ

max,I

,

min

z∈T

σ

min

(ϕ

k,I

(

z

))

minz∈T

σ

min

(ϕ

0,I

(

z

)) ≡ σ

min,I

>

0

.

(14) For z

∈ T

, we then have

ϕ

k,R

(

z

)

2

σ

max,R

, ϕ

k,I

(

z

)

2

σ

max,I

, ϕ

k,I

(

z

)

−1

2

σ

min−1,I

.

(15) The following theorem gives upper bounds of

Mk

2and

Mk−1

2for k

=

0

,

1

, . . .

Theorem 2.3. Let A

=

C

+

iD, B

=

CH

+

iDHand Q

=

QR

+

iQIbe given such that

ψ(

z

)

is positive for

each z

∈ T

. Let Mk

=

Qk

−

Pk, where Qkand Pkare generated by the SDA in (6). Then

Mk

2

σ

2 max,R

+ σ

min2 ,I

σ

2 min,I

,

M−_k1

2

σ

max,I

+

σ

2 max,R

σ

min,I

,

where

σ

max,R,

σ

max,Iand

σ

min,Iare given in (14), which are only dependent on

φ

0

(

z

)

for z

∈ T

. Proof. For each k

=

0

,

1

, . . .

, from (15), we have that for each z

∈ T

,

−

σ

max,R

σ

min,I

ϕ

k,I

(

z

)

ϕ

k,R

(

z

)

σ

max,R

σ

min,I

ϕ

k,I

(

z

),

0

< σ

min,II

−ϕ

k,I

(

z

) − ϕ

k,R

(

z

)ϕ

k,I

(

z

)

−1

ϕ

k,R

(

z

)

σ

max,I

+

σ

2 max,R

σ

min,I I

.

Let

φ

k,R

(

z

)

and

φ

k,I

(

z

)

be the real and imaginary parts of

φ

k

(

z

)

, respectively. From Lemma2.2

(

i

)

and using the fact that

φ

k

(

z

) = ϕ

k

(

z

)

−1, we have for z

∈ T

,

σ

max,R

σ

2 min,I I

φ

k,R

(

z

)

−

σ

max,R

σ

2 min,I I

,

σ

−1 min,II

φ

k,I

(

z

)

σ

max,I

+

σ

2 max,R

σ

min,I −1 I

.

(6)

Since Mk

= [φ

k

(

z

) + φ

k

(−

z

)]/

2

= [φ

k,R

(

z

) + φ

k,R

(−

z

)]/

2

+

i

[φ

k,I

(

z

) + φ

k,I

(−

z

)]/

2, we have

Mk

2

σ

max,R

σ

2 min,I 2

+ σ

−2 min,I

=

σ

2 max,R

+ σ

min2 ,I

σ

2 min,I and Mk,I

=

1 2i

(

Mk

−

M H k

) =

φ

k,I

(

z

) + φ

k,I

(−

z

)

2

σ

max,I

+

σ

2 max,R

σ

min,I −1 I

.

It follows from Lemma2.2

(

ii

)

that

M−_k1

2

σ

max,I

+ σ

max2 ,R

/σ

min,I

.

For the special case C

,

QR

=

0, it follows from (14) and Theorem2.3that

Mk

2

σ

min−1,I and

M_k−1

2

σ

max,I. This coincides with the results in [3, Theorem 15].

3. Large-scale doubling algorithm

3.1. Main ideas

From [21–23,28], the main ideas behind the algorithm for large-scale problems are:

(a) The appropriate application of the Sherman–Morrison–Woodbury formula (SMWF) in order to avoid the inversion of large or unstructured matrices.

(b) The use of low-rank updates for various iterates.

(c) The computation of matrix operators (Ak) recursively, to preserve the corresponding sparsity or low-rank structures, instead of forming them explicitly.

(d) The careful organization of convergence control in the algorithm, so as to preserve the low computational complexity and memory requirement per iteration.

For the SDA for large-scale NMEs, we shall see that (c) is not relevant.

Let A, B, Q

∈ C

n×nbe given such that

ψ(

z

)

defined in (2) is positive definite for each z

∈ T

and A,

B be respectively of ranks ra, rb

n. Assume the full-rank decompositions

A

=

FaRaGHa

,

B

=

FbRbGbH (16)

with Ra

∈ C

ra×ra and Rb

∈ C

rb×rb. Without loss of generality, we shall assume that Fa, Fb, Gaand

Gbare unitary. In this paper, we shall call some matrices “kernels”, mostly denoted by R with various subscripts (Y is also used in Section3.3). Most of our computation will be done in terms of kernels.

From Theorem2.1, we know that NME (1) and its dual (3) have stabilizing solutions X and X,

respectively. The SDA for NME and its dual has the form in (6) which requires the inverse of Mk

=

Qk

−

Pk. It is shown in [13] that Mkis invertible for each k

=

0

,

1

, . . .

Furthermore, an upper bound of

{

M−_k1

2

|

k

=

0

,

1

, . . .}

is given in Theorem2.3. Similar to the approach in [21,23], we shall apply the SMWF:

(

_A

_±

_U_CV

₎

−1

₌

_A−1

_∓

_A−1_U

₍

_I

_±

_CV_A−1_U

₎

−1_CV_A−1

to various inverses of matrices in sparse-plus-low-rank (splr) form, enabling the computation of large inverses of size n in terms of much smaller matrices. In the following lemma, we show that the small size matrix

(

I

±

CV A−1U

)

is invertible provided that A and

(

A

±

U CV

)

are invertible.

(7)

Lemma 3.1. If A and

(

A

±

U CV

)

are invertible then

(

I

±

CV A−1U

)

is invertible.

Proof. Suppose that A and

(

A

±

U CV

)

are invertible. Then ⎡ ⎣ A

±

U CV 0 0

∓

I ⎤ ⎦ is invertible. Since A is invertible and ⎡ ⎣I

∓

U 0 I ⎤ ⎦ ⎡ ⎣ A

±

U CV 0 0

∓

I ⎤ ⎦ ⎡ ⎣ I 0

∓

CV I ⎤ ⎦

₌

⎡ ⎣ A U CV

∓

I ⎤ ⎦

=

⎡ ⎣ I 0 CV A−1

∓

I ⎤ ⎦ ⎡ ⎣ A 0 0 I

±

CV A−1U ⎤ ⎦ ⎡ ⎣I A−1U 0 I ⎤ ⎦

_,

we have shown that

(

I

±

CV A−1U

)

is invertible.

3.2. Algorithm 1

For k

=

0

,

1

, . . .

, we can organize the SDA so that the iterates have the recursive forms

Ak

=

FaRakGHa

,

Bk

=

FbRbkGHb

;

Qk

=

Q

−

FbRqkGHa

,

Pk

=

FaRpkGHb

;

(17)

with Rak

∈ C

ra×ra, Rbk

∈ C

rb×rband Rpk

,

RHqk

∈ C

ra×rb. The general forms in (17) can be verified easily from (6), when identifying the updating formulae in (23) and the initail values in (25). Note that we can equivalently formulate the SDA in terms of Ak, Bk, Pkand the new variable Qk

≡

Q

−

Qk, with the symmetry in the low-ranked Pkand Qk. Note also that the row and column spaces of all these matrices remain constant, with only the various kernels Rs varying with k. Also, Qkare low rank updates of Q . (For the nano research application in [13,16], this corresponds to the behaviour that only upper right corner in Akand the lower right corner of Qkare changing for different k.)

We require the inverse of Mk

=

Qk

−

Pkin the SDA in (6). From (17), we have

Mk

=

Q

− [

Fa

,

Fb

]

Rmk

[

Ga

,

Gb

]

H

,

(18) where Rmk

≡

⎡ ⎣ 0 Rpk Rqk 0 ⎤ ⎦

_{∈ C}

(ra+rb)×(ra+rb)_{. Applying the SMWF to M}−1 k yields M−_k1

=

Q−1

+

Q−1

[

Fa

,

Fb

]

Nk

[

Ga

,

Gb

]

HQ−1 (19) with Nk

≡ (

Ira+rb

−

RmkT

)

− 1_R mk (20) and T

=

⎡ ⎣Taa Tab Tba Tbb ⎤ ⎦

_{≡ [}

Ga

,

Gb

]

HQ−1

[

Fa

,

Fb

].

(21)

(8)

Remark 3.1. Since Q and Mkare invertible, it follows from Lemma3.1that Ira+rb

−

RmkT

(

k

=

0

,

1

, . . .)

are invertible. In the following, we shall give upper bounds of

I

−

RmkT

2and

(

I

−

RmkT

)

−1

2for k

=

0

,

1

, . . .

when

[

Ga

,

Gb

]

and

[

Fa

,

Fb

]

are of full column rank. Assume

σ

min

([

Ga

,

Gb

]) = σ

min,G

>

0

and

σ

min

([

Fa

,

Fb

]) = σ

min,F

>

0. Since Ga, Gb, Faand Fbare unitary, we obtain

[

Ga

,

Gb

]

2

√

2

,

[

Fa

,

Fb

]

2

√

2

,

[

Ga

,

Gb

]

†

2

=

1

/σ

min,G

, [

Fa

,

Fb

]

†

2

=

1

/σ

min,F

,

(22)

where

(·)

†denotes the pseudoinverse of a matrix. From (18), (21) and (22), we have

Rm,k

2

(

Mk

2

+

Q

2

)/(σ

min,F

σ

min,G

)

and

T

2 2

Q−1

2. Hence, we obtain

I

+

RmkT

2 1

+

2

Q−1

2

σ

min,F

σ

min,G

(

Mk

2

+

Q

2

).

From (19) and (22), we have

Nk

2

(

Q

2

Mk−1

2

+

1

)

Q

2

/(σ

min,F

σ

min,G

)

. Using the fact that

(

I

−

RmkT

)

−1

(

I

−

RmkT

) =

I, it follows that

(

I

−

RmkT

)

−1

=

NkT

+

I and hence

(

I

−

RmkT

)

−1

2 1

+

2

Q

2

Q−1

2

σ

min,F

σ

min,G

(

Q

2

Mk−1

2

+

1

).

With (17) and (19), the SDA in (6) now becomes the updating formulae:

Ra,k+1

=

RakWkaaRak

,

Rb,k+1

=

RbkWkbbRbk

;

Rq,k+1

=

Rqk

+

RbkWkbaRak

,

Rp,k+1

=

Rpk

+

RakWkabRbk

;

(23)

where W_kuv

≡

GH_uM_k−1Fv

(

u

,

v

=

a

,

b

)

. From (19), we obtain ⎡

⎣Wkaa Wkab

W_kba W_kbb

⎤

⎦

₌

T

+

TNkT

.

(24)

The computation requires about26₃

(

r3_a

+

r_b3

)+

22rarb

(

ra

+

rb

)

flops for each iteration (the detailed count is given in Table1), with the help of (19), after the pre-processing in O

(

n

)

complexity for quantities like Q−1U, Q−HV

(

U

=

Fa

,

Fband V

=

Ga

,

Gb

)

in (20) and T in (21).

For initial values, we have the obvious

Ra0

=

Ra

,

Rb0

=

Rb

;

Rp0

,

Rq0

=

0

.

(25)

In [21–23,28], the SDA of type 1 has been extended for large-scalarStein/Lyapunov and AREs equa-tions. The iterates Akin the SDA of type 1 are computed recursively, without being forming explicitly. As the SDA of type 2 in (6) now translates to the updating formulae (23) in

C

ru×rv_(u

,

_v

=

_a

,

_{b), this} previously important aspect (c) in Section3.1of the SDA_ls is now irrelevant.

The SDA_ls for an NME (and its dual), realizes the iteration in (6) with the help of (17), (19) and (23), the initial values in (25), and the convergence control in Section4.1. We summarize the SDA_ls in Algorithm 1 below.

(9)

Algorithm 1 (SDA_ls)

Input

: A

=

FaRaGHa

,

B

=

FbRbGHb

,

Q

∈ C

n×n, positive tolerance

;

Output

: Rp

,

RHq

∈ C

ra×rb, with Q

−

FbRqGHa and Q

−

FaRpGHb,

approximating, respectively, the solutions X and X to the

large-scale NME (1) and its dual (3);

Orthogonalize

Fa

,

Fb

,

Ga

,

Gb, modify Ra

,

Rb; (if required) Setk

=

0, r0

=

2

; Ra0

=

Ra, Rb0

=

Rb; Rp0

,

Rq0

=

0;

(as in (25))

Do

until convergence:

If

the relative residual

˜

rk

= |

rk

/(

qk

+

mk

)| <

,

Set

Rq

=

Rqkand Rp

=

Rpk;

Exit

End If

Compute

W_kuv

=

G_uHM−_k1Fv

(

u

,

v

=

a

,

b

)

, (as in (20) and (24))

Ra,k+1

=

RakWkaaRak,

Rb,k+1

=

RbkWkbbRbk,

Rq,k+1

=

Rqk

+

RbkWkbaRak, and

Rp,k+1

=

Rpk

+

RakWkabRbk; (as in (23))

Compute

k

←

k

+

1, rk, qkand mk; (as in (40) in Section4.1)

End Do

3.3. A non-symmetric discrete-time algebraic Riccati equation

From Section3.2, Q

−

Qkand PkHare low-ranked with small rb

×

rakernels. It suggests that the solution X, or its kernel Y , can be characterized by a matrix equation of lower dimensions, as stated in the following lemma. We shall not elaborate on the similar result for the solution X of dual equation.

Lemma 3.2. Let X be a solution of the large-scale NME (1). Then there exists Y

∈ C

rb×ra_{such that}

X

=

Q

−

FbYGHa

.

(26)

Proof. Suppose that X is a solution of NME (1). Substituting (16) into (1) and setting Y

=

RbGHbX−1FaRa, we obtain (26).

We shall now show that Y in (26) satisfies an uncommon non-symmetric discrete-time algebraic Riccati equation (NARE_D). Note that the standard non-symmetric algebraic Riccati equations (NARE) [28] are of continuous-time type in most literature.

Substituting (16) and (26) into (1), we have

(10)

Multiplying F_bHand Gafrom the left and the right of (27), respectively, and applying the SMWF, we obtain

−

Y

+

RbGbH Q−1

+

Q−1FbY

(

I

−

GaHQ−1FbY

)

−1GaHQ−1 FaRa

=

0

.

(28)

With the notation in (21), (28) is equivalent to

−

Y

+

RbTbaRa

+

RbTbbY

(

I

−

TabY

)

−1TaaRa

=

0

,

or, the NARE_D

D

(

Y

) ≡ −

Y

+

TbbY

(

I

−

TabY

)

−1 Taa

+

Tba

=

0

,

(29) where

Taa

≡

TaaRa

,

Tbb

≡

RbTbb

,

Tba

≡

RbTbaRa

,

Tab

≡

Tab

.

(30) In [15], an NARE is first processed by Cayley transform to the form (29) and then doubling is applied. We shall refer to this approach (without the Cayley transform) as Algorithm 2.

Algorithm 2 (SDA_ls)

Input

: A

=

FaRaGHa

,

B

=

FbRbGHb

,

Q

∈ C

n×n, positive tolerance

;

Output

: Y

,

YH

∈ C

rb×ra_{, with Q}

−

_F

bYGHa and Q

−

FaYGbH, approximating, respectively, the solutions X and X to the

large-scale NME (1) and its dual (3);

Orthogonalize

Fa

,

Fb

,

Ga

,

Gb, modify Ra

,

Rb; (if required) Compute Taa, Tbb, Tab, Tba; (as in (30)) tba

=

Tba

;

Setk

=

0,r

˜

0

=

2

; Taa,0

=

Taa, Tbb,0

=

Tbb, Tba,0

=

Tba, Tab,0

=

Tab;

Do

until convergence:

If

the relative residual

˜

rk

= |

rk

/(

qk

+

mk

+

tba

)| <

,

Set

Y

=

Tba,kand Y

=

Tab,k;

Exit

End If

Compute

_T_aa_,_k₊₁

₌

_T_aa_,_k

₍

_I_r a

−

Tab,k Tba,k

)

−1 Taa,k, Tbb,k+1

=

Tbb,k

(

Irb

−

Tba,k Tab,k

)

− 1_T bb,k, _T_ab_,_k₊₁

₌

_T_ab_,_k

₊

_T_aa_,_k _T_ab_,_k

₍

_I_r b

−

Tba,k Tab,k

)

− 1 _T bb,k, and Tba,k+1

=

Tba,k

+

Tbb,k Tba,k

(

Ira

−

Tab,k Tba,k

)

− 1_T aa,k;

Compute

k

←

k

+

1, rk

=

D

(

Tba,k

)

, qk

=

Tba,k

and mk

=

Tbb Tba,k

(

I

−

Tab Tba,k

)

−1 Taa

;

End Do

Notice that the computation of Algorithm 2 can be realized as: computing Tu,v(u

,

v

=

a

,

b) in (21) requires O

(

n

)

computational complexity; computing Tu,v(u

,

v

=

a

,

b) in (30) requires 2ra3

+

2rb3

+

2rarb

(

ra

+

rb

)

flops; for each iteration, it requires22₃ra3

+

10ra2rb

+

8rarb2

+

143rb3flops. The detailed count

(11)

is given in Table2. For the case ra

=

rb, Algorithm 2 requires 8ra3

+(

30ra3

)

k flops, where k is the number of iterations, after the pre-processing in O

(

n

)

computational complexity. Algorithms 1 and 2 all need to compute Tu,v(u

,

v

=

a

,

b) in (21) which required O

(

n

)

computational complexity. From Table1, we obtain that Algorithm 1 requires 74r_a3flops per iteration when ra

=

rb. When ra

≈

n1/3

n, Algorithm 2 will be more efficient than Algorithm 1. However, we guarantee that all iterations in Algorithm 1 are well-defined, but the same for Algorithm 2 cannot be guaranteed. After each iterative step, Rqkin Algorithm 1 and Tba,kin Algorithm 2 are different but they converge to the same Y . From our numerical experience, for a given example, the two algorithms converge in the same number of iterations, possibly the result of Theorem3.3below.

Suppose that X

=

Q

−

FbYGHa and X

=

Q

−

FaY GHb are the stabilizing solutions of NME and its dual, respectively. We shall show below, when all iterations of Algorithm 2 are well-defined then Tba,k

→

Y and Tab,k

→

Y as k

→ ∞

. Let M

=

⎡ ⎣ TaaRa 0

−

RbTbaRa Irb ⎤ ⎦

_,

L

=

⎡ ⎣Ira

−

Tab 0 RbTbb ⎤ ⎦

_{∈ C}

(ra+rb)×(ra+rb)

.

₍₃₁₎ Suppose that (29) has a solution Y

∈ C

rb×ra_{. Then Y satisfies that}

M ⎡ ⎣Ira Y ⎤ ⎦

₌

L ⎡ ⎣Ira Y ⎤ ⎦ S

,

(32) where S

= (

Ira

−

TabY

)

−1TaaRa

∈ C

ra×ra.

Theorem 3.3. Let X

=

Q

−

FbYGHa and X

=

Q

−

FaY GHb be the stabilizing solutions of, respectively, NME (1) and its dual (3). Then

(i) X−1A and S

= (

Ira

−

TabY

)

− 1_T

aaRahave the same nonzero eigenvalues.

(ii) X−1B and S

= (

Irb

−

TbaY

)

− 1_T

bbRbhave the same nonzero eigenvalues.

Proof.

(

i

)

From (16), we have X−1A

= (

Q

−

FbYGHa

)

−1FaRaGHa. Applying the SMWF to

(

Q

−

FbYGHa

)

−1 yields

X−1A

= (

Q−1Fa

+

Q−1FbY

(

Ira

−

G H

aQ−1FbY

)

−1GHaQ−1Fa

)

RaGHa

.

(33) Let G

= [

Ga

,

Ga

]

be a unitary matrix. Multiply GHand G from the left and right of (33), respectively, we obtain GHX−1AG

=

⎡ ⎣Z 0 . 0 ⎤ ⎦

_,

where Z

=

Taa

+

TabY

(

Ira

−

TabY

)

− 1_T aa Ra

= (

Ira

−

TabY

)

− 1_T aaRa

=

S

.

Hence, X−1A and S

= (

Ira

−

TabY

)

− 1_T

aaRahave the same nonzero eigenvalues. The proof of

(

ii

)

is similar to the proof of

(

i

)

.

As mentioned before, we may require the eigenvalues of X−1A or X−1B in some applications, and

Theorem3.3provides an efficient route to the nonzero parts of these spectra via the much smaller matrices S and S.

(12)

Suppose that X

=

Q

−

FaY GHb is a stabilizing solution of (3). Then the kernel Y

∈ C

ra×rb and S

= (

Irb

−

TbaY

)

− 1_T bbRbsatisfy ⎡ ⎣ TbbRb 0

−

RaTabRb Ira ⎤ ⎦ ⎡ ⎣Irb Y ⎤ ⎦

₌

⎡ ⎣Irb

−

Tba 0 RaTaa ⎤ ⎦ ⎡ ⎣Irb Y ⎤ ⎦S

.

Let Y

=

R−_a1Y R_b−1

,

S

=

RbSR−b1

.

(34) It is easily seen that

M ⎡ ⎣ Y Irb ⎤ ⎦ S

=

L ⎡ ⎣ Y Irb ⎤ ⎦

_,

(35)

where M and L are defined in (31). It follows from (32), (34), (35) and Theorem3.3that the matrix pencil

λ

L

−

M has raeigenvalues inside the unit circle and rbeigenvalues outside the unit circle. Similar to the theory in [5,15], if the matrix sequences

{

Taa,k

}

,

{

Tbb,k

}

,

{

Tab,k

}

and

{

Tba,k

}

generated by Algorithm 2 are well-defined, then we have

Tba,k

−

Y

O

(ρ(

S

)

2 k

ρ(

S

)

2k

_),

_T ab,k

−

Y

O

(ρ(

S

)

2 k

ρ(

S

)

2k

_),

₍₃₆₎ where S

= (

Ira

−

TabY

)

−1TaaRaand S is defined in (34) with

ρ(

S

) <

1 and

ρ(

S

) <

1. For Algorithm 1, it has been shown in [13] that

Rq,k

−

Y

= (

Q

−

FbRq,kGaH

) − (

Q

−

FbYGHa

)

O

(ρ(

X−1B

)

2 k

ρ(

X−1A

)

2k

),

Rp,k

−

Y

= (

Q

−

FaRp,kGbH

) − (

Q

−

FaY GHb

)

O

(ρ(

X−1B

)

2 k

ρ(

X−1A

)

2k

),

(37)

where

{

Rq,k

}

and

{

Rp,k

}

are generated by Algorithm 1. From (36), (37) and Theorem3.3, it is easily seen that Algorithms 1 and 2 converge in the same number of iterations.

It is intriguing, that we started from an NME associated with the SDA of type 2 and ended up with an equivalent NARE_D associated with the SDA of type 1. Similar links between NMEs and AREs have been considered before. In [10], an NME has been transformed to a discrete-time ARE when B

=

A∗. The transformation of an NARE into a unilateral quadratic matrix polynomial (UQME), which was then solved by the SDA of type 2, has recently been studied in [4].

3.4. Errors in SDA_ls

It is easy to see from (6) that errors in the iterates will propagate through the SDA. Let

δ

Ak,

δ

Bk,

δ

Pk and

δ

Qkbe the errors in Ak, Bk, Pkand Qk, respectively. From (6), withk

≡ δ

Qk

− δ

Pk, and ignoring higher order terms, we have

δ

Ak+1

≈ δ

AkMk−1Ak

−

AkMk−1kM−k1Ak

+

AkMk−1

δ

Ak

,

δ

Bk+1

≈ δ

BkMk−1Bk

−

BkMk−1kM−k1Bk

+

BkMk−1

δ

Bk

,

δ

Pk+1

≈ δ

Pk

+ δ

AkMk−1Bk

−

AkMk−1kMk−1Bk

+

AkMk−1

δ

Bk

,

δ

Qk+1

≈ δ

Qk

− δ

BkMk−1Ak

+

BkMk−1kMk−1Ak

−

BkM−k1

δ

Ak

.

(13)

With

δ

k

≡

max

{ δ

Ak

, δ

Bk

, δ

Pk

, δ

Qk

}

, cak

≡

M−k1

Ak

and cbk

≡

Mk−1

Bk

, we have

δ

k+1

δ

k

·

max

{

2cak

(

1

+

cak

),

2cbk

(

1

+

cbk

),

1

+ (

cak

+

cbk

)

2

} +

O

(δ

k2

).

From Theorem2.3, we have the upper bounds of

{

M−_k1

2

|

k

=

0

,

1

, . . .}

dependent only on

φ

0

(

z

)

, or from (9), only on A, B and Q . Using the fact that

Ak

and

Bk

are also bounded in [13], the errors

δ

Ak,

δ

Bk,

δ

Pkand

δ

Qkthen pass into

δ

Ak+1,

δ

Bk+1,

δ

Pk+1and

δ

Qk+1, creating errors of the same order. Note that ignoring the higher terms simplifies the error equations, without altering the conclusions of the discussion. The fact that Ak

,

Bk

→

0, or cak

,

cbk

→

0, will contribute towards diminishing the errors.

4. Computational issues

4.1. Residual and convergence control

For the convergence control in Algorithm 1, we should compute residuals and differences of iterates carefully. Note that it is much easier to compute the smaller analogous quantities in Algorithm 2.

Consider the differences of successive iterates:

dQk

≡

Qk+1

−

Qk

=

Fb

(

Rqk

−

Rq,k+1

)

GaH

,

dPk

≡

Pk+1

−

Pk

=

Fa

(

Rp,k+1

−

Rpk

)

GHb

,

implying that

dQk

=

Rqk

−

Rq,k+1

,

dPk

=

Rpk

−

Rp,k+1

in 2- or F-norm, because the Fs and Gs are unitary by choice. The computations of

dQk

and

dPk

can be achieved in about 8rarb

ˆ

rab(see [11, p. 254]) and 4rarbflops for 2-norm and F-norm, respectively, where

ˆ

rab

=

max

{

ra

,

rb

}.

Similarly, we have the residual rk

≡

R

(

Qk

)

of the NME, the corresponding relative residual equals rk

≡

rk qk

+

mk (38) with qk

≡

Q

−

Qk

=

Qk

,

mk

≡

BQk−1A

.

(39)

With the low-rank forms in (17), we then have

rk

= −

FbRqkGaH

+

FbRbGHb

(

Q

−

FbRqkGHa

)

−1FaRaGHa

,

qk

=

FbRqkGaH

,

mk

=

FbRbGbH

(

Q

−

FbRqkGHa

)

− 1_F

aRaGHa

.

After applying the SMWF, using the notation in (21), we have the efficient formulae

rk

=

−

Rqk

+

Rb Tba

+

Tbb

(

Irb

−

RqkTab

)

− 1_R qkTaa Ra

,

qk

=

Rqk

,

mk

=

Rb Tba

+

Tbb

(

Irb

−

RqkTab

)

− 1_R qkTaa Ra

.

(40)

(14)

The computation of relative residual requires about 6r_a2rb

+

4rarb2

+

8 3r

3

bflops, assuming the Ts are available, and using F-norm in (40). If we use 2-norm, then an additional 12rarb

ˆ

rabflops are required. Notice that the computation of relative residual in Algorithm 2 requires8₃r_a3

+

4r_a2rb

+

2rarb2flops.

On the relation between residuals and actual errors in the computed solutions, please consult [27,29].

4.2. Operation and memory counts

In Algorithms 1 and 2, the dominant calculations of O

(

n

)

computational complexity and memory requirement are in the pre-processing, with help from the structure of Q . We shall assume that cqn flops are required in the solution of Qv

=

r or QHv

=

r, with v

,

r

∈ C

n.

For the pre-processing, the cost of 2

(

cq

+

ra

+

rb

)(

ra

+

rb

)

n flops is made up of the following: (1) compute Q−1U and VHQ−1

(

U

=

Fa

,

Fband V

=

Ga

,

Gb

)

, requiring 2cq

(

ra

+

rb

)

n flops; and (2) compute VHQ−1U (U

=

Fa

,

Fband V

=

Ga

,

Gb), requiring 2

(

ra

+

rb

)

2n flops.

There is also a memory requirement for

(

ra

+

rb

)(

ra

+

rb

+

2

)

n variables. In addition, there may be up to 8

(

r_a2

+

r_b2

)

n flops required for the orthogonalization of Fu

,

Gv(u

,

v

=

a

,

b) [11, p. 250], if required and dependent on the exact structures in A and B.

After the pre-processing of data, it is easily seen that the memory requirement is about O

((

ra

+

rb

)

2

)

variables, per iteration in both Algorithms 1 and 2. The memory requirement of each iteration is much smaller than

(

ra

+

rb

)(

ra

+

rb

+

2

)

n (the memory required in the pre-processing). Hence, we do not need to count the memory requirement for each iteration in Algorithms 1 and 2. In Algorithm 2, we need to compute Tu,v(u

,

v

=

a

,

b) as in (30), which requires 2ra3

+

2rb3

+

2rarb

(

ra

+

rb

)

flops, before starting the iteration.

The detailed operation counts for the kth iteration in the SDA for large-scale NMEs of Algorithms 1 and 2 are summarized in Tables1and2below, respectively, with all the kernels formed explicitly. Only the dominant counts are recorded and the F-norm is applied. When B

=

A.and Q is Hermitian, the workload and memory requirement will be halved.

Table 1

Operation counts for the kth iteration in Algorithm 1 (SDA_ls).

Computation Flops RmkT 4rarb(ra+rb) (I−RmkT)−1RmkT 83(ra+rb)3 Wuv k (u,v=a,b) 2(ra+rb)3 Ra,k+1, Rb,k+1 4(ra3+r3b) Rq,k+1, Rp,k+1 4rarb(ra+rb) rk, qk, mk 6ra2rb+4rar2b+ 8 3r 3 b Total 26₃r3 a+28r2arb+26rarb2+ 34 3r 3 b Table 2

Operation counts for the kth iteration in Algorithm 2 (SDA_ls).

Computation Flops Tab,k Tba,k 2r2arb Tba,k Tab,k 2rarb2 Taa,k+1, Tba,k+1 143r 3 a+2rarb(ra+rb) Tbb,k+1, Tab,k+1 143r 3 b+2rarb(ra+rb) rk, qk, mk 83r 3 a+4r2arb+2rar2b Total 22₃r3 a+10r2arb+8rar2b+ 14 3r 3 b

(15)

5. Numerical examples

All the numerical experiments were conducted using MATLAB 2010a on an iMac with a 2.97 GHz Intel i7 processor and 8 Gigabyte RAM. To measure the accuracy of a computed stabilizing solution X of NME (1), we use the relative and absolute residuals:

RRes

≡

X

+

BX

−1_A

₋

_Q

X

−

Q

+

BX−1_A

,

ARes

≡

X

+

BX−

1_A

₋

_Q

_.

₍₄₁₎

Suppose that Rq

=

Rqk∗ and Y

=

Tba,k∗ are the computed kernels by Algorithms 1 and 2, re-spectively, and k_∗is the required number of iterations for convergence. From (38)–(40), we obtain that the relative and absolute residuals of the computed stabilizing solution X

=

Q

−

FbRq,GHa are RRes

=

rk_∗

/(

qk_∗

+

mk_∗

)

and ARes

=

rk_∗, respectively, where rk_∗, qk_∗ and mk_∗are defined in (40). From (27)–(30), it is easily seen that the relative and absolute residuals of the computed stabilizing solution X

=

Q

−

FbYGHa are RRes

=

rk_∗

/(

qk_∗

+

mk_∗

+

tab

)

and ARes

=

rk_∗, respectively, where rk_∗,

qk_∗, mk_∗and tabare given in Algorithm 2. Furthermore, we use“Time 1” and “Time 2”to represent the

executiontimes for the pre-processing data and SDA, respectively.

Example 1. In order to report the actual errors in the computed solution, we consider an example with exact solution. Suppose that A

=

iD and B

=

iDHwhere D

∈ C

n×n with ra

=

rb

=

3. We randomly generate RD

∈ C

ra×ra,Fa

,

Ga

∈ C

n×ra and setFa

:=

Fa

(

FaHFa

)

−

1

2_{, G}_a

:=

_G_a

(

_GH aGa

)

−

1 2 and RD

=

RD

/(

4

RD

2

)

. Assume thatD

=

FaRDGHa, then A and B have the full-rank decompositions A

=

FaRaGaHand B

=

FbRbGHb where

Gb

=

Fa

,

Fb

=

Ga

,

Ra

=

iRD

,

Rb

=

iRHD

.

Let H

∈ C

n×ra_{and set H}

=

_H

(

_HH_H

)

−12_{. Assume that}

Xe

=

i

(

In

−

0

.

5HHH

)

(42)

is the exact solution of NME. Then X_e−1

= −

i

(

In

+

HHH

)

and Q

=

iQI

=

i

(

In

−

0

.

5HHH

+

GaRDHFaH

(

In

+

HHH

)

FaRDGHa

)

=

i

(

In

−

0

.

5HHH

+

GaRHDRDGaH

+

GaRHDFaHHHHFaRDGHa

).

Since

D

2

=

RD

2

=

1

/

4 and

σ

min

(

QI

)

σ

min

(

In

−

0

.

5HHH

) =

1

/

2, it is easily seen that

ψ(

z

) =

zDH

+

QI

+

z−1D is positive definite for each z

∈ T

and

ρ(

Xe−1A

) <

1. That is, Xein (42) is the stabilizing solution of X

+

BX−1A

=

Q .

We compute the stabilizing solution X with n

=

102, 5

×

102, 103and 5

×

103, respectively, by using Algorithm 1 with positive tolerance

=

10−10. The numerical results are shown in Table3. Example 2. We first consider an example associated with the computation of Green function in nano research [13,16].

Table 3

(Example 1) Numbers of iterations, absolute residuals, relative residuals and X−Xe 2.

n 102 ₅_×₁₀2 ₁₀3 ₅_×₁₀3

# iterations (k∗) 5 5 5 5

ARes 1.46×10−17 1.75×10−17 1.82×10−17 1.39×10−17 RRes 6.48×10−17 ₇_.₈₆_×₁₀−17 ₈_.₂₈_×₁₀−17 ₆_.₃₅_×₁₀−17 X−Xe 2 4.01×10−17 1.11×10−16 1.11×10−16 1.11×10−16

(16)

Table 4

(Example 2) Numbers of iterations, absolute residuals, relative residuals andexecutiontimes in seconds.

n 102 ₁₀3 ₁₀4 ₁₀5 ₁₀6 ₁₀7 # iterations (k∗)6 6 6 6 7 6 ARes 1.85×10−16 ₂_.₃₆_×₁₀−16 ₂_.₇₁_×₁₀−16 ₂_.₀₇_×₁₀−16 ₂_.₅₂_×₁₀−16 ₁_.₈₃_×₁₀−16 RRes 7.16×10−17 ₈_.₁₄_×₁₀−17 ₈_.₉₂_×₁₀−17 ₇_.₁₆_×₁₀−17 ₆_.₉₆_×₁₀−17 ₆_.₆₀_×₁₀−17 Time 1 1.65×10−3 3.65×10−3 1.57×10−2 1.24×10−1 9.94×10−1 133.16 Time 2 5.06×10−3 ₇_.₀₅_×₁₀−3 ₇_.₁₆_×₁₀−3 ₇_.₇₆_×₁₀−3 ₈_.₆₂_×₁₀−4 ₇_.₄₁_×₁₀−3 Table 5

(Example 2) O(n)Computational Complexity.

n 1×106 ₂_×₁₀6 ₃_×₁₀6 ₄_×₁₀6 ₅_×₁₀6 ₆_×₁₀6 # iterations (k∗)7 6 6 6 6 6 ARes 2.52×10−16 1.98×10−16 2.46×10−16 1.85×10−16 2.10×10−16 1.89×10−16 RRes 6.96×10−17 ₇_.₁₉_×₁₀−17 ₉_.₈₆_×₁₀−17 ₆_.₇₆_×₁₀−17 ₅_.₉₅_×₁₀−17 ₇_.₂₆_×₁₀−17 Time 1 0.994 1.89 2.81 3.75 4.70 5.69 Time 2 8.62×10−4 ₅_.₀₈_×₁₀−3 ₆_.₂₇_×₁₀−3 ₈_.₁₇_×₁₀−3 ₅_.₁₆_×₁₀−4 ₈_.₄₇_×₁₀−4

With ra

=

3, and rb

=

5 and we randomly generate Ra

∈ C

ra×ra, Rb

∈ C

rb×rb, Fa

,

Ga

∈ C

n×raand

Fb

,

Gb

∈ C

n×rband set Fu

:=

Fu

(

FuHFu

)

− 1 2_{, G}_u

:=

_G_u

(

_GH uGu

)

− 1 2 _(u

=

_a

,

_{b). Then F}_u_{, G}_u_(u

=

_a

,

_{b) are}

unitary. Recall that A and B have the forms

A

=

FaRaGHa

,

B

=

FbRbGbH

.

Let Q be the tridiagonal matrix of dimension n with 2 on the main diagonal and

−

1 on the two adjacent diagonals. Choose a suitable

∈ R

and set Q

:=

Q

+

i

I such that A

,

B and Q satisfy the solvability

condition which is given in Theorem2.1, i.e.,

ψ(λ) =

I

+ λ

D

+ λ

−1DH

>

0

,

for all

λ ∈ T,

(43) where D

= (

A

−

BH

)/(

2i

)

. In the following numerical experiments, we choose

=

5. We compute the stabilizing solution X with n

=

102, 103, 104, 105, 106and 107, by using Algorithm 1 with the positive tolerance

=

10−10. The numerical results are shown in Table4. Somehow, when the size of the problem jumped from n

=

106to 107, the memory requirement crossed some critical boundary concerning virtual memory on the computer. The amount ofexecution timejumped 134-folds, instead of the previous 1.9- to 8-folds when n was increased 10-folds. Anyway, Algorithm 1 was successful in solving the associated NMEs to machine accuracy (in terms of residuals), for n

=

102 to 106, in reasonably quickexecution times. This is consistent with the predicted O

(

n

)

computational complexity and memory requirement for the SDA.

Tosee the O

(

n

)

computational complexity more clearly, we construct Table5for n

=

j

×

106

(

j

=

1

:

6

)

, with the corresponding execution time equals approximately to j seconds.

The numerical results from Algorithm 2 is similar. Example 3.

Next we consider a small example associated with the surface acoustic wave simulation [18,19]. Here we have