• 沒有找到結果。

Solving large-scale nonlinear matrix equations by doubling

N/A
N/A
Protected

Academic year: 2021

Share "Solving large-scale nonlinear matrix equations by doubling"

Copied!
19
0
0

加載中.... (立即查看全文)

全文

(1)

Contents lists available atSciVerse ScienceDirect

Linear Algebra and its Applications

journal homepage:w w w . e l s e v i e r . c o m / l o c a t e / l a a

Solving large-scale nonlinear matrix equations by doubling

Peter Chang-Yi Weng

a

, Eric King-Wah Chu

a,

, Yueh-Cheng Kuo

b

,

Wen-Wei Lin

c

aSchool of Mathematical Sciences, Building 28, Monash University, Vic 3800, Australia bDepartment of Applied Mathematics, National University of Kaohsiung, Kaohsiung 811, Taiwan cDepartment of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan

A R T I C L E I N F O A B S T R A C T

Article history:

Received 22 November 2011 Accepted 7 August 2012

Available online 19 September 2012 Submitted by P.Semrlˇ AMS classification: 15A24 65F50 Keywords: Doubling algorithm Green’s function Krylov subspace Leaky surface wave Nano research

Nonlinear matrix equation Surface acoustic wave

We consider the solution of the large-scale nonlinear matrix equa-tion X+BX−1AQ = 0, with A,B,Q,X ∈ Cn×n, and in some applications B = A. (. = or H). The matrix Q is assumed to be nonsingular and sparse with its structure allowing the solu-tion of the corresponding linear system Qv = r in O(n) computa-tional complexity. Furthermore, B and A are respectively of ranks ra,rbn. The type 2 structure-preserving doubling algorithm by Lin and Xu (2006) [24] is adapted, with the appropriate applica-tions of the Sherman–Morrison–Woodbury formula and the low-rank updates of various iterates. Two resulting large-scale doubling algorithms have an O((ra+rb)3)computational complexity per it-eration, after some pre-processing of data in O(n)computational complexity and memory requirement, and converge quadratically. These are illustrated by the numerical examples.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

Consider the nonlinear matrix equation (NME)

R

(

X

) ≡

X

+

BX−1A

Q

=

0 (1)

with A

,

B

,

Q

,

X

∈ C

n×n. We assume that Q is nonsingular with structures, like being banded or sparse, allowing the solution of the corresponding linear system Qv

=

r in O

(

n

)

computational complexity.

∗Corresponding author.

E-mail addresses:[email protected](P.C.-Y. Weng),[email protected](E.K.-w. Chu),[email protected](Y.-C. Kuo), [email protected](W.-W. Lin).

0024-3795/$ - see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.laa.2012.08.008

(2)

We further assume that A

,

B are respectively of ranks ra

,

rb



n. These NMEs arise in the solution of palindromic eigenvalue problems, with applications in the computation of Green’s function in nano research [12–14,16] and surface acoustic simulations [18,19]; for the individual models, structures of the particular NMEs and their solvability conditions, please consult these references. For the surface acoustic wave application in [18,19], we have Q

=

Q

∈ C

n×n and B

=

A

∈ C

n×n. In some applications as in [18,19], selected eigenvalues from the pencils

λ

X

A or

λ

B

X are required, after

the solution X to (1) is found.

We shall adapt the structure-preserving doubling algorithm (SDA) of type 2 [13,16,24] for the NME in (1), resulting in an efficient large-scale doubling algorithm (SDA_ls). The original SDA is usually attributed to Anderson [1],1 as an accelerated variant of the direct functional iteration method. It has recently been revitalized and further developed in [5–7], for a great variety of applications [9]. Recently, we have extended the SDA (of type 1) for large-scale algebraic Riccati equations (AREs) [21,22,28] and the associated linear equations [23], with the resulting algorithms possessing an efficient O

(

n

)

computational complexity and memory requirement per iteration. We shall extend these methods to NMEs, based on similar philosophy. Notice the important difference in the large-scale NME, from large-scale AREs, that the solution X is nonsingular and not numerically low-ranked. (We shall see later that X is a numerically low-ranked update of the nonsingular Q .) Interestingly, the essential steps of compression and truncation of Krylov bases for large-scale AREs are not required for NMEs. Also, the SDA_ls for large-scale NMEs is of a more efficient O

((

ra

+

rb

)

3

)

computational complexity per iteration, as compared to O

(

n

)

for AREs. The overall algorithm shares the O

(

n

)

computational complexity and memory requirement because of the pre-processing of data.

Similar techniques in this paper are applicable to the cyclic reduction method in [25] for X

±

A.X−1A

Q

=

0 (.

= ,

H; denoting the transpose and the Hermitian).

2. Preliminaries

In this section, we shall introduce some notations, briefly describe the solvability condition for NME (1) and give some preliminary results. Throughout this paper, we denote the unit circle in the complex plane by

T

. For a matrix A

∈ C

n×n,

σ (

A

)

and

ρ(

A

)

denote respectively the spectrum and spectral radius of A, and

σ

max

(

A

)

and

σ

min

(

A

)

are respectively the maximum and minimum singular values of

A. The conjugate transpose and transpose of A are denoted by AHand Arespectively. We can write

A

=

AR

+

iAI, where the Hermitian matrices

AR

=

1 2

(

A

+

A H

),

A I

=

1 2i

(

A

A H

)

are called the real part and the imaginary part of A, respectively. For Hermitian matrices A1, A2

∈ C

n×n, we use A1

>

A2(A1

A2) to denote the fact that A1

A2is positive definitive (positive semi-definite).

The NME in (1) can be reformulated as

R

(

X

) =

X

+ (

CH

+

iDH

)

X−1

(

C

+

iD

) − (

QR

+

iQI

) =

0

,

where C

12

(

A

+

BH

)

, D

2i1

(

A

BH

)

and QR, QIare the real part and the imaginary part of Q , respectively. Let

ψ(

z

) =

zDH

+

QI

+

z−1D (2)

be a rational matrix-valued function. The following is a consequence of the solvability results from [13] and the proof can be found in [14], after superficial modifications.

1 In [2, p. 149], we have the quotation “Doubling algorithms have been part of the folklore associated with Riccati equations in

linear systems problems for some time. We are unable to give any original reference containing material close to that presented here.” This quotation from Anderson, to whom the SDA is widely attributed, indicates the uncertain origin of the method.

(3)

Theorem 2.1. Let A

=

C

+

iD, B

=

CH

+

iDHand Q

=

QR

+

iQIbe n

×

n complex matrices. Suppose

that

ψ(

z

)

defined in (2) is positive definite for each z

∈ T

.

(i) The matrix polynomial P

(

z

) =

z2B

zQ

+

A has exactly n eigenvalues each inside and outside

T

. (ii) The NME (1) has a solution X

=

XR

+

iXIwith

ρ(

X−1A

) <

1 and XI

>

0.

Note that if X is solution of NME, then P

(

z

) = (

zBX−1

I

)

X

(

zI

X−1A

)

. So the eigenvalues of

X−1A are the n eigenvalues of P

(

z

)

, and the eigenvalues of BX−1are the reciprocals of remaining n eigenvalues of P

(

z

)

. A solution X of NME is said to be stabilizing if

ρ(

X−1A

) <

1. From Theorem2.1, if

ψ(

z

) >

0 for each z

∈ T

, then the NME and its dual

X

+

AX−1B

=

Q (3)

have stabilizing solutions.

Remark 2.1. Theorem2.1gives a sufficient condition for the existence of stabilizing solutions of NMEs. For applications in the computation of the surface Greens function in nano research [13,16] , we have

QI

=

I, D

=

0 and C, QRbeing real, and it is easy to check that the sufficient condition holds. For the surface wave application in [18,19], we have C, D, QRand QIbeing real such that

ψ(

z

) >

0 for each

z

∈ T

. For the special case where C

,

QR

=

0, if

ψ(

z

) >

0 for each z

∈ T

, then

X

+ (

iDH

)

X−1

(

iD

) =

iQI (4)

has a stabilizing solution XSwith XS,I

>

0,

ρ(

XS−1

(

iD

)) <

1 and

ρ((

iDH

)

XS−1

) <

1. We shall show that the real part of XSis zero.Consider the nonlinear matrix equation

X

+ (

iDH

)

X−1

(

iD

) = −

iQI

.

(5)

Itis obvious that

XSand XSHare solutions with

ρ(−

XS1

(

iD

)) <

1 and

ρ(

XSH

(

iD

)) <

1, i.e.,

XS and XSHare stabilizing solutions. Since the stabilizing solution of (5) is unique, XSH

= −

XS. Hence, XS has only imaginary part, i.e., XS

=

iXS,I. Since XS,I

>

0, substituting XS

=

iXS,Iinto (4) implies that

X

+

DHX−1D

=

QIhas a Hermitian positive definite solution XS,I. This coincides with the result in [10]. Under the assumption that

ψ(

z

) >

0 for each z

∈ T

, we know that the NME and its dual have stabilizing solutions X and X, respectively. The corresponding SDA of type 2 [13,16,24] has the form

A0

=

A

,

B0

=

B

,

Q0

=

Q

,

P0

=

0

;

Ak+1

=

Ak

(

Qk

Pk

)

−1Ak

,

Bk+1

=

Bk

(

Qk

Pk

)

−1Bk

;

Qk+1

=

Qk

Bk

(

Qk

Pk

)

−1Ak

,

Pk+1

=

Pk

+

Ak

(

Qk

Pk

)

−1Bk

.

(6) From [13, Theorem 3.1], all the iterates are well-defined (i.e., Mk

Qk

Pkare always invertible),

Qk

X, Q

Pk

X and A k

,

Bk

0, all quadratically as k

→ ∞

. In the following, we shall give an upper bound of

{

Mk1

2

:

k

=

0

,

1

, . . .}

For the case where B

=

AHand Q

=

QH, an upper bound has been given in [3, Theorem 15].

The following lemma is useful to estimate the upper bound.

Lemma 2.2. Let

=

R

+

iI

∈ C

n×n, whereRandIare the real and imaginary parts of. Suppose

thatIis positive (negative) definite.

(i) is invertible and −1

=

−1

(4)

withI 1R

(

I

+

RI1R

)

−1being Hermitian and

−(

I

+

RI 1R

)

−1being Hermitian

negative (positive) definite.

(ii) IfI



I (I

−

I) for some

 >

0, then

σ

min

(



)





, i.e.,

−1

2



−1.

Proof.

(

i

)

For eachunitvector x

∈ C

n,sinceR,Iare Hermitian andIis positive (negative) definite,

we have

x

2

= (

R

+

iI

)

x

2

xH

(

R

+

iI

)

x

2

xHIx

2

σ

min

(

I

).

(8) Hence,is invertible. From

(

R

+

iI

)[

I−1R

(

I

+

RI−1R

)

−1

i

(

I

+

RI 1R

)

−1

]

= (

R

+

iI

)(

I 1R

iI

)(

I

+

RI 1R

)

−1

= (

I

+

RI 1R

)(

I

+

RI 1R

)

−1

=

I

,

we obtain (7). SinceIis positive (negative) definite andR

=

HR, it is easy to see that

−(

I

+

RI 1R

)

−1 is Hermitian negative (positive) definite. Finally, we show that I 1R

(

I

+

RI 1R

)

−1is Hermitian. From (7), we have

[

−1

I R

(

I

+

RI−1R

)

−1

i

(

I

+

RI 1R

)

−1

](

R

+

iI

) =

I

.

It follows thatI 1R

(

I

+

RI 1R

)

−1I

− (

I

+

RI 1R

)

−1R

=

0. SinceRandIare Hermitian, we deduce thatI 1R

(

I

+

RI 1R

)

−1is Hermitian.

(

ii

)

IfI



I (I

−

I) for some

 >

0 then from (8), we have

σ

min

(



) =

min

x =1

x

2

.

Hence,

−1

2

= σ

min

(



)

−1



−1. 

Suppose that

ψ(

z

) >

0 for each z

∈ T

. Let

Mk

=

Qk

Pk

, φ

k

(

z

) = −

Bkz

+

Mk

Akz−1

,

(9) where Ak, Bk, Qk, Pkare generated by SDA (6). From [13], we know that Mkis invertible. Then

φ

k

(

z

)

Mk−1

φ

k

(−

z

) = (−

Bkz

+

Mk

Akz−1

)(

Mk−1Bkz

+

I

+

Mk1Akz−1

)

= −

BkMk1Bkz2

+ (

Mk

BkMk−1Ak

AkMk−1Bk

) −

AkMk−1Akz−2

= −

Bk+1z2

+

Mk+1

Ak+1z−2

= φ

k+1

(

z2

).

(10) Let

φ

0,R

(

z

)

and

φ

0,I

(

z

)

be the real and imaginary parts of

φ

0

(

z

)

, respectively. Since, for z

∈ T

,

φ

0

(

z

) = φ

0,R

(

z

) +

i

φ

0,I

(

z

), φ

0,I

(

z

) = ψ(−

z

) >

0

.

(11) Lemma2.2implies that

φ

0

(

z

)

is invertible for z

∈ T

. It follows from (10) that for k

=

0

,

1

, . . .

,

φ

k

(

z

)

is invertible for z

∈ T

. Let

ϕ

k

(

z

) = φ

k

(

z

)

−1for z

∈ T

and

(5)

where

ϕ

k,R

(

z

)

and

ϕ

k,I

(

z

)

are the real and imaginary parts of

ϕ

k

(

z

)

. Taking the inverse of (10) yields

ϕ

k+1

(

z2

) = φ

k+1

(

z2

)

−1

= φ

k

(−

z

)

−1Mk

φ

k

(

z

)

−1

= φ

k

(−

z

)

−1 

φ

k

(

z

) + φ

k

(−

z

)

2 

φ

k

(

z

)

−1

=

1 2

k

(

z

)

−1

+ φ

k

(−

z

)

−1

] =

1 2

k

(

z

) + ϕ

k

(−

z

)].

(13) From (11), (12) and Lemma2.2, we have

ϕ

0,I

(

z

) = −(φ

0,I

(

z

) + φ

0,R

(

z

0,I

(

z

)

−1

φ

0,R

(

z

))

−1

<

0 for

z

∈ T

. It follows from (13) that for each k,

ϕ

k,I

(

z

) <

0 for z

∈ T

and max

z∈T

σ

max

k,R

(

z

))

 maxz∈T

σ

max

0,R

(

z

)) ≡ σ

max,R

,

max

z∈T

σ

max

k,I

(

z

))

 maxz∈T

σ

max

0,I

(

z

)) ≡ σ

max,I

,

min

z∈T

σ

min

k,I

(

z

))

 minz∈T

σ

min

0,I

(

z

)) ≡ σ

min,I

>

0

.

(14) For z

∈ T

, we then have

ϕ

k,R

(

z

)

2

σ

max,R

, ϕ

k,I

(

z

)

2

σ

max,I

, ϕ

k,I

(

z

)

−1

2

σ

min−1,I

.

(15) The following theorem gives upper bounds of

Mk

2and

Mk−1

2for k

=

0

,

1

, . . .

Theorem 2.3. Let A

=

C

+

iD, B

=

CH

+

iDHand Q

=

QR

+

iQIbe given such that

ψ(

z

)

is positive for

each z

∈ T

. Let Mk

=

Qk

Pk, where Qkand Pkare generated by the SDA in (6). Then

Mk

2 

σ

2 max,R

+ σ

min2 ,I

σ

2 min,I

,

Mk1

2

σ

max,I

+

σ

2 max,R

σ

min,I

,

where

σ

max,R,

σ

max,Iand

σ

min,Iare given in (14), which are only dependent on

φ

0

(

z

)

for z

∈ T

. Proof. For each k

=

0

,

1

, . . .

, from (15), we have that for each z

∈ T

,



σ

max,R

σ

min,I 

ϕ

k,I

(

z

)



ϕ

k,R

(

z

)

 

σ

max,R

σ

min,I 

ϕ

k,I

(

z

),

0

< σ

min,II

−ϕ

k,I

(

z

) − ϕ

k,R

(

z

k,I

(

z

)

−1

ϕ

k,R

(

z

)

 

σ

max,I

+

σ

2 max,R

σ

min,I  I

.

Let

φ

k,R

(

z

)

and

φ

k,I

(

z

)

be the real and imaginary parts of

φ

k

(

z

)

, respectively. From Lemma2.2

(

i

)

and using the fact that

φ

k

(

z

) = ϕ

k

(

z

)

−1, we have for z

∈ T

,



σ

max,R

σ

2 min,I  I

φ

k,R

(

z

)





σ

max,R

σ

2 min,I  I

,

σ

−1 min,II

φ

k,I

(

z

)

 

σ

max,I

+

σ

2 max,R

σ

min,I −1 I

.

(6)

Since Mk

= [φ

k

(

z

) + φ

k

(−

z

)]/

2

= [φ

k,R

(

z

) + φ

k,R

(−

z

)]/

2

+

i

k,I

(

z

) + φ

k,I

(−

z

)]/

2, we have

Mk

2    

σ

max,R

σ

2 min,I 2

+ σ

−2 min,I

=



σ

2 max,R

+ σ

min2 ,I

σ

2 min,I and Mk,I

=

1 2i

(

Mk

M H k

) =

φ

k,I

(

z

) + φ

k,I

(−

z

)

2  

σ

max,I

+

σ

2 max,R

σ

min,I −1 I

.

It follows from Lemma2.2

(

ii

)

that

Mk1

2

σ

max,I

+ σ

max2 ,R

min,I

.



For the special case C

,

QR

=

0, it follows from (14) and Theorem2.3that

Mk

2

σ

min−1,I and

Mk−1

2

σ

max,I. This coincides with the results in [3, Theorem 15].

3. Large-scale doubling algorithm

3.1. Main ideas

From [21–23,28], the main ideas behind the algorithm for large-scale problems are:

(a) The appropriate application of the Sherman–Morrison–Woodbury formula (SMWF) in order to avoid the inversion of large or unstructured matrices.

(b) The use of low-rank updates for various iterates.

(c) The computation of matrix operators (Ak) recursively, to preserve the corresponding sparsity or low-rank structures, instead of forming them explicitly.

(d) The careful organization of convergence control in the algorithm, so as to preserve the low computational complexity and memory requirement per iteration.

For the SDA for large-scale NMEs, we shall see that (c) is not relevant.

Let A, B, Q

∈ C

n×nbe given such that

ψ(

z

)

defined in (2) is positive definite for each z

∈ T

and A,

B be respectively of ranks ra, rb



n. Assume the full-rank decompositions

A

=

FaRaGHa

,

B

=

FbRbGbH (16)

with Ra

∈ C

ra×ra and Rb

∈ C

rb×rb. Without loss of generality, we shall assume that Fa, Fb, Gaand

Gbare unitary. In this paper, we shall call some matrices “kernels”, mostly denoted by R with various subscripts (Y is also used in Section3.3). Most of our computation will be done in terms of kernels.

From Theorem2.1, we know that NME (1) and its dual (3) have stabilizing solutions X and X,

respectively. The SDA for NME and its dual has the form in (6) which requires the inverse of Mk

=

Qk

Pk. It is shown in [13] that Mkis invertible for each k

=

0

,

1

, . . .

Furthermore, an upper bound of

{

Mk1

2

|

k

=

0

,

1

, . . .}

is given in Theorem2.3. Similar to the approach in [21,23], we shall apply the SMWF:

(

A

±

U CV

)

−1

=

A−1

A−1U

(

I

±

CV A−1U

)

−1 CV A−1

to various inverses of matrices in sparse-plus-low-rank (splr) form, enabling the computation of large inverses of size n in terms of much smaller matrices. In the following lemma, we show that the small size matrix

(

I

±

CV A−1U

)

is invertible provided that A and

(

A

±

U CV

)

are invertible.

(7)

Lemma 3.1. If A and

(

A

±

U CV

)

are invertible then

(

I

±

CV A−1U

)

is invertible.

Proof. Suppose that A and

(

A

±

U CV

)

are invertible. Then ⎡ ⎣ A

±

U CV 0 0

I⎦ is invertible. Since A is invertible and ⎡ ⎣I

U 0 I ⎤ ⎦ ⎡ ⎣ A

±

U CV 0 0

I ⎤ ⎦ ⎡ ⎣ I 0

CV I ⎤ ⎦

=

⎡ ⎣ A U CV

I ⎤ ⎦

=

⎡ ⎣ I 0 CV A−1

I ⎤ ⎦ ⎡ ⎣ A 0 0 I

±

CV A−1U ⎤ ⎦ ⎡ ⎣I A−1U 0 I ⎤ ⎦

,

we have shown that

(

I

±

CV A−1U

)

is invertible. 

3.2. Algorithm 1

For k

=

0

,

1

, . . .

, we can organize the SDA so that the iterates have the recursive forms

Ak

=

FaRakGHa

,

Bk

=

FbRbkGHb

;

Qk

=

Q

FbRqkGHa

,

Pk

=

FaRpkGHb

;

(17)

with Rak

∈ C

ra×ra, Rbk

∈ C

rb×rband Rpk

,

RHqk

∈ C

ra×rb. The general forms in (17) can be verified easily from (6), when identifying the updating formulae in (23) and the initail values in (25). Note that we can equivalently formulate the SDA in terms of Ak, Bk, Pkand the new variable Qk

Q

Qk, with the symmetry in the low-ranked Pkand Qk. Note also that the row and column spaces of all these matrices remain constant, with only the various kernels Rs varying with k. Also, Qkare low rank updates of Q . (For the nano research application in [13,16], this corresponds to the behaviour that only upper right corner in Akand the lower right corner of Qkare changing for different k.)

We require the inverse of Mk

=

Qk

Pkin the SDA in (6). From (17), we have

Mk

=

Q

− [

Fa

,

Fb

]

Rmk

[

Ga

,

Gb

]

H

,

(18) where Rmk

⎡ ⎣ 0 Rpk Rqk 0 ⎤ ⎦

∈ C

(ra+rb)×(ra+rb). Applying the SMWF to M−1 k yields Mk1

=

Q−1

+

Q−1

[

Fa

,

Fb

]

Nk

[

Ga

,

Gb

]

HQ−1 (19) with Nk

≡ (

Ira+rb

RmkT

)

− 1R mk (20) and T

=

⎡ ⎣Taa Tab Tba Tbb ⎤ ⎦

≡ [

Ga

,

Gb

]

HQ−1

[

Fa

,

Fb

].

(21)

(8)

Remark 3.1. Since Q and Mkare invertible, it follows from Lemma3.1that Ira+rb

RmkT

(

k

=

0

,

1

, . . .)

are invertible. In the following, we shall give upper bounds of

I

RmkT

2and

(

I

RmkT

)

−1

2for k

=

0

,

1

, . . .

when

[

Ga

,

Gb

]

and

[

Fa

,

Fb

]

are of full column rank. Assume

σ

min

([

Ga

,

Gb

]) = σ

min,G

>

0

and

σ

min

([

Fa

,

Fb

]) = σ

min,F

>

0. Since Ga, Gb, Faand Fbare unitary, we obtain

[

Ga

,

Gb

]

2

2

,

[

Fa

,

Fb

]

2

2

,

[

Ga

,

Gb

]

2

=

1

min,G

, [

Fa

,

Fb

]

2

=

1

min,F

,

(22)

where

(·)

†denotes the pseudoinverse of a matrix. From (18), (21) and (22), we have

Rm,k

2

(

Mk

2

+

Q

2

)/(σ

min,F

σ

min,G

)

and

T

2 2

Q−1

2. Hence, we obtain

I

+

RmkT

2 1

+

2

Q−1

2

σ

min,F

σ

min,G

(

Mk

2

+

Q

2

).

From (19) and (22), we have

Nk

2

(

Q

2

Mk−1

2

+

1

)

Q

2

/(σ

min,F

σ

min,G

)

. Using the fact that

(

I

RmkT

)

−1

(

I

RmkT

) =

I, it follows that

(

I

RmkT

)

−1

=

NkT

+

I and hence

(

I

RmkT

)

−1

2 1

+

2

Q

2

Q−1

2

σ

min,F

σ

min,G

(

Q

2

Mk−1

2

+

1

).

With (17) and (19), the SDA in (6) now becomes the updating formulae:

Ra,k+1

=

RakWkaaRak

,

Rb,k+1

=

RbkWkbbRbk

;

Rq,k+1

=

Rqk

+

RbkWkbaRak

,

Rp,k+1

=

Rpk

+

RakWkabRbk

;

(23)

where Wkuv

GHuMk−1Fv

(

u

,

v

=

a

,

b

)

. From (19), we obtain ⎡

Wkaa Wkab

Wkba Wkbb

=

T

+

TNkT

.

(24)

The computation requires about263

(

r3a

+

rb3

)+

22rarb

(

ra

+

rb

)

flops for each iteration (the detailed count is given in Table1), with the help of (19), after the pre-processing in O

(

n

)

complexity for quantities like Q−1U, QHV

(

U

=

Fa

,

Fband V

=

Ga

,

Gb

)

in (20) and T in (21).

For initial values, we have the obvious

Ra0

=

Ra

,

Rb0

=

Rb

;

Rp0

,

Rq0

=

0

.

(25)

In [21–23,28], the SDA of type 1 has been extended for large-scalarStein/Lyapunov and AREs equa-tions. The iterates Akin the SDA of type 1 are computed recursively, without being forming explicitly. As the SDA of type 2 in (6) now translates to the updating formulae (23) in

C

ru×rv(u

,

v

=

a

,

b), this previously important aspect (c) in Section3.1of the SDA_ls is now irrelevant.

The SDA_ls for an NME (and its dual), realizes the iteration in (6) with the help of (17), (19) and (23), the initial values in (25), and the convergence control in Section4.1. We summarize the SDA_ls in Algorithm 1 below.

(9)

Algorithm 1 (SDA_ls)

Input

: A

=

FaRaGHa

,

B

=

FbRbGHb

,

Q

∈ C

n×n, positive tolerance



;

Output

: Rp

,

RHq

∈ C

ra×rb, with Q

FbRqGHa and Q

FaRpGHb,

approximating, respectively, the solutions X and X to the

large-scale NME (1) and its dual (3);

Orthogonalize

Fa

,

Fb

,

Ga

,

Gb, modify Ra

,

Rb; (if required) Setk

=

0, r0

=

2



; Ra0

=

Ra, Rb0

=

Rb; Rp0

,

Rq0

=

0;

(as in (25))

Do

until convergence:

If

the relative residual

˜

rk

= |

rk

/(

qk

+

mk

)| < 

,

Set

Rq

=

Rqkand Rp

=

Rpk;

Exit

End If

Compute

Wkuv

=

GuHMk1Fv

(

u

,

v

=

a

,

b

)

, (as in (20) and (24))

Ra,k+1

=

RakWkaaRak,

Rb,k+1

=

RbkWkbbRbk,

Rq,k+1

=

Rqk

+

RbkWkbaRak, and

Rp,k+1

=

Rpk

+

RakWkabRbk; (as in (23))

Compute

k

k

+

1, rk, qkand mk; (as in (40) in Section4.1)

End Do

3.3. A non-symmetric discrete-time algebraic Riccati equation

From Section3.2, Q

Qkand PkHare low-ranked with small rb

×

rakernels. It suggests that the solution X, or its kernel Y , can be characterized by a matrix equation of lower dimensions, as stated in the following lemma. We shall not elaborate on the similar result for the solution X of dual equation.

Lemma 3.2. Let X be a solution of the large-scale NME (1). Then there exists Y

∈ C

rb×rasuch that

X

=

Q

FbYGHa

.

(26)

Proof. Suppose that X is a solution of NME (1). Substituting (16) into (1) and setting Y

=

RbGHbX−1FaRa, we obtain (26). 

We shall now show that Y in (26) satisfies an uncommon non-symmetric discrete-time algebraic Riccati equation (NARE_D). Note that the standard non-symmetric algebraic Riccati equations (NARE) [28] are of continuous-time type in most literature.

Substituting (16) and (26) into (1), we have

(10)

Multiplying FbHand Gafrom the left and the right of (27), respectively, and applying the SMWF, we obtain

Y

+

RbGbH  Q−1

+

Q−1FbY

(

I

GaHQ−1FbY

)

−1GaHQ−1  FaRa

=

0

.

(28)

With the notation in (21), (28) is equivalent to

Y

+

RbTbaRa

+

RbTbbY

(

I

TabY

)

−1TaaRa

=

0

,

or, the NARE_D

D

(

Y

) ≡ −

Y

+

TbbY

(

I

TabY

)

−1 Taa

+

Tba

=

0

,

(29) where

Taa

TaaRa

,

Tbb

RbTbb

,

Tba

RbTbaRa

,

Tab

Tab

.

(30) In [15], an NARE is first processed by Cayley transform to the form (29) and then doubling is applied. We shall refer to this approach (without the Cayley transform) as Algorithm 2.

Algorithm 2 (SDA_ls)

Input

: A

=

FaRaGHa

,

B

=

FbRbGHb

,

Q

∈ C

n×n, positive tolerance



;

Output

: Y

,

YH

∈ C

rb×ra, with Q

F

bYGHa and Q

FaYGbH, approximating, respectively, the solutions X and X to the

large-scale NME (1) and its dual (3);

Orthogonalize

Fa

,

Fb

,

Ga

,

Gb, modify Ra

,

Rb; (if required) Compute Taa, Tbb, Tab, Tba; (as in (30)) tba

=

Tba

;

Setk

=

0,r

˜

0

=

2



; Taa,0

=

Taa, Tbb,0

=

Tbb, Tba,0

=

Tba, Tab,0

=

Tab;

Do

until convergence:

If

the relative residual

˜

rk

= |

rk

/(

qk

+

mk

+

tba

)| < 

,

Set

Y

=

Tba,kand Y

=

Tab,k;

Exit

End If

Compute

Taa,k+1

=

Taa,k

(

Ir a

Tab,k Tba,k

)

−1 Taa,k, Tbb,k+1

=

Tbb,k

(

Irb

Tba,k Tab,k

)

− 1 T bb,k, Tab,k+1

=

Tab,k

+

Taa,k Tab,k

(

Ir b

Tba,k Tab,k

)

− 1 T bb,k, and Tba,k+1

=

Tba,k

+

Tbb,k Tba,k

(

Ira

Tab,k Tba,k

)

− 1 T aa,k;

Compute

k

k

+

1, rk

=

D

(

Tba,k

)

, qk

=

Tba,k

and mk

=

Tbb Tba,k

(

I

Tab Tba,k

)

−1 Taa

;

End Do

Notice that the computation of Algorithm 2 can be realized as: computing Tu,v(u

,

v

=

a

,

b) in (21) requires O

(

n

)

computational complexity; computing Tu,v(u

,

v

=

a

,

b) in (30) requires 2ra3

+

2rb3

+

2rarb

(

ra

+

rb

)

flops; for each iteration, it requires223ra3

+

10ra2rb

+

8rarb2

+

143rb3flops. The detailed count

(11)

is given in Table2. For the case ra

=

rb, Algorithm 2 requires 8ra3

+(

30ra3

)

k flops, where k is the number of iterations, after the pre-processing in O

(

n

)

computational complexity. Algorithms 1 and 2 all need to compute Tu,v(u

,

v

=

a

,

b) in (21) which required O

(

n

)

computational complexity. From Table1, we obtain that Algorithm 1 requires 74ra3flops per iteration when ra

=

rb. When ra

n1/3



n, Algorithm 2 will be more efficient than Algorithm 1. However, we guarantee that all iterations in Algorithm 1 are well-defined, but the same for Algorithm 2 cannot be guaranteed. After each iterative step, Rqkin Algorithm 1 and Tba,kin Algorithm 2 are different but they converge to the same Y . From our numerical experience, for a given example, the two algorithms converge in the same number of iterations, possibly the result of Theorem3.3below.

Suppose that X

=

Q

FbYGHa and X

=

Q

FaY GHb are the stabilizing solutions of NME and its dual, respectively. We shall show below, when all iterations of Algorithm 2 are well-defined then Tba,k

Y and Tab,k

Y as k

→ ∞

. Let M

=

⎡ ⎣ TaaRa 0

RbTbaRa Irb ⎤ ⎦

,

L

=

⎡ ⎣Ira

Tab 0 RbTbb ⎤ ⎦

∈ C

(ra+rb)×(ra+rb)

.

(31) Suppose that (29) has a solution Y

∈ C

rb×ra. Then Y satisfies that

M ⎡ ⎣Ira Y ⎤ ⎦

=

L ⎡ ⎣Ira Y⎦ S

,

(32) where S

= (

Ira

TabY

)

−1TaaRa

∈ C

ra×ra.

Theorem 3.3. Let X

=

Q

FbYGHa and X

=

Q

FaY GHb be the stabilizing solutions of, respectively, NME (1) and its dual (3). Then

(i) X−1A and S

= (

Ira

TabY

)

− 1T

aaRahave the same nonzero eigenvalues.

(ii) X−1B and S

= (

Irb

TbaY

)

− 1T

bbRbhave the same nonzero eigenvalues.

Proof.

(

i

)

From (16), we have X−1A

= (

Q

FbYGHa

)

−1FaRaGHa. Applying the SMWF to

(

Q

FbYGHa

)

−1 yields

X−1A

= (

Q−1Fa

+

Q−1FbY

(

Ira

G H

aQ−1FbY

)

−1GHaQ−1Fa

)

RaGHa

.

(33) Let G

= [

Ga

,

Ga

]

be a unitary matrix. Multiply GHand G from the left and right of (33), respectively, we obtain GHX−1AG

=

⎡ ⎣Z 0 . 0 ⎤ ⎦

,

where Z

=

 Taa

+

TabY

(

Ira

TabY

)

− 1T aa  Ra

= (

Ira

TabY

)

− 1T aaRa

=

S

.

Hence, X−1A and S

= (

Ira

TabY

)

− 1T

aaRahave the same nonzero eigenvalues. The proof of

(

ii

)

is similar to the proof of

(

i

)

. 

As mentioned before, we may require the eigenvalues of X−1A or X−1B in some applications, and

Theorem3.3provides an efficient route to the nonzero parts of these spectra via the much smaller matrices S and S.

(12)

Suppose that X

=

Q

FaY GHb is a stabilizing solution of (3). Then the kernel Y

∈ C

ra×rb and S

= (

Irb

TbaY

)

− 1T bbRbsatisfy ⎡ ⎣ TbbRb 0

RaTabRb Ira ⎤ ⎦ ⎡ ⎣Irb Y ⎤ ⎦

=

⎡ ⎣Irb

Tba 0 RaTaa ⎤ ⎦ ⎡ ⎣Irb Y⎦S

.

Let Y

=

Ra1Y Rb−1

,

S

=

RbSRb1

.

(34) It is easily seen that

M ⎡ ⎣ Y Irb⎦ S

=

L ⎡ ⎣ Y Irb ⎤ ⎦

,

(35)

where M and L are defined in (31). It follows from (32), (34), (35) and Theorem3.3that the matrix pencil

λ

L

M has raeigenvalues inside the unit circle and rbeigenvalues outside the unit circle. Similar to the theory in [5,15], if the matrix sequences

{

Taa,k

}

,

{

Tbb,k

}

,

{

Tab,k

}

and

{

Tba,k

}

generated by Algorithm 2 are well-defined, then we have

Tba,k

Y



O

(ρ(

S

)

2 k

ρ(

S

)

2k

),

T ab,k

Y



O

(ρ(

S

)

2 k

ρ(

S

)

2k

),

(36) where S

= (

Ira

TabY

)

−1TaaRaand S is defined in (34) with

ρ(

S

) <

1 and

ρ(

S

) <

1. For Algorithm 1, it has been shown in [13] that

Rq,k

Y

= (

Q

FbRq,kGaH

) − (

Q

FbYGHa

) 

O

(ρ(

X−1B

)

2 k

ρ(

X−1A

)

2k

),

Rp,k

Y

= (

Q

FaRp,kGbH

) − (

Q

FaY GHb

) 

O

(ρ(

X−1B

)

2 k

ρ(

X−1A

)

2k

),

(37)

where

{

Rq,k

}

and

{

Rp,k

}

are generated by Algorithm 1. From (36), (37) and Theorem3.3, it is easily seen that Algorithms 1 and 2 converge in the same number of iterations.

It is intriguing, that we started from an NME associated with the SDA of type 2 and ended up with an equivalent NARE_D associated with the SDA of type 1. Similar links between NMEs and AREs have been considered before. In [10], an NME has been transformed to a discrete-time ARE when B

=

A∗. The transformation of an NARE into a unilateral quadratic matrix polynomial (UQME), which was then solved by the SDA of type 2, has recently been studied in [4].

3.4. Errors in SDA_ls

It is easy to see from (6) that errors in the iterates will propagate through the SDA. Let

δ

Ak,

δ

Bk,

δ

Pk and

δ

Qkbe the errors in Ak, Bk, Pkand Qk, respectively. From (6), with k

≡ δ

Qk

− δ

Pk, and ignoring higher order terms, we have

δ

Ak+1

≈ δ

AkMk−1Ak

AkMk−1 kMk1Ak

+

AkMk−1

δ

Ak

,

δ

Bk+1

≈ δ

BkMk−1Bk

BkMk−1 kMk1Bk

+

BkMk−1

δ

Bk

,

δ

Pk+1

≈ δ

Pk

+ δ

AkMk−1Bk

AkMk−1 kMk−1Bk

+

AkMk−1

δ

Bk

,

δ

Qk+1

≈ δ

Qk

− δ

BkMk−1Ak

+

BkMk−1 kMk−1Ak

BkMk1

δ

Ak

.

(13)

With

δ

k

max

{ δ

Ak

, δ

Bk

, δ

Pk

, δ

Qk

}

, cak

Mk1

Ak

and cbk

Mk−1

Bk

, we have

δ

k+1

 δ

k

·

max

{

2cak

(

1

+

cak

),

2cbk

(

1

+

cbk

),

1

+ (

cak

+

cbk

)

2

} +

O

k2

).

From Theorem2.3, we have the upper bounds of

{

Mk1

2

|

k

=

0

,

1

, . . .}

dependent only on

φ

0

(

z

)

, or from (9), only on A, B and Q . Using the fact that

Ak

and

Bk

are also bounded in [13], the errors

δ

Ak,

δ

Bk,

δ

Pkand

δ

Qkthen pass into

δ

Ak+1,

δ

Bk+1,

δ

Pk+1and

δ

Qk+1, creating errors of the same order. Note that ignoring the higher terms simplifies the error equations, without altering the conclusions of the discussion. The fact that Ak

,

Bk

0, or cak

,

cbk

0, will contribute towards diminishing the errors.

4. Computational issues

4.1. Residual and convergence control

For the convergence control in Algorithm 1, we should compute residuals and differences of iterates carefully. Note that it is much easier to compute the smaller analogous quantities in Algorithm 2.

Consider the differences of successive iterates:

dQk

Qk+1

Qk

=

Fb

(

Rqk

Rq,k+1

)

GaH

,

dPk

Pk+1

Pk

=

Fa

(

Rp,k+1

Rpk

)

GHb

,

implying that

dQk

=

Rqk

Rq,k+1

,

dPk

=

Rpk

Rp,k+1

in 2- or F-norm, because the Fs and Gs are unitary by choice. The computations of

dQk

and

dPk

can be achieved in about 8rarb

ˆ

rab(see [11, p. 254]) and 4rarbflops for 2-norm and F-norm, respectively, where

ˆ

rab

=

max

{

ra

,

rb

}.

Similarly, we have the residual rk

R

(

Qk

)

of the NME, the corresponding relative residual equals rk

rk qk

+

mk (38) with qk

Q

Qk

=

Qk

,

mk

BQk−1A

.

(39)

With the low-rank forms in (17), we then have

rk

= −

FbRqkGaH

+

FbRbGHb

(

Q

FbRqkGHa

)

−1FaRaGHa

,

qk

=

FbRqkGaH

,

mk

=

FbRbGbH

(

Q

FbRqkGHa

)

− 1F

aRaGHa

.

After applying the SMWF, using the notation in (21), we have the efficient formulae

rk

=



Rqk

+

Rb  Tba

+

Tbb

(

Irb

RqkTab

)

− 1R qkTaa  Ra

,

qk

=

Rqk

,

mk

=

Rb  Tba

+

Tbb

(

Irb

RqkTab

)

− 1R qkTaa  Ra

.

(40)

(14)

The computation of relative residual requires about 6ra2rb

+

4rarb2

+

8 3r

3

bflops, assuming the Ts are available, and using F-norm in (40). If we use 2-norm, then an additional 12rarb

ˆ

rabflops are required. Notice that the computation of relative residual in Algorithm 2 requires83ra3

+

4ra2rb

+

2rarb2flops.

On the relation between residuals and actual errors in the computed solutions, please consult [27,29].

4.2. Operation and memory counts

In Algorithms 1 and 2, the dominant calculations of O

(

n

)

computational complexity and memory requirement are in the pre-processing, with help from the structure of Q . We shall assume that cqn flops are required in the solution of Qv

=

r or QHv

=

r, with v

,

r

∈ C

n.

For the pre-processing, the cost of 2

(

cq

+

ra

+

rb

)(

ra

+

rb

)

n flops is made up of the following: (1) compute Q−1U and VHQ−1

(

U

=

Fa

,

Fband V

=

Ga

,

Gb

)

, requiring 2cq

(

ra

+

rb

)

n flops; and (2) compute VHQ−1U (U

=

Fa

,

Fband V

=

Ga

,

Gb), requiring 2

(

ra

+

rb

)

2n flops.

There is also a memory requirement for

(

ra

+

rb

)(

ra

+

rb

+

2

)

n variables. In addition, there may be up to 8

(

ra2

+

rb2

)

n flops required for the orthogonalization of Fu

,

Gv(u

,

v

=

a

,

b) [11, p. 250], if required and dependent on the exact structures in A and B.

After the pre-processing of data, it is easily seen that the memory requirement is about O

((

ra

+

rb

)

2

)

variables, per iteration in both Algorithms 1 and 2. The memory requirement of each iteration is much smaller than

(

ra

+

rb

)(

ra

+

rb

+

2

)

n (the memory required in the pre-processing). Hence, we do not need to count the memory requirement for each iteration in Algorithms 1 and 2. In Algorithm 2, we need to compute Tu,v(u

,

v

=

a

,

b) as in (30), which requires 2ra3

+

2rb3

+

2rarb

(

ra

+

rb

)

flops, before starting the iteration.

The detailed operation counts for the kth iteration in the SDA for large-scale NMEs of Algorithms 1 and 2 are summarized in Tables1and2below, respectively, with all the kernels formed explicitly. Only the dominant counts are recorded and the F-norm is applied. When B

=

A.and Q is Hermitian, the workload and memory requirement will be halved.

Table 1

Operation counts for the kth iteration in Algorithm 1 (SDA_ls).

Computation Flops RmkT 4rarb(ra+rb) (IRmkT)−1RmkT 83(ra+rb)3 Wuv k (u,v=a,b) 2(ra+rb)3 Ra,k+1, Rb,k+1 4(ra3+r3b) Rq,k+1, Rp,k+1 4rarb(ra+rb) rk, qk, mk 6ra2rb+4rar2b+ 8 3r 3 b Total 263r3 a+28r2arb+26rarb2+ 34 3r 3 b Table 2

Operation counts for the kth iteration in Algorithm 2 (SDA_ls).

Computation Flops Tab,k Tba,k 2r2arb Tba,k Tab,k 2rarb2 Taa,k+1, Tba,k+1 143r 3 a+2rarb(ra+rb) Tbb,k+1, Tab,k+1 143r 3 b+2rarb(ra+rb) rk, qk, mk 83r 3 a+4r2arb+2rar2b Total 223r3 a+10r2arb+8rar2b+ 14 3r 3 b

(15)

5. Numerical examples

All the numerical experiments were conducted using MATLAB 2010a on an iMac with a 2.97 GHz Intel i7 processor and 8 Gigabyte RAM. To measure the accuracy of a computed stabilizing solution X of NME (1), we use the relative and absolute residuals:

RRes

X

+

BX

−1A

Q

X

Q

+

BX−1A

,

ARes

X

+

BX

1A

Q

.

(41)

Suppose that Rq

=

Rqkand Y

=

Tba,k∗ are the computed kernels by Algorithms 1 and 2, re-spectively, and kis the required number of iterations for convergence. From (38)–(40), we obtain that the relative and absolute residuals of the computed stabilizing solution X

=

Q

FbRq,GHa are RRes

=

rk

/(

qk

+

mk

)

and ARes

=

rk, respectively, where rk, qk and mkare defined in (40). From (27)–(30), it is easily seen that the relative and absolute residuals of the computed stabilizing solution X

=

Q

FbYGHa are RRes

=

rk

/(

qk

+

mk

+

tab

)

and ARes

=

rk, respectively, where rk,

qk, mkand tabare given in Algorithm 2. Furthermore, we use“Time 1” and “Time 2”to represent the

executiontimes for the pre-processing data and SDA, respectively.

Example 1. In order to report the actual errors in the computed solution, we consider an example with exact solution. Suppose that A

=

iD and B

=

iDHwhere D

∈ C

n×n with ra

=

rb

=

3. We randomly generate RD

∈ C

ra×ra,Fa

,

Ga

∈ C

n×ra and setFa

:=

Fa

(

FaHFa

)

1

2, Ga

:=

Ga

(

GH aGa

)

1 2 and RD

=

RD

/(

4

RD

2

)

. Assume thatD

=

FaRDGHa, then A and B have the full-rank decompositions A

=

FaRaGaHand B

=

FbRbGHb where

Gb

=

Fa

,

Fb

=

Ga

,

Ra

=

iRD

,

Rb

=

iRHD

.

Let H

∈ C

n×raand set H

=

H

(

HHH

)

−12. Assume that

Xe

=

i

(

In

0

.

5HHH

)

(42)

is the exact solution of NME. Then Xe−1

= −

i

(

In

+

HHH

)

and Q

=

iQI

=

i

(

In

0

.

5HHH

+

GaRDHFaH

(

In

+

HHH

)

FaRDGHa

)

=

i

(

In

0

.

5HHH

+

GaRHDRDGaH

+

GaRHDFaHHHHFaRDGHa

).

Since

D

2

=

RD

2

=

1

/

4 and

σ

min

(

QI

)



σ

min

(

In

0

.

5HHH

) =

1

/

2, it is easily seen that

ψ(

z

) =

zDH

+

QI

+

z−1D is positive definite for each z

∈ T

and

ρ(

Xe−1A

) <

1. That is, Xein (42) is the stabilizing solution of X

+

BX−1A

=

Q .

We compute the stabilizing solution X with n

=

102, 5

×

102, 103and 5

×

103, respectively, by using Algorithm 1 with positive tolerance

 =

10−10. The numerical results are shown in Table3. Example 2. We first consider an example associated with the computation of Green function in nano research [13,16].

Table 3

(Example 1) Numbers of iterations, absolute residuals, relative residuals and XXe 2.

n 102 5×102 103 5×103

# iterations (k∗) 5 5 5 5

ARes 1.46×10−17 1.75×10−17 1.82×10−17 1.39×10−17 RRes 6.48×10−17 7.86×10−17 8.28×10−17 6.35×10−17 XXe 2 4.01×10−17 1.11×10−16 1.11×10−16 1.11×10−16

(16)

Table 4

(Example 2) Numbers of iterations, absolute residuals, relative residuals andexecutiontimes in seconds.

n 102 103 104 105 106 107 # iterations (k∗)6 6 6 6 7 6 ARes 1.85×10−16 2.36×10−16 2.71×10−16 2.07×10−16 2.52×10−16 1.83×10−16 RRes 7.16×10−17 8.14×10−17 8.92×10−17 7.16×10−17 6.96×10−17 6.60×10−17 Time 1 1.65×10−3 3.65×10−3 1.57×10−2 1.24×10−1 9.94×10−1 133.16 Time 2 5.06×10−3 7.05×10−3 7.16×10−3 7.76×10−3 8.62×10−4 7.41×10−3 Table 5

(Example 2) O(n)Computational Complexity.

n 1×106 2×106 3×106 4×106 5×106 6×106 # iterations (k∗)7 6 6 6 6 6 ARes 2.52×10−16 1.98×10−16 2.46×10−16 1.85×10−16 2.10×10−16 1.89×10−16 RRes 6.96×10−17 7.19×10−17 9.86×10−17 6.76×10−17 5.95×10−17 7.26×10−17 Time 1 0.994 1.89 2.81 3.75 4.70 5.69 Time 2 8.62×10−4 5.08×10−3 6.27×10−3 8.17×10−3 5.16×10−4 8.47×10−4

With ra

=

3, and rb

=

5 and we randomly generate Ra

∈ C

ra×ra, Rb

∈ C

rb×rb, Fa

,

Ga

∈ C

n×raand

Fb

,

Gb

∈ C

n×rband set Fu

:=

Fu

(

FuHFu

)

− 1 2, Gu

:=

Gu

(

GH uGu

)

− 1 2 (u

=

a

,

b). Then Fu, Gu(u

=

a

,

b) are

unitary. Recall that A and B have the forms

A

=

FaRaGHa

,

B

=

FbRbGbH

.

Let Q be the tridiagonal matrix of dimension n with 2 on the main diagonal and

1 on the two adjacent diagonals. Choose a suitable

∈ R

and set Q

:=

Q

+

i

I such that A

,

B and Q satisfy the solvability

condition which is given in Theorem2.1, i.e.,

ψ(λ) =

I

+ λ

D

+ λ

−1DH

>

0

,

for all

λ ∈ T,

(43) where D

= (

A

BH

)/(

2i

)

. In the following numerical experiments, we choose

=

5. We compute the stabilizing solution X with n

=

102, 103, 104, 105, 106and 107, by using Algorithm 1 with the positive tolerance

 =

10−10. The numerical results are shown in Table4. Somehow, when the size of the problem jumped from n

=

106to 107, the memory requirement crossed some critical boundary concerning virtual memory on the computer. The amount ofexecution timejumped 134-folds, instead of the previous 1.9- to 8-folds when n was increased 10-folds. Anyway, Algorithm 1 was successful in solving the associated NMEs to machine accuracy (in terms of residuals), for n

=

102 to 106, in reasonably quickexecution times. This is consistent with the predicted O

(

n

)

computational complexity and memory requirement for the SDA.

Tosee the O

(

n

)

computational complexity more clearly, we construct Table5for n

=

j

×

106

(

j

=

1

:

6

)

, with the corresponding execution time equals approximately to j seconds.

The numerical results from Algorithm 2 is similar. Example 3.

Next we consider a small example associated with the surface acoustic wave simulation [18,19]. Here we have

數據

Fig. 1. (Example 3) Sparsity patterns in M 1 and M 2 .

參考文獻

相關文件

Review a high-resolution wave propagation method for solving hyperbolic problems on mapped grids (which is basic integration scheme implemented in CLAWPACK) Describe

Keywords: Interior transmission eigenvalues, elastic waves, generalized eigen- value problems, quadratic eigenvalue problems, quadratic Jacobi-Davidson method..

Polynomial Jacobi Davidson Method for Large/Sparse Eigenvalue Problems..

For an important class of matrices the more qualitative assertions of Theorems 13 and 14 can be considerably sharpened. This is the class of consistly

In this paper, we have studied a neural network approach for solving general nonlinear convex programs with second-order cone constraints.. The proposed neural network is based on

Assume that the partial multiplicities of purely imaginary and unimodular eigenvalues (if any) of the associated Hamiltonian and symplectic pencil, respectively, are all even and

Qi (2001), Solving nonlinear complementarity problems with neural networks: a reformulation method approach, Journal of Computational and Applied Mathematics, vol. Pedrycz,

Abstract In this paper, we consider the smoothing Newton method for solving a type of absolute value equations associated with second order cone (SOCAVE for short), which.. 1