Large-Scale Discrete-Time Algebraic Riccati Equations — Numerically Low-Ranked Solution, SDA and Error Analysis.*

(1)

Numerically Low-Ranked Solution, SDA and Error Analysis

∗

.

Tiexiang Li

†

Yueh-Cheng Kuo

‡

Eric King-wah Chu

§

Wen-Wei Lin

¶

November 23, 2012

Abstract

We consider the solution of large-scale discrete-time algebraic Riccati equations with numerically low-ranked solutions. The structure-preserving doubling algorithm (SDA) will be adapted, with the iterates for A not explicitly computed but in the recursive form Ak=

A2k−1−D (1) k Sk[Dk(2)] > , where D(1)k and D (2)

k are low-ranked with Skbeing small in dimension.

With n being the dimension of the algebraic Riccati equations, the resulting algorithm is of an efficient O(n) computational complexity and memory requirement per iteration, without the need for any inner iterations, and essentially converge quadratically. A detailed error analysis, on the effects of truncation of iterates with an explicit relative error bound for the approximate solution from the SDA, and some numerical results will be presented.

Keywords. Discrete-time algebraic Riccati equation, doubling algorithm, Krylov subspace, large-scale problem, numerically low-ranked solution

AMS subject classifications. 15A24, 65F30, 93C05

1 Introduction

Let the system matrix A be large and sparse, possibly with band structures. The discrete-time algebraic Riccati equation (DARE):

D(X) ≡ −X + A>X(I + GX)−1A + H = 0, (1) with the low-ranked

G = BR−1B>, H = CT−1C>, (2)

∗_{Version of November 23, 2012}

†_{Department of Mathematics,} _{Southeast University,} _Nanjing, _211189, _{People’s Republic of China;}

[email protected]

‡_{Department of Applied Mathematics,} _{National University of Kaohsiung,} _{Kaohsiung 811,} _Taiwan;

[email protected]

§_{School of Mathematical Sciences, Building 28, Monash University, VIC 3800, Australia; [email protected]} ¶_Department _of _Applied _Mathematics, _National _Chiao _Tung _University, _Hsinchu _300, _Taiwan;

[email protected]

(2)

where R ∈ Rm×m, T ∈ Rl×l are positive definite, B ∈ Rn×m, C ∈ Rn×lare orthogonal, m, l n, and (A, B) and (A, C) are stabilizable and detectable, respectively, arises often in linear-quadratic optimal control problems [22, 27].

The solution of DAREs and continuous-time algebraic Riccati equations (CAREs) has been an extremely active area of research; see, e.g., [22, 27]. The usual solution methods such as the Schur vector method, symplectic SR methods, the matrix sign function, the matrix disk function or the doubling method have not made full use of the sparsity and structure in A, G and H. Requiring in general O(n3_{) flops and workspace of size O(n}2_{), they are obviously inappropriate}

for the large-scale problems we are interested in here.

For control problems for parabolic PDEs and the balancing based model order reduction of large linear systems, large-scale CAREs, DAREs, Lyapunov and Stein equations have to be solved; see [2]–[9], [15]–[21], [23, 24, 29, 30]. As stated in [6, 8], “the basic observation on which all methods for solving such kinds of matrix equations are based, is that often the (numerical) rank of the solution is very small compared to its actual dimension and therefore it allows for a good approximation via low rank solution factors”. Importantly, without solving the correspond-ing algebraic Riccati equations, alternative solutions to the optimal control problem require the deflating subspace of the corresponding Hamiltonian matrices or symplectic pencils which are prohibitively expensive to compute.

Benner and his collaborators have done much on large-scale algebraic Riccati equations; see [4]–[8], [29, 30] and the references therein. They built their methods on (inexact) Newton’s methods with inner iterations for the associated Lyapunov and Stein equations. Consult also [1], [16]–[21] on invariant and Krylov subspace methods. We shall adapt the structure-preserving doubling algorithm (SDA) [11, 12, 25], making use of the sparsity in A and the (numerically) low-ranked structures in G, H and X. For other applications of the SDA, see [13].

Main Contributions

Apart from being the first paper on the efficient adaptation of the SDA for the numerical solution of large-scale DAREs, we formalize the discussion on numerical rank, showing constructively when X will be numerically low-ranked. We show how the exponential growth in the rank of the approximate solution is controlled by truncation, while not affecting the convergence of the SDA adversely. For a “reasonable” accuracy tolerance, we show that the adapted SDA has an O(n) computational complexity and memory requirement per iteration. An error analysis, on the effects of truncation is presented, with explicit bounds on the relative forward errors of approximations to X and the dual solution Y of (4) (instead of the usual analysis of residuals). We also show that the SDA is closely related to Krylov subspace and Galerkin methods, benefitting from the power of approximation of the associated Krylov subspaces as well as the diminishing Ak (the

iterates for A in the SDA).

2 Structure-Preserving Doubling Algorithm

The following theorem on the DARE (1) is quoted from [28]:

Theorem 2.1 Suppose that B, C are of full column rank and R, T are positive. Assume that the pairs (A, B) and (A, C) are stabilizable and detectable, respectively. Then the DARE (1) has

(3)

a unique symmetric positive semi-definite solution X with ρ((I + GX)−1A) < 1. Furthermore, if (A, C) is completely reconstructible then the solution X of DARE is positive.

The structure-preserving doubling algorithm (SDA) [12], assuming (In+ GkHk)−1 exist, has

the following form: (starting from A0= A, G0= G and H0= H)

   Gk+1 = Gk+ Ak(In+ GkHk)−1GkA>k, Hk+1 = Hk+ A>kHk(In+ GkHk)−1Ak, Ak+1 = Ak(In+ GkHk)−1Ak. (3)

With X ≥ 0 from Theorem 2.1, we can define η-rank as follows.

Definition 2.2 Let X ∈ Cn×n. For a given numerical rank tolerance η > 0, the η-rank of X, denoted by rankηX, is defined as the lowest rank of bX ∈ Cn×n such that k bX − Xk 6 η.

The following theorem on the SDA has been proved in [10, 12, 25]: Theorem 2.3 Assume that X, Y > 0 satisfy (1) and its dual equation

Y = AY (I + HY )−1A>+ G, (4) where G and H are given in (2), and let S = (I + GX)−1A and ˘S = (I + HY )−1A>. Then the sequences {Ak}, {Gk} and {Hk} generated by the SDA (3) satisfy

(a) Ak= (I + GkX)S2 k ; (b) H 6 Hk6 Hk+16 X and X − Hk 6 (S>)2 k (X + XY X)S2k ; and (c) G 6 Gk 6 Gk+16 Y and Y − Gk 6 ( ˘S>)2 k (Y + Y XY ) ˘S2k.

From the spectral properties of symplectic pencils, we have ρ(S) = ρ( ˘S). Consequently from Definition 2.2 and Theorem 2.3, we prove the following monotonicity results.

Corollary 2.4 Under the hypothesis of Theorem 2.3, for a given numerical rank tolerance η > 0 and if ρ(S) = ρ( ˘S) < 1, then we have

(a) rankη(H) 6 rankη(Hk) 6 rankη(Hk+1) 6 rankη(X),

(b) rankη(G) 6 rankη(Gk) 6 rankη(Gk+1) 6 rankη(Y ),

and there exist kh, kg∈ N such that rankη(X) 6 2khl and rankη(Y ) 6 2kgm.

Proof. From Definition 2.2, assertions (a) and (b) can be obtained directly from Theorem 2.3 (b) and (c). Since ρ(S) < 1, there exists kh ∈ N such that kS2

kh

k2 _{< η/[kXk(1 + kXkkY k)]. From}

Theorem 2.3 (b), we obtain

kX − Hkhk 6 kXk(1 + kXkkY k)kS

2kh_k2_{< η.} ₍₅₎

Since rank(H) = l, it follows from the SDA (3) that rank(Hkh) 6 2

kh_{l. From Definition 2.2 and}

(5), we have rankη(X) 6 2khl. Similarly, there exists kg∈ N such that rankη(Y ) 6 2kgm.

(4)

Definition 2.5 The solution X ∈ Rn×n to the large-scale DARE (1) is said to be numerically low-ranked with respect to the numerical rank tolerance η > 0 if rankη(X) n.

Remark 2.1 Theorem 2.1 shows when the DARE (1) has a unique positive or nonnegative defi-nite solution X. From Corollary 2.4, we know that rankηX, the numerical rank of X with respect

to η > 0, is bounded by 2kh_{l for some constant k}

h. If 2khl n, then the solution X is

numer-ically low-ranked with respect to η. Importantly, we deduce when X is numernumer-ically low-ranked, instead of starting from the corresponding observation on the singular values of X from some given examples.

2.1 Large-Scale SDA

Let B0= B, C0 = C, R0 = R−1 > 0 and T0= T−1> 0, where B, C, R and T are given in (2).

Suppose that {Gk} and {Hk} are generated by SDA (3), with the initial

A0= A, G0= B0R0B0>, H0= C0T0C0>, (6)

it is easily seen that Gk and Hk have the low-ranked forms: (for k = 0, 1, · · · )

Gk= BkRkBk>, Hk= CkTkCk>, (7)

where Bk∈ Rn×mk, Ck∈ Rn×lk, Rk ∈ Rmk×mk and Tk ∈ Rlk×lk.

Applying the Sherman-Morrison-Woodbury formula (SMWF), we have (In+ GkHk)−1 = In− GkCkTk I + Ck>GkCkTk −1 C_k> (8a) = In− Bk I + RkB>kHkBk −1 RkBk>Hk, (8b)

From (8), notice the sparse plus low rank (splr) forms in (In+ GkHk)−1, (In+ GkHk)−1Gk and

Hk(In+ GkHk)−1, generalized slightly from [8, Definition 3.1].

From the first glance, the iteration for A in the SDA in (3) appears doomed, with O(n3₎

operations for the products of full matrices. However, with the low rank and splr forms in (2) and (8), we can organize, for k = 1, 2, · · · , Ak = A2k−1− D

(1) k Sk h D_k(2)i > . Equations in (8) then imply Ak+1 = Ak(In+ GkHk)−1Ak = Ak h In− GkCkTk I + Ck>GkCkTk −1 C_k>iAk = Ak h In− Bk I + RkB>kHkBk −1 RkBk>Hk i Ak = A2_k− D(1)_k+1Sk+1 h D(2)_k+1i > , (9) with D(1)_k+1= AkGkCk, D (2) k+1= A > kCk, Sk+1= Tk I + Ck>GkCkTk −1 , (10a) or D(1)_k+1= AkBk, D (2) k+1= A > kHkBk, Sk+1= I + RkB>kHkBk −1 Rk, (10b)

(5)

still involving O(n3) operations for a dense A. Note that the operation counts can be reduced to O(n) with the assumption that the maximum number of nonzero components in any row or column of A (denoted by nA) is much less than and independent of n (see the flop counts in

Table 1 in Section 2.4). One possibility is not to form Ak explicitly. Note that we have to store

all the Bi, Ci, Ri and Ti for i = 0, 1, · · · , k − 1 to facilitate the multiplication of low-ranked

matrices by Ak or A>k.

The induction proof of the general form of Ak in (9)–(10) can be completed by considering

the initial k = 0 case, which is trivial.

Applying (8) to the SDA, we have bGk+1≡ bBk+1Rbk+1Bb_k+1> and bHk+1≡ bCk+1Tbk+1Cb_k+1> with

b Bk+1= [Bk, AkBk], Cbk+1= [Ck, A>kCk], (11) b Rk+1 = Rk⊕ h Rk− RkB>kCkTk I + Ck>GkCkTk −1 C_k>BkRk i (12a) = Rk⊕ h Rk− I + RkBk>HkBk −1 RkBk>HkBkRk i (12b) ≡ Rk⊕ ˘Rk, and b Tk+1 = Tk⊕ h Tk− TkCk>GkCkTk I + Ck>GkCkTk −1i , (13a) = Tk⊕ h Tk− TkCk>Bk I + RkBk>HkBk −1 RkB>kCkTk i (13b) ≡ Tk⊕ ˘Tk.

Note that bRk, bTk and Sk in (10) are symmetric for all k. Interestingly, the column spaces of bBk

(and bCk) are the Krylov subspaces Kk(A, B) (and Kk(A>, C)); see Section 3.1.

2.2 Truncation of B

k

and C

k

Now we shall consider an important aspect of the SDA for large-scale DAREs (SDA ls) — the growth of Bk and Ck. Obviously, as the SDA converges, low-ranked components, which are

increasing smaller but with higher ranks, are added to Bkand Ck. Apparent from (11), the growth

in the sizes and ranks of these iterates is potentially exponential. To reduce the dimensions of Bk and Ck, which control the dimensions of D

(1) k and D

(2)

k , we shall truncate their columns by

orthogonalization.

Assume without loss of generality that Bk and Ck in (7) are orthogonal. Consider the QR

decompositions with column pivoting:

[Bk, AkBk](I ⊕ Πb) = [Bk, Φk, Φτ,k]   I Γb 1,2 Γb1,3 0 Γb 2,2 Γb2,3 0 0 Γb τ  , (14a) [Ck, A>kCk](I ⊕ Πc) = [Ck, Ψk, Ψτ,k]   I Γc_1,2 Γc_1,3 0 Γc 2,2 Γc2,3 0 0 Γc τ  , (14b)

(6)

where Πb, Πc are permutation matrices, kΓbτk 6 ξb,k and kΓcτk 6 ξc,k. Here, ξb,k and ξc,k are

some tolerances controlling the truncation process. We denote Γu= _Γu 1,2 Γu1,3 Γu2,2 Γu2,3 Π>_u, Γu,τ = [0, Γuτ] Π > u, bΓu= _Γ u Γu,τ , Mu= I 0 Γu , (15) where u = b or c. Let Bk+1= [Bk, Φk], Ck+1= [Ck, Ψk], Rτ_k+1= MbRbk+1Mb>, Tk+1τ = McTbk+1Mc>. (16) From (11)–(16), we then have

b Gk+1= bBk+1Rbk+1Bb_k+1> = Bk+1Rτk+1B > k+1+ 4Gk+1, (17a) b Hk+1= bCk+1Tbk+1Cb_k+1> = Ck+1Tk+1τ C > k+1+ 4Hk+1, (17b) where 4Gk+1= [Bk+1, Φτ,k] " 0 ΓbR˘kΓ>b,τ Γb,τR˘kΓ>b Γb,τR˘kΓ>b,τ # _B> k+1 Φ>_τ,k , (18a) 4Hk+1= [Ck+1, Ψτ,k] 0 ΓcT˘kΓ>c,τ Γc,τT˘kΓ>c Γc,τT˘kΓ>c,τ _C> k+1 Ψ>_τ,k . (18b) Let Gτ_k+1= Bk+1Rτk+1Bk+1> , H τ k+1= Ck+1Tk+1τ Ck+1> . (19)

We now have all the additional components required for the SDA for large-scale DAREs (SDA ls), with the truncation of Bk, Ck and the recursive form of Ak. The superscripts of Tk+1τ and R

τ k+1,

indicating truncated quantities, will later be dropped in Algorithm 1. In other words, we replace the notations T_k+1τ and R_k+1τ defined in (16) by Tk+1 and Rk+1, respectively, in Algorithm 1.

Theorem 2.3 shows that the sequences {Gk} and {Hk} generated by the SDA are monotonically

increasing and bounded above. In the following theorem, we shall prove the monotonicity of the sequences {Gτ

k} and {H τ

k} generated by the SDA ls, where G τ

k = BkRkBk> and H τ

k = CkTkCk>.

To this end, we need the following useful results. Lemma 2.6 Let Gτ

k = BkRkB>k and Hkτ = CkTkCk> with Rk and Tk being positive definite.

Then ˘Rk, ˘Tk in (12), (13) are positive definite and k ˘Rkk 6 kRkk = kGτkk, k ˘Tkk 6 kTkk = kHkτk.

Proof. From (12)–(13), we have ˘

Rk = Rk− RkB>kCkTk(I + Ck>GkCkTk)−1Ck>BkRk = Rk(I + Bk>CkTkCk>BkRk)−1,

˘

Tk = Tk− TkCk>GkCkTk(I + Ck>GkCkTk)−1= Tk(I + Ck>GkCkTk)−1.

Since Rk and Tk are positive definite, we deduce that ˘Rk, ˘Tk are positive definite and ˘Rk6 Rk,

˘

(7)

Theorem 2.7 For given tolerances η, ξb,k and ξc,k, suppose that the sequences {Gτk} and {H τ k}

are generated by the SDA ls and Gτ_k = BkRkBk>, H τ k = CkTkCk>. Then we have (a) Gτ k6 G τ k+1and H τ k 6 H τ k+1;

(b) rankη(Gτk) 6 rankη(Gτk+1) and rankη(Hkτ) 6 rankη(Hk+1τ ); and

(c) kGτ_k+1− Gτ kk = kΓbR˘kΓ>bk and kH τ k+1− H τ kk = kΓcT˘kΓ>ck, where ˘Rk, ˘Tk and Γu (u = b or

c) are given in (12), (13) and (15), respectively.

Proof. Since R0 and T0 are positive definite, from (12), (13), (16) and Lemma 2.6, it is easily

seen that Rk and Tk are positive definite for each k ∈ N by induction. From (12), (13), (15), (16)

and (19), we have Gτ_k+1− Gτ k = Bk+1ΓbR˘kΓ>bB>k+1, H τ k+1− H τ k = Ck+1ΓcT˘kΓ>cCk+1> . (20)

It follows from Lemma 2.6 that Gτ_k_{6 G}τ_k+1and H_kτ_{6 H}_k+1τ . This proves assertion (a). Assertion (b) can be obtained from assertion (a) directly. Since Bk+1 and Ck+1 are orthogonal, we can

obtain assertion (c) from (20).

For given truncation tolerances τg and τh, suppose that ξb,k = τg/(2kAkBkk) and ξc,k =

τh/(2kA>kCkk). In the following, we show that τg and τh are the upper bounds of the relative

truncation errors k bGk+1− Gτk+1k/kGτk+1k and k bHk+1− Hk+1τ k/kHk+1τ k, respectively.

Theorem 2.8 For given truncation tolerances τg and τh, suppose that ξb,k= τg/(2kAkBkk) and

ξc,k= τh/(2kA>kCkk). Then

k bGk+1− Gτk+1k 6 τgkGτk+1k, k bHk+1− Hk+1τ k 6 τhkHk+1τ k, (21)

where Gτ_k+1, H_k+1τ are given in (19), and bGk+1, bHk+1 in (17).

Proof. With kΓb,τk = kΓbτk 6 τg/(2kAkBkk) and kΓc,τk = kΓcτk 6 τh/(2kA>kCkk), it follows

from (14) that kAkBkk = kbΓbk > kΓbk and kA>kCkk = kbΓck > kΓck. From (17)–(18), Lemma 2.6

and Theorem 2.7 (a), the truncation errors satisfy

k bGk+1− Gτk+1k = k4Gk+1k 6 2kΓb,τkkbΓbkk ˘Rkk 6 τgkGτkk 6 τgkGτk+1k,

k bHk+1− Hk+1τ k = k4Hk+1k 6 2kΓc,τkkbΓckk ˘Tkk 6 τhkHkτk 6 τgkHk+1τ k.

Remark 2.2 In the SDA ls with given truncation tolerances τgand τh, we set ξb,k= τg/(2kAkBkk),

ξc,k= τh/(2kA>kCkk). We can then guarantee that the τg and τh are the upper bounds of relative

truncation errors, k4Gk+1k/kGτk+1k and k4Hk+1k/kHk+1τ k, respectively. As an additional

safe-guard, mk and lk (the widths of Bk and Ck respectively) can be restricted to be less than or equal to

their respective maxima mmax and lmax in SDA ls. When mk+1> mmax (or lk+1> lmax) for the

given τg (or τh), we force mk+1= mmax (or lk+1= lmax) and the corresponding error k4Gk+1k

(or k4Hk+1k) can still be computed by (18), and hence the resulting truncation tolerance (on the

left-hand-side) or relative error (on the right-hand-side) are equal:

(8)

2.3 Algorithm and Residuals

Consider the difference of successive iterates, δGk+1 ≡ Gτk+1− G τ

k and δHk+1 ≡ Hk+1τ − H τ k.

From Theorem 2.7 (c), we have kδGk+1k = kΓbR˘kΓ>bk and kδHk+1k = kΓcT˘kΓ>ck, where ˘Rk, ˘Tk

and Γu (u = b or c) are given in (12), (13) and (15), respectively.

Let Hτ

k = CkTkCk> and

Tk+1= Tk− TkCk>B0(I + R0B0>HkτB0)−1R0B0>CkTk,

Λk = A>CkTk+1Ck>A, Cek+1≡Ck, A>Ck .

From the SDA ls process, we have D(Hkτ) = −H τ k + Λk+ H = −CkTkCk>+ A>CkTk+1Ck>A + CT0C> = eCk+1 −Tk+ T0 0 0 0 ⊕ Tk+1 e Ck+1> ≡ eCk+1_Tbk+1Cek+1> .

Because Ck is orthogonal, eCk+1 has the QR decomposition

e Ck+1= [Ck, Θ] I 0 Υ ≡ [Ck, Θ] eΥ.

For the residual rk ≡ kD(Hkτ)k and the relative residualerk≡ rk/(kH

τ

kk + kΛkk + kHk), we have

the efficient formulae: kHτ kk = kTkk, kHk = kT0k, kΛkk = kΥTk+1Υ>k, kD(Hkτ)k = k eΥbTk+1Υe>k. Algorithm 1 (SDA ls) Input: _{A ∈ R}n×n , orthogonal B ∈ Rn×m and C ∈ Rn×l_{, R}−1 _{= R}−>_{∈ R}m×m_, T−1= T−>∈ Rl×l_{, positive tolerances τ}

g, τhand , and mmax, lmax;

Output: τg,, τh,and B∈ Rn×m, R∈ Rm×m, C∈ Rn×l and T∈ Rl×l,

with CTC> (≈ X in (1)) and BRB> (≈ Y in (4)); Set A0= A, D (1) 0 = D (2) 0 = 0 ∈ Rn×l, S0= Il, B0= B, R0= R−1, C0= C, T0= T−1, δ0= 2, τg,= τg, τh,= τhand k = 0; Do until convergence:

If δk< , Set B= Bk, R= Rk, C= Ck and T= Tk; Exit; End If

Compute bBk+1= [Bk, AkBk], bCk+1= [Ck, A>kCk]; b Rk+1= Rk⊕ h Rk− RkBk>CkTk I + Ck>GkCkTk −1 C_k>BkRk i , b Tk+1= Tk⊕ h Tk− TkCk>Bk I + RkBk>HkBk −1 RkBk>CkTk i ; with Ak+1= A2k− D (1) k+1Sk+1 h D_k+1(2) i > , D(1)_k+1= AkGkCk, D (2) k+1= A > kCk, Sk+1= Tk Il+ Ck>GkCkTk −1 ; Compute Bk+1, Ck+1 by truncating bBk+1, bCk+1 with tolerances τg, τh;

modify Rk+1← Rτk+1and Tk+1← Tk+1τ ;

If mk+1> mmax(or lk+1> lmax), Set mk+1← mmax (or lk+1← lmax);

Compute τg,k+1 (or τh,k+1) as in (22), and τg,← max{τg,, τg,k+1}

(or τh,← max{τh,, τh,k+1}); End If

Compute k ← k + 1, δk= max{kδHkk, kδGkk} anderk. End Do

(9)

Equations (9) (used recursively but not explicitly), (10a) (or (10b)), (11), (12a) (or (12b)), (13a) (or (13b)) and (19) (with the superscript (·)τ denoting truncation dropped for simpler notation), together with the corresponding initial condition (6), constitute the SDA for large-scale DAREs (SDA ls). We summarize our method in Algorithm 1, with the particular choice of (9), (10a), (11), (12a), (13b) and (19). We would like to emphasize again that care has to be exercised in Algorithm 1, where multiplications by Ak+1 and A>k+1are carried out recursively

using (9) and (10a) or (10b). Otherwise, computation cannot be carried out in O(n) complexity. Similar care has to be taken in the computation of differences of iterates or residuals (used in Algorithm 1).

2.4 Operation and Memory Counts

Recall that the maximum number of nonzero elements in any row or column in A is bounded by nA, for bounding the number of flops in the second column in Table 1. In the third column,

the number of variables is recorded. Only O(n) operations or memory requirement are included. Note that most of the work is done in the computation of bBk+1and bCk+1, for which AkBk and

A>_kCk have to be calculated recursively, as Akis not available explicitly. We shall use the notation

Nk ≡P k

j=1(lj+ mj), and the operation count for the QR decomposition of an n × r matrix is

4nr2 _{flops [14, p. 250].}

Table 1: Operation and Memory Counts for the kth Iteration in Algorithm 1 (SDA ls)

Computation Flops Memory

b Bk+1, bCk+1 2k+1nA(lk+ mk) + (6lk+ 1)Nk n Nk+1n b Rk+1, bTk+1 2lkmkn O(l2k+ m2k) D(1)_k+1, D(2)_k+1 2l2 kn − S−1_k+1 O(l3_k) O(l_k2) Truncate Bk+1, Ck+1 4(l2k+ m2k)n − Modify Rk+1, Tk+1 O(l3k+ m3k) − ˜ rk+1 4lk2n − Total 2k+1nA(lk+ mk) + (6lk+ 1)Nk+ Nk+1n 2(5l2 k+ 2m2k+ lkmk) n

With lk and mk controlled by the truncation in Section 2.2, the operation count will be

dominated by the calculation of bBk+1 and bCk+1. In our numerical examples in Section 4, the

flop count at the end of Algorithm 1 dominates, with the work in one iteration approximately doubled that of the previous one. This corresponds to the 2k+1 factor in the total flop count.

(10)

3 Errors Analysis

3.1 SDA and Krylov Subspaces

There is an interesting relationship between the SDA and Krylov subspaces. Define the Krylov subspaces

Kk(A, B) ≡

_span{B} _{(k = 0),}

span{B, AB, A2_{B, · · · , A}2k₋₁

B} (k > 0). From (7), (9) and (11), we can see that

span{B0} = span{B} = K0(A, B), span{B1} = span {[B, AB]} = K1(A, B)

and, for some low-ranked F ,

span{B2} = span {[B1, A1B1]}

= spanB, AB, (A2− ABF>)(B, AB) = K2(A, B).

Similarly, it is easy to show that

span{Bk} = Kk(A, B), span{Ck} = Kk(A>, C). (23)

The SDA is closely related to approximating the solution X and the dual solution Y using the Krylov subspaces in (23), with additional components vanishing quadratically. However, for problems of moderate size n, Bk and Ck become full-ranked after a few iterations.

Note that the Krylov subspaces Kk(A±1, B) and Kk(A±>, B) have been used in the

solu-tion of CAREs and Lyapunov equasolu-tions in [15, 17, 18, 19, 20, 21]. From (23) and the Cayley transform from CAREs to DAREs [11], with Aγ ≡ A − γI for some γ, we can see clearly the

appropriate choices of (rational) Krylov subspaces for DAREs and CAREs are Kk(A>, C) and

Kk(A−>γ , A−>γ C) for X, and Kk(A, B) and Kk(A−1γ , A−1γ B) for the dual solution Y , respectively.

The link between the SDA and the Krylov subspaces defined above is important in explaining the fast convergence of the SDA. We used to believe the convergence of the SDA came from the following results from (3):

Gk+1= Gk+ δGk+1, δGk+1≡ Ak(In+ GkHk)−1GkA>k,

Hk+1= Hk+ δHk+1, δHk+1≡ A>kHk(In+ GkHk)−1Ak,

(24) thus kδHk+1k ≤ kAkk2kHk(In+ GkHk)−1k and kδGk+1k ≤ kAkk2k(In+ GkHk)−1Gkk, and the

fact that kAkk → 0 quadratically. However, we understand now the power of approximation

of the Krylov subspaces contributes in (24) itself as well, creating cancellations and diminishing components in δHk+1and δGk+1. This is consistent with numerical results from examples where

the corresponding Ak → 0 slowly but the overall convergence for Hk and Gk is faster.

3.2 Galerkin Methods

Motivated by Section 3.1, another possibility to solve large-scale algebraic Riccati equations (AREs) is the Galerkin method for some given approximating (Krylov) subspaces. Consult [17, 18, 19, 20, 21] for examples of similar methods, with Arnoldi processes constructing the Krylov subspaces.

(11)

Let Z = [z1, · · · , zN] contain a set of normalized orthogonal basis vectors and assume that

e

X ≡ ZΛZ>=PN

i,j=1λijzizj>, with the symmetric Λ = [λij], approximates the solution X to the

DAREs in some sense. As mentioned before, the (Krylov) subspaces spanned by the columns of Z may be the results of an Arnoldi process or the SDA ls.

It is well-known, with the help of SMWF, that D(X) in (1) can be rewritten as D(X) = −X + A>XA − A>XB(R + B>XB)−1B>XA + H.

First, assume that the approximate solution eX ≡ ZΛZ>_{, with Λ ∈ R}N ×N_{, approximates the}

solution X to the DAREs in the sense that the residual D( eX) is small, where

D( eX) = −ZΛZ>+ A>ZΛZ>A − A>ZΛZ>B(R + B>ZΛZ>B)−1B>ZΛZ>A + H

= O(). (25)

This leads to the reduced DARE in Λ of size N , with an O() error: e

D(Λ) = −Λ + eA>Λ eA − eA>Λ eB(R + eB>Λ eB)−1Be>Λ eA + eH = O(), (26) with eA ≡ Z>AZ, eB ≡ Z>B and eH ≡ Z>HZ.

We next extend Z to the orthogonal matrices [Z, e_{Z] ∈ R}n×n. From (25) and (26), we have

[Z, eZ]>D( eX)[Z, eZ] = e D(Λ) O() O() O() = O().

If “adequate” Krylov subspaces are known or have been generated from an Arnoldi process, a solution bΛ to the small reduced DARE (26), eD(Λ) = 0, can be computed, generating an approximate Galerkin solution bX ≡ Z bΛZ to the DARE (1). Consequently, eX and bX are hopefully approximately equal, as they solves equations differed by O(). Note that proving the solvability of (26) is nontrivial from the solvability of (1). Also, the Galerkin method summarized in (26) will be inferior to the SDA ls if the Krylov subspaces are selected inappropriately. Note that many others selected the Krylov subspaces generated arbitrarily by A, its inverse and transpose, by the Arnoldi process. These approaches resulting in subspaces very different from those in (23) by the SDA ls, which also benefitting from the diminishing magnitudes of Ak. This may explain

the slow convergence of some Galerkin or Krylov-subspace methods. Lastly, for a given accuracy of the approximating solution to the DAREs, it is difficult to predict the required dimensions of the Krylov subspaces a priori, creating the need of some outer iterative procedures.

3.3 Errors of SDA ls

In this subsection, we shall write down an error analysis for the SDA ls with truncation. From [12, 25], the SDA is equivalent to a doubling process applied to the associated symplectic matrix pencils. The error analysis, conducted via these matrix pencils, produces some interesting and elegant results.

We start from the equivalence of the DARE (1), for some stable matrix S = (I + GX)−1A, to the symplectic eigenvalue problem

A 0 −H I I X = I G 0 A> I X S.

(12)

The kth iteration in the SDA is then equivalent to Mk I X = Lk I X S2k, Mk≡ Ak 0 −Hk I , Lk≡ I Gk 0 A>_k . (27) The SDA is equivalent to the following doubling action, finding (Mk∗, Lk∗) such that Mk∗Lk=

Lk∗Mk and (Mk+1, Lk+1) = (Mk∗Mk, Lk∗Lk). The effect of this doubling action is the

“dou-bling” (or squaring) of S in Mk+1 I X = Mk∗Mk I X = Mk∗Lk I X S2k = Lk∗Mk I X S2k= Lk∗Lk I X S2k+1= Lk+1 I X S2k+1.

Note, interestingly and importantly, that the stable deflating subspace (corresponding to eigen-values in S2k

) stays the same, spanned by the columns of [I, X>]>. Let Ak, Gkand Hkbe perturbed to Aτk, G

τ kand H

τ

k, respectively, by errors from the truncation

in Section 2.2, round-off or whatever sources. Let

f Mk ≡ Aτ k 0 −Hτ k I , Lek≡ I Gτ k 0 Aτ > k . Suppose that k1 and k2are the errors such that

f Mk I X = eLk I X S2k+ k1 k2 . (28)

Doubling using the perturbed quantities, i.e., fMk∗Lek= eLk∗Mfk with

f Mk∗= Aτ_k(I + Gτ_kH_kτ)−1 0 −Aτ > k Hkτ(I + GτkHkτ)−1 I , Lek∗= I Aτ(I + Gτ_kH_kτ)−1Gτ_k 0 Aτ > k (I + HkτGτk)−1 and c Mk+1≡ " b Ak+1 0 − bHk+1 I # = fMk∗Mfk, Lbk+1≡ " I Gbk+1 0 Ab>_k+1 # = eLk∗Lek. We then have c Mk+1 I X = fMk∗Mfk I X = fMk∗Lek I X S2k+ fMk∗ k1 k2 = eLk∗Mfk I X S2k+ fMk∗ k1 k2 = eLk∗Lek I X S2k+1+ eLk∗ k1 k2 S2k+ fMk∗ k1 k2 = bLk+1 I X S2k+1+ ˆ k+1,1 ˆ k+1,2 , (29)

(13)

where ˆ k+1,1 ˆ k+1,2 = " k1S2 k + Aτ k(I + G τ kH τ k) −1 k1+ Aτk(I + G τ kH τ k) −1_Gτ kk2S2 k k2+ Aτ >k (I + H τ kG τ k)−1k2S2 k − Aτ > k (I + H τ kG τ k)−1H τ kk1 # . (30)

Suppose that 4Gk+1 and 4Hk+1 in (18) are respectively truncated from bGk+1 and bHk+1, i.e.,

b

Gk+1= Gτk+1+ 4Gk+1 and bHk+1= Hk+1τ + 4Hk+1. From (29) and (30), we have

f Mk+1 I X = eLk+1 I X S2k+1+ k+1,1 k+1,2 , (31) where k+1,1 k+1,2 = ˆ k+1,1 ˆ k+1,2 + 4Gk+1XS2 k+1 4Hk+1 . (32)

Let τg and τh be the small truncation tolerances for the relative truncation errors of bGk and

b

Hk respectively, i.e., k4Gkk 6 τgkGkτk and k4Hkk 6 τhkHkτk (c.f. (21)). Suppose that there is

only truncation from the k0th iteration and that the SDA ls converges after kmax iterations (i.e.,

δkmax = max{kδHkmaxk, kδGkmaxk} < , where is some accuracy tolerance). Let

sk ≡ max{kAτk(I + G τ kH τ k)−1k, kA τ > k (I + H τ kG τ k)−1k}, h ≡ kHτ kmaxk, g ≡ kG τ kmaxk, ρ ≡ ρ(S), (33) where ρ ∈ (0, 1) is the spectral radius of S. Note that from Theorem 2.7, we have kGτ_kk 6 g and kHτ

kk 6 h for k = k0, . . . , kmax. Since the sequence {Ak} generated by the SDA in (3) converges

to zero, we expect {sk} to converge to a value near zero. Suppose that the stable matrices S

and ˘S defined in Theorem 2.3 are diagonalizable, with X_S−1SXS and X_S−1˘ SX˘ S˘ being diagonal.

Since ρ(S) = ρ( ˘_{S), we then have kSk 6 κ}sρ and k ˘Sk 6 κ˘sρ, where κs ≡ kXSkkXS−1k > 1 and

κ˘s≡ kX_S˘kkX−1_˘

S k > 1. Let

κ ≡ max{κs, κ˘s} > 1. (34)

Then it is easy to see that kS2k

k 6 κρ2k

and k ˘S2k

k 6 κρ2k

. Since ρ < 1, there exists a smallest k∗∈ N such that κρ2 k∗ 6 1. Denote ξ∗= (1 + κρ2k0_{)(1 + κρ}2k0+1_{) · · · (1 + κρ}2k∗₎ _{if k} 06 k∗, 1 if k∗< k0. (35) We have following useful result.

Lemma 3.1 Let ρ ∈ (0, 1) and κ be constants defined in (33) and (34), respectively. Then

kmax−1 Y i=k0 (1 + κρ2i_{) 6} ξ∗ 1 − ρ2k0, where ξ∗ is given in (35).

(14)

Proof. Let k∗ be the smallest positive integer such that κ∗= κρ2 k∗ 6 1. If k06 k∗, then kmax−1 Y i=k0 (1 + κρ2i_{) 6} ∞ Y i=k0 (1 + κρ2i) = (1 + κρ2k0)(1 + κρ2k0+1) · · · (1 + κρ2k∗) ∞ Y i=k∗+1 1 + κ∗ρ2 i₋₂k∗ 6 ξ∗ ∞ Y i=k∗ 1 + κ∗ρ2 i 6 ξ∗ ∞ Y i=k0 1 + κ∗ρ2 i 6 ξ∗ ∞ X j=0 ρj2k0 ₆ ξ∗ 1 − ρ2k0. Similarly, if k∗< k0 thenQ kmax−1 i=k0 (1 + κρ 2i ) 6 1/(1 − ρ2k0) = ξ∗/(1 − ρ2k0).

From (30) and (32), we obtain kk+1,1k kk+1,2k 6 " sk+ κρ2 k skgκρ2 k skh 1 + skκρ2 k # kk1k kk2k + τggkXkκρ2 k+1 τhh (36) with kk0,1k ≈ τggkXkκρ 2k0

and kk0,2k ≈ τhh, for each k = k0, . . . , kmax. In the following, we

give upper bounds on the relative forward errors in the SDA ls.

Theorem 3.2 Let X and Y be the solutions of DARE (1) and its dual (4). Suppose that the SDA ls computes kmax iterations and executes the truncation procedure of Gk and Hk from k =

k0, . . . , kmaxwith relative truncation tolerances τgand τh as in (21), respectively. Then, we have

H_kτ max− X kHτ kmaxk 6 cξ∗(kmax− k0+ 1) 1 − ρ2k0 " τh+ τggkXkκρ2 k0 h # +kA τ > kmaxXkκρ 2kmax h , (37a) Gτ kmax− Y kGτ kmaxk 6cξ∗(kmax− k0+ 1) 1 − ρ2k0 " τg+ τhhkY kκρ2 k0 g # +kA τ kmaxY kκρ 2kmax g , (37b) where c ≡Qkmax−1

k=k0 max{1, sk(h + 1), sk(g + 1)}, sk, g, h and ρ are defined in (33), and κ and ξ∗

in (34) and (35), respectively.

Proof. For each k = k0, . . . , kmax− 1, we denote Ek = Lk+ ρ2

k κUk where Lk= sk 0 skh 1 , Uk= 1 skg 0 sk . From (36), we have kkmax,1k kkmax,2k 6Ek0 τggkXkκρ2 k0 τhh + Ek0+1 τggkXkκρ2 k0+1 τhh + · · · + Ekmax−1 τggkXkκρ2 kmax−1 τhh + τggkXkκρ2 kmax τhh , (38)

(15)

where Ek= Ekmax−1Ekmax−2· · · Ek, for k = k0, . . . , kmax− 1.

Let `k≡ kLkk1= max{1, sk(1 + h)}, uk ≡ kUkk1= max{1, sk(1 + g)} and

c = kmax−1 Y k=k0 max{`k, uk} = kmax−1 Y k=k0 max{1, sk(h + 1), sk(g + 1)}.

From the definition of Ek and Lemma 3.1, we obtain that for each k > k0,

Ek τggkXkκρ2 k τhh 1 6τhh + τggkXkκρ2 kkmax_Y−1 i=k (`i+ uiκρ2 i ) 6 cτhh + τggkXkκρ2 k0kmax_Y−1 i=k0 (1 + κρ2i) 6 cξ∗ 1 − ρ2k0 τhh + τggkXkκρ2 k0 . It follows from (38) that

kkmax,2k 6 cξ∗(kmax− k0+ 1) 1 − ρ2k0 h τhh + τggkXkκρ2 k0i . Since kS2kmax k 6 κρ2kmax , we have kAτ > kmaxXS 2kmax k 6 kAτ > kmaxXkκρ 2kmax . From (28) and h = kHτ

kmaxk, we obtain (37a).

Similarly, the dual equation (4) can be rewritten as A> 0 −G I I Y = I H 0 A I Y ˘ S, (39)

where ˘S = (I + HY )−1A>, and from (39) we can obtain (37b).

Remark 3.1 In fact, the last term in (37a) is the upper bound for kAτ > kmaxXS 2kmax k/h and kAτ > kmaxXS 2kmax_{k ≈ kA}τ > kmaxH τ kmax(I + G τ kmaxH τ kmax) −1_Aτ kmaxk ≈ δkmax+1.

Thus, δkmax+1 is an important quantity for controlling the relative forward error of X. In the

SDA ls, we use δk < with some accuracy tolerance as the stopping criterion.

Remark 3.2 For the fixed relative truncation tolerances τg and τh, the upper bounds of relative

forward errors are given on the right-hand-side of (37). In the SDA ls, the effective truncation tolerance τg, (or τh,) may be larger than the original tolerance τg (or τh), if we enforce mk 6

mmaxand lk6 lmax. Then the upper bounds of relative forward errors can be obtained from (37)

by replacing τg (or τh) by τg, (or τh,).

For large enough k0near convergence, we have ρ2

k0

≈ 0. Essentially, the truncation tolerances τh and τg controlled the accuracy in Hkτmax and G

τ

(16)

illustrate that this is indeed the case, with the accuracy of H_kτ

max usually slightly better than the

asymptotic bound cξ∗(kmax− k0+ 1)τh in (37a), possibly due to some cancellation of errors, or

the approximation effect of the Krylov subspaces. Note that the expected ill-effect of ρ ≈ 1 in (37). Finally, we would like to remind readers that τg and τh control the relative errors in Gτk

and H_kτ from any source, not just the truncation process.

Notice the interesting fact that the forward errors on the left-hand-sides of (37) have been obtained directly from the SDA via the associated symplectic pencil (Mk, Lk) in (27). This

contrasts with most numerical algorithms that forward errors are difficult to derive, leading to the celebrated approach of backward error analysis from Wilkinson. Of course, ill-conditioning still occurs in DAREs, associated with the eigenvalue problem of (Mk, Lk) and the closeness from

unity of the spectral radius ρ of S.

4 Numerical Experiments

4.1 Test Examples

We have tested the SDA ls on selected numerical examples from [7]. The suite of problems involves continuous-time systems originated from the boundary control problem modeling the cooling of rail sections. The PDE model was semi-discretized using 2D finite elements to a continuous-time linear system with n variables, where n = 1357, 5177, 20209, 79841. We transformed these continuous-time systems (discretized with ∆t = 0.01) to discrete-time models, as in [6], yielding Examples 4.1–4.4 below.

4.2 Numerical Results

All the numerical experiments were conducted using MATLAB [26] Version 2010a on an iMac with a 2.97 GHz Intel i7 processor, 8 Gigabyte RAM and machine accuracy eps = 2.22 × 10−16_.

For the numerical results, we present δk = kδHkk, where δk ≡ {kδHkk, kδGkk}. In

Algo-rithm 1, the stopping criterion is δk < with some accuracy tolerance ; please also consult the

convergence results in Theorem 3.2 and Remark 3.1. Note that δti is the execution time for the

ith iteration and the sub-total execution time tk=P k i=1δti.

In Example 4.1 with the smallest n, we apply the SDA (3) to obtain accurate estimates of the exact solutions X and Y . These were then used to illustrate the results for the η-ranks of X and Y and the relative forward errors. Effects of different combinations of (τg, τh) or (mmax, lmax)

are also presented. Because of the simple relationship between (τg, τh) and the relative forward

errors in Theorem 3.2, it takes only a few experiments to find an acceptable choice of the former. In Examples 4.2–4.4, the iterations in the SDA ls are reported for a corresponding set of (τg, τh). In Tables 4–6 below, k, δk, kHkτk, kGτkk, rk,_erk, mk, lk, δtk and tk are displayed.

Example 4.1 ([7], with n = 1357, m = 7, l = 6, ∆t = 0.01) Although the smallest in the suite of examples on the cooling of steel profiles [7], Example 4.1 is not the easiest in any sense. Somehow, the SDA ls converges faster for the larger examples. In this example, we have performed two tests. Test 1: We first employ the SDA (3) to compute (near-exact approximation to) the solutions X and Y of DARE (1) and its dual (4). The norms (kXk, kY k), residuals (denoted by rX, rY)

(17)

and relative residuals (denoted byr_eX, _erY) are estimated, respectively, as

kXk = 43.39, rX= 3.690 × 10−15, r_eX = 4.250 × 10−17,

kY k = 5.438 × 10−5_, _r

Y = 3.589 × 10−21, r_eY = 3.293 × 10−17.

Table 2 shows the rankη(X) and rankη(Y ) with η = 10−5, 10−7, 10−9, 10−11, 10−13 and 10−15.

Note that the η-ranks of X and Y are much smaller than the matrix size n = 1357.

Table 2: η-rank of X and Y .

η 10−5 10−7 10−9 10−11 10−13 10−15 rankη(X) 15 17 17 17 17 34

rankη(Y ) 7 21 36 48 59 71

Test 2: In this test, we set mmax= lmax= 200, = 2.22×10−16and employ Algorithm 1 with

various combinations of τgand τh. From the formula of (37a), assume that kAτ >kmaxXkκρ

2kmax_/h

1 and kXkκρ2k0

/h ≈ 1, the best ratio of τhand τg, i.e., τh/τg, should approximate g = kGτmaxk ≈

kY k ≈ 10−5. From the numerical results of Test 1, we have near-exact solutions X and Y . Hence, we can compute the relative forward errors, kHτ

kmax− Xk/kH τ kmaxk and kG τ kmax− Y k/kG τ kmaxk

in the example. The numerical results are shown in Table 3. Note that τh = τh, because

lkmax< 200 for all cases, and similarly τg= τg,when τg= 10

−2_{or 10}−4_{. For the case τ}

g= 10−7,

τg< τg,because we restrict mk 6 200, making the original truncation tolerance τgtoo small and

ineffective. Note also that the relative forward errors of X and Y equal approximately to τh,and

τg,, respectively, as predicted in Remark 3.2 and Theorem 3.2.

Table 3: Numerical results with various combinations of τg and τh.

(τg, τh) (10−7, 10−7) (10−2, 10−7) (10−4, 10−9) (10−7, 10−12)

kmax 17 17 17 17

mkmax 200 86 184 200

lkmax 17 17 17 44

δkmax 7.016e−22 7.016e−22 7.016e−22 7.016e−22

e

rkmax 2.179e−14 5.544e−11 2.194e−14 2.491e−16

τh, 1.000e−07 1.000e−07 1.000e−09 1.000e−12

τg, 2.225e−07 1.000e−02 1.000e−04 2.095e−07

kHτ

kmax− Xk/kH

τ

kmaxk 9.278e−12 1.160e−07 9.260e−12 8.301e−14

kGτ

kmax− Y k/kG

τ

kmaxk 3.349e−07 4.271e−03 3.486e−07 3.091e−07

tkmax 9.8e+2 3.5e+2 9.1e+2 1.1e+3

Example 4.2 ([7], with n = 5177, m = 7, l = 6, ∆t = 0.01) From Table 4, the (near-)machine accuracy of O(10−14) (in relative residual) is achieved within 13 iterations, in 230 seconds (exe-cution time). In the 14th iteration, the relative modification of Hτ

13 is δ14/kH13τ k ≈ 1.58 × 10−12,

which is smaller than the given truncation tolerance τh= τh,= 10−9. Hence, the relative forward

(18)

Table 4, the monotonicity of Gk and Hk, as summarized in Theorem 2.3 (b), (c) and Theorem 2.7

(a). Notice also the near-quadratic convergence of δkand rk, and the doubling of δtkand tk, with

respect to increasing values of k.

Table 4: Example 4.2 (τg= 10−4, τh= 10−9, = 10−10, mmax= lmax= 200).

k δk kHkτk kG τ

kk rk erk mk lk δtk tk 1 6.989e−02 0.1409 5.968e−6 6.882e−02 2.452e−01 14 12 2.8e−2 2.8e−2 2 1.366e−01 0.2775 1.096e−5 6.673e−02 1.376e−01 21 17 5.4e−2 8.1e−2 3 2.609e−01 0.5384 1.885e−5 6.319e−02 7.192e−02 30 18 9.3e−2 1.7e−1 4 4.991e−01 1.015 2.959e−5 6.136e−02 3.791e−02 39 19 1.2e−1 3.0e−1 5 9.553e−01 1.968 4.156e−5 5.787e−02 1.899e−02 50 20 2.3e−1 5.3e−1 6 1.751e+00 3.719 5.425e−5 5.148e−02 8.961e−03 62 22 4.7e−1 1.0e+0 7 2.944e+00 6.662 6.194e−5 4.077e−02 3.909e−03 76 25 1.0e+0 2.0e+0 8 4.181e+00 10.84 6.475e−5 2.563e−02 1.462e−03 92 27 2.1e+0 4.2e+0 9 4.292e+00 15.12 6.524e−5 1.021e−02 3.930e−04 112 28 5.0e+0 9.2e+0 10 2.407e+00 17.51 6.526e−5 1.655e−03 5.069e−05 134 30 1.1e+1 2.0e+1 11 4.571e−01 17.96 6.526e−5 4.501e−05 1.269e−06 152 32 2.6e+1 4.6e+1 12 1.284e−02 17.97 6.526e−5 3.432e−08 9.550e−10 163 33 5.7e+1 1.0e+2 13 9.816e−06 17.97 6.526e−5 4.232e−13 1.177e−14 163 34 1.2e+2 2.3e+2 14 2.839e−11 17.97 6.526e−5 4.290e−13 1.193e−14 163 34 2.5e+2 4.8e+2

Example 4.3 ([7], with n = 20209, m = 7, l = 6, ∆t = 0.01) From Table 5, the (near-)machine accuracy of O(10−13) (in relative residual) is achieved within 11 iterations, in 200 seconds (exe-cution time). In the 12th iteration, the relative modification of Hτ

11 is δ12/kH11τ k ≈ 1.81 × 10−12,

errors of the computed solution Hτ

11is about 10−9 by (37a).

k δk kHkτk kGτkk rk _erk mk lk δtk tk

1 1.079e−01 0.2226 3.393e−5 1.021e−01 2.327e−01 14 12 7.6e−2 7.6e−2 2 2.027e−01 0.4197 3.393e−5 9.915e−02 1.335e−01 23 17 1.7e−1 2.4e−1 3 3.880e−01 0.7994 4.914e−5 9.352e−02 7.073e−02 34 17 3.0e−1 5.5e−1 4 7.113e−01 1.511 6.505e−5 8.322e−02 3.463e−02 46 17 5.8e−1 1.1e+0 5 1.197e+00 2.707 7.496e−5 6.594e−02 1.536e−02 60 17 1.1e+0 2.3e+0 6 1.701e+00 4.407 7.862e−5 4.151e−02 5.793e−03 76 17 2.7e+0 5.0e+0 7 1.749e+00 6.153 7.924e−5 1.658e−02 1.566e−03 94 17 4.9e+0 9.9e+0 8 9.838e−01 7.129 7.927e−5 2.700e−03 2.031e−04 113 17 1.1e+1 2.0e+1 9 1.879e−01 7.313 7.927e−5 7.415e−05 5.131e−06 132 17 2.3e+1 4.3e+1 10 5.330e−03 7.318 7.927e−5 5.763e−08 3.936e−09 143 17 4.9e+1 9.3e+1 11 4.153e−06 7.318 7.927e−5 1.773e−12 1.211e−13 143 17 1.0e+2 2.0e+2 12 1.324e−11 7.318 7.927e−5 1.729e−12 1.181e−13 143 17 2.0e+2 4.0e+2

(19)

Example 4.4 ([7], with n = 79841, m = 7, l = 6, ∆t = 0.01) From Table 6, a (near-)machine accuracy of O(10−15) (in relative residual) is achieved within 10 iterations, in 330 seconds (exe-cution time). In the 10th iteration, the relative modification of H9τ is δ10/kH9τk ≈ 2.34 × 10−12,

errors of the computed solution H9τ is about 10−9 by (37a).

Lastly, the ratio between t10 for Examples 4.4 and 4.3 equals 330/93 ≈ 3.548, against the

expected value of 79841/20209 ≈ 3.951. Similar ratios from Examples 4.1–4.4 illustrate the O(n) computational complexity of the SDA ls.

k δk kHkτk kG τ

kk rk _erk mk lk δtk tk

1 1.510e−01 0.3132 5.552e−5 1.425e−01 2.218e−01 14 12 1.7e−1 1.7e−1 2 2.770e−01 0.5880 7.633e−5 1.270e−01 1.218e−01 26 17 4.7e−1 6.4e−1 3 4.669e−01 1.055 9.039e−5 1.009e−01 5.737e−02 40 17 1.3e+0 1.9e+0 4 6.660e−01 1.720 9.590e−5 6.381e−02 2.236e−02 55 17 2.8e+0 4.7e+0 5 6.894e−01 2.409 9.689e−5 2.573e−02 6.172e−03 74 17 5.6e+0 1.0e+1 6 3.924e−01 2.798 9.693e−5 4.269e−03 8.169e−04 92 17 1.2e+1 2.2e+1 7 7.651e−02 2.873 9.693e−5 1.216e−04 2.139e−05 106 17 2.2e+1 4.4e+1 8 2.254e−03 2.875 9.693e−5 1.017e−07 1.765e−08 113 17 4.4e+1 8.8e+1 9 1.890e−06 2.875 9.693e−5 3.122e−13 5.415e−14 113 17 8.3e+1 1.7e+2 10 6.725e−12 2.875 9.693e−5 1.382e−14 2.397e−15 113 17 1.6e+2 3.3e+2

5 Conclusions

We have proposed a structure-preserving doubling algorithm for the large-scale discrete-time algebraic Riccati equation (1), the SDA ls, with A being large and sparse, and B and C being low-ranked. Similar continuous-time algebraic Riccati equations can be treated after an application of a Cayley transform. We apply the Sherman-Morrison-Woodbury formula when appropriate and do not form Ak (the iterate for A) explicitly. For well-behaved DAREs (or CAREs), with

eigenvalues of the corresponding symplectic pencil (or Hamiltonian matrix) not on or near the unit circle (or the imaginary axis) and I + GkHk being invertible for all k, low-ranked approximations

to the solutions X and Y can be obtained efficiently. The convergence of the SDA ls is quadratic (ignoring the truncation of Bk and Ck), as shown in [11, 12, 25]. The computational complexity

and memory requirement are both O(n), provided that the growth of Bk and Ck is controlled.

We have also shown that the truncation process in Section 2.2 controls the relative errors in the approximate solutions.

Similar to the methods in [6, 8], our technique can be applied when A is large and sparse(-like), or is a product (inverse) of such matrices (a matrix). The feasibility of the SDA ls depends on whether Av and A>v (or A−1v and A−>v for CAREs) can be formed efficiently, for an arbitrary v. Contrasting other methods, ours does not require any inner iteration of the ADI or Smith method, for the Stein equations arisen from the inexact Newton’s iteration. Consequently, and evidently illustrated by our numerical results, the SDA ls solves large-scale DAREs efficiently.

(20)

Lastly, the SDA ls should be applied even for DAREs and CAREs of moderate sizes, with Ak

computed recursively or otherwise. This will produce symmetric approximations to the symmetric solution X (in factor form), contrasting the artificially symmetrized approximate solutions from the original SDA (3).

Acknowledgements

The first author was supported by the Major Scientific Research Foundation of Southeast Uni-versity (No.3207011102), the NSFC (No.11101080) and the SRFDP (No.20110092120023). Parts of this project were completed while the first author visited Monash University and when the third author visited the CMMSC and the NCTS at the National Chiao Tung University, and we would like to acknowledge the corresponding support. The second and the fourth author would like to acknowledge the support from the National Science Council and the NCTS in Taiwan. The fourth author also likes to thank the CMMSC and the ST Yau Centre at the National Chiao Tung University for their support.

References

[1] L. Amodei, and J.-M. Buchot, An invariant subspace method for large-scale algebraic Riccati equation, Appl. Numer. Maths., 60 (2010), pp. 1067–1082.

[2] A. Antoulas, Approximation of Large-Scale Dynamical Systems, SIAM Publications, Philadelphia, PA, (2005).

[3] P. Benner, Solving large-scale control problems, IEEE Control Systems Magazine, 14 (2004), pp. 44–59.

[4] P. Benner, Editorial of special issue on “Large-Scale Matrix Equations of Special Type”, Numer. Linear Algebra Appl., 15 (2008), pp. 747–754.

[5] P. Benner, J.-R. Li, and T. Penzl, Numerical solution of large Lyapunov equations, Riccati equations and linear-quadratic control problems, Numer. Linear Algebra Appl., 15 (2008), pp. 755–777.

[6] P. Benner, and H. Fassbender, On the numerical solution of large-scale sparse discrete-time Riccati equations, Adv. Comp. Maths., 35 (2011), pp. 119–147.

[7] P. Benner, and J. Saak, A semi-discretized heat transfer model for optimal cooling of steel profiles, in Dimension Reduction of Large-Scale Systems, ed. P. Benner, V. Mehrmann and D.C. Sorensen, Lecture Notes in Computational Science and Engineering, Springer-Verlag, Berlin/Heidelberg, Germany, 45 (2005), pp. 353–356.

[8] P. Benner, and J. Saak, A Galerkin-Newton-ADI method for solving large-scale algebraic Riccati equations, DFG Priority Programme 1253 “Optimization with Partial Differential Equations”, Preprint SPP1253-090.

(21)

[9] A. Bensoussan, G. Da Prato, M.C. Delfour, and S.K. Mitter, Representation and Control of Infinite Dimensional Systems, Syst. & Control: Foundations & Applic., Birkh¨auser Boston Inc., Boston, MA, 2nd Ed..

[10] C.-Y. Chiang, E. K.-W. Chu, C.-H. Guo, T.-M Huang, W.-W. Lin, and S.-F. Xu, Convergence analysis of the doubling algorithm for several nonlinear matrix equations in the critical case, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 227–247.

[11] E. K.-W. Chu, H.-Y. Fan, and W.-W. Lin, A structure-preserving doubling algorithm for continuous-time algebraic Riccati equations, Linear Algebra Appl., 396 (2005), pp. 55–80. [12] E. K.-W. Chu, H.-Y. Fan, W.-W. Lin and C.-S. Wang, A structure-preserving doubling algorithm for periodic discrete-time algebraic Riccati equations, Int. J. Control, 77 (2004), pp. 767–788.

[13] E. K.-W. Chu, T.-M Huang, W.-W. Lin and C.-T. Wu, Palindromic eigenvalue prob-lems: a brief survey, Taiwanese J. Math., 14 (2010), pp. 743–779.

[14] G.H. Golub, and C.F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins Univer-sity Press, Baltimore, MD, 1996.

[15] M. Heyouni, and K. Jbilou, An extended block Arnoldi algorithm for large-scale solutions of the continuous-time algebraic Riccati equations, Elect. Trans. Numer. Anal., 33 (2009), pp. 53–62.

[16] I.M. Jaimoukha, and E.M. Kasenally, Krylov subspace methods for solving large Lya-punov equations, SIAM J. Numer. Anal., 31 (1994), pp. 227–251.

[17] K. Jbilou, Block Krylov subspace methods for large continuous-time algebraic Riccati equa-tions, Numer. Algorithms, 34 (2003), pp. 339–353.

[18] K. Jbilou, An Arnoldi based algorithm for large algebraic Riccati equations, Appl. Math. Lett., 19 (2006), pp. 437–444.

[19] K. Jbilou, Low rank approximate solutions to large Sylvester matrix equations, Appl. Math. Comp., 177 (2006), pp. 365–376.

[20] K. Jbilou, ADI preconditioned Krylov methods for large Lyapunov matrix equations, Linear Algebra Appl., 432 (2010), pp. 2473–2485.

[21] K. Jbilou, and A. Riquet, Projection methods for large Lyapunov matrix equations, Linear Algebra Appl., 415 (2006), pp. 344–358.

[22] P. Lancaster, and L. Rodman, Algebraic Riccati Equations, Clarendon Press, Oxford, 1995.

[23] I. Lasiecka, and R.Triggiani, Control Theory for Partial Differential Equations: Con-tinuous and Approximation Theories; I. Abstract Parabolic Systems, Cambridge University Press, Cambridge, 2000.

(22)

[24] J.-R. Li, and J. White, Low-rank solution of Lyapunov equations, SIAM Review, 46 (2004), pp. 693–713.

[25] W.-W. Lin, and S.-F. Xu, Convergence analysis of structure-preserving doubling algo-rithms for Riccati-type matrix equations, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 26–39. [26] Mathworks , MATLAB User’s Guide, 2010.

[27] V.L. Mehrmann, The Autonomous Linear Quadratic Control Problem, Lecture Notes in Control and Information Sciences, Vol. 163, Springer Verlag, Berlin, 1991.

[28] T. Pappas, A.J. Laub, and N.R.Sandell, On the numerical solution of the discrete-time algebraic Riccati equation, IEEE Trans. Automat. Control, 25 (1980), pp. 631–641.

[29] J. Saak, Efficient Numerical Solution of Large Scale Algebraic Matrix Equations in PDE Control and Model Order Reduction, Dr. rer. nat. Dissertation, Chemnitz University of Tech-nology, Germany, 2009.

[30] J. Saak, H. Mena, and P. Benner, Matrix Equation Sparse Solvers (MESS): a Matlab Toolbox for the Solution of Sparse Large-Scale Matrix Equations, Chemnitz University of Technology, Germany, 2010.