Solution of a nonsymmetric algebraic Riccati equation from a one-dimensional multistate transport model

(1)

doi:10.1093/imanum/drq034

Advance Access publication on May 30, 2011

Solution of a nonsymmetric algebraic Riccati equation from

a one-dimensional multistate transport model

TIEXIANGLI

Department of Mathematics, Southeast University, Nanjing 211189, People’s Republic of China

[email protected] ERICKING-WAHCHU∗

School of Mathematical Sciences, Building 28, Monash University, Victoria 3800, Australia

∗_{Corresponding author: [email protected]}

JONGJUANG

Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan [email protected]

AND

WEN-WEILIN

Centre of Mathematical Modelling and Scientific Computing, Centre of Theoretical Sciences and Department of Mathematics, National Taiwan University, No. 1, Sec. 4,

Roosevelt Road, Taipei 10617 Taiwan [email protected]

[Received on 18 January 2010; revised on 9 September 2010]

For the steady-state solution of a differential equation from a one-dimensional multistate model in trans-port theory, we shall derive and study a nonsymmetric algebraic Riccati equation B−− X F−− F+X+

X B+X = 0, where F±≡ (I − F)D±and B±≡ B D±with positive diagonal matrices D±and

pos-sibly low-ranked matrices F and B. We prove the existence of the minimal positive solution X∗under a set of physically reasonable assumptions and study its numerical computation by fixed-point iteration, Newton’s method and the doubling algorithm. We shall also study several special cases. For example when B and F are low ranked then X∗ _{= Γ ◦} P4_i₌₁U_iV_iTwith low-ranked U_i and V_i that can be computed using more efficient iterative processes. Numerical examples will be given to illustrate our theoretical results.

Keywords: algebraic Riccati equation; doubling algorithm; fixed-point iteration; Newton’s method; reflection; transport theory.

1. Introduction

Transport theory has been an active area of research, associated with masters like R. E. Bellman and S. Chandrasekhar (see the references in Juang (1995)). A one-dimensional model was studied first inJuang(1995), starting a series of numerical studies, for example, in Lu(2005),Bai et al.(2008),

c

at National Chiao Tung University Library on April 24, 2014

http://imajna.oxfordjournals.org/

(2)

FIG. 1. The one-dimensional rod.

Bini et al.(2008),Lin et al.(2008), andMehrmann & Xu(2008), in the past 15 years. We shall study a different one-dimensional multistate model fromBellman et al.(1973) andBellman & Wing(1975, equation (1.37), p. 15) that is generalized slightly in this paper.

We start from a simple one-dimensional ‘rod’ or line segment that extends from 0 to x and denote a generic point in the rod by z, as in Fig.1. Particles move to the right and left along this rod without colliding with one another while interacting with the rod itself without affecting it. We first assume that all particles are of the same type and have the same speed. The objective is to obtain information about the density of the beam of particles as a function of the position z.

We further assume that the probability of a particle at z (moving in either direction) interacting with the rod while moving a distance ofΔ is given by the expression

σ (z)Δ+ o(Δ), (1.1)

whereσ (∙) > 0 is the macroscopic cross-section and o(∙) denotes higher-order terms. As a result of this interaction, an expected average of f(z) and b(z) new particles emerge at the point z in the same (forward) and opposite (backward) directions, respectively, as the original particle. Particles travelling to the left of z = 0 and the right of z = x are lost to the system. Particles injected at the left and right ends, together with the new particles generated through the collision process, make up the total particle population of the system.

Initially, let us assume a time-independent state, where the expected particle population is stationary and independent of the time at which the system is observed. We define u(z) and v(z) to be the expected numbers of right- and left-moving particles, respectively, passing through the point z each second. (The adjective ‘expected’ is sometimes neglected but is necessary due to the stochastic nature of the prob-lem.) From the definition in (1.1) the probability of particles passing through from z to z+ Δ without interacting with the rod is 1− σ(z)Δ + o(Δ). The expected contribution to u(z + Δ) from this type of occurrence is

[1− σ(z)Δ + o(Δ)]u(z) = [(1 − σ(z)Δ)]u(z) + o(Δ). (1.2) However, some right- (or left-) moving particles passing through z will interact with the rod before reaching z+Δ. Each such event will produce an expected number f (z) (or b(z)) of particles proceeding in the direction of interest. The expected contributions to u(z+ Δ) from these types of occurrences are

σ (z) f (z)u(z)Δ+ o(Δ), σ(z)b(z)v(z)Δ + o(Δ). (1.3) Other events can take place but the contributions are o(Δ) and insignificant. Hence, summing the con-tributions in (1.2) and (1.3) yields

u(z+ Δ) = [(1 − σ(z)Δ)]u(z) + σ(z) f (z)u(z)Δ + σ(z)b(z)v(z)Δ + o(Δ). (1.4) Taking the limitΔ→ 0 in (1.4) leads to the following differential equation for u:

du

dz = σ(z){[ f (z) − 1]u(z) + b(z)v(z)}, u(0) = 0. (1.5)

(3)

A similar particle counting process also produces the following differential equation forv: −dv

dz = σ(z){b(z)u(z) + [ f (z) − 1]v(z)}, v(x) = 1. (1.6) For a multistate model we allow n different states (e.g., speed, energy, type or any other features other than direction that distinguish between the particles). The macroscopic cross-section for the j th state isσj(z) > 0 and the probability in (1.1) resulting in the emission of particles in the j th state then reads

σj(z)Δ+ o(Δ).

Similarly, we have functions uj(z) and vj(z) ( j = 1, . . . , n) representing the expected number of particles in state j , moving to the right and left, respectively, past the point z each second. We define the matrix functions F(z)= [ fi j(z)], B(z)= [bi j(z)], eF(z)= [ efi j(z)] and eB(z)≡ [ebi j(z)], where

e

fi j(z)= σi(z)[ fi j(z)− δi j], ebi j(z)= σi(z)bi j(z), (i, j = 1, . . . , n),

δi j is the Kroneckerδ, and fi j(z) and bi j(z) are the expected numbers of particles travelling, respec-tively, in the forward and backward directions, respecrespec-tively, after the collision of a particle of state j emitting particles of state i . A similar argument to that leading to (1.5) and (1.6) then produces

du

dz = eF(z)u+ eB(z)v, − dv

dz = eB(z)u+ eF(z)v,

with u = [u1, . . . , un]T,v = [v1, . . . , vn]Tand the convention M = [mi j] (capital letters for matrices and the corresponding lower-case letters with indices for elements), and the initial conditions ui(0)= 0 andvi(x)= δi j(corresponding to the initial injection of a particle of state j from the right).

From the above discussion we expect B, F > 0 to satisfy X

i

( fi j+ bi j) < 1 ∀ j. (1.7)

We will allow the critical case of equality in (1.7) (the ‘pure scattering’ case inBellman & Wing(1975, equation (4.1), p. 55)) later.

To carry out the invariant imbedding procedure the functions R and T are introduced, where ri j(x) is the expected number of particles emergent each second at z= x in state i from a rod of length x when the only input is one particle per second in state j at the right end z= x, and ti j(x) is defined similarly except the emergence is at the other end z = 0. Consider a rod of length x +Δ with the sub-rod of length x imbedded. Assuming that the reflecting response function R(z)= [ri j(z)] is known, the transmission response function T(z)= [ti j(z)] can be defined through a differential equation derived from a particle counting process.

Counting all the significant events as enumerated inWing(1962), and allowing different macro-scopic cross-sectionsσ±_j (z) > 0 for sources from the left and the right, the following equation for R(x)≡ [ri j(x)] can be derived:

dR(x) dx = B

−_(x)_{− R(x)F}−_(x)_{− F}+_(x)R(x)_{+ R(x)B}+_(x)R(x), _R(0)_{= 0,} _(1.8)

where B±≡ B D±, F±≡ (I − F)D±, D±≡ diag{σ_k±} > 0, and B and F are possibly low ranked. (In

Wing(1962) the signs of the linear terms on the right-hand side of (1.8) were positive. We change these

(4)

signs to make the resulting nonsymmetric algebraic Riccati equation (NARE) in (1.9) more consistent with notation in other papers on NAREs. Note that the{σk} were independent of direction inBellman

et al.(1973) andWing(1962).) After the determination of R, the function T can be derived from the simpler equation

dT(x)

dx = T (x)[eF(x)+ eB(x)R(x)], T(0)= I. For the steady-state solution for a particular x, (1.8) leads to

B−− X F−− F+X+ X B+X = 0, (1.9) with R replaced by the usual variable X for NAREs.

2. Existence of a solution

Some relevant definitions are as follows. For any matrices bA, bB ∈ Rm×n we write bA > bB or bA > bB if their elements satisfybai j > bbi j orbai j > bbi j, respectively, for all i and j . A real square matrix bA is called a Z -matrix if all of its off-diagonal elements are nonpositive. It is clear that any Z-matrix bA can be written as s I − bB with bB > 0. A Z-matrix bA is called an M-matrix if s> ρ(bB), where ρ(∙) is the spectral radius, and it is a singular M-matrix if s = ρ(bB) and a nonsingular M-matrix if s > ρ(bB). We have the following useful results from Berman & Plemmons(1994) andGuo & Higham (2007, Theorem 1.1).

LEMMA2.1 For a Z-matrix bA the following statements are equivalent: (a) bA is a nonsingular M-matrix;

(b) bA−1> 0;

(c) bAv > 0 for some vector v > 0. THEOREM2.2 Let us consider the NARE

X bC X− X bD− bA X+ bB = 0, (2.1) where bA, bB, bC and bD are real matrices of sizes m× m, m × n, n × m and n × n, respectively. Assume that M = _b D −bC −bB Ab (2.2) is a nonsingular M-matrix or an irreducible singular M-matrix. Then the NARE has a minimal non-negative solution S. If M is irreducible, then S > 0, and bA − SbC and bD − bC S are irreducible M-matrices. If M is a nonsingular M-matrix, then bA− SbC and bD− bC S are nonsingular M-matrices. If M is an irreducible singular M-matrix with positive left and right null vectors [uT₁, uT

2] T_{and [v}T 1, v T 2] T

(where u1, v1∈ Rnand u2, v2∈ Rm) satisfying

uT₁v16= uT2v2,

then

MS = In⊗ (bA− SbC)+ (bD− bC S)T⊗ Im

is a nonsingular M-matrix. If M is an irreducible singular M-matrix with uT₁v1= uT₂v2, then MS is an

irreducible singular M-matrix.

(5)

We have the following existence result.

THEOREM2.3 Under assumption (1.7), the unique minimal non-negative solution X∗of (1.9) exists. Proof. From Theorem2.2we need to show that the Z -matrix

M = (I− F)D− −B D+ −B D− _(I_{− F)D}+ = I − F B B F D− D+ (2.3)

is a nonsingular M-matrix. Note that A is an M-matrix if and only if AT is an M-matrix. Applying Lemma2.1, we need to find a vectorv > 0 such that MT_{v > 0, which is trivial from (}_1.7_).

2.1 NARE as an eigenvalue problem

The NARE (1.9) can be reformulated as the following eigenvalue problem: H I X = I X S, H ≡ F− −B+ B− −F+ = I −I + −F −B B F D− D+ . (2.4)

From (1.7) it is easy to see that the eigenvalues of H are shifted slightly from±σ_k∓, splitting equally on opposite sides of the imaginary axis. Using the Gerschgorin theorem, with_{D(a, r) ≡ {x ∈ C: |x − a| 6} r}, the eigenvalues are in [S_k_D(σ_k−, ασ_k−)]∪ [SkD(−σk+, ασk+)], divided equally on opposite sides of the imaginary axis, withα≡ (kF + Bk1) < 1 from (1.7).

REMARK2.4 For the critical case withα= 1 a simple application of the Gerschgorin theorem implies that the matrices H in (2.4) and M in (2.3) may be singular. However, the potential singularity may be detected or excluded by applying the extensions of the Gerschgorin theorem inHorn & Johnson(1985, Section 6.2). Consider all of the Gerschgorin disks of HT containing the origin. At least one of the corresponding inequalities should not be satisfied with equality. In other words, we may have to exclude the ultra-critical case that all of the first or last n rows have their corresponding off-diagonal row sums equal to unity.

Note that, even if H or M are singular, the existence result in Theorem2.2still holds provided that M is irreducible. With the additional requirement for the null vectors as in Theorem2.2, Newton’s method in Section4.1will be quadratically convergent.

2.2 NARE as a nonlinear equation

To compute the minimal non-negative solution X∗for the NARE (1.9), consider it in component form as follows:

(σ−_j + σi+)xi j = (B D−)i j + xi∙(F D−)∙ j+ (F D+)i∙x∙ j+ xi∙(B D+)x∙ j.

(Here xi_∙ and x_{∙ j} denote the i th row and j th column, respectively, of X , and(∙)i j denotes the element (i, j ) of a matrix.) Equivalently, we have

X = φ(X) ≡ Γ ◦ (B D−+ X F D−+ F D+X+ X B D+X), Γ ≡ [(σ−j + σi+)−1], (2.5) with X being a Hadamard product. Note that Theorem2.3, (2.5) and the assumption B > 0 imply that X∗> 0.

(6)

We have the following theorem for X∗and the fixed-point iteration.

THEOREM 2.5 Let X(0) = 0 and X(k+1) = φ(X(k)). Then, under assumption (1.7), we have the following

(i) the iterates satisfy X∗> X(k+1)> X(k)> Γ ◦ BD−> 0; (ii) X(k)→ X∗as k→ ∞.

Proof. It is easy to show that X(1)= Γ ◦ B D−> 0 from (2.5). For (i) consider the difference between X∗= φ(X∗) and X(k)= φ(X(k−1)_{) as follows:}

X∗− X(k) = Γ ◦ [(X∗− X(k−1))F D−+ F D+(X∗− X(k−1)₎

+ (X∗− X(k₋₁₎_{)B D}₊_X_∗_{+ X}(k₋₁₎_{B D}₊_(X_∗_{− X}(k₋₁₎_)].

Induction will then complete the argument for X∗− X(k) > 0 in (i). Similarly, induction on the difference between X(k+1)= φ(X(k)) and X(k)_{= φ(X}(k−1)_{) leads to X}(k+1)_{− X}(k)_{> 0 in (i).}

For (ii) convergence is implied by (i) with the limit eX∗ ≡ limk→∞X(k) = X∗because (i) implies

that X∗> eX∗> 0 and X∗is minimal.

3. Low-ranked B and F

When the full-ranked decompositions F = F1F₂Tand B = B1B₂Tare of rank m and p, respectively,

(2.5) implies that

X= Γ ◦ (B1B2TD−+ Z1F2TD−+ F1Z2T+ Z3ZT4) (3.1)

with the auxiliary variables

Z1≡ X F1, Z2≡ XTD+F2, Z3≡ X B1, Z4≡ XTD+B2. (3.2)

Substituting X in (3.1) into (3.2), we have 2(m+ p)n nonlinear equations for the 2(m + p)n unknowns in Zj( j = 1, . . . , 4) as follows:            Z1= [Γ ◦ (B1B2TD−+ Z1F2TD−+ F1ZT2+ Z3ZT4)]F1, Z2= [ΓT◦ (D−B2B₁T+ D−F2Z₁T+ Z2F₁T+ Z4Z₃T)]D+F2, Z3= [Γ ◦ (B1B₂TD−+ Z1F₂TD−+ F1ZT₂+ Z3ZT₄)]B1, Z4= [ΓT◦ (D−B2B1T+ D−F2Z1T+ Z2F1T+ Z4Z3T)]D+B2, (3.3)

(cf. the 2n equations in 2n unknowns inJuang(1995) for a simpler NARE with m= p = 1). Similarly, X can be retrieved using (3.1) after the Zj are obtained. It is obvious from Theorem2.5and (3.3) that Zj > 0 ( j= 1, . . . , 4) and X > 0.

The convergence of various iterative schemes for the set of nonlinear equations (3.3) can be shown. First let Rj ( j = 1, . . . , 4) be the jth right-hand side in (3.3). Starting from Z(0)_j = 0 ( j = 1, . . . , 4), we shall consider the following iterative methods.

(1) Simple iteration (SI): for j, l= 1, . . . , 4 we have Z(k_j+1)= Rj(. . . , Z

(kjl)

l , . . .), kjl = k (∀ j, l). (3.4) The right-hand sides Rjonly involve the previous iterates Z_l(k).

(7)

(2) Modified simple iteration (MSI): for j, l = 1, . . . , 4 we have Z(k_j +1)= Rj(. . . , Z (ekjl) l , . . .), ekjl = k+ 1 if l < j, k otherwise. (3.5)

The right-hand sides Rjinvolve Z_l(k), as well as Z_l(k+1)if they have been computed. (3) Nonlinear block Jacobi method (NBJ): for j, l = 1, . . . , 4 we have

Z(k_j +1)= Rj(. . . , Z (bkjl) l , . . .), bkjl = k+ 1 if l = j, k otherwise. (3.6)

The right-hand sides Rj involve Z_l(k), as well as Z_l(k+1)with l = j and the corresponding terms moved to the left-hand sides.

(4) Nonlinear block Gauss–Seidel method (NBGS): for j, l= 1, . . . , 4 we have Z(k_j +1)= Rj(. . . , Z ( ˇkjl) l , . . .), ˇkjl = k+ 1 if l6 j, k otherwise. (3.7)

The right-hand sides Rj involve Z(k)l and Z (k₊₁₎

l (when available), as well as Z (k₊₁₎

l with l = j and the corresponding terms moved to the left-hand sides.

From the above formulae we obviously have the following inequalities for the indices:

k= kjl 6 ekjl 6 ˇkjl, k= kjl 6 bkjl 6 ˇkjl ( j, l= 1, . . . , 4). (3.8) The following simple lemma will be repeatedly applied in the proof of Theorem3.2.

LEMMA3.1 Let U = [ui j], V = [vi j] and W = [wi j]∈ Rn×qbe non-negative, and let eΓ = [eτi j]∈ Rn×n_{be positive. Consider the linear system}

U− [ eΓ ◦ (U VT_)]W _{= e}_R, _(3.9)

and its i th row has the form

ui(I− Pi)= eri, (3.10) where (Pi)s j = n X l=1 eτilvlswl j, (3.11)

for i = 1, . . . , n and s, j = 1, . . . , q. In addition, assume that

u∗_i(I− Pi∗)= eri∗, (3.12)

where P_i∗is constructed as in (3.11) withvlsreplaced byvls∗ = (V∗)ls,eri∗is the i th row of eR∗and e

R∗> eR> 0, U∗> 0, V∗> V. (3.13) Then I − P_i∗and I − Pi are nonsingular M-matrices and

P_i∗> Pi, (I− Pi∗)−1> (I − Pi)−1> 0. (3.14)

(8)

Proof. From (3.11) and (3.13) we have that u∗_i,er_i∗ > 0 and I − P∗

i is a Z-matrix. Consequently, the transpose of (3.12) and Lemma2.1imply that (the transpose of) I− P_i∗is a nonsingular M-matrix with a non-negative inverse and(I− P∗

i )v > 0 for some vector v > 0. From (3.11) Pi is linear in V , with V∗> V implying that P_i∗> Pi, I − Pi > I − Pi∗and(I− Pi)v> (I − Pi∗)v > 0. Lemma2.1then implies that I− Pi is a nonsingular M-matrix with a non-negative inverse and (3.14) follows. With the additional subscripts_{I = S, M, J , G for the four different methods (}3.4)–(3.7), respec-tively (and ignoring them when the result holds for all the methods), we have the following results that are similar to those inGuo & Lin(2010).

THEOREM3.2 We shall assume that (1.7) holds and we have the splitting F = F1F₂Tand B= B1B₂T,

with the full-ranked B1, B2, F1, F2> 0. We have for j = 1, . . . , 4 and k = 0, 1, . . . that the following

holds

(i) the iterates satisfy Z∗_j > Z(k_j +1)> Z(k)_j > 0, except Z(0)_j = 0; (ii) Z(k)_j → Z∗_j as k→ ∞;

(iii) 06 Z(k)_{S, j} 6 Z_{M, j}(k) 6 Z_{G, j}(k); (iv) 06 Z(k)_{S, j} 6 Z_{J , j}(k) 6 Z_{G, j}(k).

Proof. For NBJ and NBGS the formulae in (3.3) (ignoring the superscripts(k+1) on the left-hand sides and(k) on the right for Zj as in (3.6) and (3.7)) are equivalent to

Z1− [Γ ◦ (Z1F₂T)]D−F1= eR1≡ [Γ ◦ (B1B₂TD−+ F1ZT₂+ Z3Z₄T)]F1, (3.15)

Z2− [ΓT◦ (Z2F1T)]D+F2= eR2≡ [ΓT◦ (D−B2B1T+ D−F2ZT1+ Z4Z3T)]D+F2, (3.16)

Z3− [Γ ◦ (Z3ZT4)]B1= eR3≡ [Γ ◦ (B1B2TD−+ Z1F2TD−+ F1Z2T)]B1, (3.17)

Z4− [ΓT◦ (Z4Z3T)]D+B2= eR4≡ [ΓT◦ (D−B2B1T+ D−F2ZT1+ Z2F1T)]D+B2. (3.18)

The operators on the left-hand side of (3.15)–(3.18) are of similar form and we need to invert them with known right-hand sides eRj. For the generic term U − [ eΓ ◦ (U VT)]W for eΓ = [eτi j]∈ Rn×nand U = [ui j], V = [vi j], W = [wi j]∈ Rn×q(with q= m or p), the (i, j) component equals

{U − [ eΓ ◦ (U VT_)]W_} i j = ui j− q X s₌₁ ui s n X l₌₁ eτilvlswl j ! , (3.19)

implying that the i th row in (3.15)–(3.18) has the generic form (3.10) with Pi ∈ Rq×q(i= 1, . . . , n) as in (3.11). Note that ui anderi in (3.10) are the i th rows of U and the right-hand side eR in (3.15)–(3.18) or Table1below, respectively.

We shall prove (i) by induction.

For the k = 0 case in (i), except for Z4in NBGS, it is easy to see from (3.3) and (3.4)–(3.7) that

Z(1)_j are well defined and

Z∗_j > Z(1)_j > Z(0)_j = 0 ( j = 1, . . . , 4), (3.20) where the limits (indicated by(∙)∗_{) are guaranteed to exist by Theorems}_2.3_or_4.1_{together with (}_3.2_).

Note from Table1and Z(0)_j = 0 that the Pi are constant in (3.10) for Z1and Z2in NBJ and NBGS, and

Pi = 0 for Z3and Z4in NBJ as well as Z3in NBGS (because the corresponding V s in Table1vanish).

(9)

TABLE1 Construction of Pi in (3.10) or (3.19) for NBJ and NBGS Method U Γe V W Re Both Z₁(k+1) Γ D− F2 F1 Re1 Z₂(k+1) ΓT_D₊ _F 1 F2 Re2 Z₃(k+1) Γ Z(k)₄ B1 Re3 NBJ Z₄(k+1) ΓT_D+ _Z(k) 3 B2 Re4 NBGS Z₄(k+1) ΓT_D+ _Z(k+1) 3 B2 Re4

For Z4in NBGS the iteration has the following form that is similar to (3.10):

u(1)_i [I − P_i(1)]= er_i(0), (3.21) where P_i(1)is linear in V = Z(1)₃ , which has been proved to satisfy

Z₃∗> Z₃(1)> 0. (3.22)

From (1.7) and (3.20) we have u∗_i > 0 (from methods other than NBGS) ander_i∗> er_i(0)> 0. With (3.21) in place of (3.10), or u_i(1), P_i(1)and_er_i(0)in place of ui, Pi anderi, respectively, Lemma3.1then implies that I − P_i(1)is nonsingular and(I− P∗

i )−1> [I − P (1) i ]−1> 0. Consequently, u (1) i and thus Z (1) 4 are

well defined, and we have

u∗_i = er_i∗(I− Pi∗)−1> ˜r (0) i [I− P (1) i ]−1= u (1) i ⇐⇒ Z4∗> Z (1) 4 > Z (0) 4 = 0.

We have proved the k= 0 case in (i).

Assuming that (i) holds up to some value of k, we shall prove the(k+ 1) case. The conclusion can be easily drawn for SI and MSI by considering the differences between (3.4) and (3.5) for successive values of k as well as the limiting case when k→ ∞. For NBJ and NBGS in (3.6) and (3.7) for Z1and

Z2, we have that Pi(s) = Pi∗ = Pi (for all s) are constant as V = F2, F1from Table1. The limiting

case in (3.12) or a trivial application of Lemma3.1imply that [I− P_i(s)]−1> 0 (for all s). For NBJ and NBGS for Z3and Z4, the iterations take the following generic form, as in (3.10):

u(s_i +1)[I− P_i(s+1)]= er_i(s) (s = 0, 1, . . . , k + 1), (3.23) with_er_i(s) and P_i(s+1) dependent on V = Z(s)_j or Z(s₃+1) (for NBGS for the iteration for Z4, where

Z(k₃+2) > Z₃(k+1)when the iterations in (3.7) are executed in the intended order j = 1, . . . , 4). From the induction hypothesis and the linearity of eR with respect to Zj (for all j ) in (3.15)–(3.18), we have U∗ > 0, V∗ > V > 0 and eR∗ > eR(s) > 0. With (3.23) in place of (3.10), or u(s_i +1), P_i(s+1) and er_i(s)in place of ui, Pi anderi, respectively, Lemma3.1implies that I − P_i(s+1) (s = 0, 1, . . . , k + 1) are nonsingular M-matrices with non-negative inverses, P_i∗ > P_i(k+2) > P_i(k+1) and(I− P∗

i )−1 >

(10)

[I − P_i(k+2)]−1 > [I − P_i(k+1)]−1> 0. From successive values of s = k, k + 1 and the limiting case k→ ∞, appropriate differences yield

u∗_i − u(k_i +2) = er_i∗[I − P_i∗]−1−er_i(k+1)[I− P_i(k+2)]−1 = (eri∗−er (k+1) i )(I− Pi∗)−1+er (k+1) i [I− P (k+2) i ]−1[Pi∗− P (k+2) i ](I− Pi∗)−1> 0. Similarly, we have u_i(k+2)− u(k_i +1)= er_i(k+1)[I − P_i(k+2)]−1−er_i(k)[I − P_i(k+1)t]−1> 0. Thus a Z∗_j > Z(k_j +2)> Z(k_j +1)( j = 1, . . . , 4) and the induction for (i) is complete.

For (ii) a similar argument as in the proof of (ii) in Theorem2.5can be applied. For (iii) and (iv) we note that the iterates Z(k)_j are increasing towards their respective limits Z∗_j, and Rj and eRjpreserve the order of positivity of their arguments. We shall prove the inequalities again by induction. The ini-tial cases for k = 0 are obvious. Assume that the results hold for some value of k. From (3.8), for

j= 1, . . . , 4 and k = 0, 1, . . . we have Z_{S, j}(k+1)= Rj(. . . , Z_S,l(k), . . .)6 Rj(. . . , Z(k)_M,l, . . .)6 Rj(. . . , Z (ekjl) M,l, . . .)= Z (k+1) M, j , Z_{S, j}(k+1)= Rj(. . . , Z_S,l(k), . . .)6 Rj(. . . , Z(k)_{J ,l}, . . .)6 Rj(. . . , Z (bkjl) J ,l, . . .)= Z (k+1) J , j , Z(k+1) M, j = Rj(. . . , Z (ekjl) M,l, . . .)6 Rj(. . . , Z (ekjl) G,l , . . .)6 Rj(. . . , Z ( ˇkjl) G,l , . . .)= Z (k+1) G, j .

For the right-most inequalities in (iv) consider the iterations in the general form (3.23). We then have u_{J ,i}(k+1)= er_{J ,i}(k)[I− P_{J ,i}(k+1)]−16 er_G,i(k)[I− P_{J ,i}(k+1)]−16 er_G,i(k)[I − P_G,i(k+1)]−1= u(k_G,i+1)

since P_{J ,i}(k+1)6 P_G,i(k+1)ander_{J ,i}(k) 6 er_G,i(k). Thus Z_{J , j}(k+1)6 Z(k_{G, j}+1), and induction is complete. REMARK 3.3 The assumption that Bi, Fi > 0 (i = 1, 2) is just a convenient sufficient condition for Theorem 3.2. There are many other weaker but more tedious sufficient conditions that we can write down. For example, by careful application of (3.15)–(3.18) in the proof, we can make the al-ternative assumption, with W1≡ (Γ D−)◦ B and W2≡ (D+Γ D−)◦ B, that the following matrices are

positive:

F2F1T; W1B1, W1F1, B2TW1B1; F2TW2, B2TW2B1.

4. General case

For the general case with B and F being full ranked, the NARE (1.9), namely, B−− X F−− F+X+ X B+X = 0,

or the equivalent (2.5), can be solved by fixed-point iteration (as in Theorem 2.5), Newton’s method (Lu,2005;Guo & Higham,2007;Lin et al.,2008) or doubling (Guo et al.,2006;Chiang et al.,2009). The existence of the unique minimal positive solution X∗of (1.9) is guaranteed by Theorem2.3.

(11)

4.1 Newton’s method

Considering the NARE (1.9), let R(X ) denote the left-hand side of the equation. At the (k+1)th iteration with X(k)being an approximate solution and X(k+1)= X(k)+ δX(k+1), Newton’s method requires the solution of the Sylvester equation

(F+− X(k)_B+_{)δ X}(k+1)_{+ δX}(k+1)_(F−_{− B}+_X(k)₎_{= R(X}(k)_). _(4.1)

The convergence of Newton’s method is guaranteed by the following theorem quoted from Guo & Higham(2007, Theorem 2.3).

THEOREM4.1 Let S be the minimal positive solution of (1.9). Then, under assumption (1.7), for the Newton iteration (4.1) with X(0) = 0, the sequence {X(k)} is well defined, X(k) 6 X(k+1) 6 S for all k> 0, and limk→∞X(k)= S.

The proof makes use of selected results from Theorem2.2. In particular, when vectorized the above Sylvester operator can be written as the matrix operator MS(with m= n) as in Theorem2.2.

4.2 Doubling

We shall quote the doubling algorithm for the general NARE (2.1), with the matrix M in (2.2) being a nonsingular M-matrix, fromGuo et al.(2006). Note that, per iteration, the doubling algorithm is faster than Newton’s method, as concluded inGuo et al.(2006),Guo(2007) and Table2, and we refer the reader to the details in these references.

For the general NARE

X bC X− X bD− bA X+ bB = 0,

with the corresponding matrix M in (2.2) being a nonsingular M-matrix, we first transform bA, bB, bC and b

D to

Eγ = I − 2γ Vγ−1, Gγ = 2γ Dγ−1C Wb γ−1, Fγ = I − 2γ Wγ−1, Hγ = 2γ Wγ−1bB D−1γ ,

with the parameterγ > max{baii, . . . ,bann; bd11, . . . , bdnn} and

Aγ = bA+ γ I, Dγ = bD+ γ I, Wγ = Aγ− bB Dγ−1Cb, Vγ = Dγ − bC A−1γ bB.

The doubling algorithm can then be summarized as follows:

E0= Eγ, F0= Fγ, G0= Gγ, H0= Hγ,

Ek₊₁= Ek(I− GkHk)−1Ek, Fk₊₁= Fk(I− HkGk)−1Fk,

Gk₊₁= Gk+ Ek(I− GkHk)−1GkFk, Hk₊₁= Hk+ Fk(I− HkGk)−1HkEk. TABLE2 Operation counts per iteration

B, F Method Flops

Low-ranked NBGS 8(m+ p + 2)n2

General Fixed-point iteration 4n3

Newton’s method 36n3

Doubling 102₃n3

(12)

The iterates are well defined with I − HkGk and I − GkHk being nonsingular M-matrices for all k, and Hk → X and Gk → Y (the solution to the adjoint algebraic Riccati equation to (2.1)) from below quadratically as k→ ∞ (seeGuo et al.,2006, Theorem 5.1).

Note that, if D±= D, bB= bC= B D and bA= bD= (I − F)D, then we can halve the computation as Ek = Fk and Gk = Hkfor all k. Some saving in computation can also be made in Newton’s method as the Sylvester equations in the iteration become Lyapunov equations.

5. Numerical examples

For comparison, we shall summarize the operation counts per iteration of various iterative methods in Table2. We shall show only the dominant terms, assuming that n  m, p. For low-ranked B and F, only the fastest method NBGS is considered. The slow fixed-point iteration method is also included for comparison.

REMARK5.1 For the simpler NARE considered inLu(2005),Bai et al.(2008),Lin et al.(2008) and

Mehrmann & Xu(2008), the associated structure gave rise to iterative solution processes (analogous to our SI, MSI, NBJ and NBGS methods) of O(n) computational complexity. The ‘fast’ Newton method in

Bini et al.(2008) is of O(n2_{) complexity and is uncompetitive. However, our NARE in (}_1.9_{), for both the}

low-ranked and the general cases, has very different structures. It is likely that faster solution methods can be found but we do not anticipate methods of less complexity than the O(n2_{) NBGS method (for}

the low-ranked case) and the O(n3_{) doubling method (for the general case).}

We shall consider two randomly generated examples for n = 64, 128, 256, 512, 1024 and 2048. Example 1 has B and F being full ranked, and Example 2 has B and F of rank 10. For the examples the assumptions in Theorems2.3and3.2are satisfied. The numerical computation has been carried out using MATLAB R2008b on a laptop with precision eps= 2.2204 × 10−16(MathWorks,2002).

For Example 1, fixed-point iteration, Newton’s method and the doubling algorithm have been com-pared for various values of n. The iterations have been run until convergence with tolerance tol= 10−15. The results are summarized in Table3, with tn denoting the CPU time, rn ≡ tn/tn/2 and #It being

the number of iterations required, for particular values of n. The iterates are also plotted in Fig.2for n = 1024. Note that the residuals in Figs2and3are plotted using a logarithmic scale.

Table 3 and Fig. 2 seem to indicate that the doubling algorithm performs better than Newton’s method in CPU-time and the fixed-point iteration method is the slowest, as predicted in Table2. The ratios rnillustrate the O(n3) complexity of the methods. The graphs in Fig.2illustrate the quadratic convergence of the doubling algorithm and Newton’s method, with the fixed-point iteration method obviously converging linearly. Newton’s method is two to three times faster than the doubling method in terms of number of iterations, but the latter has an advantage in operation count per iteration by a factor of 3.6, resulting in its better efficiency in terms of CPU time. Note that the cputime command in MATLAB (MathWorks,2002) is not an exact reflection of CPU time consumed and should be used as a rough guide only. Also, users have no control over some parts of the algorithms, such as the inversion of the Sylvester operators by the MATLAB command lyap (MathWorks,2002) in Newton’s method.

For Example 2, only the fastest iteration method NBGS has been tested against the doubling method and the results are summarized in Table 4 (for n = 64, 128, 256, 512, 1024, 2048) and Fig. 3 (for n = 1024), with tol = 10−15. The O(n2_{) complexity of NBGS and the O(n}3_{) complexity of the doubling}

method are illustrated in the ratios rn in Table4. The linear convergence of NBGS and the quadratic convergence of the doubling method can be seen clearly in Fig. 3. NBGS usually requires less itera-tions than the doubling method and is also more efficient in terms of CPU time because of its superior

(13)

FIG. 2. Residuals for Example 1 (n_{= 1024).}

FIG. 3. Residuals for Example 2 (n= 1024).

TABLE3 CPU times and iteration numbers for Example 1

Fixed-point iteration Newton Doubling

n tn rn #It tn rn #It tn rn #It

64 0.249 — 79 0.062 — 5 0.062 — 12 128 1.809 7.26 96 1.108 17.8 5 0.561 9.04 14 256 9.111 5.04 98 3.994 3.60 5 2.371 4.23 13 512 78.35 8.59 125 20.05 5.02 6 18.22 7.68 16 1024 560.2 7.15 132 186.0 9.27 6 124.1 6.81 15 2048 5147 9.18 150 1965 10.5 6 1112 8.96 18

(14)

TABLE4 CPU times and iteration numbers for Example 2 with NBGS and doubling and the methods NBGS Doubling n tn rn #It tn rn #It 64 0.468 — 12 0.125 — 13 128 1.203 2.57 10 0.625 5.00 13 256 3.718 3.09 8 4.094 6.55 14 512 13.67 3.67 7 25.66 6.27 15 1024 51.62 3.77 6 207.0 8.07 15 2048 170.1 3.29 5 1031 4.98 17

operation count per iteration. For fixed ranks of B and F the number of iterations required decreases with respect to n, as indicated in the fourth column of Table4.

6. Concluding remarks

For the one-dimensional multistate model in transport theory we need to solve a differential equation to obtain the reflection function R. For the steady-state solution we have derived an NARE from the differential equation. We have proved the existence and uniqueness of the minimal positive solution of the NARE. When B and F are low ranked the NBGS method of O(n2_{) complexity solves the NARE}

efficiently. For the general case the doubling algorithm seems to be more efficient than Newton’s method. The numerical results support our theoretical findings.

For future work we need to improve on the efficiency of the numerical algorithms for large values of n. Finally, there are other similar models and problems in transport theory (Bellman & Wing,1975) worthy of investigation.

Funding

The Centre of Mathematical Modeling and Scientific Computing and the National Centre of Theoretical Sciences, National Chiao Tung University, Republic of China (to E.C.); The National Science Founda-tion, Republic of China (to T.L.).

REFERENCES

BAI, Z.-Z., GAO, Y.-H. & LU, L.-Z. (2008) Fast iterative schemes for nonsymmetric algebraic Riccati equations arising from transport theory. SIAM J. Sci. Comput., 30, 804–818.

BELLMAN, R. E., UENO, S. & VASUDEVAN, R. (1973) On the matrix Riccati equation of transport processes. J. Math. Anal. Appl., 44, 472–481.

BELLMAN, R. E. & WING, G. M. (1975) An Introduction to Invariant Imbedding. New York: Wiley.

BERMAN, A. & PLEMMONS, A. J. (1994) Nonnegative Matrices in the Mathematical Sciences. Philadelphia, PA: SIAM.

BINI, D. A., IANNAZZO, B. & POLONI, F. (2008) A fast Newton’s method for a nonsymmetric algebraic Riccati equation. SIAM J. Matrix Anal. Appl., 30, 276–290.

CHIANG, C.-Y., CHU, E. K.-W., GUO, C.-H., HUANG, T.-M., LIN, W.-W. & XU, S.-F. (2009) Convergence analysis of the doubling algorithm for several nonlinear matrix equations in the critical case. SIAM J. Matrix Anal. Appl., 31, 227–247.

GUO, C.-H. (2007) A new class of nonsymmetric algebraic Riccati equations. Linear Algebra Appl., 426, 636–649.

(15)

GUO, C.-H. & HIGHAM, N. J. (2007) Iterative solution of a nonsymmetric algebraic Riccati equation. SIAM J. Matrix Anal. Appl., 29, 396–412.

GUO, C.-H. & LANCASTER, P. (2006) Analysis and modification of Newton’s method for algebraic Riccati equations. Math. Comput., 67, 1089–1105.

GUO, C.-H. & LIN, W.-W. (2010) Convergence rates of some iterative methods for nonsymmetric algebraic Riccati equations arising in transport theory. Linear Algebra Appl., 432, 283–291.

GUO, X.-X., LIN, W.-W. & XU, S.-F. (2006) A structure-preserving doubling algorithm for nonsymmetric algebraic Riccati equations. Numer. Math., 103, 393–412.

HORN, R. A. & JOHNSON, C. R. (1985) Matrix Analysis. Cambridge: Cambridge University Press.

JUANG, J. (1995) Existence of algebraic matrix Riccati equations arising in transport theory. Linear Algebra Appl.,

230, 89–100.

LIN, Y., BAO, L. B. & WEI, Y. (2008) A modified Newton method for solving non-symmetric algebraic Riccati equations arising in transport theory. IMA J. Numer. Anal., 29, 215–224.

LU, L.-Z. (2005) Newton iterations for a non-symmetric algebraic Riccati equation. Numer. Linear Algebra Appl.,

12, 191–200.

MATHWORKS. (2002) MATLAB User’s Guide. Natick, MA: MathWorks.

MEHRMANN, V. & XU, H. (2008) Explicit solutions for a Riccati equation from transport theory. SIAM J. Matrix Anal., 30, 1339–1357.

WING, G. M. (1962) An Introduction to Transport Theory. New York: Wiley.