("$\Rightarrow$") Rewrite $g(x)$ as
\begin{align*}
g(x) &= \langle x - x^*, A(x - x^*)\rangle + \langle x, Ax^*\rangle + \langle x^*, Ax\rangle - \langle x^*, Ax^*\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle + 2\langle x, Ax^*\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle + 2\langle x, Ax^* - b\rangle.
\end{align*}
Suppose that $x^*$ is the solution of $Ax = b$, i.e., $Ax^* = b$. Then
\[
g(x) = \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle.
\]
Since $A$ is positive definite, $\langle x - x^*, A(x - x^*)\rangle \ge 0$ with equality only at $x = x^*$, so the minimum occurs at $x = x^*$.
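As a quick numerical sanity check (a sketch of my own, not part of the derivation; the random test problem is an assumption), one can verify that $g$ is minimized at $x^* = A^{-1}b$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive definite
b = rng.standard_normal(n)

def g(x):
    return x @ A @ x - 2 * (x @ b)       # g(x) = <x, Ax> - 2 <x, b>

x_star = np.linalg.solve(A, b)           # A x* = b
for _ in range(5):
    assert g(x_star + rng.standard_normal(n)) >= g(x_star)
print(g(x_star), -(x_star @ A @ x_star))  # g(x*) = -<x*, A x*>
```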
("$\Leftarrow$") Fix vectors $x$ and $v$. For any $\alpha \in \mathbb{R}$, let $f(\alpha) \equiv g(x + \alpha v)$. Then
\begin{align*}
f(\alpha) &= \langle x + \alpha v, Ax + \alpha Av\rangle - 2\langle x + \alpha v, b\rangle \\
&= \langle x, Ax\rangle + \alpha\langle v, Ax\rangle + \alpha\langle x, Av\rangle + \alpha^2\langle v, Av\rangle - 2\langle x, b\rangle - 2\alpha\langle v, b\rangle \\
&= \langle x, Ax\rangle - 2\langle x, b\rangle + 2\alpha\langle v, Ax\rangle - 2\alpha\langle v, b\rangle + \alpha^2\langle v, Av\rangle \\
&= g(x) + 2\alpha\langle v, Ax - b\rangle + \alpha^2\langle v, Av\rangle.
\end{align*}
Because $f$ is a quadratic function of $\alpha$ and $\langle v, Av\rangle$ is positive, $f$ attains its minimum where $f'(\alpha) = 0$. Since
\[
f'(\alpha) = 2\langle v, Ax - b\rangle + 2\alpha\langle v, Av\rangle,
\]
the minimum occurs at
\[
\hat{\alpha} = -\frac{\langle v, Ax - b\rangle}{\langle v, Av\rangle} = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}
\]
and
\[
g(x + \hat{\alpha} v) = f(\hat{\alpha}) = g(x) - 2\,\frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}\,\langle v, b - Ax\rangle + \left(\frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}\right)^2 \langle v, Av\rangle = g(x) - \frac{\langle v, b - Ax\rangle^2}{\langle v, Av\rangle}.
\]
So, for any nonzero vector $v$, we have
\[
g(x + \hat{\alpha} v) < g(x) \quad \text{if } \langle v, b - Ax\rangle \ne 0 \tag{6}
\]
and
\[
g(x + \hat{\alpha} v) = g(x) \quad \text{if } \langle v, b - Ax\rangle = 0. \tag{7}
\]
Suppose that $x^*$ is a vector that minimizes $g$. Then
\[
g(x^* + \hat{\alpha} v) \ge g(x^*) \quad \text{for any } v. \tag{8}
\]
From (6), (7) and (8), we have $\langle v, b - Ax^*\rangle = 0$ for any $v$, which implies that $Ax^* = b$.

Let
\[
r = b - Ax.
\]
Then
\[
\hat{\alpha} = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle} = \frac{\langle v, r\rangle}{\langle v, Av\rangle}.
\]
If $r \ne 0$ and $v$ and $r$ are not orthogonal, then
\[
g(x + \hat{\alpha} v) < g(x),
\]
which implies that $x + \hat{\alpha} v$ is closer to $x^*$ than $x$ is.
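This one-dimensional decrease is easy to check numerically. The sketch below (illustrative only; the test matrix and random directions are assumptions) applies the optimal step $\hat{\alpha} = \langle v, r\rangle/\langle v, Av\rangle$ along random directions and confirms the strict decrease (6):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive definite
b = rng.standard_normal(n)
g = lambda x: x @ A @ x - 2 * (x @ b)

x = rng.standard_normal(n)
for _ in range(4):
    r = b - A @ x
    v = rng.standard_normal(n)           # almost surely <v, r> != 0
    alpha = (v @ r) / (v @ A @ v)        # optimal step along v
    assert g(x + alpha * v) < g(x)       # strict decrease, inequality (6)
    x = x + alpha * v
```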
Let $x^{(0)}$ be an initial approximation to $x^*$ and $v^{(1)} \ne 0$ be an initial search direction. For $k = 1, 2, 3, \ldots$, we compute
\[
\alpha_k = \frac{\langle v^{(k)}, b - Ax^{(k-1)}\rangle}{\langle v^{(k)}, Av^{(k)}\rangle}, \qquad x^{(k)} = x^{(k-1)} + \alpha_k v^{(k)},
\]
and choose a new search direction $v^{(k+1)}$.

Question: How should we choose $\{v^{(k)}\}$ so that $\{x^{(k)}\}$ converges rapidly to $x^*$?
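In code, the generic iteration is the template below (a schematic sketch; the direction rule `choose_v` is deliberately left as a parameter, since choosing it well is exactly the question):

```python
import numpy as np

def directional_method(A, b, x0, choose_v, tol=1e-10, max_iter=1000):
    """Generic update x^(k) = x^(k-1) + alpha_k v^(k) with the optimal step."""
    x = x0.copy()
    for k in range(1, max_iter + 1):
        r = b - A @ x                    # residual b - A x^(k-1)
        if np.linalg.norm(r) < tol:
            break
        v = choose_v(A, b, x, r, k)      # search direction v^(k)
        alpha = (v @ r) / (v @ A @ v)    # alpha_k
        x = x + alpha * v
    return x
```

Taking `choose_v = lambda A, b, x, r, k: r` recovers the steepest descent method derived next.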
Let $\Phi : \mathbb{R}^n \to \mathbb{R}$ be differentiable at $x$. Then
\[
\frac{\Phi(x + \varepsilon p) - \Phi(x)}{\varepsilon} = \nabla\Phi(x)^T p + O(\varepsilon).
\]
Neglecting the $O(\varepsilon)$ term, the right-hand side is minimized over all $p$ with $\|p\| = 1$ at
\[
p = -\frac{\nabla\Phi(x)}{\|\nabla\Phi(x)\|}
\]
(the direction of steepest descent).
Denote $x = [x_1, x_2, \ldots, x_n]^T$. Then
\[
g(x) = \langle x, Ax\rangle - 2\langle x, b\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j - 2\sum_{i=1}^{n} x_i b_i.
\]
It follows that
\[
\frac{\partial g}{\partial x_k}(x) = 2\sum_{i=1}^{n} a_{ki} x_i - 2 b_k, \quad \text{for } k = 1, 2, \ldots, n.
\]
Therefore, the gradient of $g$ is
\[
\nabla g(x) = \left[\frac{\partial g}{\partial x_1}(x), \frac{\partial g}{\partial x_2}(x), \cdots, \frac{\partial g}{\partial x_n}(x)\right]^T = 2(Ax - b) = -2r.
\]
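A quick finite-difference check of $\nabla g(x) = 2(Ax - b)$ (a small verification sketch; the test problem is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
g = lambda x: x @ A @ x - 2 * (x @ b)

x = rng.standard_normal(n)
grad = 2 * (A @ x - b)                   # claimed gradient, equals -2r
eps = 1e-6
fd = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])      # central differences
print(np.max(np.abs(fd - grad)))         # tiny: the formula checks out
```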
Steepest descent method (gradient method): Given an initial $x_0 \ne 0$.
For $k = 1, 2, \ldots$:
    $r_{k-1} = b - Ax_{k-1}$;
    If $r_{k-1} = 0$, then stop;
    else $\alpha_k = \frac{r_{k-1}^T r_{k-1}}{r_{k-1}^T A r_{k-1}}$, $x_k = x_{k-1} + \alpha_k r_{k-1}$.
End for
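A direct transcription of this method (a minimal sketch; a tolerance replaces the exact test $r_{k-1} = 0$):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Gradient method for s.p.d. A: step along the residual r = b - Ax."""
    x = x0.copy()
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:      # r_{k-1} = 0: system solved
            break
        alpha = (r @ r) / (r @ A @ r)    # alpha_k = r^T r / r^T A r
        x = x + alpha * r                # x_k = x_{k-1} + alpha_k r_{k-1}
    return x
```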
Theorem 37
If $x_k, x_{k-1}$ are two consecutive approximations of the steepest descent method for solving $Ax = b$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$ are the eigenvalues of $A$, then it holds:
\[
\|x_k - x^*\|_A \le \frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\,\|x_{k-1} - x^*\|_A,
\]
where $\|x\|_A = \sqrt{x^T A x}$. Thus the gradient method is convergent.
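The contraction factor can be observed numerically (an illustrative sketch; the diagonal test matrix with eigenvalues $10 \ge 4 \ge 1$ is my own choice):

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.diag([10.0, 4.0, 1.0])            # eigenvalues 10 >= 4 >= 1
b = rng.standard_normal(3)
x_star = np.linalg.solve(A, b)
err_A = lambda x: np.sqrt((x - x_star) @ A @ (x - x_star))

factor = (10.0 - 1.0) / (10.0 + 1.0)     # (lambda_1 - lambda_n)/(lambda_1 + lambda_n)
x = rng.standard_normal(3)
for _ in range(20):
    r = b - A @ x
    x_new = x + (r @ r) / (r @ A @ r) * r
    assert err_A(x_new) <= factor * err_A(x) + 1e-12   # Theorem 37 per step
    x = x_new
```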
If the condition number $\kappa(A) = \lambda_1/\lambda_n$ of $A$ is large, then
\[
\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n} \approx 1,
\]
and the gradient method converges very slowly. Hence this method is not recommended.

It is favorable to choose the search directions $\{v^{(i)}\}$ to be mutually A-conjugate, where $A$ is symmetric positive definite.

Definition 38
Two vectors $p$ and $q$ are called A-conjugate (A-orthogonal) if $p^T A q = 0$.
Lemma 39
Let $v_1, \ldots, v_n \ne 0$ be pairwise A-conjugate. Then they are linearly independent.

Proof: From
\[
0 = \sum_{j=1}^{n} c_j v_j
\]
it follows that
\[
0 = (v_k)^T A \left(\sum_{j=1}^{n} c_j v_j\right) = \sum_{j=1}^{n} c_j (v_k)^T A v_j = c_k (v_k)^T A v_k,
\]
so $c_k = 0$, for $k = 1, \ldots, n$.
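Pairwise A-conjugate directions can be produced by Gram–Schmidt orthogonalization in the A-inner product $\langle u, v\rangle_A = u^T A v$ (a construction sketch of my own, assuming the starting vectors are linearly independent; it also verifies the lemma numerically):

```python
import numpy as np

def a_orthogonalize(A, vectors):
    """Modified Gram-Schmidt in the inner product <u, v>_A = u^T A v."""
    vs = []
    for w in vectors:
        v = w.astype(float)
        for u in vs:
            v -= (u @ A @ v) / (u @ A @ u) * u   # remove A-component along u
        vs.append(v)
    return vs

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
V = np.column_stack(a_orthogonalize(A, list(np.eye(n))))
G = V.T @ A @ V                                  # Gram matrix in <.,.>_A
print(np.max(np.abs(G - np.diag(np.diag(G)))))   # ~0: pairwise A-conjugate
print(np.linalg.matrix_rank(V))                  # n: linearly independent
```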
Theorem 40
Let $A$ be symmetric positive definite and $v_1, \ldots, v_n \in \mathbb{R}^n \setminus \{0\}$ be pairwise A-orthogonal. Given $x_0$, let $r_0 = b - Ax_0$. For $k = 1, \ldots, n$, let
\[
\alpha_k = \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle} \quad \text{and} \quad x_k = x_{k-1} + \alpha_k v_k.
\]
Then $Ax_n = b$ and
\[
\langle b - Ax_k, v_j\rangle = 0, \quad \text{for each } j = 1, 2, \ldots, k.
\]

Proof: Since, for each $k = 1, 2, \ldots, n$, $x_k = x_{k-1} + \alpha_k v_k$, we have
\begin{align*}
Ax_n &= Ax_{n-1} + \alpha_n Av_n = (Ax_{n-2} + \alpha_{n-1} Av_{n-1}) + \alpha_n Av_n = \cdots \\
&= Ax_0 + \alpha_1 Av_1 + \alpha_2 Av_2 + \cdots + \alpha_n Av_n.
\end{align*}
It implies that
\begin{align*}
\langle Ax_n - b, v_k\rangle
&= \langle Ax_0 - b, v_k\rangle + \alpha_1\langle Av_1, v_k\rangle + \cdots + \alpha_n\langle Av_n, v_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \alpha_1\langle v_1, Av_k\rangle + \cdots + \alpha_n\langle v_n, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \alpha_k\langle v_k, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}\,\langle v_k, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_{k-1}\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_0 + Ax_0 - Ax_1 + \cdots - Ax_{k-2} + Ax_{k-2} - Ax_{k-1}\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_0\rangle + \langle v_k, Ax_0 - Ax_1\rangle + \cdots + \langle v_k, Ax_{k-2} - Ax_{k-1}\rangle \\
&= \langle v_k, Ax_0 - Ax_1\rangle + \cdots + \langle v_k, Ax_{k-2} - Ax_{k-1}\rangle.
\end{align*}
For any $i$,
\[
x_i = x_{i-1} + \alpha_i v_i \quad \text{and} \quad Ax_i = Ax_{i-1} + \alpha_i Av_i,
\]
so $Ax_{i-1} - Ax_i = -\alpha_i Av_i$. Thus, for $k = 1, \ldots, n$,
\[
\langle Ax_n - b, v_k\rangle = -\alpha_1\langle v_k, Av_1\rangle - \cdots - \alpha_{k-1}\langle v_k, Av_{k-1}\rangle = 0
\]
by A-orthogonality. Since the $v_k$ span $\mathbb{R}^n$ (Lemma 39), this implies that $Ax_n = b$.

Suppose that
\[
\langle r_{k-1}, v_j\rangle = 0 \quad \text{for } j = 1, 2, \ldots, k - 1. \tag{9}
\]
By the result
\[
r_k = b - Ax_k = b - A(x_{k-1} + \alpha_k v_k) = r_{k-1} - \alpha_k Av_k,
\]
it follows that
\begin{align*}
\langle r_k, v_k\rangle &= \langle r_{k-1}, v_k\rangle - \alpha_k\langle Av_k, v_k\rangle \\
&= \langle r_{k-1}, v_k\rangle - \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}\,\langle Av_k, v_k\rangle \\
&= 0.
\end{align*}
From assumption (9) and A-orthogonality, for $j = 1, \ldots, k - 1$,
\[
\langle r_k, v_j\rangle = \langle r_{k-1}, v_j\rangle - \alpha_k\langle Av_k, v_j\rangle = 0,
\]
which completes the proof by mathematical induction.

Method of conjugate directions:
Let $A$ be symmetric positive definite, $b, x_0 \in \mathbb{R}^n$. Given $v_1, \ldots, v_n \in \mathbb{R}^n \setminus \{0\}$ pairwise A-orthogonal:
$r_0 = b - Ax_0$. For $k = 1, \ldots, n$:
    $\alpha_k = \frac{\langle v_k, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}$, $x_k = x_{k-1} + \alpha_k v_k$, $r_k = r_{k-1} - \alpha_k Av_k = b - Ax_k$.
End For
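A direct implementation of the method (a minimal sketch; the A-orthogonal directions are assumed given, e.g., from the Gram–Schmidt construction after Lemma 39):

```python
import numpy as np

def conjugate_directions(A, b, x0, directions):
    """Theorem 40: n exact steps along pairwise A-orthogonal directions."""
    x = x0.copy()
    r = b - A @ x                        # r_0
    for v in directions:                 # v_1, ..., v_n
        alpha = (v @ r) / (v @ A @ v)    # alpha_k = <v_k, r_{k-1}> / <v_k, A v_k>
        x = x + alpha * v                # x_k = x_{k-1} + alpha_k v_k
        r = r - alpha * (A @ v)          # r_k = r_{k-1} - alpha_k A v_k
    return x                             # equals A^{-1} b in exact arithmetic
```

With `directions = a_orthogonalize(A, list(np.eye(n)))` as above, the result agrees with `np.linalg.solve(A, b)` up to rounding error.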
Practical Implementation

In the $k$-th step a direction $v_k$ that is A-orthogonal to $v_1, \ldots, v_{k-1}$ must be determined; it is obtained by orthogonalizing the residual against the previous directions. Note that if $r_k \ne 0$, then $g$ decreases strictly in the direction $r_k$, since $\nabla g(x_k) = -2r_k$: for small $\varepsilon > 0$, we have $g(x_k + \varepsilon r_k) < g(x_k)$.

If $r_{k-1} = b - Ax_{k-1} \ne 0$, then we use $r_{k-1}$ to generate $v_k$ by
\[
v_k = r_{k-1} + \beta_{k-1} v_{k-1}. \tag{10}
\]
Choose $\beta_{k-1}$ such that
\begin{align*}
0 &= \langle v_{k-1}, Av_k\rangle = \langle v_{k-1}, Ar_{k-1} + \beta_{k-1} Av_{k-1}\rangle \\
&= \langle v_{k-1}, Ar_{k-1}\rangle + \beta_{k-1}\langle v_{k-1}, Av_{k-1}\rangle.
\end{align*}
That is,
\[
\beta_{k-1} = -\frac{\langle v_{k-1}, Ar_{k-1}\rangle}{\langle v_{k-1}, Av_{k-1}\rangle}. \tag{11}
\]

Theorem 41
Let $v_k$ and $\beta_{k-1}$ be defined in (10) and (11), respectively. Then $r_0, \ldots, r_{k-1}$ are mutually orthogonal and
\[
\langle v_k, Av_i\rangle = 0, \quad \text{for } i = 1, 2, \ldots, k - 1.
\]
That is, $\{v_1, \ldots, v_k\}$ is an A-orthogonal set.

Having chosen $v_k$, we compute
\begin{align*}
\alpha_k &= \frac{\langle v_k, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} = \frac{\langle r_{k-1} + \beta_{k-1} v_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} \\
&= \frac{\langle r_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} + \beta_{k-1}\,\frac{\langle v_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} \\
&= \frac{\langle r_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}, \tag{12}
\end{align*}
since $\langle v_{k-1}, r_{k-1}\rangle = 0$ by Theorem 40.
Since
\[
r_k = r_{k-1} - \alpha_k Av_k,
\]
we have
\[
\langle r_k, r_k\rangle = \langle r_{k-1}, r_k\rangle - \alpha_k\langle Av_k, r_k\rangle = -\alpha_k\langle r_k, Av_k\rangle,
\]
using the orthogonality $\langle r_{k-1}, r_k\rangle = 0$ from Theorem 41. Further, from (12), $\langle r_{k-1}, r_{k-1}\rangle = \alpha_k\langle v_k, Av_k\rangle$, so
\begin{align*}
\beta_k &= -\frac{\langle v_k, Ar_k\rangle}{\langle v_k, Av_k\rangle} = -\frac{\langle r_k, Av_k\rangle}{\langle v_k, Av_k\rangle} \\
&= \frac{(1/\alpha_k)\langle r_k, r_k\rangle}{(1/\alpha_k)\langle r_{k-1}, r_{k-1}\rangle} = \frac{\langle r_k, r_k\rangle}{\langle r_{k-1}, r_{k-1}\rangle}.
\end{align*}
Algorithm 4 (Conjugate Gradient method (CG-method))
Let $A$ be s.p.d., $b \in \mathbb{R}^n$. Choose $x_0 \in \mathbb{R}^n$ and set $r_0 = b - Ax_0 = v_0$. If $r_0 = 0$, then $N = 0$ and stop; otherwise, for $k = 0, 1, \ldots$:
(a) $\alpha_k = \frac{\langle r_k, r_k\rangle}{\langle v_k, Av_k\rangle}$,
(b) $x_{k+1} = x_k + \alpha_k v_k$,
(c) $r_{k+1} = r_k - \alpha_k Av_k$,
(d) If $r_{k+1} = 0$, let $N = k + 1$ and stop,
(e) $\beta_k = \frac{\langle r_{k+1}, r_{k+1}\rangle}{\langle r_k, r_k\rangle}$,
(f) $v_{k+1} = r_{k+1} + \beta_k v_k$.
Theoretically, the exact solution is obtained in at most $n$ steps.
If $A$ is well-conditioned, then a good approximate solution is obtained in about $\sqrt{n}$ steps.
If $A$ is ill-conditioned, then the number of iterations may be greater than $n$.
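Algorithm 4 translates directly into code (a minimal sketch; a tolerance replaces the exact test $r_{k+1} = 0$, and the default cap of $n$ iterations reflects the finite-termination property rather than a floating-point guarantee):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """CG for s.p.d. A, following steps (a)-(f) of Algorithm 4."""
    x = x0.copy()
    r = b - A @ x                        # r_0 = b - A x_0
    v = r.copy()                         # v_0 = r_0
    rho = r @ r
    for _ in range(max_iter or len(b)):
        if np.sqrt(rho) < tol:           # covers r_0 = 0 and step (d)
            break
        Av = A @ v
        alpha = rho / (v @ Av)           # (a)
        x = x + alpha * v                # (b)
        r = r - alpha * Av               # (c)
        rho_new = r @ r
        beta = rho_new / rho             # (e)
        v = r + beta * v                 # (f)
        rho = rho_new
    return x
```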
Select a nonsingular matrix $C$ so that $\tilde{A} = C^{-1}AC^{-T}$ is better conditioned. Consider the linear system $\tilde{A}\tilde{x} = \tilde{b}$, where
\[
\tilde{x} = C^T x \quad \text{and} \quad \tilde{b} = C^{-1}b.
\]
Then
\[
\tilde{A}\tilde{x} = (C^{-1}AC^{-T})(C^T x) = C^{-1}Ax.
\]
Thus,
\[
Ax = b \iff \tilde{A}\tilde{x} = \tilde{b}, \quad \text{with } x = C^{-T}\tilde{x}.
\]
Since $\tilde{x}_k = C^T x_k$, we have
\[
\tilde{r}_k = \tilde{b} - \tilde{A}\tilde{x}_k = C^{-1}b - C^{-1}AC^{-T}\left(C^T x_k\right) = C^{-1}(b - Ax_k) = C^{-1} r_k.
\]
Let $\tilde{v}_k = C^T v_k$ and $w_k = C^{-1} r_k$. Then
\[
\tilde{\beta}_k = \frac{\langle \tilde{r}_k, \tilde{r}_k\rangle}{\langle \tilde{r}_{k-1}, \tilde{r}_{k-1}\rangle} = \frac{\langle C^{-1}r_k, C^{-1}r_k\rangle}{\langle C^{-1}r_{k-1}, C^{-1}r_{k-1}\rangle} = \frac{\langle w_k, w_k\rangle}{\langle w_{k-1}, w_{k-1}\rangle}.
\]
Thus,
\[
\tilde{\alpha}_k = \frac{\langle \tilde{r}_{k-1}, \tilde{r}_{k-1}\rangle}{\langle \tilde{v}_k, \tilde{A}\tilde{v}_k\rangle} = \frac{\langle C^{-1}r_{k-1}, C^{-1}r_{k-1}\rangle}{\langle C^T v_k, C^{-1}AC^{-T}C^T v_k\rangle} = \frac{\langle w_{k-1}, w_{k-1}\rangle}{\langle C^T v_k, C^{-1}Av_k\rangle},
\]
and, since
\[
\langle C^T v_k, C^{-1}Av_k\rangle = (v_k)^T C C^{-1} A v_k = (v_k)^T A v_k = \langle v_k, Av_k\rangle,
\]
we have
\[
\tilde{\alpha}_k = \frac{\langle w_{k-1}, w_{k-1}\rangle}{\langle v_k, Av_k\rangle}.
\]
Further, $\tilde{x}_k = \tilde{x}_{k-1} + \tilde{\alpha}_k \tilde{v}_k$, so $C^T x_k = C^T x_{k-1} + \tilde{\alpha}_k C^T v_k$ and
\[
x_k = x_{k-1} + \tilde{\alpha}_k v_k.
\]
Continuing, $\tilde{r}_k = \tilde{r}_{k-1} - \tilde{\alpha}_k \tilde{A}\tilde{v}_k$, so
\[
C^{-1}r_k = C^{-1}r_{k-1} - \tilde{\alpha}_k C^{-1}AC^{-T}C^T v_k \quad \text{and} \quad r_k = r_{k-1} - \tilde{\alpha}_k Av_k.
\]
Finally, $\tilde{v}_{k+1} = \tilde{r}_k + \tilde{\beta}_k \tilde{v}_k$ and $C^T v_{k+1} = C^{-1}r_k + \tilde{\beta}_k C^T v_k$, so
\[
v_{k+1} = C^{-T}C^{-1}r_k + \tilde{\beta}_k v_k = C^{-T}w_k + \tilde{\beta}_k v_k.
\]
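Collecting these formulas yields the preconditioned CG iteration. In practice one works with $M = CC^T$ and solves $Mz_k = r_k$ instead of forming $C^{-1}$; note $z_k = C^{-T}w_k$ and $\langle w_k, w_k\rangle = \langle r_k, z_k\rangle$. A sketch of this iteration follows (the Jacobi choice $M = \operatorname{diag}(A)$ in the usage comment is only an illustrative assumption):

```python
import numpy as np

def preconditioned_cg(A, b, x0, M_solve, tol=1e-10, max_iter=None):
    """PCG with M = C C^T; M_solve(r) returns M^{-1} r."""
    x = x0.copy()
    r = b - A @ x
    z = M_solve(r)                       # z_0 = M^{-1} r_0 = C^{-T} w_0
    v = z.copy()                         # transformed v~_0 = r~_0
    rho = r @ z                          # <w_0, w_0> = r_0^T M^{-1} r_0
    for _ in range(max_iter or len(b)):
        if np.linalg.norm(r) < tol:
            break
        Av = A @ v
        alpha = rho / (v @ Av)           # alpha~_k = <w_{k-1}, w_{k-1}> / <v_k, A v_k>
        x = x + alpha * v                # x_k = x_{k-1} + alpha~_k v_k
        r = r - alpha * Av               # r_k = r_{k-1} - alpha~_k A v_k
        z = M_solve(r)                   # C^{-T} w_k
        rho_new = r @ z                  # <w_k, w_k>
        beta = rho_new / rho             # beta~_k
        v = z + beta * v                 # v_{k+1} = C^{-T} w_k + beta~_k v_k
        rho = rho_new
    return x

# Example usage with the (assumed) Jacobi preconditioner M = diag(A):
# x = preconditioned_cg(A, b, np.zeros_like(b), lambda r: r / np.diag(A))
```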