("$\Rightarrow$") Rewrite $g(x)$ as
\begin{align*}
g(x) &= \langle x - x^*, A(x - x^*)\rangle + \langle x, Ax^*\rangle + \langle x^*, Ax\rangle - \langle x^*, Ax^*\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle + 2\langle x, Ax^*\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle + 2\langle x, Ax^* - b\rangle.
\end{align*}
Suppose that $x^*$ is the solution of $Ax = b$, i.e., $Ax^* = b$. Then
\[
g(x) = \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle.
\]
Since $A$ is positive definite, $\langle x - x^*, A(x - x^*)\rangle \ge 0$ with equality only at $x = x^*$, so the minimum occurs at $x = x^*$.
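As a quick numerical sanity check (a sketch of my own, not part of the derivation; the random test problem is an assumption), one can verify that $g$ is minimized at $x^* = A^{-1}b$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive definite
b = rng.standard_normal(n)

def g(x):
    return x @ A @ x - 2 * (x @ b)       # g(x) = <x, Ax> - 2 <x, b>

x_star = np.linalg.solve(A, b)           # A x* = b
for _ in range(5):
    assert g(x_star + rng.standard_normal(n)) >= g(x_star)
print(g(x_star), -(x_star @ A @ x_star))  # g(x*) = -<x*, A x*>
```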
("$\Leftarrow$") Fix vectors $x$ and $v$. For any $\alpha \in \mathbb{R}$, let $f(\alpha) \equiv g(x + \alpha v)$. Then
\begin{align*}
f(\alpha) &= \langle x + \alpha v, Ax + \alpha Av\rangle - 2\langle x + \alpha v, b\rangle \\
&= \langle x, Ax\rangle + \alpha\langle v, Ax\rangle + \alpha\langle x, Av\rangle + \alpha^2\langle v, Av\rangle - 2\langle x, b\rangle - 2\alpha\langle v, b\rangle \\
&= \langle x, Ax\rangle - 2\langle x, b\rangle + 2\alpha\langle v, Ax\rangle - 2\alpha\langle v, b\rangle + \alpha^2\langle v, Av\rangle \\
&= g(x) + 2\alpha\langle v, Ax - b\rangle + \alpha^2\langle v, Av\rangle.
\end{align*}
Because $f$ is a quadratic function of $\alpha$ and $\langle v, Av\rangle$ is positive, $f$ attains its minimum where $f'(\alpha) = 0$. Since
\[
f'(\alpha) = 2\langle v, Ax - b\rangle + 2\alpha\langle v, Av\rangle,
\]
the minimum occurs at
\[
\hat{\alpha} = -\frac{\langle v, Ax - b\rangle}{\langle v, Av\rangle} = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}
\]
and
\[
g(x + \hat{\alpha} v) = f(\hat{\alpha}) = g(x) - 2\,\frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}\,\langle v, b - Ax\rangle + \left(\frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}\right)^2 \langle v, Av\rangle = g(x) - \frac{\langle v, b - Ax\rangle^2}{\langle v, Av\rangle}.
\]
So, for any nonzero vector $v$, we have
\[
g(x + \hat{\alpha} v) < g(x) \quad \text{if } \langle v, b - Ax\rangle \ne 0 \tag{6}
\]
and
\[
g(x + \hat{\alpha} v) = g(x) \quad \text{if } \langle v, b - Ax\rangle = 0. \tag{7}
\]
Suppose that $x^*$ is a vector that minimizes $g$. Then
\[
g(x^* + \hat{\alpha} v) \ge g(x^*) \quad \text{for any } v. \tag{8}
\]
From (6), (7) and (8), we have $\langle v, b - Ax^*\rangle = 0$ for any $v$, which implies that $Ax^* = b$.

Let
\[
r = b - Ax.
\]
Then
\[
\hat{\alpha} = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle} = \frac{\langle v, r\rangle}{\langle v, Av\rangle}.
\]
If $r \ne 0$ and $v$ and $r$ are not orthogonal, then
\[
g(x + \hat{\alpha} v) < g(x),
\]
which implies that $x + \hat{\alpha} v$ is closer to $x^*$ than $x$ is.
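This one-dimensional decrease is easy to check numerically. The sketch below (illustrative only; the test matrix and random directions are assumptions) applies the optimal step $\hat{\alpha} = \langle v, r\rangle/\langle v, Av\rangle$ along random directions and confirms the strict decrease (6):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive definite
b = rng.standard_normal(n)
g = lambda x: x @ A @ x - 2 * (x @ b)

x = rng.standard_normal(n)
for _ in range(4):
    r = b - A @ x
    v = rng.standard_normal(n)           # almost surely <v, r> != 0
    alpha = (v @ r) / (v @ A @ v)        # optimal step along v
    assert g(x + alpha * v) < g(x)       # strict decrease, inequality (6)
    x = x + alpha * v
```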
Let $x^{(0)}$ be an initial approximation to $x^*$ and $v^{(1)} \ne 0$ be an initial search direction. For $k = 1, 2, 3, \ldots$, we compute
\[
\alpha_k = \frac{\langle v^{(k)}, b - Ax^{(k-1)}\rangle}{\langle v^{(k)}, Av^{(k)}\rangle}, \qquad x^{(k)} = x^{(k-1)} + \alpha_k v^{(k)},
\]
and choose a new search direction $v^{(k+1)}$.

Question: How should we choose $\{v^{(k)}\}$ so that $\{x^{(k)}\}$ converges rapidly to $x^*$?
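In code, the generic iteration is the template below (a schematic sketch; the direction rule `choose_v` is deliberately left as a parameter, since choosing it well is exactly the question):

```python
import numpy as np

def directional_method(A, b, x0, choose_v, tol=1e-10, max_iter=1000):
    """Generic update x^(k) = x^(k-1) + alpha_k v^(k) with the optimal step."""
    x = x0.copy()
    for k in range(1, max_iter + 1):
        r = b - A @ x                    # residual b - A x^(k-1)
        if np.linalg.norm(r) < tol:
            break
        v = choose_v(A, b, x, r, k)      # search direction v^(k)
        alpha = (v @ r) / (v @ A @ v)    # alpha_k
        x = x + alpha * v
    return x
```

Taking `choose_v = lambda A, b, x, r, k: r` recovers the steepest descent method derived next.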
Let $\Phi : \mathbb{R}^n \to \mathbb{R}$ be differentiable at $x$. Then
\[
\frac{\Phi(x + \varepsilon p) - \Phi(x)}{\varepsilon} = \nabla\Phi(x)^T p + O(\varepsilon).
\]
Neglecting the $O(\varepsilon)$ term, the right-hand side is minimized over all $p$ with $\|p\| = 1$ at
\[
p = -\frac{\nabla\Phi(x)}{\|\nabla\Phi(x)\|}
\]
(the direction of steepest descent).
Denote $x = [x_1, x_2, \ldots, x_n]^T$. Then
\[
g(x) = \langle x, Ax\rangle - 2\langle x, b\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j - 2\sum_{i=1}^{n} x_i b_i.
\]
It follows that
\[
\frac{\partial g}{\partial x_k}(x) = 2\sum_{i=1}^{n} a_{ki} x_i - 2 b_k, \quad \text{for } k = 1, 2, \ldots, n.
\]
Therefore, the gradient of $g$ is
\[
\nabla g(x) = \left[\frac{\partial g}{\partial x_1}(x), \frac{\partial g}{\partial x_2}(x), \cdots, \frac{\partial g}{\partial x_n}(x)\right]^T = 2(Ax - b) = -2r.
\]
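A quick finite-difference check of $\nabla g(x) = 2(Ax - b)$ (a small verification sketch; the test problem is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
g = lambda x: x @ A @ x - 2 * (x @ b)

x = rng.standard_normal(n)
grad = 2 * (A @ x - b)                   # claimed gradient, equals -2r
eps = 1e-6
fd = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])      # central differences
print(np.max(np.abs(fd - grad)))         # tiny: the formula checks out
```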
Steepest descent method (gradient method): Given an initial $x_0 \ne 0$.
For $k = 1, 2, \ldots$:
    $r_{k-1} = b - Ax_{k-1}$;
    If $r_{k-1} = 0$, then stop;
    else $\alpha_k = \frac{r_{k-1}^T r_{k-1}}{r_{k-1}^T A r_{k-1}}$, $x_k = x_{k-1} + \alpha_k r_{k-1}$.
End for
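A direct transcription of this method (a minimal sketch; a tolerance replaces the exact test $r_{k-1} = 0$):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Gradient method for s.p.d. A: step along the residual r = b - Ax."""
    x = x0.copy()
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:      # r_{k-1} = 0: system solved
            break
        alpha = (r @ r) / (r @ A @ r)    # alpha_k = r^T r / r^T A r
        x = x + alpha * r                # x_k = x_{k-1} + alpha_k r_{k-1}
    return x
```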
Theorem 37
If $x_k, x_{k-1}$ are two consecutive approximations of the steepest descent method for solving $Ax = b$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$ are the eigenvalues of $A$, then it holds:
\[
\|x_k - x^*\|_A \le \frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\,\|x_{k-1} - x^*\|_A,
\]
where $\|x\|_A = \sqrt{x^T A x}$. Thus the gradient method is convergent.
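The contraction factor can be observed numerically (an illustrative sketch; the diagonal test matrix with eigenvalues $10 \ge 4 \ge 1$ is my own choice):

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.diag([10.0, 4.0, 1.0])            # eigenvalues 10 >= 4 >= 1
b = rng.standard_normal(3)
x_star = np.linalg.solve(A, b)
err_A = lambda x: np.sqrt((x - x_star) @ A @ (x - x_star))

factor = (10.0 - 1.0) / (10.0 + 1.0)     # (lambda_1 - lambda_n)/(lambda_1 + lambda_n)
x = rng.standard_normal(3)
for _ in range(20):
    r = b - A @ x
    x_new = x + (r @ r) / (r @ A @ r) * r
    assert err_A(x_new) <= factor * err_A(x) + 1e-12   # Theorem 37 per step
    x = x_new
```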
If the condition number $\kappa(A) = \lambda_1/\lambda_n$ of $A$ is large, then
\[
\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n} \approx 1,
\]
and the gradient method converges very slowly. Hence this method is not recommended.

It is favorable to choose the search directions $\{v^{(i)}\}$ to be mutually A-conjugate, where $A$ is symmetric positive definite.

Definition 38
Two vectors $p$ and $q$ are called A-conjugate (A-orthogonal) if $p^T A q = 0$.
Lemma 39
Let $v_1, \ldots, v_n \ne 0$ be pairwise A-conjugate. Then they are linearly independent.

Proof: From
\[
0 = \sum_{j=1}^{n} c_j v_j
\]
it follows that
\[
0 = (v_k)^T A \left(\sum_{j=1}^{n} c_j v_j\right) = \sum_{j=1}^{n} c_j (v_k)^T A v_j = c_k (v_k)^T A v_k,
\]
so $c_k = 0$, for $k = 1, \ldots, n$.
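Pairwise A-conjugate directions can be produced by Gram–Schmidt orthogonalization in the A-inner product $\langle u, v\rangle_A = u^T A v$ (a construction sketch of my own, assuming the starting vectors are linearly independent; it also verifies the lemma numerically):

```python
import numpy as np

def a_orthogonalize(A, vectors):
    """Modified Gram-Schmidt in the inner product <u, v>_A = u^T A v."""
    vs = []
    for w in vectors:
        v = w.astype(float)
        for u in vs:
            v -= (u @ A @ v) / (u @ A @ u) * u   # remove A-component along u
        vs.append(v)
    return vs

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
V = np.column_stack(a_orthogonalize(A, list(np.eye(n))))
G = V.T @ A @ V                                  # Gram matrix in <.,.>_A
print(np.max(np.abs(G - np.diag(np.diag(G)))))   # ~0: pairwise A-conjugate
print(np.linalg.matrix_rank(V))                  # n: linearly independent
```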
Theorem 40
Let $A$ be symmetric positive definite and $v_1, \ldots, v_n \in \mathbb{R}^n \setminus \{0\}$ be pairwise A-orthogonal. Given $x_0$, let $r_0 = b - Ax_0$. For $k = 1, \ldots, n$, let
\[
\alpha_k = \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle} \quad \text{and} \quad x_k = x_{k-1} + \alpha_k v_k.
\]
Then $Ax_n = b$ and
\[
\langle b - Ax_k, v_j\rangle = 0, \quad \text{for each } j = 1, 2, \ldots, k.
\]

Proof: Since, for each $k = 1, 2, \ldots, n$, $x_k = x_{k-1} + \alpha_k v_k$, we have
\begin{align*}
Ax_n &= Ax_{n-1} + \alpha_n Av_n = (Ax_{n-2} + \alpha_{n-1} Av_{n-1}) + \alpha_n Av_n = \cdots \\
&= Ax_0 + \alpha_1 Av_1 + \alpha_2 Av_2 + \cdots + \alpha_n Av_n.
\end{align*}
It implies that
\begin{align*}
\langle Ax_n - b, v_k\rangle
&= \langle Ax_0 - b, v_k\rangle + \alpha_1\langle Av_1, v_k\rangle + \cdots + \alpha_n\langle Av_n, v_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \alpha_1\langle v_1, Av_k\rangle + \cdots + \alpha_n\langle v_n, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \alpha_k\langle v_k, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}\,\langle v_k, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_{k-1}\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_0 + Ax_0 - Ax_1 + \cdots - Ax_{k-2} + Ax_{k-2} - Ax_{k-1}\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_0\rangle + \langle v_k, Ax_0 - Ax_1\rangle + \cdots + \langle v_k, Ax_{k-2} - Ax_{k-1}\rangle \\
&= \langle v_k, Ax_0 - Ax_1\rangle + \cdots + \langle v_k, Ax_{k-2} - Ax_{k-1}\rangle.
\end{align*}
For any $i$,
\[
x_i = x_{i-1} + \alpha_i v_i \quad \text{and} \quad Ax_i = Ax_{i-1} + \alpha_i Av_i,
\]
so $Ax_{i-1} - Ax_i = -\alpha_i Av_i$. Thus, for $k = 1, \ldots, n$,
\[
\langle Ax_n - b, v_k\rangle = -\alpha_1\langle v_k, Av_1\rangle - \cdots - \alpha_{k-1}\langle v_k, Av_{k-1}\rangle = 0
\]
by A-orthogonality. Since the $v_k$ span $\mathbb{R}^n$ (Lemma 39), this implies that $Ax_n = b$.

Suppose that
\[
\langle r_{k-1}, v_j\rangle = 0 \quad \text{for } j = 1, 2, \ldots, k - 1. \tag{9}
\]
By the result
\[
r_k = b - Ax_k = b - A(x_{k-1} + \alpha_k v_k) = r_{k-1} - \alpha_k Av_k,
\]
it follows that
\begin{align*}
\langle r_k, v_k\rangle &= \langle r_{k-1}, v_k\rangle - \alpha_k\langle Av_k, v_k\rangle \\
&= \langle r_{k-1}, v_k\rangle - \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}\,\langle Av_k, v_k\rangle \\
&= 0.
\end{align*}
From assumption (9) and A-orthogonality, for $j = 1, \ldots, k - 1$,
\[
\langle r_k, v_j\rangle = \langle r_{k-1}, v_j\rangle - \alpha_k\langle Av_k, v_j\rangle = 0,
\]
which completes the proof by mathematical induction.

Method of conjugate directions:
Let $A$ be symmetric positive definite, $b, x_0 \in \mathbb{R}^n$. Given $v_1, \ldots, v_n \in \mathbb{R}^n \setminus \{0\}$ pairwise A-orthogonal:
$r_0 = b - Ax_0$. For $k = 1, \ldots, n$:
    $\alpha_k = \frac{\langle v_k, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}$, $x_k = x_{k-1} + \alpha_k v_k$, $r_k = r_{k-1} - \alpha_k Av_k = b - Ax_k$.
End For
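A direct implementation of the method (a minimal sketch; the A-orthogonal directions are assumed given, e.g., from the Gram–Schmidt construction after Lemma 39):

```python
import numpy as np

def conjugate_directions(A, b, x0, directions):
    """Theorem 40: n exact steps along pairwise A-orthogonal directions."""
    x = x0.copy()
    r = b - A @ x                        # r_0
    for v in directions:                 # v_1, ..., v_n
        alpha = (v @ r) / (v @ A @ v)    # alpha_k = <v_k, r_{k-1}> / <v_k, A v_k>
        x = x + alpha * v                # x_k = x_{k-1} + alpha_k v_k
        r = r - alpha * (A @ v)          # r_k = r_{k-1} - alpha_k A v_k
    return x                             # equals A^{-1} b in exact arithmetic
```

With `directions = a_orthogonalize(A, list(np.eye(n)))` as above, the result agrees with `np.linalg.solve(A, b)` up to rounding error.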
Practical Implementation

In the $k$-th step a direction $v_k$ that is A-orthogonal to $v_1, \ldots, v_{k-1}$ must be determined; it is obtained by orthogonalizing the residual against the previous directions. Note that if $r_k \ne 0$, then $g$ decreases strictly in the direction $r_k$, since $\nabla g(x_k) = -2r_k$: for small $\varepsilon > 0$, we have $g(x_k + \varepsilon r_k) < g(x_k)$.

If $r_{k-1} = b - Ax_{k-1} \ne 0$, then we use $r_{k-1}$ to generate $v_k$ by
\[
v_k = r_{k-1} + \beta_{k-1} v_{k-1}. \tag{10}
\]
Choose $\beta_{k-1}$ such that
\begin{align*}
0 &= \langle v_{k-1}, Av_k\rangle = \langle v_{k-1}, Ar_{k-1} + \beta_{k-1} Av_{k-1}\rangle \\
&= \langle v_{k-1}, Ar_{k-1}\rangle + \beta_{k-1}\langle v_{k-1}, Av_{k-1}\rangle.
\end{align*}
That is,
\[
\beta_{k-1} = -\frac{\langle v_{k-1}, Ar_{k-1}\rangle}{\langle v_{k-1}, Av_{k-1}\rangle}. \tag{11}
\]

Theorem 41
Let $v_k$ and $\beta_{k-1}$ be defined in (10) and (11), respectively. Then $r_0, \ldots, r_{k-1}$ are mutually orthogonal and
\[
\langle v_k, Av_i\rangle = 0, \quad \text{for } i = 1, 2, \ldots, k - 1.
\]
That is, $\{v_1, \ldots, v_k\}$ is an A-orthogonal set.

Having chosen $v_k$, we compute
\begin{align*}
\alpha_k &= \frac{\langle v_k, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} = \frac{\langle r_{k-1} + \beta_{k-1} v_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} \\
&= \frac{\langle r_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} + \beta_{k-1}\,\frac{\langle v_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} \\
&= \frac{\langle r_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}, \tag{12}
\end{align*}
since $\langle v_{k-1}, r_{k-1}\rangle = 0$ by Theorem 40.
Since
\[
r_k = r_{k-1} - \alpha_k Av_k,
\]
we have
\[
\langle r_k, r_k\rangle = \langle r_{k-1}, r_k\rangle - \alpha_k\langle Av_k, r_k\rangle = -\alpha_k\langle r_k, Av_k\rangle,
\]
using the orthogonality $\langle r_{k-1}, r_k\rangle = 0$ from Theorem 41. Further, from (12), $\langle r_{k-1}, r_{k-1}\rangle = \alpha_k\langle v_k, Av_k\rangle$, so
\begin{align*}
\beta_k &= -\frac{\langle v_k, Ar_k\rangle}{\langle v_k, Av_k\rangle} = -\frac{\langle r_k, Av_k\rangle}{\langle v_k, Av_k\rangle} \\
&= \frac{(1/\alpha_k)\langle r_k, r_k\rangle}{(1/\alpha_k)\langle r_{k-1}, r_{k-1}\rangle} = \frac{\langle r_k, r_k\rangle}{\langle r_{k-1}, r_{k-1}\rangle}.
\end{align*}
Algorithm 4 (Conjugate Gradient method (CG-method))
Let $A$ be s.p.d., $b \in \mathbb{R}^n$. Choose $x_0 \in \mathbb{R}^n$ and set $r_0 = b - Ax_0 = v_0$. If $r_0 = 0$, then $N = 0$ and stop; otherwise, for $k = 0, 1, \ldots$:
(a) $\alpha_k = \frac{\langle r_k, r_k\rangle}{\langle v_k, Av_k\rangle}$,
(b) $x_{k+1} = x_k + \alpha_k v_k$,
(c) $r_{k+1} = r_k - \alpha_k Av_k$,
(d) If $r_{k+1} = 0$, let $N = k + 1$ and stop,
(e) $\beta_k = \frac{\langle r_{k+1}, r_{k+1}\rangle}{\langle r_k, r_k\rangle}$,
(f) $v_{k+1} = r_{k+1} + \beta_k v_k$.
Theoretically, the exact solution is obtained in at most $n$ steps.
If $A$ is well-conditioned, then a good approximate solution is obtained in about $\sqrt{n}$ steps.
If $A$ is ill-conditioned, then the number of iterations may be greater than $n$.
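Algorithm 4 translates directly into code (a minimal sketch; a tolerance replaces the exact test $r_{k+1} = 0$, and the default cap of $n$ iterations reflects the finite-termination property rather than a floating-point guarantee):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """CG for s.p.d. A, following steps (a)-(f) of Algorithm 4."""
    x = x0.copy()
    r = b - A @ x                        # r_0 = b - A x_0
    v = r.copy()                         # v_0 = r_0
    rho = r @ r
    for _ in range(max_iter or len(b)):
        if np.sqrt(rho) < tol:           # covers r_0 = 0 and step (d)
            break
        Av = A @ v
        alpha = rho / (v @ Av)           # (a)
        x = x + alpha * v                # (b)
        r = r - alpha * Av               # (c)
        rho_new = r @ r
        beta = rho_new / rho             # (e)
        v = r + beta * v                 # (f)
        rho = rho_new
    return x
```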
Select a nonsingular matrix $C$ so that $\tilde{A} = C^{-1}AC^{-T}$ is better conditioned. Consider the linear system $\tilde{A}\tilde{x} = \tilde{b}$, where
\[
\tilde{x} = C^T x \quad \text{and} \quad \tilde{b} = C^{-1}b.
\]
Then
\[
\tilde{A}\tilde{x} = (C^{-1}AC^{-T})(C^T x) = C^{-1}Ax.
\]
Thus,
\[
Ax = b \iff \tilde{A}\tilde{x} = \tilde{b}, \quad \text{with } x = C^{-T}\tilde{x}.
\]
Since $\tilde{x}_k = C^T x_k$, we have
\[
\tilde{r}_k = \tilde{b} - \tilde{A}\tilde{x}_k = C^{-1}b - C^{-1}AC^{-T}\left(C^T x_k\right) = C^{-1}(b - Ax_k) = C^{-1} r_k.
\]
Let $\tilde{v}_k = C^T v_k$ and $w_k = C^{-1} r_k$. Then
\[
\tilde{\beta}_k = \frac{\langle \tilde{r}_k, \tilde{r}_k\rangle}{\langle \tilde{r}_{k-1}, \tilde{r}_{k-1}\rangle} = \frac{\langle C^{-1}r_k, C^{-1}r_k\rangle}{\langle C^{-1}r_{k-1}, C^{-1}r_{k-1}\rangle} = \frac{\langle w_k, w_k\rangle}{\langle w_{k-1}, w_{k-1}\rangle}.
\]
Thus,
\[
\tilde{\alpha}_k = \frac{\langle \tilde{r}_{k-1}, \tilde{r}_{k-1}\rangle}{\langle \tilde{v}_k, \tilde{A}\tilde{v}_k\rangle} = \frac{\langle C^{-1}r_{k-1}, C^{-1}r_{k-1}\rangle}{\langle C^T v_k, C^{-1}AC^{-T}C^T v_k\rangle} = \frac{\langle w_{k-1}, w_{k-1}\rangle}{\langle C^T v_k, C^{-1}Av_k\rangle},
\]
and, since
\[
\langle C^T v_k, C^{-1}Av_k\rangle = (v_k)^T C C^{-1} A v_k = (v_k)^T A v_k = \langle v_k, Av_k\rangle,
\]
we have
\[
\tilde{\alpha}_k = \frac{\langle w_{k-1}, w_{k-1}\rangle}{\langle v_k, Av_k\rangle}.
\]
Further, $\tilde{x}_k = \tilde{x}_{k-1} + \tilde{\alpha}_k \tilde{v}_k$, so $C^T x_k = C^T x_{k-1} + \tilde{\alpha}_k C^T v_k$ and
\[
x_k = x_{k-1} + \tilde{\alpha}_k v_k.
\]
Continuing, $\tilde{r}_k = \tilde{r}_{k-1} - \tilde{\alpha}_k \tilde{A}\tilde{v}_k$, so
\[
C^{-1}r_k = C^{-1}r_{k-1} - \tilde{\alpha}_k C^{-1}AC^{-T}C^T v_k \quad \text{and} \quad r_k = r_{k-1} - \tilde{\alpha}_k Av_k.
\]
Finally, $\tilde{v}_{k+1} = \tilde{r}_k + \tilde{\beta}_k \tilde{v}_k$ and $C^T v_{k+1} = C^{-1}r_k + \tilde{\beta}_k C^T v_k$, so
\[
v_{k+1} = C^{-T}C^{-1}r_k + \tilde{\beta}_k v_k = C^{-T}w_k + \tilde{\beta}_k v_k.
\]
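Collecting these formulas yields the preconditioned CG iteration. In practice one works with $M = CC^T$ and solves $Mz_k = r_k$ instead of forming $C^{-1}$; note $z_k = C^{-T}w_k$ and $\langle w_k, w_k\rangle = \langle r_k, z_k\rangle$. A sketch of this iteration follows (the Jacobi choice $M = \operatorname{diag}(A)$ in the usage comment is only an illustrative assumption):

```python
import numpy as np

def preconditioned_cg(A, b, x0, M_solve, tol=1e-10, max_iter=None):
    """PCG with M = C C^T; M_solve(r) returns M^{-1} r."""
    x = x0.copy()
    r = b - A @ x
    z = M_solve(r)                       # z_0 = M^{-1} r_0 = C^{-T} w_0
    v = z.copy()                         # transformed v~_0 = r~_0
    rho = r @ z                          # <w_0, w_0> = r_0^T M^{-1} r_0
    for _ in range(max_iter or len(b)):
        if np.linalg.norm(r) < tol:
            break
        Av = A @ v
        alpha = rho / (v @ Av)           # alpha~_k = <w_{k-1}, w_{k-1}> / <v_k, A v_k>
        x = x + alpha * v                # x_k = x_{k-1} + alpha~_k v_k
        r = r - alpha * Av               # r_k = r_{k-1} - alpha~_k A v_k
        z = M_solve(r)                   # C^{-T} w_k
        rho_new = r @ z                  # <w_k, w_k>
        beta = rho_new / rho             # beta~_k
        v = z + beta * v                 # v_{k+1} = C^{-T} w_k + beta~_k v_k
        rho = rho_new
    return x

# Example usage with the (assumed) Jacobi preconditioner M = diag(A):
# x = preconditioned_cg(A, b, np.zeros_like(b), lambda r: r / np.diag(A))
```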