Chapter 3
Orthogonalization and least squares methods
3.1 QR-factorization (QR-decomposition)
3.1.1 Householder transformation
Definition 3.1.1 A complex m×n-matrix R = [rij] is called an upper (lower) triangular matrix, if rij= 0 for i > j (i < j).
Example 3.1.1 (1) m = n : R =
r11 · · · r1n . .. ...
0 rnn
, (2) m < n : R =
r11 · · · r1n
. .. ...
0 rmm · · · rmn
,
(3) m > n : R =
r11 · · · r1n
. .. ...
0 rnn
0
.
Definition 3.1.2 Given A∈ Cm×n, Q∈ Cm×m unitary and R ∈ Cm×n upper triangular as in Examples such that A = QR. Then the product is called a QR-factorization of A.
Basic problem:
Given b 6= 0, b ∈ Cn. Find a vector w∈ Cn with w∗w = 1 and c∈ C such that
(I− 2ww∗)b = ce1. (3.1.1)
Solution (Householder transformation):
(1) b = 0: w arbitrary (in general w = 0) and c = 0.
(2) b6= 0:
c =
−|bb11|kbk2, if b1 6= 0,
kbk2, if b1 = 0, (3.1.2)
w = 2k1 (b1− c, b2, . . . , bn)T := 2k1u with2k =p
2kbk2(kbk2+|b1|) (3.1.3)
Theorem 3.1.1 Any complex m× n matrix A can be factorized by the product A = QR, where Q is m× m-unitary. R is m × n upper triangular.
Proof: Let A(0) = A = [a(0)1 |a(0)2 | · · · |a(0)n ]. Find Q1 = (I−2w1w∗1) such that Q1a(0)1 = ce1. Then
A(1)= Q1A(0) = [Q1a(0)1 , Q1a(0)2 ,· · · , Q1a(0)n ] =
c1 ∗ · · · ∗ 0... a(1)2 · · · a(1)n 0
. (3.1.4)
Find Q2 =
1 0
0 I − w2w∗2
such that (I − 2w2w2∗)a(1)2 = c2e1. Then
A(2) = Q2A(1) =
c1 ∗ ∗ · · · ∗ 0 c2 ∗ · · · ∗ 0 0
... ... a(2)3 · · · a(2)n 0 0
.
We continue this process. Then after l = min(m, n) steps A(l) is an upper triangular matrix satisfying
A(l−1) = R = Ql−1· · · Q1A.
Then A = QR, where Q = Q∗1· · · Q∗l−1.
Remark 3.1.1 We usually call the method in Theorem 3.1.1 as Householder method.
(Algorithm ??).
Theorem 3.1.2 Let A be a nonsingular n× n matrix. Then the QR- factorization is essentially unique. That is, if A = Q1R1 = Q2R2, then there is a unitary diagonal matrix D = diag(di) with |di| = 1 such that Q1 = Q2D and DR1 = R2.
Proof: Let A = Q1R1 = Q2R2. Then Q∗2Q1 = R2R−11 = D must be a diagonal unitary matrix.
Remark 3.1.2 The QR-factorization is unique, if it is required that the diagonal ele- ments of R are positive.
Corollary 3.1.1 A is an arbitrary m× n-matrix. The following factorizations exist:
(i) A = LQ, where Q is n× n unitary and L is m × n lower triangular.
(ii) A = QL, where Q is m× m unitary and L is m × n lower triangular.
(iii) A = RQ, where Q is n× n unitary and R is m × n upper triangular.
3.1 QR-factorization (QR-decomposition) 47 Proof: (i) A∗ has a QR-factorization. Then
A∗ = QR⇒ A = R∗Q∗ ⇒ (i).
(ii) Let Pm =
O 1
1 O
. Then by Theorem 3.1.1 we have PmAPn= QR. This implies
A = (PmQPm)(PmRPn)≡ ˜QL⇒ (ii).
(iii) A∗ has a QL-factorization (from (ii)), i.e., A∗ = QL. This implies A = L∗Q∗ ⇒ (iii).
Cost of Householder method
Consider that the multiplications in (3.1.4) can be computed in the form (I− 2w1w∗1)A = (I − u1
kbk22+|b1|kbk2u∗1)A = (I− vu∗1)A
= A− vu∗1A := A− vw∗. So the first step for a m× n-matrix A requires;
c1: m multiplications, 1 root;
4k2: 1 multiplication;
v: m divisions (= multiplications);
w: mn multiplications;
A(1) = A− vw∗: m(n− 1) multiplications.
Similarly, for the j-th step m and n are replaced by m−j+1 and n−j+1, respectively.
Let l = min(m, n). Then the number of multiplications is Xl−1
j=1
[2(m− j + 1)(n − j + 1) + (m − j + 2)] (3.1.5)
= l(l− 1)[2l− 1
3 − (m + n) − 5/2] + (l − 1)(2mn + 3m + 2n + 4) (= mn2− 1/3n3, if m≥ n).
Especially, for m = n, it needs Xn−1
j=1
[2(n− j + 1)2+ m− j + 2] = 2/3n3+ 3/2n2+ 11/6n− 4 (3.1.6) flops and (l + n− 2) roots. To compute Q = Q∗1· · · Q∗l−1, it requires
2[m2n− mn2+ n3/3] multiplications (m≥ n). (3.1.7) Remark 3.1.3 Let A = QR be a QR-factorization A. Then we have
A∗A = R∗Q∗QR = R∗R.
If A has full column rank and we require that the diagonal elements of R are positive, then we obtain the Cholesky factorization of A∗A.
3.1.2 Gram-Schmidt method
Remark 3.1.4 Theorem 3.1.1 (or Algorithm ??) can be used to solved orthonormal basis (OB) problem.
(OB) : Given linearly independent vectors a1,· · · , an ∈ Rn×1. Find an orthonormal basis for span{a1,· · · , an}.
If A = [a1,· · · , an] = QR with Q = [q1,· · · , qn], and R = [rij], then ak =
Xk i=1
rikqi. (3.1.8)
By assumption rank(A) = n and (3.1.8) it implies rkk6= 0. So, we have qk = 1
rkk
(ak− Xk−1
i=1
rikqi). (3.1.9)
The vector qk can be thought as a unit vector in the direction of zk = ak −Pk−1
i=1 sikqi. To ensure that zk ⊥ q1,· · · , qk−1 we choose sik = qiTak, for i = 1, · · · , k − 1. This leads to the Classical Gram-Schmidt (CGS) Algorithm for solving (OB) problem.
Algorithm 3.1.1 (Classical Gram-Schmidt (CGS) Algorithm) Given A ∈ Rm×n with rank(A) = n. We compute A = QR, where Q∈ Rm×n has orthonormal columns and R ∈ Rn×n.
For i = 1,· · · , n, qi = ai;
For j = 1,· · · , i − 1 rji = qjTai, qi = qi − rjiqj, end for
rii =kqik2, qi = qi/rii, end for
Disadvantage : The CGS method has very poor numerical properties, if some columns of A are nearly linearly independent.
Advantage : The method requires mn2 multiplications (m≥ n).
Remark 3.1.5 Modified Gram-Schmidt (MGS):
Write A =Pn
i=1qirTi . Define A(k) by [0, A(k)] = A−
Xk−1 i=1
qiriT = Xn
i=k
qirTi (3.1.10)
It follows that if A(k)= [z, B], z ∈ Rm, B ∈ Rm×(n−k) then rkk =kzk2 and qk = z/rkk by (3.1.9). Compute
[rk,k+1,· · · , rkn] = qkTB.
Next step: A(k+1) = B− qk[rk,k+1,· · · , rkn].
3.1 QR-factorization (QR-decomposition) 49 Algorithm 3.1.2 (MGS) Given A∈ Rm×n with rank(A) = n. We compute A = QR, where Q∈ Rm×n has orthonormal columns and R∈ Rn×n is upper triangular.
For i = 1,· · · , n, qi = ai;
For j = 1,· · · , i − 1 rji = qjTqi, qi = qi − rjiqj, end for
rii =kqik2, qi = qi/rii, end for
The MGS requires mn2 multiplications.
Remark 3.1.6 MGS computes the QR factorization at the kth step, the kth column of Q and the kth row of R are computed. CGS at the kth step, the kth column of Q and the kth column of R are computed.
Advantage for OB problem (m≥ n): (i) Householder method requires mn2− n3/3 flops to get factorization. A = QR and mn2− n3/3 flops to get the first n columns of Q. But MGS requires only mn2 flops. Thus for the problem of finding an orthonormal basis of range(A), MGS is about twice as efficient as Householder orthogonalization. (ii) MGS is numerically stable.
3.1.3 Givens method
Basic problem: Given (a, b)T ∈ R2, find c, s∈ R with c2+ s2 = 1 such that
c s
−s c
a b
=
k 0
, where c = cosα and s = sinα.
Solution:
c = 1, s = 0, k = a; if b = 0, c = √a2a+b2, s = √a2b+b2, k =√
a2+ b2; if b6= 0. (3.1.11) Let
G(i, j, α) =
1
. .. ... ...
· · · cos α · · · sin α · · · ... ...
· · · − sin α · · · cos α · · ·
... ... . ..
1
.
Then G(i, j, α) is called a Givens rotation in the (i, j)-coordinate plane. In the matrix A = G(i, j, α)A, the rows with index˜ 6= i, j are the same as in A and
˜
aik = cos(α)aik+ sin(α)ajk, for k = 1, . . . , n,
˜
ajk = − sin(α)aik+ cos(α)ajk, for k = 1, . . . , n.
Algorithm 3.1.3 (Givens orthogonalization) Given A ∈ Rm×n. The folllowing Al- gorithm overwrites A with QTA = R, where Q is orthonormal and R is upper triangular.
For q = 2,· · · , m,
for p = 1, 2,· · · , min{q − 1, n},
Find c = cos α and s = sin α as in (3.1.11) such that
c s
−s c
app
aqp
=
∗ 0
. A := G(p, q, α)A.
This algorithm requires 2n2(m− n/3) flops.
Fast Givens method (See Matrix Computations, pp.205-209):
A modification of Givens method bases on the fast Givens rotations and requires about n2(m− n/3) flops.
3.2 Overdetermined linear Systems - Least Squares Methods
Given A∈ Rm×n, b∈ Rm and m > n. Consider the least squares(LS) problem:
x∈RminnkAx − bk2. (3.2.1)
Let X be the set of minimizers defined by X ={x ∈ Rn| kAx − bk2 = min!}. It is easy to see the following properties:
• x ∈ X ⇐⇒ AT(b− Ax) = 0. (3.2.2)
• X is convex. (3.2.3)
• X has a unique element xLS having minimal 2-norm. (3.2.4)
• X = {xLS} ⇐⇒ rank(A) = n. (3.2.5)
For x ∈ Rn, we refer to r = b− Ax as its residual. AT(b− Ax) = 0 is refered to as the normal equation. The minimum sum is defined by ρ2LS = kAxLS − bk22. If we let ϕ(x) = 12kAx − bk22, then ∇ϕ(x) = AT(Ax− b).
Theorem 3.2.1 Let A = Pr
i=1
σiuiviT, with r =rank(A), U = [u1, . . . , um] and V = [v1,· · · , vn] be the SVD of A∈ Rm×n (m≥ n). If b ∈ Rm, then
xLS = Pr i=1
(uTi b/σi)vi (3.2.6)
and
ρ2LS = Pm i=r+1
(uTi b)2 (3.2.7)
3.2 Overdetermined linear Systems - Least Squares Methods 51 Proof: For any x∈ Rn we have
kAx − bk22 =kUTAV (VTx)− UTbk22 = Xr
i=1
(σiαi− uTi b)2+ Xm i=r+1
(uTi b)2,
where α = VTx. Clearly, if x solves the LS-problem, then αi = (uTi b/σi), for i = 1, . . . , r.
If we set αr+1 =· · · = αn = 0, then x = xLS.
Remark 3.2.1 If we define A+ by A+ = V Σ+UT, where Σ+= diag(σ−11 , .., σr−1, 0, .., 0)
∈ Rn×mthen xLS= A+b and ρLS =k(I −AA+)bk2. A+is refered to as the pseudo-inverse of A. A+ is defined to be the unique matrix X ∈ Rn×m that satisfies Moore-Penrose conditions :
(i)AXA = A, (iii) (AX)T = AX,
(ii)XAX = X, (iv) (XA)T = XA. (3.2.8)
Existence of X is easy to check by taking X = A+. Now, we show the uniqueness of X.
Suppose X and Y satisfying the conditions (i)–(iv). Then X = XAX = X(AY A)X = X(AY A)Y (AY A)X
= (XA)(Y A)Y (AY )(AX) = (XA)T(Y A)TY (AY )T(AX)T
= (AXA)TYTY YT(AXA)T = ATYTY YTAT
= Y (AY A)Y = Y AY = Y.
If rank(A) = n (m ≥ n), then A+ = (ATA)−1AT. If rank(A) = m (m ≤ n), then A+ = AT(AAT)−1. If m = n = rank(A), then A+= A−1.
• For the case rank(A)=n:
Algorithm 3.2.1 (Normal equations) Given A ∈ Rm×n (m ≥ n) with rank(A) = n and b ∈ Rm. This Algorithm computes the solution to the LS-problem: min{kAx − bk2; x∈ Rn}.
Compute d := ATb, and form C := ATA by computing the Cholesky factorization C = RTR (see Remark 6.1). Solve RTy = d and RxLS = y.
Algorithm 3.2.2 (Householder and Givens orthogonalizations) Given A∈ Rm×n (m ≥ n) with rank(A) = n and b ∈ Rm. This Algorithm computes the solutins to the LS-problem: min{kAx − bk2; x∈ Rn}.
Compute QR-factorization QTA =
R1
0
by using Householder and Givens methods respectively. (Here R1 is upper triangular). Then
kAx − bk22 =kQTAx− QTbk22 =kR1x− ck22+kdk22, where QTb =
c d
. Thus, xLS = R1−1c, (since rank(A) =rank(R1) = n) and ρ2LS =kdk22.
Algorithm 3.2.3 (Modified Gram-Schmidt) Given A∈ Rm×n (m≥ n) with rank(A) = n and b∈ Rm. The solution of minkAx − bk2 is given by:
Compute A = Q1R1, where Q1 ∈ Rm×n with QT1Q1 = In and R1 ∈ Rn×n upper tri- angular. Then the normal equation (ATA)x = ATb is transformed to the linear system R1x = QT1b⇒ xLS = R−11 Q1Tb.
• For the case rank(A) < n:
Problem:
(i) How to find a solution to the LS-problem?
(ii) How to find the unique solution having minimal 2-norm?
(iii) How to compute xLS reliably with infinite conditioned A ?
Definition 3.2.1 Let A be a m× n matrix with rank(A) = r (r ≤ m, n). The factoriza- tion A = BC with B ∈ Rm×r and C ∈ Rr×n is called a full rank factorization, provided that B has full column rank and C has full row rank.
Theorem 3.2.2 If A = BC is a full rank factorization, then
A+ = C+B+ = CT(CCT)−1(BTB)−1BT. (3.2.9)
Proof: From assumption follows that
B+B = (BTB)−1BTB = Ir, CC+ = CCT(CCT)−1 = Ir. We calculate (3.2.8) with
A(C+B+)A = BCC+B+BC = BC = A,
(C+B+)A(C+B+) = C+B+BCC+B+ = C+B+, A(C+B+) = BCC+B+= BB+ symmetric, (C+B+)A = C+B+BC = C+C symmetric.
These imply that X = C+B+ satisfies (3.2.8). It follows A+= C+B+.
Unfortunately, if rank(A) < n, then the QR-factorization does not necessarily produce a full rank factorization of A. For example
A = [a1, a2, a3] = [q1, q2, q3]
1 1 1 0 0 1 0 0 1
.
Fortunately,we have the following two methods to produce a full rank factorization of A.
3.2 Overdetermined linear Systems - Least Squares Methods 53
3.2.1 Rank Deficiency I : QR with column pivoting
Algorithm ?? can be modified in a simple way so as to produce a full rank factorization of A.
AΠ = QR, R =
R11
| {z }0
r
R12
0
| {z }
n−r }r
}m-r , (3.2.10)
where r = rank(A) < n (m ≥ n), Q is orthogonal, R11 is nonsingular upper triangular and Π is a permuatation. Once (3.2.10) is computed, then the LS-problem can be readily solved by
kAx − bk22 =k(QTAπ)(πTx)− QTbk22 =kR11y− (c − R12z)k22+kdk22, where ΠTx =
y z
}r
}n-r and QTb =
c d
}r
}m-r . Thus if kAx − bk2 = min!, then we must have
x = Π
R−111(c− R12z) z
. If z is set to be zero, then we obtain the basic solution
xB = π
R11−1c 0
.
The basic solution is not the solution with minimal 2-norm, unless the submatrix R12 is zero. Since
kxLSk2 = min
z∈Rn−r
xB− π
R−111R12
−In−r
z
2 . (3.2.11)
We now solve the LS-problem (3.2.11) by using Algorithms 3.2.1 to 3.2.3.
Algorithm 3.2.4 Given A ∈ Rm×n, with rank(A) = r < n. The following algorithm computes the factorization AΠ = QR defined by (3.2.10). The element aij is overwritten by rij (i≤ j). The permutation Π = [ec1, . . . , ecn] is determined according to choosing the maximum of column norm in the current step.
cj := j (j = 1, 2, . . . , n), rj :=Pm
i=1
a2ij (j = 1, . . . , n), For k = 1, . . . , n,
Detemine p with (k≤ p ≤ n) so that rp = max
k≤j≤nrj. If rp = 0 then stop; else
Interchange ck and cp, rk and rp, and aik and aip, for i = 1, . . . , m.
Determine a Householder ˆQk such that
Qˆk
akk
...
amk
=
∗ 0...
0
.
A := diag(Ik−1, ˆQk)A; rj := rj − a2kj(j = k + 1, . . . , n).
This algorithm requires 2mnr− r2(m + n) + 2r3/3 flops.
Algorithm 3.2.4 produces the full rank factorization (3.2.10) of A. We have the fol- lowing important relations:
|r11| ≥ |r22| ≥ . . . ≥ |rrr|, rjj = 0, j = r + 1, . . . , n,
|rii| ≥ |rik|, i = 1, . . . , r, k = i + 1, . . . , n. (3.2.12) Here, r = rank(A) < n, and R = (rjj). In the following we show another application of the full rank factorization for solving the LS-problem.
Algorithm 3.2.5 (Compute xLS = A+b directly)
(i) Compute (3.2.10): AΠ = QR≡ (Q|{z}(1)
r
| Q(2))
R1
0
}r
}m-r , ⇒ AΠ = Q(1)R1.
(ii) (AΠ)+ = R+1Q(1)+ = R1+Q(1)T. (iii) Compute R+1:
Either: R+1 = RT1(R1RT1)−1 (since R1 has full row rank)
⇒ (AΠ)+ = RT1(R1RT1)−1Q(1)T.
Or: Find b˙ Q using Householder transformation (Algorithm ??) such that b˙QRT1 = T
0
, where T ∈ Rr×r is upper triangular.
Let bQT := ( ˆQ(1), ˆQ(2)) ⇒ RT1 = ˆQ(1)T + ˆQ(2)0 = ˆQ(1)T . R1 = TTQˆ(1)T ⇒ R+1 = ( ˆQ(1)T)+(TT)+ = ˆQ(1)(TT)−1.
⇒ (AΠ)+ = ˆQ(1)(TT)−1Q(1)T.
(iv) Since minkAx − bk2 = minkAΠ(ΠTx)− bk2 ⇒ (ΠTx )LS = (AΠ )+b
⇒ xLS = Π (AΠ )+b .
Remark 3.2.2 Unfortunately, QR with column pivoting is not entirely reliable as a method for detecting near rank deficiency. For example:
Tn(c) = diag(1, s,· · · , sn−1)
1 −c −c · · · −c 1 −c · · · −c . .. ...
0 1
c2+ s2 = 1, c, s > 0.
If n = 100, c = 0.2, then σn=0.3679e−8. But this matrix is unaltered by Algorithm 3.2.4.
However,the “degree of unreliability” is somewhat like that for Gaussian elimination with partial pivoting, a method that works very well in practice.
3.2 Overdetermined linear Systems - Least Squares Methods 55
3.2.2 Rank Deficiency II : The Singular Value Decomposition
Algorithm 3.2.6 (Householder Bidiagonalization) Given A∈ Rm×n(m≥ n). The following algorithm overwrite A with UBTAVB = B, where B is upper bidiagonal and UB
and VB are orthogonal.
For k = 1,· · · , n,
Determine a Householder matrix ˜Uk of order n− k + 1 such that
Ub˙k
akk
...
amk
=
∗ 0...
0
,
A := diag(Ik−1, b˙Uk)A,
If k≤ 2, then determine a Householder matrix b˙Vk of order n− k + 1 such that [ak,k+1,· · · , akn] b˙Vk = (∗, 0, · · · , 0),
A := Adiag(Ik, b˙Vk).
This algorithm requires 2mn2− 2/3n3 flops.
Algorithm 3.2.7 (R-Bidiagonalization) when m n we can use the following faster method of bidiagonalization.
(1) Compute an orthogonal Q1 ∈ Rm×m such that QT1A =
R1
0
, where R1 ∈ Rn×n is upper triangular.
(2) Applying Algorithm 3.2.6 to R1, we get QT2R1VB = B1, where Q2, VB ∈ Rn×n orthogonal and B1 ∈ Rn×n upper bidiagonal.
(3) Define UB = Q1diag(Q2, Im−n). Then UBTAVB =
B1
0
≡ B bidiagonal.
This algorithm require mn2 + n3. It involves fewer compuations comparing with Algorithm 7.6 (2mn2− 2/3n3) whenever m≥ 5/3n.
Once the bidiagonalization of A has been achieved,the next step in the Golub-Reinsch SVD algorithm is to zero out the super diagonal elements in B. Unfortunately, we must defer our discussion of this iteration until Chapter 5 since it requires an understanding of the symmetric QR algorithm for eigenvalues. That is, it computes orthogonal matrices UΣ and VΣ such that
UΣTBVΣ = Σ = diag(σ1,· · · , σn).
By defining U = UBUΣ and V = VBVΣ, we see that UTAV = Σ is the SVD of A.
Algorithms Flop Counts Algorithm 3.2.1 Normal equations mn2/2 + n3/6 Algorithm 3.2.2 Householder orthogonalization mn2− n3/3 rank(A)=n Algorithm 3.2.3 Modified Gram-Schmidt mn2
Algorithm 3.1.3 Givens orthogonalization 2mn2− 2/3n3 Algorithm 3.2.6 Householder Bidiagonalization 2mn2− 2/3n3 Algorithm 3.2.7 R-Bidiagonalization mn2+ n3
LINPACK Golub-Reinsch SVD 2mn2+ 4n3
rank(A) < n Algorithm 3.2.5 QR-with column pivoting 2mnr− mr2+ 1/3r3
Alg. 3.2.7+SVD Chan SVD mn2+ 11/2n3
Table 3.1: Solving the LS problem (m≥ n)
Remark 3.2.3 If the LINPACK SVD Algorithm is applied with eps=10−17 to
T100(0.2) = diag(1, s,· · · , sn−1)
1 −c −c · · · −c 1 −c · · · −c . .. ...
0 1
,
then ˆσn= 0.367805646308792467× 10−8.
Remark 3.2.4 As we mentioned before, when solving the LS problem via the SVD, only Σ and V have to be computed (see (3.2.6)). Table 3.1 compares the efficiency of this approach with the other algorithms that we have presented.
3.2.3 The Sensitivity of the Least Squares Problem
Corollary 3.2.1 (of Theorem 1.2.3) Let U = [u1,· · · , um], V = [v1,· · · , vn] and U∗AV = Σ = diag(σ1,· · · , σr, 0,· · · , 0). If k < r = rank(A) and Ak =
Pk i=1
σiuiviT, Then
rank(B)=kmin kA − Bk2 =kA − Akk2 = σk+1.
Proof: Since UTAkV = diag(σ1,· · · , σk, 0,· · · , 0), it follows rank(Ak) = k and that kA − Akk2 =kUT(A− Ak)Vk2 =kdiag(0, · · · , 0, σk+1,· · · , σr)k2 = σk+1.
Suppose B ∈ Rm×n and rank(B) = k, i.e., there are orthogonal vectors x1,· · · , xn−k such that N (B) = span{x1,· · · , xn−k}. This implies
span{x1,· · · , xn−k}\
span{v1,· · · , vk+1} 6= {0}.
3.2 Overdetermined linear Systems - Least Squares Methods 57 Let z be a unit vector in the intersection set. Then Bz = 0 and Az =
k+1P
i=1
σi(vTi z)ui. Thus,
kA − Bk22 ≥ k(A − B)zk22 =kAzk22 = Xk+1
i=1
σi2(viTz)2 ≥ σk+12 .
3.2.4 Condition number of a Rectangular Matrix
Let A∈ Rm×n, rank(A) = n, κ2(A) = σmax(A)/σmin(A).
(i) The method of normal equation:
x∈RminnkAx − bk2 ⇔ ATAx = ATb.
(a) C = ATA, d = ATb.
(b) Compute the Cholesky factorization C = GGT. (c) Solve Gy = d and GTxLS= y. Then
k˜xLS − xLSk2
xLS ≈ epsκ2(ATA) = epsκ2(A)2. k˜x − xk
kxk ≤ κ(A)
εkF k
kAk + εkfk kbk
+ o(ε2), where (A + F )˜x = b + f and Ax = b.
(ii) LS solution via QR factorization
kAx − bk22 =kQTAx− QTbk22 =kR1x− ck22+kdk22, xLS = R1−1c, ρLS =kdk2.
Numerically, trouble can be expected wherever κ2(A) = κ2(R) ≈ 1/eps. But this is in contrast to normal equation, Cholesky factorization becomes problematical once κ2(A) is in the neighborhood of 1/√eps.
Remark 3.2.5
kAk2k(ATA)−1ATk2 = κ2(A), kAk22k(ATA)−1k2 = κ2(A)2.
Theorem 3.2.3 Let A∈ Rm×n, (m≥ n), b 6= 0. Suppose that x, r, ˜x, ˜r satisfy kAx − bk = min!, r = b − Ax, ρLS =krk2,
k(A + δA)˜x − (b + δb)k2 = min!,
˜
r = (b + δb)− (A + δA)˜x.
If
ε = max{kδAk2
kAk2 ,kδbk2
kbk2 } < σn(A) σ1(A) and
sin θ = ρLS
kbk2 6= 1,
then k˜x − xk2
kxk2 ≤ ε{2κ2(A)
cos θ + tan θκ2(A)2} + O(ε2)
and k˜r − rk2
kbk2 ≤ ε(1 + 2κ2(A)) min(1, m− n) + O(ε2).
Proof: Let E = δA/ε and f = δb/ε. SincekδAk2 < σn(A), by previous Corollary follows that rank(A + εE) = n for t∈ [0, ε].
[t = ε ⇒ A + tE = A + δA. If rank(A + δA) = k < n, then kA − (A + δA)k2 = kδAk2 ≥ kA − Akk2 = σk+1 ≥ δn. Contradiction! So min
rank(B)=kkA − Bk2 = kA − Akk2
=kA −Pk
i=1
σiuivTi k2 = σk+1].
Hence we have,
(A + tE)T(A + tE)x(t) = (A + tE)T(b + tf ). (3.2.13) Since x(t) is continuously differentiable for all t∈ [0, ε], x = x(0) and ˜x = λ(ε), it follows that
˜
x = x + ε ˙x(0) + O(ε2)
and k˜x − xk
kxk = εk ˙x(0)k2
kxk + O(ε2).
Differentiating (3.2.13) and setting t = 0 then we have
ETAx + ATEx + ATA ˙x(0) = ATf + ETb.
Thus,
˙x(0) = (ATA)−1AT(f − Ex) + (ATA)−1ETr.
From kfk2 ≤ kbk2 and kEk2 ≤ kAk2 follows k˜x − xk2
kxk2 ≤ ε{kAk2k(ATA)−1ATk2( kbk2
kAk2kxk2 + 1) + ρLS
kAk2kxk2kAk22k(ATA)−1k2} + O(ε2).
Since AT(AxLS − b) = 0, AxLS ⊥ AxLS − b and then kb − Axk22+kAxk22 =kbk22
and
kAk22kxk22 ≥ kbk22 − ρ2LS.
3.2 Overdetermined linear Systems - Least Squares Methods 59 Thus,
k˜x − xk2
kxk2 ≤ eps{κ2(A)( 1
cos θ + 1) + κ2(A)2sin θ
cos θ} + O(ε2).
Furthermore, by sin θcos θ = √ ρLS
kbk22−ρ2LS, we have k˜x − xk2
kxk2 ≈ eps(κ2(A) + κ2(A)2ρLS). (θ : small )
Remark 3.2.6 Normal equation: eps κ2(A)2. QR-approach: eps(κ2(A) + ρLSκ2(A)2).
(i) If ρLS is small and κ2(A) is large, then QR is better than the normal equation.
(ii) The normal equation approach involves about half of the arithmetic when m n and does not requires as much storage.
(iii) The QR approach is applicable to a wider class of matrices because the Cholesky to ATA break down “before” the back substitution process on QTA = R.
3.2.5 Iterative Improvement
Im A AT 0
r x
=
b 0
, kb − Axk2 = min!
r + Ax = b, ATr = 0⇒ ATAx = ATb. Thus,
f(k) g(k)
=
b 0
−
I A
AT 0
r(k) x(k)
and
I A
AT 0
p(k) z(k)
=
f(k) g(k)
.
This implies,
r(k+1) x(k+1)
=
r(k) x(k)
+
p(k) z(k)
If A = QR = Q
R1 0
, then
I A
AT 0
p z
=
f g
implies that
In 0 R1 0 Im−n 0 R1T 0 0
h f2 z
=
f1
f2 g
,
where QTf =
f1 f2
QTp =
h f2
. Thus, RT1h = g⇒ h = R1−Tg. Then
z = R1−1(f1− h), P = Q
h f2
.