Chapter 3 Orthogonalization and least squares methods

3.1 QR-factorization (QR-decomposition)

3.1.1 Householder transformation

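Definition 3.1.1 A complex $m \times n$ matrix $R = [r_{ij}]$ is called an upper (lower) triangular matrix if $r_{ij} = 0$ for $i > j$ ($i < j$).

Example 3.1.1 (1) $m = n$:
$$R = \begin{bmatrix} r_{11} & \cdots & r_{1n} \\ & \ddots & \vdots \\ 0 & & r_{nn} \end{bmatrix},$$
(2) $m < n$:
$$R = \begin{bmatrix} r_{11} & \cdots & \cdots & \cdots & r_{1n} \\ & \ddots & & & \vdots \\ 0 & & r_{mm} & \cdots & r_{mn} \end{bmatrix},$$
(3) $m > n$:
$$R = \begin{bmatrix} r_{11} & \cdots & r_{1n} \\ & \ddots & \vdots \\ 0 & & r_{nn} \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix}.$$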

Definition 3.1.2 Let $A \in \mathbb{C}^{m \times n}$. If $Q \in \mathbb{C}^{m \times m}$ is unitary and $R \in \mathbb{C}^{m \times n}$ is upper triangular as in Example 3.1.1 such that $A = QR$, then the product $QR$ is called a QR-factorization of $A$.

Basic problem:

Given $b \neq 0$, $b \in \mathbb{C}^n$, find a vector $w \in \mathbb{C}^n$ with $w^*w = 1$ and $c \in \mathbb{C}$ such that

(I− 2ww^∗)b = ce1. (3.1.1)

Solution (Householder transformation):

(1) b = 0: w arbitrary (in general w = 0) and c = 0.

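(2) $b \neq 0$:
$$c = \begin{cases} -\dfrac{b_1}{|b_1|}\,\|b\|_2, & \text{if } b_1 \neq 0, \\[4pt] \|b\|_2, & \text{if } b_1 = 0, \end{cases} \tag{3.1.2}$$
$$w = \frac{1}{2k}(b_1 - c, b_2, \ldots, b_n)^T := \frac{1}{2k}u \quad \text{with} \quad 2k = \sqrt{2\|b\|_2(\|b\|_2 + |b_1|)}. \tag{3.1.3}$$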
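The formulas (3.1.2) and (3.1.3) can be checked numerically. The following is a minimal sketch for the real case only (so $w^*$ becomes $w^T$); the function names `householder_vector` and `reflect` are illustrative and not part of the notes.

```python
import math

def householder_vector(b):
    """Return (w, c) with (I - 2 w w^T) b = c e1 for a real vector b."""
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_b == 0.0:
        return [0.0] * len(b), 0.0           # case (1): b = 0
    # (3.1.2): c gets the sign opposite to b1, so b1 - c involves no cancellation
    c = -math.copysign(norm_b, b[0]) if b[0] != 0.0 else norm_b
    u = [b[0] - c] + list(b[1:])
    two_k = math.sqrt(2.0 * norm_b * (norm_b + abs(b[0])))   # (3.1.3)
    return [x / two_k for x in u], c

def reflect(w, x):
    """Apply I - 2 w w^T to x without forming the matrix."""
    t = 2.0 * sum(wi * xi for wi, xi in zip(w, x))
    return [xi - t * wi for wi, xi in zip(w, x)]

b = [3.0, 4.0, 0.0]
w, c = householder_vector(b)
print(c, reflect(w, b))   # the reflected vector should be (numerically) c * e1
```

Applying the reflection through `reflect` costs one inner product and one vector update, which is the same idea used in the cost analysis of the Householder method.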


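Theorem 3.1.1 Any complex $m \times n$ matrix $A$ can be factorized as $A = QR$, where $Q$ is $m \times m$ unitary and $R$ is $m \times n$ upper triangular.

Proof: Let $A^{(0)} = A = [a_1^{(0)} | a_2^{(0)} | \cdots | a_n^{(0)}]$. Find $Q_1 = I - 2w_1w_1^*$ such that $Q_1a_1^{(0)} = c_1e_1$. Then
$$A^{(1)} = Q_1A^{(0)} = [Q_1a_1^{(0)}, Q_1a_2^{(0)}, \cdots, Q_1a_n^{(0)}] = \begin{bmatrix} c_1 & * & \cdots & * \\ 0 & & & \\ \vdots & a_2^{(1)} & \cdots & a_n^{(1)} \\ 0 & & & \end{bmatrix}. \tag{3.1.4}$$
Find $Q_2 = \begin{bmatrix} 1 & 0 \\ 0 & I - 2w_2w_2^* \end{bmatrix}$ such that $(I - 2w_2w_2^*)a_2^{(1)} = c_2e_1$. Then
$$A^{(2)} = Q_2A^{(1)} = \begin{bmatrix} c_1 & * & * & \cdots & * \\ 0 & c_2 & * & \cdots & * \\ 0 & 0 & & & \\ \vdots & \vdots & a_3^{(2)} & \cdots & a_n^{(2)} \\ 0 & 0 & & & \end{bmatrix}.$$
We continue this process. After $l - 1$ steps, where $l = \min(m, n)$ (one further step is needed when $m > n$), the resulting matrix is upper triangular and satisfies
$$A^{(l-1)} = R = Q_{l-1} \cdots Q_1A.$$
Then $A = QR$, where $Q = Q_1^* \cdots Q_{l-1}^*$.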

Remark 3.1.1 The method used in the proof of Theorem 3.1.1 is usually called the Householder method (Algorithm ??).
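The construction in the proof of Theorem 3.1.1 can be sketched in code. This is an illustrative version for real matrices stored as lists of rows, not the Algorithm referenced above; the name `householder_qr` is hypothetical, and the number of reflections is taken as $\min(m - 1, n)$ so that the case $m > n$ is also triangularized.

```python
import math

def householder_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal (m x m), R upper triangular (m x n)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Qt = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates Q^T
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]           # column k from the diagonal down
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue                                 # nothing to eliminate
        c = -math.copysign(norm_x, x[0]) if x[0] != 0.0 else norm_x
        u = [x[0] - c] + x[1:]                       # u as in (3.1.3)
        uu = sum(v * v for v in u)                   # equals 4k^2
        for M in (R, Qt):                            # apply I - 2 u u^T / (u^T u)
            for j in range(len(M[0])):
                t = 2.0 * sum(u[i] * M[k + i][j] for i in range(len(u))) / uu
                for i in range(len(u)):
                    M[k + i][j] -= t * u[i]
    Q = [[Qt[j][i] for j in range(m)] for i in range(m)]
    return Q, R

A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Q, R = householder_qr(A)
for row in R:
    print(row)
```

The reflections are never formed as matrices; each one is applied as a rank-one update, which is what the cost analysis of the Householder method counts.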

Theorem 3.1.2 Let $A$ be a nonsingular $n \times n$ matrix. Then the QR-factorization is essentially unique. That is, if $A = Q_1R_1 = Q_2R_2$, then there is a unitary diagonal matrix $D = \mathrm{diag}(d_i)$ with $|d_i| = 1$ such that $Q_1 = Q_2D$ and $DR_1 = R_2$.

Proof: Let $A = Q_1R_1 = Q_2R_2$. Then $Q_2^*Q_1 = R_2R_1^{-1} = D$ is both unitary and upper triangular, so it must be a diagonal unitary matrix.

Remark 3.1.2 The QR-factorization is unique, if it is required that the diagonal elements of R are positive.

Corollary 3.1.1 Let $A$ be an arbitrary $m \times n$ matrix. The following factorizations exist:

(i) A = LQ, where Q is n× n unitary and L is m × n lower triangular.

(ii) A = QL, where Q is m× m unitary and L is m × n lower triangular.

(iii) A = RQ, where Q is n× n unitary and R is m × n upper triangular.

Proof: (i) $A^*$ has a QR-factorization. Then
$$A^* = QR \Rightarrow A = R^*Q^* \Rightarrow \text{(i)}.$$
(ii) Let $P_m = \begin{bmatrix} O & 1 \\ 1 & O \end{bmatrix}$ be the $m \times m$ permutation matrix with ones on the antidiagonal. Then by Theorem 3.1.1 we have $P_mAP_n = QR$. This implies
$$A = (P_mQP_m)(P_mRP_n) \equiv \tilde{Q}L \Rightarrow \text{(ii)}.$$
(iii) $A^*$ has a QL-factorization (from (ii)), i.e., $A^* = QL$. This implies $A = L^*Q^* \Rightarrow$ (iii).

Cost of Householder method

Consider that the multiplications in (3.1.4) can be computed in the form
$$(I - 2w_1w_1^*)A = \Big(I - \frac{u_1u_1^*}{\|b\|_2^2 + |b_1|\|b\|_2}\Big)A = (I - vu_1^*)A = A - vu_1^*A := A - vw^*,$$
where $b$ is the first column of $A$ and, in this paragraph, $w^* := u_1^*A$. So the first step for an $m \times n$ matrix $A$ requires:

$c_1$: $m$ multiplications, 1 root;

$4k^2$: 1 multiplication;

$v$: $m$ divisions (= multiplications);

$w$: $mn$ multiplications;

$A^{(1)} = A - vw^*$: $m(n - 1)$ multiplications.

Similarly, for the j-th step m and n are replaced by m−j+1 and n−j+1, respectively.

Let $l = \min(m, n)$. Then the number of multiplications is
$$\sum_{j=1}^{l-1}[2(m - j + 1)(n - j + 1) + (m - j + 2)] \tag{3.1.5}$$
$$= l(l - 1)\Big[\frac{2l - 1}{3} - (m + n) - \frac{5}{2}\Big] + (l - 1)(2mn + 3m + 2n + 4) \quad (\approx mn^2 - \tfrac{1}{3}n^3, \text{ if } m \geq n).$$
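The closed form in (3.1.5) can be verified with exact rational arithmetic. The sketch below compares the sum with the closed form for small $m, n$; the function names are illustrative.

```python
from fractions import Fraction

def householder_mults_sum(m, n):
    """Left-hand side of (3.1.5): sum over the first l - 1 steps."""
    l = min(m, n)
    return sum(2 * (m - j + 1) * (n - j + 1) + (m - j + 2) for j in range(1, l))

def householder_mults_closed(m, n):
    """Closed form stated after (3.1.5), evaluated exactly."""
    l = min(m, n)
    return (l * (l - 1) * (Fraction(2 * l - 1, 3) - (m + n) - Fraction(5, 2))
            + (l - 1) * (2 * m * n + 3 * m + 2 * n + 4))

for m in range(1, 9):
    for n in range(1, 9):
        assert householder_mults_sum(m, n) == householder_mults_closed(m, n)

m, n = 2000, 1000
leading = m * n**2 - Fraction(n**3, 3)
print(float(householder_mults_sum(m, n) / leading))  # ratio to the leading term
```

The last line only illustrates that $mn^2 - n^3/3$ is the leading-order term; the lower-order terms are not negligible for small matrices.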

In particular, for $m = n$, it needs
$$\sum_{j=1}^{n-1}[2(n - j + 1)^2 + n - j + 2] = \frac{2}{3}n^3 + \frac{3}{2}n^2 + \frac{11}{6}n - 4 \tag{3.1.6}$$
flops and $(l + n - 2)$ roots. To compute $Q = Q_1^* \cdots Q_{l-1}^*$, it requires
$$2[m^2n - mn^2 + n^3/3] \text{ multiplications } (m \geq n). \tag{3.1.7}$$

Remark 3.1.3 Let $A = QR$ be a QR-factorization of $A$. Then we have
$$A^*A = R^*Q^*QR = R^*R.$$
If $A$ has full column rank and we require that the diagonal elements of $R$ are positive, then we obtain the Cholesky factorization of $A^*A$.


3.1.2 Gram-Schmidt method

Remark 3.1.4 Theorem 3.1.1 (or Algorithm ??) can be used to solve the orthonormal basis (OB) problem.

(OB): Given linearly independent vectors $a_1, \cdots, a_n \in \mathbb{R}^{m \times 1}$ ($m \geq n$), find an orthonormal basis for $\mathrm{span}\{a_1, \cdots, a_n\}$.

If $A = [a_1, \cdots, a_n] = QR$ with $Q = [q_1, \cdots, q_n]$ and $R = [r_{ij}]$, then
$$a_k = \sum_{i=1}^{k} r_{ik}q_i. \tag{3.1.8}$$
By the assumption $\mathrm{rank}(A) = n$ and (3.1.8), we have $r_{kk} \neq 0$. So
$$q_k = \frac{1}{r_{kk}}\Big(a_k - \sum_{i=1}^{k-1} r_{ik}q_i\Big). \tag{3.1.9}$$
The vector $q_k$ can be thought of as a unit vector in the direction of $z_k = a_k - \sum_{i=1}^{k-1} s_{ik}q_i$. To ensure that $z_k \perp q_1, \cdots, q_{k-1}$ we choose $s_{ik} = q_i^Ta_k$ for $i = 1, \cdots, k - 1$. This leads to the Classical Gram-Schmidt (CGS) Algorithm for solving the (OB) problem.

Algorithm 3.1.1 (Classical Gram-Schmidt (CGS) Algorithm) Given $A \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(A) = n$. We compute $A = QR$, where $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R \in \mathbb{R}^{n \times n}$ is upper triangular.

For $i = 1, \cdots, n$,
    $q_i = a_i$;
    For $j = 1, \cdots, i - 1$,
        $r_{ji} = q_j^Ta_i$, $q_i = q_i - r_{ji}q_j$,
    end for
    $r_{ii} = \|q_i\|_2$, $q_i = q_i/r_{ii}$,
end for

Disadvantage: The CGS method has very poor numerical properties if some columns of $A$ are nearly linearly dependent.

Advantage : The method requires mn² multiplications (m≥ n).
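The loss of orthogonality in CGS can be observed on a small example. The sketch below implements Algorithm 3.1.1 on a list of column vectors and applies it to a matrix with nearly dependent columns; the test matrix (a Läuchli-type matrix with parameter `e`) and the helper names are choices made for this illustration, not part of the notes.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cgs(cols):
    """Classical Gram-Schmidt on a list of columns; returns (columns of Q, R)."""
    n = len(cols)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for i, a in enumerate(cols):
        q = list(a)
        for j in range(i):
            R[j][i] = dot(Q[j], a)       # coefficient uses the ORIGINAL column a_i
            q = [qk - R[j][i] * qjk for qk, qjk in zip(q, Q[j])]
        R[i][i] = math.sqrt(dot(q, q))
        Q.append([qk / R[i][i] for qk in q])
    return Q, R

e = 1e-8   # 1 + e*e rounds to 1 in double precision
cols = [[1.0, e, 0.0, 0.0], [1.0, 0.0, e, 0.0], [1.0, 0.0, 0.0, e]]
Q, R = cgs(cols)
print(dot(Q[1], Q[2]))   # far from 0: q2 and q3 are not orthogonal
```

In exact arithmetic the printed inner product would be zero; in floating point the computed $q_2$ and $q_3$ are visibly non-orthogonal for this matrix.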

Remark 3.1.5 Modified Gram-Schmidt (MGS):

Write $A = \sum_{i=1}^{n} q_ir_i^T$, where $r_i^T$ is the $i$-th row of $R$. Define $A^{(k)}$ by
$$[0, A^{(k)}] = A - \sum_{i=1}^{k-1} q_ir_i^T = \sum_{i=k}^{n} q_ir_i^T. \tag{3.1.10}$$
It follows that if $A^{(k)} = [z, B]$, $z \in \mathbb{R}^m$, $B \in \mathbb{R}^{m \times (n-k)}$, then $r_{kk} = \|z\|_2$ and $q_k = z/r_{kk}$ by (3.1.9). Compute
$$[r_{k,k+1}, \cdots, r_{kn}] = q_k^TB.$$
Next step: $A^{(k+1)} = B - q_k[r_{k,k+1}, \cdots, r_{kn}]$.

Algorithm 3.1.2 (MGS) Given $A \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(A) = n$. We compute $A = QR$, where $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R \in \mathbb{R}^{n \times n}$ is upper triangular.

For $i = 1, \cdots, n$,
    $q_i = a_i$;
    For $j = 1, \cdots, i - 1$,
        $r_{ji} = q_j^Tq_i$, $q_i = q_i - r_{ji}q_j$,
    end for
    $r_{ii} = \|q_i\|_2$, $q_i = q_i/r_{ii}$,
end for

The MGS requires mn² multiplications.
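A direct transcription of Algorithm 3.1.2 differs from classical Gram-Schmidt in a single line: the coefficient $r_{ji}$ is computed from the partially orthogonalized vector $q_i$ instead of the original column $a_i$. The sketch below is illustrative (the helper names and the nearly dependent test matrix with parameter `e` are assumptions of this example).

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mgs(cols):
    """Modified Gram-Schmidt on a list of columns; returns (columns of Q, R)."""
    n = len(cols)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for i, a in enumerate(cols):
        q = list(a)
        for j in range(i):
            R[j][i] = dot(Q[j], q)       # coefficient uses the CURRENT vector q_i
            q = [qk - R[j][i] * qjk for qk, qjk in zip(q, Q[j])]
        R[i][i] = math.sqrt(dot(q, q))
        Q.append([qk / R[i][i] for qk in q])
    return Q, R

e = 1e-8   # 1 + e*e rounds to 1 in double precision
cols = [[1.0, e, 0.0, 0.0], [1.0, 0.0, e, 0.0], [1.0, 0.0, 0.0, e]]
Q, R = mgs(cols)
print(dot(Q[1], Q[2]))   # close to 0: q2 and q3 stay nearly orthogonal
```

On this matrix the computed $q_2$ and $q_3$ remain orthogonal to roughly working precision, whereas the orthogonality between $q_1$ and the later columns is only of order `e`.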

Remark 3.1.6 In the MGS method, the $k$-th column of $Q$ and the $k$-th row of $R$ are computed at the $k$-th step. In the CGS method, the $k$-th column of $Q$ and the $k$-th column of $R$ are computed at the $k$-th step.

Advantage for the OB problem ($m \geq n$): (i) The Householder method requires $mn^2 - n^3/3$ flops to get the factorization $A = QR$ and another $mn^2 - n^3/3$ flops to get the first $n$ columns of $Q$, whereas MGS requires only $mn^2$ flops. Thus for the problem of finding an orthonormal basis of $\mathrm{range}(A)$, MGS is about twice as efficient as Householder orthogonalization. (ii) MGS is numerically stable.

3.1.3 Givens method

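Basic problem: Given $(a, b)^T \in \mathbb{R}^2$, find $c, s \in \mathbb{R}$ with $c^2 + s^2 = 1$ such that
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} k \\ 0 \end{bmatrix},$$
where $c = \cos\alpha$ and $s = \sin\alpha$.

Solution:
$$\begin{cases} c = 1, \; s = 0, \; k = a, & \text{if } b = 0, \\ c = \dfrac{a}{\sqrt{a^2 + b^2}}, \; s = \dfrac{b}{\sqrt{a^2 + b^2}}, \; k = \sqrt{a^2 + b^2}, & \text{if } b \neq 0. \end{cases} \tag{3.1.11}$$
Let
$$G(i, j, \alpha) = \begin{bmatrix} 1 & & & & & & \\ & \ddots & \vdots & & \vdots & & \\ & \cdots & \cos\alpha & \cdots & \sin\alpha & \cdots & \\ & & \vdots & & \vdots & & \\ & \cdots & -\sin\alpha & \cdots & \cos\alpha & \cdots & \\ & & \vdots & & \vdots & \ddots & \\ & & & & & & 1 \end{bmatrix},$$
where the entries $\cos\alpha$, $\sin\alpha$, $-\sin\alpha$, $\cos\alpha$ lie in rows and columns $i$ and $j$. Then $G(i, j, \alpha)$ is called a Givens rotation in the $(i, j)$-coordinate plane. In the matrix $\tilde{A} = G(i, j, \alpha)A$, the rows with index $\neq i, j$ are the same as in $A$ and
$$\tilde{a}_{ik} = \cos(\alpha)a_{ik} + \sin(\alpha)a_{jk}, \quad \text{for } k = 1, \ldots, n,$$
$$\tilde{a}_{jk} = -\sin(\alpha)a_{ik} + \cos(\alpha)a_{jk}, \quad \text{for } k = 1, \ldots, n.$$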

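Algorithm 3.1.3 (Givens orthogonalization) Given $A \in \mathbb{R}^{m \times n}$. The following algorithm overwrites $A$ with $Q^TA = R$, where $Q$ is orthogonal and $R$ is upper triangular.

For $q = 2, \cdots, m$,
    for $p = 1, 2, \cdots, \min\{q - 1, n\}$,
        Find $c = \cos\alpha$ and $s = \sin\alpha$ as in (3.1.11) such that
        $$\begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a_{pp} \\ a_{qp} \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}.$$
        $A := G(p, q, \alpha)A$.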

This algorithm requires 2n²(m− n/3) flops.
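Algorithm 3.1.3 can be sketched as follows, using (3.1.11) for each rotation and the row update formulas for $G(p, q, \alpha)A$. The code works on a copy of $A$ stored as a list of rows and does not accumulate $Q$; the function names are illustrative.

```python
import math

def givens(a, b):
    """c, s, k as in (3.1.11): [[c, s], [-s, c]] applied to (a, b) gives (k, 0)."""
    if b == 0.0:
        return 1.0, 0.0, a
    k = math.hypot(a, b)
    return a / k, b / k, k

def givens_triangularize(A):
    """Return R = Q^T A obtained by plane rotations (Q itself is not formed)."""
    R = [row[:] for row in A]
    m, n = len(R), len(R[0])
    for q in range(1, m):               # rows 2..m in the 1-based notation
        for p in range(min(q, n)):      # columns 1..min(q-1, n)
            c, s, _ = givens(R[p][p], R[q][p])
            for j in range(n):          # only rows p and q change
                rp, rq = R[p][j], R[q][j]
                R[p][j] = c * rp + s * rq
                R[q][j] = -s * rp + c * rq
    return R

A = [[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]]
for row in givens_triangularize(A):
    print(row)
```

Since every rotation is orthogonal, the column norms of $R$ agree with those of $A$, which gives a cheap sanity check on the result.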

Fast Givens method (See Matrix Computations, pp.205-209):

A modification of the Givens method is based on fast Givens rotations and requires about $n^2(m - n/3)$ flops.

3.2 Overdetermined linear Systems - Least Squares Methods

Given $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $m > n$. Consider the least squares (LS) problem:
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2. \tag{3.2.1}$$
Let $X$ be the set of minimizers defined by $X = \{x \in \mathbb{R}^n \mid \|Ax - b\|_2 = \min!\}$. It is easy to see the following properties:

• x ∈ X ⇐⇒ A^T(b− Ax) = 0. (3.2.2)

• X is convex. (3.2.3)

• X has a unique element xLS having minimal 2-norm. (3.2.4)

• X = {xLS} ⇐⇒ rank(A) = n. (3.2.5)

For $x \in \mathbb{R}^n$, we refer to $r = b - Ax$ as its residual. $A^T(b - Ax) = 0$ is referred to as the normal equation. The minimum sum is defined by $\rho_{LS}^2 = \|Ax_{LS} - b\|_2^2$. If we let $\varphi(x) = \frac{1}{2}\|Ax - b\|_2^2$, then $\nabla\varphi(x) = A^T(Ax - b)$.

Theorem 3.2.1 Let $A = \sum_{i=1}^{r} \sigma_iu_iv_i^T$, with $r = \mathrm{rank}(A)$, $U = [u_1, \ldots, u_m]$ and $V = [v_1, \cdots, v_n]$, be the SVD of $A \in \mathbb{R}^{m \times n}$ ($m \geq n$). If $b \in \mathbb{R}^m$, then
$$x_{LS} = \sum_{i=1}^{r} (u_i^Tb/\sigma_i)v_i \tag{3.2.6}$$
and
$$\rho_{LS}^2 = \sum_{i=r+1}^{m} (u_i^Tb)^2. \tag{3.2.7}$$

Proof: For any $x \in \mathbb{R}^n$ we have
$$\|Ax - b\|_2^2 = \|U^TAV(V^Tx) - U^Tb\|_2^2 = \sum_{i=1}^{r} (\sigma_i\alpha_i - u_i^Tb)^2 + \sum_{i=r+1}^{m} (u_i^Tb)^2,$$
where $\alpha = V^Tx$. Clearly, if $x$ solves the LS-problem, then $\alpha_i = u_i^Tb/\sigma_i$ for $i = 1, \ldots, r$.

If we set $\alpha_{r+1} = \cdots = \alpha_n = 0$, then $x = x_{LS}$.

Remark 3.2.1 If we define $A^+$ by $A^+ = V\Sigma^+U^T$, where $\Sigma^+ = \mathrm{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0) \in \mathbb{R}^{n \times m}$, then $x_{LS} = A^+b$ and $\rho_{LS} = \|(I - AA^+)b\|_2$. $A^+$ is referred to as the pseudo-inverse of $A$. $A^+$ is defined to be the unique matrix $X \in \mathbb{R}^{n \times m}$ that satisfies the Moore-Penrose conditions:
$$\text{(i) } AXA = A, \quad \text{(ii) } XAX = X, \quad \text{(iii) } (AX)^T = AX, \quad \text{(iv) } (XA)^T = XA. \tag{3.2.8}$$
Existence of $X$ is easy to check by taking $X = A^+$. Now, we show the uniqueness of $X$.

Suppose $X$ and $Y$ satisfy the conditions (i)–(iv). Then
$$\begin{aligned} X &= XAX = X(AYA)X = X(AYA)Y(AYA)X \\ &= (XA)(YA)Y(AY)(AX) = (XA)^T(YA)^TY(AY)^T(AX)^T \\ &= (AXA)^TY^TYY^T(AXA)^T = A^TY^TYY^TA^T \\ &= Y(AYA)Y = YAY = Y. \end{aligned}$$

If rank(A) = n (m ≥ n), then A⁺ = (A^TA)⁻¹A^T. If rank(A) = m (m ≤ n), then A⁺ = A^T(AA^T)⁻¹. If m = n = rank(A), then A⁺= A⁻¹.
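For a small full column rank matrix, the formula $A^+ = (A^TA)^{-1}A^T$ and the four Moore-Penrose conditions (3.2.8) can be checked directly. The sketch below is restricted to $m \times 2$ matrices so that $(A^TA)^{-1}$ can be written explicitly; all function names are illustrative.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def pinv_full_column_rank_2(A):
    """A^+ = (A^T A)^{-1} A^T for an m x 2 matrix A of rank 2."""
    At = transpose(A)
    (a, b), (c, d) = matmul(At, A)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # explicit 2 x 2 inverse
    return matmul(inv, At)

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
X = pinv_full_column_rank_2(A)
AX, XA = matmul(A, X), matmul(X, A)
print(close(matmul(AX, A), A),        # (i)   AXA = A
      close(matmul(XA, X), X),        # (ii)  XAX = X
      close(transpose(AX), AX),       # (iii) (AX)^T = AX
      close(transpose(XA), XA))       # (iv)  (XA)^T = XA
```

Forming $(A^TA)^{-1}$ explicitly is done here only to check the identities; it is not a recommended way to compute $A^+$ for ill-conditioned $A$.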

• For the case rank(A)=n:

Algorithm 3.2.1 (Normal equations) Given $A \in \mathbb{R}^{m \times n}$ ($m \geq n$) with $\mathrm{rank}(A) = n$ and $b \in \mathbb{R}^m$. This algorithm computes the solution to the LS-problem: $\min\{\|Ax - b\|_2;\ x \in \mathbb{R}^n\}$.

Compute $d := A^Tb$, form $C := A^TA$, and compute the Cholesky factorization $C = R^TR$ (see Remark 6.1). Solve $R^Ty = d$ and $Rx_{LS} = y$.
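The steps of Algorithm 3.2.1 can be sketched as follows for a matrix stored as a list of rows. The Cholesky routine is a plain textbook version written for this illustration, and the function names are hypothetical.

```python
import math

def cholesky_upper(C):
    """Return upper triangular R with C = R^T R (C symmetric positive definite)."""
    n = len(C)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = C[i][i] - sum(R[k][i] ** 2 for k in range(i))
        R[i][i] = math.sqrt(d)           # raises ValueError if d < 0
        for j in range(i + 1, n):
            R[i][j] = (C[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R

def solve_normal_equations(A, b):
    m, n = len(A), len(A[0])
    C = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    d = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    R = cholesky_upper(C)
    y = [0.0] * n
    for i in range(n):                   # forward substitution: R^T y = d
        y[i] = (d[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):         # back substitution: R x = y
        x[i] = (y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

# Fit a line t -> x0 + x1 * t to the points (0, 1), (1, 2), (2, 4).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 4.0]
print(solve_normal_equations(A, b))
```

For this example $A^TA = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}$ and $A^Tb = (7, 10)^T$, so the exact solution is $x_{LS} = (5/6, 3/2)^T$.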

Algorithm 3.2.2 (Householder and Givens orthogonalizations) Given $A \in \mathbb{R}^{m \times n}$ ($m \geq n$) with $\mathrm{rank}(A) = n$ and $b \in \mathbb{R}^m$. This algorithm computes the solution to the LS-problem: $\min\{\|Ax - b\|_2;\ x \in \mathbb{R}^n\}$.

Compute the QR-factorization $Q^TA = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$ by using the Householder or the Givens method, respectively. (Here $R_1$ is upper triangular.) Then
$$\|Ax - b\|_2^2 = \|Q^TAx - Q^Tb\|_2^2 = \|R_1x - c\|_2^2 + \|d\|_2^2, \quad \text{where } Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix}.$$
Thus, $x_{LS} = R_1^{-1}c$ (since $\mathrm{rank}(A) = \mathrm{rank}(R_1) = n$) and $\rho_{LS}^2 = \|d\|_2^2$.

Algorithm 3.2.3 (Modified Gram-Schmidt) Given $A \in \mathbb{R}^{m \times n}$ ($m \geq n$) with $\mathrm{rank}(A) = n$ and $b \in \mathbb{R}^m$. The solution of $\min\|Ax - b\|_2$ is given by:

Compute $A = Q_1R_1$, where $Q_1 \in \mathbb{R}^{m \times n}$ with $Q_1^TQ_1 = I_n$ and $R_1 \in \mathbb{R}^{n \times n}$ is upper triangular. Then the normal equation $(A^TA)x = A^Tb$ is transformed to the linear system $R_1x = Q_1^Tb \Rightarrow x_{LS} = R_1^{-1}Q_1^Tb$.
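Algorithm 3.2.3 can be sketched by combining the MGS loop of Algorithm 3.1.2 with a back substitution for $R_1x = Q_1^Tb$. The function name `mgs_least_squares` and the small line-fitting example are assumptions of this illustration.

```python
import math

def mgs_least_squares(A, b):
    """Solve min ||Ax - b||_2 for full column rank A via A = Q1 R1 (MGS)."""
    m, n = len(A), len(A[0])
    Q = [[A[i][j] for i in range(m)] for j in range(n)]   # columns of A, overwritten by q_j
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            R[j][i] = sum(Q[j][k] * Q[i][k] for k in range(m))
            Q[i] = [Q[i][k] - R[j][i] * Q[j][k] for k in range(m)]
        R[i][i] = math.sqrt(sum(v * v for v in Q[i]))
        Q[i] = [v / R[i][i] for v in Q[i]]
    c = [sum(Q[j][k] * b[k] for k in range(m)) for j in range(n)]   # c = Q1^T b
    x = [0.0] * n
    for i in reversed(range(n)):                                    # solve R1 x = c
        x[i] = (c[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

# Fit a line t -> x0 + x1 * t to the points (0, 1), (1, 2), (2, 4).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 4.0]
print(mgs_least_squares(A, b))
```

Here $q_1 = (1, 1, 1)^T/\sqrt{3}$, $q_2 = (-1, 0, 1)^T/\sqrt{2}$, and the exact solution is $x_{LS} = (5/6, 3/2)^T$; note that $A^TA$ is never formed.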

• For the case rank(A) < n:

Problem:

(i) How to find a solution to the LS-problem?

(ii) How to find the unique solution having minimal 2-norm?

(iii) How to compute $x_{LS}$ reliably when $A$ has an infinite condition number?

Definition 3.2.1 Let A be a m× n matrix with rank(A) = r (r ≤ m, n). The factorization A = BC with B ∈ R^m×r and C ∈ R^r×n is called a full rank factorization, provided that B has full column rank and C has full row rank.

Theorem 3.2.2 If A = BC is a full rank factorization, then

A⁺ = C⁺B⁺ = C^T(CC^T)⁻¹(B^TB)⁻¹B^T. (3.2.9)

Proof: From the assumption it follows that
$$B^+B = (B^TB)^{-1}B^TB = I_r, \quad CC^+ = CC^T(CC^T)^{-1} = I_r.$$
We check the conditions (3.2.8):
$$A(C^+B^+)A = BCC^+B^+BC = BC = A,$$
$$(C^+B^+)A(C^+B^+) = C^+B^+BCC^+B^+ = C^+B^+,$$
$$A(C^+B^+) = BCC^+B^+ = BB^+ \text{ is symmetric},$$
$$(C^+B^+)A = C^+B^+BC = C^+C \text{ is symmetric}.$$
These imply that $X = C^+B^+$ satisfies (3.2.8). It follows that $A^+ = C^+B^+$.

Unfortunately, if $\mathrm{rank}(A) < n$, then the QR-factorization does not necessarily produce a full rank factorization of $A$. For example,
$$A = [a_1, a_2, a_3] = [q_1, q_2, q_3] \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$
Fortunately, we have the following two methods to produce a full rank factorization of $A$.

3.2.1 Rank Deficiency I : QR with column pivoting

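Algorithm ?? can be modified in a simple way so as to produce a full rank factorization of $A$:
$$A\Pi = QR, \quad R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}, \tag{3.2.10}$$
where the block rows of $R$ have $r$ and $m - r$ rows, the block columns have $r$ and $n - r$ columns, $r = \mathrm{rank}(A) < n$ ($m \geq n$), $Q$ is orthogonal, $R_{11}$ is nonsingular upper triangular and $\Pi$ is a permutation. Once (3.2.10) is computed, the LS-problem can be readily solved by
$$\|Ax - b\|_2^2 = \|(Q^TA\Pi)(\Pi^Tx) - Q^Tb\|_2^2 = \|R_{11}y - (c - R_{12}z)\|_2^2 + \|d\|_2^2,$$
where
$$\Pi^Tx = \begin{bmatrix} y \\ z \end{bmatrix} \quad (y \in \mathbb{R}^r,\ z \in \mathbb{R}^{n-r}) \quad \text{and} \quad Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix} \quad (c \in \mathbb{R}^r,\ d \in \mathbb{R}^{m-r}).$$
Thus if $\|Ax - b\|_2 = \min!$, then we must have
$$x = \Pi \begin{bmatrix} R_{11}^{-1}(c - R_{12}z) \\ z \end{bmatrix}.$$
If $z$ is set to zero, then we obtain the basic solution
$$x_B = \Pi \begin{bmatrix} R_{11}^{-1}c \\ 0 \end{bmatrix}.$$
The basic solution is not the solution with minimal 2-norm, unless the submatrix $R_{12}$ is zero, since
$$\|x_{LS}\|_2 = \min_{z \in \mathbb{R}^{n-r}} \left\| x_B - \Pi \begin{bmatrix} R_{11}^{-1}R_{12} \\ -I_{n-r} \end{bmatrix} z \right\|_2. \tag{3.2.11}$$
We now solve the LS-problem (3.2.11) by using Algorithms 3.2.1 to 3.2.3.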

Algorithm 3.2.4 Given $A \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(A) = r < n$. The following algorithm computes the factorization $A\Pi = QR$ defined by (3.2.10). The element $a_{ij}$ is overwritten by $r_{ij}$ ($i \leq j$). The permutation $\Pi = [e_{c_1}, \ldots, e_{c_n}]$ is determined by choosing the column of maximal norm in the current step.

$c_j := j$ ($j = 1, 2, \ldots, n$), $r_j := \sum_{i=1}^{m} a_{ij}^2$ ($j = 1, \ldots, n$),
For $k = 1, \ldots, n$,
    Determine $p$ with $k \leq p \leq n$ so that $r_p = \max_{k \leq j \leq n} r_j$.
    If $r_p = 0$ then stop; else
    interchange $c_k$ and $c_p$, $r_k$ and $r_p$, and $a_{ik}$ and $a_{ip}$, for $i = 1, \ldots, m$.
    Determine a Householder matrix $\hat{Q}_k$ such that
    $$\hat{Q}_k \begin{bmatrix} a_{kk} \\ \vdots \\ a_{mk} \end{bmatrix} = \begin{bmatrix} * \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
    $A := \mathrm{diag}(I_{k-1}, \hat{Q}_k)A$; $r_j := r_j - a_{kj}^2$ ($j = k + 1, \ldots, n$).

This algorithm requires 2mnr− r²(m + n) + 2r³/3 flops.

Algorithm 3.2.4 produces the full rank factorization (3.2.10) of $A$. We have the following important relations:
$$|r_{11}| \geq |r_{22}| \geq \ldots \geq |r_{rr}|, \quad r_{jj} = 0, \; j = r + 1, \ldots, n,$$
$$|r_{ii}| \geq |r_{ik}|, \quad i = 1, \ldots, r, \; k = i + 1, \ldots, n. \tag{3.2.12}$$
Here, $r = \mathrm{rank}(A) < n$ and $R = (r_{ij})$. In the following we show another application of the full rank factorization for solving the LS-problem.

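Algorithm 3.2.5 (Compute $x_{LS} = A^+b$ directly)

(i) Compute (3.2.10):
$$A\Pi = QR \equiv (Q^{(1)} \mid Q^{(2)}) \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \Rightarrow A\Pi = Q^{(1)}R_1,$$
where $Q^{(1)}$ consists of the first $r$ columns of $Q$, $R_1$ has $r$ rows and the zero block has $m - r$ rows.

(ii) $(A\Pi)^+ = R_1^+Q^{(1)+} = R_1^+Q^{(1)T}$.

(iii) Compute $R_1^+$:

Either: $R_1^+ = R_1^T(R_1R_1^T)^{-1}$ (since $R_1$ has full row rank)
$$\Rightarrow (A\Pi)^+ = R_1^T(R_1R_1^T)^{-1}Q^{(1)T}.$$
Or: Find $\hat{Q}$ using Householder transformations (Algorithm ??) such that $\hat{Q}R_1^T = \begin{bmatrix} T \\ 0 \end{bmatrix}$, where $T \in \mathbb{R}^{r \times r}$ is upper triangular.

Let $\hat{Q}^T := (\hat{Q}^{(1)}, \hat{Q}^{(2)}) \Rightarrow R_1^T = \hat{Q}^{(1)}T + \hat{Q}^{(2)}0 = \hat{Q}^{(1)}T$. Then $R_1 = T^T\hat{Q}^{(1)T} \Rightarrow R_1^+ = (\hat{Q}^{(1)T})^+(T^T)^+ = \hat{Q}^{(1)}(T^T)^{-1}$
$$\Rightarrow (A\Pi)^+ = \hat{Q}^{(1)}(T^T)^{-1}Q^{(1)T}.$$
(iv) Since $\min\|Ax - b\|_2 = \min\|A\Pi(\Pi^Tx) - b\|_2 \Rightarrow (\Pi^Tx)_{LS} = (A\Pi)^+b$
$$\Rightarrow x_{LS} = \Pi(A\Pi)^+b.$$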

Remark 3.2.2 Unfortunately, QR with column pivoting is not entirely reliable as a method for detecting near rank deficiency. For example:
$$T_n(c) = \mathrm{diag}(1, s, \cdots, s^{n-1}) \begin{bmatrix} 1 & -c & -c & \cdots & -c \\ & 1 & -c & \cdots & -c \\ & & \ddots & & \vdots \\ & & & \ddots & -c \\ 0 & & & & 1 \end{bmatrix}, \quad c^2 + s^2 = 1, \; c, s > 0.$$
If $n = 100$, $c = 0.2$, then $\sigma_n = 0.3679\mathrm{e}{-8}$. But this matrix is unaltered by Algorithm 3.2.4.

However, the “degree of unreliability” is somewhat like that for Gaussian elimination with partial pivoting, a method that works very well in practice.

3.2.2 Rank Deficiency II : The Singular Value Decomposition

Algorithm 3.2.6 (Householder Bidiagonalization) Given $A \in \mathbb{R}^{m \times n}$ ($m \geq n$). The following algorithm overwrites $A$ with $U_B^TAV_B = B$, where $B$ is upper bidiagonal and $U_B$ and $V_B$ are orthogonal.

For $k = 1, \cdots, n$,
    Determine a Householder matrix $\hat{U}_k$ of order $m - k + 1$ such that
    $$\hat{U}_k \begin{bmatrix} a_{kk} \\ \vdots \\ a_{mk} \end{bmatrix} = \begin{bmatrix} * \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
    $A := \mathrm{diag}(I_{k-1}, \hat{U}_k)A$,
    If $k \leq n - 2$, then determine a Householder matrix $\hat{V}_k$ of order $n - k$ such that
    $[a_{k,k+1}, \cdots, a_{kn}]\hat{V}_k = (*, 0, \cdots, 0)$,
    $A := A\,\mathrm{diag}(I_k, \hat{V}_k)$.

This algorithm requires 2mn²− 2/3n³ flops.

Algorithm 3.2.7 (R-Bidiagonalization) When $m \gg n$ we can use the following faster method of bidiagonalization.

(1) Compute an orthogonal $Q_1 \in \mathbb{R}^{m \times m}$ such that $Q_1^TA = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $R_1 \in \mathbb{R}^{n \times n}$ is upper triangular.

(2) Applying Algorithm 3.2.6 to $R_1$, we get $Q_2^TR_1V_B = B_1$, where $Q_2, V_B \in \mathbb{R}^{n \times n}$ are orthogonal and $B_1 \in \mathbb{R}^{n \times n}$ is upper bidiagonal.

(3) Define $U_B = Q_1\mathrm{diag}(Q_2, I_{m-n})$. Then $U_B^TAV_B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} \equiv B$ is bidiagonal.

This algorithm requires $mn^2 + n^3$ flops. It involves fewer computations than Algorithm 3.2.6 ($2mn^2 - \frac{2}{3}n^3$ flops) whenever $m \geq \frac{5}{3}n$.
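The crossover point $m = \frac{5}{3}n$ follows from comparing the two flop counts: $mn^2 + n^3 \leq 2mn^2 - \frac{2}{3}n^3$ is equivalent to $\frac{5}{3}n^3 \leq mn^2$. A short exact check (the function names are illustrative):

```python
from fractions import Fraction

def flops_householder_bidiag(m, n):
    """Flop count quoted for Algorithm 3.2.6."""
    return 2 * m * n**2 - Fraction(2, 3) * n**3

def flops_r_bidiag(m, n):
    """Flop count quoted for Algorithm 3.2.7."""
    return m * n**2 + n**3

n = 3
for m in (4, 5, 6):   # below, at, and above the crossover m = 5n/3
    print(m, flops_householder_bidiag(m, n), flops_r_bidiag(m, n))
```

For $n = 3$ both counts equal 72 at $m = 5$, Algorithm 3.2.6 is cheaper at $m = 4$, and R-bidiagonalization is cheaper at $m = 6$.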

Once the bidiagonalization of $A$ has been achieved, the next step in the Golub-Reinsch SVD algorithm is to zero out the superdiagonal elements in $B$. Unfortunately, we must defer our discussion of this iteration until Chapter 5 since it requires an understanding of the symmetric QR algorithm for eigenvalues. That iteration computes orthogonal matrices $U_\Sigma$ and $V_\Sigma$ such that

U_Σ^TBV_Σ = Σ = diag(σ₁,· · · , σn).

By defining U = U_BU_Σ and V = V_BV_Σ, we see that U^TAV = Σ is the SVD of A.

Case          Algorithm         Method                          Flop count
              Algorithm 3.2.1   Normal equations                mn²/2 + n³/6
rank(A) = n   Algorithm 3.2.2   Householder orthogonalization   mn² − n³/3
              Algorithm 3.2.3   Modified Gram-Schmidt           mn²
              Algorithm 3.1.3   Givens orthogonalization        2mn² − 2/3 n³
              Algorithm 3.2.6   Householder Bidiagonalization   2mn² − 2/3 n³
              Algorithm 3.2.7   R-Bidiagonalization             mn² + n³
              LINPACK           Golub-Reinsch SVD               2mn² + 4n³
rank(A) < n   Algorithm 3.2.5   QR with column pivoting         2mnr − mr² + 1/3 r³
              Alg. 3.2.7+SVD    Chan SVD                        mn² + 11/2 n³

Table 3.1: Solving the LS problem (m ≥ n)

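Remark 3.2.3 If the LINPACK SVD Algorithm is applied with $\mathrm{eps} = 10^{-17}$ to
$$T_{100}(0.2) = \mathrm{diag}(1, s, \cdots, s^{n-1}) \begin{bmatrix} 1 & -c & -c & \cdots & -c \\ & 1 & -c & \cdots & -c \\ & & \ddots & & \vdots \\ & & & \ddots & -c \\ 0 & & & & 1 \end{bmatrix},$$
then $\hat{\sigma}_n = 0.367805646308792467 \times 10^{-8}$.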

Remark 3.2.4 As we mentioned before, when solving the LS problem via the SVD, only Σ and V have to be computed (see (3.2.6)). Table 3.1 compares the efficiency of this approach with the other algorithms that we have presented.

3.2.3 The Sensitivity of the Least Squares Problem

Corollary 3.2.1 (of Theorem 1.2.3) Let $U = [u_1, \cdots, u_m]$, $V = [v_1, \cdots, v_n]$ and $U^*AV = \Sigma = \mathrm{diag}(\sigma_1, \cdots, \sigma_r, 0, \cdots, 0)$. If $k < r = \mathrm{rank}(A)$ and $A_k = \sum_{i=1}^{k} \sigma_iu_iv_i^T$, then
$$\min_{\mathrm{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}.$$

Proof: Since $U^TA_kV = \mathrm{diag}(\sigma_1, \cdots, \sigma_k, 0, \cdots, 0)$, it follows that $\mathrm{rank}(A_k) = k$ and that
$$\|A - A_k\|_2 = \|U^T(A - A_k)V\|_2 = \|\mathrm{diag}(0, \cdots, 0, \sigma_{k+1}, \cdots, \sigma_r)\|_2 = \sigma_{k+1}.$$
Suppose $B \in \mathbb{R}^{m \times n}$ and $\mathrm{rank}(B) = k$, i.e., there are orthonormal vectors $x_1, \cdots, x_{n-k}$ such that $N(B) = \mathrm{span}\{x_1, \cdots, x_{n-k}\}$. Since the dimensions $(n - k) + (k + 1)$ add up to more than $n$, this implies
$$\mathrm{span}\{x_1, \cdots, x_{n-k}\} \cap \mathrm{span}\{v_1, \cdots, v_{k+1}\} \neq \{0\}.$$
Let $z$ be a unit vector in the intersection. Then $Bz = 0$ and $Az = \sum_{i=1}^{k+1} \sigma_i(v_i^Tz)u_i$. Thus,
$$\|A - B\|_2^2 \geq \|(A - B)z\|_2^2 = \|Az\|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2(v_i^Tz)^2 \geq \sigma_{k+1}^2.$$

3.2.4 Condition number of a Rectangular Matrix

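Let $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) = n$, $\kappa_2(A) = \sigma_{\max}(A)/\sigma_{\min}(A)$.

(i) The method of normal equations:
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2 \Leftrightarrow A^TAx = A^Tb.$$
(a) $C = A^TA$, $d = A^Tb$.

(b) Compute the Cholesky factorization $C = GG^T$.

(c) Solve $Gy = d$ and $G^Tx_{LS} = y$.

Then
$$\frac{\|\tilde{x}_{LS} - x_{LS}\|_2}{\|x_{LS}\|_2} \approx \mathrm{eps}\,\kappa_2(A^TA) = \mathrm{eps}\,\kappa_2(A)^2.$$
Compare the perturbation bound for a linear system:
$$\frac{\|\tilde{x} - x\|}{\|x\|} \leq \kappa(A)\Big(\varepsilon\frac{\|F\|}{\|A\|} + \varepsilon\frac{\|f\|}{\|b\|}\Big) + O(\varepsilon^2),$$
where $(A + \varepsilon F)\tilde{x} = b + \varepsilon f$ and $Ax = b$.

(ii) LS solution via QR factorization:
$$\|Ax - b\|_2^2 = \|Q^TAx - Q^Tb\|_2^2 = \|R_1x - c\|_2^2 + \|d\|_2^2, \quad x_{LS} = R_1^{-1}c, \quad \rho_{LS} = \|d\|_2.$$
Numerically, trouble can be expected whenever $\kappa_2(A) = \kappa_2(R) \approx 1/\mathrm{eps}$. This is in contrast to the normal equations, where the Cholesky factorization becomes problematic once $\kappa_2(A)$ is in the neighborhood of $1/\sqrt{\mathrm{eps}}$.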
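The breakdown of the Cholesky factorization at $\kappa_2(A) \approx 1/\sqrt{\mathrm{eps}}$ can be seen on a $3 \times 2$ example. The matrix below (a Läuchli-type matrix with parameter `delta`) is chosen for this illustration and is not taken from the notes; its singular values are $\sqrt{2 + \delta^2}$ and $\delta$.

```python
import math
import sys

delta = 1e-8
A = [[1.0, 1.0], [delta, 0.0], [0.0, delta]]

# C = A^T A; in exact arithmetic C = [[1 + delta^2, 1], [1, 1 + delta^2]]
C = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]

# First Cholesky step and the remaining pivot (must be > 0 to continue)
r11 = math.sqrt(C[0][0])
r12 = C[0][1] / r11
pivot = C[1][1] - r12 * r12

kappa = math.sqrt(2.0 + delta**2) / delta      # kappa_2(A)
eps = sys.float_info.epsilon
print(C, pivot)
print(kappa, 1.0 / math.sqrt(eps), 1.0 / eps)
```

In double precision $1 + \delta^2$ rounds to 1, so the computed $C$ is exactly singular and the second pivot is zero, although $\kappa_2(A) \approx 1.4 \times 10^8$ is still far below $1/\mathrm{eps}$, so a QR-based method can still work with $A$ itself.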

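Remark 3.2.5
$$\|A\|_2\|(A^TA)^{-1}A^T\|_2 = \kappa_2(A), \quad \|A\|_2^2\|(A^TA)^{-1}\|_2 = \kappa_2(A)^2.$$

Theorem 3.2.3 Let $A \in \mathbb{R}^{m \times n}$ ($m \geq n$), $b \neq 0$. Suppose that $x$, $r$, $\tilde{x}$, $\tilde{r}$ satisfy
$$\|Ax - b\|_2 = \min!, \quad r = b - Ax, \quad \rho_{LS} = \|r\|_2,$$
$$\|(A + \delta A)\tilde{x} - (b + \delta b)\|_2 = \min!, \quad \tilde{r} = (b + \delta b) - (A + \delta A)\tilde{x}.$$
If
$$\varepsilon = \max\Big\{\frac{\|\delta A\|_2}{\|A\|_2}, \frac{\|\delta b\|_2}{\|b\|_2}\Big\} < \frac{\sigma_n(A)}{\sigma_1(A)} \quad \text{and} \quad \sin\theta = \frac{\rho_{LS}}{\|b\|_2} \neq 1,$$
then
$$\frac{\|\tilde{x} - x\|_2}{\|x\|_2} \leq \varepsilon\Big\{\frac{2\kappa_2(A)}{\cos\theta} + \tan\theta\,\kappa_2(A)^2\Big\} + O(\varepsilon^2)$$
and
$$\frac{\|\tilde{r} - r\|_2}{\|b\|_2} \leq \varepsilon(1 + 2\kappa_2(A))\min(1, m - n) + O(\varepsilon^2).$$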

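Proof: Let $E = \delta A/\varepsilon$ and $f = \delta b/\varepsilon$. Since $\|\delta A\|_2 < \sigma_n(A)$, it follows from the previous Corollary that $\mathrm{rank}(A + tE) = n$ for $t \in [0, \varepsilon]$.

[For $t = \varepsilon$ we have $A + tE = A + \delta A$. If $\mathrm{rank}(A + \delta A) = k < n$, then $\|A - (A + \delta A)\|_2 = \|\delta A\|_2 \geq \|A - A_k\|_2 = \sigma_{k+1} \geq \sigma_n$. Contradiction! Here we used
$$\min_{\mathrm{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \Big\|A - \sum_{i=1}^{k} \sigma_iu_iv_i^T\Big\|_2 = \sigma_{k+1}.]$$
Hence we have
$$(A + tE)^T(A + tE)x(t) = (A + tE)^T(b + tf). \tag{3.2.13}$$
Since $x(t)$ is continuously differentiable for all $t \in [0, \varepsilon]$, $x = x(0)$ and $\tilde{x} = x(\varepsilon)$, it follows that
$$\tilde{x} = x + \varepsilon\dot{x}(0) + O(\varepsilon^2)$$
and
$$\frac{\|\tilde{x} - x\|_2}{\|x\|_2} = \varepsilon\frac{\|\dot{x}(0)\|_2}{\|x\|_2} + O(\varepsilon^2).$$
Differentiating (3.2.13) and setting $t = 0$, we have
$$E^TAx + A^TEx + A^TA\dot{x}(0) = A^Tf + E^Tb.$$
Thus,
$$\dot{x}(0) = (A^TA)^{-1}A^T(f - Ex) + (A^TA)^{-1}E^Tr.$$
From $\|f\|_2 \leq \|b\|_2$ and $\|E\|_2 \leq \|A\|_2$ it follows that
$$\frac{\|\tilde{x} - x\|_2}{\|x\|_2} \leq \varepsilon\Big\{\|A\|_2\|(A^TA)^{-1}A^T\|_2\Big(\frac{\|b\|_2}{\|A\|_2\|x\|_2} + 1\Big) + \frac{\rho_{LS}}{\|A\|_2\|x\|_2}\|A\|_2^2\|(A^TA)^{-1}\|_2\Big\} + O(\varepsilon^2).$$
Since $A^T(Ax_{LS} - b) = 0$, we have $Ax_{LS} \perp Ax_{LS} - b$ and then
$$\|b - Ax\|_2^2 + \|Ax\|_2^2 = \|b\|_2^2$$
and
$$\|A\|_2^2\|x\|_2^2 \geq \|b\|_2^2 - \rho_{LS}^2.$$
Thus,
$$\frac{\|\tilde{x} - x\|_2}{\|x\|_2} \leq \varepsilon\Big\{\kappa_2(A)\Big(\frac{1}{\cos\theta} + 1\Big) + \kappa_2(A)^2\frac{\sin\theta}{\cos\theta}\Big\} + O(\varepsilon^2).$$
Furthermore, by $\dfrac{\sin\theta}{\cos\theta} = \dfrac{\rho_{LS}}{\sqrt{\|b\|_2^2 - \rho_{LS}^2}}$, we have
$$\frac{\|\tilde{x} - x\|_2}{\|x\|_2} \approx \mathrm{eps}\,(\kappa_2(A) + \kappa_2(A)^2\rho_{LS}) \quad (\theta \text{ small}).$$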

Remark 3.2.6 Normal equations: $\mathrm{eps}\,\kappa_2(A)^2$. QR approach: $\mathrm{eps}\,(\kappa_2(A) + \rho_{LS}\kappa_2(A)^2)$.

(i) If $\rho_{LS}$ is small and $\kappa_2(A)$ is large, then QR is better than the normal equations.

(ii) The normal equations approach involves about half of the arithmetic when $m \gg n$ and does not require as much storage.

(iii) The QR approach is applicable to a wider class of matrices because the Cholesky factorization of $A^TA$ breaks down “before” the back substitution process on $Q^TA = R$.

3.2.5 Iterative Improvement

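$$\begin{bmatrix} I_m & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \quad \|b - Ax\|_2 = \min!$$
$$r + Ax = b, \quad A^Tr = 0 \Rightarrow A^TAx = A^Tb.$$
Thus,
$$\begin{bmatrix} f^{(k)} \\ g^{(k)} \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix} - \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r^{(k)} \\ x^{(k)} \end{bmatrix}$$
and
$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} p^{(k)} \\ z^{(k)} \end{bmatrix} = \begin{bmatrix} f^{(k)} \\ g^{(k)} \end{bmatrix}.$$
This implies
$$\begin{bmatrix} r^{(k+1)} \\ x^{(k+1)} \end{bmatrix} = \begin{bmatrix} r^{(k)} \\ x^{(k)} \end{bmatrix} + \begin{bmatrix} p^{(k)} \\ z^{(k)} \end{bmatrix}.$$
If $A = QR = Q\begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, then
$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} p \\ z \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix}$$
implies that
$$\begin{bmatrix} I_n & 0 & R_1 \\ 0 & I_{m-n} & 0 \\ R_1^T & 0 & 0 \end{bmatrix} \begin{bmatrix} h \\ f_2 \\ z \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ g \end{bmatrix},$$
where $Q^Tf = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}$ and $Q^Tp = \begin{bmatrix} h \\ f_2 \end{bmatrix}$. Thus, $R_1^Th = g \Rightarrow h = R_1^{-T}g$. Then
$$z = R_1^{-1}(f_1 - h), \quad p = Q\begin{bmatrix} h \\ f_2 \end{bmatrix}.$$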