about 500,000. The column index $n$ runs over the video list, which has about 20,000 entries. Ratings range from 1 star to 5 stars. The matrix is certainly incomplete. We want to fill in the missing entries under the assumption that the completed matrix has low rank. The singular value decomposition of $A$ is
$$A = \sum_{i=1}^{p} \sigma_i u_i v_i^T.$$
Each term $u_i v_i^T$ represents a certain type of video, for instance drama or action. In a class $u_i v_i^T$, the row $v_i^T$ lists the videos of this type (say drama), and $u_i$ lists the members who rate these videos highly. The completed matrix can be used to make recommendations to the members.
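The expansion into rank-one terms $\sigma_i u_i v_i^T$ can be illustrated on a small, fully observed matrix. The sketch below uses a made-up $4 \times 4$ rating matrix and a hypothetical helper `rank_k_approximation`; it only shows how the sum of rank-one terms reproduces $A$ and how keeping two terms gives a low-rank approximation. It does not perform the completion of missing entries.

```python
import numpy as np

# Hypothetical toy rating matrix (members x videos); the values are made up.
A = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Reduced SVD: columns of U are u_i, rows of Vt are v_i^T, s holds sigma_i.
U, s, Vt = np.linalg.svd(A, full_matrices=False)


def rank_k_approximation(U, s, Vt, k):
    """Sum of the first k rank-one terms sigma_i * u_i * v_i^T."""
    return sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))


A_full = rank_k_approximation(U, s, Vt, len(s))  # all terms: recovers A
A_2 = rank_k_approximation(U, s, Vt, 2)          # two dominant "classes" only
err_2 = np.linalg.norm(A - A_2, 2)               # spectral-norm error of the truncation
```

In this toy example the two leading terms roughly separate the members who prefer the first two videos from those who prefer the last two.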
2.2 Introduction and overview
There are four kinds of linear problems we encounter in applications:
• solving large linear systems: $Ax = b$;
• solving least-squares problems: $\min_x \|Ax - b\|^2$;
• solving eigenvalue problems: $Ax = \lambda x$;
• computing singular value decompositions.
In solving linear systems, there are two classes of methods:
• Direct methods: these solve the equation directly and are usually used for small systems. Basically, the solution process is a factorization of the matrix $A$, such as the LU factorization.
• Iterative methods: the basic idea is to decompose $A = M - N$, where $M$ is a major part that is easy to invert, while $N$ is a minor part. We then perform the iteration $M x_{n+1} - N x_n = b$ to obtain an approximate solution. Usually a preconditioner is needed, which means that we replace $Ax = b$ by $PAx = Pb$ so that such a major-minor decomposition is easy to obtain.
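The splitting iteration $M x_{n+1} - N x_n = b$ can be sketched in a few lines. The choice $M = \mathrm{diag}(A)$ below (the Jacobi splitting) is an assumption made for illustration, as are the function name `splitting_iteration` and the test matrix; the notes do not fix a particular splitting at this point.

```python
import numpy as np


def splitting_iteration(A, b, num_iters=100):
    """Iterate M x_{n+1} = N x_n + b with the (assumed) Jacobi choice M = diag(A)."""
    M_diag = np.diag(A)          # diagonal entries of A: the "major" part M
    N = np.diag(M_diag) - A      # N = M - A, so that A = M - N
    x = np.zeros_like(b, dtype=float)
    for _ in range(num_iters):
        # Solving with M is trivial because M is diagonal.
        x = (N @ x + b) / M_diag
    return x


# A diagonally dominant example, for which this splitting converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = splitting_iteration(A, b)
```

The iteration converges here because the diagonal dominates the off-diagonal part; for a general matrix this is not guaranteed, which is where preconditioning enters.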
For eigenvalue problems, I shall discuss the power method and the QR algorithm. For least-squares problems, I shall discuss a weighted iterative method.
2.3 *Matrix Algebra
Spectral Decomposition
We assume $A$ is an $n \times n$ matrix acting on $\mathbb{C}^n$.
Theorem 2.2 (Cayley-Hamilton). Let $p_A(\lambda) := \det(\lambda I - A)$ be the characteristic polynomial of $A$. Then $p_A(A) = 0$.
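A quick numerical check of the Cayley-Hamilton theorem on a small example may help. The matrix below is an arbitrary choice; `np.poly` returns the coefficients of the characteristic polynomial of a square matrix (highest degree first), and the polynomial is then evaluated at $A$ by Horner's rule.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
n = A.shape[0]

# Coefficients of p_A(lambda) = det(lambda I - A), highest degree first.
# For this A: lambda^2 - 5 lambda - 2.
coeffs = np.poly(A)

# Horner evaluation of p_A at the matrix A (scalars become multiples of I).
P = np.zeros((n, n))
for c in coeffs:
    P = P @ A + c * np.eye(n)
```

Up to rounding error, `P` is the zero matrix.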
Theorem 2.3. There exists a minimal polynomial $p_m$ which is a factor of $p_A$ and satisfies $p_m(A) = 0$.
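Theorem 2.4 (Fundamental Theorem of Algebra). Any polynomial $p(\lambda)$ over $\mathbb{C}$ of degree $m$ can be factorized as
$$p(\lambda) = a \prod_{i=1}^{m} (\lambda - \lambda_i)$$
for some constant $a \neq 0$ and $\lambda_1, \dots, \lambda_m \in \mathbb{C}$. This factorization is unique up to the order of the factors.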
Definition 2.2. Let $A : \mathbb{C}^n \to \mathbb{C}^n$. A subspace $V \subset \mathbb{C}^n$ is called an invariant subspace of the linear map $A$ if $AV \subset V$.
Definition 2.3. Let $A : \mathbb{C}^n \to \mathbb{C}^n$. A nonzero vector $v$ is called an eigenvector of $A$ if there exists a $\lambda$ such that
$$Av = \lambda v.$$
Definition 2.4. For a matrix $A$, the set of all its eigenvalues $\sigma(A) := \{\lambda_1, \dots, \lambda_n\}$ is called the spectrum of $A$.
Definition 2.5. A vector space $V$ is said to be the direct sum of its two subspaces $V_1$ and $V_2$ if for any $v \in V$ there exist two unique vectors $v_i \in V_i$, $i = 1, 2$, such that $v = v_1 + v_2$. We denote it by $V = V_1 \oplus V_2$.
Remark 2.1. We also use the notation $V = V_1 + V_2$ for the property: any $v \in V$ can be written as $v = v_1 + v_2$ for some $v_i \in V_i$, $i = 1, 2$. Notice that $V = V_1 \oplus V_2$ if and only if $V = V_1 + V_2$ and $V_1 \cap V_2 = \{0\}$.
Lemma 2.1. Suppose $p$ and $q$ are two polynomials over $\mathbb{C}$ that are relatively prime (i.e. have no common roots). Then there exist two other polynomials $a$ and $b$ such that
$$ap + bq = 1.$$
Lemma 2.2. Suppose $p$ and $q$ are two polynomials over $\mathbb{C}$ that are relatively prime (i.e. have no common roots). Let $N_p := \mathrm{Ker}(p(A))$, $N_q := \mathrm{Ker}(q(A))$ and $N_{pq} := \mathrm{Ker}(p(A)q(A))$. Then
$$N_{pq} = N_p \oplus N_q.$$
Proof. From $ap + bq = 1$ we get
$$a(A)p(A) + b(A)q(A) = I.$$
For any $v \in N_{pq}$, applying this operator identity to $v$ gives
$$v = a(A)p(A)v + b(A)q(A)v =: v_2 + v_1.$$
We claim that $v_1 \in N_p$ and $v_2 \in N_q$. Indeed,
$$p(A)v_1 = p(A)b(A)q(A)v = b(A)p(A)q(A)v = 0.$$
A similar argument shows $v_2 \in N_q$. To see that the sum is direct, suppose $v \in N_p \cap N_q$. Then
$$v = a(A)p(A)v + b(A)q(A)v = 0.$$
Hence $N_p \cap N_q = \{0\}$.
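As a small worked instance of Lemmas 2.1 and 2.2, take the hypothetical choice $p(\lambda) = \lambda - 1$ and $q(\lambda) = \lambda - 2$ (these polynomials are picked only for illustration). The constant polynomials $a = 1$ and $b = -1$ then satisfy $ap + bq = 1$, and the splitting of a vector $v \in N_{pq}$ can be written out explicitly:

```latex
% Assumed example: p(\lambda) = \lambda - 1, q(\lambda) = \lambda - 2, a = 1, b = -1.
\begin{align*}
  a p + b q &= (\lambda - 1) - (\lambda - 2) = 1, \\
  v &= \underbrace{(A - I)v}_{v_2} \; + \; \underbrace{-(A - 2I)v}_{v_1}
     \qquad \text{for } v \in \mathrm{Ker}\bigl((A - I)(A - 2I)\bigr), \\
  (A - I)v_1 &= -(A - I)(A - 2I)v = 0 \quad\Longrightarrow\quad v_1 \in N_p, \\
  (A - 2I)v_2 &= (A - 2I)(A - I)v = 0 \quad\Longrightarrow\quad v_2 \in N_q .
\end{align*}
```

In this case $N_p$ and $N_q$ are the eigenspaces for the eigenvalues $1$ and $2$, and the lemma says that $\mathrm{Ker}((A - I)(A - 2I))$ is their direct sum.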
Corollary 2.1. Suppose a polynomial $p$ is factorized as $p = p_1 \cdots p_s$, where $p_1, \dots, p_s$ are pairwise relatively prime (no common roots). Let $N_{p_i} := \mathrm{Ker}\, p_i(A)$. Then
$$N_p = N_{p_1} \oplus \cdots \oplus N_{p_s}.$$
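Theorem 2.5 (Spectral Decomposition). Let $p_m$ be the minimal polynomial of $A$. Suppose $p_m$ can be factorized as
$$p_m(\lambda) = \prod_{i=1}^{s} (\lambda - \lambda_i)^{m_i}$$
with distinct $\lambda_1, \dots, \lambda_s$. Then, by Corollary 2.1 and $p_m(A) = 0$,
$$\mathbb{C}^n = \mathrm{Ker}(A - \lambda_1 I)^{m_1} \oplus \cdots \oplus \mathrm{Ker}(A - \lambda_s I)^{m_s},$$
and each summand is an invariant subspace of $A$.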
Jordan matrix
A matrix $J$ is called a Jordan normal form of a matrix $A$ if we can find a matrix $V$ such that
Here, $\lambda_{k_i}$ are the eigenvalues of $A$, $v_k^j \in \mathbb{C}^n$ are called the generalized eigenvectors of $A$, and the matrices $J_k$ are called the Jordan blocks of size $k$ of $A$. The matrix $V_k = [v_k^1, \dots, v_k^k]$ is an $n \times k$ matrix. We can restrict $A$ to $V_k$, $k = k_1, \dots, k_s$, as
$$A V_k = A[v_k^1, \dots, v_k^k] = [v_k^1, \dots, v_k^k] J_k, \quad k = k_1, \dots, k_s.$$
For each generalized vector,
It is easy to check that
N2k =
Theorem 2.6. Any matrix A over C is similar to a Jordan normal form. The structure of this Jordan normal form is unique.
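Example. Suppose $A$ is a $2 \times 2$ matrix with a double eigenvalue $\lambda$. Let $N_1 = \mathrm{Ker}(A - \lambda I)$ and $N_2 = \mathrm{Ker}(A - \lambda I)^2$. We assume $\dim N_1 = 1$. Then $N_1 \subset N_2 = \mathbb{C}^2$. Choose any $v_2 \in N_2 \setminus N_1$ and define $v_1 = (A - \lambda I)v_2$. Then $v_1 \neq 0$ and $(A - \lambda I)v_1 = (A - \lambda I)^2 v_2 = 0$. Thus, in the basis $[v_1, v_2]$, the matrix $A$ is transformed to the Jordan block $J_2(\lambda)$.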
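The construction in this example can be carried out numerically. The matrix below is an assumed instance with double eigenvalue $\lambda = 2$ and a one-dimensional eigenspace; the vector $v_2 = e_1$ is one admissible choice outside $N_1$.

```python
import numpy as np

# Assumed example: trace 4, determinant 4, so lambda = 2 is a double eigenvalue.
A = np.array([[3.0, 1.0],
              [-1.0, 1.0]])
lam = 2.0
B = A - lam * np.eye(2)          # A - lambda I; here B @ B = 0, so N2 is the whole space

v2 = np.array([1.0, 0.0])        # B @ v2 != 0, hence v2 lies in N2 but not in N1
v1 = B @ v2                      # then B @ v1 = B^2 v2 = 0: v1 is an eigenvector

V = np.column_stack([v1, v2])    # change of basis [v1, v2]
J = np.linalg.solve(V, A @ V)    # J = V^{-1} A V
```

The result `J` is the Jordan block with $\lambda = 2$ on the diagonal and $1$ above it.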
Orthogonality, Self-adjoint operators
There are some other decompositions, which arise mainly when the underlying space $\mathbb{R}^n$ or $\mathbb{C}^n$ is endowed with an inner product structure.
Below V and W are vector spaces.
1. Orthogonal projection: given a subspace $W \subset V$, there is an orthogonal projection $P : V \to W$ such that (i) $Pw = w$ for all $w \in W$, (ii) $(I - P)v \perp W$ for all $v \in V$.
2. For any subspace $W \subset V$, there is a subspace $W^\perp$ such that (i) $V = W \oplus W^\perp$, (ii) $W \cap W^\perp = \{0\}$, (iii) $W \perp W^\perp$.
3. Self-adjoint operator: for $A = (a_{ij})$ we define $A^* = (\bar{a}_{ji})$. A matrix $A$ is called self-adjoint if $A^* = A$.
4. Alternatively, $A^*$ is defined by
$$\langle v, A^* w \rangle = \langle Av, w \rangle \quad \text{for all } v, w,$$
and $A$ is self-adjoint if $\langle Av, w \rangle = \langle v, Aw \rangle$.
5. A matrix $U$ is unitary if $U^*U = UU^* = I$. Equivalently, $U = [u_1, \dots, u_n]$ with $\{u_i\}_{i=1}^n$ orthonormal.
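The two defining properties of the orthogonal projection in item 1 can be checked numerically. The sketch below assumes the real case $V = \mathbb{R}^5$ with the standard inner product and builds $P = QQ^T$ from an orthonormal basis $Q$ of $W$ obtained by a QR factorization; the subspace and test vectors are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W_basis = rng.standard_normal((5, 2))   # columns span a 2-dimensional subspace W of R^5

Q, _ = np.linalg.qr(W_basis)            # reduced QR: columns of Q are an orthonormal basis of W
P = Q @ Q.T                             # orthogonal projection onto W

v = rng.standard_normal(5)              # an arbitrary vector of V
w = W_basis @ np.array([1.5, -0.5])     # an arbitrary element of W
residual = v - P @ v                    # (I - P) v
```

Property (i) corresponds to `P @ w` reproducing `w`, and property (ii) to `residual` being orthogonal to every column of `W_basis`.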
Theorem 2.7. If $A$ is self-adjoint, then $A$ is diagonalizable by a unitary matrix $U$ and all eigenvalues are real.
Proof. 1. Suppose $\mu$ is an eigenvalue of $A$. By the spectral decomposition theorem, we can find the maximal invariant subspace $W$ corresponding to $\mu I - A$. Let $J = \mu I - A$. We claim that $J = 0$ on $W$.
2. Since $A$ is self-adjoint and $\mu$ is real (see step 4), $J$ is also self-adjoint.
3. Suppose the minimal polynomial of $J$ on $W$ is $p_m(\lambda) = \lambda^m$. If $m > 1$, then there exist independent vectors $v_1$ and $v_2$ such that
$$J v_1 = 0, \quad J v_2 = v_1.$$
Then we have
$$\langle v_1, v_1 \rangle = \langle J v_2, v_1 \rangle = \langle v_2, J v_1 \rangle = 0,$$
which contradicts $v_1 \neq 0$. Hence $m = 1$, which means $J = 0$ on $W$.
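4. The eigenvalues are real. Suppose $\lambda$, $v$ are an eigenvalue/eigenvector pair. Then
$$\lambda \langle v, v \rangle = \langle \lambda v, v \rangle = \langle Av, v \rangle = \langle v, Av \rangle = \langle v, \lambda v \rangle = \bar{\lambda} \langle v, v \rangle.$$
Since $\langle v, v \rangle > 0$, we get $\lambda = \bar{\lambda}$.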
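5. The eigenspaces corresponding to two distinct eigenvalues $\lambda \neq \mu$ are orthogonal to each other. Suppose
$$Av = \lambda v, \quad Aw = \mu w, \quad \lambda \neq \mu.$$
Then, using that $\mu$ is real,
$$\lambda \langle v, w \rangle = \langle Av, w \rangle = \langle v, Aw \rangle = \mu \langle v, w \rangle.$$
Hence we get $\langle v, w \rangle = 0$.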
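The Rayleigh quotient method is a constructive method for finding the eigenvalues of a self-adjoint operator. The largest eigenvalue is
$$\lambda_1 = \max_{v \neq 0} \frac{\langle Av, v \rangle}{\langle v, v \rangle}.$$
Let $V_1$ be the corresponding eigenspace. Then
$$\lambda_2 = \max_{v \perp V_1,\ v \neq 0} \frac{\langle Av, v \rangle}{\langle v, v \rangle}.$$
This process can be continued inductively to find all eigenvalues and eigenvectors.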
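The variational characterization can be checked numerically. The sketch below does not implement the maximization itself: it takes the eigenpairs from `np.linalg.eigh` as a reference and verifies on random samples that the Rayleigh quotient never exceeds $\lambda_1$, and never exceeds $\lambda_2$ once the samples are made orthogonal to the first eigenvector. The matrix and the helper name `rayleigh_quotient` are assumptions for illustration.

```python
import numpy as np


def rayleigh_quotient(A, v):
    """<Av, v> / <v, v> for a real symmetric A and a nonzero real vector v."""
    return (v @ A @ v) / (v @ v)


# Assumed symmetric example with distinct eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
lam1, u1 = eigvals[-1], eigvecs[:, -1]    # largest eigenvalue and its unit eigenvector
lam2 = eigvals[-2]                        # second largest eigenvalue

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 3))
best_all = max(rayleigh_quotient(A, v) for v in samples)

# Remove the u1-component so that the samples lie in the orthogonal complement of V1.
perp = samples - np.outer(samples @ u1, u1)
best_perp = max(rayleigh_quotient(A, v) for v in perp)
```

Here `best_all` stays below `lam1` and `best_perp` stays below `lam2`, while the quotient evaluated at `u1` equals `lam1`.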
Singular Value Decomposition
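Theorem 2.8. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ (or $\mathbb{C}^n \to \mathbb{C}^m$). Then there exist orthonormal bases $V = [v_1, \dots, v_n]$ of $\mathbb{R}^n$ and $U = [u_1, \dots, u_m]$ of $\mathbb{R}^m$, and positive numbers
$$\sigma_1 \ge \cdots \ge \sigma_p > 0, \quad p \le \min(m, n),$$
such that
$$A v_i = \sigma_i u_i, \quad i = 1, \dots, p, \qquad A v_i = 0 \ \text{ for } p < i \le n.$$
Or, in matrix form,
$$AV = U\Sigma,$$
where $V$ is an $n \times n$ unitary matrix, $U$ is an $m \times m$ unitary matrix, and $\Sigma$ is an $m \times n$ diagonal matrix (with zero diagonal entries beyond the $p$-th when $p < \min(m, n)$):
$$\Sigma = \begin{cases} (\mathrm{diag}(\sigma_1, \dots, \sigma_p), 0) & \text{if } m \le n, \\ (\mathrm{diag}(\sigma_1, \dots, \sigma_p), 0)^T & \text{if } m > n. \end{cases}$$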
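The statement of the theorem can be verified on a small random matrix with NumPy's `np.linalg.svd`, which returns $U$, the singular values, and $V^*$. The matrix size, the random seed, and the rank tolerance `1e-12` below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))

# Full SVD: U is m x m, Vh = V^* is n x n, s holds min(m, n) singular values.
U, s, Vh = np.linalg.svd(A)
V = Vh.conj().T

p = int(np.sum(s > 1e-12))            # number of positive singular values (numerical rank)

# Assemble the m x n "diagonal" matrix Sigma.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# Column-wise form of A V = U Sigma for the first p columns.
lhs = A @ V[:, :p]
rhs = U[:, :p] * s[:p]                # scales the i-th column u_i by sigma_i
```

The columns $v_{p+1}, \dots, v_n$ of `V` are mapped to zero by `A`, in line with $A v_i = 0$ for $p < i \le n$.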
Proof. 1. The matrix $A^*A$ is self-adjoint, so all its eigenvalues are real. They are also non-negative: if $\lambda$, $v$ are an eigenvalue/eigenvector pair, then from the Rayleigh quotient
$$\lambda \langle v, v \rangle = \langle A^*A v, v \rangle = \langle Av, Av \rangle \ge 0.$$
2. From the spectral decomposition of the self-adjoint matrix $A^*A$, we can find a unitary matrix $V = [v_1, \dots, v_n]$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p, 0, \dots, 0)$ such that $A^*A V = V\Lambda$. Here $\lambda_1 \ge \cdots \ge \lambda_p > 0$, and the remaining eigenvalues are 0. The corresponding eigenspace, spanned by $v_{p+1}, \dots, v_n$, is the kernel $N(A^*A)$.
1. $A^*$ has the following representation:
$$A^* u_i = \frac{1}{\sigma_i} A^*(A v_i) = \sigma_i v_i.$$
2. The domain and range of $A$ can be decomposed into
$$\mathbb{R}^n = [v_1, \dots, v_p] \oplus [v_{p+1}, \dots, v_n] = [v_1, \dots, v_p] \oplus N(A),$$
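4. The least-squares solution of $Ax = b$ is the minimizer of $\frac{1}{2}\|Ax - b\|_2^2$. With the singular value decomposition, we can represent
$$b = \sum_{i=1}^{p} \langle b, u_i \rangle u_i + b^\perp, \qquad b^\perp \perp u_1, \dots, u_p.$$
The least-squares solution is
$$x^* = \sum_{i=1}^{p} \frac{\langle b, u_i \rangle}{\sigma_i} v_i,$$
which minimizes $\|Ax - b\|_2$ with minimal value
$$\|Ax^* - b\|_2 = \|b^\perp\|_2.$$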
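The least-squares solution can be computed from the SVD and compared against a library solver. The sketch below assumes the standard formula $x^* = \sum_{i=1}^{p} (u_i^T b / \sigma_i)\, v_i$ in the real case, with $b^\perp$ the part of $b$ orthogonal to $u_1, \dots, u_p$; the data are random and the relative rank tolerance is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))           # overdetermined system: 6 equations, 3 unknowns
b = rng.standard_normal(6)

U, s, Vh = np.linalg.svd(A, full_matrices=False)
p = int(np.sum(s > 1e-12 * s[0]))         # numerical rank

coeffs = U[:, :p].T @ b                   # components of b along u_1, ..., u_p
x_star = Vh[:p].T @ (coeffs / s[:p])      # sum over i of (u_i^T b / sigma_i) v_i
b_perp = b - U[:, :p] @ coeffs            # part of b orthogonal to the range of A

# Reference solution from NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
residual_norm = np.linalg.norm(A @ x_star - b)
```

The residual norm `residual_norm` agrees with the norm of `b_perp`, which is the minimal value stated above.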
Norm in vector space
In analysis, we need to measure how close two vectors are in order to define the concept of convergence. A natural way to do this is to define a norm for vectors.
Definition 2.6. Let $V$ be a vector space. A mapping $\|\cdot\| : V \to \mathbb{R}$ is called a norm if
(i) $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$;
(ii) $\|\lambda x\| = |\lambda| \|x\|$ for any $\lambda \in \mathbb{R}$ and any $x \in V$;
(iii) $\|x + y\| \le \|x\| + \|y\|$.
A vector space endowed with a norm $\|\cdot\|$ is called a normed vector space.
In $\mathbb{R}^n$, we define the norms