Nonnegative Rank Factorization
A Heuristic Approach via Rank Reduction (Work in Progress)
Moody T. Chu
(Joint work with Bo Dong and Matthew Lin) North Carolina State University
March 8, 2012 @ National Cheng Kung University
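A new song, a cup of wine; last year's weather, the same old pavilion. The sun sets in the west: when will it return?
Helpless, I watch the flowers fall; swallows that seem familiar come back. Alone I pace the fragrant garden path.
--- Yan Shu (Song dynasty), "Huan Xi Sha" (浣溪沙)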
Take Home Question
Given a nonnegative matrix A, write
$$A = \sum_{i=1}^{k} u_i k_i^\top,$$
where $u_i, k_i \ge 0$ and k is minimal.
Outline
Background
• Nonnegative Rank
• Nonnegative Rank Factorization
Generic Phenomenon
• Geometry of NRF
• Probability Issues
• Perturbation Theory
Wedderburn Formula
• Subtractivity
• Rank Reduction Algorithm
• Maximin Problem
Other Applications
• Completely Positive Matrices
• Maximal Nonnegative Rank Splitting
Conclusion
Nonnegative Rank
I Given $A \in \mathbb{R}_+^{m \times n}$, write $A = UV$.
• $U \in \mathbb{R}_+^{m \times k}$ and $V \in \mathbb{R}_+^{k \times n}$ with $k \le \min\{m, n\}$.
• Always possible.
I Interested in the smallest k rendering this factorization.
• Denoted by $\operatorname{rank}_+(A)$.
I A trivial fact:
$$\operatorname{rank}(A) \le \operatorname{rank}_+(A) \le \min\{m, n\}.$$
I A challenge:
• Determining the exact nonnegative rank is NP-hard.
This Is Not ...
I A = UV is a special nonnegative factorization of A.
I Should be unequivocally distinguished from what is known as the nonnegative matrix factorization (NMF).
• Misnamed!
• Is only a low rank approximation from
$$\min_{U \in \mathbb{R}_+^{m \times p},\; V \in \mathbb{R}_+^{p \times n}} \|A - UV\|_F.$$
• $p < \min\{m, n\}$ is preassigned.
• Many numerical techniques.
• Cannot guarantee the required equality in a complete factorization even with p = rank+(A).
NMF Fails!
I Almost all NMF techniques adapt conventional mathematical programming schemes.
I The objective function in NMF is non-convex.
I The factors U and V retrieved by NMF techniques are typically local minimizers only.
I Cannot ensure equality to A.
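To make the point concrete, here is a minimal sketch of one widely used NMF scheme, the multiplicative updates of Lee and Seung. The slides do not name a specific method, so this choice, the function name `nmf_multiplicative`, the iteration count, and the safeguard `eps` are all assumptions made for illustration. The iteration keeps U and V nonnegative and reports the residual $\|A - UV\|_F$, which it only tries to decrease; nothing in it enforces exact equality.

```python
import numpy as np

def nmf_multiplicative(A, p, n_iter=500, eps=1e-12, seed=0):
    """Lee-Seung multiplicative updates for min ||A - U V||_F with U, V >= 0.

    Illustrative sketch only: the iteration decreases the residual but is
    not designed to reach an exact factorization.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, p)) + eps   # positive random start
    V = rng.random((p, n)) + eps
    for _ in range(n_iter):
        # Entrywise updates; eps guards against division by zero.
        V *= (U.T @ A) / (U.T @ U @ V + eps)
        U *= (A @ V.T) / (U @ V @ V.T + eps)
    return U, V, np.linalg.norm(A - U @ V, "fro")

# The 4 x 4 matrix C used later in the slides (rank 3, nonnegative rank 4).
C = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)
U, V, residual = nmf_multiplicative(C, p=3)
print("Frobenius residual with p = 3:", residual)
```

Because the objective is non-convex, a different `seed` may lead to a different local minimizer and a different residual.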
Nonnegative Rank Factorization
$$\mathcal{R}(m, n) := \{A \in \mathbb{R}_+^{m \times n} \mid \operatorname{rank}(A) = \operatorname{rank}_+(A)\}.$$
I Interested in
• Identifying if A ∈ R(m, n).
• Procuring a nonnegative factorization for A ∈ R(m, n).
I Such a factorization is called a nonnegative rank factorization (NRF) of A.
An Example
I Not every nonnegative matrix has an NRF.
I The simplest non-NRF matrix:
$$C = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}.$$
• rank(C ) = 3.
• rank+(C ) = 4.
I There is a necessary and sufficient condition qualifying an NRF matrix (Thomas’1974).
• Not suitable for computation.
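The ordinary rank of C is easy to confirm numerically. The sketch below checks only rank(C) = 3 and exhibits the linear dependency among the rows; it does not verify the claim that the nonnegative rank is 4, which needs a separate argument.

```python
import numpy as np

C = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)

# Ordinary rank via the SVD-based routine in NumPy.
rank_C = np.linalg.matrix_rank(C)

# One explicit dependency: row1 - row2 - row3 + row4 = 0.
dependency = C[0] - C[1] - C[2] + C[3]

print("rank(C) =", rank_C)
print("row1 - row2 - row3 + row4 =", dependency)
```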
Probability Simplex
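I Given $A \in \mathbb{R}_+^{m \times n}$ with columns $a_1, \dots, a_n$, define
$$\sigma(A) := \operatorname{diag}\{\|a_1\|_1, \dots, \|a_n\|_1\}, \qquad \vartheta(A) := A\,\sigma(A)^{-1}.$$
I Columns of $\vartheta(A)$ are points on the probability simplex $D_m$ in $\mathbb{R}_+^m$,
$$D_m := \{a \in \mathbb{R}_+^m \mid \mathbf{1}_m^\top a = 1\}.$$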
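The two maps can be sketched in a few lines of NumPy. The function names `sigma` and `theta` mirror the symbols on the slide; the check for zero columns is an added assumption, since $\sigma(A)^{-1}$ is undefined when a column of A vanishes.

```python
import numpy as np

def sigma(A):
    """Diagonal matrix of the 1-norms of the columns of A."""
    return np.diag(np.abs(A).sum(axis=0))

def theta(A):
    """Scale every column of A to unit 1-norm (the pullback to the simplex)."""
    col_norms = np.abs(A).sum(axis=0)
    if np.any(col_norms == 0):
        raise ValueError("theta(A) is undefined when A has a zero column")
    return A / col_norms   # broadcasting divides column j by col_norms[j]

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 2.0, 5.0]])
T = theta(A)
print(T)                  # each column is nonnegative and sums to 1
print(T @ sigma(A))       # recovers A
```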
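Convex Hull of $\vartheta(A) \in \mathbb{R}_+^{3 \times 11}$
[Figure: the probability simplex in $\mathbb{R}^3$ (unit intercept on each axis) with the convex hull of the columns of $\vartheta(A)$ drawn inside it.]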
Minimal Convex Polytope
I $A = UV = (UD^{-1})(DV)$ with any invertible diagonal matrix D.
• May assume $\sigma(U) = I_k$.
I Write
A = ϑ(A)σ(A) = UV = ϑ(U)ϑ(V )σ(V ).
• It must be such that
ϑ(A) = ϑ(U)ϑ(V ), σ(A) = σ(V ).
• Columns of ϑ(A) are in the convex hull of ϑ(U).
Lemma
The nonnegative rank $\operatorname{rank}_+(A)$ is the minimal number of vertices on $D_m$ such that the resulting convex polytope encloses all columns of the pullback $\vartheta(A)$.
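The identities behind the lemma can be checked numerically. In the sketch below the sizes, the random factors, and the variable names (`U1`, `V1`) are illustrative assumptions. After rescaling so that the columns of U sum to one, the columns of $\vartheta(A)$ are convex combinations of the columns of U, with weights given by the columns of $\vartheta(V)$.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.random((4, 3))          # nonnegative factors with k = 3
V = rng.random((3, 6))
A = U @ V

# Rescale with D = sigma(U): A = (U D^{-1})(D V), so that sigma(U1) = I.
d = U.sum(axis=0)
U1 = U / d                      # columns of U1 lie on the simplex
V1 = d[:, None] * V

theta_A = A / A.sum(axis=0)     # pullback of A
theta_V1 = V1 / V1.sum(axis=0)  # convex weights, one column per column of A

print(np.allclose(theta_A, U1 @ theta_V1))        # theta(A) = theta(U) theta(V)
print(np.allclose(A.sum(axis=0), V1.sum(axis=0))) # sigma(A) = sigma(V)
```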
Visualizing $\vartheta(C)$
[Figure: axes x, y, z; the unit tetrahedron S; the region D; the points $A_1$, $A_2$, $A_3$, $A_4$.]
I Suffices to represent the probability simplex $D_4$ by the unit tetrahedron S in the first octant of $\mathbb{R}^3$.
I Columns of $\vartheta(C)$ can be interpreted as points $A_1, A_2, A_3, A_4$.
• Coplanar because $\operatorname{rank}(\vartheta(C)) = 3$.
• Four “edges” sitting on separate facets of the tetrahedron.
• The minimum number of vertices for a convex set in the unit tetrahedron to cover D is four.
How Often Can This Happen?
Question
(R2R+): Given an arbitrary nonnegative 4 by 4 matrix of rank 3, what is the probability that its nonnegative rank is 3?
I No easy answer!
Question
Sylvester’s Four-Point Problem: What is the probability that four random, independent, and uniform points from a compact set are such that none of them lies in the triangle formed by the other three?
I “This problem does not admit of a determinate solution!” (Sylvester’1865).
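The four-point probability can be estimated by simulation once a region is fixed. The sketch below assumes the unit square, which is a choice made here for illustration and not part of the slides; for that region the classical value is 25/36, and other regions give other values, which is the sense in which the problem has no single answer. The function names are hypothetical.

```python
import numpy as np

def cross2(o, a, b):
    """z-component of (a - o) x (b - o), vectorized over trials."""
    return ((a[:, 0] - o[:, 0]) * (b[:, 1] - o[:, 1])
            - (a[:, 1] - o[:, 1]) * (b[:, 0] - o[:, 0]))

def in_triangle(p, a, b, c):
    """True where p lies in triangle abc (the three signs do not disagree)."""
    d1, d2, d3 = cross2(a, b, p), cross2(b, c, p), cross2(c, a, p)
    has_neg = (d1 < 0) | (d2 < 0) | (d3 < 0)
    has_pos = (d1 > 0) | (d2 > 0) | (d3 > 0)
    return ~(has_neg & has_pos)

def convex_position_probability(n_trials=20000, seed=0):
    """Monte Carlo estimate for four uniform points in the unit square."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n_trials, 4, 2))
    some_point_inside = np.zeros(n_trials, dtype=bool)
    for i in range(4):
        others = [j for j in range(4) if j != i]
        some_point_inside |= in_triangle(pts[:, i], *(pts[:, j] for j in others))
    return 1.0 - some_point_inside.mean()

estimate = convex_position_probability()
print("estimated probability:", estimate, " classical value:", 25 / 36)
```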
Flipped Side Question
Question
(R+2R): Given an arbitrary nonnegative 4 by 4 matrix of nonnegative rank 3, what is the probability that its rank is 3?
Theorem
Given $k < \min\{m, n\}$, let $\mathcal{R}_+(k)$ denote the manifold of nonnegative matrices in $\mathbb{R}_+^{m \times n}$ with nonnegative rank k. Then the conditional probability of $\operatorname{rank}(A) = k$, given $A \in \mathcal{R}_+(k)$, is one.
I Matrices which have an NRF are generic.
Almost Surely?
I If $A = UV$ with randomly generated $U \in \mathbb{R}_+^{m \times k}$ and $V \in \mathbb{R}_+^{k \times n}$,
• Almost surely we have rank(A) = k .
• The converse is not true.
I Those matrices whose rank is not equal to their nonnegative rank form a set of measure zero.
• This does not necessarily mean that the set is “unobservable”, nor that nontrivial “exceptions” are difficult to come by.
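A quick numerical experiment illustrates the generic case. The sizes and the seed below are arbitrary assumptions. Since $\operatorname{rank}(A) \le \operatorname{rank}_+(A) \le k$ for any product of nonnegative factors with inner dimension k, observing rank(A) = k means the factorization at hand is an NRF.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 6, 5, 3

# Random nonnegative factors with inner dimension k.
U = rng.random((m, k))
V = rng.random((k, n))
A = U @ V

rank_A = np.linalg.matrix_rank(A)
print("rank(A) =", rank_A, "with k =", k)
```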
Euclidean Distance Matrices
I Given n distinct points $q_1, \dots, q_n$ in $\mathbb{R}^r$, define
$$E(q_1, \dots, q_n) := \left[\|q_i - q_j\|_2^2\right]_{i,j=1}^{n}.$$
• Of rank r + 2, regardless of the number n of points.
• If r = 1, then generically of nonnegative rank n (Chu’2010).
• Certainly do not exhibit the generic phenomenon described above.
• Form a large and characterizable set, but are too specific to have a nonzero measure.
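The rank statement is easy to observe numerically. The sketch below builds the squared-distance matrix for random points; the helper name `edm`, the number of points, and the explicit rank tolerance are assumptions made for illustration. It checks only the ordinary rank, not the nonnegative rank.

```python
import numpy as np

def edm(points):
    """Squared Euclidean distance matrix of the rows of `points`."""
    Q = np.asarray(points, dtype=float)
    diff = Q[:, None, :] - Q[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(3)
n = 8
E1 = edm(rng.random((n, 1)))   # points on a line, r = 1
E2 = edm(rng.random((n, 2)))   # points in the plane, r = 2

# An explicit tolerance separates rounding noise from genuine singular values.
rank_E1 = np.linalg.matrix_rank(E1, tol=1e-10)
rank_E2 = np.linalg.matrix_rank(E2, tol=1e-10)
print("r = 1: rank =", rank_E1, "  r = 2: rank =", rank_E2)
```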
Perturbation Theory
I Very little perturbation analysis for NRF in general has been studied in the literature.
I Some open questions:
1. Given a nonnegative matrix A which has an NRF, under what condition will the perturbed nonnegative matrix A + E still have an NRF?
2. Given a nonnegative matrix A which has an NRF, let U and V be the nonnegative factors found by our (or any) numerical algorithm so that UV is a numerical NRF of A. Is UV the exact NRF of some perturbed nonnegative matrix A + E ?
Local Rank Condition
Theorem
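Let A be an m × n nonnegative matrix with $\operatorname{rank}_+(A) = k$. Then
1. There exists a ball $B(A; \epsilon)$ such that $\operatorname{rank}_+(N) \ge k$ for all $N \in B(A; \epsilon)$.
2. For any $\epsilon > 0$, there exists $N \in B(A; \epsilon)$ such that $\operatorname{rank}_+(N) = \operatorname{rank}_+(A)$ and $N \ne \lambda A$ for any λ.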
Chipping Away a Nonnegative Portion
I B ≥ 0 is a nonnegative component (NC) of A ≥ 0 iff A − B ≥ 0.
• Compute the “maximum" rank-one NC of A (Levin’1985).
• The residual after an NC subtraction might have higher rank.
• Might end up with an infinite series of NC matrices.
I B ≥ 0 is a nonnegative element (NE) of A ≥ 0 iff B is a rank-one NC and rank(A − B) = rank(A) − 1.
I The matrix C has many NCs, but has no NE at all.
I Need to gradually distribute A over a sequence of NEs.
A Major Difficulty
I A has an NRF $\implies A = \sum_{i=1}^{k} u_i k_i^\top$.
• Each $u_i k_i^\top$ is an NE.
I What if we miss the particular sequence of NEs (not known to begin with)?
• Any bad choices of NEs in the intermediate stages could cause the rank reduction process to break down.
I May get stuck at a matrix that has no NE at all.
I A major challenge:
• Cannot foresee which NE would be a “good” NE to continue the rank reduction.
• Finding the right NE’s is precisely why the NRF problem is so challenging.
• A weakness of our approach.
I Still might be the first sensible way to crack the nut!
Wedderburn Rank Reduction Formula
Theorem
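Let $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$. Then the matrix $B := A - \sigma^{-1} u v^\top$ satisfies the rank subtractivity $\operatorname{rank}(B) = \operatorname{rank}(A) - 1$ if and only if there are vectors $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ such that
$$u = Ax, \qquad v = A^\top y, \qquad \sigma = y^\top A x.$$
Theorem
Suppose $U \in \mathbb{R}^{m \times k}$, $R \in \mathbb{R}^{k \times k}$, and $V \in \mathbb{R}^{n \times k}$. Then $\operatorname{rank}(A - U R^{-1} V^\top) = \operatorname{rank}(A) - \operatorname{rank}(U R^{-1} V^\top)$ if and only if there exist $X \in \mathbb{R}^{n \times k}$ and $Y \in \mathbb{R}^{m \times k}$ such that
$$U = AX, \qquad V = A^\top Y, \qquad R = Y^\top A X.$$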
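A single rank-one Wedderburn step can be sketched as follows. The function name `wedderburn_step`, the random test data, and the tolerances are assumptions made for illustration; the check simply confirms that the rank drops by one when $y^\top A x \ne 0$.

```python
import numpy as np

def wedderburn_step(A, x, y):
    """Return B = A - (A x)(y^T A) / (y^T A x)."""
    u = A @ x            # u = A x
    v = A.T @ y          # v = A^T y
    s = y @ A @ x        # sigma = y^T A x
    if abs(s) < 1e-14:
        raise ValueError("y^T A x must be nonzero")
    return A - np.outer(u, v) / s

rng = np.random.default_rng(2)
A = rng.random((4, 5))
x = rng.random(5)
y = rng.random(4)
B = wedderburn_step(A, x, y)

rank_A = np.linalg.matrix_rank(A, tol=1e-10)
rank_B = np.linalg.matrix_rank(B, tol=1e-10)
print("rank(A) =", rank_A, " rank(B) =", rank_B)
```

Note that x ends up in the null space of B, which is one way to see why the rank must drop.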
Applications
I Provide a mechanism to break down a matrix into a sum of rank-one matrices.
• Define a sequence $\{A_k\}$ of matrices by
$$A_{k+1} := A_k - (y_k^\top A_k x_k)^{-1} A_k x_k y_k^\top A_k.$$
• Properly chosen vectors $x_k \in \mathbb{R}^n$ and $y_k \in \mathbb{R}^m$ satisfying $y_k^\top A_k x_k \ne 0$.
• The process can be continued so long as $A_k \ne 0$.
• The sequence {Ak} must be finite.
I Almost all classical matrix decompositions can be found in this way (Chu, Funderlic, Golub’1995).
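The recursion can be sketched with one simple choice of vectors. Taking $x_k$ and $y_k$ to be the coordinate vectors that pick out the largest entry of $A_k$ is an assumption made here for illustration (it resembles elimination with complete pivoting); the slides leave the choice open. The helper name `wedderburn_decompose` and the tolerance are likewise hypothetical.

```python
import numpy as np

def wedderburn_decompose(A, tol=1e-10):
    """Split A into rank-one terms by repeated Wedderburn reduction.

    x_k = e_j and y_k = e_i select the largest-magnitude entry of A_k,
    so that y_k^T A_k x_k = A_k[i, j] is nonzero while A_k is nonzero.
    """
    Ak = np.array(A, dtype=float)
    scale = max(np.abs(Ak).max(), 1.0)
    terms = []
    for _ in range(min(Ak.shape)):          # at most min(m, n) reductions
        if np.abs(Ak).max() <= tol * scale:
            break                           # A_k is numerically zero
        i, j = np.unravel_index(np.argmax(np.abs(Ak)), Ak.shape)
        term = np.outer(Ak[:, j], Ak[i, :]) / Ak[i, j]
        terms.append(term)
        Ak = Ak - term
    return terms

rng = np.random.default_rng(4)
A = rng.random((5, 2)) @ rng.random((2, 4))   # a rank-2 matrix
terms = wedderburn_decompose(A)
print("number of rank-one terms:", len(terms))
print("reconstruction error:", np.abs(sum(terms) - A).max())
```

The terms produced this way are generally not nonnegative, which is exactly the extra requirement the NRF setting adds.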
Wedderburn for NRF
I For NRF,
• Break down a nonnegative matrix by taking away one NE at a time.
• An NE must assume the Wedderburn form $(y_k^\top A_k x_k)^{-1} A_k x_k y_k^\top A_k$.
• $A_{k+1}$ must be nonnegative after the subtraction. The most difficult part.
I Two probable causes for premature termination.
• A is not a matrix in $\mathcal{R}(m, n)$ to begin with. A welcome conclusion. Notion of maximal nonnegative rank splitting of A.
• Bad starting points have branched $A_k$ into a “dead end”, i.e., $A_k$ has no NE. A restart might remedy the problem. Only a heuristic way to find the (approximate) NRF. Need more analysis to conclude whether A has an NRF or not.
Maximin Problem
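$$\max_{x_k \in \mathbb{R}^n,\; y_k \in \mathbb{R}^m} \; \min\left(A_k - A_k x_k y_k^\top A_k\right)$$
subject to $A_k x_k \ge 0$, $y_k^\top A_k \ge 0$, $y_k^\top A_k x_k = 1$, where the inner min is taken over the entries of the matrix.
I $A_k - A_k x_k y_k^\top A_k \le A_k$.
• A maximizer of $\min(A_k - A_k x_k y_k^\top A_k)$ always exists.
I Nonnegative objective value $\implies A_{k+1} \ge 0$.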
• A feasible NE is found.
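The sketch below only evaluates the maximin objective for a given pair of vectors; it is not a solver, and the slides do not say which optimizer is used. The function name `ne_objective`, the rescaling of x to enforce the normalization, and the tiny 2 × 2 example are assumptions made for illustration.

```python
import numpy as np

def ne_objective(A, x, y, tol=1e-12):
    """Evaluate min(A - A x y^T A) after scaling x so that y^T A x = 1.

    Returns (objective value, feasibility flag, residual matrix), or None
    when y^T A x is not positive and the normalization cannot be met.
    """
    s = y @ A @ x
    if s <= tol:
        return None
    x = x / s                      # now y^T A x = 1
    u = A @ x                      # must be >= 0
    v = y @ A                      # must be >= 0
    feasible = bool(np.all(u >= -tol) and np.all(v >= -tol))
    residual = A - np.outer(u, v)  # candidate for the next iterate
    return residual.min(), feasible, residual

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
value, feasible, A_next = ne_objective(A, x, y)
print("objective:", value, " feasible:", feasible)
print(A_next)
```

Here the objective is nonnegative, so the subtracted rank-one matrix is an NE and the residual is again nonnegative, with rank reduced by one.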
A Pathological Example
I Consider $A = [C, c]$, the matrix C with an extra column c appended.
• $c \ge 0$ is random.
• $\operatorname{rank}_+(A) = \operatorname{rank}(A) = 4$.
• Splitting A by its rows is automatically an NRF.
I The matrix $\Delta := [0_4, c]$ is an NE.
• Leaves behind a nonnegative matrix $A - \Delta = [C, 0]$ of rank 3.
• $[C, 0]$ does not have any NRF.
• Iteration would get stuck.
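I Restart $\implies$
$$\begin{bmatrix} 1.0000 & 1.0000 & 0 & 0 & 0.0781 \\ 1.0000 & 0 & 1.0000 & 0 & 0.6690 \\ 0 & 1.0000 & 0 & 1.0000 & 0.5002 \\ 0 & 0 & 1.0000 & 1.0000 & 0.2180 \end{bmatrix} = \begin{bmatrix} 0.0000 & 0.0000 & 0.8624 & 0 \\ 0.0000 & 0.3862 & 0 & 0 \\ 3.4544 & 0 & 0 & 0 \\ 0.0000 & 0.0000 & 0.0000 & 1.7018 \end{bmatrix} \begin{bmatrix} 0 & 0.2895 & 0.0000 & 0.2895 & 0.1448 \\ 2.5895 & 0.0000 & 2.5895 & 0.0000 & 1.7325 \\ 1.1596 & 1.1596 & 0.0000 & 0.0000 & 0.0905 \\ 0 & 0 & 0.5876 & 0.5876 & 0.1281 \end{bmatrix}.$$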
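The numbers on this slide can be checked directly. The sketch below reads A as C with the column c appended, as in the printed 4 × 5 matrix, and uses the four-decimal values exactly as shown; the loose tolerance in the last comparison reflects that rounding and is an assumption of this check.

```python
import numpy as np

C = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)
c = np.array([0.0781, 0.6690, 0.5002, 0.2180])   # fifth column on the slide

A = np.column_stack([C, c])
Delta = np.column_stack([np.zeros((4, 4)), c])   # the NE that leads to a dead end
residual = A - Delta                             # equals [C, 0]

# Factors printed on the slide after the restart.
U = np.array([[0, 0, 0.8624, 0],
              [0, 0.3862, 0, 0],
              [3.4544, 0, 0, 0],
              [0, 0, 0, 1.7018]])
V = np.array([[0, 0.2895, 0, 0.2895, 0.1448],
              [2.5895, 0, 2.5895, 0, 1.7325],
              [1.1596, 1.1596, 0, 0, 0.0905],
              [0, 0, 0.5876, 0.5876, 0.1281]])

print("rank(A) =", np.linalg.matrix_rank(A))
print("rank(A - Delta) =", np.linalg.matrix_rank(residual))
print("max |UV - A| =", np.abs(U @ V - A).max())
```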
Multiple Rank Reduction
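$$\max_{X_k \in \mathbb{R}^{n \times r},\; Y_k \in \mathbb{R}^{m \times r}} \; \min\left(A_k - A_k X_k Y_k^\top A_k\right)$$
subject to $A_k X_k \ge 0$, $Y_k^\top A_k \ge 0$, $Y_k^\top A_k X_k = I_{r \times r}$.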
Theorem
If a nonnegative matrix has a nonnegative rank-r reduction, then it must have r nonnegative rank-one reductions.
Completely Positive Matrices
I $A \in \mathbb{R}_+^{n \times n}$ is completely positive (CP) iff $A = BB^\top$.
• $B \ge 0$ is not necessarily square.
• Interested in the smallest number of columns of B.
I Denoted by $\operatorname{rank}_{cp}(A)$.
I Two questions (Berman’2003):
• Determine whether a given nonnegative semi-definite matrix is CP.
• Determine the cp-rank and compute its CP factorization.
I More stringent than NRF.
Symmetric Wedderburn Formula
$$\max_{x_k \in \mathbb{R}^n} \; \min\left(A_k - A_k x_k x_k^\top A_k\right)$$
subject to $A_k x_k \ge 0$, $x_k^\top A_k x_k = 1$.
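A symmetric step can be sketched on a small completely positive matrix. The function name `symmetric_wedderburn_step`, the 2 × 2 example, and the hand-picked vector x are assumptions made for illustration; the slides pose this as an optimization problem and do not prescribe how x is found.

```python
import numpy as np

def symmetric_wedderburn_step(A, x):
    """Return (A - u u^T, u) with u = A x / sqrt(x^T A x)."""
    s = x @ A @ x
    if s <= 0:
        raise ValueError("x^T A x must be positive")
    u = A @ x / np.sqrt(s)         # equivalent to scaling x so x^T A x = 1
    return A - np.outer(u, u), u

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
A = B @ B.T                        # completely positive by construction
x = np.array([0.0, 1.0])           # here A x equals the second column of B
A_next, u = symmetric_wedderburn_step(A, x)
print(u)        # nonnegative column peeled off
print(A_next)   # symmetric, nonnegative, rank reduced by one
```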
Maximal Nonnegative Rank Splitting
Question
(MNRS): Given a nonnegative matrix A, find a splitting
$$A = B + C,$$
where both B and C are nonnegative matrices satisfying $\operatorname{rank}(B) = \operatorname{rank}_+(B)$, $\operatorname{rank}(A) = \operatorname{rank}(B) + \operatorname{rank}(C)$, and rank(B) is maximized.
I If $A \in \mathcal{R}(m, n)$, then trivially B = A and C = 0.
I If $A \notin \mathcal{R}(m, n)$,
• A might still have a few NEs.
• Retrieve all possible NEs of A.
Conclusion
I Detecting the nonnegative rank and computing the corresponding nonnegative factorization for a general nonnegative matrix are very challenging tasks both in theory and in practice.
I No existing algorithm is guaranteed to find the NRF.
I Exploit the Wedderburn rank reduction formula to downdate a nonnegative matrix.
I Only a possible computational tool for the NRF problem.
I Need more perturbation analysis for NRF in general.