
(1)

Nonnegative Rank Factorization

A Heuristic Approach via Rank Reduction (Work in Progress)

Moody T. Chu

(Joint work with Bo Dong and Matthew Lin)

North Carolina State University

March 8, 2012 @ National Cheng Kung University

(2)

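A new song to sing, a cup of wine to drink; the same weather as last year, the same old pavilion. The sun is setting in the west; when will it return?

Helplessly, the flowers fall away; the returning swallows seem like ones I once knew. Along the fragrant garden path, I pace alone.

--- Yan Shu (Song dynasty), "Huan Xi Sha" (Silk-Washing Stream)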

(3)

Take Home Question

Given a nonnegative matrix A, write

$$A = \sum_{i=1}^{k} u_i k_i^\top,$$

where $u_i, k_i \ge 0$ and k is minimal.

(4)

Outline

Background

• Nonnegative Rank

• Nonnegative Rank Factorization

Generic Phenomenon

• Geometry of NRF

• Probability Issues

• Perturbation Theory

Wedderburn Formula

• Subtractivity

• Rank Reduction Algorithm

• Maximin Problem

Other Applications

• Completely Positive Matrices

• Maximal Nonnegative Rank Splitting

Conclusion

(10)

Nonnegative Rank

I Given $A \in \mathbb{R}_+^{m \times n}$, write

A = UV.

• $U \in \mathbb{R}_+^{m \times k}$ and $V \in \mathbb{R}_+^{k \times n}$ with k ≤ min{m, n}.

• Always possible.

I Interested in the smallest k that admits such a factorization.

• Denoted by rank+(A).

I A trivial fact —

rank(A) ≤ rank+(A) ≤ min{m, n}.

I A challenge —

• Determining the exact nonnegative rank is NP-hard.
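The two easy bounds can be checked numerically. The sketch below is only an illustration: the helper name `trivial_nonnegative_factorization` and the random test matrix are assumptions, not part of the talk. It builds the "always possible" factorization with inner dimension min{m, n} and compares it with the ordinary rank.

```python
import numpy as np

def trivial_nonnegative_factorization(A):
    """Return nonnegative (U, V) with A = U @ V and inner dimension min(m, n)."""
    m, n = A.shape
    if m <= n:
        return np.eye(m), A.copy()   # A = I_m A, inner dimension m
    return A.copy(), np.eye(n)       # A = A I_n, inner dimension n

rng = np.random.default_rng(0)
A = rng.random((5, 3))               # entrywise nonnegative test matrix
U, V = trivial_nonnegative_factorization(A)
k = U.shape[1]                       # upper bound on rank_+(A)
rank_A = int(np.linalg.matrix_rank(A))  # lower bound on rank_+(A)
print(rank_A, "<= rank_+(A) <=", k)
```

The code only brackets rank+(A); it makes no attempt to compute it, which is the NP-hard part.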

(11)

This Is Not ...

I A = UV is a special nonnegative factorization of A.

I Should be unequivocally distinguished from what is known as the nonnegative matrix factorization (NMF).

• Misnamed!

• NMF is only a low-rank approximation obtained from

$$\min_{U \in \mathbb{R}_+^{m \times p},\; V \in \mathbb{R}_+^{p \times n}} \|A - UV\|_F.$$

• p < min{m, n} is preassigned.

• Many numerical techniques.

• Cannot guarantee the required equality in a complete factorization even with p = rank+(A).

(12)

NMF Fails!

I Almost all NMF techniques adapt conventional mathematical programming schemes.

I The objective function in NMF is non-convex.

I The factors U and V retrieved by NMF techniques are typically local minimizers only.

I Cannot ensure equality to A.
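To make the contrast concrete, here is a minimal sketch of one common NMF scheme, the Lee-Seung multiplicative updates. It is a swapped-in textbook method chosen for brevity, not the authors' algorithm. The function name `nmf_multiplicative`, the iteration count, and the test matrix are assumptions.

```python
import numpy as np

def nmf_multiplicative(A, p, iters=500, seed=0, eps=1e-12):
    """Approximately minimize ||A - U V||_F over U, V >= 0 (Lee-Seung updates)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, p)) + eps
    V = rng.random((p, n)) + eps
    for _ in range(iters):
        # Multiplicative updates keep both factors entrywise nonnegative.
        V *= (U.T @ A) / (U.T @ U @ V + eps)
        U *= (A @ V.T) / (U @ V @ V.T + eps)
    return U, V

rng = np.random.default_rng(1)
A = rng.random((6, 3)) @ rng.random((3, 5))   # nonnegative, rank_+(A) <= 3
U, V = nmf_multiplicative(A, p=3)
residual = np.linalg.norm(A - U @ V, "fro")
print("Frobenius residual:", residual)
```

The iteration returns nonnegative factors and a residual, but nothing in it certifies that the residual is exactly zero, even though an exact factorization with p = 3 exists by construction.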

(13)

Nonnegative Rank Factorization

$$R(m, n) := \{ A \in \mathbb{R}_+^{m \times n} \mid \operatorname{rank}(A) = \operatorname{rank}_+(A) \}.$$

I Interested in

• Identifying if A ∈ R(m, n).

• Procuring a nonnegative factorization for A ∈ R(m, n).

I Such a factorization is called a nonnegative rank factorization (NRF) of A.

(14)

An Example

I Not every nonnegative matrix has an NRF.

I The simplest non-NRF matrix:

$$C = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}.$$

• rank(C ) = 3.

• rank+(C ) = 4.

I There is a necessary and sufficient condition characterizing the matrices that have an NRF (Thomas’1974).

• Not suitable for computation.
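The ordinary rank of C is easy to confirm. The sketch below checks rank(C) = 3 and exhibits one explicit dependence among the rows. It does not verify rank+(C) = 4, which needs a separate argument. The variable names are illustrative.

```python
import numpy as np

C = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)

rank_C = int(np.linalg.matrix_rank(C))

# One linear dependence among the rows: r1 - r2 - r3 + r4 = 0.
dependence = C[0] - C[1] - C[2] + C[3]
print(rank_C, dependence)
```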

(15)

Probability Simplex

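I Given $A \in \mathbb{R}_+^{m \times n}$ with columns $a_1, \dots, a_n$, define

$$\sigma(A) := \operatorname{diag}\{\|a_1\|_1, \dots, \|a_n\|_1\}, \qquad \vartheta(A) := A\,\sigma(A)^{-1}.$$

I Columns of ϑ(A) are points on the probability simplex $D_m$ in $\mathbb{R}_+^m$, where

$$D_m := \{ a \in \mathbb{R}_+^m \mid 1_m^\top a = 1 \}.$$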
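A minimal sketch of the two maps, assuming A has no zero column (otherwise σ(A) is not invertible). The function names `sigma` and `theta` simply mirror the symbols above, and the 3 × 11 test matrix is an arbitrary choice.

```python
import numpy as np

def sigma(A):
    """Diagonal matrix of the 1-norms of the columns of A."""
    return np.diag(np.abs(A).sum(axis=0))

def theta(A):
    """Scale every column of A to unit 1-norm (the pullback to the simplex)."""
    col_norms = np.abs(A).sum(axis=0)
    if np.any(col_norms == 0):
        raise ValueError("theta(A) is undefined when A has a zero column")
    return A / col_norms              # broadcasting divides column j by its norm

rng = np.random.default_rng(0)
A = rng.random((3, 11))
P = theta(A)
print(P.sum(axis=0))                  # every column sum should be 1 up to round-off
```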

(16)

Convex Hull of ϑ(A) ∈ $\mathbb{R}_+^{3 \times 11}$

[Figure: the probability simplex and the convex hull of the columns of ϑ(A).]

(17)

Minimal Convex Polytope

I A = UV = (UD⁻¹)(DV) with any diagonal matrix D having positive diagonal entries, so that both factors stay nonnegative.

• May assume σ(U) = I_k.

I Write

A = ϑ(A)σ(A) = UV = ϑ(U)ϑ(V )σ(V ).

• It must be such that

ϑ(A) = ϑ(U)ϑ(V ), σ(A) = σ(V ).

• Columns of ϑ(A) are in the convex hull of ϑ(U).

Lemma

The nonnegative rank rank+(A) is the minimal number of vertices on $D_m$ such that the resulting convex polytope encloses all columns of the pullback ϑ(A).
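The identities behind the lemma can be checked on random data. The sketch below assumes the normalization σ(U) = I (columns of U sum to one), with sizes chosen arbitrarily. It confirms that σ(A) = σ(V) and that each column of ϑ(A) is a convex combination of the columns of ϑ(U) = U, with weights given by the matching column of ϑ(V).

```python
import numpy as np

def col_sums(M):
    return M.sum(axis=0)

rng = np.random.default_rng(1)
m, k, n = 4, 3, 6
U = rng.random((m, k))
U /= col_sums(U)                 # normalize so that sigma(U) is the identity
V = rng.random((k, n))
A = U @ V

theta_A = A / col_sums(A)        # columns of A pulled back to the simplex
theta_V = V / col_sums(V)        # convex weights: nonnegative, columns sum to 1

same_scaling = np.allclose(col_sums(A), col_sums(V))   # sigma(A) = sigma(V)
convex_comb = np.allclose(theta_A, U @ theta_V)        # theta(A) = theta(U) theta(V)
print(same_scaling, convex_comb)
```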

(18)

Visualizing ϑ( C )

[Figure: the unit tetrahedron S with axes x, y, z, showing the points A1, A2, A3, A4 and the set D.]

I Suffices to represent the probability simplex $D_4$ by the unit tetrahedron S in the first octant of R³.

I Columns of ϑ(C) can be interpreted as points A1, A2, A3, A4.

• Coplanar because rank(ϑ(C)) = 3.

• Four "edges" sitting on separate facets of the tetrahedron.

• The minimum number of vertices for a convex set in the unit tetrahedron to cover D is four.

(19)

How Often Can This Happen?

Question

(R2R+): Given an arbitrary nonnegative 4 by 4 matrix of rank 3, what is the probability that its nonnegative rank is 3?

I No easy answer!

Question

Sylvester’s Four-Point Problem: If four points are drawn independently and uniformly at random from a compact set, what is the probability that none of them lies in the triangle formed by the other three?

I "This problem does not admit of a determinate solution!" (Sylvester’1865).

(20)

Flipped Side Question

Question

(R+2R) : Given an arbitrary nonnegative 4 by 4 matrix of nonnegative rank 3, what is the probability that its rank is 3?

Theorem

Given k < min{m, n}, let $R_+(k)$ denote the manifold of nonnegative matrices in $\mathbb{R}_+^{m \times n}$ with nonnegative rank k. Then the conditional probability of rank(A) = k, given $A \in R_+(k)$, is one.

I Matrices which have an NRF are generic.

(21)

Almost Surely?

I If A = UV with randomly generated $U \in \mathbb{R}_+^{m \times k}$ and $V \in \mathbb{R}_+^{k \times n}$,

• Almost surely we have rank(A) = k.

• The converse is not true.

I Matrices whose rank differs from their nonnegative rank form a set of measure zero.

• This does not necessarily mean that the set is "unobservable", nor that nontrivial "exceptions" are difficult to come by.
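A quick experiment illustrates the "almost surely" statement. The sizes, seed, number of trials, and the uniform distribution of the factors are arbitrary assumptions. Products of random nonnegative factors with inner dimension k come out with numerical rank k.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 6, 5, 3
ranks = []
for _ in range(20):
    U = rng.random((m, k))        # random nonnegative factors
    V = rng.random((k, n))
    ranks.append(int(np.linalg.matrix_rank(U @ V)))
print(set(ranks))
```

Such an experiment says nothing about the exceptional set itself. It only shows that random sampling does not land on it.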

(22)

Euclidean Distance Matrices

I Given n distinct points $q_1, \dots, q_n$ in $\mathbb{R}^r$:

$$E(q_1, \dots, q_n) := \left[ \|q_i - q_j\|_2^2 \right]_{i,j=1}^{n}.$$

• Of rank at most r + 2 (generically exactly r + 2 once n ≥ r + 2), regardless of the number n of points.

• If r = 1, then generically of nonnegative rank n (Chu’2010).

• Certainly do not exhibit the generic phenomenon described above.

• Form a large and characterizable set, which is nonetheless too specific to have nonzero measure.
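The rank statement is easy to probe numerically. The sketch below assumes randomly drawn points and an explicit rank tolerance. The helper name `edm` and the choice n = 8 are illustrative. It builds the squared-distance matrix from the Gram matrix and reports its numerical rank for r = 1, 2, 3.

```python
import numpy as np

def edm(Q):
    """Squared Euclidean distance matrix of the points stored as rows of Q."""
    G = Q @ Q.T                           # Gram matrix
    g = np.diag(G)                        # squared norms of the points
    return g[:, None] + g[None, :] - 2.0 * G

rng = np.random.default_rng(3)
n = 8
ranks = {}
for r in (1, 2, 3):
    Q = rng.standard_normal((n, r))       # n generic points in R^r
    E = edm(Q)
    ranks[r] = int(np.linalg.matrix_rank(E, tol=1e-9))
print(ranks)
```

The nonnegative rank of these matrices, which is the interesting quantity here, is not computed by this check.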

(23)

Perturbation Theory

I Very little perturbation analysis for NRF in general has been studied in the literature.

I Some open questions:

1. Given a nonnegative matrix A which has an NRF, under what condition will the perturbed nonnegative matrix A + E still have an NRF?

2. Given a nonnegative matrix A which has an NRF, let U and V be the nonnegative factors found by our (or any) numerical algorithm so that UV is a numerical NRF of A. Is UV the exact NRF of some perturbed nonnegative matrix A + E ?

(24)

Local Rank Condition

Theorem

Given an m × n nonnegative matrix A with rank+(A) = k:

1. There exists a ball B(A; ε) such that rank+(N) ≥ k for all N ∈ B(A; ε).

2. For any ε > 0, there exists N ∈ B(A; ε) such that rank+(N) = rank+(A) and N ≠ λA for any λ.

(25)

Chipping Away a Nonnegative Portion

I B ≥ 0 is a nonnegative component (NC) of A ≥ 0 iff A − B ≥ 0.

• Compute the “maximum" rank-one NC of A (Levin’1985).

• The residual after an NC subtraction might have higher rank.

• Might end up with an infinite series of NC matrices.

I B ≥ 0 is a nonnegative element (NE) of A ≥ 0 iff B is a rank-one NC and rank(A − B) = rank(A) − 1.

I The matrix C has many NCs, but has no NE at all.

I Need to gradually distribute A over a sequence of NEs.

(26)

A Major Difficulty

I A has an NRF ⟹ $A = \sum_{i=1}^{k} u_i k_i^\top$.

• Each $u_i k_i^\top$ is an NE.

I What if we miss the particular sequence of NEs (which is not known to begin with)?

• Any bad choices of NE’s in the intermediate stages could cause the rank reduction process to break down.

I Get stuck at a matrix that has no more NE at all.

I A major challenge:

• Could not foresee which NE would be a “good" NE to continue on the rank reduction.

• Finding the right NE’s is precisely why the NRF problem is so challenging.

• A weakness of our approach.

I Still might be the first sensible way to crack the nut!

(27)

Wedderburn Rank Reduction Formula

Theorem

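Let $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$. Then the matrix

$$B := A - \sigma^{-1} u v^\top$$

satisfies the rank subtractivity rank(B) = rank(A) − 1 if and only if there are vectors $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ such that

$$u = Ax, \qquad v = A^\top y, \qquad \sigma = y^\top A x.$$

Theorem

Suppose $U \in \mathbb{R}^{m \times k}$, $R \in \mathbb{R}^{k \times k}$, and $V \in \mathbb{R}^{n \times k}$. Then

$$\operatorname{rank}(A - U R^{-1} V^\top) = \operatorname{rank}(A) - \operatorname{rank}(U R^{-1} V^\top)$$

if and only if there exist $X \in \mathbb{R}^{n \times k}$ and $Y \in \mathbb{R}^{m \times k}$ such that

$$U = AX, \qquad V = A^\top Y, \qquad R = Y^\top A X.$$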
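The rank-one statement can be checked on a small example. In the sketch below, the choice x = 1, y = 1 and the random rank-3 test matrix are assumptions made so that σ = yᵀAx is safely nonzero. A Wedderburn update lowers the rank by one, whereas subtracting a generic rank-one matrix not of that form does not.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((5, 3)) @ rng.random((3, 4))     # 5 x 4 matrix of rank 3

x = np.ones(4)
y = np.ones(5)
u, v, s = A @ x, A.T @ y, y @ A @ x             # u = Ax, v = A^T y, sigma = y^T A x
B = A - np.outer(u, v) / s                      # Wedderburn update

rank_A = int(np.linalg.matrix_rank(A, tol=1e-9))
rank_B = int(np.linalg.matrix_rank(B, tol=1e-9))

# A rank-one matrix built from generic vectors, not of the form (Ax)(A^T y)^T.
u_bad, v_bad = rng.standard_normal(5), rng.standard_normal(4)
rank_bad = int(np.linalg.matrix_rank(A - np.outer(u_bad, v_bad), tol=1e-9))
print(rank_A, rank_B, rank_bad)
```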

(28)

Applications

I Provide a mechanism to break down a matrix into a sum of rank-one matrices.

• Define a sequence $\{A_k\}$ of matrices by

$$A_{k+1} := A_k - (y_k^\top A_k x_k)^{-1} A_k x_k y_k^\top A_k.$$

• Properly chosen vectors $x_k \in \mathbb{R}^n$ and $y_k \in \mathbb{R}^m$ satisfying $y_k^\top A_k x_k \ne 0$.

• The process can be continued so long as $A_k \ne 0$.

• The sequence $\{A_k\}$ must be finite.

I Almost all classical matrix decompositions can be found in this way (Chu, Funderlic, Golub’1995).
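A minimal sketch of the recursion, with one particular (assumed) choice of vectors: x_k and y_k are unit coordinate vectors picking out the entry of largest magnitude, so each step is a step of Gaussian elimination with complete pivoting. The function name and the stopping tolerance are illustrative.

```python
import numpy as np

def wedderburn_sequence(A, tol=1e-10):
    """Peel rank-one terms off A with x_k = e_j, y_k = e_i at the largest entry."""
    Ak = np.array(A, dtype=float)
    terms = []
    while np.abs(Ak).max() > tol and len(terms) < min(Ak.shape):
        i, j = np.unravel_index(np.argmax(np.abs(Ak)), Ak.shape)
        # A_k x = column j, y^T A_k = row i, y^T A_k x = Ak[i, j] (nonzero pivot).
        term = np.outer(Ak[:, j], Ak[i, :]) / Ak[i, j]
        terms.append(term)
        Ak = Ak - term
    return terms

rng = np.random.default_rng(4)
A = rng.random((5, 3)) @ rng.random((3, 4))     # rank 3
terms = wedderburn_sequence(A)
print(len(terms))                               # number of rank-one terms removed
```

The process stops after rank(A) steps and the removed terms sum back to A. Nothing here keeps the intermediate matrices nonnegative, which is the extra requirement in the NRF setting.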

(29)

Wedderburn for NRF

I For NRF,

• Break down a nonnegative matrix by taking away one NE at a time.

• An NE must assume the Wedderburn form $(y_k^\top A_k x_k)^{-1} A_k x_k y_k^\top A_k$.

• $A_{k+1}$ must be nonnegative after the subtraction.

    - The most difficult part.

I Two probable causes for premature termination.

• A is not a matrix in R(m, n) to begin with.

    - A welcome conclusion.

    - Notion of maximal nonnegative rank splitting of A.

• Bad starting points have branched $A_k$ into a "dead end", i.e., $A_k$ has no NE.

    - A restart might remedy the problem.

    - Only a heuristic way to find the (approximate) NRF.

    - Need more analysis to conclude whether A has an NRF or not.

(30)

Maximin Problem

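$$\max_{x_k \in \mathbb{R}^n,\; y_k \in \mathbb{R}^m} \min \left( A_k - A_k x_k y_k^\top A_k \right)$$

subject to $A_k x_k \ge 0$, $y_k^\top A_k \ge 0$, $y_k^\top A_k x_k = 1$, where the inner min is taken over the entries of the matrix.

I $A_k - A_k x_k y_k^\top A_k \le A_k$.

• A maximizer of $\min (A_k - A_k x_k y_k^\top A_k)$ always exists.

I Nonnegative objective value ⟹ $A_{k+1} \ge 0$.

• A feasible NE is found.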
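The objective and constraints can be evaluated at any candidate pair (x, y). The sketch below is not a solver for the maximin problem. It only evaluates the objective at a point built from known nonnegative factors A = UV, via pseudoinverses, so that A x yᵀ A equals the first rank-one term of the factorization. The function name and tolerances are assumptions.

```python
import numpy as np

def reduction_objective(A, x, y, tol=1e-9):
    """Return (smallest entry of A - A x y^T A, whether (x, y) is feasible)."""
    Ax, yA, s = A @ x, y @ A, y @ A @ x
    feasible = bool((Ax >= -tol).all() and (yA >= -tol).all() and abs(s - 1.0) < tol)
    return float((A - np.outer(Ax, yA)).min()), feasible

rng = np.random.default_rng(4)
U, V = rng.random((5, 3)), rng.random((3, 4))   # known nonnegative factors
A = U @ V

x = np.linalg.pinv(V)[:, 0]    # V x = e_1, hence A x = first column of U
y = np.linalg.pinv(U)[0, :]    # y^T U = e_1^T, hence y^T A = first row of V
value, feasible = reduction_objective(A, x, y)
print(value, feasible)
```

A nonnegative `value` at a feasible point means the subtracted term is an NE. In practice the factors are unknown, which is why the optimization is needed.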

(31)

A Pathological Example

I Consider A = [C, c], the matrix C with one extra column c appended.

• c ≥ 0 is a random vector in R⁴.

• rank+(A) = rank(A) = 4.

• Splitting A by its rows is automatically an NRF.

I The matrix ∆ := [0, c], with 0 the 4 × 4 zero matrix, is an NE.

• Leaves behind a nonnegative matrix A − ∆ = [C, 0] of rank 3.

• [C, 0] does not have any NRF.

• Iteration would get stuck.
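The rank claims in this example can be verified directly. The sketch assumes A is C with the column c appended, as the 4 × 5 matrix displayed on this slide suggests, and takes the entries of c from that display. It does not check the claim that [C, 0] has no NRF.

```python
import numpy as np

C = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)
c = np.array([0.0781, 0.6690, 0.5002, 0.2180])    # last column shown on the slide

A = np.column_stack([C, c])                       # A = [C, c], 4 x 5
Delta = np.column_stack([np.zeros((4, 4)), c])    # Delta = [0, c]

rank_A = int(np.linalg.matrix_rank(A))
rank_Delta = int(np.linalg.matrix_rank(Delta))
rank_rest = int(np.linalg.matrix_rank(A - Delta)) # rank of [C, 0]

# Row splitting A = I_4 A uses 4 = rank(A) nonnegative rank-one terms.
row_terms = [np.outer(np.eye(4)[i], A[i]) for i in range(4)]
print(rank_A, rank_Delta, rank_rest)
```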

I Restart ⟹

$$\begin{bmatrix} 1.0000 & 1.0000 & 0 & 0 & 0.0781 \\ 1.0000 & 0 & 1.0000 & 0 & 0.6690 \\ 0 & 1.0000 & 0 & 1.0000 & 0.5002 \\ 0 & 0 & 1.0000 & 1.0000 & 0.2180 \end{bmatrix} = \begin{bmatrix} 0.0000 & 0.0000 & 0.8624 & 0 \\ 0.0000 & 0.3862 & 0 & 0 \\ 3.4544 & 0 & 0 & 0 \\ 0.0000 & 0.0000 & 0.0000 & 1.7018 \end{bmatrix} \begin{bmatrix} 0 & 0.2895 & 0.0000 & 0.2895 & 0.1448 \\ 2.5895 & 0.0000 & 2.5895 & 0.0000 & 1.7325 \\ 1.1596 & 1.1596 & 0.0000 & 0.0000 & 0.0905 \\ 0 & 0 & 0.5876 & 0.5876 & 0.1281 \end{bmatrix}.$$

(32)

Multiple Rank Reduction

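$$\max_{X_k \in \mathbb{R}^{n \times r},\; Y_k \in \mathbb{R}^{m \times r}} \min \left( A_k - A_k X_k Y_k^\top A_k \right)$$

subject to $A_k X_k \ge 0$, $Y_k^\top A_k \ge 0$, $Y_k^\top A_k X_k = I_r$.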

Theorem

If a nonnegative matrix has a nonnegative rank-r reduction, then it must have r nonnegative rank-one reductions.

(33)

Completely Positive Matrices

I $A \in \mathbb{R}_+^{n \times n}$ is completely positive (CP) iff $A = BB^\top$ for some $B \ge 0$.

• B is not necessarily square.

• Interested in the smallest number of columns of B.

    - Denoted by $\operatorname{rank}_{cp}(A)$.

I Two questions (Berman’2003):

• Determine whether a given nonnegative semi-definite matrix is CP.

• Determine the cp-rank and compute its CP factorization.

I More stringent than NRF.
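Building a CP matrix is the easy direction, as the small sketch below shows. The sizes and the random factor B are assumptions. Any nonnegative B gives a matrix BBᵀ that is symmetric, entrywise nonnegative, and positive semidefinite, and the number of columns of B bounds the cp-rank from above.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.random((4, 6))              # nonnegative and not square
A = B @ B.T                         # completely positive by construction

eigs = np.linalg.eigvalsh(A)        # eigenvalues of the symmetric matrix A
cp_rank_upper_bound = B.shape[1]
print(eigs.min(), cp_rank_upper_bound)
```

The two questions on the slide concern the reverse direction: deciding whether a given matrix admits such a B and finding one with the fewest columns, which this check does not address.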

(34)

Symmetric Wedderburn Formula

$$\max_{x_k \in \mathbb{R}^n} \min \left( A_k - A_k x_k x_k^\top A_k \right)$$

subject to $A_k x_k \ge 0$, $x_k^\top A_k x_k = 1$.

(35)

Maximal Nonnegative Rank Splitting

Question

(MNRS): Given a nonnegative matrix A, find a splitting

A = B + C,

where both B and C are nonnegative matrices satisfying rank(B) = rank+(B),

rank(A) = rank(B) + rank(C), and rank(B) is maximized.

I If A ∈ R(m, n), then trivially B = A and C = 0.

I If A /∈ R(m, n),

• A might still have a few NEs.

• Retrieve all possible NEs of A.

(36)

Conclusion

I Detecting the nonnegative rank and computing the corresponding nonnegative factorization for a general nonnegative matrix are very challenging tasks, both in theory and in practice.

I No existing algorithm is guaranteed to find the NRF.

I Exploit the Wedderburn rank reduction formula to downdate a nonnegative matrix.

I Only a possible computational tool for the NRF problem.

I Need more perturbation analysis for NRF in general.