
(1)

Group Actions, Linear Transformations, and Flows:

Dynamical Systems on Manifolds

Moody T. Chu

North Carolina State University

April 8, 2010 @ National Cheng Kung University

(2)

“I admit that twice two makes four is an excellent thing, but if we are to give everything its due, twice two makes five is sometimes a very charming thing too.”

– Fedor Dostoievsky, 1821-1881 (in a letter to a friend)

(3)

Outline

Motivation

Realization Process

Two Examples

Basic Form

Splitting and Factorization

Abstraction

Matrix Groups and Actions

Matrix Groups

Group Actions

Tangent Space

Canonical Forms

Objective Functions

Examples

Projected Gradient Flows

New Thoughts

Conclusion

(9)

Motivation

“What is the simplest form to which a family of matrices depending smoothly on the parameters can be reduced by a change of coordinates depending smoothly on the parameters?”

– V. I. Arnold

(in Geometric Methods in the Theory of Ordinary Differential Equations, 1988)

• What is the simplest form referred to here?

• What kind of continuous change can be employed?

(10)

Realization Process

• Realization process, in a sense, means any deducible procedure that we use to rationalize and solve problems.

• The simplest form refers to the agility to think and draw conclusions.

• In mathematics, a realization process often appears in the form of an iterative procedure or a differential equation.

• The steps taken for the realization, i.e., the changes, could be discrete or continuous.

(11)

Continuous Realization

• Two abstract problems:

• One is a make-up and is easy.

• The other is the real problem and is difficult.

• A bridge:

• A continuous path connecting the two problems.

• A path that is easy to follow.

• A numerical method:

• A method for moving along the bridge.

• A method that is readily available.

(12)

Build the Bridge

• Specified guidance is available.

• The bridge is constructed by monitoring the values of certain specified functions.

• The path is guaranteed to work.

• Such as the projected gradient method.

• Only some general guidance is available.

• A bridge is built in a straightforward way.

• No guarantee the path will be complete.

• Such as the homotopy method.

• No guidance at all.

• A bridge is built seemingly by accident.

• Usually deeper mathematical theory is involved.

• Such as the isospectral flows.

(13)

Characteristics of a Bridge

• A bridge, if it exists, usually is characterized by an ordinary differential equation.

• The discretization of a bridge, or a numerical method in travelling along a bridge, usually produces an iterative scheme.

(14)

Two Examples

• Eigenvalue Computation

• Constrained Least Squares Approximation

(15)

The Eigenvalue Problem

• A symmetric matrix A₀ is given.

• Solve the equation

A₀x = λx for a nonzero vector x and a scalar λ.

(16)

An Iterative Method

• The QR decomposition:

A = QR

where Q is orthogonal and R is upper triangular.

• The QR algorithm (Francis’61):

A_k = Q_k R_k,  A_{k+1} = R_k Q_k.

• The sequence {A_k} converges to a diagonal matrix.

• Every matrix A_k has the same eigenvalues as A₀, because A_{k+1} = Q_k^T A_k Q_k.
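The iteration above can be sketched in a few lines of NumPy. This is a minimal, unshifted version for illustration only; the test matrix, the iteration count, and the function name `qr_algorithm` are arbitrary choices that do not come from the talk.

```python
import numpy as np

def qr_algorithm(A0, steps):
    """Unshifted QR iteration: factor A_k = Q_k R_k, then set A_{k+1} = R_k Q_k."""
    A = np.array(A0, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(A)
        A = R @ Q  # equals Q^T A Q, so the spectrum is unchanged
    return A

# Illustrative symmetric matrix (an assumption, chosen with well separated eigenvalues).
A0 = np.array([[4.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 1.0]])
Ak = qr_algorithm(A0, 200)
off_diagonal = Ak - np.diag(np.diag(Ak))
print("diagonal of A_k:", np.sort(np.diag(Ak)))
print("eigenvalues of A_0:", np.linalg.eigvalsh(A0))
print("off-diagonal norm:", np.linalg.norm(off_diagonal))
```

Practical implementations add shifts and a preliminary tridiagonal reduction; neither is shown here.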

(17)

A Continuous Method

• Lie algebra decomposition:

X = X^o + X⁺ + X⁻

where X^o is the diagonal, X⁺ the strictly upper triangular, and X⁻ the strictly lower triangular part of X.

• Define Π₀(X) := X⁻ − (X⁻)^T.

• The Toda lattice (Symes’82, Deift et al.’83):

dX/dt = [X, Π₀(X)],  X(0) = X₀.

• Sampled at integer times, {X(k)} gives the same sequence as does the QR algorithm applied to the matrix A₀ = exp(X₀), in the sense that exp(X(k)) = A_k.
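The correspondence between the Toda flow and the QR algorithm can be checked numerically. The sketch below integrates dX/dt = [X, Π₀(X)] with a classical fourth-order Runge-Kutta scheme (an arbitrary choice of integrator) and compares exp(X(1)) with one QR step applied to exp(X₀). It assumes a symmetric X₀ so that the matrix exponential can be formed from an eigendecomposition, and it normalizes the QR factorization so that R has a positive diagonal. The helper names and the test matrix are illustrative.

```python
import numpy as np

def pi0(X):
    """Pi_0(X) = X^- - (X^-)^T, with X^- the strictly lower triangular part."""
    L = np.tril(X, -1)
    return L - L.T

def toda_rhs(X):
    K = pi0(X)
    return X @ K - K @ X  # the bracket [X, Pi_0(X)]

def rk4(f, X, t_end, n_steps):
    """Classical Runge-Kutta integration of dX/dt = f(X) from 0 to t_end."""
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = f(X)
        k2 = f(X + 0.5 * h * k1)
        k3 = f(X + 0.5 * h * k2)
        k4 = f(X + h * k3)
        X = X + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def qr_step(A):
    """One QR step A -> RQ, with the factorization normalized so diag(R) > 0."""
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    Q, R = Q * s, s[:, None] * R  # flip signs; the product QR is unchanged
    return R @ Q

# Illustrative symmetric tridiagonal starting matrix (an assumption).
X0 = np.array([[1.0, 0.5, 0.0],
               [0.5, 0.0, 0.3],
               [0.0, 0.3, -1.0]])
X1 = rk4(toda_rhs, X0, 1.0, 2000)
A1 = qr_step(sym_expm(X0))
gap = np.linalg.norm(sym_expm(X1) - A1)
print("|| exp(X(1)) - A_1 || =", gap)
```

The sign normalization matters: without it, the QR iterate can differ from exp(X(1)) by a similarity with a diagonal matrix of ±1 entries.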

(18)

Flow

Evolution starts from X₀ and converges to the limit point of the Toda flow, which is a diagonal matrix.

• The flow maintains the spectrum.

• The construction of the Toda lattice is based on the physics.

• This is a Hamiltonian system.

• Certain physical quantities are kept constant, i.e., this is a completely integrable system.

• The convergence is guaranteed by “an act of God”?

(19)

Least Squares Matrix Approximation

• A symmetric matrix N and a set of real values {λ₁, . . . , λ_n} are given.

• Find a least squares approximation of N that has the prescribed eigenvalues.

(20)

A Standard Formulation

Minimize F(Q) := ½ ||Q^TΛQ − N||²  subject to Q^TQ = I,

where Λ := diag{λ₁, . . . , λ_n}.

• Equality Constrained Optimization:

• Augmented Lagrangian methods.

• Sequential quadratic programming methods.

• None of these techniques is easy.

• The constraint carries lots of redundancies.

(21)

A Continuous Approach

• The projection of the gradient of F can easily be calculated.

• Projected gradient flow (Brockett’88, Chu&Driessel’90):

dX/dt = [X, [X, N]],  X(0) = Λ.

• X := Q^TΛQ.

• Flow X (t) moves in a descent direction to reduce ||X − N||².

• The optimal solution X can be fully characterized by the spectral decomposition of N and is unique.
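A small numerical experiment illustrates the descent property of the double bracket flow. The sketch below is only a sanity check under stated assumptions: the matrices Λ and N are made-up test data, the integrator is a plain fourth-order Runge-Kutta scheme, and the helper names are hypothetical.

```python
import numpy as np

def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def double_bracket_rhs(X, N):
    return bracket(X, bracket(X, N))

def rk4(f, X, t_end, n_steps):
    """Classical Runge-Kutta integration of dX/dt = f(X) from 0 to t_end."""
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = f(X)
        k2 = f(X + 0.5 * h * k1)
        k3 = f(X + 0.5 * h * k2)
        k4 = f(X + h * k3)
        X = X + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

Lam = np.diag([1.0, 2.0, 3.0])      # prescribed eigenvalues (illustrative)
N = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])     # symmetric matrix to approximate (illustrative)

X_end = rk4(lambda X: double_bracket_rhs(X, N), Lam, 20.0, 20000)

dist_start = np.linalg.norm(Lam - N)
dist_end = np.linalg.norm(X_end - N)
print("||X - N|| at t = 0 :", dist_start)
print("||X - N|| at t = 20:", dist_end)
print("spectrum of X(20)  :", np.linalg.eigvalsh(X_end))
print("||[X, N]|| at t = 20:", np.linalg.norm(bracket(X_end, N)))
```

Along the exact flow, d/dt ½||X − N||² = −||[X, N]||², so the distance can only decrease while the eigenvalues of X stay fixed; the integrator reproduces both facts up to discretization error.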

(22)

Flow

Evolution starts from an initial value and converges to the limit point which solves the least squares problem.

• The flow is built on the basis of systematically reducing the difference between the current position and the target position.

• This is a descent flow.

(23)

Equivalence

• (Bloch’90) Suppose X is symmetric and tridiagonal. Take N = diag{n, . . . , 2, 1}, then

[X, N] = Π₀(X).

• A gradient flow hence becomes a Hamiltonian flow.
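The identity [X, N] = Π₀(X) is easy to verify directly, since the (i, j) entry of [X, N] is x_ij (n_j − n_i) and n_j − n_i = i − j for this choice of N. A short check, with randomly generated tridiagonal data as an assumption:

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
d = rng.standard_normal(n)        # diagonal entries
e = rng.standard_normal(n - 1)    # off-diagonal entries
X = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)   # symmetric tridiagonal

N = np.diag(np.arange(n, 0, -1.0))                # diag{n, ..., 2, 1}

commutator = X @ N - N @ X                        # [X, N]
L = np.tril(X, -1)
pi0_X = L - L.T                                   # Pi_0(X) = X^- - (X^-)^T

print(np.allclose(commutator, pi0_X))
```

For a symmetric X that is not tridiagonal the entries of [X, N] pick up the factors i − j, which is one way to read the later remark about a specially scaled Toda lattice.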

(24)

Basic Form

• Lax dynamics:

dX(t)/dt := [X(t), k₁(X(t))],  X(0) := X₀.

• Parameter dynamics:

dg₁(t)/dt := g₁(t) k₁(X(t)),  g₁(0) := I,

and

dg₂(t)/dt := k₂(X(t)) g₂(t),  g₂(0) := I.

• k₁(X) + k₂(X) = X.

(25)

Similarity Property

X(t) = g₁(t)⁻¹ X₀ g₁(t) = g₂(t) X₀ g₂(t)⁻¹.

• Define Z(t) = g₁(t) X(t) g₁(t)⁻¹.

• Check

dZ/dt = (dg₁/dt) X g₁⁻¹ + g₁ (dX/dt) g₁⁻¹ + g₁ X (dg₁⁻¹/dt)

= (g₁ k₁(X)) X g₁⁻¹ + g₁ (X k₁(X) − k₁(X) X) g₁⁻¹ + g₁ X (−k₁(X) g₁⁻¹) = 0.

• Thus Z(t) = Z(0) = X(0) = X₀.

(26)

Decomposition Property

exp(tX₀) = g₁(t) g₂(t).

• Trivially exp(X₀t) satisfies the IVP

dY/dt = X₀Y,  Y(0) = I.

• Define Z(t) = g₁(t) g₂(t).

• Then Z(0) = I and

dZ/dt = (dg₁/dt) g₂ + g₁ (dg₂/dt)

= (g₁ k₁(X)) g₂ + g₁ (k₂(X) g₂) = g₁ X g₂

= X₀ Z (by Similarity Property).

• By the uniqueness theorem in ODEs, Z(t) = exp(X₀t).

(27)

Reversal Property

exp(tX(t)) = g₂(t) g₁(t).

• By Decomposition Property,

g₂(t) g₁(t) = g₁(t)⁻¹ exp(X₀t) g₁(t)

= exp(g₁(t)⁻¹ X₀ g₁(t) t)

= exp(X(t)t).
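The three properties (similarity, decomposition, reversal) can be checked together by integrating the Lax dynamics and both parameter dynamics as one coupled system. The sketch below assumes the Toda splitting k₁ = Π₀ and k₂(X) = X − Π₀(X), a symmetric X₀ so that the exponential can be computed from an eigendecomposition, and a fourth-order Runge-Kutta integrator; all of these are illustrative choices rather than requirements of the basic form.

```python
import numpy as np

def pi0(X):
    L = np.tril(X, -1)
    return L - L.T

def rhs(state):
    """Right-hand sides of dX/dt, dg1/dt, dg2/dt stacked along the first axis."""
    X, g1, g2 = state
    k1 = pi0(X)     # assumed splitting: k1(X) = Pi_0(X)
    k2 = X - k1     # so that k1(X) + k2(X) = X
    return np.stack([X @ k1 - k1 @ X, g1 @ k1, k2 @ g2])

def rk4(f, y, t_end, n_steps):
    h = t_end / n_steps
    for _ in range(n_steps):
        s1 = f(y)
        s2 = f(y + 0.5 * h * s1)
        s3 = f(y + 0.5 * h * s2)
        s4 = f(y + h * s3)
        y = y + (h / 6.0) * (s1 + 2 * s2 + 2 * s3 + s4)
    return y

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

X0 = np.array([[1.0, 0.5, 0.2],
               [0.5, 0.0, 0.3],
               [0.2, 0.3, -1.0]])   # illustrative symmetric matrix
I = np.eye(3)
t = 0.7
X, g1, g2 = rk4(rhs, np.stack([X0, I, I]), t, 2000)

similarity_gap = np.linalg.norm(X - np.linalg.inv(g1) @ X0 @ g1)
decomposition_gap = np.linalg.norm(sym_expm(t * X0) - g1 @ g2)
reversal_gap = np.linalg.norm(sym_expm(t * X) - g2 @ g1)
print(similarity_gap, decomposition_gap, reversal_gap)
```

With this splitting g₁(t) stays orthogonal and g₂(t) stays upper triangular, so g₁(t)g₂(t) is a QR factorization of exp(tX₀).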

(28)

Abstraction

• QR-type Decomposition.

• QR-type Algorithm.

(29)

QR-type Decomposition

• Lie algebra decomposition of gl(n) ⇐⇒ Lie group decomposition of Gl(n) in the neighborhood of I.

• Arbitrary subspace decomposition of gl(n) ⇐⇒ Factorization of a one-parameter semigroup in the neighborhood of I as the product of two nonsingular matrices, i.e.,

exp(X₀t) = g₁(t) g₂(t).

• The product g₁(t)g₂(t) will be called the abstract g₁g₂ decomposition of exp(X₀t).

(30)

QR-type Algorithm

• By setting t = 1, we have

exp(X(0)) = g₁(1) g₂(1),

exp(X(1)) = g₂(1) g₁(1).

• The dynamical system for X(t) is autonomous =⇒ The above phenomenon will occur at every feasible integer time.

• Corresponding to the abstract g₁g₂ decomposition, the above iterative process for all feasible integers will be called the abstract g₁g₂ algorithm.

(31)

Matrix Groups and Actions

Lots of realization processes used in numerical linear algebra are the results of group actions.

• What groups can be used?

• What actions can be taken?

• What results can be expected?

(32)

Matrix Groups

• A subset of nonsingular matrices (over any field) closed under matrix multiplication and inversion is called a matrix group.

• Matrix groups are central in many parts of mathematics and applications.

• A smooth manifold which is also a group where the multiplication and the inversion are smooth maps is called a Lie group.

• The most remarkable feature of a Lie group is that the structure is the same in the neighborhood of each of its elements.

• (Howe’83) Every (non-discrete) matrix group is in fact a Lie group.

• Algebra and geometry are intertwined in the study of matrix groups.

(33)

Examples of Matrix Groups

Group or subgroup | Notation | Characteristics

General linear | Gl(n) | {A ∈ R^{n×n} : det(A) ≠ 0}

Special linear | Sl(n) | {A ∈ Gl(n) : det(A) = 1}

Upper triangular | U(n) | {A ∈ Gl(n) : A is upper triangular}

Unipotent | Unip(n) | {A ∈ U(n) : a_ii = 1 for all i}

Orthogonal | O(n) | {Q ∈ Gl(n) : Q^TQ = I}

Generalized orthogonal | O_S(n) | {Q ∈ Gl(n) : Q^TSQ = S}; S is a fixed matrix

Symplectic | Sp(2n) | O_J(2n); J := [0 I; −I 0]

Lorentz | Lor(n, k) | O_L(n + k); L := diag{1, . . . , 1, −1, . . . , −1} with n entries 1 and k entries −1

Affine | Aff(n) | {[A t; 0 1] : A ∈ Gl(n), t ∈ R^n}

Translation | Trans(n) | {[I t; 0 1] : t ∈ R^n}

Isometry | Isom(n) | {[Q t; 0 1] : Q ∈ O(n), t ∈ R^n}

Center of G | Z(G) | {z ∈ G : zg = gz for every g ∈ G}; G is a given group

Product of G₁ and G₂ | G₁ × G₂ | (g₁, g₂) ∗ (h₁, h₂) := (g₁h₁, g₂h₂)

Quotient | G/N | {Ng : g ∈ G}; N is a fixed normal subgroup of G

Hessenberg | Hess(n) | Unip(n)/Z_n

(34)

Group Actions

• A function µ : G × V −→ V is said to be a group action of G on a set V if and only if

• µ(gh,x) = µ(g, µ(h, x)) for all g, h ∈ G and x ∈ V.

• µ(e,x) = x, if e is the identity element in G.

• Given x ∈ V, two important notions associated with a group action µ:

• The stabilizer of x is

Stab_G(x) := {g ∈ G | µ(g, x) = x}.

• The orbit of x is

Orb_G(x) := {µ(g, x) | g ∈ G}.

(35)

Examples of Group Actions

Set V | Group G | Action µ(g, A) | Application

R^{n×n} | Any subgroup | g⁻¹Ag | conjugation

R^{n×n} | O(n) | g^TAg | orthogonal similarity

R^{n×n} × . . . × R^{n×n} (k copies) | Any subgroup | (g⁻¹A₁g, . . . , g⁻¹A_kg) | simultaneous reduction

S(n) × S_PD(n) | Any subgroup | (g^TAg, g^TBg) | symmetric positive definite pencil reduction

R^{n×n} × R^{n×n} | O(n) × O(n) | (g₁^TAg₂, g₁^TBg₂) | QZ decomposition

R^{m×n} | O(m) × O(n) | g₁^TAg₂ | singular value decomposition

R^{m×n} × R^{p×n} | O(m) × O(p) × Gl(n) | (g₁^TAg₃, g₂^TBg₃) | generalized singular value decomposition

(36)

Some Exotic Group Actions (yet to be studied!)

• In numerical analysis, it is customary to use actions of the orthogonal group to perform the change of coordinates for the sake of cost efficiency and numerical stability.

• What could be said if actions of the isometry group are used?

• Being isometric, stability is guaranteed.

• The inverse of an isometry matrix is easy.

[Q t; 0 1]⁻¹ = [Q^T −Q^Tt; 0 1].

• The isometry group is larger than the orthogonal group.
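The closed-form inverse of an isometry matrix can be confirmed with a quick numerical check. The dimension and the random data below are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
t = rng.standard_normal(n)                         # a random translation vector

bottom = [np.zeros((1, n)), np.ones((1, 1))]
M = np.block([[Q, t[:, None]], bottom])                   # [Q t; 0 1]
M_inv = np.block([[Q.T, -(Q.T @ t)[:, None]], bottom])    # [Q^T -Q^T t; 0 1]

print(np.allclose(M @ M_inv, np.eye(n + 1)))
```

Forming the inverse therefore costs one transpose and one matrix-vector product, with no linear solve.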

(37)

Actions with Shift or Scaling

• What could be said if actions of the orthogonal group plus shift are used?

µ((Q, s), A) := Q^TAQ + sI,  Q ∈ O(n), s ∈ R⁺.

• What could be said if actions of the orthogonal group with scaling are used?

µ((Q, s), A) := sQ^TAQ,  Q ∈ O(n), s ∈ R^×, or

µ((Q, s, t), A) := diag{s} Q^TAQ diag{t},  Q ∈ O(n), s, t ∈ R^n_×.

(38)

Using the Group Actions

• Given a group G and its action µ on a set V, the associated orbit Orb_G(x) characterizes the rule by which x is to be changed in V.

• An orbit alone is often too “wild” to be readily traced for finding the “simplest form” of x.

• Depending on the applications, a path or differential equation needs to be built on the orbit to connect x to its simplest form.

• A differential equation on the orbit Orb_G(x) is equivalent to a differential equation on the group G.

• Lax dynamics on X (t).

• Parameter dynamics on g1(t) or g2(t).

(39)

Following the Orbits

• To stay in either the orbit or the group, the vector field of the dynamical system must be distributed in the tangent space of the corresponding manifold.

• Most of the tangent spaces for the matrix groups can be calculated explicitly.

• If some kind of objective function has been used to control the connecting bridge, its gradient should be projected to the tangent space.

(40)

Tangent Space in General

• Given a matrix group G ≤ Gl(n), the tangent space to G at A ∈ G can be defined as

T_AG := {γ′(0) | γ is a differentiable curve in G with γ(0) = A}.

• The tangent space g = T_IG at the identity I is critical.

• g is a Lie subalgebra in R^{n×n}, i.e.,

if α′(0), β′(0) ∈ g, then [α′(0), β′(0)] ∈ g.

• The tangent space of a matrix group has the same structure everywhere, i.e.,

T_AG = Ag.

• T_IG can be characterized as the logarithm of G, i.e., g = {M ∈ R^{n×n} | exp(tM) ∈ G, for all t ∈ R}.

(41)

Examples of Tangent Spaces

Group G | Algebra g | Characteristics

Gl(n) | gl(n) | R^{n×n}

Sl(n) | sl(n) | {M ∈ gl(n) : trace(M) = 0}

Aff(n) | aff(n) | {[M t; 0 0] : M ∈ gl(n), t ∈ R^n}

O(n) | o(n) | {K ∈ gl(n) : K is skew-symmetric}

Isom(n) | isom(n) | {[K t; 0 0] : K ∈ o(n), t ∈ R^n}

G₁ × G₂ | T_(e₁,e₂)(G₁ × G₂) | g₁ × g₂

(42)

An Illustration of Projection

• The tangent space of O(n) at any orthogonal matrix Q is T_QO(n) = QK(n)

where

K(n) = {All skew-symmetric matrices}.

• The normal space of O(n) at any orthogonal matrix Q is N_QO(n) = QS(n),

where

S(n) = {All symmetric matrices}.

• The space R^{n×n} is split as

R^{n×n} = QS(n) ⊕ QK(n).

(43)

• A unique orthogonal splitting of X ∈ R^{n×n}:

X = Q(Q^TX) = Q{½(Q^TX − X^TQ)} + Q{½(Q^TX + X^TQ)}.

• The projection of X onto the tangent space T_QO(n) is given by

Proj_{T_QO(n)} X = Q{½(Q^TX − X^TQ)}.
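The splitting and the projection formula translate directly into code. The following sketch uses hypothetical helper names and random test data, and checks that the two components add up to X, are orthogonal in the Frobenius inner product, and that the tangent component has the form Q times a skew-symmetric matrix.

```python
import numpy as np

def project_tangent(Q, X):
    """Component of X in T_Q O(n) = Q K(n): Q times the skew part of Q^T X."""
    return Q @ (0.5 * (Q.T @ X - X.T @ Q))

def project_normal(Q, X):
    """Component of X in N_Q O(n) = Q S(n): Q times the symmetric part of Q^T X."""
    return Q @ (0.5 * (Q.T @ X + X.T @ Q))

rng = np.random.default_rng(2)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
X = rng.standard_normal((n, n))                    # arbitrary matrix to split

T = project_tangent(Q, X)
S = project_normal(Q, X)
inner = np.sum(T * S)      # Frobenius inner product <T, S>
K = Q.T @ T                # should be skew-symmetric

print(np.allclose(T + S, X), abs(inner) < 1e-10, np.allclose(K, -K.T))
```

In a projected gradient method, `project_tangent` would be applied to the gradient of the objective function at Q.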

(44)

Canonical Forms

• A canonical form refers to a “specific structure" by which a certain conclusion can be drawn or a certain goal can be achieved.

• The superlative adjective “simplest" is a relative term which should be interpreted broadly.

• A matrix with a specified pattern of zeros, such as a diagonal, tridiagonal, or triangular matrix.

• A matrix with a specified construct, such as Toeplitz, Hamiltonian, stochastic, or other linear varieties.

• A matrix with a specified algebraic constraint, such as low rank or nonnegativity.

(45)

Examples of Canonical Forms

Canonical form | Also known as | Action

Bidiagonal J | Quasi-Jordan decomposition, A ∈ R^{n×n} | P⁻¹AP = J, P ∈ Gl(n)

Diagonal Σ | Singular value decomposition, A ∈ R^{m×n} | U^TAV = Σ, (U, V) ∈ O(m) × O(n)

Diagonal pair (Σ₁, Σ₂) | Generalized singular value decomposition, (A, B) ∈ R^{m×n} × R^{p×n} | (U^TAX, V^TBX) = (Σ₁, Σ₂), (U, V, X) ∈ O(m) × O(p) × Gl(n)

Upper quasi-triangular H | Real Schur decomposition, A ∈ R^{n×n} | Q^TAQ = H, Q ∈ O(n)

Upper quasi-triangular H, upper triangular U | Generalized real Schur decomposition, A, B ∈ R^{n×n} | (Q^TAZ, Q^TBZ) = (H, U), Q, Z ∈ O(n)

Symmetric Toeplitz T | Toeplitz inverse eigenvalue problem, {λ₁, . . . , λ_n} ⊂ R is given | Q^T diag{λ₁, . . . , λ_n} Q = T, Q ∈ O(n)

Nonnegative N ≥ 0 | Nonnegative inverse eigenvalue problem, {λ₁, . . . , λ_n} ⊂ C is given | P⁻¹ diag{λ₁, . . . , λ_n} P = N, P ∈ Gl(n)

Linear variety X with fixed entries at fixed locations | Matrix completion problem, {λ₁, . . . , λ_n} ⊂ C is given, X_{i_ν,j_ν} = a_ν, ν = 1, . . . , ℓ | P⁻¹{λ₁, . . . , λ_n}P = X, P ∈ Gl(n)

Nonlinear variety with fixed singular values and eigenvalues | Test matrix construction, Λ = diag{λ₁, . . . , λ_n} and Σ = diag{σ₁, . . . , σ_n} are given | P⁻¹ΛP = U^TΣV, P ∈ Gl(n), U, V ∈ O(n)

Maximal fidelity | Structured low rank approximation, A ∈ R^{m×n} | (diag(USS^TU^T))^{−1/2} USV^T, (U, S, V) ∈ O(m) × R^k_× × O(n)

(46)

Objective Functions

• The orbit of a selected group action only defines the rule by which a transformation is to take place.

• Properly formulated objective functions help to control the construction of a bridge between the current point and the desired canonical form on a given orbit.

• The bridge often assumes the form of a differential equation on the manifold.

• The vector field of the differential equation must be distributed over the tangent space of the manifold.

• Corresponding to each differential equation on the orbit of a group action is a differential equation on the group, and vice versa.

• How to choose appropriate objective functions?

(47)

Flows on Orb_O(n)(X) under Conjugation

• Toda lattice arises from a special mass-spring system (Symes’82, Deift et al.’83),

dX/dt = [X, Π₀(X)],  Π₀(X) = X⁻ − (X⁻)^T,  X(0) = tridiagonal and symmetric.

• No specific objective function is used, but physics laws govern the definition of the vector field.

(48)

• Generalization to general matrices is totally by brute force and blindness (and by the then young and desperate researchers) (Chu’84, Watkins’84),

dX/dt = [X, Π₀(G(X))],  G(z) is analytic over the spectrum of X(0).

• But nicely explains the pseudo-convergence and convergence behavior of the classical QR algorithm for general and normal matrices, respectively.

• Sorting of eigenvalues at the limit point is observed, but not quite clearly understood.

(49)

• Double bracket flow (Brockett’88),

dX/dt = [X, [X, N]],  N = fixed and symmetric.

• This is the projected gradient flow of the objective function

Minimize F(Q) := ½ ||Q^TΛQ − N||²,  subject to Q^TQ = I.

• Sorting is necessary in the first order optimality condition (Wielandt&Hoffman’53).

• Take a special N = diag{n, n − 1, . . . , 2, 1}.

• X is tridiagonal and symmetric =⇒ Double bracket flow ≡ Toda lattice (Bloch’90).

• Bingo! The classical Toda lattice does have an objective function in mind.

• X is a general symmetric matrix =⇒ Double bracket = A specially scaled Toda lattice.

(50)

• Scaled Toda lattice (Chu’95),

dX/dt = [X, K ◦ X],  K = fixed and skew-symmetric.

• Flexible in componentwise scaling.

• Enjoys very general convergence behavior.

• But still no explicit objective function in sight.

(51)

Flows on Orb_O(m)×O(n)(X) under Equivalence

• Any flow on the orbit Orb_O(m)×O(n)(X) under equivalence must be of the form

dX/dt = X(t)h(t) − k(t)X(t),  h(t) ∈ K(n), k(t) ∈ K(m).

(52)

• QZ flow (Chu’86),

dX₁/dt = X₁Π₀(X₂⁻¹X₁) − Π₀(X₁X₂⁻¹)X₁,

dX₂/dt = X₂Π₀(X₂⁻¹X₁) − Π₀(X₁X₂⁻¹)X₂.

• SVD flow (Chu’86),

dY/dt = Y Π₀(Y(t)^TY(t)) − Π₀(Y(t)Y(t)^T) Y,  Y(0) = bidiagonal.

• The “objective” in the design of this flow was to maintain the bidiagonal structure of Y(t) for all t.

• The flow gives rise to the Toda flows for Y^TY and YY^T.

• Lotka-Volterra equation (Nakamura’01).
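Two claims about the SVD flow lend themselves to a numerical check: the singular values of Y(t) do not change, and an upper bidiagonal Y(0) stays upper bidiagonal. The sketch below uses an illustrative 4×4 bidiagonal matrix and a Runge-Kutta integrator, both of which are assumptions made for the demonstration.

```python
import numpy as np

def pi0(X):
    L = np.tril(X, -1)
    return L - L.T

def svd_flow_rhs(Y):
    """Right-hand side Y Pi_0(Y^T Y) - Pi_0(Y Y^T) Y of the SVD flow."""
    return Y @ pi0(Y.T @ Y) - pi0(Y @ Y.T) @ Y

def rk4(f, Y, t_end, n_steps):
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = f(Y)
        k2 = f(Y + 0.5 * h * k1)
        k3 = f(Y + 0.5 * h * k2)
        k4 = f(Y + h * k3)
        Y = Y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return Y

# Illustrative upper bidiagonal starting matrix.
Y0 = np.diag([3.0, 2.0, 1.5, 1.0]) + np.diag([1.0, 0.5, 0.8], 1)
Y1 = rk4(svd_flow_rhs, Y0, 1.0, 4000)

sv_start = np.linalg.svd(Y0, compute_uv=False)
sv_end = np.linalg.svd(Y1, compute_uv=False)
outside_band = np.linalg.norm(np.tril(Y1, -1)) + np.linalg.norm(np.triu(Y1, 2))
print("singular values at t = 0:", sv_start)
print("singular values at t = 1:", sv_end)
print("mass outside the bidiagonal band:", outside_band)
```

The singular values are preserved because both Π₀(Y^TY) and Π₀(YY^T) are skew-symmetric, so the flow has the form Yh − kY required on the orbit under equivalence.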

(53)

Projected Gradient Flows

• Given

• A continuous matrix group G ⊂ Gl(n).

• A fixed X ∈ V, where V ⊂ R^{n×n} is a subset of matrices.

• A differentiable map f : V −→ R^{n×n} with certain “inherent” properties, e.g., symmetry, isospectrum, low rank, or other algebraic constraints.

• A group action µ : G × V −→ V.

• A projection map P from R^{n×n} onto a singleton, a linear subspace, or an affine subspace P ⊂ R^{n×n} where matrices in P carry a certain desired structure, e.g., the canonical form.

• Minimize the functional F : G −→ R,

F(g) := ½ ||f(µ(g, X)) − P(µ(g, X))||²_F.

(54)

Flow Approach

• Compute ∇F (g).

• Project ∇F(g) onto T_gG.

• Follow the projected gradient until convergence.

(55)

Some Old Examples

• Brockett’s double bracket flow (Brockett’88).

• Least squares approximation with spectral constraints (Chu&Driessel’90, Nakamura’92-98).

dX/dt = [X, [X, P(X)]].

• Simultaneous reduction problem (Chu’91),

dX_i/dt = [X_i, Σ_{j=1}^{p} ([X_j, P_j^T(X_j)] − [X_j, P_j^T(X_j)]^T)/2],  X_i(0) = A_i.

(56)

• Nearest normal matrix problem (Chu’91),

dW/dt = [W, ½{[W, diag(W^∗)] − [W, diag(W^∗)]^∗}],  W(0) = A.

• Matrix with prescribed diagonal entries and spectrum (Schur-Horn Theorem) (Chu’95),

dX/dt = [X, [diag(X) − diag(a), X]].
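The Schur-Horn flow can be tried on a small example. In the sketch below the starting matrix X₀ and the target diagonal a are made-up data, with a chosen as the constant vector having the same trace as X₀ so that it is majorized by the spectrum; the integrator is again a plain Runge-Kutta scheme. The check is only that the flow keeps the eigenvalues and reduces ||diag(X) − a||.

```python
import numpy as np

def bracket(A, B):
    return A @ B - B @ A

def schur_horn_rhs(X, a):
    """Right-hand side [X, [diag(X) - diag(a), X]]."""
    D = np.diag(np.diag(X) - a)
    return bracket(X, bracket(D, X))

def rk4(f, X, t_end, n_steps):
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = f(X)
        k2 = f(X + 0.5 * h * k1)
        k3 = f(X + 0.5 * h * k2)
        k4 = f(X + h * k3)
        X = X + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

X0 = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.5],
               [0.0, 0.5, 3.0]])    # illustrative symmetric start; fixes the spectrum
a = np.array([2.0, 2.0, 2.0])       # illustrative target diagonal, trace 6 like X0

X_end = rk4(lambda X: schur_horn_rhs(X, a), X0, 10.0, 20000)

miss_start = np.linalg.norm(np.diag(X0) - a)
miss_end = np.linalg.norm(np.diag(X_end) - a)
print("||diag(X) - a|| at t = 0 :", miss_start)
print("||diag(X) - a|| at t = 10:", miss_end)
print("spectrum preserved:",
      np.allclose(np.linalg.eigvalsh(X_end), np.linalg.eigvalsh(X0), atol=1e-5))
```

With D := diag(X) − diag(a), the exact flow satisfies d/dt ½||D||² = −||[D, X]||², which is why the diagonal mismatch cannot increase.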

(57)

• Inverse generalized eigenvalue problem for symmetric-definite pencil (Chu&Guo’98).

dX/dt = −(XW)^T + XW,  dY/dt = −(YW)^T + YW,

W := X(X − P₁(X)) + Y(Y − P₂(Y)).

• Various structured inverse eigenvalue problems (Chu&Golub’02).

(58)

New Thoughts

• The idea of group actions, least squares, and the corresponding gradient flows can be generalized to other structures such as

• Stiefel manifold O(p, q) := {Q ∈ R^{p×q} | Q^TQ = I_q}.

• The manifold of oblique matrices OB(n) := {Q ∈ R^{n×n} | diag(Q^TQ) = I_n}.

• Cone of nonnegative matrices.

• Semigroups.

• Low rank approximation.

• Using the product topology to describe separate groups and actions might broaden the applications.

• Any advantages of using the isometry group over the orthogonal group?

(59)

Stochastic Inverse Eigenvalue Problem

• Construct a stochastic matrix with prescribed spectrum

• A hard problem (Karpelevic’51, Minc’88).

Figure: Θ₄ by the Karpelevič theorem.

• Would be done if the nonnegative inverse eigenvalue problem is solved – a long standing open question.

(60)

Least Squares Formulation

Minimize F(g, R) := ½ ||gJg⁻¹ − R ◦ R||²  subject to g ∈ Gl(n), R ∈ gl(n).

• J = Real matrix carrying spectral information.

• ◦ = Hadamard product.

(61)

Steepest Descent Flow

dg/dt = [(gJg⁻¹)^T, α(g, R)] g^{−T},

dR/dt = 2α(g, R) ◦ R.

• α(g, R) := gJg⁻¹ − R ◦ R.

(62)

ASVD Flow for g

(Bunse-Gerstner et al’91, Wright’92)

g(t) = X(t) S(t) Y(t)^T

dg/dt = (dX/dt) S Y^T + X (dS/dt) Y^T + X S (dY/dt)^T

X^T (dg/dt) Y = Z S + dS/dt + S W,  where Z := X^T (dX/dt) and W := (dY/dt)^T Y.

Define Q := X^T (dg/dt) Y. Then

dS/dt = diag(Q),

dX/dt = XZ,

dY/dt = YW^T.

(63)

Nonnegative Matrix Factorization

• For various applications, given a nonnegative matrix A ∈ R^{m×n}, want to

minimize ½ ||A − VH||²_F  over 0 ≤ V ∈ R^{m×k}, 0 ≤ H ∈ R^{k×n}.

• Relatively new techniques for dimension reduction applications.

• Image processing — no negative pixel values.

• Data mining — no negative frequencies.

• No firm theoretical foundation available yet (Tropp’03).

(64)

• Relatively easy by flow approach! Write V = E ◦ E and H = F ◦ F, and

minimize ½ ||A − (E ◦ E)(F ◦ F)||²_F  over E ∈ R^{m×k}, F ∈ R^{k×n}.

• Gradient flow:

dV/dt = V ◦ ((A − VH)H^T),  dH/dt = H ◦ (V^T(A − VH)).

• Once any entry of either V or H hits 0, it stays zero. This is a natural barrier!

• The first order optimality condition is clear.
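A crude way to follow this gradient flow is a forward Euler discretization with a small step, sketched below. The step size, the number of steps, the problem sizes, and the random data are all assumptions made for the illustration; a serious implementation would use an adaptive ODE solver. Each Euler step multiplies an entry of V or H by a factor of the form 1 + step × (gradient term), so entries stay nonnegative as long as the step is small enough, and an entry that is exactly zero stays zero.

```python
import numpy as np

def nmf_flow(A, V, H, step=1e-3, n_steps=5000):
    """Forward Euler on dV/dt = V o ((A - VH) H^T), dH/dt = H o (V^T (A - VH))."""
    history = []
    for _ in range(n_steps):
        R = A - V @ H                                   # current residual
        history.append(0.5 * np.linalg.norm(R, "fro") ** 2)
        V_new = V + step * V * (R @ H.T)                # '*' is the Hadamard product
        H_new = H + step * H * (V.T @ R)
        V, H = V_new, H_new
    return V, H, history

rng = np.random.default_rng(0)
A = rng.random((6, 3)) @ rng.random((3, 5))   # nonnegative test matrix of rank <= 3
V0 = rng.random((6, 3)) + 0.1                 # strictly positive starting factors
H0 = rng.random((3, 5)) + 0.1

V, H, history = nmf_flow(A, V0, H0)
print("objective at start:", history[0])
print("objective at end  :", history[-1])
print("factors nonnegative:", bool((V >= 0).all() and (H >= 0).all()))
```

The familiar multiplicative update rules for nonnegative matrix factorization have a similar entrywise structure, although they are not the same iteration as this Euler scheme.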

(65)

Initial Structure | Dynamical System | Operator

X₀ = staircase | dX/dt = [X, Π₀(X)] | Π₀(X) = X⁻ − (X⁻)^T

B₀λ − A₀ = staircase | dL/dt = L Π₀(Y⁻¹X) − Π₀(XY⁻¹) L |

B₀ = staircase | dB/dt = B Π₀(B^TB) − Π₀(BB^T) B |

B₀λ − A₀ = Lancaster | dK/dt = ½(CDK − KDC) + N_L^TK + KN_R; dC/dt = (MDK − KDM) + N_L^TC + CN_R; dM/dt = ½(MDC − CDM) + N_L^TM + MN_R | D, N_R, N_L = controls

H₀ = Hamiltonian | dH/dt = [H, P₀(H)] | P₀(X) = [0 −X₂₁^T; X₂₁ 0]

W₀ = skew-Hamiltonian | dW/dt = [W, P₀(W^{1/2})] |

H₀ = Hamiltonian | dH/dt = [H, P₁(H)] | P₁(X) = [Π₀(X₁₁) −X₂₁; X₂₁ Π₀(X₁₁)]

W₀ = skew-Hamiltonian | dW/dt = [W, P₁(W^{1/2})] |

X₀ = general | dX/dt = X P₃(X^TX) − P₃(XX^T) X | P₃ = generalized P₀

H₀ = Hamiltonian | dX/dt = X P₂((X^TJXJ)^{1/2}) − P₁((XJX^TJ)^{1/2}) X | P₂(X) := [−Π₀(X₁₁^T) X₁₂; −X₁₂ −Π₀(X₁₁^T)]

W₀λ − H₀ = sHH | dL/dt = L P₄(W⁻¹H) − P₄(HW⁻¹) L | P₄(X) := [Π₀(X₁₁) −X₂₁^T; X₂₁ −Π₀(X₂₂^T)]

B₀λ − A₀ = Hamiltonian | dL/dt = L P₁(B⁻¹A) − Π₀(AB⁻¹) L |

B₀λ − A₀ = general | dA/dt = A P₂((A^TB^{−T}JB⁻¹AJ)^{1/2}) − P₄(AB⁻¹) A; dB/dt = B P₁((B⁻¹AJA^TB^{−T}J)^{1/2}) − P₄(AB⁻¹) A |

(66)

Conclusion

• Many operations used to transform matrices can be considered as matrix group actions.

• Some basic ideas and examples have been outlined in this talk.

• This view unifies different transformations under the same framework of tracing orbits associated with corresponding group actions.

• More sophisticated actions can be composed that might offer the design of new numerical algorithms.

• Because a matrix group is a special case of a Lie group, its tangent space has the same structure at every one of its elements. Computation is easy and cheap.

• Continuous realization methods often make it possible to tackle existence problems that seem impossible to solve by conventional discrete methods.

(67)

• It is yet to be determined how a dynamical system should be defined over a group so as to locate the simplest form.

• The notion of “simplicity" varies according to the applications.

• Various objective functions should be used to control the dynamical systems.

• This approach usually offers a global method for solving the underlying problem.

• Group actions together with properly formulated objective functions can offer a channel to tackle various classical or new and challenging problems.

(68)

• New computational techniques for structured dynamical systems on matrix groups, the so-called geometric integration, will further extend and benefit this interesting topic.

• Many more chapters to come.