(1)

Optimization Methods with Signal Denoising Applications

Paul Tseng

Mathematics, University of Washington, Seattle

Taiwan Normal University, March 2, 2006

(Joint works with Sylvain Sardy (EPFL), Andrew Bruce (MathSoft), and Sangwoon Yun (UW))

Abstract

This is a talk given at an optimization seminar (January 2006) and at Taiwan Normal University (March 2006).

(2)

Talk Outline

• Basic Problem Model

◦ Primal-Dual Interior Point Method

◦ Block Coordinate Minimization Method

◦ Applications

• General Problem Model

◦ Block Coordinate Gradient Descent Method

◦ Convergence

◦ Numerical Testing (ongoing)

• Conclusions & Future Work

(3)

Basic Problem Model

[Figure: a wavelet basis function b(t) and its translates, the observed noisy signal, and the denoised signal, sampled at times t₁, …, tₘ; b = (b(t₁), …, b(tₘ))ᵀ collects the observed samples and each column Bⱼ collects the samples of one basis function.]

B₁w₁ + · · · + Bₙwₙ = [B₁ · · · Bₙ] (w₁, …, wₙ)ᵀ = Bw   (n ≥ m)

(4)

Find w so that Bw − b ≈ 0 and w has “few” nonzeros.

Formulate this as an unconstrained convex optimization problem:

P1:  min_{w∈ℝⁿ}  ‖Bw − b‖₂² + c‖w‖₁   (c > 0)   "Basis Pursuit" (Chen, Donoho, Saunders)

Difficulty: Typically m ≥ 1000, n ≥ 8000, and B is dense; ‖·‖₁ is nonsmooth.
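As a concrete illustration of the P1 objective, a minimal numpy sketch (toy-sized B and b; the talk's instances have m ≥ 1000, n ≥ 8000, and the function name is illustrative):

```python
import numpy as np

def basis_pursuit_objective(B, b, w, c):
    """Objective of P1: ||B w - b||_2^2 + c * ||w||_1."""
    residual = B @ w - b
    return residual @ residual + c * np.abs(w).sum()

# Tiny instance: a sparse w reproducing b exactly, so only the penalty remains.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
w = np.array([1.0, 1.0, 0.0])
val = basis_pursuit_objective(B, b, w, c=0.1)   # zero residual + 0.2 penalty
```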


(5)

Primal-Dual Interior Point Method for P1

Idea: Reformulate P1 as a convex QP, and apply a primal-dual IP method.

QP Reformulation of P1:

P1:  min_{w∈ℝⁿ}  ‖Bw − b‖₂² + c‖w‖₁

Substitute w = w⁺ − w⁻ with w⁺ ≥ 0, w⁻ ≥ 0, ‖w‖₁ = eᵀ(w⁺ + w⁻):

min_{w⁺≥0, w⁻≥0}  ‖Bw⁺ − Bw⁻ − b‖₂² + c eᵀ(w⁺ + w⁻)

Equivalently, with A := [B  −B] (m×2n), x := (w⁺; w⁻), and y := b − Ax:

min  ‖y‖₂² + c eᵀx   s.t.   x ≥ 0,   Ax + y = b

(6)

QP Reformulation of P1:

min ‖y‖₂² + c eᵀx   s.t.   x ≥ 0,   Ax + y = b

KKT Optimality Conditions for the QP:

Ax + y = b,  x ≥ 0
Aᵀy + z = ce,  z ≥ 0
Xz = 0   (X = diag[x₁, …, x₂ₙ])

Perturbed KKT Optimality Conditions:

Ax + y = b,  x > 0
Aᵀy + z = ce,  z > 0
Xz = μe   (μ > 0)

Primal-Dual IP method: Apply damped Newton method to solve inexactly the perturbed KKT equations while maintaining x > 0, z > 0. Decrease µ after each iteration. Fiacco-McCormick ’68, Karmarkar ’84,...

(7)

Method description:

Given μ > 0, x > 0, y, z > 0, solve the Newton equations

A∆x + ∆y = b − Ax − y
Aᵀ∆y + ∆z = ce − Aᵀy − z
Z∆x + X∆z = μe − Xz

Update

x_new = x + .99βₚ∆x,   y_new = y + .99β_d∆y,   z_new = z + .99β_d∆z
μ_new = (1 − min{.99, βₚ, β_d})μ

where βₚ = min_{i: ∆xᵢ<0} xᵢ/(−∆xᵢ),   β_d = min_{i: ∆zᵢ<0} zᵢ/(−∆zᵢ).
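The damped step-length rule above can be sketched in a few lines; a hedged numpy illustration with made-up ∆x, ∆z (the helper name `step_lengths` is not from the talk):

```python
import numpy as np

def step_lengths(x, z, dx, dz):
    """beta_p = min over i with dx_i < 0 of x_i / (-dx_i); beta_d likewise
    for (z, dz). Returns 1.0 when no component of the direction is negative."""
    def beta(v, d):
        neg = d < 0
        return float(np.min(v[neg] / -d[neg])) if neg.any() else 1.0
    return beta(x, dx), beta(z, dz)

x = np.array([1.0, 2.0]); dx = np.array([-0.5, 1.0])
z = np.array([1.0, 1.0]); dz = np.array([-2.0, -0.25])
bp, bd = step_lengths(x, z, dx, dz)
# The damped update x + .99*bp*dx, z + .99*bd*dz keeps x > 0, z > 0.
```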

(8)

Implementation & Initialization:

• The Newton equations reduce to (I + AZ⁻¹XAᵀ)∆y = r. Solve by the Conjugate Gradient (CG) method. Multiplication by A (m×2n) and by Aᵀ requires O(m log m) and O(m(log m)²) operations, respectively.

• Initialization as in Chen-Donoho-Saunders '96.

• Theoretical convergence? CG preconditioning?
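The CG solve of the reduced system can be sketched matrix-free; the toy dense M = I + AZ⁻¹XAᵀ below stands in for the fast-transform products the slide describes (illustrative data, not the talk's implementation):

```python
import numpy as np

def conjugate_gradient(matvec, r, tol=1e-10, maxiter=200):
    """Solve M x = r for symmetric positive definite M, given only v -> M v."""
    x = np.zeros_like(r)
    res = r - matvec(x)
    p = res.copy()
    rs = res @ res
    for _ in range(maxiter):
        Mp = matvec(p)
        alpha = rs / (p @ Mp)
        x += alpha * p
        res -= alpha * Mp
        rs_new = res @ res
        if np.sqrt(rs_new) < tol:
            break
        p = res + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
x_diag = rng.uniform(1, 2, 8)
z_diag = rng.uniform(1, 2, 8)
M = np.eye(5) + (A * (x_diag / z_diag)) @ A.T   # I + A Z^{-1} X A^T, SPD
r = rng.standard_normal(5)
dy = conjugate_gradient(lambda v: M @ v, r)
```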

(9)

Block Coord. Minimization Method for P1

Method description:

Given w, choose I ⊆ {1, …, n} with |I| = m such that {Bᵢ}ᵢ∈I is orthogonal. Update

w_new = arg min_{u: uᵢ = wᵢ ∀i∉I}  ‖Bu − b‖₂² + c‖u‖₁   (closed-form solution)

• Choose I to maximize min_{v ∈ ∂_{u_I}(‖Bu − b‖₂² + c‖u‖₁)|_{u=w}} ‖v‖₂. Requires O(m log m) operations using the algorithm of Coifman & Wickerhauser.

• Theoretical convergence: the w-sequence is bounded, and each cluster point solves P1.
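The closed-form block update is coordinatewise soft-thresholding; a minimal numpy sketch, assuming the chosen columns B_I are orthonormal (a special case of the slide's orthogonality condition) and using toy data:

```python
import numpy as np

def soft_threshold(a, t):
    """Componentwise minimizer of (u - a)^2 + 2*t*|u|: shrink a toward 0 by t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def bcm_block_update(B_I, r, c):
    """Closed form of  min_u ||B_I u - r||_2^2 + c ||u||_1  when B_I^T B_I = I:
    the objective separates into (u_i - a_i)^2 + c|u_i| with a = B_I^T r."""
    return soft_threshold(B_I.T @ r, c / 2.0)

# Orthonormal block: two standard basis vectors; r is the residual signal.
B_I = np.eye(3)[:, :2]
r = np.array([1.0, 0.2, 5.0])
u_I = bcm_block_update(B_I, r, c=1.0)   # equals [0.5, 0.0]
```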

(10)

Convergence of the BCM method depends crucially on:

• differentiability of k · k22

• separability of k · k1

• convexity ⇒ global minimum

(11)

Application 1:

Electronic surveillance

[Figure: spectrograms of the observed and denoised signals (bp.cptable.spectrogram, ew.cptable.spectrogram); horizontal axis in microseconds (0-60), vertical axis in KHz (50-65), intensity scale −40 to 20.]

m = 2¹¹ = 2048, c = 4, local cosine transform, all but 4 levels

(12)

Method efficiency:

[Figure: objective value vs. CPU time (log scale), real and imaginary parts, for the IP and BCR (block coordinate) methods; objective range roughly 500-3000.]

Comparing CPU times of IP and BCM methods (S-Plus, Sun Ultra 1).

(13)

ML Estimation P2:

min_w  −ℓ(Bw; b) + c Σ_{i∈J} |wᵢ|   (c > 0)

ℓ is the log-likelihood; {Bᵢ}ᵢ∉J are linearly independent "coarse-scale wavelets".

• −ℓ(y; b) = ½‖y − b‖₂²   (Gaussian noise)

• −ℓ(y; b) = Σᵢ₌₁ᵐ (yᵢ − bᵢ ln yᵢ),  yᵢ ≥ 0   (Poisson noise)

Solve P2 by adapting the IP method. BCM?
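The two negative log-likelihoods above can be written directly; a minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def neg_loglik_gaussian(y, b):
    """-l(y; b) = 0.5 * ||y - b||_2^2 for Gaussian noise."""
    return 0.5 * np.sum((y - b) ** 2)

def neg_loglik_poisson(y, b):
    """-l(y; b) = sum_i (y_i - b_i * ln y_i), y_i > 0, for Poisson noise."""
    return np.sum(y - b * np.log(y))

y = np.array([1.0, 2.0])   # fitted intensities Bw
b = np.array([1.0, 1.0])   # observed counts
g = neg_loglik_gaussian(y, b)   # 0.5
p = neg_loglik_poisson(y, b)    # 3 - ln 2
```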

(14)

Application 2:

Γ ray astronomy

[Figure: four sky-map panels: measurements, Anscombe-based estimate, TIPSH estimate, and ℓ₁-penalized likelihood estimate.]

m = 720 · 360, c chosen by CV, Symmlets of order 8 (levels 3-8). Spatially inhomogeneous Poisson noise.

(15)

But the IP method is slow (many CG steps).

Adapt the BCM method?

(16)

General Problem Model

P3:  min_w  F_c(w) := f(w) + cP(w)   (c ≥ 0)

f : ℝᴺ → ℝ is smooth. P : ℝᴺ → (−∞, ∞] is proper, convex, lsc, and separable: P(w) = Σⱼ₌₁ⁿ Pⱼ(wⱼ)  (w = (w₁, …, wₙ)).

• P(w) = ‖w‖₁

• P(w) = 0 if l ≤ w ≤ u, ∞ otherwise

(17)

Block Coord. Gradient Descent Method for P3

Idea: Do BCM on a quadratic approximation of f.

For w ∈ dom P, I ⊆ {1, …, n}, and H ≻ 0, let d_H(w; I) and q_H(w; I) be the optimal solution and objective value of the direction subproblem

min_{d: dᵢ = 0 ∀i∉I}  { gᵀd + ½ dᵀHd + cP(w + d) − cP(w) },   with g = ∇f(w).

Facts:

• d_H(w; {1, …, n}) = 0  ⇔  F_c′(w; d) ≥ 0 ∀d ∈ ℝᴺ   (stationarity)

• H diagonal  ⇒  d_H(w; I) = Σᵢ∈I d_H(w; i) and q_H(w; I) = Σᵢ∈I q_H(w; i)   (separability)
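For P(w) = ‖w‖₁ and diagonal H, the direction subproblem separates by coordinate and has a soft-thresholding closed form; a sketch under those assumptions (`direction_l1` is an illustrative name):

```python
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def direction_l1(w, g, h, c):
    """d_H(w; I) for P(w) = ||w||_1 and diagonal H = diag(h): per coordinate,
    minimize g_i*d + (h_i/2)*d^2 + c*(|w_i + d| - |w_i|), whose minimizer
    satisfies  w_i + d = soft_threshold(w_i - g_i/h_i, c/h_i)."""
    return soft_threshold(w - g / h, c / h) - w

w = np.array([0.0, 2.0])
g = np.array([-3.0, 0.5])   # gradient of f at w
h = np.array([1.0, 1.0])
d = direction_l1(w, g, h, c=1.0)   # [2.0, -1.5]
```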

(18)

Method description:

Given w ∈ dom P, choose I ⊆ {1, …, n} and H ≻ 0. Let d = d_H(w; I). Update w_new = w + αd (α > 0).

• α = largest element of {1, β, β², …} satisfying F_c(w + αd) − F_c(w) ≤ σα q_H(w; I)   (0 < β < 1, 0 < σ < 1)   (Armijo rule)

• I = {1}, {2}, …, {n}, {1}, {2}, …   (Gauss-Seidel rule)

• ‖d_D(w; I)‖ ≥ υ‖d_D(w; {1, …, n})‖   (0 < υ ≤ 1; D ≻ 0 diagonal, e.g., D = I or D = diag(H))   (Gauss-Southwell-d rule)

• q_D(w; I) ≤ υ q_D(w; {1, …, n})   (Gauss-Southwell-q rule)
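The method description above can be sketched end to end for the special case P(w) = ‖w‖₁, single-coordinate blocks (Gauss-Seidel), and scalar diagonal H = h·I; a hedged illustration, not the authors' implementation:

```python
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def bcgd_l1(grad_f, f, w0, c, h=1.0, beta=0.5, sigma=0.1, iters=100):
    """BCGD for F_c(w) = f(w) + c||w||_1: Gauss-Seidel cycle over coordinates,
    direction from the quadratic model with H = h*I, Armijo backtracking."""
    w = w0.astype(float).copy()
    n = w.size
    Fc = lambda v: f(v) + c * np.abs(v).sum()
    for k in range(iters):
        i = k % n                              # Gauss-Seidel index rule
        g = grad_f(w)[i]
        u = soft_threshold(w[i] - g / h, c / h)
        d = u - w[i]
        if d == 0.0:
            continue
        # q_H(w; {i}): model decrease predicted by the direction subproblem
        q = g * d + 0.5 * h * d * d + c * (abs(u) - abs(w[i]))
        alpha = 1.0                            # Armijo backtracking
        while True:
            w_try = w.copy()
            w_try[i] = w[i] + alpha * d
            if Fc(w_try) - Fc(w) <= sigma * alpha * q:
                break
            alpha *= beta
        w[i] += alpha * d
    return w

# Separable test f(w) = ||w - a||^2; minimizer of F_c is soft_threshold(a, c/2).
a = np.array([3.0, 0.3, -2.0])
w = bcgd_l1(lambda v: 2 * (v - a), lambda v: np.sum((v - a) ** 2),
            np.zeros(3), c=1.0, h=2.0)        # h matches the true curvature
```

With h equal to the true curvature of this quadratic, each coordinate step is exact and the first full cycle already reaches soft_threshold(a, c/2) = [2.5, 0, −1.5].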

(19)

Convergence Results:

(a) If

• 0 < λ ≤ λᵢ(D), λᵢ(H) ≤ λ̄  ∀i,

• α is chosen by Armijo rule,

• I is chosen by G-Seidel or G-Southwell-d or G-Southwell-q,

then every cluster point of the w-sequence generated by BCGD method is a stationary point of Fc.

(b) If in addition P and f satisfy any of the following assumptions, then the w-sequence converges at R-linear rate (excepting G-Southwell-d).

C1: f is strongly convex and ∇f is Lipschitz continuous on dom P.

C2: f is (nonconvex) quadratic, and P is polyhedral.

C3: f(w) = g(Ew) + qᵀw, where E ∈ ℝ^{m×N}, q ∈ ℝᴺ, g is strongly convex, and ∇g is Lipschitz continuous on ℝᵐ. P is polyhedral.

(20)

C4: f(w) = max_{y∈Y} {(Ew)ᵀy − g(y)} + qᵀw, where Y ⊆ ℝᵐ is polyhedral, E ∈ ℝ^{m×N}, q ∈ ℝᴺ, g is strongly convex, and ∇g is Lipschitz continuous on ℝᵐ. P is polyhedral.

Notes:

• BCGD has a stronger global convergence property (and cheaper iterations) than BCM.

• The proof of (b) uses a local error bound on dist(w, {stationary points of F_c}).

(21)

Numerical Testing (ongoing):

• Implement BCGD method in Matlab.

• Numerical tests with f from the Moré-Garbow-Hillstrom set and the CUTEr set (Gould, Orban, Toint '05), P(w) = ‖w‖₁, and different c (e.g., c = .1, 1, 10).

• Comparison with MINOS 5.5.1 (Murtagh, Saunders '05), a Fortran implementation of an active-set method, applied to a reformulation of P3 with P(w) = ‖w‖₁ as

min_{w⁺≥0, w⁻≥0}  f(w⁺ − w⁻) + c eᵀ(w⁺ + w⁻).

• Preliminary results are “promising”.

(22)

f(w)   n     Description
BAL    1000  Brown almost-linear func., nonconvex, dense Hessian.
BT     1000  Broyden tridiagonal func., nonconvex, sparse Hessian.
DBV    1000  Discrete boundary value func., nonconvex, sparse Hessian.
EPS    1000  Extended Powell singular func., convex, 4-block diag. Hessian.
ER     1000  Extended Rosenbrock func., nonconvex, 2-block diag. Hessian.
QD1    1000  f(w) = (Σᵢ₌₁ⁿ wᵢ − 1)², convex, quadratic, rank-1 Hessian.
QD2    1000  f(w) = Σᵢ₌₁ⁿ (wᵢ − (2/(n+1)) Σⱼ₌₁ⁿ wⱼ − 1)² + ((2/(n+1)) Σⱼ₌₁ⁿ wⱼ + 1)², strongly convex, quadratic, dense Hessian.
VD     1000  f(w) = Σᵢ₌₁ⁿ (wᵢ − 1)² + (Σⱼ₌₁ⁿ j(wⱼ − 1))² + (Σⱼ₌₁ⁿ j(wⱼ − 1))⁴, strongly convex, dense ill-conditioned Hessian.

Table 1: Least squares problems from Moré, Garbow, Hillstrom, 1981

(23)

                MINOS                 BCGD-G-Southwell-d    BCGD-G-Southwell-q
f(w)   c      #nz/objec/cpu         #nz/objec/cpu         #nz/objec/cpu
BAL    1      1000/1000/43.9        1000/1000/.1          1000/1000/.1
       10     1000/9999.9/43.9      1000/9999.9/.1        1000/9999.9/.1
       100    1000/99997.5/44.3     1000/99997.5/.1       1000/99997.5/.2
BT     .1     1000/71.725/134.4     999/71.394/4.5        999/71.394/5.0
       1      999/672.41/95.3       21/672.70/292.6       995/991.06/1.3(?)
       10     0/1000/77.7           0/1000/.01            0/1000/.01
DBV    .1     0/0/52.7              0/4.5E-9/.1           0/4.5E-9/.04
       1      0/0/52.9              0/4.5E-9/.1           0/4.5E-9/.04
       10     0/0/53.0              0/4.5E-9/.01          0/4.5E-9/.01
EPS    1      1000/351.14/58.5      500/351.14/.3         500/351.14/.3
       10     243/1250/45.7         250/1250/.05          250/1250/.05
       100    0/1250/50.7           0/1250/.01            0/1250/.02
ER     1      1000/436.25/72.0      1000/436.25/.5        1000/436.25/.4
       10     0/500/51.5            0/500/.1              0/500/.01
       100    0/500/52.4            0/500/.03             0/500/.01
QD1    .1     1000/.0975/29.9       1/.0975/.01           1/.0975/.02
       1      1000/.75/37.8         1/.75/.01             1/.75/.01
       10     0/1/38.6              0/1/.01               0/1/.01
QD2    .1     1000/98.5/74.2        0/98.5/.01            0/98.5/.03
       1      1000/751/75.8         0/751/.01             0/751/.02
       10     0/1001/53.1           0/1001/.01            0/1001/.01
VD     1      1000/937.59/43.9      1000/937.66/856.3     1000/937.66/869.0
       10     413/6726.80/57.1      1000/6746.74/235.7    999/6746.74/246.9
       100    136/55043/57.8        1000/55078/12.6       1000/55078/13.3

Table 2: Performance of MINOS and BCGD with Gauss-Southwell-d/q, with w_init = (1, 1, …, 1)

(24)

Conclusions & Future Work

1. For ML estimation, the ℓ₁-penalty imparts parsimony in the coefficients and avoids oversmoothing the signals.

2. The resulting estimation problem can be solved effectively by the IP method or the BCM method, exploiting the problem structure, including the nondifferentiability of the ℓ₁-norm. Which to use? Depends on the problem.

3. Problem reformulation may be needed.

4. For general problem model, we propose BCGD method. Numerical testing is ongoing.

5. Applications to denoising, regression, SVM?
