Coordinatewise Distributed Methods for Large Scale Convex Optimization


Paul Tseng

Mathematics, University of Washington, Seattle

ICCOPT, McMaster University August 16, 2007

Abstract: This is a talk given at ICCOPT 2007.


Talk Outline

• Sensor network localization and SDP, SOCP, ESDP relaxations

• Distributed methods for SOCP and ESDP relaxations

• Distributed method for TV-based image restoration

• Extensions


Sensor Network Localization

Basic Problem:

• n pts in $\mathbb{R}^d$ ($d = 1, 2, 3$).

• Know last n − m pts (‘anchors’) $x_{m+1}, \ldots, x_n$ and Eucl. dist. estimate for pairs of ‘neighboring’ pts

$$d_{ij} \ge 0 \quad \forall (i,j) \in \mathcal{A}$$

with $\mathcal{A} \subseteq \{(i,j) : 1 \le i < j \le n\}$.

• Estimate first m pts (‘sensors’).

History? Graph realization, position estimation in wireless sensor networks, determining protein structures, ...


Optimization Problem Formulation

$$\upsilon_{\rm opt} := \min_{x_1,\ldots,x_m} \sum_{(i,j)\in\mathcal{A}} \left( \|x_i - x_j\|^2 - d_{ij}^2 \right)^2$$
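A minimal sketch of evaluating this objective. It assumes 0-based indices and stores the sensors as rows of an (m, d) array, with the anchors stacked after them. The function name `localization_objective` is hypothetical.

```python
import numpy as np


# Hypothetical helper; layout assumptions as stated above.
def localization_objective(X, anchors, pairs, dists):
    """Sum over neighbor pairs of (||x_i - x_j||^2 - d_ij^2)^2.

    X       : (m, d) array, row i is sensor x_i (0-based)
    anchors : (n - m, d) array of known anchor positions
    pairs   : list of (i, j), 0 <= i < j < n, indexing sensors then anchors
    dists   : distance estimates d_ij >= 0, aligned with `pairs`
    """
    P = np.vstack([X, anchors])  # all n points; anchors follow the m sensors
    total = 0.0
    for (i, j), d in zip(pairs, dists):
        sq = float(np.sum((P[i] - P[j]) ** 2))
        total += (sq - d ** 2) ** 2
    return total


# One sensor at 0 and one anchor at 3 on the line, estimate d = 2:
# (3^2 - 2^2)^2 = 25.
print(localization_objective(np.array([[0.0]]), np.array([[3.0]]), [(0, 1)], [2.0]))  # 25.0
```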

• Objective function is smooth but nonconvex; m can be large (m > 1000).

• Problem is NP-hard (reduction from PARTITION).

• Use a convex (SDP, SOCP) relaxation. High soln accuracy unnecessary.

• Seek “simple” distributed methods (important for practical implementation).


SDP Relaxation

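Let $X := [x_1 \cdots x_m]$, $A := [x_{m+1} \cdots x_n]$. Then

$$\upsilon_{\rm opt} = \min_{X,Y} \sum_{(i,j)\in\mathcal{A}} \left( \operatorname{tr}(b_{ij} b_{ij}^T Z) - d_{ij}^2 \right)^2 \quad \text{s.t.} \quad Z = \begin{bmatrix} Y & X^T \\ X & I_d \end{bmatrix} \succeq 0, \quad \operatorname{rank} Z = d$$

with $b_{ij} := \begin{bmatrix} I_m & 0 \\ 0 & A \end{bmatrix} (e_i - e_j)$.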
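The lifted form rests on the identity $\operatorname{tr}(b_{ij} b_{ij}^T Z) = b_{ij}^T Z b_{ij} = \|x_i - x_j\|^2$ when $Y = X^T X$. Below is a small numerical check of that identity. It assumes 0-based indices and takes X to be d × m with columns $x_i$, as on the slide. The helper names are hypothetical.

```python
import numpy as np


# Hypothetical helpers illustrating the lifting; 0-based indices.
def lifted_Z(X, Y):
    """Z = [[Y, X^T], [X, I_d]] with X of shape (d, m) and Y of shape (m, m)."""
    d = X.shape[0]
    return np.block([[Y, X.T], [X, np.eye(d)]])


def b_vector(i, j, m, A):
    """b_ij = [[I_m, 0], [0, A]] (e_i - e_j), A = d x (n - m) anchor matrix.

    i is a sensor (i < m); j is a sensor or an anchor (j >= m).
    """
    d, n_anchor = A.shape
    e = np.zeros(m + n_anchor)
    e[i] += 1.0
    e[j] -= 1.0
    M = np.zeros((m + d, m + n_anchor))
    M[:m, :m] = np.eye(m)
    M[m:, m:] = A
    return M @ e


rng = np.random.default_rng(0)
d, m = 2, 3
X = rng.standard_normal((d, m))   # sensor positions as columns
A = rng.standard_normal((d, 2))   # two anchors as columns
Z = lifted_Z(X, X.T @ X)          # Y = X^T X, so Z = [X I]^T [X I] has rank d
P = np.hstack([X, A])             # all points as columns

for i, j in [(0, 1), (1, 4)]:     # a sensor-sensor and a sensor-anchor pair
    b = b_vector(i, j, m, A)
    print(np.isclose(np.trace(np.outer(b, b) @ Z), np.sum((P[:, i] - P[:, j]) ** 2)))
```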

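SDP relaxation (Biswas, Ye ’03):

$$\upsilon_{\rm sdp} := \min_{X,Y} \sum_{(i,j)\in\mathcal{A}} \left( \operatorname{tr}(b_{ij} b_{ij}^T Z) - d_{ij}^2 \right)^2 \quad \text{s.t.} \quad Z = \begin{bmatrix} Y & X^T \\ X & I_d \end{bmatrix} \succeq 0$$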


However, the SDP relaxation is expensive to solve for large m.


SOCP Relaxation

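$$\upsilon_{\rm opt} = \min_{x_1,\ldots,x_m,\,y_{ij}} \sum_{(i,j)\in\mathcal{A}} \left( y_{ij} - d_{ij}^2 \right)^2 \quad \text{s.t.} \quad y_{ij} = \|x_i - x_j\|^2 \ \ \forall (i,j) \in \mathcal{A}$$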

Relax “=” to “≥” constraint:

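$$\begin{aligned} \upsilon_{\rm socp} &:= \min_{x_1,\ldots,x_m,\,y_{ij}} \sum_{(i,j)\in\mathcal{A}} \left( y_{ij} - d_{ij}^2 \right)^2 \quad \text{s.t.} \quad y_{ij} \ge \|x_i - x_j\|^2 \ \ \forall (i,j) \in \mathcal{A} \\ &= \min_{x_1,\ldots,x_m} f(x_1,\ldots,x_m) := \sum_{(i,j)\in\mathcal{A}} \max\{0, \|x_i - x_j\|^2 - d_{ij}^2\}^2 \end{aligned}$$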

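This is an unconstrained problem, with f smooth, convex, and partially separable. (The last equality holds because, for fixed x, the optimal $y_{ij}$ is $\max\{d_{ij}^2, \|x_i - x_j\|^2\}$.)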
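The sketch below evaluates f and its gradient. It stores the sensors as rows of an (m, d) array with the anchors stacked after them and uses 0-based pair indices. The name `socp_f_and_grad` is hypothetical.

```python
import numpy as np


# Hypothetical helper; anchors are fixed, so only sensor gradients are returned.
def socp_f_and_grad(X, anchors, pairs, dists):
    """f(x) = sum over pairs of max{0, ||x_i - x_j||^2 - d_ij^2}^2 and its gradient.

    X is (m, d) with one sensor per row; anchors is (n - m, d).
    Returns (f, G) where G[i] is the gradient of f with respect to x_i.
    """
    m = X.shape[0]
    P = np.vstack([X, anchors])
    f = 0.0
    G = np.zeros_like(P, dtype=float)
    for (i, j), d in zip(pairs, dists):
        diff = P[i] - P[j]
        r = max(0.0, float(diff @ diff) - d ** 2)  # only over-long pairs are penalized
        f += r ** 2
        G[i] += 4.0 * r * diff    # d/dx_i of r^2 = 2 r * 2 (x_i - x_j)
        G[j] -= 4.0 * r * diff
    return f, G[:m]               # drop rows belonging to anchors
```

Any configuration in which no pair is farther apart than its estimate gives f = 0. This is one way to see why the relaxation can be loose.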


Solve using a coordinate gradient descent (CGD) method (T, Yun ’06):

• If $\|\nabla_{x_i} f\| \ge \text{tol}$, then update $x_i$ by moving it along $-H_i^{-1} \nabla_{x_i} f$, with $H_i \succ 0$ and stepsize by Armijo rule to decrease f, and re-iterate.

Computation is cheap and distributed: only the neighbors $\{x_j : (i,j) \in \mathcal{A} \text{ or } (j,i) \in \mathcal{A}\}$ are needed to update $x_i$. Provable global convergence. Fast convergence in practice.
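Below is a minimal sketch of this CGD loop for the SOCP objective f. It makes several illustrative assumptions: $H_i = hI$ with a fixed scalar h, a sweep over sensors in index order, and particular Armijo parameters σ and β. All names are hypothetical. Note that the update of $x_i$ reads only its neighbors' positions. Because the terms of f that do not involve $x_i$ are unchanged, decreasing the local terms decreases f.

```python
import numpy as np


# Hypothetical helpers; H_i = h * I and Armijo parameters are illustrative choices.
def build_neighbors(m, pairs, dists):
    """Neighbor lists for sensors 0..m-1: nbrs[i] = [(j, d_ij), ...]."""
    nbrs = {i: [] for i in range(m)}
    for (i, j), d in zip(pairs, dists):
        if i < m:
            nbrs[i].append((j, d))
        if j < m:
            nbrs[j].append((i, d))
    return nbrs


def local_f_and_grad(i, P, nbrs):
    """Terms of f involving x_i and their gradient in x_i (reads only neighbors)."""
    fi, gi = 0.0, np.zeros(P.shape[1])
    for j, d in nbrs[i]:
        diff = P[i] - P[j]
        r = max(0.0, float(diff @ diff) - d ** 2)
        fi += r ** 2
        gi += 4.0 * r * diff
    return fi, gi


def cgd_localize(X0, anchors, nbrs, h=1.0, tol=1e-6, sigma=0.1, beta=0.5, max_sweeps=500):
    """Coordinate gradient descent over sensors with H_i = h * I and Armijo steps."""
    m = X0.shape[0]
    P = np.vstack([X0, anchors]).astype(float)  # rows m.. are fixed anchors
    for _ in range(max_sweeps):
        moved = False
        for i in range(m):
            fi, gi = local_f_and_grad(i, P, nbrs)
            if np.linalg.norm(gi) < tol:
                continue                      # x_i is already (nearly) stationary
            direction = -gi / h               # -H_i^{-1} grad_{x_i} f
            slope = float(gi @ direction)     # directional derivative, < 0
            xi, t = P[i].copy(), 1.0
            while t > 1e-12:                  # Armijo backtracking on the local terms
                P[i] = xi + t * direction
                if local_f_and_grad(i, P, nbrs)[0] <= fi + sigma * t * slope:
                    break
                t *= beta
            else:
                P[i] = xi                     # no acceptable step; keep x_i
            moved = True
        if not moved:
            break
    return P[:m]


# Usage on a tiny hypothetical instance: one sensor, two anchors.
anchors = np.array([[0.0, 0.0], [2.0, 0.0]])
pairs, dists = [(0, 1), (0, 2)], [1.5, 1.5]
nbrs = build_neighbors(1, pairs, dists)
X = cgd_localize(np.array([[5.0, 5.0]]), anchors, nbrs)
```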

However, the SOCP relaxation can be significantly weaker than the SDP relaxation.


ESDP Relaxation

Idea: Further relax the constraint $Z \succeq 0$ in the SDP relaxation.

ESDP relaxation (Wang, Zheng, Boyd, Ye ’06):

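$$\upsilon_{\rm esdp} := \min_{X,Y} \sum_{(i,j)\in\mathcal{A}} \left( \operatorname{tr}(b_{ij} b_{ij}^T Z) - d_{ij}^2 \right)^2$$

s.t.

$$\begin{bmatrix} Y_{ii} & Y_{ij} & x_i^T \\ Y_{ij} & Y_{jj} & x_j^T \\ x_i & x_j & I_d \end{bmatrix} \succeq 0 \quad \forall (i,j) \in \mathcal{A} \text{ with } j \le m$$

$$\begin{bmatrix} Y_{ii} & x_i^T \\ x_i & I_d \end{bmatrix} \succeq 0 \quad \forall (i,j) \in \mathcal{A} \text{ with } j > m$$

with $Z = \begin{bmatrix} Y & X^T \\ X & I_d \end{bmatrix}$.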

ESDP is stronger than the SOCP relaxation and weaker than the SDP relaxation. In simulation, ESDP is nearly as strong as the SDP relaxation and is solvable much faster by SeDuMi. Distributed method?
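Each ESDP constraint is a small principal submatrix of Z, of size $(d+2) \times (d+2)$ or $(d+1) \times (d+1)$, so checking it is cheap. The sketch below performs such a check via eigenvalues. It assumes 0-based indices, so the slide's "j ≤ m" becomes `j < m`, and takes X to be d × m. The helper names and the tolerance `tol` are hypothetical.

```python
import numpy as np


# Hypothetical helpers for checking the ESDP constraints numerically.
def is_psd(M, tol=1e-9):
    """True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)


def esdp_feasible(X, Y, m, pairs, tol=1e-9):
    """Check the ESDP constraints for X (d x m, columns x_i) and Y (m x m).

    pairs: 0-based (i, j) with i < j and i a sensor; j < m means j is a sensor.
    """
    d = X.shape[0]
    I = np.eye(d)
    for i, j in pairs:
        if j < m:   # sensor-sensor pair: (d + 2) x (d + 2) block
            M = np.block([
                [np.array([[Y[i, i], Y[i, j]], [Y[i, j], Y[j, j]]]), np.vstack([X[:, i], X[:, j]])],
                [np.column_stack([X[:, i], X[:, j]]), I],
            ])
        else:       # sensor-anchor pair: (d + 1) x (d + 1) block
            M = np.block([
                [np.array([[Y[i, i]]]), X[:, i][None, :]],
                [X[:, i][:, None], I],
            ])
        if not is_psd(M, tol):
            return False
    return True
```

Every principal submatrix of a PSD matrix is PSD, so any Z feasible for the SDP relaxation passes this check. The converse fails in general, which is why ESDP is a relaxation of the SDP.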


Distributed Method for Partially Separable SDP

ESDP has the partially separable form

$$\min_z \ h(z) := \sum_{k=1}^K h_k(z) \quad \text{s.t.} \quad A_k z + B_k \succeq 0, \quad k = 1, \ldots, K$$

with A_k very sparse, B_k low-dim., and h_k convex, C², with ∇²h_k of the same sparsity pattern as A_k.

KKT Optimality conditions:

$$\nabla h(z) - \sum_k A_k^* \Lambda_k = 0,$$

$$0 \preceq \Lambda_k \perp A_k z + B_k \succeq 0, \quad k = 1, \ldots, K$$


Unconstrained reformulation:

$$\min_{z,\Lambda} \ f(z,\Lambda) := \sum_k \psi_{\rm FB}(A_k z + B_k, \Lambda_k) + \Big\| \nabla h(z) - \sum_k A_k^* \Lambda_k \Big\|^2$$

with

$$\psi_{\rm FB}(X, Y) = \big\| (X^2 + Y^2)^{1/2} - X - Y \big\|_F^2.$$

Facts: (T ’98, Sim, Sun, Ralph ’06)

• f is smooth, partially separable, nonneg.

• If a KKT soln exists, then (z, Λ) is a KKT soln ⟺ ∇f(z, Λ) = 0.
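A sketch of $\psi_{\rm FB}$ for symmetric matrices follows. It takes the matrix square root by eigendecomposition, an implementation choice assumed here. $\psi_{\rm FB}$ vanishes exactly when $X \succeq 0$, $Y \succeq 0$ and $XY = 0$, which is the complementarity condition in the KKT system. The names are hypothetical.

```python
import numpy as np


# Hypothetical helpers; the eigendecomposition-based square root is an assumed choice.
def sym_sqrt(M):
    """Square root of a symmetric PSD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def psi_fb(X, Y):
    """Matrix Fischer-Burmeister merit term ||(X^2 + Y^2)^{1/2} - X - Y||_F^2."""
    R = sym_sqrt(X @ X + Y @ Y) - X - Y
    return float(np.sum(R * R))


print(psi_fb(np.diag([1.0, 0.0]), np.diag([0.0, 2.0])))  # complementary pair
print(psi_fb(-np.eye(2), np.zeros((2, 2))))              # X not PSD
```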

Solvable by many methods, but most update all variables at once.

CGD-based distributed method:

• Choose a “small” subset of variables w of (z, Λ). If $\|\nabla_w f\| \ge \text{tol}$, then move w along $-H^{-1} \nabla_w f$, with $H \succ 0$ and stepsize by Armijo rule to decrease f, and re-iterate.
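Below is a generic sketch of one such step on a chosen block w of a vector v. It assumes $H = hI$ and uses illustrative Armijo parameters; the names are hypothetical. For simplicity it is exercised on a small convex quadratic standing in for f, not on the ESDP merit function itself.

```python
import numpy as np


# Hypothetical helper; H = h * I and the Armijo parameters are illustrative.
def cgd_block_step(f, grad, v, block, h=1.0, sigma=0.1, beta=0.5):
    """One CGD step on the coordinates `block` of v, using H = h * I and Armijo.

    For simplicity this computes the full gradient; a distributed version
    would evaluate only the partial gradient grad_w f for the chosen block.
    """
    g = grad(v)[block]
    step = -g / h                    # -H^{-1} grad_w f
    slope = float(g @ step)          # < 0 unless g == 0
    f0, t = f(v), 1.0
    while t > 1e-12:
        trial = v.copy()
        trial[block] += t * step
        if f(trial) <= f0 + sigma * t * slope:
            return trial
        t *= beta
    return v                         # no acceptable step found


# Usage: cycle over blocks of a strongly convex quadratic 0.5 v^T Q v - c^T v.
Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
c = np.array([1.0, 2.0, 3.0])


def quad(v):
    return 0.5 * v @ Q @ v - c @ v


def quad_grad(v):
    return Q @ v - c


v = np.zeros(3)
for _ in range(200):
    for block in (np.array([0]), np.array([1, 2])):
        v = cgd_block_step(quad, quad_grad, v, block)
print(v)
```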


TV-Based Image Restoration

Total variation-based problem for restoring a noisy image b on $\Omega \subset \mathbb{R}^2$ (Rudin, Osher, Fatemi ’92):

$$\min_u \int_\Omega \|\nabla u\| \, dx + \lambda \int_\Omega |b - u|^2 \, dx$$
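The sketch below makes the primal model concrete on a pixel grid. The slide does not specify a discretization, so this one is assumed: forward differences with zero difference at the last row and column, unit grid spacing, and the isotropic norm at each pixel. The names are hypothetical.

```python
import numpy as np


# Hypothetical discretization; grid spacing 1, zero difference at the far edge.
def forward_grad(u):
    """Forward differences along columns (gx) and rows (gy)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy


def rof_energy(u, b, lam):
    """Discrete sum_ij ||(grad u)_ij|| + lam * sum_ij (b_ij - u_ij)^2."""
    gx, gy = forward_grad(u)
    tv = np.sum(np.sqrt(gx ** 2 + gy ** 2))
    return float(tv + lam * np.sum((b - u) ** 2))


u = np.array([[0.0, 1.0], [0.0, 1.0]])  # one vertical edge, two pixels high
print(rof_energy(u, u, 0.7))            # 2.0: TV only, since u = b
```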

Dual has form:

$$\min_w \ f(w) := \int_\Omega |\nabla \cdot w - \lambda b|^2 \, dx \quad \text{s.t.} \quad \|w\| \le 1 \text{ a.e. on } \Omega.$$

When discretized on a grid, this reduces to minimizing a convex, partially separable quadratic function of $w_{ij} \in \mathbb{R}^2$ subject to $\|w_{ij}\| \le 1$.
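The sketch below illustrates the partial separability. It assumes a common discretization that the slide does not fix: forward differences with zero difference at the last row and column, and a discrete divergence defined as minus their adjoint. Under that choice each $w_{ij}$ enters the divergence only at pixel (i, j) and at its right and lower neighbors. The names are hypothetical.

```python
import numpy as np


# Hypothetical discretization; div is defined as minus the adjoint of forward_grad.
def forward_grad(u):
    """Forward differences along columns (gx) and rows (gy); zero at the far edge."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy


def divergence(wx, wy):
    """Discrete divergence with sum(grad(u) . w) = -sum(u * div(w)).

    Assumes at least 2 pixels in each direction.
    """
    div = np.zeros_like(wx)
    div[:, 0] += wx[:, 0]
    div[:, 1:-1] += wx[:, 1:-1] - wx[:, :-2]
    div[:, -1] -= wx[:, -2]
    div[0, :] += wy[0, :]
    div[1:-1, :] += wy[1:-1, :] - wy[:-2, :]
    div[-1, :] -= wy[-2, :]
    return div


def dual_objective(wx, wy, b, lam):
    """Discrete f(w) = sum_ij (div(w)_ij - lam * b_ij)^2."""
    return float(np.sum((divergence(wx, wy) - lam * b) ** 2))
```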


CGD-based distributed method:

• If $\|d_{ij}\| \ge \text{tol}$, where

$$d_{ij} := \arg\min_{\|w_{ij} + d\| \le 1} \ (\nabla_{w_{ij}} f)^T d + \tfrac{1}{2} d^T H_{ij} d$$

with $H_{ij} \succ 0$, then move $w_{ij}$ along $d_{ij}$ with stepsize by Armijo rule to decrease f, and re-iterate.

If $H_{ij}$ is a multiple of $I_2$, then $d_{ij}$ has a closed-form solution.
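With $H_{ij} = h I_2$ and $h > 0$, completing the square shows that $w_{ij} + d_{ij}$ is the projection of $w_{ij} - \nabla_{w_{ij}} f / h$ onto the unit disk. Hence $d_{ij}$ is that projection minus $w_{ij}$. A sketch follows; the name `ball_direction` is hypothetical.

```python
import numpy as np


# Hypothetical helper for the closed-form direction when H_ij = h * I_2.
def ball_direction(w, g, h):
    """d = argmin_{||w + d|| <= 1} g^T d + (h / 2) ||d||^2 for h > 0.

    Completing the square: w + d is the projection of w - g / h onto the unit disk.
    """
    target = w - g / h
    norm = np.linalg.norm(target)
    proj = target if norm <= 1.0 else target / norm
    return proj - w


print(ball_direction(np.array([0.0, 0.0]), np.array([4.0, 0.0]), 1.0))
```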


Extensions

• Partially asynchronous computation, with constant stepsize?

• Simulation and numerical testing?

• Modifications to find a relative interior soln of ESDP?