
(1)

On the second-order cone complementarity problem

Jein-Shan Chen

jschen@math.ntnu.edu.tw

Department of Mathematics National Taiwan Normal University

(2)

Outline

Introduction

Nonsmooth functions approach

Merit functions approach

Algorithms

Numerical experiments

(3)

I. Introduction

(4)

History

1. NCP (1960): To find x ∈ IR^n satisfying

⟨F(x), x⟩ = 0, F(x) ≥ 0, x ≥ 0.

2. SDCP (1980): To find X ∈ IR^{n×n} satisfying

⟨F(X), X⟩ = 0, F(X) ⪰ O, X ⪰ O.

3. SOCCP (1998): To find x ∈ IR^n satisfying

⟨F(x), x⟩ = 0, F(x) ⪰_{K^n} 0, x ⪰_{K^n} 0.

(5)

What’s SOC ?

The second-order cone (SOC) in IR^n, also called the Lorentz cone, is defined to be

K^n = { (x1, x2) ∈ IR × IR^{n−1} : ‖x2‖ ≤ x1 },

where ‖·‖ denotes the Euclidean norm. If n = 1, let K^n be the set of nonnegative reals IR+.

[Figure: the second-order cone.]
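The membership test ‖x2‖ ≤ x1 translates directly into code. A minimal NumPy sketch (the function name and test vectors are my own illustration):

```python
import numpy as np

def in_soc(x, tol=0.0):
    """Check whether x = (x1, x2) lies in the second-order cone K^n."""
    x = np.asarray(x, dtype=float)
    if x.size == 1:                      # n = 1: K^1 is the nonnegative reals
        return x[0] >= -tol
    return np.linalg.norm(x[1:]) <= x[0] + tol

print(in_soc([2.0, 1.0, 1.0]))   # ||(1,1)|| = sqrt(2) <= 2 -> True
print(in_soc([1.0, 1.0, 1.0]))   # sqrt(2) > 1 -> False
```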

(6)

What’s SOCCP ?

The second-order cone complementarity problem (SOCCP) is to find x, y ∈ IR^n and ζ ∈ IR^l such that

⟨x, y⟩ = 0,

x ∈ K, y ∈ K,

x = F(ζ), y = G(ζ),

where ⟨·, ·⟩ denotes the Euclidean inner product, F, G : IR^l → IR^n are smooth mappings, and K is a Cartesian product of SOCs, that is,

K = K^{n1} × ··· × K^{nm}.

(7)

Why study SOCCP ?

This problem has wide applications, e.g., robust linear programming, filter design, antenna design, etc. (Lobo, Vandenberghe, Boyd, Lebret, 1998)

It includes a large class of quadratically constrained problems and the minimization of sums of Euclidean norms as special cases.

It also includes as a special case the well-known nonlinear complementarity problem (NCP).

Difficulty: K is closed and convex, but non-polyhedral.

(8)

II. Nonsmooth functions approach

(9)

The matrix-valued functions

Let S^n be the space of n × n real symmetric matrices. Any X ∈ S^n has real eigenvalues λ1, λ2, ..., λn and admits a spectral decomposition

X = P diag(λ1, ..., λn) P^T,

where P is orthogonal (i.e., P^T = P^{−1}). Then, for any function f : IR → IR, we define a corresponding matrix-valued function fmat : S^n → S^n by

fmat(X) = P diag(f(λ1), ..., f(λn)) P^T.
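The definition can be checked numerically: with f(t) = t², fmat(X) must agree with the ordinary matrix square. A small NumPy sketch (function name is my own):

```python
import numpy as np

def f_mat(f, X):
    """Matrix-valued function: apply f to the eigenvalues of symmetric X."""
    lam, P = np.linalg.eigh(X)            # X = P diag(lam) P^T
    return P @ np.diag(f(lam)) @ P.T

X = np.array([[2.0, 1.0], [1.0, 2.0]])
# With f(t) = t^2, f_mat(X) must equal the ordinary matrix product X @ X.
print(np.allclose(f_mat(lambda t: t**2, X), X @ X))  # True
```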

(10)

Known results about fmat

(a) fmat is continuous iff f is continuous.

(b) fmat is directionally differentiable iff f is directionally differentiable.

(c) fmat is Fréchet-differentiable iff f is Fréchet-differentiable.

(d) fmat is continuously differentiable iff f is continuously differentiable.

(e) fmat is strictly continuous iff f is strictly continuous.

(f) fmat is Lipschitz continuous with constant κ iff f is Lipschitz continuous with constant κ.

(g) fmat is semismooth iff f is semismooth.

(11)

Remarks

1. "Strictly continuous" is also called locally Lipschitz continuous.

2. Let F : IR^n → IR^m be strictly continuous. Then F is semismooth at x if F is directionally differentiable at x and, for all V ∈ ∂F(x + h),

F(x + h) − F(x) − Vh = o(‖h‖).

3. Convex functions and piecewise continuously differentiable functions are examples of semismooth functions.

(12)

Spectral Factorization

Let x = (x1, x2) ∈ IR × IR^{n−1}. Then x can be decomposed as

x = λ1 u^(1) + λ2 u^(2),

where λ1, λ2 and u^(1), u^(2) are the spectral values and the associated spectral vectors of x, given by

λi = x1 + (−1)^i ‖x2‖,

u^(i) = (1/2) ( 1, (−1)^i x2/‖x2‖ )  if x2 ≠ 0,

u^(i) = (1/2) ( 1, (−1)^i w )  if x2 = 0,

for i = 1, 2, with w being any vector in IR^{n−1} satisfying ‖w‖ = 1. If x2 ≠ 0, the decomposition is unique.
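The decomposition is easy to verify numerically. The following NumPy sketch (function name and test vector are my own) computes the spectral values and vectors and reconstructs x from them:

```python
import numpy as np

def spectral(x):
    """Spectral values and vectors of x = (x1, x2) w.r.t. K^n."""
    x = np.asarray(x, dtype=float)
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    # Any unit vector works when x2 = 0; pick the first coordinate vector.
    w = x2 / nrm if nrm > 0 else np.eye(len(x2))[0]
    lam = [x1 - nrm, x1 + nrm]                  # lambda_i = x1 + (-1)^i ||x2||
    u = [0.5 * np.concatenate(([1.0], s * w)) for s in (-1.0, 1.0)]
    return lam, u

x = np.array([3.0, 1.0, 2.0])
lam, u = spectral(x)
# The decomposition x = lam1*u1 + lam2*u2 must reconstruct x.
print(np.allclose(lam[0] * u[0] + lam[1] * u[1], x))  # True
```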

(13)

The SOC-functions

For any function f : IR → IR, we define a corresponding function on IRn associated with SOC by

fsoc(x) = f (λ1) u(1) + f (λ2) u(2),

for all x = (x1, x2) ∈ IR × IRn−1, where λ1, λ2 and u(1), u(2) are the spectral values and vectors of x.

If f is defined only on a subset of IR, then fsoc is defined on the corresponding subset of IRn.

(14)

Parallel results about fsoc

(a) fsoc is continuous iff f is continuous.

(b) fsoc is directionally differentiable iff f is directionally differentiable.

(c) fsoc is Fréchet-differentiable iff f is Fréchet-differentiable.

(d) fsoc is continuously differentiable iff f is continuously differentiable.

(e) fsoc is strictly continuous iff f is strictly continuous.

(f) fsoc is Lipschitz continuous with constant κ iff f is Lipschitz continuous with constant κ.

(g) fsoc is semismooth iff f is semismooth.

(15)

A bridge from fmat to fsoc

For any x = (x1, x2) ∈ IR × IR^{n−1}, let λ1, λ2 be its spectral values. Then:

(a) For any t ∈ IR, the matrix Lx + t Mx2 has eigenvalues λ1, λ2, and x1 + t of multiplicity n − 2, where

Lx = [ x1  x2^T ; x2  x1 I ]  and  Mx2 = [ 0  0 ; 0  I − x2 x2^T/‖x2‖² ].

(b) For any f : IR → IR and t ∈ IR, we have

fsoc(x) = fmat( Lx + t Mx2 ) e,

where e = (1, 0, ..., 0)^T ∈ IR^n.

(16)

A reformulation of SOCCP

Proposition [Fukushima-Luo-Tseng, 2002]

⟨x, y⟩ = 0, x ∈ K, y ∈ K, F(x, y, ζ) = 0

⇕

H(x, y, ζ) := ( x − [x − y]+ ; F(x, y, ζ) ) = 0,

where [·]+ : IR^n → K denotes the nearest-point projection onto K.

(17)

In Summary

SOCCP ⇐⇒ H(x, y, ζ) = 0.

H is nonsmooth due to the nonsmoothness of the projection operator [·]+.

[·]+ is semismooth so that H is semismooth.

This approach can be used to design and analyze nonsmooth Newton-type methods for solving

H(x, y, ζ) = 0.

(18)

III. Merit functions approach

(19)

Merit function approach

We will show that the SOCCP can be reformulated as an unconstrained differentiable minimization problem. The objective function of such an unconstrained minimization problem is called a merit function.

For simplicity, we assume m = 1, so K = Kn.

(20)

Jordan product

For any x = (x1, x2) ∈ IR × IR^{n−1} and y = (y1, y2) ∈ IR × IR^{n−1}, we define their Jordan product associated with K^n as

x ◦ y = ( ⟨x, y⟩, y1 x2 + x1 y2 ).

The identity element under this product is e := (1, 0, ..., 0)^T ∈ IR^n.

(21)

Basic Property

Property

(a) e ◦ x = x , ∀x ∈ IRn .

(b) x ◦ y = y ◦ x , ∀x, y ∈ IRn .

(c) (x + y) ◦ z = x ◦ z + y ◦ z , ∀x, y, z ∈ IRn .

Remarks

1. The Jordan product is not associative.

2. Kn is not closed under Jordan product.
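These properties and remarks are easy to probe numerically. In the sketch below (NumPy, with arbitrary example vectors of my own choosing), the identity and commutativity hold, while associativity fails:

```python
import numpy as np

def jprod(x, y):
    """Jordan product x ∘ y = (<x, y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

x = np.array([1.0, 2.0, 0.0])
y = np.array([3.0, 0.0, 1.0])
z = np.array([0.0, 1.0, 1.0])
e = np.array([1.0, 0.0, 0.0])

print(np.allclose(jprod(e, x), x))            # identity: e ∘ x = x -> True
print(np.allclose(jprod(x, y), jprod(y, x)))  # commutative -> True
# Not associative in general: (x ∘ y) ∘ z != x ∘ (y ∘ z)
print(np.allclose(jprod(jprod(x, y), z), jprod(x, jprod(y, z))))  # False
```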

(22)

Jordan Product associated with SOC

We write x² to mean x ◦ x and write x + y to mean the usual componentwise addition of vectors. Then we have:

If x ∈ K^n, then there exists a unique vector in K^n, denoted by x^{1/2}, such that (x^{1/2})² = x^{1/2} ◦ x^{1/2} = x.

For any x ∈ IR^n, we have x² ∈ K^n. Hence, there exists a unique vector (x²)^{1/2} ∈ K^n, denoted by |x|.

For any x ∈ IR^n, we have x² = |x|².
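The square root x^{1/2} can be computed by applying t ↦ √t to the spectral values of x. A NumPy sketch (the max(·, 0) guard against rounding is my own addition) verifies (x^{1/2})² = x:

```python
import numpy as np

def jprod(x, y):
    """Jordan product x ∘ y = (<x, y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(x):
    """x^{1/2}: apply sqrt to the spectral values of x (x assumed in K^n)."""
    x1, x2 = x[0], np.asarray(x[1:], dtype=float)
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.zeros_like(x2)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    # max(..., 0) guards against tiny negative values from rounding
    return np.sqrt(max(x1 - nrm, 0.0)) * u1 + np.sqrt(x1 + nrm) * u2

x = np.array([3.0, 1.0, 2.0])      # in K^3 since ||(1, 2)|| < 3
s = soc_sqrt(x)
print(np.allclose(jprod(s, s), x))  # (x^{1/2})^2 = x -> True
```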

(23)

A partial order associated with K^n

For any x, y in IRn, we denote

x Kn y ⇐⇒ x − y ∈ Kn, and

x Kn y ⇐⇒ x − y ∈ int(Kn).

Then we have:

Any x ∈ IRn satisfies |x| Kn x .

For any x, y Kn 0, if x Kn y, then x1/2 Kn y1/2 .

For any x, y ∈ IRn, if x2 Kn y2 , then |x| Kn |y| .

(24)

Fischer-Burmeister function

We define φ : IR^n × IR^n → IR^n by

φ(x, y) = (x² + y²)^{1/2} − (x + y).

φ is the Fischer-Burmeister function and is well-defined here.

(25)

A merit function

We define ψ : IR^n × IR^n → IR by

ψ(x, y) = ‖φ(x, y)‖².

It is proved by Fukushima-Luo-Tseng in 2002 that

ψ(x, y) = 0 ⇔ x, y ∈ K^n, ⟨x, y⟩ = 0.
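Both directions of this equivalence can be probed numerically. The sketch below (NumPy; the complementary pair x = (1, 1, 0), y = (1, −1, 0) is my own example) computes ψ via the Jordan square and the SOC square root:

```python
import numpy as np

def jprod(x, y):
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(x):
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    w = x2 / nrm if nrm > 0 else np.zeros_like(x2)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return np.sqrt(max(x1 - nrm, 0.0)) * u1 + np.sqrt(x1 + nrm) * u2

def psi(x, y):
    """Fischer-Burmeister merit: ||(x^2 + y^2)^{1/2} - (x + y)||^2."""
    phi = soc_sqrt(jprod(x, x) + jprod(y, y)) - (x + y)
    return phi @ phi

x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 0.0])
print(np.isclose(psi(x, y), 0.0))  # <x,y> = 0 and x, y in K^3 -> psi = 0
e = np.array([1.0, 0.0, 0.0])
print(psi(e, e) > 0)               # <e,e> = 1 != 0 -> psi > 0
```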

(26)

A smooth reformulation of SOCCP

Hence, we can reformulate the SOCCP as an equivalent unconstrained minimization problem:

(SOCCP) Find x, y ∈ IR^n, ζ ∈ IR^l satisfying

⟨x, y⟩ = 0, x, y ∈ K^n,

x = F(ζ), y = G(ζ)

⇕

min_{ζ ∈ IR^l} { f(ζ) = ψ(F(ζ), G(ζ)) }.

Thus f is a merit function for the SOCCP.

(27)

Proposition

The function ψ is smooth everywhere. Moreover,

(i) If x² + y² ∈ int(K^n), then

∇x ψ(x, y) = ( Lx L^{−1}_{(x²+y²)^{1/2}} − I ) 2φ(x, y),

∇y ψ(x, y) = ( Ly L^{−1}_{(x²+y²)^{1/2}} − I ) 2φ(x, y),

where Lx = [ x1  x2^T ; x2  x1 I ].

(ii) If x² + y² ∉ int(K^n), i.e., ‖x‖² + ‖y‖² = 2‖x1 x2 + y1 y2‖, then

∇x ψ(x, y) = ( Lx̄ − I ) 2φ(x, y),

∇y ψ(x, y) = ( Lȳ − I ) 2φ(x, y).

(28)

Two Important Lemmas

For any x = (x1, x2), y = (y1, y2) ∈ IR × IR^{n−1} with x² + y² ∉ int(K^n), we have

x1² = ‖x2‖², y1² = ‖y2‖², x1 y1 = x2^T y2, x1 y2 = y1 x2.

For any x = (x1, x2), y = (y1, y2) ∈ IR × IR^{n−1} with x1 x2 + y1 y2 ≠ 0, we have

( x1 − (x1 x2 + y1 y2)^T x2 / ‖x1 x2 + y1 y2‖ )² + ‖ x2 − x1 (x1 x2 + y1 y2) / ‖x1 x2 + y1 y2‖ ‖²

≤ ‖x‖² + ‖y‖² − 2‖x1 x2 + y1 y2‖.

(29)

Proposition

Let f(ζ) = ψ(F(ζ), G(ζ)). Then f is smooth and, for every ζ ∈ IR^n with ∇F(ζ), −∇G(ζ) column monotone, we have either (i) f(ζ) = 0 or (ii) ∇f(ζ) ≠ 0.

In case (ii), if ∇G(ζ) is invertible, then ⟨d(ζ), ∇f(ζ)⟩ < 0, where

d(ζ) := −( ∇G(ζ)^{−1} )^T ∇x ψ(F(ζ), G(ζ)).

Note: We say M, N ∈ IR^{n×n} are column monotone if, for any u, v ∈ IR^n,

Mu + Nv = 0 ⇒ u^T v ≥ 0.

(30)

Advantages

Every stationary point ζ of f is a global minimum if

∇F (ζ), −∇G(ζ) are column monotone.

For every non-stationary point ζ of f, we provide a descent direction d(ζ) without computing the Jacobian of F(ζ).

The assumption that ∇F(ζ), −∇G(ζ) are column monotone is reasonable and holds for the second-order cone program (SOCP).

(31)

Another merit function

Another merit function, motivated by Yamashita and Fukushima, is defined as

ψYF(x, y) = ψ0(⟨x, y⟩) + ψ(x, y),

where ψ0 : IR → [0, ∞) is any smooth function satisfying

ψ0(t) = 0 ∀t ≤ 0 and ψ0′(t) > 0 ∀t > 0.

(32)

Remarks

An example of ψ0 is ψ0(t) = (1/4)(max{0, t})⁴. Let fYF(ζ) = ψYF(F(ζ), G(ζ)). It follows from properties of ψ and ψ0 that fYF is also a smooth merit function.

fYF has properties similar to those of f. In addition, fYF has bounded level sets and provides error bounds when F, G are jointly strongly monotone, whereas f does not.

(33)

Proposition

Let fYF(ζ) := ψYF(F(ζ), G(ζ)). Then fYF is smooth and, for every ζ ∈ IR^n with ∇F(ζ), −∇G(ζ) column monotone, either (i) fYF(ζ) = 0 or (ii) ∇fYF(ζ) ≠ 0.

In case (ii), if ∇G(ζ) is invertible, then ⟨dYF(ζ), ∇fYF(ζ)⟩ < 0, where

dYF(ζ) := −( ∇G(ζ)^{−1} )^T [ ψ0′(⟨F(ζ), G(ζ)⟩) G(ζ) + ∇x ψ(F(ζ), G(ζ)) ].

(34)

Error bounds for fYF

Suppose that F and G are jointly strongly monotone mappings from IR^n to IR^n. Also, suppose that the SOCCP has a solution ζ*. Then there exists a scalar τ > 0 such that

τ ‖ζ − ζ*‖² ≤ max{ 0, ⟨F(ζ), G(ζ)⟩ } + ‖(−F(ζ))+‖ + ‖(−G(ζ))+‖, ∀ζ ∈ IR^n.

Moreover,

τ ‖ζ − ζ*‖² ≤ ψ0^{−1}( fYF(ζ) ) + 2√2 fYF(ζ)^{1/2}, ∀ζ ∈ IR^n.

(35)

Remark

F and G are jointly strongly monotone if there exists ρ > 0 such that

⟨F(ζ) − F(ξ), G(ζ) − G(ξ)⟩ ≥ ρ ‖ζ − ξ‖², for all ζ, ξ ∈ IR^n.

(36)

Bounded level sets for fYF

Suppose that F and G are differentiable, jointly strongly monotone mappings from IRn to IRn. Then the level set

L(γ) := {ζ ∈ IRn | fYF(ζ) ≤ γ}

is nonempty and bounded for all γ ≥ 0.

Remark:

The merit function f lacks these properties due to the absence of the term ψ0(⟨F(ζ), G(ζ)⟩).

(37)

IV. Algorithms

(38)

Solve SOCP via merit function

The second-order cone program (SOCP) is

min c^T x
s.t. Ax = b, x ∈ K,

where A ∈ IR^{m×n} has full rank and m < n. The KKT optimality conditions can be reformulated as

min_{ζ ∈ IR^n} { f(ζ) = ψ(F(ζ), G(ζ)) },

where

F(ζ) = d + ( I − A^T(AA^T)^{−1}A ) ζ,
G(ζ) = c − A^T(AA^T)^{−1}A ζ.

(39)

Algorithm (1)

• FR-CG method (Fletcher-Reeves):

ζ^{k+1} = ζ^k + α_k d^k,

d^k = −∇f(ζ^k) + β^FR_k d^{k−1},

where β^FR_k is given by

β^FR_k = 0 for k = 1,

β^FR_k = ∇f(ζ^k)^T ∇f(ζ^k) / ∇f(ζ^{k−1})^T ∇f(ζ^{k−1}) for k ≥ 2.
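On a strictly convex quadratic with exact line search, FR-CG reduces to linear conjugate gradients and terminates in n steps. A minimal NumPy sketch (the 2×2 test problem is my own illustration):

```python
import numpy as np

def fr_cg(grad, hess_vec, z, iters):
    """Fletcher-Reeves CG on a quadratic, with exact line search."""
    g = grad(z)
    d = -g                                 # beta = 0 at the first iteration
    for _ in range(iters):
        Ad = hess_vec(d)
        alpha = -(g @ d) / (d @ Ad)        # exact step for a quadratic
        z = z + alpha * d
        g_new = grad(z)
        beta = (g_new @ g_new) / (g @ g)   # FR formula
        d = -g_new + beta * d
        g = g_new
    return z

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
z = fr_cg(lambda z: A @ z - b, lambda d: A @ d, np.zeros(2), iters=2)
print(np.allclose(A @ z, b))  # minimizer of (1/2) z^T A z - b^T z -> True
```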

(40)

Algorithm (2)

• PR-CG method (Polak-Ribière):

ζ^{k+1} = ζ^k + α_k d^k,

d^k = −∇f(ζ^k) + β^PR_k d^{k−1},

where β^PR_k is given by

β^PR_k = ∇f(ζ^k)^T ( ∇f(ζ^k) − ∇f(ζ^{k−1}) ) / ∇f(ζ^{k−1})^T ∇f(ζ^{k−1}).

Remark: Powell, in 1986, suggested modifying the PR-CG method by setting β_k = max{ β^PR_k, 0 }.

(41)

Algorithm (3)

• BFGS method (Broyden-Fletcher-Goldfarb-Shanno):

ζ^{k+1} = ζ^k + α_k d^k,  d^k = −D_k ∇f(ζ^k),

where

D_{k+1} = D_k + ( 1 + (q^k)^T D_k q^k / (p^k)^T q^k ) p^k (p^k)^T / (p^k)^T q^k − ( D_k q^k (p^k)^T + p^k (q^k)^T D_k ) / (p^k)^T q^k,

and

p^k = ζ^{k+1} − ζ^k,  q^k = ∇f(ζ^{k+1}) − ∇f(ζ^k).

Remark: Needs large storage when the dimension is large.
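A sanity check on this update is the secant equation D_{k+1} q^k = p^k, which the formula satisfies by construction. A NumPy sketch (random pair, sign-flipped to enforce the curvature condition (p^k)^T q^k > 0):

```python
import numpy as np

def bfgs_update(D, p, q):
    """Inverse-Hessian BFGS update from the pair (p, q)."""
    pq = p @ q
    return (D
            + (1.0 + (q @ D @ q) / pq) * np.outer(p, p) / pq
            - (np.outer(D @ q, p) + np.outer(p, q @ D)) / pq)

rng = np.random.default_rng(0)
D = np.eye(3)
p, q = rng.standard_normal(3), rng.standard_normal(3)
if p @ q < 0:
    q = -q                     # enforce curvature condition p^T q > 0
D1 = bfgs_update(D, p, q)
print(np.allclose(D1 @ q, p))  # secant equation D_{k+1} q^k = p^k -> True
```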

(42)

Algorithm (4)

• L-BFGS method: Rewrite the BFGS update as

D_{k+1} = (V^k)^T D_k V^k + ρ_k p^k (p^k)^T,

where

ρ_k = 1 / (q^k)^T p^k,  V^k = I − ρ_k q^k (p^k)^T,

and

D_k = (V^T_{k−1} ··· V^T_{k−m}) D^0_k (V_{k−m} ··· V_{k−1})
  + ρ_{k−m} (V^T_{k−1} ··· V^T_{k−m+1}) p^{k−m} (p^{k−m})^T (V_{k−m+1} ··· V_{k−1})
  + ρ_{k−m+1} (V^T_{k−1} ··· V^T_{k−m+2}) p^{k−m+1} (p^{k−m+1})^T (V_{k−m+2} ··· V_{k−1})
  + ··· + ρ_{k−1} p^{k−1} (p^{k−1})^T.

(43)

Remarks

Dk+1 is obtained by updating Dk using the pair {pk, qk}.

We store a modified version of Dk implicitly by

storing the most recently computed m pairs {pk, qk}.

When the (m + 1)-st pair is computed, the oldest pair is discarded and its location in memory is taken by the new pair.

L-BFGS is suitable for large problems because it has been observed in practice that small values of m (3 ≤ m ≤ 20) give satisfactory results.
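In practice D_k is never formed explicitly; the product D_k ∇f(ζ^k) is computed by the standard two-loop recursion over the stored pairs. A sketch (assuming D^0_k = γI; the test pair is my own). Note that with one stored pair the implicit D_k satisfies the secant equation, so the direction for g = q^k is exactly −p^k:

```python
import numpy as np

def lbfgs_direction(g, pairs, gamma=1.0):
    """L-BFGS two-loop recursion: returns -D_k g from stored (p, q) pairs."""
    d = np.array(g, dtype=float)
    stack = []
    for p, q in reversed(pairs):           # newest pair first
        rho = 1.0 / (q @ p)
        a = rho * (p @ d)
        d -= a * q
        stack.append((rho, a, p, q))
    d *= gamma                             # initial matrix D_k^0 = gamma * I
    for rho, a, p, q in reversed(stack):   # oldest pair first
        b = rho * (q @ d)
        d += (a - b) * p
    return -d

p = np.array([1.0, 0.0, 1.0])
q = np.array([2.0, 1.0, 0.0])              # p^T q = 2 > 0
# The implicit D_k satisfies D_k q = p, so the direction for g = q is -p.
print(np.allclose(lbfgs_direction(q, [(p, q)]), -p))  # True
```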

(44)

Initial scalings of D^0_k

Liu and Nocedal studied four different types of initial scalings of D^0_k for L-BFGS.

Scaling 1: D^0_k = D_0 (no scaling).

Scaling 2: D^0_k = γ_0 D_0 (only initial scaling).

Scaling 3: D^0_k = γ_k D_0, where γ_k = (q^k)^T p^k / ‖q^k‖².

Scaling 4: Same as Scaling 3 during the first m iterations. For k > m, D^0_k = diag(ω^k_i), where

ω^k_i = ( p^{k−1}_i q^{k−1}_i + ··· + p^{k−m}_i q^{k−m}_i ) / ( (q^{k−1}_i)² + ··· + (q^{k−m}_i)² ),  i = 1, ..., n.

(45)

Step-size

Armijo rule : Fix β, σ ∈ (0, 1) and s > 0, choose αk that is the largest α ∈ {s, sβ, sβ2, · · ·} satisfying

f (ζk + αdk) − f(ζk)

α ≤ σ∇f(ζk)T dk, i.e.,

f (ζk + αdk) ≤ f(ζk) + σα∇f(ζk)T dk.
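The Armijo backtracking rule translates directly into code. A minimal sketch (Python; the test function f = ‖z‖² and the parameter defaults are my own illustration):

```python
import numpy as np

def armijo_step(f, grad_f, z, d, s=1.0, beta=0.5, sigma=1e-4):
    """Largest alpha in {s, s*beta, s*beta^2, ...} with sufficient decrease."""
    fz = f(z)
    slope = grad_f(z) @ d                  # must be negative for a descent d
    alpha = s
    while f(z + alpha * d) > fz + sigma * alpha * slope:
        alpha *= beta
    return alpha

f = lambda z: float(z @ z)                 # simple test function ||z||^2
grad = lambda z: 2.0 * z
z = np.array([1.0, 1.0])
d = -grad(z)                               # steepest-descent direction
alpha = armijo_step(f, grad, z, d)
print(f(z + alpha * d) < f(z))             # Armijo guarantees decrease -> True
```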

(46)

V. Numerical Experiments

(47)

Test Problems

Table 1: Set of test problems

Problem       | n    | m    | # of nonzero elts of A | structure of SOCs
nb            | 2383 | 123  | 192439                 | [4 × 1; 793 × 3]
nql30         | 6302 | 3680 | 26819                  | [3602 × 1; 900 × 3]
qssp30        | 7568 | 3691 | 36851                  | [2 × 1; 1891 × 4]
nb-L2-bessel  | 2641 | 123  | 209924                 | [4 × 1; 1 × 123; 838 × 3]

In the structure of SOCs, for example, [4 × 1; 1 × 123; 838 × 3] means that K consists of the product of 4 K^1, one K^123, and 838 K^3.

(48)

FR-CG Convergence

[Figure: merit function value vs. iterations for FR-CG; the value decreases from about 10^1 to 10^{−5} over roughly 3000 iterations.]

(49)

PR-CG Convergence

[Figure: merit function value vs. iterations for PR-CG; the value decreases from about 10^1 to 10^{−5} over roughly 2000 iterations.]

(50)

L-BFGS Convergence 1

[Figure: merit function value vs. iterations for L-BFGS; the value decreases from about 10^1 to 10^{−5} within about 250 iterations.]

(51)

L-BFGS Convergence 2

[Figure: merit function value vs. iterations for L-BFGS; the value decreases from about 10^1 to 10^{−6} within about 250 iterations.]

References

J.-S. Chen, Conditions for error bounds and bounded level sets of some merit functions for the second-order cone complementarity problem, Journal of Optimization Theory and Applications.
