On the second-order cone complementarity problem
Jein-Shan Chen
jschen@math.ntnu.edu.tw
Department of Mathematics National Taiwan Normal University
Outline
Introduction
Nonsmooth functions approach
Merit functions approach
Algorithms
Numerical experiments
I. Introduction
History
1. NCP (1960): To find x ∈ IRn satisfying
⟨F(x), x⟩ = 0 , F(x) ≥ 0 , x ≥ 0.
2. SDCP (1980): To find X ∈ IRn×n satisfying
⟨F(X), X⟩ = 0 , F(X) ⪰ O , X ⪰ O.
3. SOCCP (1998): To find x ∈ IRn satisfying
⟨F(x), x⟩ = 0 , F(x) ⪰_{Kn} 0 , x ⪰_{Kn} 0.
What’s SOC ?
The second-order cone (SOC) in IRn, also called Lorentz cone, is defined to be
Kn = {(x1, x2) ∈ IR × IRn−1 : ‖x2‖ ≤ x1},
where ‖·‖ denotes the Euclidean norm. If n = 1, let Kn be the set of nonnegative reals IR+.
What’s SOCCP ?
The second-order cone complementarity problem (SOCCP) is to find x, y ∈ IRn and ζ ∈ IRl such that
⟨x, y⟩ = 0 ,
x ∈ K , y ∈ K ,
x = F(ζ) , y = G(ζ) ,
where ⟨·, ·⟩ denotes the Euclidean inner product, F, G : IRl → IRn are smooth mappings, and K is a Cartesian product of second-order cones, that is,
K = Kn1 × · · · × Knm.
Why study SOCCP ?
This problem has wide applications, e.g., robust linear programming, filter design, antenna design, etc. (Lobo, Vandenberghe, Boyd, Lebret, 1998)
It includes a large class of quadratically constrained problems and minimization of sum of Euclidean
norms as special cases.
It also includes as a special case the well-known nonlinear complementarity problem (NCP).
Difficulty: K is closed and convex, but non-polyhedral.
II. Nonsmooth functions approach
The matrix-valued functions
Let Sn be the space of n × n real symmetric matrices. Any X ∈ Sn has real eigenvalues λ1, λ2, · · · , λn and admits a spectral decomposition
X = P diag(λ1, · · · , λn) Pᵀ,
where P is orthogonal (i.e., Pᵀ = P⁻¹). Then, for any function f : IR → IR, we define a corresponding matrix-valued function fmat : Sn → Sn by
fmat(X) = P diag(f(λ1), · · · , f(λn)) Pᵀ.
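As a quick illustration (not from the original slides), this definition can be computed directly from an eigendecomposition; the sketch below uses NumPy's `eigh`, which returns the eigenvalues and an orthogonal P for a symmetric X:

```python
import numpy as np

def f_mat(f, X):
    """f_mat(X) = P diag(f(lam_1), ..., f(lam_n)) P^T, where
    X = P diag(lam_1, ..., lam_n) P^T is a spectral decomposition of X."""
    lam, P = np.linalg.eigh(X)            # eigenvalues and orthogonal P
    return P @ np.diag(f(lam)) @ P.T
```

For example, f(t) = t² yields fmat(X) = X², and f(t) = exp(t) yields the matrix exponential of X.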
Known results about fmat
(a) fmat is continuous iff f is continuous.
(b) fmat is directionally differentiable iff f is directionally differentiable.
(c) fmat is Fréchet-differentiable iff f is Fréchet-differentiable.
(d) fmat is continuously differentiable iff f is continuously differentiable.
(e) fmat is strictly continuous iff f is strictly continuous.
(f) fmat is Lipschitz continuous with constant κ iff f is Lipschitz continuous with constant κ.
(g) fmat is semismooth iff f is semismooth.
Remarks
1. Strict continuity is also called local Lipschitz continuity.
2. Let F : IRn → IRm be strictly continuous. Then F is semismooth at x if F is directionally differentiable at x and, for all V ∈ ∂F(x + h), we have
F(x + h) − F(x) − V h = o(‖h‖).
3. Convex functions and piecewise continuously differentiable functions are examples of semismooth functions.
Spectral Factorization
Let x = (x1, x2) ∈ IR × IRn−1. Then x can be decomposed as
x = λ1 u(1) + λ2 u(2) ,
where λ1, λ2 are the spectral values and u(1), u(2) the associated spectral vectors of x, given by
λi = x1 + (−1)^i ‖x2‖ ,
u(i) = (1/2) ( 1 , (−1)^i x2/‖x2‖ ) if x2 ≠ 0 ,
u(i) = (1/2) ( 1 , (−1)^i w ) if x2 = 0 ,
for i = 1, 2, with w being any vector in IRn−1 satisfying ‖w‖ = 1. If x2 ≠ 0, the decomposition is unique.
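A sketch of this decomposition in NumPy (the choice w = (1, 0, · · · , 0) when x2 = 0 is our own; any unit vector is admissible):

```python
import numpy as np

def spectral(x):
    """Spectral values lam_i = x1 + (-1)^i ||x2|| and spectral vectors
    u_i = (1/2)(1, (-1)^i x2/||x2||) of x = (x1, x2) w.r.t. the SOC."""
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    # any unit w works when x2 = 0; we arbitrarily pick the first basis vector
    w = x2 / nx2 if nx2 > 0 else np.eye(len(x2))[0]
    lam1, lam2 = x1 - nx2, x1 + nx2
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return (lam1, lam2), (u1, u2)
```

Reassembling λ1 u(1) + λ2 u(2) recovers x.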
The SOC-functions
For any function f : IR → IR, we define a corresponding function on IRn associated with SOC by
fsoc(x) = f (λ1) u(1) + f (λ2) u(2),
for all x = (x1, x2) ∈ IR × IRn−1, where λ1, λ2 and u(1), u(2) are the spectral values and vectors of x.
If f is defined only on a subset of IR, then fsoc is defined on the corresponding subset of IRn.
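A minimal, self-contained sketch of fsoc via the spectral decomposition (again, w = (1, 0, · · · , 0) when x2 = 0 is an arbitrary admissible choice):

```python
import numpy as np

def f_soc(f, x):
    """f_soc(x) = f(lam1) u1 + f(lam2) u2, built from the spectral
    decomposition of x = (x1, x2) with respect to the SOC."""
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2 if nx2 > 0 else np.eye(len(x2))[0]   # arbitrary unit w if x2 = 0
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return f(x1 - nx2) * u1 + f(x1 + nx2) * u2
```

With f(t) = t this is the identity map; with f(t) = t² it returns the Jordan square x ◦ x = (‖x‖², 2x1x2).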
Parallel results about fsoc
(a) fsoc is continuous iff f is continuous.
(b) fsoc is directionally differentiable iff f is directionally differentiable.
(c) fsoc is Fréchet-differentiable iff f is Fréchet-differentiable.
(d) fsoc is continuously differentiable iff f is continuously differentiable.
(e) fsoc is strictly continuous iff f is strictly continuous.
(f) fsoc is Lipschitz continuous with constant κ iff f is Lipschitz continuous with constant κ.
(g) fsoc is semismooth iff f is semismooth.
A bridge from fmat to fsoc
For any x = (x1, x2) ∈ IR × IRn−1, let λ1, λ2 be its spectral values. Then:
(a) For any t ∈ IR, the matrix Lx + t Mx2 has eigenvalues λ1, λ2, and x1 + t of multiplicity n − 2, where
Lx = [ x1 x2ᵀ ; x2 x1 I ] and Mx2 = [ 0 0 ; 0 I − x2x2ᵀ/‖x2‖² ].
(b) For any f : IR → IR and t ∈ IR, we have
fsoc(x) = fmat(Lx + t Mx2) e ,
where e = (1, 0, · · · , 0)ᵀ ∈ IRn.
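The identity in (b) can be checked numerically. The sketch below (our own construction, with n = 3 and x2 ≠ 0) builds Lx and Mx2 explicitly and compares both sides for an arbitrary t:

```python
import numpy as np

def f_soc(f, x):
    """f_soc(x) = f(lam1) u1 + f(lam2) u2 (assumes x2 != 0 here)."""
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2
    return (f(x1 - nx2) * 0.5 * np.concatenate(([1.0], -w))
            + f(x1 + nx2) * 0.5 * np.concatenate(([1.0], w)))

def f_mat(f, X):
    """f_mat(X) = P diag(f(lam)) P^T for symmetric X."""
    lam, P = np.linalg.eigh(X)
    return P @ np.diag(f(lam)) @ P.T

x = np.array([3.0, 1.0, 2.0])
n = len(x)
x1, x2 = x[0], x[1:]
L = x1 * np.eye(n)                    # L_x = [x1, x2^T; x2, x1 I]
L[0, 1:], L[1:, 0] = x2, x2
M = np.zeros((n, n))                  # M_{x2} = [0, 0; 0, I - x2 x2^T/||x2||^2]
M[1:, 1:] = np.eye(n - 1) - np.outer(x2, x2) / (x2 @ x2)
e = np.eye(n)[0]                      # e = (1, 0, ..., 0)^T
t = 0.7                               # the identity holds for every t
assert np.allclose(f_soc(np.exp, x), f_mat(np.exp, L + t * M) @ e)
```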
A reformulation of SOCCP
Proposition [Fukushima-Luo-Tseng, 2002]
⟨x, y⟩ = 0 , x ∈ K , y ∈ K , F(x, y, ζ) = 0
⟺
H(x, y, ζ) := ( x − [x − y]+ , F(x, y, ζ) ) = 0 ,
where [·]+ : IRn → K denotes the nearest-point projection onto K.
In Summary
SOCCP ⇐⇒ H(x, y, ζ) = 0.
H is nonsmooth due to the nonsmoothness of the projection operator [·]+.
[·]+ is semismooth so that H is semismooth.
This approach can be used to design and analyze nonsmooth Newton-type methods for solving
H(x, y, ζ) = 0.
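For concreteness, here is a sketch (our own) of the projection [·]+ onto a single Kn, using the known closed form [z]+ = max{λ1, 0} u(1) + max{λ2, 0} u(2) in terms of the spectral decomposition of z; the first block x − [x − y]+ of H can then be formed directly:

```python
import numpy as np

def proj_soc(z):
    """Nearest-point projection onto K^n:
    [z]_+ = max(lam1, 0) u1 + max(lam2, 0) u2,
    where lam_i, u_i come from the spectral decomposition of z."""
    z1, z2 = z[0], z[1:]
    nz2 = np.linalg.norm(z2)
    w = z2 / nz2 if nz2 > 0 else np.eye(len(z2))[0]   # arbitrary unit w if z2 = 0
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return max(z1 - nz2, 0.0) * u1 + max(z1 + nz2, 0.0) * u2

def residual(x, y):
    """First block of H: x - [x - y]_+, which vanishes exactly at
    complementary pairs x, y in K^n with <x, y> = 0."""
    return x - proj_soc(x - y)
```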
III. Merit functions approach
Merit function approach
We will show that the SOCCP can be reformulated as an unconstrained differentiable minimization problem. The objective function of this unconstrained minimization problem is called a merit function.
For simplicity, we assume m = 1, so K = Kn.
Jordan product
For any x = (x1, x2) ∈ IR × IRn−1 and
y = (y1, y2) ∈ IR × IRn−1, we define their Jordan product associated with Kn as
x ◦ y = ( ⟨x, y⟩ , y1x2 + x1y2 ) .
The identity element under this product is e := (1, 0, · · · , 0)ᵀ ∈ IRn.
Basic Property
Property
(a) e ◦ x = x , ∀x ∈ IRn .
(b) x ◦ y = y ◦ x , ∀x, y ∈ IRn .
(c) (x + y) ◦ z = x ◦ z + y ◦ z , ∀x, y, z ∈ IRn .
Remarks
1. The Jordan product is not associative.
2. Kn is not closed under the Jordan product.
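A small sketch (our own) of the Jordan product, illustrating Remark 1 with a concrete triple where (x ◦ y) ◦ z ≠ x ◦ (y ◦ z):

```python
import numpy as np

def jordan(x, y):
    """Jordan product x o y = (<x, y>, y1*x2 + x1*y2) associated with K^n."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

e = np.array([1.0, 0.0, 0.0])            # identity element
x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, 0.0, 1.0])
z = np.array([1.0, 1.0, 1.0])

assert np.allclose(jordan(e, x), x)                  # e o x = x
assert np.allclose(jordan(x, y), jordan(y, x))       # commutative
print(jordan(jordan(x, y), z))   # [3. 2. 2.]
print(jordan(x, jordan(y, z)))   # [3. 3. 2.]  -- not associative
```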
Jordan Product associated with SOC
We write x² for x ◦ x and x + y for the usual componentwise addition of vectors. Then we have:
If x ∈ Kn, then there exists a unique vector in Kn, denoted by x^{1/2}, such that (x^{1/2})² = x^{1/2} ◦ x^{1/2} = x.
For any x ∈ IRn, we have x² ∈ Kn. Hence, there exists a unique vector (x²)^{1/2} ∈ Kn, denoted by |x|.
For any x ∈ IRn, we have x² = |x|².
A partial order associated with Kn
For any x, y ∈ IRn, we denote
x ⪰_{Kn} y ⟺ x − y ∈ Kn , and
x ≻_{Kn} y ⟺ x − y ∈ int(Kn).
Then we have:
Any x ∈ IRn satisfies |x| ⪰_{Kn} x .
For any x, y ⪰_{Kn} 0, if x ⪰_{Kn} y, then x^{1/2} ⪰_{Kn} y^{1/2} .
For any x, y ∈ IRn, if x² ⪰_{Kn} y², then |x| ⪰_{Kn} |y| .
Fischer-Burmeister function
We define φ : IRn × IRn → IRn by
φ(x, y) = (x² + y²)^{1/2} − (x + y).
This φ is the Fischer-Burmeister function and is well-defined here, since x² + y² ∈ Kn always has a square root in Kn.
A merit function
We define ψ : IRn × IRn → IR by
ψ(x, y) = ‖φ(x, y)‖².
It is proved by Fukushima-Luo-Tseng in 2002 that
ψ(x, y) = 0 ⟺ x, y ∈ Kn , ⟨x, y⟩ = 0 .
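A sketch (our own) of φ and ψ for a single Kn: x² is the Jordan square (‖x‖², 2x1x2) and the SOC square root is taken spectrally. The pair x = (1, 1, 0), y = (1, −1, 0) lies in K³ with ⟨x, y⟩ = 0, so ψ vanishes there:

```python
import numpy as np

def jordan_sq(x):
    """Jordan square x o x = (||x||^2, 2*x1*x2)."""
    return np.concatenate(([x @ x], 2.0 * x[0] * x[1:]))

def sqrt_soc(x):
    """SOC square root x^{1/2} = sqrt(lam1) u1 + sqrt(lam2) u2 for x in K^n.
    Tiny negative spectral values are clamped to 0 for numerical safety."""
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2 if nx2 > 0 else np.eye(len(x2))[0]
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return np.sqrt(max(x1 - nx2, 0.0)) * u1 + np.sqrt(max(x1 + nx2, 0.0)) * u2

def phi(x, y):
    """Fischer-Burmeister function phi(x, y) = (x^2 + y^2)^{1/2} - (x + y)."""
    return sqrt_soc(jordan_sq(x) + jordan_sq(y)) - (x + y)

def psi(x, y):
    """Merit function psi(x, y) = ||phi(x, y)||^2."""
    return np.linalg.norm(phi(x, y)) ** 2
```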
A smooth reformulation of SOCCP
Hence, we can reformulate the SOCCP as an equivalent unconstrained minimization problem:
(SOCCP) Find x, y ∈ IRn, ζ ∈ IRl satisfying
⟨x, y⟩ = 0 , x, y ∈ Kn , x = F(ζ) , y = G(ζ)
⟺
min_{ζ∈IRl} f(ζ) := ψ(F(ζ), G(ζ)) .
Thus f is a merit function for the SOCCP.
Proposition
The function ψ is smooth everywhere. Moreover,
(i) If x² + y² ∈ int(Kn), then
∇xψ(x, y) = ( Lx (L_{(x²+y²)^{1/2}})⁻¹ − I ) 2φ(x, y) ,
∇yψ(x, y) = ( Ly (L_{(x²+y²)^{1/2}})⁻¹ − I ) 2φ(x, y) ,
where Lx = [ x1 x2ᵀ ; x2 x1 I ].
(ii) If x² + y² ∉ int(Kn), i.e., ‖x‖² + ‖y‖² = 2‖x1x2 + y1y2‖, then
∇xψ(x, y) = ( Lx̄ − I ) 2φ(x, y) ,
∇yψ(x, y) = ( Lȳ − I ) 2φ(x, y) .
Two Important Lemmas
For any x = (x1, x2), y = (y1, y2) ∈ IR × IRn−1 with x² + y² ∉ int(Kn), we have
x1² = ‖x2‖² , y1² = ‖y2‖² , x1y1 = x2ᵀy2 , x1y2 = y1x2 .
For any x = (x1, x2), y = (y1, y2) ∈ IR × IRn−1 with x1x2 + y1y2 ≠ 0, we have
( x1 − (x1x2 + y1y2)ᵀx2/‖x1x2 + y1y2‖ )²
≤ ‖ x2 − x1(x1x2 + y1y2)/‖x1x2 + y1y2‖ ‖²
≤ ‖x‖² + ‖y‖² − 2‖x1x2 + y1y2‖ .
Proposition
Let f(ζ) := ψ(F(ζ), G(ζ)). Then f is smooth and, for every ζ ∈ IRn with ∇F(ζ), −∇G(ζ) column monotone, we have either (i) f(ζ) = 0 or (ii) ∇f(ζ) ≠ 0.
In case (ii), if ∇G(ζ) is invertible, then ⟨d(ζ), ∇f(ζ)⟩ < 0, where
d(ζ) := −(∇G(ζ)⁻¹)ᵀ ∇xψ(F(ζ), G(ζ)).
Note: We say M, N ∈ IRn×n are column monotone if, for any u, v ∈ IRn,
M u + N v = 0 ⇒ uT v ≥ 0.
Advantages
Every stationary point ζ of f is a global minimum if
∇F (ζ), −∇G(ζ) are column monotone.
For every non-stationary point ζ of f , we provide a descent direction d(ζ) without computing the
Jacobian of F (ζ).
The assumption of ∇F (ζ), −∇G(ζ) being column monotone is reasonable and holds for second-order cone program (SOCP).
Another merit function
Another merit function, motivated by Yamashita and Fukushima, is defined as
ψYF(x, y) = ψ0(⟨x, y⟩) + ψ(x, y) ,
where ψ0 : IR → [0, ∞) is any smooth function satisfying
ψ0(t) = 0 ∀t ≤ 0 and ψ0′(t) > 0 ∀t > 0.
Remarks
An example of ψ0 is ψ0(t) = (1/4)(max{0, t})⁴. Let fYF(ζ) := ψYF(F(ζ), G(ζ)). It follows from the properties of ψ and ψ0 that fYF is also a smooth merit function.
fYF has properties similar to those of f. In addition, fYF has bounded level sets and provides error bounds when F, G are jointly strongly monotone, whereas f does not.
Proposition
Let fYF(ζ) := ψYF(F(ζ), G(ζ)). Then fYF is smooth and, for every ζ ∈ IRn with ∇F(ζ), −∇G(ζ) column monotone, either (i) fYF(ζ) = 0 or (ii) ∇fYF(ζ) ≠ 0.
In case (ii), if ∇G(ζ) is invertible, then ⟨dYF(ζ), ∇fYF(ζ)⟩ < 0, where
dYF(ζ) := −(∇G(ζ)⁻¹)ᵀ [ ψ0′(⟨F(ζ), G(ζ)⟩) G(ζ) + ∇xψ(F(ζ), G(ζ)) ] .
Error bounds for fYF
Suppose that F and G are jointly strongly monotone mappings from IRn to IRn, and that the SOCCP has a solution ζ*. Then there exists a scalar τ > 0 such that
τ‖ζ − ζ*‖² ≤ max{0, ⟨F(ζ), G(ζ)⟩} + ‖(−F(ζ))+‖ + ‖(−G(ζ))+‖ , ∀ζ ∈ IRn.
Moreover,
τ‖ζ − ζ*‖² ≤ ψ0⁻¹(fYF(ζ)) + 2√2 fYF(ζ)^{1/2} , ∀ζ ∈ IRn.
Remark
F and G are jointly strongly monotone if there exists ρ > 0 such that
⟨F(ζ) − F(ξ) , G(ζ) − G(ξ)⟩ ≥ ρ‖ζ − ξ‖² for all ζ, ξ ∈ IRn.
Bounded level sets for fYF
Suppose that F and G are differentiable, jointly strongly monotone mappings from IRn to IRn. Then the level set
L(γ) := {ζ ∈ IRn | fYF(ζ) ≤ γ}
is nonempty and bounded for all γ ≥ 0.
Remark:
The merit function f lacks these properties due to the absence of the term ψ0(⟨F(ζ), G(ζ)⟩).
IV. Algorithms
Solve SOCP via merit function
The second-order cone program (SOCP) is
min cᵀx
s.t. Ax = b , x ∈ K,
where A ∈ IRm×n has full row rank and m < n. The KKT optimality conditions can be reformulated as
min_{ζ∈IRn} f(ζ) := ψ(F(ζ), G(ζ)) ,
where
F(ζ) = d + ( I − Aᵀ(AAᵀ)⁻¹A ) ζ ,
G(ζ) = c − Aᵀ(AAᵀ)⁻¹A ζ ,
with d ∈ IRn any vector satisfying Ad = b.
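A sketch of this construction with made-up random data: d is any vector with Ad = b (we take the least-squares solution), and I − Aᵀ(AAᵀ)⁻¹A projects onto the null space of A, so A F(ζ) = b holds for every ζ:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6                                # illustrative sizes, m < n
A = rng.standard_normal((m, n))            # full row rank with probability 1
b = rng.standard_normal(m)
c = rng.standard_normal(n)

d = np.linalg.lstsq(A, b, rcond=None)[0]   # any d with A d = b
P = A.T @ np.linalg.inv(A @ A.T) @ A       # projector onto range(A^T)

def F(zeta):
    return d + (np.eye(n) - P) @ zeta      # satisfies A F(zeta) = b for all zeta

def G(zeta):
    return c - P @ zeta
```

Minimizing f(ζ) = ψ(F(ζ), G(ζ)) over ζ then tackles the SOCP optimality conditions.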
Algorithm (1)
• FR-CG method (Fletcher-Reeves):
ζ^{k+1} = ζ^k + αk d^k ,
d^k = −∇f(ζ^k) + β^FR_k d^{k−1} ,
where β^FR_k is given by
β^FR_k = 0 if k = 1 ;
β^FR_k = ∇f(ζ^k)ᵀ∇f(ζ^k) / ∇f(ζ^{k−1})ᵀ∇f(ζ^{k−1}) if k ≥ 2.
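A generic sketch of FR-CG with a backtracking Armijo step and a steepest-descent restart safeguard (the safeguard and the constants are our additions, not from the slides), demonstrated on a small strongly convex quadratic:

```python
import numpy as np

def fr_cg(f, grad_f, z0, iters=500):
    """Fletcher-Reeves CG: d_k = -g_k + beta_k^FR d_{k-1} with
    beta_k^FR = ||g_k||^2 / ||g_{k-1}||^2, plus Armijo backtracking."""
    z = np.asarray(z0, dtype=float)
    g = grad_f(z)
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-10:
            break
        if g @ d >= 0:                     # safeguard: restart with steepest descent
            d = -g
        alpha, fz = 1.0, f(z)
        while f(z + alpha * d) > fz + 1e-4 * alpha * (g @ d):
            alpha *= 0.5                   # Armijo backtracking
        z_new = z + alpha * d
        g_new = grad_f(z_new)
        beta = (g_new @ g_new) / (g @ g)   # beta^FR
        d = -g_new + beta * d
        z, g = z_new, g_new
    return z
```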
Algorithm (2)
• PR-CG method (Polak-Ribière):
ζ^{k+1} = ζ^k + αk d^k ,
d^k = −∇f(ζ^k) + β^PR_k d^{k−1} ,
where β^PR_k is given by
β^PR_k = ∇f(ζ^k)ᵀ( ∇f(ζ^k) − ∇f(ζ^{k−1}) ) / ∇f(ζ^{k−1})ᵀ∇f(ζ^{k−1}) .
Remark: Powell, in 1986, suggested modifying the PR-CG method by setting β_k = max{β^PR_k, 0}.
Algorithm (3)
• BFGS method (Broyden-Fletcher-Goldfarb-Shanno):
ζ^{k+1} = ζ^k + αk d^k , d^k = −D^k ∇f(ζ^k) ,
where
D^{k+1} = D^k + ( 1 + (q^k)ᵀD^k q^k / (p^k)ᵀq^k ) ( p^k(p^k)ᵀ / (p^k)ᵀq^k ) − ( D^k q^k(p^k)ᵀ + p^k(q^k)ᵀD^k ) / (p^k)ᵀq^k ,
and
p^k = ζ^{k+1} − ζ^k ,
q^k = ∇f(ζ^{k+1}) − ∇f(ζ^k).
Remark: BFGS needs substantial storage when the dimension is large.
Algorithm (4)
• L-BFGS method: Rewrite the BFGS update as
D^{k+1} = (V^k)ᵀ D^k V^k + ρ_k p^k(p^k)ᵀ ,
where
ρ_k = 1 / (q^k)ᵀp^k , V^k = I − ρ_k q^k(p^k)ᵀ ,
and
D^k = (V^{k−1})ᵀ · · · (V^{k−m})ᵀ D^0_k V^{k−m} · · · V^{k−1}
+ ρ_{k−m} (V^{k−1})ᵀ · · · (V^{k−m+1})ᵀ p^{k−m}(p^{k−m})ᵀ V^{k−m+1} · · · V^{k−1}
+ ρ_{k−m+1} (V^{k−1})ᵀ · · · (V^{k−m+2})ᵀ p^{k−m+1}(p^{k−m+1})ᵀ V^{k−m+2} · · · V^{k−1}
+ · · ·
+ ρ_{k−1} p^{k−1}(p^{k−1})ᵀ .
Remarks
Dk+1 is obtained by updating Dk using the pair {pk, qk}.
We store a modified version of Dk implicitly by
storing the most recently computed m pairs {pk, qk}.
When the (m + 1)-st pair is computed, the oldest pair is discarded and its location in memory is taken by the new pair.
L-BFGS is suitable for large problems because it has been observed in practice that small values of m (3 ≤ m ≤ 20) give satisfactory results.
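In practice the product D^k ∇f is applied via the standard two-loop recursion rather than by forming D^k. Below is a sketch (our own, following the update above, with the common initial scaling D^0_k = γ I taken from the most recent pair):

```python
import numpy as np

def lbfgs_direction(g, pairs):
    """Two-loop recursion: returns d = -D_k g using only the stored
    pairs (p_j, q_j), never forming D_k explicitly.
    Initial matrix: D0_k = gamma I with gamma = p^T q / q^T q (last pair)."""
    t = g.copy()
    alphas = []
    for p, q in reversed(pairs):          # first loop: newest to oldest
        rho = 1.0 / (q @ p)
        a = rho * (p @ t)
        alphas.append(a)
        t = t - a * q
    if pairs:
        p, q = pairs[-1]
        t = ((p @ q) / (q @ q)) * t       # apply D0_k = gamma I
    r = t
    for (p, q), a in zip(pairs, reversed(alphas)):   # second loop: oldest to newest
        beta = (1.0 / (q @ p)) * (q @ r)
        r = r + (a - beta) * p
    return -r
```

A driver keeps only the most recent m pairs, e.g. by passing `pairs[-m:]` at each iteration.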
Initial scalings of D^0_k
Liu and Nocedal studied four different types of initial scalings of D^0_k for L-BFGS.
Scaling 1: D^0_k = D^0 (no scaling).
Scaling 2: D^0_k = γ0 D^0 (initial scaling only).
Scaling 3: D^0_k = γ_k D^0, where γ_k = (q^k)ᵀp^k / ‖q^k‖².
Scaling 4: Same as Scaling 3 during the first m iterations. For k > m, D^0_k = diag(ω^k_i), where
ω^k_i = ( p^{k−1}_i q^{k−1}_i + · · · + p^{k−m}_i q^{k−m}_i ) / ( (q^{k−1}_i)² + · · · + (q^{k−m}_i)² ) , i = 1, · · · , n.
Step-size
Armijo rule: Fix β, σ ∈ (0, 1) and s > 0; choose αk as the largest α ∈ {s, sβ, sβ², · · ·} satisfying
( f(ζ^k + αd^k) − f(ζ^k) ) / α ≤ σ∇f(ζ^k)ᵀd^k ,
i.e.,
f(ζ^k + αd^k) ≤ f(ζ^k) + σα∇f(ζ^k)ᵀd^k.
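A direct sketch of this rule (the safeguard cutoff is our addition):

```python
import numpy as np

def armijo(f, z, d, g, s=1.0, beta=0.5, sigma=1e-4):
    """Armijo rule: largest alpha in {s, s*beta, s*beta^2, ...} with
    f(z + alpha*d) <= f(z) + sigma*alpha*(g^T d), where g = grad f(z)."""
    alpha, fz, gTd = s, f(z), g @ d
    while f(z + alpha * d) > fz + sigma * alpha * gTd:
        alpha *= beta
        if alpha < 1e-16:   # safeguard: d is probably not a descent direction
            raise ValueError("Armijo backtracking failed")
    return alpha
```

For f(z) = ‖z‖², z = (1, 1), and d = −∇f(z), the rule returns α = 0.5: the trial α = 1 overshoots to −z, which gives no sufficient decrease.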
V. Numerical Experiments
Test Problems
Table 1: Set of test problems

Problem       n      m      # of nonzero elts of A   structure of SOCs
nb            2383   123    192439                   [4 × 1; 793 × 3]
nql30         6302   3680   26819                    [3602 × 1; 900 × 3]
qssp30        7568   3691   36851                    [2 × 1; 1891 × 4]
nb-L2-bessel  2641   123    209924                   [4 × 1; 1 × 123; 838 × 3]

In the structure of SOCs, for example, [4 × 1; 1 × 123; 838 × 3] means that K is the product of four K1 cones, one K123 cone, and 838 K3 cones.
FR-CG Convergence
[Figure: merit function value vs. iterations for FR-CG; the value decreases from about 10^1 to 10^−5 over roughly 3000 iterations.]
PR-CG Convergence
[Figure: merit function value vs. iterations for PR-CG; the value decreases from about 10^1 to 10^−5 within about 2000 iterations.]
L-BFGS Convergence 1
[Figure: merit function value vs. iterations for L-BFGS; the value decreases from about 10^1 to 10^−5 within about 250 iterations.]
L-BFGS Convergence 2
[Figure: merit function value vs. iterations for L-BFGS; the value decreases from about 10^1 to 10^−6 within about 250 iterations.]