SOC Functions and Their Applications


by

Jein-Shan Chen

Department of Mathematics National Taiwan Normal University

January 07, 2019


Preface

Second-order cone programs (SOCPs) have attracted much attention because of their many applications in engineering, data science, and finance. To deal with this special type of optimization problem involving the second-order cone (SOC), we believe that the following concepts are crucial: (i) the spectral decomposition associated with SOC, (ii) the analysis of SOC functions, and (iii) SOC-convexity and SOC-monotonicity. In this book, we go through all these concepts and try to give readers a complete picture of SOC functions and their applications.

As introduced in Chapter 1, SOC functions are vector-valued functions associated with SOC and defined through the Jordan product. However, unlike matrix multiplication, the Jordan product associated with SOC is not associative, which is the main source of difficulty in the analysis. Therefore, the ideas behind the proofs are usually quite different from those for matrix-valued functions. In other words, although SOC and the positive semidefinite cone are both symmetric cones, the analysis for them differs. In general, the arguments in the SOC setting are more tedious and need subtle arrangements.

To deal with second-order cone programs (SOCPs) and second-order cone complementarity problems (SOCCPs), many methods rely on some SOC complementarity function or merit function to reformulate the KKT optimality conditions as a nonsmooth (or smoothing) system of equations or as an unconstrained minimization problem. In fact, such SOC complementarity or merit functions are connected to SOC functions. In other words, the vector-valued functions associated with SOC are heavily used in solution methods for SOCP and SOCCP. Therefore, further study of these functions will be helpful for developing and analyzing more solution methods.

For SOCP, there are also many approaches that do not use SOC complementarity functions. In this case, the concepts of SOC-convexity and SOC-monotonicity introduced in Chapter 2 play a key role in those solution methods. In Chapter 3, we present proximal-type algorithms in which SOC-convexity and SOC-monotonicity are needed for designing the solution methods and proving their convergence.

In Chapter 4, we turn to some other applications of the SOC functions, SOC-convexity, and SOC-monotonicity introduced in this monograph. These include the so-called SOC means, SOC weighted means, and a few SOC trace versions of the Young, Hölder, and Minkowski inequalities and of the Powers-Størmer inequality. All this material is newly discovered, and we believe that it will be helpful in the convergence analysis of various optimization problems involving SOC. Chapter 5 offers a direction for future investigation, although it is not yet fully developed.


This book is based on my series of studies on the second-order cone, SOCP, SOCCP, SOC functions, and related topics during the past fifteen years. It is dedicated to the memory of my supervisor, Prof. Paul Tseng, who guided me into optimization research, especially second-order cone optimization. Without his encouragement, it would not have been possible to achieve the whole picture of SOC functions, which is the main subject of this monograph. His attitude towards doing research always remains in my heart, although he went missing in 2009. I would like to thank all my co-authors of the material that appears in this book, including Prof. Shaohua Pan, Prof. Xin Chen, Prof. Jiawei Zhang, Prof. Yu-Lin Chang, Dr. Chien-Hao Huang, and others. The collaborations with them have been wonderful and enjoyable experiences. I also thank Dr. Chien-Hao Huang, Dr. Yue Lu, Dr. Liguo Jiao, Prof. Xinhe Miao, and Prof. Chu-Chin Hu for their help with proofreading. Final gratitude goes to my family, Vivian, Benjamin, and Ian, who offer me support and stimulate endless strength in pursuing my exceptional academic career.

January 07, 2019 Taipei, Taiwan


Notations

• Throughout this book, an n-dimensional vector x = (x₁, x₂, · · · , x_n) ∈ IRⁿ means a column vector, i.e.,

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\]

In other words, without ambiguity, we also write the column vector as x = (x₁, x₂, · · · , x_n).

• IRⁿ₊ means {x = (x₁, x₂, . . . , x_n) | x_i ≥ 0, for all i = 1, 2, . . . , n}, whereas IRⁿ₊₊ denotes {x = (x₁, x₂, . . . , x_n) | x_i > 0, ∀i = 1, 2, . . . , n}.

• ⟨·, ·⟩ denotes the Euclidean inner product.

• The superscript T denotes the transpose of a vector or a matrix.

• B(x, δ) denotes the neighborhood of x with radius δ > 0.

• IR^{n×n} denotes the space of n × n real matrices.

• I represents an identity matrix of suitable dimension.

• For any symmetric matrices A, B ∈ IR^{n×n}, we write A ⪰ B (respectively, A ≻ B) to mean A − B is positive semidefinite (respectively, positive definite).

• Sⁿ denotes the space of n × n symmetric matrices, and S₊ⁿ denotes the set of n × n symmetric positive semidefinite matrices.

• O denotes the set of P ∈ IR^{n×n} that are orthogonal, i.e., P^T = P⁻¹.

• ‖ · ‖ is the Euclidean norm.

• Given a set S, we denote by S̄, int(S) and bd(S) the closure, the interior and the boundary of S, respectively.

• A function f : IRⁿ → (−∞, ∞] is said to be proper if f (ζ) < ∞ for at least one ζ ∈ IRⁿ and f (ζ) > −∞ for all ζ ∈ IRⁿ.

• For a mapping f : IRⁿ→ IR, ∇f (x) denotes the gradient of f at x.

• For a closed proper convex function f : IRⁿ → (−∞, ∞], we denote its domain by dom f := {ζ ∈ IRⁿ | f(ζ) < ∞}.

• For a closed proper convex function f : IRⁿ → (−∞, ∞], we denote the subdifferential of f at ζ̂ by

\[
\partial f(\hat{\zeta}) := \left\{ w \in \mathbb{R}^n \,\middle|\, f(\zeta) \ge f(\hat{\zeta}) + \langle w, \zeta - \hat{\zeta} \rangle, \ \forall \zeta \in \mathbb{R}^n \right\}.
\]

• C⁽ⁱ⁾(J) denotes the family of functions from J ⊆ IRⁿ to IR that have a continuous i-th derivative.

• For any differentiable mapping F = (F₁, F₂, · · · , F_m) : IRⁿ → IR^m, ∇F(x) = [∇F₁(x) · · · ∇F_m(x)] is an n × m matrix which denotes the transposed Jacobian of F at x.

• For any x, y ∈ IRⁿ, we write x ⪰_{Kⁿ} y if x − y ∈ Kⁿ, and x ≻_{Kⁿ} y if x − y ∈ int(Kⁿ).

• For a real-valued function f : J → IR, f′(t) and f″(t) denote the first and second derivatives of f at a point t ∈ J where f is differentiable, respectively.

• For a mapping F : S ⊆ IRⁿ → IR^m, ∂F (x) denotes the subdifferential of F at x, while ∂_BF (x) denotes the B-subdifferential of F at x.


Chapter 1 SOC Functions

During the past two decades, there has been active research on second-order cone programs (SOCPs) and second-order cone complementarity problems (SOCCPs). Various methods have been proposed, including the interior-point methods [1, 102, 109, 123, 146], the smoothing Newton methods [51, 63, 71], the semismooth Newton methods [86, 120], and the merit function methods [43, 48]. All of these methods use some SOC complementarity function or merit function to reformulate the KKT optimality conditions as a nonsmooth (or smoothing) system of equations or as an unconstrained minimization problem. In fact, such SOC complementarity functions or merit functions are closely connected to so-called SOC functions. In other words, studying SOC functions, which is the main target of this chapter, is crucial to dealing with SOCP and SOCCP.

1.1 On the second-order cone

The second-order cone (SOC) in IRⁿ, also called Lorentz cone, is defined by

\[
\mathcal{K}^n = \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} \,\middle|\, \|x_2\| \le x_1 \right\}, \tag{1.1}
\]

where ‖ · ‖ denotes the Euclidean norm. If n = 1, let Kⁿ denote the set of nonnegative reals IR₊. For n = 2 and n = 3, Kⁿ is depicted in Figure 1.1(a) and Figure 1.1(b), respectively. It is known that Kⁿ is a pointed closed convex cone, so that it induces a partial ordering. More specifically, for any x, y in IRⁿ, we write x ⪰_{Kⁿ} y if x − y ∈ Kⁿ, and x ≻_{Kⁿ} y if x − y ∈ int(Kⁿ). In other words, x ⪰_{Kⁿ} 0 if and only if x ∈ Kⁿ, whereas x ≻_{Kⁿ} 0 if and only if x ∈ int(Kⁿ). The relation ⪰_{Kⁿ} is a partial ordering, but not a linear ordering in Kⁿ, i.e., there exist x, y ∈ Kⁿ such that neither x ⪰_{Kⁿ} y nor y ⪰_{Kⁿ} x. To see this, for n = 2, let x = (1, 1) ∈ K² and y = (1, 0) ∈ K². Then, we have x − y = (0, 1) ∉ K² and y − x = (0, −1) ∉ K².


(a) 2-dimensional SOC (b) 3-dimensional SOC

Figure 1.1: The graphs of SOC

The second-order cone has received much attention in optimization, particularly in the context of applications and solution methods for the second-order cone program (SOCP) [1, 47, 48, 102, 115, 116, 118] and the second-order cone complementarity problem (SOCCP) [42, 43, 45, 48, 63, 71, 117]. These solution methods require the spectral decomposition associated with SOC, whose basic concept is described below. For any x = (x₁, x₂) ∈ IR × IRⁿ⁻¹, x can be decomposed as

\[
x = \lambda_1(x)\, u_x^{(1)} + \lambda_2(x)\, u_x^{(2)}, \tag{1.2}
\]

where λ₁(x), λ₂(x) and u_x⁽¹⁾, u_x⁽²⁾ are the spectral values and the associated spectral vectors of x given by

\[
\lambda_i(x) = x_1 + (-1)^i \|x_2\|, \tag{1.3}
\]

\[
u_x^{(i)} =
\begin{cases}
\dfrac{1}{2}\left(1,\ (-1)^i \dfrac{x_2}{\|x_2\|}\right), & \text{if } x_2 \neq 0, \\[2mm]
\dfrac{1}{2}\left(1,\ (-1)^i w\right), & \text{if } x_2 = 0,
\end{cases} \tag{1.4}
\]

for i = 1, 2 with w being any vector in IRⁿ⁻¹ satisfying ‖w‖ = 1. If x₂ ≠ 0, the decomposition is unique.
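The decomposition (1.2)-(1.4) can be illustrated numerically. The following is a minimal sketch using only the Python standard library; the function name `spectral`, the assumption n ≥ 2, and the choice of w as the first coordinate direction when x₂ = 0 are ours and are not part of the text.

```python
import math

def spectral(x):
    """Spectral values and vectors of x = (x1, x2) w.r.t. K^n, following (1.3)-(1.4).

    Assumes len(x) >= 2. When x2 = 0 any unit vector w is admissible;
    this sketch arbitrarily picks the first coordinate direction.
    """
    x1, x2 = x[0], x[1:]
    norm2 = math.sqrt(sum(t * t for t in x2))
    if norm2 > 0:
        w = [t / norm2 for t in x2]
    else:
        w = [1.0] + [0.0] * (len(x2) - 1)
    lam = (x1 - norm2, x1 + norm2)           # lambda_1 <= lambda_2
    u1 = [0.5] + [-0.5 * t for t in w]       # i = 1: sign (-1)^1
    u2 = [0.5] + [0.5 * t for t in w]        # i = 2: sign (-1)^2
    return lam, (u1, u2)

x = [1.0, 3.0, 4.0]
(lam1, lam2), (u1, u2) = spectral(x)
# Rebuild x from its spectral decomposition (1.2)
rebuilt = [lam1 * a + lam2 * b for a, b in zip(u1, u2)]
print(lam1, lam2)
print(rebuilt)
```

Here ‖x₂‖ = 5, so the spectral values are −4 and 6, and `rebuilt` agrees with x up to rounding.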

For any x = (x₁, x₂) ∈ IR × IRⁿ⁻¹ and y = (y₁, y₂) ∈ IR × IRⁿ⁻¹, we define their Jordan product as

\[
x \circ y = \left( \langle x, y \rangle,\ y_1 x_2 + x_1 y_2 \right) \in \mathbb{R} \times \mathbb{R}^{n-1}. \tag{1.5}
\]

The Jordan product is not associative. For example, for n = 3, let x = (1, −1, 1) and y = z = (1, 0, 1); then we have (x ◦ y) ◦ z = (4, −1, 4) ≠ x ◦ (y ◦ z) = (4, −2, 4). However, it is power associative, i.e., x ◦ (x ◦ x) = (x ◦ x) ◦ x for all x ∈ IRⁿ. Thus, without fear of ambiguity, we may write x^m for the product of m copies of x, and x^{m+n} = x^m ◦ xⁿ for all positive integers m and n. The vector e = (1, 0, . . . , 0) is the unique identity element for the Jordan product, and we define x⁰ = e for convenience. In addition, Kⁿ is not closed under the Jordan product. For example, x = (√2, 1, 1) ∈ K³ and y = (√2, 1, −1) ∈ K³, but x ◦ y = (2, 2√2, 0) ∉ K³. We point out that the lack of associativity of the Jordan product and the lack of closedness of SOC under it are the main sources of difficulty when dealing with SOC.
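The two examples above can be reproduced directly from definition (1.5). The sketch below is only an illustration; the helper names `jordan` and `in_soc` and the small tolerance in the membership test are our own choices.

```python
def jordan(x, y):
    """Jordan product x ∘ y = (<x, y>, y1*x2 + x1*y2) from (1.5)."""
    inner = sum(a * b for a, b in zip(x, y))
    tail = [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]
    return [inner] + tail

def in_soc(x, tol=1e-12):
    """Membership test for K^n: ||x2|| <= x1, up to a small tolerance."""
    return sum(t * t for t in x[1:]) ** 0.5 <= x[0] + tol

# Non-associativity: (x ∘ y) ∘ z differs from x ∘ (y ∘ z)
x, y, z = [1, -1, 1], [1, 0, 1], [1, 0, 1]
left = jordan(jordan(x, y), z)
right = jordan(x, jordan(y, z))
print(left, right)

# K^3 is not closed under the Jordan product
r2 = 2 ** 0.5
p, q = [r2, 1.0, 1.0], [r2, 1.0, -1.0]
print(in_soc(p), in_soc(q), in_soc(jordan(p, q)))
```

The first print shows the two different triple products from the text; the second shows that p and q lie in K³ while p ◦ q does not.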

We write x² to denote x ◦ x and write x + y to mean the usual componentwise addition of vectors. Then, “◦, +” together with e = (1, 0, . . . , 0) ∈ IRⁿ have the following basic properties (see [61, 63]):

(1) e ◦ x = x, for all x ∈ IRⁿ.

(2) x ◦ y = y ◦ x, for all x, y ∈ IRⁿ.

(3) x ◦ (x² ◦ y) = x² ◦ (x ◦ y), for all x, y ∈ IRⁿ.

(4) (x + y) ◦ z = x ◦ z + y ◦ z, for all x, y, z ∈ IRⁿ.

For each x = (x₁, x₂) ∈ IR × IRⁿ⁻¹, the determinant and the trace of x are defined by

\[
\det(x) = x_1^2 - \|x_2\|^2, \qquad \operatorname{tr}(x) = 2x_1.
\]

In view of the definition of spectral values (1.3), it is clear that the determinant, the trace and the Euclidean norm of x can all be represented in terms of λ₁(x) and λ₂(x):

\[
\det(x) = \lambda_1(x)\lambda_2(x), \qquad \operatorname{tr}(x) = \lambda_1(x) + \lambda_2(x), \qquad \|x\|^2 = \frac{1}{2}\left( \lambda_1(x)^2 + \lambda_2(x)^2 \right). \tag{1.6}
\]

Below, we elaborate on the determinant and the trace by showing some of their properties.

Proposition 1.1. For any x ≻_{Kⁿ} 0 and y ≻_{Kⁿ} 0, the following results hold.

(a) If x ⪰_{Kⁿ} y, then det(x) ≥ det(y) and tr(x) ≥ tr(y).

(b) If x ⪰_{Kⁿ} y, then λ_i(x) ≥ λ_i(y) for i = 1, 2.

Proof. (a) From the definitions, we know that

\[
\det(x) = x_1^2 - \|x_2\|^2, \quad \operatorname{tr}(x) = 2x_1, \quad \det(y) = y_1^2 - \|y_2\|^2, \quad \operatorname{tr}(y) = 2y_1.
\]

Since x − y = (x₁ − y₁, x₂ − y₂) ⪰_{Kⁿ} 0, we have ‖x₂ − y₂‖ ≤ x₁ − y₁. Thus, x₁ ≥ y₁, and then tr(x) ≥ tr(y). Besides, using the assumption on x and y gives

\[
x_1 - y_1 \ \ge\ \|x_2 - y_2\| \ \ge\ \bigl|\, \|x_2\| - \|y_2\| \,\bigr|, \tag{1.7}
\]

which is equivalent to x₁ − ‖x₂‖ ≥ y₁ − ‖y₂‖ > 0 and x₁ + ‖x₂‖ ≥ y₁ + ‖y₂‖ > 0. Hence,

\[
\det(x) = x_1^2 - \|x_2\|^2 = (x_1 + \|x_2\|)(x_1 - \|x_2\|) \ \ge\ (y_1 + \|y_2\|)(y_1 - \|y_2\|) = \det(y).
\]

(b) From the definition of spectral values, we know that

\[
\lambda_1(x) = x_1 - \|x_2\|, \quad \lambda_2(x) = x_1 + \|x_2\| \quad \text{and} \quad \lambda_1(y) = y_1 - \|y_2\|, \quad \lambda_2(y) = y_1 + \|y_2\|.
\]

Then, by the inequality (1.7) in the proof of part (a), the results follow immediately.

We point out that there may be other, simpler ways to prove Proposition 1.1. The approach here is straightforward and intuitive, since it only checks the definitions. The converse of Proposition 1.1 does not hold; a counterexample is obtained by taking x = (5, 3) ∈ K² and y = (3, −1) ∈ K². In fact, if (x₁, x₂) ∈ IR × IRⁿ⁻¹ serves as a counterexample for Kⁿ, then (x₁, x₂, 0, . . . , 0) ∈ IR × IR^{m−1} is automatically a counterexample for K^m whenever m ≥ n. Moreover, for any x ⪰_{Kⁿ} y, we always have λ_i(x) ≥ λ_i(y) for i = 1, 2 and tr(x) ≥ tr(y). There is no need to restrict x and y to int(Kⁿ) as in Proposition 1.1.
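The counterexample x = (5, 3), y = (3, −1) can be checked by hand or with a few lines of code. The sketch below is only an illustration; the helper names are ours.

```python
def lam(x):
    """Spectral values (lambda_1, lambda_2) of x = (x1, x2), as in (1.3)."""
    r = sum(t * t for t in x[1:]) ** 0.5
    return x[0] - r, x[0] + r

def det(x):
    l1, l2 = lam(x)
    return l1 * l2

def tr(x):
    return 2 * x[0]

def in_soc(x):
    """x belongs to K^n exactly when its smaller spectral value is nonnegative."""
    return lam(x)[0] >= 0

x, y = [5.0, 3.0], [3.0, -1.0]
diff = [a - b for a, b in zip(x, y)]    # x - y = (2, 4)
print(lam(x), lam(y))                   # spectral values of x dominate those of y
print(det(x), det(y), tr(x), tr(y))     # det and tr of x dominate those of y
print(in_soc(diff))                     # yet x - y is not in K^2
```

All the conclusions of Proposition 1.1 hold for this pair, while x − y = (2, 4) has spectral values −2 and 6 and hence lies outside K².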

Proposition 1.2. Let x ⪰_{Kⁿ} 0, y ⪰_{Kⁿ} 0 and e = (1, 0, · · · , 0). Then, the following hold.

(a) det(x + y) ≥ det(x) + det(y).

(b) det(x ◦ y) ≤ det(x) det(y).

(c) det(αx + (1 − α)y) ≥ α² det(x) + (1 − α)² det(y) for all 0 < α < 1.

(d) [det(e + x)]^{1/2} ≥ 1 + det(x)^{1/2}.

(e) det(e + x + y) ≤ det(e + x) det(e + y).

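Proof. (a) For any x ⪰_{Kⁿ} 0 and y ⪰_{Kⁿ} 0, we know ‖x₂‖ ≤ x₁ and ‖y₂‖ ≤ y₁, which implies

\[
|\langle x_2, y_2 \rangle| \le \|x_2\| \, \|y_2\| \le x_1 y_1.
\]

Hence, we obtain

\[
\begin{aligned}
\det(x + y) &= (x_1 + y_1)^2 - \|x_2 + y_2\|^2 \\
&= \left( x_1^2 - \|x_2\|^2 \right) + \left( y_1^2 - \|y_2\|^2 \right) + 2\left( x_1 y_1 - \langle x_2, y_2 \rangle \right) \\
&\ge \left( x_1^2 - \|x_2\|^2 \right) + \left( y_1^2 - \|y_2\|^2 \right) \\
&= \det(x) + \det(y).
\end{aligned}
\]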

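(b) Applying the Cauchy inequality gives

\[
\begin{aligned}
\det(x \circ y) &= \langle x, y \rangle^2 - \|x_1 y_2 + y_1 x_2\|^2 \\
&= \left( x_1 y_1 + \langle x_2, y_2 \rangle \right)^2 - \left( x_1^2 \|y_2\|^2 + 2 x_1 y_1 \langle x_2, y_2 \rangle + y_1^2 \|x_2\|^2 \right) \\
&= x_1^2 y_1^2 + \langle x_2, y_2 \rangle^2 - x_1^2 \|y_2\|^2 - y_1^2 \|x_2\|^2 \\
&\le x_1^2 y_1^2 + \|x_2\|^2 \|y_2\|^2 - x_1^2 \|y_2\|^2 - y_1^2 \|x_2\|^2 \\
&= \left( x_1^2 - \|x_2\|^2 \right) \left( y_1^2 - \|y_2\|^2 \right) \\
&= \det(x) \det(y).
\end{aligned}
\]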


(c) For any x ⪰_{Kⁿ} 0 and y ⪰_{Kⁿ} 0, it is clear that αx ⪰_{Kⁿ} 0 and (1 − α)y ⪰_{Kⁿ} 0 for every 0 < α < 1. In addition, we observe that det(αx) = α² det(x). Hence,

\[
\det(\alpha x + (1 - \alpha) y) \ \ge\ \det(\alpha x) + \det((1 - \alpha) y) = \alpha^2 \det(x) + (1 - \alpha)^2 \det(y),
\]

where the inequality is from part (a).

(d) For any x ⪰_{Kⁿ} 0, we know det(x) = λ₁(x)λ₂(x) ≥ 0, where λ_i(x) are the spectral values of x. Hence,

\[
\det(e + x) = (1 + \lambda_1(x))(1 + \lambda_2(x)) \ \ge\ \left( 1 + \sqrt{\lambda_1(x) \lambda_2(x)} \right)^2 = \left( 1 + \det(x)^{1/2} \right)^2.
\]

Then, taking the square root on both sides yields the desired result.

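(e) Again, for any x ⪰_{Kⁿ} 0 and y ⪰_{Kⁿ} 0, we have the following inequalities

\[
x_1 - \|x_2\| \ge 0, \qquad y_1 - \|y_2\| \ge 0, \qquad |\langle x_2, y_2 \rangle| \le \|x_2\| \, \|y_2\| \le x_1 y_1. \tag{1.8}
\]

Moreover, we know det(e + x + y) = (1 + x₁ + y₁)² − ‖x₂ + y₂‖², det(e + x) = (1 + x₁)² − ‖x₂‖² and det(e + y) = (1 + y₁)² − ‖y₂‖². Hence,

\[
\begin{aligned}
& \det(e + x) \det(e + y) - \det(e + x + y) \\
&= \left( (1 + x_1)^2 - \|x_2\|^2 \right) \left( (1 + y_1)^2 - \|y_2\|^2 \right) - \left( (1 + x_1 + y_1)^2 - \|x_2 + y_2\|^2 \right) \\
&= 2 x_1 y_1 + 2 \langle x_2, y_2 \rangle + 2 x_1 y_1^2 + 2 x_1^2 y_1 - 2 y_1 \|x_2\|^2 - 2 x_1 \|y_2\|^2 \\
&\quad + x_1^2 y_1^2 - y_1^2 \|x_2\|^2 - x_1^2 \|y_2\|^2 + \|x_2\|^2 \|y_2\|^2 \\
&= 2 \left( x_1 y_1 + \langle x_2, y_2 \rangle \right) + 2 x_1 \left( y_1^2 - \|y_2\|^2 \right) + 2 y_1 \left( x_1^2 - \|x_2\|^2 \right) + \left( x_1^2 - \|x_2\|^2 \right) \left( y_1^2 - \|y_2\|^2 \right) \\
&\ge 0,
\end{aligned}
\]

where we multiply out all the expansions to obtain the second equality, and the last inequality holds by (1.8).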
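As a sanity check, the five inequalities of Proposition 1.2 can be evaluated on a concrete pair of points. The sketch below is only a numerical illustration, not a proof; the sample points x, y, the value α = 0.3 and the helper names are our own choices.

```python
def jordan(x, y):
    """Jordan product (1.5)."""
    inner = sum(a * b for a, b in zip(x, y))
    return [inner] + [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]

def det(x):
    return x[0] ** 2 - sum(t * t for t in x[1:])

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

e = [1.0, 0.0, 0.0]
x = [2.0, 1.0, -1.0]   # in K^3, since ||(1, -1)|| = sqrt(2) <= 2
y = [3.0, 0.5, 2.0]    # in K^3, since ||(0.5, 2)|| < 3
alpha = 0.3

checks = {
    "(a)": det(add(x, y)) >= det(x) + det(y),
    "(b)": det(jordan(x, y)) <= det(x) * det(y),
    "(c)": det(add(scale(alpha, x), scale(1 - alpha, y)))
           >= alpha ** 2 * det(x) + (1 - alpha) ** 2 * det(y),
    "(d)": det(add(e, x)) ** 0.5 >= 1 + det(x) ** 0.5,
    "(e)": det(add(add(e, x), y)) <= det(add(e, x)) * det(add(e, y)),
}
print(checks)
```

For these points every entry of `checks` is `True`, with a comfortable margin in each inequality.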

Proposition 1.2(c) can be extended to a more general case:

\[
\det(\alpha x + \beta y) \ \ge\ \alpha^2 \det(x) + \beta^2 \det(y) \qquad \forall \alpha \ge 0, \ \beta \ge 0.
\]

Note that, by combining the Cauchy-Schwarz inequality with properties of the determinant, one may find other ways to verify Proposition 1.2. Again, the approach here is only one possible proof, chosen because it is straightforward and intuitive. There are more inequalities about the determinant, see Proposition 1.8 and Proposition 2.32, which are established by using the concept of SOC-convexity that will be introduced in Chapter 2. Next, we move to the inequalities about the trace.

Proposition 1.3. For any x, y ∈ IRⁿ, we have

(a) tr(x + y) = tr(x) + tr(y) and tr(αx) = α tr(x) for any α ∈ IR. In other words, tr(·) is a linear function on IRⁿ.

(b) λ₁(x)λ₂(y) + λ₁(y)λ₂(x) ≤ tr(x ◦ y) ≤ λ₁(x)λ₁(y) + λ₂(x)λ₂(y).

Proof. Part (a) is trivial and it remains to verify part (b). Using the fact that tr(x ◦ y) = 2⟨x, y⟩, we obtain

\[
\begin{aligned}
\lambda_1(x) \lambda_2(y) + \lambda_1(y) \lambda_2(x) &= (x_1 - \|x_2\|)(y_1 + \|y_2\|) + (x_1 + \|x_2\|)(y_1 - \|y_2\|) \\
&= 2 \left( x_1 y_1 - \|x_2\| \, \|y_2\| \right) \\
&\le 2 \left( x_1 y_1 + \langle x_2, y_2 \rangle \right) \\
&= 2 \langle x, y \rangle \\
&= \operatorname{tr}(x \circ y) \\
&\le 2 \left( x_1 y_1 + \|x_2\| \, \|y_2\| \right) \\
&= (x_1 - \|x_2\|)(y_1 - \|y_2\|) + (x_1 + \|x_2\|)(y_1 + \|y_2\|) \\
&= \lambda_1(x) \lambda_1(y) + \lambda_2(x) \lambda_2(y),
\end{aligned}
\]

which completes the proof.

In general, det(x ◦ y) ≠ det(x) det(y) unless x₂ = αy₂ for some α ∈ IR. A vector x = (x₁, x₂) ∈ IR × IRⁿ⁻¹ is said to be invertible if det(x) ≠ 0. If x is invertible, then there exists a unique y = (y₁, y₂) ∈ IR × IRⁿ⁻¹ satisfying x ◦ y = y ◦ x = e. We call this y the inverse of x and denote it by x⁻¹. In fact, we have

\[
x^{-1} = \frac{1}{x_1^2 - \|x_2\|^2} \, (x_1, -x_2) = \frac{1}{\det(x)} \left( \operatorname{tr}(x) e - x \right).
\]

Therefore, x ∈ int(Kⁿ) if and only if x⁻¹ ∈ int(Kⁿ). Moreover, if x ∈ int(Kⁿ), then x^{−k} = (x^k)⁻¹ = (x⁻¹)^k is also well-defined. For any x ∈ Kⁿ, it is known that there exists a unique vector in Kⁿ, denoted by x^{1/2} (and sometimes by √x), such that (x^{1/2})² = x^{1/2} ◦ x^{1/2} = x. Indeed,

\[
x^{1/2} = \left( s, \ \frac{x_2}{2s} \right), \qquad \text{where } s = \sqrt{ \frac{1}{2} \left( x_1 + \sqrt{x_1^2 - \|x_2\|^2} \right) }.
\]

In the above formula, the term x₂/(2s) is defined to be the zero vector if s = 0 (and hence x₂ = 0), i.e., x = 0.
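The closed-form expressions for x⁻¹ and x^{1/2} can be verified against the Jordan product. The following sketch assumes x ∈ int(Kⁿ) for the inverse and x ∈ Kⁿ for the square root; the function names `inverse` and `sqrt_soc` are ours.

```python
import math

def jordan(x, y):
    """Jordan product (1.5)."""
    inner = sum(a * b for a, b in zip(x, y))
    return [inner] + [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]

def inverse(x):
    """x^{-1} = (x1, -x2) / det(x); requires det(x) != 0."""
    d = x[0] ** 2 - sum(t * t for t in x[1:])
    return [x[0] / d] + [-t / d for t in x[1:]]

def sqrt_soc(x):
    """x^{1/2} = (s, x2 / (2s)) for x in K^n, with s as in the text."""
    d = x[0] ** 2 - sum(t * t for t in x[1:])
    s = math.sqrt(0.5 * (x[0] + math.sqrt(max(d, 0.0))))
    if s == 0.0:                      # happens only for x = 0
        return [0.0] * len(x)
    return [s] + [t / (2 * s) for t in x[1:]]

x = [5.0, 3.0, 0.0]                   # in int(K^3): det(x) = 16 > 0
prod = jordan(x, inverse(x))          # should be close to e = (1, 0, 0)
r = sqrt_soc(x)
square = jordan(r, r)                 # should be close to x
print(prod)
print(square)
```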

For any x ∈ IRⁿ, we always have x² ∈ Kⁿ (i.e., x² ⪰_{Kⁿ} 0). Hence, there exists a unique vector (x²)^{1/2} ∈ Kⁿ, denoted by |x|. It is easy to verify that |x| ⪰_{Kⁿ} 0 and x² = |x|² for any x ∈ IRⁿ. It is also known that |x| ⪰_{Kⁿ} x. For any x ∈ IRⁿ, we define [x]₊ to be the projection of x onto Kⁿ, in analogy with the projection onto IRⁿ₊. In other words, [x]₊ is the optimal solution of the parametric SOCP:

\[
[x]_+ = \operatorname{argmin} \left\{ \|x - y\| \ \middle|\ y \in \mathcal{K}^n \right\}.
\]

Here the norm is the Euclidean norm, since the Jordan product does not induce a norm. Likewise, [x]₋ means the projection of x onto −Kⁿ, which implies [x]₋ = −[−x]₊. It is well known that [x]₊ = ½(x + |x|) and [x]₋ = ½(x − |x|), see Property 1.2(f).

The spectral decomposition along with the Jordan algebra associated with SOC entails some basic properties as below. We omit the proofs since they can be found in [61, 63].

Property 1.1. For any x = (x₁, x₂) ∈ IR × IRⁿ⁻¹ with the spectral values λ₁(x), λ₂(x) and spectral vectors u_x⁽¹⁾, u_x⁽²⁾ given as in (1.3)-(1.4), we have

(a) u_x⁽¹⁾ and u_x⁽²⁾ are orthogonal under the Jordan product and have length 1/√2, i.e.,

\[
u_x^{(1)} \circ u_x^{(2)} = 0, \qquad \| u_x^{(1)} \| = \| u_x^{(2)} \| = \frac{1}{\sqrt{2}}.
\]

(b) u_x⁽¹⁾ and u_x⁽²⁾ are idempotent under the Jordan product, i.e., u_x⁽ⁱ⁾ ◦ u_x⁽ⁱ⁾ = u_x⁽ⁱ⁾, i = 1, 2.

(c) λ₁(x), λ₂(x) are nonnegative (positive) if and only if x ∈ Kⁿ (x ∈ int(Kⁿ)), i.e.,

λ_i(x) ≥ 0 for i = 1, 2 ⇐⇒ x ⪰_{Kⁿ} 0,

λ_i(x) > 0 for i = 1, 2 ⇐⇒ x ≻_{Kⁿ} 0.

Although the converse of Proposition 1.1(b) does not hold as mentioned earlier, Property 1.1(c) is useful in verifying whether a point x belongs to Kⁿ or not.

Property 1.2. For any x = (x₁, x₂) ∈ IR × IRⁿ⁻¹ with the spectral values λ₁(x), λ₂(x) and spectral vectors u_x⁽¹⁾, u_x⁽²⁾ given as in (1.3)-(1.4), we have

(a) x² = λ₁(x)² u_x⁽¹⁾ + λ₂(x)² u_x⁽²⁾ and x⁻¹ = λ₁(x)⁻¹ u_x⁽¹⁾ + λ₂(x)⁻¹ u_x⁽²⁾.

(b) If x ∈ Kⁿ, then x^{1/2} = √λ₁(x) u_x⁽¹⁾ + √λ₂(x) u_x⁽²⁾.

(c) |x| = |λ₁(x)| u_x⁽¹⁾ + |λ₂(x)| u_x⁽²⁾.

(d) [x]₊ = [λ₁(x)]₊ u_x⁽¹⁾ + [λ₂(x)]₊ u_x⁽²⁾ and [x]₋ = [λ₁(x)]₋ u_x⁽¹⁾ + [λ₂(x)]₋ u_x⁽²⁾.

(e) |x| = [x]₊ + [−x]₊ = [x]₊ − [x]₋.

(f) [x]₊ = ½(x + |x|) and [x]₋ = ½(x − |x|).
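Parts (c), (d) and (f) of Property 1.2 suggest a simple way to compute |x| and the projections [x]₊, [x]₋: apply the corresponding scalar operation to the spectral values. The sketch below illustrates this under the assumption n ≥ 2; the helper names `spectral` and `apply` are ours, and for scalars we use [t]₊ = max(t, 0) and [t]₋ = min(t, 0).

```python
import math

def spectral(x):
    """Spectral values and vectors from (1.3)-(1.4); assumes len(x) >= 2."""
    x1, x2 = x[0], x[1:]
    r = math.sqrt(sum(t * t for t in x2))
    w = [t / r for t in x2] if r > 0 else [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * t for t in w]
    u2 = [0.5] + [0.5 * t for t in w]
    return (x1 - r, x1 + r), (u1, u2)

def apply(g, x):
    """Return g(lambda_1) u1 + g(lambda_2) u2 for a scalar function g."""
    (l1, l2), (u1, u2) = spectral(x)
    return [g(l1) * a + g(l2) * b for a, b in zip(u1, u2)]

x = [1.0, 3.0, 4.0]                           # spectral values -4 and 6: x is not in K^3
abs_x = apply(abs, x)                         # |x|, Property 1.2(c)
proj = apply(lambda t: max(t, 0.0), x)        # [x]_+, Property 1.2(d)
proj_neg = apply(lambda t: min(t, 0.0), x)    # [x]_-, Property 1.2(d)
half_sum = [0.5 * (a + b) for a, b in zip(x, abs_x)]   # (x + |x|)/2, Property 1.2(f)
print(abs_x)
print(proj, half_sum)
print(proj_neg)
```

For this x one finds |x| ≈ (5, 0.6, 0.8), [x]₊ ≈ (3, 1.8, 2.4), which lies on the boundary of K³, and [x]₊ + [x]₋ = x.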

Property 1.3. Let x = (x₁, x₂) ∈ IR × IRⁿ⁻¹ and y = (y₁, y₂) ∈ IR × IRⁿ⁻¹. Then, the following hold.

(a) Any x ∈ IRⁿ satisfies |x| ⪰_{Kⁿ} x.

(b) For any x, y ⪰_{Kⁿ} 0, if x ⪰_{Kⁿ} y, then x^{1/2} ⪰_{Kⁿ} y^{1/2}.

(c) For any x, y ∈ IRⁿ, if x² ⪰_{Kⁿ} y², then |x| ⪰_{Kⁿ} |y|.

(d) For any x ∈ IRⁿ, x ⪰_{Kⁿ} 0 if and only if ⟨x, y⟩ ≥ 0 for all y ⪰_{Kⁿ} 0.

(e) For any x ⪰_{Kⁿ} 0 and y ∈ IRⁿ, if x² ⪰_{Kⁿ} y², then x ⪰_{Kⁿ} y.

Note that for any x, y ≻_{Kⁿ} 0, if x ⪰_{Kⁿ} y, one can also conclude that x⁻¹ ⪯_{Kⁿ} y⁻¹. However, a direct verification of this fact is not trivial. We establish it by another approach, see Proposition 2.3(a).

Property 1.4. For any x = (x₁, x₂) ∈ IR × IRⁿ⁻¹ with spectral values λ₁(x), λ₂(x) and any y = (y₁, y₂) ∈ IR × IRⁿ⁻¹ with spectral values λ₁(y), λ₂(y), we have

\[
|\lambda_i(x) - \lambda_i(y)| \le \sqrt{2} \, \|x - y\|, \qquad i = 1, 2.
\]

Proof. First, we compute that

\[
\begin{aligned}
|\lambda_1(x) - \lambda_1(y)| &= \bigl| x_1 - \|x_2\| - y_1 + \|y_2\| \bigr| \\
&\le |x_1 - y_1| + \bigl| \|x_2\| - \|y_2\| \bigr| \\
&\le |x_1 - y_1| + \|x_2 - y_2\| \\
&\le \sqrt{2} \left( |x_1 - y_1|^2 + \|x_2 - y_2\|^2 \right)^{1/2} \\
&= \sqrt{2} \, \|x - y\|,
\end{aligned}
\]

where the second inequality uses ‖x₂‖ ≤ ‖x₂ − y₂‖ + ‖y₂‖ and ‖y₂‖ ≤ ‖x₂ − y₂‖ + ‖x₂‖; the last inequality uses the relation between the 1-norm and the 2-norm. A similar argument applies to |λ₂(x) − λ₂(y)|.

In fact, Properties 1.1-1.3 parallel the corresponding results for the positive semidefinite cone S₊ⁿ, see [74]. Even though both Kⁿ and S₊ⁿ belong to the family of symmetric cones [61] and share similar properties, as we will see, the ideas and techniques for proving these results are quite different. One reason is that the Jordan product is not associative, as mentioned earlier.

1.2 SOC function and SOC trace function

In this section, we introduce two types of functions, the SOC function and the SOC trace function, which are very useful in dealing with optimization problems involving SOC. Some inequalities are established in light of these functions.


Let x = (x₁, x₂) ∈ IR × IRⁿ⁻¹ with spectral values λ₁(x), λ₂(x) given as in (1.3) and spectral vectors u_x⁽¹⁾, u_x⁽²⁾ given as in (1.4). We first define its corresponding SOC function as below. For any real-valued function f : IR → IR, the following vector-valued function associated with Kⁿ (n ≥ 1) was considered in [45, 63]:

\[
f^{\mathrm{soc}}(x) := f(\lambda_1(x)) \, u_x^{(1)} + f(\lambda_2(x)) \, u_x^{(2)}, \qquad \forall x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}. \tag{1.9}
\]

The definition (1.9) is unambiguous whether x₂ ≠ 0 or x₂ = 0. The cases f^soc(x) = x^{1/2}, x², exp(x), which correspond to f(t) = t^{1/2}, t², e^t, are already discussed in the book [61].
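Definition (1.9) translates almost literally into code. The sketch below, which assumes n ≥ 2 and uses our own helper names, evaluates f^soc for f(t) = t² and f(t) = e^t, compares the former with the Jordan square x ◦ x, and shows that for x₂ = 0 the arbitrary unit vector w of (1.4) cancels out.

```python
import math

def spectral(x):
    """Spectral values and vectors from (1.3)-(1.4); assumes len(x) >= 2."""
    x1, x2 = x[0], x[1:]
    r = math.sqrt(sum(t * t for t in x2))
    w = [t / r for t in x2] if r > 0 else [1.0] + [0.0] * (len(x2) - 1)
    u1 = [0.5] + [-0.5 * t for t in w]
    u2 = [0.5] + [0.5 * t for t in w]
    return (x1 - r, x1 + r), (u1, u2)

def f_soc(f, x):
    """SOC function (1.9): f(lambda_1) u1 + f(lambda_2) u2."""
    (l1, l2), (u1, u2) = spectral(x)
    return [f(l1) * a + f(l2) * b for a, b in zip(u1, u2)]

def jordan(x, y):
    """Jordan product (1.5)."""
    inner = sum(a * b for a, b in zip(x, y))
    return [inner] + [y[0] * a + x[0] * b for a, b in zip(x[1:], y[1:])]

x = [2.0, 1.0, -2.0]
square = f_soc(lambda t: t * t, x)       # f(t) = t^2 should reproduce x ∘ x
print(square, jordan(x, x))
print(f_soc(math.exp, x))                # exp(x) in the SOC sense

# x2 = 0: the contributions of w cancel, leaving f(x1) * e
at_axis = f_soc(math.exp, [1.0, 0.0, 0.0])
print(at_axis)
```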

Indeed, the above definition (1.9) is analogous to the one associated with the semidefinite cone S₊ⁿ, see [140, 145]. For subsequent analysis, we also need the concept of SOC trace function [46] defined by

\[
f^{\mathrm{tr}}(x) := f(\lambda_1(x)) + f(\lambda_2(x)) = \operatorname{tr}(f^{\mathrm{soc}}(x)). \tag{1.10}
\]

If f is defined only on a subset of IR, then f^soc and f^tr are defined on the corresponding subset of IRⁿ. More specifically, from Proposition 1.4 shown below, we see that the corresponding subset for f^soc and f^tr is

\[
S = \left\{ x \in \mathbb{R}^n \ \middle|\ \lambda_i(x) \in J, \ i = 1, 2 \right\} \tag{1.11}
\]

provided f is defined on a subset J ⊆ IR. In addition, S is open in IRⁿ whenever J is open in IR. To see this assertion, we need the following technical lemma.

Lemma 1.1. Let A ∈ IR^{m×m} be a symmetric positive definite matrix, C ∈ IR^{n×n} be a symmetric matrix, and B ∈ IR^{m×n}. Then,

\[
\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq O \iff C - B^T A^{-1} B \succeq O \tag{1.12}
\]

and

\[
\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succ O \iff C - B^T A^{-1} B \succ O. \tag{1.13}
\]

Proof. This is indeed the Schur Complement Theorem; see [21, 22, 74] for a proof.

Proposition 1.4. For any given f : J ⊆ IR → IR, let f^soc : S → IRⁿ and f^tr : S → IR be given by (1.9) and (1.10), respectively. Assume that J is open. Then, the following results hold.

(a) The domain S of f^soc and f^tr is also open.

(b) If f is (continuously) differentiable on J, then f^soc is (continuously) differentiable on S. Moreover, for any x ∈ S, ∇f^soc(x) = f′(x₁)I if x₂ = 0, and otherwise

\[
\nabla f^{\mathrm{soc}}(x) =
\begin{bmatrix}
b(x) & c(x) \dfrac{x_2^T}{\|x_2\|} \\[3mm]
c(x) \dfrac{x_2}{\|x_2\|} & a(x) I + (b(x) - a(x)) \dfrac{x_2 x_2^T}{\|x_2\|^2}
\end{bmatrix}, \tag{1.14}
\]

where

\[
a(x) = \frac{f(\lambda_2(x)) - f(\lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}, \qquad
b(x) = \frac{f'(\lambda_2(x)) + f'(\lambda_1(x))}{2}, \qquad
c(x) = \frac{f'(\lambda_2(x)) - f'(\lambda_1(x))}{2}.
\]

(c) If f is (continuously) differentiable, then f^tr is (continuously) differentiable on S with ∇f^tr(x) = 2(f′)^soc(x); if f is twice (continuously) differentiable, then f^tr is twice (continuously) differentiable on S with ∇²f^tr(x) = 2∇(f′)^soc(x).
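Formula (1.14) can be compared with a finite-difference approximation. The sketch below does this for f(t) = e^t at one sample point; the step size h, the sample point and all function names are our own choices, and the check is a numerical illustration rather than a proof. Since the matrix in (1.14) is symmetric, it does not matter here whether one compares with the Jacobian or its transpose.

```python
import math

def spectral_values(x):
    r = math.sqrt(sum(t * t for t in x[1:]))
    return x[0] - r, x[0] + r, r

def f_soc(f, x):
    """f^soc(x) = ((f(l1)+f(l2))/2, (f(l2)-f(l1))/2 * x2/||x2||), equivalent to (1.9)."""
    l1, l2, r = spectral_values(x)
    w = [t / r for t in x[1:]] if r > 0 else [0.0] * (len(x) - 1)
    return [0.5 * (f(l1) + f(l2))] + [0.5 * (f(l2) - f(l1)) * t for t in w]

def grad_f_soc(f, df, x):
    """Matrix (1.14) when x2 != 0, and f'(x1) I when x2 = 0."""
    n = len(x)
    l1, l2, r = spectral_values(x)
    if r == 0.0:
        return [[df(x[0]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    w = [t / r for t in x[1:]]
    a = (f(l2) - f(l1)) / (l2 - l1)
    b = 0.5 * (df(l2) + df(l1))
    c = 0.5 * (df(l2) - df(l1))
    G = [[0.0] * n for _ in range(n)]
    G[0][0] = b
    for i in range(1, n):
        G[0][i] = G[i][0] = c * w[i - 1]
        for j in range(1, n):
            G[i][j] = (a if i == j else 0.0) + (b - a) * w[i - 1] * w[j - 1]
    return G

def numeric_jacobian(f, x, h=1e-6):
    """Central finite differences of f^soc, one coordinate at a time."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f_soc(f, xp), f_soc(f, xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x = [1.0, 0.5, -0.3]
G = grad_f_soc(math.exp, math.exp, x)    # f = f' = exp
J = numeric_jacobian(math.exp, x)
err = max(abs(G[i][j] - J[i][j]) for i in range(3) for j in range(3))
print(err)                               # small: the two matrices agree numerically
```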

Proof. (a) Fix any x ∈ S. Then λ₁(x), λ₂(x) ∈ J. Since J is an open subset of IR, there exist δ₁, δ₂ > 0 such that {t ∈ IR | |t − λ₁(x)| < δ₁} ⊆ J and {t ∈ IR | |t − λ₂(x)| < δ₂} ⊆ J. Let δ := min{δ₁, δ₂}/√2. Then, for any y satisfying ‖y − x‖ < δ, we have |λ₁(y) − λ₁(x)| < δ₁ and |λ₂(y) − λ₂(x)| < δ₂ by noting that

\[
\begin{aligned}
& (\lambda_1(x) - \lambda_1(y))^2 + (\lambda_2(x) - \lambda_2(y))^2 \\
&= 2 \left( x_1^2 + \|x_2\|^2 \right) + 2 \left( y_1^2 + \|y_2\|^2 \right) - 4 \left( x_1 y_1 + \|x_2\| \, \|y_2\| \right) \\
&\le 2 \left( x_1^2 + \|x_2\|^2 \right) + 2 \left( y_1^2 + \|y_2\|^2 \right) - 4 \left( x_1 y_1 + \langle x_2, y_2 \rangle \right) \\
&= 2 \left( \|x\|^2 + \|y\|^2 - 2 \langle x, y \rangle \right) \\
&= 2 \|x - y\|^2,
\end{aligned}
\]

and consequently λ₁(y) ∈ J and λ₂(y) ∈ J. Since f is a function from J to IR, this means that {y ∈ IRⁿ | ‖y − x‖ < δ} ⊆ S, and therefore the set S is open. In addition, from the above, we see that S is characterized as in (1.11).

(b) The arguments are similar to those of Proposition 1.13 and Proposition 1.14 in Section 1.3; see them for details.

(c) If f is (continuously) differentiable, then from part (b) and f^tr(x) = 2⟨e, f^soc(x)⟩ it follows that f^tr is (continuously) differentiable. In addition, a simple computation yields that ∇f^tr(x) = 2∇f^soc(x)e = 2(f′)^soc(x). Similarly, by part (b), the second part follows.


Proposition 1.5. For any f : J → IR, let f^soc : S → IRⁿ and f^tr : S → IR be given by (1.9) and (1.10), respectively. Assume that J is open. If f is twice differentiable on J, then

(a) f″(t) ≥ 0 for any t ∈ J ⇐⇒ ∇(f′)^soc(x) ⪰ O for any x ∈ S ⇐⇒ f^tr is convex in S.

(b) f″(t) > 0 for any t ∈ J ⇐⇒ ∇(f′)^soc(x) ≻ O for any x ∈ S =⇒ f^tr is strictly convex in S.

Proof. (a) By Proposition 1.4(c), ∇²f^tr(x) = 2∇(f′)^soc(x) for any x ∈ S, and the second equivalence follows by [20, Prop. B.4(a) and (c)]. We next come to the first equivalence. By Proposition 1.4(b), for any fixed x ∈ S, ∇(f′)^soc(x) = f″(x₁)I if x₂ = 0, and otherwise ∇(f′)^soc(x) has the same expression as in (1.14) except that

\[
a(x) = \frac{f'(\lambda_2(x)) - f'(\lambda_1(x))}{\lambda_2(x) - \lambda_1(x)}, \qquad
b(x) = \frac{f''(\lambda_2(x)) + f''(\lambda_1(x))}{2}, \qquad
c(x) = \frac{f''(\lambda_2(x)) - f''(\lambda_1(x))}{2}.
\]

Assume that ∇(f′)^soc(x) ⪰ O for any x ∈ S. Then, we readily have b(x) ≥ 0 for any x ∈ S. Noting that b(x) = f″(x₁) when x₂ = 0, we particularly have f″(x₁) ≥ 0 for all x₁ ∈ J, and consequently f″(t) ≥ 0 for all t ∈ J. Conversely, assume that f″(t) ≥ 0 for all t ∈ J. Fix any x ∈ S. Clearly, b(x) ≥ 0 and a(x) ≥ 0. If b(x) = 0, then f″(λ₁(x)) = f″(λ₂(x)) = 0, and consequently c(x) = 0, which in turn implies that

\[
\nabla (f')^{\mathrm{soc}}(x) =
\begin{bmatrix}
0 & 0 \\
0 & a(x) \left( I - \dfrac{x_2 x_2^T}{\|x_2\|^2} \right)
\end{bmatrix} \succeq O. \tag{1.15}
\]

If b(x) > 0, then by the first equivalence of Lemma 1.1 and the expression of ∇(f′)^soc(x) it suffices to argue that the following matrix

\[
a(x) I + (b(x) - a(x)) \frac{x_2 x_2^T}{\|x_2\|^2} - \frac{c^2(x)}{b(x)} \frac{x_2 x_2^T}{\|x_2\|^2} \tag{1.16}
\]

is positive semidefinite. Since the rank-one matrix x₂x₂^T has only one nonzero eigenvalue ‖x₂‖², the (n − 1) × (n − 1) matrix in (1.16) has one eigenvalue a(x) of multiplicity n − 2 and one eigenvalue (b(x)² − c(x)²)/b(x) of multiplicity 1. Since a(x) ≥ 0 and

\[
\frac{b(x)^2 - c(x)^2}{b(x)} = \frac{f''(\lambda_1(x)) \, f''(\lambda_2(x))}{b(x)} \ge 0,
\]

the matrix in (1.16) is positive semidefinite. Since x is arbitrary, we have that ∇(f′)^soc(x) ⪰ O for all x ∈ S.

(b) The first equivalence is direct by using (1.13) of Lemma 1.1, noting that ∇(f′)^soc(x) ≻ O implies a(x) > 0 when x₂ ≠ 0, and following the same arguments as in part (a). The second part is due to [20, Prop. B.4(b)].


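Remark 1.1. Note that the strict convexity of f^tr does not necessarily imply the positive definiteness of ∇²f^tr(x). Consider f(t) = t⁴ for t ∈ IR. We next show that f^tr is strictly convex. Indeed, f^tr is convex in IRⁿ by Proposition 1.5(a) since f″(t) = 12t² ≥ 0. Taking into account that f^tr is continuous, it remains to prove that

\[
f^{\mathrm{tr}}\!\left( \frac{x + y}{2} \right) = \frac{f^{\mathrm{tr}}(x) + f^{\mathrm{tr}}(y)}{2} \ \Longrightarrow\ x = y. \tag{1.17}
\]

Since h(t) = (t₀ + t)⁴ + (t₀ − t)⁴ is increasing on [0, +∞) for any fixed t₀ ∈ IR, and the function f(t) = t⁴ is strictly convex in IR, we have that

\[
\begin{aligned}
f^{\mathrm{tr}}\!\left( \frac{x + y}{2} \right)
&= \left[ \lambda_1\!\left( \frac{x + y}{2} \right) \right]^4 + \left[ \lambda_2\!\left( \frac{x + y}{2} \right) \right]^4 \\
&= \left( \frac{x_1 + y_1 - \|x_2 + y_2\|}{2} \right)^4 + \left( \frac{x_1 + y_1 + \|x_2 + y_2\|}{2} \right)^4 \\
&\le \left( \frac{x_1 + y_1 - \|x_2\| - \|y_2\|}{2} \right)^4 + \left( \frac{x_1 + y_1 + \|x_2\| + \|y_2\|}{2} \right)^4 \\
&= \left( \frac{\lambda_1(x) + \lambda_1(y)}{2} \right)^4 + \left( \frac{\lambda_2(x) + \lambda_2(y)}{2} \right)^4 \\
&\le \frac{(\lambda_1(x))^4 + (\lambda_1(y))^4 + (\lambda_2(x))^4 + (\lambda_2(y))^4}{2} \\
&= \frac{f^{\mathrm{tr}}(x) + f^{\mathrm{tr}}(y)}{2},
\end{aligned}
\]

and moreover, the above inequalities become equalities if and only if

\[
\|x_2 + y_2\| = \|x_2\| + \|y_2\|, \qquad \lambda_1(x) = \lambda_1(y), \qquad \lambda_2(x) = \lambda_2(y).
\]

It is easy to verify that the three equalities hold if and only if x = y. Thus, the implication in (1.17) holds, i.e., f^tr is strictly convex. However, by Proposition 1.5(b), ∇(f′)^soc(x) ≻ O does not hold for all x ∈ IRⁿ since f″(t) > 0 does not hold for all t ∈ IR.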

We point out that the fact that the strict convexity of f implies the strict convexity of f^tr was proved in [7, 15] via the definition of convex function, but here we use the Schur Complement Theorem and the relation between ∇(f⁰)^soc and ∇²f^tr to establish the convexity of SOC trace functions. Next, we illustrate the application of Proposition 1.5 with some SOC trace functions.

Proposition 1.6. The following functions associated with Kⁿ are all strictly convex.

(a) F₁(x) = − ln(det(x)) for x ∈ int(Kⁿ).

(b) F₂(x) = tr(x⁻¹) for x ∈ int(Kⁿ).

(c) F₃(x) = tr(φ(x)) for x ∈ int(Kⁿ), where

\[
\phi(x) =
\begin{cases}
\dfrac{x^{p+1} - e}{p + 1} + \dfrac{x^{1-q} - e}{q - 1} & \text{if } p \in [0, 1], \ q > 1, \\[3mm]
\dfrac{x^{p+1} - e}{p + 1} - \ln x & \text{if } p \in [0, 1], \ q = 1.
\end{cases}
\]

(d) F₄(x) = − ln(det(e − x)) for x ≺_{Kⁿ} e.

(e) F₅(x) = tr((e − x)⁻¹ ◦ x) for x ≺_{Kⁿ} e.

(f) F₆(x) = tr(exp(x)) for x ∈ IRⁿ.

(g) F₇(x) = ln(det(e + exp(x))) for x ∈ IRⁿ.

(h) F₈(x) = tr( (x + (x² + 4e)^{1/2}) / 2 ) for x ∈ IRⁿ.

Proof. Note that F₁(x), F₂(x) and F₃(x) are the SOC trace functions associated with f₁(t) = − ln t (t > 0), f₂(t) = t⁻¹ (t > 0) and f₃(t) (t > 0), respectively, where

\[
f_3(t) =
\begin{cases}
\dfrac{t^{p+1} - 1}{p + 1} + \dfrac{t^{1-q} - 1}{q - 1} & \text{if } p \in [0, 1], \ q > 1, \\[3mm]
\dfrac{t^{p+1} - 1}{p + 1} - \ln t & \text{if } p \in [0, 1], \ q = 1.
\end{cases}
\]

Next, F₄(x) is the SOC trace function associated with f₄(t) = − ln(1 − t) (t < 1), and F₅(x) is the SOC trace function associated with f₅(t) = t/(1 − t) (t < 1) by noting that

\[
(e - x)^{-1} \circ x = \frac{\lambda_1(x)}{1 - \lambda_1(x)} \, u_x^{(1)} + \frac{\lambda_2(x)}{1 - \lambda_2(x)} \, u_x^{(2)}.
\]

In addition, F₆(x) and F₇(x) are the SOC trace functions associated with f₆(t) = exp(t) (t ∈ IR) and f₇(t) = ln(1 + exp(t)) (t ∈ IR), respectively, and F₈(x) is the SOC trace function associated with f₈(t) = ½(t + √(t² + 4)) (t ∈ IR). It is easy to verify that all the functions f₁-f₈ have positive second-order derivatives in their respective domains, and therefore F₁-F₈ are strictly convex functions by Proposition 1.5(b).
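The identification of F₁ and F₂ with SOC trace functions can be checked numerically from (1.10). The sketch below uses two arbitrarily chosen points of int(K³) and our own helper names; the midpoint inequality at the end is only a spot check consistent with strict convexity, not a proof of it.

```python
import math

def lam(x):
    """Spectral values (1.3)."""
    r = math.sqrt(sum(t * t for t in x[1:]))
    return x[0] - r, x[0] + r

def f_tr(f, x):
    """SOC trace function (1.10): f(lambda_1) + f(lambda_2)."""
    l1, l2 = lam(x)
    return f(l1) + f(l2)

def det(x):
    return x[0] ** 2 - sum(t * t for t in x[1:])

def F1(x):
    return f_tr(lambda t: -math.log(t), x)

x = [3.0, 1.0, 2.0]     # det(x) = 4 > 0 and x1 > 0, so x is in int(K^3)
y = [2.0, -1.0, 0.5]    # det(y) = 2.75 > 0 and y1 > 0

# F1(x) = -ln(det(x)) since ln(l1) + ln(l2) = ln(l1 * l2)
print(F1(x), -math.log(det(x)))

# F2(x) = tr(x^{-1}) = 2 * x1 / det(x), using x^{-1} = (x1, -x2) / det(x)
F2 = f_tr(lambda t: 1.0 / t, x)
print(F2, 2 * x[0] / det(x))

# Midpoint check: F1((x + y)/2) < (F1(x) + F1(y)) / 2
mid = [0.5 * (a + b) for a, b in zip(x, y)]
print(F1(mid), 0.5 * (F1(x) + F1(y)))
```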

The functions F₁, F₂ and F₃ are popular barrier functions which play a key role in the development of interior point methods for SOCPs, see, e.g., [14, 19, 109, 123, 146], where F₃ covers a wide range of barrier functions, including the classical logarithmic barrier function, the self-regular functions and the non-self-regular functions; see [14] for details. The functions F₄ and F₅ are popular shifted barrier functions [6, 7, 9] for SOCPs. The functions F₆-F₈ can be used as penalty functions for SOCPs; they are added to the objective of an SOCP to force the solution to be feasible.

Besides the application in establishing convexity for SOC trace functions, the Schur Complement Theorem can be employed to establish convexity of some compound functions of SOC trace functions and scalar-valued functions, which is usually difficult to achieve by checking the definition of convexity directly. The following proposition presents such an application.

Proposition 1.7. For any x ∈ Kⁿ, let F₉(x) := −[det(x)]^{1/p} with p > 1. Then,

(a) F₉ is twice continuously differentiable in int(Kⁿ).

(b) F₉ is convex when p ≥ 2, and moreover, it is strictly convex when p > 2.

Proof. (a) Note that −F₉(x) = exp(p⁻¹ ln(det(x))) for any x ∈ int(Kⁿ), and ln(det(x)) = f^tr(x) with f(t) = ln(t) for t ∈ IR₊₊. By Proposition 1.4(c), ln(det(x)) is twice continuously differentiable in int(Kⁿ). Hence −F₉(x) is twice continuously differentiable in int(Kⁿ). The result then follows.

(b) In view of the continuity of F₉, we only need to prove its convexity over int(Kⁿ). By part (a), we achieve this goal by proving that the Hessian matrix ∇²F₉(x) at any x ∈ int(Kⁿ) is positive semidefinite when p ≥ 2, and positive definite when p > 2. Fix any x ∈ int(Kⁿ). From direct computations, we obtain

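\[
\nabla F_9(x) = -\frac{1}{p}
\begin{bmatrix}
(2 x_1) \left( x_1^2 - \|x_2\|^2 \right)^{\frac{1}{p} - 1} \\[2mm]
(-2 x_2) \left( x_1^2 - \|x_2\|^2 \right)^{\frac{1}{p} - 1}
\end{bmatrix}
\]

and

\[
\nabla^2 F_9(x) = \frac{p - 1}{p^2} \left( \det(x) \right)^{\frac{1}{p} - 2}
\begin{bmatrix}
4 x_1^2 - \dfrac{2p \left( x_1^2 - \|x_2\|^2 \right)}{p - 1} & -4 x_1 x_2^T \\[3mm]
-4 x_1 x_2 & 4 x_2 x_2^T + \dfrac{2p \left( x_1^2 - \|x_2\|^2 \right)}{p - 1} I
\end{bmatrix}.
\]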

Since x ∈ int(Kⁿ), we have x₁ > 0 and det(x) = x₁² − ‖x₂‖² > 0. Define

\[
a_1(x) := 4 x_1^2 - \frac{2p \left( x_1^2 - \|x_2\|^2 \right)}{p - 1} = \left( 4 - \frac{2p}{p - 1} \right) x_1^2 + \frac{2p}{p - 1} \|x_2\|^2.
\]

We next proceed by considering the following two cases: a₁(x) = 0 or a₁(x) > 0.

Case 1: a₁(x) = 0. Since p ≥ 2, in this case we must have x₂ = 0, and consequently,

\[
\nabla^2 F_9(x) = \frac{p - 1}{p^2} (x_1)^{\frac{2}{p} - 4}
\begin{bmatrix}
0 & 0 \\
0 & \dfrac{2p}{p - 1} x_1^2 I
\end{bmatrix} \succeq O.
\]

Case 2: a₁(x) > 0. In this case, we calculate that

\[
\begin{aligned}
& \left[ 4 x_1^2 - \frac{2p \left( x_1^2 - \|x_2\|^2 \right)}{p - 1} \right] \left[ 4 x_2 x_2^T + \frac{2p \left( x_1^2 - \|x_2\|^2 \right)}{p - 1} I \right] - 16 x_1^2 x_2 x_2^T \\
&= \frac{4p \left( x_1^2 - \|x_2\|^2 \right)}{p - 1} \left[ \frac{p - 2}{p - 1} x_1^2 I + \frac{p}{p - 1} \|x_2\|^2 I - 2 x_2 x_2^T \right].
\end{aligned} \tag{1.18}
\]

Since the rank-one matrix 2x₂x₂^T has only one nonzero eigenvalue 2‖x₂‖², the (n − 1) × (n − 1) matrix in the bracket on the right-hand side of (1.18) has one eigenvalue of multiplicity 1 given by

\[
\frac{p - 2}{p - 1} x_1^2 + \frac{p}{p - 1} \|x_2\|^2 - 2 \|x_2\|^2 = \frac{p - 2}{p - 1} \left( x_1^2 - \|x_2\|^2 \right) \ge 0,
\]

and one eigenvalue of multiplicity n − 2 given by

\[
\frac{p - 2}{p - 1} x_1^2 + \frac{p}{p - 1} \|x_2\|^2 \ge 0.
\]

Furthermore, we see that these eigenvalues must be positive when p > 2 since x₁² > 0 and x₁² − ‖x₂‖² > 0. This means that the matrix on the right-hand side of (1.18) is positive semidefinite, and moreover, it is positive definite when p > 2. Applying Lemma 1.1, we have that ∇²F₉(x) ⪰ O, and furthermore ∇²F₉(x) ≻ O when p > 2.

Since a₁(x) > 0 must hold when p > 2, the arguments above show that F₉(x) is convex over int(Kⁿ) when p ≥ 2, and strictly convex over int(Kⁿ) when p > 2.

It is worthwhile to point out that det(x) is neither convex nor concave on Kⁿ, and it is difficult to establish the convexity of compound functions involving det(x) directly from the definition of a convex function. The SOC trace function, however, offers a simple way to prove their convexity. Moreover, it helps establish more inequalities associated with SOC. Some of these inequalities have been used to analyze the properties of the SOC function f^soc [41] and the convergence of interior point methods for SOCPs [7].

Proposition 1.8. For any x ⪰_{Kⁿ} 0 and y ⪰_{Kⁿ} 0, the following inequalities hold.

(a) det(αx + (1 − α)y) ≥ (det(x))^α (det(y))^{1−α} for any 0 < α < 1.

(b) det(x + y)^{1/p} ≥ 2^{2/p − 1} (det(x)^{1/p} + det(y)^{1/p}) for any p ≥ 2.

(c) det(αx + (1 − α)y) ≥ α² det(x) + (1 − α)² det(y) for any 0 < α < 1.

(d) [det(e + x)]^{1/2} ≥ 1 + det(x)^{1/2}.

(e) det(x)^{1/2} = inf { ½ tr(x ◦ y) | det(y) = 1, y ⪰_{Kⁿ} 0 }. Furthermore, when x ≻_{Kⁿ} 0, the same relation holds with inf replaced by min.

(f) tr(x ◦ y) ≥ 2 det(x)^{1/2} det(y)^{1/2}.

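Proof. (a) From Proposition 1.6(a), we know that ln(det(x)) is strictly concave in int(Kⁿ). With this, we have

\[
\begin{aligned}
\ln(\det(\alpha x + (1 - \alpha) y)) &\ge \alpha \ln(\det(x)) + (1 - \alpha) \ln(\det(y)) \\
&= \ln(\det(x)^{\alpha}) + \ln(\det(y)^{1 - \alpha})
\end{aligned}
\]

for any 0 < α < 1 and x, y ∈ int(Kⁿ). This, together with the monotonicity of ln t (t > 0) and the continuity of det(x), implies the desired result.

(b) By Proposition 1.7(b), det(x)^{1/p} is concave over Kⁿ. Then, for any x, y ∈ Kⁿ, we have

\[
\begin{aligned}
& \det\!\left( \frac{x + y}{2} \right)^{1/p} \ge \frac{1}{2} \left( \det(x)^{1/p} + \det(y)^{1/p} \right) \\
\iff\ & 2 \left[ \left( \frac{x_1 + y_1}{2} \right)^2 - \left\| \frac{x_2 + y_2}{2} \right\|^2 \right]^{1/p} \ge \left( x_1^2 - \|x_2\|^2 \right)^{1/p} + \left( y_1^2 - \|y_2\|^2 \right)^{1/p} \\
\iff\ & \left[ (x_1 + y_1)^2 - \|x_2 + y_2\|^2 \right]^{1/p} \ge \frac{4^{1/p}}{2} \left[ \left( x_1^2 - \|x_2\|^2 \right)^{1/p} + \left( y_1^2 - \|y_2\|^2 \right)^{1/p} \right] \\
\iff\ & \det(x + y)^{1/p} \ge 2^{\frac{2}{p} - 1} \left( \det(x)^{1/p} + \det(y)^{1/p} \right),
\end{aligned}
\]

which is the desired result.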

(c) Using the inequality in part (b) with p = 2, we have

\[
\det(x + y)^{1/2} \ge \det(x)^{1/2} + \det(y)^{1/2}.
\]

Squaring both sides yields

\[
\det(x + y) \ge \det(x) + \det(y) + 2 \det(x)^{1/2} \det(y)^{1/2} \ge \det(x) + \det(y),
\]

where the last inequality is by the nonnegativity of det(x) and det(y) since x, y ∈ Kⁿ. This together with the fact det(αx) = α² det(x) leads to the desired result.

(d) This inequality is presented in Proposition 1.2(d). Nonetheless, we provide a different approach by applying part (b) with p = 2 and the fact that det(e) = 1.

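(e) From Proposition 1.3(b), we have

\[
\operatorname{tr}(x \circ y) \ge \lambda_1(x) \lambda_2(y) + \lambda_1(y) \lambda_2(x), \qquad \forall x, y \in \mathbb{R}^n.
\]

For any x, y ∈ Kⁿ, this along with the arithmetic-geometric mean inequality implies that

\[
\frac{\operatorname{tr}(x \circ y)}{2} \ \ge\ \frac{\lambda_1(x) \lambda_2(y) + \lambda_1(y) \lambda_2(x)}{2} \ \ge\ \sqrt{\lambda_1(x) \lambda_2(y) \lambda_1(y) \lambda_2(x)} = \det(x)^{1/2} \det(y)^{1/2},
\]

which means that

\[
\inf \left\{ \frac{1}{2} \operatorname{tr}(x \circ y) \ \middle|\ \det(y) = 1, \ y \succeq_{\mathcal{K}^n} 0 \right\} = \det(x)^{1/2}
\]

for a fixed x ∈ Kⁿ. If x ≻_{Kⁿ} 0, then we can verify that the feasible point y* = det(x)^{1/2} x⁻¹ satisfies det(y*) = 1 and ½ tr(x ◦ y*) = det(x)^{1/2}, and the second part follows.

(f) Using part (e), for any x ∈ Kⁿ and y ∈ int(Kⁿ), we have

\[
\frac{\operatorname{tr}(x \circ y)}{2 \sqrt{\det(y)}} = \frac{1}{2} \operatorname{tr}\!\left( x \circ \frac{y}{\sqrt{\det(y)}} \right) \ \ge\ \sqrt{\det(x)},
\]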