
Interior proximal-like algorithms for SOCP


Considering that ε is arbitrary and f(ζ^N) ≥ f∗, we thus have the desired result.

(b) Suppose that ζ ∈ X. Then, from Proposition 3.6(d), D(ζ, ζ^k) ≤ D(ζ, ζ^0) for any k. This implies that {ζ^k} ⊆ LD(ζ, D(ζ, ζ^0)). By Proposition 3.6(d), the sequence {ζ^k} is then bounded. Let ζ̄ ∈ F be an accumulation point of {ζ^k} with subsequence {ζ^{k_j}} → ζ̄. Then, from part (a), it follows that f(ζ^{k_j}) → f∗. On the other hand, since f is lower semicontinuous, we have f(ζ̄) ≤ lim inf_{k_j→+∞} f(ζ^{k_j}). The two sides show that f(ζ̄) ≤ f∗. Consequently, ζ̄ is a solution of the CSOCP. □

and obtained the convergence theorems under weaker assumptions. Clearly, when ϕ(t) = − ln t + t − 1 (t > 0), we have that dϕ(ζ, ξ) = d(ξ, ζ), and consequently the algorithm reduces to Eggermont's (3.32).

Observing that the proximal-like algorithm (3.33) associated with ϕ(t) = − ln t + t − 1 inherits the features of the interior point method as well as the proximal point method, Auslender [8] extended the algorithm to general linearly constrained convex minimization problems and variational inequalities on polyhedra. Then, is it possible to extend the algorithm to nonpolyhedral symmetric conic optimization problems and establish the corresponding convergence results? In this section, we explore its extension to the setting of second-order cones and establish a class of interior proximal-like algorithms for the CSOCP. We should mention that the algorithm (3.33) with the entropy function t ln t − t + 1 (t ≥ 0) was recently extended to convex semidefinite programming [57].

Again, as defined in (3.17) and (3.18), we denote by F the constraint set of the CSOCP, i.e.,

F := { ζ ∈ IRm | Aζ + b ⪰Kn 0 }, and denote its interior by int(F), i.e.,

int(F) := { ζ ∈ IRm | Aζ + b ≻Kn 0 }.

Accordingly, the second proximal-like algorithm that we propose for the CSOCP is defined as follows:

ζ^0 ∈ int(F),
ζ^k = argmin_{ζ ∈ int(F)} { f(ζ) + μk^{-1} D(Aζ + b, Aζ^{k−1} + b) },    (3.34)

where D : IRn × IRn → (−∞, +∞] is a closed proper convex function generated by a class of twice continuously differentiable and strictly convex functions on (0, +∞); its specific expression is given later. The class of distance measures includes as a special case the natural extension of dϕ(x, y) with ϕ(t) = − ln t + t − 1 to the second-order cones.

For the proximal-like algorithm (3.34), we particularly consider an approximate version which allows inexact minimization of the subproblem (3.34), and we establish its global convergence results under some mild assumptions.

Next, we present the definition of the distance-like function D(x, y) involved in the proximal-like algorithm (3.34) and some specific examples. Let φ : IR → (−∞, ∞] be a closed proper convex function with domφ = [0, ∞) and assume that

(C1) φ is strictly convex on its domain.

(C2) φ is twice continuously differentiable on int(domφ) with lim_{t→0+} φ″(t) = ∞.

(C3) φ′(t)t − φ(t) is convex on int(domφ).

(C4) φ′ is SOC-concave on int(domφ).

In the sequel, we denote by Φ the class of functions satisfying conditions (C1)-(C4).

Given a φ ∈ Φ, let φ^soc and (φ′)^soc be the vector-valued functions given as in (3.6)-(3.7). We define D(x, y) involved in the proximal-like algorithm (3.34) by

D(x, y) := tr[ φ^soc(y) − φ^soc(x) − (φ′)^soc(x) ◦ (y − x) ] for x ∈ int(Kn) and y ∈ Kn, and D(x, y) := ∞ otherwise.    (3.35)

The function, as will be shown later, possesses some favorable properties. In particular, D(x, y) ≥ 0 for any x, y ∈ int(Kn), and D(x, y) = 0 if and only if x = y. Hence, D(x, y) can be used to measure the distance between any two points in int(Kn).
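To make the construction concrete, here is a minimal NumPy sketch of D(x, y) built from the spectral decomposition; the helper names spec, soc_fun, jordan and D are ours, and the routine assumes x ∈ int(Kn) and y ∈ Kn so that φ and φ′ are only evaluated where they are defined:

```python
import numpy as np

def spec(x):
    """Spectral decomposition of x w.r.t. K^n: values lam and vectors u1, u2."""
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2 if nx2 > 0 else np.zeros_like(x2)
    lam = np.array([x1 - nx2, x1 + nx2])            # lambda_1(x) <= lambda_2(x)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return lam, (u1, u2)

def soc_fun(g, x):
    """g^soc(x) = g(lambda_1) u^(1) + g(lambda_2) u^(2), as in (3.6)-(3.7)."""
    lam, (u1, u2) = spec(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2

def jordan(x, y):
    """Jordan product x o y = (<x, y>, x1*y2 + y1*x2)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def D(phi, dphi, x, y):
    """Distance-like function (3.35); note tr(z) = 2*z_1 for z in IR^n."""
    val = soc_fun(phi, y) - soc_fun(phi, x) - jordan(soc_fun(dphi, x), y - x)
    return 2.0 * val[0]
```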

In the following, we concentrate on the examples of the distance-like function D(x, y).

For this purpose, we first give another characterization for condition (C3).

Lemma 3.2. Let φ : IR → (−∞, ∞] be a closed proper convex function with domφ = [0, +∞). If φ is thrice continuously differentiable on int(domφ), then φ satisfies condition (C3) if and only if its derivative φ′ is exponentially convex (which means the function φ′(exp(·)) : IR → IR is convex on IR), or equivalently,

φ′(t1 t2) ≤ (1/2)[ φ′(t1^2) + φ′(t2^2) ],  ∀t1, t2 > 0.    (3.36)

Proof. Since the function φ is thrice continuously differentiable on int(domφ), φ satisfies condition (C3) if and only if

φ″(t) + tφ‴(t) ≥ 0,  ∀t > 0.

Observe that this inequality is also equivalent to

tφ″(t) + t^2 φ‴(t) ≥ 0,  ∀t > 0,

and hence substituting t = exp(θ) for θ ∈ IR into the inequality yields that

exp(θ)φ″(exp(θ)) + exp(2θ)φ‴(exp(θ)) ≥ 0,  ∀θ ∈ IR.

Since the left-hand side of this inequality is exactly [φ′(exp(θ))]″, this means that φ′(exp(·)) is convex on IR. Consequently, the first part of the conclusions follows.

Note that the convexity of φ′(exp(·)) on IR is equivalent to saying that for any θ1, θ2 ∈ IR,

φ′(exp(rθ1 + (1 − r)θ2)) ≤ rφ′(exp(θ1)) + (1 − r)φ′(exp(θ2)),  ∀r ∈ [0, 1],

which, by letting t1 = exp(θ1) and t2 = exp(θ2), can be rewritten as

φ′(t1^r t2^{1−r}) ≤ rφ′(t1) + (1 − r)φ′(t2),  ∀t1, t2 > 0 and r ∈ [0, 1].

This is clearly equivalent to the statement in (3.36) due to the continuity of φ′. □

Remark 3.3. The exponential convexity was also used in the definition of the self-regular function [123], in which the authors denote by Ω the set of functions whose elements are twice continuously differentiable and exponentially convex on (0, +∞). By Lemma 3.2, clearly, if h ∈ Ω, then the function ∫_0^t h(θ)dθ necessarily satisfies condition (C3). For example, ln t belongs to Ω, and hence ∫_0^t ln θ dθ = t ln t − t satisfies condition (C3).
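Because Lemma 3.2 reduces condition (C3) to the scalar inequality (3.36), it can be spot-checked numerically. Below is a small sampling test (check_336 is our name, and the ranges and tolerance are arbitrary choices) applied to φ1′ = ln from Example 3.4 below, where (3.36) holds with equality, and to φ4′ from Example 3.7 below with a = 0.7:

```python
import numpy as np

def check_336(dphi, trials=10_000, seed=1):
    """Sample test of (3.36): dphi(t1*t2) <= (dphi(t1^2) + dphi(t2^2)) / 2."""
    rng = np.random.default_rng(seed)
    t1, t2 = rng.uniform(0.01, 10.0, (2, trials))
    return bool(np.all(dphi(t1 * t2) <= 0.5 * (dphi(t1**2) + dphi(t2**2)) + 1e-12))

print(check_336(np.log))                                    # True; equality case
a = 0.7
print(check_336(lambda t: (a + 1) * t**a + a * np.log(t)))  # True
```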

Now we present several examples showing how to construct D(x, y). From these examples, we see that the conditions required by φ ∈ Φ are not so strict, and the construction of distance-like functions on SOCs can be completed by selecting a class of univariate convex functions.

Example 3.4. Let φ1 : IR → (−∞, ∞] be given by

φ1(t) = t ln t − t + 1 if t ≥ 0,  ∞ if t < 0.

Solution. It is easy to verify that φ1 satisfies conditions (C1)-(C3). In addition, by Examples 2.10 and 2.12, the function ln t is SOC-concave and SOC-monotone on (0, ∞); hence condition (C4) also holds. From formulas (3.6)-(3.7), it follows that for any y ∈ Kn and x ∈ int(Kn),

φ1^soc(y) = y ◦ ln y − y + e and (φ1′)^soc(x) = ln x.

Consequently, the distance-like function induced by φ1 is given by

D1(x, y) = tr(y ◦ ln y − y ◦ ln x + x − y),  ∀x ∈ int(Kn), y ∈ Kn.

This function is precisely the natural extension of the entropy-like distance dϕ(·, ·) with ϕ(t) = − ln t + t − 1 to the second-order cones. In addition, comparing D1(x, y) with the distance-like function H(x, y) in Example 3.1 of [115] (see Section 3.1), we note that D1(x, y) = H(y, x), but the proximal-like algorithms corresponding to them are completely different. □
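As a quick sanity check, the generic definition (3.35) with φ1 can be compared against the closed form of D1; this reuses the spec/soc_fun/jordan/D helpers from the earlier sketch, and the sample points are arbitrary interior points of K^3:

```python
phi1, dphi1 = (lambda t: t * np.log(t) - t + 1.0), np.log

x = np.array([2.0, 0.5, -0.3])      # in int(K^3): 2 > ||(0.5, -0.3)||
y = np.array([1.5, 0.2, 0.4])       # in int(K^3) as well

d_generic = D(phi1, dphi1, x, y)    # definition (3.35)
soc_log = lambda v: soc_fun(np.log, v)
d_closed = 2.0 * (jordan(y, soc_log(y)) - jordan(y, soc_log(x)) + x - y)[0]
print(d_generic, d_closed)          # the two values agree
```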

Example 3.5. Let φ2 : IR → (−∞, ∞] be given by

φ2(t) = t ln t + (1 + t) ln(1 + t) − (1 + t) ln 2 if t ≥ 0,  ∞ if t < 0.

Solution. By computing, we can verify that φ2 satisfies conditions (C1)-(C3). Furthermore, from earlier examples, we learn that φ2 also satisfies condition (C4). This means that φ2 ∈ Φ. For any y ∈ Kn and x ∈ int(Kn), we can compute that

φ2^soc(y) = y ◦ ln y + (e + y) ◦ ln(e + y) − ln 2 (e + y),
(φ2′)^soc(x) = (2 − ln 2)e + ln x + ln(e + x).

Therefore, the distance-like function generated by such a φ is given by

D2(x, y) = tr[ −ln(e + x) ◦ (e + y) + y ◦ (ln y − ln x) + (e + y) ◦ ln(e + y) − 2(y − x) ]

for any x ∈ int(Kn) and y ∈ Kn. It should be pointed out that D2(x, y) is not the extension of dϕ(·, ·) with ϕ(t) = φ2(t) given by [80] to the second-order cones. □

Example 3.6. For any 0 ≤ r < 1/2, let φ3 : IR → (−∞, ∞] be given by

φ3(t) = t^{(2r+3)/2} + t^2 if t ≥ 0,  ∞ if t < 0.

Solution. It is easy to verify that φ3 satisfies conditions (C1)-(C3). Furthermore, from Examples 2.12-2.14, it follows that φ3 satisfies condition (C4). In other words, φ3 ∈ Φ.

By a simple computation, we know

φ3^soc(y) = y^{(2r+3)/2} + y^2, ∀y ∈ Kn,
(φ3′)^soc(x) = ((2r+3)/2) x^{(2r+1)/2} + 2x, ∀x ∈ int(Kn).

Hence, the distance-like function induced by φ3 has the following expression:

D3(x, y) = tr[ ((2r+1)/2) x^{(2r+3)/2} + x^2 − y ◦ ( ((2r+3)/2) x^{(2r+1)/2} + 2x ) + y^{(2r+3)/2} + y^2 ]. □

Example 3.7. For any 0 < a ≤ 1, let φ4 : IR → (−∞, ∞] be given by

φ4(t) = t^{a+1} + a t ln t − a t if t ≥ 0,  ∞ if t < 0.

Solution. It is easily shown that φ4 satisfies conditions (C1)-(C3). By Examples 2.11-2.14, φ4′ is SOC-concave on (0, ∞). Hence, φ4 ∈ Φ. For any y ∈ Kn and x ∈ int(Kn),

φ4^soc(y) = y^{a+1} + a y ◦ ln y − a y and (φ4′)^soc(x) = (a + 1)x^a + a ln x.

Consequently, the distance-like function induced by φ4 has the following expression:

D4(x, y) = tr[ a x^{a+1} + a x − y ◦ ( (a + 1)x^a + a ln x ) + y^{a+1} + a y ◦ ln y − a y ]. □

In what follows, we study some favorable properties of the function D(x, y). We begin with some technical lemmas that will be used in the subsequent analysis.

Lemma 3.3. Suppose that φ : IR → (−∞, ∞] belongs to the class Φ, i.e., satisfies (C1)-(C4). Let φ^soc and (φ′)^soc be the corresponding SOC functions of φ and φ′ given as in (1.9). Then, the following hold.

(a) φ^soc(x) and (φ′)^soc(x) are well-defined on Kn and int(Kn), respectively, and

λi[φ^soc(x)] = φ[λi(x)],  λi[(φ′)^soc(x)] = φ′[λi(x)],  i = 1, 2.

(b) φ^soc(x) and (φ′)^soc(x) are continuously differentiable on int(Kn) with the transposed Jacobian at x given as in formulas (1.28)-(1.29).

(c) tr[φ^soc(x)] and tr[(φ′)^soc(x)] are continuously differentiable on int(Kn), and

∇tr[φ^soc(x)] = 2∇φ^soc(x)e = 2(φ′)^soc(x),
∇tr[(φ′)^soc(x)] = 2∇(φ′)^soc(x)e = 2(φ″)^soc(x).

(d) The function tr[φ^soc(x)] is strictly convex on int(Kn).

Proof. Mimicking the arguments as in Lemma 3.1, in other words, using Propositions 1.13-1.14 and the definition of Φ, the desired results follow. □

Lemma 3.4. Suppose that φ : IR → (−∞, ∞] belongs to the class Φ and z ∈ IRn. Let φz : int(Kn) → IR be defined by

φz(x) := tr[ −z ◦ (φ′)^soc(x) ].    (3.37)

Then, the function φz(x) possesses the following properties.

(a) φz(x) is continuously differentiable on int(Kn) with ∇φz(x) = −2∇(φ′)^soc(x) · z.

(b) φz(x) is convex over int(Kn) when z ∈ Kn; furthermore, it is strictly convex over int(Kn) when z ∈ int(Kn).

Proof. (a) Since φz(x) = −2⟨(φ′)^soc(x), z⟩ for any x ∈ int(Kn), we have that φz(x) is continuously differentiable on int(Kn) by Lemma 3.3(b). Moreover, applying the chain rule for the inner product of two functions readily yields ∇φz(x) = −2∇(φ′)^soc(x) · z.

(b) By the continuous differentiability of φz(x), to prove the convexity of φz on int(Kn), it suffices to prove the following inequality:

φz((x + y)/2) ≤ (1/2)[ φz(x) + φz(y) ],  ∀x, y ∈ int(Kn).    (3.38)

By condition (C4), φ′ is SOC-concave on (0, +∞). Therefore, we have

−(φ′)^soc((x + y)/2) ⪯Kn −(1/2)[ (φ′)^soc(x) + (φ′)^soc(y) ],

i.e.,

(φ′)^soc((x + y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) ⪰Kn 0.

Using Property 1.3(d) and the fact that z ∈ Kn, we then obtain that

⟨ z, (φ′)^soc((x + y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) ⟩ ≥ 0,    (3.39)

which in turn implies that

⟨ −z, (φ′)^soc((x + y)/2) ⟩ ≤ (1/2)⟨ −z, (φ′)^soc(x) ⟩ + (1/2)⟨ −z, (φ′)^soc(y) ⟩.

The last inequality is exactly the one in (3.38). Hence, φz is convex on int(Kn) for z ∈ Kn. To prove the second part of the conclusions, we only need to prove that the inequality in (3.39) holds strictly for any x, y ∈ int(Kn) with x ≠ y. By Property 1.3(d), this is also equivalent to proving that the vector (φ′)^soc((x+y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) is nonzero, since

(φ′)^soc((x + y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) ∈ Kn and z ∈ int(Kn).

From condition (C4), it follows that φ′ is concave on (0, ∞), since SOC-concavity implies concavity. This together with the strict monotonicity of φ′ implies that φ′ is strictly concave on (0, ∞). Using Lemma 3.3(d), we then have that tr[(φ′)^soc(x)] is strictly concave on int(Kn). This means that for any x, y ∈ int(Kn) with x ≠ y,

tr[ (φ′)^soc((x + y)/2) ] − (1/2)tr[ (φ′)^soc(x) ] − (1/2)tr[ (φ′)^soc(y) ] > 0.    (3.40)

In addition, we note that the first element of (φ′)^soc((x+y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) is

[ φ′(λ1((x+y)/2)) + φ′(λ2((x+y)/2)) ]/2 − [ φ′(λ1(x)) + φ′(λ2(x)) ]/4 − [ φ′(λ1(y)) + φ′(λ2(y)) ]/4,

which, by (1.6), can be rewritten as

(1/2)tr[ (φ′)^soc((x + y)/2) ] − (1/4)tr[ (φ′)^soc(x) ] − (1/4)tr[ (φ′)^soc(y) ].

This together with (3.40) shows that (φ′)^soc((x+y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) is nonzero for any x, y ∈ int(Kn) with x ≠ y. Consequently, φz is strictly convex on int(Kn). □

Lemma 3.5. Let F be the set defined as in (3.17). Then, its recession cone 0+F is described by

0+F = { d ∈ IRm | Ad ⪰Kn 0 }.    (3.41)

Proof. Assume that d ∈ IRm is such that Ad ⪰Kn 0. Then, for any λ > 0, λAd ⪰Kn 0. Considering that Kn is closed under the "+" operation, we have for any ζ ∈ F,

A(ζ + λd) + b = (Aζ + b) + λ(Ad) ⪰Kn 0.    (3.42)

By [131, page 61], this shows that every element of the set on the right-hand side of (3.41) is a recession direction of F. Consequently, { d ∈ IRm | Ad ⪰Kn 0 } ⊆ 0+F.

Now take any d ∈ 0+F and ζ ∈ F. Then, for any λ > 0, equation (3.42) holds. By Property 1.1(c), we then have λ1[(Aζ + b) + λAd] ≥ 0 for any λ > 0. This implies that λ1(Ad) ≥ 0, since otherwise letting λ → +∞ and using the fact that

λ1[(Aζ + b) + λAd] = (Aζ + b)1 + λ(Ad)1 − ‖(Aζ + b)2 + λ(Ad)2‖
≤ (Aζ + b)1 + λ(Ad)1 − ( λ‖(Ad)2‖ − ‖(Aζ + b)2‖ )
= λ λ1(Ad) + λ2(Aζ + b),

we obtain that λ1[(Aζ + b) + λAd] → −∞. Thus, we prove that Ad ⪰Kn 0, and consequently 0+F ⊆ { d ∈ IRm | Ad ⪰Kn 0 }. Combining with the above discussions then yields the result. □

Lemma 3.6. Let {ank} be a sequence of real numbers satisfying

(i) ank ≥ 0 for all n = 1, 2, · · · and k = 1, 2, · · · ;
(ii) Σ_{k=1}^n ank = 1 for all n = 1, 2, · · · ;
(iii) lim_{n→∞} ank = 0 for all k = 1, 2, · · · .

If {uk} is a sequence such that lim_{k→+∞} uk = u, then lim_{n→+∞} Σ_{k=1}^n ank uk = u.

Proof. Please see [91, Theorem 2]. □

Lemma 3.7. Let {υk} and {βk} be nonnegative sequences of real numbers satisfying (i) υ_{k+1} ≤ υk + βk, (ii) Σ_{k=1}^∞ βk < +∞. Then, the sequence {υk} is convergent.

Proof. Please see [125, Chapter 2] for a proof. □

Now we are in a position to study the properties of the distance-like function D(x, y).

Proposition 3.8. Given a function φ ∈ Φ, let D(x, y) be defined as in (3.35). Then, the following hold.

(a) D(x, y) ≥ 0 for any x ∈ int(Kn) and y ∈ Kn, and D(x, y) = 0 if and only if x = y.

(b) For any fixed y ∈ Kn, D(·, y) is continuously differentiable on int(Kn) with

∇x D(x, y) = 2∇(φ′)^soc(x) · (x − y).    (3.43)

(c) For any fixed y ∈ Kn, the function D(·, y) is convex over int(Kn), and for any fixed y ∈ int(Kn), D(·, y) is strictly convex over int(Kn).

(d) For any fixed y ∈ int(Kn), the function D(·, y) is essentially smooth.

(e) For any fixed y ∈ Kn, the level sets LD(y, γ) := {x ∈ int(Kn) | D(x, y) ≤ γ} for all γ ≥ 0 are bounded.

Proof. (a) By Lemma 3.3(c), for any x ∈ int(Kn) and y ∈ Kn, we can rewrite D(x, y) as

D(x, y) = tr[φ^soc(y)] − tr[φ^soc(x)] − ⟨∇tr[φ^soc(x)], y − x⟩.

Notice that tr[φ^soc(x)] is strictly convex on int(Kn) by Lemma 3.3(d), and hence D(x, y) ≥ 0 for any x ∈ int(Kn) and y ∈ Kn, with D(x, y) = 0 if and only if x = y.

(b) By Lemma 3.3(b) and (c), the functions tr[φ^soc(x)] and ⟨(φ′)^soc(x), x⟩ are continuously differentiable on int(Kn). Noting that, for any x ∈ int(Kn) and y ∈ Kn,

D(x, y) = tr[φ^soc(y)] − tr[φ^soc(x)] − 2⟨(φ′)^soc(x), y − x⟩,

we then have the continuous differentiability of D(·, y) on int(Kn). Furthermore,

∇x D(x, y) = −∇tr[φ^soc(x)] − 2∇(φ′)^soc(x) · (y − x) + 2(φ′)^soc(x)
= −2(φ′)^soc(x) + 2∇(φ′)^soc(x) · (x − y) + 2(φ′)^soc(x)
= 2∇(φ′)^soc(x) · (x − y).
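The gradient formula just derived calls on the Jacobian ∇(φ′)^soc(x) from (1.28)-(1.29), which is not reproduced in this section. The sketch below (dsoc_jac is our name) implements what we assume to be that Jacobian, written in terms of the spectral values λ1(z) ≤ λ2(z); it is consistent with the first-row formula used in the proof of part (d) below:

```python
import numpy as np

def dsoc_jac(dphi, ddphi, z):
    """Jacobian of (phi')^soc at z in int(K^n); assumed form of (1.28)-(1.29)."""
    z1, z2 = z[0], z[1:]
    nz2 = np.linalg.norm(z2)
    if nz2 == 0.0:                        # (1.28): reduces to ddphi(z1) * I
        return ddphi(z1) * np.eye(len(z))
    lam1, lam2 = z1 - nz2, z1 + nz2
    w = z2 / nz2
    a = (dphi(lam2) - dphi(lam1)) / (lam2 - lam1)   # divided difference
    b = 0.5 * (ddphi(lam2) + ddphi(lam1))
    c = 0.5 * (ddphi(lam2) - ddphi(lam1))
    J = np.zeros((len(z), len(z)))
    J[0, 0] = b
    J[0, 1:] = c * w
    J[1:, 0] = c * w
    J[1:, 1:] = a * np.eye(len(z) - 1) + (b - a) * np.outer(w, w)
    return J
```

With this helper, (3.43) reads grad = 2.0 * dsoc_jac(dphi, ddphi, x) @ (x - y).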

(c) By the definition of φz given as in (3.37), D(x, y) can be rewritten as

D(x, y) = tr[(φ′)^soc(x) ◦ x − φ^soc(x)] + φy(x) + tr[φ^soc(y)].

Thus, to prove the (strict) convexity of D(·, y) on int(Kn), it suffices to show that

tr[(φ′)^soc(x) ◦ x − φ^soc(x)] + φy(x)

is (strictly) convex on int(Kn). From condition (C3) and Lemma 3.3(d), it follows that tr[(φ′)^soc(x) ◦ x − φ^soc(x)] is convex over int(Kn). In addition, by Lemma 3.4(b), φy(x) is convex on int(Kn) if y ∈ Kn, and it is strictly convex if y ∈ int(Kn). Thus, we get the desired results.

(d) From [131, page 251] and parts (a)-(b), to prove that D(·, y) is essentially smooth for any fixed y ∈ int(Kn), it suffices to show that ‖∇x D(x^k, y)‖ → ∞ for any {x^k} ⊆ int(Kn) with x^k → x ∈ bd(Kn). We next prove the conclusion in two cases: x1 > 0 and x1 = 0. For the sake of notation, let x^k = (x^k_1, x^k_2) ∈ IR × IR^{n−1}.

Case 1: x1 > 0. In this case, ‖x2‖ = x1 > 0 since x ∈ bd(Kn). Noting that x^k → x, we have x^k_2 ≠ 0 for all sufficiently large k. From the gradient formula (3.43),

‖∇x D(x^k, y)‖ = ‖2∇(φ′)^soc(x^k) · (x^k − y)‖ ≥ | 2[∇(φ′)^soc(x^k) · (x^k − y)]1 |,    (3.44)

where [∇(φ′)^soc(x^k) · (x^k − y)]1 denotes the first element of the vector ∇(φ′)^soc(x^k) · (x^k − y). By the gradient formula (1.29), we can compute that

2[∇(φ′)^soc(x^k) · (x^k − y)]1
= [φ″(λ2(x^k)) + φ″(λ1(x^k))](x^k_1 − y1) + [φ″(λ2(x^k)) − φ″(λ1(x^k))](x^k_2 − y2)^T x^k_2/‖x^k_2‖
= φ″(λ2(x^k)) ( λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ ) − φ″(λ1(x^k)) ( y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) ).    (3.45)

Therefore,

| 2[∇(φ′)^soc(x^k) · (x^k − y)]1 |
≥ φ″(λ1(x^k)) · | y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) | − φ″(λ2(x^k)) · | λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ |
≥ φ″(λ1(x^k)) · ( λ1(y) − λ1(x^k) ) − φ″(λ2(x^k)) · | λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ |,

where the last step uses the Cauchy-Schwarz inequality y1 − y2^T x^k_2/‖x^k_2‖ ≥ y1 − ‖y2‖ = λ1(y). Noting that λ1(x^k) → λ1(x) = 0, λ2(x^k) → λ2(x) > 0 and y2^T x^k_2/‖x^k_2‖ → y2^T x2/‖x2‖ as k → ∞, the second term on the right-hand side of the last inequality converges to a finite value, whereas the first term approaches ∞, since φ″(λ1(x^k)) → ∞ by condition (C2) and λ1(y) − λ1(x^k) → λ1(y) > 0. This implies that, as k → +∞,

| 2[∇(φ′)^soc(x^k) · (x^k − y)]1 | → ∞.

Combining with the inequality (3.44) immediately yields ‖∇x D(x^k, y)‖ → ∞.

Case 2: x1 = 0. In this case, we necessarily have x = 0 since x ∈ Kn. Considering that x^k → x, we distinguish the cases where x^k_2 = 0 for all sufficiently large k and where x^k_2 ≠ 0 for all sufficiently large k. If x^k_2 = 0 for all sufficiently large k, then from (1.28) we have that

‖∇x D(x^k, y)‖ = ‖2φ″(x^k_1)(x^k − y)‖ ≥ 2|φ″(x^k_1)| · |x^k_1 − y1|.

Since y1 > 0 by y ∈ int(Kn) and x^k_1 → x1 = 0, applying condition (C2) yields that the right-hand side tends to ∞, and consequently ‖∇x D(x^k, y)‖ → +∞ as k → ∞.

Next, we consider the case where x^k_2 ≠ 0 for all sufficiently large k. In this case, the formulas (3.44)-(3.45) still hold. By the Cauchy-Schwarz inequality,

λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ ≤ λ2(x^k) − y1 + ‖y2‖ = λ2(x^k) − λ1(y),
y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) ≥ y1 − ‖y2‖ − λ1(x^k) = λ1(y) − λ1(x^k).

Since λ1(x^k), λ2(x^k) → 0 as k → +∞ and λ1(y), λ2(y) > 0 by y ∈ int(Kn), the last two inequalities imply that, for all sufficiently large k,

λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ < 0 and y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) > 0.

On the other hand, by condition (C2), as k → ∞,

φ″(λ2(x^k)) → ∞ and φ″(λ1(x^k)) → ∞.

The two sides show that the right-hand side of (3.45) approaches −∞ as k → ∞, and consequently |2[∇(φ′)^soc(x^k) · (x^k − y)]1| → ∞. Thus, from (3.44), it follows that ‖∇x D(x^k, y)‖ → ∞ as k → ∞.

(e) From the definition of D(x, y), it follows that for any x, y ∈ int(Kn),

D(x, y) = tr[φ^soc(y)] − tr[φ^soc(x)] − tr[(φ′)^soc(x) ◦ y] + tr[(φ′)^soc(x) ◦ x]
= Σ_{i=1}^2 φ(λi(y)) − Σ_{i=1}^2 φ(λi(x)) − tr[(φ′)^soc(x) ◦ y] + tr[(φ′)^soc(x) ◦ x],    (3.46)

where the second equality is from Lemma 3.3(a) and (1.6). Since

(φ′)^soc(x) ◦ x = [ φ′(λ1(x))u_x^{(1)} + φ′(λ2(x))u_x^{(2)} ] ◦ [ λ1(x)u_x^{(1)} + λ2(x)u_x^{(2)} ]
= φ′(λ1(x))λ1(x)u_x^{(1)} + φ′(λ2(x))λ2(x)u_x^{(2)},

we have from Lemma 3.3(a) that

tr[(φ′)^soc(x) ◦ x] = Σ_{i=1}^2 φ′(λi(x))λi(x).

In addition, by Property 1.3(b) and Lemma 3.3(a), we have

tr[(φ′)^soc(x) ◦ y] ≤ Σ_{i=1}^2 φ′(λi(x))λi(y).

Combining the last two relations with (3.46) yields that

D(x, y) ≥ Σ_{i=1}^2 [ φ(λi(y)) − φ(λi(x)) − φ′(λi(x))λi(y) + φ′(λi(x))λi(x) ]
= Σ_{i=1}^2 [ φ(λi(y)) − φ(λi(x)) − φ′(λi(x))(λi(y) − λi(x)) ]
= Σ_{i=1}^2 dB(λi(y), λi(x)),

where dB : IR+ × IR++ → IR is the function defined by dB(s, t) := φ(s) − φ(t) − φ′(t)(s − t). This implies that for any fixed y ∈ Kn and γ ≥ 0,

LD(y, γ) ⊆ { x ∈ int(Kn) | Σ_{i=1}^2 dB(λi(y), λi(x)) ≤ γ }.    (3.47)

Note that for any fixed s ≥ 0, the set {t > 0 | dB(s, t) ≤ 0} equals {s} or ∅, and hence is bounded. Thus, from [131, Corollary 8.7.1] and condition (C3), it follows that the level sets {t > 0 | dB(s, t) ≤ γ} for any fixed s ≥ 0 are bounded. This together with (3.47) implies that the level sets LD(y, γ) are bounded for all γ ≥ 0. □
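The key estimate in part (e), D(x, y) ≥ Σ_{i=1}^2 dB(λi(y), λi(x)), is easy to probe numerically. Here is a hypothetical spot-check with φ1 from Example 3.4, reusing the earlier helpers; rand_int_K3 is our name for a crude sampler of int(K^3):

```python
import numpy as np

def rand_int_K3(rng, margin=0.1):
    """Random point of int(K^3): first entry strictly dominates the tail norm."""
    v = rng.standard_normal(2)
    return np.concatenate(([np.linalg.norm(v) + margin + rng.uniform(0.0, 1.0)], v))

phi1, dphi1 = (lambda t: t * np.log(t) - t + 1.0), np.log
dB = lambda s, t: phi1(s) - phi1(t) - dphi1(t) * (s - t)

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rand_int_K3(rng), rand_int_K3(rng)
    lx, _ = spec(x)
    ly, _ = spec(y)
    assert D(phi1, dphi1, x, y) >= sum(dB(ly[i], lx[i]) for i in range(2)) - 1e-9
```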

Proposition 3.9. Given a function φ ∈ Φ, let D(x, y) be defined as in (3.35). Then, for all x, y ∈ int(Kn) and z ∈ Kn, we have the following inequality:

D(x, z) − D(y, z) ≥ 2⟨∇(φ′)^soc(y) · (z − y), y − x⟩ = 2⟨∇(φ′)^soc(y) · (y − x), z − y⟩.    (3.48)

Proof. Let ψ : (0, ∞) → IR be the function defined by

ψ(t) := φ′(t)t − φ(t).    (3.49)

Then, the vector-valued function induced by ψ via (3.6)-(3.7) is (φ′)^soc(x) ◦ x − φ^soc(x), i.e.,

ψ^soc(x) = (φ′)^soc(x) ◦ x − φ^soc(x).    (3.50)

From the definition of D(x, y) and φz(x) and equality (3.50), it follows that

D(x, z) − D(y, z) = tr[(φ′)^soc(x) ◦ x − φ^soc(x)] + φz(x) − tr[(φ′)^soc(y) ◦ y − φ^soc(y)] − φz(y)
= tr[ψ^soc(x)] − tr[ψ^soc(y)] + φz(x) − φz(y)
≥ ⟨∇tr[ψ^soc(y)], x − y⟩ + ⟨∇φz(y), x − y⟩
= ⟨2(ψ′)^soc(y), x − y⟩ − ⟨2∇(φ′)^soc(y) · z, x − y⟩,    (3.51)

where the inequality is due to the convexity of tr[ψ^soc(x)] and φz(x), and the last equality follows from Lemma 3.3(c) and Lemma 3.4(a). From the definition of ψ given as in (3.49), it is easy to compute that

⟨(ψ′)^soc(y), x − y⟩ = ⟨(φ″)^soc(y) ◦ y, x − y⟩.    (3.52)

In addition, by the gradient formulas in (1.28)-(1.29), we can compute that

∇(φ′)^soc(y) · y = (φ″)^soc(y) ◦ y,

which in turn implies that

⟨∇(φ′)^soc(y) · z, x − y⟩ = ⟨∇(φ′)^soc(y) · (y + z − y), x − y⟩
= ⟨∇(φ′)^soc(y) · y, x − y⟩ + ⟨∇(φ′)^soc(y) · (z − y), x − y⟩
= ⟨(φ″)^soc(y) ◦ y, x − y⟩ + ⟨∇(φ′)^soc(y) · (z − y), x − y⟩.

This, together with (3.52) and (3.51), yields the first inequality in (3.48), whereas the second equality follows from the symmetry of the matrix ∇(φ′)^soc(y). □
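Inequality (3.48) can be probed in the same spirit, combining D from (3.35) with the assumed Jacobian sketch dsoc_jac and the sampler rand_int_K3 from above; the tolerance is an arbitrary choice:

```python
ddphi1 = lambda t: 1.0 / t            # phi_1''(t), needed by dsoc_jac

rng = np.random.default_rng(3)
for _ in range(1000):
    x, y, z = rand_int_K3(rng), rand_int_K3(rng), rand_int_K3(rng)
    lhs = D(phi1, dphi1, x, z) - D(phi1, dphi1, y, z)
    rhs = 2.0 * (dsoc_jac(dphi1, ddphi1, y) @ (z - y)) @ (y - x)
    assert lhs >= rhs - 1e-9          # Proposition 3.9, (3.48)
```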

Propositions 3.8-3.9 indicate that D(x, y) possesses some favorable properties similar to those for dϕ. We will employ these properties to establish the convergence for an approximate version of the proximal-like algorithm (3.34).

The proximal-like algorithm described in (3.34) for the CSOCP consists of a sequence of exact minimizations. However, in practical computations, it is impossible to obtain the exact solutions of these minimization problems. Therefore, we consider an approximate version of this algorithm which allows inexact solutions of the subproblems (3.34).

Throughout this section, we make the following assumptions for the CSOCP:

(A1) f∗ := inf{ f(ζ) | ζ ∈ F } > −∞ and dom(f) ∩ int(F) ≠ ∅.

(A2) The matrix A is of maximal rank m.

Remark 3.4. As noted in Remark 3.2, Assumption (A1) is elementary for the existence of a solution of the CSOCP. Assumption (A2) is common in the solution of SOCPs, and it is clearly satisfied when F = { ζ ∈ IRn | ζ ⪰Kn 0 }. Moreover, if we consider the linear SOCP

min c̄^T x
s.t. Āx = b̄, x ∈ Kn,    (3.53)

where Ā ∈ IRm×n with m ≤ n, b̄ ∈ IRm, and c̄ ∈ IRn, the assumption that Ā has full row rank m is standard. Consequently, its dual problem, given by

max b̄^T y
s.t. c̄ − Ā^T y ⪰Kn 0,    (3.54)

satisfies assumption (A2). This shows that we can solve the linear SOCP by applying the approximate proximal-like algorithm described below to the dual problem (3.54). In addition, we know that the recession cone of F is given by 0+F = { d ∈ IRm | Ad ⪰Kn 0 }. This implies that assumption (A2) is also satisfied when F is bounded, since its recession cone 0+F then reduces to zero.

For the sake of notation, in the sequel, we denote D : int(F ) × F → IR by

D(ζ, ξ) := D(Aζ + b, Aξ + b). (3.55)

From Proposition 3.8, we readily obtain the following properties of D(ζ, ξ).

Proposition 3.10. Let D(ζ, ξ) be defined by (3.55). Then, under Assumption (A2), we have

(a) D(ζ, ξ) ≥ 0 for any ζ ∈ int(F ) and ξ ∈ F , and D(ζ, ξ) = 0 if and only if ζ = ξ;

(b) the function D(·, ξ) for any fixed ξ ∈ F is continuously differentiable on int(F) with

∇ζ D(ζ, ξ) = 2A^T ∇(φ′)^soc(Aζ + b) A(ζ − ξ);    (3.56)

(c) for any fixed ξ ∈ F, the function D(·, ξ) is convex on int(F), and for any fixed ξ ∈ int(F), D(·, ξ) is strictly convex over int(F);

(d) for any fixed ξ ∈ int(F ), the function D(·, ξ) is essentially smooth;

(e) for any fixed ξ ∈ F, the level sets L(ξ, γ) := { ζ ∈ int(F) | D(ζ, ξ) ≤ γ } are bounded for all γ ≥ 0.

Now we describe an approximate version of the proximal-like algorithm (3.34).

The APM. Given a starting point ζ^0 ∈ int(F) and constants εk ≥ 0 and μk > 0, generate the sequence {ζ^k} ⊂ int(F) satisfying

g^k ∈ ∂_{εk} f(ζ^k),
μk g^k + ∇ζ D(ζ^k, ζ^{k−1}) = 0,    (3.57)

where ∂_ε f represents the ε-subdifferential of f.

Remark 3.5. The APM can be regarded as an approximate version of the entropy proximal-like algorithm (3.34) in the following sense. From the relation in (3.57) and the convexity of D(·, ξ) over int(F) for any fixed ξ ∈ int(F), it follows that for any u ∈ int(F),

f(u) ≥ f(ζ^k) + ⟨u − ζ^k, g^k⟩ − εk

and

μk^{-1} D(u, ζ^{k−1}) ≥ μk^{-1} D(ζ^k, ζ^{k−1}) + μk^{-1} ⟨∇ζ D(ζ^k, ζ^{k−1}), u − ζ^k⟩.

Adding the last two inequalities and using (3.57) yields

f(u) + μk^{-1} D(u, ζ^{k−1}) ≥ f(ζ^k) + μk^{-1} D(ζ^k, ζ^{k−1}) − εk.

This implies that

ζ^k ∈ εk-argmin{ f(ζ) + μk^{-1} D(ζ, ζ^{k−1}) },    (3.58)

where, for a given function F and ε ≥ 0, the notation

ε-argmin F(ζ) := { ζ | F(ζ) ≤ inf F(ζ) + ε }.    (3.59)
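To fix ideas, the following sketch carries out one APM subproblem for a linear objective f(ζ) = c^T ζ, using plain backtracking gradient descent as the (inexact) inner solver. It reuses D and dsoc_jac from the earlier sketches; apm_step, the step sizes, and the iteration counts are our illustrative choices, not part of the method's specification:

```python
import numpy as np

def apm_step(cvec, A, bvec, zeta_prev, mu, dphi, ddphi, phi,
             iters=200, lr0=1.0):
    """Approximately solve subproblem (3.34) with f(zeta) = cvec^T zeta."""
    lam_min = lambda v: v[0] - np.linalg.norm(v[1:])        # lambda_1(v)
    F = lambda zeta: cvec @ zeta + D(phi, dphi,
                                     A @ zeta + bvec,
                                     A @ zeta_prev + bvec) / mu
    zeta = zeta_prev.copy()
    for _ in range(iters):
        x = A @ zeta + bvec
        # gradient of the subproblem objective, via (3.56)
        grad = cvec + (2.0 / mu) * A.T @ (dsoc_jac(dphi, ddphi, x)
                                          @ (A @ (zeta - zeta_prev)))
        lr, f0 = lr0, F(zeta)
        while True:                   # backtrack: stay interior and descend
            trial = zeta - lr * grad
            if lam_min(A @ trial + bvec) > 0 and F(trial) <= f0:
                break
            lr *= 0.5
            if lr < 1e-16:
                return zeta           # no further progress possible
        zeta = trial
    return zeta
```

Since D(·, ζ^{k−1}) is essentially smooth (Proposition 3.10(d)), its gradient blows up at bd(F), so the backtracking line search naturally keeps the iterates inside int(F); this is the interior-point flavor that the APM inherits.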

In the rest of this section, we focus on the convergence of the APM defined as in (3.57) under assumptions (A1) and (A2). First, we prove that the APM generates a sequence {ζ^k} ⊂ int(F), and consequently the APM is well-defined.

Proposition 3.11. For any ξ ∈ int(F) and μ > 0, we have the following results.

(a) The function F(·) := f(·) + μ^{-1}D(·, ξ) has bounded level sets under assumption (A1).

(b) If, in addition, assumption (A2) holds, then there exists a unique ζ̂ ∈ int(F) such that

ζ̂ = argmin_{ζ ∈ int(F)} { f(ζ) + μ^{-1} D(ζ, ξ) },    (3.60)

and moreover, the minimum on the right-hand side is attained at ζ̂ satisfying

−2μ^{-1} A^T ∇(φ′)^soc(Aζ̂ + b) A(ζ̂ − ξ) ∈ ∂f(ζ̂).    (3.61)

Proof. (a) Fix ξ ∈ int(F) and μ > 0. By assumption (A1) and the nonnegativity of D(ζ, ξ), to show that F(ζ) has bounded level sets, it suffices to show that for all ν ≥ f∗, the level sets L(ν) := { ζ ∈ int(F) | F(ζ) ≤ ν } are bounded. Notice that L(ν) ⊆ L(ξ, μ(ν − f∗)), and the sets L(ξ, γ) := { ζ ∈ int(F) | D(ζ, ξ) ≤ γ } are bounded for all γ ≥ 0 by Proposition 3.10(e). Therefore, the sets L(ν) for all ν ≥ f∗ are bounded.

(b) By Proposition 3.10(b) and (c), F(ζ) is a closed proper strictly convex function. Hence, if the minimum exists, it must be unique. From part (a), the minimizer ζ̂ exists, and so it is unique. Under assumption (A2), using the gradient formula in (3.56) and the optimality conditions for (3.60) then yields that

0 ∈ ∂f(ζ̂) + 2μ^{-1} A^T ∇(φ′)^soc(Aζ̂ + b) A(ζ̂ − ξ) + ∂δ(ζ̂ | F),    (3.62)

where δ(u | F) = 0 if u ∈ F and +∞ otherwise. By Proposition 3.10(d) and [131, Theorem 26.1], we have ∂ζ D(ζ, ξ) = ∅ for all ζ ∈ bd(F). Hence, the relation in (3.62) implies that ζ̂ ∈ int(F). On the other hand, from [131, page 226], we know that

∂δ(u | F) = { v ∈ IRn | v ⪯Kn 0, tr(v ◦ u) = 0 }.

Using Property 1.3(d), we then obtain ∂δ(ζ̂ | F) = {0}. Thus, the proof is complete. □

Next, we investigate the properties of the sequence {ζ^k} generated by the APM defined as in (3.57).

Proposition 3.12. Let {μk} be any sequence of positive numbers and set σn := Σ_{k=1}^n μk. Let {ζ^k} be the sequence generated by the APM defined as in (3.57). Then, the following hold.

(a) μk[f(ζ^k) − f(ζ)] ≤ D(ζ^{k−1}, ζ) − D(ζ^k, ζ) + μk εk for all ζ ∈ F.

(b) D(ζ^k, ζ) ≤ D(ζ^{k−1}, ζ) + μk εk for all ζ ∈ F such that f(ζ) ≤ f(ζ^k).

(c) σn(f(ζ^n) − f(ζ)) ≤ D(ζ^0, ζ) − D(ζ^n, ζ) + Σ_{k=1}^n σk εk for all ζ ∈ F.

Proof. (a) For any ζ ∈ F, using the definition of the ε-subdifferential, we have

f(ζ) ≥ f(ζ^k) + ⟨g^k, ζ − ζ^k⟩ − εk,    (3.63)

where g^k ∈ ∂_{εk} f(ζ^k). Moreover, from (3.57) and (3.56), it follows that

g^k = −2μk^{-1} A^T ∇(φ′)^soc(Aζ^k + b) A(ζ^k − ζ^{k−1}).

Substituting this g^k into (3.63), we then obtain that

μk[f(ζ^k) − f(ζ)] ≤ 2⟨ A^T ∇(φ′)^soc(Aζ^k + b) A(ζ^k − ζ^{k−1}), ζ − ζ^k ⟩ + μk εk.

On the other hand, applying Proposition 3.9 at the points x = Aζ^{k−1} + b, y = Aζ^k + b and z = Aζ + b and using the definition of D(ζ, ξ) given by (3.55) yields

D(ζ^{k−1}, ζ) − D(ζ^k, ζ) ≥ 2⟨ A^T ∇(φ′)^soc(Aζ^k + b) A(ζ^k − ζ^{k−1}), ζ − ζ^k ⟩.

Combining the last two inequalities, we immediately obtain the result.

(b) The result follows directly from part (a) for any ζ ∈ F such that f(ζ^k) ≥ f(ζ).

(c) First, from (3.58), it follows that

ζ^k ∈ εk-argmin{ f(ζ) + μk^{-1} D(ζ, ζ^{k−1}) }.

This implies that for any ζ ∈ int(F),

f(ζ) + μk^{-1} D(ζ, ζ^{k−1}) ≥ f(ζ^k) + μk^{-1} D(ζ^k, ζ^{k−1}) − εk.

Setting ζ = ζ^{k−1} in this inequality and using Proposition 3.10(a) then yields that

f(ζ^{k−1}) − f(ζ^k) ≥ μk^{-1} D(ζ^k, ζ^{k−1}) − εk ≥ −εk.

Multiplying this inequality by σ_{k−1} and summing over k = 1, 2, · · · , n, we get

Σ_{k=1}^n [ σ_{k−1} f(ζ^{k−1}) − (σk − μk) f(ζ^k) ] ≥ −Σ_{k=1}^n σ_{k−1} εk,

which, by noting that σk = μk + σ_{k−1} (with σ0 ≡ 0), can be reduced to

σn f(ζ^n) − Σ_{k=1}^n μk f(ζ^k) ≤ Σ_{k=1}^n σ_{k−1} εk.

On the other hand, using part (a) and summing over k = 1, 2, · · · , n, we have

−σn f(ζ) + Σ_{k=1}^n μk f(ζ^k) ≤ D(ζ^0, ζ) − D(ζ^n, ζ) + Σ_{k=1}^n μk εk,  ∀ζ ∈ F.

Adding the last two inequalities yields

σn(f(ζ^n) − f(ζ)) ≤ D(ζ^0, ζ) − D(ζ^n, ζ) + Σ_{k=1}^n (μk + σ_{k−1}) εk,

which proves (c) because μk + σ_{k−1} = σk. □

We are now in a position to prove our main convergence result for the APM defined as in (3.57).

Proposition 3.13. Let {ζ^k} be the sequence generated by the APM defined as in (3.57) and σn = Σ_{k=1}^n μk. Then, under assumptions (A1) and (A2), the following hold.

(a) If σn → ∞ and μk^{-1} σk εk → 0, then lim_{n→∞} f(ζ^n) = f∗.

(b) If the optimal set X ≠ ∅, σn → ∞ and Σ_{k=1}^∞ μk εk < ∞, then the sequence {ζ^k} is bounded and every accumulation point is a solution of the CSOCP.

Proof. (a) From Proposition 3.12(c) and the nonnegativity of D(ζ^n, ζ), it follows that

f(ζ^n) − f(ζ) ≤ σn^{-1} D(ζ^0, ζ) + σn^{-1} Σ_{k=1}^n σk εk,  ∀ζ ∈ F.

Letting n → +∞ on both sides of the last inequality, the first term on the right-hand side goes to zero since σn → +∞. In addition, applying Lemma 3.6 with ank := σn^{-1} μk if k ≤ n and ank := 0 otherwise, and uk := μk^{-1} σk εk, we obtain that the second term on the right-hand side satisfies

σn^{-1} Σ_{k=1}^n σk εk = Σ_k ank uk → 0,

because σn → +∞ and μk^{-1} σk εk → 0. Therefore, we have

lim sup_{n→+∞} f(ζ^n) ≤ f∗.

This, together with the fact that f(ζ^n) ≥ f∗, implies the desired result.

(b) Suppose that ζ ∈ X. For any k, we have f(ζ^k) ≥ f(ζ). From Proposition 3.12(b), it then follows that

D(ζ^k, ζ) ≤ D(ζ^{k−1}, ζ) + μk εk.

Since Σ_{k=1}^∞ μk εk < ∞, using Lemma 3.7 with υk := D(ζ^k, ζ) ≥ 0 and βk := μk εk ≥ 0 yields that the sequence {D(ζ^k, ζ)} converges. Thus, by Proposition 3.10(e), the sequence {ζ^k} is bounded and consequently has an accumulation point. Without any loss of generality, let ζ̂ ∈ F be an accumulation point of {ζ^k}, so that there exists a subsequence {ζ^{k_j}} → ζ̂ as k_j → ∞. Since f is lower semicontinuous, we obtain f(ζ̂) ≤ lim inf_{k_j→∞} f(ζ^{k_j}). On the other hand, f(ζ^{k_j}) → f∗ by part (a). The two sides imply that f(ζ̂) = f∗. Therefore, ζ̂ is a solution of the CSOCP. The proof is thus complete. □
