
Proximal-like algorithm for SOCP

In the document SOC Functions and Their Applications (pages 104-119)

Chapter 3

Algorithmic Applications

In this chapter, we will see details about how the characterizations established in Chapter 2 can be applied in real algorithms. In particular, SOC-convexity is often involved in the solution methods of convex SOCPs, for example, in the proximal-like methods. We present three types of proximal-like algorithms, and refer the readers to [115, 116, 118] for their numerical performance.

The proximal point algorithm for minimizing a convex function f on IRm replaces the problem min_{ζ∈IRm} f(ζ) by a sequence of minimization problems with strictly convex objectives, generating a sequence {ζk} defined by

ζk = argmin_{ζ∈IRm} { f(ζ) + (1/µk) ‖ζ − ζk−1‖² },   (3.2)

where {µk} is a sequence of positive numbers and ‖·‖ denotes the Euclidean norm in IRm. The method was due to Martinet [105], who introduced the above proximal minimization problem based on the Moreau proximal approximation [110] of f. The proximal point algorithm was then further developed and studied by Rockafellar [132, 133]. Later, several researchers [34, 39, 59, 60, 144] proposed and investigated nonquadratic proximal point algorithms for convex programming with nonnegative constraints, by replacing the quadratic distance in (3.2) with other distance-like functions. Among others, Censor and Zenios [34] replaced the method (3.2) by a method of the form

ζk = argmin_{ζ∈IRm} { f(ζ) + (1/µk) D(ζ, ζk−1) },   (3.3)

where D(·, ·), called D-function, is a measure of distance based on a Bregman function.

Recall that, given a differentiable function ϕ, it is called a Bregman function [33, 54] if it satisfies the properties listed in Definition 3.1 below, and the induced D-function is given as follows:

D(ζ, ξ) := ϕ(ζ) − ϕ(ξ) − ⟨∇ϕ(ξ), ζ − ξ⟩,   (3.4)

where ⟨·, ·⟩ denotes the inner product in IRm and ∇ϕ denotes the gradient of ϕ.

Definition 3.1. Let S ⊆ IRm be an open set and ¯S be its closure. The function ϕ : ¯S → IR is called a Bregman function with zone S if the following properties hold:

(i) ϕ is continuously differentiable on S;

(ii) ϕ is strictly convex and continuous on ¯S;

(iii) For each γ ∈ IR, the level sets LD(ξ, γ) = {ζ ∈ ¯S : D(ζ, ξ) ≤ γ} and LD(ζ, γ) = {ξ ∈ S : D(ζ, ξ) ≤ γ} are bounded for any ξ ∈ S and ζ ∈ ¯S, respectively;

(iv) If {ξk} ⊂ S converges to ξ, then D(ξ, ξk) → 0;

(v) If {ζk} and {ξk} are sequences such that ξk → ξ ∈ ¯S, {ζk} is bounded and if D(ζk, ξk) → 0, then ζk→ ξ.

The Bregman proximal minimization (BPM) method described in (3.3) was further extended by Kiwiel [89] with generalized Bregman functions, called B-functions. Compared with Bregman functions, these functions are possibly nondifferentiable and infinite on the boundary of their domain. For the detailed definition of B-functions and the convergence of the BPM method using B-functions, please refer to [89].

Next, we present a class of distance measures on SOC and discuss its relations with the D-function and the double-regularized Bregman distance [137]. To this end, we need a class of functions φ : [0, ∞) → IR satisfying

(T1) φ is continuously differentiable on IR++;

(T2) φ is strictly convex and continuous on IR+;

(T3) For each γ ∈ IR, the level sets {s ∈ IR+ | d(s, t) ≤ γ} and {t ∈ IR++ | d(s, t) ≤ γ} are bounded for any t ∈ IR++ and s ∈ IR+, respectively;

(T4) If {tk} ⊂ IR++ is a sequence such that lim_{k→+∞} tk = 0, then for all s ∈ IR++, lim_{k→+∞} φ′(tk)(s − tk) = −∞;

where the function d : [0, ∞) × (0, ∞) → IR is defined by

d(s, t) = φ(s) − φ(t) − φ′(t)(s − t), ∀s ∈ IR+, t ∈ IR++.   (3.5)

The function φ satisfying (T4) is said in [80–82] to be boundary coercive. If we set φ(x) = +∞ when x ∉ IR+, then φ becomes a closed proper strictly convex function on IR. Furthermore, by [89, Lemma 2.4(d)] and (T3), it is not difficult to see that φ(x) and ∑_{i=1}^{n} φ(xi) are B-functions on IR and IRn, respectively. Unless otherwise stated, in the rest of this section, we always assume that φ satisfies (T1)-(T4).
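As a quick sanity check (our addition, not part of the original text), the entropy function φ(t) = t ln t − t, which reappears in Example 3.1 below, satisfies these properties; the following sketch verifies d(s, t) ≥ 0 from (3.5) and the boundary coercivity (T4) numerically.

```python
import math

def phi(t):
    # entropy function phi(t) = t*ln(t) - t, with the convention phi(0) = 0
    return t * math.log(t) - t if t > 0 else 0.0

def dphi(t):
    # phi'(t) = ln(t) on (0, inf)
    return math.log(t)

def d(s, t):
    # the distance (3.5): d(s, t) = phi(s) - phi(t) - phi'(t)*(s - t)
    return phi(s) - phi(t) - dphi(t) * (s - t)

# d(s, t) >= 0 with equality iff s = t (strict convexity of phi, (T2))
assert abs(d(2.0, 2.0)) < 1e-12
assert all(d(s, t) > 0 for s, t in [(1.0, 3.0), (0.5, 0.1), (4.0, 0.2)])

# boundary coercivity (T4): phi'(tk)*(s - tk) -> -inf as tk -> 0+
s = 1.0
values = [dphi(tk) * (s - tk) for tk in (1e-2, 1e-4, 1e-8)]
assert values[0] > values[1] > values[2]  # decreasing without bound
```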

Using (1.9), the corresponding SOC functions of φ and φ0 are given by

φsoc(x) = φ(λ1(x)) u_x^{(1)} + φ(λ2(x)) u_x^{(2)},   (3.6)

and

(φ′)soc(x) = φ′(λ1(x)) u_x^{(1)} + φ′(λ2(x)) u_x^{(2)},   (3.7)

which are well-defined over Kn and int(Kn), respectively. In view of this, we define

H(x, y) := { tr[φsoc(x) − φsoc(y) − (φ′)soc(y) ∘ (x − y)], ∀x ∈ Kn, y ∈ int(Kn),
           { ∞, otherwise.   (3.8)

In what follows, we will show that the function H : IRn × IRn → (−∞, +∞] enjoys some favorable properties similar to those of the D-function. Particularly, we prove that H(x, y) ≥ 0 for any x ∈ Kn, y ∈ int(Kn), and moreover, H(x, y) = 0 if and only if x = y.

Consequently, it can be regarded as a distance measure on the SOC.
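To make the definition concrete, here is a small numerical sketch (our addition, not code from the text) that evaluates H in (3.8): build φsoc and (φ′)soc from the spectral decomposition (1.9) and take traces. We try it with φ(t) = t² − √t, the function of Example 3.2 below; `numpy` is assumed.

```python
import numpy as np

def spectral(x):
    # Spectral decomposition w.r.t. Kn (cf. (1.9)): x = lam1*u1 + lam2*u2,
    # lam_{1,2} = x1 -/+ ||x2||, u_{1,2} = (1/2)(1, -/+ x2/||x2||).
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(len(x2))[0]  # any unit vector works when x2 = 0
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return (x1 - r, x1 + r), (u1, u2)

def soc_fun(f, x):
    # f^soc(x) = f(lam1)*u1 + f(lam2)*u2, as in (3.6)-(3.7)
    (l1, l2), (u1, u2) = spectral(x)
    return f(l1) * u1 + f(l2) * u2

def jordan(x, y):
    # Jordan product x o y = (x^T y, x1*y2 + y1*x2)
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def H(phi, dphi, x, y):
    # quasi D-function (3.8); tr(z) = lam1 + lam2 = 2*z1
    tr = lambda z: 2.0 * z[0]
    return tr(soc_fun(phi, x)) - tr(soc_fun(phi, y)) - tr(jordan(soc_fun(dphi, y), x - y))

phi = lambda t: t**2 - np.sqrt(t)            # the phi of Example 3.2
dphi = lambda t: 2.0 * t - 0.5 / np.sqrt(t)  # phi'(t)

x = np.array([2.0, 0.5, -0.3])  # interior of K^3 since x1 > ||x2||
y = np.array([1.5, 0.2, 0.4])
assert abs(H(phi, dphi, x, x)) < 1e-10  # H vanishes on the diagonal
assert H(phi, dphi, x, y) > 0           # and is positive off it
```

The two assertions are exactly the distance-measure properties proved in Proposition 3.1(c) below.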

We first start with a technical lemma that will be used in the subsequent analysis.

Lemma 3.1. Suppose that φ : [0, ∞) → IR satisfies (T1)-(T4). Let φsoc(x) and (φ′)soc(x) be given as in (3.6) and (3.7), respectively. Then, the following hold.

(a) φsoc(x) is continuously differentiable on int(Kn) with the gradient ∇φsoc(x) satisfying

∇φsoc(x)e = (φ′)soc(x).

(b) tr[φsoc(x)] = ∑_{i=1}^{2} φ(λi(x)) and tr[(φ′)soc(x)] = ∑_{i=1}^{2} φ′(λi(x)).

(c) tr[φsoc(x)] is continuously differentiable on int(Kn) with ∇tr[φsoc(x)] = 2∇φsoc(x)e.

(d) tr[φsoc(x)] is strictly convex and continuous on int(Kn).

(e) If {yk} ⊂ int(Kn) is a sequence such that lim_{k→+∞} yk = ¯y ∈ bd(Kn), then

lim_{k→+∞} ⟨∇tr[φsoc(yk)], x − yk⟩ = −∞ for all x ∈ int(Kn).

In other words, the function tr[φsoc(x)] is boundary coercive.

Proof. (a) The first part follows directly from Proposition 1.14. Now we prove the second part. If x2 ≠ 0, then by formulas (1.29)-(1.30) it is easy to compute that

∇φsoc(x)e = ( (φ′(λ2(x)) + φ′(λ1(x)))/2 , ((φ′(λ2(x)) − φ′(λ1(x)))/2) x2/‖x2‖ ).

In addition, using equations (1.4) and (3.7), we can prove that the vector on the right-hand side is exactly (φ′)soc(x). Therefore, ∇φsoc(x)e = (φ′)soc(x). If x2 = 0, then using (1.28) and (1.4), we can also prove that ∇φsoc(x)e = (φ′)soc(x).

(b) The result follows directly from (1.6) and equations (3.6)-(3.7).

(c) From part(a) and the fact that tr[φsoc(x)] = tr[φsoc(x) ∘ e] = 2⟨φsoc(x), e⟩, clearly, tr[φsoc(x)] is continuously differentiable on int(Kn). Applying the chain rule for the inner product of two functions immediately yields ∇tr[φsoc(x)] = 2∇φsoc(x)e.

(d) It is clear that φsoc(x) is continuous on Kn. We next prove that tr[φsoc(x)] is strictly convex on int(Kn). For any x, y ∈ Kn with x ≠ y and α, β ∈ (0, 1) with α + β = 1, we have

λ1(αx + βy) = αx1 + βy1 − ‖αx2 + βy2‖ ≥ αλ1(x) + βλ1(y),
λ2(αx + βy) = αx1 + βy1 + ‖αx2 + βy2‖ ≤ αλ2(x) + βλ2(y),

which implies that

αλ1(x) + βλ1(y) ≤ λ1(αx + βy) ≤ λ2(αx + βy) ≤ αλ2(x) + βλ2(y).

On the other hand,

λ1(αx + βy) + λ2(αx + βy) = 2αx1+ 2βy1 = [αλ1(x) + βλ1(y)] + [αλ2(x) + βλ2(y)].

The last two equations imply that there exists ρ ∈ [0, 1] such that

λ1(αx + βy) = ρ[αλ1(x) + βλ1(y)] + (1 − ρ)[αλ2(x) + βλ2(y)],
λ2(αx + βy) = (1 − ρ)[αλ1(x) + βλ1(y)] + ρ[αλ2(x) + βλ2(y)].

Thus, we have

tr[φsoc(αx + βy)] = φ[λ1(αx + βy)] + φ[λ2(αx + βy)]
= φ[ρ(αλ1(x) + βλ1(y)) + (1 − ρ)(αλ2(x) + βλ2(y))] + φ[(1 − ρ)(αλ1(x) + βλ1(y)) + ρ(αλ2(x) + βλ2(y))]
≤ ρφ(αλ1(x) + βλ1(y)) + (1 − ρ)φ(αλ2(x) + βλ2(y)) + (1 − ρ)φ(αλ1(x) + βλ1(y)) + ρφ(αλ2(x) + βλ2(y))
= φ(αλ1(x) + βλ1(y)) + φ(αλ2(x) + βλ2(y))
< αφ(λ1(x)) + βφ(λ1(y)) + αφ(λ2(x)) + βφ(λ2(y))
= α tr[φsoc(x)] + β tr[φsoc(y)],

where the first equality and the last one follow from part(b), and the two inequalities are due to the strict convexity of φ on IR++. From the definition of strict convexity, we thus prove that the conclusion holds.

(e) From part(a) and part(c), we can readily obtain the following equality:

∇tr[φsoc(x)] = 2(φ′)soc(x), ∀x ∈ int(Kn).   (3.9)

Using this relation and Proposition 1.3(b), we then have

⟨∇tr[φsoc(yk)], x − yk⟩ = 2⟨(φ′)soc(yk), x − yk⟩
= tr[(φ′)soc(yk) ∘ (x − yk)]
= tr[(φ′)soc(yk) ∘ x] − tr[(φ′)soc(yk) ∘ yk]
≤ ∑_{i=1}^{2} φ′(λi(yk)) λi(x) − tr[(φ′)soc(yk) ∘ yk].   (3.10)

In addition, by Property 1.1(a)-(b), for any y ∈ int(Kn), we can compute that

(φ′)soc(y) ∘ y = [φ′(λ1(y)) u_y^{(1)} + φ′(λ2(y)) u_y^{(2)}] ∘ [λ1(y) u_y^{(1)} + λ2(y) u_y^{(2)}]
= φ′(λ1(y)) λ1(y) u_y^{(1)} + φ′(λ2(y)) λ2(y) u_y^{(2)},   (3.11)

which implies that

tr[(φ′)soc(yk) ∘ yk] = ∑_{i=1}^{2} φ′(λi(yk)) λi(yk).   (3.12)

Combining (3.10) with (3.12) immediately yields

⟨∇tr[φsoc(yk)], x − yk⟩ ≤ ∑_{i=1}^{2} φ′(λi(yk)) [λi(x) − λi(yk)].   (3.13)

Note that λ2(¯y) ≥ λ1(¯y) = 0 and λ2(x) ≥ λ1(x) > 0 since ¯y ∈ bd(Kn) and x ∈ int(Kn).

Hence, if λ2(¯y) = 0, then by (T4) and the continuity of λi(·) for i = 1, 2,

lim_{k→+∞} φ′(λi(yk)) [λi(x) − λi(yk)] = −∞, i = 1, 2,

which means that

lim_{k→+∞} ∑_{i=1}^{2} φ′(λi(yk)) [λi(x) − λi(yk)] = −∞.   (3.14)

If λ2(¯y) > 0, then lim_{k→+∞} φ′(λ2(yk)) [λ2(x) − λ2(yk)] is finite and

lim_{k→+∞} φ′(λ1(yk)) [λ1(x) − λ1(yk)] = −∞,

and therefore the result in (3.14) also holds in this case. Combining (3.14) with (3.13), we prove that the conclusion holds. □

Using the relation in (3.9), we have that for any x ∈ Kn and y ∈ int(Kn),

tr[(φ′)soc(y) ∘ (x − y)] = 2⟨(φ′)soc(y), x − y⟩ = ⟨∇tr[φsoc(y)], x − y⟩.

As a consequence, the function H(x, y) in (3.8) can be rewritten as

H(x, y) = { tr[φsoc(x)] − tr[φsoc(y)] − ⟨∇tr[φsoc(y)], x − y⟩, ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise.   (3.15)

Using this representation, we next investigate several important properties of H(x, y).

Proposition 3.1. Let H(x, y) be the function defined as in (3.8) or (3.15). Then, the following hold.

(a) H(x, y) is continuous on Kn× int(Kn), and for any y ∈ int(Kn), the function H(·, y) is strictly convex on Kn.

(b) For any given y ∈ int(Kn), H(·, y) is continuously differentiable on int(Kn) with

∇x H(x, y) = ∇tr[φsoc(x)] − ∇tr[φsoc(y)] = 2[(φ′)soc(x) − (φ′)soc(y)].   (3.16)

(c) H(x, y) ≥ ∑_{i=1}^{2} d(λi(x), λi(y)) ≥ 0 for any x ∈ Kn and y ∈ int(Kn), where d(·, ·) is defined by (3.5). Moreover, H(x, y) = 0 if and only if x = y.

(d) For every γ ∈ IR, the partial level sets of LH(y, γ) = {x ∈ Kn : H(x, y) ≤ γ} and LH(x, γ) = {y ∈ int(Kn) : H(x, y) ≤ γ} are bounded for any y ∈ int(Kn) and x ∈ Kn, respectively.

(e) If {yk} ⊂ int(Kn) is a sequence converging to y ∈ int(Kn), then H(y, yk) → 0.

(f ) If {xk} ⊂ int(Kn) and {yk} ⊂ int(Kn) are sequences such that {yk} → y ∈ int(Kn), {xk} is bounded, and H(xk, yk) → 0, then xk → y.

Proof. (a) Note that φsoc(x), (φ′)soc(y), and (φ′)soc(y) ∘ (x − y) are continuous for any x ∈ Kn and y ∈ int(Kn), and the trace function tr(·) is also continuous; hence H(x, y) is continuous on Kn × int(Kn). From Lemma 3.1(d), tr[φsoc(x)] is strictly convex over Kn, whereas −tr[φsoc(y)] − ⟨∇tr[φsoc(y)], x − y⟩ is clearly convex in Kn for fixed y ∈ int(Kn). This means that H(·, y) is strictly convex for any y ∈ int(Kn).

(b) By Lemma 3.1(c), the function H(·, y) for any given y ∈ int(Kn) is continuously differentiable on int(Kn). The first equality in (3.16) is obvious and the second is due to (3.9).

(c) The result follows directly from the following equalities and inequalities:

H(x, y) = tr[φsoc(x)] − tr[φsoc(y)] − tr[(φ′)soc(y) ∘ (x − y)]
= tr[φsoc(x)] − tr[φsoc(y)] − tr[(φ′)soc(y) ∘ x] + tr[(φ′)soc(y) ∘ y]
≥ tr[φsoc(x)] − tr[φsoc(y)] − ∑_{i=1}^{2} φ′(λi(y)) λi(x) + tr[(φ′)soc(y) ∘ y]
= ∑_{i=1}^{2} [φ(λi(x)) − φ(λi(y)) − φ′(λi(y)) λi(x) + φ′(λi(y)) λi(y)]
= ∑_{i=1}^{2} [φ(λi(x)) − φ(λi(y)) − φ′(λi(y)) (λi(x) − λi(y))]
= ∑_{i=1}^{2} d(λi(x), λi(y)) ≥ 0,

where the first equality is due to (3.8), the second and fourth are obvious, the third follows from Lemma 3.1(b) and (3.11), the last one is from (3.5), the first inequality follows from Proposition 1.3(b), and the last one is due to the strict convexity of φ on IR+. Note that tr[φsoc(x)] is strictly convex for any x ∈ Kn by Lemma 3.1(d), and therefore H(x, y) = 0 if and only if x = y by (3.15).

(d) From part(c), we have that LH(y, γ) ⊆ {x ∈ Kn | ∑_{i=1}^{2} d(λi(x), λi(y)) ≤ γ}. By (T3), the set on the right-hand side is bounded. Thus, LH(y, γ) is bounded for y ∈ int(Kn). Similarly, LH(x, γ) is bounded for x ∈ Kn.

From part(a)-(d), we immediately obtain the results in (e) and (f). 

Remark 3.1. (i) From (3.8), it is not difficult to see that H(x, y) is exactly a distance measure induced by tr[φsoc(x)] via formula (3.4). Therefore, if n = 1 and φ is a Bregman function with zone IR++, i.e., φ also satisfies the property:

(T5) if {sk} ⊆ IR+ and {tk} ⊂ IR++ are sequences such that tk → t, {sk} is bounded, and d(sk, tk) → 0, then sk → t;

then H(x, y) reduces to the Bregman distance function d(x, y) in (3.5).

(ii) When n > 1, H(x, y) is generally not a Bregman distance even if φ is a Bregman function with zone IR++, noting that Proposition 3.1(e) and (f) do not hold for {xk} ⊆ bd(Kn) and y ∈ bd(Kn). By the proof of Proposition 3.1(c), the main reason is that, in order to guarantee

tr[(φ′)soc(y) ∘ x] = ∑_{i=1}^{2} φ′(λi(y)) λi(x)

for any x ∈ Kn and y ∈ int(Kn), the relation [(φ′)soc(y)]2 = αx2 with some α > 0 is required, where [(φ′)soc(y)]2 is the vector composed of the last n − 1 elements of (φ′)soc(y). It is very stringent for φ to satisfy such a relation. Consequently, tr[φsoc(x)] is not a B-function [89] on IRn, either, even if φ itself is a B-function.

(iii) We observe that H(x, y) is inseparable, whereas the double-regularized distance function proposed by [137] belongs to the separable class of functions. In view of this, H(x, y) cannot become a double-regularized distance function on Kn × int(Kn), even when φ is such that ˜d(s, t) = d(s, t)/φ′′(t) + (µ/2)(s − t)² is a double-regularized component (see [137]).

In view of Proposition 3.1 and Remark 3.1, we call H(x, y) a quasi D-function. In the following, we present several specific examples of quasi D-functions.

Example 3.1. Let φ : [0, ∞) → IR be φ(t) = t ln t − t with the convention 0 ln 0 = 0.

Solution. It is easy to verify that φ satisfies (T1)-(T4). By [63, Proposition 3.2 (b)] and (3.6)-(3.7), we can compute that for any x ∈ Kn and y ∈ int(Kn),

φsoc(x) = x ∘ ln x − x and (φ′)soc(y) = ln y.

Therefore, we obtain

H(x, y) = { tr(x ∘ ln x − x ∘ ln y + y − x), ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise,

which is a quasi D-function. □
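As a numerical cross-check (our addition, not part of the text), the closed form above can be compared against the definition (3.8) directly; the helpers below implement the spectral decomposition and Jordan product on K³ and assume `numpy`.

```python
import numpy as np

def spectral(x):
    # x = lam1*u1 + lam2*u2 with lam_{1,2} = x1 -/+ ||x2||, cf. (1.9)
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(len(x2))[0]
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return (x1 - r, x1 + r), (u1, u2)

def soc_ln(x):
    # ln^soc(x) = ln(lam1)*u1 + ln(lam2)*u2
    (l1, l2), (u1, u2) = spectral(x)
    return np.log(l1) * u1 + np.log(l2) * u2

def jordan(x, y):
    # Jordan product x o y = (x^T y, x1*y2 + y1*x2)
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

tr = lambda z: 2.0 * z[0]  # tr(z) = lam1 + lam2 = 2*z1

def H_closed(x, y):
    # closed form of Example 3.1: tr(x o ln x - x o ln y + y - x)
    return tr(jordan(x, soc_ln(x)) - jordan(x, soc_ln(y)) + y - x)

def H_def(x, y):
    # definition (3.8) with phi_soc(x) = x o ln x - x and (phi')_soc(y) = ln y
    phisoc = lambda z: jordan(z, soc_ln(z)) - z
    return tr(phisoc(x) - phisoc(y) - jordan(soc_ln(y), x - y))

x = np.array([2.0, 0.6, -0.8])  # interior of K^3: x1 > ||x2||
y = np.array([1.2, 0.3, 0.1])
assert abs(H_closed(x, y) - H_def(x, y)) < 1e-10  # the two expressions agree
assert H_closed(x, y) > 0 and abs(H_closed(x, x)) < 1e-12
```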

Example 3.2. Let φ : [0, ∞) → IR be φ(t) = t² − √t.

Solution. It is not hard to verify that φ satisfies (T1)-(T4). From Property 1.2, we have that for any x ∈ Kn,

x² = x ∘ x = λ1(x)² u_x^{(1)} + λ2(x)² u_x^{(2)} and x^{1/2} = √λ1(x) u_x^{(1)} + √λ2(x) u_x^{(2)}.

By a direct computation, we then obtain for any x ∈ Kn and y ∈ int(Kn),

φsoc(x) = x ∘ x − x^{1/2} and (φ′)soc(y) = 2y − [tr(y^{1/2})e − y^{1/2}] / [2√det(y)].

This yields

H(x, y) = { tr[ (x − y)² − (x^{1/2} − y^{1/2}) + (tr(y^{1/2})e − y^{1/2}) ∘ (x − y) / (2√det(y)) ], ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise,

which is a quasi D-function. □

Example 3.3. Let φ : [0, ∞) → IR be φ(t) = t ln t − (1 + t) ln(1 + t) + (1 + t) ln 2 with the convention 0 ln 0 = 0.

Solution. It is easily shown that φ satisfies (T1)-(T4). Using [63, Proposition 3.2(b)], we know that for any x ∈ Kn and y ∈ int(Kn),

φsoc(x) = x ∘ ln x − (e + x) ∘ ln(e + x) + (e + x) ln 2

and

(φ′)soc(y) = ln y − ln(e + y) + e ln 2.

Consequently, we obtain

H(x, y) = { tr[x ∘ (ln x − ln y) − (e + x) ∘ (ln(e + x) − ln(e + y))], ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise,

which is a quasi D-function. □

In addition, from [80, 82, 144], it follows that ∑_{i=1}^{m} φ(ζi) generated by φ in the above examples is a Bregman function with zone S = IRm_+, and consequently ∑_{i=1}^{m} d(ζi, ξi) defined as in (3.5) is a D-function induced by ∑_{i=1}^{m} φ(ζi).

Proposition 3.2. Let H(x, y) be defined as in (3.8) or (3.15). Then, for all x, y ∈ int(Kn) and z ∈ Kn, the following three-points identity holds:

H(z, x) + H(x, y) − H(z, y) = ⟨∇tr[φsoc(y)] − ∇tr[φsoc(x)], z − x⟩
= tr[((φ′)soc(y) − (φ′)soc(x)) ∘ (z − x)].

Proof. Using the definition of H given as in (3.15), we have

⟨∇tr[φsoc(x)], z − x⟩ = tr[φsoc(z)] − tr[φsoc(x)] − H(z, x),
⟨∇tr[φsoc(y)], x − y⟩ = tr[φsoc(x)] − tr[φsoc(y)] − H(x, y),
⟨∇tr[φsoc(y)], z − y⟩ = tr[φsoc(z)] − tr[φsoc(y)] − H(z, y).

Subtracting the first two equations from the last one gives the first equality. By (3.9),

⟨∇tr[φsoc(y)] − ∇tr[φsoc(x)], z − x⟩ = 2⟨(φ′)soc(y) − (φ′)soc(x), z − x⟩.

This together with the fact that tr(x ∘ y) = 2⟨x, y⟩ leads to the second equality. □
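The identity is purely algebraic, so it can be verified numerically; the sketch below (our addition) does so on K³ with the entropy function φ(t) = t ln t − t of Example 3.1, for which (φ′)soc(y) = ln y; `numpy` is assumed.

```python
import numpy as np

def spectral(x):
    # x = lam1*u1 + lam2*u2 with lam_{1,2} = x1 -/+ ||x2||, cf. (1.9)
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(len(x2))[0]
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return (x1 - r, x1 + r), (u1, u2)

def soc_fun(f, x):
    # f^soc(x) = f(lam1)*u1 + f(lam2)*u2
    (l1, l2), (u1, u2) = spectral(x)
    return f(l1) * u1 + f(l2) * u2

def jordan(x, y):
    # Jordan product x o y = (x^T y, x1*y2 + y1*x2)
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

tr = lambda z: 2.0 * z[0]

phi, dphi = (lambda t: t * np.log(t) - t), np.log  # entropy function, phi' = ln

def H(x, y):
    # quasi D-function (3.8)
    return tr(soc_fun(phi, x)) - tr(soc_fun(phi, y)) - tr(jordan(soc_fun(dphi, y), x - y))

z = np.array([1.0, 0.2, 0.3])   # three interior points of K^3
x = np.array([2.0, 0.5, -0.7])
y = np.array([1.5, -0.4, 0.6])

lhs = H(z, x) + H(x, y) - H(z, y)
rhs = tr(jordan(soc_fun(dphi, y) - soc_fun(dphi, x), z - x))
assert abs(lhs - rhs) < 1e-10  # three-points identity of Proposition 3.2
```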

In this section, we propose a proximal-like algorithm for solving the CSOCP based on the quasi D-function H(x, y). For the sake of notation, we denote by F the set

F = { ζ ∈ IRm | Aζ + b ⪰_{Kn} 0 }.   (3.17)

It is easy to verify that F is convex and its interior int(F) is given by

int(F) = { ζ ∈ IRm | Aζ + b ≻_{Kn} 0 }.   (3.18)

Let ψ : IRm → (−∞, +∞] be the function defined by

ψ(ζ) = { tr[φsoc(Aζ + b)], if ζ ∈ F,
       { ∞, otherwise.   (3.19)

By Lemma 3.1, it is easily shown that the following conclusions hold for ψ(ζ).

Proposition 3.3. Let ψ(ζ) be given as in (3.19). If the matrix A has full rank m, then

(a) ψ(ζ) is continuously differentiable on int(F) with ∇ψ(ζ) = 2A^T (φ′)soc(Aζ + b);

(b) ψ(ζ) is strictly convex and continuous on F;

(c) ψ(ζ) is boundary coercive, i.e., if {ξk} ⊆ int(F) is such that lim_{k→+∞} ξk = ξ ∈ bd(F), then for all ζ ∈ int(F), there holds lim_{k→+∞} ∇ψ(ξk)^T (ζ − ξk) = −∞.

Let D(ζ, ξ) be the function induced by the above ψ(ζ) via formula (3.4), i.e.,

D(ζ, ξ) = ψ(ζ) − ψ(ξ) − ⟨∇ψ(ξ), ζ − ξ⟩.   (3.20)

Then, from (3.15) and (3.19), it is not difficult to see that

D(ζ, ξ) = H(Aζ + b, Aξ + b).   (3.21)

Thus, by Proposition 3.1 and Proposition 3.3, we draw the following conclusions.

Proposition 3.4. Let D(ζ, ξ) be given by (3.20) or (3.21). If the matrix A has full rank m, then

(a) D(ζ, ξ) is continuous on F × int(F ), and for any given ξ ∈ int(F ), the function D(·, ξ) is strictly convex on F .

(b) For any fixed ξ ∈ int(F), D(·, ξ) is continuously differentiable on int(F) with

∇ζ D(ζ, ξ) = ∇ψ(ζ) − ∇ψ(ξ) = 2A^T [(φ′)soc(Aζ + b) − (φ′)soc(Aξ + b)].

(c) D(ζ, ξ) ≥ ∑_{i=1}^{2} d(λi(Aζ + b), λi(Aξ + b)) ≥ 0 for any ζ ∈ F and ξ ∈ int(F), where d(·, ·) is defined by (3.5). Moreover, D(ζ, ξ) = 0 if and only if ζ = ξ.

(d) For each γ ∈ IR, the partial level sets of LD(ξ, γ) = {ζ ∈ F : D(ζ, ξ) ≤ γ} and LD(ζ, γ) = {ξ ∈ int(F ) : D(ζ, ξ) ≤ γ} are bounded for any ξ ∈ int(F ) and ζ ∈ F , respectively.

The PLA. The first proximal-like algorithm that we propose for the CSOCP (3.1) is defined as follows:

ζ0 ∈ int(F),
ζk = argmin_{ζ∈F} { f(ζ) + (1/µk) D(ζ, ζk−1) }   (k ≥ 1),   (3.22)

where {µk}_{k≥1} is a sequence of positive numbers. To establish the convergence of the algorithm, we make the following assumptions for the CSOCP:

(A1) inf{ f(ζ) | ζ ∈ F } := f∗ > −∞ and dom(f) ∩ int(F) ≠ ∅.

(A2) The matrix A is of maximal rank m.

Remark 3.2. Assumption (A1) is elementary for the solution of the CSOCP. Assumption (A2) is common in the solution of SOCPs and is obviously satisfied when F = Kn. Moreover, if we consider the standard SOCP

min c^T x
s.t. Ax = b, x ∈ Kn,

where A ∈ IR^{m×n} with m ≤ n, b ∈ IRm, and c ∈ IRn, the assumption that A has full row rank m is standard. Consequently, its dual problem, given by

max b^T y
s.t. c − A^T y ⪰_{Kn} 0,   (3.23)

satisfies assumption (A2). This shows that we can solve the SOCP by applying the proximal-like algorithm (PLA) defined as in (3.22) to the dual problem (3.23).
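To see the scheme (3.22) in action in the simplest possible setting (a sketch we add for illustration, not part of the text), take m = n = 1, A = 1, b = 0, so that F = IR+, and φ(t) = t ln t − t. For n = 1 both spectral values coincide, so D(ζ, ξ) = H(ζ, ξ) = 2d(ζ, ξ) = 2(ζ ln(ζ/ξ) − ζ + ξ); each subproblem is solved by bisection on its optimality condition. The objective f(ζ) = (ζ − 2)² is a hypothetical choice made only for this illustration, as is µk ≡ 1.

```python
import math

def prox_step(fprime, t_prev, mu, lo=1e-12, hi=1e6, iters=200):
    # Solve the subproblem optimality condition for n = 1:
    #   f'(z) + (2/mu) * ln(z / t_prev) = 0,
    # whose left-hand side is strictly increasing in z on (0, inf),
    # so plain bisection on a bracketing interval suffices.
    g = lambda z: fprime(z) + (2.0 / mu) * math.log(z / t_prev)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

fprime = lambda z: 2.0 * (z - 2.0)  # hypothetical f(z) = (z - 2)^2, minimizer 2 in IR+
z = 0.5                             # z0 in int(F) = (0, inf)
for _ in range(60):
    z = prox_step(fprime, z, mu=1.0)

assert z > 0                # iterates never leave the interior of the cone
assert abs(z - 2.0) < 1e-6  # and converge to the constrained minimizer
```

The fact that the iterates stay strictly positive without any explicit projection is exactly the interior-point behavior that Proposition 3.5 establishes in general.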

Now, we show the algorithm PLA given by (3.22) is well-defined under assumptions (A1) and (A2).

Proposition 3.5. Suppose that assumptions (A1)-(A2) hold. Then, the algorithm PLA given by (3.22) generates a sequence {ζk} ⊂ int(F) such that

−2µk^{−1} A^T [(φ′)soc(Aζk + b) − (φ′)soc(Aζk−1 + b)] ∈ ∂f(ζk).

Proof. The proof proceeds by induction. For k = 0, it clearly holds. Assume that ζk−1 ∈ int(F). Let fk(ζ) := f(ζ) + µk^{−1} D(ζ, ζk−1). Then assumption (A1) and Proposition 3.4(d) imply that fk has bounded level sets in F. By the lower semicontinuity of f and Proposition 3.4(a), the minimization problem min_{ζ∈F} fk(ζ), i.e., the subproblem in (3.22), has solutions. Moreover, the solution ζk is unique due to the convexity of f and the strict convexity of D(·, ξ). In the following, we prove that ζk ∈ int(F).

By [131, Theorem 23.8] and the definition of D(ζ, ξ) given by (3.20), we can verify that ζk is the only ζ ∈ dom(f) ∩ F such that

2µk^{−1} A^T (φ′)soc(Aζk−1 + b) ∈ ∂[ f(ζ) + µk^{−1} ψ(ζ) + δ(ζ|F) ],   (3.24)

where δ(ζ|F) = 0 if ζ ∈ F and +∞ otherwise. We will show that

∂[ f(ζ) + µk^{−1} ψ(ζ) + δ(ζ|F) ] = ∅ for all ζ ∈ bd(F),   (3.25)

which by (3.24) implies that ζk ∈ int(F). Take ζ ∈ bd(F) and assume that there exists w ∈ ∂[ f(ζ) + µk^{−1} ψ(ζ) ]. Take ˆζ ∈ dom(f) ∩ int(F) and let

ζl = (1 − εl)ζ + εl ˆζ   (3.26)

with lim_{l→+∞} εl = 0. From the convexity of int(F) and dom(f), it then follows that ζl ∈ dom(f) ∩ int(F), and moreover, lim_{l→+∞} ζl = ζ. Consequently,

εl w^T(ˆζ − ζ) = w^T(ζl − ζ)
≤ f(ζl) − f(ζ) + µk^{−1} [ψ(ζl) − ψ(ζ)]
≤ f(ζl) − f(ζ) + µk^{−1} ⟨2A^T (φ′)soc(Aζl + b), ζl − ζ⟩
≤ εl (f(ˆζ) − f(ζ)) + µk^{−1} (εl/(1 − εl)) tr[(φ′)soc(Aζl + b) ∘ (Aˆζ − Aζl)],

where the first equality is due to (3.26), the first inequality follows from the definition of subdifferential and the convexity of f(ζ) + µk^{−1} ψ(ζ) in F, the second one is due to the convexity and differentiability of ψ(ζ) in int(F), and the last one is from (3.26) and the convexity of f. Using Proposition 1.3(b) and (3.11), we then have

µk (1 − εl) [f(ζ) − f(ˆζ) + w^T(ˆζ − ζ)]

≤ tr[(φ′)soc(Aζl + b) ∘ (Aˆζ + b)] − tr[(φ′)soc(Aζl + b) ∘ (Aζl + b)]
≤ ∑_{i=1}^{2} [φ′(λi(Aζl + b)) λi(Aˆζ + b) − φ′(λi(Aζl + b)) λi(Aζl + b)]
= ∑_{i=1}^{2} φ′(λi(Aζl + b)) [λi(Aˆζ + b) − λi(Aζl + b)].

Since ζ ∈ bd(F), i.e., Aζ + b ∈ bd(Kn), it follows that lim_{l→+∞} λ1(Aζl + b) = 0. Thus, using (T4) and following the same line as the proof of Lemma 3.1(e), we can prove that the right-hand side of the last inequality goes to −∞ as l tends to ∞, whereas the left-hand side has a finite limit. This gives a contradiction. Hence, the equation (3.25) follows, which means that ζk ∈ int(F).

Finally, let us prove that ∂δ(ζk|F) = {0}. From [131, page 226], it follows that

∂δ(z|Kn) = { υ ∈ IRn | υ ⪯_{Kn} 0, tr(υ ∘ z) = 0 }.

Using [131, Theorem 23.9] and the assumption dom(f) ∩ int(F) ≠ ∅, we have

∂δ(ζ|F) = { A^T υ | υ ∈ IRn, υ ⪯_{Kn} 0, tr(υ ∘ (Aζ + b)) = 0 }.

In addition, from the self-dual property of the symmetric cone Kn, we know that tr(υ ∘ z) = 0 for any υ ⪯_{Kn} 0 and z ≻_{Kn} 0 implies υ = 0. Thus, since Aζk + b ≻_{Kn} 0, we obtain ∂δ(ζk|F) = {0}. This together with (3.24) and [131, Theorem 23.8] yields the desired result. □

Proposition 3.5 implies that the second-order cone constrained subproblem in (3.22) is actually equivalent to the unconstrained problem

ζk = argmin_{ζ∈IRm} { f(ζ) + (1/µk) D(ζ, ζk−1) },

which is obviously simpler than the original CSOCP. This shows that the proposed proximal-like algorithm transforms the CSOCP into the solution of a sequence of simpler problems. We next present some properties satisfied by {ζk}. For convenience, we denote the optimal set of the CSOCP by X := { ζ ∈ F | f(ζ) = f∗ }.

Proposition 3.6. Let {ζk} be the sequence generated by the algorithm PLA given by (3.22), and let σN = ∑_{k=1}^{N} µk. Then, the following hold.

(a) {f (ζk)} is a nonincreasing sequence.

(b) µk(f (ζk) − f (ζ)) ≤ D(ζ, ζk−1) − D(ζ, ζk) for all ζ ∈ F .

(c) σN(f (ζN) − f (ζ)) ≤ D(ζ, ζ0) − D(ζ, ζN) for all ζ ∈ F .

(d) D(ζ, ζk) is nonincreasing for any ζ ∈ X if the optimal set X ≠ ∅.

(e) D(ζk, ζk−1) → 0 if the optimal set X ≠ ∅.

Proof. (a) By the definition of ζk given as in (3.22), we have

f(ζk) + µk^{−1} D(ζk, ζk−1) ≤ f(ζk−1) + µk^{−1} D(ζk−1, ζk−1).

Since D(ζk, ζk−1) ≥ 0 and D(ζk−1, ζk−1) = 0 by Proposition 3.4(c), it follows that f(ζk) ≤ f(ζk−1) for all k ≥ 1.

(b) By Proposition 3.5, 2µk^{−1} A^T [(φ′)soc(Aζk−1 + b) − (φ′)soc(Aζk + b)] ∈ ∂f(ζk). Hence, from the definition of subdifferential, it follows that for any ζ ∈ F,

f(ζ) ≥ f(ζk) + 2µk^{−1} ⟨(φ′)soc(Aζk−1 + b) − (φ′)soc(Aζk + b), Aζ − Aζk⟩
= f(ζk) + µk^{−1} tr[((φ′)soc(Aζk−1 + b) − (φ′)soc(Aζk + b)) ∘ ((Aζ + b) − (Aζk + b))]
= f(ζk) + µk^{−1} [H(Aζ + b, Aζk + b) + H(Aζk + b, Aζk−1 + b) − H(Aζ + b, Aζk−1 + b)]
= f(ζk) + µk^{−1} [D(ζ, ζk) + D(ζk, ζk−1) − D(ζ, ζk−1)],   (3.27)

where the first equality is due to the definition of trace and the second follows from Proposition 3.2. From this inequality and the nonnegativity of D(ζk, ζk−1), we readily obtain the conclusion.

(c) From the result in part(b), we have

µk [f(ζk−1) − f(ζk)] ≥ D(ζk−1, ζk) − D(ζk−1, ζk−1) = D(ζk−1, ζk).

Multiplying this inequality by σk−1/µk and noting that σk = σk−1 + µk, one has

σk−1 f(ζk−1) − (σk − µk) f(ζk) ≥ σk−1 µk^{−1} D(ζk−1, ζk).   (3.28)

Summing up the inequalities in (3.28) for k = 1, 2, . . . , N and using σ0 = 0 yields

−σN f(ζN) + ∑_{k=1}^{N} µk f(ζk) ≥ ∑_{k=1}^{N} σk−1 µk^{−1} D(ζk−1, ζk).   (3.29)

On the other hand, summing the inequality in part(b) over k = 1, 2, . . . , N, we get

−σN f(ζ) + ∑_{k=1}^{N} µk f(ζk) ≤ D(ζ, ζ0) − D(ζ, ζN).   (3.30)

Now subtracting (3.29) from (3.30) yields

σN [f(ζN) − f(ζ)] ≤ D(ζ, ζ0) − D(ζ, ζN) − ∑_{k=1}^{N} σk−1 µk^{−1} D(ζk−1, ζk).

This together with the nonnegativity of D(ζk−1, ζk) implies the conclusion.

(d) Note that f (ζk) − f (ζ) ≥ 0 for all ζ ∈ X . Thus, the result follows from part(b) directly.

(e) From part(d), we know that D(ζ, ζk) is nonincreasing for any ζ ∈ X. This together with D(ζ, ζk) ≥ 0 for any k implies that {D(ζ, ζk)} is convergent. Thus, we have

D(ζ, ζk−1) − D(ζ, ζk) → 0 as k → ∞.   (3.31)

On the other hand, from (3.27) it follows that

0 ≤ µk [f(ζk) − f(ζ)] ≤ D(ζ, ζk−1) − D(ζ, ζk) − D(ζk, ζk−1), ∀ζ ∈ X,

which together with the nonnegativity of D(ζk, ζk−1) implies

D(ζk, ζk−1) ≤ D(ζ, ζk−1) − D(ζ, ζk), ∀ζ ∈ X.

Combining this with (3.31) yields the desired result. □

We have proved that the proximal-like algorithm (PLA) defined as in (3.22) is well-defined and satisfies some favorable properties. Building on these, we next establish its convergence.

Proposition 3.7. Let {ζk} be the sequence generated by the algorithm PLA given by (3.22), and let σN = ∑_{k=1}^{N} µk. Then, under assumptions (A1)-(A2), the following hold.

(a) If σN → ∞, then lim_{N→+∞} f(ζN) = f∗.

(b) If σN → ∞ and the optimal set X ≠ ∅, then the sequence {ζk} is bounded and every accumulation point is a solution of the CSOCP.

Proof. (a) By the definition of f∗, for any ε > 0 there exists a ˆζ ∈ F such that f(ˆζ) < f∗ + ε. Moreover, from Proposition 3.6(c) and the nonnegativity of D(ζ, ζN), we have that

f(ζN) − f(ζ) ≤ σN^{−1} D(ζ, ζ0), ∀ζ ∈ F.

Letting ζ = ˆζ in the above inequality and taking the limit with σN → +∞, we obtain

lim_{N→+∞} f(ζN) ≤ f∗ + ε.

Since ε is arbitrary and f(ζN) ≥ f∗, we thus have the desired result.

(b) Suppose that ζ ∈ X. Then, from Proposition 3.6(d), D(ζ, ζk) ≤ D(ζ, ζ0) for any k. This implies that {ζk} ⊆ LD(ζ, D(ζ, ζ0)), and hence, by Proposition 3.4(d), the sequence {ζk} is bounded. Let ¯ζ ∈ F be an accumulation point of {ζk} with subsequence {ζkj} → ¯ζ. Then, from part(a), it follows that f(ζkj) → f∗. On the other hand, since f is lower semicontinuous, we have f(¯ζ) ≤ lim inf_{j→+∞} f(ζkj) = f∗. The two facts show that f(¯ζ) ≤ f∗. Consequently, ¯ζ is a solution of the CSOCP. □
