
Interior proximal-like algorithms for SOCP


Considering that ε is arbitrary and f(ζ^N) ≥ f∗, we thus have the desired result.

(b) Suppose that ζ ∈ X. Then, from Proposition 3.6(d), D(ζ, ζ^k) ≤ D(ζ, ζ^0) for any k. This implies that {ζ^k} ⊆ LD(ζ, D(ζ, ζ^0)). By Proposition 3.6(d), the sequence {ζ^k} is then bounded. Let ζ̄ ∈ F be an accumulation point of {ζ^k} with subsequence {ζ^{k_j}} → ζ̄. Then, from part (a), it follows that f(ζ^{k_j}) → f∗. On the other hand, since f is lower semicontinuous, we have f(ζ̄) ≤ lim inf_{k_j→+∞} f(ζ^{k_j}). The two sides show that f(ζ̄) ≤ f∗. Consequently, ζ̄ is a solution of the CSOCP. □

and obtained the convergence theorems under weaker assumptions. Clearly, when ϕ(t) = − ln t + t − 1 (t > 0), we have that dϕ(ζ, ξ) = d(ξ, ζ), and consequently the algorithm reduces to Eggermont's (3.32).

Observing that the proximal-like algorithm (3.33) associated with ϕ(t) = − ln t + t − 1 inherits the features of the interior point method as well as the proximal point method, Auslender [8] extended the algorithm to general linearly constrained convex minimization problems and variational inequalities on polyhedra. Then, is it possible to extend the algorithm to nonpolyhedral symmetric conic optimization problems and establish the corresponding convergence results? In this section, we explore its extension to the setting of second-order cones and establish a class of interior proximal-like algorithms for the CSOCP. We should mention that the algorithm (3.33) with the entropy function t ln t − t + 1 (t ≥ 0) was recently extended to convex semidefinite programming [57].

Again, as defined in (3.17) and (3.18), we denote by F the constraint set of the CSOCP, i.e.,

F := { ζ ∈ IRm | Aζ + b ⪰Kn 0 }, and denote its interior by int(F), i.e.,

int(F) := { ζ ∈ IRm | Aζ + b ≻Kn 0 }.

Accordingly, the second proximal-like algorithm that we propose for the CSOCP is defined as follows:

ζ^0 ∈ int(F),
ζ^k = argmin_{ζ ∈ int(F)} { f(ζ) + μk^{-1} D(Aζ + b, Aζ^{k−1} + b) },    (3.34)

where D : IRn × IRn → (−∞, +∞] is a closed proper convex function generated by a class of twice continuously differentiable and strictly convex functions on (0, +∞); its specific expression is given later. The class of distance measures includes as a special case the natural extension of dϕ(x, y) with ϕ(t) = − ln t + t − 1 to the second-order cones.

For the proximal-like algorithm (3.34), we particularly consider an approximate version which allows inexact minimization of the subproblem (3.34), and we establish its global convergence results under some mild assumptions.

Next, we present the definition of the distance-like function D(x, y) involved in the proximal-like algorithm (3.34) and some specific examples. Let φ : IR → (−∞, ∞] be a closed proper convex function with domφ = [0, ∞) and assume that

(C1) φ is strictly convex on its domain.

(C2) φ is twice continuously differentiable on int(domφ) with lim_{t→0+} φ″(t) = ∞.

(C3) φ′(t)t − φ(t) is convex on int(domφ).

(C4) φ′ is SOC-concave on int(domφ).

In the sequel, we denote by Φ the class of functions satisfying conditions (C1)-(C4).

Given a φ ∈ Φ, let φ^soc and (φ′)^soc be the vector-valued functions given as in (3.6)-(3.7). We define D(x, y) involved in the proximal-like algorithm (3.34) by

D(x, y) := tr[ φ^soc(y) − φ^soc(x) − (φ′)^soc(x) ◦ (y − x) ] for x ∈ int(Kn) and y ∈ Kn, and D(x, y) := ∞ otherwise.    (3.35)

The function, as will be shown later, possesses some favorable properties. In particular, D(x, y) ≥ 0 for any x, y ∈ int(Kn), and D(x, y) = 0 if and only if x = y. Hence, D(x, y) can be used to measure the distance between any two points in int(Kn).
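To make the construction concrete, here is a minimal NumPy sketch of D(x, y) built from the spectral decomposition; the helper names spec, soc_fun, jordan and D are ours, and the routine assumes x ∈ int(Kn) and y ∈ Kn so that φ and φ′ are only evaluated where they are defined:

```python
import numpy as np

def spec(x):
    """Spectral decomposition of x w.r.t. K^n: values lam and vectors u1, u2."""
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2 if nx2 > 0 else np.zeros_like(x2)
    lam = np.array([x1 - nx2, x1 + nx2])            # lambda_1(x) <= lambda_2(x)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return lam, (u1, u2)

def soc_fun(g, x):
    """g^soc(x) = g(lambda_1) u^(1) + g(lambda_2) u^(2), as in (3.6)-(3.7)."""
    lam, (u1, u2) = spec(x)
    return g(lam[0]) * u1 + g(lam[1]) * u2

def jordan(x, y):
    """Jordan product x o y = (<x, y>, x1*y2 + y1*x2)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def D(phi, dphi, x, y):
    """Distance-like function (3.35); note tr(z) = 2*z_1 for z in IR^n."""
    val = soc_fun(phi, y) - soc_fun(phi, x) - jordan(soc_fun(dphi, x), y - x)
    return 2.0 * val[0]
```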

In the following, we concentrate on the examples of the distance-like function D(x, y).

For this purpose, we first give another characterization for condition (C3).

Lemma 3.2. Let φ : IR → (−∞, ∞] be a closed proper convex function with domφ = [0, +∞). If φ is thrice continuously differentiable on int(domφ), then φ satisfies condition (C3) if and only if its derivative φ′ is exponentially convex (which means the function φ′(exp(·)) : IR → IR is convex on IR), or equivalently,

φ′(t1 t2) ≤ (1/2)[ φ′(t1^2) + φ′(t2^2) ],  ∀t1, t2 > 0.    (3.36)

Proof. Since the function φ is thrice continuously differentiable on int(domφ), φ satisfies condition (C3) if and only if

φ″(t) + tφ‴(t) ≥ 0,  ∀t > 0.

Observe that this inequality is also equivalent to

tφ″(t) + t^2 φ‴(t) ≥ 0,  ∀t > 0,

and hence substituting t = exp(θ) for θ ∈ IR into the inequality yields that

exp(θ)φ″(exp(θ)) + exp(2θ)φ‴(exp(θ)) ≥ 0,  ∀θ ∈ IR.

Since the left-hand side of this inequality is exactly [φ′(exp(θ))]″, this means that φ′(exp(·)) is convex on IR. Consequently, the first part of the conclusions follows.

Note that the convexity of φ′(exp(·)) on IR is equivalent to saying that for any θ1, θ2 ∈ IR,

φ′(exp(rθ1 + (1 − r)θ2)) ≤ rφ′(exp(θ1)) + (1 − r)φ′(exp(θ2)),  ∀r ∈ [0, 1],

which, by letting t1 = exp(θ1) and t2 = exp(θ2), can be rewritten as

φ′(t1^r t2^{1−r}) ≤ rφ′(t1) + (1 − r)φ′(t2),  ∀t1, t2 > 0 and r ∈ [0, 1].

This is clearly equivalent to the statement in (3.36) due to the continuity of φ′. □

Remark 3.3. The exponential convexity was also used in the definition of the self-regular function [123], in which the authors denote by Ω the set of functions whose elements are twice continuously differentiable and exponentially convex on (0, +∞). By Lemma 3.2, clearly, if h ∈ Ω, then the function ∫_0^t h(θ)dθ necessarily satisfies condition (C3). For example, ln t belongs to Ω, and hence ∫_0^t ln θ dθ = t ln t − t satisfies condition (C3).
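Because Lemma 3.2 reduces condition (C3) to the scalar inequality (3.36), it can be spot-checked numerically. Below is a small sampling test (check_336 is our name, and the ranges and tolerance are arbitrary choices) applied to φ1′ = ln from Example 3.4 below, where (3.36) holds with equality, and to φ4′ from Example 3.7 below with a = 0.7:

```python
import numpy as np

def check_336(dphi, trials=10_000, seed=1):
    """Sample test of (3.36): dphi(t1*t2) <= (dphi(t1^2) + dphi(t2^2)) / 2."""
    rng = np.random.default_rng(seed)
    t1, t2 = rng.uniform(0.01, 10.0, (2, trials))
    return bool(np.all(dphi(t1 * t2) <= 0.5 * (dphi(t1**2) + dphi(t2**2)) + 1e-12))

print(check_336(np.log))                                    # True; equality case
a = 0.7
print(check_336(lambda t: (a + 1) * t**a + a * np.log(t)))  # True
```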

Now we present several examples showing how to construct D(x, y). From these examples, we see that the conditions required by φ ∈ Φ are not so strict, and the construction of distance-like functions on SOCs can be completed by selecting a class of univariate convex functions.

Example 3.4. Let φ1 : IR → (−∞, ∞] be given by

φ1(t) = t ln t − t + 1 if t ≥ 0,  ∞ if t < 0.

Solution. It is easy to verify that φ1 satisfies conditions (C1)-(C3). In addition, by Examples 2.10 and 2.12, the function ln t is SOC-concave and SOC-monotone on (0, ∞); hence condition (C4) also holds. From formulas (3.6)-(3.7), it follows that for any y ∈ Kn and x ∈ int(Kn),

φ1^soc(y) = y ◦ ln y − y + e and (φ1′)^soc(x) = ln x.

Consequently, the distance-like function induced by φ1 is given by

D1(x, y) = tr(y ◦ ln y − y ◦ ln x + x − y),  ∀x ∈ int(Kn), y ∈ Kn.

This function is precisely the natural extension of the entropy-like distance dϕ(·, ·) with ϕ(t) = − ln t + t − 1 to the second-order cones. In addition, comparing D1(x, y) with the distance-like function H(x, y) in Example 3.1 of [115] (see Section 3.1), we note that D1(x, y) = H(y, x), but the proximal-like algorithms corresponding to them are completely different. □
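As a quick sanity check, the generic definition (3.35) with φ1 can be compared against the closed form of D1; this reuses the spec/soc_fun/jordan/D helpers from the earlier sketch, and the sample points are arbitrary interior points of K^3:

```python
phi1, dphi1 = (lambda t: t * np.log(t) - t + 1.0), np.log

x = np.array([2.0, 0.5, -0.3])      # in int(K^3): 2 > ||(0.5, -0.3)||
y = np.array([1.5, 0.2, 0.4])       # in int(K^3) as well

d_generic = D(phi1, dphi1, x, y)    # definition (3.35)
soc_log = lambda v: soc_fun(np.log, v)
d_closed = 2.0 * (jordan(y, soc_log(y)) - jordan(y, soc_log(x)) + x - y)[0]
print(d_generic, d_closed)          # the two values agree
```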

Example 3.5. Let φ2 : IR → (−∞, ∞] be given by

φ2(t) = t ln t + (1 + t) ln(1 + t) − (1 + t) ln 2 if t ≥ 0,  ∞ if t < 0.

Solution. By computing, we can verify that φ2 satisfies conditions (C1)-(C3). Furthermore, from earlier examples, we learn that φ2 also satisfies condition (C4). This means that φ2 ∈ Φ. For any y ∈ Kn and x ∈ int(Kn), we can compute that

φ2^soc(y) = y ◦ ln y + (e + y) ◦ ln(e + y) − ln 2 (e + y),
(φ2′)^soc(x) = (2 − ln 2)e + ln x + ln(e + x).

Therefore, the distance-like function generated by such a φ is given by

D2(x, y) = tr[ −ln(e + x) ◦ (e + y) + y ◦ (ln y − ln x) + (e + y) ◦ ln(e + y) − 2(y − x) ]

for any x ∈ int(Kn) and y ∈ Kn. It should be pointed out that D2(x, y) is not the extension of dϕ(·, ·) with ϕ(t) = φ2(t) given by [80] to the second-order cones. □

Example 3.6. For any 0 ≤ r < 1/2, let φ3 : IR → (−∞, ∞] be given by

φ3(t) = t^{(2r+3)/2} + t^2 if t ≥ 0,  ∞ if t < 0.

Solution. It is easy to verify that φ3 satisfies conditions (C1)-(C3). Furthermore, from Examples 2.12-2.14, it follows that φ3 satisfies condition (C4). In other words, φ3 ∈ Φ.

By a simple computation, we know

φ3^soc(y) = y^{(2r+3)/2} + y^2, ∀y ∈ Kn,
(φ3′)^soc(x) = ((2r+3)/2) x^{(2r+1)/2} + 2x, ∀x ∈ int(Kn).

Hence, the distance-like function induced by φ3 has the following expression:

D3(x, y) = tr[ ((2r+1)/2) x^{(2r+3)/2} + x^2 − y ◦ ( ((2r+3)/2) x^{(2r+1)/2} + 2x ) + y^{(2r+3)/2} + y^2 ]. □

Example 3.7. For any 0 < a ≤ 1, let φ4 : IR → (−∞, ∞] be given by

φ4(t) = t^{a+1} + a t ln t − a t if t ≥ 0,  ∞ if t < 0.

Solution. It is easily shown that φ4 satisfies conditions (C1)-(C3). By Examples 2.11-2.14, φ4′ is SOC-concave on (0, ∞). Hence, φ4 ∈ Φ. For any y ∈ Kn and x ∈ int(Kn),

φ4^soc(y) = y^{a+1} + a y ◦ ln y − a y and (φ4′)^soc(x) = (a + 1)x^a + a ln x.

Consequently, the distance-like function induced by φ4 has the following expression:

D4(x, y) = tr[ a x^{a+1} + a x − y ◦ ( (a + 1)x^a + a ln x ) + y^{a+1} + a y ◦ ln y − a y ]. □

In what follows, we study some favorable properties of the function D(x, y). We begin with some technical lemmas that will be used in the subsequent analysis.

Lemma 3.3. Suppose that φ : IR → (−∞, ∞] belongs to the class Φ, i.e., satisfies (C1)-(C4). Let φ^soc and (φ′)^soc be the corresponding SOC functions of φ and φ′ given as in (1.9). Then, the following hold.

(a) φ^soc(x) and (φ′)^soc(x) are well-defined on Kn and int(Kn), respectively, and

λi[φ^soc(x)] = φ[λi(x)],  λi[(φ′)^soc(x)] = φ′[λi(x)],  i = 1, 2.

(b) φ^soc(x) and (φ′)^soc(x) are continuously differentiable on int(Kn) with the transposed Jacobian at x given as in formulas (1.28)-(1.29).

(c) tr[φ^soc(x)] and tr[(φ′)^soc(x)] are continuously differentiable on int(Kn), and

∇tr[φ^soc(x)] = 2∇φ^soc(x)e = 2(φ′)^soc(x),
∇tr[(φ′)^soc(x)] = 2∇(φ′)^soc(x)e = 2(φ″)^soc(x).

(d) The function tr[φ^soc(x)] is strictly convex on int(Kn).

Proof. Mimicking the arguments as in Lemma 3.1, in other words, using Propositions 1.13-1.14 and the definition of Φ, the desired results follow. □

Lemma 3.4. Suppose that φ : IR → (−∞, ∞] belongs to the class Φ and z ∈ IRn. Let φz : int(Kn) → IR be defined by

φz(x) := tr[ −z ◦ (φ′)^soc(x) ].    (3.37)

Then, the function φz(x) possesses the following properties.

(a) φz(x) is continuously differentiable on int(Kn) with ∇φz(x) = −2∇(φ′)^soc(x) · z.

(b) φz(x) is convex over int(Kn) when z ∈ Kn; furthermore, it is strictly convex over int(Kn) when z ∈ int(Kn).

Proof. (a) Since φz(x) = −2⟨(φ′)^soc(x), z⟩ for any x ∈ int(Kn), we have that φz(x) is continuously differentiable on int(Kn) by Lemma 3.3(b). Moreover, applying the chain rule for the inner product of two functions readily yields ∇φz(x) = −2∇(φ′)^soc(x) · z.

(b) By the continuous differentiability of φz(x), to prove the convexity of φz on int(Kn), it suffices to prove the following inequality:

φz((x + y)/2) ≤ (1/2)[ φz(x) + φz(y) ],  ∀x, y ∈ int(Kn).    (3.38)

By condition (C4), φ′ is SOC-concave on (0, +∞). Therefore, we have

−(φ′)^soc((x + y)/2) ⪯Kn −(1/2)[ (φ′)^soc(x) + (φ′)^soc(y) ],

i.e.,

(φ′)^soc((x + y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) ⪰Kn 0.

Using Property 1.3(d) and the fact that z ∈ Kn, we then obtain that

⟨ z, (φ′)^soc((x + y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) ⟩ ≥ 0,    (3.39)

which in turn implies that

⟨ −z, (φ′)^soc((x + y)/2) ⟩ ≤ (1/2)⟨ −z, (φ′)^soc(x) ⟩ + (1/2)⟨ −z, (φ′)^soc(y) ⟩.

The last inequality is exactly the one in (3.38). Hence, φz is convex on int(Kn) for z ∈ Kn. To prove the second part of the conclusions, we only need to prove that the inequality in (3.39) holds strictly for any x, y ∈ int(Kn) with x ≠ y. By Property 1.3(d), this is also equivalent to proving that the vector (φ′)^soc((x+y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) is nonzero, since

(φ′)^soc((x + y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) ∈ Kn and z ∈ int(Kn).

From condition (C4), it follows that φ′ is concave on (0, ∞), since SOC-concavity implies concavity. This together with the strict monotonicity of φ′ implies that φ′ is strictly concave on (0, ∞). Using Lemma 3.3(d), we then have that tr[(φ′)^soc(x)] is strictly concave on int(Kn). This means that for any x, y ∈ int(Kn) with x ≠ y,

tr[ (φ′)^soc((x + y)/2) ] − (1/2)tr[ (φ′)^soc(x) ] − (1/2)tr[ (φ′)^soc(y) ] > 0.    (3.40)

In addition, we note that the first element of (φ′)^soc((x+y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) is

[ φ′(λ1((x+y)/2)) + φ′(λ2((x+y)/2)) ]/2 − [ φ′(λ1(x)) + φ′(λ2(x)) ]/4 − [ φ′(λ1(y)) + φ′(λ2(y)) ]/4,

which, by (1.6), can be rewritten as

(1/2)tr[ (φ′)^soc((x + y)/2) ] − (1/4)tr[ (φ′)^soc(x) ] − (1/4)tr[ (φ′)^soc(y) ].

This together with (3.40) shows that (φ′)^soc((x+y)/2) − (1/2)(φ′)^soc(x) − (1/2)(φ′)^soc(y) is nonzero for any x, y ∈ int(Kn) with x ≠ y. Consequently, φz is strictly convex on int(Kn). □

Lemma 3.5. Let F be the set defined as in (3.17). Then, its recession cone 0+F is described by

0+F = { d ∈ IRm | Ad ⪰Kn 0 }.    (3.41)

Proof. Assume that d ∈ IRm is such that Ad ⪰Kn 0. Then, for any λ > 0, λAd ⪰Kn 0. Considering that Kn is closed under the "+" operation, we have for any ζ ∈ F,

A(ζ + λd) + b = (Aζ + b) + λ(Ad) ⪰Kn 0.    (3.42)

By [131, page 61], this shows that every element of the set on the right-hand side of (3.41) is a recession direction of F. Consequently, { d ∈ IRm | Ad ⪰Kn 0 } ⊆ 0+F.

Now take any d ∈ 0+F and ζ ∈ F. Then, for any λ > 0, equation (3.42) holds. By Property 1.1(c), we then have λ1[(Aζ + b) + λAd] ≥ 0 for any λ > 0. This implies that λ1(Ad) ≥ 0, since otherwise letting λ → +∞ and using the fact that

λ1[(Aζ + b) + λAd] = (Aζ + b)1 + λ(Ad)1 − ‖(Aζ + b)2 + λ(Ad)2‖
≤ (Aζ + b)1 + λ(Ad)1 − ( λ‖(Ad)2‖ − ‖(Aζ + b)2‖ )
= λ λ1(Ad) + λ2(Aζ + b),

we obtain that λ1[(Aζ + b) + λAd] → −∞. Thus, we prove that Ad ⪰Kn 0, and consequently 0+F ⊆ { d ∈ IRm | Ad ⪰Kn 0 }. Combining with the above discussions then yields the result. □

Lemma 3.6. Let {ank} be a sequence of real numbers satisfying

(i) ank ≥ 0 for all n = 1, 2, · · · and k = 1, 2, · · · ;
(ii) Σ_{k=1}^n ank = 1 for all n = 1, 2, · · · ;
(iii) lim_{n→∞} ank = 0 for all k = 1, 2, · · · .

If {uk} is a sequence such that lim_{k→+∞} uk = u, then lim_{n→+∞} Σ_{k=1}^n ank uk = u.

Proof. Please see [91, Theorem 2]. □

Lemma 3.7. Let {υk} and {βk} be nonnegative sequences of real numbers satisfying (i) υ_{k+1} ≤ υk + βk, (ii) Σ_{k=1}^∞ βk < +∞. Then, the sequence {υk} is convergent.

Proof. Please see [125, Chapter 2] for a proof. □

Now we are in a position to study the properties of the distance-like function D(x, y).

Proposition 3.8. Given a function φ ∈ Φ, let D(x, y) be defined as in (3.35). Then, the following hold.

(a) D(x, y) ≥ 0 for any x ∈ int(Kn) and y ∈ Kn, and D(x, y) = 0 if and only if x = y.

(b) For any fixed y ∈ Kn, D(·, y) is continuously differentiable on int(Kn) with

∇x D(x, y) = 2∇(φ′)^soc(x) · (x − y).    (3.43)

(c) For any fixed y ∈ Kn, the function D(·, y) is convex over int(Kn), and for any fixed y ∈ int(Kn), D(·, y) is strictly convex over int(Kn).

(d) For any fixed y ∈ int(Kn), the function D(·, y) is essentially smooth.

(e) For any fixed y ∈ Kn, the level sets LD(y, γ) := {x ∈ int(Kn) | D(x, y) ≤ γ} for all γ ≥ 0 are bounded.

Proof. (a) By Lemma 3.3(c), for any x ∈ int(Kn) and y ∈ Kn, we can rewrite D(x, y) as

D(x, y) = tr[φ^soc(y)] − tr[φ^soc(x)] − ⟨∇tr[φ^soc(x)], y − x⟩.

Notice that tr[φ^soc(x)] is strictly convex on int(Kn) by Lemma 3.3(d), and hence D(x, y) ≥ 0 for any x ∈ int(Kn) and y ∈ Kn, with D(x, y) = 0 if and only if x = y.

(b) By Lemma 3.3(b) and (c), the functions tr[φ^soc(x)] and ⟨(φ′)^soc(x), x⟩ are continuously differentiable on int(Kn). Noting that, for any x ∈ int(Kn) and y ∈ Kn,

D(x, y) = tr[φ^soc(y)] − tr[φ^soc(x)] − 2⟨(φ′)^soc(x), y − x⟩,

we then have the continuous differentiability of D(·, y) on int(Kn). Furthermore,

∇x D(x, y) = −∇tr[φ^soc(x)] − 2∇(φ′)^soc(x) · (y − x) + 2(φ′)^soc(x)
= −2(φ′)^soc(x) + 2∇(φ′)^soc(x) · (x − y) + 2(φ′)^soc(x)
= 2∇(φ′)^soc(x) · (x − y).
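The gradient formula just derived calls on the Jacobian ∇(φ′)^soc(x) from (1.28)-(1.29), which is not reproduced in this section. The sketch below (dsoc_jac is our name) implements what we assume to be that Jacobian, written in terms of the spectral values λ1(z) ≤ λ2(z); it is consistent with the first-row formula used in the proof of part (d) below:

```python
import numpy as np

def dsoc_jac(dphi, ddphi, z):
    """Jacobian of (phi')^soc at z in int(K^n); assumed form of (1.28)-(1.29)."""
    z1, z2 = z[0], z[1:]
    nz2 = np.linalg.norm(z2)
    if nz2 == 0.0:                        # (1.28): reduces to ddphi(z1) * I
        return ddphi(z1) * np.eye(len(z))
    lam1, lam2 = z1 - nz2, z1 + nz2
    w = z2 / nz2
    a = (dphi(lam2) - dphi(lam1)) / (lam2 - lam1)   # divided difference
    b = 0.5 * (ddphi(lam2) + ddphi(lam1))
    c = 0.5 * (ddphi(lam2) - ddphi(lam1))
    J = np.zeros((len(z), len(z)))
    J[0, 0] = b
    J[0, 1:] = c * w
    J[1:, 0] = c * w
    J[1:, 1:] = a * np.eye(len(z) - 1) + (b - a) * np.outer(w, w)
    return J
```

With this helper, (3.43) reads grad = 2.0 * dsoc_jac(dphi, ddphi, x) @ (x - y).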

(c) By the definition of φz given as in (3.37), D(x, y) can be rewritten as

D(x, y) = tr[(φ′)^soc(x) ◦ x − φ^soc(x)] + φy(x) + tr[φ^soc(y)].

Thus, to prove the (strict) convexity of D(·, y) on int(Kn), it suffices to show that

tr[(φ′)^soc(x) ◦ x − φ^soc(x)] + φy(x)

is (strictly) convex on int(Kn). From condition (C3) and Lemma 3.3(d), it follows that tr[(φ′)^soc(x) ◦ x − φ^soc(x)] is convex over int(Kn). In addition, by Lemma 3.4(b), φy(x) is convex on int(Kn) if y ∈ Kn, and it is strictly convex if y ∈ int(Kn). Thus, we get the desired results.

(d) From [131, page 251] and parts (a)-(b), to prove that D(·, y) is essentially smooth for any fixed y ∈ int(Kn), it suffices to show that ‖∇x D(x^k, y)‖ → ∞ for any {x^k} ⊆ int(Kn) with x^k → x ∈ bd(Kn). We next prove the conclusion in two cases: x1 > 0 and x1 = 0. For the sake of notation, let x^k = (x^k_1, x^k_2) ∈ IR × IR^{n−1}.

Case 1: x1 > 0. In this case, ‖x2‖ = x1 > 0 since x ∈ bd(Kn). Noting that x^k → x, we have x^k_2 ≠ 0 for all sufficiently large k. From the gradient formula (3.43),

‖∇x D(x^k, y)‖ = ‖2∇(φ′)^soc(x^k) · (x^k − y)‖ ≥ | 2[∇(φ′)^soc(x^k) · (x^k − y)]1 |,    (3.44)

where [∇(φ′)^soc(x^k) · (x^k − y)]1 denotes the first element of the vector ∇(φ′)^soc(x^k) · (x^k − y). By the gradient formula (1.29), we can compute that

2[∇(φ′)^soc(x^k) · (x^k − y)]1
= [φ″(λ2(x^k)) + φ″(λ1(x^k))](x^k_1 − y1) + [φ″(λ2(x^k)) − φ″(λ1(x^k))](x^k_2 − y2)^T x^k_2/‖x^k_2‖
= φ″(λ2(x^k)) ( λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ ) − φ″(λ1(x^k)) ( y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) ).    (3.45)

Therefore,

| 2[∇(φ′)^soc(x^k) · (x^k − y)]1 |
≥ φ″(λ1(x^k)) · | y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) | − φ″(λ2(x^k)) · | λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ |
≥ φ″(λ1(x^k)) · ( λ1(y) − λ1(x^k) ) − φ″(λ2(x^k)) · | λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ |,

where the last step uses the Cauchy-Schwarz inequality y1 − y2^T x^k_2/‖x^k_2‖ ≥ y1 − ‖y2‖ = λ1(y). Noting that λ1(x^k) → λ1(x) = 0, λ2(x^k) → λ2(x) > 0 and y2^T x^k_2/‖x^k_2‖ → y2^T x2/‖x2‖ as k → ∞, the second term on the right-hand side of the last inequality converges to a finite value, whereas the first term approaches ∞, since φ″(λ1(x^k)) → ∞ by condition (C2) and λ1(y) − λ1(x^k) → λ1(y) > 0. This implies that, as k → +∞,

| 2[∇(φ′)^soc(x^k) · (x^k − y)]1 | → ∞.

Combining with the inequality (3.44) immediately yields ‖∇x D(x^k, y)‖ → ∞.

Case 2: x1 = 0. In this case, we necessarily have x = 0 since x ∈ Kn. Considering that x^k → x, we distinguish the cases where x^k_2 = 0 for all sufficiently large k and where x^k_2 ≠ 0 for all sufficiently large k. If x^k_2 = 0 for all sufficiently large k, then from (1.28) we have that

‖∇x D(x^k, y)‖ = ‖2φ″(x^k_1)(x^k − y)‖ ≥ 2|φ″(x^k_1)| · |x^k_1 − y1|.

Since y1 > 0 by y ∈ int(Kn) and x^k_1 → x1 = 0, applying condition (C2) yields that the right-hand side tends to ∞, and consequently ‖∇x D(x^k, y)‖ → +∞ as k → ∞.

Next, we consider the case where x^k_2 ≠ 0 for all sufficiently large k. In this case, the formulas (3.44)-(3.45) still hold. By the Cauchy-Schwarz inequality,

λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ ≤ λ2(x^k) − y1 + ‖y2‖ = λ2(x^k) − λ1(y),
y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) ≥ y1 − ‖y2‖ − λ1(x^k) = λ1(y) − λ1(x^k).

Since λ1(x^k), λ2(x^k) → 0 as k → +∞ and λ1(y), λ2(y) > 0 by y ∈ int(Kn), the last two inequalities imply that, for all sufficiently large k,

λ2(x^k) − y1 − y2^T x^k_2/‖x^k_2‖ < 0 and y1 − y2^T x^k_2/‖x^k_2‖ − λ1(x^k) > 0.

On the other hand, by condition (C2), as k → ∞,

φ″(λ2(x^k)) → ∞ and φ″(λ1(x^k)) → ∞.

The two sides show that the right-hand side of (3.45) approaches −∞ as k → ∞, and consequently |2[∇(φ′)^soc(x^k) · (x^k − y)]1| → ∞. Thus, from (3.44), it follows that ‖∇x D(x^k, y)‖ → ∞ as k → ∞.

(e) From the definition of D(x, y), it follows that for any x, y ∈ int(Kn),

D(x, y) = tr[φ^soc(y)] − tr[φ^soc(x)] − tr[(φ′)^soc(x) ◦ y] + tr[(φ′)^soc(x) ◦ x]
= Σ_{i=1}^2 φ(λi(y)) − Σ_{i=1}^2 φ(λi(x)) − tr[(φ′)^soc(x) ◦ y] + tr[(φ′)^soc(x) ◦ x],    (3.46)

where the second equality is from Lemma 3.3(a) and (1.6). Since

(φ′)^soc(x) ◦ x = [ φ′(λ1(x))u_x^{(1)} + φ′(λ2(x))u_x^{(2)} ] ◦ [ λ1(x)u_x^{(1)} + λ2(x)u_x^{(2)} ]
= φ′(λ1(x))λ1(x)u_x^{(1)} + φ′(λ2(x))λ2(x)u_x^{(2)},

we have from Lemma 3.3(a) that

tr[(φ′)^soc(x) ◦ x] = Σ_{i=1}^2 φ′(λi(x))λi(x).

In addition, by Property 1.3(b) and Lemma 3.3(a), we have

tr[(φ′)^soc(x) ◦ y] ≤ Σ_{i=1}^2 φ′(λi(x))λi(y).

Combining the last two relations with (3.46) yields that

D(x, y) ≥ Σ_{i=1}^2 [ φ(λi(y)) − φ(λi(x)) − φ′(λi(x))λi(y) + φ′(λi(x))λi(x) ]
= Σ_{i=1}^2 [ φ(λi(y)) − φ(λi(x)) − φ′(λi(x))(λi(y) − λi(x)) ]
= Σ_{i=1}^2 dB(λi(y), λi(x)),

where dB : IR+ × IR++ → IR is the function defined by dB(s, t) := φ(s) − φ(t) − φ′(t)(s − t). This implies that for any fixed y ∈ Kn and γ ≥ 0,

LD(y, γ) ⊆ { x ∈ int(Kn) | Σ_{i=1}^2 dB(λi(y), λi(x)) ≤ γ }.    (3.47)

Note that for any fixed s ≥ 0, the set {t > 0 | dB(s, t) ≤ 0} equals {s} or ∅, and hence is bounded. Thus, from [131, Corollary 8.7.1] and condition (C3), it follows that the level sets {t > 0 | dB(s, t) ≤ γ} for any fixed s ≥ 0 are bounded. This together with (3.47) implies that the level sets LD(y, γ) are bounded for all γ ≥ 0. □
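The key estimate in part (e), D(x, y) ≥ Σ_{i=1}^2 dB(λi(y), λi(x)), is easy to probe numerically. Here is a hypothetical spot-check with φ1 from Example 3.4, reusing the earlier helpers; rand_int_K3 is our name for a crude sampler of int(K^3):

```python
import numpy as np

def rand_int_K3(rng, margin=0.1):
    """Random point of int(K^3): first entry strictly dominates the tail norm."""
    v = rng.standard_normal(2)
    return np.concatenate(([np.linalg.norm(v) + margin + rng.uniform(0.0, 1.0)], v))

phi1, dphi1 = (lambda t: t * np.log(t) - t + 1.0), np.log
dB = lambda s, t: phi1(s) - phi1(t) - dphi1(t) * (s - t)

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rand_int_K3(rng), rand_int_K3(rng)
    lx, _ = spec(x)
    ly, _ = spec(y)
    assert D(phi1, dphi1, x, y) >= sum(dB(ly[i], lx[i]) for i in range(2)) - 1e-9
```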

Proposition 3.9. Given a function φ ∈ Φ, let D(x, y) be defined as in (3.35). Then, for all x, y ∈ int(Kn) and z ∈ Kn, we have the following inequality:

D(x, z) − D(y, z) ≥ 2⟨∇(φ′)^soc(y) · (z − y), y − x⟩ = 2⟨∇(φ′)^soc(y) · (y − x), z − y⟩.    (3.48)

Proof. Let ψ : (0, ∞) → IR be the function defined by

ψ(t) := φ′(t)t − φ(t).    (3.49)

Then, the vector-valued function induced by ψ via (3.6)-(3.7) is (φ′)^soc(x) ◦ x − φ^soc(x), i.e.,

ψ^soc(x) = (φ′)^soc(x) ◦ x − φ^soc(x).    (3.50)

From the definition of D(x, y) and φz(x) and equality (3.50), it follows that

D(x, z) − D(y, z) = tr[(φ′)^soc(x) ◦ x − φ^soc(x)] + φz(x) − tr[(φ′)^soc(y) ◦ y − φ^soc(y)] − φz(y)
= tr[ψ^soc(x)] − tr[ψ^soc(y)] + φz(x) − φz(y)
≥ ⟨∇tr[ψ^soc(y)], x − y⟩ + ⟨∇φz(y), x − y⟩
= ⟨2(ψ′)^soc(y), x − y⟩ − ⟨2∇(φ′)^soc(y) · z, x − y⟩,    (3.51)

where the inequality is due to the convexity of tr[ψ^soc(x)] and φz(x), and the last equality follows from Lemma 3.3(c) and Lemma 3.4(a). From the definition of ψ given as in (3.49), it is easy to compute that

⟨(ψ′)^soc(y), x − y⟩ = ⟨(φ″)^soc(y) ◦ y, x − y⟩.    (3.52)

In addition, by the gradient formulas in (1.28)-(1.29), we can compute that

∇(φ′)^soc(y) · y = (φ″)^soc(y) ◦ y,

which in turn implies that

⟨∇(φ′)^soc(y) · z, x − y⟩ = ⟨∇(φ′)^soc(y) · (y + z − y), x − y⟩
= ⟨∇(φ′)^soc(y) · y, x − y⟩ + ⟨∇(φ′)^soc(y) · (z − y), x − y⟩
= ⟨(φ″)^soc(y) ◦ y, x − y⟩ + ⟨∇(φ′)^soc(y) · (z − y), x − y⟩.

This, together with (3.52) and (3.51), yields the first inequality in (3.48), whereas the second equality follows from the symmetry of the matrix ∇(φ′)^soc(y). □
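Inequality (3.48) can be probed in the same spirit, combining D from (3.35) with the assumed Jacobian sketch dsoc_jac and the sampler rand_int_K3 from above; the tolerance is an arbitrary choice:

```python
ddphi1 = lambda t: 1.0 / t            # phi_1''(t), needed by dsoc_jac

rng = np.random.default_rng(3)
for _ in range(1000):
    x, y, z = rand_int_K3(rng), rand_int_K3(rng), rand_int_K3(rng)
    lhs = D(phi1, dphi1, x, z) - D(phi1, dphi1, y, z)
    rhs = 2.0 * (dsoc_jac(dphi1, ddphi1, y) @ (z - y)) @ (y - x)
    assert lhs >= rhs - 1e-9          # Proposition 3.9, (3.48)
```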

Propositions 3.8-3.9 indicate that D(x, y) possesses some favorable properties similar to those for dϕ. We will employ these properties to establish the convergence for an approximate version of the proximal-like algorithm (3.34).

The proximal-like algorithm described in (3.34) for the CSOCP consists of a sequence of exact minimizations. However, in practical computations, it is impossible to obtain the exact solutions of these minimization problems. Therefore, we consider an approximate version of this algorithm which allows inexact solutions of the subproblems (3.34).

Throughout this section, we make the following assumptions for the CSOCP:

(A1) f∗ := inf{ f(ζ) | ζ ∈ F } > −∞ and dom(f) ∩ int(F) ≠ ∅.

(A2) The matrix A is of maximal rank m.

Remark 3.4. As noted in Remark 3.2, Assumption (A1) is elementary for the existence of a solution of the CSOCP. Assumption (A2) is common in the solution of SOCPs, and it is clearly satisfied when F = { ζ ∈ IRn | ζ ⪰Kn 0 }. Moreover, if we consider the linear SOCP

min c̄^T x
s.t. Āx = b̄, x ∈ Kn,    (3.53)

where Ā ∈ IRm×n with m ≤ n, b̄ ∈ IRm, and c̄ ∈ IRn, the assumption that Ā has full row rank m is standard. Consequently, its dual problem, given by

max b̄^T y
s.t. c̄ − Ā^T y ⪰Kn 0,    (3.54)

satisfies assumption (A2). This shows that we can solve the linear SOCP by applying the approximate proximal-like algorithm described below to the dual problem (3.54). In addition, we know that the recession cone of F is given by 0+F = { d ∈ IRm | Ad ⪰Kn 0 }. This implies that assumption (A2) is also satisfied when F is bounded, since its recession cone 0+F then reduces to zero.

For the sake of notation, in the sequel, we denote D : int(F ) × F → IR by

D(ζ, ξ) := D(Aζ + b, Aξ + b). (3.55)

From Proposition 3.8, we readily obtain the following properties of D(ζ, ξ).

Proposition 3.10. Let D(ζ, ξ) be defined by (3.55). Then, under Assumption (A2), we have

(a) D(ζ, ξ) ≥ 0 for any ζ ∈ int(F ) and ξ ∈ F , and D(ζ, ξ) = 0 if and only if ζ = ξ;

(b) the function D(·, ξ) for any fixed ξ ∈ F is continuously differentiable on int(F) with

∇ζ D(ζ, ξ) = 2A^T ∇(φ′)^soc(Aζ + b) A(ζ − ξ);    (3.56)

(c) for any fixed ξ ∈ F, the function D(·, ξ) is convex on int(F), and for any fixed ξ ∈ int(F), D(·, ξ) is strictly convex over int(F);

(d) for any fixed ξ ∈ int(F ), the function D(·, ξ) is essentially smooth;

(e) for any fixed ξ ∈ F, the level sets L(ξ, γ) := { ζ ∈ int(F) | D(ζ, ξ) ≤ γ } are bounded for all γ ≥ 0.

Now we describe an approximate version of the proximal-like algorithm (3.34).

The APM. Given a starting point ζ^0 ∈ int(F) and constants εk ≥ 0 and μk > 0, generate the sequence {ζ^k} ⊂ int(F) satisfying

g^k ∈ ∂_{εk} f(ζ^k),
μk g^k + ∇ζ D(ζ^k, ζ^{k−1}) = 0,    (3.57)

where ∂_ε f represents the ε-subdifferential of f.

Remark 3.5. The APM can be regarded as an approximate version of the entropy proximal-like algorithm (3.34) in the following sense. From the relation in (3.57) and the convexity of D(·, ξ) over int(F) for any fixed ξ ∈ int(F), it follows that for any u ∈ int(F),

f(u) ≥ f(ζ^k) + ⟨u − ζ^k, g^k⟩ − εk

and

μk^{-1} D(u, ζ^{k−1}) ≥ μk^{-1} D(ζ^k, ζ^{k−1}) + μk^{-1} ⟨∇ζ D(ζ^k, ζ^{k−1}), u − ζ^k⟩.

Adding the last two inequalities and using (3.57) yields

f(u) + μk^{-1} D(u, ζ^{k−1}) ≥ f(ζ^k) + μk^{-1} D(ζ^k, ζ^{k−1}) − εk.

This implies that

ζ^k ∈ εk-argmin{ f(ζ) + μk^{-1} D(ζ, ζ^{k−1}) },    (3.58)

where, for a given function F and ε ≥ 0, the notation

ε-argmin F(ζ) := { ζ | F(ζ) ≤ inf F(ζ) + ε }.    (3.59)
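To fix ideas, the following sketch carries out one APM subproblem for a linear objective f(ζ) = c^T ζ, using plain backtracking gradient descent as the (inexact) inner solver. It reuses D and dsoc_jac from the earlier sketches; apm_step, the step sizes, and the iteration counts are our illustrative choices, not part of the method's specification:

```python
import numpy as np

def apm_step(cvec, A, bvec, zeta_prev, mu, dphi, ddphi, phi,
             iters=200, lr0=1.0):
    """Approximately solve subproblem (3.34) with f(zeta) = cvec^T zeta."""
    lam_min = lambda v: v[0] - np.linalg.norm(v[1:])        # lambda_1(v)
    F = lambda zeta: cvec @ zeta + D(phi, dphi,
                                     A @ zeta + bvec,
                                     A @ zeta_prev + bvec) / mu
    zeta = zeta_prev.copy()
    for _ in range(iters):
        x = A @ zeta + bvec
        # gradient of the subproblem objective, via (3.56)
        grad = cvec + (2.0 / mu) * A.T @ (dsoc_jac(dphi, ddphi, x)
                                          @ (A @ (zeta - zeta_prev)))
        lr, f0 = lr0, F(zeta)
        while True:                   # backtrack: stay interior and descend
            trial = zeta - lr * grad
            if lam_min(A @ trial + bvec) > 0 and F(trial) <= f0:
                break
            lr *= 0.5
            if lr < 1e-16:
                return zeta           # no further progress possible
        zeta = trial
    return zeta
```

Since D(·, ζ^{k−1}) is essentially smooth (Proposition 3.10(d)), its gradient blows up at bd(F), so the backtracking line search naturally keeps the iterates inside int(F); this is the interior-point flavor that the APM inherits.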

In the rest of this section, we focus on the convergence of the APM defined as in (3.57) under assumptions (A1) and (A2). First, we prove that the APM generates a sequence {ζ^k} ⊂ int(F), and consequently the APM is well-defined.

Proposition 3.11. For any ξ ∈ int(F) and μ > 0, we have the following results.

(a) The function F(·) := f(·) + μ^{-1}D(·, ξ) has bounded level sets under assumption (A1).

(b) If, in addition, assumption (A2) holds, then there exists a unique ζ̂ ∈ int(F) such that

ζ̂ = argmin_{ζ ∈ int(F)} { f(ζ) + μ^{-1} D(ζ, ξ) },    (3.60)

and moreover, the minimum on the right-hand side is attained at ζ̂ satisfying

−2μ^{-1} A^T ∇(φ′)^soc(Aζ̂ + b) A(ζ̂ − ξ) ∈ ∂f(ζ̂).    (3.61)

Proof. (a) Fix ξ ∈ int(F) and μ > 0. By assumption (A1) and the nonnegativity of D(ζ, ξ), to show that F(ζ) has bounded level sets, it suffices to show that for all ν ≥ f∗, the level sets L(ν) := { ζ ∈ int(F) | F(ζ) ≤ ν } are bounded. Notice that L(ν) ⊆ L(ξ, μ(ν − f∗)), and the sets L(ξ, γ) := { ζ ∈ int(F) | D(ζ, ξ) ≤ γ } are bounded for all γ ≥ 0 by Proposition 3.10(e). Therefore, the sets L(ν) for all ν ≥ f∗ are bounded.

(b) By Proposition 3.10(b) and (c), F(ζ) is a closed proper strictly convex function. Hence, if the minimum exists, it must be unique. From part (a), the minimizer ζ̂ exists, and so it is unique. Under assumption (A2), using the gradient formula in (3.56) and the optimality conditions for (3.60) then yields that

0 ∈ ∂f(ζ̂) + 2μ^{-1} A^T ∇(φ′)^soc(Aζ̂ + b) A(ζ̂ − ξ) + ∂δ(ζ̂ | F),    (3.62)

where δ(u | F) = 0 if u ∈ F and +∞ otherwise. By Proposition 3.10(d) and [131, Theorem 26.1], we have ∂ζ D(ζ, ξ) = ∅ for all ζ ∈ bd(F). Hence, the relation in (3.62) implies that ζ̂ ∈ int(F). On the other hand, from [131, page 226], we know that

∂δ(u | F) = { v ∈ IRn | v ⪯Kn 0, tr(v ◦ u) = 0 }.

Using Property 1.3(d), we then obtain ∂δ(ζ̂ | F) = {0}. Thus, the proof is complete. □

Next, we investigate the properties of the sequence {ζ^k} generated by the APM defined as in (3.57).

Proposition 3.12. Let {μk} be any sequence of positive numbers and set σn := Σ_{k=1}^n μk. Let {ζ^k} be the sequence generated by the APM defined as in (3.57). Then, the following hold.

(a) μk[f(ζ^k) − f(ζ)] ≤ D(ζ^{k−1}, ζ) − D(ζ^k, ζ) + μk εk for all ζ ∈ F.

(b) D(ζ^k, ζ) ≤ D(ζ^{k−1}, ζ) + μk εk for all ζ ∈ F such that f(ζ) ≤ f(ζ^k).

(c) σn(f(ζ^n) − f(ζ)) ≤ D(ζ^0, ζ) − D(ζ^n, ζ) + Σ_{k=1}^n σk εk for all ζ ∈ F.

Proof. (a) For any ζ ∈ F, using the definition of the ε-subdifferential, we have

f(ζ) ≥ f(ζ^k) + ⟨g^k, ζ − ζ^k⟩ − εk,    (3.63)

where g^k ∈ ∂_{εk} f(ζ^k). Moreover, from (3.57) and (3.56), it follows that

g^k = −2μk^{-1} A^T ∇(φ′)^soc(Aζ^k + b) A(ζ^k − ζ^{k−1}).

Substituting this g^k into (3.63), we then obtain that

μk[f(ζ^k) − f(ζ)] ≤ 2⟨ A^T ∇(φ′)^soc(Aζ^k + b) A(ζ^k − ζ^{k−1}), ζ − ζ^k ⟩ + μk εk.

On the other hand, applying Proposition 3.9 at the points x = Aζ^{k−1} + b, y = Aζ^k + b and z = Aζ + b and using the definition of D(ζ, ξ) given by (3.55) yields

D(ζ^{k−1}, ζ) − D(ζ^k, ζ) ≥ 2⟨ A^T ∇(φ′)^soc(Aζ^k + b) A(ζ^k − ζ^{k−1}), ζ − ζ^k ⟩.

Combining the last two inequalities, we immediately obtain the result.

(b) The result follows directly from part (a) for any ζ ∈ F such that f(ζ^k) ≥ f(ζ).

(c) First, from (3.58), it follows that

ζ^k ∈ εk-argmin{ f(ζ) + μk^{-1} D(ζ, ζ^{k−1}) }.

This implies that for any ζ ∈ int(F),

f(ζ) + μk^{-1} D(ζ, ζ^{k−1}) ≥ f(ζ^k) + μk^{-1} D(ζ^k, ζ^{k−1}) − εk.

Setting ζ = ζ^{k−1} in this inequality and using Proposition 3.10(a) then yields that

f(ζ^{k−1}) − f(ζ^k) ≥ μk^{-1} D(ζ^k, ζ^{k−1}) − εk ≥ −εk.

Multiplying this inequality by σ_{k−1} and summing over k = 1, 2, · · · , n, we get

Σ_{k=1}^n [ σ_{k−1} f(ζ^{k−1}) − (σk − μk) f(ζ^k) ] ≥ −Σ_{k=1}^n σ_{k−1} εk,

which, by noting that σk = μk + σ_{k−1} (with σ0 ≡ 0), can be reduced to

σn f(ζ^n) − Σ_{k=1}^n μk f(ζ^k) ≤ Σ_{k=1}^n σ_{k−1} εk.

On the other hand, using part (a) and summing over k = 1, 2, · · · , n, we have

−σn f(ζ) + Σ_{k=1}^n μk f(ζ^k) ≤ D(ζ^0, ζ) − D(ζ^n, ζ) + Σ_{k=1}^n μk εk,  ∀ζ ∈ F.

Adding the last two inequalities yields

σn(f(ζ^n) − f(ζ)) ≤ D(ζ^0, ζ) − D(ζ^n, ζ) + Σ_{k=1}^n (μk + σ_{k−1}) εk,

which proves (c) because μk + σ_{k−1} = σk. □

We are now in a position to prove our main convergence result for the APM defined as in (3.57).

Proposition 3.13. Let {ζ^k} be the sequence generated by the APM defined as in (3.57) and σn = Σ_{k=1}^n μk. Then, under assumptions (A1) and (A2), the following hold.

(a) If σn → ∞ and μk^{-1} σk εk → 0, then lim_{n→∞} f(ζ^n) = f∗.

(b) If the optimal set X ≠ ∅, σn → ∞ and Σ_{k=1}^∞ μk εk < ∞, then the sequence {ζ^k} is bounded and every accumulation point is a solution of the CSOCP.

Proof. (a) From Proposition 3.12(c) and the nonnegativity of D(ζ^n, ζ), it follows that

f(ζ^n) − f(ζ) ≤ σn^{-1} D(ζ^0, ζ) + σn^{-1} Σ_{k=1}^n σk εk,  ∀ζ ∈ F.

Letting n → +∞ on both sides of the last inequality, the first term on the right-hand side goes to zero since σn → +∞. In addition, applying Lemma 3.6 with ank := σn^{-1} μk if k ≤ n and ank := 0 otherwise, and uk := μk^{-1} σk εk, we obtain that the second term on the right-hand side satisfies

σn^{-1} Σ_{k=1}^n σk εk = Σ_k ank uk → 0,

because σn → +∞ and μk^{-1} σk εk → 0. Therefore, we have

lim sup_{n→+∞} f(ζ^n) ≤ f∗.

This, together with the fact that f(ζ^n) ≥ f∗, implies the desired result.

(b) Suppose that ζ ∈ X. For any k, we have f(ζ^k) ≥ f(ζ). From Proposition 3.12(b), it then follows that

D(ζ^k, ζ) ≤ D(ζ^{k−1}, ζ) + μk εk.

Since Σ_{k=1}^∞ μk εk < ∞, using Lemma 3.7 with υk := D(ζ^k, ζ) ≥ 0 and βk := μk εk ≥ 0 yields that the sequence {D(ζ^k, ζ)} converges. Thus, by Proposition 3.10(e), the sequence {ζ^k} is bounded and consequently has an accumulation point. Without any loss of generality, let ζ̂ ∈ F be an accumulation point of {ζ^k}, so that there exists a subsequence {ζ^{k_j}} → ζ̂ as k_j → ∞. Since f is lower semicontinuous, we obtain f(ζ̂) ≤ lim inf_{k_j→∞} f(ζ^{k_j}). On the other hand, f(ζ^{k_j}) → f∗ by part (a). The two sides imply that f(ζ̂) = f∗. Therefore, ζ̂ is a solution of the CSOCP. The proof is thus complete. □
