
Proximal-like algorithm for SOCP

In the document SOC Functions and Their Applications (pages 104-119)

Chapter 3

Algorithmic Applications

In this chapter, we will see details about how the characterizations established in Chapter 2 can be applied in real algorithms. In particular, SOC-convexity is often involved in the solution methods of convex SOCPs, for example, in the proximal-like methods. We present three types of proximal-like algorithms, and refer the readers to [115, 116, 118] for their numerical performance.

The proximal point algorithm for minimizing a convex function f on IRm replaces the problem min_{ζ∈IRm} f(ζ) by a sequence of minimization problems with strictly convex objectives, generating a sequence {ζk} defined by

ζk = argmin_{ζ∈IRm} { f(ζ) + (1/µk) ‖ζ − ζk−1‖² },   (3.2)

where {µk} is a sequence of positive numbers and ‖·‖ denotes the Euclidean norm in IRm. The method was due to Martinet [105], who introduced the above proximal minimization problem based on the Moreau proximal approximation [110] of f. The proximal point algorithm was then further developed and studied by Rockafellar [132, 133]. Later, several researchers [34, 39, 59, 60, 144] proposed and investigated nonquadratic proximal point algorithms for convex programming with nonnegative constraints, by replacing the quadratic distance in (3.2) with other distance-like functions. Among others, Censor and Zenios [34] replaced the method (3.2) by a method of the form

ζk = argmin_{ζ∈IRm} { f(ζ) + (1/µk) D(ζ, ζk−1) },   (3.3)

where D(·, ·), called D-function, is a measure of distance based on a Bregman function.

Recall that, given a differentiable function ϕ, it is called a Bregman function [33, 54] if it satisfies the properties listed in Definition 3.1 below, and the induced D-function is given as follows:

D(ζ, ξ) := ϕ(ζ) − ϕ(ξ) − ⟨∇ϕ(ξ), ζ − ξ⟩,   (3.4)

where ⟨·, ·⟩ denotes the inner product in IRm and ∇ϕ denotes the gradient of ϕ.

Definition 3.1. Let S ⊆ IRm be an open set and ¯S be its closure. The function ϕ : ¯S → IR is called a Bregman function with zone S if the following properties hold:

(i) ϕ is continuously differentiable on S;

(ii) ϕ is strictly convex and continuous on ¯S;

(iii) For each γ ∈ IR, the level sets LD(ξ, γ) = {ζ ∈ ¯S : D(ζ, ξ) ≤ γ} and LD(ζ, γ) = {ξ ∈ S : D(ζ, ξ) ≤ γ} are bounded for any ξ ∈ S and ζ ∈ ¯S, respectively;

(iv) If {ξk} ⊂ S converges to ξ, then D(ξ, ξk) → 0;

(v) If {ζk} and {ξk} are sequences such that ξk → ξ ∈ ¯S, {ζk} is bounded and if D(ζk, ξk) → 0, then ζk→ ξ.

The Bregman proximal minimization (BPM) method described in (3.3) was further extended by Kiwiel [89] with generalized Bregman functions, called B-functions. Compared with Bregman functions, these functions are possibly nondifferentiable and infinite on the boundary of their domain. For the detailed definition of B-functions and the convergence of the BPM method using B-functions, please refer to [89].

Next, we present a class of distance measures on SOC and discuss its relations with the D-function and the double-regularized Bregman distance [137]. To this end, we need a class of functions φ : [0, ∞) → IR satisfying

(T1) φ is continuously differentiable on IR++;

(T2) φ is strictly convex and continuous on IR+;

(T3) For each γ ∈ IR, the level sets {s ∈ IR+ | d(s, t) ≤ γ} and {t ∈ IR++ | d(s, t) ≤ γ} are bounded for any t ∈ IR++ and s ∈ IR+, respectively;

(T4) If {tk} ⊂ IR++ is a sequence such that lim_{k→+∞} tk = 0, then for all s ∈ IR++, lim_{k→+∞} φ′(tk)(s − tk) = −∞;

where the function d : [0, ∞) × (0, ∞) → IR is defined by

d(s, t) = φ(s) − φ(t) − φ′(t)(s − t), ∀s ∈ IR+, t ∈ IR++.   (3.5)

The function φ satisfying (T4) is said in [80–82] to be boundary coercive. If we set φ(x) = +∞ when x ∉ IR+, then φ becomes a closed proper strictly convex function on IR. Furthermore, by [89, Lemma 2.4(d)] and (T3), it is not difficult to see that φ(x) and ∑_{i=1}^{n} φ(xi) are B-functions on IR and IRn, respectively. Unless otherwise stated, in the rest of this section, we always assume that φ satisfies (T1)-(T4).
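As a quick sanity check (our addition, not part of the original text), the entropy function φ(t) = t ln t − t, which reappears in Example 3.1 below, satisfies these properties; the following sketch verifies d(s, t) ≥ 0 from (3.5) and the boundary coercivity (T4) numerically.

```python
import math

def phi(t):
    # entropy function phi(t) = t*ln(t) - t, with the convention phi(0) = 0
    return t * math.log(t) - t if t > 0 else 0.0

def dphi(t):
    # phi'(t) = ln(t) on (0, inf)
    return math.log(t)

def d(s, t):
    # the distance (3.5): d(s, t) = phi(s) - phi(t) - phi'(t)*(s - t)
    return phi(s) - phi(t) - dphi(t) * (s - t)

# d(s, t) >= 0 with equality iff s = t (strict convexity of phi, (T2))
assert abs(d(2.0, 2.0)) < 1e-12
assert all(d(s, t) > 0 for s, t in [(1.0, 3.0), (0.5, 0.1), (4.0, 0.2)])

# boundary coercivity (T4): phi'(tk)*(s - tk) -> -inf as tk -> 0+
s = 1.0
values = [dphi(tk) * (s - tk) for tk in (1e-2, 1e-4, 1e-8)]
assert values[0] > values[1] > values[2]  # decreasing without bound
```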

Using (1.9), the corresponding SOC functions of φ and φ0 are given by

φsoc(x) = φ(λ1(x)) u_x^{(1)} + φ(λ2(x)) u_x^{(2)},   (3.6)

and

(φ′)soc(x) = φ′(λ1(x)) u_x^{(1)} + φ′(λ2(x)) u_x^{(2)},   (3.7)

which are well-defined over Kn and int(Kn), respectively. In view of this, we define

H(x, y) := { tr[φsoc(x) − φsoc(y) − (φ′)soc(y) ∘ (x − y)], ∀x ∈ Kn, y ∈ int(Kn),
           { ∞, otherwise.   (3.8)

In what follows, we will show that the function H : IRn × IRn → (−∞, +∞] enjoys some favorable properties similar to those of the D-function. Particularly, we prove that H(x, y) ≥ 0 for any x ∈ Kn, y ∈ int(Kn), and moreover, H(x, y) = 0 if and only if x = y.

Consequently, it can be regarded as a distance measure on the SOC.
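To make the definition concrete, here is a small numerical sketch (our addition, not code from the text) that evaluates H in (3.8): build φsoc and (φ′)soc from the spectral decomposition (1.9) and take traces. We try it with φ(t) = t² − √t, the function of Example 3.2 below; `numpy` is assumed.

```python
import numpy as np

def spectral(x):
    # Spectral decomposition w.r.t. Kn (cf. (1.9)): x = lam1*u1 + lam2*u2,
    # lam_{1,2} = x1 -/+ ||x2||, u_{1,2} = (1/2)(1, -/+ x2/||x2||).
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(len(x2))[0]  # any unit vector works when x2 = 0
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return (x1 - r, x1 + r), (u1, u2)

def soc_fun(f, x):
    # f^soc(x) = f(lam1)*u1 + f(lam2)*u2, as in (3.6)-(3.7)
    (l1, l2), (u1, u2) = spectral(x)
    return f(l1) * u1 + f(l2) * u2

def jordan(x, y):
    # Jordan product x o y = (x^T y, x1*y2 + y1*x2)
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def H(phi, dphi, x, y):
    # quasi D-function (3.8); tr(z) = lam1 + lam2 = 2*z1
    tr = lambda z: 2.0 * z[0]
    return tr(soc_fun(phi, x)) - tr(soc_fun(phi, y)) - tr(jordan(soc_fun(dphi, y), x - y))

phi = lambda t: t**2 - np.sqrt(t)            # the phi of Example 3.2
dphi = lambda t: 2.0 * t - 0.5 / np.sqrt(t)  # phi'(t)

x = np.array([2.0, 0.5, -0.3])  # interior of K^3 since x1 > ||x2||
y = np.array([1.5, 0.2, 0.4])
assert abs(H(phi, dphi, x, x)) < 1e-10  # H vanishes on the diagonal
assert H(phi, dphi, x, y) > 0           # and is positive off it
```

The two assertions are exactly the distance-measure properties proved in Proposition 3.1(c) below.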

We first start with a technical lemma that will be used in the subsequent analysis.

Lemma 3.1. Suppose that φ : [0, ∞) → IR satisfies (T1)-(T4). Let φsoc(x) and (φ′)soc(x) be given as in (3.6) and (3.7), respectively. Then, the following hold.

(a) φsoc(x) is continuously differentiable on int(Kn) with the gradient ∇φsoc(x) satisfying

∇φsoc(x)e = (φ′)soc(x).

(b) tr[φsoc(x)] = ∑_{i=1}^{2} φ(λi(x)) and tr[(φ′)soc(x)] = ∑_{i=1}^{2} φ′(λi(x)).

(c) tr[φsoc(x)] is continuously differentiable on int(Kn) with ∇tr[φsoc(x)] = 2∇φsoc(x)e.

(d) tr[φsoc(x)] is strictly convex and continuous on int(Kn).

(e) If {yk} ⊂ int(Kn) is a sequence such that lim_{k→+∞} yk = ¯y ∈ bd(Kn), then

lim_{k→+∞} ⟨∇tr[φsoc(yk)], x − yk⟩ = −∞ for all x ∈ int(Kn).

In other words, the function tr[φsoc(x)] is boundary coercive.

Proof. (a) The first part follows directly from Proposition 1.14. Now we prove the second part. If x2 ≠ 0, then by formulas (1.29)-(1.30) it is easy to compute that

∇φsoc(x)e = ( (φ′(λ2(x)) + φ′(λ1(x)))/2 , ((φ′(λ2(x)) − φ′(λ1(x)))/2) x2/‖x2‖ ).

In addition, using equations (1.4) and (3.7), we can prove that the vector on the right-hand side is exactly (φ′)soc(x). Therefore, ∇φsoc(x)e = (φ′)soc(x). If x2 = 0, then using (1.28) and (1.4), we can also prove that ∇φsoc(x)e = (φ′)soc(x).

(b) The result follows directly from (1.6) and equations (3.6)-(3.7).

(c) From part(a) and the fact that tr[φsoc(x)] = tr[φsoc(x) ∘ e] = 2⟨φsoc(x), e⟩, clearly, tr[φsoc(x)] is continuously differentiable on int(Kn). Applying the chain rule for the inner product of two functions immediately yields ∇tr[φsoc(x)] = 2∇φsoc(x)e.

(d) It is clear that φsoc(x) is continuous on Kn. We next prove that tr[φsoc(x)] is strictly convex on int(Kn). For any x, y ∈ Kn with x ≠ y and α, β ∈ (0, 1) with α + β = 1, we have

λ1(αx + βy) = αx1 + βy1 − ‖αx2 + βy2‖ ≥ αλ1(x) + βλ1(y),
λ2(αx + βy) = αx1 + βy1 + ‖αx2 + βy2‖ ≤ αλ2(x) + βλ2(y),

which implies that

αλ1(x) + βλ1(y) ≤ λ1(αx + βy) ≤ λ2(αx + βy) ≤ αλ2(x) + βλ2(y).

On the other hand,

λ1(αx + βy) + λ2(αx + βy) = 2αx1+ 2βy1 = [αλ1(x) + βλ1(y)] + [αλ2(x) + βλ2(y)].

The last two equations imply that there exists ρ ∈ [0, 1] such that

λ1(αx + βy) = ρ[αλ1(x) + βλ1(y)] + (1 − ρ)[αλ2(x) + βλ2(y)],
λ2(αx + βy) = (1 − ρ)[αλ1(x) + βλ1(y)] + ρ[αλ2(x) + βλ2(y)].

Thus, we have

tr[φsoc(αx + βy)] = φ[λ1(αx + βy)] + φ[λ2(αx + βy)]
= φ[ρ(αλ1(x) + βλ1(y)) + (1 − ρ)(αλ2(x) + βλ2(y))] + φ[(1 − ρ)(αλ1(x) + βλ1(y)) + ρ(αλ2(x) + βλ2(y))]
≤ ρφ(αλ1(x) + βλ1(y)) + (1 − ρ)φ(αλ2(x) + βλ2(y)) + (1 − ρ)φ(αλ1(x) + βλ1(y)) + ρφ(αλ2(x) + βλ2(y))
= φ(αλ1(x) + βλ1(y)) + φ(αλ2(x) + βλ2(y))
< αφ(λ1(x)) + βφ(λ1(y)) + αφ(λ2(x)) + βφ(λ2(y))
= α tr[φsoc(x)] + β tr[φsoc(y)],

where the first equality and the last one follow from part(b), and the two inequalities are due to the strict convexity of φ on IR++. From the definition of strict convexity, we thus prove that the conclusion holds.

(e) From part(a) and part(c), we can readily obtain the following equality:

∇tr[φsoc(x)] = 2(φ′)soc(x), ∀x ∈ int(Kn).   (3.9)

Using this relation and Proposition 1.3(b), we then have

⟨∇tr[φsoc(yk)], x − yk⟩ = 2⟨(φ′)soc(yk), x − yk⟩
= tr[(φ′)soc(yk) ∘ (x − yk)]
= tr[(φ′)soc(yk) ∘ x] − tr[(φ′)soc(yk) ∘ yk]
≤ ∑_{i=1}^{2} φ′(λi(yk)) λi(x) − tr[(φ′)soc(yk) ∘ yk].   (3.10)

In addition, by Property 1.1(a)-(b), for any y ∈ int(Kn), we can compute that

(φ′)soc(y) ∘ y = [φ′(λ1(y)) u_y^{(1)} + φ′(λ2(y)) u_y^{(2)}] ∘ [λ1(y) u_y^{(1)} + λ2(y) u_y^{(2)}]
= φ′(λ1(y)) λ1(y) u_y^{(1)} + φ′(λ2(y)) λ2(y) u_y^{(2)},   (3.11)

which implies that

tr[(φ′)soc(yk) ∘ yk] = ∑_{i=1}^{2} φ′(λi(yk)) λi(yk).   (3.12)

Combining (3.10) with (3.12) immediately yields

⟨∇tr[φsoc(yk)], x − yk⟩ ≤ ∑_{i=1}^{2} φ′(λi(yk)) [λi(x) − λi(yk)].   (3.13)

Note that λ2(¯y) ≥ λ1(¯y) = 0 and λ2(x) ≥ λ1(x) > 0 since ¯y ∈ bd(Kn) and x ∈ int(Kn).

Hence, if λ2(¯y) = 0, then by (T4) and the continuity of λi(·) for i = 1, 2,

lim_{k→+∞} φ′(λi(yk)) [λi(x) − λi(yk)] = −∞, i = 1, 2,

which means that

lim_{k→+∞} ∑_{i=1}^{2} φ′(λi(yk)) [λi(x) − λi(yk)] = −∞.   (3.14)

If λ2(¯y) > 0, then lim_{k→+∞} φ′(λ2(yk)) [λ2(x) − λ2(yk)] is finite and

lim_{k→+∞} φ′(λ1(yk)) [λ1(x) − λ1(yk)] = −∞,

and therefore the result in (3.14) also holds in this case. Combining (3.14) with (3.13), we prove that the conclusion holds. □

Using the relation in (3.9), we have that for any x ∈ Kn and y ∈ int(Kn),

tr[(φ′)soc(y) ∘ (x − y)] = 2⟨(φ′)soc(y), x − y⟩ = ⟨∇tr[φsoc(y)], x − y⟩.

As a consequence, the function H(x, y) in (3.8) can be rewritten as

H(x, y) = { tr[φsoc(x)] − tr[φsoc(y)] − ⟨∇tr[φsoc(y)], x − y⟩, ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise.   (3.15)

Using this representation, we next investigate several important properties of H(x, y).

Proposition 3.1. Let H(x, y) be the function defined as in (3.8) or (3.15). Then, the following hold.

(a) H(x, y) is continuous on Kn× int(Kn), and for any y ∈ int(Kn), the function H(·, y) is strictly convex on Kn.

(b) For any given y ∈ int(Kn), H(·, y) is continuously differentiable on int(Kn) with

∇x H(x, y) = ∇tr[φsoc(x)] − ∇tr[φsoc(y)] = 2[(φ′)soc(x) − (φ′)soc(y)].   (3.16)

(c) H(x, y) ≥ ∑_{i=1}^{2} d(λi(x), λi(y)) ≥ 0 for any x ∈ Kn and y ∈ int(Kn), where d(·, ·) is defined by (3.5). Moreover, H(x, y) = 0 if and only if x = y.

(d) For every γ ∈ IR, the partial level sets of LH(y, γ) = {x ∈ Kn : H(x, y) ≤ γ} and LH(x, γ) = {y ∈ int(Kn) : H(x, y) ≤ γ} are bounded for any y ∈ int(Kn) and x ∈ Kn, respectively.

(e) If {yk} ⊂ int(Kn) is a sequence converging to y ∈ int(Kn), then H(y, yk) → 0.

(f ) If {xk} ⊂ int(Kn) and {yk} ⊂ int(Kn) are sequences such that {yk} → y ∈ int(Kn), {xk} is bounded, and H(xk, yk) → 0, then xk → y.

Proof. (a) Note that φsoc(x), (φ′)soc(y), and (φ′)soc(y) ∘ (x − y) are continuous for any x ∈ Kn and y ∈ int(Kn), and the trace function tr(·) is also continuous; hence H(x, y) is continuous on Kn × int(Kn). From Lemma 3.1(d), tr[φsoc(x)] is strictly convex over Kn, whereas −tr[φsoc(y)] − ⟨∇tr[φsoc(y)], x − y⟩ is clearly convex in Kn for fixed y ∈ int(Kn). This means that H(·, y) is strictly convex for any y ∈ int(Kn).

(b) By Lemma 3.1(c), the function H(·, y) for any given y ∈ int(Kn) is continuously differentiable on int(Kn). The first equality in (3.16) is obvious and the second is due to (3.9).

(c) The result follows directly from the following equalities and inequalities:

H(x, y) = tr[φsoc(x)] − tr[φsoc(y)] − tr[(φ′)soc(y) ∘ (x − y)]
= tr[φsoc(x)] − tr[φsoc(y)] − tr[(φ′)soc(y) ∘ x] + tr[(φ′)soc(y) ∘ y]
≥ tr[φsoc(x)] − tr[φsoc(y)] − ∑_{i=1}^{2} φ′(λi(y)) λi(x) + tr[(φ′)soc(y) ∘ y]
= ∑_{i=1}^{2} [φ(λi(x)) − φ(λi(y)) − φ′(λi(y)) λi(x) + φ′(λi(y)) λi(y)]
= ∑_{i=1}^{2} [φ(λi(x)) − φ(λi(y)) − φ′(λi(y)) (λi(x) − λi(y))]
= ∑_{i=1}^{2} d(λi(x), λi(y)) ≥ 0,

where the first equality is due to (3.8), the second and fourth are obvious, the third follows from Lemma 3.1(b) and (3.11), the last one is from (3.5), the first inequality follows from Proposition 1.3(b), and the last one is due to the strict convexity of φ on IR+. Note that tr[φsoc(x)] is strictly convex for any x ∈ Kn by Lemma 3.1(d), and therefore H(x, y) = 0 if and only if x = y by (3.15).

(d) From part(c), we have that LH(y, γ) ⊆ {x ∈ Kn | ∑_{i=1}^{2} d(λi(x), λi(y)) ≤ γ}. By (T3), the set on the right-hand side is bounded. Thus, LH(y, γ) is bounded for y ∈ int(Kn). Similarly, LH(x, γ) is bounded for x ∈ Kn.

From part(a)-(d), we immediately obtain the results in (e) and (f). 

Remark 3.1. (i) From (3.8), it is not difficult to see that H(x, y) is exactly a distance measure induced by tr[φsoc(x)] via formula (3.4). Therefore, if n = 1 and φ is a Bregman function with zone IR++, i.e., φ also satisfies the property:

(T5) if {sk} ⊆ IR+ and {tk} ⊂ IR++ are sequences such that tk → t, {sk} is bounded, and d(sk, tk) → 0, then sk → t;

then H(x, y) reduces to the Bregman distance function d(x, y) in (3.5).

(ii) When n > 1, H(x, y) is generally not a Bregman distance even if φ is a Bregman function with zone IR++, noting that Proposition 3.1(e) and (f) do not hold for {xk} ⊆ bd(Kn) and y ∈ bd(Kn). By the proof of Proposition 3.1(c), the main reason is that, in order to guarantee

tr[(φ′)soc(y) ∘ x] = ∑_{i=1}^{2} φ′(λi(y)) λi(x)

for any x ∈ Kn and y ∈ int(Kn), the relation [(φ′)soc(y)]2 = αx2 with some α > 0 is required, where [(φ′)soc(y)]2 is the vector composed of the last n − 1 elements of (φ′)soc(y). It is very stringent for φ to satisfy such a relation. Consequently, tr[φsoc(x)] is not a B-function [89] on IRn, either, even if φ itself is a B-function.

(iii) We observe that H(x, y) is inseparable, whereas the double-regularized distance function proposed by [137] belongs to the separable class of functions. In view of this, H(x, y) cannot become a double-regularized distance function on Kn × int(Kn), even when φ is such that ˜d(s, t) = d(s, t)/φ′′(t) + (µ/2)(s − t)² is a double-regularized component (see [137]).

In view of Proposition 3.1 and Remark 3.1, we call H(x, y) a quasi D-function. In the following, we present several specific examples of quasi D-functions.

Example 3.1. Let φ : [0, ∞) → IR be φ(t) = t ln t − t with the convention 0 ln 0 = 0.

Solution. It is easy to verify that φ satisfies (T1)-(T4). By [63, Proposition 3.2 (b)] and (3.6)-(3.7), we can compute that for any x ∈ Kn and y ∈ int(Kn),

φsoc(x) = x ∘ ln x − x and (φ′)soc(y) = ln y.

Therefore, we obtain

H(x, y) = { tr(x ∘ ln x − x ∘ ln y + y − x), ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise,

which is a quasi D-function. □
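As a numerical cross-check (our addition, not part of the text), the closed form above can be compared against the definition (3.8) directly; the helpers below implement the spectral decomposition and Jordan product on K³ and assume `numpy`.

```python
import numpy as np

def spectral(x):
    # x = lam1*u1 + lam2*u2 with lam_{1,2} = x1 -/+ ||x2||, cf. (1.9)
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(len(x2))[0]
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return (x1 - r, x1 + r), (u1, u2)

def soc_ln(x):
    # ln^soc(x) = ln(lam1)*u1 + ln(lam2)*u2
    (l1, l2), (u1, u2) = spectral(x)
    return np.log(l1) * u1 + np.log(l2) * u2

def jordan(x, y):
    # Jordan product x o y = (x^T y, x1*y2 + y1*x2)
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

tr = lambda z: 2.0 * z[0]  # tr(z) = lam1 + lam2 = 2*z1

def H_closed(x, y):
    # closed form of Example 3.1: tr(x o ln x - x o ln y + y - x)
    return tr(jordan(x, soc_ln(x)) - jordan(x, soc_ln(y)) + y - x)

def H_def(x, y):
    # definition (3.8) with phi_soc(x) = x o ln x - x and (phi')_soc(y) = ln y
    phisoc = lambda z: jordan(z, soc_ln(z)) - z
    return tr(phisoc(x) - phisoc(y) - jordan(soc_ln(y), x - y))

x = np.array([2.0, 0.6, -0.8])  # interior of K^3: x1 > ||x2||
y = np.array([1.2, 0.3, 0.1])
assert abs(H_closed(x, y) - H_def(x, y)) < 1e-10  # the two expressions agree
assert H_closed(x, y) > 0 and abs(H_closed(x, x)) < 1e-12
```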

Example 3.2. Let φ : [0, ∞) → IR be φ(t) = t² − √t.

Solution. It is not hard to verify that φ satisfies (T1)-(T4). From Property 1.2, we have that for any x ∈ Kn,

x² = x ∘ x = λ1(x)² u_x^{(1)} + λ2(x)² u_x^{(2)} and x^{1/2} = √λ1(x) u_x^{(1)} + √λ2(x) u_x^{(2)}.

By a direct computation, we then obtain for any x ∈ Kn and y ∈ int(Kn),

φsoc(x) = x ∘ x − x^{1/2} and (φ′)soc(y) = 2y − [tr(y^{1/2})e − y^{1/2}] / [2√det(y)].

This yields

H(x, y) = { tr[ (x − y)² − (x^{1/2} − y^{1/2}) + (tr(y^{1/2})e − y^{1/2}) ∘ (x − y) / (2√det(y)) ], ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise,

which is a quasi D-function. □

Example 3.3. Let φ : [0, ∞) → IR be φ(t) = t ln t − (1 + t) ln(1 + t) + (1 + t) ln 2 with the convention 0 ln 0 = 0.

Solution. It is easily shown that φ satisfies (T1)-(T4). Using [63, Proposition 3.2(b)], we know that for any x ∈ Kn and y ∈ int(Kn),

φsoc(x) = x ∘ ln x − (e + x) ∘ ln(e + x) + (e + x) ln 2

and

(φ′)soc(y) = ln y − ln(e + y) + e ln 2.

Consequently, we obtain

H(x, y) = { tr[x ∘ (ln x − ln y) − (e + x) ∘ (ln(e + x) − ln(e + y))], ∀x ∈ Kn, y ∈ int(Kn),
          { ∞, otherwise,

which is a quasi D-function. □

In addition, from [80, 82, 144], it follows that ∑_{i=1}^{m} φ(ζi) generated by φ in the above examples is a Bregman function with zone S = IRm_+, and consequently ∑_{i=1}^{m} d(ζi, ξi) defined as in (3.5) is a D-function induced by ∑_{i=1}^{m} φ(ζi).

Proposition 3.2. Let H(x, y) be defined as in (3.8) or (3.15). Then, for all x, y ∈ int(Kn) and z ∈ Kn, the following three-points identity holds:

H(z, x) + H(x, y) − H(z, y) = ⟨∇tr[φsoc(y)] − ∇tr[φsoc(x)], z − x⟩
= tr[((φ′)soc(y) − (φ′)soc(x)) ∘ (z − x)].

Proof. Using the definition of H given as in (3.15), we have

⟨∇tr[φsoc(x)], z − x⟩ = tr[φsoc(z)] − tr[φsoc(x)] − H(z, x),
⟨∇tr[φsoc(y)], x − y⟩ = tr[φsoc(x)] − tr[φsoc(y)] − H(x, y),
⟨∇tr[φsoc(y)], z − y⟩ = tr[φsoc(z)] − tr[φsoc(y)] − H(z, y).

Subtracting the first two equations from the last one gives the first equality. By (3.9),

⟨∇tr[φsoc(y)] − ∇tr[φsoc(x)], z − x⟩ = 2⟨(φ′)soc(y) − (φ′)soc(x), z − x⟩.

This together with the fact that tr(x ∘ y) = 2⟨x, y⟩ leads to the second equality. □
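The identity is purely algebraic, so it can be verified numerically; the sketch below (our addition) does so on K³ with the entropy function φ(t) = t ln t − t of Example 3.1, for which (φ′)soc(y) = ln y; `numpy` is assumed.

```python
import numpy as np

def spectral(x):
    # x = lam1*u1 + lam2*u2 with lam_{1,2} = x1 -/+ ||x2||, cf. (1.9)
    x1, x2 = x[0], x[1:]
    r = np.linalg.norm(x2)
    w = x2 / r if r > 0 else np.eye(len(x2))[0]
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return (x1 - r, x1 + r), (u1, u2)

def soc_fun(f, x):
    # f^soc(x) = f(lam1)*u1 + f(lam2)*u2
    (l1, l2), (u1, u2) = spectral(x)
    return f(l1) * u1 + f(l2) * u2

def jordan(x, y):
    # Jordan product x o y = (x^T y, x1*y2 + y1*x2)
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

tr = lambda z: 2.0 * z[0]

phi, dphi = (lambda t: t * np.log(t) - t), np.log  # entropy function, phi' = ln

def H(x, y):
    # quasi D-function (3.8)
    return tr(soc_fun(phi, x)) - tr(soc_fun(phi, y)) - tr(jordan(soc_fun(dphi, y), x - y))

z = np.array([1.0, 0.2, 0.3])   # three interior points of K^3
x = np.array([2.0, 0.5, -0.7])
y = np.array([1.5, -0.4, 0.6])

lhs = H(z, x) + H(x, y) - H(z, y)
rhs = tr(jordan(soc_fun(dphi, y) - soc_fun(dphi, x), z - x))
assert abs(lhs - rhs) < 1e-10  # three-points identity of Proposition 3.2
```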

In this section, we propose a proximal-like algorithm for solving the CSOCP based on the quasi D-function H(x, y). For the sake of notation, we denote by F the set

F = { ζ ∈ IRm | Aζ + b ⪰_{Kn} 0 }.   (3.17)

It is easy to verify that F is convex and its interior int(F) is given by

int(F) = { ζ ∈ IRm | Aζ + b ≻_{Kn} 0 }.   (3.18)

Let ψ : IRm → (−∞, +∞] be the function defined by

ψ(ζ) = { tr[φsoc(Aζ + b)], if ζ ∈ F,
       { ∞, otherwise.   (3.19)

By Lemma 3.1, it is easily shown that the following conclusions hold for ψ(ζ).

Proposition 3.3. Let ψ(ζ) be given as in (3.19). If the matrix A has full rank m, then

(a) ψ(ζ) is continuously differentiable on int(F) with ∇ψ(ζ) = 2A^T (φ′)soc(Aζ + b);

(b) ψ(ζ) is strictly convex and continuous on F;

(c) ψ(ζ) is boundary coercive, i.e., if {ξk} ⊆ int(F) is such that lim_{k→+∞} ξk = ξ ∈ bd(F), then for all ζ ∈ int(F), there holds lim_{k→+∞} ∇ψ(ξk)^T (ζ − ξk) = −∞.

Let D(ζ, ξ) be the function induced by the above ψ(ζ) via formula (3.4), i.e.,

D(ζ, ξ) = ψ(ζ) − ψ(ξ) − ⟨∇ψ(ξ), ζ − ξ⟩.   (3.20)

Then, from (3.15) and (3.19), it is not difficult to see that

D(ζ, ξ) = H(Aζ + b, Aξ + b).   (3.21)

Thus, by Proposition 3.1 and Proposition 3.3, we draw the following conclusions.

Proposition 3.4. Let D(ζ, ξ) be given by (3.20) or (3.21). If the matrix A has full rank m, then

(a) D(ζ, ξ) is continuous on F × int(F ), and for any given ξ ∈ int(F ), the function D(·, ξ) is strictly convex on F .

(b) For any fixed ξ ∈ int(F), D(·, ξ) is continuously differentiable on int(F) with

∇ζ D(ζ, ξ) = ∇ψ(ζ) − ∇ψ(ξ) = 2A^T [(φ′)soc(Aζ + b) − (φ′)soc(Aξ + b)].

(c) D(ζ, ξ) ≥ ∑_{i=1}^{2} d(λi(Aζ + b), λi(Aξ + b)) ≥ 0 for any ζ ∈ F and ξ ∈ int(F), where d(·, ·) is defined by (3.5). Moreover, D(ζ, ξ) = 0 if and only if ζ = ξ.

(d) For each γ ∈ IR, the partial level sets of LD(ξ, γ) = {ζ ∈ F : D(ζ, ξ) ≤ γ} and LD(ζ, γ) = {ξ ∈ int(F ) : D(ζ, ξ) ≤ γ} are bounded for any ξ ∈ int(F ) and ζ ∈ F , respectively.

The PLA. The first proximal-like algorithm that we propose for the CSOCP (3.1) is defined as follows:

ζ0 ∈ int(F),
ζk = argmin_{ζ∈F} { f(ζ) + (1/µk) D(ζ, ζk−1) }   (k ≥ 1),   (3.22)

where {µk}_{k≥1} is a sequence of positive numbers. To establish the convergence of the algorithm, we make the following assumptions for the CSOCP:

(A1) inf{ f(ζ) | ζ ∈ F } := f∗ > −∞ and dom(f) ∩ int(F) ≠ ∅.

(A2) The matrix A is of maximal rank m.

Remark 3.2. Assumption (A1) is elementary for the solution of the CSOCP. Assumption (A2) is common in the solution of SOCPs and is obviously satisfied when F = Kn. Moreover, if we consider the standard SOCP

min c^T x
s.t. Ax = b, x ∈ Kn,

where A ∈ IR^{m×n} with m ≤ n, b ∈ IRm, and c ∈ IRn, the assumption that A has full row rank m is standard. Consequently, its dual problem, given by

max b^T y
s.t. c − A^T y ⪰_{Kn} 0,   (3.23)

satisfies assumption (A2). This shows that we can solve the SOCP by applying the proximal-like algorithm (PLA) defined as in (3.22) to the dual problem (3.23).
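To see the scheme (3.22) in action in the simplest possible setting (a sketch we add for illustration, not part of the text), take m = n = 1, A = 1, b = 0, so that F = IR+, and φ(t) = t ln t − t. For n = 1 both spectral values coincide, so D(ζ, ξ) = H(ζ, ξ) = 2d(ζ, ξ) = 2(ζ ln(ζ/ξ) − ζ + ξ); each subproblem is solved by bisection on its optimality condition. The objective f(ζ) = (ζ − 2)² is a hypothetical choice made only for this illustration, as is µk ≡ 1.

```python
import math

def prox_step(fprime, t_prev, mu, lo=1e-12, hi=1e6, iters=200):
    # Solve the subproblem optimality condition for n = 1:
    #   f'(z) + (2/mu) * ln(z / t_prev) = 0,
    # whose left-hand side is strictly increasing in z on (0, inf),
    # so plain bisection on a bracketing interval suffices.
    g = lambda z: fprime(z) + (2.0 / mu) * math.log(z / t_prev)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

fprime = lambda z: 2.0 * (z - 2.0)  # hypothetical f(z) = (z - 2)^2, minimizer 2 in IR+
z = 0.5                             # z0 in int(F) = (0, inf)
for _ in range(60):
    z = prox_step(fprime, z, mu=1.0)

assert z > 0                # iterates never leave the interior of the cone
assert abs(z - 2.0) < 1e-6  # and converge to the constrained minimizer
```

The fact that the iterates stay strictly positive without any explicit projection is exactly the interior-point behavior that Proposition 3.5 establishes in general.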

Now, we show the algorithm PLA given by (3.22) is well-defined under assumptions (A1) and (A2).

Proposition 3.5. Suppose that assumptions (A1)-(A2) hold. Then, the algorithm PLA given by (3.22) generates a sequence {ζk} ⊂ int(F) such that

−2µk^{−1} A^T [(φ′)soc(Aζk + b) − (φ′)soc(Aζk−1 + b)] ∈ ∂f(ζk).

Proof. The proof proceeds by induction. For k = 0, it clearly holds. Assume that ζk−1 ∈ int(F). Let fk(ζ) := f(ζ) + µk^{−1} D(ζ, ζk−1). Then assumption (A1) and Proposition 3.4(d) imply that fk has bounded level sets in F. By the lower semicontinuity of f and Proposition 3.4(a), the minimization problem min_{ζ∈F} fk(ζ), i.e., the subproblem in (3.22), has solutions. Moreover, the solution ζk is unique due to the convexity of f and the strict convexity of D(·, ξ). In the following, we prove that ζk ∈ int(F).

By [131, Theorem 23.8] and the definition of D(ζ, ξ) given by (3.20), we can verify that ζk is the only ζ ∈ dom(f) ∩ F such that

2µk^{−1} A^T (φ′)soc(Aζk−1 + b) ∈ ∂[ f(ζ) + µk^{−1} ψ(ζ) + δ(ζ|F) ],   (3.24)

where δ(ζ|F) = 0 if ζ ∈ F and +∞ otherwise. We will show that

∂[ f(ζ) + µk^{−1} ψ(ζ) + δ(ζ|F) ] = ∅ for all ζ ∈ bd(F),   (3.25)

which by (3.24) implies that ζk ∈ int(F). Take ζ ∈ bd(F) and assume that there exists w ∈ ∂[ f(ζ) + µk^{−1} ψ(ζ) ]. Take ˆζ ∈ dom(f) ∩ int(F) and let

ζl = (1 − εl)ζ + εl ˆζ   (3.26)

with lim_{l→+∞} εl = 0. From the convexity of int(F) and dom(f), it then follows that ζl ∈ dom(f) ∩ int(F), and moreover, lim_{l→+∞} ζl = ζ. Consequently,

εl w^T(ˆζ − ζ) = w^T(ζl − ζ)
≤ f(ζl) − f(ζ) + µk^{−1} [ψ(ζl) − ψ(ζ)]
≤ f(ζl) − f(ζ) + µk^{−1} ⟨2A^T (φ′)soc(Aζl + b), ζl − ζ⟩
≤ εl (f(ˆζ) − f(ζ)) + µk^{−1} (εl/(1 − εl)) tr[(φ′)soc(Aζl + b) ∘ (Aˆζ − Aζl)],

where the first equality is due to (3.26), the first inequality follows from the definition of subdifferential and the convexity of f(ζ) + µk^{−1} ψ(ζ) in F, the second one is due to the convexity and differentiability of ψ(ζ) in int(F), and the last one is from (3.26) and the convexity of f. Using Proposition 1.3(b) and (3.11), we then have

µk (1 − εl) [f(ζ) − f(ˆζ) + w^T(ˆζ − ζ)]

≤ tr[(φ′)soc(Aζl + b) ∘ (Aˆζ + b)] − tr[(φ′)soc(Aζl + b) ∘ (Aζl + b)]
≤ ∑_{i=1}^{2} [φ′(λi(Aζl + b)) λi(Aˆζ + b) − φ′(λi(Aζl + b)) λi(Aζl + b)]
= ∑_{i=1}^{2} φ′(λi(Aζl + b)) [λi(Aˆζ + b) − λi(Aζl + b)].

Since ζ ∈ bd(F), i.e., Aζ + b ∈ bd(Kn), it follows that lim_{l→+∞} λ1(Aζl + b) = 0. Thus, using (T4) and following the same line as the proof of Lemma 3.1(e), we can prove that the right-hand side of the last inequality goes to −∞ as l tends to ∞, whereas the left-hand side has a finite limit. This gives a contradiction. Hence, the equation (3.25) follows, which means that ζk ∈ int(F).

Finally, let us prove that ∂δ(ζk|F) = {0}. From [131, page 226], it follows that

∂δ(z|Kn) = { υ ∈ IRn | υ ⪯_{Kn} 0, tr(υ ∘ z) = 0 }.

Using [131, Theorem 23.9] and the assumption dom(f) ∩ int(F) ≠ ∅, we have

∂δ(ζ|F) = { A^T υ | υ ∈ IRn, υ ⪯_{Kn} 0, tr(υ ∘ (Aζ + b)) = 0 }.

In addition, from the self-dual property of the symmetric cone Kn, we know that tr(υ ∘ z) = 0 for any υ ⪯_{Kn} 0 and z ≻_{Kn} 0 implies υ = 0. Thus, since Aζk + b ≻_{Kn} 0, we obtain ∂δ(ζk|F) = {0}. This together with (3.24) and [131, Theorem 23.8] yields the desired result. □

Proposition 3.5 implies that the second-order cone constrained subproblem in (3.22) is actually equivalent to the unconstrained problem

ζk = argmin_{ζ∈IRm} { f(ζ) + (1/µk) D(ζ, ζk−1) },

which is obviously simpler than the original CSOCP. This shows that the proposed proximal-like algorithm transforms the CSOCP into the solution of a sequence of simpler problems. We next present some properties satisfied by {ζk}. For convenience, we denote the optimal set of the CSOCP by X := { ζ ∈ F | f(ζ) = f∗ }.

Proposition 3.6. Let {ζk} be the sequence generated by the algorithm PLA given by (3.22), and let σN = ∑_{k=1}^{N} µk. Then, the following hold.

(a) {f (ζk)} is a nonincreasing sequence.

(b) µk(f (ζk) − f (ζ)) ≤ D(ζ, ζk−1) − D(ζ, ζk) for all ζ ∈ F .

(c) σN(f (ζN) − f (ζ)) ≤ D(ζ, ζ0) − D(ζ, ζN) for all ζ ∈ F .

(d) D(ζ, ζk) is nonincreasing for any ζ ∈ X if the optimal set X ≠ ∅.

(e) D(ζk, ζk−1) → 0 if the optimal set X ≠ ∅.

Proof. (a) By the definition of ζk given as in (3.22), we have

f(ζk) + µk^{−1} D(ζk, ζk−1) ≤ f(ζk−1) + µk^{−1} D(ζk−1, ζk−1).

Since D(ζk, ζk−1) ≥ 0 and D(ζk−1, ζk−1) = 0 by Proposition 3.4(c), it follows that f(ζk) ≤ f(ζk−1) for all k ≥ 1.

(b) By Proposition 3.5, 2µk^{−1} A^T [(φ′)soc(Aζk−1 + b) − (φ′)soc(Aζk + b)] ∈ ∂f(ζk). Hence, from the definition of subdifferential, it follows that for any ζ ∈ F,

f(ζ) ≥ f(ζk) + 2µk^{−1} ⟨(φ′)soc(Aζk−1 + b) − (φ′)soc(Aζk + b), Aζ − Aζk⟩
= f(ζk) + µk^{−1} tr[((φ′)soc(Aζk−1 + b) − (φ′)soc(Aζk + b)) ∘ ((Aζ + b) − (Aζk + b))]
= f(ζk) + µk^{−1} [H(Aζ + b, Aζk + b) + H(Aζk + b, Aζk−1 + b) − H(Aζ + b, Aζk−1 + b)]
= f(ζk) + µk^{−1} [D(ζ, ζk) + D(ζk, ζk−1) − D(ζ, ζk−1)],   (3.27)

where the first equality is due to the definition of trace and the second follows from Proposition 3.2. From this inequality and the nonnegativity of D(ζk, ζk−1), we readily obtain the conclusion.

(c) From the result in part(b), we have

µk [f(ζk−1) − f(ζk)] ≥ D(ζk−1, ζk) − D(ζk−1, ζk−1) = D(ζk−1, ζk).

Multiplying this inequality by σk−1/µk and noting that σk = σk−1 + µk, one has

σk−1 f(ζk−1) − (σk − µk) f(ζk) ≥ σk−1 µk^{−1} D(ζk−1, ζk).   (3.28)

Summing up the inequalities in (3.28) for k = 1, 2, . . . , N and using σ0 = 0 yields

−σN f(ζN) + ∑_{k=1}^{N} µk f(ζk) ≥ ∑_{k=1}^{N} σk−1 µk^{−1} D(ζk−1, ζk).   (3.29)

On the other hand, summing the inequality in part(b) over k = 1, 2, . . . , N, we get

−σN f(ζ) + ∑_{k=1}^{N} µk f(ζk) ≤ D(ζ, ζ0) − D(ζ, ζN).   (3.30)

Now subtracting (3.29) from (3.30) yields

σN [f(ζN) − f(ζ)] ≤ D(ζ, ζ0) − D(ζ, ζN) − ∑_{k=1}^{N} σk−1 µk^{−1} D(ζk−1, ζk).

This together with the nonnegativity of D(ζk−1, ζk) implies the conclusion.

(d) Note that f (ζk) − f (ζ) ≥ 0 for all ζ ∈ X . Thus, the result follows from part(b) directly.

(e) From part(d), we know that D(ζ, ζk) is nonincreasing for any ζ ∈ X. This together with D(ζ, ζk) ≥ 0 for any k implies that {D(ζ, ζk)} is convergent. Thus, we have

D(ζ, ζk−1) − D(ζ, ζk) → 0 as k → ∞.   (3.31)

On the other hand, from (3.27) it follows that

0 ≤ µk [f(ζk) − f(ζ)] ≤ D(ζ, ζk−1) − D(ζ, ζk) − D(ζk, ζk−1), ∀ζ ∈ X,

which together with the nonnegativity of D(ζk, ζk−1) implies

D(ζk, ζk−1) ≤ D(ζ, ζk−1) − D(ζ, ζk), ∀ζ ∈ X.

Combining this with (3.31) yields the desired result. □

We have proved that the proximal-like algorithm (PLA) defined as in (3.22) is well-defined and satisfies some favorable properties. Building on these, we next establish its convergence.

Proposition 3.7. Let {ζk} be the sequence generated by the algorithm PLA given by (3.22), and let σN = ∑_{k=1}^{N} µk. Then, under assumptions (A1)-(A2), the following hold.

(a) If σN → ∞, then lim_{N→+∞} f(ζN) = f∗.

(b) If σN → ∞ and the optimal set X ≠ ∅, then the sequence {ζk} is bounded and every accumulation point is a solution of the CSOCP.

Proof. (a) By the definition of f∗, for any ε > 0 there exists a ˆζ ∈ F such that f(ˆζ) < f∗ + ε. Moreover, from Proposition 3.6(c) and the nonnegativity of D(ζ, ζN), we have that

f(ζN) − f(ζ) ≤ σN^{−1} D(ζ, ζ0), ∀ζ ∈ F.

Letting ζ = ˆζ in the above inequality and taking the limit with σN → +∞, we obtain

lim_{N→+∞} f(ζN) ≤ f∗ + ε.

Since ε is arbitrary and f(ζN) ≥ f∗, we thus have the desired result.

(b) Suppose that ζ ∈ X. Then, from Proposition 3.6(d), D(ζ, ζk) ≤ D(ζ, ζ0) for any k. This implies that {ζk} ⊆ LD(ζ, D(ζ, ζ0)), and hence, by Proposition 3.4(d), the sequence {ζk} is bounded. Let ¯ζ ∈ F be an accumulation point of {ζk} with subsequence {ζkj} → ¯ζ. Then, from part(a), it follows that f(ζkj) → f∗. On the other hand, since f is lower semicontinuous, we have f(¯ζ) ≤ lim inf_{j→+∞} f(ζkj) = f∗. The two facts show that f(¯ζ) ≤ f∗. Consequently, ¯ζ is a solution of the CSOCP. □
