Entropy-like proximal algorithms based on a second-order homogeneous distance function for quasi-convex programming


DOI 10.1007/s10898-007-9156-y

ORIGINAL PAPER

Shaohua Pan · Jein-Shan Chen

Received: 13 August 2006 / Accepted: 12 March 2007 / Published online: 25 April 2007

© Springer Science+Business Media LLC 2007

Abstract We consider two classes of proximal-like algorithms for minimizing a proper lower semicontinuous quasi-convex function f(x) subject to non-negative constraints x ≥ 0.

The algorithms are based on an entropy-like second-order homogeneous distance function.

Under the assumption that the global minimizer set is nonempty and bounded, we prove the full convergence of the sequence generated by the algorithms and, furthermore, obtain two important convergence results by imposing certain conditions on the proximal parameters. One is that the generated sequence converges to a stationary point if the proximal parameters are bounded and the problem is continuously differentiable; the other is that the generated sequence converges to a solution of the problem if the proximal parameters approach zero. Numerical experiments are carried out for a class of quasi-convex optimization problems where the function $f(x)$ is a composition of a quadratic convex function from $\mathbb{R}^n$ to $\mathbb{R}$ and a continuously differentiable increasing function from $\mathbb{R}$ to $\mathbb{R}$, and the computational results indicate that these algorithms are very promising for finding a global optimal solution to these quasi-convex problems.

Keywords Proximal-like method · Entropy-like distance · Quasi-convex programming

Shaohua Pan's work was partially supported by the Doctoral Starting-up Foundation (05300161) of GuangDong Province.

Jein-Shan Chen is a member of the Mathematics Division, National Center for Theoretical Sciences, Taipei Office. Jein-Shan Chen's work is partially supported by the National Science Council of Taiwan.

S. Pan (✉)

School of Mathematical Sciences, South China University of Technology, Guangzhou 510641, China e-mail: shhpan@scut.edu.cn

J.-S. Chen

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan e-mail: jschen@math.ntnu.edu.tw


1 Introduction

The proximal point algorithm for minimizing a convex function $f(x)$ on $\mathbb{R}^n$ generates a sequence $\{x^k\}_{k\in N} \subseteq \mathbb{R}^n$ by the following iterative scheme:

\[ x^{k+1} = \operatorname*{argmin}_{x\in\mathbb{R}^n} \left\{ f(x) + \lambda_k \|x - x^k\|^2 \right\}, \tag{1} \]

where $\{\lambda_k\}$ is a sequence of positive numbers and $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^n$. This method, originally introduced by Martinet [15], is based on the Moreau proximal approximation of $f$ (see [16]). The proximal point algorithm was then further developed and studied by Rockafellar [19,20]. Later, several researchers [4,5,7,12,14,23] proposed and studied nonquadratic proximal point algorithms by replacing the quadratic distance in (1) with a Bregman distance or an entropy-like distance. Among others, the entropy-like distance, also called $\phi$-divergence, is defined by

\[ d_\phi(x, y) = \sum_{i=1}^{n} y_i\,\phi(x_i/y_i), \tag{2} \]

where $\phi: \mathbb{R} \to (-\infty, +\infty]$ is a closed proper strictly convex function satisfying certain conditions; see [12,13,23,24]. This class of distance-like functions was first proposed by Teboulle [23] in order to define entropy-like proximal maps. A popular choice is $\phi(t) = t \ln t - t + 1$, for which the corresponding $d_\phi$ is exactly the well-known Kullback–Leibler entropy function from statistics [7,8,10,23]; this is where the "entropy" terminology stems from.
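As a small numerical illustration of formula (2), the following sketch evaluates the $\phi$-divergence for the kernel $\phi(t) = t\ln t - t + 1$ and compares it with the closed-form Kullback–Leibler expression. The function names and the sample vectors are illustrative assumptions, not part of the paper.

```python
import math

def phi_kl(t):
    """Kernel t*ln(t) - t + 1, using the convention 0*ln(0) = 0."""
    return 1.0 if t == 0.0 else t * math.log(t) - t + 1.0

def phi_divergence(x, y, phi=phi_kl):
    """d_phi(x, y) = sum_i y_i * phi(x_i / y_i), for x >= 0 and y > 0."""
    return sum(yi * phi(xi / yi) for xi, yi in zip(x, y))

def kullback_leibler(x, y):
    """Closed form sum_i [x_i*ln(x_i/y_i) + y_i - x_i]."""
    return sum((xi * math.log(xi / yi) if xi > 0 else 0.0) + yi - xi
               for xi, yi in zip(x, y))

# Hypothetical sample points (x >= 0, y > 0).
x = [0.2, 1.5, 3.0]
y = [0.5, 1.0, 2.0]
gap = abs(phi_divergence(x, y) - kullback_leibler(x, y))
print("d_phi =", phi_divergence(x, y), " KL =", kullback_leibler(x, y))
```

The two values agree up to rounding, and `phi_divergence(y, y)` is zero, as expected of a distance-like function.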

The proximal-like algorithm based on the $\phi$-divergence, originally designed for minimizing a convex function $f(x)$ subject to non-negative constraints $x \ge 0$, generates a sequence $\{x^k\}_{k\in N} \subseteq \mathbb{R}^n_{++}$ by the iterative scheme

\[ x^0 > 0, \qquad x^{k+1} = \operatorname*{argmin}_{x \ge 0} \left\{ f(x) + \lambda_k d_\phi(x, x^k) \right\}. \tag{3} \]

This class of proximal-like algorithms has been studied extensively for convex programming; see [12,13,23,24] and references therein. In particular, the one with $\phi(t) = t\ln t - t + 1$ was recently extended to convex semidefinite programs [6] and to convex second-order cone programs in a recent manuscript of J.-S. Chen. In fact, the algorithm (3) associated with $\phi(t) = -\ln t + t - 1$ was first proposed by Eggermont [8]. It is worth pointing out that the fundamental difference between (1) and (3) is that the term $d_\phi(\cdot,\cdot)$ is used in (3) to force the iterates $\{x^k\}_{k\in N}$ to stay in $\mathbb{R}^n_{++}$, the interior of the non-negative orthant; that is, the algorithm (3) automatically generates a positive sequence $\{x^k\}_{k\in N} \subseteq \mathbb{R}^n_{++}$.

In this paper, we focus on two classes of proximal-like algorithms of the form (3), but with a second-order homogeneous distance-like function $d_\varphi$ given by

\[ d_\varphi(x, y) = \sum_{i=1}^{n} y_i^2\,\varphi(x_i/y_i), \tag{4} \]

where the kernel $\varphi$ is built from one of two special types of $\phi$ and a quadratic function. The definition of $\varphi$ and the properties of $d_\varphi$ are given in Sect. 3. This class of algorithms has been studied for convex minimization (see [1,2,22]). In this paper, however, we employ these algorithms to solve the following quasi-convex minimization problem:

\[ \min\ f(x) \quad \text{s.t.} \quad x \ge 0, \tag{5} \]

where $f: \mathbb{R}^n \to \mathbb{R}$ is a proper lower semicontinuous quasi-convex function. Since we do not require the convexity of $f$, the subproblem may have more than one minimizer, and the basic iterative scheme for the algorithms is as follows:

\[ x^0 > 0, \qquad x^{k+1} \in \operatorname*{argmin}_{x \ge 0} \left\{ f(x) + \lambda_k d_\varphi(x, x^k) \right\}, \tag{6} \]

where $\lambda_k$ is the same as before. The purpose of this paper is to establish the full convergence of the sequence $\{x^k\}_{k\in N}$ generated by (6) under some mild assumptions for the quasi-convex problem (5), and to verify the effectiveness of the algorithms by numerical experiments.

Note that (5) is a special nonconvex optimization problem, and therefore the global optimization methods [11] developed for general nonconvex optimization problems can be applied to solve it. Nevertheless, we should point out that the design of these global optimization methods is often far more complex than that of the proximal-like method (6).

The rest of this paper is organized as follows. In Sect. 2, we recall some definitions and basic results that will be used in the later sections. In Sect. 3, we present the definition of the kernel $\varphi$ and investigate the properties of $d_\varphi$. Based on the entropy-like second-order homogeneous distance function $d_\varphi$, in Sect. 4 we propose two classes of proximal-like algorithms and prove the full convergence of the generated sequence. In Sect. 5, numerical experiments are reported with a specific $d_\varphi$ for a class of continuously differentiable quasi-convex programming problems.

Unless otherwise stated, in this paper we use $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ to denote the Euclidean inner product and the Euclidean norm in $\mathbb{R}^n$, and $\mathbb{R}^n_+$ to represent the non-negative orthant in $\mathbb{R}^n$, with interior $\mathbb{R}^n_{++}$. For a given differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x)$ denotes the gradient of $f$ at $x$, while $(\nabla f(x))_i$ means the $i$th partial derivative of $f$ at $x$. In addition, we use $\nabla_1 d_\varphi(x, y)$ to denote the gradient of $d_\varphi$ with respect to its first argument.

2 Basic concepts

In this section, we recall some definitions and basic results which will be used in the subsequent analysis. We start with the definition of Fejér convergence for a sequence.

Definition 2.1 A sequence $\{y^k\}_{k\in N}$ is Fejér convergent to a nonempty set $U \subseteq \mathbb{R}^n$ with respect to a distance-like function $d(\cdot,\cdot)$ if, for every $u \in U$ and every $k$, we have $d(u, y^{k+1}) \le d(u, y^k)$. When $d$ is the Euclidean distance, $\{y^k\}$ is simply called Fejér convergent to $U$.

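Given an extended real-valued function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, denote its domain by

\[ \operatorname{dom} f := \{ x \in \mathbb{R}^n : f(x) < +\infty \} \]

and its epigraph by

\[ \operatorname{epi} f := \{ (x, \beta) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le \beta \}. \]

Then $f$ is said to be proper if $\operatorname{dom} f \ne \emptyset$ and $f(x) > -\infty$ for any $x \in \operatorname{dom} f$, and $f$ is lower semicontinuous if $\operatorname{epi} f$ is a closed subset of $\mathbb{R}^n \times \mathbb{R}$. We next recall the definition of the Fréchet subdifferential; see [18, Chapter 8] and [21, Chapter 10].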

Definition 2.2 Let $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous function. For each $x \in \operatorname{dom} f$, the Fréchet subdifferential of $f$ at $x$, denoted by $\hat\partial f(x)$, is the set of vectors $s \in \mathbb{R}^n$ such that

\[ \liminf_{y \ne x,\ y \to x} \frac{1}{\|y - x\|} \left[ f(y) - f(x) - \langle s, y - x \rangle \right] \ge 0. \tag{7} \]

If $x \notin \operatorname{dom} f$, then $\hat\partial f(x) = \emptyset$.

A vector $s$ satisfying the inequality (7) is also called a regular subgradient of $f$ at $x$ (see [21, p. 301]). It is not difficult to see that the inequality (7) is equivalent to

\[ f(y) \ge f(x) + \langle s, y - x \rangle + o(\|y - x\|), \quad \text{where} \quad \lim_{y \to x} \frac{o(\|y - x\|)}{\|y - x\|} = 0. \]

For the subdifferential $\hat\partial f(x)$, the following results hold by direct verification.

Lemma 2.3 [21, Chapter 8] Let $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous function and $\hat\partial f(x)$ be the subdifferential of $f$ at $x$. Then,

(a) $\hat\partial f(x)$ is a closed and convex set.

(b) If $f$ is differentiable at $x$ or in a neighborhood of $x$, then $\hat\partial f(x) = \{\nabla f(x)\}$, where $\nabla f(x)$ is the gradient of $f$.

(c) If $g = f + h$ with $f$ finite at $x$ and $h$ differentiable on a neighborhood of $x$, then $\hat\partial g(x) = \hat\partial f(x) + \nabla h(x)$.

(d) If $f$ has a local minimum at $\bar x$, then $0 \in \hat\partial f(\bar x)$.

To work with differentiable minimization problems, we also need the following definition.

Definition 2.4 Suppose that $f: \mathbb{R}^n \to \mathbb{R}$ is a differentiable function. Then,

(a) For an unconstrained optimization problem of minimizing $f(x)$ over $x \in \mathbb{R}^n$, $x^*$ is called a stationary point if $\nabla f(x^*) = 0$.

(b) For a constrained optimization problem of minimizing $f(x)$ over $x \in C$, where $C$ is a nonempty convex subset of $\mathbb{R}^n$, $x^*$ is called a stationary point if

\[ \nabla f(x^*)^T (x - x^*) \ge 0 \quad \text{for all } x \in C. \]

To close this section, we recall the concept of quasi-convexity, strict quasi-convexity and strong quasi-convexity, and briefly discuss general properties of the minimization problem involving the objective function with such properties.

Definition 2.5 Let f: IRn→ IR be a proper function. Then, f is called quasi-convex if for all x, y ∈ dom f and β ∈ (0, 1), there always holds

f(βx + (1 − β)y) ≤ max{ f (x), f (y)}.

It can be proved that any convex function is also quasi-convex, but the converse is not true.

For a quasi-convex function, we have the following important property.

Proposition 2.6 The proper function $f: \mathbb{R}^n \to \mathbb{R}$ is quasi-convex if and only if the level sets $L_f(\alpha) := \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$ are convex for every $\alpha \in \mathbb{R}$.
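The gap between convexity and quasi-convexity in Definition 2.5 can be checked numerically on a toy example. The sketch below uses $f(x) = \sqrt{|x|}$, an illustrative choice that is not taken from the paper: its level sets are intervals, so it is quasi-convex, yet it is not convex. Random sampling can only refute a property, never prove it, so this is a sanity check rather than a proof.

```python
import math
import random

def f(x):
    """Illustrative quasi-convex, non-convex function (an assumption of this sketch)."""
    return math.sqrt(abs(x))

def violates_quasi_convexity(f, x, y, beta, tol=1e-12):
    """True if f(beta*x + (1-beta)*y) > max{f(x), f(y)} beyond rounding."""
    z = beta * x + (1.0 - beta) * y
    return f(z) > max(f(x), f(y)) + tol

def violates_convexity(f, x, y, beta, tol=1e-12):
    """True if f(beta*x + (1-beta)*y) > beta*f(x) + (1-beta)*f(y) beyond rounding."""
    z = beta * x + (1.0 - beta) * y
    return f(z) > beta * f(x) + (1.0 - beta) * f(y) + tol

random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(0.01, 0.99))
           for _ in range(10000)]
quasi_ok = not any(violates_quasi_convexity(f, *s) for s in samples)
convex_ok = not any(violates_convexity(f, *s) for s in samples)
print("no quasi-convexity violation found:", quasi_ok)
print("no convexity violation found:", convex_ok)
```

A concrete convexity violation is $x = 0$, $y = 4$, $\beta = 1/2$, where $f(2) = \sqrt{2} > 1 = \tfrac12 f(0) + \tfrac12 f(4)$.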

Definition 2.7 Let $f: \mathbb{R}^n \to \mathbb{R}$ be a proper function. Then $f$ is called strictly quasi-convex if, for all $x, y \in \operatorname{dom} f$ with $f(x) \ne f(y)$, there always holds

\[ f(\beta x + (1 - \beta) y) < \max\{ f(x), f(y) \} \quad \text{for all } \beta \in (0, 1). \]

By [3, Lemma 3.5.7], if $f$ is lower semicontinuous and strictly quasi-convex, then $f$ is quasi-convex. For a strictly quasi-convex function, we have the following important result, which implies that every local optimal solution of (5) is also a global optimal solution.

Proposition 2.8 [3, Theorem 3.5.6] Let $f: \mathbb{R}^n \to \mathbb{R}$ be a proper strictly quasi-convex function. Consider the problem of minimizing $f(x)$ subject to $x \in C$, where $C$ is a nonempty convex set in $\mathbb{R}^n$. If $\bar x$ is a local optimal solution, then $\bar x$ is also a global optimal solution.

Definition 2.9 Let $f: \mathbb{R}^n \to \mathbb{R}$ be a proper function. Then $f$ is called strongly quasi-convex if, for all $x, y \in \operatorname{dom} f$ with $x \ne y$, there always holds

\[ f(\beta x + (1 - \beta) y) < \max\{ f(x), f(y) \} \quad \text{for all } \beta \in (0, 1). \]

It can be shown that every strongly quasi-convex function is strictly quasi-convex, and that every strongly quasi-convex function is quasi-convex even without a semicontinuity assumption.

When $f(x)$ is strongly quasi-convex, the problem (5) has a unique global optimal solution.

3 Distance-like function $d_\varphi$ and its properties

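In this section, we present the definition of the kernel $\varphi$ and investigate the properties of the bivariate function $d_\varphi$ induced by $\varphi$ via formula (4). We start with the assumptions on the function $\phi$ needed to define the kernel $\varphi$. Let $\phi: \mathbb{R} \to (-\infty, +\infty]$ be a closed proper convex function with $\operatorname{dom}\phi \ne \emptyset$ and $\operatorname{dom}\phi \subseteq [0, +\infty)$. We assume that

(i) $\phi$ is twice continuously differentiable on $\operatorname{int}(\operatorname{dom}\phi) = (0, +\infty)$;

(ii) $\phi$ is strictly convex on its domain;

(iii) $\lim_{t \to 0^+} \phi'(t) = -\infty$;

(iv) $\phi(1) = \phi'(1) = 0$ and $\phi''(1) > 0$.

In the rest of this paper, we denote by $\Phi$ the class of functions satisfying (i)–(iv).

We define the following two subclasses of $\Phi$:

\[ \Phi_1 = \left\{ \phi \in \Phi : \phi''(1)\left(1 - \frac{1}{t}\right) \le \phi'(t) \le \phi''(1) \ln t, \ \forall t > 0 \right\} \tag{8} \]

and

\[ \Phi_2 = \left\{ \phi \in \Phi : \phi''(1)\left(1 - \frac{1}{t}\right) \le \phi'(t) \le \phi''(1)(t - 1), \ \forall t > 0 \right\}. \tag{9} \]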

Since $\ln t \le t - 1$ for any $t > 0$ and $\phi''(1) > 0$, clearly $\Phi_1 \subseteq \Phi_2 \subseteq \Phi$. The assumptions on $\Phi_1$ and $\Phi_2$ are very mild. It is not hard to verify that the following functions

\[ \begin{aligned} \phi_1(t) &= t \ln t - t + 1, & \operatorname{dom}\phi_1 &= [0, +\infty), \\ \phi_2(t) &= -\ln t + t - 1, & \operatorname{dom}\phi_2 &= (0, +\infty), \\ \phi_3(t) &= (\sqrt{t} - 1)^2, & \operatorname{dom}\phi_3 &= [0, +\infty) \end{aligned} \]

are all in $\Phi_1$, and consequently belong to $\Phi_2$. The first example $\phi_1$ plays an important role in the convergence analysis of our first class of algorithms, which will be studied in the next section. As mentioned in the introduction, the $\phi$-divergence for $\phi = \phi_1$ is exactly the Kullback–Leibler entropy function, given by

\[ H(x, y) := d_{\phi_1}(x, y) = \sum_{j=1}^{n} \left[ x_j \ln(x_j / y_j) + y_j - x_j \right], \tag{10} \]

whose domain can be continuously extended to $\mathbb{R}^n_+ \times \mathbb{R}^n_{++}$ by using the convention $0 \ln 0 = 0$. The following lemma states some useful properties of $H(x, y)$; since their proofs are elementary by use of (10), we omit them.
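The claim that the three example kernels belong to $\Phi_1$ can be sanity-checked on a grid. The sketch below hard-codes their first derivatives and the values of the second derivative at $t = 1$ (namely $1$, $1$ and $1/2$, obtained by direct differentiation) and tests the two-sided bound in (8). The helper names, the grid and the tolerance are assumptions of this sketch; a grid check is of course not a proof.

```python
import math

# (first derivative, second derivative at t = 1) for the three example kernels
KERNELS = {
    "phi1": (lambda t: math.log(t), 1.0),                # t ln t - t + 1
    "phi2": (lambda t: 1.0 - 1.0 / t, 1.0),              # -ln t + t - 1
    "phi3": (lambda t: 1.0 - 1.0 / math.sqrt(t), 0.5),   # (sqrt(t) - 1)^2
}

def satisfies_bounds_8(dphi, d2phi_at_1, grid, tol=1e-9):
    """Check phi''(1)(1 - 1/t) <= phi'(t) <= phi''(1) ln t on the given grid."""
    return all(
        d2phi_at_1 * (1.0 - 1.0 / t) - tol <= dphi(t) <= d2phi_at_1 * math.log(t) + tol
        for t in grid
    )

# Logarithmic grid covering t from 1e-4 to 1e4.
grid = [10.0 ** (k / 50.0) for k in range(-200, 201)]
results = {name: satisfies_bounds_8(d, c, grid) for name, (d, c) in KERNELS.items()}
print(results)
```

For $\phi_1$ the upper bound in (8) holds with equality and for $\phi_2$ the lower bound does, which is why a small tolerance is used.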

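Lemma 3.1 Let $H(\cdot, \cdot)$ be defined as in (10). Then, we have the following results.

(a) The level sets of $H(x, \cdot)$ are bounded for all $x \in \mathbb{R}^n_+$.

(b) If $\{y^k\} \subset \mathbb{R}^n_{++}$ converges to $y \in \mathbb{R}^n_+$, then $\lim_{k \to +\infty} H(y, y^k) = 0$.

(c) If $\{z^k\} \subset \mathbb{R}^n_+$ and $\{y^k\} \subset \mathbb{R}^n_{++}$ are sequences such that $\{z^k\}$ is bounded, $\lim_{k \to +\infty} y^k = y$ and $\lim_{k \to +\infty} H(z^k, y^k) = 0$, then $\lim_{k \to +\infty} z^k = y$.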

With the above assumptions on $\phi$, we now give the definition of the kernel $\varphi$ involved in the function $d_\varphi$. Given $\phi \in \Phi$ and the parameters $\mu > 0$ and $\nu \ge 0$, let $\varphi: \mathbb{R} \to (-\infty, +\infty]$ be the closed proper convex function defined by

\[ \varphi(t) := \mu\,\phi(t) + \frac{\nu}{2}(t - 1)^2. \tag{11} \]

It is not difficult to verify that $\varphi$ satisfies the properties listed in (i)–(iv), and consequently $\varphi \in \Phi$. In particular, $\varphi$ is strongly convex on its domain if $\nu > 0$. This implies that the objective function of the subproblem (6), i.e., $f(x) + \lambda \sum_{i=1}^{n} (x_i^k)^2 \varphi(x_i / x_i^k)$, will be strictly convex on $\mathbb{R}^n_{++}$ if the parameter $\lambda$ is set sufficiently large, although $f(x)$ itself is only quasi-convex. That is to say, the proximal term $d_\varphi(\cdot, \cdot)$ plays a convexification role in the quasi-convex subproblem (6), and this role becomes stronger as the parameter $\lambda$ increases. In fact, from the computational results in Sect. 5, we may see that the proximal term convexifies the quasi-convex function $f(x)$ well, even for a very small $\lambda$.

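In what follows, we concentrate on the properties of the bivariate function $d_\varphi$.

Lemma 3.2 Given $\phi \in \Phi$ and the parameters $\mu > 0$, $\nu \ge 0$, let $\varphi$ be the kernel defined by (11) and $d_\varphi(\cdot, \cdot)$ be the function induced by $\varphi$ via formula (4). Then,

(a) $d_\varphi$ is a homogeneous function of order 2, i.e., $d_\varphi(\alpha x, \alpha y) = \alpha^2 d_\varphi(x, y)$ for all $\alpha > 0$.

(b) For a fixed $y \in \mathbb{R}^n_{++}$, the function $d_\varphi(\cdot, y)$ is strictly convex over $\mathbb{R}^n_{++}$. If, in addition, $\nu > 0$, then $d_\varphi(\cdot, y)$ is strongly convex on $\mathbb{R}^n_{++}$.

(c) For any $(x, y) \in \mathbb{R}^n_{++} \times \mathbb{R}^n_{++}$, $d_\varphi(x, y) \ge 0$, and $d_\varphi(x, y) = 0$ if and only if $x = y$.

(d) For any fixed $z \in \mathbb{R}^n_{++}$, the level sets $L(z, \gamma) := \{x \in \mathbb{R}^n_{++} : d_\varphi(x, z) \le \gamma\}$ are bounded for all $\gamma \ge 0$.

(e) If $\phi \in \Phi_1$ or $\Phi_2$, and $\{y^k\}_{k\in N} \subseteq \mathbb{R}^n_{++}$ converges to $\bar y \in \mathbb{R}^n_+$, then for any fixed $x \in \mathbb{R}^n_{++}$, the sequence $\{d_\varphi(x, y^k)\}_{k\in N}$ is bounded.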
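Parts (a) and (c) of Lemma 3.2 are easy to observe numerically. The following sketch builds the kernel (11) from $\phi(t) = t\ln t - t + 1$ with the illustrative parameter values $\mu = 1$, $\nu = 2$ (assumptions of this sketch, as are the sample vectors) and checks second-order homogeneity and non-negativity of $d_\varphi$.

```python
import math

def kernel(t, mu=1.0, nu=2.0):
    """Kernel (11) with phi(t) = t ln t - t + 1, evaluated for t > 0."""
    return mu * (t * math.log(t) - t + 1.0) + 0.5 * nu * (t - 1.0) ** 2

def d_phi(x, y, mu=1.0, nu=2.0):
    """Distance-like function (4): sum_i y_i^2 * kernel(x_i / y_i), for x, y > 0."""
    return sum(yi * yi * kernel(xi / yi, mu, nu) for xi, yi in zip(x, y))

# Hypothetical positive sample points and scaling factor.
x = [0.3, 1.2, 4.0]
y = [0.7, 1.0, 2.5]
alpha = 3.0

lhs = d_phi([alpha * v for v in x], [alpha * v for v in y])  # d(alpha x, alpha y)
rhs = alpha ** 2 * d_phi(x, y)                               # alpha^2 d(x, y)
print("homogeneity gap:", abs(lhs - rhs))
print("d(x, y) =", d_phi(x, y), " d(y, y) =", d_phi(y, y))
```

Homogeneity holds here because the ratios $x_i/y_i$ are unchanged by the scaling while each weight $y_i^2$ is multiplied by $\alpha^2$.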

Proof The properties in (a) and (b) are clear from the definition of $d_\varphi$ given by (4).

(c) Note that $\varphi(t)$ is strictly convex and, moreover, $\varphi'(1) = \mu\phi'(1) = 0$ due to (iv). Hence, $\varphi(t) \ge \varphi(1) = 0$, and $\varphi(t) = 0$ if and only if $t = 1$.

This implies that $d_\varphi(x, y) \ge 0$ for all $(x, y) \in \mathbb{R}^n_{++} \times \mathbb{R}^n_{++}$, and $d_\varphi(x, y) = 0$ if and only if $x = y$.

(d) To prove the result, it is enough to consider the one-dimensional case, i.e., to show that $h_\zeta(t) := \zeta^2 \varphi(t/\zeta)$ with $\zeta > 0$ has bounded level sets, which in turn is equivalent to showing that $\varphi$ has bounded level sets. Note that $\{t : \varphi(t) \le 0\} = \{1\}$. Therefore, the conclusion follows from [18, Corollary 8.7.1].

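(e) From the definitions of $\varphi$ and $d_\varphi$, we have that

\[ d_\varphi(x, y^k) = \sum_{i=1}^{n} \left[ \mu (y_i^k)^2 \phi\!\left(\frac{x_i}{y_i^k}\right) + \frac{\nu}{2} (y_i^k)^2 \left(\frac{x_i}{y_i^k} - 1\right)^2 \right] = \sum_{i=1}^{n} \left[ \mu (y_i^k)^2 \phi(x_i / y_i^k) + \frac{\nu}{2} (x_i - y_i^k)^2 \right]. \]

If $\phi(t)$ is bounded above for any $t > 0$, then the conclusion is obvious. Otherwise, we discuss the following two cases:

Case (1) $\bar y_i > 0$ for each $i \in \{1, 2, \ldots, n\}$. Since $y_i^k \to \bar y_i$ for each $i$, the proof follows directly from the continuity of $\phi$.

Case (2) There exists an index $i_0 \in \{1, 2, \ldots, n\}$ such that $\bar y_{i_0} = 0$. By the given assumptions and Case (1), it suffices to prove that the sequence $\{(y_{i_0}^k)^2 \phi(x_{i_0} / y_{i_0}^k)\}$ is bounded above. For any $k \in N$, using the convexity of $\phi$ and the fact that $\phi(1) = 0$, we have that

\[ 0 \ge \phi(x_{i_0} / y_{i_0}^k) + \phi'(x_{i_0} / y_{i_0}^k) \left( 1 - x_{i_0} / y_{i_0}^k \right). \]

Multiplying the inequality by $(y_{i_0}^k)^2$ readily yields that

\[ (y_{i_0}^k)^2 \phi(x_{i_0} / y_{i_0}^k) \le (y_{i_0}^k)^2 \phi'(x_{i_0} / y_{i_0}^k) \left( x_{i_0} / y_{i_0}^k - 1 \right) = \left( y_{i_0}^k \phi'(x_{i_0} / y_{i_0}^k) \right) \left( x_{i_0} - y_{i_0}^k \right). \]

If $\phi \in \Phi_2$, then it follows from (9) that

\[ \phi''(1)\, y_{i_0}^k \left( 1 - y_{i_0}^k / x_{i_0} \right) \le y_{i_0}^k \phi'(x_{i_0} / y_{i_0}^k) \le \phi''(1) \left( x_{i_0} - y_{i_0}^k \right). \]

Combining the last two inequalities, and distinguishing the sign of $x_{i_0} - y_{i_0}^k$, immediately gives that

\[ (y_{i_0}^k)^2 \phi(x_{i_0} / y_{i_0}^k) \le \max\left\{ \phi''(1) \left( x_{i_0} - y_{i_0}^k \right)^2, \ \phi''(1) \frac{y_{i_0}^k}{x_{i_0}} \left( x_{i_0} - y_{i_0}^k \right)^2 \right\}. \]

This together with the given assumptions shows that $\{(y_{i_0}^k)^2 \phi(x_{i_0} / y_{i_0}^k)\}$ is bounded above for any $\phi \in \Phi_2$, and consequently the sequence $\{d_\varphi(x, y^k)\}_{k\in N}$ is bounded. Noting that $\Phi_1 \subseteq \Phi_2$, the sequence $\{d_\varphi(x, y^k)\}_{k\in N}$ is also bounded for $\phi \in \Phi_1$. □

Lemma 3.2(a)–(c) state that $d_\varphi$ defined by (4) is a convex second-order homogeneous distance-like function. Thus, in analogy with the Euclidean distance, we can define the projection of a point $y$, denoted by $\hat x(y)$, onto a closed convex set $S \subseteq \mathbb{R}^n$ with respect to $d_\varphi$, which is characterized as the solution of the following problem:

\[ \inf \left\{ d_\varphi(x, y) : x \in S \right\}. \tag{12} \]

The existence of $\hat x(y)$ is guaranteed by Lemma 3.2(d). For this projection, we have the following result, similar to that for the Euclidean projection.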

Lemma 3.3 Let $S$ be a closed convex subset of $\mathbb{R}^n$ and $y \in \mathbb{R}^n$ be a point not in $S$. Then $\hat x(y)$ is the projection of $y$ on $S$ with respect to $d_\varphi$ if and only if

\[ \left\langle x - \hat x(y), \ -\nabla_1 d_\varphi(\hat x(y), y) \right\rangle \le 0, \quad \forall x \in S. \tag{13} \]

Proof Note that problem (12) is equivalent to $\inf\{d_\varphi(x, y) + \delta(x \mid S) : x \in \mathbb{R}^n\}$, where $\delta(\cdot \mid S)$ denotes the indicator function of the set $S$. By [19, Theorem 27.4], $\hat x(y)$ solves this unconstrained optimization problem if and only if the inequality (13) holds. Thus, the proof is completed. □

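Finally, we present a favorable property of $d_\varphi$ with $\phi \in \Phi_1$ or $\Phi_2$, which will play a crucial role in the convergence analysis of the algorithms to be studied in the next section.

Lemma 3.4 Given $\phi \in \Phi$ and the parameters $\mu > 0$, $\nu \ge 0$, let $\varphi$ be the kernel defined as in (11). Then, for any $a, b \in \mathbb{R}^n_{++}$ and $c \in \mathbb{R}^n_+$, we have the following results:

(a) If $\nu = 0$ and $\phi \in \Phi_1$, then $\langle c - b, \nabla_1 d_\varphi(b, a) \rangle \le \mu \phi''(1) \max_{1 \le j \le n}\{a_j\} \left[ H(c, a) - H(c, b) \right]$.

(b) If $\nu \ge \mu \phi''(1) > 0$ and $\phi \in \Phi_2$, then $\langle c - b, \nabla_1 d_\varphi(b, a) \rangle \le \theta \left( \|c - a\|^2 - \|c - b\|^2 \right)$ with $\theta = (\nu + \mu \phi''(1))/2$.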

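Proof (a) Since $\phi \in \Phi_1$, we have from (8) that

\[ \phi'(t) \le \phi''(1) \ln t \quad \text{for any } t > 0. \]

Setting $t = b_j / a_j$ in the inequality and multiplying by $c_j a_j \ge 0$, we obtain that

\[ c_j a_j \phi'(b_j / a_j) \le c_j a_j \phi''(1) \ln(b_j / a_j), \quad j = 1, 2, \ldots, n. \tag{14} \]

On the other hand, it follows from (8) that

\[ -\phi'(t) \le -\phi''(1)(1 - 1/t), \quad \forall t > 0. \]

Substituting $t = b_j / a_j$ into the inequality and multiplying by $a_j b_j$ gives

\[ -b_j a_j \phi'(b_j / a_j) \le a_j \phi''(1)(a_j - b_j), \quad j = 1, 2, \ldots, n. \tag{15} \]

Define

\[ \Psi(a, b) := \left( a_1 \phi'(b_1 / a_1), \ldots, a_n \phi'(b_n / a_n) \right)^T, \quad \forall a, b \in \mathbb{R}^n_{++}. \]

Then, adding the inequalities (14) and (15) and summing over $j = 1, \ldots, n$ gives

\[ \begin{aligned} \langle c - b, \Psi(a, b) \rangle &\le \phi''(1) \sum_{j=1}^{n} a_j \left[ c_j \ln(b_j / a_j) + a_j - b_j \right] \\ &\le \phi''(1) \max_{1 \le j \le n}\{a_j\} \sum_{j=1}^{n} \left[ c_j \ln(b_j / a_j) + a_j - b_j \right] \\ &= \phi''(1) \max_{1 \le j \le n}\{a_j\} \left[ H(c, a) - H(c, b) \right]. \end{aligned} \]

Note that $\nabla_1 d_\varphi(b, a) = \mu \Psi(a, b)$ when $\nu = 0$, and hence we obtain the result from the last inequality.

(b) The proof is similar to [2, Lemma 3.4]. For completeness, we include it here. Since $\phi \in \Phi_2$, the inequality (15) still holds. On the other hand, we have from (9) that

\[ \phi'(t) \le \phi''(1)(t - 1), \quad \forall t > 0. \]

Substituting $t = b_j / a_j$ into the above inequality leads to

\[ c_j a_j \phi'(b_j / a_j) \le c_j a_j \phi''(1)(b_j / a_j - 1) = \phi''(1) c_j (b_j - a_j), \quad j = 1, 2, \ldots, n. \tag{16} \]

Adding the two inequalities (15) and (16), summing over $j = 1, 2, \ldots, n$, and using the definition of $\Psi(a, b)$, we obtain

\[ \langle c - b, \Psi(a, b) \rangle \le \phi''(1) \sum_{j=1}^{n} \left[ c_j (b_j - a_j) + a_j (a_j - b_j) \right] = \phi''(1) \langle c - a, b - a \rangle. \]

Note that $\nabla_1 d_\varphi(b, a) = \mu \Psi(a, b) + \nu (b - a)$. Then, the last inequality implies that

\[ \langle c - b, \nabla_1 d_\varphi(b, a) \rangle \le \mu \phi''(1) \langle c - a, b - a \rangle + \nu \langle c - b, b - a \rangle. \tag{17} \]

Using the identities

\[ \langle c - a, b - a \rangle = \tfrac{1}{2} \left( \|c - a\|^2 - \|c - b\|^2 + \|b - a\|^2 \right) \]

and

\[ \langle c - b, b - a \rangle = \tfrac{1}{2} \left( \|c - a\|^2 - \|c - b\|^2 - \|b - a\|^2 \right), \]

we then obtain from (17) that

\[ \begin{aligned} \langle c - b, \nabla_1 d_\varphi(b, a) \rangle &\le \theta \left( \|c - a\|^2 - \|c - b\|^2 \right) - \tfrac{1}{2} \left( \nu - \mu \phi''(1) \right) \|b - a\|^2 \\ &\le \theta \left( \|c - a\|^2 - \|c - b\|^2 \right), \end{aligned} \]

where the second inequality is due to $\nu \ge \mu \phi''(1)$. Thus, the proof is completed. □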
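The inequality in Lemma 3.4(b) lends itself to a quick randomized sanity check. The sketch below takes $\phi(t) = t\ln t - t + 1$ (so $\phi''(1) = 1$) with the illustrative values $\mu = 1$, $\nu = 1.5$, which satisfy $\nu \ge \mu\phi''(1)$, and uses the gradient formula $\nabla_1 d_\varphi(b, a)_j = \mu a_j \ln(b_j/a_j) + \nu(b_j - a_j)$ derived in the proof. All names, parameter values and sampling ranges are assumptions of this sketch.

```python
import math
import random

MU, NU = 1.0, 1.5   # illustrative values with NU >= MU * phi''(1), where phi''(1) = 1

def grad1_d_phi(b, a):
    """Gradient of d_phi(., a) at b for the kernel MU*(t ln t - t + 1) + (NU/2)(t - 1)^2."""
    return [MU * aj * math.log(bj / aj) + NU * (bj - aj) for aj, bj in zip(a, b)]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def gap_34b(a, b, c):
    """Right-hand side minus left-hand side of Lemma 3.4(b); expected to be >= 0."""
    theta = 0.5 * (NU + MU * 1.0)
    lhs = sum((cj - bj) * gj for cj, bj, gj in zip(c, b, grad1_d_phi(b, a)))
    return theta * (sq_dist(c, a) - sq_dist(c, b)) - lhs

random.seed(1)
n = 4
worst = min(
    gap_34b([random.uniform(0.1, 5.0) for _ in range(n)],   # a > 0
            [random.uniform(0.1, 5.0) for _ in range(n)],   # b > 0
            [random.uniform(0.0, 5.0) for _ in range(n)])   # c >= 0
    for _ in range(5000)
)
print("smallest observed gap:", worst)
```

The gap vanishes when $b = a$, since both sides of the inequality are then zero.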

4 Interior proximal-like methods

In this section, we consider two classes of proximal-like algorithms based on the second-order homogeneous function $d_\varphi$ for the quasi-convex optimization problem (5). The two kinds of algorithms are described as follows, where the RIPM was first proposed by Auslender et al. [2] for convex minimization problems subject to non-negative constraints.

Interior Proximal Method (IPM) Let $\varphi$ be defined as in (11) with $\mu > 0$, $\nu = 0$ and $\phi \in \Phi_1$. Generate the sequence $\{x^k\}_{k\in N}$ by the iterative scheme (6).

Regularized Interior Proximal Method (RIPM) Let $\varphi$ be defined as in (11) with $\nu \ge \mu \phi''(1) > 0$ and $\phi \in \Phi_2$. Generate the sequence $\{x^k\}_{k\in N}$ by the iterative scheme (6).
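To make the iteration concrete, here is a minimal one-dimensional sketch of RIPM under several assumptions that are not from the paper: the test objective $f(x) = 1 - e^{-(x-2)^2}$ (an increasing function composed with a convex quadratic, hence quasi-convex but not convex), the kernel built from $\phi(t) = t\ln t - t + 1$ with $\mu = \nu = 1$, a constant $\lambda_k = 0.1$, and a crude subproblem solver (grid search followed by golden-section refinement) standing in for an exact global minimization.

```python
import math

MU, NU = 1.0, 1.0   # RIPM needs NU >= MU * phi''(1); here phi''(1) = 1

def f(x):
    """Quasi-convex, non-convex test objective (an assumption of this sketch)."""
    return 1.0 - math.exp(-(x - 2.0) ** 2)

def d_phi(x, y):
    """Scalar version of (4) with kernel MU*(t ln t - t + 1) + (NU/2)(t - 1)^2."""
    t = x / y
    return y * y * (MU * (t * math.log(t) - t + 1.0) + 0.5 * NU * (t - 1.0) ** 2)

def golden_section(g, lo, hi, tol=1e-10):
    """Minimize g on [lo, hi], assuming g is unimodal there."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - r * (b - a), a + r * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def prox_step(xk, lam, upper=10.0, m=2000):
    """Approximate a global minimizer of f(x) + lam * d_phi(x, xk) over (0, upper]."""
    F = lambda x: f(x) + lam * d_phi(x, xk)
    grid = [upper * (i + 1) / m for i in range(m)]
    i = min(range(m), key=lambda j: F(grid[j]))
    x_new = golden_section(F, grid[max(i - 1, 0)], grid[min(i + 1, m - 1)])
    return x_new if F(x_new) <= F(xk) else xk   # never accept an increase of F

def ripm(x0, lam=0.1, iters=30):
    """Run the iteration (6) from x0 > 0 with a constant proximal parameter."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox_step(xs[-1], lam))
    return xs

iterates = ripm(0.5)
values = [f(x) for x in iterates]
print("final iterate:", iterates[-1], " final value:", values[-1])
```

In this sketch the iterates stay positive, the objective values do not increase, and the sequence approaches the minimizer $x = 2$. In higher dimensions the grid search would have to be replaced by a proper solver for the subproblem.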

To establish the convergence of IPM and RIPM, throughout this section we make the following assumptions for the quasi-convex optimization problem (5):

(A1) $\operatorname{dom} f \cap \mathbb{R}^n_{++} \ne \emptyset$.

(A2) The optimal set of problem (5), denoted by $X^*$, is nonempty and bounded.

In what follows, we concentrate on the convergence of IPM and RIPM. We first prove that they are well defined, which is a direct consequence of the following lemma.

Lemma 4.1 Given $\mu > 0$, $\nu \ge 0$ and $\phi \in \Phi$, let $\varphi$ and $d_\varphi$ be defined as in (11) and (4), respectively. Then, under assumptions (A1) and (A2), the sequence $\{x^k\}_{k\in N}$ generated by the iterative scheme (6) is well defined.

Proof The proof proceeds by induction. Clearly, when $k = 0$, the conclusion holds due to (6). Assume that $x^k$ is well defined. Let $f^*$ be the optimal value of problem (5); then

\[ f(x) + \lambda_k d_\varphi(x, x^k) \ge f^* + \lambda_k d_\varphi(x, x^k) \quad \text{for all } x \in \mathbb{R}^n_{++}. \tag{18} \]

Let $F_k(x) := f(x) + \lambda_k d_\varphi(x, x^k)$ and denote its level sets by

\[ L_{F_k}(\gamma) := \{x \in \mathbb{R}^n_{++} : F_k(x) \le \gamma\} \quad \text{for all } \gamma \in \mathbb{R}. \]

Then, the inequality (18) implies that $L_{F_k}(\gamma) \subseteq L(x^k, \lambda_k^{-1}(\gamma - f^*))$. By Lemma 3.2(d), the level sets $L(x^k, \lambda_k^{-1}(\gamma - f^*))$ are bounded for any $\gamma \ge f^*$, and consequently the sets $L_{F_k}(\gamma)$ are bounded for any $\gamma \ge f^*$. For any $\gamma \le f^*$, we have $L_{F_k}(\gamma) \subseteq X^*$, which is bounded due to assumption (A2). The two cases together show that the level sets of the function $F_k(x)$ are bounded. Also, $F_k(x)$ is lower semicontinuous on $\operatorname{dom} f$. Hence, the level sets of $F_k(x)$ are compact. Now, using the lower semicontinuity of $F_k(x)$ and the compactness of its level sets, we have that $F_k(x)$ has a global minimum, which may not be unique due to the nonconvexity of $f$. In such a case, $x^{k+1}$ can be chosen arbitrarily among the set of minimizers of $F_k(x)$. □

Next, we investigate the properties of the sequence $\{x^k\}_{k\in N}$ generated by IPM and RIPM. To this end, we define the following set:

\[ U := \left\{ x \in \mathbb{R}^n_+ \ \middle|\ f(x) \le \inf_{k \in N} f(x^k) \right\}. \]

From assumptions (A1)–(A2) and Proposition 2.6, $U$ is a nonempty closed convex set.

Lemma 4.2 Let $\{\lambda_k\}_{k\in N}$ be an arbitrary sequence of positive numbers and $\{x^k\}_{k\in N}$ be the sequence generated by IPM. Then, under assumptions (A1)–(A2),

(a) $\{f(x^k)\}_{k\in N}$ is a decreasing and convergent sequence.

(b) $\{x^k\}_{k\in N}$ is Fejér convergent to the set $U$ with respect to $H$.

(c) For all $x \in U$, the sequence $\{H(x, x^k)\}_{k\in N}$ is convergent.

Proof (a) From (6), $x^{k+1}$ is a global optimal solution of the following problem:

\[ \min_{x \ge 0} \left\{ f(x) + \lambda_k d_\varphi(x, x^k) \right\}, \]

and consequently, for any $x \in \mathbb{R}^n_+$, it follows that

\[ f(x^{k+1}) + \lambda_k d_\varphi(x^{k+1}, x^k) \le f(x) + \lambda_k d_\varphi(x, x^k). \tag{19} \]

Setting $x = x^k$ in (19), we then obtain that

\[ f(x^{k+1}) + \lambda_k d_\varphi(x^{k+1}, x^k) \le f(x^k) + \lambda_k d_\varphi(x^k, x^k) = f(x^k), \]

which means that

\[ 0 \le \lambda_k d_\varphi(x^{k+1}, x^k) \le f(x^k) - f(x^{k+1}). \]

Hence, $\{f(x^k)\}_{k\in N}$ is decreasing and, furthermore, convergent due to assumption (A2).

(b) Since $f(x) \le f(x^{k+1})$ for any $x \in U$, it follows from inequality (19) that for any $x \in U$,

\[ d_\varphi(x^{k+1}, x^k) \le d_\varphi(x, x^k). \]

This implies that $x^{k+1}$ is the unique projection of $x^k$ on $U$ with respect to $d_\varphi$. Therefore, by Lemma 3.3, we have that

\[ \left\langle x - x^{k+1}, \ -\nabla_1 d_\varphi(x^{k+1}, x^k) \right\rangle \le 0, \quad \forall x \in U. \tag{20} \]

On the other hand, applying Lemma 3.4(a) at the points $c = x$, $a = x^k$ and $b = x^{k+1}$, we obtain that

\[ H(x, x^k) - H(x, x^{k+1}) \ge \frac{\left\langle x - x^{k+1}, \nabla_1 d_\varphi(x^{k+1}, x^k) \right\rangle}{\mu \phi''(1) \max_{1 \le j \le n}\{x_j^k\}}. \tag{21} \]

Since $\mu \phi''(1) \max_{1 \le j \le n}\{x_j^k\} > 0$, using the inequalities (20) and (21) yields that

\[ H(x, x^k) \ge H(x, x^{k+1}), \quad \forall x \in U. \]

From Definition 2.1, it follows that $\{x^k\}_{k\in N}$ is Fejér convergent to $U$ with respect to $H$.

(c) The proof follows from part (b) and the non-negativity of $H$. □

Lemma 4.3 Let $\{\lambda_k\}_{k\in N}$ be an arbitrary sequence of positive numbers and $\{x^k\}_{k\in N}$ be the sequence generated by RIPM. Then, under assumptions (A1) and (A2),

(a) $\{f(x^k)\}_{k\in N}$ is a decreasing and convergent sequence.

(b) $\{x^k\}_{k\in N}$ is Fejér convergent to the set $U$.

(c) For all $x \in U$, the sequence $\{\|x - x^k\|\}_{k\in N}$ is convergent.

Proof (a) The proof is similar to that of Lemma 4.2(a), and we omit it here.

(b) By an argument similar to that of Lemma 4.2(b), we can obtain the inequality (20). On the other hand, applying Lemma 3.4(b) at the points $c = x$, $a = x^k$ and $b = x^{k+1}$ gives

\[ \left\langle x - x^{k+1}, \nabla_1 d_\varphi(x^{k+1}, x^k) \right\rangle \le \theta \left( \|x - x^k\|^2 - \|x - x^{k+1}\|^2 \right), \tag{22} \]

where $\theta = (\nu + \mu \phi''(1))/2$. Since $\theta > 0$, using the inequalities (20) and (22) yields that

\[ \|x - x^{k+1}\|^2 \le \|x - x^k\|^2, \quad \forall x \in U. \tag{23} \]

By Definition 2.1, we thus prove that $\{x^k\}_{k\in N}$ is Fejér convergent to the set $U$.

(c) The proof follows from part (b) and the non-negativity of $\|x - x^k\|$. □

So far, we have proved that the sequence $\{x^k\}_{k\in N}$ generated by IPM or RIPM is well defined and satisfies some favorable properties. With these properties, we next establish the convergence results of the proposed algorithms.

Proposition 4.4 Suppose that assumptions (A1) and (A2) are satisfied. Let $\{\lambda_k\}_{k\in N}$ be an arbitrary sequence of positive numbers and $\{x^k\}_{k\in N}$ be the sequence generated by IPM. Then, the sequence $\{x^k\}_{k\in N}$ converges, and furthermore,

(a) If there exist $\underline{\lambda}$ and $\bar\lambda$ such that $0 < \underline{\lambda} < \lambda_k \le \bar\lambda$ for any $k$, then

\[ \liminf_{k \to +\infty} g_i^k \ge 0, \qquad \lim_{k \to +\infty} g_i^k x_i^k = 0, \quad \forall i = 1, 2, \ldots, n, \tag{24} \]

where $g^k \in \hat\partial f(x^k)$ and $g_i^k$ is the $i$th component of $g^k$.

(b) If $\lim_{k \to +\infty} \lambda_k = 0$, then $\{x^k\}_{k\in N}$ converges to a solution of the problem (5).
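The two conditions in (24) can be read as an asymptotic form of stationarity for problem (5). As a short worked check, assuming $f$ is differentiable at $x^*$ and writing $g = \nabla f(x^*)$ (notation introduced here only for illustration), Definition 2.4(b) with $C = \mathbb{R}^n_+$ reduces to a sign condition plus complementarity:

```latex
\begin{align*}
&\text{Claim: } \langle g, x - x^* \rangle \ge 0 \ \ \forall x \ge 0
  \iff g_i \ge 0 \ \text{and}\ g_i x_i^* = 0,\ i = 1, \dots, n. \\
&(\Rightarrow)\ x = x^* + e_i \text{ gives } g_i \ge 0; \quad
  x = 0 \text{ and } x = 2x^* \text{ give } \langle g, x^* \rangle = 0, \\
&\qquad \text{and since every term } g_i x_i^* \ge 0, \text{ each term vanishes.} \\
&(\Leftarrow)\ \langle g, x - x^* \rangle
  = \sum_{i=1}^n g_i x_i - \sum_{i=1}^n g_i x_i^*
  = \sum_{i=1}^n g_i x_i \ge 0 \quad \text{for all } x \ge 0.
\end{align*}
```

Here $e_i$ denotes the $i$th unit vector. The relations in (24) are the limiting versions of these two conditions along the iterates.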

Proof We first prove that the sequence $\{x^k\}_{k\in N}$ converges. By Lemma 4.2(b), $\{x^k\}_{k\in N}$ is Fejér convergent to the set $U$ with respect to $H$, which implies that

\[ \{x^k\}_{k\in N} \subseteq \left\{ y \in \mathbb{R}^n_{++} \mid H(x, y) \le H(x, x^0) \right\} \quad \text{for all } x \in U. \]

As a consequence, $\{x^k\}_{k\in N}$ is bounded by Lemma 3.1(a). Thus, there exist an $\bar x$ and a subsequence $\{x^{k_j}\}$ of $\{x^k\}_{k\in N}$ converging to $\bar x$. From the lower semicontinuity of $f$,

\[ \lim_{j \to +\infty} f(x^{k_j}) \ge f(\bar x), \]

which, together with Lemma 4.2(a), implies that

\[ f(\bar x) \le f(x^k), \quad \forall k \in N. \]

This shows that $\bar x \in U$. By Lemma 4.2(c), the sequence $\{H(\bar x, x^k)\}_{k\in N}$ is then convergent.

In addition, from Lemma 3.1(b), we have $\lim_{j \to +\infty} H(\bar x, x^{k_j}) = 0$. From all the above, we conclude that $\{H(\bar x, x^k)\}_{k\in N}$ is a convergent sequence with a subsequence converging to 0, and consequently it must converge to 0 itself, i.e., $\lim_{k \to +\infty} H(\bar x, x^k) = 0$. Using Lemma 3.1(c) with $z^k = x^k$ and $y^k = \bar x$, we thus prove that $\{x^k\}_{k\in N}$ converges to $\bar x$.

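(a) From the iterative formula (6) and Lemma 2.3(d), we have that

\[ 0 \in \hat\partial \left( f + \lambda_k d_\varphi(\cdot, x^k) \right)(x^{k+1}). \]

Therefore, by Lemma 2.3(c), there exists $g^{k+1} \in \hat\partial f(x^{k+1})$ such that $\lambda_k \nabla_1 d_\varphi(x^{k+1}, x^k) = -g^{k+1}$, i.e.,

\[ \mu \lambda_k x_i^k \phi'\!\left( \frac{x_i^{k+1}}{x_i^k} \right) = -g_i^{k+1}, \quad i = 1, 2, \ldots, n. \tag{25} \]

Define the index sets

\[ I(\bar x) := \left\{ i \in \{1, 2, \ldots, n\} \mid \bar x_i > 0 \right\} \quad \text{and} \quad J(\bar x) := \left\{ i \in \{1, 2, \ldots, n\} \mid \bar x_i = 0 \right\}. \]

We next argue the conclusion by the two cases $i \in I(\bar x)$ and $i \in J(\bar x)$.

Case (1) $i \in I(\bar x)$. In this case, $\lim_{k \to +\infty} x_i^{k+1} / x_i^k = 1$ since $\{x^k\}_{k\in N}$ converges to $\bar x$. Using the continuity of $\phi'$ and $\phi'(1) = 0$, and recalling that $0 < \underline{\lambda} \le \lambda_k \le \bar\lambda$ for all $k$, we then obtain from (25) that

\[ \lim_{k \to +\infty} g_i^{k+1} = 0, \quad \forall i \in I(\bar x). \tag{26} \]

Case (2) $i \in J(\bar x)$. For every $i \in J(\bar x)$, we define the following two index sets:

\[ J_+^i = \left\{ k : x_i^{k+1} / x_i^k > 1 \right\} \quad \text{and} \quad J_-^i = \left\{ k : x_i^{k+1} / x_i^k \le 1 \right\}. \]

Since $\phi'(1) = 0$ and $\phi'$ is monotone increasing on its domain, we have from (25) that

\[ g_i^{k+1} \le 0 \quad \text{for all } k \in J_+^i, \ \forall i \in J(\bar x). \]

On the other hand, using (25) and the fact that $\phi \in \Phi_1 \subseteq \Phi_2$ yields that

\[ g_i^{k+1} \ge -\mu \phi''(1) \lambda_k x_i^k \left( \frac{x_i^{k+1}}{x_i^k} - 1 \right) \ge -\mu \phi''(1) \bar\lambda \left( x_i^{k+1} - x_i^k \right), \quad \forall k \in J_+^i. \]

Noting that $\lim_{k \to +\infty} (x_i^{k+1} - x_i^k) = 0$, the last two inequalities imply that

\[ \lim_{k \to +\infty,\ k \in J_+^i} g_i^{k+1} = 0, \quad \forall i \in J(\bar x). \tag{27} \]

Furthermore, since $\phi'(t) \le 0$ for any $0 < t \le 1$ by (ii) and (iv), we have from (25) that

\[ g_i^{k+1} \ge 0, \quad \forall k \in J_-^i, \ \forall i \in J(\bar x). \tag{28} \]

The relations (26)–(28) immediately imply the first part of (24), i.e.,

\[ \liminf_{k \to +\infty} g_i^k \ge 0, \quad \forall i = 1, 2, \ldots, n. \]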

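Next, let us prove the second part of (24). Using (26) and (27) and the fact that $\{x^k\}_{k\in N}$ converges to $\bar x$, we have only to prove that

\[ \lim_{k \to +\infty,\ k \in J_-^i} g_i^{k+1} x_i^{k+1} = 0, \quad \forall i \in J(\bar x). \]

Considering that

\[ \lim_{k \to +\infty,\ k \in J_-^i} x_i^{k+1} = 0, \quad \forall i \in J(\bar x), \]

and using the first part of (24), we then have only to prove that the subsequence $\{g_i^k\}_{k \in J_-^i}$ for each $i \in J(\bar x)$ is bounded above. Take $\varepsilon_0 > 0$ and $x \in \mathbb{R}^n_{++} \cap \operatorname{dom} f$ with $x_i > \varepsilon_0$ for any $i$. Then for $k \in J_-^i$ sufficiently large, we have

\[ x_i - x_i^{k+1} \ge \frac{\varepsilon_0}{2}, \quad \forall i \in J(\bar x). \tag{29} \]

From Definition 2.2, we have

\[ f(x) \ge f(x^{k+1}) + \sum_{i=1}^{n} g_i^{k+1} \left( x_i - x_i^{k+1} \right) + o(\|x - x^{k+1}\|), \tag{30} \]

which implies that the subsequence $\{g_i^k\}_{k \in J_-^i}$ is bounded above for $i \in J(\bar x)$. Indeed, suppose the contrary. Then there would exist an $i_0 \in J(\bar x)$ and a subsequence $\{g_{i_0}^{k_l}\}_{k_l \in J_-^{i_0}}$ (with $\lim_{l \to +\infty} k_l = +\infty$) such that

\[ \lim_{l \to +\infty} g_{i_0}^{k_l + 1} = +\infty, \qquad g_{i_0}^{k_l + 1} \ge 0. \]

Since the sequence $\{x^k\}_{k\in N}$ is convergent, using (27)–(29) gives that there exists $\eta \in \mathbb{R}$ such that for sufficiently large $l$,

\[ \sum_{i \ne i_0} g_i^{k_l + 1} \left( x_i - x_i^{k_l + 1} \right) + o(\|x - x^{k_l + 1}\|) \ge \eta. \]

Then, from (29) and (30), we obtain

\[ f(x) \ge f(x^{k_l + 1}) + \frac{\varepsilon_0}{2} g_{i_0}^{k_l + 1} + \eta. \]

Since $\lim_{l \to +\infty} f(x^{k_l + 1}) \ge f(\bar x)$ and $\lim_{l \to +\infty} g_{i_0}^{k_l + 1} = +\infty$, passing to the limit in the above inequality leads to a contradiction.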

(b) From the inequality (19) and the non-negativity of $d_\varphi$, it follows that

\[ f(x^{k+1}) \le f(x) + \lambda_k d_\varphi(x, x^k), \quad \forall x \in \mathbb{R}^n_{++}. \]

Taking the limit $k \to +\infty$ in the inequality and using $\lim_{k \to +\infty} \lambda_k = 0$, Lemma 3.2(e) and the lower semicontinuity of $f$, we then obtain that

\[ f(\bar x) \le f(x), \quad \forall x \in \mathbb{R}^n_{++}, \tag{31} \]

where $\bar x$ is such that $\lim_{k \to +\infty} x^k = \bar x$. This implies that $\bar x \in X^*$. □

Proposition 4.5 Suppose that assumptions (A1) and (A2) are satisfied. Let $\{\lambda_k\}_{k\in N}$ be an arbitrary sequence of positive numbers and $\{x^k\}_{k\in N}$ be the sequence generated by RIPM. Then, the sequence $\{x^k\}_{k\in N}$ converges, and furthermore,

(a) If there exist $\underline{\lambda}$ and $\bar\lambda$ such that $0 < \underline{\lambda} < \lambda_k \le \bar\lambda$ for any $k$, we have

\[ \liminf_{k \to +\infty} g_i^k \ge 0, \qquad \lim_{k \to +\infty} g_i^k x_i^k = 0, \quad \forall i = 1, 2, \ldots, n, \tag{32} \]

where $g_i^k$ is the same as in Proposition 4.4.

(b) If $\lim_{k \to +\infty} \lambda_k = 0$, then the sequence $\{x^k\}_{k\in N}$ converges to a solution of problem (5).

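Proof First, we prove that the sequence $\{x^k\}_{k\in N}$ converges. By Lemma 4.3(b), $\{x^k\}_{k\in N}$ is Fejér convergent to the set $U$, which implies that

\[ \{x^k\}_{k\in N} \subseteq \left\{ y \in \mathbb{R}^n \mid \|x - y\| \le \|x - x^0\| \right\} \quad \text{for all } x \in U. \]

Note that the latter set is bounded for any given $x \in \mathbb{R}^n$, and therefore the sequence $\{x^k\}_{k\in N}$ is bounded and there exist an $\hat x$ and a subsequence $\{x^{k_j}\}$ of $\{x^k\}_{k\in N}$ converging to $\hat x$. Using an argument similar to the first part of the proof of Proposition 4.4, we can prove that $\hat x \in U$. Thus, by Lemma 4.3(c), the sequence $\{\|x^k - \hat x\|\}_{k\in N}$ is convergent. Since $\{x^{k_j}\} \subset \mathbb{R}^n_{++}$ converges to $\hat x \in \mathbb{R}^n_+$, we have that $\|\hat x - x^{k_j}\| \to 0$, and consequently $\|\hat x - x^k\| \to 0$, which implies that the limit point is unique and $x^k \to \hat x$.

(a) From the iterative formula (6) and Lemma 2.3(d), we have

\[ 0 \in \hat\partial \left( f + \lambda_k d_\varphi(\cdot, x^k) \right)(x^{k+1}). \]

Therefore, there exists $g^{k+1} \in \hat\partial f(x^{k+1})$ such that $\lambda_k \nabla_1 d_\varphi(x^{k+1}, x^k) = -g^{k+1}$, i.e., for each $i = 1, 2, \ldots, n$,

\[ g_i^{k+1} = -\mu \lambda_k x_i^k \phi'(x_i^{k+1} / x_i^k) - \nu \lambda_k \left( x_i^{k+1} - x_i^k \right). \tag{33} \]

Since $\phi \in \Phi_2$, we have from (9) that for each $i = 1, 2, \ldots, n$,

\[ -\mu \lambda_k x_i^k \phi'\!\left( \frac{x_i^{k+1}}{x_i^k} \right) \ge -\mu \lambda_k \phi''(1) x_i^k \left( \frac{x_i^{k+1}}{x_i^k} - 1 \right) \ge -\mu \lambda_k \phi''(1) \left( x_i^{k+1} - x_i^k \right). \]

Combining (33) with the last inequality then yields that

\[ g_i^{k+1} \ge \lambda_k \left( \mu \phi''(1) + \nu \right) \left( x_i^k - x_i^{k+1} \right) = 2 \theta \lambda_k \left( x_i^k - x_i^{k+1} \right), \quad i = 1, 2, \ldots, n, \tag{34} \]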
