Entropy-like proximal algorithms based on a second-order homogeneous distance function for quasi-convex programming


DOI 10.1007/s10898-007-9156-y · ORIGINAL PAPER


Shaohua Pan · Jein-Shan Chen

Received: 13 August 2006 / Accepted: 12 March 2007 / Published online: 25 April 2007

Abstract We consider two classes of proximal-like algorithms for minimizing a proper lower semicontinuous quasi-convex function f(x) subject to non-negative constraints x ≥ 0.

The algorithms are based on an entropy-like second-order homogeneous distance function.

Under the assumption that the global minimizer set is nonempty and bounded, we prove the full convergence of the sequence generated by the algorithms, and furthermore obtain two important convergence results by imposing certain conditions on the proximal parameters. One is that the generated sequence converges to a stationary point if the proximal parameters are bounded and the problem is continuously differentiable; the other is that the generated sequence converges to a solution of the problem if the proximal parameters approach zero. Numerical experiments are reported for a class of quasi-convex optimization problems where the function f(x) is a composition of a convex quadratic function from IRⁿ to IR and a continuously differentiable increasing function from IR to IR, and the computational results indicate that these algorithms are very promising in finding a global optimal solution to these quasi-convex problems.

Keywords Proximal-like method · Entropy-like distance · Quasi-convex programming

Shaohua Pan's work was partially supported by the Doctoral Starting-up Foundation (05300161) of GuangDong Province.

Member of the Mathematics Division, National Center for Theoretical Sciences, Taipei Office. Jein-Shan Chen's work is partially supported by the National Science Council of Taiwan.

S. Pan

School of Mathematical Sciences, South China University of Technology, Guangzhou 510641, China e-mail: shhpan@scut.edu.cn

J.-S. Chen

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan e-mail: jschen@math.ntnu.edu.tw


1 Introduction

The proximal point algorithm for minimizing a convex function f(x) on IRⁿ generates a sequence {x^k}_{k∈N} ⊆ IRⁿ by the following iterative scheme:

$$x^{k+1} = \operatorname*{argmin}_{x \in \mathbb{R}^n} \left\{ f(x) + \lambda_k \|x - x^k\|^2 \right\}, \tag{1}$$

where {λ_k} is a sequence of positive numbers and ‖·‖ denotes the Euclidean norm in IRⁿ. This method, originally introduced by Martinet [15], is based on the Moreau proximal approximation of f (see [16]). The proximal point algorithm was then further developed and studied by Rockafellar [19,20]. Later, several researchers [4,5,7,12,14,23] proposed and studied nonquadratic proximal point algorithms by replacing the quadratic distance in (1) with a Bregman distance or an entropy-like distance. Among others, the entropy-like distance, also called ϕ-divergence, is defined by

$$d_\phi(x, y) = \sum_{i=1}^{n} y_i\, \phi(x_i / y_i), \tag{2}$$

where ϕ: IR → (−∞, +∞] is a closed proper strictly convex function satisfying certain conditions; see [12,13,23,24]. This class of distance-like functions was first proposed by Teboulle [23] in order to define entropy-like proximal maps. A popular choice of ϕ is ϕ(t) = t ln t − t + 1, for which the corresponding d_ϕ is exactly the well-known Kullback–Leibler entropy function from statistics [7,8,10,23]; this is where the "entropy" terminology stems from.
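As a small illustration of formula (2), the following sketch evaluates the ϕ-divergence for the kernel ϕ(t) = t ln t − t + 1 and compares it with the Kullback–Leibler expression Σ [x_i ln(x_i/y_i) + y_i − x_i]. The function names and the convention ϕ(0) = 1 (from 0 ln 0 = 0) are choices made here for illustration only.

```python
import math


def kl_kernel(t):
    """Kernel phi(t) = t ln t - t + 1, with phi(0) = 1 by the convention 0 ln 0 = 0."""
    if t < 0:
        return math.inf
    if t == 0:
        return 1.0
    return t * math.log(t) - t + 1.0


def phi_divergence(x, y, kernel=kl_kernel):
    """d_phi(x, y) = sum_i y_i * phi(x_i / y_i); requires y_i > 0 for every i."""
    return sum(yi * kernel(xi / yi) for xi, yi in zip(x, y))


def kullback_leibler(x, y):
    """Direct formula sum_i [x_i ln(x_i / y_i) + y_i - x_i] for x >= 0, y > 0."""
    total = 0.0
    for xi, yi in zip(x, y):
        log_term = xi * math.log(xi / yi) if xi > 0 else 0.0
        total += log_term + yi - xi
    return total
```

Both functions should agree up to rounding, and the divergence vanishes when x = y.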

The proximal-like algorithm based on the ϕ-divergence, originally designed for minimizing a convex function f(x) subject to non-negative constraints x ≥ 0, generates a sequence {x^k}_{k∈N} ⊆ IRⁿ₊₊ by the following iterative scheme:

$$x^0 > 0, \qquad x^{k+1} = \operatorname*{argmin}_{x \ge 0} \left\{ f(x) + \lambda_k d_\phi(x, x^k) \right\}. \tag{3}$$

This class of proximal-like algorithms has been studied extensively for convex programming; see [12,13,23,24] and references therein. In particular, the one with ϕ(t) = t ln t − t + 1 was recently extended to convex semidefinite programs [6] and to convex second-order cone programs in a recent manuscript of J.-S. Chen. In fact, the algorithm (3) associated with ϕ(t) = − ln t + t − 1 was first proposed by Eggermont [8]. It is worth pointing out that the fundamental difference between (1) and (3) is that the term d_ϕ(·, ·) in (3) forces the iterates {x^k}_{k∈N} to stay in IRⁿ₊₊, the interior of the non-negative orthant; that is, the algorithm (3) automatically generates a positive sequence {x^k}_{k∈N} ⊆ IRⁿ₊₊.
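To see why the iterates of (3) stay positive, consider the illustrative special case (an assumption made here, not an example from the paper) of a linear objective f(x) = cᵀx with c > 0 and the kernel ϕ(t) = t ln t − t + 1. Setting the gradient of the subproblem to zero gives c_i + λ_k ln(x_i/x_i^k) = 0, so the step has the closed form x_i^{k+1} = x_i^k exp(−c_i/λ_k). A minimal sketch:

```python
import math


def entropic_prox_step_linear(c, x_k, lam):
    """One step of scheme (3) for f(x) = <c, x> with the kernel t ln t - t + 1.

    Stationarity c_i + lam * ln(x_i / x_i^k) = 0 yields a multiplicative update,
    so a positive iterate is always mapped to a positive iterate.
    """
    return [xi * math.exp(-ci / lam) for ci, xi in zip(c, x_k)]


def run_entropic_prox(c, x0, lam, iters):
    """Iterate the closed-form step with a constant proximal parameter lam."""
    history = [list(x0)]
    for _ in range(iters):
        history.append(entropic_prox_step_linear(c, history[-1], lam))
    return history
```

With c > 0 the iterates approach the minimizer x = 0 of cᵀx over x ≥ 0 while remaining strictly positive at every step.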

In this paper, we focus on two classes of proximal-like algorithms of the form (3) but with a second-order homogeneous distance-like function d_φ given by

$$d_\varphi(x, y) = \sum_{i=1}^{n} y_i^2\, \varphi(x_i / y_i), \tag{4}$$

where the kernel φ is defined with two types of special ϕ and a quadratic function. The definition of φ and the properties of d_φ are given in Sect. 3. This class of algorithms has been studied for convex minimization (see [1,2,22]). In this paper, however, we employ these algorithms to solve the following quasi-convex minimization problem:

$$\min\ f(x) \quad \text{s.t.} \quad x \ge 0, \tag{5}$$

where f: IRⁿ → IR is a proper lower semicontinuous quasi-convex function. Since we do not require the convexity of f, the basic iterative scheme for the algorithms is as follows:

$$x^0 > 0, \qquad x^{k+1} \in \operatorname*{argmin}_{x \ge 0} \left\{ f(x) + \lambda_k d_\varphi(x, x^k) \right\}, \tag{6}$$

where λ_k is the same as before. The purpose of this paper is to establish the full convergence of the sequence {x^k}_{k∈N} generated by (6) under some mild assumptions for the quasi-convex problem (5), and to verify the effectiveness of the algorithms by numerical experiments.

Note that (5) is a special nonconvex optimization problem, and therefore the global optimization methods [11] developed for the general nonconvex optimization problem can be applied for solving it. Nevertheless, we should point out that the design of these global optimization methods is often far more complex than that of the proximal-like method (6).

The rest of this paper is organized as follows. In Sect. 2, we recall some definitions and basic results that will be used in the later sections. In Sect. 3, we present the definition of the kernel φ and investigate the properties of d_φ. Based on the entropy-like second-order homogeneous distance function d_φ, in Sect. 4 we propose two classes of proximal-like algorithms and prove the full convergence of the generated sequence. In Sect. 5, numerical experiments are reported with a specific d_φ for a class of continuously differentiable quasi-convex programming problems.

Unless otherwise stated, in this paper we use ⟨·, ·⟩ and ‖·‖ to denote the Euclidean inner product and the Euclidean norm in IRⁿ, and IRⁿ₊ to represent the non-negative orthant in IRⁿ, with interior IRⁿ₊₊. For a given differentiable function f: IRⁿ → IR, ∇f(x) denotes the gradient of f at x, while (∇f(x))_i means the i-th partial derivative of f with respect to x. In addition, we use ∇₁d_φ(x, y) to denote the partial derivative of d_φ with respect to its first argument.

2 Basic concepts

In this section, we recall some definitions and basic results which will be used in the subsequent analysis. We start with the definition of Fejér convergence for a sequence.

Definition 2.1 A sequence {y^k}_{k∈N} is Fejér convergent to a nonempty set U ⊆ IRⁿ with respect to a distance-like function d(·, ·) if, for every u ∈ U, we have d(u, y^{k+1}) ≤ d(u, y^k).

When d is the Euclidean distance, {y^k} is simply called Fejér convergent to U.
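Definition 2.1 can be checked numerically on a finite stretch of a sequence and a finite sample of U. The helper below is a sketch under exactly that assumption (it cannot certify the property for all of U or for all k); the names `is_fejer_monotone` and `euclidean` are introduced here for illustration.

```python
import math


def euclidean(u, y):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((ui - yi) ** 2 for ui, yi in zip(u, y)))


def is_fejer_monotone(sequence, sample_of_U, dist=euclidean, tol=1e-12):
    """Check d(u, y^{k+1}) <= d(u, y^k) for each sampled u and each consecutive pair."""
    for u in sample_of_U:
        for y_k, y_next in zip(sequence, sequence[1:]):
            if dist(u, y_next) > dist(u, y_k) + tol:
                return False
    return True
```

For instance, the sequence y^k = (2⁻ᵏ, 2⁻ᵏ) passes the check for U = {(0, 0)}, whereas a sequence moving away from the origin does not.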

Given an extended real-valued function f: IRⁿ → IR ∪ {+∞}, denote its domain by

dom f := {x ∈ IRⁿ : f(x) < +∞}

and its epigraph by

epi f := {(x, β) ∈ IRⁿ × IR : f(x) ≤ β}.

Then, f is said to be proper if dom f ≠ ∅ and f(x) > −∞ for any x ∈ dom f, and f is lower semicontinuous if epi f is a closed subset of IRⁿ × IR. We next recall the definition of the Fréchet subdifferential; see [18, Chapter 8] and [21, Chapter 10].


Definition 2.2 Let f: IRⁿ → IR ∪ {+∞} be a proper lower semicontinuous function. For each x ∈ dom f, the Fréchet subdifferential of f at x, denoted by ∂̂f(x), is the set of vectors s ∈ IRⁿ such that

$$\liminf_{y \ne x,\ y \to x} \frac{1}{\|y - x\|} \left[ f(y) - f(x) - \langle s, y - x \rangle \right] \ge 0. \tag{7}$$

If x ∉ dom f, then ∂̂f(x) = ∅.

A vector s satisfying the inequality (7) is also termed a regular subgradient of f at x (see [21, p. 301]). It is not difficult to see that the inequality (7) is equivalent to

$$f(y) \ge f(x) + \langle s, y - x \rangle + o(\|y - x\|), \quad \text{where} \quad \lim_{y \to x} \frac{o(\|y - x\|)}{\|y - x\|} = 0.$$
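The quotient in (7) can be probed numerically in one dimension. The sketch below samples the difference quotient at a few small radii on both sides of x; this is only a heuristic check on finitely many points, and the function name and the choice of radii are assumptions made for illustration.

```python
def frechet_quotient_min(f, x, s, radii=(1e-2, 1e-3, 1e-4, 1e-5, 1e-6)):
    """Smallest sampled value of [f(y) - f(x) - s*(y - x)] / |y - x| over y = x +/- r."""
    worst = float("inf")
    for r in radii:
        for y in (x - r, x + r):
            quotient = (f(y) - f(x) - s * (y - x)) / abs(y - x)
            worst = min(worst, quotient)
    return worst
```

For f(t) = |t| at x = 0, the sampled quotient stays non-negative for s = 0.5 (a regular subgradient) and becomes negative for s = 1.5; for f(t) = −|t| it is negative even for s = 0, consistent with an empty Fréchet subdifferential at the origin.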

For the subdifferential ∂̂f(x), the following results hold by direct verification.

Lemma 2.3 [21, Chapter 8] Let f: IRⁿ → IR ∪ {+∞} be a proper lower semicontinuous function and ∂̂f(x) be the Fréchet subdifferential of f at x. Then,

(a) ∂̂f(x) is a closed and convex set.

(b) If f is differentiable at x or in a neighborhood of x, then ∂̂f(x) = {∇f(x)}, where ∇f(x) is the gradient of f at x.

(c) If g = f + h with f finite at x and h differentiable on a neighborhood of x, then ∂̂g(x) = ∂̂f(x) + ∇h(x).

(d) If f has a local minimum at x̄, then 0 ∈ ∂̂f(x̄).

To work with differentiable minimization problems, we also need the following definition.

Definition 2.4 Suppose that f: IRⁿ → IR is a differentiable function. Then,

(a) For the unconstrained optimization problem of minimizing f(x) over x ∈ IRⁿ, x^∗ is called a stationary point if ∇f(x^∗) = 0.

(b) For the constrained optimization problem of minimizing f(x) over x ∈ C, where C is a nonempty convex subset of IRⁿ, x^∗ ∈ C is called a stationary point if

∇f(x^∗)^T (x − x^∗) ≥ 0 for all x ∈ C.
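For C = IRⁿ₊, the condition in Definition 2.4(b) is equivalent to the componentwise conditions (∇f(x^∗))_i ≥ 0 and (∇f(x^∗))_i x_i^∗ = 0 for all i, with x^∗ ≥ 0. The following sketch tests these conditions up to a tolerance; the example objective f(x) = (x₁ − 1)² + (x₂ + 1)² and all names are illustrative assumptions.

```python
def grad_f(x):
    """Gradient of the illustrative objective f(x) = (x1 - 1)^2 + (x2 + 1)^2."""
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 1.0)]


def is_stationary_on_orthant(grad, x, tol=1e-8):
    """Check x >= 0, grad >= 0 and grad_i * x_i = 0 componentwise (up to tol)."""
    return all(
        xi >= -tol and gi >= -tol and abs(gi * xi) <= tol
        for gi, xi in zip(grad, x)
    )
```

Here the point (1, 0) is stationary (gradient (0, 2)), while (1, 0.5) and (0, 0) are not.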

To close this section, we recall the concepts of quasi-convexity, strict quasi-convexity and strong quasi-convexity, and briefly discuss general properties of minimization problems whose objective function has these properties.

Definition 2.5 Let f: IRⁿ→ IR be a proper function. Then, f is called quasi-convex if for all x, y ∈ dom f and β ∈ (0, 1), there always holds

f(βx + (1 − β)y) ≤ max{ f (x), f (y)}.

It can be proved that any convex function is also quasi-convex, but the converse is not true.

For a quasi-convex function, we have the following important property.

Proposition 2.6 The proper function f: IRⁿ → IR is quasi-convex if and only if the level sets Lf(α) := {x ∈ dom f | f (x) ≤ α} are convex for every α ∈ IR.
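As an illustration of a function that is quasi-convex but not convex, take f(t) = ln(1 + t²), the composition of an increasing function with a convex quadratic. The sketch below tests the inequality of Definition 2.5 and the usual convexity inequality on a finite grid; the grid, the names and the tolerance are assumptions made for illustration, and a grid test is of course not a proof.

```python
import itertools
import math


def f(t):
    """Illustrative quasi-convex, nonconvex function ln(1 + t^2)."""
    return math.log(1.0 + t * t)


POINTS = [-5.0 + 0.5 * i for i in range(21)]   # grid on [-5, 5]
BETAS = [0.1 * i for i in range(1, 10)]        # beta in (0, 1)


def violates_quasi_convexity(func, points, betas, tol=1e-12):
    """True if func(b*x + (1-b)*y) > max(func(x), func(y)) for some grid triple."""
    for x, y in itertools.product(points, repeat=2):
        for b in betas:
            if func(b * x + (1.0 - b) * y) > max(func(x), func(y)) + tol:
                return True
    return False


def violates_convexity(func, points, betas, tol=1e-12):
    """True if func(b*x + (1-b)*y) > b*func(x) + (1-b)*func(y) for some grid triple."""
    for x, y in itertools.product(points, repeat=2):
        for b in betas:
            if func(b * x + (1.0 - b) * y) > b * func(x) + (1.0 - b) * func(y) + tol:
                return True
    return False
```

On this grid no violation of quasi-convexity is found, while convexity fails, for example at x = 2, y = 4, β = 1/2.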


Definition 2.7 Let f: IRⁿ → IR be a proper function. Then, f is called strictly quasi-convex if for all x, y ∈ dom f with f(x) ≠ f(y), there always holds

f(βx + (1 − β)y) < max{f(x), f(y)} for all β ∈ (0, 1).

By [3, Lemma 3.5.7], if f is lower semicontinuous and strictly quasi-convex, then f is quasi-convex. For a strictly quasi-convex function, we have the following important result, which implies that every local optimal solution of (5) is also a global optimal solution.

Proposition 2.8 [3, Theorem 3.5.6] Let f: IRⁿ → IR be a proper strictly quasi-convex function. Consider the problem of minimizing f(x) subject to x ∈ C, where C is a nonempty convex set in IRⁿ. If x̄ is a local optimal solution, then x̄ is also a global optimal solution.

Definition 2.9 Let f: IRⁿ → IR be a proper function. Then, f is called strongly quasi-convex if for all x, y ∈ dom f with x ≠ y, there always holds

f(βx + (1 − β)y) < max{f(x), f(y)} for all β ∈ (0, 1).

It can be shown that every strongly quasi-convex function is strictly quasi-convex, and that every strongly quasi-convex function is quasi-convex even without any semicontinuity assumption.

When f(x) is strongly quasi-convex, problem (5) has a unique global optimal solution.

3 Distance-like function d_φ and its properties

In this section, we present the definition of the kernel φ and investigate the properties of the bivariate function d_φ induced by φ via formula (4). We start with the assumptions on the function ϕ needed to define the kernel φ. Let ϕ: IR → (−∞, +∞] be a closed proper convex function with dom ϕ ≠ ∅ and dom ϕ ⊆ [0, +∞). We assume that

(i) ϕ is twice continuously differentiable on int(dom ϕ) = (0, +∞);

(ii) ϕ is strictly convex on its domain;

(iii) lim_{t→0⁺} ϕ′(t) = −∞;

(iv) ϕ(1) = ϕ′(1) = 0 and ϕ″(1) > 0.

In the rest of this paper, we denote by Φ the class of functions satisfying (i)–(iv).

We define the following two subclasses of Φ:

$$\Phi_1 = \left\{ \phi \in \Phi : \phi''(1)(1 - 1/t) \le \phi'(t) \le \phi''(1) \ln t, \ \forall t > 0 \right\} \tag{8}$$

and

$$\Phi_2 = \left\{ \phi \in \Phi : \phi''(1)(1 - 1/t) \le \phi'(t) \le \phi''(1)(t - 1), \ \forall t > 0 \right\}. \tag{9}$$

Since ln t ≤ t − 1 for any t > 0 and ϕ″(1) > 0, clearly Φ₁ ⊆ Φ₂ ⊆ Φ. The assumptions on Φ₁ and Φ₂ are very mild. It is not hard to verify that the following functions

ϕ₁(t) = t ln t − t + 1, dom ϕ₁ = [0, +∞),
ϕ₂(t) = − ln t + t − 1, dom ϕ₂ = (0, +∞),
ϕ₃(t) = (√t − 1)², dom ϕ₃ = [0, +∞)

are all in Φ₁, and consequently belong to Φ₂. The first example ϕ₁ plays an important role in the convergence analysis of our first class of algorithms, which will be studied in the next section. As mentioned in the introduction, the ϕ-divergence for ϕ = ϕ₁ is exactly the Kullback–Leibler entropy function, given by

$$H(x, y) := d_\phi(x, y) = \sum_{j=1}^{n} \left[ x_j \ln(x_j / y_j) + y_j - x_j \right], \tag{10}$$

whose domain can be continuously extended to IRⁿ₊ × IRⁿ₊₊ by using the convention 0 ln 0 = 0. The following lemma states some useful properties of H(x, y); since their proofs are elementary from (10), we omit them.
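The membership of the three example kernels in the first subclass can be spot-checked numerically: with ϕ′ the first derivative and ϕ″(1) the second derivative at 1, the defining bounds read ϕ″(1)(1 − 1/t) ≤ ϕ′(t) ≤ ϕ″(1) ln t. The sketch below tests them on a logarithmic grid; the derivative formulas are computed by hand here, and the grid and names are illustrative assumptions.

```python
import math

# Each entry: (first derivative of the kernel, value of the second derivative at t = 1).
KERNELS = {
    "phi1": (lambda t: math.log(t), 1.0),                  # t ln t - t + 1
    "phi2": (lambda t: 1.0 - 1.0 / t, 1.0),                # -ln t + t - 1
    "phi3": (lambda t: 1.0 - 1.0 / math.sqrt(t), 0.5),     # (sqrt(t) - 1)^2
}

GRID = [10.0 ** (k / 10.0) for k in range(-20, 21)]        # t from 0.01 to 100


def satisfies_phi1_bounds(dphi, d2phi_at_1, grid, tol=1e-12):
    """Check d2phi(1)*(1 - 1/t) <= dphi(t) <= d2phi(1)*ln(t) at every grid point."""
    for t in grid:
        lower = d2phi_at_1 * (1.0 - 1.0 / t)
        upper = d2phi_at_1 * math.log(t)
        if not (lower - tol <= dphi(t) <= upper + tol):
            return False
    return True
```

All three kernels pass on this grid, whereas the purely quadratic kernel (t − 1)²/2, with derivative t − 1, violates the logarithmic upper bound.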

Lemma 3.1 Let H(·, ·) be defined as in (10). Then, we have the following results.

(a) The level sets of H(x, ·) are bounded for all x ∈ IRⁿ₊.

(b) If {y^k} ⊂ IRⁿ₊₊ converges to y ∈ IRⁿ₊, then lim_{k→+∞} H(y, y^k) = 0.

(c) If {z^k} ⊂ IRⁿ₊ and {y^k} ⊂ IRⁿ₊₊ are sequences such that {z^k} is bounded, lim_{k→+∞} y^k = y and lim_{k→+∞} H(z^k, y^k) = 0, then lim_{k→+∞} z^k = y.

With the above assumptions on ϕ, we now give the definition of the kernel φ involved in the function d_φ. Given ϕ ∈ Φ and the parameters µ > 0 and ν ≥ 0, let φ: IR → (−∞, +∞] be the closed proper convex function defined by

$$\varphi(t) := \mu \phi(t) + \frac{\nu}{2}(t - 1)^2. \tag{11}$$

It is not difficult to verify that φ satisfies the properties listed in (i)–(iv), and consequently φ ∈ Φ. In particular, φ is strongly convex on its domain if ν > 0. This implies that the objective function of the subproblem (6), i.e., f(x) + λ Σ_{i=1}^{n} (x_i^k)² φ(x_i/x_i^k), will be strictly convex on IRⁿ₊₊ if the parameter λ is set sufficiently large, although f(x) itself is only quasi-convex. That is to say, the proximal term d_φ(·, ·) plays a convexification role in the quasi-convex subproblem (6), and this role becomes stronger as the parameter λ increases. In fact, the computational results in Sect. 5 suggest that the proximal term convexifies the quasi-convex function f(x) well even for a very small λ.

In what follows, we concentrate on the properties of the bivariate function d_φ.

Lemma 3.2 Given ϕ ∈ Φ and the parameters µ > 0, ν ≥ 0, let φ be the kernel defined by (11) and d_φ(·, ·) be the function induced by φ via formula (4). Then,

(a) d_φ is a homogeneous function of order 2, i.e., d_φ(αx, αy) = α² d_φ(x, y) for all α > 0.

(b) For a fixed y ∈ IRⁿ₊₊, the function d_φ(·, y) is strictly convex over IRⁿ₊₊. If, in addition, ν > 0, then d_φ(·, y) is strongly convex on IRⁿ₊₊.

(c) For any (x, y) ∈ IRⁿ₊₊ × IRⁿ₊₊, d_φ(x, y) ≥ 0, and d_φ(x, y) = 0 if and only if x = y.

(d) For any fixed z ∈ IRⁿ₊₊, the level sets L(z, γ) := {x ∈ IRⁿ₊₊ : d_φ(x, z) ≤ γ} are bounded for all γ ≥ 0.

(e) If ϕ ∈ Φ₁ or Φ₂, and {y^k}_{k∈N} ⊆ IRⁿ₊₊ converges to ȳ ∈ IRⁿ₊, then for any fixed x ∈ IRⁿ₊₊, the sequence {d_φ(x, y^k)}_{k∈N} is bounded.
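Properties (a) and (c) are easy to observe numerically. The sketch below builds the kernel (11) from the particular choice ϕ(t) = t − ln t − 1 (one of the example kernels; any other admissible ϕ could be passed in) and evaluates d_φ via formula (4). The function names and default parameters are assumptions made for illustration.

```python
import math


def make_kernel(mu, nu, phi=lambda t: t - math.log(t) - 1.0):
    """Kernel (11): varphi(t) = mu * phi(t) + (nu / 2) * (t - 1)^2, for t > 0."""
    def kernel(t):
        return mu * phi(t) + 0.5 * nu * (t - 1.0) ** 2
    return kernel


def d_varphi(x, y, kernel):
    """Formula (4): sum_i y_i^2 * varphi(x_i / y_i), for x > 0 and y > 0."""
    return sum(yi * yi * kernel(xi / yi) for xi, yi in zip(x, y))
```

Scaling both arguments by α multiplies the value by α², the value is zero at x = y, and it is positive otherwise.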

Proof The properties in (a) and (b) are clear from the definition of d_φ given by (4).

(c) Note that φ(t) is strictly convex and moreover φ′(1) = µϕ′(1) = 0 due to (iv). Hence,

φ(t) ≥ φ(1) = 0, and φ(t) = 0 if and only if t = 1.

This implies that d_φ(x, y) ≥ 0 for all (x, y) ∈ IRⁿ₊₊ × IRⁿ₊₊, and d_φ(x, y) = 0 if and only if x = y.

(d) To prove the result, it is enough to consider the one-dimensional case, i.e., to show that h_ζ(t) := ζ²φ(t/ζ) for ζ > 0 has bounded level sets, which in turn is equivalent to showing that φ has bounded level sets. Note that {t : φ(t) ≤ 0} = {1}. Therefore, the conclusion follows from [18, Corollary 8.7.1].

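(e) From the definitions of φ and d_φ, we have that

$$d_\varphi(x, y^k) = \sum_{i=1}^{n} \left[ \mu (y_i^k)^2 \phi\!\left( \frac{x_i}{y_i^k} \right) + \frac{\nu}{2} (y_i^k)^2 \left( \frac{x_i}{y_i^k} - 1 \right)^2 \right] = \sum_{i=1}^{n} \left[ \mu (y_i^k)^2 \phi(x_i / y_i^k) + \frac{\nu}{2} (x_i - y_i^k)^2 \right].$$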

If ϕ(t) is bounded above for any t > 0, then the conclusion is obvious. Otherwise, we discuss the following two cases:

Case (1) ȳ_i > 0 for each i ∈ {1, 2, ..., n}. Since y_i^k → ȳ_i for each i, the proof follows directly from the continuity of ϕ.

Case (2) There exists an index i₀ ∈ {1, 2, ..., n} such that ȳ_{i₀} = 0. By the given assumptions and Case (1), it suffices to prove that the sequence {(y_{i₀}^k)² ϕ(x_{i₀}/y_{i₀}^k)} is bounded above. For any k ∈ N, using the convexity of ϕ and the fact that ϕ(1) = 0, we have that

$$0 \ge \phi(x_{i_0}/y_{i_0}^k) + \phi'(x_{i_0}/y_{i_0}^k) \left( 1 - x_{i_0}/y_{i_0}^k \right).$$

Multiplying this inequality by (y_{i₀}^k)² readily yields

$$(y_{i_0}^k)^2 \phi(x_{i_0}/y_{i_0}^k) \le (y_{i_0}^k)^2 \phi'(x_{i_0}/y_{i_0}^k) \left( x_{i_0}/y_{i_0}^k - 1 \right) = y_{i_0}^k\, \phi'(x_{i_0}/y_{i_0}^k) \left( x_{i_0} - y_{i_0}^k \right).$$

If ϕ ∈ Φ₂, then it follows from (9) that

$$\phi''(1)\, y_{i_0}^k \left( 1 - y_{i_0}^k / x_{i_0} \right) \le y_{i_0}^k\, \phi'(x_{i_0}/y_{i_0}^k) \le \phi''(1) \left( x_{i_0} - y_{i_0}^k \right).$$

Combining the last two inequalities immediately gives

$$(y_{i_0}^k)^2 \phi(x_{i_0}/y_{i_0}^k) \le \max \left\{ \phi''(1) \left( x_{i_0} - y_{i_0}^k \right)^2,\ \phi''(1) \frac{y_{i_0}^k}{x_{i_0}} \left( x_{i_0} - y_{i_0}^k \right)^2 \right\}.$$

This together with the given assumptions shows that {(y_{i₀}^k)² ϕ(x_{i₀}/y_{i₀}^k)} is bounded above for any ϕ ∈ Φ₂, and consequently the sequence {d_φ(x, y^k)}_{k∈N} is bounded. Noting that Φ₁ ⊆ Φ₂, the sequence {d_φ(x, y^k)}_{k∈N} is also bounded for ϕ ∈ Φ₁.

Lemma 3.2(a)–(c) state that d_φ defined by (4) is a convex second-order homogeneous distance-like function. Thus, in analogy with the Euclidean distance, we can define the projection of a point y, denoted by x̂(y), onto a closed convex set S ⊆ IRⁿ with respect to d_φ, which is characterized as the solution of the following problem:

$$\inf \left\{ d_\varphi(x, y) : x \in S \right\}. \tag{12}$$

The existence of x̂(y) is guaranteed by Lemma 3.2(d). For this projection, we have the following result, similar to that for the Euclidean projection.


Lemma 3.3 Let S be a closed convex subset of IRⁿ and y ∈ IRⁿ be a point not in S. Then x̂(y) is the projection of y on S with respect to d_φ if and only if

$$\left\langle x - \hat{x}(y),\ -\nabla_1 d_\varphi(\hat{x}(y), y) \right\rangle \le 0, \quad \forall x \in S. \tag{13}$$

Proof Note that problem (12) is equivalent to inf{d_φ(x, y) + δ(x | S) : x ∈ IRⁿ}, where δ(· | S) denotes the indicator function of the set S. By [19, Theorem 27.4], x̂(y) solves this unconstrained optimization problem if and only if the inequality (13) holds. Thus, the proof is completed.

Finally, we present a favorable property of d_φ with ϕ ∈ Φ₁ or Φ₂, which will play a crucial role in the convergence analysis of the algorithms to be studied in the next section.

Lemma 3.4 Given ϕ ∈ Φ and the parameters µ > 0, ν ≥ 0, let φ be the kernel defined as in (11). Then, for any a, b ∈ IRⁿ₊₊ and c ∈ IRⁿ₊, we have the following results:

(a) If ν = 0 and ϕ ∈ Φ₁, then ⟨c − b, ∇₁d_φ(b, a)⟩ ≤ µϕ″(1) max_{1≤j≤n}{a_j} [H(c, a) − H(c, b)].

(b) If ν ≥ µϕ″(1) > 0 and ϕ ∈ Φ₂, then ⟨c − b, ∇₁d_φ(b, a)⟩ ≤ θ(‖c − a‖² − ‖c − b‖²) with θ = (ν + µϕ″(1))/2.
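The bound in part (b) can be sanity-checked on random data. The sketch below specializes to the example kernel ϕ(t) = t − ln t − 1, for which the second derivative at 1 equals 1 and the j-th component of the gradient of d_φ in its first argument at (b, a) is µ a_j (1 − a_j/b_j) + ν (b_j − a_j). The function names are hypothetical, and the check covers only this one kernel.

```python
def grad1_d_varphi(b, a, mu, nu):
    """Gradient of d_varphi in its first argument at (b, a), kernel phi(t) = t - ln t - 1."""
    return [mu * aj * (1.0 - aj / bj) + nu * (bj - aj) for aj, bj in zip(a, b)]


def lemma_3_4b_gap(a, b, c, mu, nu):
    """theta*(||c-a||^2 - ||c-b||^2) - <c - b, grad>; non-negative when the bound holds."""
    theta = 0.5 * (nu + mu * 1.0)  # second derivative of phi at 1 equals 1 for this kernel
    g = grad1_d_varphi(b, a, mu, nu)
    inner = sum((cj - bj) * gj for cj, bj, gj in zip(c, b, g))
    sq_ca = sum((cj - aj) ** 2 for cj, aj in zip(c, a))
    sq_cb = sum((cj - bj) ** 2 for cj, bj in zip(c, b))
    return theta * (sq_ca - sq_cb) - inner
```

For positive a and b, non-negative c and ν ≥ µ, the returned gap should never be negative beyond rounding error.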

Proof (a) Since ϕ ∈ Φ₁, we have from (8) that

ϕ′(t) ≤ ϕ″(1) ln t for any t > 0.

Setting t = b_j/a_j in this inequality and multiplying by c_j a_j ≥ 0, we obtain

$$c_j a_j \phi'(b_j/a_j) \le c_j a_j \phi''(1) \ln(b_j/a_j), \quad j = 1, 2, \dots, n. \tag{14}$$

On the other hand, it follows from (8) that

−ϕ′(t) ≤ −ϕ″(1)(1 − 1/t), ∀t > 0.

Substituting t = b_j/a_j into this inequality and multiplying by a_j b_j gives

$$-b_j a_j \phi'(b_j/a_j) \le a_j \phi''(1)(a_j - b_j), \quad j = 1, 2, \dots, n. \tag{15}$$

Define

$$\Psi(a, b) := \left( a_1 \phi'(b_1/a_1), \dots, a_n \phi'(b_n/a_n) \right)^T, \quad \forall a, b \in \mathbb{R}^n_{++}.$$

Then, adding the inequalities (14) and (15) and summing over j = 1, ..., n gives

$$\begin{aligned}
\langle c - b, \Psi(a, b) \rangle
&\le \phi''(1) \sum_{j=1}^{n} a_j \left[ c_j \ln(b_j/a_j) + a_j - b_j \right] \\
&\le \phi''(1) \max_{1 \le j \le n} \{a_j\} \sum_{j=1}^{n} \left[ c_j \ln(b_j/a_j) + a_j - b_j \right] \\
&= \phi''(1) \max_{1 \le j \le n} \{a_j\} \left[ H(c, a) - H(c, b) \right].
\end{aligned}$$

Note that ∇₁d_φ(b, a) = µΨ(a, b) since ν = 0, and hence we obtain the result from the last inequality.

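(b) The proof is similar to that of [2, Lemma 3.4]; we include it here for completeness. Since ϕ ∈ Φ₂, the inequality (15) still holds. On the other hand, we have from (9) that

ϕ′(t) ≤ ϕ″(1)(t − 1), ∀t > 0.

Substituting t = b_j/a_j into the above inequality and multiplying by c_j a_j ≥ 0 leads to

$$c_j a_j \phi'(b_j/a_j) \le c_j a_j \phi''(1)(b_j/a_j - 1) = \phi''(1)\, c_j (b_j - a_j), \quad j = 1, 2, \dots, n. \tag{16}$$

Adding the two inequalities (15) and (16), summing over j = 1, 2, ..., n, and using the definition of Ψ(a, b), we obtain

$$\langle c - b, \Psi(a, b) \rangle \le \phi''(1) \sum_{j=1}^{n} \left[ c_j (b_j - a_j) + a_j (a_j - b_j) \right] = \phi''(1) \langle c - a, b - a \rangle.$$

Note that ∇₁d_φ(b, a) = µΨ(a, b) + ν(b − a). Then, the last inequality implies that

$$\langle c - b, \nabla_1 d_\varphi(b, a) \rangle \le \mu \phi''(1) \langle c - a, b - a \rangle + \nu \langle c - b, b - a \rangle. \tag{17}$$

Using the identities

$$\langle c - a, b - a \rangle = \tfrac{1}{2} \left( \|c - a\|^2 - \|c - b\|^2 + \|b - a\|^2 \right)$$

and

$$\langle c - b, b - a \rangle = \tfrac{1}{2} \left( \|c - a\|^2 - \|c - b\|^2 - \|b - a\|^2 \right),$$

we then obtain from (17) that

$$\langle c - b, \nabla_1 d_\varphi(b, a) \rangle \le \theta \left( \|c - a\|^2 - \|c - b\|^2 \right) - \tfrac{1}{2} \left( \nu - \mu \phi''(1) \right) \|b - a\|^2 \le \theta \left( \|c - a\|^2 - \|c - b\|^2 \right),$$

where the second inequality is due to ν ≥ µϕ″(1). Thus, the proof is completed.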

4 Interior proximal-like methods

In this section, we consider two classes of proximal-like algorithms based on the second-order homogeneous function d_φ for the quasi-convex optimization problem (5). The two kinds of algorithms are described as follows, where the RIPM was first proposed by Auslender et al. [2] for convex minimization problems subject to non-negative constraints.

Interior Proximal Method (IPM) Let φ be defined as in (11) with µ > 0, ν = 0 and ϕ ∈ Φ₁. Generate the sequence {x^k}_{k∈N} by the iterative scheme (6).

Regularized Interior Proximal Method (RIPM) Let φ be defined as in (11) with ν ≥ µϕ″(1) > 0 and ϕ ∈ Φ₂. Generate the sequence {x^k}_{k∈N} by the iterative scheme (6).
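A minimal one-dimensional sketch of the RIPM iteration may help fix ideas. Everything concrete below is an assumption made for illustration and is not the implementation used in the paper: the objective f(x) = ln(1 + (x − 2)²), the kernel built from t − ln t − 1 with µ = ν = 1, the search interval (0, 10], and the subproblem solver (a grid search followed by golden-section refinement, with the current iterate kept as a fallback candidate).

```python
import math


def f(x):
    """Illustrative quasi-convex, nonconvex objective with minimizer x = 2."""
    return math.log(1.0 + (x - 2.0) ** 2)


def d_varphi(x, y, mu=1.0, nu=1.0):
    """One-dimensional d_varphi(x, y) = y^2 * [mu*(t - ln t - 1) + (nu/2)*(t - 1)^2], t = x/y."""
    t = x / y
    return y * y * (mu * (t - math.log(t) - 1.0) + 0.5 * nu * (t - 1.0) ** 2)


def ripm_step(x_k, lam, upper=10.0, grid=2000, refine=60):
    """Approximately minimize f(x) + lam * d_varphi(x, x_k) over 0 < x <= upper."""
    def objective(x):
        return f(x) + lam * d_varphi(x, x_k)

    h = upper / grid
    best = min((h * (i + 1) for i in range(grid)), key=objective)
    lo, hi = max(best - h, 1e-12), best + h
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(refine):  # golden-section refinement around the best grid point
        m1 = hi - ratio * (hi - lo)
        m2 = lo + ratio * (hi - lo)
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    # Keeping x_k as a candidate guarantees the subproblem value never increases.
    return min((0.5 * (lo + hi), best, x_k), key=objective)


def ripm(x0, lambdas):
    """Run the iteration from x0 > 0 with the given proximal parameters."""
    xs = [x0]
    for lam in lambdas:
        xs.append(ripm_step(xs[-1], lam))
    return xs
```

Calling `ripm(0.5, [0.1] * 50)` produces positive iterates with non-increasing objective values that approach x = 2 in this example.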

To establish the convergence of IPM and RIPM, throughout this section, we make the following assumptions for the quasi-convex optimization problem (5):

(A1) dom f ∩ IRⁿ₊₊ ≠ ∅.

(A2) The optimal set of problem (5), denoted by X^∗, is nonempty and bounded.

In what follows, we concentrate on the convergence of IPM and RIPM. We first prove that they are well defined, which is a direct consequence of the following lemma.

Lemma 4.1 Given µ > 0, ν ≥ 0 and ϕ ∈ Φ, let φ and d_φ be defined as in (11) and (4), respectively. Then, under assumptions (A1) and (A2), the sequence {x^k}_{k∈N} generated by the iterative scheme (6) is well defined.


Proof The proof proceeds by induction. Clearly, when k = 0, the conclusion holds due to (6). Assume that x^k is well defined. Let f^∗ be the optimal value of problem (5); then

$$f(x) + \lambda_k d_\varphi(x, x^k) \ge f^* + \lambda_k d_\varphi(x, x^k) \quad \text{for all } x \in \mathbb{R}^n_{++}. \tag{18}$$

Let F_k(x) := f(x) + λ_k d_φ(x, x^k) and denote its level sets by

L_{F_k}(γ) := {x ∈ IRⁿ₊₊ : F_k(x) ≤ γ} for all γ ∈ IR.

Then, the inequality (18) implies that L_{F_k}(γ) ⊆ L(x^k, λ_k⁻¹(γ − f^∗)). By Lemma 3.2(d), the level sets L(x^k, λ_k⁻¹(γ − f^∗)) are bounded for any γ ≥ f^∗, and consequently the sets L_{F_k}(γ) are bounded for any γ ≥ f^∗. For any γ ≤ f^∗, on the other hand, we have L_{F_k}(γ) ⊆ X^∗, which is bounded due to assumption (A2). Together, these two facts show that the level sets of the function F_k(x) are bounded. Also, F_k(x) is lower semicontinuous on dom f. Hence, the level sets of F_k(x) are compact. Now, using the lower semicontinuity of F_k(x) and the compactness of its level sets, we conclude that F_k(x) has a global minimum, which may not be unique due to the nonconvexity of f. In such a case, x^{k+1} can be chosen arbitrarily among the set of minimizers of F_k(x).

Next, we investigate the properties of the sequence {x^k}_{k∈N} generated by IPM and RIPM. To this end, we define the following set:

$$U := \left\{ x \in \mathbb{R}^n_+ \ \middle|\ f(x) \le \inf_{k \in \mathbb{N}} f(x^k) \right\}.$$

From assumptions (A1)–(A2) and Proposition 2.6, U is a nonempty closed convex set.

Lemma 4.2 Let {λ_k}_{k∈N} be an arbitrary sequence of positive numbers and {x^k}_{k∈N} be the sequence generated by IPM. Then, under assumptions (A1)–(A2),

(a) {f(x^k)}_{k∈N} is a decreasing and convergent sequence.

(b) {x^k}_{k∈N} is Fejér convergent to the set U with respect to H.

(c) For all x ∈ U, the sequence {H(x, x^k)}_{k∈N} is convergent.

Proof (a) From (6), x^{k+1} is a global optimal solution of the problem

$$\min_{x \ge 0} \left\{ f(x) + \lambda_k d_\varphi(x, x^k) \right\},$$

and consequently, for any x ∈ IRⁿ₊, it follows that

$$f(x^{k+1}) + \lambda_k d_\varphi(x^{k+1}, x^k) \le f(x) + \lambda_k d_\varphi(x, x^k). \tag{19}$$

Setting x = x^k in (19), we obtain

$$f(x^{k+1}) + \lambda_k d_\varphi(x^{k+1}, x^k) \le f(x^k) + \lambda_k d_\varphi(x^k, x^k) = f(x^k),$$

which means that

$$0 \le \lambda_k d_\varphi(x^{k+1}, x^k) \le f(x^k) - f(x^{k+1}).$$

Hence, {f(x^k)}_{k∈N} is decreasing and, furthermore, convergent due to assumption (A2).

(b) From inequality (19), it follows that for any x ∈ U, d_φ(x^{k+1}, x^k) ≤ d_φ(x, x^k). This implies that x^{k+1} is the unique projection of x^k on U with respect to d_φ. Therefore, by Lemma 3.3, we have that

$$\left\langle x - x^{k+1},\ -\nabla_1 d_\varphi(x^{k+1}, x^k) \right\rangle \le 0, \quad \forall x \in U. \tag{20}$$

On the other hand, applying Lemma 3.4(a) at the points c = x, a = x^k and b = x^{k+1}, we obtain

$$H(x, x^k) - H(x, x^{k+1}) \ge \frac{\left\langle x - x^{k+1},\ \nabla_1 d_\varphi(x^{k+1}, x^k) \right\rangle}{\mu \phi''(1) \max_{1 \le j \le n} \{x_j^k\}}. \tag{21}$$

Since µϕ″(1) max_{1≤j≤n}{x_j^k} > 0, using the inequalities (20) and (21) yields

H(x, x^k) ≥ H(x, x^{k+1}), ∀x ∈ U.

From Definition 2.1, it follows that {x^k}_{k∈N} is Fejér convergent to U with respect to H.

(c) The proof follows from part (b) and the non-negativity of H.

Lemma 4.3 Let {λ_k}_{k∈N} be an arbitrary sequence of positive numbers and {x^k}_{k∈N} be the sequence generated by RIPM. Then, under assumptions (A1) and (A2),

(a) {f(x^k)}_{k∈N} is a decreasing and convergent sequence.

(b) {x^k}_{k∈N} is Fejér convergent to the set U.

(c) For all x ∈ U, the sequence {‖x − x^k‖}_{k∈N} is convergent.

Proof

(a) The proof is similar to that of Lemma 4.2(a), and we omit it here.

(b) By an argument similar to that of Lemma 4.2(b), we obtain the inequality (20). On the other hand, applying Lemma 3.4(b) at the points c = x, a = x^k and b = x^{k+1} gives

$$\left\langle x - x^{k+1},\ \nabla_1 d_\varphi(x^{k+1}, x^k) \right\rangle \le \theta \left( \|x - x^k\|^2 - \|x - x^{k+1}\|^2 \right), \tag{22}$$

where θ = (ν + µϕ″(1))/2. Since θ > 0, using the inequalities (20) and (22) yields

$$\|x - x^{k+1}\|^2 \le \|x - x^k\|^2, \quad \forall x \in U. \tag{23}$$

By Definition 2.1, this proves that {x^k}_{k∈N} is Fejér convergent to the set U.

(c) The proof follows from part (b) and the non-negativity of ‖x − x^k‖.

So far, we have proved that the sequence {x^k}_{k∈N} generated by IPM or RIPM is well defined and satisfies some favorable properties. With these properties, we next establish the convergence results for the proposed algorithms.

Proposition 4.4 Suppose that assumptions (A1) and (A2) are satisfied. Let {λ_k}_{k∈N} be an arbitrary sequence of positive numbers and {x^k}_{k∈N} be the sequence generated by IPM. Then, the sequence {x^k}_{k∈N} converges, and furthermore,

(a) If there exist λ̲ and λ̄ such that 0 < λ̲ < λ_k ≤ λ̄ for any k, then

$$\liminf_{k \to +\infty} g_i^k \ge 0, \qquad \lim_{k \to +\infty} g_i^k x_i^k = 0, \quad \forall i = 1, 2, \dots, n, \tag{24}$$

where g^k ∈ ∂̂f(x^k) and g_i^k is the i-th component of g^k.

(b) If lim_{k→+∞} λ_k = 0, then {x^k}_{k∈N} converges to a solution of problem (5).


Proof We first prove that the sequence {x^k}_{k∈N} converges. By Lemma 4.2(b), {x^k}_{k∈N} is Fejér convergent to the set U with respect to H, which implies that

$$\{x^k\}_{k \in \mathbb{N}} \subseteq \left\{ y \in \mathbb{R}^n_{++} \ \middle|\ H(x, y) \le H(x, x^0) \right\} \quad \text{for all } x \in U.$$

As a consequence, {x^k}_{k∈N} is bounded by Lemma 3.1(a). Thus, there exist an x̄ and a subsequence {x^{k_j}} of {x^k}_{k∈N} converging to x̄. From the lower semicontinuity of f,

lim_{j→+∞} f(x^{k_j}) ≥ f(x̄),

which, together with Lemma 4.2(a), implies that

f(x̄) ≤ f(x^k), ∀k ∈ N.

This shows that x̄ ∈ U. By Lemma 4.2(c), the sequence {H(x̄, x^k)}_{k∈N} is then convergent. In addition, from Lemma 3.1(b), we have lim_{j→+∞} H(x̄, x^{k_j}) = 0. From all the above, we conclude that {H(x̄, x^k)}_{k∈N} is a convergent sequence with a subsequence converging to 0, and consequently it must converge to 0 itself, i.e., lim_{k→+∞} H(x̄, x^k) = 0. Using Lemma 3.1(c) with z^k = x^k and y^k = x̄, we thus prove that {x^k}_{k∈N} converges to x̄.

(a) From the iterative formula (6) and Lemma 2.3(d), we have that

$$0 \in \hat{\partial} \left[ f(\cdot) + \lambda_k d_\varphi(\cdot, x^k) \right](x^{k+1}).$$

Therefore, by Lemma 2.3(c), there exists g^{k+1} ∈ ∂̂f(x^{k+1}) such that λ_k ∇₁d_φ(x^{k+1}, x^k) = −g^{k+1}, i.e.,

$$\mu \lambda_k x_i^k\, \phi'\!\left( \frac{x_i^{k+1}}{x_i^k} \right) = -g_i^{k+1}, \quad i = 1, 2, \dots, n. \tag{25}$$

Define the index sets

I(x̄) := {i ∈ {1, 2, ..., n} | x̄_i > 0} and J(x̄) := {i ∈ {1, 2, ..., n} | x̄_i = 0}.

We next argue the conclusion by the two cases i ∈ I(x̄) and i ∈ J(x̄).

Case (1) i ∈ I(x̄). In this case, lim_{k→+∞} x_i^{k+1}/x_i^k = 1 since {x^k}_{k∈N} converges to x̄. Using the continuity of ϕ′ and ϕ′(1) = 0, and recalling that 0 < λ̲ ≤ λ_k ≤ λ̄ for all k, we then obtain from (25) that

$$\lim_{k \to +\infty} g_i^{k+1} = 0, \quad \forall i \in I(\bar{x}). \tag{26}$$

Case (2) i ∈ J(x̄). For every i ∈ J(x̄), we define the following two index sets:

J₊ⁱ = {k : x_i^{k+1}/x_i^k > 1} and J₋ⁱ = {k : x_i^{k+1}/x_i^k ≤ 1}.

Since ϕ′(1) = 0 and ϕ′ is monotone increasing on its domain, we have from (25) that

g_i^{k+1} ≤ 0 for all k ∈ J₊ⁱ and all i ∈ J(x̄).

On the other hand, using (25) and the fact that ϕ ∈ Φ₁ ⊆ Φ₂ yields

$$g_i^{k+1} \ge -\mu \phi''(1) \lambda_k x_i^k \left( \frac{x_i^{k+1}}{x_i^k} - 1 \right) \ge -\mu \phi''(1) \bar{\lambda} \left( x_i^{k+1} - x_i^k \right), \quad \forall k \in J_+^i.$$


Noting that lim_{k→+∞}(x_i^{k+1} − x_i^k) = 0, the last two inequalities imply that

$$\lim_{k \to +\infty,\ k \in J_+^i} g_i^{k+1} = 0, \quad \forall i \in J(\bar{x}). \tag{27}$$

Furthermore, since ϕ′(t) ≤ 0 for any 0 < t ≤ 1 by (ii) and (iv), we have from (25) that

$$g_i^{k+1} \ge 0, \quad \forall k \in J_-^i,\ \forall i \in J(\bar{x}). \tag{28}$$

The relations (26)–(28) immediately imply the first part of (24), i.e.,

$$\liminf_{k \to +\infty} g_i^k \ge 0, \quad \forall i = 1, 2, \dots, n.$$

Next, let us prove the second part of (24). Using (26) and (27) and the fact that {x^k}_{k∈N} converges to x̄, we only have to prove that

$$\lim_{k \to +\infty,\ k \in J_-^i} g_i^{k+1} x_i^{k+1} = 0, \quad \forall i \in J(\bar{x}).$$

Considering that

$$\lim_{k \to +\infty,\ k \in J_-^i} x_i^{k+1} = 0, \quad \forall i \in J(\bar{x}),$$

and using the first part of (24), we then only have to prove that the subsequence {g_i^k}_{k∈J₋ⁱ} for each i ∈ J(x̄) is bounded above. Take ε₀ > 0 and x ∈ IRⁿ₊₊ ∩ dom f with x_i > ε₀ for any i. Then for k ∈ J₋ⁱ sufficiently large, we have

$$x_i - x_i^{k+1} \ge \frac{\varepsilon_0}{2}, \quad \forall i \in J(\bar{x}). \tag{29}$$

From Definition 2.2, we have

$$f(x) \ge f(x^{k+1}) + \sum_{i=1}^{n} g_i^{k+1} (x_i - x_i^{k+1}) + o(\|x - x^{k+1}\|), \tag{30}$$

which implies that the subsequence {g_i^k}_{k∈J₋ⁱ} is bounded above for i ∈ J(x̄). Indeed, suppose the contrary. Then there would exist an i₀ ∈ J(x̄) and a subsequence {g_{i₀}^{k_l+1}} with k_l ∈ J₋^{i₀} (and lim_{l→+∞} k_l = +∞) such that

$$\lim_{l \to +\infty} g_{i_0}^{k_l+1} = +\infty, \qquad g_{i_0}^{k_l+1} \ge 0.$$

Since the sequence {x^k}_{k∈N} is convergent, using (27)–(29) gives that there exists η ∈ IR such that for sufficiently large l,

$$\sum_{i \ne i_0} g_i^{k_l+1} (x_i - x_i^{k_l+1}) + o(\|x - x^{k_l+1}\|) \ge \eta.$$

Then, from (29) and (30), we obtain

$$f(x) \ge f(x^{k_l+1}) + \frac{\varepsilon_0}{2}\, g_{i_0}^{k_l+1} + \eta.$$

Since lim_{l→+∞} f(x^{k_l+1}) ≥ f(x̄) and lim_{l→+∞} g_{i₀}^{k_l+1} = +∞, passing to the limit in the above inequality leads to a contradiction.


(b) From the inequality (19) and the non-negativity of d_φ, it follows that

f(x^{k+1}) ≤ f(x) + λ_k d_φ(x, x^k), ∀x ∈ IRⁿ₊₊.

Taking the limit k → +∞ in this inequality and using lim_{k→+∞} λ_k = 0, Lemma 3.2(e) and the lower semicontinuity of f, we obtain

$$f(\bar{x}) \le f(x), \quad \forall x \in \mathbb{R}^n_{++}, \tag{31}$$

where x̄ = lim_{k→+∞} x^k. This implies that x̄ ∈ X^∗.

Proposition 4.5 Suppose that assumptions (A1) and (A2) are satisfied. Let {λ_k}_{k∈N} be an arbitrary sequence of positive numbers and {x^k}_{k∈N} be the sequence generated by RIPM. Then, the sequence {x^k}_{k∈N} converges, and furthermore,

(a) If there exist λ̲ and λ̄ such that 0 < λ̲ < λ_k ≤ λ̄ for any k, we have

$$\liminf_{k \to +\infty} g_i^k \ge 0, \qquad \lim_{k \to +\infty} g_i^k x_i^k = 0, \quad \forall i = 1, 2, \dots, n, \tag{32}$$

where g_i^k is the same as in Proposition 4.4.

(b) If lim_{k→+∞} λ_k = 0, then the sequence {x^k}_{k∈N} converges to a solution of problem (5).

Proof First, we prove that the sequence {x^k}_{k∈N} converges. By Lemma 4.3(b), {x^k}_{k∈N} is Fejér convergent to the set U, which implies that

$$\{x^k\}_{k \in \mathbb{N}} \subseteq \left\{ y \in \mathbb{R}^n \ \middle|\ \|x - y\| \le \|x - x^0\| \right\} \quad \text{for all } x \in U.$$

Note that the latter set is bounded for any given x ∈ IRⁿ; therefore, the sequence {x^k}_{k∈N} is bounded, and there exist an x̂ and a subsequence {x^{k_j}} of {x^k}_{k∈N} converging to x̂. Using an argument similar to the first part of the proof of Proposition 4.4, we can prove that x̂ ∈ U. Thus, by Lemma 4.3(c), the sequence {‖x^k − x̂‖}_{k∈N} is convergent. Since {x^{k_j}} ⊂ IRⁿ₊₊ converges to x̂ ∈ IRⁿ₊, we have that ‖x̂ − x^{k_j}‖ → 0, and consequently ‖x̂ − x^k‖ → 0, which implies that the limit point is unique and x^k → x̂.

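(a) From the iterative formula (6) and Lemma 2.3(d), we have

$$0 \in \hat{\partial} \left[ f(\cdot) + \lambda_k d_\varphi(\cdot, x^k) \right](x^{k+1}).$$

Therefore, there exists g^{k+1} ∈ ∂̂f(x^{k+1}) such that λ_k ∇₁d_φ(x^{k+1}, x^k) = −g^{k+1}, i.e., for each i = 1, 2, ..., n,

$$g_i^{k+1} = -\mu \lambda_k x_i^k\, \phi'(x_i^{k+1}/x_i^k) - \nu \lambda_k (x_i^{k+1} - x_i^k). \tag{33}$$

Since ϕ ∈ Φ₂, we have from (9) that for each i = 1, 2, ..., n,

$$-\mu \lambda_k x_i^k\, \phi'\!\left( \frac{x_i^{k+1}}{x_i^k} \right) \ge -\mu \lambda_k \phi''(1)\, x_i^k \left( \frac{x_i^{k+1}}{x_i^k} - 1 \right) \ge -\mu \lambda_k \phi''(1) \left( x_i^{k+1} - x_i^k \right).$$

Combining (33) with the last inequality then yields

$$g_i^{k+1} \ge \lambda_k \left( \mu \phi''(1) + \nu \right) \left( x_i^k - x_i^{k+1} \right) = 2 \theta \lambda_k \left( x_i^k - x_i^{k+1} \right), \quad i = 1, 2, \dots, n, \tag{34}$$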