
Interior proximal methods for SOCP

In the document SOC Functions and Their Applications (pages 136-160)

Since ∑_{k=1}^∞ μkεk < ∞, using Lemma 3.7 with vk := D(ζk, ζ) ≥ 0 and βk := μkεk ≥ 0 yields that the sequence {D(ζk, ζ)} converges. Thus, by Proposition 3.10(e), the sequence {ζk} is bounded and consequently has an accumulation point. Without any loss of generality, let ζ̂ ∈ F be an accumulation point of {ζk}. Then, there exists a subsequence {ζkj} → ζ̂ for some kj → ∞. Since f is lower semi-continuous, we obtain f(ζ̂) ≤ lim inf_{kj→∞} f(ζkj). On the other hand, f(ζkj) → f∗ by part (a), while the feasibility of ζ̂ gives f(ζ̂) ≥ f∗. The two facts imply that f(ζ̂) = f∗. Therefore, ζ̂ is a solution of the CSOCP. The proof is thus complete. □

where {λk} is a sequence of positive parameters, and H : IRn × IRn → (−∞, ∞] is a proximal distance with respect to int(K) (see Def. 3.1) which plays the same role as the Euclidean distance ‖x − y‖² in the classical proximal algorithms (see, e.g., [105, 132]), but possesses certain more desirable properties that force the iterates to stay in K ∩ V, thus eliminating the constraints automatically. As will be shown, such proximal distances can be produced with an appropriate closed proper univariate function.

In the rest of this section, we focus on the case where K = Kn, and all the analysis can be carried over to the case where K has the direct product structure. Unless otherwise stated, we make the following minimal assumption for the CSOCP (3.64):

(A1) domf ∩ (V ∩ int(Kn)) ≠ ∅ and f∗ := inf{f(x) | x ∈ V ∩ Kn} > −∞.

Definition 3.2. An extended-valued function H : IRn × IRn → (−∞, ∞] is called a proximal distance with respect to int(Kn) if it satisfies the following properties:

(P1) domH(·, ·) = C1 × C2 with int(Kn) × int(Kn) ⊂ C1 × C2 ⊆ Kn × Kn.

(P2) For each given y ∈ int(Kn), H(·, y) is continuous and strictly convex on C1, and it is continuously differentiable on int(Kn) with dom∇1H(·, y) = int(Kn).

(P3) H(x, y) ≥ 0 for all x, y ∈ IRn, and H(y, y) = 0 for all y ∈ int(Kn).

(P4) For each fixed y ∈ C2, the sets {x ∈ C1 : H(x, y) ≤ γ} are bounded for all γ ∈ IR.

Definition 3.2 differs slightly from Definition 2.1 of [10] for a proximal distance w.r.t. int(Kn), since here H(·, y) is required to be strictly convex over C1 for any fixed y ∈ int(Kn). We denote by D(int(Kn)) the family of functions H satisfying Definition 3.2. With a given H ∈ D(int(Kn)), we have the following basic iterative algorithm for (3.64).

Interior Proximal Algorithm (IPA). Given H ∈ D(int(Kn)) and x0 ∈ V ∩ int(Kn).

For k = 1, 2, . . . , with λk > 0 and εk ≥ 0, generate a sequence {xk} ⊂ V ∩ int(Kn) with gk ∈ ∂εkf (xk) via the following iterative scheme:

xk := argmin{λkf(x) + H(x, xk−1) | x ∈ V}  (3.66)

such that

λkgk + ∇1H(xk, xk−1) = ATuk for some uk ∈ IRm.  (3.67)

The following proposition implies that the IPA is well-defined, and moreover, from its proof we see that the iterative formula (3.66) is equivalent to the iterative scheme (3.65).
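To make the scheme (3.66)-(3.67) concrete, here is a deliberately tiny numerical sketch; every choice in it is an illustrative assumption, not from the text: n = 1 (the cone degenerates to [0, ∞)), f(x) = x, V = IR with no linear constraints, εk ≡ 0, and H(x, y) = −2 ln(x/y) + 2x/y − 2, a log-barrier-type proximal distance. For this toy the prox subproblem has a closed-form solution, and the iterates stay strictly interior, exactly as the IPA requires:

```python
# Hypothetical toy instance of (3.66)-(3.67): n = 1 (the cone degenerates to
# [0, inf)), f(x) = x, V = IR, eps_k = 0, and the log-barrier-type distance
# H(x, y) = -2*ln(x/y) + 2*x/y - 2. Stationarity of the prox subproblem
# lam*x - 2*ln(x) + 2*x/y + const gives lam - 2/x + 2/y = 0, i.e. the
# closed-form interior step below.
def ipa_step(x_prev, lam):
    return 2.0 * x_prev / (lam * x_prev + 2.0)

xs = [1.0]                            # x0 strictly inside the cone
for k in range(100):
    xs.append(ipa_step(xs[-1], lam=1.0))

# In exact arithmetic x_k = 2/(k + 2): the iterates stay positive and
# decrease toward the boundary solution, whose optimal value is f* = 0.
```

The point of the sketch is only that the barrier term keeps every iterate in the interior while the sequence still approaches a boundary solution.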

When εk > 0 for any k ∈ N (the set of natural numbers), the IPA can be viewed as an approximate interior proximal method, and it becomes exact if εk = 0 for all k ∈ N.

Proposition 3.14. For any given H ∈ D(int(Kn)) and y ∈ int(Kn), consider the problem

f(y, τ) = inf{τf(x) + H(x, y) | x ∈ V} with τ > 0.  (3.68)

Then, for each ε ≥ 0, there exist x(y, τ) ∈ V ∩ int(Kn) and g ∈ ∂εf(x(y, τ)) such that

τg + ∇1H(x(y, τ), y) = ATu  (3.69)

for some u ∈ IRm. Moreover, for such x(y, τ), we have

τf(x(y, τ)) + H(x(y, τ), y) ≤ f(y, τ) + ε.  (3.70)

Proof. Set F(x, τ) := τf(x) + H(x, y) + δV∩Kn(x), where δV∩Kn(x) is the indicator function of the set V ∩ Kn. Since domH(·, y) = C1 ⊆ Kn, it is clear that

f(y, τ) = inf{F(x, τ) | x ∈ IRn}.  (3.71)

Since f∗ > −∞, it is easy to verify that for any γ ∈ IR the following relation holds:

{x ∈ IRn | F(x, τ) ≤ γ} ⊆ {x ∈ V ∩ Kn | H(x, y) ≤ γ − τf∗} ⊆ {x ∈ C1 | H(x, y) ≤ γ − τf∗},

which together with (P4) implies that F(·, τ) has bounded level sets. In addition, by (P1)-(P3), F(·, τ) is a closed proper and strictly convex function. Hence, problem (3.71) has a unique solution, say x(y, τ). From the optimality conditions of (3.71), we get

0 ∈ ∂F (x(y, τ )) = τ ∂f (x(y, τ )) + ∇1H(x(y, τ ), y) + ∂δV∩Kn(x(y, τ ))

where the equality is due to [131, Theorem 23.8] and domf ∩ (V ∩ int(Kn)) ≠ ∅. Notice that dom∇1H(·, y) = int(Kn) and dom ∂δV∩Kn(·) = V ∩ Kn. Therefore, the last inclusion implies x(y, τ) ∈ V ∩ int(Kn), and there exists g ∈ ∂f(x(y, τ)) such that

−τ g − ∇1H(x(y, τ ), y) ∈ ∂δV∩Kn(x(y, τ )).

On the other hand, by the definition of δV∩Kn(·), it is not hard to derive that

∂δV∩Kn(x) = Im(AT), ∀x ∈ V ∩ int(Kn).

The last two relations imply that (3.69) holds for ε = 0. When ε > 0, (3.69) also holds for such x(y, τ) and g since ∂f(x(y, τ)) ⊂ ∂εf(x(y, τ)). Finally, since for each y ∈ int(Kn) the function H(·, y) is strictly convex, and since g ∈ ∂εf(x(y, τ)), we have

τf(x) + H(x, y) ≥ τf(x(y, τ)) + H(x(y, τ), y) + ⟨τg + ∇1H(x(y, τ), y), x − x(y, τ)⟩ − ε

= τf(x(y, τ)) + H(x(y, τ), y) + ⟨ATu, x − x(y, τ)⟩ − ε

= τf(x(y, τ)) + H(x(y, τ), y) − ε for all x ∈ V,

where the first equality is from (3.69) and the last one is by x, x(y, τ) ∈ V. Thus, f(y, τ) = inf{τf(x) + H(x, y) | x ∈ V} ≥ τf(x(y, τ)) + H(x(y, τ), y) − ε. □

In the following, we focus on the convergence behaviors of the IPA with H from several subclasses of D(int(Kn)), which also satisfy one of the following properties.

(P5) For any x, y ∈ int(Kn) and z ∈ C1, H(z, y) − H(z, x) ≥ ⟨∇1H(x, y), z − x⟩;

(P5') For any x, y ∈ int(Kn) and z ∈ C2, H(y, z) − H(x, z) ≥ ⟨∇1H(x, y), z − x⟩.

(P6) For each x ∈ C1, the level sets {y ∈ C2| H(x, y) ≤ γ} are bounded for all γ ∈ IR.

Specifically, we denote by F1(int(Kn)) and F2(int(Kn)) the families of functions H ∈ D(int(Kn)) satisfying (P5) and (P5'), respectively. If C1 = Kn, we denote by F1(Kn) the family of functions H ∈ D(int(Kn)) satisfying (P5) and (P6); if C2 = Kn, we write F2(Kn) for the corresponding subfamily of F2(int(Kn)). It is easy to see that the class of proximal distances F(int(Kn)) (respectively, F(Kn)) in [10] subsumes the pairs (H, H) with H ∈ F1(int(Kn)) (respectively, F1(Kn)), but it does not include any pair (H, H) with H ∈ F2(int(Kn)) (respectively, F2(Kn)).

Proposition 3.15. Let {xk} be the sequence generated by the IPA with H ∈ F1(int(Kn)) or H ∈ F2(int(Kn)). Set σν := ∑_{k=1}^ν λk. Then, the following results hold.

(a) f(xν) − f(x) ≤ [H(x, x0) + ∑_{k=1}^ν σkεk]/σν for any x ∈ V ∩ C1 if H ∈ F1(int(Kn)); f(xν) − f(x) ≤ [H(x0, x) + ∑_{k=1}^ν σkεk]/σν for any x ∈ V ∩ C2 if H ∈ F2(int(Kn)).

(b) If σν → +∞ and εk → 0, then lim inf_{ν→∞} f(xν) = f∗.

(c) The sequence {f(xk)} converges to f∗ whenever ∑_{k=1}^∞ εk < ∞.

(d) If X ≠ ∅, then {xk} is bounded with all limit points in X under (d1) or (d2) below:

(d1) X is bounded and ∑_{k=1}^∞ εk < ∞;

(d2) ∑_{k=1}^∞ λkεk < ∞ and H ∈ F1(Kn) (or H ∈ F2(Kn)).

Proof. The proofs are similar to those of [10, Theorem 4.1]. For completeness, we here take H ∈ F2(int(Kn)) as an example to prove the results.

(a) Since gk ∈ ∂εkf(xk), from the definition of the subdifferential, it follows that

f(x) ≥ f(xk) + ⟨gk, x − xk⟩ − εk, ∀x ∈ IRn.

This together with equation (3.67) implies that

λk(f(xk) − f(x)) ≤ ⟨∇1H(xk, xk−1), x − xk⟩ + λkεk, ∀x ∈ V ∩ C2.

Using (P5') with x = xk, y = xk−1 and z = x ∈ V ∩ C2, it then follows that

λk(f(xk) − f(x)) ≤ H(xk−1, x) − H(xk, x) + λkεk, ∀x ∈ V ∩ C2.  (3.72)

Summing over k = 1, 2, . . . , ν in this inequality yields that

−σνf(x) + ∑_{k=1}^ν λkf(xk) ≤ H(x0, x) − H(xν, x) + ∑_{k=1}^ν λkεk.  (3.73)

On the other hand, setting x = xk−1 in (3.72), we obtain

f(xk) − f(xk−1) ≤ [H(xk−1, xk−1) − H(xk, xk−1)]/λk + εk ≤ εk.  (3.74)

Multiplying this inequality by σk−1 (with σ0 ≡ 0) and summing over k = 1, . . . , ν, we get

∑_{k=1}^ν σk−1f(xk) − ∑_{k=1}^ν σk−1f(xk−1) ≤ ∑_{k=1}^ν σk−1εk.

Noting that σk = λk + σk−1 with σ0 ≡ 0, the above inequality reduces to

σνf(xν) − ∑_{k=1}^ν λkf(xk) ≤ ∑_{k=1}^ν σk−1εk.  (3.75)

Adding the inequalities (3.73) and (3.75) and recalling that σk = λk + σk−1, it follows that

f(xν) − f(x) ≤ [H(x0, x) − H(xν, x)]/σν + (1/σν)∑_{k=1}^ν σkεk, ∀x ∈ V ∩ C2,

which immediately implies the desired result due to the nonnegativity of H(xν, x).

(b) If σν → +∞ and εk → 0, then applying Lemma 2.2(ii) of [10] with ak = εk and bν := (1/σν)∑_{k=1}^ν λkεk yields (1/σν)∑_{k=1}^ν λkεk → 0. From part (a), it then follows that

lim inf_{ν→∞} f(xν) ≤ inf{f(x) | x ∈ V ∩ int(Kn)}.

This together with f(xν) ≥ inf{f(x) | x ∈ V ∩ Kn} implies that

lim inf_{ν→∞} f(xν) = inf{f(x) | x ∈ V ∩ int(Kn)} = f∗.

(c) From (3.74), 0 ≤ f(xk) − f∗ ≤ f(xk−1) − f∗ + εk. Using Lemma 2.1 of [10] with γk ≡ 0 and vk = f(xk) − f∗, we have that {f(xk)} converges to f∗ whenever ∑_{k=1}^∞ εk < ∞.

(d) If condition (d1) holds, then the sets {x ∈ V ∩ Kn | f(x) ≤ γ} are bounded for all γ ∈ IR, since f is closed proper convex and X = {x ∈ V ∩ Kn | f(x) ≤ f∗}. Note that (3.74) implies {xk} ⊆ {x ∈ V ∩ Kn | f(x) ≤ f(x0) + ∑_{j=1}^k εj}. Along with ∑_{k=1}^∞ εk < ∞, clearly, {xk} is bounded. Since {f(xk)} converges to f∗ and f is l.s.c., passing to the limit and recalling that {xk} ⊂ V ∩ Kn yields that each accumulation point of {xk} is a solution of (3.64).

Suppose that condition (d2) holds. If H ∈ F2(Kn), then inequality (3.72) holds for each x ∈ V ∩ Kn, and particularly for x ∈ X. Consequently,

H(xk, x) ≤ H(xk−1, x) + λkεk, ∀x ∈ X.  (3.76)

Summing over k = 1, 2, . . . , ν in the last inequality, we obtain

H(xν, x) ≤ H(x0, x) + ∑_{k=1}^ν λkεk.

This, by (P4) and ∑_{k=1}^∞ λkεk < ∞, implies that {xk} is bounded, and hence has an accumulation point. Without loss of generality, let x̂ ∈ Kn be an accumulation point of {xk}. Then there exists a subsequence {xkj} such that xkj → x̂ as j → ∞. From the lower semicontinuity of f and part (c), we get f(x̂) ≤ lim_{j→+∞} f(xkj) = f∗, which means that x̂ is a solution of (3.64). If H ∈ F1(Kn), then the last inequality becomes

H(x, xν) ≤ H(x, x0) + ∑_{k=1}^ν λkεk.

By (P6) and ∑_{k=1}^∞ λkεk < ∞, we also have that {xk} is bounded, and hence has an accumulation point. Using the same arguments as above, we get the desired result. □

An immediate byproduct of the above analysis is the following global rate-of-convergence estimate for the IPA with H ∈ F1(Kn) or H ∈ F2(Kn).

Proposition 3.16. Let {xk} be the sequence given by the IPA with H ∈ F1(Kn) or F2(Kn). If X ≠ ∅ and ∑_{k=1}^∞ εk < ∞, then f(xν) − f∗ = O(1/σν).

Proof. The result follows directly by setting x = x̄ for some x̄ ∈ X in the inequalities of Proposition 3.15(a), and noting that 0 < σk/σν ≤ 1 for all k = 1, 2, . . . , ν. □
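A quick numeric sanity check of this rate on a hypothetical 1-D toy (all choices below are assumptions for the sketch, not from the text): f(x) = x over [0, ∞) with the entropy-type distance H(x, y) = 2(x ln x − x ln y + y − x), whose first argument may sit on the boundary, so x̄ = 0 is admissible with H(0, x0) = 2x0 finite. The prox step is exact: x = y e^{−λ/2}.

```python
# Rate check on a hypothetical 1-D toy: f(x) = x over [0, inf) with the
# entropy-type distance H(x, y) = 2*(x*ln(x) - x*ln(y) + y - x).
# The prox subproblem min_x { lam*x + H(x, y) } is solved exactly by
# x = y*exp(-lam/2)   (stationarity: lam + 2*ln(x/y) = 0).
import math

x0, lam, fstar = 1.0, 1.0, 0.0
H_at_sol = 2.0 * x0                   # H(0, x0) = 2*(0 - 0 + x0 - 0)

x, worst = x0, 0.0
for nu in range(1, 51):
    x = x * math.exp(-lam / 2.0)      # exact IPA step (eps_k = 0)
    sigma = nu * lam                  # sigma_nu = sum of the lam_k
    worst = max(worst, sigma * (x - fstar))

# The bound of Proposition 3.16 with x-bar = 0 reads
# sigma_nu * (f(x_nu) - f*) <= H(0, x0) for every nu.
```

Here the gap even decays geometrically, so the O(1/σν) estimate holds with room to spare.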

To establish the global convergence of {xk} to an optimal solution of (3.64), we need to make further assumptions on X or the proximal distances in F1(Kn) and F2(Kn).

We denote by F̂1(Kn) the family of functions H ∈ F1(Kn) satisfying (P7)-(P8) below, by F̂2(Kn) the family of functions H ∈ F2(Kn) satisfying (P7')-(P8') below, and by F̄2(Kn) the family of functions H ∈ F2(Kn) satisfying (P7')-(P9') below:

(P7) For any {yk} ⊆ int(Kn) converging to y ∈ Kn, we have H(y, yk) → 0;

(P8) For any bounded sequence {yk} ⊆ int(Kn) and any y ∈ Kn with H(y, yk) → 0, there holds that λi(yk) → λi(y) for i = 1, 2;

(P7’) For any {yk} ⊆ int(Kn) converging to y ∈ Kn, we have H(yk, y) → 0;

(P8’) For any bounded sequence {yk} ⊆ int(Kn) and any y ∈ Kn with H(yk, y) → 0, there holds that λi(yk) → λi(y) for i = 1, 2;

(P9’) For any bounded sequence {yk} ⊆ int(Kn) and any y ∈ Kn with H(yk, y) → 0, there holds that yk → y.
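The difference between (P8)/(P8') and (P9') deserves a small illustration: convergence of the spectral values λi(yk) = yk1 ∓ ‖yk2‖ says nothing about the direction of the yk2-part. The hypothetical sequence below (chosen only for this sketch) rotates y2 at constant norm, so both spectral values are constant while {yk} itself does not converge:

```python
# lambda_i(v) = v1 -/+ ||v2|| for v = (v1, v2) in IR x IR^2. The sequence
# y^k below (a hypothetical example) keeps both spectral values fixed while
# rotating the y2-part: lambda_i(y^k) -> lambda_i(y) holds trivially,
# although y^k does not converge to y.
import math

r = 0.5
y = (1.0, r, 0.0)                                   # reference point in int(K^3)

def lambdas(v):
    n2 = math.hypot(v[1], v[2])
    return (v[0] - n2, v[0] + n2)

gap = 0.0
for k in range(1, 50):
    yk = (1.0, r * math.cos(k), r * math.sin(k))    # same spectral values as y
    l1, l2 = lambdas(yk)
    assert abs(l1 - lambdas(y)[0]) < 1e-12
    assert abs(l2 - lambdas(y)[1]) < 1e-12
    gap = max(gap, math.dist(yk, y))                # but the vectors stay apart
```

This is exactly the gap that the extra assumption on X in Proposition 3.17(b), or property (P9'), is designed to close.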

It is easy to see that all previous subclasses of D(int(Kn)) have the following relations:

F̂1(Kn) ⊆ F1(Kn) ⊆ F1(int(Kn)),  F̄2(Kn) ⊆ F̂2(Kn) ⊆ F2(Kn) ⊆ F2(int(Kn)).

Proposition 3.17. Let {xk} be generated by the IPA with H ∈ F1(int(Kn)) or F2(int(Kn)). Suppose that X is nonempty, ∑_{k=1}^∞ λkεk < ∞ and ∑_{k=1}^∞ εk < ∞.

(a) If X is a singleton, then {xk} converges to the optimal solution of (3.64).

(b) If X includes at least two elements and, for any x = (x1, x2), x̄ = (x̄1, x̄2) ∈ X with x ≠ x̄, it holds that x1 ≠ x̄1 or ‖x2‖ ≠ ‖x̄2‖, then {xk} converges to an optimal solution of (3.64) whenever H ∈ F̂1(Kn) (or H ∈ F̂2(Kn)).

(c) If H ∈ F̄2(Kn), then {xk} converges to an optimal solution of (3.64).

Proof. Part (a) follows directly from Proposition 3.15(d1). We next consider part (b). Assume that H ∈ F̂2(Kn). Since ∑_{k=1}^∞ λkεk < ∞, from (3.76) and Lemma 2.1 of [10], it follows that the sequence {H(xk, x)} is convergent for any x ∈ X. Let x̄ be the limit of a subsequence {xkl}. By Proposition 3.15(d2), x̄ ∈ X. Consequently, {H(xk, x̄)} is convergent. By (P7'), H(xkl, x̄) → 0, and so H(xk, x̄) → 0. Along with (P8'), λi(xk) → λi(x̄) for i = 1, 2, i.e.,

xk1 − ‖xk2‖ → x̄1 − ‖x̄2‖ and xk1 + ‖xk2‖ → x̄1 + ‖x̄2‖ as k → ∞.

This implies that xk1 → x̄1 and ‖xk2‖ → ‖x̄2‖. Together with the given assumption on X, we have that xk → x̄. Suppose now that H ∈ F̂1(Kn). The inequality (3.76) becomes

H(x, xk) ≤ H(x, xk−1) + λkεk, ∀x ∈ X,

and using (P7)-(P8) and the same arguments as above then yields the result. Part (c) follows directly from the arguments above and property (P9'). □

When all points of the nonempty X lie on the boundary of Kn, we must have x1 ≠ x̄1 or ‖x2‖ ≠ ‖x̄2‖ for any x = (x1, x2), x̄ = (x̄1, x̄2) ∈ X with x ≠ x̄, and the assumption on X in (b) is automatically satisfied. Since the solutions of (3.64) generally lie on the boundary of Kn, the assumption on X in Proposition 3.17(b) is much weaker than the one in Proposition 3.17(a).

Up to now, we have studied two types of convergence results for the IPA, organized by the class in which the proximal distance H lies. Propositions 3.15 and 3.16 show that the largest, and least demanding, classes F1(int(Kn)) and F2(int(Kn)) provide reasonable convergence properties for the IPA under minimal assumptions on the problem data.

This coincides with interior proximal methods for convex programming over nonnegative orthant cones; see [10]. The smallest subclass F̄2(Kn) of F2(int(Kn)) guarantees that {xk} converges to an optimal solution provided that X is nonempty. The smaller class F̂2(Kn) guarantees the global convergence of the sequence {xk} to an optimal solution under an additional assumption besides the nonemptiness of X. Moreover, we will illustrate that there are indeed examples in the class F̄2(Kn). For the smallest subclass F̂1(Kn) of F1(int(Kn)), the analysis shows that it seems hard to find an example, although this class guarantees the convergence of {xk} to an optimal solution by Proposition 3.17(b).

Next, we describe three ways to construct a proximal distance w.r.t. int(Kn) and analyze their respective advantages and disadvantages. All of them exploit a l.s.c. (lower semi-continuous) proper univariate function to produce such a proximal distance.

In addition, with such a proximal distance and the Euclidean distance, we obtain the regularized ones.

The first way produces proximal distances for the class F1(int(Kn)). It is based on the composition of a univariate function φ with the determinant function det(·), where φ : IR → (−∞, ∞] is a l.s.c. proper function satisfying the following conditions:

(B1) domφ ⊆ [0, ∞), int(domφ) = (0, ∞), and φ is continuous on its domain;

(B2) for any t1, t2 ∈ domφ, there holds

φ(t1^r t2^{1−r}) ≤ rφ(t1) + (1 − r)φ(t2), ∀r ∈ [0, 1];  (3.77)

(B3) φ is continuously differentiable on int(domφ) with dom(φ′) = (0, ∞);

(B4) φ′(t) < 0 for all t ∈ (0, ∞), lim_{t→0+} φ(t) = ∞, and lim_{t→∞} φ(t²)/t ≥ 0.

With such a univariate φ, we define the function H : IRn × IRn → (−∞, ∞] as in (3.15):

H(x, y) := φ(det(x)) − φ(det(y)) − ⟨∇φ(det(y)), x − y⟩ if x, y ∈ int(Kn), and H(x, y) := ∞ otherwise.  (3.78)

By the conditions (B1)-(B4), we may prove that H has the following properties.

Proposition 3.18. Let H be defined as in (3.78) with φ satisfying (B1)-(B4). Then, the following hold.

(a) For any fixed y ∈ int(Kn), H(·, y) is strictly convex over int(Kn).

(b) For any fixed y ∈ int(Kn), H(·, y) is continuously differentiable on int(Kn) with

∇1H(x, y) = 2φ′(det(x))(x1, −x2) − 2φ′(det(y))(y1, −y2)  (3.79)

for all x ∈ int(Kn), where x = (x1, x2), y = (y1, y2) ∈ IR × IRn−1.

(c) H(x, y) ≥ 0 for all x, y ∈ IRn, and H(y, y) = 0 for all y ∈ int(Kn).

(d) For any y ∈ int(Kn), the sets {x ∈ int(Kn) | H(x, y) ≤ γ} are bounded for all γ ∈ IR.

(e) For any x, y, z ∈ int(Kn), the following three-point identity holds: H(z, y) = H(z, x) + H(x, y) + ⟨∇1H(x, y), z − x⟩.

Proof. (a) It suffices to prove that φ(det(x)) is strictly convex on int(Kn). By Proposition 1.8(a), we have

det(αx + (1 − α)z) > (det(x))^α (det(z))^{1−α}, ∀α ∈ (0, 1),

for all x, z ∈ int(Kn) with x ≠ z. Since φ′(t) < 0 for all t ∈ (0, +∞), φ is decreasing on (0, +∞). This, together with condition (B2), yields

φ[det(αx + (1 − α)z)] < φ[(det(x))^α (det(z))^{1−α}]

≤ αφ[det(x)] + (1 − α)φ[det(z)], ∀α ∈ (0, 1)

for any x, z ∈ int(Kn) with x ≠ z. This means that φ(det(x)) is strictly convex on int(Kn).

(b) Since det(x) is continuously differentiable on IRn and φ is continuously differentiable on (0, ∞), we have that φ(det(x)) is continuously differentiable on int(Kn). This means that for any fixed y ∈ int(Kn), H(·, y) is continuously differentiable on int(Kn). By a simple computation, we immediately obtain the formula in (3.79).

(c) Since φ(det(x)) is strictly convex and continuously differentiable on int(Kn), we have

φ(det(x)) > φ(det(y)) + ⟨∇φ(det(y)), x − y⟩

for any x, y ∈ int(Kn) with x ≠ y; in particular, H(y, y) = 0 for all y ∈ int(Kn). In addition, from this inequality and the continuity of φ on its domain, it follows that

φ(det(x)) ≥ φ(det(y)) + ⟨∇φ(det(y)), x − y⟩

for any x, y ∈ int(Kn). By the definition of H, we have H(x, y) ≥ 0 for all x, y ∈ IRn.

(d) Let {xk} ⊆ int(Kn) be a sequence with ‖xk‖ → ∞. For any fixed y = (y1, y2) ∈ int(Kn), we next prove that the sequence {H(xk, y)} is unbounded by considering three cases, from which the desired result follows. For convenience, we write xk = (xk1, xk2) for each k.

Case 1: the sequence {det(xk)} has a zero limit point. Without loss of generality, we assume that det(xk) → 0 as k → ∞. Together with lim_{t→0+} φ(t) = ∞, it readily follows that lim_{k→∞} φ(det(xk)) = ∞. In addition, for each k we have that

⟨∇φ(det(y)), xk⟩ = 2φ′(det(y))(xk1y1 − (xk2)Ty2) ≤ 2φ′(det(y))y1(xk1 − ‖xk2‖) ≤ 0,  (3.80)

where the inequality holds by φ′(t) < 0 for all t > 0, the Cauchy-Schwarz inequality, and y ∈ int(Kn). From (3.78), it then follows that lim_{k→∞} H(xk, y) = ∞.

Case 2: the sequence {det(xk)} is unbounded. Noting that det(xk) > 0 for each k, we must have det(xk) → +∞ as k → ∞. Since φ is decreasing on its domain, we have that

φ(det(xk))/‖xk‖ = √2 φ(λ1(xk)λ2(xk)) / √((λ1(xk))² + (λ2(xk))²) ≥ φ[(λ2(xk))²]/λ2(xk).

Note that λ2(xk) → ∞ in this case, and from the last equation and (B4) it follows that

lim_{k→∞} φ(det(xk))/‖xk‖ ≥ lim_{k→∞} φ[(λ2(xk))²]/λ2(xk) ≥ 0.

In addition, since {xk/‖xk‖} is bounded, we may assume without loss of generality that xk/‖xk‖ → x̂ = (x̂1, x̂2) ∈ IR × IRn−1. Then x̂ ∈ Kn, ‖x̂‖ = 1, and x̂1 > 0 (if not, x̂ = 0), and hence

lim_{k→∞} ⟨∇φ(det(y)), xk/‖xk‖⟩ = ⟨∇φ(det(y)), x̂⟩ = 2φ′(det(y))(x̂1y1 − x̂2^Ty2) ≤ 2φ′(det(y))x̂1(y1 − ‖y2‖) < 0.

The two facts show that lim_{k→∞} H(xk, y)/‖xk‖ > 0, and consequently lim_{k→∞} H(xk, y) = ∞.

Case 3: the sequence {det(xk)} has some limit point ω with 0 < ω < ∞. Without loss of generality, we assume that det(xk) → ω as k → ∞. Since {xk} is unbounded and {xk} ⊂ int(Kn), we must have xk1 → ∞. In addition, by (3.80) and φ′(t) < 0 for t > 0,

−⟨∇φ(det(y)), xk⟩ ≥ −2φ′(det(y))(xk1y1 − ‖xk2‖‖y2‖) ≥ −2φ′(det(y))xk1(y1 − ‖y2‖).

This along with y ∈ int(Kn) implies that −⟨∇φ(det(y)), xk⟩ → +∞ as k → ∞. Noting that {φ(det(xk))} is bounded, from (3.78) it follows that lim_{k→∞} H(xk, y) = ∞.

(e) For any x, y, z ∈ int(Kn), from the definition of H it follows that

H(z, y) − H(z, x) − H(x, y) = ⟨∇φ(det(x)) − ∇φ(det(y)), z − x⟩ = ⟨∇1H(x, y), z − x⟩,

where the last equality is by part (b). The proof is thus complete. □

Proposition 3.18 shows that the function H defined as in (3.15) with φ satisfying (B1)-(B4) is a proximal distance w.r.t. int(Kn) with domH = int(Kn) × int(Kn); moreover, H ∈ F1(int(Kn)). The conditions (B1) and (B3)-(B4) are easy to check, whereas for the condition (B2) we have the following important characterizations by Lemma 2.2 of [123].

Lemma 3.8. A function φ : (0, ∞) → IR satisfies (B2) if and only if one of the following conditions holds:

(a) the function φ(exp(·)) is convex on IR;

(b) φ(t1t2) ≤ (1/2)[φ(t1²) + φ(t2²)] for any t1, t2 > 0;

(c) φ′(t) + tφ″(t) ≥ 0 for all t > 0, provided φ is twice differentiable.
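A numerical spot-check of these characterizations for one concrete choice (an assumption made for this sketch): φ(t) = 1/t, i.e. φ(t) = t^{1−q}/(q − 1) with q = 2 as in Example 3.9 below. Condition (b) then reduces to the AM-GM inequality, and (c) gives φ′(t) + tφ″(t) = 1/t² ≥ 0:

```python
# Spot-check of Lemma 3.8 for the hypothetical sample choice phi(t) = 1/t:
#   (b)  phi(t1*t2) <= (phi(t1^2) + phi(t2^2))/2   (here: AM-GM), and
#   (c)  phi'(t) + t*phi''(t) = -1/t**2 + 2/t**2 = 1/t**2 >= 0.
phi = lambda t: 1.0 / t
grid = [0.1 * i for i in range(1, 40)]

ok_b = all(phi(s * t) <= 0.5 * (phi(s ** 2) + phi(t ** 2)) + 1e-12
           for s in grid for t in grid)
ok_c = all(-1.0 / t ** 2 + t * (2.0 / t ** 3) >= 0.0 for t in grid)
```

Of course a grid check is no proof; it only illustrates how cheaply (B2) can be screened for a candidate φ before attempting the verification analytically.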

Example 3.8. Let φ(t) = − ln t if t > 0, and φ(t) = ∞ otherwise.

Solution. It is easy to verify that φ satisfies (B1)-(B4). By formula (3.78), the induced proximal distance is

H(x, y) := − ln(det(x)/det(y)) + 2xTJny/det(y) − 2 for all x, y ∈ int(Kn), and H(x, y) := ∞ otherwise,

where Jn is the diagonal matrix whose first entry is 1 and whose remaining (n − 1) entries are −1. This is exactly the proximal distance given by [10]. Since H ∈ F1(int(Kn)), the results of Proposition 3.15(a)-(d1) hold when this proximal distance is used in the IPA. □
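The formula above is easy to evaluate directly; the following sketch (with sample points chosen arbitrarily) checks property (P3) numerically for n = 3:

```python
# The log-determinant distance of Example 3.8, written out for n = 3:
# H(x, y) = -ln(det(x)/det(y)) + 2*x^T*J*y/det(y) - 2, J = diag(1, -1, -1),
# where det(v) = v1^2 - ||v2||^2. Sanity check of (P3): H(y, y) = 0 and
# H(x, y) > 0 for x != y at two arbitrarily chosen interior points.
import math

def socdet(v):
    return v[0] ** 2 - v[1] ** 2 - v[2] ** 2

def H_logdet(x, y):
    xJy = x[0] * y[0] - x[1] * y[1] - x[2] * y[2]
    return -math.log(socdet(x) / socdet(y)) + 2.0 * xJy / socdet(y) - 2.0

x = (2.0, 0.5, 0.3)       # interior: 2 > ||(0.5, 0.3)||
y = (3.0, 1.0, -0.5)      # interior: 3 > ||(1.0, -0.5)||
h_xy, h_yy = H_logdet(x, y), H_logdet(y, y)
```

The same two helper functions also make the blow-up of H(·, y) near the boundary (property (P4)) easy to observe by letting det(x) → 0.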

Example 3.9. Take φ(t) = t^{1−q}/(q − 1) (q > 1) if t > 0, and otherwise φ(t) = ∞.

Solution. It is not hard to check that φ satisfies (B1)-(B4). In light of (3.78), we compute that

H(x, y) := [(det(x))^{1−q} − (det(y))^{1−q}]/(q − 1) + 2xTJny/(det(y))^q − 2(det(y))^{1−q} for all x, y ∈ int(Kn), and H(x, y) := ∞ otherwise,

where Jn is the same diagonal matrix as in Example 3.8. Since H ∈ F1(int(Kn)), the results of Proposition 3.15(a)-(d1) hold when this proximal distance is used in the IPA. □

We should emphasize that the first way cannot produce proximal distances in the class F1(Kn), and hence in F̂1(Kn), since the condition lim_{t→0+} φ(t) = ∞ is necessary to guarantee that H has property (P4), but it implies that the domain of H(·, y) for any y ∈ int(Kn) cannot be continuously extended to Kn. Thus, when choosing such proximal distances for the IPA, we cannot apply Proposition 3.15(d2) or Proposition 3.17.

The other two ways are both based on the composition of the trace function tr(·) with a vector-valued function induced by a univariate φ via (1.9). For convenience, in the sequel, for any l.s.c. proper function φ : IR → (−∞, ∞], we define d : IR × IR → (−∞, ∞] by

d(s, t) := φ(s) − φ(t) − φ′(t)(s − t) if s ∈ domφ and t ∈ dom(φ′), and d(s, t) := ∞ otherwise.  (3.81)

The second way also produces the proximal distances for the class F1(int(Kn)), which requires φ : IR → (−∞, ∞] to be a l.s.c. proper function satisfying the conditions:

(C1) domφ ⊆ [0, +∞) and int(domφ) = (0, ∞);

(C2) φ is continuous and strictly convex on its domain;

(C3) φ is continuously differentiable on int(domφ) with dom(φ′) = (0, ∞);

(C4) for any fixed t > 0, the sets {s ∈ domφ | d(s, t) ≤ γ} are bounded for all γ ∈ IR; for any fixed s ∈ domφ, the sets {t > 0 | d(s, t) ≤ γ} are bounded for all γ ∈ IR.

Let φsoc be the vector-valued function induced by φ via (1.9) and write dom(φsoc) = C1. Clearly, C1 ⊆ Kn and int C1 = int(Kn). Define the function H : IRn × IRn → (−∞, ∞] by

H(x, y) := tr(φsoc(x)) − tr(φsoc(y)) − ⟨∇tr(φsoc(y)), x − y⟩ for x ∈ C1, y ∈ int(Kn), and H(x, y) := ∞ otherwise.  (3.82)

Using (1.6), Proposition 1.3, Lemma 3.3, the conditions (C1)-(C4), and similar arguments to [116, Proposition 3.1 and Proposition 3.2] (also see Section 3.1), it is not difficult to argue that H has the following favorable properties.

Proposition 3.19. Let H be defined by (3.82) with φ satisfying (C1)-(C4). Then, the following hold.

(a) For any fixed y ∈ int(Kn), H(·, y) is continuous and strictly convex on C1.

(b) For any fixed y ∈ int(Kn), H(·, y) is continuously differentiable on int(Kn) with

∇1H(x, y) = ∇tr(φsoc(x)) − ∇tr(φsoc(y)) = 2[(φ′)soc(x) − (φ′)soc(y)].

(c) H(x, y) ≥ 0 for all x, y ∈ IRn, and H(y, y) = 0 for any y ∈ int(Kn).

(d) H(x, y) ≥ ∑_{i=1}^2 d(λi(x), λi(y)) ≥ 0 for any x ∈ C1 and y ∈ int(Kn).

(e) For any fixed y ∈ int(Kn), the sets {x ∈ C1| H(x, y) ≤ γ} are bounded for all γ ∈ IR;

for any fixed x ∈ C1, the sets {y ∈ int(Kn) | H(x, y) ≤ γ} are bounded for all γ ∈ IR.

(f) For any x, y ∈ int(Kn) and z ∈ C1, the following three-point identity holds:

H(z, y) = H(z, x) + H(x, y) + ⟨∇1H(x, y), z − x⟩.

Proposition 3.19 shows that the function H defined by (3.82) with φ satisfying (C1)-(C4) is a proximal distance w.r.t. int(Kn) with dom H = C1× int(Kn), and furthermore, such proximal distances belong to the class F1(int(Kn)). In particular, when domφ = [0, ∞), they also belong to the class F1(Kn). We next present some specific examples.

Example 3.10. Let φ(t) = t ln t − t if t ≥ 0, and otherwise φ(t) = ∞, where we stipulate 0 ln 0 = 0.

Solution. It is easy to verify that φ satisfies (C1)-(C4) with domφ = [0, ∞). By formulas (1.9) and (3.82), we compute that H has the following expression:

H(x, y) = tr(x ◦ ln x − x ◦ ln y + y − x) for x ∈ Kn, y ∈ int(Kn), and H(x, y) = ∞ otherwise. □

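The entropy-like distance of Example 3.10 can be evaluated through the spectral decomposition ln x = ln(λ1(x))u1 + ln(λ2(x))u2. The sketch below (with arbitrarily chosen interior sample points, so that both spectral values are positive) checks H(y, y) = 0 and H(x, y) > 0 for n = 3:

```python
# Entropy-like distance of Example 3.10 via the spectral decomposition:
# for interior v (so v2 != 0 and both spectral values are positive),
# ln v = a*e + b*w with a = (ln l1 + ln l2)/2, b = (ln l2 - ln l1)/2,
# and H(x, y) = tr(x o ln x - x o ln y + y - x) = 2*x^T(ln x - ln y) + 2*(y1 - x1).
import math

def socln(v):
    n2 = math.hypot(v[1], v[2])
    l1, l2 = v[0] - n2, v[0] + n2
    w = (v[1] / n2, v[2] / n2)
    a = 0.5 * (math.log(l1) + math.log(l2))
    b = 0.5 * (math.log(l2) - math.log(l1))
    return (a, b * w[0], b * w[1])

def H_entropy(x, y):
    lx, ly = socln(x), socln(y)
    dot = sum(xi * (li - mi) for xi, li, mi in zip(x, lx, ly))
    return 2.0 * dot + 2.0 * (y[0] - x[0])

x = (2.0, 0.5, 0.3)
y = (3.0, 1.0, -0.5)
h_xy, h_yy = H_entropy(x, y), H_entropy(y, y)
```

Unlike the log-determinant distance of Example 3.8, this H extends continuously to first arguments on the boundary of the cone (with 0 ln 0 = 0), which is exactly what places it in F1(Kn).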
Example 3.11. Let φ(t) = t^p − t^q if t ≥ 0, and otherwise φ(t) = ∞, where p ≥ 1 and 0 < q < 1.

Solution. We can show that φ satisfies the conditions (C1)-(C4) with dom(φ) = [0, ∞). When p = 1 and q = 1/2, from formulas (1.9) and (3.82), we derive that

H(x, y) = tr[ y^{1/2} − x^{1/2} + (tr(y^{1/2})e − y^{1/2}) ◦ (x − y)/(2√(det(y))) ] for x ∈ Kn, y ∈ int(Kn), and H(x, y) = ∞ otherwise. □

Example 3.12. Let φ(t) = −t^q if t ≥ 0, and otherwise φ(t) = ∞, where 0 < q < 1.

Solution. We can show that φ satisfies the conditions (C1)-(C4) with domφ = [0, ∞). Now

H(x, y) = (1 − q)tr(y^q) − tr(x^q) + q tr(y^{q−1} ◦ x) for x ∈ Kn, y ∈ int(Kn), and H(x, y) = ∞ otherwise. □

Example 3.13. Let φ(t) = − ln t + t − 1 if t > 0, and otherwise φ(t) = ∞.

Solution. It is easy to check that φ satisfies (C1)-(C4) with domφ = (0, ∞). The induced proximal distance is

H(x, y) = tr(ln y) − tr(ln x) + 2⟨y^{−1}, x⟩ − 2 for x, y ∈ int(Kn), and H(x, y) = ∞ otherwise.

By a simple computation, we see that this proximal distance is the same as the one given in Example 3.8, and as the one induced by φ(t) = − ln t (t > 0) via formula (3.82). □

Clearly, the proximal distances in Examples 3.10–3.12 belong to the class F1(Kn).

Also, by Proposition 3.20 below, the proximal distances in Examples 3.10–3.11 also satisfy (P8) since the corresponding φ also satisfies the following condition (C5):

(C5) For any bounded sequence {ak} ⊆ int(domφ) and a ∈ domφ such that lim_{k→∞} d(a, ak) = 0, there holds a = lim_{k→∞} ak, where d is defined as in (3.81).

Proposition 3.20. Let H be defined as in (3.82) with φ satisfying (C1)-(C5) and domφ = [0, ∞). Then, for any bounded sequence {yk} ⊆ int(Kn) and y ∈ Kn such that H(y, yk) → 0, we have λi(yk) → λi(y) for i = 1, 2.

Proof. From Proposition 3.19(d) and the nonnegativity of d, for each k we have

H(y, yk) ≥ d(λi(y), λi(yk)) ≥ 0, i = 1, 2.

This, together with the assumption H(y, yk) → 0, implies that d(λi(y), λi(yk)) → 0 for i = 1, 2. Notice that {λi(yk)} ⊂ int(domφ) and λi(y) ∈ domφ for i = 1, 2 by Property 1.1(c). From the condition (C5), we immediately obtain λi(yk) → λi(y) for i = 1, 2. □

Nevertheless, we should point out that the proximal distance H given by (3.82) with φ satisfying (C1)-(C4) and domφ = [0, ∞) generally does not have the property (P7), even if φ satisfies the condition (C6) below. This fact will be illustrated by Example 3.14.

(C6) For any {ak} ⊆ (0, ∞) converging to a ∈ [0, ∞), lim_{k→∞} d(a, ak) = 0.

Example 3.14. Let H be the proximal distance induced by the entropy function φ in Example 3.10.

Solution. It is easy to verify that φ satisfies the conditions (C1)-(C6). Here we present a sequence {yk} ⊂ int(K3) which converges to y ∈ K3, but H(y, yk) → ∞. Let

yk = ( √(2(1 + e^{−k³})), √(1 + k^{−1} − e^{−k³}), √(1 − k^{−1} + e^{−k³}) ) ∈ int(K3) and y = (√2, 1, 1) ∈ K3.

By the expression of H(y, yk), i.e., H(y, yk) = tr(y ◦ ln y) − tr(y ◦ ln yk) + tr(yk − y), it suffices to prove that lim_{k→∞} −tr(y ◦ ln yk) = ∞, since lim_{k→∞} tr(yk − y) = 0 and tr(y ◦ ln y) = λ2(y) ln(λ2(y)) < ∞. By the definition of ln yk, we have

tr(y ◦ ln yk) = ln(λ1(yk))(y1 − (y2)Tȳ2k) + ln(λ2(yk))(y1 + (y2)Tȳ2k)  (3.83)

for y = (y1, y2), yk = (yk1, yk2) ∈ IR × IR² with ȳ2k = yk2/‖yk2‖. By computing,

ln(λ1(yk)) = ln √2 − ln(1 + √(1 + e^{−k³})) − k³,

y1 − (y2)Tȳ2k = (1/‖yk2‖) [ (−k^{−1} + e^{−k³})/(1 + √(1 + k^{−1} − e^{−k³})) + (k^{−1} − e^{−k³})/(1 + √(1 − k^{−1} + e^{−k³})) ].

The last two equalities imply that lim_{k→∞} ln(λ1(yk))(y1 − (y2)Tȳ2k) = −∞. In addition, by noting that yk2 ≠ 0 for each k, we compute that

lim_{k→∞} ln(λ2(yk))(y1 + (y2)Tȳ2k) = ln(λ2(y))(y1 + (y2)Ty2/‖y2‖) = λ2(y) ln(λ2(y)).

From the last two limits, we immediately have lim_{k→∞} −tr(y ◦ ln yk) = ∞. □

Thus, when the proximal distance in the IPA is chosen as the one given by (3.82) with φ satisfying (C1)-(C6) and domφ = [0, ∞), Proposition 3.17(b) may not apply, i.e.,

the global convergence to an optimal solution may not be guaranteed. This differs from interior proximal methods for convex programming over nonnegative orthant cones, by noting that φ is now a univariate Bregman function. Similarly, it seems hard to find examples for the class F+(Kn) in [10] so that Theorem 2.2 therein can be applied, since it also requires (P7).

The third way will produce the proximal distances for the class F2(int(Kn)), which needs a l.s.c. proper function φ : IR → (−∞, ∞] satisfying the following conditions:

(D1) φ is strictly convex and continuous on domφ, and φ is continuously differentiable on a subset of domφ, where dom(φ′) ⊆ domφ ⊆ [0, ∞) and int(dom(φ′)) = (0, ∞);

(D2) φ is twice continuously differentiable on int(domφ) and lim_{t→0+} φ″(t) = ∞;

(D3) φ′(t)t − φ(t) is convex on dom(φ′), and φ′ is strictly concave on dom(φ′);

(D4) φ′ is SOC-concave on dom(φ′).

With such a univariate φ, we define the proximal distance H : IRn × IRn → (−∞, ∞] by

H(x, y) := tr(φsoc(y)) − tr(φsoc(x)) − ⟨∇tr(φsoc(x)), y − x⟩ for x ∈ C1, y ∈ C2, and H(x, y) := ∞ otherwise,  (3.84)

where C1 and C2 are the domains of φsoc and (φ′)soc, respectively. By the relation between dom(φ) and dom(φ′), obviously, C2 ⊆ C1 ⊆ Kn and int C1 = int C2 = int(Kn).

Lemma 3.9. Let φ : IR → (−∞, ∞] be a l.s.c. proper function satisfying (D1)-(D4).

Then, the following hold.

(a) tr[(φ′)soc(x) ◦ x − φsoc(x)] is convex on C1 and continuously differentiable on int C1.

(b) For any fixed y ∈ IRn, ⟨(φ′)soc(x), y⟩ is continuously differentiable in x on int C1; moreover, it is strictly concave over C1 whenever y ∈ int(Kn).

Proof. (a) Let ψ(t) := φ′(t)t − φ(t). Then, by (D2) and (D3), ψ(t) is convex on dom(φ′) and continuously differentiable on int(dom(φ′)) = (0, +∞). Since tr[(φ′)soc(x) ◦ x − φsoc(x)] = tr[ψsoc(x)], using Lemma 3.3(b) and (c) immediately yields part (a).

(b) From (D2) and Lemma 3.3(a), (φ′)soc(·) is continuously differentiable on int C1. This implies that ⟨y, (φ′)soc(x)⟩ for any fixed y is continuously differentiable on int C1. We next show that it is also strictly concave in x on C1 whenever y ∈ int(Kn). Note that tr[(φ′)soc(·)] is strictly concave on C1 since φ′ is strictly concave on dom(φ′). Consequently,

tr[(φ′)soc(βx + (1 − β)z)] > β tr[(φ′)soc(x)] + (1 − β) tr[(φ′)soc(z)], ∀0 < β < 1

for any x, z ∈ C1 with x ≠ z. This implies that

(φ′)soc(βx + (1 − β)z) − β(φ′)soc(x) − (1 − β)(φ′)soc(z) ≠ 0.

In addition, since φ′ is SOC-concave on dom(φ′), it follows that

(φ′)soc[βx + (1 − β)z] − β(φ′)soc(x) − (1 − β)(φ′)soc(z) ⪰Kn 0.

Thus, for any fixed y ∈ int(Kn), the last two relations imply that

⟨y, (φ′)soc[βx + (1 − β)z] − β(φ′)soc(x) − (1 − β)(φ′)soc(z)⟩ > 0.

This shows that ⟨y, (φ′)soc(x)⟩ for any fixed y ∈ int(Kn) is strictly concave on C1. □

Proposition 3.21. Let H be defined as in (3.84) with φ satisfying (D1)-(D4). Then, the following hold.

(a) H(x, y) ≥ 0 for any x, y ∈ IRn, and H(y, y) = 0 for any y ∈ int(Kn).

(b) For any fixed y ∈ C2, H(·, y) is continuous in C1, and it is strictly convex on C1 whenever y ∈ int(Kn).

(c) For any fixed y ∈ C2, H(·, y) is continuously differentiable on int(Kn) with

∇1H(x, y) = 2∇(φ′)soc(x)(x − y).

Moreover, dom∇1H(·, y) = int(Kn) whenever y ∈ int(Kn).

(d) H(x, y) ≥ Σ_{i=1}^{2} d(λi(y), λi(x)) ≥ 0 for any x ∈ C1 and y ∈ C2.

(e) For any fixed y ∈ C2, the sets {x ∈ C1| H(x, y) ≤ γ} are bounded for all γ ∈ IR.

(f) For all x, y ∈ int(Kn) and z ∈ C2, H(x, z) − H(y, z) ≥ 2⟨∇1H(y, x), z − y⟩.

Proposition 3.21 demonstrates that the function H defined by (3.84) with φ satisfying (D1)-(D4) is a proximal distance w.r.t. int(Kn) and possesses the property (P5'), and therefore belongs to the class F2(int(Kn)). If, in addition, domφ = [0, ∞), then H belongs to the class F2(Kn). The conditions (D1)-(D3) are easy to check, and for the condition (D4), we can employ the characterizations in [41, 44] to verify whether φ′ is SOC-concave. Some examples are presented as follows.

Example 3.15. Let φ(t) = t ln t − t + 1 if t ≥ 0, and otherwise φ(t) = ∞.

Solution. It is easy to verify that φ satisfies (D1)-(D3) with domφ = [0, ∞) and dom(φ′) = (0, ∞). By Example 2.12(c), φ′ is SOC-concave on (0, ∞). Using formulas (1.9) and (3.84), we have

H(x, y) = tr(y ◦ ln y − y ◦ ln x + x − y) for x ∈ int(Kn), y ∈ Kn, and H(x, y) = ∞ otherwise.


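By Proposition 3.21(a), the distance of Example 3.15 is nonnegative and vanishes on the diagonal. A quick numerical sanity check (a sketch of ours for K^3; `lnsoc` is a hypothetical helper for the SOC logarithm, and the trace is evaluated via tr(a ◦ b) = 2⟨a, b⟩ and tr(z) = 2z1):

```python
import numpy as np

def lnsoc(x):                # SOC logarithm via the spectral decomposition (x in int(K^3))
    w = np.linalg.norm(x[1:])
    l1, l2 = x[0] - w, x[0] + w
    d = x[1:]/w if w > 0 else np.r_[1.0, np.zeros(len(x) - 2)]
    return 0.5*np.r_[np.log(l1) + np.log(l2), (np.log(l2) - np.log(l1))*d]

def H(x, y):
    # tr(y o ln y - y o ln x + x - y), using tr(a o b) = 2<a,b> and tr(z) = 2*z[0];
    # interior sample points only (boundary y is handled by t ln t -> 0 in the text)
    return 2.0*(np.dot(y, lnsoc(y) - lnsoc(x)) + x[0] - y[0])

rng = np.random.default_rng(0)
def draw():                  # random point in int(K^3)
    v = rng.normal(size=2)
    return np.r_[np.linalg.norm(v) + rng.uniform(0.1, 2.0), v]

for _ in range(200):
    x, y = draw(), draw()
    assert H(x, y) >= -1e-9          # Prop. 3.21(a): H >= 0
    assert abs(H(y, y)) < 1e-9       # ... and H(y, y) = 0
```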

Example 3.16. Let φ(t) = t^{q+1}/(q + 1) if t ≥ 0, and otherwise φ(t) = ∞, where 0 < q < 1.

Solution. It is easy to show that φ satisfies (D1)-(D3) with domφ = [0, ∞) and dom(φ′) = [0, ∞). By Example 2.12, φ′ is also SOC-concave on [0, ∞). By (1.9) and (3.84), we compute that

H(x, y) = (1/(q + 1)) tr(y^{q+1}) + (q/(q + 1)) tr(x^{q+1}) − tr(x^q ◦ y) for x ∈ int(Kn), y ∈ Kn, and H(x, y) = ∞ otherwise.


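The same kind of check applies to the distance of Example 3.16. In the sketch below (hypothetical helper names `eigs` and `powsoc`; interior sample points only), the traces are evaluated through the spectral values, using tr(z) = λ1(z) + λ2(z) and tr(a ◦ b) = 2⟨a, b⟩:

```python
import numpy as np

def eigs(x):                 # spectral values of x w.r.t. K^3
    w = np.linalg.norm(x[1:])
    return x[0] - w, x[0] + w

def powsoc(x, p):            # SOC power x^p = lam1^p u1 + lam2^p u2
    w = np.linalg.norm(x[1:])
    l1, l2 = x[0] - w, x[0] + w
    d = x[1:]/w if w > 0 else np.r_[1.0, np.zeros(len(x) - 2)]
    return 0.5*np.r_[l1**p + l2**p, (l2**p - l1**p)*d]

def H(x, y, q):
    # (1/(q+1)) tr(y^{q+1}) + (q/(q+1)) tr(x^{q+1}) - tr(x^q o y)
    l1y, l2y = eigs(y)
    l1x, l2x = eigs(x)
    return (l1y**(q+1) + l2y**(q+1))/(q+1) \
         + q*(l1x**(q+1) + l2x**(q+1))/(q+1) - 2.0*np.dot(powsoc(x, q), y)

rng = np.random.default_rng(1)
def draw():                  # random point in int(K^3)
    v = rng.normal(size=2)
    return np.r_[np.linalg.norm(v) + rng.uniform(0.1, 2.0), v]

q = 0.5
for _ in range(200):
    x, y = draw(), draw()
    assert H(x, y, q) >= -1e-9       # Prop. 3.21(a): H >= 0
    assert abs(H(y, y, q)) < 1e-9    # ... and H(y, y) = 0
```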

Example 3.17. Let φ(t) = (1 + t) ln(1 + t) + t^{q+1}/(q + 1) if t ≥ 0, and otherwise φ(t) = ∞, where 0 < q < 1.

Solution. We can verify that φ satisfies (D1)-(D3) with domφ = dom(φ′) = [0, ∞). From Example 2.12, φ′ is also SOC-concave on [0, ∞). Using (1.9) and (3.84), it is not hard to compute that for any x, y ∈ Kn,

H(x, y) = tr[(e + y) ◦ (ln(e + y) − ln(e + x))] − tr(y − x) + (1/(q + 1)) tr(y^{q+1}) + (q/(q + 1)) tr(x^{q+1}) − tr(x^q ◦ y).



Note that the proximal distances in Examples 3.16 and 3.17 belong to the class F2(Kn). By Proposition 3.22 below, they also belong to the class bF2(Kn).

Proposition 3.22. Let H be defined as in (3.84) with φ satisfying (D1)-(D4). Suppose that domφ = dom(φ0) = [0, ∞). Then, H possesses the properties (P7’) and (P8’).

Proof. By the given assumption, C1 = C2 = Kn. From Proposition 3.21(b), the function H(·, y) is continuous on Kn. Consequently, for any sequence {yk} ⊆ Kn with yk → y, limk→∞ H(yk, y) = H(y, y) = 0, and hence H possesses (P7').

From Proposition 3.21(d), H(yk, y) ≥ d(λi(y), λi(yk)) ≥ 0 for i = 1, 2. This together with the assumption H(yk, y) → 0 implies d(λi(y), λi(yk)) → 0 for i = 1, 2. From this, we necessarily have λi(yk) → λi(y) for i = 1, 2. Suppose not; then the bounded sequence {λi(yk)} must have another limit point νi ≥ 0 with νi ≠ λi(y). Without loss of generality, we assume that limk∈K,k→∞ λi(yk) = νi. Then, we have

d(νi, λi(y)) = limk→∞ d(νi, λi(yk)) = limk∈K,k→∞ d(νi, λi(yk)) = d(νi, νi) = 0,

where the first equality is due to the continuity of d(s, ·) for any fixed s ∈ [0, ∞), and the second one is by the convergence of {d(νi, λi(yk))} implied by the first equality. This contradicts the fact that d(νi, λi(y)) > 0 since νi ≠ λi(y).

As illustrated by the following example, the proximal distance generated by (3.84) with φ satisfying (D1)-(D4) generally does not belong to the class ¯F2(Kn).

Example 3.18. Let H be the proximal distance as in Example 3.15.

Solution. Let

yk = (√2, (−1)^k k/(k+1), (−1)^k k/(k+1))^T for each k, and y = (√2, 1, 1)^T.

It is not hard to check that the sequence {yk} ⊆ int(K3) satisfies H(yk, y) → 0. Clearly, yk ↛ y as k → ∞, but λ1(yk) → λ1(y) = 0 and λ2(yk) → λ2(y) = 2√2.
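Only the spectral claims of this example are checked below (a sketch under our reconstruction of {yk} and y; H itself is not evaluated): the eigenvalues of yk converge to those of y, while ‖yk − y‖ stays bounded away from zero along odd indices.

```python
import numpy as np

def eigs(x):                 # spectral values of x in IR^3 w.r.t. K^3
    w = np.linalg.norm(x[1:])
    return x[0] - w, x[0] + w

y = np.array([np.sqrt(2.0), 1.0, 1.0])    # boundary point: lam1(y) = 0, lam2(y) = 2*sqrt(2)

def yk(k):                   # the oscillating interior sequence of the example
    t = (-1.0)**k * k/(k + 1.0)
    return np.array([np.sqrt(2.0), t, t])

l1, l2 = eigs(yk(1001))      # odd index, far from y in norm
assert abs(l1) < 1e-2 and abs(l2 - 2.0*np.sqrt(2.0)) < 1e-2   # eigenvalues converge
assert np.linalg.norm(yk(1001) - y) > 2.0                      # but yk does not converge to y
assert np.linalg.norm(yk(1000) - y) < 1e-2                     # even subsequence does converge
```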

Finally, let H1 be a proximal distance produced via one of the ways above, and define

Hα(x, y) := H1(x, y) + (α/2)‖x − y‖², (3.85)

where α > 0 is a fixed parameter. Then, by Propositions 3.18, 3.19 and 3.21 and the identity

‖z − x‖² = ‖z − y‖² + ‖y − x‖² + 2⟨z − y, y − x⟩, ∀ x, y, z ∈ IRn,

it is easily shown that Hα is also a proximal distance w.r.t. int(Kn). Particularly, when H1 is given by (3.84) with φ satisfying (D1)-(D4) and domφ = dom(φ′) = [0, ∞) (for example, the distances in Examples 3.16 and 3.17), the regularized proximal distance Hα satisfies (P7') and (P9'), and hence Hα ∈ ¯F2(Kn). With such a regularized proximal distance, the sequence generated by the IPA converges to an optimal solution of (3.64) if X ≠ ∅.
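Both ingredients of this construction are elementary to verify numerically. In the sketch below, `H1` is only a placeholder nonnegative function (not an actual proximal distance), used to illustrate the three-point norm identity and the lower bound Hα(x, y) ≥ (α/2)‖x − y‖² that the regularization adds:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 0.8

def H1(x, y):                # placeholder nonnegative function (illustration only,
    return np.sum((x - y)**2 * (1.0 + x**2))   # NOT an actual proximal distance)

def H_alpha(x, y):           # the regularization (3.85)
    return H1(x, y) + 0.5*alpha*np.dot(x - y, x - y)

for _ in range(100):
    x, y, z = rng.normal(size=(3, 5))
    lhs = np.dot(z - x, z - x)
    rhs = np.dot(z - y, z - y) + np.dot(y - x, y - x) + 2.0*np.dot(z - y, y - x)
    assert abs(lhs - rhs) < 1e-9                             # the three-point norm identity
    assert H_alpha(x, y) >= 0.5*alpha*np.dot(x - y, x - y)   # lower bound from regularization
```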

To sum up, we may construct a proximal distance w.r.t. the cone int(Kn) via three ways with an appropriate univariate function. The first way in (3.78) can only produce a proximal distance belonging to F1(int(Kn)); the second way in (3.82) produces a proximal distance of F1(Kn) if domφ = [0, ∞); whereas the third way in (3.84) produces a proximal distance of the class bF2(Kn) if domφ = dom(φ′) = [0, ∞). Particularly, the regularized proximal distances Hα in (3.85), with H1 given by (3.84) and domφ = dom(φ′) = [0, ∞), belong to the smallest class ¯F2(Kn). With such regularized proximal distances, we have the convergence result of Proposition 3.17(c) for the general convex SOCP with X ≠ ∅.

For the linear SOCP, we will obtain some improved convergence results for the IPA by exploring the relations between the sequence generated by the IPA and the central path associated to the corresponding proximal distances.

Given a l.s.c. proper strictly convex function Φ with dom(Φ) ⊆ Kn and int(domΦ) = int(Kn), the central path of (3.64) associated to Φ is the set {x(τ) | τ > 0} defined by

x(τ) := argmin { τ f(x) + Φ(x) | x ∈ V ∩ Kn } for τ > 0. (3.86)

In what follows, we focus on the central path of (3.64) w.r.t. a distance-like function H ∈ D(int(Kn)). From Proposition 3.14, we immediately have the following result.

Proposition 3.23. For any given H ∈ D(int(Kn)) and ¯x ∈ int(Kn), the central path {x(τ) | τ > 0} associated to H(·, ¯x) is well defined and lies in V ∩ int(Kn). For each τ > 0, there exists gτ ∈ ∂f(x(τ)) such that τ gτ + ∇1H(x(τ), ¯x) = Aᵀy(τ) for some y(τ) ∈ IRm.

We next study the favorable properties of the central path associated to H ∈ D(int(Kn)).

Proposition 3.24. For any given H ∈ D(int(Kn)) and ¯x ∈ int(Kn), let {x(τ) | τ > 0} be the central path associated to H(·, ¯x). Then, the following results hold.

(a) The function H(x(τ), ¯x) is nondecreasing in τ.

(b) The set {x(τ) | ˆτ ≤ τ ≤ ˜τ} is bounded for any given 0 < ˆτ < ˜τ.

(c) x(τ) is continuous at any τ > 0.

(d) The set {x(τ) | τ ≥ ¯τ} is bounded for any ¯τ > 0 if X ≠ ∅ and domH(·, ¯x) = Kn.

(e) All cluster points of {x(τ) | τ > 0} are solutions of (3.64) if X ≠ ∅.

Proof. The proofs are similar to those of Propositions 3–5 of [82].

(a) Take τ1, τ2 > 0 and let xi = x(τi) for i = 1, 2. Then, from Proposition 3.23, we know x1, x2 ∈ V ∩ int(Kn), and there exist g1 ∈ ∂f(x1) and g2 ∈ ∂f(x2) such that

∇1H(x1, ¯x) = −τ1 g1 + Aᵀy1 and ∇1H(x2, ¯x) = −τ2 g2 + Aᵀy2 (3.87)

for some y1, y2 ∈ IRm. This together with the convexity of H(·, ¯x) yields that

τ1⁻¹ [H(x1, ¯x) − H(x2, ¯x)] ≤ τ1⁻¹ ⟨∇1H(x1, ¯x), x1 − x2⟩ = ⟨g1, x2 − x1⟩,
τ2⁻¹ [H(x2, ¯x) − H(x1, ¯x)] ≤ τ2⁻¹ ⟨∇1H(x2, ¯x), x2 − x1⟩ = ⟨g2, x1 − x2⟩. (3.88)

Adding the two inequalities and using the convexity of f, we obtain

(τ1⁻¹ − τ2⁻¹) [H(x1, ¯x) − H(x2, ¯x)] ≤ ⟨g1 − g2, x2 − x1⟩ ≤ 0.

Thus, H(x1, ¯x) ≤ H(x2, ¯x) whenever τ1 ≤ τ2. Particularly, from the last two relations,

0 ≤ τ1⁻¹ [H(x1, ¯x) − H(x2, ¯x)] ≤ τ1⁻¹ ⟨∇1H(x1, ¯x), x1 − x2⟩ ≤ ⟨g2, x2 − x1⟩ ≤ τ2⁻¹ [H(x1, ¯x) − H(x2, ¯x)], ∀ τ1 ≥ τ2 > 0. (3.89)

(b) By part (a), H(x(τ), ¯x) ≤ H(x(˜τ), ¯x) for any τ ≤ ˜τ, which implies that

{x(τ) : τ ≤ ˜τ} ⊆ L1 := {x ∈ int(Kn) | H(x, ¯x) ≤ H(x(˜τ), ¯x)}.

Noting that {x(τ) : ˆτ ≤ τ ≤ ˜τ} ⊆ {x(τ) : τ ≤ ˜τ} ⊆ L1, the desired result follows by (P4).

(c) Fix ¯τ > 0. To prove that x(τ) is continuous at ¯τ, it suffices to prove that limk→∞ x(τk) = x(¯τ) for any sequence {τk} with limk→∞ τk = ¯τ. Given such a sequence {τk}, take ˆτ, ˜τ such that ˆτ < ¯τ < ˜τ. Then, {x(τ) : ˆτ ≤ τ ≤ ˜τ} is bounded by part (b), and τk ∈ (ˆτ, ˜τ) for sufficiently large k. Consequently, the sequence {x(τk)} is bounded. Let ¯y be a cluster point of {x(τk)}, and without loss of generality assume that limk→∞ x(τk) = ¯y.

Let K1 := {k : τk ≤ ¯τ} and take k ∈ K1. Then, from (3.89) with τ1 = ¯τ and τ2 = τk,

0 ≤ ¯τ⁻¹ [H(x(¯τ), ¯x) − H(x(τk), ¯x)] ≤ ¯τ⁻¹ ⟨∇1H(x(¯τ), ¯x), x(¯τ) − x(τk)⟩ ≤ τk⁻¹ [H(x(¯τ), ¯x) − H(x(τk), ¯x)].

If K1 is infinite, taking the limit k → ∞ with k ∈ K1 in the last inequality and using the continuity of H(·, ¯x) on int(Kn) yields that

H(x(¯τ), ¯x) − H(¯y, ¯x) = ⟨∇1H(x(¯τ), ¯x), x(¯τ) − ¯y⟩.

This together with the strict convexity of H(·, ¯x) implies x(¯τ) = ¯y. If K1 is finite, then K2 := {k : τk ≥ ¯τ} must be infinite. Using the same arguments, we also have x(¯τ) = ¯y.

(d) By (P3) and Proposition 3.23, there exists gτ ∈ ∂f(x(τ)) such that for any z ∈ V ∩ Kn,

H(x(τ), ¯x) − H(z, ¯x) ≤ τ⁻¹ ⟨∇1H(x(τ), ¯x), x(τ) − z⟩ = ⟨gτ, z − x(τ)⟩. (3.90)

In particular, taking z = x∗ ∈ X in (3.90) and using the fact that

0 ≥ f(x∗) − f(x(τ)) ≥ ⟨gτ, x∗ − x(τ)⟩,

we have H(x(τ), ¯x) − H(x∗, ¯x) ≤ 0. Hence, {x(τ) | τ ≥ ¯τ} ⊆ {x ∈ int(Kn) | H(x, ¯x) ≤ H(x∗, ¯x)}. By (P4), the latter set is bounded, and the desired result then follows.

(e) Let ˆx be a cluster point of {x(τ)} and {τk} be a sequence such that limk→∞ τk = ∞ and limk→∞ x(τk) = ˆx. Write xk := x(τk) and take x∗ ∈ X and z ∈ V ∩ int(Kn). Then, for any ε > 0, we have x(ε) := (1 − ε)x∗ + εz ∈ V ∩ int(Kn). From the property (P3),

⟨∇1H(x(ε), ¯x) − ∇1H(xk, ¯x), xk − x(ε)⟩ ≤ 0.

On the other hand, taking z = x(ε) in (3.90), we readily have τk⁻¹ ⟨∇1H(xk, ¯x), xk − x(ε)⟩ = ⟨gk, x(ε) − xk⟩ with gk ∈ ∂f(xk). Combining the last two relations, we obtain

τk⁻¹ ⟨∇1H(x(ε), ¯x), xk − x(ε)⟩ ≤ ⟨gk, x(ε) − xk⟩.

Since the subdifferential set ∂f(xk) for each k is compact and gk ∈ ∂f(xk), the sequence {gk} is bounded. Taking the limit in the last inequality yields 0 ≤ ⟨ˆg, x(ε) − ˆx⟩, where ˆg is a limit point of {gk} and, by [131, Theorem 24.4], ˆg ∈ ∂f(ˆx). Taking the limit ε → 0 in this inequality, we get 0 ≤ ⟨ˆg, x∗ − ˆx⟩. This implies that f(ˆx) ≤ f(x∗) since x∗ ∈ X and ˆg ∈ ∂f(ˆx). Consequently, ˆx is a solution of the CSOCP (3.64).

Particularly, from the following proposition, we also have that the central path is convergent if H ∈ D(int(Kn)) satisfies domH(·, ¯x) = Kn, where ¯x ∈ int(Kn) is a given point. Notice that H(·, ¯x) is continuous on domH(·, ¯x) by (P2), and hence the assumption on H is equivalent to saying that H(·, ¯x) is continuous at the boundary of the cone Kn.

Proposition 3.25. For any given ¯x ∈ int(Kn) and H ∈ D(int(Kn)) with domH(·, ¯x) = Kn, let {x(τ) : τ > 0} be the central path associated to H(·, ¯x). If X is nonempty, then limτ→∞ x(τ) exists and is the unique solution of min{H(x, ¯x) | x ∈ X}.

Proof. Let ˆx be a cluster point of {x(τ)} and {τk} be such that limk→∞ τk = ∞ and limk→∞ x(τk) = ˆx. Then, for any x ∈ X, using (3.89) with x1 = x(τk) and x2 = x, we obtain

H(x(τk), ¯x) − H(x, ¯x) ≤ τk ⟨gk, x − x(τk)⟩ ≤ τk [f(x) − f(x(τk))] ≤ 0,

where the second inequality holds since gk ∈ ∂f(x(τk)), and the last one is due to x ∈ X. Taking the limit k → ∞ in the last inequality and using the continuity of H(·, ¯x), we have H(ˆx, ¯x) ≤ H(x, ¯x) for all x ∈ X. Since ˆx ∈ X by Proposition 3.24(e), this shows that any cluster point of {x(τ) | τ > 0} is a solution of min{H(x, ¯x) | x ∈ X}. Since H(·, ¯x) is strictly convex, this problem has a unique solution x∗, and hence limτ→∞ x(τ) = x∗.
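Propositions 3.24(a) and 3.25 can be illustrated on the simplest instance: the cone reduced to the half-line [0, ∞), the entropy distance of Example 3.15 reduced to scalars, f(x) = cx with hypothetical data, and no linear constraints. The central path then has the closed form x(τ) = ¯x/(1 + τc):

```python
import numpy as np

c, xbar = 1.5, 2.0        # hypothetical data: f(x) = c*x, prox center xbar > 0

def H(x, y):              # scalar analogue of the entropy distance of Example 3.15
    return y*np.log(y) - y*np.log(x) + x - y

def x_tau(tau):           # central path: stationarity of tau*c*x + H(x, xbar)
    return xbar / (1.0 + tau*c)   # from tau*c + 1 - xbar/x = 0

taus = np.linspace(0.1, 50.0, 500)
vals = H(x_tau(taus), xbar)
assert np.all(np.diff(vals) >= -1e-12)   # Prop. 3.24(a): H(x(tau), xbar) nondecreasing in tau
assert x_tau(1e8) < 1e-7                 # Prop. 3.25: x(tau) -> 0, the solution of min{c*x | x >= 0}

# sanity check that x(tau) is indeed the minimizer, against a grid
tau = 3.0
grid = np.linspace(1e-3, 3.0, 2000)
assert tau*c*x_tau(tau) + H(x_tau(tau), xbar) <= np.min(tau*c*grid + H(grid, xbar)) + 1e-9
```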

For the linear SOCP, we may establish the relations between the sequence generated by the IPA and the central path associated to the corresponding distance-like functions.

Proposition 3.26. For the linear SOCP, let {xk} be the sequence generated by the IPA with H ∈ D(int(Kn)), x0 ∈ V ∩ int(Kn) and εk ≡ 0, and let {x(τ) | τ > 0} be the central path associated to H(·, x0). Then, xk = x(τk) for k = 1, 2, . . . under either of the following conditions:

(a) H is constructed via (3.78) or (3.82), and {τk} is given by τk = Σ_{j=0}^{k} λj for k = 1, 2, . . .;

(b) H is constructed via (3.84), the mapping ∇(φ′)soc(·) defined on int(Kn) maps any vector in IRn into Im Aᵀ, and the sequence {τk} is given by τk = λk for k = 1, 2, . . . .

Moreover, for any positive increasing sequence {τk}, there exists a positive sequence {λk} with Σ_{k=1}^{∞} λk = ∞ such that the proximal sequence {xk} satisfies xk = x(τk).

Proof. (a) Suppose that H is constructed via (3.78). From (3.67) and Proposition 3.18(b), we have

λj c + ∇φ(det(xj)) − ∇φ(det(xj−1)) = Aᵀuj for j = 0, 1, 2, . . . . (3.91)

Summing this equality from j = 0 to k and taking τk = Σ_{j=0}^{k} λj and yk = Σ_{j=0}^{k} uj, we get

τk c + ∇φ(det(xk)) − ∇φ(det(x0)) = Aᵀyk.

This means that xk satisfies the optimality conditions of the problem

min { τk f(x) + H(x, x0) | x ∈ V ∩ int(Kn) }, (3.92)

and so xk = x(τk). Now let {x(τ) : τ > 0} be the central path. Take a positive increasing sequence {τk} and let xk ≡ x(τk). Then, from Proposition 3.23 and Proposition 3.18(b), it follows that

τk c + ∇φ(det(xk)) − ∇φ(det(x0)) = Aᵀyk for some yk ∈ IRm.

Setting λk = τk − τk−1 and uk = yk − yk−1, from the last equality it follows that

λk c + ∇φ(det(xk)) − ∇φ(det(xk−1)) = Aᵀuk.

This shows that {xk} is the sequence generated by the IPA with εk ≡ 0. If H is given by (3.82), using Proposition 3.19(b) and the same arguments, the result also holds.

(b) In this case, by Proposition 3.21(c), the above (3.91) becomes

λj c + ∇(φ′)soc(xj)(xj − xj−1) = Aᵀuj for j = 0, 1, 2, . . . .

Since φ′′(t) > 0 for all t ∈ (0, ∞) by (D1) and (D2), from [63, Proposition 5.2] it follows that ∇(φ′)soc(x) is positive definite on int(Kn). Thus, the last equality is equivalent to

[∇(φ′)soc(xj)]⁻¹ λj c + (xj − xj−1) = [∇(φ′)soc(xj)]⁻¹ Aᵀuj for j = 0, 1, 2, . . . . (3.93)

Summing the equality (3.93) from j = 0 to k and making a suitable rearrangement, we get

λk c + ∇(φ′)soc(xk)(xk − x0) = Aᵀuk + ∇(φ′)soc(xk) Σ_{j=0}^{k−1} [∇(φ′)soc(xj)]⁻¹ (Aᵀuj − λj c),

which, using the given assumptions and setting τk = λk, reduces to

τk c + ∇(φ′)soc(xk)(xk − x0) = Aᵀ¯yk for some ¯yk ∈ IRm.

This means that xk is the unique solution of (3.92), and hence xk = x(τk) for any k. Let {x(τ) : τ > 0} be the central path. Take a positive increasing sequence {τk} and define the sequence xk = x(τk). Then, from Proposition 3.23 and Proposition 3.21(c),

τk c + ∇(φ′)soc(xk)(xk − x0) = Aᵀyk for some yk ∈ IRm,

which, by the positive definiteness of ∇(φ′)soc(·) on int(Kn), implies that

[∇(φ′)soc(xk)]⁻¹ (τk c − Aᵀyk) + [∇(φ′)soc(xk−1)]⁻¹ (Aᵀyk−1 − τk−1 c) + (xk − xk−1) = 0.

Consequently,

τk c + ∇(φ′)soc(xk)(xk − xk−1) = Aᵀyk − ∇(φ′)soc(xk)[∇(φ′)soc(xk−1)]⁻¹ (Aᵀyk−1 − τk−1 c).

Using the given assumptions and setting λk = τk, we have

λk c + ∇(φ′)soc(xk)(xk − xk−1) = Aᵀuk for some uk ∈ IRm.

This implies that {xk} is the sequence generated by the IPA, and the sequence {λk} satisfies Σ_{k=1}^{∞} λk = ∞ since {τk} is a positive increasing sequence.

From Proposition 3.25 and Proposition 3.26, we readily have the following improved convergence results for the sequence generated by the IPA for the linear SOCP.

Proposition 3.27. For the linear SOCP, let {xk} be the sequence generated by the IPA with H ∈ D(int(Kn)), x0 ∈ V ∩ int(Kn) and εk ≡ 0. If X ≠ ∅ and one of the following conditions is satisfied:

(a) H is constructed via (3.82) with domH(·, x0) = Kn and Σ_{k=0}^{∞} λk = ∞;

(b) H is constructed via (3.84) with domH(·, x0) = Kn, the mapping ∇(φ′)soc(·) defined on int(Kn) maps any vector in IRn into Im Aᵀ, and limk→∞ λk = ∞;

then {xk} converges to the unique solution of min{H(x, x0) | x ∈ X}.
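To close this section, here is a minimal IPA iteration for a scalar reduction of the problem (the cone taken as [0, ∞), the entropy-type distance H(x, y) = y ln y − y ln x + x − y, and f(x) = cx with hypothetical data; the proximal step then has the closed form x_{k+1} = x_k/(1 + λk c)). With Σλk = ∞, the iterates are driven to the solution x∗ = 0, in the spirit of Proposition 3.27:

```python
import numpy as np

c, x0 = 1.5, 2.0            # hypothetical scalar problem: min c*x s.t. x >= 0
lam = np.ones(200)          # positive step sizes with sum lam_k = infinity

traj = [x0]
for lk in lam:
    # IPA step: x_{k+1} = argmin_{x > 0} { lk*c*x + H(x, x_k) } for the scalar
    # entropy distance H(x, y) = y ln y - y ln x + x - y, whose stationarity
    # condition lk*c + 1 - x_k/x = 0 gives the closed form below
    traj.append(traj[-1] / (1.0 + lk*c))

assert traj[-1] < 1e-6                               # iterates approach the solution x* = 0
assert all(a > b for a, b in zip(traj, traj[1:]))    # strictly decreasing toward bd(K)
```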

Chapter 4

SOC means and SOC inequalities

In this chapter, we present some other types of applications of the aforementioned SOC functions, SOC-convexity, and SOC-monotonicity. These include the so-called SOC means, SOC weighted means, and a few SOC trace versions of the Young, Hölder, and Minkowski inequalities, as well as the Powers-Størmer inequality. We believe that these results will be helpful in the convergence analysis of optimization problems involving SOC. Much of the material in this chapter is extracted from [36, 77, 78]; the reader may consult them for more details.
