
6 Smoothing Newton methods and applications

6.1 Smoothing Newton methods

and m = 120, n = 2650. The matrices Q ∈ IR^{n×n} and A ∈ IR^{m×n} and the vectors b ∈ IR^m and c ∈ IRⁿ are generated in the same way as in Subsection 4.1. We applied the above nonmonotone line search version of Algorithm 5.1, with the parameters in (49) and tolerance ϵ = 1.0 × 10⁻⁹, to 50 randomly generated test instances. The results are as follows. Algorithm 5.1 solves all problems to the given accuracy. The average numbers of function evaluations and iterations are 11 and 10, respectively. The average CPU time per problem is 857.3 s, and no problem uses a negative gradient step. As pointed out in Subsection 4.1, this class of problems is badly scaled, with optimal values of the order of 10³, but the numerical results show that the FB semismooth Newton method does not suffer from this.

This function was employed to develop a smoothing Newton method in [18], where numerical comparisons with the interior point method SDPT3 on linear SOCPs indicate that the former is very promising. The FB smoothing function associated with Kⁿ is

$$\varphi_{\rm FB}(x,y,\varepsilon):=(x+y)-\big(x^2+y^2+2\varepsilon^2e\big)^{1/2}\qquad\forall\,x,y\in\mathbb{R}^n,\ \varepsilon>0.\qquad(54)$$

The following proposition shows that φ_CM, φ_SQ and φ_FB defined above are uniform smooth approximations of the corresponding SOC complementarity functions, and it characterizes the properties of their Jacobians.

Proposition 6.1 Let φ be one of φ_CM, φ_SQ and φ_FB deﬁned as above. Then,

(a) φ is positively homogeneous, i.e., φ(tx, ty, tε) = tφ(x, y, ε) for all x, y∈ IRⁿ, ε, t > 0.

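(b) For any x, y ∈ IRⁿ and ε₂ > ε₁ > 0, there hold

$$\kappa(\varepsilon_2-\varepsilon_1)e\succeq_{\mathcal K^n}\varphi(x,y,\varepsilon_1)-\varphi(x,y,\varepsilon_2)\succ_{\mathcal K^n}0,\qquad \kappa\varepsilon_1e\succeq_{\mathcal K^n}\varphi(x,y,0)-\varphi(x,y,\varepsilon_1)\succ_{\mathcal K^n}0,$$

where φ(x, y, 0) = lim_{ε↓0} φ(x, y, ε), and κ = g(0) if φ = φ_CM, and κ = √2 otherwise.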

(c) φ is continuously differentiable everywhere in IRⁿ × IRⁿ × IR++. Furthermore,

$$\varphi_{\rm CM}'(x,y,\varepsilon)=\Big[\,I-\nabla g^{\rm soc}(z)\quad \nabla g^{\rm soc}(z)\quad \nabla g^{\rm soc}(z)^Tz-g^{\rm soc}(z)\,\Big]\quad\text{with } z=\frac{x-y}{\varepsilon},$$

where ∇g^soc(z) has the same expression as given in Lemma 2.2,

$$\varphi_{\rm SQ}'(x,y,\varepsilon)=\frac12\Big[\,I-L_z^{-1}L_{x-y}\quad I+L_z^{-1}L_{x-y}\quad -4\varepsilon L_z^{-1}e\,\Big]\quad\text{with } z=\big((x-y)^2+4\varepsilon^2e\big)^{1/2},$$

and

$$\varphi_{\rm FB}'(x,y,\varepsilon)=\Big[\,I-L_z^{-1}L_x\quad I-L_z^{-1}L_y\quad -2\varepsilon L_z^{-1}e\,\Big]\quad\text{with } z=\big(x^2+y^2+2\varepsilon^2e\big)^{1/2}.$$

(d) The partial Jacobians φ′_x and φ′_y of φ are nonsingular in IRⁿ × IRⁿ × IR++.

(e) The matrices (φ′_x)⁻¹φ′_y and (φ′_y)⁻¹φ′_x are positive definite in IRⁿ × IRⁿ × IR++.

Proof. Part (a) follows directly from the expression of φ. When φ = φ_CM or φ_FB, part (b) is proved in [25, Prop. 5.1]. Similar arguments show that part (b) also holds for φ = φ_SQ. Part (c) is immediate from [9, Prop. 5]. When φ = φ_CM, the proof of part (d) can be found in [25, Prop. 6.1]. When φ = φ_SQ or φ_FB, part (d) follows directly from the expressions of φ′_x and φ′_y. When φ = φ_CM or φ_FB, the proof of part (e) can be found in Prop. 6.1 and Prop. 6.2 of [25], respectively. When φ = φ_SQ, since z² ≻_{Kⁿ} (x − y)² and z ≻_{Kⁿ} 0, Prop. 3.4 of [25] shows that L_z² − L_{x−y}² is positive definite. Moreover, (φ′_x)⁻¹φ′_y = (L_z − L_{x−y})⁻¹(L_z + L_{x−y}) and

$$2\big(L_z^2-L_{x-y}^2\big)=(L_z-L_{x-y})(L_z+L_{x-y})+(L_z+L_{x-y})(L_z-L_{x-y}).$$

Hence, writing u = (L_z − L_{x−y})w, we get uᵀ(φ′_x)⁻¹φ′_y u = wᵀ(L_z² − L_{x−y}²)w > 0 for u ≠ 0. The matrix (φ′_y)⁻¹φ′_x is handled in the same way, and part (e) follows. □
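Parts (a) to (c) of Prop. 6.1 can be spot-checked numerically for φ_FB. The sketch below is a minimal pure-Python illustration on a single cone Kⁿ (n ≥ 2). It assumes the usual Jordan algebra conventions x∘y = (⟨x, y⟩, x₁y₂ + y₁x₂), L_x w = x∘w and e = (1, 0, …, 0). The helper names (`jprod`, `soc_sqrt`, `linv`, `fb_z`, `phi_fb`, `soc_margin`, `fb_jacobian`, `fd_jacobian`) and the test points are hypothetical choices made for illustration only. The sketch checks part (b) with κ = √2 and the positive homogeneity in part (a). It also compares the Jacobian formula in part (c) with central finite differences.

```python
import math

def jprod(x, y):
    """Jordan product x∘y = (<x, y>, x1*y2 + y1*x2) on IR^n, n >= 2."""
    return [sum(a * b for a, b in zip(x, y))] + [x[0] * b + y[0] * a for a, b in zip(x[1:], y[1:])]

def soc_sqrt(z):
    """z^(1/2) for z in K^n via the spectral decomposition of z."""
    r = math.sqrt(sum(t * t for t in z[1:]))
    w = [t / r for t in z[1:]] if r > 0 else [1.0] + [0.0] * (len(z) - 2)
    s1, s2 = math.sqrt(max(z[0] - r, 0.0)), math.sqrt(max(z[0] + r, 0.0))  # clip rounding noise
    return [0.5 * (s1 + s2)] + [0.5 * (s2 - s1) * t for t in w]

def linv(z, b):
    """Solve L_z w = b (i.e. z∘w = b); assumes z1 != 0 and det z = z1^2 - ||z2||^2 != 0."""
    det = z[0] ** 2 - sum(t * t for t in z[1:])
    w1 = (z[0] * b[0] - sum(p * q for p, q in zip(z[1:], b[1:]))) / det
    return [w1] + [(q - w1 * p) / z[0] for p, q in zip(z[1:], b[1:])]

def fb_z(x, y, eps):
    """z = (x^2 + y^2 + 2 eps^2 e)^(1/2)."""
    v = [a + b for a, b in zip(jprod(x, x), jprod(y, y))]
    v[0] += 2.0 * eps * eps
    return soc_sqrt(v)

def phi_fb(x, y, eps):
    """FB smoothing function (54): (x + y) - z."""
    return [a + b - c for a, b, c in zip(x, y, fb_z(x, y, eps))]

def soc_margin(v):
    """v1 - ||v2||: >= 0 iff v lies in K^n, > 0 iff v lies in int K^n."""
    return v[0] - math.sqrt(sum(t * t for t in v[1:]))

def fb_jacobian(x, y, eps):
    """Columns of [I - L_z^{-1} L_x, I - L_z^{-1} L_y, -2 eps L_z^{-1} e] from Prop. 6.1(c)."""
    n, z = len(x), fb_z(x, y, eps)
    cols = []
    for m in (x, y):
        for j in range(n):
            ej = [1.0 if k == j else 0.0 for k in range(n)]
            cols.append([a - b for a, b in zip(ej, linv(z, jprod(m, ej)))])
    cols.append([-2.0 * eps * t for t in linv(z, [1.0] + [0.0] * (n - 1))])
    return cols

def fd_jacobian(x, y, eps, h=1e-6):
    """Central finite differences of phi_fb with respect to (x, y, eps), column by column."""
    p, n, cols = list(x) + list(y) + [eps], len(x), []
    for j in range(2 * n + 1):
        up, dn = p[:], p[:]
        up[j] += h
        dn[j] -= h
        fu, fd = phi_fb(up[:n], up[n:2 * n], up[-1]), phi_fb(dn[:n], dn[n:2 * n], dn[-1])
        cols.append([(a - b) / (2 * h) for a, b in zip(fu, fd)])
    return cols

x, y = [1.0, 0.3, -0.2], [0.5, -0.1, 0.4]
e1, e2, kappa = 0.05, 0.2, math.sqrt(2.0)
d = [a - b for a, b in zip(phi_fb(x, y, e1), phi_fb(x, y, e2))]  # phi(eps1) - phi(eps2)
print(soc_margin(d), soc_margin([kappa * (e2 - e1) - d[0]] + [-t for t in d[1:]]))
```

Both printed margins should be positive, which is what part (b) predicts. The finite-difference comparison is only a consistency check of the formula in part (c), not a proof.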

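With φ = φ_CM, φ_SQ or φ_FB as above, we define the function θ : IRⁿ × IRⁿ × IR → IRⁿ by

$$\theta(x,y,\varepsilon):=\begin{cases}\varphi(x,y,|\varepsilon|)&\text{if }\varepsilon\neq0,\\ \phi(x,y)&\text{if }\varepsilon=0,\end{cases}\qquad(55)$$

where ϕ = ϕ_NR if φ = φ_CM or φ_SQ, and ϕ = ϕ_FB if φ = φ_FB.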

The following proposition summarizes some favorable properties of the function θ.

Proposition 6.2 Let θ be deﬁned as in (55) with φ = φ_CM, φ_SQ or φ_FB. Then,

(a) θ is continuously differentiable at any (x, y, ε) with ε ≠ 0. Moreover, in this case,

∥θ′(x, y, ε)∥ ≤ C, where C > 0 is a constant independent of x, y and ε.

(b) θ is globally Lipschitz continuous and directionally diﬀerentiable everywhere.

(c) θ is a strongly semismooth function if φ = φ_CM with g given by (52), φ_SQ or φ_FB.

Proof. (a) The first part is a direct consequence of Prop. 6.1(c). For the second part, when φ = φ_CM, the boundedness of θ′ is easy to verify from the properties of g in (50) and the expression of ∇g^soc(z) in Lemma 2.2. When φ = φ_SQ or φ_FB, the same arguments as in [10, Lemma 4] show that θ′ is bounded.

(b) Using part (a) and Prop. 6.1(b) and noting that ϕ_NR and ϕ_FB are globally Lipschitz continuous and directionally diﬀerentiable everywhere, we readily get the result.

(c) When φ = φ_CM with g given by (52), define h : IR² → IR by

$$h(t,\varepsilon):=\frac{\sqrt{t^2+4\varepsilon^2}+t}{2}\qquad\text{or}\qquad h(t,\varepsilon):=\begin{cases}|\varepsilon|\ln\big(1+\exp(t/|\varepsilon|)\big)&\text{if }\varepsilon\neq0,\\ \max\{0,t\}&\text{if }\varepsilon=0.\end{cases}$$

It is not hard to see that, for any x, y ∈ IRⁿ and ε ∈ IR,

$$\theta(x,y,\varepsilon)=x-\Big[h\big(\lambda_1(z),\varepsilon\big)u_z^{(1)}+h\big(\lambda_2(z),\varepsilon\big)u_z^{(2)}\Big],$$

where λ₁(z)u_z^{(1)} + λ₂(z)u_z^{(2)} is the spectral decomposition of z = x − y. By Prop. 1 and Prop. 2 of [54], both functions h above are strongly semismooth. Hence θ is strongly semismooth everywhere in IRⁿ × IRⁿ × IR by [9, Prop. 7]. When φ = φ_SQ or φ_FB, the result follows from [18, Theorem 4.2] and [58, Theorem 3.2], respectively. □
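The spectral representation used in part (c) can be made concrete with a short sketch. The sketch below is a minimal pure-Python illustration for a single cone Kⁿ (n ≥ 2). It assumes that the two admissible choices of g in (52) lead to the CHKS and log-exponential forms of h written above. The function names are hypothetical. The log-exponential h is evaluated through the algebraically equivalent, overflow-free form max(t, 0) + |ε| ln(1 + e^(−|t|/|ε|)). The sketch illustrates that h(·, ε) overestimates max(0, ·) by at most g(0)|ε|, and that θ depends only on |ε|.

```python
import math

def h_chks(t, eps):
    """CHKS choice: h(t, eps) = (sqrt(t^2 + 4 eps^2) + t) / 2."""
    return 0.5 * (math.sqrt(t * t + 4.0 * eps * eps) + t)

def h_logexp(t, eps):
    """Log-exponential choice: |eps| ln(1 + exp(t/|eps|)), extended by max(0, t) at eps = 0."""
    a = abs(eps)
    if a == 0.0:
        return max(0.0, t)
    # overflow-free form of a*ln(1 + e^(t/a)) = max(t, 0) + a*ln(1 + e^(-|t|/a))
    return max(t, 0.0) + a * math.log1p(math.exp(-abs(t) / a))

def theta_cm(x, y, eps, h):
    """x - [h(l1, eps) u1 + h(l2, eps) u2], with (l_i, u_i) the spectral data of z = x - y."""
    z = [a - b for a, b in zip(x, y)]
    r = math.sqrt(sum(t * t for t in z[1:]))
    w = [t / r for t in z[1:]] if r > 0 else [1.0] + [0.0] * (len(z) - 2)
    l1, l2 = z[0] - r, z[0] + r
    u1 = [0.5] + [-0.5 * t for t in w]
    u2 = [0.5] + [0.5 * t for t in w]
    return [xi - (h(l1, eps) * a + h(l2, eps) * b) for xi, a, b in zip(x, u1, u2)]

x, y = [1.0, 0.3, -0.2], [0.5, -0.1, 0.4]
for h in (h_chks, h_logexp):
    gap = max(abs(a - b) for a, b in zip(theta_cm(x, y, 1e-6, h), theta_cm(x, y, 0.0, h)))
    print(h.__name__, gap)  # small: theta(., ., eps) approaches phi_NR as eps -> 0
```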

Unless otherwise stated, θ(x, y, ε) in the rest of this section denotes the function associated with K, i.e., θ(x, y, ε) = (θ(x₁, y₁, ε), …, θ(x_m, y_m, ε)) with each θ(x_i, y_i, ε) defined as in (55).

Let Θ : IR × IRⁿ × IRⁿ × IR^l → IR × IRⁿ × IR^l × IRⁿ be the operator defined by (12) with such θ. The following proposition shows that Θ is continuously differentiable at any ω ∈ IR++ × IRⁿ × IRⁿ × IR^l and has nonsingular Jacobians under some mild assumptions.

Proposition 6.3 Let Θ be deﬁned by (12) with θ given as in (55). Then,

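(a) the operator Θ is continuously differentiable at any ω ∈ IR++ × IRⁿ × IRⁿ × IR^l, and

$$\Theta'(\omega)=\begin{pmatrix}1&0&0&0\\ 0&E_x'(x,y,\zeta)&E_y'(x,y,\zeta)&E_\zeta'(x,y,\zeta)\\ \theta_\varepsilon'(x,y,\varepsilon)&D_x(x,y,\varepsilon)&D_y(x,y,\varepsilon)&0\end{pmatrix},$$

where

$$D_x(x,y,\varepsilon):={\rm diag}\big(\theta_{x_1}'(x_1,y_1,\varepsilon),\ldots,\theta_{x_m}'(x_m,y_m,\varepsilon)\big),\qquad D_y(x,y,\varepsilon):={\rm diag}\big(\theta_{y_1}'(x_1,y_1,\varepsilon),\ldots,\theta_{y_m}'(x_m,y_m,\varepsilon)\big).\qquad(56)$$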

(b) Θ′(ω) is nonsingular provided that rank E′_ζ(x, y, ζ) = l and, for any u ≠ 0 and v ≠ 0,

$$E'(x,y,\zeta)(u,v,s)=0\ \Longrightarrow\ \exists\,\nu\in\{1,\ldots,m\}\ \text{such that}\ u_\nu\neq0\ \text{and}\ \langle u_\nu,v_\nu\rangle\ge0.\qquad(57)$$

Proof. Part (a) follows directly from Prop. 6.2(a) and the definition of Θ. We next prove part (b). Since the first row of Θ′(ω) is (1, 0, 0, 0), the ε-component of any solution of Θ′(ω)dω = 0 vanishes. Hence, by the expression of Θ′(ω), it suffices to prove that the system

$$\begin{aligned}E_x'(x,y,\zeta)u+E_y'(x,y,\zeta)v+E_\zeta'(x,y,\zeta)s&=0,\\ D_x(x,y,\varepsilon)u+D_y(x,y,\varepsilon)v&=0\end{aligned}\qquad(58)$$

has only the zero solution. If one of u and v is zero, then the second equation gives u = 0 and v = 0, since D_x(x, y, ε) and D_y(x, y, ε) are nonsingular by Prop. 6.1(d). Together with the first equation and the assumption rank E′_ζ(x, y, ζ) = l, we get s = 0. Thus u = 0, v = 0 and s = 0 in this case.

If u ≠ 0 and v ≠ 0, then the first equation of (58) and assumption (57) imply that there exists ν ∈ {1, …, m} such that u_ν ≠ 0 and ⟨u_ν, v_ν⟩ ≥ 0. Since ε > 0, the second equation of (58) is equivalent to φ′_{x_i}(x_i, y_i, ε)u_i + φ′_{y_i}(x_i, y_i, ε)v_i = 0 for all i = 1, 2, …, m. For i = ν, since φ′_{y_ν}(x_ν, y_ν, ε) is nonsingular by Prop. 6.1(d), it follows that

$$u_\nu^T\big[\varphi_{y_\nu}'(x_\nu,y_\nu,\varepsilon)\big]^{-1}\big[\varphi_{x_\nu}'(x_\nu,y_\nu,\varepsilon)\big]u_\nu+\langle u_\nu,v_\nu\rangle=0.$$

By Prop. 6.1(e), the matrix [φ′_{y_ν}(x_ν, y_ν, ε)]⁻¹[φ′_{x_ν}(x_ν, y_ν, ε)] is positive definite. Together with ⟨u_ν, v_ν⟩ ≥ 0, the last equality then forces u_ν = 0, which is a contradiction. The proof is complete. □
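The key step of this proof is that φ′_x u + φ′_y v = 0 with u ≠ 0 forces ⟨u, v⟩ < 0, which is where Prop. 6.1(e) enters. For φ_FB, linearity of w ↦ L_w gives φ′_x = L_z⁻¹L_{z−x} and φ′_y = L_z⁻¹L_{z−y}. Hence v = −(φ′_y)⁻¹φ′_x u = −L_{z−y}⁻¹L_{z−x} u. The pure-Python sketch below solves for v in this way and checks the equation residual and the sign of ⟨u, v⟩. It uses a single cone K³ and randomly drawn data. The setup, the sampling ranges and the helper names are hypothetical illustrations, not part of the source.

```python
import math
import random

def jprod(x, y):
    """Jordan product x∘y = (<x, y>, x1*y2 + y1*x2) on IR^n, n >= 2."""
    return [sum(a * b for a, b in zip(x, y))] + [x[0] * b + y[0] * a for a, b in zip(x[1:], y[1:])]

def soc_sqrt(z):
    """z^(1/2) for z in K^n via the spectral decomposition of z."""
    r = math.sqrt(sum(t * t for t in z[1:]))
    w = [t / r for t in z[1:]] if r > 0 else [1.0] + [0.0] * (len(z) - 2)
    s1, s2 = math.sqrt(max(z[0] - r, 0.0)), math.sqrt(max(z[0] + r, 0.0))
    return [0.5 * (s1 + s2)] + [0.5 * (s2 - s1) * t for t in w]

def linv(z, b):
    """Solve L_z w = b (i.e. z∘w = b) for z in int K^n."""
    det = z[0] ** 2 - sum(t * t for t in z[1:])
    w1 = (z[0] * b[0] - sum(p * q for p, q in zip(z[1:], b[1:]))) / det
    return [w1] + [(q - w1 * p) / z[0] for p, q in zip(z[1:], b[1:])]

def fb_solve_v(x, y, eps, u):
    """Return (z, v) with phi'_x u + phi'_y v = 0 for phi = phi_FB on one cone."""
    s = [a + b for a, b in zip(jprod(x, x), jprod(y, y))]
    s[0] += 2.0 * eps * eps
    z = soc_sqrt(s)
    zx = [a - b for a, b in zip(z, x)]  # phi'_x = L_z^{-1} L_{z-x}
    zy = [a - b for a, b in zip(z, y)]  # phi'_y = L_z^{-1} L_{z-y}
    return z, [-t for t in linv(zy, jprod(zx, u))]

def residual(x, y, z, u, v):
    """phi'_x u + phi'_y v, evaluated with the formulas of Prop. 6.1(c)."""
    lx, ly = linv(z, jprod(x, u)), linv(z, jprod(y, v))
    return [a - b + c - d for a, b, c, d in zip(u, lx, v, ly)]

rng = random.Random(0)
worst_res, worst_inner = 0.0, -math.inf
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    u = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    z, v = fb_solve_v(x, y, rng.uniform(0.1, 1.0), u)
    scale = 1.0 + max(abs(t) for t in v)
    worst_res = max(worst_res, max(abs(t) for t in residual(x, y, z, u, v)) / scale)
    worst_inner = max(worst_inner, sum(a * b for a, b in zip(u, v)))
print(worst_res, worst_inner)  # Prop. 6.1(e) predicts worst_inner < 0
```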

The condition in Prop. 6.3(b) is weaker than conditions (6.2)–(6.3) of [25]. When l = 0, the condition of Prop. 6.3(b) is equivalent to saying that E′_x and E′_y are Cartesian column monotone, whereas condition (6.3) of [25] is equivalent to saying that E′_x and E′_y are column monotone. For the SOCCP (4), the condition in Prop. 6.3(b) is equivalent to requiring the Cartesian P₀-property of F′, whereas condition (6.3) of [25] is equivalent to requiring the positive semidefiniteness of F′. Recently, for the SOCCP (4), Chua et al. [19] established the nonsingularity of Θ′(ω) with φ = φ_CM and φ_SQ under the uniform nonsingularity of F, which is another nonmonotone property of F. It is not yet clear whether this condition is weaker than the Cartesian P₀-property of F′ for differentiable F.

Let Ξ : IR₊ × IRⁿ × IRⁿ × IR^l → IR₊ be the natural merit function of Θ(ω) = 0, i.e.,

$$\Xi(\omega)=\|\Theta(\omega)\|^2\qquad\forall\,\omega\in\mathbb{R}_+\times\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^l.\qquad(59)$$

The following proposition gives conditions under which Ξ is coercive for the SOCCP (3).

Proposition 6.4 Let Θ and Ξ be deﬁned by (12) and (59), respectively. Suppose that

$$E(x,y,\zeta)\equiv\begin{pmatrix}F(\zeta)-x\\ G(\zeta)-y\end{pmatrix}$$

with F and G continuous. Then Ξ is coercive under (C.1) or (C.2) of Prop. 4.2.

Proof. We first prove the result under condition (C.1) of Prop. 4.2. Suppose, on the contrary, that there exist a constant γ ≥ 0 and a sequence {ω^k} with ∥ω^k∥ → ∞ such that Ξ(ω^k) ≤ γ. Since Ξ(ω^k) ≤ γ implies that {ε^k} is bounded, we must have ∥(x^k, y^k, ζ^k)∥ → ∞.

Observe that ∥ζ^k∥ → ∞ necessarily holds. If not, the continuity of F and G together with Ξ(ω^k) ≤ γ implies that {x^k} and {y^k} are bounded, which contradicts ∥(x^k, y^k, ζ^k)∥ → ∞. From the uniform Jordan P-property and the linear growth of F and G, it then follows that ∥F(ζ^k)∥ → ∞ and ∥G(ζ^k)∥ → ∞. If not, we may assume without loss of generality that {F(ζ^k)} is bounded. Define the bounded sequence {ξ^k} by

$$\xi_i^k=\begin{cases}0&\text{if } i\in J,\\ \zeta_i^k&\text{otherwise},\end{cases}\qquad\text{where } J:=\big\{i\in\{1,\ldots,m\}\ \big|\ \{\|\zeta_i^k\|\}\ \text{is unbounded}\big\}\neq\emptyset.$$

Using the boundedness of {F(ζ^k)} and {F(ξ^k)} and the linear growth of G, we obtain

$$\lambda_2\Big[\big(F(\zeta^k)-F(\xi^k)\big)\circ\big(G(\zeta^k)-G(\xi^k)\big)\Big]\le C_1\|\zeta^k\|+C_2$$

for some C₁, C₂ > 0, which contradicts the uniform Jordan P-property of F and G. Thus,

$$\|\zeta^k\|\to\infty,\quad\|F(\zeta^k)\|\to\infty,\quad\|G(\zeta^k)\|\to\infty,\quad\|x^k\|\to\infty,\quad\|y^k\|\to\infty.$$

By Prop. 6.1(b) and Ξ(ω^k) ≤ γ, the sequence {ϕ(x^k, y^k)} is bounded with ϕ = ϕ_NR or ϕ_FB. This, together with Lemma 3.5 and the last display, implies that {λ₁(x^k)} and {λ₁(y^k)} are bounded below, whereas λ₂(x^k), λ₂(y^k) → +∞. From the proof of [47, Prop. 4.2(a)], the uniform Jordan P-property and the linear growth of F and G imply that

$$\lim_{k\to\infty}\frac{F(\zeta^k)}{\|F(\zeta^k)\|}\circ\frac{G(\zeta^k)}{\|G(\zeta^k)\|}\neq0.$$

Using the boundedness of {F(ζ^k) − x^k} and {G(ζ^k) − y^k}, it is easy to argue that

$$\lim_{k\to\infty}\frac{x^k}{\|x^k\|}\circ\frac{y^k}{\|y^k\|}\neq0.$$

From Lemma 3.5, it then follows that {ϕ(x^k, y^k)} is unbounded, which is impossible.

When condition (C.2) of Prop. 4.2 holds, the desired result follows from arguments similar to those above and those of [47, Prop. 4.2(d)]. □

Note that, when

$$E(x,y,\zeta)\equiv\begin{pmatrix}Ax-b\\ A^T\zeta+y-c\end{pmatrix}$$

with A ∈ IR^{m×n}, b ∈ IR^m and c ∈ IRⁿ, the full row rank of A cannot guarantee the coerciveness of Ξ. For example, let

$$A=\begin{pmatrix}1&-1&1\\ -1&1&2\end{pmatrix},\qquad x^k=\begin{pmatrix}k\\ k-1\\ 0\end{pmatrix},\qquad y^k=0,\qquad \varepsilon^k\in[0,1],$$

and let {ζ^k} be an arbitrary bounded sequence in IR². Clearly, A has full row rank. Since x^k ∈ K³ and y^k = 0, we have ϕ(x^k, y^k) = 0 with ϕ = ϕ_NR or ϕ_FB, so {ϕ(x^k, y^k)} is bounded, and by Prop. 6.1(b) so is {θ(x^k, y^k, ε^k)}. In addition, Ax^k = (1, −1)ᵀ for all k, so {Ax^k − b} and {Aᵀζ^k + y^k − c} are bounded. Hence {Ξ(ω^k)} is bounded while ∥ω^k∥ → ∞, i.e., Ξ is not coercive.
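The counterexample can be checked directly. The pure-Python sketch below is an illustration only, with hypothetical helper names. It evaluates Ax^k, tests x^k ∈ K³ (i.e., x₁ ≥ ∥(x₂, x₃)∥), and evaluates the unsmoothed FB function ϕ_FB(x^k, 0) = x^k − ((x^k)²)^{1/2} for growing k. The data b, c and ζ^k are left out because the text leaves them arbitrary.

```python
import math

def jprod(x, y):
    """Jordan product x∘y = (<x, y>, x1*y2 + y1*x2) on IR^n, n >= 2."""
    return [sum(a * b for a, b in zip(x, y))] + [x[0] * b + y[0] * a for a, b in zip(x[1:], y[1:])]

def soc_sqrt(z):
    """z^(1/2) for z in K^n via the spectral decomposition of z."""
    r = math.sqrt(sum(t * t for t in z[1:]))
    w = [t / r for t in z[1:]] if r > 0 else [1.0] + [0.0] * (len(z) - 2)
    s1, s2 = math.sqrt(max(z[0] - r, 0.0)), math.sqrt(max(z[0] + r, 0.0))
    return [0.5 * (s1 + s2)] + [0.5 * (s2 - s1) * t for t in w]

def phi_fb_plain(x, y):
    """Unsmoothed FB function: x + y - (x^2 + y^2)^(1/2)."""
    v = [a + b for a, b in zip(jprod(x, x), jprod(y, y))]
    return [a + b - c for a, b, c in zip(x, y, soc_sqrt(v))]

A = [[1.0, -1.0, 1.0], [-1.0, 1.0, 2.0]]
for k in (1, 10, 1000, 10**6):
    xk, yk = [float(k), float(k - 1), 0.0], [0.0, 0.0, 0.0]
    Axk = [sum(a * b for a, b in zip(row, xk)) for row in A]  # the same vector for every k
    in_K3 = xk[0] >= math.hypot(xk[1], xk[2])
    print(k, Axk, in_K3, max(abs(t) for t in phi_fb_plain(xk, yk)))
```

The norm of x^k grows without bound while Ax^k and ϕ_FB(x^k, 0) stay fixed, which is exactly the mechanism that keeps Ξ(ω^k) bounded.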

This partly explains why, in Subsection 6.2, solving some linear SOCPs with the smoothing method below requires many more iterations than interior point methods do.

Motivated by the efficiency of the smoothing Newton method [54], we next apply this method to the SOC complementarity system (1), i.e., we seek a solution of (1) by solving the single augmented smooth system Θ(ω) = 0. Choose ε̄ > 0 and γ ∈ (0, 1) such that ε̄γ < 1, and let ω̄ = (ε̄, 0) ∈ IR++ × IR^{2n+l}. Define

$$\beta(\omega):=\gamma\min\{1,\Xi(\omega)\}.$$

The smoothing Newton method [54] for solving system (1) is described as follows.

Algorithm 6.1 (Smoothing Newton method)

Step 0. Select a smoothing function φ of ϕ_NR or ϕ_FB. Choose constants δ ∈ (0, 1) and σ ∈ (0, 1/2), and a point (x⁰, y⁰, ζ⁰) ∈ IRⁿ × IRⁿ × IR^l. Let ε⁰ = ε̄ and k := 0.

Step 1. If Θ(ω^k) = 0, then stop. Otherwise, let β_k := β(ω^k).

Step 2. Compute the direction dω^k := (dε^k, dx^k, dy^k, dζ^k) ∈ IR × IRⁿ × IRⁿ × IR^l by

$$\Theta(\omega^k)+\Theta'(\omega^k)\,d\omega=\beta_k\bar\omega.\qquad(60)$$

Step 3. Let l_k be the smallest nonnegative integer l satisfying

$$\Xi(\omega^k+\delta^l d\omega^k)\le\big[1-2\sigma(1-\gamma\|\bar\omega\|)\delta^l\big]\,\Xi(\omega^k).\qquad(61)$$

Step 4. Define ω^{k+1} := ω^k + δ^{l_k} dω^k. Let k := k + 1, and go to Step 1.
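Algorithm 6.1 can be sketched in pure Python. The sketch below assumes a hypothetical instance with l = 0 and a single cone K³, with E(x, y) = Mx + q − y, where M is symmetric positive definite so that condition (57) holds. It also assumes φ = φ_FB, whose Jacobian from Prop. 6.1(c) fills the bottom block row of Θ′(ω), and that the first component of Θ(ω) is ε. The data M and q, the parameter values, the stopping tolerance that replaces the exact test Θ(ω^k) = 0 in Step 1, and the cap on backtracking are illustrative choices, not taken from the source. Since ω̄ = (ε̄, 0), the factor γ∥ω̄∥ in (61) equals γε̄. The Newton system (60) is solved by dense Gaussian elimination, which ignores the block structure of Θ′(ω).

```python
import math

def jprod(x, y):
    """Jordan product x∘y = (<x, y>, x1*y2 + y1*x2) on IR^n, n >= 2."""
    return [sum(a * b for a, b in zip(x, y))] + [x[0] * b + y[0] * a for a, b in zip(x[1:], y[1:])]

def soc_sqrt(z):
    """z^(1/2) for z in K^n via its spectral decomposition."""
    r = math.sqrt(sum(t * t for t in z[1:]))
    w = [t / r for t in z[1:]] if r > 0 else [1.0] + [0.0] * (len(z) - 2)
    s1, s2 = math.sqrt(max(z[0] - r, 0.0)), math.sqrt(max(z[0] + r, 0.0))
    return [0.5 * (s1 + s2)] + [0.5 * (s2 - s1) * t for t in w]

def linv(z, b):
    """Solve L_z w = b, i.e. z∘w = b, for z in int K^n."""
    det = z[0] ** 2 - sum(t * t for t in z[1:])
    w1 = (z[0] * b[0] - sum(p * q for p, q in zip(z[1:], b[1:]))) / det
    return [w1] + [(q - w1 * p) / z[0] for p, q in zip(z[1:], b[1:])]

def gauss_solve(A, b):
    """Dense Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (aug[r][n] - sum(aug[r][j] * sol[j] for j in range(r + 1, n))) / aug[r][r]
    return sol

def fb_z(x, y, eps):
    """z = (x^2 + y^2 + 2 eps^2 e)^(1/2); phi_FB(x, y, eps) = x + y - z."""
    v = [a + b for a, b in zip(jprod(x, x), jprod(y, y))]
    v[0] += 2.0 * eps * eps
    return soc_sqrt(v)

def big_theta(w, M, q):
    """Theta(omega) = (eps, E(x, y), theta(x, y, eps)) with the hypothetical E(x, y) = Mx + q - y."""
    n = len(q)
    eps, x, y = w[0], w[1:n + 1], w[n + 1:]
    E = [sum(M[i][j] * x[j] for j in range(n)) + q[i] - y[i] for i in range(n)]
    return [eps] + E + [a + b - c for a, b, c in zip(x, y, fb_z(x, y, eps))]

def big_jacobian(w, M):
    """Theta'(omega) for eps > 0: block rows (1, 0, 0), (0, M, -I), (theta'_eps, D_x, D_y)."""
    n = len(M)
    eps, x, y = w[0], w[1:n + 1], w[n + 1:]
    z = fb_z(x, y, eps)
    J = [[0.0] * (2 * n + 1) for _ in range(2 * n + 1)]
    J[0][0] = 1.0
    d_eps = linv(z, [1.0] + [0.0] * (n - 1))                # L_z^{-1} e
    for i in range(n):
        J[1 + i][1:1 + n] = M[i][:]
        J[1 + i][1 + n + i] = -1.0
        J[1 + n + i][0] = -2.0 * eps * d_eps[i]
    for j in range(n):
        ej = [1.0 if k == j else 0.0 for k in range(n)]
        cx, cy = linv(z, jprod(x, ej)), linv(z, jprod(y, ej))
        for i in range(n):
            J[1 + n + i][1 + j] = ej[i] - cx[i]              # I - L_z^{-1} L_x
            J[1 + n + i][1 + n + j] = ej[i] - cy[i]          # I - L_z^{-1} L_y
    return J

def smoothing_newton(M, q, x0, y0, eps_bar=1.0, gamma=0.5, delta=0.5, sigma=1e-4,
                     tol=1e-9, max_iter=100):
    n = len(q)
    w = [eps_bar] + list(x0) + list(y0)                      # Step 0: eps^0 = eps_bar
    w_bar = [eps_bar] + [0.0] * (2 * n)                      # omega_bar = (eps_bar, 0)

    def merit(v):                                            # Xi(omega) = ||Theta(omega)||^2
        return sum(t * t for t in big_theta(v, M, q))

    for k in range(max_iter):
        xi = merit(w)
        if math.sqrt(xi) <= tol:                             # Step 1, with a tolerance
            return (w[0], w[1:n + 1], w[n + 1:]), k
        beta = gamma * min(1.0, xi)
        rhs = [beta * b - t for b, t in zip(w_bar, big_theta(w, M, q))]
        d = gauss_solve(big_jacobian(w, M), rhs)             # Step 2: solve (60)
        t, c = 1.0, 2.0 * sigma * (1.0 - gamma * eps_bar)
        while t > 1e-12 and merit([a + t * b for a, b in zip(w, d)]) > (1.0 - c * t) * xi:
            t *= delta                                       # Step 3: backtracking for (61)
        w = [a + t * b for a, b in zip(w, d)]                # Step 4
    return (w[0], w[1:n + 1], w[n + 1:]), max_iter

M = [[2.0, 0.5, 0.0], [0.5, 3.0, 0.2], [0.0, 0.2, 1.5]]     # hypothetical symmetric PD data
q = [-1.0, 0.5, 0.3]
(eps, x, y), iters = smoothing_newton(M, q, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(iters, eps, x, y)
```

The returned pair can be checked against x ∈ K³, y ∈ K³, ⟨x, y⟩ = 0 and y = Mx + q.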

By Prop. 6.3(a), the mapping Θ(·) is continuously differentiable at any ω^k ∈ IR++ × IRⁿ × IRⁿ × IR^l. If, for each ε > 0 and (x, y, ζ) ∈ IRⁿ × IRⁿ × IR^l, the mapping E satisfies the condition of Prop. 6.3(b), then Θ′(ω) is nonsingular. As remarked after Prop. 6.3, there are many types of SOCCPs, even nonmonotone ones, for which E satisfies this condition. The main computational work of Algorithm 6.1 is computing the direction dω^k from (60). In Subsection 6.2, we show that for standard linear SOCPs, computing dω^k requires only one factorization of an m × m positive definite matrix. For nonlinear SOCPs, it requires one factorization of an n × n positive definite matrix and one factorization of an m × m positive definite matrix.
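The first block row of Θ′(ω) in Prop. 6.3(a) also shows how the smoothing parameter evolves along the iterations. Assume, as that row and the choice ω̄ = (ε̄, 0) suggest, that the first component of Θ(ω) in (12) is ε itself. Then the ε-component of (60) and Step 4 give

$$\varepsilon^k+d\varepsilon^k=\beta_k\bar\varepsilon\quad\Longrightarrow\quad\varepsilon^{k+1}=\varepsilon^k+\delta^{l_k}d\varepsilon^k=(1-\delta^{l_k})\,\varepsilon^k+\delta^{l_k}\beta_k\bar\varepsilon.$$

So ε^{k+1} > 0 whenever ε^k > 0, because δ^{l_k} ∈ (0, 1] and β_k = γ min{1, Ξ(ω^k)} > 0 as long as Step 1 has not stopped. With a full step (l_k = 0), ε^{k+1} = β_k ε̄.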

Since ε⁰ = ε̄ > 0 and β(ω⁰) ≤ γ < 1, the starting point ω⁰ = (ε⁰, x⁰, y⁰, ζ⁰) belongs to the set

$$\Omega:=\big\{\omega=(\varepsilon,x,y,\zeta)\in\mathbb{R}\times\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^l\ \big|\ \varepsilon\ge\beta(\omega)\bar\varepsilon\big\}.$$

Therefore, using the same arguments as in [54], one can prove that Algorithm 6.1 generates an infinite sequence {ω^k} with ε^k ∈ IR++ and ω^k ∈ Ω, provided that the Jacobian Θ′(ω^k) is invertible for each k with ε^k > 0 and ω^k ∈ Ω. In particular, we have the following global convergence result, whose proof is similar to that of [54, Theorem 4].

Theorem 6.1 Suppose that for all ω = (ε, x, y, ζ) ∈ Ω, rank E′_ζ(x, y, ζ) = l and the implication (57) holds for any u ≠ 0, v ≠ 0. Then Algorithm 6.1 generates an infinite sequence {ω^k}, and each accumulation point ω* of {ω^k} is a solution of Θ(ω) = 0.

When Algorithm 6.1 is applied to the SOCCP (3), the generated sequence is bounded if F and G satisfy one of the conditions of Prop. 4.2. In that case, Ξ is coercive by Prop. 6.4, and the line search (61) makes {Ξ(ω^k)} nonincreasing. For linear SOCPs, the counterexample after Prop. 6.4 shows that the full row rank of A cannot guarantee the boundedness of the level sets of the merit function Ξ. Nevertheless, the numerical results in Subsection 6.2 show that the sequence generated by Algorithm 6.1 is generally bounded for this class of problems.

Assume that φ in Algorithm 6.1 is chosen as φ_CM with g given by (52), φ_SQ or φ_FB. By Prop. 6.2(c), the operator Θ is semismooth everywhere whenever E is semismooth, and strongly semismooth everywhere whenever E is strongly semismooth. Therefore, assuming that all elements of ∂_BΘ(ω*) are nonsingular, arguments similar to those of [54] yield the following local superlinear (or quadratic) convergence results.

Theorem 6.2 Suppose that for all ω = (ε, x, y, ζ) ∈ Ω, rank E′_ζ(x, y, ζ) = l and the implication (57) holds for any u ≠ 0, v ≠ 0. Let ω* be an accumulation point of the infinite sequence {ω^k} generated by Algorithm 6.1 with φ = φ_CM for g given by (52), φ_SQ or φ_FB. If E is semismooth at ω* and all V ∈ ∂_BΘ(ω*) are nonsingular, then the whole sequence {ω^k} converges to ω*, and

$$\|\omega^{k+1}-\omega^*\|=o(\|\omega^k-\omega^*\|)\quad\text{and}\quad\varepsilon^{k+1}=o(\varepsilon^k).$$

If, in addition, E is strongly semismooth at ω*, then

$$\|\omega^{k+1}-\omega^*\|=O(\|\omega^k-\omega^*\|^2)\quad\text{and}\quad\varepsilon^{k+1}=O\big((\varepsilon^k)^2\big).$$

Using the same arguments as in [64] and [48], it is not hard to verify that the conditions of Theorems 5.1 and 5.2 ensure the nonsingularity of all elements of ∂_BΘ(ω*). Thus, as for the NR and FB semismooth Newton methods, the local superlinear (or quadratic) convergence of the smoothing Newton methods does not require strict complementarity of the solutions. This holds for the CHKS smoothing function, the log-exponential smoothing function, the squared smoothing function and the FB smoothing function.
