6 Smoothing Newton methods and applications
6.1 Smoothing Newton methods
and m = 120, n = 2650. The matrices Q ∈ IRn×n and A ∈ IRm×n and the vectors b ∈ IRm and c ∈ IRn are generated in the same way as in Subsection 4.1. We applied the above nonmonotone line search version of Algorithm 5.1 with the parameters in (49) and tolerance ϵ = 1.0 × 10^{-9} to 50 randomly generated test instances. The computational results are as follows: Algorithm 5.1 solves all problems to the given accuracy, the average number of function evaluations and the average number of iterations are 11 and 10, respectively, the average CPU time per problem is 857.3 s, and no problem makes use of the negative gradient steps. As pointed out in Subsection 4.1, this class of problems is badly scaled and the optimal value is of order 10^3, but the numerical results show that the FB semismooth Newton method does not suffer from this.
This function was employed to develop a smoothing Newton method in [18], where numerical comparisons with the interior point method SDPT3 for linear SOCPs indicate that the former is very promising. The FB smoothing function associated with Kn is
φFB(x, y, ε) := (x + y) − (x^2 + y^2 + 2ε^2 e)^{1/2} for all x, y ∈ IRn, ε > 0. (54)

The following proposition shows that φCM, φSQ and φFB defined as above are uniform smooth approximations of the corresponding SOC complementarity functions, and characterizes the properties of the Jacobians of these smoothing functions.
Proposition 6.1 Let φ be one of φCM, φSQ and φFB defined as above. Then,
(a) φ is positively homogeneous, i.e., φ(tx, ty, tε) = tφ(x, y, ε) for all x, y∈ IRn, ε, t > 0.
(b) For any x, y ∈ IRn and ε2 > ε1 > 0, there hold

κ(ε2 − ε1)e ≽Kn φ(x, y, ε2) − φ(x, y, ε1) ≻Kn 0  and  κε1 e ≽Kn φ(x, y, 0) − φ(x, y, ε1) ≻Kn 0,

where φ(x, y, 0) = lim_{ε↓0} φ(x, y, ε), and κ = g(0) if φ = φCM and κ = √2 otherwise.
(c) φ is continuously differentiable everywhere in IRn× IRn× IR++. Furthermore,

φ′CM(x, y, ε) = [ I − ∇g^{soc}(z),  ∇g^{soc}(z),  ∇g^{soc}(z)^T z − g^{soc}(z) ]  with z = (x − y)/ε,

where ∇g^{soc}(z) has the same expression as given in Lemma 2.2;

φ′SQ(x, y, ε) = (1/2) [ I − L_z^{-1} L_{x−y},  I + L_z^{-1} L_{x−y},  −4ε L_z^{-1} e ]  with z = ((x − y)^2 + 4ε^2 e)^{1/2};

φ′FB(x, y, ε) = [ I − L_z^{-1} L_x,  I − L_z^{-1} L_y,  −2ε L_z^{-1} e ]  with z = (x^2 + y^2 + 2ε^2 e)^{1/2}.

(d) The partial Jacobians φ′x and φ′y of φ are nonsingular in IRn× IRn× IR++.

(e) The matrices (φ′x)^{-1} φ′y and (φ′y)^{-1} φ′x are positive definite in IRn× IRn× IR++.

Proof. Part (a) is direct from the expression of φ. When φ = φCM or φFB, part (b) is proved in [25, Prop. 5.1]; using similar arguments, we can prove that part (b) also holds for φ = φSQ. Part (c) is immediate from [9, Prop. 5]. When φ = φCM, the proof of part (d) can be found in [25, Prop. 6.1], and when φ = φSQ or φFB, part (d) is direct from the expressions of φ′x and φ′y. When φ = φCM or φFB, the proof of part (e) can be found in Prop. 6.1 and 6.2 of [25], respectively; when φ = φSQ, since z^2 ≻Kn (x − y)^2 and z ≻Kn 0, we have from Prop. 3.4 of [25] that L_z^2 − L_{x−y}^2 is positive definite, and part (e) follows by noting that

2(L_z^2 − L_{x−y}^2) = (L_z − L_{x−y})(L_z + L_{x−y}) + (L_z + L_{x−y})(L_z − L_{x−y}). □
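To make the formulas in Proposition 6.1(c) concrete, the following small sketch (Python/NumPy; the helper names arrow, jordan_sqrt and phi_fb are ours, not from the paper) evaluates φFB and its partial Jacobians for a single cone Kn via the spectral decomposition and the arrow matrices L_x, and checks the block I − L_z^{-1}L_x against a finite-difference approximation.

```python
import numpy as np

def arrow(x):
    """Arrow (Lorentz) matrix L_x, so that L_x y = x ∘ y for the Jordan product on K^n."""
    L = np.eye(x.size) * x[0]
    L[0, :] = x
    L[:, 0] = x
    return L

def jordan_sqrt(x):
    """Square root of x ≻_{K^n} 0 via its spectral decomposition."""
    x1, x2 = x[0], x[1:]
    nx2 = np.linalg.norm(x2)
    w = x2 / nx2 if nx2 > 0 else np.zeros_like(x2)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return np.sqrt(x1 - nx2) * u1 + np.sqrt(x1 + nx2) * u2

def phi_fb(x, y, eps):
    """FB smoothing function (54): x + y - (x∘x + y∘y + 2 eps^2 e)^{1/2}."""
    e = np.zeros_like(x); e[0] = 1.0
    z = jordan_sqrt(arrow(x) @ x + arrow(y) @ y + 2.0 * eps**2 * e)
    return x + y - z, z

def phi_fb_jacobians(x, y, eps):
    """Partial Jacobians from Prop. 6.1(c): I - L_z^{-1}L_x, I - L_z^{-1}L_y, -2 eps L_z^{-1} e."""
    _, z = phi_fb(x, y, eps)
    Lz_inv = np.linalg.inv(arrow(z))
    e = np.zeros_like(x); e[0] = 1.0
    I = np.eye(x.size)
    return I - Lz_inv @ arrow(x), I - Lz_inv @ arrow(y), -2.0 * eps * (Lz_inv @ e)

# finite-difference check of the x-block of the Jacobian
rng = np.random.default_rng(0)
n, eps, h = 4, 0.1, 1e-6
x, y = rng.standard_normal(n), rng.standard_normal(n)
Jx = phi_fb_jacobians(x, y, eps)[0]
Jx_fd = np.zeros((n, n))
for j in range(n):
    d = np.zeros(n); d[j] = h
    Jx_fd[:, j] = (phi_fb(x + d, y, eps)[0] - phi_fb(x - d, y, eps)[0]) / (2 * h)
print(np.max(np.abs(Jx - Jx_fd)))   # small: the closed-form Jacobian matches finite differences
```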
With φ = φCM, φSQ and φFB above, we define the function θ : IRn× IRn× IR → IRn by

θ(x, y, ε) := φ(x, y, |ε|) if ε ≠ 0,  and  θ(x, y, ε) := ϕ(x, y) if ε = 0. (55)
The following proposition summarizes some favorable properties of the function θ.
Proposition 6.2 Let θ be defined as in (55) with φ = φCM, φSQ or φFB. Then,
(a) θ is continuously differentiable at any (x, y, ε) with ε ≠ 0. In particular, in this case,
∥θ′(x, y, ε)∥ ≤ C, where C > 0 is a constant independent of x, y and ε.
(b) θ is globally Lipschitz continuous and directionally differentiable everywhere.
(c) θ is a strongly semismooth function if φ = φCM with g given by (52), φSQ or φFB.

Proof. (a) The first part is a direct consequence of Prop. 6.1(c). For the second part, when φ = φCM, by the properties of g in (50) and the expression of ∇g^{soc}(z) in Lemma 2.2, it is easy to verify the boundedness of θ′; when φ = φSQ or φFB, the same arguments as those of [10, Lemma 4] show that θ′ is bounded.
(b) Using part (a) and Prop. 6.1(b) and noting that ϕNR and ϕFB are globally Lipschitz continuous and directionally differentiable everywhere, we readily get the result.
(c) When φ = φCM with g given by (52), let h : IR^2 → IR be defined by

h(t, ε) := (√(t^2 + 4ε^2) + t)/2  or  h(t, ε) := t + ε ln(1 + exp(−t/ε)) for all t, ε ∈ IR.

Then it is not hard to see that, for any x, y ∈ IRn and ε ∈ IR,

θ(x, y, ε) = x − [ h(λ1(z), ε) u_z^{(1)} + h(λ2(z), ε) u_z^{(2)} ],

where λ1(z) u_z^{(1)} + λ2(z) u_z^{(2)} is the spectral decomposition of z = x − y. From Prop. 1 and Prop. 2 of [54], both choices of h are strongly semismooth functions. Hence, θ is strongly semismooth everywhere in IRn× IRn× IR by [9, Prop. 7]. When φ = φSQ or φFB, the result is implied by [18, Theorem 4.2] and [58, Theorem 3.2], respectively. □
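As a small numerical illustration of this construction (Python/NumPy; the function names are ours), the sketch below evaluates the two scalar smoothings h used in the proof, confirms that both approach the plus function (t)_+ as ε ↓ 0, and builds θ(x, y, ε) for φ = φCM by applying h spectrally to z = x − y.

```python
import numpy as np

def h_chks(t, eps):
    """CHKS smoothing of the plus function: (sqrt(t^2 + 4 eps^2) + t) / 2."""
    return 0.5 * (np.sqrt(t**2 + 4.0 * eps**2) + t)

def h_logexp(t, eps):
    """Log-exponential smoothing: t + eps*log(1 + exp(-t/eps)), computed stably via logaddexp."""
    return eps * np.logaddexp(0.0, t / eps)

def theta_cm(x, y, eps, h):
    """theta(x,y,eps) = x - [h(lam1(z),eps) u_z^(1) + h(lam2(z),eps) u_z^(2)] with z = x - y."""
    z = x - y
    nz2 = np.linalg.norm(z[1:])
    w = z[1:] / nz2 if nz2 > 0 else np.zeros(z.size - 1)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return x - (h(z[0] - nz2, eps) * u1 + h(z[0] + nz2, eps) * u2)

t = np.linspace(-3.0, 3.0, 7)
for eps in (1.0, 1e-2, 1e-4):
    print(eps,
          np.max(np.abs(h_chks(t, eps) - np.maximum(t, 0.0))),
          np.max(np.abs(h_logexp(t, eps) - np.maximum(t, 0.0))))
# both deviations shrink with eps, i.e. h(., eps) -> (t)_+, so theta(x, y, eps) -> phi_NR(x, y)
```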
Unless otherwise stated, θ(x, y, ε) in the rest of this section is the function associated with K, i.e. θ(x, y, ε) = (θ(x1, y1, ε), . . . , θ(xm, ym, ε)) with θ(xi, yi, ε) defined as in (55).
Let Θ : IR× IRn× IRn× IRl → IR × IRn× IRn× IRl be the operator defined by (12) with such θ. The following proposition shows that Θ is continuously differentiable at any ω ∈ IR++× IRn× IRn× IRl, and has nonsingular Jacobians under some mild assumptions.
Proposition 6.3 Let Θ be defined by (12) with θ given as in (55). Then,
(a) the operator Θ is continuously differentiable at any ω ∈ IR++× IRn× IRn× IRl and

Θ′(ω) =
[ 1              0              0              0             ]
[ 0              Ex′(x, y, ζ)   Ey′(x, y, ζ)   Eζ′(x, y, ζ)  ]
[ θ′ε(x, y, ε)   Dx(x, y, ε)    Dy(x, y, ε)    0             ]

where

Dx(x, y, ε) := diag( θ′x1(x1, y1, ε), . . . , θ′xm(xm, ym, ε) ),
Dy(x, y, ε) := diag( θ′y1(x1, y1, ε), . . . , θ′ym(xm, ym, ε) ). (56)
(b) Θ′(ω) is nonsingular provided that rank Eζ′(x, y, ζ) = l and, for any u ≠ 0, v ≠ 0,

E′(x, y, ζ)(u, v, s) = 0 ⇒ ∃ν ∈ {1, . . . , m} s.t. uν ≠ 0 and ⟨uν, vν⟩ ≥ 0. (57)

Proof. Part (a) is direct from Prop. 6.2(a) and the definition of Θ. We next prove part (b). By the expression of Θ′(ω), it suffices to prove that the system

Ex′(x, y, ζ)u + Ey′(x, y, ζ)v + Eζ′(x, y, ζ)s = 0,
Dx(x, y, ε)u + Dy(x, y, ε)v = 0 (58)

has only the zero solution. If one of u and v is zero, then we must have u = 0 and v = 0 from the second equation, since Dx(x, y, ε) and Dy(x, y, ε) are nonsingular by Prop. 6.1(d). Together with the first equation and the assumption rank Eζ′(x, y, ζ) = l, we get s = 0. Thus u = 0, v = 0 and s = 0 in this case. If u ≠ 0 and v ≠ 0, then the first equation of (58) and the given assumption imply that there exists a ν ∈ {1, . . . , m} such that uν ≠ 0 and ⟨uν, vν⟩ ≥ 0. Note that the second equation of (58) is equivalent to

φ′xi(xi, yi, ε)ui + φ′yi(xi, yi, ε)vi = 0 for all i = 1, 2, . . . , m.

For i = ν, since φ′yν(xν, yν, ε) is nonsingular by Prop. 6.1(d), it follows that

uν^T [φ′yν(xν, yν, ε)]^{-1} [φ′xν(xν, yν, ε)] uν + ⟨uν, vν⟩ = 0.

From Prop. 6.1(e), the matrix [φ′yν(xν, yν, ε)]^{-1} [φ′xν(xν, yν, ε)] is positive definite, and from the last equality and ⟨uν, vν⟩ ≥ 0 it then follows that uν = 0. Thus, we obtain a contradiction. The proof is completed. □
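As an implementation-level illustration of the block structure in part (a) (a minimal sketch; E and the per-cone Jacobian blocks of θ are placeholders assumed to be provided by the caller), Θ′(ω) can be assembled directly from (56):

```python
import numpy as np
from scipy.linalg import block_diag

def assemble_theta_prime(Ex, Ey, Ezeta, theta_eps_blocks, theta_x_blocks, theta_y_blocks):
    """Assemble Theta'(omega) as in Prop. 6.3(a):
       [ 1           0     0     0       ]
       [ 0           E'_x  E'_y  E'_zeta ]
       [ theta'_eps  D_x   D_y   0       ]
    Each *_blocks argument is a list with one entry per cone K^{n_i}."""
    Dx = block_diag(*theta_x_blocks)                   # D_x(x, y, eps) in (56)
    Dy = block_diag(*theta_y_blocks)                   # D_y(x, y, eps) in (56)
    theta_eps = np.concatenate(theta_eps_blocks).reshape(-1, 1)
    n, l = Dx.shape[0], Ezeta.shape[1]
    row1 = np.hstack([np.ones((1, 1)), np.zeros((1, 2 * n + l))])
    row2 = np.hstack([np.zeros((Ex.shape[0], 1)), Ex, Ey, Ezeta])
    row3 = np.hstack([theta_eps, Dx, Dy, np.zeros((n, l))])
    return np.vstack([row1, row2, row3])
```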
The condition in Prop. 6.3(b) is weaker than the conditions (6.2)–(6.3) of [25]. When l = 0, the condition of Prop. 6.3(b) is equivalent to saying that Ex′ and Ey′ are Cartesian column monotone, whereas the condition (6.3) of [25] is equivalent to saying that Ex′ and Ey′ are column monotone. For the SOCCP (4), the condition in Prop. 6.3(b) is equivalent to requiring the Cartesian P0-property of F′, whereas the condition (6.3) of [25] is equivalent to requiring the positive semidefiniteness of F′. Recently, for the SOCCP (4), Chua et al. [19] established the nonsingularity of Θ′(ω) with φ = φCM and φSQ under the uniform nonsingularity of F, which is another nonmonotone property of F. It is not yet clear whether this condition is weaker than the Cartesian P0-property of F′ for differentiable F.

Let Ξ : IR+× IRn× IRn× IRl → IR+ be the natural merit function of Θ(ω) = 0, i.e.,
Ξ(ω) = ∥Θ(ω)∥^2 for all ω ∈ IR+× IRn× IRn× IRl. (59)

The following proposition provides coerciveness conditions on Ξ for the SOCCP (3).
Proposition 6.4 Let Θ and Ξ be defined by (12) and (59), respectively. Suppose that
E(x, y, ζ) ≡ ( F(ζ) − x, G(ζ) − y )

with F and G being continuous. Then Ξ is coercive under condition (C.1) or (C.2) of Prop. 4.2.
Proof. We first prove the result under condition (C.1) of Prop. 4.2. Suppose on the contrary that there exist a constant γ ≥ 0 and a sequence {ωk} with ∥ωk∥ → ∞ such that Ξ(ωk) ≤ γ. Since {εk} is bounded (because Ξ(ωk) ≤ γ), we must have ∥(xk, yk, ζk)∥ → ∞.

Observe that ∥ζk∥ → ∞ necessarily holds: if not, using the continuity of F and G and Ξ(ωk) ≤ γ, we deduce that {xk} and {yk} are bounded, which contradicts the fact that ∥(xk, yk, ζk)∥ → ∞. From the uniform Jordan P-property and the linear growth of F and G, it then follows that ∥F(ζk)∥, ∥G(ζk)∥ → ∞. If not, we may assume without loss of generality that {F(ζk)} is bounded. Define the bounded sequence {ξk} by ξ_i^k := 0 if i ∈ J and ξ_i^k := ζ_i^k otherwise, where J := { i ∈ {1, . . . , m} | {∥ζ_i^k∥} is unbounded } ≠ ∅. Using the boundedness of {F(ζk)} and {F(ξk)} and the linear growth of G, we obtain that

λ2[ (F(ζk) − F(ξk)) ◦ (G(ζk) − G(ξk)) ] ≤ C1∥ζk∥ + C2

for some C1, C2 > 0, which contradicts the uniform Jordan P-property of F and G. Thus,

∥ζk∥ → +∞, ∥F(ζk)∥ → ∞, ∥G(ζk)∥ → ∞, ∥xk∥ → ∞, ∥yk∥ → ∞.

By Prop. 6.1 and Ξ(ωk) ≤ γ, the sequence {ϕ(xk, yk)} is bounded with ϕ = ϕNR or ϕFB. This together with Lemma 3.5 and the last display implies that {λ1(xk)} and {λ1(yk)} are bounded below, but λ2(xk), λ2(yk) → +∞. From the proof of [47, Prop. 4.2(a)], we know that the uniform Jordan P-property and the linear growth of F and G imply that

lim_{k→∞} (F(ζk)/∥F(ζk)∥) ◦ (G(ζk)/∥G(ζk)∥) ≠ 0.

Using the boundedness of {F(ζk) − xk} and {G(ζk) − yk}, it is easy to argue that lim_{k→∞} (xk/∥xk∥) ◦ (yk/∥yk∥) ≠ 0. From Lemma 3.5, it then follows that {ϕ(xk, yk)} is unbounded, which is impossible.
When condition (C.2) of Prop. 4.2 holds, using similar arguments as above together with those of [47, Prop. 4.2(d)], we can prove the desired result. □
Note that, when

E(x, y, ζ) ≡ ( Ax − b, ATζ + y − c )

with A ∈ IRm×n, b ∈ IRm and c ∈ IRn, the full row rank of A cannot guarantee the coerciveness of Ξ. For example, let

A = [ 1 −1 1 ; −1 1 2 ],  xk = (k, k − 1, 0)^T,  yk = 0,  εk ∈ [0, 1],

and let {ζk} be an arbitrary bounded sequence in IR2. Clearly, A has full row rank. Since xk ∈ K3 and yk = 0, the sequence {ϕ(xk, yk)} is bounded with ϕ = ϕNR or ϕFB, and so is {θ(xk, yk, εk)}. In addition, it is easy to verify that {Axk − b} and {ATζk + yk − c} are bounded. This means that {Ξ(ωk)} is bounded, but ∥ωk∥ → ∞, i.e., Ξ is not coercive.
This partly explains why, in Subsection 6.2, the smoothing method below requires many more iterations than the interior point methods when solving some linear SOCPs.
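The counterexample can also be checked numerically. In the sketch below (Python/NumPy; b, c and the bounded sequence {ζk} are chosen arbitrarily, which the argument allows), all residuals entering Ξ(ωk) stay bounded while ∥xk∥ grows without bound.

```python
import numpy as np

A = np.array([[ 1.0, -1.0, 1.0],
              [-1.0,  1.0, 2.0]])
b = np.zeros(2); c = np.zeros(3)        # any fixed b and c serve the argument
zeta = np.array([0.3, -0.7])            # stand-in for an arbitrary bounded sequence {zeta^k}
for k in (1e1, 1e3, 1e6):
    x = np.array([k, k - 1.0, 0.0])     # x^k = (k, k-1, 0)
    y = np.zeros(3)
    lam1 = x[0] - np.linalg.norm(x[1:]) # smallest spectral value of x^k
    print(k, lam1,                      # lam1 = 1 > 0, so x^k lies in K^3 and phi(x^k, 0) = 0
          np.linalg.norm(A @ x - b),    # stays equal to sqrt(2)
          np.linalg.norm(A.T @ zeta + y - c))
# every printed residual stays bounded although ||omega^k|| -> infinity, so Xi is not coercive
```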
Motivated by the efficiency of the smoothing Newton method [54], we next apply this method to solve the SOC complementarity system (1), i.e., we want to obtain a solution of (1) by solving a single augmented smooth system Θ(ω) = 0. Choose ε̄ > 0 and γ ∈ (0, 1) such that ε̄γ < 1, and let ω̄ = (ε̄, 0) ∈ IR++× IR2n+l. Define

β(ω) := γ min{1, Ξ(ω)}.
The smoothing Newton method [54] for solving system (1) is described as follows.
Algorithm 6.1 (Smoothing Newton method)

Step 0. Select a smoothing function φ of ϕNR or ϕFB. Choose constants δ ∈ (0, 1) and σ ∈ (0, 1/2), and a point (x0, ζ0, y0) ∈ IRn× IRl× IRn. Let ε0 = ε̄ and k := 0.

Step 1. If Θ(ωk) = 0, then stop. Otherwise, let βk := β(ωk).

Step 2. Compute the direction dωk := (dεk, dxk, dyk, dζk) ∈ IR× IRn× IRn× IRl by

Θ(ωk) + Θ′(ωk)dω = βk ω̄. (60)

Step 3. Let lk be the smallest nonnegative integer l satisfying

Ξ(ωk + δ^l dωk) ≤ [ 1 − 2σ(1 − γ∥ω̄∥)δ^l ] Ξ(ωk). (61)

Step 4. Define ωk+1 := ωk + δ^{lk} dωk. Let k := k + 1, and then go to Step 1.
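A direct transcription of Algorithm 6.1 might look as follows (a minimal sketch in Python/NumPy, assuming the user supplies Θ and its Jacobian as callables on a flat vector ω = (ε, x, y, ζ); the linear system (60) is solved densely here, ignoring the structured factorizations discussed below).

```python
import numpy as np

def smoothing_newton(Theta, Theta_jac, omega0, eps_bar, gamma=0.2,
                     delta=0.5, sigma=1e-4, tol=1e-10, max_iter=200):
    """Smoothing Newton method (Algorithm 6.1) for Theta(omega) = 0.
    omega is stored as one flat array with omega[0] = eps; requires eps_bar * gamma < 1."""
    Xi = lambda w: float(np.dot(Theta(w), Theta(w)))          # merit function (59)
    omega = np.asarray(omega0, dtype=float).copy()
    omega_bar = np.zeros_like(omega); omega_bar[0] = eps_bar  # bar(omega) = (bar(eps), 0)
    for k in range(max_iter):
        res = Theta(omega)
        if np.linalg.norm(res) <= tol:                        # Step 1: stop if Theta(omega^k) ~ 0
            return omega, k
        beta = gamma * min(1.0, Xi(omega))                    # beta_k = gamma * min{1, Xi(omega^k)}
        d = np.linalg.solve(Theta_jac(omega), beta * omega_bar - res)   # Step 2: system (60)
        step, Xi_k = 1.0, Xi(omega)
        while Xi(omega + step * d) > (1.0 - 2.0 * sigma * (1.0 - gamma * eps_bar) * step) * Xi_k:
            step *= delta                                     # Step 3: line search (61), step = delta^l
        omega = omega + step * d                              # Step 4
    return omega, max_iter
```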
From Prop. 6.3(a), it follows that the mapping Θ(·) is continuously differentiable at any ωk ∈ IR++× IRn× IRn× IRl, and if for each ε > 0 and (x, y, ζ) ∈ IRn× IRn× IRl the mapping E satisfies the condition of Prop. 6.3(b), then Θ′(ω) is nonsingular. As remarked after Prop. 6.3, there are many types of SOCCPs, even nonmonotone SOCCPs, for which E satisfies this condition. The main computational work of Algorithm 6.1 is the calculation of the direction dωk from (60). In Subsection 6.2, we show that for standard linear SOCPs the calculation of dωk needs only one factorization of an m× m positive definite matrix, whereas for nonlinear SOCPs it requires one factorization of an n× n positive definite matrix and one factorization of an m× m positive definite matrix.
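To see where the m× m factorization comes from in the linear case, here is a sketch of the block elimination under the assumption that E(x, y, ζ) ≡ (Ax − b, ATζ + y − c), as in the example after Prop. 6.4 (the precise bookkeeping in Subsection 6.2 may differ). Writing (60) blockwise gives

dε = βk ε̄ − εk,  A dx = b − Axk,  AT dζ + dy = c − ATζk − yk,  Dx dx + Dy dy = −θ − θ′ε dε.

Eliminating dy from the third equation and then dx from the fourth yields

(A Dx^{-1} Dy AT) dζ = (b − Axk) − A Dx^{-1}[ −θ − θ′ε dε − Dy(c − ATζk − yk) ],

and A Dx^{-1} Dy AT is an m× m positive definite matrix whenever A has full row rank, since Dx^{-1} Dy is positive definite by Prop. 6.1(e); dx and dy are then recovered by back-substitution.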
Note that ε̄ > 0 and the starting point ω0 = (ε0, x0, y0, ζ0) belongs to the following set:

Ω := { ω = (ε, x, y, ζ) ∈ IR× IRn× IRn× IRl | ε ≥ β(ω)ε̄ }.
Therefore, using the same arguments as those of [54], we can prove that Algorithm 6.1 generates an infinite sequence {ωk} with εk ∈ IR++ and ωk ∈ Ω, provided that for each k with εk > 0 and ωk ∈ Ω, the Jacobian Θ′(ωk) is invertible. In particular, we have the following global convergence result, whose proof is similar to that of [54, Theorem 4].
Theorem 6.1 Suppose that for all ω = (ε, x, y, ζ) ∈ Ω, rank Eζ′(x, y, ζ) = l and the implication (57) holds for any u ≠ 0, v ≠ 0. Then, an infinite sequence {ωk} is generated by Algorithm 6.1 and each accumulation point ω∗ of {ωk} is a solution of Θ(ω) = 0.
When applying Algorithm 6.1 to the SOCCP (3), the generated sequence is bounded if the mappings F and G satisfy one of the conditions of Prop. 4.2. For linear SOCPs, the counterexample after Prop. 6.4 shows that the full row rank of A cannot guarantee the boundedness of the level sets of the merit function Ξ, although the numerical results in Subsection 6.2 demonstrate that the sequence generated by Algorithm 6.1 is generally bounded for this class of problems.
Assume that φ in Algorithm 6.1 is chosen as φCM with g given by (52), φSQ or φFB. By Prop. 6.2(c), the operator Θ is semismooth everywhere whenever E is semismooth, and strongly semismooth everywhere whenever E is strongly semismooth. Therefore, under the nonsingularity assumption on ∂BΘ(ω∗), using arguments similar to those of [54], we can establish the following local superlinear (or quadratic) convergence results.
Theorem 6.2 Suppose that for all ω = (ε, x, y, ζ) ∈ Ω, rank Eζ′(x, y, ζ) = l and the implication (57) holds for any u ̸= 0, v ̸= 0, and ω∗ is an accumulation point of the infinite sequence {ωk} generated by Algorithm 6.1 with φ = φCM for g given by (52), φSQ
or φFB. If E is semismooth at ω∗ and all V ∈ ∂BΘ(ω∗) are nonsingular, then the whole sequence {ωk} converges to ω∗, and
∥ωk+1− ω∗∥ = o(∥ωk− ω∗∥) and εk+1 = o(εk).
If, in addition, E is strongly semismooth at ω∗, then
∥ωk+1 − ω∗∥ = O(∥ωk − ω∗∥^2) and εk+1 = O((εk)^2).
Using the same arguments as in [64] and [48], it is not hard to verify that the conditions of Theorems 5.1 and 5.2 may guarantee the nonsingularity of ∂BΘ(ω∗). Thus, analogous to the NR and FB semismooth Newton methods, the local superlinear (or quadratic) convergence of the smoothing Newton methods with the CHKS smoothing function, the log-exponential smoothing function, the squared smoothing function and the FB smoothing function does not require strict complementarity of the solutions.