Two unconstrained optimization approaches for the Euclidean κ-centrum location problem

(1)

(will be inserted by the editor)

Shaohua Pan and Jein-Shan Chen

Two unconstrained optimization approaches for the Euclidean κ-centrum location problem

Abstract Consider the single-facility Euclidean κ-centrum location problem in Rⁿ. This problem is a generalization of the classical Euclidean 1-median problem and 1-center problem. In this paper, we develop two efficient algorithms that are particularly suitable for problems where n is large by using unconstrained optimization techniques. The first algorithm is based on the neural networks smooth approximation for the plus function and reduces the problem to an unconstrained smooth convex minimization problem. The second algorithm is based on the Fischer-Burmeister merit function for the second-order cone complementarity problem and transforms the KKT system of the second-order cone programming reformulation for the problem into an unconstrained smooth minimization problem. Our computational experiments indicate that both methods are extremely efficient for large problems and the first algorithm is able to solve problems of dimension n up to 10,000 efficiently.

Keywords The Euclidean κ-centrum problem, smoothing function, merit function, second-order cone programming

Mathematics Subject Classification (2000) 90C30

1 Introduction

Given a positive integer number κ in the interval [1, m], the single-facility Euclidean κ-centrum problem in Rⁿ concerns locating a new facility so as to minimize the sum of the κ largest weighted Euclidean distances to the existing m facilities. Let ai ∈ Rⁿ represent the position of the ith existing facility and x ∈ Rⁿ denote the unknown position of the new facility. Let

fi(x) = ωi

q¡x1− ai1

¢₂ +¡

x2− ai2

¢₂

+ · · · +¡

xn− ain

¢₂

, i = 1, 2, · · · , m

be the weighted Euclidean distance between the new facility and the ith existing facility, where ωi> 0 is the associated weight. Then the problem can be formulated as the following minimization problem

x∈RminⁿΦκ(x) :=

Xκ l=1

f[l](x) (1)

where f[1](x), f[2](x), · · · , f[m](x) are the functions obtained by sorting f1(x), f2(x), · · · , fm(x) in non- increasing order, namely for any x ∈ Rⁿ,

f[1](x) ≥ f[2](x) ≥ · · · ≥ f[κ](x) ≥ · · · ≥ f[m](x).

Shaohua Pan: School of Mathematical Sciences, South China University of Technology, Guang zhou, 510640, China. E-mail: [email protected].

Jein-Shan Chen: Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan. E- mail: [email protected].

(2)

There are many different kinds of facility location problems and for which there have been proposed various methods; see [9,10,14] and the related literature in the web-site maintained by EWGLA (Euro Working Group on Locational Analysis). The single-facility Euclidean κ-centrum problem studied here is to generalize the classical Euclidean 1-median problem (corresponding to the case κ = m) and 1- center problem (corresponding to the case κ = 1) from a view of solution concept. To our knowledge, the κ-centrum concept was first defined by Slater [25] and Andreatta and Mason [3,4] for the discrete single-facility location problem, and later in [23] was extended to general location problems covering discrete and continuous decisions. In addition, Tamir et al. [22,28] recently did some excellent works related to the κ-centrum location problem, especially the rectilinear κ-centrum problem.

The current research for Euclidean facility location problems mainly focuses on the 1-median problem, for which many practical and efficient algorithms have been designed since Weiszfeld presented a simple iterative algorithm in 1937. These include the hyperboloid approximation procedure [11], the interior point algorithms [1,29,30], the smoothing Newton methods [19,20] and the merit function approach [7]. However, the solution methods for the κ-centrum location problem are rarely seen in the literature except [2,21], where the problem of minimizing the κ largest Euclidean norms is only mentioned as a special example of a second-order cone programming and consequently can be solved by an interior point method. The main purpose of this paper is to develop two efficient algorithms for the single-facility Euclidean κ-centrum problem in Rⁿ, which can be used to handle the cases where n is large (say, in the order of thousands).

Note that problem (1) is a nonsmooth convex optimization problem. Due to the nondifferentiability of the objective function, the gradient-based algorithms can not be used to solve the problem directly.

To overcome this difficulty, we reduce (1) as a nonsmooth problem only involving the plus function max{0, t}, and then utilize the neural network smoothing function [6] to give a convex approximation.

Based on the approximation problem, we propose a globally convergent quasi-Newton algorithm. In addition, we reformulate (1) as a standard primal second-order cone programming (SOCP) problem to circumvent its nondifferentiability. This SOCP reformulation is completely different from the ones in [2,21], and particularly we here use a merit function approach rather than an interior point method to solve it. More specifically, we convert its Karush-Kuhn-Tucker (KKT) system into an equivalent unconstrained smooth minimization problem by the Fischer-Burmeister merit function [8] for second-order cone complementarity problems (SOCCPs), and then solve this smooth optimization problem with a limited-memory BFGS method. In contrast to interior point methods, our unconstrained optimization methods do not require an interior point starting, and moreover have the advantages of requiring less work per iteration, thereby rendering themselves to large problems.

The rest of this paper proceeds as follows. In section 2, we derive a smooth approximation to (1).

Based on the smooth approximation, we design a globally convergent quasi-Newton algorithm in section 3. In section 4, we present the detailed process of reformulating (1) as a standard primal SOCP problem. Based on its KKT system and the Fischer-Burmeister merit function for SOCCPs, we develop another quasi-Newton algorithm in section 5. In section 6, we report our preliminary computational results and compare our algorithms with the SeDuMi 1.05 (a primal-dual interior point algorithm for the SOCP and the semidefinite programming). The results show that our first algorithm is the most effective by CPU time and able to solve problems of dimension n up to 10, 000, whereas our second algorithm is comparable with even superior to the SeduMi for the moderate problems (say, in the order of hundreds). Finally, we conclude this paper in section 7.

In this paper, unless otherwise stated, all vectors are column vectors. We use Id to denote the d × d identity matrix and 0d to denote a zero vector in R^d. To represent a large matrix with several small matrices, we use semicolons “;” for column concatenation and commas “,” for row concatenation. This notation also applies to vectors. For a convex function f : Rⁿ→ R, we let ∂f (x) denote the subdiffer- ential of f at x. Note that f is minimized at x over Rⁿ if and only if 0 ∈ ∂f (x). For the calculus rules on subdifferentials of convex functions, please refer to [16] or [24].

2 Smooth approximation

In this section, we reduce problem (1) to a nonsmooth optimization problem only involving the plus function max{0, t}, and then give a smooth approximation via the neural networks smoothing function proposed by [6]. We also demonstrate that the smooth approximation can be generated by regularizing

(3)

problem (1) with a binary entropy function.

First, it is not hard to verify that the function Φ_κ(x) in (1) can be expressed by

Φκ(x) = max nX^m

i=1

λifi(x) : Xm i=1

λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o

. (2)

Since, on the one hand, for any feasible point λ of the maximization problem in (2), there always holds P_m

i=1λifi(x) = λi1f[1](x) + λi2f[2](x) + · · · + λimf[m](x)

≤ λi1f[1](x) + · · · + λiκf[κ](x) + λi,κ+1f[κ](x) + · · · + λimf[κ](x)

= λi1f_[1](x) + · · · + λiκf_[κ](x) + (κ − λi1− · · · − λiκ)f_[κ](x)

≤ Φκ(x);

on the other hand, there exists an feasible pointP_m

i=1˜λifi(x) = Φκ(x) such thatP_m

i=1λ˜ifi(x) = Φκ(x), where ˜λi= 1 if fi(x) belongs to the κ largest function in the collection {fi(x)}^m_i=1, and otherwise ˜λi= 0.

We note that the linear programming problem in (2) has the following dual problem minw,η κw +P_m

i=1ηi

s.t. ηi≥ fi(x) − w, i = 1, 2, · · · , m, ηi≥ 0, i = 1, 2, · · · , m

(3)

where η = (η1, · · · , ηm)^T and w and ηi are the Lagrange multipliers associated with the constraints P_m

i=1λi= κ and λi≤ 1, respectively. Moreover, they both have nonempty feasible sets for any x ∈ Rⁿ. Therefore, from the duality theory of linear programming, Φκ(x) can also be represented by the dual problem (3). However, we observe that each pair of constraints ηi ≥ fi(x) − w and ηi≥ 0 in (3) can be replaced with one constraint ηi≥ max{fi(x) − w, 0}, whereas the objective function of (3) is increasing componentwise in η. Therefore,

Φκ(x) = min

w∈R

n κw +

Xm i=1

max{0, fi(x) − w}o

, (4)

Thus, problem (1) is simplified as a nonsmooth problem only involving the plus function:

w∈R,x∈Rmin ⁿ n

κw + Xm i=1

max{0, fi(x) − w}

o

. (5)

To the best of our knowledge, the equivalent formulation (5) has not been found in the literature though the derivation procedure above is very simple.

In [6], Chen and Mangasarian presented a class of smooth approximations to the plus function max{0, t}. Among these smooth approximations, the neural networks smooth function and the Chen- Harker-Kanzow-Smale smooth function are most commonly used. In this paper we will use the neural networks smoothing function, which is defined by

ϕ(t, ε) = ε ln£

1 + exp(t/ε)¤

(ε > 0). (6)

By ϕ(t, ε), we define the function

Φ(w, x, ε) := κw + Xm i=1

ϕ¡

fi(x, ε) − w, ε¢

(7)

where

f_i(x, ε) = ω_ip

kx − a_ik²+ ε², i = 1, 2, · · · , m.

(4)

Then, from Lemma 1 below, problem (1) can be converted into solving the following unconstrained smooth convex minimization problem

w∈R,x∈Rmin ⁿΦ(w, x, ε). (8)

Lemma 1 The function Φ(w, x, ε) defined in (7) has the following properties:

(i) For any w ∈ R, x ∈ Rⁿ, and ε1, ε2 satisfying 0 < ε1< ε2, we have

Φ(w, x, ε1) < Φ(w, x, ε2); (9)

(ii) For any x ∈ Rⁿ and ε > 0,

Φκ(x) ≤ min

w∈RΦ(w, x, ε) ≤ Φκ(x) + m(ln 2 + 1)ε; (10) (iii) For any ε > 0, Φ(w, x, ε) is continuously differentiable and strictly convex.

Proof (i) For any t ∈ R, by a simple computation, we have

∂ϕ(t, ε)

∂ε = ln£

1 + exp(t/ε)¤

− exp(t/ε) 1 + exp(t/ε) · t

ε > 0

and ∂ϕ(t, ε)

∂t = exp(t/ε) 1 + exp(t/ε) > 0.

The two inequalities imply that for any w ∈ R, x ∈ Rⁿ, and ε1, ε2 satisfying 0 < ε1< ε2, Φ(w, x, ε1) < κw +

Xm i=1

ϕ¡

fi(x, ε2) − w, ε1

¢< κw + Xm i=1

ϕ¡

fi(x, ε2) − w, ε2

¢= Φ(w, x, ε2).

(ii) For any t ∈ R and ε > 0, it is easy to verify that

max{0, t} ≤ ϕ(t, ε) ≤ max{0, t} + ε ln 2.

This implies that for any w ∈ R, x ∈ Rⁿ and ε > 0,

max{0, fi(x, ε) − w} ≤ ϕ(fi(x, ε) − w, ε) ≤ max{0, fi(x, ε) − w} + ε ln 2, i = 1, · · · , m;

whereas

max{fi(x) − w, 0} ≤ max{fi(x, ε) − w, 0} ≤ max{fi(x) − w, 0} + ε, i = 1, 2, · · · , m.

The two sides imply that κw +

Xm i=1

max{0, fi(x) − w} ≤ Φ(w, x, ε) ≤ κw + Xm i=1

max{0, fi(x) − w} + m(ln 2 + 1)ε.

From the inequality and (4), the conclusion immediately follows.

(iii) For any ε > 0, it is clear that Φ(w, x, ε) is continuously differentiable. Now we prove that it is strictly convex. From (7),

∇Φ(w, x, ε) =







κ −P_m

i=1λi(w, x, ε) Xm

i=1

λi(w, x, ε)ωi(x − ai) pkx − aik²+ ε²





 , (11)

where

λi(w, x, ε) = exp£

(fi(x, ε) − w)/ε¤ 1 + exp£

(fi(x, ε) − w)/ε¤ , i = 1, 2, · · · , m. (12)

(5)

Let ˆλ_i(w, x, ε) = λ_i(w, x, ε) − λ²_i(w, x, ε), i = 1, 2, · · · , m,

and Q =

Xm i=1

h λi(w, x, ε)ωi

pkx − aik²+ ε²

³

In−(x − ai)(x − ai)^T kx − a_ik²+ ε²

´

+ ˆλi(w, x, ε)ω_i² ε(kx − a_ik²+ ε²)

¡x − ai

¢¡x − ai

¢_Ti . Then, it follows from (11) that

∇²Φ(w, x, ε) =







Xm i=1

λˆi(w, x, ε)

ε −

Xm i=1

λˆi(w, x, ε)ωi(x − ai)^T εp

kx − aik²+ ε²

− Xm i=1

ˆλi(w, x, ε)ωi(x − ai)^T εp

kx − aik²+ ε² Q





 .

For any ε > 0 and z = (z0; z) ∈ Rⁿ⁺¹with z 6= 0, we have z^T∇²Φ(w, x, ε)z = ε⁻¹z²₀

Xm i=1

ˆλi(w, x, ε) − 2ε⁻¹z0

Xm i=1

λˆi(w, x, ε)

pkx − aik²+ ε²ωi(x − ai)^Tz + z^TQz

= ε⁻¹ Xm i=1

ˆλi(w, x, ε) h

z₀²− 2z0 ωi(x − ai)^Tz

pkx − aik²+ ε² +³ ω_i(x − ai)^Tz pkx − aik²+ ε²

´₂i

+ Xm i=1

λi(w, x, ε)ωi

³ kzk²−

³ (x − ai)^Tz pkx − aik²+ ε²

´₂´

≥ Xm i=1

λi(w, x, ε)ωi

pkx − aik²+ ε² h

kzk²−³ (x − ai)^Tz pkx − aik²+ ε²

´₂i

≥ Xm i=1

λi(w, x, ε)ωi

³

kzk²− kzk²·

°°

° x − ai

°°

°²

´

> 0

and moreover z^T∇²Φ(w, x, ε)z = 0 if and only if z = 0. The first inequality is due to the nonnegativity of λˆi(w, x, ε), the second follows from the Cauchy-Schwartz inequality and the nonnegativity of λi(w, x, ε), and the last is obvious. Here, the proof of lemma is completed. ut

Next we give a different view into the smooth approximation (8) to close the content of this section.

Let θ(t) = t ln t + (1 − t) ln(1 − t). Introduce the binary entropy functionP_m

i=1θ(λi) as a regularizing term to the objective of linear programming problem in (2) and construct the regularization problem

max nX^m

i=1

λifi(x, ε) − ε Xm i=1

θ(λi) : Xm i=1

λi = κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o

. (13)

Clearly, (13) is a strictly convex programming problem and satisfies the Slater constraint qualification.

Consequently, from the duality theory of convex program, the optimal value of (13) is equivalent to that of its dual problem. However, by a simple computation, (13) has the following dual problem

minw∈RΦ(w, x, ε).

Compared with the previous equation (8), this means that the smooth approximation in (8) is actually equivalent to the following binary entropy regularization problem

x∈Rminⁿmax nX^m

i=1

λifi(x, ε) − ε Xm

i=1

θ(λi) : Xm i=1

λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o

.

(6)

We note that Shi [26] constructed a similar regularization problem to derive a smoothing function for the sum of the κ largest components, but he did not provide an explicit expression for his smoothing function so that the function value must be determined by numerical computations.

3 Algorithm based on the neural networks smoothing function

In what follows, we present an algorithm for solving problem (1) based on the smooth approximation (8), followed by a global convergence result.

Algorithm 1

Let σ ∈ (0, 1), ( ˆw⁰, ˆx⁰) ∈ Rⁿ⁺¹ and ε0> 0 be given. Set k := 0.

For k = 0, 1, 2, · · · , do

1. Use an unconstrained minimization method to solve

w∈R,x∈Rmin ⁿΦ(w, x, εk), (14)

and write its minimizer as (w^k, x^k).

2. Set εk+1:= σεk and k := k + 1, and then go back to Setp 1.

End

Lemma 2 Let x be any point in Rⁿ and define the index set I(x) :=©

i ∈ {1, 2, · · · , m} | fi(x) ≥ f_[κ](x)ª

. (15)

Then

∂Φκ(x) =n P

i∈I(x)ρi∂fi(x) : P

i∈I(x)ρi= κ, 0 ≤ ρi≤ 1 for i ∈ I(x), ρi= 0 for i /∈ I(x) o

. (16) Proof Let ϑκ(y) =P_κ

l=1y[l], where y[1], · · · , y[m] are the numbers y1, · · · , ym sorted in nonincreasing order. Then, it follows from (2) that

ϑ_κ(y) = maxnX^m

i=1

λ_iy_i: Xm i=1

λ_i= κ, 0 ≤ λ_i ≤ 1, i = 1, 2, · · · , mo

= δ^∗(y| C)

where δ^∗(· | C) denotes the support function of the set C and C =©

λ ∈ R^m| P_m

i=1λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , mª . Therefore, by [24, Corollary 23.5.3],

∂ϑκ(y) = argmax nX^m

i=1

λiyi: Xm i=1

λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o

.

It is easily shown that the set on the right side of the last equality is exactly n P

j∈J(y)ρ_je_j : P

j∈J(y)ρ_j= κ, 0 ≤ ρ_j ≤ 1 for j ∈ J(y), ρ_j = 0 for j /∈ J(y)o , where {e1, e2, · · · , em} denote the canonical basis of R^mand

J(y) =©

j ∈ {1, 2, · · · , m}| yj≥ y[κ]

ª. Thus, from [16, Theorem 4.3.1], we immediately obtain (16). ut

(7)

Lemma 3 Let {(w^k, x^k)} be the sequence generated by Algorithm 1. Then, any limit points of {x^k} are optimal solutions to problem (1).

Proof Let (w^∗, x^∗) be a limit point of the sequence {(w^k, x^k)}. Without loss of generality, we assume that {(w^k, x^k)} → (w^∗, x^∗) when k tends to +∞. We now prove that x^∗ is an optimal solution to problem (1). First, from (10) and the fact that (w^k, x^k) is a solution of (14), it follows that

Φκ(x^k) ≤ Φ(w^k, x^k, εk) ≤ Φκ(x^k) + m(ln 2 + 1)εk. (17) By the continuity of fi(x), we thus have

Φκ(x^∗) = lim

k→+∞Φ(w^k, x^k, εk) = κw^∗+ Xm i=1

max©

0, fi(x^∗) − w^∗ª ,

which can be rewritten as

Φκ(x^∗) = κw^∗+ X

i∈I(x^∗)

max©

0, fi(x^∗) − w^∗ª

+ X

i /∈I(x^∗)

max©

0, fi(x^∗) − w^∗ª .

Here, I(x^∗) is defined by (15). The last equation implies that

fi(x^∗) − w^∗≤ 0 for all i /∈ I(x^∗). (18) In addition, from the fact that (w^k, x^k) is a solution of (14), we have

∇Φ(w^k, x^k, εk) =







κ −P_m

i=1λ^k_i Xm

i=1

λ^k_iωi(x^k− ai) pkx − aik²+ ε²





 = 0 (19)

where λ^k_i = λi(w^k, x^k, εk) for i = 1, 2, · · · , m. Combining with the definition of λi(w^k, x^k, εk) in (12), it is clear thatP_m

i=1λ^k_i = κ and 0 < λ^k_i < 1 for i = 1, 2, · · · , m. This means that the sequence {λ^k_i} for every i = 1, 2, · · · , m has a convergent subsequence. Without loss of generality, we suppose that

k→+∞lim λ^k_i = λ^∗_i, i = 1, 2, · · · , m.

Then, P_m

i=1λ^∗_i = κ, 0 ≤ λ^∗_i ≤ 1 for i = 1, 2, · · · , m. (20) Next we prove λ^∗_i = 0 for i /∈ I(x^∗). By (18), we consider the following two cases to prove it.

Case 1: there exits an index i0∈ I(x/ ^∗) such that fi0(x^∗) − w^∗= 0. In this case, there must hold f[1](x^∗) − w^∗> 0, f[2](x^∗) − w^∗> 0, · · · , f[κ](x^∗) − w^∗> 0.

By the continuity of fi(x), we have for all sufficiently large k

f_[1](x^k, εk) − w^k > 0, f_[2](x^k, εk) − w^k > 0, · · · , f_[κ](x^k, εk) − w^k > 0.

This implies that

k→+∞lim exp£

(f_[l](x^k, εk) − w^k)/εk

¤ 1 + exp£

(f_[l](x^k, εk) − w^k)/εk

¤ = 1, l = 1, 2, · · · , κ.

Thus, from (17) and the definition of λ^k_i, we have X

i /∈I(x^∗)

λ^∗_i = κ − lim

k→+∞

X

i∈I(x^∗)

exp£

(fi(x^k, εk) − w^k)/εk

¤ 1 + exp£

(fi(x^k, εk) − w^k)/εk

¤ = 0.

Consequently, λ^∗_i = 0 for i /∈ I(x^∗).

(8)

Case 2: f_i(x^∗) − w^∗< 0 for all i /∈ I(x^∗). Now, when k is sufficiently large, fi(x^k, εk) − w^k < 0 for all i /∈ I(x^∗)

due to the continuity of fi(x). Thus, from the definition of λ^k_i, we readily have λ^∗_i = 0 for i /∈ I(x^∗).

Thus, combing with the equation (20), we have P

i∈I(x^∗)λ^∗_i = κ, 0 ≤ λ^∗_i ≤ 1 for i ∈ I(x^∗), and λ^∗_i = 0 for i /∈ I(x^∗). (21) Note that for i ∈ I(x^∗),

ν_i^∗= lim

k→+∞

ωi(x^k− ai)

pkx^k− aik²+ ε²_k ∈ ∂f_i(x^∗).

Therefore, it follows from the equations (19) and (21) that X

i∈I(x^∗)

λ^∗_iν_i^∗= lim

k→+∞

Xm i=1

λ^k_iωi(x^k− ai) pkx − aik²+ ε² = 0.

Compared with (16), this indicates that 0 ∈ ∂Φκ(x^∗), and accordingly x^∗ is an optimal solution to problem (1). Here, we complete the proof of lemma. ut

Theorem 1 Let {x^k} be the sequence generated by Algorithm 1. If x^∗ is the unique optimal solution of problem (1), then we have limk→+∞x^k = x^∗.

Proof For any k ≥ 1, by Lemma 1,

Φ(w¹, x¹, ε1) > Φ(w¹, x¹, εk) ≥ Φ(w^k, x^k, εk) ≥ Φκ(x^k). (22) Consider that Φκ(x) is coercive, and so the level set L =©

x ∈ Rⁿ| Φκ(x) ≤ Φ(w¹, x¹, ε1)ª

is bounded.

However, from (22), we have {x^k} ⊆ L. This shows that the sequence {x^k} is bounded. Since x^∗ is the unique solution of problem (1), we have from Lemma 2 that limk→+∞x^k= x^∗. ut

In practice, we would stop Algorithm 1 once some stopping criteria are satisfied. Moreover, we will use a first-order, or gradient-based, unconstrained minimization algorithm to solve the problem (14) in Step 1. In what follows, we describe a more practical version of Algorithm 1 by choosing a limited-memory BFGS algorithm to solve the problem (14) in Step 1 because this algorithm can solve very large unconstrained optimization problem efficiently.

Algorithm 2

Let σ ∈ (0, 1), τ1, τ2> 0, ( ˆw⁰, ˆx⁰) ∈ Rⁿ⁺¹and ε0> 0 be given. Set k := 0.

For k = 0, 1, 2, · · · , until εk≤ τ1, do

1. Use a version of the limited-memory BFGS algorithm to solve (14) approximately, and obtain an (w^k, x^k) such that k∇Φ(w^k, x^k, ε_k)k ≤ τ₂.

2. Set ε_k+1= σε_k and k := k + 1, and then go back to Setp 1.

End

(9)

4 Second-order cone program reformulation

In this section, we will reformulate the single-facility κ-centrum problem (1) as a standard primal second-order cone program. First, from the discussions in the second paragraph of section 2, we know that problem (1) is equivalent to the optimization problem

w,ν,ηmin κw + Xm i=1

ηi

s.t. kωi(ν − ai)k ≤ ηi+ w, i = 1, 2, · · · , m, ηi≥ 0

(23)

where ν = (ν1, · · · , νn)^T ∈ Rⁿ. This problem can be rewritten as min (κ − m)w +

Xm i=1

ωiti

s.t. q¡

ν1− a11

¢₂ +¡

ν2− a12

¢₂

+ · · · +¡

νn− a1n

¢₂

≤ t1, q¡ν1− a21

¢₂ +¡

ν2− a22

¢₂

+ · · · +¡

νn− a2n

¢₂

≤ t2,

... ...

q¡ν1− am1

¢₂ +¡

ν2− am2

¢₂

+ · · · +¡

νn− amn

¢₂

≤ tm, ωiti− w ≥ 0, i = 1, 2, · · · , m.

(24)

Let ωiti− w = υi, i = 1, 2, · · · , m;

νj− aij= uij, i = 1, 2, · · · , m, j = 1, 2, · · · , n. (25) Then the constraints of problem (24) become





ω1t1− υ1= ω2t2− υ2= · · · = ωmtm− υm= w, υi ≥ 0, i = 1, 2, · · · , m,











u²₁₁ + u²₁₂+ · · · + u²_1n ≤ t²₁, t1≥ 0, u²₂₁ + u²₂₂+ · · · + u²_2n ≤ t²₂, t2≥ 0,

... ... ... u²_m1+ u²_m2+ · · · + u²_mn≤ t²_m, tm≥ 0,

and 









u11+ a11= u21+ a21= · · · = um1+ am1, u12+ a12= u22+ a22= · · · = um2+ am2,

... ... ... ...

u1n+ a1n = u2n+ a2n = · · · = umn+ amn.

(10)

Let

zi = (ti, ui1, ui2, · · · , uin) ∈ Rⁿ⁺¹, ci=¡ κ

mωi; 0n; 1−κ m

¢∈ Rⁿ⁺², i = 1, 2, · · · , m (26)

and define the second-order cone Kⁿ⁺¹=n

(ξ1; ξ2) | ξ1∈ R+, ξ2∈ Rⁿ, and kξ2k ≤ ξ1

o . Then,

xi:= (zi; υi) ∈ Kⁿ⁺¹× R+, i = 1, 2, · · · , m and furthermore, it follows from (25) that

c^T₁x1+ c^T₂x2+ · · · + c^T_mxm= (k − m)w + Xm i=1

ωiti.

Therefore, the problem in (24) turns into the form of minP_m

i=1c^T_ixi

s.t.











(ω1t1− ω2t2) − (υ1− υ2) = 0, (ω1t1− ω3t3) − (υ1− υ3) = 0,

...

(ω1t1− ωmtm) − (υ1− υm) = 0, u11− u21= a21− a11, u11− u31= a31− a11,

...

u11− um1= am1− a11, u12− u22= a22− a12, u12− u32= a32− a12,

...

u12− um2= am2− a12, ...

u1n− u2n= a2n− a1n, u1n− u3n= a3n− a1n,

...

u_1n− u_mn= a_mn− a_1n,

(x1; x2; · · · ; xm) ∈ (Kⁿ⁺¹× R+) × · · · × (Kⁿ⁺¹× R+).

(27)

Let N = m(n + 2), N1= (m − 1)(n + 1), K = (Kⁿ⁺¹× R+) × · · · × (Kⁿ⁺¹× R+),

x = (x1; · · · ; xm) ∈ R^N, c = (c1; · · · ; cm) ∈ R^N, b = (0m−1; b1; · · · ; bn) ∈ R^N¹,

(28)

where

bj = (a2j− a1j, a3j− a1j, · · · , amj− a1j)^T ∈ R^m−1 for j = 1, 2, · · · , n.

Then problem (27) can be recast into a standard primal SOCP:

min c^Tx

s.t. Ax = b, x ∈ K, (29)

(11)

where

A =







B11 B12 · · · B1m

B21 B22 · · · B2m

... ... · · · ...

B_(m−1)1 B_(m−1)2 · · · B_(m−1)m

· · · ·

A11 A12 · · · A1m

A21 A22 · · · A2m

... ... · · · ...

A_(m−1)1 A_(m−1)2 · · · A_(m−1)m

· · · ·

... ... · · · ...

· · · · A(n(m−1)−m+2)1 A(n(m−1)−m+2)2 · · · A(n(m−1)−m+2)m

A(n(m−1)−m+1)1 A(n(m−1)−m+1)2 · · · A(n(m−1)−m+1)m

... ... · · · ...

A_(n(m−1))1 A_(n(m−1))2 · · · A_(n(m−1))m







∈ R^N¹^×N (30)

in which the element Bkl is a row vector in Rⁿ⁺²defined by

Bkl=





(ω1, 0, · · · , 0, −1), if l = 1, (−ωl, 0, · · · , 0, 1), if l = k + 1,

0n+2, otherwise,

and the element Aklis a row vector in Rⁿ⁺²with the (dk/me + 2)th element being 1 and the others 0 if l = 1, otherwise if l = k + 1 the (dk/me + 2)th element being −1 and the others 0, and otherwise it is a zero vector in Rⁿ⁺². Here, dk/me is the largest positive integer not over than k/m.

It is not difficult to verify that A defined in (30) has a full row rank N1. We here want to point out that the similar reformulation techniques developed as above were also used by [7,17] for facility location problems in R², where the resulting SOCP problems are solved by a merit function approach and an interior point method, respectively. In the next section, we will develop an algorithm for problem (1) by following the same line as [7].

5 Algorithm based on the SOCP reformulation

In this section, we use the Fischer-Burmeister merit function developed by [8] for SOCCPs to transform the KKT system of the SOCP (29) into an equivalent unconstrained smooth minimization problem, and then based on this minimization problem, present an algorithm for solving problem (1).

The KKT optimality conditions of (29), which are sufficient but not necessary for optimality, are hx, yi = 0, x ∈ K, y ∈ K,

Ax = b, y = c − A^Tζd for some ζd∈ R^N¹,

(31)

where y = (y1; · · · ; ym) and yi = (si; wi) with si ∈ Rⁿ⁺¹ and wi ∈ R+ for i = 1, 2, · · · , m. Choose any d ∈ R^N satisfying Ad = b. (If no such d exists, then (29) has no feasible solution). Let AN ∈ R^{N ×(N −N}¹⁾be any matrix whose columns span the null space of A. Then x satisfies Ax = b if and only if x = d + ANζpfor some ζp∈ R^{N −N}¹. Thus, the KKT optimality conditions in (31) can be rewritten as the following SOCCP:

hx, yi = 0, x ∈ K, y ∈ K,

x = F (ζ), y = G(ζ), (32)

where

ζ = (ζ_p; ζ_d), F (ζ) := d + A_Nζ_p, G(ζ) := c − A^Tζ_d. (33)

(12)

Alternatively, consider that any ζ ∈ R^N can be decomposed into the sum of its orthogonal projection on the column space of A^T and the null space of A, so the following form can be used in place of (33) F (ζ) := d + (I − A^T(AA^T)⁻¹A)ζ, G(ζ) := c − A^T(AA^T)⁻¹Aζ. (34) In the sequel, unless otherwise stated, F (ζ) and G(ζ) are both defined by (34).

In [8], Chen extended the merit function, the squared norm of the Fischer-Burmeister function for the nonlinear complementarity problems, to SOCCPs, and then developed a unconstrained optimiza- tion technique for SOCCPs by use of the merit function. For any u = (u1, u2), v = (v1, v2) ∈ R × Rⁿ, define their Jordan product associated with Kⁿ⁺¹ by

u ◦ v :=¡

hu, vi, v1u2+ u1v2

¢.

The identity element under this product is (1, 0, · · · , 0) ∈ Rⁿ⁺¹. Write u² to mean u ◦ u and u + v to mean the usual componentwise addition of vectors. It is well known that u²∈ Kⁿ⁺¹for all u ∈ Rⁿ⁺¹. Moreover, if u ∈ Kⁿ⁺¹, there is a unique vector in Kⁿ⁺¹, denoted by u^1/2, such that (u^1/2)²= u^1/2◦u^1/2. Then,

φ(u, v) :=¡

u²+ v²¢_1/2

− u − v (35)

is well-defined for all (u, v) ∈ Rⁿ⁺¹× Rⁿ⁺¹and maps Rⁿ⁺¹× Rⁿ⁺¹to Rⁿ⁺¹. It was shown in [15] that φ(u, v) = 0 ⇐⇒ hu, vi = 0, u ∈ Kⁿ⁺¹, v ∈ Kⁿ⁺¹.

Note that K¹coincides with the set of nonnegative reals R+, and in that case φ(u, v) in (35) reduces to the Fischer-Burmeister NCP-function [12,13]. Thus,

ψ(x, y) :=1 2

Xm i=1

³

kφ(zi, si)k²+ φ²(υi, wi)

´

(36) is a merit function for the SOCCP (32), where zi, si ∈ Rⁿ⁺¹and υi, wi∈ R+. Consequently, (32) can rewritten as an equivalent global minimization problem as below:

ζ∈Rmin^NΨ(ζ) = ψ¡

F (ζ), G(ζ)¢

. (37)

Moreover, we know from [8] that ψ is smooth and every stationary point of (32) solves the SOCP (29).

Now we establish an algorithm for the problem (1) by solving the unconstrained smooth minimization problem (32) with a limited-memory BFGS method.

Algorithm 3

Let τ1, τ2, τ3> 0 and ζ⁰∈ R^N be given. Generate a steepest descent direction ∆⁰= −∇Ψ(ζ⁰) and seek a suitable stepsize α0. Let

ζ¹:= ζ⁰+ α0∆⁰, x¹:= F (ζ¹), y¹:= G(ζ¹).

For k = 1, 2, · · · , until Ψ(ζ^k) ≤ τ1 and |(x^k)^Ty^k| ≤ τ2, do If¡

∇Ψ(ζ^k) − ∇Ψ(ζ^k−1)¢_T

(ζ^k− ζ^k−1) ≤ τ3kζ^k− ζ^k−1k · k∇Ψ(ζ^k) − ∇Ψ(ζ^k−1)k, then

∆^k = −∇Ψ(ζ^k);

else

∆^k is generated by the limited-memory BFGS method.

End

Set ζ^k:= ζ^k+ αk∆^k, x^k:= F (ζ^k), y^k:= G(ζ^k), where αk is a stepsize. Let k := k + 1.

End

Remark 1 Suppose that ζ^∗ is the final iteration generated by Algorithm 3 and x^∗ = F (ζ^∗). Then, x^∗(2 : n + 1) + a1 is the optimal solution of problem (1) and the optimal value is c^Tx^∗.

(13)

6 Numerical experiments

We implemented Algorithm 2 in section 3 and Algorithm 3 in section 5 with our codes and compared them with SeduMi 1.05 [27] (A high quality sofware packages with Matlab interface for solving SOCP and semidefinite programming problems). Our computer codes are all written in Matlab, including the evaluation of ψ(x, y) and ∇ψ(x, y) in Algorithm 3. All the numerical experiments were done at a PC with CPU of 2.8 GHz and RAM of 512 MB. To solve the unconstrained minimization problem (14) in Algorithm 2 and generate the direction ∆^k and the suitable stepsize in Algorithm 3, we choose a limited-memory BFGS algorithm with Armijo line-search and 5 limited-memory vector-updates [5], where for the scaling matrix H⁰= γIN we use the recommended choice of γ = p^Tq/q^Tq [18, P. 226], where p := ζ − ζ^old and q := ∇Ψ(ζ) − ∇Ψ(ζ^old). This choice is found to work better than the choice used by [8] for our problems. To evaluate F and G in Algorithm 3, we use LU factorization of AA^T. In particular, given such a factorization LU = AA^T, we can compute x = F (ζ) and s = G(ζ) for each ζ via two matrix-vector multiplications and two forward/backword solves:

Lu = Aζ, U v = ξ, w = A^Tv, x = d + ζ − w, s = c − w. (38) For the vector d satisfying Ad = b, we compute it as a solution of mindkAd − bk using Matlab’s linear least square solver “lsqlin”.

Throughout the computational experiments, we use the following parameters in Algorithm 2:

ε0= 1, σ = 0.1, τ1= 1.0e − 6, τ2= 1.0e − 3.

For Algorithm 3, we use the following parameters

τ1= 1.0e − 8, τ2= 1.0e − 5, τ3= 1.0e − 4.

For SeDuMi, we use all the default values except that the parameter eps is set to be 1.0e − 6. The starting points for Algorithms 2 and 3 are chosen to be x⁰= 0n and ζ⁰= 0N, respectively.

The test problems are generated randomly by the following pseudo-random sequence:

ψ0= 7, ψi+1= (445ψi+ 1) mod 4096, i = 1, 2, · · · , ψ¯i= ψi/40.96. i = 1, 2, · · · ,

The elements of ai for i = 1, 2, · · · , m are successively set to ¯ψ1, ¯ψ2, · · · , in the order:

a1(1), a1(2), · · · , a1(n), a2(1), a2(2), · · · , a2(n), · · · , am(1), am(2), · · · , am(n), and the weight ωi is set to be ai(1)/10 if mod(i, 10) = 0, and otherwise ωi is set to be 1.

The numerical results are summarized in Tables 1-3. In these tables, n and m specify the problem dimensions, Obj. denotes the objective value of (1) at the final iteration, Iter indicates the iteration number and Time represents the CPU time in seconds for solving each problem, and for Algorithm 3, it also includes the time to make the LU decomposition of AA^T and find the feasible point d.

The results in Table 1 show how the iteration number of Algorithms 2-3 and SeduMi varies with κ for the problems of the same dimension (m = 50 and n = 1000). We can see from this table that the value of κ has a visible influence on their iteration number, and when κ closes to m, they will decrease, but when κ closes to 1, they will increase, and especially that of Algorithms 2 increase greatly.

The results listed in Tables 1-2 show that the algorithms presented in this paper perform very well and they are able to obtain good accuracy for all test problems. Particularly, Algorithm 2 consistently uses less CPU time than Algorithm 3 and SeDuMi. For the moderate problems where n and m are in the order of hundreds, Algorithm 3 is comparable with SeDuMi, and moreover, we can see from Table 2 that the CPU time of Algorithm 3 increases slower than that of SeDuMi with the dimension of problem increasing. In addition, we should point out that the evaluations of ψ(x, y) and ∇ψ(x, y) coded in Matlab increased the CPU time of Algorithm 3 greatly since it is a main part of the algorithm.

The results in Table 3 show that Algorithm 2 can solve very large problems in a reasonable amount of CPU time. However, Algorithm 3 and SeDuMi failed for these problems due to excessive CPU time or memory requirement.

We conclude that Algorithm 2 is better than Algorithm 3 and SeDuMi for very large problems since it consumes less memory. For moderate problems, Algorithm 3 and SeDuMi are comparable.