(will be inserted by the editor)
Shaohua Pan and Jein-Shan Chen
Two unconstrained optimization approaches for the Euclidean κ-centrum location problem
Abstract Consider the single-facility Euclidean κ-centrum location problem in Rn. This problem is a generalization of the classical Euclidean 1-median problem and 1-center problem. In this paper, we develop two efficient algorithms that are particularly suitable for problems where n is large by using unconstrained optimization techniques. The first algorithm is based on the neural networks smooth approximation for the plus function and reduces the problem to an unconstrained smooth convex min- imization problem. The second algorithm is based on the Fischer-Burmeister merit function for the second-order cone complementarity problem and transforms the KKT system of the second-order cone programming reformulation for the problem into an unconstrained smooth minimization problem. Our computational experiments indicate that both methods are extremely efficient for large problems and the first algorithm is able to solve problems of dimension n up to 10,000 efficiently.
Keywords The Euclidean κ-centrum problem, smoothing function, merit function, second-order cone programming
Mathematics Subject Classification (2000) 90C30
1 Introduction
Given a positive integer number κ in the interval [1, m], the single-facility Euclidean κ-centrum problem in Rn concerns locating a new facility so as to minimize the sum of the κ largest weighted Euclidean distances to the existing m facilities. Let ai ∈ Rn represent the position of the ith existing facility and x ∈ Rn denote the unknown position of the new facility. Let
fi(x) = ωi
q¡x1− ai1
¢2 +¡
x2− ai2
¢2
+ · · · +¡
xn− ain
¢2
, i = 1, 2, · · · , m
be the weighted Euclidean distance between the new facility and the ith existing facility, where ωi> 0 is the associated weight. Then the problem can be formulated as the following minimization problem
x∈RminnΦκ(x) :=
Xκ l=1
f[l](x) (1)
where f[1](x), f[2](x), · · · , f[m](x) are the functions obtained by sorting f1(x), f2(x), · · · , fm(x) in non- increasing order, namely for any x ∈ Rn,
f[1](x) ≥ f[2](x) ≥ · · · ≥ f[κ](x) ≥ · · · ≥ f[m](x).
Shaohua Pan: School of Mathematical Sciences, South China University of Technology, Guang zhou, 510640, China. E-mail: [email protected].
Jein-Shan Chen: Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan. E- mail: [email protected].
There are many different kinds of facility location problems and for which there have been proposed various methods; see [9,10,14] and the related literature in the web-site maintained by EWGLA (Euro Working Group on Locational Analysis). The single-facility Euclidean κ-centrum problem studied here is to generalize the classical Euclidean 1-median problem (corresponding to the case κ = m) and 1- center problem (corresponding to the case κ = 1) from a view of solution concept. To our knowledge, the κ-centrum concept was first defined by Slater [25] and Andreatta and Mason [3,4] for the discrete single-facility location problem, and later in [23] was extended to general location problems covering discrete and continuous decisions. In addition, Tamir et al. [22,28] recently did some excellent works related to the κ-centrum location problem, especially the rectilinear κ-centrum problem.
The current research for Euclidean facility location problems mainly focuses on the 1-median prob- lem, for which many practical and efficient algorithms have been designed since Weiszfeld presented a simple iterative algorithm in 1937. These include the hyperboloid approximation procedure [11], the interior point algorithms [1,29,30], the smoothing Newton methods [19,20] and the merit function approach [7]. However, the solution methods for the κ-centrum location problem are rarely seen in the literature except [2,21], where the problem of minimizing the κ largest Euclidean norms is only mentioned as a special example of a second-order cone programming and consequently can be solved by an interior point method. The main purpose of this paper is to develop two efficient algorithms for the single-facility Euclidean κ-centrum problem in Rn, which can be used to handle the cases where n is large (say, in the order of thousands).
Note that problem (1) is a nonsmooth convex optimization problem. Due to the nondifferentiability of the objective function, the gradient-based algorithms can not be used to solve the problem directly.
To overcome this difficulty, we reduce (1) as a nonsmooth problem only involving the plus function max{0, t}, and then utilize the neural network smoothing function [6] to give a convex approximation.
Based on the approximation problem, we propose a globally convergent quasi-Newton algorithm. In addition, we reformulate (1) as a standard primal second-order cone programming (SOCP) problem to circumvent its nondifferentiability. This SOCP reformulation is completely different from the ones in [2,21], and particularly we here use a merit function approach rather than an interior point method to solve it. More specifically, we convert its Karush-Kuhn-Tucker (KKT) system into an equivalent uncon- strained smooth minimization problem by the Fischer-Burmeister merit function [8] for second-order cone complementarity problems (SOCCPs), and then solve this smooth optimization problem with a limited-memory BFGS method. In contrast to interior point methods, our unconstrained optimization methods do not require an interior point starting, and moreover have the advantages of requiring less work per iteration, thereby rendering themselves to large problems.
The rest of this paper proceeds as follows. In section 2, we derive a smooth approximation to (1).
Based on the smooth approximation, we design a globally convergent quasi-Newton algorithm in sec- tion 3. In section 4, we present the detailed process of reformulating (1) as a standard primal SOCP problem. Based on its KKT system and the Fischer-Burmeister merit function for SOCCPs, we develop another quasi-Newton algorithm in section 5. In section 6, we report our preliminary computational results and compare our algorithms with the SeDuMi 1.05 (a primal-dual interior point algorithm for the SOCP and the semidefinite programming). The results show that our first algorithm is the most effective by CPU time and able to solve problems of dimension n up to 10, 000, whereas our second algorithm is comparable with even superior to the SeduMi for the moderate problems (say, in the order of hundreds). Finally, we conclude this paper in section 7.
In this paper, unless otherwise stated, all vectors are column vectors. We use Id to denote the d × d identity matrix and 0d to denote a zero vector in Rd. To represent a large matrix with several small matrices, we use semicolons “;” for column concatenation and commas “,” for row concatenation. This notation also applies to vectors. For a convex function f : Rn→ R, we let ∂f (x) denote the subdiffer- ential of f at x. Note that f is minimized at x over Rn if and only if 0 ∈ ∂f (x). For the calculus rules on subdifferentials of convex functions, please refer to [16] or [24].
2 Smooth approximation
In this section, we reduce problem (1) to a nonsmooth optimization problem only involving the plus function max{0, t}, and then give a smooth approximation via the neural networks smoothing function proposed by [6]. We also demonstrate that the smooth approximation can be generated by regularizing
problem (1) with a binary entropy function.
First, it is not hard to verify that the function Φκ(x) in (1) can be expressed by
Φκ(x) = max nXm
i=1
λifi(x) : Xm i=1
λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o
. (2)
Since, on the one hand, for any feasible point λ of the maximization problem in (2), there always holds Pm
i=1λifi(x) = λi1f[1](x) + λi2f[2](x) + · · · + λimf[m](x)
≤ λi1f[1](x) + · · · + λiκf[κ](x) + λi,κ+1f[κ](x) + · · · + λimf[κ](x)
= λi1f[1](x) + · · · + λiκf[κ](x) + (κ − λi1− · · · − λiκ)f[κ](x)
≤ Φκ(x);
on the other hand, there exists an feasible pointPm
i=1˜λifi(x) = Φκ(x) such thatPm
i=1λ˜ifi(x) = Φκ(x), where ˜λi= 1 if fi(x) belongs to the κ largest function in the collection {fi(x)}mi=1, and otherwise ˜λi= 0.
We note that the linear programming problem in (2) has the following dual problem minw,η κw +Pm
i=1ηi
s.t. ηi≥ fi(x) − w, i = 1, 2, · · · , m, ηi≥ 0, i = 1, 2, · · · , m
(3)
where η = (η1, · · · , ηm)T and w and ηi are the Lagrange multipliers associated with the constraints Pm
i=1λi= κ and λi≤ 1, respectively. Moreover, they both have nonempty feasible sets for any x ∈ Rn. Therefore, from the duality theory of linear programming, Φκ(x) can also be represented by the dual problem (3). However, we observe that each pair of constraints ηi ≥ fi(x) − w and ηi≥ 0 in (3) can be replaced with one constraint ηi≥ max{fi(x) − w, 0}, whereas the objective function of (3) is increasing componentwise in η. Therefore,
Φκ(x) = min
w∈R
n κw +
Xm i=1
max{0, fi(x) − w}o
, (4)
Thus, problem (1) is simplified as a nonsmooth problem only involving the plus function:
w∈R,x∈Rmin n n
κw + Xm i=1
max{0, fi(x) − w}
o
. (5)
To the best of our knowledge, the equivalent formulation (5) has not been found in the literature though the derivation procedure above is very simple.
In [6], Chen and Mangasarian presented a class of smooth approximations to the plus function max{0, t}. Among these smooth approximations, the neural networks smooth function and the Chen- Harker-Kanzow-Smale smooth function are most commonly used. In this paper we will use the neural networks smoothing function, which is defined by
ϕ(t, ε) = ε ln£
1 + exp(t/ε)¤
(ε > 0). (6)
By ϕ(t, ε), we define the function
Φ(w, x, ε) := κw + Xm i=1
ϕ¡
fi(x, ε) − w, ε¢
(7)
where
fi(x, ε) = ωip
kx − aik2+ ε2, i = 1, 2, · · · , m.
Then, from Lemma 1 below, problem (1) can be converted into solving the following unconstrained smooth convex minimization problem
w∈R,x∈Rmin nΦ(w, x, ε). (8)
Lemma 1 The function Φ(w, x, ε) defined in (7) has the following properties:
(i) For any w ∈ R, x ∈ Rn, and ε1, ε2 satisfying 0 < ε1< ε2, we have
Φ(w, x, ε1) < Φ(w, x, ε2); (9)
(ii) For any x ∈ Rn and ε > 0,
Φκ(x) ≤ min
w∈RΦ(w, x, ε) ≤ Φκ(x) + m(ln 2 + 1)ε; (10) (iii) For any ε > 0, Φ(w, x, ε) is continuously differentiable and strictly convex.
Proof (i) For any t ∈ R, by a simple computation, we have
∂ϕ(t, ε)
∂ε = ln£
1 + exp(t/ε)¤
− exp(t/ε) 1 + exp(t/ε) · t
ε > 0
and ∂ϕ(t, ε)
∂t = exp(t/ε) 1 + exp(t/ε) > 0.
The two inequalities imply that for any w ∈ R, x ∈ Rn, and ε1, ε2 satisfying 0 < ε1< ε2, Φ(w, x, ε1) < κw +
Xm i=1
ϕ¡
fi(x, ε2) − w, ε1
¢< κw + Xm i=1
ϕ¡
fi(x, ε2) − w, ε2
¢= Φ(w, x, ε2).
(ii) For any t ∈ R and ε > 0, it is easy to verify that
max{0, t} ≤ ϕ(t, ε) ≤ max{0, t} + ε ln 2.
This implies that for any w ∈ R, x ∈ Rn and ε > 0,
max{0, fi(x, ε) − w} ≤ ϕ(fi(x, ε) − w, ε) ≤ max{0, fi(x, ε) − w} + ε ln 2, i = 1, · · · , m;
whereas
max{fi(x) − w, 0} ≤ max{fi(x, ε) − w, 0} ≤ max{fi(x) − w, 0} + ε, i = 1, 2, · · · , m.
The two sides imply that κw +
Xm i=1
max{0, fi(x) − w} ≤ Φ(w, x, ε) ≤ κw + Xm i=1
max{0, fi(x) − w} + m(ln 2 + 1)ε.
From the inequality and (4), the conclusion immediately follows.
(iii) For any ε > 0, it is clear that Φ(w, x, ε) is continuously differentiable. Now we prove that it is strictly convex. From (7),
∇Φ(w, x, ε) =
κ −Pm
i=1λi(w, x, ε) Xm
i=1
λi(w, x, ε)ωi(x − ai) pkx − aik2+ ε2
, (11)
where
λi(w, x, ε) = exp£
(fi(x, ε) − w)/ε¤ 1 + exp£
(fi(x, ε) − w)/ε¤ , i = 1, 2, · · · , m. (12)
Let ˆλi(w, x, ε) = λi(w, x, ε) − λ2i(w, x, ε), i = 1, 2, · · · , m,
and Q =
Xm i=1
h λi(w, x, ε)ωi
pkx − aik2+ ε2
³
In−(x − ai)(x − ai)T kx − aik2+ ε2
´
+ ˆλi(w, x, ε)ωi2 ε(kx − aik2+ ε2)
¡x − ai
¢¡x − ai
¢Ti . Then, it follows from (11) that
∇2Φ(w, x, ε) =
Xm i=1
λˆi(w, x, ε)
ε −
Xm i=1
λˆi(w, x, ε)ωi(x − ai)T εp
kx − aik2+ ε2
− Xm i=1
ˆλi(w, x, ε)ωi(x − ai)T εp
kx − aik2+ ε2 Q
.
For any ε > 0 and z = (z0; z) ∈ Rn+1with z 6= 0, we have zT∇2Φ(w, x, ε)z = ε−1z20
Xm i=1
ˆλi(w, x, ε) − 2ε−1z0
Xm i=1
λˆi(w, x, ε)
pkx − aik2+ ε2ωi(x − ai)Tz + zTQz
= ε−1 Xm i=1
ˆλi(w, x, ε) h
z02− 2z0 ωi(x − ai)Tz
pkx − aik2+ ε2 +³ ωi(x − ai)Tz pkx − aik2+ ε2
´2i
+ Xm i=1
λi(w, x, ε)ωi
pkx − aik2+ ε2
³ kzk2−
³ (x − ai)Tz pkx − aik2+ ε2
´2´
≥ Xm i=1
λi(w, x, ε)ωi
pkx − aik2+ ε2 h
kzk2−³ (x − ai)Tz pkx − aik2+ ε2
´2i
≥ Xm i=1
λi(w, x, ε)ωi
pkx − aik2+ ε2
³
kzk2− kzk2·
°°
° x − ai
pkx − aik2+ ε2
°°
°2
´
> 0
and moreover zT∇2Φ(w, x, ε)z = 0 if and only if z = 0. The first inequality is due to the nonnegativity of λˆi(w, x, ε), the second follows from the Cauchy-Schwartz inequality and the nonnegativity of λi(w, x, ε), and the last is obvious. Here, the proof of lemma is completed. ut
Next we give a different view into the smooth approximation (8) to close the content of this section.
Let θ(t) = t ln t + (1 − t) ln(1 − t). Introduce the binary entropy functionPm
i=1θ(λi) as a regularizing term to the objective of linear programming problem in (2) and construct the regularization problem
max nXm
i=1
λifi(x, ε) − ε Xm i=1
θ(λi) : Xm i=1
λi = κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o
. (13)
Clearly, (13) is a strictly convex programming problem and satisfies the Slater constraint qualification.
Consequently, from the duality theory of convex program, the optimal value of (13) is equivalent to that of its dual problem. However, by a simple computation, (13) has the following dual problem
minw∈RΦ(w, x, ε).
Compared with the previous equation (8), this means that the smooth approximation in (8) is actually equivalent to the following binary entropy regularization problem
x∈Rminnmax nXm
i=1
λifi(x, ε) − ε Xm
i=1
θ(λi) : Xm i=1
λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o
.
We note that Shi [26] constructed a similar regularization problem to derive a smoothing function for the sum of the κ largest components, but he did not provide an explicit expression for his smoothing function so that the function value must be determined by numerical computations.
3 Algorithm based on the neural networks smoothing function
In what follows, we present an algorithm for solving problem (1) based on the smooth approximation (8), followed by a global convergence result.
Algorithm 1
Let σ ∈ (0, 1), ( ˆw0, ˆx0) ∈ Rn+1 and ε0> 0 be given. Set k := 0.
For k = 0, 1, 2, · · · , do
1. Use an unconstrained minimization method to solve
w∈R,x∈Rmin nΦ(w, x, εk), (14)
and write its minimizer as (wk, xk).
2. Set εk+1:= σεk and k := k + 1, and then go back to Setp 1.
End
Lemma 2 Let x be any point in Rn and define the index set I(x) :=©
i ∈ {1, 2, · · · , m} | fi(x) ≥ f[κ](x)ª
. (15)
Then
∂Φκ(x) =n P
i∈I(x)ρi∂fi(x) : P
i∈I(x)ρi= κ, 0 ≤ ρi≤ 1 for i ∈ I(x), ρi= 0 for i /∈ I(x) o
. (16) Proof Let ϑκ(y) =Pκ
l=1y[l], where y[1], · · · , y[m] are the numbers y1, · · · , ym sorted in nonincreasing order. Then, it follows from (2) that
ϑκ(y) = maxnXm
i=1
λiyi: Xm i=1
λi= κ, 0 ≤ λi ≤ 1, i = 1, 2, · · · , mo
= δ∗(y| C)
where δ∗(· | C) denotes the support function of the set C and C =©
λ ∈ Rm| Pm
i=1λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , mª . Therefore, by [24, Corollary 23.5.3],
∂ϑκ(y) = argmax nXm
i=1
λiyi: Xm i=1
λi= κ, 0 ≤ λi≤ 1, i = 1, 2, · · · , m o
.
It is easily shown that the set on the right side of the last equality is exactly n P
j∈J(y)ρjej : P
j∈J(y)ρj= κ, 0 ≤ ρj ≤ 1 for j ∈ J(y), ρj = 0 for j /∈ J(y)o , where {e1, e2, · · · , em} denote the canonical basis of Rmand
J(y) =©
j ∈ {1, 2, · · · , m}| yj≥ y[κ]
ª. Thus, from [16, Theorem 4.3.1], we immediately obtain (16). ut
Lemma 3 Let {(wk, xk)} be the sequence generated by Algorithm 1. Then, any limit points of {xk} are optimal solutions to problem (1).
Proof Let (w∗, x∗) be a limit point of the sequence {(wk, xk)}. Without loss of generality, we assume that {(wk, xk)} → (w∗, x∗) when k tends to +∞. We now prove that x∗ is an optimal solution to problem (1). First, from (10) and the fact that (wk, xk) is a solution of (14), it follows that
Φκ(xk) ≤ Φ(wk, xk, εk) ≤ Φκ(xk) + m(ln 2 + 1)εk. (17) By the continuity of fi(x), we thus have
Φκ(x∗) = lim
k→+∞Φ(wk, xk, εk) = κw∗+ Xm i=1
max©
0, fi(x∗) − w∗ª ,
which can be rewritten as
Φκ(x∗) = κw∗+ X
i∈I(x∗)
max©
0, fi(x∗) − w∗ª
+ X
i /∈I(x∗)
max©
0, fi(x∗) − w∗ª .
Here, I(x∗) is defined by (15). The last equation implies that
fi(x∗) − w∗≤ 0 for all i /∈ I(x∗). (18) In addition, from the fact that (wk, xk) is a solution of (14), we have
∇Φ(wk, xk, εk) =
κ −Pm
i=1λki Xm
i=1
λkiωi(xk− ai) pkx − aik2+ ε2
= 0 (19)
where λki = λi(wk, xk, εk) for i = 1, 2, · · · , m. Combining with the definition of λi(wk, xk, εk) in (12), it is clear thatPm
i=1λki = κ and 0 < λki < 1 for i = 1, 2, · · · , m. This means that the sequence {λki} for every i = 1, 2, · · · , m has a convergent subsequence. Without loss of generality, we suppose that
k→+∞lim λki = λ∗i, i = 1, 2, · · · , m.
Then, Pm
i=1λ∗i = κ, 0 ≤ λ∗i ≤ 1 for i = 1, 2, · · · , m. (20) Next we prove λ∗i = 0 for i /∈ I(x∗). By (18), we consider the following two cases to prove it.
Case 1: there exits an index i0∈ I(x/ ∗) such that fi0(x∗) − w∗= 0. In this case, there must hold f[1](x∗) − w∗> 0, f[2](x∗) − w∗> 0, · · · , f[κ](x∗) − w∗> 0.
By the continuity of fi(x), we have for all sufficiently large k
f[1](xk, εk) − wk > 0, f[2](xk, εk) − wk > 0, · · · , f[κ](xk, εk) − wk > 0.
This implies that
k→+∞lim exp£
(f[l](xk, εk) − wk)/εk
¤ 1 + exp£
(f[l](xk, εk) − wk)/εk
¤ = 1, l = 1, 2, · · · , κ.
Thus, from (17) and the definition of λki, we have X
i /∈I(x∗)
λ∗i = κ − lim
k→+∞
X
i∈I(x∗)
exp£
(fi(xk, εk) − wk)/εk
¤ 1 + exp£
(fi(xk, εk) − wk)/εk
¤ = 0.
Consequently, λ∗i = 0 for i /∈ I(x∗).
Case 2: fi(x∗) − w∗< 0 for all i /∈ I(x∗). Now, when k is sufficiently large, fi(xk, εk) − wk < 0 for all i /∈ I(x∗)
due to the continuity of fi(x). Thus, from the definition of λki, we readily have λ∗i = 0 for i /∈ I(x∗).
Thus, combing with the equation (20), we have P
i∈I(x∗)λ∗i = κ, 0 ≤ λ∗i ≤ 1 for i ∈ I(x∗), and λ∗i = 0 for i /∈ I(x∗). (21) Note that for i ∈ I(x∗),
νi∗= lim
k→+∞
ωi(xk− ai)
pkxk− aik2+ ε2k ∈ ∂fi(x∗).
Therefore, it follows from the equations (19) and (21) that X
i∈I(x∗)
λ∗iνi∗= lim
k→+∞
Xm i=1
λkiωi(xk− ai) pkx − aik2+ ε2 = 0.
Compared with (16), this indicates that 0 ∈ ∂Φκ(x∗), and accordingly x∗ is an optimal solution to problem (1). Here, we complete the proof of lemma. ut
Theorem 1 Let {xk} be the sequence generated by Algorithm 1. If x∗ is the unique optimal solution of problem (1), then we have limk→+∞xk = x∗.
Proof For any k ≥ 1, by Lemma 1,
Φ(w1, x1, ε1) > Φ(w1, x1, εk) ≥ Φ(wk, xk, εk) ≥ Φκ(xk). (22) Consider that Φκ(x) is coercive, and so the level set L =©
x ∈ Rn| Φκ(x) ≤ Φ(w1, x1, ε1)ª
is bounded.
However, from (22), we have {xk} ⊆ L. This shows that the sequence {xk} is bounded. Since x∗ is the unique solution of problem (1), we have from Lemma 2 that limk→+∞xk= x∗. ut
In practice, we would stop Algorithm 1 once some stopping criteria are satisfied. Moreover, we will use a first-order, or gradient-based, unconstrained minimization algorithm to solve the problem (14) in Step 1. In what follows, we describe a more practical version of Algorithm 1 by choosing a limited-memory BFGS algorithm to solve the problem (14) in Step 1 because this algorithm can solve very large unconstrained optimization problem efficiently.
Algorithm 2
Let σ ∈ (0, 1), τ1, τ2> 0, ( ˆw0, ˆx0) ∈ Rn+1and ε0> 0 be given. Set k := 0.
For k = 0, 1, 2, · · · , until εk≤ τ1, do
1. Use a version of the limited-memory BFGS algorithm to solve (14) approximately, and obtain an (wk, xk) such that k∇Φ(wk, xk, εk)k ≤ τ2.
2. Set εk+1= σεk and k := k + 1, and then go back to Setp 1.
End
4 Second-order cone program reformulation
In this section, we will reformulate the single-facility κ-centrum problem (1) as a standard primal second-order cone program. First, from the discussions in the second paragraph of section 2, we know that problem (1) is equivalent to the optimization problem
w,ν,ηmin κw + Xm i=1
ηi
s.t. kωi(ν − ai)k ≤ ηi+ w, i = 1, 2, · · · , m, ηi≥ 0
(23)
where ν = (ν1, · · · , νn)T ∈ Rn. This problem can be rewritten as min (κ − m)w +
Xm i=1
ωiti
s.t. q¡
ν1− a11
¢2 +¡
ν2− a12
¢2
+ · · · +¡
νn− a1n
¢2
≤ t1, q¡ν1− a21
¢2 +¡
ν2− a22
¢2
+ · · · +¡
νn− a2n
¢2
≤ t2,
... ...
q¡ν1− am1
¢2 +¡
ν2− am2
¢2
+ · · · +¡
νn− amn
¢2
≤ tm, ωiti− w ≥ 0, i = 1, 2, · · · , m.
(24)
Let ωiti− w = υi, i = 1, 2, · · · , m;
νj− aij= uij, i = 1, 2, · · · , m, j = 1, 2, · · · , n. (25) Then the constraints of problem (24) become
ω1t1− υ1= ω2t2− υ2= · · · = ωmtm− υm= w, υi ≥ 0, i = 1, 2, · · · , m,
u211 + u212+ · · · + u21n ≤ t21, t1≥ 0, u221 + u222+ · · · + u22n ≤ t22, t2≥ 0,
... ... ... u2m1+ u2m2+ · · · + u2mn≤ t2m, tm≥ 0,
and
u11+ a11= u21+ a21= · · · = um1+ am1, u12+ a12= u22+ a22= · · · = um2+ am2,
... ... ... ...
u1n+ a1n = u2n+ a2n = · · · = umn+ amn.
Let
zi = (ti, ui1, ui2, · · · , uin) ∈ Rn+1, ci=¡ κ
mωi; 0n; 1−κ m
¢∈ Rn+2, i = 1, 2, · · · , m (26)
and define the second-order cone Kn+1=n
(ξ1; ξ2) | ξ1∈ R+, ξ2∈ Rn, and kξ2k ≤ ξ1
o . Then,
xi:= (zi; υi) ∈ Kn+1× R+, i = 1, 2, · · · , m and furthermore, it follows from (25) that
cT1x1+ cT2x2+ · · · + cTmxm= (k − m)w + Xm i=1
ωiti.
Therefore, the problem in (24) turns into the form of minPm
i=1cTixi
s.t.
(ω1t1− ω2t2) − (υ1− υ2) = 0, (ω1t1− ω3t3) − (υ1− υ3) = 0,
...
(ω1t1− ωmtm) − (υ1− υm) = 0, u11− u21= a21− a11, u11− u31= a31− a11,
...
u11− um1= am1− a11, u12− u22= a22− a12, u12− u32= a32− a12,
...
u12− um2= am2− a12, ...
u1n− u2n= a2n− a1n, u1n− u3n= a3n− a1n,
...
u1n− umn= amn− a1n,
(x1; x2; · · · ; xm) ∈ (Kn+1× R+) × · · · × (Kn+1× R+).
(27)
Let N = m(n + 2), N1= (m − 1)(n + 1), K = (Kn+1× R+) × · · · × (Kn+1× R+),
x = (x1; · · · ; xm) ∈ RN, c = (c1; · · · ; cm) ∈ RN, b = (0m−1; b1; · · · ; bn) ∈ RN1,
(28)
where
bj = (a2j− a1j, a3j− a1j, · · · , amj− a1j)T ∈ Rm−1 for j = 1, 2, · · · , n.
Then problem (27) can be recast into a standard primal SOCP:
min cTx
s.t. Ax = b, x ∈ K, (29)
where
A =
B11 B12 · · · B1m
B21 B22 · · · B2m
... ... · · · ...
B(m−1)1 B(m−1)2 · · · B(m−1)m
· · · ·
A11 A12 · · · A1m
A21 A22 · · · A2m
... ... · · · ...
A(m−1)1 A(m−1)2 · · · A(m−1)m
· · · ·
... ... · · · ...
· · · · A(n(m−1)−m+2)1 A(n(m−1)−m+2)2 · · · A(n(m−1)−m+2)m
A(n(m−1)−m+1)1 A(n(m−1)−m+1)2 · · · A(n(m−1)−m+1)m
... ... · · · ...
A(n(m−1))1 A(n(m−1))2 · · · A(n(m−1))m
∈ RN1×N (30)
in which the element Bkl is a row vector in Rn+2defined by
Bkl=
(ω1, 0, · · · , 0, −1), if l = 1, (−ωl, 0, · · · , 0, 1), if l = k + 1,
0n+2, otherwise,
and the element Aklis a row vector in Rn+2with the (dk/me + 2)th element being 1 and the others 0 if l = 1, otherwise if l = k + 1 the (dk/me + 2)th element being −1 and the others 0, and otherwise it is a zero vector in Rn+2. Here, dk/me is the largest positive integer not over than k/m.
It is not difficult to verify that A defined in (30) has a full row rank N1. We here want to point out that the similar reformulation techniques developed as above were also used by [7,17] for facility location problems in R2, where the resulting SOCP problems are solved by a merit function approach and an interior point method, respectively. In the next section, we will develop an algorithm for problem (1) by following the same line as [7].
5 Algorithm based on the SOCP reformulation
In this section, we use the Fischer-Burmeister merit function developed by [8] for SOCCPs to transform the KKT system of the SOCP (29) into an equivalent unconstrained smooth minimization problem, and then based on this minimization problem, present an algorithm for solving problem (1).
The KKT optimality conditions of (29), which are sufficient but not necessary for optimality, are hx, yi = 0, x ∈ K, y ∈ K,
Ax = b, y = c − ATζd for some ζd∈ RN1,
(31)
where y = (y1; · · · ; ym) and yi = (si; wi) with si ∈ Rn+1 and wi ∈ R+ for i = 1, 2, · · · , m. Choose any d ∈ RN satisfying Ad = b. (If no such d exists, then (29) has no feasible solution). Let AN ∈ RN ×(N −N1)be any matrix whose columns span the null space of A. Then x satisfies Ax = b if and only if x = d + ANζpfor some ζp∈ RN −N1. Thus, the KKT optimality conditions in (31) can be rewritten as the following SOCCP:
hx, yi = 0, x ∈ K, y ∈ K,
x = F (ζ), y = G(ζ), (32)
where
ζ = (ζp; ζd), F (ζ) := d + ANζp, G(ζ) := c − ATζd. (33)
Alternatively, consider that any ζ ∈ RN can be decomposed into the sum of its orthogonal projection on the column space of AT and the null space of A, so the following form can be used in place of (33) F (ζ) := d + (I − AT(AAT)−1A)ζ, G(ζ) := c − AT(AAT)−1Aζ. (34) In the sequel, unless otherwise stated, F (ζ) and G(ζ) are both defined by (34).
In [8], Chen extended the merit function, the squared norm of the Fischer-Burmeister function for the nonlinear complementarity problems, to SOCCPs, and then developed a unconstrained optimiza- tion technique for SOCCPs by use of the merit function. For any u = (u1, u2), v = (v1, v2) ∈ R × Rn, define their Jordan product associated with Kn+1 by
u ◦ v :=¡
hu, vi, v1u2+ u1v2
¢.
The identity element under this product is (1, 0, · · · , 0) ∈ Rn+1. Write u2 to mean u ◦ u and u + v to mean the usual componentwise addition of vectors. It is well known that u2∈ Kn+1for all u ∈ Rn+1. Moreover, if u ∈ Kn+1, there is a unique vector in Kn+1, denoted by u1/2, such that (u1/2)2= u1/2◦u1/2. Then,
φ(u, v) :=¡
u2+ v2¢1/2
− u − v (35)
is well-defined for all (u, v) ∈ Rn+1× Rn+1and maps Rn+1× Rn+1to Rn+1. It was shown in [15] that φ(u, v) = 0 ⇐⇒ hu, vi = 0, u ∈ Kn+1, v ∈ Kn+1.
Note that K1coincides with the set of nonnegative reals R+, and in that case φ(u, v) in (35) reduces to the Fischer-Burmeister NCP-function [12,13]. Thus,
ψ(x, y) :=1 2
Xm i=1
³
kφ(zi, si)k2+ φ2(υi, wi)
´
(36) is a merit function for the SOCCP (32), where zi, si ∈ Rn+1and υi, wi∈ R+. Consequently, (32) can rewritten as an equivalent global minimization problem as below:
ζ∈RminNΨ(ζ) = ψ¡
F (ζ), G(ζ)¢
. (37)
Moreover, we know from [8] that ψ is smooth and every stationary point of (32) solves the SOCP (29).
Now we establish an algorithm for the problem (1) by solving the unconstrained smooth minimiza- tion problem (32) with a limited-memory BFGS method.
Algorithm 3
Let τ1, τ2, τ3> 0 and ζ0∈ RN be given. Generate a steepest descent direction ∆0= −∇Ψ(ζ0) and seek a suitable stepsize α0. Let
ζ1:= ζ0+ α0∆0, x1:= F (ζ1), y1:= G(ζ1).
For k = 1, 2, · · · , until Ψ(ζk) ≤ τ1 and |(xk)Tyk| ≤ τ2, do If¡
∇Ψ(ζk) − ∇Ψ(ζk−1)¢T
(ζk− ζk−1) ≤ τ3kζk− ζk−1k · k∇Ψ(ζk) − ∇Ψ(ζk−1)k, then
∆k = −∇Ψ(ζk);
else
∆k is generated by the limited-memory BFGS method.
End
Set ζk:= ζk+ αk∆k, xk:= F (ζk), yk:= G(ζk), where αk is a stepsize. Let k := k + 1.
End
Remark 1 Suppose that ζ∗ is the final iteration generated by Algorithm 3 and x∗ = F (ζ∗). Then, x∗(2 : n + 1) + a1 is the optimal solution of problem (1) and the optimal value is cTx∗.
6 Numerical experiments
We implemented Algorithm 2 in section 3 and Algorithm 3 in section 5 with our codes and compared them with SeduMi 1.05 [27] (A high quality sofware packages with Matlab interface for solving SOCP and semidefinite programming problems). Our computer codes are all written in Matlab, including the evaluation of ψ(x, y) and ∇ψ(x, y) in Algorithm 3. All the numerical experiments were done at a PC with CPU of 2.8 GHz and RAM of 512 MB. To solve the unconstrained minimization problem (14) in Algorithm 2 and generate the direction ∆k and the suitable stepsize in Algorithm 3, we choose a limited-memory BFGS algorithm with Armijo line-search and 5 limited-memory vector-updates [5], where for the scaling matrix H0= γIN we use the recommended choice of γ = pTq/qTq [18, P. 226], where p := ζ − ζold and q := ∇Ψ(ζ) − ∇Ψ(ζold). This choice is found to work better than the choice used by [8] for our problems. To evaluate F and G in Algorithm 3, we use LU factorization of AAT. In particular, given such a factorization LU = AAT, we can compute x = F (ζ) and s = G(ζ) for each ζ via two matrix-vector multiplications and two forward/backword solves:
Lu = Aζ, U v = ξ, w = ATv, x = d + ζ − w, s = c − w. (38) For the vector d satisfying Ad = b, we compute it as a solution of mindkAd − bk using Matlab’s linear least square solver “lsqlin”.
Throughout the computational experiments, we use the following parameters in Algorithm 2:
ε0= 1, σ = 0.1, τ1= 1.0e − 6, τ2= 1.0e − 3.
For Algorithm 3, we use the following parameters
τ1= 1.0e − 8, τ2= 1.0e − 5, τ3= 1.0e − 4.
For SeDuMi, we use all the default values except that the parameter eps is set to be 1.0e − 6. The starting points for Algorithms 2 and 3 are chosen to be x0= 0n and ζ0= 0N, respectively.
The test problems are generated randomly by the following pseudo-random sequence:
ψ0= 7, ψi+1= (445ψi+ 1) mod 4096, i = 1, 2, · · · , ψ¯i= ψi/40.96. i = 1, 2, · · · ,
The elements of ai for i = 1, 2, · · · , m are successively set to ¯ψ1, ¯ψ2, · · · , in the order:
a1(1), a1(2), · · · , a1(n), a2(1), a2(2), · · · , a2(n), · · · , am(1), am(2), · · · , am(n), and the weight ωi is set to be ai(1)/10 if mod(i, 10) = 0, and otherwise ωi is set to be 1.
The numerical results are summarized in Tables 1-3. In these tables, n and m specify the problem dimensions, Obj. denotes the objective value of (1) at the final iteration, Iter indicates the iteration number and Time represents the CPU time in seconds for solving each problem, and for Algorithm 3, it also includes the time to make the LU decomposition of AAT and find the feasible point d.
The results in Table 1 show how the iteration number of Algorithms 2-3 and SeduMi varies with κ for the problems of the same dimension (m = 50 and n = 1000). We can see from this table that the value of κ has a visible influence on their iteration number, and when κ closes to m, they will decrease, but when κ closes to 1, they will increase, and especially that of Algorithms 2 increase greatly.
The results listed in Tables 1-2 show that the algorithms presented in this paper perform very well and they are able to obtain good accuracy for all test problems. Particularly, Algorithm 2 consistently uses less CPU time than Algorithm 3 and SeDuMi. For the moderate problems where n and m are in the order of hundreds, Algorithm 3 is comparable with SeDuMi, and moreover, we can see from Table 2 that the CPU time of Algorithm 3 increases slower than that of SeDuMi with the dimension of problem increasing. In addition, we should point out that the evaluations of ψ(x, y) and ∇ψ(x, y) coded in Matlab increased the CPU time of Algorithm 3 greatly since it is a main part of the algorithm.
The results in Table 3 show that Algorithm 2 can solve very large problems in a reasonable amount of CPU time. However, Algorithm 3 and SeDuMi failed for these problems due to excessive CPU time or memory requirement.
We conclude that Algorithm 2 is better than Algorithm 3 and SeDuMi for very large problems since it consumes less memory. For moderate problems, Algorithm 3 and SeDuMi are comparable.