
to appear in Applied Mathematics and Computation, 2012

A continuation approach for solving binary quadratic program based on a class of NCP-functions

Jein-Shan Chen¹
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
E-mail: jschen@math.ntnu.edu.tw

Jing-Fan Li
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
E-mail: 697400011@ntnu.edu.tw

Jia Wu
School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
E-mail: jwu_dora@mail.dlut.edu.cn

February 12, 2012

Abstract. In this paper, we consider a continuation approach for the binary quadratic program (BQP) based on a class of NCP-functions. More specifically, we recast the BQP as an equivalent minimization problem and then seek its global minimizer via a global continuation method. Such an approach was considered in [11], based on the Fischer-Burmeister function. We investigate this continuation approach again using a more general function, called the generalized Fischer-Burmeister function. However, the theoretical background for such an extension cannot be easily carried over; indeed, it requires some subtle analysis.

Keywords. Nonlinear complementarity problem, generalized Fischer-Burmeister function, binary quadratic program.

¹Member of Mathematics Division, National Center for Theoretical Sciences, Taipei Office. This author's work is supported by the National Science Council of Taiwan.


1 Introduction

In this paper, we consider the following binary quadratic program (BQP)

min x^T Q x + c^T x  over x ∈ S,  (1)

where Q is an n × n symmetric matrix, c is a vector in IR^n and S is the binary discrete set {0, 1}^n. It is known that BQP is NP-hard and has a variety of applications in computer science, operations research and engineering; see [1, 3, 8, 13, 14] and references therein.

Several continuous approaches for solving BQP have been proposed [9, 12, 15]; they often need to cooperate with branch-and-bound algorithms or some heuristic strategies to generate an exact or approximate solution. In [10], another type of continuous approach was proposed, which reformulates BQP as an equivalent mathematical programming problem with equilibrium constraints (MPEC) and then applies an effective algorithm to find its global solution. In this approach, NCP-functions are employed to convert the equilibrium constraints into a collection of quasi-linear equality constraints. Among others, the Fischer-Burmeister function ϕFB : IR² → IR defined as

ϕFB(a, b) = √(a² + b²) − (a + b)  (2)

is a popular one. In this paper, we investigate this continuation approach again by using a more general function ϕp : IR2 → IR, called the generalized Fischer-Burmeister function and defined by

ϕp(a, b) := ‖(a, b)‖p − (a + b),  (3)

where p > 1 is an arbitrary fixed real number and ‖(a, b)‖p denotes the p-norm of (a, b), i.e., ‖(a, b)‖p = (|a|^p + |b|^p)^{1/p}. In other words, in the generalized FB function ϕp, we replace the 2-norm of (a, b) that appears in the FB function by the more general p-norm.

The function ϕp is still an NCP-function, which naturally induces another NCP-function ψp : IR2 → IR+ given by

ψp(a, b) := (1/2)|ϕp(a, b)|².  (4)

For any given p > 1, the function ψp is shown to possess all favorable properties of the FB function ψFB.
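As a concrete illustration (not part of the paper's development), the two functions in (3) and (4) can be transcribed directly; the names `phi_p` and `psi_p` below are our own, and the NCP property ϕp(a, b) = 0 ⇐⇒ a ≥ 0, b ≥ 0, ab = 0 can be spot-checked numerically.

```python
def phi_p(a, b, p=2.0):
    """Generalized Fischer-Burmeister function (3): ||(a, b)||_p - (a + b)."""
    return (abs(a) ** p + abs(b) ** p) ** (1.0 / p) - (a + b)


def psi_p(a, b, p=2.0):
    """Induced nonnegative NCP-function (4): (1/2) |phi_p(a, b)|^2."""
    return 0.5 * phi_p(a, b, p) ** 2
```

For instance, phi_p(0, 3) and phi_p(2, 0, p=4) vanish (complementary pairs), while phi_p(-1, 1) is positive, and psi_p is nonnegative everywhere.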

Traditionally, the continuation approach for BQP utilizes the fact that

x ∈ {0, 1}^n ⇐⇒ x_i = x_i², i = 1, 2, ..., n.  (5)

In contrast, our proposed continuous optimization approach arises from the complementarity condition formulation of the 0−1 vector x ∈ {0, 1}^n, which augments the equivalence (5) with the redundant constraints

0 ≤ x_i ≤ 1, i = 1, 2, ..., n,


so that it can generate an integer feasible solution. For finding the global minimizer of our continuous optimization problem, we employ a similar strategy to that in [10, 11]. In summary, the method adds to the objective function a quadratic penalty term associated with the equilibrium constraints and a logarithmic barrier term associated with the box constraints −1 ≤ x_i ≤ 1, i = 1, 2, ..., n, and then constructs a global smoothing function.

Since the generalized Fischer-Burmeister function ψp is quasi-linear, the quadratic penalty for the equilibrium constraints makes the convexity of the global smoothing function stronger. In particular, we show that the global smoothing function is strictly convex on the whole domain when the barrier parameter is large enough, and on a subset of its domain when the penalty parameter is large enough. Based on this feature, we use the global continuation algorithm defined in [11], which solves a sequence of unconstrained minimizations of this function with varying penalty and barrier parameters. Although the idea is borrowed from [11], as will be seen, the theoretical background for this extension cannot be easily carried over; indeed, it requires some subtle analysis to extend the background materials. Without loss of generality, in this paper we consider the case S = {−1, 1}^n. By the transformation z = (x + e)/2, where e is the vector of all ones in IR^n, the conclusions extend to the case S = {0, 1}^n.

2 Continuous formulation based on the ϕp function

In this section we reformulate (1) as an equivalent continuous optimization problem based on the ϕp function. As will be seen, the following equivalence plays a key role; it says that a binary constraint t ∈ {a, b} with a, b ∈ IR is equivalent to a complementarity condition (or equilibrium constraint), i.e.,

t ∈ {a, b} ⇐⇒ t − a ≥ 0, b − t ≥ 0, (t − a)(t − b) = 0.

With this, the unconstrained BQP problem in (1), with f(x) := x^T Q x + c^T x, can be recast as a mathematical programming problem with equilibrium constraints (MPEC):

min f(x)
s.t. (1 + x_i)(1 − x_i) = 0, i = 1, 2, ..., n,  (6)
     1 + x_i ≥ 0, 1 − x_i ≥ 0, i = 1, 2, ..., n.

In fact, given any NCP-function ϕ : IR × IR → IR, the defining property of NCP-functions (see [6]) yields that the equilibrium constraint in (6) is equivalent to an equality constraint associated with ϕ:

(1 + x_i)(1 − x_i) = 0, 1 + x_i ≥ 0, 1 − x_i ≥ 0 ⇐⇒ ϕ(1 + x_i, 1 − x_i) = 0.  (7)

Thus, combining (6) and (7), we reformulate the original BQP problem as the following continuous optimization problem:

min f(x)
s.t. ϕ(1 + x_i, 1 − x_i) = 0, i = 1, 2, ..., n,  (8)
     −1 ≤ x_i ≤ 1, i = 1, 2, ..., n.

Although the box constraints −1 ≤ x_i ≤ 1, i = 1, 2, ..., n in (8) are indeed redundant, we keep them on purpose. Actually, we shall see that these constraints play a crucial role in the construction of a global smoothing function for problem (8), as was shown in [9, 10]. Generally speaking, most NCP-functions are non-differentiable, such as the popular Fischer-Burmeister function in (2), the generalized Fischer-Burmeister function in (3), as well as the minimum function

ϕmin(a, b) = min{a, b}.

However, it is very interesting to observe that, when specializing ϕ in (8) to the generalized Fischer-Burmeister function, we reach the smooth constraint functions

ϕp(1 + x_i, 1 − x_i) = (|1 + x_i|^p + |1 − x_i|^p)^{1/p} − 2 = 0, i = 1, 2, ..., n,

and consequently some standard nonlinear programming solvers can be employed to design an effective algorithm for solving problem (8). In view of this, in this paper we pay attention to the following equivalent continuous formulation obtained from the generalized Fischer-Burmeister function:

min f(x)
s.t. ϕp(1 + x_i, 1 − x_i) = 0, i = 1, 2, ..., n,  (9)
     −1 ≤ x_i ≤ 1, i = 1, 2, ..., n.

We also note that using the equivalence x_i ∈ {−1, 1} ⇐⇒ x_i² = 1 gives another type of continuous optimization problem:

min f(x)
s.t. x_i² = 1, i = 1, 2, ..., n,  (10)
     −1 ≤ x_i ≤ 1, i = 1, 2, ..., n.

The formulation (10) looks simple and friendly at first glance; nonetheless, the following remarkable advantages explain why we still stick to the smooth constrained optimization problem (9):

(i) The quasi-linearity of the generalized Fischer-Burmeister function implies that its feasible set tends to be convex.

(ii) The equality constraints ϕp(1 + x_i, 1 − x_i) = 0, i = 1, 2, ..., n incorporate the equivalent formulation x_i² = 1, i = 1, 2, ..., n, of x ∈ {−1, 1}^n together with its relaxation −1 ≤ x_i ≤ 1, i = 1, 2, ..., n, which indicates that, when solving (9) with a penalty function method, an implicit interior point constraint is additionally imposed.

(iii) From Proposition 2.1 below, the quadratic penalty function of the equality constraints is strictly convex in a very large region when the penalty parameter is large enough.


These advantages contribute greatly to the search for an optimal solution or a favorable suboptimal solution of (1), as will be shown later. Before we prove the main proposition, we first introduce several technical lemmas that are important for building up the background materials of our extension.

Lemma 2.1 Let f, g be real-valued functions from IR to IR₊. Suppose f, g satisfy

(i) f′(x) > 0 and g′(x) < 0 for all x ∈ (a, b),

(ii) f″(x) < 0 and g″(x) < 0 for all x ∈ (a, b),

(iii) (fg)′(a) < 0 and f(a) ≥ g(a).

Then (fg)′(x) < 0 for all x ∈ (a, b).

Proof. To achieve our result, we need to verify two things: (i) (fg)′(a) < 0 and (ii) (fg)′(x) is decreasing on (a, b). We proceed with these verifications as below.

(i) From the assumptions and the product rule, it is clear that (fg)′(a) = f′(a)g(a) + f(a)g′(a) < 0.

(ii) Since (fg)′(x) = f′(x)g(x) + f(x)g′(x), in order to show that (fg)′(x) is decreasing on (a, b), it is enough to argue that both f′(x)g(x) and f(x)g′(x) are decreasing on (a, b). For the first term, note that

(f′(x)g(x))′ = f″(x)g(x) + f′(x)g′(x) ≤ 0  ∀x ∈ (a, b)

because f″(x) < 0, g(x) ≥ 0, f′(x) > 0 and g′(x) < 0. This shows that f′(x)g(x) is decreasing on (a, b). That f(x)g′(x) is decreasing on (a, b) can be concluded similarly.

Thus, from all the above, the proof is complete. □

The conclusion of the next lemma is simple and neat; however, its arguments are very tedious. Indeed, the main idea behind them is approximation.

Lemma 2.2 Let ψp be defined as in (4). Then ψp″(1 + t, 1 − t) is positive at t = ±√(2^{1/3} − 1) for all p ≥ 2.

Proof. By symmetry, we only prove the case t = √(2^{1/3} − 1). First, from direct computation and simplification of the expression of ψp″, we have

ψp″(1 + t, 1 − t) = [((1 + t)^p + (1 − t)^p)^{1/p} / ((1 + t)²(1 − t)²((1 + t)^p + (1 − t)^p)²)] × F(p, t),  (11)

where F(p, t) = f0(p, t)[f1(p, t) + f2(p, t) + f3(p, t)] + f4(p, t) with

f0(p, t) = ((1 + t)^p + (1 − t)^p)^{1/p},
f1(p, t) = (1 − t)²(1 + t)^{2p},
f2(p, t) = (1 + t)²(1 − t)^{2p},
f3(p, t) = (2t² + 4p − 6)(1 + t)^p (1 − t)^p,
f4(p, t) = (8 − 8p)(1 − t²)^p.

Since the first factor on the right side of (11) is always positive for all p ≥ 2, it suffices to show that F(p, √(2^{1/3} − 1)) > 0 for all p ≥ 2. However, it is very hard to prove this fact directly. Our strategy is to construct a function A : IR → IR such that

A(p) ≤ F(p, √(2^{1/3} − 1))  ∀p ≥ 2.  (12)

The special feature of A(p) is that it is easier to verify A(p) ≥ 0 for all p ≥ 2, so that our goal can be reached. We now proceed with the proof through these two steps.

Step (1): Construct a function A(·) satisfying (12). The function F(·, ·) is composed of f0, f1, f2, f3 and f4, so for each fi we construct a corresponding piecewise function ai such that ai(p) ≤ fi(p, √(2^{1/3} − 1)) for i = 0, 1, 2, 3, 4, and then combine them to build up the function A(·). To make the construction easier to follow, we provide some pictures along the way.

(i) First, we explain how to set up a0(p). Since the second derivative of f0 with respect to p is positive at t = √(2^{1/3} − 1) for all p ≥ 2, f0 is strictly convex in p at t = √(2^{1/3} − 1) for all p ≥ 2 (the detailed arguments are provided in Appendix A). Hence, we consider the real piecewise function defined as

a0(p) = −(1/8)(p − 2) + 2^{2/3}   if 2 ≤ p ≤ −8√(2^{1/3} − 1) − 6 + 8(2^{2/3}),
a0(p) = √(2^{1/3} − 1) + 1        if p ≥ −8√(2^{1/3} − 1) − 6 + 8(2^{2/3}).

Figure 1: The graphs of a0 and f0.

Figure 1 depicts the relation between a0(p) and f0(p, √(2^{1/3} − 1)). Besides, the facts

a0(2) = f0(2, √(2^{1/3} − 1)),
lim_{p→2+} a0′(p) < (d/dp) f0(2, √(2^{1/3} − 1)),
a0″(p) = 0 < (d²/dp²) f0(p, √(2^{1/3} − 1)),

indicate that the first piece of a0(p) is less than f0(p, √(2^{1/3} − 1)) for 2 < p ≤ −8√(2^{1/3} − 1) − 6 + 8(2^{2/3}). On the other hand, the fact

lim_{p→∞} f0(p, √(2^{1/3} − 1)) = √(2^{1/3} − 1) + 1

says that the second piece of a0(p) is less than or equal to f0(p, √(2^{1/3} − 1)) for p ≥ −8√(2^{1/3} − 1) − 6 + 8(2^{2/3}). Thus, we conclude that

a0(p) ≤ f0(p, √(2^{1/3} − 1))  ∀p ≥ 2.

(ii) Secondly, we consider the quadratic function defined as

a1(p) = (1 − √(2^{1/3} − 1))²(1 + √(2^{1/3} − 1))⁴ ln(1 + √(2^{1/3} − 1)) (p − 1)²
      + (1 − √(2^{1/3} − 1))²(1 + √(2^{1/3} − 1))⁴ [1 − ln(1 + √(2^{1/3} − 1))].

Figure 2 depicts the relation between a1(p) and f1(p, √(2^{1/3} − 1)). Again, using the facts

a1(2) = f1(2, √(2^{1/3} − 1)),
a1′(2) = (d/dp) f1(2, √(2^{1/3} − 1)),
a1″(p) ≤ (d²/dp²) f1(p, √(2^{1/3} − 1))  ∀p ≥ 2,

we immediately achieve

a1(p) ≤ f1(p, √(2^{1/3} − 1))  ∀p ≥ 2.

(iii) Thirdly, we consider the function defined as

a2(p) = −(1/5)(p − 2) + (1 − √(2^{1/3} − 1))⁴(1 + √(2^{1/3} − 1))²
        if 2 ≤ p ≤ 12 + 20(2^{1/3} − 2^{2/3}) + √(2^{1/3} − 1)(40(2^{1/3}) − 40 − 10(2^{2/3})),
a2(p) = 0
        if p ≥ 12 + 20(2^{1/3} − 2^{2/3}) + √(2^{1/3} − 1)(40(2^{1/3}) − 40 − 10(2^{2/3})).

Figure 3: The graphs of a2 and f2.

Figure 3 depicts the relation between a2(p) and f2(p, √(2^{1/3} − 1)). We observe that the function f2 is positive and convex for p ≥ 2; then the facts

a2(2) = f2(2, √(2^{1/3} − 1)),
lim_{p→2+} a2′(p) < (d/dp) f2(2, √(2^{1/3} − 1)),
a2″(p) = 0 < (d²/dp²) f2(p, √(2^{1/3} − 1))  ∀p > 2,

yield a2(p) ≤ f2(p, √(2^{1/3} − 1)) for all p ≥ 2.

(iv) Fourthly, we consider the real piecewise function defined as

a3(p) = [√(2 − 2^{1/3})(24 − 12(2^{2/3})) + 16(2^{2/3} − 2^{1/3}) − 8] p
        + √(2 − 2^{1/3})(24(2^{2/3}) − 48) − 40(2^{2/3} − 2^{1/3}) + 20   if 2 ≤ p ≤ 5/2,
a3(p) = −(4/31)(2^{1/3} + 1)(2 − 2^{1/3})^{5/2} p + (72/31)(2^{1/3} + 1)(2 − 2^{1/3})^{5/2}   if 5/2 ≤ p ≤ 18,
a3(p) = 0   if p ≥ 18.

Figure 4 depicts the relation between a3(p) and f3(p, √(2^{1/3} − 1)). The relation is clear from the picture; however, we need to go through three subcases to verify it mathematically.

If 2 ≤ p ≤ 5/2, we compute f3(p, √(2^{1/3} − 1)) = (2(2^{1/3}) − 8 + 4p)(2 − 2^{1/3})^p. Moreover, we have

(d/dp) f3(p, √(2^{1/3} − 1)) = (2 − 2^{1/3})^p [4 + (2(2^{1/3}) − 8 + 4p) ln(2 − 2^{1/3})],
(d²/dp²) f3(p, √(2^{1/3} − 1)) = (2 − 2^{1/3})^p ln(2 − 2^{1/3}) [8 + (2(2^{1/3}) − 8 + 4p) ln(2 − 2^{1/3})].

Then the facts

a3(2) = f3(2, √(2^{1/3} − 1)),  a3(5/2) = f3(5/2, √(2^{1/3} − 1)),  lim_{p→2+} a3′(p) ≤ (d/dp) f3(2, √(2^{1/3} − 1)),

together with f3(p, √(2^{1/3} − 1)) being concave on [2, 5/2], imply a3(p) ≤ f3(p, √(2^{1/3} − 1)) in this case.

If 5/2 ≤ p ≤ 18, using the facts

a3(5/2) = f3(5/2, √(2^{1/3} − 1)),  lim_{p→(5/2)+} a3′(p) ≤ (d/dp) f3(5/2, √(2^{1/3} − 1)),

and the fact that the equation a3(p) = f3(p, √(2^{1/3} − 1)) has only one solution, at p = 5/2, we obtain a3(p) ≤ f3(p, √(2^{1/3} − 1)) in this case.

If p ≥ 18, knowing f3(p, √(2^{1/3} − 1)) > 0 for all such p, it is clear that a3(p) = 0 ≤ f3(p, √(2^{1/3} − 1)) in this case.

(v) Finally, notice that the second derivative of f4 with respect to p at t = √(2^{1/3} − 1) is positive for all p ≤ (−2 + ln(2 − 2^{1/3}))/ln(2 − 2^{1/3}) and negative for p ≥ (−2 + ln(2 − 2^{1/3}))/ln(2 − 2^{1/3}), so f4 is strictly convex in p at t = √(2^{1/3} − 1) for all p ≤ (−2 + ln(2 − 2^{1/3}))/ln(2 − 2^{1/3}) and strictly concave for all p ≥ (−2 + ln(2 − 2^{1/3}))/ln(2 − 2^{1/3}). Hence, we consider the real piecewise function defined as

a4(p) = −(153/50)p − 8(2 − 2^{1/3})² + 153/25                        if 2 ≤ p ≤ 5/2,
a4(p) = [−497/50 + 16(2 − 2^{1/3})²] p + 583/25 − 48(2 − 2^{1/3})²   if 5/2 ≤ p ≤ 3,
a4(p) = −(13/10)p − 13/5                                             if 3 ≤ p ≤ 49/13,
a4(p) = −15/2                                                        if p ≥ 49/13.

Figure 5 depicts the relation between a4(p) and f4(p, √(2^{1/3} − 1)). Again, we need to discuss several subcases to prove the relation mathematically.

For 2 ≤ p ≤ 5/2, the facts

a4(2) = f4(2, √(2^{1/3} − 1)),
lim_{p→2+} a4′(p) < (d/dp) f4(2, √(2^{1/3} − 1)),
a4″(p) = 0 < (d²/dp²) f4(p, √(2^{1/3} − 1)),

yield that the first piece of a4(p) is less than f4(p, √(2^{1/3} − 1)) in this case.

For 5/2 ≤ p ≤ 3, using the facts

a4(3) < f4(3, √(2^{1/3} − 1)),
lim_{p→3−} a4′(p) > (d/dp) f4(3, √(2^{1/3} − 1)),  (13)
a4″(p) = 0 < (d²/dp²) f4(p, √(2^{1/3} − 1)),

we have that a4(p) is less than f4(p, √(2^{1/3} − 1)) in this case.

For 3 ≤ p ≤ 49/13, we know that

lim_{p→3+} a4′(p) > (d/dp) f4(3, √(2^{1/3} − 1)).

This together with (13) gives that a4(p) is less than f4(p, √(2^{1/3} − 1)) in this case.

Figure 6: The graphs of A and F.

For p ≥ 49/13, since f4(p, √(2^{1/3} − 1)) is strictly convex for all p ≤ (−2 + ln(2 − 2^{1/3}))/ln(2 − 2^{1/3}) and strictly concave for all p ≥ (−2 + ln(2 − 2^{1/3}))/ln(2 − 2^{1/3}), we know

(d/dp) f4((−1 + ln(2 − 2^{1/3}))/ln(2 − 2^{1/3}), √(2^{1/3} − 1)) = 0  and  lim_{p→∞} f4(p, √(2^{1/3} − 1)) = 0,

which lead to f4(p, √(2^{1/3} − 1)) > −15/2 for all p ≥ 2. Thus, a4(p) ≤ f4(p, √(2^{1/3} − 1)) in this case.

Now we are ready to define a function A : IR → IR satisfying (12). Following the idea mentioned above, the function is defined by

A(p) = a0(p)[a1(p) + a2(p) + a3(p)] + a4(p).

According to our constructions of the ai(p), it is clear that A(p) ≤ F(p, √(2^{1/3} − 1)) for all p ≥ 2. Figure 6 shows the relation between A(p) and F(p, √(2^{1/3} − 1)).

Step (2): We show that A(p) ≥ 0 for all p ≥ 2. Notice that A(p) is piecewise smooth, hence A′(p) is a piecewise function. Indeed, the expression of A(p) looks very complicated, so we give its explicit expression in Appendix C, which helps us understand the structure of A(p).

The key point is that, from the expression of A(p), we can verify the following facts:

A(2) = 0,  A(−8√(2^{1/3} − 1) − 6 + 8(2^{2/3})) > A(5/2) > 0,

and

A′(p) < 0 if p ∈ (5/2, −8√(2^{1/3} − 1) − 6 + 8(2^{2/3})),  A′(p) > 0 otherwise,

with the exception of points of nondifferentiability. Thus, we conclude that A(p) ≥ 0 for all p ≥ 2 and (12) is satisfied, which implies F(p, √(2^{1/3} − 1)) ≥ 0 for all p ≥ 2. Then the proof is complete. □

Lemma 2.3 (a) Let f be a convex function defined on a convex set C in IR^n and g be a nondecreasing convex function defined on an interval I in IR. Suppose f(C) ⊆ I. Then the composite function g ∘ f defined by (g ∘ f)(x) = g(f(x)) is convex on C.

(b) Suppose ϕ₁ : U → IR is a twice continuously differentiable function on a compact set U ⊂ IR^n and ϕ₂ : X → IR is a twice continuously differentiable function such that the minimum eigenvalue of its Hessian matrix ∇²ₓₓϕ₂(x) is greater than ε (> 0) for all x ∈ X, where X ⊂ U. Then there exists a constant β̂ > 0 such that ϕ₁ + βϕ₂ is a strictly convex function on X for β > β̂.

Proof. (a) See [2, Chap III, Lemma 1.4]. (b) See [9, Theorem 3.1]. □

Proposition 2.1 Let ϕp and ψp be defined as in (3) and (4), respectively. Then, for any fixed p ≥ 2, the following hold.

(a) The function ϕp(1 + t, 1 − t) is strictly convex for all t ∈ IR.

(b) The function ψp(1 + t, 1 − t) is strictly convex for all t ∉ [−√(2^{1/3} − 1), √(2^{1/3} − 1)].

Proof. (a) It is known that ϕp is a convex function [4, 5, 6]. Note that ϕp(1 + t, 1 − t) is a composition of ϕp and an affine function; thus it is convex (the composition of two convex functions is not necessarily convex, but our case does guarantee convexity because one of them is affine).


(b) Due to the symmetry of ψp(1 + t, 1 − t) in t, it is enough to show that ψp(1 + t, 1 − t) is strictly convex for t ≥ √(2^{1/3} − 1). To proceed, we discuss two cases.

(i) If t ≥ 1, the function ψp(1 + t, 1 − t) can be regarded as a composite function of ϕp(1 + t, 1 − t) and h(·) = (1/2)(·)². Because h(·) is a nondecreasing convex function on [0, ∞) and ϕp(1 + t, 1 − t) is positive and strictly convex for t ≥ 1, from Lemma 2.3 we obtain that ψp(1 + t, 1 − t) is strictly convex for t ≥ 1.

(ii) If √(2^{1/3} − 1) ≤ t < 1, we know that

−ψp′(1 + t, 1 − t) = −ϕp(1 + t, 1 − t) ϕp′(1 + t, 1 − t),
−ψp″(1 + t, 1 − t) = −[ϕp′(1 + t, 1 − t)]² − ϕp(1 + t, 1 − t) ϕp″(1 + t, 1 − t).

Then it suffices to show that −ψp″(1 + t, 1 − t) < 0 for p ≥ 2. To this end, we compute the third derivative of ϕp(1 + t, 1 − t) with respect to t and prove that it is negative. To see this,

ϕp‴(1 + t, 1 − t) = [4((1 + t)^p + (1 − t)^p)^{1/p} (1 + t)^p (1 − t)^p (p − 1) / ((1 + t)³(t − 1)³((1 + t)^p + (1 − t)^p)³)] × T(p, t),  (14)

where T is a real-valued function defined by

T(p, t) = (1 + t)^p(2p − 1 − 3t) − (1 − t)^p(2p + 3t − 1).

It is not hard to verify that the first factor on the right side of (14) is always negative for all p ≥ 2. Thus, we only need to show that T(p, t) > 0 for all p ≥ 2, which is equivalent to verifying T(2, t) > 0 and T(p, t) > T(2, t) for all p > 2. These can be done as below.

(i) Because T(2, t) = 6t − 6t³, it is clear that T(2, t) > 0 for 0 < t < 1.

(ii) To show that T(p, t) > T(2, t) for p > 2, we first argue that

(1 + t)^p > (1 − t)^{p−1}(2p + 3t − 1)  ∀p > 2,  (15)

which is equivalent to showing that (1 + t)^p / [(1 − t)^{p−1}(2p + 3t − 1)] is greater than 1 for all p > 2. To this end, we consider the derivative of this quotient with respect to p:

(d/dp) { (1 + t)^p / [(1 − t)^{p−1}(2p + 3t − 1)] }
  = (1 + t)^p / [(1 − t)^{p−1}(2p + 3t − 1)²] × [(1 − 3t − 2p) ln(1 − t) + (2p + 3t − 1) ln(1 + t) − 2].  (16)

Observing that both factors on the right side of (16) are positive for all p > 2, and using (1 + t)^p / [(2p + 3t − 1)(1 − t)^{p−1}] > 1 when p = 2, we achieve (15). Secondly, we know that

2p − 1 − 3t > 1 − t  ∀p > 2.  (17)

Combining (15) and (17) then yields T(p, t) > T(2, t) for all p > 2, and hence

ϕp‴(1 + t, 1 − t) < 0  ∀p ≥ 2.

Then applying Lemma 2.1, with f(t) = ϕp′(1 + t, 1 − t) and g(t) = −ϕp(1 + t, 1 − t), gives the desired result. □

The result of Proposition 2.1(b) could be improved in some sense. More specifically, the interval on which ψp(1 + t, 1 − t) is strictly convex varies as p changes. We originally wished to determine the exact interval of strict convexity of ψp(1 + t, 1 − t) for each p. However, it is very hard to find a closed form depending on p that reflects this feature (indeed, it may not be possible, in our opinion). As a compromise, we find an appropriate common interval for all p ≥ 2, as shown in Proposition 2.1(b). The following two figures (Figures 7-8) depict the geometric view of what we just mentioned.

Figure 7: The graphs of ψp(1 + t, 1 − t) for p = 1.1, 1.5, 2, 3, 10.

3 Global continuation algorithm for BQP

Figure 8: The graph of ψp(1 + t, 1 − t) with a fixed p.

Since the logarithmic barrier function is strictly convex, and in view of Proposition 2.1, we now introduce the quadratic penalty ∑_{i=1}^n ψp(1 + x_i, 1 − x_i) for the equality constraints and the logarithmic barrier ∑_{i=1}^n [ln(1 + x_i) + ln(1 − x_i)] for the box constraints of (9), and construct the global smoothing function

ϕ(x, α, τ) = f(x) + α ∑_{i=1}^n ψp(1 + x_i, 1 − x_i) − τ ∑_{i=1}^n [ln(1 + x_i) + ln(1 − x_i)],  (18)

where τ > 0 is a barrier parameter and α > 0 is a penalty parameter. The next proposition establishes the strict convexity of ϕ(x, α, τ) on (−1, 1)^n when the barrier parameter is large enough, and its strict convexity on a large subset of its domain when the penalty parameter is large enough.
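For illustration, (18) can be transcribed directly as follows. This is our own sketch assuming NumPy; the function name and the quadratic objective f(x) = x^T Q x + c^T x are our conventions, and x must lie in (−1, 1)^n for the barrier term to be defined.

```python
import numpy as np


def global_smoothing(x, alpha, tau, Q, c, p=2.0):
    """phi(x, alpha, tau) of (18): objective + quadratic penalty + log-barrier."""
    a, b = 1.0 + x, 1.0 - x                      # both positive on (-1, 1)^n
    phi = (a ** p + b ** p) ** (1.0 / p) - (a + b)
    psi = 0.5 * phi ** 2
    f = x @ (Q @ x) + c @ x                      # f(x) = x'Qx + c'x
    return f + alpha * psi.sum() - tau * (np.log(a) + np.log(b)).sum()
```

At x = 0 the barrier term vanishes and each penalty term equals (1/2)(2^{1/p} − 2)², which gives a quick sanity check.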

Proposition 3.1 Let ϕ(x, α, τ) be the function defined by (18). Then the following hold.

(a) There exists a constant τ̂ > 0 such that if τ > τ̂ and α > 0, then ϕ(x, α, τ) is strictly convex on (−1, 1)^n.

(b) There exists a constant α̂ > 0 such that if α > α̂ and τ > 0, then ϕ(x, α, τ) is strictly convex on the set D := { x ∈ (−1, 1)^n : |x_i| > √(2^{1/3} − 1), i = 1, 2, ..., n }.

Proof. (a) Set

ϕa(x) := f(x) + α ∑_{i=1}^n ψp(1 + x_i, 1 − x_i),
ϕb(x) := −∑_{i=1}^n [ln(1 + x_i) + ln(1 − x_i)],

so that ϕ(x, α, τ) = ϕa(x) + τ ϕb(x). The Hessian matrix of ϕb(x) at any x ∈ (−1, 1)^n is given by

∇²ₓₓϕb(x) = diag( 1/(1 − x₁)² + 1/(1 + x₁)², ..., 1/(1 − x_n)² + 1/(1 + x_n)² ),

where diag(·) denotes a diagonal matrix with the listed components as diagonal elements. Moreover, the function 1/(1 − x_i)² + 1/(1 + x_i)² attains its minimum 2 at x_i = 0, and so every diagonal element of ∇²ₓₓϕb(x) is at least 2. Thus, letting U = [−1, 1]^n, ε = 2 and using Lemma 2.3(b) yields the desired result.

(b) Set

ϕa(x) := f(x)  and  ϕb(x) := ∑_{i=1}^n ψp(1 + x_i, 1 − x_i) − (τ/α) ∑_{i=1}^n [ln(1 + x_i) + ln(1 − x_i)],

so that ϕ(x, α, τ) = ϕa(x) + α ϕb(x). From the proof of Lemma 2.2, it follows that

∇²ₓₓ( ∑_{i=1}^n ψp(1 − x_i, 1 + x_i) ) = diag( ψp″(1 − x₁, 1 + x₁), ..., ψp″(1 − x_n, 1 + x_n) ),

where ψp″(1 − x_i, 1 + x_i) can be found in (11). Now, taking f(t) = ϕp′(1 + t, 1 − t), g(t) = −ϕp(1 + t, 1 − t) and applying part (ii) of the proof of Lemma 2.1, we have

ψp″(1 − t, 1 + t) > ψ₂″(1 − t, 1 + t)  ∀p > 2.

In addition, from [11, Lemma 3.1], we also have

ψ₂″(1 − t, 1 + t) = 2 − 8/√((2t² + 2)³) > 0.0004  ∀|t| > 0.51.

Therefore, the above two inequalities imply

ψp″(1 − t, 1 + t) > 0.0004  ∀|t| > 0.51 and ∀p ≥ 2.

This indicates that every diagonal element of ∇²ₓₓ(∑_{i=1}^n ψp(1 + x_i, 1 − x_i)) is at least 0.0004 on D. Using the fact that the Hessian matrix of −(τ/α) ∑_{i=1}^n [ln(1 + x_i) + ln(1 − x_i)] is positive definite, we obtain that every diagonal element of ∇²ₓₓϕb is at least 0.0004 on D. Now, taking

U = [−1, 1]^n, X = D and ε = 0.0004,

and applying Lemma 2.3(b) gives the desired conclusion. □
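The closed-form bound quoted from [11, Lemma 3.1] is easy to sanity-check numerically; the following spot check (ours, not a proof) evaluates ψ₂″(1 − t, 1 + t) = 2 − 8/(2t² + 2)^{3/2} at a few sample points with |t| > 0.51.

```python
# psi''_2(1 - t, 1 + t) = 2 - 8/(2 t^2 + 2)^(3/2); check it exceeds 0.0004
# at a few sample points with |t| > 0.51, as used in the proof above.
def psi2_second(t):
    return 2.0 - 8.0 / (2.0 * t * t + 2.0) ** 1.5


for t in (0.511, 0.52, 0.6, 0.9, -0.511, -0.75):
    assert psi2_second(t) > 0.0004
```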

As remarked in [11], the result of Proposition 3.1 motivates using the function ϕ(x, α, τ) to develop a global continuation algorithm for the constrained optimization problem (9). This method generates a global optimal solution, or at least a desirable local solution, via a sequence of unconstrained minimizations

min_{x ∈ IR^n} ϕ(x, α_k, τ_k)  (19)

with an increasing penalty parameter sequence {α_k} and a decreasing barrier parameter sequence {τ_k}. Note that, to ensure the strict convexity of ϕ(x, α_k, τ_k), we have to start the algorithm with a sufficiently large initial value τ₀. As the iteration goes on, the convexity contributed by the logarithmic barrier −τ_k ∑_{i=1}^n [ln(1 + x_i) + ln(1 − x_i)] becomes weak, but the strict convexity of ϕ(x, α_k, τ_k) can still be guaranteed by the increasing penalty parameter α_k. This means that, for each k ∈ IN, the minimization problem (19) can be solved easily if the parameters α and τ are adjusted skillfully.

Algorithm 3.1

Step 0 Given parameters α₀, τ₀, σ₁ > 1, σ₂ ∈ (0, 1) and ϵ > 0. Select a starting point x̂⁰ and set k = 0.

Step 1 Solve the unconstrained minimization problem (19) with the starting point x̂ᵏ, and denote by xᵏ its optimal solution.

Step 2 If √(∑_{i=1}^n ψp(1 + x_iᵏ, 1 − x_iᵏ)) ≤ ϵ, terminate the algorithm; else go to Step 3.

Step 3 Update the parameters α_{k+1} = σ₁α_k and τ_{k+1} = σ₂τ_k.

Step 4 Set x̂^{k+1} = xᵏ, k = k + 1 and go to Step 1.
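A minimal runnable sketch of Algorithm 3.1 is given below. It is ours, not the authors' Matlab code: a crude projected gradient descent with a conservative step size stands in for the BFGS inner solver of Step 1, the parameter defaults are illustrative only, and the gradient is written for general p > 1.

```python
import numpy as np


def solve_bqp(Q, c, p=2.0, alpha=0.5, tau=1.0, sigma1=2.0, sigma2=0.5,
              eps=1e-3, max_outer=40):
    """Continuation sketch of Algorithm 3.1 for min x'Qx + c'x, x in {-1,1}^n."""
    n = len(c)
    x = np.full(n, 0.9)                          # starting point, as in Sec. 4

    def grad(x, alpha, tau):
        a, b = 1.0 + x, 1.0 - x
        s = a ** p + b ** p
        phi = s ** (1.0 / p) - 2.0               # phi_p(1 + x_i, 1 - x_i)
        dphi = (a ** (p - 1.0) - b ** (p - 1.0)) * s ** (1.0 / p - 1.0)
        return (2.0 * Q @ x + c                  # objective gradient
                + alpha * phi * dphi             # penalty gradient
                - tau * (1.0 / a - 1.0 / b))     # barrier gradient

    for _ in range(max_outer):
        for _ in range(1000):                    # Step 1 (stand-in inner solver)
            g = grad(x, alpha, tau)
            step = 0.05 / (1.0 + np.abs(g).max())
            x_new = np.clip(x - step * g, -0.999999, 0.999999)
            if np.abs(x_new - x).max() < 1e-12:
                break
            x = x_new
        phi = ((1.0 + x) ** p + (1.0 - x) ** p) ** (1.0 / p) - 2.0
        if np.sqrt((0.5 * phi ** 2).sum()) <= eps:   # Step 2 stopping test
            break
        alpha *= sigma1                          # Step 3 updates
        tau *= sigma2
    return np.where(x >= 0.0, 1.0, -1.0)         # round to {-1, 1}
```

On a tiny separable instance such as Q = 0, c = (−1, 1), the iterates drift toward the binary optimizer (1, −1) as τ shrinks and α grows, and the final rounding recovers it.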

Is Algorithm 3.1 well defined? To answer this, we give an existence theorem for solutions of the unconstrained minimization problem (19). Its proof can in fact be found in [11, Lemma 3.2]; we give a brief proof here for completeness.

Proposition 3.2 Let ϕ(x, α_k, τ_k) be the function defined as in (18). Then the following hold.

(a) For each k ∈ IN, the minimization problem (19) has a solution xᵏ.

(b) There exists a τ̂ such that the solution of problem (19) in (a) is unique when τ_k > τ̂.

Proof. (a) Take any nonempty compact subset X₁ of (−1, 1)^n. Since ϕ(x, α_k, τ_k) is continuous and X₁ is compact, there exist two real numbers L₁ and U₁ such that

L₁ ≤ ϕ(x, α_k, τ_k) ≤ U₁  ∀x ∈ X₁.

On the other hand, we note that ϕ(x, α_k, τ_k) → +∞ when x_{i₀} → −1⁺ or x_{i₀} → 1⁻ for some i₀ ∈ {1, 2, ..., n}. Hence, the continuity of ϕ(x, α_k, τ_k) implies that there exists a δ with 0 < δ < 1/4 such that

ϕ(x, α_k, τ_k) ≥ U₁  ∀x ∈ (−1, 1)^n \ [−1 + δ, 1 − δ]^n.  (20)

Let X = [−1 + δ, 1 − δ]^n. Again, ϕ(x, α_k, τ_k) being continuous on the compact set X implies that there exists an x̂ ∈ X such that, for each k ∈ IN,

ϕ(x̂, α_k, τ_k) ≤ ϕ(x, α_k, τ_k)  ∀x ∈ X.

Moreover, due to X₁ ⊆ X, we know

ϕ(x̂, α_k, τ_k) ≤ U₁.  (21)

Combining (20) and (21) yields

ϕ(x̂, α_k, τ_k) ≤ ϕ(x, α_k, τ_k)  ∀x ∈ (−1, 1)^n \ X.

Thus, x̂ is exactly the desired solution xᵏ.

(b) From Proposition 3.1(a), ϕ(x, α_k, τ_k) is strictly convex on (−1, 1)^n when τ_k > τ̂. Hence xᵏ is unique. □

4 Numerical experiments

In this section, we report numerical results of Algorithm 3.1 for solving the unconstrained binary quadratic programming problem. Our numerical experiments are carried out in Matlab (version 7.8) on a PC with an Intel Core 2 Q8200 2.33 GHz CPU and 2.00 GB memory.

In our numerical experiments, we employ the BFGS algorithm with strong Wolfe-Powell line search to solve the unconstrained minimization problem (19), and terminate the current iteration once xᵏ satisfies the criterion

‖∇ₓϕ(xᵏ, α_k, τ_k)‖ ≤ 5.0e−3.


The values of the parameters involved in Algorithm 3.1 are chosen as follows:

α₀ = 0, σ₁ = 2, σ₂ = 0.5, ϵ = 1.0e−3,

and the initial barrier parameter τ₀ varies with the scale of the problems (we choose its value the same as in [11]). The starting point x̂⁰ = 0.9(1, 1, ..., 1)^T ∈ IR^n is used for all test problems. To obtain an integer solution x from the final iterate x̂ of Algorithm 3.1, we let

x_i = −1 if |x̂_i + 1| ≤ 1.0e−2,
x_i =  1 if |x̂_i − 1| ≤ 1.0e−2,   for i = 1, 2, ..., n.

The test problems are all from the OR-Library and have the formulation

max z^T Q z
s.t. z_i ∈ {0, 1}, i = 1, 2, ..., n.

To solve these problems with Algorithm 3.1, we use the transformation z = (x + e)/2 to rewrite them in the form

−min −(1/4)x^T Q x − (1/2)x^T Q e − (1/4)e^T Q e
 s.t. x_i ∈ {−1, 1}, i = 1, 2, ..., n.
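The transformation can be verified numerically. The following check is our own (with a random symmetric Q as a stand-in for an OR-Library instance); it confirms on every binary vertex that maximizing z^T Q z over {0, 1}^n matches the displayed minimization over {−1, 1}^n.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q = rng.integers(-5, 6, size=(n, n)).astype(float)
Q = (Q + Q.T) / 2.0                        # the OR-Library Q is symmetric
e = np.ones(n)


def max_form(z):                           # max z'Qz over z in {0, 1}^n
    return z @ Q @ z


def min_form(x):                           # transformed form over {-1, 1}^n
    return -0.25 * (x @ Q @ x) - 0.5 * (x @ Q @ e) - 0.25 * (e @ Q @ e)


for bits in range(2 ** n):                 # exhaustive check on all vertices
    z = np.array([(bits >> i) & 1 for i in range(n)], dtype=float)
    x = 2.0 * z - e                        # inverse of z = (x + e)/2
    assert abs(max_form(z) + min_form(x)) < 1e-9
```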

The optimal values generated by Algorithm 3.1 with different p (p = 1.1, 2, 4, 5, 10, 20, 50, 100) are listed in Tables 1-5 (see Appendix D), where an empty entry means that the algorithm fails to obtain an optimal solution before the maximum CPU time is reached. Moreover, to present an objective evaluation and comparison of the performance of Algorithm 3.1 with different p, we adopt the performance profile introduced in [7]. In particular, we regard Algorithm 3.1 with a given p as a solver, and assume that there are n_s solvers and n_j test problems from the OR-Library collection J. We use the optimal values computed by Algorithm 3.1 as the performance measure for different p. For each problem j and solver s, let

t_{j,s} := the optimal value of problem j by solver s,  µ_{j,s} := 1/t_{j,s}.

We compare the performance on problem j by solver s with the best performance by any of the n_s solvers on this problem; that is, we employ the performance ratio

r_{j,s} := µ_{j,s} / min{ µ_{j,s} : s ∈ S } = max{ t_{j,s} : s ∈ S } / t_{j,s},

where S is the set of eight solvers. An overall assessment of each solver is obtained from

ρ_s(τ) := (1/n_j) |{ j ∈ J : r_{j,s} ≤ τ }|,

which is the probability for solver s that the performance ratio r_{j,s} is within a factor τ of the best attained ratio.

Figure 9 shows the performance profile of the reciprocals of the optimal values obtained by Algorithm 3.1 on the interval [1, 1.04] for the eight solvers on 50 test problems. The eight solvers correspond to Algorithm 3.1 with p = 1.1, 2, 4, 5, 10, 20, 50 and 100, respectively. From this figure, we see that Algorithm 3.1 is considerably efficient no matter which value of p is chosen. In fact, Algorithm 3.1 with the aforementioned p values can solve all 50 test problems, except for p = 5, 20, 100.

Moreover, Algorithm 3.1 with p = 4 has the best numerical performance (the highest probability of being the optimal solver); the probability of its being the winner on a given BQP is around 0.48. Besides, p = 1.1 and p = 2 have performance comparable to p = 4; please refer to Appendix D for more detailed numerical reports.


Figure 9: Performance profile of the reciprocals of optimal values by Algorithm 3.1 with different p.

References

[1] B. Alidaee, G. Kochenberger and A. Ahmadian, 0−1 quadratic programming approach for the optimal solution of two scheduling problems, International Journal of Systems Science, 25 (1994), 401-408.
