Duan Li

Department of Systems Engineering & Engineering Management

The Chinese University of Hong Kong

Presented at National Cheng Kung University

March 24, 2011

Outline

• Binary quadratic program and maximum cut problem

• Polynomially solvable subclasses of BQP

• Exploiting geometric properties of BQP

• SDP relaxation

• Goemans and Williamson’s bound and Nesterov’s SDP bound

• Further improvement of the SDP bound

Binary quadratic optimization and max-cut problem

• A general 0-1 or binary quadratic problem can be expressed as:

(P )  min f (x) = (1/2) x^{T}Qx + c^{T}x
      s.t. x ∈ X,

where Q is a symmetric n × n matrix, c ∈ R^{n} and X is
either {−1, 1}^{n} or {0, 1}^{n}. Note

x^{2}_{i} = 1, x_{i} ∈ {−1, 1} or x^{2}_{i} = x_{i}, x_{i} ∈ {0, 1}.

Problem (P ) is in general NP-hard even if the matrix Q is
positive definite, since x^{T}Qx = x^{T} (Q + diag(λ))x − e^{T} λ
if x_{i} ∈ {−1, 1} or x^{T} Qx = x^{T} (Q + diag(λ))x − λ^{T} x if
x_{i} ∈ {0, 1}.

Max-cut problem

– Given a graph G = (V, E), find a partition of the nodes
of V into two disjoint sets V_{1} and V_{2} (V_{1} ∩ V_{2} = ∅,
V_{1} ∪ V_{2} = V ) in such a way to maximize the weights
of edges that have one endpoint in V_{1} and the other in
V_{2}.

– Let w_{ij} be the weight of edge (i, j), taken to be zero
if nodes i and j are not connected.

Define:

y_{i} = −1 ⇐⇒ i ∈ V_{1},   y_{i} = 1 ⇐⇒ i ∈ V_{2}.
– The max-cut problem can be written as:

max (1/4) Σ_{i,j} w_{ij}(1 − y_{i}y_{j})

s.t. y_{i} ∈ {−1, 1}, i = 1, . . . , n.

– The max-cut problem is also APX-hard, meaning that there is no polynomial-time approximation scheme (PTAS) for it unless P = NP.

Maximal stable set

– Given an undirected graph G = (V, E).

Figure 1: Maximal stable set of 9 blue nodes in a 24-node graph

– Independent subset of a graph: a subset of nodes no two of which are linked by an edge in E. Maximal stable set: an independent subset with the maximum number of nodes; its size is denoted α(G).

– Define

(SQP )  min x^{T}(I + A)x

        s.t. e^{T}x = 1, x ≥ 0,

where A is the adjacency matrix of G. Then α(G) = 1/v(SQP ).

– The maximal stable set problem is NP-hard and is closely related to the quadratic knapsack problem.
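As a sanity check of α(G) = 1/v(SQP ) on a toy graph, one can run a coarse grid search over the simplex — exponential in n and purely illustrative, not a practical method:

```python
import numpy as np
from itertools import product

def sqp_value(A, steps=60):
    """Grid search for v(SQP) = min x^T (I + A) x over the simplex (coarse sketch)."""
    n = A.shape[0]
    M = np.eye(n) + A
    best = np.inf
    # enumerate lattice points k/steps on the simplex
    for k in product(range(steps + 1), repeat=n - 1):
        if sum(k) <= steps:
            x = np.array(list(k) + [steps - sum(k)], dtype=float) / steps
            best = min(best, x @ M @ x)
    return best

# path graph 1-2-3: the maximal stable set is {1, 3}, so alpha = 2
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
v = sqp_value(A)
assert abs(1.0 / v - 2.0) < 1e-9
```

The minimum 1/α = 0.5 is attained by spreading x uniformly over the stable set {1, 3}, and the grid contains that point exactly.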

Polynomially solvable subclasses of BQP

• As the binary quadratic programming program is NP-hard in general, identifying polynomially solvable subclasses of binary quadratic programming problems not only offers theoretical insight into the complicated nature of the problem, but also provides platforms to design relaxation schemes for exact solution methods.

Problems with all off-diagonal elements of Q being non-positive

• Consider a subclass of problem (0-1QP ) where all off-diagonal
elements of Q are non-positive and q_{ii} = 0, i = 1, . . ., n.

• Since x_{i}x_{j} = min(x_{i}, x_{j}) when x_{i}, x_{j} ∈ {0, 1}, and q_{ij} ≤ 0
for 1 ≤ i < j ≤ n, the problem reduces to a linear programming
problem with a totally unimodular constraint matrix and an
integral right-hand side:

min Σ_{i=1}^{n} c_{i}x_{i} + Σ_{1≤i<j≤n} q_{ij}z_{ij}

s.t. z_{ij} ≤ x_{i}, 1 ≤ i < j ≤ n,
     z_{ij} ≤ x_{j}, 1 ≤ i < j ≤ n,
     x_{i}, x_{j}, z_{ij} ∈ {0, 1}, 1 ≤ i < j ≤ n.

• The LP relaxation (over [0, 1]) generates an integer solution!
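A small numerical illustration using scipy’s linprog (the instance is hypothetical, with q_{ij} ≤ 0 throughout): the LP relaxation over [0, 1] returns a 0-1 vertex, as total unimodularity predicts.

```python
import numpy as np
from scipy.optimize import linprog

# variables: x1, x2, x3, z12, z13, z23 (hypothetical instance, q_ij <= 0)
c_obj = np.array([1.0, -2.0, 1.0, -3.0, -1.0, -2.0])
# z_ij <= x_i and z_ij <= x_j, written as -x_i + z_ij <= 0
A_ub = np.array([
    [-1, 0, 0, 1, 0, 0],   # z12 <= x1
    [0, -1, 0, 1, 0, 0],   # z12 <= x2
    [-1, 0, 0, 0, 1, 0],   # z13 <= x1
    [0, 0, -1, 0, 1, 0],   # z13 <= x3
    [0, -1, 0, 0, 0, 1],   # z23 <= x2
    [0, 0, -1, 0, 0, 1],   # z23 <= x3
], dtype=float)
res = linprog(c_obj, A_ub=A_ub, b_ub=np.zeros(6),
              bounds=[(0.0, 1.0)] * 6, method="highs")
# the vertex solution is integral: x = z = (1, 1, 1), objective -6
assert np.allclose(res.x, np.round(res.x), atol=1e-7)
```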

Homogeneous problems with fixed rank Q

• We consider now a subclass of homogeneous problems where Q is negative semidefinite and rank(Q) = d.

• Let G = −Q ⇒ there exists a full row rank d × n matrix V
such that G = V^{T}V .

• ⇒ Equivalent problem:

(BQP_{fr})  max_{x∈{0,1}^{n}} x^{T}Gx = x^{T}V^{T}Vx = Σ_{i=1}^{d} (v_{i}x)^{2},  (1)

where v_{i} is the i-th row of matrix V .

• If d is equal to 1, i.e., the matrix G is of rank one with
G = v_{1}^{T}v_{1}, the solution to (BQP_{fr}) can be easily found by
inspection. More specifically, we only need to select x such
that the absolute value of v_{1}x is maximized on {0, 1}^{n}.
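The d = 1 inspection rule is a one-liner; a minimal sketch (v is an arbitrary illustrative vector):

```python
import numpy as np

def rank_one_bqp(v):
    """Maximize (v x)^2 over x in {0,1}^n: take the positive or negative support."""
    x_pos = (v > 0).astype(float)   # maximizes v x
    x_neg = (v < 0).astype(float)   # minimizes v x
    return max((x_pos, x_neg), key=lambda x: float(v @ x) ** 2)

v = np.array([3.0, -1.0, 2.0, -5.0])
x = rank_one_bqp(v)
assert (v @ x) ** 2 == 36.0   # x = (0, 1, 0, 1) gives v x = -6
```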

• In general cases with rank(G) = d > 1, we consider a
linear map Φ: x ∈ R^{n} → z = V x ∈ R^{d}, in which Φ
maps the hypercube [0, 1]^{n} into a convex polytope Z(V ) =
Φ([0, 1]^{n}) = {z ∈ R^{d} | z = V x, x ∈ [0, 1]^{n}}, known as a
zonotope.

• Note that

max_{x∈{0,1}^{n}} x^{T}Gx = max_{x∈{0,1}^{n}} Σ_{i=1}^{d} (v_{i}x)^{2} = max_{z∈Z(V )} Σ_{i=1}^{d} z_{i}^{2} = max_{z∈Z(V )} ||z||^{2}.

Thus, (BQP_{fr}) reduces to a problem of finding the maximum norm point in a zonotope.

• Let N_{ep}(Z) denote the number of extreme points of the
zonotope Z(V ). Then N_{ep}(Z) = O(n^{d−1}).

• All the extreme points of the zonotope Z(V ) can be found
by cell enumeration algorithms in discrete geometry with
complexity O(n^{d}).

• ⇒ Homogeneous problems with fixed rank Q are polynomially solvable.
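For small n the reduction can be verified by brute force (the polynomial method instead enumerates the extreme points of the zonotope; V below is a random rank-d factor chosen purely for illustration):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d, n = 2, 8
V = rng.standard_normal((d, n))       # illustrative rank-d factor
G = V.T @ V                           # G = V^T V is PSD of rank d
best_quad = best_norm = -np.inf
for bits in product([0.0, 1.0], repeat=n):
    x = np.array(bits)
    best_quad = max(best_quad, float(x @ G @ x))
    best_norm = max(best_norm, float(np.linalg.norm(V @ x) ** 2))
# maximizing x^T G x over {0,1}^n = maximizing ||z||^2 over z = V x
assert np.isclose(best_quad, best_norm)
```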

Problems with Q being a tridiagonal matrix are polynomially solvable

(BQP) defined by a series-parallel graph are polynomially solvable

• (BQP ) can always be reduced to a max-cut problem. Consider an instance of (BQP ) with

Q = [  0     1     0   −1.5 ]
    [  1     0     0    0   ]
    [  0     0     0   −0.5 ]
    [ −1.5   0   −0.5    0  ],    c = ( 2.5, −2, 3, 1.5 )^{T}

• A graph is called series-parallel if it is not contractible to
K_{4}.

• If graph G(Q) is series-parallel, then graph G(Q, c) is not
contractible to K_{5}.

Exploiting Geometric Properties of BQP

• Equivalent perturbed problem:

(P_{µ})  min_{x∈{0,1}^{n}} f_{µ}(x) = (1/2) x^{T}(Q + µI)x + (c − (1/2)µe)^{T}x,

where µ ≥ 0, I is the n × n identity matrix and e is the vector with all elements equal to 1.

– The shape of the contour changes when µ changes.

– The properties of (P_{µ}), e.g., the condition number,
change when µ changes.

[Figure: contours v = f_{µ}(x) for µ = 1, 2, 3, 5]

Properties of Perturbation Problem

• The ellipsoidal contour of f_{µ}(x) at level ṽ:

E_{µ}(ṽ) = {x ∈ R^{n} | f_{µ}(x) = ṽ}.

• The center of E_{µ}(ṽ):

x^{0}(µ) = −(Q + µI)^{−1}(c − (1/2)µe).  (2)

• Denote x^{0} = x^{0}(0) = −Q^{−1}c and t_{0}(µ) = f_{µ}(x^{0}(µ)). The
perturbed objective function can be rewritten as

f_{µ}(x) = (1/2)(x − x^{0}(µ))^{T}(Q + µI)(x − x^{0}(µ)) + t_{0}(µ).  (3)

Properties of Perturbation Problem

Theorem 1 Assume that x^{0} ≠ (1/2)e.

(i) t_{0}(µ) is a strictly concave function on [0, ∞).

(ii) Let d_{0}(µ) = ||x^{0}(µ) − (1/2)e||. Then d_{0}(µ) is a strictly
decreasing function on [0, ∞).

(iii) Let S = {x ∈ R^{n} | ||x − (1/2)e|| ≤ √n/2}. Let µ̄ be the
critical value such that x^{0}(µ̄) lies on the sphere, i.e.,
||x^{0}(µ̄) − (1/2)e|| = √n/2. If x^{0} ∉ S, then t_{0}(µ) is strictly
increasing on [0, µ̄] and strictly decreasing on [µ̄, ∞). Otherwise,
if x^{0} ∈ S, then t_{0}(µ) is strictly decreasing on [0, ∞).

Lower Bound Derived from the Minimum Distance Sphere

[Figure: contours f(x) = f(x^{c}), f(x) = ℓ(0), f(x) = f^{∗} over the unit square with vertices (0,0), (1,0), (0,1), (1,1), showing the center x^{0}, the nearest vertex x^{c}, and the minimum distance sphere S_{0}(0)]

Lower Bound Derived from the Minimum Distance Sphere

Theorem 2 Let f^{∗} denote the optimal value of (P ) and x̄(µ)
be the nearest 0-1 point to x^{0}(µ).

(i) For any µ ≥ 0, a lower bound of f^{∗} is given by

ℓ(µ) = ((λ_{1} + µ)/2) ||x̄(µ) − x^{0}(µ)||^{2} + t_{0}(µ),  (4)

where λ_{1} is the smallest eigenvalue of Q.

(ii) The limit of ℓ(µ) exists:

ℓ^{∞} ≡ lim_{µ→∞} ℓ(µ) = f((1/2)e) + nλ_{1}/8 − (x̄^{∞} − (1/2)e)^{T}Q(x^{0} − (1/2)e).

Lower Bound derived from Multiple Minimum Distance Spheres centered at the Longest Axis of the Contour

[Figure: contour E_{p}(η) with spheres centered at points α_{0}, α_{1}, α_{2}, α_{3} along the longest axis, intersection points z^{0}, z^{1}, z^{2}, and regions W^{0}, W^{1}, W^{2}, W^{3}]

• Let z^{i} be the intersection point of Ω^{i} and Ω^{i+1} and define
η = min{min_{i=0,...,m} h(z^{i}), h(u^{1}), h(u^{2})}.

• Let Ψ be the set of all points of {0, 1}^{n} in ∪^{m+1}_{i=0} S^{i}.

Theorem 3 Let P(v̄) = {x ∈ R^{n} | f(x) ≤ v̄}, where
v̄ = (1/2)η + f(x^{0}).

(i) If P(v̄) ∩ Ψ ≠ ∅, then x̃ ∈ arg min_{x∈P(v̄)∩Ψ} f(x) is an
optimal solution to (P ).

(ii) If P(v̄) ∩ Ψ = ∅, then v̄ is a lower bound of f^{∗}.

SDP Problem

• Example of Linear program and semidefinite program (SDP):

(LP ) min 2x_{1} + x_{2} + x_{3}
s.t. x_{1} + x_{2} + x_{3} = 1

(x_{1}, x_{2}, x_{3}) ≥ 0.

(SDP ) min 2x_{1} + x_{2} + x_{3}
s.t. x_{1} + x_{2} + x_{3} = 1

x_{1} x_{2}
x_{2} x_{3}

0.

Figure 2: Set of 3 × 3 positive semidefinite matrices with unit diagonal
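The toy (SDP ) above is small enough to check without an SDP solver: with x₁ + x₂ + x₃ = 1 the objective equals x₁ + 1, and feasibility of the 2 × 2 block can be tested via its eigenvalues on a grid (resolution is illustrative):

```python
import numpy as np

best = np.inf
for x1 in np.linspace(0.0, 1.0, 101):
    for x2 in np.linspace(-1.0, 1.0, 201):
        x3 = 1.0 - x1 - x2
        M = np.array([[x1, x2], [x2, x3]])
        if np.linalg.eigvalsh(M)[0] >= -1e-9:   # PSD check
            best = min(best, 2 * x1 + x2 + x3)
# the optimum is 1, attained at (x1, x2, x3) = (0, 0, 1)
assert abs(best - 1.0) < 1e-9
```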

• General form of SDP problem:

(SDP ) min C • X
       s.t. A_{i} • X = b_{i}, i = 1, . . . , m,
            X ⪰ 0,

where C and the A_{i} are given n × n symmetric matrices, the
b_{i} are given scalars, and

A • X = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}x_{ij} = Tr(A^{T}X).
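The • inner product can be sanity-checked directly (random symmetric matrices, illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
# entrywise sum a_ij x_ij equals the trace form
dot = sum(A[i, j] * X[i, j] for i in range(n) for j in range(n))
assert np.isclose(dot, np.trace(A.T @ X))
```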

• The dual of (SDP) is

(SDD) max b^{T}y
      s.t. Σ_{i=1}^{m} y_{i}A_{i} ⪯ C,

where y ∈ R^{m}. Or equivalently,

(SDD) max b^{T}y
      s.t. Σ_{i=1}^{m} y_{i}A_{i} + S = C,
           S ⪰ 0.

• Strong duality: v(SDP ) = v(SDD) if (SDP ) or (SDD) is strictly feasible.

• The SDP interior-point algorithm finds an ε-approximate solution in time linear in log(1/ε) and polynomial in m and n.

• MATLAB-based SDP solvers and modeling software: SeDuMi, CVX (practical for n ≤ 1000).

SDP relaxation for 0-1 QP

Consider the unconstrained case of (P ):

(BQP ) min f (x) = (1/2) x^{T}Qx + c^{T}x
       s.t. x ∈ {−1, 1}^{n}.

Two ways of constructing SDP relaxations:

• Lifting method. Note that

f (x) = (1/2)(1, x^{T}) Q̃ (1, x^{T})^{T} = (1/2) Q̃ • [(1, x^{T})^{T}(1, x^{T})],  where Q̃ = [ 0  c^{T}; c  Q ].

Lifting x to a symmetric matrix X = (1, x^{T})^{T}(1, x^{T}), we get

(1/2) x^{T}Qx + c^{T}x = (1/2) Q̃ • X,

x ∈ {−1, 1}^{n} ⇐⇒ X_{ii} = 1, i = 2, . . . , n + 1.

• Note also that

X = (1, x^{T})^{T}(1, x^{T}) ⇐⇒ X ⪰ 0, X_{11} = 1, rank(X) = 1.

Dropping rank(X) = 1, we obtain the following SDP relaxation for (BQP ):

(SDP_{0})  min (1/2) Q̃ • X
           s.t. X ⪰ 0, X_{ii} = 1, i = 1, . . . , n + 1.
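The lifting identity f(x) = (1/2) Q̃ • X can be verified numerically (random Q and c, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2
c = rng.standard_normal(n)
# bordered matrix Q_tilde = [0 c^T; c Q]
Qt = np.block([[np.zeros((1, 1)), c[None, :]], [c[:, None], Q]])
x = rng.choice([-1.0, 1.0], n)
v = np.concatenate(([1.0], x))
X = np.outer(v, v)                 # rank-one lifted matrix, X_11 = 1
f = 0.5 * x @ Q @ x + c @ x
assert np.isclose(0.5 * np.trace(Qt.T @ X), f)
```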

• Shor’s relaxation scheme. Note that (BQP ) is equivalent to

(BQP_{c}) min f (x) = (1/2) x^{T}Qx + c^{T}x
          s.t. x^{2}_{i} = 1, i = 1, . . . , n.

The Lagrangian dual problem of (BQP_{c}) is

(D) max_{λ∈R^{n}} min_{x∈R^{n}} (1/2) x^{T}[Q + 2diag(λ)]x + c^{T}x − e^{T}λ,

which is equivalent to an SDP problem:

(D) max τ
    s.t. [ Q + 2diag(λ)   c              ]
         [ c^{T}          −2e^{T}λ − 2τ  ]  ⪰ 0.

• (SDP_{0}) ⇐⇒ (D) (conic duality).

• The nature of the SDP relaxation is a nonlinear lifting to a higher dimensional space. Indeed, rather than solving the original quadratic problem in the n-dimensional vector space, we are instead solving a linear problem in the n × n matrix space.

• If we find an optimal solution X of the SDP relaxation that has rank one, then we have solved the original problem.

• In general, it is not the case that the optimal solution of the SDP relaxation will be rank one. However, it is possible to use rounding schemes to obtain nearby rank one solutions.

Furthermore, in some cases, it is possible to do so while obtaining some approximation guarantees on the quality of the rounded solutions.

Goemans and Williamson’s bound

• Basic questions:

– Approximation guarantees: Is it possible to prove gen- eral properties on the quality of the bounds obtained by SDP?

– Feasible solutions: Can we (somehow) use the SDP re- laxations to provide not just bounds, but actual feasible points with good (or optimal) values of the objective?

• In their celebrated MAXCUT paper (JACM, 1995), Goemans and Williamson developed the following randomized method for finding a good feasible cut from the solution of the SDP:

– Factorize X as X = V^{T}V , where V = [v_{1}, . . . , v_{n}] ∈ R^{r×n}
and r is the rank of X.

– Then X_{ij} = v_{i}^{T}v_{j}, and, since X_{ii} = 1, this factorization
gives n vectors v_{i} on the unit sphere in R^{r}.

– Now, choose a random hyperplane (passing through the
origin) in R^{r}, and assign to each variable x_{i} either +1
or −1, depending on which side of the hyperplane the
point v_{i} lies.

• It turns out that this procedure gives a solution that, on average, is quite close to the value of the SDP bound. The random hyperplane can be characterized by its normal vector p, which is chosen to be uniformly distributed on the unit sphere.

• The rounded solution is given by x_{i} = sign(p^{T}v_{i}). ⇒ The
expected value of this solution in x^{T}Wx:

E_{p}[x^{T}Wx] = Σ_{i,j} w_{ij} E_{p}[x_{i}x_{j}]
             = Σ_{i,j} w_{ij} E_{p}[sign(p^{T}v_{i}) · sign(p^{T}v_{j})]
             = (2/π) Σ_{i,j} w_{ij} arcsin X_{ij}.

• The objective of maximum cut is (1/4) Σ_{i,j} w_{ij}(1 − y_{i}y_{j}). ⇒

c_{sdp−expected} = (1/4) · (2/π) Σ_{i,j} w_{ij} arccos X_{ij}.

• On the other hand, the solution of the SDP gives an upper
bound on the cut capacity equal to:

c_{sdp−upperbound} = (1/4) Σ_{i,j} w_{ij}(1 − X_{ij}).

• Consider the problem of finding a constant α such that

α(1 − t) ≤ (2/π) arccos(t), ∀t ∈ [−1, 1].

This is

α = min_{t∈[−1,1]} [(2/π) arccos(t)] / (1 − t) = min_{θ∈[0,π]} [(2/π) θ] / (1 − cos θ).

• It can be shown that 0.87856 < α < 0.87857.

[Figure: plots of α(1 − t) and (2/π) arccos(t) on [−1, 1]]
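The constant α can be evaluated numerically from the θ-form of the minimization (a dense grid sketch):

```python
import numpy as np

theta = np.linspace(1e-6, np.pi, 1_000_001)
ratio = (2.0 / np.pi) * theta / (1.0 - np.cos(theta))
alpha = float(ratio.min())
# the minimum is attained near theta ≈ 2.33, giving alpha ≈ 0.87856
assert 0.87856 < alpha < 0.87857
```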

• Thus

c_{sdp−upperbound} = (1/4) Σ_{i,j} w_{ij}(1 − X_{ij}) ≤ (1/α) · (1/4) · (2/π) Σ_{i,j} w_{ij} arccos(X_{ij}) = (1/α) c_{sdp−expected}.

• So far we have the following inequalities:

c_{sdp−upperbound} ≤ (1/α) c_{sdp−expected},
c_{sdp−expected} ≤ c_{max},
c_{max} ≤ c_{sdp−upperbound}.

• Therefore

0.878 · c_{sdp−upperbound} ≤ c_{max}   (approximation ratio for SDP bound),

0.878 · c_{max} ≤ c_{sdp−expected}   (approximation ratio for feasible solution).
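A sketch of the full pipeline (factorization plus random-hyperplane rounding). To stay solver-free, the feasible X is hand-built for the 5-cycle with unit weights using unit vectors at angles 4πk/5, the known SDP optimum for C5 — an assumption for illustration, not the output of a solver:

```python
import numpy as np

def gw_round(X, W, trials=2000, seed=1):
    """Goemans-Williamson hyperplane rounding of a feasible SDP matrix X."""
    lam, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.clip(lam, 0.0, None))).T   # columns v_i, X_ij = v_i^T v_j
    rng = np.random.default_rng(seed)
    cuts = []
    for _ in range(trials):
        p = rng.standard_normal(V.shape[0])        # random hyperplane normal
        y = np.sign(p @ V)
        y[y == 0] = 1.0                            # break measure-zero ties
        cuts.append(0.25 * np.sum(W * (1 - np.outer(y, y))))
    return max(cuts), sum(cuts) / len(cuts)

n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0    # unit-weight 5-cycle
ang = 4.0 * np.pi * np.arange(n) / 5.0
V0 = np.vstack([np.cos(ang), np.sin(ang)])
X = V0.T @ V0                                      # feasible: PSD, unit diagonal
c_expected = 0.25 * (2 / np.pi) * np.sum(W * np.arccos(np.clip(X, -1.0, 1.0)))
best, avg = gw_round(X, W)
assert best == 4.0      # max cut of C5 is 4, and rounding finds it
```

For this particular X the arccos formula evaluates to exactly 4, matching the true max cut, so every sampled hyperplane recovers an optimal cut.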

Nesterov’s SDP bound

• In the MAXCUT problem, we are in fact maximizing the
homogeneous quadratic form (omitting the factor 1/4):

x^{T}Ax = Σ_{i,j=1}^{n} w_{ij}(1 − x_{i}x_{j}) = Σ_{i=1}^{n} (Σ_{j=1}^{n} w_{ij}) x_{i}^{2} − Σ_{i,j=1}^{n} w_{ij}x_{i}x_{j}

over {−1, 1}^{n}.
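The identity above says A is the weighted graph Laplacian diag(We) − W; a quick numerical check (random symmetric weights, illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
A = np.diag(W.sum(axis=1)) - W          # Laplacian of the weight matrix
for _ in range(10):
    x = rng.choice([-1.0, 1.0], n)
    # on {-1,1}^n, x^T A x equals the (doubled) cut weight sum
    assert np.isclose(x @ A @ x, np.sum(W * (1 - np.outer(x, x))))
```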

• Special properties of A:

– A ⪰ 0;

– A_{ij} ≤ 0 for all i ≠ j;

– Σ_{j=1}^{n} A_{ij} = 0.

• What happens if A is a general positive semidefinite matrix?

• Let A ⪰ 0, consider the problem:

(P ) max_{x∈{−1,1}^{n}} x^{T}Ax.

• The SDP relaxation of (P ) is

(SDP ) max A • X
       s.t. diag(X) = e, X ⪰ 0.

v(SDP )/v(P ) ≥ 1.

• Nesterov (1998). Let A ⪰ 0. Then

v(P ) ≤ v(SDP ) ≤ (π/2) v(P ).

Improvement of the SDP bound

– Saddle point characterization of optimality conditions:
Let x^{∗} ∈ {−1, 1}^{n} and λ^{∗} ∈ R^{n}. Then x^{∗} solves (P ),
λ^{∗} solves (D) and v(P ) = v(D) ⇔

(1) Q + 2diag(λ^{∗}) ⪰ 0;

(2) [Q + 2diag(λ^{∗})]x^{∗} = −c.

– Let

Q^{∗} = Q + 2diag(λ^{∗}),

C = {x ∈ R^{n} | Q^{∗}x = −c}.

Then, v(P ) = v(D) ⇔ {−1, 1}^{n} ∩ C ≠ ∅.

– A sufficient condition for the polynomial solvability of (P ):

Q^{∗} ≻ 0 ⇒ v(P ) = v(D), i.e. there is no duality gap.

– What if Q^{∗} ⊁ 0? The saddle point conditions motivate
us to define the distance

δ = dist({−1, 1}^{n}, C).

– δ = 0 ⇔ v(P ) = v(D).

– Can we improve the SDP bound v(D) if δ ≠ 0?


• Main result: δ > 0 ⇒ an improved lower bound:

ν_{s} = v(D) + (1/4) ξ_{r+1} δ^{2},

where 0 < ξ_{r+1} ≤ · · · ≤ ξ_{n} are the positive eigenvalues of
Q^{∗} = Q + 2diag(λ^{∗}).

• Now, the question is how to compute

δ = dist({−1, 1}^{n}, C)?

Figure 3: Partition of C into subsets C^{1}, C^{2}, . . . , C^{10}

Computation of distance δ

The idea is to partition the set C into certain subsets in each of which the signs of x ∈ C are the same.

⇓

The distance between {−1, 1}^{n} and each subset must be
achieved at sign(π), where π is an interior point of the
subset.

⇓

Via enumerating all the subsets and finding an interior point in each of the subsets, we obtain δ.
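For tiny n, δ can also be obtained by brute force over all sign vectors, projecting each onto the affine set C via the pseudoinverse — an exponential-time check of the cell-enumeration approach, not a replacement for it (the instance below is a hypothetical example):

```python
import numpy as np
from itertools import product

def delta(Qs, c):
    """Brute-force dist({-1,1}^n, C) with C = {x : Qs x = -c}, C assumed nonempty."""
    Qp = np.linalg.pinv(Qs)
    best = np.inf
    for v in product([-1.0, 1.0], repeat=Qs.shape[0]):
        v = np.array(v)
        # min-norm correction Qp (Qs v + c) projects v onto the affine set C
        best = min(best, float(np.linalg.norm(Qp @ (Qs @ v + c))))
    return best

# singular Q* (PSD but not PD): C = {x : x1 = 0.3}
Qs = np.array([[1.0, 0.0], [0.0, 0.0]])
c = np.array([-0.3, 0.0])
assert np.isclose(delta(Qs, c), 0.7)   # nearest vertices have x1 = 1
```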

• Let the solution set C be parametrized in R^{s} as

x_{j} = x^{0}_{j} + Σ_{i=1}^{s} G_{ij}z_{i}, j = 1, . . . , n.  (5)

• Consider n hyperplanes in R^{s}:

x^{0}_{j} + Σ_{i=1}^{s} G_{ij}z_{i} = 0, j = 1, . . . , n,  (6)

which constitute an arrangement of hyperplanes (in the
terminology of computational geometry). Finding all T_{x}’s
corresponds to finding all cells of the arrangement.

• It is known that the number of cells of a hyperplane
arrangement of n hyperplanes in R^{s} is O(n^{s}).

• For fixed s, the distance δ can be computed in polynomial time. Another polynomially solvable case is s = n − 1.

• Solution method for cell enumeration: reverse search algorithm (Avis and Fukuda (1996), Sleumer (1999)).