(1)

Duan Li

Department of Systems Engineering & Engineering Management

The Chinese University of Hong Kong

At National Cheng Kung University

March 24, 2011

(2)

Outline

• Binary quadratic program and maximum cut problem

• Polynomially solvable subclasses of BQP

• Exploiting geometric properties of BQP

• SDP relaxation

• Goemans and Williamson’s bound and Nesterov’s SDP bound

• Further improvement of the SDP bound

(3)

Binary quadratic optimization and max-cut problem

• A general 0-1 or binary quadratic problem can be expressed as:

(P)  min  f(x) = (1/2) x^T Q x + c^T x
     s.t. x ∈ X,

where Q is a symmetric n × n matrix, c ∈ R^n and X is either {−1, 1}^n or {0, 1}^n. Note that

x_i^2 = 1 for x_i ∈ {−1, 1},  and  x_i^2 = x_i for x_i ∈ {0, 1}.

Problem (P) is NP-hard in general even if the matrix Q is positive definite, since any instance can be shifted along the diagonal without changing its binary optima:

x^T Q x = x^T (Q + diag(λ)) x − e^T λ  if x_i ∈ {−1, 1},
x^T Q x = x^T (Q + diag(λ)) x − λ^T x  if x_i ∈ {0, 1}.

(4)

Max-cut problem

– Given a graph G = (V, E), find a partition of the nodes of V into two disjoint sets V1 and V2 (V1 ∩ V2 = ∅, V1 ∪ V2 = V ) so as to maximize the total weight of the edges that have one endpoint in V1 and the other in V2.

(5)

– Let w_ij be the weight of edge (i, j), with w_ij = 0 if nodes i and j are not connected.

Define:

y_i = −1 ⇐⇒ i ∈ V1,  y_i = 1 ⇐⇒ i ∈ V2.

– The max-cut problem can then be written as:

max  (1/4) Σ_{i,j} w_ij (1 − y_i y_j)
s.t. y_i ∈ {−1, 1}, i = 1, . . . , n.

– The max-cut problem is also APX-hard, meaning that there is no polynomial-time approximation scheme (PTAS) for it unless P = NP.
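As a sanity check on this ±1 formulation, the following brute-force sketch (assuming numpy; the toy graph is illustrative) evaluates (1/4) Σ w_ij (1 − y_i y_j) over all sign vectors:

```python
import itertools
import numpy as np

def max_cut_brute_force(W):
    """Enumerate y in {-1,1}^n and maximize (1/4) * sum_ij w_ij (1 - y_i y_j)."""
    n = W.shape[0]
    best_val, best_y = -np.inf, None
    for ys in itertools.product([-1, 1], repeat=n):
        y = np.array(ys, dtype=float)
        val = 0.25 * np.sum(W * (1 - np.outer(y, y)))
        if val > best_val:
            best_val, best_y = val, y
    return best_val, best_y

# Toy example: a triangle with unit weights; any nontrivial partition cuts 2 edges.
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
val, y = max_cut_brute_force(W)
print(val)  # 2.0
```

Note that the double sum counts each edge twice, which is why the 1/4 factor recovers the cut weight.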

(6)

Maximal stable set

– Given an undirected graph G = (V, E).

Figure 1: Maximal stable set of 9 blue nodes in a 24-node graph

(7)

– Independent subset of a graph: no two nodes in it are linked by an edge in E. Maximal stable set: an independent subset of maximum size; its size is denoted α(G).

– Define

(SQP)  min  x^T (I + A) x
       s.t. e^T x = 1, x ≥ 0,

where A is the adjacency matrix of G. Then α(G) = 1 / v(SQP).

– The maximal stable set problem is NP-hard and is closely related to the quadratic knapsack problem.

(8)

Polynomially solvable subclasses of BQP

• As the binary quadratic programming problem is NP-hard in general, identifying polynomially solvable subclasses not only offers theoretical insight into the complicated nature of the problem, but also provides platforms to design relaxation schemes for exact solution methods.

(9)

Problems with all off-diagonal elements of Q being non-positive

• Consider a subclass of problem (0-1QP) where all off-diagonal elements of Q are non-positive and q_ii = 0, i = 1, . . . , n.

• Since x_i x_j = min(x_i, x_j) when x_i, x_j ∈ {0, 1}, and q_ij ≤ 0 for 1 ≤ i < j ≤ n, the problem reduces to a linear program with a totally unimodular constraint matrix and an integral right-hand side:

min  Σ_{i=1}^n c_i x_i + Σ_{1≤i<j≤n} q_ij z_ij
s.t. z_ij ≤ x_i,  1 ≤ i < j ≤ n,
     z_ij ≤ x_j,  1 ≤ i < j ≤ n,
     x_i, x_j, z_ij ∈ {0, 1},  1 ≤ i < j ≤ n.

• The LP relaxation (with the binary constraints relaxed to [0, 1]) generates an integer solution!
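A quick numerical check of the linearization (a sketch assuming numpy; the random instance is illustrative): for binary x the substitution z_ij = min(x_i, x_j) reproduces the quadratic objective exactly, and with q_ij ≤ 0 it is the optimal choice of z under z_ij ≤ x_i, z_ij ≤ x_j:

```python
import itertools
import numpy as np

def check_linearization(q, c):
    """For every binary x, compare the quadratic objective with the
    linearized one where z_ij = min(x_i, x_j)."""
    n = len(c)
    for xs in itertools.product([0, 1], repeat=n):
        x = np.array(xs, dtype=float)
        quad = c @ x + sum(q[i, j] * x[i] * x[j]
                           for i in range(n) for j in range(i + 1, n))
        # With q_ij <= 0, maximizing q_ij * z_ij pushes z_ij up to min(x_i, x_j),
        # which equals x_i * x_j on binary points.
        lin = c @ x + sum(q[i, j] * min(x[i], x[j])
                          for i in range(n) for j in range(i + 1, n))
        if not np.isclose(quad, lin):
            return False
    return True

rng = np.random.default_rng(0)
n = 4
q = np.triu(-rng.random((n, n)), 1)   # nonpositive off-diagonal entries, zero diagonal
c = rng.standard_normal(n)
print(check_linearization(q, c))  # True
```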

(10)

Homogeneous problems with fixed rank Q

• We now consider a subclass of homogeneous problems where Q is negative semidefinite and rank(Q) = d.

• Let G = −Q ⇒ there exists a full-row-rank d × n matrix V such that G = V^T V.

• ⇒ Equivalent problem:

(BQP_fr)  max_{x∈{0,1}^n}  x^T G x = x^T V^T V x = Σ_{i=1}^d (v_i x)^2,   (1)

where v_i is the i-th row of matrix V.

(11)

• If d is equal to 1, i.e., the matrix G is of rank one with G = v_1^T v_1, the solution to (BQP_fr) can be easily found by inspection. More specifically, we only need to select x such that the absolute value of v_1 x is maximized on {0, 1}^n.
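This selection rule can be sketched as follows (assuming numpy; the helper name is ours): v_1 x ranges over subset sums of the entries of v_1, so |v_1 x| is maximized either by picking all positive entries or all negative entries:

```python
import itertools
import numpy as np

def rank_one_bqp(v1):
    """Maximize (v1 @ x)^2 over x in {0,1}^n by inspection:
    take either all positive entries or all negative entries of v1."""
    x_pos = (v1 > 0).astype(float)   # maximizes v1 @ x
    x_neg = (v1 < 0).astype(float)   # minimizes v1 @ x
    return max([x_pos, x_neg], key=lambda x: (v1 @ x) ** 2)

# Sanity check against brute force on a small random vector.
rng = np.random.default_rng(1)
v1 = rng.standard_normal(6)
best = max((v1 @ np.array(xs)) ** 2
           for xs in itertools.product([0, 1], repeat=6))
assert np.isclose((v1 @ rank_one_bqp(v1)) ** 2, best)
print("inspection solution matches brute force")
```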

• In general cases with rank(G) = d > 1, we consider a linear map Φ: x ∈ Rn → z = V x ∈ Rd, in which Φ maps the hypercube [0, 1]n into a convex polytope Z(V ) = Φ([0, 1]n) = {z ∈ Rd | z = V x, x ∈ [0, 1]n}, known as a zonotope.

• Note that

max_{x∈{0,1}^n} x^T G x = max_{x∈{0,1}^n} Σ_{i=1}^d (v_i x)^2 = max_{z∈Z(V)} Σ_{i=1}^d z_i^2 = max_{z∈Z(V)} ||z||^2.

Thus, (BQP_fr) reduces to the problem of finding the maximum norm over a zonotope.

(12)

• Let N_ep(Z) denote the number of extreme points of the zonotope Z(V). Then N_ep(Z) = O(n^(d−1)).

• All the extreme points of the zonotope Z(V) can be found by cell enumeration algorithms in discrete geometry with complexity O(n^d).

• ⇒ Homogeneous problems with fixed rank Q are polynomially solvable.

(13)

Problems with Q being a tridiagonal matrix are polynomially solvable

(BQP) defined by a series-parallel graph is polynomially solvable

• (BQP) can always be reduced to a max-cut problem. Consider an instance of (BQP) with

Q =
[  0     1     0    −1.5 ]
[  1     0     0     0   ]
[  0     0     0    −0.5 ]
[ −1.5   0    −0.5   0   ],

c = (2.5, −2, 3, 1.5)^T.

(14)

• A graph is called series-parallel if it is not contractible to K4.

• If graph G(Q) is series-parallel, then graph G(Q, c) is not contractible to K5.

(15)

Exploiting Geometric Properties of BQP

• Equivalent perturbed problem:

(Pµ)  min_{x∈{0,1}^n}  fµ(x) = (1/2) x^T (Q + µI) x + (c − (1/2)µe)^T x,

where µ ≥ 0, I is the n × n identity matrix and e is the vector with all elements equal to 1.

– The shape of contour changes when µ changes.

– The properties of (Pµ), e.g., the condition number, change when µ changes.

(16)

Figure: contours v = fµ(x) of the perturbed objective for µ = 1, 2, 3, 5.

(17)

Properties of Perturbation Problem

• The ellipse contour of fµ(x) at level ṽ:

Eµ(ṽ) = {x ∈ R^n | fµ(x) = ṽ}.

• The center of Eµ(ṽ):

x0(µ) = −(Q + µI)^(−1) (c − (1/2)µe).   (2)

• Denote x0 = x0(0) = −Q^(−1) c and t0(µ) = fµ(x0(µ)). The perturbed objective function can be rewritten as

fµ(x) = (1/2) (x − x0(µ))^T (Q + µI) (x − x0(µ)) + t0(µ).   (3)

(18)

Properties of Perturbation Problem

Theorem 1  Assume that x0 ≠ (1/2)e.

(i) t0(µ) is a strictly concave function on [0, ∞).

(ii) Let d0(µ) = ||x0(µ) − (1/2)e||. Then d0(µ) is a strictly decreasing function on [0, ∞).

(iii) Let S = {x ∈ R^n | ||x − (1/2)e|| ≤ √n/2}. Let µ̄ be the critical point on the sphere such that ||x0(µ̄) − (1/2)e|| = √n/2. If x0 ∉ S, then t0(µ) is strictly increasing on [0, µ̄] and strictly decreasing on [µ̄, ∞). Otherwise, if x0 ∈ S, then t0(µ) is strictly decreasing on [0, ∞).
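The drift of the center toward (1/2)e can be illustrated numerically (a sketch assuming numpy; the random Q and c are illustrative, and µ is offset so that Q + µI is positive definite and the center is well defined):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
Q = A + A.T                       # symmetric, possibly indefinite
c = rng.standard_normal(n)
e = np.ones(n)

def x0(mu):
    # Center of the contour of f_mu: x0(mu) = -(Q + mu*I)^{-1} (c - (mu/2) e)
    return -np.linalg.solve(Q + mu * np.eye(n), c - 0.5 * mu * e)

# Check that d0(mu) = ||x0(mu) - e/2|| decreases as mu grows.
mu0 = abs(np.linalg.eigvalsh(Q)[0]) + 1.0
mus = mu0 + np.array([0.0, 1.0, 5.0, 25.0, 125.0])
d0 = [np.linalg.norm(x0(mu) - 0.5 * e) for mu in mus]
assert all(d0[k] > d0[k + 1] for k in range(len(d0) - 1))
print([round(d, 4) for d in d0])
```

Indeed, x0(µ) − (1/2)e = −(Q + µI)^(−1)(c + Qe/2), whose norm shrinks as µ grows.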

(19)

Lower Bound Derived from the Minimum Distance Sphere

Figure: the minimum distance sphere S0(0) centered at x0, shown with the vertices (0,0), (1,0), (0,1), (1,1) of the unit square and the level sets f(x) = f(x_c) and f(x) = ℓ(0).

(20)

Lower Bound Derived from the Minimum Distance Sphere

Theorem 2  Let f* denote the optimal value of (P) and x̄(µ) be the nearest 0-1 point to x0(µ).

(i) For any µ ≥ 0, a lower bound of f* is given by

ℓ(µ) = ((λ1 + µ)/2) ||x̄(µ) − x0(µ)||^2 + t0(µ).   (4)

(ii) The limit of ℓ(µ) exists:

ℓ(∞) ≡ lim_{µ→∞} ℓ(µ) = f((1/2)e) + nλ1/8 − (x̄ − (1/2)e)^T Q (x0 − (1/2)e).

(21)

Lower Bound derived from Multiple Minimum Distance Spheres centered at the Longest Axis of the Contour

Figure: multiple minimum distance spheres centered along the longest axis of the contour.

(22)

Lower Bound

Figure: spheres centered at points α0, α1, α2, α3 along the longest axis, with intersection points z0, z1, z2, regions W0–W3, and the contour Ep(η).

(23)

• Let z_i be the intersection point of Ω_i and Ω_{i+1} and define η = min{min_{i=0,...,m} h(z_i), h(u_1), h(u_2)}.

• Let Ψ be the set of all points of {0, 1}^n in ∪_{i=0}^{m+1} S_i.

Theorem 3  Let P(v̄) = {x ∈ R^n | f(x) ≤ v̄}, where v̄ = (1/2)η + f(x0).

(i) If P(v̄) ∩ Ψ ≠ ∅, then x̃ ∈ arg min_{x∈P(v̄)∩Ψ} f(x) is an optimal solution to (P).

(ii) If P(v̄) ∩ Ψ = ∅, then v̄ is a lower bound of f*.

(24)

SDP Problem

• Example of Linear program and semidefinite program (SDP):

(LP)  min  2x1 + x2 + x3
      s.t. x1 + x2 + x3 = 1,
           (x1, x2, x3) ≥ 0.

(SDP) min  2x1 + x2 + x3
      s.t. x1 + x2 + x3 = 1,
           [ x1  x2 ]
           [ x2  x3 ] ⪰ 0.

(25)

Figure 2: Set of 3 × 3 positive semidefinite matrices with unit diagonal

(26)

• General form of SDP problem:

(SDP)  min  C • X
       s.t. A_i • X = b_i,  i = 1, . . . , m,
            X ⪰ 0,

where C and the A_i are given symmetric n × n matrices, the b_i are given scalars, and

A • X = Σ_{i=1}^n Σ_{j=1}^n a_ij x_ij = Tr(A^T X).

(27)

• The dual of (SDP) is

(SDD)  max  b^T y
       s.t. Σ_{i=1}^m y_i A_i ⪯ C,

where y ∈ R^m. Or equivalently,

(SDD)  max  b^T y
       s.t. Σ_{i=1}^m y_i A_i + S = C,  S ⪰ 0.

• Strong duality: v(SDP ) = v(SDD) if (SDP ) or (SDD) is strictly feasible.

(28)

• The SDP interior-point algorithm finds an ε-approximate solution in time that is linear in log(1/ε) and polynomial in m and n.

• SDP solvers and software based on MATLAB: SeDuMi, CVX (n ≤ 1000).

(29)

SDP relaxation for 0-1 QP

Consider the unconstrained case of (P ):

(BQP)  min  f(x) = (1/2) x^T Q x + c^T x
       s.t. x ∈ {−1, 1}^n.

Two ways of constructing SDP relaxations:

• Lifting method. Let Q̃ = [ 0  c^T ; c  Q ]. Note that

f(x) = (1/2) (1, x^T) Q̃ (1, x^T)^T = (1/2) Q̃ • [(1, x^T)^T (1, x^T)].

(30)

Lifting x to a symmetric matrix X = (1, x^T)^T (1, x^T), we get

(1/2) x^T Q x + c^T x = (1/2) Q̃ • X,

x ∈ {−1, 1}^n ⇐⇒ X_ii = 1, i = 2, . . . , n + 1.

• Note also

X = (1, x^T)^T (1, x^T) ⇐⇒ X ⪰ 0, X_11 = 1, rank(X) = 1.

Dropping rank(X) = 1, we obtain the following SDP relaxation for (BQP):

(SDP0)  min  (1/2) Q̃ • X
        s.t. X ⪰ 0, X_ii = 1, i = 1, . . . , n + 1.
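The lifting identity behind this relaxation can be verified numerically (a sketch assuming numpy; the random instance is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
Q = A + A.T
c = rng.standard_normal(n)

# Lifted matrix Qtilde = [[0, c^T], [c, Q]]
Qt = np.zeros((n + 1, n + 1))
Qt[0, 1:] = c
Qt[1:, 0] = c
Qt[1:, 1:] = Q

x = rng.choice([-1.0, 1.0], size=n)
u = np.concatenate(([1.0], x))      # (1, x^T)^T
X = np.outer(u, u)                  # rank-one lifting, diag(X) = e

f = 0.5 * x @ Q @ x + c @ x
assert np.isclose(0.5 * np.sum(Qt * X), f)   # (1/2) Qtilde • X equals f(x)
assert np.allclose(np.diag(X), 1.0)
print("lifting identity verified")
```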

(31)

• Shor’s relaxation scheme. Note that (BQP) is equivalent to

(BQPc)  min  f(x) = (1/2) x^T Q x + c^T x
        s.t. x_i^2 = 1, i = 1, . . . , n.

The dual problem of (BQPc) is

(D)  max_{λ∈R^n}  min_{x∈R^n}  (1/2) x^T [Q + 2 diag(λ)] x + c^T x − e^T λ,

which is equivalent to an SDP problem:

(D)  max  τ
     s.t. [ Q + 2 diag(λ)    c             ]
          [ c^T              −2 e^T λ − 2τ ] ⪰ 0.

• (SDP0) ⇐⇒ (D) (conic dual).

(32)

• The nature of the SDP relaxation is a nonlinear lifting to a higher dimensional space. Indeed, rather than solving the original quadratic problem in the n-dimensional vector space, we are instead solving a linear problem in the n × n matrix space.

• If we find an optimal solution X of the SDP that has rank one, then we have solved the original problem.

• In general, it is not the case that the optimal solution of the SDP relaxation will be rank one. However, it is possible to use rounding schemes to obtain nearby rank one solutions.

Furthermore, in some cases, it is possible to do so while obtaining some approximation guarantees on the quality of the rounded solutions.

(33)

Goemans and Williamson’s bound

• Basic questions:

– Approximation guarantees: Is it possible to prove gen- eral properties on the quality of the bounds obtained by SDP?

– Feasible solutions: Can we (somehow) use the SDP re- laxations to provide not just bounds, but actual feasible points with good (or optimal) values of the objective?

• In their celebrated MAXCUT paper (JACM, 1995), Goemans and Williamson developed the following randomized method for finding a good feasible cut from the solution of the SDP:

(34)

– Factorize X as X = V^T V, where V = [v_1, . . . , v_n] ∈ R^{r×n} and r is the rank of X.

– Then X_ij = v_i^T v_j, and, since X_ii = 1, this factorization gives n vectors v_i on the unit sphere in R^r.

– Now, choose a random hyperplane (passing through the origin) in R^r, and assign to each variable x_i either +1 or −1, depending on which side of the hyperplane the point v_i lies.

• It turns out that this procedure gives a solution that, on average, is quite close to the value of the SDP bound. The random hyperplane can be characterized by its normal vector p, which is chosen to be uniformly distributed on the unit sphere.
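The procedure can be sketched as follows (assuming numpy; `gw_round` is our name, and the factorization uses an eigendecomposition rather than Cholesky so that rank-deficient X is handled; keeping the best of several samples is a common practical variant):

```python
import numpy as np

def gw_round(X, W, trials=100, seed=0):
    """Random-hyperplane rounding of a PSD matrix X with unit diagonal,
    keeping the best of several sampled cuts."""
    rng = np.random.default_rng(seed)
    # Factorize X = V^T V via eigendecomposition, clipping tiny negative eigenvalues.
    w, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.clip(w, 0.0, None))).T    # columns v_i with v_i^T v_j = X_ij
    best_x, best_val = None, -np.inf
    for _ in range(trials):
        p = rng.standard_normal(V.shape[0])       # random hyperplane normal
        x = np.where(p @ V >= 0, 1.0, -1.0)       # x_i = sign(p^T v_i)
        val = 0.25 * np.sum(W * (1 - np.outer(x, x)))
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# 4-cycle with unit weights: the maximum cut is 4 (alternating labels).
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
y = np.array([1.0, -1.0, 1.0, -1.0])              # an optimal cut, lifted to X = y y^T
x, val = gw_round(np.outer(y, y), W)
print(val)  # 4.0
```

Here X is rank one, so rounding recovers an optimal cut; for a general SDP solution X the rounded cut is only near-optimal in expectation.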

(35)

• The rounded solution is given by x_i = sign(p^T v_i). ⇒ The expected value of this solution in x^T W x:

E_p[x^T W x] = Σ_{i,j} w_ij E_p[x_i x_j]
             = Σ_{i,j} w_ij E_p[sign(p^T v_i) · sign(p^T v_j)]
             = (2/π) Σ_{i,j} w_ij arcsin X_ij.

• The objective of maximum cut is (1/4) Σ_{i,j} w_ij (1 − y_i y_j). ⇒

c_sdp-expected = (1/4) · (2/π) Σ_{i,j} w_ij arccos X_ij.

(36)

• On the other hand, the solution of the SDP gives an upper bound on the cut capacity equal to:

c_sdp-upperbound = (1/4) Σ_{i,j} w_ij (1 − X_ij).

• Consider the problem of finding a constant α such that

α(1 − t) ≤ (2/π) arccos(t),  ∀t ∈ [−1, 1].

This is

α = min_{t∈[−1,1]} [(2/π) arccos(t) / (1 − t)] = min_{θ∈[0,π]} [(2/π) θ / (1 − cos θ)].

• It can be shown that 0.87856 < α < 0.87857.
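The constant can be estimated by a simple grid search (a sketch assuming numpy; the grid resolution is arbitrary):

```python
import numpy as np

# alpha = min over theta in (0, pi] of (2/pi) * theta / (1 - cos(theta)).
theta = np.linspace(1e-6, np.pi, 200_001)
ratio = (2 / np.pi) * theta / (1 - np.cos(theta))
alpha = float(ratio.min())
print(alpha)
assert 0.87856 < alpha < 0.87857
```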

(37)

Figure: plot of α(1 − t) and (2/π) arccos(t) over t ∈ [−1, 1].

• Thus

c_sdp-upperbound ≤ (1/4) · (1/α) · (2/π) Σ_{i,j} w_ij arccos(X_ij) = (1/α) c_sdp-expected.

(38)

• So far we have the following inequalities:

c_sdp-upperbound ≤ (1/α) c_sdp-expected,
c_sdp-expected ≤ c_max,
c_max ≤ c_sdp-upperbound.

• Therefore

0.878 · c_sdp-upperbound ≤ c_max   (approximation ratio for the SDP bound),
0.878 · c_max ≤ c_sdp-expected   (approximation ratio for the feasible solution).

(39)

Nesterov’s SDP bound

• In the MAXCUT problem, we are in fact maximizing the homogeneous quadratic form (omitting the factor 1/4):

x^T A x = Σ_{i,j=1}^n w_ij (1 − x_i x_j) = Σ_{i=1}^n (Σ_{j=1}^n w_ij) x_i^2 − Σ_{i,j=1}^n w_ij x_i x_j

over {−1, 1}^n.

• Special properties of A:

– A ⪰ 0;

– A_ij ≤ 0 for all i ≠ j;

– Σ_{j=1}^n A_ij = 0.
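These are exactly the properties of a graph Laplacian A = D − W, which can be checked numerically (a sketch assuming numpy; the random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
# Random symmetric nonnegative weights with zero diagonal.
B = rng.random((n, n))
W = np.triu(B, 1)
W = W + W.T

A = np.diag(W.sum(axis=1)) - W    # graph Laplacian D - W

# Check the three stated properties.
assert np.all(np.linalg.eigvalsh(A) >= -1e-10)       # A is positive semidefinite
assert np.all(A - np.diag(np.diag(A)) <= 1e-12)      # A_ij <= 0 for i != j
assert np.allclose(A.sum(axis=1), 0.0)               # row sums are zero

# And the identity x^T A x = sum_ij w_ij (1 - x_i x_j) on a random sign vector.
x = rng.choice([-1.0, 1.0], size=n)
assert np.isclose(x @ A @ x, np.sum(W * (1 - np.outer(x, x))))
print("Laplacian properties verified")
```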

(40)

• What happens if A is a general positive semidefinite matrix?

• Let A ⪰ 0 and consider the problem:

(P)  max_{x∈{−1,1}^n}  x^T A x.

• The SDP relaxation of (P) is

(SDP)  max  A • X
       s.t. diag(X) = e, X ⪰ 0.

Clearly v(SDP)/v(P) ≥ 1.

• Nesterov (1998): Let A ⪰ 0. Then

v(P) ≤ v(SDP) ≤ (π/2) v(P).

(41)

Improvement of the SDP bound

– Saddle point characterization of optimality conditions: Let x ∈ {−1, 1}^n and λ ∈ R^n. Then x solves (P), λ solves (D) and v(P) = v(D) ⇔

(1) Q + 2 diag(λ) ⪰ 0;

(2) [Q + 2 diag(λ)] x = −c.

– Let

Q̄ = Q + 2 diag(λ),
C = {x ∈ R^n | Q̄ x = −c}.

Then v(P) = v(D) ⇔ {−1, 1}^n ∩ C ≠ ∅.

(42)

– A sufficient condition for the polynomial solvability of (P):

Q̄ ≻ 0 ⇒ v(P) = v(D), i.e., there is no duality gap.

– What if Q̄ ⊁ 0? The saddle point conditions motivate us to define the distance:

δ = dist({−1, 1}^n, C).

– δ = 0 ⇔ v(P) = v(D).

– Can we improve the SDP bound v(D) if δ ≠ 0?


(44)

• Main result: δ > 0 ⇒ an improved lower bound:

ν_s = v(D) + (1/2) ξ_{r+1} δ^2,

where 0 < ξ_{r+1} ≤ · · · ≤ ξ_n are the positive eigenvalues of Q̄ = Q + 2 diag(λ).

• Now, the question is how to compute δ = dist({−1, 1}^n, C)?

(45)


Figure 3: Partition of C

(46)

Computation of distance δ

The idea is to partition the set C into subsets in each of which the signs of x ∈ C are the same.

The distance between {−1, 1}^n and each subset is achieved at sign(π), where π is an interior point of the subset.

By enumerating all the subsets and finding an interior point in each of them, we obtain δ.

• Let the points of C be parametrized in R^s as

x_j = x_j^0 + Σ_{i=1}^s G_ij z_i,  j = 1, . . . , n.   (5)

(47)

• Consider the n hyperplanes in R^s:

x_j^0 + Σ_{i=1}^s G_ij z_i = 0,  j = 1, . . . , n,   (6)

which constitute an arrangement of hyperplanes (in the terminology of computational geometry). Finding all sign patterns of x over C corresponds to finding all cells of the arrangement.

• It is known that the number of cells of the hyperplane arrangement is O(n^s).

• For fixed s, the distance δ can be computed in polynomial time. Another polynomially solvable case is s = n − 1.

• Solution method for cell enumeration: reverse search algorithm (Avis and Fukuda (1996), Sleumer (1999)).
