▪ Approximation Algorithms
▪ Examples
▪ Vertex Cover
▪ Traveling Salesman Problem
▪ Set Cover
▪ 3-CNF-SAT
▪ “A value or quantity that is nearly but not exactly correct”
▪ Approximation algorithms for optimization problems: the approximate solution is guaranteed to be close to the exact solution (i.e., the optimal value)
▪ Cf. heuristic search: no guarantee
▪ Note: we cannot approximate decision problems
[Figure: an approximate answer lies within an error bound of the exact answer]
▪ Most practical optimization problems are NP-hard
▪ It is widely believed that P ≠ NP
▪ Thus, polynomial-time algorithms are unlikely, and we must sacrifice either optimality, efficiency, or generality
▪ Approximation algorithms sacrifice optimality, return near-optimal answers
▪ How “near” is near-optimal?
▪ ρ(n)-approximation algorithm
▪ Approximation ratio
▪ n: input size
▪ C*: cost of an optimal solution
▪ C: cost of the solution produced by the approximation algorithm
Maximization problem: C*/C ≤ ρ(n)
Minimization problem: C/C* ≤ ρ(n)
▪ ρ(n) ≥ 1 (equivalently, max(C/C*, C*/C) ≤ ρ(n))
▪ Smaller is better (ρ(n) = 1 indicates an exact algorithm)
▪ Challenge: prove that C is close to C* without knowing C*
Textbook 35.1 – The vertex-cover problem
▪ A vertex cover of G = (V, E) is a subset V’ ⊆ V s.t. if (w, v) ∈ E, then w ∈ V’ or v ∈ V’
▪ A vertex cover “covers” every edge in G
▪ Optimization problem: find a minimum size vertex cover in G
▪ Decision problem: is there a vertex cover of size at most k?
NP-complete
▪ Idea: cover as many edges as possible (vertex with the maximum degree) at each stage and then delete the covered edges
[Figure: the max-degree greedy heuristic run step by step on an example graph with vertices a–g]
▪ The greedy heuristic cannot always find an optimal solution (otherwise we would have a proof that P = NP)
▪ There is no guarantee that C is always close to C* either
▪ APPROX-VERTEX-COVER
▪ Select an arbitrary edge at a time
▪ Remove all incident edges
▪ Running time = O(V + E)
APPROX-VERTEX-COVER(G)
  C = Ø
  E’ = G.E
  while E’ ≠ Ø
    let (u, v) be an arbitrary edge of E’
    C = C ∪ {u, v}
    remove from E’ every edge incident on either u or v
  return C
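The pseudocode above can be sketched in Python. The edge-list representation and the example edge set below are assumptions, since the slides give the graph only as a figure.

```python
# A minimal sketch of APPROX-VERTEX-COVER, assuming the graph is given
# as a list of undirected edges (u, v).
def approx_vertex_cover(edges):
    cover = set()
    remaining = list(edges)
    while remaining:
        u, v = remaining[0]                  # an arbitrary edge of E'
        cover.update((u, v))                 # C = C ∪ {u, v}
        # remove from E' every edge incident on either u or v
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover

# Hypothetical edge list for a graph on vertices a-g:
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "e"), ("d", "f"), ("d", "g"), ("e", "f")]
print(sorted(approx_vertex_cover(edges)))
```

This list-based sketch rescans the edge list on every iteration, so it is not the O(V + E) implementation; with adjacency lists the same procedure meets that bound.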
[Figure: APPROX-VERTEX-COVER run on the example graph a–g, first adding endpoints b and c, then d and f]
{b, c, d, f} is a vertex cover of size 4 found by the approximation algorithm (not optimal!)
Theorem. APPROX-VERTEX-COVER is a 2-approximation for the vertex cover problem.
▪ 3 things to check
▪ Q1: Does it give a feasible solution?
▪ A feasible solution for vertex cover is a node set that covers all the edges
▪ Finding an optimal solution is hard, but finding a feasible one could be easy
▪ Q2: Does it run in polynomial time?
▪ An exponential-time algorithm is not qualified to be an approximation algorithm
▪ Q3: Does it give an approximate solution with approximation ratio ≤ 2?
▪ Other names: 2-approximate solution, factor-2 approximation
▪ Suppose that the algorithm runs for k iterations. Let C be the output of APPROX-VERTEX-COVER. Let OPT be any optimal vertex cover of G.
▪ If k = 0, then C = Ø and the graph has no edges, so C is optimal
▪ If k > 0, then |C| = 2k. It suffices to ensure that |OPT| ≥ k
▪ Observe that the k edges (u, v) chosen by APPROX-VERTEX-COVER in those k iterations form a matching of G: no two of them share an endpoint. Just for OPT (or any feasible solution) to cover this matching requires at least k nodes, one per matched edge
Prove that C ≤ 2C*. That is, |C| ≤ 2 · |OPT|.
The proof doesn’t require knowing the actual value of C*!
▪ Tight analysis: check whether we underestimate the quality of the approximate solution obtained by APPROX-VERTEX-COVER
▪ This factor-2 approximation is still the best known polynomial-time approximation for vertex cover
▪ Even reducing the ratio to 1.99 would be a significant result
Yes, it is tight!
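The tightness can be seen on a graph that is a perfect matching; the sketch below uses the same edge-list representation as before, and the matching instance is my own illustration, not from the slides.

```python
# On k disjoint edges, the algorithm adds both endpoints of every chosen
# edge (2k nodes), while one endpoint per edge (k nodes) is an optimal
# cover, so the ratio is exactly 2.
def approx_vertex_cover(edges):
    cover, remaining = set(), list(edges)
    while remaining:
        u, v = remaining[0]
        cover.update((u, v))
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover

matching = [(2 * i, 2 * i + 1) for i in range(5)]   # 5 disjoint edges
print(len(approx_vertex_cover(matching)))           # 10, vs. optimal 5
```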
▪ C is a vertex cover of graph G=(V, E) iff V – C is an independent set of G
▪ Q: Does a 2-approximation algorithm for vertex cover imply a 2- approximation for maximum independent set?
A 100-node example: optimal independent set 51 nodes, so optimal vertex cover 49 nodes. A 2-approximate vertex cover may use 98 nodes, whose complement is an independent set of only 2 nodes, far more than a factor of 2 from the 51-node optimum. So the answer is no.
Textbook 35.2 – The traveling-salesman problem
▪ Optimization problem: Given a set of cities and their pairwise distances, find a tour of lowest cost that visits each city exactly once.
▪ Inter-city distances satisfy the triangle inequality if d(u, w) ≤ d(u, v) + d(v, w) for all vertices u, v, w
[Figure: two 4-city examples on vertices u, v, x, y, one with distances satisfying the triangle inequality and one without]
▪ APPROX-TSP-TOUR
▪ Grow an MST from a random root
▪ MST-PRIM
▪ For (n - 1) iterations, add the least-weighted edge incident to the current subtree that does not incur a cycle
▪ Running time = O(V²)
APPROX-TSP-TOUR(G, d)
  select a vertex r from G.V as a “root” vertex
  grow a minimum spanning tree T for G from root r using MST-PRIM(G, d, r)
  let H be the list of vertices visited in a preorder tree walk of T
  return the Hamiltonian cycle H
H = a, b, c, h, d, e, f, g, a
H* = a, b, c, h, f, g, e, d, a
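A sketch of APPROX-TSP-TOUR for points in the plane, where Euclidean distances automatically satisfy the triangle inequality. The array-based MST-PRIM matches the O(V²) bound on a complete graph; the coordinates and names are illustrative.

```python
from math import dist, inf

def approx_tsp_tour(points):
    n = len(points)
    # MST-PRIM from root 0, O(n^2) on the complete graph
    parent = [0] * n
    key = [inf] * n
    key[0] = 0.0
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=key.__getitem__)
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)   # parent[u] is final once u is extracted
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < key[v]:
                    key[v], parent[v] = d, u
    # preorder walk of the MST
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]   # return to the root

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
print(approx_tsp_tour(points))
```

By the analysis below, the returned tour costs at most twice the optimum on any metric instance.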
Theorem. APPROX-TSP-TOUR is a 2-approximation for the TSP with the triangle inequality.
▪ 3 things to check
▪ Q1: Does it give a feasible solution?
▪ A feasible solution is a tour of G visiting each city exactly once
▪ The completeness of the graph is needed here, so that every edge used by the preorder shortcut exists
▪ Q2: Does it run in polynomial time?
▪ Q3: Does it give an approximate solution with approximation ratio ≤ 2?
▪ With the triangle inequality: c(H) ≤ c(W) = 2c(T), where W is the full walk of T and H shortcuts W’s repeated vertices
▪ Let H* denote an optimal tour; deleting one edge from H* leaves a spanning tree, so c(T) ≤ c(H*)
▪ Hence, c(H) ≤ 2c(T) ≤ 2c(H*)
Prove that C ≤ 2C*. That is, c(H) ≤ 2c(H*).
Theorem 35.3. If P ≠ NP, there is no polynomial-time approximation algorithm with a constant ratio bound ρ for the general TSP
▪ Proof by contradiction
▪ Suppose there is such an algorithm A with a constant ratio ρ. We will use A to solve HAM-CYCLE in polynomial time.
▪ Algorithm for HAM-CYCLE
▪ Convert G = (V, E) into an instance I of TSP with cities V (resulting in a complete graph G′ = (V, E′)): each edge in E gets cost 1, and each edge in E′ − E gets cost ρ|V| + 1
▪ Run A on I
▪ If the reported cost ≤ ρ|V|, then return “Yes” (i.e., G contains a Hamiltonian cycle); otherwise return “No”
▪ Analysis
▪ If G has an HC: G′ contains a tour of cost |V|, by picking the edges in E, each of cost 1
▪ If G does not have an HC: any tour of G′ must use some edge not in E, so its total cost is at least (ρ|V| + 1) + (|V| − 1) > ρ|V|
▪ Algorithm A guarantees to return a tour of cost at most ρ times the optimal cost
▪ Hence A returns a cost ≤ ρ|V| if G contains an HC, and a cost > ρ|V| otherwise
▪ HAM-CYCLE can be solved in polynomial time, a contradiction
[Figure: the reduction on a 5-vertex example: u, y, v, w, x, u is a Hamiltonian cycle in G, and the same vertex order is a traveling-salesman tour of cost |V| in G′]
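The reduction can be sketched as follows. The graph names and the brute-force TSP solver are my own additions, used only to exhibit the cost gap on tiny instances.

```python
# A sketch of the reduction HAM-CYCLE <=p TSP from the proof: each edge
# of G gets cost 1 and each non-edge gets cost rho*|V| + 1.
from itertools import permutations

def tsp_instance(n, edges, rho):
    E = {frozenset(e) for e in edges}
    return lambda u, v: 1 if frozenset((u, v)) in E else rho * n + 1

def min_tour_cost(n, cost):
    # brute force over all tours; fine for the 4-vertex examples below
    return min(sum(cost(t[i], t[(i + 1) % n]) for i in range(n))
               for t in permutations(range(n)))

rho, n = 2, 4
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # C4: has a Hamiltonian cycle
path = [(0, 1), (1, 2), (2, 3)]            # P4: has none
print(min_tour_cost(n, tsp_instance(n, cycle, rho)))   # 4 (= |V|)
print(min_tour_cost(n, tsp_instance(n, path, rho)))    # > rho*|V| = 8
```

A ρ-approximation for TSP would thus separate the two cases, deciding HAM-CYCLE.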
Show how in polynomial time we can transform one instance of the traveling-salesman problem into another instance whose cost function satisfies the triangle inequality. The two instances must have the same set of optimal tours. Explain why such a polynomial-time transformation does not contradict Theorem 35.3, assuming that P ≠ NP.
[Figure: a TSP instance on u, v, x, y with edge costs 1 and 5 violating the triangle inequality; which costs should the transformed instance use?]
▪ For example, we can add dmax (the largest edge cost) to each edge
▪ G contains a tour of minimum cost k ⟺ G′ contains a tour of minimum cost k + |V| · dmax, since every tour uses exactly |V| edges
▪ G′ satisfies the triangle inequality because for all vertices u, v, w: d′(u, w) = d(u, w) + dmax ≤ d(u, v) + d(v, w) + 2dmax = d′(u, v) + d′(v, w), using d(u, w) ≤ dmax
[Figure: the transformation with dmax = 5: edge costs 1 and 5 in the TSP instance w/o the triangle inequality become 1 + dmax = 6 and 5 + dmax = 10 in the instance w/ the triangle inequality]
▪ Why this does not contradict Theorem 35.3: the additive shift raises the cost of every tour by |V| · dmax, so a tour within factor ρ of optimal in G′ can correspond to a tour arbitrarily far from optimal in G; optimal tours are preserved, but approximation ratios are not
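The add-dmax transformation can be checked mechanically. The 4-city cost matrix below follows my reading of the example's costs 1 and 5; the function names are illustrative.

```python
# Verify that adding d_max to every edge cost makes the triangle
# inequality hold on the example instance.
def satisfies_triangle(d):
    cities = list(d)
    return all(d[u][w] <= d[u][v] + d[v][w]
               for u in cities for v in cities for w in cities
               if len({u, v, w}) == 3)

# Assumed costs from the figure: u-v and x-y cost 5, all other pairs cost 1.
d = {"u": {"v": 5, "x": 1, "y": 1},
     "v": {"u": 5, "x": 1, "y": 1},
     "x": {"u": 1, "v": 1, "y": 5},
     "y": {"u": 1, "v": 1, "x": 5}}
dmax = 5
d2 = {a: {b: w + dmax for b, w in row.items()} for a, row in d.items()}

print(satisfies_triangle(d))    # False: 5 > 1 + 1
print(satisfies_triangle(d2))   # True
```

Every transformed cost lies between dmax and 2·dmax, so any two edges together cost at least as much as any single edge; meanwhile every tour's cost grows by exactly |V| · dmax, which preserves optimal tours but not approximation ratios.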
Textbook 35.3 – The set-covering problem
▪ Optimization problem: Given k subsets {S1, S2, …, Sk} of {1, 2, …, n}, find an index subset C of {1, 2, …, k} with minimum |C| s.t. ∪i∈C Si = {1, 2, …, n}
Set cover is NP-complete.
1) It is in NP 2) It is NP-hard
▪ GREEDY-SET-COVER
▪ At each stage, pick the set Si that covers the greatest number of still-uncovered elements
▪ Running time = ?
GREEDY-SET-COVER(S1, S2, …, Sk)
  I = Ø
  C = Ø
  while C ≠ {1, 2, …, n}
    select an index i maximizing |Si − C|
    I = I ∪ {i}
    C = C ∪ Si
  return I
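A sketch of GREEDY-SET-COVER in Python, assuming the subsets are given as Python sets whose union covers the ground set; the example instance is my own.

```python
# Greedy set cover: repeatedly take the set covering the most
# still-uncovered elements until the ground set is covered.
def greedy_set_cover(subsets, ground):
    I, covered = [], set()
    while covered != ground:
        # index i maximizing |S_i - C| (ties broken by smallest index)
        i = max(range(len(subsets)), key=lambda j: len(subsets[j] - covered))
        I.append(i)
        covered |= subsets[i]
    return I

subsets = [{0, 1, 2, 3}, {4, 5, 6, 7}, {0, 4}, {1, 5}, {2, 6, 8}, {3, 7, 8}]
print(greedy_set_cover(subsets, set(range(9))))   # [0, 1, 4]
```

On this instance greedy uses 3 sets, which also happens to be optimal; the theorem below bounds the general gap by H(n).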
Theorem. GREEDY-SET-COVER is an H(n)-approximation for the set cover problem, where H(n) = 1 + 1/2 + … + 1/n ≤ ln n + 1.
▪ 3 things to check
▪ Q1: Does it give a feasible solution?
▪ A feasible solution output is a collection of subsets whose union is the ground set {1, 2, …, n}.
▪ Q2: Does it run in polynomial time?
▪ Q3: Does it give an approximate solution with approximation ratio ≤ H(n)?
▪ Let I* denote an optimal set cover. We plan to prove that |I| ≤ H(n) · |I*|.
▪ For brevity, we re-index those subsets s.t. for each i, Si is the i-th set selected by GREEDY-SET-COVER
▪ Let Ci be the set C right before the elements of Si are inserted into C
▪ If an element j is inserted into C in the i-th iteration, the price of j is price(j) = 1 / |Si − Ci|
▪ Each iteration distributes a total price of exactly 1, so the sum of the prices of all n integers is exactly |I|
[Figure: elements charged example prices such as 1/3, 1/8, 1/1 as they are covered]
▪ For brevity, we re-index the integers s.t. they are inserted into C in increasing order
▪ When j is about to be put into C, there are at least n − j + 1 uncovered numbers, and I* is a collection of sets covering all of them
▪ There is an index t ∈ I* s.t. St covers at least (n − j + 1) / |I*| of the uncovered numbers
▪ Since the greedy rule picks a set covering at least as many uncovered numbers as St, we have |Si − Ci| ≥ (n − j + 1) / |I*|, where j is inserted into C in the i-th iteration
▪ The price of j is therefore price(j) = 1 / |Si − Ci| ≤ |I*| / (n − j + 1)
▪ The sum of the prices of all n integers is exactly |I|
▪ Therefore, |I| = Σ_{j=1..n} price(j) ≤ |I*| · Σ_{j=1..n} 1/(n − j + 1) = |I*| · H(n)
Textbook 35.4 – Randomization and linear programming
▪ Randomized algorithm’s behavior is determined not only by its input but also by values produced by a random-number generator
               | Exact      | Approximate
Deterministic  | MST        | APPROX-TSP-TOUR
Randomized     | Quick Sort | MAX-3-CNF-SAT
▪ Decision problem: Satisfiability of Boolean formulas in 3-conjunctive normal form (3-CNF)
▪ 3-CNF = AND of clauses, each of which is the OR of exactly 3 distinct literals
▪ A literal is an occurrence of a variable or its negation, e.g., x1 or ¬x1
→ satisfiable
What is the optimization version of 3-CNF-SAT?
▪ Optimization problem: find an assignment of the variables that satisfies as many clauses as possible
▪ Closeness to optimum is measured by the fraction of satisfied clauses
[Example: one assignment satisfies 3 clauses while another satisfies only 2; a clause containing both a literal and its negation is always satisfied]
For simplicity, we assume no clause containing both literal and its negation.
▪ Randomly and independently set each variable to 0 or 1, as if flipping a fair coin
Theorem 35.6. Given an instance of MAX-3-CNF-SAT with n variables x1, x2, …, xn and m clauses, the randomized algorithm that independently sets each variable to 1 with probability 1/2 and to 0 with probability 1/2 is a randomized 8/7-approximation algorithm.
▪ Proof
▪ Each clause is the OR of exactly 3 distinct literals, so it is unsatisfied only when all 3 literals are set to 0, which happens with probability (1/2)³ = 1/8
▪ Hence each clause is satisfied with probability 7/8, and the expected number of satisfied clauses is 7m/8; the approximation ratio is m / (7m/8) = 8/7
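The expectation can be checked numerically: averaging the number of satisfied clauses over all 2^n assignments gives exactly 7m/8. The clause encoding (variable index, negated?) and the example formula below are my own.

```python
# Randomized MAX-3-CNF: set each variable to 1 with probability 1/2.
import random
from itertools import product

def random_assignment(n):
    return [random.randint(0, 1) for _ in range(n)]

def satisfied(clauses, x):
    # a literal (v, neg) is true when x[v] differs from its negation flag
    return sum(any(x[v] != neg for (v, neg) in clause) for clause in clauses)

# Exact expectation check over all assignments of a small formula:
clauses = [[(0, False), (1, False), (2, True)],
           [(0, True), (1, False), (3, False)]]
n, m = 4, len(clauses)
total = sum(satisfied(clauses, list(x)) for x in product((0, 1), repeat=n))
print(total / 2 ** n)   # 1.75 = 7 * m / 8
```

Each clause has 3 distinct variables, so exactly 1/8 of the assignments leave it unsatisfied, matching the proof.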
▪ Most practical optimization problems are NP-hard
▪ It is widely believed that P ≠ NP
▪ Thus, polynomial-time algorithms are unlikely, and we must sacrifice either optimality, efficiency, or generality
▪ Approximation algorithms sacrifice optimality, return near-optimal answers
Maximization problem: C*/C ≤ ρ(n)
Minimization problem: C/C* ≤ ρ(n)
Course Website: http://ada.miulab.tw Email: ada-ta@csie.ntu.edu.tw
Important announcements will be sent to your @ntu.edu.tw mailbox and posted to the course website.