### ▪ Approximation Algorithms

### ▪ Examples

### ▪ Vertex Cover

### ▪ Traveling Salesman Problem

### ▪ Set Cover

### ▪ 3-CNF-SAT

▪ “A value or quantity that is nearly but not exactly correct”

▪ **Approximation algorithms** for optimization problems: the approximate solution is guaranteed to be close to the exact solution (i.e., the optimal value)

▪ Cf. heuristic search: no such guarantee

▪ Note: we cannot approximate decision problems

[Figure: the exact answer and an error bound around it]

▪ Most practical optimization problems are NP-hard

▪ It is widely believed that P ≠ NP

▪ Thus, polynomial-time exact algorithms are unlikely, and we must sacrifice one of **optimality, efficiency, or generality**

▪ **Approximation algorithms** sacrifice optimality and return near-optimal answers

▪ How “near” is near-optimal?

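▪ $\rho(n)$-approximation algorithm

▪ Approximation ratio $\rho(n)$

▪ $n$: input size

▪ $C^*$: cost of an optimal solution

▪ $C$: cost of the solution produced by the approximation algorithm

Maximization problem: $0 < C \le C^*$ and $\frac{C^*}{C} \le \rho(n)$

Minimization problem: $0 < C^* \le C$ and $\frac{C}{C^*} \le \rho(n)$

▪ In both cases: $\max\left(\frac{C}{C^*}, \frac{C^*}{C}\right) \le \rho(n)$ for every input of size $n$

▪ $\rho(n) \ge 1$, and smaller is better ($\rho(n) = 1$ indicates an exact algorithm)

▪ Challenge: prove that $C$ is close to $C^*$ without knowing $C^*$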
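Once both costs are known, the ratio in the definition above is easy to compute. The sketch below is a minimal Python illustration; the function name `approximation_ratio` is hypothetical. It evaluates $\max(C/C^*, C^*/C)$, which reduces to $C/C^*$ for minimization and $C^*/C$ for maximization.

```python
def approximation_ratio(c: float, c_star: float) -> float:
    """Return max(C / C*, C* / C) for positive costs C and C*.

    Minimization (C >= C*): the result is C / C*.
    Maximization (C <= C*): the result is C* / C.
    """
    if c <= 0 or c_star <= 0:
        raise ValueError("costs must be positive")
    return max(c / c_star, c_star / c)


# Minimization: the algorithm returns cost 6 while the optimum is 3.
print(approximation_ratio(6, 3))  # 2.0
# Maximization: the algorithm returns value 7 while the optimum is 8.
print(approximation_ratio(7, 8))  # 8/7, about 1.143
```

The real difficulty, as the slide notes, is bounding this ratio in a proof without ever computing $C^*$.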


Textbook 35.1 – The vertex-cover problem


▪ A vertex cover of G = (V, E) is a subset V′ ⊆ V s.t. if (w, v) ∈ E, then w ∈ V′ or v ∈ V′

▪ A vertex cover “covers” every edge in G

▪ Optimization problem: find a minimum size vertex cover in G

▪ Decision problem: is there a vertex cover of size at most k? (NP-complete)

▪ Idea: at each stage, pick the vertex with the maximum degree (it covers as many edges as possible), then delete the edges it covers

[Figure: the greedy heuristic run step by step on an example graph with vertices a, b, c, d, e, f, g]


▪ The greedy heuristic cannot always find an optimal solution (otherwise it would prove P = NP)

▪ There is no guarantee that C is always close to $C^*$ either
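The max-degree heuristic can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph is a list of edges and that ties are broken by the smallest vertex label (an arbitrary choice). The hypothetical "spider" graph below (center x with legs x-y1-z1, x-y2-z2, x-y3-z3) shows why there is no guarantee: greedy grabs the center first and ends with 4 vertices, while {y1, y2, y3} is a cover of size 3.

```python
from collections import defaultdict


def greedy_max_degree_cover(edges):
    """Greedy heuristic: repeatedly take a vertex of maximum remaining degree.

    Always returns a valid vertex cover, but with no bound on its size.
    """
    remaining = {frozenset(e) for e in edges}
    cover = set()
    while remaining:
        degree = defaultdict(int)
        for e in remaining:
            for v in e:
                degree[v] += 1
        # Vertex covering the most still-uncovered edges (ties: smallest label).
        best = min(degree, key=lambda v: (-degree[v], v))
        cover.add(best)
        # Delete every edge the chosen vertex covers.
        remaining = {e for e in remaining if best not in e}
    return cover


# Hypothetical spider graph: center x, three legs of length 2.
spider = [("x", "y1"), ("x", "y2"), ("x", "y3"),
          ("y1", "z1"), ("y2", "z2"), ("y3", "z3")]
print(sorted(greedy_max_degree_cover(spider)))  # ['x', 'y1', 'y2', 'y3']
```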

▪ APPROX-VERTEX-COVER

▪ **Select an arbitrary (e.g., random) edge at a time**

▪ Add both endpoints to the cover and remove all edges incident on them

▪ Running time = $O(V + E)$ (using adjacency lists)

    APPROX-VERTEX-COVER(G)
        C = Ø
        E′ = G.E
        while E′ ≠ Ø
            let (u, v) be an arbitrary edge of E′
            C = C ∪ {u, v}
            remove from E′ every edge incident on either u or v
        return C
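The pseudocode above can be sketched in Python as follows. This is a minimal version, assuming the graph is given as a list of edges and that "an arbitrary edge" means the next still-uncovered edge in input order. Scanning the edges once and skipping those already covered has the same effect as removing every edge incident on u or v. The example graph on a, ..., g is hypothetical and not necessarily the one in the figure.

```python
def approx_vertex_cover(edges):
    """APPROX-VERTEX-COVER for a graph given as a list of (u, v) edges.

    Returns the cover C and the list of edges chosen in the while loop.
    """
    cover, chosen = set(), []
    for u, v in edges:
        if u in cover or v in cover:
            continue  # this edge was already removed from E'
        chosen.append((u, v))
        cover.update((u, v))  # C = C ∪ {u, v}
    return cover, chosen


# Hypothetical example graph on vertices a..g.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "e"), ("d", "f"), ("d", "g"), ("e", "f")]
cover, chosen = approx_vertex_cover(edges)
print(sorted(cover))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(chosen)         # [('a', 'b'), ('c', 'd'), ('e', 'f')]
```

On this graph {b, d, e} already covers all 8 edges, so the returned cover of size 6 hits the factor-2 bound exactly. A different edge order may yield a different cover.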


[Figure: APPROX-VERTEX-COVER run step by step on the same example graph]

{b, c, d, f} is a vertex cover of size 4 found by the approximation algorithm (not optimal!)

Theorem. APPROX-VERTEX-COVER is a 2-approximation for the vertex cover problem.

▪ 3 things to check

▪ Q1: Does it give a feasible solution?

▪ A feasible solution for vertex cover is a node set that covers all the edges

▪ Finding an optimal solution is hard, but finding a feasible one can be easy

▪ Q2: Does it run in polynomial time?

▪ An exponential-time algorithm does not qualify as an approximation algorithm

▪ Q3: Does it give an approximate solution with approximation ratio ≤ 2?

▪ Other names: 2-approximate solution, factor-2 approximation

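▪ Suppose that the algorithm runs for k iterations. Let C be the output of APPROX-VERTEX-COVER. Let OPT be any optimal vertex cover of G.

▪ If k = 0, then G has no edges and $|C| = |OPT| = 0$

▪ If k > 0, then $|C| = 2k$, since each iteration adds two vertices not yet in C. It suffices to ensure that $|OPT| \ge k$

▪ Observe that the k edges (u, v) chosen by APPROX-VERTEX-COVER in those k iterations form a matching of G: once (u, v) is chosen, every edge incident on u or v is removed, so no two chosen edges share an endpoint. Just for OPT (or any feasible solution) to cover this matching requires at least k nodes.

Prove that $|C| \le 2|OPT|$. That is, $\frac{|C|}{|OPT|} \le 2$.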

The proof doesn’t require knowing the actual value of C*!
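The two facts used in the proof (the chosen edges form a matching, and $|OPT| \ge k$) can be spot-checked on small random graphs. This is a minimal Python sketch, assuming graphs with 7 vertices and edge probability 0.4 (arbitrary choices); `min_vertex_cover_size` is a hypothetical brute-force helper that is exponential and only usable on tiny graphs.

```python
import itertools
import random


def approx_vertex_cover(edges):
    # APPROX-VERTEX-COVER with "arbitrary edge" = next uncovered edge in order.
    cover, chosen = set(), []
    for u, v in edges:
        if u not in cover and v not in cover:
            chosen.append((u, v))
            cover.update((u, v))
    return cover, chosen


def min_vertex_cover_size(vertices, edges):
    """Brute-force |OPT| by trying all subsets in order of size."""
    for size in range(len(vertices) + 1):
        for cand in itertools.combinations(vertices, size):
            s = set(cand)
            if all(u in s or v in s for u, v in edges):
                return size


random.seed(0)
for _ in range(200):
    vertices = list(range(7))
    edges = [e for e in itertools.combinations(vertices, 2) if random.random() < 0.4]
    cover, chosen = approx_vertex_cover(edges)
    endpoints = [x for e in chosen for x in e]
    assert len(endpoints) == len(set(endpoints))    # chosen edges form a matching
    opt = min_vertex_cover_size(vertices, edges)
    assert len(chosen) <= opt                        # |OPT| >= k
    assert len(cover) == 2 * len(chosen) <= 2 * opt  # |C| = 2k <= 2|OPT|
print("all checks passed")
```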

▪ Tight analysis: check whether we underestimate the quality of the approximate solution obtained by APPROX-VERTEX-COVER

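▪ This factor-2 approximation is still the best known: no polynomial-time algorithm with a constant ratio below 2 is known

▪ Reducing the ratio to 1.99 would be a significant result

Yes, it is tight! For example, on the complete bipartite graph $K_{n,n}$ the algorithm always returns all $2n$ vertices, while the $n$ vertices of one side suffice.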
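The tight example can be reproduced directly. This minimal Python sketch assumes $K_{n,n}$ is encoded with left vertices 0..n-1 and right vertices n..2n-1 (an arbitrary labeling). Any maximal matching in $K_{n,n}$ is perfect, so the algorithm always picks n edges and returns 2n vertices, exactly twice the optimum n.

```python
def approx_vertex_cover(edges):
    # APPROX-VERTEX-COVER with "arbitrary edge" = next uncovered edge in order.
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def complete_bipartite(n):
    """Edges of K_{n,n}: left vertices 0..n-1, right vertices n..2n-1."""
    return [(i, n + j) for i in range(n) for j in range(n)]


for n in (1, 3, 10):
    cover = approx_vertex_cover(complete_bipartite(n))
    # The n left vertices form a cover, so OPT <= n; the n edges (i, n + i)
    # form a matching, so OPT >= n. Hence OPT = n.
    print(n, len(cover), len(cover) / n)  # ratio 2.0 every time
```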

▪ C is a vertex cover of graph G = (V, E) iff V − C is an independent set of G

▪ Q: Does a 2-approximation algorithm for vertex cover imply a 2-approximation for maximum independent set?

▪ No. Example with 100 nodes: an optimal independent set has 51 nodes, so an optimal vertex cover has 49 nodes. A 2-approximate vertex cover may have 98 nodes, whose complement is an independent set of only 2 nodes.
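The complement relation can be made concrete. This minimal Python sketch reuses the same edge-picking rule on a hypothetical $K_{5,5}$: the 2-approximate cover takes all 10 vertices, so its complement is an empty independent set, while either side of the graph is an independent set of size 5. The helper `is_independent` is illustrative only.

```python
def approx_vertex_cover(edges):
    # APPROX-VERTEX-COVER with "arbitrary edge" = next uncovered edge in order.
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def is_independent(s, edges):
    """True if no edge has both endpoints in s."""
    return not any(u in s and v in s for u, v in edges)


n = 5
vertices = set(range(2 * n))
edges = [(i, n + j) for i in range(n) for j in range(n)]  # K_{n,n}
cover = approx_vertex_cover(edges)
independent = vertices - cover               # V - C is an independent set
print(len(cover), len(independent))          # 10 0
print(is_independent(set(range(n)), edges))  # True: one side has size n
```

A factor of 2 on the cover therefore says nothing about the independent set, whose size can collapse to zero.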

Textbook 35.2 – The traveling-salesman problem


▪ Optimization problem: Given a set of cities and their pairwise distances, find a tour of lowest cost that visits each city exactly once.

▪ **Inter-city distances satisfy the triangle inequality** if for all vertices $u, v, w \in V$: $c(u, w) \le c(u, v) + c(v, w)$

[Figure: two 4-city instances on u, v, x, y, one with and one without the triangle inequality]
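Checking the triangle inequality on a given instance is a direct translation of the definition. This minimal Python sketch assumes the costs are a symmetric n × n matrix; both example matrices are hypothetical (cities on a line, and a 4-city instance where one cost of 5 exceeds a two-hop detour of cost 2).

```python
from itertools import permutations


def satisfies_triangle_inequality(c):
    """c: symmetric n x n cost matrix (list of lists).

    Checks c[u][w] <= c[u][v] + c[v][w] for all distinct u, v, w.
    """
    n = len(c)
    return all(c[u][w] <= c[u][v] + c[v][w]
               for u, v, w in permutations(range(n), 3))


line = [[abs(i - j) for j in range(4)] for i in range(4)]  # cities on a line
bad = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
bad[0][1] = bad[1][0] = 5  # 5 > 1 + 1 via any third city
print(satisfies_triangle_inequality(line))  # True
print(satisfies_triangle_inequality(bad))   # False
```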

▪ APPROX-TSP-TOUR

▪ Grow an MST from a random root

▪ MST-PRIM

▪ For (n − 1) iterations, add the least-weighted edge incident to the current subtree that does not create a cycle

▪ Running time = $\Theta(V^2)$ with a simple implementation of MST-PRIM on the complete graph

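    APPROX-TSP-TOUR(G, c)
        select a vertex r ∈ G.V to be a “root” vertex
        grow a minimum spanning tree T for G from root r using MST-PRIM(G, c, r)
        let H be the list of vertices, ordered by when they are first visited in a preorder tree walk of T
        return the Hamiltonian cycle H (visit the vertices in the order of H, then return to r)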
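APPROX-TSP-TOUR can be sketched in Python as follows. This is a minimal version, assuming the complete graph is given as a cost matrix and using the simple $\Theta(V^2)$ array form of Prim's algorithm. The point set is hypothetical and Euclidean, so the triangle inequality holds; `optimal_tour_cost` is a brute-force helper for tiny inputs, included only to compare against the 2-approximation bound.

```python
import math
from itertools import permutations


def mst_prim(c, r=0):
    """Simple O(V^2) MST-PRIM; returns parent[] with parent[r] = None."""
    n = len(c)
    key = [math.inf] * n
    parent = [None] * n
    in_tree = [False] * n
    key[r] = 0
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: key[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and c[u][v] < key[v]:
                key[v], parent[v] = c[u][v], u
    return parent


def approx_tsp_tour(c, r=0):
    """Preorder walk of an MST rooted at r, then return to r."""
    parent = mst_prim(c, r)
    children = [[] for _ in c]
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
    tour, stack = [], [r]
    while stack:  # iterative preorder tree walk
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [r]  # close the Hamiltonian cycle H


def tour_cost(c, tour):
    return sum(c[a][b] for a, b in zip(tour, tour[1:]))


def optimal_tour_cost(c):
    """Brute force over all tours starting at city 0 (tiny inputs only)."""
    return min(tour_cost(c, [0, *p, 0]) for p in permutations(range(1, len(c))))


pts = [(0, 0), (0, 2), (2, 2), (2, 0), (1, 3)]  # hypothetical cities
c = [[math.dist(p, q) for q in pts] for p in pts]
h = approx_tsp_tour(c)
print(h, tour_cost(c, h), optimal_tour_cost(c))
```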

▪ Example: the tour returned by APPROX-TSP-TOUR is **H = a, b, c, h, d, e, f, g, a**

▪ An optimal tour for the same instance is $H^*$ = **a, b, c, h, f, g, e, d, a**

Theorem. APPROX-TSP-TOUR is a 2-approximation for the TSP with the triangle inequality.

▪ 3 things to check

▪ Q1: Does it give a feasible solution?

▪ A feasible solution is a tour of G visiting each city exactly once

▪ G must be a complete graph, so that every pair of consecutive cities in H is joined by an edge

▪ Q2: Does it run in polynomial time?

▪ Q3: Does it give an approximate solution with approximation ratio ≤ 2?

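▪ With triangle inequality:

▪ Let $H^*$ denote an optimal tour. $H^*$ is formed by some spanning tree plus an edge, so deleting one edge gives a spanning tree and $c(T) \le c(H^*)$

▪ A full walk W of T traverses every edge of T exactly twice, so $c(W) = 2c(T)$. H is obtained from W by skipping already-visited vertices, which by the triangle inequality does not increase the cost: $c(H) \le c(W)$

▪ Hence, $c(H) \le c(W) = 2c(T) \le 2c(H^*)$

Prove that $c(H) \le 2c(H^*)$. That is, $\frac{c(H)}{c(H^*)} \le 2$.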

**Theorem 35.3.** **If P ≠ NP, there is no polynomial-time approximation**
**algorithm with a constant ratio bound ρ for the general TSP**

▪ Proof by contradiction

▪ Suppose there is such an algorithm A with a constant ratio ρ. We will use A to solve HAM-CYCLE in polynomial time.

▪ Algorithm for HAM-CYCLE

▪ Convert G = (V, E) into an instance I of TSP with cities V (resulting in a complete graph G′ = (V, E′)): $c(u, v) = 1$ if $(u, v) \in E$, and $c(u, v) = \rho|V| + 1$ otherwise

▪ Run A on I

▪ If the reported cost ≤ ρ|V|, then return “Yes” (i.e., G contains a tour that is a Hamiltonian cycle); otherwise, return “No”


▪ Analysis

▪ If G has an HC: G′ contains a tour of cost |V| by picking edges in E, each of cost 1

▪ If G does not have an HC: any tour of G′ must use some edge not in E, so its total cost is at least $(\rho|V| + 1) + (|V| - 1) > \rho|V|$

▪ Algorithm A guarantees to return a tour of cost at most $\rho \cdot c(H^*)$, where $H^*$ is an optimal tour of G′

▪ A returns a cost $\le \rho|V|$ if G contains an HC; A returns a cost $> \rho|V|$ otherwise

▪ Hence HAM-CYCLE can be solved in polynomial time, a contradiction

[Figure: HAM-CYCLE $\le_p$ TSP. In G, u, y, v, w, x, u is a Hamiltonian cycle; in G′ (edges of G cost 1), u, y, v, w, x, u is a traveling-salesman tour with cost |V|]
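The cost gap that drives the proof can be observed on tiny graphs. This minimal Python sketch assumes ρ = 2 and 5 vertices (arbitrary choices); `optimal_tour_cost` is a brute-force stand-in that shows what any tour of G′ can cost, not an implementation of the hypothetical algorithm A. With a Hamiltonian cycle the optimum is exactly |V|; without one, every tour costs more than ρ|V|, so a ρ-approximation would separate the two cases.

```python
from itertools import permutations


def tsp_instance_from_graph(n, edges, rho):
    """Cost matrix of G' from the proof: 1 on edges of G, rho*n + 1 otherwise."""
    e = {frozenset(p) for p in edges}
    return [[0 if u == v else (1 if frozenset((u, v)) in e else rho * n + 1)
             for v in range(n)] for u in range(n)]


def optimal_tour_cost(c):
    """Brute-force optimal tour cost over tours starting at city 0 (tiny n only)."""
    n = len(c)
    return min(sum(c[a][b] for a, b in zip(t, t[1:]))
               for t in ([0, *p, 0] for p in permutations(range(1, n))))


rho, n = 2, 5
cycle = [(i, (i + 1) % n) for i in range(n)]  # G has a Hamiltonian cycle
path = [(i, i + 1) for i in range(n - 1)]     # G has no Hamiltonian cycle
print(optimal_tour_cost(tsp_instance_from_graph(n, cycle, rho)))  # 5 = |V|
print(optimal_tour_cost(tsp_instance_from_graph(n, path, rho)))   # 15 > rho*|V| = 10
```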

Exercise: Show how in polynomial time we can transform one instance of the traveling-salesman problem into another instance whose cost function satisfies the triangle inequality. The two instances must have the same set of optimal tours. Explain why such a polynomial-time transformation does not contradict Theorem 35.3, assuming that P ≠ NP.

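[Figure: a TSP instance on cities u, v, x, y with edge costs 5, 5, 1, 1, 1, 1 (no triangle inequality) to be transformed ($\le_p$) into an instance with unknown costs (?) that satisfies the triangle inequality]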

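▪ For example, we can add $d_{max}$ (the largest cost) to each edge: $c'(u, v) = c(u, v) + d_{max}$

▪ G contains a tour of minimum cost k ⇔ G′ contains a tour of minimum cost $k + |V| \cdot d_{max}$ (every tour has exactly |V| edges, so both instances have the same optimal tours)

▪ G′ satisfies the triangle inequality because for all vertices u, v, w (with nonnegative costs): $c'(u, w) = c(u, w) + d_{max} \le 2d_{max} \le c'(u, v) + c'(v, w)$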

[Figure: a TSP instance on u, v, x, y without the triangle inequality (edge costs 5, 5, 1, 1, 1, 1) is transformed ($\le_p$) into one with the triangle inequality by adding $d_{max} = 5$ to every edge (edge costs 10, 10, 6, 6, 6, 6)]

▪ This does not contradict Theorem 35.3: the transformation preserves optimal tours but not approximation ratios. Running APPROX-TSP-TOUR on G′ gives a tour of cost at most $2(k + |V| d_{max})$ in G′, i.e., at most $2k + |V| d_{max}$ in G, which is not within any constant factor of k when $d_{max}$ is large
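The transformation can be verified by brute force on a small instance. This minimal Python sketch uses a hypothetical 4-city instance in the spirit of the figure (cities 0..3, two edges of cost 5 and four of cost 1); `optimal_tours` enumerates every tour starting at city 0, which is only feasible for tiny inputs.

```python
from itertools import permutations


def add_dmax(c):
    """Add d_max (the largest edge cost) to every off-diagonal cost."""
    n = len(c)
    d_max = max(c[u][v] for u in range(n) for v in range(n) if u != v)
    return [[0 if u == v else c[u][v] + d_max for v in range(n)] for u in range(n)]


def satisfies_triangle_inequality(c):
    return all(c[u][w] <= c[u][v] + c[v][w]
               for u, v, w in permutations(range(len(c)), 3))


def optimal_tours(c):
    """Optimal cost and the set of all optimal tours starting at city 0."""
    tours = [(0, *p, 0) for p in permutations(range(1, len(c)))]
    cost = {t: sum(c[a][b] for a, b in zip(t, t[1:])) for t in tours}
    best = min(cost.values())
    return best, {t for t in tours if cost[t] == best}


c = [[0, 5, 1, 1],
     [5, 0, 1, 1],
     [1, 1, 0, 5],
     [1, 1, 5, 0]]  # hypothetical instance, violates the triangle inequality
c2 = add_dmax(c)    # d_max = 5: costs become 10 and 6
print(satisfies_triangle_inequality(c), satisfies_triangle_inequality(c2))  # False True
k, tours1 = optimal_tours(c)
k2, tours2 = optimal_tours(c2)
print(k, k2, tours1 == tours2)  # 4 24 True, since 24 = 4 + |V| * d_max
```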

Textbook 35.3 – The set-covering problem


▪ Optimization problem: Given k subsets $\{S_1, S_2, \ldots, S_k\}$ of $\{1, 2, \ldots, n\}$, find an index subset C of $\{1, 2, \ldots, k\}$ with minimum |C| s.t. $\bigcup_{i \in C} S_i = \{1, 2, \ldots, n\}$

The decision version of set cover is NP-complete:

1) it is in NP, and 2) it is NP-hard

▪ GREEDY-SET-COVER

▪ At each stage, pick the set S that covers the greatest number of remaining uncovered elements

▪ Running time = ?

    GREEDY-SET-COVER(S)
        I = Ø
        C = Ø
        while C ≠ {1, 2, …, n}
            select an index i maximizing |S_i − C|
            I = I ∪ {i}
            C = C ∪ S_i
        return I
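GREEDY-SET-COVER translates almost line for line into Python. This minimal sketch assumes the subsets are Python sets indexed from 0 (standing in for $S_1, \ldots, S_k$) and breaks ties by the smallest index. It also records the price of each element, as used in the analysis, so the claim that the prices sum to |I| can be checked. The instance is hypothetical.

```python
from fractions import Fraction


def greedy_set_cover(sets, n):
    """GREEDY-SET-COVER on the ground set {1, ..., n}.

    Returns (I, price): chosen indices in selection order, and
    price[j] = 1 / |S_i - C_i| for the iteration i that covered element j.
    """
    universe = set(range(1, n + 1))
    covered, chosen, price = set(), [], {}
    while covered != universe:
        # Index maximizing |S_i - C|; max() keeps the first index on ties.
        i = max(range(len(sets)), key=lambda t: len(sets[t] - covered))
        new = sets[i] - covered
        if not new:
            raise ValueError("the sets do not cover {1, ..., n}")
        for j in new:
            price[j] = Fraction(1, len(new))
        chosen.append(i)
        covered |= new
    return chosen, price


sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 4}, {2, 5}]  # hypothetical instance
chosen, price = greedy_set_cover(sets, 6)
print(chosen)               # [0, 2]
print(sum(price.values()))  # 2, the same as len(chosen)
```

Each iteration hands out a total price of exactly 1, which is why the prices always sum to |I|.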

Theorem. GREEDY-SET-COVER is an $H(n)$-approximation for the set cover problem, where $H(n) = \sum_{i=1}^{n} \frac{1}{i} \le \ln n + 1$.

▪ 3 things to check

▪ Q1: Does it give a feasible solution?

▪ A feasible output is a collection of subsets whose union is the ground set {1, 2, …, n}

▪ Q2: Does it run in polynomial time?

▪ Q3: Does it give an approximate solution with $\frac{|I|}{|I^*|} \le H(n)$?

▪ Let $I^*$ denote an optimal set cover. We plan to prove that $|I| \le H(n) \cdot |I^*|$, that is, $\frac{|I|}{|I^*|} \le H(n)$

▪ For brevity, we re-index the subsets s.t. for each i, $S_i$ is the i-th set selected by GREEDY-SET-COVER

▪ Let $C_i$ be the set C right before the elements of $S_i$ are inserted into C

▪ If an element j is inserted into C in the i-th iteration, the price of j is $\text{price}(j) = \frac{1}{|S_i - C_i|}$

▪ The sum of prices of all n integers is exactly $|I|$: iteration i inserts $|S_i - C_i|$ new elements at price $\frac{1}{|S_i - C_i|}$ each, contributing exactly 1


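▪ For brevity, we re-index the integers s.t. they are inserted into C in increasing order (elements inserted in the same iteration are ordered arbitrarily)

▪ When j is about to be put into C, there are at least n − j + 1 uncovered numbers (namely j, j + 1, …, n). $I^*$ is a collection of sets that can cover these n − j + 1 numbers.

▪ By the pigeonhole principle, there is an index $t \in I^*$ s.t. $S_t$ can cover at least $\frac{n-j+1}{|I^*|}$ uncovered numbers

▪ Since the greedy choice maximizes the number of newly covered elements, we have $|S_i - C_i| \ge |S_t - C_i| \ge \frac{n-j+1}{|I^*|}$, where j is inserted into C in the i-th iteration

▪ The price of j is $\text{price}(j) = \frac{1}{|S_i - C_i|} \le \frac{|I^*|}{n-j+1}$

▪ The sum of prices of all n integers is exactly $|I|$, and the price of j is at most $\frac{|I^*|}{n-j+1}$

▪ Therefore, we can prove that $|I| = \sum_{j=1}^{n} \text{price}(j) \le \sum_{j=1}^{n} \frac{|I^*|}{n-j+1} = H(n) \cdot |I^*| \le (\ln n + 1) \cdot |I^*|$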
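The bound can be spot-checked empirically. This minimal Python sketch assumes random instances with n = 8 elements and 6 random subsets (plus one extra set for any uncovered elements, so a cover always exists); these parameters are arbitrary. `optimal_cover_size` is a hypothetical brute-force helper for tiny inputs, and `harmonic` computes H(n) exactly with fractions.

```python
import itertools
import math
import random
from fractions import Fraction


def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


def greedy_cover_size(sets, universe):
    """|I| for GREEDY-SET-COVER (first maximizing set on ties)."""
    covered, size = set(), 0
    while covered != universe:
        covered |= max(sets, key=lambda s: len(s - covered))
        size += 1
    return size


def optimal_cover_size(sets, universe):
    """|I*| by brute force over all sub-collections (tiny inputs only)."""
    for r in range(1, len(sets) + 1):
        for combo in itertools.combinations(sets, r):
            if set().union(*combo) == universe:
                return r


random.seed(1)
n = 8
universe = set(range(1, n + 1))
for _ in range(300):
    sets = [set(random.sample(range(1, n + 1), random.randint(1, 4))) for _ in range(6)]
    missing = universe - set().union(*sets)
    if missing:  # make sure a cover exists
        sets.append(missing)
    assert greedy_cover_size(sets, universe) <= harmonic(n) * optimal_cover_size(sets, universe)
assert harmonic(n) <= math.log(n) + 1
print("greedy stayed within H(n) =", harmonic(n), "times the optimum")
```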

Textbook 35.4 – Randomization and linear programming


▪ A **randomized algorithm**'s behavior is determined not only by its input but also by values produced by a random-number generator

|               | **Exact**  | **Approximate** |
|---------------|------------|-----------------|
| Deterministic | MST        | APPROX-TSP-TOUR |
| Randomized    | Quick Sort | MAX-3-CNF-SAT   |

▪ Decision problem: satisfiability of Boolean formulas in 3-conjunctive normal form (3-CNF)

▪ 3-CNF = AND of clauses, each of which is the OR of exactly 3 distinct literals

▪ A literal is an occurrence of a variable or its negation, e.g., $x_1$ or $\neg x_1$

▪ A formula is satisfiable if some assignment of its variables makes every clause true

What is the optimization version of 3-CNF-SAT?

▪ Optimization problem (MAX-3-CNF-SAT): find an assignment of the variables that satisfies **as many clauses as possible**

▪ Closeness to optimum is measured by the fraction of satisfied clauses

[Figure: two assignments of the same formula; one satisfies 3 clauses, the other satisfies 2]

▪ A clause containing both a literal and its negation is always satisfied. For simplicity, we assume no clause contains both a literal and its negation.

▪ Randomly set each variable to 0 or 1 by tossing a fair coin

▪ Then output the resulting assignment; that is the whole algorithm
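The coin-tossing algorithm is short enough to write out in full. This minimal Python sketch assumes a DIMACS-like encoding (an assumption, not from the slides): each clause is a triple of nonzero integers where k means $x_k$ and −k means $\neg x_k$. The formula is hypothetical, and the random generator can be passed in to make runs reproducible.

```python
import random


def random_assignment(n, rng=random):
    """Set each variable x_1..x_n to 1 (True) or 0 (False) with probability 1/2."""
    return {i: rng.random() < 0.5 for i in range(1, n + 1)}


def count_satisfied(clauses, assignment):
    """Clauses are triples of nonzero ints: k means x_k, -k means NOT x_k."""
    return sum(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)


# Hypothetical formula: (x1 v x2 v x3) ^ (~x1 v x2 v ~x4) ^ (x1 v ~x3 v x4)
clauses = [(1, 2, 3), (-1, 2, -4), (1, -3, 4)]
a = random_assignment(4, random.Random(42))
print(a, count_satisfied(clauses, a))
```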

**Theorem 35.6.** Given an instance of MAX-3-CNF-SAT with n variables $x_1, x_2, \ldots, x_n$ and m clauses, the randomized algorithm that independently sets each variable to 1 with probability 1/2 and to 0 with probability 1/2 is a randomized 8/7-approximation algorithm.


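▪ Proof

▪ Each clause is the OR of exactly 3 distinct literals over 3 distinct variables (by our assumption), so clause i is unsatisfied only if all three literals are 0: $\Pr[\text{clause } i \text{ unsatisfied}] = (1/2)^3 = 1/8$

▪ Let $Y_i = 1$ if clause i is satisfied, and $Y = \sum_{i=1}^{m} Y_i$. By linearity of expectation, $E[Y] = \sum_{i=1}^{m} \Pr[\text{clause } i \text{ satisfied}] = \frac{7m}{8}$

▪ An optimal assignment satisfies at most m clauses, so the approximation ratio is at most $\frac{m}{7m/8} = \frac{8}{7}$

(satisfying 7/8 of the clauses in expectation)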
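The expectation in the proof can be computed exactly for a small formula by enumerating all $2^n$ assignments. This minimal Python sketch assumes the same DIMACS-like encoding as a hypothetical convention (k means $x_k$, −k means $\neg x_k$) and a hypothetical 3-clause formula whose clauses each use 3 distinct variables; the result matches $7m/8$ with m = 3.

```python
from fractions import Fraction
from itertools import product


def expected_satisfied(clauses, n):
    """Exact E[#satisfied clauses] under a uniform random assignment,
    computed by enumerating all 2^n assignments (tiny n only)."""
    total = 0
    for bits in product([False, True], repeat=n):
        a = dict(zip(range(1, n + 1), bits))
        total += sum(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)
    return Fraction(total, 2 ** n)


clauses = [(1, 2, 3), (-1, 2, -4), (1, -3, 4)]
print(expected_satisfied(clauses, 4))  # 21/8, i.e. 7/8 of m = 3 clauses
```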


Course Website: http://ada.miulab.tw Email: ada-ta@csie.ntu.edu.tw


Important announcements will be sent to your @ntu.edu.tw mailbox and posted to the course website