▪ Approximation Algorithms
▪ Examples
▪ Vertex Cover
▪ Traveling Salesman Problem
▪ Set Cover
▪ 3-CNF-SAT
▪ “A value or quantity that is nearly but not exactly correct”
▪ Approximation algorithms for optimization problems: the approximate solution is guaranteed to be close to the exact solution (i.e., the optimal value)
▪ Cf. heuristic search: no guarantee
▪ Note: we cannot approximate decision problems
[Figure: an approximate answer lies within an error bound of the exact answer]
▪ Most practical optimization problems are NP-hard
▪ It is widely believed that P ≠ NP
▪ Thus, polynomial-time algorithms are unlikely, and we must sacrifice either optimality, efficiency, or generality
▪ Approximation algorithms sacrifice optimality, return near-optimal answers
▪ How “near” is near-optimal?
▪ ρ(n)-approximation algorithm
▪ Approximation ratio
▪ n: input size
▪ C*: cost of an optimal solution
▪ C: cost of the solution produced by the approximation algorithm
Maximization problem: C*/C ≤ ρ(n)
Minimization problem: C/C* ≤ ρ(n)
▪ ρ(n) ≥ 1 (equivalently, max(C/C*, C*/C) ≤ ρ(n))
▪ Smaller is better (ρ(n) = 1 indicates an exact algorithm)
▪ Challenge: prove that C is close to C* without knowing C*
Textbook 35.1 – The vertex-cover problem
▪ A vertex cover of G = (V, E) is a subset V’ ⊆ V s.t. if (w, v) ∈ E, then w ∈ V’ or v ∈ V’
▪ A vertex cover “covers” every edge in G
▪ Optimization problem: find a minimum size vertex cover in G
▪ Decision problem: is there a vertex cover of size at most k?
NP-complete
▪ Idea: cover as many edges as possible (vertex with the maximum degree) at each stage and then delete the covered edges
[Figure: the max-degree greedy heuristic run step by step on an example graph with vertices a–g]
▪ The greedy heuristic cannot always find an optimal solution (otherwise we would have a proof that P = NP)
▪ There is no guarantee that C is always close to C* either
▪ APPROX-VERTEX-COVER
▪ Select an arbitrary edge at a time
▪ Remove all incident edges
▪ Running time = O(V + E)
APPROX-VERTEX-COVER(G)
  C = Ø
  E’ = G.E
  while E’ ≠ Ø
    let (u, v) be an arbitrary edge of E’
    C = C ∪ {u, v}
    remove from E’ every edge incident on either u or v
  return C
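The pseudocode above can be sketched in Python. The edge-list representation and the example edge set below are assumptions, since the slides give the graph only as a figure.

```python
# A minimal sketch of APPROX-VERTEX-COVER, assuming the graph is given
# as a list of undirected edges (u, v).
def approx_vertex_cover(edges):
    cover = set()
    remaining = list(edges)
    while remaining:
        u, v = remaining[0]                  # an arbitrary edge of E'
        cover.update((u, v))                 # C = C ∪ {u, v}
        # remove from E' every edge incident on either u or v
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover

# Hypothetical edge list for a graph on vertices a-g:
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "e"), ("d", "f"), ("d", "g"), ("e", "f")]
print(sorted(approx_vertex_cover(edges)))
```

This list-based sketch rescans the edge list on every iteration, so it is not the O(V + E) implementation; with adjacency lists the same procedure meets that bound.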
[Figure: APPROX-VERTEX-COVER run on the example graph a–g, first adding endpoints b and c, then d and f]
{b, c, d, f} is a vertex cover of size 4 found by the approximation algorithm (not optimal!)
Theorem. APPROX-VERTEX-COVER is a 2-approximation for the vertex cover problem.
▪ 3 things to check
▪ Q1: Does it give a feasible solution?
▪ A feasible solution for vertex cover is a node set that covers all the edges
▪ Finding an optimal solution is hard, but finding a feasible one could be easy
▪ Q2: Does it run in polynomial time?
▪ An exponential-time algorithm is not qualified to be an approximation algorithm
▪ Q3: Does it give an approximate solution with approximation ratio ≤ 2?
▪ Other names: 2-approximate solution, factor-2 approximation
▪ Suppose that the algorithm runs for k iterations. Let C be the output of APPROX-VERTEX-COVER. Let OPT be any optimal vertex cover of G.
▪ If k = 0, then C = Ø and the graph has no edges, so C is optimal
▪ If k > 0, then |C| = 2k. It suffices to ensure that |OPT| ≥ k
▪ Observe that the k edges (u, v) chosen by APPROX-VERTEX-COVER in those k iterations form a matching of G: no two of them share an endpoint. Just for OPT (or any feasible solution) to cover this matching requires at least k nodes, one per matched edge
Prove that C ≤ 2C*. That is, |C| ≤ 2 · |OPT|.
The proof doesn’t require knowing the actual value of C*!
▪ Tight analysis: check whether we underestimate the quality of the approximate solution obtained by APPROX-VERTEX-COVER
▪ This factor-2 approximation is still the best known polynomial-time approximation for vertex cover
▪ Even reducing the ratio to 1.99 would be a significant result
Yes, it is tight!
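The tightness can be seen on a graph that is a perfect matching; the sketch below uses the same edge-list representation as before, and the matching instance is my own illustration, not from the slides.

```python
# On k disjoint edges, the algorithm adds both endpoints of every chosen
# edge (2k nodes), while one endpoint per edge (k nodes) is an optimal
# cover, so the ratio is exactly 2.
def approx_vertex_cover(edges):
    cover, remaining = set(), list(edges)
    while remaining:
        u, v = remaining[0]
        cover.update((u, v))
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover

matching = [(2 * i, 2 * i + 1) for i in range(5)]   # 5 disjoint edges
print(len(approx_vertex_cover(matching)))           # 10, vs. optimal 5
```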
▪ C is a vertex cover of graph G=(V, E) iff V – C is an independent set of G
▪ Q: Does a 2-approximation algorithm for vertex cover imply a 2- approximation for maximum independent set?
A 100-node example: optimal independent set 51 nodes, so optimal vertex cover 49 nodes. A 2-approximate vertex cover may use 98 nodes, whose complement is an independent set of only 2 nodes, far more than a factor of 2 from the 51-node optimum. So the answer is no.
Textbook 35.2 – The traveling-salesman problem
▪ Optimization problem: Given a set of cities and their pairwise distances, find a tour of lowest cost that visits each city exactly once.
▪ Inter-city distances satisfy the triangle inequality if d(u, w) ≤ d(u, v) + d(v, w) for all vertices u, v, w
[Figure: two 4-city examples on vertices u, v, x, y, one with distances satisfying the triangle inequality and one without]
▪ APPROX-TSP-TOUR
▪ Grow an MST from a random root
▪ MST-PRIM
▪ For (n - 1) iterations, add the least-weighted edge incident to the current subtree that does not incur a cycle
▪ Running time = O(V²)
APPROX-TSP-TOUR(G, d)
  select a vertex r from G.V as a “root” vertex
  grow a minimum spanning tree T for G from root r using MST-PRIM(G, d, r)
  let H be the list of vertices visited in a preorder tree walk of T
  return the Hamiltonian cycle H
H = a, b, c, h, d, e, f, g, a
H* = a, b, c, h, f, g, e, d, a
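A sketch of APPROX-TSP-TOUR for points in the plane, where Euclidean distances automatically satisfy the triangle inequality. The array-based MST-PRIM matches the O(V²) bound on a complete graph; the coordinates and names are illustrative.

```python
from math import dist, inf

def approx_tsp_tour(points):
    n = len(points)
    # MST-PRIM from root 0, O(n^2) on the complete graph
    parent = [0] * n
    key = [inf] * n
    key[0] = 0.0
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=key.__getitem__)
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)   # parent[u] is final once u is extracted
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < key[v]:
                    key[v], parent[v] = d, u
    # preorder walk of the MST
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]   # return to the root

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
print(approx_tsp_tour(points))
```

By the analysis below, the returned tour costs at most twice the optimum on any metric instance.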
Theorem. APPROX-TSP-TOUR is a 2-approximation for the TSP with the triangle inequality.
▪ 3 things to check
▪ Q1: Does it give a feasible solution?
▪ A feasible solution is a tour of G visiting each city exactly once
▪ The completeness of the graph is needed here, so that every edge used by the preorder shortcut exists
▪ Q2: Does it run in polynomial time?
▪ Q3: Does it give an approximate solution with approximation ratio ≤ 2?
▪ With the triangle inequality: c(H) ≤ c(W) = 2c(T), where W is the full walk of T and H shortcuts W’s repeated vertices
▪ Let H* denote an optimal tour; deleting one edge from H* leaves a spanning tree, so c(T) ≤ c(H*)
▪ Hence, c(H) ≤ 2c(T) ≤ 2c(H*)
Prove that C ≤ 2C*. That is, c(H) ≤ 2c(H*).
Theorem 35.3. If P ≠ NP, there is no polynomial-time approximation algorithm with a constant ratio bound ρ for the general TSP
▪ Proof by contradiction
▪ Suppose there is such an algorithm A with a constant ratio ρ. We will use A to solve HAM-CYCLE in polynomial time.
▪ Algorithm for HAM-CYCLE
▪ Convert G = (V, E) into an instance I of TSP with cities V (resulting in a complete graph G′ = (V, E′)): each edge in E gets cost 1, and each edge in E′ − E gets cost ρ|V| + 1
▪ Run A on I
▪ If the reported cost ≤ ρ|V|, then return “Yes” (i.e., G contains a Hamiltonian cycle); otherwise return “No”
▪ Analysis
▪ If G has an HC: G′ contains a tour of cost |V|, by picking the edges in E, each of cost 1
▪ If G does not have an HC: any tour of G′ must use some edge not in E, so its total cost is at least (ρ|V| + 1) + (|V| − 1) > ρ|V|
▪ Algorithm A guarantees to return a tour of cost at most ρ times the optimal cost
▪ Hence A returns a cost ≤ ρ|V| if G contains an HC, and a cost > ρ|V| otherwise
▪ HAM-CYCLE can be solved in polynomial time, a contradiction
[Figure: the reduction on a 5-vertex example: u, y, v, w, x, u is a Hamiltonian cycle in G, and the same vertex order is a traveling-salesman tour of cost |V| in G′]
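The reduction can be sketched as follows. The graph names and the brute-force TSP solver are my own additions, used only to exhibit the cost gap on tiny instances.

```python
# A sketch of the reduction HAM-CYCLE <=p TSP from the proof: each edge
# of G gets cost 1 and each non-edge gets cost rho*|V| + 1.
from itertools import permutations

def tsp_instance(n, edges, rho):
    E = {frozenset(e) for e in edges}
    return lambda u, v: 1 if frozenset((u, v)) in E else rho * n + 1

def min_tour_cost(n, cost):
    # brute force over all tours; fine for the 4-vertex examples below
    return min(sum(cost(t[i], t[(i + 1) % n]) for i in range(n))
               for t in permutations(range(n)))

rho, n = 2, 4
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # C4: has a Hamiltonian cycle
path = [(0, 1), (1, 2), (2, 3)]            # P4: has none
print(min_tour_cost(n, tsp_instance(n, cycle, rho)))   # 4 (= |V|)
print(min_tour_cost(n, tsp_instance(n, path, rho)))    # > rho*|V| = 8
```

A ρ-approximation for TSP would thus separate the two cases, deciding HAM-CYCLE.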
Show how in polynomial time we can transform one instance of the traveling-salesman problem into another instance whose cost function satisfies the triangle inequality. The two instances must have the same set of optimal tours. Explain why such a polynomial-time transformation does not contradict Theorem 35.3, assuming that P ≠ NP.
[Figure: a TSP instance on u, v, x, y with edge costs 1 and 5 violating the triangle inequality; which costs should the transformed instance use?]
▪ For example, we can add dmax (the largest edge cost) to each edge
▪ G contains a tour of minimum cost k ⟺ G′ contains a tour of minimum cost k + |V| · dmax, since every tour uses exactly |V| edges
▪ G′ satisfies the triangle inequality because for all vertices u, v, w: d′(u, w) = d(u, w) + dmax ≤ d(u, v) + d(v, w) + 2dmax = d′(u, v) + d′(v, w), using d(u, w) ≤ dmax
[Figure: the transformation with dmax = 5: edge costs 1 and 5 in the TSP instance w/o the triangle inequality become 1 + dmax = 6 and 5 + dmax = 10 in the instance w/ the triangle inequality]
▪ Why this does not contradict Theorem 35.3: the additive shift raises the cost of every tour by |V| · dmax, so a tour within factor ρ of optimal in G′ can correspond to a tour arbitrarily far from optimal in G; optimal tours are preserved, but approximation ratios are not
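The add-dmax transformation can be checked mechanically. The 4-city cost matrix below follows my reading of the example's costs 1 and 5; the function names are illustrative.

```python
# Verify that adding d_max to every edge cost makes the triangle
# inequality hold on the example instance.
def satisfies_triangle(d):
    cities = list(d)
    return all(d[u][w] <= d[u][v] + d[v][w]
               for u in cities for v in cities for w in cities
               if len({u, v, w}) == 3)

# Assumed costs from the figure: u-v and x-y cost 5, all other pairs cost 1.
d = {"u": {"v": 5, "x": 1, "y": 1},
     "v": {"u": 5, "x": 1, "y": 1},
     "x": {"u": 1, "v": 1, "y": 5},
     "y": {"u": 1, "v": 1, "x": 5}}
dmax = 5
d2 = {a: {b: w + dmax for b, w in row.items()} for a, row in d.items()}

print(satisfies_triangle(d))    # False: 5 > 1 + 1
print(satisfies_triangle(d2))   # True
```

Every transformed cost lies between dmax and 2·dmax, so any two edges together cost at least as much as any single edge; meanwhile every tour's cost grows by exactly |V| · dmax, which preserves optimal tours but not approximation ratios.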
Textbook 35.3 – The set-covering problem
▪ Optimization problem: Given k subsets {S1, S2, …, Sk} of {1, 2, …, n}, find an index subset C of {1, 2, …, k} with minimum |C| s.t. ∪i∈C Si = {1, 2, …, n}
Set cover is NP-complete.
1) It is in NP 2) It is NP-hard
▪ GREEDY-SET-COVER
▪ At each stage, pick the set Si that covers the greatest number of still-uncovered elements
▪ Running time = ?
GREEDY-SET-COVER(S1, S2, …, Sk)
  I = Ø
  C = Ø
  while C ≠ {1, 2, …, n}
    select an index i maximizing |Si − C|
    I = I ∪ {i}
    C = C ∪ Si
  return I
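A sketch of GREEDY-SET-COVER in Python, assuming the subsets are given as Python sets whose union covers the ground set; the example instance is my own.

```python
# Greedy set cover: repeatedly take the set covering the most
# still-uncovered elements until the ground set is covered.
def greedy_set_cover(subsets, ground):
    I, covered = [], set()
    while covered != ground:
        # index i maximizing |S_i - C| (ties broken by smallest index)
        i = max(range(len(subsets)), key=lambda j: len(subsets[j] - covered))
        I.append(i)
        covered |= subsets[i]
    return I

subsets = [{0, 1, 2, 3}, {4, 5, 6, 7}, {0, 4}, {1, 5}, {2, 6, 8}, {3, 7, 8}]
print(greedy_set_cover(subsets, set(range(9))))   # [0, 1, 4]
```

On this instance greedy uses 3 sets, which also happens to be optimal; the theorem below bounds the general gap by H(n).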
Theorem. GREEDY-SET-COVER is an H(n)-approximation for the set cover problem, where H(n) = 1 + 1/2 + … + 1/n ≤ ln n + 1.
▪ 3 things to check
▪ Q1: Does it give a feasible solution?
▪ A feasible solution output is a collection of subsets whose union is the ground set {1, 2, …, n}.
▪ Q2: Does it run in polynomial time?
▪ Q3: Does it give an approximate solution with approximation ratio ≤ H(n)?
▪ Let I* denote an optimal set cover. We plan to prove that |I| ≤ H(n) · |I*|.
▪ For brevity, we re-index those subsets s.t. for each i, Si is the i-th set selected by GREEDY-SET-COVER
▪ Let Ci be the set C right before the elements of Si are inserted into C
▪ If an element j is inserted into C in the i-th iteration, the price of j is price(j) = 1 / |Si − Ci|
▪ Each iteration distributes a total price of exactly 1, so the sum of the prices of all n integers is exactly |I|
[Figure: elements charged example prices such as 1/3, 1/8, 1/1 as they are covered]
▪ For brevity, we re-index the integers s.t. they are inserted into C in increasing order
▪ When j is about to be put into C, there are at least n − j + 1 uncovered numbers, and I* is a collection of sets covering all of them
▪ There is an index t ∈ I* s.t. St covers at least (n − j + 1) / |I*| of the uncovered numbers
▪ Since the greedy rule picks a set covering at least as many uncovered numbers as St, we have |Si − Ci| ≥ (n − j + 1) / |I*|, where j is inserted into C in the i-th iteration
▪ The price of j is therefore price(j) = 1 / |Si − Ci| ≤ |I*| / (n − j + 1)
▪ The sum of the prices of all n integers is exactly |I|
▪ Therefore, |I| = Σ_{j=1..n} price(j) ≤ |I*| · Σ_{j=1..n} 1/(n − j + 1) = |I*| · H(n)
Textbook 35.4 – Randomization and linear programming
▪ Randomized algorithm’s behavior is determined not only by its input but also by values produced by a random-number generator
               | Exact      | Approximate
Deterministic  | MST        | APPROX-TSP-TOUR
Randomized     | Quick Sort | MAX-3-CNF-SAT
▪ Decision problem: Satisfiability of Boolean formulas in 3-conjunctive normal form (3-CNF)
▪ 3-CNF = AND of clauses, each of which is the OR of exactly 3 distinct literals
▪ A literal is an occurrence of a variable or its negation, e.g., x1 or ¬x1
→ satisfiable
What is the optimization version of 3-CNF-SAT?
▪ Optimization problem: find an assignment of the variables that satisfies as many clauses as possible
▪ Closeness to optimum is measured by the fraction of satisfied clauses
[Example: one assignment satisfies 3 clauses while another satisfies only 2; a clause containing both a literal and its negation is always satisfied]
For simplicity, we assume no clause containing both literal and its negation.
▪ Randomly and independently set each variable to 0 or 1, as if flipping a fair coin
Theorem 35.6. Given an instance of MAX-3-CNF-SAT with n variables x1, x2, …, xn and m clauses, the randomized algorithm that independently sets each variable to 1 with probability 1/2 and to 0 with probability 1/2 is a randomized 8/7-approximation algorithm.
▪ Proof
▪ Each clause is the OR of exactly 3 distinct literals, so it is unsatisfied only when all 3 literals are set to 0, which happens with probability (1/2)³ = 1/8
▪ Hence each clause is satisfied with probability 7/8, and the expected number of satisfied clauses is 7m/8; the approximation ratio is m / (7m/8) = 8/7
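The expectation can be checked numerically: averaging the number of satisfied clauses over all 2^n assignments gives exactly 7m/8. The clause encoding (variable index, negated?) and the example formula below are my own.

```python
# Randomized MAX-3-CNF: set each variable to 1 with probability 1/2.
import random
from itertools import product

def random_assignment(n):
    return [random.randint(0, 1) for _ in range(n)]

def satisfied(clauses, x):
    # a literal (v, neg) is true when x[v] differs from its negation flag
    return sum(any(x[v] != neg for (v, neg) in clause) for clause in clauses)

# Exact expectation check over all assignments of a small formula:
clauses = [[(0, False), (1, False), (2, True)],
           [(0, True), (1, False), (3, False)]]
n, m = 4, len(clauses)
total = sum(satisfied(clauses, list(x)) for x in product((0, 1), repeat=n))
print(total / 2 ** n)   # 1.75 = 7 * m / 8
```

Each clause has 3 distinct variables, so exactly 1/8 of the assignments leave it unsatisfied, matching the proof.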
▪ Most practical optimization problems are NP-hard
▪ It is widely believed that P ≠ NP
▪ Thus, polynomial-time algorithms are unlikely, and we must sacrifice either optimality, efficiency, or generality
▪ Approximation algorithms sacrifice optimality, return near-optimal answers
Maximization problem: C*/C ≤ ρ(n)
Minimization problem: C/C* ≤ ρ(n)
Course Website: http://ada.miulab.tw Email: ada-ta@csie.ntu.edu.tw
Important announcements will be sent to your @ntu.edu.tw mailbox and posted to the course website.