Theory of Computation Lecture Notes
Prof. Yuh-Dauh Lyuu
Dept. of Computer Science & Information Engineering
and Department of Finance
National Taiwan University
Class Information
• Papadimitriou. Computational Complexity. 2nd printing. Addison-Wesley. 1995.
– The best book on the market for graduate students.
– We more or less follow the topics of the book.
– More “advanced” materials may be added.
• Check www.csie.ntu.edu.tw/~lyuu/complexity/2003 for last year’s lecture notes.
Class Information (concluded)
• More information and future lecture notes (in PDF format) can be found at www.csie.ntu.edu.tw/~lyuu/complexity.html
• Please ask many questions in class.
– The best way for me to remember you in a large class.
• Teaching assistants will be announced later.
Grading
• No roll calls.
• No homework.
– Try some of the exercises at the end of each chapter.
• Two to three examinations.
• You must show up for the examinations, in person.
• If you cannot make it to an examination, please email me beforehand (unless there is a legitimate reason).
• Missing the final examination will earn a “fail” grade.
A Brief History (Biased towards Complexity)
1930–1931: Gödel’s (1906–1978) completeness and incompleteness theorems and recursive functions.
1935–1936: Kleene (1909–1994), Turing (1912–1954), Church (1903–1995), and Post (1897–1954) on computability.
1936: Turing defined Turing machines and oracle Turing machines.
1938: Shannon (1916–2001) used boolean algebra for the design and analysis of switching circuits; circuit complexity was also born. Shannon’s master’s thesis has been called “possibly the most important, and also the most famous, master’s thesis of the century.”
A Brief History (continued)
1947: Dantzig invented the simplex algorithm for linear programming.
1947: Paul Erdős (1913–1996) popularized the probabilistic method. (Also Shannon (1948).)
1949: Shannon established information theory.
1949: Shannon’s study of cryptography was published.
1956: Ford and Fulkerson’s network flows.
A Brief History (continued)
1964–1966: Solomonoff, Kolmogorov, and Chaitin formalized Kolmogorov complexity (program size and randomness).
1965: Hartmanis and Stearns started complexity theory and hierarchy theorems (see also Rabin (1960)).
1965: Edmonds identified NP and P (actual names were coined by Karp in 1972).
1971: Cook invented the idea of NP-completeness.
1972: Karp established the importance of NP-completeness.
1972–1973: Karp, Meyer, and Stockmeyer defined the polynomial hierarchy.
A Brief History (continued)
1973: Karp studied PSPACE-completeness.
1973: Meyer and Stockmeyer studied exponential time and space.
1973: Baker, Gill, and Solovay studied “NP=P” relative to oracles.
1975: Ladner studied P-completeness.
1976–1977: Rabin, Solovay, Strassen, and Miller proposed probabilistic algorithms (for primality testing).
1976–1978: Diffie, Hellman, and Merkle invented public-key cryptography.
A Brief History (continued)
1977: Gill formalized randomized complexity classes.
1978: Rivest, Shamir, and Adleman invented RSA.
1978: Fortune and Wyllie defined the PRAM model.
1979: Garey and Johnson published their book on computational complexity.
1979: Valiant defined #P.
1979: Pippenger defined NC.
1979: Khachiyan proved that linear programming is in polynomial time.
A Brief History (continued)
1980: Lamport, Shostak, and Pease defined the Byzantine agreement problem in distributed computing.
1981: Shamir proposed cryptographically strong pseudorandom numbers.
1982: Goldwasser and Micali proposed probabilistic encryption.
1982: Yao founded secure multiparty computation.
1982: Goldschlager, Shaw, and Staples proved that the maximum flow problem is P-complete.
1982–1984: Yao, Blum, and Micali founded pseudorandom number generation on complexity theory.
A Brief History (continued)
1983: Ajtai, Komlós, and Szemerédi constructed an O(log n)-depth, O(n log n)-size sorting network.
1984: Valiant founded computational learning theory.
1984–1985: Furst, Saxe, Sipser, and Yao proved exponential bounds for parity circuits of constant depth.
1985: Razborov proved exponential lower bounds for monotone circuits.
1985: Goldwasser, Micali, and Rackoff invented zero-knowledge proofs.
A Brief History (continued)
1986: Goldreich, Micali, and Wigderson proved that every problem in NP has a zero-knowledge proof under certain complexity assumptions.
1987: Adleman and Huang proved that primality testing can be solved in randomized polynomial time.
1987–1988: Szelepcsényi and Immerman proved that NL equals coNL.
1989: Blum and Kannan proposed program checking.
1990: Shamir proved IP = PSPACE.
1990: Du and Hwang settled the Gilbert-Pollak conjecture on Steiner tree problems.
A Brief History (concluded)
1992: Arora, Lund, Motwani, Sudan, and Szegedy proved the PCP theorem.
1993: Bernstein, Vazirani, and Yao established quantum complexity theory.
1994: Shor presented a quantum polynomial-time algorithm for factoring.
1996: Ajtai on the shortest lattice vector problem.
2002: Agrawal, Kayal, and Saxena discovered a deterministic polynomial-time algorithm for primality testing.
I have never done anything “useful.”
— Godfrey Harold Hardy (1877–1947), A Mathematician’s Apology (1940)
What This Course Is All About
Computability: What can be computed?
• What is computation anyway?
• There are well-defined problems that cannot be computed.
What This Course Is All About (concluded)
Complexity: What is a computable problem’s inherent complexity?
• Some computable problems require at least exponential time and/or space; they are intractable.
• Some practical problems require superpolynomial resources unless certain conjectures are disproved.
• Other resource limits besides time and space?
– Program size, circuit size (growth), number of random bits, etc.
Tractability and Intractability
• Polynomial in terms of the input size n defines tractability.
– n, n log n, n^2, n^90.
– Time, space, circuit size, number of random bits, etc.
• It results in a fruitful and practical theory of complexity.
• Few practical, tractable problems require a large degree.
• Exponential-time or superpolynomial-time algorithms are usually impractical (see the check below).
– n^{log n}, 2^{√n}, 2^n.
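A quick numeric check, not part of the original notes: in Python one can watch 2^n overtake even n^90 (the sample points are arbitrary).

    # 2^n eventually dwarfs n^90, despite the huge degree.
    for n in (10, 100, 1000):
        print(n, 2 ** n > n ** 90)    # False, False, True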
Growth of Factorials
n    n!        n    n!
1    1         9    362,880
2    2         10   3,628,800
3    6         11   39,916,800
4    24        12   479,001,600
5    120       13   6,227,020,800
6    720       14   87,178,291,200
7    5,040     15   1,307,674,368,000
8    40,320    16   20,922,789,888,000

Most Important Results: a Sampler
• An operational definition of computability.
• Decision problems in logic are undecidable.
• Decision problems on program behavior are usually undecidable.
• Complexity classes and the existence of intractable problems.
• Complete problems for a complexity class.
• Randomization and cryptographic applications.
• Approximability.
What Is Computation?
• That which can be coded in an algorithm.
• An algorithm is a detailed step-by-step method for solving a problem.
– The Euclidean algorithm for the greatest common divisor is an algorithm.
– “Let s be the least upper bound of compact set A” is not an algorithm.
– “Let s be a smallest element of a finite-sized array” can be solved by an algorithm.
Turing Machines
• A Turing machine (TM) is a quadruple M = (K, Σ, δ, s).
• K is a finite set of states.
• s ∈ K is the initial state.
• Σ is a finite set of symbols (disjoint from K).
– Σ includes ⊔ (blank) and ▷ (first symbol).
• δ : K × Σ → (K ∪ {h, “yes”, “no”}) × Σ × {←, →, −} is a transition function.
– ← (left), → (right), and − (stay) signify cursor movements.
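A minimal Python sketch of the quadruple may help fix ideas. It is an illustration, not the course’s formalism; the ASCII stand-ins for ⊔ and ▷ and the helper names are assumptions.

    LEFT, RIGHT, STAY = "<-", "->", "-"
    BLANK, FIRST = "_", ">"               # stand-ins for the blank and first symbol

    def run(delta, s, x, max_steps=10_000):
        """Simulate M = (K, Sigma, delta, s) on input x.

        delta maps (state, symbol) to (state, symbol, direction);
        "h", "yes", and "no" are the halting states."""
        tape = [FIRST] + list(x)          # tape starts with > followed by x
        q, cursor = s, 0                  # cursor on the first symbol
        for _ in range(max_steps):
            if q in ("h", "yes", "no"):   # halted
                return q, "".join(tape)
            if cursor == len(tape):       # moving onto fresh tape yields a blank
                tape.append(BLANK)
            q, tape[cursor], move = delta[(q, tape[cursor])]
            if move == LEFT:              # delta(q, >) = (p, >, ->) keeps cursor >= 0
                cursor -= 1
            elif move == RIGHT:
                cursor += 1
        raise RuntimeError("no halt within max_steps")

    # Example: a one-state machine that accepts immediately.
    delta = {("s", ">"): ("yes", ">", "->")}
    print(run(delta, "s", "")[0])         # yes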
A TM Schema
[Figure: a TM schema: the finite control δ, the tape, and the cursor.]
“Physical” Interpretations
• The tape: computer memory and registers.
• δ: program.
• K: instruction numbers.
• s: “main()” in C.
More about δ
• The program has the halting state (h), the accepting state (“yes”), and the rejecting state (“no”).
• Given current state q ∈ K and current symbol σ ∈ Σ, δ(q, σ) = (p, ρ, D).
– It specifies the next state p, the symbol ρ to be written over σ, and the direction D the cursor will move afterwards.
• We require δ(q, ▷) = (p, ▷, →) so that the cursor never falls off the left end of the string.
The Operations of TMs
• Initially the state is s.
• The string on the tape is initialized to a ▷, followed by a finite-length string x ∈ (Σ − {⊔})*.
• x is the input of the TM.
– The input must not contain ⊔s (why?)!
• The cursor is pointing to the first symbol, always a ▷.
• The TM takes each step according to δ.
• The cursor may overwrite a ⊔ to make the string longer during the computation.
Program Count
• A program has a finite size.
• Recall that
δ : K × Σ → (K ∪ {h, “yes”, “no”}) × Σ × {←, →, −}.
• So |K| × |Σ| “lines” suffice to specify a program, one line per pair from K × Σ.
• Given K and Σ, there are
((|K| + 3) × |Σ| × 3)^{|K| × |Σ|}
possible δ’s (see next page).
– This is a constant—albeit large.
[Figure: each of the |K| × |Σ| lines of δ has (|K| + 3) × |Σ| × 3 possibilities.]
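A quick evaluation of the count, with assumed illustrative sizes |K| = 5 and |Σ| = 4:

    # ((|K| + 3) * |Sigma| * 3) choices per line, |K| * |Sigma| lines.
    K, S = 5, 4
    per_line = (K + 3) * S * 3
    print(per_line ** (K * S))    # 96^20: a constant, albeit large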
The Halting of a TM
• A TM M may halt in three cases.
“yes”: M accepts its input x, and M(x) = “yes”.
“no”: M rejects its input x, and M(x) = “no”.
h: M(x) = y, where the string consists of a ▷, followed by a finite string y, whose last symbol is not ⊔, followed by a string of ⊔s.
– y is the output of the computation.
– y may be empty, denoted by ε.
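A sketch of reading off y under the h convention, reusing the assumed ASCII stand-ins from the earlier simulator:

    def output(tape, blank="_", first=">"):
        """Extract y: drop the leading > and strip the trailing blanks."""
        assert tape[0] == first
        return "".join(tape[1:]).rstrip(blank)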
Why TMs?
• Because of its simplicity, the TM model has advantages when it comes to complexity issues.
• One can develop a complexity theory based on C++ or Java, say.
• But the added complexity does not yield additional fundamental insights.
The Concept of Configuration
• A configuration is a complete description of the current state of the computation.
• The specification of a configuration is sufficient for the computation to continue as if it had not been stopped.
– What does your PC save before it sleeps?
– Enough for it to resume work later.
• Similar to the concept of a Markov process in stochastic processes or dynamical systems.
Configurations (concluded)
• A configuration is a triple (q, w, u):
– q ∈ K.
– w ∈ Σ∗ is the string to the left of the cursor (inclusive).
– u ∈ Σ∗ is the string to the right of the cursor.
• Note that (w, u) describes both the string and the cursor position.
• Example: in the string 1000110000111001110001110 with the cursor on the tenth symbol,
– w = 1000110000,
– u = 111001110001110.
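The same bookkeeping in a Python sketch (function names are illustrative):

    def configuration(q, tape, cursor):
        """The triple (q, w, u): w ends at the cursor, u is what follows."""
        return (q, "".join(tape[:cursor + 1]), "".join(tape[cursor + 1:]))

    def restore(config):
        """Rebuild (state, tape, cursor); the computation can resume from here."""
        q, w, u = config
        return q, list(w + u), len(w) - 1    # cursor on the last symbol of w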
Yielding
• Fix a TM M.
• Configuration (q, w, u) yields configuration (q′, w′, u′) in one step,
(q, w, u) −→^M (q′, w′, u′),
if a step of M from configuration (q, w, u) results in configuration (q′, w′, u′).
• (q, w, u) −→^{M^k} (q′, w′, u′): configuration (q, w, u) yields configuration (q′, w′, u′) in k ∈ N steps.
• (q, w, u) −→^{M^*} (q′, w′, u′): configuration (q, w, u) yields configuration (q′, w′, u′) in zero or more steps.
Example: How to Insert a Symbol
• We want to compute f(x) = ax.
– The TM moves the last symbol of x to the right by one position.
– It then moves the next to last symbol to the right, and so on.
– The TM finally writes a in the first position.
• The total number of steps is O(n), where n is the length of x.
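The shifting idea in a Python sketch, with a list standing in for the tape (illustrative only):

    def insert_front(tape, a):
        """Shift the whole string right one cell, last symbol first; then write a."""
        tape.append("_")                  # a fresh blank cell absorbs the shift
        for i in range(len(tape) - 2, -1, -1):
            tape[i + 1] = tape[i]         # move each symbol right, back to front
        tape[0] = a                       # O(n) moves in total
        return tape

    print("".join(insert_front(list("bc"), "a")))    # abc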
Palindromes
• A string is a palindrome if it reads the same forwards and backwards (e.g., 001100).
• A TM program can be written to recognize palindromes:
– It matches the first character with the last character.
– It matches the second character with the next to last character, etc. (see next page).
– “yes” for palindromes and “no” for nonpalindromes.
• This program takes O(n^2) steps.
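A sketch of why the count is quadratic, assuming the natural single-tape strategy; the moves counter approximates the cursor’s back-and-forth walking:

    def palindrome(s):
        """Compare outermost pairs, charging the walk between the two ends."""
        tape, moves = list(s), 0
        i, j = 0, len(tape) - 1
        while i < j:
            moves += 2 * (j - i)          # walk to the far end and back
            if tape[i] != tape[j]:
                return "no", moves
            i, j = i + 1, j - 1
        return "yes", moves               # total is about n^2 / 2

    print(palindrome("001100"))           # ('yes', 18)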
A Matching Lower Bound for palindrome
Theorem 1 (Hennie (1965)) palindrome on single-string TMs takes Ω(n^2) steps in the worst case.
The Proof: Setup
[Figure: the input is x 0^m y^r, e.g., 100011 000000 100111; a cut is placed in the middle section; each crossing of the cut communicates the current state, at most log_2 | K | bits; the machine answers yes/no to P(x, y).]
The Proof: Communications
• Our input is more restricted; hence any lower bound holds for the original problem.
• Each communication between the two halves across the cut is a state from K, hence of size O(1).
• C(x, x): the sequence of communications for the palindrome problem P(x, x) across the cut.
The Proof: Communications (concluded)
• C(x, x) ≠ C(y, y) when x ≠ y.
– Suppose otherwise: C(x, x) = C(y, y).
– Then C(y, y) = C(x, y) by the cut-and-paste argument (see next page).
– Hence P(x, y) has the same answer as P(y, y), i.e., “yes,” even though x ≠ y!
• So C(x, x) is distinct for each x.
[Figure: the cut-and-paste argument: the computations on x x^r and y y^r have identical communication sequences at the cut, so their halves can be spliced into a computation on x y^r with the same behavior.]
The Proof: Amount of Communications
• Assume | x | = | y | = m = n/3.
• | C(x, x) | is the number of times the cut is crossed.
• We first seek a lower bound on the total number of communications:
∑_{x∈{0,1}^m} | C(x, x) |.
• Define κ ≡ log_{| K |}((| K | − 1) 2^{m+1}/m) − 1, so that | K |^{κ+1}/(| K | − 1) = 2^{m+1}/m; note that κ = Θ(m).
The Proof: Amount of Communications (continued)
• There are at most | K |^i distinct C(x, x)’s with | C(x, x) | = i.
• Hence there are at most
∑_{i=0}^{κ} | K |^i = (| K |^{κ+1} − 1)/(| K | − 1) ≤ | K |^{κ+1}/(| K | − 1) = 2^{m+1}/m
distinct C(x, x)’s with | C(x, x) | ≤ κ.
• The rest must have | C(x, x) | > κ.
• Because C(x, x) is distinct for each x (p. 42), there are at least 2^m − 2^{m+1}/m of them with | C(x, x) | > κ.
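A numeric sanity check of this count, under the κ defined earlier and assumed illustrative sizes |K| = 4 and m = 20:

    import math

    # kappa solves |K|^(kappa+1) / (|K| - 1) = 2^(m+1) / m.
    K, m = 4, 20
    kappa = math.log((K - 1) * 2 ** (m + 1) / m, K) - 1
    short_sequences = sum(K ** i for i in range(math.floor(kappa) + 1))
    print(short_sequences <= 2 ** (m + 1) / m)    # True: few C(x, x)'s are short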
The Proof: Amount of Communications (concluded)
• Thus
∑_{x∈{0,1}^m} | C(x, x) | ≥ ∑_{x∈{0,1}^m, | C(x,x) |>κ} | C(x, x) | > (2^m − 2^{m+1}/m) κ = κ 2^m (m − 2)/m.
• As κ = Θ(m), the total number of communications is
∑_{x∈{0,1}^m} | C(x, x) | = Ω(m 2^m).  (1)
The Proof (continued)
We now lower-bound the worst-case number of communication points in the middle section.
[Figure: the input x 0^m x^r with cuts at positions i = 1, 2, …, m of the middle section.]
The Proof (continued)
• C_i(x, x) denotes the sequence of communications for P(x, x) given the cut at position i.
• Then ∑_{i=1}^{m} | C_i(x, x) | is the number of steps spent in the middle section for P(x, x).
• Let T(n) = max_{x∈{0,1}^m} ∑_{i=1}^{m} | C_i(x, x) |.
– T(n) is the worst-case running time spent in the middle section when dealing with any P(x, x) with | x | = m.
• Note that T(n) ≥ ∑_{i=1}^{m} | C_i(x, x) | for every x ∈ {0,1}^m.
The Proof (continued)
• Now,
2^m T(n) = ∑_{x∈{0,1}^m} T(n) ≥ ∑_{x∈{0,1}^m} ∑_{i=1}^{m} | C_i(x, x) | = ∑_{i=1}^{m} ∑_{x∈{0,1}^m} | C_i(x, x) |.

The Proof (concluded)
• By the pigeonhole principle,^a there exists a 1 ≤ i* ≤ m with
∑_{x∈{0,1}^m} | C_{i*}(x, x) | ≤ 2^m T(n)/m.
• Eq. (1) on p. 46 says that
∑_{x∈{0,1}^m} | C_{i*}(x, x) | = Ω(m 2^m).
• Hence T(n) = Ω(m^2) = Ω(n^2).
^a Dirichlet (1805–1859).