Large Deviations
(1)

ZPP

(Zero Probabilistic Polynomial)

• The class ZPP is defined as RP ∩ coRP.

• A language in ZPP has two Monte Carlo algorithms, one with no false positives (RP) and the other with no false negatives (coRP).

• If we repeatedly run both Monte Carlo algorithms, eventually one definite answer will come (unlike RP).

– A positive answer from the one without false positives.

– A negative answer from the one without false negatives.

(2)

The ZPP Algorithm (Las Vegas)

1: {Suppose L ∈ ZPP.}
2: {N1 has no false positives, and N2 has no false negatives.}
3: while true do
4:   if N1(x) = “yes” then
5:     return “yes”;
6:   end if
7:   if N2(x) = “no” then
8:     return “no”;
9:   end if
10: end while
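A direct transliteration of this loop into Python, with N1 and N2 standing in for the two (hypothetical, unspecified) Monte Carlo procedures:

def las_vegas(x, N1, N2):
    # N1 has no false positives (RP-style): a "yes" from it is always correct.
    # N2 has no false negatives (coRP-style): a "no" from it is always correct.
    while True:
        if N1(x) == "yes":
            return "yes"
        if N2(x) == "no":
            return "no"

The returned answer is always correct; only the number of iterations is random, with expectation analyzed on the next slide.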

(3)

ZPP (concluded)

• The expected running time for the correct answer to emerge is polynomial.

– The probability that a run of the two algorithms does not generate a definite answer is at most 0.5 (why?).

– Let p(n) be the running time of each run of the while-loop.

– The expected running time for a definite answer is

Σ_{i=1}^{∞} 0.5^i · i · p(n) = 2 p(n).
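A quick check of the constant 2, using the standard identity Σ_{i≥1} i x^i = x/(1 − x)² for |x| < 1 (the identity itself is not stated on the slides):

\[
\sum_{i=1}^{\infty} 0.5^{i}\, i\, p(n) \;=\; \frac{0.5}{(1-0.5)^{2}}\, p(n) \;=\; 2\, p(n).
\]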

• Essentially, ZPP is the class of problems that can be solved, without errors, in expected polynomial time.

(4)

Large Deviations

• Suppose you have a biased coin.

• One side has probability 0.5 + ε of appearing and the other 0.5 − ε, for some 0 < ε < 0.5.

• But you do not know which is which.

• How to decide which side is the more likely side—with high confidence?

• Answer: Flip the coin many times and pick the side that appeared the most times.

• Question: Can you quantify your confidence?
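A small Monte Carlo sketch in Python of the majority rule just described (the parameters eps, n, and trials are illustrative choices, not from the slides):

import random

def majority_confidence(eps, n, trials=10_000):
    # Estimate the probability that n flips of a coin with bias 0.5 + eps
    # yield a majority for the truly likelier side.
    correct = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 + eps for _ in range(n))
        if heads > n / 2:
            correct += 1
    return correct / trials

print(majority_confidence(0.05, 1000))   # typically prints a value close to 1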

(5)

The Chernoff Bound

Theorem 70 (Chernoff (1952)) Suppose x1, x2, . . . , xn are independent random variables taking the values 1 and 0 with probabilities p and 1 − p, respectively. Let X = Σ_{i=1}^{n} x_i. Then for all 0 ≤ θ ≤ 1,

prob[ X ≥ (1 + θ) pn ] ≤ e^{−θ²pn/3}.

• The probability that a binomial random variable deviates from its expected value

E[ X ] = E[ Σ_{i=1}^{n} x_i ] = pn

decreases exponentially with the deviation.

(6)

The Proof

• Let t be any positive real number.

• Then

prob[ X ≥ (1 + θ) pn ] = prob[ e^{tX} ≥ e^{t(1+θ) pn} ].

• Markov’s inequality (p. 519) generalized to real-valued random variables says that

prob[ e^{tX} ≥ k E[ e^{tX} ] ] ≤ 1/k.

• With k = e^{t(1+θ) pn}/E[ e^{tX} ], we havea

prob[ X ≥ (1 + θ) pn ] ≤ e^{−t(1+θ) pn} E[ e^{tX} ].

aNote that X does not appear in k. Contributed by Mr. Ao Sun

(7)

The Proof (continued)

• Because X = Σ_{i=1}^{n} x_i and the x_i’s are independent,

E[ e^{tX} ] = (E[ e^{tx_1} ])^n = [ 1 + p(e^t − 1) ]^n.

• Substituting, we obtain

prob[ X ≥ (1 + θ) pn ] ≤ e^{−t(1+θ) pn} [ 1 + p(e^t − 1) ]^n

≤ e^{−t(1+θ) pn} e^{pn(e^t − 1)}, as (1 + a)^n ≤ e^{an} for all a > 0.

(8)

The Proof (concluded)

• With the choice of t = ln(1 + θ), the above becomes

prob[ X ≥ (1 + θ) pn ] ≤ e^{pn[ θ − (1+θ) ln(1+θ) ]}.

• The exponent θ − (1 + θ) ln(1 + θ) expands to

−θ²/2 + θ³/6 − θ⁴/12 + · · · for 0 ≤ θ ≤ 1.

• But it is less than

−θ²/2 + θ³/6 = θ² ( −1/2 + θ/6 ) ≤ θ² ( −1/2 + 1/6 ) = −θ²/3.

(9)

Other Variations of the Chernoff Bound

The following can be proved similarly (prove it).

Theorem 71 Given the same terms as Theorem 70 (p. 583),

prob[ X ≤ (1 − θ) pn ] ≤ e^{−θ²pn/2}.

The following slightly looser inequalities achieve symmetry.

Theorem 72 (Karp, Luby, & Madras (1989)) Given the same terms as Theorem 70 (p. 583) except with 0 ≤ θ ≤ 2,

prob[ X ≥ (1 + θ) pn ] ≤ e^{−θ²pn/4},

prob[ X ≤ (1 − θ) pn ] ≤ e^{−θ²pn/4}.

(10)

Power of the Majority Rule

The next result follows from Theorem 71 (p. 587).

Corollary 73 If p = (1/2) + ε for some 0 ≤ ε ≤ 1/2, then

prob[ Σ_{i=1}^{n} x_i ≤ n/2 ] ≤ e^{−ε²n/2}.

• The textbook’s corollary to Lemma 11.9 seems too loose, at e^{−ε²n/6}.

• Our original problem (p. 582) hence demands, e.g., n ≈ 1.4k/ε² independent coin flips to guarantee making an error with probability ≤ 2^{−k} with the majority rule.
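Where the constant 1.4 comes from (a routine rearrangement of Corollary 73, not spelled out on the slide): requiring the error bound to be at most 2^{−k} gives

\[
e^{-\epsilon^{2} n/2} \le 2^{-k}
\iff n \ge \frac{2\ln 2}{\epsilon^{2}}\, k \approx \frac{1.386\, k}{\epsilon^{2}}.
\]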


(11)

BPP

a

(Bounded Probabilistic Polynomial)

• The class BPP contains all languages L for which there is a precise polynomial-time NTM N such that:

– If x ∈ L, then at least 3/4 of the computation paths of N on x lead to “yes.”

– If x ∉ L, then at least 3/4 of the computation paths of N on x lead to “no.”

• So N accepts or rejects by a clear majority.

aGill (1977).

(12)

Magic 3/4?

• The number 3/4 bounds the probability (ratio) of a right answer away from 1/2.

• Any constant strictly between 1/2 and 1 can be used without affecting the class BPP.

• In fact, as with RP,

1/2 + 1/q(n)

for any polynomial q(n) can replace 3/4.

• The next algorithm shows why.

(13)

The Majority Vote Algorithm

Suppose L is decided by N by majority (1/2) + ε.

1: for i = 1, 2, . . . , 2k + 1 do
2:   Run N on input x;
3: end for
4: if “yes” is the majority answer then
5:   “yes”;
6: else
7:   “no”;
8: end if

(14)

Analysis

• The running time remains polynomial: 2k + 1 times N’s running time.

• By Corollary 73 (p. 588), the probability of a false answer is at most e^{−ε²k}.

• By taking k = ⌈ 2/ε² ⌉, the error probability is at most 1/4 (see the check after this list).

• Even if ε is any inverse polynomial, k remains a polynomial in n.
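The check promised above (simple arithmetic, not on the slide): with k = ⌈ 2/ε² ⌉,

\[
e^{-\epsilon^{2} k} \le e^{-\epsilon^{2}\cdot 2/\epsilon^{2}} = e^{-2} \approx 0.135 < \tfrac{1}{4}.
\]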

(15)

Aspects of BPP

• BPP is the most comprehensive yet plausible notion of efficient computation.

– If a problem is in BPP, we take it to mean that the problem can be solved efficiently.

– In this aspect, BPP has effectively replaced P.

• (RP ∪ coRP) ⊆ (NP ∪ coNP).

• (RP ∪ coRP) ⊆ BPP.

• Whether BPP ⊆ (NP ∪ coNP) is unknown.

• But it is unlikely that NP ⊆ BPP.a

(16)

coBPP

• The definition of BPP is symmetric: acceptance by clear majority and rejection by clear majority.

• An algorithm for L ∈ BPP becomes one for ¯L by reversing the answer.

• So ¯L ∈ BPP and BPP ⊆ coBPP.

• Similarly coBPP ⊆ BPP.

• Hence BPP = coBPP.

• This approach does not work for RP.a

aIt did not work for NP either.

(17)

BPP and coBPP

[Figure: computation-path outcomes “yes”/“no” for a BPP machine and, with the answers reversed, for its complement.]

(18)

“The Good, the Bad, and the Ugly”

[Figure: inclusion diagram relating P, ZPP, RP, coRP, BPP, NP, and coNP.]

(19)

Circuit Complexity

• Circuit complexity is based on boolean circuits instead of Turing machines.

• A boolean circuit with n inputs computes a boolean function of n variables.

• Now, identify true/1 with “yes” and false/0 with “no.”

• Then a boolean circuit with n inputs accepts certain strings in { 0, 1 }^n.

• To relate circuits with an arbitrary language, we need one circuit for each possible input length n.

(20)

Formal Definitions

• The size of a circuit is the number of gates in it.

• A family of circuits is an infinite sequence

C = (C0, C1, . . .) of boolean circuits, where Cn has n boolean inputs.

• For input x ∈ { 0, 1 }*, C| x | outputs 1 if and only if x ∈ L.

• In other words,

Cn accepts L ∩ { 0, 1 }^n.

(21)

Formal Definitions (concluded)

• L ⊆ { 0, 1 }* has polynomial circuits if there is a family of circuits C such that:

– The size of Cn is at most p(n) for some fixed polynomial p.

– Cn accepts L ∩ { 0, 1 }^n.

(22)

Exponential Circuits Suffice for All Languages

• Theorem 14 (p. 195) implies that there are languages that cannot be solved by circuits of size 2^n/(2n).

• But surprisingly, circuits of size 2^{n+2} can solve all problems, decidable or otherwise!

(23)

Exponential Circuits Suffice for All Languages (continued)

Proposition 74 All decision problems (decidable or otherwise) can be solved by a circuit of size 2^{n+2}.

• We will show that for any language L ⊆ { 0, 1 }*, L ∩ { 0, 1 }^n can be decided by a circuit of size 2^{n+2}.

• Define the boolean function f : { 0, 1 }^n → { 0, 1 }, where

f(x1x2 · · · xn) = 1 if x1x2 · · · xn ∈ L, and 0 if x1x2 · · · xn ∉ L.

(24)

The Proof (concluded)

• Clearly, any circuit that implements f decides L ∩ { 0, 1 }^n.

• Now,

f(x1x2 · · · xn) = (x1 ∧ f(1x2 · · · xn)) ∨ (¬x1 ∧ f(0x2 · · · xn)).

• The circuit size s(n) for f(x1x2 · · · xn) hence satisfies s(n) = 4 + 2s(n − 1)

with s(1) = 1.

• Solve it to obtain s(n) = 5 × 2^{n−1} − 4 ≤ 2^{n+2}.
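The closed form follows by unrolling the recurrence (a routine step the slide omits):

\[
s(n) + 4 = 2\,[\, s(n-1) + 4 \,] = \cdots = 2^{\,n-1}\,[\, s(1) + 4 \,] = 5 \cdot 2^{\,n-1},
\]

so s(n) = 5 · 2^{n−1} − 4 ≤ 4 · 2^n = 2^{n+2}.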

(25)

The Circuit Complexity of P

Proposition 75 All languages in P have polynomial circuits.

• Let L ∈ P be decided by a TM in time p(n).

• By Corollary 31 (p. 297), there is a circuit with O(p(n)²) gates that accepts L ∩ { 0, 1 }^n.

• The size of that circuit depends only on L and the length of the input.

• The size of that circuit is polynomial in n.

(26)

Polynomial Circuits vs. P

• Is the converse of Proposition 75 true?

– Do polynomial circuits accept only languages in P?

• No.

• Polynomial circuits can accept undecidable languages!

(27)

BPP’s Circuit Complexity

Theorem 76 (Adleman (1978)) All languages in BPP have polynomial circuits.

• Our proof will be nonconstructive in that only the existence of the desired circuits is shown.

– Recall our proof of Theorem 14 (p. 195).

– Something exists if its probability of existence is nonzero.

• It is not known how to efficiently generate circuit Cn.

– If the construction of Cn can be made efficient, then P = BPP.

(28)

The Proof

• Let L ∈ BPP be decided by a precise polynomial-time NTM N by clear majority.

• We shall prove that L has polynomial circuits C0, C1, . . ..

– These deterministic circuits do not err.

• Suppose N runs in time p(n), where p(n) is a polynomial.

• Let An = { a1, a2, . . . , am }, where ai ∈ { 0, 1 }^{p(n)}.

• Each ai ∈ An represents a sequence of nondeterministic choices (i.e., a computation path) for N .

• Pick m = 12(n + 1).

(29)

The Proof (continued)

• Let x be an input with | x | = n.

• Circuit Cn simulates N on x with all sequences of choices in An and then takes the majority of the m outcomes.a

– Note that each An yields a circuit.

• As N with ai is a polynomial-time deterministic TM, it can be simulated by polynomial circuits of size O(p(n)²).

– See the proof of Proposition 75 (p. 603).

aAs m is even, there may be no clear majority. Still, the probability of that happening is very small and does not materially affect our general argument.

(30)

The Circuit

[Figure: the circuit Cn simulates N on x under each choice sequence a1, a2, . . . , am and outputs the majority of the m outcomes.]

(31)

The Proof (continued)

• The size of Cn is therefore O(m p(n)²) = O(n p(n)²).

– This is a polynomial.

• We now confirm the existence of an An making Cn correct on all n-bit inputs.

• Call ai bad if it leads N to an error (a false positive or a false negative) for x.

• Select An uniformly randomly.

(32)

The Proof (continued)

• For each x ∈ { 0, 1 }^n, at most 1/4 of the computations of N are erroneous.

• Because the sequences in An are chosen randomly and independently, the expected number of bad ai’s is m/4.a

• By the Chernoff bound (p. 583), the probability that the number of bad ai’s is m/2 or more is at most

e^{−m/12} < 2^{−(n+1)}.
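This is Theorem 70 (p. 583) applied with p = 1/4, θ = 1, and m in place of n (a substitution the slide leaves implicit):

\[
\mathrm{prob}\Big[\, \#\{\text{bad } a_i\} \ge \tfrac{m}{2} = (1+1)\cdot\tfrac{m}{4} \,\Big]
\le e^{-1^{2}\cdot (m/4)/3} = e^{-m/12} = e^{-(n+1)} < 2^{-(n+1)},
\]

using m = 12(n + 1) and e > 2.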

• The error probability of using the majority rule is thus

< 2−(n+1) for each x ∈ { 0, 1 }n.

aSo the proof will not work for NP. Contributed by Mr. Ching-Hua

(33)

The Proof (continued)

• The probability that there is an x such that An results in an incorrect answer is

< 2^n · 2^{−(n+1)} = 2^{−1}.

– Recall the union bound (Boole’s inequality):

prob[ A ∪ B ∪ · · · ] ≤ prob[ A ] + prob[ B ] + · · · .

• We just showed that at least half of them are correct.

• So with probability ≥ 0.5, a random An produces a correct Cn for all inputs of length n.

– Of course, verifying this fact may take a long time.

(34)

The Proof (concluded)

• Because this probability exceeds 0, an An that makes majority vote work for all inputs of length n exists.

• Hence a correct Cn exists.a

• We have used the probabilistic method popularized by Erdős.b

• This result answers the question on p. 514 with a “yes.”

aQuine (1948), “To be is to be the value of a bound variable.”

bA counting argument in the probabilistic language.

(35)

Leonard Adleman


(1945–)

(36)

Paul Erdős (1913–1996)

(37)

Cryptography

(38)

Whoever wishes to keep a secret must hide the fact that he possesses one.

— Johann Wolfgang von Goethe (1749–1832)

(39)

Cryptography

• Alice (A) wants to send a message to Bob (B) over a channel monitored by Eve (eavesdropper).

• The protocol should be such that the message is known only to Alice and Bob.

• The art and science of keeping messages secure is cryptography.

[Figure: Alice sends a message to Bob over a channel monitored by Eve.]

(40)

Encryption and Decryption

• Alice and Bob agree on two algorithms E and D—the encryption and the decryption algorithms.

• Both E and D are known to the public in the analysis.

• Alice runs E and wants to send a message x to Bob.

• Bob operates D.

• Privacy is assured in terms of two numbers e, d, the encryption and decryption keys.

• Alice sends y = E(e, x) to Bob, who then performs D(d, y) = x to recover x.

• x is called plaintext, and y is called ciphertext.a

(41)

Some Requirements

• D should be an inverse of E given e and d.

• D and E must both run in (probabilistic) polynomial time.

• Eve should not be able to recover x from y without knowing d.

– As D is public, d must be kept secret.

– e may or may not be a secret.

(42)

Degrees of Security

• Perfect secrecy: After a ciphertext is intercepted by the enemy, the a posteriori probabilities of the plaintext that this ciphertext represents are identical to the a priori probabilities of the same plaintext before the interception.

– The probability that plaintext P occurs is

independent of the ciphertext C being observed.

– So knowing C yields no advantage in recovering P.

• Such systems are said to be informationally secure.

• A system is computationally secure if breaking it is theoretically possible but computationally infeasible.

(43)

Conditions for Perfect Secrecy

a

• Consider a cryptosystem where:

– The space of ciphertext is as large as that of keys.

– Every plaintext has a nonzero probability of being used.

• It is perfectly secure if and only if the following hold.

– A key is chosen with uniform distribution.

– For each plaintext x and ciphertext y, there exists a unique key e such that E(e, x) = y.

aShannon (1949).

(44)

The One-Time Pad

a

1: Alice generates a random string r as long as x;
2: Alice sends r to Bob over a secret channel;
3: Alice sends y := x ⊕ r to Bob over a public channel;
4: Bob receives y;
5: Bob recovers x := y ⊕ r;

aMauborgne and Vernam (1917); Shannon (1949). It was allegedly used for the hotline between Russia and U.S.
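A minimal sketch of the pad in Python (illustrative only; the sample plaintext is made up, and the secret delivery of r is simply assumed):

import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bitwise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

x = b"ATTACK AT DAWN"            # plaintext
r = secrets.token_bytes(len(x))  # Alice's one-time pad, shared secretly with Bob
y = xor_bytes(x, r)              # ciphertext sent over the public channel
assert xor_bytes(y, r) == x      # Bob recovers the plaintext with the same pad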

(45)

Analysis

• The one-time pad uses e = d = r.

• This is said to be a private-key cryptosystem.

• Knowing x and knowing r are equivalent.

• Because r is random and private, the one-time pad achieves perfect secrecy (see also p. 621).

• The random bit string must be new for each round of communication.

• But the assumption of a private channel is problematic.

(46)

Public-Key Cryptography


• Suppose only d is private to Bob, whereas e is public knowledge.

• Bob generates the (e, d) pair and publishes e.

• Anybody like Alice can send E(e, x) to Bob.

• Knowing d, Bob can recover x by D(d, E(e, x)) = x.

• The assumptions are complexity-theoretic.

– It is computationally difficult to compute d from e.

– It is computationally difficult to compute x from y without knowing d.


(47)

Whitfield Diffie

a

(1944–)

aTuring Award (2016).

(48)

Martin Hellman

a

(1945–)

aTuring Award (2016).

(49)

Complexity Issues

• Given y and x, it is easy to verify whether E(e, x) = y.

• Hence one can always guess an x and verify.

• Cracking a public-key cryptosystem is thus in NP.

• A necessary condition for the existence of secure public-key cryptosystems is P ≠ NP.

• But more is needed than P ≠ NP.

• For instance, it is not sufficient that D is hard to compute in the worst case.

• It should be hard in “most” or “average” cases.

(50)

One-Way Functions

A function f is a one-way function if the following hold.a

1. f is one-to-one.

2. For all x ∈ Σ*, | x |^{1/k} ≤ | f(x) | ≤ | x |^k for some k > 0.

• f is said to be honest.

3. f can be computed in polynomial time.

4. f^{−1} cannot be computed in polynomial time.

• Exhaustive search works, but it must be slow.

aDiffie & Hellman (1976); Boppana & Lagarias (1986); Grollmann & Selman (1988); Ko (1985); Ko, Long, & Du (1986); Watanabe (1985); Young (1983).

(51)

Existence of One-Way Functions (OWFs)

• Even if P ≠ NP, there is no guarantee that one-way functions exist.

• No functions have been proved to be one-way.

• Is breaking glass a one-way function?

(52)

Candidates of One-Way Functions

• Modular exponentiation f(x) = g^x mod p, where g is a primitive root of p.

– Discrete logarithm is hard.a

• The RSAb function f(x) = x^e mod pq for an odd e relatively prime to φ(pq).

– Breaking the RSA function is hard.

aConjectured to be 2^{n^ε} for some ε > 0 in both the worst-case sense and average sense. Doable in time n^{O(log n)} for finite fields of small characteristic (Barbulescu, et al., 2013). It is in NP in some sense (Grollmann and Selman, 1988).

bRivest, Shamir, & Adleman (1978).

(53)

Candidates of One-Way Functions (concluded)

• Modular squaring f(x) = x² mod pq.

– Determining if a number with a Jacobi symbol 1 is a quadratic residue is hard—the quadratic

residuacity assumption (QRA).a

– Breaking it is as hard as factorization when p ≡ q ≡ 3 mod 4.b

aDue to Gauss.

bRabin (1979).

(54)

The Secret-Key Agreement Problem

• Exchanging messages securely using a private-key cryptosystem requires Alice and Bob possessing the same key (p. 623).

– An example is the r in the one-time pad (p. 622).

• How can they agree on the same secret key when the channel is insecure?

• This is called the secret-key agreement problem.

• It was solved by Diffie and Hellman (1976) using one-way functions.

(55)

The Diffie-Hellman Secret-Key Agreement Protocol

1: Alice and Bob agree on a large prime p and a primitive root g of p; {p and g are public.}
2: Alice chooses a large number a at random;
3: Alice computes α = g^a mod p;
4: Bob chooses a large number b at random;
5: Bob computes β = g^b mod p;
6: Alice sends α to Bob, and Bob sends β to Alice;
7: Alice computes her key β^a mod p;
8: Bob computes his key α^b mod p;
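A toy run of the protocol in Python (the prime p = 2³¹ − 1 and primitive root g = 7 are small illustrative choices, far from the sizes used in practice):

import random

p, g = 2**31 - 1, 7              # public parameters (toy values)
a = random.randrange(2, p - 1)   # Alice's secret exponent
b = random.randrange(2, p - 1)   # Bob's secret exponent
alpha = pow(g, a, p)             # Alice sends alpha to Bob
beta = pow(g, b, p)              # Bob sends beta to Alice
key_alice = pow(beta, a, p)      # beta^a = g^(ab) mod p
key_bob = pow(alpha, b, p)       # alpha^b = g^(ab) mod p
assert key_alice == key_bob      # both sides hold the same key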

(56)

Analysis

• The keys computed by Alice and Bob are identical as β^a = g^{ba} = g^{ab} = α^b mod p.

• To compute the common key from p, g, α, β is known as the Diffie-Hellman problem.

• It is conjectured to be hard.

• If discrete logarithm is easy, then one can solve the Diffie-Hellman problem.

– Because a and b can then be obtained by Eve.

• But the other direction is still open.

(57)

The RSA Function

• Let p, q be two distinct primes.

• The RSA function is x^e mod pq for an odd e relatively prime to φ(pq).

– By Lemma 54 (p. 464),

φ(pq) = pq (1 − 1/p)(1 − 1/q) = pq − p − q + 1. (14)

• As gcd(e, φ(pq)) = 1, there is a d such that ed ≡ 1 mod φ(pq),

which can be found by the Euclidean algorithm.a

(58)

A Public-Key Cryptosystem Based on RSA

• Bob generates p and q.

• Bob publishes pq and the encryption key e, a number relatively prime to φ(pq).

– The encryption function is y = x^e mod pq.

– Bob calculates φ(pq) by Eq. (14) (p. 635).

– Bob then calculates d such that ed = 1 + kφ(pq) for some k ∈ Z.

• The decryption function is y^d mod pq.

• It works because y^d = x^{ed} = x^{1+kφ(pq)} = x mod pq by the Fermat-Euler theorem when gcd(x, pq) = 1 (p. 473).
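A toy instantiation in Python (the primes and plaintext are the usual textbook-sized illustration, far too small to be secure):

p, q = 61, 53                      # toy primes
n, phi = p * q, (p - 1) * (q - 1)  # n = 3233, phi(pq) = 3120
e = 17                             # odd and relatively prime to phi(pq)
d = pow(e, -1, phi)                # modular inverse: ed ≡ 1 (mod phi(pq)); here d = 2753
x = 65                             # plaintext with gcd(x, n) = 1
y = pow(x, e, n)                   # encryption: x^e mod pq
assert pow(y, d, n) == x           # decryption recovers x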

(59)

The “Security” of the RSA Function

• Factoring pq or calculating d from (e, pq) seems hard.a

• Breaking the last bit of RSA is as hard as breaking the RSA.b

• Recommended RSA key sizes:c – 1024 bits up to 2010.

– 2048 bits up to 2030.

– 3072 bits up to 2031 and beyond.

aSee also p. 469.

bAlexi, Chor, Goldreich, & Schnorr (1988).

cRSA (2003). RSA was acquired by EMC in 2006 for 2.1 billion US dollars.

(60)

The “Security” of the RSA Function (continued)

• Recall that problem A is “harder than” problem B if solving A results in solving B.

– Factorization is “harder than” breaking the RSA.

– It is not hard to show that calculating Euler’s phi functiona is “harder than” breaking the RSA.

– Factorization is “harder than” calculating Euler’s phi function (see Lemma 54 on p. 464).

– So factorization is harder than calculating Euler’s phi function, which is harder than breaking the RSA.

aWhen the input is not factorized!

(61)

The “Security” of the RSA Function (concluded)

• Factorization cannot be NP-hard unless NP = coNP.a

• So breaking the RSA is unlikely to imply P = NP.

• But numbers can be factorized efficiently by quantum computers.b

• RSA was alleged to have received 10 million US dollars from the government to promote insecure p and q.c

aBrassard (1979).

bShor (1994).

cMenn (2013).

(62)

Adi Shamir, Ron Rivest, and Leonard Adleman

(63)

Ron Rivest


(1947–)

(64)

Adi Shamir

a

(1952–)

aTuring Award (2002).

(65)

A Parallel History

• Diffie and Hellman’s solution to the secret-key

agreement problem led to public-key cryptography.

• At around the same time (or earlier) in Britain, the RSA public-key cryptosystem was invented first, before the Diffie-Hellman secret-key agreement scheme.

– Ellis, Cocks, and Williamson of the Communications Electronics Security Group of the British Government Communications Head Quarters (GCHQ).
