On the Computational Power of Players in Two-Person Strategic Games

Academic year: 2022

Advisor: Prof. Yuh-Dauh Lyuu

Ching-Lueh Chang

Department of Computer Science and Information Engineering

National Taiwan University

Contents

1 Introduction
1.1 Two-Person Strategic Games
1.2 Our Results
1.3 Related Works
2 Preliminaries
2.1 Basic Terms in Game Theory
2.2 Randomized Circuits
2.3 Tail Inequalities
2.4 Conventions and Assumptions
3 Extraction of Small Numbers of Good Strategies
4 Implications on the Power of Players
A Complexity Classes
Bibliography


Abstract

We consider families of two-person strategic games parameterized by a positive integer $n$. We assume that each of the players, the row player and the column player, has $2^n$ strategies to choose from and can take mixed strategies. We also assume that the games are not too “risky” in that the payoffs are at most polynomial in $n$ in absolute value. The row player is said to guarantee an expected payoff of $t \in \mathbb{R}$ against all column players of a certain class when his expected payoff is at least $t$ against all column players of that class. This thesis studies the expected payoff the row player could guarantee against all column players of a certain computational power. In our main result we consider the case where the row player is informed of how the column player chooses his action but is not allowed to see the internal coin tosses of the column player. Roughly speaking, we show that when the computational power of the column player shrinks polynomially, the row player can have his number of pure strategies shrunk exponentially without harming his guaranteed expected payoff too much. We obtain several corollaries regarding the computational power needed by the row player to guarantee a good expected payoff against computationally bounded column players in situations where the row player is aware of the output strategy of the column player, the mixed strategy of the column player, or only $O(\log n)$ bits of information about the column player.


Chapter 1 Introduction

1.1 Two-Person Strategic Games

In a two-person strategic game there are two players, the row player and the column player. Each player has several pure strategies to choose from. There is a payoff function mapping each combination of the players’ pure strategies to their payoffs. Each player can also adopt a mixed strategy, that is, a probability distribution over pure strategies. In a two-person game, a pair of mixed strategies $(\alpha_1, \alpha_2)$ is a mixed strategy Nash equilibrium if no player $i$ has a strategy yielding an expected payoff higher than when he chooses $\alpha_i$, given that the other player $j$ chooses $\alpha_j$. The expectation is over the mixed strategies adopted by both players. If the payoffs of the two players sum to 0, the game is said to be zero-sum. For a zero-sum game, the expected payoff for the row player when a mixed strategy Nash equilibrium is played is called a game value under mixed strategies. The game value thus defined is unique [OR94]. For a precise definition of these and some other terms please refer to Section 2.1.


1.2 Our Results

We consider families of two-person strategic games parameterized by a positive integer $n$. In the game parameterized by $n$, each player is given input $1^n$ and oracle access to the payoff function mapping each combination of the players’ pure strategies to their payoffs. The payoffs are assumed to be bounded by $n^k$ in absolute value for some constant $k$. For technical reasons we will sometimes need to assume that the payoffs can be represented in binary using polynomially many bits (in $n$). Each player has $\{0, 1\}^n$ as his set of pure strategies and can take mixed strategies. The row player is said to guarantee an expected payoff of $t \in \mathbb{R}$ against all column players of a certain class if his expected payoff is at least $t$ against all column players of that class. This thesis considers the case where the row player is informed of how the column player chooses his action but is not allowed to see the internal coin tosses of the column player. In this case, therefore, the row player is said to guarantee an expected payoff of $t$ against all column players of a certain class if, against each column player of that class, the row player can take some strategy to force an expected payoff of at least $t$. In particular, the row player is interested in the expected payoff he could guarantee against all column players being randomized circuits (with oracle gates for the players’ payoff functions) of a certain size $s$. This thesis shows that when $s$ shrinks polynomially (in $n$), the row player could have his number of pure strategies shrunk exponentially (in $n$) without harming his guaranteed expected payoff too much. We obtain several corollaries regarding the computational power needed by the row player to guarantee a good expected payoff against randomized circuits (acting as the column player) of a certain size, in situations where the row player is informed of the action taken by the column player, the mixed strategy of the column player, or only some $O(\log n)$ bits of information about the column player.

1.3 Related Works

The computational power of players in games is a topic being actively researched. We briefly describe several results in this area. For the definitions of standard complexity classes please refer to the Complexity Zoo [AK]. In this section when a game is given to an algorithm, it is given by explicitly specifying the payoffs for both players given each combination of pure strategies being adopted by the players, unless otherwise specified. It is well-known that for a two-person zero-sum game, a mixed strategy Nash equilibrium, and hence the game value, can be found in polynomial time by linear programming [Kha79, Owe82]. It is also known that given a two-person zero-sum game and a number $v$, computing whether the game value equals $v$ is P-hard [FIKU05]. Given a polynomial-size circuit that computes the payoff function of a two-person zero-sum game and a number $v$, computing whether the game value equals $v$ is known to be EXP-complete [FKS95]. A mixed strategy $\alpha$ in a two-person zero-sum game is said to be $\varepsilon$-optimal if, when the corresponding player uses $\alpha$, his least possible expected payoff (against arbitrarily malicious players) is at most $\varepsilon$ lower than that when a mixed strategy Nash equilibrium is played. Newman has shown that in any two-person zero-sum game with payoffs in $[0, 1]$, each player has an $\varepsilon$-optimal strategy which chooses uniformly from a multiset of $O(\log N/\varepsilon^2)$ pure strategies, where $N$ is the number of pure strategies of the other player [New91]. Finding $\varepsilon$-optimal strategies can be done efficiently in parallel [GK92, GK95, LN93, PST95], as well as sequentially in sublinear time by a randomized algorithm [GK95].

Given a polynomial-size circuit that computes the payoff function of a two-person zero-sum game, approximating the game value to within an additive factor is complete for the class promise-$S_2^p$ [FIKU05], and a pair of $\varepsilon$-optimal strategies can be constructed by a $\mathrm{ZPP}^{\mathrm{NP}}$ algorithm [FIKU05]. After a series of exciting developments initiated by Daskalakis, Goldberg, and Papadimitriou [DGP05], Chen and Deng [CD06] recently showed that computing a mixed strategy Nash equilibrium is PPAD-complete for two-person games.

The rest of the thesis is organized as follows. Chapter 2 presents the necessary background knowledge and some conventions. Chapter 3 shows how to extract a small number of strategies good against players of a certain computational power using an idea similar to that in [FPS03]. Chapter 4 applies the result to analyze the computational power needed by a player to guarantee a certain expected payoff in several cases.

Chapter 2

Preliminaries

2.1 Basic Terms in Game Theory

A finite $N$-person game $G$ in strategic form consists of players $1, \ldots, N$, $N$ finite sets $A_1, A_2, \ldots, A_N$, and for $i = 1, \ldots, N$, a payoff function $M_i : A_1 \times A_2 \times \cdots \times A_N \to \mathbb{R}$. For $i = 1, \ldots, N$, $A_i$ is the finite set of pure strategies player $i$ can take, and $M_i$ maps the players’ pure strategies to the payoff for player $i$.

Player $i$ can adopt a probability distribution over $A_i$ as his strategy. In this case we say that player $i$ takes a mixed strategy. A mixed strategy $\alpha_i$ of player $i$ is also used to denote a random variable whose distribution is specified by $\alpha_i$, as no ambiguity will arise. A pure strategy is a degenerate mixed strategy.

Now suppose $\alpha_i$ is the mixed strategy of player $i$ and $\alpha_1, \ldots, \alpha_N$ are independent. We say that $\alpha_i$ is a best response to $\times_{1 \le k \le N, k \ne i}\, \alpha_k$ if for every mixed strategy $\alpha_i'$ of player $i$,

$$E[M_i(\alpha_1, \ldots, \alpha_{i-1}, \alpha_i', \alpha_{i+1}, \ldots, \alpha_N)] \le E[M_i(\alpha_1, \ldots, \alpha_{i-1}, \alpha_i, \alpha_{i+1}, \ldots, \alpha_N)].$$

The randomness in the first expectation comes from $\alpha_1, \ldots, \alpha_{i-1}, \alpha_i', \alpha_{i+1}, \ldots, \alpha_N$, and that of the second expectation comes from $\alpha_1, \ldots, \alpha_N$. We say that $(\alpha_1, \ldots, \alpha_N)$ is a mixed strategy Nash equilibrium for $G$ if for each $i$, $\alpha_i$ is a best response to $\times_{1 \le k \le N, k \ne i}\, \alpha_k$. A celebrated theorem due to Nash states that every finite $N$-person game in strategic form has a mixed strategy Nash equilibrium [Nas51]. The following fact is well-known.

Fact 1. ([OR94]) Let $\alpha_1, \ldots, \alpha_N$ be mixed strategies for players $1, \ldots, N$ in a finite $N$-person game in strategic form. For $1 \le i \le N$, there is a pure strategy for the $i$th player which is a best response to $\times_{1 \le k \le N, k \ne i}\, \alpha_k$.
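Fact 1 can be illustrated numerically in the two-player case. The following is a minimal sketch, not part of the thesis: the payoff matrix, the column mixed strategy, and the function names are hypothetical, chosen only to show that the best expected payoff over all row strategies is already attained by a pure one.

```python
from fractions import Fraction

def expected_row_payoff(payoff, row_mix, col_mix):
    """Expected row payoff when both players mix independently."""
    return sum(
        p * q * payoff[i][j]
        for i, p in enumerate(row_mix)
        for j, q in enumerate(col_mix)
    )

def pure_best_response(payoff, col_mix):
    """Return (index, value) of a pure row strategy maximizing the expected payoff."""
    values = [
        sum(q * payoff[i][j] for j, q in enumerate(col_mix))
        for i in range(len(payoff))
    ]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]

# Hypothetical 3x2 payoff matrix for the row player (assumption, for illustration).
payoff = [[3, 0], [1, 2], [0, 4]]
# Hypothetical mixed strategy of the column player.
col_mix = [Fraction(1, 2), Fraction(1, 2)]

best_index, best_value = pure_best_response(payoff, col_mix)
# Any mixed row strategy is a convex combination of pure ones,
# so its expected payoff cannot exceed best_value.
uniform_value = expected_row_payoff(payoff, [Fraction(1, 3)] * 3, col_mix)
```

The expected payoff of a mixed strategy is a weighted average of the pure-strategy payoffs, which is why searching over pure strategies suffices.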

When there are only two players, player 1 is called the row player and player 2 the column player. The corresponding payoff functions will be denoted $M_{\mathrm{row}} : A_{\mathrm{row}} \times A_{\mathrm{col}} \to \mathbb{R}$ and $M_{\mathrm{col}} : A_{\mathrm{row}} \times A_{\mathrm{col}} \to \mathbb{R}$, respectively. For convenience, we will sometimes combine $M_{\mathrm{row}}$ and $M_{\mathrm{col}}$ into a single function $M(\cdot) = (M_{\mathrm{row}}(\cdot), M_{\mathrm{col}}(\cdot))$. $M_{\mathrm{row}}$, $M_{\mathrm{col}}$, and $M$ will be referred to as the payoff function for the row player, the payoff function for the column player, and simply the payoff function, respectively. When a two-person game satisfies $M_{\mathrm{row}} = -M_{\mathrm{col}}$, the game is said to be zero-sum. Whenever $(\alpha_{\mathrm{row}}, \alpha_{\mathrm{col}})$ is a mixed strategy Nash equilibrium in a two-person zero-sum game, we say that $E[M_{\mathrm{row}}(\alpha_{\mathrm{row}}, \alpha_{\mathrm{col}})]$ is a value for that game under mixed strategies. The game value thus defined is known to be unique [OR94]. The set of all distributions over the pure strategies of the row player, or equivalently, the set of all mixed strategies of the row player, will be denoted $P_{\mathrm{row}}$. $P_{\mathrm{col}}$ is defined similarly.


2.2 Randomized Circuits

Let $o : \{0, 1\}^l \to \{0, 1\}^m$ be a function. A randomized $o$-oracle circuit $C$ is a collection of gates with ordered input and output pins, and directed wires going from output pins to input pins, together with a specification of several distinct ordered output pins $p_1, \ldots, p_n$ as the output bits of the whole circuit. Cycles are not permitted. That is, we cannot start from some gate, keep traveling along outgoing wires and finally reach the gate we started with.

The set of gates is $G_{\mathrm{inp}} \cup G_{\mathrm{rand}} \cup G_{\mathrm{std}} \cup G_{\mathrm{oracle}}$, where $G_{\mathrm{inp}} = \{\omega_1, \ldots, \omega_t\}$ is the set of input gates, $G_{\mathrm{rand}}$ the set of random gates, $G_{\mathrm{std}}$ the set of standard gates, and $G_{\mathrm{oracle}}$ the set of oracle gates. Each non-oracle gate has exactly one output pin. Each input pin should receive exactly one incoming wire. The indegree of a gate is the number of its incoming wires. Each input or random gate has indegree zero, each standard gate has indegree zero, one, or two, and each oracle gate has indegree $l$. The output pin of each standard gate of indegree zero is labeled either by the constant 0 or by the constant 1. The output pin of each standard gate of indegree one is labeled by the negation function $\mathrm{NOT} : \{0, 1\} \to \{0, 1\}$, which maps 0 to 1 and 1 to 0. The output pin of each standard gate of indegree two is labeled either by the conjunction $\mathrm{AND} : \{0, 1\}^2 \to \{0, 1\}$ or by the disjunction $\mathrm{OR} : \{0, 1\}^2 \to \{0, 1\}$. The AND function outputs a 1 when its inputs are both 1 and a 0 otherwise. The OR function outputs a 0 when its inputs are both 0 and a 1 otherwise. Each oracle gate has $m$ output pins labeled with functions $\mathrm{pin}_1 : \{0, 1\}^l \to \{0, 1\}, \ldots, \mathrm{pin}_m : \{0, 1\}^l \to \{0, 1\}$. The function $\mathrm{pin}_i$ is the $i$th output bit of $o$.

The wires in $C$ carry Boolean values 0 and 1. For any assignment of Boolean values to the input gates, we compute the outputs of $C$ by first labeling independently and equiprobably a constant 0 or 1 for each output pin of a random gate. We then proceed by propagating values along the wires and computing the functions labeled with the respective pins until the output bits of $C$ are obtained. The size of a randomized $o$-oracle circuit is defined to be the number of its gates. A randomized circuit is a randomized $o$-oracle circuit without oracle gates. A circuit is deterministic if it has no random gates.
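The evaluation procedure just described can be sketched in a few lines. This is an illustrative simplification rather than the thesis's formal model: oracle gates are omitted, gates are assumed to be listed in topological order, and the tuple encoding of gates and the example circuit are hypothetical.

```python
import random

def evaluate(gates, outputs, inputs, rng):
    """Evaluate a randomized circuit (no oracle gates) given in topological order.

    gates:   list of tuples such as ("input", k), ("rand",), ("const", b),
             ("not", a), ("and", a, b), ("or", a, b), where a and b are
             indices of earlier gates (hypothetical encoding).
    outputs: indices of the gates whose output pins are the circuit's output bits.
    inputs:  Boolean values assigned to the input gates.
    rng:     a random.Random instance supplying the coin tosses.
    """
    values = []
    for gate in gates:
        kind = gate[0]
        if kind == "input":
            values.append(inputs[gate[1]])
        elif kind == "rand":
            values.append(rng.randint(0, 1))  # independent fair coin
        elif kind == "const":
            values.append(gate[1])
        elif kind == "not":
            values.append(1 - values[gate[1]])
        elif kind == "and":
            values.append(values[gate[1]] & values[gate[2]])
        elif kind == "or":
            values.append(values[gate[1]] | values[gate[2]])
        else:
            raise ValueError(f"unknown gate kind: {kind}")
    return [values[g] for g in outputs]

# Hypothetical circuit: output bit 0 is (x0 AND r) for a random bit r,
# output bit 1 is NOT x0.
gates = [("input", 0), ("rand",), ("and", 0, 1), ("not", 0)]
outputs = [2, 3]
size = len(gates)  # the size of a circuit is its number of gates
```

Listing gates in topological order is one simple way to respect the acyclicity requirement: every wire then goes from an earlier gate to a later one.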

A well-known counting technique due to Shannon can be adapted to bound the number of randomized o-oracle circuits of size s with n output bits.

Lemma 1. ([Sha49]) Fix a Boolean function $o : \{0, 1\}^l \to \{0, 1\}^m$. For each $s \in \mathbb{N}^+$, there are fewer than $2^{O(ls \log(sm) + n \log(sm))}$ randomized $o$-oracle circuits of size $s$ with $n$ output bits.

Proof. The input gates are $\omega_1, \ldots, \omega_t$, for some $1 \le t \le s$. Each gate, except the input gates, is one of seven kinds. Each gate receives at most $\max(l, 2)$ incoming wires, each coming from some output pin of another gate (there are at most $sm$ output pins in total). Finally, there are at most $(sm)^n$ ways to select the output bits in order. To sum up, the number of such circuits is at most $s \cdot 7^s \cdot ((sm)^{\max(l,2)})^s \cdot (sm)^n$. A direct computation completes the proof.
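The "direct computation" amounts to taking logarithms of the product of the four counting factors. A small sketch, assuming the count is exactly the product stated in the proof; the function name is hypothetical.

```python
import math

def log2_circuit_count_bound(s, l, m, n):
    """Base-2 logarithm of s * 7**s * ((s*m)**max(l, 2))**s * (s*m)**n."""
    sm = s * m
    return (
        math.log2(s)                         # choices of t, the number of input gates
        + s * math.log2(7)                   # kind of each gate
        + s * max(l, 2) * math.log2(sm)      # source pin of each incoming wire
        + n * math.log2(sm)                  # ordered choice of the n output pins
    )
```

The third and fourth terms dominate, which gives the exponent $O(ls \log(sm) + n \log(sm))$ in the lemma.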

We will need a well-known circuit called a multiplexer.

Lemma 2. ([KB04]) There is an $O((m + n)2^m)$-size deterministic circuit that, on Boolean input values $b_{i,j}$, $1 \le i \le 2^m$, $1 \le j \le n$, and an integer $1 \le g \le 2^m$ in binary, outputs $b_{g,j}$, $1 \le j \le n$.

Proof. For $1 \le i \le 2^m$, an $O(m)$-size circuit computes whether $g = i$. Having computed whether $g = i$ for $1 \le i \le 2^m$, each output bit $b_{g,j}$ can be computed by an $O(2^m)$-size deterministic circuit.
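The two steps of this proof can be mirrored in software using only bitwise AND, OR, and NOT on single bits. This is a sketch for intuition, not a gate-level netlist; it assumes a 0-based selector given most significant bit first (the lemma indexes from 1), and the function name is hypothetical.

```python
def multiplexer(rows, g_bits):
    """Return rows[g], where g is the integer encoded by g_bits.

    rows:   list of 2**m rows, each a list of n bits (the values b[i][j]).
    g_bits: the m selector bits, most significant first (0-based index).
    """
    m = len(g_bits)
    assert len(rows) == 2 ** m
    n = len(rows[0])
    out = [0] * n
    for i, row in enumerate(rows):
        # Step 1: eq = 1 iff g == i, using O(m) bit operations.
        eq = 1
        for pos, g_bit in enumerate(g_bits):
            i_bit = (i >> (m - 1 - pos)) & 1
            eq &= g_bit if i_bit else 1 - g_bit
        # Step 2: output bit j is the OR over i of (eq AND b[i][j]).
        for j in range(n):
            out[j] |= eq & row[j]
    return out
```

Step 1 costs $O(m)$ operations for each of the $2^m$ rows and step 2 costs $O(2^m)$ operations for each of the $n$ output bits, matching the $O((m + n)2^m)$ size bound.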

2.3 Tail Inequalities

We will need two results regarding tail probabilities. The first is the famous Chernoff bound.

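Fact 2. ([Che52]) Let $X_1, X_2, \ldots, X_n$ be independent 0-1 random variables such that for $1 \le i \le n$, $\Pr[X_i = 1] = p_i$. Then, for any $\delta > 0$,

$$\Pr\left[\frac{\sum_{i=1}^{n} X_i}{n} > (1 + \delta)\,\frac{\sum_{i=1}^{n} p_i}{n}\right] < \left(\frac{e^{\delta}}{(1 + \delta)^{1+\delta}}\right)^{\sum_{i=1}^{n} p_i}.$$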

The next famous bound is Hoeffding’s inequality.

Fact 3. ([Hoe63]) Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables with the same probability distribution, each ranging over the (real) interval $[a, b]$, and let $\mu$ denote the expected value of each of these variables. Then, for every $\varepsilon > 0$,

$$\Pr\left[\left|\frac{\sum_{i=1}^{n} X_i}{n} - \mu\right| > \varepsilon\right] < 2e^{-\frac{2\varepsilon^2 n}{(b-a)^2}}.$$
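To get a feel for the numbers in Hoeffding's inequality, the right-hand side can be evaluated directly. A minimal sketch; the function names are hypothetical, and the second helper simply inverts the bound to find a sufficient number of samples.

```python
import math

def hoeffding_tail_bound(n, eps, a, b):
    """Right-hand side of Fact 3: 2 * exp(-2 * eps^2 * n / (b - a)^2)."""
    return 2.0 * math.exp(-2.0 * eps * eps * n / (b - a) ** 2)

def samples_for_failure_prob(eps, a, b, delta):
    """Smallest n for which the bound of Fact 3 is at most delta."""
    return math.ceil((b - a) ** 2 * math.log(2.0 / delta) / (2.0 * eps * eps))
```

For instance, with variables in $[-n^k, n^k]$ and $2n^{2k+1}/\varepsilon^2$ samples, the exponent is $-2\varepsilon^2 \cdot 2n^{2k+1}/(\varepsilon^2 \cdot 4n^{2k}) = -n$, so the bound evaluates to $2e^{-n}$.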

2.4 Conventions and Assumptions

We consider families of two-person games in strategic form, parameterized by a positive integer $n$. In the game parameterized by $n$, each player is given a parameter $1^n$ and has $\{0, 1\}^n$ as his set of pure strategies. For each player, we use either $\{0, 1\}^n$ or the numbers $0$ to $2^n - 1$ to represent his pure strategies.

The payoff functions $M_{\mathrm{row}}$, $M_{\mathrm{col}}$, and $M$ for the game parameterized by $n$ will be written as $M^{(n)}_{\mathrm{row}}$, $M^{(n)}_{\mathrm{col}}$, and $M^{(n)}$, respectively. We assume $\|M^{(n)}_{\mathrm{row}}\| \le n^k$ and $\|M^{(n)}_{\mathrm{col}}\| \le n^k$ for some positive constant $k$, where for every real-valued, continuous, bounded function $f$, its supremum norm $\|f\|$ is the supremum of the image of $|f|$ [Rud76]. Whenever $M^{(n)}$ is used as an oracle, both its output components, $M^{(n)}_{\mathrm{row}}$ and $M^{(n)}_{\mathrm{col}}$, are in binary.

The players may be either computationally unbounded, time-bounded Turing machines, or polynomial-size (in $n$) randomized $M^{(n)}$-oracle circuits.

Using the tableau method, randomized polynomial-size $M^{(n)}$-oracle circuits can be simulated by randomized polynomial-time Turing machines with polynomial advice and an $M^{(n)}$-oracle, and vice versa [Sip05]. The set of randomized $M^{(n)}$-oracle circuits of size $s$ with $n$ output bits is denoted $\mathrm{SIZE}^{M^{(n)}}_{s,n}$, or simply $\mathrm{SIZE}^{M}_{s}$ when the value of $n$ is clear from the text. Similarly, the set of randomized circuits (without oracle gates) of size $s$ with $n$ output bits is $\mathrm{SIZE}_{s,n}$, or $\mathrm{SIZE}_{s}$ when the value of $n$ is clear from the text. Throughout this thesis, we say that a (deterministic or nondeterministic) Turing machine runs in polynomial time (resp. logarithmic space) if its time (resp. space) complexity is polynomial (resp. logarithmic) in its input length, which is $n$ in the game parameterized by $n$.

We abuse our notation slightly by using $M^{(n)}_{\mathrm{row}}(R, C)$ to denote the payoff of the row player when the row player is $R$ and the column player is $C$, and $E[M^{(n)}_{\mathrm{row}}(R, C)]$ its expected value. The expectation is over the random coin flips of $R$ and $C$. If $C$ always uses pure strategy $j$, we also denote $M^{(n)}_{\mathrm{row}}(R, C)$ by $M^{(n)}_{\mathrm{row}}(R, j)$. The same convention is adopted for $M^{(n)}_{\mathrm{col}}$. Similarly, $M^{(n)}(R, C)$ denotes $(M^{(n)}_{\mathrm{row}}(R, C), M^{(n)}_{\mathrm{col}}(R, C))$, and $M^{(n)}(R, j)$ denotes $(M^{(n)}_{\mathrm{row}}(R, j), M^{(n)}_{\mathrm{col}}(R, j))$.


Chapter 3

Extraction of Small Numbers of Good Strategies

In this chapter, we will often encounter a function $\varepsilon : \mathbb{N}^+ \to (0, 1)$. For convenience, we will write $\varepsilon(n)$ simply as $\varepsilon$ when the value of $n$ is clear from the context. We first prove the following theorem.

Theorem 1. Let $k > 0$, $d \ge 1$, and $c > d + k + 1$ be constants. Consider a family of two-person strategic games parameterized by $n \in \mathbb{N}^+$. Let $M^{(n)} : \{0, 1\}^n \times \{0, 1\}^n \to [-n^k, n^k]^2$ be the payoff function for the game parameterized by $n$. Let $\varepsilon : \mathbb{N}^+ \to (0, 1)$ be a function. We assume that the binary representation of each number in the range of $M^{(n)}_{\mathrm{row}}$ and $M^{(n)}_{\mathrm{col}}$ is of length polynomial in $n$. For each $n$, there is a set $S \subseteq \{0, 1\}^n$ of size $O((n^{k+d+1} \log n)/\varepsilon)$ such that

$$\min_{C \in \mathrm{SIZE}^{M}_{n^d}} \max_{i \in S} E[M^{(n)}_{\mathrm{row}}(i, C)] > \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\varepsilon)/\varepsilon}} \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)] - \varepsilon.$$

The symmetric statement, with the roles of players exchanged, holds.

Here is the interpretation of Theorem 1. Suppose the row player is informed of the mixed strategy of the column player. For each column player $C$, the row player chooses a best response and obtains an expected payoff of

$$\max_{\alpha \in P_{\mathrm{row}}} E[M^{(n)}_{\mathrm{row}}(\alpha, C)] = \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)].$$

The equality is from Fact 1. The first expectation is over $\alpha$ and the coin flips of $C$ and the second is over the coin flips of $C$. The expected payoff the row player could guarantee against all randomized $M^{(n)}$-oracle circuits of size $n^c \log(1/\varepsilon)/\varepsilon$ is therefore

$$\min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\varepsilon)/\varepsilon}} \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)].$$

Similarly,

$$\min_{C \in \mathrm{SIZE}^{M}_{n^d}} \max_{i \in S} E[M^{(n)}_{\mathrm{row}}(i, C)]$$

is the expected payoff the row player could guarantee by using only strategies in $S$ against all randomized $M^{(n)}$-oracle column players of size $n^d$, provided that the row player is informed of the mixed strategy of the column player.

For $\varepsilon$ not too small, Theorem 1 shows that when the circuit size of the column player shrinks polynomially (from $n^c \log(1/\varepsilon)/\varepsilon$ to $n^d$), the set of strategies of the row player could shrink exponentially (from $2^n$ to $O((n^{k+d+1} \log n)/\varepsilon)$) without affecting the guaranteed expected payoff of the row player too much.

Proof of Theorem 1. We will always assume n is sufficiently large whenever needed since the requirement on the size of S contains a big-O and the theorem is therefore immediately true for small values of n.

Denote

$$t = \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\varepsilon)/\varepsilon}} \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)]$$

for convenience. We say that a pure row strategy $i$ is good against a randomized $n^d$-size $M^{(n)}$-oracle circuit $C$ if $E[M^{(n)}_{\mathrm{row}}(i, C)] > t - \varepsilon$.

$S$ will be formed in stages. We keep a set of survivors which is initially the set of all randomized $n^d$-size $M^{(n)}$-oracle circuits with $n$ output bits. In each stage we put one pure row strategy into $S$ and kill the survivors against which this strategy is good. We do so until there are no survivors left. If the number of stages is $O((n^{k+d+1} \log n)/\varepsilon)$, then we are done.

Let $\mathrm{Survivors}_i$ denote the set of circuits that have not been killed after stage $i$. Initially, $\mathrm{Survivors}_0$ consists of all randomized $n^d$-size $M^{(n)}$-oracle circuits with $n$ output bits. According to Shannon’s counting argument (Lemma 1 with $n^d$ assigned to $s$, $2n$ assigned to $l$ and $\mathrm{poly}(n)$ assigned to $m$),

$$|\mathrm{Survivors}_0| \le 2^{O(n^{d+1} \log n)}. \tag{3.1}$$

Given $\mathrm{Survivors}_i$, we now show how to obtain $\mathrm{Survivors}_{i+1}$. Let $T$ be the smallest power of 2 not less than $(n^{k+1} \log n)/\varepsilon$. Hence $(n^{k+1} \log n)/\varepsilon \le T \le 2(n^{k+1} \log n)/\varepsilon$. Consider any collection of $T$ circuits in $\mathrm{Survivors}_i$, possibly with repetitions: $C_1, C_2, \ldots, C_T$. We construct a randomized $M^{(n)}$-oracle circuit $C$ that feeds independent random inputs to $C_1, C_2, \ldots, C_T$ and chooses equiprobably one of the $n$-bit outputs of $C_1, C_2, \ldots, C_T$ as the output of $C$. Summing up the sizes of $C_1, C_2, \ldots, C_T$ and including a multiplexer (Lemma 2 with $T$ assigned to $2^m$), we see that $C$ is of size

$$O\left(\frac{n^{k+1} \log n}{\varepsilon} \cdot n^d + \frac{n^{k+1} \log n}{\varepsilon} \cdot \left(\log\frac{n^{k+1} \log n}{\varepsilon} + n\right)\right) < \frac{n^c \log(1/\varepsilon)}{\varepsilon} \tag{3.2}$$

for sufficiently large $n$. Inequality (3.2) and the definition of $t$ show that there is an $i$ with $E[M^{(n)}_{\mathrm{row}}(i, C)] \ge t$. This is equivalent to saying

$$\frac{E[M^{(n)}_{\mathrm{row}}(i, C_1)] + \cdots + E[M^{(n)}_{\mathrm{row}}(i, C_T)]}{T} \ge t. \tag{3.3}$$


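Let $f$ be the fraction of values above $t - \varepsilon$ among

$$E[M^{(n)}_{\mathrm{row}}(i, C_1)], E[M^{(n)}_{\mathrm{row}}(i, C_2)], \ldots, E[M^{(n)}_{\mathrm{row}}(i, C_T)].$$

Since $\|M^{(n)}_{\mathrm{row}}\| \le n^k$, it is clear that

$$\frac{E[M^{(n)}_{\mathrm{row}}(i, C_1)] + \cdots + E[M^{(n)}_{\mathrm{row}}(i, C_T)]}{T} \le f n^k + (1 - f)(t - \varepsilon). \tag{3.4}$$

Inequalities (3.3) and (3.4) imply that $t \le f n^k + (1 - f)(t - \varepsilon)$, or that $f \ge \varepsilon/(n^k - t + \varepsilon)$. This and the fact that $\varepsilon \in (0, 1)$ and $|t| \le \|M^{(n)}_{\mathrm{row}}\| \le n^k$ result in

$$f > \varepsilon/(3n^k)$$

for sufficiently large $n$. That is, $i$ is good against more than an $\varepsilon/(3n^k)$ fraction of players among $C_1, C_2, \ldots, C_T$ for sufficiently large $n$.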

Next suppose we actually pick each of $C_1, C_2, \ldots, C_T$ independently and uniformly from $\mathrm{Survivors}_i$. Fix arbitrarily, if any, a pure row strategy $i'$ good against less than an $\varepsilon/(4n^k)$ fraction of $\mathrm{Survivors}_i$. Let $f_{i'}\varepsilon/(3n^k)$ be the fraction of players in $\mathrm{Survivors}_i$ against which $i'$ is good. By the choice of $i'$, we have $f_{i'} < 3/4$. The Chernoff bound (Fact 2 with $X_j = 1$ if $i'$ is good against $C_j$ and 0 otherwise, $p_j = f_{i'}\varepsilon/(3n^k)$, $T$ assigned to $n$ and $-1 + 1/f_{i'}$ assigned to $\delta$) gives

$$\Pr\left[i' \text{ is good against more than an } \varepsilon/(3n^k) \text{ fraction of } C_1, \ldots, C_T\right] < \left(f_{i'} e^{1 - f_{i'}}\right)^{T\varepsilon/(3n^k)} \le e^{-\Omega(n \log n)}.$$

The probability is over the picking of $C_1, \ldots, C_T$. The last inequality is true because the function $x e^{1-x}$ has positive derivative on $(0, 1)$ and $(3/4)e^{1 - 3/4} < 1$. By summing over $i' \in \{0, 1\}^n$, the probability is at most $2^n e^{-\Omega(n \log n)}$ that some pure row strategy good against less than an $\varepsilon/(4n^k)$ fraction of $\mathrm{Survivors}_i$ is good against more than an $\varepsilon/(3n^k)$ fraction of randomly picked $C_1, C_2, \ldots, C_T$. Since $2^n e^{-\Omega(n \log n)} < 1$ for sufficiently large $n$, a probabilistic argument shows that there is a choice of $C_1, C_2, \ldots, C_T$ such that every pure row strategy good against more than an $\varepsilon/(3n^k)$ fraction of $C_1, C_2, \ldots, C_T$ must be good against at least an $\varepsilon/(4n^k)$ fraction of $\mathrm{Survivors}_i$. We have seen in the last paragraph that for this (in fact, every) choice of $C_1, C_2, \ldots, C_T$, there is an $i$ good against more than an $\varepsilon/(3n^k)$ fraction of $C_1, C_2, \ldots, C_T$. This $i$ must therefore be good against at least an $\varepsilon/(4n^k)$ fraction of $\mathrm{Survivors}_i$. We add this $i$ to $S$ and obtain

$$|\mathrm{Survivors}_{i+1}| \le (1 - \varepsilon/(4n^k))\,|\mathrm{Survivors}_i|.$$

From this and inequality (3.1), and since $(1 - \varepsilon/(4n^k))^r \le e^{-r\varepsilon/(4n^k)}$ for every $r \ge 0$, after $O((n^{k+d+1} \log n)/\varepsilon)$ stages the number of survivors will be less than one. At that time, there must be no survivors left.
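The staged construction of $S$ can be mimicked on a toy instance. The sketch below is an illustration under strong assumptions and not the argument of the proof: column players are modeled as an explicit short list of mixed strategies (the real set of circuits is far too large to enumerate), and each stage greedily picks the row strategy good against the most survivors, whereas the proof only shows non-constructively that a strategy killing a sufficient fraction exists. All names and the example data are hypothetical.

```python
def extract_good_strategies(payoff, column_players, threshold):
    """Greedy analogue of the staged construction of S.

    payoff[i][j]:   row player's payoff for pure strategies (i, j).
    column_players: list of distributions over column pure strategies,
                    standing in for the surviving circuits.
    threshold:      plays the role of t - epsilon; row i is "good" against
                    a column player if its expected payoff exceeds it.
    Returns a list of row strategies containing a good one for every column player.
    """
    def expected(i, mix):
        return sum(q * payoff[i][j] for j, q in enumerate(mix))

    survivors = list(range(len(column_players)))
    chosen = []
    while survivors:
        # For each row strategy, collect the survivors it is good against.
        killed_by = {
            i: [c for c in survivors if expected(i, column_players[c]) > threshold]
            for i in range(len(payoff))
        }
        best = max(killed_by, key=lambda i: len(killed_by[i]))
        if not killed_by[best]:
            raise ValueError("some column player has no good row strategy")
        chosen.append(best)
        dead = set(killed_by[best])
        survivors = [c for c in survivors if c not in dead]
    return chosen

# Hypothetical 3x3 game in which the row player wins 1 by matching the column.
payoff = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
column_players = [[1, 0, 0], [0, 1, 0], [0.5, 0.5, 0]]
S = extract_good_strategies(payoff, column_players, threshold=0.4)
```

Because every stage removes at least a fixed fraction of the survivors, the number of stages grows only logarithmically in the initial number of survivors, which is what keeps $S$ small.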

The same theorem holds for circuits without oracle gates.

Corollary 1. Let $k > 0$, $d \ge 1$ and $c > d + k + 1$ be constants. Consider a family of two-person strategic games parameterized by $n \in \mathbb{N}^+$. Let $M^{(n)} : \{0, 1\}^n \times \{0, 1\}^n \to [-n^k, n^k]^2$ be the payoff function for the game. Let $\varepsilon : \mathbb{N}^+ \to (0, 1)$ be a function. We assume that the binary representation of each number in the range of $M^{(n)}_{\mathrm{row}}$ and $M^{(n)}_{\mathrm{col}}$ is of length polynomial in $n$. For each $n$, there is a set $S \subseteq \{0, 1\}^n$ of size $O((n^{k+d+1} \log n)/\varepsilon)$ such that

$$\min_{C \in \mathrm{SIZE}_{n^d}} \max_{i \in S} E[M^{(n)}_{\mathrm{row}}(i, C)] > \min_{C \in \mathrm{SIZE}_{n^c \log(1/\varepsilon)/\varepsilon}} \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)] - \varepsilon.$$

The symmetric statement, with the roles of players exchanged, holds.

Proof. The proof is the same as that of Theorem 1 but with each occurrence of “randomized $M^{(n)}$-oracle circuit” replaced with “randomized circuit.”

As has been pointed out in [FKS95], Newman’s result implies the following, which is relevant to our work [New91].

Lemma 3. ([New91]) Consider a family of two-person zero-sum strategic games parameterized by $n \in \mathbb{N}^+$. Let $M^{(n)}_{\mathrm{row}} : \{0, 1\}^n \times \{0, 1\}^n \to [-n^k, n^k]$ be the payoff function for the row player. Let $v$ be the game value under mixed strategies for the game parameterized by $n$, and $\varepsilon : \mathbb{N}^+ \to (0, 1)$ be a function. For every $T \ge 2n^{2k+1}/\varepsilon^2$ there is a multiset of $T$ pure row strategies such that if the row player selects equiprobably one of these strategies, his expected payoff is at least $v - \varepsilon$ against any mixed strategy in $P_{\mathrm{col}}$ adopted by the column player. Similarly, there is a multiset of $T$ pure column strategies such that if the column player selects equiprobably one of these strategies, his expected payoff is at least $-v - \varepsilon$ against any mixed strategy in $P_{\mathrm{row}}$ adopted by the row player.

Proof. Let $T \ge 2n^{2k+1}/\varepsilon^2$. Let each of the independent random variables $\alpha_1, \alpha_2, \ldots, \alpha_T$ be distributed identically as the mixed strategy of the row player in a mixed strategy Nash equilibrium. For an arbitrary pure column strategy $j$, we have $E[M^{(n)}_{\mathrm{row}}(\alpha_i, j)] \ge v$, $i = 1, \ldots, T$, by the definition of the game value under mixed strategies. From Hoeffding’s inequality (Fact 3 with $[a, b] = [-n^k, n^k]$, $M^{(n)}_{\mathrm{row}}(\alpha_i, j)$ assigned to $X_i$ and $T$ assigned to $n$), we have

$$\Pr\left[\frac{M^{(n)}_{\mathrm{row}}(\alpha_1, j) + \cdots + M^{(n)}_{\mathrm{row}}(\alpha_T, j)}{T} < v - \varepsilon\right] < 2e^{-n}.$$

So, by a union bound over the $2^n$ pure column strategies, with probability at most $2^n \cdot 2e^{-n} < 1$ (for $n \ge 3$), there exists a $j$ with

$$\frac{M^{(n)}_{\mathrm{row}}(\alpha_1, j) + \cdots + M^{(n)}_{\mathrm{row}}(\alpha_T, j)}{T} < v - \varepsilon.$$

Hence there exist pure strategies $i_1, i_2, \ldots, i_T$ for the row player with

$$\frac{M^{(n)}_{\mathrm{row}}(i_1, j) + \cdots + M^{(n)}_{\mathrm{row}}(i_T, j)}{T} \ge v - \varepsilon, \quad \forall j.$$

By selecting equiprobably from among $i_1, i_2, \ldots, i_T$, the row player can guarantee an expected payoff of at least $v - \varepsilon$ against any mixed strategy of the column player. The second part of this lemma can be proved by observing that by exchanging the roles of the two players, the game value becomes $-v$.
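The sampling argument can be tried out on a small zero-sum game whose equilibrium is known. The sketch below uses rock-paper-scissors, where the uniform mixed strategy is an equilibrium with value 0; the choice of game, the sample size, the seed, and the function names are assumptions made purely for illustration.

```python
import random

def sample_multiset(equilibrium_mix, T, rng):
    """Draw T pure strategies independently from an equilibrium mixed strategy."""
    return rng.choices(range(len(equilibrium_mix)), weights=equilibrium_mix, k=T)

def worst_case_payoff(payoff, multiset):
    """Payoff guaranteed by playing uniformly over the multiset.

    By Fact 1 it suffices to minimize over the column player's pure strategies.
    """
    T = len(multiset)
    return min(
        sum(payoff[i][j] for i in multiset) / T
        for j in range(len(payoff[0]))
    )

# Rock-paper-scissors payoffs for the row player (game value 0).
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
multiset = sample_multiset([1 / 3, 1 / 3, 1 / 3], T=300, rng=random.Random(0))
guarantee = worst_case_payoff(rps, multiset)
```

The guarantee can never exceed the game value; the lemma says that with enough samples it falls short of it by at most $\varepsilon$ for some choice of the multiset.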

We now consider the effects of Lemma 3 on Theorem 1.

Corollary 2. Let $k > 0$ and $c > 2k + 2$ be constants. Consider a family of two-person strategic games parameterized by $n \in \mathbb{N}^+$. Let $M^{(n)} : \{0, 1\}^n \times \{0, 1\}^n \to [-n^k, n^k]^2$ be the payoff function for the game $G_n$. Let $\varepsilon : \mathbb{N}^+ \to (0, 1)$ be a function. There is a multiset $S \subseteq \{0, 1\}^n$ of size $O(n^{2k+1}/\varepsilon^2)$ such that the row player $R_S$ who plays each strategy in $S$ equiprobably guarantees

$$\min_{\beta \in P_{\mathrm{col}}} E[M^{(n)}_{\mathrm{row}}(R_S, \beta)] > \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\varepsilon)/\varepsilon^2}} \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)] - \varepsilon$$

for sufficiently large $n$.

Proof. We may define a zero-sum game $G'_n$ with the payoff function $-M^{(n)}_{\mathrm{row}}$ for the column player and $M^{(n)}_{\mathrm{row}}$ for the row player. Let $v'$ be the value of $G'_n$ under mixed strategies. Let $T$ be the smallest power of 2 not less than $2n^{2k+1}/\varepsilon^2$. Using Lemma 3 on $G'_n$, we see that there is a multiset $S'$ of size $T$ such that, for the column player $C_{S'}$ choosing equiprobably a strategy in $S'$,

$$\forall \alpha \in P_{\mathrm{row}}, \quad E[M^{(n)}_{\mathrm{row}}(\alpha, C_{S'})] \le v' + \varepsilon/3. \tag{3.5}$$

The same holds in the original game $G_n$ because the games $G_n$ and $G'_n$ adopt the same payoff function for the row player and $C_{S'}$ makes no queries. $C_{S'}$ could be implemented as a circuit by hardwiring $S'$, adding random gates, and including a multiplexer. From Lemma 2 with $T$ assigned to $2^m$, such a circuit is of size

$$O\left(\frac{n^{2k+1}}{\varepsilon^2} \cdot n + \frac{n^{2k+1}}{\varepsilon^2} \cdot \left(\log\frac{n^{2k+1}}{\varepsilon^2} + n\right)\right) < \frac{n^c \log(1/\varepsilon)}{\varepsilon^2} \tag{3.6}$$

for sufficiently large $n$. Denote

$$t = \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\varepsilon)/\varepsilon^2}} \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)].$$

Inequalities (3.5), (3.6), and our definition of $t$ give $t \le v' + \varepsilon/3$ for sufficiently large $n$, or, equivalently,

$$v' \ge t - \varepsilon/3 \tag{3.7}$$

for sufficiently large $n$. Applying Lemma 3 to $G'_n$, we see that by selecting equiprobably from among some $T$ strategies $S$, the row player can guarantee an expected payoff of at least $v' - \varepsilon/3$ against every mixed strategy of the column player. Again, this must also be true in $G_n$. Inequality (3.7) implies $v' - \varepsilon/3 \ge t - 2\varepsilon/3 > t - \varepsilon$ and completes the proof.

Since

$$\forall \beta \in P_{\mathrm{col}}, \quad E[M^{(n)}_{\mathrm{row}}(R_S, \beta)] \le \max_{i \in S} E[M^{(n)}_{\mathrm{row}}(i, \beta)],$$

a direct computation on the sizes of $S$ and the circuit sizes in Theorem 1 and Corollary 2 shows that Theorem 1 with $d = k + 3$ and $\varepsilon > 1/n$ is implied by Corollary 2, and hence by Lemma 3. It is not known, however, whether Theorem 1 is implied by Lemma 3 in general.


Chapter 4

Implications on the Power of Players

We use Theorem 1 to investigate the computational resource needed for a player to guarantee a good expected payoff against computationally bounded players. A naive player performs exponentially (in $n$) many queries to $M^{(n)}$ to determine an optimal strategy, pure or mixed, but we show how a player could run in less than exponential time without degrading his payoff too much.

Corollary 3. Let $k > 0$, $d \ge 1$, $c > k + d + 1$, and $0 < \varepsilon < 1$ be constants. Consider a family of two-person strategic games parameterized by a positive integer $n$. Let $M^{(n)} : \{0, 1\}^n \times \{0, 1\}^n \to [-n^k, n^k]^2$ be the payoff function. We assume that the binary representation of each number in the range of $M^{(n)}_{\mathrm{row}}$ and $M^{(n)}_{\mathrm{col}}$ is of length polynomial in $n$. Denote $t = \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\varepsilon)/\varepsilon}} \max_{i \in \{0,1\}^n} E[M^{(n)}_{\mathrm{row}}(i, C)]$. The following statements are true.

(i) If $R$ is informed of the circuit computing $C$, then he needs only polynomial time with polynomial advice (in $n$), private fair coin flips, and an $M^{(n)}$-oracle to guarantee that $E[M^{(n)}_{\mathrm{row}}(R, C)] > t - \varepsilon$ for every randomized $n^d$-size $M^{(n)}$-oracle $C$.

(ii) Assume $R$ is allowed to produce his output after he sees the output strategy of $C$. If the language $\{\langle i, j, k, n \rangle \mid \text{the } k\text{th bit in the binary representation of } M^{(n)}_{\mathrm{row}}(i, j) \text{ is a } 1\}$ is in NL/poly, then there is a nondeterministic logarithmic-space row player $R$ with polynomial advice that always has a unique accepting branch, and $E[M^{(n)}_{\mathrm{row}}(R, C)] > t - \varepsilon$ holds for every randomized $n^d$-size $M^{(n)}$-oracle $C$. Here the output of $R$ is defined to be the string it leaves on its output tape on the unique accepting branch.

(iii) If $R$ is required to determine his output strategy simultaneously with $C$ (which is unknown to $R$), and if he obtains an additional $O(\log n)$ bits of information about $C$, then he needs only deterministic polynomial time with polynomial advice (in $n$) to guarantee that $E[M^{(n)}_{\mathrm{row}}(R, C)] > t - \varepsilon$ holds for every randomized $n^d$-size $M^{(n)}$-oracle circuit $C$.

Here is the interpretation of Corollary 3. Assuming that $R$ is informed of the mixed strategy (or the circuit) of the column player, the value $t$ is the expected payoff $R$ could guarantee against all randomized $M^{(n)}$-oracle column players of size $n^c \log(1/\varepsilon)/\varepsilon$. Item (i) states that with polynomial advice and oracle access to $M^{(n)}$, a polynomial-time $R$ suffices to guarantee a $t - \varepsilon$ payoff against all randomized $n^d$-size $M^{(n)}$-oracle column players. Item (ii) further assumes that $R$ chooses his output strategy after the column player does, and $M^{(n)}$ is computed by a nondeterministic logarithmic-space Turing machine with polynomial advice. Under these assumptions, the polynomial-time Turing machine $R$ in item (i) can be further reduced to an unambiguous logarithmic-space one. The guaranteed expected payoff is still at least $t - \varepsilon$ against randomized $n^d$-size $M^{(n)}$-oracle column players. Item (iii) suggests that, even if $R$ knows only some $O(\log n)$ bits of information about the column player (of size $n^d$), his guaranteed expected payoff is almost as if he is informed of the whole circuit of the column player (of size $n^c \log(1/\varepsilon)/\varepsilon$).

Proof of Corollary 3. We have seen in Theorem 1 that there is a set $S$ of size polynomial in $n$ such that, for every randomized $n^d$-size $M^{(n)}$-oracle column player $C$, there is a strategy $i \in S$ with $E[M^{(n)}_{\mathrm{row}}(i, C)] > t - \varepsilon/2$. We give $S$ as the advice to $R$ in all the following cases.

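Proof of (i). Let C be an arbitrary randomized $M^{(n)}$-oracle circuit of size $n^d$ with n output bits. Being informed of the circuit computing C, R can simulate C independently $50n^{2k+1}/\epsilon^2$ times and obtain outputs $O_1, \ldots, O_{50n^{2k+1}/\epsilon^2}$. For each $i \in S$, he computes $M^{(n)}_{\mathrm{row}}(i, O_j)$ for $1 \le j \le 50n^{2k+1}/\epsilon^2$. Hoeffding’s inequality (Fact 3 with $M^{(n)}_{\mathrm{row}}(i, O_j)$ assigned to $X_j$, $50n^{2k+1}/\epsilon^2$ assigned to $n$, $\epsilon/5$ assigned to $\epsilon$ and $[a, b] = [-n^k, n^k]$) yields

\[
\Pr\left[\,\left|\frac{\sum_{j=1}^{50n^{2k+1}/\epsilon^2} M^{(n)}_{\mathrm{row}}(i, O_j)}{50n^{2k+1}/\epsilon^2} - E[M^{(n)}_{\mathrm{row}}(i, C)]\right| > \epsilon/5\,\right] < 2e^{-n}, \tag{4.1}
\]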

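where the probability is over the coin tosses of R. For each $i \in S$, R estimates $E[M^{(n)}_{\mathrm{row}}(i, C)]$ as

\[
\tilde{E}[M^{(n)}_{\mathrm{row}}(i, C)] \equiv \frac{M^{(n)}_{\mathrm{row}}(i, O_1) + \cdots + M^{(n)}_{\mathrm{row}}(i, O_{50n^{2k+1}/\epsilon^2})}{50n^{2k+1}/\epsilon^2}.
\]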
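The bound $2e^{-n}$ in (4.1) can be checked by hand. The computation below is a sketch that assumes Fact 3 is the usual two-sided Hoeffding bound with exponent $-2m\delta^2/(b-a)^2$ for the mean of $m$ independent samples in $[a, b]$; the symbols $m$ and $\delta$ are introduced here only for this check and stand for the number of simulations and the deviation $\epsilon/5$.

```latex
\[
m = \frac{50n^{2k+1}}{\epsilon^{2}}, \qquad
\delta = \frac{\epsilon}{5}, \qquad
b - a = 2n^{k},
\]
\[
\frac{2m\delta^{2}}{(b-a)^{2}}
= \frac{2 \cdot \dfrac{50n^{2k+1}}{\epsilon^{2}} \cdot \dfrac{\epsilon^{2}}{25}}{4n^{2k}}
= \frac{4n^{2k+1}}{4n^{2k}}
= n .
\]
```

Under that assumption, the constant 50 is what makes the exponent exactly $n$.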

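Note that the random variable $\tilde{E}[M^{(n)}_{\mathrm{row}}(i, C)]$ depends solely on the random coin tosses of R, and not those of C. Using inequality (4.1) and a union bound over S, we see that

\[
\Pr\left[\,\exists i \in S,\ \left|\tilde{E}[M^{(n)}_{\mathrm{row}}(i, C)] - E[M^{(n)}_{\mathrm{row}}(i, C)]\right| > \epsilon/5\,\right] < |S|\,2e^{-n}, \tag{4.2}
\]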


where the probability is over the random coin tosses of R. The selection of S guarantees

\[
\max_{i \in S} E[M^{(n)}_{\mathrm{row}}(i, C)] > t - \epsilon/2. \tag{4.3}
\]

Consider the row player R that outputs an $i \in S$ with the largest $\tilde{E}[M^{(n)}_{\mathrm{row}}(i, C)]$.

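Let $i^{*}$ be the output of R. From inequality (4.2), with probability more than $1 - |S|\,2e^{-n}$,

\[
\begin{aligned}
E[M^{(n)}_{\mathrm{row}}(i^{*}, C)]
&\ge \tilde{E}[M^{(n)}_{\mathrm{row}}(i^{*}, C)] - \epsilon/5 \\
&= \max_{i \in S} \tilde{E}[M^{(n)}_{\mathrm{row}}(i, C)] - \epsilon/5 \\
&\ge \max_{i \in S} E[M^{(n)}_{\mathrm{row}}(i, C)] - 2\epsilon/5 \\
&> t - 9\epsilon/10.
\end{aligned}
\tag{4.4}
\]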

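The last inequality is from inequality (4.3). Since R and C use independent coin tosses, and since $\|M^{(n)}_{\mathrm{row}}\| \le n^k$,

\[
E[M^{(n)}_{\mathrm{row}}(R, C)] > (1 - |S|\,2e^{-n})(t - 9\epsilon/10) + |S|\,2e^{-n}(-n^k) > t - \epsilon
\]

for sufficiently large n, as required. The second inequality holds because $|S|$ is polynomial in n, so the total contribution of the terms carrying the factor $|S|\,2e^{-n}$ is eventually smaller than $\epsilon/10$. For the finitely many smaller values of n for which the last inequality does not hold, item (i) is immediately true because a polynomial-time Turing machine can examine every strategy in $\{0, 1\}^n$.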
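The procedure in the proof of (i) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `choose_strategy`, `payoff` and `sample_column` are hypothetical, `payoff` stands in for $M^{(n)}_{\mathrm{row}}$, `sample_column` stands in for one independent simulation of the circuit C, and the toy game at the bottom is not taken from the thesis. In the proof the number of samples is $50n^{2k+1}/\epsilon^2$; the sketch takes it as a parameter.

```python
import random

def choose_strategy(S, payoff, sample_column, num_samples, rng):
    """Return the i in S with the largest empirical mean payoff, and that mean.

    S             -- advice set of candidate row strategies (assumed nonempty)
    payoff(i, j)  -- hypothetical stand-in for the row player's payoff matrix
    sample_column -- hypothetical stand-in for one simulation of the column player
    num_samples   -- number of independent simulations of the column player
    """
    # Simulate the column player once per sample; the same outputs are reused
    # for every candidate i, as in the proof.
    outputs = [sample_column(rng) for _ in range(num_samples)]
    best_i, best_est = None, float("-inf")
    for i in S:
        est = sum(payoff(i, o) for o in outputs) / num_samples
        if est > best_est:
            best_i, best_est = i, est
    return best_i, best_est

# Toy game (an assumption for illustration only): payoff +1 on a match,
# -1 otherwise; the column player outputs 0 with probability 0.8.
def toy_payoff(i, j):
    return 1 if i == j else -1

def toy_column(rng):
    return 0 if rng.random() < 0.8 else 1

if __name__ == "__main__":
    i_star, est = choose_strategy([0, 1], toy_payoff, toy_column,
                                  2000, random.Random(0))
    print(i_star, round(est, 2))
```

In the toy game the true expected payoffs are 0.6 for strategy 0 and -0.6 for strategy 1, so the empirical maximizer should be strategy 0 with an estimate close to 0.6.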

Proof of (ii). In this case R himself evaluates $M^{(n)}_{\mathrm{row}}$ on C’s output strategy and each strategy in S. He chooses the best strategy in S for output.

Since the Immerman-Szelepcsényi theorem [Imm88] can be directly extended to say that NL/poly = coNL/poly, our assumption implies that asking if the kth bit of $M^{(n)}_{\mathrm{row}}(i, j)$ is 0 is also in NL/poly. We will also use the fact that NL/poly = UL/poly from [RA97].

Let C’s output strategy be j and denote $S = \{i_1, \ldots, i_{|S|}\}$. The row player computes $M^{(n)}_{\mathrm{row}}(i_t, j)$ for $t = 1, \ldots, |S|$. When computing $M^{(n)}_{\mathrm{row}}(i_t, j)$, he guesses its first bit using a nondeterministic branch and verifies it in unambiguous logarithmic space. This can be done since NL/poly = coNL/poly = UL/poly. He then does the same for the second bit, the third bit, and so on.

It is clear that only the branch that guesses every bit correctly survives; all others are rejected. In this manner R proceeds by computing $M^{(n)}_{\mathrm{row}}(i_t, j)$, $t = 1, \ldots, |S|$, one by one. Instead of saving all these values, which takes space polynomial in n, he need only store the best strategy he has evaluated so far. The corresponding $M^{(n)}_{\mathrm{row}}$-value can be computed on the fly whenever it is needed. These observations yield that R runs in unambiguous logarithmic space (in n).

For an arbitrary randomized $n^d$-size $M^{(n)}$-oracle player C, let $i \in S$ be such that $E[M^{(n)}_{\mathrm{row}}(i, C)] > t - \epsilon$. Since whatever strategy C takes, R always chooses a strategy no worse than i, we must have $E[M^{(n)}_{\mathrm{row}}(R, C)] > t - \epsilon$.
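The space-saving idea in the proof of (ii), keeping only the best candidate seen so far and recomputing payoff values on demand, can be sketched as follows. The sketch is deterministic Python and does not model the nondeterministic bit guessing or the advice; `best_response` and `payoff` are hypothetical names, with `payoff` standing in for $M^{(n)}_{\mathrm{row}}$.

```python
def best_response(S, j, payoff):
    """Scan S once and return a strategy in S with the largest payoff against j.

    Only an index into S is kept between iterations; no table of payoff
    values is stored. S is assumed to be nonempty.
    """
    best = 0  # index of the best strategy evaluated so far
    for t in range(1, len(S)):
        # Both payoff values are recomputed here rather than remembered.
        if payoff(S[t], j) > payoff(S[best], j):
            best = t
    return S[best]

# Toy payoff (an assumption for illustration only): the negated Hamming
# distance between the binary representations of i and j.
def toy_payoff(i, j):
    return -bin(i ^ j).count("1")

if __name__ == "__main__":
    print(best_response([3, 5, 6], 6, toy_payoff))
```

Because the comparison is strict, ties are resolved in favor of the earliest candidate in S, which is one arbitrary but fixed choice.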

Proof of (iii). In this case R just needs to know which $i \in S$ makes $E[M^{(n)}_{\mathrm{row}}(i, C)] > t - \epsilon$. This information takes $O(\log n)$ bits.
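The bit count in the proof of (iii) comes from encoding an index into S, whose size is polynomial in n. The following sketch makes that concrete; the function names `index_width`, `encode_hint` and `decode_hint` are hypothetical, and S is represented as a plain Python list purely for illustration.

```python
def index_width(size_of_S):
    """Number of bits needed to name one element of a set of the given size."""
    return max(1, (size_of_S - 1).bit_length())

def encode_hint(S, good_index):
    """Fixed-width binary string naming the position of a good strategy in S."""
    return format(good_index, "0{}b".format(index_width(len(S))))

def decode_hint(S, bits):
    """Recover the strategy from the advice set S and the hint bits."""
    return S[int(bits, 2)]

if __name__ == "__main__":
    S = list(range(100, 105))      # five candidate strategies
    hint = encode_hint(S, 4)
    print(hint, decode_hint(S, hint))
```

For example, if $|S| = n^3$ and $n = 16$, the index takes 12 bits, which is $3\log_2 n$.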


Appendix A

Complexity Classes

Brief definitions of several complexity classes are given below. For detailed definitions, please refer to [AK].

1. P is the class of languages decidable in polynomial time.

2. P-hard is the class of languages to which every language in P is logarithmic-space reducible.

3. EXP is the class of languages decidable in exponential time.

4. EXP-hard is the class of languages to which every language in EXP is polynomial-time reducible.

5. EXP-complete = EXP ∩ EXP-hard.

6. promise-$S_2^P$ is the class of languages L such that there are disjoint sets $\Pi^{+}, \Pi^{-}$ with $\Pi^{+} \subseteq L$, $\Pi^{-} \cap L = \emptyset$ and there is a polynomial-time computable predicate $R(x, y, z)$ for $|y| = |z| = \mathrm{poly}(|x|)$ satisfying the following: $\forall x \in \Pi^{+}, \exists y \forall z\, R(x, y, z) = 1$ and $\forall x \in \Pi^{-}, \exists z \forall y\, R(x, y, z) = 0$.


7. RP is the class of languages L such that there is a nondeterministic polynomial-time Turing machine which on input $x \in L$ accepts on at least 1/2 of its computation paths and on input $x \notin L$ rejects on every computation path.

8. coRP is the class of languages whose complement is in RP.

9. ZPP = RP ∩ coRP.

10. NP is the class of languages L such that there is a nondeterministic polynomial-time Turing machine which on input x accepts on at least one computation path if and only if x ∈ L.

11. PPAD is the class of function problems of the following form. A polynomial-time algorithm P is given that, on any input x, implicitly defines a directed graph G(x) with nodes $\Sigma^{p(|x|)}$ by outputting, for each $y \in \Sigma^{p(|x|)}$, the vertices pointing to or from y, where p is a polynomial and Σ is some constant-size alphabet. The graph G(x) is restricted to be one in which each vertex has indegree and outdegree at most one. Given a source of G(x) (i.e., a vertex with indegree zero), the problem is to find another source or a sink.

12. PPAD-complete is the class of function problems in PPAD to which every function problem in PPAD is reducible.

13. NL/poly is the class of languages L such that there are a nondeterministic logarithmic-space Turing machine and a sequence of polynomially long (in n) advice strings $\{a_n\}_{n \in \mathbb{N}}$ such that on inputs x and $a_{|x|}$, the Turing machine has an accepting computation path if and only if $x \in L$.

14. UL/poly is the same as NL/poly except that the logarithmic-space Turing machine must have at most one accepting computation path whatever input it is given.

15. coNL/poly is the class of languages whose complement is in NL/poly.
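To make the PPAD definition in item 11 concrete, the sketch below walks from a given source along outgoing edges until it reaches a sink. It is only an illustration: `follow_path` and `successor` are hypothetical names, and the four-vertex graph is a toy example rather than anything from the thesis. The walk always terminates on a finite graph of the restricted form, but it may take a number of steps exponential in $|x|$, so it is not a polynomial-time algorithm for the problem.

```python
def follow_path(source, successor):
    """Walk from `source` along successor edges; return (sink, number of steps).

    successor(v) returns the unique vertex that v points to, or None when v
    has outdegree zero. Every vertex is assumed to have indegree and
    outdegree at most one, and `source` is assumed to have indegree zero,
    so the walk cannot revisit a vertex.
    """
    v, steps = source, 0
    while True:
        nxt = successor(v)
        if nxt is None:
            return v, steps
        v, steps = nxt, steps + 1

if __name__ == "__main__":
    # Toy graph on 3-bit strings: 000 -> 011 -> 101 -> 110.
    edges = {"000": "011", "011": "101", "101": "110"}
    print(follow_path("000", edges.get))
```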


Bibliography

[AK] S. Aaronson and G. Kuperberg, http://www.complexityzoo.com/.

[CD06] X. Chen and X. Deng, 2D-SPERNER is PPAD-complete, Submitted to STOC (2006).

[Che52] H. Chernoff, A measure of the asymptotic efficiency of tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics 23 (1952), 493–507.

[DGP05] C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou, The complexity of computing a Nash equilibrium, Tech. Report TR05-115, Electronic Colloquium on Computational Complexity, 2005.

[FIKU05] L. Fortnow, R. Impagliazzo, V. Kabanets, and C. Umans, On the complexity of succinct zero-sum games, Proceedings of the 20th IEEE Conference on Computational Complexity, 2005, pp. 323–332.

[FKS95] J. Feigenbaum, D. Koller, and P. Shor, A game-theoretic classification of interactive complexity classes, Proceedings of the 10th Annual IEEE Conference on Computational Complexity, 1995, pp. 227–237.


[FPS03] L. Fortnow, A. Pavan, and S. Sengupta, Proving SAT does not have small circuits with an application to the two queries problem, Proceedings of the 18th Annual IEEE Conference on Computational Complexity, 2003, pp. 347–357.

[GK92] M. Grigoriadis and L. Khachiyan, Approximating solution of matrix games in parallel, Advances in Optimization and Parallel Computing (Amsterdam) (P. Pardalos, ed.), Elsevier, 1992, pp. 129–136.

[GK95] M. Grigoriadis and L. Khachiyan, A sublinear-time randomized approximation algorithm for matrix games, Operations Research Letters 18 (1995), no. 2, 53–58.

[Hoe63] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (1963), no. 301, 13–30.

[Imm88] N. Immerman, Nondeterministic space is closed under complementation, SIAM Journal on Computing (1988), 935–938.

[KB04] R. H. Katz and G. Borriello, Contemporary logic design, 2nd ed., Prentice Hall, 2004.

[Kha79] L. G. Khachiyan, A polynomial algorithm in linear programming, Soviet Mathematics Doklady 20 (1979), 191–194.

[LN93] M. Luby and N. Nisan, A parallel approximation algorithm for positive linear programming, Proceedings of the 25th Annual ACM Symposium on Theory of Computing, 1993, pp. 448–457.


[Nas51] J. Nash, Noncooperative games, Annals of Mathematics 54 (1951), 289–295.

[New91] I. Newman, Private vs. common random bits in communication complexity, Information Processing Letters 39 (1991), 67–71.

[OR94] M. J. Osborne and A. Rubinstein, A course in game theory, MIT Press, 1994.

[Owe82] G. Owen, Game theory, Academic Press, 1982.

[PST95] S. Plotkin, D. Shmoys, and E. Tardos, Fast approximation algorithms for fractional packing and covering problems, Mathematics of Operations Research 20 (1995), no. 2, 257–301.

[RA97] K. Reinhardt and E. Allender, Making nondeterminism unambiguous, Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science, 1997, pp. 244–253.

[Rud76] W. Rudin, Principles of mathematical analysis, 3rd ed., McGraw-Hill, 1976.

[Sha49] C. E. Shannon, Communication in the presence of noise, IRE 37 (1949), 10–21.

[Sip05] M. Sipser, Introduction to the theory of computation, 2nd ed., Course Technology, 2005.
