On the Computational Power of Players in Two-Person Strategic Games

Advisor: Prof. Yuh-Dauh Lyuu

Ching-Lueh Chang

Department of Computer Science and Information Engineering

National Taiwan University

We consider families of two-person strategic games parameterized by a positive integer n. We assume that each of the players, the row player and the column player, has 2ⁿ strategies to choose from and can take mixed strategies. We also assume that the games are not too “risky” in that the payoffs are at most polynomial in n in absolute value. The row player is said to guarantee an expected payoff of t ∈ R against all column players of a certain class when his expected payoff is at least t against all column players of that class. This thesis studies the expected payoff the row player could guarantee against all column players of a certain computational power. In our main result we consider the case where the row player is informed of how the column player chooses his action but is not allowed to see the internal coin tosses of the column player. Roughly speaking, we show that when the computational power of the column player shrinks polynomially, the row player can have his number of pure strategies shrunk exponentially without harming his guaranteed expected payoff too much. We obtain several corollaries regarding the computational power needed by the row player to guarantee a good expected payoff against computationally bounded column players in situations where the row player is aware of the output strategy of the column player, the mixed strategy of the column player, or only O(log n) bits of information about the column player.

Chapter 1 Introduction

1.1 Two-Person Strategic Games

In a two-person strategic game there are two players, the row player and the column player. Each player has several pure strategies to choose from. There is a payoff function mapping each combination of the players’ pure strategies to their payoffs. Each player can also adopt a mixed strategy, that is, a probability distribution over pure strategies. In a two-person game, a pair of mixed strategies (α₁, α₂) is a mixed strategy Nash equilibrium if no player i has a strategy yielding an expected payoff higher than when he chooses α_i, given that the other player j chooses α_j. The expectation is over the mixed strategies adopted by both players. If the payoffs of the two players sum to 0, the game is said to be zero-sum. For a zero-sum game, the expected payoff for the row player when a mixed strategy Nash equilibrium is played is called a game value under mixed strategies. The game value thus defined is unique [OR94]. For a precise definition of these and some other terms please refer to Section 2.1.

1.2 Our Results

We consider families of two-person strategic games parameterized by a positive integer n. In the game parameterized by n, each player is given input 1ⁿ and oracle access to the payoff function mapping each combination of the players’ pure strategies to their payoffs. The payoffs are assumed to be bounded by n^k in absolute value for some constant k. For technical reasons we will sometimes need to assume that the payoffs can be represented in binary using polynomially many bits (in n). Each player has {0, 1}ⁿ as his set of pure strategies and can take mixed strategies. The row player is said to guarantee an expected payoff of t ∈ R against all column players of a certain class if his expected payoff is at least t against all column players of that class. This thesis considers the case where the row player is informed of how the column player chooses his action but is not allowed to see the internal coin tosses of the column player. In this case, therefore, the row player is said to guarantee an expected payoff of t against all column players of a certain class if, against each column player of that class, the row player can take some strategy to force an expected payoff of at least t. In particular, the row player is interested in the expected payoff he could guarantee against all column players being randomized circuits (with oracle gates for the players’ payoff functions) of a certain size s. This thesis shows that when s shrinks polynomially (in n), the row player could have his number of pure strategies shrunk exponentially (in n) without harming his guaranteed expected payoff too much. We obtain several corollaries regarding the computational power needed by the row player to guarantee a good expected payoff against randomized circuits (acting as the column player) of a certain size, in situations where the row player is informed of the action taken by the column player, the mixed strategy of the column player, or only some O(log n) bits of information about the column player.

1.3 Related Works

The computational power of players in games is a topic being actively researched. We briefly describe several results in this area. For the definitions of standard complexity classes please refer to the complexity zoo [AK]. In this section when a game is given to an algorithm, it is given by explicitly specifying the payoffs for both players given each combination of pure strategies being adopted by the players, unless otherwise specified. It is well-known that for a two-person zero-sum game, a mixed strategy Nash equilibrium, and hence the game value, can be found in polynomial time by linear programming [Kha79, Owe82]. It is also known that given a two-person zero-sum game and a number v, computing whether the game value equals v is P-hard [FIKU05]. Given a polynomial-size circuit that computes the payoff function of a two-person zero-sum game and a number v, computing whether the game value equals v is known to be EXP-complete [FKS95]. A mixed strategy α in a two-person zero-sum game is said to be ε-optimal if, when the corresponding player uses α, his least possible expected payoff (against arbitrarily malicious players) is at most ε lower than that when a mixed strategy Nash equilibrium is played. Newman has shown that in any two-person zero-sum game with payoffs in [0, 1], each player has an ε-optimal strategy which chooses uniformly from a multiset of O(log N/ε²) pure strategies, where N is the number of pure strategies of the other player [New91]. Finding ε-optimal strategies can be done efficiently in parallel [GK92, GK95, LN93, PST95], as well as sequentially in sublinear time by a randomized algorithm [GK95].

Given a polynomial-size circuit that computes the payoff function of a two-person zero-sum game, approximating the game value to within an additive factor is complete for the class promise-S^p₂ [FIKU05], and a pair of ε-optimal strategies can be constructed by a ZPP^NP algorithm [FIKU05]. After a series of exciting developments initiated by Daskalakis, Goldberg, and Papadimitriou [DGP05], Chen and Deng [CD06] recently showed that computing a mixed strategy Nash equilibrium is PPAD-complete for two-person games.

The rest of the thesis is organized as follows. Chapter 2 presents the necessary background knowledge and some conventions. Chapter 3 shows how to extract a small number of strategies good against players of a certain computational power using an idea similar to that in [FPS03]. Chapter 4 applies the result to analyze the computational power needed by a player to guarantee a certain expected payoff in several cases.

Chapter 2 Preliminaries

2.1 Basic Terms in Game Theory

A finite N-person game G in strategic form consists of players 1, . . . , N, N finite sets A₁, A₂, . . . , A_N, and for i = 1, . . . , N, a payoff function M_i : A₁ × A₂ × · · · × A_N → R. For i = 1, . . . , N, A_i is the finite set of pure strategies player i can take, and M_i maps the players’ pure strategies to the payoff for player i.

Player i can adopt a probability distribution over A_i as his strategy. In this case we say that player i takes a mixed strategy. A mixed strategy α_i of player i is also used to denote a random variable whose distribution is specified by α_i, since no ambiguity will arise. A pure strategy is a degenerate mixed strategy.

Now suppose α_i is the mixed strategy of player i and α₁, . . . , α_N are independent. We say that α_i is a best response to ×_{1≤k≤N, k≠i} α_k if for every mixed strategy α′_i of player i,

\[ \mathrm{E}[M_i(\alpha_1, \ldots, \alpha_{i-1}, \alpha'_i, \alpha_{i+1}, \ldots, \alpha_N)] \le \mathrm{E}[M_i(\alpha_1, \ldots, \alpha_{i-1}, \alpha_i, \alpha_{i+1}, \ldots, \alpha_N)]. \]

The randomness in the first expectation comes from α₁, . . . , α_{i−1}, α′_i, α_{i+1}, . . . , α_N; that of the second expectation comes from α₁, . . . , α_N. We say that (α₁, . . . , α_N) is a mixed strategy Nash equilibrium for G if for each i, α_i is a best response to ×_{1≤k≤N, k≠i} α_k. A celebrated theorem due to Nash states that every finite N-person game in strategic form has a mixed strategy Nash equilibrium [Nas51]. The following fact is well-known.

Fact 1. ([OR94]) Let α₁, . . . , α_N be mixed strategies for players 1, . . . , N in a finite N-person game in strategic form. For 1 ≤ i ≤ N, there is a pure strategy for the ith player which is a best response to ×_{1≤k≤N, k≠i} α_k.
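Fact 1 can be illustrated on a small two-person example. The following is a minimal sketch, not taken from the thesis: the function names and the matching-pennies payoff matrix are illustrative assumptions. It computes the expected payoff of every pure row strategy against a fixed mixed column strategy and returns a maximizer, which by linearity of expectation is a best response among all mixed row strategies as well.

```python
from fractions import Fraction

def expected_row_payoffs(payoff, col_mixed):
    """Expected row payoff of each pure row strategy against a mixed column strategy.

    payoff[i][j] is the row player's payoff when row plays i and column plays j
    (hypothetical toy representation; the thesis uses oracle access instead).
    """
    return [sum(p * row[j] for j, p in enumerate(col_mixed)) for row in payoff]

def pure_best_response(payoff, col_mixed):
    """Return (index, value) of a pure row strategy maximizing the expected payoff."""
    values = expected_row_payoffs(payoff, col_mixed)
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]

# Matching pennies (assumed example): row wins 1 on a match, loses 1 otherwise.
payoff = [[1, -1], [-1, 1]]
col_mixed = [Fraction(3, 4), Fraction(1, 4)]
best, value = pure_best_response(payoff, col_mixed)
```

Any mixed row strategy earns a convex combination of the per-strategy values, so it cannot exceed the value returned here.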

When there are only two players, player 1 is called the row player and player 2 the column player. The corresponding payoff functions will be denoted M_row : A_row × A_col → R and M_col : A_row × A_col → R, respectively. For convenience, we will sometimes combine M_row and M_col into a single function M(·) = (M_row(·), M_col(·)). M_row, M_col, and M will be referred to as the payoff function for the row player, the payoff function for the column player, and simply the payoff function, respectively. When a two-person game satisfies M_row = −M_col, the game is said to be zero-sum. Whenever (α_row, α_col) is a mixed strategy Nash equilibrium in a two-person zero-sum game, we say that E[M_row(α_row, α_col)] is a value for that game under mixed strategies. The game value thus defined is known to be unique [OR94]. The set of all distributions over the pure strategies of the row player, or equivalently, the set of all mixed strategies of the row player, will be denoted P_row. P_col is defined similarly.

2.2 Randomized Circuits

Let o : {0, 1}^l → {0, 1}^m be a function. A randomized o-oracle circuit C is a collection of gates with ordered input and output pins, and directed wires going from output pins to input pins, together with a specification of several distinct ordered output pins p₁, . . . , p_n as the output bits of the whole circuit. Cycles are not permitted. That is, we cannot start from some gate, keep traveling along outgoing wires and finally reach the gate we started with.

The set of gates is G_inp ∪ G_rand ∪ G_std ∪ G_oracle, where G_inp = {ω₁, . . . , ω_t} is the set of input gates, G_rand the set of random gates, G_std the set of standard gates, and G_oracle the set of oracle gates. Each non-oracle gate has exactly one output pin. Each input pin should receive exactly one incoming wire. The indegree of a gate is the number of its incoming wires. Each input or random gate has indegree zero, each standard gate has indegree zero, one, or two, and each oracle gate has indegree l. The output pin of each standard gate of indegree zero is labeled either by the constant 0 or by the constant 1. The output pin of each standard gate of indegree one is labeled by the negation function NOT : {0, 1} → {0, 1}, which maps 0 to 1 and 1 to 0. The output pin of each standard gate of indegree two is labeled either by the conjunction AND : {0, 1}² → {0, 1} or by the disjunction OR : {0, 1}² → {0, 1}. The AND function outputs a 1 when its inputs are both 1 and a 0 otherwise. The OR function outputs a 0 when its inputs are both 0 and a 1 otherwise. Each oracle gate has m output pins labeled with functions pin₁ : {0, 1}^l → {0, 1}, . . . , pin_m : {0, 1}^l → {0, 1}. The function pin_i is the ith output bit of o.

The wires in C carry Boolean values 0 and 1. For any assignment of Boolean values to the input gates, we compute the outputs of C by first labeling independently and equiprobably a constant 0 or 1 for each output pin of a random gate. We then proceed by propagating values along the wires and computing the functions labeled with the respective pins until the output bits of C are obtained. The size of a randomized o-oracle circuit is defined to be the number of its gates. A randomized circuit is a randomized o-oracle circuit without oracle gates. A circuit is deterministic if it has no random gates.

A well-known counting technique due to Shannon can be adapted to bound the number of randomized o-oracle circuits of size s with n output bits.

Lemma 1. ([Sha49]) Fix a Boolean function o : {0, 1}^l → {0, 1}^m. For each s ∈ N⁺, there are fewer than 2^{O(ls log(sm) + n log(sm))} randomized o-oracle circuits of size s with n output bits.

Proof. The input gates are ω₁, . . . , ω_t, for some 1 ≤ t ≤ s. Each gate, except the input gates, is one of seven kinds. Each gate receives at most max(l, 2) incoming wires, each coming from some output pin of another gate. Finally, there are at most (sm)ⁿ ways to select the output bits in order. To sum up, the number of such circuits is at most s · 7^s · ((sm)^{max(l,2)})^s · (sm)ⁿ. A direct computation completes the proof.
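The "direct computation" amounts to taking logarithms of the product above. As a rough sketch (the function name and the sample parameters are illustrative assumptions, not part of the thesis), the base-2 logarithm of the count can be evaluated term by term, which makes the O(ls log(sm) + n log(sm)) shape of the exponent visible:

```python
import math

def log2_circuit_count_bound(s, l, m, n):
    """log2 of s * 7^s * ((s*m)^max(l,2))^s * (s*m)^n, the count from the proof."""
    log_sm = math.log2(s * m)
    return (
        math.log2(s)                  # choice of the number t of input gates
        + s * math.log2(7)            # each gate is one of seven kinds
        + max(l, 2) * s * log_sm      # source pin of every incoming wire
        + n * log_sm                  # ordered choice of the n output pins
    )

# Tiny sanity check against the exact integer product (hypothetical parameters).
exact = 4 * 7**4 * ((4 * 1) ** 2) ** 4 * (4 * 1) ** 1
approx = log2_circuit_count_bound(s=4, l=2, m=1, n=1)
```

The dominant terms are max(l, 2) · s · log(sm) and n · log(sm), matching the exponent stated in Lemma 1.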

We will need a well-known circuit called a multiplexer.

Lemma 2. ([KB04]) There is an O((m + n)2^m)-size deterministic circuit that, on Boolean input values b_{i,j}, 1 ≤ i ≤ 2^m, 1 ≤ j ≤ n and an integer 1 ≤ g ≤ 2^m in binary, outputs b_{g,j}, 1 ≤ j ≤ n.

Proof. For 1 ≤ i ≤ 2^m, an O(m)-size circuit computes whether g = i. Having computed whether g = i for 1 ≤ i ≤ 2^m, each output bit b_g,j can be computed by an O(2^m)-size deterministic circuit.
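The two steps of this proof can be mimicked in software. The sketch below is illustrative only: the function name is hypothetical, and it indexes rows from 0 rather than from 1 as in Lemma 2. It performs one O(m)-step equality test per row and then combines the rows with bitwise AND and OR, so the total work is O((m + n)2^m), mirroring the circuit size.

```python
def multiplexer(rows, g_bits):
    """Select rows[g] using per-row equality tests followed by AND/OR combination.

    rows   : list of 2^m rows, each a list of n bits
    g_bits : the m bits of the (0-based) index g, most significant bit first
    """
    m = len(g_bits)
    assert len(rows) == 2 ** m
    n = len(rows[0])
    out = [0] * n
    for i, row in enumerate(rows):
        i_bits = [(i >> (m - 1 - b)) & 1 for b in range(m)]
        # Equality test g == i: AND of bitwise XNORs, O(m) gates per row.
        eq = 1
        for gb, ib in zip(g_bits, i_bits):
            eq &= 1 - (gb ^ ib)
        # Each output bit is the OR over rows of (eq AND row bit).
        for j in range(n):
            out[j] |= eq & row[j]
    return out

rows = [[0, 0, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1]]
selected = multiplexer(rows, [1, 0])  # index 2 in binary
```

In the proofs that follow, the same selection step is what lets a combined circuit output one of T hardwired or computed n-bit strings.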

2.3 Tail Inequalities

We will need two results regarding tail probabilities. The first is the famous Chernoff bound.

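Fact 2. ([Che52]) Let X₁, X₂, . . . , X_n be independent 0-1 random variables such that for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i. Then, for any δ > 0,

\[ \Pr\left[ \frac{\sum_{i=1}^{n} X_i}{n} > (1+\delta)\,\frac{\sum_{i=1}^{n} p_i}{n} \right] < \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\sum_{i=1}^{n} p_i}. \]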

The next famous bound is Hoeffding’s inequality.

Fact 3. ([Hoe63]) Let X₁, X₂, . . . , X_n be n independent random variables with the same probability distribution, each ranging over the (real) interval [a, b], and let µ denote the expected value of each of these variables. Then, for every ε > 0,

\[ \Pr\left[ \left| \frac{\sum_{i=1}^{n} X_i}{n} - \mu \right| > \epsilon \right] < 2e^{-\frac{2\epsilon^2 n}{(b-a)^2}}. \]
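To get a feel for Fact 3, the bound can be compared with a simulation. This is only a sketch under assumed parameters (fair 0/1 coins, n = 100, ε = 0.1, a fixed seed); the function names are hypothetical and the simulation is not part of the thesis.

```python
import math
import random

def hoeffding_bound(n, eps, a, b):
    """Right-hand side of Fact 3: 2 * exp(-2 * eps^2 * n / (b - a)^2)."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * n / (b - a) ** 2)

def empirical_tail(n, eps, trials, rng):
    """Fraction of trials where the mean of n fair 0/1 coins is more than eps from 1/2."""
    bad = 0
    for _ in range(trials):
        mean = sum(rng.randint(0, 1) for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            bad += 1
    return bad / trials

rng = random.Random(0)  # fixed seed, for reproducibility of the illustration
bound = hoeffding_bound(n=100, eps=0.1, a=0.0, b=1.0)
observed = empirical_tail(n=100, eps=0.1, trials=2000, rng=rng)
```

The observed frequency should sit well below the bound, which is not tight for this distribution.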

2.4 Conventions and Assumptions

We consider families of two-person games in strategic form, parameterized by a positive integer n. In the game parameterized by n, each player is given a parameter 1ⁿ and has {0, 1}ⁿ as his set of pure strategies. For each player, we use either {0, 1}ⁿ or numbers 0 to 2ⁿ− 1 to represent his pure strategies.

The payoff functions M_row, M_col, and M for the game parameterized by n will be written as M_row⁽ⁿ⁾, M_col⁽ⁿ⁾, and M⁽ⁿ⁾, respectively. We assume ‖M_row⁽ⁿ⁾‖_∞ ≤ n^k and ‖M_col⁽ⁿ⁾‖_∞ ≤ n^k for some positive constant k, where for every real-valued, continuous, bounded function f, its supremum norm ‖f‖_∞ is the supremum of the image of |f| [Rud76]. Whenever M⁽ⁿ⁾ is used as an oracle, both its output components, M_row⁽ⁿ⁾ and M_col⁽ⁿ⁾, are in binary.

The players may be either computationally unbounded, time-bounded Turing machines, or polynomial-size (in n) randomized M⁽ⁿ⁾-oracle circuits.

Using the tableau method, randomized polynomial-size M⁽ⁿ⁾-oracle circuits can be simulated by randomized polynomial-time Turing machines with polynomial advice and an M⁽ⁿ⁾-oracle, and vice versa [Sip05]. The set of randomized M⁽ⁿ⁾-oracle circuits of size s with n output bits is denoted SIZE^{M⁽ⁿ⁾}_{s,n}, or simply SIZE^M_s when the value of n is clear from the text. Similarly, the set of randomized circuits (without oracle gates) of size s with n output bits is SIZE_{s,n}, or SIZE_s when the value of n is clear from the text. Throughout this thesis, we say that a (deterministic or nondeterministic) Turing machine runs in polynomial time (resp. logarithmic space) if its time (resp. space) complexity is polynomial (resp. logarithmic) in its input length, which is n in the game parameterized by n.

We abuse our notation slightly by using M_row⁽ⁿ⁾(R, C) to denote the payoff of the row player when the row player is R and the column player is C, and E[M_row⁽ⁿ⁾(R, C)] its expected value. The expectation is over the random coin flips of R and C. If C always uses pure strategy j, we also denote M_row⁽ⁿ⁾(R, C) by M_row⁽ⁿ⁾(R, j). The same convention is adopted for M_col⁽ⁿ⁾. Similarly, M⁽ⁿ⁾(R, C) denotes (M_row⁽ⁿ⁾(R, C), M_col⁽ⁿ⁾(R, C)), and M⁽ⁿ⁾(R, j) denotes (M_row⁽ⁿ⁾(R, j), M_col⁽ⁿ⁾(R, j)).

Chapter 3 Extraction of Small Numbers of Good Strategies

In this chapter, we will often encounter a function ε : N⁺ → (0, 1). For convenience, we will write ε(n) simply as ε when the value of n is clear from the context. We first prove the following theorem.

Theorem 1. Let k > 0, d ≥ 1, and c > d + k + 1 be constants. Consider a family of two-person strategic games parameterized by n ∈ N⁺. Let M⁽ⁿ⁾ : {0, 1}ⁿ × {0, 1}ⁿ → [−n^k, n^k]² be the payoff function for the game parameterized by n. Let ε : N⁺ → (0, 1) be a function. We assume that the binary representation of each number in the range of M_row⁽ⁿ⁾ and M_col⁽ⁿ⁾ is of length polynomial in n. For each n, there is a set S ⊆ {0, 1}ⁿ of size O((n^{k+d+1} log n)/ε) such that

\[ \min_{C \in \mathrm{SIZE}^{M}_{n^d}} \max_{i \in S} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)] > \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\epsilon)/\epsilon}} \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)] - \epsilon. \]

The symmetric statement, with the roles of players exchanged, holds.

Here is the interpretation of Theorem 1. Suppose the row player is informed of the mixed strategy of the column player. For each column player C, the row player chooses a best response and obtains an expected payoff of

\[ \max_{\alpha \in P_{\mathrm{row}}} \mathrm{E}[M_{\mathrm{row}}^{(n)}(\alpha, C)] = \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)]. \]

The equality is from Fact 1. The first expectation is over α and the coin flips of C and the second is over the coin flips of C. The expected payoff the row player could guarantee against all randomized M⁽ⁿ⁾-oracle circuits of size n^c log(1/ε)/ε is therefore

\[ \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\epsilon)/\epsilon}} \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)]. \]

Similarly,

\[ \min_{C \in \mathrm{SIZE}^{M}_{n^d}} \max_{i \in S} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)] \]

is the expected payoff the row player could guarantee by using only strategies in S against all randomized M⁽ⁿ⁾-oracle column players of size n^d, provided that the row player is informed of the mixed strategy of the column player.

For ε not too small, Theorem 1 shows that when the circuit size of the column player shrinks polynomially (from n^c log(1/ε)/ε to n^d), the set of strategies of the row player could shrink exponentially (from 2ⁿ to O((n^{k+d+1} log n)/ε)) without affecting the guaranteed expected payoff of the row player too much.

Proof of Theorem 1. We will always assume n is sufficiently large whenever needed since the requirement on the size of S contains a big-O and the theorem is therefore immediately true for small values of n.

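Denote

\[ t = \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\epsilon)/\epsilon}} \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)] \]

for convenience. We say that a pure row strategy i is good against a randomized n^d-size M⁽ⁿ⁾-oracle circuit C if E[M_row⁽ⁿ⁾(i, C)] > t − ε.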

S will be formed in stages. We keep a set of survivors which is initially the set of all randomized n^d-size M⁽ⁿ⁾-oracle circuits with n output bits. In each stage we put one pure row strategy into S and kill the survivors against which this strategy is good. We do so until there are no survivors left. If the number of stages is O((n^{k+d+1} log n)/ε), then we are done.

Let Survivors_i denote the set of circuits that have not been killed after stage i. Initially, Survivors₀ consists of all randomized n^d-size M⁽ⁿ⁾-oracle circuits with n output bits. According to Shannon’s counting argument (Lemma 1 with n^d assigned to s, 2n assigned to l and poly(n) assigned to m),

\[ |\mathrm{Survivors}_0| \le 2^{O(n^{d+1} \log n)}. \tag{3.1} \]

Given Survivors_i, we now show how to obtain Survivors_{i+1}. Let T be the smallest power of 2 not less than (n^{k+1} log n)/ε. Hence (n^{k+1} log n)/ε ≤ T ≤ 2(n^{k+1} log n)/ε. Consider any collection of T circuits in Survivors_i, possibly with repetitions: C₁, C₂, . . . , C_T. We construct a randomized M⁽ⁿ⁾-oracle circuit C that feeds independent random inputs to C₁, C₂, . . . , C_T and chooses equiprobably one of the n-bit outputs of C₁, C₂, . . . , C_T as the output of C. Summing up the sizes of C₁, C₂, . . . , C_T and including a multiplexer (Lemma 2 with T assigned to 2^m), we see that C is of size

\[ O\left( \frac{n^{k+1}\log n}{\epsilon} \cdot n^d + \frac{n^{k+1}\log n}{\epsilon} \cdot \left( \log\frac{n^{k+1}\log n}{\epsilon} + n \right) \right) < \frac{n^c \log(1/\epsilon)}{\epsilon} \tag{3.2} \]

for sufficiently large n. Inequality (3.2) and the definition of t show that there is an i^∗ with E[M_row⁽ⁿ⁾(i^∗, C)] ≥ t. This is equivalent to saying

\[ \frac{\mathrm{E}[M_{\mathrm{row}}^{(n)}(i^*, C_1)] + \cdots + \mathrm{E}[M_{\mathrm{row}}^{(n)}(i^*, C_T)]}{T} \ge t. \tag{3.3} \]

Let f be the fraction of values above t − ε among

E[M_row⁽ⁿ⁾(i^∗, C₁)], E[M_row⁽ⁿ⁾(i^∗, C₂)], . . . , E[M_row⁽ⁿ⁾(i^∗, C_T)].

Since ‖M_row⁽ⁿ⁾‖_∞ ≤ n^k, it is clear that

\[ \frac{\mathrm{E}[M_{\mathrm{row}}^{(n)}(i^*, C_1)] + \cdots + \mathrm{E}[M_{\mathrm{row}}^{(n)}(i^*, C_T)]}{T} \le f n^k + (1 - f)(t - \epsilon). \tag{3.4} \]

Inequalities (3.3) and (3.4) imply that t ≤ f n^k + (1 − f)(t − ε), or that f ≥ ε/(n^k − t + ε). This and the fact that ε ∈ (0, 1) and |t| ≤ ‖M_row⁽ⁿ⁾‖_∞ ≤ n^k result in

\[ f > \epsilon/(3n^k) \]

for sufficiently large n. That is, i^∗ is good against more than an ε/(3n^k) fraction of players among C₁, C₂, . . . , C_T for sufficiently large n.
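The passage from (3.3) and (3.4) to the lower bound on f is a short rearrangement. Spelled out as a sketch, using only |t| ≤ n^k and ε < 1 ≤ n^k:

```latex
\begin{align*}
t \le f n^k + (1-f)(t-\epsilon)
  &\iff \epsilon \le f\,(n^k - t + \epsilon) \\
  &\iff f \ge \frac{\epsilon}{n^k - t + \epsilon}, \\
\frac{\epsilon}{n^k - t + \epsilon}
  &\ge \frac{\epsilon}{2n^k + \epsilon}
   > \frac{\epsilon}{3n^k}
  \qquad\text{since } t \ge -n^k \text{ and } \epsilon < 1 \le n^k .
\end{align*}
```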

Next suppose we actually pick each of C₁, C₂, . . . , C_T independently and uniformly from Survivors_i. Fix arbitrarily, if any, a pure row strategy i′ good against less than an ε/(4n^k) fraction of Survivors_i. Let f_{i′} ε/(3n^k) be the fraction of players in Survivors_i against which i′ is good. By the choice of i′, we have f_{i′} < 3/4. The Chernoff bound (Fact 2 with X_j = 1 if i′ is good against C_j and 0 otherwise, p_j = f_{i′} ε/(3n^k), T assigned to n and −1 + 1/f_{i′} assigned to δ) gives

\[ \Pr\left[\, i' \text{ is good against more than an } \epsilon/(3n^k) \text{ fraction of } C_1, \ldots, C_T \,\right] < \left( f_{i'}\, e^{1 - f_{i'}} \right)^{T\epsilon/(3n^k)} \le e^{-\Omega(n \log n)}. \]

The probability is over the picking of C₁, . . . , C_T. The last inequality is true because the function x e^{1−x} has positive derivative on (0, 1) and (3/4)e^{1−3/4} < 1. By summing over i′ ∈ {0, 1}ⁿ, the probability is at most 2ⁿ e^{−Ω(n log n)} that some pure row strategy good against less than an ε/(4n^k) fraction of Survivors_i is good against more than an ε/(3n^k) fraction of randomly picked C₁, C₂, . . . , C_T. Since 2ⁿ e^{−Ω(n log n)} < 1 for sufficiently large n, a probabilistic argument shows that there is a choice of C₁, C₂, . . . , C_T such that every pure row strategy good against more than an ε/(3n^k) fraction of C₁, C₂, . . . , C_T must be good against at least an ε/(4n^k) fraction of Survivors_i. We have seen in the last paragraph that for this (in fact, every) choice of C₁, C₂, . . . , C_T, there is an i^∗ good against more than an ε/(3n^k) fraction of C₁, C₂, . . . , C_T. This i^∗ must therefore be good against at least an ε/(4n^k) fraction of Survivors_i. We add this i^∗ to S and obtain

\[ |\mathrm{Survivors}_{i+1}| \le \left(1 - \epsilon/(4n^k)\right) |\mathrm{Survivors}_i|. \]

From this and inequality (3.1), after O((n^{k+d+1} log n)/ε) stages the number of survivors will be less than one. At that time, there must be no survivors left.
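The stage count follows from the geometric decay of the survivor set together with (3.1). A sketch of the computation, using the standard inequality 1 − x ≤ e^{−x} and writing r for the number of stages:

```latex
\begin{align*}
|\mathrm{Survivors}_r|
  &\le \left(1 - \frac{\epsilon}{4n^k}\right)^{r} |\mathrm{Survivors}_0|
   \le \exp\!\left(-\frac{r\epsilon}{4n^k}\right) 2^{O(n^{d+1}\log n)} , \\
\text{so } |\mathrm{Survivors}_r| < 1
  &\text{ whenever } r > \frac{4n^k}{\epsilon}\,\ln 2 \cdot O(n^{d+1}\log n)
   = O\!\left(\frac{n^{k+d+1}\log n}{\epsilon}\right).
\end{align*}
```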

The same theorem holds for circuits without oracle gates.

Corollary 1. Let k > 0, d ≥ 1 and c > d + k + 1 be constants. Consider a family of two-person strategic games parameterized by n ∈ N⁺. Let M⁽ⁿ⁾ : {0, 1}ⁿ × {0, 1}ⁿ → [−n^k, n^k]² be the payoff function for the game. Let ε : N⁺ → (0, 1) be a function. We assume that the binary representation of each number in the range of M_row⁽ⁿ⁾ and M_col⁽ⁿ⁾ is of length polynomial in n. For each n, there is a set S ⊆ {0, 1}ⁿ of size O((n^{k+d+1} log n)/ε) such that

\[ \min_{C \in \mathrm{SIZE}_{n^d}} \max_{i \in S} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)] > \min_{C \in \mathrm{SIZE}_{n^c \log(1/\epsilon)/\epsilon}} \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)] - \epsilon. \]

The symmetric statement, with the roles of players exchanged, holds.

Proof. The proof is the same as that of Theorem 1 but with each occurrence of “randomized M⁽ⁿ⁾-oracle circuit” replaced with “randomized circuit.”

As has been pointed out in [FKS95], Newman’s result implies the following, which is relevant to our work [New91].

Lemma 3. ([New91]) Consider a family of two-person zero-sum strategic games parameterized by n ∈ N⁺. Let M_row⁽ⁿ⁾ : {0, 1}ⁿ × {0, 1}ⁿ → [−n^k, n^k] be the payoff function for the row player. Let v be the game value under mixed strategies for the game parameterized by n, and ε : N⁺ → (0, 1) be a function. For every T ≥ 2n^{2k+1}/ε² there is a multiset of T pure row strategies such that if the row player selects equiprobably one of these strategies, his expected payoff is at least v − ε against any mixed strategy in P_col adopted by the column player. Similarly, there is a multiset of T pure column strategies such that if the column player selects equiprobably one of these strategies, his expected payoff is at least −v − ε against any mixed strategy in P_row adopted by the row player.

Proof. Let T ≥ 2n^{2k+1}/ε². Let each of the independent random variables α₁, α₂, . . . , α_T be distributed identically as the mixed strategy of the row player in a mixed strategy Nash equilibrium. For an arbitrary pure column strategy j, we have E[M_row⁽ⁿ⁾(α_i, j)] ≥ v, i = 1, . . . , T, by the definition of the game value under mixed strategies. From Hoeffding’s inequality (Fact 3 with [a, b] = [−n^k, n^k], M_row⁽ⁿ⁾(α_i, j) assigned to X_i and T assigned to n), we have

\[ \Pr\left[ \frac{M_{\mathrm{row}}^{(n)}(\alpha_1, j) + \cdots + M_{\mathrm{row}}^{(n)}(\alpha_T, j)}{T} < v - \epsilon \right] < 2e^{-n}. \]

So with probability at most 2ⁿ · 2e⁻ⁿ < 1, there exists a j with

\[ \frac{M_{\mathrm{row}}^{(n)}(\alpha_1, j) + \cdots + M_{\mathrm{row}}^{(n)}(\alpha_T, j)}{T} < v - \epsilon. \]

Hence there exist pure strategies i₁, i₂, . . . , i_T for the row player with

\[ \frac{M_{\mathrm{row}}^{(n)}(i_1, j) + \cdots + M_{\mathrm{row}}^{(n)}(i_T, j)}{T} \ge v - \epsilon, \quad \forall j. \]

By selecting equiprobably from among i₁, i₂, . . . , i_T, the row player can guarantee an expected payoff of at least v − ε against any mixed strategy of the column player. The second part of this lemma can be proved by observing that by exchanging the roles of the two players, the game value becomes −v.
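The 2e⁻ⁿ tail in this proof comes from plugging the lemma's parameters into the exponent of Fact 3. A sketch of the arithmetic, with b − a = 2n^k and T ≥ 2n^{2k+1}/ε²:

```latex
\[
\frac{2\epsilon^2 T}{(b-a)^2}
  \ge \frac{2\epsilon^2 \cdot 2n^{2k+1}/\epsilon^2}{(2n^k)^2}
  = \frac{4n^{2k+1}}{4n^{2k}}
  = n ,
\qquad\text{hence}\qquad
2e^{-2\epsilon^2 T/(b-a)^2} \le 2e^{-n}.
\]
```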

We now consider the effects of Lemma 3 on Theorem 1.

Corollary 2. Let k > 0 and c > 2k + 2 be constants. Consider a family of two-person strategic games parameterized by n ∈ N⁺. Let M⁽ⁿ⁾ : {0, 1}ⁿ × {0, 1}ⁿ → [−n^k, n^k]² be the payoff function for the game G_n. Let ε : N⁺ → (0, 1) be a function. There is a multiset S ⊆ {0, 1}ⁿ of size O(n^{2k+1}/ε²) such that the row player R_S who plays each strategy in S equiprobably guarantees

\[ \min_{\beta \in P_{\mathrm{col}}} \mathrm{E}[M_{\mathrm{row}}^{(n)}(R_S, \beta)] > \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\epsilon)/\epsilon^2}} \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)] - \epsilon \]

for sufficiently large n.

Proof. We may define a zero-sum game G′_n with the payoff function −M_row⁽ⁿ⁾ for the column player and M_row⁽ⁿ⁾ for the row player. Let v′ be the value of G′_n under mixed strategies. Let T be the smallest power of 2 not less than 2n^{2k+1}/(ε/3)² = 18n^{2k+1}/ε². Using Lemma 3 on G′_n with ε/3 in place of ε, we see that there is a multiset S′ of size T such that, for the column player C_{S′} choosing equiprobably a strategy in S′,

\[ \forall \alpha \in P_{\mathrm{row}}, \quad \mathrm{E}[M_{\mathrm{row}}^{(n)}(\alpha, C_{S'})] \le v' + \epsilon/3. \tag{3.5} \]

The same holds in the original game G_n because the games G_n and G′_n adopt the same payoff function for the row player and C_{S′} makes no queries. C_{S′} could be implemented as a circuit by hardwiring S′, adding random gates, and including a multiplexer. From Lemma 2 with T assigned to 2^m, such a circuit is of size

\[ O\left( \frac{n^{2k+1}}{\epsilon^2} \cdot n + \frac{n^{2k+1}}{\epsilon^2} \cdot \left( \log\frac{n^{2k+1}}{\epsilon^2} + n \right) \right) < \frac{n^c \log(1/\epsilon)}{\epsilon^2}, \tag{3.6} \]

for sufficiently large n. Denote

\[ t = \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\epsilon)/\epsilon^2}} \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)]. \]

Inequalities (3.5), (3.6), and our definition of t give t ≤ v′ + ε/3 for sufficiently large n, or, equivalently,

\[ v' \ge t - \epsilon/3, \tag{3.7} \]

for sufficiently large n. Applying Lemma 3 to G′_n again with ε/3 in place of ε, we see that by selecting equiprobably from among some T strategies S, the row player can guarantee an expected payoff of at least v′ − ε/3 against every mixed strategy of the column player. Again, this must also be true in G_n. Inequality (3.7) implies v′ − ε/3 ≥ t − 2ε/3 > t − ε and completes the proof.

Since

\[ \forall \beta \in P_{\mathrm{col}}, \quad \mathrm{E}[M_{\mathrm{row}}^{(n)}(R_S, \beta)] \le \max_{i \in S} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, \beta)], \]

a direct computation on the sizes of S and the circuit sizes in Theorem 1 and Corollary 2 shows that Theorem 1 with d = k + 3 and ε > 1/n is implied by Corollary 2, and hence by Lemma 3. It is not known if Theorem 1 is implied by Lemma 3, however.

Chapter 4 Implications on the Power of Players

We use Theorem 1 to investigate the computational resource needed for a player to guarantee a good expected payoff against computationally-bounded players. A naive player performs exponentially (in n) many queries to M⁽ⁿ⁾ to determine an optimal strategy, pure or mixed, but we show how a player could run in less than exponential time without degrading his payoff too much.

Corollary 3. Let k > 0, d ≥ 1, c > k + d + 1, and 0 < ε < 1 be constants. Consider a family of two-person strategic games parameterized by a positive integer n. Let M⁽ⁿ⁾ : {0, 1}ⁿ × {0, 1}ⁿ → [−n^k, n^k]² be the payoff function. We assume that the binary representation of each number in the range of M_row⁽ⁿ⁾ and M_col⁽ⁿ⁾ is of length polynomial in n. Denote

\[ t = \min_{C \in \mathrm{SIZE}^{M}_{n^c \log(1/\epsilon)/\epsilon}} \max_{i \in \{0,1\}^n} \mathrm{E}[M_{\mathrm{row}}^{(n)}(i, C)]. \]

The following statements are true.

(i) If R is informed of the circuit computing C, then he needs only polynomial time with polynomial advice (in n), private fair coin flips, and an M⁽ⁿ⁾-oracle to guarantee that E[M_row⁽ⁿ⁾(R, C)] > t − ε for every randomized n^d-size M⁽ⁿ⁾-oracle C.

(ii) Assume R is allowed to produce his output after he sees the output strategy of C. If the language {i, j, k, n | the kth bit in the binary representation of M_row⁽ⁿ⁾(i, j) is a 1} is in NL/poly, then there is a nondeterministic logarithmic-space row player R with polynomial advice that always has a unique accepting branch, and E[M_row⁽ⁿ⁾(R, C)] > t − ε holds for every randomized n^d-size M⁽ⁿ⁾-oracle C. Here the output of R is defined to be the string it leaves on its output tape on the unique accepting branch.

(iii) If R is required to determine his output strategy simultaneously with C (which is unknown to R), and if he obtains an additional O(log n) bits of information about C, then he needs only deterministic polynomial time with polynomial advice (in n) to guarantee that E[M_row⁽ⁿ⁾(R, C)] > t − ε holds for every randomized n^d-size M⁽ⁿ⁾-oracle circuit C.

Here is the interpretation of Corollary 3. Assuming that R is informed of the mixed strategy (or the circuit) of the column player, the value t is the expected payoff R could guarantee against all randomized M⁽ⁿ⁾-oracle column players of size n^c log(1/ε)/ε. Item (i) states that with polynomial advice and oracle access to M⁽ⁿ⁾, a polynomial-time R suffices to guarantee a t − ε payoff against all randomized n^d-size M⁽ⁿ⁾-oracle column players. Item (ii) further assumes that R chooses his output strategy after the column player does, and M⁽ⁿ⁾ is computed by a nondeterministic logarithmic-space Turing machine with polynomial advice. Under these assumptions, the polynomial-time Turing machine R in item (i) can be further reduced to an unambiguous logarithmic-space one. The guaranteed expected payoff is still at least t − ε against randomized n^d-size M⁽ⁿ⁾-oracle column players. Item (iii) suggests that, even if R knows only some O(log n) bits of information about the column player (of size n^d), his guaranteed expected payoff is almost as if he is informed of the whole circuit of the column player (of size n^c log(1/ε)/ε).

Proof of Corollary 3. We have seen in Theorem 1 that there is a set S of size polynomial in n such that, for every randomized n^d-size M⁽ⁿ⁾-oracle column player C, there is a strategy i ∈ S with E[M_row⁽ⁿ⁾(i, C)] > t − ε/2. We give S as the advice to R in all the following cases.

Proof of (i). Let C be an arbitrary randomized M⁽ⁿ⁾-oracle circuit of size n^d with n output bits. Being informed of the circuit computing C, R can simulate C independently 50n^{2k+1}/ε² times and obtain outputs O_1, . . . , O_{50n^{2k+1}/ε²}. For each i ∈ S, he computes Mrow⁽ⁿ⁾(i, O_j) for 1 ≤ j ≤ 50n^{2k+1}/ε². Hoeffding's inequality (Fact 3 with Mrow⁽ⁿ⁾(i, O_j) assigned to X_j, 50n^{2k+1}/ε² assigned to n, ε/5 assigned to ε, and [a, b] = [−n^k, n^k]) yields

\[
\Pr\left[\,\left|\frac{\sum_{j=1}^{50n^{2k+1}/\epsilon^{2}} M_{\mathrm{row}}^{(n)}(i, O_j)}{50n^{2k+1}/\epsilon^{2}} - E[M_{\mathrm{row}}^{(n)}(i, C)]\right| > \frac{\epsilon}{5}\,\right] < 2e^{-n}, \tag{4.1}
\]

where the probability is over the coin tosses of R. For each i ∈ S, R estimates E[Mrow⁽ⁿ⁾(i, C)] as

\[
\tilde{E}[M_{\mathrm{row}}^{(n)}(i, C)] \equiv \frac{M_{\mathrm{row}}^{(n)}(i, O_1) + \cdots + M_{\mathrm{row}}^{(n)}(i, O_{50n^{2k+1}/\epsilon^{2}})}{50n^{2k+1}/\epsilon^{2}}.
\]
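As a sanity check on the bound 2e⁻ⁿ in inequality (4.1), the exponent can be worked out explicitly. The derivation below assumes that Fact 3 is Hoeffding's inequality in its standard two-sided form; the shorthand N for the number of samples and δ for the deviation is introduced here only for illustration and does not appear in the original text.

```latex
% Assumed standard form of Hoeffding's bound for N independent samples in [a, b]:
%   Pr[ |sample mean - expectation| > \delta ] <= 2 exp(-2 N \delta^2 / (b - a)^2)
\[
N = \frac{50\,n^{2k+1}}{\epsilon^{2}}, \qquad
\delta = \frac{\epsilon}{5}, \qquad
b - a = 2n^{k},
\]
\[
\frac{2N\delta^{2}}{(b-a)^{2}}
= \frac{2 \cdot 50\,n^{2k+1}}{\epsilon^{2}} \cdot \frac{\epsilon^{2}}{25} \cdot \frac{1}{4n^{2k}}
= \frac{100\,n^{2k+1}}{100\,n^{2k}}
= n .
\]
```

Under that assumption the right-hand side of Hoeffding's bound is 2e⁻ⁿ, which matches (4.1); the constant 50 is exactly what makes the exponent equal to n.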

Note that the random variable Ẽ[Mrow⁽ⁿ⁾(i, C)] depends solely on the random coin tosses of R, and not on those of C. Applying inequality (4.1) to each i ∈ S and taking a union bound, we see that

\[
\Pr\left[\,\exists i \in S,\ \left|\tilde{E}[M_{\mathrm{row}}^{(n)}(i, C)] - E[M_{\mathrm{row}}^{(n)}(i, C)]\right| > \frac{\epsilon}{5}\,\right] < |S| \cdot 2e^{-n}, \tag{4.2}
\]

where the probability is over the random coin tosses of R. The selection of S guarantees

\[
\max_{i \in S} E[M_{\mathrm{row}}^{(n)}(i, C)] > t - \frac{\epsilon}{2}. \tag{4.3}
\]

Consider the row player R that outputs an i ∈ S with the largest Ẽ[Mrow⁽ⁿ⁾(i, C)].

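Let i^∗ be the output of R. From inequality (4.2), with probability more than 1 − |S|2e⁻ⁿ,

\[
\begin{aligned}
E[M_{\mathrm{row}}^{(n)}(i^{*}, C)]
&\ge \tilde{E}[M_{\mathrm{row}}^{(n)}(i^{*}, C)] - \frac{\epsilon}{5} \\
&= \max_{i \in S} \tilde{E}[M_{\mathrm{row}}^{(n)}(i, C)] - \frac{\epsilon}{5} \\
&\ge \max_{i \in S} E[M_{\mathrm{row}}^{(n)}(i, C)] - \frac{2\epsilon}{5} \\
&> t - \frac{9\epsilon}{10}.
\end{aligned} \tag{4.4}
\]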

The last inequality is from inequality (4.3). Since R and C use independent coin tosses, and since ‖Mrow⁽ⁿ⁾‖ ≤ n^k,

\[
E[M_{\mathrm{row}}^{(n)}(R, C)]
> \left(1 - |S| \cdot 2e^{-n}\right)\left(t - \frac{9\epsilon}{10}\right) + |S| \cdot 2e^{-n}\left(-n^{k}\right)
> t - \epsilon
\]

for sufficiently large n, as required. For the smaller values of n for which the last inequality may fail, item (i) holds trivially because a polynomial-time Turing machine can try every strategy in {0, 1}ⁿ.
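The procedure in the proof of (i) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sample_count`, `best_response_by_sampling`, `payoff`, and `sample_column` are hypothetical, `payoff` stands in for oracle access to Mrow⁽ⁿ⁾, and `sample_column` stands in for one simulated run of the circuit C.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def sample_count(n: int, k: int, eps: float) -> int:
    """Number of simulations used in the proof: 50 * n^(2k+1) / eps^2, rounded up."""
    return math.ceil(50 * n ** (2 * k + 1) / eps ** 2)


def best_response_by_sampling(
    S: Sequence[int],
    payoff: Callable[[int, int], float],
    sample_column: Callable[[random.Random], int],
    num_samples: int,
    rng: random.Random,
) -> Tuple[int, float]:
    """Return the i in S with the largest empirical mean payoff, and that mean."""
    if not S:
        raise ValueError("S must be non-empty")
    # Simulate the column player once per sample; the same outputs O_1, ..., O_N
    # are reused for every candidate i in S, as in the proof.
    outputs = [sample_column(rng) for _ in range(num_samples)]
    best_i, best_est = S[0], float("-inf")
    for i in S:
        est = sum(payoff(i, o) for o in outputs) / num_samples
        if est > best_est:
            best_i, best_est = i, est
    return best_i, best_est


# Toy usage (hypothetical 2x2 game): the column player plays 0 with probability 0.8,
# and the row player earns +1 for matching the column player and -1 otherwise.
def toy_column(rng: random.Random) -> int:
    return 0 if rng.random() < 0.8 else 1


def toy_payoff(i: int, j: int) -> float:
    return 1.0 if i == j else -1.0


choice, estimate = best_response_by_sampling(
    [0, 1], toy_payoff, toy_column, 2000, random.Random(0)
)
```

The sketch covers only the estimation step; it does not model the advice string or the oracle-circuit formalism, and the toy game uses far fewer samples than `sample_count` would prescribe.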

Proof of (ii). In this case R himself evaluates Mrow⁽ⁿ⁾ on C’s output strategy and each strategy in S. He chooses the best strategy in S for output.

Since the Immerman-Szelepcsényi theorem [Imm88] can be directly extended to say that NL/poly = coNL/poly, our assumption implies that asking if the kth bit of Mrow⁽ⁿ⁾(i, j) is 0 is also in NL/poly. We will also use the fact that NL/poly = UL/poly from [RA97].

Let C's output strategy be j and denote S = {i₁, . . . , i_|S|}. The row player computes Mrow⁽ⁿ⁾(i_t, j) for t = 1, . . . , |S|. When computing Mrow⁽ⁿ⁾(i_t, j), he guesses its first bit using a nondeterministic branch and verifies the guess in unambiguous logarithmic space. This can be done since NL/poly = coNL/poly = UL/poly. He then does the same for the second bit, the third bit, and so on.

It is clear that only the branch that guesses every bit correctly survives; all other branches are rejected. In this manner R computes Mrow⁽ⁿ⁾(i_t, j), t = 1, . . . , |S|, one by one. Instead of saving all these values, which would take space polynomial in n, he need only store the best strategy he has evaluated so far (identified by its index t, which takes O(log n) bits). The corresponding Mrow⁽ⁿ⁾-value can be recomputed on the fly whenever it is needed. These observations imply that R runs in unambiguous logarithmic space (in n).

For an arbitrary randomized n^d-size M⁽ⁿ⁾-oracle player C, let i ∈ S be such that E[Mrow⁽ⁿ⁾(i, C)] > t − ε. Since R always chooses a strategy no worse than i, whatever strategy C takes, we must have E[Mrow⁽ⁿ⁾(R, C)] > t − ε.
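The bookkeeping in the proof of (ii), where R remembers only the best strategy seen so far and recomputes payoffs when needed, can be sketched as follows. The function name `pick_best_streaming` and the callable `payoff` (standing in for Mrow⁽ⁿ⁾) are hypothetical, and the sketch illustrates only the constant-memory scan, not the unambiguous logarithmic-space machine itself.

```python
from typing import Callable, Sequence


def pick_best_streaming(
    S: Sequence[int], j: int, payoff: Callable[[int, int], int]
) -> int:
    """Return the position in S of a best response to j, storing one index only."""
    best_t = 0
    for t in range(1, len(S)):
        # The incumbent's payoff is recomputed rather than cached, mirroring
        # the "on the fly" recomputation described in the proof.
        if payoff(S[t], j) > payoff(S[best_t], j):
            best_t = t
    return best_t


# Toy usage with a hypothetical payoff function i*j - i*i and column strategy j = 6.
toy_S = [1, 2, 3, 4]
best_position = pick_best_streaming(toy_S, 6, lambda i, j: i * j - i * i)
```

Recomputing the incumbent's payoff roughly doubles the number of payoff evaluations, which is the price of not keeping the polynomially long values in memory.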

Proof of (iii). In this case R just needs to know which i ∈ S satisfies E[Mrow⁽ⁿ⁾(i, C)] > t − ε. Since |S| is polynomial in n, the index of such an i in S takes O(log n) bits.
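To make the counting in the proof of (iii) concrete, the sketch below encodes the index of a strategy in S as a fixed-width bit string and decodes it again. The helper names `index_bits`, `encode_hint`, and `decode_hint` are hypothetical, and the set size 1000 is chosen only as an example of a polynomial bound (n³ with n = 10).

```python
from typing import Sequence


def index_bits(size_of_S: int) -> int:
    """Bits needed to name one element of a set of the given size (at least 1)."""
    return max(1, (size_of_S - 1).bit_length())


def encode_hint(index: int, width: int) -> str:
    """Write the index as a zero-padded binary string of the given width."""
    return format(index, "0{}b".format(width))


def decode_hint(bits: str, S: Sequence[int]) -> int:
    """Recover the strategy in S named by the bit string."""
    return S[int(bits, 2)]


# Toy usage: |S| = 1000, so ten bits suffice to point at any element of S.
toy_S = list(range(5000, 6000))
width = index_bits(len(toy_S))
hint = encode_hint(5, width)
strategy = decode_hint(hint, toy_S)
```

With |S| bounded by a polynomial n^a, the width is at most about a·log₂ n, which is the O(log n) bound used in the proof.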


Appendix A

Complexity Classes

Brief definitions of several complexity classes are given below. For detailed definitions, please refer to [AK].

1. P is the class of languages decidable in polynomial time.

2. P-hard is the class of languages to which every language in P is logarithmic-space reducible.

3. EXP is the class of languages decidable in exponential time.

4. EXP-hard is the class of languages to which every language in EXP is polynomial-time reducible.

5. EXP-complete = EXP ∩ EXP-hard.

6. promise-S^P₂ is the class of languages L such that there are disjoint sets Π⁺, Π⁻ with Π⁺ ⊆ L, Π⁻ ∩ L = ∅ and there is a polynomial-time computable predicate R(x, y, z) for |y| = |z| = poly(|x|) satisfying the following: ∀x ∈ Π⁺, ∃y∀zR(x, y, z) = 1 and ∀x ∈ Π⁻, ∃z∀yR(x, y, z) = 0.


7. RP is the class of languages L such that there is a nondeterministic polynomial-time Turing machine which on input x ∈ L accepts on at least 1/2 of its computation paths and on input x ∉ L rejects on every computation path.

8. coRP is the class of languages whose complement is in RP.

9. ZPP = RP ∩ coRP

10. NP is the class of languages L such that there is a nondeterministic polynomial-time Turing machine which on input x accepts on at least one computation path if and only if x ∈ L.

11. PPAD is the class of function problems of the following form. We are given a polynomial-time algorithm P that, on any input x, implicitly defines a directed graph G(x) with node set Σ^{p(|x|)} by outputting, for each y ∈ Σ^{p(|x|)}, the vertices pointing to or from y, where p is a polynomial and Σ is an alphabet of constant size. The graph G(x) is restricted to be one in which each vertex has indegree and outdegree at most one. Given a source of G(x) (i.e., a vertex with indegree zero), the problem is to find another source or a sink.

12. PPAD-complete is the class of function problems in PPAD reducible from every function problem in PPAD.

13. NL/poly is the class of languages L such that there is a nondeterministic logarithmic-space Turing machine and a sequence of polynomially long (in n) advice strings {a_n}_{n∈N} such that on inputs x and a_{|x|}, the Turing machine has an accepting computation path if and only if x ∈ L.

14. UL/poly is the same as NL/poly except that the logarithmic-space Turing machine must have at most one accepting computation path, whatever input it is given.

15. coNL/poly is the class of languages whose complement is in NL/poly.
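The search problem in item 11 (PPAD) can be illustrated with a small sketch that walks from a given source to a sink. The names `follow_path` and `neighbors` are hypothetical; `neighbors` plays the role of the algorithm P, returning the predecessor and successor of a node (or None when absent). The walk is not a polynomial-time algorithm in general, because G(x) may have exponentially many nodes.

```python
from typing import Callable, Dict, Optional, Tuple

Node = str
Neighbors = Tuple[Optional[Node], Optional[Node]]  # (predecessor, successor)


def follow_path(neighbors: Callable[[Node], Neighbors], source: Node) -> Node:
    """Walk forward from a source until a node with outdegree zero is reached."""
    pred, succ = neighbors(source)
    if pred is not None:
        raise ValueError("the starting node must have indegree zero")
    current = source
    while succ is not None:
        current = succ
        _, succ = neighbors(current)
    return current


# Toy graph on two-bit strings (hypothetical): 00 -> 01 -> 11, with 10 isolated.
toy_edges: Dict[Node, Neighbors] = {
    "00": (None, "01"),
    "01": ("00", "11"),
    "11": ("01", None),
    "10": (None, None),
}
sink = follow_path(lambda y: toy_edges[y], "00")
```

Because every vertex has indegree and outdegree at most one, the walk never revisits a node and always ends at a sink, which is why a solution is guaranteed to exist.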


Bibliography

[AK] S. Aaronson and G. Kuperberg, http://www.complexityzoo.com/.

[CD06] X. Chen and X. Deng, 2D-SPERNER is PPAD-complete, Submitted to STOC (2006).

[Che52] H. Chernoff, A measure of the asymptotic efficiency of tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics 23 (1952), 493–507.

[DGP05] C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou, The complexity of computing a Nash equilibrium, Tech. Report TR05-115, Electronic Colloquium on Computational Complexity, 2005.

[FIKU05] L. Fortnow, R. Impagliazzo, V. Kabanets, and C. Umans, On the complexity of succinct zero-sum games, Proceedings of the 20th IEEE Conference on Computational Complexity, 2005, pp. 323–332.

[FKS95] J. Feigenbaum, D. Koller, and P. Shor, A game-theoretic classification of interactive complexity classes, Proceedings of the 10th Annual IEEE Conference on Computational Complexity, 1995, pp. 227–237.


[FPS03] L. Fortnow, A. Pavan, and S. Sengupta, Proving SAT does not have small circuits with an application to the two queries problem, Proceedings of the 18th Annual IEEE Conference on Computational Complexity, 2003, pp. 347–357.

[GK92] M. Grigoriadis and L. Khachiyan, Approximating solution of matrix games in parallel, Advances in Optimization and Parallel Computing (Amsterdam) (P. Pardalos, ed.), Elsevier, 1992, pp. 129–136.

[GK95] M. Grigoriadis and L. Khachiyan, A sublinear-time randomized approximation algorithm for matrix games, Operations Research Letters 18 (1995), no. 2, 53–58.

[Hoe63] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (1963), no. 301, 13–30.

[Imm88] N. Immerman, Nondeterministic space is closed under complementation, SIAM Journal on Computing (1988), 935–938.

[KB04] R. H. Katz and G. Borriello, Contemporary logic design, 2nd ed., Prentice Hall, 2004.

[Kha79] L. G. Khachiyan, A polynomial algorithm in linear programming, Soviet Mathematics Doklady 20 (1979), 191–194.

[LN93] M. Luby and N. Nisan, A parallel approximation algorithm for positive linear programming, Proceedings of the 25th Annual ACM Symposium on Theory of Computing, 1993, pp. 448–457.


[Nas51] J. Nash, Noncooperative games, Annals of Mathematics 54 (1951), 289–295.

[New91] I. Newman, Private vs. common random bits in communication complexity, Information Processing Letters 39 (1991), 67–71.

[OR94] M. J. Osborne and A. Rubinstein, A course in game theory, MIT Press, 1994.

[Owe82] G. Owen, Game theory, Academic Press, 1982.

[PST95] S. Plotkin, D. Shmoys, and E. Tardos, Fast approximation algorithms for fractional packing and covering problems, Mathematics of Operations Research 20 (1995), no. 2, 257–301.

[RA97] Klaus Reinhardt and Eric Allender, Making nondeterminism unambiguous, Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science, 1997, pp. 244–253.

[Rud76] W. Rudin, Principles of mathematical analysis, 3rd ed., McGraw-Hill, 1976.

[Sha49] C. E. Shannon, Communication in the presence of noise, Proceedings of the IRE 37 (1949), 10–21.

[Sip05] M. Sipser, Introduction to the theory of computation, 2nd ed., Course Technology, 2005.
