
Diversified Strategies for Mitigating Adversarial Attacks in Multiagent Systems

Maria-Florina Balcan

Carnegie Mellon University ninamf@cs.cmu.edu

Avrim Blum

TTI-Chicago avrim@ttic.edu

Shang-Tse Chen

Georgia Institute of Technology schen351@gatech.edu

ABSTRACT

In this work we consider online decision-making in settings where players want to guard against possible adversarial attacks or other catastrophic failures. To address this, we propose a solution concept in which players have an additional constraint that at each time step they must play a diversified mixed strategy: one that does not put too much weight on any one action. This constraint is motivated by applications such as finance, routing, and resource allocation, where one would like to limit one's exposure to adversarial or catastrophic events while still performing well in typical cases. We explore properties of diversified strategies in both zero-sum and general-sum games, and provide algorithms for minimizing regret within the family of diversified strategies as well as methods for using taxes or fees to guide standard regret-minimizing players towards diversified strategies. We also analyze equilibria produced by diversified strategies in general-sum games. We show that surprisingly, requiring diversification can actually lead to higher-welfare equilibria, and give strong guarantees on both price of anarchy and the social welfare produced by regret-minimizing diversified agents. We additionally give algorithms for finding optimal diversified strategies in distributed settings where one must limit communication overhead.

KEYWORDS

Game theory; regret minimization; adversarial multiagent systems; diversified strategies; risk mitigation; general-sum games

ACM Reference Format:

Maria-Florina Balcan, Avrim Blum, and Shang-Tse Chen. 2018. Diversified Strategies for Mitigating Adversarial Attacks in Multiagent Systems. In Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), Stockholm, Sweden, July 10–15, 2018, IFAAMAS, 9 pages.

1 INTRODUCTION

A common piece of advice when one needs to make decisions in the face of unknown future events is "Don't put all your eggs in one basket." This is especially important when there can be an adversarial attack or catastrophic failure. We consider game-theoretic problems from this perspective, design online learning algorithms with good performance subject to such exposure-limiting constraints on behavior, and analyze the effects of these constraints on the expected value obtained (in zero-sum games) and the overall social welfare produced (in general-sum games).

As an example, consider a standard game-theoretic scenario: an agent must drive from point A to point B and has n different routes it


can take. We could model this as a game M where rows correspond to the n routes, columns correspond to m possible traffic patterns, and entry M(i, j) is the cost for using route i under traffic pattern j.

However, suppose the agent is carrying valuable documents and is concerned an adversary might try to steal them. In this case, to reduce the chance of this happening, we might require that no route have more than (say) 10% probability. The agent then wants to minimize expected travel time subject to this requirement. Or in an investment scenario, if rows correspond to different investments and columns to possible market conditions, we might have an additional worry that perhaps one of the investment choices is run by a crook. In this case, we may wish to restrict the strategy space to allocations of funds that are not too concentrated.

To address such scenarios, for ϵ ∈ [1/n, 1] let us define a probability distribution (or allocation) P to be ϵ-diversified if P(i) ≤ 1/(ϵn) for all i. For example, for ϵ = 1/n this is no restriction at all, for ϵ = 1 this requires the uniform distribution, and for intermediate values of ϵ this requires an intermediate level of diversification. We then explore properties of such diversified strategies in both zero-sum and general-sum games as well as give algorithmic guarantees.

For zero-sum games, define v_ϵ to be the minimax-optimal value of the game in which the row player is restricted to playing ϵ-diversified mixed strategies. Natural questions we address are: Can one design adaptive learning algorithms that maintain ϵ-diversified distributions and minimize regret within this class, so they never perform much worse than v_ϵ? Can a central authority "nudge" a generic non-diversified regret-minimizer into using diversified strategies via fines or taxes (extra loss vectors strategically placed into the event stream) such that it maintains low regret over the original sequence? And for reasonable games, how much worse is v_ϵ compared to the non-diversified minimax value v? We also consider a dual problem of producing a strategy Q for the column player that achieves value v_ϵ against all but an ϵ fraction of the rows (which an adversary can then aim to attack).

One might ask why not model such an adversary directly within the game, via additional columns that each give a large loss to one of the rows. The main reason is that these would then dominate the minimax value of the game. (And they either would not have values within the usual [0, 1] range assumed by regret-minimizing learners, or, if they were scaled to lie in this range, they would cause all other events to seem roughly the same.) Instead, we want to consider learning algorithms that optimize for more common events, while keeping to the constraint of maintaining diversified strategies. We also remark that one could make diversification a soft constraint by adding a loss term for not diversifying.

We next consider general-sum games, such as routing games and atomic congestion games, in which k players interact in ways that lead to various costs being incurred by each player. We show


that surprisingly, requiring a player to use diversified strategies can actually improve its performance in equilibria in such games.

We then study the ϵ-diversified price of anarchy: the ratio of the social cost of the worst equilibrium subject to all players being ϵ-diversified to the social cost of the socially-best set of ϵ-diversified strategies. We show that in some natural games, even requiring a small amount of diversification can dramatically improve the price of anarchy of the game, though we show there also exist games where diversification can make the price of anarchy worse. We also bring several threads of this investigation together by showing that for the class of smooth games defined by Roughgarden [25], for any diversification parameter ϵ ∈ [1/n, 1], the ϵ-diversified price of anarchy is no worse than the smoothness of the game, and moreover, players using diversified regret-minimizing strategies will indeed approach this bound. Thus, we get strong guarantees on the quality of interactions produced by self-interested diversified play. Finally, we consider how much diversification can hurt optimal play, showing that in random unit-demand congestion games, diversification indeed incurs a low penalty.

Lastly, we consider an information-limited, distributed, "big-data" setting in which the number of rows and columns of the matrix M is very large and we do not have it explicitly. Specifically, we assume the n rows are distributed among r processors, and the only access to the matrix M we have is via an oracle for the column player that takes in a sample of rows and outputs the column player's best response. What we show is how in such a setting to produce near-optimal strategies for each player in the sense described above, from very limited communication among processors.

In addition to our theoretical results, we also present experimental simulations for both zero-sum and general-sum games.

1.1 Related Work

There has been substantial work on design of "no-regret" learning algorithms for repeated play of zero-sum games [7, 10, 15].

Multiplicative Weight Update methods [2, 21] are a specific type of no-regret algorithm that have received considerable attention in game theory [15, 16], machine learning [11, 15], and many other research areas [1, 20], due to their simplicity and elegance.

We consider the additional constraint that players play diversified mixed strategies, motivated by the goal of reducing exposure to adversarial attacks. The concept of diversified strategies, sometimes called "smooth distributions", appears in a range of different areas [12, 17, 20]. [9] considers a somewhat related notion where there is a penalty for deviation from a given fixed strategy, and shows existence of equilibria in such games. Also related is work on adversarial machine learning, e.g., [13, 18, 27]; however, in this work we are instead focused on decision-making scenarios.

Our distributed algorithm is inspired by prior work in distributed machine learning [4, 11, 14], where the key idea is to perform weight updates in a communication-efficient way. Other work on the impact of adversaries in general-sum games appears in [3, 5, 6].

2 ZERO-SUM GAMES

We begin by studying two-player zero-sum games. Recall that a two-player zero-sum game is defined by an n × m matrix M. In each round of the game, the row player chooses a distribution P over the rows of M, and the column player chooses a distribution Q over the columns of M.

Algorithm 1 Multiplicative Weights Update algorithm with Restricted Distributions

Initialization: Fix a γ ≤ 1/2. Set P^(1) to be the uniform distribution.

for t = 1, 2, ..., T do

(1) Choose distribution P^(t)

(2) Receive the pure strategy j_t for the column player

(3) Compute the multiplicative update rule

P̂_i^(t+1) = P_i^(t) (1 − γ)^{M(i, j_t)} / Z^(t),

where Z^(t) = Σ_i P_i^(t) (1 − γ)^{M(i, j_t)} is the normalization factor.

(4) Project P̂^(t+1) into P_ϵ:

P^(t+1) = argmin_{P ∈ P_ϵ} RE(P ∥ P̂^(t+1))

end for
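To make Algorithm 1 concrete, here is a minimal Python sketch (the function names `mw_restricted` and `project_capped` are ours, not the paper's). The relative-entropy projection onto P_ϵ admits a simple cap-and-rescale routine: repeatedly cap every coordinate above 1/(ϵn) and rescale the remaining mass; one can check via the KKT conditions that this computes argmin_{P ∈ P_ϵ} RE(P ∥ P̂).

```python
import numpy as np

def project_capped(p, cap):
    """Relative-entropy projection of p onto {q : q_i <= cap, sum_i q_i = 1}:
    cap the violating coordinates and rescale the rest, repeating until no
    new coordinate exceeds the cap."""
    p = p.copy()
    capped = np.zeros(len(p), dtype=bool)
    while (viol := (p > cap + 1e-12) & ~capped).any():
        capped |= viol
        p[capped] = cap
        p[~capped] *= (1.0 - cap * capped.sum()) / p[~capped].sum()
    return p

def mw_restricted(loss_columns, eps, gamma=0.2):
    """Algorithm 1: multiplicative weights over eps-diversified strategies.
    loss_columns[t] is the loss vector (M(1, j_t), ..., M(n, j_t)) in [0, 1]^n."""
    n = len(loss_columns[0])
    cap = 1.0 / (eps * n)
    P = np.full(n, 1.0 / n)           # P^(1): uniform
    played = []
    for loss in loss_columns:
        played.append(P)              # step (1): play P^(t)
        P = P * (1.0 - gamma) ** np.asarray(loss)   # step (3): update
        P = P / P.sum()               # divide by Z^(t)
        P = project_capped(P, cap)    # step (4): project into P_eps
    return played
```

Against a best-response column player one would feed in `loss_columns[t] = M[:, j_t]` for the observed columns j_t.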

The expected loss of the row player is

M(P, Q) = P^T M Q = Σ_{i,j} P(i) M(i, j) Q(j),

where M(i, j) ∈ [0, 1] is the loss suffered by the row player if the row player plays row i and the column player plays column j. The goal of the row player is to minimize its loss, and the goal of the column player is to maximize this loss. The minimax value v of the game is:

v = min_P max_Q M(P, Q) = max_Q min_P M(P, Q).

2.1 Multiplicative Weights and Diversified Strategies

We now consider row players restricted to only playing diversified distributions, defined as follows.

Definition 2.1. A distribution p ∈ ∆_n is called ϵ-diversified if max_i p_i ≤ 1/(ϵn).

Let P_ϵ be the set of all ϵ-diversified distributions, and let v_ϵ be the minimax value of the game subject to the row player being restricted to playing in P_ϵ. Note that the range of ϵ is between 1/n and 1. It is easy to verify that P_ϵ is a convex set. As a result, the minimax theorem applies to P_ϵ [26], and we call the minimax value v_ϵ:

v_ϵ = min_{P ∈ P_ϵ} max_Q M(P, Q) = max_Q min_{P ∈ P_ϵ} M(P, Q).

The multiplicative weights update algorithm [19, 21] can be naturally adapted to maintain diversified strategies by projecting its distributions into the class P_ϵ if they ever step outside of it. This is shown in Algorithm 1. By adapting the analysis of [19] to this case, we arrive at the following regret bound.

Theorem 2.2. For any 0 < γ ≤ 1/2 and any positive integer T, Algorithm 1 generates distributions P^(1), ..., P^(T) ∈ P_ϵ to responses j_1, ..., j_T, such that for any P ∈ P_ϵ,

Σ_{t=1}^T M(P^(t), j_t) ≤ (1 + γ) Σ_{t=1}^T M(P, j_t) + RE(P ∥ P^(1))/γ,

where RE(p ∥ q) = Σ_i p_i ln(p_i/q_i) is the relative entropy.


By combining Algorithm 1 with a best-response oracle for the column player, and applying Theorem 2.2 and a standard argument [2, 16] we have:

Theorem 2.3. Running Algorithm 1 for T steps against a best-response oracle, one can construct mixed strategies P̄ and Q̄ s.t.

max_Q M(P̄, Q) ≤ v_ϵ + ∆_T and min_{P ∈ P_ϵ} M(P, Q̄) ≥ v_ϵ − ∆_T,

for ∆_T = 2√(ln(1/ϵ)/T), where P̄ = (1/T) Σ_{t=1}^T P^(t) and Q̄ = (1/T) Σ_{t=1}^T j_t.

Proof. We can sandwich the desired inequalities inside a proof of the minimax theorem as follows:

min_{P ∈ P_ϵ} max_Q M(P, Q) ≤ max_Q M(P̄, Q) = max_Q (1/T) Σ_{t=1}^T M(P^(t), Q)
≤ (1/T) Σ_{t=1}^T max_Q M(P^(t), Q) = (1/T) Σ_{t=1}^T M(P^(t), j_t)
≤ min_{P ∈ P_ϵ} [((1 + γ)/T) Σ_{t=1}^T M(P, j_t)] + ln(1/ϵ)/(γT)
≤ min_{P ∈ P_ϵ} M(P, Q̄) + γ + ln(1/ϵ)/(γT)
≤ max_Q min_{P ∈ P_ϵ} M(P, Q) + γ + ln(1/ϵ)/(γT).

If we set γ = √(ln(1/ϵ)/T), then ∆_T = γ + ln(1/ϵ)/(γT) = 2√(ln(1/ϵ)/T). The two inequalities in the theorem follow by skipping the first and the last inequalities from the proof above, respectively. □

The next theorem shows that the distribution Q̄ in Theorem 2.3 is also a good mixed strategy for the column player against any row-player strategy if we remove a small fraction of the rows.

Theorem 2.4. By running Algorithm 1 for T steps against a best-response oracle, we can construct a mixed strategy Q̄ such that for all but an ϵ fraction of the rows i, M(i, Q̄) ≥ v_ϵ − γ. Moreover, we can do this with at most T = O(ln(1/ϵ)/(γ²(1 + γ − v_ϵ))) oracle calls.

Proof. We generate distributions P^(1), ..., P^(T) ∈ P_ϵ by using Algorithm 1. Let j_t be the column returned by the oracle with the input P^(t). After T = ⌈ln(1/ϵ)/(γ²(1 + γ − v_ϵ))⌉ + 1 rounds, we set the mixed strategy Q̄ = (1/T) Σ_{t=1}^T j_t. Set E = {i | M(i, Q̄) < v_ϵ − γ}. Suppose for contradiction that |E| ≥ ϵn. Let P = u_E, the uniform distribution on E and 0 elsewhere. It is easy to see that u_E ∈ P_ϵ, since |E| ≥ ϵn.

By the assumption of the oracle, we have v_ϵ T ≤ Σ_{t=1}^T M(P^(t), j_t). In addition, by Theorem 2.2, we have

Σ_{t=1}^T M(P^(t), j_t) ≤ (1 + γ) Σ_{t=1}^T M(P, j_t) + RE(P ∥ P^(1))/γ.

For any i ∈ E, Σ_{t=1}^T M(i, j_t) = T · M(i, Q̄) < (v_ϵ − γ)T. Since P is the uniform distribution on E, we have Σ_{t=1}^T M(P, j_t) < (v_ϵ − γ)T. Furthermore, since |E| ≥ ϵn, we have

RE(P ∥ P^(1)) = RE(u_E ∥ u) ≤ ln(1/ϵ).

Putting these facts together, we get v_ϵ T ≤ (1 + γ)(v_ϵ − γ)T + ln(1/ϵ)/γ, which implies T ≤ ln(1/ϵ)/(γ²(1 + γ − v_ϵ)), a contradiction. □

Algorithm 2 Multiplicative Weights Update algorithm with Interventions

Initialization: Fix a γ ≤ 1/2. Set P^(1) to be the uniform distribution.

for t = 1, 2, ..., T do

(1) Choose distribution P^(t)

(2) Receive the pure strategy j_t for the column player

(3) Compute the multiplicative update rule

P_i^(t+1) = P_i^(t) (1 − γ)^{M(i, j_t)} / Z^(t),

where Z^(t) = Σ_i P_i^(t) (1 − γ)^{M(i, j_t)} is the normalization factor.

(4) While P^(t+1) is not (1 − γ)ϵ-diversified, run the multiplicative update (Step 3) on the fake loss vector ℓ defined as:

ℓ_i = 1 if P_i^(t+1) > 1/((1 − γ)ϵn), and ℓ_i = 0 otherwise.

end for

2.2 Diversifying Dynamics

Theorem 2.2 shows that it is possible for a player to maintain an ϵ-diversified distribution at all times while achieving low regret with respect to the entire family P_ϵ of ϵ-diversified distributions. However, suppose a player, who is allocating an investment portfolio among n investments, does not recognize the need for maintaining a diversified distribution and simply uses the standard multiplicative-weights algorithm to minimize regret. For example, the player might not realize that the matrix M only represents "typical" behavior of investments, and that a crooked portfolio manager or clever hacker could cause an entire investment to be wiped out. This player might quickly reach a dangerous non-diversified portfolio in which nearly all of its weight is just on one row.

Suppose, however, that an investment advisor or helpful authority has the ability to charge fees on actions whose weights are too high, which can be viewed as inserting fake loss vectors into the stream of loss vectors observed by the player's algorithm. We show here that by doing so in an appropriate manner, this advisor or authority can ensure that the player both (a) maintains diversified distributions, and (b) incurs low regret with respect to the family P_ϵ over the sequence of real loss vectors. Viewed another way, this can be thought of as an alternative to Algorithm 1 with slightly weaker guarantees but that does not require the projection. The algorithm remains efficient.

Theorem 2.5. Algorithm 2 generates distributions P^(1), ..., P^(T) such that

(a) P^(t) ∈ P_{(1−γ)ϵ} for all t, and

(b) for any P ∈ P_ϵ we have

Σ_{t=1}^T M(P^(t), j_t) ≤ (1 + γ) Σ_{t=1}^T M(P, j_t) + RE(P ∥ P^(1))/γ.


Proof. For part (a) we just need to show that the while loop in Step 4 of the algorithm halts after a finite number of iterations. To show this, we show that each time a fake loss vector is applied, the gap between the maximum and minimum total losses (including both actual losses and fake losses) over the rows i is reduced. In particular, the multiplicative-weights algorithm has the property that the probability on an action i is proportional to (1 − γ)^{L_i^{total}}, where L_i^{total} is the total loss (actual plus fake) on action i so far; so, the actions of highest probability are also the actions of lowest total loss. This means that in Step 4, there exists some threshold τ such that ℓ_i = 1 for all i of total loss at most τ and ℓ_i = 0 for all i of total loss greater than τ. Since we are adding 1 to those actions of total loss at most τ, this means that the gap between the maximum and minimum total loss over all the actions is decreasing, so long as that gap was greater than 1. However, note that if P^(t+1) is not (1 − γ)ϵ-diversified then the gap between maximum and minimum total loss must be greater than 1, by definition of the update rule and using the fact that ϵ ≤ 1. Therefore, the gap between maximum and minimum total loss is strictly reduced on each iteration (and reduced by at least 1 if any row is ever updated twice) until P^(t+1) becomes (1 − γ)ϵ-diversified.

For part (b), define L_alg^actual = Σ_{t=1}^T M(P^(t), j_t) to be the actual loss of the algorithm and define L_P^actual = Σ_{t=1}^T M(P, j_t) to be the actual loss of some ϵ-diversified distribution P. We wish to show that L_alg^actual is not too much larger than L_P^actual. To do so, we begin with the fact that, by the usual multiplicative weights analysis, the algorithm has low regret with respect to any fixed strategy over the entire sequence of loss vectors (actual and fake).

Say the algorithm's total loss is L_alg^total = L_alg^actual + L_alg^fake and the total loss of P is L_P^total = L_P^actual + L_P^fake. We know that

L_alg^total ≤ (1 + γ) L_P^total + RE(P ∥ P^(1))/γ,

which we can rewrite as:

L_alg^actual + L_alg^fake ≤ (1 + γ) L_P^actual + (1 + γ) L_P^fake + RE(P ∥ P^(1))/γ.

Thus, to prove part (b) it suffices to show that L_alg^fake ≥ (1 + γ) L_P^fake. But notice that on each fake loss vector, for each index i such that ℓ_i = 1, the algorithm has strictly more than 1/((1 − γ)ϵn) ≥ (1 + γ)/(ϵn) probability mass on row i (using 1/(1 − γ) ≥ 1 + γ). In contrast, P has at most 1/(ϵn) probability mass on row i, since P is ϵ-diversified. Therefore L_alg^fake ≥ (1 + γ) L_P^fake and the proof is complete. □

This analysis can be extended to the case of an advisor who only periodically monitors the player's strategy. If the advisor monitors the strategy every k steps, then in the meantime the maximum probability that any row i can reach is 1/((1 − γ)^k ϵn). So, part (a) of Theorem 2.5 would need to be relaxed to P^(t) ∈ P_{(1−γ)^k ϵ}. However, part (b) of Theorem 2.5 holds as is.
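The intervention loop of Algorithm 2 is equally short to sketch (again in Python, with our own naming; the advisor's fees appear as the fake unit-loss vectors of Step 4). Theorem 2.5(a) is what guarantees that the inner while loop terminates.

```python
import numpy as np

def mw_with_interventions(loss_columns, eps, gamma=0.2):
    """Algorithm 2 sketch: standard multiplicative weights plus fake unit
    losses ('fees') on over-weighted rows; no projection step is needed."""
    n = len(loss_columns[0])
    cap = 1.0 / ((1.0 - gamma) * eps * n)   # (1-gamma)eps-diversified threshold
    P = np.full(n, 1.0 / n)                 # P^(1): uniform
    played = []
    for loss in loss_columns:
        played.append(P)
        P = P * (1.0 - gamma) ** np.asarray(loss)   # real update (step 3)
        P = P / P.sum()
        while (P > cap).any():              # step (4): intervene
            fake = (P > cap).astype(float)  # l_i = 1 on over-weighted rows
            P = P * (1.0 - gamma) ** fake   # same update rule, fake loss
            P = P / P.sum()
    return played
```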

2.3 How close is v_ϵ to v?

Restricting the row player to play ϵ-diversified strategies can of course increase its minimax loss, i.e., v_ϵ ≥ v. In fact, it is not hard to give examples of games where the gap is quite large. For example, suppose the row player has one action that always incurs loss 0, and the remaining n − 1 actions always incur loss 1 (whatever the column player does). Then v = 0 but for ϵ ∈ [1/n, 1], v_ϵ = 1 − 1/(ϵn).

However, we show here that for random matrices M, the gap between the two is quite small. I.e., the additional loss incurred due to requiring diversification is low. A related result, in a somewhat different model, appears in [22].

Theorem 2.6. Consider a random n × n game M where each entry M(i, j) is drawn i.i.d. from some distribution D over [0, 1]. With probability ≥ 1 − 1/n, for any ϵ ≤ 1, we have v_ϵ − v = O(√(log n / n)).

Proof. Let µ = E_{x∼D}[x] be the mean of distribution D. We will show that v and v_ϵ are both close to µ. To argue this, we will examine the value of the uniform distribution P_unif for the row player, and the value of the uniform distribution Q_unif for the column player. In particular, notice that v ≥ min_i M(i, Q_unif) because Q_unif is just one possible strategy for the column player, and by definition, v = min_i M(i, Q*) where Q* is the minimax optimal strategy for the column player, and the row player's loss under Q* is greater than or equal to the row player's loss under Q_unif since the column player is trying to maximize the row player's loss. Similarly, v_ϵ ≤ max_j M(P_unif, j) since P_unif is just one possible diversified strategy for the row player, and by definition v_ϵ = max_j M(P*, j) where P* is the minimax optimal diversified strategy for the row player and the row player is trying to minimize loss. So, we have

min_i M(i, Q_unif) ≤ v ≤ v_ϵ ≤ max_j M(P_unif, j).

Thus, if we can show that with high probability min_i M(i, Q_unif) and max_j M(P_unif, j) are both close to µ, then this will imply that v and v_ϵ are close to each other.

Let us begin with P_unif. Notice that M(P_unif, j) is just the average of the entries in the j-th column. So, by Hoeffding bounds, there exists a constant c such that for any given column j,

Pr[M(P_unif, j) > µ + c√(log n / n)] ≤ 1/(2n²),

where the probability is over the random draw of M. By the union bound, with probability at least 1 − 1/(2n), this inequality holds simultaneously for all columns j. Since P_unif is ϵ-diversified, as noted above this implies that v_ϵ ≤ µ + c√(log n / n) with probability at least 1 − 1/(2n).

On the other hand, by the same reasoning, with probability at least 1 − 1/(2n) the uniform distribution Q_unif for the column player has the property that for all rows i, M(i, Q_unif) ≥ µ − c√(log n / n). This implies as noted above that v ≥ µ − c√(log n / n). Therefore, with probability at least 1 − 1/n, v_ϵ − v ≤ 2c√(log n / n) as desired. □
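The sandwich min_i M(i, Q_unif) ≤ v ≤ v_ϵ ≤ max_j M(P_unif, j) used in the proof is easy to probe numerically. The following sketch (ours, for illustration only) estimates the two outer quantities for random uniform-[0, 1] matrices; the gap it prints shrinks at the √(log n / n) rate of Theorem 2.6.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    M = rng.random((n, n))            # entries i.i.d. from D = Uniform[0, 1]
    lower = M.mean(axis=1).min()      # min_i M(i, Q_unif) <= v
    upper = M.mean(axis=0).max()      # max_j M(P_unif, j) >= v_eps
    print(n, round(upper - lower, 4), round(2 * np.sqrt(np.log(n) / n), 4))
```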

3 GENERAL-SUM GAMES

We now consider k-player general-sum games. Instead of minimax optimality, the natural solution concept now is a Nash equilibrium.

We begin by showing that unlike zero-sum games, it is now possible for the payoff of a player at equilibrium to actually be improved by requiring it to play a diversified strategy. This is a bit peculiar because constraining a player is actually helping it.


We then consider the relationship between the social cost at equilibrium and the optimal social cost, when all players are required to use diversified strategies. We call the ratio of these two quantities the diversified price of anarchy of the game, in analogy to the usual price of anarchy notion when there is no diversification constraint. We show that in some natural games, even requiring a small amount of diversification can significantly improve the price of anarchy of the game, though there also exist games where diversification can make the price of anarchy worse. Finally, we bring several threads of this investigation together by showing that for the class of smooth games defined by Roughgarden [25], for any diversification parameter ϵ ∈ [1/n, 1], the ϵ-diversified price of anarchy is no worse than the smoothness of the game, and moreover that players using diversified regret-minimizing strategies (such as those in Sections 2.1 and 2.2) will indeed approach this bound.

3.1 The Benefits of Diversification

First, let us formally define the notion of a Nash equilibrium subject to a (convex) constraint C, where C could be a constraint such as "the row player must use an ϵ-diversified strategy".

Definition 3.1. A set of mixed strategies (P_1, ..., P_k) is a Nash equilibrium subject to constraint C if no player can unilaterally deviate to improve its payoff without violating constraint C. We will just call this a Nash equilibrium when C is clear from context.

We now consider the case of k = 2 players, and examine how requiring the row player to diversify can affect its payoff at equilibrium. For zero-sum games, the value v_ϵ was always no better than the minimax value v of the game, since constraining the row player can never help it. We show here that this is not the case for general-sum games: requiring a player to use a diversified strategy can in some games improve its payoff at equilibrium.

Theorem 3.2. There exist 2-player general-sum games for which a diversification constraint on the row player lowers the row player’s payoff at equilibrium, and games for which such a constraint increases the row player’s payoff at equilibrium.

Proof. Consider the following two bimatrix games (entries here represent payoffs rather than losses):

Game A:
  (2, 2)  (1, 1)
  (1, 1)  (0, 0)

Game B:
  (1, 1)  (3, 0)
  (0, 0)  (1, 3)

In Game A, the unique Nash equilibrium has payoff of 2 to each player, and requiring the row player to be diversified strictly lowers both players' payoffs. On the other hand, diversification helps the row player in Game B. Without a diversification constraint, in Game B the row player will play the top row and the column player will therefore play the left column, giving both players a payoff of 1. However, requiring the row player to put probability 1/2 on each row will cause the column player to choose the right column, giving the row player a payoff of 2 and the column player a payoff of 1.5. □

Routing games [24] are an interesting class of many-player games where requiring all players to diversify can actually improve the quality of the equilibrium for everyone. An example is Braess' paradox [8], shown in Figure 1. In this example, k players need to travel from s to t and wish to take the cheapest route.

Figure 1: Braess' paradox. Here, k players wish to travel from s to t, and requiring all players to use diversified strategies improves the quality of the equilibrium for everyone. (Edges s–a and b–t have cost k_{s,a}/k and k_{b,t}/k respectively; edges a–t and s–b have cost 1; edge a–b has cost 0.)

Edge costs are given in the figure, where k_e is the number of players using edge e. At Nash equilibrium, all players choose the route s-a-b-t and incur a cost of 2. However, if they must put equal probability on the three routes they can choose from, the expected cost of each player approaches only (1/3)(2/3 + 1) + (1/3)(2/3 + 2/3) + (1/3)(1 + 2/3) = 1 + 5/9. Thus, even though from an individual player's perspective, diversification is a restriction that increases robustness at the expense of higher average loss, overall, diversification can actually improve the quality of the resulting equilibrium state. In the next section, we discuss the social cost of diversified equilibria in many-player games in more detail, analyzing what we call the diversified price of anarchy as well as the social cost that results from all players using diversified regret-minimizing strategies.
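The expected-cost calculation above can be checked mechanically. A small sketch (ours), using the edge loads of 2/3 induced on s–a and b–t by uniform play:

```python
from fractions import Fraction

third = Fraction(1, 3)
routes = {"s-a-b-t": ("sa", "ab", "bt"), "s-a-t": ("sa", "at"), "s-b-t": ("sb", "bt")}
# Fraction of players on each congestible edge when everyone plays uniformly:
load = {e: sum(third for r in routes.values() if e in r) for e in ("sa", "bt")}
fixed = {"ab": Fraction(0), "at": Fraction(1), "sb": Fraction(1)}  # constant edges

cost = {name: sum(load[e] if e in load else fixed[e] for e in edges)
        for name, edges in routes.items()}
expected = sum(third * c for c in cost.values())
print(cost, expected)   # route costs 4/3, 5/3, 5/3; expected cost 14/9 = 1 + 5/9
```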

3.2 The Diversified Price of Anarchy

We now consider structured general-sum games withk ≥ 2 players.

In these games, each playeri chooses some strategy sifrom a strat- egy spaceSi. The combined choice of the playerss= (s1, . . . , sk), which we call theoutcome, determines the cost that each player incurs. Specifically, letcosti(s) denote the cost incurred by playeri under outcomes, and letcost(s) = Íki=1costi(s) denote the overall social cost ofs. Let s= argminscost(s), i.e., the outcome of opti- mum social cost. Theprice of anarchy of a game is defined as the maximum ratiocost(s)/cost(s) over all Nash equilibria s. A low price of anarchy in a game means that all Nash equilibria have social cost that is not too much worse than the optimum. We can analogously define theϵ-diversified price of anarchy:

Definition 3.3. Let sϵdenote the outcome of optimum social cost subject to each player choosing anϵ-diversified strategy. The ϵ- diversified price of anarchy is the maximum ratiocost(sϵ)/cost(sϵ) over all outcomessϵthat are Nash equilibria subject to all players playingϵ-diversified strategies.

Note that for any game, the 1-diversified price of anarchy equals 1, because players are all required to play the uniform distribution.

This suggests that as we increase ϵ, the ϵ-diversified price of anarchy should drop, though as we show, in some games it is not monotone.

Examples. In consensus games, each player i is a distinct node in a k-node graph G. Players each choose one of two colors, red or blue, and the cost of player i is the number of neighbors it has of color different from its own. The social cost is the sum of the players' costs, and to keep ratios finite we add 1 to the total. The optimal s* is either "all blue" or "all red", in which each player has a cost of 0, so the social cost is 1. However, if the graph is a complete graph minus a perfect matching, then there exists an equilibrium in which half of the players choose red and half the players choose blue. Each player has k/2 − 1 red neighbors and k/2 − 1 blue neighbors, so the social cost of this equilibrium is Θ(k²). This means the price of anarchy is Θ(k²). However, if we require players to play ϵ-diversified strategies for any constant ϵ > 1/2 (i.e., they cannot play pure strategies), then for any m-edge graph G, even the optimum outcome has cost Ω(m), since every edge has a constant probability of contributing to the cost. So the diversified price of anarchy is O(1).
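As a quick sanity check on these bounds, the following sketch (ours; it pairs consecutive nodes as the removed matching) computes the social cost of the all-blue optimum, of the half/half equilibrium, and a Monte Carlo estimate of the fully diversified (ϵ = 1) outcome, where every player must color uniformly at random.

```python
import itertools, random

k = 10                                           # even number of players (nodes)
matching = {(i, i + 1) for i in range(0, k, 2)}  # removed perfect matching
edges = [e for e in itertools.combinations(range(k), 2) if e not in matching]

def social_cost(colors):
    # Each disagreeing edge adds 1 to both endpoints' costs; +1 keeps ratios finite.
    return 1 + 2 * sum(colors[u] != colors[v] for u, v in edges)

print(social_cost([0] * k))                      # optimum "all blue": cost 1
print(social_cost([i % 2 for i in range(k)]))    # half/half equilibrium: Theta(k^2)

# epsilon = 1 forces uniform coloring: every edge disagrees w.p. 1/2, so even
# the best diversified outcome has expected cost about 1 + |edges| = Omega(m).
samples = [social_cost([random.randint(0, 1) for _ in range(k)]) for _ in range(5000)]
print(sum(samples) / len(samples), 1 + len(edges))
```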

As another example, consider atomic congestion games [23]. Here, we have a set R of resources (e.g., edges in a graph G) and each player i has a strategy set S_i ⊆ 2^R (e.g., all ways to select a path between two specified vertices in G). The cost incurred by a player is the sum of the costs of the resources it uses (the cost of its path). Each resource j has a cost function c_j(k_j), where k_j is the number of players who are using resource j. The cost functions c_j could be increasing, such as in packet routing where latency increases with the number of users of an edge, or decreasing, such as players splitting the cost of a shared printer. When examining diversified strategies, we sometimes view players as making fractional choices, such as sending half their packets down one path and half of them down another. The quantity k_j then denotes the total fractional usage of resource j (or equivalently, the expected number of users of that resource).

Non-monotonicity. An example of an atomic congestion game where some diversification can initially increase the price of anarchy is the following. Suppose there are four resources, and each player just needs to choose one of them. The costs of the resources behave as follows:

c_1(k_1) = 1, c_2(k_2) = 5, c_3(k_3) = 6/k_3, c_4(k_4) = 6/k_4.

Assume the total number of players k is at least 13. The optimal outcome s* is for all players to choose resource 3 (or all choose resource 4) for a total social cost of 6. The optimal ϵ-diversified outcome for ϵ = 1/2 (i.e., each player can put weight at most 1/2 on any given resource) is for all players to put half their weight on resource 3 and half their weight on resource 4, for a total cost of 12. The worst Nash equilibrium is for all players to choose resource 1, for a total cost of k, giving a price of anarchy of k/6. However, if we require players to be ϵ-diversified for ϵ = 1/2, there is now a worse equilibrium where each player puts half its weight on resource 1 and half its weight on resource 2, for a total cost of 3k and a diversified price of anarchy of 3k/12 = k/4. So, increasing ϵ from 1/4 up to 1/2 increases the price of anarchy, and then increasing ϵ further to 1 will then decrease the price of anarchy to 1.
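These numbers are easy to verify under the fractional-usage semantics described earlier; the sketch below (ours) evaluates each of the symmetric profiles discussed, with k = 20 players.

```python
def social_cost(weights, k, costs):
    """Expected social cost when all k players play the mixed strategy
    `weights` over the resources; resource j carries fractional usage
    k * weights[j]."""
    usage = [k * w for w in weights]
    per_player = sum(w * costs[j](usage[j]) for j, w in enumerate(weights) if w > 0)
    return k * per_player

costs = [lambda u: 1.0, lambda u: 5.0, lambda u: 6.0 / u, lambda u: 6.0 / u]
k = 20
print(social_cost([0, 0, 1, 0], k, costs))      # optimum s*: 6.0
print(social_cost([0, 0, 0.5, 0.5], k, costs))  # optimal 1/2-diversified: 12.0
print(social_cost([1, 0, 0, 0], k, costs))      # worst Nash: k = 20
print(social_cost([0.5, 0.5, 0, 0], k, costs))  # worst 1/2-diversified eq.: 3k = 60
```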

3.2.1 General Bounds. We now present a general bound on the diversified price of anarchy for games, as well as for the social welfare when all players use diversified regret-minimizing strategies such as given in Sections 2.1 and 2.2, using the smoothness framework of Roughgarden [25].

Definition 3.4. [25] A general-sum game is (λ, µ)-smooth if for any two outcomes s and s′,

Σ_{i=1}^k cost_i(s′_i, s_{−i}) ≤ λ cost(s′) + µ cost(s).

Here, (s′_i, s_{−i}) means the outcome in which player i plays its action in s′ but all other players play their action in s.

Theorem 3.5. If a game is (λ, µ)-smooth, then for any ϵ, the ϵ-diversified price of anarchy is at most λ/(1 − µ).

Proof. Let s = s_ϵ be some Nash equilibrium subject to all players playing ϵ-diversified strategies, and let s′ = s*_ϵ be an outcome of optimum social cost subject to all players choosing ϵ-diversified strategies. Since s is an equilibrium, no player wishes to deviate to their action in s′; here we are using the fact that s′ includes only ϵ-diversified strategies, so such a deviation would be legal. Therefore cost(s) ≤ Σ_{i=1}^k cost_i(s′_i, s_{−i}) ≤ λ cost(s′) + µ cost(s). Rearranging, we have (1 − µ) cost(s) ≤ λ cost(s′), so cost(s)/cost(s′) ≤ λ/(1 − µ). □

Roughgarden [25] shows that atomic congestion games with affine cost functions, i.e., cost functions of the form c_j(k_j) = a_j k_j + b_j, are (5/3, 1/3)-smooth. So, their ϵ-diversified price of anarchy is at most 2.5. We now adapt the proof in Roughgarden [25] to show that players with vanishing regret with respect to diversified strategies will also approach the bound of Theorem 3.5.

Theorem 3.6. Suppose that in repeated play of a (λ, µ)-smooth game, each player i uses a sequence of mixed strategies s_i^(1), ..., s_i^(T) such that for any ϵ-diversified strategy s′_i we have:

(1/T) Σ_{t=1}^T cost_i(s^(t)) ≤ (1/T) Σ_{t=1}^T cost_i(s′_i, s^(t)_{−i}) + ∆_T.

Then the average social cost over the T steps satisfies

(1/T) Σ_{t=1}^T cost(s^(t)) ≤ (λ/(1 − µ)) cost(s*_ϵ) + k∆_T/(1 − µ).

In particular, if ∆_T → 0 then the average social cost approaches the bound of Theorem 3.5.

Proof. Combining the assumption of the theorem, the definition of social cost, and the smoothness definition we have:

(1/T) Σ_{t=1}^T cost(s^(t))
= (1/T) Σ_{t=1}^T Σ_{i=1}^k cost_i(s^(t))    (definition of social cost)
≤ Σ_{i=1}^k [(1/T) Σ_{t=1}^T cost_i((s*_ϵ)_i, s^(t)_{−i}) + ∆_T]    (assumption of the theorem)
= (1/T) Σ_{t=1}^T [Σ_{i=1}^k cost_i((s*_ϵ)_i, s^(t)_{−i})] + k∆_T    (rearranging)
≤ (1/T) Σ_{t=1}^T [λ cost(s*_ϵ) + µ cost(s^(t))] + k∆_T.    (applying smoothness)

Rearranging, we have:

(1 − µ) (1/T) Σ_{t=1}^T cost(s^(t)) ≤ λ cost(s*_ϵ) + k∆_T,

which immediately yields the result of the theorem. □

3.3 The Cost of Diversification

We now complement the above results by considering how much worse cost(s*_ϵ) can be compared to cost(s*) in natural games. We focus here on unit-demand congestion games where each strategy set S_i ⊆ R; that is, each player i selects a single resource in S_i. In particular, we focus on two important special cases: (a) c_j(k_j) = 1/k_j for all j (players share the cost of their resource equally with all others who make the same choice; this can be viewed as a game-theoretic distributed hitting-set problem), and (b) c_j(k_j) = k_j for all j, i.e., linear congestion games. To avoid unnecessary complication, we assume all S_i have the same size n, i.e., every player has n choices. We will also think of the number of choices per player n as O(1), whereas the number of players k and the total number of resources R may be large.

Unfortunately, in both cases (a) and (b), the cost of diversification can be very high in the worst case. For case (a) (cost sharing), a bad scenario is if there is a single element j* such that S_i ∩ S_{i′} = {j*} for all pairs i ≠ i′. Here, cost(s*) = 1 since all players can choose j*, but for any ϵ ∈ [2/n, 1], we have E[cost(s*_ϵ)] = Ω(k), since even in the best solution each player has a 50% chance of choosing a resource that no other player chose. For case (b) (linear congestion), a bad scenario is if there are n − 1 elements j_1, ..., j_{n−1} such that S_i ∩ S_{i′} = {j_1, ..., j_{n−1}} for all pairs i ≠ i′. Here, cost(s*) = k since each player can choose a distinct resource, but for any ϵ ∈ [2/n, 1], we have E[cost(s*_ϵ)] = Ω(k²/(n − 1)), which is Ω(k²) for n = O(1). So, in both cases, the ratio cost(s*_ϵ)/cost(s*) = Ω(k).

However, in the average case (each S_i consists of n random elements from R) the cost of diversification is only O(1).

Theorem 3.7. For both (a) unit-demand cost-sharing and (b) unit-demand linear congestion games, with n = O(1) strategies per player and random strategy sets S_i, E[cost(s*_ϵ)] = O(E[cost(s*)]).

Proof. Let us first consider (a) unit-demand cost-sharing. One lower bound on cost(s*) is that it is at least the cardinality of the largest collection of disjoint strategy sets S_i; for n = 2 this is the statement that the smallest vertex cover in a graph is at least the size of the maximum matching. Now consider selecting the random sets S_i one at a time. For i ≤ R/n², the first i sets cover at most R/n resources, so set S_{i+1} has at least a constant probability of being disjoint from the first i. This means that the expected size of the largest collection of disjoint strategy sets is at least Ω(min{k, R/n²}). On the other hand, a trivial upper bound on cost(s*_ϵ), even for ϵ = 1, is min{k, R}, since at worst each player takes a separate resource until all resources are used. Thus, for n = O(1), we have E[cost(s*_ϵ)] = O(E[cost(s*)]).

Now let us consider (b) unit-demand linear congestion. In this case, a lower bound on cost(s*) is the best-case allocation of all resources equally divided. In this case we have k/R usage per resource for a total cost of R × (k/R)² = k²/R. Another lower bound is simply k, so we have cost(s*) ≥ max{k, k²/R}. On the other hand, we can notice that s*_ϵ for a fully-diversified ϵ = 1 and random sets S_i is equivalent to players choosing resources independently at random. In this case, the social cost is identical to the analysis of random hashing: E[cost(s*_1)] = E[Σ_j k_j²] = k + k(k − 1)/R. Thus, E[cost(s*_ϵ)] = O(E[cost(s*)]) as desired. □
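The random-hashing identity E[Σ_j k_j²] = k + k(k − 1)/R in the last step can be confirmed with a quick Monte Carlo (our own sketch):

```python
import random
from collections import Counter

def linear_congestion_cost(k, R):
    # Each player picks a resource uniformly at random (the eps = 1 outcome
    # with random strategy sets); cost is sum_j k_j^2.
    counts = Counter(random.randrange(R) for _ in range(k))
    return sum(c * c for c in counts.values())

k, R, trials = 50, 20, 20000
avg = sum(linear_congestion_cost(k, R) for _ in range(trials)) / trials
print(avg, k + k * (k - 1) / R)   # Monte Carlo vs. closed form: both ~172.5
```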

4 DISTRIBUTED SETTING

We now consider a distributed setting where the actions of the row player are partitioned among k entities, such as subdivisions within a company or machines in a distributed system. At each time step, the row player asks for a number of actions from each entity, and plays a mixed strategy over them. However, asking for actions requires communication, which we would like to minimize.

Our aim is to obtain results similar to Theorem 2.4 with low communication complexity, measured by the number of actions requested and any additional constant-sized words communicated.

Let d ≤ log m denote the VC-dimension or pseudo-dimension of the set of columns H, viewing each row as an example and each column as a hypothesis. A baseline approach is that the row player samples O((d/ϵ²) log(1/ϵ)) times from the uniform multinomial distribution over {1, ..., k} and asks each entity to send the corresponding number of actions to the row player. The row player can then use Algorithm 1 as in the centralized setting over the sampled actions, and will lose only an additional ϵ in the value of its strategy. The communication complexity of this method is O((d/ϵ²) log(1/ϵ)) actions plus O(k) additional words. Here, we provide an algorithm that reduces communication to O(d log(1/ϵ)). The idea is to show that in Algorithm 1, each iteration of the multiplicative weight update can be simulated in the distributed setting with O(d) communication. Then, since there are at most O(log(1/ϵ)) iterations, the desired result follows. More specifically, we show that in each iteration, we can do the following two actions communication-efficiently:

(1) For any distribution P over the rows partitioned across k entities, obtain a column j such that M(P, j) ≥ v_ϵ.

(2) Update the distribution using the received column j.

To achieve the first statement, assume there is a centralized oracle, which for any ϵ-diversified distribution P returns a column j such that M(P, j) ≥ v_ϵ. For any distribution P partitioned across k entities, each entity first sends its sum of weights to the row player. Then, the row player samples O(d/((1 − α)²v_ϵ²)) actions (0 < α < 1) across the k entities proportional to their sums of weights, where d is the VC-dimension of H (a sketch of this sampling step appears after Theorem 4.1). By standard VC theory, a mixed strategy P′ that puts a uniform distribution over the sampled actions is a (1 − α)v_ϵ-approximation for H, i.e., M(P′, j) ≥ M(P, j) − (1 − α)v_ϵ ≥ αv_ϵ for all columns j ∈ H. The communication complexity of this step is O(d/((1 − α)²v_ϵ²)) actions plus O(k) additional words. For (2), we show steps 3 and 4 in Algorithm 1 can be simulated with low communication. Step 3 is easy: just send column j to all entities, and each entity then updates its own weights. What is left is to show that the projection step in Algorithm 1 can be simulated in the distributed setting. Fortunately, this projection step has been studied before in the distributed machine learning literature [11], where an efficient algorithm with O(k log²(d/ϵ)) words of communication is proposed. We summarize our results for the distributed setting with the following theorem.


Figure 2: Simulated results of Braess' paradox after T = 10,000 rounds. A more diversified strategy leads to lower loss.

Theorem 4.1. Suppose we are given a centralized oracle which, for any ϵ-diversified distribution P, returns a column j such that M(P, j) ≥ v_ϵ. If the actions of the row player are distributed across k entities, there is an algorithm that constructs a mixed strategy Q such that for all but an ϵ fraction of the rows i, M(i, Q) ≥ αv_ϵ − γ, for any 0 < α < 1. The algorithm requests at most O(ln(1/ϵ)/(γ²(1 + γ − αv_ϵ)) · d/((1 − α)²v_ϵ²)) actions and uses an additional O(ln(1/ϵ)/(γ²(1 + γ − αv_ϵ)) · k log²(d/ϵ)) words of communication.

5 EXPERIMENTS

To better understand the benefit of diversified strategies, we give some empirical simulations for both two-player zero-sum games and general-sum games. For all the experiments, we fix γ = 0.2 and show the results of using different values of ϵ.

Two-player zero-sum games. The row player has n = 10 actions to choose from, where each round, each action a_i returns a uniformly random reward r_i ∈ [i/n, 1]. The game is played for T = 10,000 rounds. Note that the n-th action has the highest expected reward.

We consider two scenarios in which a rare but catastrophic event occurs. The first scenario is that at time T, the cumulative reward gained from choosing the n-th action becomes zero. The second scenario is that the n-th action incurs a large negative reward of −T in time step T. Both of these can be viewed as different ways of simulating a bad event where, for instance, the shares of a company become worthless when the company goes bankrupt.
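A minimal sketch of the normal-situation simulation (ours; the paper does not spell out its update conventions, so we treat the loss fed to Algorithm 1 as 1 − reward and reuse the cap-and-rescale projection from Section 2.1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, gamma, eps = 10, 10_000, 0.2, 0.4
cap = 1.0 / (eps * n)

def project_capped(p, cap):
    # Cap-and-rescale RE projection, as in the Algorithm 1 sketch of Sec. 2.1.
    p, capped = p.copy(), np.zeros(len(p), dtype=bool)
    while (viol := (p > cap + 1e-12) & ~capped).any():
        capped |= viol
        p[capped] = cap
        p[~capped] *= (1.0 - cap * capped.sum()) / p[~capped].sum()
    return p

P, total = np.full(n, 1.0 / n), 0.0
for t in range(T):
    r = rng.uniform(np.arange(1, n + 1) / n, 1.0)  # reward r_i ~ U[i/n, 1]
    total += P @ r
    P = P * (1.0 - gamma) ** (1.0 - r)             # loss = 1 - reward
    P = project_capped(P / P.sum(), cap)
print("average reward (eps = 0.4):", total / T)
# The rare-event variants then zero out (or subtract T from) the n-th
# action's cumulative reward before averaging.
```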

The results for both scenarios, averaged over 10 independent trials, are shown in Figure 3. One can see that as expected, in the normal situation, the diversified strategy gains less reward. However, when the rare event happens, the non-diversified strategy gains very low reward. In both cases, a modest value of ϵ = 0.4 achieves a high reward whether the bad event happens or not.

General-sum games. We play the routing game defined in Braess’ paradox (see Figure 1). Each player has three routes to choose from (s-a-b-t, s-a-t, and s-b-t) in each round, so ϵ ∈ [1/3, 1].

As analyzed in Section 3.1, without the diversification constraint (i.e., ϵ = 1/3), the game quickly converges to the Nash equilibrium where all players choose the route s-a-b-t and incur a loss of 2. The best strategy in this case is to play the 1-diversified strategy, which incurs a lower loss of about 1.55. See Figure 2 for the results using other ϵ values.

Figure 3: Average reward over T = 10,000 rounds with different values of ϵ. (a) Rare event removes all the reward gained from the n-th action. (b) Rare event changes the reward of the n-th action to −T in the last round. When the rare event happens, the non-diversified strategy gains very low (even negative) reward.


6 CONCLUSION

We consider games in which one wants to play well without choosing a mixed strategy that is too concentrated. We show that such a diversification restriction has a number of benefits, and give adaptive algorithms to find diversified strategies that are near-optimal, also showing how taxes or fines can be used to keep a standard algorithm diversified. Further, our algorithms are simple and efficient, and can be implemented in a distributed setting. We also analyze properties of diversified strategies in both zero-sum and general-sum games, and give general bounds on the diversified price of anarchy as well as the social cost achieved by diversified regret-minimizing players.

Acknowledgements: This work is supported in part by NSF grants CCF-1101283, CCF-1451177, CCF-1535967, CCF-1422910, CCF-1800317, CNS-1704701, ONR grant N00014-09-1-0751, AFOSR grant FA9550-09-1-0538, and a gift from Intel.


REFERENCES

[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. 2005. Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05). IEEE, 339–348.

[2] Sanjeev Arora, Elad Hazan, and Satyen Kale. 2012. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing 8, 1 (2012), 121–164.

[3] Moshe Babaioff, Robert Kleinberg, and Christos H. Papadimitriou. 2007. Congestion games with malicious players. In Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 103–112.

[4] M.-F. Balcan, A. Blum, S. Fine, and Y. Mansour. 2012. Distributed Learning, Communication Complexity and Privacy. Journal of Machine Learning Research - Proceedings Track 23 (2012), 26.1–26.22.

[5] Maria-Florina Balcan, Avrim Blum, and Yishay Mansour. 2009. The price of uncertainty. In Proceedings of the 10th ACM Conference on Electronic Commerce. ACM, 285–294.

[6] Maria-Florina Balcan, Florin Constantin, and Steven Ehrlich. 2011. The snowball effect of uncertainty in potential games. In International Workshop on Internet and Network Economics. Springer, 1–12.

[7] Avrim Blum and Yishay Mansour. 2007. Learning, Regret Minimization, and Equilibria. Cambridge University Press, 79–102. https://doi.org/10.1017/CBO9780511800481.006

[8] Dietrich Braess. 1968. Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung 12 (1968), 258–268.

[9] Ioannis Caragiannis, David Kurokawa, and Ariel D. Procaccia. 2014. Biased Games. In AAAI. 609–615.

[10] Nicolo Cesa-Bianchi and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.

[11] S.-T. Chen, M.-F. Balcan, and D. Chau. 2016. Communication Efficient Distributed Agnostic Boosting. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016, Cadiz, Spain, May 9-11, 2016. 1299–1307.

[12] S.-T. Chen, H.-T. Lin, and C.-J. Lu. 2012. An Online Boosting Algorithm with Theoretical Justifications. In Proceedings of ICML. 1007–1014.

[13] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. 2004. Adversarial classification. In KDD.

[14] Hal Daumé, Jeff M. Phillips, Avishek Saha, and Suresh Venkatasubramanian. 2012. Efficient Protocols for Distributed Classification and Optimization. In Proceedings of the 23rd International Conference on Algorithmic Learning Theory (ALT'12). 154–168.

[15] Y. Freund and R. E. Schapire. 1997. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. System Sci. 55, 1 (1997), 119–139.

[16] Yoav Freund and Robert E. Schapire. 1999. Adaptive game playing using multiplicative weights. Games and Economic Behavior 29, 1 (1999), 79–103.

[17] Dmitry Gavinsky. 2003. Optimally-smooth Adaptive Boosting and Application to Agnostic Learning. J. Mach. Learn. Res. 4 (2003), 101–117.

[18] Matthias Hein and Maksym Andriushchenko. 2017. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems. 2263–2273.

[19] Mark Herbster and Manfred K. Warmuth. 2001. Tracking the Best Linear Predictor. J. Mach. Learn. Res. 1 (2001), 281–309.

[20] Russell Impagliazzo. 1995. Hard-core distributions for somewhat hard problems. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science. IEEE, 538–545.

[21] Nick Littlestone and Manfred K. Warmuth. 1989. The weighted majority algorithm. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE, 256–261.

[22] Panagiota N. Panagopoulou and Paul G. Spirakis. 2014. Random bimatrix games are asymptotically easy to solve (A simple proof). Theory of Computing Systems 54, 3 (2014), 479–490.

[23] Robert W. Rosenthal. 1973. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory 2, 1 (1973), 65–67.

[24] Tim Roughgarden. 2007. Routing games. Algorithmic Game Theory 18 (2007), 459–484.

[25] Tim Roughgarden. 2015. Intrinsic Robustness of the Price of Anarchy. J. ACM 62, 5 (2015), 32:1–32:42. https://doi.org/10.1145/2806883

[26] Maurice Sion. 1958. On general minimax theorems. Pacific J. Math. 8, 1 (1958), 171–176.

[27] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. 2014. Intriguing properties of neural networks. In ICLR.
