CHAPTER 6

DIVERSIFIED STRATEGIES FOR MITIGATING ADVERSARIAL ATTACKS IN MULTIAGENT SYSTEMS

In this chapter we consider online decision-making in settings where players want to guard against possible adversarial attacks or other catastrophic failures. To address this, we propose a solution concept in which players have an additional constraint that at each time step they must play a diversified mixed strategy: one that does not put too much weight on any one action. This constraint is motivated by applications such as finance, routing, and resource allocation, where one would like to limit one’s exposure to adversarial or catastrophic events while still performing well in typical cases. We explore properties of diversified strategies in both zero-sum and general-sum games, and provide algorithms for minimizing regret within the family of diversified strategies as well as methods for using taxes or fees to guide standard regret-minimizing players towards diversified strategies. We also analyze equilibria produced by diversified strategies in general-sum games. We show that surprisingly, requiring diversification can actually lead to higher-welfare equilibria, and give strong guarantees on both price of anarchy and the social welfare produced by regret-minimizing diversified agents. We additionally give algorithms for finding optimal diversified strategies in distributed settings where one must limit communication overhead.

6.1 Introduction

Figure 6.1: For α ∈ [1/n, 1], we define a probability distribution P to be α-diversified if P(i) ≤ 1/(αn) for all i. A distribution can be diversified through Bregman projection onto the set of all α-diversified distributions. A mixed strategy determined by an α-diversified distribution is called an α-diversified (mixed) strategy. We explore properties of such diversified strategies in both zero-sum and general-sum games as well as give algorithmic guarantees.

We explore properties of diversified strategies in both zero-sum and general-sum games, presenting algorithms for minimizing regret within this family as well as guarantees on the expected value obtained (in zero-sum games) and the overall social welfare produced (in general-sum games).

As an example, consider a standard game-theoretic scenario: an agent must drive from point A to point B and has n different routes it can take. We could model this as a game M where rows correspond to the n routes, columns correspond to m possible traffic patterns, and entry M(i, j) is the cost of using route i under traffic pattern j. However, suppose the agent is carrying valuable documents and is concerned that an adversary might try to steal them.

In this case, to reduce the chance of this happening, we might require that no route have more than (say) 10% probability. The agent then wants to minimize expected travel time subject to this requirement. Or in an investment scenario, if rows correspond to different investments and columns to possible market conditions, we might have an additional worry that perhaps one of the investment choices is run by a crook. In this case, we may wish to restrict the strategy space to allocations of funds that are not too concentrated.
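To make the routing example concrete, here is a minimal sketch (our own, in Python with NumPy/SciPy; the cost matrix is random placeholder data) of computing a minimax-optimal route distribution subject to a 10% cap per route, by solving the corresponding linear program:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative cost matrix: M[i, j] = cost of route i under traffic pattern j.
rng = np.random.default_rng(0)
n_routes, m_patterns = 20, 5
M = rng.uniform(0.0, 1.0, size=(n_routes, m_patterns))

cap = 0.10  # no route may receive more than 10% probability

# LP variables: x = (p_1, ..., p_n, t), minimizing t = worst-case expected cost.
# Constraints: for each traffic pattern j, sum_i p_i * M[i, j] <= t;
#              sum_i p_i = 1;  0 <= p_i <= cap.
c = np.zeros(n_routes + 1)
c[-1] = 1.0

A_ub = np.hstack([M.T, -np.ones((m_patterns, 1))])   # M^T p - t <= 0
b_ub = np.zeros(m_patterns)
A_eq = np.hstack([np.ones((1, n_routes)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0.0, cap)] * n_routes + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
p, worst_case_cost = res.x[:n_routes], res.x[-1]
print("diversified minimax strategy:", np.round(p, 3))
print("worst-case expected cost:", worst_case_cost)
```

Setting the cap to 1.0 recovers the unconstrained minimax strategy; tightening it trades typical-case cost for reduced exposure to any single route.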

To address such scenarios, for α ∈ [1/n, 1] let us define a probability distribution (or allocation) P to be α-diversified if P(i) ≤ 1/(αn) for all i. For example, for α = 1/n this is no restriction at all, for α = 1 this requires the uniform distribution, and for intermediate values of α this requires an intermediate level of diversification. We then explore properties of such diversified strategies in both zero-sum and general-sum games as well as give algorithmic guarantees.
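Figure 6.1 mentions diversifying a distribution via Bregman projection. Under the KL divergence (the Bregman divergence underlying multiplicative-weights methods), projecting onto the α-diversified set has a simple water-filling form: entries above the cap 1/(αn) are clipped to the cap and the remaining mass is rescaled proportionally. A minimal sketch, assuming NumPy; the function name `project_diversified` is ours, not the thesis's:

```python
import numpy as np

def project_diversified(p: np.ndarray, alpha: float) -> np.ndarray:
    """KL (Bregman) projection of distribution p onto {q : q_i <= 1/(alpha*n)}.

    The KL projection has the form q_i = min(cap, beta * p_i) for a scale
    beta >= 1 chosen so that q sums to 1; we find it by repeatedly capping
    violating entries and renormalizing the uncapped mass. For alpha <= 1
    this terminates after at most n passes.
    """
    n = len(p)
    cap = 1.0 / (alpha * n)
    q = p.astype(float).copy()
    capped = np.zeros(n, dtype=bool)
    while True:
        over = (q > cap) & ~capped
        if not over.any():
            return q
        q[over] = cap
        capped |= over
        free = ~capped
        remaining = 1.0 - cap * capped.sum()   # mass left for uncapped entries
        q[free] *= remaining / q[free].sum()

p = np.array([0.6, 0.2, 0.1, 0.05, 0.05])
print(project_diversified(p, alpha=0.5))  # cap = 1/(0.5*5) = 0.4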

For zero-sum games, define v_α to be the minimax-optimal value of the game in which the row player is restricted to playing α-diversified mixed strategies. Natural questions we address are: Can one design adaptive learning algorithms that maintain α-diversified distributions and minimize regret within this class, so that they never perform much worse than v_α? Can a central authority “nudge” a generic non-diversified regret-minimizer into using diversified strategies via fines or taxes (extra loss vectors strategically placed into the event stream) such that it maintains low regret over the original sequence? And for reasonable games, how much worse is v_α compared to the non-diversified minimax value v? We also consider a dual problem of producing a strategy Q for the column player that achieves value v_α against all but an α fraction of the rows (which an adversary can then aim to attack).
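One natural shape for such an adaptive algorithm (our own sketch, reusing the `project_diversified` helper above, and not necessarily the chapter's exact method) is multiplicative weights with a Bregman projection back onto the α-diversified set after each update. Starting from the uniform distribution, every strategy played is α-diversified, and standard online-mirror-descent arguments bound regret against the best fixed α-diversified strategy:

```python
import numpy as np

class DiversifiedMWU:
    """Multiplicative weights restricted to alpha-diversified strategies.

    After the usual exponential update, the distribution is KL-projected
    back onto {p : p_i <= 1/(alpha*n)} using project_diversified (sketched
    earlier); the uniform starting point is always alpha-diversified.
    """

    def __init__(self, n: int, alpha: float, eta: float = 0.1):
        self.alpha, self.eta = alpha, eta
        self.p = np.full(n, 1.0 / n)

    def play(self) -> np.ndarray:
        return self.p

    def update(self, loss: np.ndarray) -> None:
        # loss: vector in [0, 1]^n observed after playing self.p
        w = self.p * np.exp(-self.eta * loss)
        self.p = project_diversified(w / w.sum(), self.alpha)
```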

One might ask why not model such an adversary directly within the game, via additional columns that each give a large loss to one of the rows. The main reason is that these would then dominate the minimax value of the game. (And they either would not have values within the usual [0, 1] range assumed by regret-minimizing learners, or, if they were scaled to lie in this range, they would cause all other events to seem roughly the same.) Instead, we want to consider learning algorithms that optimize for more common events, while keeping to the constraint of maintaining diversified strategies. We also remark that one could make diversification a soft constraint by adding a loss term for not diversifying.
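The “fines or taxes” idea above can be sketched concretely: a center watches the distribution a standard (unprojected) regret-minimizer would play, and whenever some action is over-weighted it injects an extra loss vector charging exactly those actions, steering the learner toward the diversified set. The interface below is hypothetical; it assumes a learner exposing `play`/`update` methods like the sketch above, and it is an illustration of the mechanism described in the text rather than the chapter's exact scheme:

```python
import numpy as np

def tax_into_diversification(learner, loss_stream, alpha, fine=1.0, max_fines=100):
    """Nudge an unconstrained regret-minimizer with penalty loss vectors."""
    for loss in loss_stream:
        p = learner.play()
        cap = 1.0 / (alpha * len(p))
        for _ in range(max_fines):        # bounded number of fines per event
            over = p > cap
            if not over.any():
                break
            # fine exactly the over-weighted actions, then re-check
            learner.update(fine * over.astype(float))
            p = learner.play()
        learner.update(loss)              # then deliver the real event
```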

We next consider general-sum games, such as routing games and atomic congestion games, in which k players interact in ways that lead to various costs being incurred by each player. We show that, surprisingly, requiring a player to use diversified strategies can actually improve its performance in equilibria in such games. We then study the α-diversified price of anarchy: the ratio of the social cost of the worst equilibrium subject to all players being α-diversified to the social cost of the socially best set of α-diversified strategies. We show that in some natural games, even requiring a small amount of diversification can dramatically improve the price of anarchy of the game, though we show there also exist games where diversification can make the price of anarchy worse. We also bring several threads of this investigation together by showing that for the class of smooth games defined by Roughgarden [101], for any diversification parameter α ∈ [1/n, 1], the α-diversified price of anarchy is no worse than the smoothness of the game, and moreover, players using diversified regret-minimizing strategies will indeed approach this bound. Thus, we get strong guarantees on the quality of interactions produced by self-interested diversified play.

Finally, we consider how much diversification can hurt optimal play, showing that in random unit-demand congestion games, diversification indeed incurs a low penalty.
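As a toy illustration of diversified play in a unit-demand congestion game (our own minimal simulation, not the chapter's experiments): k players repeatedly choose among n facilities, a facility's cost is its expected load, and every player runs the projected multiplicative-weights learner sketched earlier (`DiversifiedMWU` and `project_diversified` from above):

```python
import numpy as np

# k players, n facilities; a player's cost on facility j is the expected
# number of players using j (itself included). Losses are scaled into [0, 1].
k, n, alpha, rounds = 10, 5, 0.5, 2000
players = [DiversifiedMWU(n, alpha, eta=0.05) for _ in range(k)]

for _ in range(rounds):
    P = np.array([pl.play() for pl in players])   # k x n mixed strategies
    load = P.sum(axis=0)                          # expected load per facility
    for i, pl in enumerate(players):
        cost = (load - P[i]) + 1.0                # others' load plus itself
        pl.update(cost / k)

P = np.array([pl.play() for pl in players])
social_cost = float((P * (P.sum(axis=0) - P + 1.0)).sum())
print("expected social cost under diversified play:", round(social_cost, 2))
print("lower bound (perfectly balanced load):", k * k / n)
```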

Lastly, we consider an information-limited, distributed, “big-data” setting in which the number of rows and columns of the matrix M is very large and we do not have it explicitly. Specifically, we assume the n rows are distributed among r processors, and the only access to the matrix M we have is via an oracle for the column player that takes in a sample of rows and outputs the column player’s best response. We show how, in such a setting, to produce near-optimal strategies for each player in the sense described above using very limited communication among processors.
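One plausible shape for such a protocol (our sketch under stated assumptions, not the chapter's algorithm): each processor keeps multiplicative weights over its local rows; each round the coordinator gathers a small weighted sample of rows, queries the hypothetical best-response oracle on it, and broadcasts the returned column so processors can update locally. Only the samples and one column cross the wire per round. The names `oracle` and `loss_of` are assumed interfaces:

```python
import numpy as np

def distributed_diversified_play(local_rows, oracle, loss_of, rounds,
                                 sample_size=32, eta=0.1):
    """Sketch of a low-communication protocol (our construction).

    local_rows: list of r arrays of row indices, one per processor.
    oracle(sample): hypothetical oracle returning the column player's
        best response to a small sample of rows.
    loss_of(row, col): loss of a single row against a column.
    """
    weights = [np.ones(len(rows)) for rows in local_rows]
    rng = np.random.default_rng(0)
    for _ in range(rounds):
        # each processor contributes a small sample drawn from its weights
        sample = []
        for rows, w in zip(local_rows, weights):
            idx = rng.choice(len(rows), size=sample_size, p=w / w.sum())
            sample.extend(rows[i] for i in idx)
        col = oracle(sample)                 # coordinator queries the oracle
        for rows, w in zip(local_rows, weights):
            # local multiplicative update against the broadcast column
            w *= np.exp(-eta * np.array([loss_of(r, col) for r in rows]))
    return weights
```

Enforcing the α-diversification cap on the combined distribution would require one more round of lightweight coordination (e.g., sharing each processor's total weight); we omit it in this sketch.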

In addition to our theoretical results, we also present experimental simulations for both zero-sum and general-sum games.

6.1.1 Related Work

There has been substantial work on the design of “no-regret” learning algorithms for repeated play of zero-sum games [102, 103, 104]. Multiplicative Weight Update methods [105, 106] are a specific type of no-regret algorithm that have received considerable attention in game theory [102, 107], machine learning [102, 108], and many other research areas [109, 110], due to their simplicity and elegance.

We consider the additional constraint that players play diversified mixed strategies, motivated by the goal of reducing exposure to adversarial attacks. The concept of diversified strategies, sometimes called “smooth distributions”, appears in a range of different areas [109, 111, 112]. [113] considers a somewhat related notion where there is a penalty for deviation from a given fixed strategy, and shows the existence of equilibria in such games. Also related is work on adversarial machine learning, e.g., [114, 115, 116]; however, in this work we are instead focused on decision-making scenarios.

Our distributed algorithm is inspired by prior work in distributed machine learning [117, 118, 108], where the key idea is to perform weight updates in a communication-efficient way.

Other work on the impact of adversaries in general-sum games appears in [119, 120, 121].