Chapter 5: UnMask: Adversarial Detection and Defense in Deep Learning Through Building-Block Knowledge Extraction
5.4 Conclusion & Discussion
In this work, we have introduced a new fundamental concept of building-block knowledge extraction, and showed how it protects deep learning models against adversarial attacks through the UNMASK detection and defense framework. We draw inspiration from humans’ natural ability to make robust classification decisions through the detection and synthesis of contextual building-block knowledge contained in images. We designed and developed our UNMASK framework to simulate this capability, so it can detect adversarial pixel-centric manipulations targeting a deep learning model and defend the model against attacks by rectifying the classification. Through extensive evaluation on large-scale real-world image data, we showcase the merits of our ideas through UNMASK’s ability to detect up to 92.9%
of attacks with a false positive rate of 9.67% and defend deep learning models by correctly classifying up to 92.24% of adversarial images in the gray-box scenario. Our proposed method is fast and architecture-agnostic.
In this work, we direct our efforts to studying the efficacy of UNMASK and the concept of building-block knowledge extraction on their own. As myriad newer and stronger attack strategies are continuously discovered, our approach is not a panacea that defends against all possible (future) attacks, and we do not intend for it to be used in isolation from other techniques. Rather, we believe that detection and defense strategies should be combined.
We expect our approach to be one of multiple techniques that are used in concert to provide comprehensive protection. Multi-pronged protection is a proven, long-standing defense strategy pervasive in security research and in practice [82, 45]. Fortunately, our proposed technique can be readily integrated with many existing techniques, as it operates in parallel to the deep learning model that it aims to protect (see Figure 5.1).
We note that UNMASK is potentially vulnerable to attacks that simultaneously target and manipulate all building-block features, e.g., changing every “bike” part in a bike image into a “bird” part (bike wheel → bird wing; bike handlebar → bird tail). Such a simultaneous, multi-part attack could be challenging to formulate and execute. To the best of our knowledge, we have not yet encountered it in research or practice.
Future research directions include extending UNMASK to the object detection task: that is, protecting an object detector with an auxiliary detector that detects object parts. Such a defense has the potential to mitigate ShapeShifter-style attacks in the black-box or gray-box settings. We did not evaluate UNMASK directly against ShapeShifter because ShapeShifter currently only works in the white-box setting and has limited black-box transferability. We also want to improve UNMASK by reducing its dependency on object-part labeling.
Algorithm 1: UNMASK
Input: training images X, labels Y, segmentation masks S, set of possible classes C, attribute matrix V, threshold t, test image x
Result: adversarial prediction z ∈ {0, 1}, predicted class p

Train unprotected classification model M:
    M = NeuralNet(X, Y)
    ŷ = M(x)
Train building-block extraction model K:
    K = Mask-RCNN(X, S)
f_e = K(x)      (extracted building blocks)
f_a = V[ŷ]      (expected building blocks)
Detection:
    s = J(f_e, f_a);  d = 1 − s
    z = 0 (benign) if d < t;  z = 1 (adversarial) if d ≥ t
Defense:
    p = ŷ if z = 0;  p = argmin_{c ∈ C} (1 − J(f_e, V[c])) if z = 1
return z, p
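The detection and defense steps of Algorithm 1 reduce to Jaccard-similarity comparisons between the extracted and expected part sets. The following is a minimal sketch in plain Python; the extraction model K and attribute matrix V are stubbed out as plain sets, and all names and the toy feature matrix are illustrative, not the original implementation:

```python
def jaccard(a, b):
    """Jaccard similarity J between two sets of building-block features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def unmask(extracted, predicted_class, V, t=0.5):
    """UnMask-style detection and defense (sketch).

    extracted       : set of part names produced by the extraction model K
    predicted_class : class ŷ predicted by the unprotected model M
    V               : dict mapping each class to its expected part set
    t               : detection threshold on the Jaccard distance d = 1 - s
    Returns (z, p)  : z = 1 if flagged adversarial, p = rectified class.
    """
    d = 1.0 - jaccard(extracted, V[predicted_class])
    if d < t:                       # extracted parts agree with ŷ: benign
        return 0, predicted_class
    # flagged adversarial: rectify to the class with the smallest distance
    p = min(V, key=lambda c: 1.0 - jaccard(extracted, V[c]))
    return 1, p

# Toy class-feature matrix (illustrative, not the PASCAL-Part features)
V = {"bike": {"wheel", "handlebar", "saddle"},
     "bird": {"wing", "beak", "tail"}}

# M predicts "bird" but K extracts bike parts: attack detected and rectified
print(unmask({"wheel", "saddle"}, "bird", V))   # (1, 'bike')
```

The argmin over Jaccard distance is exactly the defense step of the algorithm: when the extracted parts disagree with the predicted class, the prediction is replaced by the class whose expected part set matches best.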
Table 5.2: Class-Feature Matrix (dot matrix not reproducible here). Columns: the 20 classes Airplane, Bicycle, Bird, Boat, Bottle, Bus, Car, Cat, Chair, Cow, DiningTable, Dog, Horse, Motorbike, Person, PottedPlant, Sheep, Sofa, Train, and Television. Rows: the part features Arm, Beak, Body, Cap, Coach, Door, Engine, Ear, Eye, Eyebrow, Foot, Front side, Hair, Hand, Head, Headlight, Hoof, Horn, Leg, License plate, Mirror, Mouth, Muzzle, Neck, Nose, Paw, Plant, Pot, Saddle, Screen, Stern, Tail, Torso, Vehicle, Wheel, Window, and Wing. Dots mark which features belong to each class; the bottom rows mark each class's membership in the four class sets CS3a, CS3b, CS5a, and CS5b.
Model   Classes   PASCAL-Part (Train / Val / Test)   PASCAL VOC 2010 (Train / Val / Test)
K       44        7,457 / 930 / 936                  - / - / -
M       CS3a      - / - / -                          1,750 / 350 / 1,400
M       CS3b      - / - / -                          2,104 / 421 / 1,684
M       CS5a      - / - / -                          2,264 / 452 / 1,812
M       CS5b      - / - / -                          2,501 / 500 / 2,001

Table 5.3: Number of images used in training models K and M.
Class Set   Defense (DeepFool / FGSM / PGD)   Detection (All Attacks)
CS3a        3,485 / 2,823 / 3,494             3,494
CS3b        4,749 / 4,161 / 4,764             4,764
CS5a        5,827 / 5,252 / 5,849             5,849
CS5b        6,728 / 5,883 / 6,747             6,747

Table 5.4: Number of ImageNet images used to evaluate UNMASK. Only the images that can be successfully perturbed by the attack are used, hence the variations in counts. We report values for PGD and FGSM with ε = 16; the numbers for ε = 8 are similar.
Class Set   Classes   Unique Parts   Overlap
CS3a        3         29             6.89%
CS3b        3         18             50.00%
CS5a        5         34             23.53%
CS5b        5         34             29.41%

Table 5.5: The four class sets investigated in our evaluation, with varying numbers of classes and degrees of feature overlap.
Model M    Class Set   Overlap   No Attk   DeepFool (No Def)   DeepFool   FGSM    PGD
VGG16      CS3a        6.89%     87.00     5.13                94.33      73.44   89.89
VGG16      CS3b        50.00%    89.13     3.47                85.62      60.11   75.19
VGG16      CS5a        23.53%    80.35     3.91                91.11      65.86   82.65
VGG16      CS5b        29.41%    81.36     3.04                87.17      62.88   77.02
ResNet50   CS3a        6.89%     86.64     4.51                95.04      74.42   90.81
ResNet50   CS3b        50.00%    85.75     3.28                86.12      66.71   78.55
ResNet50   CS5a        23.53%    80.35     3.91                91.11      65.86   82.65
ResNet50   CS5b        29.41%    79.91     3.33                87.57      65.19   80.01

Table 5.6: UNMASK's accuracies (in %) in countering three attacks: DeepFool, FGSM, and Projected Gradient Descent (PGD). We test two popular CNN architectures, VGG16 and ResNet50, as the unprotected model M, on four class sets with varying numbers of classes and degrees of feature overlap. We use ε = 16 for FGSM and PGD in this experiment. We show the models' accuracies (1) when not under attack ("No Attk" column); (2) when attacked without defense ("DeepFool (No Def)"; for both FGSM and PGD, accuracy drops to 0 without defense, so those columns are omitted); and (3) when attacked and defended by UNMASK (the last three columns).
Part II
Theoretically-Principled Defense via
Game Theory and ML
OVERVIEW
Most non-trivial security problems require some kind of decision making and resource allocation. For example, a company that wants to implement security controls with a limited budget needs to make trade-offs in its deployment. I modeled this problem as a two-player zero-sum game between a defender and an attacker, and introduced a novel solution concept called diversified mixed strategy (Chapter 6).
Inspired by the proverb “Don’t put all your eggs in one basket,” my new solution concept compels players to employ a “diversified” strategy that does not place too much weight on any one action. Furthermore, by leveraging the deep connection between game theory and boosting, we develop a communication-efficient distributed boosting algorithm with strong theoretical guarantees (Chapter 7) in the agnostic learning setting where the data can contain arbitrary noise. Our algorithm achieves exponential improvement in communication complexity over prior work and solves an open problem in distributed learning.
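The connection between game theory and boosting mentioned above rests on the classical fact that no-regret dynamics such as multiplicative weights (the same engine behind AdaBoost) converge to minimax equilibrium in zero-sum games. The following self-contained sketch illustrates that fact on matching pennies; the step size and horizon are arbitrary illustrative choices, and this is not the distributed algorithm of Chapter 7:

```python
import math

def solve_zero_sum(A, eta=0.1, T=2000):
    """Approximate the row player's maximin strategy of the zero-sum game
    with payoff matrix A (row player maximizes) via multiplicative weights,
    while the column player best-responds each round."""
    n, m = len(A), len(A[0])
    w = [1.0] * n                             # multiplicative weights
    avg = [0.0] * n                           # time-averaged strategy
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]          # row player's mixed strategy
        # column player best-responds: pick the column minimizing row payoff
        col = min(range(m), key=lambda j: sum(p[i] * A[i][j] for i in range(n)))
        for i in range(n):                    # multiplicative-weights update
            w[i] *= math.exp(eta * A[i][col])
        peak = max(w)
        w = [wi / peak for wi in w]           # renormalize for stability
        avg = [a + pi / T for a, pi in zip(avg, p)]
    return avg

# Matching pennies: the unique equilibrium mixes 50/50 over the two actions
p = solve_zero_sum([[1, -1], [-1, 1]])
```

The time-averaged strategy converges to an approximate equilibrium; replacing the row update with a reweighting over training examples and the column best response with a weak learner yields the standard boosting-as-a-game view.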
CHAPTER 6
DIVERSIFIED STRATEGIES FOR MITIGATING ADVERSARIAL ATTACKS IN MULTIAGENT SYSTEMS
In this chapter we consider online decision-making in settings where players want to guard against possible adversarial attacks or other catastrophic failures. To address this, we propose a solution concept in which players face an additional constraint: at each time step they must play a diversified mixed strategy, one that does not put too much weight on any one action. This constraint is motivated by applications such as finance, routing, and resource allocation, where one would like to limit one's exposure to adversarial or catastrophic events while still performing well in typical cases. We explore properties of diversified strategies in both zero-sum and general-sum games, and provide algorithms for minimizing regret within the family of diversified strategies as well as methods for using taxes or fees to guide standard regret-minimizing players towards diversified strategies. We also analyze equilibria produced by diversified strategies in general-sum games. We show that, surprisingly, requiring diversification can actually lead to higher-welfare equilibria, and give strong guarantees on both the price of anarchy and the social welfare produced by regret-minimizing diversified agents. We additionally give algorithms for finding optimal diversified strategies in distributed settings where one must limit communication overhead.
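To make the constraint concrete: a mixed strategy p over n actions is diversified when no coordinate exceeds a cap c (with c·n ≥ 1, so a valid distribution exists). One simple way to turn an arbitrary mixed strategy into a diversified one is to clip overweight actions to the cap and redistribute the excess proportionally among the rest. The sketch below is purely illustrative and is not the regret-minimization algorithm analyzed in this chapter:

```python
def diversify(p, cap):
    """Redistribute probability mass so that no action exceeds `cap`.

    Assumes p is a probability vector (nonnegative, sums to 1) and
    cap * len(p) >= 1, so a capped distribution exists.
    """
    n = len(p)
    free = list(range(n))            # indices not yet pinned at the cap
    out = [0.0] * n
    remaining = 1.0                  # mass still to be distributed
    while True:
        mass = sum(p[i] for i in free)
        # scale the free coordinates proportionally to fill `remaining`
        scaled = {i: p[i] * remaining / mass for i in free}
        over = [i for i in free if scaled[i] > cap]
        if not over:                 # feasible: accept the scaled values
            for i in free:
                out[i] = scaled[i]
            return out
        for i in over:               # pin violators at the cap and repeat
            out[i] = cap
            free.remove(i)
        remaining -= cap * len(over)

# A concentrated strategy becomes diversified: [0.7, 0.2, 0.1] with cap 0.5
q = diversify([0.7, 0.2, 0.1], 0.5)   # -> [0.5, 1/3, 1/6]
```

The loop pins each violating coordinate at the cap and rescales the rest, so it terminates after at most n iterations; other redistribution rules (e.g., Euclidean projection onto the capped simplex) would also produce valid diversified strategies.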