
AI-INFUSED SECURITY: ROBUST DEFENSE BY BRIDGING THEORY AND PRACTICE

A Dissertation Presented to The Academic Faculty

By

Shang-Tse Chen

In Partial Fulfillment

of the Requirements for the Degree Doctor of Philosophy in the School of Computer Science

Georgia Institute of Technology

December 2019

Copyright © Shang-Tse Chen 2019


AI-INFUSED SECURITY: ROBUST DEFENSE BY BRIDGING THEORY AND PRACTICE

Approved by:

Dr. Duen Horng Chau, Advisor
School of Computational Science and Engineering
Georgia Institute of Technology

Dr. Maria-Florina Balcan, Co-advisor
School of Computer Science
Carnegie Mellon University

Dr. Wenke Lee
School of Computer Science
Georgia Institute of Technology

Dr. Le Song
School of Computational Science and Engineering
Georgia Institute of Technology

Dr. Kevin A. Roundy
Symantec Research Labs

Dr. Cory Cornelius
Intel Labs

Date Approved: August 19, 2019


For my parents, brother, and sister.


ACKNOWLEDGEMENTS

My Ph.D. journey at Georgia Tech has been an amazing experience thanks to the help from many people. Polo, you are my inspiration and role model. Your patience and cheerful attitude have saved me many times from self-doubt. Nina, thank you for your insights and advice on the theoretical side of my research. Having two advisors is a wonderful experience, and I am fortunate enough to be able to learn from the best of both worlds.

I would like to thank my mentors at Symantec and Intel. Kevin, I am impressed by your leadership and deep knowledge in security and machine learning. I really enjoyed brainstorming on our Virtual Product project with you and other Symantec colleagues. Cory, you changed my research topic in a very pleasant way. ShapeShifter has become one of my most well-known works, and it would not have been possible without your help.

My academic siblings in Polo Club, I treasure our friendship and collaboration: Acar Tamersoy, Robert Pienta, Chad Stolper, Elias Khalil, Minsuk Kahng, Fred Hohman, Nilaksh Das, Haekyu Park, Scott Freitas, Jay Wang, Austin Wright, Madhuri Shanbhogue, Peter Polack, Varun Bezzam, Prasenjeet Biswal, Paras Jain, Zhiyuan Lin, Yiqi Chen, Dezhi Fang, Siwei Li, Samuel Clarke, Joon Kim, and Alex Cabrera.

I thank my friends, collaborators, and colleagues for their friendship and help to my research: Wenke Lee, Le Song, Bistra Dilkina, Rich Vuduc, Santosh Vempala, Avrim Blum, Bogdan Carbunar, Yingyu Liang, Chris Berlind, Jason Martin, Michael E. Kounavis, Li Chen, Chris Gates, Yufei Han, Shou-De Lin, Hsuan-Tien Lin, and Chi-Jen Lu.

Special thanks to my roommates Chih-Wei Wu and Shin-Huei Chiou. You make my journey away from home less stressful. I will miss eating dinner, watching TV shows, playing video games, grocery shopping, and discussing crazy ideas with you guys.

Last but not least, my deepest gratitude to my parents, sister, and brother. You are always supportive no matter what I choose to do. I love you all.


TABLE OF CONTENTS

Acknowledgments . . . iv

List of Tables . . . xii

List of Figures . . . xvi

Chapter 1: Introduction . . . 1

1.1 Thesis Overview and Main Ideas . . . 1

1.1.1 Part I: Adversarial Attack and Defense of Deep Neural Networks . . 3

1.1.2 Part II: Theoretically-Principled Defense via Game Theory and ML . . . 6

1.1.3 Part III: Applying AI to Protect Enterprise and Society . . . 7

1.2 Thesis Statement . . . 9

1.3 Research Contributions . . . 10

1.4 Impact . . . 11

Chapter 2: Survey . . . 13

2.1 Security of Machine Learning . . . 13

2.2 Applications of Machine Learning to Cybersecurity . . . 15

I Adversarial Attack and Defense of Deep Neural Networks 17

Chapter 3: ShapeShifter: Robust Physical Adversarial Attack on Object Detector 19


3.1 Introduction . . . 19

3.1.1 Our Contributions . . . 21

3.2 Background . . . 22

3.2.1 Adversarial Attacks . . . 22

3.2.2 Faster R-CNN . . . 22

3.3 Threat Model . . . 23

3.4 Attack Method . . . 24

3.4.1 Attacking an Image Classifier . . . 24

3.4.2 Extension to Attacking Faster R-CNN . . . 26

3.5 Evaluation . . . 27

3.5.1 Digitally Perturbed Stop Sign . . . 27

3.5.2 Physical Attack . . . 29

3.6 Discussion & Future Work . . . 36

3.7 Conclusion . . . 38

Chapter 4: SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression . . . 39

4.1 Introduction . . . 39

4.1.1 Our Contributions and Impact . . . 41

4.2 Background: Adversarial Attacks . . . 42

4.3 Proposed Method: Compression as Defense . . . 43

4.3.1 Preprocessing Images using Compression . . . 44

4.3.2 Vaccinating Models with Compressed Images . . . 45

4.3.3 SHIELD: Multifaceted Defense Framework . . . 46


4.4 Evaluation . . . 47

4.4.1 Experiment Setup . . . 47

4.4.2 Defending Gray-Box Attacks with Image Preprocessing . . . 49

4.4.3 Black-Box Attack with Vaccination and Ensembling . . . 52

4.4.4 Transferability in Black-Box Setting . . . 54

4.4.5 NIPS 2017 Competition Results . . . 55

4.5 Significance and Impact . . . 55

4.5.1 Software and Hardware Integration Milestones . . . 56

4.5.2 New Computational Paradigm: Secure Deep Learning . . . 56

4.5.3 Scope and Limitations . . . 57

4.6 Related Work . . . 57

4.6.1 Image Preprocessing as Defense . . . 58

4.6.2 Attacks against Preprocessing Techniques . . . 58

4.7 Conclusion . . . 59

Chapter 5: UnMask: Adversarial Detection and Defense in Deep Learning Through Building-Block Knowledge Extraction . . . 61

5.1 Introduction . . . 61

5.1.1 Contributions . . . 63

5.2 UNMASK: Detection and Defense Framework . . . 64

5.2.1 Intuition: Protection via Building-Block Knowledge Extraction . . . 66

5.2.2 Overview of UNMASK . . . 66

5.2.3 Technical Walk-Through of UNMASK . . . 67

5.3 Evaluation . . . 68


5.3.1 Experiment Setup . . . 69

5.3.2 Evaluating UNMASK Defense and Detection . . . 72

5.4 Conclusion & Discussion . . . 74

II Theoretically-Principled Defense via Game Theory and ML 81

Chapter 6: Diversified Strategies for Mitigating Adversarial Attacks in Multiagent Systems . . . 83

6.1 Introduction . . . 83

6.1.1 Related Work . . . 86

6.2 Zero-Sum Games . . . 87

6.2.1 Multiplicative Weights and Diversified Strategies . . . 87

6.2.2 Diversifying Dynamics . . . 90

6.2.3 How Close Is vε to v? . . . 93

6.3 General-Sum Games . . . 95

6.3.1 The Benefits of Diversification . . . 95

6.3.2 The Diversified Price of Anarchy . . . 97

6.3.3 The Cost of Diversification . . . 101

6.4 Distributed Setting . . . 103

6.5 Experiments . . . 105

6.6 Conclusion . . . 107

Chapter 7: Communication Efficient Distributed Agnostic Boosting . . . 108

7.1 Introduction . . . 108

7.2 Problem Setup . . . 111


7.2.1 Statistical Learning Problem . . . 111

7.2.2 Extension to the Distributed Setting . . . 112

7.3 Distributed Agnostic Boosting . . . 112

7.3.1 Agnostic Boosting: Centralized Version . . . 113

7.3.2 Agnostic Boosting: Distributed Version . . . 115

7.4 Experiments . . . 120

7.4.1 Experiment Setup . . . 120

7.4.2 Synthetic Dataset . . . 120

7.4.3 Real-world Datasets . . . 121

7.5 Conclusions . . . 122

III Applying AI to Protect Enterprise and Society 124

Chapter 8: Predicting Cyber Threats with Virtual Security Products . . . 126

8.1 Introduction . . . 126

8.2 Related Work . . . 130

8.3 Proposed Model: Virtual Product . . . 132

8.3.1 Semi-Supervised Non-negative Matrix Factorization (SSNMF) . . . 133

8.3.2 Optimization Algorithm . . . 135

8.4 Evaluation . . . 137

8.4.1 Data Collection . . . 137

8.4.2 Experiment Setup and Overview . . . 139

8.4.3 Evaluation on Reconstruction Accuracy . . . 140

8.4.4 Evaluation: Incident Detection and Categorization . . . 142


8.4.5 Evaluation of Computational Cost . . . 146

8.4.6 Improvement in Analyst Response Predictions . . . 147

8.5 Impact and Deployment . . . 148

8.5.1 Case Studies and Impact . . . 148

8.5.2 Deployment . . . 153

8.6 Conclusions and Discussion . . . 154

Chapter 9: Firebird: Predicting Fire Risk and Prioritizing Fire Inspections in Atlanta . . . 155

9.1 Introduction . . . 155

9.2 Related Work . . . 159

9.3 Data Description . . . 161

9.3.1 Data Sources . . . 161

9.3.2 Data Joining . . . 163

9.4 Identifying New Properties Needing Inspection . . . 164

9.5 Predictive Model of Fire Risk . . . 166

9.5.1 Data Cleaning . . . 166

9.5.2 Feature Selection . . . 166

9.5.3 Evaluation of the Models . . . 167

9.5.4 Further Discussion of the Models . . . 169

9.5.5 Assignment of Risk Scores . . . 170

9.6 Impact On AFRD and Atlanta . . . 172

9.6.1 Previous Inspection Process . . . 172

9.6.2 Technology Transfer to AFRD . . . 172


9.6.3 Impact on AFRD Processes . . . 174

9.7 Challenges . . . 176

9.8 Conclusions and Future Research Directions . . . 178

Chapter 10: Conclusions . . . 181

10.1 Contributions . . . 181

10.2 Future Research Directions . . . 182

References . . . 199


LIST OF TABLES

1.1 Thesis outline, and publications contributing to each part. . . 2

3.1 Our high-confidence perturbations succeed in attacks at a variety of distances and angles. For each distance-angle combination, we show the detected class and the confidence score. If more than one bounding box is detected, we report the highest-scoring one. Confidence values lower than 30% are considered undetected. . . 31

3.2 As expected, low-confidence perturbations achieve lower success rates. . . . 32

3.3 Sample high-confidence perturbations from indoor experiments. For complete experiment results, please refer to Table 3.1. . . 33

3.4 Black-box transferability of our 3 perturbations. We report the number of images (of the 15 angle-distance images) that failed to transfer to the specified model. We consider the detection of any stop sign a “failure to transfer.” Our perturbations fail to transfer for most models, most likely due to the iterative nature of our attack. . . 36

4.1 Summary of model accuracies (in %) for all defenses: SHIELD [20, 40, 60, 80], JPEG, median filter (MF), and total variation denoising (TVD); vs. all attacks: Carlini-Wagner L2 (CW-L2), DeepFool (DF), I-FGSM, and FGSM. While all techniques are able to recover accuracies from CW-L2 and DF attacks with lower perturbation strength, the best performing settings are from JPEG (in bold font). SHIELD benefits from the combination of SLQ, vaccination, and ensembling, and outperforms all other techniques when facing high perturbation delivered by I-FGSM and FGSM. We use κ = 0 in CW-L2 and ε = 4 in FGSM and I-FGSM. . . 51


4.2 Comparison of two ensemble schemes with SHIELD, when defending against FGSM. Mq × q corresponds to each model Mq voting on each JPEG quality q from q ∈ {20, 30, 40, ..., 90}. In Mq − q, each model Mq votes only on q, the JPEG quality it was trained on. SHIELD offers a favorable trade-off, providing at least 2x speed-up as compared to larger ensembles, while delivering comparable accuracies. . . 54

4.3 JPEG compression as defense does not reduce model accuracy significantly on transferred attacks with low perturbation. Adversarial images crafted using the ResNet-v2 50 model are protected using JPEG alone and Stochastic Local Quantization (SLQ), before being fed into two other models: Inception-v4 (Inc-v4) and ResNet-v2 101 (RN-v2 101). . . 55

5.1 Symbols and Definition . . . 65

5.2 Class-Feature Matrix. Top: dots mark classes’ features. Bottom: four class sets with varying levels of feature overlap. . . 78

5.3 Number of images used in training models K and M . . . 79

5.4 Number of ImageNet images used to evaluate UNMASK. Only the images that can be successfully perturbed by the attack are used, thus the variations in numbers. We report values for PGD and FGSM with ε=16. The numbers for ε=8 are similar. . . 79

5.5 Four class sets investigated in our evaluation, with varying number of classes and feature overlap. . . 79

5.6 UNMASK’s accuracies (in %) in countering three attacks: DeepFool, FGSM, and Projected Gradient Descent (PGD). We test two popular CNN architectures, VGG16 and ResNet50, as unprotected model M, with four class sets with varying numbers of classes and feature overlap. We use ε = 16 for FGSM and PGD in this experiment. We show the models’ accuracies (1) when not under attack (“No Attk” column); (2) attacked without defense (“DeepFool (No Def)”); for both FGSM and PGD, the accuracies drop to 0 without defense and are omitted in this table; and (3) attacked and defended by UNMASK (the last three columns). . . 80

7.1 Average (over 10 trials) error rate (%) and standard deviation on the synthetic dataset . . . 121

7.2 Average (over 10 trials) error rate (%) and standard deviation on real-world datasets . . . 121


7.3 Average run time (sec) on real-world datasets . . . 122

8.1 A long list of inconclusive alerts generated in a real incident of a machine infected by the infamous Zbot Trojan. These alerts overwhelm a cybersecurity analyst, and do not help answer important questions such as: Is this machine compromised? How severe is the attack? What actions should be taken? Our technique, Virtual Product, correctly predicts the presence of the infamous Zbot Trojan, which would have been identified by an AV product, had it been installed. . . 127

8.2 Terminology used in this chapter . . . 131

8.3 Summary of the training datasets (Jul-Sept) for the top five products that detect the most incidents. . . 137

8.4 Summary of validation datasets (Oct-Dec). . . 139

8.5 Performance of reconstruction on all five datasets . . . 141

8.6 Our approach (VP) detects security incidents with high accuracies (AUCs) across all five datasets, outperforming the baseline model (LR). . . 143

8.7 True positive rate (TPR) of incident detection on all five data sets at 10% false positive rate (FPR). Our approach (VP) outperforms the baseline (LR) . . 143

8.8 Average F1 scores of incident categorization on our datasets. We do not include EP2 because over 99% of the detected incidents belong to one single incident type. . . 146

8.9 Average virtual product training times (over 10 runs). . . 147

8.10 Virtual Product correctly predicts that FirewallB would have detected an incident, and 10 of its top 11 predicted alerts coincide with the ones that actually occurred, yielding a clearer picture of the artifacts involved in the attack and the vulnerabilities used. The incorrect prediction is shown in strikeout font. . . 149

8.11 An attack on a webserver is obviously underway, but was it successful? Virtual Product correctly predicts, with 99.9% confidence, not only that a deployed AV product would have detected attacks on the machine, but also that the attack successfully infected the system. . . 150


8.12 There are indications of possible ransomware activity, but how did the attack appear on the machine in the first place? Virtual Product correctly indicates that a malicious spreadsheet (detected as Bloodhound.Exploit.170) was at fault, a method by which the Locky ransomware has been known to propagate. . . 151

9.1 Summary of inspection and building lists . . . 161

9.2 Firebird Data Sources Summary . . . 162

9.3 Testing AUC of each year . . . 170

9.4 Top-10 features in Random Forest . . . 171


LIST OF FIGURES

1.1 My work on physical adversarial attack discovers a serious vulnerability of DNNs in a more realistic threat model where the attacker does not need to have control over the internal computer vision system pipeline. The crafted physical adversarial objects (e.g., fake stop signs) can fool the state-of-the-art object detectors. . . 3

1.2 Snapshots of a drive-by test result. The real stop sign is correctly predicted by Faster R-CNN with high confidence. The adversarial stop sign crafted by ShapeShifter is detected as the target class “person.” . . . 4

1.3 UnMask combats adversarial attacks (in red) by extracting building-block knowledge (e.g., wheel) from the image (top, in green), and comparing them to expected features of the classification (“Bird” at bottom) from the unprotected model. Low feature overlap signals attack. UnMask rectifies misclassification using the image’s extracted features. Our approach detects 92.9% of gray-box attacks (at 9.67% false positive rate) and defends the model by correctly classifying up to 92.24% of adversarial images crafted by the strongest attack, Projected Gradient Descent. . . 5

1.4 For ε ∈ [1/n, 1], we define a probability distribution P to be ε-diversified if P(i) ≤ 1/(εn) for all i. A distribution can be diversified through a Bregman projection onto the set of all ε-diversified distributions. A mixed strategy determined by a diversified distribution is called a diversified (mixed) strategy. We explore properties of such diversified strategies in both zero-sum and general-sum games as well as give algorithmic guarantees. . . 6

1.5 Our distributed SmoothBoost algorithm. In each iteration, (1) each machine samples its own data based on the current data distribution and sends it to the Center; (2) Center trains an ML model by using some weak learning algorithm and broadcasts the trained model to all the machines; (3) each machine updates its data distribution (i.e., example weights) based on the received model, and performs distributed Bregman projection to ensure the distribution is diversified. All the weak models are combined at the end to obtain a strong model. . . 7


1.6 Virtual Product helps our user Sam discover and understand cyber-threats, and informs deployment decisions (e.g., add firewall?) through semi-supervised non-negative matrix factorization on telemetry data from other users (with firewalls deployed). In the data matrix, each row represents a machine-day, and each column a security event’s occurrences. Missing events from undeployed products are shown as gray blocks. The last column indicates whether the firewall has detected an incident. Our virtual firewall serves as a proxy to the actual firewall and predicts the occurrence of security events and incidents Sam might observe (dark green block) if he deploys the firewall. . . 8

1.7 Firebird Framework Overview. By combining 8 datasets, Firebird identifies new commercial properties for fire inspections. Its fire risk predictive models (SVM, random forest) and interactive map help AFRD prioritize fire inspections and personnel allocation. . . 9

3.1 Illustration motivating the need of physical adversarial attack, because attackers typically do not have full control over the computer vision system pipeline. . . 20

3.2 Digital perturbations we created using our method. Low confidence perturbations on the top and high confidence perturbations on the bottom. . . 29

3.3 Indoor experiment setup. We take photos of the printed adversarial sign, from multiple angles (0, 15, 30, 45, 60, from the sign’s tangent), and distances (5’ to 40’). The camera locations are indicated by the red dots, and the camera always points at the sign. . . 30

3.4 Snapshots of the drive-by test results. In (a), the person perturbation was detected in 47% of the frames as a person and only once as a stop sign. The perturbation in (b) was detected 36% of the time as a sports ball and never as a stop sign. The untargeted perturbation in (c) was detected as bird 6 times and never detected as a stop sign or anything else for the remaining frames. . . 35

3.5 Example stop signs from the MS-COCO dataset. Stop signs can vary by language, by degree of occlusion by stickers or modification by graffiti, or just elements of the weather. Each stop sign in the images is correctly detected by the object detector with high confidence (99%, 99%, 99%, and 64%, respectively). . . 36


4.1 SHIELD Framework Overview. SHIELD combats adversarial images (in red) by removing perturbation in real-time using Stochastic Local Quantization (SLQ) and an ensemble of vaccinated models which are robust to the compression transformation. Our approach eliminates up to 98% of gray-box attacks delivered by strong adversarial techniques such as Carlini-Wagner’s L2 attack and DeepFool. . . 40

4.2 SHIELD uses Stochastic Local Quantization (SLQ) to remove adversarial perturbations from input images. SLQ divides an image into 8 × 8 blocks and applies a randomly selected JPEG compression quality (20, 40, 60 or 80) to each block to mitigate the attack. . . 45

4.3 Carlini-Wagner-L2 (CW-L2) and DeepFool, two recent strong attacks, introduce perturbations that lower model accuracy to around 10% (∅). JPEG compression recovers up to 98% of the original accuracy (with DeepFool), while SHIELD achieves similar performance, recovering up to 95% of the original accuracy (with DeepFool). . . 47

4.4 SHIELD recovers the accuracy of the model when attacked with I-FGSM (left) and FGSM (right). Both charts show the accuracy of the model when undefended (gray dotted curve). Applying varying JPEG compression qualities (purple curves) helps recover accuracy significantly, and SHIELD (orange curve) is able to recover more than any single JPEG-defended model. . . 49

4.5 Runtime comparison for three defenses: (1) total variation denoising (TVD), (2) median filter (MF), and (3) JPEG compression, timed using the full 50k ImageNet validation images, averaged over 3 runs. JPEG is at least 22x faster than TVD, and 14x faster than MF. . . 52

4.6 Vaccinating a model by retraining it with compressed images helps recover its accuracy. Each plot shows the model accuracies when preprocessing with different JPEG qualities under the FGSM attack. Each curve in the plot corresponds to a different model. The gray dotted curve corresponds to the original unvaccinated ResNet-v2 50 model. The orange and purple curves correspond to the models retrained on JPEG qualities 80 and 20, respectively. Retraining on JPEG compressed images and applying JPEG preprocessing helps recover accuracy in a gray-box attack. . . 54


5.1 UNMASK framework overview. UNMASK combats adversarial attacks (in red) by extracting building-block knowledge (e.g., wheel) from the image (top, in green), and comparing them to expected features of the classification (“Bird” at bottom) from the unprotected model. Low feature overlap signals attack. UNMASK rectifies misclassification using the image’s extracted features. Our approach detects 92.9% of gray-box attacks (at 9.67% false positive rate) and defends the model by correctly classifying up to 92.24% of adversarial images crafted by the state-of-the-art Projected Gradient Descent attack. . . 62

5.2 UNMASK guards against adversarial image perturbation by extracting building-block features from an image and comparing them to its expected features using Jaccard similarity. If the similarity is below a threshold, UNMASK deems the image adversarial and predicts its class by matching the extracted features with the most similar class. . . 67

5.3 UNMASK’s effectiveness in detecting three attacks: DeepFool, FGSM (ε=8,16), and PGD (ε=8,16). UNMASK’s protection may not be affected strictly based on the number of classes. Rather, an important factor is the feature overlap among classes. UNMASK provides better detection when there are 5 classes (dark orange; 23.53% overlap) than when there are 3 (light blue; 50% overlap). Keeping the number of classes constant and varying their feature overlap also supports our observation about the role of feature overlap (e.g., CS3a at 6.89% vs. CS3b at 50%). . . 71

6.1 For ε ∈ [1/n, 1], we define a probability distribution P to be ε-diversified if P(i) ≤ 1/(εn) for all i. A distribution can be diversified through Bregman projection onto the set of all ε-diversified distributions. A mixed strategy determined by a diversified distribution is called a diversified (mixed) strategy. We explore properties of such diversified strategies in both zero-sum and general-sum games as well as give algorithmic guarantees. . . 84

6.2 Braess’ paradox. Here, k players wish to travel from s to t, and requiring all players to use diversified strategies improves the quality of the equilibrium for everyone. . . 97

6.3 Simulated results of Braess’ paradox after T = 10,000 rounds. A more diversified strategy leads to lower loss. . . 106

6.4 Average reward over T = 10,000 rounds with different values of ε. When the rare event happens, the non-diversified strategy gains very low (even negative) reward. . . 106


8.1 Virtual Product helps our user Sam discover and understand cyber-threats, and informs deployment decisions (e.g., add firewall?) through semi-supervised non-negative matrix factorization on telemetry data from other users (with firewalls deployed). In the data matrix, each row represents a machine-day, and each column a security event’s occurrences. Missing events from undeployed products are shown as gray blocks. The last column indicates if the firewall has detected an incident. Our virtual firewall serves as a proxy to the actual product and predicts the output Sam may observe (dark green block) if he deploys it. . . 129

8.2 Averaged ROC curves from 10-fold cross-validation of Virtual Product on our top five product datasets. . . 144

8.3 ROC curves of the Virtual Product model evaluated using the validation datasets of the five products. . . 145

9.1 Firebird Framework Overview. By combining 8 datasets, Firebird identifies new commercial properties for fire inspections. Its fire risk predictive models (SVM, random forest) and interactive map help AFRD prioritize fire inspections and personnel allocation. . . 156

9.2 Joining eight datasets using three spatial information types (geocode, address, parcel ID). . . 158

9.3 ROC curves of Random Forest and SVM . . . 168

9.4 Interactive map of fires and inspections. The colored circles on the map represent fire incidents, currently inspected properties, and potentially inspectable properties in red, green, and blue, respectively. Inspectors can filter the displayed properties based on property usage type, date of fire or inspection, and fire risk score. Callout: activating the Neighborhood Planning Unit overlay allows an inspector to mouse-over a political subdivision of the city to view its aggregate and percentage of the fires, inspections, and potential inspections. . . 173


SUMMARY

While Artificial Intelligence (AI) has tremendous potential as a defense against real-world cybersecurity threats, understanding the capabilities and robustness of AI remains a fundamental challenge. This dissertation tackles problems essential to the successful deployment of AI in security settings and comprises the following three interrelated research thrusts.

(1) Adversarial Attack and Defense of Deep Neural Networks: We discover vulner- abilities of deep neural networks in real-world settings and the countermeasures to mitigate the threat. We develop ShapeShifter, the first targeted physical adversarial attack that fools state-of-the-art object detectors. For defenses, we develop SHIELD, an efficient defense leveraging stochastic image compression, and UnMask, a knowledge-based adversarial detection and defense framework.

(2) Theoretically-Principled Defense via Game Theory and ML: We develop new theories that guide defense resource allocation to guard against unexpected attacks and catastrophic events, using a novel online decision-making framework that compels players to employ “diversified” mixed strategies. Furthermore, by leveraging the deep connection between game theory and boosting, we develop a communication-efficient distributed boosting algorithm with strong theoretical guarantees in the agnostic learning setting.
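The diversification step behind these strategies is a Bregman projection onto the set of distributions whose entries are all at most some cap. For the KL divergence, that projection amounts to pinning the offending entries at the cap and rescaling the rest proportionally. The sketch below is an illustrative re-implementation under that assumption; the function name, the cap parameter, and the example values are hypothetical and not taken from this dissertation.

```python
# Sketch of a KL-divergence Bregman projection onto the capped simplex
# {p : p_i <= cap, sum_i p_i = 1}. Entries that would exceed the cap are
# pinned there; the remaining mass is split among the rest in proportion
# to the original distribution. Illustrative code, not from the thesis.
def project_capped(q, cap):
    """Return argmin_p KL(p || q) over {p : p_i <= cap, sum_i p_i = 1}."""
    n = len(q)
    assert cap * n >= 1.0, "cap too small to admit any distribution"
    capped = set()
    while True:
        fixed = cap * len(capped)                      # mass pinned at the cap
        free = sum(q[i] for i in range(n) if i not in capped)
        scale = (1.0 - fixed) / free                   # renormalize the rest
        over = [i for i in range(n)
                if i not in capped and q[i] * scale > cap]
        if not over:                                   # no new violations
            return [cap if i in capped else q[i] * scale for i in range(n)]
        capped.update(over)                            # pin violators, repeat

p = project_capped([0.7, 0.2, 0.1], cap=0.5)           # approximately [0.5, 1/3, 1/6]
```

Each pass of the loop pins at least one more entry, so the procedure terminates after at most n passes; the cap plays the role of the diversification bound on any single action's probability.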

(3) Using AI to Protect Enterprise and Society: We show how AI can be used in a real enterprise environment with a novel framework called Virtual Product that predicts potential enterprise cyber threats. Beyond cybersecurity, we also develop the Firebird framework to help municipal fire departments prioritize fire inspections.

Our work has made multiple important contributions to both theory and practice: our distributed boosting algorithm solved an open problem in distributed learning; ShapeShifter motivated a new DARPA program (GARD); Virtual Product led to two patents; and Firebird was highlighted by the National Fire Protection Association as a best practice for using data to inform fire inspections.


CHAPTER 1 INTRODUCTION

Internet-connected devices, such as mobile phones and smart home systems, have become ubiquitous in our everyday lives. The increased connectivity also presents new cybersecurity challenges and creates significant national risks. The number of cyber incidents on federal systems reported to the U.S. Department of Homeland Security increased more than ten-fold between 2006 and 2015 [1].

To defend against these daunting and ever-increasing attacks, artificial intelligence (AI) and machine learning (ML) have been explored and employed by cybersecurity researchers and practitioners. However, even today, researchers do not yet fully understand complex ML models and their capabilities in solving various real-world tasks. The goal of this thesis is to gain a deeper understanding of the capabilities and limitations of AI in security-critical tasks, so that we can develop resilient AI-powered next-generation cybersecurity defenses.

1.1 Thesis Overview and Main Ideas

Many cybersecurity scenarios can be modeled as a game between the defender and the attacker. To design the best security solution, we need to fully understand the capabilities and limitations from both the defense and attack points of view, and how the two sides interact with each other. Recent advances in AI provide great opportunities to fortify security-critical applications. However, AI may also pose new threats and challenges. To solve these challenges, my research innovates at the intersection of AI, cybersecurity, and algorithmic game theory. My thesis includes three parts of research,


spanning both the theory and application of cybersecurity. I make contributions to both the defensive and offensive sides of cybersecurity. Table 1.1 provides a brief overview of my dissertation.

Table 1.1: Thesis outline, and publications contributing to each part.

Part I: Adversarial Attack and Defense of Deep Neural Networks (Chapters 3, 4, 5)

§ ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector.

Shang-Tse Chen, Cory Cornelius, Jason Martin, Duen Horng Chau. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2018.

§ Shield: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression.

Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E. Kounavis, Duen Horng Chau. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2018.

§ Extracting Knowledge For Adversarial Detection and Defense in Deep Learning.

Scott Freitas, Shang-Tse Chen, Duen Horng Chau. In KDD 2019 Workshop on Learning and Mining for Cybersecurity (LEMINCS), 2019.

Part II: Theoretically-Principled Defense via Game Theory and ML (Chapters 6, 7)

§ Diversified Strategies for Mitigating Adversarial Attacks in Multiagent Systems.

Maria-Florina Balcan, Avrim Blum, Shang-Tse Chen. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2018.

§ Communication Efficient Distributed Agnostic Boosting.

Shang-Tse Chen, Maria-Florina Balcan, Duen Horng Chau. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.

Part III: Applying AI to Protect Enterprise and Society (Chapters 8, 9)

§ Predicting Cyber Threats with Virtual Security Products.

Shang-Tse Chen, Yufei Han, Duen Horng Chau, Christopher Gates, Michael Hart, Kevin Roundy. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2017.

§ Firebird: Predicting Fire Risk and Prioritizing Fire Inspections in Atlanta.

Michael Madaio, Shang-Tse Chen, Oliver Haimson, Wenwen Zhang, Xiang Cheng, Matthew Hinds-Aldrich, Duen Horng Chau, and Bistra Dilkina. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016.


Figure 1.1: My work on physical adversarial attack discovers a serious vulnerability of DNNs in a more realistic threat model where the attacker does not need to have control over the internal computer vision system pipeline. The crafted physical adversarial objects (e.g., fake stop signs) can fool the state-of-the-art object detectors.

1.1.1 Part I: Adversarial Attack and Defense of Deep Neural Networks

Recent advances in deep neural networks (DNNs) have generated much optimism about deploying AI in safety-critical applications, such as self-driving cars. However, it has recently been discovered that given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a DNN image classifier [2].

Although many adversarial attack algorithms have been proposed [3, 4], attacking a real-world computer vision system is difficult, because attackers usually do not have the ability to directly manipulate data inside such systems (Figure 1.1). To understand the vulnerabilities of DNN-based computer vision systems, I collaborated with Intel and developed ShapeShifter [5], the first targeted physical adversarial attack on the state-of-the-art Faster R-CNN object detector.

Attacking an object detector is more difficult than attacking an image classifier, as the attack needs to mislead the classifications of multiple bounding boxes at different scales.

Extending a digital attack to the physical world adds another layer of difficulty; this requires the perturbation to be sufficiently robust to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations.

Figure 1.2: Snapshots of a drive-by test result. The real stop sign is correctly predicted by Faster R-CNN with high confidence. The adversarial stop sign crafted by ShapeShifter is detected as the target class “person.”

ShapeShifter generates adversarial stop signs that were consistently mis-detected by Faster R-CNN as the target objects in real drive-by tests (Figure 1.2), posing a potential threat to autonomous vehicles and other safety-critical computer vision systems. Our code is open-sourced and the drive-by test videos are publicly available1. ShapeShifter was highlighted as the state-of-the-art physical adversarial attack in the recent DARPA program “Guaranteeing AI Robustness against Deception” (GARD), which focuses on defending against such attacks.

Although there have been many attempts to mitigate adversarial attacks, completely protecting a DNN model from adversarial attacks remains an open problem. Most methods suffer from significant computational overhead or sacrifice accuracy on benign data. In collaboration with Intel, we developed SHIELD [6], a practical defense that leverages stochastic compression to remove adversarial perturbations. SHIELD makes multiple positive impacts on Intel’s research and product development plans. Utilizing Intel’s Quick Sync Video (QSV) technology, with dedicated hardware for high-speed video processing, we pave the way for real-time defense in safety-critical applications, such as autonomous vehicles. Our research sparked insightful discussion at Intel about secure deep learning, which necessitates tight integration of practical defense strategies, software platforms, and hardware accelerators. Our work will accelerate the industry’s emphasis on this important topic. Both ShapeShifter and SHIELD have been incorporated into MLsploit [7], an open-sourced ML evaluation and fortification framework designed for education and research.

1https://github.com/shangtse/robust-physical-attack

Figure 1.3: UnMask combats adversarial attacks (in red) by extracting building-block knowledge (e.g., wheel) from the image (top, in green), and comparing it to the expected features of the classification (“Bird” at bottom) from the unprotected model. Low feature overlap signals an attack. UnMask rectifies the misclassification using the image’s extracted features. Our approach detects 92.9% of gray-box attacks (at a 9.67% false positive rate) and defends the model by correctly classifying up to 92.24% of adversarial images crafted by the strongest attack, Projected Gradient Descent.

These two works are also part of the Intel AI Academy course.
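To convey the intuition behind SHIELD’s stochastic preprocessing, the sketch below quantizes each 8×8 block of a grayscale image with a randomly chosen step size. This is a deliberately simplified stand-in: the real defense applies JPEG compression at randomly selected quality levels, and every name and parameter here is an illustrative assumption.

```python
import random

def stochastic_quantize(image, steps=(8, 16, 32, 64), block=8, seed=None):
    """Re-quantize each block of a grayscale image (2-D list of ints in
    0..255) with a randomly chosen step size, destroying small, carefully
    tuned pixel perturbations while roughly preserving image content."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            step = rng.choice(steps)  # per-block randomness defeats adaptive attackers
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = min(255, (image[y][x] // step) * step + step // 2)
    return out
```

Because the quantization level is drawn at random per block, an attacker cannot precompute a perturbation that survives the exact preprocessing the defender will apply.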

SHIELD is best suited for defending against imperceptible perturbations. To defend against ShapeShifter-style attacks, we developed UnMask, a knowledge-based adversarial detection and defense framework. UnMask protects models by verifying that an image’s predicted class (e.g., “bird”) contains the expected building blocks (e.g., beak, wings, eyes). For example, if an image is classified as “bird”, but the extracted building blocks are wheel, seat and frame, the model may be under attack. When UnMask detects such attacks, it can rectify the misclassification by re-classifying the image based on its extracted building blocks (Figure 1.3).
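The detect-then-rectify logic described above can be sketched as a set-overlap test. The two-class knowledge base, the Jaccard score, and the 0.5 threshold below are illustrative assumptions, not the thesis’s actual components (the real system extracts building blocks with Mask R-CNN):

```python
CLASS_FEATURES = {                       # toy knowledge base (illustrative)
    "bird": {"beak", "wings", "eyes", "legs"},
    "bicycle": {"wheel", "frame", "seat", "handlebar"},
}

def jaccard(a, b):
    """Overlap between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def unmask(predicted, extracted, knowledge=CLASS_FEATURES, threshold=0.5):
    """Accept the prediction when the extracted building blocks match the
    expected ones; otherwise flag an attack and re-classify the image by
    its features. Returns (label, attack_detected)."""
    if jaccard(extracted, knowledge[predicted]) >= threshold:
        return predicted, False
    best = max(knowledge, key=lambda c: jaccard(extracted, knowledge[c]))
    return best, True
```

For instance, a “bird” prediction whose extracted parts are {wheel, seat, frame} is flagged and rectified to “bicycle”.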


Figure 1.4: For ε ∈ [1/n, 1], we define a probability distribution P to be ε-diversified if P(i) ≤ 1/(εn) for all i. A distribution can be diversified through a Bregman projection onto the set of all ε-diversified distributions. A mixed strategy determined by a diversified distribution is called a diversified (mixed) strategy. We explore properties of such diversified strategies in both zero-sum and general-sum games, as well as give algorithmic guarantees.

1.1.2 Part II: Theoretically-Principled Defense via Game Theory and ML

Defense resource allocation is a well-known and critical task in security. For example, a company that wants to implement security controls with a limited budget needs to make trade-offs in its deployment. I modeled this problem as a two-player zero-sum game between a defender and an attacker, and introduced a novel solution concept called diversified mixed strategy [8].

Inspired by the proverb “don’t put all your eggs in one basket,” my new solution concept compels players to employ a “diversified” strategy that does not place too much weight on any one action. I systematically studied properties of diversified strategies in multiple games, and designed efficient algorithms that asymptotically achieve the optimum reward within the family of diversified strategies. As a result, these algorithms limit the exposure to adversarial or catastrophic events while still performing successfully in typical cases.
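Operationally, the Bregman (KL) projection onto the set of ε-diversified distributions has a water-filling form: weights above the cap 1/(εn) are clipped, and the remaining probability mass is spread proportionally over the uncapped coordinates. The self-contained sketch below assumes that characterization; it is an illustration, not the dissertation’s implementation.

```python
def project_diversified(p, eps):
    """Project distribution p (non-negative floats summing to 1) onto
    {q : q[i] <= 1/(eps * n)} under KL divergence, for eps in [1/n, 1].
    Repeatedly cap oversized weights and rescale the rest to fill the
    remaining mass, until no new coordinate exceeds the cap."""
    n = len(p)
    cap = 1.0 / (eps * n)
    q, capped = list(p), [False] * n
    while True:
        free = 1.0 - cap * sum(capped)   # mass left for uncapped coordinates
        s = sum(q[i] for i in range(n) if not capped[i])
        grew = False
        for i in range(n):
            if not capped[i]:
                v = q[i] * free / s
                if v > cap:
                    capped[i], grew = True, True   # this weight hits the cap too
                else:
                    q[i] = v
        if not grew:
            return [cap if capped[i] else q[i] for i in range(n)]
```

For example, projecting [0.7, 0.1, 0.1, 0.1] with ε = 0.5 caps the first weight at 0.5 and rescales the remaining three to 1/6 each.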

Leveraging the deep connection between game theory, online learning, and boosting, I proved that the proposed diversified strategy concept can also be used to learn robust and efficient ML models. Specifically, I solved an open problem listed in [9] by developing a boosting-based approach [10] in one of the hardest and most general settings in distributed learning, where data is adversarially partitioned across multiple locations and can contain arbitrary forms of noise (Figure 1.5). Succinctly, since boosting algorithms tend to place too much weight on outliers, we project the weights back onto the set of diversified distributions at the end of each boosting iteration. Our algorithm is simultaneously noise tolerant, communication efficient, and computationally efficient. This is a significant improvement over prior works, which either were communication efficient only in noise-free scenarios or were computationally prohibitive. Our distributed boosting algorithm is not only theoretically principled but also achieves excellent accuracy on real-world datasets.

Figure 1.5: Our distributed SmoothBoost algorithm. In each iteration, (1) each machine samples its own data based on the current data distribution and sends it to the center; (2) the center trains an ML model using some weak learning algorithm and broadcasts the trained model to all machines; (3) each machine updates its data distribution (i.e., example weights) based on the received model, and performs a distributed Bregman projection to ensure the distribution is diversified. All the weak models are combined at the end to obtain a strong model.
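A toy single-machine rendition of the loop in Figure 1.5 can make the moving parts concrete. The threshold-stump weak learner, the (1 − γ) multiplicative update, and the clip-and-renormalize step below are illustrative simplifications; in particular, the last step is a crude stand-in for the exact distributed Bregman projection.

```python
def smoothboost_sketch(xs, ys, rounds=10, gamma=0.2, eps=0.5):
    """Boost threshold stumps on 1-D data (labels in {0, 1}), keeping the
    example-weight distribution roughly eps-diversified each round."""
    n = len(xs)
    cap = 1.0 / (eps * n)
    w = [1.0 / n] * n                     # example weights (a distribution)
    stumps = []
    thresholds = sorted(set(xs))
    for _ in range(rounds):
        # (2) weak learner: threshold stump minimizing weighted error
        def err(t, s):
            return sum(w[i] for i in range(n)
                       if ((s * (1 if xs[i] >= t else -1)) > 0) != (ys[i] == 1))
        _, t, s = min((err(t, s), t, s) for t in thresholds for s in (1, -1))
        stumps.append((t, s))
        # (3a) shrink weights of correctly classified examples (SmoothBoost-style)
        for i in range(n):
            correct = ((s * (1 if xs[i] >= t else -1)) > 0) == (ys[i] == 1)
            if correct:
                w[i] *= 1.0 - gamma
        # (3b) keep the distribution diversified: clip at the cap, renormalize
        total = sum(w)
        w = [min(cap, wi / total) for wi in w]
        total = sum(w)
        w = [wi / total for wi in w]
    def predict(x):                       # combine weak models by majority vote
        votes = sum(s * (1 if x >= t else -1) for t, s in stumps)
        return 1 if votes > 0 else 0
    return predict
```

The capping step prevents any single (possibly noisy) example from dominating the weight distribution, which is exactly the role the diversified projection plays in the full algorithm.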

1.1.3 Part III: Applying AI to Protect Enterprise and Society

Parts I and II provide theories, algorithms, and insights into the capabilities and limitations of AI. But how can we put AI into practice to solve real enterprise security problems and create positive societal impact? In collaboration with Symantec, we developed the patented Virtual Product framework, the first method to predict security events and high-severity incidents that would have been identified by a security product had it been deployed. This is made possible by learning from the vast amounts of telemetry data produced by the prevalent defense-in-depth approach to computer security, wherein multiple security products are deployed alongside each other and produce highly correlated alert data. By studying this data, we are able to accurately predict which security alerts a product would have triggered in a particular situation, even though it was not deployed. See Figure 1.6 for an overview of our approach.

Figure 1.6: Virtual Product helps our user Sam discover and understand cyber threats, and informs deployment decisions (e.g., add a firewall?) through semi-supervised non-negative matrix factorization on telemetry data from other users (with firewalls deployed). In the data matrix, each row represents a machine-day, and each column a security event’s occurrences. Missing events from undeployed products are shown as gray blocks. The last column indicates whether the firewall has detected an incident. Our virtual firewall serves as a proxy to the actual firewall and predicts the occurrences of security events and incidents Sam might observe (dark green block) if he deploys the firewall.
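The completion idea behind the virtual firewall can be illustrated with a masked non-negative matrix factorization: fit the factors only on observed entries, then read predictions for the gray (missing) blocks off the reconstruction. This unsupervised sketch with standard multiplicative updates is a simplified stand-in for the semi-supervised factorization Virtual Product actually uses; all names and parameters are illustrative.

```python
import numpy as np

def masked_nmf(M, mask, rank=2, iters=300, seed=0):
    """Factor M ~ W @ H (all entries non-negative), fitting only entries
    where mask == 1. Missing entries (mask == 0) are ignored during
    training and later predicted by the reconstruction W @ H."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    for _ in range(iters):
        R = mask * (W @ H)                      # reconstruction on observed cells
        W *= ((mask * M) @ H.T) / (R @ H.T + 1e-9)
        R = mask * (W @ H)
        H *= (W.T @ (mask * M)) / (W.T @ R + 1e-9)
    return W, H
```

On telemetry-like data, a row of W @ H fills in how a machine-day would likely have scored on events of a product that was never deployed there.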

Beyond cybersecurity, I further explored novel applications of AI in various domains that create positive societal impact. In collaboration with the Atlanta Fire Rescue Department, we developed the Firebird framework [11] (Figure 1.7), which helps municipal fire departments identify and prioritize commercial property fire inspections. Firebird computes fire risk scores for over 5,000 buildings in Atlanta and correctly predicts 71% of fires. Firebird won the Best Student Paper Award Runner-up at KDD 2016 and was highlighted by the National Fire Protection Association as a best practice for using data to inform fire inspections.

Figure 1.7: Firebird Framework Overview. By combining 8 datasets, Firebird identifies new commercial properties for fire inspections. Its fire risk predictive models (SVM, random forest) and interactive map help AFRD prioritize fire inspections and personnel allocation.

1.2 Thesis Statement

Uniquely combining techniques from AI, cybersecurity, and algorithmic game theory enables the development of next-generation strong cybersecurity defenses, contributing to:

1. New theories that guide defense resource allocation to guard against surprise attacks and catastrophic events;

2. New scalable and robust machine learning algorithms for a variety of threat models;

3. New applications of AI for predicting enterprise cyber threats and prioritizing fire inspections.


1.3 Research Contributions

The goal of this thesis is to develop robust AI, and to apply AI to solve security-critical and high-stakes problems. Our research contributes to multiple facets of AI and cybersecurity.

New Algorithms:

• Our ShapeShifter attack is the first robust targeted attack that can fool a state-of-the-art Faster R-CNN object detector. (Chapter 3)

• Our SHIELD defense combines image compression and randomization to protect neural networks from adversarial attacks in real-time. (Chapter 4)

• Our distributed boosting algorithm is simultaneously noise tolerant, communication efficient, and computationally efficient. (Chapter 7)

New Theories:

• We introduce a new online decision-making setting in game theory where players are compelled to play “diversified” strategies, and give strong guarantees on both the price of anarchy and the social welfare in this setting. (Chapter 6)

• Our distributed boosting algorithm requires exponentially lower communication complexity in the agnostic setting, solving an open problem in distributed learning [9]. (Chapter 7)

New Applications:

• Our Virtual Product framework (Chapter 8) is the first method to predict the security events and high-severity incidents that a security product would have identified had it been deployed.

• Our Firebird framework (Chapter 9) computes fire risk scores for over 5,000 buildings in the city, with true positive rates of up to 71% in predicting fires.


1.4 Impact

This thesis work has made a significant impact on society:

• My thesis ideas in developing theoretically principled, practical techniques to defend ML-based systems directly contributed to two funded competitive grant awards:

– Our theory-guided decision making framework (Chapter 6) laid the foundation of the $1.2M medium NSF grant Understanding and Fortifying Machine Learning Based Security Analytics (NSF CNS 1704701);

– ShapeShifter and SHIELD (Chapters 3 and 4) were two highlights of the $1.5M Intel “gift” grant for the Intel Science & Technology Center for Adversary-Resilient Security Analytics (ISTC-ARSA);

• Our ShapeShifter attack, developed with Intel, reveals serious vulnerabilities for autonomous vehicles that use pure vision-based input, and was highlighted as the state-of-the-art physical adversarial attack in the recent DARPA program “Guaranteeing AI Robustness against Deception” (GARD). Our work appeared in the media2 and is open-sourced at https://github.com/shangtse/robust-physical-attack.

• ShapeShifter and SHIELD have been integrated into the Intel AI Academy course.

• Our Virtual Product framework, developed with Symantec, has led to two patents.

• Our Firebird project is open-sourced3 and has been used by the Atlanta Fire Rescue Department to prioritize fire inspections. Firebird won the Best Student Paper Award Runner-up at KDD 2016 and was highlighted by the National Fire Protection Association as a best practice for using data to inform fire inspections.

2https://techhq.com/2018/10/study-reveals-new-vulnerability-in-self-driving-cars/

3http://firebird.gatech.edu


• My thesis research on AI-infused security has been recognized by the 2018 IBM PhD Fellowship.


CHAPTER 2

SURVEY

Our survey focuses on two important areas of research related to this thesis: security of ML and applications of ML in cybersecurity.

2.1 Security of Machine Learning

We briefly survey robust machine learning algorithms under various threat models.

Random Classification Noise. This is one of the most basic threat models studied in classic learning theory [12]. In this setting, the training and testing data come from the same fixed but unknown distribution. However, the label of each training example presented to the learning algorithm is randomly flipped with probability 0 ≤ η < 1/2. Here we only consider the binary classification case; η is a parameter called the classification noise rate. It is known that any training algorithm in the family of statistical query (SQ) learning can be converted into a noise-tolerant algorithm in the random classification noise setting [13].
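The noise process itself is straightforward to simulate; `flip_labels` below is a hypothetical helper written for illustration, not a routine from the literature.

```python
import random

def flip_labels(labels, eta, seed=0):
    """Random classification noise: independently flip each binary label
    (0 or 1) with probability eta, where 0 <= eta < 1/2."""
    rng = random.Random(seed)
    return [1 - y if rng.random() < eta else y for y in labels]
```

Over many examples, roughly an η fraction of the labels come out flipped, which is exactly the corruption a noise-tolerant learner must cope with.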

Malicious Noise. This setting is similar to the random classification noise model, in which an η fraction of the training examples are changed by the adversary. The difference is that the adversary can arbitrarily change not only the labels but also the features of the training examples, making this a notoriously difficult setting [14]. It has been proved that it is information-theoretically impossible to learn to accuracy 1 − ε if η > ε/(1 + ε) [15]. Most of the positive results require strong assumptions on the underlying data distribution or the target function [16, 17]. For learning linear separators, the current state-of-the-art method was developed by Awasthi et al. [18].


Agnostic Learning. This is the setting we study in Chapter 7. In the two aforementioned settings, with probability 1 − η the examples are labeled by an unknown target function from a known hypothesis set. For example, in the case of learning a linear classifier, a 1 − η fraction of the examples are linearly separable, and the remaining examples are contaminated by random classification noise or malicious noise, respectively. In contrast, in the agnostic learning setting, we make no assumptions about the data distribution or the target function [19]. Since the target function may not belong to the hypothesis set that the training algorithm uses, the goal is to achieve accuracy as close as possible to that of the best hypothesis in the set.

Adversarial Machine Learning. This line of research was first studied by cybersecurity researchers in applications such as spam filtering [20], network intrusion detection [21], and malware detection [22]. Depending on the stage at which an attacker can manipulate data, adversarial attacks can be further categorized into causative attacks and exploratory attacks [23].

A causative attack, also known as a poisoning attack, refers to the setting where the attacker can manipulate the training data in order to decrease accuracy on all or a subset of the test examples. For example, the attacker can add backdoors to a maliciously trained traffic sign image classifier such that it achieves high overall test accuracy but classifies stop signs as speed limit signs whenever a special sticker is attached to the stop sign [24]. Similarly, one can also train networks for face recognition and speech recognition that only perform malicious behaviors when a specific “trojan” trigger is present [25].

In an exploratory attack, also called an evasion attack, the attacker can only change the test examples to fool a trained ML model. The success of deep neural networks (DNNs) in computer vision does not exempt them from this threat. It is possible to reduce the accuracy of a state-of-the-art DNN image classifier to zero percent by adding imperceptible adversarial perturbations [2, 26]. Many new attack algorithms have been proposed [27, 28, 29, 30] and applied to other domains such as malware detection [31, 32], sentiment analysis [33], and reinforcement learning [34, 35]. In Chapter 3, we demonstrate a new attack in a slightly different setting called physical adversarial attacks. There have been various attempts to mitigate the threat of adversarial attacks [4, 36], but immunizing a DNN model against adversarial attacks remains an open problem and an active research area. In Chapters 4 and 5, we propose new methods toward this goal.

2.2 Applications of Machine Learning to Cybersecurity

Malware Detection. Traditional anti-malware software depends heavily on signature-based methods, which use static fingerprints of known malware to detect future malicious files [37]. However, these methods can only identify “known” malware for which signatures have been created, and hence can be easily evaded by more advanced attack techniques such as polymorphism and obfuscation [38, 39]. Many machine learning based approaches, using various feature extraction techniques and learning algorithms, have thus been explored [40, 41, 42, 43]. Reputation-based approaches using graph mining are another popular line of research [44, 45].

Intrusion Detection System. The main task of an intrusion detection system (IDS) is to monitor a system for vulnerability exploits and attacks. Similar to malware detection, early work on IDS used signature-based approaches [46], which have limited ability to detect zero-day attacks. Anomaly-based detection instead models normal internet traffic or system behavior using machine learning and data mining methods, and detects deviations from the baseline behavior [47, 48].

Online Fraudulent Behavior Detection. AI helps many websites provide better services, but it also creates new vulnerabilities. For example, an adversary can create fake accounts and write fraudulent reviews to manipulate reputation-based recommendation systems. Researchers have used data mining and machine learning techniques to detect fake reviews [49, 50], internet bots [51], auction fraud [52], insider trading [53], and credit card fraud [54].

A good defense requires a combination of several techniques such as natural language processing, graph mining, and time series analysis.


Part I

Adversarial Attack and Defense of Deep

Neural Networks


OVERVIEW

Deep neural networks (DNNs), although very powerful, are known to be vulnerable to adversarial attacks. In computer vision applications, such attacks can be achieved by adding carefully crafted but visually imperceptible perturbations to input images.

The threat of adversarial attacks casts a shadow over deploying DNNs in security- and safety-critical applications, such as self-driving cars. To better understand and fix the vulnerabilities, there is a growing body of research on both designing stronger attacks and making DNN models more robust. However, many existing works are “impractical”, either because they assume an unrealistic threat model or because the defense is too computationally expensive to use in practice. In Part I of my thesis, we present the following practical attack and defenses.

• ShapeShifter (Chapter 3) is the first “physical” adversarial attack that fools the state-of-the-art object detector.

• SHIELD (Chapter 4) is an efficient defense leveraging stochastic image compression.

• UnMask (Chapter 5) is a knowledge-based adversarial detection and defense framework.


CHAPTER 3

SHAPESHIFTER: ROBUST PHYSICAL ADVERSARIAL ATTACK ON OBJECT DETECTOR

Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes at different scales. Extending a digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions like different viewing distances and angles, lighting conditions, and camera limitations. We show that the Expectation over Transformation technique, which was originally proposed to enhance the robustness of adversarial perturbations in image classification, can be adapted to the object detection setting. ShapeShifter can generate adversarially perturbed stop signs that Faster R-CNN consistently mis-detects as other objects, posing a potential threat to autonomous vehicles and other safety-critical computer vision systems.

3.1 Introduction

Adversarial examples are input instances that are intentionally designed to fool a machine learning model into producing a chosen prediction. The success of DNNs in computer vision does not exempt them from this threat. It is possible to bring the accuracy of a state-of-the-art DNN image classifier down to zero percent by adding imperceptible adversarial perturbations [2, 26]. The existence of adversarial examples not only reveals intriguing theoretical properties of DNNs, but also raises serious practical concerns about their deployment in security- and safety-critical systems. Autonomous vehicles are an example application that cannot be fully trusted until DNNs are robust to adversarial attacks. The need to understand the robustness of DNNs attracts tremendous interest among machine learning, computer vision, and security researchers.

Figure 3.1: Illustration motivating the need for physical adversarial attacks: attackers typically do not have full control over the computer vision system pipeline.

Although many adversarial attack algorithms have been proposed, using them to attack a real-world computer vision system is difficult. First of all, many of these existing attack algorithms focus on the image classification task, yet many real-world use cases involve more than one object per image. Object detection, which recognizes and localizes multiple objects in an image, is a more suitable model for many vision-based real-world use cases. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes at different scales [55].

Further difficulty comes from the fact that a DNN is usually only one component in a complete computer vision system pipeline. For many applications, attackers do not have the ability to directly manipulate data inside the pipeline. Instead, they can only manipulate the things outside of the system, i.e., those things in the physical environment.

Figure 3.1 illustrates the intuition behind physical adversarial attacks. To be successful, physical adversarial attacks must be robust enough to survive real-world distortions like different viewing distances and angles, lighting conditions, and camera limitations.

Prior work can either attack object detectors digitally [56], or attack image classifiers physically [3, 57, 58]. However, existing attempts to physically attack object detectors remain unsatisfactory. The perturbed stop sign shown in [59] cannot be detected by the Faster R-CNN object detector [60], but the perturbation is very noticeable. The authors tested it against a background with poor texture contrast, making the perturbed stop sign difficult to see even for humans. A concurrent work [61] claims to generate adversarial stickers that, when attached to a stop sign, can fool the YOLO object detector [62] and Faster R-CNN.

In this work, we propose ShapeShifter, the first robust targeted attack that can fool a state-of-the-art Faster R-CNN object detector. To make the attack robust, we adopt the Expectation over Transformation technique [63, 64] and adapt it from the image classification task to the object detection task. As a case study, we created adversarial stop signs that are mis-detected by Faster R-CNN in real drive-by tests. Our contributions are summarized below.

3.1.1 Our Contributions

• To the best of our knowledge, our work presents the first reproducible and robust targeted attack against Faster R-CNN [55]. We have open-sourced our code on GitHub1.

• We show that the Expectation over Transformation technique [63], originally proposed for image classification, can be adapted to the object detection task and can significantly enhance the robustness of the resulting perturbation.

• By carefully studying the Faster R-CNN object detection algorithm, we overcome non-differentiable components of the model and successfully perform optimization-based attacks using gradient descent and backpropagation.

• We generate perturbed stop signs that can consistently fool Faster R-CNN in real drive-by tests (videos available on the GitHub repository), demonstrating the need to improve and fortify vision-based object detectors.

1https://github.com/shangtse/robust-physical-attack

3.2 Background

This section provides background information on adversarial attacks and briefly describes the Faster R-CNN object detector that we attack in this work.

3.2.1 Adversarial Attacks

Given a trained machine learning model C and a benign instance x ∈ X that is correctly classified by C, the goal of an untargeted adversarial attack is to find another instance x′ ∈ X such that C(x′) ≠ C(x) and d(x, x′) ≤ ε for some distance metric d(·, ·) and perturbation budget ε > 0. For targeted attacks, we further require C(x′) = y′, where y′ ≠ C(x) is the target class. Common distance metrics d(·, ·) in the computer vision domain are the ℓ2 distance d(x, x′) = ‖x − x′‖₂² and the ℓ∞ distance d(x, x′) = ‖x − x′‖∞.
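These definitions translate directly into code. The toy classifier `C` and the helper names below are illustrative assumptions; note that, as in the text, the ℓ2 “distance” here is the squared norm.

```python
def l2_sq(x, xp):
    """Squared l2 distance ||x - x'||_2^2."""
    return sum((a - b) ** 2 for a, b in zip(x, xp))

def linf(x, xp):
    """l-infinity distance ||x - x'||_inf."""
    return max(abs(a - b) for a, b in zip(x, xp))

def untargeted_success(C, x, xp, eps, dist=linf):
    """x' changes C's prediction while staying within the budget eps."""
    return C(xp) != C(x) and dist(x, xp) <= eps

def targeted_success(C, x, xp, y_target, eps, dist=linf):
    """x' is classified as the chosen target class, within the budget."""
    return C(xp) == y_target and y_target != C(x) and dist(x, xp) <= eps
```

A targeted success is strictly stronger than an untargeted one: the adversary dictates the wrong label rather than merely causing any misclassification.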

The work of [2] was the first to discover the existence of adversarial examples for DNNs.

Several subsequent works have improved the computational cost of creating adversarial examples and made the perturbations highly imperceptible to humans [27, 28]. Many adversarial attack algorithms assume that the model is differentiable, and use the gradient of the model to change the input towards the desired model output [26]. Sharif et al. [57] first demonstrated a physically realizable attack to fool a face recognition model by wearing an adversarially crafted pair of glasses.

3.2.2 Faster R-CNN

Faster R-CNN [60] is a state-of-the-art general object detector. It adopts a two-stage detection strategy. In the first stage, a region proposal network generates several class-agnostic bounding boxes, called region proposals, that may contain objects. In the second stage, a classifier and a regressor output a classification result and refined bounding box coordinates for each region proposal, respectively. The computational cost is reduced by sharing the convolutional layers between the two stages. Faster R-CNN is difficult to attack because a single object can be covered by multiple region proposals of different sizes and aspect ratios, and one needs to mislead the classification results in all overlapping region proposals to fool the detection.

3.3 Threat Model

Existing methods that generate adversarial examples typically yield imperceptible perturbations that fool a given machine learning model. Our work, following [57], generates perturbations that are perceptible but constrained, such that a human would not be easily fooled by such a perturbation. We examine this kind of perturbation in the context of object detection. We chose this use case because of object detectors’ possible uses in security-related and safety-related settings (e.g., autonomous vehicles). For example, attacks on traffic sign recognition could cause a car to miss a stop sign or travel faster than legally allowed.

We assume the adversary has white-box level access to the machine learning model. This means the adversary has access to the model structure and weights, such that the adversary can compute both outputs (i.e., the forward pass) and gradients (i.e., the backward pass). It also means that the adversary does not have to construct a perturbation in real time. Rather, the adversary can study the model and craft an attack for it using methods like the Carlini-Wagner attack [26]. This kind of adversary is distinguished from one with black-box access, which is defined as having no access to the model architecture or weights.

While our choice of adversary is the most knowledgeable one, existing research has shown it is possible to construct imperceptible perturbations without white-box level access [65].

Whether our method is capable of generating perceptible perturbations with only black-box access remains an open question. Results from Liu et al. [66] suggest that iterative attacks (like ours) tend not to transfer well to other models.


Unlike previous work, we restrict the adversary such that they cannot manipulate the digital values of pixels gathered by the camera used to sense the world. This is an important distinction from existing imperceptible perturbation methods. Because those methods create imperceptible perturbations, there is a high likelihood such imperceptible perturbations would not fool our use cases when physically realized. That is, when printed and then presented to the systems in our use cases, those imperceptible perturbations would have to survive both the printing process and camera sensing pipeline in order to fool the system.

This is not an insurmountable task, as Kurakin et al. [3] have constructed imperceptible yet physically realizable adversarial perturbations for image classification systems.

Finally, we also restrict our adversary by limiting the shape of the perturbation they can generate. This is an important distinction for our use cases, because one could easily craft an oddly shaped “stop sign” that does not exist in the real world. We also do not give the adversary the latitude to modify all pixels in an image, as Kurakin et al. [3] do, but rather restrict them to certain pixels that we believe are physically realistic and whose change is inconspicuous.

3.4 Attack Method

Our attack method, ShapeShifter, is inspired by the iterative, change-of-variable attack described in [26] and the Expectation over Transformation technique [63, 64]. Both methods were originally proposed for the task of image classification. We describe these two methods in the image classification setting before showing how to extend them to attack the Faster R-CNN object detector.

3.4.1 Attacking an Image Classifier

Let $F : [-1, 1]^{h \times w \times 3} \to \mathbb{R}^K$ be an image classifier that takes an image of height $h$ and width $w$ as input, and outputs a probability distribution over $K$ classes. The goal of the attacker is to create an image $x'$ that looks like an object $x$ of class $y$, but will be classified as another target class $y'$.

Change-of-variable Attack

Denote by $L_F(x, y) = L(F(x), y)$ the loss function that calculates the distance between the model output $F(x)$ and the target label $y$. Given an original input image $x$ and a target class $y'$, the change-of-variable attack [26] proposes the following optimization formulation:

$$\arg\min_{x' \in \mathbb{R}^{h \times w \times 3}} \; L_F(\tanh(x'), y') + c \cdot \| \tanh(x') - x \|_2^2. \quad (3.1)$$

The use of $\tanh$ ensures that each pixel lies in $[-1, 1]$. The constant $c$ controls the similarity between the modified object $x'$ and the original image $x$. In practice, $c$ can be determined by binary search [26].
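The change-of-variable idea can be illustrated with a minimal NumPy sketch. The classifier here is a hypothetical stand-in (a fixed random linear layer with softmax), not the networks attacked in this chapter, and the gradients are derived by hand; the point is the tanh reparameterization, which keeps the optimized image in $[-1, 1]$ without explicit constraints.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in classifier (hypothetical): flatten the image, apply a
# fixed random linear layer, and take a softmax over K classes.
h, w, K = 4, 4, 3
W = rng.normal(size=(h * w * 3, K))

def F(x):
    logits = x.reshape(-1) @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def attack(x, y_target, c=0.1, lr=0.1, steps=500):
    """Change-of-variable attack (equation (3.1)): optimize x' in an
    unconstrained space; tanh(x') is always a valid image in [-1, 1]."""
    xp = np.arctanh(np.clip(x, -0.999, 0.999))  # start near the original
    onehot = np.eye(K)[y_target]
    for _ in range(steps):
        img = np.tanh(xp)
        # Gradient of the cross-entropy loss L_F w.r.t. the image ...
        grad = ((F(img) - onehot) @ W.T).reshape(x.shape)
        # ... plus the gradient of the c * ||tanh(x') - x||_2^2 term.
        grad += c * 2 * (img - x)
        # Chain rule through the change of variable: d tanh / dx' = 1 - tanh^2.
        xp -= lr * grad * (1 - img ** 2)
    return np.tanh(xp)

x = rng.uniform(-1, 1, size=(h, w, 3))
x_adv = attack(x, y_target=2)
```

In a full implementation, $c$ would additionally be tuned by binary search as in [26], trading off attack success against similarity to the original image.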

Expectation over Transformation

The idea behind Expectation over Transformation [63, 64] is simple: adding random distortions in each iteration of the optimization makes the resulting perturbation more robust to those distortions. Given a transformation $t$ such as translation, rotation, or scaling, $M_t(x_b, x_o)$ is an operation that transforms an object image $x_o$ using $t$ and then overlays it onto a background image $x_b$. $M_t(x_b, x_o)$ can also include a masking operation that keeps only a certain area of $x_o$. Masking is necessary when one wants to restrict the shape of the perturbation. After incorporating the random distortions, equation (3.1) becomes
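A translation-only sketch of the overlay operation $M_t$ may make the masking step concrete; all names and shapes below are illustrative, not taken from the actual ShapeShifter implementation, and a real $M_t$ would also support differentiable rotation and scaling.

```python
import numpy as np

def overlay(background, obj, mask, top, left):
    """M_t(x_b, x_o): paste the masked object into the background at a
    given offset (a translation-only stand-in for the transformation t)."""
    out = background.copy()
    h, w = obj.shape[:2]
    region = out[top:top + h, left:left + w]
    # mask == 1 keeps the object pixel (e.g., the octagonal sign area);
    # mask == 0 keeps the background, restricting the perturbation's shape.
    out[top:top + h, left:left + w] = mask * obj + (1 - mask) * region
    return out

bg = np.zeros((8, 8, 3))
obj = np.ones((4, 4, 3))
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3] = 1  # only the center of the patch counts as "object"
scene = overlay(bg, obj, mask, top=2, left=2)
```

Because the overlay is a pixelwise blend, gradients flow from the composed scene back to the object pixels wherever the mask is one, which is what allows the optimization in the next equation to run end to end.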

$$\arg\min_{x' \in \mathbb{R}^{h \times w \times 3}} \; \mathbb{E}_{x \sim X,\, t \sim T}\left[ L_F(M_t(x, \tanh(x')), y') \right] + c \cdot \| \tanh(x') - x_o \|_2^2, \quad (3.2)$$

where $X$ is the training set of background images. When the model $F$ is differentiable, this optimization problem can be solved by gradient descent and back-propagation, with the expectation approximated by the empirical mean.
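The EoT loop can be sketched as follows, again with a hypothetical linear-softmax classifier. Here the random "transformation" is additive pixel noise, a simple stand-in for the translation, rotation, and scaling used in the real attack (which require a differentiable spatial sampler), and the expectation in equation (3.2) is replaced by an empirical mean over a few sampled distortions per iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, K = 4, 4, 3
W = rng.normal(size=(h * w * 3, K))  # hypothetical linear classifier

def probs(x):
    logits = x.reshape(-1) @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def eot_attack(x_o, y_target, c=0.05, lr=0.1, steps=400, n_t=8, sigma=0.2):
    """Approximate equation (3.2): at each step, average the loss gradient
    over n_t randomly distorted copies of the current image (here additive
    noise; the clip's gradient is treated as the identity for simplicity)."""
    xp = np.arctanh(np.clip(x_o, -0.999, 0.999))
    onehot = np.eye(K)[y_target]
    for _ in range(steps):
        img = np.tanh(xp)
        grad = np.zeros_like(img)
        for _ in range(n_t):  # empirical mean over t ~ T
            distorted = np.clip(img + rng.normal(0, sigma, img.shape), -1, 1)
            grad += ((probs(distorted) - onehot) @ W.T).reshape(img.shape)
        grad /= n_t
        grad += c * 2 * (img - x_o)       # similarity term
        xp -= lr * grad * (1 - img ** 2)  # tanh chain rule
    return np.tanh(xp)

x_o = rng.uniform(-1, 1, size=(h, w, 3))
x_adv = eot_attack(x_o, y_target=1)
```

Because every step sees a different distortion, the resulting perturbation tends to raise the target class's probability even on freshly distorted copies, which is exactly the robustness sought for physical-world attacks.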
