

3.5 Evaluation

3.5.2 Physical Attack

We performed physical attacks on the object detector by printing out the perturbed stop signs shown in Figure 3.2. We then took photos from a variety of distances and angles in a controlled indoor setting. We also conducted drive-by tests, recording videos from a moving vehicle that approached the signs from a distance. Lighting conditions varied from recording to recording with the weather at the time.

Equipment

We used a Canon Pixma Pro-100 photo printer to print out signs with high-confidence perturbations, and an HP DesignJet to print out those with low-confidence perturbations.³ For static images, we used a Canon EOS Rebel T7i DSLR camera equipped with an EF-S 18-55mm IS STM lens. The videos in our drive-by tests were shot using an iPhone 8 Plus mounted on the windshield of a car.

³We used two printers to speed up our sign production, since a sign can take more than 30 minutes to produce.

Figure 3.3: Indoor experiment setup. We take photos of the printed adversarial sign from multiple angles (0°, 15°, 30°, 45°, and 60° from the sign's tangent) and distances (5' to 40'). The camera locations are indicated by the red dots, and the camera always points at the sign.


Indoor Experiments

Following the experimental setup of [58], we took photos of the printed adversarial stop sign at a variety of distances (5' to 40') and angles (0°, 15°, 30°, 45°, and 60° from the sign's tangent). This setup is depicted in Figure 3.3, where camera locations are indicated by red dots; the camera always pointed at the sign. We chose these distance-angle combinations to mimic a vehicle's points of view as it approaches the sign [59]. Table 3.1 and Table 3.2 summarize the results for our high-confidence and low-confidence perturbations, respectively. For each distance-angle combination, we show the detected class and the detection's confidence score. If more than one bounding box was detected, we report the highest-scoring one. Confidence values lower than 30% were considered undetected; we used a threshold of 30%, instead of the default 50% in the Tensorflow Object Detection API [69], to impose a stricter requirement on ourselves (the "attacker").
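As a minimal sketch of this filtering step (assuming the standard output dictionary of the Tensorflow Object Detection API; the helper name is ours):

    import numpy as np

    CONF_THRESHOLD = 0.30  # stricter than the API's default of 0.50

    def best_detection(output_dict):
        # Return (class_id, score) of the highest-scoring box above the
        # threshold, or None if nothing qualifies ("undetected").
        scores = np.asarray(output_dict['detection_scores'])
        classes = np.asarray(output_dict['detection_classes'])
        if scores.size == 0 or scores.max() < CONF_THRESHOLD:
            return None
        i = int(np.argmax(scores))
        return int(classes[i]), float(scores[i])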

Table 3.1: Our high-confidence perturbations succeed at attacking at a variety of distances and angles. For each distance-angle combination, we show the detected class and the confidence score. If more than one bounding box is detected, we report the highest-scoring one. Confidence values lower than 30% are considered undetected.

Distance | Angle | person (Conf.)  | sports ball (Conf.) | untargeted (Conf.)
5'       | 0°    | person (.77)    | sports ball (.61)   | clock (.35)
5'       | 15°   | person (.91)    | cake (.73)          | clock (.41)
5'       | 30°   | person (.93)    | cake (.66)          | cake (.39)
5'       | 45°   | person (.69)    | cake (.61)          | stop sign (.62)
5'       | 60°   | stop sign (.93) | stop sign (.70)     | stop sign (.88)
10'      | 0°    | person (.55)    | cake (.34)          | clock (.99)
10'      | 15°   | person (.63)    | cake (.33)          | clock (.99)
10'      | 30°   | person (.51)    | cake (.55)          | clock (.99)
15'      | 0°    | undetected      | cake (.49)          | clock (.99)
15'      | 15°   | person (.57)    | cake (.53)          | clock (.99)
20'      | 0°    | person (.49)    | sports ball (.98)   | clock (.99)
20'      | 15°   | person (.41)    | sports ball (.96)   | clock (.99)
25'      | 0°    | person (.47)    | sports ball (.99)   | stop sign (.91)
30'      | 0°    | person (.49)    | sports ball (.92)   | undetected
40'      | 0°    | person (.56)    | sports ball (.30)   | stop sign (.30)
Targeted success rate   | 87% | 40% | N/A
Untargeted success rate | 93% | 93% | 73%

Since an object can be detected as both a stop sign and the target class simultaneously, we consider our attack successful only when the confidence score of the target class is the highest among all detected classes.
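A hypothetical encoding of this success criterion (the label id 13 for "stop sign" follows the MS-COCO label map shipped with the Tensorflow Object Detection API; the function name is ours):

    STOP_SIGN = 13  # MS-COCO label id for "stop sign"

    def attack_succeeded(detections, target_class=None):
        # detections: list of (class_id, score) pairs above the 30% threshold.
        # Targeted attack: the target class must be the top-scoring detection.
        # Untargeted attack: success if nothing is detected, or if the top
        # detection is anything other than a stop sign.
        if not detections:
            return target_class is None
        top_class, _ = max(detections, key=lambda d: d[1])
        if target_class is not None:
            return top_class == target_class
        return top_class != STOP_SIGN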

Table 3.1 shows that our high-confidence perturbations achieve a high attack success rate at a variety of distances and angles. For example, we achieved a targeted success rate of 87% in misleading the object detector into detecting the stop sign as a person, and an even higher untargeted success rate of 93% when the goal is to cause the detector either to fail to detect the stop sign (e.g., at 15' 0°) or to detect it as any class other than a stop sign. The sports-ball-targeted attack has a lower targeted success rate but achieves the same untargeted success rate. Our untargeted attack consistently misleads the detector into the clock class at medium distances, but is less reliable at longer distances.

Table 3.2: As expected, low-confidence perturbations achieve lower success rates.

Distance | Angle | person (Conf.)  | sports ball (Conf.) | untargeted (Conf.)
5'       | 0°    | stop sign (.87) | cake (.90)          | cake (.41)
5'       | 15°   | stop sign (.63) | cake (.93)          | cake (.34)
5'       | 30°   | person (.83)    | cake (.84)          | stop sign (.48)
5'       | 45°   | stop sign (.97) | stop sign (.94)     | stop sign (.82)
5'       | 60°   | stop sign (.99) | stop sign (.99)     | stop sign (.89)
10'      | 0°    | stop sign (.83) | stop sign (.99)     | undetected
10'      | 15°   | stop sign (.79) | stop sign (.94)     | undetected
10'      | 30°   | stop sign (.60) | stop sign (.98)     | stop sign (.78)
15'      | 0°    | stop sign (.52) | stop sign (.94)     | stop sign (.31)
15'      | 15°   | stop sign (.33) | stop sign (.93)     | undetected
20'      | 0°    | stop sign (.42) | sports ball (.73)   | undetected
20'      | 15°   | person (.51)    | sports ball (.83)   | cell phone (.62)
25'      | 0°    | stop sign (.94) | sports ball (.87)   | undetected
30'      | 0°    | stop sign (.94) | sports ball (.95)   | stop sign (.79)
40'      | 0°    | stop sign (.95) | undetected          | stop sign (.52)
Targeted success rate   | 13% | 27% | N/A
Untargeted success rate | 13% | 53% | 53%

Overall, the perturbation is less robust at very high viewing angles (60° from the sign's tangent), because we did not simulate this degree of viewing-angle distortion in the optimization.

The low-confidence perturbations (Table 3.2), as expected, achieve a much lower attack success rate, which informed our use of higher-confidence perturbations when we conducted the more challenging drive-by tests. Table 3.3 shows some high-confidence perturbations from our indoor experiments.

Drive-by Tests

We performed drive-by tests in a parking lot so as not to disrupt other vehicles with our stop signs. We used a real stop sign as a control and placed our printed, perturbed stop sign by its side. Starting from about 200 feet away, we slowly drove (between 5 mph and 15 mph) towards the signs while recording video from the vehicle's dashboard at 4K resolution and 24 FPS using an iPhone 8 Plus.

Table 3.3: Sample high-confidence perturbations from indoor experiments, shown for the person target, the sports ball target, and the untargeted attack at 40' 0°, 10' 0°, 10' 30°, and 5' 60°. [Table cells are photographs.] For complete experiment results, please refer to Table 3.1.

We extracted all video frames and, for each frame, obtained the detection results from the Faster R-CNN object detection model.

Because our low-confidence attacks showed relatively little robustness indoors, we only include results from our high-confidence attacks. As in our indoor experiments, we only consider detections with a confidence score of at least 30%.
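A sketch of this per-frame tallying, assuming OpenCV for frame extraction and a run_detector() wrapper around the Faster R-CNN model that returns the best detection above the 30% threshold (both names are ours):

    import cv2

    def frame_statistics(video_path, run_detector):
        # Tally the top detection label for every frame of a drive-by video.
        counts = {}  # label -> number of frames
        cap = cv2.VideoCapture(video_path)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            best = run_detector(frame)  # (label, score) or None
            label = best[0] if best else 'no detection'
            counts[label] = counts.get(label, 0) + 1
        cap.release()
        return counts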

In Figure 3.4, we show sample video frames (rectangular images) to illustrate the size of the signs relative to the full video frame, as well as zoomed-in views (square images) that show the Faster R-CNN detection results more clearly.

The drive-by video of the person perturbation (Figure 3.4a) totaled 405 frames. The real stop sign was correctly detected in every frame with high confidence. The perturbed stop sign, on the other hand, was correctly detected only once; in 190 frames it was identified as a person with medium confidence, and in the remaining 214 frames the object detector failed to detect anything around it.

The video of the sports-ball perturbation (Figure 3.4b) had 445 frames. The real stop sign was correctly identified throughout, while the perturbed stop sign was never detected as a stop sign. As the vehicle (and camera) moved closer to the perturbed sign, 160 frames were detected as a sports ball with medium confidence; one frame was detected as both apple and sports ball, and the remaining 284 frames had no detection around the perturbed stop sign.

Finally, the video of the untargeted perturbation (Figure 3.4c) totaled 367 frames. While the unperturbed stop sign was correctly detected throughout, the perturbed stop sign was detected as a bird 6 times and was not detected at all in the remaining 361 frames.

Exploring Black-box Transferability

We also sought to understand how well our high-confidence perturbations could fool other object detection models. For image recognition, it is known that high-confidence targeted attacks fail to transfer to other models [66].

To this end, we fed our high-confidence perturbations into 8 other MS-COCO-trained models from the Tensorflow detection model zoo.⁴ Table 3.4 shows how well the perturbations generated against Faster R-CNN Inception-V2 transfer to other models. To better understand transferability, we examined the worst case: if a model successfully detects a stop sign in the image, we say the perturbation has failed to transfer to that model. We report the number of images (of the 15 angle-distance images in our indoor experiments) in which each model detected a stop sign with at least 30% confidence.

We also report the maximum confidence over all of those detected stop signs.
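This worst-case tally can be sketched as follows (detect() is a hypothetical wrapper returning the (class_id, score) pairs above the 30% threshold for a given model):

    STOP_SIGN = 13  # MS-COCO label id for "stop sign"

    def transfer_failures(images, detect):
        # Count images in which the model still finds a stop sign, i.e.,
        # in which the perturbation failed to transfer, and record the
        # highest stop-sign confidence observed.
        failures, max_conf = 0, 0.0
        for img in images:
            stop_scores = [s for c, s in detect(img) if c == STOP_SIGN]
            if stop_scores:
                failures += 1
                max_conf = max(max_conf, max(stop_scores))
        return failures, max_conf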

Table 3.4 shows the lack of transferability of our generated perturbations. The untargeted perturbation fails to transfer most of the time, followed by the sports ball perturbation, and finally the person perturbation. The models most susceptible to transfer were Faster R-CNN Inception-ResNet-V2, followed by SSD MobileNet-V2.

Iterative attacks on image recognition also usually fail to transfer [66], so it is not surprising that our attacks fail to transfer as well.

⁴https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

Figure 3.4: Snapshots of the drive-by test results. In (a), the person perturbation was detected as a person in 47% of the frames and as a stop sign only once. The perturbation in (b) was detected as a sports ball in 36% of the frames and never as a stop sign. The untargeted perturbation in (c) was detected as a bird 6 times and was never detected as a stop sign or anything else in the remaining frames.

Table 3.4: Black-box transferability of our 3 perturbations. We report the number of images (of the 15 angle-distance images) that failed to transfer to the specified model. We consider the detection of any stop sign a “failure to transfer.” Our perturbations fail to transfer for most models, most likely due to the iterative nature of our attack.

Model                     | person (conf.) | sports ball (conf.) | untargeted (conf.)
Faster R-CNN Inception-V2 | 3 (.93)        | 1 (.70)             | 5 (.91)
SSD MobileNet-V2          | 2 (.69)        | 8 (.96)             | 15 (1.00)
SSD Inception-V2          | 11 (1.00)      | 14 (.99)            | 15 (1.00)
R-FCN ResNet-101          | 4 (.82)        | 10 (.85)            | 15 (1.00)
Faster R-CNN ResNet-50    | 13 (.00)       | 15 (1.00)           | 15 (1.00)
Faster R-CNN ResNet-101   | 15 (.99)       | 13 (.97)            | 15 (1.00)
Faster R-CNN Inc-Res-V2   | 1 (.70)        | 0 (.00)             | 12 (1.00)
Faster R-CNN NASNet       | 14 (1.00)      | 15 (1.00)           | 15 (1.00)

We leave a thorough exploration of transferability to future work.