Privacy, Utility, and Efficiency of Anonymization policies

Chapter 5 Evaluation and Observation

5.2 Privacy, Utility, and Efficiency of Anonymization policies

In this evaluation, we define utility and privacy for anonymization. Privacy is evaluated as the percentage of sensitive fields in the packet traces that have been correctly anonymized. Utility is evaluated as the percentage of malicious packet traces that can still be detected by the DUTs after anonymization. We use Snort 2.8.5, an open-source signature-based IDS, as the DUT, so that we can investigate the effect of anonymization by comparing Snort's signatures with the packet traces. First, we replay the raw ATC traces to Snort and collect the logs. Next, we anonymize these traces and replay them again to calculate the metrics.

We choose true positive (TP) and false negative (FN) as the metrics to define utility and privacy: TPfield and FNfield are the true positives and false negatives of fields, and TPtrace and FNtrace are the true positives and false negatives of traces. Figure 7 shows the impact of anonymization on HTTP header fields. We observe that most malicious signatures are embedded in the host, cookie, and request.uri fields. If these fields are anonymized, the traces no longer trigger alerts on the DUTs. Figure 8 illustrates a false negative caused by anonymizing the HTTP URI field. The original packet trace attempts to access the password file through a GET request whose URI contains /etc/passwd. If the URI argument is anonymized for privacy protection, the /etc/passwd signature in the packet trace is lost, and the trace no longer triggers the "WEB-MISC /etc/passwd" alert.
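The two metrics can be sketched as simple ratios. This is a minimal sketch that assumes privacy and utility are both computed as TP / (TP + FN), which is one reading of the percentage definitions given above; the function names and the counts are hypothetical and serve only as illustration.

```python
def ratio(tp: int, fn: int) -> float:
    """Return TP / (TP + FN), or 0.0 when there are no samples."""
    total = tp + fn
    return tp / total if total else 0.0


def privacy(tp_field: int, fn_field: int) -> float:
    # Assumed reading: share of sensitive fields that were anonymized.
    return ratio(tp_field, fn_field)


def utility(tp_trace: int, fn_trace: int) -> float:
    # Assumed reading: share of malicious traces still detected
    # by the DUT after anonymization.
    return ratio(tp_trace, fn_trace)


# Hypothetical counts, for illustration only.
print(f"privacy = {privacy(tp_field=45, fn_field=5):.0%}")
print(f"utility = {utility(tp_trace=30, fn_trace=10):.0%}")
```

Under this reading, a tool that masks more sensitive fields raises privacy, while every signature it destroys in a malicious trace lowers utility.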

Figure 7: The anonymization impact of HTTP header fields

[Before Anonymization]

GET //lists/admin/index.php?_SERVER[ConfigFile]=../../../../../../../../../../../../../../../../../../../../../../../etc/passwd HTTP/1.1

[After Anonymization]

GET XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX HTTP/1.1

[Snort rule]

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"WEB-MISC /etc/passwd"; flow:to_server,established; content:"/etc/passwd"; nocase; metadata:service http; classtype:attempted-recon; sid:1122; rev:6;)

Figure 8: False negative in anonymized HTTP trace
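The false negative in Figure 8 can be reproduced with a small sketch. It assumes that the rule's content option behaves like a case-insensitive substring search and ignores the other rule options (flow, ports, and so on); `content_match` and `anonymize_uri` are hypothetical helpers, not part of Snort or PCAPAnon, and the request line is shortened.

```python
def content_match(payload: bytes, content: bytes, nocase: bool = True) -> bool:
    """Rough stand-in for a `content` option: plain substring search."""
    if nocase:
        return content.lower() in payload.lower()
    return content in payload


def anonymize_uri(request_line: bytes) -> bytes:
    """Overwrite the request URI with 'X' bytes, keeping its length."""
    method, uri, version = request_line.split(b" ", 2)
    return b" ".join([method, b"X" * len(uri), version])


# Shortened version of the request line in Figure 8.
original = b"GET /lists/admin/index.php?_SERVER[ConfigFile]=../../etc/passwd HTTP/1.1"
anonymized = anonymize_uri(original)

print(content_match(original, b"/etc/passwd"))    # signature still present
print(content_match(anonymized, b"/etc/passwd"))  # signature overwritten
```

Because the whole URI is overwritten, the bytes that the signature looks for no longer exist in the replayed trace, so the DUT cannot raise the alert.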

Figures 9 and 10 compare the privacy and utility of PCAPAnon with those of two other anonymization tools that can anonymize packet payloads, anontool and tcpanon. For the privacy evaluation, we defined sensitive identities for four protocols (HTTP, FTP, POP, and SMTP), as listed in Table 7. Figure 9 shows the comparison result. We observe that (1) anontool provides only pattern substitution, which leads to serious privacy leakage; (2) tcpanon uses customized parsing to hide more identities than anontool, but its FTP protocol parser is not well defined and misses some sensitive identities such as PORT and STOR; and (3) PCAPAnon provides DPA, which hides identities accurately thanks to its rich set of protocol parsers.

Figure 10 shows the utility result. We observe that

(1) Although anontool keeps excellent utility after anonymization, it protects privacy poorly because of its rough pattern substitution.

(2) tcpanon's customized parsing overwrites significant content, so its utility becomes poor.

(3) Because PCAPAnon provides protocol dissection, which parses protocol fields accurately, it modifies only the specific content and avoids the payload overwriting that occurs in tcpanon, which results in less alteration of signatures.

Table 7. Identities of Privacy Measurement

HTTP sensitive fields   FTP sensitive fields   POP sensitive fields   SMTP sensitive fields
Proxy_authentication    USER                   USER                   HELO
Proxy_authorization     PASS                   PASS                   MAIL FROM:
WWW_authenticate        PORT                   Reply.+OK              RCPT TO:
Content                 RETR                   Mail address           DATA
Authorization           STOR                   IP address             Reply.220
Set_cookie              Reply.150              URL address            Reply.250
Referer                 Reply.227              -                      Mail address
HOST                    Reply.230              -                      IP address
Cookie                  Reply.331              -                      URL address
Mail address            Mail address           -                      -
IP address              IP address             -                      -
URL address             URL address            -                      -

Figure 9: Privacy of Anonymization Tools

Figure 10: Utility of Anonymization Tools

Besides privacy and utility, we define an efficiency metric that consolidates the former two metrics into one. This single metric is useful for comparing anonymization tools and policies when both privacy and utility are of concern.

Figure 12 shows the efficiency of PCAPAnon's three-level anonymization policies. (1) Hiding MAC/IP addresses and checksums in the L2/L3/L4 headers provides little protection of sensitive identities. (2) Pattern Matching replaces mail, IP, and URL addresses within the payload, but identities that are not defined in the patterns are left unprotected. (3) Protocol Dissection parses protocol semantics precisely (e.g., Host, Cookie, Authentication Key, Password, and User ID) to anonymize not only the patterns but also field values. In our experiment, combining the three policies achieves an efficiency of up to 93%.
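The difference between the pattern matching and protocol dissection levels can be illustrated with a minimal sketch. It is not PCAPAnon's implementation: the regular expressions are simplified, the header parsing covers only plain HTTP header lines, and `SENSITIVE_HEADERS`, `pattern_anonymize`, and `dissect_anonymize` are hypothetical names chosen for this example.

```python
import re

# Simplified patterns for mail and IPv4 addresses (illustration only).
MAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical set of HTTP header fields treated as sensitive.
SENSITIVE_HEADERS = {"host", "cookie", "authorization"}


def mask(text: str) -> str:
    """Replace every character with 'X', keeping the length."""
    return "X" * len(text)


def pattern_anonymize(payload: str) -> str:
    """Pattern matching: mask only substrings that match a known pattern."""
    payload = MAIL_RE.sub(lambda m: mask(m.group()), payload)
    return IP_RE.sub(lambda m: mask(m.group()), payload)


def dissect_anonymize(payload: str) -> str:
    """Protocol dissection: mask the values of sensitive header fields."""
    out = []
    for line in payload.split("\r\n"):
        name, sep, value = line.partition(": ")
        if sep and name.lower() in SENSITIVE_HEADERS:
            line = f"{name}: {mask(value)}"
        out.append(line)
    return "\r\n".join(out)


request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: 192.0.2.10\r\n"
    "Cookie: session=abc123\r\n"
    "From: alice@example.com\r\n\r\n"
)

print(pattern_anonymize(request))                     # Cookie value survives
print(dissect_anonymize(pattern_anonymize(request)))  # Cookie value masked
```

In this sketch the cookie value matches no pattern, so pattern matching alone leaves it exposed; the field-aware pass masks it because it knows which header the value belongs to.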

Figure 11: Efficiency of anonymization tools

Figure 12: Efficiency of three-level anonymization policies

5.3 Statistics of FP/FN cases

FPNA provides a convenient way to find the FPs and FNs of DUTs. In this section, we study the causes of FP/FN cases using the same ATC source traffic captured from the NCTU BetaSite. In our investigation, the main causes of FPs and FNs fall into three types. Type 1 is signature design: the signatures are too general or rough, so they easily match benign packet content. Type 2 is traffic similarity: normal traffic may behave unusually or be mistaken for another network protocol. Type 3 is rule configuration: some configuration arguments are not set well, such as a threshold that is too high to detect a specific malicious behavior.

Figure 13(a) shows the most frequent FP cases:

(1) "SQL Injection comment attempt" results from traffic similarity with BitTorrent, because the client binds port 80.

(2) "FTP wu-ftp bad file completion attempt [" occurs because the "[" character often appears in FTP transfer data.

(3) "EXPLOIT Veritas Backup Agent DoS attempt" results from traffic similarity with BitTorrent, because the client binds port 10000.

(4) "Google Chrome setInterval Denial of Service" occurs because SetInterval('swltxtColor()', 500) is used in many web pages; in these cases the User-Agent is Mozilla/4.0, not Chrome.

(5) "IBM Lotus Domino Accept-Language Buffer Overflow" is triggered merely because the length of the Accept-Language field exceeds 100; the field contains no buffer overflow code.

Figure 13(b) presents the proportion of the three types. Traffic Similarity accounts for 63% of the FP cases, because P2P dynamic ports make the DUTs mistake the application protocol (e.g., cases (1) and (3)).

(a) Top five frequent cases

(b) Proportion of three causes

Figure 13: Statistics of False Positives
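The traffic-similarity cause behind FP cases (1) and (3) can be sketched as a mismatch between port-based and payload-based protocol identification. The port table and both classifier functions are hypothetical and are not taken from any DUT; the only protocol fact relied on is that a BitTorrent handshake begins with the byte 19 followed by the string "BitTorrent protocol".

```python
# Hypothetical port table of the kind a port-based classifier might use.
PORT_PROTOCOL = {80: "http", 10000: "veritas-backup"}


def classify_by_port(dst_port: int) -> str:
    """Guess the application protocol from the destination port alone."""
    return PORT_PROTOCOL.get(dst_port, "unknown")


def classify_by_payload(payload: bytes) -> str:
    """Guess the application protocol from the first payload bytes."""
    # A BitTorrent handshake starts with byte 19 and "BitTorrent protocol".
    if payload.startswith(b"\x13BitTorrent protocol"):
        return "bittorrent"
    if payload.split(b" ", 1)[0] in {b"GET", b"POST", b"HEAD"}:
        return "http"
    return "unknown"


# A BitTorrent handshake sent to a peer that listens on port 80.
handshake = b"\x13BitTorrent protocol" + bytes(8)

print(classify_by_port(80))            # what a port-based view assumes
print(classify_by_payload(handshake))  # what the payload actually is
```

When a device selects HTTP signatures for this flow based on the port, arbitrary P2P bytes are matched against web attack patterns, which is how a rule such as the SQL injection one can fire on benign traffic.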

Figure 14(a) shows the most frequent FN cases. (1) "SQL SA brute force login attempt TDS v7/8" is missed because the threshold is set to 5 times in 2 seconds, whereas in our investigation the attack occurs only 3 times in 2 seconds. The other four cases occur because the DUTs do not have the corresponding signatures. Figure 14(b) shows that insufficient signatures account for 62% of the FN cases.

(a) Top five frequent cases

(b) Proportion of three causes

Figure 14: Statistics of False Negatives
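The rule-configuration cause behind FN case (1) can be illustrated with a simplified sliding-window counter. This is only a sketch of the idea of a count-within-seconds threshold and does not reproduce how Snort evaluates its threshold options; `threshold_alerts` and the timestamps are hypothetical.

```python
from collections import deque


def threshold_alerts(timestamps, count, seconds):
    """Return the times at which `count` events fall within `seconds`."""
    window = deque()
    alerts = []
    for t in sorted(timestamps):
        window.append(t)
        # Drop events that are older than the sliding window.
        while t - window[0] > seconds:
            window.popleft()
        if len(window) >= count:
            alerts.append(t)
    return alerts


# Hypothetical failed-login timestamps: three attempts within 2 seconds.
attempts = [0.0, 0.7, 1.5]

print(threshold_alerts(attempts, count=5, seconds=2))  # threshold of 5
print(threshold_alerts(attempts, count=3, seconds=2))  # threshold of 3
```

With a threshold of five the three attempts never raise an alert, while a threshold of three catches the third attempt. Lowering a threshold also raises the risk of false positives, so the value has to be tuned against real traffic.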

Chapter 6 Conclusions and Future Works

This work proposes the PCAP Lib framework, which provides well-classified packet traces with anonymization, together with FP/FN case studies based on these traces. ATC collected 323 distinct packet traces in five months; 33% of the packet traces are healthy and 67% are malicious. The distribution of the collected traces shows that web applications, which account for 40%, are a frequent vector that attackers exploit.

For anonymization, we define “privacy/utility” and “efficiency” to evaluate different anonymization methods. PCAPAnon uses DPA to achieve the best efficiency, 93%. Moreover, the efficiency of PCAPAnon’s pattern matching is 51% higher than that of anontool because it supports global search.

In the FP/FN case studies, FPNA gives the statistics of cases from the traces collected by ATC. Here we focus on security devices, but the method could be extended to other DUTs. For false positives, we observe that traffic similarity dominates with 63%, because P2P dynamic ports make the DUTs mistake the application protocol. For false negatives, signature insufficiency is the main cause and accounts for 62%. For researchers and developers, PCAP Lib provides completeness and flexibility to satisfy their various purposes.

Although PCAP Lib has many functions, some issues remain to be solved.

As the false negative in the anonymized trace in Section 5.2 shows, if malicious signatures are embedded in privacy fields, we choose to protect privacy first. Because these signatures are modified, the packet trace no longer triggers the IDS/IDP. Preserving the signature contents according to feedback from the IDS/IDP is one way to avoid this situation. Another issue is that our anonymization policy script relies on manually deciding which protocol fields should be transformed, so it is hard to configure when the packet traces contain various protocols. Hence, a good approach is to use a traffic statistics tool (e.g., trace-summary) to identify the protocols in the traces and to provide a collaborative mechanism through which users can modify the same policy script.


Appendix. POP3 Payload Deep Anonymization

This table shows the anonymization result of PCAPAnon. The left column shows the original payload data, which is plain mail content. The right column shows the mail content after anonymization.

6 Received: from mail-iw0-f194.google.com
7 (mail-iw0-f194.google.com [209.85.223.194])
8 by d2-spool-lb-0.nctu.edu.tw (Postfix) with ESMTP id
9 B010969A8D7;
10 Tue, 27 Oct 2009 09:38:09 +0800 (CST)

1 +OK 797880 octets
2 Return-Path: <[email protected]>
3 X-Original-To: [email protected]
4 Delivered-To:
5 [email protected]
6 Received: from mail-iw0-f194.google.com
7 (mail-iw0-f194.google.com [138.52.206.189])
8 by d2-spool-lb-0nctuedu.tw (Postfix) with ESMTP id
9 B010969A8D7;
10 Tue, 27 Oct 2009 09:38:09 +0800 (CST)
11 Authentication-Results: d2-spool-lb-0.nctu.edu.tw;
12 sender-id=none
13 [email protected];
14 spf=none
15 [email protected]
16 Received: by iwn32 with SMTP id 32so6481732iwn.23
17 for <multiple recipients>; Mon, 26 Oct 2009
18 18:38:08 -0700 (PDT)
19 MIME-Version: 1.0
20 Received: by 10.231.1.22 with SMTP id
21 22mr1672949ibd.56.1256607488296; Mon, 26
22 Oct 2009 18:38:08 -0700 (PDT)
23 Date: Tue, 27 Oct 2009 09:38:08 +0800
24 Message-ID:
25 <8197f2480910261838h50b01e49x933c16c77428dba1@
26 mail.gmail.com>
27 Subject: =?Big5?B?xbLD0azsvsekwLLVs/inaarsqqk=?=
28 From: =?Big5?B?pP2pycV0?=
29 <[email protected]>
35 Content-Type: multipart/mixed;
36 boundary=00151773eaa0f64c850476e0badb
37
38 --00151773eaa0f64c850476e0badb
39 Content-Type: multipart/alternative;
40 boundary=00151773eaa0f64c740476e0bad9
41
42 --00151773eaa0f64c740476e0bad9
43 Content-Type: text/plain; charset=Big5
44 Co