In this section, we evaluate the performance of our proxy filtering algorithm and abnormal domain detection in terms of accuracy.
5.2.1 Accuracy of Proxy Filtering
Evaluation Metrics
To evaluate the performance of our proxy filtering algorithm, we compute a precision metric as follows.
• Precision : Number of correctly detected IP addresses divided by the number of all IP addresses returned by our approach
In our experiment, our approach returns a list of detected IP addresses together with their ports.
We then verify each of them with an nmap (Network Mapper) scanning test. Nmap takes an IP address and a port and discovers the services on the scanned host, the software product used to run each service, the exact version number of that product, and so on. By manually analyzing the software product reported by the scanning test, we can confirm whether a scanned IP address is a proxy server.
Table 5.2 shows the software products reported by the nmap scanning tests during our experiment. Products such as "AkamaiGHost", "nginx http proxy", "Squid webproxy", and "Sandpiper Footprint http load balancer" are labeled proxy-relevant. If a target IP class is running one of the proxy-relevant software products in the list, the IP class is confirmed as a proxy server.
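The verification step described above can be sketched as follows. This is only a minimal illustration, not the tooling used in the experiment: the helper names `nmap_command` and `is_proxy_product` are hypothetical, exact string matching against the product names listed in the text is an assumption, and the command merely shows nmap's standard `-sV` (service and version detection) and `-p` (port) options without running a scan.

```python
# Products labeled proxy-relevant in the text (exact matching is an assumption).
PROXY_PRODUCTS = {
    "AkamaiGHost",
    "nginx http proxy",
    "Squid webproxy",
    "Sandpiper Footprint http load balancer",
}


def nmap_command(ip, port):
    """Build (but do not run) an nmap service/version scan for one IP and port."""
    return ["nmap", "-sV", "-p", str(port), ip]


def is_proxy_product(product):
    """Return True if the reported software product is proxy-relevant.

    A missing product (for example, a failed scan) is treated as not confirmed.
    """
    if product is None:
        return False
    return product.strip() in PROXY_PRODUCTS


# 192.0.2.10 is a documentation address, used here purely as a placeholder.
print(nmap_command("192.0.2.10", 3128))
print(is_proxy_product("Squid webproxy"), is_proxy_product("Apache httpd"))
```

In practice the product string would be read from the scan output; that parsing step is omitted here.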
Results
Figure 5.1 and Figure 5.2 show the precision as a function of the length (up to 100) of the returned list for proxy filtering with and without the smoothing factor on the two data sets. We set the smoothing factor (smooth) to 0.1. Since the precision of filtering with the smoothing factor is nearly twice that of filtering without it, we conclude that introducing the smoothing factor makes proxy filtering more accurate. This is because, once the smoothing factor is taken into consideration, proxy scores are no longer dominated by the low-frequency adoption of IP classes.
In most cases, smoothed proxy filtering achieves a precision of around 70% on the NCTU dormitory data set in Figure 5.1. During evaluation, some scanning tests failed, which might be owing
Table 5.2: A list of software products collected by performing nmap scanning tests.
Software Product                          Proxy Or Not
3Com switch webadmin                      No
AkamaiGHost                               Yes
AOLserver httpd                           No
Apache httpd                              No
Apache Tomcat/Coyote JSP engine           No
Caucho Resin JSP engine                   No
Google httpd                              No
hp color LaserJet 4650                    No
HP Jetdirect httpd                        No
HP LaserJet                               No
IBM HTTP Server                           No
Icecast http statistics plugin            No
Jetty httpd                               No
lighttpd                                  No
Microsoft IIS webserver                   No
Microsoft Windows Media Server            No
Netscape Enterprise httpd                 No
Netscreen administrative web server       No
nginx http proxy                          Yes
Sandpiper Footprint http load balancer    Yes
Squid webproxy                            Yes
SunONE WebServer                          No
Urchin RSS aggregator                     No
Zeus httpd                                No
Figure 5.1: Precision of proxy filtering on NCTU dormitory data set.
to the downtime of the scanned host. In that case, even though an IP class behaves like a proxy server in the collected raw traffic logs, it is counted as a false alarm in our experiments, which lowers the measured detection accuracy. Overall, 298 IP classes (around 24.9%) in the collected proxy pool lack software product information.
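The way failed scans enter the precision figure can be illustrated with a short sketch. The function names and the three-valued encoding (True, False, None) are assumptions made for illustration; only the rule that an unverifiable IP class counts as a false alarm is taken from the text.

```python
def precision_at_k(confirmed, k):
    """Precision over the first k entries of a ranked result list.

    `confirmed` holds one value per returned IP class, in rank order:
    True (confirmed proxy), False (confirmed non-proxy), or None (scan failed).
    A failed scan counts as a false alarm, as described in the text.
    """
    if k <= 0 or k > len(confirmed):
        raise ValueError("k must be between 1 and the list length")
    hits = sum(1 for flag in confirmed[:k] if flag is True)
    return hits / k


def precision_curve(confirmed, max_k=100):
    """Precision for every list length from 1 up to max_k (or the list length)."""
    limit = min(max_k, len(confirmed))
    return [precision_at_k(confirmed, k) for k in range(1, limit + 1)]


# Toy ranking: two confirmed proxies, one failed scan, one confirmed non-proxy.
toy = [True, True, None, False]
print(precision_curve(toy))
```

Because a failed scan is indistinguishable from a wrong detection under this rule, the reported precision is a lower bound on the precision that complete scan results would give.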
5.2.2 Accuracy of Mutual Association Discovery
To evaluate the performance of the proposed mutual association discovery algorithm, we collect IP classes for well-known corporations on the Internet. To demonstrate the effectiveness of mutual associations for distinguishing domains, we then examine the mutual associations for IP classes within a company (e.g., Google) and between two different companies. Specifically, in the intra-company experiment, we collect the IP classes adopted by each company and compute the mutual association for each pair of IP classes belonging to the same company. In the inter-company experiment, for each pair of companies, we compute mutual associations by iteratively taking one IP class from each company's set of IP classes. In both experiments, we collect all computed mutual association scores and plot their distribution.
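The pairing scheme of the two experiments can be sketched as below. The mutual association measure itself is not restated in this section, so a Jaccard overlap of neighboring-domain sets is swapped in purely as a stand-in; the function names, the toy IP classes, and the example domains are all hypothetical.

```python
from itertools import combinations, product


def jaccard(domains_a, domains_b):
    """Stand-in association score: overlap of two sets of neighboring domains."""
    union = domains_a | domains_b
    if not union:
        return 0.0
    return len(domains_a & domains_b) / len(union)


def intra_company_scores(company_classes, neighbors, score=jaccard):
    """Score every unordered pair of IP classes inside each company."""
    scores = []
    for classes in company_classes.values():
        for a, b in combinations(sorted(classes), 2):
            scores.append(score(neighbors[a], neighbors[b]))
    return scores


def inter_company_scores(company_classes, neighbors, score=jaccard):
    """Score every pair that takes one IP class from each of two companies."""
    scores = []
    for c1, c2 in combinations(sorted(company_classes), 2):
        pairs = product(sorted(company_classes[c1]), sorted(company_classes[c2]))
        for a, b in pairs:
            scores.append(score(neighbors[a], neighbors[b]))
    return scores


# Hypothetical toy data: IP classes per company and their neighboring domains.
company_classes = {"A": {"10.0.1", "10.0.2"}, "B": {"10.9.1"}}
neighbors = {
    "10.0.1": {"a.example", "b.example"},
    "10.0.2": {"b.example", "c.example"},
    "10.9.1": {"z.example"},
}
print(intra_company_scores(company_classes, neighbors))
print(inter_company_scores(company_classes, neighbors))
```

Passing the scoring function as a parameter keeps the pairing logic independent of the particular association measure, so the actual measure can be substituted for the stand-in.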
A public AS (autonomous system) has a globally unique number, an ASN, associated with it. This number is used both in the exchange of exterior routing information (between neighboring ASes) and as an identifier of the AS itself. In the same way, we take the ASN as the ground truth to evaluate the mutual association algorithm. Detailed statistics of the ground truth used for evaluation are summarized in Table 5.3.
As illustrated in Table 5.3, only a few of the IP classes under a company appear in our dataset. Hence,
Figure 5.2: Precision of proxy filtering on Trend Inc. data set.
a group of IP classes under an ASN may not share similar characteristics. For example, some may serve as PCs while others serve as web servers. Although the association between IP classes and a company can be learned via the ASN, we still cannot tell which pairs of IP classes are mutually associated and which are not.
Table 5.3: Groundtruth Statistics.
Company     #AS    #Prefix    #IP Class    #IP Class Appeared in our Dataset
Google      5      27         875          127
Facebook    1      4          40           10
Yahoo       34     177        5874         406
Evaluation Metrics
We examine the differences in mutual association scores by plotting their distributions for IP classes within and across companies and within and across ASNs.
• IntraComp : Distribution of mutual association scores for IP classes from the same company
• InterComp : Distribution of mutual association scores for IP classes from different companies
• IntraASN : Distribution of mutual association scores for IP classes from the same ASN
• InterASN : Distribution of mutual association scores for IP classes from different ASNs
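One way to compare the four distributions listed above is through their empirical cumulative distributions. The sketch below assumes that reading; the text does not state how the distributions are drawn, `empirical_cdf` is a hypothetical helper, and the scores are made-up values rather than measurements.

```python
from bisect import bisect_right


def empirical_cdf(scores, thresholds):
    """Fraction of scores that are less than or equal to each threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    n = len(ordered)
    return [bisect_right(ordered, t) / n for t in thresholds]


# Hypothetical scores for two of the four groups.
groups = {
    "IntraComp": [0, 0, 5, 40, 120],
    "InterComp": [0, 0, 0, 0, 10],
}
thresholds = [0, 10, 100, 1000]
for name, scores in groups.items():
    print(name, empirical_cdf(scores, thresholds))
```

A group whose curve rises more slowly has a larger share of high scores, which is the property the intra-group and inter-group comparison looks for.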
Results
Figure 5.3 and Figure 5.4 compare the mutual association scores of IP classes grouped by company and by ASN on the Trend and NCTU data sets, respectively. As we expected, the mutual associations among a group of IP classes under an ASN are higher than those under a company. These results support the effectiveness of the proposed mutual associations.
Figure 5.3: Evaluation on mutual association scores (Trend). [Plot; axes: Percentage and Mutual Association Score (K); series: IntraComp, InterComp, IntraAS, InterAS.]
Figure 5.4: Evaluation on mutual association scores (NCTU). [Plot; axes: Percentage and Mutual Association Score (K); series: IntraComp, InterComp, IntraAS, InterAS.]
5.2.3 Accuracy of Abnormal Domain Detection
To evaluate the performance of the proposed framework for abnormal domain detection, we conducted experiments on FFSN (fast-flux service network) detection. We use the domain corpus from the website [17], which includes both benign and suspected domains. Overall, the "lookups-benign" corpus contains 34,647 benign domains; the "lookups-ndss-ff" corpus contains 94 suspected fast-flux domains detected by [4]. The "lookups-ndss-ff" corpus serves as the ground truth for evaluating the performance of the proposed model. Specifically, we compute bridging scores for all unique domains in the collected corpus based on the constructed profiles, which yields a list of domains ranked by bridging score. We then complete the performance evaluation on FFSN detection by computing precision and recall for different lengths of the returned domain list.
Evaluation Metrics
We compute the following metrics to evaluate the effectiveness of FFSN detection.
• Domain-Precision : Number of correctly detected domains among all domains detected by our approach.
• Domain-Recall : Number of correctly detected domains among all domains in ”lookups-ndss-ff” corpus.
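The two metrics above can be computed for each list length as in the following sketch. It assumes that domains are ranked by descending bridging score (a larger score meaning more suspicious) and that both metrics are expressed as ratios; the function names and the toy domains and scores are hypothetical.

```python
def rank_by_score(scores):
    """Order domains by descending bridging score (ties broken by name)."""
    return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def precision_recall_at_k(ranked_domains, ground_truth, k):
    """Domain-Precision and Domain-Recall over the top-k ranked domains."""
    if k <= 0 or k > len(ranked_domains):
        raise ValueError("k must be between 1 and the list length")
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(1 for d in ranked_domains[:k] if d in ground_truth)
    return hits / k, hits / len(ground_truth)


# Hypothetical bridging scores and ground-truth fast-flux domains.
scores = {
    "ff1.example": 90,
    "ff2.example": 70,
    "benign.example": 66,
    "ff3.example": 10,
    "ok.example": 0,
}
ground_truth = {"ff1.example", "ff2.example", "ff3.example"}

ranked = rank_by_score(scores)
for k in range(1, len(ranked) + 1):
    print(k, precision_recall_at_k(ranked, ground_truth, k))
```

Sweeping k over the ranked list yields the points of a recall-precision graph.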
Results
Figure 5.5 shows the quality of the ranking as a recall-precision graph for FFSN detection. We construct two sets of profiles, based on the Trend Inc. and NCTU dormitory data sets respectively. All of the top 89 domains returned by our model are confirmed as FFSN domains, in agreement with [4].
The top 95 returned domains contain one benign domain ("runescape.com", ranked 90th, with bs("runescape.com") = 66). Table 5.4 lists the IP classes adopted by "runescape.com". It is regarded as a suspect because the mutual association of every pair of its IP classes, learned from the similarity of their neighboring domains, is zero in both data sets, which results in a large bridging score. Note that a zero mutual association indicates that the adjacent domains of the two IP classes are disjoint, which means the two IP classes never co-occur in any traffic log.
Table 5.4: The IP classes adopted by "runescape.com".
64.37.71 64.90.181 65.39.250 66.151.43 69.22.158 69.31.109 80.64.4 82.133.85 85.133.44 168.75.179 209.249.24 216.180.254
Figure 5.5: Recall-precision graph of the abnormal domain detection model for FFSN corpus.